Case Study - BGP Routing Policy

Introduction

BGP Routing policy is a very interesting topic. I get asked about it formally and informally all the time. I have to admit, there are lots of ways to organize an autonomous system. Vendors have unique features and templating / procedural functions, but in the end, BGP routing policy all boils down to two+two things:

  1. Not accepting the prefixes you don’t want (inbound)
    • For those prefixes accepted, ensure they have correct attributes.
  2. Not announcing prefixes to folks who shouldn’t see them (outbound)
    • For those prefixes announced, ensure they have correct attributes.

At IPng Networks, I’ve cycled through a few iterations and landed on a specific setup that works well for me. It provides sufficient information to enable our downstream (customers) to make good decisions on what they should accept from us, as well as enough expressivity for them to determine which prefixes we should propagate for them, where, and how.

This article describes one approach to a relatively feature rich routing policy which is in use at IPng Networks (AS8298). It uses the Bird2 configuration language, although the concepts would be implementable in ~any modern routing suite (e.g. FRR, Cisco, Juniper, Arista, Extreme, et cetera).

Interested in one operator’s opinion? Read on!

1. Concepts

There are three basic pieces of routing filtering, which I’ll describe briefly.

Prefix Lists

A prefix list (also sometimes referred to as an access-list in older software) is a list of IPv4 or IPv6 prefixes, often with a prefixlen boundary, that determines if a given prefix is “in” or “out”.

An example could be: 2001:db8::/32{32,48} which describes any prefix in the supernet 2001:db8::/32 that has a prefix length of anywhere between /32 and /48, inclusive.
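To make the “in” or “out” decision concrete, here is a minimal sketch in Python (not Bird, purely for illustration) of how an entry like 2001:db8::/32{32,48} could be evaluated. The function name in_prefix_list is made up for this example.

```python
import ipaddress

def in_prefix_list(candidate, supernet, min_len, max_len):
    """True if candidate sits inside supernet and min_len <= its length <= max_len."""
    cand = ipaddress.ip_network(candidate)
    sup = ipaddress.ip_network(supernet)
    if cand.version != sup.version:
        return False  # an IPv4 prefix never matches an IPv6 entry, and vice versa
    return cand.subnet_of(sup) and min_len <= cand.prefixlen <= max_len

# 2001:db8::/32{32,48}
print(in_prefix_list("2001:db8:1234::/48", "2001:db8::/32", 32, 48))       # True
print(in_prefix_list("2001:db8:1234:5678::/64", "2001:db8::/32", 32, 48))  # False, too specific
print(in_prefix_list("2001:db9::/32", "2001:db8::/32", 32, 48))            # False, outside the supernet
```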

AS Paths

In BGP, each prefix learned comes with an AS path on how to reach it. If my router learns a prefix from a peer with AS number 65520, it’ll see every prefix that peer sends as a list of AS numbers starting with 65520. With AS Paths, the very first one in the list is the one the router directly learned the prefix from, and the very last one is the origin of the prefix. Oftentimes the path is shown as a regular expression, starting with ^ and ending with $, and to help readability, spaces are often written as _.

Examples: ^25091_1299_3301$ and ^58299_174_1299_3301$
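As a rough illustration of how such an expression is applied, the Python sketch below treats _ purely as the separator between AS numbers. That is a simplification of what router regex engines do, and path_matches is a hypothetical helper, not something from Bird.

```python
import re

def path_matches(pattern, as_path):
    """Match an as-path regex against a list of AS numbers (neighbor first, origin last)."""
    regex = pattern.replace("_", " ")  # simplification: "_" only stands for a space here
    text = " ".join(str(asn) for asn in as_path)
    return re.search(regex, text) is not None

path = [25091, 1299, 3301]
neighbor_as, origin_as = path[0], path[-1]  # 25091 is the neighbor, 3301 the origin

print(path_matches("^25091_1299_3301$", path))      # True
print(path_matches("^58299_174_1299_3301$", path))  # False
print(path_matches("_3301$", path))                 # True: originated by 3301 behind another AS
```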

BGP Communities

When learning (or originating) a prefix in BGP, zero or more so called communities can be added to it along the way. The Routing Information Base or RIB carries these communities and can share them between peering sessions. Communities can be added, removed and modified. Some communities have special meaning (which is agreed upon by everyone), and some have local meaning (agreed upon by only one or a small set of operators).

There are three types of communities: normal communities are a pair of 16-bit integers; extended communities are 8 bytes, split into one 16-bit integer and an additional 48-bit value; and finally large communities consist of a triplet of 32-bit values.

Examples: (8298, 1234) (normal), or (8298, 3, 212323) (large)
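The size difference matters in practice: a 32-bit AS number such as 212323 simply does not fit in a normal community. The following Python sketch packs both kinds into their on-the-wire sizes. The helper names are invented for this example.

```python
import struct

def pack_community(asn, value):
    """Normal community: two 16-bit integers, 4 bytes in total."""
    return struct.pack("!HH", asn, value)

def pack_large_community(global_admin, data1, data2):
    """Large community: three 32-bit integers, 12 bytes in total."""
    return struct.pack("!III", global_admin, data1, data2)

print(len(pack_community(8298, 1234)))             # 4
print(len(pack_large_community(8298, 3, 212323)))  # 12

# 212323 is larger than 65535, so it cannot be carried in a 16-bit field:
try:
    pack_community(8298, 212323)
except struct.error as err:
    print("does not fit:", err)
```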

Routing Policy

Now that I’ve explained a little bit about the ingredients we have to work with, let me share an observation that took me a few decades to make: BGP sessions are really all the same. As such, every single one of the BGP sessions at IPng Networks is generated with one template. What makes the difference between ‘Transit’, ‘Customer’, ‘Peer’ and ‘Private Interconnect’ really all boils down to what types of filtering are applied on in- and outbound updates. I will demonstrate this by means of two main functions in Bird: ebgp_import(), discussed first in the section Inbound: Learning Routes, and ebgp_export() in the section Outbound: Announcing Routes.

2. Inbound: Learning Routes

Let’s consider this function:

function ebgp_import(int remote_as) {
  if aspath_bogon() then return false;
  if (net.type = NET_IP4 && ipv4_bogon()) then return false;
  if (net.type = NET_IP6 && ipv6_bogon()) then return false;

  if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
  if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;

  # Demote certain AS nexthops to lower pref
  if (bgp_path.first ~ AS_LOCALPREF50 && bgp_path.len > 1) then bgp_local_pref = 50;
  if (bgp_path.first ~ AS_LOCALPREF30 && bgp_path.len > 1) then bgp_local_pref = 30;
  if (bgp_path.first ~ AS_LOCALPREF10 && bgp_path.len > 1) then bgp_local_pref = 10;

  # Graceful Shutdown (RFC8326)
  if (65535, 0) ~ bgp_community then bgp_local_pref = 0;

  # Scrub BLACKHOLE community
  bgp_community.delete((65535, 666));

  return true;
}

The function works by order of elimination – for each prefix that is offered on the session, it will either be rejected (by means of returning false), or modified (by means of setting attributes like bgp_local_pref) and then accepted (by means of returning true).

AS-Path Bogon filtering is a way to remove prefixes that have an invalid AS number in their path. The main examples of this are private, documentation and reserved AS numbers (64496-131071) and their 32 bit private equivalents (4200000000-4294967295). In case you haven’t come across this yet, AS number 23456 is also magic, see RFC4893 for details:

function aspath_bogon() {
  return bgp_path ~ [0, 23456, 64496..131071, 4200000000..4294967295];
}
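For readers who don’t speak Bird, the same check can be sketched in a few lines of Python. The ranges are copied from the function above, and the Python names are illustrative only.

```python
# Ranges taken from the Bird function above: 0, AS_TRANS (23456),
# 64496-131071 and 4200000000-4294967295.
BOGON_ASN_RANGES = [
    (0, 0),
    (23456, 23456),
    (64496, 131071),
    (4200000000, 4294967295),
]

def aspath_bogon(as_path):
    """True if any AS number in the path falls inside one of the bogon ranges."""
    return any(lo <= asn <= hi for asn in as_path for lo, hi in BOGON_ASN_RANGES)

print(aspath_bogon([25091, 1299, 3301]))   # False
print(aspath_bogon([25091, 65001, 3301]))  # True, 65001 is a private AS number
```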

Prefix Bogon filtering comes next, as certain prefixes are not publicly routable (you know, such as RFC1918, but there are many others). They look different for IPv4 and IPv6:

function ipv4_bogon() {
  return net ~ [
    0.0.0.0/0,              # Default
    0.0.0.0/32-,            # RFC 5735 Special Use IPv4 Addresses
    0.0.0.0/0{0,7},         # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    10.0.0.0/8+,            # RFC 1918 Address Allocation for Private Internets
    100.64.0.0/10+,         # RFC 6598 IANA-Reserved IPv4 Prefix for Shared Address Space
    127.0.0.0/8+,           # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    169.254.0.0/16+,        # RFC 3927 Dynamic Configuration of IPv4 Link-Local Addresses
    172.16.0.0/12+,         # RFC 1918 Address Allocation for Private Internets
    192.0.0.0/24+,          # RFC 6890 Special-Purpose Address Registries
    192.0.2.0/24+,          # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    192.168.0.0/16+,        # RFC 1918 Address Allocation for Private Internets
    198.18.0.0/15+,         # RFC 2544 Benchmarking Methodology for Network Interconnect Devices
    198.51.100.0/24+,       # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    203.0.113.0/24+,        # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    224.0.0.0/4+,           # RFC 1112 Host Extensions for IP Multicasting
    240.0.0.0/4+            # RFC 6890 Special-Purpose Address Registries
  ];
}

function ipv6_bogon() {
 return net ~ [
    ::/0,                   # Default
    ::/96,                  # IPv4-compatible IPv6 address - deprecated by RFC4291
    ::/128,                 # Unspecified address
    ::1/128,                # Local host loopback address
    ::ffff:0.0.0.0/96+,     # IPv4-mapped addresses
    ::224.0.0.0/100+,       # Compatible address (IPv4 format)
    ::127.0.0.0/104+,       # Compatible address (IPv4 format)
    ::0.0.0.0/104+,         # Compatible address (IPv4 format)
    ::255.0.0.0/104+,       # Compatible address (IPv4 format)
    0000::/8+,              # Pool used for unspecified, loopback and embedded IPv4 addresses
    0100::/8+,              # RFC 6666 - reserved for Discard-Only Address Block
    0200::/7+,              # OSI NSAP-mapped prefix set (RFC4548) - deprecated by RFC4048
    0400::/6+,              # RFC 4291 - Reserved by IETF
    0800::/5+,              # RFC 4291 - Reserved by IETF
    1000::/4+,              # RFC 4291 - Reserved by IETF
    2001:10::/28+,          # RFC 4843 - Deprecated (previously ORCHID)
    2001:20::/28+,          # RFC 7343 - ORCHIDv2
    2001:db8::/32+,         # Reserved by IANA for special purposes and documentation
    2002:e000::/20+,        # Invalid 6to4 packets (IPv4 multicast)
    2002:7f00::/24+,        # Invalid 6to4 packets (IPv4 loopback)
    2002:0000::/24+,        # Invalid 6to4 packets (IPv4 default)
    2002:ff00::/24+,        # Invalid 6to4 packets
    2002:0a00::/24+,        # Invalid 6to4 packets (IPv4 private 10.0.0.0/8 network)
    2002:ac10::/28+,        # Invalid 6to4 packets (IPv4 private 172.16.0.0/12 network)
    2002:c0a8::/32+,        # Invalid 6to4 packets (IPv4 private 192.168.0.0/16 network)
    3ffe::/16+,             # Former 6bone, now decommissioned
    4000::/3+,              # RFC 4291 - Reserved by IETF
    5f00::/8+,              # RFC 5156 - used for the 6bone but was returned
    6000::/3+,              # RFC 4291 - Reserved by IETF
    8000::/3+,              # RFC 4291 - Reserved by IETF
    a000::/3+,              # RFC 4291 - Reserved by IETF
    c000::/3+,              # RFC 4291 - Reserved by IETF
    e000::/4+,              # RFC 4291 - Reserved by IETF
    f000::/5+,              # RFC 4291 - Reserved by IETF
    f800::/6+,              # RFC 4291 - Reserved by IETF
    fc00::/7+,              # Unicast Unique Local Addresses (ULA) - RFC 4193
    fe80::/10+,             # Link-local Unicast
    fec0::/10+,             # Site-local Unicast - deprecated by RFC 3879 (replaced by ULA)
    ff00::/8+               # Multicast
  ];
}

That’s a long list!! But operators on the DFZ should really never be accepting any of these, and we should all collectively yell at those who propagate them.

RPKI Filtering is a fantastic routing security feature, described in RFC6811 (with the RPKI-to-Router protocol in RFC6810) and relatively straightforward to implement. For each originating AS number, we can check in a table of known <origin,prefix> mappings whether it is the correct ISP to originate the prefix. The lookup can either match (which makes the prefix RPKI valid), fail because the prefix is missing (which makes the prefix RPKI unknown), or specifically mismatch (which makes the prefix RPKI invalid). Operators are encouraged to flag and drop invalid prefixes:

function ipv4_rpki_invalid() {
  return roa_check(t_roa4, net, bgp_path.last) = ROA_INVALID;
}

function ipv6_rpki_invalid() {
  return roa_check(t_roa6, net, bgp_path.last) = ROA_INVALID;
}
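To show where the three outcomes come from, here is a small Python sketch of the origin validation logic. It is a simplified model and not how Bird implements roa_check(). The ROA table uses documentation prefixes and AS numbers, so the entries are entirely hypothetical.

```python
import ipaddress
from typing import NamedTuple

class Roa(NamedTuple):
    prefix: str
    max_len: int
    origin_as: int

def roa_state(roas, prefix, origin_as):
    """Return 'valid', 'invalid' or 'unknown' for a (prefix, origin AS) pair."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version != net.version or not net.subnet_of(roa_net):
            continue  # this ROA says nothing about the prefix
        covered = True
        if net.prefixlen <= roa.max_len and roa.origin_as == origin_as:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid. No covering ROA: unknown.
    return "invalid" if covered else "unknown"

ROAS = [Roa("2001:db8::/32", 48, 64496)]  # hypothetical table

print(roa_state(ROAS, "2001:db8:1::/48", 64496))    # valid
print(roa_state(ROAS, "2001:db8:1::/48", 64497))    # invalid, wrong origin
print(roa_state(ROAS, "2001:db8:1:2::/64", 64496))  # invalid, more specific than max_len
print(roa_state(ROAS, "198.51.100.0/24", 64496))    # unknown, no covering ROA
```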

NOTE: In NLNOG my post sparked a bit of debate on the use of bgp_path.last_nonaggregated versus simply bgp_path.last. Job Snijders did some spelunking and offered this post and a reference to RFC6907 for details, and Tijn confirmed that Coloclue (on which many of my approaches have been modeled) indeed uses bgp_path.last. I’ve updated my configs, with many thanks for the discussion.

Alright, now that I’ve determined the as-path and prefix are kosher, and that it is not known to be hijacked (i.e. is either ROA_VALID or ROA_UNKNOWN), I’m ready to set a few attributes, notably:

  • AS_LOCALPREF If the peer I learned this prefix from is in the given list, and the as-path is longer than one hop, set the BGP local preference to either 50, 30 or 10 respectively (a lower localpref means the prefix is less likely to be selected). Some internet providers send lots of prefixes, but have poor network connectivity to the place I learned the routes from (a few examples of this: 6939 is often oversubscribed in Amsterdam, 39533 was for a while connected via a tunnel (!) to Zurich, and several hobby/amateur IXPs are on a VXLAN bridged domain rather than a physical switch).

  • Graceful Shutdown described in RFC8326, shows a way to allow operators to pre-announce their downtime by setting a special BGP community that informs their peers to deselect that path by setting the local preference to the lowest possible value. This oneliner matching on (65535,0) implements that behavior.

  • Blackhole Community described in RFC7999, is another special BGP community of (65535,666) which signals the need to stop sending traffic to the prefix at hand. I haven’t yet implemented the blackhole routing (this has to do with an intricacy of the VPP Linux-CP code that I wrote), so for now I’ll just remove the community.
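The three attribute steps above can be modelled in Python roughly as follows. This is a sketch only: the demote dictionary stands in for the AS_LOCALPREF50/30/10 lists, whose contents are not shown in this article, so the entry used here is an assumption for illustration.

```python
def apply_import_attributes(as_path, communities, local_pref, demote):
    """Return (local_pref, communities) after the attribute steps of the importer.

    demote maps a neighbor AS number to the localpref its routes are demoted to.
    """
    # Demotion only applies when the as-path is longer than one hop.
    if len(as_path) > 1 and as_path[0] in demote:
        local_pref = demote[as_path[0]]
    # Graceful Shutdown (RFC8326): deselect the path entirely.
    if (65535, 0) in communities:
        local_pref = 0
    # Scrub the BLACKHOLE community (RFC7999).
    communities = set(communities) - {(65535, 666)}
    return local_pref, communities

demote = {6939: 50}  # hypothetical list contents

print(apply_import_attributes([6939, 3301], {(65535, 666)}, 200, demote))  # (50, set())
print(apply_import_attributes([6939], set(), 200, demote))                 # (200, set())
print(apply_import_attributes([25091, 3301], {(65535, 0)}, 200, demote))   # (0, {(65535, 0)})
```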

Alright, based on this one template, I’m now ready to implement all three types of BGP session: Peer, Upstream, and Downstream.

Peers

function ebgp_import_peer(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}

It’s dangerous to accept communities for my own AS8298 from peers. This is because several of them can actively change the behavior of route propagation (these types of communities are commonly called action communities). So with peering relationships, I’ll just toss them all.

Now, working my way up to the actual BGP peering session, taking for example a peer that I’m connecting to at LSIX (the routeserver, in fact) in Amsterdam:

filter ebgp_lsix_49917_import {
  if ! ebgp_import_peer(49917) then reject;

  # Add IXP Communities
  bgp_community.add((8298,1036));
  bgp_large_community.add((8298,1,1036));

  accept;
}

protocol bgp lsix_49917_ipv4_1 {
  description "LSIX IX Route Servers (LSIX)";
  local as 8298;
  source address 185.1.32.74;
  neighbor 185.1.32.254 as 49917;
  default bgp_med 0;
  default bgp_local_pref 200;
  ipv4 {
    import keep filtered;
    import filter ebgp_lsix_49917_import;
    export filter ebgp_lsix_49917_export;
    receive limit 100000 action restart;
    next hop self on;
  };
};

Parsing this through: the ipv4 import filter is called ebgp_lsix_49917_import and its job is to run the whole kittenkaboodle of filtering I described above, and then if the ebgp_import_peer() function returns false, to simply drop the prefix. But if it is accepted, I’ll tag it with a few communities. As I’ll show later, any other peer will receive these communities if I decide to propagate the prefix to them. This is specifically useful for downstreams (customers), who can decide to accept/deny the prefix based on a well-known set of communities we tag.

IXP Community: If the prefix is learned at an IXP, I’ll add a large community (8298,1,*) and backwards compat normal community (8298,10XX).

One last thing I’ll note, and this is a matter of taste, is that peering prefixes picked up at internet exchanges (like LSIX) are typically much cheaper per megabit than transit routes, so I will set a default bgp_local_pref of 200 (higher localpref is more likely to be selected as the active route).

Upstream

An interesting observation: from Peers and from Upstreams I typically am happy to take all the prefixes I can get (but see the epilog below for an important note on this). For a Peer, this is mostly “their own prefixes” and for a Transit, this is mostly “all prefixes”, but there are things in the middle, say partial transit of “all prefixes learned at IXP A, B and C”. Really, all inbound sessions are very similar:

function ebgp_import_upstream(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}

… is in fact identical to the ebgp_import_peer() function above, so I’ll not discuss it further. But for the sessions to upstream (==transit) providers, it can make sense to use slightly different BGP community tags and a lower localpref:

filter ebgp_ipmax_25091_import {
  if ! ebgp_import_upstream(25091) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,2,25091));

  # Add BGP Communities
  bgp_community.add((8298,2000));

  accept;
}

protocol bgp ipmax_25091_ipv4_1 {
  description "IP-Max Transit";
  local as 8298;
  source address 46.20.242.210;
  neighbor 46.20.242.209 as 25091;
  default bgp_med 0;
  default bgp_local_pref 50;
  ipv4 {
    import keep filtered;
    import filter ebgp_ipmax_25091_import;
    export filter ebgp_ipmax_25091_export;
    next hop self on;
  };
};

Again, a very similar pattern; the only material difference is that the inbound prefixes are tagged with an Upstream Community which is of the form (8298,2,*) and backwards compatible (8298,20XX). Downstream customers can use this, if they wish, to select or reject routes (maybe they don’t like routes coming from AS25091, although they should know better because IP-Max rocks!).

The other slight change here is the bgp_local_pref is set to 50, which implies that it will be used only if there are no alternatives in the RIB with a higher localpref, or with the same localpref but a shorter as-path, or many other scenarios which I won’t get into here, because BGP selection criteria 101 is a whole blogpost of its own.

Downstream

That brings us to the third type of BGP sessions – commonly referred to as customers except that not everybody pays :) so I just call them downstreams:

function ebgp_import_downstream(int remote_as) {
  # We do not scrub BGP Communities (RFC 7454 Section 11) for customers
  return ebgp_import(remote_as);
}

Here, I have a special relationship with the remote_as, and I do not scrub the communities, letting the downstream operator set whichever they like. As I’ll demonstrate in the next chapter, they can use these communities to drive certain types of behavior.

Here’s how I use this ebgp_import_downstream() function in the full filter for a downstream:

# bgpq4 -Ab4 -R 24 -m 24 -l 'define AS201723_IPV4' AS201723
define AS201723_IPV4 = [
    185.54.95.0/24
];

# bgpq4 -Ab6 -R 48 -m 48 -l 'define AS201723_IPV6' AS201723
define AS201723_IPV6 = [
    2001:678:3d4::/48,
    2001:67c:6bc::/48
];

filter ebgp_raymon_201723_import {
  if (net.type = NET_IP4 && ! (net ~ AS201723_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ AS201723_IPV6)) then reject;
  if ! ebgp_import_downstream(201723) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,3,201723));

  # Add BGP Communities
  bgp_community.add((8298,3500));

  accept;
}

protocol bgp raymon_201723_ipv4_1 {
  local as 8298;
  source address 185.54.95.250;
  neighbor 185.54.95.251 as 201723;
  default bgp_med 0;
  default bgp_local_pref 400;
  ipv4 {
    import keep filtered;
    import filter ebgp_raymon_201723_import;
    export filter ebgp_raymon_201723_export;
    receive limit 94 action restart;
    next hop self on;
  };
};

OK, so this is a mouthful, but the one thing that I really need to do with customers is ensure that I only accept prefixes from them that they’re supposed to send me. I do this with a prefix-list for IPv4 and IPv6, and in the importer, I simply reject any prefixes that are not in the list. From then on, it looks very much like a peer, with identical filtering and tagging, except now I’m using yet another Customer Community which starts with (8298,3,*) and a vanilla (8298,3500) community. Anybody who wishes to, can act on the presence of these communities to know that it’s a downstream of IPng Networks AS8298.

A note on Peers and Downstreams:

Some ISPs will not peer with their customers (as in: once you become a transit customer they will terminate all BGP sessions at public internet exchanges), and I find that silly. However, for me the situation becomes a little bit more complex if I were to have AS201723 both as a Downstream (as shown here) as well as a Peer (which in fact, I do, at multiple Amsterdam based internet exchanges). Note how the bgp_local_pref is 400 on this session, and it will always be lower on other types of sessions. The implication is that the prefix from the RIB which carries (8298,3,201723) will be selected, while the ones I learn from LSIX will carry (8298,1,*) and the ones I learn from A2B (a transit provider) will carry (8298,2,51088), and neither of those will be selected due to their lower localpref. As I’ll demonstrate below, I can make smart use of these communities when announcing prefixes to my own peers and upstreams, … read on :)
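A tiny Python model shows why the downstream session wins. It only looks at the first two steps of BGP best path selection (localpref, then as-path length), and the three candidate routes are illustrative rather than taken from a real RIB.

```python
from typing import NamedTuple

class Route(NamedTuple):
    learned_from: str
    local_pref: int
    as_path: tuple

def best_route(candidates):
    """Higher localpref wins; on a tie the shorter as-path wins."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))

candidates = [
    Route("downstream session, tagged (8298,3,201723)", 400, (201723,)),
    Route("LSIX peering, tagged (8298,1,1036)", 200, (201723,)),
    Route("A2B transit, tagged (8298,2,51088)", 50, (51088, 201723)),
]

print(best_route(candidates).learned_from)  # downstream session, tagged (8298,3,201723)
```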

3. Outbound: Announcing Routes

Alright, the RIB is now filled with lots of prefixes that have the right localpref and communities, for example from having been learned at an IXP, from an Upstream, or from a Downstream. Now let’s consider the following generic exporter:

function ebgp_export(int remote_as) {
  # Remove private ASNs
  bgp_path.delete([64512..65535, 4200000000..4294967295]);

  # Well known BGP Large Communities
  if (8298, 0, remote_as) ~ bgp_large_community then return false;
  if (8298, 0, 0) ~ bgp_large_community then return false;

  # Well known BGP Communities
  if (0, 8298) ~ bgp_community then return false;
  if (remote_as < 65536 && (0, remote_as) ~ bgp_community) then return false;

  # AS path prepending
  if ((8298, 103, remote_as) ~ bgp_large_community ||
      (8298, 103, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 102, remote_as) ~ bgp_large_community ||
             (8298, 102, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 101, remote_as) ~ bgp_large_community ||
             (8298, 101, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
  }

  return true;
}

Oh, wow! There’s some really cool stuff to unpack here. As a belt-and-braces type safety, I will remove any private AS numbers from the as-path - this keeps my own announcements from tripping any as-path bogon filtering. But then, there are a few well-known communities that help determine if the announcement is made or not, and there are three-and-a-half ways of doing this:

  1. (8298,0,remote_as)
  2. (8298,0,0)
  3. (0,8298)
  4. (0,remote_as) but only if the remote_as is 16 bits.

All four of these methods will tell the router to refuse announcing the prefix on this session. Note that downstreams are allowed to set (8298,*,*) and (8298,*) communities (and they’re the only ones who are allowed to do so). So here is where some of the cool magic starts to happen.

Then, to drive prepending of the prefix on this session, I’ll again match certain communities: (8298, 103, *) will prepend the customer’s AS number three times, using 102 will prepend twice, and 101 will prepend once. If the third field is 0, then any session with this filter will prepend. If the third field is an AS number, then only sessions to that AS number will be prepended.

Using these types of communities allows downstreams (customers) incredibly fine grained propagation actions, at the per-IPng-session level. Not many ISPs offer this functionality!
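To see how a downstream would steer announcements, the decision logic of ebgp_export() can be mirrored in Python as below. This is a sketch of the community handling only (it leaves out the private AS scrubbing), and export_action is an invented name.

```python
MY_AS = 8298

def export_action(remote_as, large=frozenset(), normal=frozenset()):
    """Return None if the prefix must not be announced, else the number of prepends."""
    # The "three-and-a-half" ways to suppress the announcement.
    if (MY_AS, 0, remote_as) in large or (MY_AS, 0, 0) in large:
        return None
    if (0, MY_AS) in normal:
        return None
    if remote_as < 65536 and (0, remote_as) in normal:
        return None
    # Prepending: 103 is checked before 102 before 101, as in the Bird function.
    for count in (3, 2, 1):
        action = 100 + count
        if (MY_AS, action, remote_as) in large or (MY_AS, action, 0) in large:
            return count
    return 0

print(export_action(6939, large={(8298, 0, 6939)}))    # None: not announced to AS6939
print(export_action(1299, large={(8298, 0, 6939)}))    # 0: announced elsewhere, no prepend
print(export_action(6939, large={(8298, 103, 0)}))     # 3: prepend three times everywhere
print(export_action(6939, large={(8298, 101, 6939)}))  # 1: prepend once, towards AS6939 only
```

A downstream that wants a prefix kept away from one AS, and prepended once everywhere else, would set both (8298,0,ASN) and (8298,101,0) on its announcement.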

Peers

Exporting to peers, I really need to make sure that I don’t send too many prefixes. Most of us have at some point gone through the embarrassing motions of being told by a fellow operator “hey you’re sending a full table”. It is paramount to good peering hygiene that I do not leak. So I’ll define a healthy set of defense in depth principles here:

# bgpq4 -A4b -R 24 -m 24 -l 'define AS8298_IPV4' AS8298
define AS8298_IPV4 = [ 92.119.38.0/24, 194.1.163.0/24, 194.126.235.0/24 ];

# bgpq4 -A6bR 48 -m 48 -l 'define AS8298_IPV6' AS8298
define AS8298_IPV6 = [ 2001:678:d78::/48, 2a0b:dd80::/29{29,48} ];

# bgpq4 -A4b -R 24 -m 24 -l 'define AS_IPNG_IPV4' AS-IPNG
define AS_IPNG_IPV4 = [ ... ## Removed for brevity ];

# bgpq4 -A6bR 48 -m 48 -l 'define AS_IPNG_IPV6' AS-IPNG
define AS_IPNG_IPV6 = [ .. ## Removed for brevity ];

# bgpq4 -t4b -l 'define AS_IPNG' AS-IPNG
define AS_IPNG = [112, 8298, 50869, 57777, 60557, 201723, 212323, 212855];

function aspath_first_valid() {
  return (bgp_path.len = 0 || bgp_path.first ~ AS_IPNG);
}

# A list of well-known tier1 transit providers
function aspath_contains_tier1() {
  return bgp_path ~ [
     174,                  # Cogent
     209,                  # Qwest (HE carries this on IXPs IPv6 (Jul 12 2018))
     701,                  # UUNET
     702,                  # UUNET
     1239,                 # Sprint
     1299,                 # Telia
     2914,                 # NTT Communications
     3257,                 # GTT Backbone
     3320,                 # Deutsche Telekom AG (DTAG)
     3356,                 # Level3
     3549,                 # Level3
     3561,                 # Savvis / CenturyLink
     4134,                 # Chinanet
     5511,                 # Orange opentransit
     6453,                 # Tata Communications
     6762,                 # Seabone / Telecom Italia
     7018 ];               # AT&T
}

# The list of our own uplink (transit) providers
# Note: This list is autogenerated by our automation.
function aspath_contains_upstream() {
  return bgp_path ~ [ 8283,25091,34549,51088,58299 ];
}

function ipv4_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV4) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV4 &&
       (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}
function ipv6_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV6) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV6 &&
       (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}
function prefix_valid() {
  # as-path based filtering
  if !aspath_first_valid() then return false;
  if aspath_contains_tier1() then return false;
  if aspath_contains_upstream() then return false;

  # prefix (and BGP community) based filtering
  if (net.type = NET_IP4 && !ipv4_prefix_valid()) then return false;
  if (net.type = NET_IP6 && !ipv6_prefix_valid()) then return false;
  return true;
}

function ebgp_export_peer(int remote_as) {
  if !prefix_valid() then return false;
  return ebgp_export(remote_as);
}

Wow, alrighty then!! All I’m doing here is checking if the call to prefix_valid() returns true. That function isn’t very complex. It takes a look at three as-path based filters and then a prefix-list based filter. Let’s go over them in turn:

aspath_first_valid() takes a look at the first hop in the as-path. I need to make sure that I’ve received this prefix from an actual downstream, and those are collected in a RIPE as-set called AS-IPNG. So if the first BGP hop in the path is not one of these, I’ll refuse to announce the prefix.

aspath_contains_tier1() is a belt-and-braces style check. How on earth would I provide transit for any prefix for which there’s already a global Tier1 provider in the path? I mean, in no universe would AS174 or AS1299 need me to reach any of their customers, or indeed, any place in the world. So this filter helps me never announce the prefix, if it has one of these ISPs in the path.

aspath_contains_upstream() similarly, if I am receiving a full table from an upstream provider, I should not be passing this prefix along - I would for similar reasons never be a transit provider for A2B or IP-Max or Meerfarbig. My buddy Erik kindly pointed out a bug in my configuration around this, so hat-tip to him for the intelligence.

ipv[46]_prefix_valid() is the main thrust of prefix-based filtering. At this point we’ve already established that the as-path is clean, but it could be that the downstream is sending prefixes they should not (possibly leaking a full table) so let’s take a look at a good way to avoid this.

  • First, we look at locally sourced routes from AS8298, that is the ones that I myself originate at IPng Networks. These are always OK. The list is carefully curated.
  • Alternatively, the prefix needs to be from the as-set AS-IPNG (which contains both my prefixes and all route and route6 objects belonging to any AS number that I consider a downstream).
  • Finally, if the prefix is from AS-IPNG, I’ll still add one additional check to ensure that there is a so-called customer community attached. Remember that I discussed this specifically up in the Inbound - Downstream section.

So before I were to announce anything on such a session, all four of as-path, inbound prefix-list, outbound prefix-list and bgp-community are checked. This makes it incredibly unlikely that AS8298 ever leaks prefixes – knock on wood!
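The layering is easier to see when condensed into one function. The Python sketch below is a reduced IPv4-only model: the blocked AS set is a small subset of the tier1 and upstream lists, and DOWNSTREAM_V4 is a stand-in holding only the one downstream prefix shown earlier, since the real AS_IPNG_IPV4 list is not reproduced in this article.

```python
import ipaddress

AS_IPNG = {112, 8298, 50869, 57777, 60557, 201723, 212323, 212855}
BLOCKED_IN_PATH = {174, 1299, 25091, 51088}  # subset of the tier1 and upstream lists
OWN_V4 = ["194.1.163.0/24"]                  # one of the AS8298 prefixes
DOWNSTREAM_V4 = ["185.54.95.0/24"]           # stand-in for the elided AS_IPNG_IPV4

def in_list(prefix, prefixes):
    net = ipaddress.ip_network(prefix)
    return any(net == ipaddress.ip_network(p) for p in prefixes)

def prefix_valid(prefix, as_path, large=frozenset(), normal=frozenset()):
    # as-path checks: first hop must be a downstream, no tier1/upstream anywhere
    if as_path and as_path[0] not in AS_IPNG:
        return False
    if BLOCKED_IN_PATH & set(as_path):
        return False
    # locally sourced prefixes are always fine
    if in_list(prefix, OWN_V4):
        return True
    # downstream prefixes additionally need the customer community
    tagged = any(c[0] == 8298 and c[1] == 3 for c in large) or (8298, 3500) in normal
    return in_list(prefix, DOWNSTREAM_V4) and tagged

print(prefix_valid("194.1.163.0/24", []))                                   # True
print(prefix_valid("185.54.95.0/24", [201723], {(8298, 3, 201723)}))        # True
print(prefix_valid("185.54.95.0/24", [201723]))                             # False, untagged
print(prefix_valid("185.54.95.0/24", [201723, 1299], {(8298, 3, 201723)}))  # False, tier1 in path
```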

Upstream

Interestingly and, if you think about it, unsurprisingly, an upstream configuration is identical to a peer:

function ebgp_export_upstream(int remote_as) {
  if !prefix_valid() then return false;
  return ebgp_export(remote_as);
}

Alright, nothing to see here, moving on …

Downstream

Now the difference between a Peer and an Upstream on the one hand, and a Downstream on the other, is that the former two will only see a very limited set of prefixes, heavily guarded by all of that filtering I described. But a downstream typically has the luxury of getting to learn every prefix I’ve learned:

function ipv4_acceptable_size() {
  if net.len < 8 then return false;
  if net.len > 24 then return false;
  return true;
}
function ipv6_acceptable_size() {
  if net.len < 12 then return false;
  if net.len > 48 then return false;
  return true;
}
function ebgp_export_downstream(int remote_as) {
  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if (net.type = NET_IP4 && ! ipv4_acceptable_size()) then return false;
  if (net.type = NET_IP6 && ! ipv6_acceptable_size()) then return false;

  return ebgp_export(remote_as);
}

So here I’ll assert that the prefix has to be either from the RTS_BGP source, or from the RTS_STATIC source. This latter source is what Bird uses for locally generated routes (i.e. the ones in AS8298 itself). Locally generated routes are not known from BGP, but known instead because they are blackholed / null-routed on the router itself. And from these routes, I further deselect those prefixes that are too short or too long, with limits that differ slightly per address family (IPv4 is anywhere between /8-/24 and IPv6 is anywhere between /12-/48).
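The size limits are simple enough to express in a few lines of Python, shown here only as an illustration of the two Bird functions above. The name acceptable_size is made up.

```python
import ipaddress

def acceptable_size(prefix):
    """IPv4 must be between /8 and /24, IPv6 between /12 and /48 (inclusive)."""
    net = ipaddress.ip_network(prefix)
    shortest, longest = (8, 24) if net.version == 4 else (12, 48)
    return shortest <= net.prefixlen <= longest

print(acceptable_size("185.54.95.0/24"))       # True
print(acceptable_size("185.54.95.0/25"))       # False, longer than /24
print(acceptable_size("2001:678:d78::/48"))    # True
print(acceptable_size("2001:678:d78:1::/64"))  # False, longer than /48
```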

Now, I will note that I’ve seen many operators who inject OSPF or connected or static routes into BGP, and all of those folks will have to maintain elaborate egress “bogon” route filters, for example for those IXP prefixes that they picked up due to them being directly connected. If those operators would simply not propagate directly connected routes, their life would be so much simpler .. but I digress and it’s time for me to wrap up.

Epilog

I hope this little dissertation proves useful for other Bird enthusiasts out there. I myself had to fiddle a bit over the years with the idiosyncrasies (and bugs) of Bird and Bird2. I wanted to make a few comments:

  1. Thanks to the crew at Coloclue for having a really phenomenal routing setup, with a lot of thoughtful documentation, action communities, and strict ingress and egress filtering. It’s also fully automated and I’ve derived, although completely rewritten, my own automation based off of Kees.
  2. I understand that the main distinction between inbound Peer and Upstream is that for Peers many folks will want to do strict filtering. I’ve considered this for a long time and ultimately decided against it, because a combination of max prefix, tier1 as-path filtering and RPKI filtering would take care of the most egregious mistakes and otherwise, I’m actually happy to get more prefixes via IXPs rather than fewer.