FreeIX Remote - Part 2

FreeIX, Artists Rendering

Introduction

A few months ago, I wrote about [an idea] to help boost the value of small Internet Exchange Points (IXPs). When such an exchange doesn’t have many members, then the operational costs of connecting to it (cross connects, router ports, finding peers, etc) are not very favorable.

Clearly, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost as well as the end-to-end latency seen by their users or customers. Furthermore, the increased number of paths available through the IXP improves routing efficiency and fault-tolerance, and at the same time it avoids traffic taking the scenic route via a large hub like Frankfurt, London, Amsterdam, Paris or Rome when it could very well remain local.

Refresher: FreeIX Remote

FreeIX Remote

Let’s take for example the [Free IX in Greece] that was announced at GRNOG16 in Athens on April 19th, 2024. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. Members can connect to either site for the cost of only a cross connect. The 1G/10G/25G ports will be Gratis, so please make sure to apply if you’re in this region! I myself have connected one very special router to Free IX Greece, which will be offering an outreach infrastructure by connecting to other Internet Exchange Points in Amsterdam, and allowing all FreeIX Greece members to benefit from that in the following way:

  1. FreeIX Remote uses AS50869 to peer with any network operator (or routeserver) available at public Internet Exchange Points or using private interconnects. To these peers, it looks like a completely normal service provider: it connects to internet exchange points, learns a bunch of routes and announces other routes.

  2. FreeIX Remote members can join the program, after which they are granted certain propagation permissions by FreeIX Remote at the point where they have a BGP session with AS50869. The prefixes learned on these member sessions are marked as such, and will be allowed to propagate. Members will receive some or all learned prefixes from AS50869.

  3. FreeIX members can set fine grained BGP communities to determine which of their prefixes are propagated to and from which locations, by router, country or Internet Exchange Point.

Members at smaller internet exchange points greatly benefit from this type of outreach, by receiving large portions of the public internet directly at their preferred peering location. The Free IX Remote routers will carry member traffic to and from these remote Internet Exchange Points. My [previous article] went into a good amount of detail on the principles of operation, but back then I made a promise to come back to the actual implementation of such a complex routing topology. As a starting point, I work with the structure I shared in [IPng’s Routing Policy]. If you haven’t read that yet, I think it may make sense to take a look as many of the structural elements and concepts will be similar.

Implementation

The routing policy calls for three classes of (large) BGP communities: informational, permission and inhibit. It also defines a few classic BGP communities, but I’ll skip over those as they are not very interesting. Firstly, I will use the informational communities to tag which prefixes were learned by which router, in which country and at which internet exchange point, which I will call a group.

Then, I will use the same structure to grant members permissions, that is to say, when AS50869 learns their prefixes, they will get tagged with specific action communities that enable propagation to other places. I will call this ‘Member-to-IXP’. Sometimes, I’d like to be able to inhibit propagation of ‘Member-to-IXP’, so there will be a third set of communities that perform this function. Finally, matching on the informational communities in a clever way will enable a symmetric ‘IXP-to-Member’ propagation.

To structure this implementation, it helps to think about it in the following way:

Let’s say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a member called M1 at IXP1, and that member is ‘permitted’ to reach IXP2 and IXP3, but it is ‘inhibited’ from reaching IXP4. My FreeIX Remote implementation now has to satisfy three main requirements:

  1. Ingress: learn prefixes (from peers and members alike) at internet exchange points or private network interconnects, and ’tag’ them with the correct informational communities.
  2. Egress: Member-to-IXP: Announce M1’s prefixes to IXP2 and IXP3, but not to IXP4.
  3. Egress: IXP-to-Member: Announce IXP2’s and IXP3’s prefixes to M1, but not IXP4’s.
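Before any BGP communities come into play, the three requirements can be modelled with plain sets. The sketch below is only an illustration of the M1 example: the function names and the documentation prefixes are assumptions of mine, not part of the generator.

```python
# Toy model of the M1 example. All names and prefixes here are illustrative.

def member_to_ixp(permitted, inhibited):
    """IXPs at which the member's prefixes are announced."""
    return permitted - inhibited


def ixp_to_member(learned_at, permitted, inhibited):
    """Prefixes announced to the member, given a map of prefix -> IXP where it was learned."""
    allowed = permitted - inhibited
    return {prefix for prefix, ixp in learned_at.items() if ixp in allowed}


permitted = {"IXP2", "IXP3"}
inhibited = {"IXP4"}

# Ingress: every learned prefix remembers where it was picked up.
learned_at = {
    "192.0.2.0/24": "IXP2",
    "198.51.100.0/24": "IXP3",
    "203.0.113.0/24": "IXP4",
}

print(sorted(member_to_ixp(permitted, inhibited)))
print(sorted(ixp_to_member(learned_at, permitted, inhibited)))
```

Both directions use the same `permitted - inhibited` set, which is what makes the propagation symmetric.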

Defining Countries and Routers

I’ll start by giving each country which has at least one router a unique country_id in a YAML file, leaving the value 0 to mean ‘all’ countries:

$ cat config/common/countries.yaml
country:
  all: 0
  CH: 1
  NL: 2
  GR: 3
  IT: 4

Each router has its own configuration file, and at the top, I’ll define some metadata which includes things like the country in which it operates, and its own unique router_id, like so:

$ cat config/chrma0.net.free-ix.net.yaml
device:
  id: 1
  hostname: chrma0.free-ix.net
  shortname: chrma0
  country: CH
  loopbacks:
    ipv4: 194.126.235.16
    ipv6: "2a0b:dd80:3101::"
  location: "Hofwiesenstrasse, Ruemlang, Zurich, Switzerland"
...

Defining communities

Next, I define the BGP communities in class and subclass types, in the following YAML structure:

ebgp:
  community:
    legacy:
      noannounce: 0
      blackhole: 666
      inhibit: 3000
      prepend1: 3100
      prepend2: 3200
      prepend3: 3300
    large:
      class:
        informational: 1000
        permission: 2000
        inhibit: 3000
        prepend1: 3100
        prepend2: 3200
        prepend3: 3300
      subclass:
        all: 0
        router: 10
        country: 20
        group: 30
        asn: 40

Defining Members

In order to keep this system manageable, I have to rely on automation. I intend to leverage the BGP community subclasses in a simple ACL system consisting of the following YAML, taking my buddy Antonios’ network as an example:

$ cat config/common/members.yaml
member:
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  ...

The syntax of the permission and inhibit fields is identical. They are lists of key:value pairs where the key must be one of the subclasses (e.g. ‘router’, ‘country’, ‘group’, ‘asn’), and the value is appropriate for that type. In this example, AS50869 is being asked to grant permissions for Antonios’ prefixes to any peer connected to router:chrma0, but inhibit propagation to/from the exchange point called group:chix. I could extend this list, for example by adding a permission to country:NL or an inhibit to router:grskg0 and so on.

I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its members.

Ingress: Learning Prefixes

With what I’ve defined so far, I can start to set informational BGP communities:

  • The prefixes learned on subclass router for chrma0 will have value of device.id=1: (50869,1010,1)
  • The prefixes learned on subclass country for chrma0 will learn from device.country=CH and be able to look up in countries['CH'] that this means value 1: (50869,1020,1)
  • When learning prefixes from a given internet exchange, Kees already knows its PeeringDB ixp_id, which is a unique value for each exchange point. Thus, subclass group for chrma0 at [CommunityIX] is ixp_id=2013: (50869,1030,2013)
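The three bullets above boil down to adding the class and subclass numbers and appending the right value. Here is a minimal sketch of that arithmetic, with the numbers copied from the YAML shown earlier; the function name `informational_communities` is an assumption for illustration and is not taken from the generator.

```python
# Values copied from the YAML in this article; the helper itself is hypothetical.
MY_ASN = 50869
CLASS = {"informational": 1000, "permission": 2000, "inhibit": 3000}
SUBCLASS = {"all": 0, "router": 10, "country": 20, "group": 30, "asn": 40}
COUNTRY = {"all": 0, "CH": 1, "NL": 2, "GR": 3, "IT": 4}


def informational_communities(device, ixp_id=None):
    """Return the large communities to add to prefixes learned on this device."""
    base = CLASS["informational"]
    out = [
        (MY_ASN, base + SUBCLASS["router"], device["id"]),
        (MY_ASN, base + SUBCLASS["country"], COUNTRY[device["country"]]),
    ]
    # Private interconnects have no PeeringDB ixp_id, so no group community.
    if ixp_id:
        out.append((MY_ASN, base + SUBCLASS["group"], ixp_id))
    return out


chrma0 = {"id": 1, "country": "CH"}
print(informational_communities(chrma0, ixp_id=2013))
# [(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)]
```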

Ingress: Learning from members

I need to make sure that members send only the prefixes that I expect from them. To do this, I’ll make use of a common tool called [bgpq4] which cobbles together the prefixes belonging to an AS-SET by referencing one or more IRR databases.

In Python, I’ll prepare the Jinja context by generating the prefix filter lists like so:

if session["type"] == "member":
    session = {**session, **data["member"][asn]}

pf = ebgp_merge_value(data["ebgp"], group, session, "prefix_filter", None)
if pf:
    ctx["prefix_filter"] = {}
    pfn = pf
    pfn = pfn.replace("-", "_")
    pfn = pfn.replace(":", "_")

    for af in [4, 6]:
        filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af)
        filter_contents = fetch_bgpq(filter_name, pf, af, allow_morespecifics=True) 
        if "[" in filter_contents:
            ctx["prefix_filter"][filter_name] = { "str": filter_contents, "af": af }
            ctx["prefix_filter_ipv%d" % af] = True
        else:
            log.warning(f"Filter {filter_name} is empty!")
            ctx["prefix_filter_ipv%d" % af] = False

First, if a given BGP session is of type member, I’ll merge the member[asn] dictionary into the ebgp.group.session[asn]. I’ve left out error handling for brevity, but in case the member YAML file doesn’t have an entry for the given ASN, it’ll just revert back to being of type peer.

I’ll use a helper function ebgp_merge_value() to walk the YAML hierarchy from the member-data enriched session to the group and finally to the ebgp scope, looking for the existence of a key called prefix_filter and defaulting to None in case none was found. With the value of prefix_filter in hand (in this case AS-SET-DNET), I shell out to bgpq4 for IPv4 and IPv6 respectively. Sometimes, there are no IPv6 prefixes (why must you be like this?!) and sometimes there are no IPv4 prefixes (welcome to the Internet, kid!)
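The two helpers are not shown in this article, so what follows is a sketch of how they might work, not the real implementation. `merge_value()` and `bgpq4_argv()` are hypothetical names; the bgpq4 flags are the ones visible in the rendered output further down (`-A` to aggregate, `-b` for BIRD output, `-R` to allow more-specifics up to a given length, `-l` to name the list). The second function only builds the command line and does not run bgpq4.

```python
def merge_value(ebgp, group, session, key, default=None):
    """Return the most specific value for key: session first, then group, then ebgp."""
    for scope in (session, group, ebgp):
        if scope.get(key) is not None:
            return scope[key]
    return default


def bgpq4_argv(filter_name, as_set, af, allow_morespecifics=False):
    """Build (but do not execute) a bgpq4 command producing a BIRD prefix list."""
    argv = ["bgpq4", "-Ab%d" % af]
    if allow_morespecifics:
        # Accept more-specifics down to host routes for the address family.
        argv += ["-R", "32" if af == 4 else "128"]
    argv += ["-l", "define %s" % filter_name, as_set]
    return argv


ebgp = {"asn": 50869}
group = {"peeringdb_ix": {"id": 2365}}
session = {"type": "member", "prefix_filter": "AS-SET-DNET"}

pf = merge_value(ebgp, group, session, "prefix_filter")
print(bgpq4_argv("CHIX_AS_SET_DNET_IPV4", pf, 4, allow_morespecifics=True))
```

The resulting list could be handed to `subprocess.run()` as is, which avoids shell quoting issues with the `define ...` argument.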

All of this context, including the session and group information, is then fed to a Jinja renderer, where I can use it in an import filter like so:

{% for plname, pl in (prefix_filter | default({})).items() %}
{{pl.str}}
{% endfor %}

filter ebgp_{{group_name}}_{{their_asn}}_import {
{% if not prefix_filter_ipv4 | default(True) %}
  # WARNING: No IPv4 prefix filter found
  if (net.type = NET_IP4) then reject;
{% endif %}
{% if not prefix_filter_ipv6 | default(True) %}
  # WARNING: No IPv6 prefix filter found
  if (net.type = NET_IP6) then reject;
{% endif %}
{% for plname, pl in (prefix_filter | default({})).items() %}
{% if pl.af == 4 %}
  if (net.type = NET_IP4 && ! (net ~ {{plname}})) then reject;
{% elif pl.af == 6 %}
  if (net.type = NET_IP6 && ! (net ~ {{plname}})) then reject;
{% endif %}
{% endfor %}
{% if session_type is defined %}
  if ! ebgp_import_{{session_type}}({{their_asn}}) then reject;
{% endif %}

  # Add FreeIX Remote: Informational
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.router}},{{device.id}})); ## informational.router = {{ device.hostname }}
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.country}},{{country[device.country]}})); ## informational.country = {{ device.country }}
{% if group.peeringdb_ix.id %}
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.group}},{{group.peeringdb_ix.id}})); ## informational.group = {{ group_name }}
{% endif %}

  ## NOTE(pim): More comes here, see Member-to-IXP below

  accept;
}

Let me explain what’s going on here, as the Jinja templating language that my generator uses is a bit … chatty. The first block will print the dictionary of zero or more prefix_filter entries. If the prefix_filter context variable doesn’t exist, assume it’s the empty dictionary and thus, print no prefix lists.

Then, I create a Bird2 filter and these must each have a globally unique name. I satisfy this requirement by giving it a name with the tuple of {group, their_asn}. The first thing this filter does, is inspect prefix_filter_ipv4 and prefix_filter_ipv6, and if they are explicitly set to False (for example, if a member doesn’t have any IRR prefixes associated with their AS-SET), then I’ll reject any prefixes from them. Then, I’ll match the prefixes with the prefix_filter, if provided, and reject any prefixes that aren’t in the list I’m expecting on this session. Assuming we’re still good to go, I’ll hand this prefix off to a function called ebgp_import_peer() for peers and ebgp_import_member() for members, both of which ensure BGP communities are scrubbed.

function ebgp_import_peer(int remote_as) -> bool
{
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(50869, *)]);
  bgp_large_community.delete([(50869, *, *)]);

  # Scrub BLACKHOLE community
  bgp_community.delete((65535, 666));

  return ebgp_import(remote_as);
}

function ebgp_import_member(int remote_as) -> bool
{
  # We scrub only our own (informational, permissions) BGP Communities for members
  bgp_large_community.delete([(50869,1000..2999,*)]);

  return ebgp_import(remote_as);
}

After scrubbing the communities (peers are not allowed to set any communities, and members are not allowed to set their own informational or permissions communities, but they are allowed to inhibit themselves or prepend, if they wish), one last check is performed by calling the underlying ebgp_import():

function ebgp_import(int remote_as) -> bool
{
  if aspath_bogon() then return false;
  if (net.type = NET_IP4 && ipv4_bogon()) then return false;
  if (net.type = NET_IP6 && ipv6_bogon()) then return false;

  if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
  if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;

  # Graceful Shutdown (https://www.rfc-editor.org/rfc/rfc8326.html)
  if (65535, 0) ~ bgp_community then bgp_local_pref = 0;

  return true;
}

Here, belt-and-suspenders checks are performed: notably, bogon AS paths, bogon IPv4/IPv6 prefixes and RPKI invalids are filtered out. If the prefix carries the well-known community for [BGP Graceful Shutdown], I honor it and set the local preference to zero (making sure to prefer any other available path).

OK, after all these checks are done, I am finally ready to accept the prefix from this peer or member. It’s time to add the informational communities based on the router_id, the router’s country_id and (if this is a session at a public internet exchange point documented in PeeringDB), the group’s ixp_id.

Ingress Example: member

Here’s what the rendered template looks like for Antonios’ member session at CHIX:

# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV4 = [
 44.31.27.0/24{24,32}, 44.154.130.0/24{24,32}, 44.154.132.0/24{24,32},
 147.189.216.0/21{21,32}, 193.5.16.0/22{22,32}, 212.46.55.0/24{24,32}
];

# bgpq4 -Ab6 -R 128 -l 'define CHIX_AS_SET_DNET_IPV6' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV6 = [
 2001:678:f5c::/48{48,128}, 2a05:dfc1:9174::/48{48,128}, 2a06:9f81:2500::/40{40,128},
 2a06:9f81:2600::/40{40,128}, 2a0a:6044:7100::/40{40,128}, 2a0c:2f04:100::/40{40,128},
 2a0d:3dc0::/29{29,128}, 2a12:bc0::/29{29,128}
];

filter ebgp_chix_210312_import {
  if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
  if ! ebgp_import_member(210312) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
  
  ## NOTE(pim): More comes here, see Member-to-IXP below

  accept;
}
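To get a feel for what a pattern such as 193.5.16.0/22{22,32} lets through, here is a small Python model of the match. It is a sketch that assumes the lower bound is never shorter than the base prefix length (true for the lists above), in which case the match reduces to "inside the base prefix, with a length in the given range". The function names are mine.

```python
import ipaddress


def matches_pattern(prefix, base, low, high):
    """True if prefix lies inside base and its length is within [low, high]."""
    net = ipaddress.ip_network(prefix)
    base_net = ipaddress.ip_network(base)
    if net.version != base_net.version:
        return False
    return net.subnet_of(base_net) and low <= net.prefixlen <= high


# The IPv4 list from the rendered configuration above.
CHIX_AS_SET_DNET_IPV4 = [
    ("44.31.27.0/24", 24, 32), ("44.154.130.0/24", 24, 32),
    ("44.154.132.0/24", 24, 32), ("147.189.216.0/21", 21, 32),
    ("193.5.16.0/22", 22, 32), ("212.46.55.0/24", 24, 32),
]


def in_prefix_list(prefix, patterns):
    return any(matches_pattern(prefix, base, lo, hi) for base, lo, hi in patterns)


print(in_prefix_list("193.5.17.0/24", CHIX_AS_SET_DNET_IPV4))  # more-specific of 193.5.16.0/22
print(in_prefix_list("193.5.16.0/21", CHIX_AS_SET_DNET_IPV4))  # shorter than the /22
print(in_prefix_list("192.0.2.0/24", CHIX_AS_SET_DNET_IPV4))   # not in the list at all
```

Only the first candidate would pass the import filter; the other two are rejected.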

Ingress Example: peer

For completeness, here’s a regular peer Cloudflare at CHIX, and I hope you agree that the Jinja template renders down to something waaaay more readable now:

filter ebgp_chix_13335_import {
  if ! ebgp_import_peer(13335) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix

  accept;
}

Most sessions will actually look like this one: just learning prefixes, scrubbing inbound communities that are nobody’s business to be setting but mine, tossing weird prefixes like bogons and then setting typically the three informational communities. I now know exactly which prefixes are picked up at group CHIX, which ones in country Switzerland, and which ones on router chrma0.

Egress: Propagating Prefixes

And with that, I’ve completed the ’learning’ part. Let me move to the ‘propagating’ part. A design goal of FreeIX Remote is to have symmetric propagation. In my example above, member M1 should have its prefixes announced at IXP2 and IXP3, and all prefixes learned at IXP2 and IXP3 should be announced to member M1.

First, let me create a helper function in the generator. Its job is to take the symbolic member.*.permission and member.*.inhibit lists and resolve them into a structure of numeric values suitable for BGP community list adding and matching. It’s a bit of a beast, but I’ve simplified it a bit. Notably, I’ve removed all the error and exception handling for brevity:

def parse_member_communities(data, asn, type):
  myasn = data["ebgp"]["asn"]
  cls = data["ebgp"]["community"]["large"]["class"]
  sub = data["ebgp"]["community"]["large"]["subclass"]

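  bgp_cl = []
  member = data["member"][asn]
  # 'type' is "permission" or "inhibit", the same keys used in members.yaml
  perms = member.get(type, [])

  for perm in perms: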
    if perm == "all":
      el = { "class": int(cls[type]), "subclass": int(sub["all"]),
             "value": 0, "description": f"{type}.all" }
      return [el]
    k, v = perm.split(":")
    if k == "country":
      country_id = data["country"][v]
      el = { "class": int(cls[type]), "subclass": int(sub["country"]),
             "value": int(country_id), "description": f"{type}.{k} = {v}" }
      bgp_cl.append(el)
    elif k == "asn":
      el = { "class": int(cls[type]), "subclass": int(sub["asn"]),
             "value": int(v), "description": f"{type}.{k} = {v}" }
      bgp_cl.append(el)
    elif k == "router":
      device_id = data["_devices"][v]["id"]
      el = { "class": int(cls[type]), "subclass": int(sub["router"]),
             "value": int(device_id), "description": f"{type}.{k} = {v}" }
      bgp_cl.append(el)
    elif k == "group":
      group = data["ebgp"]["groups"][v]
      if isinstance(group["peeringdb_ix"], dict):
        ix_id = group["peeringdb_ix"]["id"]
      else:
        ix_id = group["peeringdb_ix"]
      el = { "class": int(cls[type]), "subclass": int(sub["group"]),
             "value": int(ix_id), "description": f"{type}.{k} = {v}" }
      bgp_cl.append(el)
    else:
      log.warning (f"No implementation for {type} subclass '{k}' for member AS{asn}, skipping")

  return bgp_cl

The essence of this function is to take a human readable list of symbols, like ‘router:chrma0’ and look up what subclass is called ‘router’ and what router_id is ‘chrma0’. It does this for keywords ‘router’, ‘country’, ‘group’ and ‘asn’ and for a special keyword called ‘all’ as well.

Running this function on Antonios’ member data above would reveal the following:

Member 210312 has permissions:
 [{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}]
Member 210312 has inhibits:
 [{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}]

The neat thing about this is, that this data will come in handy for both types of propagation, and the parse_member_communities() helper function returns pretty readable data, which will help in debugging and further understanding the ultimately generated configuration.

Egress: Member-to-IXP

OK, when I learned Antonios’ prefixes, I instructed the system to propagate them to all sessions on router chrma0, except sessions on group chix. This means that in the direction from AS50869 to others, I can do the following:

1. Tag permissions and inhibits on ingress

I add a tiny bit of logic using this data structure I just created above. In the import filter, remember I added NOTE(pim): More comes here? After setting the informational communities, I also add these:

{% if session_type == "member" %}
{% if permissions %}

  # Add FreeIX Remote: Permission
{% for el in permissions %}
  bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% if inhibits %}

  # Add FreeIX Remote: Inhibit
{% for el in inhibits %}
  bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% endif %}

Seeing as this block only gets rendered if the session type is member, let me show you what Antonios’ import filter looks like in its full glory:

filter ebgp_chix_210312_import {
  if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
  if ! ebgp_import_member(210312) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix

  # Add FreeIX Remote: Permission
  bgp_large_community.add((50869,2010,1)); ## permission.router = chrma0

  # Add FreeIX Remote: Inhibit
  bgp_large_community.add((50869,3030,2365)); ## inhibit.group = chix

  accept;
}

Remember, the ebgp_import_member() helper will strip any informational (the 1000s) and permissions (the 2000s), but it would allow Antonios to set inhibits and prepends (the 3000s) so these BGP communities will still be allowed in. In other words, Antonios can’t give himself propagation rights (sorry, buddy!) but if he would like to make AS50869 stop sending his prefixes to, say, CommunityIX, he could simply add the BGP community (50869,3030,2013) on his announcements, and that will get honored. If he’d like AS50869 to prepend itself twice before announcing to peer AS8298, he could set (50869,3200,8298) and that will also get picked up.
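The effect of the member scrub can be modelled in a few lines of Python. This is a sketch of the range match (50869,1000..2999,*) only; the function name `scrub_member` and the foreign community (65000,1,2) are made up for the example.

```python
MY_ASN = 50869


def scrub_member(large_communities):
    """Drop (50869, 1000..2999, *) and keep everything else, like ebgp_import_member()."""
    return [
        c for c in large_communities
        if not (c[0] == MY_ASN and 1000 <= c[1] <= 2999)
    ]


received = [
    (50869, 2000, 0),     # member tries to grant itself 'permission.all'
    (50869, 1030, 2013),  # forged informational community
    (50869, 3030, 2013),  # inhibit towards CommunityIX: allowed
    (50869, 3200, 8298),  # prepend request from the text: allowed
    (65000, 1, 2),        # somebody else's community: untouched
]

print(scrub_member(received))
# [(50869, 3030, 2013), (50869, 3200, 8298), (65000, 1, 2)]
```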

2. Match permissions and inhibits on egress

Now that all of Antonios’ prefixes are tagged with permissions and inhibits, I can reveal how I implemented the export filters for AS50869:

function member_prefix(int group) -> bool
{ 
  bool permitted = false;

  if (({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
    permitted = true;
  }
  if (({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
    permitted = false;
  }
  return (permitted);
}

function valid_prefix(int group) -> bool
{    
  return (source_prefix() || member_prefix(group));
}    

function ebgp_export_peer(int remote_as; int group) -> bool
{
  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if !valid_prefix(group) then return false;

  bgp_community.delete([(50869, *)]);
  bgp_large_community.delete([(50869, *, *)]);

  return ebgp_export(remote_as);
}

From the bottom, the function ebgp_export_peer() is invoked on each peering session, and it gets the argument of the remote AS (for example 13335 for Cloudflare), and the group (for example 2365 for CHIX). The function ensures that it’s either a static route or a BGP route. Then it makes sure it’s a valid_prefix() for the group.

The valid_prefix() function first checks if it’s one of our own (as in: AS50869’s own) prefixes, which it does by calling source_prefix(), which I’ve omitted here as it would be a distraction. All it does is check if the prefix is in a static prefix list generated with bgpq4 for AS50869 itself. The more interesting observation is that to be eligible, the prefix needs to be either source_prefix() or member_prefix(group).

The propagation decision for ‘Member-to-IXP’ actually happens in that member_prefix() function. It starts off by assuming the prefix is not permitted. Then it scans all relevant permissions communities which may be present in the RIB for this prefix:

  • is the all permissions community (50869,2000,0) set?
  • what about the router permission (50869,2010,R) for my router_id?
  • perhaps the country permission (50869,2020,C) for my country_id?
  • or maybe the group permission (50869,2030,G) for the ixp_id that this session lives on?

If any of these conditions are true, then this prefix might be permitted, so I set the variable to true. Next, I check and see if any of the inhibit communities are set, either by me (in members.yaml) or by the member on the live BGP session. If any one of them matches, then I flip the variable to false again. Once the verdict is known, I can return true or false here, which makes its way all the way up the call stack and ultimately announces the member prefix on the BGP session, or not. Slick!
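The same decision can be written as a short Python model, which I find handy for reasoning about edge cases. It is a sketch that mirrors the structure of member_prefix() (permit first, inhibit wins), using the class and subclass numbers from the YAML; it is not generated code.

```python
MY_ASN = 50869
PERMISSION, INHIBIT = 2000, 3000
ALL, ROUTER, COUNTRY, GROUP = 0, 10, 20, 30


def member_prefix(communities, router_id, country_id, group):
    """Model of the BIRD function: any permission match permits, any inhibit match vetoes."""
    def candidates(cls):
        return {
            (MY_ASN, cls + ALL, 0),
            (MY_ASN, cls + ROUTER, router_id),
            (MY_ASN, cls + COUNTRY, country_id),
            (MY_ASN, cls + GROUP, group),
        }

    tagged = set(communities)
    permitted = bool(tagged & candidates(PERMISSION))
    if tagged & candidates(INHIBIT):
        permitted = False
    return permitted


# Communities on Antonios' prefixes after the import filter shown above.
antonios = [
    (50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365),
    (50869, 2010, 1), (50869, 3030, 2365),
]

print(member_prefix(antonios, router_id=1, country_id=1, group=2013))  # session at CommunityIX
print(member_prefix(antonios, router_id=1, country_id=1, group=2365))  # session at CHIX
```

The first call yields True (permitted on router chrma0, no inhibit matches), the second False (the inhibit for group 2365 overrides the permission).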

Egress: IXP-to-Member

At this point, members’ prefixes get announced at the correct internet exchange points, but I need to satisfy one more requirement: the prefixes picked up at those IXPs should also be announced to members. For this, the helper dictionary with permissions and inhibits can be used in a clever way. What if I held them against the informational communities? For example, I have permitted Antonios to be announced at any IXP connected to router chrma0, so all prefixes I learned at chrma0 are fair game, right? But, I configured an inhibit for Antonios’ prefixes at CHIX. No problem, I have an informational community for all prefixes I learned from the CHIX group!

I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that I would announce to a peer, I will also announce to a member. Off I go, adding one last helper function to the BGP session Jinja template:

{% if session_type == "member" %}
function ebgp_export_{{group_name}}_{{their_asn}}(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

{% for el in permissions | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=true; ## {{el.description}}
{% endfor %}
{% for el in inhibits | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=false; ## {{el.description}}
{% endfor %}

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
{% endif %}

Note that in essence, this new function still calls valid_prefix(), which in turn calls source_prefix() or member_prefix(group), so it announces the same prefixes that are also announced to sessions of type ‘peer’. But then, I’ll also inspect the informational communities, where the value of 0 is replaced with a wildcard, because ‘permit or inhibit all’ would mean ‘match any of these BGP communities’. This template renders as follows for Antonios at CHIX:

function ebgp_export_chix_210312(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

  if (bgp_large_community ~ [(50869,1010,1)]) then permitted=true; ## permission.router = chrma0
  if (bgp_large_community ~ [(50869,1030,2365)]) then permitted=false; ## inhibit.group = chix

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
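The informational matching at the heart of this function can be modelled in Python as follows. This is a sketch: it leaves out the source check and the valid_prefix() shortcut, treats a value of 0 as a wildcard just like the template does, and uses the element dictionaries in the shape that parse_member_communities() returns. The function names are mine.

```python
MY_ASN = 50869
INFORMATIONAL = 1000


def matches(communities, subclass, value):
    """Match an informational community; a value of 0 acts as a wildcard."""
    return any(
        asn == MY_ASN and cls == INFORMATIONAL + subclass and (value == 0 or val == value)
        for asn, cls, val in communities
    )


def export_to_member(communities, permissions, inhibits):
    """Permit on any permission match, then let any inhibit match override it."""
    permitted = False
    for el in permissions:
        if matches(communities, el["subclass"], el["value"]):
            permitted = True
    for el in inhibits:
        if matches(communities, el["subclass"], el["value"]):
            permitted = False
    return permitted


permissions = [{"subclass": 10, "value": 1, "description": "permission.router = chrma0"}]
inhibits = [{"subclass": 30, "value": 2365, "description": "inhibit.group = chix"}]

learned_at_comix = [(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)]
learned_at_chix = [(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365)]

print(export_to_member(learned_at_comix, permissions, inhibits))  # True
print(export_to_member(learned_at_chix, permissions, inhibits))   # False
```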

Results

With this, the propagation logic is complete. Announcements are symmetric, that is to say the function ebgp_export_chix_210312() sees to it that Antonios gets the prefixes learned at router chrma0 but not those learned at group CHIX. Similarly, the ebgp_export_peer() ensures that Antonios’ prefixes are propagated to any session at router chrma0 except those sessions at group CHIX.

VPP

I have installed VPP with [OSPFv3] unnumbered interfaces, so each router has exactly one IPv4 and IPv6 loopback address. The router in Rümlang has been operational for a while, the ones in Amsterdam (nlams0.free-ix.net) and Thessaloniki (grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan (itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara.

I deployed a test setup with a few permissions and inhibits on the Rümlang router, with many thanks to Jurrian, Sam and Antonios for allowing me to guinea-pig-ize their member sessions. With the following test configuration:

member:
  35202:
    description: OnTheGo (Sam Aschwanden)
    prefix_filter: AS-OTG
    permission: [ router:chrma0 ]
    inhibit: [ group:comix ]
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  212635:
    description: Jurrian van Iersel
    prefix_filter: AS212635:AS-212635
    permission: [ router:chrma0 ]
    inhibit: [ group:chix, group:fogixp ]

I can see the following prefix learn/announce counts towards members:

pim@chrma0:~$ for i in $(birdc show protocol | grep member | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes; done
chix_member_35202_ipv4_1        2 imported, 0 filtered, 159984 exported, 0 preferred
chix_member_35202_ipv6_1        2 imported, 0 filtered, 61730 exported, 0 preferred
chix_member_210312_ipv4_1       3 imported, 0 filtered, 3518 exported, 3 preferred
chix_member_210312_ipv6_1       2 imported, 0 filtered, 1251 exported, 2 preferred
comix_member_35202_ipv4_1       2 imported, 0 filtered, 159981 exported, 2 preferred
comix_member_35202_ipv4_2       2 imported, 0 filtered, 159981 exported, 1 preferred
comix_member_35202_ipv6_1       2 imported, 0 filtered, 61727 exported, 2 preferred
comix_member_35202_ipv6_2       2 imported, 0 filtered, 61727 exported, 1 preferred
fogixp_member_212635_ipv4_1     1 imported, 0 filtered, 442 exported, 1 preferred
fogixp_member_212635_ipv6_1     14 imported, 0 filtered, 181 exported, 14 preferred
freeix_ch_member_210312_ipv4_1  3 imported, 0 filtered, 3521 exported, 0 preferred
freeix_ch_member_210312_ipv6_1  2 imported, 0 filtered, 1253 exported, 0 preferred

Let me make a few observations:

  • Hurricane Electric AS6939 is present at CHIX, and they tend to announce a very large number of prefixes. So every member who is permitted (and not inhibited) at CHIX will see all of those: Sam’s AS35202 is inhibited on CommunityIX but not on CHIX, and he’s permitted on both. That explains why he is seeing the routes on both sessions.
  • I’ve inhibited Jurrian’s AS212635 to/from both CHIX and FogIXP, which means he will be seeing CommunityIX (~245 IPv4, 85 IPv6 prefixes), and FreeIX CH (~173 IPv4 and ~60 IPv6). We also send him the member prefixes, which is about 35 or so additional prefixes. This explains why Jurrian is receiving from us ~440 IPv4 and ~180 IPv6.
  • Antonios’ AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077 or so IPv4 and 1056 IPv6 prefixes, while I’ve already added up FreeIX, CommunityIX, and our members (this is what we’re sending Jurrian!), at roughly 440 and 180 respectively, so Antonios should be getting about 3500 IPv4 prefixes and 1250 IPv6 prefixes.
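The last estimate is simple enough to check by hand, but here it is as a few lines of Python, using only figures quoted in this article. It is a rough sanity check, not an exact accounting: the member prefixes differ slightly per session, so a small deviation is expected.

```python
# Figures quoted from the birdc output in this article.
fogixp = {"ipv4": 3077, "ipv6": 1056}    # prefixes preferred from FogIXP
jurrian = {"ipv4": 442, "ipv6": 181}     # exported to AS212635 (all but CHIX and FogIXP)
observed = {"ipv4": 3518, "ipv6": 1251}  # exported to AS210312 on its CHIX session

# Antonios gets everything Jurrian gets, plus what was learned at FogIXP.
estimate = {af: fogixp[af] + jurrian[af] for af in fogixp}

for af in ("ipv4", "ipv6"):
    print(af, "estimate", estimate[af], "observed", observed[af],
          "difference", observed[af] - estimate[af])
```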

In the other direction, I would expect to be announcing to peers only prefixes belonging to either AS50869 itself, or those of our members:

pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes || echo; done
chix_peer_212100_ipv4_1      57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_212100_ipv6_1      21979 imported, 1 filtered, 37 exported, 7186 preferred
chix_peer_13335_ipv4_1       4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_13335_ipv6_1       371 imported, 1 filtered, 37 exported, 369 preferred
chix_peer_6939_ipv4_1        151787 imported, 27 filtered, 24 exported, 133943 preferred
chix_peer_6939_ipv6_1        61191 imported, 6 filtered, 37 exported, 16223 preferred
comix_peer_44596_ipv4_1      594 imported, 0 filtered, 25 exported, 10 preferred
comix_peer_44596_ipv6_1      1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv4_1       23 imported, 0 filtered, 25 exported, 0 preferred
comix_peer_8298_ipv6_1       34 imported, 0 filtered, 50 exported, 0 preferred
fogixp_peer_47498_ipv4_1     3286 imported, 1 filtered, 27 exported, 3077 preferred
fogixp_peer_47498_ipv6_1     1838 imported, 0 filtered, 39 exported, 1056 preferred
freeix_ch_peer_51530_ipv4_1  355 imported, 0 filtered, 28 exported, 0 preferred
freeix_ch_peer_51530_ipv6_1  143 imported, 0 filtered, 53 exported, 0 preferred

Some observations:

  • Nobody is inhibited at FreeIX Switzerland. It stands to reason therefore, that it has the most exported prefixes: 28 for IPv4 and 53 for IPv6.
  • Two members are inhibited at CHIX, which makes it have the lowest number of exported prefixes: 24 for IPv4 and 37 for IPv6.
  • All peers at each exchange (group) will receive the same number of prefixes. I can confirm that at CHIX, all three peers have the same number of announced prefixes. Similarly, at CommunityIX, all peers have the same number.
  • If Antonios, Sam or Jurrian would add an outgoing announcement to AS50869 with an additional inhibit BGP community (eg (50869,3020,1) to inhibit country Switzerland), they could tweak these numbers.

What’s next

This all adds up. I’d like to test the waters with my friendly neighborhood canaries a little bit, to make sure that announcements are as expected, and traffic flows where appropriate. In the meantime, I’ll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam. And of course FreeIX Greece in Thessaloniki. I’ll try to get the Milano VPP router deployed (it’s already installed and configured, but currently powered off) and connected to PCIX, MIX and a few others.

How can you help?

If you’re willing to participate with a VPP router and connect it to either multiple local internet exchanges (like I’ve demonstrated in Zurich), or better yet, to one or more of the other existing routers, I would welcome your contribution. [Contact] me for details.

A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from Milan to Thessaloniki is on the horizon. If you are willing and able to donate some bandwidth (point to point VPWS, VLL, L2VPN) and your transport network is capable of at least 2026 bytes of inner payload, please also [reach out] as I’m sure many small network operators would be thrilled.