Introduction
A few months ago, I wrote about [an idea] to help boost the value of small Internet Exchange Points (IXPs). When such an exchange doesn’t have many members, the operational costs of connecting to it (cross connects, router ports, finding peers, etc.) are not very favorable.
Clearly, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost and also reducing the end-to-end latency as seen by their users or customers. Furthermore, the increased number of paths available through the IXP improves routing efficiency and fault-tolerance, and at the same time it avoids traffic going the scenic route to a large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.
Refresher: FreeIX Remote
Let’s take for example the [Free IX in Greece] that was announced at GRNOG16 in Athens on April 19th, 2024. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. Members can connect to either site for the cost of only a cross connect. The 1G/10G/25G ports will be Gratis, so please make sure to apply if you’re in this region! I myself have connected one very special router to Free IX Greece, which will be offering an outreach infrastructure by connecting to other Internet Exchange Points in Amsterdam, and allowing all FreeIX Greece members to benefit from that in the following way:
- FreeIX Remote uses AS50869 to peer with any network operator (or routeserver) available at public Internet Exchange Points or using private interconnects. For these peers, it looks like a completely normal service provider in this regard. It will connect to internet exchange points, and learn a bunch of routes and announce other routes.
- FreeIX Remote members can join the program, after which they are granted certain propagation permissions by FreeIX Remote at the point where they have a BGP session with AS50869. The prefixes learned on these member sessions are marked as such, and will be allowed to propagate. Members will receive some or all learned prefixes from AS50869.
- FreeIX members can set fine grained BGP communities to determine which of their prefixes are propagated to and from which locations, by router, country or Internet Exchange Point.
Members at smaller internet exchange points greatly benefit from this type of outreach, by receiving large portions of the public internet directly at their preferred peering location. The Free IX Remote routers will carry member traffic to and from these remote Internet Exchange Points. My [previous article] went into a good amount of detail on the principles of operation, but back then I made a promise to come back to the actual implementation of such a complex routing topology. As a starting point, I work with the structure I shared in [IPng’s Routing Policy]. If you haven’t read that yet, I think it may make sense to take a look as many of the structural elements and concepts will be similar.
Implementation
The routing policy calls for three classes of (large) BGP communities: informational, permission and inhibit. It also defines a few classic BGP communities, but I’ll skip over those as they are not very interesting. Firstly, I will use the informational communities to tag which prefixes were learned by which router, in which country and at which internet exchange point, which I will call a group.
Then, I will use the same structure to grant members permissions, that is to say, when AS50869 learns their prefixes, they will get tagged with specific action communities that enable propagation to other places. I will call this ‘Member-to-IXP’. Sometimes, I’d like to be able to inhibit propagation of ‘Member-to-IXP’, so there will be a third set of communities that perform this function. Finally, matching on the informational communities in a clever way will enable a symmetric ‘IXP-to-Member’ propagation.
To help structure this implementation, it helps if I think about it in the following way:
Let’s say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a member called M1 at IXP1, and that member is ‘permitted’ to reach IXP2 and IXP3, but it is ‘inhibited’ from reaching IXP4. My FreeIX Remote implementation now has to satisfy three main requirements:
- Ingress: learn prefixes (from peers and members alike) at internet exchange points or private network interconnects, and ’tag’ them with the correct informational communities.
- Egress: Member-to-IXP: Announce M1’s prefixes to IXP2 and IXP3, but not to IXP4.
- Egress: IXP-to-Member: Announce IXP2’s and IXP3’s prefixes to M1, but not IXP4’s.
Defining Countries and Routers
I’ll start by giving each country which has at least one router a unique country_id in a YAML file, leaving the value 0 to mean ‘all’ countries:
$ cat config/common/countries.yaml
country:
  all: 0
  CH: 1
  NL: 2
  GR: 3
  IT: 4
Each router has its own configuration file, and at the top, I’ll define some metadata which includes things like the country in which it operates, and its own unique router_id, like so:
$ cat config/chrma0.net.free-ix.net.yaml
device:
  id: 1
  hostname: chrma0.free-ix.net
  shortname: chrma0
  country: CH
  loopbacks:
    ipv4: 194.126.235.16
    ipv6: "2a0b:dd80:3101::"
  location: "Hofwiesenstrasse, Ruemlang, Zurich, Switzerland"
  ...
Defining communities
Next, I define the BGP communities in class and subclass types, in the following YAML structure:
ebgp:
  community:
    legacy:
      noannounce: 0
      blackhole: 666
      inhibit: 3000
      prepend1: 3100
      prepend2: 3200
      prepend3: 3300
    large:
      class:
        informational: 1000
        permission: 2000
        inhibit: 3000
        prepend1: 3100
        prepend2: 3200
        prepend3: 3300
      subclass:
        all: 0
        router: 10
        country: 20
        group: 30
        asn: 40
Defining Members
In order to keep this system manageable, I have to rely on automation. I intend to leverage the BGP community subclasses in a simple ACL system consisting of the following YAML, taking my buddy Antonios’ network as an example:
$ cat config/common/members.yaml
member:
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  ...
The syntax of the permission and inhibit fields is identical. They are lists of key:value pairs where the key must be one of the subclasses (eg. ‘router’, ‘country’, ‘group’, ‘asn’), and the value appropriate for that type. In this example, AS50869 is being asked to grant permissions for Antonios’ prefixes to any peer connected to router:chrma0, but inhibit propagation to/from the exchange point called group:chix. I could extend this list, for example by adding a permission to country:NL or an inhibit to router:grskg0 and so on.
I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its members.
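As a minimal sketch of how such an entry could be validated before it is used: the helper member_acl() below is hypothetical and not part of the generator shown in this article. It assumes the defaults just described (permission defaults to all, inhibit defaults to empty) and checks that every key is one of the known subclasses.

```python
# Hypothetical helper, not part of the actual generator: validates the
# permission/inhibit lists of one member entry and applies the defaults.
SUBCLASSES = {"router", "country", "group", "asn"}


def member_acl(member):
    """Return (permission, inhibit) as lists of (key, value) tuples."""

    def parse(entries):
        out = []
        for entry in entries:
            if entry == "all":
                out.append(("all", None))
                continue
            # Split on the first colon only, so values may contain colons.
            key, sep, value = entry.partition(":")
            if not sep or key not in SUBCLASSES or not value:
                raise ValueError(f"invalid ACL entry: {entry!r}")
            out.append((key, value))
        return out

    # Assumed defaults: permit everywhere, inhibit nowhere.
    permission = parse(member.get("permission", ["all"]))
    inhibit = parse(member.get("inhibit", []))
    return permission, inhibit


antonios = {"permission": ["router:chrma0"], "inhibit": ["group:chix"]}
print(member_acl(antonios))
print(member_acl({}))
```

Rejecting unknown keys early would keep a typo in members.yaml from silently turning into a missing permission later on.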
Ingress: Learning Prefixes
With what I’ve defined so far, I can start to set informational BGP communities:
- The prefixes learned on subclass router for chrma0 will have the value of device.id=1: (50869,1010,1)
- The prefixes learned on subclass country for chrma0 will take device.country=CH and look up in countries['CH'] that this means value 1: (50869,1020,1)
- When learning prefixes from a given internet exchange, Kees already knows its PeeringDB ixp_id, which is a unique value for each exchange point. Thus, subclass group for chrma0 at [CommunityIX] is ixp_id=2013: (50869,1030,2013)
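The arithmetic behind these three values is simply class plus subclass in the second field. The Python sketch below is an illustration only: the function names large_community() and informational() are made up for this example, and the constants mirror the YAML shown earlier.

```python
# Illustrative only: constants mirror the YAML values shown above.
MY_ASN = 50869
CLASS = {"informational": 1000, "permission": 2000, "inhibit": 3000}
SUBCLASS = {"all": 0, "router": 10, "country": 20, "group": 30, "asn": 40}
COUNTRY = {"all": 0, "CH": 1, "NL": 2, "GR": 3, "IT": 4}


def large_community(cls, subclass, value):
    """Build (asn, class+subclass, value) as used throughout this article."""
    return (MY_ASN, CLASS[cls] + SUBCLASS[subclass], int(value))


def informational(device, ixp_id=None):
    """Communities to add on ingress for a session on this device."""
    out = [
        large_community("informational", "router", device["id"]),
        large_community("informational", "country", COUNTRY[device["country"]]),
    ]
    if ixp_id:  # private interconnects have no PeeringDB ixp_id
        out.append(large_community("informational", "group", ixp_id))
    return out


chrma0 = {"id": 1, "country": "CH"}
print(informational(chrma0, ixp_id=2013))
# [(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)]
```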
Ingress: Learning from members
I need to make sure that members send only the prefixes that I expect from them. To do this, I’ll make use of a common tool called [bgpq4] which cobbles together the prefixes belonging to an AS-SET by referencing one or more IRR databases.
In Python, I’ll prepare the Jinja context by generating the prefix filter lists like so:
if session["type"] == "member":
    session = {**session, **data["member"][asn]}

pf = ebgp_merge_value(data["ebgp"], group, session, "prefix_filter", None)
if pf:
    ctx["prefix_filter"] = {}
    # Turn the AS-SET name into something usable as a Bird symbol
    pfn = pf
    pfn = pfn.replace("-", "_")
    pfn = pfn.replace(":", "_")
    for af in [4, 6]:
        filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af)
        filter_contents = fetch_bgpq(filter_name, pf, af, allow_morespecifics=True)
        if "[" in filter_contents:
            ctx["prefix_filter"][filter_name] = { "str": filter_contents, "af": af }
            ctx["prefix_filter_ipv%d" % af] = True
        else:
            log.warning(f"Filter {filter_name} is empty!")
            ctx["prefix_filter_ipv%d" % af] = False
First, if a given BGP session is of type member, I’ll merge the member[asn] dictionary into the ebgp.group.session[asn]. I’ve left out error handling for brevity, but in case the member YAML file doesn’t have an entry for the given ASN, it’ll just revert back to being of type peer. I’ll use a helper function ebgp_merge_value() to walk the YAML hierarchy from the member-data enriched session to the group and finally to the ebgp scope, looking for the existence of a key called prefix_filter and defaulting to None in case none was found. With the value of prefix_filter in hand (in this case AS-SET-DNET), I shell out to bgpq4 for IPv4 and IPv6 respectively. Sometimes, there are no IPv6 prefixes (why must you be like this?!) and sometimes there are no IPv4 prefixes (welcome to the Internet, kid!)
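The real ebgp_merge_value() is not shown in this article, so the following is only a guess at its behaviour based on the description above: look for the key in the most specific scope first (session), then the group, then the ebgp scope, and fall back to the default. The argument order follows the call in the snippet; everything else is an assumption.

```python
# A guess at the helper's behaviour; the actual implementation may differ.
def ebgp_merge_value(ebgp, group, session, key, default=None):
    """Return the most specific value for key: session, then group, then ebgp."""
    for scope in (session, group, ebgp):
        if isinstance(scope, dict) and key in scope:
            return scope[key]
    return default


ebgp = {"asn": 50869}
group = {"peeringdb_ix": {"id": 2365}}
member_session = {"type": "member", "prefix_filter": "AS-SET-DNET"}
peer_session = {"type": "peer"}

print(ebgp_merge_value(ebgp, group, member_session, "prefix_filter", None))
print(ebgp_merge_value(ebgp, group, peer_session, "prefix_filter", None))
```

With this ordering, a value set on an individual session would override one set for the whole group, which in turn overrides a global one.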
All of this context, including the session and group information, is then fed to a Jinja renderer, where I can use it in an import filter like so:
{% for plname, pl in (prefix_filter | default({})).items() %}
{{pl.str}}
{% endfor %}

filter ebgp_{{group_name}}_{{their_asn}}_import {
{% if not prefix_filter_ipv4 | default(True) %}
  # WARNING: No IPv4 prefix filter found
  if (net.type = NET_IP4) then reject;
{% endif %}
{% if not prefix_filter_ipv6 | default(True) %}
  # WARNING: No IPv6 prefix filter found
  if (net.type = NET_IP6) then reject;
{% endif %}
{% for plname, pl in (prefix_filter | default({})).items() %}
{% if pl.af == 4 %}
  if (net.type = NET_IP4 && ! (net ~ {{plname}})) then reject;
{% elif pl.af == 6 %}
  if (net.type = NET_IP6 && ! (net ~ {{plname}})) then reject;
{% endif %}
{% endfor %}
{% if session_type is defined %}
  if ! ebgp_import_{{session_type}}({{their_asn}}) then reject;
{% endif %}

  # Add FreeIX Remote: Informational
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.router}},{{device.id}})); ## informational.router = {{ device.hostname }}
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.country}},{{country[device.country]}})); ## informational.country = {{ device.country }}
{% if group.peeringdb_ix.id %}
  bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.group}},{{group.peeringdb_ix.id}})); ## informational.group = {{ group_name }}
{% endif %}

  ## NOTE(pim): More comes here, see Member-to-IXP below

  accept;
}
Let me explain what’s going on here, as the Jinja templating language that my generator uses is a bit … chatty. The first block will print the dictionary of zero or more prefix_filter entries. If the prefix_filter context variable doesn’t exist, assume it’s the empty dictionary and thus, print no prefix lists.
Then, I create a Bird2 filter and these must each have a globally unique name. I satisfy this requirement by giving it a name with the tuple of {group, their_asn}. The first thing this filter does, is inspect prefix_filter_ipv4 and prefix_filter_ipv6, and if they are explicitly set to False (for example, if a member doesn’t have any IRR prefixes associated with their AS-SET), then I’ll reject any prefixes from them. Then, I’ll match the prefixes with the prefix_filter, if provided, and reject any prefixes that aren’t in the list I’m expecting on this session. Assuming we’re still good to go, I’ll hand this prefix off to a function called ebgp_import_peer() for peers and ebgp_import_member() for members, both of which ensure BGP communities are scrubbed.
function ebgp_import_peer(int remote_as) -> bool
{
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(50869, *)]);
  bgp_large_community.delete([(50869, *, *)]);

  # Scrub BLACKHOLE community
  bgp_community.delete((65535, 666));

  return ebgp_import(remote_as);
}

function ebgp_import_member(int remote_as) -> bool
{
  # We scrub only our own (informational, permissions) BGP Communities for members
  bgp_large_community.delete([(50869,1000..2999,*)]);

  return ebgp_import(remote_as);
}
After scrubbing the communities (peers are not allowed to set any communities, and members are not allowed to set their own informational or permissions communities, but they are allowed to inhibit themselves or prepend, if they wish), one last check is performed by calling the underlying ebgp_import():
function ebgp_import(int remote_as) -> bool
{
  if aspath_bogon() then return false;
  if (net.type = NET_IP4 && ipv4_bogon()) then return false;
  if (net.type = NET_IP6 && ipv6_bogon()) then return false;

  if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
  if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;

  # Graceful Shutdown (https://www.rfc-editor.org/rfc/rfc8326.html)
  if (65535, 0) ~ bgp_community then bgp_local_pref = 0;

  return true;
}
Here, belt-and-suspenders checks are performed: notably, bogon AS Paths, bogon IPv4/IPv6 prefixes and RPKI invalids are filtered out. If the prefix carries the well-known community for [BGP Graceful Shutdown], I honor it and set the local preference to zero (making sure to prefer any other available path).
OK, after all these checks are done, I am finally ready to accept the prefix from this peer or member. It’s time to add the informational communities based on the router_id, the router’s country_id and (if this is a session at a public internet exchange point documented in PeeringDB), the group’s ixp_id.
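Before moving on, the difference between the two scrubbing helpers can be illustrated with a small Python model. This is a sketch of the large-community part only (the classic communities and the BLACKHOLE scrub are left out), and the function names scrub_peer() and scrub_member() are made up for this example.

```python
# Toy model of the large-community scrubbing done by the two Bird helpers.
MY_ASN = 50869


def scrub_peer(large):
    """Peers may not set any of our large communities: drop (50869,*,*)."""
    return {c for c in large if c[0] != MY_ASN}


def scrub_member(large):
    """Members keep inhibit/prepend (3000 and up) but lose 1000..2999."""
    return {c for c in large if not (c[0] == MY_ASN and 1000 <= c[1] <= 2999)}


received = {
    (50869, 2000, 0),     # attempt to self-grant permission.all
    (50869, 3030, 2013),  # inhibit.group = CommunityIX
    (64500, 1, 2),        # somebody else's community, left untouched
}
print(sorted(scrub_member(received)))  # [(50869, 3030, 2013), (64500, 1, 2)]
print(sorted(scrub_peer(received)))    # [(64500, 1, 2)]
```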
Ingress Example: member
Here’s what the rendered template looks like for Antonios’ member session at CHIX:
# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV4 = [
    44.31.27.0/24{24,32}, 44.154.130.0/24{24,32}, 44.154.132.0/24{24,32},
    147.189.216.0/21{21,32}, 193.5.16.0/22{22,32}, 212.46.55.0/24{24,32}
];

# bgpq4 -Ab6 -R 128 -l 'define CHIX_AS_SET_DNET_IPV6' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV6 = [
    2001:678:f5c::/48{48,128}, 2a05:dfc1:9174::/48{48,128}, 2a06:9f81:2500::/40{40,128},
    2a06:9f81:2600::/40{40,128}, 2a0a:6044:7100::/40{40,128}, 2a0c:2f04:100::/40{40,128},
    2a0d:3dc0::/29{29,128}, 2a12:bc0::/29{29,128}
];

filter ebgp_chix_210312_import {
  if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
  if ! ebgp_import_member(210312) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix

  ## NOTE(pim): More comes here, see Member-to-IXP below

  accept;
}
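The bgpq4 command lines in the comments above suggest how the generator shells out. The sketch below only reconstructs that command line; the function name bgpq4_argv() is hypothetical, and tying the -R maximum length to the address family (rather than to the allow_morespecifics argument of fetch_bgpq()) is an assumption made to keep the example short.

```python
import shlex


# Hypothetical helper: rebuilds the command line seen in the rendered comments.
def bgpq4_argv(filter_name, as_set, af):
    """Return the bgpq4 argument vector for a Bird prefix list."""
    maxlen = 32 if af == 4 else 128  # accept more-specifics up to host routes
    return ["bgpq4", f"-Ab{af}", "-R", str(maxlen),
            "-l", f"define {filter_name}", as_set]


for af in (4, 6):
    argv = bgpq4_argv(f"CHIX_AS_SET_DNET_IPV{af}", "AS-SET-DNET", af)
    print(shlex.join(argv))
```

Passing such a list to subprocess.run() instead of a shell string would avoid any quoting surprises with AS-SET names that contain colons.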
Ingress Example: peer
For completeness, here’s a regular peer Cloudflare at CHIX, and I hope you agree that the Jinja template renders down to something waaaay more readable now:
filter ebgp_chix_13335_import {
  if ! ebgp_import_peer(13335) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix

  accept;
}
Most sessions will actually look like this one: just learning prefixes, scrubbing inbound communities that are nobody’s business to be setting but mine, tossing weird prefixes like bogons and then setting typically the three informational communities. I now know exactly which prefixes are picked up at group CHIX, which ones in country Switzerland, and which ones on router chrma0.
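Reading the tags back is the mirror image of setting them. As a small illustration (the helper learned_at() is invented for this example and is not part of the generator), this is how the informational communities on a route could be decoded into "where was this learned":

```python
# Illustrative decoder for the informational communities set on ingress.
MY_ASN = 50869
INFORMATIONAL = 1000
SUBCLASS_NAMES = {10: "router", 20: "country", 30: "group"}


def learned_at(large):
    """Map informational communities on a route to {subclass name: value}."""
    out = {}
    for asn, function, value in large:
        name = SUBCLASS_NAMES.get(function - INFORMATIONAL)
        if asn == MY_ASN and name:
            out[name] = value
    return out


route = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365)}
info = learned_at(route)
print(info["router"], info["country"], info["group"])  # 1 1 2365
```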
Egress: Propagating Prefixes
And with that, I’ve completed the ’learning’ part. Let me move to the ‘propagating’ part. A design goal of FreeIX Remote is to have symmetric propagation. In my example above, member M1 should have its prefixes announced at IXP2 and IXP3, and all prefixes learned at IXP2 and IXP3 should be announced to member M1.
First, let me create a helper function in the generator. Its job is to take the symbolic member.*.permission and member.*.inhibit lists and resolve them into a structure of numeric values suitable for BGP community list adding and matching. It’s a bit of a beast, but I’ve simplified it a bit. Notably, I’ve removed all the error and exception handling for brevity:
def parse_member_communities(data, asn, type):
    myasn = data["ebgp"]["asn"]
    cls = data["ebgp"]["community"]["large"]["class"]
    sub = data["ebgp"]["community"]["large"]["subclass"]

    bgp_cl = []
    member = data["member"][asn]
    # 'type' is "permission" or "inhibit", matching the keys in members.yaml
    perms = member.get(type, [])
    for perm in perms:
        if perm == "all":
            # 'all' short-circuits: no other entries are needed
            el = { "class": int(cls[type]), "subclass": int(sub["all"]),
                   "value": 0, "description": f"{type}.all" }
            return [el]
        k, v = perm.split(":")
        if k == "country":
            country_id = data["country"][v]
            el = { "class": int(cls[type]), "subclass": int(sub["country"]),
                   "value": int(country_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "asn":
            el = { "class": int(cls[type]), "subclass": int(sub["asn"]),
                   "value": int(v), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "router":
            device_id = data["_devices"][v]["id"]
            el = { "class": int(cls[type]), "subclass": int(sub["router"]),
                   "value": int(device_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "group":
            group = data["ebgp"]["groups"][v]
            # peeringdb_ix is either a dict with an 'id' or the bare ixp_id
            if isinstance(group["peeringdb_ix"], dict):
                ix_id = group["peeringdb_ix"]["id"]
            else:
                ix_id = group["peeringdb_ix"]
            el = { "class": int(cls[type]), "subclass": int(sub["group"]),
                   "value": int(ix_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        else:
            log.warning (f"No implementation for {type} subclass '{k}' for member AS{asn}, skipping")

    return bgp_cl
The essence of this function is to take a human readable list of symbols, like ‘router:chrma0’ and look up what subclass is called ‘router’ and what router_id is ‘chrma0’. It does this for keywords ‘router’, ‘country’, ‘group’ and ‘asn’ and for a special keyword called ‘all’ as well.
Running this function on Antonios’ member data above would reveal the following:
Member 210312 has permissions:
[{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}]
Member 210312 has inhibits:
[{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}]
The neat thing about this is, that this data will come in handy for both types of propagation, and the parse_member_communities() helper function returns pretty readable data, which will help in debugging and further understanding the ultimately generated configuration.
Egress: Member-to-IXP
OK, when I learned Antonios’ prefixes, I instructed the system to propagate them to all sessions on router chrma0, except sessions on group chix. This means that in the direction from AS50869 to others, I can do the following:
1. Tag permissions and inhibits on ingress
I add a tiny bit of logic using this data structure I just created above. In the import filter, remember I added NOTE(pim): More comes here? After setting the informational communities, I also add these:
{% if session_type == "member" %}
{% if permissions %}
  # Add FreeIX Remote: Permission
{% for el in permissions %}
  bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% if inhibits %}
  # Add FreeIX Remote: Inhibit
{% for el in inhibits %}
  bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% endif %}
Seeing as this block only gets rendered if the session type is member, let me show you what Antonios’ import filter looks like in its full glory:
filter ebgp_chix_210312_import {
  if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
  if ! ebgp_import_member(210312) then reject;

  # Add FreeIX Remote: Informational
  bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
  bgp_large_community.add((50869,1020,1)); ## informational.country = CH
  bgp_large_community.add((50869,1030,2365)); ## informational.group = chix

  # Add FreeIX Remote: Permission
  bgp_large_community.add((50869,2010,1)); ## permission.router = chrma0

  # Add FreeIX Remote: Inhibit
  bgp_large_community.add((50869,3030,2365)); ## inhibit.group = chix

  accept;
}
Remember, the ebgp_import_member() helper will strip any informational (the 1000s) and permissions (the 2000s), but it would allow Antonios to set inhibits and prepends (the 3000s) so these BGP communities will still be allowed in. In other words, Antonios can’t give himself propagation rights (sorry, buddy!) but if he would like to make AS50869 stop sending his prefixes to, say, CommunityIX, he could simply add the BGP community (50869,3030,2013) on his announcements, and that will get honored. If he’d like AS50869 to prepend itself twice before announcing to peer AS8298, he could set (50869,3200,8298) and that will also get picked up.
2. Match permissions and inhibits on egress
Now that all of Antonios’ prefixes are tagged with permissions and inhibits, I can reveal how I implemented the export filters for AS50869:
function member_prefix(int group) -> bool
{
  bool permitted = false;

  if (({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
    permitted = true;
  }
  if (({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
      ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
    permitted = false;
  }
  return (permitted);
}

function valid_prefix(int group) -> bool
{
  return (source_prefix() || member_prefix(group));
}

function ebgp_export_peer(int remote_as; int group) -> bool
{
  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if !valid_prefix(group) then return false;

  bgp_community.delete([(50869, *)]);
  bgp_large_community.delete([(50869, *, *)]);

  return ebgp_export(remote_as);
}
From the bottom, the function ebgp_export_peer() is invoked on each peering session, and it gets the argument of the remote AS (for example 13335 for Cloudflare), and the group (for example 2365 for CHIX). The function ensures that it’s either a static route or a BGP route. Then it makes sure it’s a valid_prefix() for the group.
The valid_prefix() function first checks if it’s one of our own (as in: AS50869’s own) prefixes, which it does by calling source_prefix(), which I’ve omitted here as it would be a distraction. All it does is check if the prefix is in a static prefix list generated with bgpq4 for AS50869 itself. The more interesting observation is that to be eligible, the prefix needs to be either source_prefix() or member_prefix(group).
The propagation decision for ‘Member-to-IXP’ actually happens in that member_prefix() function. It starts off by assuming the prefix is not permitted. Then it scans all relevant permissions communities which may be present in the RIB for this prefix:
- is the all permissions community (50869,2000,0) set?
- what about the router permission (50869,2010,R) for my router_id?
- perhaps the country permission (50869,2020,C) for my country_id?
- or maybe the group permission (50869,2030,G) for the ixp_id that this session lives on?
If any of these conditions are true, then this prefix might be permitted, so I set the variable to True. Next, I check and see if any of the inhibit communities are set, either by me (in members.yaml) or by the member on the live BGP session. If any one of them matches, then I flip the variable to False again. Once the verdict is known, I can return True or False here, which makes its way all the way up the call stack and ultimately announces the member prefix on the BGP session, or not. Slick!
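To check my own understanding of that decision, here is the same logic as a small Python model. It is a sketch, not generated code: the constants mirror the YAML community definitions, and the Python member_prefix() takes the router and country identifiers as arguments where the Bird version has them rendered in by the template.

```python
# Python model of the Bird member_prefix() decision: permit first, then
# let any matching inhibit community override the verdict.
MY_ASN = 50869
PERMISSION, INHIBIT = 2000, 3000
ALL, ROUTER, COUNTRY, GROUP = 0, 10, 20, 30


def member_prefix(large, router_id, country_id, group):
    def matches(cls):
        candidates = {
            (MY_ASN, cls + ALL, 0),
            (MY_ASN, cls + ROUTER, router_id),
            (MY_ASN, cls + COUNTRY, country_id),
            (MY_ASN, cls + GROUP, group),
        }
        return bool(candidates & set(large))

    return matches(PERMISSION) and not matches(INHIBIT)


# Antonios: permission.router = chrma0 (1), inhibit.group = chix (2365)
antonios = {(50869, 2010, 1), (50869, 3030, 2365)}
print(member_prefix(antonios, router_id=1, country_id=1, group=2365))  # False (CHIX)
print(member_prefix(antonios, router_id=1, country_id=1, group=2013))  # True (CommunityIX)
```

Evaluating the inhibits last is what makes them win over any permission, including one the member could not have set themselves.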
Egress: IXP-to-Member
At this point, members’ prefixes get announced at the correct internet exchange points, but I need to satisfy one more requirement: the prefixes picked up at those IXPs should also be announced to members. For this, the helper dictionary with permissions and inhibits can be used in a clever way. What if I held them against the informational communities? For example, I have permitted Antonios to be announced at any IXP connected to router chrma0, then all prefixes I learned at chrma0 are fair game, right? But, I configured an inhibit for Antonios’ prefixes at CHIX. No problem, I have an informational community for all prefixes I learned from the CHIX group!
I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that I would announce to a peer, I will also announce to a member. Off I go, adding one last helper function to the BGP session Jinja template:
{% if session_type == "member" %}
function ebgp_export_{{group_name}}_{{their_asn}}(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

{% for el in permissions | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=true; ## {{el.description}}
{% endfor %}
{% for el in inhibits | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=false; ## {{el.description}}
{% endfor %}

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
{% endif %}
Note that in essence, this new function still calls valid_prefix(), which in turn calls source_prefix() or member_prefix(group), so it announces the same prefixes that are also announced to sessions of type ‘peer’. But then, I’ll also inspect the informational communities, where the value of 0 is replaced with a wildcard, because ‘permit or inhibit all’ would mean ‘match any of these BGP communities’. This template renders as follows for Antonios at CHIX:
function ebgp_export_chix_210312(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

  if (bgp_large_community ~ [(50869,1010,1)]) then permitted=true; ## permission.router = chrma0
  if (bgp_large_community ~ [(50869,1030,2365)]) then permitted=false; ## inhibit.group = chix

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
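The informational matching in this export function can be modelled in a few lines of Python. This sketch covers only the permission/inhibit part held against the informational communities (the valid_prefix() shortcut is left out), it mirrors the template literally including the wildcard for value 0, and the names matches() and ixp_to_member() are invented for the example.

```python
# Python model of the IXP-to-Member match against informational communities.
MY_ASN = 50869
INFORMATIONAL = 1000


def matches(large, subclass, value):
    """True if an informational community matches; value 0 is a wildcard."""
    return any(
        asn == MY_ASN and function == INFORMATIONAL + subclass and value in (0, v)
        for asn, function, v in large
    )


def ixp_to_member(large, permissions, inhibits):
    """Decide whether a route learned elsewhere is announced to a member."""
    permitted = any(matches(large, el["subclass"], el["value"]) for el in permissions)
    if any(matches(large, el["subclass"], el["value"]) for el in inhibits):
        permitted = False
    return permitted


permissions = [{"subclass": 10, "value": 1}]   # permission.router = chrma0
inhibits = [{"subclass": 30, "value": 2365}]   # inhibit.group = chix
via_chix = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365)}
via_comix = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)}
print(ixp_to_member(via_chix, permissions, inhibits))   # False
print(ixp_to_member(via_comix, permissions, inhibits))  # True
```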
Results
With this, the propagation logic is complete. Announcements are symmetric, that is to say the function ebgp_export_chix_210312() sees to it that Antonios gets the prefixes learned at router chrma0 but not those learned at group CHIX. Similarly, the ebgp_export_peer() ensures that Antonios’ prefixes are propagated to any session at router chrma0 except those sessions at group CHIX.
I have installed VPP with [OSPFv3] unnumbered interfaces, so each router has exactly one IPv4 and IPv6 loopback address. The router in Rümlang has been operational for a while, the one in Amsterdam (nlams0.free-ix.net) and Thessaloniki (grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan (itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara.
I deployed a test setup with a few permissions and inhibits on the Rümlang router, with many thanks to Jurrian, Sam and Antonios for allowing me to guinea-pig-ize their member sessions. With the following test configuration:
member:
  35202:
    description: OnTheGo (Sam Aschwanden)
    prefix_filter: AS-OTG
    permission: [ router:chrma0 ]
    inhibit: [ group:comix ]
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  212635:
    description: Jurrian van Iersel
    prefix_filter: AS212635:AS-212635
    permission: [ router:chrma0 ]
    inhibit: [ group:chix, group:fogixp ]
I can see the following prefix learn/announce counts towards members:
pim@chrma0:~$ for i in $(birdc show protocol | grep member | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes; done
chix_member_35202_ipv4_1 2 imported, 0 filtered, 159984 exported, 0 preferred
chix_member_35202_ipv6_1 2 imported, 0 filtered, 61730 exported, 0 preferred
chix_member_210312_ipv4_1 3 imported, 0 filtered, 3518 exported, 3 preferred
chix_member_210312_ipv6_1 2 imported, 0 filtered, 1251 exported, 2 preferred
comix_member_35202_ipv4_1 2 imported, 0 filtered, 159981 exported, 2 preferred
comix_member_35202_ipv4_2 2 imported, 0 filtered, 159981 exported, 1 preferred
comix_member_35202_ipv6_1 2 imported, 0 filtered, 61727 exported, 2 preferred
comix_member_35202_ipv6_2 2 imported, 0 filtered, 61727 exported, 1 preferred
fogixp_member_212635_ipv4_1 1 imported, 0 filtered, 442 exported, 1 preferred
fogixp_member_212635_ipv6_1 14 imported, 0 filtered, 181 exported, 14 preferred
freeix_ch_member_210312_ipv4_1 3 imported, 0 filtered, 3521 exported, 0 preferred
freeix_ch_member_210312_ipv6_1 2 imported, 0 filtered, 1253 exported, 0 preferred
Let me make a few observations:
- Hurricane Electric AS6939 is present at CHIX, and they tend to announce a very large number of prefixes. So every member who is permitted (and not inhibited) at CHIX will see all of those: Sam’s AS35202 is inhibited on CommunityIX but not on CHIX, and he’s permitted on both. That explains why he is seeing the routes on both sessions.
- I’ve inhibited Jurrian’s AS212635 to/from both CHIX and FogIXP, which means he will be seeing CommunityIX (~245 IPv4, 85 IPv6 prefixes), and FreeIX CH (~173 IPv4 and ~60 IPv6). We also send him the member prefixes, which is about 35 or so additional prefixes. This explains why Jurrian is receiving from us ~440 IPv4 and ~180 IPv6.
- Antonios’ AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077 or so IPv4 and 1056 IPv6 prefixes, while I’ve already added up FreeIX, CommunityIX, and our members (this is what we’re sending Jurrian!), at ~440 resp ~180, so Antonios should be getting about 3500 IPv4 prefixes and 1250 IPv6 prefixes.
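As a rough sanity check of that last observation, the addition can be written out. The figures below are the rounded numbers quoted in this article, so a small difference against the live counters is to be expected:

```python
# Rounded figures quoted in the observations above.
fogixp = {"ipv4": 3077, "ipv6": 1056}    # preferred routes learned at FogIXP
jurrian = {"ipv4": 440, "ipv6": 180}     # approximately what Jurrian receives

expected = {af: fogixp[af] + jurrian[af] for af in fogixp}
observed = {"ipv4": 3518, "ipv6": 1251}  # exported on chix_member_210312_*

for af in expected:
    print(af, "expected ~%d, observed %d" % (expected[af], observed[af]))
```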
In the other direction, I would expect to be announcing to peers only prefixes belonging to either AS50869 itself, or those of our members:
pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes || echo; done
chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_212100_ipv6_1 21979 imported, 1 filtered, 37 exported, 7186 preferred
chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_13335_ipv6_1 371 imported, 1 filtered, 37 exported, 369 preferred
chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred
chix_peer_6939_ipv6_1 61191 imported, 6 filtered, 37 exported, 16223 preferred
comix_peer_44596_ipv4_1 594 imported, 0 filtered, 25 exported, 10 preferred
comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv4_1 23 imported, 0 filtered, 25 exported, 0 preferred
comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred
fogixp_peer_47498_ipv4_1 3286 imported, 1 filtered, 27 exported, 3077 preferred
fogixp_peer_47498_ipv6_1 1838 imported, 0 filtered, 39 exported, 1056 preferred
freeix_ch_peer_51530_ipv4_1 355 imported, 0 filtered, 28 exported, 0 preferred
freeix_ch_peer_51530_ipv6_1 143 imported, 0 filtered, 53 exported, 0 preferred
Some observations:
- Nobody is inhibited at FreeIX Switzerland. It stands to reason therefore, that it has the most exported prefixes: 28 for IPv4 and 53 for IPv6.
- Two members are inhibited at CHIX, which makes it have the lowest number of exported prefixes: 24 for IPv4 and 37 for IPv6.
- All peers at each exchange (group) will receive the same number of prefixes. I can confirm that at CHIX, all three peers have the same number of announced prefixes. Similarly, at CommunityIX, all peers have the same number.
- If Antonios, Sam or Jurrian would add an outgoing announcement to AS50869 with an additional inhibit BGP community (eg (50869,3020,1) to inhibit country Switzerland), they could tweak these numbers.
What’s next
This all adds up. I’d like to test the waters with my friendly neighborhood canaries a little bit, to make sure that announcements are as expected, and traffic flows where appropriate. In the meantime, I’ll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam. And of course FreeIX Greece in Thessaloniki. I’ll try to get the Milano VPP router deployed (it’s already installed and configured, but currently powered off) and connected to PCIX, MIX and a few others.
How can you help?
If you’re willing to participate with a VPP router and connect it to either multiple local internet exchanges (like I’ve demonstrated in Zurich), or better yet, to one or more of the other existing routers, I would welcome your contribution. [Contact] me for details.
A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from Milan to Thessaloniki is on the horizon. If you are willing and able to donate some bandwidth (point to point VPWS, VLL, L2VPN) and your transport network is capable of at least 2026 bytes of inner payload, please also [reach out] as I’m sure many small network operators would be thrilled.