About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation service router), VPP will look and feel quite familiar as many of the approaches are shared between the two.
There are some really fantastic features in VPP, some of which are less well known and not always very well documented. In this article, I will describe a unique use case in which I think VPP will excel, notably acting as a gateway for Internet Exchange Points.
In this first article, I’ll take a closer look at three things that would make such a gateway possible: bridge domains, MAC address filtering and traffic shaping.
Internet Exchanges are typically L2 (ethernet) switch platforms that allow their connected members to exchange traffic amongst themselves. Not all members share physical locations with the Internet Exchange itself, for example the IXP may be at NTT Zurich, but the member may be present in Interxion Zurich. For smaller clubs, like IPng Networks, it’s not always financially feasible (or desirable) to order a dark fiber between two adjacent datacenters, or even a cross connect in the same datacenter (as many of them are charging exorbitant fees for what is essentially passive fiber optics and patch panels), if the amount of traffic passed is modest.
One solution to such problems is to have one member transport multiple end-user downstream members to the platform, for example by means of an Ethernet over MPLS or VxLAN transport from where the end user lives, to the physical port of the Internet Exchange. These transport members are often called IXP Resellers, noting that usually, but not always, some form of payment is required.
From the point of view of the IXP, it’s often the case that there is a one MAC address per member limitation, and not all members will have the same bandwidth guarantees. Many IXPs will offer physical connection speeds (like a Gigabit, TenGig or HundredGig port), but they also have a common practice to limit the passed traffic by means of traffic shaping, for example one might have a TenGig port but only be entitled to pass 3.0 Gbit/sec of traffic in and out of the platform.
For a long time I thought this kind of sucked. After all, who wants to connect to an internet exchange point but then see their traffic rate limited? But if you think about it, this is often done to protect the member, the reseller, and the exchange itself: the total downstream bandwidth towards the reseller’s members is potentially larger than the reseller’s port to the exchange, and the same is almost certainly true in the other direction, where the total IXP bandwidth that might go to one individual member is significantly larger than the reseller’s port to the exchange.
Due to these two issues, a reseller port may become a bottleneck and packet loss may occur. To protect the ecosystem, having the internet exchange try to enforce fairness and bandwidth limits makes operational sense.
VPP as an IXP Gateway
Here are a few requirements that may be necessary to provide an end-to-end solution:
- Downstream ports MAY be untagged, or tagged, in which case encapsulation (for example .1q VLAN tags) SHOULD be provided, one per downstream member.
- Each downstream member MUST ONLY be allowed to send traffic from one or more registered MAC addresses, in other words, strict filtering MUST be applied by the gateway.
- If a downstream member is assigned an up- and downstream bandwidth limit, this MUST be enforced by the gateway.
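The three requirements above can be read as a small per-member record: where the member attaches (port, optional dot1q tag), which MAC addresses are registered, and an optional bandwidth limit. The following is a minimal sketch under that reading; the `Member` class, its field names and its validation rules are hypothetical illustrations and are not part of VPP or vppcfg.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Colon-separated lowercase MAC address, e.g. 3c:ec:ef:6a:7b:74
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass
class Member:
    """Hypothetical description of one downstream member of the gateway."""
    name: str
    port: str                          # physical port, e.g. "e0" or "xe0"
    vlan: Optional[int] = None         # None means the port is untagged
    macs: List[str] = field(default_factory=list)
    rate_kbps: Optional[int] = None    # None means no bandwidth limit

    def interface(self) -> str:
        """Name of the (sub-)interface this member lives on."""
        return self.port if self.vlan is None else f"{self.port}.{self.vlan}"

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means OK."""
        errors = []
        if self.vlan is not None and not 1 <= self.vlan <= 4094:
            errors.append(f"vlan {self.vlan} is outside 1..4094")
        if not self.macs:
            errors.append("at least one registered MAC address is required")
        for mac in self.macs:
            if not MAC_RE.match(mac.lower()):
                errors.append(f"malformed MAC address: {mac}")
        if self.rate_kbps is not None and self.rate_kbps <= 0:
            errors.append("rate_kbps must be positive when set")
        return errors


client_e = Member("client-e", "xe0", vlan=10,
                  macs=["00:01:02:03:ca:fe"], rate_kbps=150000)
print(client_e.interface(), client_e.validate())
```

Keeping the tag optional mirrors the first requirement (ports MAY be untagged or tagged), while the non-empty MAC list and optional rate mirror the second and third.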
Of course, all sorts of other things come to mind – perhaps MPLS encapsulation, or VxLAN/GENEVE tunneling endpoints, and certainly some monitoring with SNMP or Prometheus, and how about just directly integrating this gateway with [IXPManager] while we’re at it. Yes, yes! But for this article, I’m going to stick to the bits and pieces regarding VPP itself, and leave the other parts for another day!
First, I build a quick lab out of this, by taking one Supermicro bare metal server with VPP (it will be the VPP IXP Gateway), and a couple of Debian servers and switches to simulate clients (A-J):
- Client A-D (on port e3) will use
- Client E-G (on switch port e2 of switch0, behind port xe0) will use
- Client H-J (on switch port e2 of switch1, behind port xe1) will use
- There will be a server attached to port
- The server will run
- The server will run
VPP: Bridge Domains
The fundamental topology described in the picture above tries to bridge together a bunch of untagged ports (e0 through e3, 1Gbit each) with two tagged ports (xe0 and xe1, 10Gbit) into an upstream IXP port (xxv0, 25Gbit). One thing to note for the pedants (and I love me some good pedantry) is that the total physical bandwidth to downstream members in this gateway (4x1+2x10 == 24Gbit) is lower than the physical bandwidth to the IXP platform (25Gbit), which makes sense. It means that there will not be contention per se.
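The bandwidth arithmetic above is easy to check mechanically. The sketch below sums the downstream port speeds of this lab and compares them with the upstream port; the dictionary layout and the `oversubscription` helper are hypothetical names chosen for illustration.

```python
# Port speeds in Gbit/s for the lab described in this article.
downstream_gbit = {"e0": 1, "e1": 1, "e2": 1, "e3": 1, "xe0": 10, "xe1": 10}
upstream_gbit = 25  # xxv0


def oversubscription(downstream, upstream):
    """Ratio of total downstream capacity to upstream capacity.

    A value above 1.0 means the upstream port can become a bottleneck
    when all downstream ports send at line rate at the same time.
    """
    return sum(downstream.values()) / upstream


total = sum(downstream_gbit.values())
ratio = oversubscription(downstream_gbit, upstream_gbit)
print(f"downstream {total} Gbit, upstream {upstream_gbit} Gbit, ratio {ratio:.2f}")
```

A ratio below 1.0, as in this lab, means the physical ports alone cannot oversubscribe the upstream port. This says nothing about the opposite direction, where many IXP members may send towards one downstream member at once.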
Building this topology in VPP is rather straightforward using a so-called Bridge Domain, which is referred to by its bridge-id, for which I’ll rather arbitrarily choose 8298:
vpp# create bridge-domain 8298
vpp# set interface l2 bridge xxv0 8298
vpp# set interface l2 bridge e0 8298
vpp# set interface l2 bridge e1 8298
vpp# set interface l2 bridge e2 8298
vpp# set interface l2 bridge e3 8298
vpp# set interface l2 bridge xe0 8298
vpp# set interface l2 bridge xe1 8298
VPP: Bridge Domain Encapsulations
I cheated a little bit in the previous section: I added the two TenGig ports called xe0 and xe1 directly to the bridge; however they are trunk ports to breakout switches which will each contain three additional downstream customers. So to add these six new customers, I will do the following:
vpp# set interface l3 xe0
vpp# create sub-interfaces xe0 10
vpp# create sub-interfaces xe0 20
vpp# create sub-interfaces xe0 30
vpp# set interface l2 bridge xe0.10 8298
vpp# set interface l2 bridge xe0.20 8298
vpp# set interface l2 bridge xe0.30 8298
The first command here puts the interface xe0 back into Layer3 mode, which will detach it from the bridge-domain. The second set of commands creates sub-interfaces with dot1q tags 10, 20 and 30 respectively. The third set then adds these three sub-interfaces to the bridge. By the way, I’ll do this not only for xe0 shown above, but also for the second port xe1, so all-up that makes 6 downstream member ports.
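Since the same three steps are repeated per trunk port, the command list can be generated rather than typed. The sketch below only formats strings that follow the CLI lines shown above; the `tagged_port_commands` helper is a hypothetical name, and the output is meant to be reviewed before being pasted into `vppctl`, not applied blindly.

```python
def tagged_port_commands(port, vlans, bd_id):
    """Build the VPP CLI lines that turn one trunk port into bridged
    dot1q sub-interfaces, in the same order as typed by hand above."""
    # Step 1: detach the physical port from the bridge-domain.
    cmds = [f"set interface l3 {port}"]
    # Step 2: one sub-interface per dot1q tag.
    cmds += [f"create sub-interfaces {port} {vlan}" for vlan in vlans]
    # Step 3: add each sub-interface to the bridge-domain.
    cmds += [f"set interface l2 bridge {port}.{vlan} {bd_id}" for vlan in vlans]
    return cmds


for port in ("xe0", "xe1"):
    for line in tagged_port_commands(port, [10, 20, 30], 8298):
        print(line)
```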
Readers of my articles at this point may have a little bit of an uneasy feeling: “What about the VLAN Gymnastics?” I hear you ask :) You see, VPP will generally just pick up these ethernet frames from xe0.10 which are tagged, and add them as-is to the bridge, which is weird, because all the other bridge ports are expecting untagged frames. So what I must do is tell VPP, upon receipt of a tagged ethernet frame on these ports, to strip the tag; and on the way out, before transmitting the ethernet frame, to wrap it into its correct encapsulation. This is called tag rewriting in VPP, and I’ve written a bit about it in [this article] in case you’re curious. But to cut to the chase:
vpp# set interface l2 tag-rewrite xe0.10 pop 1
vpp# set interface l2 tag-rewrite xe0.20 pop 1
vpp# set interface l2 tag-rewrite xe0.30 pop 1
vpp# set interface l2 tag-rewrite xe1.10 pop 1
vpp# set interface l2 tag-rewrite xe1.20 pop 1
vpp# set interface l2 tag-rewrite xe1.30 pop 1
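To make the effect of `pop 1` concrete, the sketch below models it on raw ethernet frames: the 802.1Q tag is removed on the way into the bridge and pushed back on the way out of the tagged sub-interface. This illustrates 802.1Q framing in general and is not VPP’s implementation; the helper names `pop_dot1q` and `push_dot1q` are hypothetical.

```python
import struct

TPID_DOT1Q = 0x8100  # EtherType value that announces an 802.1Q tag


def pop_dot1q(frame: bytes):
    """Strip one 802.1Q tag; return (vlan_id, untagged_frame)."""
    if len(frame) < 16:
        raise ValueError("frame too short to carry an 802.1Q tag")
    # Bytes 0..11 are destination and source MAC; the tag follows.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_DOT1Q:
        raise ValueError("frame is not 802.1Q tagged")
    return tci & 0x0FFF, frame[:12] + frame[16:]


def push_dot1q(frame: bytes, vlan: int) -> bytes:
    """Insert an 802.1Q tag (priority 0) right after the source MAC."""
    if not 1 <= vlan <= 4094:
        raise ValueError("vlan must be in 1..4094")
    return frame[:12] + struct.pack("!HH", TPID_DOT1Q, vlan) + frame[12:]


# Broadcast ARP-sized frame from client A's MAC address, untagged.
untagged = bytes.fromhex("ffffffffffff" "3cecef6a7b74" "0806") + b"\x00" * 28
tagged = push_dot1q(untagged, 10)          # egress towards xe0.10
vlan, stripped = pop_dot1q(tagged)         # ingress from xe0.10
print(vlan, stripped == untagged)
```

Inside the bridge every frame looks like `untagged`, which is why tagged and untagged member ports can share one bridge-domain.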
All right, with the VLAN gymnastics properly applied, I now have a bridge with all ten downstream members and one upstream port (xxv0):
vpp# show bridge-domain 8298 int
BD-ID  Index  BSN  Age(min)  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  arp-ufwd  Learn-co  Learn-li  BVI-Intf
8298   1      0    off       on        on        flood     on        off       off       1         16777216  N/A

Interface  If-idx  ISN  SHG  BVI  TxFlood  VLAN-Tag-Rewrite
xxv0       3       1    0    -    *        none
e0         5       1    0    -    *        none
e1         6       1    0    -    *        none
e2         7       1    0    -    *        none
e3         8       1    0    -    *        none
xe0.10     19      1    0    -    *        pop-1
xe0.20     20      1    0    -    *        pop-1
xe0.30     21      1    0    -    *        pop-1
xe1.10     22      1    0    -    *        pop-1
xe1.20     23      1    0    -    *        pop-1
xe1.30     24      1    0    -    *        pop-1
One cool thing to re-iterate is that VPP is really a router, not a switch. It’s entirely possible and common to create two completely independent subinterfaces with .1q tag 10 (in this setup, xe0.10 and xe1.10) and use the bridge-domain to tie them together.
Validating Bridge Domains
Looking at my clients above, I can see that several of them are untagged (e0 through e3) and a few of them are tagged behind ports xe0 and xe1. It should be straightforward to validate reachability with the following simple ping command:
pim@clientA:~$ fping -a -g 192.0.2.0/24
192.0.2.1 is alive
192.0.2.2 is alive
192.0.2.3 is alive
192.0.2.4 is alive
192.0.2.5 is alive
192.0.2.6 is alive
192.0.2.7 is alive
192.0.2.8 is alive
192.0.2.9 is alive
192.0.2.10 is alive
192.0.2.254 is alive
At this point the table stakes configuration provides for a Layer2 bridge domain spanning all of these ports, including performing the correct encapsulation on the TenGig ports that connect to the switches. There is L2 reachability between all clients over this VPP IXP Gateway.
✅ Requirement #1 is implemented!
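Eyeballing eleven lines works in a lab, but the same check can be scripted. The sketch below parses output in the shape shown above and reports which expected hosts did not answer. The expected set (192.0.2.1 through 192.0.2.10 plus 192.0.2.254) is an assumption read off the output above, and the helper names are hypothetical.

```python
import ipaddress


def alive_hosts(fping_output):
    """Collect the addresses that start each non-empty output line."""
    hosts = set()
    for line in fping_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            hosts.add(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue  # skip lines that do not start with an IP address
    return hosts


def missing_hosts(fping_output, expected):
    """Return the expected addresses that were not reported, sorted."""
    return sorted(set(expected) - alive_hosts(fping_output))


# Assumption: ten clients on .1 to .10 and one server on .254.
expected = [ipaddress.ip_address(f"192.0.2.{n}") for n in list(range(1, 11)) + [254]]
sample = "192.0.2.1 is alive\n192.0.2.2 is alive\n192.0.2.254 is alive\n"
print(missing_hosts(sample, expected))
```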
VPP: MAC Address Filtering
Enter classifiers! Actually while doing the research for this article, I accidentally nerd-sniped myself while going through the features provided by VPP’s classifier system, and holy moly is that thing powerful!
I’m only going to show the results of that little journey through the code base and documentation,
but in an upcoming article I intend to do a thorough deep-dive into VPP classifiers, and add them to
vppcfg because I think that would be the bee’s knees!
Back to the topic of MAC address filtering, a classifier would look roughly like this:
vpp# classify table acl-miss-next deny mask l2 src table 5
vpp# classify session acl-hit-next permit table-index 5 match l2 src 00:01:02:03:ca:fe
vpp# classify session acl-hit-next permit table-index 5 match l2 src 00:01:02:03:d0:d0
vpp# set interface input acl intfc e0 l2-table 5
vpp# show inacl type l2
Intfc idx  Classify table  Interface name
5          5               e0
The first line creates a classify table where we’ll want to match on Layer2 source addresses, and if there is no entry in the table that matches, the default will be to deny (drop) the ethernet frame. The next two lines add an entry for ethernet frames which have a Layer2 source of the cafe and d0d0 MAC addresses. When matching, the action is to permit (accept) the ethernet frame. Then, I apply this classifier as an l2 input ACL on interface e0.
Incidentally, the input ACL can operate at five distinct points in the packet’s journey through the dataplane: at the Layer2 input stage, like I’m using here; in the IPv4 and IPv6 input paths; and when punting traffic for IPv4 and IPv6 respectively.
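The hit/miss behaviour of the classify table used for MAC filtering can be modelled in a few lines. The sketch below is a toy model of the semantics only (a miss action for the table, a hit action per session); it is not how VPP’s classifier is implemented, and the `L2SourceAcl` class is a hypothetical name.

```python
class L2SourceAcl:
    """Toy model of a classify table keyed on the L2 source address."""

    def __init__(self, miss_action="deny"):
        self.miss_action = miss_action   # like acl-miss-next deny
        self.sessions = {}               # normalized MAC -> hit action

    @staticmethod
    def _normalize(mac):
        return mac.strip().lower()

    def permit(self, mac):
        """Add a session for this MAC, like acl-hit-next permit."""
        self.sessions[self._normalize(mac)] = "permit"

    def lookup(self, src_mac):
        """Return the action for a frame with this source MAC."""
        return self.sessions.get(self._normalize(src_mac), self.miss_action)


acl = L2SourceAcl(miss_action="deny")
acl.permit("00:01:02:03:ca:fe")
acl.permit("00:01:02:03:d0:d0")

print(acl.lookup("3c:ec:ef:6a:7b:74"))   # not registered: falls to miss action
acl.permit("3c:ec:ef:6a:7b:74")
print(acl.lookup("3C:EC:EF:6A:7B:74"))   # registered: hit action
```

The design point is that an unknown source MAC is not an error, it simply takes the miss action, so registering a member is just adding one more session to the table.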
Validating MAC filtering
Remember when I created the classify table and added two bogus MAC addresses to it? Let me show you what would happen on client A, which is directly connected to port e0:
pim@clientA:~$ ip -br link show eno3
eno3 UP 3c:ec:ef:6a:7b:74 <BROADCAST,MULTICAST,UP,LOWER_UP>
pim@clientA:~$ ping 192.0.2.254
PING 192.0.2.254 (192.0.2.254) 56(84) bytes of data.
...
This is expected because ClientA’s MAC address has not yet been added to the classify table driving the Layer2 input ACL, which is quickly remedied like so:
vpp# classify session acl-hit-next permit table-index 5 match l2 src 3c:ec:ef:6a:7b:74
...
64 bytes from 192.0.2.254: icmp_seq=34 ttl=64 time=2048 ms
64 bytes from 192.0.2.254: icmp_seq=35 ttl=64 time=1024 ms
64 bytes from 192.0.2.254: icmp_seq=36 ttl=64 time=0.450 ms
64 bytes from 192.0.2.254: icmp_seq=37 ttl=64 time=0.262 ms
✅ Requirement #2 is implemented!
VPP: Traffic Policers
I realize that from the IXP’s point of view, not all the available bandwidth behind xxv0 should be made available to all clients. Some may have negotiated a higher or lower bandwidth. Therefore, the VPP IXP Gateway should be able to rate limit the traffic through it, for which a VPP feature already exists: Policers.
Consider for a moment our client A (untagged on port
e0), and client E (behind port
xe0 with a
dot1q tag of 10). Client A has a bandwidth of 1Gbit, but client E nominally has a bandwidth of
10Gbit. If I were to want to restrict both clients to, say, 150Mbit, I could do the following:
vpp# policer add name client-a rate kbps cir 150000 cb 15000000 conform-action transmit
vpp# policer input name client-a e0
vpp# policer output name client-a e0
vpp# policer add name client-e rate kbps cir 150000 cb 15000000 conform-action transmit
vpp# policer input name client-e xe0.10
vpp# policer output name client-e xe0.10
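To get a feel for what `cir 150000 cb 15000000` means, the sketch below models a single-rate policer as a token bucket. It assumes `cir` is in kbit/s (as `rate kbps` suggests) and `cb` is a burst size in bytes. Dropping non-conforming packets is an assumption of this sketch and not a statement about VPP’s default actions; `TokenBucketPolicer` and `burst_seconds` are hypothetical names.

```python
class TokenBucketPolicer:
    """Toy single-rate policer: tokens are bytes, refilled at the CIR."""

    def __init__(self, cir_kbps, cb_bytes):
        self.rate_bytes_per_sec = cir_kbps * 1000 / 8
        self.capacity = float(cb_bytes)
        self.tokens = float(cb_bytes)    # bucket starts full
        self.last = 0.0                  # timestamp of the previous packet

    def offer(self, now, packet_bytes):
        """Return True if the packet conforms (transmit), else False."""
        elapsed = now - self.last
        self.last = now
        refill = elapsed * self.rate_bytes_per_sec
        self.tokens = min(self.capacity, self.tokens + refill)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


def burst_seconds(cir_kbps, cb_bytes):
    """How long a full bucket takes to refill at the committed rate."""
    return (cb_bytes * 8) / (cir_kbps * 1000)


policer = TokenBucketPolicer(cir_kbps=150000, cb_bytes=15000000)
print(burst_seconds(150000, 15000000))   # seconds to refill an empty bucket
print(policer.offer(0.0, 1500))          # bucket is full, packet conforms
```

Under these assumptions the configured burst corresponds to 0.8 seconds of traffic at the committed rate, which is fairly generous; a smaller `cb` would clamp bursts harder.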
And here’s where I bump into a stubborn VPP dataplane. I would’ve expected the input and output
packet shaping to occur on both the untagged interface
e0 as well as the tagged interface
xe0.10, but alas, the policer only works in one of these four cases. Ouch!
I read the code around src/vnet/policer/ and understand the following:
- On input, the policer is applied on device-input, which is the Phy, not the Sub-Interface. This explains why the policer works on untagged, but not on tagged interfaces.
- On output, the policer is applied on ip6-output, which works only for L3 enabled interfaces, not for L2 ones like the ones in this bridge domain.
I also tried to work with classifiers, like in the MAC address filtering above – but I concluded here as well, that the policer works only on input, not on output. So the mission is now to figure out how to enable an L2 policer on (1) untagged output, and (2) tagged in- and output.
❌ Requirement #3 is not implemented!
It’s too bad that policers are a bit fickle. That’s quite unfortunate, but I think fixable. I’ve
started a thread on
vpp-dev@ to discuss, and will reach out to Stanislav who added the
policer output capability in commit
Of course, this is just a proof of concept. I typed most of the configuration by hand on the VPP IXP Gateway, just to show a few of the more advanced features of VPP. For me, this triggered a whole new line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and arbitrary traffic redirection through VPP’s directed graph (e.g. selecting a next node for processing). I’m going to deep-dive into this classifier behavior in an upcoming article, and see how I might add this to [vppcfg], because I think it would be super powerful to abstract away the rather complex underlying API into something a little bit more … user friendly. Stay tuned! :)