About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation service router), VPP will look and feel quite familiar as many of the approaches are shared between the two.
There are some really fantastic features in VPP, some of which are less well known and not always very well documented. In this article, I will describe a unique use case in which I think VPP will excel, notably acting as a gateway for Internet Exchange Points.
In this first article, I’ll take a closer look at three things that would make such a gateway possible: bridge domains, MAC address filtering and traffic shaping.
Introduction
Internet Exchanges are typically L2 (ethernet) switch platforms that allow their connected members to exchange traffic amongst themselves. Not all members share physical locations with the Internet Exchange itself, for example the IXP may be at NTT Zurich, but the member may be present in Interxion Zurich. For smaller clubs, like IPng Networks, it’s not always financially feasible (or desirable) to order a dark fiber between two adjacent datacenters, or even a cross connect in the same datacenter (as many of them are charging exorbitant fees for what is essentially passive fiber optics and patch panels), if the amount of traffic passed is modest.
One solution to such problems is to have one member transport multiple end-user downstream members to the platform, for example by means of an Ethernet-over-MPLS or VxLAN transport from where the end user lives to the physical port of the Internet Exchange. These transport members are often called IXP Resellers, noting that usually, but not always, some form of payment is required.
From the point of view of the IXP, there is often a one-MAC-address-per-member limitation, and not all members will have the same bandwidth guarantees. Many IXPs offer physical connection speeds (like a Gigabit, TenGig or HundredGig port), but it is also common practice to limit the passed traffic by means of traffic shaping: for example, one might have a TenGig port but only be entitled to pass 3.0 Gbit/sec of traffic in and out of the platform.
For a long time I thought this kind of sucked: after all, who wants to connect to an internet exchange point only to see their traffic rate limited? But if you think about it, this often protects the member, the reseller, and the exchange alike: the total downstream bandwidth sold by the reseller is potentially larger than the reseller’s port to the exchange, and in the other direction this is almost certainly the case: the total IXP bandwidth that might be sent towards one individual member is significantly larger than the reseller’s port to the exchange.
Due to these two issues, a reseller port may become a bottleneck and packet loss may occur. To protect the ecosystem, having the internet exchange enforce fairness and bandwidth limits makes operational sense.
VPP as an IXP Gateway
Here’s a few requirements that may be necessary to provide an end-to-end solution:
- Downstream ports MAY be untagged, or tagged, in which case encapsulation (for example .1q VLAN tags) SHOULD be provided, one per downstream member.
- Each downstream member MUST ONLY be allowed to send traffic from one or more registered MAC addresses, in other words, strict filtering MUST be applied by the gateway.
- If a downstream member is assigned an up- and downstream bandwidth limit, this MUST be enforced by the gateway.
Of course, all sorts of other things come to mind – perhaps MPLS encapsulation, or VxLAN/GENEVE tunneling endpoints, and certainly some monitoring with SNMP or Prometheus, and how about just directly integrating this gateway with [IXPManager] while we’re at it. Yes, yes! But for this article, I’m going to stick to the bits and pieces regarding VPP itself, and leave the other parts for another day!
First, I build a quick lab for this, taking one Supermicro bare-metal server running VPP (it will be the VPP IXP Gateway) and a couple of Debian servers and switches to simulate clients (A-J):
- Client A-D (on ports `e0`-`e3`) will use `192.0.2.1-4/24` and `2001:db8::1-4/64`
- Client E-G (on switch ports `e0`-`e2` of switch0, behind port `xe0`) will use `192.0.2.5-7/24` and `2001:db8::5-7/64`
- Client H-J (on switch ports `e0`-`e2` of switch1, behind port `xe1`) will use `192.0.2.8-10/24` and `2001:db8::8-a/64`
- There will be a server attached to port `xxv0` with address `192.0.2.254/24` and `2001:db8::ff/64`
- The server will run `iperf3`.
VPP: Bridge Domains
The fundamental topology described in the picture above tries to bridge together a bunch of untagged ports (`e0`-`e3`, 1Gbit each) with two tagged ports (`xe0` and `xe1`, 10Gbit each) into an upstream IXP port (`xxv0`, 25Gbit). One thing to note for the pedants (and I love me some good pedantry) is that the total physical bandwidth to downstream members in this gateway (4x1 + 2x10 == 24Gbit) is lower than the physical bandwidth to the IXP platform (25Gbit), which makes sense. It means that there will not be contention per se.
Building this topology in VPP is rather straightforward using a so-called Bridge Domain, which will be referred to by its bridge-id, for which I’ll rather arbitrarily choose 8298:
vpp# create bridge-domain 8298
vpp# set interface l2 bridge xxv0 8298
vpp# set interface l2 bridge e0 8298
vpp# set interface l2 bridge e1 8298
vpp# set interface l2 bridge e2 8298
vpp# set interface l2 bridge e3 8298
vpp# set interface l2 bridge xe0 8298
vpp# set interface l2 bridge xe1 8298
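Under the hood, a bridge-domain is essentially a shared MAC learning table with flooding for unknown destinations, applied across its member ports. As a mental model only (a toy sketch of my own in plain Python, not VPP code), the forwarding decision looks something like this:

```python
# Toy model of a bridge-domain: MAC learning plus flooding of
# unknown-unicast and broadcast frames. Illustration of the concept
# only, not VPP's actual implementation.

class BridgeDomain:
    def __init__(self, bd_id, ports):
        self.bd_id = bd_id
        self.ports = set(ports)   # member ports, e.g. {"xxv0", "e0", ...}
        self.fib = {}             # learned MAC address -> port

    def forward(self, src_mac, dst_mac, in_port):
        """Return the set of ports the frame is sent out on."""
        self.fib[src_mac] = in_port          # learn the source MAC
        if dst_mac in self.fib:              # known unicast: single port
            return {self.fib[dst_mac]} - {in_port}
        # unknown unicast / broadcast: flood to all other ports
        return self.ports - {in_port}

bd = BridgeDomain(8298, ["xxv0", "e0", "e1", "e2", "e3"])
# Client A (on e0) ARPs for the server: the broadcast gets flooded.
flooded = bd.forward("aa:00:00:00:00:01", "ff:ff:ff:ff:ff:ff", "e0")
# The server replies from xxv0: A's MAC was already learned, so the
# reply goes out only on e0.
reply = bd.forward("aa:00:00:00:00:fe", "aa:00:00:00:00:01", "xxv0")
```

The Learning, UU-Flood and Flooding columns in the `show bridge-domain` output later in this article are exactly the knobs that control this behavior per bridge.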
VPP: Bridge Domain Encapsulations
I cheated a little bit in the previous section: I added the two TenGig ports called `xe0` and `xe1` directly to the bridge; however, they are trunk ports to breakout switches which will each contain three additional downstream customers. So to add these six new customers, I will do the following:
vpp# set interface l3 xe0
vpp# create sub-interfaces xe0 10
vpp# create sub-interfaces xe0 20
vpp# create sub-interfaces xe0 30
vpp# set interface l2 bridge xe0.10 8298
vpp# set interface l2 bridge xe0.20 8298
vpp# set interface l2 bridge xe0.30 8298
The first command here puts the interface `xe0` back into Layer3 mode, which detaches it from the bridge-domain. The second set of commands creates sub-interfaces with dot1q tags 10, 20 and 30 respectively. The third set then adds these three sub-interfaces to the bridge. By the way, I’ll do this not only for `xe0` as shown above, but also for the second `xe1` port, so all-up that makes six downstream member ports.
Readers of my articles at this point may have a little bit of an uneasy feeling: "What about the VLAN Gymnastics?" I hear you ask :) You see, VPP will generally just pick up these ethernet frames from `xe0.10`, which are tagged, and add them as-is to the bridge, which is weird, because all the other bridge ports are expecting untagged frames. So what I must do is tell VPP, upon receipt of a tagged ethernet frame on these ports, to strip the tag; and on the way out, before transmitting the ethernet frame, to wrap it into its correct encapsulation. This is called tag rewriting in VPP, and I’ve written a bit about it in [this article] in case you’re curious. But to cut to the chase:
vpp# set interface l2 tag-rewrite xe0.10 pop 1
vpp# set interface l2 tag-rewrite xe0.20 pop 1
vpp# set interface l2 tag-rewrite xe0.30 pop 1
vpp# set interface l2 tag-rewrite xe1.10 pop 1
vpp# set interface l2 tag-rewrite xe1.20 pop 1
vpp# set interface l2 tag-rewrite xe1.30 pop 1
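What `pop 1` does on ingress (and the matching tag push on egress) can be sketched on raw frame bytes: an 802.1Q tag is four bytes (TPID `0x8100` plus the TCI) inserted right after the two MAC addresses. This is a plain-Python illustration of the concept, not how VPP implements it:

```python
# Sketch of 'tag-rewrite ... pop 1': strip one 802.1Q tag on ingress,
# and push it back on egress. Illustration only; VPP does this in its
# l2-input/l2-output feature arcs.
import struct

TPID_DOT1Q = 0x8100

def pop_dot1q(frame: bytes) -> bytes:
    """Strip one 802.1Q tag (4 bytes after the two 6-byte MAC addresses)."""
    tpid = struct.unpack_from("!H", frame, 12)[0]
    assert tpid == TPID_DOT1Q, "frame is not .1q tagged"
    return frame[:12] + frame[16:]

def push_dot1q(frame: bytes, vlan_id: int, pcp: int = 0) -> bytes:
    """Insert one 802.1Q tag carrying the given VLAN ID."""
    tci = (pcp << 13) | (vlan_id & 0xFFF)
    return frame[:12] + struct.pack("!HH", TPID_DOT1Q, tci) + frame[12:]

# An untagged IPv4 frame: dst MAC, src MAC, ethertype 0x0800, payload.
untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + b"payload"
tagged = push_dot1q(untagged, vlan_id=10)  # as a frame would arrive on xe0.10
popped = pop_dot1q(tagged)                 # what the bridge sees after 'pop 1'
```

With the pop applied on ingress, every frame inside the bridge is untagged, which is what lets tagged and untagged member ports coexist in one bridge-domain.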
Alright, with the VLAN gymnastics properly applied, I now have a bridge with all ten downstream members and one upstream port (`xxv0`):
vpp# show bridge-domain 8298 int
BD-ID Index BSN Age(min) Learning U-Forwrd UU-Flood Flooding ARP-Term arp-ufwd Learn-co Learn-li BVI-Intf
8298 1 0 off on on flood on off off 1 16777216 N/A
Interface If-idx ISN SHG BVI TxFlood VLAN-Tag-Rewrite
xxv0 3 1 0 - * none
e0 5 1 0 - * none
e1 6 1 0 - * none
e2 7 1 0 - * none
e3 8 1 0 - * none
xe0.10 19 1 0 - * pop-1
xe0.20 20 1 0 - * pop-1
xe0.30 21 1 0 - * pop-1
xe1.10 22 1 0 - * pop-1
xe1.20 23 1 0 - * pop-1
xe1.30 24 1 0 - * pop-1
One cool thing to re-iterate is that VPP is really a router, not a switch. It’s entirely possible and common to create two completely independent sub-interfaces with .1q tag 10 (in my case, `xe0.10` and `xe1.10`) and use the bridge-domain to tie them together.
Validating Bridge Domains
Looking at my clients above, I can see that several of them are untagged (`e0`-`e3`) and a few of them are tagged behind ports `xe0` and `xe1`. It should be straightforward to validate reachability with the following simple ping command:
pim@clientA:~$ fping -a -g 192.0.2.0/24
192.0.2.1 is alive
192.0.2.2 is alive
192.0.2.3 is alive
192.0.2.4 is alive
192.0.2.5 is alive
192.0.2.6 is alive
192.0.2.7 is alive
192.0.2.8 is alive
192.0.2.9 is alive
192.0.2.10 is alive
192.0.2.254 is alive
At this point the table stakes configuration provides for a Layer2 bridge domain spanning all of these ports, including performing the correct encapsulation on the TenGig ports that connect to the switches. There is L2 reachability between all clients over this VPP IXP Gateway.
✅ Requirement #1 is implemented!
VPP: MAC Address Filtering
Enter classifiers! Actually while doing the research for this article, I accidentally nerd-sniped myself while going through the features provided by VPP’s classifier system, and holy moly is that thing powerful!
I’m only going to show the results of that little journey through the code base and documentation, but in an upcoming article I intend to do a thorough deep-dive into VPP classifiers, and add them to `vppcfg`, because I think that would be the bee’s knees!
Back to the topic of MAC address filtering, a classifier would look roughly like this:
vpp# classify table acl-miss-next deny mask l2 src table 5
vpp# classify session acl-hit-next permit table-index 5 match l2 src 00:01:02:03:ca:fe
vpp# classify session acl-hit-next permit table-index 5 match l2 src 00:01:02:03:d0:d0
vpp# set interface input acl intfc e0 l2-table 5
vpp# show inacl type l2
Intfc idx Classify table Interface name
5 5 e0
The first line creates a classify table where we’ll want to match on Layer2 source addresses, and if there is no entry in the table that matches, the default will be to deny (drop) the ethernet frame. The next two lines add entries for ethernet frames which have a Layer2 source of the cafe and d0d0 MAC addresses. When matching, the action is to permit (accept) the ethernet frame. Then, I apply this classifier as an l2 input ACL on interface `e0`.
Incidentally, the input ACL can operate at five distinct points in a packet’s journey through the dataplane: at the Layer2 input stage, like I’m using here; in the IPv4 and IPv6 input path; and when punting traffic for IPv4 and IPv6 respectively.
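Conceptually, a classify table is a (mask, match) structure: the classifier ANDs a slice of the packet with the table’s mask and looks the result up among the configured sessions; a hit runs the hit-next action (permit here) and a miss runs the miss-next action (deny). A toy model of my own in Python, mimicking the `mask l2 src` table above:

```python
# Toy model of a classify table with 'mask l2 src': AND the frame with
# a mask covering the source MAC (bytes 6..11 of the ethernet header),
# then look the result up in the set of configured sessions.
# Illustration of the concept only, not VPP internals.

def mac_bytes(mac: str) -> bytes:
    return bytes(int(b, 16) for b in mac.split(":"))

# Mask over a 16-byte slice of the header: ff's over the source MAC field.
MASK = bytes(6) + b"\xff" * 6 + bytes(4)

def masked(frame: bytes) -> bytes:
    return bytes(f & m for f, m in zip(frame, MASK))

class ClassifyTable:
    def __init__(self, miss_action="deny"):
        self.sessions = set()        # masked keys whose action is 'permit'
        self.miss_action = miss_action

    def add_session(self, src_mac: str):
        self.sessions.add(bytes(6) + mac_bytes(src_mac) + bytes(4))

    def lookup(self, frame: bytes) -> str:
        return "permit" if masked(frame) in self.sessions else self.miss_action

table = ClassifyTable(miss_action="deny")
table.add_session("00:01:02:03:ca:fe")

# dst MAC (broadcast), src MAC, rest of header
frame_ok = mac_bytes("ff:ff:ff:ff:ff:ff") + mac_bytes("00:01:02:03:ca:fe") + bytes(4)
frame_bad = mac_bytes("ff:ff:ff:ff:ff:ff") + mac_bytes("3c:ec:ef:6a:7b:74") + bytes(4)
```

Adding another session (as I’ll do for ClientA’s real MAC in a moment) is just another key in the lookup set, which is what makes these tables fast and cheap to grow.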
Validating MAC filtering
Remember when I created the classify table and added two bogus MAC addresses to it? Let me show you what would happen on client A, which is directly connected to port `e0`:
pim@clientA:~$ ip -br link show eno3
eno3 UP 3c:ec:ef:6a:7b:74 <BROADCAST,MULTICAST,UP,LOWER_UP>
pim@clientA:~$ ping 192.0.2.254
PING 192.0.2.254 (192.0.2.254) 56(84) bytes of data.
...
This is expected, because ClientA’s MAC address has not yet been added to the classify table driving the Layer2 input ACL, which is quickly remedied like so:
vpp# classify session acl-hit-next permit table-index 5 match l2 src 3c:ec:ef:6a:7b:74
...
64 bytes from 192.0.2.254: icmp_seq=34 ttl=64 time=2048 ms
64 bytes from 192.0.2.254: icmp_seq=35 ttl=64 time=1024 ms
64 bytes from 192.0.2.254: icmp_seq=36 ttl=64 time=0.450 ms
64 bytes from 192.0.2.254: icmp_seq=37 ttl=64 time=0.262 ms
✅ Requirement #2 is implemented!
VPP: Traffic Policers
I realize that from the IXP’s point of view, not all the available bandwidth behind `xxv0` should be made available to all clients. Some may have negotiated a higher or lower bandwidth than others. Therefore, the VPP IXP Gateway should be able to rate limit the traffic through it, for which a VPP feature already exists: Policers.
Consider for a moment our client A (untagged on port `e0`), and client E (behind port `xe0` with a dot1q tag of 10). Client A has a bandwidth of 1Gbit, but client E nominally has a bandwidth of 10Gbit. If I were to want to restrict both clients to, say, 150Mbit, I could do the following:
vpp# policer add name client-a rate kbps cir 150000 cb 15000000 conform-action transmit
vpp# policer input name client-a e0
vpp# policer output name client-a e0
vpp# policer add name client-e rate kbps cir 150000 cb 15000000 conform-action transmit
vpp# policer input name client-e xe0.10
vpp# policer output name client-e xe0.10
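These policers are single-rate token buckets: tokens accrue at the committed information rate (cir, 150,000 kbps here) up to the committed burst (cb), and a packet is only transmitted when enough tokens remain. A rough sketch of my own in Python (assuming a byte-granular bucket; not VPP’s actual implementation):

```python
# Sketch of a single-rate, single-bucket policer like the one configured
# above: cir 150000 kbps, cb 15000000. Tokens accrue at the CIR, capped
# at the committed burst; a packet conforms if the bucket holds enough
# tokens for it. Units are bytes; my own model, not VPP code.

class Policer:
    def __init__(self, cir_kbps: int, cb_bytes: int):
        self.rate_bytes_per_sec = cir_kbps * 1000 / 8
        self.cb = cb_bytes
        self.tokens = cb_bytes      # bucket starts full
        self.last = 0.0

    def police(self, now: float, pkt_len: int) -> str:
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.cb,
                          self.tokens + (now - self.last) * self.rate_bytes_per_sec)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return "transmit"       # conform-action
        return "drop"               # exceed-action (the default)

pol = Policer(cir_kbps=150_000, cb_bytes=15_000_000)
# Blast 1500-byte packets back-to-back at t=0: the first 10,000 packets
# (the 15 MB burst) conform, and the 10,001st is dropped.
results = [pol.police(0.0, 1500) for _ in range(10_001)]
```

After a quiet second the bucket refills (at 18.75 MB/s, capped at cb) and traffic conforms again, which is why a policer admits short bursts at line rate while holding the long-term average to the CIR.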
And here’s where I bump into a stubborn VPP dataplane. I would’ve expected the input and output packet shaping to occur on both the untagged interface `e0` as well as the tagged interface `xe0.10`, but alas, the policer only works in one of these four cases. Ouch!
I read the code around `vnet/src/policer/` and understand the following:

- On input, the policer is applied on `device-input`, which is the Phy, not the Sub-Interface. This explains why the policer works on untagged, but not on tagged interfaces.
- On output, the policer is applied on `ip4-output` and `ip6-output`, which works only for L3 enabled interfaces, not for L2 ones like the ones in this bridge domain.
I also tried to work with classifiers, like in the MAC address filtering above – but I concluded here as well, that the policer works only on input, not on output. So the mission is now to figure out how to enable an L2 policer on (1) untagged output, and (2) tagged in- and output.
❌ Requirement #3 is not implemented!
What’s Next
It’s too bad that policers are a bit fickle here. That’s quite unfortunate, but I think it’s fixable. I’ve started a thread on `vpp-dev@` to discuss, and will reach out to Stanislav, who added the policer output capability in commit `e5a3ae0179`.
Of course, this is just a proof of concept. I typed most of the configuration by hand on the VPP IXP Gateway, just to show a few of the more advanced features of VPP. For me, this triggered a whole new line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and arbitrary traffic redirection through VPP’s directed graph (e.g. selecting a next node for processing). I’m going to deep-dive into this classifier behavior in an upcoming article, and see how I might add this to [vppcfg], because I think it would be super powerful to abstract away the rather complex underlying API into something a little bit more … user friendly. Stay tuned! :)