Case Study - VLAN Gymnastics with VPP

About this series

Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation services router), VPP will look and feel quite familiar as many of the approaches are shared between the two.

With the Linux CP plugin completed, interfaces and their attributes such as addresses and routes can be shared between VPP and the Linux kernel in a clever way, so running software like FRR or Bird on top of VPP and achieving >100Mpps and >100Gbps forwarding rates are easily in reach! But once the controlplane is up and running, VPP has so much more to offer - many interesting L2 and L3 services that you’d expect in commercial (and very pricey) routers like Cisco ASR are available as well.

When Fred and I were in Paris [report], I got stuck trying to configure an Ethernet over MPLS circuit for IPng from Paris to Zurich. Fred took a look for me and quickly determined “Ah, you forgot to do the VLAN gymnastics”. I found it a fun way to describe the solution to my problem back then, and come to think of it: the router really can be configured to hook up anything to pretty much anything – this post takes a look at similar flexibility in VPP.

Introduction

When I first started learning how to work on Cisco’s Aggregation Services Router platform (Cisco IOS/XR), I was surprised that there is no concept of a switch. Like many network engineers, I was used to being able to put a number of ports in the same switch VLAN, and to take a different set of ports and put them into L3 mode with an IPv4/IPv6 address, or activate MPLS. And I was used to combining these two concepts by creating VLAN (L3) interfaces.

VPP, much like its commercial sibling Cisco IOS/XR, takes a different mental model and approach. Each physical interface can have a number of sub-interfaces which carry an encapsulation, for example dot1q, dot1ad, or even a double-tagged one (QinQ or QinAD). When ethernet frames arrive on the physical interface, VPP will match them to the sub-interface which is configured to receive frames of that specific encapsulation, and drop frames that do not match any sub-interface.

Sub Interfaces in VPP

There are several forms of sub-interface, let’s take a look at them:

1. create sub <interface> <subId> dot1q|dot1ad <vlanId>
2. create sub <interface> <subId> dot1q|dot1ad <vlanId> exact-match
3. create sub <interface> <subId> dot1q|dot1ad <vlanId> inner-dot1q <vlanId>|any
4. create sub <interface> <subId> dot1q|dot1ad <vlanId> inner-dot1q <vlanId> exact-match
5. create sub <interface> <subId>
6. create sub <interface> <subId>-<subId>
7. create sub <interface> <subId> untagged
8. create sub <interface> <subId> default

Alright, that’s a lot of choice! Let me go over these one by one.

  1. The first variant creates a sub-interface which will match frames with the first VLAN tag being either dot1q or dot1ad with the given vlanId. An important note to this: there might be more VLAN tags following in the ethernet frame, ie the frame may be QinQ or QinAD, and all of these will be matched.
  2. The second variant looks to do the same thing, but there, the frame will only match if there is exactly one VLAN tag, not more, not less. So this sub-interface will not match frames which are QinQ or QinAD.
  3. The third variant creates a sub-interface which matches an outer dot1q or dot1ad VLAN and in addition an inner dot1q tag. The special keyword any can be specified, which will make the sub-interface match QinQ or QinAD frames without caring which inner tag is used.
  4. The fourth variant looks a bit like the second one, in that it will match for frames which have exactly two VLAN tags (either dot1q.dot1q or dot1ad.dot1q). In this exact-match mode of operation, precisely those two tags must be present, and no other tags may follow.
  5. The fifth variant is simply a shorthand for the second one, it creates an exact-match dot1q with a vlanId equal to the given subId. This is the most obvious form, and people will recognize this as “just” a VLAN :)
  6. The sixth variant further expands on this pattern, and creates a list of these dot1q exact-match (eg. 100-200 will create 101 sub-interfaces).
  7. The seventh variant creates a sub-interface that matches any frames that have exactly zero tags (ie. untagged), and finally
  8. The eighth variant matches anything that is not matched by any other sub-interface (ie. the fallthrough default).

When I first saw this, it seemed overly complicated to me. Now that I’ve gotten to know this way of thinking, I see it as a way for any physical interface to branch off inbound traffic based on its tags: exactly zero (untagged), exactly one (dot1q or dot1ad with exact-match), exactly two (outer dot1q or dot1ad followed by inner-dot1q with exact-match), or one outer tag followed by any inner tag(s). In other words, any combination of zero, one or two present tags on the frame can be matched and acted on by this logic.
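To make the matching rules a bit more tangible, here is a small Python sketch of how a frame's tag stack could be mapped to a sub-interface. This is a toy model and not VPP's actual lookup code: the `SubIf` class, the `matches`, `specificity` and `classify` functions are all hypothetical names, dot1q and dot1ad are not distinguished, and picking the "most specific" candidate is my own assumption for breaking ties with the default sub-interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubIf:
    """Hypothetical, simplified description of one sub-interface."""
    name: str
    outer: Optional[int] = None   # outer VLAN id (dot1q/dot1ad not distinguished)
    inner: Optional[int] = None   # inner-dot1q <vlanId>
    inner_any: bool = False       # inner-dot1q any
    exact: bool = False           # exact-match
    untagged: bool = False
    default: bool = False


def matches(sub: SubIf, tags: Sequence[int]) -> bool:
    """tags is the frame's VLAN tag stack, outermost tag first."""
    if sub.default:
        return True
    if sub.untagged:
        return len(tags) == 0
    if not tags or tags[0] != sub.outer:
        return False
    if sub.inner_any:                      # outer tag plus any inner tag
        return len(tags) >= 2
    if sub.inner is not None:              # outer.inner, exactly two if exact-match
        if len(tags) < 2 or tags[1] != sub.inner:
            return False
        return len(tags) == 2 or not sub.exact
    # outer tag only: exactly one tag if exact-match, otherwise one or more
    return len(tags) == 1 or not sub.exact


def specificity(sub: SubIf) -> int:
    """Assumed tie-breaker: the default sub-interface always loses."""
    if sub.default:
        return 0
    return 1 + 2 * (sub.inner is not None) + sub.inner_any + sub.exact


def classify(subs, tags):
    """Return the name of the matching sub-interface, or None (frame dropped)."""
    candidates = [s for s in subs if matches(s, tags)]
    if not candidates:
        return None
    return max(candidates, key=specificity).name


subs = [
    SubIf("Gi10/0/0.100", outer=100, exact=True),
    SubIf("Gi10/0/0.101", outer=100, inner_any=True),
    SubIf("Gi10/0/0.1", untagged=True),
]
for stack in ([], [100], [100, 200], [300]):
    print(stack, "->", classify(subs, stack))
```

With these three sub-interfaces, a single-tagged frame in VLAN 100 and a double-tagged frame with outer tag 100 end up on different sub-interfaces, and a frame in VLAN 300 matches nothing.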

A few other considerations:

  • If a sub-interface is created with a given dot1q or dot1ad tag, you can’t have another sub-interface with a different matching logic on that same tag, for example creating dot1q 100 means you can’t then also create dot1q 100 exact-match. If that behavior is desired, then you’ll want to create dot1q 100 inner-dot1q any followed by dot1q 100 exact-match.
  • For L3 interfaces, it only makes sense to have exact-match interfaces. I found a bug in VPP that leads to a crash, which I’ve fixed in [this gerrit], so now the API and CLI throw an error instead of taking down the router.

Bridge Domains

So how do we make the functional equivalent of a VLAN, where several interfaces are bound together into an L2 broadcast domain, like a regular switch might do? The VPP answer to this is a bridge-domain which I can create and give a number, and then add any interface to it, like so:

vpp# create bridge-domain 10
vpp# set interface l2 bridge GigabitEthernet10/0/0 10
vpp# set interface l2 bridge BondEthernet0 10

And if I want to add an IP address (creating the equivalent of a routable VLAN Interface), I create what is called a Bridge Virtual Interface or BVI, add that interface to the bridge domain, and optionally expose it in Linux with the LinuxCP plugin:

vpp# bvi create instance 10 mac 02:fe:4b:4c:22:8f
vpp# set interface l2 bridge bvi10 10 bvi
vpp# set interface ip address bvi10 192.0.2.1/24
vpp# set interface ip address bvi10 2001:db8::1/64
vpp# lcp create bvi10 host-if bvi10

A bridge-domain is fully configurable - by default it’ll participate in L2 learning, maintain a FIB (which MAC addresses are seen behind which interface), and pass along ARP requests and Neighbor Discovery. But I can configure it to turn on/off forwarding, ARP, handling of unknown unicast frames, and so on. Here is the complete list of functionality that can be changed at runtime:

set bridge-domain arp entry <bridge-domain-id> [<ip-addr> <mac-addr> [del] | del-all]
set bridge-domain arp term <bridge-domain-id> [disable]
set bridge-domain arp-ufwd <bridge-domain-id> [disable]
set bridge-domain default-learn-limit <maxentries>
set bridge-domain flood <bridge-domain-id> [disable]
set bridge-domain forward <bridge-domain-id> [disable]
set bridge-domain learn <bridge-domain-id> [disable]
set bridge-domain learn-limit <bridge-domain-id> <learn-limit>
set bridge-domain mac-age <bridge-domain-id> <mins>
set bridge-domain rewrite <bridge-domain> [disable]
set bridge-domain uu-flood <bridge-domain-id> [disable]

This makes bridge domains a very powerful concept, and actually much more powerful than (a strict superset of) what I might be able to configure on an L2 switch.
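For readers who have not looked inside a learning bridge before, the following Python sketch shows the general idea of learning, forwarding and flooding with a few on/off knobs. It is a toy model and not how VPP implements it: the `BridgeDomain` class is hypothetical, and the way the `learn`, `forward`, `flood` and `uu_flood` flags interact here is my own simplification.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class BridgeDomain:
    """Toy learning bridge; names and flag semantics are illustrative only."""

    def __init__(self, learn=True, forward=True, flood=True, uu_flood=True):
        self.members = set()   # interfaces in this bridge-domain
        self.l2fib = {}        # MAC address -> interface it was last seen on
        self.learn = learn
        self.forward = forward
        self.flood = flood
        self.uu_flood = uu_flood

    def add(self, ifname):
        self.members.add(ifname)

    def receive(self, rx_if, src_mac, dst_mac):
        """Return the sorted list of interfaces the frame is sent out of."""
        if self.learn:
            self.l2fib[src_mac] = rx_if
        others = sorted(self.members - {rx_if})
        if dst_mac == BROADCAST:
            return others if self.flood else []
        if self.forward and dst_mac in self.l2fib:
            out_if = self.l2fib[dst_mac]
            return [out_if] if out_if != rx_if else []
        # Unknown unicast: flood to all other members, unless disabled
        return others if self.uu_flood else []


bd = BridgeDomain()
for ifname in ("port-a", "port-b", "port-c"):
    bd.add(ifname)

print(bd.receive("port-a", "02:00:00:00:00:01", BROADCAST))
print(bd.receive("port-b", "02:00:00:00:00:02", "02:00:00:00:00:01"))
print(bd.receive("port-a", "02:00:00:00:00:01", "02:00:00:00:00:03"))
```

The first frame is a broadcast and is flooded to the other two ports, the second is a unicast reply to a MAC address that was just learned and goes out of one port only, and the third is an unknown unicast which is flooded again.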

L2 CrossConnect

I thought it’d be useful to point out another powerful concept, which made an appearance in my previous post about Virtual Leased Lines. If all I want to do is connect two interfaces together, there won’t be a need for learning, L2 FIB, and so on. It is computationally much simpler to just take any frame received on interface A and transmit it out on interface B, unmodified. This is known in VPP as a layer2 crossconnect, and can be configured like so:

vpp# set interface l2 xconnect GigabitEthernet10/0/0 GigabitEthernet10/0/3
vpp# set interface l2 xconnect GigabitEthernet10/0/3 GigabitEthernet10/0/0

I should point out that this has to be done in both directions. The first invocation will transmit any frame received on Gi10/0/0 directly out on Gi10/0/3, and the second one will transmit any frame from Gi10/0/3 directly out on Gi10/0/0, turning this into a very efficient way to connect two interfaces together. Obviously, this only works in pairs; if more interfaces have to be connected, the bridge-domain is the way to go. That said, L2 cross connects are super common.
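Conceptually, a crossconnect is nothing more than a one-way lookup from receive interface to transmit interface, which is why it needs two entries. A minimal Python sketch of that idea, with hypothetical `set_xconnect` and `forward` helpers that are not part of VPP:

```python
xconnect = {}  # receive interface -> transmit interface (one direction only)


def set_xconnect(rx_if, tx_if):
    xconnect[rx_if] = tx_if


def forward(rx_if):
    """Return the interface a frame is sent out of, or None if not connected."""
    return xconnect.get(rx_if)


set_xconnect("GigabitEthernet10/0/0", "GigabitEthernet10/0/3")
print(forward("GigabitEthernet10/0/0"))  # forward direction is configured
print(forward("GigabitEthernet10/0/3"))  # return direction is not, yet

set_xconnect("GigabitEthernet10/0/3", "GigabitEthernet10/0/0")
print(forward("GigabitEthernet10/0/3"))  # now both directions work
```

There is no MAC table, no learning and no flooding decision, just one lookup per frame.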

Tag Rewriting

If I want to connect two tagged sub-interfaces together, for example Gi10/0/0.123 to Gi10/0/3.321, things get a bit more complicated. When VPP receives the frame from the first interface, it’ll arrive tagged with VLAN 123, so what happens if that is l2 crossconnected to Gi10/0/3.321? The answer will surprise you, so let’s take a look:

vpp# set interface state GigabitEthernet10/0/0 up
vpp# set interface state GigabitEthernet10/0/3 up
vpp# create sub GigabitEthernet10/0/0 123
vpp# set interface state GigabitEthernet10/0/0.123 up
vpp# create sub GigabitEthernet10/0/3 321
vpp# set interface state GigabitEthernet10/0/3.321 up
vpp# set interface l2 xconnect GigabitEthernet10/0/0.123 GigabitEthernet10/0/3.321
vpp# set interface l2 xconnect GigabitEthernet10/0/3.321 GigabitEthernet10/0/0.123

If I send a packet into Gi10/0/0.123, the L2 crossconnect will copy the entire frame, unmodified into Gi10/0/3.321, but how can that be? That interface Gi10/0/3.321 is tagged with VLAN 321! VPP will end up sending the frame out on interface Gi10/0/3 tagged as VLAN 123. In the other direction, frames received on Gi10/0/3.321 will be sent out tagged as VLAN 321 on Gi10/0/0. This is certainly not what I expected.

To address this, VPP can add or remove VLAN tags when it receives a frame, when it transmits a frame, or both. Let me show you this concept up close, as it’s really powerful!

VLAN tag rewrite provides the ability to change the VLAN tags on a packet. Existing tags can be popped, new tags can be pushed, and existing tags can be swapped with new tags. The rewrite feature is attached to a sub-interface as input and output operations. The input operation is explicitly configured by CLI or API calls, and the output operation is the symmetric opposite and is automatically derived from the input operation.

  • POP: For pop operations, the sub-interface encapsulation (the vlan tags specified when it was created) must have at least the number of popped tags. e.g. the “pop 2” operation would be rejected on a single-vlan interface. The output tag-rewrite operation will push the specified number of vlan tags onto the packet before transmitting. The pushed tag values are taken from the sub-interface encapsulation configuration.
  • PUSH: For push operations, the ethertype (dot1q or dot1ad) is also specified. The output tag-rewrite operation for pushes is to pop the same number of tags off the packet. If the packet doesn’t have enough tags it is dropped.
  • TRANSLATE: This is a combination of a pop and a push operation.
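The three operations and their symmetric output counterparts can be sketched in a few lines of Python. This is my own toy model of the description above, not VPP code: `Rewrite`, `rewrite_input`, `rewrite_output` and `check` are hypothetical names, tags are modelled as plain VLAN ids with the outermost tag first, and the dot1q/dot1ad ethertype is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rewrite:
    """Input operation on a sub-interface: pop N tags, then push these tags."""
    pop: int = 0
    push: Tuple[int, ...] = ()


def check(op: Rewrite, encap: Sequence[int]) -> None:
    """A pop needs at least that many tags in the sub-interface encapsulation."""
    if op.pop > len(encap):
        raise ValueError(f"pop {op.pop} rejected, encapsulation has {len(encap)} tag(s)")


def rewrite_input(tags: Sequence[int], op: Rewrite) -> Optional[List[int]]:
    """Applied when the sub-interface receives a frame; None means dropped."""
    if len(tags) < op.pop:
        return None
    return list(op.push) + list(tags[op.pop:])


def rewrite_output(tags: Sequence[int], op: Rewrite,
                   encap: Sequence[int]) -> Optional[List[int]]:
    """Symmetric opposite: pop what input pushes, push back what input pops."""
    if len(tags) < len(op.push):
        return None
    return list(encap[:op.pop]) + list(tags[len(op.push):])


POP1 = Rewrite(pop=1)
ENCAP_123, ENCAP_321 = (123,), (321,)

# No rewrite anywhere: tag 123 leaves the other side unchanged.
frame = rewrite_input([123], Rewrite())
print(rewrite_output(frame, Rewrite(), ENCAP_321))

# 'pop 1' on both sub-interfaces: 123 is stripped on input, 321 added on output.
frame = rewrite_input([123], POP1)
print(rewrite_output(frame, POP1, ENCAP_321))

# 'translate 1-1' on both sub-interfaces, modelled as pop 1 + push 1.
frame = rewrite_input([123], Rewrite(pop=1, push=(321,)))
print(rewrite_output(frame, Rewrite(pop=1, push=(123,)), ENCAP_321))
```

In this model the first case leaves the frame tagged 123, while the second and third both leave it tagged 321. TRANSLATE is simply a `Rewrite` with both `pop` and `push` set, which is why a single class is enough for all three operations.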

This may be confusing at first, so let me demonstrate how this works, by extending the example above. On the machine connected to Gi10/0/0.123, I’ll configure an IP address and try to ping its neighbor:

pim@hippo:~$ sudo ip link add link enp4s0f0 name vlan123 type vlan id 123
pim@hippo:~$ sudo ip link set vlan123 up
pim@hippo:~$ sudo ip addr add 192.0.2.1/30 dev vlan123
pim@hippo:~$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
...

On the other side, I’ll tcpdump what comes out the Gi10/0/3 port (which, as I observed above, is not carrying the tag, 321, but instead carrying the original ingress tag, 123):

16:33:59.489246 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 46:
  ethertype 802.1Q (0x8100), vlan 123, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28

Now, to demonstrate tag rewriting, I will remove (pop) the ingress VLAN tag from Gi10/0/0.123 when a packet is received:

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 pop 1

16:37:42.721424 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 42:
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28

There is no tag at all. What happened here is that when Gi10/0/0.123 received the frame, the ‘pop’ operation stripped 1 VLAN tag off the frame. And as we’ll see later, when that sub-interface transmits a frame, the ‘pop’ operation will add one VLAN tag (123) to the front of the frame.

Remember how I pointed out above that the ‘pop’ operation is symmetric? I can use that because if I were to also apply this on the Gi10/0/3.321 interface, then it will push the tag (of Gi10/0/3.321) onto the packet before sending it, and of course the other way around as well:

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 pop 1
vpp# set interface l2 tag-rewrite GigabitEthernet10/0/3.321 pop 1

16:41:00.352840 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 46:
  ethertype 802.1Q (0x8100), vlan 321, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28
16:41:00.352867 fe:54:00:00:10:03 > fe:54:00:00:10:00, length 46:
  ethertype 802.1Q (0x8100), vlan 321, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Reply 192.0.2.2 is-at fe:54:00:00:10:03, length 28

Hey look, there’s our ARP reply packet! That packet coming back into Gi10/0/3.321, when hitting the tag-rewrite, will in turn remove the tag, and the ‘pop’ being symmetrical, will of course add a new tag 123 on egress of Gi10/0/0.123, and I can now see connectivity end to end. Neat!

Other interesting operations include arbitrarily adding a dot1q tag (or even two tags):

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 push dot1q 100

16:45:33.121049 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 50:
  ethertype 802.1Q (0x8100), vlan 100, p 0,
  ethertype 802.1Q (0x8100), vlan 123, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 push dot1q 100 200

16:48:15.936807 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 54:
  ethertype 802.1Q (0x8100), vlan 100, p 0,
  ethertype 802.1Q (0x8100), vlan 200, p 0,
  ethertype 802.1Q (0x8100), vlan 123, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28
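The frame lengths that tcpdump reports in these captures follow directly from the number of VLAN tags: each tag adds four bytes in front of the ARP payload. A quick Python check of that arithmetic, where `arp_frame_length` is just a helper name I made up:

```python
ETH_HEADER = 14   # dst-mac (6) + src-mac (6) + ethertype (2)
VLAN_TAG = 4      # tag protocol identifier (2) + tag control information (2)
ARP_PAYLOAD = 28  # the ARP request itself, as reported by tcpdump


def arp_frame_length(num_tags: int) -> int:
    """Length of an ARP frame carrying the given number of VLAN tags."""
    return ETH_HEADER + num_tags * VLAN_TAG + ARP_PAYLOAD


for num_tags in range(4):
    print(num_tags, "tag(s):", arp_frame_length(num_tags), "bytes")
```

This gives 42 bytes for the untagged frame, 46 for a single tag, 50 for two tags and 54 for three, which lines up with the captures.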

And finally, swapping (translating) VLAN tags:

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 translate 1-1 dot1ad 100

16:50:56.705015 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 46:
  ethertype 802.1Q-QinQ (0x88a8), vlan 100, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28

vpp# set interface l2 tag-rewrite GigabitEthernet10/0/0.123 translate 1-1 dot1q 321
vpp# set interface l2 tag-rewrite GigabitEthernet10/0/3.321 translate 1-1 dot1q 123

16:44:03.462842 fe:54:00:00:10:00 > ff:ff:ff:ff:ff:ff, length 46:
  ethertype 802.1Q (0x8100), vlan 321, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Request who-has 192.0.2.2 tell 192.0.2.1, length 28
16:44:03.462847 fe:54:00:00:10:03 > fe:54:00:00:10:00, length 46:
  ethertype 802.1Q (0x8100), vlan 321, p 0,
  ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4),
    Reply 192.0.2.2 is-at fe:54:00:00:10:03, length 28

This last set of ‘translate 1-1’ operations has a similar effect to the ‘pop 1’: the VLAN is rewritten to 321 when receiving from Gi10/0/0.123, and it’s rewritten to 123 when receiving from Gi10/0/3.321, making end to end traffic possible again.

Final conclusion

The four concepts discussed here can be combined in countless interesting ways:

  • Create sub-interfaces with or without exact-match, to handle certain encapsulated packets
  • Provide layer2 crossconnect functionality between any two interfaces or sub-interfaces
  • Add multiple interfaces and sub-interfaces into a bridge-domain
  • Ensure that VLAN tags are popped and pushed consistently on tagged sub-interfaces

The practical conclusion is that VPP can provide fully transparent, dot1q and jumboframe enabled virtual leased lines (see my previous post on VLL performance), including using regular breakout switches to greatly increase the total port count for customers.

I’ll leave you with a working example of an L2VPN between a breakout switch behind nlams0.ipng.ch in Amsterdam and a remote VPP router in Zurich called ddln0.ipng.ch. Take the following S5860-20SQ switch, which connects to the VPP router on Te0/1 and a customer on Te0/2:

fsw0(config)#vlan 3438
fsw0(config-vlan)#name v-vll-customer
fsw0(config-vlan)#exit
fsw0(config)#interface TenGigabitEthernet 0/1
fsw0(config-if-TenGigabitEthernet 0/1)#description Core: nlams0.ipng.ch Te6/0/0
fsw0(config-if-TenGigabitEthernet 0/1)#mtu 9216
fsw0(config-if-TenGigabitEthernet 0/1)#switchport mode trunk
fsw0(config-if-TenGigabitEthernet 0/1)#switchport trunk allowed vlan add 3438

fsw0(config)#interface TenGigabitEthernet 0/2
fsw0(config-if-TenGigabitEthernet 0/2)#description Cust: Customer VLL Port NIKHEF
fsw0(config-if-TenGigabitEthernet 0/2)#mtu 1522
fsw0(config-if-TenGigabitEthernet 0/2)#switchport mode dot1q-tunnel
fsw0(config-if-TenGigabitEthernet 0/2)#switchport dot1q-tunnel native vlan 3438
fsw0(config-if-TenGigabitEthernet 0/2)#switchport dot1q-tunnel allowed vlan add untagged 3438
fsw0(config-if-TenGigabitEthernet 0/2)#switchport dot1q-tunnel allowed vlan add tagged 1000-2000

I configure the first port here to be a VLAN trunk port to the router, and add VLAN 3438 to it. Then, I configure the second port to be a customer dot1q-tunnel port, which accepts untagged frames and puts them in VLAN 3438, and additionally accepts tagged frames in VLAN 1000-2000 and prepends the customer VLAN 3438 to them - so these will become QinQ double tagged 3438.1000-2000.
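The behavior of the customer port can be summarized in a few lines of Python. This is only a sketch of what I described above, not of the switch's actual implementation: `customer_ingress` is a hypothetical helper, and dropping frames with a VLAN outside of 1000-2000 is my assumption about what the switch does with them.

```python
S_TAG = 3438                       # the VLAN towards the VPP router
ALLOWED_C_TAGS = range(1000, 2001) # customer VLANs 1000-2000 inclusive


def customer_ingress(tags):
    """Tag stack (outermost first) as sent towards the router, or None if dropped."""
    if not tags:
        return [S_TAG]                 # untagged frames go into VLAN 3438
    if tags[0] in ALLOWED_C_TAGS:
        return [S_TAG] + list(tags)    # tagged frames become QinQ 3438.<vlan>
    return None                        # assumption: anything else is dropped


print(customer_ingress([]))
print(customer_ingress([1000]))
print(customer_ingress([2000]))
print(customer_ingress([42]))
```

Untagged frames arrive at the router with the single tag 3438, and customer VLANs 1000 through 2000 arrive double tagged with 3438 as the outer tag.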

The corresponding snippet of the VPP router configuration is as follows:

comment { Customer VLL to DDLN }
lcp lcp-auto-sub-int off
create sub TenGigabitEthernet6/0/0 3438 dot1q 3438
set interface mtu packet 1518 TenGigabitEthernet6/0/0.3438
set interface state TenGigabitEthernet6/0/0.3438 up
set interface l2 tag-rewrite TenGigabitEthernet6/0/0.3438 pop 1

create vxlan tunnel instance 12 src 194.1.163.32 dst 194.1.163.5 vni 320501 decap-next l2
set interface state vxlan_tunnel12 up
set interface mtu packet 1518 vxlan_tunnel12
set interface l2 xconnect TenGigabitEthernet6/0/0.3438 vxlan_tunnel12
set interface l2 xconnect vxlan_tunnel12 TenGigabitEthernet6/0/0.3438
lcp lcp-auto-sub-int on

The customer facing interfaces have an MTU of 1518 bytes, which is enough for the 1500 bytes of the IP packet, plus 14 bytes of L2 overhead (src-mac, dst-mac, ethertype) and one optional VLAN tag of 4 bytes. In other words, this VLL is dot1q capable, because the VPP sub-interface Te6/0/0.3438 did not specify exact-match, so it’ll accept any additional VLAN tags. Of course this does require the path from nlams0.ipng.ch to ddln0.ipng.ch to be (baby)jumbo enabled, which it is, as AS8298 is fully 9000 byte capable.
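To see why the path between the two routers needs to carry more than 1500 bytes, here is the arithmetic as a small Python sketch. It assumes VXLAN over IPv4 without IP options, using the standard header sizes (8 bytes VXLAN, 8 bytes UDP, 20 bytes IPv4); the variable names are mine.

```python
IP_PAYLOAD = 1500   # the customer's IP packet
ETH_HEADER = 14     # src-mac, dst-mac, ethertype
VLAN_TAG = 4        # one optional customer VLAN tag

customer_frame = IP_PAYLOAD + ETH_HEADER + VLAN_TAG

VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20    # assumption: IPv4 underlay, no IP options

# The whole customer frame is the payload of the VXLAN tunnel.
outer_ip_packet = customer_frame + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER

print("customer frame:", customer_frame, "bytes")
print("outer IP packet:", outer_ip_packet, "bytes")
print("fits in a 1500 byte path:", outer_ip_packet <= 1500)
print("fits in a 9000 byte path:", outer_ip_packet <= 9000)
```

The encapsulated packet comes out at 1554 bytes, which does not fit a standard 1500 byte path but fits comfortably in a 9000 byte one.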