VPP

About this series

Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation services router), VPP will look and feel quite familiar as many of the approaches are shared between the two. One thing notably missing is the higher level control plane, that is to say: there is no OSPF or ISIS, BGP, LDP and the like. This series of posts details my work on a VPP plugin called the Linux Control Plane, or LCP for short, which creates Linux network devices that mirror their VPP dataplane counterparts. IPv4 and IPv6 traffic, and associated protocols like ARP and IPv6 Neighbor Discovery, can now be handled by Linux, while the heavy lifting of packet forwarding is done by the VPP dataplane. Or, said another way: this plugin will allow Linux to use VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the interface state (links, addresses and routes) itself. When the plugin is completed, running software like FRR or Bird on top of VPP and achieving >100Mpps and >100Gbps forwarding rates will be well within reach!

In this first post, let’s take a look at table stakes: making a copy of VPP’s interfaces appear in the Linux kernel.

My test setup

I took two AMD64 machines, each with 32GB of memory and one Intel X710-DA4 network card (which offers four SFP+ cages), and installed Ubuntu 20.04 on them. I connected each of the network ports back to back with DAC cables. This gives me plenty of interfaces to play with. On the vanilla Ubuntu machine, I created a bunch of different types of interfaces and configured IPv4 and IPv6 addresses on them.

The goal of this post is to show what code needed to be written and which changes needed to be made to the plugin, in order to mirror each type of interface from VPP into a valid Linux interface. As we’ll see, marrying the Linux network interface approach with the VPP interface approach can be tricky! Throughout this post, the vanilla Ubuntu machine will keep the following configuration, the config file of which you can see in the Appendix:

Name              Type                                    Addresses
enp66s0f0         untagged                                10.0.1.2/30  2001:db8:0:1::2/64
enp66s0f0.q       dot1q 1234                              10.0.2.2/30  2001:db8:0:2::2/64
enp66s0f0.qinq    outer dot1q 1234, inner dot1q 1000      10.0.3.2/30  2001:db8:0:3::2/64
enp66s0f0.ad      dot1ad 2345                             10.0.4.2/30  2001:db8:0:4::2/64
enp66s0f0.qinad   outer dot1ad 2345, inner dot1q 1000     10.0.5.2/30  2001:db8:0:5::2/64

This configuration will allow me to ensure that all common types of sub-interface are supported by the plugin.

Starting point

The linux-cp plugin that ships with VPP 21.06, when initialized with the desired startup config (see Appendix), will yield this (Hippo is the machine that runs my development branch of VPP, so named because it’s always hungry for packets):


pim@hippo:~/src/lcpng$ ip ro
default via 194.1.163.65 dev enp6s0 proto static 
10.0.1.0/30 dev e0 proto kernel scope link src 10.0.1.1 
10.0.2.0/30 dev e0.1234 proto kernel scope link src 10.0.2.1 
10.0.4.0/30 dev e0.1236 proto kernel scope link src 10.0.4.1 
194.1.163.64/27 dev enp6s0 proto kernel scope link src 194.1.163.88 

pim@hippo:~/src/lcpng$ fping 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2 10.0.5.2 
10.0.1.2 is alive
10.0.2.2 is alive
10.0.3.2 is unreachable
10.0.4.2 is unreachable
10.0.5.2 is unreachable

pim@hippo:~/src/lcpng$ fping6 2001:db8:0:1::2 2001:db8:0:2::2 \
  2001:db8:0:3::2 2001:db8:0:4::2 2001:db8:0:5::2
2001:db8:0:1::2 is alive
2001:db8:0:2::2 is alive
2001:db8:0:3::2 is unreachable
2001:db8:0:4::2 is unreachable
2001:db8:0:5::2 is unreachable

Yikes! So the plugin really only knows how to handle untagged interfaces, and sub-interfaces with one dot1q VLAN tag. The other three scenarios (dot1ad VLAN tag; dot1q in dot1q; and dot1q in dot1ad) are not ok. And, curiously, the dot1ad 2345 exact-match interface was created (as Linux interface e0.1236), but it doesn’t ping, and I’ll show you why :-) But principally: let’s fix this plugin!

Anatomy of Linux Interface Pairs

In VPP, the plumbing to the Linux kernel is done via a TUN/TAP interface. For L3 interfaces, TAP is used. This TAP appears in the Linux network namespace as a device with which you can interact. From the Linux point of view, on egress, all packets coming from the host into the TAP are cross-connected directly to the logical VPP network interface. In VPP, on ingress, packets destined for an L3 address on any VPP interface, as well as packets that are multicast, are punted into the TAP, which makes them appear in the kernel.

In VPP, a Linux interface pair (LIP for short) is therefore a tuple { vpp_phy_idx, vpp_tap_idx, netlink_idx }. Creating one of these is the art of first creating a TAP and associating it with the vpp_phy, then copying traffic from the TAP into the dataplane, and punting traffic from the dataplane into the TAP so that Linux can see it. The plugin exposes an API endpoint that creates, deletes and lists these Linux interface pairs:

lcp create <sw_if_index>|<if-name> host-if <host-if-name> netns <namespace> [tun]
lcp delete <sw_if_index>|<if-name>
show lcp [phy <interface>]
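As a minimal sketch of how these three commands fit together, the helper below walks one LIP through its lifecycle. The `vpp` wrapper and the `lip_roundtrip` function are hypothetical names of my own, not part of the plugin; by default the wrapper only prints the `vppctl` invocation, so it can be tried without a running VPP.

```shell
# Hypothetical dry-run wrapper: prints the vppctl command unless DRY_RUN=0.
vpp() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "vppctl $*"
  else
    vppctl "$@"
  fi
}

# Create a LIP for a VPP interface, list it, and remove it again.
# $1 = VPP interface name, $2 = Linux host interface name
lip_roundtrip() {
  vpp lcp create "$1" host-if "$2"
  vpp show lcp phy "$1"
  vpp lcp delete "$1"
}
```

Calling `lip_roundtrip TenGigabitEthernet3/0/0 e0` prints the three commands in order; setting `DRY_RUN=0` would execute them instead.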

If you’re still with me, congratulations, because this is where it starts to get fun!

Create interface: physical

The easiest interface type is a physical one. Here, the plugin will create a TAP, copy the MAC address from the PHY, and set a bunch of attributes on the TAP, such as MTU and link state. Here, I made my first set of changes (in [patchset 3]) to the plugin:

  • Initialize the link state of the TAP from the VPP interface, rather than unconditionally setting it to ‘down’.
  • Initialize the MTU of the TAP from the VPP interface, rather than assuming the VPP default of 9000; if the MTU is not known, assume the TAP has 9216, the largest possible on ethernet.
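The two rules above can be sketched in a few lines of shell. This is only an illustration of the decision logic, not the plugin's C code: the function names are hypothetical, and I assume an unknown MTU shows up as an empty value or 0.

```shell
# tap_mtu: pick the MTU for the TAP from the VPP interface's MTU.
# Assumption: an unknown MTU is represented as empty or 0.
tap_mtu() {
  case "${1:-0}" in
    0) echo 9216 ;;   # fallback named in the text above
    *) echo "$1" ;;   # otherwise inherit the VPP interface MTU
  esac
}

# tap_link_cmd: print the ip(8) command that would give the Linux
# interface the same admin state and MTU as its VPP counterpart.
# $1 = host-if, $2 = VPP admin state (up|down), $3 = VPP MTU
tap_link_cmd() {
  echo "ip link set $1 $2 mtu $(tap_mtu "$3")"
}
```

For example, `tap_link_cmd e1 up 1500` prints `ip link set e1 up mtu 1500`, while `tap_mtu 0` falls back to 9216.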

Taking a look:

DBGvpp# show int TenGigabitEthernet3/0/0
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count     
TenGigabitEthernet3/0/0           1     down         9000/0/0/0     
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# show tap tap1
Interface: tap1 (ifindex 7)
  name "e0"
  host-ns "(nil)"
  host-mtu-size "9000"
  host-mac-addr: 68:05:ca:32:46:14
...

DBGvpp# set interface state TenGigabitEthernet3/0/1 up
DBGvpp# set interface mtu packet 1500 TenGigabitEthernet3/0/1 
DBGvpp# lcp create TenGigabitEthernet3/0/1 host-if e1

And in Linux, unceremoniously, both interfaces appear:

pim@hippo:~/src/lcpng$ ip link show e0
291: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:32:46:14 brd ff:ff:ff:ff:ff:ff
pim@hippo:~/src/lcpng$ ip link show e1
307: e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:32:46:15 brd ff:ff:ff:ff:ff:ff

The MAC address of the physical interface, as reported by show hardware-interface TenGigabitEthernet3/0/0, corresponds to the one seen in the TAP and to the one seen in the Linux interface we just created. The Linux interfaces respect the MTU and link state of their counterpart VPP interfaces (e0 is down at 9000b, e1 is up at 1500b).
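To compare these attributes without eyeballing them, a small filter over the `ip link show` output is enough. The `link_field` helper below is a hypothetical name; it simply prints the word that follows a given keyword such as `mtu`, `state` or `link/ether`.

```shell
# link_field: read `ip link show <dev>` output on stdin and print the value
# that follows the keyword given as $1 (for example mtu or link/ether).
link_field() {
  awk -v key="$1" '
    { for (i = 1; i < NF; i++) if ($i == key) { print $(i + 1); exit } }
  '
}
```

Typical use would be `ip link show e0 | link_field mtu`, with the result compared against the MTU column of `show int` on the VPP side.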

Create interface: dot1q

Note that creating an ethernet sub-interface in VPP takes the following form:

create sub-interfaces <interface> {<subId> [default|untagged]} | {<subId>-<subId>}
  | {<subId> dot1q|dot1ad <vlanId>|any [inner-dot1q <vlanId>|any] [exact-match]}

Here, I’ll start with the simplest form, canonically called a .1q VLAN or a tagged interface. The plugin handles it just fine, with a codepath that first creates a sub-interface on the parent’s TAP, then forwards traffic to/from the VPP sub-interface into the parent TAP, and finally asks the Linux kernel to create a new interface of type vlan with the id set to the dot1q tag, as a child of the e0 interface. Note however the exact-match keyword, which is very important. In VPP, without exact-match, any ethernet frame that matches the sub-interface expression will be handled by it. This means the VLAN with tag 1234 will match, but so will a stacked (Q-in-Q or Q-in-AD) VLAN with the outer tag set to 1234. This is nonsensical for an IP interface. Of the examples below, the first two will successfully create, but the third will crash the plugin:

## Good, shorthand sets exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1234
DBGvpp# lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234

## Good, explicitly set exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1234 dot1q 1234 exact-match
DBGvpp# lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234

## Bad, will crash
DBGvpp# create sub TenGigabitEthernet3/0/0 1234 dot1q 1234
DBGvpp# lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234

The reason is that the first call is a shorthand: it creates sub-int 1234 as dot1q 1234 exact-match, which is literally what the second example does, while the third example creates a non-exact-match sub-int 1234 with dot1q 1234. So I changed the behavior to explicitly reject sub-interfaces that are not exact-match in [patchset 4]. Actually, it turns out that VPP upstream also crashes on setting an IP address on a sub-int that is not configured with exact-match, so I fixed that upstream in this [gerrit] too.
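The rule can be expressed as a small predicate over the arguments that follow `create sub <interface>`. This is a sketch with a hypothetical function name, and it only models the three forms shown in the examples; ranges and the default/untagged forms are treated as not exact-match here, which is a simplification on my part.

```shell
# is_exact_match: succeed if the arguments that follow
# `create sub <interface>` describe an exact-match sub-interface.
is_exact_match() {
  # Shorthand form: a single numeric subId implies dot1q <subId> exact-match.
  if [ "$#" -eq 1 ]; then
    case "$1" in
      ''|*[!0-9]*) return 1 ;;
      *) return 0 ;;
    esac
  fi
  # Long form: only exact-match if the keyword is spelled out.
  for arg in "$@"; do
    if [ "$arg" = "exact-match" ]; then
      return 0
    fi
  done
  return 1
}
```

With this, `is_exact_match 1234` and `is_exact_match 1234 dot1q 1234 exact-match` succeed, while `is_exact_match 1234 dot1q 1234` fails, which is the case the plugin now rejects.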

Create interface: dot1ad

While 802.1q VLAN interfaces are by far the most used, there’s a lesser known sibling called 802.1ad – the only difference is that VLAN ethernet frames with .1q use the well known 0x8100 ethernet type (called a tag protocol identifier, or TPID), while .1ad uses the lesser known 0x88a8 type. Originally, Q-in-Q was meant to use the 0x88a8 tag for the outer type and 0x8100 for the inner type, differentiating the two. But the industry was conflicted, and many vendors chose to use 0x8100 for both inner and outer types. VPP supports both and so does Linux, so let’s implement dot1ad in [patchset 5]. Without this change, the plugin would create the interface, but it would invariably create it as .1q on the Linux side, which explains why the e0.1236 interface exists but doesn’t ping in my starting point above. Now we have the expected behavior:

DBGvpp# create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
DBGvpp# lcp create TenGigabitEthernet3/0/0.1236 host-if e0.1236

pim@hippo:~/src/lcpng$ ping 10.0.4.2
PING 10.0.4.2 (10.0.4.2) 56(84) bytes of data.
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=0.58 ms
64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.57 ms
64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.62 ms
64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.67 ms
^C
--- 10.0.4.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.566/0.608/0.672/0.041 ms
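Expressed as iproute2 commands, the fix boils down to passing the right `proto` when the Linux VLAN device is created. The sketch below uses hypothetical helper names and only prints the command; the plugin itself talks to the kernel directly rather than shelling out.

```shell
# vlan_proto: map a VPP tag type to the iproute2 vlan protocol name.
vlan_proto() {
  case "$1" in
    dot1q)  echo 802.1q ;;   # TPID 0x8100
    dot1ad) echo 802.1ad ;;  # TPID 0x88a8
    *) echo "unknown tag type: $1" >&2; return 1 ;;
  esac
}

# linux_vlan_cmd: print the ip(8) command for a single-tagged sub-interface.
# $1 = parent host-if, $2 = new host-if, $3 = dot1q|dot1ad, $4 = VLAN id
linux_vlan_cmd() {
  proto=$(vlan_proto "$3") || return 1
  echo "ip link add link $1 name $2 type vlan id $4 proto $proto"
}
```

For the sub-interface above, `linux_vlan_cmd e0 e0.1236 dot1ad 2345` prints `ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad`.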

Create interface: dot1q in dot1ad

This is the original Q-in-Q as it was intended. Frames here carry an outer ethernet TPID of 0x88a8 (dot1ad) which is followed by an inner ethernet TPID of 0x8100 (dot1q). Of course, untagged inner frames are also possible - they show up as simply one ethernet TPID of dot1ad followed directly by the L3 payload. Here, things get a bit more tricky. On the VPP side, we can simply create the sub-interface directly; but on the Linux side, we cannot do that. This is because in VPP, all sub-interfaces are directly parented by their physical interface, while in Linux, the interfaces are stacked on one another. Compare:

### VPP idiomatic q-in-ad (1 interface)
DBGvpp# create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match

### Linux idiomatic q-in-ad stack (2 interfaces)
ip link add link e0 name e0.2345 type vlan id 2345 proto 802.1ad
ip link add link e0.2345 name e0.2345.1000 type vlan id 1000 proto 802.1q

So in order to create Q-in-AD sub-interfaces, for Linux their intermediary parent must exist, while in VPP this is not necessary. I have to make a compromise, so I’ll be a bit more explicit and allow this type of LIP to be created only under these conditions:

  • A sub-int exists with the intermediary (in this case, dot1ad 2345 exact-match)
  • That sub-int itself has a LIP, with a Linux interface device that we can spawn the inner interface off of

If these conditions don’t hold, I reject the request. If they do, I create an interface pair:

DBGvpp# create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
DBGvpp# lcp create TenGigabitEthernet3/0/0.1236 host-if e0.1236
DBGvpp# lcp create TenGigabitEthernet3/0/0.1237 host-if e0.1237

pim@hippo:~/src/lcpng$ ip link show e0.1236
375: e0.1236@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:32:46:14 brd ff:ff:ff:ff:ff:ff
pim@hippo:~/src/lcpng$ ip link show e0.1237
376: e0.1237@e0.1236: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:32:46:14 brd ff:ff:ff:ff:ff:ff

Here, e0.1237 was indeed created as a child of e0.1236, which in turn was created as a child of e0.

The code for this is in [patchset 6].
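The two conditions can be sketched as a lookup followed by a reject-or-create decision. Everything here is illustrative: the function names are hypothetical, and the one-line-per-LIP table format on stdin (`<host-if> <dot1q|dot1ad> <outer-id> <inner-id or ->`) is an assumption I made for the example, not something the plugin exposes.

```shell
# find_parent_hostif: read the (hypothetical) LIP table on stdin and print
# the host-if of the single-tagged LIP that matches the outer tag.
# Fails if there is none, which includes the case of an empty table.
# $1 = outer tag type (dot1q|dot1ad), $2 = outer VLAN id
find_parent_hostif() {
  awk -v type="$1" -v id="$2" '
    $2 == type && $3 == id && $4 == "-" { print $1; found = 1; exit }
    END { exit (found ? 0 : 1) }
  '
}

# qinx_cmd: print the ip(8) command for the inner interface, or reject.
# $1 = new host-if, $2 = outer type, $3 = outer id, $4 = inner dot1q id
# The LIP table is read from stdin.
qinx_cmd() {
  parent=$(find_parent_hostif "$2" "$3") || {
    echo "rejected: no LIP for $2 $3 exact-match" >&2
    return 1
  }
  echo "ip link add link $parent name $1 type vlan id $4 proto 802.1q"
}
```

Feeding it the single line `e0.1236 dot1ad 2345 -` and calling `qinx_cmd e0.1237 dot1ad 2345 1000` yields a command that parents e0.1237 on e0.1236; with no matching line the request is rejected.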

Create interface: dot1q in dot1q

Given the change above, this is an entirely obvious capability that the plugin now handles, but I did find a failure mode, when I tried to create a LIP for a sub-interface when there are no LIPs created. It causes a NULL deref when trying to look up the LIP of the parent (which doesn’t yet have a LIP defined). I fixed that in this [patchset 7].

Results

After applying the configuration to VPP (in Appendix), here are the results:

pim@hippo:~/src/lcpng$ ip ro
default via 194.1.163.65 dev enp6s0 proto static 
10.0.1.0/30 dev e0 proto kernel scope link src 10.0.1.1 
10.0.2.0/30 dev e0.1234 proto kernel scope link src 10.0.2.1 
10.0.3.0/30 dev e0.1235 proto kernel scope link src 10.0.3.1 
10.0.4.0/30 dev e0.1236 proto kernel scope link src 10.0.4.1 
10.0.5.0/30 dev e0.1237 proto kernel scope link src 10.0.5.1 
194.1.163.64/27 dev enp6s0 proto kernel scope link src 194.1.163.88 

pim@hippo:~/src/lcpng$ fping 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2 10.0.5.2 
10.0.1.2 is alive
10.0.2.2 is alive
10.0.3.2 is alive
10.0.4.2 is alive
10.0.5.2 is alive

pim@hippo:~/src/lcpng$ fping6 2001:db8:0:1::2 2001:db8:0:2::2 \
  2001:db8:0:3::2 2001:db8:0:4::2 2001:db8:0:5::2
2001:db8:0:1::2 is alive
2001:db8:0:2::2 is alive
2001:db8:0:3::2 is alive
2001:db8:0:4::2 is alive
2001:db8:0:5::2 is alive

As can be seen, all interface types ping. Mirroring interfaces from VPP to Linux is now done!
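For repeated runs, the check can be automated with a small filter over output in the format shown above. The `all_alive` name is hypothetical, and I assume every target is reported on its own line as either "is alive" or something else.

```shell
# all_alive: read fping-style output on stdin. Succeeds only if at least
# one line was seen and every line ends in "is alive"; any other line is
# printed so the failing targets are visible.
all_alive() {
  awk '
    { seen = 1 }
    !/ is alive$/ { print; bad = 1 }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}
```

A typical invocation would be `fping 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2 10.0.5.2 | all_alive && echo ok`.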

We still have to manually copy the configuration (like link states, MTU changes, IP addresses and routes) from VPP into Linux, and of course it would be great if we could mirror those states also into Linux, and this is exactly the topic of my next post.

Credits

I’d like to make clear that the Linux CP plugin is a great collaboration between several great folks and that my work stands on their shoulders. I’ve had a little bit of help along the way from Neale Ranns, Matthew Smith and Jon Loeliger, and I’d like to thank them for their work!

Appendix

Ubuntu config

# Untagged interface
ip addr add 10.0.1.2/30 dev enp66s0f0
ip addr add 2001:db8:0:1::2/64 dev enp66s0f0
ip link set enp66s0f0 up mtu 9000

# Single 802.1q tag 1234
ip link add link enp66s0f0 name enp66s0f0.q type vlan id 1234
ip link set enp66s0f0.q up mtu 9000
ip addr add 10.0.2.2/30 dev enp66s0f0.q
ip addr add 2001:db8:0:2::2/64 dev enp66s0f0.q

# Double 802.1q tag 1234 inner-tag 1000
ip link add link enp66s0f0.q name enp66s0f0.qinq type vlan id 1000
ip link set enp66s0f0.qinq up mtu 9000
ip addr add 10.0.3.2/30 dev enp66s0f0.qinq
ip addr add 2001:db8:0:3::2/64 dev enp66s0f0.qinq

# Single 802.1ad tag 2345
ip link add link enp66s0f0 name enp66s0f0.ad type vlan id 2345 proto 802.1ad
ip link set enp66s0f0.ad up mtu 9000
ip addr add 10.0.4.2/30 dev enp66s0f0.ad
ip addr add 2001:db8:0:4::2/64 dev enp66s0f0.ad

# Double 802.1ad tag 2345 inner-tag 1000
ip link add link enp66s0f0.ad name enp66s0f0.qinad type vlan id 1000 proto 802.1q
ip link set enp66s0f0.qinad up mtu 9000
ip addr add 10.0.5.2/30 dev enp66s0f0.qinad
ip addr add 2001:db8:0:5::2/64 dev enp66s0f0.qinad

VPP config

vppctl set interface state TenGigabitEthernet3/0/0 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0
vppctl set interface ip address TenGigabitEthernet3/0/0 10.0.1.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0 2001:db8:0:1::1/64
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
ip link set e0 up mtu 9000
ip addr add 10.0.1.1/30 dev e0
ip addr add 2001:db8:0:1::1/64 dev e0

vppctl create sub TenGigabitEthernet3/0/0 1234
vppctl set interface state TenGigabitEthernet3/0/0.1234 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1234
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 10.0.2.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 2001:db8:0:2::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234
ip link set e0.1234 up mtu 9000
ip addr add 10.0.2.1/30 dev e0.1234
ip addr add 2001:db8:0:2::1/64 dev e0.1234

vppctl create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1235 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1235
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 10.0.3.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 2001:db8:0:3::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1235 host-if e0.1235
ip link set e0.1235 up mtu 9000
ip addr add 10.0.3.1/30 dev e0.1235
ip addr add 2001:db8:0:3::1/64 dev e0.1235

vppctl create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1236 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1236
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 10.0.4.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 2001:db8:0:4::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1236 host-if e0.1236
ip link set e0.1236 up mtu 9000
ip addr add 10.0.4.1/30 dev e0.1236
ip addr add 2001:db8:0:4::1/64 dev e0.1236

vppctl create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1237 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1237
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 10.0.5.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 2001:db8:0:5::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1237 host-if e0.1237
ip link set e0.1237 up mtu 9000
ip addr add 10.0.5.1/30 dev e0.1237
ip addr add 2001:db8:0:5::1/64 dev e0.1237
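
Since the per-interface steps above repeat the same eight commands, they could be generated by a small helper. This is a sketch with a hypothetical function name; it only prints the commands, and the `create sub` line is left out because it differs per interface type.

```shell
# lip_config: print the commands the VPP config repeats for one interface.
# $1 = VPP interface, $2 = host-if, $3 = IPv4 prefix, $4 = IPv6 prefix
lip_config() {
  cat <<EOF
vppctl set interface state $1 up
vppctl set interface mtu packet 9000 $1
vppctl set interface ip address $1 $3
vppctl set interface ip address $1 $4
vppctl lcp create $1 host-if $2
ip link set $2 up mtu 9000
ip addr add $3 dev $2
ip addr add $4 dev $2
EOF
}
```

For example, `lip_config TenGigabitEthernet3/0/0 e0 10.0.1.1/30 2001:db8:0:1::1/64` reproduces the first stanza, and piping the output to `sh` would apply it.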