About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation services router), VPP will look and feel quite familiar as many of the approaches are shared between the two. One thing notably missing is the higher level control plane, that is to say: there is no OSPF or ISIS, BGP, LDP and the like. This series of posts details my work on a VPP plugin which is called the Linux Control Plane, or LCP for short, which creates Linux network devices that mirror their VPP dataplane counterpart. IPv4 and IPv6 traffic, and associated protocols like ARP and IPv6 Neighbor Discovery can now be handled by Linux, while the heavy lifting of packet forwarding is done by the VPP dataplane. Or, said another way: this plugin will allow Linux to use VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the interface state (links, addresses and routes) itself. When the plugin is completed, running software like FRR or Bird on top of VPP and achieving >100Mpps and >100Gbps forwarding rates will be well within reach!
In the first three posts, I added the ability for VPP to synchronize its state (like link state, MTU, and interface addresses) into Linux. In this post, I’ll make a start on the other direction: allowing changes to interfaces made in Linux to make their way back into VPP!
My test setup
I’m keeping the setup from the third post. A Linux machine has an interface `enp66s0f0` which has 4 sub-interfaces (one dot1q tagged, one q-in-q, one dot1ad tagged, and one q-in-ad), giving me five flavors in total. Then, I created an LACP `bond0` interface, which also has the whole kit and caboodle of sub-interfaces defined, see below in the Appendix for details, but here’s the table again for reference:
Name | type | Addresses |
---|---|---|
enp66s0f0 | untagged | 10.0.1.2/30 2001:db8:0:1::2/64 |
enp66s0f0.q | dot1q 1234 | 10.0.2.2/30 2001:db8:0:2::2/64 |
enp66s0f0.qinq | outer dot1q 1234, inner dot1q 1000 | 10.0.3.2/30 2001:db8:0:3::2/64 |
enp66s0f0.ad | dot1ad 2345 | 10.0.4.2/30 2001:db8:0:4::2/64 |
enp66s0f0.qinad | outer dot1ad 2345, inner dot1q 1000 | 10.0.5.2/30 2001:db8:0:5::2/64 |
bond0 | untagged | 10.1.1.2/30 2001:db8:1:1::2/64 |
bond0.q | dot1q 1234 | 10.1.2.2/30 2001:db8:1:2::2/64 |
bond0.qinq | outer dot1q 1234, inner dot1q 1000 | 10.1.3.2/30 2001:db8:1:3::2/64 |
bond0.ad | dot1ad 2345 | 10.1.4.2/30 2001:db8:1:4::2/64 |
bond0.qinad | outer dot1ad 2345, inner dot1q 1000 | 10.1.5.2/30 2001:db8:1:5::2/64 |
The goal of this post is to show what code needed to be written. It also introduces an entirely new plugin, so that we can separate concerns (and have a higher chance of community acceptance of the plugins). In the first plugin, now called the Interface Mirror, I previously implemented the VPP-to-Linux synchronization. In this new plugin (called the Netlink Listener) I implement the Linux-to-VPP synchronization using, quelle surprise, Netlink message handlers.
Starting point
Based on the state of the plugin after the third post, operators can enable `lcp-sync` (which copies changes made in VPP into their Linux counterpart) and `lcp-auto-subint` (which extends sub-interface creation in VPP to automatically create a Linux Interface Pair, or LIP, and its companion Linux network interface):
DBGvpp# lcp lcp-sync on
DBGvpp# lcp lcp-auto-subint on
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# create sub TenGigabitEthernet3/0/0 1234
DBGvpp# create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
pim@hippo:~/src/lcpng$ ip link | grep e0
1286: e0.1234@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1287: e0.1235@e0.1234: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1288: e0.1236@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1289: e0.1237@e0.1236: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1701: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000
The vision for this plugin has been that Linux can drive most control-plane operations, such as creating sub-interfaces, adding/removing addresses, changing MTU on links, etc. We can do that by listening to Netlink messages, which were designed for transferring miscellaneous networking information between the kernel space and userspace processes (like VPP). Networking utilities, such as the iproute2 family and its command line utilities (like `ip`) use Netlink to communicate with the Linux kernel from userspace.
Netlink Listener
The first task at hand is to install a Netlink listener. In this new plugin, I first register `lcp_nl_init()` which adds Linux interface pair (LIP) add/del callbacks from the first plugin. I’m now made aware of new LIPs as they are created.
In `lcp_nl_pair_add_cb()`, I will initiate a Netlink listener for the first interface that gets created, noting its netns. If subsequent adds are in other netns, I’ll just issue a warning. And, I will keep a refcount so I know how many LIPs are bound to this listener.
In `lcp_nl_pair_del_cb()`, I can remove the listener when the last interface pair is removed.
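The add/del bookkeeping described above can be modelled in a few lines. This is only a sketch in Python, not the plugin's C code: the class and method names are hypothetical, and it assumes that LIPs in a different netns are warned about and not counted.

```python
class NetlinkListenerState:
    """Hypothetical model of one listener bound to the netns of the first LIP."""

    def __init__(self):
        self.netns = None     # namespace the listener is bound to
        self.refcount = 0     # number of LIPs using the listener
        self.warnings = []    # stand-in for the plugin's logging

    def pair_add(self, host_if, netns):
        if self.refcount == 0:
            self.netns = netns            # first LIP: "open" the listener here
        elif netns != self.netns:
            # Assumption: LIPs in other namespaces are warned about, not counted.
            self.warnings.append(f"{host_if}: netns {netns!r} is not {self.netns!r}")
            return
        self.refcount += 1

    def pair_del(self, host_if, netns):
        if self.refcount == 0 or netns != self.netns:
            return
        self.refcount -= 1
        if self.refcount == 0:
            self.netns = None             # last LIP gone: "close" the listener
```

The refcount is what makes the teardown safe: the listener only goes away when the last LIP that depends on it is removed.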
Then for listening itself, a Netlink socket is opened, and because Linux can be quite chatty on Netlink sockets, I’ll raise its read/write buffers to something quite large (typically 64M read and 16K write size). One note on this size: it needs some sysctls to be set before VPP starts, typically done as follows:
pim@hippo:~/src/vpp$ cat << EOF | sudo tee /etc/sysctl.d/81-vpp-Netlink.conf
# Increase Netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
pim@hippo:~/src/vpp$ sudo sysctl -p /etc/sysctl.d/81-vpp-Netlink.conf
After creating the Netlink socket, I add its file descriptor to VPP’s built in file handler, which will see to polling it. On the file handler, I install `lcp_nl_read_cb()` and `lcp_nl_error_cb()` callbacks which will be invoked when anything interesting happens on the socket:
- `lcp_nl_read_cb()` calls `lcp_nl_callback()`, which pushes Netlink messages onto a queue and issues an `NL_EVENT_READ` event; any socket read error issues an `NL_EVENT_READ_ERR` event.
- `lcp_nl_error_cb()` simply issues an `NL_EVENT_READ_ERR` event and moves on with life.
A bit of explanation on why I’d use a queue rather than just consuming the Netlink messages directly as they are offered. I have to use a queue for the common case in which VPP is running single threaded. If I consumed a block of potentially a million route del/adds in one go (say, if BGP is reconverging), I would block VPP from reading new packets from DPDK and, more importantly, from reading new Netlink messages from the kernel. Those messages would then fill the 64M socket buffer and overflow it, losing Netlink messages, which is bad because it requires an end to end resync of the Linux namespace into the VPP dataplane, something called an `NLM_F_DUMP`, but that’s a story for another day.
So I process only a batch of messages and only for a maximum amount of time per batch. If there are still some messages left in the queue, I’ll just reschedule consumption after M milliseconds. This allows new Netlink messages to continuously be read from the kernel by VPP’s file handler, even if there’s a lot of work to do.
To capture these events, I initialize a process node called `lcp_nl_process()`, which handles:
- `NL_EVENT_READ` by calling `lcp_nl_process_msgs()` and processing a batch of messages (either a maximum count, or a maximum duration, whichever is reached first).
- `NL_EVENT_READ_ERR` is the other event that can happen, in case VPP’s file handler or my own `lcp_nl_read_cb()` encounter a read error. All it does is close and reopen the Netlink socket in the same network namespace we were in before, in an attempt to minimize the damage: dazed and confused, but trying to continue.
Alright, so at this point, I have a producer queue that gets added to by the Netlink reader machinery, so all I have to do is consume the messages. `lcp_nl_process_msgs()` processes up to N messages and/or for up to M msecs, whichever comes first, and for each individual Netlink message, it will call `lcp_nl_dispatch()` to handle messages of a given type.
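To make the "up to N messages or up to M msecs" idea concrete, here is a minimal Python sketch of such a bounded consumer. It is not the plugin's implementation: the function name, the default values of `max_msgs` and `max_msecs`, and the use of a `deque` are all assumptions for illustration.

```python
import time
from collections import deque

def process_msgs(queue, dispatch, max_msgs=8192, max_msecs=20):
    """Consume queued messages until the queue is empty, max_msgs have been
    handled, or max_msecs have elapsed. Returns (handled, remaining)."""
    deadline = time.monotonic() + max_msecs / 1000.0
    handled = 0
    while queue and handled < max_msgs:
        dispatch(queue.popleft())
        handled += 1
        if time.monotonic() >= deadline:
            break          # time budget used up, leave the rest for later
    return handled, len(queue)

# Example: a small budget leaves work in the queue for the next round.
pending = deque(range(10))
seen = []
handled, remaining = process_msgs(pending, seen.append, max_msgs=4, max_msecs=1000)
```

A caller would reschedule itself whenever `remaining` is non-zero, which is what keeps the socket reader from being starved during a large burst.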
For now, `lcp_nl_dispatch()` just throws the message away after logging it with `format_nl_object()`, a function that will come in very useful as I start to explore all the different Netlink message types.
The code that forms the basis of our Netlink Listener lives in [this commit]. I specifically want to call out that I was not the primary author here: I worked off of Matt and Neale’s awesome work in this pending Gerrit.
Netlink: Neighbor
ARP and IPv6 Neighbor Discovery will trigger a set of Netlink messages, which are of type `RTM_NEWNEIGH` and `RTM_DELNEIGH`.
First, I’ll add a new source file `lcpng_nl_sync.c` that will house these handler functions. Their purpose is to take state learned from Netlink messages, and apply that state to VPP.
Then, I add `lcp_nl_neigh_add()` and `lcp_nl_neigh_del()` which implement the following pattern: Most Netlink messages are somehow about a `link`, which is identified by an interface index (`ifindex` or just idx for short). That’s the same interface index I stored when I created the LIP, calling it `vif_index` because in VPP, it describes a `virtio` device which implements the IO for the TAP.
If I’m handling a message for a link with a given ifindex, I can correlate it with a LIP. Not all messages will be related to something VPP knows or cares about; I’ll discuss that more later when I discuss `RTM_NEWLINK` messages.
If there is no LIP associated with the `ifindex`, then clearly this message is about a Linux interface VPP is not aware of. But, if I can find the LIP, I can convert the lladdr (MAC address) and IP address from the Netlink message into their VPP variants, and then simply add or remove the ip4/ip6 neighbor adjacency.
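The lookup-then-convert pattern can be sketched as follows. This is an illustrative Python model, not the C handlers: the `lips` and `neighbors` dictionaries are hypothetical stand-ins for the LIP database and VPP's neighbor table, and the function names merely echo the ones in the text.

```python
import ipaddress

# Hypothetical stand-ins: Linux ifindex -> VPP interface name (the LIPs),
# and a neighbor table keyed by (interface, IP address).
lips = {1488: "TenGigabitEthernet3/0/0"}
neighbors = {}

def neigh_add(ifindex, dst, lladdr):
    iface = lips.get(ifindex)
    if iface is None:
        return False                       # link unknown to VPP: ignore
    ip = ipaddress.ip_address(dst)         # handles both IPv4 and IPv6
    mac = bytes.fromhex(lladdr.replace(":", ""))
    neighbors[(iface, ip)] = mac
    return True

def neigh_del(ifindex, dst):
    iface = lips.get(ifindex)
    if iface is None:
        return False
    key = (iface, ipaddress.ip_address(dst))
    return neighbors.pop(key, None) is not None
```

Messages for an ifindex without a LIP fall through at the first check, which is the "Linux interface VPP is not aware of" case.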
The code for this first Netlink message handler lives in this [commit]. An ironic insight is that after writing the code, I don’t think any of it will be necessary, because the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its neighbor adjacency tables; but I’m leaving the code in for now.
Netlink: Address
A decidedly more interesting message is `RTM_NEWADDR` and its deletion companion `RTM_DELADDR`.
It’s pretty straightforward to add and remove IPv4 and IPv6 addresses on interfaces. I have to convert the Netlink representation of an IP address to its VPP counterpart with a helper, add it or remove it, and if there are no link-local addresses left, disable IPv6 on the interface. There are also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).
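As a rough illustration of the add/remove flow and the link-local check, here is a small Python sketch. It is one possible reading of the description, not the plugin's code: the function names are hypothetical, the interface's addresses are modelled as a plain set, and the check is only made when an IPv6 address is removed.

```python
import ipaddress

def addr_add(addresses, cidr):
    """Convert the textual prefix and add it to the interface's address set."""
    addresses.add(ipaddress.ip_interface(cidr))

def addr_del(addresses, cidr):
    """Remove the prefix. Returns True if IPv6 should now be disabled,
    i.e. an IPv6 address was removed and no link-local address remains."""
    addr = ipaddress.ip_interface(cidr)
    addresses.discard(addr)
    if addr.version != 6:
        return False
    return not any(a.version == 6 and a.ip.is_link_local for a in addresses)

# Example: e0 with one IPv4, one link-local and one global IPv6 address.
e0 = set()
for prefix in ("10.0.1.1/30", "fe80::1/64", "2001:db8:0:1::1/64"):
    addr_add(e0, prefix)
```

`ipaddress.ip_interface()` keeps the address and prefix length together, which is the same pairing a Netlink address message carries.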
The code for IP address handling is in this [commit], but when I took it out for a spin, I noticed something curious, looking at the log lines that are generated for the following sequence:
ip addr add 10.0.1.1/30 dev e0
debug linux-cp/nl addr_add: Netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn linux-cp/nl dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
ping 10.0.1.2
debug linux-cp/nl neigh_add: Netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice linux-cp/nl neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0
ip addr del 10.0.1.1/30 dev e0
debug linux-cp/nl addr_del: Netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice linux-cp/nl addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn linux-cp/nl dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug linux-cp/nl neigh_del: Netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error linux-cp/nl neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0
It is this very last message that’s a bit of a surprise: the ping brought the peer’s lladdr into the neighbor cache, and the subsequent address deletion first removed the address, then all the typical local routes (the connected, the broadcast, the network, and the self/local), but then also explicitly deleted the neighbor. I suppose that is correct behavior for Linux, were it not that VPP already invalidates the neighbor cache and adds/removes the connected routes itself, for example in `ip/ip4_forward.c` L826-L830 and L583.
I can see more of these false positive non-errors like the one on `lcp_nl_neigh_del()` because interface and directly connected route addition/deletion is slightly different in VPP than in Linux. So, I decide to take a little shortcut: if an addition returns “already there”, or a deletion returns “no such entry”, I’ll just consider it a successful addition and deletion respectively, saving my eyes from being screamed at by this red error message. I changed that in this [commit], turning this situation into a friendly green notice instead.
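The shortcut amounts to making add and delete idempotent from the caller's point of view. A minimal Python sketch of that idea follows; the exception classes and helper names are hypothetical, and the real code works with C return values, not exceptions.

```python
class AlreadyExists(Exception):
    pass

class NoSuchEntry(Exception):
    pass

table = set()   # stand-in for a VPP table (neighbors, addresses, ...)

def entry_add(key):
    if key in table:
        raise AlreadyExists(key)
    table.add(key)

def entry_del(key):
    if key not in table:
        raise NoSuchEntry(key)
    table.remove(key)

def tolerant(op, key):
    """Run an add/del, treating 'already there' and 'no such entry' as success."""
    try:
        op(key)
    except (AlreadyExists, NoSuchEntry):
        return "notice: nothing to do"
    return "notice: done"
```

With this wrapper, the neighbor deletion that VPP already performed itself ends in a notice and not an error.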
Netlink: Link (existing)
There’s a bunch of use cases for the `RTM_NEWLINK` and `RTM_DELLINK` messages. They carry information about carrier (link, no-link), admin state (up/down), MTU, and so on. The function `lcp_nl_link_del()` is the easier of the two. If I see a message like this for an ifindex that VPP has a LIP for, I’ll just remove it. This means first calling the `lcp_itf_pair_delete()` function and then, if the message was for a VLAN interface, removing the accompanying sub-interfaces: both the physical one (eg. `TenGigabitEthernet3/0/0.1234`) as well as the TAP that we used to communicate with the host (eg. `tap8.1234`).
The other message (the `RTM_NEWLINK` one), is much more complicated, because it’s actually many types of operation all in one message type: We can set the link up/down, change its MTU, and change its MAC address, in any combination, perhaps like so:
ip link set e0 mtu 9216 address 00:11:22:33:44:55 down
So in turn, `lcp_nl_link_add()` will first look at admin state and apply it to the phy and tap, apply the MTU if it’s different to what VPP has, and apply the MAC address if it’s different to what VPP has, notably applying MAC addresses only in ‘hardware’ interfaces, which I now know are not just physical ones like `TenGigabitEthernet3/0/0` but also virtual ones like `BondEthernet0`.
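The "apply only what differs" logic can be sketched like this. It is a simplified Python model with hypothetical names (`Link`, `link_update`), it tracks a single interface where the real handler touches both the phy and the tap, and the MAC address used in the example is made up.

```python
from dataclasses import dataclass

@dataclass
class Link:
    up: bool
    mtu: int
    mac: str
    is_hardware: bool = True   # phy or bond, as opposed to a sub-interface

def link_update(vpp, up, mtu, mac):
    """Apply a NEWLINK-style update to `vpp`; return what was changed."""
    changed = []
    if vpp.up != up:
        vpp.up = up
        changed.append("state")
    if vpp.mtu != mtu:
        vpp.mtu = mtu
        changed.append("mtu")
    # MAC addresses are only applied to 'hardware' interfaces.
    if vpp.is_hardware and vpp.mac != mac:
        vpp.mac = mac
        changed.append("mac")
    return changed

# Models: ip link set e0 mtu 9216 address 00:11:22:33:44:55 down
e0 = Link(up=True, mtu=9000, mac="02:00:00:00:00:01")
first = link_update(e0, up=False, mtu=9216, mac="00:11:22:33:44:55")
```

Comparing before applying means a repeated message with identical attributes changes nothing.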
One thing I noticed is that link state and MTU changes tend to go around in circles: they go from Netlink into VPP with this code, but when `lcp-sync` is on in the interface mirror plugin, changes to link and MTU will trigger a callback there, which will in turn generate a Netlink message, and so on. To avoid this loop, I temporarily turn off `lcp-sync` just before handling a batch of messages, and turn it back to its original state when I’m done with that.
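A save, disable, restore pattern like this is easy to get wrong when there are several exit paths. The Python sketch below shows the idea with a context manager; the `settings` dictionary and the `suppressed()` helper are hypothetical, and the plugin itself does this in C around its batch handler.

```python
from contextlib import contextmanager

settings = {"lcp-sync": True}   # stand-in for the plugin's runtime toggles

@contextmanager
def suppressed(name):
    """Turn a setting off for the duration of a block, then restore it."""
    saved = settings[name]
    settings[name] = False
    try:
        yield
    finally:
        settings[name] = saved   # restored on every exit path, errors included

def handle_batch(messages, handler):
    with suppressed("lcp-sync"):
        for msg in messages:
            handler(msg)         # changes applied here do not echo back
```

Restoring the saved value, and not simply switching the setting back on, matters when the operator had `lcp-sync` off to begin with.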
The code for add/del of existing links is in this [commit].
Netlink: Link (new)
Here’s where it gets interesting! What if the `RTM_NEWLINK` message was for an interface that VPP doesn’t have a LIP for, but specifically describes a VLAN interface? Well, then clearly the operator is trying to create a new sub-interface. And supporting that operation would be super cool, so let’s go!
Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous [commit]), I know that I’ve gotten a NEWLINK request but the Linux ifindex doesn’t have a LIP. This could be because the interface is entirely foreign to VPP, for example somebody created a dummy interface or a VLAN sub-interface on one:
ip link add dum0 type dummy
ip link add link dum0 name dum0.10 type vlan id 10
Or perhaps more interestingly, the operator is actually trying to create a VLAN sub-interface on an interface we created in VPP earlier, like these:
ip link add link e0 name e0.1234 type vlan id 1234
ip link add link e0.1234 name e0.1235 type vlan id 1000
ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
ip link add link e0.1236 name e0.1237 type vlan id 1000
None of these `RTM_NEWLINK` messages, represented by vif (Linux ifindex) will have a corresponding LIP. So, I try to create one by calling `lcp_nl_link_add_vlan()`.
First, I’ll look up the parent ifindex (`dum0` or `e0` in the examples above). The first example parent, `dum0`, doesn’t have a LIP, so I bail after logging a warning. The second example however, `e0`, definitely does have a LIP, so it’s known to VPP.
Now, I have two further choices:
- the LIP is a phy (ie `TenGigabitEthernet3/0/0` or `BondEthernet0`) and this is a regular tagged interface with a given proto (dot1q or dot1ad); or
- the LIP is itself a subint (ie `TenGigabitEthernet3/0/0.1234`) and what I’m being asked for is actually a QinQ or QinAD sub-interface. Remember, there’s an important difference:
  - In Linux these sub-interfaces are chained (`e0` creates child `e0.1234@e0` for a normal VLAN, and `e0.1234` creates child `e0.1235@e0.1234` for the QinQ).
  - In VPP these are actually all flat sub-interfaces, with the ‘regular’ VLAN interface carrying the `one_tag` flag with only an `outer_vlan_id` set, and the latter QinQ carrying the `two_tags` flag with both an `outer_vlan_id` (1234) and an `inner_vlan_id` (1000).
So I look up both the parent LIP as well as the phy LIP. I now have all the ingredients I need to create the VPP sub-interfaces with the correct inner-dot1q and outer dot1q or dot1ad.
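To illustrate the chained-versus-flat difference, here is a small Python sketch that walks the Linux parent chain and produces a flat description. The `LINKS` table and `flatten()` are hypothetical, the result is a plain dictionary and not a VPP structure, and the sketch assumes at most two tags.

```python
# Hypothetical view of the Linux links: name -> (parent name, VLAN id, proto).
LINKS = {
    "e0":      (None, None, None),
    "e0.1234": ("e0", 1234, "dot1q"),
    "e0.1235": ("e0.1234", 1000, "dot1q"),
    "e0.1236": ("e0", 2345, "dot1ad"),
    "e0.1237": ("e0.1236", 1000, "dot1q"),
}

def flatten(name):
    """Turn a chained Linux VLAN link into a flat sub-interface description."""
    parent, vlan, proto = LINKS[name]
    if parent is None:
        raise ValueError(f"{name} is not a VLAN link")
    grandparent, parent_vlan, parent_proto = LINKS[parent]
    if grandparent is None:
        # Parent is the phy: a regular single-tagged sub-interface.
        return {"phy": parent, "tags": 1, "outer_proto": proto,
                "outer_vlan_id": vlan, "inner_vlan_id": None}
    # Parent is itself a sub-interface: QinQ or QinAD, two tags on the phy.
    return {"phy": grandparent, "tags": 2, "outer_proto": parent_proto,
            "outer_vlan_id": parent_vlan, "inner_vlan_id": vlan}
```

For `e0.1237` the outer tag comes from its parent `e0.1236` (dot1ad 2345) and the inner tag from the link itself (1000), both attached directly to the phy `e0`.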
Of course, I don’t really know what subinterface ID to use. It’s appealing to “just” use the vlan id, but that’s not helpful if the outer tag and the inner tag are the same. So I write a helper function `vnet_sw_interface_get_available_subid()` whose job it is to return an unused subid for the phy, starting from 1.
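The helper's job is a simple first-free search. A Python model of the idea, assuming the sub-interface IDs already in use on the phy are available as a set (the function name and signature here are illustrative, not the C helper's):

```python
def get_available_subid(used_subids, start=1):
    """Return the lowest sub-interface ID >= start that is not in use."""
    subid = start
    while subid in used_subids:
        subid += 1
    return subid

# Example: IDs 1, 2 and 4 are taken on this phy.
next_id = get_available_subid({1, 2, 4})
```

Because the ID is decoupled from the VLAN tags, a QinQ sub-interface whose outer and inner tags are identical still gets a unique ID.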
Here as well, the interface plugin can be configured to automatically create LIPs for sub-interfaces, which I have to turn off temporarily to let my new form of creation do its thing. I carefully ensure that the thread barrier is taken/released and the original setting of `lcp-auto-subint` is restored at all exit points. One cool thing is that the new link’s name is given in the Netlink message, so I can just use that one. I like the aesthetic a bit more, because here the operator can give the Linux interface any name they like, whereas in the other direction, VPP’s `lcp-auto-subint` feature has to make up a boring `<phy>.<subid>` name.
Alright, without further ado, the code for the main innovation here, the implementation of `lcp_nl_link_add_vlan()`, is in this [commit].
Results
The functional regression test I made on day one, the one that ensures end-to-end connectivity to and from the Linux host interfaces works for all 5 interface types (untagged, .1q tagged, QinQ, .1ad tagged and QinAD) and for both physical and virtual interfaces (like `TenGigabitEthernet3/0/0` and `BondEthernet0`), still works.
After this code is in, the operator will only have to create a LIP for any phy interfaces, and can rely on the new Netlink Listener plugin and the use of `ip` in Linux for all the rest. This implementation starts approaching ‘vanilla’ Linux user experience!
Here’s a new screencast [asciinema, gif] showing me playing around a bit, demonstrating that synchronization works pretty well in both directions, a huge improvement from the [previous asciinema, gif] in my [second post], which was only two weeks ago:
Further Work
You will note that there’s one important Netlink message type that’s missing: routes! They are so important in fact, that they’re a topic of their very own post. Also, I haven’t written the code for them yet :-)
A few things worth noting, as future work.
Multiple NetNS - The original Netlink Listener (ref) would only listen to the default netns specified in the configuration file. This is problematic because the interface plugin does allow interfaces to be made in other namespaces (by issuing `lcp create ... host-if X netns foo`), the Netlink world of which will be unknown to VPP. I created `struct lcp_nl_netlink_namespace` to hold the stuff needed for the Netlink listener, which is a good starting point to create not one but multiple listeners, one for each unique namespace that has one or more LIPs defined. This is version-two work :)
Multithreading - In testing, I noticed that while my plugins themselves are (or seem to be..) thread safe, `virtio` may not be totally clean. In a multithreaded VPP instance with many workers, there’s a crash in `lcp_arp_phy_node()` where `vlib_buffer_copy()` returns NULL, which should not happen. When VPP is in such a state, other plugins (notably DHCP and IPv6 ND) also start complaining, and `show errors` shows millions of `virtio-input` errors about unavailable buffers.
I do confirm though, that running VPP single threaded does not have these issues.
Credits
I’d like to make clear that the Linux CP plugin is a collaboration between several great minds, and that my work stands on other software engineers’ shoulders. In particular most of the Netlink socket handling and Netlink message queueing was written by Matthew Smith, and I’ve had a little bit of help along the way from Neale Ranns and Jon Loeliger. I’d like to thank them for their work!
Appendix
Ubuntu config
This configuration has been the exact same ever since my first post:
# Untagged interface
ip addr add 10.0.1.2/30 dev enp66s0f0
ip addr add 2001:db8:0:1::2/64 dev enp66s0f0
ip link set enp66s0f0 up mtu 9000
# Single 802.1q tag 1234
ip link add link enp66s0f0 name enp66s0f0.q type vlan id 1234
ip link set enp66s0f0.q up mtu 9000
ip addr add 10.0.2.2/30 dev enp66s0f0.q
ip addr add 2001:db8:0:2::2/64 dev enp66s0f0.q
# Double 802.1q tag 1234 inner-tag 1000
ip link add link enp66s0f0.q name enp66s0f0.qinq type vlan id 1000
ip link set enp66s0f0.qinq up mtu 9000
ip addr add 10.0.3.2/30 dev enp66s0f0.qinq
ip addr add 2001:db8:0:3::2/64 dev enp66s0f0.qinq
# Single 802.1ad tag 2345
ip link add link enp66s0f0 name enp66s0f0.ad type vlan id 2345 proto 802.1ad
ip link set enp66s0f0.ad up mtu 9000
ip addr add 10.0.4.2/30 dev enp66s0f0.ad
ip addr add 2001:db8:0:4::2/64 dev enp66s0f0.ad
# Double 802.1ad tag 2345 inner-tag 1000
ip link add link enp66s0f0.ad name enp66s0f0.qinad type vlan id 1000 proto 802.1q
ip link set enp66s0f0.qinad up mtu 9000
ip addr add 10.0.5.2/30 dev enp66s0f0.qinad
ip addr add 2001:db8:0:5::2/64 dev enp66s0f0.qinad
## Bond interface
ip link add bond0 type bond mode 802.3ad
ip link set enp66s0f2 down
ip link set enp66s0f3 down
ip link set enp66s0f2 master bond0
ip link set enp66s0f3 master bond0
ip link set enp66s0f2 up
ip link set enp66s0f3 up
ip link set bond0 up
ip addr add 10.1.1.2/30 dev bond0
ip addr add 2001:db8:1:1::2/64 dev bond0
ip link set bond0 up mtu 9000
# Single 802.1q tag 1234
ip link add link bond0 name bond0.q type vlan id 1234
ip link set bond0.q up mtu 9000
ip addr add 10.1.2.2/30 dev bond0.q
ip addr add 2001:db8:1:2::2/64 dev bond0.q
# Double 802.1q tag 1234 inner-tag 1000
ip link add link bond0.q name bond0.qinq type vlan id 1000
ip link set bond0.qinq up mtu 9000
ip addr add 10.1.3.2/30 dev bond0.qinq
ip addr add 2001:db8:1:3::2/64 dev bond0.qinq
# Single 802.1ad tag 2345
ip link add link bond0 name bond0.ad type vlan id 2345 proto 802.1ad
ip link set bond0.ad up mtu 9000
ip addr add 10.1.4.2/30 dev bond0.ad
ip addr add 2001:db8:1:4::2/64 dev bond0.ad
# Double 802.1ad tag 2345 inner-tag 1000
ip link add link bond0.ad name bond0.qinad type vlan id 1000 proto 802.1q
ip link set bond0.qinad up mtu 9000
ip addr add 10.1.5.2/30 dev bond0.qinad
ip addr add 2001:db8:1:5::2/64 dev bond0.qinad
VPP config
We can whittle down the VPP configuration to the bare minimum:
vppctl lcp default netns dataplane
vppctl lcp lcp-sync on
vppctl lcp lcp-auto-subint on
## Create `e0`
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
## Create `be0`
vppctl create bond mode lacp load-balance l34
vppctl bond add BondEthernet0 TenGigabitEthernet3/0/2
vppctl bond add BondEthernet0 TenGigabitEthernet3/0/3
vppctl set interface state TenGigabitEthernet3/0/2 up
vppctl set interface state TenGigabitEthernet3/0/3 up
vppctl lcp create BondEthernet0 host-if be0
And the rest of the configuration work is done entirely from the Linux side!
IP="sudo ip netns exec dataplane ip"
## `e0` aka TenGigabitEthernet3/0/0
$IP link add link e0 name e0.1234 type vlan id 1234
$IP link add link e0.1234 name e0.1235 type vlan id 1000
$IP link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
$IP link add link e0.1236 name e0.1237 type vlan id 1000
$IP link set e0 up mtu 9000
$IP addr add 10.0.1.1/30 dev e0
$IP addr add 2001:db8:0:1::1/64 dev e0
$IP addr add 10.0.2.1/30 dev e0.1234
$IP addr add 2001:db8:0:2::1/64 dev e0.1234
$IP addr add 10.0.3.1/30 dev e0.1235
$IP addr add 2001:db8:0:3::1/64 dev e0.1235
$IP addr add 10.0.4.1/30 dev e0.1236
$IP addr add 2001:db8:0:4::1/64 dev e0.1236
$IP addr add 10.0.5.1/30 dev e0.1237
$IP addr add 2001:db8:0:5::1/64 dev e0.1237
## `be0` aka BondEthernet0
$IP link add link be0 name be0.1234 type vlan id 1234
$IP link add link be0.1234 name be0.1235 type vlan id 1000
$IP link add link be0 name be0.1236 type vlan id 2345 proto 802.1ad
$IP link add link be0.1236 name be0.1237 type vlan id 1000
$IP link set be0 up mtu 9000
$IP addr add 10.1.1.1/30 dev be0
$IP addr add 2001:db8:1:1::1/64 dev be0
$IP addr add 10.1.2.1/30 dev be0.1234
$IP addr add 2001:db8:1:2::1/64 dev be0.1234
$IP addr add 10.1.3.1/30 dev be0.1235
$IP addr add 2001:db8:1:3::1/64 dev be0.1235
$IP addr add 10.1.4.1/30 dev be0.1236
$IP addr add 2001:db8:1:4::1/64 dev be0.1236
$IP addr add 10.1.5.1/30 dev be0.1237
$IP addr add 2001:db8:1:5::1/64 dev be0.1237
Final note
You may have noticed that the [commit] links are all to git commits in my private working copy. I want to wait until my previous work is reviewed and submitted before piling on more changes. Feel free to contact vpp-dev@ for more information in the meantime :-)