Introduction
From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance. However, VPP is quite friendly in virtualization. Notably, it runs really well in virtual machines like Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning to give the guest virtual machine access to the underlying physical hardware. In such a mode, VPP performance is almost the same as on bare metal. But did you know that VPP can also run in Docker?
The other day I joined the [ZANOG'25] in Durban, South Africa. One of the presenters was Nardus le Roux of Nokia, and he showed off a project called [Containerlab], which provides a CLI for orchestrating and managing container-based networking labs. It starts the containers, builds virtual wiring between them to create lab topologies of the user's choice, and manages the lab's lifecycle.
Quite regularly I am asked 'when will you add VPP to Containerlab?', and at ZANOG I made a promise to actually do it. So here I go, on a journey to integrate VPP into Containerlab!
Containerized VPP
The folks at [Tigera] maintain a project called Calico, which accelerates the Kubernetes CNI (Container Network Interface) by using [FD.io] VPP. Since Kubernetes has its origins in running containers in a Docker environment, it stands to reason that it should be possible to run a containerized VPP. I start by reading up on how they create their Docker image, and I learn a lot.
Docker Build
Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based on debian:bookworm as well. The build starts off quite modest:
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << EOF > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean
# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
One gotcha: when I install the upstream VPP Debian packages, they generate a sysctl file which the post-install script then tries to apply. However, I can't set sysctls in the container, so the build fails. I take a look at the VPP source code and find src/pkg/debian/vpp.postinst, which helpfully contains a means to skip setting the sysctls, using an environment variable called VPP_INSTALL_SKIP_SYSCTL.
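By the way, if you'd rather not bake that override into the image, the same knob can also be passed on the command line with a regular Docker build argument. A minimal sketch, using the same Dockerfile as above:

pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . \
    --build-arg VPP_INSTALL_SKIP_SYSCTL=true \
    -t pimvanpelt/vpp-containerlab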
Running VPP in Docker
With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it to run well in a Docker environment. There are a few things I make note of:
- We may not have huge pages on the host machine, so I'll set all the page sizes to the Linux default of 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but in the case of Containerlab we're not here to build high performance stuff; rather, users will be doing functional testing.
- DPDK requires either UIO or VFIO kernel drivers, so that it can bind its so-called poll mode driver to the network cards. It also requires huge pages. Since my first version will be using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
- VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only one thread. Of course, this will not be a high performance setup, but since I’m already not using hugepages, I’ll use only 1 thread.
The VPP startup.conf configuration file I came up with:
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
unix {
  interactive
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  cli-prompt vpp-clab#
  cli-no-pager
  poll-sleep-usec 100
}

api-trace {
  on
}

memory {
  main-heap-size 512M
  main-heap-page-size 4k
}

buffers {
  buffers-per-numa 16000
  default data-size 2048
  page-size 4k
}

statseg {
  size 64M
  page-size 4k
  per-node-counters on
}

plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { disable }
}
EOF
Just a couple of notes for those who are running VPP in production. Each of the *-page-size config settings is set to the normal Linux page size of 4kB, which effectively keeps VPP from using any hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the Dockerfile build, as it lives in its own dedicated Debian package called vpp-plugin-dpdk. Finally, I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration. In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab, it will not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit of traffic!
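For contrast, and purely as a sketch of what I'd do outside of Containerlab, a production-style startup.conf would pin VPP to dedicated cores and use hugepages instead. The core numbers and sizes below are made up for illustration:

cpu {
  main-core 0
  corelist-workers 1-3
}

memory {
  main-heap-size 2G
  main-heap-page-size 2M
}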
Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and vhost, and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the [manpage]:
- CAP_SYS_NICE: allows a process to set real-time scheduling, CPU affinity, I/O scheduling class, and to migrate and move memory pages.
- CAP_NET_ADMIN: allows a process to perform various network-related operations like interface configs, routing tables, nested network namespaces, multicast, setting promiscuous mode, and so on.
- CAP_SYS_PTRACE: allows a process to trace arbitrary processes using ptrace(2), and a few related kernel system calls.
Being a networking dataplane implementation, VPP wants to be able to tinker with network devices. This is not typically allowed in Docker containers, although the Docker developers did make some concessions for those containers that need just that little bit more access. They describe it in their [docs] as follows:
| The --privileged flag gives all capabilities to the container. When the operator executes docker
| run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
| SELinux to allow the container nearly all the same access to the host as processes running outside
| containers on the host. Use this flag with caution. For more information about the --privileged
| flag, see the docker run reference.
Needless to say, running a container with the --privileged flag set does give it a lot of privileges. A container with --privileged is not a securely sandboxed process. Containers in this mode can get a root shell on the host and take control over the system.
With that little fine-print warning out of the way, I am going to Yolo like a boss:
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
--cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
--device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
--privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
docker.io/pimvanpelt/vpp-containerlab
clab-pim
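Before poking at it, a quick way to confirm from another terminal that the container actually came up is plain Docker tooling, for example:

pim@summer:~$ docker ps --filter name=clab-pim
pim@summer:~$ docker logs clab-pim | tail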
Configuring VPP in Docker
And with that, the Docker container is running! I post a screenshot on [Mastodon] and my buddy John responds with a polite but firm insistence that I explain myself. Here you go, buddy :)
In another terminal, I can play around with this VPP instance a little bit:
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
root@d57c3716eee9:/# ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 2.2 0.2 17498852 160300 ? Rs 15:11 0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root 10 0.0 0.0 4192 3388 pts/0 Ss 15:11 0:00 bash
root 18 0.0 0.0 8104 4056 pts/0 R+ 15:12 0:00 ps auxw
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
local0 0 down 0/0/0/0
Slick! I can see that the container has an eth0 device, which Docker has connected to the main bridged network. For now, there's only one process running: pid 1 proudly shows VPP (as in Docker, the CMD field simply replaces init). Later on, I can imagine running a few more daemons like SSH and so on, but for now, I'm happy.
Looking at VPP itself, it has no network interfaces yet, except for the default local0 interface.
Adding Interfaces in Docker
But if I don't have DPDK, how will I add interfaces? Enter veth(4). From the [manpage], I learn that veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to a physical network device in another namespace, but can also be used as standalone network devices. veth devices are always created in interconnected pairs.
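As a quick aside, it's easy to see this in action by hand with iproute2. The interface names here are just examples, and deleting either end of the pair removes its peer as well:

pim@summer:~$ sudo ip link add clab-a type veth peer name clab-b
pim@summer:~$ ip -br link show type veth
pim@summer:~$ sudo ip link del clab-a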
Of course, Docker users will recognize this. It’s like bread and butter for containers to communicate with one another - and with the host they’re running on. I can simply create a Docker network and attach one half of it to a running container, like so:
pim@summer:~$ docker network create --driver=bridge clab-network \
--subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
The first command here creates a new network called clab-network in Docker. As a result, a new bridge called br-5711b95c6c32 shows up on the host. The bridge name is chosen from the UUID of the Docker object. Seeing as I added an IPv4 and IPv6 subnet to the bridge, it gets configured with the first address in both:
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name bridge id STP enabled interfaces
br-5711b95c6c32 8000.0242099728c6 no veth021e363
pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32 UP 192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
The second command creates a veth pair and puts one half of it in the bridge; this interface is called veth021e363 above. The other half of it pops up as eth1 in the Docker container:
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1@if530577 UP 02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
One of the many awesome features of VPP is its ability to attach to these veth devices by means of its af-packet driver, reusing the same MAC address (in this case 02:42:c0:00:02:02). I first take a look at the Linux [manpage] for it, and then read up on the VPP [documentation] on the topic.
However, my attention is drawn to Docker assigning an IPv4 and IPv6 address to the container:
root@d57c3716eee9:/# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0@if530566 UP 172.17.0.2/16
eth1@if530577 UP 192.0.2.2/24 2001:db8::2/64 fe80::42:c0ff:fe00:202/64
root@d57c3716eee9:/# ip addr del 192.0.2.2/24 dev eth1
root@d57c3716eee9:/# ip addr del 2001:db8::2/64 dev eth1
I decide to remove them here, as in the end eth1 will be owned by VPP, so it should be the one setting the IPv4 and IPv6 addresses. For the life of me, I don't see how I can prevent Docker from assigning IPv4 and IPv6 addresses to this container … and the [docs] seem to be off as well, as they suggest I can pass a flag --ipv4=False, but that flag doesn't exist, at least not on my Bookworm Docker variant. I make a mental note to discuss this with the folks in the Containerlab community.
Anyway, armed with this knowledge I can bind the container-side half of the veth pair, called eth1, to VPP, like so:
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# create host-interface name eth1 hw-addr 02:42:c0:00:02:02
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
L3 192.0.2.2/24
L3 2001:db8::2/64
local0 (dn):
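Before leaving the vppctl session, I can also verify reachability from inside the dataplane itself. Assuming the ping plugin is loaded (the plugins stanza above enables everything that's installed), a quick round-trip to the bridge looks like this, with the output omitted here:

vpp-clab# ping 192.0.2.1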
Results
After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02 (the current stable release version), started a container with it, and added a network bridge in Docker which connects the host summer to the container. Proof, as they say, is in the ping-pudding:
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms
--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms
--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
And in case that simple ping-test wasn’t enough to get you excited, here’s a packet trace from VPP itself, while I’m performing this ping:
vpp-clab# trace add af-packet-input 100
vpp-clab# wait 3
vpp-clab# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1
00:07:03:979275: af-packet-input
af_packet: hw_if_index 1 rx-queue 0 next-index 4
block 47:
address 0x7fbf23b7d000 version 2 seq_num 48 pkt_num 0
tpacket3_hdr:
status 0x20000001 len 98 snaplen 98 mac 92 net 106
sec 0x68164381 nsec 0x258e7659 vlan 0 vlan_tpid 0
vnet-hdr:
flags 0x00 gso_type 0x00 hdr_len 0
gso_size 0 csum_start 0 csum_offset 0
00:07:03:979293: ethernet-input
IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
00:07:03:979306: ip4-input
ICMP: 192.0.2.1 -> 192.0.2.2
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
fragment id 0x5813, flags DONT_FRAGMENT
ICMP echo_request checksum 0xc16 id 21197
00:07:03:979315: ip4-lookup
fib 0 dpo-idx 9 flow hash: 0x00000000
ICMP: 192.0.2.1 -> 192.0.2.2
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
fragment id 0x5813, flags DONT_FRAGMENT
ICMP echo_request checksum 0xc16 id 21197
00:07:03:979322: ip4-receive
fib:0 adj:9 flow:0x00000000
ICMP: 192.0.2.1 -> 192.0.2.2
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
fragment id 0x5813, flags DONT_FRAGMENT
ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-input
ICMP: 192.0.2.1 -> 192.0.2.2
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
fragment id 0x5813, flags DONT_FRAGMENT
ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-echo-request
ICMP: 192.0.2.1 -> 192.0.2.2
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
fragment id 0x5813, flags DONT_FRAGMENT
ICMP echo_request checksum 0xc16 id 21197
00:07:03:979326: ip4-load-balance
fib 0 dpo-idx 5 flow hash: 0x00000000
ICMP: 192.0.2.2 -> 192.0.2.1
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
fragment id 0x2dc4, flags DONT_FRAGMENT
ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979325: ip4-rewrite
tx_sw_if_index 1 dpo-idx 5 : ipv4 via 192.0.2.1 eth1: mtu:1500 next:3 flags:[] 0242099728c60242c00002020800 flow hash: 0x00000000
00000000: 0242099728c60242c00002020800450000542dc44000400188e1c0000202c000
00000020: 02010000141652cd00018143166800000000399d0900000000001011
00:07:03:979326: eth1-output
eth1 flags 0x02180005
IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
ICMP: 192.0.2.2 -> 192.0.2.1
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
fragment id 0x2dc4, flags DONT_FRAGMENT
ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979327: eth1-tx
af_packet: hw_if_index 1 tx-queue 0
tpacket3_hdr:
status 0x1 len 108 snaplen 108 mac 0 net 0
sec 0x0 nsec 0x0 vlan 0 vlan_tpid 0
vnet-hdr:
flags 0x00 gso_type 0x00 hdr_len 0
gso_size 0 csum_start 0 csum_offset 0
buffer 0xf97c4:
current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
local l2-hdr-offset 0 l3-hdr-offset 14
IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
ICMP: 192.0.2.2 -> 192.0.2.1
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
fragment id 0x2dc4, flags DONT_FRAGMENT
ICMP echo_reply checksum 0x1416 id 21197
Well, that's a mouthful, isn't it! Here, I get to show you VPP in action. After receiving the packet on its af-packet-input node from 192.0.2.1 (Summer, who is pinging us) to 192.0.2.2 (the VPP container), the packet traverses the dataplane graph. It goes through ethernet-input, then ip4-input, which sees that it's destined to a locally configured IPv4 address, so the packet is handed to ip4-receive. That one sees that the IP protocol is ICMP, so it hands the packet to ip4-icmp-input, which notices that the packet is an ICMP echo request, so off to ip4-icmp-echo-request our little packet goes. The ICMP plugin in VPP now answers by ip4-rewrite'ing the packet, sending the return to 192.0.2.1 at MAC address 02:42:09:97:28:c6 (this is Summer, the host doing the pinging!), after which the newly created ICMP echo-reply is handed to eth1-output, which marshals it back into the kernel's AF_PACKET interface using eth1-tx.
Boom. I could not be more pleased.
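As one more sanity check alongside the trace, the interface counters tick up as well. A quick way to eyeball that, with the counter output omitted as it will differ per run:

vpp-clab# clear interfaces
vpp-clab# show interface eth1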
What’s Next
This was a nice exercise for me! I'm going in this direction because the [Containerlab] framework will start containers with given NOS images, not too dissimilar from the one I just made, and then attach veth pairs between the containers. I started dabbling with a [pull-request], but I got stuck with a part of the Containerlab code that pre-deploys config files into the containers. You see, I will need to generate two files:
- A startup.conf file that is specific to the Containerlab Docker container. I'd like each container to set its own hostname so that the CLI has a unique prompt. I can do this by setting unix { cli-prompt {{ .ShortName }}# } in the template renderer.
- Containerlab will know all of the veth pairs that are planned to be created into each VPP container. I'll need it to then write a little snippet of config that does the create host-interface spiel, to attach these veth pairs to the VPP dataplane. A rough sketch of both files follows below.
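To make that a bit more tangible, here's roughly what I imagine those two generated files could look like; the prompt name, interface name and MAC address are hypothetical placeholders. First, the rendered fragment of the per-node startup.conf:

unix {
  cli-prompt r1#
}

and then the per-node snippet that attaches the veth pairs to the dataplane, one stanza per pair:

create host-interface name eth1 hw-addr 02:42:c0:00:02:02
set interface state host-eth1 up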
I reached out to Roman from Nokia, who is one of the authors and the current maintainer of Containerlab. Roman was keen to help out, and seeing as he knows the Containerlab stuff well and I know the VPP stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones setup that will connect a few VPP containers together with an SR Linux node in a lab. Stand by!
Once we have that, there’s still quite some work for me to do. Notably:
- Configuration persistence. clab allows you to save the running config. For that, I'll need to introduce [vppcfg] and a means to invoke it when the lab operator wants to save their config, and then reconfigure VPP from it when the container restarts (see the sketch after this list).
- I'll need to have a few files from clab shared with the host, notably the startup.conf and vppcfg.yaml, as well as some manual pre- and post-flight configuration for the more esoteric stuff. Building the plumbing for this is a TODO for now.
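As a very rough sketch of what that save/restore flow could look like, and treating the exact vppcfg flags as an assumption on my part rather than gospel, it would be something along these lines: validate the saved YAML, then compute the VPP commands needed to converge on it at container start.

pim@summer:~$ vppcfg check -c vppcfg.yaml
pim@summer:~$ vppcfg plan -c vppcfg.yaml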
Acknowledgements
I wanted to give a shout-out to Nardus le Roux who inspired me to contribute this Containerlab VPP node type, and to Roman Dodin for his help getting the Containerlab parts squared away when I got a little bit stuck.
First order of business: get it to ping at all … it’ll go faster from there on out :)