VPP in Containerlab - Part 1

Containerlab Logo

Introduction

From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance. However, VPP is quite friendly in virtualization. Notably, it runs really well in virtual machines on Qemu/KVM or VMWare. I can pass PCI devices through directly to the guest, and use CPU pinning to give the guest virtual machine access to the underlying physical hardware. In such a mode, VPP performance is almost the same as on bare metal. But did you know that VPP can also run in Docker?

The other day I joined [ZANOG'25] in Durban, South Africa. One of the presenters was Nardus le Roux of Nokia, who showed off a project called [Containerlab], which provides a CLI for orchestrating and managing container-based networking labs. It starts the containers, builds a virtual wiring between them to create lab topologies of the user's choice, and manages the lab lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', and at ZANOG I made a promise to actually do it. So here I go, on a journey to integrate VPP into Containerlab!

Containerized VPP

The folks at [Tigera] maintain a project called Calico, which accelerates the Kubernetes CNI (Container Network Interface) by using [FD.io] VPP. Since Kubernetes has its origins in running containers on Docker, it stands to reason that it should be possible to run a containerized VPP. I start by reading up on how they create their Docker image, and I learn a lot.

Docker Build

Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based on debian:bookworm as well. The build starts off quite modest:

pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean

# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean

CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab

One gotcha: when I install the upstream VPP Debian packages, their postinst script generates a sysctl file and tries to apply it. However, I can't set sysctls in the container, so the build fails. I take a look at the VPP source code and find src/pkg/debian/vpp.postinst, which helpfully contains a means to skip setting the sysctls, using an environment variable called VPP_INSTALL_SKIP_SYSCTL.
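
Paraphrasing from memory, the guard in that postinst looks roughly like this - a sketch, as the exact file and logic may differ between VPP releases:

if [ -z "${VPP_INSTALL_SKIP_SYSCTL}" ]; then
  # Only apply the hugepage sysctls when the override variable is unset;
  # inside a container this step fails, hence the escape hatch.
  sysctl -p /etc/sysctl.d/80-vpp.conf
fi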

Running VPP in Docker

With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it to run well in a Docker environment. There are a few things I make note of:

  1. We may not have hugepages on the host machine, so I'll set all the page sizes to the Linux default of 4kB rather than 2MB or 1GB hugepages. This is a performance regression, but in the case of Containerlab we're not here to build high performance stuff; users will be doing functional testing.
  2. DPDK requires either the UIO or VFIO kernel driver, so that it can bind its so-called poll mode driver to the network cards, and it also requires hugepages. Since my first version will be using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
  3. VPP can run any number of CPU worker threads. In its simplest form, it runs with no workers at all, and the main thread handles the traffic. Of course, this will not be a high performance setup, but since I'm already not using hugepages, a single thread it is.

The VPP startup.conf configuration file I came up with:

pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > clab-startup.conf
unix {
  interactive
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  cli-prompt vpp-clab#
  cli-no-pager
  poll-sleep-usec 100
}

api-trace {
  on
}

memory {
  main-heap-size 512M
  main-heap-page-size 4k
}
buffers {
  buffers-per-numa 16000
  default data-size 2048
  page-size 4k
}

statseg {
  size 64M
  page-size 4k
  per-node-counters on
}

plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { disable }
}
EOF

Just a couple of notes for those who run VPP in production. Each of the *-page-size config settings takes the normal Linux page size of 4kB, which effectively keeps VPP from using any hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the Dockerfile build anyway, as it lives in its own dedicated Debian package called vpp-plugin-dpdk. Finally, I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration. In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab it will not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit of traffic!
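
If you're curious what poll-sleep-usec buys you, here's a quick way to check once the container (from the next section) is running - exact numbers will vary:

# Without poll-sleep-usec, VPP busy-polls at ~100% of one core; with it set
# to 100us, the container should idle in the low single digits of CPU.
docker stats clab-pim --no-stream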

Now, VPP wants to run as root, and it needs a few host features, notably tuntap devices and vhost, as well as a few capabilities, notably NET_ADMIN and SYS_PTRACE. I take a look at the [manpage]:

  • CAP_SYS_NICE: allows the process to set real-time scheduling, CPU affinity and I/O scheduling class, and to migrate and move memory pages.
  • CAP_NET_ADMIN: allows the process to perform various network-related operations: interface configuration, routing tables, nested network namespaces, multicast, promiscuous mode, and so on.
  • CAP_SYS_PTRACE: allows the process to trace arbitrary processes using ptrace(2), and a few related kernel system calls.

Being a networking dataplane implementation, VPP wants to be able to tinker with network devices. This is not typically allowed in Docker containers, although the Docker developers did make some concessions for those containers that need just that little bit more access. They describe it in their [docs] as follows:

> The --privileged flag gives all capabilities to the container. When the operator executes docker run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host. Use this flag with caution. For more information about the --privileged flag, see the docker run reference.

Warning
At this point, I feel I should note that running a Docker container with the --privileged flag set does give it a lot of privileges. A container with --privileged is not a securely sandboxed process: containers in this mode can get a root shell on the host and take control over the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:

pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
                --cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
                --device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
                --privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
                docker.io/pimvanpelt/vpp-containerlab
clab-pim

Configuring VPP in Docker

And with that, the Docker container is running! I post a screenshot on [Mastodon] and my buddy John responds with a polite but firm insistence that I explain myself. Here you go, buddy :)

In another terminal, I can play around with this VPP instance a little bit:

pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 

root@d57c3716eee9:/# ps auxw   
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  2.2  0.2 17498852 160300 ?     Rs   15:11   0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root          10  0.0  0.0   4192  3388 pts/0    Ss   15:11   0:00 bash
root          18  0.0  0.0   8104  4056 pts/0    R+   15:12   0:00 ps auxw

root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___ 
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/    

vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count     
local0                            0     down          0/0/0/0       

Slick! I can see that the container has an eth0 device, which Docker has connected to the main bridged network. For now, there's only one process running: PID 1 proudly shows VPP (in Docker, the CMD simply replaces init). Later on, I can imagine running a few more daemons like SSH and so on, but for now, I'm happy.
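
If I do end up wanting more daemons later, the usual pattern is a small wrapper entrypoint; a minimal sketch (the sshd line is hypothetical - this image doesn't ship OpenSSH):

#!/bin/sh
# Hypothetical entrypoint: start auxiliary daemons first, then exec VPP so
# that it takes over PID 1 and receives signals from Docker directly.
/usr/sbin/sshd        # assumes openssh-server was added to the image
exec /usr/bin/vpp -c /etc/vpp/startup.conf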

Looking at VPP itself, it has no network interfaces yet, except for the default local0 interface.

Adding Interfaces in Docker

But if I don’t have DPDK, how will I add interfaces? Enter veth(4). From the [manpage], I learn that veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to a physical network device in another namespace, but can also be used as standalone network devices. veth devices are always created in interconnected pairs.

Of course, Docker users will recognize this. veth pairs are the bread and butter of how containers communicate with one another - and with the host they're running on. I can simply create a Docker network and connect a running container to it, like so:

pim@summer:~$ docker network create --driver=bridge clab-network \
                     --subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''

The first command here creates a new network called clab-network in Docker. As a result, a new bridge called br-5711b95c6c32 shows up on the host; the bridge name is derived from the ID of the Docker network object. Seeing as I gave the network an IPv4 and an IPv6 subnet, the bridge gets configured with the first address in both:

pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name       bridge id               STP enabled     interfaces
br-5711b95c6c32   8000.0242099728c6       no              veth021e363


pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32  UP     192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64 

The second command creates a veth pair and puts one half of it in the bridge - that's the veth021e363 interface above. The other half pops up as eth1 in the Docker container:

pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 
eth1@if530577    UP             02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 
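
Under the hood, docker network connect is doing roughly this ip-link dance; a hand-rolled equivalent (with interface names made up by me) would be:

# Create a veth pair, move one end into the container's network namespace
# (addressed by the container's PID), and enslave the other to the bridge.
ip link add veth-host type veth peer name veth-ctr
ip link set veth-ctr netns $(docker inspect -f '{{.State.Pid}}' clab-pim)
ip link set dev veth-host master br-5711b95c6c32 up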

One of the many awesome features of VPP is its ability to attach to these veth devices by means of its af-packet driver, reusing the same MAC address (in this case 02:42:c0:00:02:02). I first take a look at the Linux [manpage] for it, and then read up on the VPP [documentation] on the topic.

However, my attention is drawn to Docker assigning an IPv4 and IPv6 address to the container:

root@d57c3716eee9:/# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0@if530566    UP             172.17.0.2/16
eth1@if530577    UP             192.0.2.2/24 2001:db8::2/64 fe80::42:c0ff:fe00:202/64
root@d57c3716eee9:/# ip addr del 192.0.2.2/24  dev eth1
root@d57c3716eee9:/# ip addr del 2001:db8::2/64 dev eth1

I decide to remove them here because, in the end, eth1 will be owned by VPP, so VPP should be the one setting the IPv4 and IPv6 addresses. For the life of me, I don't see how I can keep Docker from assigning IPv4 and IPv6 addresses to this container ... and the [docs] seem to be off as well: they suggest I can pass a flag --ipv4=False, but that flag doesn't exist, at least not in my Bookworm Docker variant. I make a mental note to discuss this with the folks in the Containerlab community.

Anyway, armed with this knowledge I can bind the container-side veth pair called eth1 to VPP, like so:

root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___ 
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/    

vpp-clab# create host-interface name eth1 hw-addr 02:42:c0:00:02:02
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
  L3 192.0.2.2/24
  L3 2001:db8::2/64
local0 (dn):

Results

After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02 (the current stable release), started a container with it, and added a network bridge in Docker which connects the host summer to the container. Proof, as they say, is in the ping-pudding:

pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms

--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms

--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms

And in case that simple ping-test wasn’t enough to get you excited, here’s a packet trace from VPP itself, while I’m performing this ping:

vpp-clab# trace add af-packet-input 100
vpp-clab# wait 3
vpp-clab# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:07:03:979275: af-packet-input
  af_packet: hw_if_index 1 rx-queue 0 next-index 4
    block 47:
      address 0x7fbf23b7d000 version 2 seq_num 48 pkt_num 0
    tpacket3_hdr:
      status 0x20000001 len 98 snaplen 98 mac 92 net 106
      sec 0x68164381 nsec 0x258e7659 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
00:07:03:979293: ethernet-input
  IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
00:07:03:979306: ip4-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979315: ip4-lookup
  fib 0 dpo-idx 9 flow hash: 0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979322: ip4-receive
    fib:0 adj:9 flow:0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-echo-request
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979326: ip4-load-balance
  fib 0 dpo-idx 5 flow hash: 0x00000000
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979325: ip4-rewrite
  tx_sw_if_index 1 dpo-idx 5 : ipv4 via 192.0.2.1 eth1: mtu:1500 next:3 flags:[] 0242099728c60242c00002020800 flow hash: 0x00000000
  00000000: 0242099728c60242c00002020800450000542dc44000400188e1c0000202c000
  00000020: 02010000141652cd00018143166800000000399d0900000000001011
00:07:03:979326: eth1-output
  eth1 flags 0x02180005
  IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979327: eth1-tx
  af_packet: hw_if_index 1 tx-queue 0
    tpacket3_hdr:
      status 0x1 len 108 snaplen 108 mac 0 net 0
      sec 0x0 nsec 0x0 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
    buffer 0xf97c4:
      current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
      local l2-hdr-offset 0 l3-hdr-offset 14 
    IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
    ICMP: 192.0.2.2 -> 192.0.2.1
      tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
      fragment id 0x2dc4, flags DONT_FRAGMENT
    ICMP echo_reply checksum 0x1416 id 21197

Well, that's a mouthful, isn't it! Here, I get to show you VPP in action. After receiving the packet on its af-packet-input node from 192.0.2.1 (Summer, who is pinging us) to 192.0.2.2 (the VPP container), the packet traverses the dataplane graph. It goes through ethernet-input, then ip4-input, which sees that it's destined to an IPv4 address configured locally, so the packet is handed to ip4-receive. That one sees that the IP protocol is ICMP, so it hands the packet to ip4-icmp-input, which notices that the packet is an ICMP echo request, so off to ip4-icmp-echo-request our little packet goes. The ICMP plugin in VPP now answers by ip4-rewrite'ing the packet, addressing the reply to 192.0.2.1 at MAC address 02:42:09:97:28:c6 (this is Summer, the host doing the pinging!), after which the newly created ICMP echo-reply is handed to eth1-output, which marshals it back into the kernel's AF_PACKET interface using eth1-tx.

Boom. I could not be more pleased.

What’s Next

This was a nice exercise for me! I'm going this direction because the [Containerlab] framework starts containers with given NOS images, not too dissimilar from the one I just made, and then attaches veth pairs between the containers. I started dabbling with a [pull-request], but I got stuck on the part of the Containerlab code that pre-deploys config files into the containers. You see, I will need to generate two files:

  1. A startup.conf file that is specific to each Containerlab Docker container. I'd like them to each set their own hostname, so that the CLI has a unique prompt. I can do this by setting unix { cli-prompt {{ .ShortName }}# } in the template renderer.
  2. Containerlab knows all of the veth pairs that it plans to create in each VPP container. I'll need it to write a little snippet of config that does the create host-interface spiel, to attach those veth pairs to the VPP dataplane - see the sketch below.
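
For that second snippet, I imagine the generated config will look a lot like what I typed by hand earlier - a sketch using this article's example names, not actual Containerlab output:

create host-interface name eth1 hw-addr 02:42:c0:00:02:02
set interface name host-eth1 eth1
set interface state eth1 up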

I reached out to Roman from Nokia, who is one of the authors and the current maintainer of Containerlab. Roman was keen to help out, and seeing as he knows the Containerlab stuff well and I know the VPP stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones setup that connects a few VPP containers together with an SR Linux node in a lab. Stand by!

Once we have that, there’s still quite some work for me to do. Notably:

  • Configuration persistence. clab allows you to save the running config. For that, I'll need to introduce [vppcfg] and a means to invoke it when the lab operator wants to save their config, and then to reconfigure VPP from it when the container restarts - see the sketch below.
  • I'll need to share a few files between clab and the host, notably the startup.conf and vppcfg.yaml, as well as some manual pre- and post-flight configuration for the more esoteric stuff. Building the plumbing for this is a TODO for now.
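
The persistence flow I have in mind might look something like this - purely a design sketch, and the vppcfg flags here are assumptions on my part, not a settled interface:

# Operator saves the running dataplane config to YAML ...
vppcfg dump -o /etc/vpp/vppcfg.yaml
# ... and on container restart, a plan is rendered and replayed into VPP.
vppcfg plan -c /etc/vpp/vppcfg.yaml -o /etc/vpp/bootstrap.vpp
vppctl exec /etc/vpp/bootstrap.vpp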

Acknowledgements

I wanted to give a shout-out to Nardus le Roux who inspired me to contribute this Containerlab VPP node type, and to Roman Dodin for his help getting the Containerlab parts squared away when I got a little bit stuck.

First order of business: get it to ping at all … it’ll go faster from there on out :)