VPP Configuration - Part2


About this series

I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP community merged the Linux Control Plane plugin. I wrote about its deployment to both regular servers like the Supermicro routers that run on our AS8298, as well as virtual machines running in KVM/Qemu.

Now that I’ve been running VPP in production for about half a year, I can’t help but notice one specific drawback: VPP is a programmable dataplane, and by design it does not include any configuration or controlplane management stack. It’s meant to be integrated into a full stack by operators. For end-users, this unfortunately means that typing on the CLI won’t persist any configuration, and if VPP is restarted, it will not pick up where it left off. There’s one developer convenience in the form of the exec command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line by line. However, if any typo is made in the file, processing immediately stops. It’s meant as a convenience for VPP developers, and is certainly not a useful configuration method for all but the simplest topologies.

Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts, I’ll detail the work I’ve done to create a configuration utility that can take a YAML configuration file, compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply the configuration to the dataplane. Welcome to vppcfg!

In this second post of the series, I want to talk a little bit about what planning a path from a running configuration to a desired new configuration might look like.

Note: Code is on my GitHub, but it’s not quite ready for prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves) or reach out by contacting us.

VPP Config: a DAG

Before we dive into my vppcfg code, let me first introduce a mental model of how configuration is built. We rarely stop and think about it, but when we configure our routers (no matter if it’s a Cisco or a Juniper or a VPP router), in our mind we logically order the operations in a very particular way. To state the obvious, if I want to create a sub-interface which also has an address, I would create the sub-int before adding the address, right? Similarly, if I wanted to expose a sub-interface Hu12/0/0.100 in Linux as a LIP, I would create it only after having created a LIP for the parent interface Hu12/0/0, to satisfy Linux’s requirement that all sub-interfaces have a parent interface, like so:

vpp# create sub HundredGigabitEthernet12/0/0 100
vpp# set interface ip address HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# lcp create HundredGigabitEthernet12/0/0 host-if ice0
vpp# lcp create HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# set interface state HundredGigabitEthernet12/0/0 up
vpp# set interface state HundredGigabitEthernet12/0/0.100 up

Of course some of the ordering doesn’t strictly matter. For example, I can set the state of Hu12/0/0.100 up before adding the address, or after adding the address, or even after adding the LIP, but one thing is certain: I cannot set its state to up before it was created in the first place! In the other direction, when removing things, it’s easy to see that you cannot manipulate the state of a sub-interface after deleting it, so to cleanly remove the construction above, I would have to walk the statements back in reverse, like so:

vpp# set interface state HundredGigabitEthernet12/0/0.100 down
vpp# set interface state HundredGigabitEthernet12/0/0 down
vpp# lcp delete HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# lcp delete HundredGigabitEthernet12/0/0 host-if ice0
vpp# set interface ip address del HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# delete sub HundredGigabitEthernet12/0/0.100

Because of this reasonably straightforward ordering, it’s possible to construct a graph showing operations that depend on other operations having been completed beforehand. Such a graph is called a Directed Acyclic Graph, or DAG.

DAG

First some theory (from Wikipedia): A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions. DAGs have numerous scientific and computational applications, but the ones I’m mostly interested in here are dependency mapping and computational scheduling.

A graph is formed by vertices and by edges connecting pairs of vertices, where the vertices are objects that might exist in VPP (interfaces, bridge-domains, VXLAN tunnels, IP addresses, etc), and these objects are connected in pairs by edges. In the case of a directed graph, each edge has an orientation (or direction), from one (source) vertex to another (destination) vertex. A path in a directed graph is a sequence of edges having the property that the ending vertex of each edge in the sequence is the same as the starting vertex of the next edge in the sequence; a path forms a cycle if the starting vertex of its first edge equals the ending vertex of its last edge. A directed acyclic graph is a directed graph that has no cycles, which in this particular case means that an object’s existence can’t rely on other things that ultimately rely back on its own existence.

With that technobabble out of the way: practically speaking, the edges in this graph model dependencies. Let me give a few examples:

  1. The arrow from Sub Interface pointing at BondEther and Physical Int makes the claim that for the sub-int to exist, it depends on the existence of either a BondEthernet, or a PHY.
  2. The arrow from the BondEther to the Physical Int makes the claim that for the BondEthernet to work, it must have one or more PHYs in it.
  3. There is no arrow from BondEther to Sub Interface, which makes the claim that a BondEthernet is independent of sub-interfaces: there is no need for a sub-int to exist in order for a BondEthernet to work.
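
To make the graph idea concrete, here is a minimal sketch using Python's standard-library graphlib module (Python 3.9 or newer). It is only an illustration and not how vppcfg is implemented; the object labels such as "lcp:ice0" are made up for this example. Each key lists the objects it depends on, a topological sort gives a valid creation order, and walking that order backwards gives a valid removal order.

```python
from graphlib import TopologicalSorter

# Hypothetical object labels; each key depends on the objects in its set.
depends_on = {
    "Hu12/0/0": set(),                              # the PHY depends on nothing
    "Hu12/0/0.100": {"Hu12/0/0"},                   # sub-int needs its parent
    "lcp:ice0": {"Hu12/0/0"},                       # LIP needs its interface
    "lcp:ice0.100": {"Hu12/0/0.100", "lcp:ice0"},   # sub-int LIP needs both
    "addr:192.0.2.1/29": {"Hu12/0/0.100"},          # address needs the sub-int
}

# static_order() yields every dependency before the objects that need it,
# and raises graphlib.CycleError if the graph is not acyclic.
create_order = list(TopologicalSorter(depends_on).static_order())

# Removal walks the same order backwards: dependents go first.
remove_order = list(reversed(create_order))

print("create:", create_order)
print("remove:", remove_order)
```

Several orderings can be valid for the same graph (the address and the LIP of the sub-interface are independent of each other, for instance), which matches the observation that some of the ordering doesn't strictly matter.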

VPP Config: Ordering

In my previous post, I talked about a bunch of constraints that make certain YAML configurations invalid (for example, having both dot1q and dot1ad on a sub-interface, that wouldn’t make any sense). Here, I’m going to talk about another type of constraint: Temporal Constraints are statements about the ordering of operations. With the example DAG above, I derive the following constraints:

  • A parent interface must exist before a sub-interface can be created on it
  • An interface (regardless of sub-int or phy) must exist before an IP address can be added to it
  • A LIP can be created on a sub-int only if its parent PHY has a LIP
  • LIPs must be removed from all sub-interfaces before a PHY’s LIP can be removed
  • The admin-state of a sub-interface can only be up if its PHY is up
  • … and so on.

But there’s a second thing to keep in mind, and this is a bit more specific to the VPP configuration operations themselves. Sometimes, I may find that an object already exists, say a sub-interface, but that it has configuration attributes that are not what I wanted. For example, I may have previously configured a sub-int to be of a certain encapsulation dot1q 1000 inner-dot1q 1234, but I changed my mind and want the sub-int to now be dot1ad 1000 inner-dot1q 1234 instead. Some attributes of an interface can be changed on the fly (like the MTU, for example), but some really cannot, and in my example here, the encapsulation change has to be done another way.

I’ll make an obvious but hopefully helpful observation: I can’t create the second sub-int with the same subid, because one already exists (duh). The intuitive way to solve this, of course, is to delete the old sub-int first and then create a new sub-int with the correct attributes (dot1ad outer encapsulation).

Here’s another scenario that illustrates the ordering: Let’s say I want to move an IP address from interface A to interface B. In VPP, I can’t configure the same IP address/prefixlen on two interfaces at the same time, so as with the previous scenario of the encap changing, I will want to remove the IP address from A before adding it to B.

Come to think of it, there are lots of scenarios where remove-before-add is required:

  • If an interface was in bridge-domain A but now wants to be put in bridge-domain B, it’ll have to be removed from the first bridge before being added to the second bridge, because an interface can’t be in two bridges at the same time.
  • If an interface was a member of a BondEthernet, but will be moved to be a member of a bridge-domain now, it will have to be removed from the bond before being added to the bridge, because an interface can’t be both a bondethernet member and a member of a bridge at the same time.
  • And to add to the list, the scenario above: A sub-interface that differs in its intended encapsulation must be removed before a new one with the same subid can be created.

All of these cases can be modeled as edges (arrows) between vertices (objects) in the graph describing the ordering of operations in VPP! I’m now ready to draw two important conclusions:

  1. All objects that differ from their intended configuration must be removed before being added elsewhere, in order to avoid them being referenced/used twice.
  2. All objects must be created before their attributes can be set.
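
As a small illustration of remove-before-add, the sketch below plans an IP address move between two interfaces. It is a simplified stand-in, not vppcfg's code: the function name plan_address_moves and the dictionary layout are assumptions made for this example. The emitted strings follow the CLI forms shown earlier in this post.

```python
def plan_address_moves(running, target):
    """Plan address changes with every removal ahead of every addition.

    Both arguments map an interface name to a list of address/prefixlen
    strings (a hypothetical layout chosen for this sketch).
    """
    removals, additions = [], []
    for ifname in sorted(set(running) | set(target)):
        have = set(running.get(ifname, []))
        want = set(target.get(ifname, []))
        for addr in sorted(have - want):
            removals.append(f"set interface ip address del {ifname} {addr}")
        for addr in sorted(want - have):
            additions.append(f"set interface ip address {ifname} {addr}")
    # Removing first guarantees an address is never on two interfaces at once.
    return removals + additions


running = {"GigabitEthernet3/0/0": ["192.0.2.1/29"]}
target = {"GigabitEthernet3/0/1": ["192.0.2.1/29"]}
plan = plan_address_moves(running, target)
for line in plan:
    print(line)
```

Collecting all removals into one list and all additions into another, rather than handling interfaces one at a time, is what keeps the move safe regardless of the order in which interfaces are visited.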

vppcfg: Path Planning

By thinking about the configuration in this way, I can precisely predict the order of operations needed to go from any running dataplane configuration to any new target dataplane configuration. A so-called path-planner emerges, which has three main phases of execution:

  1. Prune phase (remove objects from VPP that are not in the config)
  2. Create phase (add objects to VPP that are in the config but not VPP)
  3. Sync phase (for each object in the config, bring its runtime-changeable attributes in line with the config)

When removing things, care has to be taken to remove inner-most objects first (first removing LCP, then QinQ, Dot1Q, BondEthernet, and lastly PHY), because indeed, there exists a dependency relationship between objects in this DAG. Conversely, when creating objects, the edges flip their directionality, because creation must be done on outer-most objects first (first creating the PHY, then BondEthernet, Dot1Q, QinQ and lastly LCP).

For example, QinQ/QinAD sub-interfaces should be removed before their intermediary Dot1Q/Dot1AD can be removed. As another example, the MTU of parents should grow before that of their children, while children should shrink before their parents.

Order matters.

Pruning: First, vppcfg will ensure all objects do not have attributes which they should not (eg. IP addresses) and that objects are destroyed that are not needed (ie. have been removed from the target config). After this phase, I am certain that any object that exists in the dataplane, both (a) has the right to exist (because it’s in the target configuration), and (b) has the correct create-time (ie non syncable) attributes.

Creating: Next, vppcfg will ensure that all objects that are not yet present (including the ones that it just removed because they were present but had incorrect attributes), get (re)created in the right order. After this phase, I am certain that all objects in the dataplane now (a) have the right to exist (because they are in the target configuration), (b) have the correct attributes, but newly, also that (c) all objects that are in the target configuration also got created and now exist in the dataplane.

Syncing: Finally, all objects are synchronized with the target configuration (IP addresses, MTU etc), taking care to shrink children before their parents, and growing parents before their children (this is for the special case of any given sub-interface’s MTU having to be equal to or lower than their parent’s MTU).
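
The three phases can be sketched in a few lines of Python. This is a deliberately tiny model and not the vppcfg reconciler: it only knows about sub-interfaces, it treats "encap" as the one create-time attribute and "mtu" as the one syncable attribute, and the function name and data layout are assumptions for the sake of the example.

```python
def plan(running, target):
    """Return CLI-like strings in prune, create, sync order.

    running/target map a sub-interface name to {"encap": str, "mtu": int}.
    """
    prune, create, sync = [], [], []
    present = {}

    # Prune: remove objects that are not in the target, or whose
    # create-time attribute (the encapsulation) differs from the target.
    for name, obj in running.items():
        want = target.get(name)
        if want is None or want["encap"] != obj["encap"]:
            prune.append(f"delete sub {name}")
        else:
            present[name] = obj

    # Create: add everything in the target that is not (or no longer) present.
    for name, want in target.items():
        if name not in present:
            parent, subid = name.rsplit(".", 1)
            create.append(f"create sub {parent} {subid} {want['encap']}")
            present[name] = {"encap": want["encap"], "mtu": None}

    # Sync: bring runtime-changeable attributes in line with the target.
    for name, want in target.items():
        if present[name]["mtu"] != want["mtu"]:
            sync.append(f"set interface mtu packet {want['mtu']} {name}")

    return prune + create + sync


name = "HundredGigabitEthernet12/0/0.100"
running = {name: {"encap": "dot1q 1000 inner-dot1q 1234", "mtu": 1500}}
target = {name: {"encap": "dot1ad 1000 inner-dot1q 1234", "mtu": 2000}}
steps = plan(running, target)
for line in steps:
    print(line)
```

With the encapsulation change from the earlier example, the sketch deletes the old sub-interface, recreates it as dot1ad, and then sets its MTU. Planning from the target to itself yields an empty list, which is the same "nothing left to do" property the demonstrations below rely on.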

vppcfg: Demonstration

I’ll create three configurations and let vppcfg path-plan between them. I start a completely empty VPP dataplane which has two GigabitEthernet and two HundredGigabitEthernet interfaces:

pim@hippo:~/src/vpp$ make run
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

DBGvpp# show interface
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1     down         9000/0/0/0
GigabitEthernet3/0/1              2     down         9000/0/0/0
HundredGigabitEthernet12/0/0      3     down         9000/0/0/0
HundredGigabitEthernet12/0/1      4     down         9000/0/0/0
local0                            0     down          0/0/0/0

Demo 1: First time config (empty VPP)

First, starting simple, I write the following YAML configuration called hippo4.yaml. It defines a few sub-interfaces, a bridgedomain with one QinQ sub-interface Hu12/0/0.101 in it, and it then cross-connects Gi3/0/0.100 with Hu12/0/1.100, keeping the sub-interfaces at an MTU of 2000 (except Hu12/0/0.100, which gets 3000) and their PHYs at an MTU of 9216:

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: HundredGigabitEthernet12/0/1.100
  GigabitEthernet3/0/1:
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 3000
      101:
        mtu: 2000
        encapsulation:
          dot1q: 100
          inner-dot1q: 200
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: GigabitEthernet3/0/0.100

bridgedomains:
  bd10:
    description: "Bridge Domain 10"
    mtu: 2000
    interfaces: [ HundredGigabitEthernet12/0/0.101 ]

If I offer this config to vppcfg and ask it to plan a path, there won’t be any pruning going on, because there are no objects in the newly started VPP dataplane that need to be deleted. But I do expect to see a bunch of sub-interface and one bridge-domain creation, followed by syncing a bunch of interfaces with bridge-domain memberships and L2 Cross Connects. Finally, the MTU of the interfaces will be sync’d to their configured values, and the path is planned like so:

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO    ] root.main: Loading configfile hippo4.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
create sub GigabitEthernet3/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/1 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 101 dot1q 100 inner-dot1q 200 exact-match
create bridge-domain 10
set interface l2 bridge HundredGigabitEthernet12/0/0.101 10
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 pop 2
set interface l2 xconnect GigabitEthernet3/0/0.100 HundredGigabitEthernet12/0/1.100
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 pop 1
set interface l2 xconnect HundredGigabitEthernet12/0/1.100 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 pop 1
set interface mtu 9216 GigabitEthernet3/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 1500 GigabitEthernet3/0/1
set interface mtu packet 9216 GigabitEthernet3/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 GigabitEthernet3/0/0.100
set interface mtu packet 3000 HundredGigabitEthernet12/0/0.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/1.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/0.101
set interface mtu 1500 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/0 up
set interface state GigabitEthernet3/0/0.100 up
set interface state GigabitEthernet3/0/1 up
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/0.100 up
set interface state HundredGigabitEthernet12/0/0.101 up
set interface state HundredGigabitEthernet12/0/1 up
set interface state HundredGigabitEthernet12/0/1.100 up
[INFO    ] root.main: Planning succeeded

On the vppctl commandline, I can simply cut-and-paste these CLI commands and the dataplane ends up configured exactly as desired in the hippo4.yaml configuration file. One nice way to tell if the reconciliation of the config file into the running VPP instance was successful is by running the planner again with the same YAML config file. It should not find anything worth pruning, creating or syncing, and indeed:

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO    ] root.main: Loading configfile hippo4.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO    ] root.main: Planning succeeded

Demo 2: Moving from one config to another

To demonstrate how my reconciliation algorithm works in practice, I decide to invent a radically different configuration for Hippo, called hippo12.yaml, in which a new BondEthernet appears, two of its sub-interfaces are cross connected, Hu12/0/0 now gets a LIP and some IP addresses, and the bridge-domain bd10 is replaced by two others, bd1 and bd11, the former of which also sports a BVI (with a LIP called bvi1) and a VXLAN Tunnel bridged into bd1 for good measure:

bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"

  HundredGigabitEthernet12/0/0:
    lcp: "ice12-0-0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True

  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"
  BondEthernet0:
    mtu: 9000
    lcp: "bond0"
    sub-interfaces:
      10:
        lcp: "bond0.10"
        mtu: 3000
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
           dot1q: 100
           exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
           dot1q: 200
           exact-match: False
      500:
        mtu: 2000
        encapsulation:
           dot1ad: 500
           exact-match: False
      501:
        mtu: 2000
        encapsulation:
           dot1ad: 501
           exact-match: False
  vxlan_tunnel1:
    mtu: 2000

loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]

bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500

vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101

Before writing vppcfg, moving from hippo4.yaml to this radically different hippo12.yaml would have been a nightmare, and I would almost certainly have missed a step and caused an outage. But, due to the fundamental understanding of ordering, and the methodical execution of pruning, creating and syncing the objects, the path planner comes up with the following sequence, which I’ll break down in its three constituent phases:

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO    ] root.main: Loading configfile hippo12.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
set interface state HundredGigabitEthernet12/0/0.101 down
set interface state GigabitEthernet3/0/0.100 down
set interface state HundredGigabitEthernet12/0/0.100 down
set interface state HundredGigabitEthernet12/0/1.100 down
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 disable
set interface l3 HundredGigabitEthernet12/0/0.101
create bridge-domain 10 del
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 disable
set interface l3 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 disable
set interface l3 HundredGigabitEthernet12/0/1.100
delete sub HundredGigabitEthernet12/0/0.101
delete sub GigabitEthernet3/0/0.100
delete sub HundredGigabitEthernet12/0/0.100
delete sub HundredGigabitEthernet12/0/1.100

First, vppcfg concludes that Hu12/0/0.100, Hu12/0/0.101, Hu12/0/1.100 and Gi3/0/0.100 are no longer needed, so it sets them all admin-state down. The bridge-domain bd10 no longer has the right to exist, the poor thing. But before it is deleted, the interface that was in bd10 is removed from it (membership depends on the bridge, so in pruning, dependents are removed before the things they depend on). Considering Hu12/0/1.100 and Gi3/0/0.100 were an L2XC pair before, they are returned to default (L3) mode and, because it’s no longer needed, the VLAN Gymnastics tag rewriting is also cleaned up for both interfaces. Finally, the sub-interfaces that do not appear in the target configuration are deleted, completing the pruning phase.

It then continues with the create phase:

create loopback interface instance 0
create loopback interface instance 1
create bond mode lacp load-balance l34 id 0
create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 instance 1 vni 101 decap-next l2
create sub HundredGigabitEthernet12/0/0 1234 dot1q 1234 exact-match
create sub BondEthernet0 10 dot1q 10 exact-match
create sub BondEthernet0 100 dot1q 100
create sub BondEthernet0 200 dot1q 200
create sub BondEthernet0 500 dot1ad 500
create sub BondEthernet0 501 dot1ad 501
create sub HundredGigabitEthernet12/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
create bridge-domain 1
create bridge-domain 11
lcp create HundredGigabitEthernet12/0/0 host-if ice12-0-0
lcp create BondEthernet0 host-if bond0
lcp create loop0 host-if lo0
lcp create loop1 host-if bvi1
lcp create HundredGigabitEthernet12/0/0.1234 host-if ice0.1234
lcp create BondEthernet0.10 host-if bond0.10
lcp create HundredGigabitEthernet12/0/0.1235 host-if ice0.1234.1000

Here, interfaces are created in order of loopbacks first, then BondEthernets, then Tunnels, and finally sub-interfaces, first creating single-tagged and then creating dual-tagged sub-interfaces. Of course, the BondEthernet has to be created before any sub-int will be able to be created on it. Note that the QinQ Hu12/0/0.1235 will be created after its intermediary parent Hu12/0/0.1234 due to this ordering requirement.

Then, the two new bridgedomains bd1 and bd11 are created, and finally the LIP plumbing is performed, starting with the PHY ice12-0-0 and BondEthernet bond0, then the two loopbacks, and only then advancing to the two single-tag dot1q interfaces and finally the QinQ interface. For LCPs, this is very important, because in Linux, the interfaces are a tree, not a list. ice12-0-0 must be created before its child ice0.1234@ice12-0-0 can be created, and only then can the QinQ ice0.1234.1000@ice0.1234 be created. This creation order follows from the DAG having an edge signalling an LCP depending on the sub-interface, and an edge between the sub-interface with two tags depending on the sub-interface with one tag, and an edge between the single-tagged sub-interface depending on its PHY.

After all this work, vppcfg can assert (a) every object that now exists in VPP is in the target configuration and (b) that any object that exists in the configuration also is present in VPP (with the correct attributes).

But there’s one last thing to do, and that’s to ensure that the attributes that can be changed at runtime (IP addresses, L2XCs, BondEthernet and bridge-domain members, etc) are sync’d into their respective objects in VPP based on what’s in the target configuration:

bond add BondEthernet0 GigabitEthernet3/0/0
bond add BondEthernet0 GigabitEthernet3/0/1
comment { ip link set bond0 address 00:25:90:0c:05:01 }
set interface l2 bridge loop1 1 bvi
set interface l2 bridge BondEthernet0.500 1
set interface l2 tag-rewrite BondEthernet0.500 pop 1
set interface l2 bridge BondEthernet0.501 1
set interface l2 tag-rewrite BondEthernet0.501 pop 1
set interface l2 bridge HundredGigabitEthernet12/0/1 1
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1 disable
set interface l2 bridge vxlan_tunnel1 1
set interface l2 tag-rewrite vxlan_tunnel1 disable
set interface l2 xconnect BondEthernet0.100 BondEthernet0.200
set interface l2 tag-rewrite BondEthernet0.100 pop 1
set interface l2 xconnect BondEthernet0.200 BondEthernet0.100
set interface l2 tag-rewrite BondEthernet0.200 pop 1
set interface state GigabitEthernet3/0/1 down
set interface mtu 9000 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/1 up
set interface mtu packet 9000 GigabitEthernet3/0/0
set interface mtu packet 9000 HundredGigabitEthernet12/0/0
set interface mtu packet 2000 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 vxlan_tunnel1
set interface mtu packet 1500 loop0
set interface mtu packet 1500 loop1
set interface mtu packet 9000 GigabitEthernet3/0/1
set interface mtu packet 1200 HundredGigabitEthernet12/0/0.1234
set interface mtu packet 3000 BondEthernet0.10
set interface mtu packet 2500 BondEthernet0.100
set interface mtu packet 2500 BondEthernet0.200
set interface mtu packet 2000 BondEthernet0.500
set interface mtu packet 2000 BondEthernet0.501
set interface mtu packet 1100 HundredGigabitEthernet12/0/0.1235
set interface state GigabitEthernet3/0/0 down
set interface mtu 9000 GigabitEthernet3/0/0
set interface state GigabitEthernet3/0/0 up
set interface state HundredGigabitEthernet12/0/0 down
set interface mtu 9000 HundredGigabitEthernet12/0/0
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/1 down
set interface mtu 2000 HundredGigabitEthernet12/0/1
set interface state HundredGigabitEthernet12/0/1 up
set interface ip address HundredGigabitEthernet12/0/0 192.0.2.17/30
set interface ip address HundredGigabitEthernet12/0/0 2001:db8:3::1/64
set interface ip address loop0 10.0.0.1/32
set interface ip address loop0 2001:db8::1/128
set interface ip address loop1 10.0.1.1/24
set interface ip address loop1 2001:db8:1::1/64
set interface state HundredGigabitEthernet12/0/0.1234 up
set interface state HundredGigabitEthernet12/0/0.1235 up
set interface state BondEthernet0 up
set interface state BondEthernet0.10 up
set interface state BondEthernet0.100 up
set interface state BondEthernet0.200 up
set interface state BondEthernet0.500 up
set interface state BondEthernet0.501 up
set interface state vxlan_tunnel1 up
set interface state loop0 up
set interface state loop1 up

I’m not gonna lie, it’s a tonne of work, but it’s all a pretty straightforward juggle. The sync phase will look at each object in the config and ensure that the attributes that same object has in the dataplane are present and correct. In my demo, hippo12.yaml creates a lot of interfaces and IP addresses, and changes the MTU of pretty much every interface, but in order:

  • The bondethernet gets its members Gi3/0/0 and Gi3/0/1. As an interesting aside, when VPP creates a BondEthernet it’ll initially assign it an ephemeral MAC address. Then, when its first member is added, the MAC address of the BondEthernet will change to that of the first member. The comment reminds me to also set this MAC on the Linux device bond0. In the future, I’ll add some PyRoute2 code to do that automatically.
  • BridgeDomains are next. The BVI loop1 is added first, then a few sub-interfaces and a tunnel, and VLAN tag-rewriting for tagged interfaces is configured. There are two bridges, but only one of them has members, so there’s not much (in fact, there’s nothing) to do for the other one.
  • L2 Cross Connects can be changed at runtime, and they’re next. The two interfaces BE0.100 and BE0.200 are connected to one another and tag-rewrites are set up for them, considering they are both tagged sub-interfaces.
  • MTU is next. There are two variants of this. The first one, set interface mtu, is actually a change in the DPDK driver to change the maximum allowable frame size. For this change, some interface types have to be brought down first, the max frame size changed, and then brought back up again. For the other variant, set interface mtu packet, the MTU will be changed in a specific order:
    1. PHYs will grow their MTU first, as growing a PHY is guaranteed to be always safe.
    2. Sub-interfaces will shrink QinX first, then Dot1Q/Dot1AD, then untagged interfaces. This is to ensure we do not leave VPP and LinuxCP in a state where a QinQ sub-int has a higher MTU than any of its parents.
    3. Sub-interfaces will grow untagged first, then Dot1Q/Dot1AD, and finally QinX sub-interfaces. Same reason as step 2, no sub-interface will end up with a higher MTU than any of its parents.
    4. PHYs will shrink their MTU last. The YAML configuration validation asserts that no PHY can have an MTU lower than any of its children, so this is safe.
  • Finally, IP addresses are added to Hu12/0/0, loop0 and loop1. I can guarantee that adding IP addresses will not clash with any other interface, because pruning will already have removed IP addresses from interfaces where they don’t belong.
  • And to finish off, the admin state for interfaces is set, again going from PHY, Bond, Tunnel, 1-tagged sub-interfaces and finally 2-tagged sub-interfaces and loopbacks.
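
The four-step packet MTU ordering described in the list above can be sketched as follows. This is an illustration under simplifying assumptions, not vppcfg's implementation: the Iface tuple and the depth field (0 for a PHY, 1 for a single-tagged sub-interface, 2 for a QinX sub-interface) are invented for this example, untagged sub-interfaces are left out, and the MTU values are made up.

```python
from collections import namedtuple

# depth: 0 = PHY, 1 = single-tagged sub-int, 2 = QinX sub-int (assumed model)
Iface = namedtuple("Iface", "name depth current target")


def order_mtu_changes(ifaces):
    """Order packet MTU changes so no child ever exceeds its parent."""
    def cmd(i):
        return f"set interface mtu packet {i.target} {i.name}"

    grow = [i for i in ifaces if i.target > i.current]
    shrink = [i for i in ifaces if i.target < i.current]

    # 1. PHYs grow first.
    steps = [cmd(i) for i in grow if i.depth == 0]
    # 2. Sub-interfaces shrink, deepest tag stack first.
    steps += [cmd(i) for i in sorted(shrink, key=lambda i: -i.depth) if i.depth > 0]
    # 3. Sub-interfaces grow, shallowest tag stack first.
    steps += [cmd(i) for i in sorted(grow, key=lambda i: i.depth) if i.depth > 0]
    # 4. PHYs shrink last.
    steps += [cmd(i) for i in shrink if i.depth == 0]
    return steps


ifaces = [
    Iface("HundredGigabitEthernet12/0/0", 0, 9000, 1500),
    Iface("HundredGigabitEthernet12/0/0.1234", 1, 3000, 1200),
    Iface("HundredGigabitEthernet12/0/0.1235", 2, 2000, 1100),
    Iface("GigabitEthernet3/0/0", 0, 1500, 9000),
    Iface("GigabitEthernet3/0/0.100", 1, 1500, 2000),
]
mtu_steps = order_mtu_changes(ifaces)
for line in mtu_steps:
    print(line)
```

In this example the growing PHY Gi3/0/0 goes first, the QinX sub-interface shrinks before its single-tagged parent, Gi3/0/0.100 grows only after its PHY has grown, and the shrinking PHY Hu12/0/0 goes last.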

Let’s put it to the test:

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan -o hippo4-to-12.exec
[INFO    ] root.main: Loading configfile hippo12.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO    ] vppcfg.reconciler.write: Wrote 94 lines to hippo4-to-12.exec
[INFO    ] root.main: Planning succeeded

pim@hippo:~/src/vppcfg$ vppctl exec ~/src/vppcfg/hippo4-to-12.exec

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO    ] root.main: Loading configfile hippo12.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO    ] root.main: Planning succeeded

Notice that after applying hippo4-to-12.exec, the planner had nothing else to say. VPP is now in the target configuration state, slick!

Demo 3: Returning VPP to empty

This one is easy, but shows the pruning in action. Let’s say I wanted to return VPP to a default configuration without any objects, and its interfaces all at MTU 1500:

interfaces:
  GigabitEthernet3/0/0:
    mtu: 1500
    description: Not Used
  GigabitEthernet3/0/1:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/1:
    mtu: 1500
    description: Not Used

Simply applying that plan:

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo-empty.yaml plan -o 12-to-empty.exec
[INFO    ] root.main: Loading configfile hippo-empty.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO    ] vppcfg.reconciler.write: Wrote 66 lines to 12-to-empty.exec
[INFO    ] root.main: Planning succeeded

pim@hippo:~/src/vppcfg$ vppctl
vpp# exec ~/src/vppcfg/12-to-empty.exec
vpp# show interface
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1      up          1500/0/0/0
GigabitEthernet3/0/1              2      up          1500/0/0/0
HundredGigabitEthernet12/0/0      3      up          1500/0/0/0
HundredGigabitEthernet12/0/1      4      up          1500/0/0/0
local0                            0     down          0/0/0/0

Final notes

Now you may have been wondering why I would call the first file hippo4.yaml and the second one hippo12.yaml. This is because I have 20 such YAML files that bring Hippo into all sorts of esoteric configuration states, and I do this so that I can do a full integration test of any config morphing into any other config:

for i in hippo[0-9]*.yaml; do
  echo "Clearing: Moving to hippo-empty.yaml"
  ./vppcfg -c hippo-empty.yaml > /tmp/vppcfg-exec-empty
  [ -s /tmp/vppcfg-exec-empty ] && vppctl exec /tmp/vppcfg-exec-empty

  for j in hippo[0-9]*.yaml; do
    echo " - Moving to $i .. "
    ./vppcfg -c $i > /tmp/vppcfg-exec_$i
    [ -s /tmp/vppcfg-exec_$i ] && vppctl exec /tmp/vppcfg-exec_$i

    echo " - Moving from $i to $j"
    ./vppcfg -c $j > /tmp/vppcfg-exec_${i}_${j}
    [ -s /tmp/vppcfg-exec_${i}_${j} ] && vppctl exec /tmp/vppcfg-exec_${i}_${j}

    echo " - Checking that from $j to $j is empty"
    ./vppcfg -c $j > /tmp/vppcfg-exec_${j}_${j}_null
  done
done

What this does is start Hippo off with an empty config, then move it to hippo1.yaml, and from there move the configuration to each YAML file and back to hippo1.yaml. Doing this proves that, no matter which configuration I want to obtain, I can get there safely when the VPP dataplane config starts out looking like what is described in hippo1.yaml. I’ll then move it back to empty, and into hippo2.yaml, doing the whole cycle again. So for 20 files, this means roughly 400 configuration transitions. And some of these are special, notably moving from hippoN.yaml to the same hippoN.yaml, which should result in zero diffs.
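
The arithmetic behind that number is easy to check by enumerating the pairs. The sketch below assumes, purely for illustration, that the files are named hippo1.yaml through hippo20.yaml; only the count of 20 files comes from the text.

```python
from itertools import product

# Assumed file names; the real set of 20 files may be numbered differently.
files = [f"hippo{n}.yaml" for n in range(1, 21)]

# Every ordered (from, to) pair, including a file paired with itself.
transitions = list(product(files, repeat=2))

# Pairs where source and destination are the same file must plan to nothing.
identity = [(src, dst) for src, dst in transitions if src == dst]

print(len(transitions), "transitions,", len(identity), "of which are no-ops")
```

The pairs are ordered because moving from A to B exercises different prune and create steps than moving from B to A.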

With this path planner reasonably well tested, I have pretty high confidence that vppcfg can change the dataplane from any existing configuration to any desired target configuration.

What’s next

One thing that I didn’t mention yet, is that the vppcfg path planner works by reading the API configuration state exactly once (at startup), and then it figures out the CLI calls to print without needing to talk to VPP again. This is super useful as it’s a non-intrusive way to inspect the changes before applying them, and it’s a property I’d like to carry forward.

However, I don’t necessarily think that emitting the CLI statements is the best user experience, it’s more for the purposes of analysis that they can be useful. What I really want to do is emit API calls after the plan is created and reviewed/approved, directly reprogramming the VPP dataplane, and likely the Linux network namespace interfaces as well, for example setting the MAC address of a BondEthernet as I showed in that one comment above, or setting interface alias names based on the configured descriptions.

However, the VPP API set needed to do this is not 100% baked yet. For example, I observed crashes when tinkering with BVIs and Loopbacks (thread), and fixed a few obvious errors in the Linux CP API (gerrit), but there are still a few more issues to work through before I can take the next step with vppcfg.

But for now, it’s already helping me out tremendously at IPng Networks and I hope it’ll be useful for others, too.