About this series
I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP community merged the Linux Control Plane plugin. I wrote about its deployment to both regular servers like the Supermicro routers that run on our AS8298, as well as virtual machines running in KVM/Qemu.
Now that I’ve been running VPP in production for about half a year, I can’t help but notice one specific
drawback: VPP is a programmable dataplane, and by design it does not include any configuration or
controlplane management stack. It’s meant to be integrated into a full stack by operators. For end-users,
this unfortunately means that typing on the CLI won’t persist any configuration, and if VPP is restarted,
it will not pick up where it left off. There’s one developer convenience in the form of the exec
command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line
by line. However, if any typo is made in the file, processing immediately stops. It’s meant as a convenience
for VPP developers, and is certainly not a useful configuration method for all but the simplest topologies.
Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts,
I’ll detail the work I’ve done to create a configuration utility that can take a YAML configuration file,
compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply
the configuration to the dataplane. Welcome to vppcfg!
In this second post of the series, I want to talk a little bit about what planning a path from a running configuration to a desired new configuration might look like.
Note: Code is on my GitHub, but it’s not quite ready for prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves) or reach out by contacting us.
VPP Config: a DAG
Before we dive into my vppcfg code, let me first introduce a mental model of how configuration is built. We
rarely stop and think about it, but when we configure our routers (no matter if it’s a Cisco or a Juniper or
a VPP router), in our mind we logically order the operations in a very particular way. To state the obvious,
if I want to create a sub-interface which also has an address, I would create the sub-int before adding the
address, right? Similarly, if I wanted to expose a sub-interface Hu12/0/0.100 in Linux as a LIP, I would
create it only after having created a LIP for the parent interface Hu12/0/0, to satisfy Linux’s
requirement that all sub-interfaces have a parent interface, like so:
vpp# create sub HundredGigabitEthernet12/0/0 100
vpp# set interface ip address HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# lcp create HundredGigabitEthernet12/0/0 host-if ice0
vpp# lcp create HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# set interface state HundredGigabitEthernet12/0/0 up
vpp# set interface state HundredGigabitEthernet12/0/0.100 up
Of course some of the ordering doesn’t strictly matter. For example, I can set the state of
Hu12/0/0.100 up before adding the address, or after adding the address, or even after adding the
LIP, but one thing is certain: I cannot set its state to up before it was created in the first place!
In the other direction, when removing things, it’s easy to see that you cannot manipulate the state
of a sub-interface after deleting it, so to cleanly remove the construction above, I would have to
walk the statements back in reverse, like so:
vpp# set interface state HundredGigabitEthernet12/0/0.100 down
vpp# set interface state HundredGigabitEthernet12/0/0 down
vpp# lcp delete HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# lcp delete HundredGigabitEthernet12/0/0 host-if ice0
vpp# set interface ip address del HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# delete sub HundredGigabitEthernet12/0/0.100
Because of this reasonably straightforward ordering, it’s possible to construct a graph showing operations that depend on other operations having been completed beforehand. Such a graph is called a Directed Acyclic Graph, or DAG.
First some theory (from Wikipedia): A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions. DAGs have numerous scientific and computational applications, but the ones I’m mostly interested in here are dependency mapping and computational scheduling.
A graph is formed by vertices and by edges connecting pairs of vertices, where the vertices are objects that might exist in VPP (interfaces, bridge-domains, VXLAN tunnels, IP addresses, etc), and these objects are connected in pairs by edges. In the case of a directed graph, each edge has an orientation (or direction), from one (source) vertex to another (destination) vertex. A path in a directed graph is a sequence of edges having the property that the ending vertex of each edge in the sequence is the same as the starting vertex of the next edge in the sequence; a path forms a cycle if the starting vertex of its first edge equals the ending vertex of its last edge. A directed acyclic graph is a directed graph that has no cycles, which in this particular case means that an object’s existence can’t rely on other objects that ultimately rely back on its own existence.
With that technobabble out of the way: practically speaking, the edges in this graph model dependencies. Let me give a few examples:
- The arrow from Sub Interface pointing at BondEther and Physical Int makes the claim that for the sub-int to exist, it depends on the existence of either a BondEthernet, or a PHY.
- The arrow from BondEther to Physical Int makes the claim that for the BondEthernet to work, it must have one or more PHYs in it.
- There is no arrow from BondEther to Sub Interface, which makes the claim that this dependency only goes one way: there is no need for a sub-int to exist in order for a BondEthernet to work.
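The three edges above can be written down as a small dependency map. This is a minimal Python sketch, assuming one vertex per object type (the names PHY, BondEthernet and SubInterface, and the helper functions, are hypothetical); it uses the standard library’s graphlib.TopologicalSorter, which refuses to order a graph that contains a cycle, which is exactly the DAG property quoted above:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical encoding of the edges described above: each object type maps
# to the set of object types it depends on (which must exist first).
DEPENDS_ON = {
    "PHY": set(),
    "BondEthernet": {"PHY"},                  # a bond needs one or more PHYs
    "SubInterface": {"BondEthernet", "PHY"},  # a sub-int sits on a bond or a PHY
}


def is_dag(graph):
    """A directed graph is a DAG iff it can be topologically ordered."""
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


def creation_order(graph):
    """Every object type comes after everything it depends on."""
    return list(TopologicalSorter(graph).static_order())


def removal_order(graph):
    """Removal walks the same order backwards: dependents go first."""
    return creation_order(graph)[::-1]


print(creation_order(DEPENDS_ON))        # ['PHY', 'BondEthernet', 'SubInterface']
print(removal_order(DEPENDS_ON))         # ['SubInterface', 'BondEthernet', 'PHY']
print(is_dag({"A": {"B"}, "B": {"A"}}))  # False: A and B depend on each other
```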
VPP Config: Ordering
In my previous post, I talked about a bunch of constraints that make certain YAML configurations invalid (for example, having both dot1q and dot1ad on a sub-interface, that wouldn’t make any sense). Here, I’m going to talk about another type of constraint: Temporal Constraints are statements about the ordering of operations. With the example DAG above, I derive the following constraints:
- A parent interface must exist before a sub-interface can be created on it
- An interface (regardless of sub-int or phy) must exist before an IP address can be added to it
- A LIP can be created on a sub-int only if its parent PHY has a LIP
- LIPs must be removed from all sub-interfaces before a PHY’s LIP can be removed
- The admin-state of a sub-interface can only be up if its PHY is up
- … and so on.
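Temporal constraints like these are easy to check mechanically against a proposed sequence of operations. A minimal sketch, using hypothetical shorthand labels for the six commands of the first CLI example rather than real VPP syntax; each constraint is a pair whose first step must come before its second:

```python
# Hypothetical shorthand for the six CLI commands in the first example.
PLAN = [
    "create-sub Hu12/0/0.100",
    "address Hu12/0/0.100",
    "lcp Hu12/0/0",
    "lcp Hu12/0/0.100",
    "up Hu12/0/0",
    "up Hu12/0/0.100",
]

# (first, later): `first` must appear in the plan before `later`.
CONSTRAINTS = [
    ("create-sub Hu12/0/0.100", "address Hu12/0/0.100"),  # exists before address
    ("create-sub Hu12/0/0.100", "lcp Hu12/0/0.100"),      # exists before its LIP
    ("lcp Hu12/0/0", "lcp Hu12/0/0.100"),                 # parent LIP first
    ("create-sub Hu12/0/0.100", "up Hu12/0/0.100"),       # exists before admin up
    ("up Hu12/0/0", "up Hu12/0/0.100"),                   # PHY up before sub-int up
]


def violations(plan, constraints):
    """Return every constraint the plan breaks; steps absent from the plan are ignored."""
    position = {step: i for i, step in enumerate(plan)}
    return [
        (first, later)
        for first, later in constraints
        if first in position and later in position and position[first] > position[later]
    ]


print(violations(PLAN, CONSTRAINTS))     # []
swapped = [PLAN[1], PLAN[0]] + PLAN[2:]  # address before create-sub
print(violations(swapped, CONSTRAINTS))  # [('create-sub Hu12/0/0.100', 'address Hu12/0/0.100')]
```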
But there’s a second thing to keep in mind, and this is a bit more specific to the VPP configuration
operations themselves. Sometimes, I may find that an object already exists, say a sub-interface, but
that it has configuration attributes that are not what I wanted. For example, I may have previously
configured a sub-int to be of a certain encapsulation dot1q 1000 inner-dot1q 1234, but I changed
my mind and want the sub-int to now be dot1ad 1000 inner-dot1q 1234 instead. Some attributes of
an interface can be changed on the fly (like the MTU, for example), but some really cannot, and in
my example here, the encapsulation change has to be done another way.
I’ll make an obvious but hopefully helpful observation: I can’t create the second sub-int with
the same subid, because one already exists (duh). The intuitive way to solve this, of course, is to
delete the old sub-int first and then create a new sub-int with the correct attributes (dot1ad
outer encapsulation).
Here’s another scenario that illustrates the ordering: Let’s say I want to move an IP address from interface A to interface B. In VPP, I can’t configure the same IP address/prefixlen on two interfaces at the same time, so as with the previous scenario of the encap changing, I will want to remove the IP address from A before adding it to B.
Come to think of it, there are lots of scenarios where remove-before-add is required:
- If an interface was in bridge-domain A but now wants to be put in bridge-domain B, it’ll have to be removed from the first bridge before being added to the second bridge, because an interface can’t be in two bridges at the same time.
- If an interface was a member of a BondEthernet, but will be moved to be a member of a bridge-domain now, it will have to be removed from the bond before being added to the bridge, because an interface can’t be both a bondethernet member and a member of a bridge at the same time.
- And to add to the list, the scenario above: A sub-interface that differs in its intended
encapsulation must be removed before a new one with the same subid can be created.
All of these cases can be modeled as edges (arrows) between vertices (objects) in the graph describing the ordering of operations in VPP! I’m now ready to draw two important conclusions:
- All objects that differ from their intended configuration must be removed before being added elsewhere, in order to avoid them being referenced/used twice.
- All objects must be created before their attributes can be set.
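The first conclusion can be illustrated with the IP address move from interface A to interface B. A minimal sketch, assuming running and target state are given as hypothetical dicts mapping an interface name to a set of prefixes; it emits every removal before any addition, using the VPP CLI syntax shown earlier:

```python
def plan_addresses(running, target):
    """Emit all removals before any additions, so an address is never on two interfaces."""
    removals, additions = [], []
    for ifname in sorted(running):
        for addr in sorted(running[ifname] - target.get(ifname, set())):
            removals.append(f"set interface ip address del {ifname} {addr}")
    for ifname in sorted(target):
        for addr in sorted(target[ifname] - running.get(ifname, set())):
            additions.append(f"set interface ip address {ifname} {addr}")
    return removals + additions


# Move 192.0.2.1/29 from loop0 (interface A) to loop1 (interface B).
for cmd in plan_addresses({"loop0": {"192.0.2.1/29"}}, {"loop1": {"192.0.2.1/29"}}):
    print(cmd)
```

Addresses that are already where they belong produce no commands at all.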
vppcfg: Path Planning
By thinking about the configuration in this way, I can precisely predict the order of operations needed to go from any running dataplane configuration to any new target dataplane configuration. A so-called path planner emerges, which has three main phases of execution:
- Prune phase (remove objects from VPP that are not in the config)
- Create phase (add objects to VPP that are in the config but not VPP)
- Sync phase (synchronize the attributes of each object in the configuration)
When removing things, care has to be taken to remove inner-most objects first (first removing LCP, then QinQ, Dot1Q, BondEthernet, and lastly PHY), because indeed, there exists a dependency relationship between objects in this DAG. Conversely, when creating objects, the edges flip their directionality, because creation must be done on outer-most objects first (first creating the PHY, then BondEthernet, Dot1Q, QinQ and lastly LCP).
For example, QinQ/QinAD sub-interfaces must be removed before their intermediary Dot1Q/Dot1AD can be removed. Similarly, the MTU of a parent should grow before that of its children, while children should shrink before their parent.
Order matters.
Pruning: First, vppcfg will ensure that no object has attributes it should not have (e.g. IP
addresses), and that objects which are not needed (i.e. have been removed from the target config)
are destroyed. After this phase, I am certain that any object that exists in the dataplane
both (a) has the right to exist (because it’s in the target configuration), and (b) has the
correct create-time (i.e. non-syncable) attributes.
Creating: Next, vppcfg will ensure that all objects that are not yet present (including the ones that
it just removed because they were present but had incorrect attributes) get (re)created in the
right order. After this phase, I am certain that all objects in the dataplane now (a) have the
right to exist (because they are in the target configuration), (b) have the correct create-time attributes,
and newly, also that (c) all objects that are in the target configuration got created and
now exist in the dataplane.
Syncing: Finally, all objects are synchronized with the target configuration (IP addresses, MTU etc), taking care to shrink children before their parents, and to grow parents before their children (this is for the special case of any given sub-interface’s MTU having to be equal to or lower than its parent’s MTU).
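The three phases can be sketched as follows. This is a hypothetical simplification, not vppcfg’s actual code: objects are plain names with a single string of create-time attributes, dependencies come from a hand-written map, and the sync phase is only a placeholder. An object whose create-time attributes differ, or whose parent is being pruned, is removed children-first and then re-created parents-first:

```python
from graphlib import TopologicalSorter


def plan(running, target, parents):
    """
    Hypothetical three-phase planner over abstract objects.

    running, target: object name -> create-time attributes (e.g. encapsulation)
    parents:         object name -> set of object names it depends on
    """
    order = list(TopologicalSorter(parents).static_order())  # parents first

    # Decide what to prune. Walking in dependency order settles parents before
    # children, so a child of a pruned parent is pruned too.
    pruned = set()
    for name in order:
        if name not in running:
            continue
        wrong = running[name] != target.get(name)  # absent from target, or attrs differ
        if wrong or parents.get(name, set()) & pruned:
            pruned.add(name)

    steps = []
    # Prune phase: remove children before their parents.
    steps += [f"delete {name}" for name in reversed(order) if name in pruned]
    # Create phase: create parents before children, including objects that
    # were just pruned because their create-time attributes were wrong.
    steps += [
        f"create {name} {target[name]}"
        for name in order
        if name in target and (name not in running or name in pruned)
    ]
    # Sync phase: placeholder for runtime attributes (addresses, MTU, state, ...).
    steps += [f"sync {name}" for name in order if name in target]
    return steps


parents = {
    "Hu12/0/0": set(),
    "Hu12/0/0.100": {"Hu12/0/0"},
    "Hu12/0/0.101": {"Hu12/0/0.100"},
    "Gi3/0/0": set(),
    "Gi3/0/0.100": {"Gi3/0/0"},
}
running = {
    "Hu12/0/0": "phy",
    "Hu12/0/0.100": "dot1q 100",
    "Hu12/0/0.101": "dot1q 100 inner-dot1q 200",
    "Gi3/0/0": "phy",
    "Gi3/0/0.100": "dot1q 100",
}
target = {
    "Hu12/0/0": "phy",
    "Hu12/0/0.100": "dot1ad 100",
    "Hu12/0/0.101": "dot1ad 100 inner-dot1q 200",
    "Gi3/0/0": "phy",
}
for step in plan(running, target, parents):
    print(step)
```

In this example the encapsulation change on Hu12/0/0.100 forces its QinQ child to be deleted first and re-created after it, while Gi3/0/0.100 is simply pruned because the target no longer mentions it.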
vppcfg: Demonstration
I’ll create three configurations and let vppcfg path-plan between them. I start with a completely empty VPP dataplane, which has two GigabitEthernet and two HundredGigabitEthernet interfaces:
pim@hippo:~/src/vpp$ make run
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
DBGvpp# show interface
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
GigabitEthernet3/0/0 1 down 9000/0/0/0
GigabitEthernet3/0/1 2 down 9000/0/0/0
HundredGigabitEthernet12/0/0 3 down 9000/0/0/0
HundredGigabitEthernet12/0/1 4 down 9000/0/0/0
local0 0 down 0/0/0/0
Demo 1: First time config (empty VPP)
First, starting simple, I write the following YAML configuration called hippo4.yaml. It defines a
few sub-interfaces, a bridge-domain with one QinQ sub-interface Hu12/0/0.101 in it, and it then
cross-connects Gi3/0/0.100 with Hu12/0/1.100, keeping most sub-interfaces at an MTU of 2000
(Hu12/0/0.100 gets 3000) and their PHYs at an MTU of 9216:
interfaces:
  GigabitEthernet3/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: HundredGigabitEthernet12/0/1.100
  GigabitEthernet3/0/1:
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 3000
      101:
        mtu: 2000
        encapsulation:
          dot1q: 100
          inner-dot1q: 200
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: GigabitEthernet3/0/0.100
bridgedomains:
  bd10:
    description: "Bridge Domain 10"
    mtu: 2000
    interfaces: [ HundredGigabitEthernet12/0/0.101 ]
If I offer this config to vppcfg and ask it to plan a path, there won’t be any pruning going on,
because there are no objects in the newly started VPP dataplane that need to be deleted. But I do expect
to see a bunch of sub-interface creations and one bridge-domain creation, followed by syncing a bunch of
interfaces with bridge-domain memberships and L2 Cross Connects. Finally, the MTU of the interfaces will
be sync’d to their configured values, and the path is planned like so:
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
create sub GigabitEthernet3/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/1 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 101 dot1q 100 inner-dot1q 200 exact-match
create bridge-domain 10
set interface l2 bridge HundredGigabitEthernet12/0/0.101 10
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 pop 2
set interface l2 xconnect GigabitEthernet3/0/0.100 HundredGigabitEthernet12/0/1.100
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 pop 1
set interface l2 xconnect HundredGigabitEthernet12/0/1.100 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 pop 1
set interface mtu 9216 GigabitEthernet3/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 1500 GigabitEthernet3/0/1
set interface mtu packet 9216 GigabitEthernet3/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 GigabitEthernet3/0/0.100
set interface mtu packet 3000 HundredGigabitEthernet12/0/0.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/1.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/0.101
set interface mtu 1500 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/0 up
set interface state GigabitEthernet3/0/0.100 up
set interface state GigabitEthernet3/0/1 up
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/0.100 up
set interface state HundredGigabitEthernet12/0/0.101 up
set interface state HundredGigabitEthernet12/0/1 up
set interface state HundredGigabitEthernet12/0/1.100 up
[INFO ] root.main: Planning succeeded
On the vppctl commandline, I can simply cut-and-paste these CLI commands and the dataplane ends up
configured exactly as desired in the hippo4.yaml configuration file. One nice way to tell if
the reconciliation of the config file into the running VPP instance was successful is by running the
planner again with the same YAML config file. It should not find anything worth pruning, creating or
syncing, and indeed:
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] root.main: Planning succeeded
Demo 2: Moving from one config to another
To demonstrate how my reconciliation algorithm works in practice, I decide to invent a radically
different configuration for Hippo, called hippo12.yaml, in which a new BondEthernet appears,
two of its sub-interfaces are cross connected, Hu12/0/0 now gets a LIP and some IP addresses, and
the bridge-domain bd10 is replaced by two others, bd1 and bd11, the former of which also sports
a BVI (with a LIP called bvi1) and a VXLAN Tunnel bridged into bd1 for good measure:
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]
interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"
  HundredGigabitEthernet12/0/0:
    lcp: "ice12-0-0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"
  BondEthernet0:
    mtu: 9000
    lcp: "bond0"
    sub-interfaces:
      10:
        lcp: "bond0.10"
        mtu: 3000
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
          dot1q: 100
          exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
          dot1q: 200
          exact-match: False
      500:
        mtu: 2000
        encapsulation:
          dot1ad: 500
          exact-match: False
      501:
        mtu: 2000
        encapsulation:
          dot1ad: 501
          exact-match: False
  vxlan_tunnel1:
    mtu: 2000
loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]
bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500
vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101
Before writing vppcfg, the art of moving from hippo4.yaml to this radically different hippo12.yaml
would have been a nightmare, and would almost certainly have caused me to miss a step and cause an outage. But, due to
the fundamental understanding of ordering, and the methodical execution of pruning, creating and
syncing the objects, the path planner comes up with the following sequence, which I’ll break down
into its three constituent phases:
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
set interface state HundredGigabitEthernet12/0/0.101 down
set interface state GigabitEthernet3/0/0.100 down
set interface state HundredGigabitEthernet12/0/0.100 down
set interface state HundredGigabitEthernet12/0/1.100 down
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 disable
set interface l3 HundredGigabitEthernet12/0/0.101
create bridge-domain 10 del
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 disable
set interface l3 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 disable
set interface l3 HundredGigabitEthernet12/0/1.100
delete sub HundredGigabitEthernet12/0/0.101
delete sub GigabitEthernet3/0/0.100
delete sub HundredGigabitEthernet12/0/0.100
delete sub HundredGigabitEthernet12/0/1.100
First, vppcfg concludes that Hu12/0/0.100, Hu12/0/0.101, Hu12/0/1.100 and Gi3/0/0.100 are no longer
needed, so it sets them all admin-state down. The bridge-domain bd10 no longer has the right to
exist, the poor thing. But before it is deleted, the interface that was in bd10 must be pruned
(membership depends on the bridge, so when pruning, dependents are removed before the objects they depend on).
Considering Hu12/0/1.100 and Gi3/0/0.100 were an L2XC pair before, they are returned to default
(L3) mode and, because it’s no longer needed, the VLAN Gymnastics
tag rewriting is also cleaned up for both interfaces. Finally, the sub-interfaces that do not appear
in the target configuration are deleted, completing the pruning phase.
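The bridge-domain part of that pruning follows a simple pattern: detach each member first, then delete the bridge it depended on. A hypothetical helper (not vppcfg’s code) that reproduces the three bd10 commands from the plan above; as in that plan, it disables tag-rewriting unconditionally before returning the member to L3 mode:

```python
def prune_bridge_domain(bd_id, members):
    """
    Hypothetical helper: detach every member (undo tag-rewrite, return it to
    L3 mode) before deleting the bridge-domain that the membership depends on.
    """
    cmds = []
    for ifname in members:
        cmds.append(f"set interface l2 tag-rewrite {ifname} disable")
        cmds.append(f"set interface l3 {ifname}")
    cmds.append(f"create bridge-domain {bd_id} del")
    return cmds


for cmd in prune_bridge_domain(10, ["HundredGigabitEthernet12/0/0.101"]):
    print(cmd)
```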
It then continues with the create phase:
create loopback interface instance 0
create loopback interface instance 1
create bond mode lacp load-balance l34 id 0
create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 instance 1 vni 101 decap-next l2
create sub HundredGigabitEthernet12/0/0 1234 dot1q 1234 exact-match
create sub BondEthernet0 10 dot1q 10 exact-match
create sub BondEthernet0 100 dot1q 100
create sub BondEthernet0 200 dot1q 200
create sub BondEthernet0 500 dot1ad 500
create sub BondEthernet0 501 dot1ad 501
create sub HundredGigabitEthernet12/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
create bridge-domain 1
create bridge-domain 11
lcp create HundredGigabitEthernet12/0/0 host-if ice12-0-0
lcp create BondEthernet0 host-if bond0
lcp create loop0 host-if lo0
lcp create loop1 host-if bvi1
lcp create HundredGigabitEthernet12/0/0.1234 host-if ice0.1234
lcp create BondEthernet0.10 host-if bond0.10
lcp create HundredGigabitEthernet12/0/0.1235 host-if ice0.1234.1000
Here, interfaces are created in order of loopbacks first, then BondEthernets, then Tunnels, and
finally sub-interfaces, first creating single-tagged and then creating dual-tagged sub-interfaces.
Of course, the BondEthernet has to be created before any sub-int can be created on it.
Note that the QinQ Hu12/0/0.1235 will be created after its intermediary parent Hu12/0/0.1234
due to this ordering requirement.
Then, the two new bridge-domains bd1 and bd11 are created, and finally the LIP plumbing is
performed, starting with the PHY ice12-0-0 and BondEthernet bond0, then the two loopbacks,
and only then advancing to the two single-tag dot1q interfaces and finally the QinQ interface. For
LCPs, this is very important, because in Linux, the interfaces are a tree, not a list. ice12-0-0
must be created before its child ice0.1234@ice12-0-0 can be created, and only then can the QinQ
ice0.1234.1000@ice0.1234 be created. This creation order follows from three edges in the DAG:
an LCP depends on its (sub-)interface, a two-tagged sub-interface depends on its single-tagged
sub-interface, and a single-tagged sub-interface depends on its PHY.
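One way to get such a parent-before-child order for the LIPs is to sort the interfaces by how many VLAN tags they carry. A minimal sketch, assuming encapsulations are given as dicts shaped like the YAML above (with the implicit dot1q 10 of BondEthernet0.10 spelled out, and None for untagged interfaces); the helper names are hypothetical:

```python
def tag_depth(encapsulation):
    """0 for untagged (PHY, bond, loopback), 1 for dot1q/dot1ad, 2 for QinQ/QinAD."""
    if not encapsulation:
        return 0
    return 2 if "inner-dot1q" in encapsulation else 1


def lcp_create_order(interfaces):
    """Sort so every Linux parent device is created before its children (stable sort)."""
    return sorted(interfaces, key=lambda name: tag_depth(interfaces[name]))


# Deliberately scrambled input order; None means untagged.
LCPS = {
    "HundredGigabitEthernet12/0/0.1235": {"dot1q": 1234, "inner-dot1q": 1000},
    "HundredGigabitEthernet12/0/0": None,
    "BondEthernet0.10": {"dot1q": 10},
    "BondEthernet0": None,
    "HundredGigabitEthernet12/0/0.1234": {"dot1q": 1234},
    "loop0": None,
    "loop1": None,
}
print(lcp_create_order(LCPS))
```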
After all this work, vppcfg can assert (a) that every object that now exists in VPP is in the
target configuration and (b) that any object that exists in the configuration is also present in
VPP (with the correct attributes).
But there’s one last thing to do, and that’s to ensure that the attributes that can be changed at runtime (IP addresses, L2XCs, BondEthernet and bridge-domain members, etc.) are sync’d into their respective objects in VPP based on what’s in the target configuration:
bond add BondEthernet0 GigabitEthernet3/0/0
bond add BondEthernet0 GigabitEthernet3/0/1
comment { ip link set bond0 address 00:25:90:0c:05:01 }
set interface l2 bridge loop1 1 bvi
set interface l2 bridge BondEthernet0.500 1
set interface l2 tag-rewrite BondEthernet0.500 pop 1
set interface l2 bridge BondEthernet0.501 1
set interface l2 tag-rewrite BondEthernet0.501 pop 1
set interface l2 bridge HundredGigabitEthernet12/0/1 1
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1 disable
set interface l2 bridge vxlan_tunnel1 1
set interface l2 tag-rewrite vxlan_tunnel1 disable
set interface l2 xconnect BondEthernet0.100 BondEthernet0.200
set interface l2 tag-rewrite BondEthernet0.100 pop 1
set interface l2 xconnect BondEthernet0.200 BondEthernet0.100
set interface l2 tag-rewrite BondEthernet0.200 pop 1
set interface state GigabitEthernet3/0/1 down
set interface mtu 9000 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/1 up
set interface mtu packet 9000 GigabitEthernet3/0/0
set interface mtu packet 9000 HundredGigabitEthernet12/0/0
set interface mtu packet 2000 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 vxlan_tunnel1
set interface mtu packet 1500 loop0
set interface mtu packet 1500 loop1
set interface mtu packet 9000 GigabitEthernet3/0/1
set interface mtu packet 1200 HundredGigabitEthernet12/0/0.1234
set interface mtu packet 3000 BondEthernet0.10
set interface mtu packet 2500 BondEthernet0.100
set interface mtu packet 2500 BondEthernet0.200
set interface mtu packet 2000 BondEthernet0.500
set interface mtu packet 2000 BondEthernet0.501
set interface mtu packet 1100 HundredGigabitEthernet12/0/0.1235
set interface state GigabitEthernet3/0/0 down
set interface mtu 9000 GigabitEthernet3/0/0
set interface state GigabitEthernet3/0/0 up
set interface state HundredGigabitEthernet12/0/0 down
set interface mtu 9000 HundredGigabitEthernet12/0/0
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/1 down
set interface mtu 2000 HundredGigabitEthernet12/0/1
set interface state HundredGigabitEthernet12/0/1 up
set interface ip address HundredGigabitEthernet12/0/0 192.0.2.17/30
set interface ip address HundredGigabitEthernet12/0/0 2001:db8:3::1/64
set interface ip address loop0 10.0.0.1/32
set interface ip address loop0 2001:db8::1/128
set interface ip address loop1 10.0.1.1/24
set interface ip address loop1 2001:db8:1::1/64
set interface state HundredGigabitEthernet12/0/0.1234 up
set interface state HundredGigabitEthernet12/0/0.1235 up
set interface state BondEthernet0 up
set interface state BondEthernet0.10 up
set interface state BondEthernet0.100 up
set interface state BondEthernet0.200 up
set interface state BondEthernet0.500 up
set interface state BondEthernet0.501 up
set interface state vxlan_tunnel1 up
set interface state loop0 up
set interface state loop1 up
I’m not gonna lie, it’s a tonne of work, but it’s all a pretty straightforward juggle. The sync
phase will look at each object in the config and ensure that the attributes that same object has in the
dataplane are present and correct. In my demo, hippo12.yaml creates a lot of interfaces and IP
addresses, and changes the MTU of pretty much every interface, but in order:
- The BondEthernet gets its members Gi3/0/0 and Gi3/0/1. As an interesting aside, when VPP creates a BondEthernet it’ll initially assign it an ephemeral MAC address. Then, when its first member is added, the MAC address of the BondEthernet will change to that of the first member. The comment reminds me to also set this MAC on the Linux device bond0. In the future, I’ll add some PyRoute2 code to do that automatically.
- BridgeDomains are next. The BVI loop1 is added first, then two sub-interfaces, a PHY and a tunnel, and VLAN tag-rewriting is configured (pop 1 for the tagged sub-interfaces, disabled for the untagged PHY and tunnel). There are two bridges, but only one of them has members, so there’s not much (in fact, there’s nothing) to do for the other one.
- L2 Cross Connects can be changed at runtime, and they’re next. The two interfaces BE0.100 and BE0.200 are connected to one another and tag-rewrites are set up for them, considering they are both tagged sub-interfaces.
- MTU is next. There are two variants of this. The first one, set interface mtu, is actually a change in the DPDK driver to change the maximum allowable frame size. For this change, some interface types have to be brought down first, the max frame size changed, and then brought back up again. The second one, set interface mtu packet, changes the packet MTU, and those changes are made in a specific order:
  - PHYs will grow their MTU first, as growing a PHY is always safe.
  - Sub-interfaces will shrink QinX first, then Dot1Q/Dot1AD, then untagged interfaces. This is to ensure we do not leave VPP and LinuxCP in a state where a QinQ sub-int has a higher MTU than any of its parents.
  - Sub-interfaces will grow untagged first, then Dot1Q/Dot1AD, and finally QinX sub-interfaces. For the same reason as step 2, no sub-interface will end up with a higher MTU than any of its parents.
  - PHYs will shrink their MTU last. The YAML configuration validation asserts that no PHY can have an MTU lower than any of its children, so this is safe.
- Finally, IP addresses are added to Hu12/0/0, loop0 and loop1. I can guarantee that adding IP addresses will not clash with any other interface, because pruning would already have removed IP addresses from interfaces where they don’t belong.
- And to finish off, the admin state for interfaces is set, again going from PHY, Bond, Tunnel, 1-tagged sub-interfaces and finally 2-tagged sub-interfaces and loopbacks.
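The four MTU steps can be sketched as a function that orders set interface mtu packet commands. This is a hypothetical illustration, assuming each interface is labelled with a tag depth (0 for PHYs, 1 for Dot1Q/Dot1AD, 2 for QinX) and that both the current and the target configuration are valid, i.e. no child exceeds its parent:

```python
def mtu_plan(current, target, depth):
    """
    Order "set interface mtu packet" commands so that, going from one valid
    configuration to another, no sub-interface ever exceeds its parents' MTU.
    depth: 0 = PHY, 1 = Dot1Q/Dot1AD, 2 = QinX (hypothetical labelling).
    """
    phys = [n for n in target if depth[n] == 0]
    subs = [n for n in target if depth[n] > 0]
    deepest_first = sorted(subs, key=lambda n: depth[n], reverse=True)
    shallowest_first = sorted(subs, key=lambda n: depth[n])

    order = []
    order += [n for n in phys if target[n] > current[n]]              # 1. PHYs grow
    order += [n for n in deepest_first if target[n] < current[n]]     # 2. sub-ints shrink
    order += [n for n in shallowest_first if target[n] > current[n]]  # 3. sub-ints grow
    order += [n for n in phys if target[n] < current[n]]              # 4. PHYs shrink
    return [f"set interface mtu packet {target[n]} {n}" for n in order]


depth = {"Hu12/0/0": 0, "Hu12/0/0.1234": 1, "Hu12/0/0.1235": 2}
shrink = mtu_plan(
    current={"Hu12/0/0": 9216, "Hu12/0/0.1234": 3000, "Hu12/0/0.1235": 2000},
    target={"Hu12/0/0": 9000, "Hu12/0/0.1234": 1200, "Hu12/0/0.1235": 1100},
    depth=depth,
)
for cmd in shrink:
    print(cmd)
```

When everything shrinks, the QinQ goes first and the PHY last; when everything grows, the order is reversed.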
Let’s take it to the test:
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan -o hippo4-to-12.exec
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] vppcfg.reconciler.write: Wrote 94 lines to hippo4-to-12.exec
[INFO ] root.main: Planning succeeded
pim@hippo:~/src/vppcfg$ vppctl exec ~/src/vppcfg/hippo4-to-12.exec
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] root.main: Planning succeeded
Notice that after applying hippo4-to-12.exec, the planner had nothing else to say. VPP is now in
the target configuration state, slick!
Demo 3: Returning VPP to empty
This one is easy, but shows the pruning in action. Let’s say I wanted to return VPP to a default configuration without any objects, and its interfaces all at MTU 1500:
interfaces:
  GigabitEthernet3/0/0:
    mtu: 1500
    description: Not Used
  GigabitEthernet3/0/1:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/1:
    mtu: 1500
    description: Not Used
Simply applying that plan:
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo-empty.yaml plan -o 12-to-empty.exec
[INFO ] root.main: Loading configfile hippo-empty.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] vppcfg.reconciler.write: Wrote 66 lines to 12-to-empty.exec
[INFO ] root.main: Planning succeeded
pim@hippo:~/src/vppcfg$ vppctl
vpp# exec ~/src/vppcfg/12-to-empty.exec
vpp# show interface
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
GigabitEthernet3/0/0 1 up 1500/0/0/0
GigabitEthernet3/0/1 2 up 1500/0/0/0
HundredGigabitEthernet12/0/0 3 up 1500/0/0/0
HundredGigabitEthernet12/0/1 4 up 1500/0/0/0
local0 0 down 0/0/0/0
Final notes
Now you may have been wondering why I would call the first file hippo4.yaml and the second one
hippo12.yaml. This is because I have 20 such YAML files that bring Hippo into all sorts of
esoteric configuration states, and I do this so that I can do a full integration test of any config
morphing into any other config:
for i in hippo[0-9]*.yaml; do
  echo "Clearing: Moving to hippo-empty.yaml"
  ./vppcfg -c hippo-empty.yaml plan > /tmp/vppcfg-exec-empty
  [ -s /tmp/vppcfg-exec-empty ] && vppctl exec /tmp/vppcfg-exec-empty
  for j in hippo[0-9]*.yaml; do
    echo " - Moving to $i .. "
    ./vppcfg -c "$i" plan > "/tmp/vppcfg-exec_$i"
    [ -s "/tmp/vppcfg-exec_$i" ] && vppctl exec "/tmp/vppcfg-exec_$i"
    echo " - Moving from $i to $j"
    ./vppcfg -c "$j" plan > "/tmp/vppcfg-exec_${i}_${j}"
    [ -s "/tmp/vppcfg-exec_${i}_${j}" ] && vppctl exec "/tmp/vppcfg-exec_${i}_${j}"
    echo " - Checking that from $j to $j is empty"
    ./vppcfg -c "$j" plan > "/tmp/vppcfg-exec_${j}_${j}_null"
    [ -s "/tmp/vppcfg-exec_${j}_${j}_null" ] && echo "   ERROR: $j to $j is not empty"
  done
done
What this does is start Hippo off with an empty config, then move it to hippo1.yaml, and from
there move the configuration to each YAML file and back to hippo1.yaml. Doing this proves
that no matter which configuration I want to obtain, I can get there safely when the VPP dataplane
config starts out looking like what is described in hippo1.yaml. I’ll then move it back to empty,
and into hippo2.yaml, doing the whole cycle again. So for 20 files, this means ~400 or so
configuration transitions. And some of these are special, notably moving from hippoN.yaml to
the same hippoN.yaml should result in zero diffs.
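The arithmetic behind that number is simply every ordered pair of files. A quick check, assuming the files are named hippo1.yaml through hippo20.yaml (hypothetical; the loop itself just globs hippo[0-9]*.yaml):

```python
from itertools import product

# Hypothetical file names for the 20 test configurations.
configs = [f"hippo{n}.yaml" for n in range(1, 21)]
transitions = list(product(configs, repeat=2))  # every (from, to) pair
self_transitions = [(a, b) for a, b in transitions if a == b]

print(len(transitions))       # 400
print(len(self_transitions))  # 20: each of these must plan to zero commands
```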
With this path planner reasonably well tested, I have pretty high confidence that vppcfg can
change the dataplane from any existing configuration to any desired target configuration.
What’s next
One thing that I haven’t mentioned yet is that the vppcfg path planner works by reading the API
configuration state exactly once (at startup), and then it figures out the CLI calls to print
without needing to talk to VPP again. This is super useful as it’s a non-intrusive way to inspect
the changes before applying them, and it’s a property I’d like to carry forward.
However, I don’t necessarily think that emitting the CLI statements is the best user experience; they are mostly useful for analysis. What I really want to do is emit API calls after the plan is created and reviewed/approved, directly reprogramming the VPP dataplane, and likely the Linux network namespace interfaces as well, for example setting the MAC address of a BondEthernet as I showed in that one comment above, or setting interface alias names based on the configured descriptions.
However, the VPP API set needed to do this is not 100% baked yet. For example, I observed crashes
when tinkering with BVIs and Loopbacks (thread), and fixed a few obvious errors in the Linux CP API
(gerrit), but there are still a few more issues to work through before I can take the next step
with vppcfg.
But for now, it’s already helping me out tremendously at IPng Networks and I hope it’ll be useful for others, too.