About this series
I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP community merged the Linux Control Plane plugin. I wrote about deploying it both on regular servers, like the Supermicro routers that run our AS8298 network, and on virtual machines running in KVM/Qemu.
Now that I’ve been running VPP in production for about half a year, I can’t help but notice one specific
drawback: VPP is a programmable dataplane, and by design it does not include any configuration or
controlplane management stack. It’s meant to be integrated into a full stack by operators. For end-users,
this unfortunately means that typing on the CLI won’t persist any configuration, and if VPP is restarted,
it will not pick up where it left off. There’s one developer convenience in the form of the exec
command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line
by line. However, if any typo is made in the file, processing immediately stops. It's meant as a convenience
for VPP developers, and is certainly not a useful configuration method for anything but the simplest topologies.
Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts,
I’ll detail the work I’ve done to create a configuration utility that can take a YAML configuration file,
compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply
the configuration to the dataplane. Welcome to vppcfg!
In this first post, let's take a look at the table stakes: writing a YAML specification which models the main configuration elements of VPP, and then ensuring that a given YAML file is both syntactically and semantically correct.
Note: Code is on my GitHub, but it's not quite ready for prime time yet. Take a look, and engage with us on GitHub (pull requests are preferred over issues), or reach out by contacting us.
YAML Specification
I decide to use Yamale, which is a schema description language and validator for YAML. YAML is a very simple, text/human-readable annotation format that can be used to store a wide range of data types. An interesting, but quick introduction to the YAML language can be found on CraftIRC’s GitHub page.
The first order of business for me is to devise a YAML file specification which models the configuration
options of VPP objects in an idiomatic way. It's appealing to immediately build a higher level abstraction,
but I resist the urge and instead look at the types of objects that exist in VPP, for example the
VNET_DEVICE_CLASS types:
- ethernet_simulated_device_class: Loopbacks
- bvi_device_class: Bridge Virtual Interfaces
- dpdk_device_class: DPDK Interfaces
- rdma_device_class: RDMA Interfaces
- bond_device_class: BondEthernet Interfaces
- vxlan_device_class: VXLAN Tunnels
There are several others, but I decide to start with these, as I'll be needing each one of them in my own network. Looking over the device class specifications, I learn a lot about how they are configured, which arguments they take and of which types, and which data structures they are represented as internally in VPP.
Syntax Validation
Yamale first reads a schema definition file, and then holds a given YAML file against that definition, showing whether or not the file is well-formed. As a practical example, let me start with the following definition:
$ cat << EOF > schema.yaml
sub-interfaces: map(include('sub-interface'),key=int())
---
sub-interface:
  description: str(exclude='\'"',len=64,required=False)
  lcp: str(max=15,matches='[a-z]+[a-z0-9-]*',required=False)
  mtu: int(min=128,max=9216,required=False)
  addresses: list(ip(version=6),required=False)
  encapsulation: include('encapsulation',required=False)
---
encapsulation:
  dot1q: int(min=1,max=4095,required=False)
  dot1ad: int(min=1,max=4095,required=False)
  inner-dot1q: int(min=1,max=4095,required=False)
  exact-match: bool(required=False)
EOF
This snippet creates two types, one called sub-interface and the other called encapsulation. The fields
of the sub-interface, for example the description field, must follow the given typing to be valid. In the
case of the description, it must be at most 64 characters long and it must not contain the ' or " characters.
The designation required=False notes that this is an optional field and may be omitted. The lcp field is
also a string, but it must match a certain regular expression and start with a lowercase letter. The mtu
field must be an integer between 128 and 9216, and so on.
One nice feature of Yamale is the ability to reference other object types. I do this here with the
encapsulation field, which references an object type of the same name, and again, is optional. This means
that when the encapsulation field is encountered in the YAML file Yamale is validating, it'll hold the
contents of that field against the schema below. There, we have dot1q, dot1ad, inner-dot1q and exact-match
fields, which are all optional.
Then, at the top of the file, I create the entrypoint schema, which expects YAML files to contain a map
called sub-interfaces, which is keyed by integers and contains values of type sub-interface, tying it all
together.
Yamale comes with a commandline utility to do direct schema validation, which is handy. Let me demonstrate with the following terrible YAML:
$ cat << EOF > bad.yaml
sub-interfaces:
  100:
    description: "Pim's illegal description"
    lcp: "NotAGoodName-AmIRite"
    mtu: 16384
    addresses: 192.0.2.1
    encapsulation: False
EOF
$ yamale -s schema.yaml bad.yaml
Validating /home/pim/bad.yaml...
Validation failed!
Error validating data '/home/pim/bad.yaml' with schema '/home/pim/schema.yaml'
sub-interfaces.100.description: 'Pim's illegal description' contains excluded character '''
sub-interfaces.100.lcp: Length of NotAGoodName-AmIRite is greater than 15
sub-interfaces.100.lcp: NotAGoodName-AmIRite is not a regex match.
sub-interfaces.100.mtu: 16384 is greater than 9216
sub-interfaces.100.addresses: '192.0.2.1' is not a list.
sub-interfaces.100.encapsulation: 'False' is not a map
This file trips so many syntax violations, it should be a crime! In fact, every single field is invalid. The
one that is closest to being correct is the addresses field, but there I've supplied a scalar where the schema
expects a list, and even then, the list elements are expected to be IPv6 addresses, not IPv4 ones.
So let me try again:
$ cat << EOF > good.yaml
sub-interfaces:
  100:
    description: "Core: switch.example.com Te0/1"
    lcp: "xe3-0-0"
    mtu: 9216
    addresses: [ 2001:db8::1, 2001:db8:1::1 ]
    encapsulation:
      dot1q: 100
      exact-match: True
EOF
$ yamale -s schema.yaml good.yaml
Validating /home/pim/good.yaml...
Validation success! 👍
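The same schema validation can also be driven from Python rather than from the CLI, which is how a configuration tool can embed it. Here's a minimal sketch using Yamale's Python API; the exact exception and result handling differs a bit between Yamale versions, so treat this as illustrative:

import yamale

schema = yamale.make_schema('schema.yaml')
data = yamale.make_data('good.yaml')
try:
    yamale.validate(schema, data)
    print("Validation success!")
except ValueError as e:
    # On failure, the exception message lists the individual violations.
    print(f"Validation failed:\n{e}")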
Semantic Validation
When using Yamale, I can make a good start with syntax validation, that is to say: if a field is present, it
follows a prescribed type. But that's not the whole story. There are many configuration files I can think of
that would be syntactically correct, but still make no sense in practice. For example, creating an encapsulation
which has both dot1q as well as dot1ad, or creating a LIP (Linux Interface Pair) for a sub-interface which does
not have exact-match set. Or how about having two sub-interfaces with the exact same encapsulation?
Here's where semantic validation comes into play. So I set out to create all sorts of constraints, and after reading the (Yamale-validated, so syntactically correct) YAML file, I can hand it to a set of validators that check for violations of these constraints. By means of example, let me create a few constraints that might capture the issues described above; a sketch of what such a check could look like in code follows the list:
- If a sub-interface has an encapsulation:
  - It MUST have either dot1q OR dot1ad set
  - It MUST NOT have both dot1q AND dot1ad set
- If a sub-interface has one or more addresses:
  - Its encapsulation MUST be set to exact-match
  - It MUST have an lcp set
  - Each individual address MUST NOT occur on any other interface
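To make this concrete, here is a minimal sketch of what a check for these constraints could look like for a single sub-interface, given the YAML structure shown earlier. The function name and message wording are illustrative, not vppcfg's actual code:

def check_sub_interface(ifname, sub_id, sub):
    """Return a list of constraint violations for one sub-interface dict."""
    msgs = []
    encap = sub.get("encapsulation")
    if encap:
        has_dot1q = "dot1q" in encap
        has_dot1ad = "dot1ad" in encap
        if not has_dot1q and not has_dot1ad:
            msgs.append(f"sub-interface {ifname}.{sub_id} has neither dot1q nor dot1ad set")
        if has_dot1q and has_dot1ad:
            msgs.append(f"sub-interface {ifname}.{sub_id} has both dot1q and dot1ad set")
    if sub.get("addresses"):
        if not encap or not encap.get("exact-match"):
            msgs.append(f"sub-interface {ifname}.{sub_id} has addresses but no exact-match encapsulation")
        if not sub.get("lcp"):
            msgs.append(f"sub-interface {ifname}.{sub_id} has addresses but no lcp")
    # Note: the 'address must not occur on any other interface' check needs the
    # whole config, so it is omitted from this per-sub-interface sketch.
    return msgs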
Config Validation
After spending a few weeks thinking about the problem, I came up with 59 semantic constraints, that is to say,
things that might appear OK but will yield impossible-to-implement or otherwise erratic VPP configurations.
This article would be a bad place to discuss them all, so I will talk about the structure of vppcfg instead.
First, a Validator class is instantiated with the Yamale schema. Then, a YAML file is read and passed to the
validator's validate() method, which first runs Yamale on the YAML file and makes note of any issues that arise,
enumerating them in a list and returning (bool, [list-of-messages]). The validation has failed if the returned
boolean is False, and if so, the list of messages helps understand which constraint was violated.
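In other words, a caller drives it roughly like this. The Validator class and validate() method follow the description above, but the import path and constructor arguments are my own guesses for illustration:

import yaml
from config import Validator  # illustrative import path

with open("vpp.yaml", "r") as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)

validator = Validator(schema="schema.yaml")  # instantiated with the Yamale schema
ok, msgs = validator.validate(cfg)           # returns (bool, [list-of-messages])
if not ok:
    for msg in msgs:
        print(f"config error: {msg}")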
The vppcfg schema consists of toplevel types, which are validated in order:
- validate_bondethernets()'s job is to ensure that anything configured in the bondethernets toplevel map is correct. For example, if a BondEthernet device is created there, its members should reference existing interfaces, the device itself should make an appearance in the interfaces map, the MTU of each member should be equal to the MTU of the BondEthernet, and so on. See config/bondethernet.py for a complete rundown.
- validate_loopbacks() is pretty straightforward. It makes a few common assertions, such as that if the loopback has addresses, it must also have an LCP; if it has an LCP, that no other interface has the same LCP name; and that all of the addresses configured are unique.
- validate_vxlan_tunnels(): Yamale already asserts that the local and remote fields are present and are IP addresses. The semantic validator ensures that the address families of the tunnel endpoints are the same, and that the used VNI is unique.
- validate_bridgedomains() fiddles with its Bridge Virtual Interface, making sure that its addresses and LCP name are unique. Further, it makes sure that a given member interface is in at most one bridge, and that said member is in L2 mode, in other words, that it doesn't have an LCP or an address. An L2 interface can either be in a bridgedomain or act as an L2 Cross Connect, but not both. Finally, it asserts that each member has an MTU identical to the bridge's MTU value.
- validate_interfaces() is by far the most complex, but a few common things worth calling out are that each sub-interface must have a unique encapsulation, and if a given QinQ or QinAD 2-tagged sub-interface has an LCP, that there exists a parent Dot1Q or Dot1AD interface with the correct encapsulation which also has an LCP. See config/interface.py for an extensive overview. A sketch of the uniqueness check follows this list.
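As an illustration of that uniqueness constraint, one way to verify that no two sub-interfaces of the same parent share an encapsulation is to reduce each one to a comparable tuple. This is a simplified sketch, not the actual code in config/interface.py:

def unique_encapsulation_errors(ifname, iface):
    """Flag sub-interfaces of one parent that share the same encapsulation."""
    msgs = []
    seen = {}
    for sub_id, sub in iface.get("sub-interfaces", {}).items():
        encap = sub.get("encapsulation")
        if encap:
            key = (encap.get("dot1q"), encap.get("dot1ad"), encap.get("inner-dot1q"))
        else:
            # Assumption: without an explicit encapsulation, a sub-interface defaults to dot1q <sub_id>.
            key = (sub_id, None, None)
        # The real check also takes exact-match into account; ignored here for brevity.
        if key in seen:
            msgs.append(f"sub-interface {ifname}.{sub_id} does not have unique encapsulation "
                        f"(same as {ifname}.{seen[key]})")
        else:
            seen[key] = sub_id
    return msgs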
Testing
Of course, in a configuration model as complex as a VPP router's, being able to do a lot of validation helps
ensure that the constraints above are implemented correctly. To help this along, I use regular unit testing as
provided by the Python3 unittest framework, but I extend it to also run a special kind of test which I call a
YAMLTest.
Unit Testing
This is bread and butter, and should be straightforward for software engineers. I follow a model of so-called test-driven development, where I start off by writing a test, which of course fails because the code hasn't been implemented yet. Then I implement the code, and run this and all other unit tests expecting them to pass.
Let me give an example based on BondEthernets, with a YAML config file as follows:
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet1/0/0, GigabitEthernet1/0/1 ]

interfaces:
  GigabitEthernet1/0/0:
    mtu: 3000
  GigabitEthernet1/0/1:
    mtu: 3000
  GigabitEthernet2/0/0:
    mtu: 3000
    sub-interfaces:
      100:
        mtu: 2000
  BondEthernet0:
    mtu: 3000
    lcp: "be012345678"
    addresses: [ 192.0.2.1/29, 2001:db8::1/64 ]
    sub-interfaces:
      100:
        mtu: 2000
        addresses: [ 192.0.2.9/29, 2001:db8:1::1/64 ]
As I mentioned when discussing the semantic constraints, there are a few here that jump out at me. First, the
BondEthernet members Gi1/0/0 and Gi1/0/1 must exist. There is one BondEthernet defined in this file (obvious,
I know, but bear with me), and Gi2/0/0 is not a bond member, and certainly Gi2/0/0.100 is not a bond member,
because having a sub-interface as an LACP member would be super weird. Taking things like this into account,
here are a few tests that could assert that the behavior of the bondethernets map in the YAML config is correct:
import unittest
import yaml

from config import bondethernet  # module under test, see config/bondethernet.py


class TestBondEthernetMethods(unittest.TestCase):
    def setUp(self):
        with open("unittest/test_bondethernet.yaml", "r") as f:
            self.cfg = yaml.load(f, Loader=yaml.FullLoader)

    def test_get_by_name(self):
        ifname, iface = bondethernet.get_by_name(self.cfg, "BondEthernet0")
        self.assertIsNotNone(iface)
        self.assertEqual("BondEthernet0", ifname)
        self.assertIn("GigabitEthernet1/0/0", iface['interfaces'])
        self.assertNotIn("GigabitEthernet2/0/0", iface['interfaces'])

        ifname, iface = bondethernet.get_by_name(self.cfg, "BondEthernet-notexist")
        self.assertIsNone(iface)
        self.assertIsNone(ifname)

    def test_members(self):
        self.assertTrue(bondethernet.is_bond_member(self.cfg, "GigabitEthernet1/0/0"))
        self.assertTrue(bondethernet.is_bond_member(self.cfg, "GigabitEthernet1/0/1"))
        self.assertFalse(bondethernet.is_bond_member(self.cfg, "GigabitEthernet2/0/0"))
        self.assertFalse(bondethernet.is_bond_member(self.cfg, "GigabitEthernet2/0/0.100"))

    def test_is_bondethernet(self):
        self.assertTrue(bondethernet.is_bondethernet(self.cfg, "BondEthernet0"))
        self.assertFalse(bondethernet.is_bondethernet(self.cfg, "BondEthernet-notexist"))
        self.assertFalse(bondethernet.is_bondethernet(self.cfg, "GigabitEthernet1/0/0"))

    def test_enumerators(self):
        ifs = bondethernet.get_bondethernets(self.cfg)
        self.assertEqual(len(ifs), 1)
        self.assertIn("BondEthernet0", ifs)
        self.assertNotIn("BondEthernet-noexist", ifs)
Every single function that is defined in the file config/bondethernet.py (there are four) will have
an accompanying unit test to ensure it works as expected. And every validator module will have a suite
of unit tests fully covering its functionality. In total, I wrote a few dozen unit tests like this,
in an attempt to be reasonably certain that the config validator functionality works as advertised.
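To give an idea of what those four functions look like, here's a sketch that would satisfy the tests above; the real code in config/bondethernet.py may differ in detail:

def get_bondethernets(cfg):
    """Return a list of all BondEthernet names in the config."""
    if not cfg or "bondethernets" not in cfg:
        return []
    return list(cfg["bondethernets"].keys())


def get_by_name(cfg, ifname):
    """Return (name, config) for a BondEthernet, or (None, None) if it doesn't exist."""
    try:
        return ifname, cfg["bondethernets"][ifname]
    except (KeyError, TypeError):
        return None, None


def is_bondethernet(cfg, ifname):
    """Return True if the given interface name is a BondEthernet."""
    found, _ = get_by_name(cfg, ifname)
    return found is not None


def is_bond_member(cfg, ifname):
    """Return True if the given interface is a member of any BondEthernet."""
    for _, iface in cfg.get("bondethernets", {}).items():
        if ifname in iface.get("interfaces", []):
            return True
    return False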
YAML Testing
I added one additional class of unit test, called a YAMLTest. What happens here is that a certain YAML configuration file, which may be valid or have errors, is offered to the end-to-end config parser (so both the Yamale schema validator and the semantic validators), and all errors are accounted for. As an example, two sub-interfaces on the same parent cannot have the same encapsulation, so offering the following file to the config validator is expected to trip errors:
$ cat << EOF > unittest/yaml/error-subinterface1.yaml
test:
  description: "Two subinterfaces can't have the same encapsulation"
  errors:
    expected:
      - "sub-interface .*.100 does not have unique encapsulation"
      - "sub-interface .*.102 does not have unique encapsulation"
    count: 2
---
interfaces:
  GigabitEthernet1/0/0:
    sub-interfaces:
      100:
        description: "VLAN 100"
      101:
        description: "Another VLAN 100, but without exact-match"
        encapsulation:
          dot1q: 100
      102:
        description: "Another VLAN 100, but with exact-match"
        encapsulation:
          dot1q: 100
          exact-match: True
EOF
You can see the file here has two YAML documents (separated by ---); the first one explains to the YAMLTest
class what to expect. There can either be no errors (in which case test.errors.count=0), or there can be
specific errors that are expected. In this case, Gi1/0/0.100 and Gi1/0/0.102 have the same encapsulation,
but Gi1/0/0.101 is unique (if you're curious, this is because the encap on 100 and 102 has exact-match,
but the one on 101 does not).
The implementation of this YAMLTest class is in tests.py, which in turn runs all YAML tests on the files it
finds in unittest/yaml/*.yaml (currently 47 specific cases are tested there, which cover 100% of the
semantic constraints), as well as the regular unit tests (currently 42, which is a coincidence, I swear!).
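Conceptually, the YAMLTest runner loads both documents from each file, runs the full validation on the second one, and matches the produced error messages against the expected patterns from the first. A rough sketch of that idea follows; the real implementation in tests.py differs in detail, and the validator object is the one described earlier:

import glob
import re
import yaml

def run_yaml_test(path, validator):
    """Validate one unittest/yaml/*.yaml file against the expectations it declares."""
    with open(path, "r") as f:
        expectations, cfg = list(yaml.load_all(f, Loader=yaml.FullLoader))

    _, msgs = validator.validate(cfg)
    errors = expectations.get("test", {}).get("errors", {})
    expected = errors.get("expected", [])
    count = errors.get("count", 0)

    # Every produced message must match at least one expected pattern,
    # and the total number of messages must equal the declared count.
    unexpected = [m for m in msgs if not any(re.search(p, m) for p in expected)]
    return len(unexpected) == 0 and len(msgs) == count

# for path in sorted(glob.glob("unittest/yaml/*.yaml")):
#     print(path, "OK" if run_yaml_test(path, validator) else "FAIL")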
What’s next?
These tests, together, give me a pretty strong assurance that any given YAML file that passes the validator is indeed a valid configuration for VPP. In my next post, I'll go one step further and talk about applying the configuration to a running VPP instance, which is of course the overarching goal. But I would not want to mess up my (or your!) VPP router by feeding it garbage, so the lion's share of my time so far on this project has been spent asserting that the YAML file is both syntactically and semantically valid.
In the meantime, you can take a look at my code on GitHub, but to whet your appetite, here's a hefty configuration that demonstrates all implemented types:
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"
  HundredGigabitEthernet12/0/0:
    lcp: "ice0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"
  BondEthernet0:
    mtu: 9000
    lcp: "be0"
    sub-interfaces:
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
          dot1q: 100
          exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
          dot1q: 200
          exact-match: False
      500:
        mtu: 2000
        encapsulation:
          dot1ad: 500
          exact-match: False
      501:
        mtu: 2000
        encapsulation:
          dot1ad: 501
          exact-match: False
  vxlan_tunnel1:
    mtu: 2000

loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]

bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500

vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101
The vision for my VPP configuration utility is that it can move from any existing VPP configuration to any other successfully validated configuration in a minimal number of steps, and that it will plan its way declaratively from A to B, ordering the calls to the API safely and quickly. Interested? Good, because I do expect that a utility like this would be very valuable to serious VPP users!