Case Study: Site Local NGINX

A while ago I rolled out an important change to the IPng Networks design: I inserted a bunch of [Centec MPLS] and IPv4/IPv6 capable switches underneath [AS8298], which gave me two specific advantages:

  1. The entire IPng network is now capable of delivering L2VPN services, taking the form of MPLS point-to-point ethernet, and VPLS, as shown in a previous [deep dive], in addition to IPv4 and IPv6 transit provided by VPP in an elaborate and elegant [BGP Routing Policy].

  2. A new internal private network becomes available to any device connected to IPng switches, with addressing in 198.19.0.0/16 and 2001:678:d78:500::/56. This network is completely isolated from the Internet, with access controlled via N+2 redundant gateways/firewalls, described in more detail in a previous [deep dive] as well.

Overview

Toxicity

After rolling out this spiffy BGP Free [MPLS Core], I wanted to take a look at maybe conserving a few IP addresses here and there, as well as tightening access and protecting the more important machines that IPng Networks runs. You see, most enterprise networks will include a bunch of internal services, like databases, network attached storage, backup servers, network monitoring, billing/CRM et cetera. IPng Networks is no different.

Somewhere between the sacred silence and sleep, lives my little AS8298. It’s a gnarly and toxic place out there in the DFZ, how do you own disorder?

Connectivity

Backbone

As a refresher, here’s the current situation at IPng Networks:

1. Site Local Connectivity

Each switch gets what is called an IPng Site Local (or ipng-sl) interface. This is a /27 IPv4 and a /64 IPv6 that is bound on a local VLAN on each switch on our private network. Remember: the links between sites are no longer switched, they are routed and pass ethernet frames only using MPLS. I can connect, for example, all of the fleet’s hypervisors to this internal network with jumboframes, using 198.19.0.0/16 and 2001:678:d78:500::/56, which is not connected to the internet.

2. Egress Connectivity

There are three geographically diverse gateways that inject an OSPF E1 default route into the Centec Site Local network, and they provide NAT for IPv4 and IPv6 to the internet. This setup allows all machines in the internal private network to reach the internet, using their closest gateway. Failing over between gateways is fully automatic: when one is unavailable or down for maintenance, the network will simply find the next-closest gateway.

3. Ingress Connectivity

Inbound traffic (from the internet to IPng Site Local) is held back at the gateways. First of all, the reserved IPv4 space 198.18.0.0/15 is a bogon and will not be routed on the public internet, but our VPP routers in AS8298 do carry the route, albeit with the well-known BGP no-export community set, so traffic can arrive at the gateways only from our own network. This is not true for IPv6, because here our prefix is a part of the AS8298 IPv6 PI space, and traffic will be globally routable. Even then, only a handful of prefixes are allowed to enter the IPng Site Local private network: nominally only our NOC prefixes, one or two external bastion hosts, and our own WireGuard endpoints, which run on the gateways.

Frontend Setup

One of my goals for the private network is IPv4 conservation. I decided to move our web-frontends to be dual-homed: one network interface towards the internet using public IPv4 and IPv6 addresses, and another network interface that finds backend servers in the IPng Site Local private network.

This way, I can have one NGINX instance (or a pool of them) terminate the HTTP/HTTPS connection (there’s an InfraSec joke about SSL is inserted and removed here :)), no matter how many websites, domains, or physical webservers I want to use. Some SSL certificate providers allow for wildcards (e.g. *.ipng.ch), but I’m going to keep it relatively simple and use [Let’s Encrypt] which offers free certificates with a validity of three months.

Installing NGINX

First, I will install three minimal VMs with Debian Bullseye on separate hypervisors (in Rümlang chrma0, Plan-les-Ouates chplo0 and Amsterdam nlams1), giving them each 4 CPUs, a 16G blockdevice on the hypervisor’s ZFS (which is automatically snapshotted and backed up offsite using ZFS replication!), and 1GB of memory. These machines will be the IPng Frontend servers, and handle all client traffic to our web properties. Their job is to forward that HTTP/HTTPS traffic internally to webservers that are running entirely in the IPng Site Local (private) network.

I’ll install a few table-stakes packages on them, taking nginx0-chrma0 as an example:

pim@nginx0-chrma0:~$ sudo apt install nginx iptables ufw rsync
pim@nginx0-chrma0:~$ sudo ufw allow 80
pim@nginx0-chrma0:~$ sudo ufw allow 443
pim@nginx0-chrma0:~$ sudo ufw allow from 198.19.0.0/16
pim@nginx0-chrma0:~$ sudo ufw allow from 2001:678:d78:500::/56
pim@nginx0-chrma0:~$ sudo ufw enable

Installing Lego

Next, I’ll install one more highly secured minimal VM with Debian Bullseye, giving it 1 CPU, a 16G blockdevice and 1GB of memory. This is where my Let’s Encrypt SSL certificate store will live. This machine does not need to be publicly available, so it will only get one interface, connected to the IPng Site Local network, which means it will be using private IPs only.

This virtual machine really is bare-bones: it only gets a firewall, rsync, and the lego package. It doesn’t technically even need to run SSH, because I can log in on the serial console via the hypervisor. But considering it’s an internal-only server (not connected to the internet), and because I do believe in OpenSSH’s track record of safety, I decide to leave SSH enabled:

pim@lego:~$ sudo apt install ufw lego rsync
pim@lego:~$ sudo ufw allow 8080
pim@lego:~$ sudo ufw allow 22
pim@lego:~$ sudo ufw enable
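
As an aside: ufw allow 22 is broader than the ‘trusted sources only’ description below strictly needs. Since the LEGO machine only has an IPng Site Local interface anyway this is mostly cosmetic, but a tighter variant would look like this (a sketch, not what I ran above):

pim@lego:~$ sudo ufw delete allow 22
pim@lego:~$ sudo ufw allow from 198.19.0.0/16 to any port 22 proto tcp
pim@lego:~$ sudo ufw allow from 2001:678:d78:500::/56 to any port 22 proto tcp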

Now that all four machines are set up and appropriately filtered (using a simple ufw Debian package):

  • NGINX will allow ports 80 and 443 for public-facing web traffic, and is permissive for the IPng Site Local network, to allow SSH for rsync and maintenance tasks.
  • LEGO will be entirely closed off, allowing access only from trusted sources for SSH, and to TCP port 8080 on which HTTP-01 certificate challenges will be served.

I make a pre-flight check to make sure that jumbo frames are possible from the frontends into the backend network.

pim@nginx0-nlams1:~$ traceroute lego
traceroute to lego (198.19.4.7), 30 hops max, 60 byte packets
 1  msw0.nlams0.net.ipng.ch (198.19.4.97)  0.737 ms  0.958 ms  1.155 ms
 2  msw0.defra0.net.ipng.ch (198.19.2.22)  6.414 ms  6.748 ms  7.089 ms
 3  msw0.chrma0.net.ipng.ch (198.19.2.7)  12.147 ms  12.315 ms  12.401 ms
 4  msw0.chbtl0.net.ipng.ch (198.19.2.0)  12.685 ms  12.429 ms  12.557 ms
 5  lego.net.ipng.ch (198.19.4.7)  12.916 ms  12.864 ms  12.944 ms

pim@nginx0-nlams1:~$ ping -c 3 -6 -M do -s 8952 lego
PING lego(lego.net.ipng.ch (2001:678:d78:503::7)) 8952 data bytes
8960 bytes from lego.net.ipng.ch (2001:678:d78:503::7): icmp_seq=1 ttl=62 time=13.33 ms
8960 bytes from lego.net.ipng.ch (2001:678:d78:503::7): icmp_seq=2 ttl=62 time=13.52 ms
8960 bytes from lego.net.ipng.ch (2001:678:d78:503::7): icmp_seq=3 ttl=62 time=13.28 ms

--- lego ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 13.280/13.437/13.590/0.116 ms

pim@nginx0-nlams1:~$ ping -c 3 -4 -M do -s 8972 lego
PING lego (198.19.4.7) 8972(9000) bytes of data.
8980 bytes from lego.net.ipng.ch (198.19.4.7): icmp_seq=1 ttl=62 time=12.85 ms
8980 bytes from lego.net.ipng.ch (198.19.4.7): icmp_seq=2 ttl=62 time=12.82 ms
8980 bytes from lego.net.ipng.ch (198.19.4.7): icmp_seq=3 ttl=62 time=12.91 ms

--- lego ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 12.823/12.843/12.913/0.138 ms

A note on the size used: An IPv4 header is 20 bytes, an IPv6 header is 40 bytes, and an ICMP header is 8 bytes. If the MTU defined on the network is 9000, then the size of the ping payload can be 9000-20-8=8972 bytes for IPv4 and 9000-40-8=8952 for IPv6 packets. Using jumboframes internally is a small optimization for the benefit of the internal webservers - fewer packets/sec means more throughput and better performance in general. It’s also cool :)
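
If these pings come back with ‘message too long’, the first thing to check is the link MTU on the frontend itself; something like the following shows it (the interface name enp1s0 is just an example, yours may differ):

pim@nginx0-nlams1:~$ ip link show dev enp1s0 | grep -o 'mtu [0-9]*'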

CSRs and ACME, oh my!

In the old days, (and indeed, still today in many cases!) operators would write a Certificate Signing Request or CSR with the pertinent information for their website, and the SSL authority would then issue a certificate, send it to the operator via e-mail (or would you believe it, paper mail), after which the webserver operator could install and use the cert.

Today, most SSL authorities and their customers use the Automatic Certificate Management Environment or ACME protocol, which is described in [RFC8555]. It defines a way for certificate authorities to check the websites that they are asked to issue a certificate for, using so-called challenges. There are several challenge types to choose from, but the one I’ll be focusing on is called HTTP-01. These challenges are served from a well-known URI, unsurprisingly in the path /.well-known/..., as described in [RFC5785].

Certbot

Certbot: Usually when running a webserver with SSL enabled, folks will use the excellent [Certbot] tool from the Electronic Frontier Foundation. This tool is really smart, and has plugins that can take a webserver running common server software like Apache, NGINX, HAProxy or Plesk, figure out how you configured the webserver (which hostnames, options, etc), request a certificate and rewrite your configuration automatically. A nice touch is that it also sets up automatic certificate renewal using a crontab.

LEGO
LEGO: A Let’s Encrypt client and ACME library written in Go [ref]. It’s super powerful, able to solve multiple types of ACME challenges, and tailored to work well with Let’s Encrypt as a certificate authority. The HTTP-01 challenge works as follows: when an operator wants to prove that they own a given domain name, the CA can challenge the client to host a mutually agreed upon random number at a random URL under their webserver’s /.well-known/acme-challenge/ on port 80. The CA will send an HTTP GET to this random URI and expect the number back in the response.

Shared SSL at Edge

Because I will be running multiple frontends in different locations, it’s operationally tricky to serve this HTTP-01 challenge (a random number in a randomly named file) on all three NGINX servers. But while the LEGO client can write the challenge directly into a file in the webroot of a server, it can also run as an HTTP server whose sole purpose is responding to the challenge.

ACME Flow

This is a killer feature: if I point the /.well-known/acme-challenge/ URI on all the NGINX servers to the one LEGO instance running centrally, it no longer matters which of the NGINX servers Let’s Encrypt will try to use to solve the challenge - they will all serve the same thing! The LEGO client will construct the challenge request, ask Let’s Encrypt to send the challenge, and then serve the response. The only thing left to do then is copy the resulting certificate to the frontends.

Let me demonstrate how this works by taking an example based on four websites: [go.ipng.ch], [video.ipng.ch], [billing.ipng.ch] and [grafana.ipng.ch]. These run on four separate virtual machines (or docker containers), all within the IPng Site Local private network in 198.19.0.0/16 and 2001:678:d78:500::/56, and none of them are reachable from the internet.

Ready? Let’s go!

lego@lego:~$ lego --path /etc/lego/ --http --http.port :8080 --email=noc@ipng.ch \
  --domains=nginx0.ipng.ch --domains=grafana.ipng.ch --domains=go.ipng.ch \
  --domains=video.ipng.ch --domains=billing.ipng.ch \
  run

The flow of requests is as follows:

  1. The LEGO client contacts the Certificate Authority and requests validation for the cluster hostname nginx0.ipng.ch and the four additional domains. It asks the CA to perform an HTTP-01 challenge. The CA will share two random numbers with LEGO, which will start a webserver on port 8080 and serve the URI /.well-known/acme-challenge/$(NUMBER1).

  2. The CA will now resolve the A/AAAA addresses for the domain (grafana.ipng.ch), which is a CNAME for the cluster (nginx0.ipng.ch), which in turn has multiple A/AAAA records pointing to the three machines associated with it (see the DNS sketch after this list). The CA then visits any one of the NGINX servers on that negotiated URI, and each of them forwards requests for /.well-known/acme-challenge internally back to the machine running LEGO on its port 8080.

  3. The LEGO client will know that it’s going to be visited on the URI /.well-known/acme-challenge/$(NUMBER1), as it has negotiated that with the CA in step 1. When the challenge request arrives, LEGO will know to respond using the contents as agreed upon in $(NUMBER2).

  4. After validating that the response on the random URI contains the agreed upon random number, the CA knows that the operator of the webserver is the same as the certificate requestor for the domain. It issues a certificate to the LEGO client, which stores it on its local filesystem.

  5. The LEGO machine finally distributes the private key and certificate to all NGINX machines, which are now capable of serving SSL traffic under the given names.
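
For reference, the DNS that step 2 relies on looks roughly like the sketch below. The public addresses and TTLs are placeholders from the documentation ranges, not the real ones:

nginx0.ipng.ch.    300  IN  A      192.0.2.10   ; nginx0-chrma0 (placeholder address)
nginx0.ipng.ch.    300  IN  A      192.0.2.20   ; nginx0-chplo0 (placeholder address)
nginx0.ipng.ch.    300  IN  A      192.0.2.30   ; nginx0-nlams1 (placeholder address)
nginx0.ipng.ch.    300  IN  AAAA   2001:db8::10 ; and so on for IPv6
go.ipng.ch.        300  IN  CNAME  nginx0.ipng.ch.
video.ipng.ch.     300  IN  CNAME  nginx0.ipng.ch.
billing.ipng.ch.   300  IN  CNAME  nginx0.ipng.ch.
grafana.ipng.ch.   300  IN  CNAME  nginx0.ipng.ch.

The only records that ever need to change when adding or moving frontends are the A/AAAA records on nginx0.ipng.ch; the per-website CNAMEs stay put.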

This sequence is done for each of the domains (and indeed, any other domain I’d like to add), and in the end a bundled certificate with the common name nginx0.ipng.ch and the four additional alternate names is issued and saved in the certificate store. Up until this point, NGINX has been operating in clear text, that is to say the CA has issued the ACME challenge on port 80, and NGINX has forwarded it internally to the machine running LEGO on its port 8080 without using encryption.

Taking a look at the certificate that I’ll install in the NGINX frontends (note: never share your .key material, but .crt files are public knowledge):

lego@lego:~$ openssl x509 -noout -text -in /etc/lego/certificates/nginx0.ipng.ch.crt
...
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            03:db:3d:99:05:f8:c0:92:ec:6b:f6:27:f2:31:55:81:0d:10
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Let's Encrypt, CN = R3
        Validity
            Not Before: Mar 16 19:16:29 2023 GMT
            Not After : Jun 14 19:16:28 2023 GMT
        Subject: CN = nginx0.ipng.ch
...
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:billing.ipng.ch, DNS:go.ipng.ch, DNS:grafana.ipng.ch,
                DNS:nginx0.ipng.ch, DNS:video.ipng.ch

While the amount of output of this certificate is considerable, I’ve highlighted the cool bits. The Subject (also called Common Name or CN) of the cert is the first --domains entry, and the alternate names are that one plus all other --domains given when calling LEGO earlier. In other words, this certificate is valid for all five DNS domain names. Sweet!
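
Since Let’s Encrypt certificates are only valid for three months, a quick expiry check on the certificate store comes in handy later on (just a convenience command, not part of the setup):

lego@lego:~$ openssl x509 -noout -subject -enddate -in /etc/lego/certificates/nginx0.ipng.ch.crt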

NGINX HTTP Configuration

I find it useful to think about the NGINX configuration in two parts: (1) the cleartext / non-ssl parts on port 80, and (2) the website itself that lives behind SSL on port 443. So in order, here’s my configuration for the acme-challenge bits on port 80:

pim@nginx0-chrma0:~$ cat << 'EOF' | sudo tee /etc/nginx/conf.d/lego.inc
location /.well-known/acme-challenge/ {
  auth_basic off;
  proxy_intercept_errors  on;
  proxy_http_version      1.1;
  proxy_set_header        Host $host;
  proxy_pass              http://lego.net.ipng.ch:8080;
  break;
}
EOF

pim@nginx0-chrma0:~$ cat << 'EOF' | sudo tee /etc/nginx/sites-available/go.ipng.ch.conf
server {
  listen [::]:80;
  listen 0.0.0.0:80;

  server_name go.ipng.ch go.net.ipng.ch go;
  access_log /var/log/nginx/go.ipng.ch-access.log;

  include "conf.d/lego.inc";

  location / {
    return 301 https://go.ipng.ch$request_uri;
  }
}
EOF

The first file is an include-file that is shared between all websites I’ll serve from this cluster. Its purpose is to forward any requests that start with the well-known ACME challenge URI onto the backend LEGO virtual machine, without requiring any authorization. Then, the second snippet defines a simple webserver on port 80 giving it a few names (the FQDN go.ipng.ch but also two shorthands go.net.ipng.ch and go). Due to the include, the ACME challenge will be performed on port 80. All other requests will be rewritten and returned as a redirect to https://go.ipng.ch/. If you’ve ever wondered how folks are able to type http://go/foo and still avoid certificate errors, here’s a cool trick that accomplishes that.

Actually these two things are all that’s needed to obtain the SSL cert from Let’s Encrypt. I haven’t even started a webserver on port 443 yet! To recap:

  • Listen only to /.well-known/acme-challenge/ on port 80, and forward those requests to LEGO.
  • Rewrite all other port-80 traffic to https://go.ipng.ch/ to avoid serving any unencrypted content.
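
One small step that’s implied but not shown: on Debian, the file in sites-available still needs to be symlinked into sites-enabled and NGINX reloaded before any of this takes effect. A sketch:

pim@nginx0-chrma0:~$ sudo ln -s /etc/nginx/sites-available/go.ipng.ch.conf /etc/nginx/sites-enabled/
pim@nginx0-chrma0:~$ sudo nginx -t && sudo systemctl reload nginx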

NGINX HTTPS Configuration

Now that I have the SSL certificate in hand, I can start to write webserver configs to handle the SSL parts. I’ll include a few common options to make SSL as safe as it can be (borrowed from Certbot), and then create the configs for the webserver itself:

pim@nginx0-chrma0:~$ cat << 'EOF' | sudo tee /etc/nginx/conf.d/options-ssl-nginx.inc
ssl_session_cache shared:le_nginx_SSL:10m;
ssl_session_timeout 1440m;
ssl_session_tickets off;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;

ssl_ciphers "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384";
EOF

pim@nginx0-chrma0:~$ cat << 'EOF' | sudo tee -a /etc/nginx/sites-available/go.ipng.ch.conf
server {
  listen [::]:443 ssl http2;
  listen 0.0.0.0:443 ssl http2;
  ssl_certificate /etc/nginx/conf.d/nginx0.ipng.ch.crt;
  ssl_certificate_key /etc/nginx/conf.d/nginx0.ipng.ch.key;
  include /etc/nginx/conf.d/options-ssl-nginx.inc;
  ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.pem;

  server_name go.ipng.ch;
  access_log /var/log/nginx/go.ipng.ch-access.log upstream;

  location /edit/ {
    proxy_pass http://git.net.ipng.ch:5000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    satisfy any;
    allow 198.19.0.0/16;
    allow 194.1.163.0/24;
    allow 2001:678:d78::/48;
    deny  all;
    auth_basic           "Go Edits";
    auth_basic_user_file /etc/nginx/conf.d/go.ipng.ch-htpasswd;
  }

  location / {
    proxy_pass http://git.net.ipng.ch:5000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}
EOF

This HTTPS server block is appended (note the tee -a) to the same go.ipng.ch.conf file that already holds the port-80 server block. The certificate and key are loaded from /etc/nginx/conf.d/nginx0.ipng.ch.{crt,key}, and the shared SSL options come from the options-ssl-nginx.inc include.
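
One thing the server block references but that I haven’t shown being created is ssl-dhparams.pem. If you don’t already have one (Certbot ships a pregenerated file, for instance), generating it is a one-liner; 2048 bits is a common choice:

pim@nginx0-chrma0:~$ sudo openssl dhparam -out /etc/nginx/conf.d/ssl-dhparams.pem 2048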

Next, I don’t want folks on the internet to be able to create or edit/overwrite my go-links, so I’ll add an ACL on the URI starting with /edit/. Either you come from a trusted IPv4/IPv6 prefix, in which case you can edit links at will, or alternatively you present a username and password that is stored in the go.ipng.ch-htpasswd file (created with the htpasswd tool from the Debian package apache2-utils).
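
Creating that password file might look like this (the username pim here is just an example):

pim@nginx0-chrma0:~$ sudo apt install apache2-utils
pim@nginx0-chrma0:~$ sudo htpasswd -c /etc/nginx/conf.d/go.ipng.ch-htpasswd pim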

Finally, all other traffic is forwarded internally to the machine git.net.ipng.ch on port 5000, where the go-link server is running as a Docker container. That server accepts requests from the IPv4 and IPv6 IPng Site Local addresses of all three NGINX frontends to its port 5000.

Icing on the cake: Internal SSL

The go-links server I described above doesn’t itself speak SSL. It’s meant to be frontended by an Apache, NGINX or HAProxy which handles the client en- and decryption, and usually that frontend will be running on the same server, at which point I could just let it bind localhost:5000. However, the astute observer will point out that the traffic on the IPng Site Local network is cleartext. Now, I don’t think that my go-links traffic poses a security or privacy threat, but certainly other sites (like billing.ipng.ch) are more touchy, and as such require end-to-end encryption on the network.

In 2003, twenty years ago, an extension was added to TLS that allows the client to specify the hostname it expects to connect to, a feature called Server Name Indication or SNI, described in detail in [RFC3546]:

[TLS] does not provide a mechanism for a client to tell a server the name of the server it is contacting. It may be desirable for clients to provide this information to facilitate secure connections to servers that host multiple ‘virtual’ servers at a single underlying network address.

In order to provide the server name, clients MAY include an extension of type “server_name” in the (extended) client hello.

Every modern webserver and browser can use the SNI extension when talking to each other. NGINX can be configured to pass traffic along to the internal webserver by re-encrypting it with a new SSL connection. Considering the internal hostname will not necessarily be the same as the external website hostname, I can use SNI to force the NGINX->Billing connection to re-use the billing.ipng.ch hostname:

  server_name billing.ipng.ch;
  ...
  location / {
    proxy_set_header        Host $host;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;
    proxy_read_timeout      60;

    proxy_pass              https://billing.net.ipng.ch:443;
    proxy_ssl_name          $host;
    proxy_ssl_server_name   on;
  }

What happens here is the upstream server is hit on port 443 with hostname billing.net.ipng.ch but the SNI value is set back to $host which is billing.ipng.ch (note, without the *.net.ipng.ch domain). The cool thing is, now the internal webserver can reuse the same certificate! I can use the mechanism described here to obtain the bundled certificate, and then pass that key+cert along to the billing machine, and serve it there using the same certificate files as the frontend NGINX.
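
To convince yourself that the backend really does present the right certificate for the SNI name, something like this can be run from one of the frontends (a quick sketch):

pim@nginx0-chrma0:~$ openssl s_client -connect billing.net.ipng.ch:443 -servername billing.ipng.ch </dev/null 2>/dev/null | openssl x509 -noout -subject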

What’s next

Of course, the mission to save IPv4 addresses is achieved - I can now run dozens of websites behind these three IPv4 and IPv6 addresses, and security gets a little bit better too, as the webservers themselves are tucked away in IPng Site Local and unreachable from the public internet.

This IPng Frontend design also helps with reliability and latency. I can put frontends in any number of places, and renumber them relatively easily (by adding or removing A/AAAA records on nginx0.ipng.ch and otherwise CNAMEing all my websites to that cluster name). If load becomes an issue, NGINX has a bunch of features like caching, cookie persistence, and load balancing with health checks (so I could use multiple backend webservers and round-robin over the healthy ones), and so on. Our Mastodon server on [ublog.tech] or our Peertube server on [video.ipng.ch] can make use of many of these optimizations, but while I do love engineering, I am also super lazy, so I prefer not to prematurely over-optimize.

The main thing that’s next is to automate a bit more of this. IPng Networks has an Ansible controller, to which I’d like to add maintenance of the NGINX and LEGO configuration. That would look something like defining a pool nginx0 with hostnames A, B and C, and then having a playbook that creates the virtual machines, installs and configures NGINX, and plumbs them through to the LEGO machine. I can imagine running a specific playbook that keeps the certificates fresh in some CI/CD (I have a drone runner alongside our [Gitea] server), or just adding something clever to a cronjob on the LEGO machine that periodically runs lego ... renew and, when new certificates are issued, copies them out to the NGINX machines in the given cluster with rsync and reloads their configuration to pick up the new certs.
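
To make that last idea a bit more concrete, here’s a minimal sketch of what such a cronjob on the LEGO machine could run. The frontend hostnames, the internal nginx0-*.net.ipng.ch names, and the assumption that the lego user may rsync into /etc/nginx/conf.d/ and reload NGINX over SSH are all mine, not the actual IPng automation:

#!/bin/sh
# Renew the bundled certificate if it expires within 30 days (a sketch, not
# the actual IPng automation).
lego --path /etc/lego/ --http --http.port :8080 --email=noc@ipng.ch \
  --domains=nginx0.ipng.ch --domains=grafana.ipng.ch --domains=go.ipng.ch \
  --domains=video.ipng.ch --domains=billing.ipng.ch \
  renew --days 30

# Push the key and certificate to each frontend and reload NGINX there.
for FE in nginx0-chrma0 nginx0-chplo0 nginx0-nlams1; do
  rsync -az /etc/lego/certificates/nginx0.ipng.ch.crt \
            /etc/lego/certificates/nginx0.ipng.ch.key \
            ${FE}.net.ipng.ch:/etc/nginx/conf.d/
  ssh ${FE}.net.ipng.ch 'sudo nginx -t && sudo systemctl reload nginx'
done

Since lego renew only replaces the certificate when it’s within the given number of days of expiry, something like this is safe to run from cron, say, weekly.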

But considering Ansible is a whole elaborate bundle of joy of its own, I’ll leave that for maybe another article.