Case Study: IPng's Client Certificates

Posted: 2026-06-27

Introduction

At IPng, I run a cluster of NGINX web frontends, the design of which I described in [this article]. Most internal services sit behind a nifty feature of NGINX called the [geo module], which creates a map of IP prefixes that either are, or aren’t, allowed to access the website. There’s also a more sophisticated [geoip module], which allows me to look up the ASN or country of a given source IP, and if it resolves to a country I am not interested in serving, NGINX returns an early 403. It is a blunt but effective first line of defense against probes and bots from the other side of the world.

The problem is that geo-IP is a country-level filter, not a person-level one. IPng has a bunch of internal websites, such as the VPP Maglev, an eVPN control panel, and a billing console, which are all behind the same frontend cluster. I simply can’t use geo-IP alone to express more fine-grained access control, such as “Pim is allowed to reach evpn.ipng.ch/admin/, but Alice and Bob may reach only evpn.ipng.ch/view/”. Every legitimate user in an admitted country (or IP prefix) would get access to everything, which is no bueno.

What is pretty common in our industry is to break this problem into two distinct things.

Authentication: AuthN is a way to establish who is making the request, down to the individual person and ideally down to the individual device or browser.
Authorization: AuthZ is some form of policy that says this authenticated identity may or may not reach a given web property or path.

This article is a case study in how I chose to implement this at IPng Networks.

WebPKI and Authentication

The web has a mechanism for client authentication, built right into TLS: client certificates, as described in [RFC 8446]. A browser presents a certificate during the TLS handshake; the server verifies it against a trusted Certificate Authority and learns the client’s identity. NGINX handles this natively, which is dope. The gap is that a certificate alone gives identity (AuthN), not permissions (AuthZ). Knowing that Alice is Alice does not tell NGINX whether Alice is allowed to visit /admin (and she’s not!).

A natural but naive first thought is to issue short-lived certificates: a 24-hour cert per access grant, let it expire, issue a new one. But short-lived certificates are a support nightmare: “Install this certificate in your browser” is hard enough once; doing it every day, across a phone, a laptop, and a work machine, and for multiple websites, quickly becomes a constant chore that nobody in the history of ever will be signing up for. I want to go the other direction: give long-lived credentials so that I can instead revoke any access entitlements instantly, not credentials that expire and need re-enrollment continuously.

This article describes my chosen solution on IPng’s web farm: ipng-nginx-auth. It issues and manages long-lived client certificates through a private CA, handles the full certificate lifecycle from enrollment to renewal and revocation, and layers a fine-grained ACL engine on top so that each certificate identity maps to exactly the set of web properties and paths it is permitted to reach.

The project ships three components: authd is a central control plane, authz is a distributed per-NGINX enforcement sidecar, and authc is an operator commandline interface to configure the system. I like reusing patterns, so the project follows the same architecture as [vpp-evpn] and [vpp-maglev] before it: a single authoritative daemon, a fleet of lightweight enforcement agents, a gRPC-based distribution event stream, and a golang-cli shell which I published earlier on [git.ipng.ch/ipng/golang-cli]. Speaking of git, that website is now protected by [Anubis] due to the constant stream of bots hammering my git repos. Keep it classy, Big Tech!

Mutual TLS

When you visit https://ipng.ch, normally the TLS handshake authenticates the server to your browser. Reading up on [RFC 8446] Section 4.4.2, the server sends a Certificate message containing its leaf certificate and any intermediate CA certificates. Your browser validates the chain up to a trusted root in its certificate store, checks that the server name matches the certificate’s Subject Alternative Names, and verifies the certificate has not expired or been revoked. Only then does the handshake complete. This one-directional authentication is the foundation of HTTPS and it is what pretty much every website you and I visit relies on.

Mutual TLS (mTLS) adds validation in the opposite direction: the server also asks the client to present a certificate and validates it against a CA it trusts. RFC 8446 Section 4.3.2 describes the server-side CertificateRequest message that initiates this exchange. I can take a look and see:

$ openssl s_client -connect evpn.ipng.ch:443 -servername evpn.ipng.ch \
    -cert client.crt -key client.key
...
Certificate chain
 0 s:CN=ipng.ch
   i:C=US, O=Let's Encrypt, CN=E8
   a:PKEY: EC, (prime256v1); sigalg: ecdsa-with-SHA384
   v:NotBefore: May 12 06:44:46 2026 GMT; NotAfter: Aug 10 06:44:45 2026 GMT
 1 s:C=US, O=Let's Encrypt, CN=E8
   i:C=US, O=Internet Security Research Group, CN=ISRG Root X1
   a:PKEY: EC, (secp384r1); sigalg: sha256WithRSAEncryption
   v:NotBefore: Mar 13 00:00:00 2024 GMT; NotAfter: Mar 12 23:59:59 2027 GMT
...
Acceptable client certificate CA names
CN=ipng-nginx-auth client-auth CA
Requested Signature Algorithms: id-ml-dsa-65:id-ml-dsa-87:id-ml-dsa-44:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:ecdsa_brainpoolP256r1_sha256:ecdsa_brainpoolP384r1_sha384:ecdsa_brainpoolP512r1_sha512:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Shared Requested Signature Algorithms: id-ml-dsa-65:id-ml-dsa-87:id-ml-dsa-44:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:ecdsa_brainpoolP256r1_sha256:ecdsa_brainpoolP384r1_sha384:ecdsa_brainpoolP512r1_sha512:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: ecdsa_secp256r1_sha256
Negotiated TLS1.3 group: X25519MLKEM768
---
SSL handshake has read 3752 bytes and written 2570 bytes
Verification: OK

The first block is the webserver authenticating itself, but then the second block is the webserver itself demanding that the client present a certificate, more precisely one that is coming from a CA called CN=ipng-nginx-auth client-auth CA. If I don’t issue the -cert and -key flags, the (nginx) server will see no client certificate. Depending on NGINX’s ssl_verify_client setting, it will either refuse the request outright (on) or carry on and record that no certificate was presented (optional). Maybe you’re thinking “Why would you use optional?” Until I understood it better, I asked myself that too, but I’ll explain why in a minute, hang on.

Once the mTLS handshake completes, the server has verified that the client holds the private key corresponding to a certificate signed by its specified CA. The client’s identity is encoded in the certificate’s Subject [Distinguished Name (DN)], a structured string per [RFC 4514] composed of attribute type/value pairs such as CN=alice@ipng.ch,O=IPng Networks GmbH. The DN can include Organization (O), Organizational Unit (OU), Country (C), and other attributes, but the leaf-level identifier is the aptly named Common Name (CN).

What I take away from this, is that for ipng-nginx-auth, I can create my own Private Certificate Authority and tell NGINX to use it to ask clients to authenticate themselves. I can use as many or few fields as I like, so my first decision is to use only the Common Name, setting it simply to the user’s email address. When client authentication is turned on, NGINX sets three pertinent variables:

$ssl_client_s_dn captures a verified client’s full Distinguished Name (DN)
$ssl_client_serial contains the used certificate serial number, and
$ssl_client_verify can be either SUCCESS, FAILED:reason, or NONE, signaling the verdict of the client certificate inspection.

What’s nice about this, is I can let NGINX do the hard work of mTLS and cryptographic validation of the client identity (in other words, it implements the entire AuthN for me), and simply leverage these three resulting variables to perform the AuthZ parts.
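As a small illustration of working with the forwarded DN, here is a minimal Go sketch that pulls the Common Name out of a string such as the one NGINX puts in $ssl_client_s_dn. The function names (splitRDNs, commonName) are hypothetical and not taken from ipng-nginx-auth. The sketch assumes a comma-separated DN with backslash escapes, does not unescape values, and ignores multi-valued RDNs.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRDNs splits a DN string on commas that are not backslash-escaped.
func splitRDNs(dn string) []string {
	var parts []string
	var cur strings.Builder
	escaped := false
	for _, r := range dn {
		switch {
		case escaped:
			// The previous rune was a backslash: keep this rune verbatim.
			cur.WriteRune(r)
			escaped = false
		case r == '\\':
			cur.WriteRune(r)
			escaped = true
		case r == ',':
			parts = append(parts, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	return append(parts, cur.String())
}

// commonName returns the value of the first CN attribute found in dn.
func commonName(dn string) (string, bool) {
	for _, rdn := range splitRDNs(dn) {
		key, val, ok := strings.Cut(strings.TrimSpace(rdn), "=")
		if ok && strings.EqualFold(key, "CN") {
			return val, true
		}
	}
	return "", false
}

func main() {
	cn, ok := commonName("CN=alice@ipng.ch,O=IPng Networks GmbH")
	fmt.Println(cn, ok) // prints "alice@ipng.ch true"
}
```

Because I only ever set the Common Name, a regular expression over the whole DN string works just as well; the parser only matters once more attributes are in play.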

So far, so good.

AuthN: Private CA

First, I need to study a bit more jargon. A [Certificate Authority] is an entity that issues [X.509] certificates per [RFC 5280]. Public CAs like [Let’s Encrypt] and [Sectigo] are pre-trusted by browsers and operating systems world-wide. Their trustworthiness rests on strict auditing programs and on Certificate Transparency: a public audit log of every certificate any CA issues. I covered CT Logs in depth in my [Certificate Transparency series]. IPng runs two [Static CT Logs] of its own.

Anyway, for client certificates, a public CA is both overkill and the wrong tool. Nobody outside of IPng Networks itself needs to trust these certificates. A private CA, whose root certificate I distribute to exactly the NGINX nodes that need it, gives me complete control over validity periods, issuance policy, and revocation, with no external dependencies, fees, or audits. If I break it, I get to keep the broken parts, but at least nobody complains, and that’s the way I like it.

The moving parts of a CA are straightforward. The CA holds a key pair and a self-signed root certificate that acts as the trust anchor. To issue a leaf certificate, the normal flow is: the requestor generates a key pair, constructs a Certificate Signing Request (CSR) containing their public key and proposed subject DN, and submits it. The CA validates the CSR, signs a certificate with its own key, and returns a signed certificate (possibly uploading the cert into a transparency log). The resulting certificate chain, the leaf cert signed by the CA’s key, is what NGINX validates at handshake time against the CA file given to ssl_client_certificate, exactly the thing I played with above in the openssl s_client command.

To make things a bit simpler, I decide to make ipng-nginx-auth skip the CSR step entirely and run its enrollment server-side. The daemon generates the key pair, signs the certificate, and bundles both into a password-protected [PKCS#12] (.p12) file and an Apple .mobileconfig profile. Using my iPhone I can install the bundle with a single tap, and in Firefox, Chrome, and Safari I can import the P12 just once. My browser now knows that if a website demands mTLS, it can look in its client certificate store and select a cert issued by the CA that NGINX says it needs a client identity from.
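To make the moving parts concrete, here is a minimal sketch of a private CA and a server-side enrollment using only Go's standard library. It is not the project's enrollment code: the function names (newCA, issueClient, verifyClient), the fixed serial numbers, and the validity periods are assumptions for illustration. PKCS#12 and .mobileconfig packaging are left out, because the standard library has no encoder for them.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a key pair and a self-signed root certificate (the trust anchor).
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1), // a real CA would use random serials
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().AddDate(10, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	// Template and parent are the same certificate: self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issueClient generates the client's key server-side (no CSR) and signs a
// client-auth leaf whose Common Name is the user's email address.
func issueClient(ca *x509.Certificate, caKey *ecdsa.PrivateKey, email string, serial int64) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: email},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyClient checks the leaf against the CA, roughly what the TLS server
// does with the client certificate at handshake time.
func verifyClient(ca, leaf *x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	ca, caKey, err := newCA("example client-auth CA")
	if err != nil {
		panic(err)
	}
	leaf, _, err := issueClient(ca, caKey, "alice@ipng.ch", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(leaf.Subject.CommonName, verifyClient(ca, leaf) == nil)
}
```

The leaf only verifies against the CA that signed it, which is the property NGINX relies on when it is handed the CA chain.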

AuthN: Client TLS certificates

Enabling client certificate verification in NGINX requires just two directives inside a server {} block. The ssl_client_certificate directive points to the CA chain PEM that NGINX uses to validate the client’s certificate, and ssl_verify_client controls the strictness:

ssl_client_certificate /etc/ipng-nginx-auth/client-ca-chain.crt;
ssl_verify_client      optional;

Setting ssl_verify_client on makes NGINX refuse every request with an HTTP 400 error if no valid client certificate is presented. This is the strictest posture but it is also inflexible: health check endpoints, public landing pages, and any browser without a certificate are all turned away before any location logic runs, which yields a terrible user experience.

In the optional mode, NGINX sets $ssl_client_verify to SUCCESS when the certificate is valid and signed by the configured CA, to NONE if no certificate was presented, or to a FAILED:reason string otherwise. As I saw above, the verified subject DN and the serial are stored in variables. This mode lets me enforce client certificates selectively per location {}, leaving health checks and unauthenticated paths untouched while enforcing them on the locations that matter. It gives a much better user experience.

While rummaging through the NGINX [docs], I notice that NGINX also supports static Certificate Revocation List (CRL) checking via ssl_crl, pointing to a PEM-encoded revocation list on disk. This works but kind of sucks at the same time: the CRL file must be distributed to every NGINX node and the process must reload for changes to take effect. On a fleet of many frontends, a prompt revocation means scripting a copy and reload across all nodes. But there is another way. An AuthZ sidecar could contain an in-memory certificate status set, including the CRL, fed in real time from a central authd service over a streaming gRPC subscription. But I get ahead of myself. Let me learn more about the NGINX parts first.

To wrap up, a candidate configuration that admits any browser presenting a CA-issued certificate that has not expired looks like this:

server {
    listen 443 ssl;
    ssl_certificate     /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    ssl_client_certificate /etc/ipng-nginx-auth/client-ca-chain.crt;
    ssl_verify_client      optional;

    location / {
        if ($ssl_client_verify != "SUCCESS") {
            return 403;
        }
        proxy_pass http://backend;
    }
}

This configuration admits any browser holding a certificate from my private CA that has not expired, regardless of who that user is or which path they are requesting. Pim, Alice, and that asshole Bob all get through to the same backend. Some policy like “Alice can use /view/ but not /admin/” is still not expressible here. A certificate gives me an identity; it does not give me permissions. Getting from identity to fine-grained path-level access control is exactly the Authorization problem, and that requires something beyond NGINX’s built-in TLS directives.

Authorization: NGINX Module

I find out that NGINX ships an [auth_request module] that fits this authorization gap like a glove. When auth_request is configured for a location, NGINX makes an internal HTTP subrequest to a designated URI before serving the original request. The game is laughably simple: If the subrequest returns HTTP 200, the original request proceeds. If it returns HTTP 403, NGINX denies the client. Any context needed for the policy decision, the client’s DN, certificate serial number, the original host and URI, and any additional information like say the name of an ACL to evaluate, I can attach all of these as custom proxy headers on the internal subrequest. It immediately strikes me as a very idiomatic way to implement AuthZ.

And what’s even better, the module is included by default in Debian’s nginx package. The configuration pattern is:

set $ipng_authz_acl "wiki";

location / {
    auth_request /.well-known/ipng/authz;
    proxy_pass http://backend;
    error_page 403 =302 https://ipng.ch/;
}

location = /.well-known/ipng/authz {
    internal;
    proxy_pass http://unix:/run/ipng-nginx-authz/authz.sock:/check?acl=$ipng_authz_acl;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Client-Verify $ssl_client_verify;
    proxy_set_header X-Client-Serial $ssl_client_serial;
    proxy_set_header X-Client-DN     $ssl_client_s_dn;
    proxy_set_header X-Client-Addr   $remote_addr;
    proxy_set_header X-Orig-URI      $request_uri;
    proxy_set_header X-Orig-Host     $host;
}

First, a belt-and-suspenders comment: the internal flag prevents the authz location from being reached directly by external clients. The proxy_pass points at a Unix Domain Socket (UDS) path upon which an external sidecar server listens. It passes all of these headers and then calls /check?acl=$ipng_authz_acl, where the query parameter tells the sidecar which named ACL to evaluate.

If the authorization service returns 403, NGINX by default renders a bare error page, which is kind of gross. In the location / handler, I can intercept that with a redirect to a friendlier destination. Turning the 403 into a 302 redirect will send the user to the IPng homepage rather than a cryptic error. For unauthenticated users, this approach could also serve as an implicit enrollment prompt: the landing page might explain how to obtain and install a certificate, rather than leaving them puzzled at a 403 Forbidden. Not everybody speaks HTTP, you know!

I also take a moment to appreciate that the auth_request module knows nothing about certificates, ACLs, or users. It delegates the allow/deny decision to an external HTTP endpoint and respects the response code. That separation of concerns is what makes the system composable: NGINX handles TLS termination and extraction of the mTLS identity of the connecting client (the AuthN stuff). Then, I can add an authorization sidecar to handle policy evaluation (the AuthZ stuff). Neither component needs to understand the other’s internals.
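The receiving end of that subrequest can be very small. Here is a sketch of a /check handler in Go, assuming the header names from the NGINX snippet above. The Query type, checkHandler, probe, and the decide callback are hypothetical names for illustration, not the sidecar's actual code. The handler is exercised in-process with httptest rather than over a Unix socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Query carries the fields NGINX forwards on the auth_request subrequest.
type Query struct {
	ACL, DN, Serial, Addr, Host, URI string
}

// checkHandler answers 200 (allow) or 403 (deny). The decide callback is a
// stand-in for real policy evaluation.
func checkHandler(decide func(Query) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// NGINX already did the cryptography; only accept a verified identity.
		if r.Header.Get("X-Client-Verify") != "SUCCESS" {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		q := Query{
			ACL:    r.URL.Query().Get("acl"),
			DN:     r.Header.Get("X-Client-DN"),
			Serial: r.Header.Get("X-Client-Serial"),
			Addr:   r.Header.Get("X-Client-Addr"),
			Host:   r.Header.Get("X-Orig-Host"),
			URI:    r.Header.Get("X-Orig-URI"),
		}
		if !decide(q) {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// probe simulates the NGINX subrequest and returns the HTTP status code.
func probe(h http.Handler, acl, verify, dn string) int {
	req := httptest.NewRequest(http.MethodGet, "/check?acl="+acl, nil)
	req.Header.Set("X-Client-Verify", verify)
	req.Header.Set("X-Client-DN", dn)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := checkHandler(func(q Query) bool {
		return q.ACL == "wiki" && q.DN == "CN=alice@ipng.ch"
	})
	fmt.Println(probe(h, "wiki", "SUCCESS", "CN=alice@ipng.ch")) // prints 200
	fmt.Println(probe(h, "wiki", "NONE", ""))                     // prints 403
}
```

Denying anything that is not SUCCESS up front is one possible design choice; a policy that wants to admit anonymous visitors to some paths would have to pass the verify state into the decision instead.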

IPng NGINX Auth

This is how I came to sketch a quick ACL language that would allow a concoction like above to pass along an internal AuthZ request to a sidecar running alongside NGINX. Its job is to receive ACLs from a central location, and answer questions like “is cert SERIAL from distinguished name DN, coming from address IP, allowed to visit website HOST on uri path URI?”

Structure

Why think of new ways poorly, if you can steal somebody else’s approach? Swipe, Swiper, swipe!! My ACL evaluation model is loosely inspired by [OpenBSD pf.conf], the packet filter configuration language that, to my mind, remains one of the cleanest firewall policy languages ever written. The key idea is simple: rules are evaluated in order, every matching rule updates a running verdict, and the default verdict is deny. The quick keyword (which I call terminate in my ACLs) short-circuits evaluation immediately and returns whatever is the current verdict.

I make each named ACL an ordered sequence of rules. Each rule has a sequence number (seq), perhaps I am a network engineer after all, and five optional match constraints, each coming from the headers fed in by nginx:

host regular expression, matching the X-Orig-Host header
uri regular expression, matching the X-Orig-URI header
user regular expression, matching the X-Client-DN header
cert literal, matching the X-Client-Serial header
prefix CIDR prefix, matching the X-Client-Addr header (for IPv6 and IPv4).

These are accompanied by an action (permit or deny), and an optional terminate flag. A rule matches when all of its set constraints match (AND-logic); an unset constraint matches anything. Evaluation starts with deny, walks rules in ascending seq order, and each matching rule updates the running verdict. A matching rule with terminate stops the walk immediately.

The authoring workflow needs to be staged because I can’t go publishing half-edited ACLs, I’m not Cisco IOS or FRR after all. I decide to make rule edits accumulate in a staged version inside of the central authd server, which running authz sidecars never see. The RPC and CLI command acl <name> commit atomically promotes the staged version to live and pushes it to every connected sidecar on their Watch stream. Similarly, acl <name> rollback discards staged edits with no fleet-visible effect. This gives a safe author-review-commit loop and ensures that partial edits are never served to clients mid-edit.
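The staged and live split can be sketched as two maps behind one mutex. This is a simplified illustration, assuming rules are plain strings and leaving out persistence and the push to sidecars; ACLStore and its methods are hypothetical names, not the authd API.

```go
package main

import (
	"fmt"
	"sync"
)

// ACLStore keeps a live and a staged rule list per ACL name. Rules are plain
// strings here to keep the sketch short.
type ACLStore struct {
	mu     sync.Mutex
	live   map[string][]string
	staged map[string][]string
}

func NewACLStore() *ACLStore {
	return &ACLStore{live: map[string][]string{}, staged: map[string][]string{}}
}

// Stage appends a rule to the staged version, seeding it from live on first edit.
func (s *ACLStore) Stage(name, rule string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.staged[name]
	if !ok {
		cur = append([]string(nil), s.live[name]...)
	}
	s.staged[name] = append(cur, rule)
}

// Commit promotes the staged version to live in one step and returns a copy of
// the new live rules, which is what would be pushed to the sidecars.
func (s *ACLStore) Commit(name string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.staged[name]; ok {
		s.live[name] = cur
		delete(s.staged, name)
	}
	return append([]string(nil), s.live[name]...)
}

// Rollback discards staged edits; the live version is untouched.
func (s *ACLStore) Rollback(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.staged, name)
}

// Live returns a copy of the committed rules.
func (s *ACLStore) Live(name string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.live[name]...)
}

func main() {
	s := NewACLStore()
	s.Stage("wiki", "seq 10 user alice@ipng.ch permit")
	fmt.Println(len(s.Live("wiki"))) // prints 0: staged only, nothing live yet
	s.Commit("wiki")
	s.Stage("wiki", "seq 20 uri ^/admin deny")
	s.Rollback("wiki")
	fmt.Println(len(s.Live("wiki"))) // prints 1: rollback left the committed rule alone
}
```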

authd: Centralized Auth Daemon

Now that I have the ACL language roughly shaped, my attention turns to authd, a centralized control plane in one Go binary: two private CAs, a SQLite-backed object store, an ACL staging and testing engine, and a gRPC server. The full API is defined in proto/auth.proto as a single AuthService and covers four domains: CA management, user and certificate lifecycle, ACL authoring, and the fleet distribution stream.

Two separate CAs are a deliberate design choice:

control-plane CA signs the gRPC server certificate and the client certificates that operators and authz sidecars use to authenticate to the gRPC API, each with their own permissions.
client-auth CA signs the browser certificates that end users install; its chain PEM is what NGINX trusts via ssl_client_certificate.

Keeping the two CAs separate makes sense: it limits the blast radius of a key loss. A leaked sidecar mTLS credential cannot be presented as a browser identity, and a browser certificate cannot authenticate any gRPC call. NGINX trusts only the client-auth CA chain; the gRPC API trusts only the control-plane CA. And of course, seeing as this is an ACL system, having mTLS on the gRPC channels is required: I definitely do not want to have an open unauthenticated RPC endpoint to manipulate my authentication system…

But now I have created a bootstrapping problem: the gRPC API requires mTLS, but the mTLS certificates come from the API. I think about this for a while, and decide to break the loop with three offline subcommands that operate directly on the local SQLite database before any network service is running (internal/bootstrap).

bootstrap database creates the schema;
bootstrap ca creates both CAs and the daemon’s own gRPC server leaf;
bootstrap client <name> mints the first operator identity for the gRPC server.

Using this first client, every further credential can be issued online through the gRPC API and recorded in the database, making every identity auditable and revocable via the CLI.

The AuthService gRPC interface in proto/auth.proto exposes a CRUDL surface over four nouns CA, User, Cert and ACL:

ca (in internal/authd/info.go and internal/authd/clients.go) handles CA info, the revocations in the CRL, and control-plane client identities via ca client create|show|list|delete. Clients issued with the authz role are restricted to the Watch stream and ReportEvents RPC; only an operator-role certificate may invoke mutating calls.
user manages users by email with simple verbs like create|show|list|delete|enable|disable. It is described in internal/authd/user.go.
cert (in internal/authd/certs.go) covers certificate lifecycle: cert create runs the full enrollment pipeline (in internal/enroll) including server-side keygen, CA signing, PKCS#12 and Apple .mobileconfig assembly, and delivery to the operator. Afterwards, it discards the client’s private key.
acl family (in internal/authd/acls.go) covers staged rule editing, along with a commit, rollback, and the acl test simulator, which uses the same evaluation engine as the sidecar itself (see internal/acl).

Two streaming RPCs tie the system together. First, Watch pushes the committed ACLs and certificate revocation snapshot (stream AuthzSnapshot) to every subscribed sidecar on connect, and pushes a new snapshot on every policy change (internal/authd/dist.go). Secondly, ReportEvents is the reverse: authz sidecars push their lifecycle events up to authd, which merges them with its own and re-exposes the union through WatchEvents to operators.
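The Watch semantics (current snapshot on connect, a new snapshot on every change) can be illustrated without gRPC using plain channels. This is only a sketch of the fan-out pattern under my own assumptions: Hub, Subscribe, and Publish are hypothetical names, and a subscriber that falls behind simply has its unread snapshot replaced by the newest one.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot stands in for the AuthzSnapshot message; only a version is kept here.
type Snapshot struct{ Version int }

// Hub remembers the current snapshot and fans out updates to subscribers.
type Hub struct {
	mu      sync.Mutex
	current Snapshot
	subs    map[chan Snapshot]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: map[chan Snapshot]struct{}{}}
}

// Subscribe returns a channel that already carries the current snapshot,
// mirroring "push on connect".
func (h *Hub) Subscribe() chan Snapshot {
	ch := make(chan Snapshot, 1)
	h.mu.Lock()
	defer h.mu.Unlock()
	ch <- h.current
	h.subs[ch] = struct{}{}
	return ch
}

// Publish stores snap and pushes it to every subscriber. All sends happen
// under the lock and the one-slot buffer is drained first, so the send never
// blocks and a slow subscriber only ever sees the newest snapshot.
func (h *Hub) Publish(snap Snapshot) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.current = snap
	for ch := range h.subs {
		select {
		case <-ch: // drop a stale, unread snapshot
		default:
		}
		ch <- snap
	}
}

func main() {
	hub := NewHub()
	sub := hub.Subscribe()
	fmt.Println((<-sub).Version) // prints 0: the snapshot delivered on connect
	hub.Publish(Snapshot{Version: 1})
	hub.Publish(Snapshot{Version: 2})
	fmt.Println((<-sub).Version) // prints 2: version 1 was superseded before it was read
}
```

Latest-wins works here because each snapshot is complete; it would not be appropriate for the ReportEvents direction, where every event matters.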

authc: CLI

If you’ve followed along in previous articles, these gRPC endpoints are excellent companions for a Command Line Interface. authc is the operator CLI for this system. It uses the same [golang-cli] interactive shell library that drives the evpnc CLI in [vpp-evpn] and maglevc in [vpp-maglev]. Invoked without arguments it drops into a tab-completing interactive shell; invoked with a command it runs once and exits, useful for scripting. The -json flag switches all output to JSON. The command tree mirrors the gRPC API: ca, user, cert, acl, config, watch, with getters and setters. And I get all of this almost for free: the CLI is 1200 lines of code all up.

Creating a user, issuing two certificates for two different devices, and wiring up an ACL that gives the user general access but restricts /admin to only the workstation certificate looks like this:

ipng-nginx-authc> user create alice@ipng.ch
created user "alice@ipng.ch"

ipng-nginx-authc> cert create alice@ipng.ch expire 1y
issued cert for alice@ipng.ch (cid A3F2..., expires 2027-06-27)
  p12:          alice_at_ipng.ch-A3F2.p12
  mobileconfig: alice_at_ipng.ch-A3F2.mobileconfig
  password:     b9k2-xqmf-8vrp

ipng-nginx-authc> cert create alice@ipng.ch expire 1y
issued cert for alice@ipng.ch (cid 9C11..., expires 2027-06-27)
  p12:          alice_at_ipng.ch-9C11.p12
  mobileconfig: alice_at_ipng.ch-9C11.mobileconfig
  password:     j4t7-nmzs-6wkh

ipng-nginx-authc> acl create wiki
ipng-nginx-authc> acl wiki seq 5 user pim@ipng.ch prefix 2001:687:d78:300::/62 permit terminate
ipng-nginx-authc> acl wiki seq 10 user alice@ipng.ch permit
ipng-nginx-authc> acl wiki seq 20 host wiki.ipng.ch uri ^/admin deny
ipng-nginx-authc> acl wiki seq 30 host wiki.ipng.ch uri ^/admin cert 9C11... permit terminate
ipng-nginx-authc> acl wiki commit

Let me go over this step by step, as it shows the philosophy of the OpenBSD pf.conf I was talking about earlier. In this session Alice gets two certificates: A3F2 for her everyday laptop and 9C11 for her admin workstation. The ACL has four rules. The first one at seq 5 says: if any cert is presented with CN=pim@ipng.ch from the internal IPv6 network in IPng Site Local, permit it and stop evaluating rules (terminate). The following three rules are for Alice: seq 10 permits any request from Alice’s email, giving both certificates broad access. seq 20 goes on to deny any request to /admin, overriding the broad permit for that path. seq 30 re-permits /admin but only for the exact certificate serial 9C11, and it locks in the verdict by issuing a terminate statement.

The net result: Alice’s laptop (A3F2) reaches everything except /admin, while her workstation (9C11) reaches everything including /admin. My certificates get full access, but only if I come from a specific WireGuard VPN pool.

I built a handy acl test simulator which lets me verify this before committing, using the identical evaluation engine the sidecar runs:

ipng-nginx-authc> acl test wiki user alice@ipng.ch cert A3F2 https://wiki.ipng.ch/admin/settings detail
result: deny
reason: matched seq 20

ipng-nginx-authc> acl test wiki user alice@ipng.ch cert 9C11 https://wiki.ipng.ch/admin/settings detail
result: permit
reason: matched seq 30 (terminate)

NGINX

With authd holding the policy and authc authoring it, the remaining piece is getting that policy enforced on each NGINX server, of course without shipping the CA keys or the SQLite database there, with zero dependencies, and with high performance. Standard issue distributed evaluation stuff.

authz: Minimalistic Sidecar

My final component is authz, a sidecar that runs one instance per NGINX node, answers auth_request subrequests over a local Unix Domain Socket, and stays synchronized with authd by holding a streaming gRPC Watch subscription and receiving ACL snapshots. It is entirely stateless: no database, no CA keys, nothing on disk except its own mTLS identity and static configuration pointing it to the authd central registry.

After startup, it fetches the AuthzSnapshot which contains all live ACLs and the complete certificate status set including the CRL and any disabled certificates and users. It then simply begins answering requests over the unix domain socket. There’s no TCP listener, and I do this on purpose. Not only is UDS faster, it also leaves much less room for bad guys to tinker with the AuthZ service.

authz: resilience

The authz-role mTLS certificate is scoped at the gRPC layer to Watch and ReportEvents only; it can’t create users, issue certificates, or modify ACLs. Even with code execution inside authz, an attacker gains nothing actionable on the control plane, and no user keys or certificates. authd taking the day (of week) off means authz keeps serving its last-known state; it does not fail open. As Adam from North of the Border would say: “That’s not just good, it’s good enough.”

Subsequent snapshots arrive as pushed updates over the same Watch stream. When an operator commits an ACL change, authd pushes a new AuthzSnapshot to every connected sidecar, which swaps its in-memory ACL set atomically via a sync/atomic.Pointer. If the gRPC stream drops, there is no problem. authz just keeps serving its last-known live set and reconnects automatically when authd returns. An authd outage does not interrupt request serving; it only prevents new policy changes from propagating until connectivity is restored.
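The atomic swap itself is a one-liner with sync/atomic.Pointer (Go 1.19 or newer). Below is a minimal sketch with a hypothetical Snapshot type that only carries a revocation set; the real AuthzSnapshot also holds the ACLs. Request handlers load the pointer once per decision, so they always see one complete snapshot, never a half-applied one.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is an immutable view of policy; it is never modified after Apply.
type Snapshot struct {
	Version int
	Revoked map[string]bool // certificate serial -> revoked
}

// Sidecar holds the live snapshot behind an atomic pointer.
type Sidecar struct {
	live atomic.Pointer[Snapshot]
}

// Apply swaps in a new snapshot; readers in flight keep using the old one.
func (s *Sidecar) Apply(snap *Snapshot) {
	s.live.Store(snap)
}

// Allowed reports whether a serial may proceed. Before the first snapshot
// arrives there is no policy, so the sketch denies (fail closed).
func (s *Sidecar) Allowed(serial string) bool {
	snap := s.live.Load()
	if snap == nil {
		return false
	}
	return !snap.Revoked[serial]
}

func main() {
	var sc Sidecar
	fmt.Println(sc.Allowed("A3F2")) // prints false: no snapshot yet

	sc.Apply(&Snapshot{Version: 1, Revoked: map[string]bool{}})
	fmt.Println(sc.Allowed("A3F2")) // prints true

	sc.Apply(&Snapshot{Version: 2, Revoked: map[string]bool{"A3F2": true}})
	fmt.Println(sc.Allowed("A3F2")) // prints false: revoked in the new snapshot
}
```

If the stream drops, nothing calls Apply, and Allowed keeps answering from the last stored snapshot, which is the behaviour described above.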

authz: performance

I ran a load test on Nimbus, an older Ryzen 5950X machine, against a standalone authz using a -acl-file flag, which loads a static AuthzSnapshot with no control plane involved and isolates the dataplane decision path. The table shows requests per second at varying ACL rule counts (terms) and thread counts (thr):

 terms  thr    req/s       avg       p50       p90       p95       p98       p99       max
     1    1    13745      72us      73us      81us      89us      98us     108us     812us
     1    2    28611      69us      66us      85us      93us     108us     138us     808us
     1    4    56592      70us      66us      98us     114us     138us     225us     1.1ms
     1    8   114024      69us      60us      93us     114us     225us     301us     911us
     1   16   141096     112us      89us     185us     287us     491us     597us     1.6ms
    10    1    13664      72us      73us      81us      85us      93us     108us     741us
    10    2    27462      72us      70us      85us      93us     114us     152us     874us
    10    4    54301      73us      70us     103us     119us     152us     236us     922us
    10    8   110441      71us      63us      98us     119us     236us     316us     974us
    10   16   138105     115us      89us     185us     287us     491us     597us     1.7ms
   100    1    10288      96us      98us     108us     114us     125us     138us     836us
   100    2    20946      94us      93us     114us     119us     138us     168us     1.2ms
   100    4    47938      82us      81us     119us     138us     168us     225us     1.1ms
   100    8    95498      83us      73us     108us     138us     236us     316us     1.0ms
   100   16   119828     132us     108us     214us     316us     516us     627us     1.7ms
  1000    1     3033     329us     332us     349us     366us     366us     385us     953us
  1000    2     6009     332us     332us     385us     404us     424us     445us     1.7ms
  1000    4    20954     190us     176us     273us     316us     349us     385us     878us
  1000    8    36658     217us     204us     316us     366us     424us     491us     1.2ms
  1000   16    50304     317us     287us     468us     568us     762us     882us     2.4ms

At the realistic operating point of 10 ACL rules and 8 threads, authz turns around about 110k requests per second at a median latency of 63 microseconds (71 on average)! Even at 1000 rules and 16 threads the throughput holds at 50k req/s with a median under 300 microseconds. Considering the NGINX instances at IPng run at between 1k-5k req/s, this is more than enough. Slick!

Logging / Observability

Every component writes structured JSON to its local journal via log/slog. authz additionally pushes its lifecycle events up to authd over the ReportEvents streaming RPC (internal/authz/reporter.go) on the same mTLS gRPC connection: ACL snapshot applied, control-plane connected and disconnected, startup and shutdown. authd tags each event with the originating sidecar’s identity, merges it with its own events, and re-exposes the union through WatchEvents. An operator running ipng-nginx-authc watch events sees the whole fleet in one stream, filtered optionally by event type, log level, or origin node.

Per-request permit/deny decisions are deliberately excluded from this stream. A busy NGINX node evaluates hundreds of auth_request calls per second; logging each one would produce logspam that buries lifecycle events and overwhelms the gRPC uplink. Instead, decision outcomes are counted in Prometheus metrics on authz at :8299. These counters and histograms break down outcomes by permit or deny and ACL name. The metrics also cover the connected authz clients and which version of the snapshot each one is serving.

When I need to trace individual decisions for a specific site, I can enable per-request logging by appending &debug to the ACL name in NGINX config (set $ipng_authz_acl "wiki&debug";), or alternatively I can enable logging for a given ACL name (authc acl test logging enable). authz then logs that verdict at info level, which reaches the local journal and streams up through authc watch events. On a healthy node in normal operation nothing is logged per request, though; Prometheus carries the signal.

Results

Take a look at this asciinema recording: In it, you can see the daily operations at play:

configuring a website to use AuthN/AuthZ with two simple includes
with authc, creating a user, a cert, an ACL referencing it
a curl client trying to use the web service with and without certificates presented
realtime ACL changes to permit/deny a user and a specific certificate

It should give you a rough feeling for how this project is intended to work.

What’s next

The immediate next step is deploying ipng-nginx-auth to the IPng Networks production NGINX frontends. I plan to start with the EVPN status page and the BGP looking glass, where the user population is small and access patterns are well understood, and then progressively migrate the billing console and management tools from geo-IP fencing to per-user ACLs.

A few items are on the roadmap. Email delivery of enrollment artifacts is currently out of band: authc cert create writes the .p12 and .mobileconfig to a local directory and the operator delivers them to the user by other means. A future version could have authd mail the bundle directly. Control-plane high availability is another gap: authd is a single VM today, and while the authz sidecars survive its absence indefinitely, no new policy changes or certificate enrollments can happen while it is down. A warm standby with SQLite replication is the most likely path. Finally, the current enrollment model uses server-side keygen for ease of use; a CSR import path for power users who want to generate their own key material would be a natural addition.

In the meantime you can take a look at the developing code on [git.ipng.ch].