Introduction

At IPng, I run a cluster of NGINX web frontends, the design of which I described in [this article]. Most internal services sit behind a nifty feature of NGINX called the [geo module], which creates a map of IP prefixes that either are or aren’t allowed to access the website. There’s also a more sophisticated [geoip module], which allows me to look up the ASN or country of a given source IP, and if it resolves to a country I am not interested in serving, NGINX returns an early 403. It is a blunt but effective first line of defense against probes and bots from the other side of the world.
The problem is that geo-IP is a country-level filter, not a person-level one. IPng has a bunch of
internal websites, such as the VPP Maglev, an eVPN control panel, a billing console, which are all
behind the same frontend cluster. I simply can’t use geo-IP alone to express more fine grained
access control, such as “Pim is allowed to reach evpn.ipng.ch/admin/, but Alice and Bob may reach
only evpn.ipng.ch/view/”. Every legitimate user in an admitted country (or IP prefix) would get
access to everything, which is no bueno.
What is pretty common in our industry is to break this problem into two distinct things.
- Authentication: AuthN is a way to establish who is making the request, down to the individual person and ideally down to the individual device or browser.
- Authorization: AuthZ is some form of policy that says this authenticated identity may or may not reach a given web property or path.
This article is a case study in how I chose to implement this at IPng Networks.
WebPKI and Authentication
The web has a mechanism for client authentication, built right into TLS: client certificates, as
described in [RFC 8446]. A browser presents a
certificate during the TLS handshake; the server verifies it against a trusted Certificate Authority
and learns the client’s identity. NGINX handles this natively, which is dope. The gap is that a
certificate alone gives identity (AuthN), not permissions (AuthZ). Knowing that Alice is Alice does
not tell NGINX whether Alice is allowed to visit /admin (and she’s not!).
A natural but naive first thought is to issue short-lived certificates: a 24-hour cert per access grant, let it expire, issue a new one. But short-lived certificates are a support nightmare: “Install this certificate in your browser” is hard enough once; doing it every day, across a phone, a laptop, and a work machine, and for multiple websites, quickly becomes a constant chore that nobody in the history of ever will be signing up for. I want to go the other direction: hand out long-lived credentials and revoke access entitlements instantly when needed, rather than hand out credentials that expire and need continuous re-enrollment.
This article describes my chosen solution on IPng’s web farm: ipng-nginx-auth. It issues and manages
long-lived client certificates through a private CA, handles the full certificate lifecycle from
enrollment to renewal and revocation, and layers a fine-grained ACL engine on top so that each certificate
identity maps to exactly the set of web properties and paths it is permitted to reach.
The project ships three components: authd is a central control plane, authz is a distributed
per-NGINX enforcement sidecar, and authc is an operator commandline interface to configure the
system. I like reusing patterns, so the project follows the same architecture as [vpp-evpn] and [vpp-maglev] before it: a
single authoritative daemon, a fleet of lightweight enforcement agents, a gRPC-based distribution
event stream, and a golang-cli shell which I published earlier on
[git.ipng.ch/ipng/golang-cli]. Speaking of git, that website
is now protected by [Anubis] due to the constant stream of
bots hammering my git repos. Keep it classy, Big Tech!
Mutual TLS
When you visit https://ipng.ch, normally the TLS handshake authenticates the server to your
browser. Reading up on [RFC 8446] Section 4.4.2,
the server sends a Certificate message containing its leaf certificate and any intermediate CA
certificates. Your browser validates the chain up to a trusted root in its certificate store,
checks that the server name matches the certificate’s Subject Alternative Names, and verifies the
certificate has not expired or been revoked. Only then does the handshake complete. This
one-directional authentication is the foundation of HTTPS and it is what pretty much every website
you and I visit relies on.
Mutual TLS (mTLS) adds validation in the opposite direction: the server also asks the client to
present a certificate and validates it against a CA it trusts. RFC 8446 Section 4.3.2 describes the
server-side CertificateRequest message that initiates this exchange. I can take a look and see:
$ openssl s_client -connect evpn.ipng.ch:443 -servername evpn.ipng.ch \
-cert client.crt -key client.key
...
Certificate chain
0 s:CN=ipng.ch
i:C=US, O=Let's Encrypt, CN=E8
a:PKEY: EC, (prime256v1); sigalg: ecdsa-with-SHA384
v:NotBefore: May 12 06:44:46 2026 GMT; NotAfter: Aug 10 06:44:45 2026 GMT
1 s:C=US, O=Let's Encrypt, CN=E8
i:C=US, O=Internet Security Research Group, CN=ISRG Root X1
a:PKEY: EC, (secp384r1); sigalg: sha256WithRSAEncryption
v:NotBefore: Mar 13 00:00:00 2024 GMT; NotAfter: Mar 12 23:59:59 2027 GMT
...
Acceptable client certificate CA names
CN=ipng-nginx-auth client-auth CA
Requested Signature Algorithms: id-ml-dsa-65:id-ml-dsa-87:id-ml-dsa-44:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:ecdsa_brainpoolP256r1_sha256:ecdsa_brainpoolP384r1_sha384:ecdsa_brainpoolP512r1_sha512:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Shared Requested Signature Algorithms: id-ml-dsa-65:id-ml-dsa-87:id-ml-dsa-44:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:ecdsa_brainpoolP256r1_sha256:ecdsa_brainpoolP384r1_sha384:ecdsa_brainpoolP512r1_sha512:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: ecdsa_secp256r1_sha256
Negotiated TLS1.3 group: X25519MLKEM768
---
SSL handshake has read 3752 bytes and written 2570 bytes
Verification: OK
The first block is the webserver authenticating itself, but then the second block is the webserver
itself demanding that the client present a certificate, more precisely one that is coming from a
CA called CN=ipng-nginx-auth client-auth CA. If I don’t issue the -cert and -key flags, the
(nginx) server will see no client certificate. Depending on NGINX’s ssl_verify_client setting, it
will either refuse the request outright (on) or serve it and record that no certificate was
presented (optional). Maybe you’re thinking “Why would you use optional?” Until I understood it better, I
asked myself that too, but I’ll explain why in a minute, hang on.
Once the mTLS handshake completes, the server has verified that the client holds the private key
corresponding to a certificate signed by its specified CA. The client’s identity is encoded in the
certificate’s Subject [Distinguished Name (DN)], a structured
string per [RFC 4514] composed of attribute
type/value pairs such as CN=alice@ipng.ch,O=IPng Networks GmbH. The DN can include Organization
(O), Organizational Unit (OU), Country (C), and other attributes, but the leaf-level
identifier is the aptly named Common Name (CN).
What I take away from this, is that for ipng-nginx-auth, I can create my own Private Certificate
Authority and tell NGINX to use it to ask clients to authenticate themselves. I can use as many or
few fields as I like, so my first decision is to use only the Common Name, setting it simply to the
user’s email address. When client authentication is turned on, NGINX sets three pertinent
variables:
- $ssl_client_s_dn captures a verified client’s full Distinguished Name (DN),
- $ssl_client_serial contains the serial number of the certificate used, and
- $ssl_client_verify can be either SUCCESS, FAILED: reason, or NONE, signaling the verdict of the client certificate inspection.
What’s nice about this, is I can let NGINX do the hard work of mTLS and cryptographic validation of the client identity (in other words, it implements the entire AuthN for me), and simply leverage these three resulting variables to perform the AuthZ parts.
So far, so good.
AuthN: Private CA
First, I need to study a bit more jargon. A [Certificate Authority] is an entity that issues [X.509] certificates per [RFC 5280]. Public CAs like [Let’s Encrypt] and [Sectigo] are pre-trusted by browsers and operating systems world-wide. Their trustworthiness rests on strict auditing programs and on Certificate Transparency: a public audit log of every certificate any CA issues. I covered CT Logs in depth in my [Certificate Transparency series]. IPng runs two [Static CT Logs] of its own.
Anyway, for client certificates, a public CA is both overkill and the wrong tool. Nobody outside of IPng Networks itself needs to trust these certificates. A private CA, whose root certificate I distribute to exactly the NGINX nodes that need it, gives me complete control over validity periods, issuance policy, and revocation, with no external dependencies, fees, or audits. If I break it, I get to keep the broken parts, but at least nobody complains, and that’s the way I like it.
The moving parts of a CA are straightforward. The CA holds a key pair and a
self-signed root certificate that acts as the trust anchor. To issue a leaf certificate, the
normal flow is: the requestor generates a key pair, constructs a Certificate Signing Request
(CSR) containing their public key and proposed subject DN, and submits it. The CA validates the
CSR, signs a certificate with its own key, and returns a signed certificate (possibly uploading the
cert into a transparency log). The resulting certificate chain, the leaf cert signed by the CA’s
key, is what NGINX validates at handshake time against the CA file given to
ssl_client_certificate, exactly the thing I played with above in the openssl s_client command.
To make things a bit simpler, I decide to make ipng-nginx-auth skip the CSR step entirely and run
its enrollment server-side. The daemon generates the key pair, signs the certificate, and bundles
both into a password-protected [PKCS#12] (.p12) file and
an Apple .mobileconfig profile. Using my iPhone I can install the bundle with a single tap, and in
Firefox, Chrome, and Safari I can import the P12 just once. From then on, when a
website demands mTLS, my browser can look in its client certificate store and select a cert
issued by the CA that NGINX says it needs a client identity from.
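To make the server-side enrollment idea concrete, here is a minimal Go sketch using only the standard library. It is not the ipng-nginx-auth code: the function names newCA, issueClient and verifyClient, the key type (ECDSA P-256) and the validity periods are all assumptions picked for illustration. It creates a self-signed root, generates a leaf key pair without any CSR, signs a client certificate with the email address as Common Name, and verifies it against a pool holding only that CA. PKCS#12 and .mobileconfig packaging are left out, since they need more than the standard library.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"log"
	"math/big"
	"time"
)

// newCA creates a key pair and a self-signed root certificate: the trust anchor.
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().AddDate(10, 0, 0), // assumed lifetime
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	// Template and parent are the same certificate, so the result is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issueClient generates the leaf key pair server-side (no CSR involved) and
// signs a client certificate whose Common Name is the user's email address.
func issueClient(ca *x509.Certificate, caKey *ecdsa.PrivateKey, email string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Random positive serial number in [1, 2^127].
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 127))
	if err != nil {
		return nil, nil, err
	}
	serial.Add(serial, big.NewInt(1))
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		Subject:      pkix.Name{CommonName: email},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().AddDate(1, 0, 0), // assumed lifetime
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyClient checks the leaf against a pool holding only the given CA,
// similar in spirit to what a server does with its configured client CA file.
func verifyClient(ca, leaf *x509.Certificate) error {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	ca, caKey, err := newCA("example client-auth CA")
	if err != nil {
		log.Fatal(err)
	}
	leaf, _, err := issueClient(ca, caKey, "alice@ipng.ch")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("subject:", leaf.Subject.CommonName)
	fmt.Println("verify error:", verifyClient(ca, leaf))
}
```

A certificate issued this way only verifies against the CA that signed it; checking the same leaf against a second, unrelated CA returns an error.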
AuthN: Client TLS certificates
Enabling client certificate verification in NGINX requires just two directives inside a
server {} block. The ssl_client_certificate directive points to the CA chain PEM that NGINX
uses to validate the client’s certificate, and ssl_verify_client controls the strictness:
ssl_client_certificate /etc/ipng-nginx-auth/client-ca-chain.crt;
ssl_verify_client optional;

Setting ssl_verify_client on makes NGINX refuse any request that arrives without a valid client
certificate. This is the strictest posture but it is also inflexible: health check
endpoints, public landing pages, and any browser without a certificate are all turned away with a
bare error before any location logic runs, which yields a terrible user experience.
In the optional mode, NGINX sets $ssl_client_verify to SUCCESS when the certificate is valid
and signed by the configured CA, to NONE if no certificate was presented, or to a FAILED: reason
string otherwise. As I saw above, the verified subject DN and the serial are stored in variables.
This mode lets me enforce client certificates selectively per location {}, leaving health checks
and unauthenticated paths untouched while enforcing them on the locations that matter. It gives a
much better user experience.
While rummaging through the NGINX [docs],
I notice that NGINX also supports static Certificate Revocation List (CRL) checking via ssl_crl,
pointing to a PEM-encoded revocation list on disk. This works but kind of sucks, at the same time:
the CRL file must be distributed to every NGINX node and the process must reload for changes to take
effect. On a fleet of many frontends, a prompt revocation means scripting a copy and reload across
all nodes. But there is another way. An AuthZ sidecar could contain an in-memory certificate status
set, including the CRL, fed in real time from a central authd service over a streaming gRPC
subscription. But I get ahead of myself.. let me learn more about the NGINX parts first.
To wrap up, a candidate configuration that admits any browser presenting a CA-issued certificate that has not expired looks like this:
server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;
    ssl_client_certificate /etc/ipng-nginx-auth/client-ca-chain.crt;
    ssl_verify_client optional;
    location / {
        if ($ssl_client_verify != "SUCCESS") {
            return 403;
        }
        proxy_pass http://backend;
    }
}
This configuration admits any browser holding a certificate from my private CA that has not
expired, regardless of who that user is or which path they are requesting. Pim, Alice, and that
asshole Bob all get through to the same backend. Some policy like “Alice can use /view/ but not
/admin/” is still not expressible here. A certificate gives me an identity; it does not give me
permissions. Getting from identity to fine-grained path-level access control is exactly the
Authorization problem, and that requires something beyond NGINX’s built-in TLS directives.
Authorization: NGINX Module
I find out that NGINX ships an
[auth_request module] that
fits this authorization gap like a glove. When auth_request is configured for a location, NGINX
makes an internal HTTP subrequest to a designated URI before serving the original request. The game
is laughably simple: If the subrequest returns HTTP 200, the original request proceeds. If it
returns HTTP 403, NGINX denies the client. Any context needed for the policy decision, the
client’s DN, certificate serial number, the original host and URI, and any additional information
like say the name of an ACL to evaluate, I can attach all of these as custom proxy headers on the
internal subrequest. It immediately strikes me as a very idiomatic way to implement AuthZ.
And what’s even better, the module is included by default in Debian’s nginx package. The
configuration pattern is:
set $ipng_authz_acl "wiki";
location / {
    auth_request /.well-known/ipng/authz;
    proxy_pass http://backend;
    error_page 403 =302 https://ipng.ch/;
}
location = /.well-known/ipng/authz {
    internal;
    proxy_pass http://unix:/run/ipng-nginx-authz/authz.sock:/check?acl=$ipng_authz_acl;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Client-Verify $ssl_client_verify;
    proxy_set_header X-Client-Serial $ssl_client_serial;
    proxy_set_header X-Client-DN $ssl_client_s_dn;
    proxy_set_header X-Client-Addr $remote_addr;
    proxy_set_header X-Orig-URI $request_uri;
    proxy_set_header X-Orig-Host $host;
}
First, a belt-and-suspenders comment: the internal flag prevents the authz location from being
reached directly by external clients. The proxy_pass targets a Unix Domain Socket (UDS) on which
an external sidecar server listens. NGINX passes all of these headers along and requests
/check?acl=$ipng_authz_acl, where the query parameter tells the sidecar which named ACL to evaluate.
If the authorization service returns 403, NGINX by default renders a bare error page, which is
kind of gross. In the location / handler, I can intercept that with a redirect to a friendlier
destination. Turning the 403 into a 302 redirect will send the user to the IPng homepage rather than
a cryptic error. For unauthenticated users, this approach could also serve as an implicit enrollment
prompt: the landing page might explain how to obtain and install a certificate, rather than leaving
them puzzled at a 403 Forbidden. Not everybody speaks HTTP, you know!
I also take a moment to appreciate that the auth_request module knows nothing about certificates,
ACLs, or users. It delegates the allow/deny decision to an external HTTP endpoint and respects the
response code. That separation of concerns is what makes the system composable: NGINX handles TLS
termination and extraction of the mTLS identity of the connecting client (the AuthN stuff). Then, I
can add an authorization sidecar to handle policy evaluation (the AuthZ stuff). Neither component
needs to understand the other’s internals.
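To show how little the other side of that subrequest needs, here is a minimal Go sketch of an HTTP endpoint on a Unix Domain Socket that answers 200 or 403 based on the forwarded identity headers. It is an illustration, not the real sidecar: the decide function, the allowed map standing in for policy evaluation, and the temporary socket path are assumptions. To keep it runnable on its own, main also plays the role of NGINX and sends one request over the socket.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// decide turns the identity headers of one subrequest into an HTTP status.
// The allowed map is a hypothetical placeholder for real policy evaluation.
func decide(verify, serial, dn string, allowed map[string]bool) int {
	if verify != "SUCCESS" {
		return http.StatusForbidden // NONE, or FAILED with a reason
	}
	if serial == "" || !allowed[dn] {
		return http.StatusForbidden
	}
	return http.StatusOK
}

func main() {
	// Hypothetical socket location in a throwaway directory.
	dir, err := os.MkdirTemp("", "authz")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "authz.sock")

	ln, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	allowed := map[string]bool{"CN=alice@ipng.ch": true}
	mux := http.NewServeMux()
	mux.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		h := r.Header
		status := decide(h.Get("X-Client-Verify"), h.Get("X-Client-Serial"),
			h.Get("X-Client-DN"), allowed)
		w.WriteHeader(status) // the status code is the whole answer
	})
	go http.Serve(ln, mux)

	// Stand in for NGINX: send one subrequest over the socket.
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		},
	}}
	req, err := http.NewRequest(http.MethodGet, "http://authz/check?acl=wiki", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("X-Client-Verify", "SUCCESS")
	req.Header.Set("X-Client-Serial", "A3F2")
	req.Header.Set("X-Client-DN", "CN=alice@ipng.ch")
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```

Keeping the decision in a pure function makes it easy to test without a socket, and the HTTP layer stays a thin wrapper around it.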
IPng NGINX Auth
This is how I came to sketch a quick ACL language that would allow a concoction like above to pass along an internal AuthZ request to a sidecar running alongside NGINX. Its job is to receive ACLs from a central location, and answer questions “is cert SERIAL from distinguished name DN, coming from address IP, allowed to visit website HOST on uri path URI?”
Structure

Why think of new ways poorly, if you can steal somebody else’s approach? Swipe, Swiper, swipe!! My ACL evaluation
model is lightly inspired by [OpenBSD pf.conf], the packet
filter configuration language that, to my mind, remains one of the cleanest firewall policy
languages ever written. The key idea is simple: rules are evaluated in order, every matching rule
updates a running verdict, and the default verdict is deny. The quick keyword (which I call
terminate in my ACLs) short-circuits evaluation immediately and returns whatever is the current
verdict.
I make each named ACL an ordered sequence of rules. Each rule has a sequence number
(seq), perhaps I am a network engineer after all, and five optional match constraints, each
coming from the headers fed in by nginx:
- host regular expression, matching the X-Orig-Host header
- uri regular expression, matching the X-Orig-URI header
- user regular expression, matching the X-Client-DN header
- cert literal, matching the X-Client-Serial header
- prefix CIDR prefix, matching the X-Client-Addr header (for IPv6 and IPv4).
These are accompanied by an action (permit or deny), and an optional terminate flag. A rule
matches when all of its set constraints match (AND-logic); an unset constraint matches anything.
Evaluation starts with deny, walks rules in ascending seq order, and each matching rule updates
the running verdict. A matching rule with terminate stops the walk immediately.
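The evaluation model described above fits in a few dozen lines of Go. The sketch below is my reading of those rules, not the project’s internal/acl package: the Rule and Request types, the Evaluate function and the example rules with serials 01 and 02 are all hypothetical. Unset constraints are modeled as nil regexps, an empty Cert, or a zero Prefix.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"sort"
)

// Rule is one ACL term. A nil regexp, an empty Cert, or a zero Prefix means
// the constraint is unset and matches anything.
type Rule struct {
	Seq             int
	Host, URI, User *regexp.Regexp
	Cert            string
	Prefix          netip.Prefix
	Permit          bool
	Terminate       bool
}

// Request carries the values NGINX forwards in its subrequest headers.
type Request struct {
	Host, URI, DN, Serial string
	Addr                  netip.Addr
}

// matches reports whether every constraint that is set on the rule matches.
func (r Rule) matches(q Request) bool {
	switch {
	case r.Host != nil && !r.Host.MatchString(q.Host):
		return false
	case r.URI != nil && !r.URI.MatchString(q.URI):
		return false
	case r.User != nil && !r.User.MatchString(q.DN):
		return false
	case r.Cert != "" && r.Cert != q.Serial:
		return false
	case r.Prefix.IsValid() && !r.Prefix.Contains(q.Addr):
		return false
	}
	return true
}

// Evaluate starts from deny, walks the rules in ascending Seq order, lets
// each matching rule overwrite the verdict, and stops at a matching
// rule that carries the terminate flag.
func Evaluate(rules []Rule, q Request) bool {
	sorted := append([]Rule(nil), rules...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Seq < sorted[j].Seq })
	permit := false
	for _, r := range sorted {
		if !r.matches(q) {
			continue
		}
		permit = r.Permit
		if r.Terminate {
			break
		}
	}
	return permit
}

// exampleRules is a hypothetical ACL: Alice may go anywhere, /admin is
// denied, and /admin is re-permitted only for certificate serial 02.
func exampleRules() []Rule {
	admin := regexp.MustCompile(`^/admin`)
	return []Rule{
		{Seq: 30, URI: admin, Cert: "02", Permit: true, Terminate: true},
		{Seq: 10, User: regexp.MustCompile(`alice@ipng\.ch`), Permit: true},
		{Seq: 20, URI: admin, Permit: false},
	}
}

func main() {
	for _, q := range []Request{
		{Host: "evpn.ipng.ch", URI: "/view/", DN: "CN=alice@ipng.ch", Serial: "01"},
		{Host: "evpn.ipng.ch", URI: "/admin/", DN: "CN=alice@ipng.ch", Serial: "01"},
		{Host: "evpn.ipng.ch", URI: "/admin/", DN: "CN=alice@ipng.ch", Serial: "02"},
		{Host: "evpn.ipng.ch", URI: "/view/", DN: "CN=bob@ipng.ch", Serial: "03"},
	} {
		fmt.Println(q.DN, q.Serial, q.URI, "permit:", Evaluate(exampleRules(), q))
	}
}
```

Because every matching rule overwrites the verdict, a broad permit followed by a narrow deny reads naturally, and terminate is only needed where a later rule must not override the decision.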
The authoring workflow needs to be staged because I can’t go publishing half-edited ACLs, I’m not
Cisco IOS or FRR after all. I decide to make rule edits accumulate in a staged version inside of the
central authd server, which running authz sidecars never see. The RPC and CLI command acl <name> commit atomically promotes the staged version to live and pushes it to every connected
sidecar on their Watch stream. Similarly, acl <name> rollback discards staged edits with no
fleet-visible effect. This gives a safe author-review-commit loop and ensures that partial edits are
never served to clients mid-edit.
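The staged-versus-live split can be sketched as a small in-memory store. This is an assumption-laden illustration rather than authd’s implementation: the Store type and its Edit, Commit, Rollback and Live methods are hypothetical names, rules are plain strings, and there is no SQLite or gRPC push involved.

```go
package main

import (
	"fmt"
	"sync"
)

// Store keeps a live and a staged rule list per ACL name. Rules are plain
// strings here to keep the sketch short.
type Store struct {
	mu     sync.Mutex
	live   map[string][]string
	staged map[string][]string
}

func NewStore() *Store {
	return &Store{live: map[string][]string{}, staged: map[string][]string{}}
}

// Edit appends a rule to the staged version, seeding it from the live
// version on the first edit. Readers of Live never see this change.
func (s *Store) Edit(name, rule string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.staged[name]; !ok {
		s.staged[name] = append([]string(nil), s.live[name]...)
	}
	s.staged[name] = append(s.staged[name], rule)
}

// Commit promotes the staged version to live in one step. It reports
// whether there was anything to promote.
func (s *Store) Commit(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.staged[name]
	if !ok {
		return false
	}
	s.live[name] = st
	delete(s.staged, name)
	return true
}

// Rollback discards staged edits; the live version is untouched.
func (s *Store) Rollback(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.staged, name)
}

// Live returns a copy of the committed rules for the given ACL.
func (s *Store) Live(name string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.live[name]...)
}

func main() {
	s := NewStore()
	s.Edit("wiki", "seq 10 user alice@ipng.ch permit")
	fmt.Println("before commit:", len(s.Live("wiki")))
	s.Commit("wiki")
	fmt.Println("after commit:", len(s.Live("wiki")))
	s.Edit("wiki", "seq 20 uri ^/admin deny")
	s.Rollback("wiki")
	fmt.Println("after rollback:", len(s.Live("wiki")))
}
```

In a real system the commit step is also the natural place to notify subscribers, since it is the only moment at which the live version changes.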
authd: Centralized Auth Daemon
Now that I have the ACL language roughly shaped, my attention turns to authd, a centralized
control plane in one Go binary: two private CAs, a SQLite-backed object store, an ACL staging and
testing engine, and a gRPC server. The full API is defined in proto/auth.proto as a single
AuthService and covers four domains: CA management, user and certificate lifecycle, ACL authoring,
and the fleet distribution stream.
Two separate CAs are a deliberate design choice:
- control-plane CA signs the gRPC server certificate and the client certificates that operators and authz sidecars use to authenticate to the gRPC API, each with their own permissions.
- client-auth CA signs the browser certificates that end users install; its chain PEM is what NGINX trusts via ssl_client_certificate.
Keeping the two CAs separate limits the blast radius of a key loss: a leaked sidecar mTLS credential cannot be presented as a browser identity, and a browser certificate cannot authenticate any gRPC call. NGINX trusts only the client-auth CA chain; the gRPC API trusts only the control-plane CA. And of course, seeing as this is an ACL system, mTLS on the gRPC channels is required: I definitely do not want an open, unauthenticated RPC endpoint through which my authentication system can be manipulated.
But now I have created a bootstrapping problem: the gRPC API requires mTLS, but the mTLS certificates
come from the API. I think about this for a while, and decide to break the loop with three offline
subcommands that operate directly on the local SQLite database before any network service is running
(internal/bootstrap).
- bootstrap database creates the schema;
- bootstrap ca creates both CAs and the daemon’s own gRPC server leaf;
- bootstrap client <name> mints the first operator identity for the gRPC server.
Using this first client, every further credential can be issued online through the gRPC API and recorded in the database, making every identity auditable and revocable via the CLI.
The AuthService gRPC interface in proto/auth.proto exposes a CRUDL surface over four nouns CA,
User, Cert and ACL:
- ca (defined in internal/authd/info.go and internal/authd/clients.go) handles CA info, the revocations in the CRL, and control-plane client identities via ca client create|show|list|delete. Clients issued with the authz role are restricted to the Watch stream and ReportEvents RPC; only an operator-role certificate may invoke mutating calls.
- user manages users by email with simple verbs like create|show|list|delete|enable|disable. It is described in internal/authd/user.go.
- cert (in internal/authd/certs.go) covers certificate lifecycle: cert create runs the full enrollment pipeline (in internal/enroll) including server-side keygen, CA signing, PKCS#12 and Apple .mobileconfig assembly, and delivery to the operator. It discards the client’s private key.
- acl family (in internal/authd/acls.go) covers staged rule editing, along with a commit, rollback, and the acl test simulator, which uses the same evaluation engine as the sidecar itself (see internal/acl).
Two streaming RPCs tie the system together. First, Watch pushes the committed ACLs and certificate
revocation snapshot (stream AuthzSnapshot) to every subscribed sidecar on connect, and pushes a
new snapshot on every policy change (internal/authd/dist.go). Secondly, ReportEvents is the
reverse: authz sidecars push their lifecycle events up to authd, which merges them with
its own and re-exposes the union through WatchEvents to operators.
authc: CLI
If you’ve followed along in previous articles, these gRPC endpoints are excellent companions for a
Command Line Interface. authc is the operator CLI for this system. It uses the same
[golang-cli] interactive shell library that drives the
evpnc CLI in [vpp-evpn] and maglevc in
[vpp-maglev]. Invoked without arguments it drops into a
tab-completing interactive shell; invoked with a command it runs once and exits, useful for
scripting. The -json flag switches all output to JSON. The command tree mirrors the gRPC
API: ca, user, cert, acl, config, watch, with getters and setters. And I get all of this
almost for free: the CLI is 1200 lines of code all up.
Creating a user, issuing two certificates for two different devices, and wiring up an ACL that
gives the user general access but restricts /admin to only the workstation certificate looks
like this:
ipng-nginx-authc> user create alice@ipng.ch
created user "alice@ipng.ch"
ipng-nginx-authc> cert create alice@ipng.ch expire 1y
issued cert for alice@ipng.ch (cid A3F2..., expires 2027-06-27)
p12: alice_at_ipng.ch-A3F2.p12
mobileconfig: alice_at_ipng.ch-A3F2.mobileconfig
password: b9k2-xqmf-8vrp
ipng-nginx-authc> cert create alice@ipng.ch expire 1y
issued cert for alice@ipng.ch (cid 9C11..., expires 2027-06-27)
p12: alice_at_ipng.ch-9C11.p12
mobileconfig: alice_at_ipng.ch-9C11.mobileconfig
password: j4t7-nmzs-6wkh
ipng-nginx-authc> acl create wiki
ipng-nginx-authc> acl wiki seq 5 user pim@ipng.ch prefix 2001:687:d78:300::/62 permit terminate
ipng-nginx-authc> acl wiki seq 10 user alice@ipng.ch permit
ipng-nginx-authc> acl wiki seq 20 host wiki.ipng.ch uri ^/admin deny
ipng-nginx-authc> acl wiki seq 30 host wiki.ipng.ch uri ^/admin cert 9C11... permit terminate
ipng-nginx-authc> acl wiki commit
Let me go over this step by step, as it shows the philosophy of the OpenBSD pf.conf I was talking
about earlier. In this session Alice gets two certificates: A3F2 for her everyday laptop and
9C11 for her admin workstation. The ACL has four rules. The first one at seq 5 says: if any cert is
presented with CN=pim@ipng.ch from the internal IPv6 network in IPng Site Local, permit it and
stop evaluating rules (terminate). The following three rules are for Alice: seq 10 permits any
request from Alice’s email, giving both certificates broad access. seq 20 goes on to deny any
request to /admin, overriding the broad permit for that path. seq 30 re-permits /admin but
only for the exact certificate serial 9C11, and it locks in the verdict by issuing a terminate
statement.
The net result: Alice’s laptop (A3F2) reaches everything except /admin, while her workstation
(9C11) reaches everything including /admin. My certificates get full access, but only if I come
from a specific WireGuard VPN pool.
I built a handy acl test simulator which lets me verify this before committing, using the
identical evaluation engine the sidecar runs:
ipng-nginx-authc> acl test wiki user alice@ipng.ch cert A3F2 https://wiki.ipng.ch/admin/settings detail
result: deny
reason: matched seq 20
ipng-nginx-authc> acl test wiki user alice@ipng.ch cert 9C11 https://wiki.ipng.ch/admin/settings detail
result: permit
reason: matched seq 30 (terminate)
NGINX
With authd holding the policy and authc authoring it, the remaining piece is getting that policy
enforced on each NGINX server, of course without shipping the CA keys or the SQLite database
there, with zero dependencies, and with high performance. Standard issue distributed evaluation
stuff.
authz: Minimalistic Sidecar
My final component is authz, a sidecar that runs one instance per NGINX node, answers auth_request
subrequests over a local Unix Domain Socket, and stays synchronized with authd by holding a
streaming gRPC Watch subscription and receiving ACL snapshots. It is entirely stateless: no
database, no CA keys, nothing on disk except its own mTLS identity and static configuration pointing
it to the authd central registry.
After startup, it fetches the AuthzSnapshot which contains all live ACLs and the complete certificate
status set including the CRL and any disabled certificates and users. It then simply begins
answering requests over the unix domain socket. There’s no TCP listener, and I do this on purpose.
Not only is UDS faster, it also leaves much less room for bad guys to tinker with the AuthZ service.
authz: resilience
The authz-role mTLS certificate is scoped at the gRPC layer to Watch and ReportEvents only;
it can’t create users, issue certificates, or modify ACLs. Even with code execution inside authz,
an attacker gains nothing actionable on the control plane, and no user keys or certificates.
If authd takes the day (of week) off, authz keeps serving its last-known state; it does not
fail open. As Adam from North of the Border would say: “That’s not just good, it’s good enough.”
Subsequent snapshots arrive as pushed updates over the same Watch stream. When an operator
commits an ACL change, authd pushes a new AuthzSnapshot to every connected sidecar, which
swaps its in-memory ACL set atomically via a sync/atomic.Pointer. If the gRPC stream drops, there
is no problem. authz just keeps serving its last-known live set and reconnects automatically when
authd returns. An authd outage does not interrupt request serving; it only prevents new policy
changes from propagating until connectivity is restored.
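The atomic swap is worth a small illustration. The Go sketch below uses sync/atomic.Pointer (available since Go 1.19) to let request handlers read the current snapshot without locks while a watcher replaces it. The Snapshot and Sidecar types and their fields are hypothetical and far simpler than the real AuthzSnapshot, and treating "no snapshot yet" as revoked is my assumption about what failing closed could look like.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is a hypothetical, immutable view of policy at one version.
// It is never modified after being published.
type Snapshot struct {
	Version int
	Revoked map[string]bool // certificate serials that must be refused
}

// Sidecar holds the current snapshot behind an atomic pointer, so request
// handlers can read it without taking a lock.
type Sidecar struct {
	current atomic.Pointer[Snapshot]
}

// Apply publishes a new snapshot. Readers see either the old or the new
// one in full, never a mix of both.
func (s *Sidecar) Apply(next *Snapshot) {
	s.current.Store(next)
}

// IsRevoked answers from whatever snapshot is current. With no snapshot
// loaded yet it fails closed (an assumption made for this sketch).
func (s *Sidecar) IsRevoked(serial string) bool {
	snap := s.current.Load()
	if snap == nil {
		return true
	}
	return snap.Revoked[serial]
}

// Version reports the version being served, or 0 if none was applied yet.
func (s *Sidecar) Version() int {
	if snap := s.current.Load(); snap != nil {
		return snap.Version
	}
	return 0
}

func main() {
	var s Sidecar
	fmt.Println("no snapshot, revoked:", s.IsRevoked("A3F2"))

	s.Apply(&Snapshot{Version: 1, Revoked: map[string]bool{}})
	fmt.Println("v1, revoked:", s.IsRevoked("A3F2"))

	// If the stream drops here, the sidecar simply keeps answering from v1.
	s.Apply(&Snapshot{Version: 2, Revoked: map[string]bool{"A3F2": true}})
	fmt.Println("v2, revoked:", s.IsRevoked("A3F2"), "version:", s.Version())
}
```

The important property is that a snapshot is built completely before it is stored, so a stream reconnect or a slow update can never expose a half-applied policy.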
authz: performance
I ran a load test on Nimbus, an older Ryzen 9 5950X machine, against a standalone authz using a
-acl-file flag, which loads a static AuthzSnapshot with no control plane involved and isolates
the dataplane decision path. The table shows requests per second and latency percentiles at varying
ACL rule counts (terms) and thread counts (thr):
terms  thr   req/s    avg    p50    p90    p95    p98    p99    max
    1    1   13745   72us   73us   81us   89us   98us  108us  812us
    1    2   28611   69us   66us   85us   93us  108us  138us  808us
    1    4   56592   70us   66us   98us  114us  138us  225us  1.1ms
    1    8  114024   69us   60us   93us  114us  225us  301us  911us
    1   16  141096  112us   89us  185us  287us  491us  597us  1.6ms
   10    1   13664   72us   73us   81us   85us   93us  108us  741us
   10    2   27462   72us   70us   85us   93us  114us  152us  874us
   10    4   54301   73us   70us  103us  119us  152us  236us  922us
   10    8  110441   71us   63us   98us  119us  236us  316us  974us
   10   16  138105  115us   89us  185us  287us  491us  597us  1.7ms
  100    1   10288   96us   98us  108us  114us  125us  138us  836us
  100    2   20946   94us   93us  114us  119us  138us  168us  1.2ms
  100    4   47938   82us   81us  119us  138us  168us  225us  1.1ms
  100    8   95498   83us   73us  108us  138us  236us  316us  1.0ms
  100   16  119828  132us  108us  214us  316us  516us  627us  1.7ms
 1000    1    3033  329us  332us  349us  366us  366us  385us  953us
 1000    2    6009  332us  332us  385us  404us  424us  445us  1.7ms
 1000    4   20954  190us  176us  273us  316us  349us  385us  878us
 1000    8   36658  217us  204us  316us  366us  424us  491us  1.2ms
 1000   16   50304  317us  287us  468us  568us  762us  882us  2.4ms
At the realistic operating point of 10 ACL rules and 8 threads, authz turns around about
110k requests per second at a median latency of 63 microseconds! Even at 1000 rules and 16
threads the throughput holds at 50k req/s with a median under 300 microseconds. Considering the
NGINX instances at IPng run at between 1k and 5k req/s, this is more than enough. Slick!
Logging / Observability
Every component writes structured JSON to its local journal via log/slog. authz additionally
pushes its lifecycle events up to authd over the ReportEvents streaming RPC
(internal/authz/reporter.go) on the same mTLS gRPC connection: ACL snapshot applied,
control-plane connected and disconnected, startup and shutdown. authd tags each event with the
originating sidecar’s identity, merges it with its own events, and re-exposes the union through
WatchEvents. An operator running ipng-nginx-authc watch events sees the whole fleet in one
stream, filtered optionally by event type, log level, or origin node.
Per-request permit/deny decisions are deliberately excluded from this stream. A busy NGINX node
evaluates hundreds of auth_request calls per second; logging each one would produce logspam
that buries lifecycle events and overwhelms the gRPC uplink. Instead, decision outcomes are
counted in Prometheus metrics on authz at :8299. These counters and histograms
break down outcomes by permit or deny and ACL name. Metrics also cover the connected
authz clients, and which version of the snapshot each of them is serving.
When I need to trace individual decisions for a specific site, I can enable per-request logging
by appending &debug to the ACL name in NGINX config (set $ipng_authz_acl "wiki&debug";), or
alternatively I can enable logging for a given ACL name (authc acl test logging enable). authz
then logs that verdict at info level, which reaches the local journal and streams up through
authc watch events. On a healthy node in normal operation, though, nothing is logged per request:
Prometheus carries the signal.
Results
Take a look at this asciinema recording: In it, you can see the daily operations at play:
- configuring a website to use AuthN/AuthZ with two simple includes
- with authc, creating a user, a cert, an ACL referencing it
- a curl client trying to use the web service with and without certificates presented
- realtime ACL changes to permit/deny a user and a specific certificate
It should give you a rough feeling for how this project is intended to work:
What’s next
The immediate next step is deploying ipng-nginx-auth to the IPng Networks production NGINX
frontends. I plan to start with the EVPN status page and the BGP looking glass, where the user
population is small and access patterns are well understood, and then progressively migrate the
billing console and management tools from geo-IP fencing to per-user ACLs.
A few items are on the roadmap. Email delivery of enrollment artifacts is currently out of band:
authc cert create writes the .p12 and .mobileconfig to a local directory and the operator
delivers them to the user by other means. A future version could have authd mail the bundle
directly. Control-plane high availability is another gap: authd is a single VM today, and
while the authz sidecars survive its absence indefinitely, no new policy changes or
certificate enrollments can happen while it is down. A warm standby with SQLite replication is
the most likely path. Finally, the current enrollment model uses server-side keygen for ease of
use; a CSR import path for power users who want to generate their own key material would be a
natural addition.
In the meantime you can take a look at the developing code on [git.ipng.ch]: