Certificate Transparency - Part 3 - Operations

ctlog logo

Introduction

There once was a Dutch company called [DigiNotar]. As the name suggests, it was a form of digital notary, and it was in the business of issuing security certificates. Unfortunately, in June of 2011 its IT infrastructure was compromised, and it subsequently issued hundreds of fraudulent SSL certificates, some of which were used for man-in-the-middle attacks on Iranian Gmail users. Not cool.

Google launched a project called Certificate Transparency, because it was becoming clear that the root of trust placed in Certification Authorities could no longer be given unilaterally. These attacks showed that the lack of transparency in the way CAs operated was a significant risk to the Web Public Key Infrastructure. It led to the creation of this ambitious [project] to improve security online by bringing accountability to the system that protects our online services with SSL (Secure Socket Layer) and TLS (Transport Layer Security).

In 2013, [RFC 6962] was published by the IETF. It describes an experimental protocol for publicly logging the existence of Transport Layer Security (TLS) certificates as they are issued or observed, in a manner that allows anyone to audit certificate authority (CA) activity and notice the issuance of suspect certificates as well as to audit the certificate logs themselves. The intent is that eventually clients would refuse to honor certificates that do not appear in a log, effectively forcing CAs to add all issued certificates to the logs.

In the first two articles of this series, I explored [Sunlight] and [TesseraCT], two open source implementations of the Static CT protocol. In this final article, I’ll share the details of how I created the environment and production instances for four logs that IPng will be providing: Rennet and Lipase are two ingredients used to make cheese, and will serve as our staging/testing logs. Gouda and Halloumi are two delicious cheeses that pay homage to our heritage, Jeroen and I being Dutch and Antonis being Greek.

Hardware

At IPng Networks, all hypervisors are from the same brand: Dell’s PowerEdge line. For this project, Jeroen is also contributing a server, and it so happens that he too has a Dell PowerEdge. We both run Debian on our hypervisors, so we install a fresh VM with Debian 13.0, codenamed Trixie, and give the machine 16GB of memory, 8 vCPUs and a 16GB boot disk. Boot disks are placed on the hypervisor’s ZFS pool, and a block device snapshot is taken every six hours. This allows the boot disk to be rolled back to a last known good point in case an upgrade goes south. If you haven’t seen it yet, take a look at [zrepl], a one-stop, integrated solution for ZFS replication. This tool is incredibly powerful: it handles snapshot management and sourcing/sinking to remote hosts, of course using incremental snapshots, as they are native to ZFS.
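The zrepl configuration itself is out of scope here, but to give an idea, a snapshot-only job on the hypervisor could look roughly like this (a sketch: the dataset name and retention grid are assumptions, only the six-hour cadence comes from our setup):

jobs:
  - name: vm_boot_disks
    type: snap
    filesystems:
      "ssd-vol0/vms<": true
    snapshotting:
      type: periodic
      interval: 6h
      prefix: zrepl_
    pruning:
      keep:
        - type: grid
          grid: 4x6h(keep=all) | 7x1d | 4x7d
          regex: "^zrepl_"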

Once the machine is up, we pass through four enterprise-class storage drives, in our case 3.84TB Kioxia NVMe, model KXD51RUE3T84. These are PCIe 3.1 x4 drives implementing the NVMe 1.2.1 specification, with good durability and reasonable (albeit not stellar) performance: ~2700MB/s read throughput, ~800MB/s write throughput, 240 kIOPS random read and 21 kIOPS random write. My attention is also drawn to one specific data point: these drives are rated for 1.0 DWPD, which stands for Drive Writes Per Day. In other words, they are not going to run themselves off a cliff after a few petabytes of writes, and I am reminded that a CT log wants to write to disk a lot during normal operation.
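Given that 1.0 DWPD rating, it’s worth keeping an eye on wear over time. A quick way to do that with smartmontools, assuming the drives show up as /dev/nvme0n1 through /dev/nvme3n1 inside the VM:

ctlog1:~$ for dev in /dev/nvme{0..3}n1; do \
    echo "== ${dev}"; \
    sudo smartctl -a "${dev}" | grep -E 'Percentage Used|Data Units Written|Temperature:'; \
  done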

The whole point of running these logs is to keep their data safe, and the most important aspects of the compute environment are the use of ECC memory to detect single-bit errors, and dependable storage. Toshiba makes a great product.

ctlog1:~$ sudo zpool create -f -o ashift=12 -o autotrim=on -O atime=off -O xattr=sa \
               ssd-vol0 raidz2 /dev/disk/by-id/nvme-KXD51RUE3T84_TOSHIBA_*M
ctlog1:~$ sudo zfs create -o encryption=on -o keyformat=passphrase ssd-vol0/enc
ctlog1:~$ sudo zfs create ssd-vol0/logs
ctlog1:~$ for log in lipase; do \
    for shard in 2025h2 2026h1 2026h2 2027h1 2027h2; do \
      sudo zfs create ssd-vol0/logs/${log}${shard}; \
    done; \
  done

The hypervisor passes the NVMe drives through with PCI passthrough, and we handle ZFS directly in the VM. The first command creates a ZFS raidz2 pool using 4kB blocks, turns off atime (which avoids one metadata write for each read!), and turns on SSD trimming in ZFS, a very useful feature.

Then I’ll create an encrypted volume for the configuration and key material. This way, if the machine is ever physically transported, the keys will be safe in transit. Finally, I’ll create the temporal log shards starting at 2025h2, all the way through to 2027h2 for our testing log called Lipase and our production log called Halloumi on Jeroen’s machine. On my own machine, it’ll be Rennet for the testing log and Gouda for the production log.
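One operational note on that encrypted dataset: after a reboot, the key has to be loaded by hand before anything under ssd-vol0/enc is usable, along these lines:

ctlog1:~$ sudo zfs load-key ssd-vol0/enc
Enter passphrase for 'ssd-vol0/enc':
ctlog1:~$ sudo zfs mount ssd-vol0/enc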

Sunlight

Sunlight logo

I set up Sunlight first, as its authors have extensive operational notes, both in terms of the [config] of Geomys’ Tuscolo log and on the [Sunlight] homepage. I really appreciate that Filippo added some [Gists] and [Doc] with pretty much all I need to know to run one too. Our Rennet and Gouda logs use a very similar approach for their configuration, with one notable exception: the VMs do not have a public IP address, and are tucked away in a private network called IPng Site Local. I’ll get back to that later.

ctlog@ctlog0:/ssd-vol0/enc/sunlight$ cat << EOF | tee sunlight-staging.yaml
listen:
  - "[::]:16420"
checkpoints: /ssd-vol0/shared/checkpoints.db
logs:
  - shortname: rennet2025h2
    inception: 2025-07-28
    period: 200
    poolsize: 750
    submissionprefix: https://rennet2025h2.log.ct.ipng.ch
    monitoringprefix: https://rennet2025h2.mon.ct.ipng.ch
    ccadbroots: testing
    extraroots: /ssd-vol0/enc/sunlight/extra-roots-staging.pem
    secret: /ssd-vol0/enc/sunlight/keys/rennet2025h2.seed.bin
    cache: /ssd-vol0/logs/rennet2025h2/cache.db
    localdirectory: /ssd-vol0/logs/rennet2025h2/data
    notafterstart: 2025-07-01T00:00:00Z
    notafterlimit: 2026-01-01T00:00:00Z
...
EOF
ctlog@ctlog0:/ssd-vol0/enc/sunlight$ cat << EOF | tee skylight-staging.yaml
listen:
  - "[::]:16421"
homeredirect: https://ipng.ch/s/ct/
logs:
  - shortname: rennet2025h2
    monitoringprefix: https://rennet2025h2.mon.ct.ipng.ch
    localdirectory: /ssd-vol0/logs/rennet2025h2/data
    staging: true
...
EOF

In these two configuration files, I tell Sunlight (the write-path component) to listen on port :16420, and Skylight (the read-path component) to listen on port :16421. I’ve disabled the automatic certificate renewals, and will handle SSL upstream. A few notes on this:

  1. Most importantly, I will be using a common frontend pool with a wildcard certificate for *.ct.ipng.ch. I wrote about [DNS-01] before; it’s a very convenient way for IPng to do certificate pool management. I will be serving all log endpoints under this one certificate.
  2. ACME/HTTP-01 could be made to work with a bit of effort, by plumbing the /.well-known/ URIs through on the frontends and pointing them at these instances. But then the cert would have to be copied from Sunlight back to the frontends.

I’ve noticed that when the log doesn’t exist yet, I can start Sunlight and it’ll create the bits and pieces on the local filesystem and start writing checkpoints. But if the log already exists, I am required to have the monitoringprefix active, otherwise Sunlight won’t start up. It’s a small thing, as I will have the read path operational in a few simple steps. Anyway, all five log shards for Rennet, and a few days later for Gouda, are operational this way.
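To keep the write path running across reboots, I run Sunlight under systemd. A minimal unit might look something like this (a sketch: the binary location and the -c config flag are assumptions based on the Sunlight docs; Skylight gets an analogous unit pointing at skylight-staging.yaml):

ctlog@ctlog0:~$ cat << EOF | sudo tee /lib/systemd/system/sunlight-staging.service
[Unit]
Description=Sunlight CT log service (staging)
After=network.target

[Service]
User=ctlog
Group=ctlog
ExecStart=/home/ctlog/bin/sunlight -c /ssd-vol0/enc/sunlight/sunlight-staging.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF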

Skylight provides all the things I need to serve the data back, which is a huge help. The [Static Log Spec] is very clear on things like compression, content-type, cache-control and other headers. Skylight makes this a breeze, as it reads a configuration file very similar to the Sunlight write-path one, and takes care of it all for me.
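Once the frontends (described below) are in place, a quick spot check that those headers come out right for one of the shards:

ctlog@ctlog0:~$ curl -sI https://rennet2025h2.mon.ct.ipng.ch/checkpoint | grep -iE 'content-type|cache-control'
ctlog@ctlog0:~$ curl -sI https://rennet2025h2.mon.ct.ipng.ch/log.v3.json | grep -iE 'content-type|cache-control'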

TesseraCT

TesseraCT logo

Good news came to our community on August 14th, when Google’s TrustFabric team announced the Alpha milestone of [TesseraCT]. This release also promoted the POSIX variant out of experimental status, alongside the already more mature GCP and AWS personalities. After playing around with it with Al and the team, I think I’ve learned enough to get us going with a public tesseract-posix instance.

One thing I liked about Sunlight is its compact YAML file that describes the pertinent bits of the system, and that a single process can serve any number of logs. TesseraCT, on the other hand, serves only one log per process. Both approaches have pros and cons: notably, if a poisonous submission were offered, Sunlight might take down all logs at once, while TesseraCT would only take down the log receiving the offensive submission. On the other hand, maintaining separate processes is cumbersome, and each log instance needs to be meticulously configured.

TesseraCT genconf

I decide to automate this by vibe-coding a little tool called tesseract-genconf, which I’ve published on [Gitea]. It takes a YAML file describing the logs, and outputs the bits and pieces needed to operate multiple separate processes that together form the sharded static log. I’ve attempted to stay mostly compatible with the Sunlight YAML configuration, and came up with a variant like this one:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat << EOF | tee tesseract-staging.yaml
listen:
 - "[::]:8080"
roots: /ssd-vol0/enc/tesseract/roots.pem
logs:
  - shortname: lipase2025h2
    listen: "[::]:16900"
    submissionprefix: https://lipase2025h2.log.ct.ipng.ch
    monitoringprefix: https://lipase2025h2.mon.ct.ipng.ch
    extraroots: /ssd-vol0/enc/tesseract/extra-roots-staging.pem
    secret: /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem
    localdirectory: /ssd-vol0/logs/lipase2025h2/data
    notafterstart: 2025-07-01T00:00:00Z
    notafterlimit: 2026-01-01T00:00:00Z
...
EOF

With this snippet, I have all the information I need. Here are the steps I take to construct the log itself:

1. Generate keys

The keys are prime256v1, and the format that TesseraCT accepts has changed since I wrote my first [deep dive] a few weeks ago. The tool now accepts a PEM format private key, from which the Log ID and Public Key can be derived. So off I go:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-key
Creating /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2026h1.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2026h2.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2027h1.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2027h2.pem

Of course, if a file already exists at that location, it’ll just print a warning like:

Key already exists: /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem (skipped)
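For the curious, the equivalent by hand with openssl, generating a prime256v1 key and deriving the public key plus the Log ID (the SHA-256 over the DER-encoded public key), looks something like this (shown with a hypothetical "example" key name; tesseract-genconf does this for me):

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ openssl ecparam -name prime256v1 -genkey -noout -out keys/example.pem
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ openssl ec -in keys/example.pem -pubout -out example.pub.pem
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ openssl ec -in keys/example.pem -pubout -outform DER \
    | openssl dgst -sha256 -binary | base64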

2. Generate JSON/HTML

I will be operating the read-path with NGINX. Log operators have started speaking about their log metadata in terms of a small JSON file called log.v3.json, and Skylight does a good job of exposing that one, alongside all the other pertinent metadata. So I’ll generate these files for each of the logs:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-html
Creating /ssd-vol0/logs/lipase2025h2/data/index.html
Creating /ssd-vol0/logs/lipase2025h2/data/log.v3.json
Creating /ssd-vol0/logs/lipase2026h1/data/index.html
Creating /ssd-vol0/logs/lipase2026h1/data/log.v3.json
Creating /ssd-vol0/logs/lipase2026h2/data/index.html
Creating /ssd-vol0/logs/lipase2026h2/data/log.v3.json
Creating /ssd-vol0/logs/lipase2027h1/data/index.html
Creating /ssd-vol0/logs/lipase2027h1/data/log.v3.json
Creating /ssd-vol0/logs/lipase2027h2/data/index.html
Creating /ssd-vol0/logs/lipase2027h2/data/log.v3.json
TesseraCT Lipase Log

It’s nice to see a familiar look-and-feel for these logs appear in those index.html files, which all cross-link to each other within the logs specified in tesseract-staging.yaml. Which is dope.
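For reference, the log.v3.json for a single shard carries the metadata that monitors and the log-list inclusion process care about. It looks roughly along these lines (a hedged sketch: field names follow the common log-list v3 conventions, and the key material is elided):

{
  "description": "IPng 'lipase2025h2' log",
  "log_id": "<base64 SHA-256 of the DER-encoded public key>",
  "key": "<base64 DER-encoded public key>",
  "submission_url": "https://lipase2025h2.log.ct.ipng.ch/",
  "monitoring_url": "https://lipase2025h2.mon.ct.ipng.ch/",
  "temporal_interval": {
    "start_inclusive": "2025-07-01T00:00:00Z",
    "end_exclusive": "2026-01-01T00:00:00Z"
  }
}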

3. Generate Roots

Antonis had seen this before (thanks for the explanation!): TesseraCT does not natively implement fetching of the [CCADB] roots. But, he points out, you can just get them from any other running log instance, so I’ll implement a gen-roots command:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf gen-roots \
  --source https://tuscolo2027h1.sunlight.geomys.org --output production-roots.pem
Fetching roots from: https://tuscolo2027h1.sunlight.geomys.org/ct/v1/get-roots
2025/08/25 08:24:58 Warning: Failed to parse certificate, carefully skipping: x509: negative serial number
Creating production-roots.pem
Successfully wrote 248 certificates to production-roots.pem (out of 249 total)

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf gen-roots \
  --source https://navigli2027h1.sunlight.geomys.org --output testing-roots.pem
Fetching roots from: https://navigli2027h1.sunlight.geomys.org/ct/v1/get-roots
Creating testing-roots.pem
Successfully wrote 82 certificates to testing-roots.pem (out of 82 total)

I can do this regularly, say daily, in a cronjob and if the files were to change, restart the TesseraCT processes. It’s not ideal (because the restart might be briefly disruptive), but it’s a reasonable option for the time being.
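Under the hood, this is just RFC 6962’s get-roots endpoint, which returns a JSON document with base64-encoded DER certificates. A rough shell equivalent of gen-roots, assuming curl and jq are available, would be:

ctlog@ctlog1:~$ curl -s https://tuscolo2027h1.sunlight.geomys.org/ct/v1/get-roots \
    | jq -r '.certificates[]' \
    | while read -r der; do \
        printf -- '-----BEGIN CERTIFICATE-----\n%s\n-----END CERTIFICATE-----\n' \
          "$(printf '%s' "${der}" | fold -w 64)"; \
      done > production-roots.pem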

4. Generate TesseraCT cmdline

I will be running TesseraCT as a templated unit in systemd. These are system unit files that take an argument; they have an @ in their name, like so:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat << EOF | sudo tee /lib/systemd/system/tesseract@.service 
[Unit]
Description=Tesseract CT Log service for %i
ConditionFileExists=/ssd-vol0/logs/%i/data/.env
After=network.target

[Service]
# The %i here refers to the instance name, e.g., "lipase2025h2"
# This path should point to where your instance-specific .env files are located
EnvironmentFile=/ssd-vol0/logs/%i/data/.env
ExecStart=/home/ctlog/bin/tesseract-posix $TESSERACT_ARGS
User=ctlog
Group=ctlog
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

I can now implement a gen-env command for my tool:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-env
Creating /ssd-vol0/logs/lipase2025h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2025h2/data/.env
Creating /ssd-vol0/logs/lipase2026h1/data/roots.pem
Creating /ssd-vol0/logs/lipase2026h1/data/.env
Creating /ssd-vol0/logs/lipase2026h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2026h2/data/.env
Creating /ssd-vol0/logs/lipase2027h1/data/roots.pem
Creating /ssd-vol0/logs/lipase2027h1/data/.env
Creating /ssd-vol0/logs/lipase2027h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2027h2/data/.env

Looking at one of those .env files, I can show the exact commandline I’ll be feeding to the tesseract-posix binary:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat /ssd-vol0/logs/lipase2025h2/data/.env
TESSERACT_ARGS="--private_key=/ssd-vol0/enc/tesseract/keys/lipase2025h2.pem 
  --origin=lipase2025h2.log.ct.ipng.ch --storage_dir=/ssd-vol0/logs/lipase2025h2/data
  --roots_pem_file=/ssd-vol0/logs/lipase2025h2/data/roots.pem --http_endpoint=[::]:16900"
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Warning
A quick operational note on OpenTelemetry (also often referred to as OTel): Al and the TrustFabric team added OpenTelemetry support to the TesseraCT personalities, as it was mostly already implemented in the underlying Tessera library. By default, it’ll try to send its telemetry to localhost using https, which makes sense in those cases where the collector is on a different machine. In my case, I’ll keep otelcol (the collector) on the same machine. Its job is to consume the OTel telemetry stream and turn it back into a Prometheus /metrics endpoint on port :9464.
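To make that concrete, a minimal otelcol pipeline that accepts OTLP over HTTP on :4318 and re-exposes the metrics for Prometheus on :9464 could look roughly like this (a sketch, not necessarily the exact config we run; it assumes a collector build that includes the prometheus exporter, which the contrib distribution does):

receivers:
  otlp:
    protocols:
      http:
        endpoint: "127.0.0.1:4318"

exporters:
  prometheus:
    endpoint: "[::]:9464"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]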

The gen-env command also assembles the per-instance roots.pem file. For staging logs, it’ll take the file pointed to by the roots: key, and append any per-log extraroots: files. For me, these extraroots are empty and the main roots file points at either the testing roots that came from Rennet (our Sunlight staging log), or the production roots that came from Gouda. A job well done!
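With the unit template and the per-shard .env files in place, bringing up all five shards is just a matter of enabling the templated instances:

ctlog@ctlog1:~$ for shard in 2025h2 2026h1 2026h2 2027h1 2027h2; do \
    sudo systemctl enable --now tesseract@lipase${shard}; \
  done
ctlog@ctlog1:~$ systemctl list-units 'tesseract@*'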

5. Generate NGINX

When I first ran my tests, I noticed that the log check tool called ct-fsck threw errors on my read path. Filippo explained that the HTTP headers matter in the Static CT specification. Tiles, Issuers, and Checkpoint must all have specific caching and content type headers set. This is what makes Skylight such a gem - I get to read it (and the spec!) to see what I’m supposed to be serving.

And thus, the gen-nginx command is born; the generated configs listen on port :8080 for requests:

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-nginx
Creating nginx config: /ssd-vol0/logs/lipase2025h2/data/lipase2025h2.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2026h1/data/lipase2026h1.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2026h2/data/lipase2026h2.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2027h1/data/lipase2027h1.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2027h2/data/lipase2027h2.mon.ct.ipng.ch.conf

All that’s left for me to do is symlink these from /etc/nginx/sites-enabled/, and the read path is off to the races. With these commands in the tesseract-genconf tool, I am hoping that future travelers have an easy time setting up their static log. Please let me know if you’d like to use, or contribute to, the tool. You can find me in the Transparency Dev Slack, in #ct and also #cheese.
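Before handing the read path to the public frontends, I can eyeball the generated headers by hitting the local NGINX directly with the right Host header (shard name per the config above):

ctlog@ctlog1:~$ curl -sI -H 'Host: lipase2025h2.mon.ct.ipng.ch' http://localhost:8080/checkpoint
ctlog@ctlog1:~$ curl -sI -H 'Host: lipase2025h2.mon.ct.ipng.ch' http://localhost:8080/log.v3.json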

IPng Frontends

ctlog at ipng

IPng Networks has a private internal network called [IPng Site Local], which is not routed on the internet. Our [Frontends] are the only things that have public IPv4 and IPv6 addresses. It allows for things like anycasted webservers and loadbalancing with [Maglev].

The IPng Site Local network kind of looks like the picture to the right. The hypervisors running the Sunlight and TesseraCT logs are at NTT Zurich1 in Rümlang, Switzerland. The IPng frontends are in green, and the sweet thing is, some of them run in IPng’s own ISP network (AS8298), while others run in partner networks (like IP-Max AS25091, and Coloclue AS8283). This means that I will benefit from some pretty solid connectivity redundancy.

The frontends are provisioned with Ansible. There are two aspects to them. Firstly, a certbot instance maintains the Let’s Encrypt wildcard certificates for *.ct.ipng.ch. There’s a machine tucked away somewhere called lego.net.ipng.ch – again, not exposed on the internet – and its job is to renew certificates and copy them to the machines that need them. Secondly, a cluster of NGINX servers uses these certificates to expose IPng and customer services to the Internet.

I can tie it all together with a snippet like so, for which I apologize in advance - it’s quite a wall of text:

map $http_user_agent $no_cache_ctlog_lipase {
  "~*TesseraCT fsck" 1;
  default 0;
}

server {
  listen [::]:443 ssl http2;
  listen 0.0.0.0:443 ssl http2;
  ssl_certificate /etc/certs/ct.ipng.ch/fullchain.pem;
  ssl_certificate_key /etc/certs/ct.ipng.ch/privkey.pem;
  include /etc/nginx/conf.d/options-ssl-nginx.inc;
  ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

  server_name lipase2025h2.log.ct.ipng.ch;
  access_log /nginx/logs/lipase2025h2.log.ct.ipng.ch-access.log upstream buffer=512k flush=5s;
  include /etc/nginx/conf.d/ipng-headers.inc;

  location = / {
    proxy_http_version 1.1;
    proxy_set_header Host lipase2025h2.mon.ct.ipng.ch;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://ctlog1.net.ipng.ch:8080/index.html;
  }

  location = /metrics {
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://ctlog1.net.ipng.ch:9464;
  }

  location / {
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://ctlog1.net.ipng.ch:16900;
  }
}

server {
  listen [::]:443 ssl http2;
  listen 0.0.0.0:443 ssl http2;
  ssl_certificate /etc/certs/ct.ipng.ch/fullchain.pem;
  ssl_certificate_key /etc/certs/ct.ipng.ch/privkey.pem;
  include /etc/nginx/conf.d/options-ssl-nginx.inc;
  ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

  server_name lipase2025h2.mon.ct.ipng.ch;
  access_log /nginx/logs/lipase2025h2.mon.ct.ipng.ch-access.log upstream buffer=512k flush=5s;
  include /etc/nginx/conf.d/ipng-headers.inc;

  location = /checkpoint {
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_pass http://ctlog1.net.ipng.ch:8080;
  }

  location / {
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    include /etc/nginx/conf.d/ipng-upstream-headers.inc;
    proxy_cache ipng_cache;
    proxy_cache_key "$scheme://$host$request_uri";
    proxy_cache_valid 200 24h;
    proxy_cache_revalidate off;
    proxy_cache_bypass $no_cache_ctlog_lipase;
    proxy_no_cache $no_cache_ctlog_lipase;

    proxy_pass http://ctlog1.net.ipng.ch:8080;
  }
}

Taking Lipase shard 2025h2 as an example: the submission path (on *.log.ct.ipng.ch) will show the same index.html as the monitoring path (on *.mon.ct.ipng.ch), to provide some consistency with Sunlight logs. Otherwise, the /metrics endpoint is forwarded to the otelcol exporter running on port :9464, and the rest (/ct/v1/ and so on) is sent to port :16900, where the TesseraCT process for this shard listens.

The read path then special-cases the /checkpoint endpoint, which it does not cache. That request (like all others) is forwarded to port :8080, where the local NGINX is running. Other requests (notably /tile and /issuer) are cacheable, so I’ll cache these on the upstream NGINX servers, both for resilience and for performance. Having four of these upstream NGINX servers will allow the Static CT logs (regardless of whether they are Sunlight or TesseraCT) to serve very high read rates.

What’s Next

I need to spend a little bit of time thinking about rate limits, specifically write rate limits. I think I’ll use a request limiter in upstream NGINX, to allow each IP, /24, or /48 subnet to send only a fixed number of requests per second. I’ll probably keep the exact limits private though, as it’s a good rule of thumb to never offer information to attackers.
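Purely as an illustration (the real limits and keys will stay private), an NGINX write-path limiter could look along these lines, keyed per client address; grouping by /24 or /48 would additionally need a map that masks the address first:

# In the http{} context of the frontends (illustrative values only)
limit_req_zone $binary_remote_addr zone=ct_write:16m rate=5r/s;

# In the lipase2025h2.log.ct.ipng.ch server{} block, ahead of the generic location /
location /ct/v1/add-chain {
  limit_req zone=ct_write burst=20 nodelay;
  proxy_pass http://ctlog1.net.ipng.ch:16900;
}
location /ct/v1/add-pre-chain {
  limit_req zone=ct_write burst=20 nodelay;
  proxy_pass http://ctlog1.net.ipng.ch:16900;
}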

Together with Antonis Chariton and Jeroen Massar, IPng Networks will be offering both TesseraCT and Sunlight logs on the public internet. One final step is to productionize both logs, and file the paperwork for them in the community. At this point our Sunlight log has been running for a month or so, and we’ve filed the paperwork for it to be included at Apple and Google.

I’m going to have folks poke at Lipase as well, after which I’ll run ct-fsck a few times to make sure the logs are sane, before offering them into the inclusion programs as well. Wish us luck!