
Introduction
There once was a Dutch company called [DigiNotar]. As the name suggests, it was a form of digital notary, and it was in the business of issuing security certificates. Unfortunately, in June of 2011 its IT infrastructure was compromised, and it subsequently issued hundreds of fraudulent SSL certificates, some of which were used for man-in-the-middle attacks on Iranian Gmail users. Not cool.
Google launched a project called Certificate Transparency, because it was becoming more common that the root of trust given to Certification Authorities could no longer be unilaterally trusted. These attacks showed that the lack of transparency in the way CAs operated was a significant risk to the Web Public Key Infrastructure. It led to the creation of this ambitious [project] to improve security online by bringing accountability to the system that protects our online services with SSL (Secure Socket Layer) and TLS (Transport Layer Security).
In 2013, [RFC 6962] was published by the IETF. It describes an experimental protocol for publicly logging the existence of Transport Layer Security (TLS) certificates as they are issued or observed, in a manner that allows anyone to audit certificate authority (CA) activity and notice the issuance of suspect certificates as well as to audit the certificate logs themselves. The intent is that eventually clients would refuse to honor certificates that do not appear in a log, effectively forcing CAs to add all issued certificates to the logs.
This series explores and documents how IPng Networks will be running two Static CT Logs with two different implementations. One will be [Sunlight], and the other will be [TesseraCT].
Static Certificate Transparency
In this context, Logs are network services that implement the protocol operations for submissions and queries that are defined in a specification that builds on the previous RFC. A few years ago, my buddy Antonis asked me if I would be willing to run a log, but operationally they were very complex and expensive to run. However, over the years, the concept of Static Logs has put running one within reach. This [Static CT API] defines a read-path HTTP static asset hierarchy (for monitoring) to be implemented alongside the write-path RFC 6962 endpoints (for submission).
Aside from the different read endpoints, a log that implements the Static API is a regular CT log that can work alongside RFC 6962 logs and that fulfills the same purpose. In particular, it requires no modification to submitters and TLS clients.
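To make this concrete: the entire read path of a Static CT log is just a small set of static files served over HTTP, so monitoring boils down to plain GETs for the checkpoint (the signed tree head) and the tiles that hold the Merkle tree hashes and the log entries. A hypothetical example (the hostname is made up; the paths come from the [Static CT API] spec):
$ curl -sS https://log.example.org/checkpoint
$ curl -sS -o data-tile https://log.example.org/tile/data/000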
If you only read one document about Static CT, read Filippo Valsorda’s excellent [paper]. It describes a radically cheaper and easier to operate [Certificate Transparency] log that is backed by a consistent object storage, and can scale to 30x the current issuance rate for 2-10% of the costs with no merge delay.
Scalable, Cheap, Reliable: choose two
In the diagram, I’ve drawn an overview of IPng’s network. In red, a European backbone network is provided by a [BGP Free Core network]. It operates a private IPv4, IPv6, and MPLS network, called IPng Site Local, which is not connected to the internet. On top of that, IPng offers L2 and L3 services, for example using [VPP].
In green I built a cluster of replicated NGINX frontends. They connect into IPng Site Local and can reach all hypervisors, VMs, and storage systems. They also connect to the Internet with a single IPv4 and IPv6 address. One might say that SSL is added and removed here :-) [ref].
Then in orange I built a set of [MinIO] S3 storage pools. Amongst others, I serve the static content from the IPng website from these pools, providing fancy redundancy and caching. I wrote about its design in [this article].
Finally, I turn my attention to the blue which is two hypervisors, one run by [IPng] and the other by [Massar]. Each of them will be running one of the Log implementations. IPng provides two large ZFS storage tanks for offsite backup, in case a hypervisor decides to check out, and daily backups to an S3 bucket using Restic.
Having explained all of this, I am well aware that end to end reliability will be coming from the fact that there are many independent Log operators, and folks wanting to validate certificates can simply monitor many. If there is a gap in coverage, say due to any given Log’s downtime, this will not necessarily be problematic. It does mean that I may have to suppress the SRE in me…
MinIO
My first instinct is to leverage the distributed storage IPng has, but as I’ll show in the rest of this article, maybe a simpler, more elegant design could be superior, precisely because individual log reliability is not as important as having many available log instances to choose from.
From operators in the field I understand that the world-wide issuance of certificates is roughly 17M/day, which amounts to some 200-250 qps of writes. Antonis explains that certs with a validity of 180 days or less will need two CT log entries, while certs with a validity of more than 180 days will need three CT log entries. So the write rate is roughly 2.2x that, as an upper bound.
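A quick back-of-the-envelope check of those numbers, using the 17M/day figure and the 2.2x multiplier from above:
$ echo "$(( 17000000 / 86400 )) adds/sec; with the ~2.2x multiplier that is $(( 17000000 * 22 / 10 / 86400 )) log writes/sec"
196 adds/sec; with the ~2.2x multiplier that is 432 log writes/sec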
My first thought is to see how fast my open source S3 machines can go, really. I’m curious also as to the difference between SSD and spinning disks.
I boot two Dell R630s in the Lab. These machines have two Xeon E5-2640 v4 CPUs for a total of 20 cores and 40 threads, and 512GB of DDR4 memory. They also sport a SAS controller. In one machine I place 6pcs 1.2TB SAS3 disks (HPE part number EG1200JEHMC), and in the second machine I place 6pcs of 1.92TB enterprise storage (Samsung part number P1633N19).
I spin up a 6-device MinIO cluster on both and take them out for a spin using [S3 Benchmark] from Wasabi Tech.
pim@ctlog-test:~/src/s3-benchmark$ for dev in disk ssd; do \
for t in 1 8 32; do \
for z in 4M 1M 8k 4k; do \
./s3-benchmark -a $KEY -s $SECRET -u http://minio-$dev:9000 -t $t -z $z \
| tee -a minio-results.txt; \
done; \
done; \
done
The loadtest above does a bunch of runs with varying parameters. First it tries to read and write object sizes of 4MB, 1MB, 8kB and 4kB respectively. Then it tries to do this with either 1 thread, 8 threads or 32 threads. Finally it tests both the disk-based variant as well as the SSD based one. The loadtest runs from a third machine, so that the Dell R630 disk tanks can stay completely dedicated to their task of running MinIO.

The left-hand side graph feels pretty natural to me. With one thread, uploading 8kB objects will quickly hit the IOPS rate of the disks, each of which has to participate in the write due to EC:3 encoding when using six disks, and it tops out at ~56 PUT/s. The single thread hitting SSDs will not hit that limit, and has ~371 PUT/s which I found a bit underwhelming. But, when performing the loadtest with either 8 or 32 write threads, the hard disks become only marginally faster (topping out at 240 PUT/s), while the SSDs really start to shine, with 3850 PUT/s. Pretty good performance.
On the read-side, I am pleasantly surprised that there’s not really that much of a difference between disks and SSDs. This is likely because the host filesystem cache is playing a large role, so the 1-thread performance is equivalent (765 GET/s for disks, 677 GET/s for SSDs), and the 32-thread performance is also equivalent (at 7624 GET/s for disks with 7261 GET/s for SSDs). I do wonder why the hard disks consistently outperform the SSDs with all the other variables (OS, MinIO version, hardware) the same.
Sidequest: SeaweedFS
Something that has long caught my attention is the way in which [SeaweedFS] approaches blob storage. Many operators have great success with many small file writes in SeaweedFS compared to MinIO and even AWS S3 storage. This is because writes in SeaweedFS are not broken into erasure-sets, which would require every disk to write a small part or checksum of the data; rather, files are replicated within the cluster in their entirety on different disks, racks or datacenters. I won’t bore you with the details of SeaweedFS but I’ll tack on a docker [compose file] that I used at the end of this article, if you’re curious.

In the write-path, SeaweedFS dominates in all cases, due to its different way of achieving durable storage (per-file replication in SeaweedFS versus all-disk erasure-sets in MinIO):
- 4k: 3,384 ops/sec vs MinIO’s 111 ops/sec (30x faster!)
- 8k: 3,332 ops/sec vs MinIO’s 111 ops/sec (30x faster!)
- 1M: 383 ops/sec vs MinIO’s 44 ops/sec (9x faster)
- 4M: 104 ops/sec vs MinIO’s 32 ops/sec (4x faster)
For the read-path, MinIO is better at small objects in GET operations, and really dominates at large objects:
- 4k: 7,411 ops/sec vs SeaweedFS 5,014 ops/sec
- 8k: 7,666 ops/sec vs SeaweedFS 5,165 ops/sec
- 1M: 5,466 ops/sec vs SeaweedFS 2,212 ops/sec
- 4M: 3,084 ops/sec vs SeaweedFS 646 ops/sec
This makes me draw an interesting conclusion: seeing as CT Logs are read/write heavy (every couple of seconds, the Merkle tree is recomputed, which is reasonably disk-intensive), SeaweedFS might be a slightly better choice. IPng Networks has three MinIO deployments, but no SeaweedFS deployments. Yet.
Tessera
[Tessera] is a Go library for building tile-based transparency logs (tlogs) [ref]. It is the logical successor to the approach that Google took when building and operating Logs using its predecessor called [Trillian]. The implementation and its APIs bake in current best practices based on the lessons learned over the past decade of building and operating transparency logs in production environments and at scale.
Tessera was introduced at the Transparency.Dev summit in October 2024. I first watched Al and Martin [introduce] it at last year’s summit. At a high level, it wraps what used to be a whole Kubernetes cluster full of components into a single library that can be used with cloud services, such as AWS S3 and an RDS database, or GCP’s GCS storage and Spanner database. However, Google also made it easy to use a regular POSIX filesystem implementation.
TesseraCT

While Tessera is a library, a CT log implementation comes from its sibling GitHub repository called [TesseraCT]. Because it leverages Tessera under the hood, TesseraCT can run on GCP, AWS, POSIX-compliant, or S3-compatible systems alongside a MySQL database. In order to provide ecosystem agility and to control the growth of CT Log sizes, new CT Logs must be temporally sharded, defining a certificate expiry range denoted in the form of two dates: [rangeBegin, rangeEnd). The certificate expiry range allows a Log to reject otherwise valid logging submissions for certificates that expire before or after this defined range, thus partitioning the set of publicly-trusted certificates that each Log will accept (for example, one shard that only accepts certificates expiring in 2026, a sibling shard for 2027, and so on). I will be expected to keep logs for an extended period of time, say 3-5 years.
TesseraCT: S3 and SQL
TesseraCT comes with a few so-called personalities. Each is an opinionated implementation of the underlying storage infrastructure. The first personality I look at is the aws one in cmd/tesseract/aws. I notice that this personality does make hard assumptions about the use of AWS, which is unfortunate, as the documentation says ‘.. or self-hosted S3 and MySQL database’. However, the aws personality assumes AWS Secrets Manager in order to fetch its signing key. Before I can be successful, I need to untangle that.
TesseraCT: AWS and Local Signer
First, I change cmd/tesseract/aws/main.go to add two new flags:
- -signer_public_key_file: a path to the public key for checkpoints and SCT signer
- -signer_private_key_file: a path to the private key for checkpoints and SCT signer
I then change the program to assume that if these flags are both set, the user will want a NewLocalSigner instead of a NewSecretsManagerSigner. Now all I have to do is implement the signer interface in a package local_signer.go. There, the function NewLocalSigner() will read the public and private PEM from file, decode them, and create an ECDSAWithSHA256Signer with them. A simple example to show what I mean:
// NewLocalSigner creates a new signer that uses the ECDSA P-256 key pair from
// local disk files for signing digests.
func NewLocalSigner(publicKeyFile, privateKeyFile string) (*ECDSAWithSHA256Signer, error) {
  // Read and parse the PEM-encoded public key.
  publicKeyPEM, err := os.ReadFile(publicKeyFile)
  if err != nil {
    return nil, err
  }
  publicPemBlock, _ := pem.Decode(publicKeyPEM)
  publicKey, err := x509.ParsePKIXPublicKey(publicPemBlock.Bytes)
  if err != nil {
    return nil, err
  }
  ecdsaPublicKey, ok := publicKey.(*ecdsa.PublicKey)
  if !ok {
    return nil, errors.New("public key is not an ECDSA key")
  }

  // Read and parse the PEM-encoded private key.
  privateKeyPEM, err := os.ReadFile(privateKeyFile)
  if err != nil {
    return nil, err
  }
  privatePemBlock, _ := pem.Decode(privateKeyPEM)
  ecdsaPrivateKey, err := x509.ParseECPrivateKey(privatePemBlock.Bytes)
  if err != nil {
    return nil, err
  }

  // Verify the correctness of the signer key pair.
  if !ecdsaPrivateKey.PublicKey.Equal(ecdsaPublicKey) {
    return nil, errors.New("signer key pair doesn't match")
  }

  return &ECDSAWithSHA256Signer{
    publicKey:  ecdsaPublicKey,
    privateKey: ecdsaPrivateKey,
  }, nil
}
The local signer logic is hopefully clear. And with that, I am liberated from Amazon’s Cloud offering and can run this thing all by myself!
TesseraCT: Running with S3, MySQL, and Local Signer
First, I need to create a suitable ECDSA key:
pim@ctlog-test:~$ openssl ecparam -name prime256v1 -genkey -noout -out /tmp/private_key.pem
pim@ctlog-test:~$ openssl ec -in /tmp/private_key.pem -pubout -out /tmp/public_key.pem
Then, I’ll install the MySQL server and create the databases:
pim@ctlog-test:~$ sudo apt install default-mysql-server
pim@ctlog-test:~$ sudo mysql -u root
CREATE USER 'tesseract'@'localhost' IDENTIFIED BY '<db_passwd>';
CREATE DATABASE tesseract;
CREATE DATABASE tesseract_antispam;
GRANT ALL PRIVILEGES ON tesseract.* TO 'tesseract'@'localhost';
GRANT ALL PRIVILEGES ON tesseract_antispam.* TO 'tesseract'@'localhost';
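As a quick sanity check (my own addition, not part of the TesseraCT docs), the new user should be able to see both databases:
pim@ctlog-test:~$ mysql -u tesseract -p -e 'SHOW DATABASES;' | grep tesseract
tesseract
tesseract_antispam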
Finally, I use the SSD MinIO lab-machine that I just loadtested to create an S3 bucket.
pim@ctlog-test:~$ mc mb minio-ssd/tesseract-test
pim@ctlog-test:~$ cat << EOF > /tmp/minio-access.json
{ "Version": "2012-10-17", "Statement": [ {
"Effect": "Allow",
"Action": [ "s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ],
"Resource": [ "arn:aws:s3:::tesseract-test/*", "arn:aws:s3:::tesseract-test" ]
} ]
}
EOF
pim@ctlog-test:~$ mc admin user add minio-ssd <user> <secret>
pim@ctlog-test:~$ mc admin policy create minio-ssd tesseract-test-access /tmp/minio-access.json
pim@ctlog-test:~$ mc admin policy attach minio-ssd tesseract-test-access --user <user>
pim@ctlog-test:~$ mc anonymous set public minio-ssd/tesseract-test

After some fiddling, I understand that the AWS software development kit makes some assumptions that you’ll be using .. quelle surprise .. AWS services. But you can also use local S3 services by setting a few key environment variables. I had heard of the S3 access and secret key environment variables before, but I now need to also use a different S3 endpoint. That little detour into the codebase only took me .. several hours.
Armed with that knowledge, I can build and finally start my TesseraCT instance:
pim@ctlog-test:~/src/tesseract/cmd/tesseract/aws$ go build -o ~/aws .
pim@ctlog-test:~$ export AWS_DEFAULT_REGION="us-east-1"
pim@ctlog-test:~$ export AWS_ACCESS_KEY_ID="<user>"
pim@ctlog-test:~$ export AWS_SECRET_ACCESS_KEY="<secret>"
pim@ctlog-test:~$ export AWS_ENDPOINT_URL_S3="http://minio-ssd.lab.ipng.ch:9000/"
pim@ctlog-test:~$ ./aws --http_endpoint='[::]:6962' \
--origin=ctlog-test.lab.ipng.ch/test-ecdsa \
--bucket=tesseract-test \
--db_host=ctlog-test.lab.ipng.ch \
--db_user=tesseract \
--db_password=<db_passwd> \
--db_name=tesseract \
--antispam_db_name=tesseract_antispam \
--signer_public_key_file=/tmp/public_key.pem \
--signer_private_key_file=/tmp/private_key.pem \
--roots_pem_file=internal/hammer/testdata/test_root_ca_cert.pem
I0727 15:13:04.666056 337461 main.go:128] **** CT HTTP Server Starting ****
Hah! I think most of the command line flags and environment variables should make sense, but I was struggling for a while with the --roots_pem_file and the --origin flags, so I phoned a friend (Al Cutter, Googler extraordinaire and an expert in Tessera/CT). He explained to me that the Log is actually an open endpoint to which anybody might POST data. However, to avoid folks abusing the log infrastructure, each POST is expected to come from one of the certificate authorities listed in the --roots_pem_file. OK, that makes sense.
Then, the --origin flag designates how my log calls itself. In the resulting checkpoint file, it will enumerate a hash of the latest merged and published Merkle tree. In case a server serves multiple logs, it uses the --origin flag to distinguish which checksum belongs to which log.
pim@ctlog-test:~/src/tesseract$ curl http://tesseract-test.minio-ssd.lab.ipng.ch:9000/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
JGPitKWWI0aGuCfC2k1n/p9xdWAYPm5RZPNDXkCEVUU=
— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMCONUBAMARjBEAiA/nc9dig6U//vPg7SoTHjt9bxP5K+x3w4MYKpIRn4ULQIgUY5zijRK8qyuJGvZaItDEmP1gohCt+wI+sESBnhkuqo=
When creating the bucket above, I used mc anonymous set public, which made the S3 bucket world-readable. I can now execute the whole read-path simply by hitting the S3 service. Check.
TesseraCT: Loadtesting S3/MySQL

The write path is a server on [::]:6962. I should be able to write a log to it, but how? Here’s where I am grateful to find a tool in the TesseraCT GitHub repository called hammer. This hammer sets up read and write traffic to a Static CT API log to test correctness and performance under load. The traffic is sent according to the [Static CT API] spec. Slick!
The tool starts a text-based UI (my favorite! also when using the Cisco T-Rex loadtester) in the terminal that shows the current status and logs, and supports increasing/decreasing read and write traffic. This TUI allows for a level of interactivity when probing a new configuration of a log, in order to find any cliffs where performance degrades. For real load-testing applications, especially headless runs as part of a CI pipeline, it is recommended to run the tool with -show_ui=false in order to disable the UI.
I’m a bit lost in the somewhat terse [README.md], but my buddy Al comes to my rescue and explains the flags to me. First of all, the loadtester wants to hit the same --origin that I configured the write-path to accept. In my case this is ctlog-test.lab.ipng.ch/test-ecdsa. Then, it needs the public key for that Log, which I can find in /tmp/public_key.pem. The text there is the DER (Distinguished Encoding Rules), stored as a base64 encoded string. What follows next was the most difficult for me to understand, as I was thinking the hammer would read some log from the internet somewhere and replay it locally. Al explains that actually, the hammer tool synthetically creates all of these entries itself, and it regularly reads the checkpoint from the --log_url place, while it writes its certificates to --write_log_url. The last few flags just inform the hammer how many read and write ops/sec it should generate, and with that explanation my brain plays tadaa.wav and I am ready to go.
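Getting the public key into that one-line base64 shape is just a matter of stripping the PEM armor, since the PEM body already is the base64-encoded DER; one way to do it:
pim@ctlog-test:~$ grep -v -- '-----' /tmp/public_key.pem | tr -d '\n'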
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer \
--origin=ctlog-test.lab.ipng.ch/test-ecdsa \
--log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEucHtDWe9GYNicPnuGWbEX8rJg/VnDcXs8z40KdoNidBKy6/ZXw2u+NW1XAUnGpXcZozxufsgOMhijsWb25r7jw== \
--log_url=http://tesseract-test.minio-ssd.lab.ipng.ch:9000/ \
--write_log_url=http://localhost:6962/ctlog-test.lab.ipng.ch/test-ecdsa/ \
--max_read_ops=0 \
--num_writers=5000 \
--max_write_ops=100

Cool! It seems that the loadtest is happily chugging along at 100qps. The log is consuming them in the HTTP write-path by accepting POST requests to /ctlog-test.lab.ipng.ch/test-ecdsa/ct/v1/add-chain, where hammer is offering them at a rate of 100qps, with a configured probability of duplicates set at 10%. What that means is that every now and again, it’ll repeat a previous request. The purpose of this is to stress test the so-called antispam implementation. When hammer sends its requests, it signs them with a certificate that was issued by the CA described in internal/hammer/testdata/test_root_ca_cert.pem, which is why TesseraCT accepts them.
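For the record, the write path is plain RFC 6962: each submission is an HTTP POST with a JSON body carrying the base64-encoded certificate chain, roughly like this (the chain contents here are elided placeholders):
pim@ctlog-test:~$ curl -sS -X POST \
-H 'Content-Type: application/json' \
-d '{"chain": ["MII...leaf...", "MII...intermediate..."]}' \
http://localhost:6962/ctlog-test.lab.ipng.ch/test-ecdsa/ct/v1/add-chain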
I raise the write load by using the ‘>’ key a few times. I notice things are great at 500qps, which is nice because that’s double what we expect in production. But I start seeing a bit more noise at 600qps. When I raise the write-rate to 1000qps, all hell breaks loose in the logs of the server (and similar logs in the hammer loadtester):
W0727 15:54:33.419881 348475 handlers.go:168] ctlog-test.lab.ipng.ch/test-ecdsa: AddChain handler error: couldn't store the leaf: failed to fetch entry bundle at index 0: failed to fetch resource: getObject: failed to create reader for object "tile/data/000" in bucket "tesseract-test": operation error S3: GetObject, context deadline exceeded
W0727 15:55:02.727962 348475 aws.go:345] GarbageCollect failed: failed to delete one or more objects: failed to delete objects: operation error S3: DeleteObjects, https response error StatusCode: 400, RequestID: 1856202CA3C4B83F, HostID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, api error MalformedXML: The XML you provided was not well-formed or did not validate against our published schema.
E0727 15:55:10.448973 348475 append_lifecycle.go:293] followerStats: follower "AWS antispam" EntriesProcessed(): failed to read follow coordination info: Error 1040: Too many connections
I see on the MinIO instance that it’s doing about 150/s of GETs and 15/s of PUTs, which is totally reasonable:
pim@ctlog-test:~/src/tesseract$ mc admin trace --stats ssd
Duration: 6m9s ▰▱▱
RX Rate:↑ 34 MiB/m
TX Rate:↓ 2.3 GiB/m
RPM : 10588.1
-------------
Call Count RPM Avg Time Min Time Max Time Avg TTFB Max TTFB Avg Size Rate /min
s3.GetObject 60558 (92.9%) 9837.2 4.3ms 708µs 48.1ms 3.9ms 47.8ms ↑144B ↓246K ↑1.4M ↓2.3G
s3.PutObject 2199 (3.4%) 357.2 5.3ms 2.4ms 32.7ms 5.3ms 32.7ms ↑92K ↑32M
s3.DeleteMultipleObjects 1212 (1.9%) 196.9 877µs 290µs 41.1ms 850µs 41.1ms ↑230B ↓369B ↑44K ↓71K
s3.ListObjectsV2 1212 (1.9%) 196.9 18.4ms 999µs 52.8ms 18.3ms 52.7ms ↑131B ↓261B ↑25K ↓50K
Another nice way to see what makes it through is this oneliner, which reads the checkpoint every second, and once it changes, shows the delta in seconds and how many certs were written:
pim@ctlog-test:~/src/tesseract$ T=0; O=0; while :; do \
N=$(curl -sS http://tesseract-test.minio-ssd.lab.ipng.ch:9000/checkpoint | grep -E '^[0-9]+$'); \
if [ "$N" -eq "$O" ]; then \
echo -n .; \
else \
echo " $T seconds $((N-O)) certs"; O=$N; T=0; echo -n $N\ ;
fi; \
T=$((T+1)); sleep 1; done
1012905 .... 5 seconds 2081 certs
1014986 .... 5 seconds 2126 certs
1017112 .... 5 seconds 1913 certs
1019025 .... 5 seconds 2588 certs
1021613 .... 5 seconds 2591 certs
1024204 .... 5 seconds 2197 certs
So I can see that the checkpoint is refreshed every 5 seconds and between 1913 and 2591 certs are written each time. And indeed, at 400/s there are no errors or warnings at all. At this write rate, TesseraCT is using about 2.9 CPUs/s, with MariaDB using 0.3 CPUs/s, but the hammer is using 6.0 CPUs/s. Overall, the machine is perfectly happily serving for a few hours under this load test.
Conclusion: a write-rate of 400/s should be safe with S3+MySQL
TesseraCT: POSIX
I have been playing with this idea of having a reliable read-path by having the S3 cluster be redundant, or by replicating the S3 bucket. But Al asks: why not use our experimental POSIX variant? We discuss two very important benefits, but also two drawbacks:
- On the plus side:
  - There is no need for S3 storage, reading and writing to a local ZFS raidz2 pool instead.
  - There is no need for MySQL, as the POSIX implementation can use a local Badger instance, also on the local filesystem.
- On the drawbacks:
  - There is a SPOF in the read-path, as the single VM must serve both the read- and the write-path. (The write-path always has a SPOF on the TesseraCT VM anyway.)
  - Local storage is more expensive than S3 storage, and can be used only for the purposes of one application (and at best, shared with other VMs on the same hypervisor).
Come to think of it, this is maybe not such a bad tradeoff. I do kind of like having a single-VM with a single-binary and no other moving parts. It greatly simplifies the architecture, and for the read-path I can (and will) still use multiple upstream NGINX machines in IPng’s network.
I consider myself nerd-sniped, and take a look at the POSIX variant. I have a few SAS3 solid state drives (NetAPP part number X447_S1633800AMD), which I plug into the ctlog-test machine.
pim@ctlog-test:~$ sudo zpool create -o ashift=12 -o autotrim=on ssd-vol0 mirror \
/dev/disk/by-id/wwn-0x5002538a0???????
pim@ctlog-test:~$ sudo zfs create ssd-vol0/tesseract-test
pim@ctlog-test:~$ sudo chown pim:pim /ssd-vol0/tesseract-test
pim@ctlog-test:~/src/tesseract$ go run ./cmd/experimental/posix --http_endpoint='[::]:6962' \
--origin=ctlog-test.lab.ipng.ch/test-ecdsa \
--private_key=/tmp/private_key.pem \
--storage_dir=/ssd-vol0/tesseract-test \
--roots_pem_file=internal/hammer/testdata/test_root_ca_cert.pem
badger 2025/07/27 16:29:15 INFO: All 0 tables opened in 0s
badger 2025/07/27 16:29:15 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/27 16:29:15 INFO: Set nextTxnTs to 0
I0727 16:29:15.032845 363156 files.go:502] Initializing directory for POSIX log at "/ssd-vol0/tesseract-test" (this should only happen ONCE per log!)
I0727 16:29:15.034101 363156 main.go:97] **** CT HTTP Server Starting ****
pim@ctlog-test:~/src/tesseract$ cat /ssd-vol0/tesseract-test/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMSgC8BAMARzBFAiBjT5zdkniKlryqlUlx/gLHOtVK26zuWwrc4BlyTVzCWgIhAJ0GIrlrP7YGzRaHjzdB5tnS5rpP3LeOsPbpLateaiFc
Alright, I can see the log started and created an empty checkpoint file. Nice!
Before I can loadtest it, I will need to get the read-path to become visible. The hammer can read a checkpoint from local file:/// prefixes, but I’ll have to serve them over the network eventually anyway, so I create the following NGINX config for it:
server {
  listen 80 default_server backlog=4096;
  listen [::]:80 default_server backlog=4096;
  root /ssd-vol0/tesseract-test/;
  index index.html index.htm index.nginx-debian.html;
  server_name _;
  access_log /var/log/nginx/access.log combined buffer=512k flush=5s;

  location / {
    try_files $uri $uri/ =404;
    tcp_nopush on;
    sendfile on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;
  }
}
Just a couple of small thoughts on this configuration. I’m using buffered access logs, to avoid excessive disk writes in the read-path. Then, I’m using kernel sendfile(), which will instruct the kernel to serve the static objects directly, so that NGINX can move on. Further, I’ll allow for a long keepalive in HTTP/1.1, so that future requests can use the same TCP connection, and I’ll set the flags tcp_nodelay and tcp_nopush to just blast the data out without waiting.
Without much ado:
pim@ctlog-test:~/src/tesseract$ curl -sS ctlog-test.lab.ipng.ch/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMTfksBAMASDBGAiEAqADLH0P/SRVloF6G1ezlWG3Exf+sTzPIY5u6VjAKLqACIQCkJO2N0dZQuDHvkbnzL8Hd91oyU41bVqfD3vs5EwUouA==
TesseraCT: Loadtesting POSIX
The loadtesting is roughly the same. I start the hammer with the same 500qps of write rate, which was roughly where the S3+MySQL variant topped out. My checkpoint tracker shows the following:
pim@ctlog-test:~/src/tesseract$ T=0; O=0; while :; do \
N=$(curl -sS http://localhost/checkpoint | grep -E '^[0-9]+$'); \
if [ "$N" -eq "$O" ]; then \
echo -n .; \
else \
echo " $T seconds $((N-O)) certs"; O=$N; T=0; echo -n $N\ ;
fi; \
T=$((T+1)); sleep 1; done
59250 ......... 10 seconds 5244 certs
64494 ......... 10 seconds 5000 certs
69494 ......... 10 seconds 5000 certs
74494 ......... 10 seconds 5000 certs
79494 ......... 10 seconds 5256 certs
79494 ......... 10 seconds 5256 certs
84750 ......... 10 seconds 5244 certs
89994 ......... 10 seconds 5256 certs
95250 ......... 10 seconds 5000 certs
100250 ......... 10 seconds 5000 certs
105250 ......... 10 seconds 5000 certs
I learn two things. First, the checkpoint interval in this posix variant is 10 seconds, compared to the 5 seconds of the aws variant I tested before. I dive into the code, because there doesn’t seem to be a --checkpoint_interval flag. In the tessera library, I find DefaultCheckpointInterval, which is set to 10 seconds. I change it to be 2 seconds instead, and restart the posix binary:
238250 . 2 seconds 1000 certs
239250 . 2 seconds 1000 certs
240250 . 2 seconds 1000 certs
241250 . 2 seconds 1000 certs
242250 . 2 seconds 1000 certs
243250 . 2 seconds 1000 certs
244250 . 2 seconds 1000 certs

Very nice! Maybe I can write a few more certs? I restart the hammer at 5000/s which, somewhat to my surprise, the log ends up serving!
642608 . 2 seconds 6155 certs
648763 . 2 seconds 10256 certs
659019 . 2 seconds 9237 certs
668256 . 2 seconds 8800 certs
677056 . 2 seconds 8729 certs
685785 . 2 seconds 8237 certs
694022 . 2 seconds 7487 certs
701509 . 2 seconds 8572 certs
710081 . 2 seconds 7413 certs
The throughput is highly variable though, seemingly between 3700/sec and 5100/sec, and I quickly find out that the hammer is completely saturating the CPU on the machine, leaving very little room for the posix TesseraCT to serve. I’m going to need more machines!
So I start a hammer loadtester on the two now-idle MinIO servers, and run them at about 6000qps each, for a total of 12000 certs/sec. And my little posix binary is keeping up like a champ:
2987169 . 2 seconds 23040 certs
3010209 . 2 seconds 23040 certs
3033249 . 2 seconds 21760 certs
3055009 . 2 seconds 21504 certs
3076513 . 2 seconds 23808 certs
3100321 . 2 seconds 22528 certs
One thing is reasonably clear: the posix TesseraCT is CPU bound, not disk bound. The CPU is now running at about 18.5 CPUs/s (with 20 cores), which is pretty much all this Dell has to offer. The NetAPP enterprise solid state drives are not impressed:
pim@ctlog-test:~/src/tesseract$ zpool iostat -v ssd-vol0 10 100
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
ssd-vol0 11.4G 733G 0 3.13K 0 117M
mirror-0 11.4G 733G 0 3.13K 0 117M
wwn-0x5002538a05302930 - - 0 1.04K 0 39.1M
wwn-0x5002538a053069f0 - - 0 1.06K 0 39.1M
wwn-0x5002538a06313ed0 - - 0 1.02K 0 39.1M
-------------------------- ----- ----- ----- ----- ----- -----
pim@ctlog-test:~/src/tesseract$ zpool iostat -l ssd-vol0 10
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
ssd-vol0 14.0G 730G 0 1.48K 0 35.4M - 2ms - 535us - 1us - 3ms - 50ms
ssd-vol0 14.0G 730G 0 1.12K 0 23.0M - 1ms - 733us - 2us - 1ms - 44ms
ssd-vol0 14.1G 730G 0 1.42K 0 45.3M - 508us - 122us - 914ns - 2ms - 41ms
ssd-vol0 14.2G 730G 0 678 0 21.0M - 863us - 144us - 2us - 2ms - -
Results
OK, that kind of seals the deal for me. The write path needs about 250 certs/sec and I’m hammering now with 12'000 certs/sec, with room to spare. But what about the read path? The cool thing about the static log is that reads are all entirely done by NGINX. The only file that isn’t cacheable is the checkpoint file, which gets updated every two seconds (or ten seconds in the default tessera settings).
So I start yet another hammer whose job it is to read back from the static filesystem.
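Roughly, it is the same hammer invocation as before, now pointed at the NGINX read path; a sketch, reusing the flags from the earlier runs (the exact rates here are my choice):
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer \
--origin=ctlog-test.lab.ipng.ch/test-ecdsa \
--log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEucHtDWe9GYNicPnuGWbEX8rJg/VnDcXs8z40KdoNidBKy6/ZXw2u+NW1XAUnGpXcZozxufsgOMhijsWb25r7jw== \
--log_url=http://ctlog-test.lab.ipng.ch/ \
--write_log_url=http://localhost:6962/ctlog-test.lab.ipng.ch/test-ecdsa/ \
--max_read_ops=4000 \
--max_write_ops=0
The NGINX status endpoint gives a quick before-and-after view of how the read path is coping: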
pim@ctlog-test:~/src/tesseract$ curl localhost/nginx_status; sleep 60; curl localhost/nginx_status
Active connections: 10556
server accepts handled requests
25302 25302 1492918
Reading: 0 Writing: 1 Waiting: 10555
Active connections: 7791
server accepts handled requests
25764 25764 1727631
Reading: 0 Writing: 1 Waiting: 7790
And I can see that it’s keeping up quite nicely. In one minute, it handled (1727631-1492918) or 234713 requests, which is a cool 3911 requests/sec. All these read/write hammers are kind of saturating the ctlog-test machine though:

But after a little bit of fiddling, I can assert my conclusion:
Conclusion: a write-rate of 8'000/s alongside a read-rate of 4'000/s should be safe with POSIX
What’s Next
I am going to offer such a machine in production together with Antonis Chariton and Jeroen Massar. I plan to do a few additional things:
- Test Sunlight as well on the same hardware. It would be nice to see a comparison between write rates of the two implementations.
- Work with Al Cutter and the Transparency Dev team to close a few small gaps (like the local_signer.go and some Prometheus monitoring of the posix binary).
- Install and launch both under *.ct.ipng.ch, which in itself deserves its own report, showing how I intend to do log cycling and care/feeding, as well as report on the real production experience running these CT Logs.