Archiving Certificate Transparency Logs

Filippo Valsorda

Sep 22, 2025, 6:35:51 AM
to Certificate Transparency Policy
Hello fellow mortals,

After chatting with some of you, including Philippe, Rasmus, and Ben, I built a little prototype of what long-term archival for CT log data could look like. Google currently operates mirrors of many historical logs, but that's not a sustainable solution in perpetuity. The idea is to package logs up as Static CT tiles, and upload them to the Internet Archive.

  • vanity-mirror downloads an RFC 6962 log into the Static CT format. It expects a log.v3.json file in the current directory, and takes a mirror URL as an argument. It runs get-entries requests in parallel and rebuilds the Merkle tree, finally checking it against the STH, which it verifies and converts to a checkpoint.
  • photocamera-archiver compresses a Static CT log into a series of zip files (000.zip, 001.zip, ...), each containing a subtree of height 24 (16 Mi entries, roughly 11.4 GiB). Every archive also contains a README, the checkpoint, the log.v3.json file, issuers, level 3+ tiles, and the partial tiles on the right edge, so each archive is self-verifying.
It would have been nice to upload the Static CT log uncompressed to a single Internet Archive item, but the IA system struggles with more than a few files per item. (It took a couple days to delete ~1000 files from an initial attempt.)
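For readers less familiar with the verification step, here is a minimal sketch of the RFC 6962 Merkle tree hash that a mirroring tool has to recompute before comparing against the STH root. It is not vanity-mirror's code: the function names are made up for illustration, the inputs are assumed to be already-serialized leaf structures, and a real tool would build the tree incrementally tile by tile instead of recursing over an in-memory list.

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962: leaf hashes are domain-separated with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962: interior node hashes use a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle Tree Hash over serialized leaves (illustrative, not optimized)."""
    n = len(leaves)
    if n == 0:
        # The hash of an empty tree is the hash of the empty string.
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(leaves[0])
    # Split at k, the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))

def matches_sth(leaves: list[bytes], sth_root: bytes) -> bool:
    # Hypothetical helper: compare the rebuilt root with the STH's root hash.
    return merkle_root(leaves) == sth_root
```

The signature check on the STH itself is a separate step and is not shown here.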

A sample archive, using a small 2018 vintage DigiCert log, is at https://archive.org/details/ct_digicert_yeti2018. It clocks in at 61.2 GiB over six zip files for 90785920 entries.
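As a sanity check on those numbers, the following sketch redoes the arithmetic. It assumes that archive k covers entries [k·2^24, (k+1)·2^24), which is my reading of "a subtree of height 24" and not something taken from the tool's source, and it uses the Static CT path convention of 256-entry data tiles whose index is written as three-digit groups with an "x" prefix on all but the last. The helper names are hypothetical.

```python
ENTRIES_PER_TILE = 256            # Static CT data tiles hold 256 entries
ENTRIES_PER_ARCHIVE = 1 << 24     # subtree of height 24 = 16 Mi entries (assumed mapping)

def archive_count(tree_size: int) -> int:
    # Ceiling division: the last archive may be only partially full.
    return -(-tree_size // ENTRIES_PER_ARCHIVE)

def tile_path(index: int) -> str:
    # Encode a tile index as zero-padded 3-digit groups, "x"-prefixed except the last.
    parts = [f"{index % 1000:03d}"]
    index //= 1000
    while index:
        parts.append(f"x{index % 1000:03d}")
        index //= 1000
    return "/".join(reversed(parts))

def tile_index(path: str) -> int:
    # Inverse of tile_path: "x351/191" -> 351191.
    index = 0
    for part in path.split("/"):
        index = index * 1000 + int(part.lstrip("x"))
    return index

def archive_for_data_tile(index: int) -> str:
    # Name of the zip file that would hold the given data tile (assumed mapping).
    first_entry = index * ENTRIES_PER_TILE
    return f"{first_entry // ENTRIES_PER_ARCHIVE:03d}.zip"
```

Under these assumptions, `archive_count(90785920)` gives 6, matching the six zip files, and `archive_for_data_tile(tile_index("x351/191"))` gives `005.zip`.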

Since zip files are seekable, I am planning to make filippo.io/sunlight.Client capable of pulling entries and hashes directly from a set of zip files, without requiring decompression.
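To illustrate the seekable-zip idea (this is not the planned sunlight.Client code, just a sketch using Python's standard zipfile module), a reader can open the archive, consult the central directory, and pull out a single member without unpacking anything else. The member names in the demo are assumptions about the layout inside the archive, and the payloads are placeholders.

```python
import io
import zipfile

def read_member(archive, name: str) -> bytes:
    """Read one member from a zip archive (a path or a seekable file object).

    Only the central directory and the requested member are read; the
    other members are never decompressed.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)

def build_demo_archive() -> io.BytesIO:
    # Build a tiny in-memory archive standing in for something like 005.zip.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("checkpoint", b"placeholder checkpoint\n")
        zf.writestr("tile/data/x351/191", b"placeholder tile bytes")
    return buf

demo = build_demo_archive()
tile = read_member(demo, "tile/data/x351/191")
```

The same pattern should work against a remote file if the file object issues range requests on seek and read, which is what makes hosting the zips on the Internet Archive attractive.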

I'm looking for feedback on the strategy, format, and tools. Ideally log operators would archive their own logs, so I would especially like feedback from log operators.

Cheers,
Filippo

P.S. The tools took about an afternoon. Then I spent 3-4 days over multiple weekends debugging why the IA kept rejecting 005.zip (and only 005.zip) with "Uploaded content is unacceptable. - error checking archive file". I excruciatingly bisected the log size (since an almost empty 005.zip was acceptable, but an almost complete 005.zip was not) down to a single tile, tile/data/x351/191. When I removed that file from the full-size archive, it uploaded. Then, bizarrely, if I added it back using zip(1), it still uploaded. I have no idea wtf is going on, and I almost went mad over it. I'm hoping it's just a fluke and won't happen again with other logs, but FYI.