Hello fellow mortals,
After chatting with some of you, including Philippe, Rasmus, and Ben, I built a little prototype of what long-term archival for CT log data could look like. Google currently operates mirrors of many historical logs, but that's not a sustainable solution in perpetuity. The idea is to package logs up as Static CT tiles, and upload them to the Internet Archive.
- vanity-mirror downloads an RFC 6962 log into the Static CT format. It expects a log.v3.json file in the current directory, and takes a mirror URL as an argument. It issues get-entries requests in parallel and rebuilds the Merkle tree, finally checking it against the STH, which it verifies and converts to a checkpoint.
- photocamera-archiver compresses a Static CT log into a series of zip files (000.zip, 001.zip, ...), each containing a subtree of height 24 (16 Mi entries, roughly 11.4 GiB). Every archive also contains a README, the checkpoint, the log.v3.json file, issuers, level 3+ tiles, and the partial tiles on the right edge, so each archive is self-verifying.
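To make the archive arithmetic concrete, here is a minimal Go sketch (not code from either tool). It assumes that archive N covers entries [N·2^24, (N+1)·2^24), and that data tiles hold 256 entries and use the tlog-tiles path encoding, as in Static CT. The function names archiveName and tilePath are made up for illustration.

```go
package main

import "fmt"

const (
	tileWidth     = 256 // entries per full data tile in Static CT
	archiveHeight = 24  // subtree height per zip file, as described above
)

// archiveName returns the zip file that would hold the given entry index.
// Assumption: archives are numbered by entry index >> 24, three digits wide.
func archiveName(entry uint64) string {
	return fmt.Sprintf("%03d.zip", entry>>archiveHeight)
}

// tilePath encodes data tile index n in the tlog-tiles style: three-digit
// groups, with every group except the last prefixed by "x".
func tilePath(n uint64) string {
	s := fmt.Sprintf("%03d", n%1000)
	for n >= 1000 {
		n /= 1000
		s = fmt.Sprintf("x%03d/%s", n%1000, s)
	}
	return "tile/data/" + s
}

func main() {
	const tile = 351191 // data tile index for tile/data/x351/191
	firstEntry := uint64(tile) * tileWidth
	// prints "tile/data/x351/191 -> 005.zip"
	fmt.Println(tilePath(tile), "->", archiveName(firstEntry))
}
```

Under these assumptions each archive holds 2^24 / 256 = 65,536 full data tiles, and tile/data/x351/191 falls inside 005.zip.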
It would have been nice to upload the Static CT log uncompressed to a single Internet Archive item, but the IA system struggles with more than a few files per item. (It took a couple days to delete ~1000 files from an initial attempt.)
Since zip files are seekable, I am planning to make filippo.io/sunlight.Client capable of pulling entries and hashes directly from a set of zip files, without requiring decompression.
I'm looking for feedback on the strategy, format, and tools. Ideally log operators would archive their own logs, so I would especially like feedback from log operators.
Cheers,
Filippo
P.S. The tools took about an afternoon. Then I spent 3-4 days over multiple weekends debugging why the IA kept rejecting 005.zip (and only 005.zip) with "Uploaded content is unacceptable. - error checking archive file". I excruciatingly bisected the log size (since an almost empty 005.zip was acceptable, but an almost complete 005.zip was not) down to a single tile, tile/data/x351/191. When I removed that file from the full-size archive, it uploaded. Then, bizarrely, if I added it back using zip(1), it still uploaded. I have no idea wtf is going on, and I almost went mad over it. I'm hoping it's just a fluke and won't happen again with other logs, but FYI.