That said, having to unpack the .zips and then serve the files can be problematic, given the sheer sizes involved.
No doubt your plan was to add an HTTP server component to serve the archives at some point, but since one wasn't yet available, and to meet my own needs, I've created a project that serves the content of the zip file collections over HTTP directly, without extraction.
It's still very much under development, so if anyone is interested in using it, feedback and issues for any bugs or problems encountered would be very welcome.
Hopefully, by providing an HTTP interface into the zips that is compatible with existing CT clients, it will also encourage/allow people to keep seeding, particularly for archives that are too large for the IA to host, as you mentioned on another thread recently.
Rationale:
1. No duplicate storage required: no need to unzip 10+ TB of zip files only to end up with ~25+ TB (guesstimate) of disk utilisation
2. Serves static tiled log archives via the HTTP interface that existing CT client code already uses with large log sets, *without* having to add and maintain code for extracting tiles etc. from zip files
3. Helps enable offline/isolated development and testing of client code against multiple large logs with minimal storage requirements
The PoC in the GitHub repo:
1. Automatically serves the log archives via HTTP after discovering matching folder names (ct_\* by default, as named in the torrents), each containing at minimum a 000.zip with a valid structure and retrievable checkpoint and log.v3.json files inside
2. Automatically generates and publishes /logs.v3.json describing all the valid logs found so far, and periodically discovers and adds newly downloaded logs once a valid 000.zip is available
3. Single Go binary in a minimal container
4. Prometheus metrics
5. Optional example headless qBittorrent configuration and container (refer to compose-all.yml), pre-configured with the URL to torrents.rss and auto-download rules to download everything
There's no TLS/HTTPS support in the Go binary: it acts purely as a plaintext HTTP zip-content server, with no rate limiting or other extras, focusing solely on serving the files from within the zips as fast as possible.
My thinking is that anyone who wants HTTPS, caching, rate limiting, successful-request logging and so on would put a reverse proxy in front of it to meet whatever their needs actually are.
There's one customisation worth noting.
Given that some of the archives lack /issuers/ files, the generated /monitor.json includes an extra per-log boolean field simply named "has_issuers". It defaults to false and is set to true if ct-archive-serve detects at least one file in 000.zip whose path starts with "/issuers/".
I was running into issues with my client repeatedly trying to retrieve CA certs from under /issuers/, to the point that it was really hurting performance.