I've been wanting to work on CT logs as a personal project for a long time, and so far even my "home" hardware has coped well. Disk space is my main constraint: I have 30 TiB of raw response data and counting, ready for import at some point.
I was looking at MariaDB, but it seems PostgreSQL can index expressions, which looks very interesting (I've never used PG before; I'm an MS SQL Server user). That could save a lot of space, given that I want to keep the original leaf verbatim as well as index most (all?) of its attributes.
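In PostgreSQL the form is `CREATE INDEX ... ON table ((expr))`, and any function used in the expression has to be marked `IMMUTABLE`. To illustrate the idea runnably, here is a minimal sketch that uses SQLite as a stand-in, since SQLite also supports indexes on expressions. Only the raw `leaf_input` is stored, and the index is built on a derived attribute (the timestamp) rather than on a separate column. The table, function and helper names are hypothetical. The parsing assumes the RFC 6962 `MerkleTreeLeaf` layout: a 1-byte version, a 1-byte leaf type, then a uint64 millisecond timestamp.

```python
import sqlite3
import struct


def leaf_timestamp(leaf_input: bytes) -> int:
    # RFC 6962 MerkleTreeLeaf: version (1 byte), leaf_type (1 byte), then a
    # TimestampedEntry whose first field is a uint64 timestamp in milliseconds.
    return struct.unpack(">Q", leaf_input[2:10])[0]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Must be registered on every connection that reads or writes the table,
    # and flagged deterministic or SQLite rejects the CREATE INDEX below.
    conn.create_function("leaf_timestamp", 1, leaf_timestamp, deterministic=True)
    conn.execute(
        "CREATE TABLE leaf (entry_index INTEGER PRIMARY KEY, leaf_input BLOB NOT NULL)"
    )
    # The index holds the derived timestamp; the table only holds the raw leaf.
    conn.execute("CREATE INDEX leaf_ts_idx ON leaf (leaf_timestamp(leaf_input))")
    return conn


def make_leaf(ts_ms: int) -> bytes:
    # Hypothetical test helper: a valid 10-byte prefix followed by filler bytes,
    # not a complete TimestampedEntry.
    return bytes([0, 0]) + struct.pack(">Q", ts_ms) + b"...rest of entry..."


conn = open_db()
for i, ts in enumerate([1_700_000_000_000, 1_700_000_500_000, 1_700_001_000_000]):
    conn.execute("INSERT INTO leaf VALUES (?, ?)", (i, make_leaf(ts)))

# The WHERE clause repeats the indexed expression exactly so the planner can use it.
matches = sorted(
    row[0]
    for row in conn.execute(
        "SELECT entry_index FROM leaf WHERE leaf_timestamp(leaf_input) >= ?",
        (1_700_000_500_000,),
    )
)
print(matches)  # [1, 2]
```

The trade-off is that the function runs on every insert, and the query has to use the same expression for the index to apply. Running `EXPLAIN QUERY PLAN` on the query should show whether `leaf_ts_idx` is actually being picked up.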
Out of interest, how much storage does crt.sh use?
(I've found the logs are very large and, of course, extra_data is highly redundant, so careful design can minimise storage cost.)
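Much of that redundancy comes from the same intermediate and root certificates appearing in the chains of millions of entries. One way to exploit this is to store each distinct chain certificate once, keyed by its SHA-256, and keep only a list of hashes per entry. The sketch below assumes the RFC 6962 `x509_entry` form of `extra_data`: a 3-byte total length followed by certificates that are each prefixed with a 3-byte length. `precert_entry` data starts with the pre-certificate before the chain, so it would need one extra step. All the names here are hypothetical.

```python
import hashlib


def encode_chain(certs: list[bytes]) -> bytes:
    # certificate_chain<0..2^24-1>: a 3-byte total length, then each
    # ASN.1Cert as a 3-byte length followed by its DER bytes.
    body = b"".join(len(c).to_bytes(3, "big") + c for c in certs)
    return len(body).to_bytes(3, "big") + body


def split_chain(extra_data: bytes) -> list[bytes]:
    total = int.from_bytes(extra_data[0:3], "big")
    if 3 + total != len(extra_data):
        raise ValueError("chain length does not match extra_data size")
    certs, pos = [], 3
    while pos < len(extra_data):
        length = int.from_bytes(extra_data[pos:pos + 3], "big")
        pos += 3
        if pos + length > len(extra_data):
            raise ValueError("certificate runs past end of chain")
        certs.append(extra_data[pos:pos + length])
        pos += length
    return certs


class ChainStore:
    """Keeps each distinct chain certificate once, keyed by SHA-256 of its DER."""

    def __init__(self) -> None:
        self.certs: dict[bytes, bytes] = {}

    def add(self, extra_data: bytes) -> list[bytes]:
        refs = []
        for der in split_chain(extra_data):
            digest = hashlib.sha256(der).digest()
            self.certs.setdefault(digest, der)
            refs.append(digest)
        return refs  # what each log entry would store instead of extra_data

    def rebuild(self, refs: list[bytes]) -> bytes:
        # Re-encoding is deterministic, so this is byte-identical to the input.
        return encode_chain([self.certs[d] for d in refs])


store = ChainStore()
root, inter_a, inter_b = b"\x30root", b"\x30inter-A", b"\x30inter-B"
entries = [
    encode_chain([inter_a, root]),
    encode_chain([inter_a, root]),
    encode_chain([inter_b, root]),
]
refs = [store.add(e) for e in entries]
print(len(store.certs))  # 3 distinct certificates across 6 chain slots
```

Because `rebuild` reproduces the original bytes exactly, this still meets the "keep it verbatim" requirement while storing each shared intermediate only once.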
As an aside, I've found massive differences in performance and behaviour between the various logs/operators, particularly in what triggers an HTTP 429 response. Cloudflare takes anything you can throw at it, whilst DigiCert is the most fragile.
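Because the 429 thresholds differ so much between operators, one generic approach is to keep backoff state separately for each log, so that a fragile log slows down without throttling the tolerant ones. The sketch below uses exponential backoff with "full jitter". The 1 s base and 300 s cap are arbitrary assumptions, not any operator's published limits. It also assumes that a `Retry-After` header, if the log sends one, is in delta-seconds form. HTTP-date values simply fall back to the computed backoff.

```python
import random
from typing import Callable, Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 300.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait after the attempt-th consecutive HTTP 429 (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # Honour an explicit delta-seconds Retry-After from the server.
        return float(int(retry_after.strip()))
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


class LogBackoff:
    """Tracks consecutive 429s independently for each log URL."""

    def __init__(self) -> None:
        self.failures: dict[str, int] = {}

    def on_429(self, log_url: str, retry_after: Optional[str] = None,
               rng: Callable[[], float] = random.random) -> float:
        n = self.failures.get(log_url, 0)
        self.failures[log_url] = n + 1
        return retry_delay(n, retry_after, rng=rng)

    def on_success(self, log_url: str) -> None:
        # A successful response resets that log's backoff.
        self.failures.pop(log_url, None)
```

For example, `LogBackoff().on_429("https://ct.example/log")` returns a random delay of under 1 s on the first 429, and the upper bound doubles on each further consecutive 429 for that URL only. The log URL here is a placeholder.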