Dear users,
We're painfully aware that crt.sh:443 and crt.sh:5432 have been struggling for some time, as evidenced by the ongoing volume of HTTP 50x errors, connection failures, slow response times, etc. At Sectigo we keep working on a best-effort basis to keep the systems up, but sadly all of the backend servers are constantly overworked and the PostgreSQL replicas have become prone to freezing on an almost daily basis (root cause unknown).
crt.sh was last revamped in 2019. That revamp included a PostgreSQL version upgrade, a table partitioning strategy, and a new indexing strategy (Full Text Search). In my view, the primary factor behind today's performance woes is that the indexing strategy has not aged well. By adopting Full Text Search we succeeded in reducing the index sizes, but the service has been unable to sort result sets efficiently.
So, it's time to revamp crt.sh again (and I don't mean just this week's UI refresh!) :-)
Over the coming weeks/months (no ETA yet; it'll take as long as it takes), we plan to do something along these lines...
- Spin up a new primary certwatch DB (PostgreSQL 15, running on Debian 12).
- Create a revamped schema; expected highlights include:
  - Partition the "certificate" table by ID instead of notAfter timestamp, so that partition sizes are predictable and therefore easier to manage.
  - Replace the "certificate" table's Full Text Search index with a separate "certificate_identity" table. (Similar to the original crt.sh design from 2015, but with various space-saving improvements and more-targeted indexes that we're in the process of devising.)
- Copy the data from the current DB to the new DB.
- Create the indexes.
- Create new replicas, and point crt.sh:443 and crt.sh:5432 at them.
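To illustrate why partitioning by ID gives predictable partition sizes, here is a minimal sketch, not the actual crt.sh schema. It assumes certificate IDs are non-negative integers assigned in increasing order and that the parent "certificate" table is declared with PARTITION BY RANGE on its ID column. The partition width, the partition naming scheme, and both helper functions are hypothetical.

```python
# Hypothetical sketch: fixed-width ID ranges for the "certificate" table.
# ROWS_PER_PARTITION and the certificate_pN naming scheme are assumptions,
# not figures or names taken from the planned schema.

ROWS_PER_PARTITION = 500_000_000  # assumed partition width


def partition_index(cert_id: int, width: int = ROWS_PER_PARTITION) -> int:
    """Return the index of the partition that would hold cert_id."""
    if cert_id < 0:
        raise ValueError("certificate IDs are assumed to be non-negative")
    return cert_id // width


def partition_ddl(index: int, width: int = ROWS_PER_PARTITION) -> str:
    """Build a PostgreSQL range-partition statement for partition `index`."""
    lower = index * width
    upper = lower + width  # the upper bound is exclusive in PostgreSQL
    return (
        f"CREATE TABLE certificate_p{index} PARTITION OF certificate "
        f"FOR VALUES FROM ({lower}) TO ({upper});"
    )


if __name__ == "__main__":
    # Print the statements for the first three hypothetical partitions.
    for i in range(3):
        print(partition_ddl(i))
```

Because every partition covers the same number of IDs, each one holds at most that many rows, whereas a partition keyed on notAfter holds however many certificates happen to expire in that period.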
Hopefully we'll have enough disk space to do all that before we need to tear down the current databases, but it's going to be tight. The current DB is 68TB uncompressed, with 1 primary and 7 replicas. After compression and deduplication, the storage array is around 40% full.
These changes will of course impact lots of queries that users are running against crt.sh:5432. This is unavoidable, but also a necessary step towards us being able to continue operating and supporting the service.