Performance improvements


r...@sectigo.com

Sep 28, 2023, 4:37:21 PM
to crt.sh
Hi everyone.

I'm sure some of you will have noticed that crt.sh's log entry ingestion backlog has grown rather large over the last few months.  I've been working towards addressing this problem for some time, and earlier this week we were at last able to deploy a set of performance improvements that have already made a considerable dent in the ingestion backlog.

The primary performance gains come from enabling ct_monitor to utilise multiple CPU cores when writing newly ingested certificates to the database, and from moving several other tasks (that were previously performed during the ingestion process) out of the critical path to be handled by a new cert_processor application.

cert_processor is now responsible for tracking the counts of certificate issuances and expirations that are shown in the "Populations" table on each CA page and on the cert-populations page.  In the past I've noticed slight anomalies with the issuance/expiration counts for a few CAs, but after refactoring and closely reviewing the relevant parts of the code (as part of the effort to create cert_processor) I'm now fairly confident that I've fixed any errors and/or potential race conditions that could have caused these slight anomalies.

Since the counting is now completely decoupled from the log entry ingestion, I've reset the per-CA issuance/expiration tracking for all CAs, meaning that cert_processor is now trawling through a backlog of all certs known to crt.sh in order to produce accurate counts.  Once this backlog has gone, I will remove the "Population counts are currently being recalculated" notice from the CA pages and cert-populations page.

Assuming the backlogs continue to shrink at the same rates I've observed so far, I estimate that cert_processor will catch up within 2-3 weeks and that ct_monitor will catch up within the next month or so.

You may also have noticed that since early July, page loads on https://crt.sh/ have been much faster and much less likely to return HTTP 50x errors, and that queries targeting crt.sh:5432 have also been faster.  These improvements are due to our Ops team having switched the front-end read-only database replicas to use a much more performant storage array.

r...@sectigo.com

Oct 10, 2023, 11:48:10 AM
to crt.sh
> Assuming the backlogs continue to shrink at the same rates I've observed so far, I estimate that cert_processor will catch up within 2-3 weeks and that ct_monitor will catch up within the next month or so.

One down, two to go...

Issuance counting backlog: cert_processor has caught up!
Expiration counting backlog: cert_processor is continuing to process this backlog.  Should hopefully finish within the next day or two, after which I will remove the "Population counts are currently being recalculated" notices.
Log entry ingestion backlog: ct_monitor is continuing to process this backlog, which so far has been reduced to about half the size it was at its peak.

r...@sectigo.com

Oct 17, 2023, 7:27:40 AM
to crt.sh
Two down, one to go...

Log entry ingestion backlog: This reached zero a few days ago, which was much sooner than I'd expected!  :-)
Expiration counting backlog: This is going more slowly than initial progress suggested it would.  Another week or so to go, it would appear.

r...@sectigo.com

Oct 25, 2023, 5:53:11 AM
to crt.sh
The expiration counting backlog has now gone, so I've removed the "Population counts are currently being recalculated" notices.