Improving Sectigo's RFC6962 logs and introducing CF_CTile

Rob Stradling

Jul 9, 2025, 8:55:30 AM
to Certificate Transparency Policy
Sectigo's logs are now running behind Cloudflare.  To reduce the load on these logs from get-entries and (to a lesser extent) get-sth requests, we have built a caching proxy (https://github.com/sectigo/CF_CTile) that runs as a Cloudflare Snippet <https://developers.cloudflare.com/rules/snippets/>.  This has a very similar goal to Let's Encrypt's CTile <https://github.com/letsencrypt/ctile> project, but it uses Cloudflare's CDN caching instead of S3 object storage.
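
For readers who haven't looked at the repo, here's a minimal sketch of the general idea.  This is NOT the actual CF_CTile source: the tile size of 256 is borrowed from CTile, and the TTL and the cf caching options come from the Workers fetch API, which I'm assuming behaves the same way in a Snippet.

    // Sketch only: align get-entries requests to tile boundaries so that the
    // CDN caches a small, canonical set of responses.
    export default {
      async fetch(request) {
        const url = new URL(request.url);
        if (!url.pathname.endsWith("/ct/v1/get-entries")) {
          return fetch(request);  // pass everything else through to the origin
        }
        const TILE = 256;  // tile size assumed, following CTile
        const start = parseInt(url.searchParams.get("start"), 10);
        const end = parseInt(url.searchParams.get("end"), 10);
        const tileStart = Math.floor(start / TILE) * TILE;
        // Fetch one whole tile via a canonical URL; repeat requests for any
        // range within this tile become CDN cache hits.
        const tileURL = new URL(url);
        tileURL.searchParams.set("start", String(tileStart));
        tileURL.searchParams.set("end", String(tileStart + TILE - 1));
        const tileResp = await fetch(tileURL, {
          cf: { cacheEverything: true, cacheTtl: 3600 },  // illustrative TTL
        });
        if (!tileResp.ok) return tileResp;
        // Slice the cached tile down to the range the client actually asked for.
        const tile = await tileResp.json();
        const offset = start - tileStart;
        const count = Math.min(end, tileStart + TILE - 1) - start + 1;
        return Response.json({ entries: tile.entries.slice(offset, offset + count) });
      },
    };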

Our metrics show that our PostgreSQL-based logs (the Elephant and Tiger shards) are now handling their current traffic volumes well, with room for growth and to handle traffic peaks.  The rate limits for the Elephant and Tiger shards are now set as follows:
  • Per IP: no limits defined.
  • Per shard: 3,000 req/sec.
  • Per shard, for "old" submissions <https://github.com/google/certificate-transparency-go/blob/master/CHANGELOG.md#ctfe-rate-limiting-of-non-fresh-submissions>: 40 req/sec for add-(pre-)chain requests where the certificate's notBefore timestamp is older than 28 hours at time of submission.
Sadly, our MariaDB-based logs (the Mammoth and Sabre shards) are still struggling to handle their current traffic volumes reliably, even with much lower rate limits (Per shard: 400 req/sec).

Now that we have built confidence in our PostgreSQL-based logs, we feel the time has come to start making plans to deprecate our use of MariaDB.  Our plan so far has been to wind down the Sabre and Mammoth shards once the Tiger and Elephant shards have proven themselves and transitioned to Usable.  However, we're now wondering if it would be less disruptive for the ecosystem if we instead look to migrate the existing Sabre and Mammoth shard databases from MariaDB to PostgreSQL, and let these shards continue to operate as Usable logs.  If any community members have opinions on which approach to take, we're keen to hear from you.

Andrew Ayer

Jul 9, 2025, 7:37:23 PM
to Rob Stradling, 'Rob Stradling' via Certificate Transparency Policy
Thanks for the update, Rob.

On Wed, 9 Jul 2025 05:55:30 -0700 (PDT)
"'Rob Stradling' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> Sectigo's logs are now running behind Cloudflare. To reduce the load
> on these logs from get-entries and (to a lesser extent) get-sth
> requests, we have built a caching proxy
> (https://github.com/sectigo/CF_CTile) that runs as a Cloudflare
> Snippet <https://developers.cloudflare.com/rules/snippets/>. This has
> a very similar goal to Let's Encrypt's CTile
> <https://github.com/letsencrypt/ctile> project, but it uses
> Cloudflare's CDN caching instead of S3 object storage.

CF_CTile is extremely cool (only 83 lines!) and is something I had long hoped someone would write.

> Our metrics show that our PostgreSQL-based logs (the Elephant and
> Tiger shards) are now handling their current traffic volumes well,
> with room for growth and to handle traffic peaks. The rate limits
> for the Elephant and Tiger shards are now set as follows:
>
> - Per IP: no limits defined.
> - Per shard: 3,000 req/sec.
> - Per shard, for "old" submissions
> <https://github.com/google/certificate-transparency-go/blob/master/CHANGELOG.md#ctfe-rate-limiting-of-non-fresh-submissions>:
> 40 req/sec for add-(pre-)chain requests where the certificate's
> notBefore timestamp is older than 28 hours at time of submission.
>
> Sadly, our MariaDB-based logs (the Mammoth and Sabre shards) are
> still struggling to handle their current traffic volumes reliably,
> even with much lower rate limits (Per shard: 400 req/sec).

I'm surprised, since AIUI CTile solved all of Let's Encrypt's read path problems. Are you still seeing significant amounts of read traffic hitting the origin, or is it the write load that's causing the problems?

Do you have Tiered Caching <https://developers.cloudflare.com/cache/how-to/tiered-cache/> enabled in Cloudflare? It looks like that can reduce the number of requests hitting the origin.

> Now that we have built confidence in our PostgreSQL-based logs, we
> feel the time has come to start making plans to deprecate our use of
> MariaDB. Our plan so far has been to wind down the Sabre and Mammoth
> shards once the Tiger and Elephant shards have proven themselves and
> transitioned to Usable. However, we're now wondering if it would be
> less disruptive for the ecosystem if we instead look to migrate the
> existing Sabre and Mammoth shard databases from MariaDB to
> PostgreSQL, and let these shards continue to operate as Usable logs.
> If any community members have opinions on which approach to take,
> we're keen to hear from you.

It will be 70 days until Tiger is Usable in Chrome, which IMO is too long for Sabre and Mammoth to continue in this state. If migrating them to PostgreSQL is the fastest way to get their availability back to an acceptable level, then I think that's the preferable option.

Regards,
Andrew

Rob Stradling

Jul 10, 2025, 7:20:36 PM
to Certificate Transparency Policy, Andrew Ayer, 'Rob Stradling' via Certificate Transparency Policy, Rob Stradling
> CF_CTile is extremely cool (only 83 lines!) and is something I had long hoped someone would write.

Thanks!  It took me a surprisingly long time to write those 83 lines!  Several approaches didn't quite work due to Cloudflare limitations (for instance, I tried incorporating a get-sth call into the get-entries handler so that it could consider the actual tree_size of the log, but I discovered that a Snippet isn't allowed to do more than one subrequest to the origin).  And it took several frustrating weeks to eventually figure out the counterintuitive fact that the default "Browser Cache TTL" setting was enforcing a minimum CDN cache TTL for fetched tiles, which is obviously a problem for partial tiles.
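
For anyone else deploying something like this, here's the partial-tile hazard in sketch form (illustrative, not CF_CTile's actual code; assume TILE = 256 and tileResp is the fetched tile response):

    // A full tile is immutable and safe to cache for a long time, but a
    // partial tile (at the growing end of the log) will gain entries, so it
    // must expire quickly.  A Snippet can't also call get-sth to learn
    // tree_size (the one-subrequest limit again), so partialness has to be
    // inferred from the response itself:
    const entries = (await tileResp.clone().json()).entries;
    const isPartial = entries.length < TILE;
    // The gotcha: whatever short TTL you intend for partial tiles, the
    // zone-default "Browser Cache TTL" setting acts as a minimum CDN cache
    // TTL, silently stretching it.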

Talking of wonky partial tile caching, we've noticed that there's still some very occasional caching misbehaviour going on (Cloudflare bug?), and the mitigation I've implemented for that problem increases the code size to 107 lines.  ;-)

> I'm surprised, since AIUI CTile solved all of Let's Encrypt's read path problems. Are you still seeing significant amounts of read traffic hitting the origin, or is it the write load that's causing the problems?

Mammoth2025h2 is currently getting ~10x more get-entries requests than Elephant2025h2, which is presumably because lots of monitors currently have large backlogs for Mammoth2025h2.  Backlogs imply more variation in which tiles the monitors want to fetch.  For get-entries requests that receive a 200 response, our Cloudflare dashboard currently shows a cache hit rate of ~97% for Elephant2025h2 but only ~93% for Mammoth2025h2.

Due to Mammoth2025h2's stricter rate limits, only ~2-3x more get-entries requests are actually hitting Mammoth2025h2's database (compared to Elephant2025h2).  About two thirds of those requests are currently getting a 504 response, which means that the number of 200 responses for get-entries requests is currently about the same for these two shards.
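
(To make the arithmetic explicit: ~10x the request volume at a ~93% hit rate would mean roughly 10 x 7% = 0.7 request-units reaching the origin, versus 1 x 3% = 0.03 for Elephant2025h2, i.e. about 23x the origin misses; it's the stricter per-shard rate limit that caps the actual database traffic at ~2-3x.)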

These two shards are currently getting roughly the same volume of add-(pre-)chain requests.

I believe the 504s represent either request timeouts or resource exhaustion (i.e., no DB connections available), which is all wasted effort.  This implies that the rate limits for the Mammoth and Sabre shards should be lowered further.


> Do you have Tiered Caching <https://developers.cloudflare.com/cache/how-to/tiered-cache/> enabled in Cloudflare? It looks like that can reduce the number of requests hitting the origin.

Yes.  Being able to use Tiered Caching was a design goal - this is why CF_CTile uses the fetch API rather than the Cache API, since the Cache API is not compatible with Tiered Caching.
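
In code terms, the distinction looks roughly like this (a Workers-style sketch; originURL is a placeholder, and the Cache API half is shown only for contrast, based on its documented behaviour of writing to the local data centre's cache only):

    // Tiered-Cache-friendly: caching via fetch() rides Cloudflare's normal
    // CDN cache, which Tiered Cache can layer upper-tier POPs on top of.
    const resp = await fetch(originURL, {
      cf: { cacheEverything: true, cacheTtl: 3600 },  // illustrative TTL
    });

    // NOT Tiered-Cache-friendly: the Cache API writes only to the cache of
    // the local data centre, so every POP would have to populate itself from
    // the origin.  CF_CTile deliberately avoids this path.
    await caches.default.put(originURL, resp.clone());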

BTW, based solely on this report of occasional missing Age headers (which are also a hallmark of the occasional caching misbehaviour we're seeing) and a healthy dose of speculation, I'm wondering if Tiered Caching might somehow be the culprit behind the very occasional caching misbehaviour.

> It will be 70 days until Tiger is Usable in Chrome, which IMO is too long for Sabre and Mammoth to continue in this state. If migrating them to PostgreSQL is the fastest way to get their availability back to an acceptable level, then I think that's the preferable option.

Only 12 days until the Elephant shards are Usable though.  :-)

We will investigate options for database migration.  https://pgloader.readthedocs.io/en/latest/ref/mysql.html looks promising.  Holiday season will slow us down somewhat though.
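
For the curious, the basic invocation we'd be evaluating takes this form (hostnames, credentials, and database names are hypothetical):

    pgloader mysql://ctlog:SECRET@mariadb-host/mammoth2025h2 \
             postgresql://ctlog:SECRET@postgres-host/mammoth2025h2

pgloader migrates schema and data in one pass, though we'd still need to validate the migrated data against what the log implementation expects.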

Rob Stradling

Jul 11, 2025, 6:28:27 PM
to Luke Valenta, Certificate Transparency Policy, Andrew Ayer
Thanks Luke!

From: Luke Valenta <lval...@cloudflare.com>
Sent: 11 July 2025 20:26
To: Rob Stradling <r...@sectigo.com>
Cc: Certificate Transparency Policy <ct-p...@chromium.org>; Andrew Ayer <ag...@andrewayer.name>
Subject: Re: [ct-policy] Improving Sectigo's RFC6962 logs and introducing CF_CTile
 
Thanks for sharing, Rob! CF_CTile seems like something we could drop in front of our own RFC6962 logs as well. I'll escalate the Tiered Caching bug internally to make sure the right team is aware.

Best,
Luke

--
Luke Valenta
Systems Engineer - Research

Luke Valenta

Jul 14, 2025, 3:48:37 PM
to Rob Stradling, Certificate Transparency Policy, Andrew Ayer
Thanks for sharing, Rob! CF_CTile seems like something we could drop in front of our own RFC6962 logs as well. I'll escalate the Tiered Caching bug internally to make sure the right team is aware.

Best,
Luke


Andrew Ayer

Jul 31, 2025, 10:47:55 AM
to Rob Stradling, ct-p...@chromium.org
I've received positive feedback from certspotter users[1] that the performance of Sectigo's logs has improved enough to effectively monitor the logs again from a single source IP address.

However, there's one troubling phenomenon that I think warrants investigation: from time to time, the get-entries endpoint will return a 500 Internal Server Error with the message "Worker threw exception". This error persists for many hours, impeding monitoring and allowing a large backlog to accumulate. It seems to be specific to a Cloudflare POP, as trying from a different geographic location succeeds.

Since 2025-07-31 01:05:43 UTC, I've been seeing this with the URL https://dumbo.ctlabs.sectigo.com/ct/v1/get-entries?start=715088640&end=715088895 when fetched from 2600:1f18:2469:d900:b53:ceaa:9bd:6fb6. The full HTTP response is at https://gist.github.com/AGWA/f4128c711daea60ec4f3890e16522d44

Regards,
Andrew

[1] https://github.com/SSLMate/certspotter/issues/108

Rob Stradling

Aug 4, 2025, 10:18:37 AM
to Certificate Transparency Policy, Andrew Ayer, ct-p...@chromium.org, Rob Stradling
Hi Andrew.  Thanks for this report.

Cloudflare Support has advised us that "At present, Cloudflare does not provide detailed logging or tracing for JavaScript exceptions that occur within Snippets in production environments. Unfortunately, there is no built-in mechanism to obtain full exception details or stack traces from deployed Snippets via the Cloudflare dashboard or logs."

So in order to investigate this further, I've added an exception handler to Dumbo's deployment of CF_CTile (see https://github.com/sectigo/CF_CTile/pull/new/catch_exceptions).  When a JavaScript exception occurs, Dumbo should now emit an HTTP 500 response with details of the exception in the response body.
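
The shape of the change is roughly this (a sketch; handleRequest is a stand-in name for the existing CF_CTile logic, not an identifier from the repo):

    export default {
      async fetch(request) {
        try {
          return await handleRequest(request);  // existing CF_CTile logic
        } catch (e) {
          // Snippets offer no production exception logging, so surface the
          // details to the client instead.
          return new Response("CF_CTile exception: " + e.name + ": " + e.message +
                              "\n" + (e.stack || ""), { status: 500 });
        }
      },
    };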

I've not been able to reproduce this issue myself, so please send me the exception details when you see them.  Thanks!

Rob Stradling

Aug 4, 2025, 10:37:47 AM
to Certificate Transparency Policy, Rob Stradling, Andrew Ayer, ct-p...@chromium.org
> So in order to investigate this further, I've added an exception handler to Dumbo's deployment of CF_CTile (see https://github.com/sectigo/CF_CTile/pull/new/catch_exceptions).  When a JavaScript exception occurs, Dumbo should now emit an HTTP 500 response with details of the exception in the response body.

This is now deployed for all of the Sabre, Mammoth, Tiger, and Elephant logs too.

Rob Stradling

Aug 11, 2025, 7:02:26 AM
to Certificate Transparency Policy, Rob Stradling, Andrew Ayer, ct-p...@chromium.org
> > However, there's one troubling phenomenon that I think warrants investigation: from time to time, the get-entries endpoint will return a 500 Internal Server Error with the message "Worker threw exception". This error persists for many hours, impeding monitoring and allowing a large backlog to accumulate. It seems to be specific to a Cloudflare POP, as trying from a different geographic location succeeds.
>
> I've added an exception handler to Dumbo's deployment of CF_CTile (see https://github.com/sectigo/CF_CTile/pull/new/catch_exceptions).  When a JavaScript exception occurs, Dumbo should now emit an HTTP 500 response with details of the exception in the response body.

Andrew was able to capture and report the error message (see https://github.com/sectigo/CF_CTile/issues/4), which then enabled me to implement an effective workaround.

Rob Stradling

Aug 19, 2025, 7:04:09 AM
to Certificate Transparency Policy, Rob Stradling, Andrew Ayer, 'Rob Stradling' via Certificate Transparency Policy
> We will investigate options for database migration.  https://pgloader.readthedocs.io/en/latest/ref/mysql.html looks promising.  Holiday season will slow us down somewhat though.

We've done some experimentation, but unfortunately we've had to conclude that it's simply not viable for us to migrate the Mammoth and Sabre databases to PostgreSQL in any reasonable timeframe.

So today we've asked the Chrome and Apple CT log programs to move all of the Sabre and Mammoth shards out of the "Usable" state to either "ReadOnly" or "Retired" as they see fit.  Unless directed otherwise by either of those CT log programs, we will stop accepting submissions to the Sabre and Mammoth shards at approximately 2025-08-25 15:00 UTC.  We will continue to operate the read paths for these logs at least until each one's expiry window ends.

Andrew Ayer

Aug 19, 2025, 9:41:12 AM
to Rob Stradling, Certificate Transparency Policy
On Tue, 19 Aug 2025 04:04:09 -0700 (PDT)
Rob Stradling <r...@sectigo.com> wrote:

> > We will investigate options for database migration.
> > https://pgloader.readthedocs.io/en/latest/ref/mysql.html looks
> > promising. Holiday season will slow us down somewhat though.
>
> We've done some experimentation, but unfortunately we've had to
> conclude that it's simply not viable for us to migrate the Mammoth
> and Sabre databases to PostgreSQL in any reasonable timeframe.
>
> So today we've asked the Chrome and Apple CT log programs to move all
> of the Sabre and Mammoth shards out of the "Usable" state to either
> "ReadOnly" or "Retired" as they see fit. Unless directed otherwise
> by either of those CT log programs, we will stop accepting
> submissions to the Sabre and Mammoth shards at approximately
> 2025-08-25 15:00 UTC. We will continue to operate the read paths for
> these logs at least until each one's expiry window ends.

I think it would be preferable to wait until Elephant and Tiger are Usable in Chrome and Apple if possible, to avoid any reduction in logging capacity. FWIW, I haven't experienced, or received reports of, any trouble monitoring Sabre or Mammoth since the latest CF_CTile fix, so as a monitor I would have no complaints if these logs continued to accept new certificates.

In any case, _thank you_ for continuing to operate the read paths until their expiry windows end, as this will allow SCTs to continue to count as qualified-at-time-of-check.

Regards,
Andrew

Joe DeBlasio

Aug 19, 2025, 6:22:21 PM
to Andrew Ayer, Rob Stradling, Certificate Transparency Policy
Thanks for the update, Rob.
 
> Unless directed otherwise
> by either of those CT log programs, we will stop accepting
> submissions to the Sabre and Mammoth shards at approximately
> 2025-08-25 15:00 UTC.  We will continue to operate the read paths for
> these logs at least until each one's expiry window ends.

> I think it would be preferable to wait until Elephant and Tiger are Usable in Chrome and Apple if possible, to avoid any reduction in logging capacity.

I agree with this. If possible, it would be great if Sectigo was willing to continue to accept submissions (on at least Sabre* or Mammoth*) until Tiger shards are usable.

It'd be great if that request volume had somewhere to go without testing other operators' capacity. Elephant shards are already Usable everywhere, but it's a lot to ask of those shards to handle both Sabre's and Mammoth's load. Tiger shards move to Usable in Chrome on September 17th (they're already Usable in Apple's list).

Would you be willing to continue to accept entries until September 18th? (This isn't a directive from the Chrome CT program, though, and if there are countervailing forces, that'd be good to know.)
 
> In any case, _thank you_ for continuing to operate the read paths until their expiry windows end, as this will allow SCTs to continue to count as qualified-at-time-of-check.

Indeed, thank you very much for keeping the logs' data available until their contents expire. I appreciate the CT citizenship very much.

In order for the SCTs to count as qualified-at-time-of-check, we'd need to transition the logs to ReadOnly. Internally, we have very mixed feelings about the ReadOnly state and regularly debate whether we should deprecate the state entirely, but this does seem like basically the ideal use case for it, so we'll transition the Sabre/Mammoth shards to ReadOnly aligned with when the logs stop accepting submissions.

Joe 

Rob Stradling

Aug 20, 2025, 4:56:26 AM
to Certificate Transparency Policy, Joe DeBlasio, Rob Stradling, Certificate Transparency Policy, Andrew Ayer
Andrew, Joe,

Thanks for that feedback.  Yes, we're willing to continue to accept submissions to Sabre* and Mammoth* until the Tiger shards are Usable.

> Unless directed otherwise by either of those CT log programs, we will stop accepting submissions to the Sabre and Mammoth shards at approximately 2025-08-25 15:00 UTC.

Update: We'll stop accepting submissions to the Sabre and Mammoth shards at approximately 2025-09-18 15:00 UTC.

Andrew Ayer

Aug 20, 2025, 1:03:51 PM
to Joe DeBlasio, Certificate Transparency Policy
On Tue, 19 Aug 2025 15:22:00 -0700
Joe DeBlasio <jdeb...@chromium.org> wrote:

> Tiger shards move to Usable in Chrome on September 17th (they're already Usable in Apple's list).

This is odd - https://valid.apple.com/ct/log_list/current_log_list.json currently shows "qualified" for all Tiger and Elephant shards, albeit with a transition date well over 70 days ago. Are you actually seeing "usable" or are you going based on the date?

> In order for the SCTs to count towards the qualified-at-time-of-check, we'd need to transition the logs to ReadOnly. Internally, we have very mixed feelings about the ReadOnly state and regularly debate whether we should deprecate the state entirely, but this does seem like basically the ideal use case for it, so we'll transition the Sabre/Mammoth shards to ReadOnly aligned with when the logs stop accepting submissions.

I think it would be a mistake to deprecate ReadOnly - there seems to be a clear need to be able to gracefully wind down logs in this manner.

Regards,
Andrew

Rob Stradling

Aug 20, 2025, 1:12:29 PM
to Certificate Transparency Policy, Andrew Ayer, Certificate Transparency Policy, Joe DeBlasio
> > Tiger shards move to Usable in Chrome on September 17th (they're already Usable in Apple's list).
>
> This is odd - https://valid.apple.com/ct/log_list/current_log_list.json currently shows "qualified" for all Tiger and Elephant shards, albeit with a transition date well over 70 days ago. Are you actually seeing "usable" or are you going based on the date?

I quizzed Apple yesterday regarding why Elephant and Tiger are still listed as "Qualified" in Apple's CT log list, despite having been in that state for a lot longer than 70 days.

Apple replied:
"They should move to Usable in the next Log List update, which should be published this week."

Joe DeBlasio

Aug 20, 2025, 2:30:48 PM
to Andrew Ayer, Certificate Transparency Policy
> This is odd - https://valid.apple.com/ct/log_list/current_log_list.json currently shows "qualified" for all Tiger and Elephant shards, albeit with a transition date well over 70 days ago.  Are you actually seeing "usable" or are you going based on the date?

I did a cursory glance at their log list, saw that they qualified Tiger* at the same time Elephant* was qualified, and responded without thinking too hard about it. That was an oversight -- apologies.

(Hopefully the point still stands. I don't know how Apple does this, but for Chrome's log list, the Usable state is purely a signal to external consumers in the ecosystem that at least 70 days have elapsed since the log's qualification. There is no technical distinction in Chrome between Qualified and Usable, and any log in our list that has been Qualified for longer than 70 days is technically safe to use, even when the "Usable" state hasn't been added yet. Obviously certificate submitters should still probably wait until the designation is made, for simplicity and safety, but still.)
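
Concretely, it's just date arithmetic (the qualification timestamp below is hypothetical):

    // A log qualified at time T is technically safe to use once 70 days have
    // elapsed, even before the list is updated to say "usable".
    const qualifiedAt = Date.parse("2025-05-14T00:00:00Z");  // hypothetical
    const SEVENTY_DAYS = 70 * 24 * 60 * 60 * 1000;
    const technicallySafe = Date.now() - qualifiedAt >= SEVENTY_DAYS;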

Joe