Cloudflare Raio and Nimbus2027 log shards


Luke Valenta

Jul 30, 2025, 4:17:27 PM
to Certificate Transparency Policy
Hi folks,

We've launched the following new log shards and submitted them for inclusion to the Chrome and Apple CT programs. More details for these logs are available at https://issues.chromium.org/issues/434895698.

rfc6962:
- nimbus2027

static-ct-api:
- raio2025h2a
- raio2026h1a
- raio2026h2a
- raio2027h1a
- raio2027h2a

The static-ct-api-compatible Raio ('ray-o') logs are deployed on Cloudflare Workers using our open-source Azul implementation. We welcome feedback as GitHub issues or you can reach us at ct-...@cloudflare.com.

We also plan to spin down the cftest2025h1 and cftest2025h2 test logs in the coming weeks.

Best,
Luke

--
Luke Valenta
Systems Engineer - Research

Luke Valenta

Jul 31, 2025, 9:58:41 AM
to Certificate Transparency Policy
Hi folks,

The Chrome team identified an integrity violation (missing tiles) in the raio2025h2a log shard (https://issues.chromium.org/issues/434895698#comment8), so we have withdrawn the inclusion request for all Raio log shards. We'll give a full report when we understand the root cause.

Best,
Luke

Joe DeBlasio

Jul 31, 2025, 11:16:24 AM
to Luke Valenta, Certificate Transparency Policy
As much as I'd love to claim we noticed, all credit goes to Andrew Ayer for that discovery :-)

--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/CAAUDTJgUYYURscKFxDFkz-BAQO4sw-WJjTMwqJEjzvA9%2BzbVJw%40mail.gmail.com.

Luke Valenta

Jul 31, 2025, 11:26:32 AM
to Joe DeBlasio, Certificate Transparency Policy
Thanks Andrew! Our CT monitor also detected and logged the issue, but just didn't issue alerts (we'll get this fixed as well).

-Luke

Luke Valenta

Aug 6, 2025, 12:35:29 PM
to Joe DeBlasio, Certificate Transparency Policy

Hi folks,


We've identified and fixed the issue in the Azul-based Raio static-ct-api logs that caused missing tiles.


A recent refactor (f10355f) introduced the bug by changing how the log recovered from 'fatal' sequencing errors (e.g., failing to write tiles to object storage) that require reloading the log to get back into a good state. After the refactor, the log attempted to immediately reload state after a sequencing failure, but critically did not correctly handle the error if reloading failed as well. This allowed the log to continue in a bad state despite the missing tiles. The fix (d38ccac) is to force the log to reload before the next sequencing operation so that the recovery mechanisms kick in.
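
A minimal sketch of the recovery pattern described above, written in Python rather than taken from Azul. The `Log` class, `StorageError`, and the `load_state` / `write_tiles` callables are hypothetical names used only for illustration. The point it tries to show is that the "needs reload" flag is only cleared after a reload succeeds, so a second transient failure cannot leave the log sequencing on top of bad state.

```python
class StorageError(Exception):
    """Stand-in for a transient object-storage failure (hypothetical)."""


class Log:
    """Toy model of a log whose in-memory state must match storage."""

    def __init__(self, load_state, write_tiles):
        self._load_state = load_state    # hypothetical: returns persisted tree size
        self._write_tiles = write_tiles  # hypothetical: persists tiles, may raise
        self._needs_reload = True        # always load state before first use
        self.tree_size = 0

    def sequence(self, entries):
        # Reload before sequencing whenever the previous attempt failed.
        # If the reload itself fails, the exception propagates, the flag
        # stays set, and nothing is sequenced on top of stale state.
        if self._needs_reload:
            self.tree_size = self._load_state()
            self._needs_reload = False
        try:
            self._write_tiles(self.tree_size, entries)
        except StorageError:
            self._needs_reload = True
            raise
        self.tree_size += len(entries)
```

With this shape, a failed write followed by a failed reload simply leaves the log refusing to sequence until a reload succeeds.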


Thus, the bug could be triggered by two transient failures when writing to or reading from object storage. The regression quickly surfaced for the raio2025h2a log shard at 2025-07-30T10:24:00Z (approximately 18 hours after submitting the inclusion request to the Chrome and Apple CT programs), resulting in missing tiles and an irrecoverable integrity violation. We subsequently withdrew the inclusion requests.


Timeline:


[2025-07-21T13:01Z] Bug introduced in commit f10355f.

[2025-07-28T16:17Z] Raio log shards deployed.

[2025-07-28T17:42Z] Start monitoring Raio log shards with CT monitor.

[2025-07-29T13:40Z] Start adding artificial load on test logs with entries from equivalent Tuscolo logs.

[2025-07-29T16:37Z] Submit Raio log shards for inclusion in Chrome and Apple CT programs.

[2025-07-30T10:24Z] Bug triggered for raio2025h2a log shard.

[2025-07-30T11:11Z] Bug detected by Cloudflare ct-monitor as a 404 error message (no alerts fired).

[2025-07-31T13:14Z] Missing tiles reported by Google.

[2025-07-31T13:48Z] Inclusion requests withdrawn.

[2025-08-05T12:13Z] Bug fix merged in commit d38ccac.

[2025-08-05T14:07Z] Bug fix deployed for all Raio log shards. raio2025h2b launched to replace raio2025h2a.
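
For readers who want the intervals rather than the raw timestamps, the following small sketch computes them from the timeline entries above. The helper names `ts` and `minutes_between` are illustrative only; the timestamps are copied from the timeline.

```python
from datetime import datetime, timezone


def ts(stamp: str) -> datetime:
    """Parse a timeline stamp such as 2025-07-30T10:24Z as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two timeline stamps."""
    return int((ts(end) - ts(start)).total_seconds() // 60)


intervals = {
    "inclusion request to bug triggered": minutes_between("2025-07-29T16:37Z", "2025-07-30T10:24Z"),
    "bug triggered to ct-monitor 404": minutes_between("2025-07-30T10:24Z", "2025-07-30T11:11Z"),
    "ct-monitor 404 to external report": minutes_between("2025-07-30T11:11Z", "2025-07-31T13:14Z"),
    "external report to withdrawal": minutes_between("2025-07-31T13:14Z", "2025-07-31T13:48Z"),
    "bug triggered to fix deployed": minutes_between("2025-07-30T10:24Z", "2025-08-05T14:07Z"),
}

for label, mins in intervals.items():
    print(f"{label}: {mins // 60}h {mins % 60:02d}m")
```

The first interval comes out to 17h 47m, which matches the "approximately 18 hours" figure, and the gap between the monitor's 404 and the external report is 26h 03m.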


As the fix is now in place, we will resume the inclusion requests for the Raio log shards, with raio2025h2b replacing raio2025h2a, as that was the only log shard for which the bug triggered.


What went poorly:

- Existing tests were insufficient to catch the regression in the log's safe recovery mechanisms. The library itself has tests for safe recovery, but the bug was in the application code calling the library. Adding end-to-end tests of the full application with built-in network fault injection could have helped to surface the issue sooner.

- The commit that introduced the bug was part of a larger refactoring, which contributed to allowing the bug to slip through manual review.

- Our own CT monitor detected the issue as it was unable to retrieve the missing tiles, but it did not fire alerts. Thus, we only learned about the bug from external reports.
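
One way the alerting gap in the last point could be closed is sketched below. This is an assumption about how a monitor might classify fetch results, not a description of Cloudflare's ct-monitor. The function names and the three outcome labels are hypothetical. It relies on the static-ct-api layout of 256 entries per full tile, and on the idea that a full tile already covered by a published checkpoint should never return 404.

```python
TILE_WIDTH = 256  # entries per full tile in the static-ct-api layout


def full_data_tile_covered(tile_index: int, tree_size: int) -> bool:
    """True if the full data tile at tile_index lies entirely within tree_size."""
    return (tile_index + 1) * TILE_WIDTH <= tree_size


def classify_fetch(status: int, covered: bool) -> str:
    """Map a tile fetch result to 'ok', 'retry', or 'alert' (hypothetical labels)."""
    if status == 200:
        return "ok"
    if status == 404 and covered:
        # The checkpoint commits to this tile, so its absence is a
        # potential integrity problem rather than a transient error.
        return "alert"
    return "retry"
```

For example, `classify_fetch(404, full_data_tile_covered(0, 256))` yields `"alert"`, while a 404 for a tile beyond the current tree size is only retried.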


What went well:

- The bug was reported to us quickly thanks to the diligence of Andrew Ayer and the Chrome team.

- The artificial load we've been putting on the log (cross-pollinating entries from the equivalent Tuscolo log shard -- thanks Filippo!) helped to surface the bug while the impact was low.


Going forward, we’ll work towards addressing the shortcomings mentioned above so that bugs are less likely to occur and we are quicker to detect and react to operational issues.


Best,

Luke



Andrew Ayer

Aug 20, 2025, 12:53:55 PM
to Luke Valenta, ct-p...@chromium.org
Hi Luke,

I'm noticing some noncompliant behavior with the chains being served by Raio.

For example, Raio 2025h2b entry 141066818's chain terminates with this certificate: https://raio2025h2b.ct.cloudflare.com/issuer/76b27b80a58027dc3cf1da68dac17010ed93997d0b603e2fadbe85012493b5a7

However, this certificate does not appear in the get-roots response. A certificate with the same key and subject (C = US, O = Google Trust Services LLC, CN = GTS Root R4) does appear, but I don't believe that's compliant - RFC6962 says that the last certificate in the chain must be a root *certificate* accepted by the log, rather than a trust anchor or CA or similar language.
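
The distinction drawn here (same certificate versus same key and subject) can be checked mechanically. The sketch below assumes the get-roots response has already been fetched and parsed as JSON with a "certificates" array of base64-encoded DER certificates, and that issuer fingerprints are the lowercase hex SHA-256 of the DER encoding. The function names are illustrative.

```python
import base64
import hashlib


def fingerprint(der: bytes) -> str:
    """Lowercase hex SHA-256 of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def accepted_root_fingerprints(get_roots_response: dict) -> set[str]:
    """Fingerprints of every certificate listed in a parsed get-roots response."""
    return {
        fingerprint(base64.b64decode(cert_b64))
        for cert_b64 in get_roots_response["certificates"]
    }


def chain_ends_in_accepted_root(chain_fingerprints: list[str], roots: set[str]) -> bool:
    """True only if the last chain element is byte-for-byte an accepted root."""
    return bool(chain_fingerprints) and chain_fingerprints[-1] in roots
```

A certificate that shares a key and subject with an accepted root but differs in any byte has a different fingerprint, so it fails this check.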

This should be fixed for future entries, but I tentatively don't think it's necessary to restart the logs or fix existing entries - the entry above is still attributable to a CA with its existing chain, and we can just pretend that the last certificate was returned by get-roots at the time the entry was logged.

Regards,
Andrew

Luke Valenta

Aug 20, 2025, 1:35:32 PM
to Andrew Ayer, ct-p...@chromium.org
Hi Andrew,

Thanks for the report! We'll investigate and report back soon.

Best,
Luke

Luke Valenta

Aug 20, 2025, 5:11:32 PM
to Andrew Ayer, ct-p...@chromium.org
Hi folks,

We've investigated the reported issue and have a patch ready at https://github.com/cloudflare/azul/pull/92.

The tl;dr is that we were only storing the chain hashes that were submitted in the add-[pre-]chain request, without correctly appending the log-accepted root certificate. This means that for entries submitted without the complete chain, the corresponding data tiles will not have a complete chain to a root served by get-roots (or even one with a matching key and subject).
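
A simplified sketch of the intended behavior, not the code from the linked pull request. Certificates are modeled as a hypothetical `Cert` record with plain-string subject and issuer fields, and issuer lookup is by name only. Real code would parse X.509 and verify signatures. It illustrates appending the accepted root when the submitted chain stops short of one.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy certificate: names are plain strings, der is the raw encoding."""
    subject: str
    issuer: str
    der: bytes


def fingerprint(cert: Cert) -> str:
    return hashlib.sha256(cert.der).hexdigest()


def stored_chain_fingerprints(submitted: list[Cert], roots: list[Cert]) -> list[str]:
    """Issuer fingerprints to store for an entry, always ending in an accepted root.

    submitted is the chain from the request, leaf first.
    """
    if not submitted:
        raise ValueError("empty chain")
    root_fps = {fingerprint(r) for r in roots}
    chain = list(submitted)
    if fingerprint(chain[-1]) not in root_fps:
        # The submitter omitted the root: append the accepted root that
        # issued the last submitted certificate (matched by name here).
        root = next((r for r in roots if r.subject == chain[-1].issuer), None)
        if root is None:
            raise ValueError("chain does not lead to an accepted root")
        chain.append(root)
    return [fingerprint(c) for c in chain[1:]]  # the leaf itself is not an issuer
```

Submitting leaf plus intermediate, or leaf plus intermediate plus root, then produces the same stored chain.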

The good news is that this is a recoverable issue as the "certificate_chain" field of data tile entries is not authenticated in the tree. Pending further guidance from CT log programs and the CT community, our plan will be to deploy the bug fix and then (carefully!) rewrite existing data tiles for all Raio logs to include the complete certificate chains.
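
To illustrate why rewriting the chain does not disturb the tree, here is a toy model. It assumes, as stated above, that the Merkle leaf hash covers only the leaf bytes and not the chain fingerprints. The leaf hash follows RFC 6962 (SHA-256 over a 0x00 prefix plus the leaf). The `TileEntry` record is a hypothetical simplification of a data tile entry.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TileEntry:
    """Toy data tile entry: authenticated leaf bytes plus chain fingerprints."""
    merkle_tree_leaf: bytes
    certificate_chain: tuple[str, ...]


def leaf_hash(entry: TileEntry) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || leaf). The chain is not an input."""
    return hashlib.sha256(b"\x00" + entry.merkle_tree_leaf).digest()


def repair(entry: TileEntry, root_fingerprint: str) -> TileEntry:
    """Append the accepted root's fingerprint if the chain does not end with it."""
    if entry.certificate_chain and entry.certificate_chain[-1] == root_fingerprint:
        return entry
    return replace(entry, certificate_chain=entry.certificate_chain + (root_fingerprint,))
```

Because `leaf_hash` never reads `certificate_chain`, a repaired entry hashes to the same leaf, so existing checkpoints and inclusion proofs remain valid.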

Thanks again, Andrew, for the vigilance!

Best,
Luke


Andrew Ayer

Aug 20, 2025, 5:45:32 PM
to Luke Valenta, 'Luke Valenta' via Certificate Transparency Policy
Thanks for the speedy investigation and fix!

Sounds like the impact was worse than I thought, if the chain could end in a cert that doesn't even have a matching key and subject. Given that, I think it would be bad to leave the old entries as-is. Your plan sounds good to me.

Regards,
Andrew


Joe DeBlasio

Aug 22, 2025, 7:16:27 PM
to Andrew Ayer, Luke Valenta, 'Luke Valenta' via Certificate Transparency Policy
Thanks for the investigation, and the remediation plan, Luke. Fixing the old entries such that the log is then indistinguishable from a log that never had an issue sounds good to us.

(good luck🤞!)

Joe

Andrew Ayer

Aug 25, 2025, 12:15:03 PM
to Luke Valenta, 'Luke Valenta' via Certificate Transparency Policy
Hi Luke,

Unfortunately, the patch for the chain issue is incomplete. While the certificate_chain array now includes the root's fingerprint, the root is not being stored under /issuer. So now the logs contain entries (e.g. 1160972 in Raio 2026h1a) which reference non-existent issuers.
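
A monitor-side check for this kind of dangling reference might look like the sketch below. The `fetch_issuer` callable is hypothetical: it stands for whatever performs the GET against the log's /issuer/ path and returns the DER bytes, or `None` on a 404. The check also confirms that the returned bytes hash to the requested fingerprint.

```python
import hashlib
from typing import Callable, Iterable, Optional


def missing_issuers(
    chain_fingerprints: Iterable[str],
    fetch_issuer: Callable[[str], Optional[bytes]],
) -> list[str]:
    """Return fingerprints that the log references but does not correctly serve."""
    missing = []
    for fp in chain_fingerprints:
        der = fetch_issuer(fp)
        if der is None or hashlib.sha256(der).hexdigest() != fp:
            missing.append(fp)
    return missing
```

Passing a dictionary's `get` method as `fetch_issuer` is enough to exercise it against canned data.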

The broken references are preventing my monitor from making progress and causing large backlogs to accumulate. It would be great if certificate submission could be disabled until this is fixed.

Regards,
Andrew

Luke Valenta

Aug 25, 2025, 1:03:50 PM
to Andrew Ayer, 'Luke Valenta' via Certificate Transparency Policy
Thanks Andrew--we should have a patch for this deployed shortly, and will make sure that the missing issuers are uploaded. If for any reason we don't have the patch out this afternoon, we'll temporarily disable submissions.

Best,
Luke

Luke Valenta

Aug 25, 2025, 2:43:20 PM
to Andrew Ayer, 'Luke Valenta' via Certificate Transparency Policy
Hi Andrew (this time without dropping ct-policy from cc),

We've deployed the fix (https://github.com/cloudflare/azul/pull/93) and uploaded missing roots to /issuers. Would you be able to confirm if you're able to again crawl the logs?

Thanks,
Luke

Andrew Ayer

Aug 25, 2025, 2:56:31 PM
to Luke Valenta, 'Luke Valenta' via Certificate Transparency Policy
On Mon, 25 Aug 2025 14:43:04 -0400
"'Luke Valenta' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> Hi Andrew (this time without dropping ct-policy from cc),
>
> We've deployed the fix (https://github.com/cloudflare/azul/pull/93)
> and uploaded missing roots to /issuers. Would you be able to confirm
> if you're able to again crawl the logs?

Thanks Luke. Yes, I'm able to crawl the logs again.

Regards,
Andrew

Luke Valenta

Aug 27, 2025, 2:05:17 PM
to Andrew Ayer, 'Luke Valenta' via Certificate Transparency Policy
Hi folks,

Just to provide an update on the original reported issue, we've added a job to repair data tiles that may have been impacted (https://github.com/cloudflare/azul/pull/101). We're running this first for our dev and cftest log shards, and then will deploy for the raio log shards after triple-checking that everything looks correct.

I'll provide another update when the logs are all fully repaired, likely in a week or two.

Best,
Luke

Luke Valenta

Sep 2, 2025, 3:34:19 PM
to Andrew Ayer, 'Luke Valenta' via Certificate Transparency Policy
Hi folks,

The Raio log shards are now all fully repaired to include the accepted root in each entry's fingerprint chain, and I've purged the cache so that no clients should still be served the old tiles. Please do let us know if you see further issues. For those curious, you can see a diff for a tile where 98 entries were updated with the following command:


Best,
Luke