Cloudflare Nimbus 2024 & 2054 get-sth not consistent ?


Bret McGee

Jan 30, 2024, 12:30:31 PM
to Certificate Transparency Policy
Hi All

I'm working on some personal projects with CT logs, and during that work I found that Cloudflare Nimbus 2024 and 2025 are not behaving like the other logs when calling get-sth. (As far as I can tell, these two certainly stick out.)

The get-sth signature is always valid, but the tree size and timestamp are not always in chronological order.


See the small screenshot below; the table is in chronological order, with api_call_date_time in UTC+00:00. The timestamp and tree_size are the actual data from the call. The red rows are the ones that have "gone back in time" - over 45 minutes old in some cases.

It's as if some API calls are hitting servers with old tree heads. I have no idea what could be causing it, but it is surprising to believe your downloads are up to date and then suddenly find thousands more entries than the signed tree head reports. Anyway, it seems like there is an issue somewhere.
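For context, the check I'm running is essentially the sketch below. The log URL and the 60-second polling interval are just illustrative assumptions on my part; only the /ct/v1/get-sth endpoint and its JSON fields come from RFC 6962.

    import time
    import requests

    # Illustrative log URL; any RFC 6962 log exposes the same /ct/v1/get-sth endpoint.
    LOG_URL = "https://ct.cloudflare.com/logs/nimbus2024"

    last_timestamp = 0   # milliseconds since the epoch, per RFC 6962
    last_tree_size = 0

    while True:
        # get-sth returns tree_size, timestamp, sha256_root_hash and tree_head_signature.
        sth = requests.get(f"{LOG_URL}/ct/v1/get-sth", timeout=30).json()

        # Flag responses that have "gone back in time" relative to earlier polls.
        if sth["timestamp"] < last_timestamp or sth["tree_size"] < last_tree_size:
            age = (last_timestamp - sth["timestamp"]) / 1000
            print(f"stale STH: tree_size={sth['tree_size']}, "
                  f"timestamp {age:.0f}s older than the newest one seen")

        last_timestamp = max(last_timestamp, sth["timestamp"])
        last_tree_size = max(last_tree_size, sth["tree_size"])
        time.sleep(60)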

If you need any more information please ask.


[screenshot: nimbus.png]

Regards
Bret

Dina Kozlov

Jan 30, 2024, 5:01:07 PM
to Bret McGee, Certificate Transparency Policy
Thank you for reporting the issue! We had a cache setting misconfigured that was causing old tree heads to be cached for longer than expected. It should now be resolved, but please reach out if you notice any further issues. 


Bret McGee

Jan 30, 2024, 7:08:15 PM
to Certificate Transparency Policy, Dina Kozlov, Certificate Transparency Policy, Bret McGee
Hi Dina,

It's my pleasure - thank you for investigating so quickly. It's late here in the UK, and my system is still running, so I will review in the morning to see whether things have improved.

A wider question to the group:  Should the monitors have detected this issue and fired an alert?  This looks like a monitoring gap to me.

[of course I made a typo in the subject: 2054 should read 2025 ...]

Regards,
Bret.

Devon O'Brien

Jan 30, 2024, 8:00:23 PM
to Certificate Transparency Policy, bgam...@gmail.com, dko...@cloudflare.com, Certificate Transparency Policy
Hi Bret,

Thanks for spotting this and bringing it to the community's attention. You're right that this behavior is something that an attentive CT Monitor (or Auditor, as the roles are often blurred in practice) could have detected, but I wouldn't go so far as to say it's a failure on the part of any CT Monitor/Auditor, because their primary purpose is verifying the behaviors outlined in RFC 6962 Section 5.3 (Monitoring) and RFC 6962 Section 5.4 (Auditing).

So long as they are verifying that STHs are well-formed, signed, consistent, and <= 24 hours old as defined by CT policy, and that entries for known SCTs are incorporated into logs within the MMD, Monitors/Auditors are demonstrating that logs are behaving as required for the health of the CT log ecosystem. As for stale STHs, if a CT Log began serving STHs that were more than 24 hours old, that is where we would expect monitors to detect and alert, as it would interfere with the timely detection of newly-issued certificates.
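To make the freshness expectation concrete, that part of a monitor's check amounts to something like the sketch below. The function and constant names are mine rather than any particular monitor's implementation, and the signature and consistency checks are omitted.

    import time
    import requests

    MAX_STH_AGE_SECONDS = 24 * 60 * 60  # the "<= 24 hours old" expectation

    def sth_is_fresh(log_url: str) -> bool:
        # STH timestamps are milliseconds since the epoch (RFC 6962).
        sth = requests.get(f"{log_url}/ct/v1/get-sth", timeout=30).json()
        age_seconds = time.time() - sth["timestamp"] / 1000
        return age_seconds <= MAX_STH_AGE_SECONDS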

I agree that the observed behavior isn't ideal (and it appears that Cloudflare rapidly addressed it!), but is there a specific CT workflow you're trying to perform that's stymied by CT Logs caching/serving valid but slightly stale STHs inconsistently? This is something that could be addressable by policy, but we should be very careful about adding additional requirements on log operators beyond what's necessary for the health of the ecosystem. 

-Devon

Philippe Boneff

Jan 31, 2024, 5:27:03 AM
to Devon O'Brien, Certificate Transparency Policy, bgam...@gmail.com, dko...@cloudflare.com
Ah, caching can be hard indeed.

+1 to what Devon said. While this can lead to interesting behaviours, I think it's good that the policy leaves room for such operator issues, especially in this case: it was not intentional and was swiftly fixed.

Crucially, this was just affecting get-sth, right? Or was it also affecting get-entries and get-sth-consistency? For instance, if an STH for size 10 was served and then an STH for size 3 came from a different server, was the server serving the size-3 STH able to serve entries 3 to 9?
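For concreteness, I mean a probe roughly like the one below. The sizes 3 and 10 are just the numbers from the example above, and the function name is mine; the endpoints and parameters are the standard get-entries (RFC 6962 Section 4.6) and get-sth-consistency (Section 4.4) calls.

    import requests

    def probe(log_url: str, stale_size: int = 3, newer_size: int = 10):
        # Ask for the entries between the stale and the newer tree size...
        entries = requests.get(f"{log_url}/ct/v1/get-entries",
                               params={"start": stale_size, "end": newer_size - 1},
                               timeout=30)
        # ...and for a consistency proof between the two sizes.
        proof = requests.get(f"{log_url}/ct/v1/get-sth-consistency",
                             params={"first": stale_size, "second": newer_size},
                             timeout=30)
        return entries.status_code, proof.status_code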

Cheers,
Philippe




Bret McGee

Jan 31, 2024, 6:03:08 PM
to Certificate Transparency Policy, Devon O'Brien, bgam...@gmail.com, Certificate Transparency Policy
Hi Devon

Thank you for your insightful response. With respect to workflow, it's not causing any problems per se; the download backlog shows as negative until the most recent head comes back. This can easily be worked around on my side by using MAX(tree_size) rather than MAX(create_date_time).
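In other words, something like this minimal sketch; the row and field names just mirror my own schema rather than anything standard:

    def current_tree_size(sth_rows):
        # sth_rows: the recorded get-sth rows, each with 'tree_size' and
        # 'create_date_time' fields. Taking the largest tree_size is monotonic
        # even when a cached (older) STH comes back, unlike taking the most
        # recently fetched row.
        return max(row["tree_size"] for row in sth_rows)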

All the best,
Bret

Bret McGee

Jan 31, 2024, 6:03:19 PM
to Certificate Transparency Policy, Dina Kozlov, Certificate Transparency Policy, Bret McGee
Hi Dina,

The get-sth logs in my database this morning still exhibit the same chronological ordering issues as first reported. The screenshot below demonstrates this:

[screenshot: jan31snapshot.png]

Please reach out if you need further details.

Regards,
Bret.
