Cloudflare Nimbus outage


Dina Kozlov

Nov 3, 2023, 9:42:37 AM
to Certificate Transparency Policy
Hi all, 

Cloudflare is aware of ongoing issues with Nimbus. Our team is working on restoring the service and will provide a post-mortem analysis. 

Best, 
Dina

Dina Kozlov

Nov 3, 2023, 6:01:51 PM
to Certificate Transparency Policy, Dina Kozlov

The CT logs are now restored and accepting requests again. The incident impacted the 2023, 2024, and 2025 Cloudflare Nimbus logs. During the impact windows, listed below, Cloudflare was unable to accept and process new log entries. The logs are now fully restored and are catching up. We will send a post-mortem as soon as it’s ready. 

Impact windows: 

Nimbus 2023 — 2023-11-02 11:44 UTC to 2023-11-03 19:50:00 UTC

Nimbus 2024 — 2023-11-02 11:44 UTC to 2023-11-03 20:00:00 UTC

Nimbus 2025 — 2023-11-02 11:44 UTC to 2023-11-03 18:00:00 UTC

Chris Thompson

Nov 3, 2023, 6:53:34 PM
to Dina Kozlov, Certificate Transparency Policy
Hi Dina -- Thanks for the update, and glad you were able to get the logs up again. I checked for inclusion proofs for some of the certificates we saw that had not been incorporated by the MMD, and they appear to still not be included (e.g., https://crt.sh/?q=6144c13f475483d218950627b22363b6115a970f for Nimbus2023 and https://crt.sh/?q=cea7ddec9062eb9bfe680c96ca0b519814400e9f for Nimbus2024). Hopefully the log can catch up on previous submissions soon -- please give us another update once the logs have worked through their submission backlogs so we can verify that everything has been successfully included.
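For anyone following along, the check described here is standard RFC 6962 Merkle inclusion-proof verification. A minimal, self-contained sketch (the four-leaf tree at the bottom is illustrative test data, not Nimbus entries):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(leaf: bytes) -> bytes:
    # RFC 6962 domain-separates leaf hashes with a 0x00 prefix.
    return sha256(b"\x00" + leaf)

def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use a 0x01 prefix.
    return sha256(b"\x01" + left + right)

def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: list, root: bytes) -> bool:
    """Check an RFC 6962 inclusion proof against a tree head's root hash."""
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            return False
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            if fn % 2 == 0:
                # Skip levels where our node is the rightmost, unpaired one.
                while fn % 2 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root

# Tiny four-leaf example tree.
l = [leaf_hash(x) for x in [b"a", b"b", b"c", b"d"]]
n01, n23 = node_hash(l[0], l[1]), node_hash(l[2], l[3])
root = node_hash(n01, n23)
assert verify_inclusion(b"c", 2, 4, [l[3], n01], root)
```

In practice a monitor fetches the audit path from the log's `get-proof-by-hash` endpoint and checks it against a signed tree head in the same way.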

Cheers,
Chris

--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/80a93305-c019-4040-9ce2-402bc52678b9n%40chromium.org.

Leland Garofalo

Nov 4, 2023, 12:16:32 PM
to Certificate Transparency Policy, Chris Thompson, Certificate Transparency Policy, Dina Kozlov
The Cloudflare Nimbus 2023 and 2024 Logs are not accepting any new submissions starting 2023-11-04 16:04 UTC. Existing submissions to the Cloudflare Nimbus 2023 and 2024 Logs are delayed in processing since 2023-11-04 06:25 UTC. The Cloudflare Nimbus 2025 Log is accepting submissions and there are no delays in processing. We will provide the next update on Monday.

Kurt Roeckx

Nov 4, 2023, 1:20:20 PM
to Leland Garofalo, Certificate Transparency Policy, Chris Thompson, Dina Kozlov
I'm getting a 500 status error from nimbus2024, and a 503 from cirrus.


Kurt

Dina Kozlov

Nov 4, 2023, 7:31:09 PM
to Kurt Roeckx, Leland Garofalo, Certificate Transparency Policy, Chris Thompson
Hi, 

The 500 errors are expected since we have shut down submissions for the 2023 and 2024 logs. 

Kurt Roeckx

Nov 4, 2023, 7:38:39 PM
to Dina Kozlov, Leland Garofalo, Certificate Transparency Policy, Chris Thompson
On Sat, Nov 04, 2023 at 11:30:50PM +0000, Dina Kozlov wrote:
> Hi,
>
> The 500 errors are expected since we have shut down submissions for the
> 2023 and 2024 logs.

This is on get-entries.


Kurt

Dina Kozlov

Nov 5, 2023, 9:50:50 AM
to Kurt Roeckx, Leland Garofalo, Certificate Transparency Policy, Chris Thompson
This is expected with the current state of our log. 

Dina Kozlov

Nov 6, 2023, 7:23:30 PM
to Certificate Transparency Policy, Dina Kozlov, Leland Garofalo, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx
Hi all, 

An update from the Cloudflare side: 
We are deprioritizing our focus on the 2023 Nimbus log, since it's approaching the end of its usable period. In addition, our team is working to recover the 2024 Nimbus log. Cloudflare experienced an outage (https://blog.cloudflare.com/post-mortem-on-cloudflare-control-plane-and-analytics-outage/) which impacted our CT log availability. While our CT log was unreachable during the time of the incident, we didn't lose any data and retained all successful log submissions.

Our question to the community: Is it better for Cloudflare to stand up a new log shard from a new endpoint, so that we can start accepting new submissions, or is it better for us to focus on restoring the existing log and continue to use the existing endpoint? 
The response from the community will help us prioritize our next steps. 

Best regards, 
Dina

Amir Omidi

Nov 6, 2023, 7:34:44 PM
to Dina Kozlov, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx, Leland Garofalo
Hi!

Emailing in a personal capacity.

Is the 2023 log fully broken at this point? As in, are there inconsistencies in the tree? If not, is it possible to turn it back on and let the various CT policy teams decide whether it should get disqualified?

Note I have no idea what the scope of the failure looks like. This is under the assumption that the service can be turned back on and it would operate like before, just with a bit of a data gap?


Dina Kozlov

Nov 6, 2023, 7:54:46 PM
to Amir Omidi, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx, Leland Garofalo
"This is under the assumption that the service can be turned back on and it would operate like before, just with a bit of a data gap?" -- That's correct 

Aaron Gable

Nov 6, 2023, 8:10:36 PM
to Dina Kozlov, Amir Omidi, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx, Leland Garofalo
I am of the opinion that it is in everyone's best interest for Cloudflare to spin up a new 2024 log on a new endpoint. Not because I have any doubts about the integrity of the existing 2024 log (once it's back online), but because incidents like this clearly indicate the value of redundancy in logs. Multiple co-extant Cloudflare 2024 logs, operating in separate datacenters, would significantly decrease the chance that both are down at the same time, and decrease the load that must be absorbed by other CT logs if/when one does go down. Just like Google operates both Argon and Xenon, I believe that it would be good for Cloudflare to operate both Nimbus and (say) Cirrus.

Aaron

Devon O'Brien

Nov 7, 2023, 4:31:14 PM
to Certificate Transparency Policy, dko...@cloudflare.com, lel...@cloudflare.com, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx
Hi Dina,

Thank you for the update on the efforts to recover the Nimbus CT logs from the outage last week. We are glad to hear that all submission data has been retained; can you please confirm that all submissions for which Nimbus logs issued SCTs have been fully incorporated? If that has not yet happened, can you please let us know when you expect that to be completed? This is needed to ensure completeness and auditability of these logs.

We appreciate that the path to recovery post-incident requires some prioritization given finite resources. From our perspective, Cloudflare’s CT team should prioritize the following next steps:
1. Ensure that all pending submissions have been incorporated into Nimbus 2023, Nimbus 2024, and Nimbus 2025.
2. Restore each of these logs to normal operation as soon as possible. We are treating this as an ongoing incident until they are again fully operational.

We appreciate that there are only about 8 weeks left in Nimbus 2023’s expiry range, but standing down this log will both impact newly-issued short-term certificates and constrain logging options for existing certificates (e.g. those that use OCSP- or TLS-delivered SCTs). As for Nimbus 2024, restoring both read and write API access to this log should be a high priority, since even if you had a replacement log stood up today, it would take over 3 months before the log was usable for certificates expiring in 2024. At this time, we are not asking for Cloudflare to spin up a replacement log, especially at the cost of restoring existing logs to normal operation in a consistent and auditable state.

Once logs have been restored, it would help both CT-enforcing user agents as well as other current and prospective log operators if you could provide a post-mortem with a more detailed explanation of root cause, recovery efforts, and mitigation steps in the event of Nimbus experiencing another DC outage.

Finally, as others have noted in this thread, we would encourage Cloudflare to explore operating a parallel set of (possibly cloud-themed!) CT logs with matching expiry ranges, ideally in a different datacenter or disaster recovery zone. This would help support a healthy CT ecosystem by mitigating future exposure to similar incidents.

Best,
Devon

Dina Kozlov

Nov 7, 2023, 7:06:24 PM
to Certificate Transparency Policy, Devon O'Brien, dko...@cloudflare.com, lel...@cloudflare.com, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx

Thank you all for the input, we appreciate the feedback from the community. Given the current state of our log, we have decided to take the following steps:

  1. We will spin up a replacement Nimbus 2024 log. Our goal is to have the replacement log ready by the end of tomorrow, at which point we plan to submit it for the compliance monitoring phase. If there's anything that can expedite the process to make the log Usable, please let us know.
  2. We will rebuild the current Nimbus 2024 log, so that it's in a readable state. Once the log is back up, we will incorporate the submissions that have been accepted but not processed. 
  3. The Nimbus 2025 Log is completely operational and has all pending submissions incorporated.
  4. The Nimbus 2023 log will continue to be deprioritized.

In addition, we will share the root cause analysis, once it’s ready. 

Best, 

Dina

Andrew Ayer

Nov 7, 2023, 8:39:30 PM
to Certificate Transparency Policy
On Tue, 7 Nov 2023 13:31:14 -0800 (PST)
"'Devon O'Brien' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> can you please confirm that all submissions for which Nimbus logs
> issued SCTs have been fully incorporated?

I have observed 227 unincorporated SCTs from Nimbus 2023, and
36,345 unincorporated SCTs from Nimbus 2024.

Regards,
Andrew

Devon O'Brien

Nov 8, 2023, 7:01:49 AM
to Certificate Transparency Policy, dko...@cloudflare.com, Devon O'Brien, lel...@cloudflare.com, Certificate Transparency Policy, Chris Thompson, Kurt Roeckx
Hi Dina,

Thanks for the additional information about your planned next steps. Can you explain what the rationale is for bringing Nimbus 2024 into a fully recovered state but then replacing it with a separate log that has to undergo a full application process and usability waiting period? Is there a separate CT UA policy motivation for this? Is it something infrastructure-specific to this log? The lowest friction response here would seem to be turning submissions back on for Nimbus 2024 once it is fully recovered.

-Devon

Andrew Ayer

Nov 8, 2023, 8:02:18 AM
to Dina Kozlov, Certificate Transparency Policy
Hi Dina,

1. Do you have an ETA for when all submissions to Nimbus 2023 will be
incorporated?

2. Do you have an ETA for when all submissions to Nimbus 2024 will be
incorporated?

Nimbus 2023 and 2024 have collectively issued tens of thousands of SCTs
for (pre)certificates which are being relied upon by Chrome clients,
but which cannot be accessed by monitors. This is a bad state for the
CT ecosystem to be in, and can only be addressed by the recovery of the
logs or the retirement of the logs. This is why it was a good idea for
Chrome to recommend that Cloudflare's #1 priority be to "ensure that all
pending submissions have been incorporated into Nimbus 2023, Nimbus
2024, and Nimbus 2025." It's difficult to understand why you would
reject that recommendation without explanation.

Regards,
Andrew



Clint Wilson

Nov 8, 2023, 12:24:32 PM
to dko...@cloudflare.com, Certificate Transparency Policy, lel...@cloudflare.com, Chris Thompson, Kurt Roeckx, Devon O'Brien
Hi Dina,

For what it's worth, I heavily agree (based on what’s been shared so far) with the recommendations of prioritizing 1) incorporation of all submissions to the Nimbus logs and 2) returning the Nimbus 2024 log to a normal functioning state once recovered. However, perhaps I've misunderstood something about the analysis to this point which is blocking these, so I look forward to the further root cause analysis to aid in reaching a shared understanding on the chosen course of action. Thank you (and the team) for your communication here!

Cheers!
-Clint


Dina Kozlov

Nov 8, 2023, 3:53:40 PM
to Certificate Transparency Policy, Clint Wilson, lel...@cloudflare.com, Chris Thompson, Kurt Roeckx, Devon O'Brien, dko...@cloudflare.com
Hi all,

To give some context around our decision making: the Nimbus 2023 and Nimbus 2024 CT logs were initially built on HBase, while Nimbus 2025 is built on PostgreSQL. Over the past few years, we have encountered a number of challenges with maintaining the CT log on HBase. We have spent the last week investigating the work that would need to be done to rebuild the Nimbus 2024 log on HBase, and we’ve determined that it would take considerably more time to recover the log in its current state than to stand up a new log with a new backend that we know we can reliably trust.

Our priority at the moment is to have a fully functional CT log as quickly as possible. The most efficient and reliable way for us to achieve this is to set up the replacement log on a new endpoint so that it can start the compliance monitoring process. Once the new log is up and running, our team will focus on rebuilding the 2024 log on PostgreSQL, including the unprocessed submissions, and getting the log into a readable state.

We’ve made the decision to deprioritize Nimbus 2023 due to the uncertainty around the timeline for rebuilding the log. Given the log’s usability is limited until the end of the year, our primary focus will be on Nimbus 2024.

Best,
Dina

Dina Kozlov

Nov 15, 2023, 4:16:13 PM
to Certificate Transparency Policy, Dina Kozlov, Clint Wilson, lel...@cloudflare.com, Chris Thompson, Kurt Roeckx, Devon O'Brien
Hi all, 

An update: The Nimbus 2024 log is back up and able to accept new submissions. All unprocessed entries have been incorporated into the log. We are currently running inclusion proof checks on log entries. At this time, we have validated all entries submitted after Wednesday, 2023-11-01 11:13:10 UTC. We will continue to run the validation checks and will send an update once the full process is completed. 

We will now focus on restoring Nimbus 2023 to a functional state.

Best, 
Dina

Joe DeBlasio

Nov 16, 2023, 7:13:59 PM
to Dina Kozlov, Certificate Transparency Policy, Clint Wilson, lel...@cloudflare.com, Chris Thompson, Kurt Roeckx, Devon O'Brien

Hi all,


This email provides an update on Chrome's policy response and perspective on the ongoing Cloudflare Nimbus log outages.


First off, we'd like to acknowledge that this incident is the longest outage from a Chrome-Usable log in many years (possibly ever). While we are grateful to Cloudflare for restoring Nimbus2024, we are aware of and sensitive to the risk that this sort of incident poses to the entire CT ecosystem.


However, given the current information available to us, we do not expect to Retire Nimbus2024.


While the significant downtime Nimbus2024 experienced left us unable to verify the inclusion of several thousand certificates with valid SCTs, our understanding is that all certificates for which an SCT has been issued have now been included in the log. While this is a significant violation of the log's MMD, we believe that maintaining the log in the Usable state is in the best interest of our users and the CT ecosystem.


We will continue to audit SCTs encountered by Chrome users for inclusion in Nimbus2024. If that auditing, or other external sources, uncovers evidence of certificates that were never included, we will move quickly to Retire the log.


This incident is ongoing for Nimbus2023, and we hope that log shard can make a similar recovery in the coming days. If so, our response will be similar.


In general, we aim to ensure that replacing logs is as inexpensive as possible, and where there is direct evidence of log corruption, Chrome moves quickly to retire the log and ask that the operator stand up a replacement. However, when we do not have evidence that the log is corrupted, the situation is more complex. Between initial compliance monitoring and ensuring that changes have rolled out to all Chrome clients, standing up a new log takes at least 100 days. Recovering an existing log, even when it has had substantial downtime, can provide substantially better log availability for CAs and other certificate submitters.


Once Nimbus2023 has been recovered, we're asking that Cloudflare provide a postmortem, as we do following all major log incidents, and as they have offered. Importantly, postmortems are not about assigning blame; rather, they give the community an opportunity to learn from our collective mistakes and thus reduce the likelihood that similar issues happen in the future. In this particular case, we're asking that the postmortem cover:

  • a root cause analysis for the incident overall,

  • how Cloudflare was confident that the logs could be fully recovered,

  • what major factors contributed to the length of the incident, and

  • what steps Cloudflare is taking to reduce the likelihood of similar outages and long recovery timelines in the future.


Thankfully, by design, Certificate Transparency is robust to the failure of single logs. However, multiple log failures can cause availability issues for certificates. As a reminder, CAs and sites providing SCTs directly who are concerned about the risk of multiple log failures can partly mitigate that risk by providing additional SCTs beyond the minimal set required by user agents.
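To illustrate the mitigation mentioned above: a certificate carrying SCTs from more distinct logs than the user-agent minimum keeps validating even if one of those logs is retired. A toy sketch only (the log names and the threshold of 2 are illustrative assumptions, not Chrome's actual policy logic, which also weighs certificate lifetime and operator diversity):

```python
def still_compliant(sct_log_ids, retired_log_ids, minimum=2):
    """Hypothetical helper: do SCTs from logs that are still qualified
    meet the user agent's minimum? Counts surviving logs only."""
    surviving = set(sct_log_ids) - set(retired_log_ids)
    return len(surviving) >= minimum

# Three embedded SCTs survive the retirement of one log; two do not.
assert still_compliant({"nimbus2024", "argon2024", "yeti2024"}, {"nimbus2024"})
assert not still_compliant({"nimbus2024", "argon2024"}, {"nimbus2024"})
```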


As always, we're happy to answer any questions you have,

Joe, on behalf of the Chrome CT team



Matt Palmer

Nov 16, 2023, 8:05:37 PM
to ct-p...@chromium.org
On Thu, Nov 16, 2023 at 04:13:39PM -0800, Joe DeBlasio wrote:
> First off, we'd like to acknowledge that this incident is the longest
> outage from a Chrome-Usable log in many years (possibly ever). While we are
> grateful to Cloudflare for restoring Nimbus2024, we are aware and sensitive
> to the risk that this sort of incident causes to the entire CT ecosystem.
>
> However, given the current information available to us, we do not expect to
> Retire Nimbus2024.

[...]

> As always, we're happy to answer any questions you have,

The question that comes to my mind is given that blown MMDs and uptime
requirements are not (any longer?) disqualifying events for a Qualified log,
should those requirements be removed from the CT Log policy?

- Matt

Amir Omidi

Nov 16, 2023, 8:21:27 PM
to Matt Palmer, ct-p...@chromium.org
Emailing in a personal capacity.

I don’t really think that one violation (albeit a long one) is reason to remove the MMD requirements. 

Cloudflare’s logs have consistently performed well, and the way I interpret the rules is that they are more about the ongoing operation of a log than a binary “sorry, you broke your promise once, you’re out.”

Maybe the language can be changed to reflect that expectation, if that is the intention.


Matt Palmer

Nov 16, 2023, 10:35:54 PM
to ct-p...@chromium.org
On Thu, Nov 16, 2023 at 08:21:12PM -0500, Amir Omidi wrote:
> Emailing on a personal capacity.
>
> I don’t really think that one violation (albeit a long one) is reason to
> remove the MMD requirements.

The existence of this incident is not the reason I'm asking whether MMD
requirements should be removed. It's the implications of Chrome's decision
to not remove the logs that have caused me to ask the question.

If blowing the MMD in and of itself introduces no appreciable security risk,
then does there need to be an MMD requirement in the policy? The
counter-factual case, that there *is* a security risk caused solely by a
blown MMD, is somewhat disproven by the fact that the Nimbus logs are not
being removed.

- Matt

Dina Kozlov

Nov 17, 2023, 9:37:09 AM
to Certificate Transparency Policy, Matt Palmer
Acknowledged, thank you for the update. We will continue working on bringing up the 2023 Nimbus log and will share the postmortem shortly after. 

Joe DeBlasio

Nov 20, 2023, 8:08:05 PM
to Matt Palmer, ct-p...@chromium.org
> The question that comes to my mind is given that blown MMDs and uptime
> requirements are not (any longer?) disqualifying events for a Qualified log,
> should those requirements be removed from the CT Log policy?
>
> ...
>
> If blowing the MMD in and of itself introduces no appreciable security risk,
> then does there need to be an MMD requirement in the policy? The
> counter-factual case, that there *is* a security risk caused solely by a
> blown MMD, is somewhat disproven by the fact that the Nimbus logs are not
> being removed.

A log not respecting its MMD is a violation of Chrome's log policy, and we take those incidents seriously when they occur. In cases where logs eventually include all certificates for which SCTs have been issued, however, retiring the log does little to reduce the risk of certificate misuse. While we may still Retire logs due to a single incident, it is not the case that a single policy violation always necessitates immediate log retirement. Our incident response focuses on what we believe is best for the ecosystem moving forward.

Thanks for the question,
Joe

 




Matt Palmer

Nov 20, 2023, 8:38:41 PM
to ct-p...@chromium.org
On Mon, Nov 20, 2023 at 05:07:45PM -0800, Joe DeBlasio wrote:
> >
> > The question that comes to my mind is given that blown MMDs and uptime
> > requirements are not (any longer?) disqualifying events for a Qualified
> > log,
> > should those requirements be removed from the CT Log policy?
>
> ...
>
> > If blowing the MMD in and of itself introduces no appreciable security risk,
> > then does there need to be an MMD requirement in the policy? The
> > counter-factual case, that there *is* a security risk caused solely by a
> > blown MMD, is somewhat disproven by the fact that the Nimbus logs are not
> > being removed.
> >
>
> A log not respecting its MMD is a violation of Chrome's log policy, and we
> take those incidents seriously when they occur.

That's tautological, though. "Should this be in the policy?" is not
meaningfully answered by "it's in the policy".

> In cases where logs
> eventually include all certificates for which SCTs have been issued,
> however, retiring the log does little to reduce the risk of certificate
> misuse.

I'll take this as agreement that a blown MMD, at the very least, has "little"
security impact, in and of itself. Absent any indication to the contrary,
I'll take prolonged downtime as being in the same category.

> While we may still Retire logs due to a single incident, it is not
> the case that a single policy violation always necessitates immediate log
> retirement. Our incident response focuses on what we believe is best for
> the ecosystem moving forward.

Which still leaves unaddressed my original question: should MMD (and uptime)
requirements continue to be a part of the Chrome CT Policy? Implicit in
that is the natural followup question: if so, why?

- Matt

K. York

Nov 26, 2023, 11:13:48 AM
to Matt Palmer, ct-p...@chromium.org
I believe that the MMD policy is mostly targeted towards the "slow queue growth" style of incidents -- a situation where the merge delay slowly gets larger and larger. 24 hours is an arbitrary measurement chosen to allow for human responders to get a few tries to fix it.

Not having any upper bound also makes a monitor's job harder -- at what point do you raise alarms that you can't find a proof? The answer is clear with this: 24 hours after first observation of the cert. With no policy, "eventual consistency" has no upper bound.

In summary: the exact value of 24 hours has loose technical justifications but large human factors justifications, and that's why it's on a human time scale instead of a machine time scale (like 30 minutes or 30 seconds).
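Concretely, the alarm rule described above reduces to a simple deadline computation. A minimal sketch, assuming (as clarified elsewhere in this thread) that the monitor starts the MMD window from the SCT's millisecond timestamp per RFC 6962 rather than from first observation:

```python
from datetime import datetime, timedelta, timezone

MMD = timedelta(hours=24)  # maximum merge delay from the log's metadata

def merge_deadline(sct_timestamp_ms: int) -> datetime:
    """SCT timestamps are milliseconds since the Unix epoch (RFC 6962);
    the entry must be incorporated within the MMD of that moment."""
    issued = datetime.fromtimestamp(sct_timestamp_ms / 1000, tz=timezone.utc)
    return issued + MMD

def is_overdue(sct_timestamp_ms: int, now: datetime) -> bool:
    """True once a monitor should raise the alarm for a missing entry."""
    return now > merge_deadline(sct_timestamp_ms)

# An SCT issued at the start of this outage window, checked a day later:
sct_ms = int(datetime(2023, 11, 2, 11, 44, tzinfo=timezone.utc).timestamp() * 1000)
assert is_overdue(sct_ms, datetime(2023, 11, 3, 12, 0, tzinfo=timezone.utc))
assert not is_overdue(sct_ms, datetime(2023, 11, 3, 11, 0, tzinfo=timezone.utc))
```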


Dina Kozlov

Nov 27, 2023, 12:36:20 PM
to Certificate Transparency Policy, K. York, ct-p...@chromium.org, Matt Palmer
Hi all, 

The Nimbus 2023 log is back up and able to accept new submissions. In addition, all unprocessed entries have been incorporated into the log. We started accepting new entries for the Nimbus 2023 log on 2023-11-22 01:15 UTC. We are continuing to run validation checks on both the Nimbus 2023 and Nimbus 2024 entries and will update the group once the validation checks have been completed. 

Best, 

Dina


Matt Palmer

Nov 27, 2023, 3:54:57 PM
to ct-p...@chromium.org
On Sun, Nov 26, 2023 at 08:13:34AM -0800, K. York wrote:
> I believe that the MMD policy is mostly targeted towards the "slow queue
> growth" style of incidents -- a situation where the merge delay slowly gets
> larger and larger. 24 hours is an arbitrary measurement chosen to allow for
> human responders to get a few tries to fix it.
>
> Not having any upper bound also makes a monitor's job harder -- at what
> point do you raise alarms that you can't find a proof? The answer is clear
> with this: 24 hours after first observation of the cert. With no policy,
> "eventual consistency" has no upper bound.

I believe that MMD is defined as the interval between the SCT and visibility
in the log, not from when the cert is first observed.

In any event, I'm not questioning the existence of the MMD as a *concept*,
but rather its continued inclusion in the Chrome CT Log policy as a
mandatory adherence, rather than a nice-to-have.

- Matt

Bas Westerbaan

Nov 27, 2023, 4:41:06 PM
to Matt Palmer, ct-p...@chromium.org
> In any event, I'm not questioning the existence of the MMD as a *concept*,
> but rather its continued inclusion in the Chrome CT Log policy as a
> mandatory adherence, rather than a nice-to-have.

There can be very useful policies in between "violate this and be disqualified immediately" and "we ask nicely, but can't do anything if you break the rule".

Best,

 Bas

Matt Palmer

Nov 27, 2023, 8:30:46 PM
to ct-p...@chromium.org
Except that the Chrome CT Log policy states that they can kick any log out
at any time for any reason, so "can't do anything if you break the rule"
is never the case.

- Matt