Guidance for weekly updates on Bugzilla incidents

Dimitris Zacharopoulos

Jun 23, 2021, 6:33:32 AM

to dev-secur...@mozilla.org

I would like to ask for some guidance after reviewing a recent comment in a bug regarding the need for "weekly updates".

CAs are required to provide timely updates on open incidents as described in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed.

If a CA claims to have completed all remediation steps, and the bug is pending review from a Mozilla official, does it make sense for the CA to continue providing weekly updates that say "nothing to report", until a Mozilla official reviews what has been submitted? This doesn't seem to be meaningful at all. I don't think this was ever intended when the "Keeping Us Informed" section was written but perhaps I'm wrong here.

I understand the need to have clear semantics so we don't fall into the situation where a CA thinks remediation is complete and does not signal this properly to the Mozilla official. An improvement to this process would be for the CA to set the NI flag to Ben when it considers all remediation steps complete. After setting the NI flag to Ben, the CA would not need to send weekly updates until it receives feedback that resets the NI back to the CA.

Does that seem reasonable to people?

Dimitris.

Corey Bonnell

Jun 23, 2021, 3:59:51 PM

to Dimitris Zacharopoulos, dev-secur...@mozilla.org

I agree with Dimitris that weekly content-free “pings” even after the CA has indicated remediation is complete don’t seem to be terribly useful for anyone monitoring incidents. CAs are encouraged to watch the CA Certificate Compliance component, but such comments serve to increase the amount of noise from watching that component. I can imagine there’s similar pain for Mozilla representatives and others looking for meaningful updates on CA incidents.

Given this, I think Dimitris’s proposal to use the NI-flag is an effective way to signal that remediation is complete from the CA’s perspective and a representative needs to evaluate and make a determination on whether the remediation is sufficient.

Thanks,

Corey

--
You received this message because you are subscribed to the Google Groups "dev-secur...@mozilla.org" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev-security-po...@mozilla.org.
To view this discussion on the web visit https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/72b32dec-b1f6-6716-c931-c6c9e6a4c598%40it.auth.gr.

Ryan Sleevi

Jun 23, 2021, 5:19:59 PM

to Corey Bonnell, Dimitris Zacharopoulos, dev-secur...@mozilla.org

I disagree with Corey and Dimitris, but I can understand and appreciate why they take the position they do.

Dimitris highlights a recent bug, but this has actually been in place for far longer. This language was adopted by Gerv on 28 September 2017: https://wiki.mozilla.org/index.php?title=CA/Responding_To_An_Incident&diff=1181405&oldid=1179230 , with the related public discussion at https://groups.google.com/g/mozilla.dev.security.policy/c/Y392OBvDvr8/m/Pf4VCG_-BQAJ , based on the set of patterns and issues we were seeing, such as non-acknowledgement of issues or non-responsiveness. You'll also find plenty of earlier comments on other bugs reiterating these expectations, so I'm not sure if this was simply selected for the "most recent".

There are a number of reasons this is useful:

It's one of the few positive signals we have that a CA is actually capable of, and is actively, reviewing all of their CA incidents weekly. Thus, the argument that this is "content-free" is not really accurate, because it's indicative of functional processes at work.

We've seen a number of CAs, stretching back for years, failing to implement these basic policies and checks. To suggest this is a "solved problem" is to ignore that we're still filing issues on CAs for non-responsiveness. The ability to provide such updates is useful for continually demonstrating the CA has processes in place (e.g. even in the event of vacations). I can think of some bugs that have demonstrated positive practices, and some quite negative, and am happy to dig up examples of both if it's useful here.

It's a reliable, repeatable signal to the community that the CA believes their remediation is complete, in a way that is open for the whole community.

N-I will only ping the entity who the N-I is requested for, and only show up on the dashboard of the entity that the N-I is requested for. Historically, these bugs have involved a great deal of community discussion. A CA stating that they believe their remediation is complete once, and setting N-I, actually provides no further signal. For example, if a CA or community member started observing the compliance incidents after the CA believed it was complete, they would receive no notification of the outstanding review.
Put differently: The moment the CA starts repeating responses also provides a useful signal, even for those not on the N-I, to thoroughly review.

Ultimately, the proposed path of not expecting updates is "trust the CA to make the good judgement", when the Incident Report itself is at least some signal and evidence that the CA has failed to exercise good judgement and good process. Their belief that the issue is fixed is by no means a guarantee that it is fixed.

That is, if a CA's judgement allows the CA to release themselves from an obligation, that sort of incentive greatly encourages CAs exercising poor/hurried judgement. That's not a good outcome.

As CAs have shared in their own review of other CAs' incidents, they prioritize those incidents most recently reported, because they want to ensure they promptly review their systems. This has also been true of the incident reports and investigation: the incident reports go through initial triage and investigation, which means that if you have X hours in a day to review incident reports, the "newest updated" are given priority.

The proposal to N-I certainly prioritizes a Module Owner/Peer (Bugzilla will generate mails for them), but does nothing to affect the review and triage of incidents.
Put differently: The regular updates help ensure issues that CAs believe are closed are able to be promptly responded to and reviewed. If that takes longer than a week (e.g. a glut of new incidents, module peers/owners on vacation), that helps ensure it's still prioritized quickly.

So there's definite value here, which the proposed solution doesn't really address.

I can appreciate that Dimitris and Corey, whether or not they're representing individually or CAs, may not be interested in this signal, and agree, it may not be as useful to them, because for better or worse, they don't have to be invested in knowing whether CA Foo is regularly reviewing their incidents, or if CA Bar is a bad judge of completion. But it's a signal that has turned out to reliably predict how well a CA has a bug management strategy in place. I appreciate Dimitris raising this, because it certainly shows an in-depth review of ongoing reports, although as noted, this has been a longstanding requirement and expectation.

That's not to say I have concerns with CAs setting N-I, but I definitely don't think it should cut off the CA's expectations for responsiveness, whether it's new questions or new updates, including their belief that there are no updates. Too many CAs are still struggling with the basics of incident reports (as you can see from that 2017 thread and look at 2021 incidents) to think that "just trust the CA" is the right path forward here for users.

Dimitris Zacharopoulos

Jun 24, 2021, 4:54:07 AM

to dev-secur...@mozilla.org

On 24/6/2021 12:19 AM, Ryan Sleevi wrote:
> I disagree with Corey and Dimitris, but I can understand and
> appreciate why they take the position they do.
>
> Dimitris highlights a recent bug, but this has actually been in place
> for far longer. This language was adopted by Gerv on 28 September
> 2017:
> https://wiki.mozilla.org/index.php?title=CA/Responding_To_An_Incident&diff=1181405&oldid=1179230
> , with the related public discussion at
> https://groups.google.com/g/mozilla.dev.security.policy/c/Y392OBvDvr8/m/Pf4VCG_-BQAJ
> , based on the set of patterns and issues we were seeing, such as
> non-acknowledgement of issues or non-responsiveness. You'll also find
> plenty of earlier comments on other bugs reiterating these
> expectations, so I'm not sure if this was simply selected for the
> "most recent".

Just to clarify that I had seen this practice performed by some CAs over
the past years and didn't think it was that important of an issue to
comment. That doesn't mean that I thought it was a "good" or "useful"
practice. After seeing repeated recent notices that CAs "must" respond
weekly even when they have clearly indicated that all remediation steps
are complete, I thought it was strange enough an expectation and decided
to ask here :)

I believe it was never Gerv's or this community's intention to require
clerical weekly updates even when all incident-related activities and
CA's commitments/remediations are claimed to be fulfilled. Posting
repeated messages "nothing to report" doesn't add any value to the
incident report and contradicts the "CAs may learn from the
incident" part of the policy. The volume of incidents and posts that
this community must review on a very regular basis has increased
significantly, so IMHO it would be reasonable to ask for reducing the
"noise", as Corey elegantly described this practice.

Obviously I don't want this community to spend too much time and energy
on a minor topic such as this, and I will accept any decision that Ben
and Kathleen make on this issue, with a clear statement that would
hopefully also make it to the
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed
section so it is very clear to CAs going forward.

Taking this opportunity, I would also hope this community and its
leaders be a little more concerned when we have examples of:

> this has been a longstanding requirement and expectation

and yet we see repeated deviations of those expectations from multiple
CAs. IMO this is a clear indication that the policy/procedure is not
clear enough and needs to be improved/clarified. It is a standard
practice to improve policy and procedural documents when repeated
misunderstandings/deviations occur.

To be honest, I was motivated by Corey's work taking the results of a
long thread about U-labels
(https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/TMqAcMJh45U/m/dh1Ty10PAgAJ).
During that discussion there were strong statements about policy
expectations from Ryan, where the policy did not 100% reflect those
expectations. Corey decided to step up and propose concrete improvements
as demonstrated in https://github.com/cabforum/servercert/pull/285.

It is very helpful when we see statements that make it unambiguously
clear what the expectations are, and Ryan is always very precise and
clear about his expectations. FWIW I would also like to encourage
members to identify parts of Mozilla documentation that allow for
interpretations that don't meet those expectations and speak up so they
can be improved.

Dimitris.

Ryan Sleevi

Jun 24, 2021, 10:43:14 AM

to Dimitris Zacharopoulos, dev-secur...@mozilla.org

On Thu, Jun 24, 2021 at 4:54 AM Dimitris Zacharopoulos <ji...@it.auth.gr> wrote:

> I believe it was never Gerv's or this community's intention to require
> clerical weekly updates even when all incident-related activities and
> CA's commitments/remediations are claimed to be fulfilled.

I mean, Gerv and I discussed this precise point in 2017, in addition to the on-list discussion, so I don't know what more I can provide you.

> Posting
> repeated messages "nothing to report" doesn't add any value to the
> incident report and contradicts the "CAs may learn from the
> incident" part of the policy. The volume of incidents and posts that
> this community must review on a very regular basis has increased
> significantly, so IMHO it would be reasonable to ask for reducing the
> "noise", as Corey elegantly described this practice.

Right, but the previous post specifically addressed why this is not "noise" and valuable.

I hope you can provide examples of what you believe is "noise" or "of no value", given the incidents discussed later on.

> Taking this opportunity, I would also hope this community and its
> leaders be a little more concerned when we have examples of:
>
> > this has been a longstanding requirement and expectation
>
> and yet we see repeated deviations of those expectations from multiple
> CAs. IMO this is a clear indication that the policy/procedure is not
> clear enough and needs to be improved/clarified.

Somewhere there's a spectrum here, I hope we can agree. "underscores" is something that should have been clear, and yet CAs felt it wasn't, because the word "underscore" wasn't mentioned, just the rules that prohibited underscores. "No MITM CAs" is something that should have been clear, and yet CAs felt it wasn't, because the policy didn't say "no MITM CAs", it just said "must validate domains". Mozilla Policy has long required disclosure of what validation methods CAs use - and yet we see CAs still, in 2021, failing to do so in their CP/CPS, or disclosing methods long forbidden.

I want to make sure that we're clear between "clarifying existing requirements" and "imposing new requirements". Your message implies that this is the latter, but that couldn't be further from the truth.

Since you only noticed it on the Telekom Systems bug, a few more examples showing that the "failure to meet expectations" isn't necessarily one of ambiguous requirements.

And, for certain specific examples regarding the "remediation" expectation:

https://bugzilla.mozilla.org/show_bug.cgi?id=1532842#c5 (2019-07-04 - part of multiple reminders, sent to GTS, Camerfirma, GRCA, and GoDaddy, reminding them of the expectation until closed or indicated otherwise)
https://bugzilla.mozilla.org/show_bug.cgi?id=1563579#c10 (2019-09-27 - Showing a CA understanding the expectation)
https://bugzilla.mozilla.org/show_bug.cgi?id=1575022#c2 (2019-09-06)
https://bugzilla.mozilla.org/show_bug.cgi?id=1634795#c1 (2020-05-07 - Showing how a failure to provide those updates led to a CA overlooking failure to remediate)
https://bugzilla.mozilla.org/show_bug.cgi?id=1610303#c13 (2020-07-10 - concerned about the lack of progress)
https://bugzilla.mozilla.org/show_bug.cgi?id=1628292#c8 (2020-08-26 - Showing another CA understanding the expectations)
https://bugzilla.mozilla.org/show_bug.cgi?id=1647468 (2020-06-29 - A CA demonstrating they meet the expectations)
https://bugzilla.mozilla.org/show_bug.cgi?id=1648717#c39 (2021-01-14 - Focusing on how the lack of substance in updates is concerning)
https://bugzilla.mozilla.org/show_bug.cgi?id=1650845#c9 (2020-08-26 - using the bug status to indicate a relationship between bugs)
https://bugzilla.mozilla.org/show_bug.cgi?id=1674886#c7 (2021-01-06)
https://bugzilla.mozilla.org/show_bug.cgi?id=1678183#c4 (2021-03-30)
https://bugzilla.mozilla.org/show_bug.cgi?id=1705791#c24 (2021-05-27)
https://bugzilla.mozilla.org/show_bug.cgi?id=1712106#c4 (2021-05-27)

I understand that you may feel it's noise, but when we've got a clear, long-running pattern of some CAs failing to meet expectations - and some CAs totally meeting them - that feeling is missing the bigger picture of incidents. I hope you can also see the real issue here in letting the CA assume when it's appropriate to stop or not provide updates, given the patterns of issues you can see, and why the expectation is that until the remediation is accepted - and the issue is either closed or a NextUpdate set - then every open incident should be treated as just that: an open incident worth providing an update to the community.

The alternative, of sometimes not providing updates, demonstrably shows it will cause issues to be overlooked, and to the detriment of users.

Dimitris Zacharopoulos

Jun 24, 2021, 11:55:05 AM

to ry...@sleevi.com, dev-secur...@mozilla.org

On 24/6/2021 5:43 PM, Ryan Sleevi wrote:

> > Taking this opportunity, I would also hope this community and its
> > leaders be a little more concerned when we have examples of:
> >
> > > this has been a longstanding requirement and expectation
> >
> > and yet we see repeated deviations of those expectations from multiple
> > CAs. IMO this is a clear indication that the policy/procedure is not
> > clear enough and needs to be improved/clarified.
>
> Somewhere there's a spectrum here, I hope we can agree. "underscores" is something that should have been clear, and yet CAs felt it wasn't, because the word "underscore" wasn't mentioned, just the rules that prohibited underscores. "No MITM CAs" is something that should have been clear, and yet CAs felt it wasn't, because the policy didn't say "no MITM CAs", it just said "must validate domains". Mozilla Policy has long required disclosure of what validation methods CAs use - and yet we see CAs still, in 2021, failing to do so in their CP/CPS, or disclosing methods long forbidden.
>
> I want to make sure that we're clear between "clarifying existing requirements" and "imposing new requirements". Your message implies that this is the latter, but that couldn't be further from the truth.

I meant it as the former but I understand that some CAs may be caught in between those two. For example, in the U-label issue, the existing requirements were not entirely clear, as demonstrated by the related discussion thread. So, when you clearly stated that there is a clear expectation that it is not allowed to be in the subject:commonName, it appeared as a "new requirement" to CAs that were using it.

My point is that when this community sees repeated failures from many different CAs regarding the same issue, among other things it could also mean that there is some problem with the existing language either of the policy or a requirement or even an RFC language. We can ignore the signs and keep on seeing failures, even from CAs that have a good history of following all Root store operators' guidance and policies, or reduce this probability (of a misunderstanding of existing language) and provide clear language making the expectations unambiguously clear.

Cases like MiTM CAs were not encountered by a large number of CAs but the security risk of it being "misunderstood" was so great that it was explicitly spelled out for all CAs. The use of "underscores" in DNS labels was identified as a repeated failure by many CAs. The course of action to clearly forbid it was IMO the right thing to do, just like with the U-labels.

I didn't go over all the bugs you sent but after reading the first one (https://bugzilla.mozilla.org/show_bug.cgi?id=1391054#c3) you end your comment with "Weekly updates, for example, would be entirely appropriate". This comment was asking the CA to provide weekly updates while addressing the concerns raised in the incident report. This seems perfectly aligned with the existing language in the guidance provided by Mozilla. My concern is related to providing weekly "ping messages" to incidents after the CA considers that all remediation steps are completed, pending review from a Mozilla official.

All I'm saying is that enforcement of weekly updates to open incidents regardless of the state (CA remediation is completed or in progress), if that is the longstanding requirement and expectation, should be clearer in the guidance page.

Dimitris.

Ryan Sleevi

Jun 24, 2021, 1:19:45 PM

to Dimitris Zacharopoulos, Ryan Sleevi, dev-secur...@mozilla.org

On Thu, Jun 24, 2021 at 11:55 AM Dimitris Zacharopoulos <ji...@it.auth.gr> wrote:

> My point is that when this community sees repeated failures from many different CAs regarding the same issue, among other things it could also mean that there is some problem with the existing language either of the policy or a requirement or even an RFC language. We can ignore the signs and keep on seeing failures, even from CAs that have a good history of following all Root store operators' guidance and policies, or reduce this probability (of a misunderstanding of existing language) and provide clear language making the expectations unambiguously clear.

Right, I believe I understood that as your intent.

The point of my previous reply was to show how even when language is believed to be unambiguous, we still see patterns of failures. I appreciate that your theory is "We see failures, because the language is ambiguous", and so your solution: "clarify the language", would address this. My previous reply was to highlight "We see failures, even when the language is unambiguous" (the first batch of bugs), to highlight that we can't assume that repeat failures are inherently the results of ambiguities.

The second set of bugs were to reflect the "long-standing" aspect of this, and showing both positive and negative cases here. This isn't an argument against providing additional clarity in policy, but rather, to highlight additional examples of past clarifications (and this was a brief scan; I believe there are more bugs here, I just find Bugzilla's search interface a bit rough), as well as to provide some examples of where and why those past clarifications are useful, which is more speaking to the second point your original message raised: are these useful.

Put differently, I understood your points as:

(1) Is it useful for {CAs, Root Stores, the Community} to provide updates after the CA believes the issue is resolved?
(2) If it is useful, is the expectation existing or new?
(3) If it's existing, can/should this be made clearer?

Hopefully that's not too reductionist. I believe the first point, yes, it's very useful - and my previous messages tried to provide explanations as to how/why it's useful for at least _some_ of the community.

To the point of expectation being existing or new, the second batch of bugs tried to capture existing, going back for several years.

To the final point, I haven't really said anything yet, because working out new language is only useful if there's agreement on it being a useful practice, which I totally understand and appreciate that there's still some discussion about.

Ben Wilson

Jun 24, 2021, 1:59:56 PM

to Ryan Sleevi, Dimitris Zacharopoulos, dev-secur...@mozilla.org

A CA can avoid posting repetitive, weekly "No updates" by posting a comment in the bug that they will not have an update until X date and request that a "next update" be set in the whiteboard. If the proposed next update is reasonable, then we can amend the whiteboard to indicate a date by which a next update is due. If the CA provides an update by that date, and the "next update" is not reset in the whiteboard, then the CA should resume providing weekly updates or request that the "next update" be reset.

If the CA believes that a CA Compliance issue in Bugzilla has been fully resolved, then it is welcome (1) to add a comment to that effect in the bug, (2) to ask that the bug be Resolved as Fixed, and (3) to add a Need Info for me to review the status of the issue. However, until expressly acted upon, such requests to close and need-infos do not waive the need to provide updates accordingly.
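
The cadence described in this message can be sketched as a small Python function. This is an illustrative sketch only, not Mozilla or Bugzilla tooling: the function name `update_due_by`, its parameters, and the seven-day constant are assumptions made for the example. It also assumes that any CA update posted after the whiteboard "next update" date was set counts as the promised update, after which the weekly cadence resumes unless the date is reset.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "weekly" is modelled as exactly seven days.
WEEKLY = timedelta(days=7)


def update_due_by(
    last_ca_update: date,
    next_update: Optional[date] = None,
    next_update_set_on: Optional[date] = None,
    needinfo_pending: bool = False,
) -> date:
    """Return the date by which the CA's next update is expected.

    All parameter names are hypothetical:
    - last_ca_update: date of the CA's most recent comment on the bug
    - next_update: "next update" date recorded in the whiteboard, if any
    - next_update_set_on: date the whiteboard was amended with that value
    - needinfo_pending: whether a need-info or close request is outstanding
    """
    # A pending need-info or close request does not waive updates,
    # so this flag intentionally has no effect on the result.
    del needinfo_pending

    granted = next_update is not None and next_update_set_on is not None
    # The whiteboard date applies only until the CA posts again after it was set.
    if granted and last_ca_update <= next_update_set_on:
        return next_update

    # Default, and the fallback once the granted date has been used up: weekly.
    return last_ca_update + WEEKLY
```

For example, a CA that last commented on 2021-06-01 with no whiteboard date would be expected to comment again by 2021-06-08, whether or not a need-info is pending. If a "next update" of 2021-07-15 was recorded on 2021-06-02, the expectation moves to 2021-07-15, and an update posted on that day restarts the weekly clock unless the whiteboard is reset.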
