Underscore characters

Jeremy Rowley

Dec 18, 2018, 5:28:40 PM

to mozilla-dev-security-policy

We're looking at the feasibility of replacing the certificates that contain
underscore characters by Jan 15th. Revoking all of the certificates will
cause pretty bad outages. We're prepared to revoke them but would like to
discuss (before the date) what should happen if we don't revoke. There are
about 15 customers (whose names I can't disclose yet, but I'm working on
the list of certs). The number of certificates ranges from 100 to 1,700
per customer.

The primary reason for this is that every one of these organizations is in a
holiday blackout. The blackout periods end between Jan 12 and Feb 15. Can we
start this discussion now on what this means? I'll provide certificate lists
as I get a timeline for when each customer plans to replace them.

Jeremy

Jeremy Rowley

Dec 18, 2018, 5:43:16 PM

to Jeremy Rowley, mozilla-dev-security-policy

The total number of certs impacted is about 2200. Just more info.

Ryan Sleevi

Dec 18, 2018, 9:35:03 PM

to Jeremy Rowley, mozilla-dev-security-policy

Jeremy,

It seems like any answer for what it "might" look like if a CA violated the
BRs in a particular way is going to be predicated on what the incident
report says. In the case of a hypothetical like this, it seems like the
hypothetical incident report would discuss what is planned or proposed, and
should a CA go forward with such an intentional violation, the 'actual'
incident report would equally consider how accurate that was.

Recall that the approach to incident reporting is not punitive - it's to
make sure that we're addressing systemic gaps and issues, that we've
understood the issues, and have the available data to assess the impact,
risk, and any patterns of issues. The incident reporting template is one
way to provide that data in a structured way and to gather feedback.

I think a minimum next step is to move from the abstract discussion to the
concrete: imagine you went forward on Jan 15 and had to file an incident
report. Write the report like that. Include the timeframes, affected
certificates, impact, root causes, remediation plans, etc. Having a
complete presentation of what the discussion is about seems critical to
having that discussion, because it would be unreasonable to expect
information to trickle in and new customers or use cases to be added as the
discussion progresses.

Thus there's a balance to be struck: Treating each hypothetical as a
"separate" incident report runs the risk of being considered in isolation,
ignoring both the systemic gaps and the cumulative risk. At the same time,
treating it as a "singular" incident report tries to paint all problems in
the same stroke, and can overlook distinct systemic issues. Both cases run
the risk of "scope creep", that is, constantly adding to or expanding the
scope, which is as well received in legitimate incident reports as it is in
hypothetical ones (which is to say: not well). Perhaps the best analogy is to
that of subordinate CAs: each time a subordinate has an issue, that's an
incident report, and a pattern of issues at distinct subordinates is
equally a concerning issue for the parent CA. You don't want to loop all
distinct subordinates into one issue, but you also don't want to lose sight
of the systemic issues with the parent.

Beyond that framing and execution, it seems useful to suggest that any
timeline about underscores should at least acknowledge Ballot 202 in June
2017 and any/all steps the CA took leading up to and following SC12.

None of this is radically new or should be surprising: DigiCert and other
CAs have already had similar conversations in discussing other matters of
BR compliance and revocation. All of these have become part of the CA's
record of incidents. When a CA proposes extending revocation timelines, the
facts, risks, scope, and patterns play a core part in determining the
short-term acceptability of the proposal, and unquestionably all factor into
any long-term discussions that may later happen.

The one last closing thought is that I think we're in the waning days for
when such hypothetical issues or concrete delay proposals can or should be
discussed. Given the many discussions that have been had regarding
revocation - regarding technical non-compliance, compromised keys, weak
validation, etc - the argument that "replacing a cert is hard" is not
really going to be acceptable anymore without a demonstration of what
steps are being taken by CAs and Subscribers to mitigate that risk (such as
automation) and communicate expectations (such as in Subscriber Agreements
or Terms of Sale). I don't think we want to go through 2019, and certainly
not come out of it, having the same conversations we've been having in
2018. The best way to prevent that is for CAs to take clear steps to work
to resolve these issues with their customers, so that it never becomes an
issue for them, or their CA, in the first place. CAs that aren't able to
demonstrate steps towards that in future discussions are unlikely to be
looked upon too favorably if there are future incident reports.

> _______________________________________________
> dev-security-policy mailing list
> dev-secur...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-security-policy
>

Jeremy Rowley

Dec 18, 2018, 9:52:06 PM

to ry...@sleevi.com, mozilla-dev-security-policy

Yeah – I’ll be providing an accurate incident report (working on gathering all the information). The incident report assumes we don’t revoke of course. Revocation is still on the table. However, I wanted to start the conversation with everything I know so far:

1) ~2200 certs
2) Roughly 15 companies
3) Only one has publicly chimed in so far on the Mozilla thread (still hoping more will…)
4) Revocation of all certs will occur by May 1, 2019, depending on how the discussion here goes.
5) The common thread is that the Jan 15th deadline falls in the blackout window of most orgs. They generally come off it between Jan 15-Feb 15. They can’t replace the cert or change the domain so the 30 day cert option doesn’t help.
6) We provided notice as soon as the ballot passed. We blocked issuance prior to that date, but we’d hoped that the certs could remain valid until expiration. We had trouble with our BI providing the information so some notices went out later than I’d hoped. I’ll find the exact date on when all notices were complete.

Ballot 202 failed. I’m not sure how it’s relevant other than to indicate there was definite disagreement about whether underscores were permitted or not. As previously mentioned, I didn’t consider underscore characters prohibited until the ballot eliminating them was proposed in October. I know the general Mozilla population disagrees but, right or wrong, that’s the root cause of it all. I can explain my reasoning again here, but I doubt it materially alters the conversation and outcome.

Peter Bowen

Dec 18, 2018, 10:15:01 PM

to jeremy rowley, Ryan Sleevi, mozilla-dev-s...@lists.mozilla.org

On Tue, Dec 18, 2018 at 6:52 PM Jeremy Rowley via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> Ballot 202 failed. I’m not sure how it’s relevant other than to indicate
> there was definite disagreement about whether underscores were permitted or
> not. As previously mentioned, I didn’t consider underscore characters
> prohibited until the ballot was proposed eliminating them in Oct. I know
> the general Mozilla population disagrees but, right or wrong, that’s the
> root cause of it all. I can explain my reasoning again here, but I doubt it
> materially alters the conversation and outcome.
>

I agree with Jeremy that the situation with underscores was unclear prior
to the ballot in October. Three years ago when I was writing certlint, my
very first public commit has the comment:
# Allow RFC defying '*' and '_'

I honestly haven't been paying a lot of attention to the CA/Browser Forum
recently. Given that the rationale for getting rid of underscores is RFC
compliance, did the ballot also disallow asterisks? They are also not
allowed by the "preferred name syntax", as specified by Section 3.5 of
[RFC1034] <https://tools.ietf.org/html/rfc1034#section-3.5> and as modified
by Section 2.1 of [RFC1123] <https://tools.ietf.org/html/rfc1123#section-2.1>.

Thanks,
Peter
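
For readers who want to see what the "preferred name syntax" check looks like in practice, here is a minimal sketch. It is not certlint's code and not any CA's actual linter; the function name non_ldh_labels and the regular expression are assumptions made for illustration. It treats a label as valid only if it consists of letters, digits, and hyphens, starts and ends with a letter or digit, and is at most 63 characters long, which is one reading of RFC 1034 Section 3.5 as relaxed by RFC 1123 Section 2.1.

```python
import re

# Assumed reading of the preferred name syntax: letters, digits, hyphens;
# first and last character is a letter or digit; 1 to 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def non_ldh_labels(dns_name):
    """Return the labels of dns_name that fall outside the preferred name syntax."""
    labels = dns_name.rstrip(".").split(".")
    return [label for label in labels if LDH_LABEL.fullmatch(label) is None]


if __name__ == "__main__":
    # Hypothetical names, used only to show which labels get flagged.
    for name in ("www.example.com", "my_host.example.com", "*.example.com"):
        print(name, non_ldh_labels(name))
```

Under this reading, both the underscore label and the wildcard label are flagged, which is the point of the question above: a strict RFC-compliance rationale would catch asterisks as well as underscores.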

Jakob Bohm

Dec 19, 2018, 4:43:53 AM

to mozilla-dev-s...@lists.mozilla.org

The problematic section of RFC5280 contains this paragraph, wedged between
encoding descriptions (which happen to include a reference to the "preferred
syntax" of host names) and the corresponding ASN.1 syntax:

Finally, the semantics of subject alternative names that include
wildcard characters (e.g., as a placeholder for a set of names) are
not addressed by this specification. Applications with specific
requirements MAY use such names, but they must define the semantics.

A different RFC defines the modern semantics of wildcard certificates,
thus providing the required definition.

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Ryan Sleevi

Dec 19, 2018, 9:17:51 AM

to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-security-policy

While I appreciate you sharing what you have, as I tried to capture in my
previous message, I don't believe there can be any discussion or
consideration in earnest without the full and final information. I don't
think it's reasonable to drip in information piecemeal, given the impact
and effect that can have, whether that is incomplete information about the
issue or additional customers being added.

You're making a huge request of the community, arguably one that
borders on unreasonable given the set of issues had in the past. I do want
to help you achieve your goal of understanding what that would look like,
but that's only possible with full and complete information. You mentioned
it's roughly 15 companies. If you had ten committed, but were waiting on
the remaining five to give the OK, I think it would be irresponsible to
hold up having that conversation until you get the OK. Quite simply, if you
don't get the OK from those five companies, then we shouldn't even be
discussing them. Ultimately, the ball is in your court as to how you want
to address this with your customers, but I think that delaying the
conversation in order to make sure "stragglers" are included is probably
not the wisest for your customers that have their stuff together.

As such, I don't think the conversation can begin without that
(hypothetical) incident report, and I look forward to you deciding what
that scope will be in order to share and commit to it.

Jeremy Rowley

Dec 19, 2018, 12:21:16 PM

to ry...@sleevi.com, mozilla-dev-security-policy

We will post the full list of exceptions today.

One of the big factors should be the risk to the industry/community if the certificates aren’t revoked. Perhaps we can identify what the risk to the community is in revocation delays first? There’s no need to know the exact certs to talk about what the risk associated with underscore characters is. Could you please explain the risk to the community in a revocation delay? The “unreasonable” argument isn’t really supported without that understanding.

Ryan Sleevi

Dec 19, 2018, 1:05:17 PM

to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-security-policy

Look forward to seeing and discussing once the full scope of the request is
shared.

On Wed, Dec 19, 2018 at 12:21 PM Jeremy Rowley <jeremy...@digicert.com>
wrote:

> We will post the full list of exceptions today.
>
>
>
> One of the big factors should be the risk to the industry/community if the
> certificates aren’t revoked. Perhaps we can identify what the risk to the
> community is in revocation delays first? There’s no need to know the exact
> certs to talk about what the risk associated with underscore characters is.
> Could you please explain the risk to the community in a revocation delay as
> the “unreasonable” argument isn’t really supported without that
> understanding.
>
>
>

> *From:* Ryan Sleevi <ry...@sleevi.com>

> *Sent:* Wednesday, December 19, 2018 7:17 AM

> *To:* Jeremy Rowley <jeremy...@digicert.com>

> *Cc:* ry...@sleevi.com; mozilla-dev-security-policy <
> mozilla-dev-s...@lists.mozilla.org>
> *Subject:* Re: Underscore characters

>
>
>
> While I appreciate you sharing what you have, as I tried to capture in my
> previous message, I don't believe there can be any discussion or
> consideration in earnest without the full and final information. I don't
> think it's reasonable to drip in information piece meal, given the impact
> and affect that can have - whether incomplete information for the issue or
> whether additional customers being added.
>
>
>
> You're making a huge request of the community, arguably one that
> borderlines unreasonable given the set of issues had in the past. I do want
> to help you achieve your goal of understanding what that would look like,
> but that's only possible with full and complete information. You mentioned
> it's roughly 15 companies. If you had ten committed, but were waiting on
> the remaining five to give the OK, I think it would be irresponsible to
> hold up having that conversation until you get the OK. Quite simply, if you
> don't get the OK from those five companies, then we shouldn't even be
> discussing them. Ultimately, the ball is in your court as to how you want
> to address this with your customers, but I think that delaying the
> conversation in order to make sure "stragglers" are included is probably
> not the wisest for your customers that have their stuff together.
>
>
>
> As such, I don't think the conversation can begin without that
> (hypothetical) incident report, and I look forward to you deciding what
> that scope will be in order to share and commit to it.
>
>
>
> On Tue, Dec 18, 2018 at 9:51 PM Jeremy Rowley <jeremy...@digicert.com>

> *From:* Ryan Sleevi <ry...@sleevi.com>
> *Sent:* Tuesday, December 18, 2018 7:35 PM
> *To:* Jeremy Rowley <jeremy...@digicert.com>
> *Cc:* mozilla-dev-security-policy <
> mozilla-dev-s...@lists.mozilla.org>

> *Subject:* Re: Underscore characters


Matt Palmer

Dec 19, 2018, 8:29:29 PM

to dev-secur...@lists.mozilla.org

On Wed, Dec 19, 2018 at 05:20:59PM +0000, Jeremy Rowley via dev-security-policy wrote:
> One of the big factors should be the risk to the industry/community if the
> certificates aren’t revoked. Perhaps we can identify what the risk to the
> community is in revocation delays first? There’s no need to know the
> exact certs to talk about what the risk associated with underscore
> characters is. Could you please explain the risk to the community in a
> revocation delay as the “unreasonable” argument isn’t really supported
> without that understanding.

I think an important risk to the community of not revoking as per the CA/B
Forum's accepted ballot timeline is sending the message that the rules of
the game are optional, and there is commercial benefit to be gained in not
following the rules.

I'm sure there are some CAs whose systems were always set up in a
standards-compliant fashion, and refused to issue certificates for invalid
names. I'm equally sure that at least one of those CAs lost a sale over the
years as a result of that, and that sale went to a competitor which was
*not* adhering to the standards.

Now the issue has been raised, clarification made, and a decision has been
made as to how to move forward. To provide further benefit (in the form of
a waiver from the agreed-upon rules) to CAs which have failed to follow the
rules in the past does not encourage adherence in the future. Certainly, if
I were the CEO of a for-profit CA which *had* followed the no-underscores
rule, I might be inclined to gently encourage my developers to play a little
faster and looser with their interpretations in the future, if it would
provide my organisation with a revenue benefit, and it was clear that there
was to be no meaningful negative consequence *to me* as a result. To do
otherwise would actually be contrary to the stated goals of the
organisation.

Please don't misunderstand my words to think that I'm saying that any CA
*deliberately* ignored the standards around underscores in order to sell
some more certs. I'm well aware that the rules around valid hostnames,
domain names, DNS labels, etc are not the clearest, and most people wouldn't
read them even if they were.

It's undeniable, though, that CAs which allowed underscores in places that
are supposed to be valid LDH domain names made a mistake, and to
deliberately misquote Jurassic Park's John Hammond, "I don't blame people
for their mistakes, but I do expect them to take responsibility for them."
To not expect CAs to take responsibility for their mistakes sends a
*terrible* message to the entire ecosystem, one that would have far greater
long-term repercussions than any isolated harm from the presence of
underscores themselves.

Whilst it's not quite the textbook definition, part of the Wikipedia page on
"Moral Hazard" says, "when a person takes more risks because someone else
bears the cost of those risks". That's a pretty reasonable expression of
what's going on here.

- Matt

Jeremy Rowley

Dec 20, 2018, 12:55:39 AM

to ry...@sleevi.com, mozilla-dev-security-policy

Done:

https://bugzilla.mozilla.org/show_bug.cgi?id=1515564

It ended up being about 1200 certs total that we are hearing can’t be replaced because of blackout periods.


Wayne Thayer

Dec 20, 2018, 11:04:02 AM

to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-security-policy

Jeremy,

On Wed, Dec 19, 2018 at 10:55 PM Jeremy Rowley via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> Done:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1515564

Thanks for submitting this.

> It ended up being about 1200 certs total that we are hearing can’t be
> replaced because of blackout periods.

These 1200 are only the ones that can't be replaced by Jan 15th and will
cause outages if revoked then?

I don't think the information you've supplied is anywhere close to what
Ryan asked for or what the community needs in order to make the decision
you're asking for. I'm looking for specifics on why every cohort (i.e.
every deployment scenario for every customer requesting an extension) of
these certificates can't be revoked, such as:
* Specific per-customer change freeze dates and the rationale for them
* Explanations of the effort and risk involved in replacing them
* Reason that publicly-trusted certificates are in use
* Reason that the provision for 30-day certificates isn't helpful

Only with this information can we have some assurance that any exceptions
are limited to the bare minimum and that we're able to learn and improve.

Without this information, we're still in the situation of blindly trusting
DigiCert to do the right thing, which is no different than having a CA
report an incident after the fact.

Is it realistic to expect that you can provide the level of detail that
Ryan and I are requesting prior to 15-Jan?

Jeremy Rowley

Dec 20, 2018, 1:13:00 PM

to Wayne Thayer, ry...@sleevi.com, mozilla-dev-security-policy

Thanks Wayne. Happy to update with that information. We’ll try to provide it all by end of the year, definitely before Jan 12. I can answer two of these generally now:

* Reason that publicly-trusted certificates are in use

- They are used on websites and infrastructure accessed through browsers. There are some that don’t need to be publicly trusted certificates, but the systems still check revocation. If Mozilla blocked the certs, the impact would be minimal. If the certificates are revoked, the infrastructure goes down. Would you like me to identify which ones could be blocked by Mozilla vs. revoked?

* Reason that the provision for 30-day certificates isn't helpful

- The blackout periods don’t allow any change in infrastructure so replacing the certificates is just as difficult as changing the domain names. Even large non-tech companies don’t have a ton of automation so it’s not surprising that they’ll need more days to replace the certs if their blackout period ends the 12th.

Of course, I’ll provide further information in the incident report as more specific information comes in.


Ryan Sleevi

Dec 20, 2018, 1:16:27 PM

to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-security-policy

Thanks for filing this, Jeremy.

If I understand correctly, the question DigiCert is asking is: "If we
submitted this as an incident report, would it be likely that conversations
about distrusting DigiCert would begin?", and that's what you're trying to
gauge from the community?

I think Wayne's already captured the "We need more information", but I
think it may be helpful to explain the reasoning and thinking here.

The Baseline Requirements and Root Program policies exist for a purpose: To
provide a consistent set of expectations for CAs, which meet the security
needs of the products using or operating those policies. As these policies
tend to call out, a CA may be removed (distrusted) for any reason or no
reason at all - it's entirely at the Program's discretion. That said, history
tends to largely see removals for patterns of issues that, in aggregate,
demonstrate an ongoing and significant risk to users and the Internet at
large, although there have been CAs removed for single incidents in the
past - such as key compromise or issuing MITM certificates.

As a CA, the risk is that any and every incident may lead to the CA's
removal, and thus the best path to avoid that is to not have incidents in
the first place. Further, a CA with a pattern of incidents is not wrong to
be even more careful when it comes to presenting new incidents, especially
if they realize that they share similar root causes or further demonstrate
problematic patterns. That's not to say that if you only had a single
incident, you wouldn't be removed - as the policies capture, any reason and
no reason - but on the balance, it has historically tended to be
less-likely that first-time incidents lead to removal.

When incidents happen, it becomes necessary for Root Programs and the
communities they represent or collaborate with to evaluate the details of
the incident, as part of making a determination about what next steps are
appropriate. This involves investigating what the underlying root causes
are, to both ensure that the current CA with the incident understands the
significance, the severity, and the steps to remediate it, as well as to
help the industry at large develop and learn best practices, to prevent
future incidents. We're not the only industry to do this - in many ways,
it's borrowed from the aviation industry, that recognizes that critical
safety functions deserve thoughtful and detailed analysis to prevent harm
coming to those that trust in them.

Incident reports also serve to triage the issues - to work and identify the
risks and make sure they're being mitigated in a timely fashion. Sometimes,
the mitigation of risk may be to remove trust in the CA, other times, there
may be less significant steps that can be taken to address both the
immediate problem and the underlying issues.

DigiCert is now in a precarious position. As a CA, it knows that every one
of its Subscribers has agreed, in some legally binding form, that if the
CA has misissued a certificate, it MUST be revoked within 24 hours
(or, very recently, and only in some cases, 5 days). The CA has a duty and
obligation to their customers, the Subscribers, to make sure that they
understand this. This is not about a punitive measure or punishing those
users for something their CA did - it's because the fundamental and
inherent risk is that there are incidents where certificates will need to
be replaced in as little as 24 hours, up to and including trust in the CA
being removed. To go back to that aviation analogy, the reason planes have
maintenance schedules is not because they're going to completely come
unglued and fall apart if you miss that maintenance schedule by a day - but
because of the severe and significant harm that comes about from having no
maintenance schedule at all, or even simply one that just isn't suitable
for the risks (to life, property, and safety). Matt Palmer's reply earlier
in this thread further expands on some of the other risks here and the
hazards that come with them.

At the same time, DigiCert is, on behalf of their customers, saying that
even though both DigiCert and their customers agreed to the 24-hour
revocation rule, there are circumstances and situations that make that
risky. Despite being an industry standard (as captured in the Baseline
Requirements), and despite these agreements, DigiCert is concerned that
there are consequences for these customers that did not take adequate
precautions to meet the expectations they agreed to, and is trying to
perform a risk analysis. Further, they're looking for feedback from the
community to make sure that their analysis of the risk - the disruption to
their customers - is significant enough that it warrants both the immediate
risk of not revoking, the business risk to DigiCert, and the lasting risk
to the ecosystem, in intentionally violating the BRs.

It's not my intent to sound harsh, but to make sure it's clearly and
unambiguously stated as to what's happening. The reason for doing this is
because, on the balance, this seems to be exactly the recommendation in
https://wiki.mozilla.org/CA/Responding_To_An_Incident . This is called out
explicitly in the section on Revocation, which instructs the CA to perform a
risk analysis, develop a report, and devise a plan and timeline for
remediation. Further, this analysis should consider feedback of
third-parties, calling out explicitly both the CA's auditor and Root
Stores, as a means of checking that the analysis is balancing the right
tradeoffs, and that the plan is reasonable.

When a CA reports an incident, there is a discussion about what
certificates were impacted and the CA's plan and timeline to remediate them
- with the standing expectation being immediate revocation without some
otherwise demonstrable exigent risk. These plans factor into how the
incident is responded to by the Root Program - for example, the plan may
have inappropriately balanced the risk, they may have outright
misrepresented it, they may have misunderstood or misled the community on
the size and scope of the issue, etc. Further, even if a plan is agreed to
as being acceptable (i.e. the incident not leading outright to discussions
of distrust), the incident is not actually closed out until the CA has
demonstrated the successful execution against that plan.

I know this message is long, and much of it is stuff you know (but with which
others following may be unfamiliar), but it gets both to the heart of
the request you're making and the key expectations to be able to respond.
You want to know whether, if this incident were filed, it would lead to a
discussion of distrust in some form, whether individually or in the
collective whole of the issues that DigiCert has had over the past several
years. The only reason we're even discussing this incident, specifically,
is because it relates to revocation following a previous incident
(underscores), which is the only thing acknowledged as even being up for
discussion or risk assessment by CAs. To be unambiguously clear, this would
be a wholly inappropriate request for any other form of BR violation, but
because it's specifically about balancing revocation and risk, it is
allowed, for now.

In order to answer that, we need to know:
1) What's the scope of the issue
2) What are the risks, as identified by DigiCert, and are they meaningfully
explained?
3) What's the concrete plan for remediation being presented

As it stands, it sounds like you've provided #1, which is Question 4 on the
incident report template. As called out by Wayne, #2 seems missing, and
that's captured by Question 6 on the incident report template, combined
with the facts and details from Question 2. Most concerning to me, however,
is that I can't find an answer to #3 - which is what Question 7 on the
template is trying to help identify. These are things that only DigiCert
can answer, and like any other CA, it needs to provide sufficient detail to
demonstrate that the issues are understood and being meaningfully
addressed, and that opportunities to improve are actively being pursued.

Please don't think of this as punishing DigiCert for even asking. I think
it's commendable that, for the sole topic of revocation, DigiCert is taking
steps to engage in the risk analysis early, and publicly. You're not the
first CA to do so - other CAs have shared remediation plans regarding, for
example, TLS validation methods, and those too provided ways to balance
risk and measure progress. That said, as I mentioned earlier, I think that
going into 2019, we collectively, and CAs particularly, need to be taking
steps to prevent these conversations from ever being necessary, and,
fortunately or unfortunately for DigiCert, this places y'all in a unique
position of having both the opportunity to use this long-standing and existing
practice, and high expectations on how to meaningfully ensure this
process never has to happen again.

All of this is said to make it clear that #3 - the concrete plan - not only
needs to include the remediation plan for these specific certs to be
revoked, and concrete dates and measurable milestones to see how well
DigiCert is progressing on that, but also needs to provide details as to
how DigiCert is taking steps to ensure that their customers do not find
themselves in these positions going forward.

For example, a commitment to open, standards-based automation solutions
provides an interoperable, industry-wide means by which such customers can
ensure certificates are replaced in a timely manner, whether because the
issuing CA needed to reissue, or because the issuing CA was no longer trusted.
Similarly, one could imagine that a plan also included a communication plan
to existing Subscribers to remind them of the details of the Subscriber
Agreement, which is industry standard and applies to all CAs, in requiring
timely revocation, and providing resources to help those customers prepare.
These are just two examples that, from the limited details provided, seem
to apply, but I expect that as the questions Wayne highlighted about the
risk analysis are answered, others may be identified as
well. And that's what the incident process serves.


Wayne Thayer

Dec 20, 2018, 2:25:23 PM

to Ryan Sleevi, Jeremy Rowley, mozilla-dev-security-policy

Jeremy,

It's good to hear that you do believe you can provide the necessary level
of information prior to 15-Jan. Given that, I'm now thinking of this as if
it were a normal incident except that we're moving the reporting prior to
the incident actually occurring. With 15 affected customers, and perhaps
many more deployment scenarios, I would ask you to break this into separate
incident reports per customer as Ryan has previously suggested/requested. I
can understand the desire to not have 15 separate compliance bugs filed
under DigiCert for what is arguably the same issue, but I think that
reporting separately per customer will help to ensure that we receive the
level of detail needed to assess the hypothetical incident.

I'm also not a fan of the proposed 30-April exceptional revocation
deadline. This provides zero opportunity to employ the 30-day cert option
if something is missed, and it seems to be an arbitrary date rather than an
evaluation of the earliest date by which each customer can safely replace
their non-compliant certificates.

- Wayne


Jeremy Rowley

Dec 20, 2018, 2:28:35 PM

to Wayne Thayer, Ryan Sleevi, mozilla-dev-security-policy

I can break down the date by customer. April 30 was the last date for all customers. The actual revocation occurs sometime between Jan 15th and April 30th (still working on a per cert basis to determine this). Note that we actually have the 30 day option available and are recommending it as a remediation instead of an exception. We’d prefer customers to move to the 30 day cert sooner if they can replace the cert but not change the domain name. I’ll file a separate incident report per company.

Thanks!

Jeremy


Jeremy Rowley

Dec 20, 2018, 5:34:37 PM

to Wayne Thayer, Ryan Sleevi, mozilla-dev-security-policy

Hey all,

Here’s the first of the companies. Figured I’d do one and see if it has the information you want.

https://bugzilla.mozilla.org/show_bug.cgi?id=1515788

I think this answers all of your questions (except Ryan’s question about remediation). Could you let me know if more detail is required or if you’d like additional info included?

Matt Palmer

Dec 20, 2018, 6:54:37 PM

to dev-secur...@lists.mozilla.org

On Thu, Dec 20, 2018 at 10:34:21PM +0000, Jeremy Rowley via dev-security-policy wrote:
> Here’s the first of the companies. Figured I’d do one and see if it has the information you want.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1515788

Complete side-note: when the customer said you couldn't identify them by
name, did they know you were going to be linking to a whole pile of
certificates with their organizationName in them? <grin>

> I think this answers all of your questions (except Ryan’s question about
> remediation). Could you let me know if more detail is required or if
> you’d like additional info included?

The question that comes to my mind most of all is tangentially related to
Ryan's question around long-term remediation, but I'll ask it anyway as it's
at least somewhat independent.

You state in the bug you linked that "[The certificates] can't be replaced
before [April 30, 2019] because of the code freeze and the risk of an
outage". That is an absolute statement, which I take to mean that there is
*no* possibility of certificates being replaced in the freeze period (Oct
15-Feb 1).

So, my question is: what would this organization do if they had suffered a
key compromise (to take one possible reason for immediate revocation, which
happens to be forefront in my mind) on Oct 16? Would this organization
continue to use a certificate which uses a known-compromised key until
possibly as late as April 30 -- which, if my counting-on-fingers is correct,
is approximately six and a half months? Would DigiCert consider not
revoking the certificate with a compromised key for that long, if the
customer asked them to? Would DigiCert expect trust stores to bless that
decision?
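
The finger-counting above can be checked with a short date calculation. This is only a sketch: it assumes the hypothetical compromise falls on Oct 16, 2018 and the proposed deadline is April 30, 2019, and it converts days to months using an average month length, which is a simplification introduced here.

```python
from datetime import date

# Dates taken from the hypothetical above; the years are assumed from context.
compromise_date = date(2018, 10, 16)
proposed_deadline = date(2019, 4, 30)

# Whole days between the hypothetical compromise and the proposed deadline.
exposure_days = (proposed_deadline - compromise_date).days

# Average Gregorian month length (365.25 / 12 days), used only for a rough estimate.
exposure_months = exposure_days / (365.25 / 12)

print(f"{exposure_days} days, roughly {exposure_months:.1f} months")
```

Under those assumptions the gap is a little under six and a half months, so the estimate in the text is about right.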

If the answer to the above questions is "no, of course not!" (as I would
sincerely hope they would be), then your absolute statement that the
certificates "*can't* be replaced" should probably read something more like
"the organization would prefer not to replace the certificates before then,
because it is a lot of work and entails some degree of risk". Which takes
us back to the calculus of options:

1. DigiCert revokes, organization changes domain names and certs: the
organization does lots of work, and takes the risk of Things Going Very
Wrong because of domain name and cert changes.

2. DigiCert revokes, organization switches to 30 day certs: the organization
does lots of work, and takes the risk of Things Going Very Wrong because
of cert changes.

3. DigiCert revokes, organization doesn't swap certs: the organization does
lots of work, and takes the risk of Things Going Very Wrong because of
revocation checking.

4. DigiCert doesn't revoke, on its own behest: the organization does no
work, takes no risk, while DigiCert takes the risk of consequences from
trust stores of failing to follow the BRs.

5. DigiCert doesn't revoke, trust stores bless this decision: the
organization does no work and takes no risk, DigiCert takes no risk,
while trust stores take the risk that CAs' future behaviour is influenced
by the appearance that deliberately failing to adhere to the BRs carries
little-to-no consequences.

Given that the entities which made the decisions which led to the current
situation are DigiCert (for allowing issuance of invalid certificates) and
this organization (which failed to heed their subscriber agreement and
decided to build an infrastructure which cannot be adjusted on the timeframe
required under their subscriber agreement), can you explain why it is
reasonable for the trust stores -- which appear to have done nothing
inappropriate to cause the current state of affairs -- to be the ones taking
on *any* of the risk here?

Please don't think that I'm not sympathetic to the situation this "Major
Pharmacy Benefits Manager" and DigiCert are in -- my day job is keeping
production systems up and running, and the idea of having to make fast
changes isn't one I enjoy. Running a commercial entity is never made any
easier when you have to cause problems for your customers. SC12 is not the
phase-out plan *I* would have chosen, were I Benevolent Dictator of the
Internet. However, now that it is the plan which has been put in place,
following the rules of the game trust stores and CAs have agreed to play by,
I find it disconcerting that DigiCert and Organization One want to shift the
risk for their decisions onto trust stores. What is the *benefit* to trust
stores in taking on that risk, to the undeniable benefit to DigiCert and
Organization One?

Perhaps, at the end of the day, *that* is the real question to be answered.

- Matt

Wayne Thayer

Dec 20, 2018, 7:28:22 PM

to Matt Palmer, MDSP

I agree that more information is needed here. My hypothetical is that of a
critical vulnerability in one of Organization One's systems being
discovered on 16-Oct. Does Organization One hold off on patching until Feb?
If not, what makes these certificates different? Why is so much
coordination required if they are just used in browsers? Was a risk
assessment performed to evaluate the possibility of replacing them during
the freeze? Are routine changes permitted during the freeze? If so, why is
a certificate replacement not a routine change?

I've managed to convince myself that the path we're on is not one of
Mozilla accepting the risk, but rather one of Mozilla shifting the ensuing
incident process from after the risk is taken to before. Assuming that they
can provide enough information in advance and we trust that the incident
will play out as described, this gives DigiCert the benefit of knowing the
likely outcome of the incident investigation. It is still DigiCert deciding
to accept the risk - or not - just as would be the case if they chose not
to revoke without advance notification and discussion.

Jeremy Rowley

Dec 26, 2018, 11:13:56 AM

to dev-secur...@lists.mozilla.org

Hey Matt,

The trust stores are always free to ignore the CAB Forum mandates and make their own rules. Mozilla has in the past (see the Mozilla audit criteria exception for other audits outside of Webtrust and ETSI). The root stores are also the entities that determine what happens if the rules are violated. Thus, we're asking what the violation of this revocation timeline results in and whether Mozilla is enforcing the CAB Forum requirement. The browsers always decide the risk they want to bear and when that risk becomes unacceptable. The question we're asking is whether this particular mis-revocation provision would amount to unacceptable risk to the browsers.

I don't think we're asking browsers to take on any risk. In fact, the opposite. The risk of revocation is a browser outage for that website. A delay in revocation gives the operators of the specific certificates at issue more time to avoid an outage. Thus, risk is mitigated. A poor explanation, but I think we have to identify what the risk is before browsers can say they are taking on anything additional. The "CAs may be doing bad things in the dark" allegation can't be responded to because it's too vague. I'm also troubled to think that might be a concern as our policy is to over-report issues. Plus, that risk is pretty hard to sell to management as an immediate threat requiring replacement of their certificates. This lack of definition on the problem is also the main difference between this event and a compromised key. Explaining key compromise to executive management for an emergency exception to a blackout period is a lot different than explaining why hundreds of certificates require replacement because they contain underscores. I think everyone would benefit (myself included) if I could get more information about why underscore characters themselves present an actual risk. If we could get a statement on that, you'd see a lot less confusion.

Jeremy


Ryan Sleevi

Dec 26, 2018, 12:05:52 PM

to Jeremy Rowley, dev-secur...@lists.mozilla.org

Jeremy,

While I can't speak for Wayne, I tried to highlight how dangerous and
problematic this thinking and framing are. By framing it as you have, it
makes it much more difficult to see this as a productive discussion about
handling revocation following an incident, and instead about a CA arguing
that they should be able to ignore the BRs at will. I'm sure you can see
how that latter framing is especially problematic, and I think arguments
that try to present it as such have a chance at steering the conversation
very negatively.

You've heard from two browsers at least (Mozilla and Google) that they
expect an incident report, which means that they are enforcing the Baseline
Requirements and do view this as non-compliance by a CA. There is no
exception being granted - it's non-compliance. Further, the discussion
you're looking to have is seemingly not about whether this particular
incident is problematic in-and-of-itself, even though you've framed it as
such here, but instead whether the pattern and set of incidents represents
a concern about the ongoing risk posed by continued trust. A poor analogy,
but one that hopefully highlights the flaws in the argument you're making,
is a bit like asking "What's so bad about stealing a candy bar from the
shop", while trying to ignore whether you robbed the till the day previous
or have been stealing every day the past week.

The framing that seems to have resonated is that we are NOT talking about
whether or not stealing candy bars is OK and acceptable. We've seemingly
agreed it's bad, and thus (in the CA space) are expecting an incident
report and treating it as an incident. It would be extremely risky to
suggest that stealing is sometimes OK, both in the immediate and long-term.
The question being discussed is what to do if (or, in this case, when)
you're caught stealing, and what it would look like.

Matt's moral hazard is absolutely correct with respect to legitimizing
things - especially treating them as non-incidents. Similarly, I have
concerns with the ideas that CAs can or should ask the community
"Hypothetically, what would happen if we did (Bad Thing X)" - I think that
demonstrates less than stellar trust. That's why I suggested that this is a
continuation of the discussion about underscores - "So, a CA did bad thing
X, how do we get the ecosystem whole without causing unnecessary
challenges" - rather than trying to segment out the hierarchy into
compromise vs CA negligence.

Jeremy Rowley

Dec 26, 2018, 1:03:15 PM

to ry...@sleevi.com, dev-secur...@lists.mozilla.org

I don’t think I’m arguing that CAs should ever ignore the BRs. I’m arguing that deciding the consequences of failing to follow the BRs falls in the hands of the browsers. But I think you definitely highlighted why this discussion is confusing. I think all agree on the following:

1. Failure to revoke by Jan 15th is a non-compliance with the BRs.
2. Non-compliances require an incident report
3. The incident should appear on the audit report. Side note – there won’t be audit criteria around this particular issue by the time all certs are revoked. We’re planning to inform our auditor of course (already have in fact), but without audit criteria any delay in issuance essentially goes undetected unless someone in this community notices. Because the audit criteria won’t be updated until well after our audit report, if we were a bad acting CA, the incident would just never show up.

I think the only thing we disagree on is:

4. Can the browser say what happens for a failure to comply with the BRs before the failure happens?

Is that a fair assessment? I see why you wouldn’t want to engage in the question you asked (“Hypothetically, what would happen if we did (Bad Thing X)”). That would be terrible. Much better to treat this question as “We know X is going to happen. What’s the best way to mitigate the concerns of the community?” Exception was the wrong word in my original post. I should have used “What would you like us to do to mitigate when we miss the Jan 15th deadline?” instead. Apologies for the confusion there.

Jeremy


Ryan Sleevi

Dec 26, 2018, 1:34:11 PM

to Jeremy Rowley, ry...@sleevi.com, dev-secur...@lists.mozilla.org

On Wed, Dec 26, 2018 at 1:03 PM Jeremy Rowley <jeremy...@digicert.com>
wrote:

> I don’t think I’m arguing that CAs should ever ignore the BRs. I’m arguing
> that deciding the consequences of failing to follow the BRs falls in the
> hands of the browsers. But I think you definitely highlighted why this
> discussion is confusing. I think all agree on the following:
>
> 1. Failure to revoke by Jan 15th is a non-compliance with the BRs.
> 2. Non-compliances require an incident report
> 3. The incident should appear on the audit report. Side note – there
> won’t be audit criteria around this particular issue by the time all certs
> are revoked. We’re planning to inform our auditor of course (already have
> in fact), but without audit criteria any delay in issuance essentially goes
> undetected unless someone in this community notices. Because the audit
> criteria won’t be updated until well after our audit report, if we were a
> bad acting CA, the incident would just never show up.
>
>
>
> I think the only thing we disagree on is:
>
> 4. Can the browser say what happens for a failure to comply with the
> BRs before the failure happens.
>

Right, and I think this is where Matt was getting into the moral hazard
side of things, because I think this gets to the heart of the "ignore the
BRs".

If this is the question being answered, then it should be that every CA who
had a customer with some need would, rather than tell that customer no,
tell them "Go talk to the browsers". I think that's actively harmful and
unacceptable. It's unacceptable, because it shifts the burden wholly on to
the browsers to ensure the CA's compliance, and it sets a dangerous
precedent that all BRs are up for negotiation, so long as it's before the
failure happens. Further, if the CA doesn't like the answer, then they can
say no - but all of the cost is borne by the community, in discussing and
evaluating, not by the CA, who might decide it's not worth an incident.

That's why I posed it as a separate thing - it's not about discussing what
happens before the failure happens - but that this specific discussion
we're having is about a remediation plan for underscores. This is similar
to discussions for remediations for other incidents, such as sub-CAs that
aren't following the BRs, metadata in OUs, and other forms of invalid
domain names. The 'standard' expectation is 24 hours. SC12 extended that
substantially. And we're discussing why some feel that even SC12's proposed
remediation plan is problematic, and needing concrete details.

> Is that a fair assessment? I see why you wouldn’t want to engage in the
> question you asked (“Hypothetically, what would happen if we did (Bad
> Thing X)". That would be terrible. Much better to treat this question as
> “We know X is going to happen. What’s the best way to mitigate the concerns
> of the community?” Exception was the wrong word in my original post. I
> should have used “What would you like us to do to mitigate when we miss the
> Jan 15th deadline?” instead. Apologies for the confusion there.
>

While I think "We know X is going to happen" is still problematic
(especially since DigiCert hasn't committed to actually having X happen), I
think you're correct that we're discussing "How do we best remedy
this issue in a timely fashion", which is consistent with
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

Matt Palmer

Dec 26, 2018, 4:29:41 PM

to dev-secur...@lists.mozilla.org

On Wed, Dec 26, 2018 at 06:02:57PM +0000, Jeremy Rowley via dev-security-policy wrote:
> Much better to treat this question as “We know X is going to happen.
> What’s the best way to mitigate the concerns of the community?” Exception
> was the wrong word in my original post. I should have used “What would
> you like us to do to mitigate when we miss the Jan 15th deadline?”
> instead. Apologies for the confusion there.

I think that *could* be a productive discussion to have, assuming that (a)
it isn't a hypothetical (as in, "tell us what we'd have to do, and we'll
decide if that's more pain than just following the rules in the first
place") and (b) that there *is* something meaningful that can be done,
*before* the deadline would otherwise expire.

Insofar as DigiCert has been reasonably upfront about the issues they're
facing, and willing to engage in discussion, that isn't a bad thing.
Certainly, it's better than the alternative. Apart from that, though, I'm
not coming up with anything that *has* to be done before the deadline, that
would help to mitigate the problems associated with this incident.

One thing that I think *would* be useful would be to know, in sufficient
detail that the community can see that it *would* work, how DigiCert is
planning to change their practices and processes such that a similar kind of
incident cannot happen again in the future. Now that it is absolutely and
blindingly obvious that certificates may need to be replaced at any time due
to circumstances outside the CA's control, what does DigiCert intend to
change in order to ensure that their subscribers will *always* be prepared
to replace their certificates at short (ideally, five days, as per the
recent BR changes) notice?

If DigiCert had a plan to ensure that, and executed on it in a reasonable
timeframe, it would go a long way to assuaging my worries, at least, that
we're all going to be in this exact same position at some point in the
not-too-distant future.

- Matt

Matt Palmer

Dec 26, 2018, 4:36:49 PM

to dev-secur...@lists.mozilla.org

On Wed, Dec 26, 2018 at 04:13:40PM +0000, Jeremy Rowley via dev-security-policy wrote:
> The trust stores are always free to ignore the CAB Forum mandates and make
> their own rules. Mozilla has in the past (see the Mozilla audit
> criteria).

Whilst the trust stores *can* make their own rules, my observation is that
they generally don't do that except as a last resort, or in exigent
circumstances. Whilst I wasn't around when it was created, my understanding
is that part of the reason the CA/B forum was created was to try and
harmonise trust stores' requirements for CAs, so that CAs could work to a
single set of requirements.

From that perspective, I can't imagine it would be in the interests of CAs
to encourage trust stores to each do their own thing, in general.

> The browsers always decide the risk they want to bear and when that risk
> becomes unacceptable. The root question is definitely whether this
> particular mis-revocation provision would amount to unacceptable risk to
> the browsers.

Well, *any* amount of risk is "unacceptable" if there is no corresponding
reward to be gained from taking that risk.

> I don't think we're asking browsers to take on any risk.

You disagree with my suggestion that there is a risk that CAs (as a class;
I'm not thinking of DigiCert specifically here) will be less likely to
engage in a rigorous analysis of the relevant specifications in the future,
if not doing so does not result in any harm to them? You don't think there
is any risk to trust stores' reputations from the appearance of giving a
free pass to CAs which did not follow the rules? There is no risk of
appearing to favour certain CAs over others, by effectively punishing those
CAs which *did* play by the rules?

Let me pose to you a hypothetical. Imagine, for a moment, that DigiCert
holds the line, and revokes on January 15. This is certainly going to make
your customers annoyed, I certainly get that. But you do the right thing by
the web PKI, for which relying parties thank you.

Now, one of your competitors, who has all the scruples of an alley tomcat,
can easily find out who those companies are -- even before you revoke, just
by searching crt.sh for certs issued by DigiCert / Symantec with an
underscore in them.
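
The filtering step in that search is simple to illustrate. The following is a minimal sketch, assuming the dNSName entries have already been exported into a list of strings; it does not query crt.sh, the function name and sample hostnames are hypothetical, and the pattern encodes the usual letters-digits-hyphen (LDH) hostname label rule.

```python
import re

# LDH label: letters, digits, hyphen; 1-63 characters; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def non_ldh_labels(dns_name: str) -> list[str]:
    """Return the labels of dns_name that break the LDH rule (e.g. contain '_')."""
    labels = dns_name.rstrip(".").split(".")
    # A leading wildcard label is permitted in certificates, so skip it.
    if labels and labels[0] == "*":
        labels = labels[1:]
    return [label for label in labels if not LDH_LABEL.match(label)]


# Hypothetical sample names, not taken from any real certificate.
sample_names = [
    "www.example.com",
    "app_server.example.com",
    "*.internal_api.example.net",
]

for name in sample_names:
    bad = non_ldh_labels(name)
    if bad:
        print(f"{name}: non-conforming labels {bad}")
```

Anyone with a list of names pulled from public logs could run a check like this, which is why the affected subscribers are easy to identify.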

They decide to pull a swift one, and put all their more persuasive
salespeople on a blitzkrieg campaign to contact those companies and say
"hey, I understand DigiCert's done the dirty on you, and they're making you
replace all your certificates and rename all your machines. That's gotta be
tough. If you switch all your certificate issuance to us, we'll give you
two year certs for the existing names, and you don't have to take the risk
of renaming everything and breaking your critical systems."

There's a decent chance that at least one of your customers would leap at
that chance, because whilst it might be risky to change certs, it's a *lot*
less risky than changing certs *and* renaming everything, and they have to
change their certs anyway.

Having lost a customer to a competitor who has gotten that business by
offering them something they really shouldn't be offering, would you feel a
little miffed if Mozilla, when presented with clear evidence that the rules
were being violated, decided to give your shady competitor a pass? I fully
expect you would be, and you'd be entirely right to feel that way. I'd be
right next to you getting miffed, too.

Now, do you think there are any CAs out there which have previously issued
underscore-containing certificates, who have already told all their
customers, flat-out, that they're getting revoked, and they just need to get
on with replacing them? Is there any chance that those CAs have lost a
customer as a result of that? Do you think there's any chance that they're
watching what is happening at the moment *very* closely, and getting ready
to be rather miffed if DigiCert gets a pass on the January 15 deadline,
when *they* did the hard yards of telling all their customers their certs
were getting revoked, and *they* took the business risk of potentially
losing customers?

> In fact the opposite. The risk of revocation is a browser outage for that
> website.

The impacts of *that* risk only affect the trust stores to the degree that
users may cease to use the associated browser, for another. It has a far
greater impact on the organisation, which is as it should be, as it was the
organisation's decision to deploy non-conforming names, and an
infrastructure that isn't capable of adhering to the agreements that the
organisation made with their CA.

> A delay in revocation gives the operators of the specific certificates at
> issue more time to avoid an outage. Thus, risk is
> mitigated.

Risk is mitigated for one party, (or two), at the cost of increased risk to
another party (trust stores). Typically, in a risk transfer transaction,
there is some consideration that goes along with agreeing to take on that
risk.

Of course, if you don't believe that there is any risk to trust stores by
giving CAs a free pass on this, then you'll see the situation differently,
but in that case we'll just have to agree to disagree on that point.

> The main difference between this event and a compromised key is the
> comparative risks and the number. Explaining the key compromise to
> executive management for an emergency issue is a lot different of an
> exception process than explaining hundreds of certificates that need to be
> replaced because of non-descript issue. If we could provide more
> description around why this is a bad practice that had to be fixed over
> the holidays, you'd see a lot less confusion.

Are you talking about explaining to *your* executive management, or the
executive management of Organisation One?

The explanation to Organisation One's executive management, which I assume
would be done by their technical people, should be fairly simple: "the
certificates we were issued weren't correct, and they're going to stop
working on January 15. We need to replace them before then." That
pre-supposes, of course, that CAs are actually going to revoke them. The
executive management can dig further into the details if they want, but the
long and the short of it is that the certificates are going to stop working
on January 15, there's nothing anyone can do to stop that, and they need to
be replaced. I assume that CA subscriber agreements are suitably watertight
that Organisation One's general counsel will tell them to stop being silly
if they talk about legal action.

The explanation to your company's executive management might be slightly
more fraught, if only because the explanation boils down to, "we dun goofed,
boss, and handed out certs we shouldn't have". I have no idea what sort of
an organisation DigiCert is, but I can imagine in a suitably toxic
organisation that could be a Career Limiting Maneuver. I can only hope that
DigiCert isn't one of those.

Speaking of explaining things to executive management, though: can you
imagine being a CA which *did* play by the rules, and then trying to explain
to the Big Bosses why they did the right thing and revoked on time, when
their competitors didn't? And there were no negative consequences for those
competitors? How much harder would it be for those people to convince their
bosses to do the right thing next time?

- Matt

Ryan Sleevi

Dec 27, 2018, 1:30:30 PM

to Jeremy Rowley, ry...@sleevi.com, dev-secur...@lists.mozilla.org

On Wed, Dec 26, 2018 at 1:03 PM Jeremy Rowley <jeremy...@digicert.com>

wrote:

> Much better to treat this question as “We know X is going to happen.
> What’s the best way to mitigate the concerns of the community?” Exception
> was the wrong word in my original post. I should have used “What would you
> like us to do to mitigate when we miss the Jan 15th deadline?” instead.
> Apologies for the confusion there.
>

As I tried to highlight several times during early discussions, it's not
really ideal to have each of these trickle in over time.

DigiCert has apparently decided that for 14-15 customers it has sufficient
information to know that X is going to happen, based on their risk
analysis. Why are we seeing bugs trickle in, such as
https://bugzilla.mozilla.org/show_bug.cgi?id=1516545 ?

It would seem uncontroversial to suggest that, as part of the risk analysis
that DigiCert is claiming has already been done, it has all the information
for an incident report for all of the customers for which it expects not to
revoke certificates. If it doesn't, then it suggests that the risk analysis
is not being done responsibly, and is being outsourced to the community to
perform.

Should we expect another 12 bugs to be filed? If so, when? If not, why?

As mentioned, if treating this as part of a "Responding to underscores"
incident, then this has the effect of being a slow trickle of an incomplete
incident report overall, and incomplete remediation plan, and those tend
not to bode well. I don't think it'd really be engaging with mitigating to,
say, file a bug on Jan 14th - so how do we move the discussion forward and
make sure the facts are available?

Jeremy Rowley

Dec 27, 2018, 1:47:22 PM

to ry...@sleevi.com, dev-secur...@lists.mozilla.org

The original incident report contained all of the details of the initial filing. The additional, separated reports are trickling in as I get enough info to post something in reply to the updated questions. As the questions asked have changed from the original 7 in the Mozilla incident report, getting the info back takes time, especially during the holiday season. We’re also working to close out as many without an exception as possible. Note that the deadline has not passed yet so all of these incident reports are theoretical (and not actually incidents) until Jan 15th. I gave the community the total potential number of certificates impacted and the total number of customers so we can have a community discussion on the overall risk and get public comments into the process before the deadline passes. I’m unaware of any policy at Mozilla or Google that provides guidance on how to file expected issues before they happen. If there is one, I’d gladly follow it.

I’ve started 3 bugs while closing out two additional customers. I have enough info to file maybe 1-2 more reports. The rest will probably be filed after the new year when people are back working.

From: Ryan Sleevi <ry...@sleevi.com>
Sent: Thursday, December 27, 2018 11:24 AM
To: Jeremy Rowley <jeremy...@digicert.com>
Cc: ry...@sleevi.com; dev-secur...@lists.mozilla.org
Subject: Re: Underscore characters

Jeremy Rowley

Dec 27, 2018, 1:50:10 PM

to mozilla-dev-security-policy

There's a little bit of a "damned if you do, damned if you don't" problem here. Wait until you have all the information? That's a paddlin'. File before you have enough information? That's a paddlin'. I'd appreciate better guidance on what Mozilla expects from these incident reports timing-wise.

Ryan Sleevi

Dec 27, 2018, 3:21:41 PM

to Jeremy Rowley, mozilla-dev-security-policy

I'm not trying to throw you under the bus here, but I think it's helpful if
you could highlight what new information you see being required, versus
that which is already required.

I think, yes, you're right that it's not well received if you go violate
the BRs and then, after the fact, say "Hey, yeah, we violated, but here's
why", and finding out that the reasons are met with a lot of skepticism and
the math being shaky, and you can see that from past incident reports it
doesn't go over well.

But it's also not well received if it's before, and the statement is "Our
customer thinks we should violate the BRs. What would happen if we did, and
what information do you need from us?". That gets into the moral hazard
that Matt spoke to, and is a huge burden on the community where the
expectation is that the CA says "Sorry, we can't do that".

So the assumption here is that, in all of this discussion, DigiCert's done
everything it can to understand the issue, the timelines, remediation, etc,
and has plans to address both each and every customer and the systemic
issues that have emerged. If that's not the case, then how are we not in
one of those two scenarios above? And if it is the case, isn't that
information readily available by now?

From the discussions on the incident reports, I feel like that's been the
heart of the questions; which is trying to understand what the root cause
is and what the remediation plan is. The statement "We'll miss the first
deadline, but we'll hit the second", but without any details about how or
why, or the steps being taken to ensure no deadlines are missed in the
future, doesn't really inspire confidence, and is exactly the same kind of
feedback that would be given post-incident.

James Burton

Dec 27, 2018, 3:45:43 PM

to Ryan Sleevi, Jeremy Rowley, mozilla-dev-security-policy

For a CA to intentionally state that they are going to violate the BR
requirements means that that CA is under immense pressure to comply with
demands or face retribution. The severity inflicted on a CA by
intentionally violating the BR requirements can be severe. Rolling a dice
of chance. Why take the risk?


thomas....@gmail.com

Dec 27, 2018, 3:53:14 PM

to mozilla-dev-s...@lists.mozilla.org

As to why these certificates have to be revoked, you should see this the other way round: as a very generous service of the community to you and your customers!

Certificates with (pseudo-)hostnames in them are clearly invalid, so a conforming implementation should not accept them for anything and they should not pose any security risk. Based on this assessment (no revocation if no security risk), a CA could very well issue a certificate including any of the (pseudo-)hostnames "example.com_cvs.com", "example.com/cvs.com", "cvs.com/example.com", "https://example.com/cvs.com", "examp...@cvs.com" to the owner of example.com (who, arguably, has the exact same right to them as the owner of cvs.com has) and refuse to revoke them.

As to the consequences (in case this really becomes an incident report/incident reports): this shows a SEVERE lack of ability to revoke certificates on DigiCert's side, which must have been known AND ACCEPTED for a long time (this cannot be the first "blackout period" of (in the best case) 3.5 months). Thus, it seems to be a good idea to:

1. Henceforth, make NSS only accept certificates by DigiCert with a maximum validity of 100 days. Let's Encrypt has shown that this is clearly feasible.

or

2. Henceforth, require DigiCert to revoke a small, randomly (e.g., using RFC 3797) selected subset of their certificates every day (within 7 days). If this, e.g., for the same reasons as outlined in these incident reports, is not possible, it will trigger (an incrementally decreasing number of) more incident reports.

Both proposals would lead to more automation and a better understanding of the requirement of timely revocation, while pushing the ecosystem in the right direction. For its easiness, the first proposal would be my favorite but I would be very interested in hearing other people's thoughts about these proposals.
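
As a rough illustration of the first proposal, the check below is a minimal Python sketch of a maximum-validity rule. It is not NSS code: the function name `within_max_validity` and the way the 100-day limit is counted (a plain difference between the notBefore and notAfter timestamps) are assumptions made only for this example.

```python
from datetime import datetime, timedelta, timezone

# Limit taken from proposal 1 above; how a real validator would count
# the days is an assumption for this sketch.
MAX_VALIDITY = timedelta(days=100)


def within_max_validity(not_before: datetime, not_after: datetime,
                        limit: timedelta = MAX_VALIDITY) -> bool:
    """Return True if the certificate lifetime does not exceed the limit."""
    return (not_after - not_before) <= limit


# Hypothetical certificates: one valid for 90 days, one valid for a year.
short_lived = within_max_validity(
    datetime(2019, 1, 1, tzinfo=timezone.utc),
    datetime(2019, 4, 1, tzinfo=timezone.utc),
)
long_lived = within_max_validity(
    datetime(2019, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(short_lived, long_lived)
```

A real trust-store policy would also have to decide which issuance dates the rule applies to; that is left out here.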

Ryan Sleevi

Dec 27, 2018, 4:01:01 PM

to James Burton, Ryan Sleevi, Jeremy Rowley, mozilla-dev-security-policy

I'm not really sure I understand this response at all. I'm hoping you can
clarify.

On Thu, Dec 27, 2018 at 3:45 PM James Burton <j...@0.me.uk> wrote:

> For a CA to intentionally state that they are going to violate the BR
> requirements means that that CA is under immense pressure to comply with
> demands or face retribution.
>

I'm not sure I understand how this flows. Comply with whose demands? Face
retribution from who, and why?

> The severity inflicted on a CA by intentionally violating the BR
> requirements can be severe. Rolling a dice of chance. Why take the risk?
>

I'm not sure I understand the question at the end, and suspect there's a
point to the question I'm missing.

Presumably, a CA stating they're going to violate the BR requirements,
knowing the risk to trust that it may pose, would have done everything
possible to gather every piece of information so that they could assess
whether the risk of violation is outweighed by whatever other risks exist (in
this case, revocation). If that's the case, is it unreasonable to ask how the
CA determined that - which is the root cause analysis question? And how to
mitigate whatever risk the alternative (in this case, revocation) poses going
forward, so that violating the BRs isn't consistently seen as the "best"
option?

Peter Bowen

Dec 27, 2018, 4:19:50 PM

to thomas....@gmail.com, mozilla-dev-s...@lists.mozilla.org

On Thu, Dec 27, 2018 at 12:53 PM thomas.gh.horn--- via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

>
> As to why these certificates have to be revoked, you should see this the
> other way round: as a very generous service of the community to you and
> your customers!
>
> Certificates with (pseudo-)hostnames in them are clearly invalid, so a
> conforming implementation should not accept them for anything and they
> should not pose any security risk. Based on this assessment (no revocation
> if no security risk), a CA could very well issue a certificate including
> any of the (pseudo-)hostnames "example.com_cvs.com", "example.com/cvs.com",
> "cvs.com/example.com", "https://example.com/cvs.com", "examp...@cvs.com"
> to the owner of example.com (who, arguably, has the exact same right to
> them as the owner of cvs.com has) and refuse to revoke them.
>

I'm not clear how you get that the owner of example.com is covered anywhere
here. Parsed into labels, these all have com as the label closest to the
root and then have 'com_cvs', 'com/cvs', 'com/example', 'com/cvs', and
'com@cvs' as the next label respectively. None have 'example' as the next
label.
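
To make the label argument concrete, here is a minimal Python sketch (an illustration only, not how any CA or validator actually parses names). It splits each quoted string on dots and lists the labels starting from the one closest to the root. `labels_from_root` is a hypothetical helper name, and the fifth example is left out because its full text is elided in the message.

```python
def labels_from_root(name: str) -> list[str]:
    """Split a dotted name into labels, ordered from the root outward."""
    return list(reversed(name.split(".")))


# The four fully quoted (pseudo-)hostnames from the message above.
names = [
    "example.com_cvs.com",
    "example.com/cvs.com",
    "cvs.com/example.com",
    "https://example.com/cvs.com",
]

# First label from the root, and the label right after it, for each name.
root_labels = [labels_from_root(n)[0] for n in names]
second_labels = [labels_from_root(n)[1] for n in names]

for name, root, second in zip(names, root_labels, second_labels):
    print(f"{name!r}: closest to root = {root!r}, next = {second!r}")
```

In every case the label after 'com' is something other than 'example', which is the point being made.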

> As to the consequences (in case this really becomes an incident
> report/incident reports): this shows a SEVERE lack of ability to revoke
> certificates on DigiCert's side, which must have been known AND ACCEPTED
> for a long time (this cannot be the first "blackout period" of (in the best
> case) 3.5 months).

I don't see how this follows. DigiCert has made it clear they are able to
technically revoke these certificates and presumably are contractually able
to revoke them as well. What is being said is that their customers are
asking them to delay revoking them because the _customers_ have blackout
periods where the customers do not want to make changes to their systems.
DigiCert's customers are saying that they are judging the risk from
revocation is greater than the risk from leaving them unrevoked and asking
DigiCert to not revoke. DigiCert is then presenting this request along to
Mozilla to get feedback from Mozilla.

> Thus, it seems to be a good idea to:
>
> 1. Henceforth, make NSS only accept certificates by DigiCert with a
> maximum validity of 100 days. Let's Encrypt has shown that this is clearly
> feasible.
>
> or
>
> 2. Henceforth, require DigiCert to revoke a small, randomly (e.g., using
> RFC 3797) selected subset of their certificates every day (within 7 days).
> If this, e.g., for the same reasons as outlined in these incident reports,
> is not possible, it will trigger (an incrementally decreasing number of)
> more incident reports.
>
> Both proposals would lead to more automation and a better understanding of
> the requirement of timely revocation, while pushing the ecosystem in the
> right direction. For its easiness, the first proposal would be my favorite
> but I would be very interested in hearing other people's thoughts about
> these proposals.
>

I don't agree that demanding all certificate customers have "more
automation" is desirable. I am very familiar with the Chaos Monkey
approach Netflix has implemented and companies like Gremlin that offer
similar "Failure as a Service" products, but forcing this on customers
seems like a poor idea.

Thanks,
Peter

James Burton

Dec 27, 2018, 4:24:27 PM

to Ryan Sleevi, Jeremy Rowley, mozilla-dev-security-policy

On Thu, Dec 27, 2018 at 9:00 PM Ryan Sleevi <ry...@sleevi.com> wrote:

> I'm not really sure I understand this response at all. I'm hoping you can
> clarify.
>
> On Thu, Dec 27, 2018 at 3:45 PM James Burton <j...@0.me.uk> wrote:
>
>> For a CA to intentionally state that they are going to violate the BR
>> requirements means that that CA is under immense pressure to comply with
>> demands or face retribution.
>>
>
> I'm not sure I understand how this flows. Comply with whose demands? Face
> retribution from who, and why?
>

The CA must be under immense pressure to comply with demands from certain
customers to determine that they don't have much of a choice but to
intentionally violate the BR requirements, and by telling the community and
root stores early they are hoping for leniency. The retribution by those
customers could be legal, which is outside the scope of this forum, but it's
still relevant to them if that is the case.

>
>> The severity inflicted on a CA by intentionally violating the BR
>> requirements can be severe. Rolling a dice of chance. Why take the risk?
>>
>
> I'm not sure I understand the question at the end, and suspect there's a
> point to the question I'm missing.
>

The CA is rolling the dice of chance: they are intentionally risking
everything by violating the BR requirements, and they know that such action
can face sanctions or distrust in the wrong case. The question I asked is
why they are taking the risk, which follows from the first statement.

Matt Palmer

Dec 27, 2018, 5:04:52 PM

to dev-secur...@lists.mozilla.org

On Thu, Dec 27, 2018 at 01:19:26PM -0800, Peter Bowen via dev-security-policy wrote:
> I don't see how this follows. DigiCert has made it clear they are able to
> technically revoke these certificates and presumably are contractually able
> to revoke them as well. What is being said is that their customers are
> asking them to delay revoking them because the _customers_ have blackout
> periods where the customers do not want to make changes to their systems.
> DigiCert's customers are saying that they are judging the risk from
> revocation is greater than the risk from leaving them unrevoked and asking
> DigiCert to not revoke. DigiCert is then presenting this request along to
> Mozilla to get feedback from Mozilla.

It's worth clarifying that "risk" is not a property of the universe, like
magnetic flux density, but rather is assessed relative to specific entities.
Thus, when talking about risk, it's worth clearly identifying to whom a risk
is associated, as in this variant of part of the above paragraph:

> DigiCert's customers are saying that they are judging the risk *to them*
> from revocation is greater than the risk *to them* from leaving them
> unrevoked

I'm sure you're familiar with all this, Peter. I just thought it was worth
highlighting for a wider audience, that one entity's assessment of risk to
them doesn't make it a physical constant that applies equally to everyone.
I find it very helpful when assessing such things to attach explicit
markers, somewhat like ensuring I specify both magnitude *and* direction on
my vectors.

- Matt

Jeremy Rowley

Dec 27, 2018, 6:56:58 PM

to James Burton, Ryan Sleevi, mozilla-dev-security-policy

The risk is primarily outages of major sites across the web, including certs used in Google wallet. We’re thinking that is a less than desirable result, but we weren’t sure how the Mozilla community would feel/react. We’re still considering revoking all of the certs on Jan 15th based on these discussions. I don’t think we’re asking for leniency (maybe we are if that’s a factor?), but I don’t know what happens if you’re faced with causing outages vs. compliance. I started the conversation because I feel like we should be good netizens and make people aware of what’s going on instead of just following policy. I’m actually surprised at least one other CA that has issued a large number of underscore character certs hasn’t run into the same timing issues.

Normally, we would just revoke the certs, but there are a significant number of certs in the Alexa top 100. We’ve told most customers, “No exception”. I also thought it’s better to get the information out there so we can all make rational decisions (DigiCert included) if as many facts are known as possible.

We are working with the partners to get the certs revoked before the deadline. Most will be. By January 15th, I hope there won’t be too many certs left. Unfortunately, by then it’s also too late to discuss what happens if the cert is not revoked. I.e., what are the benefits of revoking (strict compliance) vs. revoking the larger-impact certs as they are migrated (incident report)? Unfortunately part 2, there’s no guidance on whether an incident report means total distrust v. something on your audit and a stern lecture. I’d rather suffer a lecture than take down a top site. Not so willing to gamble the whole company. This is why we wanted to have the discussion now, despite no violation so far. The response from the browsers is public - that they cannot make that determination. Does that mean we have our answer? Revoke is the only acceptable response?

From: James Burton <j...@0.me.uk>
Sent: Thursday, December 27, 2018 2:24 PM
To: Ryan Sleevi <ry...@sleevi.com>
Cc: Jeremy Rowley <jeremy...@digicert.com>; mozilla-dev-security-policy <mozilla-dev-s...@lists.mozilla.org>
Subject: Re: Underscore characters

On Thu, Dec 27, 2018 at 9:00 PM Ryan Sleevi <ry...@sleevi.com <mailto:ry...@sleevi.com> > wrote:

I'm not really sure I understand this response at all. I'm hoping you can clarify.

Jeremy Rowley

Dec 27, 2018, 7:08:50 PM

to Peter Bowen, thomas....@gmail.com, mozilla-dev-s...@lists.mozilla.org

This is accurate. We have the technical capability and policy ability to
revoke the certificates. What we were hoping was a discussion based on
impact of the revocation so we could hear what we should do. Blind obedience
isn't my favorite answer, but it's an option. The guidance so far is file an
incident report now so we can discuss the potential impact. I've filed for
two companies, crossed a couple more off the list, and am still working with
the remainder to get things resolved. Although some have escalated over my
head, I think most are eager to hear what the community has to say. I also
think this is an interesting question for Mozilla's policy - not sure we've
ever addressed a potential non-compliance like this.

-----Original Message-----
From: dev-security-policy <dev-security-...@lists.mozilla.org> On
Behalf Of Peter Bowen via dev-security-policy
Sent: Thursday, December 27, 2018 2:19 PM
To: thomas....@gmail.com
Cc: mozilla-dev-s...@lists.mozilla.org
Subject: Re: Underscore characters

I don't see how this follows. DigiCert has made it clear they are able to
technically revoke these certificates and presumably are contractually able
to revoke them as well. What is being said is that their customers are
asking them to delay revoking them because the _customers_ have blackout
periods where the customers do not want to make changes to their systems.
DigiCert's customers are saying that they are judging the risk from
revocation is greater than the risk from leaving them unrevoked and asking
DigiCert to not revoke. DigiCert is then presenting this request along to
Mozilla to get feedback from Mozilla.

> Thus, it seems to be a good idea to:
>
> 1. Henceforth, make NSS only accept certificates by DigiCert with a
> maximum validity of 100 days. Let's Encrypt has shown that this is
> clearly feasible.
>
> or
>
> 2. Henceforth, require DigiCert to revoke a small, randomly (e.g.,
> using RFC 3797) selected subset of their certificates every day (within
> 7 days). If this, e.g., for the same reasons as outlined in these incident
> reports, is not possible, it will trigger (an incrementally decreasing
> number of) more incident reports.
>
> Both proposals would lead to more automation and a better
> understanding of the requirement of timely revocation, while pushing
> the ecosystem in the right direction. For its easiness, the first
> proposal would be my favorite but I would be very interested in
> hearing other people's thoughts about these proposals.
>

I don't agree that demanding all certificate customers have "more
automation" is desirable. I am very familiar with the Chaos Monkey approach
Netflix has implemented and companies like Gremlin that offer similar
"Failure as a Service" products, but forcing this on customers seems like a
poor idea.

Thanks,
Peter

Jeremy Rowley

Dec 27, 2018, 7:12:18 PM

to thomas....@gmail.com, mozilla-dev-s...@lists.mozilla.org

This is very helpful. If I had those two options, we'd just revoke all the
certs, screw outages. Unfortunately, the options are much broader than that.
If I could know what the risk v. benefit is, then you can make a better
decision? DigiCert distrusted - all revoked. DigiCert gets some mar on its
audit - outages seem worse. Make sense?

-----Original Message-----
From: dev-security-policy <dev-security-...@lists.mozilla.org> On
Behalf Of thomas.gh.horn--- via dev-security-policy
Sent: Thursday, December 27, 2018 1:50 PM
To: mozilla-dev-s...@lists.mozilla.org
Subject: Re: Underscore characters

As to why these certificates have to be revoked, you should see this the
other way round: as a very generous service of the community to you and your
customers!

Certificates with (pseudo-)hostnames in them are clearly invalid, so a
conforming implementation should not accept them for anything and they
should not pose any security risk. Based on this assessment (no revocation
if no security risk), a CA could very well issue a certificate including any
of the (pseudo-)hostnames "example.com_cvs.com", "example.com/cvs.com",
"cvs.com/example.com", "https://example.com/cvs.com", "examp...@cvs.com" to
the owner of example.com (who, arguably, has the exact same right to them as
the owner of cvs.com has) and refuse to revoke them.

As to the consequences (in case this really becomes an incident
report/incident reports): this shows a SEVERE lack of ability to revoke
certificates on DigiCert's side, which must have been known AND ACCEPTED for
a long time (this cannot be the first "blackout period" of (in the best
case) 3.5 months). Thus, it seems to be a good idea to:

1. Henceforth, make NSS only accept certificates by DigiCert with a maximum
validity of 100 days. Let's Encrypt has shown that this is clearly feasible.

or

2. Henceforth, require DigiCert to revoke a small, randomly (e.g., using RFC
3797) selected subset of their certificates every day (within 7 days). If
this, e.g., for the same reasons as outlined in these incident reports, is
not possible, it will trigger (an incrementally decreasing number of) more
incident reports.

Both proposals would lead to more automation and a better understanding of
the requirement of timely revocation, while pushing the ecosystem in the
right direction. For its easiness, the first proposal would be my favorite
but I would be very interested in hearing other people's thoughts about
these proposals.


Matt Palmer

Dec 27, 2018, 7:55:09 PM

to dev-secur...@lists.mozilla.org

On Fri, Dec 28, 2018 at 12:12:03AM +0000, Jeremy Rowley via dev-security-policy wrote:
> This is very helpful. If I had those two options, we'd just revoke all the
> certs, screw outages. Unfortunately, the options are much broader than that.
> If I could know what the risk v. benefit is, then you can make a better
> decision? DigiCert distrusted - all revoked. DigiCert gets some mar on its
> audit - outages seem worse. Make sense?

Given that Mozilla wants CAs to abide by its policies, which include
adherence to the BRs, and you appear to be saying that you'll adhere to the
BRs if you're threatened with distrust... I'd say the logical response from
Mozilla would be to threaten distrust. I doubt, especially now, that you'll
get a categorical advance "it's OK to not revoke" from Mozilla.

- Matt

Jeremy Rowley

Dec 27, 2018, 8:01:26 PM

to ry...@sleevi.com, mozilla-dev-security-policy

The 7 required items under the Mozilla template are:

1. Timeline of events
2. Timeline of actions taken
3. Whether the CA has stopped issuing
4. Summary of problematic certs
5. Cert data
6. How mistakes were made
7. Remediation plan

The info we’re working on getting a complete list of:

1. Blackout periods
2. Where each cert is used in the infrastructure
3. Why 30 day certs won’t work (on a per cert basis)
4. Reason the certs are publicly trusted
5. What risks are associated with the replacement
6. The date each cert can be revoked

Mostly we’re hearing back general answers. They’re almost all the same answer, but I’m really trying to get the level of detail requested.

I see how you could interpret the question that way. I see it more as the CAB forum got the date wrong. Could Mozilla please extend this after weighing the risks of revoking vs. non-revoking? Maybe two sides of the same question.

The second deadline is coming from the impacted parties. That’s the request from them so I’m relaying it on. Everyone is willing to move, just a matter of timing. If there’s a better balance of risk vs. risk, then we’d be happy to hear that.

>> So the assumption here is that, in all of this discussion, DigiCert's done everything it can to understand the issue, the timelines, remediation, etc, and has plans to address both each and every customer and the systemic issues that have emerged. If that's not the case, then how are we not in one of those two scenarios above? And if it is the case, isn't that information readily available by now?

The information is readily available for the companies I posted in incident reports, particularly the first one. I think we’ve done everything reasonable to understand the issue. I haven’t, for example, chartered a flight to sit in their data center and examine their infrastructure. We do have daily calls with most of them on the issue. Maybe the amount of information the company has provided should be the guiding light?

From: Ryan Sleevi <ry...@sleevi.com>
Sent: Thursday, December 27, 2018 1:16 PM
To: Jeremy Rowley <jeremy...@digicert.com>
Cc: mozilla-dev-security-policy <mozilla-dev-s...@lists.mozilla.org>
Subject: Re: Underscore characters

I'm not trying to throw you under the bus here, but I think it's helpful if you could highlight what new information you see being required, versus that which is already required.

I think, yes, you're right that it's not well received if you go violate the BRs and then, after the fact, say "Hey, yeah, we violated, but here's why", and finding out that the reasons are met with a lot of skepticism and the math being shaky, and you can see that from past incident reports it doesn't go over well.

But it's also not well received if it's before, and the statement is "Our customer thinks we should violate the BRs. What would happen if we did, and what information do you need from us?". That gets into the moral hazard that Matt spoke to, and is a huge burden on the community where the expectation is that the CA says "Sorry, we can't do that".

So the assumption here is that, in all of this discussion, DigiCert's done everything it can to understand the issue, the timelines, remediation, etc, and has plans to address both each and every customer and the systemic issues that have emerged. If that's not the case, then how are we not in one of those two scenarios above? And if it is the case, isn't that information readily available by now?

From the discussions on the incident reports, I feel like that's been the heart of the questions; which is trying to understand what the root cause is and what the remediation plan is. The statement "We'll miss the first deadline, but we'll hit the second", but without any details about how or why, or the steps being taken to ensure no deadlines are missed in the future, doesn't really inspire confidence, and is exactly the same kind of feedback that would be given post-incident.

On Thu, Dec 27, 2018 at 1:50 PM Jeremy Rowley via dev-security-policy <dev-secur...@lists.mozilla.org> wrote:

There's a little bit of a "damned if you do, damned if you don't" problem here. Wait until you have all the information? That's a paddlin'. File before you have enough information? That's a paddlin'. I'd appreciate better guidance on what Mozilla expects from these incident reports timing-wise.

Jeremy Rowley

unread,

Dec 27, 2018, 8:05:10 PM12/27/18

to Matt Palmer, mozilla-dev-security-policy

I disagree that we won't get that. I think we could see a "it's okay to wait
until April 30 for large pharmacy" or "Waiting until April 30 is too long
but March 1 is okay". I don't think Mozilla wants outages either. But... if
Mozilla did say that we should revoke now, that would be great as well. I'd
have a firm answer I can go back with. No risk, but no exception.

Well except moral risk of course....

-----Original Message-----
From: dev-security-policy <dev-security-...@lists.mozilla.org> On
Behalf Of Matt Palmer via dev-security-policy
Sent: Thursday, December 27, 2018 5:55 PM
To: dev-secur...@lists.mozilla.org
Subject: Re: Underscore characters


James Burton

unread,

Dec 27, 2018, 8:07:29 PM12/27/18

to Jeremy Rowley, Matt Palmer, mozilla-dev-security-policy

I'm not sure if you're allowed to state this publicly. Has Microsoft given
you the go-ahead?

On Fri, Dec 28, 2018 at 1:05 AM Jeremy Rowley via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> I disagree that we won't get that. I think we could see a "it's okay to
> wait
> until April 30 for large pharmacy" or "Waiting until April 30 is too long
> but March 1 is okay". I don't think Mozilla wants outages either. But... if
> Mozilla did say that we should revoke now, that would be great as well. I'd
> have a firm answer I can go back with. No risk, but no exception.
>
> Well except moral risk of course....
>

> -----Original Message-----
> From: dev-security-policy <dev-security-...@lists.mozilla.org>
> On
> Behalf Of Matt Palmer via dev-security-policy
> Sent: Thursday, December 27, 2018 5:55 PM
> To: dev-secur...@lists.mozilla.org
> Subject: Re: Underscore characters
>

> On Fri, Dec 28, 2018 at 12:12:03AM +0000, Jeremy Rowley via
> dev-security-policy wrote:
> > This is very helpful. If I had those two options, we'd just revoke all
> > the certs, screw outages. Unfortunately, the options are much broader
> > than that.
> > If I could know what the risk v. benefit is, then you can make a
> > better decision? DigiCert distrusted - all revoked. DigiCert gets some
> > mar on its audit - outages seem worse. Make sense?
>
> Given that Mozilla wants CAs to abide by its policies, which include
> adherence to the BRs, and you appear to be saying that you'll adhere to the
> BRs if you're threatened with distrust... I'd say the logical response from
> Mozilla would be to threaten distrust. I doubt, especially now, that you'll
> get a categorical advance "it's OK to not revoke" from Mozilla.
>
> - Matt
>

Jeremy Rowley

unread,

Dec 27, 2018, 8:10:34 PM12/27/18

to James Burton, Matt Palmer, mozilla-dev-security-policy

Treading carefully…

Mozilla is the only browser related to the discussion. Probably sufficient to say that the revocation/no-revoke decision is entirely dependent on the results of this thread.

From: James Burton <j...@0.me.uk>
Sent: Thursday, December 27, 2018 6:07 PM
To: Jeremy Rowley <jeremy...@digicert.com>

Cc: Matt Palmer <mpa...@hezmatt.org>; mozilla-dev-security-policy <mozilla-dev-s...@lists.mozilla.org>
Subject: Re: Underscore characters

I'm not sure if you're allowed to state this publicly. Has Microsoft given you the go-ahead?


Ryan Sleevi

unread,

Dec 27, 2018, 9:15:01 PM12/27/18

to Jeremy Rowley, James Burton, Ryan Sleevi, mozilla-dev-security-policy

On Thu, Dec 27, 2018 at 6:56 PM Jeremy Rowley <jeremy...@digicert.com>
wrote:

> The risk is primarily outages of major sites across the web, including
> certs used in Google wallet. We’re thinking that is a less than desirable
> result, but we weren’t sure how the Mozilla community would feel/react.
>

I don’t think that is a particularly helpful framing, to be honest. The
risk these organizations face here is self-inflicted; regardless of the
feeling of underscores, there is unquestionably an issue for organizations
that cannot respond in the BR timeframes, let alone extended ones that
extend for months. That's a real ecosystem issue, and regardless of the CA
these customers partner with, an issue that needs both better understanding
and, to be honest, better prevention.

Matt has spoken at length to the risk to the community, which doesn’t
really seem like it’s been acknowledged, let alone proposed as to how it
will be mitigated. I have to ask again - what steps is DigiCert taking to
avoid these issues going forward?

> We’re still considering revoking all of the certs on Jan 15th based on
> these discussions. I don’t think we’re asking for leniency (maybe we are
> if that’s a factor?), but I don’t know what happens if you’re faced with
> causing outages vs. compliance.
>

What happens is that you ask why there is risk of outage to begin with and
what can be done to improve going forward? Let’s assume you do revoke, and
it causes an outage - is DigiCert taking steps to ensure no customer of
theirs is ever faced with that risk? If so, what are those steps?

> I started the conversation because I feel like we should be good netizens
> and make people aware of what’s going on instead of just following policy.
> I’m actually surprised at least one other CA that has issued a large number
> of underscore character certs hasn’t run into the same timing issues.
>

This seems to suggest that perhaps other CAs have prepared their customers
for revocation. How does this surprise - that no other CA faces this - lead
to tangible changes in the business processes? How would this change, if
another CA did have the same issue? Surely you can see there are real and
fundamental issues that you’re uniquely qualified to help your customers
address in ways that we cannot.

Have you analyzed CT, for example, to see why DigiCert is unique?
Certainly, by sheer volume, it's heavily tilted towards the old Symantec
infrastructure - and the customers that came over to DigiCert. With those
sorts of details, how does this change how things were done, or how they
will be done?

I’m not trying to pick on y’all - I think it is legitimately good that you
provided concrete data. Even if you do revoke on Jan 15, this is still
useful to understand the challenges, but only if this leads to meaningful
changes. What might those look like?

> Normally, we would just revoke the certs, but there are a significant
> number of certs in the Alexa top 100. We’ve told most customers, “No
> exception”. I also thought it’s better to get the information out there so
> we can all make rational decisions (DigiCert included) if as many facts are
> known as possible.
>

And this is the framing that I think is incredibly helpful. Understanding
why customers can’t change, and what steps are being done to ensure they
can, is hugely useful. Wayne’s questions were to this point - as were mine
towards understanding the problem from the other side, which are steps the
CA is taking. As I've repeatedly highlighted from
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , the goal
is not punishment - but understanding how these issues are being addressed.

>
> We are working with the partners to get the certs revoked before the
> deadline. Most will.
>

This seems like a significant improvement from “100% of customers can’t”

> By January 15th, I hope there won’t be too many certs left. Unfortunately,
> by then it’s also too late to discuss what happens if the cert is not
> revoked. Ie – what are the benefits of revoking (strict compliance) vs
> revoking the larger impact certs as they are migrated (incident report).
> Unfortunately part 2, there’s no guidance on whether an incident report
> means total distrust v. something on your audit and a stern lecture.
>

I mean, it’s two-fold, right? Any incident can lead to total distrust, but
it’s also unlikely that a single incident leads to total distrust. The way
to balance those competing statements is to do what you’re doing - and to
be transparent. As Matt has highlighted, there’s a huge risk here that this
leads to a moral hazard - and the best way to mitigate that is to discuss
steps being taken to reduce that risk going forward, particularly about
what a core part of the problem statement is - difficulty in revocation.

> I’d rather suffer a lecture than take down a top site. Not so willing to
> gamble the whole company. This is why we wanted to have the discussion now,
> despite no violation so far. The response from the browsers is public -
> that they cannot make that determination. Does that mean we have our
> answer? Revoke is the only acceptable response?
>

I mean, the answer has been to repeatedly highlight
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

In a number of ways, an unintentional violation is worse than an
intentional violation. Ignorance is not really an excuse when you hold keys
to the Internet, and being asleep at the wheel is hugely dangerous. So, if
I had to pick between an intentional violation and an unintentional (and
preventable) violation, I'd likely pick intentional. But there's also a
huge hazard with intentional violations - those reveal potentially systemic
issues and a lack of good faith, especially if they become common-place. We
definitely saw CAs perform intentional violations and notify
after-the-fact, and that's far, far worse than those that notify before
intentionally violating (I think every post-facto notification for an
intentional incident has, eventually, led to that CA's distrust).

So somewhere on the scale of things, we're in a better place than most
every alternative. But to ensure this is in that 'good faith' side of
things, understanding what the factors are that have been evaluated, and
what steps are being taken to prevent this, are significant. As I said, I
think the principles captured in
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation and in the
discussion about how at least some of us see this (that it's related to
underscores incident response) suggests that it's not, in fact, the end of
the world, or the CA, provided that meaningful data behind the decision to
not revoke is given, meaningful plans and timelines for resolution are
given, and meaningful steps to prevent this from ever happening again are
given. It becomes an incident report, and the result is not a stern lecture
- but concrete and quantifiable steps as to how to improve.

Matt Palmer

unread,

Dec 27, 2018, 9:20:38 PM12/27/18

to dev-secur...@lists.mozilla.org

On Thu, Dec 27, 2018 at 11:56:41PM +0000, Jeremy Rowley via dev-security-policy wrote:
> The risk is primarily outages of major sites across the web, including
> certs used in Google wallet. We’re thinking that is a less than desirable
> result, but we weren’t sure how the Mozilla community would feel/react.

I don't think there's *any* result from all this that everyone would consider
desirable -- otherwise we wouldn't need to have this conversation.

> We’re still considering revoking all of the certs on Jan 15th based on
> these discussions. I don’t think we’re asking for leniency (maybe we are
> if that’s a factor?)

I'm not sure I'd call it "leniency", but I think you're definitely asking
for "special treatment" -- pre-judgment on a potential incident so you can
decide whether or not it's worth it (to DigiCert) to deliberately break the
rules.

> Normally, we would just revoke the certs, but there are a significant
> number of certs in the Alexa top 100. We’ve told most customers, “No
> exception”.

What were the criteria by which DigiCert decided which customers to grant
exceptions to? My default assumption is "whichever ones will cost us the
most money, on a risk-of-departure-weighted basis, if we revoke their
misissued certs", so if DigiCert's criteria were different, I'd be keen to
have my assumption changed.

> I also thought it’s better to get the information out there so we can all
> make rational decisions (DigiCert included) if as many facts are known as
> possible.

There are a number of areas that I think could stand to have some more facts
added.

First off, your customers. There is a certain amount of exposition in the
pharmacy company bug, however I can't say that what's there so far fills me
with a sense of contentment. You said in your most recent post, "Security
vulnerabilities are patched based on their rating", and that lacking a CVSS
it is difficult to get recognition of a problem. Would it be fair to say
that this narrow approach to security is shared by all/most/some/none of the
other similarly situated customers?

As an aside, on the subject of "there's no CVSS score for this", let me fix
that up, with the official WombleSecure(TM)(R)(Patent Pending) CVSS for
"your certs are getting revoked":

https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:H/RL:O/RC:C/AR:H/MAV:N/MAC:L/MPR:N/MUI:N/MS:U/MC:N/MI:N/MA:H

7.5 base, 7.2 temporal, and 8.9 environmental. All those scores are in the
"high" band. "Availability" *is* one of the sides of the security triangle,
after all.
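
For anyone who wants to check the arithmetic, here is a minimal sketch of how those three numbers fall out of the CVSS v3.0 equations for the vector in the link above. It assumes the metric weights published in the FIRST specification. The function names are illustrative, and the rounding helper borrows the integer-based variant described in the later v3.1 specification to avoid floating-point noise.

```python
import math

# Metric weights from the CVSS v3.0 specification for the vector above.
AV_N, AC_L, PR_N, UI_N = 0.85, 0.77, 0.85, 0.85  # AV:N, AC:L, PR:N, UI:N
C_N, I_N, A_H = 0.0, 0.0, 0.56                   # C:N, I:N, A:H
E_H, RL_O, RC_C = 1.0, 0.95, 1.0                 # E:H, RL:O, RC:C
AR_H = 1.5                                       # AR:H (availability requirement)


def roundup(x):
    """Smallest number with one decimal place that is >= x."""
    scaled = round(x * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def scores():
    """Return (base, temporal, environmental) for the vector, scope unchanged."""
    exploitability = 8.22 * AV_N * AC_L * PR_N * UI_N

    # Base: impact sub-score from C/I/A, then add exploitability.
    iss = 1 - (1 - C_N) * (1 - I_N) * (1 - A_H)
    base = roundup(min(6.42 * iss + exploitability, 10))

    # Temporal: base scaled by exploit maturity, remediation level, confidence.
    temporal = roundup(base * E_H * RL_O * RC_C)

    # Environmental: the modified metrics equal the base metrics in this
    # vector, so only the High availability requirement changes the impact.
    miss = min(1 - (1 - C_N) * (1 - I_N) * (1 - A_H * AR_H), 0.915)
    modified = roundup(min(6.42 * miss + exploitability, 10))
    environmental = roundup(modified * E_H * RL_O * RC_C)

    return base, temporal, environmental


if __name__ == "__main__":
    print(scores())  # (7.5, 7.2, 8.9)
```

The environmental score is the highest of the three because the High availability requirement multiplies the availability impact by 1.5 before the temporal multipliers are applied.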

Focusing on the "what about next time?" aspect, which I believe is the most
important, I'd be interested to know what your customers are planning on
changing about their systems and processes, such that if a similar event
happens in the future, the outcome won't be the same.

A similar question applies, even more forcefully, to DigiCert itself.
Clearly, whatever you've done so far didn't work, because these customers of
yours didn't heed whatever warnings and caveats you provided, and built
themselves systems and processes that are unable to comply with their
agreements to DigiCert (and, by extension, relying parties).

Hence, what is it that DigiCert plans to change, such that an equivalent
result cannot happen in the future, given a similar event? There was one
rather draconian possibility suggested up-thread, of DigiCert limiting
itself to 100 days validity, and revoking a number of randomly-chosen
certificates periodically. That would certainly remove any practical
possibility of customers not being able to refresh their certificates
if-and-when, however I can imagine it might be a bit of a shock to the
system for many of them.

Hence, I'd be interested in hearing what DigiCert's actual plans are,
because if it were my call, *that* would be the single biggest factor in
determining the disposition of an event like this. That errors occur is
regrettable, but it's when they happen repeatedly that it becomes
indefensible.

- Matt

Jeremy Rowley

unread,

Dec 27, 2018, 10:01:03 PM12/27/18

to ry...@sleevi.com, James Burton, mozilla-dev-security-policy

The risk Matt identified is too nebulous of an issue to address, tbh. How do you address a moral issue? The only way I can think of to address the moral issue is to say “we promise to be good”. But the weight that carries depends on how much you trust the actor. If you trust the actor, then the moral issue is addressed. If you don’t trust the actor, moral issue is not addressed. If you or Matt can identify a specific threat you’d like me to address about the moral issue, I’ll do my best to respond.

* What happens is that you ask why there is risk of outage to begin with and what can be done to improve going forward? Let’s assume you do revoke, and it causes an outage - is DigiCert taking steps to ensure no customer of theirs is ever faced with that risk? If so, what are those steps?

Yeah – there are several things we can do to improve going forward:

1. Communicate better with the customers. The first mistake was waiting until we had good data to communicate with the customers. This delayed notification. This was unknown to me at the time, or we would have sent out communication prior to the ballot passing. That instruction has been passed along (no waiting on these critical issues) plus training.
2. No more skipping CAB Forum meetings for me. This was easily a foreseeable issue because we knew people couldn’t replace in January. I think it’s been brought up a half dozen times in the forum at least. I’m not sure why we didn’t communicate this in Shanghai. But, the real problem is I didn’t have direct knowledge of what was going on. I probably need to be there in person each time so we can align the company correctly with what is going on.

I don’t think we can ever take steps to ensure that no customer is ever faced with the risk of revoked certs. I’m sure there will be other items that are adopted we don’t foresee. That said, we do promote automation, short-lived certs (you can get anything from about 8 hours up through our system), and CT logging. I think the biggest surprise on this one was it applied to certs that are no longer trusted by Mozilla or Google.

> This seems to suggest that perhaps other CAs have prepared their customers for revocation. How does this surprise - that no other CA faces this - lead to tangible changes in the business processes? How would this change, if another CA did have the same issue? Surely you can see there are real and fundamental issues that you’re uniquely qualified to help your customers address in ways that we cannot.

I suppose they did prepare better. Maybe other CAs are just smarter than me? I won’t leave that off the table. I agree that we are uniquely positioned to help our customers remediate. Definitely anxious to do that (and are doing so).

* Have you analyzed CT, for example, to see why DigiCert is unique? Certainly, by sheer volume, it's heavily tilted towards the old Symantec infrastructure - and the customers that came over to DigiCert. With those sorts of details, how does this change how things were done, or how they will be done?

We do know most of the customers were legacy Symantec, but there are definitely some DigiCert customers in there. I think we still continue the same course. It’s only been a year from the transition, and we’ve migrated nearly everyone off the Symantec infrastructure. Next comes shutting down all the legacy Symantec systems.

* I’m not trying to pick on y’all - I think it is legitimately good that you provided concrete data. Even if you do revoke on Jan 15, this is still useful to understand the challenges, but only if this leads to meaningful changes. What might those look like?

I appreciate that. I think these are all fair questions, and I’m trying my best to answer them. I especially don’t feel picked on since we’re requesting the information/decision on what to do.

I don’t know how to answer the question of what changes to make because I was a bit blindsided by the decision to revoke the certs. Probably shouldn’t have been considering the conversation at the CAB Forum. My number one priority right now is to shut down all of the legacy Symantec systems. Last year was mostly migration of issuance and trying to get the systems up to an expected caliber of performance. At the same time we’re introducing industry-standard (and above) automation of issuance and deployment systems that we hope will help people replace certificates faster.

* And this is the framing that I think is incredibly helpful. Understanding why customers can’t change, and what steps are being done to ensure they can, is hugely useful. Wayne’s questions were to this point - as were mine towards understanding the problem from the other side, which are steps the CA is taking. As I've repeatedly highlighted from https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , the goal is not punishment - but understanding how these issues are being addressed.

The main blocker for all of these is policy, not technology. I don’t know how to solve third party policy decisions, which is why I can’t seem to answer the questions. The process of planning a change, getting sign-off, rolling the change to stage, getting more sign-off, and then rolling to production with final testing combined with the blackout periods is making something that should be easy very difficult. I run an agile team at DigiCert so none of these are concerns when we roll a change internally. It’s the revocation part that is getting people up in arms. The consistent message I’ve gotten from customers is that changing domains and certificates requires the same process. It’s just as fast to roll out a change to both items as change just a certificate. The built-in CAB Forum 30 day cert requirement isn’t solving the issue because of the way they roll changes, not because the 30 day certs aren’t available.

* This seems like a significant improvement from “100% of customers can’t”

Definitely an improvement. I’m hoping to get to 100% by the time we hit Jan 15th. The four I posted (and one more I got more info from today) probably won’t. Even within those customers, we’re asking them to identify specifically which certificates cannot be replaced in time.

* I mean, it’s two-fold, right? Any incident can lead to total distrust, but it’s also unlikely that a single incident leads to total distrust. The way to balance those competing statements is to do what you’re doing - and to be transparent. As Matt has highlighted, there’s a huge risk here that this leads to a moral hazard - and the best way to mitigate that is to discuss steps being taken to reduce that risk going forward, particularly about what a core part of the problem statement is - difficulty in revocation.

This isn’t our first incident sadly ☹. It probably won’t be our last. The transition from Symantec to DigiCert was….rough.

* In a number of ways, an unintentional violation is worse than an intentional violation. Ignorance is not really an excuse when you hold keys to the Internet, and being asleep at the wheel is hugely dangerous. So, if I had to pick between an intentional violation and an unintentional (and preventable) violation, I'd likely pick intentional. But there's also a huge hazard with intentional violations - those reveal potentially systemic issues and a lack of good faith, especially if they become common-place. We definitely saw CAs perform intentional violations and notify after-the-fact, and that's far, far worse than those that notify before intentionally violating (I think every post-facto notification for an intentional incident has, eventually, led to that CA's distrust).

Totally agree. I really don’t want to violate the BRs, and this shouldn’t be the norm. I also recognize we don’t want to invite this question for every BR change. Maybe better Mozilla guidelines about what’s acceptable requests and what’s not?

* So somewhere on the scale of things, we're in a better place than most every alternative. But to ensure this is in that 'good faith' side of things, understanding what the factors are that have been evaluated, and what steps are being taken to prevent this, are significant. As I said, I think the principles captured in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation and in the discussion about how at least some of us see this (that it's related to underscores incident response) suggests that it's not, in fact, the end of the world, or the CA, provided that meaningful data behind the decision to not revoke is given, meaningful plans and timelines for resolution are given, and meaningful steps to prevent this from ever happening again are given. It becomes an incident report, and the result is not a stern lecture - but concrete and quantifiable steps as to how to improve.

Thanks Ryan. This post was really nice. Appreciate it.


Jeremy Rowley

unread,

Dec 27, 2018, 10:19:35 PM12/27/18

to Matt Palmer, mozilla-dev-security-policy

> I don't think there's *any* result from all this that everyone would
> consider desirable -- otherwise we wouldn't need to have this conversation.

+ 1 to that.

> I'm not sure I'd call it "leniency", but I think you're definitely asking
> for "special treatment" -- pre-judgment on a potential incident so you can
> decide whether or not it's worth it (to DigiCert) to deliberately break the
> rules.

I'm not sure there's a policy against asking for special treatment or
pre-judgment. Like I said, I feel like this is a weird area where I'm not 100%
sure how to proceed. Like how do you raise it when you think obedience to rules
is riskier than breaking them? Breaking them then explaining why seems like a
really bad idea. The best I could come up with is to ask what to do and see if
the browsers agree. Acknowledged that this would be very bad in most cases,
but I'm not sure where you decide?

> What were the criteria by which DigiCert decided which customers to grant
> exceptions to? My default assumption is "whichever ones will cost us the
> most money, on a risk-of-departure-weighted basis, if we revoke their
> misissued certs", so if DigiCert's criteria was different, I'd be keen to
> have my assumption changed.

Based on the number of certificates, the reasons the customer identified for why
they couldn't make the change, and whether revocation would take down a critical site.
It actually isn't tied to $ at all. The largest issuer of certificates isn't
on the exception list. Honestly, it came down to which ones were the most mad
at me for telling them I am going to revoke their certs. I also filed the
incident reports in that order.

> First off, your customers. There is a certain amount of exposition in the
> pharmacy company bug, however I can't say that what's there so far fills me
> with a sense of contentment. You said in your most recent post, "Security
> vulnerabilities are patched based on their rating", and that lacking a CVSS
> it is difficult to get recognition of a problem. Would it be fair to say
> that this narrow approach to security is shared by all/most/some/none of the
> other similarly situated customers?

No, but it's generally how people can get exceptions to the blackout period.
More the norm is around how these certs are rolled out. They fall into three
camps: a) a third party offering the main company's service that requires a
bunch of testing and permissions (probably contractual), b) complicated
policies about changes during/around blackout periods and c) certs actually
used in software that require code changes and deployment to update.

> As an aside, on the subject of "there's no CVSS score for this", let me fix
> that up, with the official WombleSecure(TM)(R)(Patent Pending) CVSS for
> "your certs are getting revoked":

> https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:H/RL:O/RC:C/AR:H/MAV:N/MAC:L/MPR:N/MUI:N/MS:U/MC:N/MI:N/MA:H
>
> 7.5 base, 7.2 temporal, and 8.9 environmental. All those scores are in the
> "high" band. "Availability" *is* one of the sides of the security triangle,
> after all.
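
For readers who want to check the three quoted scores, here is a minimal sketch of the arithmetic in Python. It assumes the published CVSS v3.0 equations and metric weights for the vector in the link above (scope unchanged, CR and IR left undefined and so weighted 1.0). The names `roundup` and `scores` are illustrative only; nothing like this appeared in the original message.

```python
import math


def roundup(x: float) -> float:
    """CVSS v3.0 Roundup: smallest value with one decimal place that is >= x."""
    return math.ceil(x * 10) / 10


# Weights assumed from the CVSS v3.0 specification for this vector:
# AV:N, AC:L, PR:N, UI:N, C:N, I:N, A:H, E:H, RL:O, RC:C, AR:H.
AV, AC, PR, UI = 0.85, 0.77, 0.85, 0.85
C, I, A = 0.0, 0.0, 0.56
E, RL, RC = 1.0, 0.95, 1.0
CR, IR, AR = 1.0, 1.0, 1.5  # CR and IR are "not defined" in the vector


def scores():
    """Return (base, temporal, environmental) for the vector above."""
    exploitability = 8.22 * AV * AC * PR * UI
    impact = 6.42 * (1 - (1 - C) * (1 - I) * (1 - A))  # scope unchanged
    base = roundup(min(impact + exploitability, 10)) if impact > 0 else 0.0
    temporal = roundup(base * E * RL * RC)
    # Modified metrics (MAV, MAC, ...) repeat the base values in this vector,
    # so the modified exploitability equals the base exploitability.
    m_iss = min(1 - (1 - C * CR) * (1 - I * IR) * (1 - A * AR), 0.915)
    m_impact = 6.42 * m_iss
    environmental = roundup(
        roundup(min(m_impact + exploitability, 10)) * E * RL * RC
    )
    return base, temporal, environmental


print(scores())
```

Under those assumptions the sketch reproduces the 7.5 / 7.2 / 8.9 figures; the only metric pushing the score up is the availability impact, which is the point being made.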

Lol - thanks. I'll be sure to share this with them.

> Focusing on the "what about next time?" aspect, which I believe is the most
> important, I'd be interested to know what your customers are planning on
> changing about their systems and processes, such that if a similar event
> happens in the future, the outcome won't be the same.

After this, I'd like to talk about removing some of the Symantec roots from
Mozilla. A lot of these don't need trust in Mozilla and Chrome. The mix is in
the OS vs. Web ecosystem. They need trust in OS platforms, but Web is more
optional for a lot of the certs. If we have roots that are only trusted in
the two OS platforms (MS and Apple), the risk changes for the web community.

> A similar question applies, even more forcefully, to DigiCert itself.
> Clearly, whatever you've done so far didn't work, because these customers of
> yours didn't heed whatever warnings and caveats you provided, and built
> themselves systems and processes that are unable to comply with their
> agreements to DigiCert (and, by extension, relying parties).

See above. Also see my response to Ryan on the migration from legacy Symantec
systems.

> Hence, what is it that DigiCert plans to change, such that an equivalent
> result cannot happen in the future, given a similar event? There was one
> rather draconian possibility suggested up-thread, of DigiCert limiting
> itself to 100 days validity, and revoking a number of randomly-chosen
> certificates periodically. That would certainly remove any practical
> possibility of customers not being able to refresh their certificates
> if-and-when, however I can imagine it might be a bit of a shock to the
> system for many of them.

I don't think that really solves the problem. All that does is migrate people
from one CA to another. I don't have a good answer to this other than to
continue investing in automation and discovery systems. As mentioned though,
most of these complexities are with third party policy issues, not technical
issues. I'm unaware of any legal requirement that would prohibit revocation. I
know there's no technical limitation on our side that prevents issuing and
deploying new certs immediately.

> Hence, I'd be interested in hearing what DigiCert's actual plans are,
> because if it were my call, *that* would be the single biggest factor in
> determining the disposition of an event like this. That errors occur is
> regrettable, but it's when they happen repeatedly that it becomes
> indefensible.

True, but I think there should always be a discussion of risks involved in a
course of action. Blind obedience to RFCs isn't a good idea. A worse idea is
to not follow them because you don't want to. An even worser worse idea is to
challenge every rule you don't like from the CAB forum. Not sure the balance,
but I think my answer next time is "Revoke, no exception" rather than "Let me
see what the browser community thinks".


Ryan Sleevi

Dec 27, 2018, 10:22:48 PM

to Jeremy Rowley, ry...@sleevi.com, James Burton, mozilla-dev-security-policy

On Thu, Dec 27, 2018 at 10:00 PM Jeremy Rowley <jeremy...@digicert.com>
wrote:

> The risk Matt identified is too nebulous of an issue to address, tbh. How
> do you address a moral issue? The only way I can think of to address the
> moral issue is to say “we promise to be good”. But the weight that carries
> depends on how much you trust the actor. If you trust the actor, then the
> moral issue is addressed. If you don’t trust the actor, moral issue is not
> addressed. If you or Matt can identify a specific threat you’d like me to
> address about the moral issue, I’ll do my best to respond.
>

I think Matt provided a pretty clear moral hazard here - of customers
suggesting their CAs didn't do enough (e.g. should have tried harder to
intentionally violate the rules by not revoking). One significant way to
mitigate that risk is to take meaningful steps to ensure that "We couldn't
revoke" is not really a viable or defensible option.

> - What happens is that you ask why there is risk of outage to begin
> with and what can be done to improve going forward? Let’s assume you do
> revoke, and it causes an outage - is DigiCert taking steps to ensure no
> customer of theirs is ever faced with that risk? If so, what are those
> steps?
>
> Yeah – there are several things we can do to improve going forward:
>
> 1. Communicate better with the customers. The first mistake was
> waiting until we had good data to communicate with the customers. This
> delayed notification. This was unknown to me at the time, or we would have
> sent out communication prior to the ballot passing. That instruction has
> been passed along (no waiting on these critical issues) plus training.
> 2. No more skipping CAB Forum meetings for me. This was easily a
> foreseeable issue because we knew people couldn’t replace in January. I
> think it’s been brought up a half dozen times in the forum at least. I’m
> not sure why we didn’t communicate this in Shanghai. But, the real problem
> is I didn’t have direct knowledge of what was going on. I probably need to
> be there in person each time so we can align the company correctly with
> what is going on.
>

That... doesn't really inspire confidence. If the answer for how to deal
with this is block efforts to remediate issues, then it runs all the risk
that Matt was speaking to. "We knew people couldn't replace in January" is
a problem, for sure, but because fundamentally the risk is always there
that someone would need to revoke in January - or December, or November, or
whenever the sensitive holiday freeze or critical sales or lunar alignment
or personal vacation is - it's not really a mitigation at all for the issue.

I tried to give suggestions earlier for meaningful steps - such as making
sure all customers know that certificates may need to be revoked as soon as
24 hours. This has been a pattern of challenge in the past for DigiCert if
I recall correctly - I believe both Blizzard and GitHub had issues where
the keys were compromised, but these organizations didn't want to revoke
the certs until they could ship new private keys in their software (...
ignoring all the issues in that one). I know you've said you've got the
contracts in place to defensibly revoke these, but how are you helping your
users understand these risks? Do you have documentation on this? Do you
recommend users use automation? I know some of this speaks to business
practice, but I think that's somewhat core to the issue - since revocation
may be required, how is the CA, the party best placed to communicate to the
customer, communicating that necessity?

As Matt spoke to it somewhat, there's understandably competitive advantage
to being the CA that will try their hardest not to revoke. And while I
don't think this has risen to that level based on the information provided
so far, understanding how that perception is being mitigated is key. There
are other solutions, to be sure. Helping users move from publicly trusted
CAs to managed CAs, for example, can still meet the business needs of these
users w/o the attendant revocation risk.

Things like Heartbleed have shown that rapid revocation can be necessary.
Misissuance or misvalidation by the CA that results in revocation surely
can as well. Understandably, an answer of "Don't ever misissue" is great,
but it's really pinning all the hopes on one thing. Other CAs have taken
steps like ensuring automation and short-lived certs as a way of ensuring
that the upper-bound of any issue is limited (for example, to 90 days, or
six months), and that automation is the default way of getting certs.

> - And this is the framing that I think is incredibly helpful.
> Understanding why customers can’t change, and what steps are being done to
> ensure they can, is hugely useful. Wayne’s questions were to this point - as
> were mine towards understanding the problem from the other side, which are
> steps the CA is taking. As I've repeatedly highlighted from
> https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation ,
> the goal is not punishment - but understanding how these issues are being
> addressed.
>
> The main blocker for all of these is policy, not technology. I don’t know
> how to solve third party policy decisions, which is why I can’t seem to
> answer the questions. The process of planning a change, getting sign-off,
> rolling the change to stage, getting more sign-off, and then rolling to
> production with final testing combined with the blackout periods is making
> something that should be easy very difficult. I run an agile team at
> DigiCert so none of these are concerns when we roll a change internally.
> It’s the revocation part that is getting people up in arms. The consistent
> message I’ve gotten from customers is that changing domains and
> certificates requires the same process. It’s just as fast to roll out a
> change to both items as change just a certificate. The built-in CAB Forum
> 30 day cert requirement isn’t solving the issue because of the way they
> roll changes, not because the 30 day certs aren’t available.
>

So, concrete suggestions then, since it sounds like you're asking for that.

1) Communication to all your customers about the industry-standard
revocation requirements
2) Clear promotion, documentation, and tools for automation
3) Clear and published policies about the critical nature of certificates
and how they should be regarded

None of these are necessarily unique to DigiCert - #2 gets close, but there
are options. When your customer comes to you and says "We have a holiday
freeze", doesn't it seem better placed to say "Look, beyond just signing a
subscriber agreement, we sent messages on dates X, Y, and Z around the
industry standard practices around revocation. We also provided solutions
A, B, and C, which were all declined by your team."

I know that sounds like "shifting blame", but since you're absolutely
correct that you can't prevent your customers from engaging in risky
behaviours, the best you can do is to make sure that it's clear to them
that it is risky, it is unsupported, and it's not DigiCert being mean, but
industry standard. There's an opportunity to take this incident and make
sure no DigiCert customer ever experiences this issue again - or those of
any other CA.

That's I think one way to mitigate the moral hazard Matt speaks to. When
the next customer of the next CA comes and says "Look, you need to try to
get us an exception" - the CA knows that if they didn't do all of those
steps, they really didn't take any of (these) lessons to heart, and it's
not really defensible. And, if they did take those steps, then hasn't the
expectation been shifted to the customer - and the risk - thus making it
easier for the CA to defensibly say "You did this to yourself?"

> Totally agree. I really don’t want to violate the BRs, and this shouldn’t
> be the norm. I also recognize we don’t want to invite this question for
> every BR change. Maybe better Mozilla guidelines about what’s acceptable
> requests and what’s not?
>

I can't speak for Mozilla here, but I tried to lay out some clear
expectations:
1) This is an extension of an existing incident, rather than treating it as
an exception to some long-standing or new rule
2) This is being treated as part of the remediation (revocation) plan,
rather than as an intentional violation of some other requirement
3) Going forward, "they weren't prepared for revocation" is not really an
acceptable answer in and of itself, and for this particular incident,
concrete proposals for how "They weren't prepared for revocation" can be
addressed or mitigated go a long way to addressing the underlying root
cause here, and by proxy, demonstrate a healthy awareness of and balancing
of risk, and ways to concretely mitigate that for the future.

Jeremy Rowley

Dec 27, 2018, 10:48:09 PM

to ry...@sleevi.com, James Burton, mozilla-dev-security-policy

>> I think Matt provided a pretty clear moral hazard here - of customers suggesting their CAs didn't do enough (e.g. should have tried harder to intentionally violate the rules by not revoking). One significant way to mitigate that risk is to take meaningful steps to ensure that "We couldn't revoke" is not really a viable or defensible option.

Oh – thanks. I missed that. A lack of knowledge is already not a defensible position. Revocation requirements and an agreement to revoke within 24 hours are in all of our existing DigiCert contracts. The same language is going into all Symantec customer contracts now as customers transition to DigiCert systems. All of our documentation, including the CPS, says we can revoke with less than 1 day notice.

From section 4.9.1 of our CPS:

DigiCert will revoke a Certificate within 24 hours if one or more of the following occurs:

1. The Subscriber requests in writing that DigiCert revoke the Certificate;
2. The Subscriber notifies DigiCert that the original Certificate request was not authorized and does not retroactively grant authorization;
3. DigiCert obtains evidence that the Subscriber’s Private Key corresponding to the Public Key in the Certificate suffered a Key Compromise; or
4. DigiCert obtains evidence that the validation of domain authorization or control for any FQDN or IP address in the Certificate should not be relied upon.

DigiCert may revoke a certificate within 24 hours and will revoke a Certificate within 5 days if one or more of the following occurs:

1. The Certificate no longer complies with the requirements of Sections 6.1.5 and 6.1.6 of the CA/B forum baseline requirements;
2. DigiCert obtains evidence that the Certificate was misused;
3. The Subscriber or the cross‐certified CA breached a material obligation under the CP, this CPS, or the relevant agreement;
4. DigiCert confirms any circumstance indicating that use of a FQDN or IP address in the Certificate is no longer legally permitted (e.g. a court or arbitrator has revoked a Domain Name registrant’s right to use the Domain Name, a relevant licensing or services agreement between the Domain Name registrant and the Applicant has terminated, or the Domain Name registrant has failed to renew the Domain Name);
5. DigiCert confirms that a Wildcard Certificate has been used to authenticate a fraudulently misleading subordinate FQDN;
6. DigiCert confirms a material change in the information contained in the Certificate;
7. DigiCert confirms that the Certificate was not issued in accordance with the CA/B forum requirements or the DigiCert CP or this CPS;
8. DigiCert determines or confirms that any of the information appearing in the Certificate is inaccurate;
9. DigiCert’s right to issue Certificates under the CA/B forum requirements expires or is revoked or terminated, unless DigiCert has made arrangements to continue maintaining the CRL/OCSP Repository;

….

This is why I couch it as we can revoke technically and legally, but I don’t think we should.

>> This doesn't really inspire confidence. If the answer for how to deal with this is block efforts to remediate issues, then it runs all the risk that Matt was speaking to. "We knew people couldn't replace in January" is a problem, for sure, but because fundamentally the risk is always there that someone would need to revoke in January - or December, or November, or whenever the sensitive holiday freeze or critical sales or lunar alignment or personal vacation is - it's not really a mitigation at all for the issue.

>> I tried to give suggestions earlier for meaningful steps - such as making sure all customers know that certificates may need to be revoked as soon as 24 hours. This has been a pattern of challenge in the past for DigiCert if I recall correctly - I believe both Blizzard and GitHub had issues where the keys were compromised, but these organizations didn't want to revoke the certs until they could ship new private keys in their software (... ignoring all the issues in that one). I know you've said you've got the contracts in place to defensibly revoke these, but how are you helping your users understand these risks? Do you have documentation on this? Do you recommend users use automation? I know some of this speaks to business practice, but I think that's somewhat core to the issue - since revocation may be required, how is the CA, the party best placed to communicate to the customer, communicating that necessity?

Sorry – I thought you meant in addition to those things. All customers know we can revoke within 24 hours. Note that in the GitHub case we did revoke the cert within 24 hours of notification. We have documentation on revocation (e.g. https://www.digicert.com/certificate-revocation.htm) and talk about it a lot. We also recommend automation. We had our own automation tools before ACME came along. We were implementing ACME until we got distracted with the migration. It’s back on track and should be supported in DigiCert systems soon. We won’t be integrating it with legacy Symantec systems as those are being EoL’ed.

>> As Matt spoke to it somewhat, there's understandably competitive advantage to being the CA that will try their hardest not to revoke. And while I don't think this has risen to that level based on the information provided so far, understanding how that perception is being mitigated is key. There are other solutions, to be sure. Helping users move from publicly trusted CAs to managed CAs, for example, can still meet the business needs of these users w/o the attendant revocation risk.

There probably is. As you pointed out, a lot of the issues are mixing of PKIs. We’re going to better separate out web PKI vs. OS PKI going forward. I’m working on some internal proposals on that. We do provide managed PKI already. This managed PKI uses the same tools and similar automation tools as our publicly trusted certificates at a fraction of the cost. They are available alongside each other in enterprise accounts to encourage people to use non-public certs where appropriate.

>> Things like Heartbleed have shown that rapid revocation can be necessary. Misissuance or misvalidation by the CA that results in revocation surely can as well. Understandably, an answer of "Don't ever misissue" is great, but it's really pinning all the hopes on one thing. Other CAs have taken steps like ensuring automation and short-lived certs as a way of ensuring that the upper-bound of any issue is limited (for example, to 90 days, or six months), and that automation is the default way of getting certs.

We support both of those things. Unfortunately the automation isn’t using ACME quite yet.

>> So, concrete suggestions then, since it sounds like you're asking for that.

Yes thanks!

>> 1) Communication to all your customers about the industry-standard revocation requirements

Easier said than done, but I’ll brainstorm something. I mean, we have communicated this to all customers previously but whether that communication is received is a different question.

>> 2) Clear promotion, documentation, and tools for automation

This will be easier shortly

>> 3) Clear and published policies about the critical nature of certificates and how they should be regarded

Okay – we’ll work on this. I think we have some documentation already, but it’s probably a good idea to link it in our agreements. We do reference the BRs and Mozilla policy in each agreement so I know those are distributed to every customer.

>> None of these are necessarily unique to DigiCert - #2 gets close, but there are options. When your customer comes to you and says "We have a holiday freeze", doesn't it seem better placed to say "Look, beyond just signing a subscriber agreement, we sent messages on dates X, Y, and Z around the industry standard practices around revocation. We also provided solutions A, B, and C, which were all declined by your team."

No one ever says this until you revoke their certs. Before this, we had a vague awareness of holiday freezes from past discussions here (particularly with Symantec issues). However, I didn’t know first-hand until this event what they really entailed.

>> I know that sounds like "shifting blame", but since you're absolutely correct that you can't prevent your customers from engaging in risky behaviours, the best you can do is to make sure that it's clear to them that it is risky, it is unsupported, and it's not DigiCert being mean, but industry standard. There's an opportunity to take this incident and make sure no DigiCert customer ever experiences this issue again - or those of any other CA.

Sounds good. Thanks a ton for the suggestions.

>> That's I think one way to mitigate the moral hazard Matt speaks to. When the next customer of the next CA comes and says "Look, you need to try to get us an exception" - the CA knows that if they didn't do all of those steps, they really didn't take any of (these) lessons to heart, and it's not really defensible. And, if they did take those steps, then hasn't the expectation been shifted to the customer - and the risk - thus making it easier for the CA to defensibly say "You did this to yourself?"

Yeah. This is exactly what I was looking for. I like this because it gives clear guidance on what’s expected beforehand. What do you need to do before taking a potential issue to the Mozilla community?

>> I can't speak for Mozilla here, but I tried to lay out some clear expectations:

Definitely helpful. Thank you!

Matt Palmer

Dec 27, 2018, 11:39:42 PM

to dev-secur...@lists.mozilla.org

On Fri, Dec 28, 2018 at 03:19:19AM +0000, Jeremy Rowley via dev-security-policy wrote:
> > I'm not sure I'd call it "leniency", but I think you're definitely asking
> > for "special treatment" -- pre-judgment on a potential incident so you can
> > decide whether or not it's worth it (to DigiCert) to deliberately break the
> > rules.
>
> I'm not sure there's a policy against asking for special treatment or
> pre-judgment. Like I said, I feel like this is a weird area where I'm not 100%
> sure how to proceed.

There's certainly a fuzzy area in the middle between "here is a problem,
what should we do?" and the other extreme of "please let me know in advance
if we'll be OK with doing this bad thing, because I'd like to decide whether
it's worth breaking the rules". I have to say that several of your messages
have read far more towards the latter than the former.

Of course, the ability to distinguish is muddied by the need for you to
provide specific data about the scope of the problem, which focuses things
on just DigiCert, when there is the distinct possibility that other CAs are
sitting quietly in the wings, having all the data but not wanting to step
into the ring, as it were.

> Like how do you raise when you think obedience to rules
> is riskier than breaking them? Breaking them then explaining why seems like a
> really bad idea. The best I could come up with is ask what to do and see if
> the browsers agree. Acknowledged that this would be very bad in most cases,
> but I'm not sure where you decide?

I think you've followed the best course open to you. Talking about issues
is pretty much guaranteed to be better than keeping quiet and hoping for the
best (thanks, CT!).

Certainly, knowingly breaking the rules and then having it turn up later is
terrible -- as Ryan said, that's a quick way to get yourself distrusted. I
certainly think that if any other CA comes out with an incident report
post-Jan-15 dealing with unrevoked underscore-bearing certificates, the
general reaction is going to be along the lines of, "are you <expletive>
*kidding* me?!?".

> > What were the criteria by which DigiCert decided which customers to grant
> > exceptions to?

[snip]

> Honestly, it came down to which ones were the most mad at me for telling
> them I am going to revoke their certs.

I can imagine...

> > First off, your customers. There is a certain amount of exposition in the
> > pharmacy company bug, however I can't say that what's there so far fills me
> > with a sense of contentment. You said in your most recent post, "Security
> > vulnerabilities are patched based on their rating", and that lacking a CVSS
> > it is difficult to get recognition of a problem. Would it be fair to say
> > that this narrow approach to security is shared by all/most/some/none of the
> > other similarly situated customers?
>
> No, but it's generally how people can get exceptions to the blackout period.
> More the norm is around how these certs are rolled out. They fall under three
> camps: a) a third party offering the main company's service that requires a
> bunch of testing and permissions (probably contractual), b) complicated
> policies about changes during/around blackout periods and c) certs actually
> used in software that require code changes and deployment to update.

Those are useful categories to have, thanks. It's especially handy for CAs
to bear in mind when they're communicating with their customers about the
risks of deeply embedding data which may need to change at short notice.

> > Focusing on the "what about next time?" aspect, which I believe is the most
> > important, I'd be interested to know what your customers are planning on
> > changing about their systems and processes, such that if a similar event
> > happens in the future, the outcome won't be the same.
>
> After this, I'd like to talk about removing some of the Symantec roots from
> Mozilla. A lot of these don't need trust in Mozilla and Chrome. The mix is in
> the OS vs. Web ecosystem. They need trust in OS platforms, but Web is more
> optional for a lot of the certs. If we have roots that are only trusted in
> the two OS platforms (MS and Apple), the risk changes for the web community.

I wonder how well that'll work out, given the dominant server platform
(Linux, in its many and varied incarnations) generally sources its trust
store from Mozilla (for better or worse). Given the highly variable
timeline that distros have for updating their trust stores, you might be
dealing with the fallout from that one for a *long* time to come.

> > Hence, what is it that DigiCert plans to change, such that an equivalent
> > result cannot happen in the future, given a similar event? There was one
> > rather draconian possibility suggested up-thread, of DigiCert limiting
> > itself to 100 days validity, and revoking a number of randomly-chosen
> > certificates periodically. That would certainly remove any practical
> > possibility of customers not being able to refresh their certificates
> > if-and-when, however I can imagine it might be a bit of a shock to the
> > system for many of them.
>
> I don't think that really solves the problem. All that does is migrate people
> from one CA to another.

Well, it solves the problem of *DigiCert* customers not having change
blackouts for four months of the year, although as you say, whether that's
because DigiCert customers improve their systems and processes, or whether
they stop being DigiCert customers, is important. The key way to mitigate
the latter would be to make it an industry-wide expectation.

> I don't have a good answer to this other than to
> continue investing in automation and discovery systems. As mentioned though,
> most of these complexities are with third party policy issues, not technical
> issues. I'm unaware of any legal requirement that would prohibit revocation. I
> know there's no technical limitation on our side that prevents issuing and
> deploying new certs immediately.

Third-party policy problems are, of course, somewhat immune to purely
technical solutions, which is why I'm particularly keen to hear from
DigiCert (and its customers) on what other measures would be useful to ensure
that customer policies can be encouraged in the right direction.

- Matt

Wayne Thayer

Dec 31, 2018, 12:46:16 PM

to Ryan Sleevi, Jeremy Rowley, James Burton, mozilla-dev-security-policy

On Thu, Dec 27, 2018 at 8:22 PM Ryan Sleevi via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> I can't speak for Mozilla here, but I tried to lay out some clear
> expectations:
> 1) This is an extension of an existing incident, rather than treating it as
> an exception to some long-standing or new rule
> 2) This is being treated as part of the remediation (revocation) plan,
> rather than as an intentional violation of some other requirement
>

This framing is technically correct and quite helpful in this situation,
but I am concerned about what it appears to imply for other ambiguous
requirements. Seeking clarity through a discussion and vote, as happened
here, is more orderly than a sudden declaration that a practice is
forbidden. This particular situation is a bit like a law that is challenged
in court, then upheld by the court and enforced, as opposed to a new law
being retroactively enforced.

> 3) Going forward, "they weren't prepared for revocation" is not really an
> acceptable answer in and of itself, and for this particular incident,
> concrete proposals for how "They weren't prepared for revocation" can be
> addressed or mitigated go a long way to addressing the underlying root
> cause here, and by proxy, demonstrate a healthy awareness of and balancing
> of risk, and ways to concretely mitigate that for the future.
>

Yes, this is the desired outcome.

Thanks James, Jeremy, Matt, and Ryan - I really appreciate the ideas that
have come out in this discussion. I'll be looking at ways to use some of
this thinking to improve the Mozilla incident reporting process -
suggestions are welcome.

- Wayne