Thanks for filing this, Jeremy.
If I understand correctly, the question DigiCert is asking is: "If we
submitted this as an incident report, would it be likely that conversations
about distrusting DigiCert would begin?", and that's what you're trying to
gauge from the community?
I think Wayne's already captured the "We need more information", but I
think it may be helpful to explain the reasoning and thinking here.
The Baseline Requirements and Root Program policies exist for a purpose: To
provide a consistent set of expectations for CAs, which meet the security
needs of the products using or operating those policies. As these policies
tend to call out, a CA may be removed (distrusted) for any reason or no
reason at all - it's entirely at the Program's discretion. That said, history
tends to largely see removals for patterns of issues that, in aggregate,
demonstrate an ongoing and significant risk to users and the Internet at
large, although there have been CAs removed for single incidents in the
past - such as key compromise or issuing MITM certificates.
As a CA, the risk is that any and every incident may lead to the CA's
removal, and thus the best path to avoid that is to not have incidents in
the first place. Further, a CA with a pattern of incidents is not wrong to
be even more careful when it comes to presenting new incidents, especially
if they realize that they share similar root causes or further demonstrate
problematic patterns. That's not to say that if you only had a single
incident, you wouldn't be removed - as the policies capture, any reason and
no reason - but on the balance, it has historically tended to be
less likely that first-time incidents lead to removal.
When incidents happen, it becomes necessary for Root Programs and the
communities they represent or collaborate with to evaluate the details of
the incident, as part of making a determination about what next steps are
appropriate. This involves investigating what the underlying root causes
are, to both ensure that the current CA with the incident understands the
significance, the severity, and the steps to remediate it, as well as to
help the industry at large develop and learn best practices, to prevent
future incidents. We're not the only industry to do this - in many ways,
it's borrowed from the aviation industry, which recognizes that critical
safety functions deserve thoughtful and detailed analysis to prevent harm
coming to those that trust in them.
Incident reports also serve to triage the issues - to work and identify the
risks and make sure they're being mitigated in a timely fashion. Sometimes,
the mitigation of risk may be to remove trust in the CA, other times, there
may be less significant steps that can be taken to address both the
immediate problem and the underlying issues.
DigiCert is now in a precarious position. As a CA, it knows that every one
of its Subscribers has agreed, in some legally binding form, that if the
CA has misissued a certificate, it MUST be revoked within 24 hours
(or, very recently, and only in some cases, 5 days). The CA has a duty and
obligation to their customers, the Subscribers, to make sure that they
understand this. This is not about a punitive measure or punishing those
users for something their CA did - it's because the fundamental and
inherent risk is that there are incidents where certificates will need to
be replaced in as little as 24 hours, up to and including trust in the CA
being removed. To go back to that aviation analogy, the reason planes have
maintenance schedules is not because they're going to completely come
unglued and fall apart if you miss that maintenance schedule by a day - but
because of the severe and significant harm that comes about from having no
maintenance schedule at all, or even simply one that just isn't suitable
for the risks (to life, property, and safety). Matt Palmer's reply earlier
in this thread further expands on some of the other risks here and the
hazards that come with them.
At the same time, DigiCert is, on behalf of their customers, saying that
even though both DigiCert and their customers agreed to the 24-hour
revocation rule, there are circumstances and situations that make that
risky. Despite being an industry standard (as captured in the Baseline
Requirements), and despite these agreements, DigiCert is concerned that
there are consequences for these customers that did not take adequate
precautions to meet the expectations they agreed to, and is trying to
perform a risk analysis. Further, they're looking for feedback from the
community to make sure that their analysis of the risk - the disruption to
their customers - is significant enough that it warrants both the immediate
risk of not revoking, the business risk to DigiCert, and the lasting risk
to the ecosystem, in intentionally violating the BRs.
It's not my intent to sound harsh, but to make sure it's clearly and
unambiguously stated as to what's happening. The reason for doing this is
because, on the balance, this seems to be exactly the recommendation in
https://wiki.mozilla.org/CA/Responding_To_An_Incident . This is called out
explicitly in the section on Revocation, which instructs the CA to perform a
risk analysis, develop a report, and devise a plan and timeline for
remediation. Further, this analysis should consider feedback of
third-parties, calling out explicitly both the CA's auditor and Root
Stores, as a means of checking that the analysis is balancing the right
tradeoffs, and that the plan is reasonable.
When a CA reports an incident, there is a discussion about what
certificates were impacted and the CA's plan and timeline to remediate them
- with the standing expectation being immediate revocation without some
otherwise demonstrable exigent risk. These plans factor into how the
incident is responded to by the Root Program - for example, the plan may
have inappropriately balanced the risk, they may have outright
misrepresented it, they may have misunderstood or misled the community on
the size and scope of the issue, etc. Further, even if a plan is agreed to
as being acceptable (i.e. the incident not leading outright to discussions
of distrust), the incident is not actually closed out until the CA has
demonstrated the successful execution against that plan.
I know this message is long, and much of it is stuff you know (but with which
others following may be unfamiliar), but it gets both to the heart of
the request you're making and the key expectations to be able to respond.
You want to know whether, if this incident were filed, it would lead to a
discussion of distrust in some form, whether individually or in the
collective whole of the issues that DigiCert has had over the past several
years. The only reason we're even discussing this incident, specifically,
is because it relates to revocation following a previous incident
(underscores), which is the only thing acknowledged as even being up for
discussion or risk assessment by CAs. To be unambiguously clear, this would
be a wholly inappropriate request for any other form of BR violation, but
because it's specifically about balancing revocation and risk, it is
allowed, for now.
In order to answer that, we need to know:
1) What's the scope of the issue?
2) What are the risks, as identified by DigiCert, and are they meaningfully
explained?
3) What's the concrete plan for remediation being presented?
As it stands, it sounds like you've provided #1, which is Question 4 on the
incident report template. As called out by Wayne, #2 seems missing, and
that's captured by Question 6 on the incident report template, combined
with the facts and details from Question 2. Most concerning to me, however,
is that I can't find an answer to #3 - which is what Question 7 on the
template is trying to help identify. These are things that only DigiCert
can answer, and like any other CA, it needs to provide sufficient detail to
demonstrate that the issues are understood and being meaningfully
addressed, and that opportunities to improve are actively being pursued.
Please don't think of this as punishing DigiCert for even asking. I think
it's commendable that, for the sole topic of revocation, DigiCert is taking
steps to engage in the risk analysis early, and publicly. You're not the
first CA to do so - other CAs have shared remediation plans regarding, for
example, TLS validation methods, and those too provided ways to balance
risk and measure progress. That said, as I mentioned earlier, I think that
going into 2019, we collectively, and CAs particularly, need to be taking
steps to prevent these conversations from ever being necessary, and,
fortunately or unfortunately for DigiCert, this places y'all in a unique
position of having both the opportunity to use this long-standing and existing
practice and high expectations on how to meaningfully ensure this
process never has to happen again.
All of this is said to make it clear that #3 - the concrete plan - not only
needs to include the remediation plan for these specific certs to be
revoked, and concrete dates and measurable milestones to see how well
DigiCert is progressing on that, but also needs to provide details as to
how DigiCert is taking steps to ensure that their customers do not find
themselves in these positions going forward.
For example, a commitment to open, standards-based automation solutions
provides an interoperable, industry-wide means by which such customers can
ensure certificates are replaced in a timely manner, whether because the issuing CA
needed to reissue, or because the issuing CA was no longer trusted.
Similarly, one could imagine that a plan also included a communication plan
to existing Subscribers to remind them of the details of the Subscriber
Agreement, which is industry standard and applies to all CAs, in requiring
timely revocation, and providing resources to help those customers prepare.
These are just two examples that, from the limited details provided, seem
to apply, but I expect that as the questions Wayne highlighted about the
risk analysis are answered, others may be identified as
well. And that's what the incident process is for.
On Thu, Dec 20, 2018 at 12:55 AM Jeremy Rowley <
jeremy...@digicert.com>
wrote:
> Done:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1515564