Hi Pedro,
I’m not sure how best to proceed here. It seems we’ve reached a point
where you want to discuss possible ways to respond to this, as a CA,
and it feels like that discussion should be captured on the bug.
I’m quite worried here, because this reply demonstrates that we’re at a
point where there is still a rather large disconnect, and I’m not sure how
to resolve it. It does not seem that there’s an understanding here of the
security issues, and while I want to help as best I can, I also believe
it’s appropriate that we accurately consider how well a CA understands
security issues as part of considering its incident response. I want there to be
a safe space for questions, but I’m also deeply troubled by the confusion,
and so I don’t know how to balance those two goals.
On Fri, Jul 3, 2020 at 3:24 AM Pedro Fuentes via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:
> >
> > Yes. But that doesn't mean we blindly trust the CA in doing so. And
> > that's the "security risk".
>
> But the point then is that a delegated responder that had the required
> "noCheck" extension wouldn't be affected by this issue and CAs wouldn't
> need to react, and therefore the issue to solve is the "mis-issuance"
> itself due to the lack of the extension, not the fact that the CA
> certificate could be used to do delegated responses for the same operator
> of the Root, which is acceptable, as you said.
I don’t understand why this is difficult to understand. If you have the
noCheck extension, then per RFC 6960, you need to make that certificate
short-lived. The BRs require that the certificate have the extension.
Similarly, if something goes wrong with such a responder, you also have to
consider revoking the root, because it is as bad as a root key compromise.
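To make the compliance check concrete, here’s roughly the sort of test I’d
expect a CA to be able to run over every CA certificate it has issued. This
is only a sketch in Python using the “cryptography” package, and “ca.pem” is
a placeholder path, but it captures both halves: the EKU and the nocheck
extension, plus the lifetime question.

from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID, ExtensionOID

def delegated_responder_status(cert: x509.Certificate) -> str:
    # Classify a certificate per BR 4.9.9 / RFC 6960 4.2.2.2.1 (sketch).
    try:
        eku = cert.extensions.get_extension_for_oid(
            ExtensionOID.EXTENDED_KEY_USAGE).value
    except x509.ExtensionNotFound:
        return "no EKU: not a delegated responder"
    if ExtendedKeyUsageOID.OCSP_SIGNING not in eku:
        return "no id-kp-OCSPSigning: not a delegated responder"
    try:
        cert.extensions.get_extension_for_oid(ExtensionOID.OCSP_NO_CHECK)
    except x509.ExtensionNotFound:
        return "PROBLEM: id-kp-OCSPSigning without id-pkix-ocsp-nocheck"
    lifetime = cert.not_valid_after - cert.not_valid_before
    return "delegated responder with nocheck, %d-day lifetime" % lifetime.days

# "ca.pem" is a placeholder path for the CA certificate under review.
with open("ca.pem", "rb") as f:
    print(delegated_responder_status(x509.load_pem_x509_certificate(f.read())))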
> In fact the side effect is that delegated responders operated externally
> that have the required no check extension don't seem to be affected by the
> issue and would be deemed acceptable, without requiring further action to
> CAs, while the evident risk problem is still there.
The “nocheck” extension discussion here is to highlight the compliance
issue.
The underlying issue is a security issue: things capable of providing OCSP
responses that shouldn’t be.
It seems you understand the security issue when viewing external sub-CAs:
they can now impact the security of the issuer.
It seems we’re at an impasse in understanding the issue for
internally-operated Sub CAs: this breaks all of the auditable controls and
assurance frameworks, and breaks the security goals of a “correctly”
configured delegated responder, as discussed in the security considerations
throughout RFC 6960.
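To spell the security issue out concretely: for as long as the private key
of such a certificate exists, something like the sketch below is possible.
This is only an illustration (the responder, issuer, and victim objects are
placeholders, again using Python’s “cryptography” package), but it shows how
little is needed to produce a “GOOD” answer for a certificate the CA
believes it has revoked:

import datetime
from cryptography.hazmat.primitives import hashes
from cryptography.x509 import ocsp

def forge_good_response(responder_cert, responder_key, victim_cert,
                        issuer_cert):
    # With the private key of ANY certificate chaining to the CA that asserts
    # id-kp-OCSPSigning, a "GOOD" answer can be produced for a certificate
    # the CA has actually revoked, and relying parties will accept it.
    now = datetime.datetime.utcnow()
    builder = (
        ocsp.OCSPResponseBuilder()
        .add_response(
            cert=victim_cert,                      # e.g. a revoked TLS cert
            issuer=issuer_cert,                    # the CA that issued it
            algorithm=hashes.SHA1(),               # CertID hash algorithm
            cert_status=ocsp.OCSPCertStatus.GOOD,  # asserted "not revoked"
            this_update=now,
            next_update=now + datetime.timedelta(days=7),
            revocation_time=None,
            revocation_reason=None,
        )
        .responder_id(ocsp.OCSPResponderEncoding.HASH, responder_cert)
        .certificates([responder_cert])
    )
    return builder.sign(responder_key, hashes.SHA256())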
>
> >
> > I can understand that our views may differ: you may see 3P as "great
> > risk" and 1p as "acceptable risk". However, from the view of a browser or
> > a relying party, "1p" and "3p" are the same: they're both CAs. So the
> > risk is the same, and the risk is unacceptable for both cases.
>
> But this is not actually like that, because what is required now to CAs is
> to react appropriately to this incident, and you are imposing a unique
> approach while the situations are fundamentally different. It's not the
> same the derivations of this issue for CAs that had 3P delegation (or a mix
> of 1P and 3P), and the ones, like us, that don't have such delegation.
The burden is for your CA to establish that, in the incident response. I’ve
seen nothing from you to reasonably establish that; you just say “but it’s
different”. And that worries me, because it seems you don’t recognize that
all of the controls and tools and expectations we have, both in terms of
audits and in all of the checks we make (for example, with crt.sh)
*also* lose their credibility for as long as this exists.
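For example, the kind of spot-check we rely on today looks roughly like the
sketch below (the crt.sh ID is a placeholder, and it reuses the helper from
the earlier sketch). It can only tell us what is *in* the certificate; it
cannot tell us whether the private key still exists, and that is exactly the
assurance that is now missing:

import urllib.request
from cryptography import x509

def check_crtsh_entry(crtsh_id):
    # crt.sh serves the certificate body at https://crt.sh/?d=<id>.
    # The ID is a placeholder; delegated_responder_status is the helper
    # from the earlier sketch. This inspects only the public certificate
    # contents; it cannot tell whether the private key still exists.
    pem = urllib.request.urlopen("https://crt.sh/?d=%d" % crtsh_id).read()
    return delegated_responder_status(x509.load_pem_x509_certificate(pem))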
Again, I understand and appreciate the view that you seem to be advocating:
“If nothing goes wrong, no one is harmed. If third-parties were involved,
things could go wrong, so we understand that. But we won’t let anything go
wrong ourselves.”
But you seem to be misunderstanding what I’m saying: “If anything goes
wrong, we will not be able to detect it, and all of our assumptions and
safety features will fail. We could try and design new safety features, but
now we’re having to literally pay for your mistake, which never should have
happened in the first place.”
That is a completely unfair and unreasonable thing for WISeKey to ask of
the community: for everyone to change and adapt because WISeKey failed to
follow the expectations.
Key destruction is the only way I can see of providing some
assurance that “things won’t go wrong, because it’s impossible for them to
go wrong, here’s the proof”.
Anything short of that is asking the community to either accept the
security risk that things can go wrong, or for everyone to go modify their
code, including their tools to do things like check CT, to appropriately
guard against that. Which is completely unreasonable. That’s how
fundamentally this breaks the assumptions here.
> In our particular case, where we have three affected CAs, owned and
> operated by WISeKey, we are proposing this action plan, for which we
> request feedback:
> 1.- Monday, new CAs will be created with new keys, that will be used to
> substitute the existing ones
> 2.- Monday, the existing CAs would be reissued with the same keys,
> removing the OCSP Signing EKU and with A REDUCED VALIDITY OF THREE MONTHS
To match your emphasis, THIS DOES NOTHING to solve the security problem. It
doesn’t *matter* how short-lived the new certificates are; what matters is
the *old* certificates, and whether the private keys for those *old*
certificates still exist. Which, under this plan, they do.
> 3.- The existing CAs will be disabled for any new issuance, and will only
> be kept operative for signing CRLs and to attend revocation requests
> 4.- Within the 7 days period, the previous certificate of the CAs will be
> revoked, updating CCADB and OneCRL
This does nothing to address the security risk. It *only* addresses the
compliance issue.
> 5.- Once the re-issued certificates expire, we will destroy the keys and
> write the appropriate report
This ignores the security issues and just focuses on the compliance issues.
Which is, I think, an extremely poor response for a CA. If that was the
response, then like I said, the mitigation I would encourage clients to do
is remove trust in those intermediates. Since that turns out to be rather
difficult, I think the safest option would be to remove trust in the root.
The most important thing I want CAs to solve is the security issue. That’s
why I highlighted *every* CA, not just those I reported, needs to examine
their systems. The point isn’t to focus on compliance and ignore security:
it’s to solve the problem at its core.
Because of the nature of this issue, anything short of revoking the
intermediates with a key destruction is asking “trust us to not screw up
but we have no way to prove we didn’t and detecting it is now even harder
and oh if we do you have to revoke the issuer CA/Root anyways”. Rather than
wait for something to go wrong, and accept that risk, I’d rather we just
tackle it head on and start removing trust in whatever is needed. That’s
how big the risk is, and why just hoping things won’t go wrong isn’t a
strategy.
> In my humble opinion, this plan is:
> - Solving the BR compliance issue by revoking the offending certificate
> within the required period
> - Reducing even more the potential risk of hypothetical misuse of the keys
> by establishing a short life-time
>
> I hope this plan is acceptable.
This plan still reflects a misunderstanding of the security issues, and as
a consequence, it does not actually reduce the potential risk. Revoking the
old intermediate does nothing for the security
risk, unless and until the key is no longer usable, or everyone in the
world changes their code to defend against these situations. That’s just
how this bug works.
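To show what I mean by “that’s just how this bug works”: the authorization
check relying parties implement for delegated OCSP responses looks roughly
like the sketch below (simplified; signature and chain validation omitted).
Nothing in it consults revocation for the responder certificate itself,
which is precisely why revoking the old intermediates, while their keys
still exist, changes nothing for clients.

from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID, ExtensionOID

def is_authorized_responder(signer_cert, issuer_cert):
    # Simplified RFC 6960 Section 4.2.2.2 logic: a response is accepted if
    # it is signed by the issuing CA itself, or by any certificate that CA
    # issued which asserts id-kp-OCSPSigning. Revocation of the signer is
    # never consulted here.
    if signer_cert == issuer_cert:
        return True
    if signer_cert.issuer != issuer_cert.subject:
        return False
    try:
        eku = signer_cert.extensions.get_extension_for_oid(
            ExtensionOID.EXTENDED_KEY_USAGE).value
    except x509.ExtensionNotFound:
        return False
    return ExtendedKeyUsageOID.OCSP_SIGNING in eku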