On Mon, Sep 23, 2019 at 11:53 PM Andy Warner via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:
> The practice of revoking non-issued certificates would therefore lead to
> CRL growth which would further make reliable revocation checking on
> bandwidth constrained clients more difficult.
As others have pointed out, it sounds like GTS is confused. This only
applies if you need to revoke them.
I’m not sure how many times it bears repeating, but the suggestion that you
need to revoke if you issued a precert, but not the cert, is patently
absurd. Among other things, as you point out, it causes both CRL and OCSP
growth.
Luckily, every browser participating has seemingly tried to make it clear
that’s not expected. So one objection handled :)
> 2. There seem to be a number of assumptions that precertificate issuance
> and certificate issuance is roughly atomic. In reality, a quorum of SCTs is
> required prior to final certificate issuance, so that is not the case.
Could you point to an example? All of the conversation I’ve seen has
highlighted that they are sequential, and can suffer a variety of delays or
errors. This is why the conversation has been about the least error prone
approach, in that it leads to the most consistent externally observable
results.
I admit, I’m honestly not sure what part of the conversation is being
referred to here.
> As a result of this, the existence of a precertificate is possible without
> a final certificate having been issued.
Yup. And it’s been repeatedly acknowledged that this is perfectly fine. The
proposed language accounts for that, but emphasizes that by producing and
logging the precertificate, the CA should, regardless of the issue, be
prepared to provision services for it for the duration.
If you find yourself continually generating precertificates, that suggests
an operational/design issue, which you can remediate based on whichever is
cheaper for you: fixing the pipeline to be reliable (as many reliability
issues seen to date have been on the CA side) or continuing to provision
services when things go bad. Either works; you can choose.
The important part is you need to treat (pre-cert || cert) in scope for all
your activities. You must be capable of revoking. You must be capable of
searching your databases. You must be capable of validating.
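A minimal sketch of that scope requirement, assuming a hypothetical single store keyed by serial number (the names and schema here are illustrative, not any CA’s actual design):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class IssuedObject:
    serial: str
    kind: str            # "precertificate" or "certificate"
    revoked: bool = False

class IssuanceStore:
    """Treats (pre-cert || cert) as one namespace: logging a precertificate
    puts its serial in scope for revocation, search, and validation."""

    def __init__(self) -> None:
        self._by_serial: Dict[str, IssuedObject] = {}

    def record(self, serial: str, kind: str) -> None:
        self._by_serial[serial] = IssuedObject(serial, kind)

    def revoke(self, serial: str) -> bool:
        # Works whether or not the final certificate was ever issued.
        obj = self._by_serial.get(serial)
        if obj is None:
            return False
        obj.revoked = True
        return True

    def status(self, serial: str) -> Optional[str]:
        obj = self._by_serial.get(serial)
        if obj is None:
            return None      # never produced in any form
        return "revoked" if obj.revoked else "good"
```

The point is simply that the lookup path is identical for both kinds of object; a precertificate-only serial is never a special case.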
> 3. This raises the question of how much time a CA has from the time they
> issue a precertificate to when the final certificate must be issued.
It doesn’t, because it’s a flawed understanding that’s been repeatedly
addressed: you don’t have to issue the final certificate. But you MUST be
prepared to provision services as if you had.
In general, this means you provision and distribute those services ahead of
time. My reply to Dimitris earlier today provided a road map of an
error-prone design, as well as two different ways of accomplishing a compliant
design. Given that GTS is responding to that thread, I’m surprised to see
it come up again so quickly, as it seems like GTS may not have understood?
> Likewise, there is the question of how soon the revocation information must
> be produced and reachable by an interested party (e.g. someone who has
> never seen the certificate in question but still wants to know the status
> of that certificate). [Aside, Wayne, you specifically said relying parties
> earlier, did you intend to say interested party or relying party? We have
> some additional questions if relying party was actually intended, as using
> it in that context seems to redefine what a relying party is.]
I cannot see how it redefines relying party: anyone who decides to trust
GTS becomes a relying party of GTS, and that usage changes nothing.
The question of how soon has come up before, but is addressed by the
earlier replies. We’ve seen the problems with CAs arguing CDN
distribution. There is no reasonable way that the relying party community
can or should accept the phased rollout delays as compliant, particularly
with a 24 hour revocation timeline (for example).
A common approach to this is to pregenerate responses for distribution,
with edge caches (5019-style) that can talk to an authoritative origin
(6960-style) under the hood. If a client queries for the status, the edge
cache serves it if it’s a cache hit, otherwise communicates back to the
origin and pulls into the cache. This is perhaps unsurprising, as it’s the
model many active CDNs use, functioning as it were as a reverse proxy with
caching (and the ability to globally evict from the cache).
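That reverse-proxy-with-caching model might be sketched as below; all names are illustrative, and a real deployment would serve signed, DER-encoded OCSP responses rather than opaque bytes:

```python
import time
from typing import Callable, Dict, Tuple

class EdgeCache:
    """5019-style edge cache in front of a 6960-style authoritative origin."""

    def __init__(self, origin: Callable[[str], bytes], ttl: float) -> None:
        self._origin = origin          # authoritative responder lookup
        self._ttl = ttl
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, serial: str) -> bytes:
        hit = self._store.get(serial)
        if hit and hit[1] > time.time():
            return hit[0]              # cache hit: serve pre-generated response
        resp = self._origin(serial)    # miss: pull from origin, fill cache
        self._store[serial] = (resp, time.time() + self._ttl)
        return resp

    def evict(self, serial: str) -> None:
        # Global eviction hook: on revocation, purge the stale response so
        # the next query reaches the origin well inside the 24-hour window.
        self._store.pop(serial, None)
```

The design choice that matters is the eviction path: without the ability to globally purge, the cache TTL becomes the floor on how quickly a revocation is observable.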
Since the CA already needs to ensure that they can have a globally
consistent response distributed within 24-hours, and that any time spent
synchronizing is time that the CA itself cannot use to investigate/respond,
this design discourages CAs from multihour rollouts (and that’s a good
thing). If you can’t meet those timelines, then you’re setting yourself up
for a CA incident that other CAs will have designed around.
If you think through the logical consequences for relying parties, it’s
clear that there are approaches CAs can use that are harmful, and there are
approaches they can use that are helpful. As publicly trusted CAs, they are
expected to be beyond reproach, and make every decision with the relying
parties’ interests at heart: not the Subscriber’s, not the Applicant’s, not
the CA’s. Something about putting the user first, and the user here is
everyone that will trust a certificate from that CA.
> This “reachable” part is particularly meaningful in that when using a CDN
> there are often phased roll outs that can take hours to complete. Today,
> the BRs leave this ambiguous, the only statement in this area is that new
> information must be published every four days:
>
> "The CA SHALL update information provided via an Online Certificate Status
> Protocol at least every four days. OCSP responses from this service MUST
> have a maximum expiration time of ten days."
It’s not ambiguous.
Read 4.9.1.1 and 7.1.2.3(c). You aren’t providing a responder if it can’t
answer for four days, and you aren’t meeting the revocation timeline if you
aren’t publishing revocation information in 24 hours.
The normal timeline:
- Upon issuance, the definitive response is available
- That definitive response is refreshed at least every four days
- While the BRs max is ten days, a reminder that Microsoft sets a minimum
of 8 hours, requires the maximum be 7 days, and new information available
at half that - e.g. 3.5 days
- The responder should maintain global consistency (e.g. if using RFC5019,
this is easier)
When revoking:
- That response should be globally available and published within 24 hours
(or five days, depending on the revocation reason).
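Those bounds can be expressed as a small checker. The thresholds are the ones stated above (the BRs’ four-day refresh and ten-day maximum, Microsoft’s 8-hour minimum, 7-day maximum, and new information at half the validity period); the function names are my own:

```python
from datetime import datetime, timedelta

BR_MAX_VALIDITY = timedelta(days=10)   # BRs: max expiration ten days
BR_REFRESH = timedelta(days=4)         # BRs: update at least every four days
MS_MIN_VALIDITY = timedelta(hours=8)   # Microsoft minimum
MS_MAX_VALIDITY = timedelta(days=7)    # Microsoft maximum

def check_window(this_update: datetime, next_update: datetime) -> list:
    """Flag a response whose validity window violates the stated limits."""
    validity = next_update - this_update
    problems = []
    if validity > BR_MAX_VALIDITY:
        problems.append("exceeds BR 10-day maximum")
    if validity > MS_MAX_VALIDITY:
        problems.append("exceeds Microsoft 7-day maximum")
    if validity < MS_MIN_VALIDITY:
        problems.append("below Microsoft 8-hour minimum")
    return problems

def refresh_deadline(this_update: datetime, next_update: datetime) -> datetime:
    # Refresh by the earlier of the BR four-day bound and Microsoft's
    # "half the validity period" requirement.
    half = this_update + (next_update - this_update) / 2
    return min(this_update + BR_REFRESH, half)
```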
> With this change, it would seem there needs to be a lower bound defined for
> how quickly the information needs to be available if it is to be an
> effective monitoring tool.
Again, it sounds like GTS hasn’t been following the thread or the updates,
which have clarified as to why the presumed gap (between precert and cert)
is irrelevant, and thus a lower bound not needed here. This only becomes an
issue if GTS is responding unknown for several hours after issuing certs -
but by that logic, GTS is not providing responders for several hours after
issuance, which is a BR violation today.
> * Clarifications
>
> This in turn raises the question if CAs can re-use authorization data such
> as CAA records or domain authorizations from the precertificate?
It doesn’t, because the BRs answer this, if GTS reads them. Specifically,
3.2.2.8 answers this for CAA.
> If a final certificate has not been issued due to a persistent quorum
> failure, and that failure persists longer than the validity of the used
> authorization data, can the authorizations that were done prior to the
> precertificate issuance be re-used?
It seems a responsible CA would answer “No”, and ensure that the validity
period of any information they use is good for (pre-cert issuance time +
time they’re willing to wait for SCTs). They would avoid this whole issue,
by avoiding trying to do the “least possible” and recognizing that they
have the flexibility to unambiguously avoid any compliance issues here.
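That budgeting reduces to simple date arithmetic; a hedged sketch, with hypothetical parameter names:

```python
from datetime import datetime, timedelta

def validation_still_usable(validated_at: datetime,
                            validation_lifetime: timedelta,
                            precert_issued_at: datetime,
                            max_sct_wait: timedelta) -> bool:
    """A responsible CA's 'No': the validation data must remain fresh through
    the worst-case moment the final certificate could be signed, i.e.
    precert issuance time plus the maximum time it will wait for SCTs."""
    latest_final_issuance = precert_issued_at + max_sct_wait
    return latest_final_issuance <= validated_at + validation_lifetime
```

If this check fails at precertificate issuance time, the CA simply revalidates before producing the precertificate, and the reuse question never arises.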
> As such, in our opinion, a roll out period to enable software and
> deployment changes to be made would be appropriate. Had this conversation
> taken place within the CA/Browser forum, the implementation date would have
> been discussed before becoming a formal requirement. We leave it to
> Browsers to determine reasonable timelines and we're not seeking to delay,
> simply recognition that many changes take time to implement and it is tough
> to effectively respond to changes that become new requirements in an
> instant.
This is entirely unproductive and unhelpful, because it talks around the
issue. This is the behaviour the community largely sees from problematic
CAs that don’t put users’ security first.
If you think there’s an issue with the date, a productive, useful
contribution would be to:
- Highlight when
- Highlight why
However, none of the discussion “should” be a functional change for any CA
following the rules. Even as a clarification of expectations, it’s trivial
to resolve and get into compliance, judging by the responses we’ve seen
from CAs to date.
I’m most encouraged, and most discouraged, that it seems even still today,
GTS is having trouble understanding what’s proposed, and seeing things that
simply aren’t there, rather than the things that are. Hopefully, the
clarifications to this thread, showing GTS has not followed the
conversation, do much to assuage the concerns that GTS is being asked to
implement major changes. The only major change I can see is it sounds like
GTS may have had other compliance issues with its responder services,
likely similarly based on misunderstanding the requirement as a publicly
trusted CA. As I said, that’s encouraging and discouraging.
I know that’s far more direct than Wayne would be, but any publicly trusted
CA that’s been following this Forum should recognize that GTS is following
a playbook used by CAs to push back on security improvements
unconstructively, as a stalling tactic that usually exists to hide or paper
over non-compliance, and then arguing the non-compliance was because of
something ambiguous, rather than thinking through the logical consequences
of their decisions.
This isn’t to say pushing back gets you branded as a poor CA; however, this
specific approach, still lacking in actionable data and misunderstanding
both Root Policy and the CA/B Forum, absolutely causes a crisis of
confidence in the CAs that do this, and for good reason, as time has borne
out.
> Browsers should set whatever requirements they believe are in the best
> interest of their users, but the more requirements are split across
> multiple root program's requirements, the CA/Browser Forum and IETF, the
> harder it becomes to reason about what a correct behavior is. Given the
> historical precedent of rule making in CA/Browser forum and the fact that
> it covers all participants, it seems like the ideal body to ensure
> consistency within the ecosystem.
This is ahistorical. The historic precedent is and has been root program
requirements, especially with respect to matters like the topic at hand,
eventually flowing into the BRs.
That a CA would suggest that the CA/B Forum is a better place for
discussing this than here deeply saddens me, precisely because of the
intentional exclusion of a number of people from the Forum that have made
incredibly valuable and worthwhile contributions. Honestly, I suppose I had
expected GTS to value the openness and transparency this provides over the
Forum.
It is true that CAs bear the responsibility of following the rules of the
root programs they are in, and that can be complex if those requirements
aren’t centrally codified. A trivial solution exists for this, but which
CAs rarely avail themselves of: they can draft text (if they aren’t voting
members) or prepare and sponsor ballots (if they are) to incorporate the
root program requirements into the BRs’ text. For example, I remain
shocked that no CA has thought to do so with Microsoft’s OCSP requirements,
or both Microsoft and Mozilla’s EKU requirements.
However, since this neither changes the expectations in the BRs nor
requires anything new of CAs, and merely explains the logical consequences
of RFC5019/6960 to those who may struggle with it, it does not seem at all
to rise to the level suggested here.