On Thu, Feb 25, 2021 at 8:21 PM Aaron Gable <aa...@letsencrypt.org> wrote:
> If I may, I believe that the problem is less that it is a reference (which
> is true of every URL stored in CCADB), and more that it is a reference to
> an unsigned object.
While that's a small part, it really is as I said: the issue of being a
reference. We've already had this issue with the other URL fields, and thus
there exists logic to dereference and archive those URLs within CCADB.
Issues like audit statements, CP, and CPSes are all things that are indeed
critical to understanding the posture of a CA over time, and so actually
having those materials in something stable and maintained (without a
dependence on the CA) is important. <Google-Hat> It's the lesson from those
various past failure modes that had Google very supportive of the non-URL
based approach, putting the JSON directly in CCADB, rather than forcing yet
another "update-and-fetch" system.</Google-Hat> You're absolutely correct
that the "configured by CA" element has the nice property of being assured
that the change came from the CA themselves, without requiring signing, but
I wouldn't want to reduce the concern to just that.
> * I'm not aware of any other automation system with write-access to CCADB
> (I may be very wrong!), and I imagine there would need to be some sort of
> further design discussion with CCADB's maintainers about what it means to
> give write credentials to an automated system, what sorts of protections
> would be necessary around those credentials, how to scope those credentials
> as narrowly as possible, and more.
We already have automation for CCADB. CAs can and do use it for disclosure.
> * I'm not sure CCADB's maintainers want updates to it to be in the
> critical path of ongoing issuance, as opposed to just in the critical path
> for beginning issuance with a new issuer.
Without wanting to sound dismissive, whether or not it's in a critical path
of updating is the CA's choice on their design. I understand that there are
designs that could put it there, I think the question is whether it's
reasonable for the CA to have done that in the first place, which is why
it's important to drill down into these concerns. I know you merely
qualified it as undesirable, rather than actually being a blocker, and I
appreciate that, but I do think some of these concerns are perhaps less
grounded or persuasive than others :)
Taking a step back here, I think there's been a fundamental design error in
your proposed design, and I think that it, combined with the (existing)
automation, may mean much of this isn't actually the issue you anticipate.
Since we're talking about Let's Encrypt, the assumption here is that the CRL
URLs will not be present within the crlDistributionPoints of the
certificates; otherwise, this entire discussion is fairly moot, since those
crlDistributionPoints can be obtained directly from Certificate Transparency.
The purpose of this field is to help discover CRLs that are otherwise not
discoverable (e.g. from CT), but this also means that these CRLs do not
suffer from the same design limitations of PKI. Recall that there's nothing
intrinsic to a CRL that expresses its sharding algorithm (ignoring, for a
second, reasonCodes within the IDP extension). The only observability that
an external (not-the-CA) party has, whether the Subscriber or the RP, is
merely that "the CRLDP for this certificate is different from the CRLDP
for that certificate". It is otherwise opaque how the CA used it, even if
through a large enough corpus from CT, you can infer the algorithm from the
pattern. Further, when such shards are being used, you can normally
determine whether a given CRL that you have (whose provenance may be
unknown) covers a given certificate by matching the CRLDP of the cert
against the IDP of the CRL. But we're talking about a scenario in which
the certificate lacks a CRLDP, and so there's no way to know,
unambiguously, that a given CRL "covers" the certificate. The only thing we
have is the CRL's IDP, because if it didn't have one, it'd have to be a
full CRL, and then you'd be back to only having one URL to worry about.
Because of all of this, the consumers of this JSON are
expected to combine all of the CRLs present, union all the revoked serials,
and be done with it. However, it's that unioning that I think you've
overlooked here in working out your math. In the "classic" PKI sense (i.e.
CRLDP present), the CA has to plan for revocation for the lifetime of the
certificate, it's fixed when the certificate is created, and it's immutable
once created. Further, changes in revocation frequency mean you need to
produce new versions of that specific CRL. However, in the scenario we're
discussing, in which these CRLs are unioned, you're entirely flexible at
all points in time in how you balance your CRLs. Further, in the 'ideal'
case (no revocations), you need only produce a single empty CRL. There's no
need to produce an empty CRL per shard, like there would be under the
'classic' model.
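Concretely, the consumer-side unioning might look like this sketch
(placeholder serial numbers; fetching and parsing the CRLs is elided):

```python
# Sketch of the consumer behavior described above: since no shard can be
# tied to a particular certificate, a consumer simply unions the revoked
# serials from every CRL listed in the JSON.
def union_revoked_serials(shards):
    """shards: iterable of lists of revoked serial numbers, one per CRL."""
    revoked = set()
    for serials in shards:
        revoked.update(serials)
    return revoked

shards = [[0x01, 0x02], [0x02, 0x03], []]  # an empty CRL contributes nothing
print(sorted(union_revoked_serials(shards)))  # [1, 2, 3]
```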
This means your shard algorithm does entirely support dynamic rebalancing
at any point in time, which seemed to be your concern. It means that you
can, indeed, balance shards based on size, and it also means you can do
things like create batches of CRLs that act as additions over an initial
set. For lack of a better (public) analogy, it's a bit like database
servers that have the initial database as one file, then record a series of
deltas as separate files, along with periodic rebalancing to optimize the
data (e.g. remove deleted rows, re-sort indices, etc.).
While I'm not sure I actually agree with your assertion that the large CRL
is unwieldy, since the target here is primarily Root Store Programs doing
aggregate revocation rather than RPs, let's say you wanted to target your
CRLs as 60MB chunks. During the course of normal issuance, you simply create
one CRL that is empty (to represent "all the unrevoked, unexpired
certificates"), then begin working from oldest to newest of your revoked
certificates. Every time you've accumulated 60MB of revocation, you cut a
CRL and shard, and then continue. That algorithm scales whether you're
dealing with day-to-day revocation or a mass-revocation event - you will
always end up with 60MB-or-less shards.
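That accumulate-and-cut loop can be sketched as follows; the entry sizes
and size limit are placeholders, scaled way down for illustration:

```python
# Sketch of the algorithm described above: walk revocations oldest-first,
# cutting a new CRL shard whenever the accumulated size would exceed the
# limit. (Real CRL encoding sizes are assumed away here.)
def shard_by_size(revocations, max_bytes):
    """revocations: list of (serial, entry_size_bytes), oldest first."""
    shards, current, current_size = [], [], 0
    for serial, size in revocations:
        if current and current_size + size > max_bytes:
            shards.append(current)       # cut a CRL shard
            current, current_size = [], 0
        current.append(serial)
        current_size += size
    # Final shard; with zero revocations this is the single empty CRL from
    # the 'ideal' case above.
    shards.append(current)
    return shards

revs = [(n, 40) for n in range(10)]      # ten 40-byte entries
print([len(s) for s in shard_by_size(revs, 100)])  # [2, 2, 2, 2, 2]
```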
Now, let's say at some point after creating your initial batch of
revocations, you have more revocations. You again have design options. One
option would be to re-generate the CRLs from the oldest cert to newest,
which will intrinsically rebalance the CRLs for the new revocations.
Alternatively, you can simply create a new CRL with *just* the new
revocations (or update your last CRL that's less than 60MB), and continue
operation. You create a new file every time you've accumulated 60MB of new
revocations.
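The append-or-start-fresh option can be sketched like so (again with
placeholder sizes, and leaving earlier, already-published shards untouched):

```python
# Sketch of the incremental option above: top up the most recent
# under-capacity shard with new revocations, cutting a fresh shard only
# once it would exceed the size limit.
def add_revocations(shards, new_entries, max_bytes, entry_size):
    """shards: list of lists of serials; new_entries: serials to add."""
    if not shards:
        shards = [[]]
    for serial in new_entries:
        last = shards[-1]
        if (len(last) + 1) * entry_size > max_bytes:
            shards.append([])            # last shard is full: cut a new one
            last = shards[-1]
        last.append(serial)
    return shards

print(add_revocations([[1, 2], [3]], [4, 5], max_bytes=120, entry_size=40))
# [[1, 2], [3, 4, 5]]
```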
Let's further assume you're concerned with the overall storage size,
although your current design of immutability/non-deletion of files seems to
suggest this is not as major a concern. The approach I described there, of
simply appending revocations to the not-a-deltaCRL-delta-CRL, will
eventually result in a host of non-temporally-based revocations, meaning
you'll end up needing to carry that data until every certificate within
that CRL is expired (again, assuming your non-deletion goal here). Even
so, you could simply, on a periodic interval that's appropriate for your
needs (e.g. weekly), rebalance all of your CRLs, and then update CCADB with
the newest batch.
In this model, the only time you have to touch CCADB is when you've
accumulated 60MB worth of revocations - in the worst case. However, because
these URLs effectively work as a union, you could certainly pre-declare
some CRLs (producing empty CRLs) to reduce the need to update CCADB. You
are, in effect, pre-allocating "space", by allowing yourself, say, 240MB of
revocations before you need to update CCADB again.
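Under those assumed figures (60MB shards, a 240MB pre-allocated budget),
the arithmetic is just:

```python
# Sketch of the pre-allocation idea above: publish N empty CRL URLs up
# front so CCADB only needs touching once the whole budget is consumed.
# The 60MB/240MB figures are the placeholders from this discussion.
def preallocated_shards(budget_bytes, shard_bytes):
    """How many empty CRLs to pre-declare for a given revocation budget."""
    return budget_bytes // shard_bytes

MB = 1024 * 1024
print(preallocated_shards(240 * MB, 60 * MB))  # 4
```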
I think the mistake here began by treating this CRL list from the
perspective of the URLs embedded within certificates, which does trigger
the design constraints you mentioned regarding things like size, and
understandably makes the temporal sharding tempting. However, that's not
the scenario being discussed here, and so with a bit more creative thinking
about the design, hopefully you can see this isn't as worrying or complex
as you feared. I imagine this may take some time to process and think
through, but hopefully this explanation made sense.
>> There is one thing you mentioned that's also non-obvious to me, because I
>> would expect you already have to deal with this exact issue with respect to
>> OCSP, which is "overwriting files is a dangerous operation prone to many
>> forms of failure". Could you expand more about what some of those
>> top-concerns are? I ask, since, say, an OCSP Responder is frequently
>> implemented as "Spool /ocsp/:issuerDN/:serialNumber", with the CA
>> overwriting :serialNumber whenever they produce new responses. It sounds
>> like you're saying that common design pattern may be problematic for y'all,
>> and I'm curious to learn more.
> Sure, happy to expand. For those following along at home, this last bit is
> relatively off-topic compared to the other sections above, so skip if you
> feel like it :)
> OCSP consists of hundreds of millions of small entries. Thus our OCSP
> infrastructure is backed by a database, and fronted by a caching CDN. So
> the database and the CDN get to handle all the hard problems of overwriting
> data, rather than having us reinvent the wheel. But CRL consists of
> relatively-few large entries, which is much better suited to a flat/static
> file structure like that you describe for a naive implementation of OCSP.
> For more on why we'd prefer to leave file overwriting to the experts rather
> than risk getting it wrong ourselves, see this talk
Without wanting to grossly over-simplify here, it equally holds that you
can place the CRLs within a database. I think you're assuming a
particular fixed "worst case" size for the CRL, and optimizing for that,
but as I tried to cover above, you aren't stymied by that element of PKI in
what's being asked for here.