CCADB Proposal: Add field called JSON Array of Partitioned CRLs Issued By This CA


Kathleen Wilson

Feb 24, 2021, 3:36:12 PM
to mozilla-dev-s...@lists.mozilla.org
All,

As previously discussed, there is a section on root and intermediate
certificate pages in the CCADB called ‘Pertaining to Certificates Issued
by this CA’, and it currently has one field called 'Full CRL Issued By
This CA'.

Proposal: Add field called 'JSON Array of Partitioned CRLs Issued By
This CA'

Description of this proposed field:
When there is no full CRL for certificates issued by this CA, provide a
JSON array whose elements are URLs of partitioned, DER-encoded CRLs that
when combined are the equivalent of a full CRL. The JSON array may omit
obsolete partitioned CRLs whose scopes only include expired certificates.

Example:

[
"http://cdn.example/crl-1.crl",
"http://cdn.example/crl-2.crl"
]
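
A minimal sketch of how a consumer might parse and sanity-check a value of the proposed field. The function name `parse_partitioned_crl_field` is hypothetical, and the restriction to HTTP(S) URLs is an assumption made for illustration; the proposal itself only says the elements are URLs of DER-encoded CRLs.

```python
import json
from urllib.parse import urlparse


def parse_partitioned_crl_field(field_value: str) -> list[str]:
    """Parse the field value and return the list of partitioned CRL URLs."""
    urls = json.loads(field_value)
    if not isinstance(urls, list):
        raise ValueError("field must be a JSON array")
    for url in urls:
        if not isinstance(url, str):
            raise ValueError("every element must be a string")
        parsed = urlparse(url)
        # Assumption: only http/https URLs are acceptable.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an HTTP(S) URL: {url!r}")
    return urls


example = '["http://cdn.example/crl-1.crl", "http://cdn.example/crl-2.crl"]'
crl_urls = parse_partitioned_crl_field(example)
```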



Additionally, I propose adding a new section to
https://www.ccadb.org/cas/fields called “Revocation Information”.

The proposed draft for this new section is here:
https://docs.google.com/document/d/1uVK0h4q5BSrFv6e86f2SwR5m2o9Kl1km74vG4HnkABw/edit?usp=sharing


I would appreciate your input on this proposal.

Thanks,
Kathleen


Aaron Gable

Feb 25, 2021, 12:33:41 PM
to mozilla-dev-security-policy, Kathleen Wilson
Hi Kathleen,

It was my impression from earlier discussions
<https://groups.google.com/g/mozilla.dev.security.policy/c/Bf6HSA44528> that
the plan was for the new CCADB field to contain a URL which points to a
document containing only a JSON array of partitioned CRL URLs, rather than
the new CCADB field containing such an array directly.

Obviously this plan may have changed due to other off-list conversations,
but I would like to express a strong preference for the original plan. At
the scale at which Let's Encrypt issues, it is likely that our JSON array
will contain on the order of 1000 CRL URLs, and will add a new one (and age
out an entirely-expired one) every 6 hours or so. I am not aware of any
existing automation which updates CCADB at that frequency.

Further, from a resiliency perspective, we would prefer that the CRLs we
generate live at fully static paths. Rather than overwriting CRLs with new
versions when they are re-issued prior to their nextUpdate time, we would
leave the old (soon-to-be-expired) CRL in place, offer its replacement at
an adjacent path, and update the JSON to point at the replacement. This
process would have us updating the JSON array on the order of minutes, not
hours.

We believe that the earlier "URL to a JSON array..." approach makes room for
significantly simpler automation on the behalf of CAs without significant
loss of auditability. I believe it may be helpful for the CCADB field
description (or any upcoming portion of the MRSP which references it) to
include specific requirements around the cache lifetime of the JSON
document and the CRLs referenced within it.

Thanks,
Aaron
> _______________________________________________
> dev-security-policy mailing list
> dev-secur...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-security-policy
>

Ryan Sleevi

Feb 25, 2021, 12:53:53 PM
to Aaron Gable, Kathleen Wilson, mozilla-dev-security-policy
On Thu, Feb 25, 2021 at 12:33 PM Aaron Gable via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> Obviously this plan may have changed due to other off-list conversations,
> but I would like to express a strong preference for the original plan. At
> the scale at which Let's Encrypt issues, it is likely that our JSON array
> will contain on the order of 1000 CRL URLs, and will add a new one (and age
> out an entirely-expired one) every 6 hours or so. I am not aware of any
> existing automation which updates CCADB at that frequency.
>
> Further, from a resiliency perspective, we would prefer that the CRLs we
> generate live at fully static paths. Rather than overwriting CRLs with new
> versions when they are re-issued prior to their nextUpdate time, we would
> leave the old (soon-to-be-expired) CRL in place, offer its replacement at
> an adjacent path, and update the JSON to point at the replacement. This
> process would have us updating the JSON array on the order of minutes, not
> hours.


This seems like a very inefficient design choice, and runs contrary to how
CRLs are deployed by, well, literally anyone using CRLs as specified, since
the URL is fixed within the issued certificate.

Could you share more about the "why" behind the design? Both for the choice to use
sharded CRLs (since that is the essence of the first concern), and the
motivation to use fixed URLs.

> We believe that earlier "URL to a JSON array..." approach makes room for
> significantly simpler automation on the behalf of CAs without significant
> loss of auditability. I believe it may be helpful for the CCADB field
> description (or any upcoming portion of the MRSP which references it) to
> include specific requirements around the cache lifetime of the JSON
> document and the CRLs referenced within it.


Indirectly, you’ve highlighted exactly why the approach you propose loses
auditability. Using the URL-based approach puts the onus on the consumer to
try and detect and record changes, introduces greater operational risks
that evade detection (e.g. stale caches on the CAs side for the content of
that URL), and encourages or enables designs that put greater burden on
consumers.

I don’t think this is suggested because of malice, but I do think it makes
it significantly easier for malice to go undetected, and for accurate
historic information to be hidden or made too complex to maintain.

This is already a known and, recently, studied problem with CRLs [1].
Unquestionably, you are right for highlighting and emphasizing that this
constrains and limits how CAs perform certain operations. You highlight it
as a potential bug, but I’d personally been thinking about it as a
potential feature. To figure out the disconnect, I’m hoping you could
further expand on the “why” of the design factors for your proposed design.

Additionally, it’d be useful to understand how you would suggest CCADB
consumers maintain an accurate, CA attested log of changes. Understanding
such changes is an essential part of root program maintenance, and it does
seem reasonable to expect CAs to need to adjust to provide that, rather
than give up on the goal.

[1]
https://arxiv.org/abs/2102.04288

Aaron Gable

Feb 25, 2021, 2:15:16 PM
to mozilla-dev-security-policy, Kathleen Wilson, Ryan Sleevi
Sure, happy to provide more details! The fundamental issue here is the
scale at which Let's Encrypt issues, and the automated nature by which
clients interact with Let's Encrypt.

LE currently has 150M certificates active, all (as of March 1st) signed by
the same issuer certificate, R3. In the event of a mass revocation, that
means a CRL with 150M entries in it. At an average of 38 bytes per entry in
a CRL, that means nearly 6GB worth of CRL. Passing around a single 6GB file
isn't good for reliability (it's much better to fail-and-retry downloading
one of a hundred 60MB files than fail-and-retry a single 6GB file), so
sharding seems like an operational necessity.
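
The arithmetic above can be checked with a few lines. The figures are the ones quoted in this message (150M entries, 38 bytes per entry, a hundred shards); treating GB and MB as decimal units is an assumption.

```python
ACTIVE_CERTS = 150_000_000   # certificates active under R3
BYTES_PER_ENTRY = 38         # average size of one CRL entry
SHARDS = 100                 # "one of a hundred" files

full_crl_bytes = ACTIVE_CERTS * BYTES_PER_ENTRY   # worst case: everything revoked
per_shard_bytes = full_crl_bytes // SHARDS

print(f"{full_crl_bytes / 1e9:.1f} GB full, {per_shard_bytes / 1e6:.0f} MB per shard")
# prints "5.7 GB full, 57 MB per shard"
```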

Even without a LE-initiated mass revocation event, one of our large
integrators (such as a hosting provider with millions of domains) could
decide for any reason to revoke every single certificate we have issued to
them. We need to be resilient to these kinds of events.

Once we've decided that sharding is necessary, the next question is "static
or dynamic sharding?". It's easy to imagine a world in which we usually
have only one or two CRL shards, but dynamically scale that number up to
keep individual CRL sizes small if/when revocation rises sharply. There are
a lot of "interesting" (read: difficult) engineering problems here, and
we've decided not to go the dynamic route, but even if we did it would
obviously require being able to change the list of URLs in the JSON array
on the fly.

For static sharding, we would need to constantly maintain a large set of
small CRLs, such that even in the worst case no individual CRL would become
too large. I see two main approaches: maintaining a fully static set of
shards into which our certificates are bucketed, or maintaining rolling
time-based shards (much like CT shards).

Maintaining a static set of shards has the primary advantage of "working
like CRLs usually work". A given CRL has a scope (e.g. "all certs issued by
R3 whose serial number is equal to 1 mod 500"), it has a nextUpdate, and a
new CRL with the same scope will be re-issued at the same path before that
nextUpdate is reached. However, it makes re-sharding difficult. If Let's
Encrypt's issuance rises enough that we want to have 1000 shards instead of
500, we'll have to re-shard every cert, re-issue every CRL, and update the
list of URLs in the JSON. And if we're updating the list, we should have
standards around how that list is updated and how its history is stored,
and then we'd prefer that those standards allow for rapid updates.
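
A minimal sketch of why re-sharding a fully static set is disruptive, assuming the "serial number mod N" scope from the example above. The helper `shard_for_serial` and the range of serials are hypothetical.

```python
def shard_for_serial(serial: int, num_shards: int) -> int:
    """Static bucketing: a certificate's shard is its serial modulo the shard count."""
    return serial % num_shards


# Illustrative serial numbers only; real serials are large random integers.
serials = range(1, 10_001)

# Count certificates whose shard index changes when going from 500 to 1000 shards.
moved = sum(
    1 for s in serials
    if shard_for_serial(s, 500) != shard_for_serial(s, 1000)
)
```

In this toy range half of the certificates land in a different shard index, and every shard's scope changes (from "mod 500" to "mod 1000"), so every CRL has to be re-issued.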

The alternative is to have rolling time-based shards. In this case, every X
hours we would create a new CRL, and every certificate we issue over the
next period would belong to that CRL. Similar to the above, these CRLs have
nice scopes: "all certs issued by R3 between AA:BB and XX:YY". When every
certificate in one of these time-based shards has expired, we can simply
stop re-issuing it. This has the advantage of solving the resharding
problem: if we want to make our CRLs smaller, we just increase the
frequency at which we initialize a new one, and 90 days later we've fully
switched over to the new size. It has the disadvantage from your
perspective of requiring us to add a new URL to the JSON array every period
(and we get to drop an old URL from the array every period as well).
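
A rough sketch of rolling time-based shards. The 6-hour period, the 90-day certificate lifetime, and the `EPOCH` starting point are assumptions chosen for illustration, and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SHARD_PERIOD = timedelta(hours=6)    # assumption: "every X hours" with X = 6
CERT_LIFETIME = timedelta(days=90)   # assumption: 90-day certificates
EPOCH = datetime(2021, 1, 1, tzinfo=timezone.utc)  # hypothetical start of shard 0


def shard_index(issued_at: datetime) -> int:
    """Shard that covers a certificate issued at the given time."""
    return (issued_at - EPOCH) // SHARD_PERIOD


def shard_retired_at(index: int) -> datetime:
    """Time by which every certificate in the shard has expired."""
    return EPOCH + (index + 1) * SHARD_PERIOD + CERT_LIFETIME


def active_shards(now: datetime) -> list[int]:
    """Shards that still need to be re-issued and listed in the JSON array."""
    newest = shard_index(now)
    oldest_possible = max(0, newest - CERT_LIFETIME // SHARD_PERIOD)
    return [i for i in range(oldest_possible, newest + 1)
            if shard_retired_at(i) > now]
```

Changing `SHARD_PERIOD` only affects shards created from then on; older shards age out on their own once `shard_retired_at` has passed.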

So why would we want to put each CRL re-issuance at a new path, and update
our JSON even more frequently? Because we have reason to believe that
various root programs will soon seek CRL re-issuance on the order of every
6 hours, not every 7 days as currently required; we will have many shards;
and overwriting files is a dangerous operation prone to many forms of
failure. Our current plan is to surface CRLs at paths like
`/crls/:issuerID/:shardID/:thisUpdate.der`, so that we never have to
overwrite a file. Similarly, our JSON document can always be written to a
new file, and the path in CCADB can point to a simple handler which always
serves the most recent file. Additionally, this means that anyone in
possession of one of our JSON documents can fetch all the CRLs listed in it
and get a *consistent* view of our revocation information as of that time.
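
A minimal sketch of that path scheme, assuming a hypothetical host `crl.example`, a compact UTC timestamp format for `:thisUpdate`, and hypothetical helper names. Only the `/crls/:issuerID/:shardID/:thisUpdate.der` layout comes from the plan described above.

```python
import json
from datetime import datetime, timezone

BASE_URL = "http://crl.example"  # hypothetical host


def crl_path(issuer_id: str, shard_id: int, this_update: datetime) -> str:
    """Build a write-once path; a re-issued CRL gets a new timestamp, so nothing is overwritten."""
    stamp = this_update.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return f"/crls/{issuer_id}/{shard_id}/{stamp}.der"


def snapshot_document(issuer_id: str, latest: dict[int, datetime]) -> str:
    """Render the JSON array pointing at the newest CRL of each shard.

    `latest` maps shard ID to the thisUpdate of that shard's newest CRL.
    """
    urls = [BASE_URL + crl_path(issuer_id, shard, ts)
            for shard, ts in sorted(latest.items())]
    return json.dumps(urls, indent=2)


when = datetime(2021, 2, 25, 18, 0, tzinfo=timezone.utc)
path = crl_path("r3", 7, when)
document = snapshot_document("r3", {7: when, 8: when})
```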

I believe that there is an argument to be made here that this plan
increases the auditability of the CRLs, rather than decreases it. Root
programs could require that any published JSON document be valid for a
certain period of time, and that all CRLs within that document remain
available for that period as well. Or even that historical versions of CRLs
remain available until every certificate they cover has expired (which is
what we intend to do anyway). Researchers can crawl our history of CRLs and
examine revocation events in more detail than previously available.

Regardless, even without statically-pathed, timestamped CRLs, I believe
that the merits of rolling time-based shards are sufficient to be a strong
argument in favor of dynamic JSON documents.

I hope this helps and that I addressed your questions,
Aaron

Ryan Sleevi

Feb 25, 2021, 3:52:47 PM
to Aaron Gable, mozilla-dev-security-policy, Kathleen Wilson, Ryan Sleevi
Hugely useful! Thanks for sharing - this is incredibly helpful.

I've snipped a good bit, just to keep the thread small, and have some
further questions inline.

On Thu, Feb 25, 2021 at 2:15 PM Aaron Gable <aa...@letsencrypt.org> wrote:

> I believe that there is an argument to be made here that this plan
> increases the auditability of the CRLs, rather than decreases it. Root
> programs could require that any published JSON document be valid for a
> certain period of time, and that all CRLs within that document remain
> available for that period as well. Or even that historical versions of CRLs
> remain available until every certificate they cover has expired (which is
> what we intend to do anyway). Researchers can crawl our history of CRLs and
> examine revocation events in more detail than previously available.
>

So I think unpacking this a little: Am I understanding your proposal
correctly that "any published JSON document be valid for a certain period
of time" effectively means that each update of the JSON document also gets
a distinct URL (i.e. same as the CRLs)? I'm not sure if that's what you
meant, because it would still mean regularly updating CCADB whenever your
shard-set changes (which seems to be the concern), but at the same time, it
would seem that any validity requirement imposes on you a lower-bound for
how frequently you can change or introduce new shards, right?

The issue I see with the "URL stored in CCADB" is that it's a reference,
and the dereferencing operation (retrieving the URL) puts the onus on the
consumer (e.g. root stores) and can fail, or result in different content
for different parties, undetectably. If it was your proposal to change to
distinct URLs, that issue would still unfortunately exist.

If there is an API that allows you to modify the JSON contents directly
(e.g. a CCADB API call you could make with an OAuth token), would that
address your concern? It would allow CCADB to still canonically record the
change history and contents, facilitating that historic research. It would
also facilitate better compliance tracking - since we know policies like
"could require that any published JSON" don't really mean anything in
practice for a number of CAs, unless the requirements are also monitored
and enforced.


> Regardless, even without statically-pathed, timestamped CRLs, I believe
> that the merits of rolling time-based shards are sufficient to be a strong
> argument in favor of dynamic JSON documents.
>

Right, I don't think there's any fundamental opposition to that. I'm very
much in favor of time-sharded CRLs over hash-sharded CRLs, for exactly the
reasons you highlight. I think the question was with respect to the
frequency of change of those documents (i.e. how often you introduce new
shards, and how those shards are represented).

There is one thing you mentioned that's also non-obvious to me, because I
would expect you already have to deal with this exact issue with respect to
OCSP, which is "overwriting files is a dangerous operation prone to many
forms of failure". Could you expand more about what some of those
top-concerns are? I ask, since, say, an OCSP Responder is frequently
implemented as "Spool /ocsp/:issuerDN/:serialNumber", with the CA
overwriting :serialNumber whenever they produce new responses. It sounds
like you're saying that common design pattern may be problematic for y'all,
and I'm curious to learn more.

Aaron Gable

Feb 25, 2021, 8:22:07 PM
to Ryan Sleevi, mozilla-dev-security-policy, Kathleen Wilson
Similarly, snipping and replying to portions of your message below:

On Thu, Feb 25, 2021 at 12:52 PM Ryan Sleevi <ry...@sleevi.com> wrote:

> Am I understanding your proposal correctly that "any published JSON
> document be valid for a certain period of time" effectively means that each
> update of the JSON document also gets a distinct URL (i.e. same as the
> CRLs)?
>

No, the (poorly expressed) idea is this: suppose you fetch our
rapidly-changing document and get version X. Over the next five minutes,
you fetch every CRL URL in that document. But during that same five
minutes, we've published versions X+1 and X+2 of that JSON document at that
same URL. There should be a guarantee that, as long as you fetch the CRLs
in your document "fast enough" (for some to-be-determined value of "fast"),
all of those URLs will still be valid (i.e. not return a 404 or similar),
*even though* some of them are not referenced by the most recent version of
the JSON document.
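
One way to express that guarantee, as a sketch only: every URL listed in a version of the JSON document must keep resolving until some grace period after that version was superseded. The grace period value and the function name are hypothetical; the actual value of "fast enough" is still to be determined.

```python
from datetime import datetime, timedelta


def urls_that_must_resolve(versions, now, grace):
    """Return the set of CRL URLs that must not 404 at time `now`.

    `versions` is a list of (published_at, urls) tuples sorted by
    published_at. A version is current until the next one is published,
    and its URLs must stay valid for `grace` after that.
    """
    required = set()
    for i, (published_at, urls) in enumerate(versions):
        if published_at > now:
            continue  # not published yet
        superseded_at = versions[i + 1][0] if i + 1 < len(versions) else None
        if superseded_at is None or superseded_at + grace > now:
            required.update(urls)
    return required


t0 = datetime(2021, 2, 25, 12, 0)
history = [
    (t0, ["a", "b"]),                          # version X
    (t0 + timedelta(minutes=2), ["b", "c"]),   # version X+1
    (t0 + timedelta(minutes=4), ["c", "d"]),   # version X+2
]
during = urls_that_must_resolve(history, t0 + timedelta(minutes=5), timedelta(minutes=10))
later = urls_that_must_resolve(history, t0 + timedelta(minutes=13), timedelta(minutes=10))
```

With a 10-minute grace period, a consumer who fetched version X can still fetch "a" five minutes in; by minute 13 only the URLs from X+1 and X+2 are required.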

This may seem like a problem that arises only in our rapidly-changing JSON
version of things. But I believe it should be a concern even in the system
as proposed by Kathleen: when a CA updates the JSON array contained in
CCADB, how long does a consumer of CCADB have to get a snapshot of the
contents of the previous set of URLs? To posit an extreme hypothetical, can
a CA hide misissuance of a CRL by immediately hosting their fixed CRL at a
new URL and updating their CCADB JSON list to include that new URL instead?
Not to put too fine a point on it, but I believe that this sort of
hypothetical is the underlying worry about having the JSON list live
outside CCADB where it can be changed on a whim, but I'm not sure that
having the list live inside CCADB without any requirements on the validity
of the URLs inside it provides significantly more auditability.

> The issue I see with the "URL stored in CCADB" is that it's a reference,
> and the dereferencing operation (retrieving the URL) puts the onus on the
> consumer (e.g. root stores) and can fail, or result in different content
> for different parties, undetectably.
>

If I may, I believe that the problem is less that it is a reference (which
is true of every URL stored in CCADB), and more that it is a reference to
an unsigned object. URLs directly to CRLs don't have this issue, because
the CRL is signed. And storing the JSON array directly doesn't have this
issue, because it is implicitly signed by the credentials of the user who
signed in to CCADB to modify it. One possible solution here would be to
require that the JSON document be signed by the same CA certificate which
issued all of the CRLs contained in it. I don't think I like this solution,
but it is within the possibility space.


> If there is an API that allows you to modify the JSON contents directly
> (e.g. a CCADB API call you could make with an OAuth token), would that
> address your concern?
>

If Mozilla and the other stakeholders in CCADB decide to go with this
thread's proposal as-is, then I suspect that yes, we would develop
automation to talk to CCADB's API in exactly this way. This is undesired
from our perspective for a variety of reasons:
* I'm not aware of a well-maintained Go library for interacting with the
Salesforce API.
* I'm not aware of any other automation system with write-access to CCADB
(I may be very wrong!), and I imagine there would need to be some sort of
further design discussion with CCADB's maintainers about what it means to
give write credentials to an automated system, what sorts of protections
would be necessary around those credentials, how to scope those credentials
as narrowly as possible, and more.
* I'm not sure CCADB's maintainers want updates to it to be in the critical
path of ongoing issuance, as opposed to just in the critical path for
beginning issuance with a new issuer.

> I think the question was with respect to the frequency of change of those
> documents.
>

Frankly, I think the least frequent creation of a new time-sharded CRL we
would be willing to do is once every 24 hours (that's still >60MB per CRL
in the worst case). That's going to require automation no matter what.


> There is one thing you mentioned that's also non-obvious to me, because I
> would expect you already have to deal with this exact issue with respect to
> OCSP, which is "overwriting files is a dangerous operation prone to many
> forms of failure". Could you expand more about what some of those
> top-concerns are? I ask, since, say, an OCSP Responder is frequently
> implemented as "Spool /ocsp/:issuerDN/:serialNumber", with the CA
> overwriting :serialNumber whenever they produce new responses. It sounds
> like you're saying that common design pattern may be problematic for y'all,
> and I'm curious to learn more.
>

Sure, happy to expand. For those following along at home, this last bit is
relatively off-topic compared to the other sections above, so skip if you
feel like it :)

OCSP consists of hundreds of millions of small entries. Thus our OCSP
infrastructure is backed by a database, and fronted by a caching CDN. So
the database and the CDN get to handle all the hard problems of overwriting
data, rather than having us reinvent the wheel. But CRLs consist of
relatively few large entries, which are much better suited to a flat/static
file structure like the one you describe for a naive implementation of OCSP.
For more on why we'd prefer to leave file overwriting to the experts rather
than risk getting it wrong ourselves, see this talk
<https://www.deconstructconf.com/2019/dan-luu-files>.

Thanks,
Aaron

Ryan Sleevi

Feb 26, 2021, 1:03:00 AM
to Aaron Gable, Ryan Sleevi, mozilla-dev-security-policy, Kathleen Wilson
On Thu, Feb 25, 2021 at 8:21 PM Aaron Gable <aa...@letsencrypt.org> wrote:

> If I may, I believe that the problem is less that it is a reference (which
> is true of every URL stored in CCADB), and more that it is a reference to
> an unsigned object.
>

While that's a small part, it really is as I said: the issue of being a
reference. We've already had this issue with the other URL fields, and thus
there exists logic to dereference and archive those URLs within CCADB.
Issues like audit statements, CP, and CPSes are all things that are indeed
critical to understanding the posture of a CA over time, and so actually
having those materials in something stable and maintained (without a
dependence on the CA) is important. <Google-Hat> It's the lesson from those
various past failure modes that had Google very supportive of the non-URL
based approach, putting the JSON directly in CCADB, rather than forcing yet
another "update-and-fetch" system.</Google-Hat> You're absolutely correct
that the "configured by CA" element has the nice property of being assured
that the change came from the CA themselves, without requiring signing, but
I wouldn't want to reduce the concern to just that.

> * I'm not aware of any other automation system with write-access to CCADB
> (I may be very wrong!), and I imagine there would need to be some sort of
> further design discussion with CCADB's maintainers about what it means to
> give write credentials to an automated system, what sorts of protections
> would be necessary around those credentials, how to scope those credentials
> as narrowly as possible, and more.
>

We already have automation for CCADB. CAs can and do use it for disclosure
of intermediates.


> * I'm not sure CCADB's maintainers want updates to it to be in the
> critical path of ongoing issuance, as opposed to just in the critical path
> for beginning issuance with a new issuer.
>

Without wanting to sound dismissive, whether or not it's in a critical path
of updating is the CA's choice on their design. I understand that there are
designs that could put it there, I think the question is whether it's
reasonable for the CA to have done that in the first place, which is why
it's important to drill down into these concerns. I know you merely
qualified it as undesirable, rather than actually being a blocker, and I
appreciate that, but I do think some of these concerns are perhaps less
grounded or persuasive than others :)

Taking a step back here, I think there's been a fundamental design error in
your proposed design, and I think that it, combined with the (existing)
automation, may make much of this not actually be the issue you anticipate.

Since we're talking Let's Encrypt, the assumption here is that the CRL URLs
will not be present within the crlDistributionPoints of the certificates,
otherwise, this entire discussion is fairly moot, since those
crlDistributionPoints can be obtained directly from Certificate
Transparency.

The purpose of this field is to help discover CRLs that are otherwise not
discoverable (e.g. from CT), but this also means that these CRLs do not
suffer from the same design limitations of PKI. Recall that there's nothing
intrinsic to a CRL that expresses its sharding algorithm (ignoring, for a
second, reasonCodes within the IDP extension). The only observability that
an external (not-the-CA) party has, whether the Subscriber or the RP, is
merely that "the CRL DP for this certificate is different from the CRLDP
for that certificate". It is otherwise opaque how the CA used it, even if
through a large enough corpus from CT, you can infer the algorithm from the
pattern. Further, when such shards are being used, you can determine
whether a given CRL that you have (whose provenance may be unknown) covers
a given certificate by matching the CRLDP of the cert against the IDP of
the CRL. We're talking about a scenario in which
the certificate lacks a CRLDP, and so there's no way to know that, indeed,
a given CRL "covers" the certificate unambiguously. The only thing we have
is the CRL having an IDP, because if it didn't, it'd have to be a full CRL,
and then you'd be back to only having one URL to worry about.

Because of all of this, it means that the consumers of this JSON are
expected to combine all of the CRLs present, union all the revoked serials,
and be done with it. However, it's that unioning that I think you've
overlooked here in working out your math. In the "classic" PKI sense (i.e.
CRLDP present), the CA has to plan for revocation for the lifetime of the
certificate, it's fixed when the certificate is created, and it's immutable
once created. Further, changes in revocation frequency mean you need to
produce new versions of that specific CRL. However, in the scenario we're
discussing, in which these CRLs are unioned, you're entirely flexible at
all points in time in how you balance your CRLs. Further, in the 'ideal'
case (no revocations), you need only produce a single empty CRL. There's no
need to produce an empty-CRL-per-shard like there would be under 'classic'
(non-unioned) PKI.
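
A minimal sketch of the unioning step, assuming each CRL has already been downloaded, verified, and parsed into a collection of revoked serial numbers. Parsing the DER is out of scope here, and the function name is hypothetical.

```python
def union_revoked(shards):
    """Combine the revoked serials of every listed CRL into one set."""
    revoked = set()
    for serials in shards:
        revoked |= set(serials)
    return revoked


# The same revocations, partitioned two different ways.
before_rebalance = [[101, 102], [], [102, 205]]
after_rebalance = [[101], [102, 205]]

revoked = union_revoked(before_rebalance)
```

Because only the union matters to the consumer, an empty shard contributes nothing and regrouping the same serials across shards leaves the result unchanged.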

This means your shard algorithm does entirely support dynamic rebalancing
at any point in time, which seemed to be your concern. It means that you
can, indeed, balance shards based on size, and it also means you can do
things like create batches of CRLs that act as additions over an initial
set. For lack of a better (public) analogy, it's a bit like database
servers that have the initial database as one file, then record a series of
deltas as separate files, along with periodic rebalancing to optimize the
data (e.g. remove deleted rows, re-sort indices, etc)

While I'm not sure I actually agree with your assertion that the large CRL
is unwieldy, since the target here is primarily Root Store Programs doing
aggregate revocation rather than RPs, let's say you wanted to target your
CRLs as 60MB chunks. During the course of normal issuance, you simply create
one CRL that is empty (to represent "all the unrevoked, unexpired
certificates"), then begin working from oldest to newest of your revoked
certificates. Every time you've accumulated 60MB of revocations, you cut a
CRL and shard, and then continue. That algorithm scales whether you're
dealing with day-to-day revocation or a mass-revocation event - you will
always end up with 60MB-or-less shards.
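
A rough sketch of that size-based cutting, assuming the 38-bytes-per-entry average quoted earlier in the thread and ignoring the fixed overhead of each CRL (header, extensions, signature). The function name is hypothetical, and as a simplification the empty CRL is only emitted when there are no revocations at all.

```python
BYTES_PER_ENTRY = 38          # average entry size quoted earlier in the thread
MAX_SHARD_BYTES = 60_000_000  # 60MB target per shard


def cut_shards(revoked_oldest_first, max_bytes=MAX_SHARD_BYTES,
               entry_bytes=BYTES_PER_ENTRY):
    """Walk revocations from oldest to newest, cutting a shard at each size limit."""
    per_shard = max_bytes // entry_bytes
    shards = [revoked_oldest_first[i:i + per_shard]
              for i in range(0, len(revoked_oldest_first), per_shard)]
    # With no revocations, a single empty CRL is enough.
    return shards or [[]]


# Tiny limit so the result is easy to read: four entries per shard.
small = cut_shards(list(range(10)), max_bytes=4 * BYTES_PER_ENTRY)
```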

Now, let's say at some point after creating your initial batch of
revocations, you have more revocations. You again have design options. One
option would be to re-generate the CRLs from the oldest cert to newest,
which will intrinsically rebalance the CRLs for the new revocations.
Alternatively, you can simply create a new CRL with *just* the new
revocations (or update your last CRL that's less than 60MB), and continue
operation. You create a new file every time you've accumulated 60MB of
revocations.

Let's further assume you're concerned with the overall storage size,
although your current design of immutability/non-deletion of files seems to
suggest this is not as major a concern. The approach I described there, of
simply appending revocations to the not-a-deltaCRL-delta-CRL, will
eventually result in a host of non-temporally-based revocations, meaning
you'll end up needing to carry that data until every certificate within
that CRL is expired (again, assuming your non-deletion goal here). Even
then, you could simply, on a periodic interval that's appropriate for your
needs (e.g. weekly), rebalance all of your CRLs, and then update CCADB with
the newest batch.

In this model, the only time you have to touch CCADB is when you've
accumulated 60MB worth of revocations - in the worst case. However, because
these URLs effectively work as a union, you could certainly pre-declare
some CRLs (producing empty CRLs) to reduce the need to update CCADB. You
are, in effect, pre-allocating "space", by allowing yourself, say, 240MB of
revocations before you need to update CCADB again.

I think the mistake here began by treating this CRL list from the
perspective of the URLs embedded within certificates, which does trigger
the design constraints you mentioned regarding things like size, and
understandably makes the temporal sharding tempting. However, that's not
the scenario being discussed here, and so with a bit more creative thinking
about the design, hopefully you can see this isn't as worrying or complex
as you feared. I imagine this may take some time to process and think
through, but hopefully this explanation made sense.


> There is one thing you mentioned that's also non-obvious to me, because I
>> would expect you already have to deal with this exact issue with respect to
>> OCSP, which is "overwriting files is a dangerous operation prone to many
>> forms of failure". Could you expand more about what some of those
>> top-concerns are? I ask, since, say, an OCSP Responder is frequently
>> implemented as "Spool /ocsp/:issuerDN/:serialNumber", with the CA
>> overwriting :serialNumber whenever they produce new responses. It sounds
>> like you're saying that common design pattern may be problematic for y'all,
>> and I'm curious to learn more.
>>
>
> Sure, happy to expand. For those following along at home, this last bit is
> relatively off-topic compared to the other sections above, so skip if you
> feel like it :)
>
> OCSP consists of hundreds of millions of small entries. Thus our OCSP
> infrastructure is backed by a database, and fronted by a caching CDN. So
> the database and the CDN get to handle all the hard problems of overwriting
> data, rather than having us reinvent the wheel. But CRL consists of
> relatively-few large entries, which is much better suited to a flat/static
> file structure like that you describe for a naive implementation of OCSP.
> For more on why we'd prefer to leave file overwriting to the experts rather
> than risk getting it wrong ourselves, see this talk
> <https://www.deconstructconf.com/2019/dan-luu-files>.
>

Without wanting to grossly over-simplify here, it equally holds that you
can be placing the CRLs within a database. I think you're assuming a
particular fixed "worst case" size for the CRL, and optimizing for that,
but as I tried to cover above, you aren't stymied by that element of PKI in
what's being asked for here.

Rob Stradling

Feb 26, 2021, 5:49:28 AM
to Aaron Gable, ry...@sleevi.com, mozilla-dev-security-policy, Kathleen Wilson
> We already have automation for CCADB. CAs can and do use it for disclosure of intermediates.

Any CA representatives who are surprised by this statement might want to go and read the "CCADB Release Notes" (click the hyperlink when you log in to the CCADB). That's the only place I've seen the CCADB API "announced".

> Since we're talking Let's Encrypt, the assumption here is that the CRL URLs
> will not be present within the crlDistributionPoints of the certificates,
> otherwise, this entire discussion is fairly moot, since those
> crlDistributionPoints can be obtained directly from Certificate Transparency.

AIUI, Mozilla is moving towards requiring that the CCADB holds all CRL URLs, even the ones that also appear in crlDistributionPoints extensions. Therefore, I think that this entire discussion is not moot at all.

Ben's placeholder text:
https://github.com/BenWilson-Mozilla/pkipolicy/commit/26c1ee4ea8be1a07f86253e38fbf0cc043e12d48

Ryan Sleevi

Feb 26, 2021, 11:47:40 AM2/26/21
to Rob Stradling, Aaron Gable, ry...@sleevi.com, mozilla-dev-security-policy, Kathleen Wilson
On Fri, Feb 26, 2021 at 5:49 AM Rob Stradling <r...@sectigo.com> wrote:

> > We already have automation for CCADB. CAs can and do use it for
> > disclosure of intermediates.
>
> Any CA representatives that are surprised by this statement might want to
> go and read the "CCADB Release Notes" (click the hyperlink when you login
> to the CCADB). That's the only place I've seen the CCADB API "announced".
>
> > Since we're talking Let's Encrypt, the assumption here is that the CRL
> > URLs will not be present within the crlDistributionPoints of the
> > certificates, otherwise, this entire discussion is fairly moot, since
> > those crlDistributionPoints can be obtained directly from Certificate
> > Transparency.
>
> AIUI, Mozilla is moving towards requiring that the CCADB holds all CRL
> URLs, even the ones that also appear in crlDistributionPoints extensions.
> Therefore, I think that this entire discussion is not moot at all.
>

Rob,

I think you misparsed, but that's understandable, because I worded it
poorly. The discussion is mooted by whether or not the CA includes the
cRLDP within the certificate itself - i.e. that the CA has to allocate the
shard at issuance time and that it's fixed for the lifetime of the
certificate. That's not a requirement - EEs don't need cRLDPs - and so
there's no inherent need to do static assignment, nor does it sound like LE
is looking to go that route, since it would be incompatible with the design
they outlined. Because of this, the dynamic sharding discussed seems
significantly _less_ complex, both for producers and for consumers of this
data, than the static sharding-and-immutability scheme proposed.

Aaron Gable

Feb 26, 2021, 1:47:07 PM2/26/21
to mozilla-dev-security-policy, Rob Stradling, Kathleen Wilson, Ryan Sleevi
Thanks for the reminder that CCADB automatically dereferences URLs for
archival purposes, and for the info about existing automation! I don't
personally have CCADB credentials, so all of my knowledge of it is based on
what I've learned from others at LE and from this list.

If we leave out the "new url for each re-issuance of a given CRL" portion
of the design (or offer both url-per-thisUpdate and
static-url-always-pointing-at-the-latest), then we could in fact include
CRLDP urls in the certificates using the rolling time-based shards model.
And frankly we may want to do that in the near future: maintaining both CRL
*and* OCSP infrastructure when the BRs require only one or the other is an
unnecessary expense, and turning down our OCSP infrastructure would
constitute a significant savings, both in tangible bills and in engineering
effort.

Thus, in my mind, the dynamic sharding idea you outlined has two major
downsides:
1) It requires us to maintain our parallel OCSP infrastructure
indefinitely, and
2) It is much less resilient in the face of a mass revocation event.

Fundamentally, we need our infrastructure to be able to handle the
revocation of 200M certificates in 24 hours without any difference from how
it handles the revocation of one certificate in the same period. Already
having certificates pre-allocated into CRL shards means that we can
deterministically sign many CRLs in parallel.
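
To illustrate why pre-allocation needs no coordination, here is a minimal sketch of a rolling time-based shard assignment. It assumes shards are keyed by the certificate's notAfter time in 6-hour windows. The window width, the reference epoch, the URL pattern, and the function names are all hypothetical and are not a description of any CA's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumptions for illustration only: shard width and reference epoch.
SHARD_WINDOW = timedelta(hours=6)
EPOCH = datetime(2021, 1, 1, tzinfo=timezone.utc)


def shard_index(not_after: datetime) -> int:
    """Map a certificate's expiry time to a shard number fixed at issuance."""
    return (not_after - EPOCH) // SHARD_WINDOW


def shard_url(not_after: datetime) -> str:
    """Hypothetical URL pattern for the shard covering this expiry time."""
    return f"http://cdn.example/crl-{shard_index(not_after)}.crl"


def group_by_shard(revoked):
    """Group (serial, not_after) pairs by shard.

    Every certificate already has exactly one shard, so each group can be
    handed to a separate signer without any locking or allocation step.
    """
    groups = defaultdict(list)
    for serial, not_after in revoked:
        groups[shard_index(not_after)].append(serial)
    return {idx: sorted(serials) for idx, serials in groups.items()}
```

Because the mapping depends only on data fixed at issuance, revoking one certificate or 200M certificates takes the same code path: only the size of each group changes.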

Dynamically assigning certificates to CRLs as they are revoked requires
taking a lock to determine if a new CRL needs to be created or not, and
then atomically creating a new one. Or it requires a separate,
not-operation-as-normal process to allocate a bunch of new CRLs, assign
certs to them, and then sign those in parallel. Neither of these --
dramatically changing not just the quantity but the *quality* of the
database access, nor introducing additional processes -- is acceptable in
the face of a mass revocation event.

In any case, I think this conversation has served the majority of its
purpose. This discussion has led to several ideas that would allow us to
update our JSON document only when we create new shards (which will still
likely be every 6 to 24 hours), as opposed to on every re-issuance of a
shard. We'd still greatly prefer that CCADB be willing to
accept-and-dereference a URL to a JSON document, as it would allow our
systems to have fewer dependencies and fewer failure modes, but understand
that our arguments may not be persuasive enough :)

If Mozilla et al. do go forward with this proposal as-is, I'd like to
specifically request that CCADB surfaces an API to update this field before
any root programs require that it be populated, and does so with sufficient
lead time for development against the API to occur.

Thanks again,
Aaron

Ryan Sleevi

Feb 26, 2021, 3:05:15 PM2/26/21
to Aaron Gable, Kathleen Wilson, Rob Stradling, Ryan Sleevi, mozilla-dev-security-policy
On Fri, Feb 26, 2021 at 1:46 PM Aaron Gable <aa...@letsencrypt.org> wrote:

> If we leave out the "new url for each re-issuance of a given CRL" portion
> of the design (or offer both url-per-thisUpdate and
> static-url-always-pointing-at-the-latest), then we could in fact include
> CRLDP urls in the certificates using the rolling time-based shards model.
> And frankly we may want to do that in the near future: maintaining both CRL
> *and* OCSP infrastructure when the BRs require only one or the other is an
> unnecessary expense, and turning down our OCSP infrastructure would
> constitute a significant savings, both in tangible bills and in engineering
> effort.
>

This isn’t quite correct. You MUST support OCSP for EE certs. It is only
optional for intermediates. So you can’t really contemplate turning down
the OCSP side, and that’s intentional, because clients use OCSP, rather
than CRLs, as the fallback mechanism for when the aggregated-CRLs fail.

I think it would be several years off before we could practically talk
about removing the OCSP requirement, once much more reliable CRL profiles
are in place, which by necessity would also mean profiling the acceptable
sharding algorithms.

Further, under today’s model, while you COULD place the CRLDP within the
certificate, that seems like it would only introduce additional cost and
limitation without providing you benefit. This is because major clients
won’t fetch the CRLDP for EE certs (especially if OCSP is present, which
the BRs MUST/REQUIRE). You would end up with some clients querying (such as
Java, IIRC), so you’d be paying for bandwidth, especially in your mass
revocation scenario, that would largely be unnecessary compared to the
status quo.

> Thus, in my mind, the dynamic sharding idea you outlined has two major
> downsides:
> 1) It requires us to maintain our parallel OCSP infrastructure
> indefinitely, and
>

To the above, I think this should be treated as a foregone conclusion in
today’s requirements. So I think mostly the discussion here focuses on #2,
which is really useful.

> 2) It is much less resilient in the face of a mass revocation event.
>
> Fundamentally, we need our infrastructure to be able to handle the
> revocation of 200M certificates in 24 hours without any difference from how
> it handles the revocation of one certificate in the same period. Already
> having certificates pre-allocated into CRL shards means that we can
> deterministically sign many CRLs in parallel.
>

You can still do parallel signing. I was trying to account for that
explicitly with the notion of the “pre-reserved” set of URLs. However, that
also makes an assumption I should have been more explicit about: whether
the expectation is “you declare, then fill, CRLs”, or whether it’s
acceptable to “fill, then declare, CRLs”. I was trying to cover the former,
but I don’t think there is any innate prohibition on the latter, and it was
what I was trying to call out in the previous mail.

I do take your point about deterministically, because the process I’m
describing is implicitly assuming you have a work queue (e.g. pub/sub, go
channel, etc), in which certs to revoke go in, and one or more CRL signers
consume the queue and produce CRLs. The order of that consumption would be
non-deterministic, but it very much would be parallelizable, and you’d be
in full control over what the work unit chunks were sized at.
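
As a rough sketch of the work-queue process described above, the following uses a queue and a handful of threads standing in for CRL signers. The worker count, the chunk size, and the names are assumptions for illustration, and no actual signing is performed: each "CRL" is just a list of serial numbers.

```python
import queue
import threading


def run_signers(serials, num_workers=4, chunk_size=3):
    """Drain a queue of serials to revoke with several independent workers.

    Returns a list of (worker_id, [serials]) chunks. Which worker gets which
    serial is non-deterministic, but every serial lands in exactly one chunk.
    """
    work = queue.Queue()
    for serial in serials:
        work.put(serial)

    results = []
    results_lock = threading.Lock()

    def signer(worker_id):
        chunk = []
        while True:
            try:
                serial = work.get_nowait()
            except queue.Empty:
                break  # queue was filled up front, so empty means done
            chunk.append(serial)
            if len(chunk) == chunk_size:
                # A real signer would produce and publish a CRL here.
                with results_lock:
                    results.append((worker_id, chunk))
                chunk = []
        if chunk:
            with results_lock:
                results.append((worker_id, chunk))

    threads = [threading.Thread(target=signer, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The only shared state is the queue and the results list; the chunk size is entirely under the operator's control.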

>
> Dynamically assigning certificates to CRLs as they are revoked requires
> taking a lock to determine if a new CRL needs to be created or not, and
> then atomically creating a new one. Or it requires a separate,
> not-operation-as-normal process to allocate a bunch of new CRLs, assign
> certs to them, and then sign those in parallel. Neither of these --
> dramatically changing not just the quantity but the *quality* of the
> database access, nor introducing additional processes -- is acceptable in
> the face of a mass revocation event.
>

Right, neither of these is required if you can “produce, then declare”.
From the client perspective, a consuming party cannot observe any
meaningful difference from the “declare, then produce” or the “produce,
then declare”, since in both cases, they have to wait for the CRL to be
published on the server before they can consume. The fact that they know
the URL, but the content is stale/not yet updated (I.e. the declare then
produce scenario) doesn’t provide any advantages. Ostensibly, the “produce,
then declare” gives greater advantage to the client/root program, because
then they can say “All URLs must be correct at time of declaration” and use
that to be able to quantify whether or not the CA met their timeline
obligations for the mass revocation event.

> In any case, I think this conversation has served the majority of its
> purpose. This discussion has led to several ideas that would allow us to
> update our JSON document only when we create new shards (which will still
> likely be every 6 to 24 hours), as opposed to on every re-issuance of a
> shard. We'd still greatly prefer that CCADB be willing to
> accept-and-dereference a URL to a JSON document, as it would allow our
> systems to have fewer dependencies and fewer failure modes, but understand
> that our arguments may not be persuasive enough :)
>

<Google-Hat>We’re just one potential consumer, and not even the most urgent
of potential consumers (I.e. we would not be immediately taking advantage
of this as others may). You’ve raised a lot of good points, and also
highlighted a good opportunity to better communicate some of our
assumptions in the design - e.g. the ability for CAs to programmatically
update such contents being an essential property - that have been discussed
and are on the MVP implementation plan, but not communicated as such. We
definitely want to make sure ALL CCADB members are comfortable.</Google-Hat>

The tension here is the tradeoffs/risks to Root Programs (which,
admittedly, are not always obvious or well communicated) with the potential
challenges for CAs. I personally am definitely not trying to make CAs do
all the work, but I’m sensitive to the fact that a system that requires 70
CAs to do 1 thing feels like it scales better than requiring N root
Programs to do 70 things :)

> If Mozilla et al. do go forward with this proposal as-is, I'd like to
> specifically request that CCADB surfaces an API to update this field before
> any root programs require that it be populated, and does so with sufficient
> lead time for development against the API to occur.
>

Agreed - I do think having a well-tested, reliable path for programmatic
update is an essential property to mandating the population. My hope and
belief, however, is that this is fairly light-weight and doable.

The primary benefits I see to this approach are that it moves us from poll (where
the onus is on separate CCADB members/consumers) to push (where the
responsibility is on the CA to notify), and that it gives an auditable
historic archive “for free”, without requiring yet another bespoke archival
tool be created. Both of these enable greater CA accountability, which is
why I feel they’re important, but they definitely should be balanced
against the tradeoffs that CAs may have to undergo.


Aaron Gable

Feb 26, 2021, 6:01:10 PM2/26/21
to Ryan Sleevi, Kathleen Wilson, Rob Stradling, mozilla-dev-security-policy
On Fri, Feb 26, 2021 at 12:05 PM Ryan Sleevi <ry...@sleevi.com> wrote:

> You can still do parallel signing. I was trying to account for that
> explicitly with the notion of the “pre-reserved” set of URLs. However, that
> also makes an assumption I should have been more explicit about: whether
> the expectation is “you declare, then fill, CRLs”, or whether it’s
> acceptable to “fill, then declare, CRLs”. I was trying to cover the former,
> but I don’t think there is any innate prohibition on the latter, and it was
> what I was trying to call out in the previous mail.
>
> I do take your point about deterministically, because the process I’m
> describing is implicitly assuming you have a work queue (e.g. pub/sub, go
> channel, etc), in which certs to revoke go in, and one or more CRL signers
> consume the queue and produce CRLs. The order of that consumption would be
> non-deterministic, but it very much would be parallelizable, and you’d be
> in full control over what the work unit chunks were sized at.
>
> Right, neither of these are required if you can “produce, then declare”.
> From the client perspective, a consuming party cannot observe any
> meaningful difference from the “declare, then produce” or the “produce,
> then declare”, since in both cases, they have to wait for the CRL to be
> published on the server before they can consume. The fact that they know
> the URL, but the content is stale/not yet updated (I.e. the declare then
> produce scenario) doesn’t provide any advantages. Ostensibly, the “produce,
> then declare” gives greater advantage to the client/root program, because
> then they can say “All URLs must be correct at time of declaration” and use
> that to be able to quantify whether or not the CA met their timeline
> obligations for the mass revocation event.
>

I think we managed to talk slightly past each other, but we're well into
the weeds of implementation details so it probably doesn't matter much :)
The question in my mind was not "can there be multiple CRL signers
consuming revocations from the queue?"; but rather "assuming there are
multiple CRL signers consuming revocations from the queue, what
synchronization do they have to do to ensure that multiple signers don't
decide the old CRL is full and allocate new ones at the same time?". In the
world where every certificate is pre-allocated to a CRL shard, no such
synchronization is necessary at all.

This conversation does raise a different question in my mind. The Baseline
Requirements do not have a provision that requires that a CRL be re-issued
within 24 hours of the revocation of any certificate which falls within its
scope. CRLs and OCSP responses for Intermediate CAs are clearly required to
receive updates within 24 hours of the revocation of a relevant certificate
(sections 4.9.7 and 4.9.10 respectively), but no such requirement appears
to exist for end-entity CRLs. The closest is the requirement that
subscriber certificates be revoked within 24 hours after certain conditions
are met, but the same structure exists for the conditions under which
Intermediate CAs must be revoked, suggesting that the BRs believe there is
a difference between revoking a certificate and *publishing* that
revocation via OCSP or CRLs. Is this distinction intended by the root
programs, and does anyone intend to change this status quo as more emphasis
is placed on end-entity CRLs?

Or more bluntly: in the presence of OCSP and CRLs being published side by
side, is it expected that the CA MUST re-issue a sharded end-entity CRL
within 24 hours of revoking a certificate in its scope, or may the CA wait
to re-issue the CRL until its next 7-day re-issuance time comes up as
normal?

> Agreed - I do think having a well-tested, reliable path for programmatic
> update is an essential property to mandating the population. My hope and
> belief, however, is that this is fairly light-weight and doable.
>

Thanks, I look forward to hearing more about what this will look like.

Aaron

Ryan Sleevi

Feb 26, 2021, 8:19:02 PM2/26/21
to Aaron Gable, Ryan Sleevi, Kathleen Wilson, Rob Stradling, mozilla-dev-security-policy
Oh, I meant they could be signing independent CRLs (e.g. each has an IDP
with a prefix indicating which shard-generator is running), and at the end
of the queue-draining ceremony, you see what CRLs each worker created, and
add those to the JSON. So you could have multiple "small" CRLs (one or more
for each worker, depending on how you manage things), allowing them to
process revocations wholly independently. This, of course, relies again on
the assumption that the cRLDP is not baked into the certificate, which
enables you to have maximum flexibility in how CRL URLs are allocated and
sharded, provided the sum union of all of their contents reflects the CA's
state.
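
A minimal sketch of that "produce, then declare" step, assuming each worker names its CRLs with its own prefix. The URL scheme, the function names, and the in-memory representation are hypothetical; the output shape follows the JSON array of URLs proposed at the top of this thread.

```python
import json


def declare(crls_by_worker, base_url="http://cdn.example"):
    """Build the JSON array of CRL URLs after all workers have finished.

    crls_by_worker maps a worker id to the list of CRLs it produced, where
    each CRL is represented here simply as a list of revoked serials.
    """
    urls = []
    for worker_id in sorted(crls_by_worker):
        for n, _serials in enumerate(crls_by_worker[worker_id]):
            # The per-worker prefix keeps names unique without coordination.
            urls.append(f"{base_url}/w{worker_id}-{n}.crl")
    return json.dumps(urls, indent=2)


def covers(crls_by_worker, revoked):
    """Check that the union of all CRL contents equals the CA's revoked set."""
    listed = {s for crls in crls_by_worker.values() for crl in crls for s in crl}
    return listed == set(revoked)
```

For example, `declare({0: [[1, 2], [3]], 1: [[4]]})` yields an array of three URLs, and `covers` gives a simple way to confirm the declared set is complete before the array is submitted.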


> This conversation does raise a different question in my mind. The Baseline
> Requirements do not have a provision that requires that a CRL be re-issued
> within 24 hours of the revocation of any certificate which falls within its
> scope. CRLs and OCSP responses for Intermediate CAs are clearly required to
> receive updates within 24 hours of the revocation of a relevant certificate
> (sections 4.9.7 and 4.9.10 respectively), but no such requirement appears
> to exist for end-entity CRLs. The closest is the requirement that
> subscriber certificates be revoked within 24 hours after certain conditions
> are met, but the same structure exists for the conditions under which
> Intermediate CAs must be revoked, suggesting that the BRs believe there is
> a difference between revoking a certificate and *publishing* that
> revocation via OCSP or CRLs. Is this distinction intended by the root
> programs, and does anyone intend to change this status quo as more emphasis
> is placed on end-entity CRLs?
>
> Or more bluntly: in the presence of OCSP and CRLs being published side by
> side, is it expected that the CA MUST re-issue a sharded end-entity CRL
> within 24 hours of revoking a certificate in its scope, or may the CA wait
> to re-issue the CRL until its next 7-day re-issuance time comes up as
> normal?
>

I recall this came up in the past (with DigiCert, [1]), in which
"revocation" was enacted by setting a flag in a database (or perhaps that
was an *extra* incident, with a different CA), but not through the actual
publication and propagation of that revocation information from DigiCert's
systems through the CDN. The issue at the time was with respect to
4.9.1.1's requirements, namely whether "SHALL revoke" is a matter of merely a
server-side bit, or whether it's the actual publication of that revocation
within the Repository (as reflected by the CRL).

I do believe it's problematic for the OCSP and CRL versions of the
repository to be out of sync, but also agree this is an area that is useful
to clarify. To that end, I filed
https://github.com/cabforum/servercert/issues/252 to make sure we don't
lose track of this for the BRs.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1640805


Aaron Gable

Feb 26, 2021, 8:53:17 PM2/26/21
to Ryan Sleevi, Kathleen Wilson, Rob Stradling, mozilla-dev-security-policy
On Fri, Feb 26, 2021 at 5:18 PM Ryan Sleevi <ry...@sleevi.com> wrote:

> I do believe it's problematic for the OCSP and CRL versions of the
> repository to be out of sync, but also agree this is an area that is useful
> to clarify. To that end, I filed
> https://github.com/cabforum/servercert/issues/252 to make sure we don't
> lose track of this for the BRs.
>

Thanks! I like that bug, and commented on it to provide a little more
clarity for how the question arose in my mind and what language we might
want to update. It sounds like maybe what we want is language to the effect
that, if a CA is publishing both OCSP and CRLs, then a certificate is not
considered Revoked until it shows up as Revoked in both revocation
mechanisms. (And it must be Revoked within 24 hours.)
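
One possible reading of that language, sketched as a check. This is only an illustration of the idea, not proposed BR text; the function name and the choice to measure from the time the revocation decision was made are assumptions.

```python
from datetime import datetime, timedelta, timezone

DEADLINE = timedelta(hours=24)


def published_in_time(decided_at, ocsp_published_at, crl_published_at):
    """Treat a certificate as Revoked only once BOTH mechanisms show it.

    Either publication time may be None if that mechanism does not yet
    reflect the revocation. The effective revocation time is the later of
    the two, and it must fall within 24 hours of the decision.
    """
    if ocsp_published_at is None or crl_published_at is None:
        return False
    effective = max(ocsp_published_at, crl_published_at)
    return effective - decided_at <= DEADLINE
```

Under this reading, a CA that updates OCSP immediately but waits for the next scheduled CRL re-issuance would not meet the deadline unless that re-issuance happens within the 24-hour window.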

We'll make sure our parallel CRL infrastructure re-issues CRLs
close-to-immediately after a certificate in that shard's scope is revoked,
just as we do for OCSP today.

Thanks again,
Aaron