Survey of TLSBRv2 §7.1.2.7.6 extension criticality non-compliance


Rob Stradling

Apr 8, 2024, 12:45:30 PM
to CCADB Public
In recent weeks, a number of CAs have filed incident reports about mistakes in setting the critical flag on Subscriber certificate extensions since the TLSBRv2 profiles came into force. We thought it would be worth performing a comprehensive survey ourselves to discover whether similar incidents at other CAs had gone undetected.

I've run [1] against the primary crt.sh DB. It trawled through the crt.sh ID space, starting from around the time TLSBRv2 came into force, to identify every Subscriber certificate containing a common extension whose critical flag is set incorrectly per §7.1.2.7.6. I've posted a report of the results at [2], which was generated using [3].

Seven further incidents were identified. I sent Certificate Problem Reports to the two CAs whose affected PKI hierarchies are trusted by root programs with representatives who actively monitor Bugzilla. Both of those CAs responded promptly and filed incident reports: [4] and [5].

Having gathered this data, today I've used it to cross-check the lists of affected certificates that CAs have provided with their incident reports. I was surprised to find two bugs ([6] and [7]) with no list of affected certificates attached. I also observed some patterns of "omissions" in the disclosed lists of affected certificates, and I would like to ask the root program owners to clarify their expectations, noting that the CCADB incident reporting requirements [8] say that each incident report's "Appendix must include a listing of the complete certificate details of all affected certificates":
  1. Is a CA's incident report expected to disclose the affected certificates that have already expired prior to the CA's response to the incident?
  2. Is a CA's incident report expected to disclose the affected certificates that have already been revoked prior to the CA's response to the incident?
  3. Is a CA's incident report expected to disclose both an affected precertificate and its corresponding certificate?  Or just one of the pair?



--
Rob Stradling
Senior Research & Development Scientist
Sectigo Limited

Chris Clements

Apr 11, 2024, 1:03:17 PM
to Rob Stradling, CCADB Public

Hi Rob,


Thank you for the comprehensive survey and for clearly communicating your findings. 


In response to your questions, and from the perspective of the Chrome Root Program:


1. Is a CA's incident report expected to disclose the affected certificates that have already expired prior to the CA's response to the incident?


We see disclosing the full set of affected certificates, regardless of whether they have expired or have been revoked, as presenting the community with the most complete perspective of an incident’s impact. This is our preferred approach.


2. Is a CA's incident report expected to disclose the affected certificates that have already been revoked prior to the CA's response to the incident?


Yes, similar to the previous question, our preference is to collect the most complete perspective possible. 


3. Is a CA's incident report expected to disclose both an affected precertificate and its corresponding certificate?  Or just one of the pair?


You raise an opportunity for improvement. Historically, a list of precertificates was considered acceptable. However, having both precertificates and final certificates provides a more comprehensive perspective, which we consider favorable. 


We appreciate other thoughts and perspectives.


Additionally, we’ll plan to sync on these opinions with the other members of the CCADB Steering Committee, which could ultimately lead to an update of https://www.ccadb.org/cas/incident-report


Thanks again!

-Chris



--
You received this message because you are subscribed to the Google Groups "CCADB Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to public+un...@ccadb.org.
To view this discussion on the web visit https://groups.google.com/a/ccadb.org/d/msgid/public/MW4PR17MB47290848C0FE089BD12FA77AAA002%40MW4PR17MB4729.namprd17.prod.outlook.com.

Aaron Gable

Apr 11, 2024, 1:33:43 PM
to Chris Clements, Rob Stradling, CCADB Public
In general, I agree that producing the most complete set of data possible is the most desirable course of action. However, I wonder how this desire interacts with the full scale of the WebPKI.

Two years ago, Let's Encrypt had an incident which affected 100% of our validations conducted via the TLS-ALPN-01 method. The end result was 10 zstd-compressed files, 10 megabytes each (due to Bugzilla's attachment size limits), together containing 2.7 million crt.sh URIs. Those files represented only one version of each certificate: usually the final certificate, but sometimes the precertificate for those issuances where production of the final certificate failed. The files also represented only the certificates which were unexpired when the incident was discovered, which covered only 18% of the total incident period.

If the list of affected certificates had included both the pre- and final certificates, and had covered the full incident period, it would have been a full gigabyte of compressed URLs. (10 MB per file x 10 files x 2 for both kinds of certs x roughly 5 to go from 18% to 100% of the incident period.) And this incident affected only a small fraction of Let's Encrypt's total issuance volume: if the issue had been with our HTTP-01 method over the same incident period, the resulting list of URLs would have been nearly 20 gigabytes. Would this larger set of certificate data actually have been useful to the community, given that those certificates were already untrusted?
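The size estimate above can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the paragraph; the constant names are illustrative, and the factor of 5 is the email's rounding of 100/18 (about 5.6).

```python
# Back-of-the-envelope reproduction of the compressed-URL-list estimate.
MB_PER_FILE = 10          # Bugzilla attachment limit per zstd-compressed file
FILES = 10                # files attached for the unexpired certificates
CERT_KINDS = 2            # precertificate + final certificate
COVERED_FRACTION = 0.18   # share of the incident period still unexpired


def estimated_size_mb(period_multiplier: float) -> float:
    """Compressed size in MB if both cert kinds and the full period were listed."""
    return MB_PER_FILE * FILES * CERT_KINDS * period_multiplier


rounded = estimated_size_mb(5)                    # the email's rounded factor
unrounded = estimated_size_mb(1 / COVERED_FRACTION)  # 100/18, about 5.56

print(f"rounded: {rounded:.0f} MB, unrounded: {unrounded:.0f} MB")
# prints "rounded: 1000 MB, unrounded: 1111 MB"
```

Either way the result is on the order of one gigabyte, which is the point being made.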

Acquiring this fuller list would have significantly increased the time taken to conduct the investigation. Let's Encrypt prunes data about already-expired certificates from our easily-queryable database to prevent it from growing without bound, so the investigation would have had to start pulling in log data, which is a much slower process for both writing and executing the relevant queries. Would this additional investigation time, and correspondingly slower incident response and remediation, have been worthwhile?

It is possible that the incident period could have exceeded the audit log retention period (currently 2 years) required by the BRs. In that case, producing a full list of certificates would have been impossible. The certificates themselves contain no indication of what validation method was used for each identifier, so reconstruction from CT doesn't work. What would be the appropriate action if producing the full list of historically-affected certificates is not possible?

Thanks,
Aaron

Ryan Dickson

Apr 11, 2024, 4:11:40 PM
to Aaron Gable, Chris Clements, Rob Stradling, CCADB Public

Hi Aaron,


You raise some excellent points. Thanks for your feedback. It seems like (1) we generally agree on a goal of providing the most complete set of data possible, and (2) there’s an opportunity to balance our desires for the completeness, usefulness, and practicality of the certificate data in question.


With this community’s help (especially yours), the CCADB Steering Committee launched an updated Incident Report Template in October 2023. Though this template has only been in use for a short while, we believe there are opportunities to further promote consistency and transparency in Incident Reporting. 


One enhancement idea was to include, for each section, a bulleted list of specific questions that would better guide responses, rather than the current free-form approach based on the section descriptions on CCADB.org (e.g., the Impact section). The Impact and Appendix sections are similar in that both aim to describe the size and nature of the incident, and it's possible an enhancement to one can benefit the other.


For example, the “Impact" section could be transformed…


From (current): The Impact section should contain a short description of the size and nature of the incident. For example: how many certificates, OCSP responses, or CRLs were affected; whether the affected objects share features (such as issuance time, signature algorithm, or validation type); and whether the CA Owner had to cease issuance during the incident.


To (illustrative): something like the following, which describes the fields and their expected responses…


  • Total number of pre-certificates: [if applicable, the total count of pre-certificates affected by the issue(s) described in this incident report, including expired and revoked pre-certificates]


  • Total number of certificates: [if applicable, the total count of "final" certificates affected by the issue(s) described in this incident report, including expired and revoked certificates]


  • Total number of "remaining valid" certificates: [if applicable, the total count of "final" certificates affected by the issue(s) described in this incident report, minus expired and revoked certificates. Minimally, this set of certificates MUST be disclosed in the Appendix section of this report.]


  • Incident heuristic: [if applicable, EITHER:
      (a) describe a heuristic that would allow a third party to assemble the full corpus of affected certificates, if not provided in the Appendix (e.g., "Any certificate containing policy OID 1.2.3.4.5.6 and issued between 11/13/2023 and 4/11/2024 is affected by this incident. Certificates that have been revoked or are expired are omitted from the certificate list disclosed in the Appendix."); or
      (b) clearly explain why this isn't possible (e.g., "This incident affected every certificate issued between 5/25/2023 and 6/15/2024 that relied upon BR Validation Method 3.2.2.4.19. Because the validation method relied upon is not described in a certificate, a heuristic cannot be used by a third party to assemble the full corpus of affected certificates. Certificates that have been revoked or are expired have been omitted from the certificate list disclosed in the Appendix."); or
      (c) disclose the full corpus of affected certificates in the Appendix.]


  • Was issuance stopped in response to this incident, and why or why not?: [yes/no - explanation (e.g., "Yes. As described in the incident timeline, we stopped issuance after learning of this issue to correct the corresponding certificate profile.")]
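To make the "incident heuristic" and count fields concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical certificate record (`CertRecord`) rather than any real CA database schema, and applies a heuristic of the form in example (a), using the illustrative policy OID 1.2.3.4.5.6 and a hypothetical date window. It derives the total affected count and the "remaining valid" subset that would minimally be disclosed in the Appendix.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical, simplified view of one issued (final) certificate."""
    serial: str
    policy_oids: frozenset
    issued: date
    not_after: date
    revoked: bool


def matches_heuristic(cert: CertRecord, oid: str, start: date, end: date) -> bool:
    """Example (a): carries the policy OID and was issued inside the window."""
    return oid in cert.policy_oids and start <= cert.issued <= end


def impact_summary(certs, oid, start, end, today):
    """Return (total affected, affected certs that are neither expired nor revoked)."""
    affected = [c for c in certs if matches_heuristic(c, oid, start, end)]
    remaining_valid = [c for c in affected if not c.revoked and c.not_after >= today]
    return len(affected), remaining_valid


OID = "1.2.3.4.5.6"  # illustrative OID from the template text
certs = [
    CertRecord("01", frozenset({OID}), date(2024, 1, 10), date(2024, 4, 9), False),  # expired
    CertRecord("02", frozenset({OID}), date(2024, 3, 1), date(2024, 5, 30), True),   # revoked
    CertRecord("03", frozenset({OID}), date(2024, 4, 1), date(2024, 6, 30), False),  # still valid
    CertRecord("04", frozenset({"2.23.140.1.2.1"}), date(2024, 4, 1), date(2024, 6, 30), False),
]
total, remaining = impact_summary(
    certs, OID, date(2023, 11, 13), date(2024, 4, 11), today=date(2024, 4, 11)
)
print(total, [c.serial for c in remaining])  # prints: 3 ['03']
```

The sketch counts final certificates only; counting precertificates instead, or alongside, would follow the same pattern.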


This is just an example, and we might need to consider incidents that don’t involve certificates more thoughtfully before it can be considered for adoption. What’s helpful to us, though, is that this proposed approach can describe the impact of an incident more consistently, while also possibly balancing our desire for completeness (i.e., satisfied by counts and a clear description of an incident heuristic) with practicality (only requiring disclosure of the "remaining valid" certificates in the Appendix). There might also be unexpected benefits from this approach; for example, the heuristic may make it easier for other CA Owners to evaluate whether they share the issue being reported.


We’re interested in your feedback, and that of other community members, on how this might help better define community expectations and further improve the incident reporting process.


Thanks,

Ryan



Aaron Gable

Apr 11, 2024, 4:50:28 PM
to Ryan Dickson, Chris Clements, Rob Stradling, CCADB Public
Honestly, I like this approach. I think that describing the whole affected population in the Impact section, while only providing the full certificate data in the Appendix for the still-valid certificates, is a good place to land. Specifically, I think that it is most useful for the Appendix to list exactly the set of certificates that will be (or should be) revoked as a result of the incident. This makes it easy for community members to verify that all listed certificates have in fact been revoked as promised, and to determine whether specific certificates identified by third parties are included in the list.

I think your illustrative language is aimed in exactly the right direction, and I especially like the idea of the "incident heuristic".

My two suggestions for refinement would be:
1) In the Impact section, don't bother distinguishing between precertificate and final certificates by default (since these numbers are usually nearly identical), but make a note that the CA definitely should list them separately if the incident affected precerts and final certs differently.
2) In the appendix, maybe add to the "preferred format" that we prefer crt.sh sha256 links to precertificates specifically. This removes any ambiguity, helps ensure that CAs don't forget to list certs for which precert issuance succeeded but final issuance failed, and means that CAs don't have to try to submit all the affected final certs to CT and then wait for crt.sh to ingest them as part of incident response.
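Suggestion 2 amounts to listing each affected precertificate by its SHA-256 fingerprint. A minimal Python sketch, assuming crt.sh's `?sha256=` lookup form and that the CA has the DER encoding of each affected precertificate at hand; the function names `crtsh_link` and `appendix_lines` are hypothetical:

```python
import hashlib

CRT_SH_BASE = "https://crt.sh/?sha256="  # assumed crt.sh lookup-by-fingerprint form


def crtsh_link(der_bytes: bytes) -> str:
    """Build a crt.sh URL from the SHA-256 fingerprint of a DER-encoded (pre)certificate."""
    fingerprint = hashlib.sha256(der_bytes).hexdigest().upper()
    return CRT_SH_BASE + fingerprint


def appendix_lines(precert_ders):
    """One link per affected precertificate, in a stable (sorted) order."""
    return sorted(crtsh_link(der) for der in precert_ders)
```

Because the fingerprint is computed over the precertificate itself, each link can be built without first logging the corresponding final certificate.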

Thanks,
Aaron

Andrew Ayer

unread,
Apr 15, 2024, 10:15:32 AMApr 15
to Aaron Gable, 'Aaron Gable' via CCADB Public, Chris Clements, Rob Stradling
Hi Aaron,

On Thu, 11 Apr 2024 10:33:30 -0700
"'Aaron Gable' via CCADB Public" <pub...@ccadb.org> wrote:

> Acquiring this fuller list would have significantly increased the time
> taken to conduct the investigation. Let's Encrypt prunes data about
> already-expired certificates from our easily-queryable database to
> prevent it from growing without bound, so the investigation would
> have had to start pulling in log data, which is a much slower process
> for both writing and executing the relevant queries. Would this
> additional investigation time, and correspondingly slower incident
> response and remediation, have been worthwhile?

When a CA claims that something is difficult, I think it's important to
gather as many details about the difficulty as possible, particularly
when it's being used as motivation for relaxing a requirement. So I
hope you can provide more details, and answer the following questions:

Are the challenges with acquiring a full list of affected certificates
applicable only to expired certificates, or also unexpired certificates?

What makes your database for expired certificates less easily-queryable?

Does it require additional staff time to query, or is it just a matter
of waiting for a query to complete?

How much longer would incident response and remediation take if you had
to query your last 2 years of expired and unexpired certificates, as
opposed to only unexpired certificates?

Regards,
Andrew

Andrew Ayer

Apr 15, 2024, 2:09:16 PM
to Ryan Dickson, 'Ryan Dickson' via CCADB Public, Aaron Gable, Chris Clements, Rob Stradling
Hi Ryan,

On Thu, 11 Apr 2024 16:11:00 -0400
"'Ryan Dickson' via CCADB Public" <pub...@ccadb.org> wrote:

> Total number of pre-certificates: [if applicable, the total count
> of pre-certificates affected by the issue(s) described in this
> incident report, including expired and revoked pre-certificates]
>
> Total number of certificates: [if applicable, the total count of
> "final" certificates affected by the issue(s) described in this
> incident report, including expired and revoked certificates]
>
> Total number of "remaining valid" certificates: [if applicable, the
> total count of "final" certificates affected by the issue(s) described in
> this incident report, minus expired and revoked certificates. Minimally,
> this set of certificates MUST be disclosed in the Appendix section of this
> report.]

I don't think it's a good idea to make a distinction between
precertificates and final certificates in incident reporting. Though
in rare cases a distinction makes sense (e.g. an encoding issue that
only appears in one or the other), in the vast majority of incidents,
certificates and precertificates are both equally good evidence of the
underlying non-compliant issuance event.

In particular, every precertificate implies the existence of a
corresponding final certificate whether the CA says they issued it
or not. Treating final certificates and precertificates as equivalent
during incident reporting reinforces this rather important facet of CT.
Treating them differently may give the impression that "precertificate
misissuance" is less bad than "certificate misissuance", a corrosive idea
that CAs have repeatedly tried to exploit.
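Treating precertificates and final certificates as evidence of the same issuance event can be sketched in Python by grouping on issuer and serial number. This is a minimal sketch under stated assumptions: the `LoggedCert` record is hypothetical, and it assumes the precertificate is signed directly by the issuing CA (no RFC 6962 Precertificate Signing Certificate), so that a precertificate and its final certificate share the same issuer name and serial number. The two OIDs are the RFC 6962 precertificate poison and embedded SCT list extensions.

```python
from collections import defaultdict
from dataclasses import dataclass

CT_POISON_OID = "1.3.6.1.4.1.11129.2.4.3"     # RFC 6962 precertificate poison extension
EMBEDDED_SCT_OID = "1.3.6.1.4.1.11129.2.4.2"  # RFC 6962 embedded SCT list extension


@dataclass(frozen=True)
class LoggedCert:
    """Hypothetical record for a CT entry (precertificate or final certificate)."""
    issuer: str
    serial: str
    extension_oids: frozenset

    @property
    def is_precertificate(self) -> bool:
        return CT_POISON_OID in self.extension_oids


def issuance_events(entries):
    """Group entries by (issuer, serial); each group is one issuance event.

    A group containing only a precertificate still counts, because the
    precertificate implies that a corresponding final certificate exists.
    """
    events = defaultdict(list)
    for entry in entries:
        events[(entry.issuer, entry.serial)].append(entry)
    return dict(events)


entries = [
    LoggedCert("Example CA", "01", frozenset({CT_POISON_OID})),
    LoggedCert("Example CA", "01", frozenset({EMBEDDED_SCT_OID})),  # final cert for 01
    LoggedCert("Example CA", "02", frozenset({CT_POISON_OID})),     # final cert not logged
]
print(len(issuance_events(entries)))  # prints: 2
```

Counting groups rather than individual entries keeps the affected-certificate total the same whichever of the pair a CA happens to have on hand.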

I'm also deeply uncomfortable with removing the requirement to disclose
all affected certificates (or their equivalent precertificates). I would
think that generating a list of affected certificates would be an easy
byproduct of the investigation that CAs should be conducting anyway.
This is particularly true if the CA is revoking the certificates, but
even if the certificates are already expired, the CA should still be
scanning their corpus to generate a count of affected certificates.

Removing the requirement to produce this byproduct would at best be
requiring third parties to duplicate work already done by the CA. At
worst, it would allow CAs to cut corners in their investigations (e.g.
by just guessing the number of affected certificates).

If there is a way to reduce the overhead of generating the list, that's
good to pursue (and it seems like allowing certificates and
precertificates to be used interchangeably would help), but CAs should
still be required to produce the list.

Regards,
Andrew

Aaron Gable

Apr 15, 2024, 3:00:59 PM
to Andrew Ayer, Ryan Dickson, 'Ryan Dickson' via CCADB Public, Chris Clements, Rob Stradling
On Mon, Apr 15, 2024 at 7:15 AM Andrew Ayer <ag...@andrewayer.name> wrote:
Are the challenges with acquiring a full list of affected certificates
applicable only to expired certificates, or also unexpired certificates?

Only to expired certificates. Let's Encrypt did provide the full data on all unexpired certificates in that incident report. All statements in my email above were with regards to going further than that to additionally provide data on certificates that were already untrusted in the WebPKI due to expiration.
 
What makes your database for expired certificates less easily-queryable?

We do not maintain a database of expired certificates. As I said, Let's Encrypt prunes data regarding long-since-expired certificates from the database to prevent it from growing without bound. Audit log data is of course retained for the period required by the BRs, but searching text logs stored on magnetic tape is much harder than querying structured databases.
 
Does it require additional staff time to query, or is it just a matter
of waiting for a query to complete?

Both. Writing, debugging, testing, and validating the scripts which perform custom searches across text data takes longer than writing database queries, and then executing those scripts against terabytes of logs takes longer than running database queries. 
 
How much longer would incident response and remediation take if you had
to query your last 2 years of expired and unexpired certificates, as
opposed to only unexpired certificates?

Based on our more recent incident, which did require going to tape to query logs covering the last ~2 years, I estimate that it would have added a week to the investigation.

Thanks,
Aaron

Dimitris Zacharopoulos (HARICA)

Apr 16, 2024, 12:24:31 AM
to Andrew Ayer, Ryan Dickson, 'Ryan Dickson' via CCADB Public, Aaron Gable, Chris Clements, Rob Stradling


On 15/4/2024 5:13 p.m., Andrew Ayer wrote:
In particular, every precertificate implies the existence of a
corresponding final certificate whether the CA says they issued it
or not.  Treating final certificates and precertificates as equivalent
during incident reporting reinforces this rather important facet of CT.
Treating them differently may give the impression that "precertificate
misissuance" is less bad than "certificate misissuance", a corrosive idea
that CAs have repeatedly tried to exploit.

I agree that disclosing a precertificate should be considered sufficient for public incident reports, and there should be no obligation to log the "final" certificate. Every "final" certificate must include SCTs of a logged precertificate, so everything needed for further investigation is already included in the trusted CT logs.

Dimitris.