2017.08.10 Let's Encrypt Unicode Normalization Compliance Incident


jo...@letsencrypt.org

Aug 10, 2017, 11:00:54 PM
to mozilla-dev-s...@lists.mozilla.org
At 11:30am PST on August 10, 2017, Let’s Encrypt was made aware of a compliance issue regarding Unicode normalization of domain names. On the same day, all unexpired non-compliant certificates were found and revoked, a fix was applied to our CA systems, and we communicated with our community. We consider the matter to be fully resolved at this point; please let us know if we missed anything.

We were notified by a community member that Let's Encrypt had issued a number of certificates containing punycode domain names which had not undergone compliant Unicode normalization. We confirmed that this was the case and began investigating our code and the relevant RFCs.

We noticed that the code we added to check the validity of punycode encodings in CSRs when we implemented support for IDNs didn't enforce any form of Unicode Normalization. We started developing a fix. After seeking outside insight into the issue and reading the relevant reference documents we came to the conclusion that Normalization Form KC was required. The BRs reference RFC 5280, which in turn references the encoding specified in RFC 3490 for IDNs, which requires Normalization Form KC. We finished our fix and deployed it to our CA at 5:20PM PST.

While developing the fix we searched our issuance databases for all unexpired certificates containing punycode DNS names and checked them for non-NFKC compliant names. We found 16, which are listed below. We revoked these certificates and notified the subscribers who requested them.
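
The search described above can be approximated with a short script. This is a minimal sketch and not Let's Encrypt's actual tooling: the function name `non_nfkc_labels` is hypothetical, it relies only on Python's standard `punycode` codec and `unicodedata` module, and it checks NFKC alone rather than the full Nameprep profile that RFC 3490 builds on. Malformed punycode would raise `UnicodeError`, which the sketch does not handle.

```python
import unicodedata
from typing import List

ACE_PREFIX = "xn--"  # prefix that marks a punycode-encoded (A-label) DNS label


def non_nfkc_labels(dns_name: str) -> List[str]:
    """Return the decoded Unicode labels of dns_name that are not in NFKC."""
    bad = []
    for label in dns_name.lower().split("."):
        if not label.startswith(ACE_PREFIX):
            continue  # plain ASCII label, nothing to decode
        # Raw punycode decode (RFC 3492). No Nameprep is applied here, so a
        # non-normalized label survives decoding and can be inspected.
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        if unicodedata.normalize("NFKC", decoded) != decoded:
            bad.append(decoded)
    return bad
```

For example, `non_nfkc_labels("xn--mnchen-3ya.example")` returns an empty list, because the decoded label "münchen" is already in NFKC. A label containing a compatibility character such as the "ﬁ" ligature (U+FB01) would be reported.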

I would like to thank the community members that discovered this issue, as well as the Let's Encrypt team that worked hard to resolve it quickly.

--
Josh Aas
Executive Director
Internet Security Research Group
Let's Encrypt: A Free, Automated, and Open CA

Serial numbers of affected and revoked certificates:

03d03877cbcec666b81340ed6a39c47556d1
03758d04a7816ba893847658e636a58e1f71
03ef82538ca2e54e97ae2b180ecb32f8cee4
044568f36455d8847713adb24c69e60bf123
033e73ebfd2f270bc6109925f1ed40edca8b
038295d240613cdb9367506c0d3cf8002401
03556fbc38b13ea3a9b7f4dd59dacc350293
030cfe12721e17ca02c095b4a0c5e60ca8da
03ca6617e634f2f5ad9224ca32ca4c835909
03bd090cfe0fbd07b4fc60df07bbc5770b35
0314017b4eab87bb0f211e9e2bb329ca4297
03f48a8c02c473ce971236b6407ad7d00d89
03bfa7b8f318a30a88894523ebd2717ea9b4
032d7c46b0a815faa41a1876fed4d66a9993
039f94badc798eea44f8c81ceb0515024871
038f81a32455e41b702ffb1732186be3a007

Alex Gaynor

Aug 11, 2017, 9:19:57 AM
to jo...@letsencrypt.org, mozilla-dev-s...@lists.mozilla.org
Hi Josh,

Given that these were all caught by cablint, has Let's Encrypt considered
integrating it into your issuance pipeline, or automatically monitoring
crt.sh (which runs cablint) for these issues so they don't need to be
caught manually by researchers?

Alex
> _______________________________________________
> dev-security-policy mailing list
> dev-secur...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-security-policy
>

Nick Lamb

Aug 11, 2017, 11:40:34 AM
to mozilla-dev-s...@lists.mozilla.org
On Friday, 11 August 2017 14:19:57 UTC+1, Alex Gaynor wrote:
> Given that these were all caught by cablint, has Let's Encrypt considered
> integrating it into your issuance pipeline, or automatically monitoring
> crt.sh (which runs cablint) for these issues so they don't need to be
> caught manually by researchers?

The former has the risk of being unexpectedly fragile, the latter puts a bunch of work on crt.sh which (even if Rob says it's OK) is a bit out of order.

I would suggest that Let's Encrypt could automatically run cablint on some fraction of just issued certificates (where "some fraction" might be 100% if they've got the resources) and summarise the output for internal review. They're obliged to keep all the certificates they've just issued online by their own system design (ACME offers to deliver the certificate again if you ask for it) anyway.

This way: If cablint breaks, or won't complete in a timely fashion during high volume issuance, it doesn't break the CA itself. But on the other hand it also doesn't wail on Comodo's generously offered public service crt.sh.
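
The out-of-band approach suggested here could look roughly like the following. It is a sketch under stated assumptions: `lint_sample` and the `lint` callable are hypothetical stand-ins (a real deployment would wrap cablint or a similar tool), and certificates are treated as opaque DER bytes. The point it illustrates is that a linter failure is recorded as a finding instead of propagating to the caller.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def lint_sample(
    certs: Iterable[bytes],
    lint: Callable[[bytes], List[str]],
    fraction: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[bytes, List[str]]]:
    """Lint a random fraction of already-issued certificates.

    Linter crashes are recorded as findings instead of being raised, so a
    broken linter cannot interrupt whatever calls this function.
    """
    rng = rng or random.Random()
    report = []
    for der in certs:
        # random() is in [0, 1), so fraction=1.0 selects every certificate.
        if rng.random() >= fraction:
            continue  # not selected for this run
        try:
            findings = lint(der)  # hypothetical linter wrapper
        except Exception as exc:
            findings = ["linter failed: %r" % (exc,)]
        if findings:
            report.append((der, findings))
    return report
```

Because this runs after issuance, it trades prevention for isolation: problems are found after the fact, but the signing path never waits on the linter.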

Also, while on the subject I commend to any researchers at least as interested in the contents of the CT logs as myself the building of an actual Log Monitor of their own rather than relying on crt.sh. This is for several reasons, off the top of my head:

1. The Logs have anti-tamper features, but if only Comodo (and Google) look at the Logs themselves then we miss out on much real benefit from those features because we will never actually detect any tampering, we'd have to be told about it by someone we trust.

2. The Logs are obliged to achieve reasonable performance to hit Google's targets and will accordingly have been built to be robust, Rob has put lots of effort into crt.sh but it definitely has... off days. With my own monitor at least I can fix it when that happens.

3. You can custom design your Monitor around your interests. One completely reasonable thing to do, for example, would be to throw away all the DV certs if you are focused on Organizational details in certificates, another might be to discard all the certs from the bigger Issuers to focus only on what's going on with small CAs and sub-CAs that issue smaller volumes and thus might escape notice with anything they've got wrong.

Ryan Sleevi

Aug 11, 2017, 11:49:29 AM
to Nick Lamb, mozilla-dev-security-policy
On Fri, Aug 11, 2017 at 11:40 AM, Nick Lamb via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> On Friday, 11 August 2017 14:19:57 UTC+1, Alex Gaynor wrote:
> > Given that these were all caught by cablint, has Let's Encrypt considered
> > integrating it into your issuance pipeline, or automatically monitoring
> > crt.sh (which runs cablint) for these issues so they don't need to be
> > caught manually by researchers?
>
> The former has the risk of being unexpectedly fragile,


Could you expand on this? It's not obvious what you mean.


> This way: If cablint breaks, or won't complete in a timely fashion during
> high volume issuance, it doesn't break the CA itself. But on the other hand
> it also doesn't wail on Comodo's generously offered public service crt.sh.
>

Could you expand on what you mean by "cablint breaks" or "won't complete in
a timely fashion"? That doesn't match my understanding of what it is or how
it's written, so perhaps I'm misunderstanding what you're proposing?

Kurt Roeckx

Aug 11, 2017, 12:32:53 PM
to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy
On Fri, Aug 11, 2017 at 11:48:50AM -0400, Ryan Sleevi via dev-security-policy wrote:
> On Fri, Aug 11, 2017 at 11:40 AM, Nick Lamb via dev-security-policy <
> dev-secur...@lists.mozilla.org> wrote:
>
> > On Friday, 11 August 2017 14:19:57 UTC+1, Alex Gaynor wrote:
> > > Given that these were all caught by cablint, has Let's Encrypt considered
> > > integrating it into your issuance pipeline, or automatically monitoring
> > > crt.sh (which runs cablint) for these issues so they don't need to be
> > > caught manually by researchers?
> >
> > The former has the risk of being unexpectedly fragile,
>
>
> Could you expand on this? It's not obvious what you mean.
>
>
> > This way: If cablint breaks, or won't complete in a timely fashion during
> > high volume issuance, it doesn't break the CA itself. But on the other hand
> > it also doesn't wail on Comodo's generously offered public service crt.sh.
> >
>
> Could you expand on what you mean by "cablint breaks" or "won't complete in
> a timely fashion"? That doesn't match my understanding of what it is or how
> it's written, so perhaps I'm misunderstanding what you're proposing?

My understanding is that it used to be very slow for crt.sh, but
that something was done to speed it up. I don't know if that change
was something crt.sh specific. I think it was changed to not
always restart, but have a process that checks multiple
certificates.


Kurt

Nick Lamb

Aug 11, 2017, 1:22:27 PM
to mozilla-dev-s...@lists.mozilla.org
On Friday, 11 August 2017 16:49:29 UTC+1, Ryan Sleevi wrote:
> Could you expand on this? It's not obvious what you mean.

I guess I was unclear. My concern was that one obvious way to approach this is to set things up so that after the certificate is signed, Boulder runs cablint, and if it finds anything wrong with that signed certificate the issuance fails, no certificate is delivered to the applicant and it's flagged to Let's Encrypt administrators as a problem.

[ Let's Encrypt doesn't do CT pre-logging, or at least it certainly didn't when I last looked, so this practice would leave no trace of the problematic cert ]

In that case any bug in certlint (which is certainly conceivable) breaks the entire issuance pipeline for Let's Encrypt, which is what my employer would call a "Severity 1 issue", ie now people need to get woken up and fix it immediately. That seems like it makes Let's Encrypt more fragile.

> Could you expand on what you mean by "cablint breaks" or "won't complete in
> a timely fashion"? That doesn't match my understanding of what it is or how
> it's written, so perhaps I'm misunderstanding what you're proposing?

As I understand it, cablint is software, and software can break or be too slow. If miraculously cablint is never able to break or be too slow then I take that back, although as a programmer I would be interested to learn how that's done.

Ryan Sleevi

Aug 11, 2017, 1:33:40 PM
to Nick Lamb, mozilla-dev-security-policy
On Fri, Aug 11, 2017 at 1:22 PM, Nick Lamb via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> On Friday, 11 August 2017 16:49:29 UTC+1, Ryan Sleevi wrote:
> > Could you expand on this? It's not obvious what you mean.
>
> I guess I was unclear. My concern was that one obvious way to approach
> this is to set things up so that after the certificate is signed, Boulder
> runs cablint, and if it finds anything wrong with that signed certificate
> the issuance fails, no certificate is delivered to the applicant and it's
> flagged to Let's Encrypt administrators as a problem.
>
> [ Let's Encrypt doesn't do CT pre-logging, or at least it certainly didn't
> when I last looked, so this practice would leave no trace of the
> problematic cert ]
>
> In that case any bug in certlint (which is certainly conceivable) breaks
> the entire issuance pipeline for Let's Encrypt, which is what my employer
> would call a "Severity 1 issue", ie now people need to get woken up and
> fix it immediately. That seems like it makes Let's Encrypt more fragile.
>

I'm not sure this is a particularly compelling argument. By this logic, the
most reliable thing a CA can or should do is sign anything and everything
that comes from applicants, since any form of check or control is a
potentially frail operation that may fail.


> > Could you expand on what you mean by "cablint breaks" or "won't complete
> in
> > a timely fashion"? That doesn't match my understanding of what it is or
> how
> > it's written, so perhaps I'm misunderstanding what you're proposing?
>
> As I understand it, cablint is software, and software can break or be too
> slow. If miraculously cablint is never able to break or be too slow then I
> take that back, although as a programmer I would be interested to learn how
> that's done.


But that's an argument that applies to any change, particularly any
positive change, so it does not appear as a valid argument _against_
cablint.

That is, you haven't elaborated any concern that's _specific_ to
certlint/cablint, merely an abstract argument against change or process of
any form. And while I can understand that argument in the abstract -
certainly, every change introduces some degree of risk - we have plenty of
tools to manage and mitigate risk (code review, analysis, integration
tests, etc). Similarly, we can also assume that this is not a steady-state
of issues (that is, it is not to be expected that every week there will be
an 'outage' of the code), since, as code, it can and is continually fixed
and updated.

Since your argument applies to any form of measurement or checking for
requirements - including integrating checks directly into Boulder (for
example, as Let's Encrypt has done, through its dependency on ICU / IDNA
tables) - I'm not sure it's an argument against these checks and changes. I
was hoping you had more specific concerns, but it seems they're generic,
and as such, it still stands out as a good idea to integrate such tools
(and, arguably, prior to signing, as all CAs should do - by executing over
the tbsCertificate). An outage, while unlikely, should be managed like all
risk to the system, as the risk of misissuance (without checks) is arguably
greater than the risk of disruption (with checks)

Matthew Hardeman

Aug 11, 2017, 5:20:47 PM
to mozilla-dev-s...@lists.mozilla.org
I see both sides on this matter.

On the one hand, certlint/cablint catches lots of obvious problems, mostly with ridiculous certificate profiles or manual special purpose issuances. Certainly, there's a lot of bad issuance that having it in the blocking path might help with...

but...

If one integrates a project like certlint/cablint into the cert issuance pipeline, one suddenly takes on supplemental responsibility for certlint's bugs or changes.

The pace of change in certlint, just glancing at the git commits, is not slow. New tests are added. Tests are revised.

I imagine from a security perspective, it would be possible to have a locked-down system on which the to-be-signed cert is checked by certlint, with the code that incorporates certlint receiving back only a "don't issue because XYZ" signal and then halting issuance. But still, it's another system (or at a minimum another VM), a locked-down communication path between the rest of the cert issuance chain and this element, etc.

Even still... anywhere along the way, Mr. Bowen could go totally evil (I seriously doubt this would happen) and decide that certlint should flag "E: This CA is run by nasty people" anytime Issuer CN contains 'Maligned CA X1'.

The vast majority of a DV certificate's contents are fully generated by the CA, with no actual input from the cert requestor. In fact, essentially only a few small booleans (want OCSP must-staple?) and the subject DNS names make it into the certificate. Indeed, it was in the subject DNS names that this issue arose.

Having said that, it occurs to me that if a CA put an external and unaudited tool in the issuance pipeline and it caused no problems, the best case is that it catches something correctly and never fails to catch something which does not conform. Still, let's say something clearly not BR compliant made it past certlint. I find it hard to believe that anyone is going to give the implementing CA much credit or benefit for having had certlint in place if it failed to catch a problem and that certificate issued.

Even if the community and root programs granted more than the default grain of grace to the implementing CA in the face of a misissuance missed by the tool, would that grain of grace still be granted if the CA was nearly 4 weeks behind the upstream's HEAD and it can be shown at time of issuance that HEAD would have caught the issue? It seems reasonable to me that an implementing CA might want to add some buffer between the initial commit/merge and their opportunity to perform some manual review of the changes prior to incorporating into their issuance environment.

On the other hand, if it erroneously, accidentally, or maliciously prevented issuance of a compliant certificate or certificates, that definitely has a penalty for the CA in terms of performance as well as overall reliability.

This is especially true in rapidly evolving projects.

I note the most recent commit for certlint was mere days ago and was described as "Fix memory leak". It was probably a small matter, but the rate of change in that tool suggests that to get full advantage, there would be a continuous integration project.

How often would you need to pull in the upstream to get the community's grace upon something that makes it past certlint? How often would the CA need to do manual code review to make sure that someone hasn't managed to commit a routine that:

Fails as a hard error upon the following criteria:

1. Take a random number between 1 and 200 (ultimately 0.5% odds). If 1, then continue with the following bogus test steps, else continue normal path.
2. Check to see if certificate issuer data would appear to map to a real, trusted CA, and if the validity period aligns with now, as with a real certificate about to be issued. If so, fail on "E: Requestor has teh gay."

Someone manages to get that into GitHub, you pull it in periodically but without sufficient scrutiny, run it through integration tests, and put it into production. Those rules were careful enough that static test cases would likely pass every time and yet in production, at a high volume CA, lots of customers get delays or no certificate AND (if someone were actually ballsy enough to just pass along whatever error message arose) think the CA must be homophobic.

As an aside, has anyone put much thought into how much damage one privileged individual at any of [ GitHub, Ruby Gems, or Sonatype ] could cause?

I certainly understand a DevOps philosophy, but I also don't fully understand how you reconcile the particular benefits and risks of rapid and continuous integration with the needs of a highly regulated environment.

Eric Mill

Aug 12, 2017, 10:52:51 PM
to Matthew Hardeman, mozilla-dev-s...@lists.mozilla.org
On Fri, Aug 11, 2017 at 5:20 PM, Matthew Hardeman via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> If one integrates a project like certlint/cablint into the cert issuance
> pipeline, one suddenly takes on supplemental responsibility for certlint's
> bugs or changes.
>

That's the case for any source code Let's Encrypt uses that was written by
someone else. Like all software, there are third party dependencies in
there somewhere, whether closed source or open source. (In Let's Encrypt's
case, they are generally open source, which helps the team's ability to
review it.)


> The pace of change in certlint, just glancing at the git commits, is not
> slow. New tests are added. Tests are revised.
>

That's a good thing.

> Even still... anywhere along the way, Mr. Bowen could go totally evil (I
> seriously doubt this would happen) and decide that certlint should flag "E:
> This CA is run by nasty people" anytime Issuer CN contains 'Maligned CA X1'.
>

You seem to be assuming that Let's Encrypt would just automatically pull
down new code into its critical issuance code path without review. I would
definitely not assume that. That code will be reviewed before deployment.


> It seems reasonable to me that an implementing CA might want to add some
> buffer between the initial commit/merge and their opportunity to perform
> some manual review of the changes prior to incorporating into their
> issuance environment.
>

Yes, of course.

This is a lot of time to spend discussing the basics of project dependency
management. There are definitely tradeoffs for Let's Encrypt to evaluate
when considering something like integrating certlint into the issuance
pipeline -- performance of the certlint tool, potential memory leaks, as
well as the operations necessary to support a hosted service that keeps
certlint in memory for rapid processing.

If certlint proves to be too slow or take too much memory, then an
integration push could either cause those issues to be fixed, or cause a
new tool to be written that performs the same checks certlint does (now
that the work has been done to map and isolate the BRs into specific
technical checks).

We should be understanding if engineering tradeoffs preclude immediate
integration, but we should not dismiss the idea of relying on "someone
else's code" in the issuance pipeline. I'm sure that's already the case for
every CA in operation today.

-- Eric





--
konklone.com | @konklone <https://twitter.com/konklone>

Matt Palmer

Aug 13, 2017, 9:00:06 PM
to dev-secur...@lists.mozilla.org
On Fri, Aug 11, 2017 at 06:32:11PM +0200, Kurt Roeckx via dev-security-policy wrote:
> On Fri, Aug 11, 2017 at 11:48:50AM -0400, Ryan Sleevi via dev-security-policy wrote:
> > On Fri, Aug 11, 2017 at 11:40 AM, Nick Lamb via dev-security-policy <
> > dev-secur...@lists.mozilla.org> wrote:
> >
> > > On Friday, 11 August 2017 14:19:57 UTC+1, Alex Gaynor wrote:
> > > > Given that these were all caught by cablint, has Let's Encrypt considered
> > > > integrating it into your issuance pipeline, or automatically monitoring
> > > > crt.sh (which runs cablint) for these issues so they don't need to be
> > > > caught manually by researchers?
> > >
> > > The former has the risk of being unexpectedly fragile,
> >
> >
> > Could you expand on this? It's not obvious what you mean.
> >
> >
> > > This way: If cablint breaks, or won't complete in a timely fashion during
> > > high volume issuance, it doesn't break the CA itself. But on the other hand
> > > it also doesn't wail on Comodo's generously offered public service crt.sh.
> > >
> >
> > Could you expand on what you mean by "cablint breaks" or "won't complete in
> > a timely fashion"? That doesn't match my understanding of what it is or how
> > it's written, so perhaps I'm misunderstanding what you're proposing?
>
> My understanding is that it used to be very slow for crt.sh, but
> that something was done to speed it up. I don't know if that change
> was something crt.sh specific. I think it was changed to not
> always restart, but have a process that checks multiple
> certificates.

I suspect you're referring to the problem of certlint calling out to an
external program to do ASN.1 validation, which was fixed in
https://github.com/awslabs/certlint/pull/38. I believe the feedback from
Rob was that it did, indeed, do Very Good Things to certlint performance.

- Matt

Peter Bowen

Aug 13, 2017, 11:04:40 PM
to Matt Palmer, dev-secur...@lists.mozilla.org
I just benchmarked the current cablint code, using 2000 certs from CT
as a sample. On a single thread of an Intel(R) Xeon(R) CPU E5-2670 v2
@ 2.50GHz, it processes 394.5 certificates per second. This is 2.53ms
per certificate or 1.4 million certificates per hour.
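
The derived figures follow directly from the measured throughput. The short check below only redoes that arithmetic; the variable names are illustrative and the single input is the 394.5 certificates per second reported above.

```python
certs_per_second = 394.5  # throughput reported in the benchmark above

ms_per_cert = 1000.0 / certs_per_second   # milliseconds spent per certificate
certs_per_hour = certs_per_second * 3600  # single-thread hourly throughput

print(f"{ms_per_cert:.2f} ms per certificate")                       # 2.53 ms per certificate
print(f"{certs_per_hour / 1e6:.2f} million certificates per hour")  # 1.42 million certificates per hour
```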

Thank you Matt for that patch! This was a _massive_ improvement over
the old design.

Thanks,
Peter

Rob Stradling

Aug 14, 2017, 4:22:38 PM
to dev-secur...@lists.mozilla.org
On 11/08/17 16:40, Nick Lamb via dev-security-policy wrote:
> On Friday, 11 August 2017 14:19:57 UTC+1, Alex Gaynor wrote:
>> Given that these were all caught by cablint, has Let's Encrypt considered
>> integrating it into your issuance pipeline, or automatically monitoring
>> crt.sh (which runs cablint) for these issues so they don't need to be
>> caught manually by researchers?
>
> The former has the risk of being unexpectedly fragile, the latter puts a bunch of work on crt.sh which (even if Rob says it's OK) is a bit out of order.
>
> I would suggest that Let's Encrypt could automatically run cablint on some fraction of just issued certificates (where "some fraction" might be 100% if they've got the resources) and summarise the output for internal review. They're obliged to keep all the certificates they've just issued online by their own system design (ACME offers to deliver the certificate again if you ask for it) anyway.
>
> This way: If cablint breaks, or won't complete in a timely fashion during high volume issuance, it doesn't break the CA itself. But on the other hand it also doesn't wail on Comodo's generously offered public service crt.sh.
>
> Also, while on the subject I commend to any researchers at least as interested in the contents of the CT logs as myself the building of an actual Log Monitor of their own rather than relying on crt.sh. This is for several reasons, off the top of my head:
>
> 1. The Logs have anti-tamper features, but if only Comodo (and Google) look at the Logs themselves then we miss out on much real benefit from those features because we will never actually detect any tampering, we'd have to be told about it by someone we trust.

+1. I trust me, but it bothers me that everyone else has to trust me
too. :-)

Also, crt.sh doesn't yet verify the Merkle Tree stuff when fetching
entries from the logs.
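
For readers building their own monitor, the core of "the Merkle Tree stuff" is recomputing the tree root over the fetched entries and comparing it with the root hash in the log's signed tree head. The following is a minimal sketch of the Merkle Tree Hash as defined in RFC 6962 section 2.1, using SHA-256. The function name `merkle_tree_hash` is an assumption for illustration; the inputs are expected to be the serialized leaf structures in log order. Verifying the tree head's signature and checking consistency proofs between tree heads are separate steps not shown here.

```python
import hashlib
from typing import List


def merkle_tree_hash(leaves: List[bytes]) -> bytes:
    """Merkle Tree Hash of an ordered list of leaf inputs (RFC 6962, section 2.1)."""
    n = len(leaves)
    if n == 0:
        # The hash of an empty tree is the hash of the empty string.
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hashes are domain-separated with a 0x00 prefix.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_tree_hash(leaves[:k])
    right = merkle_tree_hash(leaves[k:])
    # Interior nodes are domain-separated with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()
```

The recursive form mirrors the RFC's definition and is fine for illustration; a monitor handling a large log would more likely compute the root incrementally as entries arrive.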

> 2. The Logs are obliged to achieve reasonable performance to hit Google's targets and will accordingly have been built to be robust, Rob has put lots of effort into crt.sh but it definitely has... off days.

Such as today, during which Comodo has been the target of a DoS attack
that's affected many of our services (including crt.sh and our CT logs
:-( ).

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online