2018.05.18 Let's Encrypt CAA tag value case sensitivity incident

jo...@letsencrypt.org

May 18, 2018, 1:00:31 PM

to mozilla-dev-s...@lists.mozilla.org

At 12:45 UTC we received a report at our cert-pro...@letsencrypt.org contact address that Let’s Encrypt was improperly handling CAA records with mixed-case tags, resulting in misissuance under the Baseline Requirements. Thanks to Corey Bonnell of Trustwave for the report.

RFC 6844 Section 5.1[0] says: “Matching of tag values is case insensitive.” Let’s Encrypt’s CAA implementation, however, compared CAA tags case-sensitively[1]. As a result, CAA tags that were not entirely lowercase were ignored during CAA validation.
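
To illustrate the effect of the bug, here is a minimal Python sketch. The function names are hypothetical and this is not Boulder's code (Boulder is written in Go; the affected code is linked at [1]). It deliberately simplifies CAA evaluation: it looks only at the "issue" property and ignores "issuewild", the critical flag, and parameter validation.

```python
# Hypothetical sketch: names are illustrative only, not taken from Boulder.

def issuer_domains(records, fold_case):
    """Collect issuer domains from "issue" properties.

    records: list of (tag, value) pairs as received from DNS.
    fold_case: if False, reproduce the buggy case-sensitive comparison.
    """
    domains = []
    for tag, value in records:
        key = tag.lower() if fold_case else tag
        if key == "issue":
            # Drop any ";"-separated parameters and keep the issuer domain.
            domains.append(value.split(";")[0].strip())
    return domains


def may_issue(records, ca_domain, fold_case=True):
    domains = issuer_domains(records, fold_case)
    if not domains:
        # No "issue" property was recognized, so CAA imposes no restriction.
        return True
    return ca_domain in domains


records = [("Issue", "ca.example.net")]
print(may_issue(records, "letsencrypt.org", fold_case=False))  # True (the bug)
print(may_issue(records, "letsencrypt.org", fold_case=True))   # False
```

With case-sensitive matching, the "Issue" record is skipped entirely, so the record set looks unrestricted and issuance is wrongly allowed.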

The problem was quickly confirmed, and a fix, with tests, was developed and reviewed[2] by 13:28 UTC, under an hour after the initial report. We deployed the fix to our staging environment at 14:37 UTC. We disabled issuance of new certificates in our production environment at 14:45 UTC to prevent further misissuance, and deployed the fix to our production environment at 15:20 UTC.
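
As a quick arithmetic check on the timeline above, the elapsed times can be computed directly. The event labels are ad hoc; the times are the ones stated in the report (UTC, May 18, 2018).

```python
from datetime import datetime

# Times (UTC, May 18, 2018) as given in the incident report; labels are ad hoc.
TIMELINE = {
    "report": "12:45",
    "fix_reviewed": "13:28",
    "staging": "14:37",
    "issuance_paused": "14:45",
    "production_fix": "15:20",
}


def minutes_between(start, end):
    """Whole minutes elapsed between two timeline events on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)
    return int(delta.total_seconds() // 60)


print(minutes_between("report", "fix_reviewed"))             # 43
print(minutes_between("issuance_paused", "production_fix"))  # 35
print(minutes_between("report", "production_fix"))           # 155
```

So the reviewed fix was ready 43 minutes after the report, and production issuance was paused for 35 minutes.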

Our logging of processed CAA records does not preserve the case information we need to determine whether other issuances were affected by this bug. To start with, we plan to carry out these two post-incident remediation items:

1. Improving the CAA validation logging in Boulder[3] to log CAA records prior to our processing.

2. Performing a scan of current CAA records for the domain names we have issued for in the past 90 days, specifically looking for tags in CAA records with non-lowercase characters. We’ll examine such instances on a case-by-case basis to determine the appropriate action.
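
Remediation item 2 amounts to a filter over freshly queried records. A minimal sketch, assuming the scan produces (domain, tag, value) tuples; the input shape and function name are hypothetical:

```python
def find_non_lowercase_tags(caa_records):
    """Return (domain, tag) pairs whose CAA tag contains uppercase letters.

    caa_records: iterable of (domain, tag, value) tuples, for example the
    result of re-querying CAA for recently issued names (hypothetical input).
    """
    flagged = []
    for domain, tag, _value in caa_records:
        # Tags are ASCII letters and digits, so any change under lower()
        # means the tag has at least one uppercase letter.
        if tag != tag.lower():
            flagged.append((domain, tag))
    return flagged


scan = [
    ("a.example", "issue", "ca.example.net"),
    ("b.example", "ISSUE", "ca.example.net"),
]
print(find_non_lowercase_tags(scan))  # [('b.example', 'ISSUE')]
```

Each flagged domain would then go to the case-by-case review described above.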

The original reporter identified one certificate (https://crt.sh/?id=469407542) that was issued in violation of the CAA RFC as part of their testing. We revoked this certificate at 15:41 UTC.

[0] - https://tools.ietf.org/html/rfc6844#section-5.1

[1] - https://github.com/letsencrypt/boulder/blob/9990d14654661736a6ee6dc1520f605d0896c72d/va/caa.go#L82-L100

[2] - https://github.com/letsencrypt/boulder/pull/3722

[3] - https://github.com/letsencrypt/boulder/issues/3724

Jonathan Rudenberg

May 18, 2018, 1:45:54 PM

to jo...@letsencrypt.org, mozilla-dev-s...@lists.mozilla.org

On Fri, May 18, 2018, at 13:00, josh--- via dev-security-policy wrote:
> 2. Performing a scan of current CAA records for the domain names we have
> issued for in the past 90 days, specifically looking for tags in CAA
> records with non-lowercase characters. We’ll examine such instances on a
> case-by-case basis to determine the appropriate action.

Do you log the full CAA record set (if any) for each authorized domain? If so, can you use those instead of the "current" records to find potentially unauthorized issuance?

Jonathan Rudenberg

May 18, 2018, 1:49:00 PM

to jo...@letsencrypt.org, mozilla-dev-s...@lists.mozilla.org

Oops, I missed item 1, disregard :)

> _______________________________________________
> dev-security-policy mailing list
> dev-secur...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-security-policy

Tim Hollebeek

May 18, 2018, 1:52:25 PM

to jo...@letsencrypt.org, mozilla-dev-s...@lists.mozilla.org

> Our logging of the CAA records processed does not provide the case
> information we need to determine whether other issuances were affected by
> this bug.

We put a requirement in the BRs specifically so this problem could not occur:

"The CA SHALL log all actions taken, if any, consistent with its processing
practice."

-Tim

jacob.hoff...@gmail.com

May 18, 2018, 2:04:46 PM

to mozilla-dev-s...@lists.mozilla.org

To be clear, we do log every CAA lookup (https://github.com/letsencrypt/boulder/blob/master/va/caa.go#L47). However, we do it at too high a level of abstraction: It doesn't contain the unprocessed return values from DNS. We plan to improve that as part of our remediation.
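
A minimal sketch of what logging "the unprocessed return values from DNS" could look like: keep each CAA RDATA as raw bytes (hex) next to the parsed view, so the tag's original case survives. The field names and the single-JSON-line format are assumptions; only the RDATA layout (flags octet, tag-length octet, tag, value) comes from RFC 6844 Section 5.1.

```python
import json


def parse_caa_rdata(rdata: bytes) -> dict:
    """Parse CAA RDATA as laid out in RFC 6844 Section 5.1:
    1 octet flags, 1 octet tag length, the tag, then the value."""
    if len(rdata) < 2:
        raise ValueError("CAA RDATA too short")
    flags, tag_len = rdata[0], rdata[1]
    if tag_len == 0 or 2 + tag_len > len(rdata):
        raise ValueError("invalid CAA tag length")
    tag = rdata[2:2 + tag_len].decode("ascii", errors="replace")
    value = rdata[2 + tag_len:].decode("ascii", errors="replace")
    return {"flags": flags, "tag": tag, "value": value}


def caa_log_entry(validation_id: str, qname: str, raw_rdatas: list) -> str:
    """One log line holding the unprocessed RDATA (hex) next to the parsed view."""
    return json.dumps(
        {
            "validation_id": validation_id,  # hypothetical field names
            "qname": qname,
            "raw_rdata_hex": [r.hex() for r in raw_rdatas],
            "parsed": [parse_caa_rdata(r) for r in raw_rdatas],
        },
        sort_keys=True,
    )


raw = bytes([0, 5]) + b"Issue" + b"ca.example.net"
print(caa_log_entry("v-123", "example.com", [raw]))
```

Because the raw bytes are kept, a later fix to the parsing logic can be replayed against the log instead of against today's DNS.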

Our ideal would be to log all DNS traffic associated with each issuance, including A, AAAA, TXT, and CAA lookups. We initially experimented with this by capturing the full verbose output from our recursive resolver, but concluded that it was not usable for investigations because it was not possible to associate specific query/response pairs with the validation request that caused them (for instance, consider NS referrals, CNAME indirection, and caching). I think this is definitely an area of improvement we could pursue in the DNS ecosystem that would be particularly beneficial for CAs.
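
One partial approach to the association problem is to record, per validation request, every lookup the CA's own code makes. A minimal sketch with hypothetical names; `resolve` stands in for whatever stub-resolver call the CA uses. Note that this captures only the questions the CA asks and the answers it receives. It does not see the recursive resolver's internal NS referrals, CNAME chasing, or cache hits, which is exactly the gap described above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DNSExchange:
    qname: str
    qtype: str
    raw_response: bytes


@dataclass
class ValidationRecorder:
    """Records every lookup made on behalf of one validation request."""
    validation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    exchanges: List[DNSExchange] = field(default_factory=list)

    def wrap(self, resolve: Callable[[str, str], bytes]) -> Callable[[str, str], bytes]:
        def recorded(qname: str, qtype: str) -> bytes:
            raw = resolve(qname, qtype)
            # Keep the response exactly as returned, tagged with this validation.
            self.exchanges.append(DNSExchange(qname, qtype, raw))
            return raw
        return recorded
```

Usage: create one `ValidationRecorder` per validation, pass `recorder.wrap(resolve)` to the CAA and challenge code, and persist `recorder.exchanges` under `recorder.validation_id`.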

Tim Hollebeek

May 21, 2018, 6:56:52 AM

to jacob.hoff...@gmail.com, mozilla-dev-s...@lists.mozilla.org

Ok. My biggest concern is not you guys, who are pretty security conscious,
but whether we need to improve the language to make it more clear that the
logging has to be sufficient so that in the event of a bug in the CAA logic,
it is possible to determine which issued certificates are affected and how.

It may be hard to come up with such language, but that was the intent of the
current language, and if it fails to adequately express that and needs
improvement, we should consider improvements.

-Tim


Nick Lamb

May 21, 2018, 9:07:19 AM

to mozilla-dev-s...@lists.mozilla.org, jacob.hoff...@gmail.com

As a lowly relying party, I have to say I'd expect better here.

In particular, if example.com says their DNSSEC signed CAA forbade Let's Encrypt from issuing, and Let's Encrypt says otherwise, I absolutely would expect Let's Encrypt to produce DNSSEC signed RRs that match up to their story. The smoking gun for such scenarios exists, and CAs are, or should be, under no illusions that it's their job to produce it.

A log entry that says "CAA: check OK" is worthless for exactly the reason this thread exists. Record the RRs themselves, byte for byte.

We've seen banks taking this sort of shortcut in the past and it did them no favour with me. I want to see the EMV transaction signature that proves a correct PIN was used, not a blurry print from some mainframe with an annotation that says "A 4 in this column indicates PIN confirmed".

Ryan Sleevi

May 21, 2018, 9:59:42 AM

to Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

On Mon, May 21, 2018 at 9:06 AM, Nick Lamb via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> As a lowly relying party, I have to say I'd expect better here.
>
> In particular, if example.com says their DNSSEC signed CAA forbade Let's
> Encrypt from issuing, and Let's Encrypt says otherwise, I absolutely would
> expect Let's Encrypt to produce DNSSEC signed RRs that match up to their
> story. The smoking gun for such scenarios exists, and CAs are, or should
> be, under no illusions that it's their job to produce it.
>

Given the TTLs and the key sizes in use on DNSSEC records, why do you
believe this?

Matthew Hardeman

May 21, 2018, 11:23:07 AM

to Nick Lamb, mozilla-dev-security-policy, jacob.hoff...@gmail.com

I concur with Mr. Lamb's position.

I agree not only with respect to DNSSEC signatures but with respect to the
entire query and RR set upon which the CA's decisions relied.

I do acknowledge the challenge that Mr. Hoffman-Andrews surfaced: that it
may involve significant effort to correlate the various queries and
responses which underpin the higher level queries that the CA software
makes to their recursive resolver.


Eric Mill

May 21, 2018, 6:00:12 PM

to Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

On Mon, May 21, 2018, 9:07 AM Nick Lamb via dev-security-policy <

dev-secur...@lists.mozilla.org> wrote:

> As a lowly relying party, I have to say I'd expect better here.
>
> In particular, if example.com says their DNSSEC signed CAA forbade Let's
> Encrypt from issuing, and Let's Encrypt says otherwise, I absolutely would
> expect Let's Encrypt to produce DNSSEC signed RRs that match up to their
> story. The smoking gun for such scenarios exists, and CAs are, or should
> be, under no illusions that it's their job to produce it.
>
> A log entry that says "CAA: check OK" is worthless for exactly the reason
> this thread exists, record the RRs themselves, byte for byte.
>

FWIW, I don't think Let's Encrypt was saying that they just store "OK", but
that they store a parsed representation of the record rather than the raw
RR.

I understand you're asking for the raw RR to be stored, and that seems like
a reasonable request and (especially now) a reasonable interpretation of
the BRs. But your message might have left the impression that LE only
stored an OK/not-OK bit, which isn't how I read LE's email or code.

Tim Hollebeek

May 22, 2018, 11:18:07 AM

to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

> Given the TTLs and the key sizes in use on DNSSEC records, why do you
> believe this?

DigiCert is not sympathetic to disk space as a reason to not keep sufficient
information in order to detect misissuance due to CAA failures.

In fact, inspired by this issue, we are taking a look internally at what we
log, and considering the feasibility of logging even more information,
including full DNSSEC signed RRs.

-Tim

Ryan Sleevi

May 22, 2018, 12:07:42 PM

to Tim Hollebeek, ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

On Tue, May 22, 2018 at 11:17 AM, Tim Hollebeek <tim.ho...@digicert.com>
wrote:

Hi Tim,

I'm not sure why you mentioned disk space; could you help me understand
why you brought that up?

It doesn't actually seem to respond to the question, which is that the TTLs
and key sizes of DNSSEC records affect both the verifiability of such
information and its ability to be used for non-repudiation (which is
ostensibly the goal of such record keeping).

I think your response rather severely misunderstands DNSSEC, or the
implications of what you propose, but I look forward to DigiCert actually
sharing what it proposes to do, so that the community can discuss whether a
proposed implementation reasonably achieves those goals. Otherwise, we are
arguably no different from where we are today, in which CAs do what they
believe is reasonable for one purpose, but perhaps fail to achieve that in
light of potential risks.

Tim Hollebeek

May 22, 2018, 12:14:13 PM

to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

What precisely was the antecedent of “this” in your message? Re-reading it, I’m not clear which sentence you were referring to.

The only reasons I can think of for not keeping DNSSEC signed RRs are storage and/or performance, and we think those concerns should not be the driving force in logging requirements (within reason).

Are there other good reasons not to keep the DNSSEC signed RRs associated with DNSSEC CAA lookups?

-Tim

Ryan Sleevi

May 22, 2018, 12:43:54 PM

to Tim Hollebeek, ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

On Tue, May 22, 2018 at 12:14 PM, Tim Hollebeek <tim.ho...@digicert.com>
wrote:

I believe you are operating on a flawed understanding of the value of
DNSSEC for forensic purposes, given the statement that "I absolutely would
expect Let's Encrypt to produce DNSSEC signed RRs that match up to their
story. The smoking gun for such scenarios exists, and CAs are, or should
be, under no illusions that it's their job to produce it."

To me, this demonstrates a flawed, naive understanding of DNSSEC, and in
particular, its value in forensic post-issuance claims, and also a flawed
understanding about how DNS works, in a way that, as proposed, would be
rather severely damaging to good operation and expected use of DNS. While
it's easy to take shots on the basis of this, or to claim that the only
reason not to store is because disk space, it would be better to take a
step back before making those claims.

DNSSEC works as short-lived signatures, in which the proper operation of
DNSSEC is accounted for through frequent key rotation. DNS relies on
factors such as TTLs to serve as effective safeguards against overloading
the DNS system, and its hierarchical distribution allows for effective
scaling of that system.
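
One concrete sense in which DNSSEC evidence is time-bounded: an RRSIG only vouches for an RRset between its inception and expiration times. A minimal sketch (hypothetical helper names) of checking whether a logged signature's window covered a given issuance time; RFC 4034 Section 3.1.5 specifies that these 32-bit fields are compared with serial number arithmetic (RFC 1982). This checks only the validity window, not the signature itself or the strength of the key that made it.

```python
# Hypothetical helper names; comparison per RFC 4034 Section 3.1.5 / RFC 1982.
HALF = 2 ** 31
MOD = 2 ** 32


def serial_lt(a: int, b: int) -> bool:
    """RFC 1982 'a is before b' for 32-bit serial numbers."""
    return a != b and ((a < b and b - a < HALF) or (a > b and a - b > HALF))


def rrsig_covers(inception: int, expiration: int, when: int) -> bool:
    """True if a signature with this validity window was valid at `when`
    (seconds since the Unix epoch)."""
    t = when % MOD
    return not serial_lt(t, inception) and not serial_lt(expiration, t)
```

A logged RRSIG whose window has since passed still records what was signed, but, as argued above, its evidentiary weight depends on the key that produced it.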

A good primer to DNSSEC can be had at
https://www.cloudflare.com/dns/dnssec/how-dnssec-works/ , although I'm sure
many other introductory texts would suffice to highlight the problem.

Let us start with a naive claim that the CA should be able to produce the
entire provenance chain for the DNSSEC-signed leaf record. This would be
the chain of KSKs, ZSKs, the signed RRSets, as well as the DS records,
disabling caching for all of these (or, presumably, duplicating it such
that the .com KSK and ZSK are recorded for millions of certs).

However, what does this buy us? Considering that the ZSKs are intentionally
designed to be frequently rotated (24 - 72 hours), thus permitting weaker
key sizes (RSA-512), a provenance chain ultimately merely serves to
establish, in practice, one of a series of 512-bit RSA signatures. Are we
to believe that these 512-bit signatures, whose keys have explicitly
expired, are somehow a smoking gun? Surely not, that'd be laughably
ludicrous - and yet that is explicitly what you propose in the quoted text.

So, again I ask, what is it you're trying to achieve? Are you trying to
provide an audit trail? If so, what LE did is fully conformant with that,
and any CA that wishes to disagree should look inward, and see whether
their audit trail records actual phone calls (versus records of such phone
calls), whether their filing systems store the actual records (versus
scanned copies of those records), whether all mail is delivered certified
delivery, and how they recall the results of that certified delivery.

However, let us not pretend that recording the bytes-on-the-wire DNS
responses, including for DNSSEC, necessarily helps us achieve some goal
about repudiation. Rather, it helps us identify issues such as what LE
highlighted - a need for quick and efficient information scanning to
discover possible impact - which is hugely valuable in its own right, and
is an area where I am certain that a majority of CAs are woefully lagging
in. That LE recorded this at all, beyond simply "checked DNS", is more of a
credit than a disservice, and a mitigating factor more than malfeasance.

Tim Hollebeek

May 22, 2018, 12:51:43 PM

to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

What that wall of text completely misses is the point I and others have been trying to make.

The logs have to have enough information so you don’t end up in the situation Let’s Encrypt is currently, and unfortunately, in. Yes, what they did is compliant, and that’s exactly what most concerns me. It’s not about Let’s Encrypt, which just appears to have made a mistake, it happens. It’s about whether the rules need to be improved to reduce the likelihood of another CA ending up in the same situation.

As a separate issue, we’re looking into making sure we never end up in that situation, and as you say, other CAs should be too. We always reserve the right to do things that vastly exceed minimal compliance.

That should be something you should support, instead of producing increasingly long and condescending walls of text. I know how DNSSEC works.

-Tim


Paul Wouters

May 22, 2018, 1:04:31 PM

to ry...@sleevi.com, mozilla-dev-security-policy

On Tue, 22 May 2018, Ryan Sleevi via dev-security-policy wrote:

> However, what does this buy us? Considering that the ZSKs are intentionally
> designed to be frequently rotated (24 - 72 hours), thus permitting weaker
> key sizes (RSA-512),

I don't know anyone who believes or uses these timings or key sizes. It
might be done as an _attack_ but it would be a very questionable
deployment.

I know of 12,400 512-bit RSA ZSKs in a total of about 6.5 million. And I
consider those to be an operational mistake.
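
For scale, the figures above put 512-bit ZSKs at roughly a fifth of a percent of the total:

```python
zsk_512_bit = 12_400
zsk_total = 6_500_000  # "about 6.5 million"
share = zsk_512_bit / zsk_total
print(f"{share:.2%}")  # 0.19%
```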

> However, let us not pretend that recording the bytes-on-the-wire DNS
> responses, including for DNSSEC, necessarily helps us achieve some goal
> about repudiation. Rather, it helps us identify issues such as what LE
> highlighted - a need for quick and efficient information scanning to
> discover possible impact - which is hugely valuable in its own right, and
> is an area where I am certain that a majority of CAs are woefully lagging
> in. That LE recorded this at all, beyond simply "checked DNS", is more of a
> credit than a disservice, and a mitigating factor more than malfeasance.

I see no reason why not to log the entire chain to the root. The only
exception being maliciously long chains, which you can easily cap
and error out on after following about 50 DS records?
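
A minimal sketch of the capped walk suggested above, collecting per-zone evidence from the leaf zone up to the root and erroring out past a fixed limit. The names are hypothetical, `fetch_evidence` stands in for whatever returns a zone's raw DNSKEY/DS RRsets and RRSIGs, and the sketch simplifies by treating every label boundary as a zone cut, which real DNS does not guarantee.

```python
# Hypothetical sketch; every label boundary is treated as a zone cut.
MAX_DELEGATIONS = 50  # the cap suggested above


class ChainTooLong(Exception):
    """Raised when a delegation chain exceeds the configured cap."""


def parent_zone(zone):
    """'example.com.' -> 'com.' -> '.' -> None"""
    if zone == ".":
        return None
    labels = zone.rstrip(".").split(".")
    return ".".join(labels[1:]) + "." if len(labels) > 1 else "."


def collect_chain(leaf_zone, fetch_evidence, max_delegations=MAX_DELEGATIONS):
    """Walk from leaf_zone up to the root, recording evidence for each zone."""
    chain = []
    zone = leaf_zone
    while zone is not None:
        if len(chain) >= max_delegations:
            raise ChainTooLong(f"chain for {leaf_zone} exceeds {max_delegations} zones")
        chain.append((zone, fetch_evidence(zone)))
        zone = parent_zone(zone)
    return chain
```

For `example.com.` this visits `example.com.`, `com.`, and `.`; a pathologically deep name raises `ChainTooLong` instead of being followed indefinitely.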

Paul

Ryan Sleevi

May 22, 2018, 1:24:44 PM

to Tim Hollebeek, ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

I think your position is both misguided and unrealistic, and I think you're
using it as an opportunity to pursue a technical issue, when there is both
a systemic issue at play (one that can be demonstrated across all validation
methods) and a fundamental misunderstanding as to the value it would
provide.

I'm glad you find the technical explanation a wall of text. Bad ideas do
take disproportionately more effort to shoot down than they do to propose.
That is an unfortunate asymmetric cost, and one the Web PKI has had to bear
for quite some time.

I'm not opposed to systemic improvements. I am opposed to unnecessary
grand-standing and hand-wringing, when demonstrably worse things are
practiced.


Ryan Sleevi

May 22, 2018, 1:27:16 PM

to Paul Wouters, ry...@sleevi.com, mozilla-dev-security-policy

On Tue, May 22, 2018 at 1:03 PM, Paul Wouters <pa...@nohats.ca> wrote:

> On Tue, 22 May 2018, Ryan Sleevi via dev-security-policy wrote:
>
>> However, what does this buy us? Considering that the ZSKs are intentionally
>> designed to be frequently rotated (24 - 72 hours), thus permitting weaker
>> key sizes (RSA-512),
>
> I don't know anyone who believes or uses these timings or key sizes. It
> might be done as an _attack_ but it would be a very questionable
> deployment.
>
> I know of 12400 512 bit RSA ZSK's in a total of about 6.5 million. And I
> consider those to be an operational mistake.

http://tma.ifip.org/wordpress/wp-content/uploads/2017/06/tma2017_paper58.pdf
has some fairly damning empirical data about the reliability of those
records, which is not in line with your anecdata.

>
>> However, let us not pretend that recording the bytes-on-the-wire DNS
>> responses, including for DNSSEC, necessarily helps us achieve some goal
>> about repudiation. Rather, it helps us identify issues such as what LE
>> highlighted - a need for quick and efficient information scanning to
>> discover possible impact - which is hugely valuable in its own right, and
>> is an area where I am certain that a majority of CAs are woefully lagging
>> in. That LE recorded this at all, beyond simply "checked DNS", is more of a
>> credit than a disservice, and a mitigating factor more than malfeasance.
>
> I see no reason why not to log the entire chain to the root. The only
> exception being maliciously long chains, which you can easily cap
> and error out on after following about 50 DS records?

"Why not" is not a very compelling argument, especially given the
complexity involved and the low value returned (which is itself
inconsistent with other matters).

vduk...@gmail.com

May 22, 2018, 1:32:51 PM

to mozilla-dev-s...@lists.mozilla.org

On Tuesday, May 22, 2018 at 1:04:31 PM UTC-4, Paul Wouters wrote:
> On Tue, 22 May 2018, Ryan Sleevi via dev-security-policy wrote:
>
> > However, what does this buy us? Considering that the ZSKs are intentionally
> > designed to be frequently rotated (24 - 72 hours), thus permitting weaker
> > key sizes (RSA-512),
>
> I don't know anyone who believes or uses these timings or key sizes. It
> might be done as an _attack_ but it would be a very questionable
> deployment.
>
> I know of 12400 512 bit RSA ZSK's in a total of about 6.5 million. And I
> consider those to be an operational mistake.

These are "legacy" zones at roughly three operators who are having some trouble getting better keys in place, but the swamp is slowly being drained: a few months back the total was ~12,900, out of a smaller overall total. ZSKs are predominantly 1024-bit, with a noticeably large minority using 1280 bits. Latest stats:

https://lists.dns-oarc.net/pipermail/dns-operations/2018-May/017628.html

vduk...@gmail.com

May 22, 2018, 1:56:28 PM

to mozilla-dev-s...@lists.mozilla.org

As for ZSK lifetime, among still-extant domains the average (last seen - first seen) time of no-longer-published ZSKs is 59 days. This is strongly indicative of a 60-day cycle at the larger DNSSEC-hosting providers. The sample size is 5,187,051 retired ZSKs.

The standard deviation is 34 days, so we can estimate that most ZSKs are rotated in 30-90 days. The sample size is ~5.2 million domains.
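
A minimal sketch of the statistic described above, computed from (first seen, last seen) date pairs; the function name and input shape are assumptions. It also shows the one-standard-deviation window around the reported mean, which is 25 to 93 days, close to the "30-90 days" quoted. Reading that window as "most" keys assumes a roughly bell-shaped distribution.

```python
from datetime import date
from statistics import mean, stdev


def lifetime_stats(observations):
    """Mean and sample standard deviation of ZSK lifetimes in days.

    observations: (first_seen, last_seen) date pairs for retired ZSKs.
    """
    lifetimes = [(last - first).days for first, last in observations]
    return mean(lifetimes), stdev(lifetimes)


# Illustrative input only, not real measurement data.
example = [(date(2018, 1, 1), date(2018, 3, 2)), (date(2018, 1, 1), date(2018, 1, 31))]
print(lifetime_stats(example))

# One standard deviation either side of the reported 59-day mean.
reported_mean, reported_sd = 59, 34
print(reported_mean - reported_sd, reported_mean + reported_sd)  # 25 93
```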

Matthew Hardeman

May 22, 2018, 4:50:02 PM

to mozilla-dev-security-policy

Copying message accidentally sent directly to a list participant.

---------- Forwarded message ----------
From: Matthew Hardeman <mhar...@gmail.com>
Date: Tue, May 22, 2018 at 3:47 PM
Subject: Re: 2018.05.18 Let's Encrypt CAA tag value case sensitivity incident
To: Ryan Sleevi <ry...@sleevi.com>

Ultimately it seems reasonable to, as Mr. Lamb suggested, log the DNS
result set such that it is possible to reproduce the point-in-time signed
confirmation of the CAA record (or signed non-existence) to within the
constraints of the DNSSEC mechanisms available and provisioned for the
zone, following the delegations down from the root zone.

Are there badly chosen keys and TTLs out there today in practice? Yes, but
they're a decreasing proportion of zones. Some TLDs are aggressively
deploying DNSSEC and encouraging proper implementation.

There's an opportunity here to provide for a best effort cryptographic
proof to within the boundaries of what the domain holder has configured.
Why not match the domain holder's effort with proportionate supporting
documentation?


Nick Lamb

May 22, 2018, 7:25:02 PM

to ry...@sleevi.com, mozilla-dev-security-policy

On 21 May 2018 14:59, Ryan Sleevi <ry...@sleevi.com> wrote:

> Given the TTLs and the key sizes in use on DNSSEC records, why do you believe this?

This is a smoking gun because it's extremely strong circumstantial evidence. Why else would these records exist except that in fact the "victim" published these DNS records at the time of (or shortly before) issuance?

As with a real smoking gun there certainly could be other explanations, but the most obvious (that these were the genuine query answers) will usually be correct.

If fake records were in fact supplied by a MitM using cracked 512-bit keys to fool the CA, the victim name owner is perhaps humiliated, but they can take action to secure their names with a stronger key in future. And the ecosystem gets a free warning about the (lack of) safety of short keys.

If we suppose the CA systematically produced these fake records afterwards to justify a mis-issuance, I'd say that's quite a credibility jump from the level of shenanigans we've gotten used to from CAs, and it depends upon their victim having a short key for it to even be possible.

These both sound like reasons to increase RSA keylengths for any names that are important for you, not justifications for inadequate logging.
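
Whatever weight one gives such records, they can only be judged against the moment of issuance, not against today's clock, since RRSIGs expire. As a minimal sketch (not any CA's actual code), the check below tests whether a logged RRSIG's inception/expiration window covered a given issuance time, using the 32-bit serial-number comparison (RFC 1982) that RFC 4034 Section 3.1.5 prescribes for those fields. The helper names are hypothetical, and verifying the signature itself is out of scope.

```python
from datetime import datetime, timezone

SERIAL_MOD = 2 ** 32   # RRSIG times are 32-bit unsigned seconds since the epoch
SERIAL_HALF = 2 ** 31


def serial_le(a: int, b: int) -> bool:
    """RFC 1982 comparison for 32-bit serials: True if a <= b."""
    a %= SERIAL_MOD
    b %= SERIAL_MOD
    if a == b:
        return True
    return (a < b and b - a < SERIAL_HALF) or (a > b and a - b > SERIAL_HALF)


def rrsig_window_covers(inception: int, expiration: int, issued_at: datetime) -> bool:
    """Hypothetical helper: was a logged RRSIG inside its validity window at issuance?"""
    t = int(issued_at.timestamp()) % SERIAL_MOD
    return serial_le(inception, t) and serial_le(t, expiration)


# Usage with made-up inception/expiration values.
issued = datetime(2018, 5, 18, 12, 0, tzinfo=timezone.utc)
print(rrsig_window_covers(1526000000, 1527000000, issued))
```

Evaluating against the logged issuance time rather than the current time is the point: most signatures will have lapsed long before anyone looks at the log.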

Tim Hollebeek

May 23, 2018, 11:30:11 AM

to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

You’re free to misattribute whatever motives you want to me. They’re not true. In fact, I would like to call on you yet again to cease speculating and imputing malicious motives onto well-intentioned posts.

The CAA logging requirements failed in this instance. How do we make them better? I’ll repeat that this isn’t a criticism of Let’s Encrypt, other than that they had a bug, as many of us have. Mozilla wants this to be a place where we can reflect on incidents and improve requirements.

I’m not looking for full cryptographic proof; that can’t be made to work. What are the minimum logging requirements so that CAA logs can be used to reliably identify affected certificates when CAA bugs happen? That’s the discussion going on internally here. I’d love to hear other thoughts on this issue.

Also, we’re trying to be increasingly transparent about what goes on at DigiCert. I believe we’re the only CA that publishes what we will deliver *next* sprint. I would actually like to share much MORE information than we currently do, and have authorization to do so, but the current climate is not conducive to that.

The fact that I tend to get attacked in response to my sharing of internal thinking and incomplete ideas is not helpful or productive. It will unfortunately just cause us to have to stop being as transparent.

-Tim

Ryan Sleevi

May 23, 2018, 11:49:11 AM

to Tim Hollebeek, ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

Tim,

I definitely think we've gone off the rails here, so I want to try to right
the cart here. You jumped in on a thread talking about DNSSEC providing
smoking guns [1] - which is a grandstanding bad idea. It wasn't your idea,
but you jumped into the middle of that discussion and began
offering other interpretations (such as it being about disk space [2]),
when the concern was precisely about trying to find a full cryptographic
proof that can be stable over the lifetime of the certificate - which for
Let's Encrypt is 90 days, but for some CAs, is up to 825-days [3].

As a systemic improvement, I think we're in violent agreement about the
goal - which is to make sure that when things go wrong, there are reliable
ways to identify where and why they went wrong - and perhaps simply in
disagreement on the means and ways to effect that. You posited that the
original motivation was that this specifically could not occur - but I
don't think that was actually shared or expressed, precisely because there
were going to be inherent limits to that information. I provided examples
of where and how, under the existing BRs, that the steps taken are both
consistent with and, arguably, above and beyond, what is required elsewhere
- which is not to say we should not strive for more, but is to put down the
notion from (other) contributors that somehow there's been less here.

I encouraged you to share more of your thinking, precisely because this is
what allows us to collectively evaluate the fitness for purpose [4] - and
the potential risks that well-intentioned changes can pose [5]. I don't
think it makes sense to anchor on the CAA aspect as the basis to improve
[6], when the real risk is the validation methods themselves. If our intent
is to provide full data for diagnostic purposes, then how far does that
rabbit hole go - do HTTP file-based validations need to record their DNS
lookup chains? Their IP flows? Their BGP peer broadcasts? How far to take
this rests on what it is we're trying to achieve - and the same
issue here (namely, CAA being misparsed) could just as easily apply to
HTTP streams, to WHOIS dataflows, or to BGP peers.

That's why I say it's systemic, and why I say that we should figure out
what it is we're trying to achieve - and misguided framing [1] does not
help further that.

[1] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/7L2_zfgfCwAJ
[2] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/gUT3t7B1CwAJ
[3] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/O7QTGmInCwAJ
[4] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/juHBkWV4CwAJ
[5] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/O5rwCV96CwAJ
[6] https://groups.google.com/d/msg/mozilla.dev.security.policy/7AcHi_MgKWE/lpU2dpl8CwAJ

Tim Hollebeek

May 23, 2018, 12:04:31 PM

to ry...@sleevi.com, Nick Lamb, mozilla-dev-security-policy, Jacob Hoffman-Andrews

Right, this is a fair and excellent summary, and there are things I would improve about my responses if I had access to a time machine. Constraints on my time are pretty brutal right now, and that does not always allow me to express myself as well as I would like.

I perceived, possibly incorrectly, a hesitation that adding at least some information about DNSSEC lookups would blow up the size of log files and would be difficult at scale. Our discussion internally reached the conclusion that we’re supportive of requiring even more extensive CAA logging, even if it is expensive. At Let’s Encrypt’s scale and our scale, that’s an important concern, and we think it should be publicly discussed (Comodo’s perspective would be interesting too). So that’s what I was thinking and ended up saying really, really badly.

Your discussion here is excellent and worthy of a longer term discussion. I was thinking more along the lines of “are there any appropriate quick fixes we might want to consider?” The answer may be no. But I do find it dangerous that minimal compliance with the current requirement can lead to situations like this. That alone makes me want to improve the requirement.

And while I’m on the subject, since it’s related: Jeremy and I do have a new policy of trying to err on the side of publicly oversharing internal information and deliberations, whenever we can. We think it’s the right thing to do.

-Tim

Paul Wouters

May 23, 2018, 12:26:26 PM

to Ryan Sleevi, mozilla-dev-security-policy

On Tue, 22 May 2018, Ryan Sleevi wrote:

> > I know of 12400 512 bit RSA ZSK's in a total of about 6.5 million. And I
> > consider those to be an operational mistake.
>
> http://tma.ifip.org/wordpress/wp-content/uploads/2017/06/tma2017_paper58.pdf
> has some fairly damning empirical data about the reliability of those
> records, which is not in line with your anecdata.

My "anecdata" is Viktor Dukhovni's constant monitoring of all known
DNSSEC zones. Its data is current to within a few days. The paper you
cite used data up to Jan 2017, so it is 1.5 years old. Even so, it only
found 275k out of 7M ZSKs to be 512-bit RSA. I suspect that was
due to one or a few providers who used to do that but clearly no
longer do, since the current total is far lower at 12k out of
roughly the same sample size of 7M. Calling this data anecdata
isn't going to change anything other than my professional opinion
of you.

512-bit RSA keys were never a real thing in DNSSEC. I packaged up the
earliest versions of DNSSEC software for RHEL/CentOS/Fedora and it
never used less than 1024 bits for ZSKs, a default which was raised to
2048 bits on Mar 27 2014. KSKs were always 2048 bits. I'm pretty sure the
Debian and Ubuntu packagers also didn't reduce the default upstream ZSK
key sizes to 512 bits.

But I'd love to read your research on where people advised you to roll
the ZSK at "24 - 72 hours" intervals. Do you have any links to
presentations given at any technology conference like DNS-OARC, RIPE,
IETF or ICANN?

> > I see no reason why not to log the entire chain to the root. The only
> > exception being maliciously long chains, which you can easily cap
> > and error out on after following about 50 DS records?
>
> "Why not" is not a very compelling argument, especially given the
> complexity involved, and the return to value being low (and itself being
> inconsistent with other matters)

CAs are in the business of verification, auditing and issuing security
certificates. If they cannot log why they made a certain decision in
the past based on the then gathered cryptographic material available,
they are simply not trustworthy at their job. Asking us to accept
"you should just trust we did the right DNSSEC checks in the past"
is pretty weak for a security institution, especially when you need
to resolve a dispute about a CAA record that was present at the time
but failed to prevent a certificate from being issued.

And if you cannot store a few kb of data per certificate (not) issued
based on a CAA record, then surely you're not mature enough to be in
the business of certifying and auditing anything.
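
Storing that data per decision could look roughly like the sketch below, which records the raw DS and DNSKEY responses for each name from the root down and applies the suggested cap on chain length. It assumes a hypothetical `fetch(owner, rrtype)` callable returning raw DNS wire-format bytes, and it simplifies by treating every ancestor label as a zone cut; a real resolver would log only the cuts it actually crossed.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Callable, List

MAX_CHAIN_LINKS = 50  # cap suggested in the thread ("about 50 DS records")


@dataclass
class LoggedResponse:
    owner: str
    rrtype: str
    wire_b64: str  # raw response bytes, base64-encoded for a text log
    sha256: str    # digest so later copies can be compared cheaply


def ancestors(name: str) -> List[str]:
    """'a.example.com' -> ['.', 'com.', 'example.com.', 'a.example.com.']"""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return ["."] + [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


def log_dnssec_chain(name: str, fetch: Callable[[str, str], bytes],
                     max_links: int = MAX_CHAIN_LINKS) -> List[LoggedResponse]:
    owners = ancestors(name)
    if len(owners) - 1 > max_links:  # one DS link per non-root owner
        raise ValueError(f"chain for {name} exceeds {max_links} links")
    log = []
    for owner in owners:
        # The root has no DS; every other owner contributes DS, then DNSKEY.
        for rrtype in (("DNSKEY",) if owner == "." else ("DS", "DNSKEY")):
            wire = fetch(owner, rrtype)
            log.append(LoggedResponse(owner, rrtype,
                                      base64.b64encode(wire).decode("ascii"),
                                      hashlib.sha256(wire).hexdigest()))
    return log
```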

Paul

Matthew Hardeman

May 23, 2018, 1:36:01 PM

to Paul Wouters, Ryan Sleevi, mozilla-dev-security-policy

I believe that Paul Wouters has made a compelling case regarding the
current state of keying practices in DNSSEC deployment today.
There is sufficient cryptographic rigor to merit logging this data so that
the correctness of the assessment can be reviewed as of the point in time
at which the certificate issuance decision was made.

I concur in full with the assertions and positions of Paul Wouters and Nick
Lamb in the matters discussed in this thread up to this point.

I believe CAA validation checks should incorporate the DNSSEC data where
DNSSEC has been deployed. I believe the logs recorded by a CA should
preserve that data which was relied upon in the issuance decisioning.

I have some concerns pertaining to the state of logging as would appear to
be alluded to by Jacob Hoffman-Andrews. Specifically, I find some cause
for concern in the fragment "because it was not possible to associate
specific query/response pairs with the validation request that caused them
(for instance, consider NS referrals, CNAME indirection, and caching)".

This would appear to indicate that while Let's Encrypt may log either the
final response from their recursive resolver or some derivation of data
from the final response from their recursive resolver, Let's Encrypt may
not be logging _which_ particular entity within the DNS hierarchy they are
utilizing as the controlling CAA record - or at least, this language
suggests that for some circumstances they are not recording the underlying
delegations/reasons for which a given CAA record somewhere in the
DNS hierarchy was utilized when attempting to clear issuance for a given
domain label. That becomes a bit concerning, as I would expect that a CA
relying on a CAA record at
"ok-sure.multiple-layers-of-indirection.my-crazy-dns-service.com" bearing
tag "issue" with value "letsencrypt.org" when authorizing issuance of a
certificate for dnsName "a.example.com" should be able to explain the chain
of facts which made that CAA record at that position within the
DNS hierarchy authoritative for the domain label in question.
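
One way to keep that chain of facts is to log every step of the RFC 6844 climb toward the parent domains, not just the record that was finally used. The sketch below assumes a hypothetical `lookup(name)` that returns the CNAME targets the resolver followed plus the CAA records it ended with; it stops at the first name with a non-empty CAA set and compares tags case-insensitively, as RFC 6844 Section 5.1 requires. It deliberately ignores the finer points of RFC 6844's alias handling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CAARecord = Tuple[int, str, str]  # (flags, tag, value)


@dataclass
class CAAStep:
    queried: str               # name the CA asked about
    aliases: List[str]         # CNAME targets followed while answering
    records: List[CAARecord]   # CAA records returned (possibly empty)


def parent(name: str) -> Optional[str]:
    labels = name.rstrip(".").split(".")
    return ".".join(labels[1:]) if len(labels) > 1 else None


def find_relevant_caa(
    fqdn: str, lookup: Callable[[str], Tuple[List[str], List[CAARecord]]]
) -> Tuple[List[CAARecord], List[CAAStep]]:
    """Climb from fqdn toward the TLD; return the first non-empty CAA set and every step."""
    steps: List[CAAStep] = []
    name: Optional[str] = fqdn.rstrip(".")
    while name:
        aliases, records = lookup(name)
        steps.append(CAAStep(name, aliases, records))
        if records:
            return records, steps
        name = parent(name)
    return [], steps


def issue_domains(records: List[CAARecord]) -> List[str]:
    # Tag matching is case-insensitive; parameters after ';' are dropped.
    return [value.split(";")[0].strip() for _, tag, value in records if tag.lower() == "issue"]
```

Logging `steps` rather than only the final `records` is what lets the CA later explain why a record at one position in the hierarchy governed a different name.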

There arises a potential further concern if this logging of DNS data upon
which a CA's decisions rely pertains to DNS queries for function beyond CAA
clearance. For example, the DNS queries associated with domain control
validation over a given domain label. Consider the software package
acme-dns which assists domain holders with delegating the ACME dns-01
validation records onto a specialized DNS server which responds with the
correct dynamic responses while allowing the underlying domain to have
static delegations for the TXT records back in the actual authoritative
zones of any number of domains that they don't want to migrate to dynamic
DNS services. That's a great use case and should not be discouraged.
Neither, however, should it be the case that the CA's logging only has the
final response from the acme-dns server. The logs should
contain the DNS query responses which connected that record served up by
the acme-dns server to the delegation that granted that authority to the
acme-dns server on behalf of the authorization domain name.
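
For the acme-dns case, the logged evidence would include the delegation as well as the final TXT answer. A minimal sketch, assuming the delegation is a CNAME (as acme-dns setups typically use), that zone data is available as a plain dictionary standing in for real resolver output, and using the `_acme-challenge` label that ACME's dns-01 challenge queries; `resolve_txt_with_trail`, the hop limit, and the `acme-dns.example` names are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_CNAME_HOPS = 8  # hypothetical guard against loops and long chains

Zone = Dict[str, Dict[str, object]]  # owner -> {"CNAME": str} or {"TXT": [str, ...]}


def resolve_txt_with_trail(name: str, zone: Zone,
                           max_hops: int = MAX_CNAME_HOPS) -> Tuple[List[str], List[tuple]]:
    """Follow CNAMEs to a TXT RRset, recording every hop that conferred authority."""
    trail: List[tuple] = []
    current = name
    for _ in range(max_hops + 1):
        rrs = zone.get(current, {})
        if "CNAME" in rrs:
            target = str(rrs["CNAME"])
            trail.append(("CNAME", current, target))
            current = target
            continue
        txt = list(rrs.get("TXT", []))
        trail.append(("TXT", current, tuple(txt)))
        return txt, trail
    raise ValueError(f"CNAME chain from {name} exceeds {max_hops} hops")
```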

vduk...@gmail.com

May 24, 2018, 11:07:05 AM

to mozilla-dev-s...@lists.mozilla.org

On Tuesday, May 22, 2018 at 1:27:16 PM UTC-4, Ryan Sleevi wrote:
> On Tue, May 22, 2018 at 1:03 PM, Paul Wouters <pa...@nohats.ca> wrote:
>
> > I know of 12400 512 bit RSA ZSK's in a total of about 6.5 million. And I
> > consider those to be an operational mistake.
>
> http://tma.ifip.org/wordpress/wp-content/uploads/2017/06/tma2017_paper58.pdf
> has some fairly damning empirical data about the reliability of those
> records, which is not in line with your anecdata.

One of the reasons that the number of 512-bit keys is indeed now only ~12k (and gradually decreasing) is rooted in a passing comment in that paper: "The majority of them can be attributed to a hosting provider below cz."

As it turns out, I played a role in remediating that problem: https://lists.dns-oarc.net/pipermail/dns-operations/2017-October/016880.html

My focus is more operational than academic, so instead of writing a paper, I posted to the dns-operations list, and not long after that post the folks at "wedos.cz" re-signed all the zones in question with 1024-bit or better keys. It remains to address the same issue at approximately three providers to essentially eliminate 512-bit keys from DNSSEC: https://twitter.com/VDukhovni/status/998341243800301568

So no, Paul's numbers are not "anecdata" and it is unwise to imply such a thing without knowing the full story. The DANE survey is identifying, publicizing and driving remediation of various neglected aspects of DNSSEC operations, and the overall ecosystem is getting considerably healthier than it was back in 2014.

--
Viktor.