CAA record checking issue

Jeremy Rowley

May 9, 2019, 10:05:36 PM
to mozilla-dev-s...@lists.mozilla.org
FYI, we posted this today:



https://bugzilla.mozilla.org/show_bug.cgi?id=1550645



Basically we discovered an issue with our CAA record checking system. If the
system timed out, we would treat the failure as a DNS failure instead of an
internal failure. Per the BRs Section 3.2.2:

"CAs are permitted to treat a record lookup failure as permission to issue
if:

- the failure is outside the CA's infrastructure;
- the lookup has been retried at least once; and
- the domain's zone does not have a DNSSEC validation chain to the ICANN
  root"



The failure was not outside our infrastructure so issuance was improper.
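
As a rough illustration of the rule quoted above, the three conditions can be read as a single predicate. This is a minimal sketch, not DigiCert's code; the `LookupFailure` type and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupFailure:
    """Hypothetical summary of a failed CAA lookup."""
    outside_ca_infrastructure: bool  # failure was in external DNS, not the CA's own systems
    retry_count: int                 # how many times the lookup was retried
    dnssec_chain_to_root: bool       # zone has a DNSSEC validation chain to the ICANN root


def may_issue_despite_failure(failure: LookupFailure) -> bool:
    """Return True only if all three quoted conditions hold."""
    return (
        failure.outside_ca_infrastructure
        and failure.retry_count >= 1
        and not failure.dnssec_chain_to_root
    )


# An internal timeout (the case described in this thread) never qualifies.
internal_timeout = LookupFailure(
    outside_ca_infrastructure=False, retry_count=1, dnssec_chain_to_root=False
)
print(may_issue_despite_failure(internal_timeout))  # False
```

Under this reading, a timeout inside the CA's own systems fails the first condition regardless of the other two.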



We checked all the applicable CAA records and found 16 where the CAA record
would not permit us to issue if we were issuing a new cert today. What we
are proposing is to revoke these certificates and reissue them (if they pass
all the proper checks). The rest would pass if we issued today so we were
going to leave these where they are while disclosing them to the Mozilla
community.



Other suggestions are welcome.



The issue was introduced into the code when CAA record checking became
mandatory (Sept 2017). We generally have a peer review of our code so that
at least one other developer has looked at the system before release. In
this case, neither a PM nor a second reviewer was involved in the development.
We've since implemented more stringent development processes, including
ensuring a PM reviews and brings questions about projects to the compliance
team.



Anyway, let me know what questions, comments, etc you have.



Thanks!

Jeremy

Tim Shirley

May 10, 2019, 9:29:53 AM
to Jeremy Rowley, mozilla-dev-s...@lists.mozilla.org
Jeremy,

Thanks for sharing this. After reading your description, I'm curious how your system was previously (or is now) satisfying the third criterion needed to issue in the face of a record lookup failure: confirming that the domain's zone does not have a DNSSEC validation chain to the ICANN root. Wouldn't any issuance require at least one successful DNS query to confirm the lack of a DS record somewhere between the TLD and the domain you're checking, so you know the domain doesn't have a valid DNSSEC chain? If the CAA checking service was down, wouldn't those have all timed out? Or are those checks being done from a different system that wasn't down?

Regards,
Tim


Ryan Sleevi

May 10, 2019, 1:54:37 PM
to Jeremy Rowley, mozilla-dev-s...@lists.mozilla.org
On Thu, May 9, 2019 at 10:05 PM Jeremy Rowley via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> We checked all the applicable CAA records and found 16 where the CAA record
> would not permit us to issue if we were issuing a new cert today. What we
> are proposing is to revoke these certificates and reissue them (if they pass
> all the proper checks). The rest would pass if we issued today so we were
> going to leave these where they are while disclosing them to the Mozilla
> community.
>

Could you share the risk analysis that helped you reach this latter
conclusion?

That is, CAA is a time of check thing, and, as you note, you can't be sure
they were appropriately authorized at the time of issuance. Thus, even if
the site operator is a DigiCert customer now, or might have disabled CAA
now, there's no ability to determine whether or not they previously
approved it - or even whether the holder of that old certificate is still
the authorized domain representative now (e.g. in the event of a domain
transfer or sale)

In general, the default should be to revoke all. That said, if there's a
thorough analysis that has considered this, and other scenarios, and that,
on the whole, has led DigiCert to believe the current path is more
appropriate, it'd be great if you could share that analysis. I think Tim's
questions are useful as well, in understanding the reasoning.

Basically, without stating a position on whether your analysis is right or
wrong, I'm hoping you can show your work in detail, and all the factors you
considered. That sort of analysis is what helps the community build
confidence that the chosen path, despite being a violation of the BRs, is a
reflection of a CA thoughtfully considering all perspectives.

Jeremy Rowley

May 10, 2019, 1:57:47 PM
to ry...@sleevi.com, mozilla-dev-s...@lists.mozilla.org
Okay. I'm working on something and will post it soon.

Jeremy Rowley

May 10, 2019, 3:48:09 PM
to Tim Shirley, mozilla-dev-s...@lists.mozilla.org
Hey Tim,

The issue was in a call between the CA and the CAA checker. The CAA checker would check the DNS and verify the DNSSEC chain. However, when retrieving the information from the CAA checker, the CA hit the error, which means the CAA check was not evaluated correctly. Under normal operation the CAA check does the DNSSEC, CAA, and other DNS queries. Here it wasn't a DNS failure - it was a communication failure between the CA and the CAA checker.

I guess you could say there were two failures in this case. First that the CAA check timed out internally and second that the DNSSEC check never happened. The mis-issuance still amounts to the same thing.

Normally, even if we get a DNS failure, we can usually check to see if the zone is signed (at least at the root zone). If there is a signed root zone, then we treat the entire zone as signed (meaning we fail on error).
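
The distinction described here, between an internal communication failure and a genuine DNS failure, can be sketched as a fail-closed decision. This is only an illustration under stated assumptions: the exception classes, the `decide` function, and the `fetch_result` callable are hypothetical names and do not reflect how DigiCert's CA or CAA checker is actually built.

```python
class DnsLookupError(Exception):
    """Hypothetical: the external DNS query itself failed."""


class InternalCheckerError(Exception):
    """Hypothetical: the CA could not get a result from its own CAA checker."""


def decide(fetch_result, zone_is_signed: bool, retries: int = 1) -> str:
    """Return "issue" or "refuse" for one request.

    fetch_result is a hypothetical callable that returns True/False
    (CAA permits / forbids) or raises one of the errors above.
    """
    attempts = 0
    while True:
        try:
            return "issue" if fetch_result() else "refuse"
        except InternalCheckerError:
            # Failure inside the CA's infrastructure: never treat as permission.
            return "refuse"
        except DnsLookupError:
            attempts += 1
            if attempts <= retries:
                continue  # retry at least once before drawing a conclusion
            # External failure after retrying: allowed only for unsigned zones.
            return "refuse" if zone_is_signed else "issue"


def internal_timeout():
    raise InternalCheckerError("timed out waiting for the CAA checker")


print(decide(internal_timeout, zone_is_signed=False))  # refuse
```

The point of separating the two exception types is that a timeout between internal components can never fall through to the "DNS failure" branch.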

Jeremy

Jeremy Rowley

May 10, 2019, 3:55:35 PM
to ry...@sleevi.com, mozilla-dev-s...@lists.mozilla.org
The analysis was basically that all the verification documents are still good, which means if we issued the cert today, the issuance would pass without further checks (since the data itself is good for 825 days). Because of this, customers with domains that didn’t prohibit DigiCert in their CAA record (anywhere in the chain) could simply reissue the certificate without a problem. We could require this of all customers. For the 16, issuance would fail if the CAA check was performed today. Therefore, we want to revoke those.



The one reason I wanted more time to respond is that we think we may have most CAA records in our Splunk data for the time of issuance. Our new plan is that we will revoke all certs unless we can confirm the CAA record was permissive at the time of issuance. I don’t know the number of certs that we will revoke yet. I’ll post an update when we compare the Splunk data to the issuance data.
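
The revised plan (revoke unless the logs confirm the CAA record was permissive at issuance) can be sketched roughly as follows. This is a simplified illustration, not a Splunk query or DigiCert's tooling: the log format (a dict of serial number to `(flags, tag, value)` tuples), the function names, and the use of `digicert.com` as the issuer domain are assumptions for the example. It handles only non-wildcard names and ignores `issuewild` and issuer parameters.

```python
def caa_permits(records, issuer_domain):
    """Simplified check of a logged CAA record set for a non-wildcard name.

    records: list of (flags, tag, value) tuples as logged at issuance time
             (hypothetical log format). An empty list means no CAA record
             was found anywhere in the chain.
    """
    # An unrecognized property with the critical bit (128) set blocks issuance.
    known_tags = {"issue", "issuewild", "iodef"}
    for flags, tag, _ in records:
        if flags & 128 and tag.lower() not in known_tags:
            return False
    # Collect issuer domains from "issue" properties, dropping any parameters.
    issue_values = [
        value.split(";", 1)[0].strip().lower()
        for _, tag, value in records
        if tag.lower() == "issue"
    ]
    if not issue_values:
        return True  # no "issue" property restricts issuance
    return issuer_domain.lower() in issue_values


def certs_to_revoke(affected_serials, logged_caa, issuer_domain="digicert.com"):
    """Revoke unless the logs confirm CAA was permissive at issuance."""
    revoke = []
    for serial in affected_serials:
        records = logged_caa.get(serial)  # None: no log entry survived
        if records is None or not caa_permits(records, issuer_domain):
            revoke.append(serial)
    return revoke


logs = {
    "01": [(0, "issue", "digicert.com")],
    "02": [(0, "issue", "example-ca.test")],
    "03": [],
}
print(certs_to_revoke(["01", "02", "03", "04"], logs))  # ['02', '04']
```

Note that a certificate with no surviving log entry ("04" above) is revoked by default, which matches the "revoke unless we can confirm" wording.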



The real problem was that the CA would kick off a request to the CAA checker. If the CA encountered an error, the request would time out. The CAA checker may still have checked the CAA records appropriately, but the CA never pulled the information to verify issuance authorization. So it’s a mis-issuance unless we can pull the data and prove it wasn’t. Combing through the archive data will take a while.



Jeremy



Ryan Sleevi

May 10, 2019, 4:15:58 PM
to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-s...@lists.mozilla.org
On Fri, May 10, 2019 at 3:55 PM Jeremy Rowley <jeremy...@digicert.com>
wrote:

> The analysis was basically that all the verification documents are still
> good, which means if we issued the cert today, the issuance would pass
> without further checks (since the data itself is good for 825 days).
> Because of this, customers with domains that didn’t prohibit Digicert in
> their CAA record (anywhere in the chain) could simply reissue the
> certificate without a problem. We could require this of all customers. For
> the 16, issuance would fail if the CAA check was performed today.
> Therefore, we want to revoke those.
>
> The one reason I wanted more time to respond is that we think we may have
> most CAA records in our Splunk data for the time of issuance. Our new plan
> is that we will revoke all certs unless we can confirm the CAA record was
> permissive at the time of issuance. I don’t know the number of certs that
> we will revoke yet. I’ll post an update when we compare the Splunk data to
> the issuance data.
>

Thanks for answering. I was hoping you had a more thorough analysis ;) I do
have other questions about the implementation details, but I'll add those
to the bug, so we can focus this discussion on the immediate remediation
steps.

I guess my reservation with such an approach (and this is more a metapoint)
is this: consider issuing an EV certificate without having the supporting
documentation and/or without validating the documentation. You later come
back to the documents, validate them, and find out you got lucky - the
information was actually correct, even though the controls failed and the
process wasn't followed. Do you revoke the certificates, on the basis the
process failed, or do you not revoke them, because they were eventually
consistent?

This might sound like a hypothetical, but it's a question this industry has
faced in the past [1][2], and browsers have reached different conclusions
than CAs. It's not immediately clear to me how the proposed response here
differs from those past responses, and may highlight some of the difference
in philosophies here. An analysis that considered these past events, and
how they were received by the community, and how there may be different
facts here that lead to different conclusions, would be useful in both
validating and justifying the proposed course of action.


> The real problem was the CA would kick off a request to the CAA checker.
> If the CA encountered an error, the request would time out. The CAA record
> may still have checked the CAA records appropriately but the CA never
> pulled the information to verify issuance authorization. So it’s a
> mis-issuance unless we can pull the data and prove it wasn’t. Combing
> through the archive data will take a while.
>

[1]
https://wiki.mozilla.org/CA:Symantec_Issues#Issue_C:_Unauthorized_EV_Issuance_by_RAs_.28January_2014_-_February_2015.29

[2]
https://wiki.mozilla.org/CA:Symantec_Issues#Issue_T:_CrossCert_Misissuances_.28January_2010_-_January_2017.29

Jeremy Rowley

May 10, 2019, 4:54:41 PM
to ry...@sleevi.com, mozilla-dev-s...@lists.mozilla.org
The difference is we actually have the data at time of issuance. It just wasn’t correctly relied on for these specific certs. I think this means there is an open question on whether the issuance even was a mis-issuance since the CAA information was collected…even if it wasn’t perfect.



This is why we’re revising the approach to say “Were the certs actually mis-issued? If yes, revoke. If no, then don’t revoke.”



I was looking at it like a law. You may think you trespassed by walking on some grass. But if permission was granted at the time to walk on the grass, then you never actually violated a rule (even if you didn’t know about the permission). If permission was granted later, you still broke that law and are accountable, even if no penalty is applied. Here, we didn’t appropriately store the information, but the data may have been stored and checked in a process. More succinctly, the difference is that a broken process may still result in compliantly issued certificates, which is different from broken certs that are later remediated. If I can prove compliance at the time the cert was issued, then the certs shouldn’t be revoked.



Does that make sense? I can certainly revoke all 1100 if that’s the preferred approach, but I figure that with a few days’ time I can better answer the question of what the results were when a normally compliant process broke.



Oh, one other factor is that the system wasn’t exploitable. The break was between two internal processes talking to each other, so the errors couldn’t result in certificates issued to a bad actor. It was also a very low volume compared to normal issuance. Neither of these is a good reason or excuse. Instead, they are the reasons we thought we should perhaps not revoke all the certs until we better understand the compliance implications.



Corey Bonnell

May 10, 2019, 8:57:04 PM
to Jeremy Rowley, ry...@sleevi.com, mozilla-dev-s...@lists.mozilla.org
(This time posting to the list with the right email address; sorry for the duplicate email Jeremy and Ryan)

I’d like to point out that the precedent has not been to require CAs to revoke all certificates in the face of a CAA implementation flaw, but merely that the CAA checks are executed again to determine authorization for non-revocation. Specifically, a flaw in Let’s Encrypt’s CAA implementation was discovered last year (https://bugzilla.mozilla.org/show_bug.cgi?id=1462735) and it did not necessitate the revocation of all valid Let’s Encrypt certificates at the time. The incident report is light on details, but it sounds like only those certificates which failed the CAA recheck were revoked. Furthermore, in Let’s Encrypt’s case, there was insufficient logging of the original pre-issuance CAA lookup results, which may not be the case with DigiCert here.

In light of this, I believe that the revocation of only the 16 certificates would align with precedent.

Thanks,
Corey

Han Yuwei

May 11, 2019, 11:38:02 AM
to mozilla-dev-s...@lists.mozilla.org
This raises a question:
How can a CA prove whether or not it performed CAA checks at the time of issuance?

On Friday, May 10, 2019 at 10:05:36 AM UTC+8, Jeremy Rowley wrote:

Nick Lamb

May 11, 2019, 7:04:32 PM
to dev-secur...@lists.mozilla.org, Jeremy Rowley
On Fri, 10 May 2019 02:05:17 +0000
Jeremy Rowley via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> https://bugzilla.mozilla.org/show_bug.cgi?id=1550645
>
> Anyway, let me know what questions, comments, etc you have.

Thanks Jeremy,

If DigiCert is able to retrospectively achieve confidence that issuance
would have been permitted (because their records are good enough to go
back and see the CAA DNS records that were fetched but not used or at
the least the assessment made of those records at the time) I personally
think there is no need to revoke certificates that were in some sense
legitimately issued. To revoke them in these circumstances seems
perverse.

This also rewards keeping high quality issuance records that let you go
back and understand what went wrong. The BRs mandate some record
keeping, but we definitely don't always see evidence of good quality
record keeping in incident reports (I would count ISRG / Let's Encrypt
here definitely).


If DigiCert turns out not to have the records, or checking isn't done
for whatever reasons then I think all 1053 affected certs should be
revoked, without trying to justify narrowing it down further.

In the margins, e.g. if DigiCert can see that some cases have no CAA,
but in cases with CAA it's not possible to be sure if it would have
permitted issuance, I think we need to ask for all 1053 to be revoked
for consistency rather than making complicated decisions that have the
effect of penalizing some subscribers for doing the Right Thing.


I don't endorse the plan of revoking 16 certs based on CAA information
that's far (perhaps more than 12 months) newer than the issuance, I
don't think this is compatible with the declared philosophy of the CAA
and so it makes the message about what CAA is or is not for too
muddled. Revoking all 1053 makes more sense than revoking 16 on this
basis.


Nick.

Matt Palmer

May 12, 2019, 7:39:32 PM
to dev-secur...@lists.mozilla.org
On Sat, May 11, 2019 at 08:37:53AM -0700, Han Yuwei via dev-security-policy wrote:
> This raised a question:
> How can CA prove they have done CAA checks or not at the time of issue?

They can't, just as they can't prove they have or haven't done
domain-control validation. It's up to audits, external adversarial testing,
and the forthright honesty of CAs themselves to proactively report when they
have a problem, to identify when CAs have failed to maintain the necessary
standards.

- Matt

Mike Kushner

May 13, 2019, 4:35:19 AM
to mozilla-dev-s...@lists.mozilla.org
Indeed. It would have been awesome if CAA had included returning a signed token containing the result of the check, but that would probably have been impossible to roll out on all of the world's DNS servers.

Cheers,
Mike

Matt Palmer

May 13, 2019, 6:49:11 PM
to dev-secur...@lists.mozilla.org
On Mon, May 13, 2019 at 01:35:09AM -0700, Mike Kushner via dev-security-policy wrote:
> On Monday, May 13, 2019 at 1:39:32 AM UTC+2, Matt Palmer wrote:
> Indeed. It would have been awesome if CAA had included returning a signed
> token containing the result of the check, but that would probably have
> been impossible to roll out on all of the world's DNS servers.

Yep, at that point you've basically rolled out DNSSEC, and if you've managed
to achieve *that* Herculean feat, sites can just publish identity data in
DNS and you don't need CAs at all.

- Matt

--
Sure, it's possible to write C in an object-oriented way. But, in practice,
getting an entire team to do that is like telling them to walk along a
straight line painted on the floor, with the lights off.
-- Tess Snider, slug...@slug.org.au
