INCIDENT RESPONSE: November 2015 Google ‘pilot’ and ‘aviator’ logged 3 certs with invalid signatures


Ryan M Hurst

Jun 4, 2016, 12:41:35 PM
to Certificate Transparency Policy

SUMMARY:

On November 11th (1,2) and 12th of 2015 the Google ‘pilot’ and ‘aviator’ logs logged a total of three certificates with invalid signatures. This represented a violation of section 3.1 of RFC 6962 which states:


Logs MUST verify that the submitted end-entity certificate or Precertificate has a valid signature chain leading back to a trusted root CA certificate, using the chain of intermediate CA certificates provided by the submitter.


This was the result of a bug in which a 'fail-open' existed in cases where the chain being added contained an unsupported signature algorithm.


While this did not impact users and was not exploited beyond the inclusion of the three spam entries, it exposed these two logs to the risks that come with the unnecessary processing of spam: potential DoS, degraded data quality, database storage hitting quota, etc.


The issue was resolved in production by November 17th, 2015.


IMPACT:

In November of 2015 three certificates with invalid signatures were included in the ‘Pilot’ and ‘Aviator’ logs.


As a result of this defect, an attacker could potentially have spammed the logs, which could have negatively affected log availability.


ROOT CAUSE:

A refactoring of the code base was made on September 29th, 2015 which was intended to improve error handling. This change in the way errors were handled resulted in requests with invalid signatures falling into a code path intended for non-critical errors; thus the associated certificates were added to the log.
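The failure mode described here can be illustrated with a minimal sketch. It is not the actual log code: the enum, the function names, and the algorithm strings are hypothetical, and the real signature check is replaced by a stand-in boolean. It contrasts a fail-open check, which rejects only one known error and lets everything else through, with a fail-closed check, which admits a chain only when every link explicitly verifies.

```python
from enum import Enum, auto


class VerifyResult(Enum):
    OK = auto()
    BAD_SIGNATURE = auto()
    UNSUPPORTED_ALGORITHM = auto()


# Hypothetical set of signature algorithms the verifier understands.
SUPPORTED_ALGORITHMS = {"sha256WithRSAEncryption", "ecdsa-with-SHA256"}


def verify_link(algorithm: str, signature_valid: bool) -> VerifyResult:
    """Stand-in for real cryptographic verification of one chain link."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        return VerifyResult.UNSUPPORTED_ALGORITHM
    return VerifyResult.OK if signature_valid else VerifyResult.BAD_SIGNATURE


def accept_fail_open(links) -> bool:
    """Buggy pattern: only BAD_SIGNATURE rejects; any other error is ignored."""
    for algorithm, signature_valid in links:
        if verify_link(algorithm, signature_valid) is VerifyResult.BAD_SIGNATURE:
            return False
    return True


def accept_fail_closed(links) -> bool:
    """Safer pattern: only an explicit OK on every link admits the chain."""
    links = list(links)
    return bool(links) and all(
        verify_link(algorithm, signature_valid) is VerifyResult.OK
        for algorithm, signature_valid in links
    )


# A chain whose signature uses an algorithm the verifier does not support.
chain = [("md2WithRSAEncryption", False)]
print(accept_fail_open(chain))    # True: the entry would be logged
print(accept_fail_closed(chain))  # False: the entry is rejected
```

A regression test for this specific case would assert that a chain with an unsupported algorithm is rejected, which is the kind of coverage the remediation section refers to.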


REMEDIATION AND PREVENTION:

To prevent similar incidents in future, we have improved the error handling of the associated code and added test coverage for this specific case to ensure this specific condition does not occur again.


ADDITIONAL DETAILS:

It is our practice to immediately release a public incident report once a post-mortem is completed. In this particular case, though the post-mortem was completed immediately after incident resolution, the incident report was not immediately released. This oversight was a result of staffing change and hand-over related issues.


Richard Salz

Jun 4, 2016, 1:20:27 PM
to Ryan Hurst, Certificate Transparency Policy

So do you think that absent Peter's email the disclosure would have never happened?

Ryan M Hurst

Jun 4, 2016, 1:26:45 PM
to Certificate Transparency Policy

Ryan M Hurst

Jun 4, 2016, 2:09:12 PM
to Certificate Transparency Policy
It appears my reply to Rich's email was somehow truncated.

Rich asked "So do you think that absent Peter's email the disclosure would have never happened?"

No, I do not believe that is the case. This issue was reported by Rob before I joined the team. Rob and I had discussed it once I joined, but it had slipped through the cracks. It was still on my to-do list, and Rob and I had discussed it as recently as a month prior.

It was entirely my oversight that this did not go out sooner but it would have gone out.

Ryan



On Saturday, June 4, 2016 at 9:41:35 AM UTC-7, Ryan M Hurst wrote:

Richard Salz

Jun 4, 2016, 2:47:03 PM
to Ryan M Hurst, Certificate Transparency Policy
On Sat, Jun 4, 2016 at 2:09 PM, 'Ryan M Hurst' via Certificate Transparency Policy <ct-p...@chromium.org> wrote:
It was entirely my oversight that this did not go out sooner but it would have gone out.


Thank you; I accept that.

Matt Palmer

Jun 5, 2016, 1:51:48 AM
to Certificate Transparency Policy
On Sat, Jun 04, 2016 at 09:41:34AM -0700, 'Ryan M Hurst' via Certificate Transparency Policy wrote:
> On November 11th (1 <https://crt.sh/?id=10663251>,2
> <https://crt.sh/?q=10665866>) and 12th <https://crt.sh/?id=10735477> of
> 2015 the Google ‘pilot’ and ‘aviator’ logs logged a total of three
> certificates with invalid signatures. This represented a violation of
> section 3.1 of RFC 6962 <https://tools.ietf.org/html/rfc6962> which states:
>
> Logs MUST verify that the submitted end-entity certificate or
> Precertificate has a valid signature chain leading back to a trusted root
> CA certificate, using the chain of intermediate CA certificates provided by
> the submitter.

Given that these logs have failed to abide by a MUST criterion in RFC6962, does
Chromium intend to remove these logs from the trusted set? If not, why not?

- Matt

Ryan Sleevi

Jun 6, 2016, 8:39:57 PM
to Matt Palmer, Certificate Transparency Policy
On Sat, Jun 4, 2016 at 10:51 PM, Matt Palmer <mpa...@hezmatt.org> wrote:
Given that these logs have failed to abide by a MUST criterion in RFC6962, does
Chromium intend to remove these logs from the trusted set?  If not, why not?

At the risk of forking threads, I posted a broader response that tried to explain more the policy and motivations at https://groups.google.com/a/chromium.org/d/msg/ct-policy/AH9JHYDljpU/f4I9vQLACwAJ

My inclination is that, similar to the failures of Symantec, DigiCert, and Venafi, this doesn't rise to the level of a security concern that would directly affect Chrome users' security, nor the ecosystem as a whole. I'm curious, however, if there's a perspective that I've failed to consider.

Richard Salz

Jun 7, 2016, 9:34:18 AM
to Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
It is your browser, you give it away, you can do what you want.

I haven't seen anything in the policy, nor discussions, that allows any latitude.

I believe RFC non-compliance is a greater issue than some downtime that other logs have had.

In the interests of transparency, you should update the policy to include the fact that value judgements will come into play if that is, in fact, the case.

Ryan Sleevi

Jun 7, 2016, 10:57:22 AM
to Richard Salz, Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
On Tue, Jun 7, 2016 at 6:34 AM, Richard Salz <rich...@gmail.com> wrote:
It is your browser, you give it away, you can do what you want.

I haven't seen anything in the policy, nor discussions, that allows any latitude.

There's an entire section in the policy dedicated to it, as I pointed out. Perhaps you're reading a different policy?
 
I believe RFC non-compliance is a greater issue than some downtime that other logs have had.

On what basis?
 
In the interests of transparency, you should update the policy to include the fact that value judgements will come into play if that is, in fact, the case.

It already says that. Rather extensively. In fact, there's a whole section called "Policy Violations" - https://sites.google.com/a/chromium.org/dev/Home/chromium-security/certificate-transparency/log-policy

Richard Salz

Jun 7, 2016, 11:39:26 AM
to Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
You are right, there is a section on Policy violations that I had forgotten about.  Its three sentences say exactly what I was asking for.  I apologize.

The non-compliance can never be expunged from the logs.  Some missing uptime can be corrected. What was the true effect of the latter?  And what were the measurements?

Ryan Sleevi

Jun 7, 2016, 11:54:59 AM
to Richard Salz, Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
On Tue, Jun 7, 2016 at 8:39 AM, Richard Salz <rich...@gmail.com> wrote:
You are right, there is a section on Policy violations that I had forgotten about.  Its three sentences say exactly what I was asking for.  I apologize.

The non-compliance can never be expunged from the logs.  Some missing uptime can be corrected. What was the true effect of the latter?  And what were the measurements?

As explained on the other thread about the reasoning behind this, uptime has security impact:

- A significant downtime event can cause an MMD to be blown.
- Intermittent downtime can be used to mask split log views. Without any uptime requirement at all, a log can simply deal with requests to get an inclusion proof by calling it "network failure", and thus fail to provide a cryptographic commitment that they've violated policy.
 
Uptime also has ecosystem impact:

- Clients have less reliability to check inclusion proofs of SCTs into STHs.
- Monitors and Auditors have additional work to deal with outages. Put differently, there's no recourse for a flaky, unreliable log - which thus is not useful for increasing public trust, if it can't be reliably used.
- CAs have to deal with added sources of inconsistency and flake in submissions, increasing issuance time and adding to the overheads of logging.


As I also explained, on the other thread, in this particular case, the non-compliance is no different than if the log had accepted an arbitrary root. That is, consider if the Google logs had allowed "Ryan's Really Awesome Root" to be added to the set of known CAs, which was a key I personally controlled. I could have logged certificates for google.com, facebook.com, akamai.com, etc. However, because "Ryan's Really Awesome Root" is not trusted on any platforms, the mere fact that such log entries exist is not enough. In order to effectively monitor, Monitors need to be doing post-processing on entries - namely, evaluating the certificate chain to see if it roots in a CA that the Monitor cares about (and isn't revoked, has the right EKUs, etc). As such, even with these non-compliant entries, any well-written Monitor will be unaffected by this - it's indistinguishable in effect from the "Ryan's Really Awesome Root" case. And the policy, as it stands today, doesn't prevent logs from accepting "Ryan's Really Awesome Root". And I know of logs that do something similar today (and having said that, you can easily figure out which), which is proof again that Monitors already need to be doing that signature and policy evaluation.
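The post-processing step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `LogEntry`, `ROOTS_OF_INTEREST`, and the fingerprint strings are hypothetical, and the `chain_verifies` flag stands in for the Monitor's own signature validation (revocation and EKU checks are omitted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    dns_name: str           # name the certificate claims to cover
    root_fingerprint: str   # root that the submitted chain ends at
    chain_verifies: bool    # result of the Monitor's OWN signature checks


# Hypothetical fingerprints of the roots this Monitor cares about.
ROOTS_OF_INTEREST = {"fp-public-root-1", "fp-public-root-2"}


def entries_worth_alerting(entries, watched_names):
    """Keep entries that verify, chain to a root of interest, and name a watched domain."""
    return [
        e for e in entries
        if e.chain_verifies
        and e.root_fingerprint in ROOTS_OF_INTEREST
        and e.dns_name in watched_names
    ]


entries = [
    LogEntry("google.com", "fp-public-root-1", True),              # genuinely relevant
    LogEntry("google.com", "fp-ryans-really-awesome-root", True),  # untrusted root
    LogEntry("google.com", "fp-public-root-1", False),             # invalid signature
]
alerts = entries_worth_alerting(entries, {"google.com"})
print(len(alerts))  # 1
```

Under this filter an entry with an invalid signature and an entry chaining to an untrusted root are dropped in the same way, which is the equivalence the argument relies on.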

So that's why I see the non-compliance as a non-issue, whereas uptime directly ties to security and ecosystem issues that the policy is intentionally trying to address.

Richard Salz

Jun 7, 2016, 12:25:22 PM
to Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
I understand the importance of uptime and its potential impact.  Without seeing detailed reports, I do not believe that the downtime for all those logs was impactful.  Your software, your rules, YMMV.

Even if MMD were blown, once, is it really any worse than non-compliant certs in a log?  Again, see above.

Ryan Sleevi

Jun 7, 2016, 12:27:42 PM
to Richard Salz, Ryan Sleevi, Matt Palmer, Certificate Transparency Policy


On Tue, Jun 7, 2016 at 9:25 AM, Richard Salz <rich...@gmail.com> wrote:
I understand the importance of uptime and its potential impact.  Without seeing detailed reports, I do not believe that the downtime for all those logs was impactful.  Your software, your rules, YMMV.

Even if MMD were blown, once, is it really any worse than non-compliant certs in a log?  Again, see above.

Richard,

Without understanding why you believe non-compliant certs in a log are bad - a point I don't feel you've articulated or responded to - I can't really answer your question. If it was rhetorical, well, I do hope you can elaborate a bit more on the point you're trying to make.

Ben Laurie

Jun 8, 2016, 9:23:14 AM
to Richard Salz, Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
The logs are absolutely full of non-compliant certs, and this is a
virtue: https://crt.sh/?x509lint=1+week.

Richard Salz

Jun 8, 2016, 10:14:48 AM
to Ben Laurie, Ryan Sleevi, Matt Palmer, Certificate Transparency Policy
Okay, thanks Ben.  Not sure it's a benefit, but if it's BAU, then so be it.

Rob Stradling

Jun 9, 2016, 10:13:56 AM
to ct-p...@chromium.org
On 04/06/16 17:41, 'Ryan M Hurst' via Certificate Transparency Policy wrote:
<snip>
> IMPACT:
>
> In November of 2015 three certificates with invalid signatures were
> included in the ‘Pilot’ and ‘Aviator’ logs.

Hi Ryan. Nitpicking your counting...

If "included" means "added to the Merkle tree", then:
- three certs with invalid signatures ([2], [3] and [6]) were
included in the 'Pilot' log.
- two certs with invalid signatures ([2] and [3]) were included in
the 'Aviator' log.

Or, if "included" also includes the CA certificates that are returned in
get-entries "extra_data" fields, then:
- six certs with invalid signatures ([1]..[6]) were included in the
'Pilot' log.
- three certs with invalid signatures ([1], [2] and [3]) were included
in the 'Aviator' log.

(Either way, your count of "three certificates...in the 'Pilot' and
'Aviator' logs" seems imprecise. ;-) )
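The two ways of counting can be tallied in a small sketch. The numbers are the reference numbers [1]..[6] used in this message; the split into "only in extra_data" is derived by subtracting the first list from the second, and the variable names are purely illustrative.

```python
# Reference numbers of certs that were added to each log's Merkle tree.
in_merkle_tree = {"Pilot": {2, 3, 6}, "Aviator": {2, 3}}

# Reference numbers that appear only in get-entries "extra_data" (derived by subtraction).
only_in_extra_data = {"Pilot": {1, 4, 5}, "Aviator": {1}}

counts = {}
for log in in_merkle_tree:
    with_extra = in_merkle_tree[log] | only_in_extra_data[log]
    counts[log] = (len(in_merkle_tree[log]), len(with_extra))
    print(f"{log}: {counts[log][0]} in the Merkle tree, "
          f"{counts[log][1]} including extra_data")
```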

> As a result of this defect an attacker would have potentially been able
> to SPAM the logs which could have negatively affected log availability.

Indeed. That's why (on 16th November 2015) I chose to disclose this
problem to the Google CT Team privately. After the bug was fixed (super
quickly - thanks!), I felt it would be inappropriate for me to disclose
it publicly given that Adam E had promised to do an incident report.

Given the tardiness of the incident report, I'm wondering if there's
anything I should've done differently. Perhaps I should've done a
public disclosure after 90 days (a la Project Zero). Or perhaps I
should've disclosed it publicly in the first place, as I have done most
other times I've encountered problems or unexpected behaviour with logs
(e.g. [7], [8], [9] and [10]).

ISTM that it would be really useful to have some documented guidance on
how to disclose problems with logs!


[1] https://crt.sh/?id=10663250
[2] https://crt.sh/?id=10663251
[3] https://crt.sh/?id=10665866
[4] https://crt.sh/?id=10735475
[5] https://crt.sh/?id=10735476
[6] https://crt.sh/?id=10735477
[7]
https://groups.google.com/d/msg/certificate-transparency/39CnMs_4ZsY/-yKISz3uCwAJ
[8]
https://groups.google.com/a/chromium.org/d/msg/ct-policy/Ij8K3jLIcs4/BBpVjRaDeHwJ
[9]
https://groups.google.com/a/chromium.org/forum/#!topic/ct-policy/F7o4SXfpWek
[10]
https://groups.google.com/forum/#!msg/certificate-transparency/RwR22ORN76g/CDmctWZCu3QJ

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online

Ryan Hurst

Jun 9, 2016, 12:18:07 PM
to Rob Stradling, ct-p...@chromium.org
Rob,

I apologize for the lack of precision in my response; I was trying to provide an accurate but simple summation of the incident and in the process left some of the details out. Thank you for the more detailed explanation.

As for the appropriate process to follow, we appreciate that you notified us immediately, which allowed us to quickly fix the issue. The delayed publication of the incident report was an artifact of a failed handover between Adam and me, and nothing you had done.

We will discuss internally publishing guidance on how to handle notifications in the future to remove any ambiguity that may exist.

Ryan





--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To post to this group, send email to ct-p...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/fed5588a-e40b-f0d6-139d-d15d3e80067f%40comodo.com.

Rob Stradling

Jun 9, 2016, 3:53:15 PM
to Ryan Hurst, ct-p...@chromium.org
On 09/06/16 17:18, Ryan Hurst wrote:
> Rob,
>
> I apologize for the lack of precision in my response; I was trying to
> provide an accurate but simple summation of the incident and in the
> process left some of the details out. Thank you for the more detailed
> explanation.

Hi Ryan. No problem. And BTW, I think you've already apologized
enough. ;-)

> As for the appropriate process to follow, we appreciate that you
> notified us immediately which allowed us to quickly fix the issue. The
> delayed publication of the incident report was an artifact of failed
> handover between Adam and me and nothing you had done.

I'm just interested in what lessons we can all learn from this
incident/response. Whilst it's possible that yours was the first
"failed handover" in the history of CT, I'm sure it won't be the last!

> We will discuss internally publishing guidance on how to handle
> notifications in the future to remove any ambiguity that may exist.

Thanks.

Ryan Sleevi

Jun 9, 2016, 4:05:08 PM
to Rob Stradling, Ryan Hurst, ct-p...@chromium.org
On Thu, Jun 9, 2016 at 12:53 PM, Rob Stradling <rob.st...@comodo.com> wrote:
I'm just interested in what lessons we can all learn from this incident/response.  Whilst it's possible that yours was the first "failed handover" in the history of CT, I'm sure it won't be the last!

I think you've posed a good set of questions, namely what responsible disclosure looks like for a CT log. The answer to that seems to depend on how critical the CT log is for the security of something, and whether or not failures can be remediated.

For example, consider a vulnerability that allows an SCT to be issued for an entry that isn't incorporated within the MMD. That's discoverable within 24 hours - and is serious enough to be grounds for disqualifying the log (as we've seen) - so does a 90 day policy benefit anyone? [Concretely: Consider Izenpe]
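The reason such a failure is discoverable within a day can be sketched as a simple check. This is a minimal sketch: the 24-hour MMD is taken from the figure above, while the function name, the `included` flag (standing in for a real inclusion-proof lookup), and the example timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Maximum Merge Delay the log has committed to (assumed to be 24 hours here).
MMD = timedelta(hours=24)


def mmd_blown(sct_time: datetime, checked_at: datetime, included: bool) -> bool:
    """True if the entry is still missing after the log's promised merge deadline."""
    return (not included) and checked_at > sct_time + MMD


# Hypothetical SCT timestamp, used only for illustration.
issued = datetime(2016, 6, 1, 12, 0, tzinfo=timezone.utc)

print(mmd_blown(issued, issued + timedelta(hours=23), included=False))  # False
print(mmd_blown(issued, issued + timedelta(hours=25), included=False))  # True
print(mmd_blown(issued, issued + timedelta(hours=25), included=True))   # False
```

Anyone holding the SCT can run this kind of check once the deadline passes, so a long private disclosure window adds little for this class of failure.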

What about an issue that causes SCTs to be malformed? Is that a security issue or an operations issue? Will the ecosystem be healthy if the matter isn't discovered until 90 days after - when perhaps thousands or hundreds of thousands of invalid SCTs have been issued? [Concretely: Consider Symantec]

What about an issue that causes a log's spam protections to fail? Will public disclosure increase the risk that the log will be spammed into failure, or that, by spamming the log, monitors would be forced to download needless data? Does the system try to mitigate the issue already - or is it left up to logs (as spam reduction presently is, effectively)? [Concretely: Consider Google]

It should come as no surprise that I'm generally fond of radical transparency, with reasoned judgement when appropriate, so I lean to a path that encourages people to disclose privately, but also feel it's reasonable to disclose publicly relatively quickly. The goal, long-term of course, is that CT itself is not a critical security function per se, but a transparency issue, and that any issues such as removing or distrusting logs can have the ecosystem already robust enough to handle (such as including multiple SCTs for precerts in the final cert) or reasonably quickly transition to correct (OCSP stapling, TLS embedding). These help reduce the time necessary for responsible disclosure to perhaps hours or days, rather than weeks and months.

This is something I frequently think about and debate with various people here, but I haven't been as open about the thoughts about CT that keep me up at night - and the ways it can fail. Are there other scenarios, either as currently implemented or within the end goals of CT, that we should think about? The types of failures that can happen, and how they might need to be handled - and what risks public disclosure would create?

Richard Salz

Jun 9, 2016, 4:38:40 PM
to Ryan Sleevi, Rob Stradling, Ryan Hurst, ct-p...@chromium.org
The business/branding aspects are very important.  Our customers want the EV/"green bar" indication.


Ryan Sleevi

Jun 9, 2016, 4:41:54 PM
to Richard Salz, Ryan Sleevi, Rob Stradling, Ryan Hurst, ct-p...@chromium.org
I'm not sure I understand what you were replying to or how it relates, but I'm probably missing something important. Could you explain further about how it relates to the topic of how to disclose log issues, and more importantly, how those log issues impact the ecosystem? I'm not trying to be snarky, but I'm at a total loss for how your statement is related.

Rob Stradling

Jun 9, 2016, 5:41:42 PM
to rsl...@chromium.org, Ryan Hurst, ct-p...@chromium.org
On 09/06/16 21:04, Ryan Sleevi wrote:
Radical transparency sounds good to me. :-)

> This is something I frequently think about and debate with various
> people here, but I haven't been as open about the thoughts about CT that
> keep me up at night - and the ways it can fail. Are there other
> scenarios, either as currently implemented or within the end goals of
> CT, that we should think about? The types of failures that can happen,
> and how they might need to be handled - and what risks public disclosure
> would create?

Other scenarios that immediately come to mind:

- Some of the "extra data" (which the log does not sign or
incorporate into its Merkle Tree but is expected to provide upon
request) is missing (Concretely: Consider Alpha).

- Uptime < the required threshold (Concretely: Consider (1) WoSign's
first log and (2) Certly).

- Slow responses - how slow is too slow? (Concretely: Consider (1)
The Venafi log, which in my experience is noticeably slower than other
logs at handling add-(pre-)chain calls, and (2) traceroute says that
ct.googleapis.com is 350ms away from Comodo's CA systems, whereas
ct-fixed-ip.googleapis.com is only 6ms away - we've already been waiting
weeks for Google's Network Operations "GEO mapping team" to fix this)

- Log private key known or believed to have been compromised. (Hasn't
happened yet AFAIK!)

Iñigo Barreira

Jun 13, 2016, 2:39:28 AM
to Rob Stradling, Ryan Sleevi, Ryan Hurst, ct-p...@chromium.org
Depending on the failure, the reaction time to fix the issue can also be important. It's not the same to fix it the same day as 3 weeks later, and relying on the "impact" or the "importance" of the issue leads to a subjective result.


Nick Lamb

Jun 13, 2016, 9:54:26 AM
to Certificate Transparency Policy
I tried quoting but the Google Groups interface is so reprehensibly awful I gave up.

Rob, FWIW, ninety days would make sense to me, but the same real world we live in that may make an otherwise responsible organisation like Google fail to respond in ninety days could also see you forget to do anything after the deadline expires. In light of that, just announcing publicly may be more practical, particularly whenever the thing you're publishing doesn't seem to represent a real immediate danger to the web PKI or to security generally. Just one less thing to remember that way.

Eric Mill

Jun 13, 2016, 10:01:15 AM
to Nick Lamb, Certificate Transparency Policy
Well, Google would be a lot more likely to respond in 90 days if it was explicitly understood that people reporting issues were giving them 90 days. In this case, everything was handled informally by people who knew each other professionally, which perhaps contributed to it being treated too casually. I don't think you can conclude from this experience that a formal reporting window wouldn't work.

