
Google OCSP service down


s...@gmx.ch

Jan 21, 2018, 10:07:37 AM
to dev-secur...@lists.mozilla.org
Hi

Google serves me the certificate [1] for *.google.com,
*.youtube.com and other major services.
However, its OCSP service [2] does not work for me. I verified this from
multiple locations, machines, OSes and versions of Firefox. Furthermore,
I used SSL Labs [3] and the status on crt.sh [1] to verify. AFAIK other
browsers don't support hard-fail OCSP checking at all. However, curl does:

$ curl --version
curl 7.57.0 (x86_64-pc-linux-gnu) libcurl/7.57.0 OpenSSL/1.1.0g
zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) libssh2/1.8.0
nghttp2/1.29.0
Release-Date: 2017-11-29
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s
rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM
NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
$ curl --cert-status https://www.google.com
curl: (91) No OCSP response received

I have been monitoring this issue for some hours, and it's quite surprising
that Google has not fixed it yet. The OCSP service is not listed on their app
status board [4], and I failed to find any way to contact Google directly
about this issue. The Google PKI does not fit any category in the contact
forms I found, and the category "other" always refers to FAQs or similar.
It's also a single point of failure, since all Google services are signed
by the Google PKI, which (if you are strict) cannot be fully trusted
without a valid OCSP response...

Can somebody confirm this issue? You can easily check by flipping the
"security.OCSP.require" pref to true in about:config (Firefox),
or by using curl.
Is there a known contact to report it (or is someone with a Google hat
reading this anyway)?
Is there any plan for when a CA fails for whatever reason and cannot be
contacted anymore, because all of its services are signed by itself?
In the case of Google, its sites are also preloaded and pinned in all (modern)
browsers, so it would be very hard to bypass (for good reasons) if Google
had a serious issue in its PKI.



[1] https://crt.sh/?id=299058714&opt=ocsp
[2] http://clients1.google.com/ocsp
[3] https://www.ssllabs.com/ssltest/analyze.html?d=google.com&s=2607%3af8b0%3a4005%3a80a%3a0%3a0%3a0%3a200e
[4] https://www.google.com/appsstatus


Paul Kehrer

Jan 21, 2018, 10:47:20 AM
to dev-secur...@lists.mozilla.org
I can confirm that the endpoint embedded in the certificate
(http://clients1.google.com/ocsp) is giving a 404 to OCSP requests at this
time. crt.sh's OCSP monitoring page also shows this.

-Paul

On January 21, 2018 at 9:07:48 AM, sjw--- via dev-security-policy (
_______________________________________________
dev-security-policy mailing list
dev-secur...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy

David E. Ross

Jan 21, 2018, 11:13:30 AM
to mozilla-dev-s...@lists.mozilla.org
On 1/21/2018 7:47 AM, Paul Kehrer wrote:
> Is there a known contact to report it (or is someone with a Google hat
> reading this anyway)?

On Friday (two days ago), I reported this to dns-...@google.com, the
only E-mail address in the WhoIs record for google.com.

I received an automated reply indicating that security issues should
instead be reported to secu...@google.com. I immediately resent
(Thunderbird's Edit As New Message) to secu...@google.com.

I then received an automated reply from secu...@google.com that listed
a variety of Web addresses for reporting various problems. I replied
via E-mail to secu...@google.com:
> Because of the OCSP failure, I am unable to reach any of the google.com
> Web site cited in your reply.

Yes, I could disable OCSP checking. But my need for Google is
insufficient to justify browsing insecurely.

By the way, in SeaMonkey 2.49.1 (the latest version) the Google Internet
Authority G2 certificate appears to be an intermediate, signed by the
GeoTrust Global CA root.

There is a pending request (bug #1325532) from Google to add a Google
root certificate to NSS. Given the inadequacy of Google's current
information on reporting security problems, I have doubts whether this
request should be approved.

See <https://bugzilla.mozilla.org/show_bug.cgi?id=1325532>.

--
David E. Ross
<http://www.rossde.com/>

President Trump: Please stop using Twitter. We need
to hear your voice and see you talking. We need to know
when your message is really your own and not your attorney's.

Ryan Hurst

Jan 21, 2018, 11:42:08 AM
to mozilla-dev-s...@lists.mozilla.org
We are investigating the issue and will provide an update when that investigation is complete.

Thank you for letting us know.

Ryan Hurst
Product Manager
Google

Ryan Sleevi

Jan 21, 2018, 12:51:36 PM
to David E. Ross, mozilla-dev-s...@lists.mozilla.org
On Sun, Jan 21, 2018 at 11:12 AM David E. Ross via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> On 1/21/2018 7:47 AM, Paul Kehrer wrote:
> > Is there a known contact to report it (or is someone with a Google hat
> > reading this anyway)?
>
> On Friday (two days ago), I reported this to dns-...@google.com, the
> only E-mail address in the WhoIs record for google.com.


I couldn’t find that listed in the CP/CPS as where to report problems.
Instead, I see a different email listed.

What made you decide to ignore the CP/CPS, which is where CAs list their
problem reporting mechanisms?

Given that a CA’s CP/CPS applies to their hierarchy and issuance practices,
not a single certificate, and given that past discussions on this list have
specifically called out the CP/CPS as the place to determine problem
reporting mechanisms, it does seem unreasonable to expect arbitrary
reporting mechanisms to get the same attention as the defined mechanisms.



David E. Ross

Jan 21, 2018, 2:09:40 PM
to mozilla-dev-s...@lists.mozilla.org
On 1/21/2018 9:50 AM, Ryan Sleevi wrote:
> I couldn’t find that listed in the CP/CPS as where to report problems.
> Instead, I see a different email listed.
>
> What made you decide to ignore the CP/CPS, which is where CAs list their
> problem reporting mechanisms?
>
> Given that a CA’s CP/CPS applies to their hierarchy and issuance practices,
> not a single certificate, and given that past discussions on this list have
> specifically called out the CP/CPS as the place to determine problem
> reporting mechanisms, it does seem unreasonable to expect arbitrary
> reporting mechanisms to get the same attention as the defined mechanisms.

At the time I tried reporting the problem, I forgot that Google had a
pending request to add its root to NSS. When I checked the Certificate
Manager list of Authorities in my browser, Google did not appear.

In any case, this OCSP problem still makes me question Google's ability
to manage a certification authority. As a prior reply in this thread
indicates, it took two days for Google to even acknowledge there is a
problem.

As of right now, it appears the problem has been fixed. With both
checkboxes checked under OCSP at [Edit > Preferences > Privacy &
Security > Certificates], I am now able to reach Google Web sites.

Ryan Hurst

Jan 21, 2018, 3:00:10 PM
to mozilla-dev-s...@lists.mozilla.org

>
> We are investigating the issue and will provide an update when that investigation is complete.
>
> Thank you for letting us know.
>
> Ryan Hurst
> Product Manager
> Google

I wanted to provide an update to the group. The issue has been identified, and a rollout of the fix is in progress across all geographies.

I have personally verified the fix in several geographies.

A post mortem will be created and shared with the group as soon as it is ready.

Ryan Hurst

Jan 21, 2018, 3:09:33 PM
to mozilla-dev-s...@lists.mozilla.org
> > Is there a known contact to report it (or is someone with a Google hat
> > reading this anyway)?
>

David,

I am sorry you experienced difficulty in contacting us about this issue.

We maintain contact details both within our CPS (like other CAs) and at https://pki.goog so that people can reach us expeditiously. In the future if anyone needs to reach us please use those details.

Google is a large organization and when other teams are contacted (such as DNS) we do not have control over when and if those issues will reach us.

We are actively working on a post mortem on this issue and when it is complete we will share it in this thread.

Thanks for your help in this matter,

Hanno Böck

Jan 21, 2018, 4:00:44 PM
to dev-secur...@lists.mozilla.org, Ryan Hurst
Hi,

On Sun, 21 Jan 2018 12:09:23 -0800 (PST)
Ryan Hurst via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> We maintain contact details both within our CPS (like other CAs) and
> at https://pki.goog so that people can reach us expeditiously. In the
> future if anyone needs to reach us please use those details.

I just tried to see what I'd do if I wanted to report issues with
Google's CA (assuming I don't know where its webpage lives and assuming
I don't know any Googlers to report this directly).

When I look into the cert details the certificates for Google webpages
are issued by
"Google Internet Authority G2"

If I google for that, I end up at
https://pki.google.com/

This page has a similar style to pki.goog, but notably it doesn't
list any contact info. It has an FAQ, but that doesn't have any
question of the form "How do I report a problem with your CA?"

The only thing that might be helpful is a pointer to report security
incidents. I'd probably have done that, though I would be unsure, as
it's debatable whether an offline OCSP counts as a security issue.


Meta-comment:

I think the whole CA incident reporting question has lots of room for
improvement. And I think this should be considered in a way that people
who are not familiar with the details of the CA ecosystem can
successfully report incidents. I.e. saying "you can find all the
contact info in our CPS" is not particularly helpful, as nobody outside
a small circle of people knows what that is.
I think if people try the "natural" way of contacting a certificate
issuing entity this should lead to a successful outcome. (And that is
more or less "This has been issued by X, so I try to contact X".)

--
Hanno Böck
https://hboeck.de/

mail/jabber: ha...@hboeck.de
GPG: FE73757FA60E4E21B937579FA5880072BBB51E42

Ryan Sleevi

Jan 21, 2018, 4:01:41 PM
to mozilla-dev-s...@lists.mozilla.org
On Sun, Jan 21, 2018 at 2:08 PM David E. Ross via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> On 1/21/2018 9:50 AM, Ryan Sleevi wrote:
> > I couldn’t find that listed in the CP/CPS as where to report problems.
> > Instead, I see a different email listed.
> >
> > What made you decide to ignore the CP/CPS, which is where CAs list their
> > problem reporting mechanisms?
> >
> > Given that a CA’s CP/CPS applies to their hierarchy and issuance
> practices,
> > not a single certificate, and given that past discussions on this list
> have
> > specifically called out the CP/CPS as the place to determine problem
> > reporting mechanisms, it does seem unreasonable to expect arbitrary
> > reporting mechanisms to get the same attention as the defined mechanisms.
>
> At the time I tried reporting the problem, I forgot that Google had a
> pending request to add its root to NSS. When I checked the Certificate
> Manager list of Authorities in my browser, Google did not appear.


I’m not sure I see the relevance of this. Regardless of whether or not a CA
is pending inclusion, there is a defined mechanism for problem reporting,
provided in the CP/CPS. The Mozilla CCADB disclosures list the applicable
CP/CPS.

Whatever other criticisms you may make, and I would say this regardless of
the CA it affected, you used an ad-hoc reporting mechanism rather than any
defined problem reporting mechanism, and so the failure to respond to that
points less to the CA’s failure than to the reporter’s.

> In any case, this OCSP problem still makes me question Google's ability
> to manage a certification authority. As a prior reply in this thread
> indicates, it took two days for Google to even acknowledge there is a
> problem.


This framing continues to adopt your misreporting of the date of report (in
order to beget acknowledgement). I agree that a full incident response is
warranted, but I do find it somewhat surprising that the basis of your
conclusion seems to be, from your previous remarks, predicated on a failure
to acknowledge your non-standard, ad-hoc problem report. I can understand
you may “have questions,” but absent details, and in light of your own
misunderstandings, I wonder whether your judgement is premature.


>
> As of right now, it appears the problem has been fixed. With both
> checkboxes checked under OCSP at [Edit > Preferences > Privacy &
> Security > Certificates], I am now able to reach Google Web sites.
>

Ryan Sleevi

Jan 21, 2018, 4:15:23 PM
to Hanno Böck, dev-secur...@lists.mozilla.org, Ryan Hurst
To be honest, I think I find myself agreeing with other CAs when I question
whether that should be or necessarily is a goal.

If you’ve been on an inbound bug queue for virtually any product
(particularly popular ones), you will be amazed at the (lack of) quality
reports. Just search the Mozilla or Chromium bug trackers for “my computer
has been hacked” to see a variety of bugs from people most likely suffering
from one or more mental disorders, unfortunately, to see how bad it can be.

Add to that the complexity of PKI, and the contractual obligations of
responsiveness, and it becomes quite different. Talk to existing CAs that
provided email links to problem reporting mechanisms (prior to Mozilla’s
requiring they do so) and hear about the spam. I know of problem reports
from Google to other CAs that have similarly been caught by the spam
filters designed to ensure high signal.

Combined with the spectrum of technical acumen we see, even here, or
through contributions from Interested Parties to the CA/Browser Forum, and
I suspect that highlighting the reporting mechanism even more would vastly
increase the noise, rather than the signal, and thus do more harm than good.

Normalizing problem reporting - meaning that reporters have to do more work
to align their reports into actionable data - conversely increases the
barrier to submission but reduces the barrier to action. Is it an equitable
tradeoff? It may be.

Something to ponder, however, as easier does not necessarily mean better.


s...@gmx.ch

Jan 21, 2018, 4:29:58 PM
to dev-secur...@lists.mozilla.org
Hi

Thanks for investigating.

First of all, my previous curl command is not suitable for verifying an
OCSP status. It only works for OCSP stapling, which is not supported by
Google servers.
You may use openssl ocsp instead:

openssl ocsp -issuer [GoogleInternetAuthorityG2.crt] -cert [googlecom.crt] -url http://clients1.google.com/ocsp -resp_text -header HOST=clients1.google.com
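Note that the `-header` syntax of `openssl ocsp` differs between releases: the `HOST=clients1.google.com` form above is the single `name=value` argument used by OpenSSL 1.1.0 and later, while OpenSSL 1.0.x and earlier expect the header name and value as two separate arguments. The Python sketch below assembles either form; the function name, the `legacy_header_syntax` flag and the certificate file names are illustrative assumptions, not part of any tool.

```python
from urllib.parse import urlsplit


def openssl_ocsp_argv(issuer, cert, responder_url, legacy_header_syntax=False):
    """Build an `openssl ocsp` command line (as a list) that sends a Host header.

    `issuer` and `cert` are paths to PEM files. This helper is hypothetical.
    """
    host = urlsplit(responder_url).netloc  # e.g. "clients1.google.com"
    argv = [
        "openssl", "ocsp",
        "-issuer", issuer,
        "-cert", cert,
        "-url", responder_url,
        "-resp_text",
    ]
    if legacy_header_syntax:
        # OpenSSL 1.0.x and earlier: header name and value are separate arguments.
        argv += ["-header", "Host", host]
    else:
        # OpenSSL 1.1.0 and later: one "name=value" argument.
        argv += ["-header", f"Host={host}"]
    return argv
```

Passing the resulting list to `subprocess.run(..., check=True)` would run the query; the file names are placeholders for the issuer and leaf certificates you saved locally.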

I can confirm that the service is now working again for me most of the
time, but some queries still fail (maybe due to load balancing in the
backend?).


Am 21.01.2018 um 22:00 schrieb Hanno Böck via dev-security-policy:
> If I google for that I end up at https://pki.google.com/ This page has
> a similar style as the pki.goog, but notably it doesn't list any
> contact info. It has an FAQ, but that doesn't have any question of the
> form "How do I report a problem with your CA?" The only thing that
> might be helpful is a pointer to report security incidents. I'd
> probably have done that, though I would be unsure, as it's debatable
> whether an offline OCSP counts as a security issue.

I ended up in the same situation. But "OCSP is down" does not fit
any category on the vulnerability report site, and the category "other"
only provides support articles.


Ryan Hurst

Jan 21, 2018, 4:42:59 PM
to mozilla-dev-s...@lists.mozilla.org
On Sunday, January 21, 2018 at 1:29:58 PM UTC-8, s...@gmx.ch wrote:
> Hi
>
> Thanks for investigating.
>
> I can confirm that the service is now working again for me most of the
> time, but some queries still fail (may be due load balancing in the
> backend?).
>

Thank you for your report and for confirming that things are starting to work for you.

Google operates a global network of many redundant servers; because of how that works, from one connection to the next you may be hitting a different cluster of servers.

It can take a while for all of these clusters to receive the associated updates.

This would explain your inconsistent results.

I am actively watching this deployment to ensure it completes successfully, but at this point it seems everything will continue to roll out as expected.

As an aside, we are still working on our post-mortem.

Ryan Hurst

Jan 21, 2018, 6:01:46 PM
to mozilla-dev-s...@lists.mozilla.org
The issue should be 100% resolved now.

As per earlier posts, we will complete the post-mortem and report to the community with our findings.

ihave...@gmail.com

Jan 22, 2018, 4:26:01 AM
to mozilla-dev-s...@lists.mozilla.org
Hi,

Just as an FYI, I am still getting 404. My geographic location is UAE if that helps at all.

My openssl command:
openssl ocsp -issuer gtsx1.pem -cert goodr1demopkigoog.crt -url http://ocsp.pki.goog/GTSGIAG3 -CAfile gtsrootr1.pem
Error querying OCSP responder
77317:error:27075072:OCSP routines:PARSE_HTTP_LINE1:server response error:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-59.60.1/src/crypto/ocsp/ocsp_ht.c:224:Code=404,Reason=Not Found

Kind regards,
Tham Wickenberg

Ryan Hurst

Jan 22, 2018, 11:31:55 AM
to mozilla-dev-s...@lists.mozilla.org
On Monday, January 22, 2018 at 1:26:01 AM UTC-8, ihave...@gmail.com wrote:
> Hi,
>
> Just as an FYI, I am still getting 404. My geographic location is UAE if that helps at all.
>
> My openssl command:
> openssl ocsp -issuer gtsx1.pem -cert goodr1demopkigoog.crt -url http://ocsp.pki.goog/GTSGIAG3 -CAfile gtsrootr1.pem
> Error querying OCSP responder
> 77317:error:27075072:OCSP routines:PARSE_HTTP_LINE1:server response error:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-59.60.1/src/crypto/ocsp/ocsp_ht.c:224:Code=404,Reason=Not Found

Tham,

It seems you are not specifying the Host header, which is required by HTTP/1.1, which in turn is required by RFC 2560:

Here is what a command for that root would look like:
openssl ocsp -issuer r1goodissuer.cer -cert r1good.cer -no_nonce -text -url "http://ocsp.pki.goog/GTSGIAG3" -header host ocsp.pki.goog
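The Host header matters because the responder is reached over plain HTTP. As a sketch of what goes over the wire, the Python below builds the two request forms described in RFC 2560 Appendix A: a POST whose body is the DER-encoded OCSPRequest with Content-Type `application/ocsp-request`, and a GET whose path is the URL-encoded base64 of that DER. The function names are hypothetical, and the DER request is assumed to come from elsewhere (for example, `openssl ocsp -reqout` can write one to a file).

```python
import base64
from urllib.parse import quote, urlsplit


def ocsp_post_request(responder_url: str, der_request: bytes) -> bytes:
    """Raw HTTP/1.1 POST carrying a DER-encoded OCSPRequest."""
    parts = urlsplit(responder_url)
    head = "\r\n".join([
        f"POST {parts.path or '/'} HTTP/1.1",
        # HTTP/1.1 requires Host; servers use it to pick which site answers.
        f"Host: {parts.netloc}",
        "Content-Type: application/ocsp-request",
        f"Content-Length: {len(der_request)}",
        "Connection: close",
    ])
    return head.encode("ascii") + b"\r\n\r\n" + der_request


def ocsp_get_url(responder_url: str, der_request: bytes) -> str:
    """GET form: {url}/{url-encoding of base64 of the DER OCSPRequest}."""
    b64 = base64.b64encode(der_request).decode("ascii")
    return responder_url.rstrip("/") + "/" + quote(b64, safe="")
```

URL-encoding matters in the GET form because base64 output can contain `+`, `/` and `=`, which would otherwise be misread as path or query syntax.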

Ryan

Wayne Thayer

Jan 22, 2018, 3:07:32 PM
to Ryan Sleevi, dev-secur...@lists.mozilla.org, Hanno Böck, Ryan Hurst
On Sun, Jan 21, 2018 at 2:14 PM, Ryan Sleevi via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

>
> > I think the whole CA incident reporting question has lots of room for
> > improvement. And I think this should be considered in a way that people
> > who are not familiar with the details of the CA ecosystem can
> > successfully report incidents. I.e. saying "you can find all the
> > contact info in our CPS" is not particularly helpful, as nobody outside
> > a small circle of people knows what that is.
>

Even if a relying party looks for the problem reporting mechanism in the
CPS, they are unlikely to find it. The only requirement is "The CA SHALL
publicly disclose the instructions through a readily accessible online
means" in BR 9.4.3. From my observations, many CAs do not place this in
their CPS, and almost none equate the requirement to "easy to find".

> > I think if people try the "natural" way of contacting a certificate
> > issuing entity this should lead to a successful outcome. (And that is
> > more or less "This has been issued by X, so I try to contact X".)

The "natural" way is likely to be some generic support email address that
receives thousands of emails a day and is subject to the problems Ryan
describes below. Maintaining a 24-hour response time for any email address
a relying party might find is not compatible with the requirement for a
timely response.

>
> To be honest, I think I find myself agreeing with other CAs when I question
> whether that should be or necessarily is a goal.
>
> If you’ve been on an inbound bug queue for virtually any product
> (particularly popular ones), you will be amazed at the (lack of) quality
> reports. Just search the Mozilla or Chromium bug trackers for “my computer
> has been hacked” to see a variety of bugs from people most likely suffering
> from one or more mental disorders, unfortunately, to see how bad it can be.
>
> Add to that the complexity of PKI, and the contractual obligations of
> responsiveness, and it becomes quite different. Talk to existing CAs that
> provided email links to problem reporting mechanisms (prior to Mozilla’s
> requiring they do so) and hear about the spam. I know of problem reports
> from Google to other CAs that have similarly been caught by the spam
> filters designed to ensure high signal.
>
> Combined with the spectrum of technical acumen we see, even here, or
> through contributions from Interested Parties to the CA/Browser Forum, and
> I suspect that highlighting even more the reporting mechanism is to vastly
> increase the noise, rather than the signal, and thus do more harm than
> good.
>
> Normalizing problem reporting - meaning that reporters have to do more work
> to align their reports into actionable data - conversely increases the
> barrier to submission but reduces the barrier to action. Is it an equitable
> tradeoff? It may be.
>
> Something to ponder, however, as easier does not necessarily mean better.
>

This is a good point, but easier doesn't necessarily mean worse either. I
propose that we add a requirement that makes the reporting mechanism more
consistent and easier to find (e.g. clearly labeled so that a search for
"google CA problem report" gets me there). Allow the reporting mechanism to
be flexible so that a CA can, for example, use a form with a captcha to
collect the report. I don't know if we need to specify "better" by
normalizing the mechanism or information that is gathered, but I'm also not
opposed.

Moudrick M. Dadashov

Jan 22, 2018, 4:38:57 PM
to Wayne Thayer, Ryan Sleevi, dev-secur...@lists.mozilla.org, Hanno Böck, Ryan Hurst
Hi Wayne,

This is how it's supposed to work under eIDAS:

1. Check the value of the QCStatement [1] of the certificate under
problem (which is the location of PDS);
2. Open the PDS and check relevant contact info as in [2].

Thanks,
M.D.

[1] see 4.3.4 (QCStatement regarding location of PKI Disclosure
Statements (PDS)) in ETSI EN 319 412-5;
[2] see Annex 1 (Model PKI disclosure statement) in ETSI EN 319 411-1.

Ryan Hurst

Feb 21, 2018, 11:53:19 PM
to mozilla-dev-s...@lists.mozilla.org
I wanted to follow up with our findings and a summary of this issue for the community.

Below you will find details on what happened and how we resolved the issue; hopefully this will help explain what happened and potentially help others avoid a similar issue.

Summary
-------
January 19th, at 08:40 UTC, a code push to improve OCSP generation for a subset of the Google operated Certificate Authorities was initiated. The change was related to the packaging of generated OCSP responses. The first time this change was invoked in production was January 19th at 16:40 UTC.

NOTE: The publication of new revocation information to all geographies can take up to 6 hours to propagate. Additionally, clients and middle-boxes commonly implement caching behavior. This results in a large window where clients may have begun to observe the outage.

NOTE: Most modern web browsers “soft-fail” in response to OCSP server availability issues, masking outages. Firefox, however, supports an advanced option that allows users to opt-in to “hard-fail” behavior for revocation checking. An unknown percentage of Firefox users enable this setting. We believe most users who were impacted by the outage were these Firefox users.
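The difference between the two behaviours comes down to what a client does when it gets no usable OCSP answer. A simplified Python model, assuming only three outcomes (it is not Firefox's actual logic; real clients also deal with stapled responses, caching and timeouts):

```python
from enum import Enum


class OcspOutcome(Enum):
    GOOD = "good"          # responder vouched for the certificate
    REVOKED = "revoked"    # responder says the certificate is revoked
    NO_RESPONSE = "none"   # no usable answer, e.g. HTTP 404, timeout or parse error


def allow_connection(outcome: OcspOutcome, hard_fail: bool) -> bool:
    """Return True if the TLS connection should proceed after the revocation check."""
    if outcome is OcspOutcome.GOOD:
        return True
    if outcome is OcspOutcome.REVOKED:
        return False
    # No definitive answer: soft-fail carries on, hard-fail refuses.
    return not hard_fail
```

During the outage, lookups for affected certificates ended in the last branch, so only clients with `hard_fail=True` (for example Firefox with `security.OCSP.require` set to true) refused to connect.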

About 9 hours after the change was first invoked in production (2018-01-20 01:36 UTC), a user on Twitter mentioned that they were having problems with their hard-fail OCSP checking configuration in Firefox when visiting Google properties. This tweet and the few that followed during the outage period were not noticed by any Google employees until after the incident’s post-mortem investigation had begun.

About 1 day and 22 hours after the change was first invoked in production (2018-01-21 15:07 UTC), a user posted a message to the mozilla.dev.security.policy mailing list mentioning that they too were having problems with their hard-fail configuration in Firefox when visiting Google properties.

About two days after the push was initiated, a Google employee discovered the post and opened a ticket (2018-01-21 16:10 UTC). This triggered the remediation procedures, which began in under an hour.

The issue was resolved about 2 days and 6 hours from the time it was introduced (2018-01-21 22:56 UTC). Once Google became aware of the issue, it took 1 hour and 55 minutes to resolve the issue, and an additional 4 hours and 51 minutes for the fix to be completely deployed.
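The elapsed times in this timeline can be checked with Python's `datetime`, using the UTC timestamps given in the summary (the event labels are added for illustration). Measured this way, the "about 9 hours", "1 day and 22 hours" and "2 days and 6 hours" figures count from the first production run at 16:40 UTC rather than from the 08:40 push, and 1 h 55 min plus 4 h 51 min accounts for the 6 h 46 min between the ticket and full deployment.

```python
from datetime import datetime

# UTC timestamps from the post-mortem summary; the labels are illustrative.
EVENTS = {
    "push_initiated": "2018-01-19 08:40",
    "first_production_run": "2018-01-19 16:40",
    "first_tweet": "2018-01-20 01:36",
    "list_report": "2018-01-21 15:07",
    "ticket_opened": "2018-01-21 16:10",
    "fix_fully_deployed": "2018-01-21 22:56",
}


def elapsed(start: str, end: str) -> tuple:
    """Return (days, hours, minutes) between two labelled events."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(EVENTS[end], fmt) - datetime.strptime(EVENTS[start], fmt)
    hours, rest = divmod(delta.seconds, 3600)
    return delta.days, hours, rest // 60


for start, end in [
    ("first_production_run", "first_tweet"),         # (0, 8, 56)
    ("first_production_run", "list_report"),         # (1, 22, 27)
    ("first_production_run", "fix_fully_deployed"),  # (2, 6, 16)
    ("ticket_opened", "fix_fully_deployed"),         # (0, 6, 46)
]:
    print(f"{start} -> {end}: {elapsed(start, end)}")
```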

No customer reports regarding this issue were sent to the notification addresses listed in Google's CPSs or on the repository websites for the duration of the outage. This extended the duration of the outage.

Background
----------
Google's OCSP Infrastructure works by generating OCSP responses in batches, with each batch being made up of the certificates issued by an individual CA.

In the case of GIAG2, this batch is produced in chunks of certificates issued in the last 370 days. For each chunk, the GIAG2 CA is asked to produce the corresponding OCSP responses, the results of which are placed into a separate .tar file.

The issuer of GIAG2 has chosen to issue new certificates to GIAG2 periodically; as a result, GIAG2 has multiple certificates. Two of these certificates no longer have unexpired certificates associated with them. As a result, and as expected, the CA does not produce responses for the corresponding periods.

All .tar files produced during this process are then concatenated with GNU tar's --concatenate (-A) option. This produces a single .tar file containing all of the OCSP responses for the given Certificate Authority, which is then distributed to our global CDN infrastructure for serving.

A change was made in how we batch these responses: instead of outputting many .tar files within a batch, a single concatenation of all the .tar files was produced.

The change in question triggered an unexpected behaviour in GNU tar which then manifested as an empty tarball. These "empty" updates ended up being distributed to our global CDN, effectively dropping some responses, while continuing to serve responses for other CAs.

During testing of the change, this behaviour was not detected, as the tests did not cover the scenario in which some chunks did not contain unexpired certificates.
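The post-mortem does not say precisely which GNU tar behaviour produced the empty archive, so the Python sketch below does not try to reproduce it. It illustrates a related, well-documented pitfall and the kind of guard listed in the remediation plan. A tar archive ends with zero-filled blocks, and Python's `tarfile` treats the first zero block as the end of the archive unless `ignore_zeros=True` is passed (GNU tar does the same unless given `--ignore-zeros`), so an empty chunk byte-concatenated in front of the others hides them. Re-packing members into a fresh archive avoids relying on concatenation, and a minimum-count check rejects an empty result before it is published. The chunk names and contents are hypothetical, and this is not Google's pipeline.

```python
import io
import tarfile


def make_chunk(files):
    """Build an in-memory .tar chunk from {name: bytes}; an empty dict gives an empty archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(archive, ignore_zeros=False):
    """List member names. By default tarfile stops at the first zero (end-of-archive) block."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:", ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


def merge_chunks(chunks):
    """Copy every member of every chunk into one fresh archive; empty chunks add nothing."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as merged:
        for chunk in chunks:
            with tarfile.open(fileobj=io.BytesIO(chunk), mode="r:") as tar:
                for info in tar.getmembers():
                    merged.addfile(info, tar.extractfile(info))
    return out.getvalue()


def check_min_responses(archive, minimum):
    """Refuse to publish an archive holding fewer responses than expected."""
    count = len(member_names(archive))
    if count < minimum:
        raise ValueError(f"only {count} OCSP responses, expected at least {minimum}")
    return count


if __name__ == "__main__":
    empty = make_chunk({})  # e.g. a period with no unexpired certificates
    a = make_chunk({"a.der": b"response-a"})
    b = make_chunk({"b.der": b"response-b"})
    naive = empty + a + b  # plain byte concatenation, like `cat`
    print(member_names(naive))                     # []
    print(member_names(naive, ignore_zeros=True))  # ['a.der', 'b.der']
    print(check_min_responses(merge_chunks([empty, a, b]), minimum=2))  # 2
```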

Findings
--------
- The outage only impacted sites with TLS certificates issued by the GIAG2 CA as it was the only CA that met the required pre-conditions of the bug.
- The bug that introduced this failure manifested itself as an empty container of OCSP responses. The root cause of the issue was an unexpected behavior of GNU tar relating to concatenating tar files.
- The outage was observed by revocation service monitoring as “unknown certificate” (HTTP 404) errors. HTTP 404 errors are expected in OCSP responder operations; they typically are the result of poorly configured clients. These events are monitored and a threshold does exist for an on-call escalation.
- Due to a configuration error the designated Google team did not receive an escalation message.
- External users did not use the contact details Google provided in the CPS.
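The findings do not describe how the 404 escalation threshold is computed. As a hypothetical illustration, the Python below tracks the share of HTTP 404 answers over the last N responses and signals an escalation when it exceeds a configured ratio; the class name, window size and threshold are made-up parameters. Such an alarm only helps if the page reaches the right team, which is exactly the configuration error noted in the findings.

```python
from collections import deque


class NotFoundRateAlarm:
    """Hypothetical alarm: escalate when too many recent OCSP answers are HTTP 404."""

    def __init__(self, window: int, threshold: float):
        self.recent = deque(maxlen=window)  # True for each 404 among the last `window` answers
        self.threshold = threshold          # fraction of 404s that triggers escalation

    def record(self, status: int) -> bool:
        """Record one HTTP status code; return True if on-call should be paged."""
        self.recent.append(status == 404)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data to judge yet
        return sum(self.recent) / len(self.recent) > self.threshold


if __name__ == "__main__":
    alarm = NotFoundRateAlarm(window=10, threshold=0.5)
    print([alarm.record(200) for _ in range(10)])  # all False: no 404s
    print([alarm.record(404) for _ in range(6)])   # True only once 6 of the last 10 are 404s
```

Using a ratio rather than a raw count reflects the point made above that some 404s are normal background noise from misconfigured clients.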

Remediation Plan
----------------
- A bug fix has been applied to prevent the same issue from happening again.
- Test cases looking for a minimum number of OCSP responses in each tar were added to the test automation suites to catch similar issues in the future.
- The monitoring system that was misconfigured was updated to use the correct address for escalations.
- Both the Google Trust Services CPS (found on pki.goog) and the Google CPS (found on pki.google.com) have been updated to make it clear what email address is the most expedient path to reach the PKI team for non-security incidents.
- The Google PKI repository page was updated to show contact details in the same way the Google Trust Services repository page already did, in the hope of helping users find a path of escalation.
- The wizard that is returned for mail sent to the security email address has been updated to also include an explicit option for issues related to the “Google Certificate Authority”, in the hope of helping users who choose this path of escalation.
- Existing procedures that are relied upon for periodic verification of effective escalation have been updated to include unknown certificate checking.

Paul Kehrer

Feb 22, 2018, 2:01:43 AM
to mozilla-dev-s...@lists.mozilla.org
Thank you for this comprehensive incident report, Ryan. Your team's decision
to improve the documentation around the right address for reporting is
great to see! I wonder if it might also make sense to put the contact
information directly on https://pki.goog, above the fold?

-Paul (reaperhulk)

On February 22, 2018 at 12:53:32 PM, Ryan Hurst via dev-security-policy (

Tim Hollebeek

Feb 25, 2018, 9:42:18 AM
to Ryan Hurst, mozilla-dev-s...@lists.mozilla.org
Ryan,

Wayne and I have been discussing making various improvements to 1.5.2
mandatory for all CAs. I've made a few improvements to DigiCert's CPSs in
this area, but things probably still could be better. There will probably be
a CA/B ballot in this area soon.

DigiCert's 1.5.2 has our support email address, and our Certificate Problem
Report email (which I recently added). That doesn't really cover everything
(yet).

It looks like GTS 1.5.2 splits things into security (including CPRs) and non-security
requests.

I didn't chase down any other 1.5.2's yet, but it'd be interesting to hear what
other CAs have here. I suspect most only have one address for everything.

Something to keep in mind once the CA/B thread shows up.

-Tim

Ryan Hurst

Feb 25, 2018, 3:11:46 PM
to Tim Hollebeek, mozilla-dev-s...@lists.mozilla.org
Tim,

I can see value in a ballot to clarify incident reporting and other
contact-related issues, as right now 1.5.2 is pretty sparse with regard to how
to handle this. I would be happy to work with you on a proposal here.

Ryan

On Sun, Feb 25, 2018 at 6:41 AM, Tim Hollebeek <tim.ho...@digicert.com>
wrote: