Authentication issues with Google App Engine

Google Cloud Platform Status

Mar 5, 2015, 10:52:44 AM
to google-appengine...@googlegroups.com
We are investigating an issue with App Engine authentication services. We
will post an update shortly.

Google Cloud Platform Status

Mar 5, 2015, 11:27:49 AM
to google-appengine...@googlegroups.com
We are investigating an issue with authentication on Google App Engine
beginning on Thursday, 2015-03-05 at 07:32 (all times are in US/Pacific).

Affected applications are responding with an HTTP 302 on user login, or
with a 403 error when connecting to Google APIs.
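
For applications trying to determine whether they are affected, a minimal
sketch (Python, App Engine standard runtime; the endpoint below is a
hypothetical placeholder, not taken from this report) that surfaces both
symptoms instead of masking them:

    from google.appengine.api import urlfetch

    # Hypothetical endpoint, used only for illustration.
    API_URL = 'https://www.googleapis.com/storage/v1/b/example-bucket'

    def call_google_api():
        # Disable automatic redirect following so an unexpected 302 is
        # visible to the caller instead of being silently followed.
        result = urlfetch.fetch(API_URL, follow_redirects=False, deadline=10)
        if result.status_code == 302:
            # Symptom 1: a redirect where an authenticated response was expected.
            raise RuntimeError('Unexpected redirect to %s'
                               % result.headers.get('location'))
        if result.status_code == 403:
            # Symptom 2: an authorization error from a Google API.
            raise RuntimeError('403 from Google API: %s' % result.content[:200])
        return result.content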

We will provide more information by 09:00 US/Pacific time.

Google Cloud Platform Status

Mar 5, 2015, 11:46:20 AM
to google-appengine...@googlegroups.com
The issue with Google App Engine and Google APIs authentication is resolved
for most applications as of 08:28 US/Pacific. Our engineers continue to
monitor the situation to ensure that service is fully restored and stable.

We will provide more information by 09:15 US/Pacific.

Google Cloud Platform Status

Mar 5, 2015, 11:56:14 AM
to google-appengine...@googlegroups.com
The problem with authentication on Google App Engine and the Google APIs
was resolved as of Thursday, 2015-03-05 08:27 (all times are in
US/Pacific). We apologize to our customers for the inconvenience, and we
thank you for your patience and continued support.

We will provide a more detailed analysis of this incident once we have
completed our internal investigation.

Google Cloud Platform Status

Mar 5, 2015, 6:01:52 PM
to google-appengine...@googlegroups.com
At 7:04 AM PST, Google systems began returning errors for approximately 20%
of requests from App Engine to many Google Cloud Platform APIs. The error
rate peaked around 50% at 7:50 and remained at that level until the
incident was resolved at 8:26. Many users observed this issue as a failure
of the authentication service. We will post a complete incident report
following our internal investigation.

Google Cloud Platform Status

Mar 6, 2015, 5:00:25 PM
to google-appengine...@googlegroups.com
SUMMARY:

On Thursday 5 March 2015, for a duration of 84 minutes, Google App Engine
applications that accessed some Google APIs over HTTP experienced elevated
error rates. We apologize for any impact this incident had on your service
or application, and have made immediate changes to prevent this issue from
recurring.

DETAILED DESCRIPTION OF IMPACT:

On Thursday 5 March, from 07:04 AM to 08:28 AM, some Google App Engine
applications making calls to other Google APIs via HTTP experienced
elevated error rates. During the incident, the global error rate for all
API calls remained under 1%, and in total, the outage affected 2% of
applications that were active during the incident. The effect on those
applications was significant: requests to issue OAuth tokens experienced an
error rate of over 85%. In addition, the HTTP APIs to
googleapis.com/storage and googleapis.com/gmail returned error rates
between 50% and 60%. Other googleapis.com endpoints were affected with
error rates of 10% to 20%.

ROOT CAUSE:

A component in Google’s shared HTTP load balancing fabric experienced a
non-malicious increase in traffic, exceeding its provisioned capacity. This
triggered an automatic DoS protection which shunted a portion of the
incoming traffic to a CAPTCHA. The unexpected response caused some clients
to issue automated retries, exacerbating the problem.
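
The retry amplification described above is the standard argument for
exponential backoff with jitter on the client side. A minimal sketch (plain
Python; do_request is a hypothetical callable that raises on a retryable
error, not part of this report):

    import random
    import time

    def fetch_with_backoff(do_request, max_attempts=5,
                           base_delay=0.5, max_delay=30.0):
        for attempt in range(max_attempts):
            try:
                return do_request()
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # give up after the final attempt
                # Full jitter: sleep a random amount up to an exponentially
                # growing cap, so retries from many clients do not arrive in
                # lockstep and pile onto an already overloaded service.
                delay = min(max_delay, base_delay * (2 ** attempt))
                time.sleep(random.uniform(0, delay))

Clients that retried immediately on the unexpected CAPTCHA response added
load to a fabric that was already over capacity; spreading retries out in
time is what breaks that feedback loop.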

REMEDIATION AND PREVENTION:

Google engineers were alerted to the issue by automated monitoring at
07:02, as the load balancing system detected excess traffic and attempted
to automatically mitigate it. At 07:46, Google engineers enabled standby
load balancing capacity to rectify the issue. From 08:15 to 08:40, Google
engineers continued to provision additional resources in the load balancing
fabric in order to serve the increased traffic. During this period, at
08:28, Google engineers determined that sufficient capacity was in place to
serve both regular and retry traffic, and instructed the load balancing
system to cease mitigation and resume normal traffic serving. This action
marked the end of the incident.

To prevent this issue from recurring, Google engineers are comprehensively
re-examining the affected load balancing fabric to ensure it is and remains
correctly provisioned. Additionally, Google engineers are improving
monitoring rules to provide an early warning of capacity shortfall.
Finally, Google engineers are examining the services that depend on this
load balancing system, and will move some services to a separate pool of
more easily scalable load balancers where appropriate.
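
As an illustration of the kind of early-warning rule described above (the
names and thresholds here are assumptions made for the sketch, not Google's
internal monitoring configuration):

    def check_capacity_headroom(current_qps, provisioned_qps,
                                warn_ratio=0.70, page_ratio=0.85):
        # Alert on utilization well before an automatic overload-protection
        # mechanism would engage, leaving time to add capacity.
        utilization = current_qps / float(provisioned_qps)
        if utilization >= page_ratio:
            return 'PAGE'  # imminent shortfall: add capacity now
        if utilization >= warn_ratio:
            return 'WARN'  # early warning: bring up standby capacity
        return 'OK'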