Google Cloud Platform Status
Dec 16, 2015, 10:40:17 AM
to google-appengine...@googlegroups.com
SUMMARY:
On Monday 7 December 2015, 1.29% of Google App Engine applications received
errors when issuing authenticated calls to Google APIs over a period of 17
hours and 3 minutes. During a 45-minute period, authenticated calls to
Google APIs from outside of App Engine also received errors, with the error
rate peaking at 12%. We apologize for the impact of this issue on you and
your service. We consider service degradation of this level and duration to
be very serious, and we are planning many changes to prevent a recurrence.
DETAILED DESCRIPTION OF IMPACT:
Between Monday 7 December 2015 20:09 PST and Tuesday 8 December 2015 13:12,
1.29% of Google App Engine applications using service accounts received
error 401 "Access Denied" for all requests to Google APIs requiring
authentication. Unauthenticated API calls were not affected. Different
applications experienced impact at different times, with few applications
being affected for the full duration of the incident.
In addition, between 23:05 and 23:50 PST, an average of 7% of all requests
to Google Cloud APIs failed or timed out, peaking briefly at 12%. Outside of
this window, only API calls from App Engine were affected.
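For readers unfamiliar with the affected call path, the sketch below is an
illustration only (it is not taken from this report) of the typical shape of
an authenticated Google API call from a Python App Engine application using
its service account identity. During the incident, requests of this shape to
APIs requiring authentication received HTTP 401 responses; the bucket name
and scope below are examples.

    # Illustrative sketch only: an authenticated Cloud Storage JSON API call
    # from a Python App Engine application using its service account.
    from google.appengine.api import app_identity, urlfetch

    SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'

    def list_objects(bucket):
        # Obtain an OAuth2 access token for the app's service account.
        token, _expiry = app_identity.get_access_token([SCOPE])
        result = urlfetch.fetch(
            url='https://www.googleapis.com/storage/v1/b/%s/o' % bucket,
            headers={'Authorization': 'Bearer %s' % token})
        if result.status_code == 401:
            # The failure mode described above: the API rejects the service
            # account's credentials with "Access Denied".
            raise Exception('401 Access Denied for service account call')
        return result.content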
ROOT CAUSE:
Google engineers had recently carried out a migration of the Google
Accounts system to a new storage backend, which included copying API
authentication service credential data and redirecting API calls to the new
backend.
To complete this migration, credentials were scheduled to be deleted from
the previous storage backend. This process started at 20:09 PST on Monday 7
December 2015. Due to a software bug, the API authentication service
continued to look up some credentials, including those used by Google App
Engine service accounts, in the old storage backend. As these credentials
were progressively deleted, their corresponding service accounts could no
longer be authenticated.
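As an illustration of the failure mode (the report includes no code, and
every name and condition below is hypothetical), the bug can be pictured as
a credential lookup that still routes a subset of accounts to the old
storage backend after the migration:

    # Hypothetical sketch of the misdirected lookup; names and the routing
    # condition are invented for illustration.
    OLD_BACKEND = {}   # legacy credential store, being emptied by deletion
    NEW_BACKEND = {}   # migrated credential store, fully populated

    def uses_legacy_path(account_id):
        # Stand-in for the faulty routing condition; the real trigger is
        # not described in the report beyond "a software bug".
        return hash(account_id) % 100 < 2

    def lookup_credential(account_id):
        if uses_legacy_path(account_id):
            # Bug: after the migration this branch should never be taken.
            # Once the deletion job removes the row from the old backend,
            # the lookup finds nothing and authentication fails with 401,
            # even though a valid copy exists in the new backend.
            return OLD_BACKEND.get(account_id)
        return NEW_BACKEND.get(account_id)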
The impact increased as more credentials were deleted and some Google App
Engine applications started to issue a high volume of retry requests. At
23:05, the retry volume exceeded the regional capacity of the API
authentication service, causing 1.3% of all authenticated API calls to fail
or time out, including calls to Google APIs made from outside Google App
Engine. At
23:30 the API authentication service exceeded its global capacity, causing
up to 12% of all authenticated API calls to fail until 23:50, when the
overload issue was resolved.
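The retry amplification described above is the classic argument for capped,
jittered exponential backoff in API clients. The sketch below is a generic
illustration rather than guidance from this report; the exception type and
limits are placeholders.

    # Illustrative client-side retry policy: back off exponentially with
    # jitter and give up after a bounded number of attempts, so a
    # struggling backend is not hit with an ever-growing wave of retries.
    import random
    import time

    class TransientAuthError(Exception):
        """Placeholder for a retryable authentication failure."""

    def call_with_backoff(request_fn, max_attempts=5):
        for attempt in range(max_attempts):
            try:
                return request_fn()
            except TransientAuthError:
                if attempt == max_attempts - 1:
                    raise  # stop retrying; surface the error to the caller
                # Sleep 1s, 2s, 4s, ... plus up to 1s of random jitter.
                time.sleep((2 ** attempt) + random.random())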
REMEDIATION AND PREVENTION:
At 23:50 PST on Monday 7 December, Google engineers blocked certain
authentication credentials that were known to be failing, preventing
retries on these credentials from overloading the API authentication
service.
On Tuesday 8 December at 08:52 PST, the deletion process was halted, having
removed 2.3% of credentials, preventing further applications from being
affected. At 10:08, Google engineers identified the root cause of the
misdirected credential lookups. After thorough testing, a fix was rolled
out globally, resolving the issue for all affected Google App Engine
applications by 13:12.
Google has conducted a far-reaching review of the issue's root causes and
contributory factors, leading to numerous prevention and mitigation actions
in the following areas:
— Google engineers have deployed monitoring for additional infrastructure
signals to detect and analyse similar issues more quickly.
— Google engineers have improved internal tools to extend auditing and
logging and automatically advise relevant teams on potentially risky data
operations.
— Additional rate limiting and caching features will be added to the API
authentication service, increasing its resilience to load spikes (a generic
sketch of the rate-limiting idea follows this list).
— Google’s development guidelines are being reviewed and updated to improve
the handling of service or backend migrations, including a grace period
during which access to old data locations is disabled before they are fully
decommissioned.
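To make the rate-limiting item above concrete, the sketch below shows a
generic token bucket that sheds excess load during spikes. It is a general
illustration; the actual design of the API authentication service is not
described in this report.

    # Generic token-bucket rate limiter; parameters and names are examples,
    # not details of Google's API authentication service.
    import time

    class TokenBucket(object):
        def __init__(self, rate_per_sec, burst):
            self.rate = float(rate_per_sec)   # steady-state requests/second
            self.capacity = float(burst)      # maximum burst size
            self.tokens = float(burst)
            self.last = time.time()

        def allow(self):
            now = time.time()
            # Refill tokens in proportion to elapsed time, up to the cap.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False   # caller should reject or queue the request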
Our customers rely on us to provide a superior service and we regret we did
not live up to expectations in this case. We apologize again for the
inconvenience this caused you and your users.