We have noticed an issue impacting networking to Google App Engine. The issue started at 2015/01/20 18:24 (all times are in US/Pacific). The problem was resolved as of 2015/01/20 18:42. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

Google Cloud Platform Status

Jan 22, 2015, 12:10:59 AM
to google-appengine...@googlegroups.com
SUMMARY:

On Tuesday 20 January 2015, some Google App Engine applications experienced
elevated rates of HTTP 500 errors for a duration of 11 minutes. We
apologize if you were affected by this incident. We are working hard to
prevent incidents like this from recurring in future.

DETAILED DESCRIPTION OF IMPACT:

On Tuesday 20 January 2015, some Google App Engine apps experienced
elevated rates of HTTP 500 errors during the following time intervals:
18:24 - 18:27, 18:36 - 18:41, and 19:06 - 19:08 (all times in PST). The
issue affected 13% of applications. This issue caused 3% of requests to App
Engine to receive 500 errors during the 11 minutes of the incident.
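
As a rough check on the figures above, the three error windows can be summed from their listed minute boundaries. This is an illustrative sketch only, not part of the original report; the names WINDOWS and window_minutes are hypothetical. At minute granularity the windows add up to 10 minutes, so the reported 11 minutes presumably reflects finer-grained timestamps than the ones listed here.

```python
from datetime import datetime

# Error windows as listed in the report (minute granularity, PST).
WINDOWS = [("18:24", "18:27"), ("18:36", "18:41"), ("19:06", "19:08")]


def window_minutes(start, end):
    """Return the whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Per-window durations and their sum, based only on the listed boundaries.
durations = [window_minutes(start, end) for start, end in WINDOWS]
total = sum(durations)
print(durations, total)
```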

ROOT CAUSE:

The issue was caused by an error in the software-defined networking control
system responsible for network traffic between Google datacenters. The
system incorrectly determined that there had been a drop in network
capacity available to App Engine applications in one datacenter.

REMEDIATION AND PREVENTION:

Our engineers received an automated alert for the issue at 18:42. At 18:55,
we redirected some traffic away from the affected datacenter. The system
returned to stability at 19:08.

To prevent a recurrence of this issue, we will disable the subsystem that
malfunctioned until both a fix for the immediate malfunction and additional
defense-in-depth measures have been deployed.