Google App Engine Task Queue issues beginning 2014-09-29 19:30 Pacific Time


Google App Engine Downtime Notify

Sep 30, 2014, 1:22:28 AM
to google-appengine...@googlegroups.com
We're investigating an issue with Google App Engine Task Queue beginning at Monday, 2014-09-29 19:30 US/Pacific. We will provide more information within the next 30 minutes.

Google App Engine Downtime Notify

Sep 30, 2014, 1:42:18 AM
to google-appengine...@googlegroups.com, google-appengine...@googlegroups.com
The problem with the lower processing rate of Google App Engine Task Queue was fully resolved as of Monday, 2014-09-29 22:13 US/Pacific. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

Google App Engine Downtime Notify

Oct 1, 2014, 4:54:07 PM
to google-appengine...@googlegroups.com

SUMMARY:


On Monday 29 September 2014, some Google App Engine applications using the Task Queue API experienced a decrease in the dispatch rate for tasks for a period of 2 hours and 28 minutes. In addition, on Monday 29 September and Tuesday 30 September 2014, some App Engine applications experienced errors when creating files using the Files API for a period of 11 hours and 2 minutes.


We hold ourselves to a high standard, and we failed to meet that standard. We are taking action to ensure that incidents like this do not happen in the future.


DETAILED DESCRIPTION OF IMPACT:


From Monday 29 September 2014 19:30 to 21:58 PDT, 29% of App Engine applications using the Task Queue API in US datacenters experienced a decrease in the dispatch rate for tasks. During the incident, tasks were dispatched at 78% of the rate seen during the previous day at the same time.


From Monday 29 September 21:58 until Tuesday 30 September 09:00 PDT, 27% of App Engine applications using the Files API in US datacenters experienced errors when creating files. The error rate for affected applications during this period was 95%.
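As a quick check on the figures above, the two impact windows and the effect of dispatching at 78% of the normal rate can be worked out in a few lines of Python. This is only an illustrative sketch: the timestamps are the ones stated in this report (treated as naive PDT times, which is safe here because no daylight-saving change falls inside either window), while the normal rate of 100 tasks per minute is a hypothetical value chosen for illustration and is not from the report.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Impact windows as stated in the report (US/Pacific, PDT; naive datetimes).
DISPATCH_START = datetime(2014, 9, 29, 19, 30)
DISPATCH_END = datetime(2014, 9, 29, 21, 58)
FILES_START = datetime(2014, 9, 29, 21, 58)
FILES_END = datetime(2014, 9, 30, 9, 0)


def hours_minutes(delta: timedelta) -> Tuple[int, int]:
    """Split a timedelta into whole hours and remaining whole minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


def deferred_tasks(normal_rate_per_min: int, minutes: int, rate_pct: int) -> int:
    """Tasks not dispatched when running at rate_pct% of normal for `minutes`.

    normal_rate_per_min is a hypothetical input, not a figure from the report.
    Integer arithmetic avoids floating-point rounding in the percentage.
    """
    return normal_rate_per_min * minutes * (100 - rate_pct) // 100


dispatch = DISPATCH_END - DISPATCH_START
files = FILES_END - FILES_START
dispatch_minutes = int(dispatch.total_seconds()) // 60

print(hours_minutes(dispatch))                     # (2, 28)
print(hours_minutes(files))                        # (11, 2)
print(deferred_tasks(100, dispatch_minutes, 78))   # 3256
```

Under that assumed rate, an affected application would have had roughly 3,256 tasks held back over the 148-minute window; the real figure for any given application depends on its own queue volume.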


ROOT CAUSE:


Both the task queue dispatch issue and the Files API issue were ultimately caused by a failure in the storage layer in one US datacenter. Initially, the impact of the storage layer issue was limited to a drop in the task queue dispatch rate. We later determined that its impact was likely to become more severe, so we redirected all App Engine traffic to other datacenters. This change exposed a latent misconfiguration in the Files API, which caused affected applications to experience errors when creating files.


REMEDIATION AND PREVENTION:


The App Engine support team received the first customer report of a drop in the task queue dispatch rate at 20:31. To resolve this issue, our engineers moved task queue operations for affected applications to other datacenters at 21:58.


At 22:54, our engineers moved all App Engine traffic away from the affected datacenter, which led to the Files API errors. Our engineers diagnosed and fixed the Files API issue at 07:41. The fix was fully rolled out to all affected customers by 09:00.


If you use the Files API, which is now deprecated, we recommend that you migrate your code to the Cloud Storage client library instead:


https://cloud.google.com/appengine/docs/java/googlecloudstorageclient/

https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/


Our support team will contact customers that make significant use of the Files API and provide help to move their code to a fully supported solution.

