SUMMARY:
On Monday 29 September 2014, some Google App Engine applications using the Task Queue API experienced a decrease in the dispatch rate for tasks for a period of 2 hours and 28 minutes. In addition, on Monday 29 September and Tuesday 30 September 2014, some App Engine applications experienced errors when creating files using the Files API for a period of 11 hours and 2 minutes.
We hold ourselves to a high standard, and we failed to meet that standard. We are taking action to ensure that incidents like this do not happen in the future.
DETAILED DESCRIPTION OF IMPACT:
From Monday 29 September 2014 19:30 to 21:58 PDT, 29% of App Engine applications using the Task Queue API in US datacenters experienced a decrease in the dispatch rate for tasks. During the incident, tasks were dispatched at 78% of the rate seen during the previous day at the same time.
From Monday 29 September 2014 21:58 until Tuesday 30 September 2014 09:00 PDT, 27% of App Engine applications using the Files API in US datacenters experienced errors when creating files. The error rate for affected applications during this period was 95%.
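As a quick check, the two impact windows above can be converted into the durations quoted in the summary. The following is a minimal sketch in Python. The helper names `window_minutes` and `as_hours_minutes` are hypothetical and used only for illustration, and the Files API window is assumed to be in the same timezone (PDT) as the Task Queue window.

```python
from datetime import datetime

# Timestamp format for the window boundaries quoted in the report.
FMT = "%Y-%m-%d %H:%M"


def window_minutes(start, end):
    """Return the length of an incident window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(minutes):
    """Split a minute count into an (hours, minutes) tuple."""
    return divmod(minutes, 60)


# Task Queue window: 29 September 19:30 to 21:58.
task_queue = window_minutes("2014-09-29 19:30", "2014-09-29 21:58")

# Files API window: 29 September 21:58 to 30 September 09:00
# (assumed to be in the same timezone as the Task Queue window).
files_api = window_minutes("2014-09-29 21:58", "2014-09-30 09:00")

print(as_hours_minutes(task_queue))  # (2, 28)
print(as_hours_minutes(files_api))   # (11, 2)
```

Both results agree with the 2 hours 28 minutes and 11 hours 2 minutes stated in the summary.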
ROOT CAUSE:
Both the task queue dispatch issue and the Files API issue were ultimately caused by a failure in the storage layer in one US datacenter. Initially, the impact of the storage layer issue was limited to a drop in the task queue dispatch rate. We later determined that its impact would become more severe, and therefore redirected all App Engine traffic to other datacenters. This change exposed a latent misconfiguration in the Files API, which caused affected applications to experience errors when creating files.
REMEDIATION AND PREVENTION:
The App Engine support team received the first customer report of a drop in the task queue dispatch rate at 20:31 on Monday 29 September. To resolve this issue, our engineers moved task queue operations for affected applications to other datacenters at 21:58.
At 22:54, our engineers moved all App Engine traffic away from the affected datacenter, which led to the Files API errors. Our engineers diagnosed and fixed the Files API issue at 07:41 on Tuesday 30 September. The fix was fully rolled out to all affected customers by 09:00.
For customers using the Files API, which is now deprecated, we recommend that you migrate your code to use the Cloud Storage client library instead:
https://cloud.google.com/appengine/docs/java/googlecloudstorageclient/
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
Our support team will contact customers who make significant use of the Files API and help them move their code to a fully supported solution.