|Elevated error rates and latency on 2013-01-15||Chris Ramsdale||1/17/13 4:21 PM|
Beginning on January 15, 2013 at approximately 7 AM US/Pacific time and continuing until approximately 12 PM US/Pacific time, some Google App Engine applications experienced elevated request latency and error rates. This incident was caused by a configuration issue in our storage layer that resulted in increased CPU usage in a single data center. Approximately 4% of all requests to App Engine applications resulted in errors during this event. For some applications, a majority of requests resulted in errors.
This incident was identified by our standard monitoring systems, and we began taking corrective measures immediately, including moving applications out of the affected data center. A more detailed timeline is included below (all times are in US/Pacific).
App Engine infrastructure operates across multiple data centers and is designed to be resilient both to individual hardware failures and to the loss of an entire data center. We are actively improving our ability to move applications from one data center to another quickly and transparently. We are also improving our processes and tools for configuration changes to the storage infrastructure to avoid similar incidents in the future.
We apologize for the inconvenience caused by this outage. If you believe your paid application experienced an SLA violation during this incident, please fill out our refund request form.
Chris Ramsdale on behalf of the Google App Engine Team