Serving issues


Max Ross (Google)

Oct 26, 2012, 11:28:03 AM
to google-appengine...@googlegroups.com
App Engine is currently experiencing serving issues. The team is actively working on restoring the service to full strength. Please follow this thread for updates.

Max Ross (Google)

Oct 26, 2012, 12:33:01 PM
to google-appengine...@googlegroups.com
At approximately 7:30am Pacific time this morning, Google began experiencing slow performance and dropped connections in one of the components of App Engine.  Affected users are seeing slow responses and an inability to connect to services.  Our current data shows that a majority of App Engine users and services are affected.  Google engineering teams are investigating a number of options for restoring service as quickly as possible, and we will provide another update as information changes, or within 60 minutes.

Christina Ilvento

Oct 26, 2012, 1:51:17 PM
to google-appengine...@googlegroups.com
We are continuing work to correct the ongoing issues with App Engine.  Operation has been restored for some services, while others continue to see slow response times and elevated error rates.  The malfunction appears to be limited to a single component that routes requests from users to the application instances serving them; it does not affect the application instances themselves.

We’ll post another status update as more information becomes available, or no later than one hour from now.

Christina Ilvento

Oct 26, 2012, 3:06:00 PM
to google-appengine...@googlegroups.com
At this point, we have stabilized service to App Engine applications. App Engine is now successfully serving at our normal daily traffic level, and we are closely monitoring the situation and working to prevent recurrence of this incident.

This morning around 7:30AM US/Pacific time, a large percentage of App Engine’s load balancing infrastructure began failing. As the system recovered, individual jobs became overloaded with backed-up traffic, resulting in cascading failures. Affected applications experienced increased latencies and error rates. Once we confirmed this cycle, we temporarily shut down all traffic and then slowly ramped it back up to avoid overloading the load balancing infrastructure as it recovered. This restored normal serving behavior for all applications.
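
To make the ramp-up step above concrete: the general technique is to readmit only a growing fraction of incoming traffic while the recovering infrastructure stays healthy, and to hold or back off whenever error rates climb again, so no backend ever sees the full backed-up load at once. The Python sketch below is purely illustrative and is not App Engine's implementation; the backend_error_rate probe, the thresholds, and the step sizes are all assumptions.

import random
import time

# Illustrative admission-control ramp: readmit a growing fraction of traffic
# while a recovering backend stays healthy. Not Google's implementation; the
# error-rate probe, thresholds, and step sizes below are hypothetical.

ERROR_THRESHOLD = 0.05   # back off if more than 5% of sampled requests fail
STEP = 0.10              # grow the admitted fraction by 10 points per interval
INTERVAL_SECONDS = 60    # how long to hold each traffic level before changing it


def backend_error_rate() -> float:
    """Placeholder for a real health probe (e.g. errors / requests over the
    last interval). Here it just returns a random value for demonstration."""
    return random.uniform(0.0, 0.1)


def should_admit(admit_fraction: float) -> bool:
    """Per-request check a frontend would apply: admit roughly
    `admit_fraction` of requests and shed the rest early, so the recovering
    backend never sees the full backed-up load at once."""
    return random.random() < admit_fraction


def ramp_traffic() -> None:
    """Raise the admitted fraction step by step while the backend stays healthy."""
    admit_fraction = 0.0
    while admit_fraction < 1.0:
        time.sleep(INTERVAL_SECONDS)
        if backend_error_rate() > ERROR_THRESHOLD:
            # Errors are climbing again: reduce the admitted fraction instead
            # of pushing more load onto a struggling backend.
            admit_fraction = max(0.0, admit_fraction - STEP)
        else:
            admit_fraction = min(1.0, admit_fraction + STEP)
        print(f"admitting {admit_fraction:.0%} of traffic")


if __name__ == "__main__":
    ramp_traffic()

Shedding the excess requests up front is what breaks the overload cycle described above: each traffic level is held until the backends demonstrate they can serve it at normal error rates, and only then is more load admitted.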

We’ll post a more detailed analysis of this incident once we have fully investigated the root cause.


Regards,

Christina Ilvento on behalf of the Google App Engine Team

Christina Ilvento

Oct 26, 2012, 8:54:30 PM
to google-appengine...@googlegroups.com
Hi All,



Thanks,
Christina