Google App Engine Issues on Thursday, May 08th 2014


Google App Engine Downtime Notify

May 9, 2014, 11:02:28 AM
to google-appengine...@googlegroups.com
We're investigating an issue with Google App Engine beginning at Thursday, 2014-05-08 18:00 US/Pacific where some applications are being served by a higher number of instances than usual.

We will provide more information by 2014-05-09 9:00 AM US/Pacific.

Google App Engine Downtime Notify

May 9, 2014, 12:06:03 PM
to google-appengine...@googlegroups.com
We are currently experiencing an issue with Google App Engine, and some applications are being served from a higher number of instances than usual. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide the next update by Friday, 2014-05-09 10:00 (Pacific Time) with further details.

Google App Engine Downtime Notify

May 9, 2014, 1:11:43 PM
to google-appengine...@googlegroups.com
We are continuing to experience elevated instance counts in some Google App Engine applications. Our engineers are working to resolve this issue. We will provide the next update by Friday, 2014-05-09 11:00 AM (Pacific Time) with further details.

Google App Engine Downtime Notify

May 9, 2014, 2:07:49 PM
to google-appengine...@googlegroups.com
The problem with Google App Engine using more than the expected number of instances was resolved as of Friday, 2014-05-09 10:15 (US Pacific Time). We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. We will make sure that impacted customers are not charged for any extra instances that were used during this period, and we will provide a more detailed analysis of the incident once we have completed our internal investigation.

Google App Engine Downtime Notify

May 15, 2014, 7:45:23 PM
to google-appengine...@googlegroups.com

SUMMARY:

On Thursday afternoon, 8 May 2014, between 0.01% and 0.10% of applications on GAE experienced an unexpected increase in the number of instances associated with their application. If your service or application was affected, we apologize. We have corrected the error in GAE, we are crediting all affected applications for the erroneous additional instance-hours, and we are improving our GAE monitoring and release procedures to help prevent a recurrence.


DETAILED DESCRIPTION OF IMPACT:

On Thursday 8 May 2014, between 0.01% and 0.10% of applications on GAE experienced an increase in the number of running instances from 16:50 PST to 9:50 PST the following morning. This behavior resulted in increased instance-hours quota usage, higher numbers of loading requests, and in some cases moderately increased latency. The increased use of instance hours caused approximately 0.001% of apps to reach their free instance-hours limit or daily budget, resulting in some errors.


ROOT CAUSE:

The root cause of the incident was an issue introduced by the rollout of GAE version 1.9.5. In specific cases, the new scheduler did not correctly re-use idle instances and instead started a new instance for every request. The issue took some time to resolve because the large number of instances persisted across a rollback to the previous 1.9.4 version.
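
For readers who want a concrete picture of this failure mode, here is a minimal sketch in Python. It is a toy model only and is not GAE's scheduler: the ToyScheduler class, the reuse_idle flag and the instances_started helper are all hypothetical names invented for illustration, and requests are assumed to arrive one at a time. It only shows why failing to re-use idle instances makes the instance count grow with the number of requests.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical toy scheduler; not GAE's actual implementation."""
    reuse_idle: bool = True                    # assumed switch for the sketch
    idle: list = field(default_factory=list)   # ids of idle instances
    started: int = 0                           # total instances ever started

    def handle(self) -> int:
        # Serve from an idle instance when re-use is enabled and one exists;
        # otherwise start a brand-new instance.
        if self.reuse_idle and self.idle:
            instance = self.idle.pop()
        else:
            self.started += 1
            instance = self.started
        # The request completes and the instance goes back to the idle pool.
        self.idle.append(instance)
        return instance


def instances_started(reuse_idle: bool, requests: int) -> int:
    """Count instances started while serving sequential requests."""
    scheduler = ToyScheduler(reuse_idle=reuse_idle)
    for _ in range(requests):
        scheduler.handle()
    return scheduler.started
```

In this toy model, serving requests sequentially with re-use enabled never needs more than one instance, while disabling re-use starts one instance per request and leaves all of them sitting idle.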


REMEDIATION AND PREVENTION:

Google engineers reset the instance-hours quota to stop quota-exceeded errors. Google engineers resolved the issue by redirecting traffic to a datacenter running GAE version 1.9.4. Google will credit impacted customers to cover the cost of instances used during this period.

To prevent a recurrence, Google engineers are adding pre-launch tests and improving the alerting and management infrastructure to ensure rapid detection and diagnosis of similar issues.
