Google App Engine issue with Task Queue service beginning at 3:50 PM US/Pacific on August 1, 2014


Google App Engine Downtime Notify

Aug 1, 2014, 8:23:51 PM
to google-appengine...@googlegroups.com
Starting at Friday, 2014-08-01 15:50, Google App Engine Task Queues failed to execute tasks. This incident with Google App Engine Task Queues was resolved as of Friday, 2014-08-01 17:00 (all times are in US/Pacific). We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

Google App Engine Downtime Notify

Aug 11, 2014, 2:47:15 PM
to google-appengine...@googlegroups.com, google-appengine...@googlegroups.com

SUMMARY:

On Friday, August 1, 2014, the App Engine Task Queue service delayed execution of some tasks for 70 minutes. If your service or application was affected, we apologize for any inconvenience. We have taken and are taking immediate steps to improve the platform’s performance and availability.


DETAILED DESCRIPTION OF IMPACT:

On Friday, August 1, 2014, from 15:50 to 17:00 US/Pacific, App Engine’s Task Queue service delayed execution of some tasks in the queue for some applications. During this period, the number of HTTP requests executed by the Task Queue service dropped 21.7%. Execution of affected tasks was delayed, and cron jobs could not start on time.


ROOT CAUSE:

Google engineers were changing the configuration of the Task Queue service to allocate more resources for each process that manages tasks in the queue. Operator error led to misconfiguration of resource requirements, preventing the Task Queue service from operating in some datacenters until reconfigured properly.


REMEDIATION AND PREVENTION:

To fix the immediate issue, Google engineers directed traffic to datacenters that were not impacted. To prevent the issue in the future, Google engineers will enhance our deployment tool to prevent this class of misconfiguration when restarting processes. Google engineers will also increase the resources allocated to the Task Queue service so that the service has enough headroom to retry a configuration change.
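
For readers who want to guard their own deployments against a similar class of mistake, the following is a minimal sketch of a pre-deployment check on resource requirements. It is purely illustrative and is not Google's deployment tool: the `ResourceRequest` type, the `validate_request` function, and the capacity figures are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical resource figures (per process, or total for a datacenter)."""
    cpu_cores: float
    memory_mb: int


def validate_request(request, capacity, process_count):
    """Return a list of problems; an empty list means the change may proceed.

    `request` is the per-process requirement, `capacity` is the total
    available in one datacenter, and `process_count` is how many
    processes would be restarted with the new requirement.
    All three are assumptions made for this sketch.
    """
    problems = []
    if request.cpu_cores <= 0 or request.memory_mb <= 0:
        problems.append("resource values must be positive")
    if process_count <= 0:
        problems.append("process count must be positive")
        return problems
    if request.cpu_cores * process_count > capacity.cpu_cores:
        problems.append("total CPU exceeds datacenter capacity")
    if request.memory_mb * process_count > capacity.memory_mb:
        problems.append("total memory exceeds datacenter capacity")
    return problems


if __name__ == "__main__":
    # Illustrative numbers only.
    capacity = ResourceRequest(cpu_cores=100.0, memory_mb=200_000)
    proposed = ResourceRequest(cpu_cores=20.0, memory_mb=4_000)
    for problem in validate_request(proposed, capacity, process_count=10):
        print("refusing to deploy:", problem)
```

Running a check of this kind before processes are restarted would reject a requirement that cannot be satisfied, rather than letting the restart fail in production.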

