Google App Engine issue with UrlFetch service beginning at 04:15 US/Pacific on June 17, 2014

Ilya Zakreuski

Jun 17, 2014, 10:09:37 AM
to google-appengine...@googlegroups.com
We're investigating an issue with the Google App Engine UrlFetch service that began at approximately 04:15 US/Pacific Time on Tuesday, 2014-06-17. We will provide more information shortly.

Ilya Zakreuski

Jun 17, 2014, 10:49:05 AM
to google-appengine...@googlegroups.com
We have identified the issue affecting the Google App Engine UrlFetch service. Latency began returning to normal for the majority of affected applications at 07:00 US/Pacific Time. We will provide another update shortly.

Ilya Zakreuski

Jun 17, 2014, 12:00:05 PM
to google-appengine...@googlegroups.com
The problem with the Google App Engine UrlFetch service was resolved as of 07:00 US/Pacific. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

Google App Engine Downtime Notify

Jun 18, 2014, 12:15:57 AM
to google-appengine...@googlegroups.com

SUMMARY:

On Tuesday 17 June 2014, some Google App Engine applications experienced elevated latency from the URL Fetch API for a duration of 162 minutes. If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we have taken and are taking immediate steps to improve the platform’s performance and availability.


DETAILED DESCRIPTION OF IMPACT:

On Tuesday 17 June 2014 from 04:14 to 06:56 US/Pacific, 5% of App Engine applications hosted in US datacenters experienced elevated latency from the URL Fetch API. Latency at the median was unchanged, but latency at the 90th percentile increased by 87% when aggregated over all applications that were affected. The incident did not cause a significant increase in error rates. URL Fetch calls to Google APIs (except Cloud Storage) and appspot.com URLs were not affected.
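
As an illustration of how the median can stay flat while the 90th percentile rises, here is a minimal sketch using synthetic latencies. The sample values, the 85/15 split, and the helper names are assumptions chosen only to reproduce an 87% increase; they are not data from the incident.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) * 100 / old

# Synthetic latencies in milliseconds (hypothetical, not measured values).
# Most calls are fast in both periods; only the slow tail gets slower.
baseline = [100] * 85 + [1000] * 15
incident = [100] * 85 + [1870] * 15

p50_before, p50_during = percentile(baseline, 50), percentile(incident, 50)
p90_before, p90_during = percentile(baseline, 90), percentile(incident, 90)

print("median:", p50_before, "->", p50_during)
print("p90:", p90_before, "->", p90_during)
print("p90 increase (%):", pct_increase(p90_before, p90_during))
```

In this sketch only the slowest 15% of calls change, so the median does not move while the 90th percentile does, which matches the shape of the impact described above.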


ROOT CAUSE:

An increase in load caused a software component on some machines to shut down. This resulted in a temporary reduction in available capacity in one of the systems used by the URL Fetch API.


REMEDIATION AND PREVENTION:

Google’s monitoring detected the problem at 04:47. We identified the source of the increase in load. We were able to successfully bring the affected machines back into operation after the load was reduced. To prevent recurrence, our engineers are investigating why the software component experienced issues under high load.
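
The timeline figures in this report can be cross-checked with a short calculation. This sketch assumes all three times are on 17 June 2014 in US/Pacific (no daylight-saving change occurs within the window), and the variable and function names are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Times taken from the report, all US/Pacific on the same day.
impact_start = datetime.strptime("2014-06-17 04:14", FMT)
detected = datetime.strptime("2014-06-17 04:47", FMT)
impact_end = datetime.strptime("2014-06-17 06:56", FMT)

def minutes_between(earlier, later):
    """Whole minutes elapsed between two naive datetimes in the same time zone."""
    return int((later - earlier).total_seconds() // 60)

impact_minutes = minutes_between(impact_start, impact_end)
detection_minutes = minutes_between(impact_start, detected)

print("impact duration (min):", impact_minutes)        # 162, as stated in the summary
print("time to detection (min):", detection_minutes)   # 33
```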

