Google Cloud Platform Status
Apr 24, 2015, 8:46:21 AM
to google-appengine...@googlegroups.com
SUMMARY:
On Friday 17 April 2015, the Google App Engine Logs API experienced
intermittent failures and reduced throughput for read requests for a
duration of 54 minutes. If your service or application was affected, we
apologize — this is not the level of quality and reliability we strive to
offer you, and we are taking immediate steps to improve the platform’s
performance and availability.
DETAILED DESCRIPTION OF IMPACT:
On Friday 17 April 2015 from 16:02 to 16:56 PDT, 3% of read requests to the
Logs API failed and there was a 96% drop in throughput. The problem
affected 16% of applications that rely on this API to export logs. In this
time window, users experienced intermittent timeouts while attempting to
view application logs in the App Engine Admin Console or the Google Cloud
Developers Console.
ROOT CAUSE:
Hotspotting in the App Engine Logs API's storage subsystem caused a number
of storage nodes to fail. This eventually resulted in resource depletion
and request failures.
REMEDIATION AND PREVENTION:
At 16:05 PDT on Friday 17 April 2015, an automated alert on depletion of
available resources for the Logs API was sent to Google engineers. To
resolve the immediate problem, they started redirecting traffic away from
the affected storage layer. The service started recovering at 16:51 and
normal operation was restored at 16:56.
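For readers who want to check the timeline, the intervals between the times
quoted in this report can be worked out as in the short sketch below. It
assumes every time quoted is on the same PDT clock as the impact window; the
variable and function names are illustrative only and are not part of any
App Engine API.

```python
from datetime import datetime

# Times as quoted in this report, all on 17 April 2015.
# Assumption: every time is on the same PDT wall clock as the impact window.
DAY = "2015-04-17 "
FMT = "%Y-%m-%d %H:%M"

impact_start = datetime.strptime(DAY + "16:02", FMT)    # read failures begin
alert_sent = datetime.strptime(DAY + "16:05", FMT)      # automated alert sent
recovery_began = datetime.strptime(DAY + "16:51", FMT)  # service starts recovering
impact_end = datetime.strptime(DAY + "16:56", FMT)      # normal operation restored


def minutes_between(start, end):
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


timeline = {
    "start_to_alert": minutes_between(impact_start, alert_sent),
    "alert_to_recovery_start": minutes_between(alert_sent, recovery_began),
    "recovery_start_to_restored": minutes_between(recovery_began, impact_end),
    "total_impact": minutes_between(impact_start, impact_end),
}

for phase, minutes in timeline.items():
    print(f"{phase}: {minutes} min")
```

The total agrees with the 54 minutes given in the summary: the alert was sent
3 minutes after the impact began, recovery started 46 minutes after the alert,
and normal operation was restored 5 minutes after that.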
To prevent similar incidents in the future, we are implementing changes to
reallocate resources consumed by high-use individual nodes of the storage
layer backing the Logs API.