Google Cloud Storage issue on Wednesday 8 October 2014 beginning at 14:30 PDT


Google Cloud Storage - Announce

Oct 8, 2014, 6:05:41 PM
to gs-an...@googlegroups.com
We're investigating an issue with Google Cloud Storage on Wednesday 8 October 2014 beginning at 14:30 PDT. We will provide more information shortly.

Google Cloud Storage - Announce

Oct 8, 2014, 6:35:33 PM
to gs-an...@googlegroups.com
We are currently experiencing an issue with Google Cloud Storage, and some users are seeing elevated errors and latency. For everyone who is affected, we apologize. We know you count on Google to work for you, and we're working hard to restore normal operation.

Google Cloud Storage - Announce

Oct 8, 2014, 7:05:55 PM
to gs-an...@googlegroups.com
The problem with Google Cloud Storage was resolved as of 15:55 PDT. We apologize for any issues this may have caused to you or your users and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems. We will provide a more detailed analysis of this incident once we have completed our internal investigation.

Google Cloud Storage - Announce

Oct 9, 2014, 8:37:34 PM
to gs-an...@googlegroups.com

SUMMARY:

On Wednesday 8 October 2014, some Google Cloud Storage users experienced increased error rates for 70 minutes. If you were affected by this issue, we apologize. Our goal is to provide you with a high level of reliability and availability, which we failed to meet in this situation. We are taking immediate steps to improve Google Cloud Storage’s reliability and availability.


DETAILED DESCRIPTION OF IMPACT:

On Wednesday 8 October 2014 from 14:15 to 15:30 PDT, 3% of requests to the Google Cloud Storage XML API and 2.9% of requests to the JSON API received an HTTP 500 or 503 response.  No increase in latency was observed during the incident.


ROOT CAUSE:

On Wednesday 8 October 2014, two independent fiber cuts resulted in the loss of a terabit of network capacity between Google datacenters. The impact of this loss was compounded by coincident maintenance on a third fiber link. As a result, the remaining links became saturated and dropped packets.


REMEDIATION AND PREVENTION:

The loss of network capacity and the impacted network links were identified at 14:30 PDT by Google engineers, who notified the fiber provider. Google engineers then took action to route traffic away from the affected links to reduce congestion, and to quickly bring the link under maintenance back into service. Error rates for Google Cloud Storage returned to baseline levels at 15:30 PDT. The fiber repairs were finished on 9 October 2014 at 09:34.


To minimize impact in future incidents of this sort, we will increase the priority of Google Cloud Storage traffic on Google's production network so that it will be more resilient to loss of network capacity.