SUMMARY:
On Wednesday 8 October 2014, some Google Cloud Storage users experienced increased error rates for approximately 75 minutes. If you were affected by this issue, we apologize. Our goal is to provide you with a high level of reliability and availability, and we failed to meet it in this situation. We are taking immediate steps to improve Google Cloud Storage’s reliability and availability.
DETAILED DESCRIPTION OF IMPACT:
On Wednesday 8 October 2014 from 14:15 to 15:30 PDT, 3% of requests to the Google Cloud Storage XML API and 2.9% of requests to the JSON API received an HTTP 500 or 503 response. No increase in latency was observed during the incident.
ROOT CAUSE:
On Wednesday 8 October 2014, two independent fiber cuts resulted in the loss of a terabit of network capacity between Google datacenters. The impact of this loss was compounded by coincident maintenance on a third fiber link. The remaining links became saturated, which resulted in packet loss.
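The way lost capacity turns into saturation can be illustrated with a minimal sketch. All link names, capacities, and the demand figure below are hypothetical and chosen only so that the two cut links add up to one terabit; they are not measurements from this incident.

```python
def offered_load_ratio(demand_gbps, link_capacities_gbps):
    """Return demand divided by total available capacity.

    A value above 1.0 means the available links are saturated.
    """
    total = sum(link_capacities_gbps)
    if total <= 0:
        raise ValueError("no available capacity")
    return demand_gbps / total


def excess_fraction(demand_gbps, link_capacities_gbps):
    """Return the fraction of offered traffic that exceeds available capacity."""
    if demand_gbps <= 0:
        return 0.0
    total = sum(link_capacities_gbps)
    return max(0.0, (demand_gbps - total) / demand_gbps)


# Hypothetical topology (Gbps). Not real figures from the incident.
links = {"a": 500, "b": 500, "c": 400, "d": 400, "e": 400}
demand = 1000  # hypothetical aggregate demand in Gbps

# All links in service: plenty of headroom.
healthy = offered_load_ratio(demand, links.values())

# Links "a" and "b" are cut (one terabit lost) and "c" is in maintenance.
remaining = [links["d"], links["e"]]
degraded = offered_load_ratio(demand, remaining)
excess = excess_fraction(demand, remaining)

print(f"healthy load ratio:  {healthy:.2f}")
print(f"degraded load ratio: {degraded:.2f}")
print(f"traffic over capacity: {excess:.0%}")
```

Under these assumed numbers, the load ratio moves from well below 1.0 to above 1.0 once the third link is also out of service, which is the condition under which the remaining links drop packets.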
REMEDIATION AND PREVENTION:
Google engineers identified the loss of network capacity and the impacted network links at 14:30 PDT and notified the fiber provider. They then routed traffic away from the affected links to reduce congestion and quickly brought the link under maintenance back into service. Error rates for Google Cloud Storage returned to baseline levels at 15:30 PDT. The fiber repairs were finished on 9 October 2014 at 09:34.