Google Cloud Platform Status
Feb 22, 2018, 10:13:38 AM
to google-appengine...@googlegroups.com
On Thursday 15 February 2018, specific Google Cloud Platform services
experienced elevated errors and latency for a period of 62 minutes from
11:42 to 12:44 PST. The following services were impacted:
Cloud Datastore experienced a 4% error rate for get calls and an 88% error
rate for put calls.
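A common client-side mitigation for elevated error rates such as these is
to retry failed calls with exponential backoff. Below is a minimal sketch
using the google-cloud-datastore Python client; the "Task" kind and the
retry parameters are illustrative assumptions, not part of this report.

    import random
    import time

    from google.api_core import exceptions
    from google.cloud import datastore

    def put_with_backoff(client, entity, max_attempts=5):
        # Retry transient Datastore failures with exponential backoff.
        for attempt in range(max_attempts):
            try:
                client.put(entity)
                return
            except (exceptions.ServiceUnavailable,
                    exceptions.DeadlineExceeded):
                if attempt == max_attempts - 1:
                    raise
                # Sleep 1s, 2s, 4s, ... plus jitter before retrying.
                time.sleep(2 ** attempt + random.random())

    client = datastore.Client()
    entity = datastore.Entity(key=client.key("Task"))  # illustrative kind
    entity["done"] = False
    put_with_backoff(client, entity)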
App Engine's serving infrastructure, which is responsible for routing
requests to instances, experienced a 45% error rate, most of which were
timeouts.
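Callers of an affected application could shield themselves from timeouts
like these by retrying idempotent requests. A sketch using the Python
requests library; the hostname, endpoint, and retry settings are
assumptions for illustration only.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Retry idempotent GETs on connection failures and 5xx responses.
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=Retry(
        total=3, backoff_factor=1.0,
        status_forcelist=[500, 502, 503, 504])))

    # Hypothetical app URL; set an explicit timeout rather than waiting
    # indefinitely on a hung request.
    resp = session.get("https://my-app.appspot.com/health", timeout=10)
    resp.raise_for_status()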
App Engine Task Queues would not accept new transactional tasks, nor new
tasks in regions outside us-central1 and europe-west1.
Tasks continued to be dispatched during the event but saw start delays of
0-30 minutes; additionally, a fraction of tasks executed with errors due to
the aforementioned Cloud Datastore and App Engine performance issues.
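For context, a transactional task is enqueued inside a Datastore
transaction and is added only if that transaction commits; these were the
adds that Task Queues rejected. A minimal sketch using the
first-generation App Engine Python SDK, where the handler URL and entity
model are illustrative assumptions.

    from google.appengine.api import taskqueue
    from google.appengine.ext import ndb

    @ndb.transactional
    def mark_done_and_enqueue(key):
        entity = key.get()
        entity.done = True
        entity.put()
        # Enqueued only if the surrounding transaction commits; adds
        # like this one were rejected during the incident.
        taskqueue.add(url="/worker/process", transactional=True)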
App Engine Memcache calls experienced a 5% error rate.
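Memcache failures in the App Engine Python runtime generally surface as
cache misses, so code written with a backing-store fallback degrades
gracefully under an elevated error rate. A sketch; the "Profile" model
and key scheme are illustrative assumptions.

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    def get_profile(user_id):
        # memcache.get returns None on a miss (and on most RPC
        # failures), so the Datastore fallback below also covers an
        # elevated Memcache error rate.
        profile = memcache.get("profile:%s" % user_id)
        if profile is None:
            profile = ndb.Key("Profile", user_id).get()
            # Best effort: a failed set simply leaves the cache cold.
            memcache.set("profile:%s" % user_id, profile, time=300)
        return profile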
App Engine Admin API write calls failed during the incident, causing
unsuccessful application deployments. App Engine Admin API read calls
experienced a 13% error rate.
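An Admin API read call of the kind affected can be issued with the Google
API discovery client; a minimal sketch, with the project ID as an
illustrative assumption.

    from googleapiclient import discovery

    # apps.get is a read call; roughly 13% of such calls failed during
    # the incident, so callers needed to tolerate transient errors.
    service = discovery.build("appengine", "v1")
    app = service.apps().get(appsId="my-project").execute()
    print(app["defaultHostname"])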
App Engine Search API index writes failed during the incident though search
queries did not experience elevated errors.
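To illustrate the asymmetry: index writes go through Index.put while
queries go through Index.search, and only the former failed. A sketch
using the App Engine Search API; the index name, document fields, and
error handler are illustrative assumptions.

    from google.appengine.api import search

    index = search.Index(name="products")  # illustrative index name
    doc = search.Document(fields=[
        search.TextField(name="title", value="example widget"),
    ])
    try:
        index.put(doc)  # writes like this failed during the incident
    except search.PutError as e:
        log_failed_write(e)  # hypothetical application-level handler

    # Queries did not see elevated errors during the event:
    results = index.search("title:widget")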
Stackdriver Logging experienced delays exporting logs to systems including
Cloud Console Logs Viewer, BigQuery, and Cloud Pub/Sub. Stackdriver Logging
retries on failure, so no logs were lost during the incident. Logs-based
Metrics failed to post some data points during the incident.
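Exports to BigQuery and Cloud Pub/Sub are typically configured as log
sinks; for reference, a sink can be created with the google-cloud-logging
Python client. The sink name, filter, and destination below are
illustrative assumptions.

    from google.cloud import logging

    client = logging.Client()
    sink = client.sink(
        "error-export",  # illustrative sink name
        filter_="severity>=ERROR",
        destination="bigquery.googleapis.com/projects/my-project/datasets/logs",
    )
    sink.create()  # exports matching entries to the BigQuery dataset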
We apologize for the impact of this outage on your application or service.
For Google Cloud Platform customers who rely on the products affected by
this event, the impact was substantial, and we recognize that it caused
significant disruption. We are conducting a detailed post-mortem to ensure
that all of the root and contributing causes of this event are understood
and addressed promptly.