Google App Engine Datastore search errors and latency


Google Cloud Platform Status

Aug 12, 2015, 14:48:54
to google-appengine...@googlegroups.com
We are investigating reports of an issue with Google App Engine Datastore.
We will provide more information by 12:00 US/Pacific.

Google Cloud Platform Status

Aug 12, 2015, 15:00:24
to google-appengine...@googlegroups.com
We are experiencing an issue with App Engine Search API requests timing out
beginning at Wednesday, 2015-08-12 11:05 US/Pacific. You may see requests
timing out or returning successfully with increased latency.

For everyone who is affected, we apologize for any inconvenience you may be
experiencing. We will provide an update by 13:00 US/Pacific with current
details.

Google Cloud Platform Status

Aug 12, 2015, 15:12:13
to google-appengine...@googlegroups.com
The issue with App Engine Search API timeouts should be resolved as of
12:00 US/Pacific. Our internal investigation is in progress, and at this
point we cannot be certain that the issue will not recur. We will post a
further update by 13:00 as we work towards declaring the incident fully
over.

Google Cloud Platform Status

Aug 12, 2015, 15:50:42
to google-appengine...@googlegroups.com
The issue with App Engine Search API should be resolved for all affected
apps as of 11:46 US/Pacific. We will conduct an internal investigation of
this issue and make appropriate improvements to our systems to prevent or
minimize future recurrence. We will provide a more detailed analysis of
this incident once we have completed our internal investigation.

Google Cloud Platform Status

Aug 13, 2015, 19:30:12
to google-appengine...@googlegroups.com
SUMMARY:

On Wednesday, 12 August 2015, the Search API for Google App Engine
experienced increased latency and errors for 40 minutes. We apologize for
this incident and the effect it had on applications using the Search API.
We strive for excellent performance and uptime, so we will take appropriate
actions right away to improve the Search API’s availability.

If you believe your paid application experienced an SLA violation as a
result of this incident, please contact us at:
https://support.google.com/cloud/answer/3420056

DETAILED DESCRIPTION OF IMPACT:

On Wednesday, 12 August 2015, from 11:05am to 11:45am PDT, the Search API
service experienced an increase in latency and error rate. 8.7% of
applications using the Search API experienced a 7.5% error rate, with
messages such as: “Timeout: Failed to complete request in NNNNms”
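
For applications that see intermittent timeouts of this kind, one common client-side mitigation is to retry the call with jittered exponential backoff. The following is only a minimal sketch under stated assumptions, not guidance from the incident report: `TransientSearchError` and `call_with_backoff` are hypothetical names, and `fn` stands in for whatever search call your application makes.

```python
import random
import time


class TransientSearchError(Exception):
    """Hypothetical stand-in for a timeout raised by a search call."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.2, max_delay=5.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientSearchError with jittered backoff.

    All names and default values here are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientSearchError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

Capping both the number of attempts and the delay matters here: unbounded retries from many clients can add load to a backend that is already slow.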

ROOT CAUSE:

A set of queries sent to a Google-owned service running on App Engine
caused the Search API service to fail.

REMEDIATION AND PREVENTION:

At 10:28, Google engineers were automatically alerted to increasing latency
in the Search API backend, but at this point, customers were not impacted.
At 11:05, the increasing latency started causing Search API timeouts. Once
the cause of the latency increase was discovered, the relevant user was
isolated from other customers, ending the incident at 11:45.

The Search API team is implementing mitigation and monitoring changes as a
result of this incident, which include changes to the API backend to
isolate the impact of similar issues and improved monitoring to reduce the
time taken to detect and isolate problematic workloads for the Search API.
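
The report does not say how the backend isolation will be implemented. Purely as a generic illustration of the idea of limiting one caller's impact on others, a per-client cap on in-flight requests might look like the sketch below; `PerClientLimiter` and its methods are hypothetical names and are not part of the Search API.

```python
import threading
from collections import defaultdict


class PerClientLimiter:
    """Caps in-flight requests per client (illustrative assumption only)."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, client_id):
        """Return True and count the request if the client is under its cap."""
        with self._lock:
            if self._in_flight[client_id] >= self._max:
                return False  # this client is at its cap; others are unaffected
            self._in_flight[client_id] += 1
            return True

    def release(self, client_id):
        """Mark one of the client's in-flight requests as finished."""
        with self._lock:
            if self._in_flight[client_id] > 0:
                self._in_flight[client_id] -= 1
```

With a cap like this, a single client issuing expensive queries is rejected once it reaches its own limit, while requests from other clients continue to be admitted.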