Datastore Indexing Issue


Google Cloud Platform Status

Apr 10, 2015, 4:09:14 PM
to google-appengine...@googlegroups.com
We're investigating an issue with Google App Engine Datastore beginning at
Friday 2015-04-10 12:30 (all times are in US/Pacific). We will provide more
information within one hour.

Google Cloud Platform Status

Apr 10, 2015, 4:45:38 PM
to google-appengine...@googlegroups.com
We are currently experiencing an issue with Google App Engine Datastore.
Some applications' Datastore indexes are not updating. For everyone who is
affected, we apologize for any inconvenience you may be experiencing. We
will provide an update by Friday 2015-04-10 14:45 (US/Pacific) with current
details.
Google Engineers have identified the cause and are currently working on
multiple resolution strategies.

Google Cloud Platform Status

Apr 10, 2015, 5:15:55 PM
to google-appengine...@googlegroups.com
The problem with Google App Engine Datastore was resolved as of Friday
2015-04-10 14:00 (US/Pacific). We apologize for the inconvenience and thank
you for your patience and continued support. Please rest assured that
system reliability is a top priority at Google, and we are making
continuous improvements to make our systems better. We will provide a more
detailed analysis of this incident once we have completed our internal
investigation.

Google Cloud Platform Status

Apr 15, 2015, 3:25:36 PM
to google-appengine...@googlegroups.com

SUMMARY:

On Friday 10th April 2015, attempts to create or update Datastore indexes
failed for some Google App Engine applications for a duration of 148
minutes. In addition, a number of applications retrieved stale data using
eventually consistent read operations for an unexpectedly long period. If
your service or application was affected, we apologize — this is not the
level of quality and reliability we strive to offer you, and we are taking
immediate steps to improve the platform’s performance and availability.

DETAILED DESCRIPTION OF IMPACT:

On Friday 10 April 2015 from 11:30 to 13:58 PDT, 331 requests to create or
update the definition of Datastore composite indexes across 21 applications
failed to complete. In addition, about 34% of applications retrieved stale
data using eventually consistent QUERY or GET operations [1]. Unlike
strongly consistent operations, eventually consistent read operations are
expected to return stale data for a brief period. During this incident,
however, stale results persisted for longer than is typically observed in
normal operation. Strongly consistent operations were not affected.
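
The difference between the two read modes can be illustrated with a small
sketch. This is a toy model only, not Datastore's actual architecture or API:
the ToyStore class, its primary/replica dictionaries and the strong flag are
all hypothetical names invented for illustration. It shows why a stalled
replica can leave eventually consistent reads stale while strongly consistent
reads are unaffected.

```python
class ToyStore:
    """Toy model (assumption): one up-to-date copy plus one lagging replica."""

    def __init__(self):
        self.primary = {}   # always reflects the latest write
        self.replica = {}   # only reflects writes that have been replicated
        self.pending = []   # writes not yet applied to the replica

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to the replica, in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def get(self, key, strong=True):
        # Strong reads use the up-to-date copy; eventual reads use the replica.
        source = self.primary if strong else self.replica
        return source.get(key)


store = ToyStore()
store.put("greeting", "v1")
store.replicate()
store.put("greeting", "v2")          # replication is stalled at this point

strong_read = store.get("greeting", strong=True)
eventual_read = store.get("greeting", strong=False)

store.replicate()                    # replication resumes
caught_up_read = store.get("greeting", strong=False)
```

In this sketch the stale window lasts until replicate() runs again, which
loosely mirrors the extended staleness described above.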

During the recovery phase of this incident, about 7% of Google App Engine
applications experienced elevated latency on PUT operations for 15 minutes.

ROOT CAUSE:

During a planned maintenance activity, undertaken to create a new Datastore
replica to accommodate organic growth, incorrectly configured automation
created an unnecessarily large table in the new replica. This exhausted the
resources allocated to Datastore and caused writes to this replica to fail.
Once the underlying problem was resolved, a high volume of writes was routed
to the new replica, resulting in elevated latency for write operations.

REMEDIATION AND PREVENTION:

At 00:30 PDT on Friday 10th April 2015, an automated alert on resource
depletion was sent out to Google Engineers. However, this alert was
suppressed, as is normal practice when undertaking this type of maintenance
activity. At 11:30 PDT, quota allocated to the replica was exhausted.
Google Engineers were notified by internal teams at 12:53 PDT of problems
with Datastore indexes. At 13:26 PDT, Google Engineers deleted the
problematic large table and started the procedure to reserve additional
quota for this storage replica. This took effect at 13:35 PDT and the
replica started receiving write requests immediately, which caused a brief
increase in latency. Normal operation was restored at 13:58 PDT.
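
As a cross-check of the timeline above, the following sketch computes the
time spent in each phase from the timestamps given in this report. The event
labels and helper function names are our own and purely illustrative; only
the times come from the report.

```python
from datetime import datetime

# Timestamps (PDT, Friday 2015-04-10) as stated in this report.
EVENTS = [
    ("quota exhausted", "11:30"),
    ("engineers notified", "12:53"),
    ("large table deleted", "13:26"),
    ("additional quota in effect", "13:35"),
    ("normal operation restored", "13:58"),
]


def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_durations(events):
    """Pair each event with the next one and return (label, minutes)."""
    return [
        (f"{a[0]} -> {b[0]}", minutes_between(a[1], b[1]))
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, minutes in phase_durations(EVENTS):
        print(f"{label}: {minutes} min")
    total = minutes_between(EVENTS[0][1], EVENTS[-1][1])
    print(f"total impact: {total} min")
```

The phases sum to the 148 minutes quoted in the summary, of which 83 minutes
elapsed between quota exhaustion and engineers being notified.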

To prevent similar incidents in future, we are modifying our maintenance
procedures to avoid suppression of the appropriate alerts, and to ensure
that this large table is created under close monitoring.


[1]. Details on eventual and strong consistency on Google Cloud Datastore:
https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/#h.tf76fya5nqk8