This morning at around 6:30am PDT we experienced a datastore outage
during which a small percentage of requests returned errors. Between
9:00 and 11:30am, and again at 12:40pm, the percentage of requests
returning errors increased significantly. At around 1:40pm we were
able to isolate the issue, and requests are currently serving
normally.
This outage was the result of a bug in our datastore servers and was
triggered by a particular class of queries. We have isolated the bug
and we're currently working on a fix. Going forward, we're also
working to further isolate queries so that in the future a bug like
this won't affect the stability of the system as a whole.
Thanks for being patient. We'll post further updates as we have them.
(Especially the second message, where DataStore commits are showing up
one moment, and not the other, and then showing up again. Needless to
say that's throwing my app into a frenzy.)
Thanks,
Aral
<snip>
> We have isolated the bug
> and we're currently working on a fix. Going forward, we're also
> working to further isolate queries so that in the future a bug like
> this won't affect the stability of the system as a whole.
> Thanks for being patient. We'll post further updates as we have them.
Just an update on the issues we saw on Tuesday. We've identified the
root cause of the issue and implemented a fix. Specifically, we've
instituted a set of controls to ensure 1) that datastore queries no
longer trigger this particular bug and 2) that bugs like this in the
future don't affect the stability of the system as a whole. All of
our systems are currently operating smoothly, and have been since
1:40pm PDT (GMT-7) on Tuesday 6/17.
During this preview period, we on the App Engine team are working hard
to smooth out the system. At all times, though, we're trying to keep
system-wide outages like this to an absolute minimum so that your apps
will remain up and running. We're also trying to make sure that we
build effective ways to communicate with developers about the hiccups
that occasionally occur with large and complex systems like this, and
we'd welcome your feedback and ideas.
Pete Koomen, App Engine Team
On Jun 17, 3:35 pm, Pete <pkoo...@google.com> wrote:
> [snip]
On Jun 19, 1:13 pm, Pete <pkoo...@google.com> wrote:
> [snip] We're also trying to make sure that we
> build effective ways to communicate with developers about the hiccups
> that occasionally occur with large and complex systems like this, and
> we'd welcome your feedback and ideas.
Well, I'd suggest a public live dashboard of the App Engine service as
a whole, showing uptime, min / mean / median response times, aggregate
storage, number of App Engine apps, rate of new app creation, max /
mean / median storage per app, and so on.
Something like Zeitgeist[1] for App Engine.
True, only a few of these statistics are actually *useful* to an App
Engine developer, but we all have an intense curiosity about what's
going on under the covers, so you might as well leverage that into a
PR tool.
Just a quick update on the weirdness I was seeing: we've concluded
that it was due to browser caching caused by a flaky internet
connection (T-Mobile WiFi at Starbucks). Please see Pete's comment in
the thread referenced previously for more details.
On Jun 18, 6:49 pm, Aral <a...@aralbalkan.com> wrote:
<snip>
Right now I'm seeing the behavior described here:
http://groups.google.com/group/google-appengine/browse_thread/thread/...
<snip>
From 06-23 09:59PM to 06-23 10:12PM GAE server time:
Traceback (most recent call last):
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 499, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/hms/1.188/Main_app.py", line 1031, in get
    sysvalues=db.GqlQuery("SELECT * FROM SysValues").get()
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1257, in get
    results = self.fetch(1)
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1301, in fetch
    raw = self._get_query().Get(limit, offset)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 928, in Get
    return self._Run(limit, offset)._Next(limit)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 872, in _Run
    _ToDatastoreError(err)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1603, in _ToDatastoreError
    raise datastore_errors.Error(err.error_detail)
Error
Please Check.
On Jun 19, 10:50 pm, "Michael R. Bernstein" <mich...@fandomhome.com>
wrote:
> [snip]
And again from 06-24 08:24AM to 06-24 08:27AM and from 06-24 11:39AM
to 06-24 12:20PM.
The traceback always involves a datastore operation (even simple
reads) and always ends with:
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1603, in _ToDatastoreError
    raise datastore_errors.Error(err.error_detail)
Error
Is anybody else getting these errors again?
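One way an app could ride out short error windows like these is to
retry the failing datastore call a few times before giving up. What
follows is only a minimal sketch, assuming the errors are transient;
retry_on_error and all of its parameters are hypothetical names made up
for illustration, not part of the App Engine SDK:

```python
import time


def retry_on_error(operation, retriable, attempts=3, base_delay=0.1,
                   sleep=time.sleep):
    """Call operation(), retrying with exponential backoff.

    Hypothetical helper, not part of the App Engine SDK. `retriable` is
    the exception class (or tuple of classes) worth retrying on.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retriable:
            # Out of attempts: let the last error reach the caller.
            if attempt == attempts - 1:
                raise
            # Back off 0.1s, 0.2s, 0.4s, ... before the next try.
            sleep(base_delay * (2 ** attempt))
```

In the handler from the traceback above, the call might look like
retry_on_error(lambda: db.GqlQuery("SELECT * FROM SysValues").get(),
datastore_errors.Error). Retries like this can only paper over brief
blips; they will not help during an extended outage.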
On Jun 24, 9:51 am, Gadi <gad.l...@gmail.com> wrote:
> [snip]