App Engine outage today

Pete

Jun 17, 2008, 6:35:00 PM
to Google App Engine
Hi all,

This morning at around 6:30am PDT we experienced a datastore outage
during which a small percentage of requests returned errors. Between
9:00 and 11:30am, and again at 12:40pm, the percentage of requests
returning errors increased significantly. At around 1:40pm we were
able to isolate the issue, and requests are currently serving
normally.

This outage was the result of a bug in our datastore servers and was
triggered by a particular class of queries. We have isolated the bug
and we're currently working on a fix. Going forward, we're also
working to further isolate queries so that in the future a bug like
this won't affect the stability of the system as a whole.

Thanks for being patient. We'll post further updates as we have them.

Pete, App Engine Team

Aral

Jun 18, 2008, 1:49:55 PM
to Google App Engine
Should we expect weirdness with the DataStore until we have an update
from you?

Right now I'm seeing the behavior described here:
http://groups.google.com/group/google-appengine/browse_thread/thread/bace7a0b6716df6e#

(Especially the second message, where DataStore commits are showing up
one moment, and not the other, and then showing up again. Needless to
say that's throwing my app into a frenzy.)

Thanks,
Aral

<snip>
> We have isolated the bug
> and we're currently working on a fix. Going forward, we're also
> working to further isolate queries so that in the future a bug like
> this won't affect the stability of the system as a whole.
>
> Thanks for being patient. We'll post further updates as we have them.
<snip>

Pete Koomen

Jun 19, 2008, 4:39:59 PM
to Google App Engine
Hi all,

Just an update on the issues we saw on Tuesday. We've identified the
root cause of the issue and implemented a fix. Specifically, we've
instituted a set of controls to ensure 1) that datastore queries no
longer trigger this particular bug and 2) that bugs like this in the
future don't affect the stability of the system as a whole. All of
our systems are currently operating smoothly, and have been since
1:40pm PDT (GMT-7) on Tuesday 6/17.

During this preview period, we on the App Engine team are working hard
to smooth out the system. At all times, though, we're trying to keep
system-wide outages like this to an absolute minimum so that your apps
will remain up and running. We're also trying to make sure that we
build effective ways to communicate with developers about the hiccups
that occasionally occur with large and complex systems like this, and
we'd welcome your feedback and ideas.

Pete Koomen, App Engine Team

Michael R. Bernstein

Jun 19, 2008, 4:50:17 PM
to Google App Engine
On Jun 19, 1:13 pm, Pete <pkoo...@google.com> wrote:
>
> [snip] We're also trying to make sure that we
> build effective ways to communicate with developers about the hiccups
> that occasionally occur with large and complex systems like this, and
> we'd welcome your feedback and ideas.

Well, I'd suggest a public live dashboard of the App Engine service as
a whole, showing uptime, min / mean / median response times, aggregate
storage, number of App Engine apps, rate of new app creation, max /
mean / median storage per app, and so on.

Something like Zeitgeist[1] for App Engine.

True, only a few of these statistics are actually *useful* to an App
Engine developer, but we all have an intense curiosity about what's
going on under the covers, so you might as well leverage that into a
PR tool.

Example dashboards:

http://status.aws.amazon.com/

http://trust.salesforce.com/trust/status/

- Michael

[1] http://www.google.com/intl/en/press/zeitgeist2007/

Aral Balkan

Jun 20, 2008, 4:24:27 AM
to Google App Engine
Just a quick update to the weirdness I was seeing: we've concluded
that it was due to browser caching because of a flaky internet
connection (T-Mobile WiFi at Starbucks). Please see the thread
referenced previously for Pete's comment for more details.

On Jun 18, 6:49 pm, Aral <a...@aralbalkan.com> wrote:
<snip>
Right now I'm seeing the behavior described here:
http://groups.google.com/group/google-appengine/browse_thread/thread/...
<snip>

Gadi

Jun 24, 2008, 2:51:16 AM
to Google App Engine
The error is back again!

From 06-23 09:59PM to 06-23 10:12PM GAE server time:

Traceback (most recent call last):
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 499, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/hms/1.188/Main_app.py", line 1031, in get
    sysvalues=db.GqlQuery("SELECT * FROM SysValues").get()
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1257, in get
    results = self.fetch(1)
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1301, in fetch
    raw = self._get_query().Get(limit, offset)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 928, in Get
    return self._Run(limit, offset)._Next(limit)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 872, in _Run
    _ToDatastoreError(err)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1603, in _ToDatastoreError
    raise datastore_errors.Error(err.error_detail)
Error


Please check.



On Jun 19, 10:50 pm, "Michael R. Bernstein" <mich...@fandomhome.com>
wrote:

Gadi

Jun 24, 2008, 4:35:24 PM
to Google App Engine
And again from 06-24 08:24AM to 06-24 08:27AM and from 06-24 11:39AM
to 06-24 12:20PM.

The traceback always involves a datastore operation (even simple
reads) and always ends with:

  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1603, in _ToDatastoreError
    raise datastore_errors.Error(err.error_detail)
Error

Is anybody else getting these errors again?
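
In case it helps anyone seeing the same traceback: this does not fix the underlying problem, but an app could soften short bursts of errors by retrying the read a few times before giving up. What follows is only a rough sketch under assumptions. `DatastoreError` is a hypothetical stand-in for `datastore_errors.Error` so the snippet runs outside App Engine, and `fetch_with_retry`, its parameters, and the backoff values are made up for illustration, not part of any App Engine API. It is also written in current Python syntax rather than for the App Engine runtime.

```python
import time


class DatastoreError(Exception):
    """Hypothetical stand-in for google.appengine.api.datastore_errors.Error."""


def fetch_with_retry(query_fn, attempts=3, delay=0.1, sleep=time.sleep):
    """Call query_fn, retrying on DatastoreError with exponential backoff.

    query_fn: zero-argument callable that performs the datastore read.
    attempts: total number of tries (assumed value, tune for your app).
    delay:    base delay in seconds; doubled after each failed try.
    sleep:    injectable sleep function, so tests can skip real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_err = None
    for attempt in range(attempts):
        try:
            return query_fn()
        except DatastoreError as err:
            last_err = err
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    # Every try failed: re-raise the last error so the caller still sees it.
    raise last_err
```

In a handler, the failing line from the traceback would be wrapped roughly as `fetch_with_retry(lambda: db.GqlQuery("SELECT * FROM SysValues").get())`, with the `except` clause catching the real `datastore_errors.Error` instead of the stand-in. Retrying only helps with short transient failures; during a longer outage the request will still fail after the last attempt.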