super high failure rate for datastore

Ben Nevile

Jun 2, 2009, 12:35:03 AM
to Google App Engine
About 90 minutes ago one of my applications began failing on somewhere
close to 100% of its datastore API calls. The app is nowhere close to
exceeding any quotas, and I haven't made any changes in the code in
the last ... I don't know, at least 12 hours.

What's going on, Google? The app id is "butterpicks".


Ben

Ben Nevile

Jun 2, 2009, 4:08:49 AM
to Google App Engine
tick tock. four and a half hours of these failures now. I've had to
take the site down and put up a "sorry!" message.

ben

Ben Nevile

Jun 2, 2009, 10:47:58 AM
to Google App Engine
looks like the problem corrected itself about an hour ago. Please
will someone from Goog let me know what's up? I want to be able to
trust this platform!

Ben

objectuser

Jun 2, 2009, 8:41:41 AM
to Google App Engine
That's crazy!

I notice that the status panel shows normal:

http://code.google.com/status/appengine

Are you still getting failures? I'm wondering if those other days
where everything was normal also experienced failures that don't show
up on the status panel.

Brett (Google)

Jun 2, 2009, 2:58:57 PM
to Google App Engine
Hi Ben,

On Jun 1, 9:35 pm, Ben Nevile <ben.nev...@gmail.com> wrote:
> About 90 minutes ago one of my applications began failing on somewhere
> close to 100% of its datastore API calls.  The app is nowhere close to
> exceeding any quotas, and I haven't made any changes in the code in
> the last ... I don't know, at least 12 hours.

Could you provide some more information about the source of your
Datastore issues? What exceptions are you getting (what do you see in
your application logs)? What's your access pattern like (requests per
second, data size per request, number of Datastore calls per request)?
Do you have contention on any entities? Do you have a high rate of
querying on a single set of entities? Are you using memcache at all?

All of these issues can affect your application individually while the
rest of the system works just fine. It's important to keep track of
your application's errors in its logs. It's also useful to load test
your application before it sees real traffic, so you can find any
sources of contention before they cause failures.
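
As a rough illustration of keeping track of errors per call, here is a minimal sketch in modern Python. The `track_errors` decorator and the `error_counts` tally are hypothetical names made up for this example; they are not part of the App Engine SDK, and a real app would likely rely on the platform's own request logs as well.

```python
import collections
import functools
import logging

# Hypothetical in-memory tally of exception types seen so far.
# Illustrative only: it is per-process and is lost on restart.
error_counts = collections.Counter()


def track_errors(func):
    """Log and count any exception raised by func, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Count by exception class name so timeouts, contention
            # errors, and so on can be told apart later.
            error_counts[type(exc).__name__] += 1
            # logging.exception records the message plus the traceback.
            logging.exception("call to %s failed", func.__name__)
            raise
    return wrapper
```

Wrapping each function that talks to the datastore with `@track_errors` would make it easier to answer "what exceptions are you getting" from the logs, since every failure is recorded with its type and traceback before it propagates.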

-Brett
App Engine Team

Ben Nevile

Jun 2, 2009, 5:20:14 PM
to Google App Engine
Hi Brett,

I'm happy to provide you with some details. I think the most important
detail though is that the application has been running for months with
more or less the same access pattern, and last night's behaviour was
anomalous.

The exceptions were all Timeouts.

The application gets between 1 and 25 req/second, depending on the
time of day. The number of datastore requests ranges from 0 to 10
with the median probably around 1. The amount of data being returned
is always small... just simple html or short json strings. There's no
contention. I am using memcache, yes, but the timeouts always came
from datastore operations.
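
Since the failures described here were all timeouts, one common mitigation for transient errors is to retry the call a few times with exponential backoff. The following is a minimal sketch under stated assumptions, not the SDK's API: `DatastoreTimeout` is a stand-in class defined here so the example is self-contained (a real app would pass the platform's own timeout exception via `retry_on`), and `call_with_retries` and its parameters are hypothetical.

```python
import time


class DatastoreTimeout(Exception):
    """Stand-in for the platform's timeout exception (an assumption)."""


def call_with_retries(operation, retry_on=(DatastoreTimeout,),
                      attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on the given exceptions.

    The wait doubles after each failed attempt (base_delay, then
    2 * base_delay, and so on). The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the caller see the error.
                raise
            sleep(base_delay * (2 ** attempt))
```

Retries like this only help with brief, transient failures. They would not rescue an outage lasting hours, and retrying too aggressively adds load, so the attempt count and delays should stay small.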

Ben


