Timeout: datastore timeout: operation took too long.


ray

Apr 13, 2009, 1:36:05 PM
to Google App Engine
I'm not sure why this occurs all of a sudden. The same job normally runs in
600 ms. Then once in a while it runs for over 6800 ms and times out.
I can't have jobs just time out for no reason. According to my logs,
the only other request during this time was a second prior. This
occurred at 04-13 09:13AM 29.627 today and at other random times in
the last few days.

DarkCoiote

Apr 13, 2009, 7:11:14 PM
to Google App Engine
Looks like the same problem I and others have been having... I posted about it a few days ago:

http://groups.google.com/group/google-appengine/browse_thread/thread/83b45cb3f90a2a3f/740e922de7d0b33b?q=#740e922de7d0b33b

Random datastore timeouts in totally unexpected places...

Jeff S (Google)

Apr 15, 2009, 2:26:09 PM
to google-a...@googlegroups.com
Hi Ray,

Which operation was it that timed out (get, query, put)? Also, how consistently are you seeing these timeouts?

I generally recommend catching datastore timeouts and handling them in a way that makes sense for your app. There are currently occasional (quite rare as a percentage) timeouts for queries and gets, and timeouts on a put are often an indicator of contention on that entity or entity group.
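
One way to act on this advice is to wrap datastore calls in a small retry helper. The sketch below is only an illustration under assumptions: `TimeoutStandIn` stands in for the SDK's `DatastoreTimeoutException` so the example runs without App Engine, and `withRetries`, the attempt limit, and the delays are hypothetical choices, not part of any App Engine API.

```java
import java.util.function.Supplier;

public class RetrySketch {
    /** Hypothetical stand-in for the SDK's DatastoreTimeoutException. */
    static class TimeoutStandIn extends RuntimeException {
        TimeoutStandIn(String message) { super(message); }
    }

    /**
     * Runs op, retrying when it throws TimeoutStandIn.
     * Waits baseDelayMs before the first retry and doubles the wait each time.
     * After maxAttempts failed attempts the last exception is rethrown.
     */
    static <T> T withRetries(Supplier<T> op, int maxAttempts, long baseDelayMs) {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (TimeoutStandIn e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller degrade gracefully
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated operation: times out twice, then succeeds.
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TimeoutStandIn("datastore timeout: operation took too long.");
            }
            return "stored";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Keep the attempt count and delays small so the retries still fit inside the request's time limit, and retry only operations that are harmless to repeat, since a write that reported a timeout may still have been applied.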

Happy coding,

Jeff

DarkCoiote

Apr 16, 2009, 7:16:45 AM
to Google App Engine


On Apr 15, 3:26 pm, "Jeff S (Google)" <j...@google.com> wrote:
> [...] timeouts on a put is often
> an indicator of contention on that entity or entity group.

well... almost all (if not all) of the timeouts I'm seeing are on 'put'
operations...
but contention would need, like, 2 or more operations on the same
entity (all my entities are roots... stupid, I know), right?

I'll check for bugs that could cause multiple requests and stuff like
that....

Thank you


Jeff S (Google)

Apr 16, 2009, 1:10:12 PM
to google-a...@googlegroups.com
On Thu, Apr 16, 2009 at 4:16 AM, DarkCoiote <darkc...@gmail.com> wrote:
> well... almost (if not all) of the timeouts I'm seeing are on 'put'
> operations...
> but contention would need, like, 2 or more operations on the same
> entity
> (all my entities are roots... stupid, I know ), right?

Actually, making most of your entities roots is often better than deep ancestor trees in terms of overall write throughput. When a transactional write updates an entity that has a parent entity, the ancestors are locked. So if nearly concurrent requests update different entities which all share a common ancestor (in other words, the entities are in the same entity group), some of the child entity updates could fail due to contention on writes. There are more details in the documentation:

http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html#Entity_Groups_Ancestors_and_Paths

"""
Tips for using entity groups:

- Only use entity groups when they are needed for transactions. For other relationships between entities, use ReferenceProperty properties and Key values, which can be used in queries.

- The more entity groups your application has—that is, the more root entities there are—the more efficiently the datastore can distribute the entity groups across datastore nodes. Better distribution improves the performance of creating and updating data. Also, multiple users attempting to update entities in the same entity group at the same time will cause some users to retry their transactions, possibly causing some to fail to commit changes. Do not put all of the application's entities under one root.

- A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller.

- Entity groups do not have a significant impact on the speed of queries.
"""

If your entities are roots, then yes, contention could still occur if the same entity were updated by overlapping requests.

Thank you,

Jeff
 

DarkCoiote

Apr 16, 2009, 7:10:07 PM
to Google App Engine
Yes... I've read that...

The problem with making all entities roots is that I'm unable to use
transactions as it is... I would have to code a little bit... although
I just found a paper describing a project that seems really good.

http://danielwilkerson.com/dist-trans-gae.html

I think it will be presented this weekend, and I really hope that
it goes "public" or, better yet, gets plugged into App Engine!

Just double-checked my GAE logs, and it seems that I'm getting
timeouts in "get" operations as well... multiple requests could cause
that too (one get 'over' a put for example)...

Have to check my code....

Thank you


ray

Apr 16, 2009, 10:05:24 PM
to Google App Engine
I'm seeing random timeouts even when there are many seconds or minutes
between requests, and hours earlier App Engine handled the same request
within 400 ms. I hit another one today that will actually cost my
business, not much but some. My app needs to catch these, and I am working
on many of the pain points. However, I think the timeouts need to be addressed.

Paul Kinlan

Apr 17, 2009, 3:50:55 AM
to google-a...@googlegroups.com
Hi,

I would just like to add myself to this: my app twitterautofollow (twollo) regularly gets datastore timeouts, on puts for the most part. All my entities are root entities.

I did have a thread open on this only a few days ago.

Paul


Sylvain

Apr 17, 2009, 4:54:11 AM
to Google App Engine
Datastore timeouts are one of the biggest (and oldest) issues with GAE,
mostly because they are random.
You can get one on get, put, fetch,... even with few entities.

I hope that datastore timeouts will soon be negligible. Currently, that
is not the case.

Regards



ray

Apr 17, 2009, 8:12:15 AM
to Google App Engine
Speaking only for my app, I can tell this is not an application
issue. My datastore is made up of only root entities and in most
cases works well and quickly. My app doesn't see large volumes of
requests per second; it's more like requests per minute. There
is no possible way the errors I'm seeing are from contention, given the
minutes between requests. And the exact same process was handled a
few hours earlier within 600 ms. My app is soon to increase in
volume and needs to be stable. I love using App Engine, but I've
never faced errors like this on any other platform, from IIS and
SQL Server to PHP and MySQL or even Unix and Oracle. Don't get me
wrong, I understand the difference in the platforms. If there is
anything I can do to prevent this, I would love to know. I'm not
moving my app, but I need to find a solution. My company will be on the
local NBC news (I will be mentioning App Engine) and may see a huge
volume hitting the site next week.




ken keller

Apr 17, 2009, 12:14:55 PM
to Google App Engine
Good for you that your company has good prospects. As somebody who has
built high-traffic sites (co-founder of IGN.com), I offer some advice:
Don't even think about driving traffic to it unless it has been stable
for weeks. If you are having problems with minute traffic, you can't
imagine how bad it will be under load. Plan for graceful degradation.
Some possible degradations: read-only mode except for existing
registered users; a static site.

Another thing to try is to queue puts in memcache, in a simple
datastore queue, or in http://aws.amazon.com/sqs/. The deferred puts can
be stored as simple strings using the Pickler.
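
The deferred-put idea might look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: the in-memory `Deque` stands in for memcache, a datastore queue, or SQS; Java serialization plus Base64 plays the role of the Pickler; and `PendingPut`, `enqueue`, and `drain` are hypothetical names.

```java
import java.io.*;
import java.util.*;
import java.util.function.Consumer;

public class DeferredPutSketch {
    /** Hypothetical description of one write that still has to reach the datastore. */
    static class PendingPut implements Serializable {
        private static final long serialVersionUID = 1L;
        final String kind;
        final HashMap<String, String> properties;

        PendingPut(String kind, Map<String, String> properties) {
            this.kind = kind;
            this.properties = new HashMap<>(properties);
        }
    }

    /** In-memory stand-in for memcache, a datastore queue, or SQS. */
    private final Deque<String> queue = new ArrayDeque<>();

    /** Serializes a pending write to a plain string. */
    static String encode(PendingPut put) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(put);
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores a pending write; only use this on strings the app itself produced. */
    static PendingPut decode(String text) {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (PendingPut) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    void enqueue(PendingPut put) {
        queue.addLast(encode(put));
    }

    int size() {
        return queue.size();
    }

    /** Tries every queued write once; writes that fail stay queued. Returns the success count. */
    int drain(Consumer<PendingPut> writer) {
        int written = 0;
        int pending = queue.size();
        for (int n = 0; n < pending; n++) {
            String item = queue.pollFirst();
            try {
                writer.accept(decode(item));
                written++;
            } catch (RuntimeException e) {
                queue.addLast(item); // keep it for the next drain
            }
        }
        return written;
    }
}
```

A request handler would call `enqueue` instead of writing directly, and a cron-style job would call `drain` with the real datastore write as the `writer`. Note that memcache entries can be evicted, so a durable queue is the safer stand-in for data that must not be lost.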

Brandon Thomson

Apr 17, 2009, 4:15:24 PM
to Google App Engine
Google will not really acknowledge this as a problem or defect, but
neither do they provide many options for a workaround. My experience
is that timeouts occur on about 0.5% of puts regardless of the size of
the entity. These are root entities. It is not caused by write contention
from multiple requests on the entity: even a handler called only by a
periodic cron job will experience it.

For one week in March, when they were playing around with the
architecture, timeouts went from about 0.5% of all puts to about 5%.
Since then it has not been as bad. There was an open defect, but they
recently closed it: http://code.google.com/p/googleappengine/issues/detail?id=764

I agree with notcourage (ken keller) about queues being a good solution, and this is
what I do because the data has to go somewhere. Currently I am using
SQS. But a perhaps more important question is: is the time you spend
implementing these workarounds better spent implementing your app on a
different platform? I honestly don't know the answer; maybe notcourage
has a better idea. I am curious to hear any more thoughts you have,
because if we are trying to make money this is always front of
mind.

ray

Apr 18, 2009, 8:33:19 PM
to Google App Engine
Today, my app has seen 2 timeout errors at around 3:10 pm according to the
log. They were 30 seconds apart. One was a get and the other a
put, each on a totally different class, and the classes are not connected
in any way.

barabaka

Apr 30, 2009, 11:45:01 AM
to Google App Engine
Well, I've read a lot of posts about the Google datastore and the problems
with batch operations, the relational approach to arranging data in Bigtable,
etc., but I always thought the problem wasn't in the datastore itself but
in the way people use it. Now I can see from my own experience that it
simply behaves unpredictably. I deployed a test Java app that
tries to clear 500 (guaranteed amount!) entries per request. All
entries are in the same entity group, and the delete is executed as a batch
in a single transaction. All operations are executed with the low-level API,
so no extra overhead is involved. Here is the sample code and the logs:

Code (cut):
=============
// Collect up to 500 keys of kind "World"
Query q = new Query(World.class.getSimpleName());
Iterator<Entity> i = datastoreService.prepare(q).asIterator();
List<Key> keys = new ArrayList<Key>();
int idx = 0;
while (i.hasNext() && idx < 500) {
    keys.add(i.next().getKey());
    idx++;
}

// Delete the keys in one batch inside a single transaction
Transaction t = datastoreService.beginTransaction();
datastoreService.delete(keys);
t.commit();
==============

1st request (all goes well, 500 entries removed)
-------------------------------------------------
1. I 04-30 07:52AM 02.091 org.itvn.controller.TvnController
   clearDbBySize: Reading 500 entity keys...
2. I 04-30 07:52AM 03.832 org.itvn.controller.TvnController
   clearDbBySize: Removing keys by groups, total groups: 1
3. I 04-30 07:52AM 03.832 org.itvn.controller.TvnController
   clearDbBySize: Trying to remove 500 entities...
4. I 04-30 07:52AM 07.873 org.itvn.controller.TvnController
   clearDbBySize: Removed 500 entities.

2nd request - timeout exception, on READ operation (i.hasNext())
-------------------------------------------------
1. I 04-30 07:52AM 22.719 org.itvn.controller.TvnController
   clearDbBySize: Reading 500 entity keys...
2. W 04-30 07:52AM 26.551 Nested in
   org.springframework.web.util.NestedServletException: Request
   processing failed; nested exception is
   com.google.appengine.api.datastore.Datas
3. W 04-30 07:52AM 26.552 /clear_db/500
   com.google.appengine.api.datastore.DatastoreTimeoutException:
   datastore timeout: operation took too long. at
   com.google.appengine.api.d
4. C 04-30 07:52AM 26.555 Uncaught exception from servlet
   com.google.appengine.api.datastore.DatastoreTimeoutException:
   datastore timeout: operation took too long. at com.goog

Here we go: the first request executes well, and the next (only a few
seconds later) fails! Note that this is only a test application, with
no load at all. Am I doing something wrong? What's the RELIABLE way to
read/remove 500 entities? Is it a problem with quantity (500)? If so,
how many entities can be read without a timeout? Can someone give a
reasonable answer to this? If you need more details about the app, I can
share this test case in public.

Oleg





Sylvain

Apr 30, 2009, 2:59:32 PM
to Google App Engine
For my app, I never fetch more than 250 entities because I've seen
that if this value is bigger you get too many datastore timeouts.
But even with 250 entities (of a very basic Kind) I sometimes get a
timeout.

One "funny" thing is that you are allowed to fetch up to 1000 entities (whatever
the kind, number of attributes,...) but in practice it doesn't work ->
timeout.
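
Capping the batch size as described above can be sketched as a small chunking helper. This is only an illustration: `chunks` is a hypothetical helper, the limit of 200 is an arbitrary example rather than a documented threshold, and the integers stand in for datastore keys.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {
    /** Splits items into consecutive lists of at most chunkSize elements. */
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            out.add(new ArrayList<>(items.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Integers stand in for datastore keys.
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            keys.add(i);
        }
        for (List<Integer> batch : chunks(keys, 200)) {
            // A real app would fetch or delete this batch here, one call per batch.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With smaller batches a timeout costs only one batch, which can then be retried on its own, although as noted above even small operations are not immune.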

Brandon Thomson

May 15, 2009, 4:33:16 PM
to Google App Engine
Actually, even just fetching one entity by key will frequently cause a
Timeout. My logs are full of these...