When is the Timeout bug going to get fixed?

phtq

Jan 26, 2010, 5:30:45 PM

to Google App Engine

Our application error log for the 26th showed around 160 failed HTTP
requests due to timeouts. That's 160 users being forced to hit the
refresh button on their browser to get a normal response. A more
typical day has 20 to 60 timeouts. We have been waiting over a year
for this bug to get fixed with no progress at all. It's beginning to
look like it's unfixable, so perhaps Google could provide some
workaround. In our case, the issue arises because of the 1,000 file
limit. We are forced to hold all our .js, .css, .png, .mp3, etc. files
in the database and serve them from there. The application is quite
large and there are well over 10,000 files. The Python code serving up
the files does just one DB fetch and has about 9 lines of code, so
there is no way it can be magically restructured to make the Timeout
go away. However, putting all the files on App Engine as real
files would avoid the DB access and make the problem go away. Could
Google work towards removing that file limit?

Joshua Smith

Jan 26, 2010, 5:47:07 PM

to google-a...@googlegroups.com

Have you used the retry recipe? It has made about 99.9% of my timeouts go away.

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>

djidjadji

Jan 26, 2010, 5:48:17 PM

to google-a...@googlegroups.com

There is an article series about the datastore. It explains why the
Timeouts occur and that they are inevitable. They will always be part
of Bigtable and the Datastore of GAE.

The only solution is a retry on EVERY read: both the gets by id/key and
the queries. If you do that, very few reads will end in a Timeout.
I wait 3 seconds before the first retry and 6 seconds before the
second, and I log each Timeout. If the read still times out on the
third try, I raise the exception.

The result is very few final read Timeouts. The log shows frequent
requests that need a retry, but most of them succeed on the first
retry.

For speed, fetch the static content object by key_name, where the
key_name is the file path.
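
A minimal sketch of the retry policy described above (up to three tries, waits of 3 and then 6 seconds, each Timeout logged, the last one re-raised). The `Timeout` class is only a stand-in so the sketch runs outside App Engine; in a real handler it would presumably be `db.Timeout`. `read_with_retry` and its parameters are illustrative names, not part of any SDK.

```python
import logging
import time


class Timeout(Exception):
    """Stand-in for the datastore Timeout error (assumption for this sketch)."""


def read_with_retry(read, waits=(3, 6), retry_on=(Timeout,), sleep=time.sleep):
    """Call read(); on a timeout, wait and retry, for len(waits) + 1 tries in total."""
    for attempt, wait in enumerate(waits, start=1):
        try:
            return read()
        except retry_on:
            # Log every timeout so the retry rate stays visible in the logs.
            logging.warning("Timeout on read attempt %d, retrying in %ss", attempt, wait)
            sleep(wait)
    # Final try: a timeout here propagates to the caller.
    return read()
```

A datastore read could then be wrapped as `read_with_retry(lambda: StaticFile.get_by_key_name(path))`, where `StaticFile` is a hypothetical model class keyed by file path.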


Prem

Jan 27, 2010, 3:15:55 PM

to Google App Engine

Keeping static files on external storage like Amazon S3/Amazon
Cloudfront or any CDN might help. I have not hit the file limit yet
but I did this to speed up page response. Maybe this will help?

phtq

Jan 27, 2010, 5:48:25 PM

to Google App Engine

Thanks for mentioning this recipe, it worked well in testing and we
will try it on the user population tomorrow.


phtq

Jan 27, 2010, 5:50:38 PM

to Google App Engine

Splitting an application across multiple systems leaves your
application with a downtime which is the sum of the downtimes of the
individual systems. I wouldn't like to do that. I would hope Google
would lift the file limit so we could get the extra speed within the
same system.
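
To put rough numbers on this: if the app needs both systems to be up and their outages are independent (an assumption), the combined downtime comes out slightly under the sum of the two, because overlapping outages are only counted once. The downtime fractions below are made-up illustrative values, not measurements of any service.

```python
def combined_downtime(d1, d2):
    """Fraction of time at least one of two independent systems is down."""
    return 1 - (1 - d1) * (1 - d2)


# Hypothetical figures: 0.1% and 0.2% downtime for the two systems.
d = combined_downtime(0.001, 0.002)
print(round(d, 6))  # 0.002998, just under the 0.003 sum
```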

phtq

Feb 9, 2010, 5:54:26 PM

to Google App Engine

The recipe does cut down the Timeouts dramatically, but there are
still a large number which seem to bypass this fix completely. A
sample error log entry follows:

Exception in request:
Traceback (most recent call last):
  File "/base/python_lib/versions/third_party/django-0.96/django/core/handlers/base.py", line 77, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/base/data/home/apps/kbdlessons/1-01.339729324125102596/views.py", line 725, in newlesson
    productentity = Products.gql("where Name = :1", ProductID).get()
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1564, in get
    results = self.fetch(1, rpc=rpc)
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1616, in fetch
    raw = raw_query.Get(limit, offset, rpc=rpc)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1183, in Get
    limit=limit, offset=offset, prefetch_count=limit,
    **kwargs)._Get(limit)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1113, in _Run
    raise _ToDatastoreError(err)
Timeout

Any ideas on how to deal with this class of Timeouts?


Eli Jones

Feb 10, 2010, 12:02:30 AM

to google-a...@googlegroups.com

Well.. you can always wrap puts with a try,except like so (if you want it to just keep retrying):

from time import sleep

from google.appengine.ext import db

wait = .1
while True:
    try:
        db.put(myEntity)
        break
    except db.Timeout:
        # Back off, doubling the wait before each new attempt.
        sleep(wait)
        wait *= 2

tom

Feb 10, 2010, 1:59:33 AM

to Google App Engine

I use this code that I wrote:

import time
import traceback

def retry(func, *args, **kw):
    start = time.time()
    e = 0
    while time.time() - start < 25:
        try:
            return func(*args, **kw)
        except:
            traceback.print_exc()
            e += 1
            time.sleep(e)
    raise

def do():
    ''' do your stuff like writing to datastore '''
    pass

retry(do)


ryan

Feb 11, 2010, 1:28:24 PM

to Google App Engine

hi phtq! out of curiosity, would you consider 1.3.1's extended retries
a "fix" for the timeout "bug"?

http://googleappengine.blogspot.com/2010/02/app-engine-sdk-131-including-major.html

(of course, as djidjadji mentioned, it's not really a bug as much as
an unfortunate fact of life in distributed systems, and 1.3.1's
extended retries aren't a fix as much as a mitigating factor. still,
i'm curious about the perception.)

phtq

Feb 11, 2010, 6:08:36 PM

to Google App Engine

Hello Ryan,

Looking at our error logs for the last 2 days, I would have to say
the situation is improved with the advent of 1.3.1, but certainly not
fixed. From the standpoint of our app, being forced to supply all our
mp3, png, etc. files out of the database enormously increases our
exposure to the timeout 'feature'. If we could have real files (which
one would hope didn't suffer from some similar timeout problem) I
don't think we would have any substantial problem. As it stands, I
still see 30 or so timeout entries in the log per day. That number is
still a bit too high for me.


ryan

Feb 12, 2010, 3:33:44 AM

to Google App Engine

On Feb 11, 3:08 pm, phtq <pher...@typequick.com.au> wrote:
>
> Looking at our error logs for the last 2 days, I would have to save
> the situation is improved with the advent of 1.3.1, but certainly not
> fixed. From the standpoint of our app., being forced to supply all our
> mp3, png, etc. files out of the database enormously increases our
> exposure to the timeout 'feature'.

i definitely agree, serving static files from the datastore is far
from optimal. the 1000 file limit is definitely something we're aware
of and still actively thinking about, but i don't have any updates on
that right now.

having said that, if these truly are static files that don't change, i
assume you're caching them in memcache and usually serving from there,
as opposed to serving them from the datastore on every request...?

also, if they're static files, the blobstore api would be much more
appropriate than the datastore. have you tried it?
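
The caching suggested here is the usual cache-aside pattern: look in the cache first, and touch the datastore only on a miss. A rough sketch, assuming a cache object with `get` and `set` methods where `get` returns `None` on a miss. `DictCache` is an in-memory stand-in so the sketch runs anywhere, and `load_from_datastore` is a hypothetical callable that would do the single fetch by file path.

```python
class DictCache:
    """In-memory stand-in for the get/set subset of a memcache client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def serve_static(path, cache, load_from_datastore):
    """Return the file body for path, hitting the datastore only on a cache miss."""
    content = cache.get(path)
    if content is None:
        content = load_from_datastore(path)
        if content is not None:
            # Only cache files that exist, so missing paths are looked up again.
            cache.set(path, content)
    return content
```

With this in place, a datastore Timeout can only affect the first request for a file after a cache eviction, not every request.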

Stephen

Feb 12, 2010, 12:42:13 PM

to Google App Engine

On Feb 12, 8:33 am, ryan <ryanb+appeng...@google.com> wrote:
> On Feb 11, 3:08 pm, phtq <pher...@typequick.com.au> wrote:
>
> > Looking at our error logs for the last 2 days, I would have to save
> > the situation is improved with the advent of 1.3.1, but certainly not
> > fixed. From the standpoint of our app., being forced to supply all our
> > mp3, png, etc. files out of the database enormously increases our
> > exposure to the timeout 'feature'.
>
> i definitely agree, serving static files from the datastore is far
> from optimal. the 1000 file limit is definitely something we're aware
> of and still actively thinking about, but i don't have any updates on
> that right now.

http://highscalability.com/blog/2010/1/22/how-buddypoke-scales-on-facebook-using-google-app-engine.html

Most of the cost of BuddyPoke is in content delivery. The
main app for BuddyPoke is a flash file must be served. These
costs are much higher than the costs for running the actual
application. Dave is investigating Rackspace for file serving.
GAE has a relatively high failure rate for accessing content, which
is acceptable when returning avatars, but is not OK for loading
up the initial image.

i.e. the failure rate for static file serving on App Engine is so high
that BuddyPoke has to use an expensive content delivery network to
serve its Flash app. :-(

> also, if they're static files, the blobstore api would be much more
> appropriate than the datastore. have you tried it?

Is the blobstore API faster/more-reliable than serving from the db/
memcache?

Obviously, if your files are > 1MB then the blobstore is your only
option. If you have a few static files then static file serving is an
option. But if you have dynamic files < 1MB there are now two
options: db/memcache or blobstore. Which is better, and why?

Ikai L (Google)

Feb 12, 2010, 1:04:49 PM

to google-a...@googlegroups.com

I actually have some familiarity with BuddyPoke. Regarding static file serving failures: they're actually very low, but they are still higher than a traditional CDN provider's. Dave's entire application IS a Flash application, so if this doesn't serve, his application won't work. We're working to improve this, but we can understand that there are parts of his business needs that we can't meet yet.


--
Ikai Lan
Developer Programs Engineer, Google App Engine
http://googleappengine.blogspot.com | http://twitter.com/app_engine

Denis

Feb 13, 2010, 3:29:34 AM

to Google App Engine

Hi Tom

A bare except: clause will catch too many exceptions. What if the
failure is not a timeout but an error in your own code?

I propose catching only the datastore errors:

try:
    return func(*args, **kw)
except (db.InternalError, db.Timeout,
        db.CapabilityDisabledError), e:
    logging.error(...)
    ....

Regards
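
One way the two ideas in this exchange could be combined is sketched below: retry within a time budget, but only on a named set of errors, so a bug in your own code surfaces immediately. The exception classes are stand-ins so the sketch runs outside App Engine (on App Engine they would presumably be the `db` errors listed above), and the `budget`, `sleep` and `clock` parameters are illustrative. It uses Python 3 `except ... as` syntax rather than the 2.x form.

```python
import logging
import time


class Timeout(Exception):
    """Stand-in for the datastore Timeout error (assumption)."""


class InternalError(Exception):
    """Stand-in for the datastore InternalError (assumption)."""


RETRYABLE = (Timeout, InternalError)


def retry(func, budget=25.0, retryable=RETRYABLE, sleep=time.sleep, clock=time.time):
    """Call func() until it succeeds or the next wait would exceed the budget."""
    start = clock()
    attempt = 0
    while True:
        try:
            return func()
        except retryable as e:
            attempt += 1
            logging.error("attempt %d failed: %r", attempt, e)
            # The next wait is `attempt` seconds; give up if it will not fit.
            if clock() - start + attempt >= budget:
                raise
            sleep(attempt)
```

Arguments can be bound with a lambda or `functools.partial`, for example `retry(lambda: save(entity))` with a hypothetical `save` function. Anything outside `retryable`, such as a `ValueError` from your own code, propagates on the first failure.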


Nick Johnson (Google)

Feb 13, 2010, 7:11:20 AM

to google-a...@googlegroups.com

On Fri, Feb 12, 2010 at 8:33 AM, ryan <ryanb+a...@google.com> wrote:

> On Feb 11, 3:08 pm, phtq <pher...@typequick.com.au> wrote:
> >
> > Looking at our error logs for the last 2 days, I would have to save
> > the situation is improved with the advent of 1.3.1, but certainly not
> > fixed. From the standpoint of our app., being forced to supply all our
> > mp3, png, etc. files out of the database enormously increases our
> > exposure to the timeout 'feature'.
>
> i definitely agree, serving static files from the datastore is far
> from optimal. the 1000 file limit is definitely something we're aware
> of and still actively thinking about, but i don't have any updates on
> that right now.

I think you mean the 3000 file limit. :)

-Nick

> having said that, if these truly are static files that don't change, i
> assume you're caching them in memcache and usually serving from there,
> as opposed to serving them from the datastore on every request...?
>
> also, if they're static files, the blobstore api would be much more
> appropriate than the datastore. have you tried it?


--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Stephen

Feb 13, 2010, 11:14:43 AM

to Google App Engine

On Feb 12, 6:04 pm, "Ikai L (Google)" <ika...@google.com> wrote:
>
> On Fri, Feb 12, 2010 at 9:42 AM, Stephen <sdea...@gmail.com> wrote:
>
> >http://highscalability.com/blog/2010/1/22/how-buddypoke-scales-on-fac...

> >
> > Most of the cost of BuddyPoke is in content delivery. The
> > main app for BuddyPoke is a flash file must be served. These
> > costs are much higher than the costs for running the actual
> > application. Dave is investigating Rackspace for file serving.
> > GAE has a relatively high failure rate for accessing content, which
> > is acceptable when returning avatars, but is not OK for loading
> > up the initial image.
> >
> > ie. the failure rate for static file serving on App Engine is so high
> > that BuddyPoke has to use an expensive content delivery network to
> > serve it's flash app. :-(
>

> I actually have some familiarity with BuddyPoke. Regarding static file
> serving failures: they're actually very low, but they are still higher than
> a traditional CDN provider's. Dave's entire application IS a Flash
> application, so if this doesn't serve, his application won't work. We're
> working to improve this, but we can understand that there are parts of his
> business needs that we can't meet yet.

Great. I thought I remembered seeing 10% error rate reported in a
slide deck, but I guess you're saying it's much lower than that.

> > > also, if they're static files, the blobstore api would be much more
> > > appropriate than the datastore. have you tried it?
> >
> > Is the blobstore API faster/more-reliable than serving from the db/
> > memcache?
> >
> > Obviously, if your files are > 1MB then the blobstore is your only
> > option. If you have a few static files then static file serving is an
> > option. But if you have dynamic files < 1MB there are now two
> > options: db/memcache or blobstore. Which is better, and why?

Any ideas on blobstore vs. db/memcache for serving smallish 'static'
files? Profile pics, for example.

ryan

Feb 15, 2010, 11:40:04 AM

to Google App Engine

On Feb 12, 9:42 am, Stephen <sdea...@gmail.com> wrote:

> Obviously, if your files are > 1MB then the blobstore is your only
> option. If you have a few static files then static file serving is an
> option. But if you have dynamic files < 1MB there are now two
> options: db/memcache or blobstore. Which is better, and why?

if the files won't change, and they really are opaque blobs,
definitely prefer blobstore. it shortcuts much of the app engine
serving stack, so it will usually be somewhat faster and more reliable
than the datastore. probably only by a small constant factor, but
that's still something.

> Great. I thought I remembered seeing 10% error rate reported in a
> slide deck, but I guess you're saying it's much lower than that.

yes. even before the extended retries in 1.3.1, the datastore's
overall average error rate was a few *orders of magnitude* less than
that. i don't know the error rate for static file serving off the top
of my head, but it's almost certainly even better, maybe as much as an
order of magnitude less than the datastore's itself.

(that's average, of course. spikes will always happen occasionally.
still, we've managed to reduce them pretty significantly in the last
few months too.)
