"Request was aborted after waiting too long" followed by random DeadlineExceededError on import.


Dave Peck

Dec 14, 2009, 2:28:06 PM
to Google App Engine
Hello,

I have an app (citygoround.org) that, especially in the morning, often
has 10-15 minutes of outright downtime due to server errors.

Looking into it, I see that right before the downtime starts, a few
requests log the following warning message:

> Request was aborted after waiting too long to attempt to service your request.
> Most likely, this indicates that you have reached your simultaneous dynamic request limit.

I'm certainly not over my limit, but I can believe that the request in
question could take a while. (I'll get to the details of that request
in a moment.)

Immediately after these warnings, my app has a large amount of time
(10+ minutes) where *all requests* -- no matter how unthreatening --
raise a DeadlineExceededError. Usually this is raised during the
import of an innocuous module like "re" or "time" or perhaps a Django
1.1 module. (We use use_library.)

My best theory at the moment is that:

1. It's a cold start, so nothing is cached.
2. App Engine encounters the high latency request and bails.
3. We probably inadvertently catch the DeadlineExceededError, so the
runtime doesn't clean up properly.
4. Future requests are left in a busted state.
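
To illustrate step 3, here is a minimal sketch of how a handler can swallow the deadline signal by accident. It does not import the real SDK: the `DeadlineExceededError` class below is a stand-in, modeled as a `BaseException` subclass (an assumption of this sketch), and `slow_operation`, `swallowing_handler`, and `careful_handler` are made-up names.

```python
class DeadlineExceededError(BaseException):
    """Stand-in for google.appengine.runtime.DeadlineExceededError (assumed shape)."""


def slow_operation():
    # Simulates the runtime interrupting a request that ran past its deadline.
    raise DeadlineExceededError("request deadline hit")


def swallowing_handler():
    # Anti-pattern: a bare except also traps the deadline signal, so the
    # caller never sees the request fail.
    try:
        slow_operation()
    except:  # deliberately too broad
        return "swallowed"


def careful_handler():
    # Catch only ordinary errors; the deadline signal propagates to the caller.
    try:
        slow_operation()
    except Exception:
        return "handled"
    return "ok"
```

If the real class behaves like the stand-in, auditing the code for bare `except:` clauses would be one way to check whether step 3 applies.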

Does this sound at all reasonable? I see a few related issues (2396,
2266, and 1409) but no firm/completely clear discussion of what's
happening in any of them.

Thanks,
Dave

PS:

The specifics about our high latency request are *not* strictly
relevant to the larger problem I'm having, but I will include them
because I have a second "side" question to ask about it.

The "high latency" request is serving an image. Our app lets users
upload images and we store them in the datastore. When serving an
image, our handler:

1. Checks to see if the bytes for the image are in memcache, and if so
returns them immediately.
2. Otherwise grabs the image from the datastore and, if it is smaller
than 64K, adds the bytes to memcache.
3. Returns the result.
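
The three steps above can be sketched roughly as follows. This is not the code from the CityGoRound repository: plain dicts stand in for memcache and the datastore so the sketch runs anywhere, and `get_image_bytes`, `fake_memcache`, `fake_datastore`, and `MAX_CACHEABLE_BYTES` are hypothetical names.

```python
MAX_CACHEABLE_BYTES = 64 * 1024  # the 64K threshold from step 2

# Plain dicts stand in for memcache and the datastore (assumption of this sketch).
fake_memcache = {}
fake_datastore = {"small": b"x" * 1000, "large": b"y" * (100 * 1024)}


def get_image_bytes(image_id):
    """Return the image bytes, consulting the cache before the datastore."""
    cached = fake_memcache.get(image_id)
    if cached is not None:
        return cached  # step 1: cache hit, return immediately
    data = fake_datastore.get(image_id)  # step 2: fall back to the datastore
    if data is not None and len(data) < MAX_CACHEABLE_BYTES:
        fake_memcache[image_id] = data  # cache only images under the threshold
    return data  # step 3: return the result (None if the image is unknown)
```

Note that in this shape a cache miss costs one cache lookup plus one datastore read, and images at or above the threshold pay that cost on every request.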

I'm wondering if using memcache in this way is a smart idea -- it may
very well be the cause of our latency issues. It's hard to tell.

Alternatively, the issue could be: we have a page that shows a large
number (~100) of such images. If someone requests this page, we may
have a lot of simultaneous image-producing requests happening at the
same time. Perhaps _this_ is the root cause of the original "Request
was aborted" issue?

Just not sure here...

Ikai L (Google)

Dec 14, 2009, 4:32:22 PM
to google-a...@googlegroups.com
Do you see that it's consistent at the same times? What's your application ID? I'll look into it.


--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.





--
Ikai Lan
Developer Programs Engineer, Google App Engine

Dave Peck

Dec 14, 2009, 5:15:12 PM
to Google App Engine
Hi Ikai,

The app id is "citygoround".

We had a number of stretches of "badness" this morning. An example
stretch:

6:07AM 33.867 ("Request was aborted...")
6:07AM 49.672 through 7:12AM 24.470 ("DeadlineExceededError" and/or
"ImproperlyConfiguredError" -- looks like it depends on which imports
fail.)

And another:

8:17AM 37.620 ("Request was aborted...")
8:17AM 54.348 through 8:46AM 51.478 ("DeadlineExceededError" and/or
"ImproperlyConfiguredError")

One last thing: the app is open source. If it helps, you can find the
exact code that we're running in production at:

http://github.com/davepeck/CityGoRound/

The screenshot handler in question is found in ./citygoround/views/
app.py Line 115.

Cheers,
Dave

Jason C

Dec 15, 2009, 12:14:31 PM
to Google App Engine
Ikai,

We see daily DeadlineExceededErrors on app id 'steprep' from 6.30am to
7.30am (log time).

Can you look into that as well?

Thanks,
j

Dave Peck

Dec 15, 2009, 1:54:30 PM
to Google App Engine, Jason C
Hi Ikai,

Any further details on your end? I get the feeling we're not the only
ones, and we've experienced very serious downtime in the last ~48
hours.

This is a critical issue for us to resolve, but at the same time we
lack key pieces of data that would help us solve it on our own...

Thanks,
Dave

Ikai L (Google)

Dec 15, 2009, 2:26:01 PM
to google-a...@googlegroups.com
Dave,
 
You're correct that this is likely affecting other applications, but it's not a global issue. There are hotspots in the cloud that we notice are being especially impacted during certain times of the day. We're actively working on addressing these issues, but in the meantime, there are manual steps we can try to prevent your applications from becoming resource starved. We do these on a one-off basis and reserve them only for applications that seem to exhibit the behavior of seeing DeadlineExceeded on simple actions (not initial JVM startup), and at fairly predictable intervals during the day. I've taken these steps to try to remedy your application. Can you let us know if these seem to help? If not, they may indicate that something is going on with your application code, though that does not seem like the case here.




Dave Peck

Dec 15, 2009, 2:39:20 PM
to Google App Engine
Ikai,

We'll keep an eye on our app for the next ~24 hours and report back.

At what time did you make the changes to our instance? We had
substantial downtime earlier today, alas.

Can you provide any details about what sort of change was made?

Thanks,
Dave


Ikai L (Google)

Dec 15, 2009, 2:56:42 PM
to google-a...@googlegroups.com
I made the change right before I sent the email. Let me know how it works for you.

Jason, I also made the change to your application. Please report back after tomorrow if you continue to experience issues.



Jason C

Dec 16, 2009, 10:00:50 AM
to Google App Engine
We (steprep) still saw a set of them on Dec 16 starting 3.54am through
6.57am (log time).

j

Wesley Chun (Google)

Jan 19, 2010, 9:10:23 PM
to Google App Engine
dave, jason,

just wanted to do a follow-up to see where things stand with your apps
now. i'm coming across a similar user issue and was wondering whether
it's the same problem or not. can you post your complete error stack
traces if you're still running into this issue? here's the issue filed
by the other user FYI, whose app seems to have few requests but each
one has high latency:

http://code.google.com/p/googleappengine/issues/detail?id=2621

if your respective apps don't suffer from this problem any more, what
did you do to resolve it or did it magically go away?

thanks,
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
http://corepython.com

wesley.j.chun :: wesc...@google.com
developer relations :: google app engine

Jason C

Jan 20, 2010, 10:43:17 AM
to Google App Engine
I was under the impression that something happened internally at
Google to adjust the way that apps were balanced around machines and/
or other internal tuning.

Additionally, we run a ping every 10 seconds to keep an instance hot.
While I understand how this doesn't have much effect in a distributed
environment (though practically speaking in this case it does seem to
have a positive effect), and while I also understand how this
"abuses" a shared resource, I'm currently afraid to turn it off.
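
For readers wondering what such a keep-warm ping looks like, a minimal sketch follows. It is not the actual steprep pinger: `keep_warm` and its parameters (`fetch`, `url`, `pings`, `sleep`) are hypothetical, and the HTTP call is passed in as a function so the loop can be tried without a network.

```python
import time


def keep_warm(fetch, url, interval_seconds=10, pings=3, sleep=time.sleep):
    """Call fetch(url) `pings` times, pausing interval_seconds between calls.

    fetch, url, and pings are hypothetical parameters of this sketch; a real
    pinger would pass an HTTP client call and run until stopped.
    """
    results = []
    for i in range(pings):
        try:
            results.append(fetch(url))
        except Exception as exc:  # a failed ping should not stop the loop
            results.append(exc)
        if i < pings - 1:
            sleep(interval_seconds)  # wait before the next ping
    return results
```

A ping like this only touches whichever instance happens to serve it, which is the distributed-environment caveat mentioned above.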

j

