Latency spikes and aborted requests in the last 24 hours


Ikai Lan (Google)

Sep 23, 2010, 11:19:56 AM
to google-a...@googlegroups.com
Hey guys,

We've been tracking latency spikes and intermittent batches of aborted requests over the last 24 hour period. There were at least 3 periods of spikes yesterday:

~1100 Pacific (moderate, with a brief period of high latency)
~1300 Pacific (small)
~1530 Pacific (small)

Community reports seem to indicate that these are related to loading requests. If you are being affected, can you report in this thread your environment and language, and whether these times correlate with your experience? Are there periods that this report did not cover? Please provide measurements, times and application IDs.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine

Fredrik Bonander

Sep 23, 2010, 11:42:43 AM
to google-a...@googlegroups.com
Hi, 

I've been experiencing "DeadlineExceededError" failures at times that don't match these reports:

21/9: 04.15 -> 04.45 (9 different requests of 80 total during 21/9)
20/9: 05.27
17/9: 03.06

All of these requests failed after ~30 seconds.

I use App Engine 1.3.7, Python and Django 1.1. The application id is martinadlerphotography.

If more information is needed I'll be happy to provide it.

..fredrik




--
Fredrik Bonander

the infinite power of the creative mind - 




Ikai Lan (Google)

Sep 23, 2010, 11:54:10 AM
to google-a...@googlegroups.com
Fredrik, are these loading requests? If so, how long does a standard loading request take to load?


--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Eli Jones

Sep 23, 2010, 12:10:29 PM
to google-a...@googlegroups.com
My app is in Python.  The only thing it does between 6:30 AM PST and 1:00 PM PST is run a single chained task.  The task just runs once every 5 minutes.  (There are no other backend processes or users).

Yesterday at 11:00 AM PST and 11:02 AM PST, the task hit the 10-second 500 error (with 0 cpu_ms used), then the task successfully retried at 11:02:55 AM PST.

There was another one (not mentioned in your list of times for yesterday) at 7:45 AM PST. It failed after 10 seconds with a 500 error and 0 cpu_ms used. The task successfully retried at 7:47 AM PST (for some reason, the tasks in my app wait 120 seconds for the first retry).

For some reason, the log doesn't contain the message that usually goes with this error (there is no detailed log message):

 "Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is under unexpectedly high or uneven load. If you see this message frequently, please contact the App Engine team."

But, it has the exact same fingerprint. (It just seems that the logs are trimming the detailed messages faster than usual.)

This chained task runs 79 times each day (82 requests counting these 3 errors), so these errors accounted for 3.66% of the requests during the day yesterday.


Fredrik Bonander

Sep 23, 2010, 12:11:14 PM
to google-a...@googlegroups.com
Yes, normal GET requests. A standard request is about 300-500 ms. But looking at the logs, there have been 3-4 spikes today taking 800-1100 ms.

On a side note: should a normal request (one that loads data from memcache) really take 300-500 ms? The app itself is a really basic HTML site that's populated with about 3-5 entries from the datastore.

..fredrik

Sent from my iPad
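A minimal sketch of the memcache-backed read path described above (the model and cache key below are made up for illustration); with a warm instance and a cache hit, a handler like this does no datastore work at all:

from google.appengine.api import memcache
from google.appengine.ext import db

class Entry(db.Model):              # hypothetical model, 3-5 of these per page
    title = db.StringProperty()
    body = db.TextProperty()

def front_page_entries():
    # Try the cache first; only fall back to the datastore on a miss.
    entries = memcache.get('front_page_entries')     # hypothetical cache key
    if entries is None:
        entries = Entry.all().fetch(5)
        memcache.set('front_page_entries', entries, time=300)
    return entries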

Eli Jones

Sep 23, 2010, 12:15:44 PM
to google-a...@googlegroups.com
Also, a cold start for this task can take as little as 960 ms (where it shows the log detail: "This request caused a new process to be started for your application...").

And the task is doing a URLFetch, so I'm sure that is taking up a chunk of the running time.

One of my imports is pytz (no Django or anything like that though), so that could also be of interest.
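One common way to trim module-import cost off a cold start like this is to defer heavy imports such as pytz into the single function that needs them; a rough sketch, with a made-up helper name and a Pacific-time start of day assumed:

import datetime

def initial_start_delay(now_utc):
    # Hypothetical helper: pytz is imported inside the function, so the
    # import cost is not paid on every cold start of the module.
    from pytz import timezone, utc
    pacific = timezone('US/Pacific')
    local_now = utc.localize(now_utc).astimezone(pacific)
    start = local_now.replace(hour=6, minute=30, second=0, microsecond=0)
    if local_now >= start:
        start += datetime.timedelta(days=1)
    delta = start - local_now
    return delta.days * 86400 + delta.seconds   # whole seconds until 6:30 AM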

Ikai Lan (Google)

Sep 23, 2010, 12:20:33 PM
to google-a...@googlegroups.com
It sounds like these are impacting Python developers - I haven't heard any Java app developers chime in yet.

What's the ballpark QPS you guys are doing: <1 QPS, 10 QPS, 100 QPS, etc.?


--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Fredrik Bonander

Sep 23, 2010, 12:26:57 PM
to google-a...@googlegroups.com
Sorry, what's QPS?

..fredrik

Sent from my iPad

Fredrik Bonander

Sep 23, 2010, 12:31:50 PM
to Fredrik Bonander, google-a...@googlegroups.com
Oh, queries per second?

Very low. At its highest it's about 0.6. I only have about 50-60 visitors a day.

Sent from my iPad

Carlos Ble

Sep 23, 2010, 12:34:05 PM
to google-a...@googlegroups.com
Same here




--
Carlos Ble
www.MavenCharts.com
www.iExpertos.com
www.carlosble.com

Eli Jones

Sep 23, 2010, 12:39:35 PM
to google-a...@googlegroups.com
My app is extremely lightweight.

The average CPU usage for a cold start falls in these ranges:

CPU_MS:    1030 - 1230
API_CPU_MS:  408

The rough outline of the code that runs is:

1.  Pull credentials from memcache and decrypt.
2.  Connect to gdata service, and get some data.
3.  Create 4 new entities (of same Model type) containing this fetched data and append to a list.
4.  Stick those entities in memcache as protobufs.
5.  Create new entity (for a different Model) and append to the list used in Step 3.
6.  db.put(list) to the datastore. (or keep trying until success)
7.  If necessary, add a task to run at the next 5-minute mark.

So, as long as memcache is working and the encrypted credentials are there, there are no gets from the datastore.  There is just the put, which puts 4 entities of one model type and 1 entity of another model type all at once.

timezone from pytz is used for correct timezone calculations at the start of the day, to make sure that the initial task starts at the correct time. (At this point I could move the "from pytz import timezone" down to just the function that calculates the initial start delay, but I avoid making changes to this code unless it's really necessary; timezone was used in more than one place in the past, so it was imported at the top of the file.)
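A rough sketch of what a task following that outline might look like (the model names, cache keys and data shape below are placeholders, and the taskqueue import path is the pre-1.4 "labs" one):

from google.appengine.api import memcache
from google.appengine.api.labs import taskqueue   # google.appengine.api.taskqueue in later SDKs
from google.appengine.ext import db

class Quote(db.Model):       # hypothetical models standing in for the real ones
    price = db.FloatProperty()

class RunLog(db.Model):
    note = db.StringProperty()

def run_chained_task(rows):
    # Steps 1-2 (credential fetch/decrypt and the gdata call) are omitted;
    # 'rows' stands in for whatever the gdata service returned.

    # 3. Create 4 new entities of the same model and collect them in a list.
    entities = [Quote(key_name=r['symbol'], price=r['price']) for r in rows]

    # 4. Stick those entities in memcache as protobufs.
    protos = dict((e.key().name(), db.model_to_protobuf(e).Encode())
                  for e in entities)
    memcache.set_multi(protos)

    # 5. Add one entity of a different model to the same list.
    entities.append(RunLog(note='fetched %d rows' % len(rows)))

    # 6. Put everything in a single batch; retry a few times before giving up
    #    and letting the task queue retry the whole task.
    for attempt in range(3):
        try:
            db.put(entities)
            break
        except db.Error:
            if attempt == 2:
                raise

    # 7. If necessary, schedule the next run at the next 5-minute mark.
    taskqueue.add(url='/tasks/chain', countdown=300)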

Jason C

Sep 23, 2010, 1:06:00 PM
to Google App Engine
We get lots of bursts of loading failures. Most recently: 9.45a to
9.47a log time (2010-09-23). Appid: steprep

We also get lots of ongoing 10-second timeouts. Using the logs, it
seems to me that our user-facing requests are below the 1000 ms
threshold, so I don't know why we get so many 10-second timeouts
(i.e., the "Request was aborted after waiting too long to attempt to
service your request. This may happen sporadically when the App Engine
serving cluster is under unexpectedly high or uneven load. If you see
this message frequently, please contact the App Engine team." error).

Our QPS varies throughout the day from 1 to around 12, though we have
jobs that introduce short spikes of 50-60.

j

Jason C

Sep 23, 2010, 4:17:58 PM
to Google App Engine
When I refer to bursts of errors, this is an example of what I mean.
This is a query for any 500-series errors against appid: steprep. Note
how close together the log entries are (i.e., the burstiness). These
requests operate very quickly when App Engine is performing well.

09-23 01:07PM 42.459 /_ah/queue/deferred 500 29768ms
09-23 01:07PM 42.084 /_ah/queue/deferred 500 29485ms
09-23 01:07PM 38.017 /_ah/queue/deferred 500 29523ms
09-23 01:07PM 33.636 /_ah/queue/deferred 500 29417ms
09-23 01:07PM 32.176 /_ah/queue/deferred 500 29787ms
09-23 01:07PM 30.812 /_ah/queue/deferred 500 29451ms
09-23 01:07PM 25.477 /_ah/queue/deferred 500 29626ms
09-23 01:07PM 24.605 /_ah/queue/deferred 500 30025ms
09-23 01:07PM 14.422 /mgmt/search/start/ 500 40086ms
09-23 01:07PM 24.204 /_ah/queue/deferred 500 29768ms
09-23 01:07PM 20.746 /_ah/queue/deferred 500 29968ms
09-23 01:07PM 30.383 /exec-reports/queue/increment-all-time-data-counter 500 10003ms
09-23 01:05PM 47.180 /_ah/queue/deferred 500 30070ms
09-23 01:05PM 42.770 /mgmt/search/start/ 500 30053ms
09-23 01:04PM 55.080 /_ah/queue/deferred 500 29404ms
09-23 01:04PM 49.054 /_ah/queue/deferred 500 29589ms

Anything around the 30-s mark is a DeadlineExceededError, anything
around the 10-s mark is the "Request was aborted after waiting too
long to attempt to service your request....".

These bursts are _NOT_ rare; I can find them at will, including
similar bursts of the 10-s style:

09-23 12:52PM 07.445 /_ah/queue/deferred 500 10006ms
09-23 12:52PM 06.510 /_ah/queue/deferred 500 10016ms
09-23 12:52PM 06.303 /_ah/queue/deferred 500 10003ms
09-23 12:52PM 05.902 /_ah/queue/deferred 500 10017ms
09-23 12:52PM 05.641 /_ah/queue/deferred 500 10017ms
09-23 12:52PM 04.872 /_ah/queue/deferred 500 10002ms
09-23 12:52PM 04.211 /_ah/queue/deferred 500 10030ms
09-23 12:52PM 03.759 /_ah/queue/deferred 500 10029ms
09-23 12:52PM 03.652 /_ah/queue/deferred 500 10021ms

I am running out of ideas of what I can do at my end to address this
issue. Does anyone from Google have suggestions? Is this an
infrastructure / autoscaling problem, or is it a problem at my end?

Ikai Lan (Google)

Sep 23, 2010, 4:21:22 PM
to google-a...@googlegroups.com
Jason, are you calling URLFetch/datastore/Memcache in that action? 

So far there are no reports of Java applications being impacted. We usually have a bit more flexibility when it comes to tearing down and spinning up Python instances, so I'm wondering if this is related.


--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Jason C

Sep 23, 2010, 4:25:53 PM
to Google App Engine
Yes, in the /_ah/queue/deferred requests, we are almost certainly
doing datastore/memcache work and very likely doing urlfetch work (I'm
referring to this probabilistically, because we use deferred for many
things).

BTW, appid steprep is Python.

j
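For context, the deferred library behind all those /_ah/queue/deferred entries is the stock google.appengine.ext.deferred package: you hand it a picklable function plus arguments and it enqueues a task that is executed by its built-in handler at that URL, which is why it dominates the logs above. A minimal sketch with a made-up worker:

from google.appengine.ext import deferred

def refresh_report(report_id):
    # Hypothetical worker: the datastore/memcache/urlfetch work happens here,
    # inside the task; an unhandled failure returns 500 and the queue retries.
    pass

# Called from a request handler; the work runs later on /_ah/queue/deferred.
deferred.defer(refresh_report, 'report-123', _countdown=0)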


Jason C

Sep 23, 2010, 4:35:28 PM
to Google App Engine
Lest you think it's something specific to deferred, I get bursts on
other URLs as well. My examples often have lots of deferred because
that is by far our most common URL. Here is another example:

09-23 01:25PM 27.093 /mgmt/search/start/ 500 29684ms
09-23 01:23PM 55.101 /mgmt/search/start/ 500 29973ms
09-23 01:22PM 25.100 /mgmt/search/start/ 500 29746ms
09-23 01:20PM 44.929 /mgmt/search/start/ 500 40098ms
09-23 01:19PM 15.158 /mgmt/search/start/ 500 29602ms

Here's an example from a test deployment of the same app, with much
lower QPS (appid: steprep-demo). Note the blend of 30-s and 10-s
timeouts.

09-23 01:27PM 27.743 /_ah/queue/deferred 500 37833ms
09-23 01:27PM 42.985 /_ah/queue/deferred 500 10227ms
09-23 01:27PM 42.972 /exec-reports/queue/increment-daily-data-counter 500 10237ms
09-23 01:27PM 37.157 /exec-reports/queue/increment-all-time-data-counter 500 10052ms
09-23 01:27PM 36.758 /mgmt/jobs/put/deferred/?jobId=478f5f7f-fb3f-4974-b033-18d1ef75902a 500 10102ms
09-23 01:27PM 36.187 /exec-reports/queue/increment-daily-data-counter 500 10087ms
09-23 01:27PM 35.626 /mgmt/jobs/put/deferred/?jobId=2efd0717-a59f-446c-b1f9-4a2200420f95 500 10002ms
09-23 01:27PM 26.337 /_ah/queue/deferred 500 10077ms
09-23 01:27PM 26.146 /exec-reports/queue/increment-daily-data-counter 500 10006ms

/mgmt/jobs/put/deferred/, in particular, does 3 db.get(), 2 db.put(),
and 1 db.delete(), all with single entities/keys - no other APIs are
used. Though in this example, that's somewhat irrelevant because it's
hitting the 10-s boundary, not the 30-s.

j

Darien Caldwell

Sep 23, 2010, 4:53:29 PM
to Google App Engine
All serving for my app is now 100% dead. Even the simplest request is
coming back with 500 errors. App ID is slconductor.

Fabiant7t

Sep 23, 2010, 5:13:18 PM
to Google App Engine
@Ikai
We have about 1.2 requests/second on our application with the ID
"dctptv", so that's probably a good app to debug the problem with. The
application is written in Python, using the Memcache, URL Fetch and
TaskQueue APIs, the Datastore and a zip-imported version of Django 1.2.3.

The whole app has great test coverage, had been running the same code
without problems until 24 hours ago, and is still building successfully
on our continuous integration server and running on our development
boxes. So I'm pretty sure it's not a problem with our code.
Furthermore, we deployed a version with Guido van Rossum's AppStats as
the first Django middleware, so every request that gets routed through
Django hits his code directly (and it also raises error 500 very often).
I don't know if you are able to sign in as a superuser and have a look
at the results, but in case you can:
http://190-fabian.latest.dctptv.appspot.com/appstats (the default
version is "186-fabian", which is running much more stably than
"190-fabian", so the version with AppStats does not get any requests
besides yours).

Best regards,
Fabian
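For anyone wanting to reproduce that setup, putting AppStats first in the Django middleware stack is a one-line settings change; a sketch of the relevant settings.py entry, assuming the recording middleware that ships with the SDK's appstats package:

# settings.py (sketch): record every Django-routed request with AppStats by
# listing its middleware ahead of everything else.
MIDDLEWARE_CLASSES = (
    'google.appengine.ext.appstats.recording.AppStatsDjangoMiddleware',
    # ... the application's other middleware classes ...
)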


Blixt

Sep 24, 2010, 11:11:44 AM
to Google App Engine
Our application (simpleeditionsapp) is built with Python and makes use
of the Django 1.1 library. We get a lot of DeadlineExceededErrors
during imports with loading requests. We have been getting these
errors sporadically for the last week.


John Gardner

Sep 24, 2010, 1:31:34 PM
to Google App Engine
I just saw a rash of timeouts (actually, Download Error 5 from an
urlfetch, which appears to be DEADLINE_EXCEEDED).

This was after 5 seconds or so, in a task queue operation that does
nothing apart from a single urlfetch.

The timings logged for the request are all around 5 seconds:
ms=5276 cpu_ms=525 api_cpu_ms=0 cpm_usd=0.029958 queue_name=event-notify task_name=16139988366227500260
The same request typically takes milliseconds.

It just occurred between these times:
24/Sep/2010:09:26:09 -0700
24/Sep/2010:09:37:53 -0700
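Download Error 5 lines up with urlfetch's DEADLINE_EXCEEDED result and its default 5-second deadline; a hedged sketch of a task handler that raises the deadline and lets the queue retry a genuine timeout (the URL and payload are placeholders):

from google.appengine.api import urlfetch

def notify_remote_service(url, payload):
    try:
        # The default urlfetch deadline is 5 seconds; an explicit deadline=10
        # gives a slow remote host more headroom before the fetch is cut off.
        result = urlfetch.fetch(url, payload=payload, method=urlfetch.POST,
                                deadline=10)
        return result.status_code
    except urlfetch.DownloadError:
        # Re-raise so the task returns 500 and the task queue retries it.
        raise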

James Saull

Sep 24, 2010, 2:48:43 PM
to Google App Engine
Even on Python pages using Django templates (such as my "About" page,
which does not access anything, e.g. memcache or db), I still had the
dreaded failures. Sporadic too: I'd be clicking around the app briskly
with no issues and then a few minutes later it would just return 500.
All URLs on my site require admin login at the moment as it is not
released, so it gets very low traffic - just me testing. Whilst all
this was going on, dashboard access was fine and I could use the
dashboard's data viewer page with no problem either.

Here is a successful get for a URL that does use memcache and db:
09-21 11:22PM 48.003 /food/view 200 68ms 79cpu_ms 3kb Mozilla/5.0

but often got:

09-23 01:27AM 52.163 /food/view 500 10011ms 0cpu_ms 0kb....
09-22 07:56AM 06.435 /food/view 500 10173ms 0cpu_ms 0kb Mozilla/5.0
09-22 07:55AM 38.587 /food/view 500 27768ms 0cpu_ms 0kb Mozilla/5.0
09-22 07:22AM 05.410 /food/view 500 10009ms 0cpu_ms 0kb Mozilla/5.0
09-22 07:21AM 55.090 /food/view 500 10069ms 0cpu_ms 0kb Mozilla/5.0
09-21 03:12PM 43.961 /food/view 500 10413ms 0cpu_ms 0kb Mozilla/5.0
09-21 03:06PM 45.757 /food/view 500 10036ms 0cpu_ms 0kb Mozilla/5.0
09-21 01:10AM 21.571 /food/view 302 2ms 0cpu_ms 0kb Mozilla/5.0

Request was aborted after waiting too long to attempt to service your
request. This may happen sporadically when the App Engine serving
cluster is under unexpectedly high or uneven load. If you see this
message frequently, please contact the App Engine team.

Even the simple About page that is just a django template:
09-23 07:19AM 20.750 /content/about 200 82ms 0cpu_ms
09-23 02:54AM 27.894 /content/about 500 10002ms 0cpu_ms 0kb Mozilla/5.0
09-23 01:57AM 00.374 /content/about 200 78ms 0cpu_ms
09-23 12:59AM 54.452 /content/about 500 10085ms 0cpu_ms 0kb Mozilla/5.0
09-23 12:32AM 26.253 /content/about 200 4597ms 20cpu_ms 2kb Mozilla/5.0
09-21 03:07PM 12.443 /content/about 500 10188ms 0cpu_ms 0kb Mozilla/5.0

Request was aborted after waiting too long to attempt to service your
request. This may happen sporadically when the App Engine serving
cluster is under unexpectedly high or uneven load. If you see this
message frequently, please contact the App Engine team.

HTH?

James Saull

Sep 24, 2010, 2:51:14 PM
to Google App Engine
My site does appear to be fine now and has been error free for at
least the last 24 hours. But like I said, my app receives no traffic
except for me testing it so I could just be visiting at lucky times.
Let me know if you would like to know my appspot application id and if
you want me to add diagnostic code.

Ikai Lan (Google)

Sep 24, 2010, 2:51:10 PM
to google-a...@googlegroups.com
Thanks James, that's really useful information. Did you post how many QPS your application does?


--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Jason C

Sep 24, 2010, 5:42:07 PM
to Google App Engine
This thread is somewhat quiet today, but we (appid steprep) are still
seeing lots of timeouts. We were running at about 7 QPS when this happened:

09-24 02:37PM 57.365 /_ah/queue/deferred 500 10021ms
09-24 02:37PM 51.017 /_ah/queue/deferred 500 10011ms
09-24 02:37PM 47.446 /_ah/queue/deferred 500 10013ms
09-24 02:37PM 47.266 /_ah/queue/deferred 500 10010ms
09-24 02:37PM 45.304 /_ah/queue/deferred 500 10039ms
09-24 02:37PM 44.501 /_ah/queue/deferred 500 10010ms
09-24 02:37PM 42.179 /_ah/queue/deferred 500 10009ms
09-24 02:37PM 42.014 /_ah/queue/deferred 500 10029ms
09-24 02:35PM 18.229 /_ah/queue/deferred 500 10031ms
09-24 02:35PM 16.249 /_ah/queue/deferred 500 10069ms
09-24 02:35PM 15.662 /_ah/queue/deferred 500 10065ms
09-24 02:35PM 13.913 /_ah/queue/deferred 500 10003ms
09-24 02:35PM 13.520 /_ah/queue/deferred 500 10021ms
09-24 02:35PM 11.282 /_ah/queue/deferred 500 10087ms
09-24 02:35PM 10.238 /_ah/queue/deferred 500 10003ms
09-24 02:35PM 09.414 /_ah/queue/deferred 500 10094ms
09-24 02:34PM 05.098 /_ah/queue/deferred 500 29954ms
09-24 02:33PM 58.846 /_ah/queue/deferred 500 29576ms
09-24 02:34PM 10.829 /_ah/queue/deferred 500 10014ms
09-24 02:34PM 10.764 /_ah/queue/deferred 500 10042ms
09-24 02:34PM 09.730 /_ah/queue/deferred 500 10026ms
09-24 02:34PM 09.384 /_ah/queue/deferred 500 10003ms
09-24 02:34PM 09.339 /_ah/queue/deferred 500 10011ms
09-24 02:34PM 08.879 /_ah/queue/deferred 500 10081ms
09-24 02:34PM 07.925 /_ah/queue/deferred 500 10026ms
09-24 02:34PM 06.968 /_ah/queue/deferred 500 10001ms
09-24 02:34PM 04.180 /_ah/queue/deferred 500 10058ms
09-24 02:34PM 02.697 /_ah/queue/deferred 500 10053ms

j


Carlos Ble

Sep 25, 2010, 4:25:03 PM
to google-a...@googlegroups.com
We are having just a few visitors today but still got some 500 responses. How can I find out our QPS?


James Saull

Sep 25, 2010, 5:39:07 PM
to Google App Engine
I had a few more today:

09-25 05:58AM 51.411 /food/view 500 10020ms 0cpu_ms 0kb Mozilla/5.0
09-25 05:57AM 55.829 /food/view 500 49554ms 0cpu_ms 0kb Mozilla/5.0

Ikai - my app is in test at the moment and my config requires login/
admin on all pages. My QPS therefore is nearly 0 and the problem may
exist at many other times during the day - I just wouldn't know it.
This is also my first ever appengine site and it uses all the default
Python libraries.

Regards,
James




Raymond C.

Sep 26, 2010, 12:30:16 AM
to Google App Engine
My app is still having a lot of timeouts on DB puts and gets. I am just
tired of reporting here while no one seems to be doing anything
about it.

I miss the days before the 13/9 maintenance.

Robin

Sep 26, 2010, 4:43:46 AM
to Google App Engine
I have seen this just once on Java. The app ID is visualwebflows. All
custom code, using the datastore directly without an ORM.

Only happened once, but the site is in development so not seeing more
than 100 hits a day.

I have redacted the URLs used, but below is a stack trace from the
error log.

Robin Message

[23/Sep/2010:05:58:23 -0700] "GET REDACTED HTTP/1.1" 500 0 - "Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.1.9) Gecko/20100330 Fedora/3.5.9-1.fc11 Firefox/3.5.9,gzip(gfe)" "visualwebflows.appspot.com" ms=30574 cpu_ms=706 api_cpu_ms=64 cpm_usd=0.019645 pending_ms=232 exit_code=104

W 09-23 05:58AM 23.264

Error for REDACTED
com.google.apphosting.runtime.HardDeadlineExceededError: This request (322478a37b051ea5) started at 2010/09/23 12:57:52.952 UTC and was still executing at 2010/09/23 12:58:22.751 UTC.
at com.google.appengine.runtime.Request.process-322478a37b051ea5(Request.java)
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:231)
at sun.misc.Resource.getBytes(Resource.java:114)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
at net.thetadesign.cherry.server.EditServlet.getOutputFormatter(EditServlet.java:215)
at net.thetadesign.cherry.server.EditServlet.evaluate(EditServlet.java:138)
at net.thetadesign.cherry.server.EditServlet.render(EditServlet.java:108)
at net.thetadesign.cherry.server.EditServlet.render(EditServlet.java:50)
at net.thetadesign.cherry.server.AbstractServlet.showController(AbstractServlet.java:176)
at net.thetadesign.cherry.server.AbstractServlet.doGet(AbstractServlet.java:144)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:693)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:806)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
at com.google.apphosting.utils.servlet.ParseBlobUploadFilter.doFilter(ParseBlobUploadFilter.java:97)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter(SaveSessionFilter.java:35)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:43)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle(AppVersionHandlerMap.java:238)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
at com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable(RpcRequestParser.java:76)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:135)
at com.google.apphosting.runtime.JavaRuntime.handleRequest(JavaRuntime.java:251)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:6784)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:6782)
at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest(BlockingApplicationHandler.java:24)
at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:398)
at com.google.net.rpc.impl.Server$2.run(Server.java:852)
at com.google.tracing.LocalTraceSpanRunnable.run(LocalTraceSpanRunnable.java:56)
at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan(LocalTraceSpanBuilder.java:576)
at com.google.net.rpc.impl.Server.startRpc(Server.java:807)
at com.google.net.rpc.impl.Server.processRequest(Server.java:369)
at com.google.net.rpc.impl.ServerConnection.messageReceived(ServerConnection.java:442)
at com.google.net.rpc.impl.RpcConnection.parseMessages(RpcConnection.java:319)
at com.google.net.rpc.impl.RpcConnection.dataReceived(RpcConnection.java:290)
at com.google.net.async.Connection.handleReadEvent(Connection.java:474)
at com.google.net.async.EventDispatcher.processNetworkEvents(EventDispatcher.java:831)
at com.google.net.async.EventDispatcher.internalLoop(EventDispatcher.java:207)
at com.google.net.async.EventDispatcher.loop(EventDispatcher.java:103)
at com.google.net.rpc.RpcService.runUntilServerShutdown(RpcService.java:251)
at com.google.apphosting.runtime.JavaRuntime$RpcRunnable.run(JavaRuntime.java:418)
at java.lang.Thread.run(Thread.java:636)

Jason C

Sep 27, 2010, 6:55:52 PM
to Google App Engine
Appid: steprep, python, QPS around 9-10

We're still seeing large error rates on requests (primarily cron/
taskqueue), seemingly to do with autoscaling. Here's an example chunk:

09-27 03:41PM 22.306 /_ah/queue/deferred 500 35908ms
09-27 03:41PM 48.096 /_ah/queue/deferred 500 10028ms
09-27 03:41PM 28.501 /_ah/queue/deferred 500 29527ms
09-27 03:41PM 46.634 /_ah/queue/deferred 500 10034ms
09-27 03:41PM 20.134 /_ah/queue/deferred 500 35703ms
09-27 03:41PM 20.146 /_ah/queue/deferred 500 35691ms
09-27 03:41PM 18.991 /mgmt/jobs/put/deferred/?jobId=6c7dae4c-7c6a-4290-96b7-8cae4bd66410 500 36756ms
09-27 03:41PM 45.607 /_ah/queue/deferred 500 10013ms
09-27 03:41PM 45.297 /mgmt/jobs/put/deferred/?jobId=c5f66519-f37b-4c06-87c8-ac1fe3db6633 500 10040ms
09-27 03:41PM 44.434 /_ah/queue/deferred 500 10040ms
09-27 03:41PM 43.555 /_ah/queue/deferred 500 10026ms
09-27 03:41PM 43.236 /_ah/queue/deferred 500 10029ms
09-27 03:41PM 43.230 /_ah/queue/deferred 500 10005ms
09-27 03:41PM 42.436 /_ah/queue/deferred 500 10013ms
09-27 03:41PM 22.305 /_ah/queue/deferred 500 29862ms
09-27 03:41PM 41.949 /_ah/queue/deferred 500 10040ms
09-27 03:41PM 22.174 /_ah/queue/deferred 500 29437ms
09-27 03:41PM 41.446 /_ah/queue/deferred 500 10006ms
09-27 03:41PM 41.308 /_ah/queue/deferred 500 10041ms
09-27 03:41PM 41.188 /_ah/queue/deferred 500 10008ms
09-27 03:41PM 40.760 /_ah/queue/deferred 500 10050ms
09-27 03:41PM 40.551 /_ah/queue/deferred 500 10132ms
09-27 03:41PM 09.059 /_ah/queue/deferred 500 41582ms
09-27 03:41PM 21.013 /mgmt/search/start/ 500 29442ms
09-27 03:41PM 39.855 /_ah/queue/deferred 500 10020ms

Ikai, please do let me know if there's anything I can do to help from
my end.

Thanks,
j

Jason C

Sep 28, 2010, 10:32:40 PM
to Google App Engine
As of 7.28p (log time) on Tuesday night, my logs show a massive
improvement. MASSIVE.

I almost cannot believe the difference.

Did something change? Man, oh man I hope this isn't just a time of day
anomaly...

This. rocks. hard.

j

Carlos Ble

Sep 29, 2010, 5:07:01 AM
to Google App Engine
We are still facing as many DeadlineExceededErrors as in the past days,
but now it is happening while sending emails:

DeadlineExceededError: The API call mail.Send() took too long to
respond and was cancelled

So the app loads, it works and it gets this exception just trying to
send a single email to a single recipient. Appid: mavencharts

Thanks
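One way to keep a slow mail.Send() from failing the originating request is to push the send onto the task queue so it retries on its own; a minimal sketch (the sender and recipient below are placeholders). As a later message in this thread notes, though, a send that times out may still have been delivered, so a retried task can produce duplicate mail:

from google.appengine.api import mail
from google.appengine.ext import deferred

def send_notification(to, subject, body):
    # Runs on the task queue: if mail.send_mail() raises a deadline error,
    # the task fails with a 500 and the queue retries it automatically.
    mail.send_mail(sender='noreply@example.com',   # placeholder sender
                   to=to, subject=subject, body=body)

# Enqueue instead of sending inline in the request handler.
deferred.defer(send_notification, 'user@example.com', 'Hello', 'Message body')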

Jason C

Sep 29, 2010, 1:05:45 PM
to Google App Engine
We (appid: steprep) still see bursts of DeadlineExceededErrors (e.g.,
log time: 7.29a and 8.24a, Sep 29), but this is a different
characteristic than before where we had _ongoing_ 10-s and 30-s
timeouts on cron jobs.

We also have been seeing the mail.Send() timeout issues for the past
couple of days, both on appid steprep and appid daily-deal. Most of
our emails are sent with taskqueue (and thus they retry); yesterday we
saw some duplicate emails being delivered, so perhaps even when the
mail.Send() fails, there is actually a message being delivered (which
is also not good).

j

Ikai Lan (Google)

Sep 29, 2010, 2:33:23 PM
to google-a...@googlegroups.com
I believe the mail issues may be separate issues. I'll look into those.


--
Ikai Lan 
Developer Programs Engineer, Google App Engine




John Gardner

Oct 5, 2010, 12:15:39 PM
to Google App Engine
I just saw a series of deadline exceeded errors.

Our app is "cxiongame". I was essentially the only user on the
system, but I was creating a number of TaskQueue jobs. I assume the
failures were due to spin-up time on additional instances to handle
the task queues.

It would be nice if there were a mechanism to inform App Engine that
instances are up, so that rather than hitting a DeadlineExceeded the
task could just be delayed a little longer.
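A rough workaround in the meantime is to catch the soft deadline inside the task handler and re-enqueue the same work with a delay, rather than relying on an immediate retry against a still-cold instance; a sketch under those assumptions (the handler, URL and work function are made up, and the taskqueue import path is the pre-1.4 "labs" one):

from google.appengine.api.labs import taskqueue
from google.appengine.ext import webapp
from google.appengine.runtime import DeadlineExceededError

class ProcessMoveHandler(webapp.RequestHandler):    # hypothetical task handler
    def post(self):
        move_id = self.request.get('move_id')
        try:
            process_move(move_id)                   # placeholder work function
        except DeadlineExceededError:
            # The soft deadline leaves a short grace period -- enough to
            # re-enqueue the work with a countdown instead of dying outright.
            taskqueue.add(url='/tasks/process-move',
                          params={'move_id': move_id},
                          countdown=30)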


Stepan

Oct 19, 2010, 1:54:49 PM
to Google App Engine
My app id is pixycity; it has been unusable for the last 4 days, with cold
starts regularly ending in DeadlineExceededError. One week ago the cold
starts took a long time but were successful.
Thanks
Stepan
www.pixycity.com

Ikai Lan (Google)

Oct 19, 2010, 2:55:42 PM
to google-a...@googlegroups.com
Is the main page doing something that would take a long time to load? Multiple datastore calls, URLFetches?


--
Ikai Lan 
Developer Programs Engineer, Google App Engine




derrickschneider

Oct 26, 2010, 2:16:00 PM
to Google App Engine
I wanted to chime in. We're seeing high latency (10s, 0 cpu_ms) on
tasks that are put into the queue on an hourly basis. This has been
going on for a couple of weeks at least. The end result is that we're
churning through our CPU time early in the morning, and then our app
is dead for the real work it needs to do the rest of the day. I've
temporarily "fixed" the problem by removing our hourly jobs. This is
coming up on a crucial time for our application, so it would be nice
to know the status.

appid is maxistelemetry. All Python, using task queues heavily to do
periodic aggregations of information in the datastore. The task queue
handlers use Django templating to parse expressions in config files. Just
the default SDK (upgraded to 1.3.8 a few days ago), no special libraries.

Derrick


johnP

Oct 27, 2010, 2:44:38 PM
to Google App Engine

App Engine has been horrible since the last maintenance. Periodic
waves of "Request was aborted after waiting too long..." and
DeadlineExceededErrors... Flurries of 25-100 of these in a 2-3 minute
period. It occurs on simple pages as well as db-intensive ones. I'm
using Django Appengine Patch.

It was bad like this in the late spring, and got better in the summer
up until the last maintenance. Then it got bad again. I guess a
0.1-0.5% error rate might be survivable, but I'd hope that the standard
we're trying to achieve is a bit higher than this.

:)





Carlos Ble

Oct 27, 2010, 3:25:45 PM
to Google App Engine
Same here. We are just planning a migration out of GAE :-(

Derrick Schneider

Oct 27, 2010, 3:29:22 PM
to google-a...@googlegroups.com
Yeah, I'm in the same boat. We're trying to decide which implementation of this app to go with, and having AppEngine be in an unreliable state isn't helping my cause at all.

Derrick





--
Writer. Programmer. Puzzle Designer.
http://www.obsessionwithfood.com

nickmilon

Oct 27, 2010, 4:41:08 PM
to Google App Engine
App id: geognos
App version: 5
Language: Python
Live instances at the time: 2
Loading request: no

A typical normal operation for this request:

10-27 09:47AM 08.920 /api/countries/info/CL.json 200 17ms 19cpu_ms 0kb MSIE 15.0,gzip(gfe)
195.97.105.127 - - [27/Oct/2010:09:47:08 -0700] "GET /api/countries/info/CL.json HTTP/1.1" 200 551 - "MSIE 8.0,gzip(gfe)" "5.latest.geognos.appspot.com" ms=18 cpu_ms=19 api_cpu_ms=0 cpm_usd=0.000607

While I was load testing the above app with a steady load of around 2
QPS, I got 2 error requests out of 30000 (see below). The particular
request handler is very light: no access to any service except a
memcache get, and no heavy imports. There were also some more errors
today on another handler.


10-27 09:04AM 59.320 /api/countries/info/CL.json 500 12599ms 0cpu_ms 0kb MSIE 15.0,gzip(gfe)
W 10-27 09:05AM 11.920 Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is .....
10-27 08:20AM 51.444 /api/countries/info/CL.json 500 17015ms 0cpu_ms 0kb MSIE 15.0,gzip(gfe)
W 10-27 08:21AM 08.460 Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is ....


On Sep 23, 6:19 pm, "Ikai Lan (Google)" <ikai.l+gro...@google.com>
wrote:
> Hey guys,
>
> We've been tracking latency spikes and intermittent batches of aborted
> requests over the last 24 hour period. There were at least 3 periods of
> spikes yesterday:
>
> ~1100 Pacific (moderate, brief period of high)
> ~1300 Pacific (small)
> ~1530 Pacific (small)
>
> Community reports seem to indicate that these are related to loading
> requests. If you are being affected, can you report your environment,
> language and if these times correlate with your experiences in this thread?
> Are there periods that this particular report did not cover? Please provide
> measurements, times and application IDs.
>