Application instances seem to be too aggressively recycled

Jason C

Mar 10, 2009, 1:22:27 PM
to Google App Engine
We have a new application that receives _very_ little load. So little,
in fact, that each request spins up a new application instance. We are
using Django trunk and the import overhead is high. All of this yields
a long request (e.g., 8802ms) using a lot of CPU (e.g., 3247ms-cpu).

With very little load, it makes sense that instances are recycled. On
that assumption, we've started applying some primer load against a
couple of uris in an attempt to keep some instances hot. We're
applying around 1 request/second across 2 uris.
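
A primer load like the one described above could be sketched roughly as follows. This is only an illustration, not the script actually used in this thread: the function names, the injectable `fetch` and `sleep` parameters, and the default interval are all assumptions.

```python
import itertools
import time
import urllib.request


def fetch_status(url, timeout=10):
    """Fetch url and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def prime(urls, interval_s=1.0, count=None, fetch=fetch_status, sleep=time.sleep):
    """Request the given URLs round-robin, pausing interval_s after each one.

    Yields (url, elapsed_ms, status) per request. status is None when the
    fetch raised (network failure, timeout, or an HTTP error status).
    count=None keeps going until the caller stops iterating.
    """
    targets = itertools.cycle(urls)
    if count is not None:
        targets = itertools.islice(targets, count)
    for url in targets:
        started = time.monotonic()
        try:
            status = fetch(url)
        except OSError:
            # urllib.error.URLError and HTTPError both subclass OSError.
            status = None
        elapsed_ms = (time.monotonic() - started) * 1000.0
        yield url, elapsed_ms, status
        # The pause is added on top of the fetch time, so the real rate
        # is somewhat below 1 / interval_s requests per second.
        sleep(interval_s)
```

Iterating over `prime([url_1, url_2])` with two placeholder URLs of your own app and printing `elapsed_ms` would give a rough client-side view of which requests were slow.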

When we hit a hot instance, we get blazing speed (e.g., url_1: 73ms
91ms-cpu, url_2: 368ms 615ms-cpu - these values are pulled from the
App Engine console Logs tool and I'm not completely sure if this
represents Runtime, or combined Runtime/API - I believe the latter).

Under this 1 request/second load, we are still seeing lots of instance
startup - even after 40-50 minutes of sustained load. Subjectively,
the instance startups seem to come in bursts, though we've done no
formal analysis around this.
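
For anyone who wants something more formal than a subjective impression, a minimal sketch of such an analysis is below. It assumes you have already extracted `(timestamp_s, cpu_ms)` pairs from the request logs by some means, and that a cold start can be recognised by a CPU threshold. The 1000 ms-cpu default is an assumption, picked only because it sits between the hot and cold figures quoted above.

```python
def cold_start_report(entries, cold_cpu_ms=1000):
    """Summarise cold starts in a time-ordered list of (timestamp_s, cpu_ms).

    A request counts as a cold start when its CPU time is at or above
    cold_cpu_ms. gaps_s holds the seconds between consecutive cold starts;
    many short gaps close together would suggest bursts.
    """
    entries = list(entries)
    cold_times = [ts for ts, cpu_ms in entries if cpu_ms >= cold_cpu_ms]
    gaps = [later - earlier for earlier, later in zip(cold_times, cold_times[1:])]
    total = len(entries)
    return {
        "requests": total,
        "cold_starts": len(cold_times),
        "cold_fraction": len(cold_times) / total if total else 0.0,
        "gaps_s": gaps,
    }
```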

Does anyone else see this behavior? It _really_ kills our application
performance - so much so, that we're considering moving away from
Django in an effort to minimize the start-up pain.

Any info or war stories would be appreciated.

peterk

Mar 10, 2009, 2:23:01 PM
to Google App Engine
How frequently are you hitting appengine 'cold', requiring a start-up?

I'm running an app on django using app-engine-patch. Just testing and
so forth, it averages around 1 request every 5 to 10 seconds. Just
looking over my last 60 requests or so, I don't see any evidence of
'cold starts', all requests are within the range I'd be expecting
(100-300ms in my case).

Your start-up cost sounds very high in any case. Are you using
app-engine-patch, or how are you using Django?

Jason C

Mar 10, 2009, 2:34:49 PM
to Google App Engine
As of right now, we are seeing instance start-ups around every 2-3
seconds - every 2-4 requests.

We are using http://code.google.com/p/google-app-engine-django/ as the
shim.

j

peterk

Mar 10, 2009, 2:44:24 PM
to Google App Engine
That sounds like very strange behaviour.

I don't have much experience with the Django helper. I remember using it
before switching to app-engine-patch, but I don't remember having these
kinds of issues with it. I wasn't really looking out for them though,
mind you.

The only reassurance I can provide is that it definitely shouldn't be
that way; that's not normal behaviour, so likely someone will be able
to help you sort out what's causing this. I can recommend
app-engine-patch as an alternative if you feel like comparing, though.
It's available here:

http://code.google.com/p/app-engine-patch/

On Mar 10, 6:34 pm, Jason C <jason.a.coll...@gmail.com> wrote:
> As of right now, we are seeing instance start-ups around every 2-3
> seconds - every 2-4 requests.
>
> We are using http://code.google.com/p/google-app-engine-django/ as the

cz

Mar 10, 2009, 8:16:13 PM
to Google App Engine
Actually, that is normal behavior. This has been discussed in previous
threads.
GAE seems to aggressively purge its app cache; average app lifetime
appears to be under 2 seconds. app-engine-patch may be marginally
faster, but both require Django 1.x to be imported via zipimport, which
is pretty expensive.
Our app also exhibits this problem due to fairly low traffic, but
there's not much that can be done as far as I can tell. Our app's
dynamic pages contain lots of images also served via GAE, and since
the browser can make many requests at once to load these, the app can
be started up on several server instances (probably due to load
balancing) to deliver all the images. This can add up to a huge amount
of CPU usage for just one page.
Basically, the more traffic your app gets the faster it will be.
Some people have suggested using a process somewhere to automatically
make requests once per second or so to keep apps in the cache. But
this is surely frowned upon by Google.

Jarek Zgoda

Mar 11, 2009, 4:13:35 AM
to Google App Engine
I'm seeing the same behaviour, but I do not use Django (Werkzeug +
Jinja2). While this combo seems lighter (in terms of CPU usage), my
app becomes "cold" every 2-3 seconds and a request takes > 1200ms CPU to
be served, with Jinja2 Environment creation taking most of the CPU
resources. I tried to optimize this process by caching what can be
cached but got little to no improvement, as not much can be cached
here (only the loader and templates).
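
The kind of caching described here usually means building the expensive object once at module level so that later requests on the same warm instance reuse it. The sketch below shows that pattern with a stdlib stand-in. `build_environment`, `get_environment` and `handle_request` are hypothetical names, and `string.Template` takes the place of a real template environment. It does nothing for the first request on a fresh instance, which is consistent with the small gain reported above when instances are recycled every few seconds.

```python
import string

# Built lazily, once per process; reused by every later request that is
# served by the same (still warm) instance.
_ENV = None


def build_environment():
    """Stand-in for an expensive setup step (e.g. creating a template environment)."""
    build_environment.calls += 1
    return {"greeting": string.Template("Hello, $name!")}


build_environment.calls = 0  # counter used only to show how often setup runs


def get_environment():
    """Return the cached environment, building it on first use."""
    global _ENV
    if _ENV is None:
        _ENV = build_environment()
    return _ENV


def handle_request(name):
    """Render a response using the cached environment."""
    template = get_environment()["greeting"]
    return template.substitute(name=name)
```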

peterk

Mar 11, 2009, 5:11:15 AM
to Google App Engine
I went back to my logs. I was ignoring a certain request previously,
so that's why I was missing my 'cold starts'! For some reason, one of
my particular requests seems to get all my cold starts. However, the
request and cpu times I'm seeing for these cold starts are quite
different from what Jason is reporting. Typically they range from
1000-1600ms and 1900-2100 ms-cpu.

Antonin Hildebrand

Mar 11, 2009, 7:26:13 AM
to Google App Engine
I can also confirm this behavior with my app: recycling takes place
after about 2 seconds of inactivity. My guess is that the GAE team
lowered this recycling timeout during the last week, because I had a
working application running on appspot, I made no updates to it, and
the app broke because of this change.

johnP

Mar 11, 2009, 12:28:22 PM
to Google App Engine
I've been tracking (and seeing) this for a while already. Besides the
latency that occurs each time Django gets re-zipimported, what is
concerning is the thought of paying for CPUs to constantly reload the
cache. My app's not live yet - so there is some time before this
becomes a $$$ problem for me, but...

I remain forever hopeful that it will be solved by then. :)




Jason C

Mar 11, 2009, 12:49:56 PM
to Google App Engine
Hmm. Never thought about that aspect; I was only concerned with the
response performance.

Perhaps the AdWords profit maximization heuristic has been applied to
the instance recycling algorithm? ;)

P. Hausel

Mar 11, 2009, 2:00:51 PM
to Google App Engine




On Mar 11, 12:28 pm, johnP <j...@thinkwave.com> wrote:
> I've been tracking (and seeing) this for a while already.  Besides the
> latency that occurs each time Django gets re-zipimported, what is
> concerning is the thought of paying for CPUs to constantly reload the
> cache.  My app's not live yet - so there is some time before this
> becomes a $$$ problem for me, but...
>

But how can you reach your billing limit if the issue is that you get
low traffic in the first place? Other than that, I agree: it would be
great if this 2-second limit were increased.

johnP

Mar 11, 2009, 6:38:37 PM
to Google App Engine
That's an excellent point. Can I assume that (if) I ever reach the
billing limit, the cache will last longer than 2 seconds?

Jason C

Mar 12, 2009, 10:57:21 AM
to Google App Engine
It's unfortunate that no one from Google has commented on this
thread (at least as far as I can tell).

Is this aggressive application restarting normal, or is it a bug?

Is there something/anything that we as developers can do with our
software to provide good responsiveness even on lower traffic / new
sites?

j

peterk

Mar 12, 2009, 11:59:03 AM
to Google App Engine
As cz suggested, you could set up a request handler that does a
minimum of processing (i.e. one that just returns an empty response or
something), and then ping it from a third party location frequently
enough to keep your app hot.

That would use up requests, but shouldn't burn too much cpu or
bandwidth quota if the request handler is really doing virtually
nothing.
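
A do-almost-nothing handler of this kind might look like the sketch below, written as a plain WSGI callable rather than against any particular framework. The `/_keepalive` path and the names are made up for illustration.

```python
KEEPALIVE_PATH = "/_keepalive"  # hypothetical path, pick whatever suits your app


def application(environ, start_response):
    """Minimal WSGI app: answer the keep-alive path with an empty 200."""
    if environ.get("PATH_INFO") == KEEPALIVE_PATH:
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", "0")])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

One caveat, as far as I understand it: if the ping handler never imports the framework, an instance kept alive this way may still pay the import cost on its first real request, so the ping probably has to touch the same code path as normal traffic to be useful.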

Then once you get enough natural traffic to keep your app hot you
could stop doing this.

If you don't want to, or can't, ping from a third party location,
maybe you can get your app to ping itself. See: http://stage.vambenepe.com/archives/549

Jason C

Mar 12, 2009, 12:46:17 PM
to Google App Engine
Thanks peterk.

We were doing this, however:

"Under this 1 request/second load, we are still seeing lots of instance
startup - even after 40-50 minutes of sustained load. Subjectively,
the instance startups seem to come in bursts, though we've done no
formal analysis around this."

It sort of seems like a bug to me.

j

Brett Slatkin

Mar 13, 2009, 2:04:52 AM
to google-a...@googlegroups.com
Heyo,

On Tue, Mar 10, 2009 at 7:16 PM, cz <cze...@gmail.com> wrote:
> Actually, that is normal behavior. This has been discussed in previous
> threads.
> GAE seems to aggressively purge its app cache; average app lifetime
> appears to be under 2 seconds. app-engine-patch may be marginally
> faster, but both require Django 1.x to be imported via zipimport, which
> is pretty expensive.
> Our app also exhibits this problem due to fairly low traffic, but
> there's not much that can be done as far as I can tell. Our app's
> dynamic pages contain lots of images also served via GAE, and since
> the browser can make many requests at once to load these, the app can
> be started up on several server instances (probably due to load
> balancing) to deliver all the images. This can add up to a huge amount
> of CPU usage for just one page.
> Basically, the more traffic your app gets the faster it will be.
> Some people have suggested using a process somewhere to automatically
> make requests once per second or so to keep apps in the cache. But
> this is surely frowned upon by Google.

Indeed, we are aware of this issue for some applications with certain
load characteristics. We're taking steps to improve app caching to
maximize instance lifetimes when it makes sense. Thanks for reporting
all of your experiences with this behavior!

-Brett

Jason C

Apr 21, 2009, 9:31:24 AM
to Google App Engine
I tried using the new cron facility to hit a url every minute in an
attempt to keep an instance hot. I am seeing a full instance spin-up
on every hit (>3,000ms-cpu).

If the application cache timeout were extended to just over a minute,
the cron facility could at least be used to keep an
instance hot. This would tremendously help new applications that are
just building traffic.

j

Barry Hunter

Apr 21, 2009, 9:41:23 AM
to google-a...@googlegroups.com
But you don't know that the cron handler is going to run on the same
machine that the next visitor is going to hit. In fact, cron may even
be running on machines not serving www traffic.

An app might be running on different machines (or even different data
centers) each time it's been run.
--
Barry

- www.nearby.org.uk - www.geograph.org.uk -

T.J. Crowder

Apr 21, 2009, 11:45:09 AM
to Google App Engine
> But you don't know that the cron handler is going to run on the same
> machine that the next visitor is going to hit. In fact cron may even
> be running on machines not serving www traffic.

A very, very good point. That argues for pinging it externally.

> An app might be running on different machines (or even different data
> centers) each time it's been run.

Indeed it might. But if the ping is coming from the same network
location, barring load limits being reached, etc., I would
expect the app to settle down fairly quickly on the same couple of
servers.

Let me raise the question of whether the ping would be of any value AT
ALL; because if not, it's just going to sit there consuming quotas and
potentially making Google unhappy. Part of the point of AppEngine is
we get to take advantage of Google's extensive CDN. If your ping is
from a network in California, I wouldn't be surprised to find that a
customer in New York gets a cold instance. I'd be truly shocked if a
customer in Ireland or Ukraine didn't, at least once things are up and
running properly. (Here in the preview period, it wouldn't surprise me
if most AppEngine stuff is still coming out of just a couple of data
centres for now.)

FWIW,
--
T.J. Crowder
tj / crowder software / com
Independent Software Engineer, consulting services available
