GAE starts dynamic instances for no reason, causing unnecessary cold-start delays


Mike Lawrence

Jul 27, 2011, 5:59:46 PM
to Google App Engine
I purchased 3 Always-On instances.
My site is under construction with no traffic.
When I hit my site, GAE fires up a new dynamic instance to service the
request even though there are 3 idle instances!
My app starts in 2.0 seconds (using Stripes),
but GAE takes 9.4 seconds to reply. Why?
Really annoying.
Why pay for Always-On when you get the crappy response time of dynamic
instances?

App Id: WordPong

Log entries (EST):
2011-07-27 17:37:04.859 /game/Game.wp?_eventName=questionList 200
9481ms 2063cpu_ms 103api_cpu_ms 1kb Mozilla/5
2011-07-27 12:37:51 Completed update of a new default version
version=22.2011-07-27T19:37:17Z

Here's the screen shot of my instances at the time of the request:
http://dl.dropbox.com/u/473572/Untitled.jpg

Why are there any dynamic instances running at all when there are 3
idle Always-On instances available?
Looks like a serious bug where GAE is wasting resources and providing
poor response times for no reason.

Jon McAlister

Jul 27, 2011, 7:12:13 PM
to google-a...@googlegroups.com
So, there are a couple of things going on here. I'll see if I can help explain.

The first is that with 1.5.2 we changed how resident instances work,
so that if a dynamic instance existed, the scheduler would prefer an
idle dynamic instance to an idle resident instance. Further, if a
resident instance was busy, we would use this as a cue to warm up a new
dynamic instance. The point of these changes is to turn the resident
instances into the reserve for the app. They are generally idle, but
if the app gets more traffic than it can handle with non-reserved
instances, then it will use some of the reserved instances (and this
will in turn spawn a new dynamic instance).
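
To make this concrete, here is a rough sketch of the selection logic
just described. To be clear, this is invented pseudocode for
illustration; none of these types exist in any public API, and the
real scheduler does not look like this:

    import java.util.List;

    // Invented types standing in for scheduler internals.
    interface Instance { boolean isIdle(); boolean isDynamic(); }
    interface Scheduler {
        void warmUpNewDynamicInstance();    // async /_ah/warmup of a fresh instance
        Instance startNewDynamicInstance(); // cold start; the request waits
    }

    static Instance pickInstance(List<Instance> all, Scheduler s) {
        for (Instance i : all)                        // 1. prefer an idle dynamic instance
            if (i.isIdle() && i.isDynamic()) return i;
        for (Instance i : all) {                      // 2. then an idle resident one...
            if (i.isIdle() && !i.isDynamic()) {
                s.warmUpNewDynamicInstance();         // ...which cues a new dynamic instance
                return i;
            }
        }
        return s.startNewDynamicInstance();           // 3. nothing idle: loading request
    }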

Generally, Always On is going away with the new billing plan, and
being replaced by Min Idle Instances, which is how the reserved
instances have been changed to behave with 1.5.2. We're continuing to
evaluate all aspects here: how well these reserve instances are
working, what we should be doing, what we should change about the
scheduler and the billing plan, and so on.

In terms of this specific example, the slow request was caused by
general Bigtable slowness during that time interval. This can be seen
somewhat here: http://code.google.com/status/appengine/detail/datastore/2011/07/27#ae-trust-detail-datastore-get-latency

This can also be investigated somewhat using our logs viewer. For
example, we can see all loading requests for an app with:
https://appengine.google.com/logs?app_id=wordpong&severity_level_override=1&severity_level=3&tz=US%2FPacific&filter=loading_request%3D1&filter_type=regex&date_type=now&date=2011-07-27&time=15%3A46%3A45&limit=20&view=Search.
Note how the only loading requests this app has received have been
for /_ah/warmup.

Also we can see all requests sent to a specific instance. Here's the
one with the log line you listed above:
https://appengine.google.com/logs?app_id=wordpong&severity_level_override=1&severity_level=3&tz=US%2FPacific&filter=instance%3D00c61b117c39de5ca2d60c64270fc703fdce1355&filter_type=regex&date_type=now&date=2011-07-27&time=15%3A56%3A40&limit=20&view=Search.
Note how the first request the instance served was /_ah/warmup,
followed by a pause of 4 seconds, followed by the /game/Game.wp
request which ran for 9 seconds.

There are a couple of things that can be done now to get different
behaviors. One is to set Max Idle Instances to three, which will kill
off the dynamic instances for your app, and leave the app with just
the resident instances. The other is to use Backends, which will give
you direct control over how many instances run for your app and their
properties: http://code.google.com/appengine/docs/java/backends/overview.html
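
For the Java runtime, the backend configuration lives in
war/WEB-INF/backends.xml. A minimal sketch might look like the
following (the backend name and settings are invented for
illustration; see the docs above for the full set of options):

    <backends>
      <backend name="game">
        <instances>3</instances>    <!-- run exactly 3 instances -->
        <options>
          <dynamic>false</dynamic>  <!-- resident: started at deploy time -->
          <public>true</public>     <!-- reachable over HTTP from outside -->
        </options>
      </backend>
    </backends>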

Hopefully that helps. There is also a lengthy discussion going on at:
http://groups.google.com/group/google-appengine/browse_thread/thread/baf439a6e073f6da


Mike Lawrence

Jul 28, 2011, 11:32:49 AM
to Google App Engine
Wow, that's awesome. Thanks, Jon, for the very thoughtful analysis.
I understand what's going on now.

I recommend you change the scheduler to favor instances that have been
warmed up for more than N milliseconds, where N is the average
start-up time for the app.

You should never send a request to a new instance when there is an
idle instance (dynamic or reserved) that is hot.
Without this change, new requests will be delayed unnecessarily,
which is unacceptable for a user-facing app.
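
In pseudocode, the policy I'm proposing would be something like this
(all names here are made up; this is not a real GAE API):

    // An instance is "hot" once it has been up longer than the app's
    // average cold-start time (hypothetical pseudocode).
    static boolean isHot(Instance i, long avgStartupMillis) {
        return i.millisSinceWarmupCompleted() >= avgStartupMillis;
    }

    static Instance route(List<Instance> all, long avgStartupMillis) {
        for (Instance i : all)
            // Any hot idle instance, dynamic or reserved, beats a fresh one.
            if (i.isIdle() && isHot(i, avgStartupMillis)) return i;
        return null; // only now consider sending traffic to a new instance
    }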

Thanks again for the detailed analysis and recommendations.
Love the log searching tricks.


Mike Lawrence

Jul 28, 2011, 4:49:32 PM
to Google App Engine
Just ran a volume test on my server.
The three reserved instances are not getting any traffic.
That can't be normal.
Instead, GAE spawned 3 dynamic instances to handle all the load.
I have set Max Idle Instances to 3.
Why would you ever want instances to sit idle under a load test?
If you have capacity (idle instances), why not use them when you're
under heavy load?
http://dl.dropbox.com/u/473572/Untitled2.jpg
Wouldn't I be charged for 6 instances under the new pricing model when
I'm only really using 3?



Jon McAlister

Jul 28, 2011, 5:00:06 PM
to google-a...@googlegroups.com
Because the scheduler is now treating the reserved instances as
Min-Idle-Instances, what you're describing is expected
behavior. They are intentionally kept idle, and it tries to serve
traffic using the non-reserved instances. If the
non-reserved instances can't keep up, it will make use of
the reserved instances.

That is, to repeat, the invariant that the scheduler is trying to
maintain here is that your app has at least 3 idle instances.
And if an instance is getting traffic, then it isn't idle. The value
of an idle instance is that it can process requests right away
if needed, without having to first warm up or do a loading request.

It sounds like what you'd really prefer is something like
Min-Instances, but that's not presently an available option.

If you set Max-Idle-Instances, then you will never be charged for
any instances which are idle in excess of the configured
value. In this example, if Max-Idle-Instances=3, and the
scheduler is running 6 instances, and three of them are always
idle, then the resulting charge would be for 3 instances.

Mike Lawrence

Jul 29, 2011, 10:14:25 AM
to google-a...@googlegroups.com
Thanks for clearing that up.
I'm sure a lot of people will be confused by this.
Maybe add a $ next to instances that we are being charged for so it's obvious?
Thanks for your excellent support.

Ikai Lan (Google)

Jul 29, 2011, 10:24:05 AM
to google-a...@googlegroups.com
Mike, you aren't being charged for these instances in the current billing scheme. Because you're being charged for CPU hours, if you use 5 CPU hours, you pay the same whether those hours are served by 100 instances or 10. You would only pay more under the new (as-yet-unlaunched) pricing scheme, where you would pay per instance. We're working on solutions that should make this both more transparent and configurable by then.

Ikai Lan 
Developer Programs Engineer, Google App Engine



Robert Kluin

Jul 31, 2011, 2:44:23 AM
to google-a...@googlegroups.com

This whole thread is an excellent example of why I'm not at all enthusiastic about the new pricing.

Mike is 100% correct, this behavior is amazingly unintuitive and confusing. If we will be paying around $60 per month per instance, why would we ever want one to sit idle? Shouldn't App Engine serve the incoming request with an idle instance and spin up a new instance to be in reserve, if needed?

Also, if I understand correctly, if I set min idle instances to, say, 5, it will nearly always have 5 instances sitting there doing nothing. If one does get used, it will spin up another immediately. Is the point of min idle instances to handle bursts of traffic (though it isn't clear it does that either, since it may well send requests to new instances)? Otherwise I can hardly see the point in ever having more than 1 idle instance.


Robert



Jon McAlister

Jul 31, 2011, 2:15:07 PM
to google-a...@googlegroups.com
On Sat, Jul 30, 2011 at 11:44 PM, Robert Kluin <robert...@gmail.com> wrote:
> This whole thread is an excellent example of why I'm not at all enthusiastic
> about the new pricing.
>
> Mike is 100% correct, this behavior is amazingly unintuitive and confusing.
> If we will be paying around $60 per month per instance, why would we ever
> want one to sit idle?  Shouldn't App Engine serve the incoming request with
> an idle instance and spin up a new instance to be in reserve, if needed?

To that last question, that's precisely what it's doing. It sends
requests to available instances, and if after having done that there
are no longer N Min-Idle-Instances, it will spin up new ones to meet
that invariant.
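
As a sketch of that invariant (same invented names as the pseudocode
earlier in the thread, plus a countIdleInstances() I'm assuming):

    // After each routing decision, restore the invariant: at least
    // minIdleInstances must be idle (hypothetical pseudocode).
    static void restoreInvariant(Scheduler s, int minIdleInstances) {
        while (s.countIdleInstances() < minIdleInstances)
            s.warmUpNewDynamicInstance(); // takes traffic only after /_ah/warmup
    }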

Also, if you set Max-Idle-Instances, then you are not charged for
these extra instances. By setting Max-Idle-Instances to, say, 3,
you are never charged for more than three idle instances. This works
whether or not you have configured Min-Idle-Instances.

> Also, if I understand correctly, if I set min idle instances to, say, 5, it
> will nearly always have 5 instances sitting there doing nothing.  If one does
> get used, it will spin up another immediately.  Is the point of min idle
> instances to handle bursts of traffic (though it isn't clear it does that
> either, since it may well send requests to new instances)?  Otherwise I can
> hardly see the point in ever having more than 1 idle instance.

It depends on the app and the developer. In the most extreme case,
an app with very high request latency that is not concurrent, and for
which loading-request latency is even worse, the only way it can
handle additional load is to have idle instances ready to take the
additional requests. Some apps share properties with this extreme app,
and have asked for more than 3 idle instances accordingly.
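
To put rough, invented numbers on that extreme case:

    // Back-of-the-envelope capacity math; every number here is made up.
    static double instancesNeededForBurst(double requestLatencySec, double burstQps) {
        // Non-concurrent: each instance serves one request at a time.
        double qpsPerInstance = 1.0 / requestLatencySec;
        return burstQps / qpsPerInstance;
    }
    // e.g. instancesNeededForBurst(2.0, 5.0) == 10 instances, and with a
    // ~20s loading request none of them help for the first ~20s of the
    // burst -- unless they were already idle and warm.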

Another example is that of a load test. By using this feature, you get
as many instances turned up as you'd like, without having to wait for
the scheduler to turn them up for you. For apps that run at hundreds
or thousands of instances, having more control over this can be
useful.

Per

Jul 31, 2011, 5:25:44 PM
to Google App Engine
Hi Jon and Ikai,

I've been seeing crazy amounts of warmup requests all week. For
instance, while one request was being served, 5 to 7 instances started
up *simultaneously*, although there was no other load. Now, the *next*
request was almost guaranteed to hit one of the new instances, which
are warmed up, but not 100% hot... and thus takes a long time to
load... which spins up a few *new* instances. The app used to have a
startup time of 5 to 6 seconds, and it used to respond within 400ms to
800ms, with only a few heavy pages being a lot slower. Not a lightning
fast app, but it worked really well before 1.5.2. Now response times
have degraded by a factor of 3 to 5, which puts even normal requests into a
really awkward 2-to-3-second range. This has been plaguing the
application all week, and I had no clue what was going on.

I figured out that I needed to limit the idle VMs to 3, so I'd
at least avoid having 3-7 new idle VMs per request and always hitting
lukewarm instances. That's better, but there is still an awful lot of
spinning up and spinning down going on, and every 3rd or 4th request
or so still goes to a "warm-but-not-hot" VM, taking 3 seconds or so.

I don't really care about the billing at the moment, but it is really
embarrassing me in front of our customers to have such a slow
application all of a sudden. I've been working the weekend on some
general profiling to mitigate, but please make this stop, or give us
more control so we can determine how soon a new instance is created
(in my case, not so soon...)

Does this also change the recommended approach to warmup requests in
general? So far, I had only warmed the most crucial parts of the app,
since the doc states that requests may as well hit a cold instance,
and you wouldn't want to wait for 20s until really the entire app was
piping hot. Instead of doing an extreme 20-second warm-up, the
instances would gradually heat up with each new request coming in.
But now instances go up and down a lot, so each one is barely
lukewarm, and thus slower on average. I'd change the warmup sequence to
20s of heavy exercises if it helped, but that would spin up even more
instances... oh dear. :)


Cheers,
Per

Tim Hoffman

Jul 31, 2011, 8:11:30 PM
to google-a...@googlegroups.com
Hi

I thought I would chime in; I am slowly starting to get my head around what you are suggesting
in terms of scheduler behaviour.

Though you do mention min idle instances (which isn't a tunable parameter, as I understand it).

So I will mention what I am seeing on a low-traffic site.
We are running Always-On instances.

Code hasn't changed in an appreciable way over the last few months. 
(changed analytics to async loading and a couple of cosmetic changes)

Webmaster Tools reports that for all of June my site latency was in the sub-3-second range.
As of the 1.5.2 update it has steadily climbed to 6 secs. It's more than likely because of our low traffic
(about 200 page views a day) that someone would always hit an idle Always-On instance.

All of the dashboard configurations are currently set to default (max idle: auto, min pending latency: auto).

So it would seem, on the face of it, that with the introduction of the new regime (though incomplete) the
default settings are less optimal than pre-1.5.2, and because there are no real statistics presented
in this space, the only way to try to remedy the situation is trial and error, which
isn't really a good option on a live site.

Are you going to try to compile/present some sort of statistics so that we can better understand what is going on?
Like an instance miss rate, % of requests not served by always-on/idle instances, rate of new instance creation, etc.

Thanks

Tim

Francois Masurel

Aug 1, 2011, 2:09:00 AM
to google-a...@googlegroups.com
I've created a production issue:


You might want to star it.

Francois

Tim Hoffman

Aug 1, 2011, 4:26:31 AM
to google-a...@googlegroups.com
I have created a new issue specifically asking for metrics/statistics on the rate of instance startup/shutdown:

http://code.google.com/p/googleappengine/issues/detail?id=5458

If you're interested, star it or even add some more stats/numbers you would like to see.

I think this is going to be important as we move into the new charging scheme.

Rgds

Tim

Jon McAlister

Aug 1, 2011, 2:00:44 PM
to google-a...@googlegroups.com
@Per, thanks for the data, that is very helpful! There are a
couple of things going on here. The first is that it is spinning
up several new dynamic instances at a time. That's definitely a
bug, and I've confirmed why it's happening. We'll fix that for the
next release.

For your warmup requests, I think you should try to do as much
initialization as you can within the 30s time limit. Warmup
requests are a special kind of loading request, in that they are
not end-user facing:
http://code.google.com/appengine/docs/python/config/appconfig.html#Inbound_Services

By having your warmup requests do more work, you should avoid the
warm-but-not-hot instances, which appear to be the cause of the
higher end-user latencies in your case. I should point out, in
case this is not obvious, that when an instance is being warmed up, we
are not sending it any other traffic. We will wait until
/_ah/warmup completes before we send any user requests to it.
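
For the Java runtime, this boils down to a servlet mapped to
/_ah/warmup in web.xml. A minimal sketch, where the two init helpers
are hypothetical stand-ins for whatever setup makes your particular
app hot:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class WarmupServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            AppCaches.primeHotQueries(); // hypothetical: fill memcache/local caches
            Persistence.initFactory();   // hypothetical: build your PMF/EMF up front
            resp.setStatus(HttpServletResponse.SC_OK);
        }
    }

Since no user traffic is sent to the instance until /_ah/warmup
returns, work done here is effectively free from the end user's point
of view, as long as it fits within the time limit.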

@Tim, Min-Idle-Instances is not yet tunable (other than 0 or 3
with Always-On), but will be soon. We're working on it.

@All, thanks for filing the production issues, and providing the
helpful data and feedback. We are continuing to debate the matter
internally and are continuing to look for the best solution here.

