GAE starting unnecessary instances

Francois Masurel

Jul 19, 2011, 6:04:13 AM
to google-a...@googlegroups.com
Hi everybody,

It seems that GAE is starting unnecessary instances when AlwaysOn is activated.

My app is Java / Multi-threaded and doesn't have a lot of traffic.

Still, one to three dynamic instances are started quite regularly (check attachment).

Is anyone seeing the same behavior? It could cost a lot with the new pricing scheme.

Thanx for your help.

Francois
Instances-VinoCities.jpg

Francois Masurel

Jul 19, 2011, 6:32:44 AM
to google-a...@googlegroups.com
It looks like 2 resident instances and 1 dynamic are stalled as they didn't serve any request for the last minute (check attachment).

But still one dynamic instance was started less than 2 minutes ago.

Is this normal?
Instances-VinoCities_2.jpg

Francois Masurel

Jul 19, 2011, 6:34:59 AM
to google-a...@googlegroups.com
Even 3 now, for just a few new requests. Geez.

It's getting really expensive :-(
Instances-VinoCities_3.jpg

Tom Phillips

Jul 19, 2011, 7:28:23 AM
to Google App Engine
Yes, seeing the same problem on all my apps. Java, M/S.

Raise a production issue, Francois?

/Tom

On Jul 19, 6:34 am, Francois Masurel <f.masu...@gmail.com> wrote:
> Even 3 now, for just a few new requests. Geez.
>
> It's getting really expensive :-(
>
>  Instances-VinoCities_3.jpg

Tom Phillips

Jul 19, 2011, 7:39:50 AM
to Google App Engine
http://code.google.com/status/appengine shows they are currently
investigating issue(s) with Java.

tempy

Jul 19, 2011, 4:57:33 PM
to Google App Engine
I'm still seeing this too...

On Jul 19, 1:39 pm, Tom Phillips <tphill0...@gmail.com> wrote:
> http://code.google.com/status/appengine shows they are investigating

Francois Masurel

Jul 19, 2011, 5:31:33 PM
to google-a...@googlegroups.com
Resident instances don't seem to be used at all (check attachment).

It's probably a bug on the GAE side.

Any Googlers around?
Instances-VinoCities.jpg

Robert Kluin

Jul 19, 2011, 5:41:15 PM
to google-a...@googlegroups.com
I've been seeing funky stuff with instances the past few days. It
seems that instances are being killed off very aggressively. In my
active applications' dashboards, I see instances getting killed then
immediately spun back up in big batches. I'm seeing this type of
behavior across around 7 Python applications, of various traffic
levels both MS and HR datastore.

Prior to today, it seemed they were not getting spun up as readily
either. Today I'm seeing more instances online again (which is good,
the app is much more responsive).


Robert

> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/INY70Yliu5YJ.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>

Screen shot 2011-07-19 at 17.38.18.png
Screen shot 2011-07-19 at 17.39.05.png

Galoch

Jul 19, 2011, 6:18:20 PM
to Google App Engine
Same here. Seems like GAE is totally ignoring Always On instances.
I also noticed that even with no user hitting our app and a single
cron job that runs every 5 minutes it is still spinning instances
every 3 minutes and then killing them in 2 minutes.

This has been happening since the upgrade on July 14th. During
peak load this gets really nasty and drags performance down.

This is the feedback I got yesterday from one of our customers since
it takes time to spin an instance (and yes we use Spring):

"1) I found the GUI to be very laggy"

Can someone from Google please respond?

Johan Euphrosine

Jul 21, 2011, 8:56:42 PM
to google-a...@googlegroups.com
After speaking with Engs, I think I can explain what is going on:

Here are the current scheduling rules ("A > B" reads as "A has priority
over B for handling the incoming request"):

1/ Idle Always On instance > Spawning a new Dynamic instance
2/ Spawning a new Dynamic instance > Busy Always On instance
3/ Idle Dynamic instance > Busy Always On instance
4/ Idle Dynamic instance > Idle Always On instance

I will give you an example to illustrate the behavior you all noticed,
that is, a Dynamic instance handling requests while an Always On instance is idle.

(Always On instance started)
- Incoming request
- Always On instance handles the request
- Another incoming request
(Always On instance busy)
- A new Dynamic instance is spawned
(Dynamic instance idle, Always On instance busy)
- Dynamic instance handles the request
- Another incoming request
(Dynamic instance idle, Always On instance idle)
- Dynamic instance handles the request
- No request for more than idle-dynamic-instance-timeout
- Dynamic instance shuts down
- Another incoming request
(Always On instance idle)
- Always On instance handles the request

Hope this makes things clearer.
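
For anyone who wants to play with the ordering, here is a minimal Python sketch of the four rules as one priority list (idle Dynamic, then idle Always On, then spawn a new Dynamic, then queue on a busy Always On). The `Instance` class, the `pick_instance` function and the instance names are made up for illustration; this is not the scheduler's actual code, only a toy model of the rules as stated above.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    kind: str            # "always_on" or "dynamic" (hypothetical labels)
    busy: bool = False


def pick_instance(instances):
    """Pick the instance for the next request, spawning a Dynamic one if needed."""
    # Rules 3/ and 4/: an idle Dynamic instance beats any Always On instance.
    for inst in instances:
        if inst.kind == "dynamic" and not inst.busy:
            return inst
    # Rule 1/: an idle Always On instance beats spawning a new Dynamic instance.
    for inst in instances:
        if inst.kind == "always_on" and not inst.busy:
            return inst
    # Rule 2/: spawning a new Dynamic instance beats a busy Always On instance.
    count = sum(1 for inst in instances if inst.kind == "dynamic")
    new = Instance(f"dynamic-{count + 1}", "dynamic")
    instances.append(new)
    return new


# Replay of the example trace above.
pool = [Instance("always-on-1", "always_on")]
first = pick_instance(pool)    # the Always On instance takes the first request
first.busy = True
second = pick_instance(pool)   # Always On is busy, so a Dynamic instance is spawned
first.busy = False             # now both instances are idle
third = pick_instance(pool)    # rule 4/: the idle Dynamic instance wins
pool.remove(second)            # idle timeout: the Dynamic instance shuts down
fourth = pick_instance(pool)   # only the Always On instance is left
print([i.name for i in (first, second, third, fourth)])
```

The third pick is the one people are noticing in their dashboards: both instances are idle, and the Dynamic one still gets the request.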

As part of the new billing model you will have a scheduler knob called
'max-idle-instances' that you can use if extra idling dynamic
instances are undesired.

The good news is that we are open to suggestions. If you think this
behavior is the wrong default, feel free to comment on this thread and
I will forward your suggestions to the Engineering team.

--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations

Galoch

Jul 22, 2011, 3:09:45 AM
to Google App Engine
@Johan,
The issue is not about the Always On instances being busy. It's actually the
other way around: the Always On instances are never busy, at least that
is what we have observed in the last 3-4 days. Your explanation may be partly
true since this behavior keeps changing.

For example, I have a snapshot of instances from July 19th and here are the
details (for some reason I can't see a link to attach the snapshot
images here):
Resident Instance 1: Requests: 49 Age: 1Hr
Resident Instance 2: Requests: 6 Age: 1Hr
Resident Instance 3: Requests: 2 Age: 1Hr
Dynamic Instance 1: Requests: 7 Age: 2min
Dynamic Instance 2: Requests: 291 Age: 1Hr
Dynamic Instance 3: Requests: 322 Age: 1Hr

This is under "no load", with only very lightweight cron jobs running.
It gets much, much worse during the day under peak load, with request counts
for dynamic instances reaching 1000+ in a matter of minutes while resident
instances have served only "1" request.

As you see above Resident Instance 2 and 3 are hardly hit so I don't
think they are busy at all. On the other hand, Dynamic Instance 2 and
3 get most of the hits.

Dynamic Instance 1 is what is killing us. It keeps getting killed and
reborn within that 5 minute window!!

We use Spring framework and it is really very expensive for us when a
new instance starts up.

Just to give you a background, we had gone through a real roller
coaster ride to make this to work on GAE by breaking the loading of
framework into many different chunks. But still spinning was out of
control. Then Java threads came to our rescue. We worked through
the hack to load JDO to avoid UnsupportedOperationException. We
finally got it to work where most of our requests were served by
Always On instances with occasional spinning of Dynamic instances. It
was quite impressive.

Unfortunately, this was short-lived, as we then hit this new behavior in
GAE. The very last thing we want GAE to do is create a new instance
every few minutes, as a request could easily hit the 30-second deadline during
the day and throw a critical error.

I am not sure when the new billing will come into effect but we really
need this thing fixed as it literally brings down our app to a
grinding halt. So I am open to any suggestions you guys think can help
us.

Another thought about the new scheduler is to have a configurable
schedule. For example, our users are mostly business users who work during
normal business hours. We want to be able to spin up more Always On
instances during those hours and bring the number down during nights
and weekends. Dynamic instances won't work for us for the reasons
explained above.


Thanks,
galoch

Francois Masurel

Jul 22, 2011, 3:37:10 AM
to google-a...@googlegroups.com
My situation is very close to Galoch's, but a bit different.

My Dynamic instances don't seem to be recycled, but my Resident instances still don't seem to be used at all (see attachment).

Why?

Francois
Instances-VinoCities.jpg

Johan Euphrosine

Jul 22, 2011, 10:57:05 AM
to google-a...@googlegroups.com
Hi Galoch,

Thanks for the follow-up.

I think you are experiencing a combination of the two following rules
I was pointing to in my previous email:


("A > B" reads as "A has priority over B for handling the incoming request")

2/ Spawning a new Dynamic instance > Busy Always On instance

4/ Idle Dynamic instance > Idle Always On instance

Applied to your example, it could mean that:


Resident Instance 1: Requests: 49 Age: 1Hr
Resident Instance 2: Requests: 6 Age: 1Hr
Resident Instance 3: Requests: 2 Age: 1Hr
Dynamic Instance 1: Requests: 7 Age: 2min
Dynamic Instance 2: Requests: 291 Age: 1Hr
Dynamic Instance 3: Requests: 322 Age: 1Hr

- 1 hour ago, all your Always On instances were busy and you had a
burst of incoming requests, so the scheduler spawned new Dynamic
instances as per rule 2/ highlighted above.
- After the burst, and back to normal traffic, the new Dynamic instances
were handling incoming requests in priority as per rule 4/ highlighted
above.
- 2 minutes ago, all your instances (Always On + Dynamic) were busy again
and the scheduler spawned a new Dynamic instance that handled 7
incoming requests.

Hope that makes more sense for you and Francois, but as I said earlier
we are open to suggestions and I will make sure someone working on the
scheduler team monitors this thread for your input.

Tom Phillips

Jul 22, 2011, 11:33:27 AM
to Google App Engine
When are threaded instances considered "busy"?

If it is while they are serving only one request, it would explain
why Always On instances for threaded Java are now severely
under-utilized.

/Tom

Rob Coops

Jul 22, 2011, 11:37:11 AM
to google-a...@googlegroups.com
1/ Idle Always On instance > Spawning a new Dynamic instance
2/ Spawning a new Dynamic instance > Busy Always On instance
3/ Idle Dynamic instance > Busy Always On instance
4/ Idle Dynamic instance > Idle Always On instance

So App Engine prefers to use bored Always On instances over spawning new Dynamic ones; that's good. If the Always On instances are busy, it spawns a new Dynamic instance; good. If a Dynamic instance is bored but the Always On ones are busy, the Dynamic instance gets the load; still good.

But then you lose me: if the Always On instance is idle, the Dynamic instance still gets the load. Why?

In this last case I would expect the Always On instance to get the load; otherwise the Dynamic instance will keep being busy and will never get stopped.

I don't know what the costs are of spawning and later destroying a Dynamic instance, but I cannot imagine they are so huge that you would have to prefer using the Dynamic instances over the Always On ones.

I believe that this last rule should read:
4/ Idle Always On instance > Idle Dynamic instance

In which case the idle Always On instances would get the load and the bored Dynamic instances would get cleaned up much faster than they are now.
I suspect this is a typo, though, as I cannot imagine that this is really the setup; but if it is, I would say it is a candidate for change. :-)

The other thing I would suggest is altering the load balancing rules for Dynamic instances. From the picture painted in this email, it looks like the load balancing across multiple Dynamic instances is pretty much round robin (or equal-load based). If this were changed to always try to use one Dynamic instance until its load reaches 80% or so before using the second one, and so on, it would allow excess Dynamic instances to be despawned much sooner than with the current setup. This does mean a slightly bigger hit in case of a serious failure of the currently preferred Dynamic instance. Hence the 80% mark for the load, which I chose arbitrarily by picking a number above 50; it might need some more scientific work to ensure that, in an average scenario, the remaining instances will usually be able to take the load caused by the sudden death of a Dynamic instance.
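
To make the difference concrete, here is a rough Python sketch comparing the two strategies. Everything in it is hypothetical: load is modelled as an integer percentage, each request is assumed to add 10 points and never finish, and `pick_fill_first`, `pick_least_loaded` and `route` are names invented for this example. It only illustrates the idea, not how App Engine actually balances load.

```python
def pick_least_loaded(loads):
    """Equal-load balancing: send the request to the least loaded instance."""
    return min(range(len(loads)), key=loads.__getitem__)


def pick_fill_first(loads, threshold=80):
    """Fill one instance up to the threshold before touching the next one."""
    for index, load in enumerate(loads):
        if load < threshold:
            return index
    # Every instance is at or above the threshold: fall back to least loaded.
    return pick_least_loaded(loads)


def route(requests, picker, instances=3, cost=10):
    """Route requests one by one and return the resulting load per instance."""
    loads = [0] * instances
    for _ in range(requests):
        loads[picker(loads)] += cost
    return loads


print(route(6, pick_least_loaded))  # [20, 20, 20]: every instance stays in use
print(route(6, pick_fill_first))    # [60, 0, 0]: two instances stay idle
```

With the fill-first strategy the two untouched instances are the ones that could be despawned early, at the price of a bigger hit if the preferred instance dies.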

From what I currently see, it looks like the safest option has been chosen, meaning that in all cases the service will remain active no matter what happens, but this means a significant cost on the customer side. I suspect that many customers are happy with that, but I think that an equal number of them will want to see a situation where the costs are less likely to spike while providing a similar, albeit slightly less highly available, solution.
It might be an interesting idea to offer several flavors of high availability, ranging from the current super safe but relatively unpredictable cost to a pretty decent one with very predictable costs. I have no idea if this is technically possible, but something tells me that it should not be that hard to do. And even if it is a little harder to do, my guess is that Google would be up to that task.

Well that's my two cents...

Regards,

Rob




Luca

Jul 22, 2011, 1:09:38 PM
to google-a...@googlegroups.com
Johan, 

it seems to me the problem is rule 4/ below. 
Shouldn't the rule be: 

Idle Always On instance > Idle Dynamic instance

instead? 
In this way, if you have a dynamic instance on, it will not handle traffic unless the always-on ones are busy.  So when the traffic decreases, and the dynamic instances are no longer needed, they will be correctly shut off. 

I also think an auxiliary problem is with rule 2/.
If you spawn a new instance just because the others are all busy, you may tend to have too many instances.
Whether to spawn new instances should be configurable, depending on how many requests are already queued for the instances you already have active: 0, 1, more, etc.

More generally, if you allowed users to specify how much it costs them to delay serving a request, it would not be difficult to synthesize for each app an optimal decision policy to decide whether to switch another instance on or off. This can be done using tools from dynamic optimization / control theory. I would be glad to help if the people there need guidance on this (I used to be in the research group there till a month ago).
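
As a very rough illustration of the idea, here is a one-step cost comparison in Python. It is much simpler than a real dynamic optimization policy, and every name and number in it is hypothetical (`should_spawn`, the delay cost per request-second, the instance price, the minimum billed time); none of them come from App Engine.

```python
def should_spawn(queued_requests, expected_wait_s, delay_cost_per_request_s,
                 instance_cost_per_hour, min_billed_hours=0.25):
    """Spawn only if making the queued requests wait costs more than a new instance.

    All parameters are user-supplied estimates; min_billed_hours is an assumed
    minimum billing increment, not a documented value.
    """
    delay_cost = queued_requests * expected_wait_s * delay_cost_per_request_s
    spawn_cost = instance_cost_per_hour * min_billed_hours
    return delay_cost > spawn_cost


# Nothing queued: never worth spawning.
print(should_spawn(0, 2.0, 0.01, 0.08))
# Five requests each waiting about two seconds: the delay outweighs the instance.
print(should_spawn(5, 2.0, 0.01, 0.08))
# One request waiting half a second: cheaper to let it wait.
print(should_spawn(1, 0.5, 0.01, 0.08))
```

A real policy would also look ahead (expected future traffic, startup latency, shutdown decisions), which is where the control theory tools come in.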

Luca

Galoch

Jul 22, 2011, 2:57:34 PM
to Google App Engine
Hi Johan,

Thanks for the explanation. I have a couple of questions on that.

1. "1 hour ago, all your Always On instances were busy and you
had a burst of incoming requests"
This may be true when my Always On instances were "busy" running
some stuff, but what about when 2 Always On instances show only "1"
request served, which is the warmup request itself? Does this mean
warmup requests are considered traffic? If that is the case, then
Always On instances seem rather useless, since they will never get
called in this scenario.


2. As Tom mentioned, what qualifies as "busy"? When the threadsafe option was
implemented in GAE these 3 Always On instances were able to do most of
the heavy lifting with occasional spinning of dynamic instances.
Nothing has changed on our side that should alter this behavior. With
all these changes happening within GAE I am trying to figure out what
changed and what we can do to contain this burst of traffic within 3
(or more ) Always On instances with less frequent spinning of Dynamic
instances.


3. "- 2 minutes ago, all your instances (Always On + Dynamic) were busy
again and the scheduler spawned a new Dynamic instance that handled 7
incoming requests."
Again, what constitutes "busy"? I do not see any request being served
by Always On instances 2 and 3 in the last hour. Note that the number of
requests served by Always On 2/3 is unchanged since they were
created ...
Here's my reading in this scenario:
a. It kills Dynamic Instance 1 within 2 minutes of serving a request
b. When traffic comes in it looks only for Dynamic Instances if they
are busy and completely ignores Always On instances at this point
c. It recreates Dynamic Instance 1

In other words, what rule is applied in this case?

Also I fail to understand rule 4 as both Rob and Luca mentioned. That
completely undermines having Always On instances under threadsafe
mode.

4. I like Rob's suggestion of better load balancing techniques but
again with a caveat that an instance needs to be able to serve
multiple threads before reaching a set capacity (80% or so)

5. Luca's suggestion also makes sense but again with the same
caveat ... it should be able to process multiple threads before
queuing

6. I looked at the new sliders in the Admin console, and with those the
situation is even worse. I set Max Idle Instances to 3 (that's the
minimum I could choose) and Min Pending Latency to 15 secs ... Guess
what: our CPU usage has gone up to 15 in 12 hrs because of the constant
creation and killing of 3 dynamic instances, with bare minimum traffic and
a few lightweight crons.
But the good side is that now I see requests coming in on the 3 Always On
instances. Is that enough load they are serving? I don't know yet,
but it's something to observe.


Two things I suggest would be really helpful for us:
A. The overall key here is to know the thread handling capacity of an
instance. Better yet if it can be configured similar to Backends but
dynamic in nature (and of course Backends pricing is outrageous ...
but that's another topic)
B. The ability to add more Always On instances, but again with the dependency
explained in point A.

Hope it makes sense.

Thanks,
galoch

Tom Phillips

Jul 22, 2011, 3:46:55 PM
to Google App Engine
The current behavior makes me suspect we are being prepared for Always-on
being replaced completely by the new scheduler knobs. Being able to
turn up the number of idle instances does make always-on somewhat
redundant, as long as the idle instances stick around for a while.

Also, if always-on instances WERE being properly utilized right now,
it would require artificial load to ascertain the effects of the new
scheduler on dynamic instances (assuming the three AO instances were
sufficient for your app previously). Many devs wouldn't learn about
the new scheduler attributes until after their app (suddenly) becomes
popular - not the best time for surprises. And Google wouldn't get as
much feedback on the features/behavior of the new scheduler.

Is Always-on going to be kept under the new model?

/Tom

Robert Kluin

Jul 22, 2011, 10:11:07 PM
to google-a...@googlegroups.com
I've suspected the same thing. So far we've seen no pricing info for
always on, and questions about it have gone unanswered. Not to mention
that $2.10 / week for three always-on instances is a lot different
than $40 / week.
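
For reference, here is one way those two figures could be arrived at, as a small Python calculation. Both rates are assumptions on my part and are not stated anywhere in this thread: a flat Always On fee of $0.30 per day, and $0.08 per instance-hour under the new model. I picked them because they reproduce the numbers above; amounts are kept in cents to avoid rounding noise.

```python
# Both rates are assumptions, not figures quoted in this thread.
ALWAYS_ON_CENTS_PER_DAY = 30    # assumed flat Always On fee
INSTANCE_CENTS_PER_HOUR = 8     # assumed per-instance-hour rate


def always_on_cents_per_week():
    """Weekly cost of the flat Always On fee, in cents."""
    return ALWAYS_ON_CENTS_PER_DAY * 7


def instances_cents_per_week(instance_count):
    """Weekly cost of keeping instance_count instances up 24/7, in cents."""
    return instance_count * 24 * 7 * INSTANCE_CENTS_PER_HOUR


print(always_on_cents_per_week() / 100)   # 2.1
print(instances_cents_per_week(3) / 100)  # 40.32
```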


Robert

Francois Masurel

Jul 23, 2011, 5:01:47 AM
to google-a...@googlegroups.com
Geez, now even some Dynamic instances are not being used either and stay idle for hours (check attachment).

With the new pricing model this will get really, really expensive.

I do hope something is going wrong here.

Francois
VinoCities-Instances_2.jpg

Johan Euphrosine

Jul 25, 2011, 9:12:22 AM
to google-a...@googlegroups.com
On Fri, Jul 22, 2011 at 8:57 PM, Galoch <galo...@gmail.com> wrote:
> Hi Johan,
>
> Thanks for the explanation. I have couple of questions on that.

Thanks for showing interest in GAE internals, I'd be happy to answer
those questions directly if I can, or forward them to someone who can
answer them better.

> 1. "1 Hours ago while all your Always On instance were busy and you
> had a burst of incoming requests"
> While this may be true when my Always On instances were "busy" running
> some stuff but what about when 2 Always On instances show only "1"
> request served which is the Warmup request itself. Does this mean
> Warmup requests are considered as traffic? If that is the case then
> Always On instances seem rather useless since they will never ever get
> called in this scenario.

On the admin console capture you included in your previous mail, I
didn't see Always On instances showing only "1" request served but
rather:


Resident Instance 1: Requests: 49 Age: 1Hr
Resident Instance 2: Requests: 6 Age: 1Hr
Resident Instance 3: Requests: 2 Age: 1Hr

Let me know if I missed something.

> 2. As Tom mentioned, what qualifies "busy". When threadsafe option was
> implemented in GAE these 3 Always On instances were able to do most of
> the heavy lifting with occasional spinning of dynamic instances.
> Nothing has changed on our side that should alter this behavior. With
> all these changes happening within GAE I am trying to figure out what
> changed and what we can do to contain this burst of traffic within 3
> (or more ) Always On instances with less frequent spinning of Dynamic
> instances.

There are two scheduler knobs that could help you influence the way
Dynamic instances are spawned:
"Minimum Pending Latency" and "Max Idle Instances", as described here:
http://code.google.com/appengine/docs/adminconsole/performancesettings.html

> 3. "- 2 Minutes ago all your instances Always On + Dynamic were busy
> again and the scheduler spawned a new Dynamic instance that handle 7
> incoming requests. "
> Again what constitutes "busy" as I do not see any request being served
> by Always On instances 2 and 3 in last 1 hour. Note that number of
> requests served by Always On 2/3 are unchanged since they were
> created ...
> Here's my reading in this scenario:
> a. It kills Dynamic Instance 1 within 2 minutes of serving a request
> b. When traffic comes in it looks only for Dynamic Instances if they
> are busy and completely ignores Always On instances at this point
> c. It recreates Dynamic Instance 1
>
> In other words, what rule is applied in this case?

Sorry, that was mostly speculation on my part; I didn't know that the
number of requests served by Always On 2/3 was unchanged, given the
information you provided.
I can investigate the specific behaviour of your application more
deeply if you open a Production Issue with your application ID.

> Also I fail to understand rule 4 as both Rob and Luca mentioned. That
> completely undermines having Always On instances under threadsafe
> mode.
>
> 4. I like Rob's suggestion of better load balancing techniques but
> again with a caveat that an instance needs to be able to serve
> multiple threads before reaching a set capacity (80% or so)
>
> 5. Luca's suggestion also makes sense but again with the same
> caveat ... it should be able to process multiple threads before
> queuing

Thanks a lot for your feedback, I will make sure to forward those
suggestions to the engineering team.

>
> 6. I looked at the new sliders in the Admin console and with those the
> situation is even worse. I set the Max Idle Instances to 3 (that's the
> minimum I could choose) and Min Pending Latency to 15 secs ... Guess
> what our CPU usage has gone up to 15 in 12 hrs because of constant
> creation and killing of 3 dynamic instances. Bare minimum traffic and
> few light weight crons.
> But the good side is now I see requests coming in on the 3 Always On
> instances. Is that enough load they are serving ... I don't know yet
> but something to observe.

Maybe you can open a feature request for a smaller minimum for 'Max
Idle Instances' when Always On is activated, or for having Always On
instances count toward Max Idle Instances.

> Two things I suggest would be really helpful for us:
> A. The overall key here is to know the thread handling capacity of an
> instance. Better yet if it can be configured similar to Backends but
> dynamic in nature (and of course Backends pricing is outrageous ...
> but that's another topic)

Are you looking for <max-concurrent-requests> support for Servlets? If
so, I would recommend opening a feature request.

> B. Able to add more Always On instances but again with a dependency
> explained in point A.

Again, opening a feature request makes sense, to track this separately.

Johan Euphrosine

Jul 25, 2011, 9:14:10 AM
to google-a...@googlegroups.com
Hi,

I will ask the engineering team and get back to you in this thread.

Hope that helps.

Johan Euphrosine

Jul 25, 2011, 9:29:36 AM
to google-a...@googlegroups.com
On Fri, Jul 22, 2011 at 7:09 PM, Luca <luca.de...@gmail.com> wrote:
> Johan,
> it seems to me the problem is rule 4/ below.
> Shouldn't the rule be:
> Idle Always On instance > Idle Dynamic instance
> instead?
> In this way, if you have a dynamic instance on, it will not handle traffic
> unless the always-on ones are busy.  So when the traffic decreases, and the
> dynamic instances are no longer needed, they will be correctly shut off.

I confirm the current rule is Idle Dynamic instance > Idle Always On instance.
While I agree your proposal makes sense in some cases, it also sounds
good to take advantage of the Dynamic instances that have just been
spawned, especially if the startup cost is high.

As Gregd highlighted in his pricing FAQ, Always On will become a
setting to control the number of idle instances you would like to
have running.

Q: How will Always On work under the new model?
A: When App Engine leaves preview all Paid Apps and Apps in Premier
Accounts will be able to set the number of idle instances they would
like to have running. Always On was designed to allow an app to
always have idle instances running to save on instance start-up
latency. For many Apps a single idle instance should be enough
(especially when using concurrent requests). This means that for many
customers, setting an App to be paid will mean a $9/month minimum
spend, you can then use the 24 free IH/day to keep an instance running
all the time by setting Min Idle Instances to be 1.
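
As a quick sanity check on the FAQ answer, here is a small Python calculation of the billable instance-hours when instances are kept running all day. It assumes the 24 free IH/day are simply subtracted from the total instance-hours used, which is my reading of the FAQ rather than a documented billing formula; the function name is made up for this example.

```python
FREE_INSTANCE_HOURS_PER_DAY = 24  # "24 free IH/day" from the FAQ answer above


def billable_instance_hours_per_day(always_running_instances):
    """Instance-hours per day left to pay for after the free quota (assumed model)."""
    used = always_running_instances * 24
    return max(0, used - FREE_INSTANCE_HOURS_PER_DAY)


print(billable_instance_hours_per_day(1))  # 0: one idle instance fits in the free quota
print(billable_instance_hours_per_day(3))  # 48: three instances leave 48 IH/day to pay
```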

> I also think an auxiliary problem is with rule 2/
> If you spawn a new instance just because the others are all busy, you may
> tend to have too many instances.
> The fact of spawning new instances should be configurable, depending on how
> many requests are already queued for the instances you have already active
> -- 0, 1, more, etc.
> More in general, if you allowed users to specify how much it costs to them
> to delay serving a request, it would not be difficult to synthesize for each
> app an optimal decision policy to decide whether to switch another instance
> on or off.  This can be done using tools from dynamic optimization / control
> theory.  I would be glad to help if the people there need guidance on this
> (I used to be in the research group there till a month ago).

It sounds to me that the scheduler knobs are there precisely to address this:
http://code.google.com/appengine/docs/adminconsole/performancesettings.html

Let me know if I overlooked it.

> Luca
> On Thursday, July 21, 2011 5:56:42 PM UTC-7, Johan Euphrosine (Google)
> wrote:
>>
>> After speaking with Engs, I think I can explain what is going on:
>>
>> Here are the current scheduling rules: (> reads as has priority for
>> handling the incoming request)
>>
>> 1/ Idle Always On instance > Spawning a new Dynamic instance
>> 2/ Spawning a new Dynamic instance > Busy Always On instance
>> 3/ Idle Dynamic instance > Busy Always On instance
>> 4/ Idle Dynamic instance > Idle Always On instance
>

Johan Euphrosine

Jul 25, 2011, 9:31:25 AM
to google-a...@googlegroups.com
I think Gregd covered this in his pricing FAQ:

Q: How will Always On work under the new model?
A: When App Engine leaves preview all Paid Apps and Apps in Premier
Accounts will be able to set the number of idle instances they would
like to have running. Always On was designed to allow an app to
always have idle instances running to save on instance start-up
latency. For many Apps a single idle instance should be enough
(especially when using concurrent requests). This means that for many
customers, setting an App to be paid will mean a $9/month minimum
spend, you can then use the 24 free IH/day to keep an instance running
all the time by setting Min Idle Instances to be 1.

Let me know if you need more information.

Johan Euphrosine

Jul 25, 2011, 9:33:02 AM
to google-a...@googlegroups.com
What is your setting for 'Max Idle Instances' ?

Feel free to open a Production issue with your app ID if you want me to
track this specifically.

Francois Masurel

Jul 25, 2011, 9:42:42 AM
to google-a...@googlegroups.com
Hi Johan,

Thanx for your answer.

Setting MaxIdleInstances to 3 did force the resident instances to be used instead of the dynamic ones.

As we will soon pay for active instances, I would have liked to have the 3 resident ones being used instead of extra dynamic ones being started.

Thanx again for your help.

Francois

Francois Masurel

Jul 25, 2011, 10:58:59 AM
to google-a...@googlegroups.com
Something strange:

GAE keeps starting new dynamic instances though I have set the Min Pending Latency to 15s via the Application Settings page.

My requests are served in a few ms on average and I have multithreading enabled.

These instances are immediately destroyed as I have set the Max Idle Instances to 3 (corresponding to my always on instances).

But there are still lots of warmup requests in my logs, showing that these dynamic instances start every few minutes.

Will I have to pay for all these short-lived dynamic instances?

My app ID is : vncts1

Thanx for your help.

Francois

Francois Masurel

Jul 25, 2011, 11:18:09 AM
to google-a...@googlegroups.com
Screenshots showing instances and warmup requests (cf. attachments).

Latency is only a few ms, but new instances are still started very frequently.
vncts1_warmups.jpg
vncts1_instances.jpg

Johan Euphrosine

Jul 25, 2011, 12:18:36 PM
to google-a...@googlegroups.com
Hi Francois,

I think the help text of the 'Idle Instances' setting is pretty
self-explanatory:
"""
You will not be charged for instances over the specified maximum.
"""

I can investigate why these instances are created even though you
set up a high Min Pending Latency and a Max Idle Instances value
corresponding to the number of Always On instances.

Feel free to open a Production issue with your application ID if this
is affecting your operations.

Thanks in advance.


Galoch

Jul 25, 2011, 3:25:53 PM
to Google App Engine

Hi Johan,

Thanks for following up on this discussion.

I have new findings to share with you, since GAE now works a bit
differently after all these recent changes. Before I do that, I will
share the changes I made to our app, which have helped us mitigate some
performance issues.

1. We moved all our cron jobs to the Backends with 1 B1 RI. Amazingly,
it handles all the load single-handedly, compared to 6 instances
(3 RI + 3 DI) in the regular app. We have our eyebrows raised! But we
are playing it safe by setting <max-concurrent-requests> to 1. We will
gradually raise the limit until it starts complaining.

2. We set Max Idle Instances to 6 and Min Pending Latency to 15s. I
recommend keeping Max Idle Instances at 4 or more, because if you
keep it at 3 GAE kills and creates DI "very" aggressively ... in some
cases 10-20 seconds.

3. In our regular app we are running dummy cron jobs to keep 3 DI up
at all times. Again, amazingly, with many concurrent users they are
able to handle ALL the load. By contrast, the RI are almost untouched.
Here are the latest statistics:

QPS*   Latency*  Requests  Errors  Age       Memory        Availability
0.000  0.0 ms    25        0       12:43:22  82.3 MBytes   Resident
0.000  0.0 ms    10        0       12:44:45  80.6 MBytes   Resident
0.000  0.0 ms    5         0       12:44:46  81.9 MBytes   Resident
0.350  19.1 ms   11762     0       9:45:11   118.5 MBytes  Dynamic
0.317  17.5 ms   3890      0       3:11:04   97.7 MBytes   Dynamic
0.333  67.5 ms   15604     0       12:43:41  104.8 MBytes  Dynamic

But this is just a sample set of users. I don't know yet if we can
scale on this principle, but it seems to work under light load. This
is definitely a short-term fix, since it is going to be expensive under
the new billing.
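
To put a number on how lopsided the split is, here is a small sketch that sums the Requests column of the table above per instance type. The request counts are copied from the table; the function and variable names are only illustrative:

```python
# Request counts copied from the instance table above.
resident_requests = [25, 10, 5]
dynamic_requests = [11762, 3890, 15604]


def traffic_share(resident, dynamic):
    """Return (resident_share, dynamic_share) as fractions of all requests."""
    total = sum(resident) + sum(dynamic)
    return sum(resident) / total, sum(dynamic) / total


resident_share, dynamic_share = traffic_share(resident_requests, dynamic_requests)
print(f"resident: {resident_share:.2%}, dynamic: {dynamic_share:.2%}")
```

With these counts the three resident instances served 40 of 31,296 requests, roughly 0.13% of the traffic.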


After spending a fair amount of time collecting these statistics,
I can boil things down to a few features that would help us
tune our app for GAE:

A. Know the current usage of an instance (RI or DI) as a CPU
utilization percentage. This would help us correlate it with the CPU
utilization shown for each request in the logs and help us tune our
queries. Right now there is no way to determine what effect the tuning
had on an individual instance.

B. Know the number of threads currently running within an instance
(maybe over a span of 15 seconds or so). This would help us identify
the ratio of thread handling capacity to CPU consumed (point A). A
higher number means better throughput; a lower number means we need to
examine our requests closely.

C. Be able to override the 30 second deadline for warm up requests so
that we have enough time to load our frameworks.

D. Be able to specify how long an instance can live before being
reclaimed by GAE, similar to session timeouts. This would help avoid
unnecessary warm up requests and improve overall performance.

E. Be able to specify rules within the GAE console that adjust
Max Idle Instances at different times of the day and week.

F. Be able to specify the size of DI (similar to Backends B1,
B2 and so on) instead of the standard DI. This should be configurable
along with each rule mentioned in point E.


I understand that C, D, E and F may involve some additional costs to us,
but these short- and long-term enhancements would hugely help us run our
business applications on GAE. The idea is to give us more time and
capacity with the DI while keeping control of the cost.

Let me know if I missed something that is already there or if I am going
off on a tangent here.

Johan Euphrosine

Jul 26, 2011, 4:09:45 PM
to google-a...@googlegroups.com
On Mon, Jul 25, 2011 at 9:25 PM, Galoch <galo...@gmail.com> wrote:
>
> Hi Johan,
>
> Thanks for following up on this discussion.
>
> I have new findings to share with you since GAE now works a bit
> differently with all these recent changes. Before I do that I will
> share the changes I made to our app which has helped us mitigate some
> performance issues.
>
> 1. We moved all our cron jobs to the Backends with 1 B1 RI. Amazingly
> it is able to handle all the load single handed compared to 6
> instances (3 RI + 3 DI) in the regular app. We have our eye brows
> raised! But we are playing it with caution by setting <max-concurrent-
> requests> to 1. Will gradually raise the limits until it starts
> complaining.
>
> 2. We set Max Idle Instances to 6 and Min Pending Latency to 15s. I
> recommend keeping Max Idle Instances to at least 4. Because if you
> keep it at 3 GAE kills and creates DI "very" aggressively ... in some
> cases 10-20 seconds.

Thanks for investigating those behaviors, feel free to continue to
share your findings with the community.

> 3. In our regular app we are running dummy cron jobs to keep 3 DI up
> at all times. Again, amazingly with many concurrent users they are
> able to handle ALL the load. On the contrary RI are almost untouched.
> Here's the latest statistics:
>
> QPS*   Latency*  Requests  Errors  Age       Memory        Availability
> 0.000  0.0 ms    25        0       12:43:22  82.3 MBytes   Resident
> 0.000  0.0 ms    10        0       12:44:45  80.6 MBytes   Resident
> 0.000  0.0 ms    5         0       12:44:46  81.9 MBytes   Resident
> 0.350  19.1 ms   11762     0       9:45:11   118.5 MBytes  Dynamic
> 0.317  17.5 ms   3890      0       3:11:04   97.7 MBytes   Dynamic
> 0.333  67.5 ms   15604     0       12:43:41  104.8 MBytes  Dynamic
>
> But this is just a sample set of users. I don't know yet if we can
> scale on this principle but it seems to work on light load. But this
> is definitely short term fix since this is going to be expensive under
> new billing.

I believe those stats illustrate the rule we discussed before:
Idle DI > Idle RI.

Keep in mind that under the new pricing model Min Idle Instances
should supersede Always On instances, as described by gregd in the
billing FAQ.

So hopefully you should not have idle DIs lingering around,
and you will be able to control this with Max Idle Instances anyway.

I think your suggestions would be a great addition to the
admin console, giving users more visibility and control over
their instances and thus over billing.

You should definitely file separate feature requests on the public
issue tracker for them.

In the meantime, I will make sure the engineering team is aware of
your suggestions.

> Let me know if I missed something that is already there or I am going
> off tangent here.

You're completely on the topic of this discussion :)

Johan Euphrosine

Jul 26, 2011, 4:24:34 PM
to google-a...@googlegroups.com
On Fri, Jul 22, 2011 at 5:37 PM, Rob Coops <rco...@gmail.com> wrote:
> 1/ Idle Always On instance > Spawning a new Dynamic instance
> 2/ Spawning a new Dynamic instance > Busy Always On instance
> 3/ Idle Dynamic instance > Busy Always On instance
> 4/ Idle Dynamic instance > Idle Always On instance
> So App Engine prefers to use bored Always On instances over spawning new
> dynamic ones, that's good. If the Always On instances are busy it spawns a
> new Dynamic instance, good. If a Dynamic instance is bored but the Always On
> ones are busy the Dynamic instance gets the load, still good.
>
> But then you lose me: if the Always On instance is idle the Dynamic
> instance still gets the load, why?
> In this last case I would expect the Always On instance to get the load,
> otherwise the Dynamic instance will keep on being busy and will not get
> stopped because of it.
> I don't know what the costs are of spawning and later destroying a Dynamic
> instance, but I cannot imagine that this is such a huge cost that you would
> have to prefer using the Dynamic instances over the Always On ones.
> I believe that this last rule should read:
> 4/ Idle Always On instance > Idle Dynamic instance
> In which case the idle Always On instances would get the load and the bored
> Dynamic instances would get cleaned up much faster than they are now.
> I suspect this is a typo though, as I cannot imagine that this is really the
> setup, but if it is I would say that is a candidate for change. :-)

This is not a typo; the rule really is:

4/ Idle Dynamic instance > Idle Always On instance

Under the new pricing model, as gregd highlighted in the pricing FAQ,
Always On will be superseded by Min Idle Instances.

So we shouldn't see that "weird" behaviour of Always On instances
sitting idle, but rather a nice distribution of Dynamic Instances
matching the scheduler knobs you set in the admin console.

Thanks a lot for those suggestions, I will make sure to forward them
to the engineering team.

In the meantime, feel free to open feature requests for them if you want
the community to be able to track their progress.

Santiago Lema

Jul 26, 2011, 7:47:37 PM
to Google App Engine
To keep things simple, is this issue worth worrying about before the
new billing model becomes active? Or will everything be so different
that what we see now will no longer apply then?

My instances:
https://skitch.com/smalltech/fcyfd/instances-smallte.ch

Site: smallte.ch
(on http://websmalltech.appspot.com/ )


On Jul 25, 10:33, Johan Euphrosine <pro...@google.com> wrote:
> What is your setting for 'Max Idle Instances'?
>
> Feel free to open a Production issue with your appid if you want me to
> track this specifically.
>
> On Sat, Jul 23, 2011 at 11:01 AM, Francois Masurel <f.masu...@gmail.com> wrote:
> > Geez, now even some Dynamic instances are not used too and stay idle for

Johan Euphrosine

Jul 27, 2011, 4:57:10 AM
to google-a...@googlegroups.com
The gist of the discussion is:
- Idle Dynamic Instances get traffic in priority over Idle Always On Instances.
- People have shown in the thread that the 'Max Idle Instances' knob can
be used to work around this behavior by preventing too many Dynamic
Instances from spawning.
- This is not really worth worrying about before the new billing model,
since the current behavior is temporary until Always On Instances get
superseded by the 'Min Idle Instances' knob.
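
As a rough illustration of the first point, the four routing rules quoted earlier in the thread (1/ to 4/) can be chained into one preference order. This is a toy model only: the option names are made up for the example, busy Dynamic Instances are not covered by those rules, and the real scheduler is certainly more involved.

```python
# Preference order implied by rules 1/ to 4/ (lower number = preferred):
# idle Dynamic > idle Always On > spawn a new Dynamic > busy Always On.
# The option names are hypothetical labels, not App Engine identifiers.
PREFERENCE = {
    "idle_dynamic": 0,
    "idle_always_on": 1,
    "spawn_dynamic": 2,
    "busy_always_on": 3,
}


def pick_target(available):
    """Return the most preferred option among those currently available."""
    candidates = [option for option in available if option in PREFERENCE]
    if not candidates:
        raise ValueError("no known option available")
    return min(candidates, key=PREFERENCE.__getitem__)


# An idle Dynamic Instance wins even when an Always On instance is idle too.
print(pick_target(["idle_always_on", "idle_dynamic", "spawn_dynamic"]))
```

In this model an Always On instance only receives traffic when no Dynamic Instance is idle, which matches the near-zero request counts reported for resident instances.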

A few suggestions coming out of this thread that I forwarded to the
scheduler engineering team:
1/ Show the current CPU% usage of an instance.
2/ Show the current number of threads of an instance.
3/ Allow overriding the 30 second deadline for warm up requests
(for loading frameworks).
4/ New idle_instance_timeout knob: how long an idle instance can live
before being reclaimed.
5/ Allow configuring the capacity of Dynamic Instances (in a similar
way to Backends).
6/ Add rules for changing Max Idle Instances / Instance capacity
during different times of the day and week.
7/ Multiple high availability presets regarding instance spawning:
( safe + unpredictable cost ) versus ( unsafe + predictable cost ).
8/ Allow users to specify a budget related to request serving delay so
the scheduler can synthesize a policy about instance spawning.
9/ Alternative load balancing strategy: load 1 instance > 50% before
using the second one and so on (instead of round-robin).
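
To make suggestion 9/ more concrete, here is a small sketch contrasting round-robin with a "fill the first instance past 50% before using the next" policy. It is a static model under stated assumptions: requests never finish, every instance has the same hypothetical capacity, and the function names and the fallback to the least loaded instance are illustrative, not anything App Engine implements.

```python
from itertools import cycle


def round_robin(num_requests, num_instances):
    """Spread requests evenly, one instance after another."""
    load = [0] * num_instances
    targets = cycle(range(num_instances))
    for _ in range(num_requests):
        load[next(targets)] += 1
    return load


def fill_first(num_requests, num_instances, capacity, threshold=0.5):
    """Load each instance past threshold * capacity before using the next one."""
    load = [0] * num_instances
    limit = capacity * threshold
    for _ in range(num_requests):
        # First instance not yet past the threshold, if any.
        target = next((i for i, used in enumerate(load) if used <= limit), None)
        if target is None:
            # Every instance is past the threshold: fall back to the least loaded.
            target = min(range(num_instances), key=load.__getitem__)
        load[target] += 1
    return load


print(round_robin(6, 3))
print(fill_first(6, 3, capacity=10))
```

With 6 requests over 3 instances of capacity 10, round-robin touches all three instances while the fill-first policy keeps all 6 on the first one, leaving the other two idle and reclaimable.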

Hope that helps.

pdknsk

Jul 27, 2011, 6:20:13 AM
to Google App Engine
It seems like the service is moving in a direction in which devs will
have to adjust many knobs, rather than have Google handle it
automagically in the background.

Johan Euphrosine

Jul 27, 2011, 11:41:34 AM
to google-a...@googlegroups.com
Actually, the latter knobs I listed were suggestions from the community,
but feel free to voice a different opinion.

The only knobs Google provides so far are:
- maximum number of idle instances
- minimum pending latency

See http://code.google.com/appengine/docs/adminconsole/performancesettings.html
for more details.

- The minimum number of idle instances will come later to supplant Always On.

And all knobs should have automatic default values that make App
Engine handle those "automagically in the background".
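
For readers trying to reason about the minimum pending latency knob, here is a simplified sketch of how it is usually understood: a request has to wait in the pending queue at least that long before a new instance is started for it. The function, its parameters and the "never spawn while an instance is idle" rule are assumptions made for illustration, not the scheduler's real logic.

```python
def should_spawn(pending_wait_s, min_pending_latency_s, idle_instances):
    """Toy rule: start a new instance only when nothing is idle and the
    request has already waited at least the configured minimum."""
    if idle_instances > 0:
        # Assumption: an idle instance would simply take the request.
        return False
    return pending_wait_s >= min_pending_latency_s


# A request that waited 20 ms with the knob at 15 s and no idle instance.
print(should_spawn(0.02, 15.0, idle_instances=0))
```

Under this reading, requests served in a few milliseconds with the knob at 15s would never trigger a spawn, which is why the frequent warmup requests reported earlier in the thread look surprising.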
