"15 minutes idle" - my new understanding of how it works


Tim

Sep 8, 2011, 6:07:48 AM
to google-a...@googlegroups.com
Some of these threads are getting a bit long ("tl;dr"), but following Jon's interesting explanations of the "15 minute minimum" issue that has had many of us worried, let me try a brief explanation of how I see it working (I'm a user, not a Google employee, so take this with a pinch of salt until confirmed/refuted).

 - From when an instance starts, to when it dies, it is at any time in one of 3 states
   - ACTIVE: actively serving a request (ie from receiving an HTTP request until it finishes serving the HTTP response)
   - IDLE: inactive, not serving a request but still notionally allocated to a machine and in memory, will NOT be killed
   - FREE-IDLE: the process itself is in precisely the same state as "idle", but is NOT billed, and is eligible to be killed at any time

 - You are billed for total instance time of all instances in states "active" and "idle", but you are NOT billed for any time spent in "free idle" state

 - Only "free idle" instances are truly killed (other than failure, load balancing etc)

 - The "free idle" state is how GAE optimises scheduling and fault tolerance (avoiding thrashing and the like) without costing you anything
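Tim's three-state model can be sketched as a tiny state machine. This is purely my illustration of the mental model, not Google's code; names like `InstanceState` are invented:

```python
from enum import Enum

class InstanceState(Enum):
    ACTIVE = "active"        # serving at least one request
    IDLE = "idle"            # in memory, billed, will NOT be killed
    FREE_IDLE = "free-idle"  # in memory, NOT billed, killable at any time

def is_billed(state):
    """Only time spent in ACTIVE and IDLE counts toward the bill."""
    return state in (InstanceState.ACTIVE, InstanceState.IDLE)

def is_killable(state):
    """Only FREE-IDLE instances are truly killed (barring failures,
    load balancing, and similar)."""
    return state is InstanceState.FREE_IDLE
```

The useful property is that the billed set and the killable set don't overlap: you are never billed for an instance that GAE reserves the right to kill.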

The question now is how an instance starts, ends, and changes from one state to another.

  - An instance starts when either
    - it is needed to serve a request (there are no IDLE or FREE-IDLE instances available subject to max-pending-latency) - starts in ACTIVE state
    - or you have specified an "always-on" count - instances will start in IDLE state until "always-on" count is reached
       - with regard to "always-on", I don't know whether the count you specify covers instances in "ACTIVE or IDLE" state only, or in "ACTIVE or IDLE or FREE-IDLE" state

  - Any instance becomes ACTIVE when the scheduler allocates it a request to serve

  - An ACTIVE instance becomes IDLE when it finishes serving a request
    - (with concurrency, this is actually "when it is serving 0 requests")

  - An IDLE instance becomes FREE-IDLE when either
    - it has been IDLE for 15 consecutive minutes
    - or the number of IDLE instances exceeds your max-idle-instances count and this instance is chosen to become FREE-IDLE
    - or something within GAE (fault tolerance, load, outages) needs to otherwise kill your IDLE instance
       - that is, they can switch any IDLE instance to FREE-IDLE and then kill it at any time subject to their own discretion
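The IDLE-to-FREE-IDLE transition rules above can be written as a single predicate (again my own sketch of the model, not anything official; all names are invented):

```python
IDLE_TIMEOUT_MINUTES = 15  # the "15 minute minimum" under discussion

def becomes_free_idle(idle_minutes, idle_instance_count, max_idle_instances,
                      gae_needs_to_reclaim=False):
    """An IDLE instance drops to FREE-IDLE (unbilled, killable) when any
    of the three listed conditions holds."""
    return (idle_minutes >= IDLE_TIMEOUT_MINUTES          # idle for 15 min
            or idle_instance_count > max_idle_instances   # over your cap
            or gae_needs_to_reclaim)                      # GAE's discretion
```

Note the asymmetry: the first condition is time-based and fully predictable, the second is under your control via max-idle-instances, and only the third is at GAE's discretion.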

Looked at in this way, you can see that "billed instance time" is not the same as "actual instance time" (what you see on your dashboard doesn't currently distinguish between IDLE and FREE-IDLE). "Billed instance time" is essentially "time spent active serving requests, including waiting for synchronous API calls to complete" plus "time spent in the IDLE state". This is not the same as "CPU cycles consumed", but more like "share of the capacity" (like paying for time in an internet cafe).

But the big point is that the FREE-IDLE state is completely unbilled, and you have some control over when an instance enters it. It's not true that "all instances will be billed for a minimum of 15 minutes once started", as we all first heard; rather, "up to max-idle-instances instances in the IDLE state will remain so for 15 minutes without activity before entering the FREE-IDLE state".

So what's the point of IDLE state for you?
Well, if you know you have a big startup time (hello Java), or you use in-process-memory state caching (rather than, say, memcache), then IDLE instances are your best friend, and you'll want to set a max-idle-instances sufficient to handle the rate of traffic increases for bursts. But if your startup time is minimal (hello python-with-webapp and handlers that only use memcache/datastore to preserve state), then IDLE instances deliver minimal benefit, and you'll want to set max-idle-instances to a low number, maybe 1 or even 0 if GAE agrees to allow such a setting.
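For reference, at the time this knob lived in the Admin Console's performance sliders; in later versions of App Engine the equivalent settings moved into module configuration. A rough sketch of that later form (not available like this in 2011, shown only to illustrate the knobs being discussed):

```yaml
# Later App Engine automatic-scaling settings (illustrative only):
automatic_scaling:
  max_idle_instances: 1          # keep low for cheap, fast-starting apps
  min_idle_instances: 0
  max_pending_latency: automatic # how long a request may wait before a
                                 # new instance is started
```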

This mental model is what's put the smile back on my face ... here's hoping Jon and Greg don't come back to tell me I'm completely wrong...

Cheers

--
Tim

Joshua Smith

Sep 8, 2011, 9:56:49 AM
to google-a...@googlegroups.com
I take issue with this:

On Sep 8, 2011, at 6:07 AM, Tim wrote:

 - An instance starts when either
    - it is needed to serve a request (there are no IDLE or FREE-IDLE instances available subject to max-pending-latency) - starts in ACTIVE state

I have seen cases where an instance starts for no apparent reason.  In my case, I'm bumping along with just one instance most of the time.  Every now and then, the scheduler starts a second instance, even though the first one is still responding quickly and hasn't handled a request in a few seconds.

My theories are:

- There is a race condition, data sync, or timing issue, which causes the scheduler to sometimes think an instance is busy when it isn't; or,

- There are heuristics in the scheduler which predict an increase in load, and act proactively (such as a request coming in from a new IP address)

My details are in the thread "Scheduler Bug or Inscrutable Scheduler "Feature"".  No response from our google overlords on that one.

Another case was reported as well, although he had hundreds of instances running, so it would be a lot harder to grok the state of the system in his case.

-Joshua

Jon McAlister

Sep 8, 2011, 10:31:54 AM
to google-a...@googlegroups.com
Hi Tim,

Thanks for writing that. It's correct, and a cool way to think about
how idleness works under the new model.

Cheers,
Jon

> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/78I4vMk1Al8J.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>

Jon McAlister

Sep 8, 2011, 10:36:05 AM
to google-a...@googlegroups.com
On Thu, Sep 8, 2011 at 6:56 AM, Joshua Smith <Joshua...@charter.net> wrote:
> I take issue with this:
> On Sep 8, 2011, at 6:07 AM, Tim wrote:
>
>  - An instance starts when either
>     - it is needed to serve a request (there are no IDLE or FREE-IDLE
> instances available subject to max-pending-latency) - starts in ACTIVE state
>
> I have seen cases where an instance starts for no apparent reason.  In my
> case, I'm bumping along with just one instance most of the time.  Every now
> and then, the scheduler starts a second instance, even though the first one
> is still responding quickly and hasn't handled a request in a few seconds.
> My theories are:
> - There is a race condition, data sync, or timing issue, which causes the
> schedule to sometimes think an instance is busy when it isn't; or,
> - There are heuristics in the scheduler which predict an increase in load,
> and act proactively (such as a request coming in from a new IP address)
> My details are in the thread "Scheduler Bug or Inscrutable Scheduler
> "Feature"".  No response from our google overlords on that one.

I thought I did respond... In any event, for the reasons you listed
above and others, this is why max-idle-instances is important. It
ensures that you are not held accountable for scheduler behaviors such
as these listed. When you set it, the billable-instances-rate is
determined by max-idle-instances (a setting you directly control) and
active-instances-rate (again, hopefully something you control). The
nuances of how the scheduler spins up extra instances to minimize
latency and provide spare capacity are not part of the formula, other
than their effect on your serving latency and reliability.

> Another case was reported as well, although he had hundreds of instances
> running, so it would be a lot harder to grok the state of the system in his
> case.
> -Joshua
>


Joshua Smith

Sep 8, 2011, 11:45:19 AM
to google-a...@googlegroups.com

On Sep 8, 2011, at 10:36 AM, Jon McAlister wrote:

I thought I did respond... In any event, for the reasons you listed
above and others, this is why max-idle-instances is important. It
ensures that you are not held accountable for scheduler behaviors such
as these listed. When you set it, the billable-instances-rate is
determined by max-idle-instances (a setting you directly control) and
active-instances-rate (again, hopefully something you control). The
nuances of how the scheduler spins up extra instances to minimize
latency and provide spare capacity are not part of the formula, other
than their effect on your serving latency and reliability.


Unless I'm misunderstanding, we are "held accountable for scheduler behaviors such as these listed."

If the load could be served by a single instance, but the scheduler decides to start a second one to handle a single request (for no apparent reason), there is going to be a minimum of 0.25 instance hours added to my bill.  If this happens once a day, and I need an instance up all the time to handle an external kiosk which refreshes itself, then I'm going to be charged for 24.25 - 24 (free) = 0.25 instance hours.  If this happens X times a day, I'll be charged for 0.25X instance hours a day.  And I have the billing prediction numbers to prove it:
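Joshua's arithmetic, spelled out (this assumes, as he does, that each spurious instance start is billed the full 15-minute minimum):

```python
FREE_HOURS_PER_DAY = 24.0    # free frontend instance-hour quota at the time
MIN_BILLED_PER_START = 0.25  # 15-minute minimum, in hours

def daily_overage(spurious_starts_per_day):
    """Instance hours billed beyond the free quota for an app that
    otherwise needs exactly one always-up instance."""
    used = 24.0 + MIN_BILLED_PER_START * spurious_starts_per_day
    return max(0.0, used - FREE_HOURS_PER_DAY)
```

So one spurious start a day gives 24.25 - 24 = 0.25 billed hours, and X starts give 0.25X, exactly as described. (Jon's and Gerald's replies below argue the max-idle-instances cap changes this picture.)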


Obviously, this isn't that big a deal, since I've got to give you $9 a month anyway.  But if the weird scheduler behaviors scale up, then this occasional propensity to start unneeded instances could really start costing someone some serious $$.

-Joshua

zdravko

Sep 8, 2011, 11:57:00 AM
to Google App Engine
Why could a start-up image not be saved and almost instantaneously
loaded from disk (like PC hibernation does) ?

Would this not totally replace the need for any idle instances ?
Where is the catch, considering how obvious this is? ;)

Jon McAlister

Sep 8, 2011, 1:37:28 PM
to google-a...@googlegroups.com
Hi Joshua,

You are right, I should have been more explicit. While I believe
what I said to be true for most apps, you are right that there
exist classes of apps where the second half of the billing
formula needs to be explicitly considered.

For your app towngovernment, it has one instance serving pretty
much all day, and occasionally two. The
active-instances-rate (the orange line) is always <0.1, and the
total-instances-rate (the blue line) is 1 and occasionally 2. You
have set max-idle-instances=1. As such, with the billing formula:

  billable-instances-rate = min(active-instances-rate + max-idle-instances, total-instances-rate)

This evaluates to billable-instances-rate = min([0..0.1] + 1,
[1..2]) = [1..1.1]. That is, it has a different value at each
part of the day depending on the present value of
active-instances or total-instances, but always lies in the range
[1..1.1].
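Jon's formula is easy to check numerically. This is a direct transcription of the formula he gives, not official billing code:

```python
def billable_instances_rate(active_rate, max_idle_instances, total_rate):
    # billable-instances-rate = min(active + max-idle, total)
    return min(active_rate + max_idle_instances, total_rate)

# Joshua's app: active-rate <= 0.1, max-idle-instances = 1,
# total instances 1 most of the day, occasionally 2.
quiet = billable_instances_rate(0.05, 1, 1)   # -> 1 (capped by total)
burst = billable_instances_rate(0.10, 1, 2)   # -> 1.1
```

The key behaviour is the cap: when the scheduler spins up extra instances beyond active + max-idle, the `min` means they don't raise the billable rate.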

Further, since you set max-idle-instances=1 at
2011/09/06-06:50:41, what I said above only really applies to
your billing reports for 09-06 and onwards. Although, for the
09-06 report, it only applies to 17/24 of the day, whereas for
the 09-07 report it will apply to the entire day. Note that this
is actually why your 09-06 billable instances hours (25.59) are
less than your 09-06 billable instance hours (29.17). Also, your
09-07 report should be even lower.

I think that 25 daily instance hours is about the lowest you're
going to see for this app.

I hope that helps,
Jon


(Attachments: instance graphs, PastedGraphic-10.png and PastedGraphic-9.png)

Jon McAlister

Sep 8, 2011, 1:38:53 PM
to google-a...@googlegroups.com

Oops, a typo. I meant to say:

"""Note that this
is actually why your 09-06 billable instance hours (25.59) are
less than your 09-05 billable instance hours (29.17). Also, your
09-07 report should be even lower."""

> I think that 25 daily instance hours is about the lowest you're
> going to see for this app.
> I hope that helps,
> Jon
>
> On Thu, Sep 8, 2011 at 8:45 AM, Joshua Smith <Joshua...@charter.net> wrote:
>>
>> On Sep 8, 2011, at 10:36 AM, Jon McAlister wrote:
>>
>> I thought I did respond... In any event, for the reasons you listed
>> above and others, this is why max-idle-instances is important. It
>> ensures that you are not held accountable for scheduler behaviors such
>> as these listed. When you set it, the billable-instances-rate is
>> determined by max-idle-instances (a setting you directly control) and
>> active-instances-rate (again, hopefully something you control). The
>> nuances of how the scheduler spins up extra instances to minimize
>> latency and provide spare capacity are not part of the formula, other
>> than their effect on your serving latency and reliability.
>>
>> Unless I'm misunderstanding, we are "held accountable for scheduler behaviors such as these listed."
>> If the load could be served by a single instance, but the scheduler decides to start a second one to handle a single request (for no apparent reason), there is going to be a minimum of 0.25 instance hours added to my bill.  If this happens once a day, and I need an instance up all the time to handle an external kiosk which refreshes itself, then I'm going to be charged for 24.25 - 24 (free) = 0.25 instance hours.  If this happens X times a day, I'll be charged for 0.25X instance hours a day.  And I have the billing prediction numbers to prove it:
>>

Joshua Smith

Sep 8, 2011, 2:01:17 PM
to google-a...@googlegroups.com
That all sounds right to me. And note that I'm not really worried about this because:

1) I have to give you $9 anyway, and
2) When 2.7 comes out, assuming I survive the switch to HR, I should be able to handle everything with 1 instance no problem.

It's just that there are some *unexplained* things going on in the scheduler, and that is driving a lot of this worry among developers.

A full accounting of the kinds of glitches that can cause the scheduler to be heavy-handed with the instances is probably something you guys should give us, to help drive the point home that you are aware of these issues and are doing something about them.

-Joshua


Gerald Tan

Sep 8, 2011, 2:13:40 PM
to google-a...@googlegroups.com
Unless I'm misunderstanding, we are "held accountable for scheduler behaviors such as these listed."

If the load could be served by a single instance, but the scheduler decides to start a second one to handle a single request (for no apparent reason), there is going to be a minimum of 0.25 instance hours added to my bill.  If this happens once a day, and I need an instance up all the time to handle an external kiosk which refreshes itself, then I'm going to be charged for 24.25 - 24 (free) = 0.25 instance hours.  If this happens X times a day, I'll be charged for 0.25X instance hours a day.  And I have the billing prediction numbers to prove it:

I don't think this is correct, according to the "Max Idle Instances + Active Instances" formula. Spinning up extra instances beyond the first one (assuming Max Idle Instances is set to 1) should not be costing you. Your total instance hours charge for the day should be 24 + the area under the yellow curve on the instance graph.

My latest estimated bill came to 24.11 total instance hours. I have a single instance that is being kept alive by a cron job every minute, and extra instances were spawned a few times through the day. My total instance hours being < 24.25 shows that you don't automatically get charged an extra 0.25 instance hours every time an extra instance is spawned. Only the Active Instance Hours matter, which is the yellow line.

Jon McAlister

Sep 8, 2011, 2:17:00 PM
to google-a...@googlegroups.com

That sounds like a reasonable request. However, I would not
characterize this example as heavy-handed. Your desire here is for one
instance always, never more and never less. And occasionally you are
getting one extra instance, which is soon killed off. In this
particular design space, that is literally the smallest possible error
that could be made by the instance scheduler :-P

I understand the worry, and I agree we should do better. There
are indeed several things which can be happening here. The usual
suspects are request spike, app moved to another machine, or datastore
latency spike, but there are other things that can trigger this as
well. Basically, it's actually a complex distributed system under the
hood, but we're trying hard to make it appear to not be. But at places
like this, details are bleeding through the abstraction layer.

Thanks for hanging in with us. We mean well, we messed up, we're working on it.

Hope that helps,
Jon

Stephen

Sep 8, 2011, 3:10:18 PM
to google-a...@googlegroups.com
On Thu, Sep 8, 2011 at 7:17 PM, Jon McAlister <jon...@google.com> wrote:
>
> I understand the worry, and I agree we should do better. There
> are indeed several things which can be happening here. The usual
> suspects are request spike, app moved to another machine, or datastore
> latency spike, but there are other things that can trigger this as
> well. Basically, it's actually a complex distributed system under the
> hood, but we're trying hard to make it appear to not be. But at places
> like this, details are bleeding through the abstraction layer.

One way to fix some of these problems would be to raise the free quota
to 25 instance hours. Greg D thought this was a good idea when the new
pricing was first announced, but it seems to have been forgotten
about:

On Wed, May 18, 2011 at 4:55 PM, Gregory D'alesandre <gr...@google.com> wrote:
> On Wed, May 18, 2011 at 1:57 AM, 风笑雪 <kea...@gmail.com> wrote:
>>
>> Hi Greg,
>>
>> Can you raise On-demand Frontend Instances free quota to 25 Instance Hours
>> per day?
>>
>> The small apps have very low traffics in average, but sometime (maybe
>> several minutes) it may use more than 1 instances to handle burst traffics,
>> so that they will got OverQuotaErrors at the end of the day.
>
> Its a good point, we'll look into this. Thanks for the suggestion!
>
> Greg

Another way to fix some of this might be to enable people to allocate
resources within the free budget allowance themselves. So for example,
some of the 9 hours currently available for backends could be used for
front ends, or vice versa. Arguably, it would better highlight some of
the advantages of the billing interface.

Joshua Smith

Sep 8, 2011, 3:37:37 PM
to google-a...@googlegroups.com

On Sep 8, 2011, at 2:17 PM, Jon McAlister wrote:

I would not
characterize this example as heavy-handed

Nor would I.  I was thinking more of that poor fellow who's seeing a 100 instance spike every two hours.  That's scary.

-Joshua

Jon McAlister

Sep 8, 2011, 3:40:18 PM
to google-a...@googlegroups.com
But if his max-idle-instances setting is less than 100, he's protected
against that. The protection fails for cases like the one you list
where the scheduler variance is less than the max-idle-instances
setting.

Tim

Sep 8, 2011, 3:57:27 PM
to google-a...@googlegroups.com


On Thursday, 8 September 2011 16:57:00 UTC+1, zdravko wrote:
Why could a start-up image not be saved and almost instantaneously
loaded from disk (like PC hibernation does) ?

Would this not totally replace the need for any idle instances ?
Where is the catch, considering how obvious this is? ;)

Again, I don't work for Google or on App Engine, but having worked on large computational (non-web) grids, let me give you a possible reason.

The machines on our grids had strictly controlled disk images, and apart from swap and OS tempfiles, jobs running on the machines never got to write anything to the local disk; all persistence went to shared and replicated data storage (compare the way that GAE scripts can't access static files or the file system, but the blobstore and datastore write data elsewhere). The scripts that are invoked are loaded from smart shared storage drives with local caching buried in the file system (something like AFS, the Andrew File System, is ideal: very fast global reads with smart tiered caching, but quite horrible distributed write performance).

This way machines can be swapped in and out without the grid management even knowing that a machine has changed (ie you don't have to remove machine #1234 and add #6789, but you take out #1234 and replace it with a new #1234).

Now hibernating to disk is either going to write to the local disk, which is a no-no for managing the local machines (which may be diskless using network boot, or have minimal disk for just the O/S and swap), or it'll hibernate the image to shared remote storage, which then has to be replicated, managed, and synchronised. You also have to allow for pulling the image from the centralised disk, starting up, and re-connecting to any dynamic services, in which case you're back to large startup times and a whole load of persistence costs, just like the datastore/blobstore, every time something is shut down.

Now if you were Google and you were really going to do hibernation, to my mind you'd build the hibernation image at deployment time (ie when a new version is uploaded, the grid starts up an instance, runs it to some known initialised-but-not-active state, hibernates that, and then deploys THAT image for starting up instances of the new version). But that would require APIs or similar to let you define when your process is ready to hibernate, plus explanations of how things like changes of GAE version or the built-in libraries will cause new images to be built and re-deployed, or else the restarted image would need such dynamic info passed to it to re-initialise key details.
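That deploy-time idea can be sketched in a few lines. This is purely hypothetical: no such GAE API exists, and every class and function name here is invented for illustration:

```python
class Instance:
    """Toy stand-in for a grid instance (all names invented)."""
    def __init__(self, source):
        self.source = source
        self.state = "cold"

    def run_until_ready(self):
        # App warms up (loads libraries, builds caches) but is not
        # yet serving; this is the "initialised but not active" state.
        self.state = "initialised"

    def hibernate(self):
        # Snapshot the warmed-up process, once, at deploy time.
        return {"source": self.source, "state": self.state}

def build_deploy_image(app_source):
    """Build the hibernation image once when a version is uploaded."""
    inst = Instance(app_source)
    inst.run_until_ready()
    return inst.hibernate()

def resume_from_image(image):
    """New instances resume warm from the image, skipping cold start."""
    inst = Instance(image["source"])
    inst.state = image["state"]
    return inst
```

The point of the design is that the expensive warm-up runs once per deployment rather than once per instance start, at the cost of the image-distribution and re-initialisation machinery described above.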

So the short answer is - hibernation doesn't fit with the way the compute grid has probably been built and is managed.... a grid is not 100,000 desktop machines...

--
T

zdravko

Sep 9, 2011, 12:56:58 AM
to Google App Engine
Dear Google,

Do you have Tim's resume?

Considering how poorly overall this GAE pricing increase has been
handled and *explained*, and with Tim's knack for explaining
complex issues in plain and simple terms, his resume should be on
top of GAE's resume pile.

And, for the good of the whole GAE community, I would be glad to waive
my finder's fees ;)

Sincerely,
Zdravko

Gregory D'alesandre

Sep 9, 2011, 1:21:50 AM
to google-a...@googlegroups.com
I still think it's a good idea!  I think it is such a good idea that I might talk about it on our blog tomorrow ;)

Greg

