min_instances, min_idle_instances, and old versions

Alan deLespinasse

Sep 10, 2020, 11:34:10 PM

to Google App Engine

First question: Is this accurate? That is, if an auto-scaled service has min_instances set to nonzero, does that mean that instances in old versions don't get shut down when you deploy a new version? And those instances get billed?

I've been running a service with the following configuration (standard environment, Python 3.7):

    runtime: python37
    instance_class: F4
    automatic_scaling:
      min_instances: 1
      max_instances: 10
    inbound_services:
    - warmup

New versions are deployed frequently because it's our integration environment. Apparently we've ended up with more and more instances running, because when we deploy a new version, the old version continues to exist and have 1 running instance. And apparently we're getting billed for these. I don't think I should have expected this, based on a close reading of the documentation. (I have opened a billing support request, since I think it's Google's error in documentation, if not actually a bug.)

So now I'm trying to fix the configuration to avoid this. Based on the Server Fault article I linked above, I tried setting min_instances to 0 and min_idle_instances to 1. This seems to result in always at least 2 instances running. I think maybe because one instance is getting requests (we have a once-a-minute cron job, among other things), so it's not "idle", so there has to be one more instance to have a minimum of one idle instance.

So I tried setting both min_instances and min_idle_instances to 0, but I still seem to always have at least 2 instances.

It's really hard to tell though, because the GCP console sometimes takes a bit to update, and maybe sometimes there are actually more instances than the configuration requires (I think maybe they're not always billed?).

So, second question: What is min_idle_instances actually supposed to mean? Is it the minimum number of instances, or is it the minimum number of "idle" instances, for some definition of "idle"? If an instance is serving 1 simple query per minute, does that mean it's not idle, so there will be a second, idle instance? But then there are 2 instances, and if there are a few simple queries happening per minute, it seems like some might be randomly routed to each instance, so neither instance would be idle, and a third instance would get started.

Another complication: in our production environment, I increased min_instances to 2 because of this issue. It's pretty important that we always (>99.9% anyway) have at least 1 running instance, since apparently there's no way to get instance startup time to less than 20-30 seconds, and to guarantee that, apparently we need 2 instances running most of the time, because they can get preempted at any time without warming up a replacement first. So now I'm not sure whether to set min_idle_instances to 1 or 2 or what in production.

Do I need to set max_idle_instances? The documentation says its default value is "automatic", but doesn't say what that actually means.

I'm having a hard time figuring out these issues based on the documentation.

All I want is:

- Instances get shut down when I deploy a new version (I would have thought this was always the case no matter what!)
- Each of my environments normally just has 1 instance running, assuming light traffic (1-10 queries per minute). Having 2 instances always running in production is ok, if that's the only way to achieve the next point:
- Never (or almost never) have zero instances running (aka almost never have a query take more than 2 seconds because of warmup time)
- Autoscale up to a reasonable maximum if traffic gets heavier

I didn't think this would be so difficult. I've been using App Engine for a long time, and thought I knew what I was doing, but I guess I've never used these options in the current environment (Python 3, standard).

Olu

Sep 17, 2020, 2:26:50 PM

to Google App Engine

To start with, I can confirm that you would be billed for all instances in use, whether or not they are actively serving requests.

I will attempt to respond to your inquiries as I have highlighted them below:

1. Is the information shared on this link[1] accurate?

A: It is not exactly clear which part of the information you are looking to verify. However, I assume you are trying to confirm the explanation of min_instances and min_idle_instances. If so, yes, the information is accurate, as those words were copied verbatim from the documentation[2][3]. If not, please reply to this thread.

2. If an auto-scaled service has min_instances set to nonzero, does that mean that instances in old versions don't get shut down when you deploy a new version? And those instances get billed?

A: I believe this article[4] explains in detail how instances are managed, particularly on scaling down. Scaling down instances depends on the decrease in request volume. Typically, the App Engine standard environment scales down to 0[5] and, as explained here[6], if the scheduler decides to shut down active instances due to lack of requests being handled, another instance will not start until prompted by an external request, even with min_instances set.

With all that being said, as explained in this documentation[7], the default behavior of App Engine Standard is that whenever a new application version is deployed, unless the --no-promote flag is used in the deployment, the newly deployed version is automatically configured to receive 100% of traffic. So, with no traffic to the older version, the scheduler would shut down its instances due to lack of requests, even if min_instances is set to nonzero.

If you are experiencing a different behavior, I suggest you reach out directly to the GCP Support Engineers[8] for better evaluation of the issue.

3. What is min_idle_instances actually supposed to mean?

A: As explained in the documentation[3], this is the number of instances that are kept running and ready to serve traffic. The idle instances help to avoid the effect of pending latency on your App Engine application.

As I may have alluded to above, instances are created whenever requests are received. When instances are created, there are certain steps that apply for the instance to start up and be ready to attend to requests. These are explained in the documentation[9][10]. Basically, having idle instances helps to avoid such steps that would cause pending latency.

4. Do I need to set max_idle_instances?

A: No, you do not need to set this parameter, as it is optional. Indeed, the default value of max_idle_instances is automatic, which implies that the maximum is determined by the App Engine autoscaler, depending particularly on the number of requests being handled.

Not to overwhelm you with a lot of information, I think you can find the details that you require in my response. If not, please be sure to reply with more inquiries.

[1]https://serverfault.com/questions/999892/app-engine-standard-auto-scaling-how-to-stop-previous-version-on-deployment/999900#999900

[2]https://cloud.google.com/appengine/docs/standard/python3/config/appref#automatic_scaling_min_instances

[3]https://cloud.google.com/appengine/docs/standard/python3/config/appref#min_idle_instances

[4]https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#scaling_down

[5]https://stackoverflow.com/questions/51272392/how-to-scale-down-to-0-instances-in-gae-standard-go#answer-51291372

[6]https://issuetracker.google.com/162502284#comment2

[7]https://cloud.google.com/appengine/docs/standard/python/tools/uploadinganapp#deploying_an_app

[8]https://cloud.google.com/support-hub

[9]https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#startup

[10]https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#loading_requests

Alan deLespinasse

Sep 28, 2020, 5:43:09 PM

to Google App Engine

tl;dr: Never use min_instances! It will just increase your bill unnecessarily.

On Thursday, September 17, 2020 at 2:26:50 PM UTC-4 Olu wrote:

> To start with, I can confirm that you would be billed for all instances in use, whether or not they are actively serving requests.
>
> I will attempt to respond to your inquiries as I have highlighted them below:
>
> 1. Is the information shared on this link[1] accurate?
>
> A: It is not exactly clear which part of the information you are looking to verify. However, I assume you are trying to confirm the explanation of min_instances and min_idle_instances. If so, yes, the information is accurate, as those words were copied verbatim from the documentation[2][3]. If not, please reply to this thread.

Sorry, I guess I was referring to something implied by that link, not directly stated, which is that setting min_instances to 1 or more will result in instances never getting shut down in old versions, even if they are not receiving traffic.

> 2. If an auto-scaled service has min_instances set to nonzero, does that mean that instances in old versions don't get shut down when you deploy a new version? And those instances get billed?
>
> A: I believe this article[4] explains in detail how instances are managed, particularly on scaling down. Scaling down instances depends on the decrease in request volume. Typically, the App Engine standard environment scales down to 0[5] and, as explained here[6], if the scheduler decides to shut down active instances due to lack of requests being handled, another instance will not start until prompted by an external request, even with min_instances set.
>
> With all that being said, as explained in this documentation[7], the default behavior of App Engine Standard is that whenever a new application version is deployed, unless the --no-promote flag is used in the deployment, the newly deployed version is automatically configured to receive 100% of traffic. So, with no traffic to the older version, the scheduler would shut down its instances due to lack of requests, even if min_instances is set to nonzero.

Obviously setting min_instances overrides the default behavior of scaling down to zero. And I'm now convinced that, with min_instances set to nonzero, it doesn't scale down to zero even in obsolete versions that are set to receive no traffic. This isn't documented behavior, but I've seen it implied elsewhere (like the Server Fault page above), and it was more or less confirmed by the agent who handled my billing complaint (they checked with support engineers, I believe).

(As the documentation mentions, "For this feature to function properly, you must make sure that warmup requests are enabled and that your application handles warmup requests." So some users may have min_instances set to more than zero, but not see the above problem, because their service is not actually configured correctly to maintain a minimum number of instances. I made this mistake for a while.)
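
For readers wondering what "handles warmup requests" looks like in practice, here is a minimal sketch, not taken from anyone's actual service in this thread. It assumes a plain WSGI callable named `app` in main.py (which, as far as I know, is what the Python 3 runtime's default entrypoint serves) and that `inbound_services: - warmup` is set in app.yaml as in the configuration above. App Engine sends warmup requests to the path /_ah/warmup; the response bodies and the setup step are hypothetical placeholders.

```python
# main.py: a standard-library-only WSGI app (illustrative sketch).

def app(environ, start_response):
    """WSGI entry point that answers warmup requests on /_ah/warmup."""
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/warmup":
        # Hypothetical place for one-time setup, e.g. priming caches
        # or opening connections, so the first real request is fast.
        body = b"warmed up"
    else:
        body = b"hello"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The important part is only that the warmup path returns a successful response; whether any real setup work happens there depends on the application.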

> If you are experiencing a different behavior, I suggest you reach out directly to the GCP Support Engineers[8] for better evaluation of the issue.

So apparently I have to pay for a support plan just to get information that should be in the documentation...

> 3. What is min_idle_instances actually supposed to mean?
>
> A: As explained in the documentation[3], this is the number of instances that are kept running and ready to serve traffic. The idle instances help to avoid the effect of pending latency on your App Engine application.

I noticed that this documentation has recently been updated (maybe partly in response to my complaints?). It now says "The number of additional instances..." (emphasis mine), and goes on to explain that by "additional instances", it means that App Engine calculates the "necessary" number of instances to serve the current load, and adds on min_idle_instances more instances. So it does not mean that there will always be this many "idle" instances (for some definition of "idle"), as the name would imply. The new documentation is a big improvement. (new version / old version)

(Still waiting for the min_instances documentation to be updated to warn about the danger of zombie instances)

> As I may have alluded to above, instances are created whenever requests are received. When instances are created, there are certain steps that apply for the instance to start up and be ready to attend to requests. These are explained in the documentation[9][10]. Basically, having idle instances helps to avoid such steps that would cause pending latency.
>
> 4. Do I need to set max_idle_instances?
>
> A: No, you do not need to set this parameter, as it is optional. Indeed, the default value of max_idle_instances is automatic, which implies that the maximum is determined by the App Engine autoscaler, depending particularly on the number of requests being handled.

Sorry, I was imprecise. I wasn't asking if it's required. I was asking if I should set it, i.e., if I might get surprises in my bill if it's not set, or anything. It is still not clear to me what it actually means, since there's no clear definition of "idle" provided, and anyway because of the previous confusion over min_idle_instances, I don't want to assume that it has anything to do with instances that are "idle". The current documentation implies that it has something to do with how rapidly instances will be scaled down after a traffic peak, but doesn't give me any way to quantitatively predict how rapidly it would scale down for a particular value. Anyway I'm not setting this for now.

For anyone reading this who's curious, this is my new production app.yaml file:

    runtime: python37
    instance_class: F4
    automatic_scaling:
      min_instances: 0
      min_idle_instances: 1
      max_instances: 10
    inbound_services:
    - warmup

With this configuration, there are always at least 2 instances of the current version. We always have a minimum of 1 request per minute (from a cron job); I assume it would probably scale down to 1 instance if we went a sufficiently long time with no requests at all (I have no idea how long it would take). Old versions do scale down to zero instances, though sometimes it takes a while. For our integration and staging environments, we set min_idle_instances to zero and max_instances to 2, and there is always at least one instance (presumably would scale down to zero if given a chance).

Boris Brudnoy

Oct 2, 2020, 11:35:20 AM

to Google App Engine

Thanks for this conversation, it clarified some matters for me.

Is there, then, a way to tell the App Engine Standard Environment to only apply the min_instances setting to the default app version, especially if 100% of traffic is now directed at that default version? I'd like to avoid the scenario of older app versions running unused instances.

Thanks,

Boris

Alan deLespinasse

Oct 2, 2020, 11:54:49 AM

to Google App Engine

Not sure what you mean by the default version. There's a default *service*, but generally you have a different configuration file for each service, so it's easy enough to have a different value of min_instances for each one.

My best advice for now is not to use min_instances; use min_idle_instances instead. Unfortunately, setting min_idle_instances can result in a larger number of running instances than setting min_instances to the same number. For example, if min_idle_instances is 2, and you have a pretty low level of traffic that could easily be handled by one instance, then you'll have 3 instances. Whereas if min_instances were 2, you'd have only 2 instances under the same conditions.
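
The arithmetic in that example can be written down as a rough model. This is only a sketch of the behavior as described in this thread, not the real scheduler: it assumes min_idle_instances is added on top of the instances needed for the current load, that min_instances acts as a floor on the total, and that max_instances caps the result. The function name and the `qps_per_instance` capacity parameter are made up for illustration.

```python
import math

def expected_instances(load_qps, qps_per_instance, min_instances=0,
                       min_idle_instances=0, max_instances=None):
    """Estimate running instances under the simplified model above."""
    # Instances considered "necessary" to serve the current load.
    needed = math.ceil(load_qps / qps_per_instance) if load_qps > 0 else 0
    # Idle instances are extra; min_instances is a floor on the total.
    total = max(min_instances, needed + min_idle_instances)
    # Assumption: max_instances caps the total, idle instances included.
    if max_instances is not None:
        total = min(total, max_instances)
    return total

# Light traffic that one instance could easily handle:
print(expected_instances(0.1, 10, min_idle_instances=2))  # 3
print(expected_instances(0.1, 10, min_instances=2))       # 2
```

Under the same assumptions, the production configuration posted earlier (min_idle_instances 1, max_instances 10) comes out at 2 instances with light traffic, which matches what was observed.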

The other option is probably some kind of scripting in your release process to delete instances of old versions, or even to delete all old versions (which I wouldn't want to do, because having old versions still around is very useful if you need to do a quick rollback). I'm not really sure what's possible.
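
As a hedged sketch of what such release scripting might look like: the code below only decides which old versions to delete and builds the corresponding `gcloud app versions delete` command lines, without running them. The record fields (`id`, `traffic_split`, `deployed`) are a hypothetical, simplified shape chosen for illustration, not the exact schema gcloud prints, and keeping the newest few idle versions for rollback is just one possible policy.

```python
def versions_to_delete(versions, keep_recent=2):
    """Pick traffic-free versions to delete, keeping a few for rollback."""
    # Only consider versions that receive no traffic at all.
    idle = [v for v in versions if v["traffic_split"] == 0]
    # Newest first, so the slice below drops only the older ones.
    idle.sort(key=lambda v: v["deployed"], reverse=True)
    return idle[keep_recent:]

def delete_commands(versions, service, keep_recent=2):
    """Build (but do not run) one gcloud command per version to delete."""
    return [
        ["gcloud", "app", "versions", "delete", v["id"],
         "--service=" + service, "--quiet"]
        for v in versions_to_delete(versions, keep_recent)
    ]

# Hypothetical data: v4 is live, v1-v3 are old versions with no traffic.
versions = [
    {"id": "v1", "traffic_split": 0, "deployed": "2020-09-01"},
    {"id": "v2", "traffic_split": 0, "deployed": "2020-09-10"},
    {"id": "v3", "traffic_split": 0, "deployed": "2020-09-20"},
    {"id": "v4", "traffic_split": 1.0, "deployed": "2020-09-28"},
]
for cmd in delete_commands(versions, "default", keep_recent=1):
    print(" ".join(cmd))
```

With `keep_recent=1` this would keep v3 around for a quick rollback and propose deleting v2 and v1; the commands could then be passed to `subprocess.run` once the selection has been reviewed.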

Alan deLespinasse

Oct 2, 2020, 11:58:44 AM

to Google App Engine

Note that I've had similar-looking issues in the past, though that was under very different conditions and it appeared to get fixed.
