Indeed, by default the scheduler tries to keep spare capacity in your
frontend instances. There are two main reasons for this.
The first is that requests do not usually arrive at a regular rate;
they come in spikes and other irregular patterns. To handle the spikes
comfortably without pending latency, an app needs more instances than
you would expect from simply multiplying QPS by latency.
The second is that if the app starts receiving more load, spare
capacity lets it serve the additional requests without pending
latency.
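To put rough numbers on the first reason, here is a minimal sketch in
Python. It is not how the scheduler actually works; it assumes each
instance serves one request at a time, and the traffic trace, the
0.5 s latency, and the function names are all made up for
illustration.

```python
import math


def instances_for_rate(qps, latency_s):
    """Steady-state estimate: concurrent requests = qps * latency.

    Assumes (for illustration only) one request per instance at a time.
    """
    return math.ceil(qps * latency_s)


def instances_for_trace(per_second_counts, latency_s):
    """Return (instances for the average rate, instances for the peak second)."""
    mean_qps = sum(per_second_counts) / len(per_second_counts)
    peak_qps = max(per_second_counts)
    return (
        instances_for_rate(mean_qps, latency_s),
        instances_for_rate(peak_qps, latency_s),
    )


# Hypothetical requests-per-second over ten seconds: mostly quiet, two spikes.
trace = [2, 3, 1, 20, 2, 2, 18, 1, 3, 8]
average_need, peak_need = instances_for_trace(trace, 0.5)
print(average_need, peak_need)
```

With this made-up trace, the average rate suggests far fewer instances
than the busiest second does. Any request that arrives during a spike
beyond what the running instances can take has to wait, and that wait
is the pending latency the spare capacity is there to avoid.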
The scheduler is designed so that the default automatic mode minimizes
pending latency and provides spare capacity, to help with the two
issues described above. Not all apps will want this, especially in
light of the new billing formula. That is why you can now opt out
using the max-idle-instances and min-pending-delay options in
Performance Settings. With those options you signal to the scheduler
that pending latency and spare capacity are less important for your
app than instance utilization.
I hope that helps,
Jon