Concurrency and instance startup logic

Mike

Sep 4, 2011, 6:26:27 AM
to Google App Engine
Like many, I'm deeply disappointed that Google went ahead with the
extortionate price increases, but instead of complaining about it
(because it seems to be falling on deaf ears) I'm trying to come up
with a workable solution to decrease costs.

I have a couple of apps that see an average amount of traffic (around
1 request every 3 seconds, fairly constantly) and I'm trying to
persuade the scheduler to keep only 1 instance running.

I'm confident only 1 instance is needed: latency is around 150ms and
thread-safe is true. However, there always appear to be 2 or more
instances active. It's ridiculous to see these instances sit idle for
15 minutes after serving a single request. My understanding was that
enabling concurrent requests would stop these additional instances
starting up, because if one request is waiting on I/O (common) another
request can be handled.
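A quick sanity check of that claim, as a sketch using Little's law with the traffic figures quoted above (the class name is made up for illustration):

```java
public class Utilization {
    public static void main(String[] args) {
        double arrivalRate = 1.0 / 3.0; // ~1 request every 3 seconds
        double latency = 0.150;         // ~150ms average request latency
        // Little's law: average concurrent requests = arrival rate * latency
        double concurrent = arrivalRate * latency;
        System.out.println(concurrent); // ~0.05: far below one busy instance
    }
}
```

With the instance busy only ~5% of the time on average, a single thread-safe instance has ample headroom for this load.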

What is the instance startup logic when thread-safe is true? I cannot
see any difference. If anything, setting thread-safe to true has
increased the number of idle instances. In my eyes, the scheduler is
broken:

Min Pending Latency appears to have no effect. Even set at 15s, you
still see new instances start up unexpectedly.

Max Idle Instances should drop down to zero. I do not want an
additional idle instance bringing the total number of instances to 2.

Can we have an option to decrease the 15-minute idle time before an
instance is stopped? I do not want to pay for 15 minutes of instance
time to serve one request.

Another point: instance startup time seems way too high in general. I
do not understand why it takes at least 4 seconds to start an
instance; lightweight servlet containers can start in 200ms these
days. If startup time was optimised, the 15-minute idle period reduced
to say 5 minutes, and the scheduler made less enthusiastic about
starting new instances, I think the changes might be (almost)
acceptable. However, right now, I think a lot of people are most
annoyed about paying for instances that are sitting there (mostly)
idle - and this is understandable: we got used to paying for CPU time.

I would really like to know how the scheduler decides whether or not
to start up a new instance when thread-safe is true. Is this
documented anywhere?

Kind regards,

Mike.

Sergey Schetinin

Sep 4, 2011, 7:00:39 AM
to google-a...@googlegroups.com
I have an app with an instances page that looks like this: http://i.imgur.com/YROrD.png

It's a very small app with billing disabled. It will not work within free quota after the pricing change simply because the scheduler is no good. 

I think one way to fix this would be to open-source the scheduler -- that would add some transparency at least (I've sent a more detailed email about this to the group, but it seems to have been filtered as spam or whatever).

-Sergey

Gerald Tan

Sep 4, 2011, 12:39:46 PM
to google-a...@googlegroups.com

At the moment I've found that the best way to keep it capped to 1 instance is to ensure that the 1 instance you have does not die. The problem is that when there are 0 instances running and 2 quick requests come in, the scheduler will start up 2 instances to handle both requests. So what I did was add a heartbeat cron job that sends a request to a servlet that does nothing, once per minute. This seems to work great, keeping that one instance going. The deployment was made 8 hours ago, and there has only been 1 instance since then.
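For a Java app, Gerald's heartbeat could look something like the cron.xml below (the /heartbeat URL is hypothetical; any URL mapped to a servlet that returns immediately would do):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/heartbeat</url>
    <description>Ping a no-op servlet once a minute to keep one instance warm</description>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>
```

Note that this trades a small amount of request traffic for avoiding the loading-request penalty when the app goes fully idle.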

Mike

Sep 5, 2011, 4:01:14 PM
to Google App Engine
I have been playing around with the settings for a while now and have
come to the conclusion that enabling concurrent requests, i.e.
<thread-safe>true</thread-safe>, does not mean that the scheduler
chooses to send multiple requests to active instances - it still
starts more. I am also confident that Min Pending Latency is not
correctly implemented: I cannot see evidence in the logs that any
requests were waiting 15 seconds, yet the scheduler chose to start
additional instances.

We all need justification for the scheduler's strange behaviour.

Personally, I would like to be able to specify a maximum number of
instances. So, for example, if I say 1 instance, all requests go to
that 1 instance. Google could specify a maximum request rate per
instance, say 100 per second; beyond that, serve a redirect to a
"server overloaded" page and email the admin to suggest paying for an
additional instance.
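Something like Mike's per-instance cap could also be approximated app-side with a load shedder (a sketch; the class name and limit are made up, and a servlet filter would call tryAcquire() and answer HTTP 503 "server overloaded" when it fails):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-instance load shedder: admit at most maxConcurrent
// requests at a time and reject the rest (e.g. with an HTTP 503 page).
public class LoadShedder {
    private final int maxConcurrent;
    private final AtomicInteger inFlight = new AtomicInteger();

    public LoadShedder(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    public boolean tryAcquire() {
        if (inFlight.incrementAndGet() > maxConcurrent) {
            inFlight.decrementAndGet(); // over the cap: roll back and reject
            return false;
        }
        return true;
    }

    public void release() {
        inFlight.decrementAndGet(); // call in a finally block after each request
    }
}
```

This only sheds load inside one instance; it cannot stop the scheduler from starting more instances, which is why Mike is asking for a platform-level setting.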

Nick Rudnik

Sep 5, 2011, 7:53:39 PM
to google-a...@googlegroups.com
When you look in the logs to see why new instances were started, you should see, right before a warmup request, a request that takes longer than the minimum pending latency. Is that not the case? I too am trying to understand how instances are started so I can avoid extra instance hours, and this is how I have tracked down the requests causing new instances to start.

Jon McAlister

Sep 5, 2011, 8:51:48 PM
to google-a...@googlegroups.com
Hi Mike, I can help explain some of the issues going on here.

For starters, if an app is thread-safe, then yes that is a very
important signal to the scheduler. It indicates that an instance
can sustain more requests and as such there is no need to place
the incoming request on a pending queue (assuming no other idle
instances).

The next most important thing I should point out is that the
actual formula for billing purposes for frontend instances is not
the "total instances" (blue line) on your instances dashboard
graph. If you set max-idle-instances (it appears you did this for
most of 09-01 and half of 09-02, and then again earlier today),
then the actual formula will be based on both
total-instances (blue line) and active-instances (orange line):

billable instances = min(active-instances + max-idle-instances,
total-instances)

So, in your example, active-instances seems to be 0.1-0.2,
total-instances is between 2 and 3, and max-idle-instances was 1
during certain time windows. This explains why your billed
frontend instances hours were 51.69, 54.82, 30.82, and 41.67
respectively on 08-30, 08-31, 09-01, and 09-02. On the days where
you had max-idle-instances=1, the billed instance hours were lower,
even though the total number of instances was largely unchanged.
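Plugging the approximate figures from this thread into the formula above (a sketch; the class and method names are made up):

```java
public class BillableInstances {
    // billable instances = min(active + max-idle, total)
    static double billable(double active, double maxIdle, double total) {
        return Math.min(active + maxIdle, total);
    }

    public static void main(String[] args) {
        // ~0.2 active instances, max-idle-instances=1, ~2.5 total instances
        System.out.println(billable(0.2, 1.0, 2.5));
    }
}
```

This comes out at roughly 1.2 instance-seconds per second, in line with the 1.28 average billing rate quoted below for 09-01.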

This is because max-idle-instances has two primary functions. The
first is to advise the scheduler to physically kill off excess
idle instances, and the second is to be interpreted as a hard
ceiling (per the above formula) by our billing system. This is
why on 09-01, you were billed based on an average rate of 1.28
instance-seconds-per-second (the other noise in this figure is
that the max-idle-instances setting was made at 02:56:41, so did
not apply for the first ~three hours of the day).

Regardless of the billing question, why are there 2-3 instances?
What's happening here is that the loading requests for this app
have an average latency of 10.6s:
https://appengine.google.com/logs?app_id=jadsbeta&version_id=29.353002800509795229&severity_level_override=1&severity_level=3&tz=US%2FPacific&filter=loading_request%3D1&filter_type=regex&date_type=now&date=2011-09-05&time=13%3A00%3A45&limit=20&view=Search.
So, whenever the app needs to do a loading request, this is
problematic because, as that instance is conducting the loading
request, there are more requests coming in, and those need to be
serviced. But they can't be serviced by the existing instance,
because it is doing a loading request (warm (non-loading) requests can
be concurrent, but the loading request cannot be concurrent with
non-loading-requests). So the scheduler turns up a new instance.
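An unofficial sketch of the acceptance rule described above (this is not App Engine's actual scheduler code; the class and field names are made up to model the behaviour):

```java
// Hypothetical model of one instance: warm requests on a thread-safe
// instance may run concurrently, but an instance busy with its loading
// (cold-start) request cannot take other work, so a request arriving
// then forces the scheduler to start a new instance.
public class InstanceModel {
    boolean loading;            // serving its loading request right now
    int inFlight;               // warm requests currently in progress
    final boolean threadSafe;

    public InstanceModel(boolean threadSafe) {
        this.threadSafe = threadSafe;
    }

    public boolean canAccept() {
        if (loading) return false;             // loading blocks everything else
        if (!threadSafe) return inFlight == 0; // one request at a time
        return true;                           // concurrent warm requests OK
    }
}
```

Under this model, a 10.6s loading request leaves a long window in which every new request fails canAccept() on the existing instance, which is exactly when new instances get spun up.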

[Note: the setting of min-pending-delay=15s actually should help
in most cases here since loading request latency is still
<15s. Unfortunately there is a bug where right now we are
basically treating min-pending-delay=15s as equivalent to
min-pending-delay=10s:
http://code.google.com/p/googleappengine/issues/detail?id=5765.
I have since fixed this and it should be live in a few weeks.]

The recommendation to keep the clone running is not altogether a
bad one, in that it means you have fewer loading requests, so
this situation is avoided, as the warm request latency is nice
and low. But if you can reduce your loading request latency, that
would also help. But at the end of the day, it won't change the
bill if you have set max-idle-instances. Afaict, 30 cents per day
is probably the lowest your bill could get until the
aforementioned bugfix is live.

I hope that helps here,
Jon

