Why are resident instances in auto scaling idle?


Shashikanth Reddy

Oct 22, 2016, 8:05:52 AM
to Google App Engine



I have a resident instance that is not handling any API requests. Why is that instance idle?

Jordan (Cloud Platform Support)

Oct 24, 2016, 8:24:04 PM
to Google App Engine
Idle Resident instances are instances that wait with your application code pre-loaded and warmed up, but never actually accept requests. When an idle Resident instance is needed by your application, due to high traffic or CPU-intensive operations, that Resident instance turns into a Dynamic instance and starts accepting new requests.

Once the new Dynamic instance is handling your incoming requests, App Engine creates a new Resident instance in order to maintain the 'Minimum Idle Instances' setting in your 'appengine-web.xml' or 'app.yaml'. The Dynamic instances are then shut down during periods of no traffic, in order to save you money.
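For reference, the 'Minimum Idle Instances' setting looks roughly like this in a Java runtime's appengine-web.xml (the values here are illustrative, not recommendations):

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <automatic-scaling>
    <!-- Keep one warmed-up Resident instance on standby at all times. -->
    <min-idle-instances>1</min-idle-instances>
    <!-- Let the scheduler decide how many idle instances to keep around. -->
    <max-idle-instances>automatic</max-idle-instances>
  </automatic-scaling>
</appengine-web-app>
```

The app.yaml equivalent is a `min_idle_instances` entry under `automatic_scaling`.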

Vidya Narayanan

Oct 24, 2016, 11:32:31 PM
to Google App Engine
Your response suggests that we should not be seeing high instance startup times (since the idle resident instance should turn into a dynamic instance to handle new requests). But that isn't quite what we are seeing: we are seeing high latencies coinciding with instance startups. How does that square with your explanation?

Thanks,
Vidya 

Jordan (Cloud Platform Support)

Oct 25, 2016, 2:54:51 PM
to Google App Engine
Resident instances turning into Dynamic instances to handle requests does not affect the time required to start an instance (it is actually designed to help). When a new idle Resident instance is required, an '/_ah/warmup' request is sent to your application. This triggers the creation of a new instance, and your code begins to run. Therefore, if you are seeing high latency during instance startup, your own code is the likely cause.
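As a side note, in the Java runtime the '/_ah/warmup' hook is opt-in; it is enabled in appengine-web.xml via the warmup inbound service (app.yaml has a matching `inbound_services` entry). A minimal sketch:

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- Ask App Engine to send GET /_ah/warmup to a newly created instance
       before routing live traffic to it. -->
  <inbound-services>
    <service>warmup</service>
  </inbound-services>
</appengine-web-app>
```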

You can use the Stackdriver Trace tool to sort requests to your application over a period of time by highest latency, then select the slowest requests to see the actual processes that ran during each one. If your code makes URL Fetch requests to other applications or servers, your app must wait for that external service to respond, causing higher latency. Multiple individual calls to other Google services, such as the Datastore, can also add latency. It is recommended to batch requests to any Google service that supports it, in order to reduce the number of calls your application makes. By reducing the time needed to execute your code, you reduce the latency experienced by incoming requests.
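To make the batching point concrete with a toy model (the `FakeDatastore` client below is a hypothetical stand-in that only counts round trips, not the real App Engine API): each individual put pays a full RPC round trip, while a batched put pays one round trip for the whole set.

```python
class FakeDatastore:
    """Stand-in client that counts RPC round trips (hypothetical;
    not the real App Engine Datastore API)."""

    def __init__(self):
        self.rpc_count = 0

    def put(self, entity):
        # One RPC per entity: N entities cost N round trips of latency.
        self.rpc_count += 1

    def put_multi(self, entities):
        # One RPC carries the whole batch: N entities cost 1 round trip.
        self.rpc_count += 1


entities = [{"n": i} for i in range(56)]

ds = FakeDatastore()
for e in entities:               # individual puts
    ds.put(e)
individual_rpcs = ds.rpc_count   # 56 round trips

ds = FakeDatastore()
ds.put_multi(entities)           # one batched put
batched_rpcs = ds.rpc_count      # 1 round trip

print(individual_rpcs, batched_rpcs)  # → 56 1
```

With, say, 10 ms of network latency per call, that is the difference between 560 ms and 10 ms spent on the wire for the same work.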

Vidya

Oct 27, 2016, 4:12:04 AM
to google-a...@googlegroups.com
Let me try to understand this correctly. There is a general set of best practices for being efficient with latencies - including trying to do batch requests wherever possible and storing data in memcache to avoid a lot of datastore queries and so on. 

And then there is the question of what causes the latency spikes at the time an instance starts up. We have observed specific spikes that occur only when a new instance is started. If I understand correctly, you appear to be saying that if there are any external URL fetches, those calls will need to wait for the instance to start up before they can be handled - is that right? I am assuming any Image Service calls count as external server calls as well?

A separate question is why the instance takes 30-60s to come up when our application startup times on a local machine or Compute Engine instance are on the order of 5-10s - what could be causing a 6x increase in startup times? Given we are on F4 instances, this seems very strange.

Thanks,
Vidya 


Jeff Schnitzer

Oct 27, 2016, 8:03:45 PM
to Google App Engine
The GAE classloader does some security checking that isn't present in the dev container. Plus, actual loading of classes from jars seems to be slower (probably some sort of network filesystem is involved). A 5-10s startup time locally is quite long; a corresponding 30-60s server-side seems realistic, even with F4s. Things to avoid, if you can: classpath scanning and AOP. Both seem to slow things down.

Also: I have always found Bad Things happen when trying to use resident instances. At lower traffic levels, it seems to produce _more_ cold-start requests rather than fewer. I've had the best user experience leaving all automatic scaling settings at their defaults - for both high-traffic and low-traffic apps.

Jeff


Jordan (Cloud Platform Support)

Oct 27, 2016, 8:10:30 PM
to Google App Engine
You are correct: executing a URL Fetch request during the initialization of your code will cause a large amount of latency, as your instance must wait for the remote server to respond. As previously mentioned, you can use the Stackdriver Trace tool on a specific '/_ah/warmup' request that shows high latency to investigate the exact parts of your code that are taking the most time.

Using this same tool I went ahead and took a look at your project. I saw that on 2016-10-22 a single '/urlfetch.Fetch' took 47.5 seconds to return in one of your '/_ah/warmup' instance startups. I also saw a single request to your endpoint '/api/admin/warmup' took 62 seconds, and 56 '/datastore_v3.Put' calls took a combined 2 seconds, all during the same instance startup. These results were quickly pulled from the 'Summary' tab in the trace details for one of your calls. 

Concerning the latency comparison of running your app in production vs. locally in development: when running locally, your application and all of its assets live within a local web server on your computer. This web server is hosted on your localhost IP, meaning any outgoing requests to it are served instantly by your own machine. Production differs in that any URL Fetch request or API call must be served from a different location by a different server, bringing network latency and each server's traffic congestion into the mix.

You can easily see from the above how removing URL Fetch requests and batching your requests to Google Services would drastically reduce the startup time for your instances. 


Nick

Oct 28, 2016, 1:59:02 AM
to Google App Engine
A few years ago I came across a suggestion that you can improve startup time by packaging your WAR's class files in a jar, implying that the WAR may otherwise be loaded in exploded form. Not sure if this is true or relevant; I've never bothered. You could test it out.

I've never seen a resident instance work or do anything; I always observe cold starts on requests I make, even when a resident instance is running. Quite often you'll also see the full load being borne by one or two instances while others lie around for a long time serving only one or two requests. It would be great if someone had time to experiment and understand practically what matters.

I found it interesting that resident instances are supposed to be converted to dynamic though - I never noticed that.

Vidya

Oct 28, 2016, 10:52:45 AM
to google-a...@googlegroups.com
I think we were talking about orthogonal points to an extent here. There are two separate components: 

1. Why does it take so long to startup an instance? 

The response here is that we need to work on our app startup times and that's fair. I am positive we have room for optimization there, although there will come a point where the tradeoff is really about using external libraries vs getting down to the low level APIs - and we need to make that an acceptable 2016 coding solution without needing to go back to circa 2000 :) 

2. While the new instance is being brought up, why aren't requests being served by the resident instance, and why do we always see the resident instance idle?

This is a question I haven't seen an answer to yet. Really, this is the bigger question, since even if we brought our app startup times down to, say, 5 seconds, that still means unacceptable user latencies (and an average of 5s implies enough variance to reach 2-3x that).

Now, after this discussion, I dug up a bunch of things and have learned a thing or two about how the GAE schedulers *may* be working. A particularly interesting thread I found was this - https://groups.google.com/forum/#!topic/google-appengine/sA3o-PTAckc%5B26-50%5D

Based on my understanding so far, it looks like new requests will be routed to the new instance as soon as it is "instantiated", without necessarily waiting for it to actually be up (or they may be routed based on some thresholds of the request rates). Obviously, this will put the instance startup latencies right in the path of the user response times. 

If this is indeed true, this is going to be a terrible experience for anyone that is not operating at, well, essentially Google scale :). I can imagine that this works very well at massive scale (and that may be the great benefit of using GAE) - but, an app wouldn't live to see that day if, in its initial days, it is providing a sucky UX due to long response times. 

Our own experience matches this explanation - with min pending latency values of 30ms (we left it at default), a new instance is getting spun up the minute there are any requests in the queue whatsoever - which means the new requests are routed immediately to the new instance, while the resident instance remains idle practically all the time. 

I would imagine that the scheduler would adapt to particular app startup times + total number of operating instances to determine when to move new requests over to a new instance. 

Given little to no official information about how the scheduler works, I'd like to understand whether the experimentation and observations the GAE community has developed are in fact correct - otherwise, we're going to be launching into a mini manual Monte Carlo simulation to really tune the knobs that may work for our case. The risk, of course, is that as soon as the operating parameters change for our app, we need to re-run the simulation.

I have to say that the GAE docs leave a lot to be desired. For the number of people that have suffered through this topic, one would hope by now that there are more insights on what's actually going on and how to maximize the efficiency of app engine. 

Unfortunately, "optimize your app's startup times" is a necessary but insufficient answer. As I wrote, unless we're talking 100ms app startup times, in today's expected UX metrics, we aren't going very far with this. 

All that said, we're currently starting to experiment with longer min pending latency values to see if that causes the resident instances to kick in. It looks like some people have had luck with that. This will obviously cause problems at scale, but if it solves our problem at small scale now, we'd start there and change it as we scale up.

One more hack to our list there - but, if you have better suggestions, I'd like to know. 

Thanks,
Vidya 


Jordan (Cloud Platform Support)

Oct 28, 2016, 8:58:06 PM
to Google App Engine
Hey Vidya.

You are correct that instance start time is largely determined by your code, since each time a new instance is created it must load and initialize a fresh copy of your code before it can serve.

As for the reason you are seeing a single instance handling the bulk of your requests, this comes down to the App Engine scheduler, as you mentioned. The scheduler simply asks the first instance if it can handle a request. Based on your scaling configuration for pending latency and concurrent requests, your first instance tells the scheduler that it can handle an extra request, and so it does, leaving the rest of your instances waiting to handle any overflow.

If App Engine thinks you may need an extra instance warmed up in case of overflow, it creates one. This is why you see a single Dynamic instance at the bottom handling no requests. Again, App Engine sends requests to Dynamic instances, not idle Resident instances. If no Dynamic instance is available, your Resident instance is treated as a Dynamic instance, and a new Resident instance is spun up to meet your configured minimum idle instances.

To configure your scaling options so that requests are spread more evenly across available instances: reduce the number of concurrent requests a single instance is allowed to handle, lower the minimum pending latency so a request waits less time in an instance's pending queue, and lower the maximum pending latency to force a request onto a new instance after a shorter wait. Note: I would not recommend setting any of these to zero, which would force each request onto its own instance. You still want each instance to handle multiple requests, to balance cost and performance.
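As a sketch, these knobs live under automatic_scaling in app.yaml (values here are illustrative only; appengine-web.xml has matching elements for Java):

```yaml
automatic_scaling:
  # Fewer concurrent requests per instance spreads load across instances.
  max_concurrent_requests: 10
  # How long a request may sit in the pending queue before the scheduler
  # considers starting a new instance.
  min_pending_latency: 100ms
  # After this wait, the request must go to a new instance.
  max_pending_latency: 500ms
  min_idle_instances: 1
```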

Continue to use the Stackdriver Trace tool to see the breakdown of latency per request, and use it to find the optimal scaling settings for your app, so that requests are not waiting too long in a pending queue behind other requests. Ideally, optimizing your code to handle requests quickly and asynchronously (such as using the Task Queue to perform long image-manipulation tasks instead of making a user wait) will make your application scalable for Cloud computing.