Instances restart on reaching maximum number of requests

Iliya Novikov

Jul 18, 2016, 5:02:48 AM

to Google App Engine

Hello,

We are experiencing an issue in the following setup. We have a module that processes tasks from a push queue. The module uses automatic scaling. Normally it runs around 8 instances and processes around 200 tasks per second. Recently I noticed spikes in the Latency graph in the App Engine Dashboard. (It is worth mentioning that the spikes seem to occur almost exactly once an hour.) So I checked the logs and found the following pattern:

All of a sudden all the instances start to report the following: "After handling this request, the process that handled this request reached the maximum number of requests that may be handled in a single process' lifetime, and exited normally." I mean one by one all of them will throw this message.
Next you see a bunch of the following messages: "This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application." Each request reporting this takes a few seconds, so it truly "takes longer", as the message says.

Ultimately, what seems to happen is that all our instances restart at the same time, which means that for a few seconds there are no instances ready to serve. This generates huge latency spikes that we would certainly like to eliminate if possible. Is there something we could do about this?
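As a rough back-of-the-envelope check on the numbers above, the sketch below estimates how many requests each process would serve between restarts. It assumes the 200 tasks per second are spread evenly over the 8 instances and that restarts happen once every 3600 seconds. The class and method names are hypothetical, and the real per-process request limit is not stated anywhere in this thread.

```java
class RestartEstimate {

    /**
     * Estimates requests served by one process between restarts,
     * assuming load is spread evenly across all instances.
     */
    static long requestsPerLifetime(double totalTasksPerSecond,
                                    int instances,
                                    long secondsBetweenRestarts) {
        double perInstanceRate = totalTasksPerSecond / instances;
        return Math.round(perInstanceRate * secondsBetweenRestarts);
    }

    public static void main(String[] args) {
        // Figures taken from the post: ~200 tasks/s, ~8 instances, hourly spikes.
        System.out.println(requestsPerLifetime(200.0, 8, 3600)); // prints 90000
    }
}
```

If the instances start together and receive equal load, they would all reach the same per-process limit at about the same moment, which would be consistent with the simultaneous restarts described here.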

Thank you,

Iliya.

Martin Öjes

Jul 18, 2016, 6:34:56 AM

to Google App Engine

We have experienced the same problem. It seems that Google changed the way GAE handles Warmup requests. We solved our issues by adding a warmup request handler to app.yaml and a small script just echoing an OK message.

The updated documentation page (for PHP runtime), with a working example:

https://cloud.google.com/appengine/docs/php/warmup-requests/configuring

Best regards
Martin

Iliya Novikov

Jul 18, 2016, 11:37:29 AM

to Google App Engine

Thanks for the reply, Martin. I checked the documentation, and according to it we have warmup requests configured correctly. It did remind me of one thing, though: I have never seen a warmup request sent to a running App Engine application. I can see them only when I deploy a new version. It seems that none of my instances started by automatic scaling ever gets a warmup request. I guess this is wrong, no? Unfortunately the documentation is vague on this: "App Engine might not issue a warmup request every time your application needs a new instance".

Adam (Cloud Platform Support)

Jul 18, 2016, 1:39:25 PM

to Google App Engine

Would you mind posting the details from your app.yaml?

Martin Öjes

Jul 18, 2016, 8:41:53 PM

to Google App Engine

The documentation describes that behavior like so:

"If warmup requests are enabled for your application, App Engine attempts to detect when your application needs a new instance and initiates a warmup request to initialize a new instance. However, these detection attempts do not work in every case. As a result, you might encounter loading requests, even if warmup requests are enabled in your app. For example, if your app is serving no traffic, the first request to the app will always be a loading request, not a warmup request."

That being said, I see warmup requests in the log all the time.
Our configuration looks like this:

automatic_scaling:
 min_idle_instances: 2
 max_idle_instances: automatic 
 min_pending_latency: 30ms 
 max_pending_latency: automatic

Martin

Iliya Novikov

Jul 19, 2016, 7:06:53 AM

to Google App Engine

Sure. We use Java, here is the content of appengine-web.xml:

<instance-class>F2</instance-class>
<automatic-scaling>
  <min-idle-instances>0</min-idle-instances>
  <max-idle-instances>6</max-idle-instances>
  <min-pending-latency>60ms</min-pending-latency>
  <max-pending-latency>120ms</max-pending-latency>
</automatic-scaling>
<warmup-requests-enabled>true</warmup-requests-enabled>

Adam (Cloud Platform Support)

Jul 22, 2016, 3:47:54 PM

to Google App Engine

Since by your account the module hovers at about 8 instances which all get proactively restarted at the same time, this suggests that scaling doesn't happen gradually and that the load on that module is fairly constant.

Keeping a minimum number of idle instances will help with latency spikes when they occur. However, you may be better served by manual scaling at 8 instances, which avoids the proactive restart issue. You can then use the setNumInstances() method from the Modules Service to scale up or down manually if the load changes.

Marcel Manz

Aug 1, 2016, 11:40:43 AM

to Google App Engine

Hi

I second Adam's suggestion to use manual scaling in order to not run into any latency issues. We have been running on manual scaling for several years now, and it has proved to be the only scaling method that guarantees no latency issues.

E.g., when we tried basic-scaling, we discovered that:

1. There is no min-instances parameter. In the worst case App Engine will scale down to 1 instance, but if you encounter a sudden traffic spike there won't be any prewarmed, unused instances. We would like to see a config option here that lets min-instances be configured on basic-scaling.

2. Even though we have warmup requests configured, some requests still went to newly created instances, thus adding latency for the time it takes to serve the first request.

As our application has to deliver consistent performance, we chose manual-scaling instead.

@Adam, regarding the setNumInstances() method: Is there some way to programmatically retrieve the active instances value, such as the active green line shown in the dashboard when 'Instances' is selected? Or, similarly, the current latency value when 'Latency' is selected?

For our application it seems a better fit to use manual-scaling combined with a custom load balancer control function that uses setNumInstances() to scale the instances depending on load.
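A control function of the kind described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `InstanceController`, `InMemoryController`, `targetInstances`, and `rescale` are hypothetical names, the load figure is assumed to be measured by the application itself (the thread does not say how to read the dashboard values programmatically), and the 25 tasks per second per instance is simply the 200 tasks per second over 8 instances mentioned earlier in the thread. On App Engine, the controller would presumably wrap the Modules Service setNumInstances() call that Adam mentions.

```java
class ManualScaler {

    /** Hypothetical abstraction over whatever actually resizes the module. */
    interface InstanceController {
        long getNumInstances();
        void setNumInstances(long instances);
    }

    /** In-memory stand-in, used here only to demonstrate the logic. */
    static class InMemoryController implements InstanceController {
        private long instances;

        InMemoryController(long initial) {
            this.instances = initial;
        }

        public long getNumInstances() {
            return instances;
        }

        public void setNumInstances(long instances) {
            this.instances = instances;
        }
    }

    /** Instances needed for the measured load, clamped to [min, max]. */
    static long targetInstances(double tasksPerSecond,
                                double tasksPerInstancePerSecond,
                                long min, long max) {
        long needed = (long) Math.ceil(tasksPerSecond / tasksPerInstancePerSecond);
        return Math.max(min, Math.min(max, needed));
    }

    /** Resizes only when the target differs; returns true if a change was made. */
    static boolean rescale(InstanceController controller,
                           double tasksPerSecond,
                           double tasksPerInstancePerSecond,
                           long min, long max) {
        long target = targetInstances(tasksPerSecond, tasksPerInstancePerSecond, min, max);
        if (target == controller.getNumInstances()) {
            return false;
        }
        controller.setNumInstances(target);
        return true;
    }

    public static void main(String[] args) {
        InstanceController controller = new InMemoryController(8);
        // Load rises from ~200 to 450 tasks/s; each instance handles ~25 tasks/s.
        rescale(controller, 450.0, 25.0, 1, 20);
        System.out.println(controller.getNumInstances()); // prints 18
    }
}
```

Keeping the sizing decision in a pure function separate from the call that resizes the module makes the policy easy to test without touching a live application. A real implementation would likely also want some damping so that it does not resize on every small fluctuation.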

Thanks & Regards
Marcel

Iliya Novikov

Aug 8, 2016, 6:49:41 AM

to Google App Engine

Thanks for your replies, Adam and Marcel,

After a few days of experimenting with the automatic scaling settings, all of a sudden the warm-up requests started to work for us. This is something I can't really explain. My guess is that this changed after I set a non-zero "min-idle-instances" (previously it was always 0). The most interesting part is that after I noticed this change I rolled all the settings back to what they were, and the warm-up requests are still coming. So ultimately the system works well with exactly the same settings we always had. This is a total mystery, but the problem seems to be solved.

Thank you,

Iliya.

Fredrik Bertin Fjeld

Oct 16, 2017, 10:09:20 AM

to Google App Engine

Regarding the log message stating: "After handling this request, the process that handled this request reached the maximum number of requests that may be handled in a single process' lifetime, and exited normally."