Does App Engine Flexible for Python support concurrent requests?

Parth Mishra

Jun 30, 2018, 7:03:48 PM

to Google App Engine

The [documentation](https://cloud.google.com/appengine/docs/flexible/python/how-requests-are-handled) on how GAE Flexible handles requests says that "An instance can handle multiple requests concurrently", but I don't know exactly what this means.

Let's say my application takes 60 seconds to process a single request.

After my instance starts processing the first request, suppose another request (or three) arrives 30 seconds later, when the first request is half done. Will the same instance handle the new requests, or will they trigger autoscaling and spin up more instances? Assume that CPU utilization while handling the first request is still below the CPU-utilization scaling threshold.

Because my instance takes 60 seconds to process a single request and I will be receiving multiple requests at a time, I'm worried that I'll trigger autoscaling inefficiently even when the same instance has enough processing power to handle additional requests. Is this how it works? Ideally, I would like to multi-thread my processing and accept additional requests on the same instance while it is still under the CPU utilization threshold.

The documentation on concurrent requests is scarce for the Flexible environment, unlike for the Standard environment, so I want to be sure.

Kenworth (Google Cloud Platform)

Jul 2, 2018, 2:00:56 PM

to google-a...@googlegroups.com

1- I assume the 60-second value was just an arbitrary number. I ask because the deadline for requests to frontend instances is 60 seconds; a request that runs longer will hit a DeadlineExceededError.

2- Concurrent, by definition, means that events can happen or be handled at the same time: the instance does not have to wait for one task to finish before starting the next. The GAE Flex environment automatically scales your app up and down while balancing the load. Here is an article explaining how instances behave depending on whether the application is set to manual or automatic scaling.
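
To make the definition above concrete, here is a minimal sketch in plain Python; it is not App Engine code. The names `handle_request` and `serve_concurrently` are made up for illustration, and `time.sleep` stands in for I/O-bound work such as waiting on a database or another service. With CPU-bound work, CPython threads would not overlap this way, and you would need several worker processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id, work_seconds=0.2):
    """Hypothetical handler: the sleep stands in for I/O-bound work."""
    started = time.monotonic()
    time.sleep(work_seconds)
    return request_id, started, time.monotonic()


def serve_concurrently(n_requests=4, work_seconds=0.2):
    """Run the handlers on a thread pool; return (results, elapsed seconds)."""
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [
            pool.submit(handle_request, i, work_seconds)
            for i in range(n_requests)
        ]
        # Collect results in submission order.
        results = [f.result() for f in futures]
    return results, time.monotonic() - t0


if __name__ == "__main__":
    results, elapsed = serve_concurrently()
    print(f"{len(results)} requests finished in {elapsed:.2f}s")
```

Handled one after another, four 0.2-second requests would take about 0.8 seconds. Because the handlers overlap, the total should stay close to the duration of a single request.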

Parth Mishra

Jul 2, 2018, 6:37:18 PM

to Google App Engine

1. Isn't the deadline exceeded error only for standard App Engine, or does it apply to both? I was testing this today and could make requests whose latency was much greater than 60s.

2. I guess I'm confused about why the app.yaml documentation for Flexible omits things such as 'max_concurrent_requests' and 'target_cpu_throughput'. Do those concurrency concepts not apply to flexible? The only way to scale in the flexible environment is to reach a target CPU threshold, but what if I want to scale based on request latency as well? Somewhat similarly, I want to know if I can configure my web framework to accept new requests without waiting for an existing one to finish. Does that clear up what I'm trying to achieve?

Ani Hatzis

Jul 3, 2018, 4:34:25 AM

to google-a...@googlegroups.com

On Tue, Jul 3, 2018 at 12:37 AM Parth Mishra <pmishr...@gmail.com> wrote:

> 1. Isn't the deadline exceeded error only for standard App Engine, or does it apply to both? I was testing this today and could make requests whose latency was much greater than 60s.

Correct, the request timeout limit is 60 seconds for standard and 60 minutes for flexible (see comparison here).

> 2. I guess I'm confused about why the app.yaml documentation for Flexible omits things such as 'max_concurrent_requests' and 'target_cpu_throughput'. Do those concurrency concepts not apply to flexible? The only way to scale in the flexible environment is to reach a target CPU threshold, but what if I want to scale based on request latency as well? Somewhat similarly, I want to know if I can configure my web framework to accept new requests without waiting for an existing one to finish. Does that clear up what I'm trying to achieve?

Scaling is indeed different between standard and flexible; see the short description here. The "target_cpu_throughput" setting of standard corresponds to "cpu_utilization: target_utilization" in flexible, though.
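
On the question of accepting new requests without waiting for an existing one to finish: in flexible, that is decided by the web server you start in your entrypoint rather than by an app.yaml scaling setting. As a rough sketch, assuming the app is served by gunicorn, a config file along these lines gives each instance several threads per worker process. The file name and every value below are illustrative assumptions, not recommendations.

```python
# gunicorn.conf.py (hypothetical file name; all values are illustrative)
import multiprocessing
import os

# App Engine flexible passes the listening port in the PORT environment
# variable; 8080 is used here as a local fallback.
bind = ":" + os.environ.get("PORT", "8080")

# One worker process per CPU core, each running several threads, so a slow
# request occupies one thread instead of blocking the whole instance.
workers = multiprocessing.cpu_count()
worker_class = "gthread"
threads = 8

# gunicorn's default worker timeout is 30 seconds. It is raised here as a
# precaution because the handler in the question takes about 60 seconds.
timeout = 120
```

It could be loaded with an entrypoint such as `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is an assumed module and WSGI application name. Threads help most when requests spend their time waiting on I/O; for CPU-heavy requests, the number of worker processes matters more.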
