flex env concurrent requests

Venugopal Thotakura

Jul 9, 2019, 2:40:25 PM
to Google App Engine
We have built an app using aiohttp and asyncio. We want to run it in a single-threaded environment with asyncio, because we keep in-memory data that needs to be reused across all requests.

We tried to run this on App Engine in the flex environment (manual scaling, 1 vCPU). Everything works fine, except that requests are processed one after another. When I added one more vCPU, two requests were processed concurrently. So it seems GAE limits the requests it routes to the app based on the number of cores (we tried both running the Python script directly and running it via gunicorn).

Is there a way to solve this with GAE? We have a rate limit in place to handle concurrent requests. I am presuming it is not possible with App Engine; could you please confirm this?

Thanks in advance.

Sam (Google Cloud Support)

Jul 12, 2019, 7:58:38 PM
to google-a...@googlegroups.com
Note that you are only using "manual_scaling: instances: 1". Therefore you only have one instance to accept requests.

The web server (Nginx) in front of your application code accepts the request from the load balancer and attempts to route it to the proper service in your code. If your code is too busy to respond (meaning it is blocking on an older request), the Nginx proxy will time out (after retries) and return "502 Bad Gateway" to the load balancer. It is then up to the client to retry sending requests until your application code is free to accept new ones.

- Therefore it is recommended to ensure that your application never blocks on a single request and is able to handle concurrent requests. As per the documentation [1], the default Python Gunicorn config uses only one worker, which can handle only a single request at a time (that is, no concurrent requests). It is therefore recommended to increase the number of workers as explained in [1] and to use async workers so that your single instance can accept concurrent requests [2].

- Once your single instance is able to handle more requests, it may become overworked (depending on the amount of incoming traffic) and bottleneck on the CPU. In that case it is recommended to increase the instance resources, use more than one instance, or both (automatic scaling is normally recommended for high-traffic applications).
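
As a rough illustration of the first point, a Gunicorn config file for an aiohttp app might look like the sketch below. The file name gunicorn.conf.py, the choice of a single worker (to keep the in-memory data in one process), and the fallback port are assumptions for this example and are not taken from the docs in [1]. The worker class aiohttp.GunicornWebWorker is the one aiohttp ships for running under Gunicorn.

```python
# gunicorn.conf.py (hypothetical file name; a sketch, not a tested deployment)
import os

# One process, so in-memory data is shared by every request (assumption:
# this matches the single-process requirement described in the question).
workers = 1

# aiohttp's Gunicorn worker runs an asyncio event loop inside the worker,
# so a single worker can serve many requests concurrently.
worker_class = "aiohttp.GunicornWebWorker"

# App Engine flex passes the listening port in $PORT; 8080 is a local fallback.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")
```

It could then be started with something like "gunicorn -c gunicorn.conf.py main:app", where main:app is a placeholder for your own module and aiohttp application object.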

[1] https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration
[2] http://docs.gunicorn.org/en/latest/design.html#async-workers
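
To check whether handler code itself is what serializes requests, a small standalone experiment like the one below may help. It is only a sketch using plain asyncio (no aiohttp, no App Engine), and the handler names are made up for illustration. It runs several simulated requests on one event loop, once with a handler that awaits and once with a handler that blocks the thread.

```python
import asyncio
import time


async def non_blocking_handler(delay: float) -> None:
    # Awaiting yields to the event loop, so other requests can make progress.
    await asyncio.sleep(delay)


async def blocking_handler(delay: float) -> None:
    # time.sleep() holds the only thread; every other request has to wait.
    time.sleep(delay)


async def serve_concurrently(handler, n_requests: int, delay: float) -> float:
    """Run n_requests simulated requests on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n_requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    fast = asyncio.run(serve_concurrently(non_blocking_handler, 5, 0.2))
    slow = asyncio.run(serve_concurrently(blocking_handler, 5, 0.2))
    print(f"non-blocking: {fast:.2f}s, blocking: {slow:.2f}s")
```

The non-blocking run should take about as long as one request, while the blocking run should take about as long as all five back to back. If a real handler behaves like the second case, a blocking call inside it (for example a synchronous HTTP or database client) is a likely cause.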