Python Async gRPC Queueing Requests


Raj Agarwal

Mar 26, 2022, 4:30:02 AM
to grpc.io
Hello,

Setup:
We have gRPC pods running in a k8s cluster. The service mesh we use is linkerd. Our gRPC microservices are written in Python (with asyncio as the concurrency mechanism), with the exception of the entry point, which is written in golang (using the gin framework). We have an AWS API GW that talks to an NLB in front of the golang service. The golang service communicates with the backend via NodePort services.

Problem:
Requests on our gRPC microservices can take a while to complete: the average is 8s, and up to 25s at the 99th %ile. In order to handle the load from clients, we've scaled horizontally and spawned many pods to handle concurrent requests. When we send multiple requests to the system, even sequentially, we sometimes notice that a new request goes to the same pod as an ongoing request. This new request can end up getting "queued" on the server side (not fully "queued"; some progress gets made when context switches happen). The issues with queueing like this are:
1. The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap).
2. The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s, which are expected given our hard 30s cap.

What's strange is that we have other pods available, but for some reason the load balancer isn't routing requests to those pods intelligently. It's possible that linkerd's routing doesn't work well for our situation (we need to look into this further, but that would require a big overhaul of our system).

One thing I wanted to try is to stop this queueing up of requests. I want the service to immediately reject a request if one is already in progress, and have the client retry. The retry will hopefully hit a different pod (this is something I'm trying to test as part of this). To do this, I set "maximum_concurrent_rpcs" to 1 on the server (roughly as shown below). However, when I sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions.
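
This is roughly how I'm creating the server (simplified; the servicer registration from our generated code is omitted):

import asyncio
import grpc

async def serve():
    # With maximum_concurrent_rpcs=1, a second RPC arriving while one is
    # already in flight should be rejected with RESOURCE_EXHAUSTED.
    server = grpc.aio.server(maximum_concurrent_rpcs=1)
    # add_MyServiceServicer_to_server(MyServicer(), server)  # generated code, omitted
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()

asyncio.run(serve())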

I then saw online that requests may get queued up on the client side as a default behavior in HTTP/2. So I tried:

import grpc

channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)

However, I'm not seeing any change here either. (Note: even after I get this client working, I'll eventually need to make this run in golang. Any help there would be appreciated as well.)

Questions:
1. How can I get the desired effect here?
2. Is there some way to ensure that at least the earlier requests don't get starved by the new requests?
3. Any other advice on how to fix this issue? I'm grasping at straws here.

Thank you!

Lidi Zheng

Mar 28, 2022, 1:50:11 PM
to grpc.io
Hi,

Thanks for the clear description of the architecture and the problem statement.

Issue 1: Why isn't there any RESOURCE_EXHAUSTED?

Based on the description "Requests on our gRPC microservices can take a while to complete. Average is 8s, up to 25s in the 99th %ile.", the handling logic of the RPC might be CPU-intensive. Concurrency in AsyncIO is achieved by coroutines (and async generators) yielding control back to the event loop when they are doing IO or waiting on another resource. If one coroutine (in this case, the RPC method handler) consumes 8s of CPU continuously, it can starve the other coroutines. This means that even if there are incoming requests, the AsyncIO stack won't have the cycles to read them from the kernel and reject them with RESOURCE_EXHAUSTED.
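
A minimal illustration of the effect (plain AsyncIO, no gRPC; the busy loop stands in for a CPU-heavy handler):

import asyncio
import time

async def cpu_bound_handler():
    # Stands in for an ~8s CPU-heavy RPC handler: it never awaits,
    # so it never yields control back to the event loop.
    deadline = time.monotonic() + 8
    while time.monotonic() < deadline:
        pass

async def incoming_rpc():
    # Stands in for the server trying to process a newly arrived RPC.
    print(f"second coroutine ran at t={time.monotonic() - start:.1f}s")

async def main():
    # incoming_rpc() cannot even start until cpu_bound_handler() finishes.
    await asyncio.gather(cpu_bound_handler(), incoming_rpc())

start = time.monotonic()
asyncio.run(main())  # prints roughly t=8.0s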

If you want to increase the parallelism of CPU-intensive tasks and still use AsyncIO, I would recommend using the AsyncIO executors: https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools
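
A rough sketch of what that could look like in a handler (the servicer/method names and heavy_computation are placeholders for your own code):

import asyncio
from concurrent.futures import ProcessPoolExecutor

# Placeholder for whatever CPU-heavy work the handler currently does inline.
def heavy_computation(request):
    ...

process_pool = ProcessPoolExecutor(max_workers=4)

class MyServicer:  # would inherit from your generated servicer base class
    async def MyMethod(self, request, context):
        loop = asyncio.get_running_loop()
        # Run the CPU-bound part in another process so the event loop
        # keeps cycles to accept (or reject) new RPCs. Note: arguments
        # and return values must be picklable with a process pool.
        response = await loop.run_in_executor(
            process_pool, heavy_computation, request
        )
        return response

A ThreadPoolExecutor also keeps the event loop responsive, but for pure-Python CPU-bound work the GIL usually makes a ProcessPoolExecutor the better choice.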

Issue 2: Uneven RPC distribution.

This is controlled by the load balancer and the sidecars.

---

Question 1: How can I get the desired effect here?

For AsyncIO servers, you can use the AsyncIO Executors as stated above.

For service meshes, there are usually several timeout options. In Envoy, for example, there is HttpConnectionManager.request_timeout: https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto. It disarms the timer when the first byte of the response headers is received. So you could change the RPC service to streaming and have the backend send initial metadata as soon as it decides to handle the request. That way, the request_timeout can be set to a relatively short duration, and when a bad assignment occurs the system recovers more quickly.
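
A rough sketch of the server-streaming side (the names are illustrative; the important piece is the early send_initial_metadata call):

class MyServicer:  # would inherit from your generated servicer base class
    async def MyStreamingMethod(self, request, context):
        # Sending initial metadata emits the response headers right away,
        # which is the signal that disarms the proxy's request_timeout.
        await context.send_initial_metadata((("x-request-accepted", "true"),))

        response = await handle_request(request)  # placeholder for the long-running work
        yield response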

Question 2: Is there some way to ensure that at least the earlier requests don't get starved by the new requests?

In AsyncIO, only one coroutine can have control at a time. See the AsyncIO executors above.

Question 3: Any other advice on how to fix this issue?

See above.

Cheers,
Lidi Zheng