(gRPC-java) How do I implement service throttling?

Ryan Michela

Feb 26, 2017, 2:17:30 AM
to grpc.io
I'd like to implement concurrency throttling for my gRPC-java services so I can limit the number of concurrent executions of my service and put a reasonable cap on the queue of waiting work. One way I've found to do this is to use a custom Executor that limits the number of concurrent tasks and the task queue depth. The downside of this approach is that it applies to all services hosted in a server. I've studied the code and there does not appear to be a way for the server Executor to know which service and operation is being requested.

What is the correct way to implement different throttle policies for individual services running in a Server? Do I really have to create a unique Server instance (with associated port) for every distinct throttle policy?

Finally, would a PR to allow for per-service Executors be accepted?

Carl Mastrangelo

Feb 27, 2017, 3:53:58 PM
to grpc.io
What should happen if a request comes in and the server cannot handle it?  Fail the request immediately?  Queue it?  Drop the connection?  Queue with dropping if overloaded?  You can do most of these from your application without getting gRPC involved.  If you don't want to even parse the request, you can disable automatic inbound flow control and simply not call request().  The data will queue up until the flow control window is exhausted and the client will stop sending.  
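
For illustration, here is a rough sketch of that on a client-streaming handler. The service, message types, and process() method are made up; the relevant API is ServerCallStreamObserver:

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;

// Illustrative handler that only requests the next message from the transport
// once it has finished processing the current one.
public class ThrottledService extends MyServiceGrpc.MyServiceImplBase {
  @Override
  public StreamObserver<Request> record(StreamObserver<Reply> responseObserver) {
    ServerCallStreamObserver<Reply> serverObserver =
        (ServerCallStreamObserver<Reply>) responseObserver;
    serverObserver.disableAutoInboundFlowControl();
    serverObserver.request(1);  // ask the transport for the first message

    return new StreamObserver<Request>() {
      @Override
      public void onNext(Request value) {
        process(value);             // hypothetical unit of work
        serverObserver.request(1);  // only now ask for the next message
      }

      @Override
      public void onError(Throwable t) {
        // nothing to clean up in this sketch
      }

      @Override
      public void onCompleted() {
        responseObserver.onNext(Reply.getDefaultInstance());
        responseObserver.onCompleted();
      }
    };
  }

  private void process(Request value) {
    // the real work would go here
  }
}

Unrequested messages stay in the transport until request() is called again, so the HTTP/2 flow control window ends up doing the queuing for you.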

Ryan Michela

Feb 27, 2017, 11:38:39 PM
to grpc.io
What should happen if a request comes in and the server cannot handle it?  Fail the request immediately?  Queue it?  Drop the connection?  Queue with dropping if overloaded? 

By adding my own Executor to the server (ServerBuilder.executor()) I can control much of this. By fixing the max number of threads I can control the maximum concurrency. By setting the Executor's queue depth, I can control how many backed-up requests I'll accept before turning requesters away. What I can't do is control this on a service-by-service basis. I can only set an Executor for the whole Server. I'd hate to have to allocate a full Server and port for each unique throttling policy.
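
Concretely, the kind of setup I mean looks something like this (pool sizes and the service name are just placeholders):

import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThrottledServer {
  public static void main(String[] args) throws Exception {
    // Fixed pool of 20 threads with at most 10 queued tasks; anything beyond
    // that is rejected with a RejectedExecutionException (AbortPolicy).
    ThreadPoolExecutor throttledExecutor = new ThreadPoolExecutor(
        20, 20, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(10),
        new ThreadPoolExecutor.AbortPolicy());

    Server server = ServerBuilder.forPort(8080)
        .executor(throttledExecutor)      // one policy for every service on this Server
        .addService(new MyServiceImpl())  // placeholder service implementation
        .build()
        .start();
    server.awaitTermination();
  }
}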

What I'd really like is for the Task created for each request to include some kind of metadata about the request so that I can write my own intelligent Executor that enforces throttles and backlog queues on a service-by-service or even operation-by-operation basis.

Carl Mastrangelo

Mar 1, 2017, 2:35:27 PM
to grpc.io
I think you are missing part of the picture.  If a client sends a request to you, you have to either put it somewhere, or drop it.  The OS will buffer the TCP bytes, but will stall all the other incoming bytes as well.  Even if the userspace application, gRPC, ignores those bytes, they are still stuck in a buffer on your machine.  If this buffer grows too much, the OS is going to start dropping packets, and make the connection speed plummet.  The client will see this as deadline exceeded errors.  The server will see this as head-of-line blocking.  This is a bad way to handle overload.

If you accept the data from the OS, into your userspace program, you have control over it.  You can prioritize it over other requests.  (Remember, HTTP/2 multiplexes many requests over a single connection.  If you stall the connection, the RPCs cannot be handled).  Rather than letting clients hang and repeatedly try to reconnect, and send more data than you can buffer, you can directly fail the RPCs with an error indicating that clients should back off.  It is cheap in terms of processing, and decreases latency in overload scenarios.  This is a good way to handle overload.  

Fixing the number of threads in the Server is tantamount to the bad way.  The callbacks from the network thread (Netty in this case) will show up on the queue of the executor.  The data will build up there until you OOM.  If you limit the size of that queue and drop overload, callbacks will not fire, and your application threads will hang indefinitely.  If you don't drop overload, the network threads will hang trying to get a spot in the executor queue, and will stop reading from the network.  You are back to the first, bad solution.  

Limiting concurrency at the HTTP/2 level only works per connection.  It doesn't stop processing of existing RPCs, and it doesn't stop new RPCs from coming in on other connections.  You will still OOM.  

The only other acceptable way to push back on clients is with flow control signals.  You can hold off on calling request() on the server call (gRPC converts request() calls into HTTP/2 window updates), and you can check isReady() to see if sending would block.  
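
On the sending side, the isReady() part looks roughly like this for a server-streaming handler (service and message names are illustrative):

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.Collections;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

public class StreamingService extends MyServiceGrpc.MyServiceImplBase {
  @Override
  public void list(Request request, StreamObserver<Reply> responseObserver) {
    ServerCallStreamObserver<Reply> observer =
        (ServerCallStreamObserver<Reply>) responseObserver;
    Iterator<Reply> replies = loadReplies(request);  // hypothetical data source
    AtomicBoolean completed = new AtomicBoolean(false);
    // Write only while the transport can accept more; the handler runs again
    // each time the call transitions back to ready.
    observer.setOnReadyHandler(() -> {
      while (observer.isReady() && replies.hasNext()) {
        observer.onNext(replies.next());
      }
      if (!replies.hasNext() && completed.compareAndSet(false, true)) {
        observer.onCompleted();
      }
    });
  }

  private Iterator<Reply> loadReplies(Request request) {
    return Collections.emptyIterator();  // placeholder
  }
}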

I think your problem can be solved in ways other than the one you think you need.  You are assuming too much about how gRPC works.  If you are afraid of malicious users, add auth.

Ryan Michela

Mar 1, 2017, 4:42:25 PM
to grpc.io
OK, then let's back up.

As an example, my DBAs tell me I cannot have an unlimited number of concurrent connections to the database. Let's call the number 20. I need to restrict the number of concurrent executions of my service to no more than 20. During periods of high load, I may receive requests faster than my 20 concurrent executions can handle, causing incoming calls to back up waiting for available resources. When a backup occurs, I want to start turning requests away immediately if more than 10 requests are backed up. This way the caller can try another host rather than keep hammering the overloaded host.

What is the correct way to implement the above throttle policy in gRPC? I need to implement standardized throttling behavior for all my services in a way that doesn't require every developer of every service to insert the same brittle boilerplate into their services.

Carl Mastrangelo

Mar 3, 2017, 12:28:29 AM
to grpc.io
Use a Semaphore and call tryAcquire() on it before calling request().   Share the semaphore between all your Server handlers.  You can do this with a ServerInterceptor.  If tryAcquire() times out, you can immediately fail the request.   It is up to you how long you are willing to wait before dropping load.  
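
Something along these lines, for example (the permit count and status message are up to you; this variant fails immediately rather than waiting on a timed tryAcquire()):

import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;
import java.util.concurrent.Semaphore;

// One shared interceptor instance means one shared permit pool across all
// the services it is attached to.
public class ThrottleInterceptor implements ServerInterceptor {
  private final Semaphore permits = new Semaphore(20);  // illustrative cap

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call,
      Metadata headers,
      ServerCallHandler<ReqT, RespT> next) {
    if (!permits.tryAcquire()) {
      // Turn the caller away right now so it can back off or try another host.
      call.close(Status.RESOURCE_EXHAUSTED.withDescription("server is overloaded"),
          new Metadata());
      return new ServerCall.Listener<ReqT>() {};  // no-op listener
    }
    ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);
    return new SimpleForwardingServerCallListener<ReqT>(delegate) {
      @Override
      public void onComplete() {
        permits.release();
        super.onComplete();
      }

      @Override
      public void onCancel() {
        permits.release();
        super.onCancel();
      }
    };
  }
}

You would attach the same instance when binding each service, e.g. ServerInterceptors.intercept(new MyServiceImpl(), throttleInterceptor).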

Note that calling request() causes a message to be parsed and delivered to your StreamObserver or ServerCall.Listener, whichever API you are using.

That said, at least internally, many services deal with the problem of overload by using load balancing and provisioning additional serving capacity.  Each team has a different idea about how to handle overload, so we give you the controls to deal with it rather than make a choice that may not be applicable to all users. 

Ryan Michela

Mar 26, 2018, 1:35:33 PM
to grpc.io
Netflix has released a gRPC-compatible adaptive client and server throttle for Java.
