I think you are missing part of the picture. If a client sends a request to you, you have to either put it somewhere or drop it. The OS will buffer the TCP bytes, but it will stall all other bytes arriving on that connection as well. Even if the userspace application (gRPC) ignores those bytes, they are still stuck in a buffer on your machine. If that buffer grows too large, the OS will start dropping packets and the connection's throughput will plummet. The client sees this as deadline-exceeded errors; the server sees it as head-of-line blocking. This is a bad way to handle overload.
If you accept the data from the OS into your userspace program, you have control over it. You can prioritize it relative to other requests. (Remember, HTTP/2 multiplexes many requests over a single connection; if you stall the connection, none of its RPCs can be handled.) Rather than letting clients hang, repeatedly reconnect, and send more data than you can buffer, you can fail the RPCs directly with an error indicating that clients should back off. This is cheap in terms of processing and reduces latency in overload scenarios. This is a good way to handle overload.
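As a sketch of the "fail fast" approach, here is a minimal load-shedding gate using only the JDK. The class name, the limit, and the method names are made up for illustration; in gRPC Java this check would typically sit in a ServerInterceptor that closes rejected calls with Status.RESOURCE_EXHAUSTED so clients know to back off.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical load-shedding gate: caps the number of in-flight RPCs
// and rejects the rest immediately instead of buffering them.
final class LoadShedder {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    LoadShedder(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    // Returns true if the RPC may proceed; false means fail it now,
    // cheaply, rather than queueing work you cannot absorb.
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight) {
                return false; // over capacity: reject instead of buffering
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Call when the RPC completes (or is cancelled).
    void release() {
        inFlight.decrementAndGet();
    }
}
```

Rejecting at admission like this costs a compare-and-swap per call, which is why failing fast stays cheap even while the server is saturated.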
Fixing the number of threads in the Server amounts to the bad way. The callbacks from the network thread (Netty, in this case) land on the executor's queue. Data will build up there until you OOM. If you limit the size of that queue and drop work on overload, callbacks will never fire and your application threads will hang indefinitely. If you don't drop work, the network threads will block waiting for a spot in the executor queue and stop reading from the network. You are back to the first, bad solution.
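The bounded-queue failure mode can be demonstrated with the JDK alone (no gRPC involved; the class name is made up). One worker thread is pinned, the queue holds two tasks, and the next submission is rejected outright, which is the "callback never fires" case described above:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    // Submits four tasks to a 1-thread executor with a 2-slot queue;
    // returns how many submissions were rejected.
    static int submitFour() throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy()); // reject when full

        CountDownLatch block = new CountDownLatch(1);
        // Occupy the single worker so queued tasks cannot drain.
        executor.execute(() -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        executor.execute(() -> { }); // queued (1/2)
        executor.execute(() -> { }); // queued (2/2)

        int rejected = 0;
        try {
            executor.execute(() -> { }); // queue full: dropped, never runs
        } catch (RejectedExecutionException e) {
            rejected++;
        }

        block.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected=" + submitFour());
    }
}
```

If the rejected task had been a network callback, whatever was waiting on it would hang forever; if you swap AbortPolicy for blocking, the submitting (network) thread stalls instead. Either way you lose.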
Limiting concurrency at the HTTP/2 level (MAX_CONCURRENT_STREAMS) only works per connection. It doesn't stop processing of existing RPCs, and it doesn't stop new connections from bringing in new RPCs. You will OOM.
The only other acceptable way to push back on clients is with flow-control signals. You can withhold calls to request() on the server call, which gRPC translates into HTTP/2 window updates (or the absence of them), and you can check isReady() to see whether sending would block.
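The mechanics can be sketched as a toy permit model, using only the JDK. This is an analogy, not gRPC's implementation: the receiver grants capacity the way request(n) does, and the sender polls readiness the way isReady() does. All names here are invented for illustration.

```java
import java.util.concurrent.Semaphore;

// Toy model of receiver-driven flow control: the receiver grants
// permits, and the sender only proceeds while permits remain.
final class FlowWindow {
    private final Semaphore permits = new Semaphore(0);

    // Receiver side: analogous to ServerCall.request(n) --
    // grants capacity for n more messages.
    void request(int n) {
        permits.release(n);
    }

    // Sender side: analogous to isReady() -- true if a send
    // would not block right now.
    boolean isReady() {
        return permits.availablePermits() > 0;
    }

    // Sender side: consume one permit before sending;
    // false means the window is exhausted.
    boolean trySend() {
        return permits.tryAcquire();
    }
}
```

The key property is that the receiver, not the sender, decides when capacity exists, so a slow server naturally throttles its clients instead of buffering their data.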
I think your problem can be solved in ways other than the one you think you need. You are assuming too much about how gRPC works. If you are afraid of malicious users, add auth.