Bare server throughtput limitation?

51 views

Skip to first unread message

dyla...@gmail.com

unread,

May 19, 2020, 11:14:27 AM5/19/20

to grpc.io

I am designing a neural network inference server and I have built my server and client using a synchronous grpc model, with a unary RPC design. For reference, the protobuf formats are based on the Nvidia Triton Inference server formats https://github.com/NVIDIA/triton-inference-server. My design expects a large batch of inputs (16384, for a total size of 1MB) to be received by the server, the inference to be run, and then the result to be returned to the client. I send these inputs in a repeated bytes field in my protobuf. However, even if I make my server-side function simply return an OK status (no actual processing), I find that the server can only process ~1500-2000 batches of inputs per second (this is run with both server and client on the same machine so network limitations should not be relevant). However, I know that my inference processing can handle throughputs closer to 10000 batches/second.

Is there an inherent limitation to the number of requests that a gRPC server can handle per second? Is there a server setting or design change I can make to increase this maximum throughput?

I am happy to provide more information if it can help in understanding my issue.

Thanks for your help,

-Dylan

yas...@google.com

unread,

May 28, 2020, 7:25:55 PM5/28/20

to grpc.io

Hi,

Though there is no inherent limitation to the number of requests that a server can handle, the sync processing model has not been the target for performance optimizations. The Async model would be better suited for performance optimizations, and you might be able to increase your throughput.

Reply all

Reply to author

Forward

0 new messages