Correct me if I'm wrong: basically, if the ThreadPoolExecutor's max_workers is less than maximum_concurrent_rpcs, then once all the threads are busy processing requests, the next request is queued and processed when a thread finishes. If my server is already processing maximum_concurrent_rpcs requests concurrently and yet another request arrives, that request is rejected immediately.
I have a gRPC server as simple as below:
from concurrent import futures
import grpc

self.server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=1),
    maximum_concurrent_rpcs=1000,
)
self.server.add_insecure_port('{}:{}'.format(host, port))
self.server.start()
Now with max_workers equal to 1, I expected to be able to handle up to 1000 concurrent RPCs before the server started rejecting requests. But when I start the server, I can only get ONE RPC processed at a time! When I increase max_workers to 2, I can get 2 RPCs processed concurrently, and with max_workers set to 3, I can send 3 RPC calls at the same time. Why is maximum_concurrent_rpcs being ignored?

To test this behavior I put a sleep(3) in my exposed gRPC method.
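The same experiment can be reproduced without gRPC by timing a ThreadPoolExecutor directly. This is a sketch of the measurement, using a 0.1 s sleep as a stand-in for the sleep(3) in the real handler:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_rpc(_):
    # Stand-in for the slow gRPC method body.
    time.sleep(0.1)

def run(workers, calls=3):
    """Time how long `calls` concurrent requests take with `workers` threads."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_rpc, range(calls)))
    return time.monotonic() - start

serial = run(workers=1)    # ~0.3 s: the three calls run one by one
parallel = run(workers=3)  # ~0.1 s: all three calls run at once
print(serial > parallel)   # → True
```

Throughput tracks the worker count, not the number of tasks the pool is willing to queue, which mirrors what the server experiment shows.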
Can someone explain why concurrency is bound by max_workers? As far as I know, the default max_workers is derived from the CPU count (from CPython's concurrent.futures source):
if max_workers is None:
    # Use this number because ThreadPoolExecutor is often
    # used to overlap I/O instead of CPU work.
    max_workers = (os.cpu_count() or 1) * 5
if max_workers <= 0:
    raise ValueError("max_workers must be greater than 0")
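For reference, that default can be computed directly. Note that the `* 5` formula above comes from older CPython; since Python 3.8 the default is `min(32, os.cpu_count() + 4)`, so the exact value depends on your interpreter:

```python
import os

# Default worker count from the snippet above (pre-3.8 CPython).
legacy_default = (os.cpu_count() or 1) * 5

# Default since Python 3.8, capped so machines with many cores
# don't spawn an unreasonable number of threads for I/O work.
modern_default = min(32, (os.cpu_count() or 1) + 4)

print(legacy_default, modern_default)
```

Either way, the default is a small multiple of the core count, nowhere near 1000, which is consistent with the behavior observed above.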