I am using grpc for bidirection streaming service. Beause this is an online real-time service, and we don't want our clients to wait too long, so we want to limit the rpcs to a certain number and reject the extra requests.
We are able to do it in Python with following code:
```
grpcServer = grpc.server(futures.ThreadPoolExecutor(max_workers=10), maximum_concurrent_rpcs=10)
```
In such case, only 10 requests will be processed, and other request will be rejected, the clients will throw a RESOURCE_EXHAUSTED error.
However, we find it hard to do it in C++. We use code following
https://stackoverflow.com/questions/52311876/grpc-sync-server-limit-handle-thread```
ServerBuilder builder;
builder.SetSyncServerOption(ServerBuilder::SyncServerOption::NUM_CQS, 10);
grpc::ResourceQuota quota;
quota.SetMaxThreads(10 * 2);
builder.SetResourceQuota(quota);
```
We are using grpc in many services. Some using kaldi, some using libtorch.
In some cases, the above code behaves normally, it processes 10 requests at a time, and rejects other requests. and the processing speed (our service requires a log of cpu calculation) is ok.
In some cases, it only accepts 9 request at a time.
In some cases, it accepts 10 request at a time, but the processing speed is significantly lower than before.
We also tried
```
builder.AddChannelArgument(GRPC_ARG_MAX_CONCURRENT_STREAMS, 10);
```
But it is useless becuase GRPC_ARG_MAX_CONCURRENT_STREAMS is the max_concurrent_rpcs on a single http connection.
Could someone please point out the equivalent C++ version of the Python code? We want to make our service to handle 10 requests at a time, and rejects other request, we do not want any service to wait in the queue.