completion queue distribution, memory footprint


TataG

Oct 3, 2021, 7:33:21 PM10/3/21
to grpc.io
Hi gRPC group members.

I was looking at how the RPCs are distributed on the server side.
In the grpc_bench C++ multithreaded example https://github.com/LesnyRumcajs/grpc_bench/blob/master/cpp_grpc_mt_bench/main.cpp, multiple threads handle RPCs, with a completion queue per thread, which I assume follows the performance best practices recommendations at https://grpc.io/docs/guides/performance/
The client is https://gist.github.com/ppLorins/f30e6a2e14c3738288200b43a803b122, with the difference that the async completion threads are started before the data is sent and the client sends unary generic RPCs.
When the client sends 100K unary RPCs per channel over 10 channels to that server, I can see that the RPCs are processed unevenly on the server side: some threads with their own completion queue got more, some got less. Since each thread does exactly the same processing, I expected an even number of processed events by the end.
May I ask whether this is expected, and why?
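For concreteness, the one-completion-queue-per-thread layout I mean is roughly the following. This is a simplified sketch, not the actual grpc_bench code; the port, thread count, and handler details are placeholders:

#include <grpcpp/grpcpp.h>
#include <grpcpp/generic/async_generic_service.h>
#include <memory>
#include <thread>
#include <vector>

int main() {
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());

  // Generic async service, since the serialization is custom.
  grpc::AsyncGenericService generic_service;
  builder.RegisterAsyncGenericService(&generic_service);

  // One completion queue per worker thread, per the performance best-practices guide.
  const int kNumThreads = 10;
  std::vector<std::unique_ptr<grpc::ServerCompletionQueue>> cqs;
  for (int i = 0; i < kNumThreads; ++i) {
    cqs.push_back(builder.AddCompletionQueue());
  }

  auto server = builder.BuildAndStart();

  std::vector<std::thread> workers;
  for (int i = 0; i < kNumThreads; ++i) {
    workers.emplace_back([&, i] {
      // The real code first posts a RequestCall() for this CQ; each thread
      // then drains only its own queue. Which CQ a new call lands on is
      // decided inside gRPC, which is where the imbalance shows up.
      void* tag;
      bool ok;
      while (cqs[i]->Next(&tag, &ok)) {
        // Cast the tag back to per-call state and advance its state
        // machine here (RequestCall / Read / Write / Finish).
      }
    });
  }
  for (auto& t : workers) t.join();  // runs until the server is shut down
  return 0;
}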

Another question regarding overall performance: I can see that the client can send data faster than the server can process it, so at some point the client has to wait before it can continue sending. This happens no matter whether the client has 1 channel sending 100K unary requests, or the server has 10 threads with a completion queue per thread; multiple server threads sometimes even performed worse than a single thread.
This results in the client's RAM consumption growing to very large numbers. There are no memory leaks (direct or indirect), but even after all RPCs are processed and all channels are destroyed, the memory footprint doesn't go down.
May I ask for any suggestions in this regard?
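The shape of the client loop is roughly as follows. This is a simplified sketch rather than the gist itself; the method name, payload, and the cap on outstanding calls are placeholders, and the cap is simply how the client ends up waiting when it gets ahead of the server:

#include <grpcpp/grpcpp.h>
#include <grpcpp/generic/generic_stub.h>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Per-call state kept alive until its Finish tag comes back on the CQ.
struct PendingCall {
  grpc::ClientContext context;
  grpc::ByteBuffer response;
  grpc::Status status;
  std::unique_ptr<grpc::GenericClientAsyncResponseReader> reader;
};

int main() {
  auto channel = grpc::CreateChannel("localhost:50051", grpc::InsecureChannelCredentials());
  grpc::GenericStub stub(channel);
  grpc::CompletionQueue cq;

  const int kMaxInFlight = 1000;  // placeholder cap on outstanding RPCs
  int in_flight = 0;
  std::mutex mu;
  std::condition_variable cv;

  // Completion thread: drains the CQ and releases one slot per finished call.
  std::thread completer([&] {
    void* tag;
    bool ok;
    while (cq.Next(&tag, &ok)) {
      delete static_cast<PendingCall*>(tag);
      std::lock_guard<std::mutex> lock(mu);
      --in_flight;
      cv.notify_one();
    }
  });

  for (int i = 0; i < 100000; ++i) {
    {
      // Block here whenever the server falls behind.
      std::unique_lock<std::mutex> lock(mu);
      cv.wait(lock, [&] { return in_flight < kMaxInFlight; });
      ++in_flight;
    }
    auto* call = new PendingCall;
    grpc::Slice payload("hello");        // stands in for the custom serialization
    grpc::ByteBuffer request(&payload, 1);
    call->reader = stub.PrepareUnaryCall(&call->context, "/my.Service/MyMethod",
                                         request, &cq);
    call->reader->StartCall();
    call->reader->Finish(&call->response, &call->status, call);
  }

  // Wait for the remaining calls, then shut the CQ down and join.
  {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return in_flight == 0; });
  }
  cq.Shutdown();
  completer.join();
  return 0;
}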

I'm using the generic API because I have my own serialization, and unary RPCs because I was planning to eventually run multiple servers as pods and use load balancing to send RPCs to them in a round-robin manner.

Thank you in advance!


Mark D. Roth

Oct 6, 2021, 2:32:59 PM10/6/21
to TataG, grpc.io
On Sun, Oct 3, 2021 at 4:33 PM TataG <tatiana...@gmail.com> wrote:
Hi gRPC group members.

I was looking at how the RPCs are distributed on the server side.
In the grpc_bench C++ multithreaded example https://github.com/LesnyRumcajs/grpc_bench/blob/master/cpp_grpc_mt_bench/main.cpp, multiple threads handle RPCs, with a completion queue per thread, which I assume follows the performance best practices recommendations at https://grpc.io/docs/guides/performance/
The client is https://gist.github.com/ppLorins/f30e6a2e14c3738288200b43a803b122, with the difference that the async completion threads are started before the data is sent and the client sends unary generic RPCs.
When the client sends 100K unary RPCs per channel over 10 channels to that server, I can see that the RPCs are processed unevenly on the server side: some threads with their own completion queue got more, some got less. Since each thread does exactly the same processing, I expected an even number of processed events by the end.
May I ask whether this is expected, and why?

The CQ-based async API can't provide perfectly even balancing across CQs; some amount of imbalance between threads is expected.

In general, we are trying to move away from the CQ-based async API and towards the new callback-based async API, described in https://github.com/grpc/proposal/blob/master/L67-cpp-callback-api.md.  You might try that out and see if it solves this balancing problem for you, although it may be a bit of a trade-off right now, since there are still some other performance issues with the new callback-based API that will be resolved once the EventEngine migration is complete (see https://github.com/grpc/proposal/pull/245 for details; the ETA for that is probably still about 6 months away).
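For reference, a unary handler in the callback API looks roughly like this. It's only a sketch, using the standard helloworld generated service rather than the generic API, and it assumes a gRPC release where the callback API is no longer under the experimental:: namespace:

#include <grpcpp/grpcpp.h>
#include "helloworld.grpc.pb.h"  // generated from the standard helloworld.proto example

class GreeterService final : public helloworld::Greeter::CallbackService {
  grpc::ServerUnaryReactor* SayHello(grpc::CallbackServerContext* context,
                                     const helloworld::HelloRequest* request,
                                     helloworld::HelloReply* reply) override {
    reply->set_message("Hello " + request->name());
    auto* reactor = context->DefaultReactor();
    reactor->Finish(grpc::Status::OK);
    return reactor;
  }
};

int main() {
  GreeterService service;
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);
  // No AddCompletionQueue() calls: the library schedules the callbacks itself,
  // so there is no per-thread CQ to balance.
  auto server = builder.BuildAndStart();
  server->Wait();
  return 0;
}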
 

Another question regarding overall performance: I can see that the client can send data faster than the server can process it, so at some point the client has to wait before it can continue sending. This happens no matter whether the client has 1 channel sending 100K unary requests, or the server has 10 threads with a completion queue per thread; multiple server threads sometimes even performed worse than a single thread.
This results in the client's RAM consumption growing to very large numbers. There are no memory leaks (direct or indirect), but even after all RPCs are processed and all channels are destroyed, the memory footprint doesn't go down.
May I ask for any suggestions in this regard?

It's hard to say what's happening here without a heap profile of some sort.  Note that most gRPC objects are ref-counted internally, so even after you "destroy" them via the API, they may stick around for a little while longer until the final callbacks have been invoked and everything is cleaned up.  So you could try waiting a few seconds after you destroy the channels to see whether the memory usage goes down.
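Purely as a sketch, the kind of teardown-then-wait sequence I mean is below. The channel, completion queue, and thread names are placeholders for whatever your client actually has, and the VmRSS helper is just one Linux-specific way to look at the footprint:

#include <grpcpp/grpcpp.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <thread>

// Linux-only helper: print the process's resident set size.
void ReportResidentMemory() {
  std::ifstream status("/proc/self/status");
  for (std::string line; std::getline(status, line);) {
    if (line.rfind("VmRSS:", 0) == 0) std::cout << line << "\n";
  }
}

// Assumes 'channel' is the last user-held reference to the channel and
// 'completer' is the thread that drains 'cq'.
void ShutdownAndMeasure(std::shared_ptr<grpc::Channel> channel,
                        grpc::CompletionQueue& cq,
                        std::thread& completer) {
  channel.reset();   // drop our reference; internal refs may linger briefly
  cq.Shutdown();     // no new tags are accepted; pending ones are still delivered
  completer.join();  // the drain thread exits once cq.Next() returns false

  // Give the library a few seconds to run its final cleanup callbacks
  // before looking at the footprint.
  std::this_thread::sleep_for(std::chrono::seconds(5));
  ReportResidentMemory();
}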
 

I'm using the generic API because I have my own serialization, and unary RPCs because I was planning to eventually run multiple servers as pods and use load balancing to send RPCs to them in a round-robin manner.

Thank you in advance!




--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.