Multiple CallData on Server-side and performance


Sachin Gaikwad

May 3, 2016, 8:17:55 AM
to grp...@googlegroups.com
Hi all,

Short-story:

In the C++ examples in the grpc git repository, usually only 1 CallData is registered with the Cq at a time, and the next CallData is registered only once the current request is *being processed* (enum state_ = PROCESS).
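For anyone less familiar with the pattern, here is a self-contained sketch of that state machine. FakeCq stands in for grpc::ServerCompletionQueue and none of these names are real gRPC API; the point to notice is that a replacement CallData is created only once the current one starts processing, so at most one slot is ever waiting for a new RPC:

```cpp
#include <cassert>
#include <queue>

struct CallData;

// Illustrative stand-in for grpc::ServerCompletionQueue.
struct FakeCq {
  std::queue<CallData*> events;  // completed operations, tagged by CallData*
  int outstanding_slots = 0;     // CallData registered and awaiting an RPC
};

struct CallData {
  enum State { PROCESS, FINISH };
  State state = PROCESS;
  FakeCq* cq;

  // The CREATE step: like service_->RequestFoo(..., this) in the grpc
  // examples, this registers the object as a slot for the next incoming RPC.
  explicit CallData(FakeCq* c) : cq(c) { cq->outstanding_slots++; }

  // Called by the event loop when cq->Next() hands back this tag.
  void Proceed() {
    if (state == PROCESS) {
      cq->outstanding_slots--;   // this slot is now consumed by a live RPC
      new CallData(cq);          // only NOW is the next slot registered
      // ...handle the request; Finish() would re-enqueue `this`...
      cq->events.push(this);
      state = FINISH;
    } else {                     // FINISH: response sent, clean up
      delete this;
    }
  }
};
```

So until an RPC actually reaches the PROCESS branch, the server has exactly one outstanding slot for incoming calls.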

Questions:
1) What if we have multiple CallData objects registered with the Cq? Will it improve QPS (queries per second) or latency?

2) Where are parsing of RPC args, decompression, etc. done? Does the thread that blocks in Cq->Next() do this after waking up on an event, or does some background thread handle it?

Long story:

I wrote a grpc benchmark with 2 threads on the server and 2 threads on the client.
Server: one thread calls Cq->Next(); when it receives an event, processing of the RPC is done in another thread.
Client: one thread calls Cq->AsyncNext(); when it receives an event, processing of the response is done in another thread.

At first, I registered only 1 CallData with the server, and I got these numbers in my benchmark:

My grpc benchmark with 1 CallData:
QPS: 608
Latency (50/95/99 percentile): 12788/15605/15861 usecs

The qps_driver benchmark performed like this:

qps benchmark with 10000 CallData: (1 thread)
QPS: 948
Latency (50/95/99 percentile): 10814/11752/12382 usecs

When I started comparing my grpc benchmark with the test/cpp/qps/ benchmark, I found that the qps_worker/qps_driver benchmark registers multiple CallData (10000, to be precise) at startup.

I added a gflag to my grpc benchmark for the number of CallData to register on the server side. I ran the benchmark with this flag set to 2, i.e. registering 2 CallData with the server Cq. Here are the numbers I got:

My grpc benchmark with 2 CallData:
QPS: 859
Latency: (50/95/99 percentile): 9098/10880/11376 usecs

There is a significant increase in QPS (from 608 to 859) and a drop in latency as well. I am not able to understand why having multiple CallData registered with the Cq helps. Can anyone explain this to me?
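One way to build intuition about this (my own toy model, not anything from the grpc source) is pipelining: with only one CallData registered, an RPC that arrives while the previous one is being processed has no pre-requested slot and must wait. The plain C++ queueing model below makes the effect visible; `slots` plays the role of the number of registered CallData, and all numbers are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: requests arrive every `arrival` ticks, each needs `service`
// ticks of handling, and `slots` requests can be in flight at once. A
// request arriving with no free slot waits until one frees up. Returns the
// average latency (completion time minus arrival time) over nreq requests.
double avg_latency(int slots, int nreq, int arrival, int service) {
  std::vector<int> free_at(slots, 0);  // tick at which each slot frees up
  long total = 0;
  for (int i = 0; i < nreq; ++i) {
    int t = i * arrival;               // arrival time of request i
    int best = 0;                      // pick the slot that frees earliest
    for (int s = 1; s < slots; ++s)
      if (free_at[s] < free_at[best]) best = s;
    int start = std::max(t, free_at[best]);
    free_at[best] = start + service;
    total += (start + service) - t;
  }
  return double(total) / nreq;
}
```

With arrival=1 and service=2, one slot falls behind and latency grows without bound, while two slots keep up and every request sees just the bare service time. That matches the direction of the measurements above, though the real mechanism in gRPC is of course more involved.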

Thanks,
Sachin

Sachin Gaikwad

May 4, 2016, 2:53:34 AM
to grp...@googlegroups.com
Does anyone have any thoughts on these 2 questions below?

Questions:
1) What if we have multiple CallData objects registered with the Cq? Will it improve QPS (queries per second) or latency? As per my results, having multiple CallData registered with the server improves both QPS and latency, but I don't understand why.

2) Where are parsing of RPC args, decompression, etc. done? Does the thread that blocks in Cq->Next() do this after waking up on an event, or does some background thread handle it?

Sachin

Craig Tiller

May 4, 2016, 11:01:05 AM
to Sachin Gaikwad, grp...@googlegroups.com

1) is certainly true: the server fast path is when a call has already been requested by the application code; the slow path is when there has been no such request.

2) metadata parsing happens on whatever thread picks up the data. Decompression is done on the thread that receives the cq event, as is protobuf parsing. There are no* background threads.

* except for name resolution


--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAKtXhSOZ4CngcEr%2B_jSWb5vL9a1DHS_SyjBG6PO6R0oYhcVTGA%40mail.gmail.com.

Sachin Gaikwad

May 4, 2016, 3:03:50 PM
to Craig Tiller, grp...@googlegroups.com
Thanks Craig.

On Wed, May 4, 2016 at 8:30 PM, Craig Tiller <cti...@google.com> wrote:

1) is certainly true: the server fast path is when a call has already been requested by the application code, and the slow path is when there has been no such request.

Can you point me to the grpc code to read for the "fast path" and "slow path"?

Sachin

Craig Tiller

Jun 1, 2016, 11:33:12 AM
to Sachin Gaikwad, grp...@googlegroups.com
Take a look at publish_new_rpc() in src/core/lib/surface/server.c.

The fast path is whenever we can call publish_call() directly from that function; the slow path is when we get to the end, need to take server->mu_call, and queue the incoming rpc until it's requested.
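In rough outline, the shape of that dispatch logic looks like the sketch below. Only the names publish_new_rpc, publish_call, and mu_call come from the real source (src/core/lib/surface/server.c); the types and everything else here are invented for illustration:

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Invented stand-ins: a RequestedCall is a slot the application registered
// (via a CallData requesting a call); an IncomingRpc is a new call arriving
// off the wire.
struct RequestedCall {};
struct IncomingRpc { bool published = false; };

struct Server {
  std::deque<RequestedCall> requested_calls;  // slots awaiting an RPC
  std::mutex mu_call;                         // guards the pending queue
  std::deque<IncomingRpc*> pending;           // RPCs with no slot yet

  // Hand the RPC to an application-requested slot.
  void publish_call(IncomingRpc* rpc) { rpc->published = true; }

  // Returns true on the fast path, false on the slow path.
  bool publish_new_rpc(IncomingRpc* rpc) {
    if (!requested_calls.empty()) {   // fast path: a slot is already waiting
      requested_calls.pop_front();
      publish_call(rpc);
      return true;
    }
    // Slow path: take the lock and queue until the app requests a call.
    std::lock_guard<std::mutex> lock(mu_call);
    pending.push_back(rpc);
    return false;
  }
};
```

That also matches the benchmark result earlier in the thread: the more CallData the application keeps registered, the more often an incoming RPC finds a waiting slot and stays on the fast path.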