Multiple CallData on Server-side and performance


Sachin Gaikwad

May 3, 2016, 8:17:55 AM
to grp...@googlegroups.com
Hi all,

Short-story:

In the C++ examples in the grpc git repository, usually only 1 CallData is registered with the Cq at a time, and the next CallData is registered only once the current request is *being processed* (enum state_ = PROCESS).
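For anyone less familiar with the pattern, here is a self-contained sketch of that state machine. FakeCq stands in for grpc::ServerCompletionQueue and none of these names are real gRPC API; the point to notice is that a replacement CallData is created only once the current one starts processing, so at most one slot is ever waiting for a new RPC:

```cpp
#include <cassert>
#include <queue>

struct CallData;

// Illustrative stand-in for grpc::ServerCompletionQueue.
struct FakeCq {
  std::queue<CallData*> events;  // completed operations, tagged by CallData*
  int outstanding_slots = 0;     // CallData registered and awaiting an RPC
};

struct CallData {
  enum State { PROCESS, FINISH };
  State state = PROCESS;
  FakeCq* cq;

  // The CREATE step: like service_->RequestFoo(..., this) in the grpc
  // examples, this registers the object as a slot for the next incoming RPC.
  explicit CallData(FakeCq* c) : cq(c) { cq->outstanding_slots++; }

  // Called by the event loop when cq->Next() hands back this tag.
  void Proceed() {
    if (state == PROCESS) {
      cq->outstanding_slots--;   // this slot is now consumed by a live RPC
      new CallData(cq);          // only NOW is the next slot registered
      // ...handle the request; Finish() would re-enqueue `this`...
      cq->events.push(this);
      state = FINISH;
    } else {                     // FINISH: response sent, clean up
      delete this;
    }
  }
};
```

So until an RPC actually reaches the PROCESS branch, the server has exactly one outstanding slot for incoming calls.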

Questions:
1) What if we have multiple CallData objects registered with the Cq? Will it improve QPS (queries per second) or latency?

2) Where are parsing of RPC args, decompression, etc. done? Does the thread that blocks in Cq->Next() do this after waking up on an event, or does some background thread handle it?

Long story:

I wrote a grpc benchmark with 2 threads on the server and 2 threads on the client.
Server: one thread calls Cq->Next(); when it receives an event, processing of the RPC is done in another thread.
Client: one thread calls Cq->AsyncNext(); when it receives an event, processing of the response is done in another thread.

At first, I registered only 1 CallData with the server, and I got these numbers in my benchmark:

My grpc benchmark with 1 CallData:
QPS: 608
Latency (50/95/99 percentile): 12788/15605/15861 usecs

The qps_driver benchmark performed like this:

qps benchmark with 10000 CallData: (1 thread)
QPS: 948
Latency (50/95/99 percentile): 10814/11752/12382 usecs

When I started comparing my grpc benchmark with the test/cpp/qps/ benchmark, I found that the qps_worker/qps_driver benchmark registers multiple CallData (10000, to be precise) at startup.

I added a gflag to my grpc benchmark for the number of CallData to register on the server side. I ran the benchmark with this flag set to 2, i.e. registering 2 CallData with the server Cq. Here are the numbers I got:

My grpc benchmark with 2 CallData:
QPS: 859
Latency: (50/95/99 percentile): 9098/10880/11376 usecs

There is a significant increase in QPS (from 608 to 859) and a drop in latency as well. I am not able to understand why having multiple CallData registered with the Cq helps. Can anyone explain this to me?
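One way to build intuition about this (my own toy model, not anything from the grpc source) is pipelining: with only one CallData registered, an RPC that arrives while the previous one is being processed has no pre-requested slot and must wait. The plain C++ queueing model below makes the effect visible; `slots` plays the role of the number of registered CallData, and all numbers are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: requests arrive every `arrival` ticks, each needs `service`
// ticks of handling, and `slots` requests can be in flight at once. A
// request arriving with no free slot waits until one frees up. Returns the
// average latency (completion time minus arrival time) over nreq requests.
double avg_latency(int slots, int nreq, int arrival, int service) {
  std::vector<int> free_at(slots, 0);  // tick at which each slot frees up
  long total = 0;
  for (int i = 0; i < nreq; ++i) {
    int t = i * arrival;               // arrival time of request i
    int best = 0;                      // pick the slot that frees earliest
    for (int s = 1; s < slots; ++s)
      if (free_at[s] < free_at[best]) best = s;
    int start = std::max(t, free_at[best]);
    free_at[best] = start + service;
    total += (start + service) - t;
  }
  return double(total) / nreq;
}
```

With arrival=1 and service=2, one slot falls behind and latency grows without bound, while two slots keep up and every request sees just the bare service time. That matches the direction of the measurements above, though the real mechanism in gRPC is of course more involved.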

Thanks,
Sachin

Sachin Gaikwad

May 4, 2016, 2:53:34 AM
to grp...@googlegroups.com
Does anyone have any thoughts on these 2 questions below?

Questions:
1) What if we have multiple CallData objects registered with the Cq? Will it improve QPS (queries per second) or latency? As per my results, having multiple CallData registered with the server improves both QPS and latency, but I don't understand why.

2) Where are parsing of RPC args, decompression, etc. done? Does the thread that blocks in Cq->Next() do this after waking up on an event, or does some background thread handle it?

Sachin

Craig Tiller

May 4, 2016, 11:01:05 AM
to Sachin Gaikwad, grp...@googlegroups.com

1) is certainly true: the server fast path is when a call has already been requested by the application code; the slow path is when there has been no such request.

2) metadata parsing happens on whatever thread picks up the data. Decompression is done on the thread that receives the cq event, as is protobuf parsing. There are no* background threads.

* except for name resolution


--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAKtXhSOZ4CngcEr%2B_jSWb5vL9a1DHS_SyjBG6PO6R0oYhcVTGA%40mail.gmail.com.

Sachin Gaikwad

May 4, 2016, 3:03:50 PM
to Craig Tiller, grp...@googlegroups.com
Thanks Craig.

On Wed, May 4, 2016 at 8:30 PM, Craig Tiller <cti...@google.com> wrote:

1) is certainly true: the server fast path is when a call has already been requested by the application code, and the slow path is when there has been no such request.

Can you point me to the grpc code to read for the "fast path" and "slow path"?

Sachin

Craig Tiller

Jun 1, 2016, 11:33:12 AM
to Sachin Gaikwad, grp...@googlegroups.com
Take a look at publish_new_rpc() in src/core/lib/surface/server.c.

The fast path is whenever we can call publish_call() directly from that function; the slow path is when we get to the end, need to take server->mu_call, and queue the incoming rpc until it's requested.
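In rough outline, the shape of that dispatch logic looks like the sketch below. Only the names publish_new_rpc, publish_call, and mu_call come from the real source (src/core/lib/surface/server.c); the types and everything else here are invented for illustration:

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Invented stand-ins: a RequestedCall is a slot the application registered
// (via a CallData requesting a call); an IncomingRpc is a new call arriving
// off the wire.
struct RequestedCall {};
struct IncomingRpc { bool published = false; };

struct Server {
  std::deque<RequestedCall> requested_calls;  // slots awaiting an RPC
  std::mutex mu_call;                         // guards the pending queue
  std::deque<IncomingRpc*> pending;           // RPCs with no slot yet

  // Hand the RPC to an application-requested slot.
  void publish_call(IncomingRpc* rpc) { rpc->published = true; }

  // Returns true on the fast path, false on the slow path.
  bool publish_new_rpc(IncomingRpc* rpc) {
    if (!requested_calls.empty()) {   // fast path: a slot is already waiting
      requested_calls.pop_front();
      publish_call(rpc);
      return true;
    }
    // Slow path: take the lock and queue until the app requests a call.
    std::lock_guard<std::mutex> lock(mu_call);
    pending.push_back(rpc);
    return false;
  }
};
```

That also matches the benchmark result earlier in the thread: the more CallData the application keeps registered, the more often an incoming RPC finds a waiting slot and stays on the fast path.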