gRPC C++ async performance

pol...@gmail.com

May 25, 2016, 10:17:14 AM

to grpc.io

Hello,

I'm testing the performance of a gRPC C++ asynchronous client and server, and I'm wondering about the huge number of syscalls made by the gRPC chttp2 transport implementation.

My perf test is based on qps_test from upstream, simplified to cover only one case:

- async client (1 thread) calling a bidirectional streaming RPC method

- async server (1 thread)

- 5 seconds for warmup and then 10 seconds for benchmarking

- client opens a single connection and, in an infinite loop, sends a message containing a 200-byte string

message SimpleRequest { string data = 1; }

- server responds with another 200-byte string

message SimpleResponse { string data = 1; }

Results on my commodity Linux machine:

warming up for 5 seconds
running benchmark for 10 seconds
messages: 145465
elapsed: 9.99862 s
throughput: 14548.5 msg/s
latencies:
50%: 63.8073 us
75%: 77.0946 us
90%: 92.4911 us
99%: 118.268 us
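As a sanity check on these numbers: throughput is messages divided by elapsed time (145465 / 9.99862 s, about 14548.5 msg/s). Assuming the client keeps exactly one message in flight on the stream (a ping-pong pattern, which is an assumption here, not something stated above), the mean round trip is the reciprocal of throughput, roughly 68.7 us. That lies between the reported 50% and 75% latencies, so the figures look self-consistent. A minimal sketch; both function names are hypothetical:

```cpp
#include <cassert>

// Messages per second over the measured window.
double ThroughputMsgPerSec(long messages, double elapsed_s) {
  assert(elapsed_s > 0.0);
  return static_cast<double>(messages) / elapsed_s;
}

// Assumes exactly one request is outstanding at a time, so the mean
// round-trip time in microseconds is simply 1 / throughput.
double ImpliedMeanRoundTripUs(double msg_per_sec) {
  assert(msg_per_sec > 0.0);
  return 1e6 / msg_per_sec;
}
```

With one message in flight, every microsecond spent in a syscall adds directly to that round trip, which is why the syscall count matters so much in this setup.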

With FlameGraph I see that the program spends the vast majority of its time performing syscalls.

Is there any way to avoid making a syscall after each ClientAsyncReaderWriter::Write() call?

I see the WriteOptions class has a GRPC_WRITE_BUFFER_HINT flag, but it is not actually used in the chttp2 transport implementation.

--

Sergey

Craig Tiller

May 25, 2016, 10:41:56 AM

to pol...@gmail.com, grpc.io

Hey,

This is a useful analysis, thank you!

GRPC_WRITE_BUFFER_HINT is certainly the right knob for delaying the syscall after an API write, but as you noticed, it's not currently implemented (though it used to be). It's on my hit list for the coming months, but it needs some significant refactoring of the chttp2 write path.
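To make the intended semantics of a buffer hint concrete, here is a toy model. It is not chttp2 code, and the class and method names are hypothetical. Writes flagged with the hint are only queued, and the first write without the hint sends everything queued so far in one simulated syscall:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of buffer-hint semantics (hypothetical, not the gRPC transport).
class HintedWriter {
 public:
  // buffer_hint == true means "more data is coming, don't hit the wire yet".
  void Write(const std::string& msg, bool buffer_hint) {
    pending_ += msg;
    if (!buffer_hint) Flush();
  }

  // Stand-in for a single sendmsg()/write() call carrying all queued bytes.
  void Flush() {
    if (pending_.empty()) return;
    syscall_sizes_.push_back(pending_.size());
    pending_.clear();
  }

  std::size_t syscalls() const { return syscall_sizes_.size(); }
  const std::vector<std::size_t>& syscall_sizes() const {
    return syscall_sizes_;
  }

 private:
  std::string pending_;                     // bytes not yet "sent"
  std::vector<std::size_t> syscall_sizes_;  // bytes per simulated syscall
};
```

Writing nine 200-byte messages with the hint and a tenth without it produces a single 2000-byte "syscall" instead of ten 200-byte ones. The cost is latency for the hinted messages, which wait until an unhinted write (or an explicit flush) arrives.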

Another way to achieve this would be to have two threads polling each completion queue: the second thread will see that a write is in progress and return immediately, leaving the first to perform the actual write.
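The effect described here is a form of write combining. The sketch below is a generic model of that idea, not how the gRPC C core implements it; `CombiningWriter`, `Submit` and `DoSyscall` are hypothetical names. A thread that finds a write already in progress only enqueues its bytes and returns, and the active writer picks them up in its next batch:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Generic write-combining model (hypothetical, not gRPC code).
class CombiningWriter {
 public:
  void Submit(const std::string& msg) {
    std::unique_lock<std::mutex> lock(mu_);
    pending_ += msg;
    if (writing_) return;  // another thread is writing and will send these bytes
    writing_ = true;
    while (!pending_.empty()) {
      std::string batch;
      batch.swap(pending_);  // take everything queued so far
      lock.unlock();         // let other threads keep enqueuing meanwhile
      DoSyscall(batch);
      lock.lock();
    }
    writing_ = false;
  }

  std::size_t syscalls() const { return syscalls_.load(); }
  std::size_t bytes_written() const { return bytes_.load(); }

 private:
  // Stand-in for one sendmsg()/write(); only the active writer calls it.
  void DoSyscall(const std::string& batch) {
    syscalls_.fetch_add(1);
    bytes_.fetch_add(batch.size());
  }

  std::mutex mu_;
  std::string pending_;
  bool writing_ = false;
  std::atomic<std::size_t> syscalls_{0};
  std::atomic<std::size_t> bytes_{0};
};
```

With a single thread, every Submit() costs one syscall. With several threads submitting concurrently, the syscall count can drop below the message count while no bytes are lost, because the enqueue and the writer's emptiness check happen under the same mutex. The trade-off is that one thread may end up paying for other threads' writes.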

Craig

Sergey Polovko

May 25, 2016, 10:57:46 AM

to Craig Tiller, grpc.io

Good news!

Some time ago my colleague implemented an approach for reducing the number of syscalls in the Netty library: https://github.com/netty/netty/issues/1759 (unfortunately it is still an open issue).

This approach has been used in our internal RPC framework for many years, and in my perf tests (comparing gRPC with our RPC framework) it shows huge benefits both in messages processed per second and in per-message latency.
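The details of the linked Netty issue are not reproduced here. As a generic illustration of the underlying mechanism, collapsing several queued messages into one syscall, the sketch below uses POSIX `writev()`. It makes no claim about what Netty's or gRPC's code looks like, and `FlushQueued` is a hypothetical name:

```cpp
#include <sys/uio.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <vector>

// Sends all queued messages with one writev() instead of one write() each.
// Returns the number of bytes written, or -1 on error.
// Caveats: writev() may write fewer bytes than requested (e.g. on a
// non-blocking socket), and the iovec count is capped by IOV_MAX (1024 on
// Linux); a real implementation would loop and resubmit the remainder.
ssize_t FlushQueued(int fd, const std::vector<std::string>& queued) {
  std::vector<iovec> iov;
  iov.reserve(queued.size());
  for (const std::string& m : queued) {
    iovec v;
    v.iov_base = const_cast<char*>(m.data());  // writev only reads this
    v.iov_len = m.size();
    iov.push_back(v);
  }
  return writev(fd, iov.data(), static_cast<int>(iov.size()));
}
```

With ten queued 200-byte messages this issues one 2000-byte writev() where a naive loop would issue ten write() calls. The bytes are not copied into a single buffer first; the kernel gathers them from the separate strings.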

--

Sergey

Craig Tiller

May 25, 2016, 11:08:54 AM

to Sergey Polovko, grpc.io

That's effectively what you get with multiple threads running against a CQ, though without the latency hit of forcing a context switch on the initial write. We currently pass through a lock, though that'll be going away in the next month or so (see https://github.com/grpc/grpc/pull/6407 for where we're going with that).

We have a fairly strong policy of not spinning up threads within gRPC beyond those that applications give us. Right now we do have to create one for name resolution, but that's also something we're hoping to address soon (and really get to the point where gRPC core doesn't create threads).

Eric Anderson

May 25, 2016, 1:18:55 PM

to Sergey Polovko, Craig Tiller, grpc.io

On Wed, May 25, 2016 at 7:57 AM, Sergey Polovko <pol...@gmail.com> wrote:

> Some time ago my colleague implemented approach for optimising number of syscalls in Netty library https://github.com/netty/netty/issues/1759 (unfortunately still an open issue though).

This is what grpc-java is doing, and it did give us a noticeable speedup. As Craig mentions, though, it isn't directly applicable to how the C core operates.

Craig Tiller

May 30, 2016, 2:05:19 PM

to Eric Anderson, Sergey Polovko, grpc.io

I've been experimenting with delaying the write until the next poll occurs: basically signalling an event when the transport is ready to write, and only pulling the bytes together and dumping them on the wire when some thread picks that event up.
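A toy, single-threaded model of that "write on next poll" idea follows. It is an illustration under stated assumptions, not the experimental chttp2 code, and `DeferredWriteTransport`, `Send` and `Poll` are hypothetical names. Send() never touches the socket; it appends bytes and raises a write-ready flag, and only a later Poll() gathers everything sent since the previous poll into one simulated syscall:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "defer the write until the next poll" model.
class DeferredWriteTransport {
 public:
  // API-level write: queue the bytes and signal that a write is needed.
  void Send(const std::string& msg) {
    pending_ += msg;
    write_ready_ = true;
  }

  // Run by whichever thread performs the next poll iteration.
  void Poll() {
    if (!write_ready_) return;  // nothing to flush, no syscall
    write_ready_ = false;
    syscall_sizes_.push_back(pending_.size());  // stand-in for sendmsg()
    pending_.clear();
  }

  const std::vector<std::size_t>& syscall_sizes() const {
    return syscall_sizes_;
  }

 private:
  std::string pending_;
  bool write_ready_ = false;
  std::vector<std::size_t> syscall_sizes_;  // bytes per simulated syscall
};
```

Five Send() calls followed by one Poll() yield a single 1000-byte write for 200-byte messages, and a Poll() with nothing pending issues no syscall at all. How much batching this buys depends on how many API writes land between polls.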