gRPC C++ performance gap vs Go

AlexN

Aug 24, 2023, 3:46:47 AM
to grpc.io

Hi,

When looking at the QPS benchmarks (and reproducing them), the performance gap between C++ and Go is more than 2x (Go is faster). The test scenarios appear to be quite similar: Streaming secure throughput QPS (8 cores).

I tried to look deeper into it and found that the C++ server emits lots of small outgoing TCP messages (~100-200 bytes). These appear to take a considerable amount of CPU time. For the Go server, the number of small messages was only slightly higher than the number of large (~1.5 KB) outgoing messages.

I wonder if anyone has explored this, and whether there are any directions (or maybe existing implementations) for improving the short-message performance of the C++ implementation?

Thanks,

Alex

Jeff Steger

Aug 24, 2023, 12:32:42 PM
to AlexN, grpc.io

My first thought is that maybe Nagle's algorithm is on for one and off for the other. That's probably configurable somewhere. Anyway, I am purely speculating.
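For reference, at the plain-socket level Nagle is controlled by the TCP_NODELAY option. Something along these lines, purely to illustrate the knob (plain POSIX sockets; this is not how gRPC configures its own fds internally):

#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>   // setsockopt
#include <cstdio>

// Disable Nagle's algorithm on an already-connected TCP socket `fd`.
bool disable_nagle(int fd) {
  int flag = 1;  // 1 = push segments out immediately, 0 = allow coalescing
  if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) != 0) {
    perror("setsockopt(TCP_NODELAY)");
    return false;
  }
  return true;
}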



AlexN

Aug 27, 2023, 3:39:52 AM
to grpc.io
Thanks Jeff, that's an interesting idea. I've tried looking into it.
As far as the gRPC performance test configuration is concerned, it made no significant difference (for the high-QPS scenario) whether I set
        "channel_args": [
          {
            "name": "grpc.optimization_target",
            "str_value": "latency"
to either "latency" or "throughput". I believe this is the handle that controls the TCP_NODELAY socket option.
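For completeness, here is roughly how I understand the same knob can be set programmatically in the C++ API (a sketch only, not taken from the benchmark driver; the addresses and the "throughput" value are just examples):

#include <grpcpp/grpcpp.h>

#include <memory>
#include <string>

void BuildWithOptimizationTarget() {
  // Server side: GRPC_ARG_OPTIMIZATION_TARGET expands to "grpc.optimization_target".
  grpc::ServerBuilder builder;
  builder.AddChannelArgument(GRPC_ARG_OPTIMIZATION_TARGET, std::string("throughput"));
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  // builder.RegisterService(&service);  // service registration elided
  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();

  // Client side: the same argument goes through ChannelArguments.
  grpc::ChannelArguments args;
  args.SetString(GRPC_ARG_OPTIMIZATION_TARGET, "throughput");
  auto channel = grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}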

I'm afraid it is somewhere in the gRPC "inner workings." I've tried to use strace (sparingly, since it certainly affects execution), and it shows the small messages already at the system call interface. Moreover, it shows that send/write system calls are usually preempted by "themselves" from other server threads, though the latter could be an artifact of using strace at a rate of hundreds of thousands of packets per second.