How to reproduce the latency and QPS benchmark numbers for grpc-java?


aruf...@gmail.com

Dec 22, 2016, 12:11:41 PM
to grpc.io
Hi!

The benchmark page shows grpc-java achieving a unary call latency of ~300us and a QPS of ~150k between two 8-core VMs.


How do I reproduce these numbers with AsyncClient and AsyncServer? What command-line parameters were used to produce them?

I ran the benchmark on a pair of 16-core VMs with a 40 Gbps network in the same datacenter.

I ran the server with:
java io.grpc.benchmarks.qps.AsyncServer --address=0.0.0.0:9000 --transport=netty_nio

and the client with:
java io.grpc.benchmarks.qps.AsyncClient --address=server:9000 --transport=netty_nio
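
(For context, what the async client essentially drives per outstanding RPC is a non-blocking unary call, roughly like the sketch below. The class names such as BenchmarkServiceGrpc and Messages.SimpleRequest are my assumption based on the benchmark protos; this is illustrative, not the actual benchmark code.)

import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.StreamObserver;
import io.grpc.benchmarks.proto.BenchmarkServiceGrpc;
import io.grpc.benchmarks.proto.Messages.SimpleRequest;
import io.grpc.benchmarks.proto.Messages.SimpleResponse;
import java.util.concurrent.TimeUnit;

public class AsyncCallSketch {
  public static void main(String[] args) throws InterruptedException {
    // One plaintext channel pointed at the benchmark server.
    ManagedChannel channel = NettyChannelBuilder.forAddress("server", 9000)
        .usePlaintext(true)
        .build();
    BenchmarkServiceGrpc.BenchmarkServiceStub stub = BenchmarkServiceGrpc.newStub(channel);

    // Each outstanding RPC is a non-blocking unary call; a benchmark keeps N of
    // these in flight per channel by issuing a new call from onCompleted().
    stub.unaryCall(SimpleRequest.getDefaultInstance(), new StreamObserver<SimpleResponse>() {
      @Override public void onNext(SimpleResponse response) { /* record latency */ }
      @Override public void onError(Throwable t) { t.printStackTrace(); }
      @Override public void onCompleted() { /* issue the next call */ }
    });

    // Crude wait so the in-flight call can finish before the JVM exits.
    channel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
  }
}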

The results:

Channels:                       4
Outstanding RPCs per Channel:   10
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     2151
90%ile Latency (in micros):     8087
95%ile Latency (in micros):     10607
99%ile Latency (in micros):     17711
99.9%ile Latency (in micros):   39359
Maximum Latency (in micros):    413951
QPS:                            10917


To optimize for latency, I ran the server with --directexecutor and the client with --channels=1 --outstanding_rpcs=1.
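In full, that should be roughly the same commands as above with those flags added:

java io.grpc.benchmarks.qps.AsyncServer --address=0.0.0.0:9000 --transport=netty_nio --directexecutor
java io.grpc.benchmarks.qps.AsyncClient --address=server:9000 --transport=netty_nio --channels=1 --outstanding_rpcs=1

The results: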

Channels:                       1
Outstanding RPCs per Channel:   1
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     617
90%ile Latency (in micros):     1011
95%ile Latency (in micros):     2025
99%ile Latency (in micros):     7659
99.9%ile Latency (in micros):   18255
Maximum Latency (in micros):    125567
QPS:                            1094


To optimize for throughput, I ran the client with --directexecutor, --channels=32, and --outstanding_rpcs=1000:

Channels:                       32
Outstanding RPCs per Channel:   1000
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     167935
90%ile Latency (in micros):     520447
95%ile Latency (in micros):     652799
99%ile Latency (in micros):     1368063
99.9%ile Latency (in micros):   2390015
Maximum Latency (in micros):    3741695
QPS:                            120428


Without --directexecutor in the server and client, with --channels=32 and --outstanding_rpcs=1000:

Channels:                       32
Outstanding RPCs per Channel:   1000
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     347135
90%ile Latency (in micros):     1097727
95%ile Latency (in micros):     1499135
99%ile Latency (in micros):     2330623
99.9%ile Latency (in micros):   3735551
Maximum Latency (in micros):    6463487
QPS:                            55969


What is the recommended configuration to achieve the claimed throughput of 150k QPS? What parameters were used to generate the published numbers? I haven't been able to find that documented anywhere.


Thanks!


Alpha

alpha....@gmail.com

Dec 23, 2016, 11:20:23 AM
to grpc.io, aruf...@gmail.com
I was referring to the benchmark numbers on this page: http://www.grpc.io/docs/guides/benchmarking.html

The numbers I quoted were from the performance dashboard.

It would be great if the page linked to the benchmark implementation and the exact command-line flags used.

Alpha

Carl Mastrangelo

Dec 29, 2016, 12:44:07 PM
to grpc.io, aruf...@gmail.com
The benchmarks are run using LoadClient and LoadServer, in a sibling directory. The test you ran is designed to maximize QPS rather than minimize latency. The latency benchmarks are designed to be controlled by a coordinating process found in the C core repo. There are separate scripts to build the Java code and then run it. The driver is here: https://github.com/grpc/grpc/blob/master/tools/run_tests/run_performance_tests.py
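Roughly, invoking the driver looks like this (the -l language flag is an assumption on my part; check the script's --help for the exact options):

git clone https://github.com/grpc/grpc.git
cd grpc
python tools/run_tests/run_performance_tests.py -l java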


As a general note: more channels (in Java) don't result in higher QPS. The Channel abstraction is designed to be reused by lots of threads, so you usually only need one. Using 32 in your case will almost certainly hurt performance. (I personally use 4.)
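
To make that concrete, here is a rough sketch of sharing a single channel across many worker threads (the BenchmarkServiceGrpc and Messages types are my assumption, taken from the benchmark protos; this is illustrative, not the load-test code):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.benchmarks.proto.BenchmarkServiceGrpc;
import io.grpc.benchmarks.proto.Messages.SimpleRequest;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedChannelSketch {
  public static void main(String[] args) throws InterruptedException {
    // One ManagedChannel for the whole process; it multiplexes all RPCs over
    // its HTTP/2 connection(s) and is safe to share across threads.
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("server", 9000).usePlaintext(true).build();

    ExecutorService workers = Executors.newFixedThreadPool(16);
    for (int i = 0; i < 16; i++) {
      workers.submit(() -> {
        // Stubs are cheap; each thread builds its own from the shared channel.
        BenchmarkServiceGrpc.BenchmarkServiceBlockingStub stub =
            BenchmarkServiceGrpc.newBlockingStub(channel);
        for (int j = 0; j < 1000; j++) {
          stub.unaryCall(SimpleRequest.getDefaultInstance());
        }
      });
    }
    workers.shutdown();
    workers.awaitTermination(1, TimeUnit.MINUTES);
    channel.shutdownNow();
  }
}

The stubs themselves are lightweight; the single channel owns the connection(s) and multiplexes everything over HTTP/2.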