gRPC for low latency Distributed Services


raut...@gmail.com

Aug 8, 2016, 3:35:03 AM
to grpc.io
Hi,
Has anyone here used gRPC to implement low-latency distributed services?

How does gRPC perform when every RPC call between a server and a client needs to finish within 2 ms? Can gRPC do this for the 99th percentile of traffic at, say, 50,000 RPC calls per second?

Does anyone have any experience with such a system yet?

I have benchmarked it myself, and it is orders of magnitude slower than a custom messaging bus written over ZeroMQ. Am I doing it wrong?
With the custom ZeroMQ messaging bus we get latencies on the order of microseconds between two services on the same host (21 µs average), versus about 2 ms average for gRPC.

That is extremely slow compared to ZeroMQ, roughly 100x slower going by those numbers.

How can I fix this?

Thanks,

Koen De Keyser

Aug 8, 2016, 5:50:48 PM
to grpc.io
I have used a bidirectional, long-lived streaming RPC in gRPC, over which messages are then sent in both directions. In that case, the observed latencies were lower than 2 ms (I think I recall it was on the order of a hundred microseconds, but I will need to double-check), over 40 Gbps Ethernet. This was using the C++ library with the async implementation.

This approach moves the connection and RPC setup cost out of the per-message latency. If you tested with standard single request/response RPCs, that might explain the higher latency. Also, my messages were rather simple, and I expect the protobuf serialization to be quite fast; larger messages will incur some latency due to the more complex serialization. Maybe using FlatBuffers as the serialization layer could help you there, but I don't have any experience with that.
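
To make the long-lived stream pattern concrete, here is a minimal sketch in Go (Go rather than C++ purely for brevity). The pb package, the Echo service, its Chat method, and the Msg type are placeholders for whatever your generated stubs look like, not real gRPC names; treat it as an illustration of the pattern rather than working code for your setup.

    package main

    import (
        "context"
        "log"

        "google.golang.org/grpc"

        pb "example.com/echo/pb" // placeholder for your generated stubs
    )

    func main() {
        // Dial once; the TCP/TLS/HTTP-2 setup cost is paid here, not per message.
        conn, err := grpc.Dial("server:50051", grpc.WithInsecure())
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        client := pb.NewEchoClient(conn)

        // Open one long-lived bidirectional stream and keep it for the
        // lifetime of the client.
        stream, err := client.Chat(context.Background())
        if err != nil {
            log.Fatalf("open stream: %v", err)
        }

        // From here on, each message only pays serialization plus one
        // HTTP/2 data frame in each direction.
        for i := 0; i < 1000; i++ {
            if err := stream.Send(&pb.Msg{Payload: []byte("ping")}); err != nil {
                log.Fatalf("send: %v", err)
            }
            if _, err := stream.Recv(); err != nil {
                log.Fatalf("recv: %v", err)
            }
        }
    }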

Koen

Ken Payson

Aug 8, 2016, 6:55:37 PM
to Koen De Keyser, grpc.io
We have continuously running performance tests tracking gRPC master:

https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5760820306771968

These tests use small, simple messages. For C++ the latency is around 200 µs.

As Koen noted, the first message on a connection has higher latency, and that could explain the 2 ms you observed.

Ken




Eric Anderson

Aug 9, 2016, 11:59:18 AM
to raut...@gmail.com, grpc.io
On Mon, Aug 8, 2016 at 12:35 AM, <raut...@gmail.com> wrote:
With the custom ZeroMQ messaging bus we get latencies on the order of microseconds between two services on the same host (21 µs average), versus about 2 ms average for gRPC.

Did you reuse the ClientConn between RPCs?

In our performance tests on GCE (on not-very-special machines, where netperf shows ~100 µs), we see ~300 µs latency for unary RPCs and ~225 µs latency for streaming in Go.
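
For reference, this is roughly what reusing the ClientConn looks like in Go; the pb package and the Pinger/Ping names below are placeholders for your generated stubs, so treat it as a sketch of the pattern only.

    package main

    import (
        "context"
        "log"

        "google.golang.org/grpc"

        pb "example.com/ping/pb" // placeholder for your generated stubs
    )

    func main() {
        // One Dial for the lifetime of the process; the ClientConn
        // multiplexes all RPCs over a single HTTP/2 connection.
        conn, err := grpc.Dial("server:50051", grpc.WithInsecure())
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        client := pb.NewPingerClient(conn) // stubs are cheap; the ClientConn is the expensive part

        for i := 0; i < 100000; i++ {
            // Each unary call reuses the existing connection: no new
            // TCP/TLS handshake or HTTP/2 setup per RPC.
            if _, err := client.Ping(context.Background(), &pb.PingRequest{}); err != nil {
                log.Fatalf("ping: %v", err)
            }
        }
    }

If you instead call grpc.Dial inside the request loop, every RPC pays a fresh connection setup, which by itself can account for milliseconds.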

Pradeep Singh

Aug 9, 2016, 1:33:31 PM
to Eric Anderson, grpc.io
Oh, I was running the benchmark included in the gRPC source code.
I think it reuses the same connection.

300us sounds really good.

What latency do you guys notice when client and server are running on different hosts?

Thanks,
--
Pradeep Singh

Koen De Keyser

Aug 9, 2016, 3:56:16 PM
to grpc.io, ej...@google.com, raut...@gmail.com
I have measured an average round-trip latency of 320 µs (A sends a message to B, and B then sends a message back to A). This was using streaming RPC (the C++ implementation), with two machines (Xeon E5) connected over 40 Gbps Ethernet. This does include some additional logic on both sides, but nothing substantial, so pure gRPC latency might actually be slightly lower.

Koen

Pradeep Singh

Aug 9, 2016, 4:21:00 PM
to Koen De Keyser, grpc.io, Eric Anderson
That is nice.
Thank you for the reply Koen.


--
Pradeep Singh

Carl Mastrangelo

Aug 9, 2016, 8:00:26 PM
to grpc.io, ej...@google.com, raut...@gmail.com
On machines within the same network, you can expect latencies in the low hundreds of microseconds. I have personally measured numbers in the 100-200 microsecond range on nearby machines. I had to tune the server somewhat to achieve this, but it is possible.

Pradeep Singh

Aug 9, 2016, 8:25:29 PM
to Carl Mastrangelo, grpc.io, Eric Anderson
Thanks Carl.

And what throughput can you achieve at these latencies?
Sending one request and receiving one response is fine, but what happens to latency when the request rate reaches 50K requests per second? In particular, what are the average latency and throughput at the point where CPU cores are saturated on either the client or the server?

I understand that latency and throughput do not go hand in hand, but I would love to know your numbers at the point where latency starts crossing the millisecond boundary.

    --Pradeep
--
Pradeep Singh

Carl Mastrangelo

Aug 9, 2016, 9:12:24 PM
to Pradeep Singh, grpc.io, Eric Anderson
The latency numbers are a little tricky to interpret with respect to throughput. Latency and throughput are at odds with each other, and optimizing one usually comes at the cost of the other. (And, generally speaking, latency is more important than throughput.)

When testing for latency, I create a client and a server running on separate machines. The client sends a single message and waits for the response; upon receiving it, the client sends another. We call this a closed-loop benchmark. It is effectively single-threaded, so as not to introduce additional noise into the system. (We also vary whether or not an additional executor is used when handling responses, which can change the latency by about 25 µs.) In such a setup, I see around 200 µs latency, which works out to around 5,000 QPS for a single core.
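
A closed-loop measurement along those lines could be sketched in Go as follows (again with placeholder pb/Pinger stubs rather than any real API): one loop, one outstanding request at a time, and the elapsed time recorded per call.

    package main

    import (
        "context"
        "log"
        "sort"
        "time"

        "google.golang.org/grpc"

        pb "example.com/ping/pb" // placeholder for your generated stubs
    )

    func main() {
        conn, err := grpc.Dial("server:50051", grpc.WithInsecure())
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
        client := pb.NewPingerClient(conn)

        const n = 10000
        lat := make([]time.Duration, 0, n)

        // Closed loop: exactly one request in flight at any time, so the
        // numbers below reflect per-RPC latency, not throughput.
        for i := 0; i < n; i++ {
            start := time.Now()
            if _, err := client.Ping(context.Background(), &pb.PingRequest{}); err != nil {
                log.Fatalf("ping: %v", err)
            }
            lat = append(lat, time.Since(start))
        }

        sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
        log.Printf("median=%v p99=%v p99.9=%v", lat[n/2], lat[n*99/100], lat[n*999/1000])
    }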

When running the benchmark to max out QPS instead, I can get much higher throughput. The median latency in such tests is around 50 ms, for an aggregate throughput of about 300-400 KQPS (and 186 ms at the 99.9th percentile). This is with a Java server and client, each on a 32-core machine. We can go much higher, and I have filed a number of performance issues on the grpc-java GitHub project. (All the code is available in our benchmarks directory, so you can reproduce the numbers yourself.)

gRPC gives you good defaults out of the box. The numbers I am mentioning here were achieved by looking more closely at the setup and making the appropriate changes. Our setup is careful to avoid lock contention, avoid thread context switches, avoid allocating memory where possible, and obey the flow-control signals. We prefer the async API to the synchronous one.

All our numbers are visible on the dashboard mentioned earlier. If you describe your use case, we can tell you what approximate performance to expect and how to achieve it.

Pradeep Singh

Aug 9, 2016, 9:34:55 PM
to Carl Mastrangelo, grpc.io, Eric Anderson
We are planning to write our own message bus for a low-latency ad platform.
With very tight SLAs on response time (<80 ms), we want a solution that gives us good throughput while keeping most of the traffic within that latency budget.

What complicates the problem is that there are 4 hops involved before a response is sent out.
This means we either write our own RPC implementation in C (*shudders*) or use a substitute that can get us there.

Since I am responsible for this, I would like to evaluate whether gRPC is the right fit for such a peculiar use case.

Thanks,
--
Pradeep Singh

Carl Mastrangelo

Aug 9, 2016, 9:43:19 PM
to Pradeep Singh, grpc.io, Eric Anderson
When you are talking at the millisecond level, gRPC is likely not going to show up as a significant cost. Adding up the cost of message serialization, encryption, headers, etc. will maybe account for a millisecond of time inside your client; the rest of the delay will come from your network (and, of course, your actual application code).

It sounds like gRPC is designed for your use. I don't think your use case is unusual at all.

Pradeep Singh

Aug 9, 2016, 10:43:09 PM
to Carl Mastrangelo, grpc.io, Eric Anderson
Thank you, Carl, for being patient with all the questions.

Appreciate it.
--
Pradeep Singh