gRPC throughput for streaming RPCs


Vivek M

Jun 2, 2017, 9:37:22 AM
to grpc.io
Hi,

We have a gRPC streaming server that serves a single streaming RPC, but a large amount of data has to be streamed over it. Our customer will invoke this RPC only once (they are not OK with having multiple streaming calls running). We hit a throughput issue, and we observed that by increasing the HTTP/2 window size from its default of 64K we were able to achieve more throughput.

However, I would like to know how we can achieve more throughput with the default 64K window size. Is there a way to tell the gRPC stack to use multiple streams per streaming RPC? Instead of using one stream with a larger window, it could create and use multiple small streams with 64K windows, dynamically opening a new stream whenever it senses that the existing active streams are choked.

If not, what other options do we have to increase throughput with the default 64K window?

Thanks,
Vivek

Carl Mastrangelo

Jun 2, 2017, 4:21:46 PM
to grpc.io
I am assuming you are using gRPC Java:

* Setting the window size correctly will have the biggest win; it should be roughly equal to the bandwidth-delay product (BDP). 64K was picked as a generally safe guess, but it isn't correct in all environments. There is work underway to tune this automatically, but it hasn't landed yet.

* If you have exactly 1 RPC active at a time, there are optimizations to make the DATA frames larger (16K by default, set by the remote side's settings). You can change this (though, to be honest, I have never tried it and don't know how) so that each message fits in a single frame and doesn't need to be cut up.

* If you have more than 1 RPC active, each message is cut into 1K chunks so that every RPC gets fair access to the wire. This was changed on master and will be available in 1.5, but you can run master to try it out. This ONLY helps if there is more than one active RPC.

* If you are pushing more than 10 Gbps, you can run into TLS bottlenecks. This is almost certainly not applicable to most people. You can create multiple channels to get around it, but you give up in-order delivery. I would save this as a last resort.

* Making multiple RPCs will slow down your code, due to the header overhead of each RPC. On our performance dashboard ( http://performance-dot-grpc-testing.appspot.com/explore?dashboard=5636470266134528 ) you can see that streaming throughput is about 2-2.5x faster than making separate RPCs.
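To make the first point concrete, here is a quick back-of-the-envelope BDP calculation (the 1 Gbps link speed and 30 ms RTT are made-up example numbers, not figures from this thread):

```python
# Bandwidth-delay product: the amount of data that can be "in flight"
# on the link. The flow-control window should be at least this big,
# or the sender stalls waiting for WINDOW_UPDATE frames.

def bdp_bytes(bandwidth_bits_per_sec, rtt_sec):
    """BDP in bytes = bandwidth (bits/s) * RTT (s) / 8."""
    return int(bandwidth_bits_per_sec * rtt_sec / 8)

# Example: a 1 Gbps link with a 30 ms round-trip time.
bdp = bdp_bytes(1_000_000_000, 0.030)
print(bdp)            # 3750000 bytes, ~3.6 MiB
print(bdp // 65536)   # ~57x the default 64 KiB window
```

In other words, on a link like this the default 64K window would leave the pipe almost entirely empty.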


What kind of bottlenecks do you see, and what are your target goals?  

Vivek M

Jun 2, 2017, 5:37:42 PM
to grpc.io
Hi Carl,

It's a C++ streaming server.

There will be only one streaming RPC active per client. At most 3-4 clients will connect to the server and invoke this RPC, but only one RPC will be active per client connection. So I assume the two points below are relevant to my case; please find my replies inline.


* Setting the window size correctly will have the biggest win; it should be roughly equal to the bandwidth-delay product (BDP). 64K was picked as a generally safe guess, but it isn't correct in all environments. There is work underway to tune this automatically, but it hasn't landed yet.
[Vivek] Yup, it improved the performance. BTW, will only one stream be used per unidirectional streaming RPC? Is there a way to tell the stack to use multiple streams per RPC? I am exploring whether performance can be improved by using multiple streams, each with the default 64KB window, instead of a single stream with a big window.


* If you have exactly 1 RPC active at a time, there are optimizations to make the DATA frames larger (16K by default, set by the remote side's settings). You can change this (though, to be honest, I have never tried it and don't know how) so that each message fits in a single frame and doesn't need to be cut up.


What kind of bottlenecks do you see, and what are your target goals? 
[Vivek] Even though the cwnd of our TCP connection is pegged at around 350 KB, the default 64K HTTP/2 window means our server cannot push more data, so throughput suffers. Since the HTTP/2 window size negotiated for the connection is not configurable on our side (the server picks up the window size advertised by the client), I am exploring how to use multiple streams (each with a 64K window) per RPC call on the server to improve throughput.
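The gap between the two windows can be made concrete with a little arithmetic: at most one flow-control window of data can be unacknowledged per round trip, so the window divided by the RTT caps throughput. (The 10 ms RTT below is an assumed example value; only the 64K and 350 KB figures come from this thread.)

```python
# With a 64 KiB HTTP/2 connection window, at most 64 KiB can be
# in flight per round trip, regardless of what TCP's cwnd allows.

WINDOW = 64 * 1024      # default HTTP/2 flow-control window, bytes
CWND = 350 * 1024       # observed TCP congestion window, bytes
RTT = 0.010             # assumed round-trip time, seconds

http2_cap = WINDOW / RTT    # bytes/sec ceiling imposed by HTTP/2
tcp_cap = CWND / RTT        # what TCP alone would allow

print(int(http2_cap))   # 6553600  (~6.25 MiB/s)
print(int(tcp_cap))     # 35840000 (~34 MiB/s)
# HTTP/2 flow control, not TCP, is the binding constraint here.
```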

Thanks,
Vivek

Carl Mastrangelo

Jun 2, 2017, 8:56:54 PM
to grpc.io


On Friday, June 2, 2017 at 2:37:42 PM UTC-7, Vivek M wrote:
Hi Carl,

It's a C++ streaming server.

There will be only one streaming RPC active per client. At most 3-4 clients will connect to the server and invoke this RPC, but only one RPC will be active per client connection. So I assume the two points below are relevant to my case; please find my replies inline.


* Setting the window size correctly will have the biggest win; it should be roughly equal to the bandwidth-delay product (BDP). 64K was picked as a generally safe guess, but it isn't correct in all environments. There is work underway to tune this automatically, but it hasn't landed yet.
[Vivek] Yup, it improved the performance. BTW, will only one stream be used per unidirectional streaming RPC? Is there a way to tell the stack to use multiple streams per RPC? I am exploring whether performance can be improved by using multiple streams, each with the default 64KB window, instead of a single stream with a big window.

Generally no. The real win from the window size is for the connection as a whole, rather than for each individual stream. It is usually the connection-level window that runs out first.
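A tiny illustration of that point: HTTP/2 has two layers of flow control, a window per stream and one window for the whole connection, and data in flight is limited by both. So N streams of 64 KiB each are still capped by a 64 KiB connection window.

```python
# Per-stream and connection-level flow-control windows, both at the
# HTTP/2 default of 64 KiB. Total unacknowledged data is bounded by
# the smaller of (streams * stream window) and the connection window.

STREAM_WINDOW = 64 * 1024
CONNECTION_WINDOW = 64 * 1024

def max_in_flight(num_streams):
    return min(num_streams * STREAM_WINDOW, CONNECTION_WINDOW)

print(max_in_flight(1))   # 65536
print(max_in_flight(4))   # still 65536: extra streams gained nothing
```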

 


* If you have exactly 1 RPC active at a time, there are optimizations to make the DATA frames larger (16K by default, set by the remote side's settings). You can change this (though, to be honest, I have never tried it and don't know how) so that each message fits in a single frame and doesn't need to be cut up.
[Vivek] I am also not sure how to achieve this. Will https://github.com/grpc/grpc/blob/ac4a7283ee77bfe5118a061a62930019ff090e37/src/cpp/common/channel_arguments.cc#L167 help?

Since this is C++ and not Java, I am not sure how to set it. I think C++ uses 8K chunks by default.
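For what it's worth, channel arguments in the C core are plain key/value pairs, so a sketch like the following applies from any wrapped language. The argument names below are my assumption based on grpc_types.h (GRPC_ARG_HTTP2_MAX_FRAME_SIZE and GRPC_ARG_HTTP2_STREAM_LOOKAHEAD_BYTES); whether they actually change the framing in a given version is something to verify, not a guarantee:

```python
# Assumed tuning sketch: channel arguments are (name, value) pairs
# handed to the gRPC C core. Verify these names against your gRPC
# version's grpc_types.h before relying on them.
options = [
    ("grpc.http2.max_frame_size", 1024 * 1024),   # ask for bigger DATA frames
    ("grpc.http2.lookahead_bytes", 1024 * 1024),  # larger flow-control lookahead
]

# Python: grpc.insecure_channel(target, options=options)
# C++:    grpc::ChannelArguments args; args.SetInt(name, value);
print(options)
```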
 

What kind of bottlenecks do you see, and what are your target goals? 
[Vivek] Even though the cwnd of our TCP connection is pegged at around 350 KB, the default 64K HTTP/2 window means our server cannot push more data, so throughput suffers. Since the HTTP/2 window size negotiated for the connection is not configurable on our side (the server picks up the window size advertised by the client), I am exploring how to use multiple streams (each with a 64K window) per RPC call on the server to improve throughput.
 
The HTTP/2 window as used by gRPC is really more about memory management than speed. It is a way to push back on excessive buffering. You can raise the window, but it will cost you more memory. If you care about throughput, and perhaps less about latency, you might consider corking to speed up the connection.

Vivek M

Jun 7, 2017, 3:46:24 AM
to grpc.io
Thanks Carl.