Flow control problem in gRPC C++ release v1.25.0


韩飞

Dec 14, 2020, 2:11:06 PM
to grpc.io
I use the gRPC C++ library in my distributed SQL engine. I use server streaming: the client sends a "connect" request, then the server sends data packets with a sync writer and the client receives them with an async reader.
I use a single stream to transfer a large amount of data, and I have made sure that the reader fetches data packets promptly, but I find that the stream sometimes halts for 4-5 seconds at a time.
To look into this problem, I enabled the flowctl and timer trace logs. I found that the server (sender) exhausts the remote window very quickly and the stream is moved to the stalled list. The log is in client.log.

We can see that at 12:39 stream 11 is added to the stalled list and waits for a stream update (updt). At 12:43 a timer thread receives it and resets the window size, which unblocks the write call.

On the server side (the log is in server.log), at 12:43 the server begins to receive a whole window of data at once and sends the updt packet to the client.

Why is the updt held up for four seconds? It seems the server processes too much data at once, but why?
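To make the stall concrete, here is a toy model of the sender-side window accounting, written with the standard library only. It is not gRPC or chttp2 code: `ToySender`, `TrySend`, `OnWindowUpdate`, and `MessagesUntilStall` are hypothetical names, and the model assumes a message is sent only if it fits entirely in the remaining window (a simplification of real HTTP/2 framing). The 65,535-byte figure used in the usage note is the HTTP/2 default initial window; the window on a real gRPC channel may differ.

```cpp
#include <cstdint>

// Toy model of a sender-side flow-control window (hypothetical, not gRPC code).
struct ToySender {
  std::int64_t remote_window;  // bytes the peer currently allows us to send

  // Sends one message if it fits in the window; returns false when the
  // stream would have to wait on the stalled list for a window update.
  bool TrySend(std::int64_t message_bytes) {
    if (message_bytes > remote_window) return false;
    remote_window -= message_bytes;
    return true;
  }

  // Models the arrival of a window update from the peer.
  void OnWindowUpdate(std::int64_t increment) { remote_window += increment; }
};

// Counts how many messages of a fixed size go out before the sender stalls.
int MessagesUntilStall(ToySender& sender, std::int64_t message_bytes) {
  if (message_bytes <= 0) return 0;  // guard against a loop that never ends
  int sent = 0;
  while (sender.TrySend(message_bytes)) ++sent;
  return sent;
}
```

With a window of 65,535 bytes and 16,384-byte messages, `MessagesUntilStall` returns 3 and leaves 16,383 bytes of window. Nothing more can be sent until `OnWindowUpdate` is called, so the length of the stall is entirely determined by how quickly the receiving side gets its update out.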

韩飞

Dec 15, 2020, 12:05:31 AM
to grpc.io
Update: after I switched to sync mode on the client side, the problem went away. Does this point to a bug in mixing sync and async modes?



yas...@google.com

Dec 16, 2020, 4:01:38 PM
to grpc.io
That's some really good investigation! Kudos!
The transport layer (chttp2) does not really care whether the application is using the async API or the sync API. What really matters is whether there is a thread polling the underlying fds. For the sync API, this generally happens through `Read()`/`Write()`/`Finish()` calls, but for the cq-based async API, this happens when `Next()` is called on the associated cq. Given that in the async case it is the timer thread that ends up receiving the update, and not any application thread, I would suspect that the application is not polling the cqs.
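As a rough illustration of what "polling the cq" looks like structurally, here is a sketch of a dedicated thread that stays blocked in `Next()` for the whole lifetime of the queue. To keep it self-contained it uses a hypothetical `ToyCompletionQueue` built on the standard library; it is not `grpc::CompletionQueue` and it does not poll any fds. It only mirrors the `Next()` contract (block until an event is available, return false once the queue is shut down and drained). `DrainOnPollerThread` is likewise an illustrative helper.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a completion queue (NOT the gRPC class).
class ToyCompletionQueue {
 public:
  // Enqueues a completed event identified by `tag`.
  void Push(int tag) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      events_.push_back(tag);
    }
    cv_.notify_one();
  }

  // Marks the queue as shut down; events already queued can still be drained.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until an event is available. Returns false only once the queue
  // has been shut down and fully drained.
  bool Next(int* tag) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return shutdown_ || !events_.empty(); });
    if (events_.empty()) return false;
    *tag = events_.front();
    events_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<int> events_;
  bool shutdown_ = false;
};

// Runs a dedicated poller thread that loops on Next() while the calling
// thread produces events, then returns the tags in the order they were seen.
std::vector<int> DrainOnPollerThread(ToyCompletionQueue& cq, int num_events) {
  std::vector<int> seen;
  std::thread poller([&cq, &seen] {
    int tag = 0;
    while (cq.Next(&tag)) seen.push_back(tag);  // keep polling until shutdown
  });
  for (int i = 0; i < num_events; ++i) cq.Push(i);
  cq.Shutdown();
  poller.join();  // after join, `seen` is safe to read from this thread
  return seen;
}
```

In a real application the loop body would dispatch on the tag and issue the next async `Read()`. The point of the pattern is only that some thread is always inside `Next()`, rather than calling it occasionally between other work.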