Intermittent Unavailable/Unknown RpcException (C++/C#)


Jacob B

Aug 26, 2021, 1:09:41 PM
to grpc.io

We are using gRPC (version 1.37.1) for inter-process communication between our C# process and our C++ process. Each process acts as both a server and a client to the other, and both run on the same machine over localhost using the HTTP/2 transport. All of the calls are blocking synchronous unary calls; none use bi-directional streaming. Some average(ish) stats:

From C++->C#: 0-2 calls per second, 0-40 calls per minute

From C#->C++: 0-5 calls per second, 0-200 calls per minute

Intermittently, we were getting one of three issues:

  • C# client call to C++ server comes back with an RpcException, usually “HTTP2/Parse Error”, “Endpoint Read Failed”, or “Transport Closed”
  • C++ client call to C# server comes back with Unavailable or Unknown
  • C++ client WaitForConnected call to check the channel fails after 500ms

 

The first one is the most frequent and the one we have the most information about. Usually, what we’ll see is that the client receives the RPC call and runs into an unknown frame type. Then the subchannel goes into shutdown and everything usually reconnects fine. We also generally see an embedded error like the following (note that we replaced all __FILE__ instances with __FUNCTION__ in our gRPC source):

win_read","file_line":307,"os_error":"The system detected an invalid pointer address in attempting to use a pointer argument in a call.\r\n","syscall":"WSARecv","wsa_error":10014}]},{"created":"@1622120588.494000000","description":"frame of size 262404 overflows local window of 65535","file":"grpc_core::chttp2::TransportFlowControl::ValidateRecvData","file_line":213}]}

What we’ve seen with the unknown frame type is that the parser handles the HEADERS, WINDOW_UPDATE, DATA, and WINDOW_UPDATE frames, then gets a TCP: on_read without a corresponding READ, and then tries to parse again. It’s in this parse that the parser appears to be at the wrong offset in the buffer: it reports an unknown frame type, and the incoming frame size and incoming stream_id all map to bytes in the middle of the RPC call that it just parsed.
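To illustrate the kind of misparse described above, here is a minimal sketch of decoding the 9-byte HTTP/2 frame header (RFC 7540, section 4.1: 24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream id). This is not gRPC's parser; `FrameHeader`, `ParseFrameHeader`, and `IsKnownFrameType` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Decoded form of the 9-byte HTTP/2 frame header.
struct FrameHeader {
  uint32_t length;     // 24-bit payload length
  uint8_t type;        // frame type
  uint8_t flags;       // frame flags
  uint32_t stream_id;  // 31-bit stream identifier
};

constexpr std::size_t kFrameHeaderSize = 9;
// RFC 7540 defines types 0x0 (DATA) through 0x9 (CONTINUATION).
constexpr uint8_t kMaxKnownFrameType = 0x9;

// Hypothetical helper: decode a frame header starting at `offset`.
// Returns nullopt if fewer than 9 bytes remain in the buffer.
std::optional<FrameHeader> ParseFrameHeader(const uint8_t* buf,
                                            std::size_t size,
                                            std::size_t offset) {
  assert(buf != nullptr);
  if (offset > size || size - offset < kFrameHeaderSize) return std::nullopt;
  const uint8_t* p = buf + offset;
  FrameHeader h;
  h.length = (static_cast<uint32_t>(p[0]) << 16) |
             (static_cast<uint32_t>(p[1]) << 8) |
             static_cast<uint32_t>(p[2]);
  h.type = p[3];
  h.flags = p[4];
  // Mask off the reserved high bit of the stream id.
  h.stream_id = (static_cast<uint32_t>(p[5] & 0x7f) << 24) |
                (static_cast<uint32_t>(p[6]) << 16) |
                (static_cast<uint32_t>(p[7]) << 8) |
                static_cast<uint32_t>(p[8]);
  return h;
}

bool IsKnownFrameType(uint8_t type) { return type <= kMaxKnownFrameType; }
```

If a buffer holds a DATA frame with a 5-byte payload followed by a WINDOW_UPDATE frame, parsing at offsets 0 and 14 yields sensible headers. Parsing at offset 9 instead, which is the start of the payload, interprets payload bytes as length, type, and stream id, and the type is very likely to be unknown. That is the symptom one would expect if the parser's offset and the buffer contents get out of step.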

 

The above is what we were encountering before we changed our code to create a new channel for each RPC call. While we realize this is not great from a performance standpoint, we have seen increased stability since making the change. However, we still occasionally get RPC exceptions. Now the most common is “Unknown”/“Stream Removed” rather than the ones listed above.


Any ideas on what might be going wrong are appreciated.

 

yas...@google.com

Sep 23, 2021, 1:46:26 PM
to grpc.io
For reference, https://github.com/grpc/grpc/issues/27292 is the related issue.