Hi,
I'm running into an error that I don't understand and can't fully debug, so I would appreciate any helpful hints.
Here's the setup. I have deployed a gRPC server written in Java to a Kubernetes cluster on Google Kubernetes Engine (GKE). The deployment consists of one Docker container for my gRPC server and another one for the Extensible Service Proxy (ESP) v1, which runs as a sidecar to the gRPC server. A service of type NodePort forwards HTTP/2 traffic to the ESP, which in turn forwards it to my gRPC server. Finally, I am using an ingress with a Google-managed certificate to make my gRPC service available via TLS on the Internet (which, among other things, creates and configures an external HTTPS load balancer).
The gRPC server uses bidirectional streaming. Simplifying matters a little: after an initial request from the gRPC client (written in C# and targeting the .NET Framework, running on my local machine), the gRPC server downloads files from another web service and reports each file it downloaded in a gRPC response. This works fine as long as the number of reported files stays below some threshold (which is not fixed, though). However, once the server has reported around 76-82 files in the corresponding number of responses, my client throws an exception because it received an RST_STREAM frame from the server.
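To illustrate the pattern, the server-side handler looks roughly like the sketch below. The names SignatureServiceImpl, SignatureServiceGrpc, CreateSignatureRequest, CreateSignatureResponse and downloadFiles are only illustrative placeholders for my actual generated classes and implementation:

import io.grpc.stub.StreamObserver;
import java.util.List;

// Simplified sketch, not my actual implementation.
public class SignatureServiceImpl extends SignatureServiceGrpc.SignatureServiceImplBase {

    @Override
    public StreamObserver<CreateSignatureRequest> createSignature(
            StreamObserver<CreateSignatureResponse> responseObserver) {

        return new StreamObserver<CreateSignatureRequest>() {
            @Override
            public void onNext(CreateSignatureRequest request) {
                // Download the files referenced by the initial request from the
                // other web service and report each downloaded file in its own
                // streamed response.
                for (String fileName : downloadFiles(request)) {
                    responseObserver.onNext(CreateSignatureResponse.newBuilder()
                            .setFileName(fileName)
                            .build());
                }
            }

            @Override
            public void onError(Throwable t) {
                // Not invoked, even though the client has already received the RST_STREAM.
            }

            @Override
            public void onCompleted() {
                responseObserver.onCompleted();
            }
        };
    }

    // Placeholder for the calls to the external web service.
    private List<String> downloadFiles(CreateSignatureRequest request) {
        return List.of();
    }
}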
Having enabled logging by setting GRPC_VERBOSITY to DEBUG and GRPC_TRACE to all, the only line item I could find in my gRPC client's log that says something about this error has the following information:
{
  "created":"@1590858674.654000000",
  "description":"Error received from peer ipv4:[load-balancer's-public-ip-address]:443",
  "file":"T:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\surface\call.cc",
  "file_line":1056,
  "grpc_message":"Received RST_STREAM with error code 2",
  "grpc_status":13
}
My gRPC server does not report any error. In fact, it happily continues streaming further responses after my client has received the RST_STREAM; the cancellation of the request is only reported once the server waits for the next client request.
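In case it is relevant: the service implementation does not install an explicit cancellation handler. As far as I understand the grpc-java API, adding something like the following at the top of the createSignature method from the sketch above should make the server notice the cancellation as soon as it arrives (I have not tried this yet):

import io.grpc.stub.ServerCallStreamObserver;

// Register a cancellation callback before returning the request observer.
// grpc-java hands server-side code a ServerCallStreamObserver, so the cast is safe.
ServerCallStreamObserver<CreateSignatureResponse> serverObserver =
        (ServerCallStreamObserver<CreateSignatureResponse>) responseObserver;
serverObserver.setOnCancelHandler(() ->
        System.err.println("CreateSignature stream was cancelled by the client or a proxy"));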
Having enabled logging for ESP, the error log shows the following line items, among others:
2020/05/30 11:45:31 [debug] 9#9: worker cycle
2020/05/30 11:45:31 [debug] 9#9: epoll timer: 498
2020/05/30 11:45:31 [debug] 9#9: epoll: fd:21 ev:0001 d:00007F63709A54E8
2020/05/30 11:45:31 [debug] 9#9: *51 http2 read handler
2020/05/30 11:45:31 [debug] 9#9: *51 SSL_read: 13
2020/05/30 11:45:31 [debug] 9#9: *51 SSL_read: -1
2020/05/30 11:45:31 [debug] 9#9: *51 SSL_get_error: 2
2020/05/30 11:45:31 [debug] 9#9: *51 http2 frame type:3 f:0 l:4 sid:1
2020/05/30 11:45:31 [debug] 9#9: *51 http2 RST_STREAM frame, sid:1 status:8
2020/05/30 11:45:31 [info] 9#9: *51 client canceled stream 1, client: [my-client's-public-ip-address], server: , request: "POST /dokumate.dss.v1.SignatureService/CreateSignature HTTP/2.0", host: "signature-service.dss.dokumate.com"
The last line item says that the client canceled the stream, but my code never cancels the stream, and I could not find any indication of a cancellation in my client's log either.
I tried to find something in the ingress or load balancer logs, but those are silent about any errors. Thus, I am at a loss as to what is happening here.
Everything works perfectly fine when I (1) run the client and the server on my local machine, or (2) run the server on GKE and expose the pod and my server's port through a service of type LoadBalancer. In the latter case, however, there is no TLS, which is a no-go for production use.
Regards, Thomas