Incoming frame of size N exceeds local window size of 0


Даниил Филиппов

Nov 23, 2021, 1:47:08 AM
to grpc.io

I have a gRPC service with a bidirectional streaming method.

  • Client: python grpcio 1.41.1.
  • Server: akka-grpc 2.1.0.

The client is a slow consumer: the server can produce responses faster than the client reads them.

Occasionally, after a seemingly random delay following the method call, the client logs a message like the following:

E1122 13:42:55.763763501 108048 flow_control.cc:240] Incoming frame of size 317205 exceeds local window size of 0. The (un-acked, future) window size would be 1708209 which is not exceeded. This would usually cause a disconnection, but allowing it due tobroken HTTP2 implementations in the wild. See (for example) https://github.com/netty/netty/issues/6520.
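
To see how often the warning occurs and how large the offending frames are, the three numbers can be pulled out of each log line. The following is a minimal sketch that assumes the message keeps the single-line wording shown above; the names `FLOW_WARNING` and `parse_flow_warning` are hypothetical helpers, not part of grpcio.

```python
import re

# Matches the flow-control warning in the wording shown in the log line above.
FLOW_WARNING = re.compile(
    r"Incoming frame of size (?P<frame>\d+) exceeds local window size of "
    r"(?P<window>-?\d+)\. The \(un-acked, future\) window size would be "
    r"(?P<future>-?\d+)"
)


def parse_flow_warning(line):
    """Return (frame_size, local_window, future_window), or None if no match."""
    match = FLOW_WARNING.search(line)
    if match is None:
        return None
    return int(match["frame"]), int(match["window"]), int(match["future"])


# Sample taken from the log line quoted in this post (truncated after the numbers).
sample = (
    "E1122 13:42:55.763763501 108048 flow_control.cc:240] Incoming frame of "
    "size 317205 exceeds local window size of 0. The (un-acked, future) "
    "window size would be 1708209 which is not exceeded."
)
print(parse_flow_warning(sample))
```

Collecting these tuples over a whole run would show whether the local window is always 0 when the warning fires or whether it varies.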

Sometimes this message is followed by an exception:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "[...]/client.py", line 107, in fetch
    for response in responses:
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "Stream removed" debug_error_string = "{"created":"@1637649068.837642637","description":"Error received from peer ipv4:***.***.***.***:****","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"

At other times, the overall call succeeds with no exception.

Some research:

  • Disabling BDP probing by setting grpc.http2.bdp_probe = 0 seems to resolve the problem, but I suspect that is only a side effect of the lower overall throughput.
  • There is a somewhat similar issue on GitHub, but it appears to concern a unary call. In that case, the server starts using the increased initial window size immediately after receiving the client's SETTINGS frame and before sending the SETTINGS ack (if I understood correctly). In my case, the frame ordering looks correct.
  • Examining captured network packets and client-side gRPC tracing logs (GRPC_VERBOSITY=DEBUG, GRPC_TRACE=flowctl) has not given me any insight.
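
For anyone reproducing these experiments, here is a minimal sketch of how the channel option and the tracing variables mentioned in the list above might be set from Python. The helper names `build_channel_options` and `tracing_environment` and the target address in the comment are hypothetical; only the option key and the environment variable names come from the post.

```python
def build_channel_options(disable_bdp_probe=True):
    """Build the options list accepted by grpc.insecure_channel(..., options=...)."""
    options = []
    if disable_bdp_probe:
        # Channel argument named in the post; 0 turns BDP probing off.
        options.append(("grpc.http2.bdp_probe", 0))
    return options


def tracing_environment():
    """Environment variables for client-side flow-control tracing."""
    return {"GRPC_VERBOSITY": "DEBUG", "GRPC_TRACE": "flowctl"}


options = build_channel_options()
trace_env = tracing_environment()

# Assumed usage with grpcio installed (placeholder address):
#   import os
#   os.environ.update(trace_env)   # to be safe, do this before importing grpc
#   import grpc
#   channel = grpc.insecure_channel("localhost:50051", options=options)
```

Toggling `disable_bdp_probe` between runs while keeping the trace output makes it easier to compare the window updates with and without probing.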

I'll greatly appreciate any ideas on how to resolve or diagnose the problem.

yas...@google.com

Dec 1, 2021, 2:38:06 PM
to grpc.io
It looks like there is a bug around handling of flow control windows either in the client or the server. Based on the error log, I would presume that the server's flow control implementation is buggy. To dig deeper, we would need to look at what flow control updates are being sent.
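
As a rough illustration of what the warning distinguishes, the sketch below models a receiver that compares an incoming frame against two window values: one based on the initial window size the peer has acknowledged, and one based on the (possibly larger) initial window size already sent in SETTINGS but not yet acked. This is a simplified model under stated assumptions, not the actual gRPC core code. The class and field names are hypothetical, and apart from the HTTP/2 default initial window of 65535, the numbers are chosen only so that the totals match the log line quoted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class ReceiverWindow:
    acked_init_window: int    # initial window size the peer has acknowledged
    sent_init_window: int     # initial window size sent in SETTINGS, maybe un-acked
    announced_delta: int = 0  # WINDOW_UPDATE credit granted minus bytes received

    def classify(self, frame_size):
        """Classify an incoming DATA frame against both window views."""
        acked_window = self.announced_delta + self.acked_init_window
        future_window = self.announced_delta + self.sent_init_window
        if frame_size <= acked_window:
            return "ok"
        if frame_size <= future_window:
            # Exceeds the acknowledged window but fits the un-acked one:
            # the situation the warning describes as tolerated.
            return "tolerated"
        return "flow-control-error"


# Illustrative state: acknowledged window fully consumed (0 bytes left),
# un-acked future window of 1708209 bytes, as in the log line.
window = ReceiverWindow(
    acked_init_window=65535,
    sent_init_window=1773744,
    announced_delta=-65535,
)
print(window.classify(317205))
print(window.classify(2_000_000))
```

In this model, a sender that starts using a new initial window size before its SETTINGS ack has been processed lands in the "tolerated" branch, which is why the exact order of SETTINGS, SETTINGS ack, and WINDOW_UPDATE frames is the thing to look at.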