I have a gRPC service with a bidirectional streaming method.
- Client: python grpcio 1.41.1.
- Server: akka-grpc 2.1.0.
The client is a slow consumer (the server could send at a higher rate than the client reads).
Occasionally, after a random delay following the method call, the client logs a message like the following:
E1122 13:42:55.763763501 108048 flow_control.cc:240] Incoming frame of size 317205 exceeds local window size of 0.
The (un-acked, future) window size would be 1708209 which is not exceeded.
This would usually cause a disconnection, but allowing it due tobroken HTTP2 implementations in the wild.
See (for example) https://github.com/netty/netty/issues/6520.
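As far as I can tell, the message describes a check along the following lines. This is only my simplified model of the condition the log text describes, not the actual gRPC code; the function and parameter names are made up for illustration.

```python
def classify_frame(frame_size: int, local_window: int, future_window: int) -> str:
    """Toy model of the decision described by the log message.

    local_window  -- window the receiver currently accounts for (0 in my log)
    future_window -- the "(un-acked, future)" window from the log (1708209)
    Both parameter names are hypothetical.
    """
    if frame_size <= local_window:
        return "ok"          # frame fits in the current window, nothing logged
    if frame_size <= future_window:
        return "tolerated"   # logged as an error, but the stream is kept
    return "flow-control error"  # exceeds even the future window


# Values taken from the log line above.
result = classify_frame(317205, 0, 1708209)
```

With the numbers from my log, the frame falls into the middle branch: larger than the current window, smaller than the future one.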
Sometimes this message is followed by an exception:
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "[...]/client.py", line 107, in fetch
    for response in responses:
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Stream removed"
    debug_error_string = "{"created":"@1637649068.837642637","description":"Error received from peer ipv4:***.***.***.***:****","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
But sometimes the overall call succeeds with no exception.
Some research:
- Disabling BDP probing by setting the channel option grpc.http2.bdp_probe = 0 seems to resolve the problem, but I suspect that's just a side effect of the overall throughput decrease.
- There is a somewhat similar issue on GitHub, but it appears to be about a unary call. In that case, the server starts using the increased initial window size immediately after receiving the client's SETTINGS frame and before sending the SETTINGS ack (if I understood correctly). In my case, the frame ordering looks correct.
- Examining captured network packets and client-side gRPC tracing logs (GRPC_VERBOSITY=DEBUG, GRPC_TRACE=flowctl) hasn't given me any insights.
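For completeness, this is roughly how I apply the settings mentioned above on the client side. It's a minimal sketch: the helper names (`channel_options`, `open_channel`) and the `target` argument are placeholders for illustration, not my real code, and I use an insecure channel here only to keep the example short.

```python
import os

# The tracing variables are set before grpc is imported, on the assumption
# that the gRPC core reads them when it initializes.
os.environ["GRPC_VERBOSITY"] = "DEBUG"
os.environ["GRPC_TRACE"] = "flowctl"


def channel_options(disable_bdp_probe: bool = True) -> list:
    """Build the channel options; hypothetical helper for this example."""
    options = []
    if disable_bdp_probe:
        # 0 turns BDP probing off.
        options.append(("grpc.http2.bdp_probe", 0))
    return options


def open_channel(target: str, disable_bdp_probe: bool = True):
    """Open a channel to `target` (placeholder address) with the options above."""
    import grpc  # imported lazily so the env vars above are already in place

    return grpc.insecure_channel(target, options=channel_options(disable_bdp_probe))
```

Calling `open_channel(target, disable_bdp_probe=False)` gives the default behaviour, which is the configuration where I see the flow-control message.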
I'd greatly appreciate any ideas on how to resolve or diagnose the problem.