We’re investigating a recurring TCP RST observed ~2.5 seconds after a gRPC client sends application data (PSH, ACK) on a bidirectional stream, and we’re trying to confirm whether this behavior is expected or a side-effect of the keepalive configuration.
Environment
gRPC-Java version: 1.64.0
Transport: Netty
Channel configured with:
    .keepAliveTime(3, TimeUnit.MINUTES)
    .keepAliveTimeout(2, TimeUnit.SECONDS)
    .keepAliveWithoutCalls(true)
Server side allows keepalive and does not appear to terminate connections.
Observed behavior
The client sends an HTTP/2 DATA frame (visible as TCP PSH, ACK).
No further packets are received from the server.
Approximately 2.5 seconds later, the client issues a TCP RST.
This occurs consistently when the server does not reply or acknowledge within that interval.
However, we do not see a ping explicitly sent at the time the RST occurs.
It appears that a timeout due to lack of any inbound data (not necessarily a PING-ACK) may trigger shutdown().
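To make the two interpretations we are trying to tell apart concrete, here is a toy timing model in plain Java. It is only a sketch of our hypotheses, not gRPC-Java's KeepAliveManager; the class and method names are made up for illustration, and the only real values are the 3-minute keepAliveTime and 2-second keepAliveTimeout from our configuration.

```java
/** Toy model of two hypotheses. This is NOT gRPC-Java's KeepAliveManager. */
public class KeepaliveHypotheses {
    static final long KEEPALIVE_TIME_MS = 180_000;  // .keepAliveTime(3, TimeUnit.MINUTES)
    static final long KEEPALIVE_TIMEOUT_MS = 2_000; // .keepAliveTimeout(2, TimeUnit.SECONDS)

    /**
     * Hypothesis A: the timeout is armed only when a PING is sent, and a PING
     * is sent only after keepAliveTime has passed without any inbound read.
     */
    static long closeTimePingOnly(long lastReadMs) {
        long pingSentMs = lastReadMs + KEEPALIVE_TIME_MS;
        return pingSentMs + KEEPALIVE_TIMEOUT_MS;
    }

    /**
     * Hypothesis B: the timeout is armed by any outbound write (e.g. a DATA
     * frame) that is not followed by an inbound read.
     */
    static long closeTimeReadInactivity(long lastWriteMs) {
        return lastWriteMs + KEEPALIVE_TIMEOUT_MS;
    }

    /** Under hypothesis A, when must the last inbound read have happened for a close at closeMs? */
    static long lastReadForObservedClose(long closeMs) {
        return closeMs - KEEPALIVE_TIMEOUT_MS - KEEPALIVE_TIME_MS;
    }

    public static void main(String[] args) {
        long dataSentMs = 0; // hypothetical origin: the DATA frame is sent at t = 0
        System.out.println("A, last read at t=0: close at t=" + closeTimePingOnly(0) + " ms");
        System.out.println("B, DATA at t=0:      close at t=" + closeTimeReadInactivity(dataSentMs) + " ms");
        System.out.println("A, close at t=2500 needs last read at t="
                + lastReadForObservedClose(2_500) + " ms");
    }
}
```

Under hypothesis A, a close about 2.5 seconds after the DATA frame would need the last inbound read to be roughly 177.5 seconds before it, with a PING going out about 0.5 seconds after the DATA frame. Under hypothesis B the close would land about 2 seconds after the DATA frame regardless of earlier traffic. Neither matches our capture exactly, which is why we are asking.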
Questions
Does KeepAliveManager consider only unacknowledged PINGs when starting the keepalive timeout, or any period of read inactivity (including outstanding DATA frames)?
If no PING was sent yet (because keepAliveTime >> 2 s), can the timeout still trigger a shutdown purely due to read inactivity?
Could the RST behavior stem from the Netty transport closing the channel immediately when shutdown() fires (e.g., via Channel.close() with SO_LINGER=0)?
Are there known differences between gRPC-Java and gRPC-C/C++ regarding this shutdown trigger?
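For reference, this is the socket-level mechanism we have in mind in the SO_LINGER question, shown with plain java.net sockets on loopback. It is a minimal sketch and says nothing about whether gRPC-Java or Netty actually sets this option; the class and method names are ours. On common platforms, closing a socket with linger enabled and a timeout of 0 is an abortive close that sends RST rather than FIN, but what the peer observes can vary by OS, so no expected output is given.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Plain-socket illustration of an abortive close; unrelated to gRPC internals. */
public class LingerZeroDemo {
    /** Closes a loopback client with SO_LINGER=0 and reports what the peer saw. */
    static String abortiveClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try (Socket accepted = server.accept()) {
                accepted.setSoTimeout(2_000);  // do not block forever on read
                client.setSoLinger(true, 0);   // linger enabled, timeout 0
                client.close();                // typically emits RST instead of FIN
                try {
                    int r = accepted.getInputStream().read();
                    return "read returned " + r; // -1 would indicate an orderly FIN
                } catch (IOException e) {
                    return "IOException: " + e.getMessage(); // e.g. a reset
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(abortiveClose());
    }
}
```

Running this alongside tcpdump on the loopback interface should show whether the close produces a RST on a given machine, which is the same signature we see from the client in our capture.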
Additional context
We’re analyzing this in the context of a long-lived bidirectional streaming RPC.
tcpdump shows the client’s last sent frame is application DATA, not a PING.
We suspect the combination of .keepAliveTimeout(2s) and .keepAliveTime(3min) may result in a “false positive” closure if the server doesn’t respond quickly enough after the last DATA frame.
We’d appreciate clarification or a reference to where in the codebase this distinction (PING ACK vs generic read inactivity) is definitively made.