Previously[1] we thought we had things configured so that our Ruby grpc clients would ping our grpc-java server every 30 seconds to keep connections alive (and detect if they've gone bad) between calls.[2] This mostly works, but we were still seeing a fairly high rate of GRPC::Unavailable errors, so we ran tcpdump on a client machine. It appears that processes sometimes stop pinging even though the connection is still good. We believe the connection is still good because we sometimes see more traffic from the same client port, after which the pings restart, re-timed from that call. (Note that because of TLS we're inferring which traffic is calls and which is pings from packet sizes and timing.) Other times the pings stop and then, more than 30 seconds later, the server closes the connection once the configured maxConnectionIdle has elapsed since the last call.
Does anyone know of a reason client keepalive pings would stop being sent when the connection is still healthy?
Our channel_args configuration is as follows.
"grpc.initial_reconnect_backoff_ms" => 500,
"grpc.min_reconnect_backoff_ms" => 500,
"grpc.max_reconnect_backoff_ms" => 10_000,
"grpc.keepalive_permit_without_calls" => 1,
"grpc.http2.min_time_between_pings_ms" => 10_000,
"grpc.keepalive_time_ms" => 30_000,
"grpc.keepalive_timeout_ms" => timeout_seconds * 1_000,
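For anyone trying to reproduce this, here is a minimal sketch of how that hash gets built and handed to a stub. The helper name keepalive_channel_args, the value 20 for timeout_seconds, and MyService::Stub are placeholders for illustration, not our real code; the only grpc API assumed is the channel_args keyword on the stub constructor.

```ruby
# Hypothetical helper that builds the channel_args hash shown above.
# timeout_seconds is whatever the caller configures; its real value is
# not given in this post.
def keepalive_channel_args(timeout_seconds)
  {
    "grpc.initial_reconnect_backoff_ms" => 500,
    "grpc.min_reconnect_backoff_ms" => 500,
    "grpc.max_reconnect_backoff_ms" => 10_000,
    "grpc.keepalive_permit_without_calls" => 1,
    "grpc.http2.min_time_between_pings_ms" => 10_000,
    "grpc.keepalive_time_ms" => 30_000,
    "grpc.keepalive_timeout_ms" => timeout_seconds * 1_000,
  }
end

# 20 is an assumed example value, not our production setting.
args = keepalive_channel_args(20)

# With the grpc gem loaded, the hash would be passed to a generated stub
# roughly like this (MyService::Stub, the address and creds are placeholders):
#   stub = MyService::Stub.new("server.example:443", creds, channel_args: args)
```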
Thanks.
-hume.