Java gRPC client does not refresh DNS entries

Maksim Likharev

Aug 20, 2025, 3:36:14 PM
to grpc.io
I’m observing the following behavior: Service S1 (a Java microservice) communicates with Service S2 (also a Java microservice) using gRPC unary calls, and both services run in Kubernetes. The gRPC client in S1 uses keepalive and resolves a headless Service (which returns multiple IP addresses). After scaling S2 down and then back up, the gRPC client in S1 stops communicating with an UNAVAILABLE error, and the logs indicate it keeps using stale IP addresses.
The problem does not resolve until S1 is restarted. The Kubernetes headless Service has the correct IP addresses, and name resolution from the pod (nslookup/dig) also shows the correct IPs, so this is not an infrastructure problem.

What could be causing this, and how can I force the gRPC client to refresh its DNS cache?

Kannan Jayaprakasam

Aug 22, 2025, 4:42:33 AM
to grpc.io
The gRPC Java client does try to refresh the IP addresses; what has most likely happened is a timing issue during the headless Service's scale-up. When the old set of pods went down, they would have sent a GOAWAY on the connections the client had established (and since you use keepalive on the client, it is all the more likely the GOAWAY was not lost). The GOAWAY immediately triggers a re-resolution, which can still return the old addresses because the new pods may not have come up yet; the client then fails to establish connections and RPCs fail. Once all addresses have failed to connect, name re-resolution is triggered again and a reconnect is scheduled after a delay dictated by the channel's connection backoff policy. You can try one of the following options; sketches for them follow the list.
1. Tune the connection backoff to wait less, or configure retry policies so RPCs wait longer.
2. Force a channel reconnect with ManagedChannel.resetConnectBackoff() to cancel the backoff timer and trigger re-resolution and reconnection.
3. Use waitForReady in CallOptions so the RPC waits for the channel to become ready instead of failing fast.
4. Actively poll the channel state with ManagedChannel.getState() or ManagedChannel.notifyWhenStateChanged() to learn when the channel becomes READY.
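
For option 1, here is a minimal sketch of attaching a per-method retry policy through the channel's default service config. The target URI and the service name my.package.S2Service are placeholders for your setup, and the backoff values are only illustrative:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.List;
import java.util.Map;

public class RetryChannelFactory {
    public static ManagedChannel create() {
        // Retry policy applied to all methods of the (placeholder) S2 service.
        // Numbers must be Doubles and durations Strings in the service config map.
        Map<String, Object> retryPolicy = Map.of(
                "maxAttempts", 5.0,
                "initialBackoff", "1s",
                "maxBackoff", "30s",
                "backoffMultiplier", 2.0,
                "retryableStatusCodes", List.of("UNAVAILABLE"));
        Map<String, Object> serviceConfig = Map.of(
                "methodConfig", List.of(Map.of(
                        "name", List.of(Map.of("service", "my.package.S2Service")),
                        "retryPolicy", retryPolicy)));
        return ManagedChannelBuilder
                .forTarget("dns:///s2-headless.default.svc.cluster.local:50051")
                .defaultServiceConfig(serviceConfig)
                .enableRetry()
                .usePlaintext()
                .build();
    }
}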
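
For options 2 and 4 combined, a small watchdog along these lines (a sketch, not a drop-in solution) can observe the channel state and kick the channel out of its reconnect backoff whenever it sits in TRANSIENT_FAILURE:

import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;

public final class ChannelWatchdog {
    /** Re-registers itself on every state change. */
    public static void watch(ManagedChannel channel) {
        // true: ask the channel to connect if it is currently IDLE.
        ConnectivityState state = channel.getState(true);
        if (state == ConnectivityState.TRANSIENT_FAILURE) {
            // Cancels the pending reconnect backoff, forcing an immediate
            // name re-resolution and connection attempt.
            channel.resetConnectBackoff();
        }
        channel.notifyWhenStateChanged(state, () -> watch(channel));
    }
}

For option 3, calling withWaitForReady() on a generated stub makes its RPCs buffer while the channel is not READY instead of failing immediately with UNAVAILABLE.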

Maksim Likharev

Aug 22, 2025, 7:06:34 PM
to grpc.io
Let me check whether the backoff helps, but the client never recovers on its own: we noticed that one of the servers didn't recover for 36 hours, until we restarted the service.