grpc.Dial isn't establishing a valid connection despite several retries

Tathagata Chakraborty
Jul 30, 2025, 9:37:25 AM
to grpc.io

I am using Go gRPC (grpc-go) v1.40.0 on macOS to communicate with my gRPC server. If grpc.Dial is called while the machine's network state is unstable (say, something prevents DNS resolution of my target server's host name), gRPC is unable to recover and return a usable grpc.ClientConn for a very long time, even after the network has stabilized and DNS resolution works again. This happens despite several reconnection attempts (calling ClientConn.Close and then grpc.Dial again).

During these re-dial attempts, Wireshark shows no DNS resolution attempts and no other TCP traffic for my server. I do see a lot of noise from DNS queries for grpc_config, which go unanswered because my server doesn't support them. Dial() doesn't return an error, but the connection state stays IDLE instead of READY (as it is when things work).

However, when I pass a custom dialer function (one that uses a net.Dialer with the system's default resolver) as a grpc.DialOption to grpc.Dial(...), the re-attempt succeeds and I get back a valid connection in the READY state. If I don't supply a dialer function and let gRPC handle name resolution and dialing itself, it does not recover easily from the bad state; recovery takes a very long time, around 10-20 minutes.

Note: we use the DNS scheme as the default for the Go resolver: resolver.SetDefaultScheme("dns")
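For context, gRPC target strings have the form `scheme://authority/endpoint`, for example `dns:///myserver.example.com:443` (empty authority) or `dns://8.8.8.8/myserver.example.com:443` (a specific DNS server as authority). With `SetDefaultScheme("dns")`, a bare `host:port` target is treated as if it were written with the `dns:///` prefix. The function below is a simplified, hypothetical illustration of that split; it is not grpc-go's actual parser, and the names `target` and `parseTargetSketch` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the three parts of a gRPC target URI:
// scheme://authority/endpoint.
type target struct {
	Scheme, Authority, Endpoint string
}

// parseTargetSketch is a hypothetical, simplified parser. If raw has no
// "scheme://" prefix, defaultScheme is applied with an empty authority,
// roughly what resolver.SetDefaultScheme("dns") arranges for bare targets.
func parseTargetSketch(raw, defaultScheme string) target {
	scheme, rest, ok := strings.Cut(raw, "://")
	if !ok {
		return target{Scheme: defaultScheme, Endpoint: raw}
	}
	authority, endpoint, _ := strings.Cut(rest, "/")
	return target{Scheme: scheme, Authority: authority, Endpoint: endpoint}
}

func main() {
	for _, raw := range []string{
		"myserver.example.com:443",
		"dns:///myserver.example.com:443",
		"dns://8.8.8.8/myserver.example.com:443",
	} {
		fmt.Printf("%q -> %+v\n", raw, parseTargetSketch(raw, "dns"))
	}
}
```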

Can anyone shed some light on grpc-go's connection establishment logic that would explain this?