Connection failures in grpc-go 1.56

Robert Woerner

Aug 1, 2023, 3:34:52 PM
to grpc.io
Somewhere between grpc-go versions 1.55.1 and 1.56.1, a change was made that substantially increased the number of "connection refused" errors my client gets if the server is not available when the client starts. Moreover, even after the server is up, the client still receives "connection refused" when it sends a message, at least until the first or second retry. I can confirm that lsof shows the port is listening.

My very tentative theory is that this behavior change is related to the changes made to the background reconnection code. Possibly a connection is no longer attempted in the goroutine issuing a call, but only from a background goroutine, so the call fails until the background goroutine establishes the connection.

Can anyone shed light on this?  Or suggest a way to recreate the previous behavior?  I don't have a choice about which version to use.

Thanks, Rob

Easwar Swaminathan

Aug 16, 2023, 8:03:57 PM
to grpc.io
Are you using the `pick_first` LB policy?

If so, we made a change in 1.56 to support what we call sticky transient failure, i.e., `pick_first` now keeps retrying the connection until it successfully makes one or the idle timeout fires. Previously, once `pick_first` had tried all addresses given to it without making a successful connection, it would put the channel in IDLE and not retry until the next RPC attempt (or an explicit API call from the application to connect). These changes are documented in https://github.com/grpc/proposal/blob/master/A62-pick-first.md.

If you want to enable the idle timeout (which we had to disable by default due to a bug that has since been fixed), you can use this dial option: https://pkg.go.dev/google.golang.org/grpc#WithIdleTimeout

We think that sticky transient failure is the correct behavior for the `pick_first` policy, and other gRPC languages are doing the same as well.

Hope this helps,
Easwar