We ended up adding the following to `Dial`:
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time: 10 * time.Second,
    })
This required bumping grpc to a commit that included the fix in
https://github.com/grpc/grpc-go/pull/2307, which sets the
TCP_USER_TIMEOUT socket option on Linux. As a side note, this issue
doesn't affect Windows clients. It looks like the default number of TCP
retransmissions on Windows is much lower than on GNU/Linux.
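
For anyone curious what that socket option does, here is a rough standard-library sketch of setting TCP_USER_TIMEOUT on a client connection and reading it back. It is not the grpc-go code from the PR, just an illustration under a few assumptions: it is Linux-only, the constant 0x12 is the Linux value of TCP_USER_TIMEOUT declared by hand, and the helper names (`dialWithUserTimeout`, `readUserTimeout`, `roundTrip`) and the 20000 ms value are made up for the example.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// tcpUserTimeout is the Linux value of the TCP_USER_TIMEOUT socket option
// (see tcp(7)). Declared by hand so the sketch needs only the standard library.
const tcpUserTimeout = 0x12

// dialWithUserTimeout is a hypothetical helper: it connects to addr and sets
// TCP_USER_TIMEOUT (in milliseconds) on the socket before connect is called.
func dialWithUserTimeout(addr string, timeoutMillis int) (net.Conn, error) {
	d := net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout, timeoutMillis)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return d.Dial("tcp", addr)
}

// readUserTimeout reads the option back from an established TCP connection.
func readUserTimeout(conn net.Conn) (int, error) {
	tc, ok := conn.(*net.TCPConn)
	if !ok {
		return 0, fmt.Errorf("not a TCP connection: %T", conn)
	}
	raw, err := tc.SyscallConn()
	if err != nil {
		return 0, err
	}
	var value int
	var sockErr error
	err = raw.Control(func(fd uintptr) {
		value, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout)
	})
	if err != nil {
		return 0, err
	}
	return value, sockErr
}

// roundTrip starts a throwaway loopback listener, dials it with the option
// set, and returns the value the kernel reports for the client socket.
func roundTrip(timeoutMillis int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	conn, err := dialWithUserTimeout(ln.Addr().String(), timeoutMillis)
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	return readUserTimeout(conn)
}

func main() {
	got, err := roundTrip(20000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("TCP_USER_TIMEOUT (ms):", got)
}
```

With the option set, the kernel gives up on a connection once transmitted data has stayed unacknowledged for that long, instead of waiting for the full retransmission backoff to run out.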