[Java] gRPC failure Connection reset by peer with inactivity.


cr2...@gmail.com

Aug 4, 2017, 10:44:54 AM
to grpc.io
Hi,
I have code that uses the futureStub and NettyChannelBuilder, with no properties set other than usePlaintext(true). I have users claiming everything works fine except when there's no activity on the connection for about 20 minutes. Then they see:

gRPC failure=Status{code=UNAVAILABLE, description=null, cause=java.io.IOException: Connection reset by peer

I've asked them to try keepAliveTime and keepAliveTimeout, but I'm not sure whether they've done that yet.
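(For reference, wiring those keepalive options into the channel looks roughly like the sketch below. The host, port, and timing values are illustrative only, not recommendations:)

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

public class KeepAliveChannel {
    // Builds a channel that pings the server during inactivity so that
    // intermediaries (proxies, firewalls) don't silently drop the connection.
    // Values below are illustrative, not prescriptive.
    static ManagedChannel build(String host, int port) {
        return NettyChannelBuilder.forAddress(host, port)
                .usePlaintext(true)
                .keepAliveTime(5, TimeUnit.MINUTES)      // ping after 5 min of inactivity
                .keepAliveTimeout(20, TimeUnit.SECONDS)  // declare dead if no ack within 20 s
                .build();
    }
}
```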

The server side is gRPC-Go 1.4.0

Client (older versions; we are moving to newer ones in the next release):
com.google.protobuf » protobuf-java  3.1.0
io.grpc » grpc-netty    1.3.0
io.grpc » grpc-protobuf    1.3.0
io.grpc » grpc-stub    1.3.0

So, a barrage of questions:

Any thoughts on what is happening here? I know it's not a lot of detail for you to go on :(

The Java client is moving to the very latest version. Are there concerns about compatibility with the Go server side that we need to keep in mind?

Are there timeouts where the underlying connections are closed due to inactivity? I would assume they'd reconnect under the covers if so; would keepAlive help here? Other options to try?

I noticed that ManagedChannel's getState and notifyWhenStateChanged are marked @ExperimentalApi with the warning "this API is not yet implemented by the gRPC library".
So I assume they really can't be used in the latest version to automatically re-establish connections when they get disconnected?


cr2...@gmail.com

Aug 4, 2017, 12:07:38 PM
to grpc.io, cr2...@gmail.com
Also, any advice on good practices to make connections appear more resilient to the layers higher up the stack?

Doug Fawley

Aug 9, 2017, 11:50:15 AM
to grpc.io, cr2...@gmail.com
On Friday, August 4, 2017 at 7:44:54 AM UTC-7, cr2...@gmail.com wrote:
Any thoughts on what is happening here? I know it's not a lot of detail for you to go on :(

It's possible this is caused by a proxy between the client and server. If so, client-side keepalive settings should prevent it.

On the gRPC server side, there is a "max idle" setting (MaxConnectionIdle).

It defaults to infinity/disabled, but you should make sure it's not being set unintentionally. If this is the cause, client-side keepalive will not help: keepalive uses pings, but gRPC's idleness detection considers only active streams (RPCs).

The Java client is moving to the very latest version. Are there concerns about compatibility with the Go server side that we need to keep in mind?

There should not be compatibility concerns between gRPC 1.x versions across languages.  Please file an issue if you encounter any.
 
Thanks,
Doug

Eric Anderson

Aug 9, 2017, 1:00:46 PM
to cr2...@gmail.com, grpc.io
On Fri, Aug 4, 2017 at 7:44 AM, <cr2...@gmail.com> wrote:
I have users claiming everything works fine except when there's no activity on the connection for about 20 minutes. Then they see:

gRPC failure=Status{code=UNAVAILABLE, description=null, cause=java.io.IOException: Connection reset by peer

To others seeing this: there are two places you can see this error, in a Status or in the logs. In the next grpc-java release (1.6) these errors will no longer be logged, but you can still see them in the Status.

I've asked them to try keepAliveTime and keepAliveTimeout, but I'm not sure whether they've done that yet.

Something ("in the network") is killing the connection after 20 minutes; that may be proxies, firewalls, or routers. Most of the time keepalive will fix this. There is a chance that the TCP connection is being killed because of its age instead of inactivity; keepalive won't fix that.

The server side is gRPC-Go 1.4.0
...

So, a barrage of questions:

Any thoughts on what is happening here? I know it's not a lot of detail for you to go on :(

You mentioned the client configuration; how about the server? Is it using MaxConnectionAge? It's possible that there's a bug in the implementation and it isn't gracefully shutting down the connection. (Or maybe the MaxConnectionAgeGrace timeout is being exceeded.)

(Note that MaxConnectionIdle has "idleness" defined at a higher level; it is the time since the last RPC completed, not the last network activity. So it is unlikely to cause a problem.)

The Java client is moving to the very latest version. Are there concerns about compatibility with the Go server side that we need to keep in mind?

No. Things should work fine.

Are there timeouts where the underlying connections are closed due to inactivity? I would assume they'd reconnect under the covers if so; would keepAlive help here? Other options to try?

The Channel should reconnect automatically (and if it isn't, that's a really big bug; please file an issue). However, when the connection dies, any RPCs on that connection fail with the Status you see. But if you issue another RPC immediately after, gRPC should attempt to reconnect and everything should "just work."
The reconnect part of that API is for when you want a connection to be available but aren't sending RPCs. The requestConnection of getState(true) "acts" like you sent an RPC (so it brings up a TCP connection if there isn't one) without actually sending one. Even if it were implemented, it wouldn't be necessary here.
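A minimal sketch of that "issue another RPC immediately after" pattern: a retry wrapper that re-runs a call on transient failure. RetryableException here is a hypothetical stand-in for checking a StatusRuntimeException for code UNAVAILABLE, so the sketch carries no gRPC dependency.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical marker for a failure worth retrying (stands in for a
     *  StatusRuntimeException whose status code is UNAVAILABLE). */
    public static class RetryableException extends RuntimeException {
        public RetryableException(String msg) { super(msg); }
    }

    /** Runs the call, retrying up to maxAttempts times on RetryableException.
     *  Each retry gives the channel a chance to reconnect under the covers. */
    public static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        RetryableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (RetryableException e) {
                last = e;  // transient failure: try again on a fresh connection
            }
        }
        throw last;  // all attempts exhausted
    }
}
```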

And yes, the API is still unimplemented. In 1.6 more of it will be plumbed, but it still may not be functioning quite right. The API really only has two uses though: 1) notify the application that the Channel is unhealthy and 2) allow the application to cheaply (without sending an RPC) cause a connection to be established.
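For later readers: once the API is implemented, use (2) would look roughly like this sketch. getState and notifyWhenStateChanged are the real method names; the watcher structure around them is illustrative only.

```java
import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;

public class ChannelWatcher {
    // Asks the channel to connect if idle (requestConnection = true) and
    // registers a one-shot callback for the next state change. Callbacks
    // fire once, so the watcher re-registers itself to keep observing.
    static void watch(ManagedChannel channel) {
        ConnectivityState current = channel.getState(/* requestConnection= */ true);
        channel.notifyWhenStateChanged(current, () -> {
            System.out.println("Channel left state " + current);
            watch(channel);  // keep observing subsequent transitions
        });
    }
}
```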

cr2...@gmail.com

Aug 9, 2017, 1:11:03 PM
to grpc.io, cr2...@gmail.com
Thanks, all!

I finally got some more details: they were NOT running with keepalive on, and yes, the connection was going through a proxy. Enabling keepalive did seem to fix their issue. Thanks for the other answers.


hongla...@gmail.com

Dec 19, 2018, 5:54:02 AM
to grpc.io
Hi, has this issue been fixed? What's the root cause?

On Friday, August 4, 2017 at 10:44:54 PM UTC+8, cr2...@gmail.com wrote: