Connection leak on GRPC Java client


Arthur Naseef

Apr 27, 2023, 5:43:37 PM
to grpc.io
I am running into an issue with the GRPC Java client in which the client leaks connections over time.  Reading through the grpc-java code, debugging, and instrumenting has led me to the following question:
  • Does the netty client code ever close the connection except when it sees a socket close initiated externally (i.e. by the O/S or the server)?
Here is a small project that (1) contains a description of the problem and some of the history related to it, and (2) can be used to reproduce the connection leak.


In brief, we see the following:
  • The NGINX ingress times out a request
  • The NGINX ingress sends a GOAWAY packet to the client.
  • The client channel transitions to IDLE but does not close the connection.  
  • The client creates a new connection for the channel, which transitions to CONNECTING and then READY
  • The list of transports for the channel holds the leaked connections
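The sequence above can be sketched as a toy model. This is not grpc-java or Netty code: the class, its methods, and the assumption that every connection attempt succeeds are hypothetical, and it only restates the reported symptoms (whether the library really behaves this way internally is the open question of this thread).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported symptom; none of this is grpc-java code. */
public class LeakSequenceModel {
    enum State { IDLE, CONNECTING, READY }

    private State state = State.IDLE;
    private final List<Integer> openTransports = new ArrayList<>();
    private int nextTransportId = 1;

    /** Mirrors the reported effect of getState(true): an IDLE channel connects. */
    public State getState(boolean requestConnection) {
        if (state == State.IDLE && requestConnection) {
            state = State.CONNECTING;
            openTransports.add(nextTransportId++);
            state = State.READY; // assumption: the connection attempt succeeds
        }
        return state;
    }

    /** Reported behaviour: the channel goes IDLE, the old transport stays open. */
    public void onGoAway() {
        state = State.IDLE;
    }

    /** The only close path observed so far: the peer closes the socket. */
    public void onPeerClosedSocket(int transportId) {
        openTransports.remove(Integer.valueOf(transportId));
    }

    public int openTransportCount() {
        return openTransports.size();
    }

    public static void main(String[] args) {
        LeakSequenceModel channel = new LeakSequenceModel();
        for (int cycle = 0; cycle < 3; cycle++) {
            channel.getState(true); // client asks for a connection
            channel.onGoAway();     // ingress times out and sends GOAWAY
        }
        System.out.println("open transports after 3 GOAWAY cycles: "
                + channel.openTransportCount());
    }
}
```

In this model each GOAWAY cycle leaves one more open transport behind, and the count only drops when `onPeerClosedSocket` is called.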
Note that switching to the OkHttp implementation appears to improve the results with the test tool, but our main application still observes leaked connections when running with OkHttp.

Any help is appreciated.  I can certainly share more details as needed.

Art

sanjay...@google.com

Apr 28, 2023, 5:13:58 AM
to grpc.io
Take a look at https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md - it says "...channels that receive a GOAWAY when there are no active or pending RPCs should also switch to IDLE..."

Also, according to https://github.com/grpc/grpc-java/blob/master/api/src/main/java/io/grpc/ManagedChannel.java#L78, if you call `getState` with `true` then "the channel will try to make a connection if it is currently IDLE". That might explain why your `getState` call itself causes a new connection to be created. I haven't looked at your code in detail, but do you need to call `getState` with `true`? Can you try with `false`?





Arthur Naseef

Apr 28, 2023, 11:42:58 AM
to grpc.io
Thank you for the response.  We are aware of the semantics, and they work as advertised - the Channel goes into IDLE on the GOAWAY.  However, the CONNECTION itself lingers indefinitely.  So every time we get a GOAWAY from the server, we leak a connection - until that connection is closed by the server itself.  I was expecting the connection to close after receiving the GOAWAY.

We call getState with true as a means of ensuring the client does its best to keep the connection to the server.  The README.md in the POC project explains why.  In short: the server pushes messages to the client (via a GRPC stream), the server cannot initiate the connection to the client, and the client does not know when the server will send messages.  So the client does its best to keep the connection to the server active at all times.

Art

Arthur Naseef

Apr 28, 2023, 11:44:16 AM
to grpc.io
Let me clarify one point - I was expecting the client to close the connection after the GOAWAY.  Is there a reason to leave the connection open after that point?

Art

Yuri Golobokov

Apr 28, 2023, 12:06:55 PM
to Arthur Naseef, grpc.io
GOAWAY just prevents new streams (calls) from being started on the connection. If you have live streams on the connection, it will stay open until all calls are completed.
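As a rough illustration of that rule, here is a minimal sketch of a single connection that stops accepting streams after GOAWAY and closes once its live streams have drained. It is a toy model, not grpc-java or Netty code; the class and method names are hypothetical, and it deliberately does not say which side initiates the close.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of one HTTP/2 connection; not grpc-java or Netty code. */
public class GoAwayDrainModel {
    private final Set<Integer> activeStreams = new HashSet<>();
    private boolean goAwayReceived = false;
    private boolean open = true;
    private int nextStreamId = 1;

    /** Starts a stream and returns its id, or -1 if new streams are refused. */
    public int startStream() {
        if (!open || goAwayReceived) {
            return -1;
        }
        int id = nextStreamId;
        nextStreamId += 2; // client-initiated HTTP/2 stream ids are odd
        activeStreams.add(id);
        return id;
    }

    /** GOAWAY blocks new streams; streams already running may finish. */
    public void receiveGoAway() {
        goAwayReceived = true;
        closeIfDrained();
    }

    public void completeStream(int id) {
        activeStreams.remove(id);
        closeIfDrained();
    }

    /** The connection is done once GOAWAY was seen and no stream is left. */
    private void closeIfDrained() {
        if (goAwayReceived && activeStreams.isEmpty()) {
            open = false;
        }
    }

    public boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        GoAwayDrainModel conn = new GoAwayDrainModel();
        int call = conn.startStream();
        conn.receiveGoAway();
        System.out.println("open after GOAWAY with a live call: " + conn.isOpen());
        System.out.println("new stream id after GOAWAY: " + conn.startStream());
        conn.completeStream(call);
        System.out.println("open after last call completes: " + conn.isOpen());
    }
}
```

In this model a GOAWAY that arrives when no streams are active closes the connection immediately.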


Arthur Naseef

Apr 28, 2023, 1:20:12 PM
to grpc.io
Should the connection close after all calls are complete, or have failed to start?

Art

Yuri Golobokov

Apr 28, 2023, 3:03:21 PM
to Arthur Naseef, grpc.io
Yes, it should close. But I'm not sure if the client or the nginx initiates the closing.

Arthur Naseef

Apr 28, 2023, 3:53:44 PM
to grpc.io
The POC project I linked can be used to see that the client never initiates a close - at least for the Netty client code.  So a faulty server/proxy/gateway can cause the client to leak connections indefinitely.  Digging through the GRPC + Netty code, I did not find any path that closes the connection except when the socket close is seen by the client.

With the OkHttp implementation the behavior is better, but I am still running into a problem that the POC does not reproduce.

Art

Sanjay Pujare

Apr 30, 2023, 5:12:28 AM
to Arthur Naseef, grpc.io
Hmmm, so what you are saying is that the current logic assumes that an "idle" connection is to be closed from the server side and only then the client side will perform the corresponding clean up.

Are you able to see (say with netstat) that connections are getting leaked since the client never closes them? And on these connections there are no outstanding RPCs?

Arthur Naseef

May 1, 2023, 12:08:20 PM
to grpc.io
Correct - the client is not cleaning up those connections.  Netstat shows the ESTABLISHED connections increasing over time.
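For anyone who wants to track that number over time, a small helper along these lines could count the ESTABLISHED connections to one remote port. It is only a sketch: the class name is hypothetical, and it assumes the Linux `netstat -ant` column layout (foreign address in the fifth column, state in the sixth, port after the last colon).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.stream.Collectors;

/** Counts ESTABLISHED TCP connections to one remote port in netstat output. */
public class EstablishedCounter {

    /** Assumes Linux "netstat -ant" columns: proto, recv-q, send-q, local, foreign, state. */
    public static long countEstablished(List<String> netstatLines, int remotePort) {
        String portSuffix = ":" + remotePort;
        return netstatLines.stream()
                .map(String::trim)
                .filter(line -> line.startsWith("tcp")) // skips headers, keeps tcp and tcp6
                .map(line -> line.split("\\s+"))
                .filter(cols -> cols.length >= 6)
                .filter(cols -> "ESTABLISHED".equals(cols[5]))
                .filter(cols -> cols[4].endsWith(portSuffix))
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EstablishedCounter <remote-port>");
            System.exit(2);
        }
        int port = Integer.parseInt(args[0]);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            System.out.println(countEstablished(lines, port));
        }
    }
}
```

With Java 11 or later this can be run as `netstat -ant | java EstablishedCounter.java 443`, where 443 stands in for the actual server port.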

The POC test code shows this can happen even when no GRPC call is made, but instead the `channel.getState(true)` call is made by the client application.  It can also show that an attempt to make a GRPC call can create a connection that is never closed.  The README.md in that project has scenarios listed and instructions to reproduce the symptoms.

Art

Arthur Naseef

May 1, 2023, 12:17:39 PM
to grpc.io
I updated the README.md for the test project this morning to show a use-case that leaks connections without calling `getState(true)`.


Art