keep channel alive without activity

Rajat Goyal

Jan 9, 2022, 8:20:29 AM
to grpc.io
Hi, 

We have a system where clients open a bi-directional gRPC stream to an ALB, which proxies it to one of the active servers:
        
                         bi-di                     
        client <---------------->  ALB  <----------------> server

If the connection fails, the clients reconnect to us, as we want to keep a bi-di channel open.

The question is: how can we keep the channel open even if there is no activity for some time? The ALB is configured with a 300-second idle timeout, which means it drops the connection if no packets are exchanged for 300 seconds.

As we want to keep the connection open as long as possible (re-creating it only if there is a problem) and not let it die due to the idle timeout, what properties should the server and client set?
Would keep-alive settings on both client and server help?


Rajat Goyal

Jan 10, 2022, 11:54:25 AM
to grpc.io
Hi,

A gentle reminder: is there any resolution for the above?

Regards,
Rajat

Sanjay Pujare

Jan 10, 2022, 12:03:04 PM
to Rajat Goyal, grpc.io

Rajat Goyal

Jan 10, 2022, 12:10:09 PM
to Sanjay Pujare, grpc.io
Hi Sanjay,

I see that the bi-directional StreamObserver object gets an onError() callback when there is a network error.

Isn't that already done by some heartbeat mechanism? If so, shouldn't the connection at the ALB stay active because of these ping-pong packets?

Regards,
Rajat

Rajat Goyal

Jan 10, 2022, 3:16:24 PM
to Sanjay Pujare, grpc.io
The ALB is configured with an idle timeout of 5 minutes.
I configured the bi-di client with:
    keepAliveWithoutCalls(true).keepAliveTime(90, TimeUnit.SECONDS).keepAliveTimeout(10, TimeUnit.SECONDS)
while the server is configured with:
    permitKeepAliveWithoutCalls(true).permitKeepAliveTime(1, TimeUnit.MINUTES)

But after exactly 5 minutes I received "INTERNAL: HTTP/2 error code: PROTOCOL_ERROR Received Rst Stream", which looks like the ALB dropped the connection after 5 minutes.

Any idea how we can keep an idle connection alive?


Sanjay Pujare

Jan 10, 2022, 5:31:18 PM
to Rajat Goyal, grpc.io

Rajat Goyal

May 16, 2022, 11:35:57 AM
to Sanjay Pujare, grpc.io
Hi Sanjay / gRPC team,

I have implemented regular RPC-based pings: once connected, the client sends a dummy request to the LB every minute.
I observed the following:
    a) This method works fine if there is no request from the server side. The connection then stays alive for many hours without issues.
    b) But this method doesn't work if there is some bi-directional request from the server. Once the client receives a response from the server, the LB drops the connection after exactly 5 minutes, even though I am sending a regular dummy request from the client side every minute.

I also checked the server logs: the dummy request is received every minute, which means the client is sending its regular 1-minute ping, yet the LB still drops the connection.
If there is no response from the server side, however, the LB does not drop the connection.

Can you please help me understand what gRPC / the LB might be doing in both cases?

Regards,
Rajat

Sanjay Pujare

May 16, 2022, 10:48:22 PM
to Rajat Goyal, grpc.io
Comments inline below:

On Mon, May 16, 2022 at 8:35 AM Rajat Goyal <rajatgoy...@gmail.com> wrote:
> Hi Sanjay / gRPC team,
>
> I have implemented regular RPC-based pings: once connected, the client sends a dummy request to the LB every minute.
> I observed the following:
>     a) This method works fine if there is no request from the server side. The connection then stays alive for many hours without issues.

"request from the server side": I just want to clarify what is being said. Do you mean a response message, since a server can only send response messages back? Or did you mean the server sending requests on a different connection to another server?

>     b) But this method doesn't work if there is some bi-directional request from the server. Once the client receives a response from the server, the LB drops the connection after exactly 5 minutes, even though I am sending a regular dummy request from the client side every minute.

Again, I would like to clarify your "bi-directional request from the server": do you mean a bi-di RPC into the server where the server is sending responses to the client? And in such a case the LB drops the connection after 5 minutes?

> I also checked the server logs: the dummy request is received every minute, which means the client is sending its regular 1-minute ping, yet the LB still drops the connection.
> If there is no response from the server side, however, the LB does not drop the connection.
>
> Can you please help me understand what gRPC / the LB might be doing in both cases?

To summarize my understanding of what you are saying: when a connection is established through the ALB to a gRPC backend, the connection stays alive indefinitely if the client only sends a dummy RPC every minute. This dummy RPC has a dummy request but no response (only header(s) and status code). As soon as you send any real RPCs where the server sends any response messages, the LB does not keep the connection alive but drops it 5 minutes after the last non-dummy RPC. Is this correct?

Rajat Goyal

May 17, 2022, 3:10:05 PM
to Sanjay Pujare, grpc.io
Yes Sanjay, your understanding is correct.

We solved this as follows:

   a) The client sends a dummy request to the server every minute. This is an actual request defined in the protobuf, marked with a type such as "dummy request".

   b) On receiving each such request, the server responds with a dummy response, which the client ignores based on its type, such as "dummy response".

Earlier we were doing only part (a), and the server just ignored the dummy request; part (b) was missing.

The issue was solved completely once we implemented part (b) as well.
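
A minimal sketch of the two-part scheme described above, using only the JDK so that it runs without gRPC on the classpath. The names `HeartbeatSketch`, `Message`, `Type`, and the `Consumer<Message>` sinks are hypothetical stand-ins and do not come from the thread: in a real service, `Message` would be a generated protobuf class and each sink would wrap a `StreamObserver.onNext` call. The one-minute period and the "type" field are taken from the description; everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-ins: Message models a protobuf message with a type field,
// and each Consumer<Message> models one direction of the bi-di stream.
final class HeartbeatSketch {
    enum Type { DATA, HEARTBEAT }

    static final class Message {
        final Type type;
        final String payload;

        Message(Type type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Part (a): the client sends a HEARTBEAT request at a fixed period, which
    // should be shorter than the load balancer's idle timeout.
    // Assumption: if requestSink wraps a StreamObserver, the caller serializes
    // these writes with its other onNext calls, since StreamObserver is not
    // required to be thread-safe.
    static ScheduledFuture<?> startHeartbeat(ScheduledExecutorService scheduler,
                                             Consumer<Message> requestSink,
                                             long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(
                () -> requestSink.accept(new Message(Type.HEARTBEAT, "")),
                period, period, unit);
    }

    // Part (b): the server answers every HEARTBEAT request with a HEARTBEAT
    // response, so that traffic also flows from server to client.
    static void serverOnNext(Message request, Consumer<Message> responseSink) {
        if (request.type == Type.HEARTBEAT) {
            responseSink.accept(new Message(Type.HEARTBEAT, ""));
            return;
        }
        // Placeholder for real request handling: tag the payload and send it back.
        responseSink.accept(new Message(Type.DATA, "handled:" + request.payload));
    }

    // The client drops HEARTBEAT responses and passes everything else on.
    static void clientOnNext(Message response, Consumer<Message> appHandler) {
        if (response.type != Type.HEARTBEAT) {
            appHandler.accept(response);
        }
    }

    public static void main(String[] args) {
        List<Message> delivered = new ArrayList<>();
        // Wire the server's responses straight into the client, as if over the stream.
        Consumer<Message> toClient = resp -> clientOnNext(resp, delivered::add);

        serverOnNext(new Message(Type.HEARTBEAT, ""), toClient); // answered, then ignored
        serverOnNext(new Message(Type.DATA, "job-1"), toClient); // reaches the application

        System.out.println("delivered to application: " + delivered.size());
    }
}
```

In a client, `startHeartbeat(scheduler, requestObserver::onNext, 1, TimeUnit.MINUTES)` would be called once the stream is open, and the returned future cancelled when the stream closes; `requestObserver` here is an assumed name for the client's request `StreamObserver`.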

Regards,
Rajat