gRPC keepalive not working properly when the client and server have an Istio sidecar running

Prathish Elango

Sep 29, 2025, 10:22:08 AM
to grpc.io
So I'm running into some weird issues with configuring keepalive. I have a Java-based gRPC client and server running in the same namespace, each with its own Istio proxy running as a sidecar.
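
For context, the keepalive settings on my client channel are along these lines (a simplified sketch; the timing values here are placeholders rather than my real config):

import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;

// Placeholder values for illustration; my real timings differ.
ManagedChannel channel = NettyChannelBuilder
        .forAddress("grpc-server-dev.svc.cluster.local", 6868)
        .usePlaintext()
        .keepAliveTime(30, TimeUnit.SECONDS)     // send an HTTP/2 PING after 30s of inactivity
        .keepAliveTimeout(10, TimeUnit.SECONDS)  // drop the connection if the PING ACK doesn't arrive within 10s
        .keepAliveWithoutCalls(true)             // keep pinging even when there are no active RPCs
        .build();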

First, when I look at a tcpdump, I can see PING[0] frames from the client to the server but no PING[0] with the ACK flag set coming back. The connection is not getting reset, however. Even when my keepalive interval is shorter than the server's permitted keepalive time, I'm not seeing any GOAWAY[0]. I filtered out plain TCP packets and checked all the other packets. The capture was taken on all interfaces of the pod like this:

kubectl exec -n namespace -i podname -- tcpdump -i any -U -s0 -w - | wireshark -k -i -

Later I enabled trace logs for io.grpc. In those I can see the following in the client logs:

│ {"t":"2025-09-28T05:23:13.521Z","msg":"[id: 0x1ac0fbdb, L:/10.XX.XX.XX:55332 - R:grpc-server-dev.svc.cluster.local/172.XX.XX.XX:6868] OUTBOUND PING: ack=false bytes=1111","lgr":"io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler","trd":"gr │
│ pc-default-worker-ELG-1-2","lvl":"DEBUG"} │
│ {"t":"2025-09-28T05:23:13.521Z","msg":"[id: 0x1ac0fbdb, L:/10.XX.XX.XX:55332 - R:grpc-serve-dev.svc.cluster.local/172.XX.XX.XX:6868] INBOUND PING: ack=true bytes=1111","lgr":"io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler","trd":"grpc │
│ -default-worker-ELG-1-2","lvl":"DEBUG"}

To my surprise, I can see in the client logs both the OUTBOUND PING and the INBOUND PING with ack=true. This means my client is getting an ACK for its PINGs. I still haven't figured out why I didn't see this in my tcpdump.

BUT I don't see the mirror of this in my server logs, which also have io.grpc trace logging enabled. There is no INBOUND PING with ack=false and no OUTBOUND PING with ack=true in the server logs.

And I also don't see a GOAWAY when the server's permitKeepAliveTime threshold is greater than my keepalive time.
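
For reference, the server-side keepalive policy is set up roughly like this (again a sketch with placeholder values; MyServiceImpl just stands in for my actual service implementation):

import java.util.concurrent.TimeUnit;
import io.grpc.Server;
import io.grpc.netty.shaded.io.grpc.netty.NettyServerBuilder;

// Placeholder policy for illustration: pings arriving more often than every
// 5 minutes should normally be answered with GOAWAY (too_many_pings).
Server server = NettyServerBuilder.forPort(6868)
        .permitKeepAliveTime(5, TimeUnit.MINUTES)
        .permitKeepAliveWithoutCalls(true)
        .addService(new MyServiceImpl())   // stand-in for my real service
        .build()
        .start();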

This setup works completely fine when I run these two services on my local machine. However, in my cluster I'm facing these weird issues. Any help would be greatly appreciated; I've been breaking my head for a week now trying to figure out what is happening.

Eric Anderson

Sep 29, 2025, 10:30:18 AM
to Prathish Elango, grpc.io
On Mon, Sep 29, 2025 at 7:22 AM Prathish Elango <prathi...@gmail.com> wrote:
BUT I don't see the mirror of this in my server logs, which also have io.grpc trace logging enabled. There is no INBOUND PING with ack=false and no OUTBOUND PING with ack=true in the server logs.

And I also don't see a GOAWAY when the server's permitKeepAliveTime threshold is greater than my keepalive time.

That's because the keepalive is limited to the HTTP/2 connection. It can go through L4/TCP proxies, but Istio will have an L7 proxy. So you're doing keepalive between the client and the sidecar. For what are you wanting to use keepalive?

Prathish Elango

Sep 29, 2025, 11:23:35 AM
to grpc.io

Hi Eric, thanks for checking this. Sorry if this is a duplicate response; I'm not sure if the last message went through, as my machine shut down just as I hit send.

I'm looking at using gRPC keepalive to detect dangling connections between my client and server, e.g. a hardware failure or something similar that brings down the server without the connection getting closed. This works for me when I don't have a proxy between my services. And as you said, the Istio sidecar is responding to the HTTP/2 pings from my client: I modified the iptables of my server to simulate a blackhole kind of scenario, and my client was still sending those HTTP/2 pings and getting ACKs back for them.

Do you have a suggestion on what I can do to make sure my client detects these kinds of broken servers? Right now all the calls between them are unary.

Eric Anderson

Sep 29, 2025, 5:39:52 PM
to Prathish Elango, grpc.io
On Mon, Sep 29, 2025 at 8:23 AM Prathish Elango <prathi...@gmail.com> wrote:
I'm looking at using gRPC keepalive to detect dangling connections between my client and server, e.g. a hardware failure or something similar that brings down the server without the connection getting closed.

If you're concerned about the machine getting wedged, then you'd want keepalive between the proxy sidecars. If you are concerned with the process getting wedged, then you'd want it between the client and proxy, and proxy and server. (For the client to monitor its sidecar, and for the sidecar to monitor the server.)

client → sidecar → sidecar → server

Each → could have its own keepalive driven by the client-side/left-side of the connection. You're already doing client→sidecar.

This works for me when I don't have a proxy between my services. And as you said, the Istio sidecar is responding to the HTTP/2 pings from my client: I modified the iptables of my server to simulate a blackhole kind of scenario, and my client was still sending those HTTP/2 pings and getting ACKs back for them.

Do you have a suggestion on what I can do to make sure my client detects these kinds of broken servers? Right now all the calls between them are unary.

I see Istio has a setting for TcpKeepalive. TCP keepalive is relevant between the two sidecars. It isn't useful between the sidecar and server. But that might be what you want in order to detect machines breaking. Your question is really an Istio question, and I haven't actually used Istio.
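
From a quick look at the docs, that TcpKeepalive setting appears to live on a DestinationRule, something like the sketch below (untested on my side, and the name and durations are just illustrative):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: grpc-server-tcp-keepalive      # hypothetical name
spec:
  host: grpc-server-dev.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        tcpKeepalive:
          time: 60s        # idle time before the first keepalive probe
          interval: 10s    # time between probes
          probes: 3        # failed probes before the connection is considered dead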

Prathish Elango

Oct 1, 2025, 11:58:55 PM
to grpc.io
Thanks for your inputs. I will look further on the Istio side to see if anything can be done to achieve this.