When and how to use keepalive?


san...@saares.eu

Jun 18, 2018, 5:17:33 AM
to grpc.io
What is the keepalive feature for? What is the right way to use it?

I am interested in having connection drops detected fast and reconnects happen fast. So I tried setting low keepalive thresholds.

                // We expect this connection to be rock solid.
                new ChannelOption("grpc.keepalive_time_ms", 1000),
                new ChannelOption("grpc.keepalive_timeout_ms", 1000),
                new ChannelOption("grpc.keepalive_permit_without_calls", 1),

That didn't work out well. I now see a lot of "Status(StatusCode=Internal, Detail="keepalive watchdog timeout")" errors (using the .NET client). What's worse, I see such errors for calls that actually succeeded as far as the server knows. This causes headaches for my non-idempotent APIs when the client retries (after a keepalive timeout, retrying on the lost connection seems logical, no?)

Clearly I am doing it wrong. So I ask the wider audience: what is the right way to configure gRPC to detect and recover fast from broken connections? Both my client and server are on the same machine, so I expect them to respond very fast to each other.

Aaron Beitch

Jun 18, 2018, 8:28:13 PM
to grpc.io
Ideally, you wouldn't need to rely on keepalives to know when a connection is closed. If the server or client exits cleanly, it should notify the other end immediately.

Keepalives can be used to detect unclean exits, such as when a server or client loses power or connectivity.

Aaron

san...@saares.eu

Jun 19, 2018, 5:32:41 AM
to grpc.io
Okay, that makes sense. Dirty exits are exactly what I want to detect and recover from.

Do you see any obvious reason why my configuration of a 1000 ms interval with a 1000 ms timeout would lead to "keepalive watchdog timeout" errors even when no connection is broken and the call actually succeeded? Is the interval just too tight? Can you recommend some "sane" settings?

yas...@google.com

Jun 20, 2018, 3:03:45 PM
to grpc.io
By default, the keepalive timeout at the client (grpc.keepalive_timeout_ms) is set to 20 seconds.
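
For illustration, here is a minimal sketch of the channel options from the original post with the timeout put back at that 20 second default. The 10 second ping interval, the insecure credentials, and the class and method names are assumptions made for the example, not recommended values from this thread; whether they suit a given deployment would need testing.

```csharp
using System.Collections.Generic;
using Grpc.Core;

// Hypothetical helper, named here for illustration only.
public static class KeepaliveChannelSketch
{
    public static Channel Create(string target)
    {
        var options = new List<ChannelOption>
        {
            // Assumed value: send a keepalive ping every 10 s instead of every 1 s.
            new ChannelOption("grpc.keepalive_time_ms", 10000),
            // Wait 20 s for the ping ack, matching the default mentioned above.
            new ChannelOption("grpc.keepalive_timeout_ms", 20000),
            // Keep pinging even when no calls are in flight, as in the original post.
            new ChannelOption("grpc.keepalive_permit_without_calls", 1),
        };

        // Insecure credentials are an assumption for a client and server on the same machine.
        return new Channel(target, ChannelCredentials.Insecure, options);
    }
}
```

A longer timeout gives a busy peer more time to acknowledge a ping before the watchdog fails the calls in flight, at the cost of slower detection of a genuinely dead connection.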