20 second delay when restarting gRPC service

Bruno Bowden

Jul 14, 2017, 3:36:32 PM
to grpc.io
NOTE: we are currently using gRPC v1.1.4. For various reasons it's challenging for us to upgrade, but please advise if this has been fixed in a more recent version, or suggest any workaround. Thanks for all your work on gRPC.

I'm trying to solve an issue with a persistent gRPC client that takes 20 seconds to reconnect after a "Connect Failed" event. This matters during testing, when the service is repeatedly brought up and down while the client is left running. See the console logs below, which show what repeatedly occurs. After the service is brought down, the client reports "Deadline Exceeded" - it has a 500 ms deadline, and these failures are expected. When the service is restarted (and is already responding to other gRPC clients), the already-running client continues to fail until 20 seconds after the original "Connect Failed".

My impression is that this is part of the exponential backoff and retry. If I restart the persistent client, then it works immediately. I've tried playing with the grpc.max_reconnect_backoff_ms and grpc.initial_reconnect_backoff_ms settings without any success.
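For reference, this is roughly how those two settings would be applied in C++ via channel arguments. A sketch only: it assumes the grpc++ headers and an insecure local channel (the target address is a placeholder), and whether v1.1.4 actually honored these arguments is exactly the open question here.

```cpp
#include <grpc++/create_channel.h>
#include <grpc++/security/credentials.h>
#include <grpc++/support/channel_arguments.h>

// Sketch: cap reconnect backoff via channel arguments before creating
// the channel. Values and target are placeholders for illustration.
std::shared_ptr<grpc::Channel> MakeChannel() {
  grpc::ChannelArguments args;
  args.SetInt(GRPC_ARG_INITIAL_RECONNECT_BACKOFF_MS, 100);
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 1000);
  return grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}
```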

# GRPC SERVICE BROUGHT DOWN
I0714 12:13:17.325851  1793 xxxx.cc:118] gRPC status: Connect Failed
I0714 12:13:18.826437  1794 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:19.826522  1792 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:20.826452  1795 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:21.826382  1798 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:22.826511  1797 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:23.826552  1791 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:24.826550  1795 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:25.826663  1793 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:26.826717  1794 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:27.826637  1792 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:28.826462  1798 xxxx.cc:118] gRPC status: Deadline Exceeded
# GRPC SERVICE SUCCESSFULLY RESTARTED
# expecting reconnect to work immediately
I0714 12:13:29.826494  1795 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:30.826251  1791 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:31.827332  1792 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:32.826256  1796 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:33.826165  1795 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:34.827574  1791 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:35.826184  1792 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:36.825944  1798 xxxx.cc:118] gRPC status: Deadline Exceeded
I0714 12:13:37.326781  1795 xxxx.cc:125] gRPC status: ok
I0714 12:13:38.326337  1797 xxxx.cc:125] gRPC status: ok
I0714 12:13:39.326292  1794 xxxx.cc:125] gRPC status: ok
I0714 12:13:40.326501  1798 xxxx.cc:125] gRPC status: ok

Carl Mastrangelo

Jul 18, 2017, 2:40:42 PM
to grpc.io, br...@aurora.tech
This is indeed because gRPC does exponential connection backoff. Reconnecting immediately would require knowing when the service is back up. That's a problem because checking whether the service is up is sometimes expensive, sometimes even expensive enough to keep the service down.

At most it will wait two minutes, after which it will reattach. I can't say whether those parameters were broken in that release.

Bruno Bowden

Jul 18, 2017, 2:52:52 PM
to Carl Mastrangelo, grpc.io
In our case, we're only using gRPC over a local network, so we're fine with repeated and expensive reconnect attempts. My first thought is to make a new call to CreateChannel... but please let me know if there's anything else you'd suggest as a workaround.

--
You received this message because you are subscribed to a topic in the Google Groups "grpc.io" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/grpc-io/gmdyN2rukbY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/cd863687-5cb7-46c3-965f-adb19f77299a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

br...@aurora.tech

Jul 18, 2017, 7:19:18 PM
to grpc.io, not...@google.com, br...@aurora.tech
Destroying and recreating the stub with NewStub gives an instant reconnect, so it's clearly possible for this to work. At the same time, I've seen some intermittent issues with dangling pointers, which is a risk when you're constantly recreating objects.
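A hedged sketch of that recreate-the-stub workaround, aimed at the dangling-pointer risk: hold the stub behind a std::shared_ptr and swap it atomically, so any thread still using the old stub keeps a valid object until it drops its reference. `Stub` here is a stand-in for a grpc-generated stub (in real code, the result of `MyService::NewStub(channel)`); `ReconnectingClient` and the target string are hypothetical names for illustration.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a grpc-generated stub; in real code this would be
// MyService::Stub created from a freshly built channel.
struct Stub {
  std::string target;
};

class ReconnectingClient {
 public:
  explicit ReconnectingClient(std::string target) : target_(std::move(target)) {
    Reconnect();
  }

  // Drop the old stub and build a fresh one. Threads still holding the old
  // shared_ptr keep a valid object, so nothing dangles mid-RPC.
  void Reconnect() {
    std::atomic_store(&stub_, std::make_shared<Stub>(Stub{target_}));
  }

  // Callers grab a snapshot of the current stub and issue RPCs through it.
  std::shared_ptr<Stub> stub() const { return std::atomic_load(&stub_); }

 private:
  std::string target_;
  std::shared_ptr<Stub> stub_;
};
```

Call `Reconnect()` whenever an RPC fails with a connectivity error; the next `stub()` snapshot uses the new channel, sidestepping the old channel's backoff state.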

Main point... is there a mechanism in the current version of gRPC to disable the exponential backoff? I understand that's generally the wrong approach, but in this case it's quite appropriate for us.