grpc C++ - failure to reconnect to server


cha...@retailnext.net

May 18, 2016, 3:40:53 PM
to grpc.io

Hello,

gRPC C++, RELEASE_13_1, running on Android 4.4.


My gRPC client fails to reconnect to the server if I physically disconnect the network, wait some time (more than 10 minutes), and then reconnect it.

GRPC_TRACE is set to "connectivity_state", and I periodically check the state by calling channel->GetState(false). The setup and the resulting traces are below.
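The channel setup and state polling look roughly like this (a minimal sketch, not our exact code; the server address, credentials, and poll interval are placeholders):

#include <grpc++/grpc++.h>
#include <chrono>
#include <thread>

int main() {
  // One channel shared by all four streaming services
  // (address and credentials here are placeholders).
  auto channel = grpc::CreateChannel("myserver.example.com:50051",
                                     grpc::InsecureChannelCredentials());

  // Poll the state without triggering a connection attempt
  // (try_to_connect = false); this produces the "get ..." trace lines.
  for (;;) {
    grpc_connectivity_state state = channel->GetState(false);
    // ... log "state" here; the 45s interval is arbitrary ...
    std::this_thread::sleep_for(std::chrono::seconds(45));
  }
}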

1. Starting client (4 streaming services multiplexed on 1 channel):

16:57:09.281237 [    grpc]  <DEBUG>  CONWATCH: 0x7548dabc pick_first: get IDLE
16:57:09.282513 [    grpc]  <DEBUG>  SET: 0x75485058 client_channel: IDLE --> IDLE [new_lb+resolver]
16:57:09.284304 [    grpc]  <DEBUG>  CONWATCH: 0x7548dabc pick_first: from IDLE [cur=IDLE] notify=0x7548c33c
16:57:09.285091 [    grpc]  <DEBUG>  CONWATCH: 0x7548ca74 subchannel: from IDLE [cur=IDLE] notify=0x7548c364
16:57:09.287457 [    grpc]  <DEBUG>  SET: 0x7548ca74 subchannel: IDLE --> CONNECTING [state_change]
16:57:09.289711 [    grpc]  <DEBUG>  SET: 0x7548dabc pick_first: IDLE --> CONNECTING [connecting_changed]
16:57:09.291326 [    grpc]  <DEBUG>  CONWATCH: 0x7548ca74 subchannel: from CONNECTING [cur=CONNECTING] notify=0x7548c364
16:57:09.292361 [    grpc]  <DEBUG>  SET: 0x75485058 client_channel: IDLE --> CONNECTING [lb_changed]
16:57:09.293598 [    grpc]  <DEBUG>  CONWATCH: 0x7548dabc pick_first: from CONNECTING [cur=CONNECTING] notify=0x75485d1c
16:57:09.559146 [    grpc]  <DEBUG>  CONWATCH: 0x754b626c client_transport: from READY [cur=READY] notify=0x754b0c98
16:57:09.560757 [    grpc]  <DEBUG>  SET: 0x7548ca74 subchannel: CONNECTING --> READY [connected]
16:57:09.581935 [    grpc]  <DEBUG>  SET: 0x7548dabc pick_first: CONNECTING --> READY [connecting_ready]
16:57:09.588590 [    grpc]  <DEBUG>  CONWATCH: 0x754b626c client_transport: from READY [cur=READY] notify=0x7548da94
16:57:09.589425 [    grpc]  <DEBUG>  SET: 0x75485058 client_channel: CONNECTING --> READY [lb_changed]
16:57:09.590138 [    grpc]  <DEBUG>  CONWATCH: 0x7548dabc pick_first: from READY [cur=READY] notify=0x7548a7fc


2. Checking connectivity

16:57:54.025261 [    grpc]  <DEBUG>  CONWATCH: 0x75485058 client_channel: get READY

3. Disconnect the cable (the client continues to write to the server; the output socket buffer fills, up to a point)

16:59:39.033484 [    grpc]  <DEBUG>  CONWATCH: 0x75485058 client_channel: get READY

Both connections to the server remain open.

4. After 18 minutes or so:

17:17:39.616776 [    grpc]  <DEBUG>  SET: 0x754b626c client_transport: READY --> FATAL_FAILURE [close_transport]
17:17:39.618827 [    grpc]  <DEBUG>  SET: 0x7548dabc pick_first: READY --> FATAL_FAILURE [selected_changed]
17:17:39.619160 [    grpc]  <DEBUG>  SET: 0x7548ca74 subchannel: READY --> FATAL_FAILURE [reflect_child]
17:17:39.619854 [    grpc]  <DEBUG>  SET: 0x75485058 client_channel: READY --> TRANSIENT_FAILURE [lb_changed]

At this point all services fail. We retry by closing the failed service instances and opening new ones. Since we're using the blocking API, the new service instances block in the ClientReaderWriter constructor on the completion queue, as sketched below.
Both connections to the server are closed. All of this seems OK so far.
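The retry looks roughly like this (a simplified sketch; MyService, Stream, Request, and Response stand in for the real generated service, method, and message types):

grpc::ClientContext context;
std::unique_ptr<MyService::Stub> stub = MyService::NewStub(channel);
// Blocking (sync) API: with the channel down, this call blocks in the
// ClientReaderWriter constructor, waiting on the completion queue.
std::unique_ptr<grpc::ClientReaderWriter<Request, Response>> stream =
    stub->Stream(&context);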

5. A few minutes later:

17:19:01.573006 [    grpc]  <ERROR>  getaddrinfo: No address associated with hostname
17:19:01.573913 [    grpc]  <DEBUG>  SET: 0x7548dabc pick_first: FATAL_FAILURE --> FATAL_FAILURE [shutdown]
17:19:01.574632 [    grpc]  <DEBUG>  CONWATCH: 0x754b626c client_transport: unsubscribe notify=0x7548da94
17:19:01.575053 [    grpc]  <DEBUG>  SET: 0x75485058 client_channel: TRANSIENT_FAILURE --> TRANSIENT_FAILURE [new_lb+resolver]


6. Reconnect the cable:

17:23:55.667141 [    grpc]  <DEBUG>  CONWATCH: 0x75485058 client_channel: get TRANSIENT_FAILURE

The channel is still in the TRANSIENT_FAILURE state, and the client never attempts to reconnect again.
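For reference, my understanding is that an explicit reconnect request would look roughly like this (a sketch; the 30-second deadline is arbitrary, and today we only poll with GetState(false)):

// Request a connection attempt (try_to_connect = true) and wait for
// the state to move off TRANSIENT_FAILURE, up to a deadline.
grpc_connectivity_state last = channel->GetState(true);
auto deadline = std::chrono::system_clock::now() + std::chrono::seconds(30);
if (channel->WaitForStateChange(last, deadline)) {
  // State changed; re-check with GetState(false).
} else {
  // Deadline expired with no state change.
}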


Any ideas? Thanks!




Vijay Pai

Aug 3, 2016, 10:27:03 AM
to grpc.io, cha...@retailnext.net
Hello,
We never reproduced this issue, but I wanted to know if you (or any others on the forum) were still experiencing it. I may be able to help get this investigated if it's still a concern.
Regards,
Vijay S. Pai

csen...@gmail.com

Dec 7, 2017, 10:30:58 AM
to grpc.io
Hello,
I am facing a similar issue: https://github.com/grpc/grpc/issues/13656
Could you please share your comments?