Channel State

mauricio...@lacity.org

Oct 19, 2018, 12:56:34 PM
to grpc.io
I started with a simple synchronous unary call from a C++ client to a Go-based server.  We decided to add a health check routine so that if we get disconnected, I can sync data once communication is re-established.  I have a timer that calls the GetState method every 5 seconds to get the current state.  That was working great.

We then had to switch to the async version of the RPC.  I based my code on the greeter_async_client_2 example, and ever since, the health check routine crashes every time it calls GetState.

Program terminated with signal 11, Segmentation fault.
#0  0x00007f4b123c7920 in grpc::Channel::GetState(bool) () from /usr/local/lib/libgrpc++.so.1
(gdb) bt
#0  0x00007f4b123c7920 in grpc::Channel::GetState(bool) () from /usr/local/lib/libgrpc++.so.1
#1  0x0000000000430289 in MobileFeedServer::OnHealthCheck (this=0x660f80 <mfeed>) at mobilefeed.cpp:865
#2  0x0000000000415723 in MobileFeedServer::TimerHealthCheck::onTimer (this=<optimized out>, timer=<optimized out>) at mobilefeed.cpp:858
#3  0x0000000000445014 in MiddlewareEvent_TimerCB (event=<optimized out>, msg=<optimized out>, closure=0x661068 <mfeed+232>) at /middleware/src/version/libmiddlewarecpp/event.cpp:357
#4  0x00007f4b126405c2 in _middlewareQueue_DispatchEx (q=0x1fb35b0, wait=wait@entry=0, ignoreListenerLimit=ignoreListenerLimit@entry=MIDDLEWARE_TRUE) at /middleware/src/version/libmiddleware/disp.c:471
#5  0x00007f4b1264082a in _middlewareQueue_Dispatch (q=<optimized out>, wait=wait@entry=0) at /middleware/src/version/libmiddleware/disp.c:538
#6  0x00007f4b12640cd0 in MiddlewareQueueGroup_TimedDispatch (qgroup=<optimized out>, wait=<optimized out>, wait@entry=0.5) at /middleware/src/version/libmiddleware/disp.c:733
#7  0x000000000044592c in MiddleWareQueueGroup::timedDispatch (this=this@entry=0x660ff0 <mfeed+112>, timeout=timeout@entry=0.5) at /middleware/src/version/libmiddlewarecpp/qgroup.cpp:58
#8  0x00007f4b12eb6591 in MyTask::MainLoop (this=0x660f80 <mfeed>, transport=<optimized out>) at mytask.cxx:317
#9  0x00007f4b12eb4529 in MyTask::MainLoop (this=this@entry=0x660f80 <mfeed>, Description=Description@entry=0x449e12 "MobileFeed", argc=argc@entry=7, argv=argv@entry=0x7fffa279e6e8) at mytask.cxx:272
#10 0x000000000040ff83 in main (argc=7, argv=0x7fffa279e6e8) at mobilefeed.cpp:958
(gdb)

My code is fairly straightforward.  When my timer goes off, I save the current state as the previous state, get the new state, and then, if the new state is ready and the previous one wasn't, I synchronize.

grpc_connectivity_state mystate;

mystate = grpcClient->channelinterface->GetState(true);
grpcClient->SetCurrentState(mystate);

if ((mystate == GRPC_CHANNEL_READY) && (grpcClient->GetOldState() != GRPC_CHANNEL_READY))
{
    // synchronize data
}
else
{
    // log state
}
Is there a better way to implement this health check?  

Mark D. Roth

Oct 24, 2018, 3:52:04 PM
to mauricio...@lacity.org, grp...@googlegroups.com
I'm not sure exactly what is causing this crash, but I suspect it may be a side-effect of the way you're monitoring the channel's connectivity state via a timer.  For example, it could be that the timer is firing after the channel has already been destroyed (either because you're failing to cancel the timer when you destroy the channel or due to some race condition in your timer code).

A much better approach would be to use either the NotifyOnStateChange() method or the WaitForConnected() method to get notified when the state changes, so that you don't have to mess with timers to periodically check the state or worry about lifetime issues.  The difference between these two methods is that the former uses a completion queue whereas the latter blocks a thread waiting for the state change.  If you're already using the C++ async API, then you probably want the former; otherwise, you probably want the latter.
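To make that concrete, here is a rough sketch of the completion-queue variant.  The names (channel, cq, WatchConnectivity, the tag) are placeholders rather than anything from your code; the idea is to arm a connectivity watch on the completion queue your async client already drains, instead of polling GetState() from a timer.

#include <chrono>
#include <memory>

#include <grpcpp/grpcpp.h>

// Sketch only: ask the channel to notify the completion queue when its
// connectivity state changes.
void WatchConnectivity(const std::shared_ptr<grpc::Channel>& channel,
                       grpc::CompletionQueue* cq, void* tag) {
  // Passing true asks the channel to start connecting if it is IDLE.
  grpc_connectivity_state current = channel->GetState(true);
  // A completion with `tag` is delivered on `cq` when the state moves away
  // from `current` (or when the deadline expires).  Re-arm the watch from
  // the completion handler, passing the newly observed state.
  channel->NotifyOnStateChange(
      current,
      std::chrono::system_clock::now() + std::chrono::seconds(5),
      cq, tag);
}

When the tag comes back on the queue, compare the new state against the last one you observed (that's where your READY check and data sync would go) and call NotifyOnStateChange() again to keep watching.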

If you use NotifyOnStateChange(), note that this API may miss state transitions.  For example, if the channel was in state IDLE when you first got a result and is now in state READY, it's likely that the channel was in state CONNECTING in between the two, but you might never have gotten a notification for it.  But that's probably what you want, since what you really care about is when the channel goes into READY state, and if you miss that transition, that means that the channel went into READY but then immediately switched to another state, in which case you probably wouldn't want to have started a new attempt anyway.

If you use WaitForConnected(), you will probably want to call that whenever an RPC fails with a status like UNAVAILABLE.  Then it will block until the channel is reconnected, at which point you can retry your RPC.
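For the blocking variant, something along these lines is usually enough.  DoRpc and channel here are placeholders for your own stub call and channel, and this fragment assumes the usual grpcpp and chrono includes:

// Sketch only: wait for the channel to reconnect before retrying.
grpc::Status status = DoRpc();
if (status.error_code() == grpc::StatusCode::UNAVAILABLE) {
  // Blocks the calling thread until the channel reaches READY or the
  // deadline expires; returns false on timeout.
  if (channel->WaitForConnected(
          std::chrono::system_clock::now() + std::chrono::seconds(30))) {
    status = DoRpc();  // retry now that the channel is connected again
  }
}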

I hope this information is helpful.



--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.

Mauricio Ramirez

Oct 24, 2018, 3:54:45 PM
to ro...@google.com, grp...@googlegroups.com
Mark,

Thanks for the information.  This was very helpful.
--
Mauricio Ramirez
Programmer Analyst
