URGENT: Crash in gRPC code while running an asynchronous stream for a long time.


Chaitanya Gangwar

Mar 27, 2017, 2:47:10 AM3/27/17
to grpc.io
Hi,

I am seeing the following crash in gRPC code. I have a 5-node setup where all nodes are streaming out data. After a few hours of streaming, I see a crash on 2 nodes, but the remaining 3 nodes keep working fine. It is not always reproducible, so it looks like some timing issue. Below is the crash:

#0  0x7438e660 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67

#1  0x743900ae in __GI_abort () at abort.c:92

#2  0x748b847a in grpc::Server::PerformOpsOnCall(grpc::CallOpSetInterface*, grpc::Call*) () from /usr/local/grpc/1.1.0/lib/libgrpc++_unsecure.so.1

#3  0x74a20c97 in PerformOps (ops=<optimized out>, this=<optimized out>) at oss-binaries-x86-gcc-4.8.2-glibc-2.13/usr/local/grpc/1.1.0/include/grpc++/impl/codegen/call.h:669

#4  Write (tag=<optimized out>, msg=..., this=<optimized out>) at oss-binaries-x86-gcc-4.8.2-glibc-2.13/usr/local/grpc/1.1.0/include/grpc++/impl/codegen/async_stream.h:431

#5  WaveGrpcClientInfo<com::telemetry::interface::InterfaceStatisticsProfileResponse>::postDataToClient (this=0x9ad64a0, pdata=0x72611028, deleteProtoBuf=false) at ./GRPCInterface/WaveGrpcClientInfo.cpp:47


Env Information:

- C++ server
- Python client
- gRPC library version: 1.1.0-dev
- Asynchronous streaming


Code:

template <class T> void WaveGrpcClientInfo<T>::postDataToClient (void *pdata, bool deleteProtoBuf)
{
    // Check if the client is still alive; if the client has disconnected, just return.
    if (m_serverContext->IsCancelled ())
    {
        return;
    }

    // Post the protobuf output to the client.
    m_writer->Write (*(static_cast<T*>(pdata)), m_uniqueTag);

    if (true == deleteProtoBuf)
    {
        WaveNs::tracePrintf (TRACE_LEVEL_DEVEL, "WaveGrpcClientInfo::postDataToClient: Deleting payload.");
        delete (static_cast<T*>(pdata));
    }
}


ServerAsyncWriter<T>*  m_writer;


Is this a known issue? Is it fixed in any newer release? Please help.


Thanks

Chaitanya

Craig Tiller

Mar 27, 2017, 9:40:23 AM3/27/17
to Chaitanya Gangwar, grpc.io

It looks like you're trying to start a new write while there's one already outstanding. This is unsupported and will crash (although we should admittedly give better messaging).

You need to ensure that the previous write has delivered its tag back via the completion queue before starting a new one.
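For illustration, here is a minimal sketch of that rule, assuming a hypothetical response type MyResponse, a hypothetical producer function NextMessage(), and a single tag per stream (none of these names come from the code above): after Write(msg, tag), no further Write() may be issued on the same stream until cq->Next() hands that tag back.

#include <grpc++/grpc++.h>

void StreamLoop(grpc::ServerAsyncWriter<MyResponse>* writer,
                grpc::ServerCompletionQueue* cq,
                void* write_tag)
{
    MyResponse msg = NextMessage();        // hypothetical producer
    writer->Write(msg, write_tag);         // start the first write

    void* got_tag = nullptr;
    bool ok = false;
    while (cq->Next(&got_tag, &ok))        // block until some operation completes
    {
        if (got_tag == write_tag && ok)    // the previous Write() has finished
        {
            msg = NextMessage();
            writer->Write(msg, write_tag); // only now is a new write legal
        }
    }
}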



Chaitanya Gangwar

Mar 27, 2017, 10:19:41 AM3/27/17
to grpc.io, chaitany...@gmail.com

Thanks Craig for your response.

In my design, I have one thread where I start the gRPC server and wait on the completion queue for events. As soon as a client connects, I store the AsyncWriter pointer in a map, and from another thread I post on that AsyncWriter every 5 seconds. Is this a wrong design? I agree that with this design, the posting thread can start a new write before the previous write has completed. Could you please suggest changes to fix this?

Thanks
Chaitanya

Craig Tiller

Mar 27, 2017, 11:07:23 AM3/27/17
to Chaitanya Gangwar, grpc.io
If gRPC allowed you to start a new write before the old one completed, there would be an opportunity for unbounded memory growth, which could crash your application (out-of-memory errors).

Instead, we limit the outstanding write requests to one, and force the application to deal with push-back.

In your case, you should track whether each write has completed. I can think of a variety of strategies for dealing with a straggling client (a sketch of option 2 follows below):
1. Disconnect if the last message isn't consumed.
2. Buffer N messages, then fall back to option 1.
3. Delay the next round of messages until all clients have consumed the current message.
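For illustration, a minimal sketch of option 2, assuming one ServerAsyncWriter per connected client and a hypothetical MyResponse type (the struct, bound, and function names below are illustrative and not from this thread): the producer thread queues messages while a write is outstanding, the completion-queue thread starts the next buffered write when the tag comes back, and a client that falls more than kMaxPending messages behind is dropped, as in option 1.

#include <deque>
#include <mutex>
#include <grpc++/grpc++.h>

constexpr size_t kMaxPending = 64;                     // assumed bound; tune as needed

struct ClientState
{
    grpc::ServerAsyncWriter<MyResponse>* writer = nullptr;
    std::mutex                           mu;           // producer and CQ threads both touch this state
    std::deque<MyResponse>               pending;      // messages waiting for the stream
    bool                                 writeInFlight = false;  // a Write() tag is still outstanding
};

// Producer thread (e.g. the 5-second poster): queue the message or start a write.
// Returns false if the client is too far behind and should be disconnected (option 1).
bool postToClient(ClientState* client, const MyResponse& msg, void* tag)
{
    std::lock_guard<std::mutex> lock(client->mu);

    if (client->writeInFlight)
    {
        if (client->pending.size() >= kMaxPending)
        {
            return false;                              // straggler: give up on this client
        }

        client->pending.push_back(msg);                // defer until the tag comes back
        return true;
    }

    client->writeInFlight = true;
    client->writer->Write(msg, tag);                   // exactly one outstanding write
    return true;
}

// Completion-queue thread: call this when cq->Next() returns this client's write tag.
void onWriteDone(ClientState* client, void* tag)
{
    std::lock_guard<std::mutex> lock(client->mu);

    if (client->pending.empty())
    {
        client->writeInFlight = false;                 // stream is idle; the next post writes directly
    }
    else
    {
        client->writer->Write(client->pending.front(), tag);  // start the next buffered write
        client->pending.pop_front();
    }
}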
