gRPC stuck in epoll_wait state

nupur uttarwar

Dec 13, 2021, 7:43:01 PM
to grpc.io

Hello,

We are using the gnmi-cli client to configure ports. It sends a unary RPC request over gRPC.

For example: sudo gnmi-cli set "device:virtual-device,name:net_vhost0,host:host1,device-type:VIRTIO_NET,queues:1,socket-path:/tmp/vhost-user-0,port-type:LINK"

This worked fine with gRPC version 1.17.2. We are now upgrading gRPC and the other dependent modules used in our project. After upgrading to version 1.33, the gnmi client's request gets stuck in epoll_wait indefinitely. Here is the backtrace:

0x00007f85e9bc380e in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f85e9bc380e in epoll_wait () from /lib64/libc.so.6
#1  0x00007f85eb642864 in pollable_epoll(pollable*, long) () from /usr/local/lib/libgrpc.so.12
#2  0x00007f85eb6432e9 in pollset_work(grpc_pollset*, grpc_pollset_worker**, long) () from /usr/local/lib/libgrpc.so.12
#3  0x00007f85eb64acd5 in pollset_work(grpc_pollset*, grpc_pollset_worker**, long) () from /usr/local/lib/libgrpc.so.12
#4  0x00007f85eb652cde in grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, long) () from /usr/local/lib/libgrpc.so.12
#5  0x00007f85eb6b9c50 in cq_pluck(grpc_completion_queue*, void*, gpr_timespec, void*) () from /usr/local/lib/libgrpc.so.12
#6  0x00007f85eb6b9ed3 in grpc_completion_queue_pluck () from /usr/local/lib/libgrpc.so.12
#7  0x00007f85ea856f2b in grpc::CoreCodegen::grpc_completion_queue_pluck(grpc_completion_queue*, void*, gpr_timespec, void*) ()
   from /usr/local/lib/libgrpc++.so.1
#8  0x00000000005db71e in grpc::CompletionQueue::Pluck (this=0x7ffec74be7e0, tag=0x7ffec74be840)
    at /usr/local/include/grpcpp/impl/codegen/completion_queue.h:316
#9  0x00000000005e7467 in grpc::internal::BlockingUnaryCallImpl<gnmi::SetRequest, gnmi::SetResponse>::BlockingUnaryCallImpl (this=0x7ffec74beaa0,
    channel=<optimized out>, method=..., context=0x7ffec74beea0, request=..., result=0x7ffec74bec40)
    at /usr/local/include/grpcpp/impl/codegen/client_unary_call.h:69
#10 0x00000000005d5dab in grpc::internal::BlockingUnaryCall<gnmi::SetRequest, gnmi::SetResponse> (result=0x7ffec74be670, request=...,
    context=0x7ffec74bebf0, method=..., channel=<optimized out>) at /usr/local/include/grpcpp/impl/codegen/client_unary_call.h:38
#11 gnmi::gNMI::Stub::Set (this=<optimized out>, context=context@entry=0x7ffec74beea0, request=..., response=response@entry=0x7ffec74bec40)
    at p4proto/p4rt/proto/p4/gnmi/gnmi.grpc.pb.cc:101
#12 0x000000000041de62 in gnmi::Main (argc=-951325536, argv=0x7ffec74bee20) at /usr/include/c++/10/bits/unique_ptr.h:173
#13 0x00007f85e9aea1e2 in __libc_start_main () from /lib64/libc.so.6
#14 0x000000000041a06e in _start () at /usr/include/c++/10/new:175

 

Comparing the logs from a successful run and a failing run, I can see that gRPC gets stuck in epoll_wait, waiting for an OP_COMPLETE event after grpc_call_start_batch has started the batch.
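
Because the blocking call never returns, it can help to bound the client run so that a hang is reported instead of stalling a script. The following is only a minimal sketch: the helper name run_bounded is hypothetical, and it assumes GNU coreutils timeout is installed, which exits with status 124 when the time limit is reached.

```sh
# Hypothetical helper: run a command with a wall-clock limit and report a hang.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
run_bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # The command was still running when the limit expired.
        echo "command did not finish within ${limit}s (possible hang)" >&2
    fi
    return "$status"
}
```

For instance, run_bounded 30 sudo gnmi-cli set "..." would give the request 30 seconds (an arbitrary value chosen for illustration) before reporting a possible hang.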

After investigating further, I found that the issue first appears in version 1.32.0, specifically after this commit (https://github.com/grpc/grpc/pull/23372). With the commit just before it, everything works fine.

Attached for reference are the logs captured with GRPC_TRACE=all,-timer_check,-timer and GRPC_VERBOSITY=DEBUG.
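
For anyone who wants to reproduce logs of this kind, here is a rough sketch of one way to capture them. The helper name run_with_grpc_trace and the log file path are hypothetical; the two environment variables and their values are the ones quoted above.

```sh
# Hypothetical helper: run a command with gRPC tracing enabled and save
# everything it prints (stdout and stderr) to a log file.
# Usage: run_with_grpc_trace LOGFILE COMMAND [ARGS...]
run_with_grpc_trace() {
    logfile="$1"
    shift
    # The variables are set only for this one command invocation.
    GRPC_VERBOSITY=DEBUG GRPC_TRACE=all,-timer_check,-timer "$@" >"$logfile" 2>&1
}
```

A call might look like run_with_grpc_trace /tmp/gnmi-trace.log gnmi-cli set "...". Note that sudo often resets the environment, so when the client has to run under sudo the variables may need to be preserved explicitly (for example with sudo -E, where the sudoers policy allows it) or the whole script run as root.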

Any idea why this is happening? Please let me know if you need more logs or any other information to assist further.

 

Thanks,

Nupur Uttarwar

AJ Heller

May 17, 2022, 6:39:27 PM
to grpc.io
If you're still having this issue, it would be worth trying to upgrade to gRPC v1.46.0 or newer. The default polling engine has been removed, so if there is still an underlying bug in gnmi or gRPC, it may show up in some other way.
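
As a diagnostic on the existing build, it may also be worth checking whether the hang depends on the polling engine at all. gRPC core reads the GRPC_POLL_STRATEGY environment variable to choose an engine. The sketch below is only an illustration: the helper name run_with_poll_strategy is hypothetical, and which strategy names (such as epoll1 or poll) are accepted depends on the gRPC version and platform, so treat the value as an assumption to verify.

```sh
# Hypothetical helper: run a command with a specific gRPC polling engine.
# Usage: run_with_poll_strategy STRATEGY COMMAND [ARGS...]
# STRATEGY is passed through unchanged; valid names depend on the gRPC build.
run_with_poll_strategy() {
    strategy="$1"
    shift
    # The variable is set only for this one command invocation.
    GRPC_POLL_STRATEGY="$strategy" "$@"
}
```

For example, run_with_poll_strategy epoll1 gnmi-cli set "..." would request the epoll1 engine for that one run. If the request completes with a different engine, that points at the polling layer rather than at gnmi itself.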