Thread 21 (Thread 0x7f54f6bff700 (LWP 51033)):
#0 0x00007f554ad90d43 in epoll_wait () from /lib64/libc.so.6 <-- indefinitely stuck or times out if one is set
#1 0x00007f55416969e5 in add_poll_object (exec_ctx=0x2482ad88, bag=0x188924, bag_type=(POLL_OBJ_POLLSET | unknown: 32596), item=0x0, item_type=(POLL_OBJ_POLLSET | unknown: 1097427428)) at src/core/lib/iomgr/ev_epollsig_linux.c:1484
#2 0x00007f5541697972 in fd_ref (fd=0x4, reason=0x33b8b87c0 <Address 0x33b8b87c0 out of bounds>, file=0x0, line=0) at src/core/lib/iomgr/ev_poll_posix.c:447
#3 0x00007f55416a4691 in tcp_drop_uncovered_then_handle_write (exec_ctx=0x7f54f6bf8680, arg=0x7f4c4e2c9010, error=0x0) at src/core/lib/iomgr/tcp_posix.c:239
#4 0x00007f55416c71a8 in grpc_server_request_registered_call (server=0x0, rmp=0x0, call=0x2482ad88, deadline=0x188924, initial_metadata=0x7f4c4e2c8f00, optional_payload=0x0, cq_bound_to_call=0x4, cq_for_notification=0x33b8b87c0,
tag=0x0) at src/core/lib/surface/server.c:1486
#5 0x00007f55416c75d2 in conforms_to (slice=..., legal_bits=0x7f4c22ce2e50 "\270\064\245\214L\177", err_desc=0x7f54f6bf8701 "\211\030") at src/core/lib/surface/validate_metadata.c:40
#6 0x00007f5541b97105 in HPHP::c_Call_ni_startBatch (this_=0x7f4c22ce2e60, actions=...) at /home/ryangordon/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:529
The stack trace doesn't always look like that but it always is stuck on epoll_wait/epoll_pwait often from gpr_cv_wait.
I've managed to get GPRC_TRACE=all GRPC_VERBOSITY=DEBUG of a good request and a request when it is in the bad state to compare side by side.
The bad request looks like this when it's execution deviates:
D0919 14:54:36.312349629 51030 combiner.c:163] C:0x7f54faf35c00 grpc_combiner_execute c=0x7f4c22edbfb0 last=61995
D0919 14:54:36.312353011 51030 exec_ctx.c:73] closure 0x7f4c22edbec0 finished
I0919 14:54:36.312358626 51030 completion_queue.c:834] grpc_completion_queue_next(cq=0x7f4c22e9e200, deadline=gpr_timespec { tv_sec: 4, tv_nsec: 999000000, clock_type: 3 }, reserved=(nil))...
Whereas, the good request does this:
D0915 15:19:50.178995777 13689 combiner.c:163] C:0x7f932172f700 grpc_combiner_execute c=0x7f9313d67fb0 last=1 D0915 15:19:50.179005880 13689 exec_ctx.c:73] closure 0x7f9313d67ec0 finished D0915 15:19:50.179074826 13689 combiner.c:225] C:0x7f932172f700 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0 D0915 15:19:50.179077765 13689 combiner.c:243] C:0x7f932172f700 maybe_finish_one n=0x7f9313d67fb0... (and a whole lot more)Effectively it seems like the something is not getting scheduled when it should be (grpc_combiner_continue_exec_ctx never gets run). I've been working on this for a few weeks now and haven't been able to track down why or a fix. I have the full logs if they might be helpful. I've been testing on my branch that is kept pretty close to grpc/grpc master.Does anyone have any idea what is going on? I have the bad process open on a test server in gdb and have timeouts implemented so I can run repeated requests at it to continue debugging if necessary.Thanks,Ryan
--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/b3cc406f-9b93-4259-b8f9-df3eb904132a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
#0 0x00007f6ddb4a1a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f6dcbf4471c in gpr_cv_wait (cv=0x7f6db13f8a90, mu=0x7f6a65b9cd90, abs_deadline=...) at src/core/lib/support/sync_posix.c:80
#2 0x00007f6dcbe9bea8 in begin_worker (pollset=0x7f6a65b9cd90, worker=0x7f6db13f8a70, worker_hdl=0x0, now=0x7f6db13f8a30, deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:747
#3 0x00007f6dcbe9c7c9 in pollset_work (exec_ctx=0x7f6db13f8b80, ps=0x7f6a65b9cd90, worker_hdl=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:941
#4 0x00007f6dcbea9691 in grpc_pollset_work (exec_ctx=0x7f6db13f8b80, pollset=0x7f6a65b9cd90, worker=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_posix.c:210
#5 0x00007f6dcbecc1a8 in cq_next (cq=0x7f6a65b9cc80, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:914
#6 0x00007f6dcbecc5d2 in grpc_completion_queue_next (cq=0x7f6a65b9cc80, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:992
#7 0x00007f6dcc394ecc in HPHP::c_Call_ni_startBatch (this_=0x7f6a9288a670, actions=...) at /opt/build/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:583
#0 0x00007f6dd56b6d43 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f6dcbe9b9e5 in do_epoll_wait (exec_ctx=0x7f6d99ff8b80, ps=0x7f6a65867710, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:640
#2 0x00007f6dcbe9c972 in pollset_work (exec_ctx=0x7f6d99ff8b80, ps=0x7f6a65867710, worker_hdl=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:964
#3 0x00007f6dcbea9691 in grpc_pollset_work (exec_ctx=0x7f6d99ff8b80, pollset=0x7f6a65867710, worker=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_posix.c:210
#4 0x00007f6dcbecc1a8 in cq_next (cq=0x7f6a65867600, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:914
#5 0x00007f6dcbecc5d2 in grpc_completion_queue_next (cq=0x7f6a65867600, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:992
#6 0x00007f6dcc394ecc in HPHP::c_Call_ni_startBatch (this_=0x7f6a624e0670, actions=...) at /opt/build/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:583
D0919 14:54:36.312349629 51030 combiner.c:163] C:0x7f54faf35c00 grpc_combiner_execute c=0x7f4c22edbfb0 last=61995
D0919 14:54:36.312349629 51030 combiner.c:163] C:0x7f54faf35c00 grpc_combiner_execute c=0x7f4c22edbfb0 last=61995
I0919 14:54:36.312170818 51030 call.c:2069] grpc_call_start_batch(call=0x7f4c22edb020, ops=0x7f550e39fde0, nops=4, tag=0x7f4c22edb020, reserved=(nil))
I0919 14:54:36.312199708 51030 call.c:1677] ops[0]: SEND_INITIAL_METADATA
..
I0919 14:54:36.312210605 51030 call.c:1677] ops[1]: RECV_INITIAL_METADATA ptr=0x7f4c22e9e510
I0919 14:54:36.312214004 51030 call.c:1677] ops[2]: SEND_MESSAGE ptr=0x7f4c262df200
I0919 14:54:36.312216804 51030 call.c:1677] ops[3]: SEND_CLOSE_FROM_CLIENT
..
D0919 14:54:36.312362846 51030 completion_queue.c:812] PENDING TAGS: 0x7f4c22edb020
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/d8cd586a-d329-4a51-a030-fff17c458e7d%40googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/0924073e-285e-444e-aa95-b500dbb34320%40googlegroups.com.
Yes, GRPC_VERBOSITY=DEBUG.
The primary cause of your problem is that you're calling completion_queue_pluck after calling cancel_with_status. Assuming you're following the model of the current PHP wrapper and calling pluck with an infinite timeout, you should only do so exactly once for each operation that creates a tag on the queue. The function start_batch creates a tag; cancel_with_status does not. So blocking forever is the expected behavior in that situation.
Also, you should not need to call cancel_with_status at all. If the call has timed out, that should mean that the call has already finished and that there is already a status.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/cf6ecb95-47be-4b0a-bcc1-d79b22ee5909%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/571eb275-53a4-4b82-8203-8eb5db9ade87%40googlegroups.com.