Multiple calldata instances with one shared completion queue in Server

Youbing Li

Aug 8, 2022, 1:36:08 AM
to grpc.io
Hello,

My service has multiple methods.

I am using the async C++ API, and the methods share the same completion queue on the server side.

In each calldata, there is an object:
    grpc::ServerAsyncResponseWriter<Reply> responder_{nullptr};

In the application layer, each calldata is used by one thread.
If multiple threads simultaneously call responder_->Finish, each on its own calldata, the server aborts.

It seems like multiple simultaneous operations on the same completion queue are not supported.
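
For reference, here is a minimal sketch of the setup (CallData, MyService, Request, and Reply are simplified stand-ins for my real types):

    // Each CallData owns its own responder, but all instances share
    // the single server completion queue.
    class CallData {
     public:
      CallData(MyService::AsyncService* service, grpc::ServerCompletionQueue* cq)
          : service_(service), cq_(cq), responder_(&ctx_) {
        // Ask gRPC to deliver the next incoming call for this method.
        service_->RequestMyMethod(&ctx_, &request_, &responder_, cq_, cq_, this);
      }

      // Called from an application worker thread once the reply is ready.
      // Several worker threads may reach this point at the same time,
      // each on its own CallData instance.
      void SendReply(const Reply& reply) {
        responder_.Finish(reply, grpc::Status::OK, this);  // abort happens here
      }

     private:
      MyService::AsyncService* service_;
      grpc::ServerCompletionQueue* cq_;
      grpc::ServerContext ctx_;
      Request request_;
      grpc::ServerAsyncResponseWriter<Reply> responder_;
    };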

GDB backtrace (note: across all thread stacks, multiple threads are inside Finish):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff76c9859 in __GI_abort () at abort.c:79
#2  0x0000555559003b21 in absl::lts_20211102::raw_logging_internal::(anonymous namespace)::RawLogVA(absl::lts_20211102::LogSeverity, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (severity=absl::lts_20211102::LogSeverity::kFatal, file=0x5555596fdcca "mutex.cc", line=1922, format=0x5555596fdb58 "Check %s failed: %s", ap=0x7ffc5a9eaef0) at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/base/internal/raw_logging.cc:182
#3  0x0000555559003c98 in absl::lts_20211102::raw_logging_internal::RawLog (severity=absl::lts_20211102::LogSeverity::kFatal, file=0x5555596fdcca "mutex.cc", line=1922, format=0x5555596fdb58 "Check %s failed: %s") at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/base/internal/raw_logging.cc:216
#4  0x0000555558f9c962 in absl::lts_20211102::Mutex::LockSlowLoop (this=0x55555af72d28, waitp=0x7ffc5a9eb1f0, flags=0) at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/synchronization/mutex.cc:1922
#5  0x0000555558f9c679 in absl::lts_20211102::Mutex::LockSlowWithDeadline (this=0x55555af72d28, how=0x5555596fde60 <absl::lts_20211102::kExclusiveS>, cond=0x0, t=..., flags=0) at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/synchronization/mutex.cc:1875
#6  0x0000555558f9c33b in absl::lts_20211102::Mutex::LockSlow (this=0x55555af72d28, how=0x5555596fde60 <absl::lts_20211102::kExclusiveS>, cond=0x0, flags=0) at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/synchronization/mutex.cc:1768
#7  0x0000555558f9b309 in absl::lts_20211102::Mutex::Lock (this=0x55555af72d28) at /data/src/open-src/vcpkg/buildtrees/abseil/src/cae233d0bf-0b16e4be7c.clean/absl/synchronization/mutex.cc:1481
#8  0x0000555558ed7038 in gpr_mu_lock (mu=0x55555af72d28) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/core/lib/gpr/sync_abseil.cc:56
#9  0x0000555558c001c8 in grpc_cq_begin_op (cq=0x55555af72c20, tag=0x55555b062210) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/core/lib/surface/completion_queue.cc:666
#10 0x0000555558bf0d49 in call_start_batch (call=0x7ffff0016130, ops=0x7ffc5a9eb590, nops=3, notify_tag=0x55555b062210, is_notify_tag_closure=0) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/core/lib/surface/call.cc:1702
#11 0x0000555558bf10c1 in grpc_call_start_batch (call=0x7ffff0016130, ops=0x7ffc5a9eb590, nops=3, tag=0x55555b062210, reserved=0x0) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/core/lib/surface/call.cc:1761
#12 0x000055555642e799 in grpc::CoreCodegen::grpc_call_start_batch (this=0x55555ae90ad0, call=0x7ffff0016130, ops=0x7ffc5a9eb590, nops=3, tag=0x55555b062210, reserved=0x0) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/cpp/common/core_codegen.cc:113
#13 0x0000555555c4ed5e in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpServerSendStatus, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::ContinueFillOpsAfterInterception (this=0x55555b062210) at /data/src/fabarta/fabarta-compute/build/vcpkg_installed/x64-linux/include/grpcpp/impl/codegen/call_op_set.h:971
#14 0x0000555555c4eb05 in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpServerSendStatus, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::FillOps (this=0x55555b062210, call=0x55555b062078) at /data/src/fabarta/fabarta-compute/build/vcpkg_installed/x64-linux/include/grpcpp/impl/codegen/call_op_set.h:901
#15 0x0000555556445e46 in grpc::Server::PerformOpsOnCall (this=0x55555af57160, ops=0x55555b062210, call=0x55555b062078) at /data/src/open-src/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/cpp/server/server_cc.cc:1305
#16 0x0000555555c2198d in grpc::internal::Call::PerformOps (this=0x55555b062078, ops=0x55555b062210) at /data/src/fabarta/fabarta-compute/build/vcpkg_installed/x64-linux/include/grpcpp/impl/codegen/call.h:68
#17 0x0000555555c4f9c9 in grpc::ServerAsyncResponseWriter<fabarta::compute::ReplyMsg>::Finish (this=0x55555b062070, msg=..., status=..., tag=0x55555b061bc0) at /data/src/fabarta/fabarta-compute/build/vcpkg_installed/x64-linux/include/grpcpp/impl/codegen/async_unary_call.h:355


Youbing Li

Aug 8, 2022, 10:00:15 AM
to grpc.io
From the following logs, we can see that multiple requests arrive within the same second from the same host:port:

[2022-08-08 21:30:18.707037] call_data 0x55555b09b810 from host ipv4:192.168.0.42:50576
[2022-08-08 21:30:18.724226] call_data 0x55555b098ca0 from host ipv4:192.168.0.42:50576
[2022-08-08 21:30:18.762191] call_data 0x55555b09f4a0 from host ipv4:192.168.0.42:38126
[2022-08-08 21:30:18.762573] call_data 0x55555b1606a0 from host ipv4:192.168.0.42:38126
[2022-08-08 21:30:18.762507] call_data 0x55555b0a0600 from host ipv4:192.168.0.42:38126
[2022-08-08 21:30:18.762365] call_data 0x55555b187aa0 from host ipv4:192.168.0.42:38126
[2022-08-08 21:30:18.767045] call_data 0x55555b09a6b0 from host ipv4:192.168.0.42:50576
[2022-08-08 21:30:18.768973] call_data 0x55555b0dc0e0 from host ipv4:192.168.0.56:56830
[2022-08-08 21:30:18.769633] call_data 0x55555b09af20 from host ipv4:192.168.0.56:56830
[2022-08-08 21:30:18.771011] call_data 0x55555b09c080 from host ipv4:192.168.0.56:56830
[2022-08-08 21:30:18.786424] call_data 0x55555b097290 from host ipv4:192.168.0.42:50576

If I lock a mutex before calling "Finish", the issue goes away, but the side effect is a performance downgrade.
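
The workaround looks roughly like this (a sketch; it reuses the CallData names from the first message, with one global mutex shared by every calldata instance):

    #include <mutex>

    // Workaround sketch: serializing all Finish calls hides the crash,
    // but it also removes the concurrency, hence the slowdown.
    std::mutex g_finish_mu;

    void CallData::SendReply(const Reply& reply) {
      std::lock_guard<std::mutex> lock(g_finish_mu);
      responder_.Finish(reply, grpc::Status::OK, this);
    }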

If I manually set the channel argument GRPC_ARG_MAX_CONCURRENT_STREAMS on both the client and the server side, the issue is gone as well.
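
For the record, this is roughly how I set it (a sketch; the value 100 and the target address are placeholders):

    // Server side: 'builder' is the grpc::ServerBuilder used to build the server.
    builder.AddChannelArgument(GRPC_ARG_MAX_CONCURRENT_STREAMS, 100);

    // Client side: pass the same argument through grpc::ChannelArguments.
    grpc::ChannelArguments args;
    args.SetInt(GRPC_ARG_MAX_CONCURRENT_STREAMS, 100);
    auto channel = grpc::CreateCustomChannel(
        "192.168.0.42:50051", grpc::InsecureChannelCredentials(), args);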

Does anyone know the reason?

Youbing Li

Aug 9, 2022, 9:18:59 PM
to grpc.io
(gdb) bt
#0  0x00005555559d42de in read_channel_args(grpc_chttp2_transport*, grpc_channel_args const*, bool) ()
#1  0x00005555559d5618 in grpc_chttp2_transport::grpc_chttp2_transport(grpc_channel_args const*, grpc_endpoint*, bool, grpc_resource_user*) ()
#2  0x00005555559e0ba9 in grpc_create_chttp2_transport(grpc_channel_args const*, grpc_endpoint*, bool, grpc_resource_user*) ()
#3  0x00005555559cd710 in grpc_core::(anonymous namespace)::Chttp2ServerListener::ActiveConnection::HandshakingState::OnHandshakeDone(void*, grpc_error*) ()
#4  0x0000555555854ddd in exec_ctx_run(grpc_closure*, grpc_error*) ()
#5  0x0000555555855325 in grpc_core::ExecCtx::Flush() ()
#6  0x0000555555b0a531 in pollset_work ()
#7  0x0000555555ab4933 in pollset_work(grpc_pollset*, grpc_pollset_worker**, long) ()
#8  0x000055555585760d in grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, long) ()
#9  0x000055555588e21b in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
#10 0x000055555588e799 in grpc_completion_queue_next ()
#11 0x00005555555ee99c in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
#12 0x00005555555c4ebe in grpc::CompletionQueue::Next (this=0x555556495370, tag=0x7fffffffdf58, ok=0x7fffffffdf57)
    at /usr/local/include/grpcpp/impl/codegen/completion_queue.h:185

read_channel_args reads the GRPC_ARG_MAX_CONCURRENT_STREAMS configuration, and
maybe_start_some_streams assigns the stream ID and adds the stream into the connection's stream map.