Hi,
I see an occasional crash when shutting down my server (version 1.4.2, VS 2015, Windows).
My setup: I use the async API and have 2 threads.
1. Thread 1 processes the tags from the completion queue. It also executes the shutdown request.
2. Thread 2 plucks the tags from the completion queue and queues them for thread 1 to process.
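For reference, the hand-off between the two threads can be sketched roughly as below. This is a simplified, self-contained model and not my actual code: TagInfo, TagHandoff and runHandoff are hypothetical names, an int stands in for the tag pointer, and a fixed list of tags stands in for the completion queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the (tag, ok) pair returned by the completion queue.
struct TagInfo {
    int tag;   // stands in for the void* tag
    bool ok;
};

// Hypothetical hand-off queue: thread 2 pushes, thread 1 pops.
class TagHandoff {
public:
    void push(TagInfo t) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            assert(!mClosed && "push after close");
            mQueue.push(t);
        }
        mCond.notify_one();
    }

    // Called by thread 2 once its source of tags reports shutdown.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mClosed = true;
        }
        mCond.notify_all();
    }

    // Blocks until a tag is available. Returns false only when the
    // queue is closed and every previously pushed tag has been popped.
    bool pop(TagInfo& out) {
        std::unique_lock<std::mutex> lock(mMutex);
        mCond.wait(lock, [this] { return mClosed || !mQueue.empty(); });
        if (mQueue.empty())
            return false;
        out = mQueue.front();
        mQueue.pop();
        return true;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCond;
    std::queue<TagInfo> mQueue;
    bool mClosed = false;
};

// Runs the two-thread arrangement over a fixed list of tags and returns
// the tags in the order the processing thread saw them.
std::vector<int> runHandoff(const std::vector<TagInfo>& incoming) {
    TagHandoff handoff;
    std::vector<int> processed;

    // "Thread 2": plucks tags and queues them for the processing thread.
    std::thread plucker([&] {
        for (const TagInfo& t : incoming)
            handoff.push(t);
        handoff.close();
    });

    // "Thread 1": keeps processing queued tags, including ones that are
    // still pending after the plucking thread has already exited.
    TagInfo t;
    while (handoff.pop(t))
        processed.push_back(t.tag);

    plucker.join();
    return processed;
}
```

The point of the sketch is that thread 1 can still be working through queued tags after thread 2 has left its loop, which is the situation in which the crash shows up.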
After running for a while, on thread 1, I call server shutdown followed by completion queue shutdown. The code looks like this:
    if (mServer != nullptr)
        mServer->Shutdown(gpr_time_add(gpr_now(GPR_CLOCK_MONOTONIC),
            gpr_time_from_micros((deadline - TimeValue::getTimeOfDay()).getMicroSeconds(), GPR_TIMESPAN)));
    if (mCQ != nullptr)
        mCQ->Shutdown();
Meanwhile, thread 2 is doing the following:
    while (true)
    {
        if (!mCQ->Next((void**)&tagInfo.tagProcessor, &tagInfo.ok))
        {
            break;
        }
    }
So at some point, thread 2 notices that the completion queue has been shut down and exits the loop.
Thread 1 continues to process the tags that were received previously. One of them happened to be the 'done' tag of a previous RPC operation, and in response to it I destroy the rpc (my own class). This causes the ServerContext held by the rpc to be destroyed, and I see the following crash:
ntdll.dll!00007ffe6f7ebbdf() Unknown
ntdll.dll!00007ffe6f7c4571() Unknown
ntdll.dll!00007ffe6f7c4490() Unknown
gpr_mu_lock(gpr_mu * mu) Line 53 C
interned_slice_destroy(interned_slice_refcount * s) Line 94 C
interned_slice_unref(grpc_exec_ctx * exec_ctx, void * p) Line 112 C
grpc_slice_unref_internal(grpc_exec_ctx * exec_ctx, grpc_slice slice) Line 76 C
destroy_channel_elem(grpc_exec_ctx * exec_ctx, grpc_channel_element * elem) Line 926 C
grpc_channel_stack_destroy(grpc_exec_ctx * exec_ctx, grpc_channel_stack * stack) Line 166 C
destroy_channel(grpc_exec_ctx * exec_ctx, void * arg, grpc_error * error) Line 383 C
grpc_exec_ctx_flush(grpc_exec_ctx * exec_ctx) Line 85 C
grpc_exec_ctx_finish(grpc_exec_ctx * exec_ctx) Line 100 C
grpc_call_unref(grpc_call * c) Line 579 C
grpc::ServerContext::~ServerContext() Line 157 C++
Is it possible that the shutdown sequence is just a red herring and the bug is something else? Or is my shutdown sequence wrong (is it not allowed to delete a ServerContext after shutting down the server and the completion queue)?
Any tips appreciated.
Thanks.