We have seen a thread deadlock issue in ModelBase when using a TaskCompletionSource inside the callback of an AsyncEventingBasicConsumer and wondered if anyone could provide some insight / guidance on the best way of working with threads / async in this scenario.
The code in question was for an RPC client. The code would declare a temporary queue for the response, create an AsyncEventingBasicConsumer for that queue, send the request and then wait for the expected number of replies to arrive. To do this it declares a TaskCompletionSource with TaskCreationOptions.RunContinuationsAsynchronously, and inside the AsyncEventingBasicConsumer's callback it calls SetResult(null) once the expected number of replies has arrived.
var completionSource = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
consumer.Received += (source, deliveryArgs) => {
    // Process the message(s)
    // When we have received the expected number of replies, complete the task
    completionSource.SetResult(null);
    // The async consumer's handler has to return a Task
    return Task.CompletedTask;
};
After sending the request we then await the TCS.
await Task.WhenAny(
    completionSource.Task,
    Task.Delay(timeout)
);
Shortly thereafter we cancel the consumer and close the channel by calling Close() on IModel.
In the vast majority of environments the general flow is:
1. Thread X creates the TCS and sends the request message.
2. The RabbitMQ background thread receives the reply(s) and calls SetResult(null) on the TCS.
3. Thread Y then calls Close() on IModel.
However, in some environments what we see is this:
1. Thread X creates the TCS and sends the request message.
2. The RabbitMQ background thread receives the reply(s) and calls SetResult(null) on the TCS.
3. The RabbitMQ background thread then calls Close() on IModel resulting in a thread deadlock.
The deadlock arises because ModelBase.Close blocks the calling thread until the channel.close-ok reply comes back from the server. Because the calling thread is now the RabbitMQ background thread, that reply is never processed and the call to BlockingCell.GetValue() times out. This happens every time on these environments.
We managed to resolve the issue by removing the TCS and replacing it with thread signaling using SemaphoreSlim.
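For anyone who wants to see the shape of the SemaphoreSlim approach, here is a minimal sketch. It is not our actual code: the ReplyWaitSketch class and the WaitForSignalAsync helper are hypothetical names, the Task.Run block is only a stand-in for the consumer callback, and the choice of WaitAsync over a blocking Wait is an assumption. It runs without RabbitMQ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReplyWaitSketch
{
    // Hypothetical helper: returns true if the semaphore was signalled
    // before the timeout elapsed, false otherwise.
    public static Task<bool> WaitForSignalAsync(SemaphoreSlim signal, TimeSpan timeout)
    {
        return signal.WaitAsync(timeout);
    }

    public static async Task Main()
    {
        // Starts with zero permits, so the waiter blocks until Release() is called.
        var signal = new SemaphoreSlim(0);
        const int expectedReplies = 3;
        var received = 0;

        // Stand-in for the consumer's Received callback (assumption:
        // in real code this body would live inside the event handler).
        var fakeConsumer = Task.Run(async () =>
        {
            for (var i = 0; i < expectedReplies; i++)
            {
                await Task.Delay(50); // simulate a reply arriving
                if (Interlocked.Increment(ref received) == expectedReplies)
                {
                    signal.Release(); // signal once, when all replies are in
                }
            }
        });

        var completed = await WaitForSignalAsync(signal, TimeSpan.FromSeconds(5));
        await fakeConsumer;

        Console.WriteLine(completed ? "replies received" : "timed out");
        // The real client would cancel the consumer and call Close() on IModel here.
    }
}
```

The callback only ever calls Release(), so nothing that blocks (such as Close()) is invoked from inside the handler itself.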
My question is whether this is a common gotcha, and whether there are any general guidelines on working with threads/tasks inside an AsyncEventingBasicConsumer's callback.
Thanks,
Mark