RabbitMQ client sporadic timeouts

129 views
Skip to first unread message

Brandon “AGr”

unread,
Feb 10, 2023, 11:32:37 AM2/10/23
to rabbitmq-users
Some background on our environment, we have a wcf based windows service that gets on the order of 10-100 calls per second in production, there is a single global IConnection in the application and when a message is to be sent a Task.Factory.StartNew task is created and a new channel is created (some channels use publisher confirms, some dont). This service only sends and does not read any messages. The RabbitMQ broker is running on localhost

This has been running fine for years using RabbitMQ.Client 5.1, then this week we deployed an update that brought RabbitMQ.Client up to the latest 6.4 version, and the system will sporadically get tons of failed sends at the same time, some sends do work. The deploy worked fine for a month in our staging environment, and only experienced these issues on the production system.

```
2023-02-09 16:46:18.020
 Level: Error
 Description: Failed to send BroadcastQueryEvent
 Exception: System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell`1.WaitForValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_ExchangeDeclare(String exchange, String type, Boolean passive, Boolean durable, Boolean autoDelete, Boolean internal, Boolean nowait, IDictionary`2 arguments)
   at RabbitMQ.Client.Impl.AutorecoveringModel.ExchangeDeclare(String exchange, String type, Boolean durable, Boolean autoDelete, IDictionary`2 arguments)
   at ....BroadcastEventMessenger.SendQueryEvent(BroadcastQueryEvent queryEvent)

2023-02-09 16:46:18.020
 Level: Error
 Description: Failed to send BroadcastQueryEvent
 Exception: System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell`1.WaitForValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_ExchangeDeclare(String exchange, String type, Boolean passive, Boolean durable, Boolean autoDelete, Boolean internal, Boolean nowait, IDictionary`2 arguments)
   at RabbitMQ.Client.Impl.AutorecoveringModel.ExchangeDeclare(String exchange, String type, Boolean durable, Boolean autoDelete, IDictionary`2 arguments)
   at ....SendQueryEvent(BroadcastQueryEvent queryEvent)

2023-02-09 16:46:18.006
 Level: Error
 Description: Failed to send BroadcastQueryEvent
 Exception: System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell`1.WaitForValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_ExchangeDeclare(String exchange, String type, Boolean passive, Boolean durable, Boolean autoDelete, Boolean internal, Boolean nowait, IDictionary`2 arguments)
   at RabbitMQ.Client.Impl.AutorecoveringModel.ExchangeDeclare(String exchange, String type, Boolean durable, Boolean autoDelete, IDictionary`2 arguments)
   at ....SendQueryEvent(BroadcastQueryEvent queryEvent)

2023-02-09 16:46:18.006
 Level: Error
 Description: Failed to send BroadcastQueryEvent
 Exception: System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell`1.WaitForValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_ExchangeDeclare(String exchange, String type, Boolean passive, Boolean durable, Boolean autoDelete, Boolean internal, Boolean nowait, IDictionary`2 arguments)
   at RabbitMQ.Client.Impl.AutorecoveringModel.ExchangeDeclare(String exchange, String type, Boolean durable, Boolean autoDelete, IDictionary`2 arguments)
   at ....SendQueryEvent(BroadcastQueryEvent queryEvent)
```

My first guess is there could be some kind of race condition inside the rabbitmq library, as the production system has never gotten these errors in years of running with the 5.1 client, and deploying an update that only changed the Rabbitmq.client nuget version to 6.4 resulted in these sporadic errors.

There are no log entries in RabbitMQ (which was set to log.file.level info) when this happened

This previous posts seem similar, timeouts on 6.x
https://groups.google.com/g/rabbitmq-users/c/3TbIhO9e9fA/m/alvehhUSBQAJ

Doing testing on the production system is of course problematic as these messages failing to send impacts our business operations, so we have to rollback the deploy at the first sight of these errors

I am continuing to attempt to replicate the issue locally but haven't been able to so far, what steps would you recommend to troubleshoot this?

Luke Bakken

unread,
Feb 10, 2023, 4:48:54 PM2/10/23
to rabbitmq-users
Hi Brandon,

Thanks for the thorough set of information. Could we get the following information:
  • RabbitMQ and Erlang version
  • .NET version being used (Framework, Core, etc and version)
  • Is the workload in your staging environment similar to production?
  • How many Task.Factory.StartNew calls can happen at once?
One major change between 5.x and 6.x is how the ThreadPool is used. My guess is you'll have to increase the size to ensure enough threads exist for your tasks as well as for the use of the .NET client:


I'm working on version 7 and one major task will be to improve the client's interaction with the ThreadPool.

Thanks,
Luke

Brandon “AGr”

unread,
Feb 10, 2023, 7:03:43 PM2/10/23
to rabbitmq-users
Was able to capture some production traffic and reproduce the issue (using latest broker running locally RabbitMQ 3.11.8 Erlang 25.2.2),  application is targeting .net framework 4.6.1

Using `ThreadPool.SetMinThreads` to a high initial number does indeed prevent the problem from occuring


This is what my local repro shows for the number of calls inside the send mesage method (some of these calls are sending a bulk number of messages with CreateBasicPublishBatch), the first 7 calls attempt to send within ~50 milliseconds

Sender for 1 message CallCount 1 Sent 0 @ 17:58:40.307
Sender for 1 message CallCount 2 Sent 0 @ 17:58:40.313
Sender for 1 message CallCount 3 Sent 0 @ 17:58:40.339
Sender for 1 message CallCount 4 Sent 0 @ 17:58:40.339
Sender for 1 message CallCount 5 Sent 0 @ 17:58:40.347
Sender for 1 message CallCount 6 Sent 0 @ 17:58:40.356
Sender for 1 message CallCount 7 Sent 0 @ 17:58:40.370
Sender for 1 message CallCount 8 Sent 0 @ 17:58:40.401
Sender for 1 message CallCount 9 Sent 0 @ 17:58:40.411
Sender for 1 message CallCount 10 Sent 0 @ 17:58:40.414
Sender for 1 message CallCount 11 Sent 0 @ 17:58:40.424
Sender for 1 message CallCount 12 Sent 0 @ 17:58:40.439
Sender for 2 messages CallCount 13 Sent 0 @ 17:58:40.503
Sender for 1 messages CallCount 14 Sent 0 @ 17:58:40.505
Sender for 14 messages CallCount 15 Sent 0 @ 17:58:40.505
Sender for 5 messages CallCount 16 Sent 0 @ 17:58:40.505
Sender for 14 messages CallCount 17 Sent 0 @ 17:58:40.505
Sender for 1 messages CallCount 18 Sent 0 @ 17:58:40.508
Sender for 8 messages CallCount 19 Sent 0 @ 17:58:40.655
Sender for 4 messages CallCount 20 Sent 0 @ 17:58:41.340
Sender for 1 message CallCount 21 Sent 0 @ 17:58:42.267
Sender for 1 message CallCount 22 Sent 0 @ 17:58:43.260
Sender for 1 message CallCount 23 Sent 0 @ 17:58:44.267
Sender for 1 messages CallCount 24 Sent 0 @ 17:58:45.285
Sender for 1 message CallCount 25 Sent 0 @ 17:58:46.262
Sender for 1 message CallCount 26 Sent 0 @ 17:58:47.257
Sender for 1 message CallCount 27 Sent 0 @ 17:58:48.267
Sender for 1 message CallCount 28 Sent 0 @ 17:58:49.264
Sender for 1 messages CallCount 29 Sent 0 @ 17:58:50.293

This is all the wireshark traffic between the test application and the broker

14    0.615541    ::1    ::1    AMQP    590    Connection.Start
16    0.640176    ::1    ::1    AMQP    513    Connection.Start-Ok
18    0.640849    ::1    ::1    AMQP    84    Connection.Tune
20    0.643025    ::1    ::1    AMQP    84    Connection.Tune-Ok
22    0.644458    ::1    ::1    AMQP    80    Connection.Open vhost=/
24    0.644717    ::1    ::1    AMQP    77    Connection.Open-Ok
27    0.873868    ::1    ::1    AMQP    72    Heartbeat
35    15.870745    ::1    ::1    AMQP    72    Heartbeat
39    30.653556    ::1    ::1    AMQP    72    Heartbeat
41    30.879964    ::1    ::1    AMQP    72    Heartbeat
65    47.441778    ::1    ::1    AMQP    72    Heartbeat
67    47.977139    ::1    ::1    AMQP    90    Channel.Open Channel.Open
69    47.978170    ::1    ::1    AMQP    80    Channel.Open-Ok
71    47.978694    ::1    ::1    AMQP    80    Channel.Open-Ok
73    48.021671    ::1    ::1    AMQP    77    Channel.Open
75    48.022381    ::1    ::1    AMQP    77    Channel.Open
77    48.022436    ::1    ::1    AMQP    80    Channel.Open-Ok
79    48.023121    ::1    ::1    AMQP    80    Channel.Open-Ok
81    48.024224    ::1    ::1    AMQP    200    Exchange.Declare x=ERCOT Nodal Broadcast Exchange.Declare x=SPP IMP Broadcast Exchange.Declare x=MISO Broadcast
83    48.024429    ::1    ::1    AMQP    105    Exchange.Declare x=PJM Broadcast
85    48.024579    ::1    ::1    AMQP    76    Exchange.Declare-Ok
87    48.024627    ::1    ::1    AMQP    76    Exchange.Declare-Ok
89    48.024972    ::1    ::1    AMQP    76    Exchange.Declare-Ok
91    48.025047    ::1    ::1    AMQP    76    Exchange.Declare-Ok 

Would converting this into a test that I could send you be helpful? Sounds like you are already familiar with the ThreadPool exhaustion problem

Brandon “AGr”

unread,
Feb 13, 2023, 12:47:52 PM2/13/23
to rabbitmq-users
I created this repro example based on my use case, it's not very different from your MassPublish example


It looks like ThreadPool.SetMinThreads workerThreads must be at least as high as the number of simultaneous messages being sent, increasing completion ports doesn't impact the timeouts

Luke Bakken

unread,
Feb 13, 2023, 1:32:42 PM2/13/23
to rabbitmq-users
Thanks for all of this! I have not yet had a chance to follow up on your responses due to being busy working on customer support. I do have this bookmarked to follow-up!

Luke Bakken

unread,
Jun 10, 2024, 12:09:56 PMJun 10
to rabbitmq-users
Hello,

It has been quite a while, but I finally have some time to return to starred messages in this group.

In this case, if you have time to convert your .NET code to use version 7.0.0-rc.1 of the client, and test it, it would be very appreciated.


More than likely, the issue you report here has been resolved.

I am going to fork your test code to update it to use 7.0.0-rc.1, and see if I see the same behavior.

Thanks,
Luke

Luke Bakken

unread,
Jun 10, 2024, 2:00:39 PMJun 10
to rabbitmq-users

Luke Bakken

unread,
Jun 10, 2024, 2:01:09 PMJun 10
to rabbitmq-users
Everything appears to work correctly without modifying the thread pool, using the code in this PR:

Reply all
Reply to author
Forward
0 new messages