Connection Recovery .NET


nate

Sep 22, 2020, 2:13:50 PM
to rabbitmq-users
Hey all,

I am working with the .NET RabbitMQ client library (5.1.2). I have a wrapper library that centralizes some of the repetitive operations, such as opening and closing the connection and channel. Anyone who uses this library is forced to use an automatically recovering connection. However, I've encountered an exception in my services that was caused by a timeout while waiting for the ack of a published message. Automatic connection recovery did not start as I expected, because this was a channel-level error. My question is: how can I handle channel errors correctly without interfering with the automatic connection recovery mechanism? I don't want to be closing and opening connections/channels while auto recovery is working.

Thanks

Wesley Peng

Sep 22, 2020, 7:22:33 PM
to rabbitm...@googlegroups.com
Hello

Talk is cheap, show us the code. :)

> --
> You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/6c65527d-4f00-4ac2-889c-130b2e963aean%40googlegroups.com.
>

nate

Sep 22, 2020, 8:37:13 PM
to rabbitmq-users
When I send a message, I sometimes get an IOException when the queue is under very heavy load. When the exception is raised, auto recovery never kicks in.
            try
            {
                if (_client.SendMessage(message))
                {
                    _client.WaitForConfirmsSynchronouslyOrTimeout(TimeSpan.FromSeconds(TIMEOUT_IN_SECONDS));
                    _logger.LogMessage("RabbitMQ message sent successfully and ack received.", LogDetail.Information);
                    onSendSuccess?.Invoke();
                }
            }
            catch (Exception ex)
            {
                _logger.LogException(ex);
                _logger.LogMessage("Failed to send RabbitMQ message.", LogDetail.Warning);
                throw;
            }

Auto Recovery is enabled:
            ConnectionFactory factory = new ConnectionFactory
            {
                AutomaticRecoveryEnabled = true,
                NetworkRecoveryInterval = _config.NetworkRecoveryInterval
            };

I have a service thread that constantly checks whether the connection is down and skips the business logic while it is. However, the connection never comes back up because auto recovery never starts.
void RunningThread(CancellationToken token) // token is assumed to be a CancellationToken
{
    while (!token.IsCancellationRequested)
    {
        // Skip the business logic while the connection is down.
        if (!_publisherMessenger.IsConnected())
        {
            token.WaitHandle.WaitOne(THREAD_SLEEP_MILLISECONDS);
            continue;
        }

        // business logic
    }
}

nate

Sep 22, 2020, 9:04:58 PM
to rabbitmq-users
Here is the logged exception.

2020-09-19 20:54:17.794  Error          Exception Thrown!!!
2020-09-19 20:54:17.794  Error           message: Timed out waiting for acks
2020-09-19 20:54:17.794  Error           type: System.IO.IOException
2020-09-19 20:54:17.794  Error           target: Void WaitForConfirmsOrDie(System.TimeSpan)
2020-09-19 20:54:17.794  Error           source: RabbitMQ.Client
2020-09-19 20:54:17.794  Error           stack-unwinding:
2020-09-19 20:54:17.794  Error          System.IO.IOException: Timed out waiting for acks
   at RabbitMQ.Client.Impl.ModelBase.WaitForConfirmsOrDie(TimeSpan timeout)
   at RabbitMQ.Client.Impl.AutorecoveringModel.WaitForConfirmsOrDie(TimeSpan timeout)
   at Client.WaitForConfirmsSynchronouslyOrTimeout(TimeSpan timeoutInSeconds)
   at Messenger.Send(String message, Action onSendSuccess)
2020-09-19 20:54:17.794 Warning        Failed to send RabbitMQ message.
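
The trace above ends in `WaitForConfirmsOrDie`. From a reading of the 5.x client source, that method appears to close the channel itself when the timeout elapses (an application-initiated close), which would leave connection-level auto recovery with nothing to react to; treat that as an assumption to verify against the client version in use. A minimal sketch of a gentler alternative uses `WaitForConfirms(TimeSpan, out bool)`, which reports a timeout without closing the channel. `ConfirmedPublisher` and `TryPublish` are hypothetical names, and the channel is assumed to already be in confirm mode via `ConfirmSelect()`.

```csharp
using System;
using System.Text;
using RabbitMQ.Client;

// Hypothetical helper, not part of the original wrapper library.
public static class ConfirmedPublisher
{
    // Publishes one message and waits for its confirm.
    // Returns true only if the broker acked within the timeout.
    // Assumes channel.ConfirmSelect() was called when the channel was created.
    public static bool TryPublish(IModel channel, string exchange, string routingKey,
                                  string message, TimeSpan timeout)
    {
        byte[] body = Encoding.UTF8.GetBytes(message);
        channel.BasicPublish(exchange, routingKey, null, body);

        // Unlike WaitForConfirmsOrDie, this overload does not close the channel
        // on timeout; it sets timedOut and leaves the decision to the caller.
        bool timedOut;
        bool onlyAcks = channel.WaitForConfirms(timeout, out timedOut);
        return onlyAcks && !timedOut;
    }
}
```

A `false` return can then be logged or retried by the caller while the channel stays open.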

Wesley Peng

Sep 22, 2020, 9:08:51 PM
to rabbitm...@googlegroups.com
I am not familiar with the .NET library. I believe @Luke and @Arnaud will
take time to look at this.

Thanks.

Brown Purchase

Sep 22, 2020, 9:10:21 PM
to rabbitm...@googlegroups.com
If you are debugging and haven't let the ping occur, you will get a similar error. Without seeing when and how you manage connections, I have a hard time understanding how you might otherwise be affected.

Nathan Dhami

Sep 23, 2020, 6:10:28 AM
to rabbitm...@googlegroups.com
How does this pinging work? I'm just testing my application for resiliency using the .NET RabbitMQ library. In my testing I have a thread that is constantly sending RabbitMQ messages on a channel (every 10 ms) and waiting for a publisher confirm (5-second timeout). It works 99.8% of the time. When the exception does occur, auto recovery doesn't start. I have one class that contains the connection/channel, and I have at most one connection/channel (a singleton class, nothing special). For a solution, I don't want to interfere with auto recovery, but I also want to handle the edge cases where auto recovery doesn't start and the connection/channel stays down.
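
On the "ping" question: assuming it refers to AMQP heartbeats, these are negotiated per connection and configured on the connection factory. A minimal configuration sketch for the 5.x client, where `RequestedHeartbeat` is a number of seconds (later major versions changed it to a `TimeSpan`). `FactorySettings` is a hypothetical name, and the 10 and 30 second values are illustrative assumptions, not recommendations from this thread.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical helper; values are illustrative only.
public static class FactorySettings
{
    public static ConnectionFactory Create()
    {
        return new ConnectionFactory
        {
            AutomaticRecoveryEnabled = true,
            // Delay between recovery attempts after a connection failure.
            NetworkRecoveryInterval = TimeSpan.FromSeconds(10),
            // Heartbeat timeout in seconds (ushort in the 5.x client).
            RequestedHeartbeat = 30
        };
    }
}
```

If the process is paused in a debugger for longer than the negotiated timeout, heartbeats are missed and the connection can be considered dead, which is one way to end up with errors like the one described earlier.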


nate

Sep 24, 2020, 4:34:08 AM
to rabbitmq-users
https://github.com/rabbitmq/rabbitmq-java-client/issues/151

I think I may be having an issue similar to the one linked above, but with .NET.

System.IO.IOException: Timed out waiting for acks
   at RabbitMQ.Client.Impl.ModelBase.WaitForConfirmsOrDie(TimeSpan timeout)
   at RabbitMQ.Client.Impl.AutorecoveringModel.WaitForConfirmsOrDie(TimeSpan timeout)
   at Client.WaitForConfirmsSynchronouslyOrTimeout(TimeSpan timeoutInSeconds)
   at Messenger.Send(String message, Action onSendSuccess)

My channel never recovers after encountering the above exception.

Code:

IConnection _connection = ... // initialized somewhere else
IModel _channel = ... // initialized somewhere else

void ReconnectChannel()
{
    if (_connection.IsOpen)
    {
        // Needed for channel exceptions (or other exceptions that don't trigger auto recovery).
        if (!_channel.IsOpen)
        {
            _logger.LogMessage("RabbitMQ channel is down.", LogDetail.Warning);
            // A new channel does not inherit confirm mode; call ConfirmSelect() again if confirms are used.
            _channel = _connection.CreateModel();
        }
    }
    else
    {
        // Rely on auto recovery to recover the connection and channels.
        _logger.LogMessage("RabbitMQ not connected. Auto recovery is enabled and will recover connection.", LogDetail.Warning);
    }
}

Would it be safe to implement the above code? I'm assuming that auto recovery only triggers when IConnection's IsOpen flag is false.

Brown Purchase

Sep 24, 2020, 10:19:13 AM
to rabbitm...@googlegroups.com
I haven't tried the synchronous confirms, as they seem like the wrong choice for a system that is meant to be asynchronous, especially if you are sending every 10 ms. Send and forget, and let the callback tell you it finished.
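
The callback-based approach suggested here needs some bookkeeping of which publishes are still unconfirmed. A minimal sketch of that bookkeeping, assuming sequence numbers come from `channel.NextPublishSeqNo` and settlements come from the channel's `BasicAcks` and `BasicNacks` events. `OutstandingConfirms`, `Track`, and `Settle` are hypothetical names; the class itself has no RabbitMQ dependency.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical tracker for publishes that have not been confirmed yet.
public sealed class OutstandingConfirms
{
    private readonly ConcurrentDictionary<ulong, string> _pending =
        new ConcurrentDictionary<ulong, string>();

    // Number of publishes still waiting for an ack or nack.
    public int Count
    {
        get { return _pending.Count; }
    }

    // Call just before BasicPublish, passing channel.NextPublishSeqNo.
    public void Track(ulong sequenceNumber, string payload)
    {
        _pending[sequenceNumber] = payload;
    }

    // Call from the ack/nack handlers. When multiple is true, every tag up to
    // and including deliveryTag is settled. Returns how many entries were removed.
    public int Settle(ulong deliveryTag, bool multiple)
    {
        string ignored;
        if (!multiple)
        {
            return _pending.TryRemove(deliveryTag, out ignored) ? 1 : 0;
        }

        int removed = 0;
        foreach (ulong tag in _pending.Keys.Where(k => k <= deliveryTag).ToList())
        {
            if (_pending.TryRemove(tag, out ignored))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

Wiring it up would look roughly like `channel.BasicAcks += (sender, ea) => pending.Settle(ea.DeliveryTag, ea.Multiple);`, with a matching `BasicNacks` handler that also logs or republishes the affected payloads.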

Nathan Dhami

Sep 24, 2020, 12:37:37 PM
to rabbitm...@googlegroups.com
The system is designed to be synchronous. I was only sending every 10 ms to see what kind of issues I could face. The real root issue is that my publisher service can encounter a channel exception and never recover (I think the same problem may exist in my consumer service as well). I came to the conclusion that I need to handle recovery manually for these soft errors that don't trigger auto recovery. Can anyone please advise me on the best way to handle soft errors while auto recovery is enabled? Are there event handlers I need to register, or would the IsOpen bool properties exposed on the IModel channel object and the IConnection connection object be enough to distinguish a soft error from an auto-recoverable error?
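
On the event-handler part of the question: both `IConnection` and `IModel` raise shutdown events that carry a `ShutdownEventArgs` with the initiator, reply code, and reply text. A minimal sketch that only reports what happened and makes no recovery decision, since which shutdowns trigger auto recovery depends on the client version. `ShutdownWatcher`, `Describe`, and `Attach` are hypothetical names.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical helper that logs why a connection or channel went down.
public static class ShutdownWatcher
{
    // Formats the fields that help tell an application-initiated close
    // apart from one initiated by the library or the broker (peer).
    public static string Describe(string source, ShutdownEventArgs args)
    {
        return source + " shut down: initiator=" + args.Initiator +
               ", code=" + args.ReplyCode + ", text=" + args.ReplyText;
    }

    // Subscribes to both shutdown events and forwards a description to log.
    public static void Attach(IConnection connection, IModel channel, Action<string> log)
    {
        connection.ConnectionShutdown += (sender, args) => log(Describe("connection", args));
        channel.ModelShutdown += (sender, args) => log(Describe("channel", args));
    }
}
```

A channel shutdown logged with `initiator=Application` while the connection stays open would suggest the channel was closed from the client side, for example by a confirm timeout, rather than by a network failure.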

Brown Purchase

Sep 25, 2020, 2:44:40 AM
to rabbitm...@googlegroups.com
I really do believe you are preventing the ping keepalive from running somehow. Also, it's curious that you are sending every 10 ms; does that include the confirm? The reason I ask is that maybe you are stepping on yourself somehow and preventing the ping.
Are there multiple threads? If so, do you have a channel for each thread?
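
The channel-per-thread question matters because the client documentation advises against using one `IModel` from several threads at the same time. A minimal sketch of one way to give each publishing thread its own channel, assuming a single shared `IConnection`. `PerThreadChannels` is a hypothetical name, and enabling confirm mode on every channel is an assumption based on the publisher described in this thread.

```csharp
using System;
using System.Threading;
using RabbitMQ.Client;

// Hypothetical holder that lazily creates one channel per calling thread.
public sealed class PerThreadChannels : IDisposable
{
    private readonly ThreadLocal<IModel> _channels;

    public PerThreadChannels(IConnection connection)
    {
        // trackAllValues: true lets Dispose enumerate every channel created.
        _channels = new ThreadLocal<IModel>(() =>
        {
            IModel channel = connection.CreateModel();
            channel.ConfirmSelect(); // confirm mode is set per channel
            return channel;
        }, trackAllValues: true);
    }

    // The channel that belongs to the calling thread.
    public IModel Current
    {
        get { return _channels.Value; }
    }

    public void Dispose()
    {
        foreach (IModel channel in _channels.Values)
        {
            channel.Dispose();
        }
        _channels.Dispose();
    }
}
```

Each thread would publish through `channels.Current`, so confirms for one thread's messages are never interleaved with another thread's publishes on the same channel.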
