RabbitMQ.Client .Net nuget package behavior on disconnection

marc....@frontiersin.org

Apr 27, 2016, 4:00:54 AM
to rabbitmq-users

Hello everybody,


In my company, we use the RabbitMQ.Client NuGet package for .NET (versions 3.4.0 and 3.6.0) to publish messages to RabbitMQ. A single connection is created on application startup with automatic recovery enabled and the default NetworkRecoveryInterval of 5 seconds. We then use this AutorecoveringConnection throughout the lifetime of the application without ever closing it explicitly. The application can run for long periods, such as weeks, without being shut down.
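
For readers following along, the setup described above might look roughly like the sketch below. The host name is a placeholder and the Program wrapper is only for illustration; neither comes from the original post. It needs the RabbitMQ.Client package and a reachable broker to run.

```csharp
using System;
using RabbitMQ.Client;

public static class Program
{
    public static void Main()
    {
        var factory = new ConnectionFactory
        {
            HostName = "localhost",                            // placeholder, not from the post
            AutomaticRecoveryEnabled = true,                   // opt in to automatic connection recovery
            NetworkRecoveryInterval = TimeSpan.FromSeconds(5)  // the 5-second interval, set explicitly
        };

        // In the application described above, the connection would be created once
        // at startup and kept for the process lifetime; here it is disposed at the
        // end of Main to keep the sketch short.
        using (IConnection connection = factory.CreateConnection())
        using (IModel channel = connection.CreateModel())
        {
            Console.WriteLine("Connection open: " + connection.IsOpen);
        }
    }
}
```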


We recently had an issue where it was impossible to create new IModels until the application was restarted. Every call to AutorecoveringConnection.CreateModel() threw the following exception:


Type:

RabbitMQ.Client.Exceptions.AlreadyClosedException


Stacktrace:

at RabbitMQ.Client.Framing.Impl.Connection.EnsureIsOpen()

at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.CreateModel()


Message:

_Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=541, text="Unexpected Exception", classId=0, methodId=0, cause=System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

at RabbitMQ.Client.Impl.Frame.ReadFrom(NetworkBinaryReader reader)

at RabbitMQ.Client.Impl.SocketFrameHandler.ReadFrame()

at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()

at RabbitMQ.Client.Framing.Impl.Connection.MainLoop()_


This started to happen moments after the RabbitMQ server had been unreachable for a short time. It was only resolved when we restarted the application a few days later.


My first question is: is it possible that a bug in the automatic recovery mechanism made recovery fail and left the AutorecoveringConnection object in a corrupted state, where it no longer tries to reconnect and can no longer create new IModels? For instance, in AutorecoveringConnection.Init(), there is a try/catch that simply swallows exceptions when recovery fails.


Which leads to my second question: is it wise to create a single AutorecoveringConnection and then use it for extended periods of time, like weeks? Or does this go against the intended usage of AutorecoveringConnections?


Also, while investigating this issue, I noticed that AutorecoveringConnection.CreateModel() throws this exception if the connection is currently closed. Nothing synchronizes this method with the recovery mechanism to make sure that we wait until the connection is re-established before creating the IModel. That leaves a window of a few seconds in which creating a new IModel while the connection is recovering throws this exception. Is that by design, and is there a way around it besides catching the exception and implementing my own retry mechanism?
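
To make the "own retry mechanism" option concrete, here is a minimal sketch of what such a wrapper could look like. The Retry class, its Execute method, and all parameter names are hypothetical helpers written for this illustration; they are not part of RabbitMQ.Client. The helper is deliberately generic so that it compiles without the client library, and the commented usage line shows how it might wrap CreateModel().

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of RabbitMQ.Client.
public static class Retry
{
    // Calls `action` up to `maxAttempts` times. An exception is retried only if
    // `isTransient` accepts it and attempts remain; otherwise it propagates.
    public static T Execute<T>(Func<T> action, Func<Exception, bool> isTransient,
                               int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Give the recovery mechanism time to reconnect before trying again.
                Thread.Sleep(delay);
            }
        }
    }
}

// Possible usage with the client (assumes `connection` is the shared IConnection
// and `using RabbitMQ.Client.Exceptions;` is in scope):
//
//   IModel channel = Retry.Execute(
//       () => connection.CreateModel(),
//       ex => ex is AlreadyClosedException,
//       5,
//       TimeSpan.FromSeconds(5));
```

Capping the number of attempts matters here: if recovery never succeeds, as in the incident described above, the last AlreadyClosedException still reaches the caller instead of the loop running forever.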

Thank you very much,

Marc

Michael Klishin

Apr 27, 2016, 11:47:55 AM
to rabbitm...@googlegroups.com
It can take a while to detect connection failure and recovery may or may not succeed immediately.

Before recovery happens, all channels on closed connections are also closed and thus cannot be used.
"Delaying" protocol method delivery until recovery occurs might work for some cases but not all
(e.g. what if an exchange you are trying to publish to is gone by the time recovery happens?),
so it's not clear to me if trying to change the behaviour we have would be a significant improvement.

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

marc....@frontiersin.org

May 2, 2016, 7:51:36 AM
to rabbitmq-users
Hello Michael,

Thank you for your answer.

We were also not aware of this behavior regarding the use of existing models, which makes sense.

We'll update our software to deal with that and the fact that Model creation cannot be done while recovering the connection.

Also, I've seen on GitHub that there is an issue, #132, about providing events on unexpected exceptions. This would let us hook in and detect when connection recovery fails. We would then be able to create a new connection instead of waiting for the corrupted connection to recover.

Marc