Clients routinely disconnected and automatically reconnected

834 views
Skip to first unread message

Dave Frank

unread,
Jun 14, 2016, 1:08:27 PM6/14/16
to rabbitmq-users
OS: Windows Server 2008 R2
RabbitMQ version: 3.5.2 (for server and clients)
Clients in use: Java and .Net

We routinely (several times a day nominally) receive callbacks into the Java client and .Net client that indicate that connection was dropped by the server and then the connection was re-established immediately afterwards. In the rabbit log, this looks like:

=ERROR REPORT====
Channel error on connection <0.31653.53> (CLIENTHOST -> SERVERHOST, vhost '/', user: USERNAME), channel 1: 
{amqp_error,not_found,"no_exchange EXCHANGENAME in vshost '/'",
        'queue.bind'}

=WARNING REPORT====
closing AMQP connection <0.31653.53> (CLIENTHOST -> SERVERHOST):
connection_closed_abruptly

=INFO REPORT====
accepting AMQP connection <0.3176.53> (CLIENTHOST -> SERVERHOST)


The client receives something like

AMQP channel shutdown. reason = null, message = connection error

com.rabbitmq.client.ShutdownSignalException: connection error
...
Caused by: java.net.SocketException: Unrecognized Windows Sockets error: 0: recv failed

or in the .Net client,

Exception: RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text="NOT_FOUND - no exchange 'mux_low_side_outgoing_29' in vhost '/'", classId=60, methodId=40, cause=
   at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd)
   at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, Boolean immediate, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, IBasicProperties basicProperties, Byte[] body)

The only thing we've been able to guess at is the client machine is busy when this happens. For example, it will sometimes happen when the application is starting up and doing other initialization work. 

Usually, we see a little burst of these messages and everything settles down without any real issue. However, recently one of our .Net applications started receiving the above error a few times a second and had to be restarted to clear the issue.  

Note the connection errors are all on connections and channels that are long lived. 


Any thoughts and where to look would be helpful. Unfortunately, I haven't been able to come up with a way to consistently reproduce the issues.

Thanks

Tom Molesworth

unread,
Jun 14, 2016, 1:30:19 PM6/14/16
to rabbitm...@googlegroups.com
Hi,


On 14/06/16 18:08, Dave Frank wrote:
OS: Windows Server 2008 R2
RabbitMQ version: 3.5.2 (for server and clients)
Clients in use: Java and .Net

We routinely (several times a day nominally) receive callbacks into the Java client and .Net client that indicate that connection was dropped by the server and then the connection was re-established immediately afterwards. In the rabbit log, this looks like:

=ERROR REPORT====
Channel error on connection <0.31653.53> (CLIENTHOST -> SERVERHOST, vhost '/', user: USERNAME), channel 1: 
{amqp_error,not_found,"no_exchange EXCHANGENAME in vshost '/'",
        'queue.bind'}

You can't bind a queue to an exchange that does not exist. Declare that exchange before the bind - or find whatever's deleting it and fix that if you were expecting the exchange to be in place already.


or in the .Net client,

Exception: RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text="NOT_FOUND - no exchange 'mux_low_side_outgoing_29' in vhost '/'", classId=60, methodId=40, cause=
   at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd)
   at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, Boolean immediate, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, IBasicProperties basicProperties, Byte[] body)

This looks similar - you're publishing to an exchange that doesn't exist. Presumably your error handler is taking that channel exception and treating it as a connection failure.

It should also be possible to add retry logic on channel errors, if you don't want the overhead of the exchange declare - any errors like this will cause the channel to close, but should not affect the connection. This means you should be able to open a new channel, declare the offending exchange, and continue.

Note that a missing exchange means there won't be any queues bound to it when you recreate it... i.e. just recreating the exchange before publishing again is unlikely to work as expected, the messages probably won't be routed to a queue until the binding.

cheers,

Tom

Dave Frank

unread,
Jun 14, 2016, 4:59:30 PM6/14/16
to rabbitmq-users
Hah, yes. That is the normal response I've seen about this issue, but the exchanges always exist when these errors happen or at least if they are getting removed, they are being added back before I can tell they were gone.

Dave Frank

unread,
Jun 16, 2016, 2:13:02 PM6/16/16
to rabbitmq-users
Thanks for the help Tom.

While I can't necessarily identify the root cause of this, what I did find is that the channel was likely being closed without the connection being closed as you suggested. I found that I could simply check if the connection was still good and create a new channel when this occurs.

However, like I said, it's unclear how the exchange would ever be deleted because it's not auto-deleteing. I can even close my app and the exchange will exist. I am left assuming that the broker will occasionally fail to find the exchange when a message is published even when the exchange exists. Maybe it's some kind of race condition because it's not something I can reproduce on demand. 

Tom Molesworth

unread,
Jun 16, 2016, 4:02:58 PM6/16/16
to rabbitm...@googlegroups.com
My usual first step would be wireshark or similar packet tracing - if
you're dealing with high traffic volumes and/or the error only happens
rarely, that may not be practical.

If you install the event exchange plugin:

https://github.com/rabbitmq/rabbitmq-event-exchange

then you can trace exchange create/delete events (among other things).
That should at least help to rule out stray application components
deleting/recreating the exchange. Preferably on a test system, of course.

If there are no exchange recreation events, perhaps try cranking up the
load - e.g. run as many clients publishing messages as possible, see if
the rate of errors increases or you start to see other errors.

cheers,

Tom

Reply all
Reply to author
Forward
0 new messages