Topology Recovery queue binding fails

737 views
Skip to first unread message

ni...@bluejeansnet.com

unread,
Feb 2, 2016, 5:03:03 PM2/2/16
to rabbitmq-users
This is with amqp-client 3.5.7; I haven't verified yet whether 3.6.0 is affected, but the code is the same, so I assume it would be.

With topology recovery enabled, with a long-running consumer on a non-durable, non-exclusive, auto-delete queue, I yank my laptop's ethernet cable, wait a moment, and plug it back in.  This causes a shutdown signal to be delivered followed by an attempted topology recovery, which fails when it comes to queue binding:

INFO  [2016-02-02 13:41:03,231] ConsumerManager - Shutdown signal was delivered for cTag nick_mbp_685B35A351A0_1454447762858-Consumer-bluejeans_exchange_, with the reason null, isHardError=true, isInitiatedByApplication=false
INFO  [2016-02-02 13:41:03,231] ConsumerManager - Shutdown signal was delivered for cTag nick_mbp_685B35A351A0_1454447722856-Consumer-bluejeans_53a378cfe4b08023288d4308_, with the reason null, isHardError=true, isInitiatedByApplication=false
Caught an exception when recovering topology Caught an exception while recovering binding between bluejeans$catalyst.listener and bluejeans$53a378cfe4b08023288d4308.#: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
com.rabbitmq.client.TopologyRecoveryException: Caught an exception while recovering binding between bluejeans$catalyst.listener and bluejeans$53a378cfe4b08023288d4308.#: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverBindings(AutorecoveringConnection.java:579)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverEntities(AutorecoveringConnection.java:519)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:446)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.access$000(AutorecoveringConnection.java:52)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection$1.shutdownCompleted(AutorecoveringConnection.java:382)
at com.rabbitmq.client.impl.ShutdownNotifierComponent.notifyListeners(ShutdownNotifierComponent.java:75)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:576)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:195)
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:241)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:219)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
at com.rabbitmq.client.impl.ChannelN.queueBind(ChannelN.java:933)
at com.rabbitmq.client.impl.ChannelN.queueBind(ChannelN.java:61)
at com.rabbitmq.client.impl.recovery.RecordedQueueBinding.recover(RecordedQueueBinding.java:14)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverBindings(AutorecoveringConnection.java:575)
... 7 more
Caught an exception when recovering topology Caught an exception while recovering binding between bluejeans$catalyst.listener and bluejeans$exchange.#: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
com.rabbitmq.client.TopologyRecoveryException: Caught an exception while recovering binding between bluejeans$catalyst.listener and bluejeans$exchange.#: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverBindings(AutorecoveringConnection.java:579)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverEntities(AutorecoveringConnection.java:519)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:446)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.access$000(AutorecoveringConnection.java:52)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection$1.shutdownCompleted(AutorecoveringConnection.java:382)
at com.rabbitmq.client.impl.ShutdownNotifierComponent.notifyListeners(ShutdownNotifierComponent.java:75)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:576)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:195)
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:241)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:219)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
at com.rabbitmq.client.impl.ChannelN.queueBind(ChannelN.java:933)
at com.rabbitmq.client.impl.ChannelN.queueBind(ChannelN.java:61)
at com.rabbitmq.client.impl.recovery.RecordedQueueBinding.recover(RecordedQueueBinding.java:14)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverBindings(AutorecoveringConnection.java:575)
... 7 more
WARN  [2016-02-02 13:41:08,455] AmqpConnectionHolder - Connection amqp://blue...@10.4.5.111:5671/ recovered

Checking the RabbitMQ Admin UI when this happens, I see that the consumer and the queue did get recovered, but the queue binding did not, evidently due to this error.

One difference I see between the recovery that works (queue declaration) and the recovery that fails (queue binding) is that queue recovery calls this.channel.queueDeclare(...) whereas queue binding recovery calls this.channel.getDelegate().queueBind(...).

Probably the call to .getDelegate() prevents the AutorecoveringChannel from recovering and tries to directly use a channel that is closed.

I think the fix is to change RecordedQueueBinding#recover to call this.channel.queueBind instead of this.channel.getDelegate().queueBind.

Thoughts, or should I try to create a PR?

    Nick

Michael Klishin

unread,
Feb 2, 2016, 5:18:24 PM2/2/16
to rabbitm...@googlegroups.com, ni...@bluejeansnet.com
On 3 February 2016 at 01:03:07, ni...@bluejeansnet.com (ni...@bluejeansnet.com) wrote:
> One difference I see between the recovery that works (queue
> declaration) and the recovery that fails (queue binding) is
> that queue recovery calls this.channel.queueDeclare(...)
> whereas queue binding recovery calls this.channel.getDelegate().queueBind(...).

Sounds reasonable, thank you for digging in.

At first I thought that maybe channel delegate should be recovered first; in fact,
we already do that in AutorecoveringConnection#beginAutomaticRecovery:

 * Recover channels, which opens a new “delegate” channel for every autorecovering channel
 * Recover entities *in the connection* (because known entity cache is per connection)

so my hypothesis is wrong.

> Probably the call to .getDelegate() prevents the AutorecoveringChannel
> from recovering and tries to directly use a channel that is closed.
>
> I think the fix is to change RecordedQueueBinding#recover to
> call this.channel.queueBind instead of this.channel.getDelegate().queueBind.
>
> Thoughts, or should I try to create a PR?

Please submit a PR against the `stable` branch.
Also, feel free to do as much testing in your environment as you feel comfortable/have time for first.
3.6.1 is unlikely to ship until mid next week because we are working on a tricky issue
(in fact, a workaround for an issue in OTP) we’d like to include into it.

Thank you for your time. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Michael Klishin

unread,
Feb 2, 2016, 5:27:36 PM2/2/16
to rabbitm...@googlegroups.com, ni...@bluejeansnet.com
 On 3 February 2016 at 01:18:18, Michael Klishin (mkli...@pivotal.io) wrote:
> Please submit a PR against the `stable` branch.
> Also, feel free to do as much testing in your environment as you
> feel comfortable/have time for first.
> 3.6.1 is unlikely to ship until mid next week because we are working
> on a tricky issue
> (in fact, a workaround for an issue in OTP) we’d like to include
> into it.

Also, a Wireshark traffic capture would help you understand if it is
an issue with an old channel and not a new channel being unexpectedly closed):
http://www.rabbitmq.com/amqp-wireshark.html

ni...@bluejeansnet.com

unread,
Feb 2, 2016, 5:51:16 PM2/2/16
to rabbitmq-users, ni...@bluejeansnet.com
Sounds good.  I will open an issue so that I have a good branch name, then do some additional work to try a fix, test, and push up a PR.

   Nick

Michael Klishin

unread,
Feb 3, 2016, 2:47:43 AM2/3/16
to rabbitm...@googlegroups.com, ni...@bluejeansnet.com
On 3 February 2016 at 01:51:19, ni...@bluejeansnet.com (ni...@bluejeansnet.com) wrote:
> I will open an issue so that I have a good branch name, then do some
> additional work to try a fix, test, and push up a PR.

FTR, here's the issue and a fix will be in 3.6.1:
https://github.com/rabbitmq/rabbitmq-java-client/issues/129

entirely thanks to Nick.
Reply all
Reply to author
Forward
0 new messages