Channel auto_recovery not effective?

Thilo-Alexander Ginkel

Apr 24, 2013, 7:12:58 AM
to ruby...@googlegroups.com
Hi there,

we are currently struggling with getting Channel auto_recovery working. Upon successful AMQP.start we establish a channel (from the block passed to AMQP.start) and explicitly enable auto_recovery:

    @channel               = AMQP::Channel.new(connection)
    @channel.auto_recovery = true

After that, publishing and consuming messages works as expected, but breaks forever (for the lifetime of the VM) once the first error happens on that channel (e.g., by calling channel.direct with passive: true for a non-existing exchange, resulting in the error '#<AMQ::Protocol::Channel::Close:0x00000008338800 @method_id=10, @reply_code=404, @class_id=40, @reply_text="NOT_FOUND - no exchange 'test' in vhost '/'">').

Channel.on_error is invoked correctly and we can see that Channel.auto_recovering? also returns true at that point. Still, none of the messages that we publish from that point on reach any subscriber -- according to `rabbitmqctl list_queues` they do not even reach the respective queue.

Any ideas how we could isolate what is causing this issue? Is my assumption correct that auto_recovery on channel level is supposed to work correctly in amqp 1.0.1?

Thanks,
Thilo

Michael Klishin

Apr 24, 2013, 7:16:39 AM
to ruby...@googlegroups.com

2013/4/24 Thilo-Alexander Ginkel <th...@ginkel.com>

> After that, publishing and consuming messages works as expected, but breaks forever (for the lifetime of the VM) once the first error happens on that channel (e.g., by calling channel.direct with passive: true for a non-existing exchange, resulting in the error '#<AMQ::Protocol::Channel::Close:0x00000008338800 @method_id=10, @reply_code=404, @class_id=40, @reply_text="NOT_FOUND - no exchange 'test' in vhost '/'">').

Channels that had an exception on them cannot be used again. That's how the protocol works.
 

> Channel.on_error is invoked correctly and we can see that Channel.auto_recovering? also returns true at that point. Still, none of the messages that we publish from that point on reach any subscriber -- according to `rabbitmqctl list_queues` they do not even reach the respective queue.

> Any ideas how we could isolate what is causing this issue? Is my assumption correct that auto_recovery on channel level is supposed to work correctly in amqp 1.0.1?


Automatic recovery reopens channels when there is a network connection failure, not when there is a channel-level exception. The amqp gem absolutely must not try to be that intelligent: channel error recovery is application-specific.
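
To make the distinction concrete, here is a toy model in plain Ruby. `RecoveryToyChannel` and its `handle_*` methods are hypothetical and are not the amqp gem API; no broker is involved. It only encodes the rule stated above: a recovery hook runs after a lost connection comes back, while a channel.close sent by the broker leaves the channel closed regardless of auto_recovery.

```ruby
# Toy model only: RecoveryToyChannel is hypothetical, not the amqp gem API.
class RecoveryToyChannel
  attr_accessor :auto_recovery

  def initialize
    @open = true
    @auto_recovery = false
  end

  def open?
    @open
  end

  # The TCP connection dropped: every channel on it is gone.
  def handle_connection_loss
    @open = false
  end

  # The connection is back: this is the only place auto_recovery applies.
  def handle_connection_recovery
    @open = true if @auto_recovery
  end

  # The broker closed this channel (e.g. 404 on a passive declare). The
  # connection is still fine, so no recovery hook ever fires for it.
  def handle_channel_close(_reply_code, _reply_text)
    @open = false
  end
end

network_failure = RecoveryToyChannel.new
network_failure.auto_recovery = true
network_failure.handle_connection_loss
network_failure.handle_connection_recovery

channel_error = RecoveryToyChannel.new
channel_error.auto_recovery = true
channel_error.handle_channel_close(404, "NOT_FOUND - no exchange 'test' in vhost '/'")

p network_failure.open? # prints true
p channel_error.open?   # prints false: handling this is up to the application
```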

There is a method, Channel#reuse, that lets you manually "reuse" a channel object (its id will change, but otherwise it is as if it had been "reopened"):

Thilo-Alexander Ginkel

Apr 24, 2013, 7:28:02 AM
to ruby...@googlegroups.com
On Wednesday, April 24, 2013 1:16:39 PM UTC+2, Michael Klishin wrote:
> 2013/4/24 Thilo-Alexander Ginkel <th...@ginkel.com>
>> After that, publishing and consuming messages works as expected, but breaks forever (for the lifetime of the VM) once the first error happens on that channel (e.g., by calling channel.direct with passive: true for a non-existing exchange, resulting in the error '#<AMQ::Protocol::Channel::Close:0x00000008338800 @method_id=10, @reply_code=404, @class_id=40, @reply_text="NOT_FOUND - no exchange 'test' in vhost '/'">').
>
> Channels that had an exception on them cannot be used again. That's how the protocol works.

Ok, understood.

>> Channel.on_error is invoked correctly and we can see that Channel.auto_recovering? also returns true at that point. Still, none of the messages that we publish from that point on reach any subscriber -- according to `rabbitmqctl list_queues` they do not even reach the respective queue.
>>
>> Any ideas how we could isolate what is causing this issue? Is my assumption correct that auto_recovery on channel level is supposed to work correctly in amqp 1.0.1?
>
> Automatic recovery reopens channels when there is a network connection failure, not when there is a channel-level exception. The amqp gem absolutely must not try to be that intelligent: channel error recovery is application-specific.
>
> There is a method, Channel#reuse, that lets you manually "reuse" a channel object (its id will change, but otherwise it is as if it had been "reopened"):


Would it be viable to call Channel.reuse from within the Channel.on_error callback? I already tried doing so, but message delivery via that channel still seems to be negatively impacted, i.e., published messages vanish into thin air.

    @channel.on_error do |ch, channel_close|
      ch.reuse
    end

Thanks for your help,
Thilo

Michael Klishin

Apr 24, 2013, 7:38:16 AM
to ruby...@googlegroups.com

2013/4/24 Thilo-Alexander Ginkel <th...@ginkel.com>

> Would it be viable to call Channel.reuse from within the Channel.on_error callback? I already tried doing so, but message delivery via that channel still seems to be negatively impacted, i.e., published messages vanish into thin air.

You can call it from Channel#on_error. "vanish into thin air" is not very descriptive. Consult the RabbitMQ log and the management UI to see what's going on.

Thilo-Alexander Ginkel

Apr 24, 2013, 4:44:40 PM
to ruby...@googlegroups.com
On Wednesday, April 24, 2013 1:38:16 PM UTC+2, Michael Klishin wrote:
> 2013/4/24 Thilo-Alexander Ginkel <th...@ginkel.com>
>> Would it be viable to call Channel.reuse from within the Channel.on_error callback? I already tried doing so, but message delivery via that channel still seems to be negatively impacted, i.e., published messages vanish into thin air.
>
> You can call it from Channel#on_error. "vanish into thin air" is not very descriptive. Consult the RabbitMQ log and the management UI to see what's going on.

Unfortunately, the RabbitMQ logs do not show anything after the original error that caused the channel close, and tcpdump also shows that the client transmits no data after the error happened:

If that helps, I can try to reproduce the issue in a minimal example.

Regards,
Thilo

Michael Klishin

Apr 25, 2013, 12:41:00 AM
to ruby...@googlegroups.com

2013/4/25 Thilo-Alexander Ginkel <th...@ginkel.com>

> If that helps, I can try to reproduce the issue in a minimal example.

We have an example that reopens a channel; I believe I see an issue there. Investigating.

Michael Klishin

Apr 25, 2013, 3:55:26 AM
to ruby...@googlegroups.com

2013/4/25 Michael Klishin <michael....@gmail.com>

> We have an example that reopens a channel; I believe I see an issue there. Investigating.

amq-client 1.0.1 [1] resolves one issue with reopened channels (around lazily executed queue operations, e.g. with code such as ch.queue("", :exclusive => true).bind(x).subscribe(...)).

I'm working on a better example that demonstrates channel error recovery and a few tests that will catch the issue above.

amqp gem 1.0.2 will be out later today or tomorrow.

Michael Klishin

Apr 25, 2013, 3:57:00 AM
to ruby...@googlegroups.com

2013/4/25 Michael Klishin <michael....@gmail.com>
> amq-client 1.0.1 [1]

Forgot the link: http://rubygems.org/gems/amq-client/versions/1.0.1

Thilo-Alexander Ginkel

Apr 25, 2013, 12:51:07 PM
to ruby...@googlegroups.com
On Thursday, April 25, 2013 9:57:00 AM UTC+2, Michael Klishin wrote:
> 2013/4/25 Michael Klishin <michael....@gmail.com>
>> amq-client 1.0.1 [1]
>
> Forgot the link: http://rubygems.org/gems/amq-client/versions/1.0.1

Excellent. With amqp 1.0.1 and amq-client 1.0.2, things have changed a bit: Channel.reuse now takes effect, i.e., it reopens the channel, but it then gets stuck in an endless loop, as it seems to also try to redeclare the exchange that originally caused the channel to go down due to the 404 response, leading to yet another channel failure, and so on:

Regards,
Thilo

Michael Klishin

Apr 25, 2013, 1:02:23 PM
to ruby...@googlegroups.com

2013/4/25 Thilo-Alexander Ginkel <th...@ginkel.com>

> Excellent. With amqp 1.0.1 and amq-client 1.0.2, things have changed a bit: Channel.reuse now takes effect, i.e., it reopens the channel, but it then gets stuck in an endless loop, as it seems to also try to redeclare the exchange that originally caused the channel to go down due to the 404 response, leading to yet another channel failure, and so on:

There was one more improvement in amqp gem itself today that's not yet released.
The example I've been using can be found in the repo:

Unfortunately, if you don't recover associated entities, the whole method is not very useful. I don't have a solution in mind about how to tell which entities need to be filtered out. channel.close does not carry the entity name or type in its payload, only the error code, class id and message. Parsing error message strings is a very fragile solution.
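
A toy model of the loop Thilo describes, assuming (as his report suggests) that reuse redeclares every exchange registered on the channel, including the one whose passive declaration failed. `ToyBroker` and `ReuseToyChannel` are hypothetical stand-ins, not the amqp gem API, and the loop is capped only so the example terminates; an on_error handler that unconditionally calls reuse would have no such cap.

```ruby
require "set"

# Toy model only: ToyBroker and ReuseToyChannel are hypothetical, not the
# amqp gem API. No broker is contacted.
class ToyBroker
  def initialize(existing_exchanges)
    @existing = existing_exchanges.to_set
  end

  # Returns nil on success, or [reply_code, reply_text] when the broker
  # would answer with channel.close.
  def declare_exchange(name, passive:)
    if passive
      return nil if @existing.include?(name)
      return [404, "NOT_FOUND - no exchange '#{name}' in vhost '/'"]
    end
    @existing << name # a regular declaration creates the exchange
    nil
  end
end

class ReuseToyChannel
  attr_reader :id, :exchanges

  def initialize(broker)
    @broker = broker
    @id = 1
    @exchanges = {} # name => declaration options; the failing one is kept too
  end

  def declare(name, passive: false)
    @exchanges[name] = { passive: passive }
    @broker.declare_exchange(name, passive: passive)
  end

  # Reopen under a new id and redeclare every registered exchange,
  # returning the first error (if any).
  def reuse
    @id += 1
    @exchanges.each do |name, opts|
      error = @broker.declare_exchange(name, passive: opts[:passive])
      return error if error
    end
    nil
  end
end

broker = ToyBroker.new(["amq.direct"])
ch = ReuseToyChannel.new(broker)
ch.declare("amq.direct")
error = ch.declare("test", passive: true) # the passive check that fails

attempts = 0
while error && attempts < 5 # capped here; a plain "ch.reuse in on_error" is not
  attempts += 1
  error = ch.reuse
end

puts attempts    # prints 5: every reuse fails on the same exchange
puts error.first # prints 404
```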

Michael Klishin

Apr 25, 2013, 2:01:47 PM
to ruby...@googlegroups.com

2013/4/25 Michael Klishin <michael....@gmail.com>

> I don't have a solution in mind about how to tell which entities need to be filtered out.

There is one option for you: ch.exchanges is a hash that maps exchange names to AMQP::Exchange instances. You can probably just delete an exchange that fails declaration from it.
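
A sketch of that workaround, using a plain hash as a stand-in for the channel's exchange registry. The helper `first_failing_redeclaration` and the constant `EXISTING_ON_BROKER` are hypothetical. The point it relies on is that the application issued the passive check itself and therefore knows the exchange name, even though channel.close does not carry it. With the real gem this would presumably amount to deleting that name from ch.exchanges in the on_error callback before calling reuse; that part is not exercised here.

```ruby
# Toy model only (hypothetical names, no broker; not the amqp gem API).
EXISTING_ON_BROKER = ["amq.direct"].freeze

# Redeclare every registered exchange, as reuse does; return the first
# name whose passive declaration would fail, or nil.
def first_failing_redeclaration(registry)
  registry.find { |name, opts| opts[:passive] && !EXISTING_ON_BROKER.include?(name) }&.first
end

# name => declaration options, standing in for the per-channel exchange registry
registry = {
  "amq.direct" => { passive: false },
  "test"       => { passive: true }
}

# The application issued this passive check itself, so it knows the name;
# channel.close does not carry it.
pending_passive_check = "test"

p first_failing_redeclaration(registry) # prints "test": reuse would fail again

# What an on_error handler would do before reusing the channel:
registry.delete(pending_passive_check)
p first_failing_redeclaration(registry) # prints nil: reuse can succeed
```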

I'll try to come up with a way to deregister entities that fail declaration, but it is a really counterintuitive feature, I'm afraid. Adding it may do more harm than good.

Thilo-Alexander Ginkel

Apr 25, 2013, 6:30:31 PM
to ruby...@googlegroups.com
On Thursday, April 25, 2013 8:01:47 PM UTC+2, Michael Klishin wrote:
>> I don't have a solution in mind about how to tell which entities need to be filtered out.
>
> There is one option for you: ch.exchanges is a hash that maps exchange names to AMQP::Exchange instances. You can probably just delete an exchange that fails declaration from it.

I'll give that a try.

> I'll try to come up with a way to deregister entities that fail declaration, but it is a really counterintuitive feature, I'm afraid. Adding it may do more harm than good.

IMHO that makes sense if an exchange fails declaration due to different parameters. If, however, it fails because the exchange does not exist and `passive: true` was supplied, I would not expect any negative consequences (apart from an AMQP::Error being raised), as the documentation suggests that `passive: true` may be used as a means to check whether an exchange actually exists:

    # @option opts [Boolean] :passive (false)  If set, the server will not create the exchange if it does not
    #                                          already exist. The client can use this to check whether an exchange
    #                                          exists without modifying the server state.

"Breaking" the underlying channel under these circumstances was - at least for me - kind of unexpected. ;-)

Regards,
Thilo

Michael Klishin

Apr 26, 2013, 2:26:14 AM
to ruby...@googlegroups.com

2013/4/26 Thilo-Alexander Ginkel <th...@ginkel.com>

> IMHO that makes sense if an exchange fails declaration due to different parameters. If, however, it fails because the exchange does not exist and `passive: true` was supplied

We don't have that information in channel.close. Parsing error messages is very fragile.
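
To illustrate why: recovering the exchange name means pattern-matching reply_text, as in the sketch below. The regex `NOT_FOUND_EXCHANGE` and the helper `exchange_name_from` are hypothetical. The first message is the one from this thread; the second is an invented variant wording that the same pattern silently fails to match.

```ruby
# Illustration only: NOT_FOUND_EXCHANGE and exchange_name_from are hypothetical.
# The pattern is tied to the exact wording of the broker's reply_text.
NOT_FOUND_EXCHANGE = /\ANOT_FOUND - no exchange '([^']*)' in vhost '[^']*'\z/

# Returns the exchange name, or nil when the text does not match.
def exchange_name_from(reply_text)
  match = NOT_FOUND_EXCHANGE.match(reply_text)
  match && match[1]
end

# The reply_text seen in this thread: matches.
p exchange_name_from("NOT_FOUND - no exchange 'test' in vhost '/'") # prints "test"

# Same meaning, different (invented) quoting: silently returns nil.
p exchange_name_from('NOT_FOUND - no exchange "test" in vhost "/"') # prints nil
```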

Michael Klishin

Apr 26, 2013, 2:27:34 AM
to ruby...@googlegroups.com

2013/4/26 Thilo-Alexander Ginkel <th...@ginkel.com>

> I would not expect any negative consequences (apart from an AMQP::Error being raised), as the documentation suggests that `passive: true` may be used as a means to check whether an exchange actually exists

AMQP::Error hasn't been in use for a few years.

This is how the protocol works: passive declarations that fail close the channel.
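
A toy event-loop model of why there is nothing to raise at the call site: the declaration is sent and the call returns before the broker answers, so the 404 can only be delivered later, through the on_error callback. `ToyAsyncChannel`, `declare_passive` and `deliver_replies` are hypothetical names, not the amqp gem or EventMachine API.

```ruby
# Toy event loop only: ToyAsyncChannel is hypothetical, not the amqp gem or
# EventMachine API.
class ToyAsyncChannel
  def initialize(existing)
    @existing = existing
    @in_flight = []      # declarations sent but not yet answered
    @error_handlers = []
  end

  def on_error(&block)
    @error_handlers << block
  end

  # Sends the declaration and returns at once; the broker has not replied yet.
  def declare_passive(name)
    @in_flight << name
    self
  end

  # Stands in for the event loop delivering broker replies later on.
  def deliver_replies
    @in_flight.each do |name|
      next if @existing.include?(name)
      close = { reply_code: 404, reply_text: "NOT_FOUND - no exchange '#{name}' in vhost '/'" }
      @error_handlers.each { |handler| handler.call(self, close) }
    end
    @in_flight.clear
  end
end

errors = []
ch = ToyAsyncChannel.new(["amq.direct"])
ch.on_error { |_channel, close| errors << close[:reply_code] }

raised = false
begin
  ch.declare_passive("test")
rescue StandardError
  raised = true # never reached: nothing has failed yet when the call returns
end

ch.deliver_replies
p raised # prints false
p errors # prints [404]
```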

Thilo-Alexander Ginkel

Apr 26, 2013, 4:29:00 AM
to ruby...@googlegroups.com
On Friday, April 26, 2013 8:27:34 AM UTC+2, Michael Klishin wrote:
> 2013/4/26 Thilo-Alexander Ginkel <th...@ginkel.com>
>> I would not expect any negative consequences (apart from an AMQP::Error being raised), as the documentation suggests that `passive: true` may be used as a means to check whether an exchange actually exists
>
> AMQP::Error hasn't been in use for a few years.

Ok, I guess this is due to the asynchronous nature of the protocol. I was assuming that AMQP::Error is in use as it is mentioned in the RubyDoc of Channel.direct:

    # @raise [AMQP::Error] Raised when exchange is redeclared with parameters different from original declaration.
    # @raise [AMQP::Error] Raised when exchange is declared with  :passive => true and the exchange does not exist.

Maybe it would be a good idea to drop these lines if they no longer reflect the current situation.
 
> This is how the protocol works: passive declarations that fail close the channel.

Ok. I'll resort to using a dedicated channel for each passive declaration.
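
A toy model of that approach: each passive check runs on a throwaway channel, so a 404 closes only that channel and the long-lived one stays usable. `ToyConnection`, `ProbeToyChannel` and `exchange_exists_via_probe?` are hypothetical names, not the amqp gem API; with the gem, the probe would presumably be a separate AMQP::Channel opened on the same connection.

```ruby
# Toy model only: ToyConnection and ProbeToyChannel are hypothetical, not the
# amqp gem API. A failed passive check closes only the channel it ran on.
class ProbeToyChannel
  attr_reader :id

  def initialize(id, existing)
    @id = id
    @existing = existing
    @open = true
  end

  def open?
    @open
  end

  # Passive declaration: true if the exchange exists; otherwise the broker
  # closes this channel (404) and we report false.
  def passive_declare(name)
    raise "channel #{@id} is closed" unless @open
    found = @existing.include?(name)
    @open = false unless found
    found
  end

  def close
    @open = false
  end
end

class ToyConnection
  def initialize(existing)
    @existing = existing
    @last_id = 0
  end

  def open_channel
    @last_id += 1
    ProbeToyChannel.new(@last_id, @existing)
  end
end

# Hypothetical helper: run one passive check on a throwaway channel.
def exchange_exists_via_probe?(connection, name)
  probe = connection.open_channel
  found = probe.passive_declare(name)
  probe.close if probe.open?
  found
end

conn = ToyConnection.new(["amq.direct"])
main = conn.open_channel # long-lived channel used for publishing

p exchange_exists_via_probe?(conn, "test")       # prints false
p exchange_exists_via_probe?(conn, "amq.direct") # prints true
p main.open?                                     # prints true
```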

Thanks,
Thilo 

Michael Klishin

Apr 26, 2013, 4:44:56 AM
to ruby...@googlegroups.com

2013/4/26 Thilo-Alexander Ginkel <th...@ginkel.com>

> Maybe it would be a good idea to drop these lines if they no longer reflect the current situation.

Done.