Unexpected auto-recovery behavior?

Rich Bramante

Aug 2, 2023, 9:36:28 PM
to rabbitmq-users
I observed the following and am wondering whether it is expected or should be considered a possible issue.

I am using the RabbitMQ Java client library 5.13.1 against a RabbitMQ 3.11.18 cluster running Erlang 25.3.2.2.

I am testing several recovery scenarios, specifically this documented behavior: "Channel-level exceptions will not trigger any kind of recovery as they usually indicate a semantic issue in the application (e.g. an attempt to consume from a non-existent queue)."

In my test:
* Allocate an auto-recovering connection
* Allocate a channel from the connection
* basicConsume() from a busy queue
* Deliberately trigger an application error that closes the channel, so it won't auto-recover (in my case I called basicAck() with an invalid delivery tag on the channel)
* As expected, consumption stops: new messages arriving at the queue no longer fire consumer callbacks

What surprised me:

With my app in that state, I used "Force Close" on the connection in the RabbitMQ management UI. When the connection recovered, it also recovered the channel, and consumer callbacks started firing again for new events.

I had not explicitly closed the channel after the application error, since I assumed it had already been closed "underneath" me, and it did not recover as long as its parent connection stayed up. But when the parent connection was forced to recover, it also recovered the channel that had been closed by the protocol error, which seems to contradict the documentation.
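To make the question concrete, here is a hypothetical toy model in plain Java. It has no amqp-client dependency, and the class names ToyChannel, ToyRecoveringConnection and RecoveryScenario are invented for illustration. It assumes the bookkeeping this report suggests (a channel closed by a channel-level error stays in the connection's list of recoverable channels, and connection recovery reopens everything in that list); it is not the client's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL toy model, not amqp-client code. It only reproduces the
// sequence reported in this thread under an assumed bookkeeping rule.
class ToyChannel {
    boolean open = true;
    boolean consuming = false;

    // A delivery fires the consumer callback only on an open, consuming channel.
    boolean deliver() {
        return open && consuming;
    }
}

class ToyRecoveringConnection {
    // Channels this connection reopens on recovery (assumed rule: nothing
    // removes a channel from this list when the broker closes it).
    final List<ToyChannel> recoverable = new ArrayList<>();

    ToyChannel createChannel() {
        ToyChannel ch = new ToyChannel();
        recoverable.add(ch);
        return ch;
    }

    // Broker-side channel error (e.g. 406 PRECONDITION_FAILED after acking an
    // unknown delivery tag): the channel closes but stays registered.
    void channelError(ToyChannel ch) {
        ch.open = false;
    }

    // Connection recovery (e.g. after "Force Close" in the management UI)
    // reopens every registered channel and keeps its consumer, regardless of
    // why the channel was closed before.
    void recover() {
        for (ToyChannel ch : recoverable) {
            ch.open = true;
        }
    }
}

class RecoveryScenario {
    public static void main(String[] args) {
        ToyRecoveringConnection conn = new ToyRecoveringConnection();
        ToyChannel ch = conn.createChannel();
        ch.consuming = true;               // basicConsume()
        System.out.println(ch.deliver());  // true: callbacks fire
        conn.channelError(ch);             // invalid basicAck() closes the channel
        System.out.println(ch.deliver());  // false: consumption stops
        conn.recover();                    // connection is force-closed, then recovers
        System.out.println(ch.deliver());  // true: the errored channel is back
    }
}
```

Running RecoveryScenario prints true, false, true, matching the observation. In this model the open question is simply which side, the application or the library, should take the errored channel out of the recoverable list.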

To avoid this, should I have explicitly closed my channel upon shutdown notification, even though the channel appears to have been closed server-side? Does the documentation need to cover this? Or is this an oversight in the recovery code, and this particular channel should not have been recovered?

Michal Kuratczyk

Aug 3, 2023, 5:51:07 AM
to rabbitm...@googlegroups.com
Hi,

I'm not sure I understand, but I think you are describing a scenario where the connection is ultimately terminated (manually), so a new connection is established, together with a new channel.
There's no channel recovery as such in this case, though; it's a connection recovery.

Best,

--
Michał
RabbitMQ team

Rich Bramante

Aug 3, 2023, 9:09:44 AM
to rabbitmq-users
Yes, I guess that describes it, but the part that surprised me is that the channel recovered along with the connection was already closed because of its own application error; it was not closed by the connection interruption.

Is the recovery contract that any channel on a connection on which the application has not explicitly invoked channel.close() will be recovered when its underlying connection is recovered, regardless of any previous close state?

If so, it may help to add a note next to the current documentation statement "Channel-level exceptions will not trigger any kind of recovery as they usually indicate a semantic issue in the application..." explaining that connection-triggered recovery will still recover channels in such a state.

Thank you.

Luke Bakken

Aug 3, 2023, 10:58:16 AM
to rabbitmq-users
Hi Rich,

Thanks for taking the time to investigate and report this. I have opened the following issue: https://github.com/rabbitmq/rabbitmq-java-client/issues/1085

More than likely the right fix is to update the docs, but there's a chance that this behavior is unintended.

Thanks,
Luke

Rich Bramante

Aug 3, 2023, 11:30:37 AM
to rabbitmq-users
Hi Luke.

Thank you. I have been doing some additional testing, and I think this could be a larger issue:

I tried to explicitly close the channel that had the protocol error, but it cannot be closed:

com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 9, class-id=60, method-id=80)
at com.rabbitmq.client.impl.AMQChannel.processShutdownSignal(AMQChannel.java:401) ~[amqp-client-5.13.1.jar:5.13.1]
at com.rabbitmq.client.impl.ChannelN.startProcessShutdownSignal(ChannelN.java:287) ~[amqp-client-5.13.1.jar:5.13.1]
at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:608) ~[amqp-client-5.13.1.jar:5.13.1]
at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:542) ~[amqp-client-5.13.1.jar:5.13.1]
at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:535) ~[amqp-client-5.13.1.jar:5.13.1]

I also tried to abort the channel. That does not throw AlreadyClosedException, but it does not otherwise change the behavior: if the underlying connection later fails and recovers, this channel is "resurrected".

This seems like it is asking for a channel leak. If the application wants to recover from its error, it must allocate a new channel, since this channel is defunct, and it cannot close the current one. If the connection later goes through recovery, this old channel will reactivate. The only workaround I can currently see would be for the application to track every channel that hit a protocol error and watch for recovery events on those channels, so that if they come back later it can close them then and avoid the leak.
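The workaround described above can be sketched as a small application-side tracker. This is a sketch under assumptions: ErroredChannelTracker is a hypothetical helper, and the channel type is left generic so the bookkeeping runs without a broker. In the real client the two methods would presumably be called from a ShutdownListener (using ShutdownSignalException.isHardError() and isInitiatedByApplication() to pick out channel-level errors initiated by the broker) and from a RecoveryListener registered through Recoverable.addRecoveryListener(); whether closing a channel from inside the recovery callback is safe in 5.13.1 has not been verified here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// HYPOTHETICAL helper, not part of amqp-client. C stands for whatever channel
// handle the application keeps (a Channel object, a channel number, ...).
class ErroredChannelTracker<C> {
    // Concurrent set: shutdown and recovery callbacks may arrive on different
    // client threads.
    private final Set<C> errored = ConcurrentHashMap.newKeySet();

    // Call when a channel shuts down. A soft (channel-level) error that the
    // application did not initiate means the broker closed this channel,
    // e.g. with 406 PRECONDITION_FAILED; connection failures are hard errors.
    void onShutdown(C channel, boolean hardError, boolean initiatedByApplication) {
        if (!hardError && !initiatedByApplication) {
            errored.add(channel);
        }
    }

    // Call when connection recovery reports a channel as recovered. Returns
    // true (once) if the application should close it instead of using it.
    boolean onRecovered(C channel) {
        return errored.remove(channel);
    }

    // Number of errored channels still waiting for a possible recovery.
    int pending() {
        return errored.size();
    }
}
```

Healthy channels shut down by the forced connection close report a hard error, so only the channel closed by the 406 is flagged. The cost is that every application has to carry this bookkeeping itself, which is why having the client drop such channels from its recoverable list would be simpler for users.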

I'm not sure most SDK users would expect this. I think the natural assumption is that the SDK's tracking of Channels for a Connection would remove application-errored Channels from its list of Recoverables. But maybe there is a reason why that is not a good idea?

Rich

Luke Bakken

Aug 3, 2023, 11:53:19 AM
to rabbitmq-users
Hi Rich,

It would be great if you followed up via that GitHub issue. Thanks!

Rich Bramante

Aug 3, 2023, 12:03:17 PM
to rabbitmq-users
Hi Luke. I have added latest notes to the GitHub issue. Thank you!

Rich