reject-requeue crashes quorum members with x-delivery-limit


Adam Gardner

Apr 27, 2020, 6:34:18 PM
to rabbitmq-users
Aloha,

I've run into an odd set of behaviors with quorum queues that have x-delivery-limit set. Here is a minimal reproduction case:

Setup:
  1. Create a fresh single-node installation running RabbitMQ 3.8.3 on Erlang 22.3.2 (I'll show differences that occur with a multi-node cluster afterwards, but the results are more-or-less the same)
  2. Make sure the Quorum Queues feature flag is enabled
  3. Through the management UI, create a new queue, called "dummy", with Queue Type "Quorum" and x-delivery-limit set to 2.
  4. Open the queue details page, and publish a simple message such as '{"foo":"bar"}'
Test:
  1. Using the "Get message" section of the UI, set "Ack mode" to "Reject requeue true"
  2. Click "Get Message(s)", and your message is shown, with an x-delivery-count header value of 0
  3. Click "Get Message(s)" again, and your message is shown, with an "x-delivery-count" header value of 1
  4. Click "Get Message(s)" again, and you are told "541 Cannot get a message from quorum queue 'queue 'dummy' in vhost '/'': noproc"
  5. Eventually the UI will display that the queue state is "down", with a tooltip that says "The queue is located on a cluster node or nodes that are down", although the sole node in the cluster is still online. You may need to attempt to interact with the queue again (e.g. get messages again) in order for the queue state to update in the UI.
The results are essentially the same in a three-node cluster: although the queue state doesn't transition to 'down', the leader changes to "?" and the "online" list drops from three nodes to one (there is no obvious pattern to which node remains; sometimes it's the former leader, sometimes it isn't). Regardless, attempting to consume from the queue fails. In fact, the leader and member loss can be observed as soon as the message passes the delivery-limit threshold (i.e. immediately after step 3), before attempting to consume from the queue again.
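
For anyone who would rather script the steps above than click through the UI, here is a minimal sketch in Python. It assumes the pika client and a broker on localhost with default credentials, neither of which comes from the original report; the function names are hypothetical. The channel is passed in as a parameter so the get/reject loop can also be exercised against a stand-in object. On an affected broker, the error from step 4 would presumably surface as an exception from the third basic_get.

```python
"""Sketch: basic.get followed by basic.reject(requeue=True), repeated."""

QUEUE = "dummy"  # queue name used in the steps above


def quorum_arguments(delivery_limit):
    """Optional arguments for a quorum queue with a delivery limit."""
    return {"x-queue-type": "quorum", "x-delivery-limit": delivery_limit}


def reject_requeue_until_failure(channel, attempts=3):
    """Publish one message, then get and reject-requeue it `attempts` times.

    `channel` is assumed to behave like a pika BlockingChannel.
    Returns the x-delivery-count header seen on each successful get.
    """
    channel.queue_declare(queue=QUEUE, durable=True,
                          arguments=quorum_arguments(2))
    # With publisher confirms on, basic_publish blocks until the broker
    # has accepted the message, so the first get does not race with it.
    channel.confirm_delivery()
    channel.basic_publish(exchange="", routing_key=QUEUE,
                          body=b'{"foo":"bar"}')
    seen = []
    for _ in range(attempts):
        method, properties, _body = channel.basic_get(queue=QUEUE,
                                                      auto_ack=False)
        if method is None:  # queue was empty
            break
        headers = properties.headers or {}
        seen.append(headers.get("x-delivery-count"))
        channel.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
    return seen


def run_against_local_broker():
    """Assumption: pika is installed and a broker listens on localhost."""
    import pika  # third-party client, imported lazily

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    try:
        return reject_requeue_until_failure(connection.channel())
    finally:
        if connection.is_open:
            connection.close()
```

Calling `run_against_local_broker()` against a test node (not a production one) runs the whole sequence once and returns the delivery counts it saw.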

Variations that don't change behavior:
  • Same results if the dummy queue has x-dead-letter-exchange set
  • Same results if you consume messages from code rather than the UI
  • Same results if you bind the dummy queue to an exchange and publish messages to the exchange, rather than publishing directly to the queue
  • Same results if x-delivery-limit is applied via policy rather than at queue creation
Variations that do change behavior:
  • Works without error if you use "Nack messages requeue true" instead of "Reject requeue true"
  • Works without error if you do not have x-delivery-limit set on the queue
Other notes:
I initially came across this while trying to put together a minimal reproduction for a situation where there was no crash, but the x-delivery-limit was not being honored (that is, x-delivery-limit was set to 3 but the same message could be redelivered an arbitrary number of times). I can no longer reproduce my original error, however.

Has anyone else encountered anything like this?

Thanks,
- Adam Gardner


Adam Gardner

Apr 27, 2020, 8:00:16 PM
to rabbitmq-users
Just to add some information here: only the final redelivery attempt (the one that would push the message over the delivery limit) matters. If you reject-requeue once and then nack-requeue, it works fine; if you nack-requeue and then reject-requeue, it doesn't. Also, when the message is requeued because the client channel was closed, that behaves the same way as the nack-requeue case (in other words, it works as expected).

Adam Gardner

Apr 27, 2020, 8:32:06 PM
to rabbitmq-users
Some further information: when trying this from code (using the Bunny library for Ruby), I sometimes get failures even when using basic.nack, in addition to the failures with basic.reject. I'm not sure what the distinction is.

Adam Gardner

Apr 27, 2020, 11:02:11 PM
to rabbitmq-users
More information: basic.nack never actually works from code at all. The only thing that works from code is closing the channel without acking, nacking or rejecting the message. The reason that "Nack message requeue true" works from the management UI seems to be that it doesn't actually call basic.nack; it just lets the channel close (see https://github.com/rabbitmq/rabbitmq-management/blob/master/src/rabbit_mgmt_wm_queue_get.erl lines 75 and 76, 108-112 and 129-149).
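
To make the three requeue paths compared in this thread concrete, here is a small sketch in Python. It assumes a pika-style channel object (with `basic_reject`, `basic_nack` and `close`), not the Bunny client used in my tests, and the helper name `requeue` is hypothetical.

```python
def requeue(channel, delivery_tag, how):
    """Return an unacked delivery to its queue via one of three paths.

    how == "reject": explicit basic.reject with requeue=True
    how == "nack":   explicit basic.nack with requeue=True
    how == "close":  close the channel and let the broker requeue
                     whatever is still unacknowledged on it
    """
    if how == "reject":
        channel.basic_reject(delivery_tag=delivery_tag, requeue=True)
    elif how == "nack":
        channel.basic_nack(delivery_tag=delivery_tag, multiple=False,
                           requeue=True)
    elif how == "close":
        channel.close()
    else:
        raise ValueError("unknown requeue path: %r" % (how,))
```

In the observations so far, only the "close" path avoids the failure when it is used for the delivery that crosses the limit.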

Karl Nilsson

Apr 28, 2020, 3:21:14 AM
to rabbitm...@googlegroups.com
OK, thanks. Logs would have been helpful, as this was a crash bug, but it was easy to reproduce. It seems to be limited to using basic.get when reaching the delivery limit; when using basic.consume it doesn't happen. Will work on a fix for this today.
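
As a rough illustration of the basic.consume path, which does not trigger the crash, the following Python sketch reject-requeues deliveries received through a consumer instead of basic.get. It assumes a pika-style BlockingChannel (its `consume` generator and `cancel` method); the function name, the default queue name and the timeout value are placeholders for illustration only.

```python
def consume_and_reject(channel, queue="dummy", max_deliveries=5,
                       idle_seconds=2.0):
    """Reject-requeue deliveries that arrive via basic.consume.

    Stops after `max_deliveries` deliveries, or when nothing arrives
    for `idle_seconds` (for example because the broker dropped the
    message once it passed x-delivery-limit).
    Returns the x-delivery-count header seen on each delivery.
    """
    seen = []
    for method, properties, _body in channel.consume(
            queue, auto_ack=False, inactivity_timeout=idle_seconds):
        if method is None:  # inactivity timeout: no delivery arrived
            break
        headers = properties.headers or {}
        seen.append(headers.get("x-delivery-count"))
        channel.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
        if len(seen) >= max_deliveries:
            break
    channel.cancel()  # cancel the consumer created by consume()
    return seen
```

The inactivity timeout is there so the loop ends on its own once the message is no longer redelivered, rather than blocking forever.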

Cheers
Karl


--
Karl Nilsson

Pivotal/RabbitMQ

Karl Nilsson

Apr 28, 2020, 10:37:20 AM
to rabbitm...@googlegroups.com
Ok here is the fix PR, to be included in 3.8.4


Thanks for the report
Karl
Adam Gardner

Apr 28, 2020, 2:29:29 PM
to rabbitmq-users
Sweet, thanks for the quick turnaround on that. Sorry for omitting the logs, not sure what I was thinking.

Thanks,
- Adam
