Minority quorum queue declaration


Marek Beseda

Jun 13, 2019, 6:24:27 AM
to rabbitmq-users
From my observations, when declaring a quorum queue it spans the whole cluster, including nodes that are currently down. This leads to a problem: when those nodes come back up, they don't always start participating.

Using Erlang 22.0, RabbitMQ 3.8.0-beta.4.
4-node cluster
Autoheal partition handling

Reproduction steps:
1. Set up a 4-node cluster (the exact number of nodes should not really matter).
2. Force a partition; I used `blockade partition rabbitmq1`.
3. Create a quorum queue on the isolated node.
4. Restore the partition with `blockade join`.

What I got:
The queue exists after autoheal, but usually only one other node starts participating, so quorum is not restored. The remaining nodes start participating only after `stop_app`/`start_app`.

What I expected:
The queue should either be lost due to autoheal, or all of the other nodes should participate.


By "not participating" I mean that those nodes don't report a `local_state` in the `quorum_status` check. This can cause problems: clients are unaware of the partition, and when they declare a queue, manual action has to be taken before the queue is operational.
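
For reference, a rough sketch of the check and the workaround, assuming a queue named `test` on the default vhost (the names here are examples only):

    # list the Raft members of the queue and their reported local state
    rabbitmq-queues quorum_status test

    # workaround: restart the RabbitMQ application on a node that is not participating
    rabbitmqctl stop_app
    rabbitmqctl start_app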

Karl Nilsson

unread,
Jun 13, 2019, 7:06:51 AM6/13/19
to rabbitmq-users
It should not be possible to declare a queue inside a minority partition; the declare operation should fail. I would need to test that this isn't something that has regressed. Whether or not all queue members are recovered after a successful declaration that didn't manage to start all nodes in the quorum is another matter, and one that is very much dependent on the partition recovery mode.

That said, quorum queues are not designed to be frequently declared/deleted as part of a dynamic topology. Look at them more as part of the static topology required for a given application: something that is set up during application deployment and then rarely changed.
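
For example, the queue could be declared once by a deploy script; a sketch using rabbitmqadmin, with a placeholder queue name and credentials:

    # declare a durable quorum queue once, as part of application deployment
    rabbitmqadmin -u admin -p admin declare queue name=orders durable=true \
      arguments='{"x-queue-type": "quorum"}'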

I will test to see whether it behaves as designed.

Cheers
Karl



--
Karl Nilsson

Marek Beseda

Jun 13, 2019, 8:05:16 AM
to rabbitmq-users
This might be related too. It happened to me once that `add_member` succeeded in a minority, but I was not able to reproduce that. However, what I usually get is one of these errors:

Error
nodedown

or:

Error:
{:shutdown, {:failed_to_start_child, :"%2F_test", {:already_started, #PID<10851.17080.0>}}}

The problem is that even after the majority is restored, the second error still shows up. So trying to add a member while in a minority prevents that member from being added until a restart.
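
The command I run is along these lines (the queue name matches the `%2F_test` in the error above; the target node name is just an example):

    # run from inside the minority partition, adding a replica on another node
    rabbitmq-queues add_member test rabbit@rabbitmq2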


Karl Nilsson

Jun 13, 2019, 8:58:04 AM
to rabbitm...@googlegroups.com
I just did a test attempting to declare a queue in a minority and it failed as expected. I'm not sure how you managed to declare a queue from a minority view - can you supply the logs from all nodes for that time period?





--
Karl Nilsson

Pivotal/RabbitMQ

Marek Beseda

Jun 13, 2019, 9:25:28 AM
to rabbitmq-users
Logs attached; node 1 was the one isolated in the partition. I did not restore the network in these logs, since the queue creation should have failed anyway. I don't think the other nodes' logs are necessary, but I am adding them for completeness. The complete config is below.

[
    { rabbit, [
        { loopback_users, [ ] },
        { tcp_listeners, [ 5672 ] },
        { ssl_listeners, [ ] },
        { default_pass, <<"admin">> },
        { default_user, <<"admin">> },
        { hipe_compile, false },
        { cluster_partition_handling, autoheal },
        { vm_memory_high_watermark, 0.25} ,
        { disk_free_limit, 10000000000 },
        { kernel, [
            {net_ticktime, 20}
        ]},
        { log_levels, [
            {connection, debug}, {channel, debug}, {federation, debug}, {mirroring, debug}
        ]},
        { lager, [{error_logger_hwm, 10000 }]}
    ] },
    { rabbitmq_management, [ { listener, [
        { port, 15672 },
        { ssl, false }
    ] } ] }
].
logs_1.log
logs_2.log
logs_3.log
logs_4.log

Marek Beseda

Jun 13, 2019, 9:27:06 AM
to rabbitmq-users
I just realized it might be relevant that I am declaring the queue through the management UI.
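
For what it's worth, the UI should be doing the equivalent of a PUT against the management HTTP API, something along these lines (host, credentials and queue name are placeholders):

    curl -u admin:admin -X PUT http://rabbitmq1:15672/api/queues/%2F/test \
      -H 'content-type: application/json' \
      -d '{"durable": true, "arguments": {"x-queue-type": "quorum"}}'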

Karl Nilsson

Jun 18, 2019, 5:12:00 AM
to rabbitm...@googlegroups.com
I've done some testing around this, and there is some additional complexity around declaring queues through the management interface, especially during a period when a node is still connected but isn't able to receive network traffic. Depending on the partition handling strategy, the database record for the queue may remain after the cluster heals but with no actual queue replicas started, or it may be removed if the minority side ended up being the "loser". In the former case you either need to cycle the apps on each node to make sure the quorum queue replicas are started, or delete and re-declare the queue.

With any Raft system you always need to take care around membership configuration changes to ensure you don't lose quorum, and this applies here as well. The only reason I can think of for an operator needing to change the initially declared quorum queue members is when a RabbitMQ node is to be decommissioned. For this we provide the `rabbitmq-queues grow` and `shrink` commands, which either remove the quorum queue members hosted on a given RabbitMQ node or add members on a new RabbitMQ node for matching quorum queues.
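
Roughly, with placeholder node names:

    # remove the quorum queue members hosted on a node that is being decommissioned
    rabbitmq-queues shrink rabbit@old-node

    # add a member on the new node for every matching quorum queue
    rabbitmq-queues grow rabbit@new-node all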

Until we make improvements around the internal database used to store RabbitMQ entities (Mnesia), I'm not sure we can provide better guarantees around queue declaration consistency during partitions.



--
Karl Nilsson

Pivotal/RabbitMQ

Marek Beseda

Jun 18, 2019, 9:53:13 AM
to rabbitmq-users
So to be clear, the current implementation is not meant to be used regularly, but only during decommissioning and other rare occasions?



Michael Klishin

Jun 30, 2019, 6:30:59 PM
to rabbitmq-users
Current implementation of what? Karl is referring to the fact that the internal schema data store is very opinionated in how it suggests applications recover from partitions. We are migrating it to Raft to have a much more reasonable, well-understood recovery behavior. This is a WIP and will ship in RabbitMQ 4.0.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ