Question on queue_leader_locator

AK

Oct 31, 2023, 4:07:11 AM

to rabbitmq-users

Hi,

We have a 5-node RabbitMQ (3.11.2) cluster with queue_leader_locator set to balanced. According to the documentation, queue leaders should be balanced across nodes as queues are declared. However, we have observed that when queues are declared in bulk, a single node ends up as the leader for the majority of them, so the distribution is skewed.

The documentation at https://www.rabbitmq.com/quorum-queues.html#leader-placement says:

balanced: If there are overall less than 1000 queues (classic queues, quorum queues, and streams), pick the node hosting the minimum number of quorum queue leaders. If there are overall more than 1000 queues, pick a random node.

What exactly does this mean? If the first 1000 queues are declared at the same time, will they all go to the same node?

Regards,
Arati

Michal Kuratczyk

Oct 31, 2023, 4:31:56 AM

to rabbitm...@googlegroups.com

If a queue is declared and there are currently fewer than 1000 queues in the cluster, we perform an exact check/comparison and place the leader on the node with the fewest leaders. If there are more than 1000 queues, a random node is picked, because that's a much faster operation than counting leaders on all nodes, and at this scale we assume it doesn't really matter (random still provides a relatively even distribution).

The history of "balanced" is that the "least-leaders" strategy we used to have didn't scale well: for example, importing 10000 queues into a cluster would get very slow over time, since for every new queue we had to count the exact number of queue leaders on each node. The "random" strategy, on the other hand, doesn't work well with a low number of queues (with just a handful of queues, the distribution can be very skewed).

Hence, we implemented a mixed "balanced" strategy, which provides a perfect distribution for a relatively small number of queues and a fairly even distribution for a large number of queues, while providing much better performance: O(1) instead of O(N) for N > 1000.
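As an illustration only, the strategy described above can be sketched as a small simulation. This is not RabbitMQ's actual implementation: the function names, the tie-breaking rule, the handling of exactly 1000 queues, and the assumption that every queue is a quorum queue declared one at a time are all hypothetical choices made for the sketch.

```python
import random
from collections import Counter

THRESHOLD = 1000  # queue-count cutoff quoted in this thread


def pick_leader(nodes, leaders, total_queues, rng=random):
    """Pick a leader node for one new queue (hypothetical helper).

    leaders maps node name -> current number of queue leaders.
    """
    if total_queues < THRESHOLD:
        # Exact comparison: the node with the fewest leaders.
        # Assumption: ties go to the first node in the list.
        return min(nodes, key=lambda n: leaders[n])
    # Large cluster: a constant-time random pick instead of counting leaders.
    return rng.choice(nodes)


def declare(nodes, count, leaders=None, rng=random):
    """Declare `count` queues strictly one after another."""
    leaders = Counter() if leaders is None else leaders
    for _ in range(count):
        total = sum(leaders.values())  # assumption: every queue has one leader
        leaders[pick_leader(nodes, leaders, total, rng)] += 1
    return leaders
```

For example, `declare(["n1", "n2", "n3", "n4", "n5"], 545)` stays below the threshold for every declaration and ends with the same number of leaders on each node. The sketch models sequential declarations only and says nothing about what happens when many queues are declared concurrently.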

If this is not the behaviour you see, please provide an executable test case.

If you think the docs are not clear enough - please PR a suggested edit.

--

Michał

RabbitMQ team

AK

Oct 31, 2023, 7:46:10 AM

to rabbitmq-users

We have a 5-node cluster running version 3.11.2, with queue-leader-locator set to balanced in the config file on all nodes.
We had 276 queues balanced across the 5 nodes (56 on each of 3 nodes, 54 on each of 2 nodes).

When the app that uses this cluster starts, it declares 269 new queues, after which the queue distribution looks as follows:
1st node: 81 queues
2nd node: 59 queues
3rd node: 59 queues
4th node: 67 queues
5th node: 279 queues

Triggering a rebalance manually balanced the queues equally. However, even though balanced mode was set and the queues were balanced, when the app (amqp-client: 5.7.3) declared new queues, they were skewed again.
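The figures reported above can be checked with a few lines of arithmetic. This is only a sketch over the numbers in the post: the node labels are placeholders, and because the post does not say which nodes started with 54 leaders and which with 56, the gain per node is given as a range.

```python
before_total = 3 * 56 + 2 * 54           # 276 queues before the app started
after = {"node1": 81, "node2": 59, "node3": 59, "node4": 67, "node5": 279}

after_total = sum(after.values())        # 545 queues afterwards
new_queues = after_total - before_total  # 269, matching the reported number
ideal = after_total / len(after)         # 109.0 leaders per node if perfectly even

# Each node started with either 54 or 56 leaders (not stated per node),
# so the number of new leaders per node is a (low, high) range.
gained = {node: (count - 56, count - 54) for node, count in after.items()}

for node, (low, high) in gained.items():
    print(f"{node}: gained between {low} and {high} of {new_queues} new leaders")
```

Under these numbers the 5th node received between 223 and 225 of the 269 new leaders, while a perfectly even result would have left 109 leaders on every node.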

Michal Kuratczyk

Oct 31, 2023, 8:28:56 AM

to rabbitm...@googlegroups.com

Again, if this is not the behaviour you see, please provide an executable test case.

--

Michał

RabbitMQ team

AK

Oct 31, 2023, 10:16:47 AM

to rabbitmq-users

Hi Michal,

Can you give me an example of an executable test case?

Regards,
Arati

Michal Kuratczyk

Oct 31, 2023, 10:37:08 AM

to rabbitm...@googlegroups.com

Something I can just execute against a freshly deployed cluster, ideally by simply copy-pasting your commands. That can be a bunch of `rabbitmqctl` / `rabbitmqadmin` or similar commands, https://perftest.rabbitmq.com/, or a custom app you develop (sometimes this is necessary, but it should be avoided if possible, as it adds steps for me to understand how to build and run it, and introduces a risk of client-side problems).

Here's a surprisingly related example:

https://github.com/rabbitmq/rabbitmq-server/issues/5703

In this case it was an internal report, so I use bazel to start a development node and I assume people know where to find our sample definition files (I refer to a file from https://github.com/rabbitmq/sample-configs/), but overall, someone can copy-paste this, see the problem, and then re-run the steps to test the fix. As an additional benefit, creating such steps often leads to enlightenment where you realize what you did wrong (I'm not suggesting you did anything wrong here, just in general).
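For illustration, a reproduction along these lines could look roughly like the script below. It is only a sketch and is not taken from the thread: it assumes the management plugin is reachable at localhost:15672 with the default guest credentials, that the queues are quorum queues in the default virtual host, and that the queue list returned by the HTTP API exposes the leader in a `leader` field (falling back to `node`). The helper names and the `repro-` queue prefix are hypothetical. Plain CLI commands, as recommended above, would be preferable where they suffice.

```python
import base64
import json
import urllib.request
from collections import Counter

API = "http://localhost:15672/api"                # assumption: default management port
AUTH = base64.b64encode(b"guest:guest").decode()  # assumption: default credentials


def call(method, path, body=None):
    """Send one request to the management HTTP API and decode any JSON reply."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(API + path, data=data, method=method)
    req.add_header("Authorization", "Basic " + AUTH)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def declare_quorum_queues(count, prefix="repro-"):
    """Declare `count` quorum queues in the default vhost ("/" encoded as %2F)."""
    for i in range(count):
        call("PUT", f"/queues/%2F/{prefix}{i}",
             {"durable": True, "arguments": {"x-queue-type": "quorum"}})


def leader_counts(queues):
    """Tally queue leaders per node from a list of queue objects."""
    return Counter(q.get("leader") or q.get("node") for q in queues)


def main():
    declare_quorum_queues(269)  # the number of queues the app declared
    print(leader_counts(call("GET", "/queues")))
```

Calling `main()` against a freshly deployed cluster prints the number of leaders per node. Note that this declares queues one after another over a single connection per request, which may not match how the application declares them.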

--

Michał

AK

Oct 31, 2023, 10:40:17 AM

to rabbitmq-users

Understood, Michal. Thanks.
I will try to reproduce this issue on my end and get back with the exact steps.
