Better balancing of queues


William Davis

Feb 28, 2019, 6:29:57 AM
to rabbitmq-users
Rabbit 3.7.10
AWS C5.9xlarge instances x 4 (36 cores 72 GB Ram)
ELB Loadbalancer in front
Currently 324 Queues

Our application connects to the cluster through the ELB. While monitoring some stats I noticed that instance 2 hosts over 80% of the queues; I don't really know how this came to be.
When monitoring performance, I see that cores 0-17 are constantly pegged while cores 18-35 mostly sit idle.
We plan to start implementing the Consistent Hash Exchange to further shard our workload, but I wanted to ask here whether anyone has a 'simple automagic' way of getting this workload better distributed.
If we shut down all applications and delete the queues, everything will be recreated on startup.
But I really don't want to have to manually configure the application to connect to any specific node; that is bad for failover.

So what are your thoughts on ways to better utilize all these wonderfully idle cores?
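
One way to see how queue masters are spread across nodes is to tally the "node" field that the management plugin's HTTP API reports for each queue (GET /api/queues). A minimal sketch, assuming the management plugin is enabled; the base URL, credentials, and sample data below are placeholders, not values from this thread.

```python
import base64
import json
import urllib.request
from collections import Counter


def fetch_queues(base_url, user, password):
    """Fetch the queue list from the management HTTP API (GET /api/queues)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url}/api/queues",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def masters_per_node(queues):
    """Count queues by the node that hosts them (the "node" field)."""
    return Counter(q["node"] for q in queues)


# Hypothetical sample shaped like the API response. Against a live cluster
# this would be something like:
#   queues = fetch_queues("http://localhost:15672", "guest", "guest")
queues = [
    {"name": "q1", "vhost": "/", "node": "rabbit@node2"},
    {"name": "q2", "vhost": "/", "node": "rabbit@node2"},
    {"name": "q3", "vhost": "/", "node": "rabbit@node2"},
    {"name": "q4", "vhost": "/", "node": "rabbit@node2"},
    {"name": "q5", "vhost": "/", "node": "rabbit@node1"},
]

counts = masters_per_node(queues)
for node, n in counts.most_common():
    print(f"{node}: {n} queue(s), {100 * n / len(queues):.0f}% of total")
```

A heavily skewed tally like the one in this sample is the kind of imbalance described above.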

Luke Bakken

Feb 28, 2019, 10:19:14 AM
to rabbitmq-users
Hi William,

We have a script available that will re-balance your queue masters - https://github.com/rabbitmq/support-tools/blob/master/scripts/rebalance-queue-masters

If you're feeling adventurous, there is a community-developed library to do this as well - https://github.com/Ayanda-D/rabbitmq-queue-master-balancer

Or, you can accomplish the re-balancing yourself by using policies in the same manner that the re-balancing script does.

> But I really don't want to have to manually configure the application to connect to any specific node, that is bad for failover

No matter what node is a queue master, or what nodes have the queue mirrors, your apps can connect to any node in the cluster and interact with a queue.

What version of Erlang are you using?

Thanks,
Luke

William Davis

Feb 28, 2019, 3:37:20 PM
to rabbitmq-users
Oh this is awesome!

Here is my erl info:
ubuntu@rabbitprod1:~$ erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().'  -noshell
"20"

Michael Klishin

Feb 28, 2019, 6:57:14 PM
to rabbitm...@googlegroups.com
William,

Can you please use

rabbitmq-diagnostics erlang_version

or 

rabbitmqctl eval 'rabbit_misc:otp_release().'

if erlang_version is not available in 3.7.10 (I don't remember that, sorry).

Having only the major Erlang version is often not enough.

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

William Davis

Feb 28, 2019, 8:28:08 PM
to rabbitmq-users
Here you go:

ubuntu@rabbitprod1:~$ sudo rabbitmq-diagnostics erlang_version
Asking node rabbit@rabbitprod1 for its Erlang/OTP version...
Erlang/OTP 20.3

I was looking at that script, but I don't see much in the way of documentation.
What is the best way to use it?

Luke Bakken

Mar 1, 2019, 8:30:57 AM
to rabbitmq-users
Hi William,

The script does have a help function.

To rebalance all queues in the default vhost, just run the script. The defaults will suffice. If you wish to use other vhosts, or select a subset of queues to re-balance, there are options for that (-p and -r, respectively).

Since re-balancing requires queue synchronization, you should run this at times of low to no load on your system.
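
For a sense of what an even spread looks like in numbers, here is a small planning sketch: given each queue's current master, it greedily moves queues from the busiest node to the idlest one until the per-node counts differ by at most one. This is not the algorithm the script uses (the thread does not describe it), and the node and queue names are made up. It only produces a plan; carrying out each move still means changing policies and synchronizing the queue.

```python
def plan_moves(queue_nodes, all_nodes):
    """Return (queue, source, target) moves that even out master counts.

    queue_nodes maps queue name -> current master node.
    Queues on nodes not listed in all_nodes are left alone.
    """
    by_node = {node: [] for node in all_nodes}
    for queue, node in sorted(queue_nodes.items()):
        if node in by_node:
            by_node[node].append(queue)

    moves = []
    while True:
        busiest = max(all_nodes, key=lambda n: len(by_node[n]))
        idlest = min(all_nodes, key=lambda n: len(by_node[n]))
        # Stop once no node holds two or more queues above another.
        if len(by_node[busiest]) - len(by_node[idlest]) <= 1:
            break
        queue = by_node[busiest].pop()
        by_node[idlest].append(queue)
        moves.append((queue, busiest, idlest))
    return moves


# Hypothetical cluster: 10 queues, 8 of them mastered on node2.
nodes = ["rabbit@node1", "rabbit@node2", "rabbit@node3", "rabbit@node4"]
current = {f"q{i}": "rabbit@node2" for i in range(2, 10)}
current["q1"] = "rabbit@node1"
current["q10"] = "rabbit@node3"

moves = plan_moves(current, nodes)
for queue, src, dst in moves:
    print(f"move {queue}: {src} -> {dst}")
```

Each planned move costs one queue synchronization, so a shorter plan means less time spent syncing.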

Thanks,
Luke

William Davis

Mar 1, 2019, 9:11:20 AM
to rabbitmq-users
Thanks Luke, some feedback on it.
The default regex of .* errored out for me on every vhost I tried. I can see it created the temp policy, but then it threw an exception.

root@rabbit1:/# cd ~
root@rabbit1:~# ls
config  mnesia  rebalance-queue-masters  schema
root@rabbit1:~# ./rebalance-queue-masters
20190301-14:04:13 [INFO] Setting temporary policies on vhost: /
20190301-14:04:17 [INFO] Updating queue master for queue 'name' from 'pid' to 'rabbit@rabbit3'
20190301-14:04:17 [INFO] Setting policy "name-ha-temp" for pattern "^name$" to "{"ha-mode":"exactly","ha-params":1}" with priority "990" for vhost "/" ...
20190301-14:04:18 [INFO] Synchronising queue 'name' in vhost '/' ...
Error:
not_found
root@rabbit1:~#

I was able to get it working by specifying my own regex:


root@rabbit1:~# ./rebalance-queue-masters -r my.*
20190301-14:05:06 [INFO] Setting temporary policies on vhost: /
20190301-14:05:10 [INFO] Updating queue master for queue 'my-new-queue12345678' from 'rabbit@rabbit1' to 'rabbit@rabbit3'
20190301-14:05:10 [INFO] Setting policy "my-new-queue12345678-ha-temp" for pattern "^my-new-queue12345678$" to "{"ha-mode":"exactly","ha-params":1}" with priority "990" for vhost "/" ...
20190301-14:05:11 [INFO] Synchronising queue 'my-new-queue12345678' in vhost '/' ...
20190301-14:05:12 [INFO] Setting policy "my-new-queue12345678-ha-temp" for pattern "^my-new-queue12345678$" to "{"ha-mode":"nodes","ha-params":["rabbit@rabbit3"]}" with priority "992" for vhost "/" ...
20190301-14:05:12 [INFO] Synchronising queue 'my-new-queue12345678' in vhost '/' ...
20190301-14:05:14 [INFO] Queue master successfully updated: 'rabbit@rabbit3'
20190301-14:05:14 [INFO] Clearing policy "my-new-queue12345678-ha-temp" on vhost "/" ...
...
...
...
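
The log above shows the policy sequence applied to one queue. Below is a rough sketch of that same sequence as plain rabbitmqctl commands, for anyone who wants to do it by hand with policies. The policy name, pattern, definitions, and priorities are copied from the log; that the script issues exactly these rabbitmqctl calls is an assumption, and rebalance_commands is a hypothetical helper. It only builds command lines and runs nothing.

```python
import json
import shlex


def rebalance_commands(vhost, queue, target_node):
    """Build the rabbitmqctl command lines for moving one queue master."""
    policy = f"{queue}-ha-temp"
    # Mirrors the pattern in the log. Queue names containing regex
    # metacharacters (for example ".") would need escaping first.
    pattern = f"^{queue}$"

    def set_policy(priority, definition):
        return ["rabbitmqctl", "set_policy", "-p", vhost,
                "--priority", str(priority),
                policy, pattern, json.dumps(definition)]

    sync = ["rabbitmqctl", "sync_queue", "-p", vhost, queue]
    return [
        # Step 1: temporary policy with ha-mode "exactly", as in the log.
        set_policy(990, {"ha-mode": "exactly", "ha-params": 1}),
        sync,
        # Step 2: pin the queue to the target node at a higher priority.
        set_policy(992, {"ha-mode": "nodes", "ha-params": [target_node]}),
        sync,
        # Step 3: remove the temporary policy again.
        ["rabbitmqctl", "clear_policy", "-p", vhost, policy],
    ]


commands = rebalance_commands("/", "my-new-queue12345678", "rabbit@rabbit3")
for argv in commands:
    print(shlex.join(argv))
```

If a run is interrupted before the last step, the "-ha-temp" policy stays in place and has to be cleared by hand.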


Once I had it working in my test cluster, I ran it in production. There were a few hiccups, however. From time to time it would fail, seemingly at random (and I was dumb for not grabbing the output). When it failed, the temp policy was not deleted, so I had to delete it manually. No big deal.

One queue that failed to migrate is still causing us trouble, though. When our application tries to connect, any operation against this queue times out. I've tried to delete it manually, but that locks up the web interface. I also tried from the command line:
ubuntu@rabbitprod1:~$ sudo rabbitmqctl delete_queue myqueue.ScanHistoryReprocessing -p prod
Deleting queue 'myqueue.ScanHistoryReprocessing' on vhost 'prod' ...


This command hangs indefinitely (never returns). There is no log output related to this command from what I can see.
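
Since a failed run can leave its temporary policy behind, it may help to list leftovers before cleaning up. A minimal sketch, assuming the policy list has been fetched as JSON from the management HTTP API (GET /api/policies) and that temporary policies can be recognized by the "-ha-temp" suffix seen in the log output; the sample entries are made up.

```python
def leftover_temp_policies(policies, suffix="-ha-temp"):
    """Return (vhost, name) for every policy whose name ends with suffix."""
    return [(p["vhost"], p["name"]) for p in policies
            if p["name"].endswith(suffix)]


# Hypothetical sample shaped like the GET /api/policies response.
policies = [
    {"vhost": "prod", "name": "ha-all", "pattern": ".*", "priority": 0},
    {"vhost": "prod", "name": "orders-ha-temp", "pattern": "^orders$", "priority": 992},
    {"vhost": "/", "name": "audit-ha-temp", "pattern": "^audit$", "priority": 990},
]

leftovers = leftover_temp_policies(policies)
for vhost, name in leftovers:
    # Print the cleanup command instead of running it.
    print(f"rabbitmqctl clear_policy -p {vhost} {name}")
```

Printing the commands rather than executing them leaves room to review each one before removing a policy.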

Luke Bakken

Mar 1, 2019, 10:51:40 AM
to rabbitmq-users
Hi William,

I don't know why the default regex didn't work. I'll re-test at some point. With regard to the queue -

rabbitmqctl eval '{ok, Q} = rabbit_amqqueue:lookup(rabbit_misc:r(<<"prod">>, queue, <<"myqueue.ScanHistoryReprocessing">>)), rabbit_amqqueue:delete_crashed(Q).'

Give that command a try to delete your queue.
Thanks,
Luke

William Davis

Mar 1, 2019, 11:12:25 AM
to rabbitmq-users
Thanks Luke, that command did the trick. Appreciate the help here.

Rod

Aug 9, 2019, 2:02:37 AM
to rabbitmq-users
Hi Luke!

This plugin worked flawlessly for me on RabbitMQ 3.7.x. On 3.6.10, however, it finishes with no errors and reports all queues rebalanced, but the temporary HA policies are not removed, and I ended up with double the number of messages per queue (in the queues where the temporary HA policy was not removed).

I have seen similar issues with other rebalancing tools on RabbitMQ 3.6, though.

I don't know whether this is happening only to me, but I have the same config on RabbitMQ 3.6 and 3.7.

Lutz Horn

Aug 9, 2019, 2:57:26 AM
to rabbitm...@googlegroups.com
Hi Rod,

Note that RabbitMQ 3.6 reached its EOL on 31 May 2018 [1]. Plugin compatibility in particular is not guaranteed for this version. You are advised to upgrade to RabbitMQ 3.7 [2].

Lutz

[1] https://www.rabbitmq.com/versions.html
[2] https://www.rabbitmq.com/upgrade.html
