Error: The following plugins could not be found: rabbitmq_queue_master_balancer

Rodrigo

Jun 11, 2019, 7:30:56 PM
to rabbitmq-users
Hi,

I'm using RabbitMQ 3.6.10 and trying to install this plugin: https://github.com/Ayanda-D/rabbitmq-queue-master-balancer

I have copied the *.ez file to two different directories with no luck (/usr/lib/rabbitmq/plugins and /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.10/plugins).

The plugin is not even listed:

[e*] amqp_client                       3.6.10
[E*] autocluster                       0.9.0+4.g0e7899d
[e*] cowboy                            1.0.4
[e*] cowlib                            1.0.2
[  ] rabbitmq_amqp1_0                  3.6.10
[E*] rabbitmq_auth_backend_ldap        3.6.10
[  ] rabbitmq_auth_mechanism_ssl       3.6.10
[e*] rabbitmq_aws                      3.6.13.milestone1+2.g946e794
[  ] rabbitmq_consistent_hash_exchange 3.6.10
[  ] rabbitmq_event_exchange           3.6.10
[  ] rabbitmq_federation               3.6.10
[  ] rabbitmq_federation_management    3.6.10
[  ] rabbitmq_jms_topic_exchange       3.6.10
[E*] rabbitmq_management               3.6.10
[e*] rabbitmq_management_agent         3.6.10
[  ] rabbitmq_management_visualiser    3.6.10
[E*] rabbitmq_mqtt                     3.6.10
[  ] rabbitmq_recent_history_exchange  3.6.10
[  ] rabbitmq_sharding                 3.6.10
[E*] rabbitmq_shovel                   3.6.10
[E*] rabbitmq_shovel_management        3.6.10
[  ] rabbitmq_stomp                    3.6.10
[  ] rabbitmq_top                      3.6.10
[  ] rabbitmq_tracing                  3.6.10
[  ] rabbitmq_trust_store              3.6.10
[e*] rabbitmq_web_dispatch             3.6.10
[  ] rabbitmq_web_mqtt                 3.6.10
[  ] rabbitmq_web_mqtt_examples        3.6.10
[  ] rabbitmq_web_stomp                3.6.10
[  ] rabbitmq_web_stomp_examples       3.6.10
[  ] sockjs                            0.3.4
sh-4.2$


and, as expected, it cannot be enabled:

something_here$ sudo rabbitmq-plugins enable rabbitmq_queue_master_balancer
Error: The following plugins could not be found:
  rabbitmq_queue_master_balancer

Rod

Jun 12, 2019, 12:31:46 AM
to rabbitmq-users
Just in case this is relevant: I'm downloading the *.ez file directly from here and copying it: https://github.com/Ayanda-D/rabbitmq-queue-master-balancer/releases/download/v0.0.4/rabbitmq_queue_master_balancer-0.0.4.ez

Luke Bakken

Jun 12, 2019, 10:58:44 AM
to rabbitmq-users
Hi Rod,

I suspect that .ez file is not compatible with RabbitMQ 3.6.10, which is old and out-of-support. I suggest trying out the latest version of RabbitMQ (3.7.15) with it.

Thanks,
Luke

Rod

Jun 12, 2019, 2:28:40 PM
to rabbitmq-users
Hi, thanks for replying.

The documentation says: "This plugin is compatible with RabbitMQ 3.6.x and beyond, to the latest release."

Luke Bakken

Jun 12, 2019, 2:34:14 PM
to rabbitmq-users
Hi Rod,

At this point I can recommend reviewing this page - https://www.rabbitmq.com/installing-plugins.html

When you copy the plugin to an appropriate directory be sure that the permissions on the .ez file allow the rabbitmq user to read it.

I'm assuming you're using Erlang 19 or greater.

Thanks,
Luke

Rod

Jun 12, 2019, 2:39:50 PM
to rabbitmq-users
Hi!

Do I have to do something with Erlang? I ask because the *.ez file should be ready to use (I just downloaded it and put it in the plugins directory).

Also I checked file permission. All of them have the same kind of permission:

-rw-r--r--. 1 root root 271316 May 25  2017 amqp_client-3.6.10.ez
-rw-r--r--. 1 root root 225805 May 25  2017 cowboy-1.0.4.ez
-rw-r--r--. 1 root root 125600 May 25  2017 cowlib-1.0.2.ez
-rw-r--r--. 1 root root 841579 May 25  2017 rabbit_common-3.6.10.ez
-rw-r--r--. 1 root root 211403 May 25  2017 rabbitmq_amqp1_0-3.6.10.ez
-rw-r--r--. 1 root root  34398 May 25  2017 rabbitmq_auth_backend_ldap-3.6.10.ez
-rw-r--r--. 1 root root  13098 May 25  2017 rabbitmq_auth_mechanism_ssl-3.6.10.ez
-rw-r--r--. 1 root root  14670 May 25  2017 rabbitmq_consistent_hash_exchange-3.6.10.ez
-rw-r--r--. 1 root root  11460 May 25  2017 rabbitmq_event_exchange-3.6.10.ez
-rw-r--r--. 1 root root 162854 May 25  2017 rabbitmq_federation-3.6.10.ez
-rw-r--r--. 1 root root  13812 May 25  2017 rabbitmq_federation_management-3.6.10.ez
-rw-r--r--. 1 root root  22438 May 25  2017 rabbitmq_jms_topic_exchange-3.6.10.ez
-rw-r--r--. 1 root root 745255 May 25  2017 rabbitmq_management-3.6.10.ez
-rw-r--r--. 1 root root 149415 May 25  2017 rabbitmq_management_agent-3.6.10.ez
-rw-r--r--. 1 root root  41445 May 25  2017 rabbitmq_management_visualiser-3.6.10.ez
-rw-r--r--. 1 root root 105971 May 25  2017 rabbitmq_mqtt-3.6.10.ez
-rw-r--r--. 1 root root    630 Jun 12 04:13 rabbitmq_queue_master_balancer-0.0.4.ez
-rw-r--r--. 1 root root  14659 May 25  2017 rabbitmq_recent_history_exchange-3.6.10.ez
-rw-r--r--. 1 root root  34102 May 25  2017 rabbitmq_sharding-3.6.10.ez
-rw-r--r--. 1 root root  81065 May 25  2017 rabbitmq_shovel-3.6.10.ez
-rw-r--r--. 1 root root  18963 May 25  2017 rabbitmq_shovel_management-3.6.10.ez
-rw-r--r--. 1 root root 109801 May 25  2017 rabbitmq_stomp-3.6.10.ez
-rw-r--r--. 1 root root  51777 May 25  2017 rabbitmq_top-3.6.10.ez
-rw-r--r--. 1 root root  49841 May 25  2017 rabbitmq_tracing-3.6.10.ez
-rw-r--r--. 1 root root  50943 May 25  2017 rabbitmq_trust_store-3.6.10.ez
-rw-r--r--. 1 root root  40288 May 25  2017 rabbitmq_web_dispatch-3.6.10.ez
-rw-r--r--. 1 root root  24697 May 25  2017 rabbitmq_web_mqtt-3.6.10.ez
-rw-r--r--. 1 root root  66243 May 25  2017 rabbitmq_web_mqtt_examples-3.6.10.ez
-rw-r--r--. 1 root root  37693 May 25  2017 rabbitmq_web_stomp-3.6.10.ez
-rw-r--r--. 1 root root  52184 May 25  2017 rabbitmq_web_stomp_examples-3.6.10.ez
-rw-r--r--. 1 root root  57872 May 25  2017 ranch-1.3.0.ez
-rw-r--r--. 1 root root     59 May 25  2017 README
-rw-r--r--. 1 root root 100901 May 25  2017 sockjs-0.3.4.ez

Luke Bakken

Jun 12, 2019, 2:52:08 PM
to rabbitmq-users
Hi Rod,

I'm at a bit of a loss as to what's going on in your environment. I downloaded the generic-unix package for 3.6.10, started up RabbitMQ using Erlang 19.3, and produced the attached transcript.

The transcript shows the list of plugins before and after copying the .ez file to the plugins/ directory.

Please run the cksum command on your downloaded .ez file to ensure it matches what is in the transcript.

Thanks,
Luke
queue-balancer-plugin-transcript.txt

Rod

Jun 12, 2019, 5:20:19 PM
to rabbitmq-users
Wow, the checksum was the problem! I used the wget command to download the file and it worked.
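
For anyone hitting the same symptom, a quick sanity check before copying a download into plugins/ might look like the sketch below. It assumes that a valid .ez plugin is a ZIP archive and therefore starts with the bytes "PK"; the function names is_zip_archive and check_ez are made up for this example and are not part of RabbitMQ.

```sh
#!/bin/sh
# Sketch only: sanity-check a downloaded plugin archive.
# Assumption: a valid .ez file is a ZIP archive, so it begins with "PK".

# Succeeds (exit 0) if the file exists and begins with the ZIP magic bytes.
is_zip_archive() {
    [ -f "$1" ] || return 1
    [ "$(head -c 2 "$1")" = "PK" ]
}

# Prints a one-line verdict for the given file.
check_ez() {
    if is_zip_archive "$1"; then
        echo "ok: $1 looks like a ZIP archive"
    else
        echo "suspect: $1 is missing or is not a ZIP archive"
    fi
}
```

Running `check_ez rabbitmq_queue_master_balancer-0.0.4.ez` alongside the cksum comparison Luke suggested should flag a saved web page or a truncated download; it does not prove the archive is otherwise intact.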

Now the plugin is listed, but when I try to enable it I'm getting the following issue:


Error: {plugin_built_with_incompatible_erlang,
           "rabbitmq_queue_master_balancer"}



I will have to investigate, thanks Luke!

Luke Bakken

Jun 12, 2019, 5:21:34 PM
to rabbitmq-users
Hi Rod,

What version of Erlang are you using? That plugin probably requires at least version 19.3.

Thanks,
Luke

Rod

Jun 12, 2019, 5:30:58 PM
to rabbitmq-users
Hi Luke, 
I just checked:

sh-4.2$ erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"19"

Luke Bakken

Jun 12, 2019, 5:38:59 PM
to rabbitmq-users
Rod -

Your RabbitMQ log will also show the minor and patch versions for the Erlang VM. Give that a check. I'm wondering if you're on 19.0 - 19.2

Luke

Rod

Jun 12, 2019, 5:45:15 PM
to rabbitmq-users
Hi!

I checked the log:

=INFO REPORT==== 11-Jun-2019::18:25:06 ===
Starting RabbitMQ 3.6.10 on Erlang 19.3.6.13
Copyright (C) 2007-2017 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/
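
The version check discussed here (is the node on Erlang 19.3 or later?) can be scripted against the startup banner. The sketch below is only an illustration: erlang_version_from_log and erlang_at_least are hypothetical helper names, and the banner format is assumed to match the "Starting RabbitMQ ... on Erlang ..." line quoted above.

```sh
#!/bin/sh
# Sketch only: pull the Erlang version out of RabbitMQ startup banner lines.

# Reads log text on stdin and prints the Erlang version from each banner line.
erlang_version_from_log() {
    sed -n 's/^Starting RabbitMQ [^ ]* on Erlang \([0-9][0-9.]*\).*$/\1/p'
}

# Succeeds if version $1 (e.g. 19.3.6.13) is at least major $2, minor $3.
# A version with no minor part (e.g. "19") is treated as minor 0.
erlang_at_least() {
    major=${1%%.*}
    rest=${1#*.}
    if [ "$rest" = "$1" ]; then
        minor=0
    else
        minor=${rest%%.*}
    fi
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

Typical use would be to pipe the node's log file into `erlang_version_from_log` and pass the last printed version to `erlang_at_least <version> 19 3`; the log file location depends on the installation.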

Rod

Jun 12, 2019, 5:48:07 PM
to rabbitmq-users
By the way, although this might be irrelevant, at the very beginning of the log it says:

=WARNING REPORT==== 11-Jun-2019::18:25:05 ===
Problem reading some plugins: [{"/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.10/plugins/rabbitmq_queue_master_balancer-0.0.4.ez",
                                {invalid_ez,einval}}]

=INFO REPORT==== 11-Jun-2019::18:25:06 ===
Starting RabbitMQ 3.6.10 on Erlang 19.3.6.13
Copyright (C) 2007-2017 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/


Luke Bakken

Jun 12, 2019, 5:59:26 PM
to rabbitmq-users
Right, that's because the .ez file you first downloaded is corrupt.

I'm not sure why you're getting the "incompatible Erlang" message, as I tested with version 19.3 locally. I suspect the fastest way to resolve it will be to compile the plugin from source. Instructions are on the plugin's README page.

Luke

Rod

Jun 12, 2019, 6:09:00 PM
to rabbitmq-users
Yep, thanks, I think that is what I'm going to do. Basically, I should compile with the same Erlang version that RabbitMQ is running on the server, and I should be fine.

By the way, if I try to use older versions of the same plugin, like 0.0.3 or 0.0.1, I get this:


$ sudo rabbitmq-plugins enable rabbitmq_queue_master_balancer
Plugin configuration unchanged.

Applying plugin configuration to rabbit@xxxxxxxxxxx...WARNING: module rabbit_queue_master_balancer_app not found, so not scanned for boot steps.
WARNING: module rabbit_queue_master_balancer_sup not found, so not scanned for boot steps.
WARNING: module rabbit_queue_master_balancer_sync not found, so not scanned for boot steps.
Error: {could_not_start,rabbitmq_queue_master_balancer,
           {undef,
               [{rabbit_queue_master_balancer_app,start,[normal,[]],[]},
                {application_master,start_it_old,4,
                    [{file,"application_master.erl"},{line,273}]}]}}
 failed.

Rod

Jun 13, 2019, 11:11:26 AM
to rabbitmq-users
So I started building the plugin, since it looks like the Erlang version it was built with is not compatible with my RabbitMQ 3.6.10 installation.

In this case:

The Erlang version used by my RabbitMQ 3.6.10 is Erlang 19.

I had to install:
  • Git
  • Elixir 1.7 (the highest Elixir release compatible with Erlang 19)

After that, when running the "make" command to build the plugin source code, I'm getting:

make[2]: execvp: elixir: Permission denied
make[1]: Leaving directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbit_common'
make[1]: Entering directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbit'
erlang.mk:30: Please upgrade to GNU Make 4 or later: https://erlang.mk/guide/installation.html
make[1]: execvp: elixir: Permission denied
make[2]: Entering directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbitmq_cli'
erlang.mk:30: Please upgrade to GNU Make 4 or later: https://erlang.mk/guide/installation.html
make[2]: execvp: elixir: Permission denied
make[3]: Entering directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/observer_cli'
/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbitmq_cli/erlang.mk:30: Please upgrade to GNU Make 4 or later: https://erlang.mk/guide/installation.html
/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbitmq_cli/erlang.mk:30: Please upgrade to GNU Make 4 or later: https://erlang.mk/guide/installation.html
make[3]: Leaving directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/observer_cli'
 GEN    escript/rabbitmqctl
/bin/sh: line 3: mix: command not found
make[2]: *** [escript/rabbitmqctl] Error 127
make[2]: Leaving directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbitmq_cli'
make[1]: *** [deps] Error 2
make[1]: Leaving directory `/home/test_stuff/rabbitmq-queue-master-balancer-0.0.4/deps/rabbit'
make: *** [deps] Error 2



I'm not sure why I'm getting "make[2]: execvp: elixir: Permission denied".

In theory all my files should have the right permissions. I ran sudo chmod 777 -R /usr/bin/elixir/


-rwxrwxrwx. 1 root root 3745 Jul 13  2018 elixir
-rwxrwxrwx. 1 root root 4747 Mar 15  2018 elixir.bat
-rwxrwxrwx. 1 root root 1239 Mar 15  2018 elixirc
-rwxrwxrwx. 1 root root 1325 Mar 15  2018 elixirc.bat
-rwxrwxrwx. 1 root root 2272 Mar 15  2018 iex
-rwxrwxrwx. 1 root root 2468 Mar 15  2018 iex.bat
-rwxrwxrwx. 1 root root   45 Dec 22  2017 mix
-rwxrwxrwx. 1 root root   95 Feb  6  2018 mix.bat
-rwxrwxrwx. 1 root root  576 Dec 22  2017 mix.ps1
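
The build output above points at two prerequisites: erlang.mk asks for GNU Make 4 or later, and mix is not found on PATH. A small preflight along the following lines could surface both before running make. This is a sketch under assumptions: missing_tools and make_major_version are hypothetical helper names, and the list of tools to check is a guess based on this thread rather than the plugin's documented requirements.

```sh
#!/bin/sh
# Sketch only: preflight checks before building the plugin from source.

# Prints each named command that the shell cannot find as an executable on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Extracts the major version from the first line of `make --version` output,
# e.g. "GNU Make 3.82" -> 3. Prints nothing if the line is not recognised.
make_major_version() {
    printf '%s\n' "$1" | sed -n '1s/^GNU Make \([0-9][0-9]*\).*$/\1/p'
}
```

For example, `missing_tools git erl elixir mix make` lists whatever is not resolvable, and `make_major_version "$(make --version)"` can be compared against 4. It may also be worth confirming that the name elixir on PATH resolves to an executable file rather than to a directory, since execvp can report "Permission denied" in that situation too.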

Luke Bakken

Jun 13, 2019, 12:46:04 PM
to rabbitmq-users
Hi Rod,

I'm pretty sure your issue is due to elixir being corrupted, much in the same way we found that the .ez file was corrupted.

I have built this plugin for you, and here are the steps I used:

source /home/lbakken/development/erlang/installs/19.3.6.12/activate             # activates that erlang version in my PATH, built by kerl
kiex use 1.6.6                                                                  # use older version of Elixir
cd rabbitmq-queue-master-balancer
git checkout v3.6.x
make dist

I have attached the .ez files that the above command generated. You probably only need to copy the rabbitmq_queue_master_balancer-0.0.4+dirty.ez file into the appropriate plugins/ directory.

Let me know how it goes -
Luke
dist.tgz

Rod

Jun 13, 2019, 2:05:43 PM
to rabbitmq-users
Thanks! I will try your file. However, I checked the checksum of the Elixir *.zip file and it was correct. I would still like to try to solve my build problem, although I will test your file for sure.

Rod

Jun 13, 2019, 3:41:27 PM
to rabbitmq-users
Hi Luke,

I just used yours. I'm still getting the same:


Applying plugin configuration to rabbit@someIpHere...WARNING: module rabbit_queue_master_balancer_app not found, so not scanned for boot steps.
WARNING: module rabbit_queue_master_balancer_sup not found, so not scanned for boot steps.
WARNING: module rabbit_queue_master_balancer_sync not found, so not scanned for boot steps.
 failed.
Error: {could_not_start,rabbitmq_queue_master_balancer,
           {undef,
               [{rabbit_queue_master_balancer_app,start,[normal,[]],[]},
                {application_master,start_it_old,4,
                    [{file,"application_master.erl"},{line,273}]}]}}



Rod

Jun 13, 2019, 5:03:41 PM
to rabbitmq-users
Luke, I just re-installed yours on a new, clean node, and I was able to enable it. I think the other node was full of corrupted files, so when I installed a new (good) build of the plugin it ended up mixed with the corrupted files.

Just a hunch, but at least I was able to install it :)

Luke Bakken

Jun 14, 2019, 1:05:52 AM
to rabbitmq-users
OK that's as good of an explanation as any. Let us know if you have issues using the plugin.

Rod

Jun 18, 2019, 11:23:23 AM
to rabbitmq-users
No issues so far. I might have questions, but I'm going to ask them in another topic.

While I'm here, is there a testing tool for this kind of plugin (for example, one that sends a load of messages while nodes go down and come back to normal)? I would like to see whether messages are lost, or that kind of thing.

Luke Bakken

Jun 18, 2019, 11:36:01 AM
to rabbitmq-users

Ayanda

Jun 19, 2019, 11:18:16 AM
to rabbitmq-users
Hi Rod,

The release package (0.0.4) you were trying to use was built for 3.7.x; see the release notes: https://github.com/Ayanda-D/rabbitmq-queue-master-balancer/releases/tag/v0.0.4
I'll be adding 3.6.x packages as well, to help you and others avoid having to build this manually. See the latest release notes and packages: https://github.com/Ayanda-D/rabbitmq-queue-master-balancer/releases/tag/v0.0.5
The plugin is indeed compatible with both 3.6.x and 3.7.x as you quoted, but it had to be built for 3.6.x in your case. See the updated README: https://github.com/Ayanda-D/rabbitmq-queue-master-balancer#build
There are not as many users still on 3.6.x, but we'll be attaching a 3.6.x release package to spare users the extra step you incurred.

Regarding the test scenario you've mentioned ("if there's a testing tool for this kind of plugin - send load of messages while nodes go down and come back to the normal"):
it would be a good test, but during queue transitions, while the plugin is attaining balance in your cluster, it is a risk if your network is bad and nodes are going up and down rapidly.
The README states, under "Additional Info":
   
     Queue balancing is a delicate operation which must be carried out in a very controlled manner and environment not prone to network partitions.

It's a support tool, to be used in planned support windows when your clusters are, for example, handling moderate, uniform traffic patterns with a low chance of unexpected traffic bursts, partitions, node terminations, etc.
However, feedback on stretching its performance under intense conditions is very much welcome, and would help us make it withstand various conditions which we may not have mimicked/simulated in our tests.
Please don't hesitate to file an issue if you come across any.

Cheers! & Luke, thanks again for chipping in ;-)

Rod

Jul 23, 2019, 1:49:20 AM
to rabbitmq-users
Hi, Again!

I have been using the plugin that Luke posted (the custom build he created for me), and all was good at the very beginning. But now I'm facing some issues:

  • The plugin is not rebalancing queues. Let's say I have 505 queues and I reboot one node. When that node comes back it has 0 queues, which is fine.
  • HA is enabled.
  • Using the rebalance plugin: I load the queues, then I use the "go" command. Then, when I get the report, I see that just one queue has been moved to the node that had 0 queues.
  • Many times when I run the report command, after having run the "load queues" and "go" commands from the rebalance plugin, I get the error shown in the pictures below:

IMG_20190723_012558__01.jpg

IMG_20190723_014611__01.jpg


Ayanda

Jul 23, 2019, 7:10:01 AM
to rabbitmq-users
Hi Rod

The plugin that Luke posted was a build of v0.0.4. A newer version has since been released.
You'll find 3.6.x-compatible version assets attached for use.

I'm also assuming that you have a significant queue depth on all, or part, of your "505" queues? With the version
you are using, yes, you are likely to run into an issue with the plugin's internal FSM process being exhausted. In v0.0.5 it has been
made resilient and dynamically adaptable to the installation on which it is used.

Upgrade and let us know if you face any issues. This issue (unless it is something else) is resolved in v0.0.5. Cheers!


Ayanda

Rod

Jul 23, 2019, 10:49:58 AM
to rabbitmq-users
Hi Ayanda,


I got the issue shown in the picture below. For reference, I'm using RabbitMQ 3.6.10 and Erlang/OTP 19. You can see the plugin is definitely in the directory.

IMG_20190723_104303__01.jpg

Rod

Jul 23, 2019, 11:14:35 AM
to rabbitmq-users
More info from the logs in the following screenshots:

IMG_20190723_110301__01.jpg

IMG_20190723_110926__01.jpg

Ayanda

Jul 23, 2019, 12:06:59 PM
to rabbitmq-users
Hi Rod

OK. It's an Erlang conflict. The release artefacts on the project repo were built on Erlang 20.3.
I've just created a package for you on Erlang 19 (find attached an Erlang 19 compatible package, 0.0.5__3.6.x.ez).

Then, in case you need to (or if you come across this conflict again), you can build a package yourself by cloning the project, switching to the v3.6.x branch, and executing make dist.
You'll find the plugin package in the "plugins" directory.

You'll need Elixir installed as well (you can reference some of my notes[1] for this, which I forward to most users/customers setting up Erlang/Elixir). Cheers!



Ayanda
rabbitmq_queue_master_balancer-0.0.5...3.6.x.ez

Rod

Jul 23, 2019, 1:02:25 PM
to rabbitmq-users
Hi Ayanda, thanks for the help. 

I'm still having issues and the plugin somehow doesn't want to rebalance. Let me add the logs and process I went through in pictures:

IMG_20190723_124539__01.jpg

IMG_20190723_125236.jpg

IMG_20190723_125640__01.jpg

Ayanda

Jul 23, 2019, 1:54:31 PM
to rabbitmq-users
Hi Rod

OK, it looks like the node is still executing the old version of the plugin. Carry out the following:

1. Disable the plugin (assuming you've installed the new version).
2. Navigate to the plugins-expand directory, /var/lib/rabbitmq/mnesia/<YOUR-RABBIT-NODENAME>-plugins-expand/, and locate the plugin's expanded directory, rabbitmq_queue_master_balancer-<VERSION>/.
3. Delete the plugin's expanded directory. It contains the compiled BEAM files of the plugin, which are loaded on your node/Erlang runtime. Your node/cluster is still loading the old version.
4. Re-enable the plugin.

The newer version will be expanded and loaded on your node's runtime. Cheers!
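
The four steps above can be sketched as a small script. This is an illustration rather than an official procedure: remove_expanded_plugin is a hypothetical helper, and the plugins-expand path must be substituted for your own node (the /var/lib/rabbitmq/mnesia location is assumed from the message above).

```sh
#!/bin/sh
# Sketch only: delete the expanded copies of one plugin so that the node
# re-expands the newly installed version on the next enable.
# $1: plugins-expand directory, $2: plugin name without the version suffix.
remove_expanded_plugin() {
    expand_dir=$1
    plugin=$2
    [ -n "$plugin" ] || { echo "plugin name required" >&2; return 1; }
    [ -d "$expand_dir" ] || { echo "no such directory: $expand_dir" >&2; return 1; }
    for d in "$expand_dir/$plugin"-*; do
        [ -d "$d" ] || continue      # the glob matched nothing, or not a directory
        echo "removing $d"
        rm -rf -- "$d"
    done
    return 0
}

# Intended order of operations, run as a user allowed to manage the node:
#   rabbitmq-plugins disable rabbitmq_queue_master_balancer
#   remove_expanded_plugin "/var/lib/rabbitmq/mnesia/<YOUR-RABBIT-NODENAME>-plugins-expand" rabbitmq_queue_master_balancer
#   rabbitmq-plugins enable rabbitmq_queue_master_balancer
```

Only directories whose names start with the plugin name followed by a hyphen are removed, so other plugins' expanded directories are left alone.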


Regards,
Ayanda

Rod

Jul 23, 2019, 2:44:40 PM
to rabbitmq-users
Hi Ayanda

I just deleted all the mixed versions that I had in that folder, on all 3 of my nodes, as you suggested. Then I retried rebalancing the queues, and nothing happened :(

IMG_20190723_144118.jpg

The picture above shows the latest log.

Ayanda

Jul 24, 2019, 5:40:25 AM
to rabbitmq-users
Hi Rod

You're hitting this error[1], which simply means that a lookup for a policy in the policy registry table (rabbit_runtime_parameters [2]) was carried out during
an attempt to clear the policy[3][4].

This plugin doesn't make use of any long-lived policies or parameters, nor does it use or expect any policies or parameters to exist
prior to its use. It only uses intermediate "short-lived" policies, which it sets[5] (i.e. writes into the policy registry table[6]) and then immediately deletes[7] and clears[3][4]
after the configured policy_transition_delay has elapsed [8]. (We have this policy_transition_delay to safely interleave updates into the database/policy registry table.)

So the only way this error can be produced is if, during a balancing run, between the temporary policy being set and cleared,
you're somehow already clearing the policy. By the time the plugin attempts to clear the policy it just set, the policy is already not_found[1],
i.e. not available in the policy registry table. This doesn't make sense unless policies are being changed externally during the plugin's run. (Also reset[9] the plugin prior to each run if
you're facing this.)

The plugin should, rightly, stop and report such errors. It will not attempt to clear policies or proceed when such conflicts are found. If it continued running, it could
affect your installation, so stopping and reporting the error in the logs is a good thing. This is the main concern with other long-running scripts: how do they behave when unexpected
conditions such as this manifest? Here, we enforce immediate termination and error reporting in the logs, without any effects on your queues, policies, etc.
So please don't clear (or change) any intermediate working policies during the plugin's execution run. (And share your full log files if this persists: they'll give us full visibility of the sequence of
events leading to such a conflict in your installation, and we'll point out where and when the conflict is induced.) This still falls under a usage problem. We'll only address it as an issue if
the problem is with the plugin itself, is not externally induced, is reproducible, and can be classified as a "bug" (and if so, feel free to file an issue on the
project's repo). Cheers!


Regards

Rod

Jul 24, 2019, 4:37:54 PM
to rabbitmq-users
Hi again, Ayanda!
So we started in a new, clean environment: 3 nodes and 493 queues.

It is rebalancing roughly 7 to 9 queues per attempt. No errors in the logs.

My questions would be:
We will need to run the plugin for a couple of hours until it reaches equilibrium. Is there a way to rebalance a larger number of queues per attempt?

It would also be good to have a new command that returns true or false depending on whether equilibrium has been reached, so that we can automate the runs (running that command after each attempt would be enough to know when to stop running the plugin in an automated way).
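
Until such a command exists, a rough equilibrium check can be approximated outside the plugin by counting queue masters per node. The sketch below is not part of the plugin: masters_per_node and is_balanced are hypothetical helpers, and the input is assumed to be tab-separated "name, pid" lines as printed by `rabbitmqctl list_queues -q name pid`, with pids of the form <rabbit@host.1.234.0>.

```sh
#!/bin/sh
# Sketch only: count queue masters per node from "name<TAB>pid" lines on stdin.
# Prints "node count" lines sorted by node name.
masters_per_node() {
    awk -F '\t' 'NF >= 2 {
        node = $2
        sub(/^</, "", node)                           # drop the leading "<"
        sub(/\.[0-9]+\.[0-9]+\.[0-9]+>$/, "", node)   # drop ".creation.id.serial>"
        count[node]++
    }
    END { for (n in count) print n, count[n] }' | sort
}

# Reads "node count" lines on stdin. Succeeds if max - min <= $1 (tolerance).
# $2 is the expected number of cluster nodes: a node that holds no masters
# does not appear in the input, so it is counted as 0.
is_balanced() {
    awk -v tol="$1" -v nodes="${2:-0}" '
        NR == 1 { min = $2 + 0; max = $2 + 0 }
        { v = $2 + 0; if (v < min) min = v; if (v > max) max = v }
        END {
            if (NR == 0) exit 1
            if (NR < nodes) min = 0
            if (max - min <= tol) exit 0
            exit 1
        }'
}
```

For a 3-node cluster, `rabbitmqctl list_queues -q name pid | masters_per_node | is_balanced 5 3` would succeed once the per-node master counts are within 5 of each other. The tolerance is arbitrary, and the check says nothing about whether mirrors have finished synchronizing.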

Rod

Jul 24, 2019, 6:02:08 PM
to rabbitmq-users
Actually, I just started seeing behavior similar to yesterday's. The plugin stopped rebalancing.

I will post the logs in a few minutes.


Rod

Jul 24, 2019, 8:15:39 PM
to rabbitmq-users
I think I have found the root cause of most of these issues. I think your explanation above makes total sense. Why?

  • I noticed that there is no problem if I create 12 queues across 3 nodes and then try to rebalance. Everything is pretty quick.
  • But with 500 queues things change. Some processes that are "transparent" while using the plugin take place:
    • Queues are re-synchronized. It takes time.
    • Some queues have no policy for a period of time while they are moving, I think (if I go to the RabbitMQ management UI, a BUNCH of them show no policy in the policy column (a ? is displayed) for anywhere between 10 minutes and almost 1 hour).
    • Re-synchronization takes more than 10 minutes until the last queue is fully synchronized.
  • So running the plugin all the time (load_queues, go, report, repeatedly and without stopping) triggers all these weird issues with the plugin.
  • I think it would be good if the plugin did not allow us to run another command until all the queues are synchronized and the policies have been properly re-added.
  • I don't know why I still have an unsynchronised queue after almost 40 minutes. It says state: running, slaves (unsynchronised). This happens only after using the plugin. What could be the reason?

Rod

Jul 24, 2019, 11:55:38 PM
to rabbitmq-users
Actually, I'm quite confused now.

  • The plugin works whenever it wants :( Only sometimes does it rebalance (even if I wait a few minutes until I see that all queues are synchronized and all of them have the HA policy applied).
  • If all looks good in the RabbitMQ management UI, I try to rebalance again to reach equilibrium and... nothing happens, even if I wait for an hour between rebalancing attempts.
  • From time to time some queues are not synchronized automatically. What I did to solve it was to run rabbitmqctl stop_app followed by rabbitmqctl start_app. Then the queue was synchronized.
  • Whenever I use the plugin with just 10 or 20 queues, all is good: no errors, no random results, no issues in the log. But with a large number of queues (like 500) the nightmare starts.
I have attached a picture with some logs. Unfortunately I cannot add the whole log, but I'm trying to add what I think could help figure out what is happening.
IMG_20190724_233557.jpg

Ayanda

Jul 25, 2019, 11:07:52 AM
to rabbitmq-users
Hi Rod

You're mentioning
  • Queues are re-synchronized. It takes time
- Obviously. If your queues are "loaded" with lots of messages, then YES, it's inevitable that they will take time to synchronize.
  And the plugin will take its time to ensure messages are fully synchronized. When it comes to speed, the plugin is highly configurable[1]. If you want
  to reduce these parameters to suit your needs, then sure, go ahead. The defaults are high to ensure the safety of queues (and by the way, we expect
  to see your configuration as well when you report problems). These configs are there to ensure queues are restored to their original state after
  being migrated across the cluster. A more decisive factor for speed is the number of messages in your queues, which, up to now, you still haven't
  given us an indication of. We need full details of your installation, setup and procedures to reproduce the problems you're facing.
  • So running the plugin all the time like load_queues,go,report repeatedly non stopping, trigger all this weird issues with the plugin
- Okay, so this is misuse of the plugin. It's not what's recommended in the docs. You can't repeatedly execute "load_queues" and "go" while
  the plugin has already been instructed to start balancing. You're only supposed to query the report, info and/or status during the balancing procedure.
  Sorry, I was assuming correct usage, but I guess guarding against this could be a valid feature to add, to prevent such usage by users who haven't read the docs ;-)


  • I don´t know why I´m still having an unsynchronised queue for almost 40 mins. It says state:running, slaves (unsynchronised) this happen after using the plugin only. What could be the reason?
- If you read the docs, this is proportional to the number of messages in your queues. If you have N queues with, e.g., 100K messages each, then
  firstly, synchronization will take time, and secondly, the plugin will take time to verify the synchronization of each queue it operates on.
  Regarding "this happen after using the plugin only": if usage is incorrect as mentioned above, then you're inducing queue migrations multiple times before the previous
  run has completed, which is wrong! (But understood: we'll restrict such usage scenarios for users who can't use the plugin as documented from the start. It's only
  the repeated "load_queues" that needs addressing, nothing else.)

Regarding operation, the plugin has an automated test suite[2], in which we have limited the "heaviest" test to 75 queues with 300 messages each, while messages
are being continuously published (also to avoid long-running tests). And ALL TESTS pass. Outside the automated tests, the heaviest load tested so far is
1000 queues (under correct usage).

We need more detail on your setup and test procedures to precisely replicate your issue: message sizes, number of messages, config file(s), entire
logs (not screenshots, and leave it to us to judge what's useful in the logs), and anything else that can help us see what you're doing. As stated in the docs, these
procedures are supposed to be carried out in a controlled manner (with full awareness of progress and states, hence the intermediate report, info and status commands),
and not by executing "load_queues,go,report repeatedly non stopping".




Regards,

Rod

Jul 29, 2019, 1:53:32 PM
to rabbitmq-users
Hi Ayanda,
I read the documentation. Maybe I misunderstood something in there.

But here is what you asked for: I have attached the logs and the config that I have in my RabbitMQ.

Note that all the issues I have had in the past were with empty queues.

I modified the default config a little, as you can see in the attached config file, and now I'm able to synchronize 400 empty queues. If I add more, say another 400 (800 in total), the plugin fails again. I think it is me and the configuration that I'm using (the parameters for the rebalance plugin).

For queues with messages the plugin always fails at some specific point. I have not been able to find the right configuration to rebalance them, but I think you might be able to help with that.
my_poc_logs.txt
my_poc_logs_saasl
my_poc_config.txt

Rodrigo Sandoval

Jul 29, 2019, 3:32:22 PM
to rabbitm...@googlegroups.com
Let me clarify something that I said wrong: 

when there are messages in any queues, the plugin fails. So it doesn't fail in that specific queue with messages but in any random. Probably for the config I'm using for the plugin.

Also I said 2000 messages but I meant 1000.


Rodrigo Sandoval

Jul 30, 2019, 2:02:05 AM
to rabbitm...@googlegroups.com
I just used a configuration where it worked for 400 queues and 1000 messages.

operational_priority, 5000
preload_queues, false
sync_delay_timeout, 10000
sync_verification_factor, 300
policy_transaction_delay, 3000
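
As a rough sketch only, the five settings listed above would sit in the classic Erlang-terms config file (rabbitmq.config) roughly as follows. The application key `rabbitmq_queue_master_balancer` is assumed from the plugin's name, and the lowercase key spellings are assumed to be the intended ones, so check both against the plugin's README before relying on this.

```erlang
%% rabbitmq.config (classic Erlang-terms format)
%% Assumption: the plugin reads its settings from an application section
%% named after the plugin itself. Values are the ones quoted in this post.
[
  {rabbitmq_queue_master_balancer,
    [{operational_priority,     5000},
     {preload_queues,           false},
     {sync_delay_timeout,       10000},
     {sync_verification_factor, 300},
     {policy_transaction_delay, 3000}]}
].
```

After a restart, the `rabbit_queue_master_balancer:info().` call mentioned elsewhere in this thread is one way to confirm which values actually took effect.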

I basically reduced policy_transaction_delay from the very high number I had. The default value of 50 didn't work well.

I have read the documentation many times, but I am not sure what the right config should be for a given number of queues and number of messages. (Which settings should I focus on tweaking when thinking about queues and messages?)

There is one thing I don't understand: how I ended up with 2000 messages instead of 1000.

Ayanda Dube

Jul 30, 2019, 4:04:11 AM
to rabbitm...@googlegroups.com
Hi Rodrigo

Thanks for the feedback. Nothing odd with your configuration at all.

The most recent run was 500+ queues with 1000 messages, mirrored across all cluster nodes, and it worked perfectly fine with the default configuration. As already mentioned, the heaviest test we've carried out was 1000 queues under active high load/traffic (which was even more hectic, since the sync delays are non-deterministic). 1000 or 2000 fixed messages on 1000 queues is a much "easier" and more straightforward case for this tool: nothing is fluctuating, and we've made it withstand harsher conditions under normal use. (Some of the paying customers we offer RabbitMQ consulting services to also use this tool for balancing queues.)

FYI, running rabbitmqctl eval 'rabbit_queue_master_balancer:info().' will give you the effective/active configuration during an execution run. Some settings aren't allowed to fall below a minimum (e.g. 100ms), and the values actually in effect should be reflected by this call.

We're also happy to discuss this with you over a call. If you're keen, please drop an email to Erlang Solutions at gen...@erlang-solutions.com asking to talk to me or one of our RabbitMQ engineers (cc me and I'll prioritise your request), and we'll help you out! We're keen to see which procedures you're executing incorrectly to get the outcomes you're seeing.

Cheers! 


Best regards,
Ayanda

Erlang Solutions Ltd.




Rod

Jul 30, 2019, 10:20:25 PM
to rabbitmq-users
Many thanks, Ayanda. Hopefully we will get these issues solved. I have sent you an email.

I'm also trying to compile the test suite but I'm running into errors. I will create a separate question so as not to mix things up.

