Number of nodes in cluster doesn't match between portal and rabbitmqctl


philip

Sep 18, 2019, 12:54:30 AM
to rabbitmq-users
Hi,

I have set up 3 nodes in our RabbitMQ cluster. Because of bad hardware, one of the nodes went down, so we brought up a new machine and added it to the cluster. A couple of weeks later the same thing happened, and we brought up another new machine and joined it to the cluster. Everything seemed to be working properly.

Today I happened to notice that the listening ports detail in the RabbitMQ portal (management UI) shows 5 clustering listeners instead of 3, even though it displays 3 nodes in the cluster. Please see the attached screenshot.
I also ran rabbitmqctl cluster_status, and it also reports 3 nodes.

We haven't seen any issues so far, but what kinds of issues could this cause? How did the cluster get into this state, so that we can avoid it in the future? And most importantly, how do we correct it?

Thanks,
Phil


rabbitmq_portal.png

Wesley Peng

Sep 18, 2019, 2:23:45 AM
to rabbitm...@googlegroups.com


on 2019/9/18 12:54, philip wrote:
> And I just happen to notice today, in the listening port detail on the
> rabbitmq portal are showing 5 clustering instead of 3.  But it does
> display 3 nodes are in the clustering.  Please see attached screenshot.
> I also run rabbitmqctl cluster_status and it also report 3 nodes.

This seems like a bug. Have you upgraded RabbitMQ to the latest version?

regards.

philip lin

Sep 18, 2019, 8:16:22 AM
to rabbitm...@googlegroups.com
I'm running RabbitMQ 3.7.17, which I think is the latest version, with Erlang 21.3.8.6.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/6e9cba80-8ef0-68c1-bf9c-f6cee52218dc%40thepeng.eu.

Luke Bakken

Sep 18, 2019, 11:44:47 AM
to rabbitmq-users
Hi Philip,

This is probably due to stale data in RabbitMQ's management database.

What is the output of rabbitmqctl cluster_status?

Please leave enough of the node name visible so that we can tell the difference between them. Thanks.

Luke



Michael Klishin

Sep 18, 2019, 12:57:41 PM
to rabbitmq-users
Have the nodes been explicitly removed using `rabbitmqctl forget_cluster_node`?

The stats DB that the management UI uses is transient; you can clear it as described in the docs or by restarting the node.

You can ask each node for its locally registered listeners using `rabbitmq-diagnostics listeners`. That's the source of truth (for that node), as it queries the listener table directly instead of relying on emitted stats events.
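To make that per-node comparison easier to script, here is a minimal Python sketch (not part of RabbitMQ) that parses the text printed by `rabbitmq-diagnostics listeners`. It assumes the 3.7.x line format `Interface: ..., port: ..., protocol: ..., purpose: ...` that appears later in this thread; other versions may format the output differently. `Listener`, `parse_listeners` and `count_protocol` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple

# Assumed 3.7.x line format, e.g.
# "Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication"
LISTENER_RE = re.compile(
    r"^Interface: (?P<interface>[^,]+), port: (?P<port>\d+), "
    r"protocol: (?P<protocol>[^,]+), purpose: (?P<purpose>.+)$"
)


class Listener(NamedTuple):
    interface: str
    port: int
    protocol: str
    purpose: str


def parse_listeners(output: str) -> List[Listener]:
    """Parse `rabbitmq-diagnostics listeners` text for a single node.

    Non-listener lines, such as the "Asking node ... to report its
    protocol listeners ..." banner, are skipped.
    """
    listeners = []
    for line in output.splitlines():
        match = LISTENER_RE.match(line.strip())
        if match:
            listeners.append(
                Listener(
                    interface=match["interface"],
                    port=int(match["port"]),
                    protocol=match["protocol"],
                    purpose=match["purpose"],
                )
            )
    return listeners


def count_protocol(listeners: List[Listener], protocol: str) -> int:
    """Count the listeners registered for one protocol, e.g. "clustering"."""
    return sum(1 for listener in listeners if listener.protocol == protocol)
```

On a typical node, `count_protocol(parse_listeners(output), "clustering")` should be 1; if the node itself reports the expected listeners but the management UI shows more, the extra entries are coming from the stats DB rather than from real listeners.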

philip

Sep 18, 2019, 2:59:37 PM
to rabbitmq-users
I ran the command on all 3 nodes:
==========================================================
Cluster status of node rabbit@ip-xxx-xxx-152-20 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
                'rabbit@ip-xxx-xxx-152-20']}]},
 {running_nodes,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
                 'rabbit@ip-xxx-xxx-152-20']},
 {cluster_name,<<"rabbit@ip-xxx-xxx-150-194.internal">>},
 {partitions,[]},
 {alarms,[{'rabbit@ip-xxx-xxx-150-33',[]},
          {'rabbit@ip-xxx-xxx-151-19',[]},
          {'rabbit@ip-xxx-xxx-152-20',[]}]}]
==========================================================
Cluster status of node rabbit@ip-xxx-xxx-151-19 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
                'rabbit@ip-xxx-xxx-152-20']}]},
 {running_nodes,['rabbit@ip-xxx-xxx-152-20','rabbit@ip-xxx-xxx-150-33',
                 'rabbit@ip-xxx-xxx-151-19']},
 {cluster_name,<<"rabbit@ip-xxx-xxx-150-194.internal">>},
 {partitions,[]},
 {alarms,[{'rabbit@ip-xxx-xxx-152-20',[]},
          {'rabbit@ip-xxx-xxx-150-33',[]},
          {'rabbit@ip-xxx-xxx-151-19',[]}]}]
==========================================================
Cluster status of node rabbit@ip-xxx-xxx-150-33 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
                'rabbit@ip-xxx-xxx-152-20']}]},
 {running_nodes,['rabbit@ip-xxx-xxx-152-20','rabbit@ip-xxx-xxx-151-19',
                 'rabbit@ip-xxx-xxx-150-33']},
 {cluster_name,<<"rabbit@ip-xxx-xxx-150-194.internal">>},
 {partitions,[]},
 {alarms,[{'rabbit@ip-xxx-xxx-152-20',[]},
          {'rabbit@ip-xxx-xxx-151-19',[]},
          {'rabbit@ip-xxx-xxx-150-33',[]}]}]
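All three nodes agree on membership here; `running_nodes` differs only in order. That check can be sketched in Python as follows, assuming the 3.7-style Erlang-term output above. Only disc nodes are read (RAM nodes are ignored), and `summarize` and `consistent` are hypothetical helper names, not RabbitMQ tooling.

```python
import re
from typing import FrozenSet, List, Tuple

# Quoted Erlang atoms such as 'rabbit@ip-xxx-xxx-150-33'
ATOM_RE = re.compile(r"'([^']+)'")
# Lazy match up to the first "]}" that closes each list (assumed layout)
DISC_RE = re.compile(r"\{disc,\[(.*?)\]\}", re.DOTALL)
RUNNING_RE = re.compile(r"\{running_nodes,\[(.*?)\]\}", re.DOTALL)


def _members(pattern: "re.Pattern", status: str) -> FrozenSet[str]:
    """Node names inside the first list matched by `pattern`."""
    match = pattern.search(status)
    if match is None:
        raise ValueError("section not found in cluster_status output")
    return frozenset(ATOM_RE.findall(match.group(1)))


def summarize(status: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (disc nodes, running nodes) from one node's cluster_status text."""
    return _members(DISC_RE, status), _members(RUNNING_RE, status)


def consistent(outputs: List[str]) -> bool:
    """True if all nodes report the same members and every member is running.

    Sets are compared, so the order of running_nodes does not matter.
    """
    summaries = {summarize(text) for text in outputs}
    if len(summaries) != 1:
        return False
    disc, running = summaries.pop()
    return disc == running
```

Fed the three outputs above, this sketch would report a consistent cluster, which matches the thread's conclusion that membership itself is fine and only the management UI is out of date.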

Thanks.


Michael Klishin

Sep 18, 2019, 4:50:38 PM
to rabbitmq-users
Please list the listeners on every node as mentioned above.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

philip lin

Sep 18, 2019, 9:20:44 PM
to rabbitm...@googlegroups.com
I ran "rabbitmq-diagnostics listeners" on each node:

Asking node rabbit@ip-xxx-xxx-151-19 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Asking node rabbit@ip-xxx-xxx-152-20 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Asking node rabbit@ip-xxx-xxx-150-33 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Thanks.

Michael Klishin

Sep 19, 2019, 4:02:30 AM
to rabbitmq-users
So something caused a listener not to be removed from the stats DB. You can clear it as explained in the docs, or disable and re-enable the management plugin. Without a way to reproduce the issue, that's as much as I can recommend.

In 3.8, the monitoring system collects node-specific data and aggregates it outside of RabbitMQ [1], so this issue and similar ones may or may not apply.
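One way to spot such stale entries without eyeballing the UI is to compare the listeners the management HTTP API reports against the membership from `rabbitmqctl cluster_status`. The Python sketch below assumes the decoded JSON body of `GET /api/overview` (for example fetched with `curl -u user:pass http://localhost:15672/api/overview`) contains a `listeners` list whose entries carry `node` and `protocol` fields; please verify that shape against your version. `stale_listeners` and `clustering_listener_counts` are hypothetical helper names.

```python
from collections import Counter
from typing import Any, Dict, List, Set

# Decoded JSON body of GET /api/overview (assumed shape: {"listeners": [...], ...})
Overview = Dict[str, Any]


def stale_listeners(overview: Overview, members: Set[str]) -> List[Dict[str, Any]]:
    """Listener entries whose node is not a current cluster member."""
    return [
        listener
        for listener in overview.get("listeners", [])
        if listener.get("node") not in members
    ]


def clustering_listener_counts(overview: Overview) -> Counter:
    """Number of "clustering" listeners reported per node (normally 1)."""
    return Counter(
        listener.get("node")
        for listener in overview.get("listeners", [])
        if listener.get("protocol") == "clustering"
    )
```

Anything returned by `stale_listeners`, or any node with a count other than 1, is a candidate for the stats DB reset discussed in this thread rather than a sign of a real extra node.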


philip lin

Sep 20, 2019, 10:07:55 AM
to rabbitm...@googlegroups.com
OK. I brought down all three nodes in the cluster and started them back up. Once that was finished, the old clustering entries were no longer shown in the UI.

Thanks for the info.

Michael Klishin

Sep 20, 2019, 5:20:37 PM
to rabbitmq-users
FTR, it would have been sufficient to reset the stats database on all nodes, or to restart the management plugin on all of them.
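Resetting the stats DB on every node can be driven from a single host. The Python sketch below only builds the commands and runs nothing. It assumes the `rabbit_mgmt_storage:reset().` expression that Philip reports using later in this thread (he also reports that it did not clear his stale entries) together with rabbitmqctl's standard `-n` node option. `reset_commands` is a hypothetical helper.

```python
import shlex
from typing import List

# Erlang expression Philip reports running later in this thread (3.7.x).
RESET_EXPR = "rabbit_mgmt_storage:reset()."


def reset_commands(nodes: List[str]) -> List[str]:
    """Build one shell command per node; nothing is executed here.

    `-n` is rabbitmqctl's standard option for choosing the target node,
    so every command can be run from a single host that shares the
    cluster's Erlang cookie.
    """
    return [
        shlex.join(["rabbitmqctl", "-n", node, "eval", RESET_EXPR])
        for node in nodes
    ]
```

`shlex.join` (Python 3.8+) single-quotes the Erlang expression, so each string can be pasted into a shell as-is.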



philip

Mar 13, 2020, 3:31:18 PM
to rabbitmq-users
Hi,

Response from 9-19:
"So something caused a listener not to be removed from the stats DB. You can clear it as explained in the docs
or disable and re-enable the management plugin. Without a way to reproduce that's as much as I can recommend."

Since I last restarted the entire cluster (about 6 months ago), we have accumulated 20+ entries of stale data again. This time, to avoid downtime, instead of restarting the cluster I tried disabling and re-enabling the management plugin as suggested, but that didn't seem to work. Next I'd like to try clearing the stats DB directly. Can you please provide the link to the docs that explain how to do that?

Just to confirm: this data in the stats DB shouldn't affect normal cluster operation (e.g. HA), right?

Thanks.




Luke Bakken

Mar 13, 2020, 4:58:20 PM
to rabbitmq-users
Hello,

Please see the docs - https://www.rabbitmq.com/rabbitmqctl.8.html#Management_agent_plugin

Stats DB entries will not affect HA at all.

Thanks,
Luke

philip

Mar 16, 2020, 11:18:19 AM
to rabbitmq-users
Hi,

Thanks for the info.
When I ran "rabbitmqctl reset_stats_db --all", I got a "command not found" error. After some research on the management agent plugin, I found another reference here: https://www.rabbitmq.com/management.html#stats-db. So I ran the following commands, and both returned ok:
rabbitmqctl eval 'rabbit_mgmt_storage:reset().' on each node; however, the issue remained: multiple old IPs were still listed for the clustering port, as before.
rabbitmqctl eval 'rabbit_mgmt_storage:reset_all().'; however, the issue also remained, with the same old IPs still listed for the clustering port.

Any idea?

Thanks.