Number of nodes in cluster doesn't match between portal and rabbitmqctl

philip

Sep 18, 2019, 12:54:30 AM

to rabbitmq-users

Hi,

I have set up 3 nodes in our RabbitMQ cluster. Because of bad hardware, one of the nodes went down, so we brought up a new machine and added it to the cluster. A couple of weeks later the same thing happened, and we brought up another new machine and joined it to the cluster. Everything seemed to be working properly.

Today I happened to notice that the listening ports detail in the RabbitMQ portal shows 5 clustering entries instead of 3, although it does show that 3 nodes are in the cluster. Please see the attached screenshot.

I also ran rabbitmqctl cluster_status, and it also reports 3 nodes.

We haven't seen any issues so far, but what kind of issues could this cause? How did the cluster get into this state, so that we can avoid it in the future? And most importantly, how do we correct it?

Thanks,

Phil

rabbitmq_portal.png

Wesley Peng

Sep 18, 2019, 2:23:45 AM

to rabbitm...@googlegroups.com

on 2019/9/18 12:54, philip wrote:
> And I just happen to notice today, in the listening port detail on the
> rabbitmq portal are showing 5 clustering instead of 3. But it does
> display 3 nodes are in the clustering. Please see attached screenshot.
> I also run rabbitmqctl cluster_status and it also report 3 nodes.

This seems to be a bug. Have you upgraded RabbitMQ to the latest version?

regards.

philip lin

Sep 18, 2019, 8:16:22 AM

to rabbitm...@googlegroups.com

I'm running RabbitMQ 3.7.17, which I think is the latest version, and Erlang 21.3.8.6.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/6e9cba80-8ef0-68c1-bf9c-f6cee52218dc%40thepeng.eu.

Luke Bakken

Sep 18, 2019, 11:44:47 AM

to rabbitmq-users

Hi Philip,

This is probably due to stale data in RabbitMQ's management database.

What is the output of rabbitmqctl cluster_status?

Please leave enough of the node name visible so that we can tell the difference between them. Thanks.

Luke

Michael Klishin

Sep 18, 2019, 12:57:41 PM

to rabbitmq-users

Have the nodes been explicitly removed using `rabbitmqctl forget_cluster_node`?

The stats DB that the management UI uses is transient; you can clear it as described in the docs or by restarting the node.

You can ask each node for its locally registered listeners using `rabbitmq-diagnostics listeners`. That's the source of truth (for that node), as it queries the listener table directly instead of relying on emitted stats events.
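
A minimal sketch of how that per-node output could be checked by a script rather than by eye. It assumes the `Interface: ..., port: ..., protocol: ..., purpose: ...` line format that `rabbitmq-diagnostics listeners` prints on 3.7.x; the function names and the sample node name are hypothetical and not part of any RabbitMQ tooling.

```python
import re

# One listener line as printed by `rabbitmq-diagnostics listeners`
# (line format assumed from 3.7.x output).
LISTENER_RE = re.compile(
    r"Interface:\s*(?P<interface>\S+),\s*port:\s*(?P<port>\d+),\s*"
    r"protocol:\s*(?P<protocol>[^,]+),\s*purpose:\s*(?P<purpose>.+)"
)


def parse_listeners(output):
    """Return a list of dicts, one per listener line found in `output`."""
    listeners = []
    for line in output.splitlines():
        match = LISTENER_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["port"] = int(entry["port"])
            entry["protocol"] = entry["protocol"].strip()
            entry["purpose"] = entry["purpose"].strip()
            listeners.append(entry)
    return listeners


def count_protocol(listeners, protocol):
    """Count how many parsed listeners use the given protocol name."""
    return sum(1 for entry in listeners if entry["protocol"] == protocol)


# Hypothetical sample; the node name is a placeholder.
sample = """Asking node rabbit@node-a to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
"""
parsed = parse_listeners(sample)
print(count_protocol(parsed, "clustering"))
```

Each node would be expected to report exactly one `clustering` listener for itself, so a higher count in the management UI than the sum across nodes points at the UI's data rather than at the nodes.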

philip

Sep 18, 2019, 2:59:37 PM

to rabbitmq-users

I ran the command on all 3 nodes:

==========================================================
Cluster status of node rabbit@ip-xxx-xxx-152-20 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
'rabbit@ip-xxx-xxx-152-20']}]},
{running_nodes,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
'rabbit@ip-xxx-xxx-152-20']},
{cluster_name,<<"rab...@ip-xxx-xxx-150-194.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-xxx-xxx-150-33',[]},
{'rabbit@ip-xxx-xxx-151-19',[]},
{'rabbit@ip-xxx-xxx-152-20',[]}]}]
==========================================================
Cluster status of node rabbit@ip-xxx-xxx-151-19 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
'rabbit@ip-xxx-xxx-152-20']}]},
{running_nodes,['rabbit@ip-xxx-xxx-152-20','rabbit@ip-xxx-xxx-150-33',
'rabbit@ip-xxx-xxx-151-19']},
{cluster_name,<<"rab...@ip-xxx-xxx-150-194.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-xxx-xxx-152-20',[]},
{'rabbit@ip-xxx-xxx-150-33',[]},
{'rabbit@ip-xxx-xxx-151-19',[]}]}]
==========================================================
Cluster status of node rabbit@ip-xxx-xxx-150-33 ...
[{nodes,[{disc,['rabbit@ip-xxx-xxx-150-33','rabbit@ip-xxx-xxx-151-19',
'rabbit@ip-xxx-xxx-152-20']}]},
{running_nodes,['rabbit@ip-xxx-xxx-152-20','rabbit@ip-xxx-xxx-151-19',
'rabbit@ip-xxx-xxx-150-33']},
{cluster_name,<<"rab...@ip-xxx-xxx-150-194.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-xxx-xxx-152-20',[]},
{'rabbit@ip-xxx-xxx-151-19',[]},
{'rabbit@ip-xxx-xxx-150-33',[]}]}]

Thanks.
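
The three outputs above can also be compared mechanically to confirm that every node sees the same cluster membership. This is a rough sketch that assumes the Erlang-term output format shown above (other releases may print a different format); `running_nodes` and `all_agree` are hypothetical helper names, and the sample node names are placeholders.

```python
import re


def running_nodes(cluster_status_output):
    """Extract node names from the running_nodes entry of Erlang-term style output."""
    match = re.search(
        r"\{running_nodes,\[(.*?)\]\}", cluster_status_output, re.DOTALL
    )
    if not match:
        return set()
    # Node names are printed as single-quoted Erlang atoms.
    return set(re.findall(r"'([^']+)'", match.group(1)))


def all_agree(outputs):
    """True if every node's output lists the same set of running nodes."""
    sets = [running_nodes(text) for text in outputs]
    return all(s == sets[0] for s in sets)


# Hypothetical outputs from two nodes; the order differs, the membership does not.
out_a = """[{nodes,[{disc,['rabbit@node-a','rabbit@node-b']}]},
{running_nodes,['rabbit@node-a',
'rabbit@node-b']},
{partitions,[]}]"""
out_b = """[{nodes,[{disc,['rabbit@node-a','rabbit@node-b']}]},
{running_nodes,['rabbit@node-b','rabbit@node-a']},
{partitions,[]}]"""

print(sorted(running_nodes(out_a)))
print(all_agree([out_a, out_b]))
```

Comparing sets rather than lists matters here because each node prints the running nodes in a different order.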

Michael Klishin

Sep 18, 2019, 4:50:38 PM

to rabbitmq-users

Please list the listeners on every node as mentioned above.

--

MK

Staff Software Engineer, Pivotal/RabbitMQ

philip lin

Sep 18, 2019, 9:20:44 PM

to rabbitm...@googlegroups.com

I ran "rabbitmq-diagnostics listeners" on each node:

Asking node rabbit@ip-xxx-xxx-151-19 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Asking node rabbit@ip-xxx-xxx-152-20 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Asking node rabbit@ip-xxx-xxx-150-33 to report its protocol listeners ...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Interface: [::], port: 15672, protocol: http, purpose: HTTP API

Thanks.

Michael Klishin

Sep 19, 2019, 4:02:30 AM

to rabbitmq-users

So something caused a listener not to be removed from the stats DB. You can clear it as explained in the docs or disable and re-enable the management plugin. Without a way to reproduce, that's as much as I can recommend.

In 3.8, the monitoring system collects node-specific data and aggregates it outside of RabbitMQ [1], so this issue and similar ones may or may not apply.

1. http://next.rabbitmq.com/prometheus.html
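
To spot which listener entries in the stats DB are stale, the entries the management UI shows can be compared with the nodes that are actually running. The sketch below is illustrative only: it assumes the listener data has already been fetched as a list of dicts with `node`, `protocol` and `port` keys (an assumed shape, similar to what the management HTTP API reports), and `stale_listeners` is a hypothetical helper name.

```python
def stale_listeners(listeners, running_nodes):
    """Return listener entries whose node is not among the running cluster nodes."""
    running = set(running_nodes)
    return [entry for entry in listeners if entry.get("node") not in running]


# Hypothetical data: two live nodes plus one entry left over from a replaced machine.
listeners = [
    {"node": "rabbit@node-a", "protocol": "clustering", "port": 25672},
    {"node": "rabbit@node-b", "protocol": "clustering", "port": 25672},
    {"node": "rabbit@old-node", "protocol": "clustering", "port": 25672},
]
running = ["rabbit@node-a", "rabbit@node-b"]

for entry in stale_listeners(listeners, running):
    print(entry["node"], entry["protocol"], entry["port"])
```

Any entry this returns belongs to a node that `rabbitmqctl cluster_status` no longer lists as running, which is the pattern described in this thread.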

philip lin

Sep 20, 2019, 10:07:55 AM

to rabbitm...@googlegroups.com

OK. I brought down all three nodes in the cluster and started them back up. Once that was finished, the old clustering info shown in the UI was no longer there.

Thanks for the info.

Michael Klishin

Sep 20, 2019, 5:20:37 PM

to rabbitmq-users

FTR, it would have been sufficient to reset the stats database on all nodes or restart the management plugin on all of them.

philip

Mar 13, 2020, 3:31:18 PM

to rabbitmq-users

Hi,

Response from 9-19:

"So something caused a listener not to be removed from the stats DB. You can clear it as explained in the docs

or disable and re-enable the management plugin. Without a way to reproduce that's as much as I can recommend."

Since the last time I reset the entire cluster (about 6 months ago), we have accumulated 20+ entries of stale data again. This time, to avoid downtime, I did not restart the cluster. Instead I tried disabling and re-enabling the management plugin, per one of the suggestions, but that didn't seem to work. Next I'd like to try clearing the stats DB directly. Can you please provide the link to the docs that explain how to do that?

Just to confirm, this data in the stats DB shouldn't affect normal operation (e.g. HA) of the cluster, right?

Thanks.

Luke Bakken

Mar 13, 2020, 4:58:20 PM

to rabbitmq-users

Hello,

Please see the docs - https://www.rabbitmq.com/rabbitmqctl.8.html#Management_agent_plugin

Stats DB entries will not affect HA at all.

Thanks,

Luke

philip

Mar 16, 2020, 11:18:19 AM

to rabbitmq-users

Hi,

Thanks for the info.

When I ran "rabbitmqctl reset_stats_db --all", I got a "command not found" error. After some research on the management agent plugin, I found another reference here: https://www.rabbitmq.com/management.html#stats-db. So I ran the following commands, and each returned ok:

rabbitmqctl eval 'rabbit_mgmt_storage:reset().' on each node; however, the issue was still there: multiple old IPs were still listed for the clustering ports, as before.

rabbitmqctl eval 'rabbit_mgmt_storage:reset_all().'; however, the issue was still there: multiple old IPs were still listed for the clustering ports, as before.