RabbitMQ peer discovery with consul does not work

eng1024

Apr 22, 2019, 11:19:08 PM
to rabbitmq-users
Hey all,

I'm stuck and need your help.

I'm trying to configure a RabbitMQ cluster with Consul peer discovery. I've never worked with Consul before, but I think I configured it OK.

Testing with two rabbit nodes; here is my config on both:

root@rabbitmq-auto-cluster01-7fkv:~# cat /etc/rabbitmq/rabbitmq.conf
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul
cluster_formation.consul.host = rabbitmq-auto-cluster01-consul
cluster_formation.consul.port = 8500    # 8500 is default
cluster_formation.consul.scheme = http  # http is default
cluster_formation.consul.svc_addr_auto = true

When I start rabbit on both nodes, I'm getting no errors:

root@rabbitmq-auto-cluster01-7fkv:~# tail -15 /var/log/rabbitmq/rabbit@rabbitmq-auto-cluster01-7fkv.log
2019-04-23 00:11:57.088 [info] <0.580.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2019-04-23 00:11:57.089 [info] <0.686.0> Statistics database started.
2019-04-23 00:11:57.138 [notice] <0.106.0> Changed loghwm of /var/log/rabbitmq/rabbit@rabbitmq-auto-cluster01-7fkv.log to 50
2019-04-23 00:11:57.350 [info] <0.8.0> Server startup complete; 11 plugins started.
 * rabbitmq_federation_management
 * rabbitmq_tracing
 * rabbitmq_shovel_management
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_peer_discovery_consul
 * rabbitmq_shovel
 * rabbitmq_web_dispatch
 * rabbitmq_federation
 * rabbitmq_event_exchange
 * rabbitmq_peer_discovery_common


When I start rabbit on two nodes, I see this in consul:

root@rabbitmq-auto-cluster01-consul:/var/log# tail -f /var/log/syslog
Apr 23 03:06:04 rabbitmq-auto-cluster01-consul consul[20918]:     2019/04/23 03:06:04 [INFO] agent: Synced service "rabbitmq:rabbitmq-auto-cluster01-f10v"
Apr 23 03:06:09 rabbitmq-auto-cluster01-consul consul[20918]:     2019/04/23 03:06:09 [INFO] agent: Synced service "rabbitmq:rabbitmq-auto-cluster01-7fkv"

... and I also see this:

root@rabbitmq-auto-cluster01-consul:/var/log# consul catalog services
consul
rabbitmq
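
The catalog listing only shows that a service named rabbitmq exists, not which instances are registered under it or whether they are passing their health checks. A minimal sketch for asking Consul's HTTP health endpoint directly, assuming curl is installed; the function names are made up for illustration, and the host and port defaults are the ones from the rabbitmq.conf above:

```shell
# Defaults taken from the rabbitmq.conf shown earlier in this thread.
CONSUL_HOST="${CONSUL_HOST:-rabbitmq-auto-cluster01-consul}"
CONSUL_PORT="${CONSUL_PORT:-8500}"

# Build the URL of Consul's health endpoint for one service name.
# "?passing" restricts the result to instances whose checks pass.
consul_health_url() {
    printf 'http://%s:%s/v1/health/service/%s?passing\n' \
        "$CONSUL_HOST" "$CONSUL_PORT" "$1"
}

# Fetch the passing instances of a service as raw JSON (hypothetical helper).
list_passing_instances() {
    curl -fsS "$(consul_health_url "$1")"
}
```

Running `list_passing_instances rabbitmq` from either rabbit node should return one JSON entry per registered node if both registrations are healthy.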

From each rabbit node I can connect to the other one on tcp/25672 (the clustering port) using the other node's short name:

root@rabbitmq-auto-cluster01-7fkv:~# nc -v rabbitmq-auto-cluster01-f10v 25672
Connection to rabbitmq-auto-cluster01-f10v 25672 port [tcp/*] succeeded!

I ran tcpdump on one host, capturing all traffic on port 25672 while starting rabbit on it, and saw no traffic on that port whatsoever. It's like the node doesn't even try to negotiate clustering with the peer.

Both nodes come up as standalone servers. And yes, I did set /var/lib/rabbitmq/.erlang.cookie

WHAT AM I MISSING???

Thanks!

Michael Klishin

Apr 23, 2019, 9:10:31 AM
to rabbitmq-users
Please see [1]. Almost certainly your nodes have started as individual nodes and no longer perform peer discovery on restarts.
You must reset and restart at least one of them.

[1] also explains how to enable debug logging so that all Consul API requests are logged.
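
For the debug logging mentioned here, a minimal sketch, assuming the new-style rabbitmq.conf format used earlier in this thread, where the `log.file.level` key sets the level of the file log. The helper name is hypothetical, and the node has to be restarted afterwards for the setting to take effect:

```shell
# Append a debug log level to a rabbitmq.conf-style file, unless the file
# already contains a log.file.level line (hypothetical helper).
enable_debug_logging() {
    conf_file="$1"
    if grep -q '^[[:space:]]*log\.file\.level' "$conf_file" 2>/dev/null; then
        echo "log.file.level already set in $conf_file, leaving it alone"
        return 0
    fi
    printf 'log.file.level = debug\n' >> "$conf_file"
}
```

Usage would be `enable_debug_logging /etc/rabbitmq/rabbitmq.conf`, followed by a node restart and a look at the top of the node's log file.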


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Apr 23, 2019, 9:11:29 AM
to rabbitmq-users
Oh, and at the top of the log when a node starts it will also log if it starts as a blank one (I don't remember if that
uses debug log level or not). That part is not present in the log snippet you've posted. Consider sharing the entire file.

eng1024

Apr 23, 2019, 2:24:12 PM
to rabbitmq-users
Thanks for the quick response, Michael!

This is part of a Terraform repo that creates a Consul server and an autoscalable RabbitMQ cluster in GCP. Adding

rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app

at the bottom of my cloud-init script fixed the issue. I'm attaching the script in case anyone needs to reinvent the wheel.
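
Because `rabbitmqctl reset` wipes the node's local data, it may be worth guarding the three commands so they run only on first boot. A minimal sketch of one way to do that, not taken from the attached script; the marker file path and the function name are assumptions chosen for illustration:

```shell
# Hypothetical marker file recording that the one-time reset already ran.
MARKER="${MARKER:-/var/lib/rabbitmq/.peer-discovery-reset-done}"
# The command can be overridden, e.g. RABBITMQCTL=echo for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Stop the app, reset the node and start it again, but only once.
# The marker is written only if all three commands succeed.
reset_node_once() {
    if [ -e "$MARKER" ]; then
        echo "marker $MARKER exists, skipping reset"
        return 0
    fi
    $RABBITMQCTL stop_app &&
    $RABBITMQCTL reset &&
    $RABBITMQCTL start_app &&
    touch "$MARKER"
}
```

Calling `reset_node_once` at the end of a cloud-init script would then behave like the three commands above on the first boot and do nothing on later runs of the script.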

Disclaimer: I know very little about RabbitMQ so all the config parameters in the script got picked up from all over the interwebs.

rabbitmq_setup.sh

Michael Klishin

Apr 23, 2019, 4:01:52 PM
to rabbitmq-users
Right, so resetting the node indeed makes it perform peer discovery. I suspect that my hypothesis
was right. FWIW it is covered in [1][2].


