Help troubleshooting autocluster


Brian Wawok

May 4, 2018, 7:54:34 PM5/4/18
to rabbitmq-users
Is there a way to turn up autocluster discovery logs?  Something is wrong with my cluster and I cannot see any errors.

(I am aware of https://groups.google.com/forum/#!topic/rabbitmq-users/wuOfzEywHXo and friends; I have been digging down that path for a while.)

Basically I bring up my cluster of 3 nodes, and each node forms its own single-node cluster.

Manually going to my k8s endpoint at /api/v1/namespaces/default/endpoints/rabbitmq-ha-discovery shows valid values for everything.
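For reference, the same check can be sketched with kubectl or curl. This is a hypothetical sketch, not from the thread; it assumes kubectl access to the same namespace and the standard in-pod service account token path:

```shell
# List the endpoints object the k8s peer discovery backend queries.
kubectl -n default get endpoints rabbitmq-ha-discovery -o yaml

# Or hit the Kubernetes API directly from inside a pod, the same way
# the rabbit_peer_discovery_k8s backend does.
curl -sk \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/endpoints/rabbitmq-ha-discovery
```

If the endpoints object lists all three pod addresses here, the discovery data itself is healthy and the problem lies elsewhere.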

My config

    cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    ## The default "guest" user is only permitted to access the server
    ## via a loopback interface (e.g. localhost)
    loopback_users.guest = false
    ## Memory-based Flow Control threshold
    vm_memory_high_watermark.absolute = 3072MB


Sample logs from one node...

Starting RabbitMQ 3.7.4 on Erlang 20.1.7
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##  ##      RabbitMQ 3.7.4. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>
              Starting broker...
2018-05-04 23:46:29.393 [info] <0.5496.0> 
 node           : rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : O6RpMR08WTUZ0+0bJZpreQ==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local
2018-05-04 23:46:30.893 [info] <0.5509.0> Memory high watermark set to 2929 MiB (3072000000 bytes) of 5120 MiB (5368709120 bytes) total
2018-05-04 23:46:30.898 [info] <0.5511.0> Enabling free disk space monitoring
2018-05-04 23:46:30.898 [info] <0.5511.0> Disk free limit set to 50MB
2018-05-04 23:46:30.923 [info] <0.5513.0> Limiting to approx 1048476 file handles (943626 sockets)
2018-05-04 23:46:30.923 [info] <0.5514.0> FHC read buffering:  OFF
2018-05-04 23:46:30.923 [info] <0.5514.0> FHC write buffering: ON
2018-05-04 23:46:30.925 [info] <0.5496.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2018-05-04 23:46:30.999 [info] <0.5496.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2018-05-04 23:46:31.000 [info] <0.5496.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping registration.
2018-05-04 23:46:31.002 [info] <0.5496.0> Priority queues enabled, real BQ is rabbit_variable_queue
2018-05-04 23:46:31.012 [info] <0.5545.0> Starting rabbit_node_monitor
2018-05-04 23:46:31.084 [info] <0.5496.0> Management plugin: using rates mode 'basic'
2018-05-04 23:46:31.086 [info] <0.5579.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2018-05-04 23:46:31.097 [info] <0.5579.0> Starting message stores for vhost '/'
2018-05-04 23:46:31.098 [info] <0.5583.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2018-05-04 23:46:31.102 [info] <0.5579.0> Started message store of type transient for vhost '/'
2018-05-04 23:46:31.102 [info] <0.5586.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2018-05-04 23:46:31.116 [info] <0.5579.0> Started message store of type persistent for vhost '/'
2018-05-04 23:46:31.144 [info] <0.5650.0> started TCP Listener on [::]:5672
2018-05-04 23:46:31.144 [info] <0.5496.0> Setting up a table for connection tracking on this node: 'tracked_connecti...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.145 [info] <0.5496.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_pe...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.146 [info] <0.33.0> Application rabbit started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.150 [info] <0.33.0> Application rabbitmq_amqp1_0 started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.155 [info] <0.5659.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 10 seconds.
2018-05-04 23:46:31.155 [info] <0.33.0> Application rabbitmq_peer_discovery_common started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.156 [info] <0.33.0> Application rabbitmq_shovel started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.157 [info] <0.33.0> Application rabbitmq_federation started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.157 [info] <0.33.0> Application rabbitmq_consistent_hash_exchange started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.157 [info] <0.33.0> Application rabbitmq_peer_discovery_k8s started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.165 [info] <0.33.0> Application rabbitmq_management_agent started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.166 [info] <0.33.0> Application cowboy started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.166 [info] <0.33.0> Application rabbitmq_web_dispatch started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.233 [info] <0.5726.0> Management plugin started. Port: 15672
2018-05-04 23:46:31.233 [info] <0.5832.0> Statistics database started.
2018-05-04 23:46:31.238 [info] <0.33.0> Application rabbitmq_management started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.238 [info] <0.33.0> Application rabbitmq_federation_management started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.239 [info] <0.33.0> Application rabbitmq_shovel_management started on node 'rab...@rabbitmq-ha-1.rabbitmq-ha-discovery.default.svc.cluster.local'
2018-05-04 23:46:31.813 [info] <0.5843.0> accepting AMQP connection <0.5843.0> (10.0.11.10:37640 -> 10.0.5.28:5672)
2018-05-04 23:46:31.819 [info] <0.5843.0> connection <0.5843.0> (10.0.11.10:37640 -> 10.0.5.28:5672): user 'username' authenticated and granted access to vhost '/'
 completed with 11 plugins.
2018-05-04 23:46:31.891 [info] <0.5.0> Server startup complete; 11 plugins started.
 * rabbitmq_shovel_management
 * rabbitmq_federation_management
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * rabbitmq_management_agent
 * rabbitmq_peer_discovery_k8s
 * rabbitmq_consistent_hash_exchange
 * rabbitmq_federation
 * rabbitmq_shovel
 * rabbitmq_peer_discovery_common
 * rabbitmq_amqp1_0
2018-05-04 23:46:44.473 [info] <0.5865.0> accepting AMQP connection <0.5865.0> (10.0.11.8:42962 -> 10.0.5.28:5672)

Brian Wawok

May 4, 2018, 9:36:50 PM5/4/18
to rabbitmq-users

I turned all logging up to debug and am still not seeing any discovery attempts...

2018-05-05 01:33:16.827 [debug] <0.5647.0> Peer discovery: checking for partitioned nodes to clean up.
2018-05-05 01:33:16.828 [debug] <0.5647.0> Peer discovery: all known cluster nodes are up.

Michael Klishin

May 4, 2018, 10:39:20 PM5/4/18
to rabbitm...@googlegroups.com
There is a documentation section that mentions when peer discovery will not work: http://www.rabbitmq.com/cluster-formation.html#peer-discovery.

log.file.level = debug
log.console.level = debug

will log every HTTP API request the peer discovery backend makes.

According to the provided log, discovery was not performed, very likely because
this node has already been initialized and is possibly already a cluster member.
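When a node has cached cluster state, one way to make it run peer discovery again is to reset it. This sketch is not from the thread; it assumes the standard rabbitmqctl CLI and that losing the node's local data (queues, messages) is acceptable:

```shell
# Stop the RabbitMQ application but leave the Erlang VM running.
rabbitmqctl stop_app

# Wipe the node's local database, including its memory of cluster peers.
# force_reset works even if the node cannot contact its old cluster.
rabbitmqctl force_reset

# On restart the node boots "blank" and runs peer discovery again.
rabbitmqctl start_app
```

In Kubernetes the equivalent is often deleting the persistent volume backing /var/lib/rabbitmq/mnesia before restarting the pod, so the node comes up without cached cluster data.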




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Brian Wawok

May 5, 2018, 10:22:49 AM5/5/18
to rabbitmq-users
You called it: it was cached cluster data on each node.

I wish it logged a bit more at INFO on startup ;)


