Thanks, Michał!
I was able to get RabbitMQ to start after switching the peer discovery backend to "classic_config". However, I noticed some errors in the logs and wanted to confirm whether they are something we can ignore.
I'm using this config for my two-node (two-pod) RabbitMQ cluster:
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
cluster_formation.classic_config.nodes.2 = rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
I'm seeing this in the "rabbitmq-ha-0" log:
2021-05-14 07:16:25.028 [info] <0.271.0> Node database directory at /var/lib/rabbitmq/mnesia/rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2021-05-14 07:16:25.028 [info] <0.271.0> Configured peer discovery backend: rabbit_peer_discovery_classic_config
2021-05-14 07:16:25.028 [debug] <0.271.0> Peer discovery backend does not support initialisation
2021-05-14 07:16:25.028 [info] <0.271.0> Will try to lock with peer discovery backend rabbit_peer_discovery_classic_config
2021-05-14 07:16:25.028 [debug] <0.271.0> rabbit_peer_discovery:lock returned not_supported
2021-05-14 07:16:25.028 [info] <0.271.0> Peer discovery backend does not support locking, falling back to randomized delay
2021-05-14 07:16:25.028 [info] <0.271.0> Peer discovery backend rabbit_peer_discovery_classic_config supports registration.
2021-05-14 07:16:25.028 [debug] <0.271.0> Randomized startup delay: configured range is from 5000 to 60000 milliseconds, PRNG pick: 33446...
2021-05-14 07:16:25.028 [info] <0.271.0> Will wait for 33446 milliseconds before proceeding with registration...
2021-05-14 07:16:58.475 [info] <0.271.0> All discovered existing cluster peers: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
2021-05-14 07:16:58.475 [info] <0.271.0> Peer nodes we can cluster with: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
2021-05-14 07:16:58.483 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
2021-05-14 07:16:58.483 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 9 retries left...
2021-05-14 07:16:58.986 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
....
2021-05-14 07:17:02.509 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 1 retries left...
2021-05-14 07:17:03.011 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
2021-05-14 07:17:03.011 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 0 retries left...
2021-05-14 07:17:03.512 [warning] <0.271.0> Could not successfully contact any node of: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local (as in Erlang distribution). Starting as a blank standalone node...
But "rabbitmqctl cluster_status" shows both nodes as running:
sh-4.4$ rabbitmqctl cluster_status
Cluster status of node rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local ...
Cluster name: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
Running Nodes
rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
Listeners
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Is it safe to ignore these errors? Are they possibly just due to a timing issue, because the second RabbitMQ node/pod had not come online yet when the first one tried to join it?
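If it is a timing issue, I assume I could give the second pod more time by raising the peer discovery retry settings. The log above shows 10 attempts 500 ms apart (about 5 seconds in total), which I believe are the defaults. Something like the following is what I had in mind. The key names are from my reading of the RabbitMQ cluster formation docs and the values are just a guess on my part, so please correct me if either is wrong:

# Assumption: these keys control how often a failed join to discovered peers is retried
# 30 attempts x 1000 ms = roughly 30 seconds before starting as a standalone node (guessed values)
cluster_formation.discovery_retry_limit = 30
cluster_formation.discovery_retry_interval = 1000

Would that be worth doing, or is it unnecessary since the second node seems to join the cluster anyway?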
Thanks!