RabbitMQ 3.7.7 sometimes fails to form cluster with rabbit_peer_discovery_classic_config

ferdinan...@gmail.com

Jul 16, 2018, 3:44:21 AM

to rabbitmq-users

Hi,

When initially forming a RabbitMQ cluster using rabbit_peer_discovery_classic_config and starting all nodes (Docker containers) at the same time, sometimes only two of the three nodes form a cluster while the third remains a standalone node.

Logs

2018-07-13 06:36:19.728 [info] <0.191.0> Configured peer discovery backend: rabbit_peer_discovery_classic_config
2018-07-13 06:36:19.728 [info] <0.191.0> Will try to lock with peer discovery backend rabbit_peer_discovery_classic_config
2018-07-13 06:36:19.728 [info] <0.191.0> Peer discovery backend does not support locking, falling back to randomized delay
2018-07-13 06:36:19.728 [info] <0.191.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping randomized startup delay.
2018-07-13 06:36:19.728 [info] <0.191.0> All discovered existing cluster peers: rabbit@dockerhost2, rabbit@dockerhost1, rabbit@dockerhost0
2018-07-13 06:36:19.729 [info] <0.191.0> Peer nodes we can cluster with: rabbit@dockerhost2, rabbit@dockerhost0
2018-07-13 06:36:19.764 [warning] <0.191.0> Could not auto-cluster with node rabbit@dockerhost2: {error,tables_not_present}
2018-07-13 06:36:19.770 [warning] <0.191.0> Could not auto-cluster with node rabbit@dockerhost0: {error,tables_not_present}
2018-07-13 06:36:19.771 [warning] <0.191.0> Could not successfully contact any node of: rabbit@dockerhost2,rabbit@dockerhost0 (as in Erlang distribution). Starting as a blank standalone node...

Used versions

RabbitMQ Version: 3.7.7

Erlang Version: 20.2.3

Used config

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
log.console.level = debug
cluster_formation.classic_config.nodes.1 = rabbit@dockerhost0
cluster_formation.classic_config.nodes.2 = rabbit@dockerhost1
cluster_formation.classic_config.nodes.3 = rabbit@dockerhost2

The issue looks like a race condition. However, https://www.rabbitmq.com/cluster-formation.html#initial-formation-race-condition states that the rabbit_peer_discovery_classic_config backend avoids race conditions by relying on a pre-configured set of peers. Adding a randomized sleep to the start script before starting RabbitMQ makes the problem less likely, depending on the actual delay.
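A minimal sketch of such a start-script delay, written as a Python wrapper around the server start command. The wrapper, its function names, and the 5 to 30 second default range are assumptions for illustration; `rabbitmq-server` is the standard start script, but how a given container image invokes it may differ.

```python
import os
import random
import time


def randomized_startup_delay(min_s: float, max_s: float, rng: random.Random) -> float:
    """Pick a startup delay uniformly from [min_s, max_s] seconds."""
    if not 0 <= min_s <= max_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    return rng.uniform(min_s, max_s)


def start_after_delay(cmd, min_s=5.0, max_s=30.0, rng=None,
                      sleep=time.sleep, execvp=os.execvp):
    """Sleep for a random delay, then replace this process with `cmd`.

    `sleep` and `execvp` are injectable so the logic can be exercised
    without actually waiting or starting RabbitMQ.
    """
    if rng is None:
        rng = random.Random()  # seeded from OS entropy, so containers differ
    delay = randomized_startup_delay(min_s, max_s, rng)
    print(f"waiting {delay:.1f}s before running {cmd[0]}", flush=True)
    sleep(delay)
    execvp(cmd[0], cmd)  # a real os.execvp never returns
    return delay  # reached only when execvp is stubbed out


# Typical use as a container entrypoint (hypothetical):
#     start_after_delay(["rabbitmq-server"])
```

Because `os.execvp` replaces the Python process image, the server keeps the wrapper's PID and receives the container's signals directly. Seeding from OS entropy rather than the clock matters here, since containers started at the same instant could otherwise draw the same delay.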

The problem occurs much more often with three nodes than with two, but it has been observed with two nodes as well.
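One way to see why more nodes make the problem more likely is a simplified model (an assumption, not how RabbitMQ itself behaves): each node starts at a time drawn uniformly from a range of `range_s` seconds, and two nodes "collide" if they start within a hypothetical `window_s` seconds of each other. The closed form below uses the classical spacing result for uniform points; `simulate` is a Monte Carlo cross-check.

```python
import random


def collision_probability(n_nodes: int, window_s: float, range_s: float) -> float:
    """P(at least two of n_nodes start within window_s seconds of each other).

    Model (an assumption): start times are independent and uniform over an
    interval of length range_s. By the classical spacing result, n such points
    are all at least w apart with probability (1 - (n - 1) * w / R) ** n
    when (n - 1) * w <= R, and with probability 0 otherwise.
    """
    if n_nodes < 2:
        return 0.0
    if range_s <= 0:
        return 1.0  # everyone starts at the same instant
    spread = (n_nodes - 1) * window_s / range_s
    if spread >= 1:
        return 1.0
    return 1.0 - (1.0 - spread) ** n_nodes


def simulate(n_nodes: int, window_s: float, range_s: float,
             trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, as a cross-check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, range_s) for _ in range(n_nodes))
        # the closest pair of sorted start times is always adjacent
        if any(b - a < window_s for a, b in zip(starts, starts[1:])):
            hits += 1
    return hits / trials
```

With a hypothetical 1-second window and a 25-second range (for example, delays between 5 and 30 seconds), `collision_probability(2, 1, 25)` is 0.0784 while `collision_probability(3, 1, 25)` is about 0.221, almost three times higher; with no delay at all (`range_s = 0`) every start collides. For small windows the probability falls roughly in inverse proportion to the range, but never reaches zero.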

Is this normal behavior?

Thanks

Ferdinand

Michael Klishin

Jul 17, 2018, 1:30:11 PM

to rabbitm...@googlegroups.com

The log suggests that the node in question could not sync tables from its peers. This can be a side effect of parallel booting.

Start your containers sequentially or increase the delay range to something like 5-30 seconds [1]. The plugin has a low default that is least annoying during development and may need adjustment depending on the specific deployment scenario.

You can also enable debug logging [2] to see what delay value is used.

1. Search for "randomized_startup_delay_range" on http://www.rabbitmq.com/cluster-formation.html

2. http://www.rabbitmq.com/cluster-formation.html#troubleshooting-cluster-formation

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

MK

Staff Software Engineer, Pivotal/RabbitMQ

ferdinan...@gmail.com

Jul 19, 2018, 3:21:48 AM

to rabbitmq-users

Configuring a randomized_startup_delay_range seems to have no effect when using the rabbit_peer_discovery_classic_config backend, see attached logs.

It states:

...Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping randomized startup delay.

The startup delay range was configured to 5-60 seconds, as visible at the top of the log file.

A randomized sleep in the start script seems necessary.

Ferdinand

rabbitmq.log

Reid Harrison

Jul 31, 2018, 1:13:03 AM

to rabbitmq-users

I am having this same issue from time to time with 3.7.7. The cluster doesn't always form correctly when using the old cluster_nodes config.

Reid Harrison

Aug 13, 2018, 10:52:06 AM

to rabbitmq-users

Michael, I have the same logs as Ferdinand. Randomized startup delay is not supported with the classic config backend (Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping randomized startup delay), and when two cluster nodes enter cluster formation simultaneously, both of them start as standalone nodes (Could not successfully contact any node of: rabbit@host1,rabbit@host2 (as in Erlang distribution). Starting as a blank standalone node...).

Is it feasible to support randomized startup delay for classic cluster config?

Michael Klishin

Aug 13, 2018, 6:14:48 PM

to rabbitm...@googlegroups.com

It can be added, but the right thing to do is to add cluster formation retries.
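A minimal sketch of what such retries could look like, assuming a hypothetical `try_join(peer)` callable that returns True once the node has joined that peer and False on errors such as `{error,tables_not_present}`. The function name, the retry limit of 10 and the 0.5-second interval are illustrative assumptions, not RabbitMQ's implementation or configuration.

```python
import time
from typing import Callable, Iterable, Optional


def join_with_retries(
    peers: Iterable[str],
    try_join: Callable[[str], bool],
    retry_limit: int = 10,
    retry_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try every peer in turn, repeating the whole round up to retry_limit times.

    Returns the peer that was joined, or None if all rounds failed, in which
    case the caller would fall back to starting as a standalone node.
    """
    peers = list(peers)
    for attempt in range(1, retry_limit + 1):
        for peer in peers:
            if try_join(peer):
                return peer
        if attempt < retry_limit:
            sleep(retry_interval_s)  # give peers time to finish booting
    return None
```

Only after every round has failed would the node start as a blank standalone node, instead of giving up after the single pass seen in the logs earlier in this thread.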
