Our RabbitMQ deployment is a simple two-node cluster on two hosts with HA queues. We need to decide how to handle network partitions when a network failure occurs. I am testing the autoheal option and have a few observations, and I would like to know the recommended configuration for handling partitions. Our applications behave erratically during a partition, so we cannot simply ignore a partition event.
My test setup:
1. RabbitMQ cluster nodes rabbit_01 and rabbit_02 (rabbit_01, rabbit_02 are two Ubuntu VMs on a host-only network).
2. cluster_partition_handling = autoheal
3. No RabbitMQ clients were running during the test.
4. RabbitMQ Broker version: 3.6.1
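For reference, on 3.6.x this setting goes in the classic Erlang-term config file (by default /etc/rabbitmq/rabbitmq.config on Ubuntu); the `key = value` sysctl-style format only exists in newer releases. A minimal config fragment for my setup:

```erlang
%% /etc/rabbitmq/rabbitmq.config (classic Erlang-term format used by 3.6.x)
[
  {rabbit, [
    %% Valid modes: ignore | pause_minority | {pause_if_all_down, ...} | autoheal
    {cluster_partition_handling, autoheal}
  ]}
].
```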
Failure conditions tested:
1. Disconnect network adapters for rabbit_02.
2. Tail rabbitmq logs and/or monitor the web console to be notified of a partition.
3. The autoheal option forces rabbit_02 to be frozen / disabled.
4. Reconnect network adapters for rabbit_02.
5. Autoheal option results in rabbit_02 rejoining the cluster.
The above behavior is not consistent, though: it worked in only 3 out of 5 attempts. In the failing runs, step 3, where autoheal restarts the RabbitMQ broker on rabbit_02, fails and the node never recovers. I had to restart the RabbitMQ service manually to restore the broker and the cluster. Has anyone faced this issue before?
Questions:
1. Is there something I am missing in the above configuration for autoheal?
2. What is the best partition handling configuration for a two node cluster?
3. At this time, we are OK with loss of service on one node, as long as the system detects the partition quickly and forces all clients to connect to a single node.
4. Is there any way to programmatically detect a network partition in our clients? We have clients written in C, C#, and Java running on this system, and they are all part of the same software system.
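On question 4, one option we are considering: the management plugin's HTTP API reports a `partitions` list for each node at `/api/nodes`, which a client in any language can poll. Below is a rough Java sketch; the host `rabbit_01`, port 15672, and guest credentials are assumptions from my test setup, and the string check is deliberately crude (a real client should use a JSON parser):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class PartitionCheck {

    // A node entry with a non-empty "partitions" array means that node
    // currently sees itself partitioned from at least one cluster peer.
    // Crude substring check; replace with proper JSON parsing in production.
    static boolean hasPartition(String nodesJson) {
        return nodesJson.contains("\"partitions\":[\"");
    }

    public static void main(String[] args) throws Exception {
        // Assumed management endpoint and credentials for the test setup above.
        URL url = new URL("http://rabbit_01:15672/api/nodes");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String auth = Base64.getEncoder()
                .encodeToString("guest:guest".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        System.out.println(hasPartition(body.toString())
                ? "PARTITION DETECTED" : "no partition reported");
    }
}
```

Polling this from each client (or from a watchdog process) would let us detect a partition quickly and steer all clients to a single node, which matches the requirement in point 3.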