Hi,
We are considering using RabbitMQ 3.6.2 in a cluster of 2 nodes, each in a different EC2 availability zone.
We would like one node (say node A in normal operation) to be the master for all HA queues and point our Java clients at a single address.
In the case of a failure of the master node A (where it cannot be restarted for whatever reason), we would like to manually fail over to the second node, B. This would involve a DNS-like mechanism so that the Java clients connect to the new 'master' IP address. We prioritize not losing messages over availability.
My understanding is that in normal operation (with our clients pointed only at node A), node A would be the master for all HA queues, unless we accidentally shut down node A first (in which case B becomes the master). In a network partition, if we chose autoheal for
cluster_partition_handling, node A's side might be chosen as the winning partition given the rules stated in
https://www.rabbitmq.com/partitions.html, but the result is uncertain (during an outage there might be no clients connected at all). We could use ignore mode, but that would require running commands manually whenever there is a partition (even if A is still running).
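For comparison, this is how we understand the two simpler modes would be set in the classic rabbitmq.config (Erlang-term) format used by 3.6.x; only one entry would be active at a time:

```erlang
%% rabbitmq.config: exactly one cluster_partition_handling entry is used.
[
  {rabbit, [
    %% Automatically pick a winning partition and restart the losing nodes.
    {cluster_partition_handling, autoheal}
    %% ...or leave partitions for an operator to resolve (the default):
    %% {cluster_partition_handling, ignore}
  ]}
].
```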
Given that what we'd like is to nominate node A as the master and then manually switch to node B (if node A could not be restarted), the pause_if_all_down mode with the listed nodes defined as [node A] seems ideal for us.
If there is a network partition, node A would always be the master when the partition ends, because B pauses during the partition. After any failure of B or a partition, node B would presumably sync up with A afterwards.
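For reference, this is roughly the rabbitmq.config we have in mind for both nodes in normal operation. 'rabbit@nodeA' is a placeholder for node A's real node name (EC2 hostnames usually contain dashes, so the atom is quoted), and the comments reflect our reading of the partitions page rather than confirmed behaviour:

```erlang
%% rabbitmq.config on both node A and node B (normal operation).
%% 'rabbit@nodeA' is a placeholder; substitute node A's actual node name.
[
  {rabbit, [
    %% Our reading: a node pauses when it cannot reach any listed node.
    %% Node A is the only listed node, so it should never pause, while
    %% node B pauses whenever it loses contact with A (including if A dies).
    %% The third element (autoheal or ignore) is only consulted if the
    %% listed nodes end up on different sides of a partition.
    {cluster_partition_handling,
      {pause_if_all_down, ['rabbit@nodeA'], autoheal}}
  ]}
].
```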
My questions are:
1. Are there any particular issues with 3.6.2 that mean we should avoid it in production, compared with other versions such as 3.5.x or other 3.6.x releases?
2. Is there an example of configuring this setting? For instance, is this correct? {cluster_partition_handling, {pause_if_all_down, [nodeA_name], autoheal}}
3. Is it possible to use the list of nodes in this setting to achieve this?
4. If so, what would be the process for changing the listed nodes in cluster_partition_handling to point at node B? Would it be to shut down all RabbitMQ servers, modify the configuration, and bring up node B first?
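To make question 4 concrete, this is the edit we imagine making on node B after deciding node A is lost ('rabbit@nodeB' is again a placeholder). We assume rabbitmq.config is only read at node start, so B would need a restart to pick it up; whether anything else is required, such as removing A from the cluster first, is what we're unsure about:

```erlang
%% Hypothetical rabbitmq.config on node B after a manual failover.
%% Only the listed node changes; we assume a restart of B is needed.
[
  {rabbit, [
    {cluster_partition_handling,
      {pause_if_all_down, ['rabbit@nodeB'], autoheal}}
  ]}
].
```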
Thanks for any advice! Dave