We are running a RabbitMQ cluster with two nodes behind a load balancer.
Fault tolerance and no message loss are the key requirements.
For HA, we have the following policy in place:
Name: Lazy HA
Pattern: .*
Apply to: all
Definition:
ha-mode: exactly
ha-params: 2
ha-sync-mode: automatic
queue-master-locator: min-masters
queue-mode: lazy
We expect that at last every ~3 months we need to restart the cluster node by node, to apply OS level patches.
This should be transparent to the RabbitMQ clients.
Doing some test runs, simulating load while restarting the RabbitMQ cluster node by node we found that:
- No messages are lost (which is good!)
- Some messages are delivered a second time, where the second message has flag redelivered=true
Of course this is expected in case of a crash of a RabbitMQ node.
However, in case of a graceful shutdown my hope is to have no duplicates at all.
I.e. ideal scenario is: Cluster node 1 is finishing its work, close all connections, synchronizes status to cluster node 2.
Then cluster node 2 takes over, and there are no in-doubt messages to be delivered (again).
Question is if and how can that be achieved?
Do we have to shutdown RabbitMQ with different commands (not just rabbitmqctl shutdown)?
Best regards,
- Joachim