> Before 3.4.0 is out, you need to make sure that mirrors are synchronised
> before shutting down master in this experiment.
>
> With over 1M queues, sync can take some time but after that, published
> message that is routed to mirrored queue(s) will be delivered to mirror(s)
> before RabbitMQ confirms the publish.
Right, so I performed the following steps:
1. Set up a 2 node cluster, rabbit and rabbit2.
2. Apply a policy (deliberately not using automatic sync): rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'
3. Connect to rabbit, publish 100K messages that route to a queue named "ha.q1"
4. Manually perform sync to rabbit2
5. Shut down rabbit
6. See rabbit2 elected master for ha.q1
7. See ha.q1 still have 100K messages
8. Bring rabbit back
9. See rabbit become a mirror of ha.q1, sync it
10. Shut down rabbit2
11. Check that ha.q1 still has 100K messages
12. Bring rabbit2 back
13. See rabbit2 become a mirror of ha.q1, unsynchronised
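
For anyone repeating these steps, here is a rough sketch of how the "synchronised or not" check in steps 9 and 13 could be scripted. It assumes the tab-separated output of `rabbitmqctl list_queues name slave_pids synchronised_slave_pids`; the helper names and the exact output layout are my assumptions, so adjust for your version.

```python
import re

# Matches one Erlang pid as printed by rabbitmqctl, e.g. <rabbit2@host.3.1234.0>
PID_RE = re.compile(r"<[^<>]+>")


def parse_mirror_sync(listing):
    """Map queue name -> (mirror count, synchronised mirror count).

    `listing` is assumed to be the tab-separated output of
    `rabbitmqctl list_queues name slave_pids synchronised_slave_pids`.
    """
    result = {}
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            # Skips banner lines such as "Listing queues ..."
            continue
        name, mirrors, synced = fields
        result[name] = (len(PID_RE.findall(mirrors)), len(PID_RE.findall(synced)))
    return result


def fully_synchronised(listing, queue):
    """True if `queue` has at least one mirror and all of them are synchronised."""
    mirrors, synced = parse_mirror_sync(listing)[queue]
    return mirrors > 0 and synced == mirrors
```

Feed it the captured command output and the queue name, e.g. `fully_synchronised(output, "ha.q1")`. A queue with no mirrors at all is reported as not synchronised, which is the safe answer before a shutdown.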
Since rabbit2 is unsynchronised, it will not be elected master should rabbit shut down.
Let's try it:
14. Shut down rabbit
15. See ha.q1 master move to rabbit2 and have 0 messages
16. Bring back rabbit
17. Publish 100K messages that are routed to ha.q1
18. ha.q1 now has 100K messages, master is on rabbit2, mirror on rabbit
19. Ensure rabbit is synchronised
20. Shut down rabbit2
21. ha.q1 master is now on rabbit, with 100K messages enqueued
22. Bring rabbit2 back
23. ha.q1 master is on rabbit, with 1 mirror, still with 100K messages
I think this is enough evidence to suggest that you are indeed running into what's described in
http://next.rabbitmq.com/ha.html#cluster-shutdown and need to give slaves
some time to sync before shutting down master.
With 1.1M messages, sync takes a while on my 1-2 year old Core i7 machine with an SSD.
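
If you want to automate the "give them time to sync" part, something along these lines might do. This is only a sketch: `is_synchronised` is a hypothetical callable you supply (for example one that runs `rabbitmqctl list_queues name slave_pids synchronised_slave_pids` and inspects the result), and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_synchronised(is_synchronised, timeout=600.0, interval=5.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll `is_synchronised()` until it returns True or `timeout` seconds pass.

    Returns True if the mirrors caught up in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_synchronised():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The idea is to shut down the master only when this returns True, and to treat False as "do not stop this node yet".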