RabbitMQ HA queue mirroring - messages lost

Emma Niklasson

Apr 29, 2016, 10:41:24 AM
to rabbitmq-users
Hi everyone,

I recently experienced a network partition in a RabbitMQ cluster, which made all of server1's queues disappear from server2 and vice versa. I am now trying to set up mirrored queues to guard against this specific problem, but I have run into some issues.

A summary:

- In my test setup (using rabbitmq-server 3.3.4 on CentOS 6), I had one node (server1) with a bunch of queues containing messages; I added another node (server2) to this setup
- I mirrored the queues from server1 to server2 using a ha-mode:all policy, which seems to have worked; in the management interface I could see all queues being mirrored; I intentionally used manual syncing
- I then tried stopping rabbitmq-server on server1, and as expected, the pre-existing messages from server1 were no longer seen on server2, whereas the queues were still present and presumably functional
- I then rejoined server1 (service rabbitmq-server start) to server2, and found that all messages that had been on server1 had disappeared

I have a few questions surrounding this:

- Is it by design that server1 (my original master node) lost all its messages when it reconnected to server2? Is there a way to make the nodes "merge" their contents rather than overwrite each other's?
- Is it possible (with or without upgrading) to enable automatic syncing that does not cause the queue to stop responding? This would take care of some of my problems.
- What would be the most reliable setup possible for RabbitMQ at the moment? That is, what setup would be most tolerant of network and system failures?

Thanks!

Emma

Michael Klishin

Apr 29, 2016, 12:49:20 PM
to rabbitm...@googlegroups.com
What exactly happens when nodes reconnect after a network split depends on the partition handling strategy
in use, but yes, one side of the partition intentionally resets and syncs from the other.
There is currently no way to merge message sets, although we are interested in introducing some way of doing
this in a future version.

There is no one-size-fits-all suggestion for what would be "the most reliable." A lot of this depends on how your applications
react to e.g. RabbitMQ disconnecting clients when the minority side of a partition pauses itself.

Mirroring to more than half of the nodes, using an odd number of nodes, using publisher confirms with sensible
re-publishing in your apps, and using the autoheal partition handling strategy is one combination that our users
report as good enough for them but YMMV.
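
To make two pieces of that combination concrete, here is a minimal Python sketch, not a definitive implementation. The first part computes a policy that mirrors each queue to more than half of the nodes, using the `ha-mode: exactly` / `ha-params` policy keys, and builds the matching `rabbitmqctl set_policy` arguments. The second part shows a re-publish loop around publisher confirms. The policy name `ha-majority`, the pattern `^`, and the `publish` callable are assumptions for illustration: `publish` stands in for whatever your client library does to send one message and wait for the broker's confirm.

```python
import json
import time


def majority_policy(node_count):
    """Policy definition that mirrors each queue to more than half of the nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    replicas = node_count // 2 + 1  # smallest count that is a strict majority
    return {"ha-mode": "exactly", "ha-params": replicas}


def set_policy_command(node_count, name="ha-majority", pattern="^"):
    """Argument list for rabbitmqctl; name and pattern are illustrative choices."""
    definition = json.dumps(majority_policy(node_count))
    return ["rabbitmqctl", "set_policy", name, pattern, definition]


def publish_with_confirm(publish, message, max_attempts=5, delay=0.5):
    """Re-publish until the broker confirms, or give up after max_attempts.

    `publish` is a hypothetical callable: it returns True when the broker
    confirmed the message, False when it was nacked or timed out, and may
    raise ConnectionError if the connection dropped (e.g. during a partition).
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if publish(message):
                return attempt
        except ConnectionError:
            pass  # treat a dropped connection like a missing confirm
        if attempt < max_attempts:
            time.sleep(delay * attempt)  # simple linear back-off between tries
    raise RuntimeError("message not confirmed after %d attempts" % max_attempts)


if __name__ == "__main__":
    # For a three-node cluster this mirrors every queue to two nodes.
    print(" ".join(set_policy_command(3)))
```

Re-publishing an unconfirmed message can deliver it twice, so consumers should tolerate duplicates. The partition handling strategy is a separate setting: it is chosen with the `cluster_partition_handling` key (for example `autoheal`) in the RabbitMQ configuration file rather than in a policy.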

--
MK

Staff Software Engineer, Pivotal/RabbitMQ