On 7 April 2015 at 17:07:57, Dev Imagicle (dev.im...@gmail.com) wrote:
> Thank you Michael for your quick reply.
> I'm not sure I have understood your answer correctly. Are you telling me
> there is no way to get messages from a mirrored queue when the network
> connection is restored and the partition is detected?
There is. The minority of nodes will reset and sync with the majority.
> Is there any way to get M3 and M4 before restarting the losing nodes C and D?
It depends on which node the queue master resides on. If it's A or B, then C and D will have to
* Detect a partition
* Re-connect
* Reset and re-sync
then they can be used.
In pause_minority mode, nodes in the minority will drop their client connections and refuse to accept new ones, which makes clients re-connect to
the winning side.
In autoheal mode, nodes will restart and attempt to re-connect. This drops existing client connections, but new connections are accepted.
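
On the client side, the re-connect behaviour this relies on could look roughly like the sketch below. It is only an illustration: `connect_to_any`, the `connect` callable and the node names are hypothetical and not part of any RabbitMQ client library. A real client would pass in its own library's connection function.

```python
import time


def connect_to_any(hosts, connect, attempts=3, delay=1.0):
    """Try each host in turn until one accepts the connection.

    `connect` is a hypothetical callable supplied by the caller (for example
    a thin wrapper around a client library). It must return a connection
    object, or raise OSError when the node refuses or drops the connection.
    """
    last_error = None
    for attempt in range(attempts):
        if attempt:
            time.sleep(delay)  # every node refused last round; back off a bit
        for host in hosts:
            try:
                return host, connect(host)
            except OSError as exc:
                # A paused or restarting node refuses us: try the next one.
                last_error = exc
    raise ConnectionError(f"no node accepted a connection: {last_error}")
```

Listing nodes from both sides of a possible partition in `hosts` lets the client end up on whichever side is still accepting connections.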
If you see basic.consume hanging on C and D, this means the queue you're consuming from currently has its master on A or B,
but C/D is either re-connecting or hasn't noticed yet that the network connection is dead. Yes, this does not happen immediately, and on Linux it
takes 75 * 9 = 675 seconds by default [1], because Linux defaults are absolutely out of touch with the times.
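
To put numbers on the keepalive delay: the Linux defaults are a 75 second probe interval and 9 probes, and probing only starts once the connection has been idle for `tcp_keepalive_time` (7200 seconds by default). Below is a minimal sketch of that arithmetic, plus one way a client could ask for shorter timers on its own socket. The function names and the example timer values are made up for illustration, and the `TCP_KEEP*` constants are not available on every platform.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate seconds before TCP keepalive gives up on a dead, idle peer."""
    return idle + interval * probes


def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Turn on keepalive for one socket with shorter timers (example values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options exist on Linux; skip any that this platform lacks.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


# Probe phase alone with the Linux defaults: 75 * 9 seconds.
probe_phase = keepalive_detection_seconds(0, 75, 9)
# Including the default 7200 s of idle time before the first probe.
total = keepalive_detection_seconds(7200, 75, 9)
```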
Erlang nodes have their own mechanism for detecting down peers (net_ticktime), see [2]. With RabbitMQ the default value is 30 (seconds).
Do not set it below 3: the risk of false positives would be quite high.
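
Why very low values are risky follows from the timing rule in the Erlang kernel documentation: an unresponsive peer is reported down somewhere between 75% and 125% of `net_ticktime`, because ticks are sent every quarter of that period. A rough sketch of the arithmetic, using a hypothetical helper name:

```python
def net_tick_window(ticktime):
    """Return (min, max) seconds before an unresponsive peer is considered down."""
    quarter = ticktime / 4  # ticks are exchanged every ticktime / 4 seconds
    return ticktime - quarter, ticktime + quarter


window_30 = net_tick_window(30)  # the value mentioned above
window_3 = net_tick_window(3)    # the suggested lower bound
```

With a value of 3, a peer can be declared down after as little as 2.25 seconds of silence, which a brief network or scheduling hiccup can easily exceed.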
1. http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
2. https://www.rabbitmq.com/nettick.html