Unable to get messages from mirrored queues in partitioned mode

Dev Imagicle

Apr 7, 2015, 7:14:02 AM
to rabbitm...@googlegroups.com
Hi all,
I have a cluster of 4 nodes; the network connections among the nodes are stable enough, but a network partition may still occur.
I have mirrored queues and I don't want to lose any message published to these queues (I can tolerate duplicates); therefore I set the cluster_partition_handling property to ignore, so that I can deal with partition recovery through a dedicated tool that backs up messages before restarting the RabbitMQ service.
Sometimes I'm not able to get messages from the mirrored queues after the network connection is restored and RabbitMQ is working in partitioned mode: in particular, basic.get and basic.consume hang. This happens on all nodes of the cluster.
In this situation all nodes are partitioned from each other, but the mirrored queues are only partially replicated; none of the queues is down or crashed.
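The relevant part of my configuration looks roughly like this (a sketch in the classic Erlang-terms rabbitmq.config format; the file layout is illustrative):

    %% rabbitmq.config sketch: partition handling set to ignore so that my own
    %% tool can back up messages before restarting the service
    [
      {rabbit, [
        {cluster_partition_handling, ignore}
      ]}
    ].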

Could you help me understand whether this is expected behaviour (I hope not) or a bug? Is there a safe way to get messages from mirrored queues when the broker is working in partitioned mode?

Thanks and Regards,
Riccardo

Michael Klishin

Apr 7, 2015, 8:46:49 AM
to rabbitm...@googlegroups.com, Dev Imagicle
On 7 April 2015 at 14:14:04, Dev Imagicle (dev.im...@gmail.com) wrote:
> Could you help me understand whether this is expected behaviour
> (I hope not) or a bug? Is there a safe way to get messages
> from mirrored queues when the broker is working in partitioned
> mode?

There is no "merge two divergent data sets" handling strategy, so in all existing ones the "losing"
part of the partition (the minority) re-syncs from the "winner" and is not available until that happens.

AP mode for clustering is coming some time in the future but no promises on the date. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Dev Imagicle

Apr 7, 2015, 10:07:56 AM
to rabbitm...@googlegroups.com, dev.im...@gmail.com
Thank you Michael for your quick reply.
I'm not sure I fully understood your answer. Are you telling me there is no way to get messages from a mirrored queue once the network connection is restored and a partition has been detected?

I'll try to explain my scenario and what I want to do in order to avoid message loss.
In my 4-node cluster (A, B, C, D) I have a mirrored queue. I disconnect two nodes (A, B) from the others (C, D), then I publish messages M1 and M2 on A and M3 and M4 on C. I don't consume any messages.
When I reconnect the network cable and the management plugin says "Network partition detected", my tool decides that A and B belong to the winning partition while C and D are the losers and will be restarted.
Before restarting C and D I try to read M3 and M4 from the mirrored queue in order to save them, but both basic.get and basic.consume get stuck.
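For illustration, the draining attempt looks roughly like the sketch below (written with Python/pika purely as an example, since we actually use the .NET client; hostname and queue name are placeholders). The idea is to connect directly to one of the losing nodes with a short socket timeout, so a hung basic.get surfaces as an error instead of blocking forever:

    import pika

    # Sketch only: connect directly to node C and try to drain the mirrored queue.
    # Hostname, credentials and queue name are placeholders.
    params = pika.ConnectionParameters(
        host="node-c.example.com",
        socket_timeout=5,  # fail fast instead of hanging indefinitely
    )
    connection = pika.BlockingConnection(params)
    channel = connection.channel()

    while True:
        method, properties, body = channel.basic_get(queue="my_mirrored_queue", auto_ack=False)
        if method is None:
            break  # no more ready messages (or this node cannot serve the get)
        print("backed up message:", body)
        # deliberately not acked: the goal is only to copy the messages before the restart

    connection.close()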

Is there any way to get M3 and M4 before restarting the losing nodes C and D? 

Thanks,
Riccardo

Michael Klishin

Apr 7, 2015, 10:34:41 AM
to rabbitm...@googlegroups.com, Dev Imagicle
On 7 April 2015 at 17:07:57, Dev Imagicle (dev.im...@gmail.com) wrote:
> Thank you Michael for your quick reply.
> I'm not sure I fully understood your answer. Are you telling me
> there is no way to get messages from a mirrored queue once the
> network connection is restored and a partition has been detected?

There is. The minority of nodes will reset and sync with the majority.

> Is there any way to get M3 and M4 before restarting the losing nodes C and D? 

It depends on which node the queue master resides on. If it's A or B, then C and D will have to

 * Detect a partition
 * Re-connect
 * Reset and re-sync

then they can be used.
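
If you need to check programmatically which node currently holds the queue master and whether a node still reports a partition, one option is the management plugin's HTTP API; a rough sketch (host, credentials and queue name are placeholders):

    import requests

    base = "http://node-a.example.com:15672/api"   # any reachable node's management API
    auth = ("guest", "guest")                      # placeholder credentials

    # which node currently hosts the master of a given queue (default vhost "/")
    q = requests.get(f"{base}/queues/%2F/my_mirrored_queue", auth=auth).json()
    print("queue master:", q["node"])

    # which peers each node considers itself partitioned from
    for node in requests.get(f"{base}/nodes", auth=auth).json():
        print(node["name"], "partitioned from:", node.get("partitions", []))

rabbitmqctl cluster_status shows the same partition information from the command line.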

In pause_minority mode, nodes in the minority will drop their client connections and refuse to accept new ones, forcing clients to re-connect to
the winning side.

In autoheal mode, nodes will restart and attempt to re-connect. This drops existing client connections, but new connections are accepted.

If you see basic.consume hanging on C and D, it means the queue you're consuming from currently has its master on A or B,
but C/D are re-connecting OR haven't yet noticed that the network connection is dead. Yes, that detection does not happen immediately: on Linux
it takes 75 * 9 seconds by default [1], because the Linux defaults are absolutely out of touch with the times.

Erlang nodes have their own mechanism for detecting down peers; see [2]. With RabbitMQ the default value is 30 (seconds).
Do not set it below 3; the risk of false positives would be quite high.

1. http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
2. https://www.rabbitmq.com/nettick.html
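
For reference, the tick time is configured through the kernel application section of rabbitmq.config; a rough sketch (the value shown is just an example, not a recommendation):

    %% rabbitmq.config sketch: net_ticktime lives under the kernel application
    [
      {kernel, [
        {net_ticktime, 60}   %% seconds; example value only
      ]},
      {rabbit, []}
    ].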

Dev Imagicle

Apr 9, 2015, 6:03:43 AM
to rabbitm...@googlegroups.com, dev.im...@gmail.com
I have this configuration:
4 nodes running Windows Server 2008 R2 Enterprise (HA-TEST1, HA-TEST2, MARCOSRV1, MARCOSRV2)
RabbitMQ 3.5.0
Erlang 17.1
{cluster_partition_handling, ignore}
.NET client 3.5.0

The 4 nodes are connected to the same switch.
After some hours I checked the status of the cluster and found it in the network partition reported below; during this time no network failures occurred.

Node                        Was partitioned from
imagiclerabbit@HA-TEST1     imagiclerabbit@MARCOSRV2
imagiclerabbit@MARCOSRV1    imagiclerabbit@MARCOSRV2
imagiclerabbit@MARCOSRV2    imagiclerabbit@HA-TEST1, imagiclerabbit@MARCOSRV1

The 4 nodes report the same information about the network partition, while the node status is different:
- HA-TEST1 sees MARCOSRV2 offline
- HA-TEST2 sees all nodes online
- MARCOSRV1 sees MARCOSRV2 offline
- MARCOSRV2 sees both HA-TEST1 and MARCOSRV1 offline

The mirrored queue status is reported below:

Queue                      Master node                   Features  Policy  State    Ready  Unacked  Total
ha_FaxJob                  imagiclerabbit@MARCOSRV2      D         ha-all  running  0      0        0
ha_IncomingFax             imagiclerabbit@HA-TEST1 +1    D         ha-all  running  0      0        0
ha_IncomingFaxPrint        imagiclerabbit@MARCOSRV1 +1   D         ha-all  running  0      0        0
ha_MailNotification        imagiclerabbit@MARCOSRV1 +1   D         ha-all  running  0      2        2
ha_Retry                   imagiclerabbit@MARCOSRV2 +1   D         ha-all  running  0      0        0
ha_RetryIncomingFaxPrint   imagiclerabbit@MARCOSRV2 +1   D         ha-all  running  0      0        0
ha_RetryMailNotification   imagiclerabbit@HA-TEST1 +1    D         ha-all  running  0      0        0
ha_Submit                  imagiclerabbit@HA-TEST1 +1    D         ha-all  running  0      0        0

(All message rates reported were 0.00/s.)
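
For completeness, the same per-queue information (master node, mirrors, message counts) can also be pulled from the management HTTP API with a small script; a sketch only, with host and credentials as placeholders:

    import requests

    base = "http://ha-test1:15672/api"   # placeholder host
    auth = ("guest", "guest")            # placeholder credentials

    for q in requests.get(f"{base}/queues", auth=auth).json():
        print(q["name"],
              q["node"],                              # queue master
              q.get("slave_nodes", []),               # configured mirrors
              q.get("synchronised_slave_nodes", []),  # mirrors that are in sync
              q.get("messages"))                      # total messages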

In this state I have threads blocked in a basic.get operation.

Could you help me understand how the cluster fell into this state without any network failures?
I captured logs from the 4 nodes; you can find them attached. I suppose the problem started on MARCOSRV2; could you tell me what happened in my cluster?

Thanks,
Riccardo
Logs.zip

Michael Klishin

Apr 9, 2015, 6:40:16 AM
to Dev Imagicle, rabbitm...@googlegroups.com
There can be plenty of reasons: vMotion and similar operations look like temporary network interruptions to the runtime.
OS swapping can prevent VMs from responding to each other in time.

So check your infrastructure monitoring. 

MK