RabbitMQ Pause Minority Issue


Atul Gharat

Aug 14, 2018, 5:43:43 AM
to rabbitmq-users
Hi Team,

We have 3 RabbitMQ nodes configured in an HA cluster with the pause_minority cluster partition handling strategy.

We blocked traffic from node3 to node2 only, but node3 still went down, throwing the exception below:

ERROR:
2018-08-14 13:59:46.995 [error] <0.1353.0> Partial partition detected:
 * We saw DOWN from rabbit@node2
 * We can still see rabbit@node1 which can see rabbit@node2
 * pause_minority mode enabled
We will therefore pause until the *entire* cluster recovers
2018-08-14 13:59:46.995 [warning] <0.1353.0> Cluster minority/secondary status detected - awaiting recovery
2018-08-14 13:59:46.995 [info] <0.2253.0> RabbitMQ is asked to stop...


RabbitMQ Config file:

[
 {rabbit,
  [
   {cluster_partition_handling, pause_minority},
   {cluster_nodes, {['rabbit@node1', 'rabbit@node2', 'rabbit@node3'], disc}}
  ]}
].

Michael Klishin

Aug 14, 2018, 2:20:49 PM
to rabbitm...@googlegroups.com
This is called a partial partition; RabbitMQ promotes partial partitions to "full" partitions.
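To make the pause_minority arithmetic concrete, here is an illustrative sketch (not RabbitMQ's actual code) of the majority rule, and why the partial-partition promotion matters in the scenario above:

```python
def should_pause(reachable_peers: int, cluster_size: int) -> bool:
    """pause_minority rule (illustrative): a node pauses itself when the
    partition it belongs to (itself plus the peers it can reach) is not a
    strict majority of the cluster."""
    partition_size = reachable_peers + 1  # count this node itself
    return partition_size * 2 <= cluster_size

# Pure majority math: node3 still sees node1, so it is in a partition of
# 2 out of 3 nodes and would NOT pause on that basis alone.
assert should_pause(reachable_peers=1, cluster_size=3) is False

# But node3 observed a *partial* partition (it lost node2 while node1
# could still see node2). As the log shows, that case gets special
# handling: the detecting node pauses until the entire cluster recovers,
# which is why node3 stopped despite technically being in the majority.
```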




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

de...@novigado.ee

Aug 20, 2018, 8:32:33 AM
to rabbitmq-users
In my case there are 3 clustered brokers with HA queues, each queue having 1 master and 1 mirror on another node. Each node is in a separate AZ.

There was a situation similar to the one mentioned above (the link between 2 AZs went down), but in my case the 2 nodes that could not see each other entered paused-minority mode and rebooted themselves.

As a result, queues that had their master on one of those 2 nodes and their mirror on the other became completely unavailable.

I expected routing between AWS AZs to be more redundant, with communication rerouted over another AZ, but since that is not the case, I'm now looking for a solution to this kind of problem.

Are there any recommended best practices to mitigate this issue?
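The failure mode described above can be sketched as a simple availability check: a mirrored queue goes fully unavailable only when every node hosting one of its replicas has paused. This is an illustrative model (queue names, node names, and the helper are hypothetical, not RabbitMQ APIs):

```python
def unavailable_queues(queues: dict, paused: set) -> list:
    """Illustrative: a mirrored queue is unavailable when every node
    hosting a replica (master or mirror) is paused."""
    return [name for name, replicas in queues.items()
            if all(node in paused for node in replicas)]

# Hypothetical layout: each queue maps to (master_node, mirror_node).
queues = {
    "orders": ("rabbit@node1", "rabbit@node2"),
    "events": ("rabbit@node2", "rabbit@node3"),
}

# If node2 and node3 both pause (as in the AZ scenario above), any queue
# whose master and mirror both live on those nodes goes unavailable:
print(unavailable_queues(queues, {"rabbit@node2", "rabbit@node3"}))
# -> ['events']  ("orders" survives because node1 still hosts a replica)
```

With only two replicas per queue in a 3-node cluster, any partition that pauses two nodes can take some queues fully offline; mirroring to all nodes (or placing replicas so no single AZ-pair failure covers both) avoids this particular outage.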
