Hi Michael,
Could you clarify what you meant by "partially" resolved? We're running a cluster on 3.4.4 and recently hit a partition after restarting a node (which should have been sent a SIGTERM). I'm wondering whether it is related to this bug.
From the logs (below), it looks to me like a falsely detected partition:
nodeX restarted
nodeA and nodeB log nodeX as being down
nodeA and nodeB log nodeX as being up
nodeA and nodeB find that the other can talk to nodeX so they disconnect from each other.
I haven't found any mention of partitions in the logs for nodeX at the time, and there were three other nodes.
Is it possible that there's a bug with partition detection on fast restarts?
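To make the question concrete, here is a toy Python model of the rule as the "Partial partition detected" report words it. It is only a sketch and not RabbitMQ's actual implementation; the function name and its arguments are made up for illustration. If nodeX has already reconnected to nodeB by the time nodeA compares views, nodeA would see nodeB as a peer that can reach a node it just saw go DOWN:

```python
def peers_to_disconnect(me, saw_down_from, visible_peers, peer_views):
    """Toy model of the rule as worded in the log report (hypothetical helper).

    peer_views maps each peer to the set of nodes that peer says it can see.
    Returns the peers `me` would drop: those still visible to `me` that
    report seeing the node `me` saw go DOWN.
    """
    return sorted(
        p
        for p in visible_peers
        if p != me
        and p != saw_down_from
        and saw_down_from in peer_views.get(p, set())
    )
```

With nodeA having seen DOWN from nodeX while nodeB already reports nodeX as visible, this returns nodeB, which matches what nodeA logged; the same happens symmetrically on nodeB.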
Thanks,
Paul
On nodeX:
=INFO REPORT==== 28-Feb-2015::21:41:19 === Setting permissions...
=INFO REPORT==== 28-Feb-2015::21:41:26 === Starting RabbitMQ 3.4.4 on Erlang R16B Copyright (C) 2007-2014 GoPivotal, Inc. Licensed under the MPL. See http://www.rabbitmq.com/
=INFO REPORT==== 28-Feb-2015::21:41:26 === ...
=INFO REPORT==== 28-Feb-2015::21:41:26 === Limiting to approx 99900 file handles (89908 sockets)
=INFO REPORT==== 28-Feb-2015::21:41:29 === Memory limit set to 72471MB of 96628MB total.
=INFO REPORT==== 28-Feb-2015::21:41:29 === Disk free limit set to 50MB
On nodeA:
=INFO REPORT==== 28-Feb-2015::21:41:28 === node 'rabbit@nodeX' down: connection_closed
=INFO REPORT==== 28-Feb-2015::21:41:28 === node 'rabbit@nodeX' up
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueA' in vhost '/': Master <rabbit@nodeA> saw deaths of mirrors <rabbit@nodeX>
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueB' in vhost '/': Slave <rabbit@nodeA> saw deaths of mirrors <rabbit@nodeX>
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueC' in vhost '/': Master <rabbit@nodeA> saw deaths of mirrors <rabbit@nodeX>
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueD' in vhost '/': Slave <rabbit@nodeA> saw deaths of mirrors <rabbit@nodeX>
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueD' in vhost '/': Promoting slave <rabbit@nodeA> to master
=INFO REPORT==== 28-Feb-2015::21:41:28 === Mirrored queue 'queueE' in vhost '/': Slave <rabbit@nodeA> saw deaths of mirrors <rabbit@nodeX>
=ERROR REPORT==== 28-Feb-2015::21:41:28 === Partial partition detected:
 * We saw DOWN from rabbit@nodeX
 * We can still see rabbit@nodeB which can see rabbit@nodeX
We will therefore intentionally disconnect from rabbit@nodeB
On nodeB:
=INFO REPORT==== 28-Feb-2015::21:41:28 === node 'rabbit@nodeX' down: connection_closed
=INFO REPORT==== 28-Feb-2015::21:41:28 === node 'rabbit@nodeX' up
=ERROR REPORT==== 28-Feb-2015::21:41:28 === Partial partition detected:
 * We saw DOWN from rabbit@nodeX
 * We can still see rabbit@nodeA which can see rabbit@nodeX
We will therefore intentionally disconnect from rabbit@nodeA
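
In case it helps anyone check their own logs for the same pattern, below is a rough Python sketch that scans log lines for a node reported down and then up again within a short window. It assumes each report sits on a single line in the flattened form shown above; the function name `find_flaps` and the one-second default are my own choices, not anything RabbitMQ provides.

```python
import re
from datetime import datetime

# Report header as it appears in the excerpts above, e.g.
# "=INFO REPORT==== 28-Feb-2015::21:41:28 === node 'rabbit@nodeX' up"
REPORT = re.compile(
    r"=(?P<level>\w+) REPORT==== "
    r"(?P<ts>\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) === (?P<msg>.*)"
)
NODE_EVENT = re.compile(r"node '(?P<node>[^']+)' (?P<state>down|up)\b")


def find_flaps(lines, max_gap_seconds=1):
    """Return (node, down_time, up_time) for each down followed by an up
    for the same node within max_gap_seconds (hypothetical helper)."""
    last_down = {}
    flaps = []
    for line in lines:
        report = REPORT.match(line.strip())
        if not report:
            continue
        event = NODE_EVENT.match(report.group("msg"))
        if not event:
            continue
        ts = datetime.strptime(report.group("ts"), "%d-%b-%Y::%H:%M:%S")
        node = event.group("node")
        if event.group("state") == "down":
            last_down[node] = ts
        elif node in last_down:
            down_ts = last_down.pop(node)
            if (ts - down_ts).total_seconds() <= max_gap_seconds:
                flaps.append((node, down_ts, ts))
    return flaps
```

Feeding it the nodeA or nodeB excerpt above (one report per list element) should flag rabbit@nodeX as going down and coming back up within the same second.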