On 3 April 2015 at 05:29:38, Paul Ruan (paul...@dropbox.com) wrote:
> I'm trying to figure out what the recommended way is for recovering
> from a partition.
> From https://www.rabbitmq.com/partitions.html, it sounds
> like a reasonable way is to stop all the nodes and then start them
> up again:
> "It may be simpler to stop the whole cluster and start it again;
> if so make sure that the first node you start is from the trusted
> partition."
Note that any partition has two sides, and it should be fine to stop only the nodes
on the minority side. If you don't know which side that is and can afford to stop
all of them, stopping the whole cluster works too.
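To illustrate picking the nodes to stop, here is a minimal Python sketch. It assumes "minority" simply means every node outside the largest side, which may not match the side you actually trust; `minority_nodes` and the node names are hypothetical and not part of RabbitMQ.

```python
def minority_nodes(sides):
    """Return the nodes outside the largest side, sorted by name.

    `sides` is a list of sets of node names, one set per side of the
    partition. Assumption: the largest side is the one to keep running.
    """
    ranked = sorted(sides, key=len, reverse=True)
    if len(ranked) < 2:
        # Zero or one side means there is no partition to recover from.
        return []
    if len(ranked[0]) == len(ranked[1]):
        # Equal-sized sides: size alone cannot tell us which one to keep.
        raise ValueError("no strict majority; pick the trusted side by hand")
    return sorted(node for side in ranked[1:] for node in side)


if __name__ == "__main__":
    sides = [{"rabbit@a", "rabbit@b"}, {"rabbit@c"}]
    print(minority_nodes(sides))
```

If the sides are the same size the sketch refuses to guess, which corresponds to the "stop all of them" case.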
> For stopping each node, is it enough to do a rabbitmqctl stop_app?
> Or do we have to do a rabbitmqctl stop? Is it not necessary to reset
> the nodes in the untrusted partitions?
> Also, is it not recommended to stop nodes using SIGTERM?
When nodes on the minority side stop themselves, they do what is effectively a stop_app.
For manual intervention, `rabbitmqctl stop ${PID_FILE}` is also fine; that's what our
Debian init script uses, for example.
Note that if you stop all nodes, the last node to stop must be the first one to start,
so that it can act as a seed for the other nodes.
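The ordering rule can be sketched as follows. This is only an illustration: `restart_plan` and the node names are hypothetical, and it uses `rabbitmqctl -n <node> stop_app` / `start_app` as an assumption so that both halves can be expressed with rabbitmqctl. With a full `rabbitmqctl stop` you would start the nodes through your service manager instead, in the same order. The function only builds the command strings and does not run anything.

```python
def restart_plan(stop_order):
    """Build the commands for a full stop followed by a restart.

    `stop_order` lists node names in the order they will be stopped.
    Nodes are started in the reverse order, so the last node stopped
    is the first one started.
    """
    start_order = list(reversed(stop_order))
    stops = [f"rabbitmqctl -n {node} stop_app" for node in stop_order]
    starts = [f"rabbitmqctl -n {node} start_app" for node in start_order]
    return stops + starts


if __name__ == "__main__":
    for command in restart_plan(["rabbit@a", "rabbit@b", "rabbit@c"]):
        print(command)
```

Here `rabbit@c` is stopped last, so it is the first node in the start half of the plan.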
--
MK
Staff Software Engineer, Pivotal/RabbitMQ