How do I restart a cluster after an all-node failure?


Alexander

Jul 6, 2016, 10:07:00 AM
to rabbitmq-users
Hello!
I have a cluster of 4 disc nodes on 4 virtual machines.
When all 4 of them crash at once (as in a power outage), they do not rejoin the cluster on restart, because no node knows which node was the last one running.
So I have to delete the /var/lib/rabbitmq/mnesia directory on each of them and re-cluster manually.
But then all the messages stored in Mnesia are lost!
Is there any way to bring the cluster back up after such a failure without deleting the Mnesia data?
Maybe by changing some entries in Mnesia on restart?

Configuration: RabbitMQ 2.8.2 (due to corporate requirements), Ubuntu 14.04.
Thanks! 

Michael Klishin

Jul 6, 2016, 10:22:46 AM
to rabbitm...@googlegroups.com
The docs describe how to do rolling restarts. If a node cannot be recovered it should be removed from the cluster.
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander

Jul 6, 2016, 10:39:15 AM
to rabbitmq-users
Could you give a link, please? I haven't found it in the docs.

On Wednesday, July 6, 2016 at 17:22:46 UTC+3, Michael Klishin wrote:

Michael Klishin

Jul 6, 2016, 10:41:16 AM
to rabbitm...@googlegroups.com
"Breaking up a cluster", "Upgrading clusters" on

V Z

Jul 7, 2016, 2:03:09 AM
to rabbitmq-users
Would the answer to this question be to use 'rabbitmqctl force_boot' (per http://www.rabbitmq.com/clustering.html) on one of the nodes, and then start the rest?
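
A minimal sketch of the sequence suggested above, assuming the node chosen for `force_boot` is the one most likely to have been running last. The host names (`rabbit1` and so on) and the use of `rabbitmq-server -detached` to start each node are illustrative assumptions, not taken from this thread. The function only builds and prints a plan; it does not run anything.

```python
from typing import List, Tuple


def force_boot_plan(last_node: str, other_nodes: List[str]) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in the order they would be run.

    force_boot is issued while the chosen node is still stopped, so that
    it starts without waiting for its peers; the remaining nodes are
    started afterwards.
    """
    plan = [
        (last_node, "rabbitmqctl force_boot"),
        (last_node, "rabbitmq-server -detached"),
    ]
    # The remaining nodes are started normally once the forced node is up.
    plan += [(host, "rabbitmq-server -detached") for host in other_nodes]
    return plan


if __name__ == "__main__":
    # Hypothetical host names, used only for illustration.
    for host, command in force_boot_plan("rabbit1", ["rabbit2", "rabbit3", "rabbit4"]):
        print(f"{host}: {command}")
```

Each command would be run on the host named next to it, for example over SSH; how the nodes are actually started depends on how RabbitMQ was installed.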

Michael Klishin

Jul 7, 2016, 4:27:49 AM
to rabbitm...@googlegroups.com, V Z
That, too.  

On 7 July 2016 at 09:03:12, V Z (uvzu...@gmail.com) wrote:
> Would the answer to this question be to use 'rabbitmqctl force_boot' (per http://www.rabbitmq.com/clustering.html)
> on one of the nodes, and then start the rest?

--
MK

Staff Software Engineer, Pivotal/RabbitMQ


V Z

Jul 7, 2016, 5:28:31 PM
to rabbitmq-users, uvzu...@gmail.com
Is one option better than another?

Joseph Casale

Jul 7, 2016, 8:26:01 PM
to rabbitm...@googlegroups.com, uvzu...@gmail.com
On Thu, Jul 7, 2016 at 3:28 PM, V Z <uvzu...@gmail.com> wrote:
> Is one option better than another?

Check the logs and infer which node was the master, and thus the likely
holder of the most up-to-date queue; force it up and reset the rest.
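
One rough way to turn "check the logs" into a decision is to compare the time of the last log entry on each node and force up the node that logged most recently. This is a heuristic swapped in for illustration, not a way to identify the queue master, and after a simultaneous power loss the timestamps may be too close to be conclusive. The node names and timestamps below are made up, and extracting timestamps from the actual log files is left out.

```python
from datetime import datetime
from typing import Dict


def pick_node_to_force(last_log_entry: Dict[str, datetime]) -> str:
    """Return the node whose log shows the most recent activity.

    last_log_entry maps a node name to the timestamp of the final line
    found in that node's log before the crash.
    """
    if not last_log_entry:
        raise ValueError("no nodes given")
    return max(last_log_entry, key=lambda node: last_log_entry[node])


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    observed = {
        "rabbit1": datetime(2016, 7, 6, 9, 59, 58),
        "rabbit2": datetime(2016, 7, 6, 10, 0, 3),
        "rabbit3": datetime(2016, 7, 6, 10, 0, 1),
    }
    print(pick_node_to_force(observed))
```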

Rabbit Mq

Jul 7, 2016, 11:17:49 PM
to rabbitmq-users
Is either way to recover less destructive?

Alexander

Jul 8, 2016, 2:58:24 AM
to rabbitmq-users, uvzu...@gmail.com
Thank you guys, but all of that applies to newer versions. There is no "force_boot" command in rabbitmqctl 2.8.2, nor are there commands like 'forget_cluster_node' and similar. When upgrading, you still need to know which disc node was the last one running and make it the upgrader.
Breaking up the cluster with "reset" and "force_reset" has no effect if all nodes crashed at once and no node knows which one was the last. Moreover, if I'm not mistaken, the reset/force_reset command returns the Mnesia database to its virgin state, meaning all persistent messages will be lost.

On Thursday, July 7, 2016 at 11:27:49 UTC+3, Michael Klishin wrote:

V Z

Jul 9, 2016, 12:42:06 AM
to rabbitmq-users
Does force_boot reset Mnesia?