Restore a Cluster from a Node Crash

Navin Ilango

Mar 5, 2015, 5:18:21 PM
to rabbitm...@googlegroups.com
Hi,

A couple of questions:

1) I had a 3-node cluster (R1, R2, R3) up and running. I brought one node (R2) down to simulate a server crash. How do I remove the references to R2 from R1 and R3?
2) I brought R2 back up, but it was not able to join the cluster again because R1 and R3 still held references to R2.


Log while adding the crashed node back to the cluster. The message says R2 is already part of the cluster, but it's not:

[root@rabbitmq2 ~]# rabbitmqctl join_cluster --ram rabbit@rabbitmq1
Clustering node rabbit@rabbitmq2 with rabbit@rabbitmq1 ...
...done (already_member).
[root@rabbitmq2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq2 ...
[{nodes,[{disc,[rabbit@rabbitmq2]}]}]
...done.
[root@rabbitmq2 ~]#

Log while removing a crashed node from the cluster:

[root@rabbitmq1 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq1 ...
[{nodes,[{disc,[rabbit@rabbitmq1,rabbit@rabbitmq2,rabbit@rabbitmq3]}]},
 {running_nodes,[rabbit@rabbitmq3,rabbit@rabbitmq1]},
 {cluster_name,<<"rabbit@rabbitmq1">>},
 {partitions,[]}]
...done.
[root@rabbitmq1 ~]# rabbitmqctl forget_cluster_node rabbit@rabbit2
Removing node rabbit@rabbit2 from cluster ...
Error: {not_a_cluster_node,"The node selected is not in the cluster."}
[root@rabbitmq1 ~]# Connection to 127.0.0.1 closed by remote host.
Connection to 127.0.0.1 closed.

Michael Klishin

Mar 5, 2015, 5:27:17 PM
to Navin Ilango, rabbitm...@googlegroups.com
On 6 March 2015 at 01:18:24, Navin Ilango (navi...@gmail.com) wrote:
> 1) I had a 3-node cluster (R1, R2, R3) up and running. I brought
> one node (R2) down to simulate a server crash. How do I remove
> the references to R2 from R1 and R3?

rabbitmqctl forget_cluster_node
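
A minimal sketch of that step, assuming it is run on a surviving node (R1 or R3) while R2 is down. Note that the log above passes `rabbit@rabbit2`, whereas `cluster_status` lists the node as `rabbit@rabbitmq2`; the name has to match exactly, which may explain the `not_a_cluster_node` error. The `forget_node` wrapper and the `RABBITMQCTL` override are illustrative names, not part of RabbitMQ.

```shell
# Hypothetical helper: RABBITMQCTL can be set to "echo" for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

forget_node() {
    # $1: the dead node's name exactly as cluster_status prints it,
    #     e.g. rabbit@rabbitmq2 (not rabbit@rabbit2).
    $RABBITMQCTL forget_cluster_node "$1"
}
```

Usage on R1 would be `forget_node rabbit@rabbitmq2`, followed by `rabbitmqctl cluster_status` to confirm the node is gone from the `nodes` list.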

> 2) I brought R2 back up, but it was not able to join the cluster
> again because R1 and R3 still held references to R2.

If R2 comes back and can contact R1 or R3, it should be able to re-join the cluster.

Your log suggests you've removed it from the cluster. To make it re-join the cluster, reset it
and use join_cluster or automatic clustering:

http://www.rabbitmq.com/clustering.html#auto-config
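
The reset-and-rejoin sequence can be sketched as follows, assuming it is run on R2 and that R1 is reachable. `reset` wipes the node's local state, so this is only appropriate when nothing on R2 needs to be kept. The `rejoin_cluster` wrapper and the `RABBITMQCTL` override are illustrative names, not part of RabbitMQ.

```shell
# Hypothetical helper: RABBITMQCTL can be set to "echo" for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

rejoin_cluster() {
    # $1: a running cluster member to join through, e.g. rabbit@rabbitmq1
    $RABBITMQCTL stop_app &&            # stop the RabbitMQ application, keep the Erlang VM up
    $RABBITMQCTL reset &&               # return the node to a blank, unclustered state
    $RABBITMQCTL join_cluster "$1" &&   # add --ram here to join as a RAM node
    $RABBITMQCTL start_app              # start the application again
}
```

Usage on R2 would be `rejoin_cluster rabbit@rabbitmq1`; each step runs only if the previous one succeeded.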
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Navin Ilango

Mar 6, 2015, 10:04:49 AM
to rabbitm...@googlegroups.com, navi...@gmail.com
Thanks, Michael. What if R1 crashes, since that's the master node that R2 and R3 join to form the cluster?

Michael Klishin

Mar 6, 2015, 10:06:29 AM
to Navin Ilango, rabbitm...@googlegroups.com
On 6 March 2015 at 18:04:52, Navin Ilango (navi...@gmail.com) wrote:
> Thanks, Michael. What if R1 crashes, since that's the master node
> that R2 and R3 join to form the cluster?

The master will be moved to R2 or R3.

Navin Ilango

Mar 6, 2015, 1:03:36 PM
to rabbitm...@googlegroups.com, navi...@gmail.com
Yeah, it works; I simulated it. Will I lose the messages on the master queue if the server that hosts it dies? I still have 'ha-sync-mode' => 'automatic' in my policy, but I seem to have lost all the messages that were already in the queue.

Michael Klishin

Mar 6, 2015, 1:07:30 PM
to Navin Ilango, rabbitm...@googlegroups.com
On 6 March 2015 at 21:03:38, Navin Ilango (navi...@gmail.com) wrote:
> Will I lose the messages on the master queue if the server that
> hosts it dies?

For queues that are mirrored, you should not (at least normally, when mirrors are in sync – which
is their normal state when they are online). For non-mirrored ones, yes. That's why mirroring exists.
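
As a rough illustration of the mirrored case, here is a sketch of a policy that mirrors queues to every node with automatic synchronisation, plus a way to list which mirrors are in sync. The policy name `ha-all`, the catch-all pattern `^`, and `"ha-mode":"all"` are assumptions; only `ha-sync-mode` set to `automatic` comes from the thread. The function names and the `RABBITMQCTL` override are illustrative as well.

```shell
# Hypothetical helpers: RABBITMQCTL can be set to "echo" for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

enable_mirroring() {
    # Assumed policy: mirror every queue ("^") to all nodes and sync new mirrors automatically.
    $RABBITMQCTL set_policy ha-all "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
}

show_mirror_sync() {
    # Per queue: the mirrors that exist and the subset that is currently synchronised.
    $RABBITMQCTL list_queues name slave_pids synchronised_slave_pids
}
```

If a queue shows no mirrors in `show_mirror_sync`, it may not be matched by any mirroring policy, in which case losing its node loses its messages.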