Recovering a 2-node Ganeti 2.16.1 cluster on CentOS

Miguel Cordas

Jan 11, 2023, 8:23:14 AM
to ganeti
Hello,

We have a 2-node Ganeti 2.16.1 cluster with 2 VMs (DRBD).

Node 1's hard disk failed and the VMs became unresponsive. Bad idea: my friend rebooted both nodes…

Now node 2 (the master) is up and node 1 is dead. Every Ganeti command on node 2 times out, except “gnt-cluster getmaster”.

Our plan is to install CentOS 7 and Ganeti 2.16.1 on 2 or 3 more nodes.

We need some advice on the best path to recover the cluster and be able to start instances (mail server and ERP).

Also, since node 2 is the master, is there a way to start instances on node 2 while it is the only node left?

Thanks a lot for your advice, remarks and insights.

Miguel.


Apollon Oikonomopoulos

Jan 11, 2023, 8:32:32 AM
to gan...@googlegroups.com
Hi Miguel,

On 05:23 Wed 11 Jan , Miguel Cordas wrote:
> Hello,
>
> We have a 2 nodes Ganeti 2.16.1 cluster with 2 VMs (drbd).
>
> Bad idea : node 1 HD failed, VMs irresponsive. My friend rebooted both
> nodes…
>
> Now node 2 (which is master) is live, node 1 dead. All Ganeti commands on
> node 2 lead to timeout except “gnt-cluster getmaster”.

On a 2-node cluster with one failed node, the node that's left alive
does not have quorum and so Ganeti daemons will not start automatically.
`gnt-cluster getmaster` works because it simply reads a file from disk,
without contacting any of the Ganeti daemons.
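
To make the quorum point concrete, here is a rough sketch of a generic strict-majority check. It is not Ganeti's actual voting code (which, as far as I know, polls the master candidates and has its own retry logic); the function names `has_quorum` and `report` are made up for illustration.

```shell
#!/bin/sh
# Generic strict-majority rule, NOT Ganeti's real implementation.
# has_quorum ALIVE TOTAL succeeds only if ALIVE is more than half of TOTAL.
has_quorum() {
    [ $((2 * $1)) -gt "$2" ]
}

# Print a one-line verdict for ALIVE reachable voters out of TOTAL.
report() {
    if has_quorum "$1" "$2"; then
        echo "$1 of $2 voters reachable: quorum"
    else
        echo "$1 of $2 voters reachable: no quorum"
    fi
}

report 1 2   # the situation in this thread: node 2 alone
report 3 4   # hypothetical: two new nodes added, node 1 still dead
```

Under this simplified rule a single survivor out of two can never reach a majority, which is why a 2-node cluster is stuck as soon as either node dies.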

> Our idea is to install Centos 7 and Ganeti 2.16.1 on 2 or 3 more
> nodes.
>
> We need some advice on the best path to recover the cluster and be able to
> start instances (mail server and ERP).
>
> Also, is there a way, since node2 is master, to be able to start instances
> on node 2 while it’s the only node left ?

Now, if I recall correctly, the normal way out of this is to start
ganeti-luxid and ganeti-wconfd with the `--no-voting --yes-do-it`
arguments on the surviving node. This should get Ganeti functional on
node 2 again, and give you the ability to start instances etc. You might
also want to mark node 1 as offline (using `gnt-node modify --offline`)
to tell Ganeti not to attempt to contact the offline node.
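
Put together, the steps above might look roughly like the sketch below, run as root on node 2. Treat it as an outline, not a tested procedure: the hostname `node1.example.com`, the `DRY_RUN` switch and the `run` helper are placeholders of mine, and starting ganeti-wconfd before ganeti-luxid is my assumption about the ordering. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the recovery steps on the surviving master (node 2).
# DEAD_NODE is a hypothetical hostname; replace it with the real name of node 1.
DEAD_NODE="${DEAD_NODE:-node1.example.com}"
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Start the master daemons without the master vote (order is an assumption).
run ganeti-wconfd --no-voting --yes-do-it
run ganeti-luxid --no-voting --yes-do-it

# Tell Ganeti to stop trying to contact the dead node.
run gnt-node modify --offline yes "$DEAD_NODE"
```

Once that works, `gnt-instance list` and `gnt-instance startup` should respond again on node 2.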

After you restore the cluster to a working state with more than 2 nodes
alive, be sure to remove those arguments and restart the daemons on node
2.

Cheers,
Apollon

Miguel Cordas

Jan 11, 2023, 4:48:58 PM
to gan...@googlegroups.com
Hello,

Thanks, Apollon: the VMs are up and running, and we are setting up 2 more nodes.

Afterwards, we’ll think about an upgrade path…

Cheers,

Miguel.
