Hi Miguel,
On 05:23 Wed 11 Jan , Miguel Cordas wrote:
> Hello,
>
> We have a 2 nodes Ganeti 2.16.1 cluster with 2 VMs (drbd).
>
> Bad idea : node 1 HD failed, VMs irresponsive. My friend rebooted both
> nodes…
>
> Now node 2 (which is master) is live, node 1 dead. All Ganeti commands on
> node 2 lead to timeout except “gnt-cluster getmaster”.

On a 2-node cluster with one failed node, the node that's left alive
does not have quorum and so Ganeti daemons will not start automatically.
`gnt-cluster getmaster` works because it simply reads a file from disk,
without contacting any of the Ganeti daemons.

> Our idea is to install Centos 7 and Ganeti 2.16.1 on 2 or 3 more
> nodes.
>
> We need some advice on the best path to recover the cluster and be able to
> start instances (mail server and ERP).
>
> Also, is there a way, since node2 is master, to be able to start instances
> on node 2 while it’s the only node left ?

Now, if I recall correctly, the normal way out of this is to start
ganeti-luxid and ganeti-wconfd with the `--no-voting --yes-do-it`
arguments on the surviving node. This should get Ganeti functional on
node 2 again and let you start instances etc. You might also want to
mark node 1 as offline (using `gnt-node modify --offline yes <node1>`,
where `<node1>` is the failed node's name) so that Ganeti does not
attempt to contact the dead node.

After you restore the cluster to a working state with more than 2 nodes
alive, be sure to remove those arguments and restart the daemons on node
2.
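
As a rough sketch of the steps above, here is a small dry-run script that
only prints the commands, so you can review them before running anything
as root on node 2. The host name `node1.example.com` is a placeholder, not
a name from your cluster. Starting wconfd before luxid and invoking the
daemons directly are assumptions on my part: depending on your packaging
you may need to pass the arguments through your init scripts or service
configuration instead.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands instead of executing them.
# FAILED_NODE is a placeholder; replace it with the real name of node 1.
FAILED_NODE="${FAILED_NODE:-node1.example.com}"

recovery_plan() {
  # 1. Start the master daemons on the surviving node, skipping the
  #    master vote. Ordering (wconfd first) is an assumption.
  echo "ganeti-wconfd --no-voting --yes-do-it"
  echo "ganeti-luxid --no-voting --yes-do-it"
  # 2. Mark the dead node offline so Ganeti stops trying to contact it.
  echo "gnt-node modify --offline yes $FAILED_NODE"
}

recovery_plan
```

Once the cluster is healthy again, the same daemons should be restarted
without `--no-voting --yes-do-it`, as mentioned above.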
Cheers,
Apollon