Best way to check the node status (HA in Ganeti)

fabien...@gmail.com

Apr 13, 2016, 9:55:30 AM
to ganeti
Hi,

I'm trying to set up some sort of HA with Ganeti. I read some articles and previous emails about Ganeti with Pacemaker/Corosync/... and it looks a bit tricky.
http://docs.ganeti.org/ganeti/2.17/html/design-linuxha.html

I'm thinking of keeping things simple to avoid any risky automated operations.
What I want to do is check each node's status; if a node doesn't respond, I will power it off using IPMI and set it offline in Ganeti, then let harep do the rest of the job (move instances, ...).
If I do it this way, maybe I won't put any instance by default on the master node, so I don't need to deal with it (if the master is down, it doesn't prevent the VMs from running fine on the other nodes).
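
As a rough sketch of the manual sequence described above (power the node off over IPMI, confirm that it is off, then mark it offline in Ganeti), the snippet below only builds and prints the commands instead of running them. The BMC hostname, user name and node name are hypothetical, the `lanplus` interface is an assumption about the hardware, and `-E` assumes the IPMI password is supplied through the `IPMI_PASSWORD` environment variable.

```python
def fencing_plan(bmc_host, bmc_user, node):
    """Return the commands, in order, for fencing a node and marking it offline.

    Nothing is executed here; an operator (or a wrapper script) decides
    whether to run each step.
    """
    # Assumption: the BMC speaks IPMI v2.0 over LAN ("lanplus") and the
    # password comes from the IPMI_PASSWORD environment variable (-E).
    ipmi = ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", bmc_user, "-E"]
    return [
        ipmi + ["chassis", "power", "off"],
        # Confirm the power state before telling Ganeti anything.
        ipmi + ["chassis", "power", "status"],
        ["gnt-node", "modify", "--offline", "yes", node],
    ]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for argv in fencing_plan("node3-bmc.example.com", "admin", "node3.example.com"):
        print(" ".join(argv))
```

Marking the node offline comes last on purpose: it should only happen once the power status check has confirmed that the node is really off.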

What would be the best way to check a node's status?
ping is simple but doesn't give the real state of the node.
With gnt-instance list, I get an ERROR_nodedown, which is not so bad.
I also had a look at RAPI, but I'm not sure it's the best tool for that...

Any suggestions?

Thanks!

Fabien

candlerb

Apr 14, 2016, 7:58:35 AM
to ganeti
On Wednesday, 13 April 2016 14:55:30 UTC+1, fabien...@gmail.com wrote:

> What would be the best way to check a node status ?

I think you've just hit on the fundamental problem, which is why Ganeti doesn't do all this for you already.

How do you know *for sure* that a node is down? As opposed to e.g.
- just running very slowly
- the switch port that its NIC is connected to has failed
- some other problem

If you mark the node as offline when it isn't, and you restart instances elsewhere, then you will have multiple running copies of the same instance (= split brain).

You mention IPMI, which means you're happy to shoot the other node in the head. But this also means you're not talking about full server failures (e.g. loss of power supply); you're considering partial failures where "something went wrong" on the node, but the hardware is still up (at least to talk to the IPMI controller).

Detecting "something went wrong" for a suitable threshold of "wrongness" is hard indeed.

> ping is simple but doesn't give the real state of the node.
> With gnt-instance list, I get a ERROR_nodedown which is not so bad.

That just reports the state of the node as seen from the master - i.e. can the master communicate with the slave. If the network or NIC port is down, or the remote host is so heavily overloaded that it doesn't respond in a reasonable time, or a bad iptables rule has been introduced, it will appear to be down.
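
One way to reduce (not eliminate) this ambiguity is to compare observations from several independent vantage points instead of trusting the master's view alone. The Python sketch below is only an illustration: the vantage point names are hypothetical, and probing a TCP port such as SSH is an assumption about what counts as "answering". Even a unanimous "no answer" does not prove the node is powered off.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def classify(observations):
    """Summarise probe results gathered from several vantage points.

    observations maps a vantage point name to True (node answered)
    or False (no answer).
    """
    if not observations:
        return "unknown"
    answered = sum(1 for ok in observations.values() if ok)
    if answered == len(observations):
        return "up"
    if answered == 0:
        # Still not proof of a dead node: it may be overloaded or isolated.
        return "unreachable-from-all"
    # Some paths work and some don't: likely a network or firewall problem.
    return "partial"
```

For example, `classify({"master": False, "monitoring-host": True})` returns `"partial"`, which points at a problem between the master and the node and not at a dead node.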

> I had also a look at rapi, but I'm not sure it's the best tool to do that...

RAPI just gives you the same information as you'd get from the command line on the master.
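
For illustration, here is a minimal sketch of reading a node's flags over RAPI with only the Python standard library. It assumes RAPI listens on its default port 5080, that read access is permitted for the caller, and that the node resource carries `offline` and `drained` fields; the function names and the `cafile` parameter are hypothetical. Whatever it returns is still only the master's view of the node.

```python
import json
import ssl
import urllib.request

RAPI_PORT = 5080  # assumed default; adjust if the cluster uses another port


def node_url(master, node):
    """Build the RAPI URL for a single node resource."""
    return f"https://{master}:{RAPI_PORT}/2/nodes/{node}"


def fetch_node(master, node, cafile=None, timeout=10):
    """Fetch the node resource as a dict.

    cafile should point at the CA (or the cluster's own certificate) so
    that TLS verification can stay enabled.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    url = node_url(master, node)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return json.load(resp)


def summarize(info):
    """Reduce the node resource to one word, as recorded by the master."""
    if info.get("offline"):
        return "offline"
    if info.get("drained"):
        return "drained"
    # Only means "not flagged in the cluster configuration".
    return "not-flagged"
```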

> Any suggestion ?

Think carefully whether you really want to do this in the first place.

An alternative approach would be to do high availability at the service level. That is:

1. Run multiple instances for the same service

2. Use instance exclusion tags to ensure that the iallocator never runs them on the same node

3. Do your failover at the application level: e.g. with CARP, or a load-balancer in front of the cluster, or any of the tools you'd normally use to distribute load across a redundant pair of physical servers

Of course, this also implies either that your services are stateless or that they can share state with each other safely (perhaps with some sort of clustered database?) but that's a design decision which is specific to each service you want to run.
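
To make steps 1 and 2 concrete, the sketch below checks a placement for violations: two instances that carry the same exclusion tag but have the same primary node. The input layout (name, primary node, tags) and the `service` tag prefix are hypothetical; in a real cluster the data would come from something like `gnt-instance list` or RAPI.

```python
from collections import defaultdict


def find_conflicts(instances, prefix):
    """Return exclusion-tag violations in a placement.

    instances is an iterable of (name, primary_node, tags) tuples.
    A tag counts as an exclusion tag if it starts with "<prefix>:".
    The result maps (node, tag) to the instances sharing both.
    """
    placed = defaultdict(list)
    for name, node, tags in instances:
        for tag in tags:
            if tag.startswith(prefix + ":"):
                placed[(node, tag)].append(name)
    return {key: names for key, names in placed.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical placement: both web instances ended up on node1.
    placement = [
        ("web1", "node1", ["service:web"]),
        ("web2", "node1", ["service:web"]),
        ("db1", "node2", ["service:db"]),
    ]
    print(find_conflicts(placement, "service"))
```

An empty result means no two instances of the same service share a primary node, so a single node failure cannot take out every copy of that service.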

Regards,

Brian.

Georg Faerber

Sep 30, 2017, 1:35:47 PM
to gan...@googlegroups.com
Hi Fabien,

On 16-04-13 06:47:06, fabien...@gmail.com wrote:
> I'm trying to setup some sort of HA with Ganeti. I read some articles and
> previous email about Ganeti with Pacemaker/Corosync/... and it looks a bit
> tricky.
> http://docs.ganeti.org/ganeti/2.17/html/design-linuxha.html

Did you implement this?

Thanks,
Georg