automatic failover with ganeti.

BRULÉ Yann

Sep 6, 2009, 1:19:27 PM
to Pkg-gane...@lists.alioth.debian.org, gan...@googlegroups.com

Hi,

I have a Ganeti cluster with 3 VMs. When a node fails, I can fail over an instance with gnt-instance failover…, and I can migrate an instance when both nodes are running.

But how can I set up automatic failover that detects by itself when a node has failed?

Does anybody use heartbeat for this? Is it possible to do this with heartbeat?

Thank you,

Best regards.

Sp4rKY

Sep 17, 2009, 2:42:02 AM
to ganeti
Hi,

There is already a ganeti-watcher script, which tries to restart VMs when
they are down (because of a crash, a node reboot, etc.).

However, at the moment the script detects that the node is down/offline but
doesn't act on it.
The code is in /usr/sbin/ganeti-watcher, lines 410/413 for me:

      elif instance.state in HELPLESS_STATES:
        if notepad.NumberOfRestartAttempts(instance):
          notepad.RemoveInstance(instance)

I think it would be pretty simple to implement failover here.
But maybe it's not implemented for a good reason that I don't
know about. Devs?

I think the way to do it is to extend the Instance class defined in
the script (which is a mapper for administration calls) with a
Failover method that calls opcodes.OpFailoverInstance.

For me it's a 4-line patch, so it's strange that it's not
implemented yet; maybe there is a good reason for that. Devs?

Cheers,

Maxence

On Sep 6, 19:19, BRULÉ Yann <Yann.BR...@supinfo.com> wrote:
> Hi,
>
> I have a Ganeti cluster with 3 VMs. When a node fails, I can fail over an instance with gnt-instance failover..., and I can migrate an instance when both nodes are running.

Sp4rKY

Sep 17, 2009, 3:03:24 AM
to ganeti
OK, I just made a quick-and-nasty patch :)

I have no time to test it at the moment, so if you can do so, please let me
know if it works :)

Cheers,

Maxence

=================================
--- a/usr/sbin/ganeti-watcher Thu Sep 17 08:50:46 2009 +0200
+++ b/usr/sbin/ganeti-watcher Thu Sep 17 09:01:50 2009 +0200
@@ -253,6 +253,12 @@
     op = opcodes.OpActivateInstanceDisks(instance_name=self.name)
     cli.SubmitOpCode(op, cl=client)
 
+  def Failover(self):
+    """Encapsulates the failover of an instance.
+
+    """
+    op = opcodes.OpFailoverInstance(instance_name=self.name)
+    cli.SubmitOpCode(op, cl=client)
 
 def GetClusterData():
   """Get a list of instances on this cluster.
@@ -409,8 +415,14 @@
 
         notepad.RecordRestartAttempt(instance)
       elif instance.state in HELPLESS_STATES:
-        if notepad.NumberOfRestartAttempts(instance):
-          notepad.RemoveInstance(instance)
+        logging.info("Instance's primary node is down, trying to fail over")
+        try:
+          instance.Failover()
+          self.started_instances.add(instance.name)
+        except Exception:
+          logging.exception("Error while failing over instance %s",
+                            instance.name)
+        notepad.RecordRestartAttempt(instance)
       else:
         if notepad.NumberOfRestartAttempts(instance):
           notepad.RemoveInstance(instance)
=================================

Loic Dachary

Sep 17, 2009, 3:08:01 AM
to gan...@googlegroups.com
Did you add a test to the test suite that validates it works as expected ?

Cheers

Sp4rKY

Sep 17, 2009, 3:15:58 AM
to ganeti
The test cases in the git repository only cover the Ganeti base classes.
I patched an executable script, which doesn't have any test script :)
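
That said, the new branch could be exercised in isolation with stand-in objects. The following is only a sketch for a current Python 3: `handle_helpless` is a hypothetical function that mirrors the patched HELPLESS_STATES branch (it is not part of ganeti-watcher), the state names are assumptions, the placement of `RecordRestartAttempt` after the try/except is assumed, and `unittest.mock` replaces the real instance and notepad objects.

```python
import logging
from unittest import mock

# Assumed state names; check the constants in your ganeti-watcher.
HELPLESS_STATES = ("ERROR_nodedown", "ERROR_nodeoffline")


def handle_helpless(instance, notepad, started):
    """Hypothetical stand-in for the patched HELPLESS_STATES branch."""
    if instance.state not in HELPLESS_STATES:
        return
    logging.info("Instance's primary node is down, trying to fail over")
    try:
        instance.Failover()
        started.add(instance.name)
    except Exception:
        logging.exception("Error while failing over instance %s",
                          instance.name)
    # Assumed to run whether or not the failover succeeded.
    notepad.RecordRestartAttempt(instance)


def make_instance(fails):
    """Build a mock instance whose Failover() optionally raises."""
    inst = mock.Mock()
    inst.name = "vm1.example.com"
    inst.state = "ERROR_nodedown"
    if fails:
        inst.Failover.side_effect = RuntimeError("opcode failed")
    return inst


if __name__ == "__main__":
    notepad, started = mock.Mock(), set()
    handle_helpless(make_instance(fails=False), notepad, started)
    assert started == {"vm1.example.com"}
    notepad.RecordRestartAttempt.assert_called_once()
```

This only checks the control flow of the branch; it says nothing about whether the failover opcode itself succeeds on a real cluster.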

Cheers,

Maxence

Iustin Pop

Sep 17, 2009, 4:10:10 AM
to gan...@googlegroups.com
On Wed, Sep 16, 2009 at 11:42:02PM -0700, Sp4rKY wrote:
>
> Hi,
>
> There is already a ganeti-watcher script, which tries to restart VMs when
> they are down (because of a crash, a node reboot, etc.).
>
> However, at the moment the script detects that the node is down/offline but
> doesn't act on it.
> The code is in /usr/sbin/ganeti-watcher, lines 410/413 for me:
>
>       elif instance.state in HELPLESS_STATES:
>         if notepad.NumberOfRestartAttempts(instance):
>           notepad.RemoveInstance(instance)
>
> I think it would be pretty simple to implement failover here.
> But maybe it's not implemented for a good reason that I don't
> know about. Devs?
>
> I think the way to do it is to extend the Instance class defined in
> the script (which is a mapper for administration calls) with a
> Failover method that calls opcodes.OpFailoverInstance.
>
> For me it's a 4-line patch, so it's strange that it's not
> implemented yet; maybe there is a good reason for that. Devs?

Ganeti doesn't have any STONITH method, which means that just because we
can't reach a node over the network, it doesn't necessarily follow that
the node itself is actually down.

Without special hardware (controllable power switches, shared storage,
etc.) it's not possible to do this in a guaranteed-safe way. And
speaking from experience, yes, it can happen that a node is no longer
reachable over the network but its instances are.

So what could happen is that the watcher restarts the instance
on its secondary node while the instance is still running on the original
node, *with the same IP and MAC*, which would be pretty difficult to
diagnose.

So we left this part (automated failover) as an “exercise for the user”.
While we do intend to slowly improve things, we don't yet have a clear
design for this.
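
To illustrate the point, here is a minimal sketch of the check an automated failover would need before acting. It is not Ganeti code: `NodeEvidence`, its fields, and `safe_to_failover` are hypothetical names, and how the power-off confirmation would be obtained (for example from a controllable power switch) is left open.

```python
from dataclasses import dataclass


@dataclass
class NodeEvidence:
    """Hypothetical observations about a primary node."""
    node_pingable: bool        # the master can reach the node over the network
    instance_pingable: bool    # the instance's own IP still answers
    power_off_confirmed: bool  # an external fencing device reports power off


def safe_to_failover(ev: NodeEvidence) -> bool:
    """Return True only when the primary node is known to be dead.

    An unreachable node is not enough: its instances may still be
    running, and starting a second copy on the secondary node would
    duplicate the IP and MAC.
    """
    if ev.node_pingable or ev.instance_pingable:
        return False
    return ev.power_off_confirmed


if __name__ == "__main__":
    # Node unreachable, but nothing confirms that it is powered off.
    print(safe_to_failover(NodeEvidence(False, False, False)))
```

With only network checks available, `power_off_confirmed` can never be set, so the function always refuses, which is the situation described above.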

regards,
iustin

BRULÉ Yann

Oct 2, 2009, 4:20:26 AM
to gan...@googlegroups.com, pkg-gane...@lists.alioth.debian.org
Hi!

I tried to download ganeti-watcher, but I don't know where to find it. Could someone tell me where I can find it and how to configure it to do automatic failover?

Thank you!

Have a good day.

-----Original Message-----
From: gan...@googlegroups.com [mailto:gan...@googlegroups.com] On behalf of Sp4rKY
Sent: Thursday, September 17, 2009 09:16
To: ganeti
Subject: Re: automatic failover with ganeti.

Michael Hanselmann

Oct 2, 2009, 4:35:40 AM
to gan...@googlegroups.com, pkg-gane...@lists.alioth.debian.org, Yann....@supinfo.com
2009/10/2 BRULÉ Yann <Yann....@supinfo.com>:

> I tried to download ganeti-watcher, but I don't know where to find it. Could someone tell me where I can find it and how to configure it to do automatic failover?

ganeti-watcher is part of Ganeti (daemons/ganeti-watcher and
$prefix/sbin/ganeti-watcher after installation). It can't do automatic
failovers.
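
A manual failover can still be scripted once an administrator has confirmed that the primary node is really down. A minimal sketch, assuming it runs on the master node with gnt-instance in PATH and a current Python 3; `failover_command` and `manual_failover` are hypothetical helper names, not part of Ganeti:

```python
import subprocess


def failover_command(instance, ignore_consistency=False):
    """Build the gnt-instance command line for a manual failover."""
    cmd = ["gnt-instance", "failover", "-f"]  # -f skips the confirmation prompt
    if ignore_consistency:
        # Needed when the disks on the dead primary cannot be checked.
        cmd.append("--ignore-consistency")
    cmd.append(instance)
    return cmd


def manual_failover(instance, ignore_consistency=False):
    """Run the failover; raises CalledProcessError if gnt-instance fails."""
    subprocess.run(failover_command(instance, ignore_consistency), check=True)


if __name__ == "__main__":
    # Only prints the command; call manual_failover() to actually run it.
    print(" ".join(failover_command("vm1.example.com", ignore_consistency=True)))
```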

Regards,
Michael
