Ganeti cluster graceful shutdown

bamblew...@gmail.com

Jun 12, 2022, 2:02:20 PM
to ganeti
Dear Ganeti Experts,

As the high heat, blackout and hurricane seasons approach, I am revisiting my organization's disaster recovery procedures. Currently, when a power outage occurs, we have an automated shutdown procedure for our physical Ganeti hosts. It kicks off with plenty of time for the instances on each cluster to shut down gracefully, then shuts down the hosts before battery power is drained.

It got me thinking though: What happens at the process level when a Ganeti host with active instances receives a "shutdown -h now" command? We are running the KVM hypervisor. Does KVM simply kill the running process for the instance, or does it send a "shutdown" signal to the guest OS and wait for a response before killing the instance?

-jm

Rudolph Bott

Jun 12, 2022, 5:56:42 PM
to gan...@googlegroups.com
Hi JM, 

<bamblew...@gmail.com> wrote on Sun, Jun 12, 2022, 20:02:
> It got me thinking though: What happens at the process level when a Ganeti host with active instances receives a "shutdown -h now" command? We are running the KVM hypervisor. Does KVM simply kill the running process for the instance, or does it send a "shutdown" signal to the guest OS and wait for a response before killing the instance?

First of all, your init system (systemd or whatever) will shut down all known services. That includes the Ganeti daemons but does not include the KVM/QEMU processes. When systemd has stopped all configured/known services, it will attempt to terminate and then kill all remaining processes. That will effectively hard-kill all running instances. It does not trigger a regular shutdown inside the instances.

The only way to gracefully shut down a KVM instance is to connect to its human monitor socket or its QMP socket and issue a shutdown command. That in turn sends an ACPI power button event to the guest, which has to react to it by itself.
Ganeti actually does the same on instance shutdown: it lets QEMU issue the power button event and waits up to 120 seconds (the default timeout) for the instance process to terminate. If it does not, Ganeti kills the QEMU process.
In theory you could implement a systemd unit that runs a script on shutdown which discovers all running QEMU processes, issues the shutdown command through either of the sockets and waits for the instances to turn off.
This script should run after the Ganeti processes have been shut down, so that Ganeti does not restart the QEMU processes.
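
A minimal sketch of the kind of script described above, not the one shipped with Ganeti. The socket directory, the `.monitor` suffix and the use of `socat` are assumptions about a typical installation, and all function names are hypothetical. `system_powerdown` is the QEMU monitor command that raises the ACPI power button event. A guest is treated as still running while its monitor socket accepts connections, which is also an assumption.

```shell
#!/bin/sh
# Sketch only. The socket directory, the ".monitor" suffix and the use of
# socat are assumptions about a typical install, not confirmed by the thread.
CTRL_DIR="${CTRL_DIR:-/var/run/ganeti/kvm-hypervisor/ctrl}"
TIMEOUT="${TIMEOUT:-120}"   # number of one-second polls, mirroring Ganeti's default
SOCAT="${SOCAT:-socat}"

# List the monitor sockets of all instances on this node.
list_monitors() {
    find "$CTRL_DIR" -type s -name '*.monitor' 2>/dev/null
}

# Send system_powerdown (the ACPI power button event) to one monitor socket.
powerdown_guest() {
    printf 'system_powerdown\n' | "$SOCAT" - "UNIX-CONNECT:$1" >/dev/null 2>&1
}

# A guest counts as running while its monitor socket still accepts connections.
guest_running() {
    "$SOCAT" - "UNIX-CONNECT:$1" </dev/null >/dev/null 2>&1
}

# Print the number of guests that are still running.
count_running() {
    n=0
    for sock in $(list_monitors); do
        if guest_running "$sock"; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Request power-down everywhere, then poll once per second until all guests
# are gone or TIMEOUT polls have passed. Returns 0 only if none is left.
powerdown_all() {
    for sock in $(list_monitors); do
        powerdown_guest "$sock"
    done
    waited=0
    while [ "$(count_running)" -gt 0 ] && [ "$waited" -lt "$TIMEOUT" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    [ "$(count_running)" -eq 0 ]
}
```

Calling `powerdown_all` from a unit ordered to stop after the Ganeti daemons would match the ordering described above. Guests that ignore the ACPI event are still killed when the host goes down.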

Cheers
Rudi



--
You received this message because you are subscribed to the Google Groups "ganeti" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ganeti+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ganeti/aa551d4f-aed9-4c74-9288-134ae2865b46n%40googlegroups.com.

John McNally

Jun 12, 2022, 7:44:08 PM
to gan...@googlegroups.com
Rudi,

Thanks for the details. Interesting and makes total sense. However, I think I will stick with my current cluster shutdown script (18 lines of bash, the same on all nodes) with a very simple algorithm:

1. if this node is the master node:
- issue "gnt-instance shutdown" for each instance

2. if this node is NOT the master node:
- ping the master node continuously until it stops responding

3. shutdown this node
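
The three steps above could look roughly like this. It is a hypothetical reconstruction, not the poster's actual script. It assumes `gnt-cluster getmaster` returns the master under the same name that `hostname -f` reports on that node, and it uses overridable variables (`GNT_CLUSTER`, `GNT_INSTANCE`, `PING`, `HALT`, `THIS_NODE`, all made up for this sketch) so the logic can be dry-run.

```shell
#!/bin/sh
# Hypothetical reconstruction of the three steps above, not the poster's script.
# Command names are overridable variables so the logic can be dry-run.
GNT_CLUSTER="${GNT_CLUSTER:-gnt-cluster}"
GNT_INSTANCE="${GNT_INSTANCE:-gnt-instance}"
PING="${PING:-ping}"
HALT="${HALT:-shutdown -h now}"
POLL_SECONDS="${POLL_SECONDS:-5}"

cluster_shutdown() {
    master=$($GNT_CLUSTER getmaster) || return 1
    # Assumes the master is registered under this node's FQDN.
    this_node="${THIS_NODE:-$(hostname -f)}"

    if [ "$this_node" = "$master" ]; then
        # Step 1: the master shuts down every instance, one at a time.
        $GNT_INSTANCE list --no-headers -o name | while read -r inst; do
            $GNT_INSTANCE shutdown "$inst" </dev/null
        done
    else
        # Step 2: wait until the master stops answering pings.
        while $PING -c 1 -W 2 "$master" >/dev/null 2>&1; do
            sleep "$POLL_SECONDS"
        done
    fi

    # Step 3: power off this node.
    $HALT
}
```

Setting `HALT="echo would halt"` before calling `cluster_shutdown` gives a dry run of the final step.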

Thanks again.
  
_________________
John McNally
jmcn...@acm.org


Randy Bush

Jun 13, 2022, 10:23:18 AM
to John McNally, gan...@googlegroups.com
> 1. if this node is the master node:
> - issue "gnt-instance shutdown" for each instance

nit: `gnt-instance shutdown --all` for lazy typists :)

randy

Sascha Lucas

Jun 13, 2022, 11:11:42 AM
to gan...@googlegroups.com
Hi,

On Sun, 12 Jun 2022, Rudolph Bott wrote:

> In theory you could implement a systemd unit that runs a script on shutdown
> which discovers all running qemu processes, issues the shutdown command
> through either of the sockets and waits for the instances to turn off.

There is an example init script included in the Ganeti source[1]. It should
work with systemd's automatic SysV init script wrapper.

Thanks, Sascha.

[1] https://github.com/ganeti/ganeti/blob/6ffce6b234ae1b5fe3c948269683cb20f2b50f65/doc/examples/ganeti-kvm-poweroff.initd.in

Brian Candler

Jun 19, 2022, 1:42:02 PM
to ganeti
If you want them all to restart again if/when the master reboots, then you can do:

gnt-cluster watcher pause 3600
gnt-instance shutdown --all --no-remember


John McNally

Jun 22, 2022, 12:22:52 PM
to gan...@googlegroups.com
Ooh, that's nice! Good to know.

Is there a way to easily restart just the instances that were running, excluding those in the ADMIN_down state?
  
_________________
John McNally
jmcn...@acm.org


Brian Candler

Jun 23, 2022, 2:33:46 AM
to ganeti
If you shut down with --no-remember, Ganeti still considers them as up, and therefore the watcher will consider them failed and restart them. (That's why you have to pause the watcher; otherwise it may restart them almost immediately.)
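
Put together, the sequence might be wrapped like this. It is a sketch under assumptions: the function and variable names are hypothetical, and `gnt-cluster watcher continue` is used here to end the pause early once power is back, which the thread itself does not mention.

```shell
#!/bin/sh
# Sketch only; function and variable names are made up for illustration.
GNT_CLUSTER="${GNT_CLUSTER:-gnt-cluster}"
GNT_INSTANCE="${GNT_INSTANCE:-gnt-instance}"
PAUSE_SECONDS="${PAUSE_SECONDS:-3600}"

# Run on the master before power is lost.
before_power_loss() {
    # Pause first; otherwise the watcher may restart instances right away.
    $GNT_CLUSTER watcher pause "$PAUSE_SECONDS" || return 1
    # --no-remember leaves the admin state "up", so the watcher restarts
    # these instances later; instances already admin-down stay down.
    $GNT_INSTANCE shutdown --all --no-remember
}

# Run on the master once power is back, to end the pause early.
after_power_restored() {
    $GNT_CLUSTER watcher continue
}
```

If the master stays off for longer than the pause, the pause simply expires and the watcher resumes on its next run after boot.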