> It's a bit like the linstor controller without linstor
Interesting development, but as far as I can see, it's rather lower level than that.
Linstor provisions volumes across a cluster: it configures drbd on all the nodes, and creates/destroys LVM or ZFS volumes for the underlying storage. drbdd just watches for events from the drbd resources on the local node, and triggers local plugins when their state changes.
The "promoter" plugin triggers changes to systemd units. So if you were running VMs using systemd units to start qemu (and that is certainly a low-level way to run VMs!), then on failure you could have another node auto-promote to primary and restart the VM there. The promoter plugin runs on *all* nodes, and the first node to promote the resource wins the race. This means it doesn't need a distributed cluster manager (such as ganeti): the drbd resource itself acts as the locking/coordination mechanism.
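For illustration, a promoter configuration might look something like this. This is only a sketch in the TOML style used by drbdd (later drbd-reactor); the resource and unit names are hypothetical, and the exact keys may differ between versions:

```toml
# Hypothetical config: when drbd resource "vm0" becomes promotable
# on this node, promote it and start the matching systemd unit.
[[promoter]]
[promoter.resources.vm0]
# Units started (in order) once the resource is promoted to primary;
# stopped in reverse order when the resource is demoted again.
start = ["qemu-vm0.service"]
```

Since every node runs the same config, whichever node promotes the resource first is the one that ends up starting the unit.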
I can see drbdd being useful to signal events to ganeti, but not this auto-promotion; ganeti has a central master architecture. It would want to pick up events on the master, make its own decision which node to promote, and perform it as a job.
Also note: I believe it has been a conscious decision to date that ganeti *won't* auto-migrate an instance to another node when a node fails (*). This drbdd mechanism *could* be used to implement or trigger such auto-healing, if that were desired. It would, however, rely on drbd's own quorum and state management rather than ganeti's.
IMO, the best way to simplify the ganeti drbd logic would be to get it to talk to the Linstor API. Then you get all the benefits of drbd9: up to 32 replicas, LVM and ZFS underneath, thin volumes, and live migration to diskless nodes. Linstor handles plain LVM volumes too. With this, ganeti could remove *all* its LVM and DRBD logic!
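To sketch what that could look like from the Linstor side, provisioning a replicated disk reduces to a few calls (shown here via the linstor CLI rather than the API; the resource name and size are made up, and flag spellings may vary by version):

```
# Define a resource and its volume, then let Linstor pick nodes,
# create the backing LVM/ZFS volumes, and configure drbd on each.
linstor resource-definition create vm0-disk0
linstor volume-definition create vm0-disk0 10G
linstor resource create vm0-disk0 --auto-place 2
```

The point is that ganeti would only have to issue requests like these and track the result, instead of driving lvcreate and drbdsetup itself on every node.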
Regards,
Brian.
(*) This is surprising to some people, because there is harep, which you enable by setting instance tags "ganeti:watcher:autorepair:<type>", e.g. "ganeti:watcher:autorepair:failover". But as the manpage says:
> Harep doesn't do any hardware failure detection on its own, it relies on nodes being marked as offline by the administrator.
So if your server goes down at 3am, the VMs *won't* be automatically restarted on another node. Also, it doesn't work with shared storage (ext), only drbd (**).
(**) or plain - but since the node with the plain volume has died, it can only reinstall a fresh instance for you.