Hello All,
I've been testing Ganeti (version 2.11) and I'm a little unclear on the setup and expectations for automated instance failover.
Here's my setup:
* small 3-node cluster running with a few instances on each node
* disk is shared storage (SAN) using the LVM external storage provider
* added the "ganeti:watcher:autorepair:failover" tag to the cluster, each node, and each instance
* harep is running from cron every 2 minutes on the master
** harep warns that the cluster has inconsistent data: a node is missing 802 MB of RAM. There's plenty of free RAM on each node, so it shouldn't be an issue.
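For reference, here is a minimal sketch of how the tagging step in the list above could be scripted. The node and instance names are placeholders rather than my real hosts, and the helper function is hypothetical; only the `gnt-cluster`, `gnt-node`, and `gnt-instance` `add-tags` commands come from Ganeti itself. The script only prints the commands (a dry run) and does not touch the cluster.

```python
# Autorepair tag from the setup above.
TAG = "ganeti:watcher:autorepair:failover"


def autorepair_tag_commands(nodes, instances, tag=TAG):
    """Build the add-tags commands for the cluster, each node, and each instance.

    Hypothetical helper for illustration; it returns argv lists and runs nothing.
    """
    commands = [["gnt-cluster", "add-tags", tag]]
    commands += [["gnt-node", "add-tags", node, tag] for node in nodes]
    commands += [["gnt-instance", "add-tags", inst, tag] for inst in instances]
    return commands


if __name__ == "__main__":
    # Placeholder names; substitute the real node and instance names.
    nodes = ["node1", "node2", "node3"]
    instances = ["instance1", "instance2"]
    for cmd in autorepair_tag_commands(nodes, instances):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Running the printed commands on the master node would apply the tag at all three levels, which matches what I did by hand.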
I had a node failure today (server crashed) and marked the node as offline. The instance that was running on the failed node is showing as ERROR_nodeoffline but the output from harep indicates that all of the instances are healthy.
Should the failed instance be moved to another node automatically, or is something missing or preventing the instance from failing over?
Any help or clarification would be appreciated.
Thank you!
Scott