That's interesting. As I understand it, nodes which fail can only be marked offline manually. So this implies that harep will never make any repair without manual intervention. Is that true?
If this is true, I'd say the documentation is highly misleading.
It talks only about checking the state of *instances*, not the state of *nodes*.
Furthermore, under this model, I can see the value of:
failover: allow instance reboot on the secondary
(i.e. if the node fails, restart the instance on the other node). But I can't see under what circumstances it would use
migrate: allow instance migration
How can a (live) migration be done if either the instance is down or one of the nodes is offline? Conversely, if the instance is working, why would it need migrating to "repair" it?
Regards,
Brian.