Ganeti instance failover misbehaving

Jay

Dec 27, 2024, 4:58:53 AM
to ganeti
Hi

I have 3 nodes and 1 master node. I configured DRBD with the aim of getting automatic failover. Manual failover works fine: the instance moves over to the master node. For testing purposes, I then shut down node1, the primary node of ins1 (its DRBD secondary is masternode), but the instance was not failed over. `gnt-instance list` shows it in the `ERROR_nodedown` state:

```
root@masternode:/home/masternode# gnt-instance list
Instance Hypervisor OS                  Primary_node   Status         Memory
ins1.com kvm        raw-image+default   node1.com      ERROR_nodedown      ?
ins2.com kvm        linux-image+default node2.com      running          8.0G
ins3.com kvm        raw-image+default   node3.com      running          4.0G
ins4.com kvm        raw-image+default   masternode.com running          8.0G
ins5.com kvm        raw-image+default   masternode.com running         32.0G
ins6.com kvm        raw-image+default   masternode.com running         32.0G
```
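For reference, the stranded instance and the node a failover would move it to can be pulled out of `gnt-instance list` with a small filter. This is only a sketch: the `name`, `pnode`, `snodes` and `status` field names and the `--separator` option are as I remember them from the `gnt-instance` man page, and `stranded_instances` is a hypothetical helper name, so check both against your Ganeti version.

```shell
# Hypothetical helper: print each instance whose status is ERROR_nodedown
# together with its DRBD secondary, i.e. where a failover would move it.
# Expects colon-separated input with the columns name:pnode:snodes:status.
stranded_instances() {
  awk -F: '$4 == "ERROR_nodedown" {
    target = ($3 == "") ? "(no secondary node)" : $3
    print $1 " -> " target
  }'
}

# Usage on the master node (field names and option assumed, see above):
#   gnt-instance list --no-headers --separator=: -o name,pnode,snodes,status |
#     stranded_instances
```

On the cluster above this should report ins1.com with masternode.com as its target, since the verify output shows ins1's second disk copy on masternode.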

I'm also attaching the post-failover `gnt-cluster verify` report:

```
root@masternode:/home/masternode# gnt-cluster verify
Submitted jobs 2254, 2255
Waiting for job 2254 ...
Thu Dec 26 16:49:08 2024 * Verifying cluster config
Thu Dec 26 16:49:08 2024 * Verifying cluster certificate files
Thu Dec 26 16:49:08 2024 * Verifying hypervisor parameters
Thu Dec 26 16:49:08 2024 * Verifying all nodes belong to an existing group
Waiting for job 2255 ...
Thu Dec 26 16:49:14 2024 * Verifying group 'default'
Thu Dec 26 16:49:14 2024 * Gathering data (4 nodes)
Thu Dec 26 16:49:14 2024 * Gathering information about nodes (4 nodes)
Thu Dec 26 16:49:31 2024 * Gathering disk information (4 nodes)
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: while getting disk information: Error 7: Failed to connect to 192.168.1.231 port 1811: No route to host
Thu Dec 26 16:49:33 2024 * Verifying configuration file consistency
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: Could not verify the SSH setup of this node.
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: Node did not return file checksum data
Thu Dec 26 16:49:33 2024 * Verifying node status
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: while contacting node: Error 7: Failed to connect to 192.168.1.231 port 1811: No route to host
Thu Dec 26 16:49:33 2024   - ERROR: node masternode.com: ssh communication with node 'node1.com': ssh problem: ssh: connect to host node1.com port 22: No route to host\n
Thu Dec 26 16:49:33 2024   - ERROR: node masternode.com: tcp communication with node 'node1.com': failure using the primary interface(s)
Thu Dec 26 16:49:33 2024   - ERROR: node node2.com: ssh communication with node 'node1.com': ssh problem: ssh: connect to host node1.com port 22: No route to host\n
Thu Dec 26 16:49:33 2024   - ERROR: node node2.com: tcp communication with node 'node1.com': failure using the primary interface(s)
Thu Dec 26 16:49:33 2024   - ERROR: node node3.com: ssh communication with node 'node1.com': ssh problem: ssh: connect to host node1.com port 22: No route to host\n
Thu Dec 26 16:49:33 2024   - ERROR: node node3.com: tcp communication with node 'node1.com': failure using the primary interface(s)
Thu Dec 26 16:49:33 2024 * Verifying instance status
Thu Dec 26 16:49:33 2024   - ERROR: instance ins6.com: couldn't retrieve status for disk/0 on node1.com: Error 7: Failed to connect to 192.168.1.231 port 1811: No route to host
Thu Dec 26 16:49:33 2024   - WARNING: instance ins6.com: disk/0 on masternode.com is degraded; local disk state is 'ok'
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: instance ins6.com, connection to secondary node failed
Thu Dec 26 16:49:33 2024   - ERROR: instance ins1.com: instance not running on its primary node node1.com
Thu Dec 26 16:49:33 2024   - ERROR: instance ins1.com: couldn't retrieve status for disk/0 on node1.com: Error 7: Failed to connect to 192.168.1.231 port 1811: No route to host
Thu Dec 26 16:49:33 2024   - WARNING: instance ins1.com: disk/0 on masternode.com is degraded; local disk state is 'ok'
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: instance ins1.com, connection to primary node failed
Thu Dec 26 16:49:33 2024   - ERROR: instance ins4.com: couldn't retrieve status for disk/0 on node1.com: Error 7: Failed to connect to 192.168.1.231 port 1811: No route to host
Thu Dec 26 16:49:33 2024   - WARNING: instance ins4.com: disk/0 on masternode.com is degraded; local disk state is 'ok'
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: instance ins4.com, connection to secondary node failed
Thu Dec 26 16:49:33 2024 * Verifying orphan volumes
Thu Dec 26 16:49:33 2024 * Verifying N+1 Memory redundancy
Thu Dec 26 16:49:33 2024   - ERROR: node node1.com: not enough memory to accomodate instance failovers should node masternode.com fail (40960MiB needed, 0MiB available)
```
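The last ERROR line is Ganeti's N+1 memory check rather than a separate fault. The verify output reports ins4 and ins6 (both with primary masternode in the list above) as having their secondary on node1, so node1 would have to start both if masternode failed: 8.0G + 32.0G = 8192 + 32768 = 40960 MiB, which matches the figure in the message, while the powered-off node1 reports 0 MiB available. A hypothetical helper reproducing that sum from a hand-written table (the input format is an assumption, not Ganeti output):

```shell
# Hypothetical helper: memory in MiB that node $2 must absorb if node $1 fails,
# i.e. the sum over instances with primary $1 and DRBD secondary $2.
# Input: one "name primary secondary memory_mib" line per instance.
nplus1_needed() {
  awk -v p="$1" -v s="$2" '$2 == p && $3 == s { sum += $4 } END { print sum + 0 }'
}

# ins4 (8.0G) and ins6 (32.0G) have primary masternode, secondary node1:
printf '%s\n' \
  'ins4.com masternode.com node1.com 8192' \
  'ins6.com masternode.com node1.com 32768' |
  nplus1_needed masternode.com node1.com   # prints 40960
```

So this error should clear by itself once node1 is back up with enough free memory.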

How can I get this to happen automatically, with the same result as when I fail over manually with `gnt-instance failover ins1`?
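As far as I know, Ganeti itself does not fail instances over when a primary node dies; that decision is left to the administrator or to an external tool. With the primary unreachable, the manual equivalent of `gnt-instance failover ins1` needs `--ignore-consistency` (e.g. `gnt-instance failover --ignore-consistency ins1.com`), because Ganeti cannot contact node1 to check the disks. The sketch below automates that for a whole node under stated assumptions: ping is used as the liveness test, the node is marked offline and then evacuated with `gnt-node failover`, and the function names and the `DRY_RUN` switch are hypothetical. Verify the flags against the `gnt-node` man page of your version before relying on it.

```shell
#!/bin/sh
# Hypothetical watchdog sketch, not part of Ganeti: run it on the master node
# (for example from cron) with the names of the nodes to watch.
# DRY_RUN=1 only prints the gnt-* commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Succeeds if the node answers a single ping within 2 seconds
# (-W is the timeout option of Linux iputils ping).
node_is_up() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Mark a dead node offline, then fail all of its primary instances over to
# their DRBD secondaries. --ignore-consistency is needed because the old
# primary cannot be contacted to check the disks.
evacuate_dead_node() {
  run gnt-node modify --offline=yes "$1" &&
    run gnt-node failover -f --ignore-consistency "$1"
}

# Evacuate every node given as an argument that does not answer.
check_nodes() {
  for node in "$@"; do
    if ! node_is_up "$node"; then
      echo "$node is unreachable, failing over its instances" >&2
      evacuate_dead_node "$node"
    fi
  done
}

# Example (dry run): DRY_RUN=1 check_nodes node1.com node2.com node3.com
```

This is only safe if an unreachable node is really powered off. If node1 were merely cut off from the network while still running ins1, failing over with `--ignore-consistency` would leave two copies of the instance writing to the same DRBD disk (split brain), so a real setup would need proper fencing rather than a single ping.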

