Error when starting an instance

Léa Mckenzie

Nov 8, 2022, 5:25:09 AM
to ganeti
Hello All,

I'm a real newbie to Ganeti, and I'm taking over an old infrastructure left behind by the previous IT team.

One node went down. I brought it back up, but now I can't start any instance on it from the master.

Below is the error message from the master:
Job 671721 for jump.admin has failed: Failure: prerequisites not met for this operation:
error type: environment_error, error details:
Can't get data from node server-06: Error while executing backend function: Unable to get KVM --help output
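The message appears to come from Ganeti's KVM backend, which runs the configured KVM binary with --help to detect its features; here that call failed on server-06. A minimal sketch for repeating the probe by hand on server-06, assuming the binary to test is whatever the cluster's kvm_path hypervisor parameter points to (shown in the hypervisor parameters of gnt-cluster info). check_kvm_help is a hypothetical helper name, not part of Ganeti.

```shell
#!/bin/sh
# Hypothetical helper: run a KVM binary with --help, as Ganeti's backend
# appears to do, and report whether it exits successfully and prints output.
# Usage: check_kvm_help /path/to/kvm   (path assumed to match kvm_path)
check_kvm_help() {
    kvm_bin=$1
    if [ ! -x "$kvm_bin" ]; then
        echo "missing or not executable: $kvm_bin" >&2
        return 2
    fi
    # Capture stdout and stderr together so any crash message is shown.
    if ! out=$("$kvm_bin" --help 2>&1); then
        echo "$kvm_bin --help failed:" >&2
        printf '%s\n' "$out" >&2
        return 1
    fi
    if [ -z "$out" ]; then
        echo "$kvm_bin --help printed nothing" >&2
        return 1
    fi
    echo "ok: $kvm_bin --help works"
}
```

Return code 2 means the binary is missing (for example a package not reinstalled after the node came back), while 1 means it exists but fails to run, which would point towards a broken QEMU/KVM installation or missing libraries on that node.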

 
Output of gnt-node list:
Node      DTotal  DFree MTotal  MNode  MFree Pinst Sinst
server-01   4.0T  11.0G 503.4G 379.9G 150.4G    53     0
server-02   4.0T 531.0G 503.4G 333.5G 207.4G    47     0
server-03   4.0T 119.0G 503.4G 345.7G 235.0G    42     0
server-04   3.9T 690.5G 503.4G 380.0G 239.6G    26     0
server-05   3.9T  68.5G 503.4G 391.1G 131.3G    39     0
server-06      ?      ?      ?      ?      ?    53     0
server-08   4.0T 456.0G 503.4G 355.0G 176.7G    30     0

Could someone point me in the right direction for understanding this product?

Many thanks

Léa Mckenzie

Nov 8, 2022, 8:33:30 AM
to ganeti
Distribution on all my nodes:

CentOS Linux release 7.4.1708 (Core)

Ganeti version:
ganeti.x86_64                         2.15.2-2.el7.centos             installed
ganeti-sysvinit.x86_64                2.15.2-2.el7.centos             installed


Output of gnt-cluster verify:

[server-02 ~]# gnt-cluster verify
Submitted jobs 671782, 671783
Waiting for job 671782 ...
Tue Nov  8 14:21:42 2022 * Verifying cluster config
Tue Nov  8 14:21:42 2022 * Verifying cluster certificate files
Tue Nov  8 14:21:42 2022   - ERROR: cluster: While verifying /var/lib/ganeti/spice.pem: Certificate is expired (valid from 2016-04-20 17:09:17 to 2021-04-19 17:09:17)
Tue Nov  8 14:21:42 2022   - ERROR: cluster: While verifying /var/lib/ganeti/spice-ca.pem: Certificate is expired (valid from 2016-04-20 17:09:17 to 2021-04-19 17:09:17)
Tue Nov  8 14:21:42 2022 * Verifying hypervisor parameters
Tue Nov  8 14:21:42 2022 * Verifying all nodes belong to an existing group
Waiting for job 671783 ...
Tue Nov  8 14:21:44 2022 * Verifying group 'default'
Tue Nov  8 14:21:44 2022 * Gathering data (10 nodes)
Tue Nov  8 14:21:44 2022 * Gathering information about nodes (10 nodes)
Tue Nov  8 14:21:49 2022 * Gathering disk information (10 nodes)
Tue Nov  8 14:21:53 2022 * Verifying configuration file consistency
Tue Nov  8 14:21:53 2022   - ERROR: cluster: Client certificate digest of master candidate '947090e9-31bd-45fd-b452-5d56e9e4d3d6' does not match its entry in the cluster's map of master candidate certificates. Expected: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47 Got: C5:AF:1D:77:AC:81:C0:7D:7F:F8:80:ED:3D:BB:B7:22:C2:7D:C2:DB
Tue Nov  8 14:21:53 2022   - ERROR: cluster: Client certificate digest of master candidate 'a60d12fb-d381-4a03-84ad-8519b292e200' does not match its entry in the cluster's map of master candidate certificates. Expected: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47 Got: A5:4F:37:54:A0:83:B7:CF:D9:19:10:F9:3A:CD:99:63:30:01:27:69
Tue Nov  8 14:21:53 2022   - ERROR: cluster: Client certificate digest of master candidate 'f0f5b568-dc59-4f35-b377-315ababb4a53' does not match its entry in the cluster's map of master candidate certificates. Expected: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47 Got: F3:B9:F0:81:01:73:2B:B1:B9:D7:13:5B:73:11:11:DC:CA:C6:21:DB
Tue Nov  8 14:21:53 2022   - ERROR: cluster: Client certificate digest of master candidate '7d863cb0-4f76-482e-8d55-e62c3a2b49ae' does not match its entry in the cluster's map of master candidate certificates. Expected: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47 Got: BD:AE:D3:BC:1D:40:94:6E:59:E2:90:A7:6A:AC:88:19:52:1F:FF:90
Tue Nov  8 14:21:53 2022   - ERROR: cluster: Client certificate digest of master candidate '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a' does not match its entry in the cluster's map of master candidate certificates. Expected: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47 Got: 06:09:4F:EE:6D:D6:D3:69:77:A3:42:A0:B1:78:CD:A6:1A:4E:BD:6A
Tue Nov  8 14:21:53 2022   - ERROR: node: The following node UUIDs are listed in the public key file on node 'server-05.corp', but are not potential master candidates: flexo.corp, server-06.corp, server-07.corp. The following node UUIDs of potential master candidates are missing in the public key file on node server-05.corp: e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf. A SSH key of master candidate 'server-05.corp' (UUID: '947090e9-31bd-45fd-b452-5d56e9e4d3d6') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-01.corp' (UUID: 'a60d12fb-d381-4a03-84ad-8519b292e200') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-04.corp' (UUID: 'f0f5b568-dc59-4f35-b377-315ababb4a53') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-08.corp' (UUID: '7d863cb0-4f76-482e-8d55-e62c3a2b49ae') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-05.corp'. 
A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-05.corp'. A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-05.corp'.
Tue Nov  8 14:21:53 2022   - ERROR: node: The following node UUIDs are listed in the public key file on node 'server-01.corp', but are not potential master candidates: flexo.corp, server-06.corp, server-07.corp. The following node UUIDs of potential master candidates are missing in the public key file on node server-01.corp: e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf. A SSH key of master candidate 'server-05.corp' (UUID: '947090e9-31bd-45fd-b452-5d56e9e4d3d6') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-01.corp' (UUID: 'a60d12fb-d381-4a03-84ad-8519b292e200') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-04.corp' (UUID: 'f0f5b568-dc59-4f35-b377-315ababb4a53') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-08.corp' (UUID: '7d863cb0-4f76-482e-8d55-e62c3a2b49ae') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-01.corp'. 
A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-01.corp'. A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-01.corp'.
Tue Nov  8 14:21:53 2022   - ERROR: node: The following node UUIDs are listed in the public key file on node 'server-04.corp', but are not potential master candidates: flexo.corp, server-06.corp, server-07.corp. The following node UUIDs of potential master candidates are missing in the public key file on node server-04.corp: e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf. A SSH key of master candidate 'server-05.corp' (UUID: '947090e9-31bd-45fd-b452-5d56e9e4d3d6') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-01.corp' (UUID: 'a60d12fb-d381-4a03-84ad-8519b292e200') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-04.corp' (UUID: 'f0f5b568-dc59-4f35-b377-315ababb4a53') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-08.corp' (UUID: '7d863cb0-4f76-482e-8d55-e62c3a2b49ae') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-03.corp' (UUID: '0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a') is not in the 'authorized_keys' file of node 'server-04.corp'. 
A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-04.corp'. A SSH key of master candidate 'server-02.corp' (UUID: '39d98794-89f2-42e7-b52a-6515dc165cb3') is not in the 'authorized_keys' file of node 'server-04.corp'.
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: Could not verify the SSH setup of this node.
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: Node did not return file checksum data
Tue Nov  8 14:21:53 2022 * Verifying node status
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-05.corp: missing bridges: vmbr43
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-01.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: hypervisor kvm parameter verify failure (source instance w10-mobile-sim.corp): Parameter 'cdrom2_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10_x86_64_virtio.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: hypervisor kvm parameter verify failure (source instance nemesis.corp): Parameter 'cdrom2_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10_x86_64_virtio.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: hypervisor kvm parameter verify failure (source instance banana.integ): Parameter 'cdrom2_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/virtio-win-0.1-74.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom2_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/virtio-win-0.1-74.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom2_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/virtio-win-0.1-74.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-04.corp: missing bridges: vmbr43
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: while contacting node: Error while executing backend function: Unable to get KVM --help output
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-08.corp: missing bridges: vmbr43
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-03.corp: missing bridges: vmbr43
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: hypervisor kvm parameter verify failure (source instance sage.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: hypervisor kvm parameter verify failure (source instance support-windows10.integ): Parameter 'cdrom_image_path' fails validation: not found or not a file or URL (current value: '/srv/ganeti/iso/windows10.iso')
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-08.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-03.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-02.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-04.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-05.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: ssh communication with node 'server-01.corp': ssh problem: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\'r\n
Tue Nov  8 14:21:53 2022   - ERROR: node server-02.corp: missing bridges: vmbr43
Tue Nov  8 14:21:53 2022 * Verifying instance status
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance packages.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance vmdev-022.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance bench-mcms-be4.rdvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance zk3.build, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance jump.admin, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance copypaste.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance test-rre.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance locol-ans.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance papyrus-replica.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance lab-otn-igloo.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance malitel.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance vmdev-017.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance centos6u5-test-sctp.build.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance azercel-iris.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance lab-otn-lms.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance pandora.admin, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance msproject-1.sales, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance task-force-qkv-db3.rdvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance vmdev-010.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance lab-aneep.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance lms-otn-51.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance w10-or-tools.integ, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance sage.integ, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance osm-ksa.psvm, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance vmroots.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance vmdev-021.corp, connection to primary node failed
Tue Nov  8 14:21:53 2022   - ERROR: node server-06.corp: instance metronome.rdvm, connection to primary node failed
Tue Nov  8 14:21:53 2022 * Verifying orphan volumes
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/centos-6j.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/zk2.rdvm is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/cores is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/zk3.rdvm is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/bf51239e-7f44-44a2-bed1-17549601789a.disk0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/nagios.corp is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/bb-rhea.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/spools is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/bb-cronos.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/spools.rd-deps is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-05.corp: volume ganeti/jedox.integ is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-01.corp: volume ganeti/mijiu.snap is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-01.corp: volume ganeti/centos-5.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-01.corp: volume ganeti/centos-6b.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-01.corp: volume ganeti/centos-5b.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/one-nightly.rh6.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/centos-6f.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/smsc.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/lms-52.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/igl-jfe07.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/one-nightly.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/igl-jfe06.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/mcms-50.rdvm.0 is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-04.corp: volume ganeti/centos-7a.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/bb-theia.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/gerrit-preprod.corp is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/bkp_git.corp is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/artemis.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/buster-template is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/git.corp is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/demeter.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-08.corp: volume ganeti/hermes.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/ares.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/paste.corp is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/git-preprod is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/centos-6e.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/hestia.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/server_transfer is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/centreon.snap is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-03.corp: volume ganeti/centos-6c.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-02.corp: volume ganeti/osm.integ is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-02.corp: volume ganeti/centos-7b.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-02.corp: volume ganeti/tmp_edmz_export is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-02.corp: volume ganeti/centos-6.build is unknown
Tue Nov  8 14:21:53 2022   - WARNING: node server-02.corp: volume ganeti/centos-6u8.integ is unknown
Tue Nov  8 14:21:53 2022 * Verifying N+1 Memory redundancy
Tue Nov  8 14:21:53 2022 * Other Notes
Tue Nov  8 14:21:53 2022   - NOTICE: 310 non-redundant instance(s) found.
Tue Nov  8 14:21:53 2022   - NOTICE: 3 offline node(s) found.
Tue Nov  8 14:21:54 2022 * Hooks Results

Rudolph Bott

Nov 8, 2022, 6:07:57 PM
to gan...@googlegroups.com
Hi Léa, 

welcome to the world of Ganeti :-) The cluster you have inherited there seems to be in a pretty bad state. Ganeti uses both SSH and HTTPS for internal communication and both seem to be borked in multiple ways.

You can read more about the communication channels here: https://docs.ganeti.org/docs/ganeti/2.15/html/security.html
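Since verify reports failed SSH from every node to every other node, one way to re-test that mesh by hand is to try non-interactive root SSH from each node to all the others. A minimal sketch, meant to be run on each node in turn; check_ssh_mesh and the SSH_CMD override are hypothetical names, and this is not how Ganeti itself performs its check.

```shell
#!/bin/sh
# Hypothetical helper: try non-interactive root SSH to each node given as an
# argument and print OK/FAIL per node. Returns non-zero if any node failed.
# SSH_CMD may be overridden (e.g. "ssh -i /path/to/key"); defaults to ssh.
check_ssh_mesh() {
    ssh_cmd=${SSH_CMD:-ssh}
    failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # which is closer to what an unattended daemon experiences.
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "root@$node" true \
            </dev/null >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For example, on each node: check_ssh_mesh server-01.corp server-02.corp server-03.corp server-04.corp server-05.corp server-08.corp. Any FAIL line corresponds to a "Permission denied (publickey,...)" style error that still needs fixing.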

I would suggest fixing SSH first, which should greatly reduce the number of errors reported by gnt-cluster verify. To do so, you can run this on your master node:

gnt-cluster renew-crypto --new-ssh-keys

If this fails, you can still try to fix the situation manually. Log in to each of your servers and collect all SSH public keys for root. These should be either /root/.ssh/id_dsa.pub or /root/.ssh/id_rsa.pub (depending on the type of keys used in your cluster). Build a list of all those public keys, put it into /root/.ssh/authorized_keys on all your nodes, and re-run gnt-cluster verify. If (and only if!) all SSH-related errors are gone, you can try to fix your TLS certificates (which have expired and also have other problems, according to gnt-cluster verify):
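The manual step of building one list of root public keys can be sketched as a small merge: start from a node's existing authorized_keys, append every collected .pub file, and drop blank lines and exact duplicates. merge_authorized_keys is a hypothetical helper; it assumes the .pub files from all nodes have already been copied to one place.

```shell
#!/bin/sh
# Hypothetical helper: merge an existing authorized_keys file with collected
# public key files, dropping blank lines and exact duplicate lines while
# keeping first-seen order. A missing existing file is treated as empty.
# Usage: merge_authorized_keys EXISTING_AUTHORIZED_KEYS KEY.pub [KEY.pub ...]
merge_authorized_keys() {
    existing=$1
    shift
    [ -f "$existing" ] || existing=/dev/null
    # awk reads each file separately, so a .pub file without a trailing
    # newline cannot glue two keys onto one line.
    awk 'NF && !seen[$0]++' "$existing" "$@"
}
```

For example: merge_authorized_keys /root/.ssh/authorized_keys /tmp/keys/*.pub > /root/.ssh/authorized_keys.new && chmod 600 /root/.ssh/authorized_keys.new && mv /root/.ssh/authorized_keys.new /root/.ssh/authorized_keys (keeping a backup of the old file first is wise). Only identical lines are deduplicated; the same key with a different comment is kept twice, which is harmless.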

gnt-cluster renew-crypto --new-cluster-certificate
gnt-cluster renew-crypto --new-node-certificates
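Before and after renewing, it may help to check each certificate's expiry and SHA-1 fingerprint with standard openssl commands. The fingerprint has the same 20-byte, colon-separated shape as the Expected/Got digests in the verify output, which suggests (not confirmed here) that those digests are SHA-1 fingerprints of the nodes' client certificates. The spice paths come from the verify output; /var/lib/ganeti/server.pem and /var/lib/ganeti/client.pem are assumed locations for the cluster and node client certificates and should be checked on your install. The function names are hypothetical.

```shell
#!/bin/sh
# cert_valid FILE: succeeds if the certificate has not expired yet
# (openssl's -checkend 0 exits non-zero for an expired certificate).
cert_valid() {
    openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1
}

# cert_sha1 FILE: print the SHA-1 fingerprint as colon-separated hex,
# stripping openssl's "SHA1 Fingerprint=" prefix.
cert_sha1() {
    openssl x509 -in "$1" -noout -fingerprint -sha1 | cut -d= -f2
}

# report_certs FILE...: one line per file with expiry status and fingerprint.
report_certs() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "$f: not readable"
        elif cert_valid "$f"; then
            echo "$f: valid, sha1 $(cert_sha1 "$f")"
        else
            echo "$f: EXPIRED or unparsable, sha1 $(cert_sha1 "$f")"
        fi
    done
}
```

For example: report_certs /var/lib/ganeti/server.pem /var/lib/ganeti/client.pem /var/lib/ganeti/spice.pem /var/lib/ganeti/spice-ca.pem, run on each node, makes it easy to see which files the renewal actually replaced.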

Please let us know the output of gnt-cluster verify after these steps.
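Because verify output on a cluster in this state is very long, a per-node count of ERROR and WARNING lines makes it easier to compare runs before and after each step. A minimal sketch, assuming the output has been saved to a file; summarize_verify is a hypothetical helper that only counts lines containing " - ERROR: " or " - WARNING: ".

```shell
#!/bin/sh
# Hypothetical helper: condense saved gnt-cluster verify output (on stdin)
# into "COUNT SEVERITY SUBJECT" lines, e.g. "12 ERROR node server-01.corp".
# Wrapped continuation lines without a severity marker are ignored.
summarize_verify() {
    awk '
        match($0, / - (ERROR|WARNING): /) {
            # RSTART points at the space before "-"; skip " - " and ": ".
            sev = substr($0, RSTART + 3, RLENGTH - 5)
            # Subject is the text up to the next colon: "cluster" or "node NAME".
            subj = substr($0, RSTART + RLENGTH)
            sub(/:.*/, "", subj)
            count[sev " " subj]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

For example: gnt-cluster verify > verify-before.txt 2>&1, then summarize_verify < verify-before.txt, and the same again after each fix.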





Léa Mckenzie

Nov 9, 2022, 10:18:49 AM
to ganeti
Hi,

Thank you for your answer. I ran the command:

gnt-cluster renew-crypto --new-ssh-keys

But it failed, so I manually distributed all the SSH keys as you suggested.
Then I tried to run the command again, just to see whether it would work now. Unfortunately, after a prompt saying “The authenticity of host 'server-08' can't be established”, I got the following error:

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Wed Nov  9 16:02:42 2022 Renewing SSH keys
Failure: command execution error:
Could not renew the SSH keys of all nodes: Error while executing backend function: Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oPort=22 -oStrictHostKeyChecking=no -4 root@server-02 '/bin/sh -c /usr/lib64/ganeti/ssh-update'' failed: exited with exit code 255


When running gnt-cluster verify I no longer get any SSH errors, so I will now run renew-crypto for the certificates and let you know.

Léa Mckenzie

Nov 9, 2022, 10:49:24 AM
to ganeti
I ran only gnt-cluster renew-crypto --new-cluster-certificate

But at the end it asked me for the password of the master (a host name I didn't recognize), and by the time I found the correct password, it had failed:

[root@server-02 ~]# gnt-cluster renew-crypto --new-cluster-certificate

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
Note: skipping offline node(s):
Blocking watcher
Stopping master daemons
Stopping daemons on server-02.corp
Starting daemon 'ganeti-noded' on server-02.corp
Starting daemon 'ganeti-wconfd' on server-02.corp
Stopping daemons on server-01.corp
Stopping daemons on server-03.corp
Stopping daemons on server-04.corp
Stopping daemons on server-05.corp
Stopping daemons on server-06.corp
Stopping daemons on server-08.corp
Updating the cluster SSL certificate.
Copying /var/lib/ganeti/server.pem to server-01.corp:22
Copying /var/lib/ganeti/server.pem to server-03.corp:22
Copying /var/lib/ganeti/server.pem to server-04.corp:22
Copying /var/lib/ganeti/server.pem to server-05.corp:22
Copying /var/lib/ganeti/server.pem to server-06.corp:22
Copying /var/lib/ganeti/server.pem to server-08.corp:22
Updating client SSL certificates.
ro...@gnt.corp's password:
Permission denied, please try again.
ro...@gnt.corp's password:
Permission denied, please try again.
ro...@gnt.corp's password:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Failure: command execution error:
Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-02.corp '/bin/sh -c /usr/lib64/ganeti/ssl-update'' failed: exited with exit code 255
Starting daemons on server-01.corp
Starting daemons on server-03.corp
Starting daemons on server-04.corp
Starting daemons on server-05.corp
Starting daemons on server-06.corp
Starting daemons on server-08.corp
Stopping daemon 'ganeti-wconfd' on server-02.corp
Stopping daemon 'ganeti-noded' on server-02.corp
Starting daemons on server-02.corp
Failure: command execution error:
Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-02.corp '/bin/sh -c /usr/lib64/ganeti/ssl-update'' failed: exited with exit code 255


Now, whenever I try to run any command, I get:
[root@server-02 ~]# gnt-cluster verify
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


Running gnt-job list gives the same result.

Your help would be really appreciated.

Rudolph Bott

Nov 9, 2022, 11:39:03 AM
to gan...@googlegroups.com
Hi Léa,

it seems that it is currently not possible for the Ganeti master to talk to all nodes via SSH. You can verify this manually by issuing

ssh -l root $node

..and see if you get a password-less login. If that works out, the node should be fine. If it does not, you might need to fix things:
* make sure that your SSH public key is present in the remote /root/.ssh/authorized_keys file
* clear any host-key related errors (e.g. accept the host key from that node, remove conflicting ones from your known_hosts file)
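The manual check above can be run against all nodes at once. This is a minimal Python sketch, assuming root logins and the node names seen in this thread; the helper names are hypothetical. It passes -oBatchMode=yes (the same option Ganeti itself uses in its ssh calls), so ssh fails instead of prompting: a node that wants a password, or whose host key is unknown or changed, shows up as failed rather than hanging.

```python
import subprocess

# Node names as they appear in this thread; adjust to your cluster.
NODES = ["server-01.corp", "server-02.corp", "server-03.corp", "server-04.corp",
         "server-05.corp", "server-06.corp", "server-08.corp"]


def ssh_check_command(node, user="root", port=22):
    """ssh argv that logs in, runs a no-op, and never prompts."""
    return [
        "ssh",
        "-oBatchMode=yes",        # fail instead of asking for a password or host key
        "-oConnectTimeout=10",    # do not hang on an unreachable node
        "-oPort={0}".format(port),
        "-l", user,
        node,
        "true",                   # remote no-op: exit status 0 means the login worked
    ]


def nodes_without_passwordless_ssh(nodes=NODES, run=subprocess.call):
    """Return the nodes where the non-interactive login did not succeed."""
    return [node for node in nodes if run(ssh_check_command(node)) != 0]
```

Calling nodes_without_passwordless_ssh() on the master should return an empty list once every node accepts the key.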

You should be able to (re)start the ganeti-related daemons by issuing locally on each server:

service ganeti restart

This will not affect any running instances, only restart the ganeti daemons. After that, you should be able to use gnt-* commands again on your master node.

Please post the output of gnt-cluster verify afterwards.

Cheers,
Rudi



--
 Rudolph Bott - bo...@sipgate.de

 sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
 HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
 Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391

Léa Mckenzie

Nov 9, 2022, 5:18:39 PM
to ganeti
Hi Rudi,

From the master, I was able to log in with SSH to all nodes without a password.

I went to every node and restarted the ganeti service:

service ganeti restart

But I'm still not able to run any gnt-* command; I get the following error:

[root@server-02 ~]# gnt-cluster verify
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


When looking at the status of ganeti-luxid.service and ganeti-wconfd.service on the master:

 
[root@server-02 ~]# systemctl status ganeti-luxid.service
● ganeti-luxid.service - Ganeti query daemon (luxid)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-luxid.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-11-09 22:57:34 CET; 46s ago
     Docs: man:ganeti-luxid(8)
 Main PID: 181465 (ganeti-luxid)
    Tasks: 1
   Memory: 23.4M
   CGroup: /system.slice/ganeti-luxid.service
           └─181465 /usr/sbin/ganeti-luxid -f

Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-02.corp", nodePrimaryIp = "192.168.2.15", nodeSecondaryIp = "192.168.2.15", node...nodeVmCapable
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", node...nodeVmCapable
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", node...nodeVmCapable
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeMa...deVmCapable =
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", node...nodeVmCapable
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", node...nodeVmCapable
Nov 09 22:58:16 server-02 ganeti-luxid[181465]: No voting RPC result from ["server-03.corp","server-02.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.corp"]
Hint: Some lines were ellipsized, use -l to show in full.

--------

[root@server-02 ~]# systemctl status ganeti-wconfd.service
● ganeti-wconfd.service - Ganeti config writer daemon (wconfd)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-wconfd.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-11-09 23:08:29 CET; 54s ago
     Docs: man:ganeti-wconfd(8)
 Main PID: 183474 (ganeti-wconfd)
    Tasks: 1
   Memory: 25.9M
   CGroup: /system.slice/ganeti-wconfd.service
           └─183474 /usr/sbin/ganeti-wconfd -f

Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-02.corp", nodePrimaryIp = "192.168.2.15", nodeSecondaryIp = "192.168.2.15", node...nodeVmCapable
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", node...nodeVmCapable
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", node...nodeVmCapable
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeMa...deVmCapable =
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", node...nodeVmCapable
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", node...nodeVmCapable
Nov 09 23:09:11 server-02 ganeti-luxid[181465]: No voting RPC result from ["server-03.corp","server-02.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.corp"]
Hint: Some lines were ellipsized, use -l to show in full.

It is really frustrating, as I don't understand what could be causing the issue, and solving one problem just leads me to another without fixing the main one.

Thank you for helping me, Rudi.

Léa Mckenzie

Nov 9, 2022, 5:40:01 PM
to ganeti
While digging through the logs I found this:

Nov  9 23:29:01 server-02 ganeti-luxid: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Mon May  7 15:54:22 CEST 2018, nodeMtime = Mon May  7 15:54:22 CEST 2018, nodeUuid = "f0f5b568-dc59-4f35-b377-315ababb4a53", nodeSerial = 1, nodeTags = fromList []}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
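Log lines like the one above are long, but the two useful fields are the node name and the curl error code. A minimal Python sketch to summarize which nodes fail and why, assuming the lines look exactly like the one quoted here (the regular expression is built from that sample only) and that the daemons log through syslog, which CentOS 7 writes to /var/log/messages by default. The ellipsized lines from systemctl status will not match, since they lack the CurlLayerError part.

```python
import re

# Built from the luxid line quoted above: node name inside 'Node {nodeName = "..."'
# and the code after 'CurlLayerError "code: '.
RPC_ERROR_RE = re.compile(
    r"Error in the RPC HTTP reply from 'Node \{nodeName = "
    r'"(?P<node>[^"]+)"'
    r'.*CurlLayerError "code: (?P<code>\w+)'
)


def failing_nodes(log_lines):
    """Map node name -> curl error code for every RPC error line found."""
    result = {}
    for line in log_lines:
        match = RPC_ERROR_RE.search(line)
        if match:
            result[match.group("node")] = match.group("code")
    return result


def report_from_syslog(path="/var/log/messages"):
    """Print one 'node: code' line per failing node (the path is an assumption)."""
    with open(path) as handle:
        for node, code in sorted(failing_nodes(handle).items()):
            print("{0}: {1}".format(node, code))
```

If every node reports CurlSSLCACert, the problem is cluster-wide trust of the certificates rather than a single broken node.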

So I tried to run

gnt-cluster renew-crypto --new-node-certificates

But I'm still facing the same timeout issue:

 
[root@server-02 ~]# gnt-cluster renew-crypto --new-node-certificates

Updating certificates now. Running "gnt-cluster verify" is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out

Simon Maillard

Nov 10, 2022, 3:37:56 AM
to gan...@googlegroups.com
On 09/11/2022 16:49, Léa Mckenzie wrote:
> Command 'ssh -oEscapeChar=none -oHashKnownHosts=no
> -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts
> -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gnt.corp
> -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-02.corp '/bin/sh -c
> /usr/lib64/ganeti/ssl-update'' failed: exited with exit code 255

It looks like your master node can't connect to itself.
Can you try

# ssh ro...@server-02.corp

It will probably complain about a bad host key.
Remove it from your known_hosts file, then run it again to accept the host key.

then

# ssh -oStrictHostKeyChecking=yes ro...@server-02.corp

then retry

# gnt-cluster renew-crypto --new-cluster-certificate
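A plain ssh login succeeding does not prove that Ganeti's own ssh call will pass. In the failing command, -oHostKeyAlias=gnt.corp makes ssh look the host key up under the name gnt.corp, and -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts with -oUserKnownHostsFile=/dev/null means only Ganeti's file is consulted, with -oStrictHostKeyChecking=yes. A minimal Python sketch to check whether that file holds the local host key under the alias, assuming an RSA host key at the usual /etc/ssh/ssh_host_rsa_key.pub and unhashed entries (Ganeti passes -oHashKnownHosts=no); the helper names are hypothetical.

```python
def known_host_keys(known_hosts_lines, host):
    """(keytype, key) pairs recorded for host in plain known_hosts lines.

    Hashed ('|1|...') and '[host]:port' entries are not handled by this sketch.
    """
    keys = []
    for line in known_hosts_lines:
        parts = line.split()
        if len(parts) < 3 or parts[0].startswith(("#", "@")):
            continue  # skip blanks, comments and @cert-authority/@revoked markers
        if host in parts[0].split(","):
            keys.append((parts[1], parts[2]))
    return keys


def host_key_matches(known_hosts_lines, host, host_pub_line):
    """True if the public host key line is recorded for host."""
    parts = host_pub_line.split()
    return (parts[0], parts[1]) in known_host_keys(known_hosts_lines, host)


def check_ganeti_alias(alias="gnt.corp",
                       known_hosts="/var/lib/ganeti/known_hosts",
                       host_pub="/etc/ssh/ssh_host_rsa_key.pub"):
    """Compare this node's host key with Ganeti's known_hosts entry for alias."""
    with open(known_hosts) as kh, open(host_pub) as pub:
        return host_key_matches(kh.read().splitlines(), alias, pub.read())
```

A False result from check_ganeti_alias() on server-02 would be consistent with the "Permission denied"/exit code 255 failure, since strict checking rejects a host key that does not match the alias entry.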



--
Simon Maillard

Because otherwise it makes the discussion incomprehensible.
> Why is that?
>> I prefer to reply below.
>>> What do you do instead?
>>>> No.
>>>>> Don't you like replying on top?

Léa Mckenzie

Nov 10, 2022, 3:53:09 AM
to ganeti
Hi Simon,

Thanks for your help.

My master is able to connect to itself via SSH.

[root@server-02 ~]# ssh -oStrictHostKeyChecking=yes ro...@server-02.corp
Last login: Thu Nov 10 09:46:14 2022 from server-02.corp
[root@server-02 ~]# exit
logout
Connection to server-02.corp closed.


Unfortunately, that didn't solve the issue, and I'm still not able to run any gnt-* command:

[root@server-02 ~]# gnt-cluster renew-crypto --new-cluster-certificate
Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out

Simon Maillard

Nov 10, 2022, 4:34:56 AM
to gan...@googlegroups.com
On 10/11/2022 09:53, Léa Mckenzie wrote:
> My master is able to connect to himself via ssh.
>
> [root@server-02 ~]# ssh -oStrictHostKeyChecking=yes ro...@server-02.corp
> Last login: Thu Nov 10 09:46:14 2022 from server-02.corp

And this ?

# ssh -oEscapeChar=none -oHashKnownHosts=no \
    -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts \
    -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gnt.corp \
    -oPort=22 -oStrictHostKeyChecking=yes ro...@server-02.corp

what is the output of

# ps auxww |grep ganeti

Léa Mckenzie

Nov 14, 2022, 9:10:00 AM
to ganeti
Hi Simon,

My apologies for the delay; another server, not related to Ganeti, was down (I cry every time I look at this infrastructure).

I ran the command, and I am now able to connect (before, it did not work with the full command):

[root@server-02 ~]# ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes ro...@server-02.corp
Last login: Mon Nov 14 14:33:23 2022 from 192.168.xx.xx


Then I retried the command but still got the error:

[root@server-02 ~]# gnt-cluster renew-crypto --new-cluster-certificate
Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


Here is the output of ps; I removed the lines for my running instances to make it shorter:

[root@server-02 ~]# ps auxww | grep ganeti

root      30843  9.0  0.0 242028 33904 ?        Rs   14:53   0:00 /usr/sbin/ganeti-wconfd -f
root      30844  8.6  0.0 243652 34968 ?        Rs   14:53   0:00 /usr/sbin/ganeti-luxid -f
root      30846  6.0  0.0 284736 22344 ?        Ss   14:53   0:00 /usr/bin/python2 /usr/sbin/ganeti-rapi -f
root      30865  0.0  0.0 112816   976 pts/0    S+   14:53   0:00 grep --color=auto ganeti
root      36081  0.0  0.0 235824 37264 ?        Ss    2021 265:33 /usr/sbin/ganeti-confd -f
root     117309  0.0  0.0 280532 133316 ?       SLs  Nov09   2:46 /usr/bin/python2 /usr/sbin/ganeti-noded -f


And I see this in my logs:

Nov 14 15:04:47 server-02 ganeti-luxid: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Mon May  7 15:54:22 CEST 2018, nodeMtime = Mon May  7 15:54:22 CEST 2018, nodeUuid = "f0f5b568-dc59-4f35-b377-315ababb4a53", nodeSerial = 1, nodeTags = fromList []}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."

Maybe there is a way to install the certificate manually and make sure it is trusted by my nodes.

Thanks again.

Léa Mckenzie

Nov 17, 2022, 7:16:01 AM
to ganeti
Hi All,

I'm still stuck on this issue, and judging by the logs I really think it is a certificate problem. I went through a couple of documentation pages and found this one: https://docs.ganeti.org/docs/ganeti/2.15/html/cluster-keys-replacement.html?highlight=cluster%20keys%20replacement 

As I cannot run commands due to this error message:

Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


And this one as well:

ganeti-wconfd[186903]: Error in the RPC HTTP reply from 'Node
ganeti-luxid[187496]: Error in the RPC HTTP reply from 'Node

I tried this:

systemctl stop ganeti-wconfd
systemctl stop ganeti-luxid.service
/usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
/usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it


Then I ran:
gnt-cluster renew-crypto --new-cluster-certificate --new-node-certificates

but it fails when the daemons start on my master node, with the same RPC errors in the log.

So I re-ran the commands to start the daemons without voting, so that I could run

#gnt-cluster verify

and got plenty of errors:

[root@server-02 ~]# gnt-cluster verify
Submitted jobs 672688, 672689
Waiting for job 672688 ...
Thu Nov 17 12:42:24 2022 * Verifying cluster config
Thu Nov 17 12:42:24 2022 * Verifying cluster certificate files
Thu Nov 17 12:42:24 2022 * Verifying hypervisor parameters
Thu Nov 17 12:42:24 2022 * Verifying all nodes belong to an existing group
Waiting for job 672689 ...
Thu Nov 17 12:42:25 2022 * Verifying group 'default'
Thu Nov 17 12:42:25 2022 * Gathering data (10 nodes)
Thu Nov 17 12:42:25 2022 * Gathering information about nodes (10 nodes)
Thu Nov 17 12:42:31 2022 * Gathering disk information (10 nodes)
Thu Nov 17 12:42:33 2022   - ERROR: node server-05.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022   - ERROR: node server-01.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022   - ERROR: node server-04.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022   - ERROR: node server-06.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022   - ERROR: node server-08.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022   - ERROR: node server-03.corp: while getting disk information: Error 60: Peer's certificate issuer has been marked as not trusted by the user.
Thu Nov 17 12:42:33 2022 * Verifying configuration file consistency
Thu Nov 17 12:42:33 2022   - ERROR: cluster: Client certificate digest of master candidate '39d98794-89f2-42e7-b52a-6515dc165cb3' does not match its entry in the cluster's map of master candidate certificates. Expected: DF:DA:EF:44:93:61:22:43:FE:D3:77:6C:64:F5:15:2E:F7:4B:F3:12 Got: 0C:79:70:85:F0:EE:63:37:D3:9C:40:7C:A3:AA:E9:A7:44:43:D4:47


Can anyone please help? I really need to get these servers back into production.
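The last verify error compares the digest of a master candidate's client certificate with the value stored in the cluster configuration. Both values are 20 colon-separated bytes, which is consistent with a SHA-1 digest; that, and the assumption that the digest covers the certificate's DER encoding, should be checked against the Ganeti source for your version. A minimal Python sketch to compute such a digest on each master candidate, assuming client.pem may hold a private key followed by the certificate (so the CERTIFICATE block is located explicitly); the helper names are hypothetical. On the shell, `openssl x509 -in /var/lib/ganeti/client.pem -noout -fingerprint -sha1` should print the same value.

```python
import base64
import binascii
import hashlib
import re

CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----", re.S
)


def cert_digest(pem_text, algorithm="sha1"):
    """Colon-separated uppercase digest of the DER bytes of the first certificate."""
    match = CERT_RE.search(pem_text)
    if match is None:
        raise ValueError("no CERTIFICATE block found")
    der = base64.b64decode("".join(match.group(1).split()))
    hexdigest = binascii.hexlify(hashlib.new(algorithm, der).digest()).decode("ascii").upper()
    return ":".join(hexdigest[i:i + 2] for i in range(0, len(hexdigest), 2))


def client_cert_digest(path="/var/lib/ganeti/client.pem"):
    """Digest of this node's client certificate, in the format used by gnt-cluster verify."""
    with open(path) as handle:
        return cert_digest(handle.read())
```

Running client_cert_digest() on the candidate with UUID 39d98794-... and comparing it with the "Expected" and "Got" values shows which side is stale.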

Jun Futagawa

Nov 17, 2022, 8:28:49 AM
to ganeti
Hi Léa,

Since stopping Ganeti does not affect running VMs, it may be a good idea
to stop Ganeti on the problem node and copy the configuration files
from the master node.

The following procedure may differ slightly because I don't know the ganeti-2.15.2-2.el7.centos RPM you are using.

# The RPM we built is ganeti-2.15.2-2.el7 (not el7.centos).
# https://jfut.integ.jp/linux/ganeti/7/x86_64/ganeti-2.15.2-2.el7.x86_64.rpm

1. Stop ganeti on server-06
Check systemd service and target:
for i in $(systemctl list-unit-files | grep ganeti | awk '{ print $1 }'); do echo systemctl stop $i; done

Stop everything:
systemctl stop ganeti-common.service
systemctl stop ganeti-confd.service
systemctl stop ganeti-kvmd.service
systemctl stop ganeti-luxid.service
systemctl stop ganeti-metad.service
systemctl stop ganeti-mond.service
systemctl stop ganeti-noded.service
systemctl stop ganeti-rapi.service
systemctl stop ganeti-wconfd.service
systemctl stop ganeti.service
systemctl stop ganeti-node.target
systemctl stop ganeti.target

2. Backup /var/lib/ganeti on server-06
mv /var/lib/ganeti /var/lib/ganeti.`date -I`

3. Copy /var/lib/ganeti from server-02(master node) to server-06
scp -rp /var/lib/ganeti server-06:/var/lib/

4. Start ganeti on server-06
Start everything:
systemctl start ganeti.target
systemctl start ganeti-kvmd.service
and others as needed.

5. Run gnt-cluster verify on server-02(master node)
If the problem of server-06 has been resolved, do the same with other problem nodes.

If the problem persists, I recommend checking the algorithm and expiration date of the certificate.

# on server-02(master node)
$ for i in $(ls /var/lib/ganeti/*.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
Signature Algorithm: sha256WithRSAEncryption
Not Before: Sep 24 02:20:27 2022 GMT
Not After : Sep 23 02:20:27 2027 GMT
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/rapi.pem
Signature Algorithm: sha256WithRSAEncryption
Not Before: Sep 24 02:20:27 2022 GMT
Not After : Sep 23 02:20:27 2027 GMT
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/server.pem
Signature Algorithm: sha256WithRSAEncryption
Not Before: Sep 24 02:20:27 2022 GMT
Not After : Sep 23 02:20:27 2027 GMT
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/spice-ca.pem
Signature Algorithm: sha256WithRSAEncryption
Not Before: Sep 24 02:20:27 2022 GMT
Not After : Sep 23 02:20:27 2027 GMT
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/spice.pem
Signature Algorithm: sha256WithRSAEncryption
Not Before: Sep 24 02:20:27 2022 GMT
Not After : Sep 23 02:20:27 2027 GMT
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption

If the algorithm is sha1WithRSAEncryption, certificate validation may be rejected
depending on the version of openssl. To resolve this you need to upgrade to Ganeti 2.16.1 or later,
or downgrade to an older OpenSSL RPM which does not have the problem(I have not tried).
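To flag only the SHA-1-signed files instead of reading the whole dump, the same `openssl x509 -text` output can be filtered in Python. A minimal sketch that shells out to the openssl binary like the for loop does; the helper names are hypothetical. Note that `openssl x509` only looks at the first certificate in each file.

```python
import glob
import re
import subprocess


def signature_algorithms(x509_text):
    """Algorithm names on the 'Signature Algorithm:' lines of `openssl x509 -text` output."""
    return set(re.findall(r"Signature Algorithm:\s*(\S+)", x509_text))


def uses_sha1(x509_text):
    """True if any signature algorithm is SHA-1 based, e.g. sha1WithRSAEncryption."""
    return any("sha1" in alg.lower() for alg in signature_algorithms(x509_text))


def x509_text(path):
    """Text dump of the first certificate in path, produced by the openssl CLI."""
    output = subprocess.check_output(["openssl", "x509", "-text", "-noout", "-in", path])
    return output.decode("utf-8", "replace")


def report_sha1_certs(pattern="/var/lib/ganeti/*.pem"):
    """Print one line per PEM file: SHA-1 signature, ok, or unparsable."""
    for pem in sorted(glob.glob(pattern)):
        try:
            status = "SHA-1 signature" if uses_sha1(x509_text(pem)) else "ok"
        except (OSError, subprocess.CalledProcessError):
            status = "could not be parsed by openssl x509"
        print("{0}: {1}".format(pem, status))
```

Calling report_sha1_certs() on the master lists every certificate that may be rejected under the stricter policy.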

Ganeti 2.16.1 Important changes:
https://github.com/ganeti/ganeti/releases/tag/v2.16.1
https://github.com/jfut/ganeti-rpm/blob/master/doc/update-rhel-2.15-to-2.16.md

Thanks,

--
Jun Futagawa  

On Thursday, November 17, 2022 at 21:16:01 UTC+9, Léa Mckenzie wrote:

James brisket

Nov 18, 2022, 1:11:06 AM
to ganeti
Hi Jun,

Thanks for your answer.

Copying the master's files to server-06 didn't solve the issue. 

As I was not able to run commands, I redid the no-voting procedure, after which I was able to run #gnt-cluster verify, but with the same error: Error 60: Peer's certificate issuer has been marked as not trusted by the user.

So I checked the algorithm and expiration date of the certificates as you recommended: 

[root@server-02 ~]# for i in $(ls /var/lib/ganeti/*.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Nov 17 11:47:08 2022 GMT
            Not After : Nov 16 11:47:08 2027 GMT
            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption
/var/lib/ganeti/rapi.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Apr 23 15:32:35 2021 GMT
            Not After : Apr 22 15:32:35 2026 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/server.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Nov 17 11:47:02 2022 GMT
            Not After : Nov 16 11:47:02 2027 GMT
            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption
/var/lib/ganeti/spice-ca.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 17 11:14:49 2022 GMT
            Not After : Nov 16 11:14:49 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/spice.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 17 11:14:00 2022 GMT
            Not After : Nov 16 11:14:00 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption

Here is my OpenSSL version:

[root@server-02 ~]# openssl version
OpenSSL 1.0.2k-fips  26 Jan 2017

Do you know whether this version of OpenSSL is OK?

Also, when I upgrade to 2.16, will I have to stop my running instances? 

If not, I will follow https://github.com/jfut/ganeti-rpm/blob/master/doc/update-rhel-2.15-to-2.16.md for the upgrade, as you recommended.

Many many thanks 

Jun Futagawa

Nov 18, 2022, 9:39:37 AM
to ganeti
Hi James,

> Do you know if when I will upgrade to 2.16 I will have stop my running instance ?

It is not necessary to stop running instances when updating Ganeti.
You can solve the issue without stopping the VM.

I found a test environment configured with the old CentOS 7.1 and ganeti-2.15.0-1.el7.x86_64.
I was mistaken: on CentOS 7, libcurl (and therefore pycurl) uses NSS instead of OpenSSL.

# python -c 'import pycurl; print pycurl.version'
libcurl/7.29.0 NSS/3.28.4 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
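The version string above names the TLS library libcurl was built against. Since the RPC errors in this thread come from libcurl (CurlLayerError / CurlSSLCACert in the luxid log), that library's certificate policy is what decides whether a SHA-1-signed certificate is accepted. A minimal Python sketch to pull that token out of the string; the list of library names is an assumption covering common libcurl builds, and the helper names are hypothetical.

```python
# Common TLS backends libcurl can report; extend if your build shows another one.
TLS_LIBRARIES = ("NSS", "OpenSSL", "GnuTLS", "LibreSSL", "BoringSSL", "wolfSSL")


def tls_backend(version_string):
    """Return the 'Name/version' token of the TLS library, or None if none is listed."""
    for token in version_string.split():
        if token.split("/", 1)[0] in TLS_LIBRARIES:
            return token
    return None


def pycurl_tls_backend():
    """Same check against the pycurl module installed on this node."""
    import pycurl  # imported lazily so tls_backend() works without pycurl
    return tls_backend(pycurl.version)
```

On the test machine above this would report NSS/3.28.4, which is why upgrading or downgrading the OpenSSL RPM alone is unlikely to change the behavior on CentOS 7.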

# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

# rpm -qa | grep ganeti | sort
ganeti-2.15.0-1.el7.x86_64
ganeti-instance-debootstrap-0.14-1.el7.noarch
ganeti-instance-image-0.7.2-1.el7.centos.noarch
integ-ganeti-release-7-1.el7.noarch

# gnt-node list

Node                       DTotal DFree MTotal MNode MFree Pinst Sinst
node01.centos7.example.org  50.0G 16.5G   3.7G  312M  3.1G     1     1
node02.centos7.example.org  50.0G 16.5G   3.7G   95M  3.3G     1     1

# gnt-instance list
Instance            Hypervisor OS                  Primary_node               Status  Memory
vm1.dev.example.org kvm        debootstrap+default node01.centos7.example.org running   1.0G
vm2.dev.example.org kvm        debootstrap+default node02.centos7.example.org running   1.0G

# for i in $(ls /var/lib/ganeti/{client,server}.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Feb 18 17:33:39 2020 GMT
            Not After : Feb 16 17:33:39 2025 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption
/var/lib/ganeti/server.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Feb 18 17:33:39 2020 GMT
            Not After : Feb 16 17:33:39 2025 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption

I reproduced the issue and found a solution. Please try the following.

1. Disable kernel and kmod-drbd updates just to be safe

vi /etc/yum.conf

exclude=kernel* drbd* kmod-drbd*

2. Point the repositories at the CentOS 7.4 vault, so that the update goes from CentOS 7.1 to 7.4 instead of the latest 7.9.

cp -a /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.before
vi /etc/yum.repos.d/CentOS-Base.repo

Diff:
# diff /etc/yum.repos.d/CentOS-Base.repo.before /etc/yum.repos.d/CentOS-Base.repo
15,16c15,16
< mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
< #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
---
> #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
> baseurl=http://ftp.iij.ad.jp/pub/linux/centos-vault/7.4.1708/os/x86_64
24,25c24,25
< mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
< #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
---
> #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
> baseurl=http://ftp.iij.ad.jp/pub/linux/centos-vault/7.4.1708/updates/x86_64
33,34c33,34
< mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
< #baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
---
> #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
> baseurl=http://ftp.iij.ad.jp/pub/linux/centos-vault/7.4.1708/extras/x86_64

3. Update to CentOS 7.4

RHEL/CentOS 7.4 has a change in the default behavior of the encryption library that is probably causing the issue.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.4_release_notes/chap-red_hat_enterprise_linux-7.4_release_notes-deprecated_functionality_in_rhel7

yum clean all
yum update

# python -c 'import pycurl; print pycurl.version'
libcurl/7.29.0 NSS/3.28.4 zlib/1.2.7 libidn/1.28 libssh2/1.4.3

# cat /etc/redhat-release

CentOS Linux release 7.4.1708 (Core)

# gnt-instance list
Instance            Hypervisor OS                  Primary_node               Status  Memory
vm1.dev.example.org kvm        debootstrap+default node01.centos7.example.org running   1.0G
vm2.dev.example.org kvm        debootstrap+default node02.centos7.example.org running   1.0G

4. Re-create the certificate to reproduce the issue.

# gnt-cluster renew-crypto --new-cluster-certificate -d -v
2022-11-18 22:44:29,278: gnt-cluster renew-crypto pid=18501 cli:1219 DEBUG Command line: gnt-cluster renew-crypto --new-cluster-certificate -d -v

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
Blocking watcher
Stopping master daemons
2022-11-18 22:44:32,699: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-master
Stopping daemons on node01.centos7.example.org
2022-11-18 22:44:32,729: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-all
Starting daemon 'ganeti-noded' on node01.centos7.example.org
2022-11-18 22:44:32,747: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-noded
Starting daemon 'ganeti-wconfd' on node01.centos7.example.org
2022-11-18 22:44:32,890: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-wconfd
Stopping daemons on node02.centos7.example.org
2022-11-18 22:44:32,905: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@node02.centos7.example.org '/usr/lib64/ganeti/daemon-util stop-all'

Updating the cluster SSL certificate.
2022-11-18 22:44:33,106: gnt-cluster renew-crypto pid=18501 io:549 DEBUG Backing up /var/lib/ganeti/server.pem at /var/lib/ganeti/server.pem.backup-2022-11-18_22_44_33.jVOYi_
2022-11-18 22:44:33,106: gnt-cluster renew-crypto pid=18501 security:85 DEBUG Generating new cluster certificate at /var/lib/ganeti/server.pem
2022-11-18 22:44:33,235: gnt-cluster renew-crypto pid=18501 io:549 DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-11-18_22_44_33.WCun_R
Copying /var/lib/ganeti/server.pem to node02.centos7.example.org:22
2022-11-18 22:44:33,236: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem node02.centos7.example.org:/var/lib/ganeti/server.pem
Updating client SSL certificates.
2022-11-18 22:44:33,406: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@node02.centos7.example.org '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''
2022-11-18 22:44:33,713: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-11-18_22_44_33.md2dEm
2022-11-18 22:44:33,752: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@node01.centos7.example.org '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''
2022-11-18 22:44:34,132: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-11-18_22_44_34.B8Ka6D
Copying /var/lib/ganeti/ssconf_master_candidates_certs to node02.centos7.example.org:22
2022-11-18 22:44:34,149: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs node02.centos7.example.org:/var/lib/ganeti/ssconf_master_candidates_certs
2022-11-18 22:44:44,328: gnt-cluster renew-crypto pid=18501 wconfd:70 DEBUG Timout trying to connect to WConfD
2022-11-18 22:44:44,329: gnt-cluster renew-crypto pid=18501 wconfd:73 DEBUG Will retry
2022-11-18 22:45:00,355: gnt-cluster renew-crypto pid=18501 wconfd:70 DEBUG Timout trying to connect to WConfD
2022-11-18 22:45:00,355: gnt-cluster renew-crypto pid=18501 wconfd:73 DEBUG Will retry
2022-11-18 22:45:28,783: gnt-cluster renew-crypto pid=18501 wconfd:70 DEBUG Timout trying to connect to WConfD
2022-11-18 22:45:28,784: gnt-cluster renew-crypto pid=18501 wconfd:73 DEBUG Will retry
2022-11-18 22:46:00,715: gnt-cluster renew-crypto pid=18501 wconfd:70 DEBUG Timout trying to connect to WConfD
2022-11-18 22:46:00,715: gnt-cluster renew-crypto pid=18501 wconfd:73 DEBUG Will retry

... The command does not finish, so I pressed Ctrl+C to abort it. ...

^CStarting daemons on node02.centos7.example.org
2022-11-18 22:46:59,946: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@node02.centos7.example.org '/usr/lib64/ganeti/daemon-util start-all'
Stopping daemon 'ganeti-wconfd' on node01.centos7.example.org
2022-11-18 22:47:00,457: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-wconfd
Stopping daemon 'ganeti-noded' on node01.centos7.example.org
2022-11-18 22:47:00,474: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-noded
Starting daemons on node01.centos7.example.org
2022-11-18 22:47:00,517: gnt-cluster renew-crypto pid=18501 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start-all
Aborted. Note that if the operation created any jobs, they might have been submitted and will continue to run in the background.

# gnt-cluster renew-crypto --new-ssh-keys
The authenticity of host 'node02.centos7.example.org (192.168.3.62)' can't be established.
ECDSA key fingerprint is SHA256:dL+xa7VWp4+5Kqr6CL4tYSR5WtN2fD4ptQeBnlJ85Ps.
ECDSA key fingerprint is MD5:ad:4a:68:1b:e8:5a:a2:87:5b:81:6f:cd:cf:23:d6:c8.
Are you sure you want to continue connecting (yes/no)? yes

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Fri Nov 18 22:48:33 2022 Renewing SSH keys
All requested certificates and keys have been replaced. Running "gnt-cluster verify" now is recommended.

# gnt-node list

Node                                         DTotal DFree MTotal MNode MFree Pinst Sinst
node01.centos7.example.org      ?     ?      ?     ?     ?     1     1
node02.centos7.example.org      ?     ?      ?     ?     ?     1     1

I was able to reproduce the issue. The error log is as follows:

# tail -100 /var/log/ganeti/node-daemon.log
...
2022-11-18 22:50:02,782: ganeti-noded pid=19036 ERROR Error while handling request from 192.168.3.61:36310
Traceback (most recent call last):
  File "/usr/share/ganeti/2.15/ganeti/http/server.py", line 594, in _IncomingConnection
    self.request_executor(self, self.handler, connection, client_addr)
  File "/usr/share/ganeti/2.15/ganeti/server/noded.py", line 158, in __init__
    http.server.HttpServerRequestExecutor.__init__(self, *args, **kwargs)
  File "/usr/share/ganeti/2.15/ganeti/http/server.py", line 422, in __init__
    http.Handshake(sock, self.WRITE_TIMEOUT)
  File "/usr/share/ganeti/2.15/ganeti/http/__init__.py", line 539, in Handshake
    raise HttpError("Error in SSL handshake: %s" % err)
HttpError: Error in SSL handshake: ([('SSL routines', 'ssl3_get_client_certificate', 'certificate verify failed')],)

The certificate is still sha1WithRSAEncryption.

# for i in $(ls /var/lib/ganeti/{client,server}.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Nov 18 13:44:34 2022 GMT
            Not After : Nov 17 13:44:34 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption
/var/lib/ganeti/server.pem
    Signature Algorithm: sha1WithRSAEncryption
            Not Before: Nov 18 13:44:33 2022 GMT
            Not After : Nov 17 13:44:33 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha1WithRSAEncryption
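
To check the signature algorithm without reading the full `openssl x509 -text` dump, you can use a small filter like the one below. This is a sketch. `check_sig_alg` is a hypothetical name. Treating SHA-1 as the failure case follows the reproduction above, where SHA-1 certificates failed on CentOS 7.4 and SHA-256 ones worked. It is not an official Ganeti rule.

```shell
# Hypothetical filter: read `openssl x509 -text -noout` output on stdin,
# print the first "Signature Algorithm" value, and return
#   0 if it is not SHA-1, 1 if it is SHA-1, 2 if none was found.
check_sig_alg() {
    alg=$(awk '/Signature Algorithm:/ { print $3; exit }')
    case $alg in
        "")    echo "no signature algorithm found" >&2; return 2 ;;
        sha1*) echo "$alg (SHA-1)"; return 1 ;;
        *)     echo "$alg"; return 0 ;;
    esac
}

# Usage on each node:
#   for f in /var/lib/ganeti/client.pem /var/lib/ganeti/server.pem; do
#       printf '%s: ' "$f"
#       openssl x509 -text -noout -in "$f" | check_sig_alg
#   done
```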

5. Update to ganeti-2.16.2-1

I remembered that I had applied a patch to ganeti-2.16.1 to solve this issue:
https://github.com/jfut/ganeti-rpm/issues/18

So I updated to the newer ganeti-2.16.2-1.

*IMPORTANT*: The RPM you are using is not the RPM we built.
Therefore, different build options (e.g., specifying paths) may cause incompatibilities.
Our 2.16.2-1 build options are as follows.
https://github.com/jfut/ganeti-rpm/blob/v2.16.2-1/rpmbuild/ganeti/SPECS/ganeti.spec#L154-L168

# all nodes
systemctl stop ganeti-kvmd.service
systemctl stop ganeti.target
BACKUP=/root/ganeti-$(date +%FT%H-%M-%S).tar.gz
tar czf "$BACKUP" -C /var/lib ganeti
chmod 600 "$BACKUP"

wget --no-check-certificate https://jfut.integ.jp/linux/ganeti/7/x86_64/ganeti-2.16.2-1.el7.x86_64.rpm
(--no-check-certificate is required because this environment is old and does not support Let's Encrypt's new root certificate.)

yum update ganeti-2.16.2-1.el7.x86_64.rpm

# master node only
# /usr/lib64/ganeti/tools/cfgupgrade --verbose
Please make sure you have read the upgrade notes for Ganeti 2.16.2
(available in the UPGRADE file and included in other documentation
formats). Continue with upgrading configuration?
y/[n]/?: y
2022-11-18 23:00:41,935: Found configuration version 2150000 (2.15.0)
2022-11-18 23:00:41,936: Creating symlink from /var/lib/ganeti/rapi_users to /var/lib/ganeti/rapi/users
...
2022-11-18 23:00:41,963: Writing configuration file to /var/lib/ganeti/config.data
2022-11-18 23:00:41,966: Testing the new config file...
2022-11-18 23:00:41,974: File loaded successfully after upgrading
Configuration successfully upgraded to version 2.16.2.

6. Re-create the certificate to fix the issue

# all nodes
systemctl stop ganeti.target

# master node only

/usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
/usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it
gnt-cluster renew-crypto --new-cluster-certificate -d -v

Restart Ganeti so that the daemons run normally again, without '--no-voting --yes-do-it'.

# master node only
systemctl stop ganeti.target
systemctl start ganeti.target

Confirm that the issue has been resolved.

# gnt-node list

Node                                         DTotal DFree MTotal MNode MFree Pinst Sinst
node01.centos7.example.org  50.0G 16.5G   3.7G  724M  2.5G     1     1
node02.centos7.example.org  50.0G 16.5G   3.7G  682M  2.7G     1     1

# gnt-instance list
Instance                              Hypervisor OS                  Primary_node                                 Status  Memory
vm1.dev.example.org kvm        debootstrap+default node01.centos7.example.org running   1.0G
vm2.dev.example.org kvm        debootstrap+default node02.centos7.example.org running   1.0G

The certificate is now sha256WithRSAEncryption.

# for i in $(ls /var/lib/ganeti/{client,server}.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 18 14:07:03 2022 GMT
            Not After : Nov 17 14:07:03 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/server.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 18 14:07:02 2022 GMT
            Not After : Nov 17 14:07:02 2027 GMT
            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption

The VM also remains running.

# gnt-instance console vm1.dev.example.org

CentOS Linux 7 (Core)
Kernel 3.10.0-1062.el7.x86_64 on an x86_64

vm1 login: root
Password:
Last login: Wed Feb 19 03:06:38 on ttyS0

However, in other environments, updating Ganeti alone did not solve the issue, and neither did updating to CentOS 7.5.
Skipping the intermediate releases and updating directly to the latest CentOS 7.9 worked.

Once all problems have been resolved, don't forget to revert /etc/yum.conf (and /etc/yum.repos.d/CentOS-Base.repo).

Thanks,  

-- 
Jun Futagawa   

On Friday, November 18, 2022 at 15:11:06 UTC+9, James brisket wrote:

Léa Mckenzie

Nov 21, 2022, 4:55:51 PM
to ganeti
Hi Jun,

Thank you so much for all the help you are giving us.

I'm running into an error, and I don't know whether the problem is the package or whether I need to upgrade to ganeti-2.16.2-0.el7.x86_64.rpm first.

Error: Package: ganeti-sysvinit-2.15.2-2.el7.centos.x86_64 (installed)
           Requires: ganeti(x86-64) = 2.15.2-2.el7.centos
           Removing: ganeti-2.15.2-2.el7.centos.x86_64 (installed)
               ganeti(x86-64) = 2.15.2-2.el7.centos
           Updated By: ganeti-2.16.2-1.el7.x86_64 (/ganeti-2.16.2-1.el7.x86_64)
               ganeti(x86-64) = 2.16.2-1.el7
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest


I will upgrade to the lower version first, then to the version you suggested.

Many thanks

Jun Futagawa

Nov 21, 2022, 6:53:10 PM
to ganeti
Hi Léa,

Sorry, I forgot about the ganeti-sysvinit RPM file. Please try the following:

wget --no-check-certificate https://jfut.integ.jp/linux/ganeti/7/x86_64/ganeti-2.16.2-1.el7.x86_64.rpm
wget --no-check-certificate https://jfut.integ.jp/linux/ganeti/7/x86_64/ganeti-sysvinit-2.16.2-1.el7.x86_64.rpm
yum update ganeti-*2.16.2-1.el7.x86_64.rpm

No upgrade to ganeti-2.16.2-0.el7.x86_64.rpm is required.
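
The dependency error above happens because ganeti-sysvinit requires exactly the same version-release as ganeti, so you must update both together. After `yum update`, you can run a quick check like this sketch to confirm that both packages ended up at the same version. The function name is hypothetical. The check assumes input lines of the form `NAME VERSION-RELEASE`, as produced by the `rpm --queryformat` call in the usage comment. It names the two packages explicitly so that unrelated packages such as instance OS definitions are not compared.

```shell
# Hypothetical check: read "NAME VERSION-RELEASE" lines on stdin and succeed
# only if all packages share one version-release (exit 1 if they differ,
# exit 2 if no packages were listed).
same_ganeti_version() {
    awk '
        NF >= 2 { pkgs[$2] = pkgs[$2] " " $1; n++ }
        END {
            if (n == 0) exit 2
            count = 0
            for (v in pkgs) { count++; print v ":" pkgs[v] }
            exit (count == 1 ? 0 : 1)
        }'
}

# Usage on each node after the update:
#   rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' ganeti ganeti-sysvinit | same_ganeti_version
```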

Thanks, 

-- 
Jun Futagawa

On Tuesday, November 22, 2022 at 6:55:51 UTC+9, Léa Mckenzie wrote:

Léa Mckenzie

Nov 28, 2022, 3:49:22 AM
to ganeti
Hi Jun,

Quick update:

We carefully followed all your instructions, and all our nodes are now running the correct version of Ganeti:
# rpm -qa | grep ganeti | sort
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64

Unfortunately, I am still facing the same issue:
# systemctl status ganeti-luxid.service
● ganeti-luxid.service - Ganeti query daemon (luxid)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-luxid.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-11-24 13:12:15 CET; 44s ago
     Docs: man:ganeti-luxid(8)
 Main PID: 403555 (ganeti-luxid)
    Tasks: 1
   Memory: 20.2M
   CGroup: /system.slice/ganeti-luxid.service
           └─403555 /usr/sbin/ganeti-luxid -f

Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-03.corp", nodePrimaryIp = "192.168.2.17", nodeSecondaryIp = "192.168.2.17", node...nodeVmCapable
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", node...nodeVmCapable
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", node...nodeVmCapable
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeMa...deVmCapable =
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", node...nodeVmCapable
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", node...nodeVmCapable
Nov 24 13:12:57 server-02 ganeti-luxid[403555]: No voting RPC result from ["server-03.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.c..."]

Hint: Some lines were ellipsized, use -l to show in full.

# systemctl status ganeti-wconfd.service
● ganeti-wconfd.service - Ganeti config writer daemon (wconfd)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-wconfd.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-11-24 13:15:12 CET; 3s ago
     Docs: man:ganeti-wconfd(8)
 Main PID: 404299 (ganeti-wconfd)
    Tasks: 1
   Memory: 21.4M
   CGroup: /system.slice/ganeti-wconfd.service
           └─404299 /usr/sbin/ganeti-wconfd -f

Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-03.corp", nodePrimaryIp = "192.168.2.17", nodeSecondaryIp = "192.168.2.17", nod...nodeVmCapable
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", nod...nodeVmCapable
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", nod...nodeVmCapable
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeM...deVmCapable =
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", nod...nodeVmCapable
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", nod...nodeVmCapable
Nov 24 13:15:15 server-02 ganeti-wconfd[404299]: No voting RPC result from ["server-03.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.c..."]

Hint: Some lines were ellipsized, use -l to show in full.
# for i in $(ls /var/lib/ganeti/{client,server}.pem); do echo ${i}; openssl x509 -text -noout -in ${i}; done | egrep "/var/|Not|Algorithm"
/var/lib/ganeti/client.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 23 14:14:27 2022 GMT
            Not After : Nov 22 14:14:27 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption
/var/lib/ganeti/server.pem
    Signature Algorithm: sha256WithRSAEncryption
            Not Before: Nov 23 14:14:21 2022 GMT
            Not After : Nov 22 14:14:21 2027 GMT

            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption

As you suggested, maybe I should upgrade the master from CentOS 7.4 to 7.9. The only thing I don't know is how to upgrade without rebooting.

Do you know whether all my nodes need to run the same CentOS version?

Many thanks for your help

Jun Futagawa

Nov 28, 2022, 8:35:56 PM
to ganeti
Hi Léa,

If you have the same issue after recreating the certificate with ganeti-2.16.2-1.el7.x86_64,
then I suggest you try to upgrade to CentOS 7.9.  


> Still, as you suggest, maybe upgrading the master to CentOS 7.9 as I'm in 7.4. The only thing that I don't know is how to upgrade without rebooting.

After upgrading to CentOS 7.9 with yum update, there is no need to reboot the OS.
Just restart ganeti.target, and the Ganeti processes will start using the CentOS 7.9 libraries.
Then try re-creating the certificate again, as in step 6 of my earlier message:

6. Re-create the certificate to fix the issue
  
*IMPORTANT* Don't forget the following before upgrading to CentOS 7.9:

1. Disable kernel and kmod-drbd updates just to be safe

If you upgrade the kernel, you must also upgrade kmod-drbd (probably provided by ELRepo);
otherwise DRBD will not work after the next OS reboot.

During normal operation, upgrade the kernel and kmod-drbd together,
and only when you plan to reboot the OS immediately afterwards.
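
One way to hold back those packages during the 7.9 upgrade is yum's `--exclude` option. After the update, restart ganeti.target instead of rebooting, as described above. The wrapper below is a hypothetical sketch. It assumes the packages to hold back are named `kernel` and `kmod-drbd*` on these nodes, so check with `rpm -qa | grep -e ^kernel -e drbd` first. Setting `DRY_RUN=1` only prints the commands. Run it on each node.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_keep_kernel() {
    run_cmd yum clean all
    # Hold back the kernel and the DRBD kernel module; all other packages,
    # including the crypto libraries, are updated.
    run_cmd yum update --exclude=kernel --exclude='kmod-drbd*'
    # No reboot: restarting Ganeti makes its daemons load the new libraries.
    run_cmd systemctl restart ganeti.target
}

# Preview, then run for real (as root):
#   DRY_RUN=1 upgrade_keep_kernel
#   upgrade_keep_kernel
```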


> Do you know if all my node need to be in the same CentOS version ?

I'm not sure, but I think it is better to use the same version on all nodes
so that the issue does not come back after a master failover.

Thanks,

-- 
Jun Futagawa

On Monday, November 28, 2022 at 17:49:22 UTC+9, Léa Mckenzie wrote:

Léa Mckenzie

Nov 30, 2022, 8:12:26 PM
to ganeti
Hi Jun,

After upgrading to CentOS 7.9, I still have the same issue, and

# gnt-cluster renew-crypto --new-cluster-certificate -d -v

always fails when starting the daemons on my master node server-02:

Stopping daemon 'ganeti-wconfd' on server-02.corp
2022-11-30 23:44:57,105: gnt-cluster renew-crypto pid=2459 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-wconfd
Stopping daemon 'ganeti-noded' on server-02.corp
2022-11-30 23:44:57,168: gnt-cluster renew-crypto pid=2459 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-noded
Starting daemons on server-02.corp
2022-11-30 23:44:57,280: gnt-cluster renew-crypto pid=2459 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start-all
2022-11-30 23:45:27,718: gnt-cluster renew-crypto pid=2459 cli:1257 ERROR Error during command processing

Traceback (most recent call last):
  File "/usr/share/ganeti/2.16/ganeti/cli.py", line 1253, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1293, in RenewCrypto
    opts.debug > 0)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1203, in _RenewCrypto
    cl = GetClient()
  File "/usr/share/ganeti/2.16/ganeti/runtime.py", line 265, in GetClient
    client = luxi.Client(address=pathutils.QUERY_SOCKET)
  File "/usr/share/ganeti/2.16/ganeti/luxi.py", line 104, in __init__
    self._InitTransport()
  File "/usr/share/ganeti/2.16/ganeti/rpc/client.py", line 207, in _InitTransport
    allow_non_master=self.allow_non_master)
  File "/usr/share/ganeti/2.16/ganeti/rpc/transport.py", line 108, in __init__
    raise errors.TimeoutError("Connect timed out")
TimeoutError: Connect timed out

Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


My logs are as follows:

[root@server-02 ~]# systemctl status ganeti-luxid.service -l

● ganeti-luxid.service - Ganeti query daemon (luxid)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-luxid.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-12-01 01:59:36 CET; 1min 6s ago
     Docs: man:ganeti-luxid(8)
 Main PID: 28404 (ganeti-luxid)
    Tasks: 1
   Memory: 23.9M
   CGroup: /system.slice/ganeti-luxid.service
           └─28404 /usr/sbin/ganeti-luxid -f

Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-03.corp", nodePrimaryIp = "192.168.2.17", nodeSecondaryIp = "192.168.2.17", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Fri Apr 27 15:27:05 CEST 2018, nodeMtime = Fri Apr 27 15:27:05 CEST 2018, nodeUuid = "0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a", nodeSerial = 1, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "cuberolls.corp", nodePrimaryIp = "192.168.2.11", nodeSecondaryIp = "192.168.2.11", nodeMasterCandidate = False, nodeOffline = True, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = False, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Wed May 11 17:56:02 CEST 2016, nodeMtime = Mon Nov  7 10:47:50 CET 2022, nodeUuid = "43e57895-0039-4b20-b0b1-1265514fd736", nodeSerial = 3, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlCouldntConnect, explanation: Failed connect to 192.168.2.11:1811; Connection refused"
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "flexo.corp", nodePrimaryIp = "192.168.2.28", nodeSecondaryIp = "192.168.2.28", nodeMasterCandidate = False, nodeOffline = True, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = False, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Wed Apr 20 17:09:16 CEST 2016, nodeMtime = Tue Jul 17 18:43:39 CEST 2018, nodeUuid = "4d462ddf-8ebf-4fc5-8dda-e4d89b750d0b", nodeSerial = 7, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlCouldntConnect, explanation: Failed connect to 192.168.2.28:1811; No route to host"
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Tue Jan  8 14:07:41 CET 2019, nodeMtime = Tue Jan  8 14:07:41 CET 2019, nodeUuid = "7d863cb0-4f76-482e-8d55-e62c3a2b49ae", nodeSerial = 1, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Mon May 14 16:57:47 CEST 2018, nodeMtime = Mon May 14 16:57:47 CEST 2018, nodeUuid = "947090e9-31bd-45fd-b452-5d56e9e4d3d6", nodeSerial = 1, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Tue Jun  5 11:27:06 CEST 2018, nodeMtime = Tue Jun  5 11:27:06 CEST 2018, nodeUuid = "a60d12fb-d381-4a03-84ad-8519b292e200", nodeSerial = 1, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Fri May 25 11:06:56 CEST 2018, nodeMtime = Wed Nov  9 10:26:27 CET 2022, nodeUuid = "e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf", nodeSerial = 18, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", nodeMasterCandidate = True, nodeOffline = False, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = True, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Mon May  7 15:54:22 CEST 2018, nodeMtime = Mon May  7 15:54:22 CEST 2018, nodeUuid = "f0f5b568-dc59-4f35-b377-315ababb4a53", nodeSerial = 1, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlSSLCACert, explanation: Peer's certificate issuer has been marked as not trusted by the user."
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: Error in the RPC HTTP reply from 'Node {nodeName = "brisket.corp", nodePrimaryIp = "192.168.2.41", nodeSecondaryIp = "192.168.2.41", nodeMasterCandidate = False, nodeOffline = True, nodeDrained = False, nodeGroup = "14e25f34-fc7b-4fb0-b0f4-7707d96f0b9a", nodeMasterCapable = False, nodeVmCapable = True, nodeNdparams = PartialNDParams {ndpOobProgramP = Nothing, ndpSpindleCountP = Nothing, ndpExclusiveStorageP = Nothing, ndpOvsP = Nothing, ndpOvsNameP = Nothing, ndpOvsLinkP = Nothing, ndpSshPortP = Nothing, ndpCpuSpeedP = Nothing}, nodePowered = True, nodeHvStateStatic = GenericContainer {fromContainer = fromList []}, nodeDiskStateStatic = GenericContainer {fromContainer = fromList []}, nodeCtime = Mon May  2 16:17:44 CEST 2016, nodeMtime = Mon Nov  7 15:45:24 CET 2022, nodeUuid = "f3ab1850-d02b-4af3-90bf-caca9a8a835d", nodeSerial = 9, nodeTags = TagSet {unTagSet = fromList []}}': CurlLayerError "code: CurlCouldntConnect, explanation: Failed connect to 192.168.2.41:1811; Connection refused"
Dec 01 02:00:31 server-02 ganeti-luxid[28404]: No voting RPC result from ["server-03.corp","cuberolls.corp","flexo.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.corp","brisket.corp"]


The other nodes are offline, but they were not removed correctly by the previous admin (I didn't show them before as I thought they were not relevant).

While looking through this group, I noticed that my /var/lib/ganeti/ssconf_master_candidates_certs only contains 1 line.

Is there another way to debug these certificate errors?

I still have 52 VMs down and cannot execute any command because they all time out.

Thanks

John McNally

Nov 30, 2022, 9:55:21 PM
to gan...@googlegroups.com
Hi Léa,

I've had a similar problem in the past -- trouble regenerating certificates _after_ they have expired. The following procedure worked for me on a ganeti 2.15 cluster not long ago. It involves manually creating new server.pem and rapi.pem files and distributing them to all nodes (run all commands as root):

mkdir ganeti-backup
cp -a /var/lib/ganeti/server.pem ganeti-backup/
cp -a /var/lib/ganeti/rapi.pem ganeti-backup/

chmod 0600 /var/lib/ganeti/server.pem && \
    openssl req -new -newkey rsa:1024 -days 1825 -nodes  -x509 -keyout /var/lib/ganeti/server.pem  -out /var/lib/ganeti/server.pem -batch && \
    chmod 0400 /var/lib/ganeti/server.pem

chmod 0600 /var/lib/ganeti/rapi.pem && \
    openssl req -new -newkey rsa:1024 -days 1825 -nodes  -x509 -keyout /var/lib/ganeti/rapi.pem  -out /var/lib/ganeti/rapi.pem -batch && \
    chmod 0400 /var/lib/ganeti/rapi.pem

gnt-cluster copyfile /var/lib/ganeti/server.pem
gnt-cluster copyfile /var/lib/ganeti/rapi.pem

systemctl restart ganeti   # do this on all the non-master nodes first, then the master node last

gnt-cluster verify  # shouldn't complain about expired certs anymore (don't worry about the spice certs, which are not critical)
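Before and after regenerating the files, it can help to check each certificate's validity window directly, since expiry is exactly what `gnt-cluster verify` complained about. Below is a minimal sketch, assuming the `openssl` CLI is installed on every node. The `check_cert` helper name and the 30-day threshold are illustrative and not part of Ganeti. It should also work on files that hold both a key and a certificate, such as server.pem, because `openssl x509` skips the key block and reads the certificate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a PEM certificate's validity dates and report
# whether it expires within GRACE seconds (default 0 = already expired).
check_cert() {
    local pem="$1" grace="${2:-0}"
    if [ ! -r "$pem" ]; then
        echo "$pem: not readable" >&2
        return 2
    fi
    # notBefore=/notAfter= lines, as stored in the certificate
    if ! openssl x509 -in "$pem" -noout -dates; then
        echo "$pem: no parsable certificate" >&2
        return 2
    fi
    # -checkend N exits 0 only if the certificate is still valid N seconds from now
    if openssl x509 -in "$pem" -noout -checkend "$grace" >/dev/null; then
        echo "$pem: OK"
        return 0
    fi
    echo "$pem: EXPIRED or expiring within ${grace}s"
    return 1
}

# Example, as root on each node (2592000 s = 30 days):
# for f in /var/lib/ganeti/server.pem /var/lib/ganeti/rapi.pem /var/lib/ganeti/spice.pem; do
#     check_cert "$f" 2592000
# done
```

After the procedure, every node should report OK for server.pem and rapi.pem. A node still reporting EXPIRED is the one the new file did not reach.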

Hope this helps.
  
_________________
John McNally
jmcn...@acm.org


--
You received this message because you are subscribed to the Google Groups "ganeti" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ganeti+un...@googlegroups.com.

Jun Futagawa

Dec 1, 2022, 7:31:34 AM
to ganeti
 Hi Léa,


> After I've upgrade to centos7.9 I still have the same issue, and

Let me get this straight. Are all online nodes as follows?

- Upgraded to CentOS 7.9
- Upgraded to ganeti-2.16.2-1.el7.x86_64.rpm and ganeti-sysvinit-2.16.2-1.el7.x86_64.rpm
- ganeti.target and ganeti-kvmd.service have been stopped

Then, did you run the following on your master node server-02?


/usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
/usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it
gnt-cluster renew-crypto --new-cluster-certificate -d -v

> The other node are offline, but have not bee removed correctly by the other admin ( didn't show it before as I thought that was not relevent)

gnt-cluster renew-crypto tries to connect to all nodes present in the configuration file (/var/lib/ganeti/config.data).
Therefore, if a node is no longer used, it is better to remove it from the Ganeti cluster.
(However, when I tried it in my test environment with 3 nodes, one of them offline,
the offline node caused no problem.)

Example of deleting an offline node:

# node01 = master node
[ro...@node01.example.org ~]# systemctl stop ganeti-kvmd.service
[ro...@node01.example.org ~]# systemctl stop ganeti.target
[ro...@node01.example.org ~]# /usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
[ro...@node01.example.org ~]# /usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it

[ro...@node01.example.org ~]# gnt-node list

Node DTotal DFree MTotal MNode MFree Pinst Sinst
node01.example.org 50.0G 16.5G 3.7G 562M 2.8G 1 1
node02.example.org ? ? ? ? ? 1 1
node03.example.org ? ? ? ? ? 0 0

[ro...@node01.example.org ~]# gnt-node remove node03
Thu Dec 1 20:42:06 2022 - WARNING: Communication failure to node a11e2166-afbc-44c9-85f0-c3929fbbc447: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:37 2022 Some nodes' SSH key files could not be updated:
Thu Dec 1 20:42:37 2022 node03.example.org: Removing SSH keys from node 'node03.example.org' failed. This can happen when the node is already unreachable. Error: Error after 3 retries. Last exception: Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oPort=22 -oStrictHostKeyChecking=no -4 ro...@node03.example.org '/bin/sh -c /usr/lib64/ganeti/ssh-update'' failed: exited with exit code 255.
Thu Dec 1 20:42:42 2022 - WARNING: Communication failure to node node03.example.org: Error 28: Connection timed out after 5023 milliseconds
Thu Dec 1 20:42:47 2022 - WARNING: Errors encountered on the remote node while leaving the cluster: Error 28: Connection timed out after 5009 milliseconds
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/cluster-domain-secret to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/rapi/users to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/rapi.pem to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /etc/hosts to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/spice.pem to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/hmac.key to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/spice-ca.pem to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Copy of file /var/lib/ganeti/known_hosts to node node02.example.org failed: Error 7: Failed connect to 192.168.3.62:1811; Connection refused
Thu Dec 1 20:42:47 2022 - WARNING: Communication failure to node a11e2166-afbc-44c9-85f0-c3929fbbc447: Error 7: Failed connect to 192.168.3.62:1811; Connection refused

[ro...@node01.example.org ~]# gnt-node list

Node DTotal DFree MTotal MNode MFree Pinst Sinst
node01.example.org 50.0G 16.5G 3.7G 567M 2.8G 1 1
node02.example.org ? ? ? ? ? 1 1

Then retry Step 6 (Re-create the certificate to fix the issue).
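To see which offline nodes could be removed this way, the node list can be filtered first. Below is a sketch, assuming `gnt-node list --no-headers` prints boolean fields as `Y`/`N` and accepts the `offline`, `pinst_cnt` and `sinst_cnt` fields. The helper name `removable_offline_nodes` is hypothetical. Instances should be moved off a node before it is removed, so offline nodes that still report primary or secondary instances are only reported, not listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads lines of "name offline pinst sinst" on stdin, e.g.
#   gnt-node list --no-headers -o name,offline,pinst_cnt,sinst_cnt
# Prints offline nodes ("Y") that host no instances; offline nodes that
# still have primary/secondary instances are only reported on stderr.
removable_offline_nodes() {
    awk '
        $2 == "Y" && $3 == 0 && $4 == 0 { print $1; next }
        $2 == "Y" {
            printf "skip %s: %s primary / %s secondary instance(s)\n", $1, $3, $4 > "/dev/stderr"
        }
    '
}

# Review the output before removing anything, then remove nodes one by one:
# gnt-node list --no-headers -o name,offline,pinst_cnt,sinst_cnt | removable_offline_nodes
# gnt-node remove NODE_NAME
```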

> By looking, this groupe I noted that my /var/lib/ganeti/ssconf_master_candidates_certs only content 1 line.

>
> Do you know if they are another way to debug those certificats error ?

Even if /var/lib/ganeti/ssconf_master_candidates_certs has only one line now,
a successful gnt-cluster renew-crypto will automatically add lines for all nodes.
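To check afterwards which nodes still lack an entry, the file can be compared with the node list. Below is a sketch with two assumptions. First, each line of ssconf_master_candidates_certs is assumed to have the form `<node-uuid>=<certificate-digest>`, so check a line of your own file first. Second, `gnt-node list` is assumed to accept the `uuid` and `master_candidate` fields. The `missing_candidate_certs` helper is hypothetical. Like any `gnt-*` query, the usage line needs a working luxid.

```shell
#!/usr/bin/env bash
# Hypothetical helper. $1: path to ssconf_master_candidates_certs, assumed
# to hold one "<node-uuid>=<digest>" entry per line.
# stdin: "name uuid master_candidate" lines, e.g. from
#   gnt-node list --no-headers -o name,uuid,master_candidate
# Prints master candidates ("Y") whose UUID has no entry in the file.
missing_candidate_certs() {
    local certs="$1"
    # Compare FILENAME rather than NR == FNR so an empty certs file still works.
    awk -v certs="$certs" '
        FILENAME == certs { split($0, kv, "="); if (kv[1] != "") have[kv[1]] = 1; next }
        $3 == "Y" && !($2 in have) { print $1 }
    ' "$certs" -
}

# Example, on the master:
# gnt-node list --no-headers -o name,uuid,master_candidate \
#     | missing_candidate_certs /var/lib/ganeti/ssconf_master_candidates_certs
```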

Thanks,

-- 
Jun Futagawa
On Thursday, December 1, 2022 at 10:12:26 UTC+9, Léa Mckenzie wrote:

Léa Mckenzie

Dec 2, 2022, 5:06:35 AM
to ganeti
Hi Jun,

> Let me get this straight. Are all online nodes as follows?

Here are the Ganeti versions and OS release on all my nodes:

[root@server-01 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-02 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-03 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-04 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-05 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-06 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)

[root@server-08 ~]# rpm -qa | grep ganeti | sort && cat /etc/redhat-release
ganeti-2.16.2-1.el7.x86_64
ganeti-sysvinit-2.16.2-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core)
-------------------------------------------------------------------------------
ganeti.target and ganeti-kvmd.service have been stopped on all the other nodes: server-01, 03, 04, 05, 06 and 08.

On the master node server-02, ganeti-luxid.service, ganeti-wconfd.service, ganeti.target and ganeti-kvmd.service have been stopped. Then, still on server-02, I ran:

# /usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
# /usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it
# gnt-cluster renew-crypto --new-cluster-certificate -d -v

The process did not finish. This is where it ended when I ran the command:

[root@server-02 ~]# gnt-cluster renew-crypto --new-cluster-certificate -d -v

2022-12-02 10:41:16,675: gnt-cluster renew-crypto pid=27117 cli:1250 DEBUG Command line: gnt-cluster renew-crypto --new-cluster-certificate -d -v

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y

Gathering cluster information

2022-12-02 10:41:21,826: gnt-cluster renew-crypto pid=27117 client:150 DEBUG CallRPCMethod QueryConfigValues: format: 0ms, sock: 0ms, parse: 0ms

2022-12-02 10:41:21,827: gnt-cluster renew-crypto pid=27117 client:150 DEBUG CallRPCMethod Query: format: 0ms, sock: 0ms, parse: 0ms

Note: skipping offline node(s): brisket.corp, cuberolls.corp, flexo.corp

2022-12-02 10:41:21,829: gnt-cluster renew-crypto pid=27117 client:150 DEBUG CallRPCMethod QueryNodes: format: 0ms, sock: 0ms, parse: 0ms

Blocking watcher

Stopping master daemons

2022-12-02 10:41:21,829: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-master

Stopping daemons on server-02.corp

2022-12-02 10:41:21,867: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-all

Starting daemon 'ganeti-noded' on server-02.corp

2022-12-02 10:41:21,922: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-noded

Starting daemon 'ganeti-wconfd' on server-02.corp

2022-12-02 10:41:21,962: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-wconfd

Stopping daemons on server-01.corp

2022-12-02 10:41:21,995: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-01.corp '/usr/lib64/ganeti/daemon-util stop-all'

Stopping daemons on server-03.corp

2022-12-02 10:41:22,204: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-03.corp '/usr/lib64/ganeti/daemon-util stop-all'

Stopping daemons on server-04.corp

2022-12-02 10:41:22,428: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-04.corp '/usr/lib64/ganeti/daemon-util stop-all'

Stopping daemons on server-05.corp

2022-12-02 10:41:22,654: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-05.corp '/usr/lib64/ganeti/daemon-util stop-all'

Stopping daemons on server-06.corp

2022-12-02 10:41:22,886: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-06.corp '/usr/lib64/ganeti/daemon-util stop-all'

Stopping daemons on server-08.corp

2022-12-02 10:41:23,109: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-08.corp '/usr/lib64/ganeti/daemon-util stop-all'

Updating the cluster SSL certificate.

2022-12-02 10:41:23,374: gnt-cluster renew-crypto pid=27117 io:549 DEBUG Backing up /var/lib/ganeti/server.pem at /var/lib/ganeti/server.pem.backup-2022-12-02_10_41_23.6gaqLl

2022-12-02 10:41:23,375: gnt-cluster renew-crypto pid=27117 security:86 DEBUG Generating new cluster certificate at /var/lib/ganeti/server.pem

2022-12-02 10:41:23,606: gnt-cluster renew-crypto pid=27117 io:549 DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_23.CiFLUs

Copying /var/lib/ganeti/server.pem to server-01.corp:22

2022-12-02 10:41:23,608: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-01.corp:/var/lib/ganeti/server.pem

Copying /var/lib/ganeti/server.pem to server-03.corp:22

2022-12-02 10:41:23,815: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-03.corp:/var/lib/ganeti/server.pem

Copying /var/lib/ganeti/server.pem to server-04.corp:22

2022-12-02 10:41:24,045: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-04.corp:/var/lib/ganeti/server.pem

Copying /var/lib/ganeti/server.pem to server-05.corp:22

2022-12-02 10:41:24,258: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-05.corp:/var/lib/ganeti/server.pem

Copying /var/lib/ganeti/server.pem to server-06.corp:22

2022-12-02 10:41:24,523: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-06.corp:/var/lib/ganeti/server.pem

Copying /var/lib/ganeti/server.pem to server-08.corp:22

2022-12-02 10:41:24,738: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem server-08.corp:/var/lib/ganeti/server.pem

Updating client SSL certificates.

2022-12-02 10:41:25,039: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-01.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:25,579: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "", "action": "create", "node_name": "server-01.corp"}

 

2022-12-02 10:41:25,665: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_25.SmhiVf

2022-12-02 10:41:25,694: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-03.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:26,290: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-03.corp"}

 

2022-12-02 10:41:26,504: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_26._BpPzf

2022-12-02 10:41:26,546: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-04.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:26,852: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-04.corp"}

 

2022-12-02 10:41:26,909: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_26.cDBcH0

2022-12-02 10:41:26,923: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-05.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:27,577: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-05.corp"}

 

2022-12-02 10:41:27,827: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_27.pVWycx

2022-12-02 10:41:27,857: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-06.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:28,157: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-06.corp"}

 

2022-12-02 10:41:28,238: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_28.dvHRHu

2022-12-02 10:41:28,250: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-08.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:28,869: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-08.corp"}

 

2022-12-02 10:41:29,175: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_29.DiwWdx

2022-12-02 10:41:29,200: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@server-02.corp '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''

2022-12-02 10:41:29,496: DEBUG Received data: {"cluster_name": "gnt.corp", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "server-02.corp"}

 

2022-12-02 10:41:29,801: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-02_10_41_29.G4zbxY

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-01.corp:22

2022-12-02 10:41:29,825: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-01.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-03.corp:22

2022-12-02 10:41:30,036: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-03.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-04.corp:22

2022-12-02 10:41:30,230: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-04.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-05.corp:22

2022-12-02 10:41:30,441: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-05.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-06.corp:22

2022-12-02 10:41:30,675: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-06.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Copying /var/lib/ganeti/ssconf_master_candidates_certs to server-08.corp:22

2022-12-02 10:41:30,885: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs server-08.corp:/var/lib/ganeti/ssconf_master_candidates_certs

Starting daemons on server-01.corp

2022-12-02 10:41:31,189: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-01.corp '/usr/lib64/ganeti/daemon-util start-all'

Starting daemons on server-03.corp

2022-12-02 10:41:33,535: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-03.corp '/usr/lib64/ganeti/daemon-util start-all'

Starting daemons on server-04.corp

2022-12-02 10:41:34,315: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-04.corp '/usr/lib64/ganeti/daemon-util start-all'

Starting daemons on server-05.corp

2022-12-02 10:41:34,967: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-05.corp '/usr/lib64/ganeti/daemon-util start-all'

Starting daemons on server-06.corp

2022-12-02 10:41:35,651: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-06.corp '/usr/lib64/ganeti/daemon-util start-all'

Starting daemons on server-08.corp

2022-12-02 10:41:36,298: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gnt.corp -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@server-08.corp '/usr/lib64/ganeti/daemon-util start-all'

Stopping daemon 'ganeti-wconfd' on server-02.corp

2022-12-02 10:41:37,222: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-wconfd

Stopping daemon 'ganeti-noded' on server-02.corp

2022-12-02 10:41:37,280: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-noded

Starting daemons on server-02.corp

2022-12-02 10:41:37,381: gnt-cluster renew-crypto pid=27117 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start-all

2022-12-02 10:42:07,848: gnt-cluster renew-crypto pid=27117 cli:1257 ERROR Error during command processing
Traceback (most recent call last):
  File "/usr/share/ganeti/2.16/ganeti/cli.py", line 1253, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1293, in RenewCrypto
    opts.debug > 0)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1203, in _RenewCrypto
    cl = GetClient()
  File "/usr/share/ganeti/2.16/ganeti/runtime.py", line 265, in GetClient
    client = luxi.Client(address=pathutils.QUERY_SOCKET)
  File "/usr/share/ganeti/2.16/ganeti/luxi.py", line 104, in __init__
    self._InitTransport()
  File "/usr/share/ganeti/2.16/ganeti/rpc/client.py", line 207, in _InitTransport
    allow_non_master=self.allow_non_master)
  File "/usr/share/ganeti/2.16/ganeti/rpc/transport.py", line 108, in __init__
    raise errors.TimeoutError("Connect timed out")
TimeoutError: Connect timed out

Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out

[root@server-02 ~]# systemctl status ganeti-luxid.service
● ganeti-luxid.service - Ganeti query daemon (luxid)
   Loaded: loaded (/usr/lib/systemd/system/ganeti-luxid.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-12-02 10:45:41 CET; 1min 13s ago
     Docs: man:ganeti-luxid(8)
 Main PID: 28202 (ganeti-luxid)
    Tasks: 1
   Memory: 23.5M
   CGroup: /system.slice/ganeti-luxid.service
           └─28202 /usr/sbin/ganeti-luxid -f

Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-03.corp", nodePrimaryIp = "192.168.2.17", nodeSecondaryIp = "192.168.2.17", nodeM...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "cuberolls.corp", nodePrimaryIp = "192.168.2.11", nodeSecondaryIp = "192.168.2.11", no...se, nodeVmCap
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "flexo.corp", nodePrimaryIp = "192.168.2.28", nodeSecondaryIp = "192.168.2.28", nodeMa...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-08.corp", nodePrimaryIp = "192.168.2.26", nodeSecondaryIp = "192.168.2.26", nodeM...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-05.corp", nodePrimaryIp = "192.168.2.19", nodeSecondaryIp = "192.168.2.19", nodeM...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-01.corp", nodePrimaryIp = "192.168.2.8", nodeSecondaryIp = "192.168.2.8", nodeMas...deVmCapable =
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-06.corp", nodePrimaryIp = "192.168.2.20", nodeSecondaryIp = "192.168.2.20", nodeM...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "server-04.corp", nodePrimaryIp = "192.168.2.18", nodeSecondaryIp = "192.168.2.18", nodeM...nodeVmCapable
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: Error in the RPC HTTP reply from 'Node {nodeName = "brisket.corp", nodePrimaryIp = "192.168.2.41", nodeSecondaryIp = "192.168.2.41", node..., nodeVmCapab
Dec 02 10:46:49 server-02 ganeti-luxid[28202]: No voting RPC result from ["server-03.corp","cuberolls.corp","flexo.corp","server-08.corp","server-05.corp","server-01.corp","server-06.corp","server-04.co...risket.corp"]
Hint: Some lines were ellipsized, use -l to show in full.

Jun Futagawa

Dec 2, 2022, 9:21:42 PM
to ganeti
Hi Léa,

Thanks for the information on the node. I see that everything has been upgraded and is fine.

I could reproduce the same error when one or all nodes denied TCP/UDP connections from the master node on the ports Ganeti requires (everything except SSH).
Perhaps the yum update reset firewall settings that had not been made persistent?

Please check your firewall settings.

- ganeti-noded: 1811/tcp
- ganeti-confd: 1814/udp
- ganeti-rapi: 5080/tcp
- ganeti-mond: 1815/tcp
- ganeti-metad: 80/tcp
- DRBD/VNC ports for instances: 11000/tcp - 14999/tcp

Or allow all TCP/UDP connections from the master node.
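Before changing the firewall, the TCP ports in the list above can be probed from the master with a short script. This is a minimal sketch, assuming Python 3 is available on the master; the script name and the helper check_tcp_ports are hypothetical, not part of Ganeti. A completed TCP handshake only shows that the port is reachable and something is listening. 1814/udp is skipped because connect() cannot test UDP, and not every daemon runs on every node (ganeti-rapi normally runs only on the master), so a refused port is not by itself proof of a firewall problem.

```python
import socket
import sys

# TCP ports from the list above. 1814/udp (ganeti-confd) is left out because
# connect() cannot probe UDP; the DRBD/VNC range is left out because those
# ports are allocated per instance.
GANETI_TCP_PORTS = {
    "ganeti-noded": 1811,
    "ganeti-mond": 1815,
    "ganeti-rapi": 5080,
    "ganeti-metad": 80,
}


def check_tcp_ports(host, ports, timeout=3.0):
    """Try a TCP handshake with host on each port.

    Returns {name: status}, where status is "open" if the handshake completed,
    otherwise the exception class name (e.g. "ConnectionRefusedError").
    """
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = "open"
        except OSError as exc:  # refused, timed out, unresolvable host, ...
            results[name] = type(exc).__name__
    return results


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_ports.py server-06.corp
    for target in sys.argv[1:]:
        for name, status in check_tcp_ports(target, GANETI_TCP_PORTS).items():
            print("%s %s: %s" % (target, name, status))
```

Running it once per node from the master gives a quick table of which daemons answer, which can then be compared with the firewall rules on that node.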

Here is how I reproduced it:

# IP and host
I changed the master of my test environment to node02.

192.168.3.61 node01.example.org node01
192.168.3.62 node02.example.org node02 // master node
192.168.3.70 gcluster // master IP

# Add iptables settings on node01 to deny all TCP/UDP connections except SSH from master node node02
-A INPUT -p tcp -m tcp --dport 22 -s 192.168.3.62/32 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -s 192.168.3.70/32 -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -s 192.168.3.62/32 -j DROP
-A INPUT -s 192.168.3.70/32 -j DROP

# Stop ganeti processes on node01 and node02

systemctl stop ganeti-kvmd.service
systemctl stop ganeti.target

# gnt-cluster renew-crypto on master node node02
[ro...@node02.example.org ~]# /usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
[ro...@node02.example.org ~]# /usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it

[ro...@node02.example.org ~]# gnt-cluster renew-crypto --new-cluster-certificate -d -v
2022-12-03 10:43:17,225: gnt-cluster renew-crypto pid=25709 cli:1250 DEBUG Command line: gnt-cluster renew-crypto --new-cluster-certificate -d -v

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y
Gathering cluster information
2022-12-03 10:43:18,612: gnt-cluster renew-crypto pid=25709 client:150 DEBUG CallRPCMethod QueryConfigValues: format: 0ms, sock: 0ms, parse: 0ms
2022-12-03 10:43:18,613: gnt-cluster renew-crypto pid=25709 client:150 DEBUG CallRPCMethod Query: format: 0ms, sock: 0ms, parse: 0ms
2022-12-03 10:43:18,614: gnt-cluster renew-crypto pid=25709 client:150 DEBUG CallRPCMethod QueryNodes: format: 0ms, sock: 0ms, parse: 0ms

Blocking watcher
Stopping master daemons
2022-12-03 10:43:18,614: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-master
Stopping daemons on node02.example.org
2022-12-03 10:43:18,628: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop-all
Starting daemon 'ganeti-noded' on node02.example.org
2022-12-03 10:43:18,639: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-noded
Starting daemon 'ganeti-wconfd' on node02.example.org
2022-12-03 10:43:18,747: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start ganeti-wconfd
Stopping daemons on node01.example.org
2022-12-03 10:43:18,766: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@node01.example.org '/usr/lib64/ganeti/daemon-util stop-all'

Updating the cluster SSL certificate.
2022-12-03 10:43:18,939: gnt-cluster renew-crypto pid=25709 io:549 DEBUG Backing up /var/lib/ganeti/server.pem at /var/lib/ganeti/server.pem.backup-2022-12-03_10_43_18.hJsjXV
2022-12-03 10:43:18,939: gnt-cluster renew-crypto pid=25709 security:86 DEBUG Generating new cluster certificate at /var/lib/ganeti/server.pem
2022-12-03 10:43:19,215: gnt-cluster renew-crypto pid=25709 io:549 DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-03_10_43_19.iEA0bB
Copying /var/lib/ganeti/server.pem to node01.example.org:22
2022-12-03 10:43:19,216: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/server.pem node01.example.org:/var/lib/ganeti/server.pem
Updating client SSL certificates.
2022-12-03 10:43:19,385: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@node01.example.org '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''
2022-12-03 10:43:19,690: DEBUG Received data: {"cluster_name": "gcluster", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "node01.example.org"}

2022-12-03 10:43:19,805: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-03_10_43_19.1g93YY
2022-12-03 10:43:19,740: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -oPort=22 -oStrictHostKeyChecking=yes -4 ro...@node02.example.org '/bin/sh -c '\''/usr/lib64/ganeti/ssl-update --debug --verbose'\'''
2022-12-03 10:43:19,956: DEBUG Received data: {"cluster_name": "gcluster", "node_daemon_certificate": "REMOVED AS IT MY PRIVATE KEY", "action": "create", "node_name": "node02.example.org"}

2022-12-03 10:43:20,026: DEBUG Backing up /var/lib/ganeti/client.pem at /var/lib/ganeti/client.pem.backup-2022-12-03_10_43_20.Z23LHh
Copying /var/lib/ganeti/ssconf_master_candidates_certs to node01.example.org:22
2022-12-03 10:43:20,036: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd scp -p -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 /var/lib/ganeti/ssconf_master_candidates_certs node01.example.org:/var/lib/ganeti/ssconf_master_candidates_certs
Starting daemons on node01.example.org
2022-12-03 10:43:20,219: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oHostKeyAlias=gcluster -q -oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 ro...@node01.example.org '/usr/lib64/ganeti/daemon-util start-all'
Stopping daemon 'ganeti-wconfd' on node02.example.org
2022-12-03 10:43:20,654: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-wconfd
Stopping daemon 'ganeti-noded' on node02.example.org
2022-12-03 10:43:20,669: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util stop ganeti-noded
Starting daemons on node02.example.org
2022-12-03 10:43:20,709: gnt-cluster renew-crypto pid=25709 process:221 INFO RunCmd /usr/lib64/ganeti/daemon-util start-all
2022-12-03 10:43:51,002: gnt-cluster renew-crypto pid=25709 cli:1257 ERROR Error during command processing

Traceback (most recent call last):
  File "/usr/share/ganeti/2.16/ganeti/cli.py", line 1253, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1293, in RenewCrypto
    opts.debug > 0)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1203, in _RenewCrypto
    cl = GetClient()
  File "/usr/share/ganeti/2.16/ganeti/runtime.py", line 265, in GetClient
    client = luxi.Client(address=pathutils.QUERY_SOCKET)
  File "/usr/share/ganeti/2.16/ganeti/luxi.py", line 104, in __init__
    self._InitTransport()
  File "/usr/share/ganeti/2.16/ganeti/rpc/client.py", line 207, in _InitTransport
    allow_non_master=self.allow_non_master)
  File "/usr/share/ganeti/2.16/ganeti/rpc/transport.py", line 108, in __init__
    raise errors.TimeoutError("Connect timed out")
TimeoutError: Connect timed out
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out

Thanks,

-- 
Jun Futagawa

On Friday, December 2, 2022 at 19:06:35 UTC+9, Léa Mckenzie wrote:

Léa Mckenzie

Dec 5, 2022, 6:11:40 PM
to ganeti
Hi Jun,

About removing a node:

I stopped the services:
[root@server-02]# systemctl stop ganeti-luxid.service
[root@server-02]# systemctl stop ganeti-wconfd.service
[root@server-02]# systemctl stop ganeti-kvmd.service
[root@server-02]# systemctl stop ganeti.target

Then I started them again with --no-voting so that I could run Ganeti commands:

[root@server-02]# /usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
[root@server-02]# /usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it

I ran the command to list the nodes in my cluster:

[root@server-02 ~]# gnt-node list

Node           DTotal  DFree MTotal  MNode  MFree Pinst Sinst
brisket.corp        *      *      *      *      *     0     0
cuberolls.corp      *      *      *      *      *     0     0
flexo.corp          *      *      *      *      *    16     0
server-01.corp         ?      ?      ?      ?      ?    53     0
server-02.corp      4.0T 531.0G 503.4G 329.6G 221.6G    47     0
server-03.corp         ?      ?      ?      ?      ?    42     0
server-04.corp         ?      ?      ?      ?      ?    26     0
server-05.corp         ?      ?      ?      ?      ?    39     0
server-06.corp         ?      ?      ?      ?      ?    53     0
server-08.corp         ?      ?      ?      ?      ?    30     0

This worked for cuberolls.corp (the command reported errors, but the node is gone from the list afterwards):

[root@server-02 ~]# gnt-node remove cuberolls.corp
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node a60d12fb-d381-4a03-84ad-8519b292e200: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node 7d863cb0-4f76-482e-8d55-e62c3a2b49ae: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node f0f5b568-dc59-4f35-b377-315ababb4a53: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node 0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:04 2022  - WARNING: Communication failure to node 947090e9-31bd-45fd-b452-5d56e9e4d3d6: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:30:07 2022 Some nodes' SSH key files could not be updated:
Fri Dec  2 11:30:07 2022 cuberolls.corp: Removing SSH keys from node 'cuberolls.corp' failed. This can happen when the node is already unreachable. Error: Error after 3 retries. Last exception: Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oConnectTimeout=10 -oPort=22 -oStrictHostKeyChecking=no -4 ro...@cuberolls.corp '/bin/sh -c /usr/lib64/ganeti/ssh-update'' failed: exited with exit code 255.
Fri Dec  2 11:30:09 2022  - WARNING: Communication failure to node cuberolls.corp: Error 7: Failed connect to cuberolls.corp:1811; Connection refused
Fri Dec  2 11:30:09 2022  - WARNING: Errors encountered on the remote node while leaving the cluster: Error 7: Failed connect to cuberolls.corp:1811; Connection refused
Failure: command execution error:
Can't update hosts file with new host data: Error 51: Peer does not recognize and trust the CA that issued your certificate.

[root@server-02 ~]# gnt-node list

Node         DTotal DFree MTotal MNode MFree Pinst Sinst
brisket.corp      *     *      *     *     *     0     0
flexo.corp        *     *      *     *     *    16     0
server-01.corp       ?     ?      ?     ?     ?    53     0
server-02.corp       ?     ?      ?     ?     ?    47     0
server-03.corp       ?     ?      ?     ?     ?    42     0
server-04.corp       ?     ?      ?     ?     ?    26     0
server-05.corp       ?     ?      ?     ?     ?    39     0
server-06.corp       ?     ?      ?     ?     ?    53     0
server-08.corp       ?     ?      ?     ?     ?    30     0

Error with Brisket:

[root@server-02 ~]# gnt-node remove brisket.corp
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node 7d863cb0-4f76-482e-8d55-e62c3a2b49ae: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node a60d12fb-d381-4a03-84ad-8519b292e200: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node f0f5b568-dc59-4f35-b377-315ababb4a53: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node 0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node 39d98794-89f2-42e7-b52a-6515dc165cb3: Error 51: Peer does not recognize and trust the CA that issued your certificate.
Fri Dec  2 11:33:45 2022  - WARNING: Communication failure to node 947090e9-31bd-45fd-b452-5d56e9e4d3d6: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Failure: command execution error:
Could not remove the SSH key of node 'brisket.corp' (UUID: f3ab1850-d02b-4af3-90bf-caca9a8a835d).: Error 51: Peer does not recognize and trust the CA that issued your certificate.
[root@server-02 ~]# gnt-node list

Node         DTotal DFree MTotal MNode MFree Pinst Sinst
brisket.corp      *     *      *     *     *     0     0
flexo.corp        *     *      *     *     *    16     0
server-01.corp       ?     ?      ?     ?     ?    53     0
server-02.corp       ?     ?      ?     ?     ?    47     0
server-03.corp       ?     ?      ?     ?     ?    42     0
server-04.corp       ?     ?      ?     ?     ?    26     0
server-05.corp       ?     ?      ?     ?     ?    39     0
server-06.corp       ?     ?      ?     ?     ?    53     0
server-08.corp       ?     ?      ?     ?     ?    30     0

Same problem with Flexo. Some VMs were still listed on it even though it is OFFLINE and DOWN, so I removed those instances with the --ignore-failures parameter:

[root@server-02 ~]# gnt-node remove flexo.corp

Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Instance i-oms.integ is still running on the node, please remove first
[root@server-02 ~]# gnt-instance list | grep flexo
igl-igl2-valid.integ            kvm        image+centos-5.5       flexo.corp   ERROR_nodeoffline      *
i-oms.integ                 kvm        image+centos-current   flexo.corp   ERROR_nodeoffline      *
i-probe.integ               kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *
i-tef-de-prod.integ         kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *
lms-41.integ                  kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *
lms-oc-1.integ                 kvm        image+centos-5.5       flexo.corp   ERROR_nodeoffline      *
lms-od.integ                   kvm        image+centos-5.8       flexo.corp   ERROR_nodeoffline      *
lms-or.integ                   kvm        image+centos-5.5       flexo.corp   ERROR_nodeoffline      *
lms-son.integ               kvm        nothing+default        flexo.corp   ERROR_nodeoffline      *
lms-test.integ                  kvm        nothing+default        flexo.corp   ERROR_nodeoffline      *
lms-valid.tmp                   kvm        image+centos-5.5       flexo.corp   ERROR_nodeoffline      *
posbile-mcg.integ           kvm        image+centos-current   flexo.corp   ERROR_nodeoffline      *
quantum-db2.integ               kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *
v-appli.integ                kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *
v-web.integ                  kvm        image+centos-6.5       flexo.corp   ERROR_nodeoffline      *
v-vasc02.integ             kvm        image+centos-6.3       flexo.corp   ERROR_nodeoffline      *

After cleaning up those instances, I got the same error as with Brisket:

[root@server-02 ~]# gnt-node list

Node         DTotal DFree MTotal MNode MFree Pinst Sinst
brisket.corp      *     *      *     *     *     0     0
flexo.corp        *     *      *     *     *     0     0
server-01.corp       ?     ?      ?     ?     ?    53     0
server-02.corp       ?     ?      ?     ?     ?    47     0
server-03.corp       ?     ?      ?     ?     ?    42     0
server-04.corp       ?     ?      ?     ?     ?    26     0
server-05.corp       ?     ?      ?     ?     ?    39     0
server-06.corp       ?     ?      ?     ?     ?    53     0
server-08.corp       ?     ?      ?     ?     ?    30     0

[root@server-02 ~]# gnt-node remove flexo.corp
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node 7d863cb0-4f76-482e-8d55-e62c3a2b49ae: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node a60d12fb-d381-4a03-84ad-8519b292e200: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node f0f5b568-dc59-4f35-b377-315ababb4a53: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node e10a17f6-f6d9-4217-8e0c-4b23f52d6fcf: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node 0fa130c2-6d8f-4afe-acc3-66d2c5d8a69a: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node 39d98794-89f2-42e7-b52a-6515dc165cb3: Error 51: Peer does not recognize and trust the CA that issued your certificate.
Fri Dec  2 11:52:54 2022  - WARNING: Communication failure to node 947090e9-31bd-45fd-b452-5d56e9e4d3d6: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Failure: command execution error:
Could not remove the SSH key of node 'flexo.corp' (UUID: 4d462ddf-8ebf-4fc5-8dda-e4d89b750d0b).: Error 51: Peer does not recognize and trust the CA that issued your certificate.

Léa Mckenzie

Dec 5, 2022, 6:32:50 PM
to ganeti
About firewalld, iptables and even SELinux:

Here is the state on all my nodes and the master:

[root@server-02 ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
[root@server-02 ~]# sestatus
SELinux status:                 disabled
[root@server-02 ~]# iptables -L --line-number
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination
1    ACCEPT     all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination
1    ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination
1    ACCEPT     all  --  anywhere             anywhere


These servers are wide open.

[root@server-02 ~]# /usr/lib64/ganeti/daemon-util start ganeti-luxid --no-voting --yes-do-it
[root@server-02 ~]# /usr/lib64/ganeti/daemon-util start ganeti-wconfd --no-voting --yes-do-it

[root@server-02 ~]# gnt-cluster renew-crypto --new-cluster-certificate -d -v
2022-12-06 00:21:43,727: gnt-cluster renew-crypto pid=244870 cli:1250 DEBUG Command line: gnt-cluster renew-crypto --new-cluster-certificate -d -v

Updating certificates now. Running "gnt-cluster verify"  is recommended after this operation.
This requires all daemons on all nodes to be restarted and may take
some time. Continue?
y/[n]/?: y


It failed with the same error on the master, server-02:
Traceback (most recent call last):
  File "/usr/share/ganeti/2.16/ganeti/cli.py", line 1253, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1293, in RenewCrypto
    opts.debug > 0)
  File "/usr/share/ganeti/2.16/ganeti/client/gnt_cluster.py", line 1203, in _RenewCrypto
    cl = GetClient()
  File "/usr/share/ganeti/2.16/ganeti/runtime.py", line 265, in GetClient
    client = luxi.Client(address=pathutils.QUERY_SOCKET)
  File "/usr/share/ganeti/2.16/ganeti/luxi.py", line 104, in __init__
    self._InitTransport()
  File "/usr/share/ganeti/2.16/ganeti/rpc/client.py", line 207, in _InitTransport
    allow_non_master=self.allow_non_master)
  File "/usr/share/ganeti/2.16/ganeti/rpc/transport.py", line 108, in __init__
    raise errors.TimeoutError("Connect timed out")
TimeoutError: Connect timed out
Timeout while talking to the master daemon. Jobs might have been submitted and will continue to run even if the call timed out. Useful commands in this situation are "gnt-job list", "gnt-job cancel" and "gnt-job watch". Error:
Connect timed out


A cluster verify while the daemons are running with --no-voting:

[root@server-02 ~]# gnt-cluster verify
Submitted jobs 674897, 674898
Waiting for job 674897 ...
Tue Dec  6 00:03:12 2022 * Verifying cluster config
Tue Dec  6 00:03:12 2022 * Verifying cluster certificate files
Tue Dec  6 00:03:12 2022 * Verifying hypervisor parameters
Tue Dec  6 00:03:12 2022 * Verifying all nodes belong to an existing group
Waiting for job 674898 ...
Tue Dec  6 00:03:13 2022 * Verifying group 'default'
Tue Dec  6 00:03:13 2022 * Gathering data (9 nodes)
Tue Dec  6 00:03:13 2022 * Gathering information about nodes (9 nodes)
Tue Dec  6 00:03:13 2022 * Gathering disk information (9 nodes)
Tue Dec  6 00:03:14 2022   - ERROR: node server-08.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-01.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-04.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-06.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-03.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-02.corp: while getting disk information: Error 51: Peer does not recognize and trust the CA that issued your certificate.
Tue Dec  6 00:03:14 2022   - ERROR: node server-05.corp: while getting disk information: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022 * Verifying configuration file consistency
Tue Dec  6 00:03:14 2022   - ERROR: node server-08.corp: Could not verify the SSH setup of this node.
Tue Dec  6 00:03:14 2022   - ERROR: node server-08.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-01.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-04.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-06.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-03.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-02.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022   - ERROR: node server-05.corp: Node did not return file checksum data
Tue Dec  6 00:03:14 2022 * Verifying node status
Tue Dec  6 00:03:14 2022   - ERROR: node server-08.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-01.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-04.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-06.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-03.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
Tue Dec  6 00:03:14 2022   - ERROR: node server-02.corp: while contacting node: Error 51: Peer does not recognize and trust the CA that issued your certificate.
Tue Dec  6 00:03:14 2022   - ERROR: node server-05.corp: while contacting node: Error 35: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
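With seven nodes the verify output gets long; a per-node summary of the error codes makes the pattern easier to see (in the run above, Error 35 for every member node and Error 51 only for server-02, the master). This is a minimal sketch, assuming the verify output has been saved to a text file in the format shown above; summarize_verify and the script name are hypothetical, and the regular expressions are fitted to these sample lines, not to a documented Ganeti output format.

```python
import re
import sys
from collections import defaultdict

# Fitted to lines like the ones above, e.g.
#   ... - ERROR: node server-08.corp: while contacting node: Error 35: You are ...
_NODE_ERROR_RE = re.compile(r"ERROR: node (?P<node>\S+): (?P<msg>.*)")
_CODE_RE = re.compile(r"\bError (\d+):")


def summarize_verify(lines):
    """Map each node name to the set of error labels reported for it.

    Messages carrying an 'Error NN:' code are labelled 'Error NN'; the rest
    (e.g. 'Node did not return file checksum data') are labelled 'other'.
    """
    summary = defaultdict(set)
    for line in lines:
        match = _NODE_ERROR_RE.search(line)
        if match is None:
            continue  # progress lines such as '* Verifying node status'
        code = _CODE_RE.search(match.group("msg"))
        summary[match.group("node")].add("Error " + code.group(1) if code else "other")
    return dict(summary)


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   gnt-cluster verify > verify.txt 2>&1; python3 verify_summary.py verify.txt
    for path in sys.argv[1:]:
        with open(path) as handle:
            for node, labels in sorted(summarize_verify(handle).items()):
                print("%s: %s" % (node, ", ".join(sorted(labels))))
```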


Everything points to the certificates, but nothing changes when I try to renew them.

Reading the errors, the issue looks simple, but nothing works as it is supposed to. If all the certificates are stored in /var/lib/ganeti, then I have checked everywhere.

If you have other recommendation, I will be glad to test them.

Many thanks

Jun Futagawa

Dec 6, 2022, 5:46:50 AM
to ganeti
Hi Léa,

It's a tough problem.

gnt-cluster renew-crypto --new-cluster-certificate mainly does the following:

1. Create new certificate files (/var/lib/ganeti/{client.pem,server.pem}).
2. Copy the new certificate file and /var/lib/ganeti/ssconf_master_candidates_certs (one line) to the member nodes via SSH.
3. Start ganeti-noded (listening on 1811/tcp, using the new certificate file) on the member nodes with /usr/lib64/ganeti/daemon-util.
4. Connect from the master node to ganeti-noded on the member nodes via 1811/tcp.
5. The initialization process updates /var/lib/ganeti/ssconf_master_candidates_certs on all nodes so that it contains multiple lines.
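Step 2 means every node should end up with an identical server.pem. One way to check that is to print the SHA-256 fingerprint of the certificate in each node's copy and compare them. This is a minimal sketch, assuming Python 3 on each node; cert_fingerprints and the script name are hypothetical, not a Ganeti tool, and the private-key block in the same file is skipped. Since Error 35 complains about a certificate with the same issuer and serial as an existing one that is not the same certificate, a node whose fingerprint differs from the master's would be a natural suspect. Note that this only inspects files on disk, not what the running ganeti-noded actually presents.

```python
import hashlib
import re
import ssl
import sys

# Ganeti's server.pem holds the private key and the certificate in one file;
# this pattern picks out only the CERTIFICATE block(s).
_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL)


def cert_fingerprints(pem_text):
    """Return the SHA-256 fingerprint of every certificate in pem_text,
    formatted as colon-separated upper-case hex (the style openssl prints)."""
    fingerprints = []
    for block in _CERT_RE.findall(pem_text):
        der = ssl.PEM_cert_to_DER_cert(block)  # base64-decode the block body
        digest = hashlib.sha256(der).hexdigest().upper()
        fingerprints.append(
            ":".join(digest[i:i + 2] for i in range(0, len(digest), 2)))
    return fingerprints


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 certfp.py /var/lib/ganeti/server.pem
    for path in sys.argv[1:]:
        with open(path) as handle:
            for fp in cert_fingerprints(handle.read()):
                print("%s %s" % (path, fp))
```

Only server.pem is expected to match across nodes: judging by the renew-crypto log earlier in the thread, client.pem is recreated on each node by ssl-update, so its fingerprint should differ from node to node.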

Regarding #4: according to your last reply the firewall seems to be OK, but in the meantime,
could you please check whether the master can reach server-06 on 1811/tcp with the nc command?

On server-06:
[root@server-06 ~]# systemctl stop ganeti.target
[root@server-06 ~]# nc -kl 1811
(Waiting for connection...)

On the master, server-02:
[root@server-02 ~]# echo test | nc server-06.corp 1811

If "test" is displayed in the terminal of server-06, communication is established.
[root@server-06 ~]# nc -kl 1811
test
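Where nc is unavailable, the same one-shot test can be done with Python's socket module. This is a minimal sketch, assuming Python 3 on both nodes; the script name and its listen/send arguments are hypothetical. Unlike nc -k, it accepts a single connection and then exits. As with nc, ganeti-noded must be stopped on the listening node first so that port 1811 is free.

```python
import socket
import sys


def open_listener(port, host="0.0.0.0"):
    """Bind and listen on a TCP port (the `nc -l PORT` half)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def receive_once(srv, bufsize=4096):
    """Accept one connection, read until the peer closes, return (peer_ip, data)."""
    conn, addr = srv.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(bufsize)
            if not data:  # empty read: peer closed the connection
                break
            chunks.append(data)
    return addr[0], b"".join(chunks)


def send_line(host, port, text="test", timeout=5.0):
    """Connect, send one line and close (the `echo test | nc HOST PORT` half)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode() + b"\n")


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   server-06: python3 tcptest.py listen 1811
    #   server-02: python3 tcptest.py send server-06.corp 1811
    if len(sys.argv) == 3 and sys.argv[1] == "listen":
        with open_listener(int(sys.argv[2])) as listener:
            peer, payload = receive_once(listener)
            print("from %s: %r" % (peer, payload))
    elif len(sys.argv) == 4 and sys.argv[1] == "send":
        send_line(sys.argv[2], int(sys.argv[3]))
```

If the listener prints the master's IP address and b'test\n', the TCP path on 1811 works and step 4 is not blocked at the network level.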

To confirm #3, I'd like to see /var/log/ganeti/node-daemon.log on a member node (e.g. server-06),
if it contains any error messages.

Thanks,

-- 
Jun Futagawa

On Tuesday, December 6, 2022 at 8:32:50 UTC+9, Léa Mckenzie wrote: