Problem adding a new node (could not reload SSH keys)

Bernhard J. M. Grün

Mar 18, 2014, 11:14:42 AM
to gan...@googlegroups.com
Hi,

I think I have found an incompatibility between Ganeti 2.10.1 and Ubuntu 14.04 (development release with all current updates).
At the moment this is only a test cluster, but we plan to start our live cluster soon.

The cluster was initialized with the following commands:
root@node1:~/work# gnt-cluster init --enabled-hypervisor=kvm  --master-netdev=br0 -s 10.0.2.11 --maintain-node-health=yes --prealloc-wipe-disks=yes --nic-parameters link=br0 --vg-name=ganeti SOME_NAME
root@node1:~/work# gnt-cluster modify -H kvm:kernel_path=''
root@node1:~/work# gnt-cluster modify --reserved-lvs default/root
root@node1:~/work# gnt-cluster verify
Submitted jobs 49, 50
Waiting for job 49 ...
Tue Mar 18 16:12:39 2014 * Verifying cluster config
Tue Mar 18 16:12:39 2014 * Verifying cluster certificate files
Tue Mar 18 16:12:39 2014 * Verifying hypervisor parameters
Tue Mar 18 16:12:40 2014 * Verifying all nodes belong to an existing group
Waiting for job 50 ...
Tue Mar 18 16:12:40 2014 * Verifying group 'default'
Tue Mar 18 16:12:40 2014 * Gathering data (1 nodes)
Tue Mar 18 16:12:40 2014 * Gathering disk information (1 nodes)
Tue Mar 18 16:12:40 2014 * Verifying configuration file consistency
Tue Mar 18 16:12:41 2014 * Verifying node status
Tue Mar 18 16:12:41 2014 * Verifying instance status
Tue Mar 18 16:12:41 2014 * Verifying orphan volumes
Tue Mar 18 16:12:41 2014 * Verifying N+1 Memory redundancy
Tue Mar 18 16:12:41 2014 * Other Notes
Tue Mar 18 16:12:41 2014 * Hooks Results

The next command to add a new node produces the following error:
root@node1:~/work# gnt-node add -s 10.0.2.12 10.0.1.12
-- WARNING -- 
Performing this operation is going to replace the ssh daemon keypair
on the target machine (10.0.1.12) with the ones of the current one
and grant full intra-cluster ssh root access to/from it

The authenticity of host '10.0.1.12 (10.0.1.12)' can't be established.
ECDSA key fingerprint is b1:dc:68:aa:a5:38:25:29:ad:ed:e3:0e:af:73:15:17.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.1.12' (ECDSA) to the list of known hosts.
2014-03-18 15:44:07,150: Unhandled Ganeti error: Could not reload SSH keys, command '/usr/local/lib/ganeti/daemon-util reload-ssh-keys' had exitcode 1 and error 
Failure: command execution error:
Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oStrictHostKeyChecking=ask -4 ro...@10.0.1.12 /usr/local/lib/ganeti/prepare-node-join' failed: exited with exit code 1

All nodes were in a cluster before. But I destroyed that cluster and also updated Ganeti from 2.10.0 to 2.10.1.

Do you have any idea how I could solve that problem?

Many thanks in advance

Bernhard

Helga Velroyen

Mar 18, 2014, 11:37:47 AM
to gan...@googlegroups.com
I'm not entirely sure what exactly the problem is, but could you try these things:

- On the node you want to add, regenerate the root user's ssh key pair manually with `ssh-keygen`.

- On the master node: check whether the master node's ``authorized_keys`` file still holds an old ssh key of the new node, and if so, remove it.

Then try adding the node again.
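A sketch of those two steps, run against a scratch directory for illustration (the key comments root@node1/root@node2 are made up; on real nodes you would operate on /root/.ssh and the master's authorized_keys):

```shell
set -e
work=$(mktemp -d)

# Step 1 (on the node to be added): regenerate root's key pair, e.g.
#   ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa

# Step 2 (on the master): drop any stale authorized_keys entry that
# still carries the new node's old key.
printf '%s\n' \
    'ssh-rsa OLDKEYDATA root@node2' \
    'ssh-rsa MASTERKEYDATA root@node1' > "$work/authorized_keys"
grep -v 'root@node2' "$work/authorized_keys" > "$work/authorized_keys.new"
mv "$work/authorized_keys.new" "$work/authorized_keys"
cat "$work/authorized_keys"
```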

Also, are you running in split-user mode? (That is, did you run configure with the options --with-user-prefix=... and --with-group-prefix=...?)

Let me know if that helps!

Cheers,
Helga



-- 
Helga Velroyen | Software Engineer | hel...@google.com | 

Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores

Bernhard J. M. Grün

Mar 19, 2014, 3:41:34 AM
to gan...@googlegroups.com
Hi,

I just tried both of your tips. Unfortunately this did not change anything.

To be absolutely sure that this has nothing to do with the ssh keys, I deleted them on both sides after destroying the cluster.
Unfortunately, the error is still the same:
root@node1:~# gnt-cluster destroy --yes-do-it
root@node1:~# rm -rf .ssh
root@node2:~# rm -rf .ssh
root@node1:~# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
df:ec:f2:f2:4f:c6:36:40:16:eb:34:bd:56:b2:db:f2 ro...@node1.SOME_NAME
The key's randomart image is:
+--[ RSA 2048]----+
|            .    |
|             +   |
|            * o .|
|           = . = |
|        S   o +  |
|         . o + o |
|          . o O .|
|          o. + + |
|           =+.. E|
+-----------------+
root@node1:~# gnt-cluster init --enabled-hypervisor=kvm  --master-netdev=br0 -s 10.0.2.11 --maintain-node-health=yes --prealloc-wipe-disks=yes --nic-parameters link=br0 --vg-name=ganeti SOME_NAME
root@node1:~# gnt-cluster modify -H kvm:kernel_path=''
root@node1:~# gnt-cluster modify --reserved-lvs default/root
root@node1:~# gnt-cluster verify
Submitted jobs 5, 6
Waiting for job 5 ...
Wed Mar 19 08:33:13 2014 * Verifying cluster config
Wed Mar 19 08:33:13 2014 * Verifying cluster certificate files
Wed Mar 19 08:33:13 2014 * Verifying hypervisor parameters
Wed Mar 19 08:33:13 2014 * Verifying all nodes belong to an existing group
Waiting for job 6 ...
Wed Mar 19 08:33:13 2014 * Verifying group 'default'
Wed Mar 19 08:33:13 2014 * Gathering data (1 nodes)
Wed Mar 19 08:33:14 2014 * Gathering disk information (1 nodes)
Wed Mar 19 08:33:14 2014 * Verifying configuration file consistency
Wed Mar 19 08:33:14 2014 * Verifying node status
Wed Mar 19 08:33:14 2014 * Verifying instance status
Wed Mar 19 08:33:14 2014 * Verifying orphan volumes
Wed Mar 19 08:33:14 2014 * Verifying N+1 Memory redundancy
Wed Mar 19 08:33:14 2014 * Other Notes
Wed Mar 19 08:33:14 2014 * Hooks Results
root@node1:~# gnt-node add -s 10.0.2.12 10.0.1.12
-- WARNING -- 
Performing this operation is going to replace the ssh daemon keypair
on the target machine (10.0.1.12) with the ones of the current one
and grant full intra-cluster ssh root access to/from it

The authenticity of host '10.0.1.12 (10.0.1.12)' can't be established.
ECDSA key fingerprint is b1:dc:68:aa:a5:38:25:29:ad:ed:e3:0e:af:73:15:17.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.1.12' (ECDSA) to the list of known hosts.
ro...@10.0.1.12's password: 
2014-03-19 08:33:23,378: Unhandled Ganeti error: Could not reload SSH keys, command '/usr/local/lib/ganeti/daemon-util reload-ssh-keys' had exitcode 1 and error 
Failure: command execution error:
Command 'ssh -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oCheckHostIp=no -oStrictHostKeyChecking=ask -4 ro...@10.0.1.12 /usr/local/lib/ganeti/prepare-node-join' failed: exited with exit code 1

Do you have any other ideas?
Maybe some debug outputs that I could enable?

Thanks in advance

Bernhard

Bernhard J. M. Grün

Mar 19, 2014, 4:02:16 AM
to gan...@googlegroups.com
I would like to add some information to answer the question about my compile options:
root@node1:~/work/ganeti-2.10.1# ./configure --localstatedir=/var --sysconfdir=/etc --enable-monitoring --enable-confd --enable-restricted-commands --enable-versionfull --with-kvm-kernel=""

So everything runs as root, and I used the same options for version 2.10.0 before.

Thomas Thrainer

Mar 19, 2014, 4:32:00 AM
to gan...@googlegroups.com
I guess there is no '/etc/init.d/ssh' init script in Ubuntu 14.04 any more?
AFAIK the way to restart the SSH daemon with Upstart would be 'service ssh restart', right?
So you'd have to pass '--with-ssh-initscript=service ssh' to ./configure.
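Combined with the configure flags from your earlier message, the rebuild might look like this (a sketch only, not verified on 14.04; note that the two-word value needs quoting):

```shell
./configure --localstatedir=/var --sysconfdir=/etc \
    --enable-monitoring --enable-confd --enable-restricted-commands \
    --enable-versionfull --with-kvm-kernel="" \
    --with-ssh-initscript='service ssh'
make && make install
```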

Cheers,
Thomas
--
Thomas Thrainer | Software Engineer | thom...@google.com | 

Bernhard J. M. Grün

Mar 19, 2014, 4:50:28 AM
to gan...@googlegroups.com
It seems the init script /etc/init.d/ssh is at least there.
But I will try your suggestion anyway, as I get the following output when I run bash -x /etc/init.d/ssh restart:
root@node1:/etc/init.d# bash -x ./ssh restart
+ set -e
+ test -x /usr/sbin/sshd
+ umask 022
+ test -f /etc/default/ssh
+ . /etc/default/ssh
++ SSHD_OPTS=
+ . /lib/lsb/init-functions
+++ run-parts --lsbsysinit --list /lib/lsb/init-functions.d
++ for hook in '$(run-parts --lsbsysinit --list /lib/lsb/init-functions.d 2>/dev/null)'
++ '[' -r /lib/lsb/init-functions.d/20-left-info-blocks ']'
++ . /lib/lsb/init-functions.d/20-left-info-blocks
++ for hook in '$(run-parts --lsbsysinit --list /lib/lsb/init-functions.d 2>/dev/null)'
++ '[' -r /lib/lsb/init-functions.d/50-ubuntu-logging ']'
++ . /lib/lsb/init-functions.d/50-ubuntu-logging
+++ LOG_DAEMON_MSG=
++ FANCYTTY=
++ '[' -e /etc/lsb-base-logging.sh ']'
++ true
+ '[' -n '' ']'
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/sbin:/sbin
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/sbin:/sbin
+ case "$1" in
+ check_for_upstart 1
+ init_is_upstart
+ '[' -x /sbin/initctl ']'
+ /sbin/initctl version
+ /bin/grep -q upstart
+ return 0
+ exit 1

As you can see, the script ends with exit code 1, the same exit code as in the Ganeti error messages.
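For illustration, here is a stripped-down sketch of that Upstart guard (function names follow the trace above; the initctl/grep check is faked with a variable so the sketch is self-contained):

```shell
# Simplified sketch of the LSB guard seen in the trace: when the service
# is managed by Upstart, check_for_upstart exits with the given status so
# the old init.d script does nothing and 'service ssh restart' is used
# instead.
init_is_upstart() {
    # stands in for: /sbin/initctl version 2>/dev/null | /bin/grep -q upstart
    [ "$FAKE_UPSTART" = yes ]
}
check_for_upstart() {
    if init_is_upstart; then
        exit "$1"
    fi
}

FAKE_UPSTART=yes
( check_for_upstart 1 )   # subshell, so the exit doesn't kill this script
echo "exit code under Upstart: $?"
```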

I will report back if your suggestion solves that problem.

Thanks

Bernhard

Bernhard J. M. Grün

Mar 19, 2014, 5:39:18 AM
to gan...@googlegroups.com
It seems a bit better now, but the next error has already appeared.
Maybe I have fiddled too much with these nodes while testing the robustness of Ganeti.

Anyway I now get the following error:
root@node1:~# gnt-node add -s 10.0.1.12 node2.SOME_NAME
-- WARNING -- 
Performing this operation is going to replace the ssh daemon keypair
on the target machine (10.0.1.12) with the ones of the current one
and grant full intra-cluster ssh root access to/from it

The authenticity of host '10.0.1.12 (10.0.1.12)' can't be established.
ECDSA key fingerprint is b1:dc:68:aa:a5:38:25:29:ad:ed:e3:0e:af:73:15:17.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.1.12' (ECDSA) to the list of known hosts.
ro...@10.0.1.12's password: 
ssh stop/waiting
ssh start/running, process 4170
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
The master has a secondary ip but the new node doesn't have one
root@node1:~# gnt-node add -s 10.0.2.12 node2.ganeti1.nsynd.com
-- WARNING -- 
Performing this operation is going to replace the ssh daemon keypair
on the target machine (10.0.1.12) with the ones of the current one
and grant full intra-cluster ssh root access to/from it

The authenticity of host '10.0.1.12 (10.0.1.12)' can't be established.
ECDSA key fingerprint is b1:dc:68:aa:a5:38:25:29:ad:ed:e3:0e:af:73:15:17.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.1.12' (ECDSA) to the list of known hosts.
ssh stop/waiting
ssh start/running, process 4310
Wed Mar 19 10:31:07 2014  - INFO: Node will be a master candidate
Wed Mar 19 10:31:08 2014 ssh/hostname verification failed (checking from 45993f24-cd49-453d-8acd-512f441af443): hostname mismatch: expected 10.0.1.12 but got node2.SOME_NAME
Failure: command execution error:
ssh/hostname verification failed

Should I just begin with a clean install? Or do you think it would be worthwhile to find the real reason behind these problems?

Bernhard

Helga Velroyen

Mar 19, 2014, 5:44:40 AM
to gan...@googlegroups.com
Hi!


Well, if you really fiddled a lot with the machines, I guess a clean install is less time-consuming. We run a lot of tests with Ganeti, but we cannot possibly account for everything people do to their machines prior to using Ganeti.

Cheers,
Helga

Thomas Thrainer

Mar 19, 2014, 6:05:30 AM
to gan...@googlegroups.com
It somehow seems as if you added the new node by IP rather than by its hostname. Are you sure that you did a 'gnt-node add -s <secondary_ip> <node_fqdn>'? Do the nodes have a secondary IP for replication? Otherwise you should leave out the -s option.
Also, you should make sure that the hostnames are resolvable by DNS (to their primary IP) and that 'hostname --fqdn' actually outputs the correct hostname on all nodes.
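A couple of quick sanity checks for this (hostnames illustrative):

```shell
# On each node:
#   hostname --fqdn               # should print e.g. node2.SOME_NAME
#   getent hosts node2.SOME_NAME  # should resolve to the primary IP
# A portable stand-in exercising the same resolver path:
getent hosts localhost >/dev/null && echo "resolver ok"
```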

Cheers,
Thomas

Bernhard J. M. Grün

Mar 19, 2014, 8:11:53 AM
to gan...@googlegroups.com
It seems the problem was in the /etc/hosts file:
10.0.1.12 10.0.1.12 10

Maybe this happened because I once tried to add the node using its IP address instead of the FQDN.
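For comparison, a conventional entry maps the primary IP to the node's FQDN and short name; a minimal sketch with a format check (hostname illustrative):

```shell
# Hypothetical corrected /etc/hosts entry: the IP maps to the node's
# FQDN (and short name), not to the IP itself as in the broken line.
cat > /tmp/hosts.example <<'EOF'
10.0.1.12   node2.SOME_NAME node2
EOF
# Sanity check: the second field should contain a dot, i.e. be an FQDN.
awk '$2 ~ /\./ {print "ok: " $1 " -> " $2}' /tmp/hosts.example
```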

So this was clearly my fault.

Many thanks for your great help!

Bernhard

Rezaul Karim

May 27, 2014, 7:38:56 AM
to gan...@googlegroups.com

I got it working in the following way (Ubuntu 14.04 only).

On the client node:
cd /root/
rm -fr .ssh
mkdir .ssh

vim /etc/ssh/sshd_config    # change the line to: PermitRootLogin yes
echo "service ssh restart" > /etc/init.d/ssh

On the master node:
cd /root/
rm -fr .ssh
mkdir .ssh

vim /etc/ssh/sshd_config    # change the line to: PermitRootLogin yes
echo "service ssh restart" > /etc/init.d/ssh

Now run:
gnt-node add -s 172.16.208.162 gnode2.ganeti.lan

Neil M

Jun 4, 2014, 2:59:10 PM
to gan...@googlegroups.com
Note that this is a problem with the official release packages on Ubuntu 14.04 as well. As noted in this thread, it appears to have to do with how Ubuntu implements /etc/init.d/ssh: the script doesn't seem to actually do anything by default, but I was able to confirm that replacing it with "service ssh restart" fixes the issue. Here is the Ubuntu bug, which also links back to this thread:

https://bugs.launchpad.net/ubuntu/+source/ganeti/+bug/1308571