SSL Problems after upgrade to 2.11.6


Gianfilippo

Oct 19, 2014, 6:31:44 PM
to gan...@googlegroups.com
Hello,
after an upgrade from 2.10 to 2.11.6, I cannot bring the cluster back up.
All the nodes report this error:
HttpError: Error in SSL handshake: ([('SSL routines',
'SSL3_GET_CLIENT_CERTIFICATE', 'no certificate returned')],)

On the master, starting ganeti:

WARNING:root:Error contacting node node1: Error 35: error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
WARNING:root:Error contacting node node2: Error 35: error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
WARNING:root:Error contacting node node3: Error 35: error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca

I tried several gnt-cluster renew-crypto options
(--new-cluster-certificate, --new-node-certificates,
--new-spice-certificate) without luck.

Any hint?

Thanks for your help.

Helga Velroyen

Oct 20, 2014, 3:57:58 AM
to gan...@googlegroups.com
Hi!

a few questions:

- from which version did you upgrade to 2.11.6?
- which OS in which version are you using on the nodes?
- can you check the version of openssl and python-openssl for me?
- Did this occur right after the upgrade, or were any other operations done? Were those operations successful, or were any of them, including the upgrade, aborted half-way?

What you can try is this:
- check if all nodes have *the same* server.pem in /var/lib/ganeti. If not, copy the server.pem file manually from the master node to all other nodes.
- check if all nodes have one (but not the same) client.pem in /var/lib/ganeti, and check on each node that the ssconf_master_candidates_certs file exists and has as many keys as you have master candidates in your cluster (one key per line). If some nodes don't have the client.pem or the ssconf file is incomplete, something seems to have broken half-way through the upgrade or renew-crypto operation. You can try to fix it this way:
  * remove *all* client.pem from all machines
  * remove the ssconf_master_candidates_certs file on *all* nodes
  * after that, all nodes should be able to communicate with each other again, but 'gnt-cluster verify' will complain that the SSL setup is not correct. Thus, as a last step, you should run 'gnt-cluster renew-crypto --new-node-certificates' again.
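
The checks in the list above can be sketched as a small script to run on each node, so that the results can be compared by hand. This is only a rough sketch: the function name summarize_crypto_files is made up for illustration, the directory is the /var/lib/ganeti path mentioned above, and the candidate-certs file name is the spelling ssconf_master_candidates_certs that appears in the rm listing later in this thread. It only reads files and does not change anything.

```python
import hashlib
import os

# Assumed data directory; it matches the path mentioned in this thread.
DATA_DIR = "/var/lib/ganeti"
# File name as spelled in the rm listing later in this thread.
CANDIDATE_CERTS = "ssconf_master_candidates_certs"


def summarize_crypto_files(data_dir=DATA_DIR):
    """Return a small read-only report on the certificate files in data_dir."""
    report = {
        "server_pem_sha256": None,
        "has_client_pem": False,
        "candidate_cert_lines": None,
    }

    # server.pem should be identical on all nodes, so its digest should match.
    server_pem = os.path.join(data_dir, "server.pem")
    if os.path.isfile(server_pem):
        with open(server_pem, "rb") as handle:
            report["server_pem_sha256"] = hashlib.sha256(handle.read()).hexdigest()

    # Every node should have its own client.pem.
    report["has_client_pem"] = os.path.isfile(os.path.join(data_dir, "client.pem"))

    # One entry per master candidate is expected, one per line.
    certs = os.path.join(data_dir, CANDIDATE_CERTS)
    if os.path.isfile(certs):
        with open(certs) as handle:
            report["candidate_cert_lines"] = sum(1 for line in handle if line.strip())

    return report
```

Calling summarize_crypto_files() as root on every node and comparing the output should show whether server_pem_sha256 is the same everywhere, whether any node lacks client.pem, and whether candidate_cert_lines matches the number of master candidates.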

Let me know if that works for you,
Helga


Helga Velroyen

Oct 20, 2014, 4:06:46 AM
to gan...@googlegroups.com
On Mon, Oct 20, 2014 at 9:57 AM, Helga Velroyen <hel...@google.com> wrote:
> Hi!
>
> a few questions:
>
> - from which version did you upgrade to 2.11.6?
^^ Disregard this question, I just saw you already answered it :)

Gianfilippo

Oct 20, 2014, 5:02:56 AM
to gan...@googlegroups.com
Hello Helga,

On 20/10/2014 09:57, 'Helga Velroyen' via ganeti wrote:
> Hi!
>
> a few questions:
>
> - from which version did you upgrade to 2.11.6?
2.10.2

> - which OS in which version are you using on the nodes?
Ubuntu 12.04

> - can you check the version of openssl and python-openssl for me?
Sure.

openssl 1.0.1-4ubuntu5.20
python-openssl 0.12-1ubuntu2.1

By the way, I have recompiled python-pycurl (7.19.0-4ubuntu3) against
openssl, replacing the default Ubuntu package, which is compiled against
gnutls (following the procedure at
http://theengguy.blogspot.it/2012/06/ganeti-not-working-in-ubuntu-1204.html
).

Using the default python-pycurl compiled against gnutls, I get a
similar error on the master node:
Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
and on the client node:
HttpError: Error in SSL handshake: ([('SSL routines',
'SSL3_GET_CLIENT_CERTIFICATE', 'no certificate returned')],)


> - Did this occur right after the upgrade or were there any other
> operations done? Were those operations successful or maybe aborted
> half-way, including the upgrade?
There were no operations ongoing. I have performed other upgrades
before without any issue; the cluster was created using 2.4, if I'm not
mistaken, and I have upgraded all the way to 2.10.2 in several steps.

>
> What you can try is this:
> - check if all nodes have *the same* server.pem in /var/lib/ganeti. If
> not, copy the server.pem file manually from the master node to all other
> nodes.
I have copied it again manually to all the nodes, but I still get the
"tlsv1 alert unknown ca" error.

> - check if all nodes have one (but not the same) client.pem in
> /var/lib/ganeti and check on each node if the
> ssconf_master_candidates_certs file exists and has as many keys as you
> have master candidates in your cluster (one key per line). If some nodes
> don't have the client.pem or the ssconf file is incomplete, something
> seems to be broken half-way through the upgrade or renew-crypto
> operation. You can try to fix it this way:
> * remove *all* client.pem from all machines
> * remove the ssconf_master_candidates_certs file on *all* nodes
> * after that, all nodes should be able to communicate with each other
> again, but 'gnt-cluster verify' will complain about having not the
> correct ssl setup.

I have deleted all the client.pem and ssconf_master_candidates_certs
files from all the nodes, but I still get the same error for every node:
WARNING:root:Error contacting node node1: Error 35: error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
etc etc
CRITICAL:root:Cluster inconsistent, most of the nodes didn't answer
after multiple retries. Aborting startup
CRITICAL:root:Use the --no-voting option if you understand what effects
it has on the cluster state

> Thus as a last step, you should run 'gnt-cluster
> renew-crypto --new-node-certificates' again.
It also failed.

Thank you again


Gianfilippo

Oct 20, 2014, 5:10:25 AM
to gan...@googlegroups.com
By the way, I have noticed that the nodes don't have any client.pem
file, even after running gnt-cluster renew-crypto
--new-node-certificates.
When I run gnt-cluster renew-crypto --new-node-certificates, the
node-daemon.log on the client nodes shows the following errors:

2014-10-20 11:08:51,548: ganeti-noded pid=29376 ERROR Error while handling request from IPADDRESS:49404
Traceback (most recent call last):
  File "/usr/local/share/ganeti/2.11/ganeti/http/server.py", line 594, in _IncomingConnection
    self.request_executor(self, self.handler, connection, client_addr)
  File "/usr/local/share/ganeti/2.11/ganeti/server/noded.py", line 158, in __init__
    http.server.HttpServerRequestExecutor.__init__(self, *args, **kwargs)
  File "/usr/local/share/ganeti/2.11/ganeti/http/server.py", line 422, in __init__
    http.Handshake(sock, self.WRITE_TIMEOUT)
  File "/usr/local/share/ganeti/2.11/ganeti/http/__init__.py", line 539, in Handshake
    raise HttpError("Error in SSL handshake: %s" % err)
HttpError: Error in SSL handshake: ([('SSL routines', 'SSL3_GET_CLIENT_CERTIFICATE', 'no certificate returned')],)


candlerb

Oct 20, 2014, 5:02:27 PM
to gan...@googlegroups.com
Did you build ganeti yourself from source? Or use the package from the PPA <https://launchpad.net/~pkg-ganeti-devel/+archive/lts>? Or something else?

Helga Velroyen

Oct 21, 2014, 3:35:23 AM
to gan...@googlegroups.com
Hi!

I will try to find a machine where I can reproduce this problem, so stay tuned. Meanwhile, a few more questions:

- Is it correct that you ran the cluster with 2.10 before you linked python-pycurl with gnutls? My suspicion is that gnutls is somehow not happy with the certificate that was created with python-openssl. I will try to test this.
If you have the capacity, you could try the following: on a machine where you can re-initialize a Ganeti cluster using the python-pycurl built with gnutls, create a fresh cluster and copy its server.pem to all the machines of the cluster that is currently causing problems, then check whether it works. If it does, we would know that the swap from openssl to gnutls caused the problem.

Cheers,
Helga

Gianfilippo

Oct 21, 2014, 7:16:20 AM
to gan...@googlegroups.com
Hello candlerb,
I used Ganeti 2.11.6 built from source. Anyway, I managed to solve the
issue in the following way:

1) I used a version of python-pycurl recompiled against openssl
instead of the default one compiled against gnutls. I followed the
procedure at
http://theengguy.blogspot.it/2012/06/ganeti-not-working-in-ubuntu-1204.html
:

# aptitude install build-essential dpkg-dev
# apt-get source python-pycurl
# aptitude build-dep python-pycurl
# aptitude install libcurl4-openssl-dev
# cd pycurl-7.19.0
# perl -p -i -e "s/gnutls/openssl/" debian/control
# dpkg-buildpackage -rfakeroot -b
# cd ..
# dpkg -i python-pycurl_7.19.0-4ubuntu3_amd64.deb

2) I noticed that one node, which was offline during the first upgrade
attempt and was re-added after a reinstall with gnt-node add --readd,
connected successfully to the cluster. I compared the nodes and found
that the successfully connecting node was missing 5 files in the
/var/lib/ganeti dir. So I deleted them on all the nodes:

# rm client.pem
# rm ssconf_enabled_user_shutdown
# rm ssconf_node_vm_capable
# rm ssconf_master_candidates_certs
# rm ssconf_gluster_storage_dir

After deleting these files on all the nodes, the cluster started to work
again, and I could renew the cluster keys and run gnt-cluster redist-conf.
I then noticed that the files had been recreated on all the cluster nodes.
I don't understand why this helped, as the errors I was getting
were related to SSL certificates. After deleting these 5 files on all
the nodes, the nodes connected successfully to the master node without
giving any certificate error.
In any case, I didn't manage to make the cluster work using the default
Ubuntu 12.04 pycurl package linked against GnuTLS.
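
For anyone repeating step 2, a more cautious variant is to move the five files aside instead of deleting them, so they can be put back if the cluster does not recover. The sketch below is only an illustration: set_aside and the backup directory are hypothetical names, and the file list is simply the five files named in this message.

```python
import os
import shutil

# The five files named in this message.
STALE_FILES = (
    "client.pem",
    "ssconf_enabled_user_shutdown",
    "ssconf_node_vm_capable",
    "ssconf_master_candidates_certs",
    "ssconf_gluster_storage_dir",
)


def set_aside(data_dir, backup_dir, names=STALE_FILES):
    """Move the named files from data_dir into backup_dir; return the names moved."""
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    moved = []
    for name in names:
        source = os.path.join(data_dir, name)
        # Files that are already missing are skipped, not treated as errors.
        if os.path.isfile(source):
            shutil.move(source, os.path.join(backup_dir, name))
            moved.append(name)
    return moved
```

On a node this would be called as, for example, set_aside("/var/lib/ganeti", "/root/ganeti-crypto-backup"), where the second path is an arbitrary choice.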

Greetings
--
Gianfilippo Giannini
System administration for the masses
mail me: gianfilipp...@kelyon.it - call me: +39 081 19753290
stuck in the '90s? fax me! +39 081 19753299
"I love deadlines. I love the whooshing noise they make as they pass" -
Douglas Adams

Marco Casavecchia

Jan 23, 2015, 5:11:16 AM
to gan...@googlegroups.com
Hello, I recently tried to migrate from 2.10.3 to 2.11.6 using "gnt-cluster upgrade --to=2.11" and I got the same issue.

Only the master node seems to be able to generate the client.pem and the ssconf_master_candidates_certs.
On the rest of the nodes the files do not appear, and the ssconf_master_candidates_certs on the master is very short.

My cluster runs on Debian Wheezy 7.8, and what follows is part of the master log.

------------8<------8<--------

2015-01-22 18:15:06,075: ganeti-masterd pid=142375/ClientReq12 ERROR More than half of the nodes failed
2015-01-22 18:15:06,118: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,118: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,118: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,118: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR More than half of the nodes failed
2015-01-22 18:15:06,119: ganeti-masterd pid=142375/Jq20/Job1378115 INFO Op 1/1: opcode GROUP_VERIFY_DISKS(174e87dd-3641-4f87-9bdf-e36fdc8ba65e) waiting for locks
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in jobqueue_update on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,167: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,168: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,168: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR More than half of the nodes failed
2015-01-22 18:15:06,181: ganeti-masterd pid=142375/ClientReq16 INFO Received job poll request for 1378115
2015-01-22 18:15:06,210: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in lv_list on node 040ef521-2f97-44bb-bcf3-56d0de5f0194: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,210: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in lv_list on node 8373d049-424d-4906-8fe5-56e6c0f6ca8c: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,210: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in lv_list on node fa77d9e1-9760-4b92-85c0-182dbfe16e49: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,210: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error enumerating LVs on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,210: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error enumerating LVs on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,211: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error enumerating LVs on node ganeti04.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,253: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in drbd_needs_activation on node 040ef521-2f97-44bb-bcf3-56d0de5f0194: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,254: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error getting DRBD status on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,303: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in drbd_needs_activation on node 8373d049-424d-4906-8fe5-56e6c0f6ca8c: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,303: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error getting DRBD status on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,344: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS ERROR RPC error in drbd_needs_activation on node fa77d9e1-9760-4b92-85c0-182dbfe16e49: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,344: ganeti-masterd pid=142375/Jq20/Job1378115/G_VERIFY_DISKS WARNING Error getting DRBD status on node ganeti04.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,392: ganeti-masterd pid=142375/MainThread INFO Accepted connection from pid=142456, uid=0, gid=0
2015-01-22 18:15:06,395: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,395: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC error in jobqueue_update on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR RPC call jobqueue_update (Updating /var/lib/ganeti/queue/job-1378115) failed on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 ERROR More than half of the nodes failed
2015-01-22 18:15:06,396: ganeti-masterd pid=142375/Jq20/Job1378115 INFO Finished job 1378115, status = success
2015-01-22 18:15:06,490: ganeti-masterd pid=142375/ClientReq4 INFO Received job query request for 1378115
2015-01-22 18:15:06,493: ganeti-masterd pid=142375/ClientReq2 INFO Received job archive request for 1378115
2015-01-22 18:15:06,493: ganeti-masterd pid=142375/ClientReq2 INFO Archiving job 1378115
2015-01-22 18:15:06,536: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC error in jobqueue_rename on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti06.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti01.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti07.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti05.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti02.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR RPC call jobqueue_rename (Renaming files ([('/var/lib/ganeti/queue/job-1378115', '/var/lib/ganeti/queue/archive/137/job-1378115')])) failed on node ganeti03.or.lan: Error 35: gnutls_handshake() failed: A TLS fatal alert has been received.
2015-01-22 18:15:06,537: ganeti-masterd pid=142375/ClientReq2 ERROR More than half of the nodes failed
2015-01-22 18:15:18,718: ganeti-masterd pid=142375/MainThread INFO Received signal 15 asking for shutdown
2015-01-22 18:15:18,741: ganeti-masterd pid=142375/MainThread INFO Clean master daemon shutdown
2015-01-22 18:15:47,447: ganeti-masterd pid=147168/MainThread INFO ganeti-masterd daemon startup
2015-01-22 18:15:47,449: ganeti-masterd pid=147168/MainThread INFO Using PycURL libcurl/7.26.0 GnuTLS/2.12.20 zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
2015-01-22 18:15:47,522: ganeti-masterd pid=147168/MainThread INFO Inspecting job queue
2015-01-22 18:15:47,525: ganeti-masterd pid=147168/MainThread INFO Job queue inspection: 0/85 (1.2 %)
2015-01-22 18:15:47,545: ganeti-masterd pid=147168/MainThread ERROR Can't parse job 1378058, will archive.
Traceback (most recent call last):
  File "/usr/local/share/ganeti/2.10/ganeti/jqueue.py", line 2061, in _LoadJobUnlocked
    job = self._LoadJobFromDisk(job_id, False)
  File "/usr/local/share/ganeti/2.10/ganeti/jqueue.py", line 2124, in _LoadJobFromDisk
    raise errors.JobFileCorrupted(err)
JobFileCorrupted: Invalid data to LoadOpCode: OP_ID OP_CLUSTER_RENEW_CRYPTO unsupported
2015-01-22 18:15:47,666: ganeti-masterd pid=147168/MainThread ERROR Can't parse job 1378076, will archive.
Traceback (most recent call last):
  File "/usr/local/share/ganeti/2.10/ganeti/jqueue.py", line 2061, in _LoadJobUnlocked
    job = self._LoadJobFromDisk(job_id, False)
  File "/usr/local/share/ganeti/2.10/ganeti/jqueue.py", line 2124, in _LoadJobFromDisk
    raise errors.JobFileCorrupted(err)
JobFileCorrupted: Invalid data to LoadOpCode: OP_ID OP_CLUSTER_RENEW_CRYPTO unsupported

------------8<------8<--------
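
A log excerpt like the one above is easier to read once the failures are counted per node. The following is a small sketch for that, assuming the line layout shown in the excerpt ("... on node NAME: Error 35: ..."); count_handshake_failures is a made-up helper, not part of Ganeti.

```python
import collections
import re

# Matches the "on node NAME: Error 35:" part of the log lines shown above.
FAILURE = re.compile(r"\bon node (\S+): Error 35:")


def count_handshake_failures(lines):
    """Count 'Error 35' lines per node name (or node UUID) in masterd log lines."""
    counts = collections.Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of the master daemon log and printing counts.most_common() should show quickly whether every node fails or only some of them. Note that some lines in the excerpt identify nodes by UUID rather than by name, so the same node may be counted under two keys.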

Any suggestion?

Marco Casavecchia

Jan 23, 2015, 7:24:44 AM
to gan...@googlegroups.com
An update:
Since removing every client.pem and ssconf_master_candidates_certs from all the nodes seems to make the cluster work (even if with a few warning messages), I tried to see whether upgrading to 2.12.1 would fix the issue.

I was wrong... in fact, even after adding the missing configuration parameters, the cluster is still unusable (it can't perform the verify), so I had to revert to 2.10.3.
I guess that I need to find a way to fix the TLS issue in the 2.11 upgrade before trying again to upgrade to 2.12.

Helga Velroyen

Jan 23, 2015, 7:31:12 AM
to gan...@googlegroups.com
Hi!

After you removed all client.pem and ssconf_master_candidates_certs, one of the warnings probably is that you should run a renew-crypto. Until this warning is fixed, updating to 2.12 is likely to fail.

Which Ganeti packages did you use, the ones Debian provides with the distribution, or did you install from source?

Cheers,
Helga

Daniel K

Jan 23, 2015, 10:07:40 AM
to gan...@googlegroups.com
What does the following give you?
python -c 'import pycurl; print pycurl.version'

If you are using GnuTLS, the output looks like this:
PycURL/7.19.5.1 libcurl/7.26.0 GnuTLS/2.12.20 zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3

In that case I suggest the following workaround, since I've had the same issue with my Debian setup.

# apt-get install python-dev python-setuptools build-essential dpkg-dev
# apt-get source python-pycurl
# apt-get build-dep python-pycurl
# cd pycurl-7.19.0/

# perl -p -i -e "s/gnutls/openssl/" debian/control
# apt-get install libcurl4-openssl-dev
# dpkg-buildpackage -rfakeroot -b
# cd ../
# dpkg -i python-pycurl_7.19.0-5_amd64.deb

Verify using the python script from above:
python -c 'import pycurl; print pycurl.version'

Output should now be:
libcurl/7.26.0 OpenSSL/1.0.1e zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
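
If the check has to be repeated on many nodes, the version string can be classified by a few lines of Python instead of being read by eye. This is a sketch only: tls_backend is a made-up helper, and it assumes the backend shows up in pycurl.version as "GnuTLS/..." or "OpenSSL/...", as in the two outputs quoted in this message.

```python
import re


def tls_backend(version_string):
    """Return 'GnuTLS', 'OpenSSL', or None for a pycurl.version string."""
    match = re.search(r"\b(GnuTLS|OpenSSL)/\S+", version_string)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import pycurl
    except ImportError:
        print("pycurl is not installed")
    else:
        # Prints the backend this pycurl build is linked against.
        print(tls_backend(pycurl.version))
```

A node that still reports GnuTLS after the rebuild is most likely still using the old package.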


I based this procedure on https://bugs.launchpad.net/ubuntu/+source/pycurl/+bug/926548 comment #9.

You need to do this on all the nodes, and you need to restart the Ganeti nodes before restarting the master.
I also deleted the keys I had on the master and nodes and regenerated them using:

# gnt-cluster renew-crypto --new-cluster-certificate
# gnt-cluster renew-crypto --new-node-certificates

Hope it helps.

Daniel.

Marco Casavecchia M.

Jan 23, 2015, 11:01:42 AM
to gan...@googlegroups.com
Yeah, I saw it XD
I installed Ganeti from source.

Marco Casavecchia M.

Jan 23, 2015, 11:05:31 AM
to gan...@googlegroups.com
Wow, thanks a lot!
This is very interesting.

I will try it Monday morning and then I will let you know. :)

Marco Casavecchia

Jan 26, 2015, 6:01:17 AM
to gan...@googlegroups.com
Okay, it worked.
Now my cluster is running on 2.11.6 without problems.

Thank you very much Daniel.

Daniel K

Jan 26, 2015, 2:16:50 PM
to gan...@googlegroups.com
You're welcome.