SSL error with geni-lib, but same certificate/configuration works with omni


ff...@nyu.edu

Dec 28, 2016, 3:22:43 PM
to GENI Users
I attempted to follow the instructions here to create a context with geni-lib. I am running the following code in Python 2.7.12 with latest geni-lib on 0.9-DEV:

from geni.aggregate import FrameworkRegistry
from geni.aggregate.context import Context
from geni.aggregate.user import User
import pprint
import geni.aggregate.instageni as ig

def buildContext ():
  portal = FrameworkRegistry.get("portal")()
  portal.cert = "/home/ffund/.ssl/geni-ffund01.pem"
  portal.key = "/home/ffund/.ssl/geni-ffund01.pem"
  ffund = User()
  ffund.name = "ffund01"
  ffund.urn = "urn:publicid:IDN+ch.geni.net+user+ffund01"
  ffund.addKey("/home/ffund/.ssh/id_rsa.pub")
  context = Context()
  context.addUser(ffund)
  context.cf = portal
  context.project = "witestlab"
  return context

context = buildContext()
pprint.pprint(ig.GPO.getversion(context))

but I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "geni/aggregate/core.py", line 157, in getversion
    return self.api.getversion(context, self.url)
  File "geni/aggregate/apis.py", line 129, in getversion
    res = AM2.getversion(url, False, context.cf.cert, context.cf.key)
  File "geni/minigcf/amapi2.py", line 39, in getversion
    timeout = config.HTTP.TIMEOUT, allow_redirects = config.HTTP.ALLOW_REDIRECTS)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.9.1-py2.7.egg/requests/sessions.py", line 511, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.9.1-py2.7.egg/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.9.1-py2.7.egg/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.9.1-py2.7.egg/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert unknown ca')],)",)


I think my certificate is OK though, because it works fine with omni. The relevant entries in my omni config file are the same as in the context I was trying to set up:

[omni]                                                                          
default_cf = portal_chapi                                                       
users = ffund01                                                                 
useslicemembers = True                                                          
default_project = witestlab                                                                                                 
[portal]                                                                        
type = pgch                                                                     
authority=ch.geni.net                                                           
ch = https://ch.geni.net/PGCH                                                   
sa = https://ch.geni.net/PGCH                                                   
cert = ~/.ssl/geni-ffund01.pem                                                  
key = ~/.ssl/geni-ffund01.pem                                                   
[ffund01]                                                                       
urn = urn:publicid:IDN+ch.geni.net+user+ffund01                                 
keys = ~/.ssh/id_rsa.pub                                                        

and I don't get any SSL errors with omni. Any ideas?
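
Since both the geni-lib context and the omni config point `cert` and `key` at the same PEM file, one quick sanity check is to confirm that the file really holds both a certificate and a private key. The following is a minimal sketch in plain Python, assuming a standard PEM layout; `pem_block_types` and `has_cert_and_key` are hypothetical helpers, not part of geni-lib or omni.

```python
import re

# Matches PEM armor lines such as "-----BEGIN CERTIFICATE-----".
_PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(pem_text):
    """Return the block labels found in PEM text, in order of appearance."""
    return _PEM_BEGIN.findall(pem_text)


def has_cert_and_key(pem_text):
    """True if the text holds at least one certificate and one private key."""
    labels = pem_block_types(pem_text)
    has_cert = any(label == "CERTIFICATE" for label in labels)
    # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "ENCRYPTED PRIVATE KEY", etc.
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key
```

Reading the file with `open(path).read()` and passing the text to `has_cert_and_key` only rules out a malformed file. It says nothing about why the handshake itself fails.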

Nicholas Bastin

Dec 28, 2016, 3:27:05 PM
to geni-...@googlegroups.com
On Wed, Dec 28, 2016 at 3:22 PM, <ff...@nyu.edu> wrote:
> I attempted to follow the instructions here to create a context with geni-lib. I am running the following code in Python 2.7.12 with latest geni-lib on 0.9-DEV:

I'm having the same problem on one of my VMs, but not others.  It appears to be a problem with one of the following things, although I haven't been able to nail it down:

OpenSSL
Python Cryptography
pyOpenSSL

A fresh install of Ubuntu 16.04 works fine, although an upgrade to 16.04 from 14.04.5 does not.

I tried installing OpenSSL 1.0.2g by hand, but it didn't fix it (although this is complicated enough that I can't be sure it's actually using it everywhere it needs to).

So, this is the long way around me saying I am aware of the problem in some environments, but I'm not sure at the moment how to fix it.  If you can send me relevant system information for your environment, it might help nail down the problem (OpenSSL version, etc.)
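
One way to gather that kind of system information from inside Python is sketched below. It uses only the standard library; `collect_versions` is a hypothetical helper, and the module names probed (`OpenSSL` for pyOpenSSL, `cryptography`, `requests`) are an assumption based on the packages named in this thread.

```python
import importlib
import ssl
import sys


def collect_versions(modules=("OpenSSL", "cryptography", "requests")):
    """Gather interpreter, OpenSSL, and module versions into a dict."""
    info = {
        "python": sys.version.split()[0],
        # The OpenSSL build that Python's own ssl module is linked against.
        "ssl.OPENSSL_VERSION": ssl.OPENSSL_VERSION,
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # broken installs can raise more than ImportError
            info[name] = "import failed: %s" % type(exc).__name__
            continue
        info[name] = getattr(module, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_versions().items()):
        print("%s: %s" % (key, value))
```

Note that `ssl.OPENSSL_VERSION` describes the library used by the standard `ssl` module, which is not necessarily the same build that pyOpenSSL or cryptography ends up using.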

--
Nick

Nicholas Bastin

Dec 28, 2016, 9:54:38 PM
to geni-...@googlegroups.com
So, it turns out the current instructions for Ubuntu 14.04 no longer work, due to a confluence of geni-lib changes, some Ubuntu changes, and a probable change at the GPO clearinghouse.  I've been working on the problem and have made progress in addressing the issue (both with new docs and geni-lib modifications), but they are not quite ready yet (hopefully in the next day or two).

I have also tracked down the source of the unknown CA problem on Ubuntu 14.04 and addressed it, although I still have to do some testing for newer versions.  I am not sure how to fix systems which are now in this state, as Python setuptools has created some very bad environments that are hard to debug.

It seems likely that this problem was caused by a change in the software or configuration of the GPO clearinghouse (sometime after Thanksgiving, not sure when exactly), which changed the combination of supported TLS version, key exchange ciphers, and signature hash algorithms.  Unfortunately, I don't have any record of what they used to be, so there's probably no looking back at this point.

I will send out an update when geni-lib has been updated, although if you are desperate for a fix you can work off of:

bitbucket.org/nbastin/geni-lib branch 20161221-ch2

This may involve uninstalling some packages that were previously installed via pip, and reinstalling them via apt:

python-ipaddr python-requests python-lxml python-pip

--
Nick

ff...@nyu.edu

Dec 28, 2016, 11:32:02 PM
to GENI Users
Thanks, Nick.

I was on a 16.04 via upgrade, not a clean install, so I guess I can confirm that's one of the "bad" environments.

I tried it out in a clean virtualenv and it seems to work (so far). 

# Notes to self: running geni-lib in clean virtualenv
# Create a virtualenv for geni-lib 
mkdir geni-lib-virtualenv
cd geni-lib-virtualenv/
virtualenv geni

# Get geni-lib
cd geni-lib
hg update -C 0.9-DEV
cd ..

# Install libraries in the virtualenv
source geni/bin/activate
pip install lxml ipaddr requests
cd geni-lib
python setup.py install
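
To double-check that a script is really running from the virtualenv rather than the system interpreter, a small best-effort test can help. This is a sketch using only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtualenv or venv."""
    # Classic virtualenv sets sys.real_prefix; the stdlib venv module (and
    # newer virtualenv releases) make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("prefix: %s" % sys.prefix)
    print("in virtualenv: %s" % in_virtualenv())
```

When run after `source geni/bin/activate`, `sys.prefix` should point inside the `geni` directory created above.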

Tom Mitchell

Jan 6, 2017, 8:31:42 AM
to GENI Users


On Wednesday, December 28, 2016 at 9:54:38 PM UTC-5, nick.bastin wrote:
> So, it turns out the current instructions for Ubuntu 14.04 no longer work, due to a confluence of geni-lib changes, some Ubuntu changes, and a probable change at the GPO clearinghouse.
>
> It seems likely that this problem was caused by a change in the software or configuration of the GPO clearinghouse (sometime after Thanksgiving, not sure when exactly), which changed the combination of supported TLS version, key exchange ciphers, and signature hash algorithms.  Unfortunately, I don't have any record of what they used to be, so there's probably no looking back at this point.

Hi Nick,

I'm not aware of any recent changes to the software or configuration of the GPO clearinghouse. The last release/update of the clearinghouse software was on September 20, 2016. No configuration changes or OS updates have occurred since Thanksgiving.

Can you say anything more about what you think might have changed at the GPO clearinghouse that caused an issue for geni-lib? Any additional information you can provide would help me to investigate.

Thanks,
Tom

Nicholas Bastin

Jan 7, 2017, 3:20:45 AM
to geni-...@googlegroups.com
On Fri, Jan 6, 2017 at 8:31 AM, Tom Mitchell <tmit...@bbn.com> wrote:
> Can you say anything more about what you think might have changed at the GPO clearinghouse that caused an issue for geni-lib? Any additional information you can provide would help me to investigate.

The only thing I can positively identify is a possible change in the advertised acceptable client cert issuers (or accepted cert types).  The current list is this:

/C=US/ST=Utah/L=Salt Lake City/O=Utah Network Testbed/OU=Certificate Authority/CN=boss.emulab.net/emailAddress=testb...@flux.utah.edu
Client Certificate Types: RSA sign, DSA sign, ECDSA sign

There's nothing actually *wrong* with this list (I think); I just have a strong hunch that it is somewhat different from what it used to be.  My reasoning is that the problem we experience is actually on the client side - the underlying socket provider pre-validates whether the cert we're about to send *would be* accepted by the server (using the x509v3 AKID extension), and it (seemingly incorrectly) decides that it won't work, so it never sends it and bails out (that is the source of the error from the original email).  I don't think Omni avails itself of any of the certificate extensions, so it just goes ahead, and of course the client cert is actually fine so it works.  The first system I found this problem on started experiencing it without (probably) any client-side changes (high confidence this is true, but I sort of doubt everything at this point), and so I think it's possible that a "meaningless" change on the server side might have tripped a pre-existing bug on the client side.  On the other hand, if the server doesn't have any kind of auto-updating enabled for OS or security updates, then who knows.
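
For anyone who wants to compare what a server advertises over time, the two lines quoted above resemble the section that `openssl s_client -connect <host>:<port>` prints during a handshake. The sketch below pulls that section out of saved output. It assumes the OpenSSL 1.0.x text layout; `parse_client_ca_section` is a hypothetical helper, and the DN in `SAMPLE_OUTPUT` is made up for illustration.

```python
import re

_HEADER = "Acceptable client certificate CA names"
# Lines such as "Requested Signature Algorithms: ..." end the CA list.
_FIELD = re.compile(r"^[A-Za-z ]+: ")

# Hypothetical sample; the DN is invented, not a real GENI issuer.
SAMPLE_OUTPUT = "\n".join([
    "---",
    "Acceptable client certificate CA names",
    "/C=US/O=Example Testbed/CN=ca.example.org",
    "Client Certificate Types: RSA sign, DSA sign, ECDSA sign",
    "---",
])


def parse_client_ca_section(s_client_output):
    """Extract advertised CA names and client cert types from s_client text."""
    ca_names, cert_types = [], []
    in_ca_list = False
    for raw in s_client_output.splitlines():
        line = raw.strip()
        if line.startswith("Client Certificate Types:"):
            in_ca_list = False
            cert_types = [t.strip() for t in line.split(":", 1)[1].split(",")]
        elif line == _HEADER:
            in_ca_list = True
        elif in_ca_list:
            if not line or line.startswith("---") or _FIELD.match(line):
                in_ca_list = False
            else:
                ca_names.append(line)
    return ca_names, cert_types
```

Saving the parsed result from a working and a failing machine (or from the same machine on different dates) would give the record of "what it used to be" that is otherwise hard to reconstruct.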

I have made some changes to the geni-lib environment that eliminate the problem for fresh installs, but I still don't profess to fully understand it - we have systems that appear to be identical (based on every comparison I have been able to think of so far), but which exhibit different behaviours (some work, some don't).  This unfortunately isn't that surprising - it's actually pretty hard to figure out how things installed via pip were actually installed, and you can't guarantee that equivalent module versions are actually identical for a couple of reasons:

* Some modules have optional C extensions, and pip can't tell you after the fact whether it was built to use it, or whether an installation fell back onto a pure python version (and thus still resulted in a successful installation)

* Modules that use CFFI (like our crypto extension) can have lookup behaviours that are dictated by the underlying library version when they were installed, but don't know that the library version changed.  If a function call location changes this will either cause a crash or will properly find the new entry point, which is what you want, but if the default initialization of say a data structure is different, there's no way to detect this.
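
A rough first step when comparing two machines is to see where each module is actually loaded from. The sketch below guesses the install method from the file path, assuming the usual Debian/Ubuntu layout (apt under `/usr/lib/python*`, site-wide pip under `/usr/local/lib`); `classify_origin` and `module_origin` are hypothetical helpers. It cannot answer the harder questions in the bullets above, such as whether a C extension was built or which library version a CFFI module saw at install time.

```python
import importlib
import os


def classify_origin(path):
    """Guess how a module was installed from its file path (Debian/Ubuntu layout)."""
    normalized = path.replace(os.sep, "/")
    if ".egg/" in normalized:
        return "setuptools egg (easy_install or setup.py install)"
    if normalized.startswith("/usr/local/lib/"):
        return "site-wide pip or setup.py install"
    if normalized.startswith("/usr/lib/python"):
        return "distribution package (apt)"
    return "other (virtualenv, user site, or source checkout)"


def module_origin(name):
    """Import a module and return (file path, guessed install method)."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    return path, classify_origin(path)
```

For example, the `requests` path in the traceback from the original post (`/usr/local/lib/python2.7/dist-packages/requests-2.9.1-py2.7.egg/...`) falls into the setuptools egg category.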

I've fallen back to removing almost all of our reliance on pip, as reconciling the environments is not something we want to try to tackle.  I have a new set of docs for installing geni-lib on Ubuntu 14.04 and 16.04 that will be going up soon, although existing installations still risk falling victim to this problem, even if they haven't already.

I'm not sure this is worth investing a lot of new time into - I think the moral of the story here is that we need to rely on consistent distribution packages to maintain a knowable environment, rather than using pip to make sure we have latest versions.  There is probably a future where this will cause us problems with vulnerabilities discovered in the OS packages that are not up to date, but at the moment I will err on the side of solving the problem we *do* have versus the one we might have.

--
Nick

Tom Mitchell

Jan 9, 2017, 11:14:44 AM
to GENI Users


On Saturday, January 7, 2017 at 3:20:45 AM UTC-5, nick.bastin wrote:
> On Fri, Jan 6, 2017 at 8:31 AM, Tom Mitchell <tmit...@bbn.com> wrote:
>> Can you say anything more about what you think might have changed at the GPO clearinghouse that caused an issue for geni-lib? Any additional information you can provide would help me to investigate.
>
> The only thing I can positively identify is a possible change in the advertised acceptable client cert issuers (or accepted cert types).  The current list is this:
>
> /C=US/ST=Utah/L=Salt Lake City/O=Utah Network Testbed/OU=Certificate Authority/CN=boss.emulab.net/emailAddress=testbed-ops@flux.utah.edu
> Client Certificate Types: RSA sign, DSA sign, ECDSA sign


The set of acceptable client cert issuers hasn't changed since May, 2015, so that's not it. I can't speak to the accepted cert types though. I don't know where that comes from.

Thanks for the clear explanation below about pre-validation on the client side. That does explain why omni still works while geni-lib is having trouble. I'm glad you were able to get things working again at least in some circumstances. If you'd like me to check anything on the clearinghouse side going forward, just let me know.

Thanks,
Tom


 

Nicholas Bastin

Jan 9, 2017, 12:28:50 PM
to geni-...@googlegroups.com
On Mon, Jan 9, 2017 at 11:14 AM, Tom Mitchell <tmit...@bbn.com> wrote:
> The set of acceptable client cert issuers hasn't changed since May, 2015, so that's not it. I can't speak to the accepted cert types though. I don't know where that comes from.
>
> Thanks for the clear explanation below about pre-validation on the client side. That does explain why omni still works while geni-lib is having trouble. I'm glad you were able to get things working again at least in some circumstances. If you'd like me to check anything on the clearinghouse side going forward, just let me know.

It turns out that maybe the IG AM is having the same problem:


Specifically the line:

LWP::Protocol::https::Socket: SSL connect attempt failed because of handshake problemserror:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at /usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.

This further leads me to believe something has changed on the CH side, although I still don't know what it would be.  If spewlogs are retained for significant time periods, someone with access to them might be able to more precisely identify the date at which this problem started occurring.

--
Nick

Victor J. Orlikowski

Jan 9, 2017, 12:46:11 PM
to Nicholas Bastin, geni-...@googlegroups.com
On Mon, Jan 09, 2017, at 12:28 PM, Nicholas Bastin wrote:
> It turns out that maybe the IG AM is having the same problem:
>

What's the root cert?
And - is said root cert SHA-1 signed?

Reason I ask:
A lot of the browser folks shut down SHA-1 signed end-entity certs
as of 1/1/2017. Root certs, and intermediates, *ought* to be exempt
(esp. since the foundation of trust in the root certs has a
different basis) - but I wonder if some of the libraries out there
have decided to handle that differently...

Victor
--
Victor J. Orlikowski <> vjo@(ee.|cs.)?duke.edu

Victor J. Orlikowski

Jan 9, 2017, 12:48:55 PM
to Nicholas Bastin, geni-...@googlegroups.com
On Mon, Jan 09, 2017, at 12:46 PM, Victor J. Orlikowski wrote:
> What's the root cert?
> And - is said root cert SHA-1 signed?
>

Answering my own question - ch.geni.net roots at "GeoTrust Global
CA" - which is SHA-1 signed.

It may be that someone has (incorrectly) decided that the SHA-1
signed root cert is reason enough to invalidate the chain.

Nicholas Bastin

Jan 9, 2017, 12:51:44 PM
to Victor J. Orlikowski, geni-...@googlegroups.com
On Mon, Jan 9, 2017 at 12:48 PM, Victor J. Orlikowski <v...@duke.edu> wrote:
> Answering my own question - ch.geni.net roots at "GeoTrust Global
> CA" - which is SHA-1 signed.
>
> It may be that someone has (incorrectly) decided that the SHA-1
> signed root cert is reason enough to invalidate the chain.

I thought about this, but at least checking my own environment my local openssl has no problem with that configuration (also the root cert in question here is actually the root for the *client* cert, not the root for the TLS connection envelope).

Of course, I also have systems that appear identical that work differently...so who knows!  :-)

--
Nick 

Victor J. Orlikowski

Jan 9, 2017, 12:52:53 PM
to Nicholas Bastin, geni-...@googlegroups.com
On Mon, Jan 09, 2017, at 12:48 PM, Victor J. Orlikowski wrote:
> It may be that someone has (incorrectly) decided that the SHA-1
> signed root cert is reason enough to invalidate the chain.
>

Another matter of curiosity...

I realize that you're removing much of geni-lib's reliance on pip
(based on your earlier message) - but, does geni-lib pull in certifi
(to have the latest root certs available)?

Nicholas Bastin

Jan 9, 2017, 1:00:14 PM
to Victor J. Orlikowski, geni-...@googlegroups.com
On Mon, Jan 9, 2017 at 12:52 PM, Victor J. Orlikowski <v...@duke.edu> wrote:
> I realize that you're removing much of geni-lib's reliance on pip
> (based on your earlier message) - but, does geni-lib pull in certifi
> (to have the latest root certs available)?

No, but we also don't care what your roots are (you can optionally enable root checking, but you need to have the GENI roots, as they're not in the same trust constellation as public web sites, and you'd also need to install many of the server certs as their own root).
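
In terms of the `requests` library that appears in the original traceback, the client certificate and server root checking are independent settings: `cert=` supplies the client cert and key, while `verify=` is either `False` or a path to a CA bundle. The sketch below only illustrates that split; `tls_request_kwargs` is a hypothetical helper and is not how geni-lib itself builds its calls.

```python
def tls_request_kwargs(cert_path, key_path, root_bundle=None):
    """Build TLS keyword arguments for requests.post() or Session.post()."""
    return {
        # Client certificate and key; both may point at the same PEM file.
        "cert": (cert_path, key_path),
        # False skips server root checking; a path checks against that bundle.
        "verify": root_bundle if root_bundle else False,
    }
```

A call would then look like `requests.post(url, data=payload, **tls_request_kwargs(cert, key))`. Passing a bundle as `root_bundle` only helps if that bundle contains the GENI roots, as described above.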

--
Nick

Tom Mitchell

Jan 9, 2017, 1:05:20 PM
to GENI Users


On Monday, January 9, 2017 at 12:28:50 PM UTC-5, nick.bastin wrote:
> On Mon, Jan 9, 2017 at 11:14 AM, Tom Mitchell <tmit...@bbn.com> wrote:
>> The set of acceptable client cert issuers hasn't changed since May, 2015, so that's not it. I can't speak to the accepted cert types though. I don't know where that comes from.
>>
>> Thanks for the clear explanation below about pre-validation on the client side. That does explain why omni still works while geni-lib is having trouble. I'm glad you were able to get things working again at least in some circumstances. If you'd like me to check anything on the clearinghouse side going forward, just let me know.
>
> It turns out that maybe the IG AM is having the same problem:
>
> Specifically the line:
>
> LWP::Protocol::https::Socket: SSL connect attempt failed because of handshake problemserror:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at /usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.


I suspect this is unrelated. I don't know what client certificate might be used by the InstaGENI AM to invoke methods at the GPO Clearinghouse. The call being made here, based on the error "skipping registration" appears to be a ProtoGENI-specific API call. The GPO Clearinghouse does not implement the actual call that this code will eventually invoke, "RegisterSliver". I looked at GeniCM.pm.in and GeniRegistry.pm.in in the emulab-devel code (although my copy is a bit out of date).

I'll touch base with the InstaGENI team to see if they can figure out what client certificate is being used in this particular error. I have no reason to believe the InstaGENI AM in this case is using a certificate that should be accepted by the GPO Clearinghouse.

Tom


 

Hussamuddin Nasir

Jan 9, 2017, 1:07:11 PM
to geni-...@googlegroups.com

I agree with Tom here. That is in fact what the error means. I remember asking Leigh about this back in 2014 ...

It's not related to the geni-lib issue.

cheers,

Hussam
(Hussamuddin Nasir)

Netlab Operations Team 

-------------------------------------------------------------------
Laboratory for Adv. Networking  Phone  : (859)218-0059
James F Hardymon Building       Fax    : (859)323-3740
301 Rose Street, Rm 237         E-mail : na...@netlab.uky.edu
Lexington, KY 40506-0495        Web    : http://www.netlab.uky.edu

                        University of Kentucky
                        **********************
------------------------------------------------------------------- 
--
GENI Users is a community supported mailing list, so please help by responding to questions you know the answer to.
 
If this is your first time posting a question to this list, please review http://groups.geni.net/geni/wiki/GENIExperimenter/CommunityMailingList

Nicholas Bastin

Jan 9, 2017, 1:08:10 PM
to geni-...@googlegroups.com
On Mon, Jan 9, 2017 at 1:05 PM, Tom Mitchell <tmit...@bbn.com> wrote:
> I suspect this is unrelated. I don't know what client certificate might be used by the InstaGENI AM to invoke methods at the GPO Clearinghouse.
 
Ah yeah that's a good point...  I had not noticed it trying to connect to ch.geni.net in the past (although in this case the error makes sense assuming it doesn't have a client cert it can use).

--
Nick

Victor J. Orlikowski

Jan 9, 2017, 1:11:43 PM
to Nicholas Bastin, geni-...@googlegroups.com
On Mon, Jan 09, 2017, at 01:00 PM, Nicholas Bastin wrote:
> No, but we also don't care what your roots are (you can optionally enable
> root checking, but you need to have the GENI roots, as they're not in the
> same trust constellation as public web sites, and you'd also need to
> install many of the server certs as their own root).
>

I figured, and that's fair.

I was trying to determine if there's any *external* reason for cert
validation to *suddenly* stop working.
These were the things that came to mind.

Sometimes, I exist merely to inject "crazy" avenues to consider -
or, so I'm told. ;)

Nicholas Bastin

Jan 11, 2017, 12:29:57 AM
to Victor J. Orlikowski, geni-...@googlegroups.com
Just to loop back on this one - I've pushed new changes to geni-lib that create fresh environments for Ubuntu 14.04 and 16.04 that work.  There are new installation instructions at:


There are more changes coming, but enough people were starting to report having this problem that I needed to get a temporary fix deployed.  I doubt that these instructions will *fix* broken installations (it depends on what the underlying problem actually is), but they should create installations which will not become broken.

--
Nick

Tom Mitchell

Jan 11, 2017, 10:10:01 AM
to GENI Users, v...@duke.edu
I was able to confirm with the InstaGENI team that the error in the spewlog is unrelated to whatever the geni-lib problem is. The InstaGENI aggregate manager is making a call to the GPO clearinghouse using a client certificate that is not in the trusted bundle at the GPO clearinghouse. Thus the failure of that particular call is expected and appropriate.