Deficiency in urllib/socket for https?

Gary Feldman

Aug 21, 2003, 4:56:20 PM

I think I've found a deficiency in the design of urllib related to https.

In order to complete an https connection, it appears that URLopener and
hence FancyURLopener require the key and cert files. Or at least, it's not
clear from the description of socket.ssl what it does if they're omitted.

However, urlopen has no way to specify such things. Nor should it - for
typical uses, a person simply trying to retrieve data from an ssl site
really doesn't want to know or care about keys and certificate directories.
One just wants to provide an https url and have it work. Ideally, there
should be defaults for the certificate files.

This implies that somewhere in the function hierarchy, I suspect in
socket.ssl, there need to be some clever defaults. I don't know if the
folks maintaining the Python distribution really want to be in the business
of maintaining key and certificate directories (probably not), but there at
least ought to be a way to specify default directories (oh, no, another
environment variable?). Thinking idealistically, it would be great if it
could share the default certs on the system (e.g. on UNIX, find a Netscape
or Mozilla install directory and use those, and on MS Windows, do whatever
it takes to use the Windows mechanism).
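
A minimal sketch of what such defaults can look like, assuming the `ssl` module of later Python 3 releases (an API that postdates this thread and is not the `socket.ssl` under discussion). `ssl.get_default_verify_paths()` reports where the underlying OpenSSL build looks for CA certificates, including the names of the environment variables that override those locations:

```python
import ssl

# Ask the underlying OpenSSL build where it looks for trusted CA certificates.
paths = ssl.get_default_verify_paths()

# Names of the environment variables that override the built-in locations.
print("CA file override variable:", paths.openssl_cafile_env)
print("CA dir override variable: ", paths.openssl_capath_env)

# Resolved locations; either may be None if nothing exists at that path.
print("CA file in use:", paths.cafile)
print("CA dir in use: ", paths.capath)
```

The values printed depend entirely on how the local OpenSSL was built and configured, so no particular output should be expected.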

It's possible my analysis is flawed. I haven't taken the time to download
and read the _ssl code, just the socket.py code (and urllib and httplib).
So corrections are appreciated as much as comments.

Gary

John J. Lee

Aug 22, 2003, 10:47:59 AM

Gary Feldman <gafStop...@ziplink.stopallspam.net> writes:

> I think I've found a deficiency in the design of urllib related to https.
>
> In order to complete an https connection, it appears that URLopener and
> hence FancyURLopener require the key and cert files. Or at least, it's not
> clear from the description of socket.ssl what it does if they're omitted.

Nor from urllib -- see below. In fact, it seems that verification is
just skipped if they're not there.


> However, urlopen has no way to specify such things. Nor should it - for
> typical uses, a person simply trying to retrieve data from an ssl site
> really doesn't want to know or care about keys and certificate directories.
> One just wants to provide an https url and have it work. Ideally, there
> should be defaults for the certificate files.

Hmm, looking at both urllib and urllib2, I see urllib2 doesn't use any
key or certificate files at all. So, two points: this is a deficiency
in urllib2 that should be fixed, and, if you're not bothered about key
verification, I'd guess just not providing key / cert files will work.

Hmm, urllib documentation seems wrong here:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

The fact that https works in urllib2 (which does not provide key /
cert files) seems to demonstrate that they're *not* required, and that
verification is skipped if they're not supplied.

If you *are* bothered about verification, use the x509 arg to
FancyURLopener (which is documented, see above). The urlopen function
is just a convenience -- just cut-n-paste the trivial code from
urllib.py and adapt it to your needs if you need something more
complicated.


> This implies that somewhere in the function hierarchy, I suspect in
> socket.ssl, there need to be some clever defaults. I don't know if the
> folks maintaining the Python distribution really want to be in the business
> of maintaining key and certificate directories (probably not), but there at
> least ought to be a way to specify default directories (oh, no, another
> environment variable?). Thinking idealistically, it would be great if it
> could share the default certs on the system (e.g. on UNIX, find a Netscape
> or Mozilla install directory and use those, and on MS Windows, do whatever
> it takes to use the Windows mechanism).

That sounds great if you have the time to write the code. Nobody else
is likely to.


John

Gary Feldman

Aug 22, 2003, 1:47:30 PM

On 22 Aug 2003 15:47:59 +0100, j...@pobox.com (John J. Lee) wrote:

Thanks for your extensive reply. All I can say is that any environment
that silently does https interactions without verifying the certificate,
and without loudly warning the user, is a security catastrophe waiting to
happen. While I don't claim to be a web security expert, I've spent enough
time dealing with such issues to know how critical this is, and how
important it is to take responsibility for such issues at all times.
Even if it's just a clearly labelled warning in urlopen saying that it
ignores https certification errors, which by definition defeats a primary
purpose of https (it gets you encryption but no authentication).
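
The difference between encryption alone and encryption with authentication can be sketched as follows, assuming the `ssl` module of later Python 3 releases (an API that postdates this thread and is not the `socket.ssl` under discussion). The helper `describe` is a hypothetical name used only for illustration:

```python
import ssl

# Default client context: loads the system's trusted CA certificates and
# requires the server certificate to validate and to match the hostname.
verified = ssl.create_default_context()

# Explicitly unverified context: traffic is still encrypted, but the
# identity of the peer is never checked.
unverified = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
unverified.check_hostname = False       # must be disabled before CERT_NONE
unverified.verify_mode = ssl.CERT_NONE

def describe(ctx):
    """Summarise the verification settings of an SSLContext."""
    return {
        "verify_mode": ctx.verify_mode.name,
        "check_hostname": ctx.check_hostname,
    }

print(describe(verified))
print(describe(unverified))
```

Either context can be passed as `urllib.request.urlopen(url, context=...)`; only the first would reject a server whose certificate fails validation.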

>That sounds great if you have the time to write the code. Nobody else
>is likely to.

I have the time at the moment (unfortunately). I'm still working on the
Python expertise.

Gary

John J. Lee

Aug 22, 2003, 5:31:31 PM

Gary Feldman <gafStop...@ziplink.stopallspam.net> writes:

> On 22 Aug 2003 15:47:59 +0100, j...@pobox.com (John J. Lee) wrote:
>
> Thanks for your extensive reply. All I can say is that any environment
> that silently does https interactions without verifying the certificate,
> and without loudly warning the user, is a security catastrophe waiting to
> happen. While I don't claim to be a web security expert, I've spent enough

[...]


> Even if it's just a clearly labelled warning in urlopen saying that it
> ignores https certification errors, which by definition defeats a primary
> purpose of https (it gets you encryption but no authentication).

[...]

You're right -- with the caveat that it is useful to have https even
without authentication (essentially all https traffic on the internet
proves that ;-).

Would you mind submitting a doc patch (both urllib and urllib2 docs
appear to need fixing -- urllib2 to say that it never verifies, urllib
to say that it skips verification if an appropriate x509 mapping isn't
supplied)?


John

John J. Lee

Aug 22, 2003, 5:50:01 PM

j...@pobox.com (John J. Lee) writes:
[...]

> Would you mind submitting a doc patch (both urllib and urllib2 docs
> appear to need fixing -- urllib2 to say that it never verifies, urllib
> to say that it skips verification if an appropriate x509 mapping isn't
> supplied)?

Hmm, maybe I've got this wrong: the fact that key/cert args are passed
to httplib.HTTPS by urllib doesn't mean authentication happens, and
the fact that they're not passed by urllib2 doesn't mean
authentication doesn't happen. I'll investigate.


John

John J. Lee

Aug 22, 2003, 6:55:35 PM

j...@pobox.com (John J. Lee) writes:
[...]
> You're right -- with the caveat that it is useful to have https even
> without authentication (essentially all https traffic on the internet
> proves that ;-).
[...]

I should have said "...it is useful to have *support* for https...".

The utility of https itself is another matter...


John

John J. Lee

Aug 22, 2003, 8:04:24 PM

Bah! *After* reading the source, I found this in the ssl module docs:

| Warning: This does not do any certificate verification!

(which the _ssl.c source confirms: it uses SSL_VERIFY_NONE, and
never calls SSL_get_verify_result).

So the urllib docs are wrong:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

They're not needed, and they're never used for authentication (if you
don't count just checking the key without verifying it against the
certificate). Given this, the fact that urllib2 doesn't have
arguments for this starts to look like a feature, not a bug! Actually
(dredging up very hazy memories here) aren't you supposed to check a
revocation list, too? Is that given in a URL in the certificate? No
idea how this SSL stuff is supposed to work, really...

I'll upload a doc patch in a minute.

So, in summary, none of httplib, urllib and urllib2 in standard Python
do proper authentication (because the socket module doesn't). There
are third-party SSL libraries for Python: M2Crypto is one. If you
need it, and assuming M2Crypto has an ssl function with the same
interface that *does* do better auth, I suppose you could probably do:

    import socket
    from M2Crypto import ssl  # or whatever
    socket.ssl = ssl

And have urllib magically start working, with any luck.
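
That substitution only works if the library code looks up `socket.ssl` on the module each time it is called. A minimal sketch of the mechanism, using stand-ins throughout: `fake_socket`, `insecure_ssl`, `verifying_ssl` and `open_https` are hypothetical names, not part of the standard library or of any third-party package:

```python
import types

# Stand-in for the `socket` module of the era; NOT the real module.
fake_socket = types.ModuleType("fake_socket")

def insecure_ssl(sock, key_file=None, cert_file=None):
    """Mimics the behaviour described above: wraps, verifies nothing."""
    return ("insecure", sock)

def verifying_ssl(sock, key_file=None, cert_file=None):
    """Hypothetical drop-in with the same signature that would verify."""
    return ("verified", sock)

fake_socket.ssl = insecure_ssl

def open_https(sock):
    # Looks up `ssl` on the module at call time, as library code written
    # in the form `socket.ssl(...)` would.
    return fake_socket.ssl(sock)

before = open_https("conn")
fake_socket.ssl = verifying_ssl   # the monkey-patch
after = open_https("conn")

print(before)
print(after)
```

Code that had already bound the original function to its own name (for example with `from socket import ssl`) before the patch would keep calling the old one, so the trick depends on how each library performs the lookup.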


John
