Web authentication

luigipaioro

Dec 4, 2003, 9:23:41 AM

to pytho...@python.org

Good morning to all!

I'm trying to access a web page that requires user name and password
authentication. I am allowed to access it (I mean that I have a
user name and a password for access via the web), but I cannot access
it using an automatic procedure (which is what I need in order to
write a daemon that downloads an ASCII file from that site every week).

I've tried using urllib:

import urllib

conn = urllib.urlopen("http://user:pass...@www.mysite.com")
print conn.read()

But it doesn't work (it asks me for the user name and password again).

Does anybody know how I can access my site with authentication?

I think that urllib2 can help me but I don't understand how!!!

Thanks

Luigi

luigipaioro

Dec 4, 2003, 10:20:08 AM

to pytho...@python.org

John J. Lee

Dec 4, 2003, 2:39:53 PM

"luigipaioro" <luigi...@libero.it> writes:

> Good morning to all!

Good morning! No need to post twice, BTW.

> I'm trying to access a web page that requires user name and password
> authentication. I am allowed to access it (I mean that I have a
> user name and a password for access via the web), but I cannot access
> it using an automatic procedure (which is what I need in order to
> write a daemon that downloads an ASCII file from that site every week).
>
> I've tried using urllib:
>
> import urllib
>
> conn = urllib.urlopen("http://user:pass...@www.mysite.com")
> print conn.read()
>
> But it doesn't work (it asks me for the user name and password again).

That URL should work for "Basic HTTP authentication" using urllib (I
think -- I always use urllib2, so not certain about urllib). For some
reason, a quick glance at the code suggests it *won't* work with
urllib2, but it's easy enough to achieve the same result with that
module (see the link below for how). The page you're accessing may
need some other means of authentication, though.

When you log in manually, does your browser pop up a little, rather
plain-looking, separate window? Or do you type directly into a form
on the web page itself? If the former, it's probably Basic auth., and
what you're doing should work (or, unlikely, Digest auth., in which
case I think you need urllib2). If the latter, you probably need to
submit an HTML form in the web page to log in.

Some examples on auth and proxies with urllib2 (beware: I don't use a
proxy or basic / digest auth. very often, so these are untested
examples: if you use them, *please* comment on them to say whether
they do or do not work as advertised):

http://www.python.org/sf/798244
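
For reference, a minimal sketch of Basic auth with the handler classes. It assumes Python 3, where urllib2's machinery lives in urllib.request. The function name build_basic_auth_opener, the example.com URL and the credentials are placeholders, not anything taken from this thread. Passing None as the realm (through HTTPPasswordMgrWithDefaultRealm) avoids having to know the server's realm string in advance.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that answers Basic auth challenges below top_level_url."""
    # With the default-realm manager, realm=None means "use these
    # credentials whatever realm name the server announces".
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder URL and credentials; nothing is sent until open() is called.
    opener = build_basic_auth_opener("http://www.example.com/", "user", "password")
    # data = opener.open("http://www.example.com/data.txt").read()
```

The credentials are only sent after the server answers 401 with a Basic challenge, so a page that logs in through an HTML form will not be helped by this.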

To fill in HTML forms, you can use urllib2.urlopen(url,
urllib.urlencode(...read the docs <wink>...)), or, if you want Python
to parse the form(s) for you and/or don't want to know the messy
details of HTML forms, you could use

http://wwwsearch.sf.net/ClientForm/

You may also find you need to handle HTTP cookies:

http://wwwsearch.sf.net/ClientCookie/
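
A rough sketch of the form-plus-cookies route using only the standard library, assuming Python 3 (urllib.request, urllib.parse and http.cookiejar). The login URL and the field names "username" and "password" are hypothetical; the real ones have to be read from the page's HTML form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session_opener():
    """Return (opener, jar); cookies set by responses are stored in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(form_url, fields):
    """Encode form fields as an application/x-www-form-urlencoded POST."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(form_url, data=data)


if __name__ == "__main__":
    opener, jar = build_session_opener()
    request = build_login_request(
        "http://www.example.com/login",              # hypothetical form action
        {"username": "user", "password": "secret"},  # hypothetical field names
    )
    # response = opener.open(request)   # log in; jar keeps any session cookie
    # page = opener.open("http://www.example.com/data.txt").read()
```

Reusing the same opener for the later download is what carries the session cookie along.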

John

Paul Rubin

Dec 4, 2003, 2:47:11 PM

"luigipaioro" <luigi...@libero.it> writes:
> Does anybody know how I can access my site with authentication?
>
> I think that urllib2 can help me but I don't understand how!!!

It's documented in the manual. Try something like (untested):

import urllib

class Open_with_auth(urllib.FancyURLopener):
    def prompt_user_passwd(self, host, realm):
        # the uid and passwd you want to use
        return ('username', 'userpassword')

urllib._urlopener = Open_with_auth()

John J. Lee

Dec 4, 2003, 6:45:23 PM

Doesn't/shouldn't http://user:pas...@example.com/blah.html work?

I don't know where that syntax is specified (if anywhere) -- do you
know, Paul? It seems at a glance that urllib understands that syntax
for ordinary Basic Auth., where urllib2 only knows it as a syntax for
proxy Basic Auth., but I may be wrong there...

John

Paul Rubin

Dec 4, 2003, 8:23:40 PM

j...@pobox.com (John J. Lee) writes:
> Doesn't/shouldn't http://user:pas...@example.com/blah.html work?

It ought to but I don't know if urllib supports it. I've always done
it the other way, with that FancyURLopener subclass.

> I don't know where that syntax is specified (if anywhere) -- do you
> know, Paul?

Part of the http spec.

Alan Kennedy

Dec 5, 2003, 5:28:31 AM

[John J. Lee]

> Doesn't/shouldn't http://user:pas...@example.com/blah.html work?
>
> I don't know where that syntax is specified (if anywhere)

RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax

Section: 3.2.2. Server-based Naming Authority

Quoting from that section

"""
URL schemes that involve the direct use of an IP-based protocol to a
specified server on the Internet use a common syntax for the server
component of the URI's scheme-specific data:

   <userinfo>@<host>:<port>

where <userinfo> may consist of a user name and, optionally,
scheme-specific information about how to gain authorization to access
the server. The parts "<userinfo>@" and ":<port>" may be omitted.

   server = [ [ userinfo "@" ] hostport ]

The user information, if present, is followed by a commercial at-sign
"@".

   userinfo = *( unreserved | escaped |
                 ";" | ":" | "&" | "=" | "+" | "$" | "," )

Some URL schemes use the format "user:password" in the userinfo
field. This practice is NOT RECOMMENDED, because the passing of
authentication information in clear text (such as URI) has proven to
be a security risk in almost every case where it has been used.
"""
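
To see how that grammar maps onto a concrete URL, here is a small sketch that splits out the userinfo part with the standard library. It assumes Python 3's urllib.parse; the helper name split_userinfo and the example URL are illustrative only.

```python
from urllib.parse import urlsplit


def split_userinfo(url):
    """Return (username, password, hostname, port) parsed from url."""
    parts = urlsplit(url)
    # Each attribute is None when that component is absent from the URL.
    return parts.username, parts.password, parts.hostname, parts.port


if __name__ == "__main__":
    print(split_userinfo("http://user:secret@www.example.com:8080/blah.html"))
```

Parsing the credentials out of the URL does not by itself send them; whether they reach the server depends on the client library.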

regards,

--
alan kennedy
------------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/contact/alan

Luigi

Dec 5, 2003, 9:59:54 AM

Thank you...

I've tried to follow your suggestions, but I still cannot solve my
problem!

These are my attempts:

def open_site1():

    import urllib2

    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password('My realm', 'http://mysite.com/',
                              'user', 'password')
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    site = urllib2.urlopen('http://mysite.com/')
    print site.read()

def open_site2():

    import urllib

    class Open_with_auth(urllib.FancyURLopener):
        def prompt_user_passwd(self, host, realm):
            return ('user', 'password')

    urllib._urlopener = Open_with_auth()
    conn = urllib.urlopen('http://mysite.com')
    print conn.read()

Neither works!
Where is my mistake??

Luigi

John J. Lee

Dec 5, 2003, 11:50:43 AM

Alan Kennedy <ala...@hotmail.com> writes:

> [John J. Lee]
> > Doesn't/shouldn't http://user:pas...@example.com/blah.html work?
> >
> > I don't know where that syntax is specified (if anywhere)
>
> RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax
>
> Section: 3.2.2. Server-based Naming Authority
>
> Quoting from that section
>
> """
> URL schemes that involve the direct use of an IP-based protocol to a
> specified server on the Internet use a common syntax for the server
> component of the URI's scheme-specific data:
>
> <userinfo>@<host>:<port>

[...]

Oops, how did I miss that?

Thanks

John

John J. Lee

Dec 5, 2003, 11:57:33 AM

luigi...@libero.it (Luigi) writes:

> I've tried to follow your suggestions, but I still cannot solve my
> problem!
>
> These are my attempts:

[...]

Nothing obviously wrong there.

> Where is my mistake??

What happens when you run the code? Traceback, unexpected HTTP
response, ...? Hard to guess without this kind of information.

The problem is often obvious if you just use a sniffer to look at the
HTTP headers getting sent by your browser and by your program, and
compare the two. Use Ethereal, for example (if you have an https: URL
so can't use a sniffer, try the livehttpheaders plugin for Mozilla, or
a proxy like proxomitron).
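
One header worth looking at in such a comparison is the server's WWW-Authenticate challenge, which names the auth scheme and the realm. The following is a hedged sketch, assuming Python 3; parse_challenge and fetch_challenge are made-up helper names, and the parser only handles the simple single-challenge form with a quoted realm.

```python
import re
import urllib.error
import urllib.request


def parse_challenge(header_value):
    """Split a WWW-Authenticate value into (scheme, realm or None)."""
    scheme, _, params = header_value.strip().partition(" ")
    match = re.search(r'realm="([^"]*)"', params, re.IGNORECASE)
    return scheme, (match.group(1) if match else None)


def fetch_challenge(url):
    """Request url without credentials; return the 401 challenge, if any."""
    try:
        with urllib.request.urlopen(url):
            return None  # no authentication was demanded
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return err.headers.get("WWW-Authenticate")
        raise


if __name__ == "__main__":
    print(parse_challenge('Basic realm="My realm"'))
```

If the scheme is Basic, the realm printed here is the string a handler's add_password call has to match, unless a default-realm password manager is used.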

John

Greg Jorgensen

Dec 8, 2003, 3:44:29 PM

luigi...@libero.it (Luigi) wrote in message news:<a94bfaf2.03120...@posting.google.com>...

> These are my attempts:
> ...

> Neither works!
> Where is my mistake??

Since you don't know what the problem is and you probably aren't
familiar with proxies and header sniffing, try fetching the page with
curl (http://curl.haxx.se/), which you will already have if you run
Linux. Once you have it working from the command line with curl you
can translate to Python.

Turn on curl's verbose output:

curl -v http://user:pass...@www.mysite.com/page.html

Possible reasons for your problems:

- username/password wrong
- url wrong
- server not using basic authentication
- server expecting specific referrer, IP address, user agent
- secure connection (HTTPS) required

Some of these problems will be apparent from curl's output. Others
(like the referrer or user agent check) will not, but you can perhaps
deduce them from how the site works from a browser.
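
When curl's verbose output shows an "Authorization: Basic ..." request header, translating to Python can be as direct as sending that same header. A minimal sketch, assuming Python 3 and Basic auth; the helper names, URL and credentials are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an Authorization header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url, username, password):
    """Build a request that sends the credentials without waiting for a 401."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    req = build_request("http://www.example.com/page.html", "user", "password")
    print(req.get_header("Authorization"))
    # page = urllib.request.urlopen(req).read()
```

Base64 is an encoding, not encryption, so over plain HTTP the password is readable by anyone who can see the traffic.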

Greg Jorgensen
PDXperts LLC
Portland, Oregon USA

Luigi

Dec 9, 2003, 4:24:50 AM

I'm very very sorry!!!! My user name was wrong!

Right now the second one works fine!!!

Thank you very much!!!

Luigi

gr...@pobox.com (Greg Jorgensen) wrote in message news:<9415febc.03120...@posting.google.com>...