How do I authenticate a urllib2 script in order to access HTTPS web services from a Django site?


carl

Feb 22, 2011, 4:38:32 PM
to Django users
Hi, everyone. I've searched the group site and couldn't find anything
with this specific problem. If I missed a good discussion on this, let
me know. The following is my problem:

I'm working on a django/mod_wsgi/apache2 website that serves sensitive
information using https for all requests and responses. All views are
written to redirect if the user isn't authenticated. It also has
several views that are meant to function like RESTful web services.

I'm now in the process of writing a script that uses urllib/urllib2 to
contact several of these services in order to download a series of
very large files. I'm running into problems with 403: FORBIDDEN errors
when attempting to log in.

The (rough-draft) method I'm using for authentication and log in is:

import getpass
import urllib
import urllib2

# log (a logger) and PATH_TO_LOGIN are defined elsewhere in the script

def login( base_address, username=None, password=None ):

    # prompt for the username (if needed), password
    if username is None:
        username = raw_input( 'Username: ' )
    if password is None:
        password = getpass.getpass( 'Password: ' )
    log.info( 'Logging in %s' % username )

    # fetch the login page in order to get the csrf token
    cookieHandler = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener( urllib2.HTTPSHandler(), cookieHandler )
    urllib2.install_opener( opener )

    login_url = base_address + PATH_TO_LOGIN
    log.debug( "login_url: " + login_url )
    login_page = opener.open( login_url )

    # attempt to get the csrf token from the cookie jar
    csrf_cookie = None
    for cookie in cookieHandler.cookiejar:
        if cookie.name == 'csrftoken':
            csrf_cookie = cookie
            break
    # test csrf_cookie, not the loop variable, to detect a missing token
    if csrf_cookie is None:
        raise IOError( "No csrf cookie found" )
    log.debug( "found csrf cookie: " + str( csrf_cookie ) )
    log.debug( "csrf_token = %s" % csrf_cookie.value )

    # login using the usr, pwd, and csrf token
    login_data = urllib.urlencode( dict(
        username=username, password=password,
        csrfmiddlewaretoken=csrf_cookie.value ) )
    log.debug( "login_data: %s" % login_data )

    req = urllib2.Request( login_url, login_data )
    response = urllib2.urlopen( req )
    # <--- 403: FORBIDDEN here

    log.debug( 'response url:\n' + str( response.geturl() ) + '\n' )
    log.debug( 'response info:\n' + str( response.info() ) + '\n' )

    # should redirect to the welcome page here, if back at log in - refused
    if response.geturl() == login_url:
        raise IOError( 'Authentication refused' )

    log.info( '\t%s is logged in' % username )
    # save the cookies/opener for further actions
    return opener

I'm using the HTTPCookieProcessor to store Django's authentication
cookies on the script-side so I can access the web services and get
through my redirects.

Specifically, I'm getting a 403 when trying to post the credentials to
the login page/form over the https connection. This method works when
used on the development server which uses an http connection.

There is no Apache directory directive that prevents access to that
area (that I can see). The script connects successfully to the login
page without post data so I'm thinking that would leave Apache out of
the problem (but I could be wrong).

I know that Django's CSRF middleware is going to bump me out if I
don't pass the csrf token along with the login information, so I pull
it from the cookiejar after the first page/form load. Like I
mentioned, this works with the http/development version of the site.

The python installations I'm using are both compiled with SSL.

I've also read that urllib2 doesn't allow https connections via proxy.
I'm not very experienced with proxies, so I don't know if using a
script from a remote machine is actually a proxy connection and
whether that would be the problem. Is this causing the access problem?
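On the proxy question: running a script from a remote machine does not by itself make the connection a proxy connection. A proxy is only involved if one is configured in the environment or system settings. As a rough sketch (the helper name build_direct_opener is hypothetical, and the try/except import is only there so the same code also runs on Python 3), the configured proxies can be listed and, if needed, bypassed:

```python
try:  # Python 2, as used in this thread
    from urllib import getproxies
    import urllib2 as request_lib
except ImportError:  # Python 3 equivalents
    from urllib.request import getproxies
    import urllib.request as request_lib


def build_direct_opener(*handlers):
    """Build an opener that ignores system and environment proxy settings."""
    # An empty mapping tells ProxyHandler to use no proxies at all.
    return request_lib.build_opener(request_lib.ProxyHandler({}), *handlers)


if __name__ == '__main__':
    # Maps scheme to proxy URL; an empty dict means no proxy is configured.
    print(getproxies())
```

If getproxies() returns an empty dict, a proxy is unlikely to be the cause of the 403.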

From what I can tell, the problem is in the combination of cookies and
the post data, but I'm unclear as to where to take it from here.

Any help would be appreciated. Thanks

carl

Feb 23, 2011, 3:43:37 PM
to Django users
It turns out I needed to set the HTTP Referer header to the login page
url in the request where I post the login information.

req.add_header( 'Referer', login_url )

The reason is explained in the Django CSRF documentation -
specifically, step 4: for HTTPS requests, Django also checks the
Referer header.
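For anyone who lands here later, the fix can be sketched as a small helper that builds the login POST with the Referer header set. This is only an illustration under some assumptions: the function name build_login_request is made up for this example, the form fields match the ones in my original script, and the try/except import is there so the sketch also runs on Python 3.

```python
try:  # Python 2, as used in this thread
    import urllib2 as request_lib
    from urllib import urlencode
except ImportError:  # Python 3 equivalents
    import urllib.request as request_lib
    from urllib.parse import urlencode


def build_login_request(login_url, username, password, csrf_token):
    """Build the login POST, including the Referer header checked over HTTPS."""
    data = urlencode({
        'username': username,
        'password': password,
        'csrfmiddlewaretoken': csrf_token,
    })
    if not isinstance(data, bytes):
        data = data.encode('ascii')  # Python 3 needs bytes for POST data
    req = request_lib.Request(login_url, data)
    # Without this header the HTTPS login POST was rejected with a 403.
    req.add_header('Referer', login_url)
    return req
```

The request should then be sent through the same opener that holds the cookie jar, e.g. opener.open( build_login_request( login_url, username, password, csrf_cookie.value ) ), so the csrftoken cookie goes along with it.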

Due to our somewhat peculiar server setup where we use HTTPS on the
production side and DEBUG=False, I wasn't seeing the csrf_failure
reason for failure (in this case: 'Referer checking failed - no
referer') that is normally output in the DEBUG info. I ended up
printing that failure reason to the Apache error_log and STFW'd on it.
That led me to code.djangoproject/.../csrf.py and the Referer header
fix.

Thanks to anyone who gave this a read!