[Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info


Ian Bicking

Sep 27, 2009, 11:35:12 PM
to Web SIG
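I tried implementing some code to convert REQUEST_URI (the raw request URL) and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info.

  http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2)
  http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3)
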
Admittedly the tests are not very complete, I just wasn't feeling creative about test cases.  In terms of performance this avoids being entirely brute force, but feels kind of complex.  I'm betting there's an entirely different approach which is faster.  But whatever.

Graham Dumpleton

Sep 28, 2009, 3:34:35 AM
to Ian Bicking, Web SIG
2009/9/28 Ian Bicking <ia...@colorstudy.com>:

> I tried implementing some code to convert REQUEST_URI (the raw request URL)
> and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info.
>   http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2)
>   http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3)
> Admittedly the tests are not very complete, I just wasn't feeling creative
> about test cases.  In terms of performance this avoids being entirely brute
> force, but feels kind of complex.  I'm betting there's an entirely different
> approach which is faster.  But whatever.

Got an error:

mod_wsgi (pid=4301): Exception occurred processing WSGI script '/Users/grahamd/Testing/tests/wsgi20.wsgi'.
Traceback (most recent call last):
  File "/Users/grahamd/Testing/tests/wsgi20.wsgi", line 80, in application
    environ['PATH_INFO'])
  File "/Users/grahamd/Testing/tests/wsgi20.wsgi", line 64, in request_uri_to_path
    remove_segments = remove_segments - 1 - qscript_name_parts[-1].lower().count('%2f')
IndexError: list index out of range

This was an extreme corner case where Apache mod_rewrite was being
used to rewrite every request that does not match an existing file
so that it goes to the WSGI script:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /wsgi20.wsgi/$1 [QSA,PT,L]

and Apache was configured to allow encoded slashes. The input would have been:

REQUEST_URI: '/a%2fb/c/d'
SCRIPT_NAME: '/wsgi20.wsgi'
PATH_INFO: '/a/b/c/d'

That style of rewrite rule is quite often used with Apache, although
allowing encoded slashes isn't.
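
To see why these values trip up the conversion, here is a quick check, written as a Python 3 sketch. The helper name cgi_vars_match_request_uri is made up for illustration and is not part of request_uri.py. It shows that the decoded REQUEST_URI equals PATH_INFO on its own, so SCRIPT_NAME is not a prefix of the original URL at all:

```python
from urllib.parse import unquote

def cgi_vars_match_request_uri(request_uri, script_name, path_info):
    # Hypothetical helper: True when the decoded request path is exactly
    # SCRIPT_NAME followed by PATH_INFO.
    decoded_path = unquote(request_uri.split('?', 1)[0])
    return decoded_path == script_name + path_info

request_uri = '/a%2fb/c/d'
script_name = '/wsgi20.wsgi'
path_info = '/a/b/c/d'

# The rewritten SCRIPT_NAME never appeared in the URL the client sent.
consistent = cgi_vars_match_request_uri(request_uri, script_name, path_info)

# Dropping SCRIPT_NAME makes the values line up again.
consistent_without_script = cgi_vars_match_request_uri(request_uri, '', path_info)
```

With the values above, the first check fails and the second succeeds, which is the mismatch any conversion code has to tolerate.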

That SCRIPT_NAME needs to be adjusted is a known consideration with
this rewrite rule. Usually you would use a wrapper around the WSGI
application which does:

import posixpath

def _application(environ, start_response):
    # The original application.
    ...

def application(environ, start_response):
    # Wrapper to set SCRIPT_NAME to actual mount point.
    environ['SCRIPT_NAME'] = posixpath.dirname(environ['SCRIPT_NAME'])
    if environ['SCRIPT_NAME'] == '/':
        environ['SCRIPT_NAME'] = ''
    return _application(environ, start_response)

If that algorithm is used in the WSGI adapter, however, the wrapper
would never get the opportunity to do that, as the adapter would
already have failed before the wrapper got called.
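
For illustration, the same fix-up can be packaged as reusable middleware and exercised without a server. This is only a sketch: strip_script_file and show_script_name are hypothetical names, not part of mod_wsgi or any library, and the tiny demo application assumes Python 3 byte-string response bodies.

```python
import posixpath

def strip_script_file(app):
    # Hypothetical middleware factory: replaces SCRIPT_NAME with its
    # parent directory, mapping a bare '/' to the empty string.
    def wrapper(environ, start_response):
        parent = posixpath.dirname(environ.get('SCRIPT_NAME', ''))
        environ['SCRIPT_NAME'] = '' if parent == '/' else parent
        return app(environ, start_response)
    return wrapper

def show_script_name(environ, start_response):
    # Demo application that echoes the SCRIPT_NAME it was handed.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['SCRIPT_NAME'].encode('latin-1')]

def call_with_script_name(script_name):
    # Invoke the wrapped demo app directly with a minimal environ.
    app = strip_script_file(show_script_name)
    return app({'SCRIPT_NAME': script_name}, lambda status, headers: None)
```

Calling call_with_script_name('/wsgi20.wsgi') hands the application an empty SCRIPT_NAME, while '/sub/wsgi20.wsgi' becomes '/sub'.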

Graham
_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig

Ian Bicking

Sep 28, 2009, 1:36:32 PM
to Graham Dumpleton, Web SIG
Thanks for the test case; fixed in tip now.  If anything goes wrong, the function should return (quote(script_name), quote(path_info)) -- no combination of request_uri/script_name/path_info should cause an exception (except through bugs).  As you say, there's no promise that those values are in any way related, and when they are not, it is appropriate to fix it up at the WSGI stage (not necessarily in the WSGI adapter itself).
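
A minimal brute-force sketch of that contract follows, in Python 3. It is not the algorithm in request_uri.py, which avoids pure brute force. The name split_request_uri is hypothetical, and REQUEST_URI is assumed to be a path with an optional query string rather than an absolute URL.

```python
from urllib.parse import quote, unquote

def split_request_uri(request_uri, script_name, path_info):
    # Hypothetical helper: return (raw_script_name, raw_path_info).
    raw_path = request_uri.split('?', 1)[0]
    # Try every split point of the raw path and keep the first one whose
    # two halves decode to the CGI-style values.
    for i in range(len(raw_path) + 1):
        raw_script, raw_info = raw_path[:i], raw_path[i:]
        if unquote(raw_script) == script_name and unquote(raw_info) == path_info:
            return raw_script, raw_info
    # The values are unrelated (for example after a rewrite), so fall
    # back to re-quoting the CGI values instead of raising.
    return quote(script_name), quote(path_info)

# Consistent input: the encoded slash survives in the raw path_info.
matched = split_request_uri('/app/a%2fb/c?x=1', '/app', '/a/b/c')

# The mod_rewrite case from this thread: no split matches, so fall back.
fallback = split_request_uri('/a%2fb/c/d', '/wsgi20.wsgi', '/a/b/c/d')
```

The first call keeps the %2f in the raw path_info; the second hits the fallback and returns the quoted CGI values unchanged.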