Double encoded slash in path_info removed when using apache/wsgi


Merrick

Oct 7, 2008, 7:05:55 PM
to Django users
I need URLs that contain other URLs within them, e.g.

http://mydomain.com/find/http%3A%2F%2Fwww.wired.com%2F%2F

I am using Apache 2.2.3-4 and mod_wsgi 2.0-1, with
"AllowEncodedSlashes On" in my virtualhost conf file. This setting
allows encoded slashes in the path_info without Apache returning
a fake 404.

My views.py includes:

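import urllib

from django.shortcuts import render_to_response

def find(request, url):
    # Decode the captured URL (Python 2 urllib API); FindForm is defined elsewhere.
    url = urllib.unquote_plus(url)
    form = FindForm({'url': url})
    return render_to_response('find.html', {'form': form, 'request': request})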


The template find.html includes a line {{ request.path_info }} to
print out the path_info.

In Firefox, the second encoded forward slash is removed. Requesting
this URL:
http://mydomain.com/find/http%3A%2F%2Fwww.wired.com%2F%2F
prints out:
/find/http:/www.wired.com/

The Django test client, by contrast, keeps the second forward slash:

$ ./manage.py shell
>>> from django.test.client import Client
>>> c = Client()
>>> response = c.get('/bookmarklet/http:%2F%2Fwww.wired.com%2F%2F')
>>> response.content
'/find/http://www.wired.com//'

I have not mentioned my regular expression in urls.py until now
because, as far as I can tell, it is not related to this problem. In
case you are wondering, I am capturing the URL after find/ with the
regex r'^find/(?P<url>(.*))$'
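
A quick check, outside Django, that the pattern itself keeps repeated slashes. This is only a sketch: it assumes the path is matched without its leading slash, and the names FIND_PATTERN and captured_url are made up for illustration. It uses Python 3.

```python
import re

# The pattern quoted above, compiled on its own (no Django involved).
FIND_PATTERN = re.compile(r"^find/(?P<url>(.*))$")


def captured_url(path):
    """Return whatever the 'url' group captures, or None if the path does not match."""
    match = FIND_PATTERN.match(path)
    return match.group("url") if match else None


# The regex returns exactly what it is given, so a missing slash
# must already be missing from the path before matching happens.
print(captured_url("find/http://www.wired.com//"))
print(captured_url("find/http:/www.wired.com/"))
```

If the first call keeps both slashes, the regex is not the component dropping them.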

Thanks for looking at this,

Merrick

Merrick

Oct 8, 2008, 2:57:42 AM
to Django users
Could this problem be in the wsgi.py handler in django? If not I
suppose I need to look at mod_wsgi and quit asking here :)

Graham Dumpleton

Oct 8, 2008, 6:00:44 AM
to Django users


On Oct 8, 5:57 pm, Merrick <merr...@gmail.com> wrote:
> Could this problem be in the wsgi.py handler in django? If not I
> suppose I need to look at mod_wsgi and quit asking here :)

It is how Apache works and there is possibly not much you can do about
it.

In Apache 1.3, repeating slashes can be passed through to the CGI
PATH_INFO variable in certain situations, but in Apache 2.X they
aren't; instead, Apache collapses them.

To make the behaviour consistent, mod_wsgi will apply Apache 2.X
behaviour to Apache 1.3 when passing through CGI variables in WSGI
environment, and will drop repeating slashes.

So, if using Apache 2.X there would be nothing that could be done even
if mod_wsgi code weren't collapsing the repeating slashes.
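
The visible effect can be imitated in a few lines. This is a rough emulation of the outcome described above, not Apache's or mod_wsgi's actual code. The function name collapse_like_apache is hypothetical, and the sketch uses Python 3.

```python
import re
from urllib.parse import unquote


def collapse_like_apache(raw_path):
    """Percent-decode the path, then merge each run of slashes into one.

    This only approximates the observed result; it is not how Apache
    implements it internally.
    """
    return re.sub(r"/{2,}", "/", unquote(raw_path))


print(collapse_like_apache("/find/http%3A%2F%2Fwww.wired.com%2F%2F"))
# prints: /find/http:/www.wired.com/
```

The output matches the path_info reported at the start of the thread.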

The closest you will get would be to look at and parse the value of
the REQUEST_URI variable passed through in the WSGI environment. For
example, for normal and encoded repeating slashes, one gets:

PATH_INFO: '/a/b/c/http:/www.wired.com/'
QUERY_STRING: ''
REQUEST_URI: '/wsgi/scripts/echo.py/a//b/c/http%3A%2F%2Fwww.wired.com%2F%2F'
SCRIPT_NAME: '/wsgi/scripts/echo.py'

Although Apache/mod_wsgi supplies REQUEST_URI, it isn't a required
WSGI variable and may not be available with other WSGI hosting
solutions.
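
A minimal sketch of that parsing, with a fallback to PATH_INFO for servers that do not set REQUEST_URI. The helper name raw_path_info is hypothetical. The sketch assumes SCRIPT_NAME appears unencoded at the start of REQUEST_URI, and it uses Python 3's urllib.parse rather than the Python 2 urllib shown earlier in the thread.

```python
from urllib.parse import unquote


def raw_path_info(environ):
    """Return the decoded path after SCRIPT_NAME, keeping repeated slashes.

    Falls back to PATH_INFO when the server does not supply REQUEST_URI.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return environ.get("PATH_INFO", "")

    # Drop the query string, if any; everything before '?' is the raw path.
    path = request_uri.partition("?")[0]

    # Assumption: SCRIPT_NAME contains no percent-encoded characters,
    # so it can be compared directly against the raw request path.
    script_name = environ.get("SCRIPT_NAME", "")
    if script_name and path.startswith(script_name):
        path = path[len(script_name):]

    return unquote(path)


# The environment values from the example above.
environ = {
    "PATH_INFO": "/a/b/c/http:/www.wired.com/",
    "QUERY_STRING": "",
    "REQUEST_URI": "/wsgi/scripts/echo.py/a//b/c/http%3A%2F%2Fwww.wired.com%2F%2F",
    "SCRIPT_NAME": "/wsgi/scripts/echo.py",
}
print(raw_path_info(environ))
# prints: /a//b/c/http://www.wired.com//
```

In a Django view the same dictionary would normally be reachable as request.META, though whether REQUEST_URI is present still depends on the hosting setup.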

In general relying on slashes and/or encoded slashes in PATH_INFO may
not be a good idea. For discussion on these issues read through:

http://groups.google.com/group/python-web-sig/browse_frm/thread/2003e1c1ecce27b2
http://groups.google.com/group/python-web-sig/browse_frm/thread/5907c746c855a18e

Graham

Merrick

Oct 9, 2008, 7:57:34 PM
to Django users
Thank you Graham, I was going crazy trying to figure this out.

Thankfully I control my hosting environment top to bottom (colocation),
so I will try using REQUEST_URI.
