[Web-SIG] Repeating slashes in REQUEST_URI, SCRIPT_NAME and PATH_INFO.

Graham Dumpleton

Jan 28, 2007, 7:10:03 PM
to web...@python.org
Another question on SCRIPT_NAME, PATH_INFO etc.

This time I am after information on what responsibilities an adapter for a
specific web server has in respect of removal and/or preservation of repeating
slashes in a request URI.

Take for example that a WSGI application is mounted at:

/wsgi/a

and that the request URI is:

REQUEST_URI: '/////wsgi//////a///b//c/d'

What should SCRIPT_NAME and PATH_INFO be set to? Should repeating slashes
be removed from SCRIPT_NAME so that it matches the normalised mount point,
or should the repeating slashes be preserved?

Thus should the above REQUEST_URI yield:

SCRIPT_NAME: '/wsgi/a'
PATH_INFO: '///b//c/d'

or perhaps:

SCRIPT_NAME: '/////wsgi//////a'
PATH_INFO: '///b//c/d'

Similarly, should repeating slashes be left as is in PATH_INFO?
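
To make the two candidates concrete, here is a minimal sketch of how an adapter might split the request path at the mount point. The helper name split_at_mount and its normalise_script_name flag are made up for illustration and are not taken from any existing adapter. It assumes that a run of slashes counts as a single separator while matching the mount point.

```python
def split_at_mount(request_path, mount_point, normalise_script_name):
    """Split request_path into (SCRIPT_NAME, PATH_INFO) at mount_point.

    Hypothetical helper: runs of slashes in request_path count as a single
    separator while matching. Returns None if the path is outside the mount.
    """
    wanted = [seg for seg in mount_point.split('/') if seg]
    pos = 0
    for seg in wanted:
        start = pos
        # Skip the run of slashes in front of the next segment.
        while start < len(request_path) and request_path[start] == '/':
            start += 1
        end = start + len(seg)
        if start == pos or request_path[start:end] != seg:
            return None
        # The match must stop at a segment boundary.
        if end < len(request_path) and request_path[end] != '/':
            return None
        pos = end
    script_name = request_path[:pos]
    if normalise_script_name:
        # Rebuild SCRIPT_NAME from the mount point's own segments.
        script_name = ''.join('/' + seg for seg in wanted)
    return script_name, request_path[pos:]


uri = '/////wsgi//////a///b//c/d'
print(split_at_mount(uri, '/wsgi/a', True))   # ('/wsgi/a', '///b//c/d')
print(split_at_mount(uri, '/wsgi/a', False))  # ('/////wsgi//////a', '///b//c/d')
```

Either way PATH_INFO comes out the same. The only difference is whether SCRIPT_NAME echoes the request or the configured mount point.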

I note that path_info_pop() in paste says:

>>> def call_it(script_name, path_info):
...     env = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
...     result = path_info_pop(env)
...     print 'SCRIPT_NAME=%r; PATH_INFO=%r; returns=%r' % (
...         env['SCRIPT_NAME'], env['PATH_INFO'], result)
>>> call_it('/foo', '/bar')
SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns='bar'
>>> call_it('/foo/bar', '')
SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns=None
>>> call_it('/foo/bar', '/')
SCRIPT_NAME='/foo/bar/'; PATH_INFO=''; returns=''
>>> call_it('', '/1/2/3')
SCRIPT_NAME='/1'; PATH_INFO='/2/3'; returns='1'
>>> call_it('', '//1/2')
SCRIPT_NAME='//1'; PATH_INFO='/2'; returns='1'

The last example demonstrates the need to treat repeating slashes
as a single slash, but also seems to indicate that SCRIPT_NAME can have
repeating slashes in it. Running path_info_pop() on the example above yields:

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '//c/d', 'SCRIPT_NAME': '/////wsgi//////a///b'}
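
For reference, the slash-preserving behaviour can be sketched roughly as follows. This is a re-implementation written to match the outputs quoted above, not paste's actual source, and the function name pop_segment_keep_slashes is hypothetical.

```python
def pop_segment_keep_slashes(environ):
    """Move the first PATH_INFO segment to SCRIPT_NAME, keeping every slash.

    Hypothetical sketch that mirrors the documented results; not paste's code.
    """
    path = environ.get('PATH_INFO', '')
    if not path:
        return None
    script_name = environ.get('SCRIPT_NAME', '')
    # Carry each leading slash across to SCRIPT_NAME unchanged.
    while path.startswith('/'):
        script_name += '/'
        path = path[1:]
    # Whatever follows the first segment stays in PATH_INFO as is.
    segment, sep, rest = path.partition('/')
    environ['SCRIPT_NAME'] = script_name + segment
    environ['PATH_INFO'] = sep + rest
    return segment


env = {'SCRIPT_NAME': '/////wsgi//////a', 'PATH_INFO': '///b//c/d'}
print(pop_segment_keep_slashes(env), env)
```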

wsgiref.util.shift_path_info() also treats repeating slashes as one, but it
strips all the repeating slashes out, from SCRIPT_NAME as well as PATH_INFO:

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '/c/d', 'SCRIPT_NAME': '/wsgi/a/b'}
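
The wsgiref result can be reproduced with a few lines. This is only a sketch using the standard library's wsgiref.util module, written for a current Python 3 interpreter.

```python
from wsgiref.util import shift_path_info

environ = {'SCRIPT_NAME': '/////wsgi//////a', 'PATH_INFO': '///b//c/d'}
before = dict(environ)  # copy, because shift_path_info() modifies environ in place
result = shift_path_info(environ)

print('BEFORE:', before)
print('RESULT:', repr(result))
print('AFTER:', environ)
```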

What is the accepted convention for dealing with repeating slashes? Should
a web server adapter leave repeating slashes in both SCRIPT_NAME and
PATH_INFO, or should it at least normalise SCRIPT_NAME so that it matches
the designated mount point?

Thanks in advance.

Graham
_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig

Robert Brewer

Jan 28, 2007, 11:44:46 PM
to Graham Dumpleton, web...@python.org

Graham Dumpleton wrote:
> What is accepted convention for dealing with repeating slashes.
> Should any web server adapter leave repeating slashes in both
> SCRIPT_NAME and PATH_INFO, or should it at least normalise
> SCRIPT_NAME so that it matches the designated mount point.

The URI BNF allows for empty path segments, so a doubled slash has its own distinct meaning. And since "the designated mount point" is so designated by the URI, I would think one should leave doubled slashes in. IMO this certainly applies to PATH_INFO, although I could understand someone writing a server that normalized SCRIPT_NAME (and telling its users it was limited in that way).
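
As a small illustration of the empty-segment point, the standard library's urlsplit() hands back the path untouched, so a doubled slash shows up as an empty string between segments. The host example.com is just a placeholder.

```python
from urllib.parse import urlsplit

# The doubled slash survives parsing and yields an empty path segment.
path = urlsplit('http://example.com/wsgi//a').path
segments = path.split('/')[1:]  # drop the empty string before the leading slash
print(path, segments)  # /wsgi//a ['wsgi', '', 'a']
```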


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org
