%2B (plus sign) gets decoded twice?

michi

Mar 10, 2008, 1:42:53 AM
to pylons-discuss
Hello,

I'm using Pylons-0.9.6.1. When a request URL contains percent-encoded
plus signs (%2B), the corresponding controller receives them as spaces
(" ").

Say the request URL is:
http://example.com/language/view/C%2B%2B

The value of request.environ["PATH_INFO"] is '/language/view/C++',
which is already percent-decoded.

The value of request.environ['pylons.routes_dict'] is {'action':
u'view', 'controller': u'language', 'id': u'C '}

How can I fix this?

Thanks!
--Michi
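
The symptom described above can be reproduced with the standard library alone. This is a minimal sketch, not the actual Routes code: it assumes the path is percent-decoded once by the server layer and then decoded a second time by something that also treats "+" as a space. It uses Python 3's urllib.parse, the modern home of the urllib.unquote functions from this era.

```python
from urllib.parse import unquote, unquote_plus

raw_segment = "C%2B%2B"

# First pass: the server layer percent-decodes the path segment.
once = unquote(raw_segment)

# Assumed second pass: a form-style decode turns each "+" into a space.
twice = unquote_plus(once)

print(repr(once))   # 'C++'
print(repr(twice))  # 'C  '
```

Each plus sign ends up as one space, which matches the mangled id in routes_dict.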

Mike Orr

Mar 10, 2008, 3:57:50 PM
to pylons-...@googlegroups.com

Created bug ticket http://routes.groovie.org/trac/routes/ticket/67 .

The only workaround I can think of is to recognize "C " in the action
and replace it with "C++". Not elegant, but it will get you by until
Routes is fixed. I hope you don't have lots of incoming IDs with
pluses and spaces in them.

--
Mike Orr <slugg...@gmail.com>

michi

Mar 10, 2008, 4:32:29 PM
to pylons-discuss
Hi Mike,

Thanks for your quick response. For now I'll do what you suggested.

Thanks!
--Michi

michi

Mar 14, 2008, 7:25:05 PM
to pylons-discuss
Also, the value of request.environ["PATH_INFO"] seems to come from the
urllib.unquote() function ("+" is not converted to " "). Is there a
way for me to specify that I want to use urllib.unquote_plus() instead?

Here is what I'm trying to do. When the URL is:

http://example.com/topic/view/C%2B%2B

the controller receives "C++", and when the URL is

http://example.com/topic/view/Python+is+great

the controller receives "Python is great".

Thanks!
--Michi
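
For reference, the difference between the two decoders can be shown side by side. This is a small sketch using Python 3's urllib.parse equivalents of urllib.unquote() and urllib.unquote_plus(); the `segments`, `plain` and `plus` names are only illustrative.

```python
from urllib.parse import unquote, unquote_plus

segments = ("C%2B%2B", "Python+is+great")

# unquote() only resolves %XX escapes; unquote_plus() also maps "+" to " ".
plain = {s: unquote(s) for s in segments}
plus = {s: unquote_plus(s) for s in segments}

for s in segments:
    print(s, "->", repr(plain[s]), "vs", repr(plus[s]))
```

Both functions turn C%2B%2B into "C++"; only unquote_plus() turns Python+is+great into "Python is great".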

michi

Mar 15, 2008, 1:22:42 AM
to pylons-discuss
... and the encoding above is consistent with what url_for() returns:

print h.url_for(controller='topic', action='view', id='C++')

/topic/view/C%2B%2B

print h.url_for(controller='topic', action='view', id='Python is great')

/topic/view/Python+is+great

--Michi

Ben Bangert

Mar 16, 2008, 12:57:46 PM
to pylons-...@googlegroups.com

I should note that the latest routes-dev repo does have the multiple
decoding bug fixed. It also fixes the routes resource issue with
using ; instead of / at the end for actions. It will probably be
released soon, in the next week or so.

Cheers,
Ben

michi

Apr 3, 2008, 3:07:27 AM
to pylons-discuss
Ok, just one more thing...

One consequence of Paste doing percent-decoding is that the id cannot
contain forward-slashes. Here is an example url:

http://example.com/topic/view/PL%2FSQL

In this case request.environ["PATH_INFO"] is /topic/view/PL/SQL, and
there is no way for routes to tell if there was an escaped slash. The
WSGI spec doesn't seem to specify whether the value of PATH_INFO
should be percent-decoded or not. What's the expected behavior here?

Python Web Server Gateway Interface v1.0:
http://www.python.org/dev/peps/pep-0333/

--Michi
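
The information loss described above is easy to see by splitting the path before and after decoding. This is a minimal sketch with Python 3's urllib.parse; the variable names are illustrative only.

```python
from urllib.parse import unquote

raw_path = "/topic/view/PL%2FSQL"

# What arrives in PATH_INFO once the path has been percent-decoded.
decoded_path = unquote(raw_path)

raw_segments = raw_path.strip("/").split("/")
decoded_segments = decoded_path.strip("/").split("/")

print(raw_segments)      # ['topic', 'view', 'PL%2FSQL']
print(decoded_segments)  # ['topic', 'view', 'PL', 'SQL']
```

After decoding there are four segments instead of three, and nothing marks which slash was originally escaped.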

Mike Orr

Apr 3, 2008, 4:28:09 AM
to pylons-...@googlegroups.com
On Thu, Apr 3, 2008 at 12:07 AM, michi <muts...@gmail.com> wrote:
>
> Ok, just one more thing...
>
> One consequence of Paste doing percent-decoding is that the id cannot
> contain forward-slashes. Here is an example url:
>
> http://example.com/topic/view/PL%2FSQL
>
> In this case request.environ["PATH_INFO"] is /topic/view/PL/SQL, and
> there is no way for routes to tell if there was an escaped slash. The
> WSGI spec doesn't seem to specify whether the value of PATH_INFO
> should be percent-decoded or not. What's the expected behavior here?

If your ID can include a slash, you should be using a wildcard variable:

m.connect("/:topic/view/*id", controller="foo", action="bar")

You can try ":controller/:action/*id", but no guarantees on what wrong
URLs it may swallow.

As for whether Paste should defer unescaping, here's what the RFCs say:

ftp://ftp.isi.edu/in-notes/rfc2616.txt (HTTP, section 3.2.3)

http://www.faqs.org/rfcs/rfc2396.html (URI, sections 2.2, 2.3,
2.4, 2.4.1, 2.4.2)

The slash is a reserved character, meaning it's distinct from its
escaped equivalent (%2F). As opposed to tilde which is unreserved, so
"~" and "%7E" are completely equivalent.

However, I'm sure Paste would find it very tedious to unescape some
characters but not others. There's also the issue of filenames with
spaces. When should "a%20space.jpg" be unescaped to "a space.jpg"?

--
Mike Orr <slugg...@gmail.com>
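
To make the reserved/unreserved distinction concrete, here is a rough sketch of what "unescape some characters but not others" could look like. It is not something Paste does. The helper name `unquote_unreserved` is hypothetical, and the unreserved set used is the RFC 3986 one (letters, digits, "-", ".", "_", "~"), which is an assumption; RFC 2396 lists a few more mark characters.

```python
import re
import string

# Assumed unreserved set (RFC 3986): safe to decode without changing meaning.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unquote_unreserved(path):
    """Decode only escapes of unreserved characters; leave the rest intact."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)
    return _ESCAPE.sub(replace, path)


print(unquote_unreserved("/%7Euser/PL%2FSQL/a%20space.jpg"))
# /~user/PL%2FSQL/a%20space.jpg
```

Under this rule %7E collapses to "~", while %2F and %20 stay escaped, so the question of when to decode the space would still be left to the application.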

Ian Bicking

Apr 3, 2008, 11:20:39 AM
to pylons-...@googlegroups.com

The WSGI spec refers to the CGI spec, which indicates that PATH_INFO
should be url-unquoted. Unfortunately as a result there's no way to
distinguish %2f from /. I would recommend translating this character to
something else.

Ian

Ben Bangert

Apr 3, 2008, 12:56:49 PM
to pylons-...@googlegroups.com
On Apr 3, 2008, at 8:20 AM, Ian Bicking wrote:

> The WSGI spec refers to the CGI spec, which indicates that PATH_INFO
> should be url-unquoted. Unfortunately as a result there's no way to
> distinguish %2f from /. I would recommend translating this
> character to
> something else.

Alternatively, you can put the variable in as a query argument, where
Routes will ignore it.

Cheers,
Ben

michi

Apr 3, 2008, 1:24:21 PM
to pylons-discuss
Ah, I see, that's really unfortunate. How about adding the
percent-encoded URL to request.environ? It's OK to add "server-defined
variables" to request.environ according to PEP 333. This way, you
don't break the spec, and the end users (or Routes) can use the
percent-encoded URL if they choose to.

There was a similar discussion in ROR Trac:
http://dev.rubyonrails.org/ticket/7544

--Michi
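
The proposal above could be sketched as a small piece of WSGI middleware. Everything here is an assumption for illustration: REQUEST_URI is a non-standard variable that only some servers provide, and the class name `RawPathMiddleware` and the key "myapp.raw_path" are made up. Neither Paste nor Pylons is claimed to offer this.

```python
class RawPathMiddleware:
    """Copy a server-supplied raw request URI into a custom environ key."""

    def __init__(self, app, source_key="REQUEST_URI",
                 target_key="myapp.raw_path"):
        self.app = app
        self.source_key = source_key  # assumed to hold the undecoded URI
        self.target_key = target_key  # hypothetical custom key

    def __call__(self, environ, start_response):
        raw = environ.get(self.source_key)
        if raw is not None:
            # Drop the query string so only the still-encoded path remains.
            environ[self.target_key] = raw.split("?", 1)[0]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny WSGI app that echoes the raw path, if the key was set."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ.get("myapp.raw_path", "").encode("ascii")]
```

PATH_INFO stays decoded as the spec expects; an application that cares about escaped slashes would read the extra key and fall back to PATH_INFO when it is absent, which is exactly the deployment-dependent behaviour to be wary of.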

Ben Bangert

Apr 3, 2008, 1:42:57 PM
to pylons-...@googlegroups.com

Unlike Rails, though, Pylons functions underneath WSGI, which is a
subset of CGI and specifically says that the URL Pylons sees will have
already been URL-decoded, so Routes has no idea which slash was
originally URL-encoded. I should note that the Rails solution there
breaks depending on whether the connector used to hook up the Rails
app URL-decodes the URL first. This is part of why the WSGI spec
operates as a subset of the CGI spec: to ensure portability and
consistency.

Having Routes use a percent-encoded URL in some cases, but not in
others, would cause a similar bout of confusion: sometimes the app
would work as people expect (depending on their deployment), and
sometimes it would break.

Why not put the data whose slashes you want to retain into the query
string? You'll get the characters you want then. Routes will be unable
to dispatch on them directly, but you could write a condition function
for the Routes match that looks at the query string:
http://routes.groovie.org/manual.html#conditions

That way you could pull the data you want out of
environ['QUERY_STRING'] (or pass environ to a WSGIRequest obj to make
it easier to use), then add the variable to the Routes match dict so
your actions can see it.

Cheers,
Ben
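
A condition function along the lines described above might look roughly like this. It is a sketch under assumptions: the function name `id_from_query` and the "id" parameter are hypothetical, and the map.connect() call in the comment follows the conditions section of the Routes manual as best understood, so check it against the installed version.

```python
from urllib.parse import parse_qs


def id_from_query(environ, match_dict):
    """Accept the route only if the query string carries an 'id' value."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    values = params.get("id")
    if not values:
        return False
    # Expose the decoded value to the controller action.
    match_dict["id"] = values[0]
    return True


# Assumed wiring, per the Routes manual's "conditions" section:
# map.connect("topic/view", controller="topic", action="view",
#             conditions=dict(function=id_from_query))
```

Because the query string is decoded by the application rather than the server, ?id=PL%2FSQL arrives as "PL/SQL" with the slash intact.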

Mike Orr

Apr 3, 2008, 3:22:42 PM
to pylons-...@googlegroups.com

But doesn't the wildcard variable type exist precisely for this case?

map.connect("images/*id", controller="main", action="image")

matches:

images/ABC.jpg
images/123/456
images/nature/winter/ABC.jpg

--
Mike Orr <slugg...@gmail.com>
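
The effect of a wildcard variable can be imitated with a plain regular expression. This is only an emulation for illustration, not how Routes implements "*id"; the `IMAGE_ROUTE` and `match_image` names are made up.

```python
import re

# Emulates "images/*id": the wildcard swallows everything after the prefix,
# slashes included.
IMAGE_ROUTE = re.compile(r"^images/(?P<id>.+)$")


def match_image(path):
    """Return {'id': ...} for paths under images/, else None."""
    match = IMAGE_ROUTE.match(path)
    return match.groupdict() if match else None


for path in ("images/ABC.jpg", "images/123/456",
             "images/nature/winter/ABC.jpg"):
    print(path, "->", match_image(path))
```

The id comes back with its slashes, but the match cannot tell a literal slash from one that was escaped as %2F in the original URL.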

michi

May 11, 2008, 2:37:21 AM
to pylons-discuss
I updated Routes to 1.8. It did fix the double-decoding issue, but
there is one more issue related to URL encoding. The url_for function
in WebHelpers uses quote_plus for encoding (' ' [space] becomes
'+' [plus sign]). The wsgi_setup function in Paste, on the other hand,
uses unquote for decoding, so the '+' is never turned back into a
space.

I'm using the following packages:
Paste-1.6
Routes-1.8
WebHelpers-0.6dev

Yes, I could probably use the query string, but I just like those nice
looking /controller/action/id uri :)

Thanks!
--Michi
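
The mismatch described above shows up as a failed round trip. This is a minimal sketch using Python 3's urllib.parse names for the same functions; it does not call WebHelpers or Paste.

```python
from urllib.parse import quote, quote_plus, unquote

original = "Python is great"

# Encoding with quote_plus, decoding with unquote: the "+" survives.
encoded_plus = quote_plus(original)
decoded_mismatch = unquote(encoded_plus)

# Encoding with quote (spaces become %20) round-trips through unquote.
encoded_pct = quote(original, safe="")
decoded_match = unquote(encoded_pct)

print(encoded_plus, "->", repr(decoded_mismatch))
print(encoded_pct, "->", repr(decoded_match))
```

Pairing quote with unquote (or quote_plus with unquote_plus) keeps both "Python is great" and "C++" intact; mixing the two families does not.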

Luis Bruno

May 11, 2008, 11:45:30 AM
to pylons-...@googlegroups.com
Mike Orr wrote:

> But doesn't the wildcard variable type exist precisely for this case?
> map.connect("images/*id", controller="main", action="image")
When I tried it, I got stuck on these kinds of URLs:
/catalog/Laptops/HP/Compaq/Armada-XXX
/catalog/Printers/Brother/D/PCL-344

Should "split" as:
["Laptops", "HP/Compaq", "Armada-XXX"]
["Printers", "Brother", "D/PCL-344"]

As far as I see, the "best" way to deal with these is to escape / as $2F
to get around the CGI-induced WSGI limitation.
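
The "$2F" idea could be sketched as a pair of helpers applied to each path segment. The function names are hypothetical, and escaping "$" itself as "$24" is an added assumption so that the scheme stays reversible.

```python
from urllib.parse import unquote


def escape_segment(segment):
    """Escape '$' first, then '/', so the result contains no raw slash."""
    return segment.replace("$", "$24").replace("/", "$2F")


def unescape_segment(segment):
    """Reverse escape_segment: restore '/' first, then '$'."""
    return segment.replace("$2F", "/").replace("$24", "$")


parts = ["Laptops", "HP/Compaq", "Armada-XXX"]
path = "/catalog/" + "/".join(escape_segment(p) for p in parts)

# Percent-decoding by the server leaves the "$2F" marker untouched.
received = unquote(path)
recovered = [unescape_segment(s) for s in received.split("/")[2:]]

print(path)       # /catalog/Laptops/HP$2FCompaq/Armada-XXX
print(recovered)  # ['Laptops', 'HP/Compaq', 'Armada-XXX']
```

Since the marker contains no "%", it passes through the PATH_INFO decoding unchanged and the application can split on "/" before restoring the embedded slashes.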
