How to prevent URL HTML encoding?


Ross

Sep 12, 2008, 10:00:09 AM
to Django users
I set up a URL pattern that accepts any number of word characters and
the '+' symbol, which works just fine.

/search/pets/dog+cat
/search/pets/turtle
/search/pets/turtle+cat+mouse

All of those URLs go to the same view. The view splits the third
path segment on the '+' symbol and uses the resulting list to do some
work. The problem is that when I call render_to_response to display the
template, Django encodes the '+' symbols in the URL. The user's address
bar shows '/search/pets/dog%2Bcat' instead of preserving the '+' symbols.

Is there a way to prevent that?

Norman Harman

Sep 12, 2008, 10:22:40 AM
to django...@googlegroups.com

http://docs.djangoproject.com/en/dev/ref/templates/builtins/#safe

http://docs.djangoproject.com/en/dev/topics/templates/#id2


--
Norman J. Harman Jr.
Senior Web Specialist, Austin American-Statesman

julianb

Sep 12, 2008, 12:04:56 PM
to Django users
On Sep 12, 4:22 pm, "Norman Harman" <nhar...@statesman.com> wrote:
> http://docs.djangoproject.com/en/dev/ref/templates/builtins/#safe
>
> http://docs.djangoproject.com/en/dev/topics/templates/#id2

HTML escaping doesn't turn '+' into '%2B'...

Ross

Sep 12, 2008, 12:28:45 PM
to Django users
http://www.w3schools.com/TAGS/ref_urlencode.asp
http://en.wikipedia.org/wiki/Percent-encoding

The URL percent encoding for '+' is '%2B'.

Norman, I understand how to mark variables as safe inside templates,
but the problem I am having is with the URL in the browser's address
bar after hitting a URL like '/djangoproject/search/hello+goodbye/'.

I capture the parameters like so:
r'^djangoproject/search/(?P<test>[\w\+]+)/$'

Then I use render_to_response("sometemplate.html", { ... }), which
should not rewrite the URL as far as I understand. However, the
browser address bar will show '/djangoproject/search/hello%2Bgoodbye/'
after hitting that page. I am not using a redirect, so I am not even
touching the URL. I am not sure why this is happening or how to
prevent it...
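
For reference, the capture-and-split step described above can be sketched outside Django with Python's re module. The helper name split_terms is made up for illustration, and the path is written without its leading slash on the assumption that the URL dispatcher strips it before matching.

```python
import re

# The pattern from the post, used on its own rather than in a urlconf.
SEARCH_URL = re.compile(r'^djangoproject/search/(?P<test>[\w\+]+)/$')

def split_terms(path):
    """Hypothetical helper: return the '+'-separated terms, or None if no match."""
    match = SEARCH_URL.match(path)
    if match is None:
        return None
    return match.group('test').split('+')

print(split_terms('djangoproject/search/hello+goodbye/'))  # ['hello', 'goodbye']
print(split_terms('djangoproject/search/hello/'))          # ['hello']
print(split_terms('djangoproject/search/hello+goodbye'))   # None (no trailing slash)
```

Note that the pattern as written only matches when the trailing slash is present.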

Ross

Sep 12, 2008, 12:31:05 PM
to Django users
Sorry, Julian, I misunderstood what you wrote. I see the Django HTML
escaping doesn't change the '+' symbol, which makes this even more
confusing... Even more strangely, this only happens intermittently.


Steve Holden

Sep 12, 2008, 12:39:33 PM
to django...@googlegroups.com
Ross wrote:
> Sorry, Julian, I misunderstood what you wrote. I see the Django HTML
> escaping doesn't change the '+' symbol, which makes this even more
> confusing... Even more strangely, this only happens intermittently.
>
"+" might be a bad choice of separator anyway, given that a plus sign
literally present in a URL will be treated as an escaped
representation of a space. So Django's probably doing the right thing
by changing the plus signs to %2B, since that way the server will actually
see a plus sign in the URL, and hopefully pass it through correctly to
whatever code processes the URL.

In terms of having the browser's user see the plus signs in the location
bar, there is no way to do that, because of their interpretation
described above.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
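
How '+' is read depends on which decoder is applied. As a small illustration using Python 3's urllib.parse (an assumption; the thread predates it), plain percent-decoding leaves '+' untouched, while the form-style decoder used for query strings turns it into a space. The sample string 'dog+cat' is taken from the first post.

```python
from urllib.parse import quote, unquote, unquote_plus

# Percent-encoding a literal '+' gives '%2B'.
print(quote('dog+cat'))            # dog%2Bcat

# Plain percent-decoding leaves '+' alone...
print(unquote('dog+cat'))          # dog+cat
# ...while the form-style decoder turns it into a space.
print(unquote_plus('dog+cat'))     # dog cat

# '%2B' decodes to a literal '+' under both decoders.
print(unquote('dog%2Bcat'))        # dog+cat
print(unquote_plus('dog%2Bcat'))   # dog+cat
```

So whether a bare '+' survives as a plus sign depends on whether the receiving code treats that part of the URL as form-encoded data.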

Karen Tracey

Sep 12, 2008, 1:08:09 PM
to django...@googlegroups.com
I'd guess this is only happening when you are, in fact, getting redirected via APPEND_SLASH or something like that.  No part of a normal response tells the browser what to put in the address bar, so I don't see how anything done by render_to_response could be involved here.

What, exactly, is the problem with having the + percent-encoded in the URL? Is it that it might be confusing to users, or is there an actual failure to route URLs correctly once this has happened?

Karen

Ross

Sep 12, 2008, 1:49:18 PM
to Django users
Karen, you were exactly right: APPEND_SLASH is the culprit here. If
you hit '/search/hotel+air/', the address stays that way. Hitting
'/search/hotel+air', though, redirects you to '/search/hotel%2Bair/'.

I am using '+' because it logically makes sense for what I am trying
to do. I am doing a travel search, and for packages it would be
something like '/search/hotel+air/'. You can append any number of
travel pieces in there like '/search/cruise+hotel+car+air/'. The order
of the pieces doesn't matter, because I split them on '+' when it gets
to the view.

Even if '+' is meant to be a space in URL-speak, it conveys the right
meaning for what I am trying to do. Is it possible to prevent
APPEND_SLASH from percent encoding URLs?
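
To illustrate where the '%2B' could come from, here is a standalone sketch. It is not Django's APPEND_SLASH code; it only assumes that the redirect target is built by adding the slash and then percent-encoding the path with '/' as the only safe character. The helper append_slash and its safe parameter are hypothetical.

```python
from urllib.parse import quote

def append_slash(path, safe='/'):
    """Hypothetical helper: add a trailing slash, then percent-encode the path."""
    if not path.endswith('/'):
        path += '/'
    return quote(path, safe=safe)

# With only '/' treated as safe, '+' is encoded, matching the observed redirect.
print(append_slash('/search/hotel+air'))             # /search/hotel%2Bair/
# Treating '+' as safe as well would leave it readable.
print(append_slash('/search/hotel+air', safe='/+'))  # /search/hotel+air/
```

Whether the real redirect can be configured this way is a separate question; the sketch only shows that the choice of safe characters decides the outcome.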



Karen Tracey

Sep 12, 2008, 2:14:45 PM
to django...@googlegroups.com
You didn't say why the percent-encoding is a problem. By the time the request gets to your view, the percent-encoding will be undone and your view code can deal with '+' chars. (At least, that is how a quickly modified view of my own sees it; sorry, but I do not have time for a more in-depth investigation at the moment.) So why is the percent-encoding done by the redirect a problem? It should not be one, so I'm having trouble understanding why you want to prevent it.

Karen
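
The point about the encoding being undone before the view runs can be checked in isolation. This is only a sketch with Python 3's urllib.parse, assuming the path segment is percent-decoded before it is split; the helper terms_from_segment is hypothetical and stands in for the view's splitting logic.

```python
from urllib.parse import unquote

def terms_from_segment(raw_segment):
    """Hypothetical helper: percent-decode one path segment, then split on '+'."""
    return unquote(raw_segment).split('+')

# The encoded and unencoded forms yield the same list of terms.
assert terms_from_segment('hotel+air') == terms_from_segment('hotel%2Bair')
print(terms_from_segment('cruise%2Bhotel%2Bcar%2Bair'))  # ['cruise', 'hotel', 'car', 'air']
```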

julianb

Sep 12, 2008, 4:58:49 PM
to Django users
A '+' doesn't have to be encoded, but it should be if it has no
special meaning. Maybe that's the problem here.
http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters

Why does APPEND_SLASH even percent-encode?

Ross

Sep 12, 2008, 6:06:23 PM
to Django users
The problem is it looks bad! It makes the URL unreadable, which is
what I want to prevent. For example, last.fm uses '+' symbols to
separate band names.

http://www.last.fm/music/Goo+Goo+Dolls pretty obviously takes you to
the page for the band "Goo Goo Dolls".

http://www.last.fm/music/Goo%2BGoo%2BDolls, however, is far tougher to
pick apart by a human reader.

My short answer is I want to keep my URLs human readable.


Karen Tracey

Sep 12, 2008, 11:15:48 PM
to django...@googlegroups.com
OK, so you are not actually seeing a code problem in your dispatching/views resulting from this?  That is what I was trying to determine: whether your code had to adapt to things coming in percent-encoded, because I didn't think it should, and if it did, I'd want to track down why.

If you just don't like how it looks in the browser address bar, then simply avoid having it happen. Write your URL regexes so that the trailing slash is optional; that way APPEND_SLASH never has to get involved and issue the redirect. Wouldn't that be easier than trying to change how APPEND_SLASH works?

Karen

Ross

Sep 13, 2008, 1:36:30 AM
to Django users
The URL encoding is not causing any problems in my code. Splitting the
parameter on '+' works the same whether the address is 'Goo+Goo+Dolls'
or 'Goo%2BGoo%2BDolls'.

I can certainly add additional expressions to catch URLs with and
without the trailing slash. I was just hoping there was an option to
do just that.


Karen Tracey

Sep 13, 2008, 2:25:23 AM
to django...@googlegroups.com
Change your pattern from:


r'^djangoproject/search/(?P<test>[\w\+]+)/$'

to:

r'^djangoproject/search/(?P<test>[\w\+]+)/?$'

The question mark makes the trailing slash optional: the pattern will match a URL with zero or one trailing slash.

Karen
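
The difference between the two patterns can be checked with Python's re module, outside of any urlconf. The names strict and lenient are just labels for this sketch, and the paths omit the leading slash on the assumption that it is stripped before matching.

```python
import re

# Original pattern: trailing slash required.
strict = re.compile(r'^djangoproject/search/(?P<test>[\w\+]+)/$')
# Modified pattern: trailing slash optional.
lenient = re.compile(r'^djangoproject/search/(?P<test>[\w\+]+)/?$')

for path in ('djangoproject/search/hotel+air/', 'djangoproject/search/hotel+air'):
    # Print whether each pattern matches the path.
    print(path, bool(strict.match(path)), bool(lenient.match(path)))
```

With the optional slash, both forms match and capture the same 'hotel+air' value, so neither one depends on a redirect to reach the view.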

 

Ross

Sep 13, 2008, 11:47:08 AM
to Django users
That will definitely do the trick, but the idea of APPEND_SLASH, from
what I understand, is that 'http://www.last.fm/music/Goo+Goo+Dolls/' and
'http://www.last.fm/music/Goo+Goo+Dolls' (with and without the
trailing slash) are considered different URLs. Appending the slash
normalizes all your URLs to always include a trailing slash, which
correctly routes search engines to the same page.

I am only writing a small internal app, so search engine optimization
is not a concern. However, if my app were public, I would want to make
APPEND_SLASH work for me.

Karen, thanks for all the help, by the way! I am going to go with the
optional slash in my regexes to get around APPEND_SLASH.
