Capturing full URL string


John M

Jun 4, 2008, 12:30:11 PM
to Django users
I am running into some weirdness in an app I'm writing, so I thought
I'd try to see how the basics of URL strings are handled.

So I wrote a one-line hello world app, and wanted to see how the dev
server outputs its results. I am still getting my feet wet with the
whole web / HTTP / HTML thing, so these may be silly questions.

Here's my views.py:
from django.http import HttpResponse

def announce(request):
    # Minimal view: always answer with a plain "Hello" body.
    return HttpResponse("Hello")

the urls.py maps announce/$ to this view, and that is working.

What's odd is that when I go to http://localhost:8000/announce in my
browser, the dev server prints this output:
[04/Jun/2008 09:22:20] "GET /announce HTTP/1.1" 301 0
[04/Jun/2008 09:22:20] "GET /announce/ HTTP/1.1" 200 5

I'm unclear as to what these are. The 301 looks like it's 'fixing' my
URL by redirecting to /announce/ instead of the /announce that I put
in the URL?

The reason for all this is that I'm having trouble (described in
another post) with a torrent tracker I'm trying to write, and I'm not
getting the parameters passed as I would expect.

Any information about this behavior would be great.

Thanks

John

John M

Jun 4, 2008, 12:54:30 PM
to Django users
Well, in investigating this further, I think I found the 'issue'.
There are a few threads with similar issues, where the pattern in
urls.py doesn't capture the trailing slash, and I guess by default,
django figures it will 'adjust' the URL and redirect to the 'correct'
URL, hence the initial 301, which I think for most cases is great.

My original urls.py was:

(r'^announce/$', 'track.views.announce'),

and that was causing a 301 if I went to 127.0.0.1:8000/announce.

If I add

(r'^announce$', 'track.views.announce')

so that my urls.py looks like this:

(r'^announce/$', 'track.views.announce'),
(r'^announce$', 'track.views.announce')

then it runs as expected, without the 301 redirect.
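
As an aside, a single pattern with an optional trailing slash should match both forms and avoid the duplicate entry. A minimal sketch using only Python's re module; the pattern `^announce/?$` is an assumption of mine and not part of the original urls.py, and the paths are written without a leading slash because that is how Django matches them:

```python
import re

# Hypothetical combined pattern: the "/?" makes the trailing slash optional.
ANNOUNCE_PATTERN = re.compile(r'^announce/?$')

def matches_announce(path):
    """Return True if the slash-stripped request path hits the pattern."""
    return ANNOUNCE_PATTERN.match(path) is not None

for candidate in ('announce', 'announce/', 'announce/extra'):
    print(candidate, matches_announce(candidate))
```

If something like this works in urls.py, both /announce and /announce/ would reach the view directly, with no redirect in between.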

So here's the rub: this is probably great for most applications, but
when Django does the redirect under the dev server with a complicated
escaped parameter, it seems to mangle the parameters on the redirect.

To test this, I pointed a BitTorrent client at my test announce view.
It doesn't do anything but return 'hello world', so the client will
just assume it's a bad tracker, but that is fine for testing.

With the urls.py w/o adjusting for 301 'features':

urls.py
(r'^announce/$', 'track.views.announce')

runserver output:

[04/Jun/2008 09:47:29] "GET /announce?info_hash=%10%C2%E1%96%E0%8D%90%05%B7%DF%C6%BC%8E%C2%15%E4%3D%60%CC%84&peer_id=T03I-----7oWWIp1FS9B&port=57328&uploaded=0&downloaded=0&left=0&no_peer_id=1&compact=1&event=started&key=-UIpWN HTTP/1.1" 301 0

[04/Jun/2008 09:47:29] "GET /announce/?uploaded=0&compact=1&no_peer_id=1&info_hash=%10%EF%BF%BD%EF%BF%BD%EF%BF%BD%05%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%CC%84&event=started&downloaded=0&key=-UIpWN&peer_id=T03I-----7oWWIp1FS9B&port=57328&left=0 HTTP/1.1" 200 5

With the urls.py adjusted to avoid the 301:
(r'^announce$', 'track.views.announce'),
(r'^announce/$', 'track.views.announce'),

runserver output:
[04/Jun/2008 09:49:36] "GET /announce?info_hash=%10%C2%E1%96%E0%8D%90%05%B7%DF%C6%BC%8E%C2%15%E4%3D%60%CC%84&peer_id=T03I-----6HsE05d95lH&port=17242&uploaded=0&downloaded=0&left=0&no_peer_id=1&compact=1&event=started&key=Jq-8Ta HTTP/1.1" 200 5

Note the difference?

When django does the redirect for the 301 the parameters change,
specifically info_hash:

before 301
info_hash=%10%C2%E1%96%E0%8D%90%05%B7%DF%C6%BC%8E%C2%15%E4%3D%60%CC%84

After 301
info_hash=%10%EF%BF%BD%EF%BF%BD%EF%BF%BD%05%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%CC%84
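
To compare the two values byte by byte, they can be percent-decoded. A small sketch using Python 3's urllib.parse; the variable names are mine, and the two strings are copied from the log output above:

```python
from urllib.parse import unquote_to_bytes

before = "%10%C2%E1%96%E0%8D%90%05%B7%DF%C6%BC%8E%C2%15%E4%3D%60%CC%84"
after = ("%10%EF%BF%BD%EF%BF%BD%EF%BF%BD%05%EF%BF%BD%EF%BF%BD%EF%BF%BD"
         "%EF%BF%BD%EF%BF%BD%EF%BF%BD%CC%84")

before_bytes = unquote_to_bytes(before)
after_bytes = unquote_to_bytes(after)

# The original value is the raw 20-byte hash sent by the client.
print(len(before_bytes), before_bytes)
# The redirected value is longer and no longer the same bytes.
print(len(after_bytes), after_bytes)
# It is valid UTF-8 and decodes to mostly U+FFFD characters.
print(ascii(after_bytes.decode("utf-8")))
```

The repeated %EF%BF%BD groups are the UTF-8 encoding of U+FFFD, so the original hash bytes are no longer present in the redirected URL.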

Am I right in my observations?

Is there anything I can do to avoid this in Django? HELP, THIS IS
REALLY HOLDING ME UP. In the meantime, I'll capture the slashes
better.

Thanks,

John




Gregor Müllegger

Jun 4, 2008, 1:35:40 PM
to Django users
This is because Django redirects you to the URL with a slash appended
if it's not already there, as you have discovered.

To understand why this is done you should read the following section
in django's documentation:
http://www.djangoproject.com/documentation/middleware/#django-middleware-common-commonmiddleware

To prevent the trailing-slash redirect, set APPEND_SLASH to False in
your settings module.
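
For reference, a minimal sketch of that setting, assuming a standard settings.py with CommonMiddleware enabled (the setting only has an effect when that middleware is active):

```python
# settings.py (fragment)

# Stop CommonMiddleware from redirecting /announce to /announce/.
APPEND_SLASH = False
```

With this in place, a request for /announce that matches no URL pattern should simply 404 instead of being redirected.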

Gregor

Matthias Kestenholz

Jun 4, 2008, 1:35:43 PM
to django...@googlegroups.com

John M

Jun 4, 2008, 2:03:08 PM
to Django users
Yes, I understand that, and I think it's a good thing, but when it
redirects, it mangles the parameters, would you agree?

J


Karen Tracey

Jun 4, 2008, 3:59:52 PM
to django...@googlegroups.com
On Wed, Jun 4, 2008 at 2:03 PM, John M <retire...@gmail.com> wrote:
> Yes, I understand that, and I think it's a good thing, but when it
> redirects, it mangles the parameters, would you agree?

Yes, I think that's a bug in Django.  The code that is doing the APPEND_SLASH handling tries to use request.GET.urlencode() to restore the original query parameters to the new url it has generated (specifically here: http://code.djangoproject.com/browser/django/trunk/django/middleware/common.py#L83).   However this fails to reconstitute the original query parameters when they were not in fact valid utf-8 to begin with (as your info_hash is not).  Back when the GET QueryDict was constructed, this code:

http://code.djangoproject.com/browser/django/trunk/django/http/__init__.py#L156

took the info_hash bytestring with repr '\x10\xc2\xe1\x96\xe0\x8d\x90\x05\xb7\xdf\xc6\xbc\x8e\xc2\x15\xe4=`\xcc\x84' and generated the unicode string with repr u'\x10\ufffd\ufffd\ufffd\x05\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0304' in its place. The request was assumed to be encoded in utf-8 for want of any better information.

All those '\ufffd's are the Unicode replacement character, indicating that the input bytestring contained invalid utf-8 sequences. (For example, while the first byte \x10 is a valid 1-byte utf-8 sequence, the next two bytes \xc2 \xe1 are not. \xc2 is a valid first byte for a 2-byte sequence, but the 2nd byte must then begin with binary 10, where \xe1 begins binary 11. So those two bytes are tossed and '\ufffd' put in their place.)

At this point there is no way to go back to the original input since generating the replacement char in place of invalid input throws away the original information. When the APPEND_SLASH code tries to urlencode() this unicode version of the query string, you see a lot of %EF%BF%BD because \xef\xbf\xbd is the 3-byte utf-8 encoding of the Unicode replacement character \ufffd. Clear as mud?
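
The lossy round trip can be reproduced in a few lines. This is a sketch in Python 3, not the Django code itself. A modern decoder may place the replacement characters differently from the Python of 2008, so the exact output can differ from the repr quoted above:

```python
from urllib.parse import quote

# The raw info_hash bytes from the request.
raw = (b"\x10\xc2\xe1\x96\xe0\x8d\x90\x05\xb7\xdf"
       b"\xc6\xbc\x8e\xc2\x15\xe4=`\xcc\x84")

# Decoding as UTF-8 substitutes U+FFFD for each invalid sequence.
decoded = raw.decode("utf-8", errors="replace")
print(ascii(decoded))

# Re-encoding cannot recover the bytes that were replaced.
print(decoded.encode("utf-8") == raw)

# Each U+FFFD is percent-encoded as its 3-byte UTF-8 form.
print(quote("\ufffd"))
```

Whatever the exact placement of the replacement characters, the comparison shows that the original bytes cannot be rebuilt from the decoded string.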

Anyway I think line 83 of django/middleware/common.py should be:

    newurl += '?' + request.META['QUERY_STRING']

instead of:
 
    newurl += '?' + request.GET.urlencode()

That will ensure that the query parameters included in the redirect url are identical to what was included in the original url.
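
To illustrate the difference between the two approaches outside Django, here is a sketch using only the standard library. The helper names are hypothetical, and parse_qsl/urlencode merely stand in for what request.GET.urlencode() does; this is not the middleware's actual code:

```python
from urllib.parse import parse_qsl, urlencode

def redirect_with_raw_query(path, query_string):
    """Append the slash and reuse the query string exactly as received."""
    return path + "/" + ("?" + query_string if query_string else "")

def redirect_with_reencoded_query(path, query_string):
    """Append the slash, then decode and re-encode the parameters (lossy)."""
    params = parse_qsl(query_string, keep_blank_values=True,
                       encoding="utf-8", errors="replace")
    return path + "/" + ("?" + urlencode(params) if params else "")

# Shortened, illustrative query string with non-UTF-8 bytes in info_hash.
qs = "info_hash=%10%C2%E1%96&port=57328"
print(redirect_with_raw_query("/announce", qs))
print(redirect_with_reencoded_query("/announce", qs))
```

The first helper passes the non-UTF-8 bytes through untouched, while the second replaces them with %EF%BF%BD sequences, which is the behavior seen in the runserver log.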

Karen

John M

Jun 4, 2008, 4:35:56 PM
to Django users
Well, thank God you took a look at the code and agreed on my
findings. I'll just adjust my urls.py for now.

Should I submit a bug report? (it'd be my first :) )

Thanks again for your time on this, I'm glad it was a bug and not my
misunderstanding of Django or the way this all works together. Now I
can move on and continue my app.

John


Leeland (The Code Janitor)

Jun 5, 2008, 1:02:52 PM
to Django users
I read this with great interest. It actually will save me time on my
project. I will be encountering this behavior shortly. Thank you!

Please, please submit a bug report on this with all the details here.
If people do not submit bug reports, future users of Django will
eventually have the same problem and lose hours and hours searching
for the cause (like Karen, Matthias, Gregor and yourself just did).
Make that hard work you just did troubleshooting this count!

+ Leeland

Karen Tracey

Jun 5, 2008, 1:28:45 PM
to django...@googlegroups.com
On Wed, Jun 4, 2008 at 4:35 PM, John M <retire...@gmail.com> wrote:
> Well, thank God you took a look at the code and agreed on my
> findings. I'll just adjust my urls.py for now.
>
> Should I submit a bug report? (it'd be my first :) )
>
> Thanks again for your time on this, I'm glad it was a bug and not my
> misunderstanding of Django or the way this all works together. Now I
> can move on and continue my app.

Yes, I think this is worth a ticket (assuming one doesn't exist already -- I don't have time to search).

Karen

John M

Jun 5, 2008, 5:41:04 PM
to Django users
OOOO How exciting, I'm actually getting involved (via Karen of course).
I'll submit today and put the ticket number back here.

Thanks so much.

John


John M

Jun 5, 2008, 6:21:42 PM
to Django users