Non ASCII URL characters in Django

Shivaraj

Jul 26, 2009, 9:42:58 AM
to Django users
Django doesn't allow urlencode or urlopen when there are non-ASCII
(international) characters in the URL.
The snippet at http://www.djangosnippets.org/snippets/1048/, which uses
text.encode('utf-8'), helped.

Reported this in http://code.djangoproject.com/ticket/2078 .
I just wanted to know whether this could be included as a settings parameter
or as some kind of include in urls.py.

Russell Keith-Magee

Jul 26, 2009, 7:55:10 PM
to django...@googlegroups.com
On Sun, Jul 26, 2009 at 9:42 PM, Shivaraj<shivr...@gmail.com> wrote:
>
> Django doesn't allow urlencode or urlopen when there are non-ASCII
> (international) characters in the URL.

Django does no such thing.

Firstly, urlencode and urlopen aren't part of Django's API, so this
isn't an issue of Django's making. Full unicode URLs work fine with
Django itself - if you don't believe me, create a test project that
contains a model with a CharField primary key, and create an object
that uses non-ASCII characters in that key. You'll find that you can
easily point your browser at /admin/myapp/mytest/<non-ascii-chars>/.

Secondly, the interaction of urllib and Unicode is a well-known problem:

http://www.google.com/search?&q=unicode+URL+python
http://bugs.python.org/issue216716
http://bugs.python.org/issue1712522
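
For readers on Python 3, here is a minimal sketch of how the standard library handles non-ASCII text in URLs, with no Django involved. It is only an illustration: the thread concerns Python 2's urllib, whose behaviour differs, and the sample path and query key are made up for the example.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Hypothetical sample values, chosen only for illustration.
path = quote('/район/')             # str is encoded as UTF-8, then percent-escaped
query = urlencode({'q': 'район'})   # same treatment for query-string values

# Both results are plain ASCII and round-trip back to the original text.
assert unquote(path) == '/район/'
assert parse_qs(query) == {'q': ['район']}
```

The point is that percent-encoding is done by the standard library, so it behaves the same at the interactive prompt and inside a view function.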

Yours,
Russ Magee %-)

Shivaraj

Jul 27, 2009, 3:26:52 AM
to Django users
Let me repeat the original question.
If I pass a non-ASCII character to urlopen / urlencode and try to open
a URL at the Python prompt, it works fine.
So it's not an issue with Python (2.6+) as of now.

Now I call the same functions from Django and it reports an error.
Here is a simple scenario: try unicode('район','utf-8') from
any view function and it will throw a TypeError stating that decoding is
not permitted, or something similar.



Russell Keith-Magee

Jul 27, 2009, 7:26:08 AM
to django...@googlegroups.com
On Mon, Jul 27, 2009 at 3:26 PM, Shivaraj<shivr...@gmail.com> wrote:
>
> Let me repeat the original question.
> If I pass a non-ASCII character to urlopen / urlencode and try to open
> a URL at the Python prompt, it works fine.
> So it's not an issue with Python (2.6+) as of now.
>
> Now I call the same functions from Django and it reports an error.

And let me repeat my original answer. You seem to be of the opinion
that Django is - or is capable of - doing something here. It isn't.
Django is just a library. It doesn't change the operation of the
functions in the standard library. If you're seeing errors, you will
see them consistently regardless of whether they are at the Python
prompt or being invoked from within a Django view, barring differences
in file encoding, etc - that is, the conditions that will alter the
operation of the standard library.

> Here is a simple scenario: try unicode('район','utf-8') from
> any view function and it will throw a TypeError stating that decoding is
> not permitted, or something similar.

It doesn't do that on my machine. I do get a SyntaxError if my file
doesn't have a PEP 263 compliant coding declaration
(# -*- coding: utf-8 -*-), since 'район' isn't a legal ASCII string.
If the coding declaration is present, then I don't get any error.

I do get a TypeError if I try to call unicode(u'район','utf-8') (i.e.,
passing in an explicitly Unicode string), but that's understandable,
since I'm asking for a UTF-8 decode of a string that is already
Unicode. This also happens regardless of whether I'm at a Python
prompt or in a Django view - which is understandable, since it's a
fundamentally incorrect way to invoke unicode().
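
The same distinction can be reproduced on Python 3, where unicode() no longer exists and str(value, encoding) plays the equivalent role. This is a sketch of the analogous behaviour, not the Python 2 code from the thread; the variable names are illustrative.

```python
# Bytes, i.e. what a Python 2 byte-string literal 'район' would hold in a UTF-8 file.
raw = 'район'.encode('utf-8')

# Decoding bytes into text works.
text = str(raw, 'utf-8')

# Asking for a decode of something that is already text raises TypeError.
try:
    str(text, 'utf-8')
except TypeError as exc:
    double_decode_error = exc
else:
    double_decode_error = None
```

Here double_decode_error ends up holding the TypeError, and nothing in the snippet depends on whether it runs at a prompt or inside a view.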

I don't know exactly what you're hitting here, but it's a lot more
subtle than "Django did it".

Yours,
Russ Magee %-)

Shivaraj

Jul 27, 2009, 9:14:41 AM
to Django users

Sorry, the issue was a small complication that I had overlooked.
I passed an escaped URL parameter through urls.py, which took a form like
http://mysite?param=%u0410%u0433%u0430 ....
and then I unquoted it and converted it to unicode with:
param = unquote(param)
result = param.replace('%u','\\u').decode('unicode_escape')

This gave me what was potentially already a unicode string, which I then
tried to pass to unicode() again, and that was raising the error.
Thanks for pointing this out.
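
For anyone hitting the same thing on Python 3, a rough sketch of decoding the non-standard %uXXXX escapes and then guarding against a second decode might look like the following. The helper names decode_percent_u and to_text are hypothetical, not part of Django or the standard library, and escapes that encode surrogate pairs are not handled.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX escape (four hex digits).
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')


def decode_percent_u(param):
    """Turn %uXXXX escapes into characters, then decode ordinary %XX escapes."""
    replaced = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), param)
    return unquote(replaced)


def to_text(value, encoding='utf-8'):
    """Decode bytes exactly once; return text unchanged instead of decoding again."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


param = decode_percent_u('%u0410%u0433%u0430')
result = to_text(param)   # already text, so no second decode is attempted
```

Checking the type before decoding is what avoids the double conversion described above.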


