Python unicode issues when running in django environment

Tom Evans

Mar 17, 2010, 12:14:57 PM
to django...@googlegroups.com
Hi all

I have a weird problem with unicode conversion whilst running in the
django environment. I'm not convinced django is the cause, but
hopefully someone will have seen something like this and can point me
in the right direction.

The original code was an XML-RPC call from one django server to
another, and the problem occurred when attempting to extract the
parameters from the utf-8 encoded XML, but fortunately I can reproduce
it much more simply. The basic process is that a unicode string is
encoded to a utf-8 bytestring, and put into an XML document. It is
then extracted from the XML by the other server and then
re-interpreted as a unicode string, which should look like this:

$ python
Python 2.5.2 (r252:60911, Apr 25 2008, 17:25:09)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> u'£'
u'\xa3'
>>> u'£'.encode('utf-8')
'\xc2\xa3'
>>> unicode(u'£'.encode('utf-8'), 'utf-8')
u'\xa3'
>>> print unicode(u'£'.encode('utf-8'), 'utf-8')
£

As you can see, exactly as expected. However, when running under
'manage.py shell', the behaviour is quite different:

Python 2.5.2 (r252:60911, Apr 25 2008, 17:25:09)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u'£'
u'\xc2\xa3'
>>> u'£'.encode('utf-8')
'\xc3\x82\xc2\xa3'
>>> unicode(u'£'.encode('utf-8'), 'utf-8')
u'\xc2\xa3'
>>> print unicode(u'£'.encode('utf-8'), 'utf-8')
£

Any pointers?

Cheers

Tom

Karen Tracey

Mar 17, 2010, 12:33:14 PM
to django...@googlegroups.com
On Wed, Mar 17, 2010 at 12:14 PM, Tom Evans <tevans.uk@googlemail.com> wrote:
> Any pointers?

http://bugs.python.org/issue1288615

It's fixed in current Python 2.6, I believe.

However, this is a shell-only issue, so I'm not sure it will explain your original problem.

Karen

Tom Evans

Mar 17, 2010, 1:08:29 PM
to django...@googlegroups.com
On Wed, Mar 17, 2010 at 4:33 PM, Karen Tracey <kmtr...@gmail.com> wrote:
> On Wed, Mar 17, 2010 at 12:14 PM, Tom Evans <teva...@googlemail.com> wrote:
>>
>> Any pointers?
>
> http://bugs.python.org/issue1288615
>
> It's fixed in current Python 2.6, I believe.
>
> However, this is a shell-only issue, so I'm not sure it will explain your
> original problem.
>
> Karen
>

Hmm, still a bit confused why it doesn't exhibit the issue under a
regular python 2.5 shell if it is a python 2.5 issue. Something
definitely 'tweaks' the environment going through the django shell
rather than the python shell. Must only be triggered in certain edge
cases, I suppose.

However, that does explain the failing tests. I am initiating one side
of the RPC from the shell, for testing, like so:

>>> checkBounceUser('10.0.0.1', {'X-Hi': u'£' }, sp=settings.SSO_ID)

which then prints out the extracted arguments in the console of the
other server:
name: X-Hi value: £
repr(name): u'X-Hi' repr(value): u'\xc2\xa3'

If I give it an explicit unicode codepoint, there isn't the parsing bug, and so:

>>> checkBounceUser('10.0.0.1', {'X-Hi': u'\xa3' }, sp=settings.SSO_ID)
name: X-Hi value: £
repr(name): u'X-Hi' repr(value): u'\xa3'

As always, garbage in => garbage out, so that explains why nothing I
was doing helped!
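
The garbage-in, garbage-out point can be checked without either server. The sketch below is only an illustration: round_trip and the method name "check" are hypothetical stand-ins, not the real checkBounceUser call, and it uses the standard library's XML-RPC marshalling directly. It suggests the transport hands back exactly the string it was given, whether that string was correct or already garbled.

```python
try:
    import xmlrpc.client as xmlrpclib  # Python 3
except ImportError:
    import xmlrpclib  # Python 2


def round_trip(headers):
    """Marshal a dict as an XML-RPC request, then unmarshal it again."""
    # "check" is a placeholder method name, not part of the original code.
    payload = xmlrpclib.dumps((headers,), methodname="check", encoding="utf-8")
    params, _method = xmlrpclib.loads(payload)
    return params[0]


good = round_trip({u"X-Hi": u"\xa3"})     # the pound sign itself
bad = round_trip({u"X-Hi": u"\xc2\xa3"})  # the value the Django shell produced

print(repr(good[u"X-Hi"]))
print(repr(bad[u"X-Hi"]))
```

Both values come back unchanged, so the fix has to happen where the string is first created, not in the RPC layer.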

Thanks for the pointers Karen!

Cheers

Tom

Karen Tracey

Mar 17, 2010, 1:29:03 PM
to django...@googlegroups.com
On Wed, Mar 17, 2010 at 1:08 PM, Tom Evans <tevans.uk@googlemail.com> wrote:
> Hmm, still a bit confused why it doesn't exhibit the issue under a
> regular python 2.5 shell if it is a python 2.5 issue. Something
> definitely 'tweaks' the environment going through the django shell
> rather than the python shell. Must only be triggered in certain edge
> cases, I suppose.

I don't remember the details exactly, but I believe it's a bug for input read by code.interact(). manage.py shell calls code.interact() to present the interpreter shell, and what you enter at the shell is processed as input by code.interact(). The Python shell invoked directly via the python executable does not use code.interact(), so you see the problem when running under python manage.py shell, but not the plain Python shell.
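
The values in the original post are consistent with the two UTF-8 bytes of the pound sign each being read as a separate Latin-1 character. Here is a minimal sketch of that effect, independent of code.interact() itself. misread_as_latin1 is a hypothetical helper, and the b'' literals assume Python 2.6 or later rather than the 2.5 used above.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


pound = u"\xa3"                     # the pound sign, written as an escape
garbled = misread_as_latin1(pound)  # what the Django shell reported for u'£'

# Same values as in the manage.py shell session from the first message.
assert garbled == u"\xc2\xa3"
assert garbled.encode("utf-8") == b"\xc3\x82\xc2\xa3"

# Reversing the two steps recovers the original character.
assert garbled.encode("latin-1").decode("utf-8") == pound

print(repr(garbled))
```

The last assertion also hints at a way to repair a value that has already been garbled this way, though avoiding the bad input in the first place is cleaner.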

 
> However, that does explain the failing tests. I am initiating one side
> of the RPC from the shell, for testing, like so:
>
> >>> checkBounceUser('10.0.0.1', {'X-Hi': u'£' }, sp=settings.SSO_ID)
>
> which then prints out the extracted arguments in the console of the
> other server:
> name: X-Hi value: £
> repr(name): u'X-Hi' repr(value): u'\xc2\xa3'
>
> If I give it an explicit unicode codepoint, there isn't the parsing bug, and so:
>
> >>> checkBounceUser('10.0.0.1', {'X-Hi': u'\xa3' }, sp=settings.SSO_ID)
> name: X-Hi value: £
> repr(name): u'X-Hi' repr(value): u'\xa3'

Another workaround is to avoid using manage.py shell; instead use the plain python shell with DJANGO_SETTINGS_MODULE set in the environment.
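
A rough sketch of that workaround from inside a plain python session, assuming a settings module called mysite.settings (a placeholder; substitute the project's own). Exporting the variable in the environment before starting python has the same effect.

```python
import os

# "mysite.settings" is a placeholder for the project's real settings module.
# setdefault() leaves the variable alone if the environment already sets it.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

print(os.environ["DJANGO_SETTINGS_MODULE"])

# Django imports made after this point read their configuration from that
# module. Django 1.7 and later additionally need django.setup() before
# models are imported.
```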

Karen