I have a weird problem with Unicode conversion while running in the
Django environment. I'm not convinced Django is the cause, but
hopefully someone will have seen something like this and can point me
in the right direction.
The original code was an XML-RPC call from one django server to
another, and the problem occurred when attempting to extract the
parameters from the utf-8 encoded XML, but fortunately I can reproduce
it much more simply. The basic process is that a unicode string is
encoded to a utf-8 bytestring, and put into an XML document. It is
then extracted from the XML by the other server and then
re-interpreted as a unicode string, which should look like this:
$ python
Python 2.5.2 (r252:60911, Apr 25 2008, 17:25:09)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> u'£'
u'\xa3'
>>> u'£'.encode('utf-8')
'\xc2\xa3'
>>> unicode(u'£'.encode('utf-8'), 'utf-8')
u'\xa3'
>>> print unicode(u'£'.encode('utf-8'), 'utf-8')
£
As you can see, exactly as expected. However, when running under
'manage.py shell', the behaviour is quite different:
Python 2.5.2 (r252:60911, Apr 25 2008, 17:25:09)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u'£'
u'\xc2\xa3'
>>> u'£'.encode('utf-8')
'\xc3\x82\xc2\xa3'
>>> unicode(u'£'.encode('utf-8'), 'utf-8')
u'\xc2\xa3'
>>> print unicode(u'£'.encode('utf-8'), 'utf-8')
£
Any pointers?
Cheers
Tom
Hmm, still a bit confused why it doesn't exhibit the issue under a
regular python 2.5 shell if it is a python 2.5 issue. Something
definitely 'tweaks' the environment going through the django shell
rather than the python shell. Must only be triggered in certain edge
cases, I suppose.
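
One way to compare the two shells is to print the encodings each one reports. The snippet below is only a sketch, and `encoding_report` is a hypothetical helper name, not something from Django. It may well report identical values in both shells, which would itself be a useful data point.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings this interpreter session reports.

    Hypothetical diagnostic helper: paste it into both the plain python
    shell and the 'manage.py shell' session and compare the results.
    """
    return {
        # Encoding Python believes the terminal input uses (may be None).
        'stdin': getattr(sys.stdin, 'encoding', None),
        # Encoding Python uses when writing to the terminal (may be None).
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding used for implicit str/unicode conversions.
        'default': sys.getdefaultencoding(),
        # Encoding derived from the locale settings of the process.
        'locale': locale.getpreferredencoding(),
    }


report = encoding_report()
for name in sorted(report):
    print('%s: %r' % (name, report[name]))
```
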
However, that does explain the failing tests. I am initiating one side
of the RPC from the shell, for testing, like so:
>>> checkBounceUser('10.0.0.1', {'X-Hi': u'£' }, sp=settings.SSO_ID)
which then prints out the extracted arguments in the console of the
other server:
name: X-Hi value: £
repr(name): u'X-Hi' repr(value): u'\xc2\xa3'
If I give it an explicit Unicode code point instead, the parsing bug doesn't appear:
>>> checkBounceUser('10.0.0.1', {'X-Hi': u'\xa3' }, sp=settings.SSO_ID)
name: X-Hi value: £
repr(name): u'X-Hi' repr(value): u'\xa3'
As always, garbage in => garbage out, so that explains why nothing I
was doing helped!
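
The values above are what you would get if the two UTF-8 bytes typed at the terminal were each read as a separate Latin-1 character. That reading is an assumption on my part, not something confirmed in this thread, but the sketch below reproduces the same values. The `repair` helper is hypothetical and only illustrates undoing one round of the misreading.

```python
# Sketch only: assumes the shell read UTF-8 terminal input as Latin-1.

pound = u'\xa3'                        # the intended character, POUND SIGN

# A UTF-8 terminal sends the pound sign as the two bytes 0xC2 0xA3.
wire = pound.encode('utf-8')

# Reading those bytes as Latin-1 turns each byte into its own code point,
# giving the two-character string u'\xc2\xa3' seen in the Django shell.
misread = wire.decode('latin-1')

# Encoding the misread string yields the four bytes 0xC3 0x82 0xC2 0xA3.
double_encoded = misread.encode('utf-8')


def repair(text):
    """Hypothetical helper: undo one round of UTF-8 read as Latin-1."""
    return text.encode('latin-1').decode('utf-8')


print(repr(misread))
print(repr(double_encoded))
print(repr(repair(misread)))
```

Passing the explicit escape u'\xa3' avoids the problem because no terminal bytes need to be interpreted at all.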
Thanks for the pointers, Karen!
Cheers
Tom