I have been puzzled for some time now by a UnicodeEncodeError exception that I get when Unicode characters are posted to the server.
After testing several options I can work around the exception, but it does not feel right, so I thought I would post the issue and the solution that seems to work for me here.
On this website a form gets posted to the server through an Ajax call. Basically the form is serialized using jQuery ($(".form").serialize();) and then sent to the server as one parameter. Since the whole form is posted as one parameter, the standard decoding on .GET/.POST won't help me; it just returns the (still) URL-encoded form data. So I need to decode it myself, which I do as follows:
from urllib import unquote
from django.http import QueryDict
return QueryDict(unquote(request.POST["f"]))
Now the QueryDict constructor raises an exception, but only if there are non-ASCII characters in the form. As an example, one field can be "Kroati%C3%AB":
File "/env/python/local/lib/python2.7/site-packages/django/http/request.py", line 357, in __init__
value = value.decode(encoding)
File "/env/python/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
I found that the query string passed to QueryDict is actually a unicode string (u'....') whose content is UTF-8 data. I think that is where the problem lies, because UTF-8 encoded data belongs in an 8-bit string (str(...)) instead of a unicode(...) object.
I can avoid this exception by casting the input to str().
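For what it's worth, here is a small sketch of why I believe the error is an encode error even though decode() was called. It rests on two assumptions of mine about Python 2 that the traceback itself does not spell out: that unquote() on a unicode input returns a unicode string in which every %XX escape has become one code point, and that unicode.decode() implicitly encodes with ASCII first. The explicit encode below mimics that hidden step, so the snippet behaves the same on Python 2 and 3.

```python
# Assumption: this is what unquote(u'Kroati%C3%AB') yields on Python 2,
# i.e. the two UTF-8 bytes C3 AB turned into two separate code points.
text = u'Kroati\xc3\xab'

# Python 2's unicode.decode() implicitly runs encode('ascii') first;
# doing that step explicitly reproduces the failure.
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Mapping each code point back to one byte (latin-1) recovers the raw
# UTF-8 bytes, which then decode to the intended character.
fixed = text.encode('latin-1').decode('utf-8')
```

The failing characters start at index 6, which would match the "position 6-7" in the error message.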
Simplified in a shell session:
>>> unquote(u'Kroati%C3%AB').decode('utf-8')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/pbor/djangoVillaSearch/site/env/python/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-7: ordinal not in range(128)
>>> unquote('Kroati%C3%AB').decode('utf-8')
u'Kroati\xeb'
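The working case above suggests doing the percent-decoding on a byte string and only then decoding UTF-8. A minimal sketch of that idea follows. unquote_utf8 is a hypothetical helper name of mine, and the try/except import is only there so the snippet also runs on Python 3, where unquote_to_bytes plays the role of Python 2's unquote on a str.

```python
try:
    from urllib import unquote as unquote_bytes  # Python 2: str in, str out
except ImportError:
    from urllib.parse import unquote_to_bytes as unquote_bytes  # Python 3


def unquote_utf8(value):
    """Percent-decode value and return the result as a unicode string."""
    if not isinstance(value, bytes):
        # Percent-encoded data should be pure ASCII; if it is not,
        # encode() raises instead of silently mangling the input.
        value = value.encode('ascii')
    # Unquote at the byte level, then interpret the bytes as UTF-8.
    return unquote_bytes(value).decode('utf-8')
```

With this, unquote_utf8(u'Kroati%C3%AB') and unquote_utf8('Kroati%C3%AB') should both give u'Kroati\xeb'.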
Does this make sense?
Paul