I have a problem inserting non-ASCII characters into a MySQL table.
I get a UnicodeEncodeError when a user enters the euro sign (€) in a
newforms text field. The exception is raised when Django tries to write
to the MySQL database.
Do I have to encode (or decode) the string before I put it into the
database, or do I have to change DEFAULT_CHARSET?
This is the traceback:
UnicodeEncodeError at /mail/schreiben/2/
'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)
Request Method: POST
Request URL: http://127.0.0.1:8000/mail/schreiben/2/
Exception Type: UnicodeEncodeError
Exception Value: 'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)
Exception Location: /usr/lib/python2.4/site-packages/MySQLdb/connections.py in unicode_literal, line 179

/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py in get_response
/home/christian/sail/saildj/../saildj/mailing/views.py in writemail
/home/christian/sail/saildj/../saildj/mailing/models.py in senden
/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/db/models/base.py in save
/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/db/backends/util.py in execute
/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/db/backends/mysql/base.py in execute
/usr/lib/python2.4/site-packages/MySQLdb/cursors.py in execute
/usr/lib/python2.4/site-packages/MySQLdb/connections.py in literal
/usr/lib/python2.4/site-packages/MySQLdb/connections.py in unicode_literal
The POST data submitted by the user was just the euro sign:
POST:
Variable Value
betreff 'testmail'
thread_id ''
text '\xe2\x82\xac'
I hope you can help me...
Thanks,
Christian.
This doesn't solve the problem. The error message still appears with the
same traceback.
Any other ideas?
-rob
First, it depends on the database backend. As far as I know, MySQL wants
byte strings (whereas psycopg2, for example, lives happily with unicode
strings). If you use newforms, your data arrives as unicode, and you
should encode it before putting it into the database. If you don't do
this explicitly, Python will use whatever default encoding is set in your
environment. In your case that is latin-1, which can't encode "€"; this is
why you get the error.
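To make the failure concrete, here is a minimal sketch in Python 3 syntax (the thread itself is on Python 2.4, where the text type is `unicode` rather than `str`). The helper name `try_encode` is hypothetical. It shows that U+20AC simply has no latin-1 representation, while its UTF-8 encoding is exactly the three bytes visible in the POST data above.

```python
# Python 3 sketch; in Python 2.4 the same text would be u'\u20ac'.
euro = "\u20ac"  # the euro sign as newforms hands it over (a text string)


def try_encode(text, charset):
    """Hypothetical helper: return the encoded bytes, or None when the
    charset has no representation for some character in text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


latin1_result = try_encode(euro, "latin-1")  # fails: U+20AC is above 255
utf8_result = try_encode(euro, "utf-8")      # the same bytes as the POST data

print(latin1_result, utf8_result)
```

The latin-1 attempt is the same error the MySQLdb connection hits; the UTF-8 attempt succeeds because UTF-8 can represent every Unicode character.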
The next question is which charset to use for encoding. This depends on
how your database is set up. If it is configured to store data in utf-8,
which can encode every unicode character (in practice, just about every
character), then you simply encode your unicode strings as utf-8 and
everything will work fine. If the database is configured in some 'old
school' way with a legacy charset, things get more complicated: can this
charset store "€" at all? Will the database accept utf-8 even if it
doesn't recognize it as such? Do you need sorting? Will anyone else use
this database with other client software?
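For the first of those questions (can the legacy charset store "€" at all?), a quick check is to try encoding a sample string with each candidate. The sketch below is hypothetical: the candidate list and `charsets_that_fit` are illustrative, and the names are Python codec names, not necessarily the names your MySQL server uses for its charsets.

```python
# Hypothetical check: which candidate charsets can represent a sample string?
CANDIDATES = ["utf-8", "latin-1", "iso-8859-15", "cp1252", "ascii"]


def charsets_that_fit(text, candidates=CANDIDATES):
    """Return the candidates (in order) that can encode every character of text."""
    fitting = []
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character has no representation here
        fitting.append(charset)
    return fitting


result = charsets_that_fit("Nürnberg: 5 €")
print(result)
```

Plain latin-1 drops out because of "€", and ascii drops out because of "ü"; utf-8 always fits. This only answers the "can it store it" question, not the sorting or other-clients questions.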
To summarize: your storage (the database) and your input/output (the web)
really should use utf-8 to avoid problems with "strange" characters. If
you deal internally with unicode (which newforms produces for you), then
for now you should explicitly encode it to utf-8 until Django starts
doing this automatically.
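A minimal sketch of "explicitly encode to utf-8 before it goes to the database", assuming the form data is available as a plain dict of field names to text values. `encode_for_db` is a hypothetical helper, not a Django or MySQLdb API, and the field names are taken from the POST data above. It uses Python 3 syntax (`str`/`bytes`), where Python 2.4 would use `unicode`/`str`.

```python
def encode_for_db(data, charset="utf-8"):
    """Hypothetical helper: encode every text value of a form-data dict to
    bytes, leaving non-text values untouched, before handing the values to
    a database driver that expects byte strings."""
    return {
        key: value.encode(charset) if isinstance(value, str) else value
        for key, value in data.items()
    }


# Field names taken from the POST data in the original report.
form_data = {"betreff": "testmail", "thread_id": "", "text": "\u20ac"}
row = encode_for_db(form_data)
print(row)
```

Encoding once, at the boundary to the database, keeps the rest of the code working with unicode; the database must of course be set up to store utf-8 for the bytes to mean "€" when read back.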
We have a bit of chaos here: tickets #3370, #1356 and probably #952
are all about this problem, all are accepted, and #3370 and #1356
have very similar patches. I ask everybody to continue the discussion on
django-developers ("unicode issues in multiple tickets"), and I ask
the authors of these three tickets to work together to decide how
to proceed.
http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf
Michael
--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100
http://www.noris.de - The IT-Outsourcing Company