mysql unicode issues

530 views
Skip to first unread message

david

unread,
Nov 14, 2007, 7:23:20 PM11/14/07
to sqlalchemy
Hi,

I can see that this has been discussed before, but I am totally
baffled as to what is happening.

I.
ubuntu
mysql version 5.0.38

with :
MySQL_python-1.2.2
SQLAlchemy-03.10
python-2.5

Using SA, I can insert and select unicode data with no problem. All
the mysql stuff looks like it is set to "latin1", the database wasn't
created with any special options etc., I added nothing special to the
connection string, but if I insert Russian characters (encoded in
utf-8) into a column with type Unicode, It *works* and if I select, I
get back the correct data. It even looks fine with the "mysql"
program.

II.
Now, I try the same thing on a red-hat system
with:
mysql version 4.1.9
MySQL_python-1.2.2
SQLAlchemy-03.10
python-2.4.2

I realize I am changing two things (besides the os): mysql version,
and python version.

BUT, I can't get *anything* with sa and unicode and mysql to work!

I ended up creating the database specifically with utf-8 charset to
get even things with MySQLdb to work (which I finally did). But for
anything to work, I had to do this beforehand:

cursor.execute("set collation_connection=utf8_general_ci")
cursor.execute("set collation_server=utf8_general_ci")
cursor.execute("set character_set_results=utf8")

With SA, when I try to do, say, an insert with something like:

ins = utest_t.insert({'lastname':
"hello"})
conn.execute(ins)

no matter what I do I get errors like:
sqlalchemy.exceptions.DBAPIError: (LookupError) unknown encoding:
latin1_swedish_ci

Does anyone know what I am doing wrong here? Or, how I can make things
right?

BTW, I have tried various versions of the connection string:

conn_string = "mysql://xx:xxx@localhost/utest?
use_unicode=1&charset=utf8"
conn_string = "mysql://xx:xxx@localhost/utest?
use_unicode=0&charset=utf8"
conn_string = "mysql://xx:xxx@localhost/utest"

but with the same results.

All I can say is, unicode on mysql is, well...... I won't say it.

Thanks!
David

jason kirtland

unread,
Nov 14, 2007, 8:33:49 PM11/14/07
to sqlal...@googlegroups.com

can't reproduce this on 0.3.10 (or 0.3.11 or 0.4). the attached test
passes fine for me with zero special setup on nearly identical setup.
(i'm testing with python 2.4.4.)

all that's needed for unicode in mysql is to ensure that the connection
and table encodings aren't configured with a latin1 legacy default. the
mysql unicode data support is actually the best of all the open source
databases, and it's tied with postgres for nearly functional unicode
schemas. (sqlite wins that one.)

to guess wildly, this:

> sqlalchemy.exceptions.DBAPIError: (LookupError) unknown encoding:
> latin1_swedish_ci

could be due to specifying a collation where a charset should be in the
database configuration somewhere.

test419.py

david

unread,
Nov 15, 2007, 8:40:31 PM11/15/07
to sqlalchemy
Hi Jason -

Thanks a lot for the test. It is very helpful.

However, when I try running it on the mysql-5.0.38 machine as a
starting point for testing, (with appropriate mods for version, etc),
I get very mixed results. There are 6 cases in your test:

1. use_unicode=1, charset='utf8', table_options={}
I get "Warning: Incorrect string value:", but it does stick stuff in
the database. However, they are all '?' (question marks)

2. use_unicode=1, charset='utf8', table_options={'mysql_charset':
'utf8'}
No warnings, but all question marks.

3. use_unicode=0, charset='utf8', table_options={}
No warnings, but all question marks.

4. use_unicode=0, charset='utf8', table_options={'mysql_charset':
'utf8'}
No warnings, but all question marks.

5. - - table_options={}
UnicodeDecodeError - 'latin-1' codec,...etc.

6. - - table_options={'mysql_charset':
'utf8'}
UnicodeDecodeError - 'latin-1' codec,...etc

Hmmmm. Not sure I understand this at all.....

jason kirtland

unread,
Nov 15, 2007, 9:06:40 PM11/15/07
to sqlal...@googlegroups.com

My output from a 5.0.41 is attached. The 4.1.9 output is the same
except for some wording changes in the encoding warnings. Both
instances have stock configurations right from the MySQL binary tarball,
so when 'table_options={}' runs here, the varchar columns are stored in
the default 'latin1'.

When you're seeing all question marks, is that in the script output or
in the mysql client? I don't think you'd see the cyrillic characters in
the client unless you're using a capable terminal and possibly doing a
'set charset utf8'.

output-5.0.41.txt

david

unread,
Nov 15, 2007, 11:56:04 PM11/15/07
to sqlalchemy
Ok,

For the time being, I seemed to have fixed/solved the current
problems.

1. On the 5.0.38 system I was making a dumb mistake (when I modded
your program) - when fixed, I obtained essentially the same results as
you.

My conclusion from this:

a. have use_unicode=0, set charset='utf8' on the connection, AND

b. ensure you have
table options = {'mysql_charset': 'utf8'} when creating tables, and
all should be well.

2. On the 4.2.9 system, we have a different story. Installed there is
a standard
mysql-standard-4.1.9-pc-linux-gnu-i686, without mods. Now, *why* it
has this problem, I don't know, but I found the fix for this, which
was preventing anything from working:

(LookupError) unknown encoding: > latin1_swedish_ci ...

Apparently, there is a problem in the MySQLdb connection (either
MySQLdb itself, or the mysql libs it calls), so that when you call
connection.set_character_set(), and subsequently
connection.character_set_name(), you never get a different name -
always the original (guess what, 'latin1_swedish_ci'). Sooooo, someone
posted this incredible kludge, but at first glance, it appears to work
- redefine the character_set_name() method to return 'utf8' no matter
what. This is ugly, to be sure, but for the time being I'll see how it
goes.

Thanks a *million* for your help - it's been invaluable.

NOW, for transferring the data from one database to another so that
the unicode works correctly. THAT sounds like fun...:)

David
> [output-5.0.41.txt]rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'plain1' at row 1
> cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni1' at row 1
> cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'plain2' at row 1
> cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni2' at row 1
> cursor.execute(statement, parameters)
> mysql:///test?use_unicode=1&charset=utf8 table options {}
> [(u'????', u'????', u'????', u'????')]
> mysql:///test?use_unicode=1&charset=utf8 table options {'mysql_charset': 'utf8'}
> [(u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449')]
> mysql:///test?use_unicode=0&charset=utf8 table options {}
> [('????', u'????', '????', u'????')]
> mysql:///test?use_unicode=0&charset=utf8 table options {'mysql_charset': 'utf8'}
> [('\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449')]
> mysql:///test? table options {}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> mysql:///test? table options {'mysql_charset': 'utf8'}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> OK
Reply all
Reply to author
Forward
0 new messages