IntegrityError, MySQL, unicode strings equality (accents and umlauts)

27 views
Skip to first unread message

mezhaka

unread,
Aug 22, 2007, 6:28:35 AM8/22/07
to Django users
Hi djangers!

I've accidentally bumped into an IntegrityError problem as I added
unique_together = (("word", "language"),)
to my Keyword class.
Debugging the problem I was surprised to discover the following
behavior in MySQL shell:

mysql> select 'm n'='man';
+--------------+
| 'm n'='man' |
+--------------+
| 1 |
+--------------+

So it seems like MySQL cannot tell the difference between those m n
and man?! That sounds weird. It might be some MySQL options I need to
set.
Can anyone point me to some docs were I can dig for the answer? Or
just had the same issue?

ludo

unread,
Aug 22, 2007, 7:49:49 AM8/22/07
to Django users
It should be a collation issue.

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select 'm n'='men';
+--------------+
| 'm n'='men' |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)

You might want to use the SELECT BINARY operator:

mysql> select binary 'm n'='men';
+---------------------+
| binary 'm n'='men' |
+---------------------+
| 0 |
+---------------------+
1 row in set (0.00 sec)

Or set the default collation for the table in question.

With utf8_general_ci, which transliterates non-ascii characters:

mysql> create table test (a varchar(8), b varchar(8)) character set
utf8 collate utf8_general_ci;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into test values (' ', 'e');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test where a=b;
+------+------+
| a | b |
+------+------+
| | e |
+------+------+
1 row in set (0.00 sec)

With binary collation:

mysql> alter table test character set utf8 collate utf8_bin;
Query OK, 0 rows affected (0.09 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> insert into test values (' ', 'e');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test where a=b;
Empty set (0.00 sec)

Or set binary collation only for the filed which will contain unicode
chars:

mysql> create table test (a varchar(8) collate utf8_bin, b varchar(8))
character set utf8 collate utf8_general_ci;
Query OK, 0 rows affected (0.08 sec)

mysql> insert into test values (' ', 'e');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test where a=b;
Empty set (0.00 sec)

You can find the manual pages for MySQL collations and related issues
here

http://dev.mysql.com/doc/refman/4.1/en/charset-collations.html

Ludo

Reply all
Reply to author
Forward
0 new messages