Strange behavior with MySQL and UNIQUE (#7789)

247 views
Skip to first unread message

David Danier

unread,
Jul 24, 2008, 5:17:30 AM7/24/08
to django-d...@googlegroups.com
Hi devs,

I ran into a strange problem using MySQL. It seems like UNIQUE in MySQL
is case insensitive, so you cannot create "foo" if you have "FOO" in
your database.

You can test this yourself using the following SQL-statements:
---------8<------------------------------------------------------
CREATE TEMPORARY TABLE tempfoobar (
name varchar(100) UNIQUE
) TYPE=HEAP;
INSERT INTO tempfoobar VALUES("name");
SELECT * FROM tempfoobar WHERE name="NAME";
INSERT INTO tempfoobar VALUES("NAME");
DROP TABLE tempfoobar;
------------------------------------------------------>8---------

So, why does this in any way affect Django? (Skip this if you know #7789)

Django provides a simple way to create objects, if they don't exist yet,
called get_or_create(). This method used get() to test if the object is
there (which is case sensitive) and then tries to save a new object
(which is case insensitive). To work around race conditions the method
catches IntegrityError's and tries to fetch the object again.

Now with MySQL fetching the object does not work, but INSERT does not
work either as UNIQUE is case insensitive. Now get_or_create() raises a
DoesNotExist exception, which took me a while to figure out why.

About the impact of this issue: I use django-tagging
(http://code.google.com/p/django-tagging/) in my projects. After
switching to mysql on my development machine tagging objects became
nearly impossible. You always had to type in tags exacple the same way
they where in the database, otherwise saving the object was not
possible. This this issue might break working applications.

So, what to do about this? This issue should at least be documented, it
seems that it came up on this list before and there even is a bug-report
(#7789).

But I would prefer to fix it, which is actually pretty easy:
---------8<------------------------------------------------------
CREATE TEMPORARY TABLE tempfoobar (
name varchar(100) BINARY UNIQUE
) TYPE=HEAP;
INSERT INTO tempfoobar VALUES("name");
SELECT * FROM tempfoobar WHERE name="NAME";
INSERT INTO tempfoobar VALUES("NAME");
DROP TABLE tempfoobar;
------------------------------------------------------>8---------
(note the "BINARY")

I have attached a patch
(http://code.djangoproject.com/attachment/ticket/7789/django-mysql-case-sensitive-unique.patch)
for current trunk to #7789, perhaps this should go into Django before 1.0?

Greetings, David Danier

Karen Tracey

unread,
Jul 24, 2008, 10:43:43 AM7/24/08
to django-d...@googlegroups.com
On Thu, Jul 24, 2008 at 5:17 AM, David Danier <goliath.m...@gmx.de> wrote:

Hi devs,

I ran into a strange problem using MySQL. It seems like UNIQUE in MySQL
is case insensitive, so you cannot create "foo" if you have "FOO" in
your database.

[...snip details...]


So, what to do about this? This issue should at least be documented, it
seems that it came up on this list before and there even is a bug-report
(#7789).

But I would prefer to fix it, which is actually pretty easy:
---------8<------------------------------------------------------
CREATE TEMPORARY TABLE tempfoobar (
       name varchar(100) BINARY UNIQUE
) TYPE=HEAP;
INSERT INTO tempfoobar VALUES("name");
SELECT * FROM tempfoobar WHERE name="NAME";
INSERT INTO tempfoobar VALUES("NAME");
DROP TABLE tempfoobar;
------------------------------------------------------>8---------
(note the "BINARY")

I have attached a patch
(http://code.djangoproject.com/attachment/ticket/7789/django-mysql-case-sensitive-unique.patch)
for current trunk to #7789, perhaps this should go into Django before 1.0?

Yes, there's an issue here, but the fix is not that simple.  If you make the column type a binary varchar, than an iexact() (case-insensitive) lookup, which Django maps to LIKE for MySQL, will fail to find rows that should match:

mysql> describe testfail_author;
+----------+-------------+------+-----+---------+----------------+
| Field    | Type        | Null | Key | Default | Extra          |
+----------+-------------+------+-----+---------+----------------+
| id       | int(11)     | NO   | PRI | NULL    | auto_increment |
| name     | varchar(10) | NO   |     |         |                |
| num      | int(11)     | NO   | UNI |         |                |
| extra_id | int(11)     | NO   | MUL |         |                |
+----------+-------------+------+-----+---------+----------------+
4 rows in set (0.09 sec)

mysql> select name from testfail_author;
+------+
| name |
+------+
| a1   |
| a2   |
| a3   |
| a4   |
+------+
4 rows in set (0.01 sec)

mysql> select * from testfail_author where name like 'A1';
+----+------+------+----------+
| id | name | num  | extra_id |
+----+------+------+----------+
|  1 | a1   | 1001 |        2 |
+----+------+------+----------+
1 row in set (0.03 sec)

mysql> alter table testfail_author change column name name varchar(10) binary;
Query OK, 4 rows affected (0.39 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from testfail_author where name like 'A1';
Empty set (0.00 sec)

-----

As the ticket description states, you can work around this problem at the moment by specifying an iexact lookup on your get_or_create()...but that isn't too useful if you're using a 3rd-party app (tagging?) that is doing the get_or_create() for you. 

I don't know what the right answer is here, just that the fix is not as simple as it may appear.

Karen

julianb

unread,
Jul 24, 2008, 4:10:37 PM7/24/08
to Django developers
On Jul 24, 4:43 pm, "Karen Tracey" <kmtra...@gmail.com> wrote:
> As the ticket description states, you can work around this problem at the
> moment by specifying an iexact lookup on your get_or_create()...

That doesn't really work if you're always using it. I haven't really
understood what Django does there but you still get DoesNotExist
errors.

Another ticket that's related to the topic is perhaps
http://code.djangoproject.com/ticket/7402

What we have is an IntegrityError that is hidden, so that's part of
the problem, too.

Jeremiah

unread,
Jul 25, 2008, 1:27:46 AM7/25/08
to Django developers
With a little research today I discovered that it is possible to do a
case insensitive match on a binary column by using COLLATE. This may
not be helpful for the current issue, but here is an example:

CREATE TABLE `unique_test` (
`name` varchar(32) binary,
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `unique_test` (name) VALUES ('Test'), ('test');

SELECT * FROM unique_test;
+------+
| name |
+------+
| Test |
| test |
+------+
2 rows in set (0.00 sec)

SELECT * FROM unique_test WHERE name LIKE 'Test';
+------+
| name |
+------+
| Test |
+------+
1 row in set (0.00 sec)

-- assuming the table was created as utf8
-- set charset for current connection to utf8 (only necessary if utf8
is not the MySQL server default)
SET character_set_connection = 'utf8';

-- force case insensitive match on a binary case sensitive field
SELECT * FROM unique_test WHERE name LIKE 'Test' COLLATE
utf8_general_ci;
+------+
| name |
+------+
| Test |
| test |
+------+
2 rows in set (0.00 sec)

Karen Tracey

unread,
Jul 25, 2008, 11:42:32 AM7/25/08
to django-d...@googlegroups.com
On Fri, Jul 25, 2008 at 1:27 AM, Jeremiah <jere...@bellomy.net> wrote:

With a little research today I discovered that it is possible to do a
case insensitive match on a binary column by using COLLATE.  

Yes, specifying COLLATE is what's needed here.  In fact specifying the BINARY attribute on a character column is just changing the default collation to the binary one for the column's character set.  Then to get case-insensitive matching on a column where you've done this you need to specify a case-insensitive collation to override the column default.  I actually tried that before sending my note yesterday but ran into errors with illegal mixtures of collations and ran out of time to look at it so just posted the problem.  At that point I was afraid there was no good way for Django code to know the right collation to specify, given that the table/column charset could be anything.  But trying again this morning it seems things might not be that dire: apparently if the connection charset is set to utf8 (which I believe Django does always), then you can specify a collation of utf8_general_ci regardless of the table/column charset.  At least, that worked for me for a table that was using latin1 charset.

I still have some concerns with Django specifying a binary column type on table creation:

First, what about tables that Django did not create (or created before the change was made)?   They likely have a case-insensitive default collation since that is the default for MySQL.  Do we just document the kind of trouble you can run into with get_or_create and unique on columns that don't have a binary default collation specified?

Second, specifying a binary collation for the column affects ordering as well.  So ordered results will change from using a case-insensitive ordering by default to the binary one.  This could be an issue, since I think binary ordering gives somewhat non-intuitive results whereas case-insensitive more naturally matches what people expect.  But maybe that's just my impression?  Do the other databases use binary ordering by default?

Third, need to verify that specifying a collation of utf8_general_ci will really always work regardless of the table/column charset.  It does seem to work for latin1, but I haven't tested beyond that and again have run out of time to play with this today...

Karen

Karen Tracey

unread,
Jul 26, 2008, 4:20:03 PM7/26/08
to django-d...@googlegroups.com
On Fri, Jul 25, 2008 at 11:42 AM, Karen Tracey <kmtr...@gmail.com> wrote:

I still have some concerns with Django specifying a binary column type on table creation:

[snip concerns 1-3, because 4 is way more troublesome]

Fourth, I just stumbled across a reason why specifying a binary modifier for a (var)char column in MySQL may be a really bad idea: the field values for instances of Django models associated with these columns turn out to bytestrings instead of unicode strings.  I had a model that worked fine and looked something like this (snipped to bare bones):

class Authors(models.Model):
    Author = models.CharField(unique=True, max_length=50)
    def __unicode__(self):
        return self.Author

While experimenting with this binary modifier idea I altered the MySQL table so that the Author column was of type BINARY VARCHAR instead of just VARCHAR.  Today I noticed in the admin for another model that has a ForeignKey to this model, no Select widget was rendered for this ForeignKey field that identifies an Author.  Eventually I tracked the reason for that down to the Select widget rendererer generating a DjangoUnicodeDecodeError (which was silently swallowed during template rendering, argh, but that's a quibble for a different discussion) when trying to render the list of choices (there's at least one element with non-ASCII data in the DB) for the widget.  Turns out that type(self.Author) for an Authors instance was <type 'str'>, not the expected <type 'unicode'>.  I could 'fix' my model by changing the __unicode__ method to return self.Author.decode('utf8') but a CharField whose instances have values that are of type 'str' instead of 'unicode' seems like a bad thing, and I'm not sure we want to go down that path.

Karen

julianb

unread,
Jul 27, 2008, 8:07:09 AM7/27/08
to Django developers
On Jul 26, 10:20 pm, "Karen Tracey" <kmtra...@gmail.com> wrote:
> Fourth, I just stumbled across a reason why specifying a binary modifier for
> a (var)char column in MySQL may be a really bad idea:

I think we should agree on not changing anything in MySQL, seems not
possible to find an elegant solution.
What about handling the issue in the get_or_create method, e.g. making
a special case for MySQL or something like that?

David Cramer

unread,
Jul 27, 2008, 7:13:55 PM7/27/08
to Django developers
To be quite honest, I really doubt BINARY is a solution for anything
that isn't BINARY data. I'm completely against the change that was
made which forces everything to WHERE BINARY N = X because it DOES NOT
solve the issue that is presented. Anything and everything in MySQL is
based on the collation and character set, if your matching, or unique
indexes are off, its most likely because of that.

In regards to a unique index, use utf8_bin for the collation on those,
and leave the fields as normal types and it will probably be solved.

julianb

unread,
Aug 5, 2008, 2:25:18 PM8/5/08
to Django developers
Maybe someone could have a look at this ticket and patch. Would be
nice if we can somehow fix this, cause without that problem solved one
can't start solving the next one, IMHO:

http://code.djangoproject.com/ticket/7402
Reply all
Reply to author
Forward
0 new messages