Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Strange behavior with MySQL and UNIQUE (#7789)
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  9 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
David Danier  
View profile  
 More options Jul 24 2008, 5:17 am
From: David Danier <goliath.mailingl...@gmx.de>
Date: Thu, 24 Jul 2008 11:17:30 +0200
Local: Thurs, Jul 24 2008 5:17 am
Subject: Strange behavior with MySQL and UNIQUE (#7789)
Hi devs,

I ran into a strange problem using MySQL. It seems like UNIQUE in MySQL
is case insensitive, so you cannot create "foo" if you have "FOO" in
your database.

You can test this yourself using the following SQL-statements:
---------8<------------------------------------------------------
CREATE TEMPORARY TABLE tempfoobar (
        name varchar(100) UNIQUE
) TYPE=HEAP;
INSERT INTO tempfoobar VALUES("name");
SELECT * FROM tempfoobar WHERE name="NAME";
INSERT INTO tempfoobar VALUES("NAME");
DROP TABLE tempfoobar;
------------------------------------------------------>8---------

So, why does this in any way affect Django? (Skip this if you know #7789)

Django provides a simple way to create objects, if they don't exist yet,
called get_or_create(). This method used get() to test if the object is
there (which is case sensitive) and then tries to save a new object
(which is case insensitive). To work around race conditions the method
catches IntegrityError's and tries to fetch the object again.

Now with MySQL fetching the object does not work, but INSERT does not
work either as UNIQUE is case insensitive. Now get_or_create() raises a
DoesNotExist exception, which took me a while to figure out why.

About the impact of this issue: I use django-tagging
(http://code.google.com/p/django-tagging/) in my projects. After
switching to mysql on my development machine tagging objects became
nearly impossible. You always had to type in tags exacple the same way
they where in the database, otherwise saving the object was not
possible. This this issue might break working applications.

So, what to do about this? This issue should at least be documented, it
seems that it came up on this list before and there even is a bug-report
(#7789).

But I would prefer to fix it, which is actually pretty easy:
---------8<------------------------------------------------------
CREATE TEMPORARY TABLE tempfoobar (
        name varchar(100) BINARY UNIQUE
) TYPE=HEAP;
INSERT INTO tempfoobar VALUES("name");
SELECT * FROM tempfoobar WHERE name="NAME";
INSERT INTO tempfoobar VALUES("NAME");
DROP TABLE tempfoobar;
------------------------------------------------------>8---------
(note the "BINARY")

I have attached a patch
(http://code.djangoproject.com/attachment/ticket/7789/django-mysql-cas...)
for current trunk to #7789, perhaps this should go into Django before 1.0?

Greetings, David Danier


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Jul 24 2008, 10:43 am
From: "Karen Tracey" <kmtra...@gmail.com>
Date: Thu, 24 Jul 2008 10:43:43 -0400
Local: Thurs, Jul 24 2008 10:43 am
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)

On Thu, Jul 24, 2008 at 5:17 AM, David Danier <goliath.mailingl...@gmx.de>
wrote:

Yes, there's an issue here, but the fix is not that simple.  If you make the
column type a binary varchar, than an iexact() (case-insensitive) lookup,
which Django maps to LIKE for MySQL, will fail to find rows that should
match:

mysql> describe testfail_author;
+----------+-------------+------+-----+---------+----------------+
| Field    | Type        | Null | Key | Default | Extra          |
+----------+-------------+------+-----+---------+----------------+
| id       | int(11)     | NO   | PRI | NULL    | auto_increment |
| name     | varchar(10) | NO   |     |         |                |
| num      | int(11)     | NO   | UNI |         |                |
| extra_id | int(11)     | NO   | MUL |         |                |
+----------+-------------+------+-----+---------+----------------+
4 rows in set (0.09 sec)

mysql> select name from testfail_author;
+------+
| name |
+------+
| a1   |
| a2   |
| a3   |
| a4   |
+------+
4 rows in set (0.01 sec)

mysql> select * from testfail_author where name like 'A1';
+----+------+------+----------+
| id | name | num  | extra_id |
+----+------+------+----------+
|  1 | a1   | 1001 |        2 |
+----+------+------+----------+
1 row in set (0.03 sec)

mysql> alter table testfail_author change column name name varchar(10)
binary;
Query OK, 4 rows affected (0.39 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from testfail_author where name like 'A1';
Empty set (0.00 sec)

-----

As the ticket description states, you can work around this problem at the
moment by specifying an iexact lookup on your get_or_create()...but that
isn't too useful if you're using a 3rd-party app (tagging?) that is doing
the get_or_create() for you.

I don't know what the right answer is here, just that the fix is not as
simple as it may appear.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
julianb  
View profile  
 More options Jul 24 2008, 4:10 pm
From: julianb <julian....@gmail.com>
Date: Thu, 24 Jul 2008 13:10:37 -0700 (PDT)
Local: Thurs, Jul 24 2008 4:10 pm
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)
On Jul 24, 4:43 pm, "Karen Tracey" <kmtra...@gmail.com> wrote:

> As the ticket description states, you can work around this problem at the
> moment by specifying an iexact lookup on your get_or_create()...

That doesn't really work if you're always using it. I haven't really
understood what Django does there but you still get DoesNotExist
errors.

Another ticket that's related to the topic is perhaps
http://code.djangoproject.com/ticket/7402

What we have is an IntegrityError that is hidden, so that's part of
the problem, too.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeremiah  
View profile  
 More options Jul 25 2008, 1:27 am
From: Jeremiah <jerem...@bellomy.net>
Date: Thu, 24 Jul 2008 22:27:46 -0700 (PDT)
Local: Fri, Jul 25 2008 1:27 am
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)
With a little research today I discovered that it is possible to do a
case insensitive match on a binary column by using COLLATE.  This may
not be helpful for the current issue, but here is an example:

CREATE TABLE `unique_test` (
  `name` varchar(32) binary,
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `unique_test` (name) VALUES ('Test'), ('test');

SELECT * FROM unique_test;
+------+
| name |
+------+
| Test |
| test |
+------+
2 rows in set (0.00 sec)

SELECT * FROM unique_test WHERE name LIKE 'Test';
+------+
| name |
+------+
| Test |
+------+
1 row in set (0.00 sec)

-- assuming the table was created as utf8
-- set charset for current connection to utf8 (only necessary if utf8
is not the MySQL server default)
SET character_set_connection = 'utf8';

-- force case insensitive match on a binary case sensitive field
SELECT * FROM unique_test WHERE name LIKE 'Test' COLLATE
utf8_general_ci;
+------+
| name |
+------+
| Test |
| test |
+------+
2 rows in set (0.00 sec)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Jul 25 2008, 11:42 am
From: "Karen Tracey" <kmtra...@gmail.com>
Date: Fri, 25 Jul 2008 11:42:32 -0400
Local: Fri, Jul 25 2008 11:42 am
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)

On Fri, Jul 25, 2008 at 1:27 AM, Jeremiah <jerem...@bellomy.net> wrote:

> With a little research today I discovered that it is possible to do a
> case insensitive match on a binary column by using COLLATE.

Yes, specifying COLLATE is what's needed here.  In fact specifying the
BINARY attribute on a character column is just changing the default
collation to the binary one for the column's character set.  Then to get
case-insensitive matching on a column where you've done this you need to
specify a case-insensitive collation to override the column default.  I
actually tried that before sending my note yesterday but ran into errors
with illegal mixtures of collations and ran out of time to look at it so
just posted the problem.  At that point I was afraid there was no good way
for Django code to know the right collation to specify, given that the
table/column charset could be anything.  But trying again this morning it
seems things might not be that dire: apparently if the connection charset is
set to utf8 (which I believe Django does always), then you can specify a
collation of utf8_general_ci regardless of the table/column charset.  At
least, that worked for me for a table that was using latin1 charset.

I still have some concerns with Django specifying a binary column type on
table creation:

First, what about tables that Django did not create (or created before the
change was made)?   They likely have a case-insensitive default collation
since that is the default for MySQL.  Do we just document the kind of
trouble you can run into with get_or_create and unique on columns that don't
have a binary default collation specified?

Second, specifying a binary collation for the column affects ordering as
well.  So ordered results will change from using a case-insensitive ordering
by default to the binary one.  This could be an issue, since I think binary
ordering gives somewhat non-intuitive results whereas case-insensitive more
naturally matches what people expect.  But maybe that's just my impression?
Do the other databases use binary ordering by default?

Third, need to verify that specifying a collation of utf8_general_ci will
really always work regardless of the table/column charset.  It does seem to
work for latin1, but I haven't tested beyond that and again have run out of
time to play with this today...

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Jul 26 2008, 4:20 pm
From: "Karen Tracey" <kmtra...@gmail.com>
Date: Sat, 26 Jul 2008 16:20:03 -0400
Local: Sat, Jul 26 2008 4:20 pm
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)

On Fri, Jul 25, 2008 at 11:42 AM, Karen Tracey <kmtra...@gmail.com> wrote:

> I still have some concerns with Django specifying a binary column type on
> table creation:

[snip concerns 1-3, because 4 is way more troublesome]

Fourth, I just stumbled across a reason why specifying a binary modifier for
a (var)char column in MySQL may be a really bad idea: the field values for
instances of Django models associated with these columns turn out to
bytestrings instead of unicode strings.  I had a model that worked fine and
looked something like this (snipped to bare bones):

class Authors(models.Model):
    Author = models.CharField(unique=True, max_length=50)
    def __unicode__(self):
        return self.Author

While experimenting with this binary modifier idea I altered the MySQL table
so that the Author column was of type BINARY VARCHAR instead of just
VARCHAR.  Today I noticed in the admin for another model that has a
ForeignKey to this model, no Select widget was rendered for this ForeignKey
field that identifies an Author.  Eventually I tracked the reason for that
down to the Select widget rendererer generating a DjangoUnicodeDecodeError
(which was silently swallowed during template rendering, argh, but that's a
quibble for a different discussion) when trying to render the list of
choices (there's at least one element with non-ASCII data in the DB) for the
widget.  Turns out that type(self.Author) for an Authors instance was <type
'str'>, not the expected <type 'unicode'>.  I could 'fix' my model by
changing the __unicode__ method to return self.Author.decode('utf8') but a
CharField whose instances have values that are of type 'str' instead of
'unicode' seems like a bad thing, and I'm not sure we want to go down that
path.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
julianb  
View profile  
 More options Jul 27 2008, 8:07 am
From: julianb <julian....@gmail.com>
Date: Sun, 27 Jul 2008 05:07:09 -0700 (PDT)
Local: Sun, Jul 27 2008 8:07 am
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)
On Jul 26, 10:20 pm, "Karen Tracey" <kmtra...@gmail.com> wrote:

> Fourth, I just stumbled across a reason why specifying a binary modifier for
> a (var)char column in MySQL may be a really bad idea:

I think we should agree on not changing anything in MySQL, seems not
possible to find an elegant solution.
What about handling the issue in the get_or_create method, e.g. making
a special case for MySQL or something like that?

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Cramer  
View profile  
 More options Jul 27 2008, 7:13 pm
From: David Cramer <dcra...@gmail.com>
Date: Sun, 27 Jul 2008 16:13:55 -0700 (PDT)
Local: Sun, Jul 27 2008 7:13 pm
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)
To be quite honest, I really doubt BINARY is a solution for anything
that isn't BINARY data. I'm completely against the change that was
made which forces everything to WHERE BINARY N = X because it DOES NOT
solve the issue that is presented. Anything and everything in MySQL is
based on the collation and character set, if your matching, or unique
indexes are off, its most likely because of that.

In regards to a unique index, use utf8_bin for the collation on those,
and leave the fields as normal types and it will probably be solved.

On Jul 27, 7:07 am, julianb <julian....@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
julianb  
View profile  
 More options Aug 5 2008, 2:25 pm
From: julianb <julian....@gmail.com>
Date: Tue, 5 Aug 2008 11:25:18 -0700 (PDT)
Local: Tues, Aug 5 2008 2:25 pm
Subject: Re: Strange behavior with MySQL and UNIQUE (#7789)
Maybe someone could have a look at this ticket and patch. Would be
nice if we can somehow fix this, cause without that problem solved one
can't start solving the next one, IMHO:

http://code.djangoproject.com/ticket/7402


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »