While utf8_general_ci may still be the default depending on your MySQL
version, MySQL itself recommends utf8_unicode_ci instead, as the former
can be incorrect for some characters and languages and its performance
benefits are no longer relevant. From the MySQL docs themselves:
"utf8_general_ci is a legacy collation that does not support expansions,
contractions, or ignorable characters." [1]
Using utf8_general_ci can cause difficult-to-debug text issues.
IMO Django should update its MySQL collation recommendation to
utf8_unicode_ci.
[0] https://docs.djangoproject.com/en/dev/ref/databases/#collation-settings
[1] http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html
--
Ticket URL: <https://code.djangoproject.com/ticket/22458>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* needs_tests: => 0
* needs_docs: => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:1>
* status: new => assigned
* owner: nobody => mardini
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:2>
Comment (by mardini):
PR: https://github.com/django/django/pull/2587
The MySQL documentation doesn't recommend utf8_unicode_ci in all cases. It
states that "comparisons for the utf8_general_ci collation are faster, but
slightly less correct, than comparisons for utf8_unicode_ci", and "If this
is acceptable for your application, you should use utf8_general_ci because
it is faster. If this is not acceptable (for example, if you require
German dictionary order), use utf8_unicode_ci because it is more
accurate." I added a note and a link that explains both cases, and what
the recommended usage for each collation is. Thanks.
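As a rough illustration of the trade-off quoted above (not part of the
patch), a project could pick the collation according to whether it needs
accurate ordering. This is only a sketch under assumptions: the helper
pick_mysql_collation and the database name are hypothetical, and the TEST
dictionary form assumes Django 1.7 or later (earlier releases use
TEST_CHARSET and TEST_COLLATION instead).

```python
def pick_mysql_collation(need_accurate_ordering):
    """Hypothetical helper: choose a utf8 collation per the MySQL guidance.

    utf8_unicode_ci is the more accurate choice (for example, German
    dictionary order); utf8_general_ci is described as faster but
    slightly less correct.
    """
    if need_accurate_ordering:
        return "utf8_unicode_ci"
    return "utf8_general_ci"


# Settings fragment. The TEST options only affect the test database that
# Django creates; the collation of the main database is configured in
# MySQL itself.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",  # placeholder name
        "TEST": {
            "CHARSET": "utf8",
            "COLLATION": pick_mysql_collation(need_accurate_ordering=True),
        },
    }
}
```

Keeping the choice in one helper makes the decision explicit and easy to
change if the application's ordering requirements change.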
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:3>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"11ac50b18e578498c1d95e0a75921b5864387d46"]:
{{{
#!CommitTicketReference repository=""
revision="11ac50b18e578498c1d95e0a75921b5864387d46"
Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation
Thanks tobami at gmail.com for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:4>
Comment (by Tim Graham <timograham@…>):
In [changeset:"b6863879e1cf20acdecb3606da8fe66b486836cf"]:
{{{
#!CommitTicketReference repository=""
revision="b6863879e1cf20acdecb3606da8fe66b486836cf"
[1.6.x] Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation
Thanks tobami at gmail.com for the report.
Backport of 11ac50b18e from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:5>
Comment (by Tim Graham <timograham@…>):
In [changeset:"b1e7dd445bb64c27df8e2b6902a76a67c79332ab"]:
{{{
#!CommitTicketReference repository=""
revision="b1e7dd445bb64c27df8e2b6902a76a67c79332ab"
[1.7.x] Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation
Thanks tobami at gmail.com for the report.
Backport of 11ac50b18e from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22458#comment:6>