[Django] #18392: Use utf8mb4 encoding with MySQL 5.5

Django

May 28, 2012, 8:21:18 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
----------------------------------------------+--------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 1.4
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
Background:
I just ran into a problem with saving iPhone emoji characters into MySQL:
the text was cut off after the first emoji character. After some research
I found a post that explains how it works:
http://mzsanford.wordpress.com/2010/12/28/mysql-and-unicode/

The recommendation is to use MySQL 5.5, and the "utf8mb4" encoding.

Suggestion:
Make "utf8mb4" the default encoding for MySQL 5.5 and up.
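Why the text gets cut off: emoji such as U+1F604 lie outside the Basic
Multilingual Plane and need 4 bytes in UTF-8, while MySQL's "utf8" stores
at most 3 bytes per character. A minimal Python sketch (the helper name is
hypothetical) that lists the characters a 3-byte "utf8" column cannot hold:

```python
def chars_needing_utf8mb4(text):
    """Return the characters of `text` whose UTF-8 encoding needs 4 bytes.

    MySQL's 3-byte "utf8" charset cannot store these; "utf8mb4" can.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# U+00E9 needs 2 bytes and U+263A needs 3, so only the emoji is reported.
sample = "Hi \U0001F604 caf\u00e9 \u263a"
print([hex(ord(ch)) for ch in chars_needing_utf8mb4(sample)])  # ['0x1f604']
```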

--
Ticket URL: <https://code.djangoproject.com/ticket/18392>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

May 28, 2012, 8:49:47 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):

* needs_better_patch: => 0
* needs_docs: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

To me it seems the character encoding is set on connect with 'charset':
'utf8'. Is it enough to change this from 'utf8' to 'utf8mb4' to change
this default? If not, where should this default encoding be defined? Does
this mean that every text column must be created as VARCHAR(N) CHARACTER
SET utf8mb4?

I am accepting this as to me the change sounds valid.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:1>

Django

May 28, 2012, 8:59:02 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by EmilStenstrom):

I don't know enough about MySQL and the ORM to answer your questions, I
hope someone else does.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:2>

Django

May 28, 2012, 12:08:40 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by EmilStenstrom):

Ok, there's some trickiness here. An index on an InnoDB column can cover
at most 255 characters with utf8, but only 191 characters with utf8mb4.
This means that the default indexes Django creates for
CharField(max_length=255) are too long, and will break things. (Break
what? I'm running a migration that converts all my tables to utf8mb4
automatically, and converting a long CharField to utf8mb4 fails because
its index is too long.)

From the official docs:


{{{
"InnoDB has a maximum index length of 767 bytes, so for utf8 or utf8mb4
columns, you can index a maximum of 255 or 191 characters, respectively.
If you currently have utf8 columns with indexes longer than 191
characters, you will need to index a smaller number of characters. In an
InnoDB table, these column and index definitions are legal:

col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))

To use utf8mb4 instead, the index must be smaller:

col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))"

}}}

From: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html
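The 255/191 figures follow from dividing the 767-byte InnoDB index key
limit quoted above by the worst-case bytes per character of each charset.
A small Python sketch of that arithmetic (names are illustrative; it
assumes the default COMPACT/REDUNDANT row formats where the 767-byte limit
applies):

```python
INNODB_MAX_INDEX_BYTES = 767  # per-column index key limit quoted above

# Worst-case bytes per character for each MySQL charset.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset, limit_bytes=INNODB_MAX_INDEX_BYTES):
    """Longest column prefix (in characters) that fits in an InnoDB index."""
    return limit_bytes // MAX_BYTES_PER_CHAR[charset]


for cs in ("utf8", "utf8mb4"):
    print(cs, max_indexable_chars(cs))
# utf8 255
# utf8mb4 191
```

So a fully indexed CharField(max_length=255) fits under utf8 (255 * 3 =
765 bytes) but not under utf8mb4 (255 * 4 = 1020 bytes).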

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:3>

Django

May 28, 2012, 1:32:36 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: | decision needed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by akaariai):

* stage: Accepted => Design decision needed


Comment:

I am marking this as design decision needed. This will need a good
solution which guarantees trouble-free upgrades for current users. While
it would be nice to have full UTF8 support with MySQL, I don't think it is
worth risking a breakage for existing users.

So, looking for good solutions here. Does MySQL infer the character set
to use from some runtime variable (perhaps something set at CREATE
DATABASE time)? If so, we could just check the charset used by the
database and use that for test database creation and index creation. That
would be a simple solution to this problem.
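MySQL does expose the current database's default charset through the
system variable @@character_set_database. A minimal sketch of the check
suggested above, assuming any DB-API cursor on a MySQL connection (for
example one from MySQLdb); the function name is hypothetical and how Django
would then use the value is left open:

```python
def database_charset(cursor):
    """Return the default character set of the currently selected database.

    `cursor` is assumed to be a DB-API cursor on a MySQL connection.
    """
    cursor.execute("SELECT @@character_set_database")
    (charset,) = cursor.fetchone()
    return charset
```

A backend could then, for instance, pick the index prefix length from the
result instead of assuming utf8.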

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:4>

Django

Jun 4, 2012, 3:14:44 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: | decision needed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by kmtracey):

I don't believe setting the connection charset to this new utf8mb4
encoding (if we're talking to a server that supports it) would cause any
problem with the indexing: that's a problem that comes into play when you
change the database charset itself. Whether we need to change that isn't
clear to me. It's possible that these new characters (stored in a DB with
utf8mb4 encoding) would be transmitted over a connection that has charset
set simply to utf8. If so then there's really no point in making any
connection charset change. So one open question here is: do you get an
error or incorrect behavior trying to read/write these characters over a
utf8 connection created by Django when operating on a database with
utf8mb4 encoding?

Traditionally Django has not gone beyond advising in the documentation
what charset to use (utf8) for the database. Django doesn't attempt to set
the charset for stuff it creates to utf8, it just uses the default charset
for the DB, which is set before Django ever gets involved. Given this new
MySQL 5.5 support for "more better" unicode we probably need to update the
docs to mention the new option for database charset.

Having the index creation fail for a too-long CharField is a nuisance. I
know we've had tickets before that dealt with this issue, but I don't
recall offhand what the status is. Django tries to disallow creation of a
CharField where the index creation will fail, but MySQL makes it
incredibly difficult to figure out what the right value for "max allowed"
is. This new encoding just makes for more of a mess there.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:5>

Django

Jun 5, 2012, 4:00:10 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: | decision needed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by EmilStenstrom):

Maybe this ticket just needs some exploration before going further: a
simple test that tries to save and then fetch some 4-byte Unicode
characters in MySQL. I think that would reveal the minimal changes needed
to make full Unicode support possible.
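A sketch of such an exploratory test, written against the plain DB-API so
it can be pointed at a MySQLdb connection. The helper and table names are
hypothetical. The temporary table inherits the database's default charset,
which is exactly what is being probed. Against MySQL with 3-byte utf8 it
should either return False (silent truncation) or raise an "Incorrect
string value" error (strict mode or MySQLdb warnings-as-errors); sqlite3 is
used below only to exercise the code.

```python
def survives_round_trip(conn, text, placeholder="%s"):
    """Insert `text` into a scratch table and report whether it reads back intact.

    `placeholder` is the driver's paramstyle marker: "%s" for MySQLdb,
    "?" for sqlite3. Only the marker is formatted into the SQL; the text
    itself is always passed as a query parameter.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMPORARY TABLE roundtrip_probe (val VARCHAR(100))")
    try:
        cur.execute(
            f"INSERT INTO roundtrip_probe (val) VALUES ({placeholder})", (text,)
        )
        cur.execute("SELECT val FROM roundtrip_probe")
        (stored,) = cur.fetchone()
        return stored == text
    finally:
        cur.execute("DROP TABLE roundtrip_probe")
```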

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:6>

Django

Jun 14, 2012, 11:31:31 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: hack utf8mb4 mysql | decision needed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by rogeliorv):

* keywords: => hack utf8mb4 mysql


Comment:

As a proof of concept I tried inserting the following values into a
varchar column in MySQL:

😄😃😊☺😉😍😘😚

and got a "Warning: Incorrect string value" from MySQL (which Python
treats as an exception).

As a way to test it, I used a hack: adding self.query('SET NAMES utf8mb4')
to the Connection.set_character_set function in MySQLdb.connections, as
shown here: http://pastebin.com/MW5BgRgP

Of course, the correct way would be to change this in Django when setting
up the cursor connection.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:7>

Django

Jun 15, 2012, 5:07:19 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: hack utf8mb4 mysql | decision needed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by EmilStenstrom):

Replying to [comment:7 rogeliorv]:

> As a way to test it. The hack consists in adding self.query('SET NAMES
utf8mb4') in MySQLdb.connections in Connection.set_character_set function
as shown here: http://pastebin.com/MW5BgRgP
>
> Of course the correct way would be to change this in django when setting
up the cursor connection.

Did your hack remove the exception, or what was the rationale behind it?
What's the next step here?

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:8>

Django

Jun 15, 2012, 8:18:45 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by rogeliorv):

* keywords: hack utf8mb4 mysql => utf8mb4 mysql
* needs_better_patch: 0 => 1
* has_patch: 0 => 1
* needs_tests: 0 => 1


Comment:

Replying to [comment:8 EmilStenstrom]:

> Replying to [comment:7 rogeliorv]:
> > As a way to test it. The hack consists in adding self.query('SET NAMES
utf8mb4') in MySQLdb.connections in Connection.set_character_set function
as shown here: http://pastebin.com/MW5BgRgP
> >
> > Of course the correct way would be to change this in django when
setting up the cursor connection.
>

> Did your hack remove the exception? What was the rationale behind the
hack? What's the next step?


Yes, the hack removed the exception. The rationale was to make the MySQL
client use a specific encoding.

The next step is to make Django's MySQL connections use utf8mb4 by
default, or otherwise make the charset configurable. Since utf8mb4 is a
superset of utf8, no extra changes should be needed in that regard.


To achieve this, the _cursor method of class DatabaseWrapper in
django/db/backends/mysql/base.py should be changed as follows (complete
function definition here: http://pastebin.com/A6dMEMd4):

    kwargs = {
        "conv": django_conversions,
        "charset": "utf8mb4",
        "use_unicode": True,
    }

Unfortunately this won't work unless we also change the set_character_set
function of class Connection in MySQLdb.connections.

Change its last two lines to the following (complete function definition
here: http://pastebin.com/AMN1B8za):

    # Hack so data can be decoded/encoded using Python's utf8 codec,
    # since Python does not know about MySQL's utf8mb4
    if charset == 'utf8mb4':
        charset = 'utf8'
    self.string_decoder.charset = charset
    self.unicode_literal.charset = charset


This lets you use special characters like 😄😃😊☺😉😍😘😚

Unlike the previous hack, which worked for both reading and writing data,
this patch only lets me read data in utf8mb4 format; on insertion/creation
I now get the error "'Cursor' object has no attribute '_last_executed'". I
will report more details on this error as I find them. Any help with it is
appreciated.

You can reach me via twitter, @rogeliorv

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:9>

Django

Aug 16, 2012, 3:32:48 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by kmichel_wgs):

As a workaround, you can make Python understand 'utf8mb4' as an alias for
'utf8':
{{{
import codecs

# Map MySQL's 'utf8mb4' charset name onto Python's utf8 codec
codecs.register(
    lambda name: codecs.lookup('utf8') if name == 'utf8mb4' else None)
}}}
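The same alias written as a named search function, with a quick check
that it works. This is only a sketch: the function name is illustrative,
and it relies on the fact that utf8mb4 is simply standard UTF-8, so
reusing Python's utf-8 codec is safe.

```python
import codecs


def _utf8mb4_search(name):
    # Codec search functions receive a normalized, lower-case encoding name
    # and return a CodecInfo, or None so that other search functions can try.
    if name == "utf8mb4":
        return codecs.lookup("utf8")
    return None


codecs.register(_utf8mb4_search)

# 4-byte characters now encode under the MySQL charset name:
print("\U0001F604".encode("utf8mb4"))  # b'\xf0\x9f\x98\x84'
```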

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:10>

Django

Sep 25, 2012, 12:17:48 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by kitsunde):

* cc: kitsunde@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:11>

Django

Oct 11, 2012, 7:27:13 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by evax):

There is a fix for this issue in the upcoming MySQLdb (MySQL-python)
release (https://github.com/farcepest/MySQLdb1/tree/utf8mb4).

It should be used together with the appropriate OPTIONS in the database
config:
{{{
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'OPTIONS': {'charset': 'utf8mb4'},
        # (...)
    }
}
}}}
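A fuller settings sketch along the same lines. NAME/USER/PASSWORD are
placeholders. The TEST dict (with CHARSET and COLLATION, used when Django
creates the test database) assumes Django 1.7 or later, and
utf8mb4_unicode_ci is just one possible collation choice. As far as I can
tell, the MySQL backend passes OPTIONS straight through to
MySQLdb.connect(), which is why 'charset' here overrides Django's default
of 'utf8'.

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",        # placeholder
        "USER": "myuser",      # placeholder
        "PASSWORD": "secret",  # placeholder
        # Connection charset, forwarded to MySQLdb.connect(charset=...).
        "OPTIONS": {"charset": "utf8mb4"},
        # Charset/collation for the auto-created test database (Django >= 1.7).
        "TEST": {"CHARSET": "utf8mb4", "COLLATION": "utf8mb4_unicode_ci"},
    }
}
```

Note that this only changes the connection and test-database charsets;
existing tables and columns keep whatever charset they were created with.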

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:12>

Django

Oct 12, 2012, 4:16:24 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by evax):

The fix mentioned above has been merged to master
(https://github.com/farcepest/MySQLdb1) and released
(http://pypi.python.org/pypi/MySQL-python/1.2.4b5).

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:13>

Django

Feb 14, 2013, 6:56:52 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by streeter):

* cc: django@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:14>

Django

Feb 22, 2013, 2:17:51 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Design
Keywords: utf8mb4 mysql | decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by akaariai):

* type: Uncategorized => New feature


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:15>

Django

Mar 22, 2013, 2:52:21 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* stage: Design decision needed => Accepted


Comment:

This is just another case of MySQL being purposefully and irreversibly
brain-damaged. What it calls utf8 is actually a non-standard 3-byte
encoding that covers only a subset of real UTF-8.

Django should give the option to use utf8mb4, and maybe recommend it in
the docs (if it works well; non-default features of MySQL rarely work
well).

The comments above suggest this isn't possible right now, although I'm
still confused as to the nature of the problem. Accepting on this basis.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:16>

Django

Aug 19, 2013, 8:26:07 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by cvrebert):

* cc: cvrebert (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:17>

Django

Oct 30, 2013, 12:14:17 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by timo):

#21308 describes an issue with running tests on MySQL when charset=utf8mb4
that I've marked as a duplicate of this.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:18>

Django

Oct 31, 2013, 8:05:46 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by anonymous):

This [http://dev.mysql.com/doc/refman/5.5/en/innodb-restrictions.html
InnoDB restriction] can be lifted as of MySQL 5.5.14. You have to set the
following:

[http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_large_prefix innodb_large_prefix]=ON
innodb_file_per_table=ON
innodb_file_format=Barracuda

and use the {{{ROW_FORMAT=DYNAMIC}}} attribute in {{{CREATE TABLE}}} or
{{{ALTER TABLE}}}.
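A sketch that generates the per-table statements for this approach. It
assumes the three server settings above are already in place; the function
name and the utf8mb4_unicode_ci collation are illustrative choices. The
row format is switched first so that the charset conversion is not
rejected for index length. Review the generated SQL before running it,
since CONVERT TO re-encodes every text column of the table.

```python
def utf8mb4_conversion_sql(tables, collation="utf8mb4_unicode_ci"):
    """Yield ALTER TABLE statements that move InnoDB tables to utf8mb4.

    Assumes innodb_large_prefix=ON, innodb_file_per_table=ON and
    innodb_file_format=Barracuda are already set on the server.
    """
    for table in tables:
        quoted = "`" + table.replace("`", "``") + "`"  # MySQL identifier quoting
        # DYNAMIC rows allow index key prefixes longer than 767 bytes.
        yield f"ALTER TABLE {quoted} ROW_FORMAT=DYNAMIC;"
        # Re-encode all text columns of the table to utf8mb4.
        yield (f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 "
               f"COLLATE {collation};")


for stmt in utf8mb4_conversion_sql(["auth_user"]):
    print(stmt)
```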

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:19>

Django

Nov 21, 2013, 7:15:08 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by moggers87):

* cc: moggers87 (added)


Comment:

Replying to [comment:19 anonymous]:
*snip*


>
> and {{{CREATE TABLE}}} or {{{ALTER TABLE}}} with the
{{{ROW_FORMAT=DYNAMIC}}} attribute.

I'm not aware of any way to specify options like `ROW_FORMAT=DYNAMIC` for
create/alter statements in Django other than specifying `SET
ROW_FORMAT=DYNAMIC` with 'init_command' - but that adds overhead to each
connection (and feels a bit hacky imo)

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:20>

Django

Apr 30, 2014, 5:12:22 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by donturner):

Replying to [comment:20 moggers87]:


> Replying to [comment:19 anonymous]:
> *snip*
> >
> > and {{{CREATE TABLE}}} or {{{ALTER TABLE}}} with the
{{{ROW_FORMAT=DYNAMIC}}} attribute.
> I'm not aware of any way to specify options like `ROW_FORMAT=DYNAMIC`
for create/alter statements in Django other than specifying `SET
ROW_FORMAT=DYNAMIC` with 'init_command' - but that adds overhead to each
connection (and feels a bit hacky imo)

Having run into the same issue, my conclusion is that the only proper
solution is to switch to PostgreSQL. MySQL appears to have extremely poor
support for indexing utf8mb4 fields.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:21>

Django

Jul 4, 2014, 3:00:30 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: utf8mb4 mysql | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by flisky):

* cc: flisky (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:22>

Django

Jan 22, 2015, 12:45:03 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by collinanderson):

* cc: cmawebsite@… (added)


Comment:

One solution would be to reduce the index prefix length to 191 for MySQL,
as in the example above:

col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
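A sketch of what emitting such a prefix index could look like. The
function and the "<table>_<column>_idx" naming scheme are hypothetical,
not Django's actual schema code; the 191/255 limits come from the
767-byte rule quoted earlier.

```python
# Longest indexable prefix in characters under the 767-byte InnoDB limit.
MAX_INDEX_CHARS = {"utf8": 767 // 3, "utf8mb4": 767 // 4}  # 255, 191


def prefix_index_sql(table, column, max_length, charset="utf8mb4", index_name=None):
    """Build CREATE INDEX DDL, indexing only a prefix when the column is too long."""
    limit = MAX_INDEX_CHARS[charset]
    name = index_name or f"{table}_{column}_idx"  # hypothetical naming scheme
    if max_length > limit:
        target = f"`{column}`({limit})"  # prefix index on the first `limit` chars
    else:
        target = f"`{column}`"           # short enough to index the whole column
    return f"CREATE INDEX `{name}` ON `{table}` ({target});"


print(prefix_index_sql("app_page", "title", 255))
# CREATE INDEX `app_page_title_idx` ON `app_page` (`title`(191));
```

One trade-off to keep in mind: a UNIQUE prefix index only enforces
uniqueness of the first 191 characters, so this would change the meaning
of unique=True on long CharFields.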

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:23>

Django

May 7, 2015, 6:59:02 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aigarius):

https://mathiasbynens.be/notes/mysql-utf8mb4 has more information on this.
Ideally, Django users could set an option and have Django's migration
framework generate all the code needed to migrate every database, table
and column to full utf8mb4 encoding, along with the corresponding changes
to all indexes. Then, in the next major version, this option could become
the new default.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:24>

Django

Nov 12, 2015, 3:03:39 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by teeberg):

Replying to [comment:24 aigarius]:
> [...] have Django migration backend generate all the code for the
migration [...]

Even the utf8 to utf8mb4 migration, which would be the easiest in terms of
required changes, may cause data loss if you have indexes longer than 191
characters; those should probably be inspected and fixed individually and
manually. For that reason, it seems impossible to me to automate. That
being said, clear migration instructions would probably be helpful for
many users.
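For the manual inspection step, the candidates can be found in
information_schema. A sketch, assuming MySQL's STATISTICS.SUB_PART (NULL
when the whole column is indexed) and COLUMNS.CHARACTER_MAXIMUM_LENGTH
columns; the constant and function names are illustrative, and the filter
is kept in Python so it can also be applied to rows gathered some other
way.

```python
UTF8MB4_MAX_INDEX_CHARS = 767 // 4  # 191

# Indexed character columns in the current schema, with the number of
# characters each index covers (the prefix length, or the full column).
FIND_INDEXED_CHAR_COLUMNS_SQL = """
SELECT s.TABLE_NAME, s.INDEX_NAME, s.COLUMN_NAME,
       COALESCE(s.SUB_PART, c.CHARACTER_MAXIMUM_LENGTH) AS indexed_chars
FROM information_schema.STATISTICS s
JOIN information_schema.COLUMNS c
  ON c.TABLE_SCHEMA = s.TABLE_SCHEMA
 AND c.TABLE_NAME = s.TABLE_NAME
 AND c.COLUMN_NAME = s.COLUMN_NAME
WHERE s.TABLE_SCHEMA = DATABASE()
  AND c.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
"""


def indexes_too_long_for_utf8mb4(rows, limit=UTF8MB4_MAX_INDEX_CHARS):
    """Keep (table, index, column, indexed_chars) rows exceeding `limit`."""
    return [row for row in rows if row[3] > limit]
```

Usage would be along the lines of
cursor.execute(FIND_INDEXED_CHAR_COLUMNS_SQL) followed by
indexes_too_long_for_utf8mb4(cursor.fetchall()).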

Maybe this could be another backend-specific setting, similar to what was
implemented for integer types in 1506c71a95cd7f58fbc6363edf2ef742c58d2487?
However, since it would apply only to indexed columns, it may be much more
painful to implement.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:25>

Django

Nov 12, 2015, 9:17:07 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by collinanderson):

For a first step, I propose a setting in DATABASES OPTIONS that tells
Django the maximum index size to use when creating new indexes. I think it
should default to 191. That way Django won't ever reduce the size of
existing indexes, only new ones.

If we have that setting, it's at least _possible_ to use utf8mb4. Down the
road, we could then have Django default to utf8mb4 encoding for new
databases/tables/columns.

(We need this for #20846.)
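A sketch of how such an option might be consumed when creating a new
index. "max_index_length" is a hypothetical key used only for
illustration; no such option exists. As far as I can tell, the MySQL
backend currently forwards OPTIONS to MySQLdb.connect(), so a real
implementation would need to remove a key like this before connecting.

```python
DEFAULT_MAX_INDEX_CHARS = 191  # default proposed in the comment above


def index_prefix_for(max_length, db_options):
    """Prefix length for a new index on a CharField, or None to index it fully.

    "max_index_length" is a hypothetical OPTIONS key, for illustration only.
    Existing indexes are never touched; this only affects newly created ones.
    """
    limit = db_options.get("max_index_length", DEFAULT_MAX_INDEX_CHARS)
    return limit if max_length > limit else None
```

For example, index_prefix_for(255, {}) gives 191, while a project that
stays on utf8 could set the option to 255 and keep full-column indexes.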

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:26>

Django

Nov 20, 2015, 1:57:10 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

Will that setting work nicely with migrations though? I think we need to
know the index names for some operations like `AlterField`. It seems
problematic if we have a way that users can vary the index names without
updating existing names.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:27>

Django

Nov 22, 2015, 9:27:43 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by collinanderson):

I was thinking it wouldn't actually change the name of the index, but I
haven't actually looked at the code. :)

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:28>

Django

Nov 23, 2015, 8:23:54 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

In case it's not clear, here's the problematic scenario I see:

* Indexes are created with max index name length=191 and truncated
accordingly.
* Developer increases the max index name length setting to 200 (no changes
to index names in the database).
* Now migrations can't operate on existing indexes because the
200-character index names Django generates don't match what's in the
database.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:29>

Django

Nov 23, 2015, 10:08:45 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by collinanderson):

I'm not thinking of limiting the _name_ of the index. The issue is "the
maximum number of characters that can be indexed".

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:30>

Django

Nov 23, 2015, 10:26:27 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

Thanks Collin, in that case your proposal makes more sense to me. It could
be nice to get a consensus from more MySQL users though.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:31>

Django

Dec 27, 2015, 7:34:01 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by ernestoalejo):

* cc: contact@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:32>

Django

Dec 29, 2015, 11:46:40 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by teeberg):

* cc: teeberg (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:33>

Django

Jan 5, 2016, 4:11:45 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by ask):

* cc: ask@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:34>

Django

Jan 12, 2016, 3:34:29 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by kezabelle):

* cc: django@… (added)


Comment:

By way of consensus: I agree that if possible, going forward, Django ought
to seek to use utf8mb4, because for better or worse, the world is becoming
more mobile oriented, and with it more emoji-laden; the non-mb4
charsets/collations choke on such things, which is just plain unfortunate.

If a suitable backwards-compatible, migration-friendly patch can't be
achieved, it should at least be mentioned in the docs somewhere, IMHO
(searching for `utf8mb4` or `emoji` currently yields no results, so I
assume there are none squirreled away). Whether that mention is a
recommendation or not is probably still unclear.
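
The reason the non-mb4 charsets choke is that MySQL's `utf8` (utf8mb3) stores at most 3 bytes per character, so it only covers code points up to U+FFFF, while most emoji lie above U+FFFF and take 4 bytes in UTF-8. A small illustrative check (the helper name is hypothetical):

```python
def needs_utf8mb4(text):
    """Return True if text contains a character utf8mb3 cannot store.

    utf8mb3 covers only the Basic Multilingual Plane (U+0000..U+FFFF);
    anything above that is 4 bytes in UTF-8 and requires utf8mb4.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


print(len("\u00e9".encode("utf-8")))      # 2 (e with acute accent)
print(len("\U0001F600".encode("utf-8")))  # 4 (grinning face emoji)
print(needs_utf8mb4("caf\u00e9"))         # False
print(needs_utf8mb4("hi \U0001F600"))     # True
```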

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:35>

Django

Jan 15, 2016, 9:08:03 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by EmilStenstrom):

Slightly off-topic but maybe interesting: the WordPress team converted
all their users from utf8 to utf8mb4, according to [this blog
post](https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/).

Looking at the code, it seems they just dropped and recreated all the
indexes:
https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/upgrade.php#L2687

And then converted all the tables one by one:
https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/upgrade.php#L1951
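
The table-by-table conversion step amounts to emitting one `ALTER TABLE ... CONVERT TO CHARACTER SET` statement per table. A sketch that only builds the SQL strings; the table names and the `utf8mb4_unicode_ci` collation are assumptions, and this is not WordPress's actual code, which also handles the indexes first:

```python
def quote(name):
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def convert_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a statement converting every text column of one table."""
    return (
        f"ALTER TABLE {quote(table)} CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};"
    )


# Hypothetical table names, converted one by one.
for table in ["auth_user", "blog_post"]:
    print(convert_table_sql(table))
```

Indexes that would exceed the key length limit after conversion still need to be shortened or dropped before running these statements.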

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:36>

Django

Mar 8, 2016, 4:04:58 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by collinanderson):

Yes, I based my proposal off of what WordPress did. WordPress limited the
length of the index without limiting the length of the field itself.
Django currently doesn't have that option.

https://code.djangoproject.com/ticket/18392#comment:26
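
MySQL supports this through column prefix indexes: the index covers only the first N characters while the column keeps its full length. A sketch that builds such a statement; the index, table, and column names are hypothetical, and the 191 default assumes utf8mb4 with the 767-byte key limit:

```python
def quote(name):
    """Backtick-quote a MySQL identifier."""
    return "`" + name.replace("`", "``") + "`"


def create_prefix_index_sql(index, table, column, max_length, prefix_chars=191):
    """Return CREATE INDEX SQL, indexing only a prefix of long columns."""
    col = quote(column)
    if max_length > prefix_chars:
        # Index the first prefix_chars characters; the column is unchanged.
        col += f"({prefix_chars})"
    return f"CREATE INDEX {quote(index)} ON {quote(table)} ({col});"


print(create_prefix_index_sql("post_title_idx", "blog_post", "title", 255))
# CREATE INDEX `post_title_idx` ON `blog_post` (`title`(191));
```

One trade-off: if such a prefix index is declared UNIQUE, MySQL enforces uniqueness only on the indexed prefix, not on the full value.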

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:37>

Django

May 18, 2016, 6:52:33 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by edmorley):

* cc: emorley@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:38>

Django

Jun 15, 2016, 3:08:09 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by clokep):

* cc: clokep@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:39>

Django

Sep 19, 2016, 8:35:49 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

The skips for the tests added in 1a9f6db5ffd2d5e71d73340ab59476572e05a728
should be removed or modified when completing this ticket. Should we skip
them conditionally based on the test charset or should we require running
the MySQL tests with utf8mb4?
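
For the first option, a conditional skip could look like this minimal sketch. `TEST_CHARSET` and the test names are hypothetical stand-ins for however the suite exposes the charset of the MySQL test database (in Django that would come from the database settings):

```python
import unittest

# Hypothetical stand-in for the charset the MySQL test database was
# created with (e.g. DATABASES["default"]["TEST"]["CHARSET"] in Django).
TEST_CHARSET = "utf8"


def supports_4byte_chars(charset):
    """Only utf8mb4 can store characters outside the BMP, such as emoji."""
    return (charset or "").lower() == "utf8mb4"


class EmojiStorageTests(unittest.TestCase):
    @unittest.skipUnless(supports_4byte_chars(TEST_CHARSET),
                         "requires a utf8mb4 test database")
    def test_emoji_round_trip(self):
        # A real test would save and reload a model instance here.
        self.assertEqual(len("\U0001F600".encode("utf-8")), 4)
```

Requiring utf8mb4 instead would replace the decorator with a single up-front check that fails loudly.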

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:40>

Django

Aug 5, 2017, 10:00:28 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

I suggest beginning with a very minimal patch like
[https://github.com/django/django/pull/8853 this PR], which would at least
allow users to start converting some database columns to `utf8mb4` through
custom migrations and use those columns in their code (where indexing
doesn't get in their way).

Working on index issues can come later, and will be needed to run the
Django test suite with `utf8mb4`.
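
As an illustration of converting one non-indexed column in a custom migration, the SQL below could be passed to Django's `migrations.RunSQL` together with a matching `reverse_sql`. The table, column, length, and collations are hypothetical:

```python
def modify_column_charset_sql(table, column, max_length, charset,
                              collation, null=False):
    """Build ALTER TABLE ... MODIFY SQL that changes one VARCHAR's charset.

    MODIFY replaces the whole column definition, so type, length and
    nullability must be restated (as would any DEFAULT or COMMENT,
    which this sketch omits).
    """
    nullability = "NULL" if null else "NOT NULL"
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` VARCHAR({max_length}) "
        f"CHARACTER SET {charset} COLLATE {collation} {nullability};"
    )


forward = modify_column_charset_sql(
    "blog_comment", "body", 500, "utf8mb4", "utf8mb4_unicode_ci")
backward = modify_column_charset_sql(
    "blog_comment", "body", 500, "utf8", "utf8_general_ci")
print(forward)
# In a migration: migrations.RunSQL(forward, reverse_sql=backward)
```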

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:41>

Django

Aug 5, 2017, 10:12:55 AM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Oh, now I realize that `utf8mb4` can also be set in DATABASES OPTIONS.
Still, using it in Django by default is a strong signal.
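
For reference, a settings fragment along these lines makes the connection use utf8mb4. The database name is a placeholder, and this affects only the connection charset, not the character set of existing tables and columns:

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",  # placeholder database name
        # OPTIONS are passed to the MySQL driver when connecting; this only
        # sets the connection charset, not existing tables' charsets.
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```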

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:42>

Django

Aug 10, 2017, 12:31:48 PM
to django-...@googlegroups.com
#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz):

* needs_better_patch: 1 => 0
* needs_tests: 1 => 0


Comment:

I provided a [https://github.com/django/django/pull/8886 more
comprehensive patch].

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:43>

Django

Sep 2, 2017, 2:53:17 PM
to django-...@googlegroups.com
#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: 0 => 1


Comment:

There's an outstanding issue to fix on the pull request and Claude said,
"I'm not sure if I'll have time to continue working on this, so if anyone
wants to take this patch further, feel free!"

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:44>

Django

May 24, 2018, 4:47:04 PM
to django-...@googlegroups.com
#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Kelly Campbell):

As a workaround, I came up with this monkey patch that limits the index
size. We put this in our migrations/__init__.py:

{{{
from django.db.models.fields import CharField

def _create_index_sql(self, model, fields, suffix="", sql=None):
"""
Return the SQL statement to create the index for one or several
fields.
`sql` can be specified if the syntax differs from the standard (GIS
indexes, ...).
"""
tablespace_sql = self._get_index_tablespace_sql(model, fields)
idx_columns = []
    for field in fields:
        column = self.quote_name(field.column)
        if isinstance(field, CharField) and field.max_length > 255:
            # Index only the first 255 characters of long CharFields.
            column += '(255)'
        idx_columns.append(column)
    columns = [field.column for field in fields]
    sql_create_index = sql or self.sql_create_index
    return sql_create_index % {
        "table": self.quote_name(model._meta.db_table),
        "name": self.quote_name(
            self._create_index_name(model, columns, suffix=suffix)
        ),
        "using": "",
        "columns": ", ".join(idx_columns),
        "extra": tablespace_sql,
    }


from django.db.backends.mysql.schema import DatabaseSchemaEditor
DatabaseSchemaEditor._create_index_sql = _create_index_sql

}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:45>

Django

Oct 1, 2018, 1:03:27 PM
to django-...@googlegroups.com
#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Arthur Pemberton):

Can anyone clarify the process to migrate the default Django generated
MySQL schema to a utf8mb4 friendly one?

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:46>

Django

Jun 28, 2021, 6:31:13 AM
to django-...@googlegroups.com
#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by lambdaq):

I submitted a utf8mb4 fix on GitHub and was redirected here to this old
relic.

https://github.com/django/django/pull/14563

A lot has changed since the issue was first opened nine years ago. My
two cents:

1. In MySQL 5.7, InnoDB indexes on utf8mb4 columns are no longer limited
by the 767-byte hard cap.

> When innodb_file_format is set to Barracuda, innodb_large_prefix=ON
> allows index key prefixes longer than 767 bytes (up to 3072 bytes) for
> tables that use a Compressed or Dynamic row format.

2. MySQL 8 uses utf8mb4 as the server-wide default character set.

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html#mysqld-8-0-1-charset

I think Django's decision to override MySQL's default settings with the
less well-supported utf8mb3 (a.k.a. utf8) is unwise. It was a reasonable
compromise nine years ago, but it will be a liability in the years to come.

At the very least, could we add some notes to warn readers? The official
Django docs currently suggest:

CREATE DATABASE <dbname> CHARACTER SET utf8;

which is badly misleading.

@felixxm @pope1ni

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:48>

Django

Jul 1, 2021, 8:10:17 AM
to django-...@googlegroups.com
#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 1.4
(models, ORM) |
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nick Pope):

Replying to [comment:48 lambdaq]:
> Lots of stuff had changed since the issue was first open ''nine'' years
> ago.

Yes, it has. But it is still not necessarily straightforward.

> 1. In v5.7, innodb indexes no longer limits the 767 bytes hardcap on
> utf8mb4 indexes.
>
> https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-7.html#mysqld-5-7-7-feature
>
> > When innodb_file_format is set to Barracuda, innodb_large_prefix=ON
> > allows index key prefixes longer than 767 bytes (up to 3072 bytes) for
> > tables that use a Compressed or Dynamic row format.

Based on your linked release notes, MySQL 5.7.7 changed the ''defaults''
of some settings to the following:

{{{
innodb_file_format=Barracuda # Previous default was Antelope
innodb_large_prefix=ON # Previous default was OFF
}}}

These allow indexing strings up to 768 characters instead of 191
characters for utf8mb4 which should eliminate the problem of . I don't
think we need to worry about that new upper limit being an issue as 255
was the previous cap anyway with utf8 (a.k.a. utf8mb3).

It should be noted that these options are also deprecated as of that
release and removed in
[https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-0.html#mysqld-8-0-0-feature
8.0.0]. The release notes also state that using non-default values for the
above settings in MySQL 5.7.7+ will log a deprecation warning.

The underlying problem, however, is what
[https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html row
format] is used. It is necessary for the row format of a table to be
`COMPRESSED` or `DYNAMIC` for large key index support. These were
unavailable with the previous default file format configuration (Antelope)
and only `REDUNDANT` and `COMPACT` could be used.
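
Putting those rules together, a rough sketch of the per-column index key limit InnoDB applies, assuming the default 16 KB page size (the function is purely illustrative):

```python
LARGE_PREFIX_ROW_FORMATS = {"DYNAMIC", "COMPRESSED"}


def innodb_max_key_bytes(row_format, large_prefix=True):
    """Approximate index key prefix limit for an InnoDB table, in bytes.

    Assumes a 16 KB page size. COMPACT and REDUNDANT tables are capped
    at 767 bytes; DYNAMIC and COMPRESSED tables allow up to 3072 bytes
    when large prefixes are enabled (always the case in MySQL 8.0).
    """
    if row_format.upper() in LARGE_PREFIX_ROW_FORMATS and large_prefix:
        return 3072
    return 767


for fmt in ("COMPACT", "DYNAMIC"):
    limit = innodb_max_key_bytes(fmt)
    print(fmt, limit, limit // 4)  # bytes, utf8mb4 characters
# COMPACT 767 191
# DYNAMIC 3072 768
```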

Note that even in MySQL 8.0.0+ all of these
[https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html row
formats] can still be used and __will not be changed automatically during
upgrade__, so the migration of existing projects from utf8mb3 → utf8mb4
(with 255 → 191 characters) is still a potential problem.

That said, the default value of
[https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_default_row_format innodb_default_row_format]
added in MySQL 5.7.9 is `DYNAMIC`. (See
[https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html#innodb-row-format-defining here]
also.) Note that in 5.6
[https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format.html#innodb-row-format-defining the default value]
was `COMPACT`.

I found the following
[https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html#upgrade-innodb-changes upgrade details],
which state:

> In MySQL 5.7.9, `DYNAMIC` replaces `COMPACT` as the implicit default row
> format for InnoDB tables. A new configuration option,
> `innodb_default_row_format`, specifies the default InnoDB row format.
> Permitted values include `DYNAMIC` (the default), `COMPACT`, and
> `REDUNDANT`.
>
> After upgrading to 5.7.9, any new tables that you create use the row
> format defined by `innodb_default_row_format` unless you explicitly define
> a row format (`ROW_FORMAT`).
>
> For existing tables that do not explicitly define a `ROW_FORMAT` option
> or that use `ROW_FORMAT=DEFAULT`, any operation that rebuilds a table also
> silently changes the row format of the table to the format defined by
> `innodb_default_row_format`. Otherwise, existing tables retain their
> current row format setting. For more information, see Defining the Row
> Format of a Table.

So maybe things are not as bad as they seem after all.

> > Important Change: The default character set has changed from latin1 to
> > utf8mb4. These system variables are affected:
>
> IMHO Django's decision to hack MySQL's default settings to less
> supported utf8mb3 (aka the utf8) would be unwise. Maybe it was a proper
> compromise ''9 years ago'', but it will be a liability in the years to
> come.
>
> Maybe at least we can add some notes to alarm the readers?
>
> CREATE DATABASE <dbname> CHARACTER SET utf8;
>
> This would be a huge misleading mistake on official Django doc.
>
> @felixxm @pope1ni

It would be nice to sort this out, not that I'm volunteering. You should
reignite the discussion on the DevelopersMailingList if you want to take
this on.

Given how things have changed maybe there is an easy path forward now and
we could consider the following:

- Require MySQL 5.7.9+. We can't drop 5.7 entirely as it is supported
until October 2023. See SupportedDatabaseVersions.
- Add a system check to ensure that the configuration options are set to
expected values:
{{{
# For MySQL < 8.0.0:
innodb_file_format=Barracuda
innodb_large_prefix=ON
# For all versions:
innodb_default_row_format=DYNAMIC
}}}
- Add a system check that the `ROW_FORMAT` of all tables is `DEFAULT`,
`DYNAMIC`, or `COMPRESSED`. (See
[https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html#innodb-row-format-detrmining here].)
- Change the documentation and connection configuration to use `utf8mb4`.
- Add plenty of warnings into the release notes detailing the restrictions
and how to fix any issues in preparation for upgrading Django.
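
The proposed checks could look roughly like the following. This is a standalone sketch, not Django's system check framework API; it assumes the server version, the server variables (e.g. from `SHOW VARIABLES`), and each table's row format (e.g. from `information_schema.TABLES`) have already been fetched into plain Python values:

```python
def check_mysql_utf8mb4_readiness(version, variables, table_row_formats):
    """Return a list of human-readable problems (empty if none found).

    version: server version tuple, e.g. (5, 7, 9).
    variables: dict of server variables, e.g. from SHOW VARIABLES.
    table_row_formats: dict mapping table name -> row format string.
    """
    problems = []
    if version < (5, 7, 9):
        problems.append("MySQL 5.7.9 or later is required.")
    expected = {"innodb_default_row_format": "DYNAMIC"}
    if version < (8, 0, 0):
        # These options were removed in MySQL 8.0.0.
        expected.update(innodb_file_format="Barracuda",
                        innodb_large_prefix="ON")
    for name, value in expected.items():
        actual = variables.get(name)
        if actual is None or actual.upper() != value.upper():
            problems.append(f"{name} is {actual!r}, expected {value!r}.")
    allowed = {"DEFAULT", "DYNAMIC", "COMPRESSED"}
    for table, row_format in sorted(table_row_formats.items()):
        if row_format.upper() not in allowed:
            problems.append(f"Table {table!r} uses ROW_FORMAT={row_format}.")
    return problems


print(check_mysql_utf8mb4_readiness(
    (5, 7, 20),
    {"innodb_default_row_format": "dynamic",
     "innodb_file_format": "Barracuda",
     "innodb_large_prefix": "ON"},
    {"blog_post": "Dynamic", "legacy_log": "Compact"},
))
# ["Table 'legacy_log' uses ROW_FORMAT=Compact."]
```

Comparisons are case-insensitive because `SHOW VARIABLES` and `information_schema` report these values in varying case.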

One other question is how this all affects MariaDB which is supported by
`django.db.backends.mysql`. I haven't looked into that.

--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:49>
