[Django] #28949: Multibyte table name or column name causes miscalculation of the length of index name.


Django

Dec 20, 2017, 10:52:21 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 2.0
Severity: Normal | Keywords: migration multibyte index
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Django migrations automatically create indexes whose names consist of the
table name, column names, a hash, and a suffix. When the generated index
name is longer than `self.connection.ops.max_name_length()`, Django
shortens it. However, the length is calculated on the Python string, so it
doesn't match the length the database sees. The length should be
calculated after encoding the name with the database encoding. Because of
this issue, migrations fail under the following conditions:

* long multibyte model names
* two multibyte models related by a foreign key
* the foreign key field is a `CharField` (or a subclass of it)

Under these conditions, the migration tries to create two indexes (a
normal index and a `like` index) whose names are identical except for the
suffix (the latter ends in `_like`). The length of each index name as a
string is less than the max name length, but the length of each index name
in bytes is greater than the max name length, so a name conflict is raised.
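
The mismatch can be reproduced outside Django. The following is only a sketch: the 63-byte limit, the table and column names, the hash part, and the `truncate_to_bytes` helper are all assumptions made for illustration, and the helper merely mimics a backend that cuts identifiers at a byte limit.

```python
# Assumption: a 63-byte identifier limit, used here only for illustration.
MAX_NAME_LENGTH = 63

# Hypothetical multibyte table and column names plus a made-up hash part.
table_name = "아주긴한글테이블이름"
column_name = "외래키컬럼이름_id"
index_name = "%s_%s_%s" % (table_name, column_name, "1a2b3c4d")
like_index_name = index_name + "_like"


def truncate_to_bytes(name, limit):
    """Mimic a backend that cuts identifiers at a byte limit (an assumption)."""
    return name.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


for name in (index_name, like_index_name):
    # The character count passes the length check; the byte count does not.
    print(len(name), len(name.encode("utf-8")), len(name) <= MAX_NAME_LENGTH)

stored = truncate_to_bytes(index_name, MAX_NAME_LENGTH)
stored_like = truncate_to_bytes(like_index_name, MAX_NAME_LENGTH)
print(stored == stored_like)  # both names collapse to the same identifier
```

Both names pass a `len()`-based check, yet once cut at the byte limit they become the same identifier, which is the kind of conflict described above.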

long multibyte table name and foreign key name.

Here is the code:
https://github.com/django/django/blob/4420761ea9457d386b2000cf9df5b2f6f88f8f91/django/db/backends/base/schema.py#L873
{{{#!python
index_name = '%s_%s_%s' % (table_name, '_'.join(column_names),
                           hash_suffix_part)
if len(index_name) <= max_length:
    return index_name
}}}

[https://docs.djangoproject.com/en/2.0/ref/databases/#encoding Django
assumes that all databases use UTF-8 encoding], so the code should be
fixed like this:
{{{#!python
index_name = '%s_%s_%s' % (table_name, '_'.join(column_names),
                           hash_suffix_part)
if len(index_name.encode('utf8')) <= max_length:
    return index_name
}}}

The code that shortens the name should also be fixed. Taking a third of
each part and re-joining them is not a good strategy in a multibyte world;
it can also cause miscalculation. I think taking a very small number of
characters from the table and column names, say 2 or 3, and joining them
with the original hash could be a safe solution.
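
One way to picture that suggestion is the sketch below. It is not Django's implementation: `short_index_name`, its parameters, the 8-character MD5-based hash, and the 63-byte default are all hypothetical choices made for illustration.

```python
import hashlib


def short_index_name(table_name, column_names, suffix="", max_bytes=63,
                     keep_chars=3):
    """Hypothetical helper: a few leading characters per part plus a hash."""
    # Hash the full, untruncated names so distinct inputs stay distinct.
    joined = "\x00".join([table_name, *column_names])
    digest = hashlib.md5(joined.encode("utf-8")).hexdigest()[:8]
    # Keep only a few characters of each part for readability.
    parts = [table_name[:keep_chars]] + [c[:keep_chars] for c in column_names]
    name = "%s_%s%s" % ("_".join(parts), digest, suffix)
    if len(name.encode("utf-8")) > max_bytes:
        # Fall back to the hash alone if even the short prefixes are too long.
        name = "%s%s" % (digest, suffix)
    return name


print(short_index_name("아주긴한글테이블이름", ["외래키컬럼이름_id"], "_like"))
```

Because the length check is done on the encoded bytes and the prefixes are short, the suffix always survives and the two index names stay distinct.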

--
Ticket URL: <https://code.djangoproject.com/ticket/28949>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Dec 26, 2017, 2:55:33 PM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:1>

Django

Jan 25, 2018, 8:48:38 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Abhishek Gautam
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Abhishek Gautam):

* owner: nobody => Abhishek Gautam
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:2>

Django

Jan 25, 2018, 2:13:48 PM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Abhishek Gautam
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Abhishek Gautam):

Since we just need a unique name for an index, can we create index_name
as:
{{{#!python
index_name = '%s%s' % (self._digest(*([table_name] + column_names)),
                       suffix)
}}}

_digest function will be:

{{{#!python
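import hashlib

from django.utils.encoding import force_bytes


# Method on the schema editor class (shown here unindented).
@classmethod
def _digest(cls, *args):
    """
    Generate a 32-character hexadecimal MD5 digest of a set of arguments
    that can be used to shorten identifying names.
    """
    h = hashlib.md5()
    for arg in args:
        h.update(force_bytes(arg))
    return h.hexdigest()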
}}}

The `_digest` method returns a 32-character hexadecimal string, to which
we append the suffix, so the length of index_name is 32 + the length of the
suffix. Since the suffix is very short, the length of index_name should not
exceed 40.
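
A standalone way to check that length claim, as a sketch only: the `digest` function below is a stand-in that uses `str.encode` instead of Django's `force_bytes`, and the table and column names are made up.

```python
import hashlib


def digest(*args):
    """Stand-in for the proposed _digest: full MD5 hex digest of the args."""
    h = hashlib.md5()
    for arg in args:
        h.update(str(arg).encode("utf-8"))
    return h.hexdigest()


# Hypothetical multibyte names.
table_name = "아주긴한글테이블이름"
column_names = ["외래키컬럼이름_id"]
suffix = "_like"

index_name = "%s%s" % (digest(*([table_name] + column_names)), suffix)
# The hex digest is pure ASCII, so character and byte lengths agree.
print(len(index_name), len(index_name.encode("utf-8")))
```

The trade-off is readability: a name built this way no longer hints at which table or columns the index belongs to.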

--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:3>

Django

Jan 27, 2018, 8:26:54 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: (none)
Type: Bug | Status: new
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Abhishek Gautam):

* owner: Abhishek Gautam => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:4>

Django

Jan 2, 2022, 10:15:43 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* owner: (none) => Jacob Walls
* status: new => assigned
* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/15273 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:5>

Django

Jan 27, 2022, 3:07:06 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 0 => 1
* needs_tests: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:6>

Django

Jan 27, 2022, 11:09:48 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* needs_better_patch: 1 => 0
* needs_tests: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:7>

Django

Jan 28, 2022, 4:47:54 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:8>

Django

Jan 28, 2022, 5:43:54 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Migrations | Version: 2.0
Severity: Normal | Resolution:
Keywords: migration multibyte index | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 0 => 1
* stage: Ready for checkin => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:9>

Django

Jan 28, 2022, 10:18:12 AM
to django-...@googlegroups.com
#28949: Multibyte table name or column name causes miscalculation of the length of
index name.
-------------------------------------+-------------------------------------
Reporter: Pak Youngrok | Owner: Jacob Walls
Type: Bug | Status: closed
Component: Migrations | Version: 2.0
Severity: Normal | Resolution: wontfix
Keywords: migration multibyte index | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: assigned => closed
* resolution: => wontfix
* stage: Accepted => Unreviewed


Comment:

Closing per [https://github.com/django/django/pull/15273 discussion]. We
cannot use `encode()` because identifier limits are expressed in chars,
not bytes, and chars can take 2, 3, or 4 bytes. It may also depend on the
encoding of the operating system or database, so it's not feasible to
prepare a fully backward compatible solution. I'd say that if you decided
to use non-ASCII chars in identifiers, you actually did this to yourself.
Any solution would be error-prone.
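
To illustrate why a byte count is not a stable measure, here is a small sketch. The sample characters and the list of encodings are arbitrary choices for illustration and say nothing about what any particular database does.

```python
# One character each, chosen to need 1, 2, 3 and 4 bytes in UTF-8.
samples = ["a", "\u00e9", "한", "😀"]
for ch in samples:
    print(repr(ch), len(ch), len(ch.encode("utf-8")))

# The same two-character name has a different byte length per encoding.
name = "한글"
byte_lengths = {
    enc: len(name.encode(enc)) for enc in ("utf-8", "utf-16-le", "euc-kr")
}
print(len(name), byte_lengths)
```

A check based on `encode('utf8')` would therefore only be right for backends that both store identifiers as UTF-8 and count the limit in bytes.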

--
Ticket URL: <https://code.djangoproject.com/ticket/28949#comment:10>
