[Django] #33017: Fix get_storage_engine to only look at current database

8 views
Skip to first unread message

Django

unread,
Aug 12, 2021, 3:58:21 AM8/12/21
to django-...@googlegroups.com
#33017: Fix get_storage_engine to only look at current database
-------------------------------------+-------------------------------------
Reporter: Matjaz | Owner: nobody
Gregoric |
Type: Bug | Status: new
Component: Database | Version: 3.2
layer (models, ORM) |
Severity: Normal | Keywords: MySQL
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
When querying the information_schema.tables for the storage engine, we
have to specify table_schema in addition to table_name, otherwise the
query returns a list of results for all tables with the specified name
from every database in the system. If there are multiple tables with the
same name but using different storage engines present in different
databases in the MySQL instance, the query could return a wrong result.

We have to limit the query to the current database to make sure we get the
correct result.

This also improves performance on MySQL instances with a large number of
databases, since querying the information_schema table can be very slow.

The get_storage_engine function that this patch fixes is invoked when
adding columns during migrations to determine whether it should index the
column. We have a MySQL instance with over a thousand databases in
production and this patch has cut down the time it takes to run a specific
set of django migrations from 2 hours to around 15 minutes.

--
Ticket URL: <https://code.djangoproject.com/ticket/33017>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 12, 2021, 3:59:19 AM8/12/21
to django-...@googlegroups.com
#33017: Fix get_storage_engine to only look at current database
-------------------------------------+-------------------------------------
Reporter: Matjaz Gregoric | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: MySQL | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Matjaz Gregoric:

Old description:

> When querying the information_schema.tables for the storage engine, we
> have to specify table_schema in addition to table_name, otherwise the
> query returns a list of results for all tables with the specified name
> from every database in the system. If there are multiple tables with the
> same name but using different storage engines present in different
> databases in the MySQL instance, the query could return a wrong result.
>
> We have to limit the query to the current database to make sure we get
> the correct result.
>
> This also improves performance on MySQL instances with a large number of
> databases, since querying the information_schema table can be very slow.
>
> The get_storage_engine function that this patch fixes is invoked when
> adding columns during migrations to determine whether it should index the
> column. We have a MySQL instance with over a thousand databases in
> production and this patch has cut down the time it takes to run a
> specific set of django migrations from 2 hours to around 15 minutes.

New description:

https://github.com/django/django/pull/14766

When querying the information_schema.tables for the storage engine, we
have to specify table_schema in addition to table_name, otherwise the
query returns a list of results for all tables with the specified name
from every database in the system. If there are multiple tables with the
same name but using different storage engines present in different
databases in the MySQL instance, the query could return a wrong result.

We have to limit the query to the current database to make sure we get the
correct result.

This also improves performance on MySQL instances with a large number of
databases, since querying the information_schema table can be very slow.

The get_storage_engine function that this patch fixes is invoked when
adding columns during migrations to determine whether it should index the
column. We have a MySQL instance with over a thousand databases in
production and this patch has cut down the time it takes to run a specific
set of django migrations from 2 hours to around 15 minutes.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33017#comment:1>

Django

unread,
Aug 17, 2021, 4:58:53 AM8/17/21
to django-...@googlegroups.com
#33017: Fix get_storage_engine to only look at current database
-------------------------------------+-------------------------------------
Reporter: Matjaz Gregoric | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: MySQL | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

OK, seems plausible. Let's take it for review. Thanks Matjaz.

--
Ticket URL: <https://code.djangoproject.com/ticket/33017#comment:2>

Django

unread,
Aug 20, 2021, 5:07:22 AM8/20/21
to django-...@googlegroups.com
#33017: Fix get_storage_engine to only look at current database
-------------------------------------+-------------------------------------
Reporter: Matjaz Gregoric | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: MySQL | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/33017#comment:3>

Django

unread,
Aug 20, 2021, 5:47:46 AM8/20/21
to django-...@googlegroups.com
#33017: Fix get_storage_engine to only look at current database
-------------------------------------+-------------------------------------
Reporter: Matjaz Gregoric | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: MySQL | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"518ce7a51f994fc0585d31c4553e2072bf816f76" 518ce7a5]:
{{{
#!CommitTicketReference repository=""
revision="518ce7a51f994fc0585d31c4553e2072bf816f76"
Fixed #33017 -- Fixed storage engine introspection on MySQL.

This also improves performance on MySQL instances with a large number
of databases, since querying the information_schema table can be very
slow
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33017#comment:4>

Reply all
Reply to author
Forward
0 new messages