[Django] #31276: Add 'flush' option to only clear non-empty tables

15 views
Skip to first unread message

Django

unread,
Feb 15, 2020, 11:48:50 AM2/15/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam | Owner: nobody
(Chainz) Johnson |
Type: | Status: new
Cleanup/optimization |
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
This was suggested as an extra feature for Django-MySQL by @jonatron in
https://github.com/adamchainz/django-mysql/pull/611/files , but it would
make more sense to make it in Django core.

The `sql_flush` database operation, used by the 'flush' management command
and, thus through `TransactionTestCase`, creates one `TRUNCATE TABLE`
statement for every table - regardless of whether they are empty already.
This means unnecessary database operations are run, which will be most
noticeable in test runs.

Instead of flushing *every* table, we can flush only the non-empty ones.

A minimal, hopefully cross-database way of quickly finding out which
tables in a list are non-empty is to make existence queries on them
combined with union:

{{{
chainz@localhost [14]> (select 't' as has_rows from t limit 1) union
(select 't2' as has_rows from t2 limit 1);
+----------+
| has_rows |
+----------+
| t2 |
+----------+
1 row in set (0.001 sec)
}}}

We could use logic like this to batch the tables into such union queries
in `django.core.management.sql.sql_flush`. We could then reduce the list
of tables we wish to flush down to the non-empty ones for the `sql_flush`
operation. Perhaps it would be best to put this behind a flag that is set
only in tests, for backwards compatibility.

--
Ticket URL: <https://code.djangoproject.com/ticket/31276>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 16, 2020, 11:21:11 AM2/16/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: nobody

Johnson |
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz):

* stage: Unreviewed => Accepted


Comment:

+1. What sort of backwards compatibility issues would you imagine here?

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:1>

Django

unread,
Feb 16, 2020, 11:32:57 AM2/16/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: nobody

Johnson |
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Adam (Chainz) Johnson):

Not sure actually, just being cautious by suggesting the flag in case
someone else can come up with a case. As far as I can tell, as long as we
still reset sequences, there’s no need to truncate/delete from empty
tables.

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:2>

Django

unread,
Feb 17, 2020, 1:48:43 AM2/17/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: nobody

Johnson |
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by felixxm):

I'm not sure, TBH. Checking no. of rows for each table will have a
performance impact, and creating a huge union for all tables (as proposed)
may not be better:
{{{


(select 't' as has_rows from t limit 1) union (select 't2' as has_rows
from t2 limit 1);
}}}

We need to take into account a few restrictions e.g. the database's limit
on the number of query parameters and 3rd-party database backends may not
support `UNION`.

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:3>

Django

unread,
Feb 18, 2020, 1:40:07 PM2/18/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned

Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jonatron):

* owner: nobody => jonatron
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:4>

Django

unread,
Feb 19, 2020, 7:02:05 AM2/19/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jonatron):

I've been benchmarking with 1000 tables and 5 tests. You'll notice sqlite
has a problem with too many tables. Also the query for sqlite needs to be:
{{{select * from ( SELECT '{}' AS has_rows FROM {} LIMIT 1 )}}}
instead of
{{{( SELECT '{}' AS has_rows FROM {} LIMIT 1 )}}}

postgres

current behaviour
Ran 5 tests in 1.538s
Ran 5 tests in 1.545s
Ran 5 tests in 1.573s

with patch
Ran 5 tests in 0.807s
Ran 5 tests in 0.880s
Ran 5 tests in 0.812s


mysql
current behaviour
Ran 5 tests in 17.101s
Ran 5 tests in 15.396s
Ran 5 tests in 15.487s

with patch
Ran 5 tests in 3.891s
Ran 5 tests in 3.545s
Ran 5 tests in 3.522s


sqlite - 400 tables
current behaviour
Ran 5 tests in 2.006s
Ran 5 tests in 2.039s
Ran 5 tests in 2.055s

with patch
Ran 5 tests in 0.106s
Ran 5 tests in 0.104s
Ran 5 tests in 0.107s

sqlite - 1000 tables
django.db.utils.OperationalError: too many terms in compound SELECT

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:5>

Django

unread,
Feb 19, 2020, 7:26:44 AM2/19/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by felixxm):

About SQLite, I mentioned this in my
[https://code.djangoproject.com/ticket/31276#comment:3 comment] ''"... the
database's limit on the number of query parameters"'' (999 on SQLite).

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:6>

Django

unread,
Feb 19, 2020, 7:57:26 AM2/19/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jonatron):

The sqlite problem can be fixed using the same chunking as is in
_quote_params_for_last_executed_query.
I can't explain sqlite's performance.

sqlite
1000 tables with patch
Ran 5 tests in 0.237s
Ran 5 tests in 0.345s
Ran 5 tests in 0.288s
Ran 5 tests in 0.257s
Ran 5 tests in 0.240s

1000 tables without patch

Ran 5 tests in 55.617s
Ran 5 tests in 57.577s
Ran 5 tests in 55.242s

900 tables without patch
Ran 5 tests in 45.796s

800 tables without patch
Ran 5 tests in 35.795s

600 tables without patch
Ran 5 tests in 20.521s

550 tables without patch
Ran 5 tests in 19.025s

500 tables without patch
Ran 5 tests in 3.049s

400 tables without patch
Ran 5 tests in 2.068s

200 tables without patch
Ran 5 tests in 0.612s

--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:7>

Django

unread,
Feb 20, 2020, 10:44:52 AM2/20/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jonatron):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:8>

Django

unread,
Feb 25, 2020, 7:14:10 AM2/25/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* needs_docs: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:9>

Django

unread,
Oct 23, 2020, 4:28:34 AM10/23/20
to django-...@googlegroups.com
#31276: Add 'flush' option to only clear non-empty tables
-------------------------------------+-------------------------------------
Reporter: Adam (Chainz) | Owner: jonatron
Johnson |
Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: assigned => closed
* resolution: => wontfix


--
Ticket URL: <https://code.djangoproject.com/ticket/31276#comment:12>

Reply all
Reply to author
Forward
0 new messages