When adding multiple fields to a Model the makemigrate and migrate does
not attempt to create one ALTER TABLE statement, but rather one per field
added. This can cause slow migrations of large tables.
I propose to optimise to attempt to merge multiple AddField to one single
ALTER TABLE request when the migrate is run.
My real world case the db backend was MySQL, here is PG example of 2nd
migration db output after adding two fields to one model.
{{{
Applying play.0002_auto_20150122_1825...DEBUG ALTER TABLE "play_play"
ADD COLUMN "field2" varchar(100) NULL; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100)
NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; (params
[])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT;
args=[]
DEBUG ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL;
(params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100)
NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; (params
[])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT;
args=[]
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24203>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Some ideas: https://github.com/peterlauri/django/pull/1
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:1>
Comment (by timgraham):
Could you give some performance numbers for comparison?
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:2>
Comment (by peterlauri):
Replying to [comment:2 timgraham]:
> Could you give some performance numbers for comparison?
I can provide this when I'm back from vacation in two weeks (13th of
February), I don't have access to the db for the moment.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:3>
* cc: MarkusH (added)
* version: 1.7 => master
* stage: Unreviewed => Someday/Maybe
Comment:
On PG it is recommended (in the docs iirc, but can't find a link) to add a
column with `null=True`, promote the row values for existing rows and drop
the `not null` constraint. This way adding the column is instant and thus
no need to join multiple `AddField` operations.
On MySQL <5.6 the `ADD COLUMN` operations are expensive in any case
independent from `NULL` or `NOT NULL` afaik.
On SQLite, well, it's SQLite. It's slow anyways and doesn't support adding
multiple columns at once.
I have no clue about Oracle's behavior.
That said, I'm not sure this is really necessary but keep the issue open
for your feedback.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:4>
Comment (by simonpercivall):
Replying to [comment:2 timgraham]:
> Could you give some performance numbers for comparison?
I can give performance numbers for a MySQL installation.
Doing an ALTER TABLE on a table with 120 columns and containing approx.
20M rows takes 2–3 hours. Combining several ALTER TABLES into one takes no
longer time.
The simple math here is that three Django migrations would take 6–9 hours,
whereas the combined version would take 2–3 hours.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:5>
Comment (by peterlauri):
Replying to [comment:2 timgraham]:
> Could you give some performance numbers for comparison?
This is not from production, it contains ~500k rows. I will try it out on
PG as well, just need to prep some data.
{{{
mysql> select count(*) from thetable;
+----------+
| count(*) |
+----------+
| 489484 |
+----------+
1 row in set (0.07 sec)
mysql> ALTER TABLE thetable ADD COLUMN field2 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 29.05 sec)
Records: 489484 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE thetable ADD COLUMN field3 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 10.82 sec)
Records: 489484 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE thetable ADD COLUMN field4 varchar(100) NULL, ADD
COLUMN field5 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 24.92 sec)
Records: 489484 Duplicates: 0 Warnings: 0
mysql>
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:6>
Comment (by peterlauri):
Replying to [comment:2 timgraham]:
> Could you give some performance numbers for comparison?
I prepped a DB with some fields and 500k rows. The alter table statements
as in my original description executed each in few milliseconds. No
significant improvement to execute them in a single statement.
{{{
ballongen=# select count(*) from testtable;
count
--------
500000
(1 row)
Time: 34.261 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field2" varchar(100) NULL;
ALTER TABLE
Time: 56.649 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field2" DROP DEFAULT;
ALTER TABLE
Time: 2.563 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field3" varchar(100) NULL;
ALTER TABLE
Time: 1.165 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field3" DROP DEFAULT;
ALTER TABLE
Time: 0.240 ms
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:7>
Comment (by adamchainz):
+1
I spend a large amount of time rewriting the SQL output from django
migrations to combine the ALTER TABLE statements for each table into one,
just so they can pass on our production database (MySQL, 30GB, hundreds of
millions of rows).
Actually I normally use [http://www.percona.com/doc/percona-toolkit/2.2
/pt-online-schema-change.html pt-online-schema-change] for tables of any
size, since it's 0 downtime, but it's still a pain to rewrite all the SQL
into a single ALTER.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:8>
Comment (by peterlauri):
Please note that in my comment #7
(https://code.djangoproject.com/ticket/24203#comment:7) I forgot to
mention this was for PostgreSQL.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:9>
* cc: ambivalentno (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:10>
Comment (by ambivalentno):
Regarding timings:
InnoDB table with 2 mln entries (but lots of big TextFields).
It takes ~3 hours to add a single field, and also ~3 hours to add multiple
fields in case they're grouped via single sql statement. Actually, most of
time it takes to perform a migration mysql copies data to temporary table.
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:11>
* cc: dharris+django@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:12>
* cc: emorley@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:13>
* cc: Phil Krylov (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:14>
* cc: Adam (Chainz) Johnson (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:15>
* cc: elonzh (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:16>
* cc: elonzh (removed)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:17>
* cc: elonzh (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24203#comment:18>