This is not a big issue on backends that support transactional DDL but it
is on ones that don't because in between the execution of the two
statements the column is not indexed.
For example given the following model
{{{#!python
class Author(models.Model):
name = models.CharField(max_length=50, db_index=True)
}}}
Altering `name` to `models.CharField(max_length=50, unique=True)` will
generate the following SQL on MySQL
{{{#!sql
DROP INDEX `author_name_idx` ON `author`;
-- Until the following statement completes the name field is not indexed
at all.
-- This can take a while on large tables.
ALTER TABLE `author` ADD CONSTRAINT `author_name_uniq` UNIQUE (`name`);
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/30867>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by Claude Paroz):
Why do you see that as a problem? Shouldn't the time between the two
commands be rather short? I think that reduced performance is something we
could expect during the execution of a migration.
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:1>
Comment (by Simon Charette):
> Why do you see that as a problem? Shouldn't the time between the two
commands be rather short?
Unique constraint creation can be relatively slow on large tables
particularly when performed without a lock (e.g. MySQL `LOCK=NONE`) to
allow concurrent read and writes. During that time period not having an
index on a frequently queried column can cause such queries to take a a
few orders more time to process and thus slowdown the constraint creation
time even more. At least that's what we've experienced on a table with 6M
rows when adding a unique constraint on a `VARCHAR(255) NOT NULL`.
> I think that reduced performance is something we could expect during the
execution of a migration.
Agreed but if we can reduce the performance degradation at the cost of
temporarily elevated storage needs I think we should do it.
Is there any reason except for double index storage needs not to do it?
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:2>
Comment (by Claude Paroz):
> Is there any reason except for double index storage needs not to do it?
No, if we can reasonably estimate that the risk of introducing regressions
is low and that it doesn't add too much code complexity.
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:3>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:4>
Comment (by Claude Paroz):
I would also add that at some point, you should assume the fact you are
using a database not supporting transactional DDL.
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:5>
Comment (by Simon Charette):
> I would also add that at some point, you should assume the fact you are
using a database not supporting transactional DDL.
FWIW I'm trying to figure out if it does make a difference on backends
that support transactional DDL as well.
For example, it's not completely crazy to assume that PostgreSQL could use
an existing index on a column it's trying to create a unique constraint on
to speed up the operation given B-trees are used for both implementations.
If that's the case deferring the index removal could be beneficial to
other backends as well.
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:6>
* status: new => closed
* resolution: => wontfix
Comment:
After a bit of investigation it looks like this would effectively add a
lot of complexity to the already complex `_alter_field` for little benefit
because the optimization is only valuable when only performing a
`(db_index=True)` to `(unique=True)` migration. If the field type is also
meant to be altered (e.g type) we certainly want to drop the index before
hand to avoid an internal rebuild with the new type which makes it hard to
determine when the optimization should be performed.
For the record I managed to get a noticeable speed up (10-15%) on
PostgreSQL when adding a unique constraint to an already indexed column
(10M rows integer).
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:7>
Comment (by Claude Paroz):
Thanks for sharing your experimentations!
"Le mieux est parfois l'ennemi du bien" :-)
--
Ticket URL: <https://code.djangoproject.com/ticket/30867#comment:8>