In the case of `makemigrations` the algorithm is run on each generated
`Migration.operations` individually and in the case of `squashmigration
<app_label>` it is the ordered combination of all `Migration.operations`
of the specified application.
The actual reduction algorithm
[https://github.com/django/django/blob/8715205c5c0e49b21b5bbea35d904713ee188a71/django/db/migrations/optimizer.py#L34-L77
hard codes all possible optimizations] which makes it harder to unit test
and extend (see #24109).
From my understanding these reduction methods were originally hardcoded in
the `Optimizer` class for performance reason. However, based on
benchmarking using a [https://gist.github.com/MarkusH/a95d9407a96fb8e63783
migration generation script provided by Markus], it looks like this might
be a case of premature optimization as I couldn't get a noticeable
slowdown on both `makemigrations` (with over 1000 operations generated in
a single migration) and `squashmigrations` (with over 100 migrations with
around 10 operations each).
Therefore I suggest to move operation reduction to their respective class.
Note that the `Operation.reduce` method could even be documented in the
future to expose a public API for third party application shipping with
custom `Operation` providers.
If it was made public this method could also be used to solve
[https://groups.google.com/d/msg/django-
developers/Ln1-IqysEwE/iL20WPT3EgAJ the data migration squashing problem
described by Shai on the mailing list] by allowing specific reduction to
be performed.
For example, given the following operation:
{{{#!python
class CreateInitialCountries(migrations.RunPython):
def __init__(model_name, name_field):
self.model_name = model_name
self.name_field = name_field
super().__init__(self.forwards)
def forwards(self, apps, schema_editor):
country_model = apps.get(self.model_name)
for name in self.get_initial_countries_from_csv():
country_model.objects.create(**{self.name_field: name})
def reduce(self, operation, in_between):
if (isinstance(operation, RenameModel) and
self.model_name == operation.old_model_name):
return [
operation, CreateInitialCountries(
operation.new_model_name, self.name_field
)
]
elif (isinstance(operation, RenameField) and
self.model_name == other.model_name and
self.field_name == other.old_name):
return [
operation, CreateInitialCountries(
self.model_name, other.new_name
)
]
elif (isinstance(operation, DeleteModel) and
self.model_name == operation.model_name):
return [operation]
}}}
These list of operations would me reduced/optimized as follow:
{{{#!python
operations = [
CreateModel(
'Country',
fields=[
('id', models.AutoField(primary_key=True)),
('name', models.CharField(max_length=50)),
]
),
CreateInitialCountries('Country', 'name'),
RenameField('Country', 'name', 'renamed_name'),
RenameModel('Country', 'RenamedCountry'),
]
assert optimize(operations) == [
CreateModel(
'RenamedCountry',
fields=[
('id', models.AutoField(primary_key=True)),
('renamed', models.CharField(max_length=50)),
]
),
CreateInitialCountries('RenamedCountry', 'renamed'),
]
operations = [
CreateModel(
'Country',
fields=[
('id', models.AutoField(primary_key=True)),
('name', models.CharField(max_length=50)),
]
),
CreateInitialCountries('Country', 'name'),
DeleteModel('Country'),
]
assert optimize(operations) == []
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26064>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:1>
* owner: nobody => charettes
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:2>
Comment (by shaib):
That looks very interesting, but has some issues that need working out.
The first is -- it seems obvious that the operation needs to be defined
out of the migration files. But intuitively, this is an operation that is
"only serving one migration" -- the idea that it would serve in a
different migration after a squashing operation will not be on anyone's
mind when it is written. As things are presented, I suspect it would be
hard to explain why you'd want to write the code of your data migration in
an operation class rather than a migration file.
The second (a problem we already have today, but perhaps this gives us an
opportunity to solve it) is interaction between user operations. If we go
this way, users will want to be able to handle these -- consider the
migration which adds the population to each country; our user will want to
replace the two (creation and add-population) by one operation, which
creates the countries with their population. But for that to make sense
(with the admonition not to change migrations which have already been
applied), the reduction rules cannot be defined on the earlier migration
-- they need to be defined either on the later one, or outside of both.
More generally -- I've seen data-creation migrations which have been much
more elaborate, creating complex structures with relations between models
etc. I suspect we won't be able to ask users to write such migrations with
all model and field names passed in as arguments and used via reflection.
Still, it may be possible to get from a "normal" `RunPython` function to
one closer in spirit to your `CreateInitialCountries` operation by more-
or-less automatic means (AST transformations).
So, there's a lot of promise here, but also a lot of work and thinking to
do.
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:3>
* cc: shaib (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:4>
* cc: loic (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:5>
Comment (by carljm):
Hmm, this is neat, and if there isn't a performance hit I think it's
probably worth implementing just to see what people do with the improved
flexibility, but I am skeptical about just how useful this will be in
practice for the use case described in the ticket. To put my skepticism in
brief: is it actually easier to write that reduce method in advance,
anticipating all the various schema alterations that might surround your
operation and how you'd return an adjusted version of your operation to
account for them, than it is to just take care of it manually when
squashing? Even if you're using "squashmigrations" rather than the "full
reset" approach, "taking care of it manually" is just a matter of removing
the data migration, adjusting its successor to point to its parent, then
squashing, then re-appending (and modifying as necessary) the data
migration. Squashing is done infrequently enough that I have a hard time
imagining when I'd actually choose to write and test a fully parametrized
operation with a custom reduce method instead of just doing that manual
dance every now and then. Maybe I just don't use initial-data migrations
enough? Or don't squash frequently enough?
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:6>
Comment (by charettes):
> To put my skepticism in brief: is it actually easier to write that
reduce method in advance, anticipating all the various schema alterations
that might surround your operation and how you'd return an adjusted
version of your operation to account for them, than it is to just take
care of it manually when squashing? Even if you're using
"squashmigrations" rather than the "full reset" approach, "taking care of
it manually" is just a matter of removing the data migration, adjusting
its successor to point to its parent, then squashing, then re-appending
(and modifying as necessary) the data migration. Squashing is done
infrequently enough that I have a hard time imagining when I'd actually
choose to write and test a fully parametrized operation with a custom
reduce method instead of just doing that manual dance every now and then.
Maybe I just don't use initial-data migrations enough? Or don't squash
frequently enough?
Thanks for your feedback Carl. I agree that for most experienced
developers among us this won't be a game changer. As you said we can
easily figure out where the non-elidable migration has to be moved or how
it should be adjusted.
However I think it can make a significant difference for third-party
applications and large projects where non experienced developers still
have to squash their migrations on their own once there is too many of
them. Such an addition could benefit such users as more experienced
developers could make the whole process a no brainer, the operation would
take care of itself.
For example, the `HStoreExtension` and `HStoreField` case would be a good
candidate for such an addition.
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:7>
* stage: Accepted => Ready for checkin
Comment:
[https://github.com/django/django/pull/5957 PR] looks good to me.
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:8>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"49f4c9f4c61885189d136d7073c26ecc91b482b1" 49f4c9f4]:
{{{
#!CommitTicketReference repository=""
revision="49f4c9f4c61885189d136d7073c26ecc91b482b1"
Fixed #26064 -- Moved operation reduction logic to their own class.
Thanks to Markus Holtermann and Tim Graham for their review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:9>
Comment (by GitHub <noreply@…>):
In [changeset:"daaa894960e1b2bce8ee31b7c109be84f598a84e" daaa8949]:
{{{
#!CommitTicketReference repository=""
revision="daaa894960e1b2bce8ee31b7c109be84f598a84e"
Refs #26064 -- Avoided unnecessary list slicing in migration optimizer.
The in_between list is only necessary if an optimization is possible.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26064#comment:10>