Instead of comparing the fake models rendered from the state generated by
applying `state_forwards` of all existing migrations against the current
model state generating model states from the current model classes and
comparing both would shave off a model rendering phase.
This was extracted from the meta ticket #22608 which tracks slow parts of
the migration framework.
An initial and dusty stab at this can be found here
https://github.com/django/django/compare/master...charettes:autodetector-
model-states
Note that this has a bit of overlap with #29898 because both optimizations
require a way to resolve relations efficiently from model states.
--
Ticket URL: <https://code.djangoproject.com/ticket/29899>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* cc: David Wobrock (added)
* owner: nobody => David Wobrock
* status: new => assigned
Comment:
Hi there,
I got more or less my head around what is asked in the ticket and how it
would improve the autodetector. However, I have some questions and I'd
like to ask for some help and expertise on this issue.
I'll just shoot my questions:
1. Am I understanding correctly that, by using the ModelState objects,
instead of the rendered model objects/classes, we avoid calling the
ProjectState.apps and ProjectState.concrete_apps (in the
Autodetector.__init__) which are rendering the models? Therefore, we avoid
work and speed up the makemigrations and migrate commands?
But since the "from_state" is still generated from the migration graph, we
will still apply all existing migrations iteratively to build it, right?
So we cannot totally avoid exploring the migration graph?
And let's get down to some more technical business :) I started re-working
the patch, it's still a work in progress, but I think most of the logic of
the autodetector is migrated to using model states. You should be able to
look at the patch commit-by-commit for convenience. I'll still need to
work on tests, fix the existing ones and adding some new ones to ensure I
introduced no regression. Please find my current progress here
https://github.com/django/django/compare/master...David-
Wobrock:ticket-29899?expand=1
2. The one thing about the logic which I'm unsure of, is the "relations"
property which is introduced in the ProjectState. The goal is, I guess, to
substitute the "related_objects" that a model class' option contains.
However, I don't fully understand how the "proxies", "concrete" models and
"swappable" ones work together.
I see that we are trying to fill the relations, for each model (identified
by app_label+model_name), have the related models (identified by the same
key) and the associated fields that have a relation. It's not easy to get
your head around the 4 consecutive loops.
- What are those relations exactly? All possible defined FK (also m2m,
...) on the model itself? Or also fields FK that point to this model?
- And would it possible, even quickly, to explain how "swappable", "proxy"
and "concrete" work together, I'd be grateful. I mean, I guess that we
need to resolve the real/concrete model from a proxy model, but
"swappable" is my main source of confusion here and I don't know how it
should be handled :/ Like a proxy model because we resolve the concrete
model when the User model is swappable?
Thanks a lot in advance! I'm assigning myself the issue.
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:1>
Comment (by Simon Charette):
Hello David,
> Am I understanding correctly that, by using the ModelState objects,
instead of the rendered model objects/classes, we avoid calling the
ProjectState.apps and ProjectState.concrete_apps (in the
Autodetector.init) which are rendering the models? Therefore, we avoid
work and speed up the makemigrations and migrate commands?
Exactly, accessing `.apps` and `concrete_apps` involves model rendering
which is really slow.
> But since the "from_state" is still generated from the migration graph,
we will still apply all existing migrations iteratively to build it,
right? So we cannot totally avoid exploring the migration graph?
Right but `from_state` is generated by calling `states_forwards` on all
operations of on-disk migrations and none of these operations should
access `.apps` during `states_forwards` which should make them relatively
fast.
> The one thing about the logic which I'm unsure of, is the "relations"
property which is introduced in the ProjectState. The goal is, I guess, to
substitute the "related_objects" that a model class' option contains.
That's it
> However, I don't fully understand how the "proxies", "concrete" models
and "swappable" ones work together.
You'll have to do a bit more of investigation on your side to figure this
out I'm afraid but the gist is that ''proxies'' models are `Meta.proxy =
True` models which are always backed by a ''concrete'' table backed one.
''swappable'' refers to models that can be swapped by other ones.
Currently only `auth.User` uses this feature internally.
> I see that we are trying to fill the relations, for each model
(identified by app_label+model_name), have the related models (identified
by the same key) and the associated fields that have a relation. It's not
easy to get your head around the 4 consecutive loops.
Yeah this code is tricky for sure.
> What are those relations exactly? All possible defined FK (also m2m,
...) on the model itself? Or also fields FK that point to this model?
IIRC they are all forwards relationships which means related fields
''from'' the model.
> And would it possible, even quickly, to explain how "swappable", "proxy"
and "concrete" work together, I'd be grateful. I mean, I guess that we
need to resolve the real/concrete model from a proxy model, but
"swappable" is my main source of confusion here and I don't know how it
should be handled :/ Like a proxy model because we resolve the concrete
model when the User model is swappable?
I tried to explain it above but I suggest you start by reading the
(limited) swappable model documentation and then dive into how they are
currently handled. A lot of this logic is tribal knowledge share by the
limited amount of people who worked on the migration framework over the
years and its hard to define beyond what's coded and currently passes the
suite. The basic idea is that swappable models could have been swapped by
another one (e.g. a custom user model via `settings.AUTH_USER_MODEL`) so
it needs to be treated differently.
As mentioned in the description there's a lot of overlap with this ticket
and #29898 which [https://groups.google.com/d/msg/django-
developers/_ohBzsuomqw/VOr2itOiAwAJ I tried to describe on the mailing
list to a developer interested in working on the latter for GSoC]. Both
tickets require access to a `ProjectState` forwards relationship cache
(and maybe a reverse one as well in this case) and both would benefit from
having `ProjectState` define methods that perform state alterations
currently baked into `Operation.state_forwards` overrides. That would
allow the auto-detector to simply call these operations if necessary when
performing renames instead of maintaining tons of different maps of
changes. It would also pave the way for `ProjectState`/`ModelState` to
have their own `.difference(other, questionner=None)` methods to break the
monolithic `autodetector` file into testable and extendable components
which users have been asking for a while.
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:2>
* needs_better_patch: 1 => 0
Comment:
PR [https://github.com/django/django/pull/12688]
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:3>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:4>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:5>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:6>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:8>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:9>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:10>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"aa4acc164d1247c0de515c959f7b09648b57dc42" aa4acc1]:
{{{
#!CommitTicketReference repository=""
revision="aa4acc164d1247c0de515c959f7b09648b57dc42"
Fixed #29899 -- Made autodetector use model states instead of model
classes.
Thanks Simon Charette and Markus Holtermann for reviews.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:13>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"a67849499a7aac6a18f8ce0e15f17914765a319c" a6784949]:
{{{
#!CommitTicketReference repository=""
revision="a67849499a7aac6a18f8ce0e15f17914765a319c"
Refs #29899 -- Improved variable names in
MigrationAutodetector._detect_changes().
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:12>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"fd325b9dee95f0930f5e8d582892b899508021fe" fd325b9]:
{{{
#!CommitTicketReference repository=""
revision="fd325b9dee95f0930f5e8d582892b899508021fe"
Refs #29899 -- Moved resolve_relation() to django.db.migrations.utils.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:11>
Comment (by Chris Jerdonek):
On this issue, does anyone remember if there was a reason that
`self.relations`
[https://github.com/django/django/commit/aa4acc164d1247c0de515c959f7b09648b57dc42
#diff-
5dd147e9e978e645313dd99eab3a7bab1f1cb0a53e256843adb68aeed71e61dcR93-R94
was defined] to have this form--
* `{remote_model_key: {model_key: [(field_name, field)]}}`
as opposed to this form?
* `{remote_model_key: {model_key: {field_name: field}}}`
While reviewing [https://github.com/django/django/pull/14587 PR #14587]
for #29898, I noticed the code would be a bit simpler in parts if it were
the latter as opposed to the former, but I wasn't sure if there was a
reason for a using a list of pairs instead of a dict.
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:14>
Comment (by Mariusz Felisiak):
Replying to [comment:14 Chris Jerdonek]:
> On this issue, does anyone remember if there was a reason that
`self.relations`
[https://github.com/django/django/commit/aa4acc164d1247c0de515c959f7b09648b57dc42
#diff-
5dd147e9e978e645313dd99eab3a7bab1f1cb0a53e256843adb68aeed71e61dcR93-R94
was defined] to have this form--
>
> * `{remote_model_key: {model_key: [(field_name, field)]}}`
>
> as opposed to this form?
>
> * `{remote_model_key: {model_key: {field_name: field}}}`
>
> While reviewing [https://github.com/django/django/pull/14587 PR #14587]
for #29898, I noticed the code would be a bit simpler in parts if it were
the latter as opposed to the former, but I wasn't sure if there was a
reason for a using a list of pairs instead of a dict.
Probably because there was no need to update the list of fields. I was
also thinking about changing this structure to a dict.
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:15>
Comment (by Chris Jerdonek):
> I was also thinking about changing this structure to a dict.
👍
--
Ticket URL: <https://code.djangoproject.com/ticket/29899#comment:16>