Adding support for importing models through the app registry while it is loading

173 views
Skip to first unread message

Aymeric Augustin

unread,
Sep 4, 2016, 3:03:18 PM9/4/16
to django-d...@googlegroups.com
Hello,

Since Django 1.7, `get_user_model()` cannot be called at import time. Django contains a lot of methods that start with `UserModel = get_user_model()` while it would be more natural to declare it as a global variable in the module. This trips up users and the reason why it’s forbidden isn’t obvious.


It's the consequence of two design decisions (which I’m responsible for) in the app-loading refactor.

1. Before Django 1.7, the sequence in which models were loaded could depend on which views were hit in what order after a WSGI process started. That allowed interesting failures mode. For example, a project could randomly fail to load, only in production, because of a circular import error. Throw multithreading into the mix and it could get really exciting. To solve this, the new app-loading system added an explicit `django.setup()` step that imports apps and models in a deterministic order and doesn’t let code interact with the app registry until it’s fully loaded.

2. Django must ensure that relationships between models are set up correctly. Specifically it must guarantee that reverse accessors are added to each model targeted by a foreign key. This guarantee can only be provided once all other models have been imported. There’s been a long string of bugs in this area around 2008 (if memory serves). The new app-loading system took this requirement into account at the design stage and, thanks to the explicit setup, is structurally more robust against such bugs.

Because of 2. apps.get_models() raises an exception before all models are imported rather than return an incomplete model. Because of 1. Django gives very little control on what happens up to this point.


However, in many cases, it doesn’t matter whether the model is fully constructed or not. In the use case I described in my introduction, the code only needs a reference to the model class at import time; that reference won’t be used until run time, after the app-loading process has completed. (Performing SQL queries through the ORM at import time doesn’t work anyway.)

For this reason, I think it would make sense to add an API similar to `apps.get_models()`, but that would work while app-loading is still in progress. It would make no guarantees about the functionality of the class it returns until app-loading has completed.

I implemented a proof of concept here: https://github.com/django/django/pull/6547.


I would appreciate feedback on the following questions:

1. Have you been missing this feature? If it’s needed outside of Django, I must make it a public API.

2. Is it reasonable to expose it as a public API? There’s a risk that StackOverflow will consider it as superior to get_models() because it’s more lax; they'll assume that I put checks in get_models() simply because I like messing with users.

3. Would you prefer adding a kwarg to get_models e.g. apps.get_models(settings.AUTH_USER_MODEL, unsafe=True) or adding a new method like the PR currently does? I’m thinking the former is more user friendly as the functionality is very similar.


Thanks,

--
Aymeric.

Carl Meyer

unread,
Sep 6, 2016, 1:40:23 PM9/6/16
to django-d...@googlegroups.com
Hi Aymeric,

On 09/04/2016 01:03 PM, Aymeric Augustin wrote:
> Hello,
>
> Since Django 1.7, `get_user_model()` cannot be called at import time.
> Django contains a lot of methods that start with `UserModel =
> get_user_model()` while it would be more natural to declare it as a
> global variable in the module. This trips up users and the reason why
> it’s forbidden isn’t obvious.
...
> However, in many cases, it doesn’t matter whether the model is fully
> constructed or not. In the use case I described in my introduction,
> the code only needs a reference to the model class at import time;
> that reference won’t be used until run time, after the app-loading
> process has completed. (Performing SQL queries through the ORM at
> import time doesn’t work anyway.)
>
> For this reason, I think it would make sense to add an API similar to
> `apps.get_models()`, but that would work while app-loading is still
> in progress. It would make no guarantees about the functionality of
> the class it returns until app-loading has completed.
>
> I implemented a proof of concept here:
> https://github.com/django/django/pull/6547.

A few thoughts:

1) A kwarg to `get_model` seems fine, but I don't like the vague FUD of
`unsafe=True` (if it's really "not safe" we shouldn't add it / make it
public at all). How about something more specific like
`require_ready=False`?

2) I think a key question here is the nature and clarity of the problems
that arise if you try to make use of an un-ready model class. If the
failure mode here is the reintroduction of unpredictable heisenbugs
where certain related fields are quietly not present because the model
on the other end happens to not have been imported yet, I think that's a
strong argument for refraining from introducing this API. Is it possible
to make it so that even the model meta introspection APIs (and of course
also any attempt to query the db) fail quickly, loudly, and with a clear
error message if the app registry is not yet fully populated? If so,
then I think there's little risk to adding this API (and in fact I think
we could even make it the default behavior of `get_model`).

As for whether it's needed in real use, the only feedback I can offer at
the moment is that the import-order enforcement that kicked in with the
removal of app-loading deprecation shims in Django 1.9 is the primary
reason that Instagram has so far upgraded only as far as Django 1.8;
fixing the import-order issues appeared to require too much upheaval and
would have delayed the upgrade too much. I haven't had time to look
closely at those issues so I can't yet offer more detailed feedback on
how much the availability of this particular API would help.

Carl

signature.asc

Aymeric Augustin

unread,
Sep 6, 2016, 2:55:54 PM9/6/16
to django-d...@googlegroups.com
Hi Carl,

Thanks for the feedback!

> On 06 Sep 2016, at 19:39, Carl Meyer <ca...@oddbird.net> wrote:
>
> 1) A kwarg to `get_model` seems fine, but I don't like the vague FUD of
> `unsafe=True` (if it's really "not safe" we shouldn't add it / make it
> public at all). How about something more specific like
> `require_ready=False`?

Agreed. Actually I thought there must be a better name but didn’t have a
better idea and then I forgot to mention it, the email was quite long already.

> 2) I think a key question here is the nature and clarity of the problems
> that arise if you try to make use of an un-ready model class. If the
> failure mode here is the reintroduction of unpredictable heisenbugs
> where certain related fields are quietly not present because the model
> on the other end happens to not have been imported yet, I think that's a
> strong argument for refraining from introducing this API.

In Django 1.7+, I’m pretty sure the import sequence is fully deterministic for
a given version of a project.

get_wsgi_application() runs django.setup() first thing. Not user code has been
imported at that point.

django.setup() accesses settings, which can import user code but shouldn’t —
it’s super easy to create import loops if it does. In general settings don't
import much. They might be able to import models but that's pathological.
Just don't do that.

Then apps.populate() imports apps and models in the order of INSTALLED_APPS.
This is fully deterministic, even with multi-threading: the first thread that
reaches populate() takes a lock and other threads wait until that thread has
finished.

At this point all models have been imported according in a fully deterministic
sequence (assuming non-pathological code e.g. `if random() > 0.5: import foo`)

I believe the next step is to import URLs. I didn't check if that happens
immediately or when the first request is received. Some non-determinism might
happen here, especially with per-request urlconfs, but let's focus on models
and leave that question for another day :-)

The change I’m proposing doesn’t introduce random bugs. If models are
missing reverse relations, that will be deterministic.

> Is it possible
> to make it so that even the model meta introspection APIs (and of course
> also any attempt to query the db) fail quickly, loudly, and with a clear
> error message if the app registry is not yet fully populated? If so,
> then I think there's little risk to adding this API (and in fact I think
> we could even make it the default behavior of `get_model`).

If not checking for models_ready was the default for get_model(), then I’d
expect it to be the default for get_models() as well, but that doesn’t make
sense. get_models() really needs all models to be imported.

My initial patch added a new method, `import_model`, because the operation is
quite different from the app registry's perspective. Instead of just looking
up the model, we try to import it. Even though the end result feels the same,
the context isn't the same.

Also, from the "separation of concerns" angle, I'm not eager to create
coupling between the model meta API and the app registry.

So, while I get the appeal of not adding a somewhat scary kwarg, I still think
it's good to have users declare that they're treading into pre-`models_ready`
territory.

We could relax this constraint later by ignoring the require_ready argument,
at the cost of some code churn in third-party apps that started using it.

> As for whether it's needed in real use, the only feedback I can offer at
> the moment is that the import-order enforcement that kicked in with the
> removal of app-loading deprecation shims in Django 1.9 is the primary
> reason that Instagram has so far upgraded only as far as Django 1.8;

BRB finding a patch of sand to bury my head in...

> fixing the import-order issues appeared to require too much upheaval and
> would have delayed the upgrade too much. I haven't had time to look
> closely at those issues so I can't yet offer more detailed feedback on
> how much the availability of this particular API would help.

Without looking at the code base, I can’t say. It might help if you’re doing
swappable-model-style dynamic imports. It won’t help if you’re importing
models directly or indirectly from your applications’s __init__.py, which is
a more common problem.

For the record this is tracked in #24312 but that ticket neither describes the
exact problem nor possible ways to tackle it.

I think #21719 would have led there as well but it turned out to be
surprisingly hard -- in the realm of "not sure I'd get anywhere even if I
spent two weeks full-time on it”.

--
Aymeric.


Carl Meyer

unread,
Sep 6, 2016, 3:20:08 PM9/6/16
to django-d...@googlegroups.com
On 09/06/2016 12:55 PM, Aymeric Augustin wrote:
> Hi Carl,
>
> Thanks for the feedback!

Of course! Thanks for working on things :-)

> ...
> The change I’m proposing doesn’t introduce random bugs. If models are
> missing reverse relations, that will be deterministic.

+1

>> Is it possible
>> to make it so that even the model meta introspection APIs (and of course
>> also any attempt to query the db) fail quickly, loudly, and with a clear
>> error message if the app registry is not yet fully populated? If so,
>> then I think there's little risk to adding this API (and in fact I think
>> we could even make it the default behavior of `get_model`).
>
> If not checking for models_ready was the default for get_model(), then I’d
> expect it to be the default for get_models() as well, but that doesn’t make
> sense. get_models() really needs all models to be imported.
>
> My initial patch added a new method, `import_model`, because the operation is
> quite different from the app registry's perspective. Instead of just looking
> up the model, we try to import it. Even though the end result feels the same,
> the context isn't the same.

Yes, this makes sense. It suggests another possible name for the kwarg,
`force_import=True`?

...
> Without looking at the code base, I can’t say. It might help if you’re doing
> swappable-model-style dynamic imports. It won’t help if you’re importing
> models directly or indirectly from your applications’s __init__.py, which is
> a more common problem.

Yes, I think it's likely this wouldn't help much.

> I think #21719 would have led there as well but it turned out to be
> surprisingly hard -- in the realm of "not sure I'd get anywhere even if I
> spent two weeks full-time on it”.

I still think that's the right approach (and should be possible) in
theory, but I trust your estimation of complexity and am currently short
of two-week-plus blocks of free time :-)

Carl

signature.asc

Aymeric Augustin

unread,
Sep 30, 2016, 4:17:15 PM9/30/16
to django-d...@googlegroups.com
Hello,

Quick follow-up: I just updated my PR according to this discussion:
https://github.com/django/django/pull/6547

While I was there, I made get_user_model() usable at import time.
We’ll need feedback from actual projects to know it that works.

--
Aymeric.

Reply all
Reply to author
Forward
0 new messages