Lazy-loading model fields


Jesse Young

Oct 14, 2008, 3:14:05 AM
to Django developers
Moving discussion from http://code.djangoproject.com/ticket/5420.

To summarize, I patched my local version of Django by adding a boolean
'lazy' parameter to the Field constructor, e.g.
models.TextField(lazy=True), which determines whether Django loads
that field by default on select queries, and also added one function
to the manager and query set, toggle_fields(fetch=None, lazy=None,
fetch_only=None), where each argument can be an array of field names
(or None), to change the fields that would be retrieved by that
particular query.
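
A rough sketch of how the three toggle_fields arguments could combine with the per-field lazy default. This is plain Python, not the patch itself: the helper name resolve_columns and the precedence rules (fetch_only overrides everything, otherwise fetch and lazy adjust the defaults) are assumptions made for illustration.

```python
def resolve_columns(all_fields, default_lazy, fetch=None, lazy=None, fetch_only=None):
    """Return the field names one query would select, in declaration order.

    all_fields   -- every field name on the model, in order
    default_lazy -- names declared with lazy=True on the field
    fetch / lazy / fetch_only -- per-query overrides (lists of names, or None)
    """
    if fetch_only is not None:
        # Assumed rule: fetch_only replaces the default selection entirely.
        wanted = set(fetch_only)
    else:
        wanted = set(all_fields) - set(default_lazy)
        wanted |= set(fetch or ())   # force-load these for this query
        wanted -= set(lazy or ())    # skip these for this query
    return [name for name in all_fields if name in wanted]
```

With fields id, title, body and body marked lazy, a plain query would select id and title, and passing fetch=["body"] would bring body back for that one query.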

mtredinnick replied: "You are implementing quite a different feature:
fields that are always lazy loaded. That's a data modelling issue, not
a retrieval issue (although pretty much this whole ticket is about
working around data modelling that probably should be done differently
in the first place, but c'est la vie). If you don't ever want them
loaded with the main data, that suggests your data modelling has gone
slightly awry and those fields shouldn't be part of the main model.
Since foreign key relations are loaded lazily in the normal course of
events, simply put the things you don't want loaded with the main
model into another table and it will always lazy load upon access."

I agree that it is possible to do a similar thing by splitting large
fields into a separate table. However, there are also good reasons not
to split them out into separate tables:
* If you split them out, you have to manually create one model every
time you create the other model.
* If a fraction of your queries actually need those large fields, then
you have to do a join, which can be expensive.
* If your data access pattern changes after your application has
actual data, it is much harder to migrate your data than to make a one
line change to the application code (i.e., by adding lazy=True to the
field definition).
* Maintaining more tables is more work than maintaining fewer tables.

Anyway, I don't really mind if Django decides it doesn't want to
include this. We'll use it anyway. Just trying to help.

-Jesse

Marty Alchin

Oct 14, 2008, 10:23:24 AM
to django-d...@googlegroups.com
I won't pretend to know the whole story here, but I will take issue
with some of your points. I'll leave others to people who have history
on the subject.

On Tue, Oct 14, 2008 at 3:14 AM, Jesse Young <adu...@gmail.com> wrote:
> * If you split them out, you have to manually create one model every
> time you create the other model.

If by "manually create one model" you mean "type out a brand new model
definition in Python" you're flat-out wrong. I've done quite a bit of
work with generating models dynamically[1] and what you're asking for
could be made very simple using that approach. Rather than this:

thesis = models.TextField(lazy=True)

you could use this instead:

thesis = lazy_utils.LazyTextField()

And LazyTextField could create the associated model behind the scenes,
set up a descriptor for loading its contents only when necessary,
cache those contents so they only get queried once, etc. And all this
without a single patch to Django's core and without you having to
manually maintain another model.
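
The descriptor-plus-cache part of that idea can be sketched in plain Python. This is only an illustration, not an implementation of LazyTextField: the names LazyValue, load_thesis, Paper, FAKE_TABLE and QUERY_LOG are hypothetical, and the dictionary stands in for the behind-the-scenes table.

```python
class LazyValue:
    """Descriptor that loads a value on first access and caches it per instance."""

    def __init__(self, loader):
        self.loader = loader  # callable that takes the owning instance

    def __set_name__(self, owner, name):
        # Per-attribute key under which the loaded value is cached.
        self.cache_key = "_lazy_cache_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__
        if self.cache_key not in cache:
            # First access: run the (expensive) load exactly once.
            cache[self.cache_key] = self.loader(instance)
        return cache[self.cache_key]

    def __set__(self, instance, value):
        # Assigning a value fills the cache, so no load is needed later.
        instance.__dict__[self.cache_key] = value


QUERY_LOG = []                             # records each simulated query
FAKE_TABLE = {1: "A very long thesis..."}  # stand-in for the side table


def load_thesis(paper):
    QUERY_LOG.append(paper.pk)
    return FAKE_TABLE[paper.pk]


class Paper:
    thesis = LazyValue(load_thesis)

    def __init__(self, pk):
        self.pk = pk
```

Creating Paper(1) touches nothing; the first read of .thesis runs the loader, and later reads come from the per-instance cache.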

> * If a fraction of your queries actually need those large fields, then
> you have to do a join, which can be expensive.

It doesn't address this, of course, but like I said, I'll leave that
to the professionals.

> * If your data access pattern changes after your application has
> actual data, it is much harder to migrate your data than to make a one
> line change to the application code (i.e., by adding lazy=True to the
> field definition).

Admittedly, the relational approach would require a data migration,
but that's life sometimes. There are plenty of tools now to make
changes like this relatively easy to deal with, and I expect a custom
LazyTextField could be written to give extra hints to any of them.

> * Maintaining more tables is more work than maintaining fewer tables.

Again, this depends on what you mean by "maintaining". At a raw
database level, yes, that's true, but if you're talking about the
Python level, you can make that extra table almost non-existent and
work with it as a regular field.

Anyway, I don't expect this information to solve your problem, but I
hope I've cleared up a few inaccuracies in your justification. Maybe
it'll help you with whatever solution you do come up with.

-Gul

[1] http://code.djangoproject.com/wiki/DynamicModels

Jesse Young

Oct 14, 2008, 12:34:04 PM
to Django developers
> And LazyTextField could create the associated model behind the scenes,
> set up a descriptor for loading its contents only when necessary,
> cache those contents so they only get queried once, etc. And all this
> without a single patch to Django's core and without you having to
> manually maintain another model.

A LazyTextField that manages a dynamic model behind the scenes sounds
like a neat idea. That would certainly make the Django side of this
easier to manage. (But of course I have a slight bias towards code
that exists already -- has anyone actually implemented this?)

> Admittedly, the relational approach would require a data migration,
> but that's life sometimes.

When an application is live and has a very large amount of data and
users, avoiding data migration is extremely important. Data migration
means that when we update our live website, someone has to run SQL
commands on our live database servers, which takes time and can add a
lot of load; they have to reason about what would happen if someone
visits our website when the SQL is updated but not the Python code (or
vice versa), or whether they have to disable the website while the
migration occurs. And the person who updates the website might be a
different person from the developer who made the code change. So
avoiding data migration just by making a one-line change in the Python
code is a huge plus.

Anyway, I'm only trying to point out that inline lazy fields are
*sometimes* useful, and that Django shouldn't force people to do it by
splitting out fields into separate tables.

-Jesse

Marty Alchin

Oct 14, 2008, 12:56:14 PM
to django-d...@googlegroups.com
On Tue, Oct 14, 2008 at 12:34 PM, Jesse Young <adu...@gmail.com> wrote:
> A LazyTextField that manages a dynamic model behind the scenes sounds
> like a neat idea. That would certainly make the Django side of this
> easier to manage. (But of course I have a slight bias towards code
> that exists already -- has anyone actually implemented this?)

I don't think it's out there anywhere, but if you like I could whip up
the code for it in about 15 minutes (probably less). I just haven't
bothered because I don't personally need it.

> Anyway, I'm only trying to point out that inline lazy fields are
> *sometimes* useful, and that Django shouldn't force people to do it by
> splitting out fields into separate tables.

Damn it. Now you've gotten me thinking of ways to possibly manage a
field the way you want (one table, lazy loading, no data migration, no
patches to django, one-line change to the model). I'm 98% sure it's
possible, and I know of many of the things that would need to be
addressed to do so, but I really shouldn't be spending my time on it.
I probably won't be able to get it out of my head until I get it
working though, unfortunately. If I do write it down, I'll let you
know.

Of course, it would also have to come with big biohazard signs warning
people that there could (and likely would) be unforeseen side-effects,
particularly in the admin, serializers, modelforms ... actually,
pretty much anything that introspects your model would never see that
field. I'm not sure if that's suitable for your case, but it certainly
wouldn't be good enough for wide distribution.

-Gul

alex....@gmail.com

Oct 14, 2008, 1:00:18 PM
to Django developers
I think the sensible solution is what the ticket originally
suggested: a queryset/manager method to control whether a field is
lazily loaded. And yes, that can be implemented without patching
Django. You can, of course, make this the default by overriding the
default manager.
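
As a toy illustration of that shape (a per-query method plus a manager that applies it by default), here is a plain-Python sketch. None of these classes are Django's: ToyQuerySet, ToyManager, LazyBodyManager and the method names defer, load and columns are all hypothetical stand-ins.

```python
class ToyQuerySet:
    """Immutable stand-in for a queryset that tracks which fields to skip."""

    def __init__(self, fields, deferred=frozenset()):
        self.fields = tuple(fields)
        self.deferred = frozenset(deferred)

    def defer(self, *names):
        # Return a copy that skips these fields.
        return ToyQuerySet(self.fields, self.deferred | set(names))

    def load(self, *names):
        # Return a copy that loads these fields again.
        return ToyQuerySet(self.fields, self.deferred - set(names))

    def columns(self):
        return [f for f in self.fields if f not in self.deferred]


class ToyManager:
    fields = ()

    def get_queryset(self):
        return ToyQuerySet(self.fields)


class LazyBodyManager(ToyManager):
    """Default manager that makes 'body' lazy unless a query asks for it."""

    fields = ("id", "title", "body")

    def get_queryset(self):
        return super().get_queryset().defer("body")
```

Every query started from LazyBodyManager skips body, and a single query can opt back in with .load("body").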

Jesse Young

Oct 14, 2008, 7:40:58 PM
to Django developers
Hey Alex,

On the trac ticket, I mentioned that my motivating reason for
specifying default-lazy fields in the Field definition (i.e., without
having to specify them on every query set) is the implicit queries
that Django creates for foreign key lookups (e.g., item =
obj.relateditem causes an implicit SQL query, and I want to avoid
loading large text fields on the related item). Also, to a lesser
extent, it would be good to avoid loading these columns in
select_related().

I hadn't thought before about subclassing Query/QuerySet/Manager/
TextField to do this without changing the installed version of Django.
That's an interesting idea. My change did also involve changing
Options (i.e., _meta), though.

Currently, however, the Query and QuerySet classes are functionally
decomposed in a way that makes it hard to override most of this
functionality in a subclass without copy-and-pasting a very large
amount of code. (Compare, for instance,
django.contrib.gis.db.models.sql.Query.get_columns with
django.db.models.sql.Query.get_columns.) I just posted a patch to
http://code.djangoproject.com/ticket/9368 which would make overriding
this functionality somewhat easier. But it would still involve a
considerable amount of code duplication if that logic wasn't in the
Query base class.
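
To illustrate the decomposition problem in general terms (this is not the patch on the ticket, and not Django's actual Query class): if the base class routes column selection through small hook methods, a subclass can change which columns are selected without copying the whole method. The classes BaseQuery and LazyAwareQuery and the hooks include_field and column_sql are hypothetical.

```python
class BaseQuery:
    """Stand-in base class whose get_columns is built from small hooks."""

    def __init__(self, table, fields):
        self.table = table
        self.fields = list(fields)

    def get_columns(self):
        # The loop lives here once; subclasses override only the hooks.
        return [self.column_sql(name)
                for name in self.fields
                if self.include_field(name)]

    def include_field(self, name):
        return True

    def column_sql(self, name):
        return '"%s"."%s"' % (self.table, name)


class LazyAwareQuery(BaseQuery):
    """Skips lazy fields by overriding a single hook."""

    def __init__(self, table, fields, lazy_fields=()):
        super().__init__(table, fields)
        self.lazy_fields = set(lazy_fields)

    def include_field(self, name):
        return name not in self.lazy_fields
```

Here the subclass changes column selection in three lines instead of duplicating get_columns.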

-Jesse
