[Django] #19501: Allow for fast path model loading

Django

unread,

Dec 20, 2012, 6:05:41 PM12/20/12

to django-...@googlegroups.com

#19501: Allow for fast path model loading
----------------------------------------------+--------------------
Reporter: akaariai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
I have been playing around with the idea of pushing the model construction
to a class method of the model. This will allow very fast reads from
database, around 2x faster for normal model, 4x faster for deferred model
when you override the from_db() so that it constructs the model directly
using the model.__dict__ and doesn't send any signals. With signal sending
the speed different is 1.2x for normal model, ~3x for deferred model.

Of course, you can't do that by default. This breaks multiple things:
setattr not called (and descriptors don't work either), overridden init
not called and of course no signals sent. So, the default would need to be
old good `__init__` way, or we would need to make a really big backwards
compatibility break.

Still, in most projects where model init is a performance bottleneck it
will be for just one or two models. 90%+ of models do not need the
signals, init call nor do they have any setter of interest in the from DB
case. For these models it would be trivial to create a "fast path readonly
proxy" for the performance bottleneck cases.

So, by adding the from_db method we could allow for really fast init for
those who need it. The cost is around 5% slower for the default `__init__`
case.

Actually, if you are not going to write the data back to the DB you can
throw away the model._state, too. Then you get this kind of results:
{{{
Running 'model_init_signals' benchmark ...
Min: 0.007368 -> 0.002830: 2.6035x faster
Avg: 0.007430 -> 0.002847: 2.6100x faster
Significant (t=347.070697)
Stddev: 0.00006 -> 0.00001: 5.6553x smaller (N = 18)

Running 'query_all' benchmark ...
Min: 0.006640 -> 0.002699: 2.4600x faster
Avg: 0.006684 -> 0.002713: 2.4636x faster
Significant (t=687.137094)
Stddev: 0.00002 -> 0.00001: 2.3581x smaller (N = 18)

Running 'query_defer' benchmark ...
Min: 0.016939 -> 0.003011: 5.6257x faster
Avg: 0.017019 -> 0.003028: 5.6211x faster
Significant (t=1300.019953)
Stddev: 0.00004 -> 0.00001: 3.9626x smaller (N = 18)
}}}

I think the 5% slowdown for the default `__init__` path is worth it so
that those who really need speed can achieve the above shown performance.

A really barebones patch at:
https://github.com/akaariai/django/compare/fast_init - this can be used to
run benchmarks but can't be included in Django as is - it doesn't have the
safe `__init__` as the default.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Dec 22, 2012, 10:13:43 AM12/22/12

to django-...@googlegroups.com

#19501: Allow for fast path model loading

-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 0
Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by apollo13):

* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* needs_tests: => 0
* needs_docs: => 0

Comment:

Accepting the general idea here, can't comment much on the code yet. Can
we somehow automatically decide if fast-path loading can be used?

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:1>

Django

unread,

Dec 22, 2012, 11:27:09 AM12/22/12

to django-...@googlegroups.com

#19501: Allow for fast path model loading

-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: nobody

Comment (by akaariai):

For the model._state - no. The model._state is used at least when saving
and when validating ModelForms and we can't know if this is going to
happen beforehand.

For signal sending the answer is yes, though we need to do this per-
queryset basis as between querysets the signals could be changed.

For updating the dict directly I think we could do this if there are no
`__setattr__` defined, and there are no descriptors defined for the field
attnames.

And, if there is an overridden `__init__`, then we must call that in any
case.

Likely the best approach is to have just one "use_fastpath" flag to
from_db() which we would set to True in case all of the above conditions
allow for fast pathing.

I think this would work: in the iterator()'s setup code we would check if
use_fastpath is possible (model._meta.can_use_fastpath(init_list), then
pass the result to from_db().

Those who want to squeeze the model._state overhead out would need to
override the (non public) from_db() method.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:2>

Django

unread,

Dec 22, 2012, 8:38:45 PM12/22/12

to django-...@googlegroups.com

#19501: Allow for fast path model loading

-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: akaariai

Keywords: | Patch needs improvement: 1

Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Changes (by akaariai):

* owner: nobody => akaariai
* needs_better_patch: 0 => 1

Comment:

I have code that proves that it is possible to detect fast_path init
conditions. Attached as patch as github is down. The code should be safe.
Well, there are some crazy use cases which would change but they are crazy
enough to say "who cares".

For example:
{{{
for obj in qs:
if somecond:
signals.pre_init.connect(somesignal, sender=obj.__class__)
}}}
this would not work like it did before, signals would not be send inside
the loop. Everybody crazy enough to have this kind of code deserves what
is coming for them :)

For performance numbers:
{{{
Running 'query_all' benchmark ...
Min: 0.006885 -> 0.003727: 1.8474x faster
Avg: 0.006929 -> 0.003766: 1.8398x faster
Significant (t=299.109631)
Stddev: 0.00003 -> 0.00003: 1.1666x larger (N = 18)

Running 'query_all_multifield' benchmark ...
Min: 0.012577 -> 0.007315: 1.7193x faster
Avg: 0.012625 -> 0.007391: 1.7081x faster
Significant (t=295.886261)
Stddev: 0.00005 -> 0.00005: 1.0607x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.016842 -> 0.004038: 4.1708x faster
Avg: 0.016909 -> 0.004098: 4.1263x faster
Significant (t=841.877024)
Stddev: 0.00005 -> 0.00005: 1.0069x smaller (N = 18)

Running 'query_get' benchmark ...
Min: 0.024566 -> 0.024049: 1.0215x faster
Avg: 0.024594 -> 0.024079: 1.0214x faster
Significant (t=83.266099)
Stddev: 0.00002 -> 0.00002: 1.1787x larger (N = 18)

Running 'query_raw' benchmark ...
Min: 0.013461 -> 0.007918: 1.7000x faster
Avg: 0.013510 -> 0.007957: 1.6979x faster
Significant (t=439.052454)
Stddev: 0.00004 -> 0.00003: 1.2919x smaller (N = 18)

Running 'query_raw_deferred' benchmark ...
Min: 0.015671 -> 0.003356: 4.6696x faster
Avg: 0.015714 -> 0.003391: 4.6345x faster
Significant (t=1321.202256)
Stddev: 0.00003 -> 0.00002: 1.2602x smaller (N = 18)

Running 'model_init_self_signals' benchmark ...
Min: 0.011877 -> 0.012277: 1.0337x slower
Avg: 0.011926 -> 0.012312: 1.0323x slower
Significant (t=-38.998385)
Stddev: 0.00003 -> 0.00003: 1.1830x smaller (N = 18)
}}}

What can be learned from this? Well, init is faster, except a minor
slowdown in model_init_self_signals. Query.get() isn't changed much.
Deferred model loading is now clearly faster than normal loading. It was
slower(!) before. query_defer uses same model def than
query_all_multifield.

For ultimate readonly speed one can change the from_db. query_all as
example:
{{{
models.py:
from django.db import models
from django.db.models.base import Empty

class Book(models.Model):
title = models.CharField(max_length=100)

@classmethod
def from_db(cls, using, values, attnames, init_with_args,
can_fast_path):
new = Empty()
new.__class__ = cls
new.id, new.title = values
return new

}}}
the from_db isn't safe for deferred loading (init_with_args tells us if
all model's fields values are present, maybe I should rename that...), and
it always fast paths. One can't use these models in form validation nor
can one use these when writing (no _state). However, the speed difference
is major:
{{{
Running 'query_all' benchmark ...
Min: 0.006691 -> 0.001791: 3.7359x faster
Avg: 0.006725 -> 0.001800: 3.7369x faster
Significant (t=794.562242)
Stddev: 0.00002 -> 0.00001: 3.0487x smaller (N = 18)
}}}

I would like opinions if we want the added complexity. I know I am biased
towards optimizing vs code clarity. Still, in my opinion the code is
actually pretty clear, so this isn't a lot worse than before.

The patch needs some more tests (we should test directly the
can_fast_path() at least). It might be wise to move the can_fast_path() to
Model as _can_fast_path() to avoid the circular import problems. Also,
from_db() would likely need to be change to _from_db(), as it isn't a good
idea to make it public API. I think it falls into "semi-public" api
territory where we don't change it if we don't have a good reason, but if
we *do* have a reason then we can change the signature. And, we can always
make it public later if we wish so.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:3>

Django

unread,

Mar 1, 2013, 5:23:05 PM3/1/13

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Comment (by akaariai):

I did some work on the patch. Custom fields using SubFieldBase do not use
`__set__`, instead they use `__get__`. In addition I verified that nothing
in Django relies on custom `__init__` or model `__init__` signals. Still,
nothing in Django relies on `__setattr__`.

The easily most common action on load is to convert from DB format to
Python format. But this is exactly what SubFieldBase is meant for. There
might be other use cases, maybe something like calculating a value on
`__init__`. This seems fragile - if you then go and change obj.somefield
value and that field is part of the calculation your code will break.
Thus, a property seems a lot better way to go for such cases.

Django's code base shows that one usually wants to differentiate
`__init__` and load from DB, for example the init signals in Django act
only when initializing object, but they do nothing for load from db case.
Overriding `__init__` is also hard for this reason.

The performance numbers are nice:
{{{
Running 'query_all' benchmark ...
Min: 0.007524 -> 0.004527: 1.6620x faster
Avg: 0.007551 -> 0.004580: 1.6486x faster
Significant (t=312.956724)
Stddev: 0.00002 -> 0.00003: 1.4246x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.017146 -> 0.004315: 3.9736x faster
Avg: 0.017218 -> 0.004349: 3.9587x faster
Significant (t=896.296579)
Stddev: 0.00005 -> 0.00003: 1.6903x smaller (N = 18)
}}}

If you add in the patch from removing the chunked iterators, then you get
this:
{{{
Running 'query_all' benchmark ...
Min: 0.007714 -> 0.004020: 1.9189x faster
Avg: 0.007756 -> 0.004064: 1.9082x faster
Significant (t=341.747393)
Stddev: 0.00003 -> 0.00003: 1.1788x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.017498 -> 0.004134: 4.2325x faster
Avg: 0.017535 -> 0.004173: 4.2022x faster
Significant (t=1455.612646)
Stddev: 0.00002 -> 0.00003: 1.2341x larger (N = 18)
}}}

So, even more gain with simplified iterator protocol.

Last, for those who need absolute speed for read-only data, alter the
from_db() to this (for query_all):
{{{
@classmethod
def from_db(cls, using, values, field_names, init_with_args,
can_fast_init):

new = Empty()
new.__class__ = cls

new. = dict(zip(field_names, values))

Running 'query_all' benchmark ...
Min: 0.007590 -> 0.001996: 3.8026x faster
Avg: 0.007651 -> 0.002014: 3.7987x faster
Significant (t=457.077059)
Stddev: 0.00005 -> 0.00001: 4.4830x smaller (N = 18)
}}}

Now, such a model will only work in read-only cases (no model._state), and
wont work with .defer() or .only() (all fields must always be present),
but if you happen to need speed for read-only case, you can get it.

The branches are here:
https://github.com/akaariai/django/compare/fast_init_test and
https://github.com/akaariai/django/compare/fast_init_and_nonchunked (also
the latest patch from #18702).

Now, there surely is code that will break if this is committed as-is. So,
I wonder if this should be opt-in or opt-out with model._meta flag, or if
the whole load using `__init__` could be removed totally.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:4>

Django

unread,

May 14, 2013, 5:41:20 PM5/14/13

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Comment (by anonymous):

Sort of off-topic, but how do you run these benchmarks? Is there a
tutorial online for testing different branches like this? This would go a
long way towards improving performance in my projects. Come to think of
it, it could even be included in a future "Optimizing Django" section in
the docs.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:5>

Django

unread,

Jun 12, 2014, 8:40:57 AM6/12/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Comment (by akaariai):

See PR https://github.com/django/django/pull/2802 for a basic
implementation.

The cost of this is 10% slower query_all benchmark, and around 15% slower
query_select_related benchmark (these fetch large amount of objects and do
nothing else). OTOH, using the new hook it will be possible to speed up
model loading dramatically (query_all is 1.7x faster with direct __dict__
assignment).

My feeling is that if you have a bottleneck due to model instantiation
speed, then you want this patch so that you can optimize the way your
models are loaded. If you don't have a bottleneck in model instantiation,
then you don't care about the 10% in any case.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:6>

Django

unread,

Jun 12, 2014, 8:43:44 AM6/12/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Keywords: | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0

Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Changes (by akaariai):

* needs_better_patch: 1 => 0
* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:7>

Django

unread,

Jun 24, 2014, 8:44:50 PM6/24/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Keywords: | Patch needs improvement: 1

Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Changes (by timo):

* needs_better_patch: 0 => 1

Comment:

Patch has some review comments.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:8>

Django

unread,

Jun 27, 2014, 2:39:40 AM6/27/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

Keywords: | Patch needs improvement: 0

Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Changes (by akaariai):

* needs_better_patch: 1 => 0

Comment:

I updated the code and docs based on review comments. The code was
mistakenly creating a new model._state in from_db() method, when just
updating the attributes of the existing ._state instance is enough. Now
the slowdown (using djangobench) is:
{{{
Running 'query_select_related' benchmark ...
Min: 0.054252 -> 0.058567: 1.0795x slower
Avg: 0.055689 -> 0.059967: 1.0768x slower
Significant (t=-8.076647)
Stddev: 0.00271 -> 0.00258: 1.0494x smaller (N = 50)

Running 'query_all' benchmark ...
Min: 0.021309 -> 0.022659: 1.0633x slower
Avg: 0.022763 -> 0.024240: 1.0649x slower
Significant (t=-2.598991)
Stddev: 0.00285 -> 0.00283: 1.0042x smaller (N = 50)

Running 'query_raw_deferred' benchmark ...
Min: 0.016543 -> 0.017822: 1.0773x slower
Avg: 0.017149 -> 0.018389: 1.0723x slower
Significant (t=-4.172737)
Stddev: 0.00150 -> 0.00147: 1.0174x smaller (N = 50)
}}}

So, the code itself causes a minor slowdown. The benchmarks done in above
comments and the ticket's description test various fast-path
implementations. Basically, one can do the following for their model:

{{{
class Empty(object):
pass

class MyModel(models.Model):
...

@classmethod
def from_db(cls, db, field_names, values):
empty = Empty()
empty.__class__ = cls
empty.__dict__ = zip(field_names, values)
empty._state = ModelState()
empty._state.adding = False
return empty
}}}
and this results in around 2x better performance for query_all benchmark,
and 4x speedup for deferred model loading (query_defer benchmark).

This ticket discussed various ways to do automatically the above
optimization - it is possible to do if the model doesn't have customized
``__init__``, doesn't have pre/post init signals, and calling setattr
isn't required for the model. Detecting this seems too complex, so the
automation isn't there any more. However, users who require absolute
performance can do the optimization manually if needed.

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:9>

Django

unread,

Jun 30, 2014, 9:59:47 AM6/30/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: akaariai
Type: | Status: new
Cleanup/optimization | Version: master
Component: Database layer | Resolution:

* stage: Accepted => Ready for checkin

--
Ticket URL: <https://code.djangoproject.com/ticket/19501#comment:10>

Django

unread,

Jul 1, 2014, 9:30:17 AM7/1/14

to django-...@googlegroups.com

#19501: Allow for fast path model loading

-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: akaariai

Type: | Status: closed
Cleanup/optimization | Version: master
Component: Database layer | Resolution: fixed

Changes (by Anssi Kääriäinen <akaariai@…>):

* status: new => closed
* resolution: => fixed

Comment:

In [changeset:"0b6f05ede648ce62a5c91c7c38a0a362711f0656"]:
{{{
#!CommitTicketReference repository=""
revision="0b6f05ede648ce62a5c91c7c38a0a362711f0656"
Fixed #19501 -- added Model.from_db() method

The Model.from_db() is intended to be used in cases where customization
of model loading is needed. Reasons can be performance, or adding custom
behavior to the model (for example "dirty field tracking" to issue
automatic update_fields when saving models).