Speed up models.Model.__init__: deprecate pre_init and post_init signals?

Anssi Kääriäinen

Aug 13, 2011, 10:22:54 AM
to Django developers
I am trying to speed up Model.__init__, but it seems it is pretty well
optimized already. However, there are still some things that could be
done to speed it up even further.

Deprecate pre_init and post_init signals. I wonder if these are
actually used in third-party code. These signals are not the easiest
to use, as they get the field values in *args, or in **kwargs, or part
in *args, part in **kwargs. Django core uses them in generic foreign
keys:

django/contrib/contenttypes/generic.py:

class GenericForeignKey(object):
    def contribute_to_class(self, cls, name):
        ...
        # For some reason I don't totally understand, using weakrefs
        # here doesn't work.
        signals.pre_init.connect(self.instance_pre_init, sender=cls,
                                 weak=False)
        ...

    def instance_pre_init(self, signal, sender, args, kwargs,
                          **_kwargs):
        """
        Handles initializing an object with the generic FK instead of
        content-type/object-id fields.
        """
        if self.name in kwargs:
            value = kwargs.pop(self.name)
            kwargs[self.ct_field] = self.get_content_type(obj=value)
            kwargs[self.fk_field] = value._get_pk_val()


It is probably possible to fix that use case.

Are there more use cases? I would think the two possible use cases are
manipulating the initialization of third-party models, and fields that
need to manipulate the initialization of a model (as above).

I tested the effect of removing the signals using models T1 and T2. T1
has just an id field; T2 additionally has 10 text fields.
Fetch of 10000 objects from DB:
T1: 0.16 s -> 0.13 s
T2: 0.36 s -> 0.33 s

Without DB (just a loop with T(args) calls):
T1: 0.10s -> 0.07s
T2: 0.18s -> 0.15s

So, one could save about 10% to 20% using this.

If there are use cases that are hard to do without pre_init and
post_init signals, then 10% to 20% speed loss for allowing those cases
isn't that bad. I just wonder if there are use cases like that?

One idea is to change _state from object to dict. This shaves off an
additional 0.01 seconds per 10000 objects. The question here is if
_state should have logic attached to it. Currently it does not have
any logic, it is just a data container.
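
To make the _state idea concrete, here is a minimal sketch. It is not
Django's actual code: the class names and the db/adding fields are
assumptions chosen for illustration.

```python
class ModelState(object):
    """Plain data container, roughly the role _state plays (fields assumed)."""

    def __init__(self, db=None):
        self.db = db
        self.adding = True


class WithObjectState(object):
    def __init__(self):
        # One extra class instantiation for every model instance created.
        self._state = ModelState()


class WithDictState(object):
    def __init__(self):
        # A dict literal carries the same data without calling another __init__.
        self._state = {"db": None, "adding": True}
```

The cost of the dict version is that callers write obj._state["db"]
instead of obj._state.db, and there is no longer a class to hang logic on.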

Still another idea is to get rid of the izip call and use an index
variable instead. That seems to shave off another 0.01 to 0.02 seconds
per 10000 objects. I need to work a little more on that idea still.
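
A rough sketch of the two loop styles, written for current Python where
zip plays the role of izip. FIELD_NAMES, Row and both function names are
made up for illustration, and whether the indexed loop is actually faster
depends on the interpreter.

```python
FIELD_NAMES = ("id", "name", "email")  # hypothetical field attribute names


class Row(object):
    """Empty stand-in for a model instance."""


def init_with_zip(obj, args):
    # Pair each positional value with its field name.
    for value, name in zip(args, FIELD_NAMES):
        setattr(obj, name, value)


def init_with_index(obj, args):
    # Same assignment, but walking both sequences by position.
    # Assumes len(args) <= len(FIELD_NAMES), which the caller must check.
    for i in range(len(args)):
        setattr(obj, FIELD_NAMES[i], args[i])
```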

The total speed up for 10000 T1(id_val) calls is 50% and 10000 T2 from
DB is around 20%.

Otherwise I can't think of anything to speed up model __init__ without
going to code generation.

For the interested, using cursor.execute("select * from t1");
list(cursor.fetchall()) takes 0.015 seconds. For t2 the time is 0.1
seconds. This is using postgresql. So, the overhead of fetching
objects instead of as-raw-as-possible sql is around 200% with all
optimizations, and up to 250-500+% without any optimizations.

- Anssi

Anssi Kääriäinen

Aug 13, 2011, 10:38:26 AM
to Django developers
On Aug 13, 5:22 pm, Anssi Kääriäinen <anssi.kaariai...@thl.fi> wrote:
> For the interested, using cursor.execute("select * from t1");
> list(cursor.fetchall()) takes 0.015 seconds. For t2 the time is 0.1
> seconds. This is using postgresql. So, the overhead of fetching
> objects instead of as-raw-as-possible sql is around 200% with all
> optimizations, and up to 250-500+% without any optimizations.

I thought I was using PostgreSQL. But I was not. Instead I was using
sqlite3. The correct readings for PostgreSQL are 0.03 seconds for t1
and 0.3 seconds for t2. There is a similar difference for the "fetching
from DB" tests. So, the overhead of Django model objects is actually
somewhat smaller when using PostgreSQL.

- Anssi

Michal Petrucha

Aug 13, 2011, 12:13:40 PM
to django-d...@googlegroups.com

For the record, in my GSoC branch I already removed this particular
piece of code and GenericForeignKeys are handled somewhat differently,
though I'm still not convinced it is the best way.

Michal

Dan Fairs

Aug 15, 2011, 6:17:07 AM
to django-d...@googlegroups.com
> Deprecate pre_init and post_init signals. I wonder if these are
> actually used in third-party code. These signals are not the easiest
> to use, as they get the field values in *args, or in **kwargs, or part
> in *args, part in **kwargs. Django core uses them in generic foreign
> keys:
>

To leap in here - we use post_init in our application, so there's at least *one* consumer out there! We use them to populate attributes on a model which depend on other system state, including data in that model itself. In our case, we actually only care about the instance passed in kwargs.

There may be another way of doing what we're doing without post_init, but I'd need to look into it.

Cheers
Dan
--
Dan Fairs | dan....@gmail.com | www.fezconsulting.com


Anssi Kääriäinen

Aug 15, 2011, 6:57:59 AM
to django-d...@googlegroups.com, Dan Fairs
On 08/15/2011 01:17 PM, Dan Fairs wrote:
>> Deprecate pre_init and post_init signals. I wonder if these are
>> actually used in third-party code. These signals are not the easiest
>> to use, as they get the field values in *args, or in **kwargs, or part
>> in *args, part in **kwargs. Django core uses them in generic foreign
>> keys:
>>
> To leap in here - we use post_init in our application, so there's at least *one* consumer out there! We use them to populate attributes on a model which depend on other system state, including data in that model itself. In our case, we actually only care about the instance passed in kwargs.

Really stupid question, but why not just override __init__? Is it
third-party code, where you can't modify __init__?

But, maybe these signals do not need to be deprecated to get the speed
gain. We could check if pre_init or post_init is used at all in the
Django instance. This would be done by using global variables
has_pre_init_listeners and has_post_init_listeners in
django/db/models/base.py. These variables are set when the first listener
is registered to the pre_init or post_init signal. If there is one (for
any model), then do the normal signal sending. So, instead of doing:

signals.pre_init.send(sender=...)

do

if has_pre_init_listeners:
    signals.pre_init.send(sender=...)

That way, if there are no listeners defined at all, the overhead should
be non-measurable. Hopefully this is the common case.
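
A minimal, self-contained sketch of that guard. Signal here is a tiny
stand-in, not django.dispatch.Signal, and the flag is kept on the signal
object rather than as a global in base.py, purely to keep the example
short; FlaggedSignal and has_listeners are made-up names.

```python
class Signal(object):
    """Tiny stand-in for django.dispatch.Signal (not the real class)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver, sender=None):
        self.receivers.append((receiver, sender))

    def send(self, sender, **named):
        # Call every receiver registered for this sender (or for all senders).
        return [
            (receiver, receiver(sender=sender, **named))
            for receiver, wanted in self.receivers
            if wanted is None or wanted is sender
        ]


class FlaggedSignal(Signal):
    """Remembers whether anything has ever been connected."""

    def __init__(self):
        super().__init__()
        self.has_listeners = False

    def connect(self, receiver, sender=None):
        self.has_listeners = True
        super().connect(receiver, sender)


pre_init = FlaggedSignal()


class T1(object):
    def __init__(self, *args, **kwargs):
        # One cheap attribute check replaces the send() call in the common case.
        if pre_init.has_listeners:
            pre_init.send(sender=self.__class__, args=args, kwargs=kwargs)
        self.id = args[0] if args else kwargs.get("id")
```

Note that the flag is project-wide: a single listener on any model turns
the send() call back on for every model.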

Currently, the signal.send() calls add about 0.02 seconds to 10000
objects (on my old laptop). If there is any listener on pre_init or
post_init, the overhead of that is 0.05 seconds, no matter if the
listener is interested in the current model or not. That is, a listener
for T1 will add 0.05 seconds to initializations of T2. There is a
further small penalty for signals actually interested in the current
model. It is something like 0.03 seconds if I remember correctly.

- Anssi

Anssi Kääriäinen

Aug 15, 2011, 3:45:03 PM
to Django developers
On Aug 15, 1:57 pm, Anssi Kääriäinen <anssi.kaariai...@thl.fi> wrote:
> But, maybe these signals do not need to be deprecated to get the speed
> gain. We could check if pre_init or post_init is used at all in the
> Django instance.

I tried this. The result is that you can save 0.03 seconds (or about
30% in the most trivial case) per 10000 objects if there are no
pre_init or post_init listeners in the project. However,
GenericForeignKey listens to the pre_init signal, and ImageField to
the post_init signal (I missed this one earlier). And if you have a
single listener, you will lose the whole benefit for all models. This
seems to be so common that the optimization is not worth it.

If there is a single listener for pre_init, it will add 0.07 seconds
to _all_ model initializations in the project per 10000 objects, no
matter if there is a listener for the _current_ model being
initialized. Same for post_init. So, in the case where you have
GenericForeignKey and ImageField in your project, there is an addition
of 0.16 seconds per 10000 objects created. Remember that without any
overhead the initialization would be 0.08 seconds in the trivial case.
So in the most trivial case almost 2/3 of the time would be used
sending signals, even though nobody is listening. For more realistic
models, the overhead is somewhere around 20%-30%.

It is possible to get rid of this overhead by recording the existence
of signal listeners per model, directly in Options. The class is
passed to the connect method, so it is trivial to set the
has_pre_init_listeners and has_post_init_listeners attributes in the
overridden .connect() method. I am not sure about the availability of
Options: is it available whenever the class is available? It
seems to work, and the savings are as expected: with both pre_init and
post_init listeners connected for model T2, 10000 T1 initializations take
0.21 seconds before and 0.08 seconds after. Worth fixing, no?
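
A minimal sketch of the per-model variant, under stated assumptions:
Options, PreInitSignal and Model below are small stand-ins, not Django's
classes. The listens_to_all flag is my own addition for receivers
connected without a sender, and send_count exists only to make the effect
visible.

```python
class Options(object):
    """Stand-in for a model's _meta; the flag name follows the post."""

    def __init__(self):
        self.has_pre_init_listeners = False


class PreInitSignal(object):
    """Tiny stand-in for the pre_init signal (not django.dispatch.Signal)."""

    def __init__(self):
        self.receivers = []
        self.listens_to_all = False  # assumption: covers connect(sender=None)
        self.send_count = 0          # instrumentation for this example only

    def connect(self, receiver, sender=None):
        # Record the listener on the model it is interested in.
        if sender is None:
            self.listens_to_all = True
        else:
            sender._meta.has_pre_init_listeners = True
        self.receivers.append((receiver, sender))

    def send(self, sender, **named):
        self.send_count += 1
        return [
            (receiver, receiver(sender=sender, **named))
            for receiver, wanted in self.receivers
            if wanted is None or wanted is sender
        ]


pre_init = PreInitSignal()


class Model(object):
    def __init__(self, *args, **kwargs):
        # Only pay for send() when this particular model has a listener.
        if self._meta.has_pre_init_listeners or pre_init.listens_to_all:
            pre_init.send(sender=self.__class__, args=args, kwargs=kwargs)
        for name, value in kwargs.items():
            setattr(self, name, value)


class T1(Model):
    _meta = Options()


class T2(Model):
    _meta = Options()
```

With a listener connected only for T2, creating T1 instances never reaches
send(), which is the saving described above.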

- Anssi