I was poking around in our (Votizen's) use of signals and thinking
about making some tooling so that signal usage was a bit more
transparent.
In doing so, I noticed that GenericForeignKey hooks the model pre_init
signal. It does that because GFK needs a chance to munge kwargs from
the GFK field name to the content_id and object_id fields.
Signal dispatch with zero receivers is much much faster than with 1 or
more receivers, and pre_init is fired quite a lot. Pretty much every
decent sized project I've worked on has used GFK, so I imagine this is
a large sap on performance overall.
Similarly, I just noticed that ImageField does a similar thing,
setting dimension values in post_init.
I think we could get a significant performance win if, instead of
using pre_init/post_init here, we changed Model.__init__ to call a
per-field hook that could munge the Model.__init__ kwargs.
SomeField.model_pre_init(instance, args, kwargs)
or maybe for symmetry with contribute_to_class:
SomeField.contribute_to_instance(instance, args, kwargs)
We could store which fields have these hooks upon ModelBase.__new__
construction and so skip most fields and overhead in __init__.
I'm happy to make a pull request but checking here to see if there's
any pushback.
> I was poking around in our (Votizen's) use of signals and thinking
> about making some tooling so that signal usage was a bit more
> transparent.
> In doing so, I noticed that GenericForeignKey hooks the model pre_init
> signal. It does that because GFK needs a chance to munge kwargs from
> the GFK field name to the content_id and object_id fields.
> Signal dispatch with zero receivers is much much faster than with 1 or
> more receivers, and pre_init is fired quite a lot. Pretty much every
> decent sized project I've worked on has used GFK, so I imagine this is
> a large sap on performance overall.
> Similarly, I just noticed that ImageField does a similar thing,
> setting dimension values in post_init.
> I think we could get a significant performance win if, instead of
> using pre_init/post_init here, we changed Model.__init__ to call a
> per-field hook that could munge the Model.__init__ kwargs.
> SomeField.model_pre_init(instance, args, kwargs)
> or maybe for symmetry with contribute_to_class:
> SomeField.contribute_to_instance(instance, args, kwargs)
> We could store which fields have these hooks upon ModelBase.__new__
> construction and so skip most fields and overhead in __init__.
> I'm happy to make a pull request but checking here to see if there's
> any pushback.
A big +1 from me.
Maybe the fields could explicitly add methods to pre/post init hooks
in contribute_to_class instead of method autodiscovery?
The signal receiver caching approach is another alternative (https://
code.djangoproject.com/ticket/16679). The downside of that approach is
that it complicates the already complex signals framework, so for that
reason it seems better to get rid of the post/pre init signals.
On Sat, Jul 14, 2012 at 10:57 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> I was poking around in our (Votizen's) use of signals and thinking
> about making some tooling so that signal usage was a bit more
> transparent.
> In doing so, I noticed that GenericForeignKey hooks the model pre_init
> signal. It does that because GFK needs a chance to munge kwargs from
> the GFK field name to the content_id and object_id fields.
> Signal dispatch with zero receivers is much much faster than with 1 or
> more receivers, and pre_init is fired quite a lot. Pretty much every
> decent sized project I've worked on has used GFK, so I imagine this is
> a large sap on performance overall.
> Similarly, I just noticed that ImageField does a similar thing,
> setting dimension values in post_init.
> I think we could get a significant performance win if, instead of
> using pre_init/post_init here, we changed Model.__init__ to call a
> per-field hook that could munge the Model.__init__ kwargs.
> SomeField.model_pre_init(instance, args, kwargs)
> or maybe for symmetry with contribute_to_class:
> SomeField.contribute_to_instance(instance, args, kwargs)
+1 to this idea.
My only concern is:
> We could store which fields have these hooks upon ModelBase.__new__
> construction and so skip most fields and overhead in __init__.
I'm not sure if I'm misreading your intentions here, but just to be
sure -- I'd like to avoid explicitly naming certain field types as
"special" in ModelBase if at all possible. If we explicitly name
certain fields in ModelBase, then it means that third-parties with the
same requirement won't be able to exploit the same API entry point. In
the case of GenericForeignKey specifically, it also means that we
would be introducing a dependency (even if it was name-only) on
contrib into core -- which is something we've been trying to move away
from.
On Jul 14, 2012, at 6:37 PM, Russell Keith-Magee wrote:
> My only concern is:
>> We could store which fields have these hooks upon ModelBase.__new__
>> construction and so skip most fields and overhead in __init__.
> I'm not sure if I'm misreading your intentions here, but just to be
> sure -- I'd like to avoid explicitly naming certain field types as
> "special" in ModelBase if at all possible. If we explicitly name
> certain fields in ModelBase, then it means that third-parties with the
> same requirement won't be able to exploit the same API entry point. In
> the case of GenericForeignKey specifically, it also means that we
> would be introducing a dependency (even if it was name-only) on
> contrib into core -- which is something we've been trying to move away
> from.
I think Jeremy means doing something like this in ModelBase.__new__:
meta.fields_needing_init = [field for field in fields if hasattr(field, 'model_pre_init')]
And in ModelBase.__init__:
for field in self._meta.fields_needing_init:
field.model_pre_init(*args, **kwargs)
This way, in ModelBase.__init__, only the fields that actually need to be initialized have this method called.
On Sun, Jul 15, 2012 at 9:51 AM, Andy McCurdy <sed...@gmail.com> wrote:
> On Jul 14, 2012, at 6:37 PM, Russell Keith-Magee wrote:
>> My only concern is:
>>> We could store which fields have these hooks upon ModelBase.__new__
>>> construction and so skip most fields and overhead in __init__.
>> I'm not sure if I'm misreading your intentions here, but just to be
>> sure -- I'd like to avoid explicitly naming certain field types as
>> "special" in ModelBase if at all possible. If we explicitly name
>> certain fields in ModelBase, then it means that third-parties with the
>> same requirement won't be able to exploit the same API entry point. In
>> the case of GenericForeignKey specifically, it also means that we
>> would be introducing a dependency (even if it was name-only) on
>> contrib into core -- which is something we've been trying to move away
>> from.
> I think Jeremy means doing something like this in ModelBase.__new__:
> meta.fields_needing_init = [field for field in fields if hasattr(field, 'model_pre_init')]
> And in ModelBase.__init__:
> for field in self._meta.fields_needing_init:
> field.model_pre_init(*args, **kwargs)
> This way, in ModelBase.__init__, only the fields that actually need to be initialized have this method called.
That is indeed what I meant, but I think Anssi's probably right that
this belongs in contribute_to_class instead of a new adhoc thing in
ModelBase.__new__.
On Sun, Jul 15, 2012 at 11:07 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> That is indeed what I meant, but I think Anssi's probably right that
> this belongs in contribute_to_class instead of a new adhoc thing in
> ModelBase.__new__.
timeit.Timer("""
for i in names:
models.Person(name=i)
""", """
from django.core.management import call_command
from regressiontests.model_fields import models
call_command('syncdb')
names = [str(i) for i in range(1000)]
""").timeit(1000)
Under the existing signal usage: 24.609718084335327
Under the new approach avoiding signal usage: 18.074234008789062
I'd expect post_init to be a smaller gain since the destination
function has more overhead, but still a gain.
Note that the pre_init and post_init signals are still fired -- it's
just that django as-shipped won't use any listeners on them, so the
Model.__init__ performance improves.
In some cases, object construction currently is larger than query time
- a simple 'select x from y limit 1000' would have similar overhead to
this. Also, note that this gain would be seen on all models in any
project using GFK or ImageField but no other pre_init or post_init
hooks - not just those models which have the fields. With this
switch, only the models with the hooking fields have the related
overhead.
On Wed, Jul 18, 2012 at 3:48 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> On Sun, Jul 15, 2012 at 11:07 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
>> That is indeed what I meant, but I think Anssi's probably right that
>> this belongs in contribute_to_class instead of a new adhoc thing in
>> ModelBase.__new__.
> timeit.Timer("""
> for i in names:
> models.Person(name=i)
> """, """
> from django.core.management import call_command
> from regressiontests.model_fields import models
> call_command('syncdb')
> names = [str(i) for i in range(1000)]
> """).timeit(1000)
> Under the existing signal usage: 24.609718084335327
> Under the new approach avoiding signal usage: 18.074234008789062
> I'd expect post_init to be a smaller gain since the destination
> function has more overhead, but still a gain.
> Note that the pre_init and post_init signals are still fired -- it's
> just that django as-shipped won't use any listeners on them, so the
> Model.__init__ performance improves.
> In some cases, object construction currently is larger than query time
> - a simple 'select x from y limit 1000' would have similar overhead to
> this. Also, note that this gain would be seen on all models in any
> project using GFK or ImageField but no other pre_init or post_init
> hooks - not just those models which have the fields. With this
> switch, only the models with the hooking fields have the related
> overhead.
On 27 heinä, 18:58, Jeremy Dunck <jdu...@gmail.com> wrote:
> Can I get a review of the branch? 25% performance improvement of
> Model.__init__ in stock django seems worth landing. :)
I did some work to add new benchmarks to djangobench. In my tests the
speedup for the init hooks approach was nearly 60% when both pre/post
init signals were in use in the project. Currently this happens when
there are at least one ImageField and one GenericForeignKey used in
the project. The test is fetching 500 objects from DB by
the .iterator() method.
We should do something to this issue. If the init hooks approach is
the way to go needs to be decided. The caching of signal receivers
gives the same speedup.
Some more info in ticket #18627.
- Anssi
PS: for another possible nice speedup see ticket https://code.djangoproject.com/ticket/16759 which is about removing the use of __deepcopy__ from queryset cloning.
There is potential for major speedups using that approach. Just one
example: model.save() +50% speedup.