Curious, how many folks are actually using dispatch at all?
For my personal usage, I'm actually not using any of the hooks- I suspect most folks aren't either. That said, I'm paying a fairly hefty price for them.
With Model.__init__'s send left in for 53.3k record instantiation (just a walk of the records), time required is 9.2s. Without the send, takes 7.0s. Personally, I'd like to get that quarter of the time slice back. :)
Via ticket 3439, I've already gone after dispatch to try and speed it up- I probably can wring a bit more speed out of it, but the improvements won't come anywhere near reclaiming the 2.2s from above.
What is left, is flat out removing the send invocations if they're not needed- specifically shifting the send calls out of __init__, and wrapping __init__ on the fly *when* something tries to connect to it.
Effectively,
from django.dispatch.dispatcher import connect, disconnect
from django.db.models.signals import pre_init
from django.db.models import Model

class m(Model): pass

callback = lambda *a, **kw: None

assert m.__init__ is Model.__init__
connect(callback, sender=m, signal=pre_init)
assert m.__init__ is not Model.__init__

disconnect(callback, sender=m, signal=pre_init)
assert m.__init__ is Model.__init__
The pro of this is that the slowdown is limited to *only* the instances where something is known to be listening- listening to model class foo doesn't slow down class bar basically; listening to save doesn't slow down __init__, etc.
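A rough sketch of how the wrap-on-connect idea could work, to make the above concrete. This is not the actual patch: `Model`, `connect` and `disconnect` here are plain-Python stand-ins rather than Django's, the signal is just a string, only pre_init is handled, and the `_listeners`/`_saved` bookkeeping is made up for illustration.

```python
# Hypothetical stand-ins; none of this is Django's real dispatcher code.
_MISSING = object()
_listeners = {}   # (sender, signal) -> list of callbacks
_saved = {}       # sender -> the class's own __init__, or _MISSING if inherited


class Model(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def _wrap_init(sender):
    original = sender.__init__
    # Remember whether the class defined __init__ itself, so that
    # unwrapping restores exactly what was there before.
    _saved[sender] = sender.__dict__.get("__init__", _MISSING)

    def __init__(self, *args, **kwargs):
        for callback in list(_listeners.get((sender, "pre_init"), ())):
            callback(sender=sender, args=args, kwargs=kwargs)
        original(self, *args, **kwargs)

    sender.__init__ = __init__


def _unwrap_init(sender):
    saved = _saved.pop(sender)
    if saved is _MISSING:
        del sender.__init__          # fall back to the inherited method
    else:
        sender.__init__ = saved


def connect(callback, sender, signal):
    receivers = _listeners.setdefault((sender, signal), [])
    if not receivers and signal == "pre_init":
        _wrap_init(sender)           # first listener: start paying the cost
    receivers.append(callback)


def disconnect(callback, sender, signal):
    receivers = _listeners[(sender, signal)]
    receivers.remove(callback)
    if not receivers:
        del _listeners[(sender, signal)]
        if signal == "pre_init":
            _unwrap_init(sender)     # last listener gone: back to full speed
```

Classes that nobody listens to never get touched, so their __init__ stays the plain inherited method.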
The cons I'll enumerate:
1) to do this requires a few tricks- specifically, wrapping methods on a class on the fly when something starts listening, and reversing the wrapping when nobody is listening anymore. Personally, I'm comfortable with this (the misc contribute_to_class crap going on in _meta already isn't too far off). Realize however others may not be comfortable with it- thus speak up please.
2) usage of Any for a sender means we have to track the potential senders.
3) usage of Any for a signal means we have to track the signals involved in this trick (registration of the signal instance), and #2.
4) Not strictly required, but if sender is a class (and the only listeners are listening to *that* class, not Any), any deriving from that class will still fire the signal- meaning the performance gain is lost for the derivative, sends are occurring that don't have any listeners. This can be reclaimed via some tricks in ModelBase.__new__ offhand if desired and the use scenario is at least semi-common (how many people derive from a defined Model?).
5) the wrapping trick introduces an extra func into the callpath when something is listening. That's basically a semi-elusive way of saying "it'll be slightly slower when there is a listener than what's in place now"; haven't finished the implementation thus I don't have specifics, but figure a few usecs hit from the wrapper itself (since the codepaths are executed often enough, it's worth noting for cases where listeners are expected).
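Con #4 is easy to see with plain classes. The sketch below uses made-up stand-ins (`Base`, `Derived`, `wrap_init`, and a `sent` list in place of a real send); it only shows that a wrapper installed on a base class is inherited by anything deriving from it.

```python
sent = []   # stand-in for real dispatch: records (signal, sender) pairs


class Base(object):
    def __init__(self):
        pass


class Derived(Base):
    pass


def wrap_init(cls):
    """Install a sending wrapper on cls.__init__ (hypothetical helper)."""
    original = cls.__init__

    def __init__(self, *args, **kwargs):
        # The send fires for whatever class is being instantiated,
        # including subclasses that nobody is listening to.
        sent.append(("pre_init", type(self)))
        original(self, *args, **kwargs)

    cls.__init__ = __init__


wrap_init(Base)     # pretend something connected to Base only
Base()
Derived()           # inherits the wrapped __init__, so it pays for a send too
```

After this runs, `sent` holds an entry for `Derived` even though the only listener was for `Base`.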
Would appreciate any thoughts on above; the cons are basically implementation specific, that said I can work through them (I want that 25% back, damn it ;)- question is if folks are game for it or not, if the idea is palatable or not.
Aside from that, would really help if I had a clue what folks are actually using dispatch for with django- which signals, common patterns for implementing their own signals (pre/post I'd assume?), common signals they're listening to, etc.
Knowing it would help with optimizing dispatch further, and would be useful if someone ever decides to gut dispatch and refactor the code into something less fugly.
On 6/10/07, Brian Harring <ferri...@gmail.com> wrote:
> Aside from that, would really help if I had a clue what folks are actually using dispatch for with django- which signals, common patterns for implementing their own signals (pre/post I'd assume?), common signals they're listening to, etc.
I'm working on something which will be leaning pretty heavily on the pre_save and post_save signals; the code's not public yet, but will be soon.
-- "Bureaucrat Conrad, you are technically correct -- the best kind of correct."
On 6/10/07, James Bennett <ubernost...@gmail.com> wrote:
> On 6/10/07, Brian Harring <ferri...@gmail.com> wrote:
> > Aside from that, would really help if I had a clue what folks are actually using dispatch for with django- which signals, common patterns for implementing their own signals (pre/post I'd assume?), common signals they're listening to, etc.
> I'm working on something which will be leaning pretty heavily on the pre_save and post_save signals; the code's not public yet, but will be soon.
On Jun 10, 10:29 pm, "James Bennett" <ubernost...@gmail.com> wrote:
> I'm working on something which will be leaning pretty heavily on the pre_save and post_save signals; the code's not public yet, but will be soon.
I use these in django-multilingual to update translations when an instance of a translatable model is saved. I also depend on signals.class_prepared for each translatable model to finish its definition and create the child model with translation data.
I even tried to sneak yet another signal into Django at some point, but no one else was interested :)
That one would be cheap, though, triggered only when a new model gets created. I worked around not having it by wrapping ModelBase.__new__ in a function that did all the extra stuff I needed before calling the original implementation.
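The workaround looks roughly like this. It is a sketch only: the `ModelBase` below is a bare stand-in metaclass rather than Django's, and `hook_model_creation` is a name invented for the example.

```python
class ModelBase(type):
    """Stand-in metaclass; Django's real ModelBase does far more."""

    def __new__(mcs, name, bases, attrs):
        return super(ModelBase, mcs).__new__(mcs, name, bases, attrs)


def hook_model_creation(hook):
    """Run hook(name, bases, attrs) before every class creation."""
    original_new = ModelBase.__new__

    def wrapped_new(mcs, name, bases, attrs):
        hook(name, bases, attrs)                  # the extra work goes here
        return original_new(mcs, name, bases, attrs)

    # __new__ behaves as a static method, so wrap it explicitly.
    ModelBase.__new__ = staticmethod(wrapped_new)


seen = []
hook_model_creation(lambda name, bases, attrs: seen.append(name))

# Creating a class through the metaclass now triggers the hook.
Article = ModelBase("Article", (object,), {"title": None})
```

The cost is only paid once per class definition, which is why it is cheap compared to anything on the instantiation path.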
> I really like that technique, and plan to do similar in future.
Indeed; I (ab)use the hell out of signals, and would be sad without 'em. Nearly every trick in my sleeve these days needs signals.
That said, I'd also like that 25% back :) I'm *very* interested in your idea of dynamically enabling signals only when they're going to be caught; it's pointless to spend all that time dispatching if nothing's gonna answer. If you can figure out a clean way of accomplishing that -- and it looks like you've already started -- I'd certainly push for its acceptance.
On Sun, 2007-06-10 at 09:07 -0700, Brian Harring wrote:
> Curious, how many folks are actually using dispatch at all?
> For my personal usage, I'm actually not using any of the hooks- I suspect most folks aren't either. That said, I'm paying a fairly hefty price for them.
> With Model.__init__'s send left in for 53.3k record instantiation (just a walk of the records), time required is 9.2s. Without the send, takes 7.0s. Personally, I'd like to get that quarter of the time slice back. :)
Since you already have your own version of the Spanish Inquisition set up for testing, what portion of this overhead is just the function call? If the dispatch function is replaced with just "return", do we save much?
In case it's not clear: I'm trying to get a feeling for how much of the cost is caused by the dispatching itself and how much by processing the dispatch inside the signal module. Is avoiding the call altogether necessary, or would making the handlers much faster be enough? (More for future direction than anything else).
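A rough way to isolate the bare call cost, sketched with timeit. Everything here is a stand-in: `noop_send` imitates a dispatch function that has been replaced with just "return", and the two `init_*` functions imitate an __init__ with and without the call. Absolute numbers will vary by machine, so only the difference between the two is of interest.

```python
import timeit


def noop_send(signal=None, sender=None, **named):
    """Stand-in for a dispatch function gutted down to 'return'."""
    return


def init_without_send():
    pass


def init_with_noop_send():
    # Same keyword-style call a real send would receive, but no work is done.
    noop_send(signal="pre_init", sender=object)


def measure(func, number=200000):
    """Total seconds for `number` calls of func."""
    return timeit.timeit(func, number=number)


baseline = measure(init_without_send)
with_call = measure(init_with_noop_send)
print("bare call overhead: %.4fs over 200000 calls" % (with_call - baseline))
```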
> Via ticket 3439, I've already gone after dispatch to try and speed it up- I probably can wring a bit more speed out of it, but the improvements won't come anywhere near reclaiming the 2.2s from above.
> What is left, is flat out removing the send invocations if they're not needed- specifically shifting the send calls out of __init__, and wrapping __init__ on the fly *when* something tries to connect to it.
> Effectively,
> from django.dispatch.dispatcher import connect, disconnect
> from django.db.models.signals import pre_init
> from django.db.models import Model
>
> class m(Model): pass
>
> callback = lambda *a, **kw: None
>
> assert m.__init__ is Model.__init__
> connect(callback, sender=m, signal=pre_init)
> assert m.__init__ is not Model.__init__
>
> disconnect(callback, sender=m, signal=pre_init)
> assert m.__init__ is Model.__init__
> The pro of this is that the slowdown is limited to *only* the instances where something is known to be listening- listening to model class foo doesn't slow down class bar basically; listening to save doesn't slow down __init__, etc.
> The cons I'll enumerate:
> 1) to do this requires a few tricks- specifically, wrapping methods on a class on the fly when something starts listening, and reversing the wrapping when nobody is listening anymore. Personally, I'm comfortable with this (the misc contribute_to_class crap going on in _meta already isn't too far off). Realize however others may not be comfortable with it- thus speak up please.
> 2) usage of Any for a sender means we have to track the potential senders.
> 3) usage of Any for a signal means we have to track the signals involved in this trick (registration of the signal instance), and #2.
> 4) Not strictly required, but if sender is a class (and the only listeners are listening to *that* class, not Any), any deriving from that class will still fire the signal- meaning the performance gain is lost for the derivative, sends are occurring that don't have any listeners. This can be reclaimed via some tricks in ModelBase.__new__ offhand if desired and the use scenario is at least semi-common (how many people derive from a defined Model?).
> 5) the wrapping trick introduces an extra func into the callpath when something is listening. That's basically a semi-elusive way of saying "it'll be slightly slower when there is a listener than what's in place now"; haven't finished the implementation thus I don't have specifics, but figure a few usecs hit from the wrapper itself (since the codepaths are executed often enough, it's worth noting for cases where listeners are expected).
> Would appreciate any thoughts on above; the cons are basically implementation specific, that said I can work through them (I want that 25% back, damn it ;)- question is if folks are game for it or not, if the idea is palatable or not.
> Aside from that, would really help if I had a clue what folks are actually using dispatch for with django- which signals, common patterns for implementing their own signals (pre/post I'd assume?), common signals they're listening to, etc.
I don't think we can hope to get a really accurate picture here beyond a statistical sample with a broad range for any real confidence interval. The problem is that there is a userbase of thousands and a lot of evidence to suggest that most people don't read mailing list threads that they didn't start themselves. Yes, almost everybody reading this is an exception, but that automatically makes you an outlier.
However, to add to the sample, I'm using post_init in some cases and pre_save and post_save a lot. Looks like request_started and request_finished are making an appearance in my code, but mostly in diagnostic stuff that is not intended for production use.
> Knowing it would help with optimizing dispatch further, and would be useful if someone ever decides to gut dispatch and refactor the code into something less fugly.
Given that upstream pydispatcher isn't really being maintained, I don't think we should be too hesitant to tweak it for our needs.
> The problem is that there is a userbase of thousands and a lot of evidence to suggest that most people don't read mailing list threads that they didn't start themselves. Yes, almost everybody reading this is
So let's come to the surface... I use signals, mainly pre_save / post_save. And just to let you know, I miss having a post_insert that is distinct from post_update.
On Mon, Jun 11, 2007 at 07:39:08PM +1000, Malcolm Tredinnick wrote:
> On Sun, 2007-06-10 at 09:07 -0700, Brian Harring wrote:
> > Curious, how many folks are actually using dispatch at all?
> > For my personal usage, I'm actually not using any of the hooks- I suspect most folks aren't either. That said, I'm paying a fairly hefty price for them.
> > With Model.__init__'s send left in for 53.3k record instantiation (just a walk of the records), time required is 9.2s. Without the send, takes 7.0s. Personally, I'd like to get that quarter of the time slice back. :)
> Since you already have your own version of the Spanish Inquisition set up for testing, what portion of this overhead is just the function call? If the dispatch function is replaced with just "return", do we save much?
Offhand, replacing the dispatch with just 'return' is actually semi tricky, since there are a few receivers required for the django internals (class preparation). Basically requires delegating the send to the signal in select cases (for *_delete, and request_*, don't see much option unless they can be shifted around also).
For __init__ and save however, the wrap trick will fly- meaning don't even need the empty function call.
Either way, profile dump follows.
Top 30 via lsprof (cProfile for 2.5); with send left in Model.__init__
>>> ps.sort_stats("ti").print_stats(30)
Mon Jun 11 02:55:18 2007 dump.stats
1747388 function calls (1745991 primitive calls) in 18.627 CPU seconds
Ordered by: internal time
List reduced from 916 to 30 due to restriction <30>
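For anyone wanting to produce a similar dump on their own data, something along these lines should do. It is a sketch under assumptions: `Record` and `walk_records` are stand-ins for walking real model instances, and the stats are written to an in-memory stream rather than a dump.stats file.

```python
import cProfile
import io
import pstats


class Record(object):
    """Stand-in for a model instance; the real test walked ORM records."""

    def __init__(self, pk):
        self.pk = pk


def walk_records(count):
    # Instantiate `count` objects, imitating a walk over the records.
    return [Record(pk) for pk in range(count)]


profiler = cProfile.Profile()
profiler.runcall(walk_records, 5000)

# Sort by internal time and keep the top 30 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("time").print_stats(30)
report = stream.getvalue()
print(report)
```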
Model.__init__ is still a bit of a kick in the teeth offhand; addressing that one however requires some semi-nasty work shifting some of the fields related testing to be cached in _meta; not expecting a huge gain out of it, plus it'll likely be fairly nasty so I'd rather hold off on that one till a later date.
Not yet advocating it (mainly since digging it out would be ugly), but if you take a look at the bits above, having the option to disable verification on read *would* have a nice kick in the pants for ORM object instantiation when the admin has decided the data is guaranteed to be the correct types.
> In case it's not clear: I'm trying to get a feeling for how much of the cost is caused by the dispatching itself and how much by processing the dispatch inside the signal module. Is avoiding the call altogether necessary or making the handlers much faster? (More for future direction than anything else).
Cost is from the dispatching; take a look in dispatcher.send. Django codebase has already deviated from dispatcher upstream via inlining large parts of the lookup there (part of the 5x boost in dispatching going from 0.95 to 0.96)- still has to do the lookups, which unfortunately are semi complex due to the semantics of Any.
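To show why the lookups can't be collapsed into a single dict hit, here is a simplified sketch of the Any semantics. It is not the dispatcher's actual code: the `Any` sentinel, the `connections` table and `receivers_for` are stand-ins, and the real thing has extra bookkeeping (weak references and so on) that is left out.

```python
class _Any(object):
    """Sentinel meaning 'match every sender' or 'match every signal'."""


Any = _Any()
connections = {}   # sender -> {signal -> [receivers]}


def connect(receiver, signal=Any, sender=Any):
    connections.setdefault(sender, {}).setdefault(signal, []).append(receiver)


def receivers_for(sender, signal):
    """Collect receivers from all four (sender, signal) combinations."""
    found = []
    for snd, sig in ((sender, signal), (sender, Any),
                     (Any, signal), (Any, Any)):
        for receiver in connections.get(snd, {}).get(sig, ()):
            if receiver not in found:      # a receiver only fires once
                found.append(receiver)
    return found
```

Every send has to walk all four combinations even when the tables are empty, which is the cost that remains after inlining.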
That said... there really isn't any reason to continue making the calls if you know nothing is listening and the target to wrap emits just pre/post.
> > Aside from that, would really help if I had a clue what folks are actually using dispatch for with django- which signals, common patterns for implementing their own signals (pre/post I'd assume?), common signals they're listening to, etc.
> I don't think we can hope to get a really accurate picture here beyond a statistical sample with a broad range for any real confidence interval. The problem is that there is a userbase of thousands and a lot of evidence to suggest that most people don't read mailing list threads that they didn't start themselves. Yes, almost everybody reading this is an exception, but that automatically makes you an outlier.
> However, to add to the sample, I'm using post_init in some cases and pre_save and post_save a lot. Looks like request_started and request_finished are making an appearance in my code, but mostly in diagnostic stuff that is not intended for production use.
Just looking to get an idea of what folks are actually doing; simple example, it's easier to fire both pre/post if there is a listener for one- that said, if the vast number of folks are listening to only *one* of the signals, it's potentially worth the time to have the code swap in a pre, pre + post, or post wrapper as needed.
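Sketching the swap idea, with made-up names (`build_wrapper`, `pre_send`, `post_send`) and no claim that the final code will look like this: pick the cheapest wrapper that covers whichever signals actually have listeners.

```python
def build_wrapper(original, pre_send=None, post_send=None):
    """Return original itself, or the smallest wrapper that is needed."""
    if pre_send is None and post_send is None:
        return original                  # nobody listening: no wrapper at all

    if post_send is None:
        def pre_only(*args, **kwargs):
            pre_send(*args, **kwargs)
            return original(*args, **kwargs)
        return pre_only

    if pre_send is None:
        def post_only(*args, **kwargs):
            result = original(*args, **kwargs)
            post_send(*args, **kwargs)
            return result
        return post_only

    def pre_and_post(*args, **kwargs):
        pre_send(*args, **kwargs)
        result = original(*args, **kwargs)
        post_send(*args, **kwargs)
        return result
    return pre_and_post
```

The simpler alternative is to always install the pre + post variant once either signal has a listener, at the price of one wasted send per call.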
It's also a bit more of a pain in the ass to implement, but by the looks of it, it'll be the desired next step.
> > Knowing it would help with optimizing dispatch further, and would be useful if someone ever decides to gut dispatch and refactor the code into something less fugly.
> Given that upstream pydispatcher isn't really being maintained, I don't think we should be too hesitant to tweak it for our needs.
Don't spose it could just be thrown out? The code really *is* ugly :)
Can likely drop a lot of the internal voodoo and shift over to using weakref.Weak*Dictionary where appropriate internally, but robustapply is still fairly nasty- tend to think it should stop trying to hold folks' hands, and just pass the send args/kwargs straight through to the receiver instead of trying to map args out.
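Very roughly, what I have in mind is something like the sketch below. The `Signal` class and its methods are invented for illustration and are not a proposed API: senders are tracked in a weakref.WeakKeyDictionary so entries vanish when the sender does, and send hands its arguments to each receiver untouched.

```python
import weakref


class Signal(object):
    """Hypothetical minimal signal: weakly keyed by sender, no arg mapping."""

    def __init__(self):
        # sender -> list of receivers; the entry goes away with the sender.
        self._receivers = weakref.WeakKeyDictionary()

    def connect(self, receiver, sender):
        self._receivers.setdefault(sender, []).append(receiver)

    def send(self, sender, *args, **kwargs):
        # Arguments go straight through; a receiver with the wrong
        # signature raises TypeError instead of being silently adapted.
        return [(receiver, receiver(*args, **kwargs))
                for receiver in self._receivers.get(sender, ())]
```

The tradeoff is that receivers have to accept whatever the sender passes, which puts the burden on whoever writes the receiver rather than on the dispatcher.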