Hello all, Faster signals are available as an attachment here. I tried attaching to #6814, but my upload was always accepted with 0 length. Odd.
The updated patch applies cleanly to 8156.
In addition to changes in patch, the following files may be removed from the tree: svn rm django/dispatch/errors.py django/dispatch/robust.py django/dispatch/robustapply.py tests/regressiontests/dispatch/tests/test_robustapply.py
Keith Bussell did at least as much work on this as I did. He prevailed to keep weak references and noted a rather shocking bug.
The first thing worth noting is that the changes are backwards incompatible:
1) All receivers to accept **kwargs. The benefit here is that signal dispatch is much quicker because it doesn't have to worry about conforming the sender's arguments to the receiver's arguments. (When settings.DEBUG == True, connect asserts that the receiver does indeed accept **kwargs.) 2) Dropped support for `Anonymous` sender. `Any` is still functionally available if Signal.connect is called with a `sender` of `None`. 3) The __author__, __cvsid__ and __version__ attributes were removed from django.dispatch, along with various other internals from the original pydispatcher implementation.
Also, robustApply has died. (And there was much rejoicing?)
Signals go from object sentinels to class Signal instances. These keep track of their own receivers. Order of dispatch to receivers is preserved, though I think the API shouldn't guarantee that-- it's just faster for send to use a list of receivers than a dict, but that may change at some point.
Since signals were not officially documented before, this patch doesn't include docs.
Any receiver wanting to ensure unique connection (despite, say, duplicate module loads) should define a dispatch_uid attribute. e.g. ======= def some_handler(**kwargs): pass some_handler.dispatch_uid = 'my very unique id' post_syncdb.connect(some_handler) =======
django.dispatch.dispatcher.* still exist for backwards compatibility; literally the only change client code should need to make is adding **kwargs.
OK, the fun stuff: We assume that the 0-receiver case is most common, that is that most projects don't use signals, and therefore shouldn't pay much of an overhead if they aren't used. (With a nod to bharring's prior approach of monkey-patching upon first listen.)
We assume that even on projects that use signals, connect and disconnect of handlers is relatively rare compared to the number of sent/received signals.
Optimizations have those things in mind.
Summary results: Signal sends with no listeners in process: 67% faster Signal sends with no listeners, but 10 other listeners on different signals: 91% faster Signal sends with 30 weakref receivers: 60% faster Signal sends with 30 strongref receivers: 67% faster
As a sideline, I noted even though signal handling frequently should hold weak references to receivers, dealing with weak references has a fairly high overhead. I took a hack at adding a USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where people can opt-out of the weakref safety net and into yet faster signal handling. My approach was to have a base signal class which deferred weak vs. strong semantics to subclasses. Unfortunately, that approach led to more function call overhead, which lost most of the gain to be had. In other words, USE_WEAK_SIGNALS could gain more performance (about 20% over this new performance), but would require some code duplication to avoid function call overhead. Out of time now. In any case, that could be done in a backwards compatible way, so it's not a 1.0 API issue.
Let us know any questions-- hopefully this makes the cut.
I'm mostly offline the next few days; we're moving. Bad timing w/ 1.0 approach, alas.
On Fri, Aug 1, 2008 at 2:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote: > The first thing worth noting is that the changes are backwards incompatible:
> 1) All receivers to accept **kwargs. The benefit here is that > signal dispatch is much quicker because it doesn't have to worry about > conforming the sender's arguments to the receiver's arguments. > (When settings.DEBUG == True, connect asserts that the receiver > does indeed accept **kwargs.)
I was a little worried about the **kwargs requirement, but I think the DEBUG==True approach is a good way to go about handling that change.
> 2) Dropped support for `Anonymous` sender. `Any` is still > functionally available if Signal.connect is called with a `sender` of > `None`.
I also assume that, since signals are now objects that manage their own listeners, there's also no more listening for `Any` signal? I'm happy with that, but it would be worth noting as a backwards-incompatible change.
> Also, robustApply has died. (And there was much rejoicing?)
Indeed. MUCH rejoicing!
> Signals go from object sentinels to class Signal instances. These > keep track of their own receivers. Order of dispatch to receivers is > preserved, though I think the API shouldn't guarantee that-- it's just > faster for send to use a list of receivers than a dict, but that may > change at some point.
Agreed. I don't think a dict will ever be a reasonable approach for this, but a set might prove useful, or some other unordered collection. We can't guarantee what order they're registered in (due to import issues), and signals should never rely on other signals anyway, so guaranteeing an execution order would be pure madness.
> Since signals were not officially documented before, this patch > doesn't include docs.
That's something I'd like to change, though, especially since we can't point people to PyDispatcher information anymore. I don't think docs should block this from making it into the beta, but I'd definitely like to see some docs go in before 1.0 proper. I'll see if I can spare some time to work on some in the next few weeks.
> Any receiver wanting to ensure unique connection (despite, say, > duplicate module loads) should define a dispatch_uid attribute. e.g. > ======= > def some_handler(**kwargs): > pass > some_handler.dispatch_uid = 'my very unique id' > post_syncdb.connect(some_handler) > =======
> django.dispatch.dispatcher.* still exist for backwards > compatibility; literally the only change client code should need to > make is adding **kwargs.
As long as there has to be some import changes -- and at the risk of starting a minor flame war -- what the distinction is between django.* and django.core.*? This seems -- to me -- like a candidate for django.core instead, but I really don't know what goes into that placement.
> We assume that the 0-receiver case is most common, that is that > most projects don't use signals, and therefore shouldn't pay much of > an overhead if they aren't used.
Agreed.
> We assume that even on projects that use signals, connect and > disconnect of handlers is relatively rare compared to the number of > sent/received signals.
I'm still curious to know what the speed differences are in the case of connect/disconnect. I've been experimenting a bit with using middleware and decorators to connect and disconnect signals for all models, but only for certain views, as a way of logging what actions a view takes. Thankfully, the `Any` listener is still supported, so this can still work, but I've always been a little concerned about the overhead of adding and removing signals like that for every request that needs them.
I know it's still very much a minority case, but if you've already got everything set up to benchmark, would it be possible to get those numbers as well?
> Summary results: > Signal sends with no listeners in process: 67% faster > Signal sends with no listeners, but 10 other listeners on > different signals: 91% faster > Signal sends with 30 weakref receivers: 60% faster > Signal sends with 30 strongref receivers: 67% faster
Definitely good news!
> As a sideline, I noted even though signal handling frequently > should hold weak references to receivers, dealing with weak references > has a fairly high overhead. I took a hack at adding a > USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where > people can opt-out of the weakref safety net and into yet faster > signal handling. > My approach was to have a base signal class which deferred weak vs. > strong semantics to subclasses. Unfortunately, that approach led to > more function call overhead, which lost most of the gain to be had. > In other words, USE_WEAK_SIGNALS could gain more performance (about > 20% over this new performance), but would require some code > duplication to avoid function call overhead. Out of time now. In any > case, that could be done in a backwards compatible way, so it's not a > 1.0 API issue.
I've gotten bitten by the weak reference thing on more than one occasion, so my first instinct is to just dump it altogether, rather than try to support both. Especially since the stated vast majority case is when just a few listeners are registered early in the process, and remain for the life of the process anyway. To my understanding weak references are used here for listeners that will likely go out of scope often, to save memory, since otherwise the dispatcher would keep those listeners (and possibly related objects) alive indefinitely, simply by having a listener registered.
However, I'm also the guy wondering about connecting and disconnecting signals in middleware and decorators, so I clearly have an interest in the potential performance benefits of weak references. I don't really have an official stance on this topic, but I'll definitely give it some more thought as time goes on, after 1.0 ships.
Good work, guys! I'm glad to see signals get the attention they deserve, and I look forward to seeing this in trunk.
On Fri, Aug 1, 2008 at 1:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote: > Faster signals are available as an attachment here. I tried > attaching to #6814, but my upload was always accepted with 0 length.
It's a Trac bug, BTW, with larger patches: they don't show up in the diff viewer. It's still there, though: click the "original format" link to download.
> Signal sends with no listeners in process: 67% faster > Signal sends with no listeners, but 10 other listeners on > different signals: 91% faster > Signal sends with 30 weakref receivers: 60% faster > Signal sends with 30 strongref receivers: 67% faster
And there was much rejoicing!
I'll take a closer look at this over the weekend, but at first glance I'm pretty happy and ready to check it in. This might be the biggest blocker for feature freeze since it's both a feature in its own right *and* a change that Jython/PyPy support depends on.
On Fri, Aug 1, 2008 at 7:56 AM, Marty Alchin <gulop...@gamemusic.org> wrote: > That's something I'd like to change, though, especially since we can't > point people to PyDispatcher information anymore. I don't think docs > should block this from making it into the beta, but I'd definitely > like to see some docs go in before 1.0 proper. I'll see if I can spare > some time to work on some in the next few weeks.
I'll happily review/edit whatever you come up with; let me know if you don't have time so I can take a stab.
On Fri, Aug 1, 2008 at 2:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote: > Hello all, > Faster signals are available as an attachment here. I tried > attaching to #6814, but my upload was always accepted with 0 length. > Odd.
> The updated patch applies cleanly to 8156.
Cool!. I've just tested it on Jython, and nothing breaks on the test suite after applying the patch.
[...]
> Also, robustApply has died. (And there was much rejoicing?)
OK, I've finally had a chance to complete my review of the patch. Here are my thoughts:
1. The speed-ups are impressive: I'm seeing about a 40% improvement in the running of the test suite (from 500s down to 300s). That's just fucking awesome. On top of that the net -500 LOC is very, very nice :)
2. I'm a bit confused about the implementation of ``dispatch_uid``. That is, it's used as an identifier to prevent duplicate registration of functions, but *not* internally to actually register the function. So, if I hit one of those same-function-different-id()s places, I'll prevent duplicate registration but it's possible that a ``disconnect`` call will fail (since it uses ``id(target)``). Is there a reason not to actually use ``dispatch_uid`` as the actual registration key?
3. On a similar tack, it seems I should be able to do::
This makes a few use-cases I've got a little simpler (namely dynamic generation of signal receivers. Don't ask.)
4. I see ``Signal.send_robust``, and I see a test for it, but I don't see where it's used. Is it vestigial, or is there a good reason to keep the two sending methods around? If so, maybe we should just use a ``suppress_exceptions`` kwarg on ``send`` instead?
5. Let's table your ``USE_WEAK_SIGNALS`` idea until at least after 1.0. I like the general idea of being able to opt out of weakrefs, but we can add that later without much fuss.
6. I'm assuming that your email starting this thread has everything I need to add to BackwardsIncompatibleChanges -- is that so?
7. I'm not thrilled with the continuing lack of signal docs, but *sigh* since we didn't have 'em before we might as well check this in without 'em still. I really hope someone interested enough in signals to follow this thread will step to write some (hint, hint!), but if not I'll make it happen before 1.0 myself.
I know you're busy moving this week, so I'm happy to make any of the changes I've proposed myself. Just want to make sure I'm not missing something. I'm also on #django-dev all day if that's an easier way to work some of this stuff out.
Other than the above, though, I'm ready to check this in.
> 2. I'm a bit confused about the implementation of ``dispatch_uid``. > That is, it's used as an identifier to prevent duplicate registration > of functions, but *not* internally to actually register the function. > So, if I hit one of those same-function-different-id()s places, I'll > prevent duplicate registration but it's possible that a ``disconnect`` > call will fail (since it uses ``id(target)``). Is there a reason not > to actually use ``dispatch_uid`` as the actual registration key?
> 3. On a similar tack, it seems I should be able to do::
> This makes a few use-cases I've got a little simpler (namely dynamic > generation of signal receivers. Don't ask.)
+1 for using ``dispatch_uid`` as the registration key.
We added ``dispatch_uid`` at the last moment as a fix for the multiple module import problem. (Getting around the "gross hack" Jeremy mentioned in the original message.) The ``disconnect`` call will only use ``id(target)`` if it's not an instance method (the logic for generating the id is the same as at registration time), so it shouldn't fail. That said, your suggestion in #3 is a more compelling argument to make the change.
I just noticed a potential issue with the current implementation: the original code allowed the same receiver to be connected to a signal multiple times as long as there were different senders. The ``dispatch_uid`` code changed that. Using the ``dispatch_uid`` as the registration key would remove the need for the ``registered_uids`` member, and fix this.
> 4. I see ``Signal.send_robust``, and I see a test for it, but I don't > see where it's used. Is it vestigial, or is there a good reason to > keep the two sending methods around? If so, maybe we should just use a > ``suppress_exceptions`` kwarg on ``send`` instead?
Vestigal. +0
> 6. I'm assuming that your email starting this thread has everything I > need to add to BackwardsIncompatibleChanges -- is that so?
Yes
> 7. I'm not thrilled with the continuing lack of signal docs, but > *sigh* since we didn't have 'em before we might as well check this in > without 'em still. I really hope someone interested enough in signals > to follow this thread will step to write some (hint, hint!), but if > not I'll make it happen before 1.0 myself.
On Aug 5, 2008, at 14:42, "Keith Bussell" <kbuss...@gmail.com> wrote:
> Hi Jacob,
> I worked on the signal refactor with Jeremy.
> On Tue, Aug 5, 2008 at 9:49 AM, Jacob Kaplan-Moss <Jacob... > 4. I see ``Signal.send_robust``, and I see a test for it, but I don't > see where it's used. Is it vestigial, or is there a good reason to > keep the two sending methods around? If so, maybe we should just use a > ``suppress_exceptions`` kwarg on ``send`` instead?
> Vestigal. +0
Note that a branch in the "for receiver in receivers" loop will have some overhead. I guess it doesn't hurt the common case of few receivers, though.
On Tue, Aug 5, 2008 at 2:42 PM, Keith Bussell <kbuss...@gmail.com> wrote: > +1 for using ``dispatch_uid`` as the registration key.
Yeah, good -- let's do it.
>> 4. I see ``Signal.send_robust``, and I see a test for it, but I don't >> see where it's used. Is it vestigial, or is there a good reason to >> keep the two sending methods around? If so, maybe we should just use a >> ``suppress_exceptions`` kwarg on ``send`` instead?
As Jeremy points out this'll cause a (small) bit of performance overhead. I say we worry about that later -- we can do more microbenchmarking later on.
On Tue, Aug 5, 2008 at 8:52 PM, Keith Bussell <kbuss...@gmail.com> wrote: > I uploaded a patch to the ticket[1] a few hours ago. I mentioned it to you > in IRC, but just now realized you weren't logged in. Oops.
Yeah, I saw. I'll check this in tomorrow morning -- right now I'm a few drinks past the legal limit and don't want to end up with a CUI.
> As an aside, once this gets committed, ticket #3951[2] can be closed with a > note about ``dispatch_uid``.
On Aug 6, 7:25 pm, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:
> It only needs to be that way if the list is being written to at some
> point, I'm not sure whether that's the case.
Yes, after closer look I see that it doesn't break anything in our
case but IMHO it's still better and safer to avoid using list as
default argument.
Another nitpick: doc string of send_robust says "These arguments must
be a subset of the argument names defined in providing_args, or a
DispatcherTypeError will be raised", but it doesn't happen.
providing_args set is not used anywhere in the code at all. What is
the reason behind this attribute?
On Wed, Aug 6, 2008 at 9:42 AM, Ivan Illarionov <ivan.illario...@gmail.com>wrote:
> Another nitpick: doc string of send_robust says "These arguments must > be a subset of the argument names defined in providing_args, or a > DispatcherTypeError will be raised", but it doesn't happen. > providing_args set is not used anywhere in the code at all. What is > the reason behind this attribute?
We went back and forth trying to balance features and performance and decided to remove the functionality still referenced in the docstring. At the moment, the providing_args parameter is purely for documentation purposes, but extra checking may be added in the future if necessary. Since API stability is a major feature of 1.0, we decided to leave it in.
The docstrings still need some help. I'll take another pass through them over the weekend.