The updated patch applies cleanly to 8156.
In addition to changes in patch, the following files may be removed
from the tree:
svn rm django/dispatch/errors.py django/dispatch/robust.py
django/dispatch/robustapply.py
tests/regressiontests/dispatch/tests/test_robustapply.py
Keith Bussell did at least as much work on this as I did. He
prevailed to keep weak references and noted a rather shocking bug.
(It turns out that id() for a instance methods can change, meaning
duplicate registration or missed sends where the sender or receiver is
an instance method.
http://groups.google.com/group/comp.lang.python/msg/6453d01aff5c87a0
)
The first thing worth noting is that the changes are backwards incompatible:
1) All receivers to accept **kwargs. The benefit here is that
signal dispatch is much quicker because it doesn't have to worry about
conforming the sender's arguments to the receiver's arguments.
(When settings.DEBUG == True, connect asserts that the receiver
does indeed accept **kwargs.)
2) Dropped support for `Anonymous` sender. `Any` is still
functionally available if Signal.connect is called with a `sender` of
`None`.
3) The __author__, __cvsid__ and __version__ attributes were
removed from django.dispatch, along with various other internals from
the original pydispatcher implementation.
Also, robustApply has died. (And there was much rejoicing?)
Signals go from object sentinels to class Signal instances. These
keep track of their own receivers. Order of dispatch to receivers is
preserved, though I think the API shouldn't guarantee that-- it's just
faster for send to use a list of receivers than a dict, but that may
change at some point.
Since signals were not officially documented before, this patch
doesn't include docs.
Any receiver wanting to ensure unique connection (despite, say,
duplicate module loads) should define a dispatch_uid attribute. e.g.
=======
def some_handler(**kwargs):
pass
some_handler.dispatch_uid = 'my very unique id'
post_syncdb.connect(some_handler)
=======
This avoids gross hacks like this:
http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/management/__init__.py?rev=7598#L48
;-)
django.dispatch.dispatcher.* still exist for backwards
compatibility; literally the only change client code should need to
make is adding **kwargs.
OK, the fun stuff:
We assume that the 0-receiver case is most common, that is that
most projects don't use signals, and therefore shouldn't pay much of
an overhead if they aren't used. (With a nod to bharring's prior
approach of monkey-patching upon first listen.)
We assume that even on projects that use signals, connect and
disconnect of handlers is relatively rare compared to the number of
sent/received signals.
Optimizations have those things in mind.
Summary results:
Signal sends with no listeners in process: 67% faster
Signal sends with no listeners, but 10 other listeners on
different signals: 91% faster
Signal sends with 30 weakref receivers: 60% faster
Signal sends with 30 strongref receivers: 67% faster
As a sideline, I noted even though signal handling frequently
should hold weak references to receivers, dealing with weak references
has a fairly high overhead. I took a hack at adding a
USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where
people can opt-out of the weakref safety net and into yet faster
signal handling.
My approach was to have a base signal class which deferred weak vs.
strong semantics to subclasses. Unfortunately, that approach led to
more function call overhead, which lost most of the gain to be had.
In other words, USE_WEAK_SIGNALS could gain more performance (about
20% over this new performance), but would require some code
duplication to avoid function call overhead. Out of time now. In any
case, that could be done in a backwards compatible way, so it's not a
1.0 API issue.
Let us know any questions-- hopefully this makes the cut.
I'm mostly offline the next few days; we're moving. Bad timing w/
1.0 approach, alas.
-Jeremy
I was a little worried about the **kwargs requirement, but I think the
DEBUG==True approach is a good way to go about handling that change.
> 2) Dropped support for `Anonymous` sender. `Any` is still
> functionally available if Signal.connect is called with a `sender` of
> `None`.
I also assume that, since signals are now objects that manage their
own listeners, there's also no more listening for `Any` signal? I'm
happy with that, but it would be worth noting as a
backwards-incompatible change.
> Also, robustApply has died. (And there was much rejoicing?)
Indeed. MUCH rejoicing!
> Signals go from object sentinels to class Signal instances. These
> keep track of their own receivers. Order of dispatch to receivers is
> preserved, though I think the API shouldn't guarantee that-- it's just
> faster for send to use a list of receivers than a dict, but that may
> change at some point.
Agreed. I don't think a dict will ever be a reasonable approach for
this, but a set might prove useful, or some other unordered
collection. We can't guarantee what order they're registered in (due
to import issues), and signals should never rely on other signals
anyway, so guaranteeing an execution order would be pure madness.
> Since signals were not officially documented before, this patch
> doesn't include docs.
That's something I'd like to change, though, especially since we can't
point people to PyDispatcher information anymore. I don't think docs
should block this from making it into the beta, but I'd definitely
like to see some docs go in before 1.0 proper. I'll see if I can spare
some time to work on some in the next few weeks.
> Any receiver wanting to ensure unique connection (despite, say,
> duplicate module loads) should define a dispatch_uid attribute. e.g.
> =======
> def some_handler(**kwargs):
> pass
> some_handler.dispatch_uid = 'my very unique id'
> post_syncdb.connect(some_handler)
> =======
>
> This avoids gross hacks like this:
> http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/management/__init__.py?rev=7598#L48
> ;-)
I like it.
> django.dispatch.dispatcher.* still exist for backwards
> compatibility; literally the only change client code should need to
> make is adding **kwargs.
As long as there has to be some import changes -- and at the risk of
starting a minor flame war -- what the distinction is between django.*
and django.core.*? This seems -- to me -- like a candidate for
django.core instead, but I really don't know what goes into that
placement.
> We assume that the 0-receiver case is most common, that is that
> most projects don't use signals, and therefore shouldn't pay much of
> an overhead if they aren't used.
Agreed.
> We assume that even on projects that use signals, connect and
> disconnect of handlers is relatively rare compared to the number of
> sent/received signals.
I'm still curious to know what the speed differences are in the case
of connect/disconnect. I've been experimenting a bit with using
middleware and decorators to connect and disconnect signals for all
models, but only for certain views, as a way of logging what actions a
view takes. Thankfully, the `Any` listener is still supported, so this
can still work, but I've always been a little concerned about the
overhead of adding and removing signals like that for every request
that needs them.
I know it's still very much a minority case, but if you've already got
everything set up to benchmark, would it be possible to get those
numbers as well?
> Summary results:
> Signal sends with no listeners in process: 67% faster
> Signal sends with no listeners, but 10 other listeners on
> different signals: 91% faster
> Signal sends with 30 weakref receivers: 60% faster
> Signal sends with 30 strongref receivers: 67% faster
Definitely good news!
> As a sideline, I noted even though signal handling frequently
> should hold weak references to receivers, dealing with weak references
> has a fairly high overhead. I took a hack at adding a
> USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where
> people can opt-out of the weakref safety net and into yet faster
> signal handling.
> My approach was to have a base signal class which deferred weak vs.
> strong semantics to subclasses. Unfortunately, that approach led to
> more function call overhead, which lost most of the gain to be had.
> In other words, USE_WEAK_SIGNALS could gain more performance (about
> 20% over this new performance), but would require some code
> duplication to avoid function call overhead. Out of time now. In any
> case, that could be done in a backwards compatible way, so it's not a
> 1.0 API issue.
I've gotten bitten by the weak reference thing on more than one
occasion, so my first instinct is to just dump it altogether, rather
than try to support both. Especially since the stated vast majority
case is when just a few listeners are registered early in the process,
and remain for the life of the process anyway. To my understanding
weak references are used here for listeners that will likely go out of
scope often, to save memory, since otherwise the dispatcher would keep
those listeners (and possibly related objects) alive indefinitely,
simply by having a listener registered.
However, I'm also the guy wondering about connecting and disconnecting
signals in middleware and decorators, so I clearly have an interest in
the potential performance benefits of weak references. I don't really
have an official stance on this topic, but I'll definitely give it
some more thought as time goes on, after 1.0 ships.
Good work, guys! I'm glad to see signals get the attention they
deserve, and I look forward to seeing this in trunk.
-Gul
I attached it here:
http://code.djangoproject.com/attachment/ticket/6814/signals_8156-2.diff
It's a Trac bug, BTW, with larger patches: they don't show up in the
diff viewer. It's still there, though: click the "original format"
link to download.
> Signal sends with no listeners in process: 67% faster
> Signal sends with no listeners, but 10 other listeners on
> different signals: 91% faster
> Signal sends with 30 weakref receivers: 60% faster
> Signal sends with 30 strongref receivers: 67% faster
And there was much rejoicing!
I'll take a closer look at this over the weekend, but at first glance
I'm pretty happy and ready to check it in. This might be the biggest
blocker for feature freeze since it's both a feature in its own right
*and* a change that Jython/PyPy support depends on.
Jacob
I'll happily review/edit whatever you come up with; let me know if you
don't have time so I can take a stab.
A couple of starting points that might help:
* http://code.djangoproject.com/wiki/Signals
* http://www.mercurytide.co.uk/whitepapers/django-signals/ -- we
should see if we can get permission to adapt this article into
official docs.
Jacob
Cool!. I've just tested it on Jython, and nothing breaks on the test
suite after applying the patch.
[...]
> Also, robustApply has died. (And there was much rejoicing?)
Much rejoicing indeed!
--
Leo Soto M.
http://blog.leosoto.com
OK, I've finally had a chance to complete my review of the patch. Here
are my thoughts:
1. The speed-ups are impressive: I'm seeing about a 40% improvement in
the running of the test suite (from 500s down to 300s). That's just
fucking awesome. On top of that the net -500 LOC is very, very nice :)
2. I'm a bit confused about the implementation of ``dispatch_uid``.
That is, it's used as an identifier to prevent duplicate registration
of functions, but *not* internally to actually register the function.
So, if I hit one of those same-function-different-id()s places, I'll
prevent duplicate registration but it's possible that a ``disconnect``
call will fail (since it uses ``id(target)``). Is there a reason not
to actually use ``dispatch_uid`` as the actual registration key?
3. On a similar tack, it seems I should be able to do::
>>> some_signal.connect(reciever, dispatch_uid="whatever")
and::
>>> some_signal.disconnect(dispatch_uid="whatever")
This makes a few use-cases I've got a little simpler (namely dynamic
generation of signal receivers. Don't ask.)
4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?
5. Let's table your ``USE_WEAK_SIGNALS`` idea until at least after
1.0. I like the general idea of being able to opt out of weakrefs, but
we can add that later without much fuss.
6. I'm assuming that your email starting this thread has everything I
need to add to BackwardsIncompatibleChanges -- is that so?
7. I'm not thrilled with the continuing lack of signal docs, but
*sigh* since we didn't have 'em before we might as well check this in
without 'em still. I really hope someone interested enough in signals
to follow this thread will step to write some (hint, hint!), but if
not I'll make it happen before 1.0 myself.
I know you're busy moving this week, so I'm happy to make any of the
changes I've proposed myself. Just want to make sure I'm not missing
something. I'm also on #django-dev all day if that's an easier way to
work some of this stuff out.
Other than the above, though, I'm ready to check this in.
Jacob
2. I'm a bit confused about the implementation of ``dispatch_uid``.
That is, it's used as an identifier to prevent duplicate registration
of functions, but *not* internally to actually register the function.
So, if I hit one of those same-function-different-id()s places, I'll
prevent duplicate registration but it's possible that a ``disconnect``
call will fail (since it uses ``id(target)``). Is there a reason not
to actually use ``dispatch_uid`` as the actual registration key?
3. On a similar tack, it seems I should be able to do::
>>> some_signal.connect(reciever, dispatch_uid="whatever")
and::
>>> some_signal.disconnect(dispatch_uid="whatever")
This makes a few use-cases I've got a little simpler (namely dynamic
generation of signal receivers. Don't ask.)
4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?
6. I'm assuming that your email starting this thread has everything I
need to add to BackwardsIncompatibleChanges -- is that so?
7. I'm not thrilled with the continuing lack of signal docs, but
*sigh* since we didn't have 'em before we might as well check this in
without 'em still. I really hope someone interested enough in signals
to follow this thread will step to write some (hint, hint!), but if
not I'll make it happen before 1.0 myself.
Hi Jacob,
I worked on the signal refactor with Jeremy.
On Tue, Aug 5, 2008 at 9:49 AM, Jacob Kaplan-Moss <Jacob...
4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?
Vestigal. +0
Yeah, good -- let's do it.
>> 4. I see ``Signal.send_robust``, and I see a test for it, but I don't
>> see where it's used. Is it vestigial, or is there a good reason to
>> keep the two sending methods around? If so, maybe we should just use a
>> ``suppress_exceptions`` kwarg on ``send`` instead?
As Jeremy points out this'll cause a (small) bit of performance
overhead. I say we worry about that later -- we can do more
microbenchmarking later on.
> I'll take a whack at it.
Cool. I emailed the MercuryTide folks asking for permission to adapt
some of their document
(http://www.mercurytide.co.uk/whitepapers/django-signals/); also see
http://code.djangoproject.com/wiki/Signals as a place to start.
Don't let this hold up getting the feature in, though; I'd like to
check this in tonight and get docs in over the next few days.
If you've got the time, go ahead and make these changes; otherwise
I'll do'em myself later tonight.
Jacob
Yeah, I saw. I'll check this in tomorrow morning -- right now I'm a
few drinks past the legal limit and don't want to end up with a CUI.
> As an aside, once this gets committed, ticket #3951[2] can be closed with a
> note about ``dispatch_uid``.
Cool, thanks for the note.
Jacob
Yup, good catch.
Jacob
Another nitpick: doc string of send_robust says "These arguments must
be a subset of the argument names defined in providing_args, or a
DispatcherTypeError will be raised", but it doesn't happen.
providing_args set is not used anywhere in the code at all. What is
the reason behind this attribute?