Signals performance (#6814)

127 views
Skip to first unread message

Jeremy Dunck

unread,
Aug 1, 2008, 2:17:52 AM8/1/08
to django-d...@googlegroups.com
Hello all,
Faster signals are available as an attachment here. I tried
attaching to #6814, but my upload was always accepted with 0 length.
Odd.

The updated patch applies cleanly to 8156.

In addition to changes in patch, the following files may be removed
from the tree:
svn rm django/dispatch/errors.py django/dispatch/robust.py
django/dispatch/robustapply.py
tests/regressiontests/dispatch/tests/test_robustapply.py

Keith Bussell did at least as much work on this as I did. He
prevailed to keep weak references and noted a rather shocking bug.

(It turns out that id() for a instance methods can change, meaning
duplicate registration or missed sends where the sender or receiver is
an instance method.
http://groups.google.com/group/comp.lang.python/msg/6453d01aff5c87a0
)

The first thing worth noting is that the changes are backwards incompatible:

1) All receivers to accept **kwargs. The benefit here is that
signal dispatch is much quicker because it doesn't have to worry about
conforming the sender's arguments to the receiver's arguments.
(When settings.DEBUG == True, connect asserts that the receiver
does indeed accept **kwargs.)
2) Dropped support for `Anonymous` sender. `Any` is still
functionally available if Signal.connect is called with a `sender` of
`None`.
3) The __author__, __cvsid__ and __version__ attributes were
removed from django.dispatch, along with various other internals from
the original pydispatcher implementation.

Also, robustApply has died. (And there was much rejoicing?)

Signals go from object sentinels to class Signal instances. These
keep track of their own receivers. Order of dispatch to receivers is
preserved, though I think the API shouldn't guarantee that-- it's just
faster for send to use a list of receivers than a dict, but that may
change at some point.

Since signals were not officially documented before, this patch
doesn't include docs.

Any receiver wanting to ensure unique connection (despite, say,
duplicate module loads) should define a dispatch_uid attribute. e.g.
=======
def some_handler(**kwargs):
pass
some_handler.dispatch_uid = 'my very unique id'
post_syncdb.connect(some_handler)
=======

This avoids gross hacks like this:
http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/management/__init__.py?rev=7598#L48
;-)

django.dispatch.dispatcher.* still exist for backwards
compatibility; literally the only change client code should need to
make is adding **kwargs.


OK, the fun stuff:
We assume that the 0-receiver case is most common, that is that
most projects don't use signals, and therefore shouldn't pay much of
an overhead if they aren't used. (With a nod to bharring's prior
approach of monkey-patching upon first listen.)

We assume that even on projects that use signals, connect and
disconnect of handlers is relatively rare compared to the number of
sent/received signals.

Optimizations have those things in mind.

Summary results:
Signal sends with no listeners in process: 67% faster
Signal sends with no listeners, but 10 other listeners on
different signals: 91% faster
Signal sends with 30 weakref receivers: 60% faster
Signal sends with 30 strongref receivers: 67% faster


As a sideline, I noted even though signal handling frequently
should hold weak references to receivers, dealing with weak references
has a fairly high overhead. I took a hack at adding a
USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where
people can opt-out of the weakref safety net and into yet faster
signal handling.
My approach was to have a base signal class which deferred weak vs.
strong semantics to subclasses. Unfortunately, that approach led to
more function call overhead, which lost most of the gain to be had.
In other words, USE_WEAK_SIGNALS could gain more performance (about
20% over this new performance), but would require some code
duplication to avoid function call overhead. Out of time now. In any
case, that could be done in a backwards compatible way, so it's not a
1.0 API issue.

Let us know any questions-- hopefully this makes the cut.

I'm mostly offline the next few days; we're moving. Bad timing w/
1.0 approach, alas.

-Jeremy

signals_8156-2.diff

Marty Alchin

unread,
Aug 1, 2008, 8:56:59 AM8/1/08
to django-d...@googlegroups.com
On Fri, Aug 1, 2008 at 2:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> The first thing worth noting is that the changes are backwards incompatible:
>
> 1) All receivers to accept **kwargs. The benefit here is that
> signal dispatch is much quicker because it doesn't have to worry about
> conforming the sender's arguments to the receiver's arguments.
> (When settings.DEBUG == True, connect asserts that the receiver
> does indeed accept **kwargs.)

I was a little worried about the **kwargs requirement, but I think the
DEBUG==True approach is a good way to go about handling that change.

> 2) Dropped support for `Anonymous` sender. `Any` is still
> functionally available if Signal.connect is called with a `sender` of
> `None`.

I also assume that, since signals are now objects that manage their
own listeners, there's also no more listening for `Any` signal? I'm
happy with that, but it would be worth noting as a
backwards-incompatible change.

> Also, robustApply has died. (And there was much rejoicing?)

Indeed. MUCH rejoicing!

> Signals go from object sentinels to class Signal instances. These
> keep track of their own receivers. Order of dispatch to receivers is
> preserved, though I think the API shouldn't guarantee that-- it's just
> faster for send to use a list of receivers than a dict, but that may
> change at some point.

Agreed. I don't think a dict will ever be a reasonable approach for
this, but a set might prove useful, or some other unordered
collection. We can't guarantee what order they're registered in (due
to import issues), and signals should never rely on other signals
anyway, so guaranteeing an execution order would be pure madness.

> Since signals were not officially documented before, this patch
> doesn't include docs.

That's something I'd like to change, though, especially since we can't
point people to PyDispatcher information anymore. I don't think docs
should block this from making it into the beta, but I'd definitely
like to see some docs go in before 1.0 proper. I'll see if I can spare
some time to work on some in the next few weeks.

> Any receiver wanting to ensure unique connection (despite, say,
> duplicate module loads) should define a dispatch_uid attribute. e.g.
> =======
> def some_handler(**kwargs):
> pass
> some_handler.dispatch_uid = 'my very unique id'
> post_syncdb.connect(some_handler)
> =======
>
> This avoids gross hacks like this:
> http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/management/__init__.py?rev=7598#L48
> ;-)

I like it.

> django.dispatch.dispatcher.* still exist for backwards
> compatibility; literally the only change client code should need to
> make is adding **kwargs.

As long as there has to be some import changes -- and at the risk of
starting a minor flame war -- what the distinction is between django.*
and django.core.*? This seems -- to me -- like a candidate for
django.core instead, but I really don't know what goes into that
placement.

> We assume that the 0-receiver case is most common, that is that
> most projects don't use signals, and therefore shouldn't pay much of
> an overhead if they aren't used.

Agreed.

> We assume that even on projects that use signals, connect and
> disconnect of handlers is relatively rare compared to the number of
> sent/received signals.

I'm still curious to know what the speed differences are in the case
of connect/disconnect. I've been experimenting a bit with using
middleware and decorators to connect and disconnect signals for all
models, but only for certain views, as a way of logging what actions a
view takes. Thankfully, the `Any` listener is still supported, so this
can still work, but I've always been a little concerned about the
overhead of adding and removing signals like that for every request
that needs them.

I know it's still very much a minority case, but if you've already got
everything set up to benchmark, would it be possible to get those
numbers as well?

> Summary results:
> Signal sends with no listeners in process: 67% faster
> Signal sends with no listeners, but 10 other listeners on
> different signals: 91% faster
> Signal sends with 30 weakref receivers: 60% faster
> Signal sends with 30 strongref receivers: 67% faster

Definitely good news!

> As a sideline, I noted even though signal handling frequently
> should hold weak references to receivers, dealing with weak references
> has a fairly high overhead. I took a hack at adding a
> USE_WEAK_SIGNALS settings (similar in purpose to USE_I18N) where
> people can opt-out of the weakref safety net and into yet faster
> signal handling.
> My approach was to have a base signal class which deferred weak vs.
> strong semantics to subclasses. Unfortunately, that approach led to
> more function call overhead, which lost most of the gain to be had.
> In other words, USE_WEAK_SIGNALS could gain more performance (about
> 20% over this new performance), but would require some code
> duplication to avoid function call overhead. Out of time now. In any
> case, that could be done in a backwards compatible way, so it's not a
> 1.0 API issue.

I've gotten bitten by the weak reference thing on more than one
occasion, so my first instinct is to just dump it altogether, rather
than try to support both. Especially since the stated vast majority
case is when just a few listeners are registered early in the process,
and remain for the life of the process anyway. To my understanding
weak references are used here for listeners that will likely go out of
scope often, to save memory, since otherwise the dispatcher would keep
those listeners (and possibly related objects) alive indefinitely,
simply by having a listener registered.

However, I'm also the guy wondering about connecting and disconnecting
signals in middleware and decorators, so I clearly have an interest in
the potential performance benefits of weak references. I don't really
have an official stance on this topic, but I'll definitely give it
some more thought as time goes on, after 1.0 ships.

Good work, guys! I'm glad to see signals get the attention they
deserve, and I look forward to seeing this in trunk.

-Gul

Jacob Kaplan-Moss

unread,
Aug 1, 2008, 9:36:47 AM8/1/08
to django-d...@googlegroups.com
On Fri, Aug 1, 2008 at 1:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> Faster signals are available as an attachment here. I tried
> attaching to #6814, but my upload was always accepted with 0 length.

I attached it here:

http://code.djangoproject.com/attachment/ticket/6814/signals_8156-2.diff

It's a Trac bug, BTW, with larger patches: they don't show up in the
diff viewer. It's still there, though: click the "original format"
link to download.

> Signal sends with no listeners in process: 67% faster
> Signal sends with no listeners, but 10 other listeners on
> different signals: 91% faster
> Signal sends with 30 weakref receivers: 60% faster
> Signal sends with 30 strongref receivers: 67% faster

And there was much rejoicing!

I'll take a closer look at this over the weekend, but at first glance
I'm pretty happy and ready to check it in. This might be the biggest
blocker for feature freeze since it's both a feature in its own right
*and* a change that Jython/PyPy support depends on.

Jacob

Jacob Kaplan-Moss

unread,
Aug 1, 2008, 11:44:31 AM8/1/08
to django-d...@googlegroups.com
On Fri, Aug 1, 2008 at 7:56 AM, Marty Alchin <gulo...@gamemusic.org> wrote:
> That's something I'd like to change, though, especially since we can't
> point people to PyDispatcher information anymore. I don't think docs
> should block this from making it into the beta, but I'd definitely
> like to see some docs go in before 1.0 proper. I'll see if I can spare
> some time to work on some in the next few weeks.

I'll happily review/edit whatever you come up with; let me know if you
don't have time so I can take a stab.

A couple of starting points that might help:

* http://code.djangoproject.com/wiki/Signals
* http://www.mercurytide.co.uk/whitepapers/django-signals/ -- we
should see if we can get permission to adapt this article into
official docs.

Jacob

Leo Soto M.

unread,
Aug 1, 2008, 1:03:37 PM8/1/08
to django-d...@googlegroups.com
On Fri, Aug 1, 2008 at 2:17 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> Hello all,
> Faster signals are available as an attachment here. I tried
> attaching to #6814, but my upload was always accepted with 0 length.
> Odd.
>
> The updated patch applies cleanly to 8156.

Cool!. I've just tested it on Jython, and nothing breaks on the test
suite after applying the patch.

[...]

> Also, robustApply has died. (And there was much rejoicing?)

Much rejoicing indeed!

--
Leo Soto M.
http://blog.leosoto.com

Jacob Kaplan-Moss

unread,
Aug 5, 2008, 12:49:32 PM8/5/08
to django-d...@googlegroups.com
Hi Jeremy --

OK, I've finally had a chance to complete my review of the patch. Here
are my thoughts:

1. The speed-ups are impressive: I'm seeing about a 40% improvement in
the running of the test suite (from 500s down to 300s). That's just
fucking awesome. On top of that the net -500 LOC is very, very nice :)

2. I'm a bit confused about the implementation of ``dispatch_uid``.
That is, it's used as an identifier to prevent duplicate registration
of functions, but *not* internally to actually register the function.
So, if I hit one of those same-function-different-id()s places, I'll
prevent duplicate registration but it's possible that a ``disconnect``
call will fail (since it uses ``id(target)``). Is there a reason not
to actually use ``dispatch_uid`` as the actual registration key?

3. On a similar tack, it seems I should be able to do::

>>> some_signal.connect(reciever, dispatch_uid="whatever")

and::

>>> some_signal.disconnect(dispatch_uid="whatever")

This makes a few use-cases I've got a little simpler (namely dynamic
generation of signal receivers. Don't ask.)

4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?

5. Let's table your ``USE_WEAK_SIGNALS`` idea until at least after
1.0. I like the general idea of being able to opt out of weakrefs, but
we can add that later without much fuss.

6. I'm assuming that your email starting this thread has everything I
need to add to BackwardsIncompatibleChanges -- is that so?

7. I'm not thrilled with the continuing lack of signal docs, but
*sigh* since we didn't have 'em before we might as well check this in
without 'em still. I really hope someone interested enough in signals
to follow this thread will step to write some (hint, hint!), but if
not I'll make it happen before 1.0 myself.

I know you're busy moving this week, so I'm happy to make any of the
changes I've proposed myself. Just want to make sure I'm not missing
something. I'm also on #django-dev all day if that's an easier way to
work some of this stuff out.

Other than the above, though, I'm ready to check this in.

Jacob

Keith Bussell

unread,
Aug 5, 2008, 3:42:07 PM8/5/08
to django-d...@googlegroups.com
Hi Jacob,

I worked on the signal refactor with Jeremy.

On Tue, Aug 5, 2008 at 9:49 AM, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:

2. I'm a bit confused about the implementation of ``dispatch_uid``.
That is, it's used as an identifier to prevent duplicate registration
of functions, but *not* internally to actually register the function.
So, if I hit one of those same-function-different-id()s places, I'll
prevent duplicate registration but it's possible that a ``disconnect``
call will fail (since it uses ``id(target)``). Is there a reason not
to actually use ``dispatch_uid`` as the actual registration key?

3. On a similar tack, it seems I should be able to do::

   >>> some_signal.connect(reciever, dispatch_uid="whatever")

and::

   >>> some_signal.disconnect(dispatch_uid="whatever")

This makes a few use-cases I've got a little simpler (namely dynamic
generation of signal receivers. Don't ask.)

+1 for using ``dispatch_uid`` as the registration key.

We added ``dispatch_uid`` at the last moment as a fix for the multiple module import problem. (Getting around the "gross hack" Jeremy mentioned in the original message.) The ``disconnect`` call will only use ``id(target)`` if it's not an instance method (the logic for generating the id is the same as at registration time), so it shouldn't fail. That said, your suggestion in #3 is a more compelling argument to make the change.

I just noticed a potential issue with the current implementation: the original code allowed the same receiver to be connected to a signal multiple times as long as there were different senders. The ``dispatch_uid`` code changed that. Using the ``dispatch_uid`` as the registration key would remove the need for the ``registered_uids`` member, and fix this.
 

4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?

Vestigal.  +0

6. I'm assuming that your email starting this thread has everything I
need to add to BackwardsIncompatibleChanges -- is that so?

Yes

7. I'm not thrilled with the continuing lack of signal docs, but
*sigh* since we didn't have 'em before we might as well check this in
without 'em still. I really hope someone interested enough in signals
to follow this thread will step to write some  (hint, hint!), but if
not I'll make it happen before 1.0 myself.

I'll take a whack at it.


    -keithb

Jeremy Dunck

unread,
Aug 5, 2008, 4:10:18 PM8/5/08
to django-d...@googlegroups.com, django-d...@googlegroups.com

On Aug 5, 2008, at 14:42, "Keith Bussell" <kbus...@gmail.com> wrote:

Hi Jacob,

I worked on the signal refactor with Jeremy.

On Tue, Aug 5, 2008 at 9:49 AM, Jacob Kaplan-Moss <Jacob...

4. I see ``Signal.send_robust``, and I see a test for it, but I don't
see where it's used. Is it vestigial, or is there a good reason to
keep the two sending methods around? If so, maybe we should just use a
``suppress_exceptions`` kwarg on ``send`` instead?

Vestigal.  +0


Note that a branch in the "for receiver in receivers" loop will have some overhead.  I guess it doesn't hurt the common case of few receivers, though.

(man quoting context is a pain on the iphone)

Jacob Kaplan-Moss

unread,
Aug 5, 2008, 4:49:08 PM8/5/08
to django-d...@googlegroups.com
On Tue, Aug 5, 2008 at 2:42 PM, Keith Bussell <kbus...@gmail.com> wrote:
> +1 for using ``dispatch_uid`` as the registration key.

Yeah, good -- let's do it.

>> 4. I see ``Signal.send_robust``, and I see a test for it, but I don't
>> see where it's used. Is it vestigial, or is there a good reason to
>> keep the two sending methods around? If so, maybe we should just use a
>> ``suppress_exceptions`` kwarg on ``send`` instead?

As Jeremy points out this'll cause a (small) bit of performance
overhead. I say we worry about that later -- we can do more
microbenchmarking later on.

> I'll take a whack at it.

Cool. I emailed the MercuryTide folks asking for permission to adapt
some of their document
(http://www.mercurytide.co.uk/whitepapers/django-signals/); also see
http://code.djangoproject.com/wiki/Signals as a place to start.

Don't let this hold up getting the feature in, though; I'd like to
check this in tonight and get docs in over the next few days.

If you've got the time, go ahead and make these changes; otherwise
I'll do'em myself later tonight.

Jacob

Keith Bussell

unread,
Aug 5, 2008, 9:52:58 PM8/5/08
to django-d...@googlegroups.com
I uploaded a patch to the ticket[1] a few hours ago. I mentioned it to you in IRC, but just now realized you weren't logged in. Oops.

As an aside, once this gets committed, ticket #3951[2] can be closed with a note about ``dispatch_uid``.

    -keithb

[1] http://code.djangoproject.com/ticket/6814
[2] http://code.djangoproject.com/ticket/3951

Jacob Kaplan-Moss

unread,
Aug 5, 2008, 10:51:42 PM8/5/08
to django-d...@googlegroups.com
On Tue, Aug 5, 2008 at 8:52 PM, Keith Bussell <kbus...@gmail.com> wrote:
> I uploaded a patch to the ticket[1] a few hours ago. I mentioned it to you
> in IRC, but just now realized you weren't logged in. Oops.

Yeah, I saw. I'll check this in tomorrow morning -- right now I'm a
few drinks past the legal limit and don't want to end up with a CUI.

> As an aside, once this gets committed, ticket #3951[2] can be closed with a
> note about ``dispatch_uid``.

Cool, thanks for the note.

Jacob

Message has been deleted

Ivan Illarionov

unread,
Aug 5, 2008, 10:54:42 PM8/5/08
to Django developers
Hi

From the patch:
def __init__(self, providing_args=[]):

Shouldn't it be
def __init__(self, providing_args=None):
if providing_args is None:
providing_args = []

?

Regards,
Ivan Illarionov

Jacob Kaplan-Moss

unread,
Aug 6, 2008, 9:28:23 AM8/6/08
to django-d...@googlegroups.com
On Tue, Aug 5, 2008 at 9:54 PM, Ivan Illarionov
<ivan.il...@gmail.com> wrote:
> Shouldn't it be
> def __init__(self, providing_args=None):
> if providing_args is None:
> providing_args = []

Yup, good catch.

Jacob

alex....@gmail.com

unread,
Aug 6, 2008, 11:25:22 AM8/6/08
to Django developers
It only needs to be that way if the list is being written to at some
point, I'm not sure whether that's the case.

On Aug 6, 8:28 am, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:
> On Tue, Aug 5, 2008 at 9:54 PM, Ivan Illarionov
>

Ivan Illarionov

unread,
Aug 6, 2008, 12:42:13 PM8/6/08
to Django developers
On Aug 6, 7:25 pm, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:
> It only needs to be that way if the list is being written to at some
> point, I'm not sure whether that's the case.

Yes, after closer look I see that it doesn't break anything in our
case but IMHO it's still better and safer to avoid using list as
default argument.

Another nitpick: doc string of send_robust says "These arguments must
be a subset of the argument names defined in providing_args, or a
DispatcherTypeError will be raised", but it doesn't happen.
providing_args set is not used anywhere in the code at all. What is
the reason behind this attribute?

Keith Bussell

unread,
Aug 6, 2008, 1:23:02 PM8/6/08
to django-d...@googlegroups.com
On Wed, Aug 6, 2008 at 9:42 AM, Ivan Illarionov <ivan.il...@gmail.com> wrote:

Another nitpick: doc string of send_robust says "These arguments must
be a subset of the argument names defined in providing_args, or a
DispatcherTypeError will be raised", but it doesn't happen.
providing_args set is not used anywhere in the code at all. What is
the reason behind this attribute?


We went back and forth trying to balance features and performance and decided to remove the functionality still referenced in the docstring. At the moment, the providing_args parameter is purely for documentation purposes, but extra checking may be added in the future if necessary. Since API stability is a major feature of 1.0, we decided to leave it in.
 
The docstrings still need some help. I'll take another pass through them over the weekend.

    -keithb
Reply all
Reply to author
Forward
0 new messages