Opinions sought on m2m signal ordering

10 views
Skip to first unread message

Russell Keith-Magee

unread,
Mar 27, 2010, 1:08:23 PM3/27/10
to Django Developers
Hi all,

One of the new features in 1.2 are signals on m2m operations [1].

Recently, Ticket #13087 was opened questioning the order in which m2m
signals are sent. I'm calling for any comments or opinions on exactly
how this feature should operate before the current behavior is baked
into a release.

At the moment, on an m2m operation:
* Add is sent *after* the rows have been added
* Remove is sent *after* the rows have been removed
* Clear is sent *before* the rows are cleared

The ordering of the clear signal is the point of contention.

The clear signal doesn't get a list of ids - it just flushes the
table. This means that if the signal is going to do anything to the
list of m2m objects that are being cleared, you need to be able to
interrogate the m2m relation before the clear actually happens. This
ordering issue doesn't affect add and remove because those signals are
given a list of affected IDs.

The problem reported by #13087 is that this ordering means you can't
set up a signal handler that ensures that the m2m relation always
contains a given object. The fact that the clear signal is sent before
the rows are cleared means that anything you add will be immediately
cleared again.

There are 5 options I can see.

Option 1: Do nothing. #13087 describes a use case we don't want to
support, so we ignore it.

Option 2: We add a "cleared" signal that occurs after the clear
actually occurs. This solves the use case for #13087, but only adds 1
signal.

Option 3: We modify the existing signals so we have a pre-post pair
for every signal. This maintains the analog with pre/post save, and
gives the most control. For example, on Alex Gaynor has suggested to
me that some people might want to use a pre-add signal rather than a
post-add signal for cache invalidation since there is a marginally
lower chance of getting a race condition. However, signals aren't free
-- an unattached signal is roughly equivalent to the overhead of a
function call.

Option 4: (1), but also move add and remove to be *pre* signals, to
alleviate Alex's concern from (3)

Option 5: (2), but also move add and remove to be *pre* signals, to
alleviate Alex's concern from (3)

Opinions?

[1] http://code.djangoproject.com/changeset/12223

Yours,
Russ Magee %-)

Andrew Badr

unread,
Mar 27, 2010, 2:52:27 PM3/27/10
to Django developers
Isn't the overhead of a function call negligible compared to executing
a database query or opening/closing a connection?

On Mar 27, 10:08 am, Russell Keith-Magee <freakboy3...@gmail.com>
wrote:

Andrew Godwin

unread,
Mar 27, 2010, 6:59:44 PM3/27/10
to django-d...@googlegroups.com
On 27/03/10 17:08, Russell Keith-Magee wrote:
> There are 5 options I can see.
>
> Option 1: Do nothing. #13087 describes a use case we don't want to
> support, so we ignore it.
>

I think it should be supported; it seems like a reasonable suggestion,
and I can see reasons for implementing things that maintain M2M
relationships in this way.

> Option 2: We add a "cleared" signal that occurs after the clear
> actually occurs. This solves the use case for #13087, but only adds 1
> signal.
>
> Option 3: We modify the existing signals so we have a pre-post pair
> for every signal. This maintains the analog with pre/post save, and
> gives the most control. For example, on Alex Gaynor has suggested to
> me that some people might want to use a pre-add signal rather than a
> post-add signal for cache invalidation since there is a marginally
> lower chance of getting a race condition. However, signals aren't free
> -- an unattached signal is roughly equivalent to the overhead of a
> function call.
>
> Option 4: (1), but also move add and remove to be *pre* signals, to
> alleviate Alex's concern from (3)
>
> Option 5: (2), but also move add and remove to be *pre* signals, to
> alleviate Alex's concern from (3)
>

If the overhead isn't too great I'd personally prefer (3), as it's the
cleanest solution, and the one that seems most obvious (consistency
makes me feel all warm and fuzzy inside). However, since clear() isn't
the same as the other two, (2) also seems reasonable enough to me.

The only use-cases I can think of where you'd have to use a *pre* signal
for add or remove would be to stop them happening (i.e. enforce
constraints at the model level), and this seems like the Wrong Way to do
things. On the contrary, there are more reasons to have them there
*post* action, as you can then safely modify the relationship again/do
some raw database queries on the data/use the relationship to do
traversal or iteration over the contents (although one might argue
that's also a case for having a pre-remove).

Andrew

Russell Keith-Magee

unread,
Mar 27, 2010, 7:50:52 PM3/27/10
to django-d...@googlegroups.com
On Sun, Mar 28, 2010 at 2:52 AM, Andrew Badr <andre...@gmail.com> wrote:
> Isn't the overhead of a function call negligible compared to executing
> a database query or opening/closing a connection?

It is fairly small, especially if there are not signals attached. When
signals are attached, it's less trivial.

My concern here is that I don't want to make anything slower than it
has to be. If there's no point having a pre and post add signal, then
I don't want to make every m2m operations slower just because I can.

I suppose the underlying question is twofold:
* What is the 'right' signal sending behaviour, independent of the
cost of signals?
* Does the 'right' behaviour change if you account for the fact that
signals aren't free?

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,
Mar 27, 2010, 8:01:08 PM3/27/10
to django-d...@googlegroups.com

Cache invalidation is a reasonably compelling case for pre-signals; if
you invalidate a cache on the post-signal, there is a small window
between having modified the m2m and the cache being flushed. In that
window, any operation hitting the cache will see the m2m relations
still existing, but any operation that actually hits the database will
disagree. It's a really small edge case, but that's the class of
Heisenbug that is a pain to find in the wild.

However, as you note, any raw SQL activity will probably need to
happen in the post signal, so that's probably enough of a reason to
include pre and post signals.

Yours,
Russ Magee %-)

Andrew Godwin

unread,
Mar 27, 2010, 8:15:53 PM3/27/10
to django-d...@googlegroups.com
On 28/03/10 00:01, Russell Keith-Magee wrote:
> Cache invalidation is a reasonably compelling case for pre-signals; if
> you invalidate a cache on the post-signal, there is a small window
> between having modified the m2m and the cache being flushed. In that
> window, any operation hitting the cache will see the m2m relations
> still existing, but any operation that actually hits the database will
> disagree. It's a really small edge case, but that's the class of
> Heisenbug that is a pain to find in the wild.
>

Surely there's a Heisenbug for the opposite case, though; you invalidate
the cache, someone else hits a page, the M2M gets re-cached in its
current (still original) state, and then finally the M2M relation change
goes through. You now have an incorrect cache that's not going to be
invalidated any time soon.

> However, as you note, any raw SQL activity will probably need to
> happen in the post signal, so that's probably enough of a reason to
> include pre and post signals.
>

I'm strongly *for* including pre- and post- for all three; it just makes
sense to include these as options, considering they are elsewhere, and
I'm not convinced that the hit of an empty signal is that significant
(there's a function call, an assignment, and an if statement); as a
percentage of runtime, it seems very small (someone feel free to prove
me wrong, though!)

Andrew

Alex Gaynor

unread,
Mar 27, 2010, 8:19:48 PM3/27/10
to django-d...@googlegroups.com
On Sun, Mar 28, 2010 at 1:15 AM, Andrew Godwin <and...@aeracode.org> wrote:
> On 28/03/10 00:01, Russell Keith-Magee wrote:
>>
>> Cache invalidation is a reasonably compelling case for pre-signals; if
>> you invalidate a cache on the post-signal, there is a small window
>> between having modified the m2m and the cache being flushed. In that
>> window, any operation hitting the cache will see the m2m relations
>> still existing, but any operation that actually hits the database will
>> disagree. It's a really small edge case, but that's the class of
>> Heisenbug that is a pain to find in the wild.
>>
>
> Surely there's a Heisenbug for the opposite case, though; you invalidate the
> cache, someone else hits a page, the M2M gets re-cached in its current
> (still original) state, and then finally the M2M relation change goes
> through. You now have an incorrect cache that's not going to be invalidated
> any time soon.
>

Nope. On invalidation what you do is set the value explicitly to None
(which will force all future cache-getters to requery) and then use
cache.add() to set the cache value (cache.add won't set a value if
there's a value already, say an explicit None sentinel).

Welp, now we're off topic...

>> However, as you note, any raw SQL activity will probably need to
>> happen in the post signal, so that's probably enough of a reason to
>> include pre and post signals.
>>
>
> I'm strongly *for* including pre- and post- for all three; it just makes
> sense to include these as options, considering they are elsewhere, and I'm
> not convinced that the hit of an empty signal is that significant (there's a
> function call, an assignment, and an if statement); as a percentage of
> runtime, it seems very small (someone feel free to prove me wrong, though!)
>
> Andrew
>

> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-develop...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Firat Can Basarir

unread,
Mar 28, 2010, 9:44:03 AM3/28/10
to django-d...@googlegroups.com
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

I was thinking hard to come up with an alternative and I have the following (I am not sure this is good API design but a single signal can handle both cases this way):

How about sending a pre-signal and including an object as a parameter which lets the user execute the m2m operation. With a single signal, you can do the post-signal stuff (if you need it) after executing the operation in your listener. If the operation is NOT executed after the signal is handled by the listener, the original dispatcher can execute it and continue without a post-signal.

Firat


Firat Can Basarir

unread,
Mar 28, 2010, 9:45:51 AM3/28/10
to django-d...@googlegroups.com
However I still believe, as a user of the framework, having 2 signals, before and after, are the best way to go. It is the most intuitive and clear way to provide this functionality.

George Vilches

unread,
Mar 28, 2010, 10:58:06 PM3/28/10
to django-d...@googlegroups.com

On Mar 27, 2010, at 1:08 PM, Russell Keith-Magee wrote:

> Option 3: We modify the existing signals so we have a pre-post pair
> for every signal. This maintains the analog with pre/post save, and
> gives the most control. For example, on Alex Gaynor has suggested to
> me that some people might want to use a pre-add signal rather than a
> post-add signal for cache invalidation since there is a marginally
> lower chance of getting a race condition. However, signals aren't free
> -- an unattached signal is roughly equivalent to the overhead of a
> function call.

I'm +1 for this. But, I'm also the person who proposed it on the ticket. :) I've had uses for pre- and post- on at least changed, and worked around the add/remove being on one side or the other. I'd happily eat the extra function call worth of overhead for the general case.

George

Reply all
Reply to author
Forward
0 new messages