Allowing models to influence QuerySet.update

Paul Winkler

Nov 28, 2011, 1:28:44 PM
to Django developers
There was an old thread about this at
https://groups.google.com/group/django-developers/browse_frm/thread/6e41ee7b08d50710/9cf6375d97bed499?lnk=gst&q=update+signal#9cf6375d97bed499
which fizzled out last summer with no conclusion.
(I'd reply to that thread, but Google Groups apparently won't let me
reply to something that old.)

The ticket is at https://code.djangoproject.com/ticket/13021

I want this feature for much the same reason as Dennis: while there
were suggestions in the old thread about ways to avoid repeating
yourself in code that is fully under your control, there has "not yet
[been] a suggestion that integrates with third party applications
without modifying them". A new signal would provide a clean, easy way
to do that, and it would be easily understood and documented since it
would fit the existing pattern of pre_save, pre_delete, etc.
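To make the proposal concrete, here is a rough, self-contained sketch of the wiring such a signal would allow. Note that pre_update does not exist in Django; both it and the minimal Signal class below are hypothetical stand-ins, mimicking the shape of the existing pre_save/pre_delete API.

```python
# Hypothetical sketch: "pre_update" does NOT exist in Django; the
# Signal class below is a minimal stand-in for django.dispatch.Signal,
# just to show the shape the proposal would take.

class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver, sender=None):
        self.receivers.append((receiver, sender))

    def send(self, sender, **kwargs):
        # Returns (receiver, response) pairs, like Django's dispatcher.
        return [(r, r(sender=sender, **kwargs))
                for (r, s) in self.receivers
                if s is None or s is sender]

pre_update = Signal()  # the signal this thread proposes

class Article(object):
    pass

def audit_update(sender, update_kwargs=None, **extra):
    # A third-party app could hook in here without modifying the code
    # that actually calls QuerySet.update().
    return ('audit', sender.__name__, update_kwargs)

pre_update.connect(audit_update, sender=Article)

# What QuerySet.update() would do just before issuing the UPDATE:
responses = pre_update.send(sender=Article,
                            update_kwargs={'status': 'published'})
```

The point is only the shape: a receiver registered once, fired for every bulk update, with no changes to the calling code.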

If this is going to be "wontfix", it'd be great to at least see that
finalized on the ticket, and maybe a note in the docs that
QuerySet.update() doesn't send any signals.

Thanks,

- PW

Luke Plant

Nov 29, 2011, 7:25:09 AM
to django-d...@googlegroups.com

Sorry, I didn't see this message before I closed the ticket. I think the
reason Jacob gave still stands: 'update' is meant to be closer to SQL,
and something you can rely on not to have performance side effects.
Having seen a lot of code recently with horrible performance due to
developers using post_save signals instead of adding the change in the
right place, I'm glad that update() is something I can rely on to avoid
that mess.

(For example, one project had a post_save signal on a model that did
expensive image processing of uploaded images. What they should have
done was override ModelAdmin.save() to call this image processing,
since that was the only place where the model's image data could
change, and there were many other places that called save() on the
model with no possibility of the image files being changed.)

I think if you need to add custom behaviour to an update() call in a
third party app, that app should provide an appropriate hook for it (in
a similar way to how the ModelAdmin allows you to override save() to
apply custom behaviour). Providing a generic hook for all update() calls
would be an anti-feature IMO.
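The hook-based alternative Luke describes can be reduced to a plain-Python sketch: the app routes a given kind of change through one overridable method, and projects subclass it. The class and method names below are illustrative only (in Django the analogous override point is ModelAdmin.save_model), not real APIs.

```python
# Plain-Python sketch of the hook pattern: the app promises that all
# changes of a given kind go through one overridable method, so there
# is exactly one place to attach custom behaviour. Names here are
# illustrative, not real Django APIs.

class BaseAdmin(object):
    def save_model(self, obj):
        obj.saved = True          # the app's own save logic
        return obj

class ImageAdmin(BaseAdmin):
    def save_model(self, obj):
        obj = super(ImageAdmin, self).save_model(obj)
        # Expensive image processing happens only here -- not on every
        # save() call elsewhere in the codebase.
        obj.thumbnail_built = True
        return obj

class Photo(object):
    saved = False
    thumbnail_built = False

photo = ImageAdmin().save_model(Photo())
```

Unlike a global signal, the expensive work runs only on the one code path that can actually change the data.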

The docs for QuerySet.update() already say that no post_save/pre_save
signals are emitted. Do we really need to add that no other signals are
emitted, when there aren't others that you would expect to be emitted?

Regards,

Luke

--
Noise proves nothing. Often a hen who has merely laid an egg
cackles as if she laid an asteroid.
-- Mark Twain

Luke Plant || http://lukeplant.me.uk/

Paul Winkler

Nov 29, 2011, 12:10:14 PM
to Django developers
On Nov 29, 7:25 am, Luke Plant <L.Plant...@cantab.net> wrote:
> The docs for QuerySet.update() already say that no post_save/pre_save
> signals are emitted.

Mea culpa, somehow I missed that. Thanks for pointing it out, and
updating the ticket.

Hmm. I've been thinking of signals as key plug points for third-party
apps, as well as a way to avoid needing database-specific stored
procedures and triggers. Maybe I need to rethink that.

Luke Plant

Nov 29, 2011, 1:17:30 PM
to django-d...@googlegroups.com
On 29/11/11 17:10, Paul Winkler wrote:

> Hmm. I've been thinking of signals as key plug points for third-party
> apps, as well as a way to avoid needing database-specific stored
> procedures and triggers. Maybe I need to rethink that.

Signals could be used for your use case, as an alternative to the method
I suggested - but it would be a *custom* signal provided by that app,
not one for QuerySet.update() itself.
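As a self-contained sketch of that suggestion (all names hypothetical; with Django installed, the app would use django.dispatch.Signal rather than the stand-in below): the app defines its own signal and sends it from its own update helper, so callers get a hook without Django itself growing one.

```python
# Sketch of an app-provided custom signal (all names hypothetical).
# With Django available, post_bulk_update would be a
# django.dispatch.Signal; a tiny stand-in keeps this self-contained.

class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender=sender, **kwargs)

# Inside the third-party app:
post_bulk_update = Signal()

def bulk_update(model, **field_values):
    # ... the app would call
    # model.objects.filter(...).update(**field_values) here ...
    post_bulk_update.send(sender=model, field_values=field_values)

# Inside your project, without modifying the app:
seen = []
post_bulk_update.connect(
    lambda sender, field_values: seen.append((sender, field_values)))

class Article(object):
    pass

bulk_update(Article, status='archived')
```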

I also noticed in db optimization docs that we have explicitly
documented update() and delete() as bypassing signals, and I think we
should honour that.

https://docs.djangoproject.com/en/dev/topics/db/optimization/#use-queryset-update-and-delete

Paul Winkler

Nov 29, 2011, 3:19:18 PM
to Django developers
On Nov 29, 1:17 pm, Luke Plant <L.Plant...@cantab.net> wrote:
> On 29/11/11 17:10, Paul Winkler wrote:
>
> > Hmm. I've been thinking of signals as key plug points for third-party
> > apps, as well as a way to avoid needing database-specific stored
> > procedures and triggers. Maybe I need to rethink that.
>
> Signals could be used for your use case, as an alternative to the method
> I suggested - but it would be a *custom* signal provided by that app,
> not one for QuerySet.update() itself.

Right, but you can only do that when "that app" is your own code. Lack
of hooks like this is why I often find third-party apps unusable:
there is no way to override or wrap small pieces of behavior without
effectively rewriting the whole damn thing.
So it goes ... thanks for the input anyway.

- PW

Kääriäinen Anssi

Nov 29, 2011, 4:27:33 PM
to django-d...@googlegroups.com
"""
I also noticed in db optimization docs that we have explicitly
documented update() and delete() as bypassing signals, and I think we
should honour that.

https://docs.djangoproject.com/en/dev/topics/db/optimization/#use-queryset-update-and-delete
"""

Is this correct for delete? A quick test (A1 is a model I have hanging around; the details aren't important):

from datetime import datetime
from django.db.models.signals import post_delete

def foo(*args, **kwargs):
    print args, kwargs

post_delete.connect(foo, sender=A1)

A1(dt=datetime.now()).save()
A1.objects.all().delete()

Result:
() {'instance': <A1: A1 object>, 'signal': <django.dispatch.dispatcher.Signal object at 0xa47434c>, 'sender': <class 'obj_creation_speed.models.A1'>, 'using': 'default'}

Search post_delete in django/db/models/deletion.py. Signals seem to be sent, even for cascaded deletion.

Personally I don't think pre/post instance-changed signals are the way to go if you want to do auditing. DB triggers are much more reliable. Some problems with the Django signals:
- Not all operations send signals. (bulk_create could easily send signals, since the instances are available directly in that case. Even bulk update could send signals per instance: first check if there is a listener; if there is, fetch all the updated instances and send signals; if there isn't, don't fetch the instances. You only pay the price when needed. Not saying this is a great idea, but maybe worth a thought.)
- Proxy models (including deferred models) and multi-table inherited models do not send signals as you would expect. I have groundwork for how to implement fast inherited signals in ticket #16679. The patch in that ticket also makes model __init__ much faster in certain common-enough cases. On the other hand, it is yet another cache.
- If you do anything outside Django models (raw SQL, dbshell, another application accessing the DB), your auditing will not work.
- The DB triggers approach is faster.

The downside is that you will be programming DB-specific triggers, in a DB-specific language. Schema upgrades are a nightmare.
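The listener-check idea in the first point above can be sketched in plain Python. The dispatcher and query functions below are simplified, hypothetical stand-ins, not Django's real internals (Django's dispatcher has a similar has_listeners-style check, but the signature differs).

```python
# Sketch: only pay for fetching the updated instances when someone is
# actually listening. Simplified stand-ins, not Django's internals.

class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def has_listeners(self):
        return bool(self.receivers)

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender=sender, **kwargs)

post_update = Signal()  # hypothetical bulk-update signal

fetches = []            # tracks the extra SELECT queries issued

def fetch_updated_rows(table):
    fetches.append(table)
    return [{'id': 1}, {'id': 2}]   # pretend result set

def bulk_update(table, **values):
    # One UPDATE statement either way; the per-instance fetch happens
    # only when the signal will actually be observed.
    if post_update.has_listeners():
        for row in fetch_updated_rows(table):
            post_update.send(sender=table, instance=row, values=values)

bulk_update('app_article', status='done')   # no listeners -> no fetch
assert fetches == []

post_update.connect(lambda sender, **kwargs: None)
bulk_update('app_article', status='done')   # one listener -> one fetch
```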

- Anssi Kääriäinen

Adrian Holovaty

Nov 29, 2011, 4:50:47 PM
to django-d...@googlegroups.com
On Tue, Nov 29, 2011 at 3:27 PM, Kääriäinen Anssi
<anssi.ka...@thl.fi> wrote:
> Is this correct for delete? A quick test (A1 is a model which I have hanging around - details about it aren't important):
>
> from django.db.models.signals import post_delete
>
> def foo(*args, **kwargs):
>    print args, kwargs
> post_delete.connect(foo, sender=A1)
>
> A1(dt=datetime.now()).save()
> A1.objects.all().delete()
>
> Result:
> () {'instance': <A1: A1 object>, 'signal': <django.dispatch.dispatcher.Signal object at 0xa47434c>, 'sender': <class 'obj_creation_speed.models.A1'>, 'using': 'default'}
>
> Search post_delete in django/db/models/deletion.py. Signals seem to be sent, even for cascaded deletion.

Hmmmm, that is not ideal behavior. You mean QuerySet.delete() calls
the signal for each deleted object, rather than doing a delete at the
database level?

Adrian

Kääriäinen Anssi

Nov 29, 2011, 4:58:56 PM
to django-d...@googlegroups.com
"""
Hmmmm, that is not ideal behavior. You mean QuerySet.delete() calls
the signal for each deleted object, rather than doing a delete at the
database level?
"""

This might not be exactly accurate, but I think it goes something like this:
- Fetch all the to-be deleted objects (one query)
- Check if there are cascades for those objects, fetch the cascades (one query per cascaded Model class?)
- Send pre_delete signals for all deleted instances
- Do the delete as one query for the to-be deleted objects, and then one query(?) per cascade Model class
- Send post_delete signals

Now, this is not that inefficient, but it would be a good optimization NOT to fetch the instances if there are no listeners for pre/post delete signals and there are no cascades (or all cascades are DO_NOTHING). Even if there are cascades, you could fetch just the PKs of the to-be-deleted models (and even that is not strictly needed, as you can use joins).
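That sequence can be sketched in plain Python. This is a simplified stand-in; the real logic lives in django/db/models/deletion.py and additionally handles cascades, batching, and multiple databases.

```python
# Plain-Python sketch of the collect-then-delete flow described above:
# signals fire once per instance, but the DELETE itself is one query
# per table. Simplified stand-in, not Django's actual code.

class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender=sender, **kwargs)

pre_delete, post_delete = Signal(), Signal()
queries = []

def queryset_delete(table, rows):
    # rows: the already-fetched to-be-deleted objects (one SELECT).
    for row in rows:                        # per-instance pre_delete
        pre_delete.send(sender=table, instance=row)
    queries.append('DELETE FROM %s WHERE id IN (%s)' %
                   (table, ', '.join(str(r['id']) for r in rows)))
    for row in rows:                        # per-instance post_delete
        post_delete.send(sender=table, instance=row)

deleted_ids = []
post_delete.connect(
    lambda sender, instance: deleted_ids.append(instance['id']))

queryset_delete('app_a1', [{'id': 1}, {'id': 2}])
```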

Again: I am not 100% sure how this behaves...

- Anssi

Carl Meyer

Nov 29, 2011, 5:31:08 PM
to django-d...@googlegroups.com
On 11/29/2011 02:58 PM, Kääriäinen Anssi wrote:
> """
> Hmmmm, that is not ideal behavior. You mean QuerySet.delete() calls
> the signal for each deleted object, rather than doing a delete at the
> database level?
> """

These two things aren't mutually exclusive. The actual delete is still
done in one query per table, not one query per object. post_delete is
sent, but the delete() method on the instance is not called.

QuerySet.delete() has called the post_delete signal for each deleted
object ever since queryset-refactor was merged into trunk in Apr 2008,
at least. (I didn't bother trying to track it prior to that; it may
well go back much farther.)

Since the db-optimization doc that Luke linked was created in Jan 2010,
the claim there (that no model signals are sent by either
QuerySet.update() or QuerySet.delete()) has been inaccurate ever since
it was added.

> This might not be exactly accurate, but I think it goes something like this:
> - Fetch all the to-be deleted objects (one query)
> - Check if there are cascades for those objects, fetch the cascades (one query per cascaded Model class?)
> - Send pre_delete signals for all deleted instances
> - Do the delete as one query for the to-be deleted objects, and then one query(?) per cascade Model class
> - Send post_delete signals

Yes, this is an accurate summary. Prior to r14507, it did quite a few
more queries than that (one query per related model class per
to-be-deleted object, rather than just one query per related model class).

> Now, this is not that inefficient - but it would be a good
> optimization to NOT fetch the instances if there are no listeners for
> pre/post delete signals and there are no cascades (or all cascades are
> DO_NOTHING). Even if there are cascades, you could fetch just PKs of the
> to-be deleted models (even that is not actually needed, as you can use
> joins)

Yes, these would be possible optimizations (in the no-listeners case).
They would break the admin's delete-confirmation screen, so we'd have to
be willing to accept a regression in functionality there, or else
duplicate the delete-cascade logic in the admin (leaving open more
possibilities for the delete-confirmation screen to not tell the truth),
in order to make these optimizations.

In the case of many cascade-deleted objects, bulk delete performance in
1.3 is already much better than it was in any previous Django release.
If something even closer to the metal is needed (besides the ability to
write raw SQL deletes), I'd prefer to see new raw-bulk-delete API added
rather than QuerySet.delete() made less functional in order to be
faster. I think there might even be a ticket for that already...

Carl

Anssi Kääriäinen

Nov 30, 2011, 1:32:20 AM
to Django developers
On Nov 30, 12:31 am, Carl Meyer <c...@oddbird.net> wrote:
> > Now, this is not that inefficient - but it would be a good
> > optimization to NOT fetch the instances if there are no listeners for
> > pre/post delete signals and there are no cascades (or all cascades are
> > DO_NOTHING). Even if there are cascades, you could fetch just PKs of the
> > to-be deleted models (even that is not actually needed, as you can use
> > joins)
>
> Yes, these would be possible optimizations (in the no-listeners case).
> They would break the admin's delete-confirmation screen, so we'd have to
> be willing to accept a regression in functionality there, or else
> duplicate the delete-cascade logic in the admin (leaving open more
> possibilities for the delete-confirmation screen to not tell the truth),
> in order to make these optimizations.
>
> In the case of many cascade-deleted objects, bulk delete performance in
> 1.3 is already much better than it was in any previous Django release.
> If something even closer to the metal is needed (besides the ability to
> write raw SQL deletes), I'd prefer to see new raw-bulk-delete API added
> rather than QuerySet.delete() made less functional in order to be
> faster. I think there might even be a ticket for that already...

I should not have said that this would be a good optimization; I
should have called it a possible optimization. I haven't had any
performance problems when using delete in 1.3, so from my point of
view the optimization mentioned above is mostly a theoretical
possibility. I did not mean that the 1.3 version of deletion is bad at
all.

If dependent-object collection could be made a public API (with the
possibility of fetching all dependencies to depth N, not just cascaded
dependencies), it would be a nice feature for sure. Even making just
the cascades part public would be great. Showing a "these objects will
be deleted" confirmation page is not a feature only the admin needs.

- Anssi

Carl Meyer

Nov 30, 2011, 3:12:47 PM
to django-d...@googlegroups.com

On 11/29/2011 11:32 PM, Anssi Kääriäinen wrote:
> I should not have said that this would be a good optimization, and
> used the possible optimizations wording. I haven't got any performance
> problems when using delete in 1.3. So from my point
> of view the above mentioned optimization is mostly a theoretical
> possibility. I did not mean the 1.3 version of deletion is bad at all.

No worries, I wasn't offended, just trying to be clear about the
situation ;-)

> If dependent object collection could be made a public API (with the
> possibility of fetching all dependencies to depth N, not just cascaded
> dependencies), it would be a nice feature for sure. If just the
> cascades part would be made public, it would be great, too. Showing
> "these objects will be deleted" confirmation page is not a feature
> only admin needs.

True enough. Making the to-be-deleted collection a public API would
just require some documentation, I think; the API is already there and
the admin uses it.

Carl
