Allowing models to influence QuerySet.update

28 views
Skip to first unread message

Dennis Kaarsemaker

unread,
Jun 16, 2010, 8:38:17 AM6/16/10
to django-d...@googlegroups.com
I know that queryset.update does not call .save() for each instance and
I know why. I even agree with it :)

However, I'd like to have a bit more control over what update() does,
e.g. for auditing purposes. I know that I can simply write a few lines
of code near every update() call to do what I want, but that violates
DRY.

I have created ticket #13021[1] and created a github branch[2] that
provides the following:

* Two new signals: pre_update and post_update
The argument for these signals is the queryset and a dict of all
changes that will be or have been applied to the objects in that set.

* An update() method for Model.
This method is a classmethod because it does not deal with instances
but querysets. It calls pre_update on all of its fields.

* A pre_update method for Field that does nothing
Simply exists so that it doesn't have to be checked for

* A pre_update method for DateField for auto_now magic
This method adds an update for the relevant field to the dict of
updates. This means that auto_now really is auto_now, also when
using update()

* A patched QuerySet.update() to use the above features.
- It sends the pre_update and post_update signals
- It calls the models update method to update the kwargs dict

* Tests and documentation for these features

I've sent this proposal a few months ago and have kept the patch up to
date since. It still applies (I've just rebased it to trunk) and the
tests still pass.

Thanks in advance for considering and commenting,

[1] http://code.djangoproject.com/ticket/13021
[2] http://github.com/seveas/django/tree/seveas/ticket13021

--
Dennis K.

The universe tends towards maximum irony. Don't push it.

Dennis Kaarsemaker

unread,
Jul 28, 2010, 12:37:05 PM7/28/10
to django-d...@googlegroups.com
Is nobody interested?

I'm using this in production for auditing purposes and works just fine.
If only it were built in, then I wouldn't have to monkeypatch :)

I've rebased it to trunk again and it still works.

--
Dennis K.

They've gone to plaid!

Jacob Kaplan-Moss

unread,
Jul 28, 2010, 1:02:07 PM7/28/10
to django-d...@googlegroups.com
Hi Dennis --

I'm not totally thrilled with this proposal, though perhaps there's
some points I just don't get. As it is, though, I'm -0 on the idea.
update() is supposed to be an optimization that's "close to the metal"
of the database; adding potentially-hidden slow code to the mix rubs
me the wrong way.

In general, signals are a very dangerous thing: they make side-effects
non-obvious, and make it hard to discover what's going to be called
when. They also add a not-insignificant overhead. I'm of the opinion
that they should be avoided except where critically important, and I
don't see this to be one of those cases.

On Wed, Jun 16, 2010 at 7:38 AM, Dennis Kaarsemaker
<den...@kaarsemaker.net> wrote:
> However, I'd like to have a bit more control over what update() does,
> e.g. for auditing purposes. I know that I can simply write a few lines
> of code near every update() call to do what I want, but that violates
> DRY.

I think you may have missed a few possibilities in your search for a
DRY solution to the problem. If I was trying to add some audibility
around QuerySet.update() I would:

1. Define a QuerySet subclass that extended update() to provide
whatever hooks I needed (this could be a signal, perhaps, or just the
code).

2. Make a Manager subclass that returned by QuerySet object from
get_query_set().

3. Use said Manager (or a subclass) on each model that I want the behavior on.

This gives me control over *where* my added behavior happens, and it's
completely clear to anyone reading the code that I've got a custom
Manager; they can follow the trail to the update behavior.

> * An update() method for Model.
>  This method is a classmethod because it does not deal with instances
>  but querysets. It calls pre_update on all of its fields.
>
> * A pre_update method for Field that does nothing
>  Simply exists so that it doesn't have to be checked for
>
> * A pre_update method for DateField for auto_now magic
>  This method adds an update for the relevant field to the dict of
>  updates. This means that auto_now really is auto_now, also when
>  using update()

This is the part that I *really* don't understand -- why do you need
all the extra mechanism? You've now added (num_fields *
num_objects_in_queryset) function calls to each update() call...

So yeah, I'm pretty against this proposal, *especially* as implemented.

----

Now, all that said, there's a kernel here I'm interested in exploring.
If you follow the steps I outline above, you could pretty easily make
a reusable app that provided QuerySet signals, but that app would only
work with apps that explicitly depended upon it. That is, if I wanted
to receive update() signals from auth.User, I'd be SOL.

This dovetails with an idea I had recently: I think it's probably
about time we added a mechanism for apps to contribute methods or
behavior to QuerySets. I haven't thought it out fully, but I think
that the best way for you to proceed might be to consider the greater
question of how we might go about allowing project-wide QuerySet
extensions.

Jacob

Ian Clelland

unread,
Jul 28, 2010, 1:03:18 PM7/28/10
to django-d...@googlegroups.com
On Wed, Jul 28, 2010 at 9:37 AM, Dennis Kaarsemaker <den...@kaarsemaker.net> wrote:
Is nobody interested?

I'm using this in production for auditing purposes and works just fine.
If only it were built in, then I wouldn't have to monkeypatch :)

I've rebased it to trunk again and it still works.

On wo, 2010-06-16 at 14:38 +0200, Dennis Kaarsemaker wrote:
> I know that queryset.update does not call .save() for each instance and
> I know why. I even agree with it :)
>
> However, I'd like to have a bit more control over what update() does,
> e.g. for auditing purposes. I know that I can simply write a few lines
> of code near every update() call to do what I want, but that violates
> DRY.
>

From what I've seen, you're probably going to encounter some resistance to any proposal that includes adding new signals to the core, especially if they are never going to be used by the vast majority of developers, and especially if there is any workaround possible.

In this specific case, is this not a problem that can be solved by a custom manager on the specific models that need this functionality? That way you don't RY so much:

class ControlledUpdateManager(django.db.models.Manager):
    def update(self, *args, **kwargs):
        # pre-update code
        super(ControlledUpdateManager,self).update(*args, **kwargs)
        # post-update code

class MyModel(django.db.models.Model):
    ...
    objects = ControlledUpdateManager()
 
You can apply that manager to any models that need the same pre- and post- update code, or subclass from it, or use it as a mixin as necessary.

--
Regards,
Ian Clelland
<clel...@gmail.com>

Dennis Kaarsemaker

unread,
Jul 28, 2010, 1:53:53 PM7/28/10
to django-d...@googlegroups.com

This would add boilerplate to each class and makes it nontrivial to do
update() tracking for third party applications. I don't see why pre/post
update signals are worse than pre/post save and delete. Signals itself
are pretty lightweight (two function calls per update()) and are
separate from the rest of the proposal.

> This gives me control over *where* my added behavior happens, and it's
> completely clear to anyone reading the code that I've got a custom
> Manager; they can follow the trail to the update behavior.
>
> > * An update() method for Model.
> > This method is a classmethod because it does not deal with instances
> > but querysets. It calls pre_update on all of its fields.
> >
> > * A pre_update method for Field that does nothing
> > Simply exists so that it doesn't have to be checked for
> >
> > * A pre_update method for DateField for auto_now magic
> > This method adds an update for the relevant field to the dict of
> > updates. This means that auto_now really is auto_now, also when
> > using update()
>
> This is the part that I *really* don't understand -- why do you need
> all the extra mechanism? You've now added (num_fields *
> num_objects_in_queryset) function calls to each update() call...

For making auto_now and similar functions work with update(). Currently
auto_now fails in the face of update(), which is not documented. I chose
to make auto_now work but could live with the current status being
better documented and not having the above functionality. I could simply
implement it for my projects as a listener for the pre_update signal :)

> So yeah, I'm pretty against this proposal, *especially* as implemented.

Are you against just the signals too (so, only the first of the four
patches)? Your negative reactions seem to concern mostly the additions
to Model and Field.

> Now, all that said, there's a kernel here I'm interested in exploring.
> If you follow the steps I outline above, you could pretty easily make
> a reusable app that provided QuerySet signals, but that app would only
> work with apps that explicitly depended upon it. That is, if I wanted
> to receive update() signals from auth.User, I'd be SOL.

This is why I think a pre/post update signal is a good idea. It is the
same sort of signal as the _save and _delete signals which django has
had for a long time.

> This dovetails with an idea I had recently: I think it's probably
> about time we added a mechanism for apps to contribute methods or
> behavior to QuerySets. I haven't thought it out fully, but I think
> that the best way for you to proceed might be to consider the greater
> question of how we might go about allowing project-wide QuerySet
> extensions.

Oh, that's a neat idea. Quick brainstorm:

- Having a settings.DEFAULT_MANAGER / settings.DEFAULT_QUERYSET setting
so people can subclass QuerySet and use it as default
Pro: Should be fairly easy to implement on the django side
Con: Existing managers might need to be changed to subclass
settings.DEFAULT_MANAGER instead of the current parent
- Have an api to add functionality to the manager/queryset
Pro: No rewriting of existing code needed
Con: Feels like monkeypatching

I don't yet know what I would prefer and will look for other ways as I
don't fully like either of the above.

Jacob Kaplan-Moss

unread,
Jul 28, 2010, 2:43:16 PM7/28/10
to django-d...@googlegroups.com
On Wed, Jul 28, 2010 at 12:53 PM, Dennis Kaarsemaker
<den...@kaarsemaker.net> wrote:
> This would add boilerplate to each class and makes it nontrivial to do
> update() tracking for third party applications. I don't see why pre/post
> update signals are worse than pre/post save and delete.

Because the save/delete signals get called once when a single object
is saved/deleted. An update signal would have to be called once for
each object in the queryset. That is, if `SomeModel.objects.all()`
returns a thousand results, `SomeModel.objects.update(foo='bar')`
would have to call the update signals one thousand times.

This is why I refer to update() as an optimization. If you must have
Python-side behavior during a bulk update, iterate over the queryset
and call save. If you want performance, do it in the database. There
isn't a middle ground.

> Signals itself
> are pretty lightweight (two function calls per update()) and are
> separate from the rest of the proposal.

Doesn't matter how lightweight they are: cheap != free. We've
benchmarked this before, and you're right that it's small. But
anything that could potentially be called O(N) times needs to be very
carefully considered here.

> For making auto_now and similar functions work with update(). Currently
> auto_now fails in the face of update(), which is not documented.

Well, that's a documentation bug, then.

FTR, auto_now and friends have been close to being deprecated for
quite a while -- they're impossible to do easily and cheaply, and
(again) are better done in-database. I'll probably start taking the
steps to marking them deprecated in 1.3.

> Are you against just the signals too (so, only the first of the four
> patches)? Your negative reactions seem to concern mostly the additions
> to Model and Field.

Yes.

Jacob

Dennis Kaarsemaker

unread,
Jul 28, 2010, 3:08:12 PM7/28/10
to django-d...@googlegroups.com
On wo, 2010-07-28 at 13:43 -0500, Jacob Kaplan-Moss wrote:
> On Wed, Jul 28, 2010 at 12:53 PM, Dennis Kaarsemaker
> <den...@kaarsemaker.net> wrote:
> > This would add boilerplate to each class and makes it nontrivial to do
> > update() tracking for third party applications. I don't see why pre/post
> > update signals are worse than pre/post save and delete.
>
> Because the save/delete signals get called once when a single object
> is saved/deleted. An update signal would have to be called once for
> each object in the queryset. That is, if `SomeModel.objects.all()`
> returns a thousand results, `SomeModel.objects.update(foo='bar')`
> would have to call the update signals one thousand times.
>
> This is why I refer to update() as an optimization. If you must have
> Python-side behavior during a bulk update, iterate over the queryset
> and call save. If you want performance, do it in the database. There
> isn't a middle ground.

As implemented in my github branch it is called once (well, twice, pre
and post) per update() statement, not once per object.

> > Signals itself
> > are pretty lightweight (two function calls per update()) and are
> > separate from the rest of the proposal.
>
> Doesn't matter how lightweight they are: cheap != free. We've
> benchmarked this before, and you're right that it's small. But
> anything that could potentially be called O(N) times needs to be very
> carefully considered here.

If it were called N times, i'd agree. But it is not.

> > For making auto_now and similar functions work with update(). Currently
> > auto_now fails in the face of update(), which is not documented.
>
> Well, that's a documentation bug, then.
>
> FTR, auto_now and friends have been close to being deprecated for
> quite a while -- they're impossible to do easily and cheaply, and
> (again) are better done in-database. I'll probably start taking the
> steps to marking them deprecated in 1.3.

Ah... that's unexpected. I presume that the deprecation will be followed
by documentation on how to set this up in databases that support them?

Jacob Kaplan-Moss

unread,
Jul 28, 2010, 3:20:46 PM7/28/10
to django-d...@googlegroups.com
On Wed, Jul 28, 2010 at 2:08 PM, Dennis Kaarsemaker
<den...@kaarsemaker.net> wrote:
> As implemented in my github branch it is called once (well, twice, pre
> and post) per update() statement, not once per object.

Okay, I missed that -- sorry.

Doesn't really change how I feel about the feature, though: I don't
see the point in needlessly complicating what's supposed to be a very
thin wrapper around an UPDATE call. I haven't really heard a good
argument *for* the feature -- "I want it" isn't a great argument, and
there's already been a couple of suggestions to otherwise achieve the
feature.

Jacob

Dennis Kaarsemaker

unread,
Jul 28, 2010, 3:35:58 PM7/28/10
to django-d...@googlegroups.com

But not yet a suggestion that integrates with third party applications
without modifying them, which is one of the reasons I implemented this.
(The other reasons basically come down to "i want it" as complete
auditing, including update() is a requirement I have for a few projects)

On the other hand, I see no reasons not to include the feature as it
doesn't get in the way, is useful and comes with documentation and
tests. It doesn't make update() more complicated to use and four lines
of easy to understand code (2 for defining the signals and 2 for
sending) are not difficult to understand either for people maintaining
the code.

Jeremy Dunck

unread,
Jul 28, 2010, 3:46:04 PM7/28/10
to django-d...@googlegroups.com

A bit of a seque, but I think this is a useful way to DRY this common need:
http://djangosnippets.org/snippets/2117/

Jeremy Dunck

unread,
Jul 28, 2010, 3:51:57 PM7/28/10
to django-d...@googlegroups.com
On Wed, Jul 28, 2010 at 2:46 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
...

> A bit of a seque, but I think this is a useful way to DRY this common need:
> http://djangosnippets.org/snippets/2117/

segue, but non sequitur is what I meant in any case. :-)

Reply all
Reply to author
Forward
0 new messages