The idea is that pre_update listeners get a queryset that isn't executed.
Accessing that queryset might be costly, but if it isn't accessed, there
isn't much cost in adding a pre_update signal. For post_update the signal
handler would be given a list of PK values (the original queryset doesn't
work for post_update, as the update might cause different instances to be
returned than were updated; consider
qs.filter(deleted=False).update(deleted=True)).
This feature will also provide an upgrade path for #21169 (for FK
RelatedManager.remove() more specifically).
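The `qs.filter(deleted=False).update(deleted=True)` pitfall can be reproduced
without the ORM; a minimal sqlite3 sketch (the `item` table and column names
are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, deleted INTEGER)")
conn.executemany("INSERT INTO item (deleted) VALUES (?)", [(0,), (0,), (1,)])

# The "queryset": a lazily defined filter, like qs.filter(deleted=False)
where = "deleted = 0"

# The update flips exactly the rows the filter matched...
conn.execute(f"UPDATE item SET deleted = 1 WHERE {where}")

# ...so re-evaluating the same filter afterwards finds none of them:
rows = conn.execute(f"SELECT id FROM item WHERE {where}").fetchall()
print(rows)  # [] -- the original queryset is useless for post_update
```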
--
Ticket URL: <https://code.djangoproject.com/ticket/21461>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => assigned
* needs_better_patch: => 1
* owner: nobody => loic84
* needs_tests: => 0
* needs_docs: => 1
Comment:
WIP https://github.com/django/django/pull/1932.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:1>
* needs_better_patch: 1 => 0
* needs_docs: 1 => 0
Comment:
Now with docs, this should be ready for review.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:2>
Comment (by charettes):
IMO, until we add support for `RETURNING` (or the equivalent on other
backends) we should try to mimic the way delete works. That is:
If no signals are attached to `(pre|post)_update` we do a ''fast update''.
Otherwise we fetch the pks of the objects we're about to update and use
the retrieved list to:
1. Build the queryset passed to `pre_update` receivers (`pk__in=pk_list`);
2. Issue the update query (`WHERE pk IN pk_list`);
3. Pass the `pk_list` to `post_update` receivers (though this could also
be a queryset built with `pk__in` instead of the `pk_list` itself).
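The three steps above can be sketched against a raw sqlite3 connection
(hypothetical `item` table; the signal sends are stubbed out as comments):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, deleted INTEGER)")
conn.executemany("INSERT INTO item (deleted) VALUES (?)", [(0,), (0,), (1,)])

# Step 0: fetch the pks of the rows we're about to update.
pk_list = [pk for (pk,) in
           conn.execute("SELECT id FROM item WHERE deleted = 0")]

# 1. Build the "queryset" handed to pre_update receivers (pk__in=pk_list).
placeholders = ", ".join("?" for _ in pk_list)
pre_update_sql = f"SELECT id FROM item WHERE id IN ({placeholders})"
# ... send pre_update here ...

# 2. Issue the update restricted to those pks (WHERE pk IN pk_list), so
#    concurrent inserts can't widen the affected set past what we fetched.
conn.execute(f"UPDATE item SET deleted = 1 WHERE id IN ({placeholders})",
             pk_list)

# 3. Pass pk_list to post_update receivers.
# ... send post_update(pk_list) here ...
print(pk_list)  # [1, 2]
```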
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:3>
Comment (by akaariai):
I am not a big fan of first fetching pk_list; it suffers from these
problems:
- `pk__in=hugelist` is problematic (SQLite doesn't work if you have more
than 1000 items in the list, and PostgreSQL performs badly)
- one extra query is needed
- there is still no guarantee that the update is race free; the races
would just be different. Consider the case
`qs.filter(count__lte=0).update(count=F('count') + 1)`: two threads
executing this statement in parallel could end up updating count from -1
to 1, which isn't currently possible. In my opinion, having connected
pre/post_update receivers alter what .update() does is worse than the
pre/post_update signals not seeing the exact same set of objects that
were in fact updated. Notably, if deletes happen, the latter problem
remains even with the altered pre/post_update behavior.
- when we add support for RETURNING we can't optimize the implementation
of pre_update
The reason why I voted for passing pk_list to post_update is that if the
updated pk_list is large, the user doesn't need to execute a
`pk__in=hugelist` query. The query can be split into parts, or maybe the
pk list is all that is needed. OTOH I can see how a queryset would make
the API more consistent. Maybe pass both?
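The "query can be split into parts" idea is straightforward once a receiver
has the pk list in hand; a minimal chunking helper (the names here are
illustrative, not part of any proposed API):

```python
def chunked(pk_list, size=999):
    """Yield slices of pk_list small enough for SQLite's bound-variable
    limit (999 parameters per statement in older SQLite builds)."""
    for start in range(0, len(pk_list), size):
        yield pk_list[start:start + size]

# A receiver can then issue several small pk__in queries instead of one
# huge one, e.g. Model.objects.filter(pk__in=chunk) for each chunk.
pk_list = list(range(2500))
chunks = list(chunked(pk_list))
print([len(c) for c in chunks])  # [999, 999, 502]
```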
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:4>
Comment (by loic84):
Latest attempt:
- Document that there is no guarantee that the `queryset` arg for
`pre_update` will exactly return the updated rows because of concurrency.
- Ensure that `pk_set` for `post_update` is the exact set of updated
models.
- Use chunked updates on SQLite when there are `post_update` signals.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:5>
* cc: charettes (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:6>
Comment (by akaariai):
The latest solution is close to what charettes suggested in comment:3. The
biggest distinctions are that pre_update is given the to-be-updated
queryset instead of queryset.filter(pk__in=pk_set), and that for
post_update you don't get a queryset, you just get pk_set.
The reason for going with pk_set is that in some cases the only thing you
need is the set of primary keys, and if that happens to be a large set,
then doing a pk__in=pk_set query will be expensive (PostgreSQL doesn't
like this type of query at all, and SQLite will not even execute a query
with more than 1000 pk values).
A possible solution is to provide both the queryset (with `pk__in=pk_set`
filtering pre-applied) *and* the pk_set to pre_/post_update signals. That
way the user has a nice API when that is all that is needed, but can also
fall back to the raw pk_set where that suffices.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:7>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:8>
* cc: shai (added)
Comment:
Not sure about this, but from a short look at the code it seems like the
pk_set is calculated before the update. That makes perfect sense, but
breaks a little if the update changes pk values. I don't see any sensible
behavior around this, but it should probably be documented.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:9>
Comment (by loic84):
@shai you are totally right, the limitation on changing pks is something
we already discussed, but I agree it needs to be documented.
At this point @akaariai and I are not sure how to proceed with this
patch. Do we only introduce `post_update`? Is there value in `pre_update`
as currently implemented? Are there enough people who want update signals
at all?
Opinions welcome.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:10>
Comment (by bendavis78):
I would love to have an update signal for the purpose of cache
invalidation. Currently I use save/delete signals to detect when
something has changed and to refresh an expensive calculation. While the
`pk_set` idea solves some problems, it's not ideal for large result sets
and has unpredictable consequences.
For example, let's say I don't care about the primary keys of objects that
were updated, but rather the objects related to those that were updated. I
could run a much less expensive query by doing, for example,
`queryset.distinct().values_list('related_id')`.
Personally I feel it's better to let the user be explicit about what is
passed around. I would propose that `pre_update` accept the queryset, and
that users have the option to return a dict of extra data that will be
made available to `post_update` receivers. For example:
{{{
#!python
@receiver(pre_update, sender=MyModel)
def on_pre_update(sender, update_fields, queryset, **kwargs):
    return {
        'related_pks': queryset.distinct().values_list('relatedmodel_id',
                                                       flat=True)
    }

@receiver(post_update, sender=MyModel)
def on_post_update(sender, update_fields, extra_data, **kwargs):
    related_pks = extra_data.get('related_pks', [])
    for obj in RelatedModel.objects.filter(pk__in=related_pks):
        obj.do_something()
}}}
The implementation of `QuerySet.update()` would look like this:
{{{
#!python
def update(self, **kwargs):
    """
    Updates all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    assert self.query.can_filter(), \
        "Cannot update a query once a slice has been taken."
    meta = self.model._meta
    self._for_write = True
    query = self.query.clone(sql.UpdateQuery)
    query.add_update_values(kwargs)
    extra_data = {}
    if not meta.auto_created:
        responses = signals.pre_update.send(
            sender=self.model, update_fields=kwargs, queryset=self,
            using=self.db)
        for rcvr, response in responses:
            extra_data.update(response)
    with transaction.atomic(using=self.db, savepoint=False):
        rows = query.get_compiler(self.db).execute_sql(CURSOR)
    self._result_cache = None
    if not meta.auto_created:
        signals.post_update.send(
            sender=self.model, update_fields=kwargs,
            extra_data=extra_data, using=self.db)
    return rows
}}}
It's a bit cleaner and addresses the performance concerns, as well as the
issue with updated pks. Regarding comment:10, I think there are use cases
for both `pre_update` and `post_update`, either together or alone.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:11>
Comment (by bendavis78):
Just created a PR for my above suggestion, with updated docs. I'm assuming
this would now be targeted for 1.8 since we missed the feature freeze.
PR: https://github.com/django/django/pull/2619
Commit:
https://github.com/bendavis78/django/commit/62a9ea61639a4a6ba9fde0eabc9f546db0a40de8
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:12>
Comment (by shai):
While the idea in comment:11 is interesting, I think the current
suggestion is lacking.
First of all, returning values from signals breaks well-established
assumptions, and I wouldn't rush to change that. Further, the suggested
code assumes without checking that the returned values can be used to
update a dictionary, and worse: it collapses all returned values into one
dictionary, allowing different signal handlers to step on each other's
toes with no way to find out about it.
I think that the general direction of letting the user specify which data
is received has merit, but this should be implemented via the registration
API -- that is, either defining different signals ('pre_update',
'pre_update_with_qset', 'pre_update_with_pks') or specifying different
senders (something along the lines of `@receiver(pre_update,
sender=(MyModel, pre_update.WITH_QSET))`). Then, update can calculate the
required data only if it is, in fact, requested.
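The "different signals" variant of that idea can be sketched with a toy
dispatcher (Django's real `Signal` class exposes an equivalent
`has_listeners()` check; everything below is illustrative, not a proposed
API):

```python
class Signal:
    """Minimal stand-in for django.dispatch.Signal."""
    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def has_listeners(self):
        return bool(self._receivers)

    def send(self, **kwargs):
        return [(r, r(**kwargs)) for r in self._receivers]

pre_update = Signal()            # cheap: no extra query needed
pre_update_with_pks = Signal()   # expensive: requires fetching the pks

def update(fetch_pks):
    # Only pay for the pk fetch when someone actually asked for it.
    if pre_update_with_pks.has_listeners():
        pk_set = set(fetch_pks())           # the "extra query"
        pre_update_with_pks.send(pk_set=pk_set)
    pre_update.send()
    # ... issue the actual UPDATE here ...

seen = []
pre_update_with_pks.connect(lambda pk_set: seen.append(pk_set))
update(lambda: [1, 2, 3])
print(seen)  # [{1, 2, 3}]
```

If no receiver is connected to the expensive signal, `update()` skips the pk
fetch entirely, which is the point of routing the choice through the
registration API.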
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:13>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:14>
* cc: bendavis78 (added)
Comment:
Yeah, you're right, using the dict is a messy way to solve the problem. I
guess the idea was to put the onus of "type" handling on the `pre_update`
and `post_update` receivers. Ideally the update() method would be
agnostic regarding the payloads being passed between the signals. I think
you've got the right idea with using `sender`, though I don't think the
API should be prescribing those types (i.e. pks vs. querysets).
Not sure what the right answer is at this point -- maybe someone will come
up with a more elegant approach.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:15>
* cc: Sardorbek Imomaliev (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:16>
* cc: Olivier Dalang (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:17>
Comment (by Olivier Dalang):
For reference, here's a package implementing this:
https://github.com/martinphellwig/django-query-signals
It also includes other bulk methods such as delete and bulk_create.
From the code, it looks like it suffers from the edge case described
above (`the original queryset doesn't work for post_update, the update
might cause different instances to be returned than were updated,
consider qs.filter(deleted=False).update(deleted=True)`), but nothing
seems unfixable.
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:18>
* owner: loic84 => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/21461#comment:19>