I have the following models configuration:
- `Parent` model
- `Child` model, with a `parent_id` foreign key to a `Parent` model, set
with `on_delete=models.SET_NULL`
Each `Parent` can have a lot of children, in my case roughly 30k.
I'm starting to encounter performance issues that make my jobs timeout,
because the SQL queries simply timeout.
I've enabled query logging, and noticed something weird (that is certainly
that way on purpose, but I don't understand why).
{{{
# Select the parent
SELECT * FROM "parent" WHERE "parent"."id" = 'parent123';
# Select all children
SELECT * FROM "children" WHERE "children"."parent_id" IN ('parent123');
# Update all children `parent_id` column to `NULL`
UPDATE "children" SET "parent_id" = NULL WHERE "children"."id" IN
('child1', 'child2', 'child3', ..., 'child30000');
# Finally delete the parent
DELETE FROM "parent" WHERE "parent"."id" IN ('parent123');
}}}
I would have expected the update condition to simply be `WHERE
"children"."parent_id" = 'parent123'`, but for some reason it isn't.
In the meantime, I'll switch to `on_delete=models.CASCADE`, which in my
case does the trick, but I was curious about the reason why this happens
in the first place.
Thanks in advance
--
Ticket URL: <https://code.djangoproject.com/ticket/33928>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted
Comment:
You are right that is an opportunity for an optimization when `SET`,
`SET_DEFAULT`, or `SET_NULL` is used but I wonder if it's worth doing
given `db_on_delete` support (see #21961) make things even better for this
use case.
> In the meantime, I'll switch to on_delete=models.CASCADE, which in my
case does the trick, but I was curious about the reason why this happens
in the first place.
This is likely the case because the collector is able to take
[https://github.com/django/django/blob/08688bd7dd8dfa218ecff03d6c94a753b1be8e59/django/db/models/deletion.py#L318-L320
the fast delete route] and avoid object fetching entirely.
If you want to give a shot at a patch you'll want to have a look at
`Collector.collect` and have it skip collection entirely when dealing with
`SET` and friends likely by adding a branch that turns them into ''fast''
updates.
[https://github.com/django/django/blob/08688bd7dd8dfa218ecff03d6c94a753b1be8e59/django/db/models/deletion.py#L500-L504
One branch that worries me is the post-deletion assignment of values to
in-memory instances] but I can't understand why this is even necessary
given all the instances that are collected for field updates are never
make their way out of the collector so I would expect it to be entirely
unnecessary.
You'll want to make sure to avoid breaking the
[https://github.com/django/django/blob/0dd29209091280ccf34e07c9468746c396b7778e/django/contrib/admin/utils.py#L164-L228
admin's collector subclass used to display deletion confirmation pages]
but from a quick look it doesn't seem to care about field updates.
Tentatively accepting but I think we should revisit when #21961 lands. In
the mean time if you want to give a shot at implementing this optimization
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:1>
* has_patch: 0 => 1
Comment:
Renan, lmk if [https://github.com/django/django/pull/15969 this PR] is
what you were expecting.
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:2>
* owner: nobody => Simon Charette
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:3>
* cc: David Wobrock (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:4>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:5>
* needs_better_patch: 1 => 0
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:6>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"a9be1dc5515e99ed96a2c640f5495af419c85b93" a9be1dc5]:
{{{
#!CommitTicketReference repository=""
revision="a9be1dc5515e99ed96a2c640f5495af419c85b93"
Refs #33928 -- Removed unnecessary attribute assignment on on-delete
updates.
Model instances retrieved for bulk field update purposes are not exposed
to the outside world and thus are not required to be kept update to
date.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:7>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"0701bb8e1f1771b36cdde45602ad377007e372b3" 0701bb8e]:
{{{
#!CommitTicketReference repository=""
revision="0701bb8e1f1771b36cdde45602ad377007e372b3"
Fixed #33928 -- Avoided unnecessary queries when cascade updating.
Models that use SET, SET_NULL, and SET_DEFAULT as on_delete handler
don't have to fetch objects for the sole purpose of passing them back to
a follow up UPDATE query filtered by the retrieved objects primary key.
This was achieved by flagging SET handlers as _lazy_ and having the
collector logic defer object collections until the last minute. This
should ensure that the rare cases where custom on_delete handlers are
defined remain uncalled when when dealing with an empty collection of
instances.
This reduces the number queries required to apply SET handlers from
2 to 1 where the remaining UPDATE use the same predicate as the non
removed SELECT query.
In a lot of ways this is similar to the fast-delete optimization that
was added in #18676 but for updates this time. The conditions only
happen to be simpler in this case because SET handlers are always
terminal. They never cascade to more deletes that can be combined.
Thanks Renan GEHAN for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/33928#comment:8>