Re: [Django] #35865: Queryset aggregation keeps unnecessary SQL joins (was: Query.get_count() keeps unnecessary SQL joins)

8 views
Skip to first unread message

Django

unread,
Oct 25, 2024, 9:32:06 AM10/25/24
to django-...@googlegroups.com
#35865: Queryset aggregation keeps unnecessary SQL joins
-------------------------------------+-------------------------------------
Reporter: Ruslan | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* easy: 1 => 0
* stage: Unreviewed => Accepted
* summary: Query.get_count() keeps unnecessary SQL joins => Queryset
aggregation keeps unnecessary SQL joins

Comment:

To give you a bit of context here the ORM use to not prune unused
annotations before Django 4.2 (#28477) and the lack post annotation
pruning left-over `JOIN` pruning was identified as
[https://github.com/django/django/pull/16263#issuecomment-1311134646 a
potential optimization at the time].

To give a concrete example say you do

{{{#!python
Book.objects.annotate(
author_name=Concat("author__first_name", V(" "), "author_last_name"),
).count()
}}}

then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL
would have been

{{{#!sql
SELECT COUNT(*) FROM (
SELECT book.id, (author.first_name || ' ' || author.last_name)
author_name
FROM book
LEFT JOIN author ON (book.author_id = author.id)
)
}}}

and after it is

{{{#!sql
SELECT COUNT(*)
FROM book
LEFT JOIN author ON (book.author_id = author.id)
}}}

Now obviously in this case the M:1 join against author is not necessary in
this case but it's not always trivial to determine. Take the following
example

{{{#!python
author_qs = Author.objects.annotate(
book_title=F("books__title")
)
author_qs.count()
}}}

which results in

{{{#!sql
SELECT COUNT(*)
FROM author
LEFT JOIN book ON (book.author_id = author.id)
}}}

Then in this case we can't prune the 1:M join as it's multi-valued
(possibly many books for each author) and would return a different value
from `len(author_qs)`.

The problem then becomes that JOINs can be only be pruned if these two
conditions are met

1. They are not referenced anymore (could be done by decrementing
reference counts on annotation pruning)
2. They are not involved in multi-valued relationships (AKA many-to-many
or reverse many-to-one)

I'm tentatively accepting as this is an already identified desired
optimization but it is **far** from being an easy picking, it's in the
realm of close to wont-fix **very hard** to do correctly.
--
Ticket URL: <https://code.djangoproject.com/ticket/35865#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Reply all
Reply to author
Forward
0 new messages