[Django] #35437: Exists subquery ignores ordering when it's distinct

14 views
Skip to first unread message

Django

unread,
May 6, 2024, 2:30:13 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin | Owner: nobody
Marsh |
Type: | Status: new
Uncategorized |
Component: Database | Version: dev
layer (models, ORM) | Keywords:
Severity: Normal | subquery,exists,ordering,distinct
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I ran into a bug in the `Exists` expression, I'm using it to do some
filtering where I just want the latest object grouped by a certain value
(eg. latest financial statement object for each company). See the patch
for a failing test, but using the `Manager`/`Employee` models from the
test suite you can see in the slightly contrived example that for an
`Exists` query like
{{{
# ... filtering by highest paid employee per manager
Exists(
Employee.objects.order_by("manager",
"-salary").distinct("manager").filter(pk=OuterRef("pk"))
)
# ...
}}}
Gets transformed into SQL (Postgresql) like this where the ordering has
been stripped out
{{{
-- ...
EXISTS(
SELECT DISTINCT ON (U0."manager_id")
1 AS "a"
FROM "expressions_employee" U0
WHERE U0."id" = ("expressions_employee"."id")
-- Missing ordering which is required for distinct
LIMIT 1
)
-- ...
}}}

----

Obviously we want to call `clear_ordering` most of the time on `Exists`
subqueries for performance reasons, but in this case we either shouldn't
clear them or loudly raise an exception saying that distinct `Exists`
queries are not supported
--
Ticket URL: <https://code.djangoproject.com/ticket/35437>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 6, 2024, 2:30:29 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin Marsh | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
subquery,exists,ordering,distinct | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Kevin Marsh):

* Attachment "test_ordering_in_distinct_exists_subquery.diff" added.

Django

unread,
May 6, 2024, 2:32:15 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin Marsh | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
subquery,exists,ordering,distinct | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Kevin Marsh):

* Attachment "test_ordering_in_distinct_exists_subquery.2.diff" added.

Failing test

Django

unread,
May 6, 2024, 2:32:34 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin Marsh | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
subquery,exists,ordering,distinct | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Kevin Marsh):

* Attachment "test_ordering_in_distinct_exists_subquery.diff" removed.

Django

unread,
May 6, 2024, 6:59:50 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin Marsh | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
subquery,exists,ordering,distinct |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted
* type: Uncategorized => Bug

Comment:

The report makes sense. We could go two ways here

1. Clear `distinct_fields` as well as it's unnecessary (we only care about
the existence of a single row after all)
2. Avoid
[https://github.com/django/django/blob/34a503162fe222033a1cd3249bccad014fcd1d20/django/db/models/sql/query.py#L651
forcing the removal of ordering] since the `clear_ordering` method
[https://github.com/django/django/blob/34a503162fe222033a1cd3249bccad014fcd1d20/django/db/models/sql/query.py#L2264
is a noop when distinct fields are defined].
--
Ticket URL: <https://code.djangoproject.com/ticket/35437#comment:1>

Django

unread,
May 6, 2024, 7:35:29 PMMay 6
to django-...@googlegroups.com
#35437: Exists subquery ignores ordering when it's distinct
-------------------------------------+-------------------------------------
Reporter: Kevin Marsh | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Accepted
subquery,exists,ordering,distinct |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)
* resolution: => needsinfo
* status: new => closed

Comment:

Actually something is off in your test here. Even if the ordering is not
cleared all employees will match because your reduce the number of row by
employee

e.g.

{{{#!sql
SELECT "expressions_employee"."id",
"expressions_employee"."firstname",
"expressions_employee"."lastname",
"expressions_employee"."salary",
"expressions_employee"."manager_id",
"expressions_employee"."based_in_eu"
FROM "expressions_employee"
WHERE (EXISTS
(SELECT DISTINCT ON (U0."manager_id") 1 AS "a"
FROM "expressions_employee" U0
WHERE (U0."manager_id" IS NOT NULL
AND U0."id" = ("expressions_employee"."id"))
ORDER BY U0."manager_id" ASC, U0."salary" DESC
LIMIT 1)
AND "expressions_employee"."manager_id" = 2)
LIMIT 21;
}}}

In other words, your distinct group is by `U0."id" =
("expressions_employee"."id")` so you'll always have matching rows even if
the ordering is maintained. Try it out for yourself with the following
patch

{{{#!diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index b3f130c0b4..07da7ae1a1 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -648,7 +648,7 @@ def exists(self, limit=True):
combined_query.exists(limit=False)
for combined_query in q.combined_queries
)
- q.clear_ordering(force=True)
+ q.clear_ordering()
if limit:
q.set_limits(high=1)
q.add_annotation(Value(1), "a")
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35437#comment:2>
Reply all
Reply to author
Forward
0 new messages