Re: [Django] #34597: Queryset (split) exclude's usage of Exists is significantly slower than subquery

9 views
Skip to first unread message

Django

unread,
May 29, 2023, 6:17:47 AM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Sanders):

The `MATERIALIZE` in the expensive plan indicates that Postgres doesn't
think it has enough memory to do something more efficient. (Sometimes
Postgres will choose something less optimal in order to materialise.)

Another thing to try is turning off materialisation to see what plan it
uses:

{{{
SET enable_material = 'off';
}}}

If it's able to use a more efficient plan then you could try adjusting
your memory settings/adding more RAM.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 29, 2023, 7:41:23 AM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by David Sanders):

* cc: David Sanders (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:5>

Django

unread,
May 29, 2023, 7:51:26 AM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Sanders):

Posting some investigation here for charettes & others:

I suspect the additional redundant condition in the inner query that
references the outer query is causing unwanted materialisation:

{{{
AND "test_app_blog"."id" = "test_app_blog"."id"
}}}

When I removed this – no more materialisation.

Additionally: Not sure why this is being added to the query but if you
remove `Q(is_published=True)` so that you simply have
`Blog.objects.filter(~Q(translation=None))` then this redundant condition
is no longer added.

I tested with the reporter's models on latest main & I'd wager this is
something we can/may want to address.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:6>

Django

unread,
May 29, 2023, 9:21:19 AM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Thanks for the investigation David!

The `AND "blog"."id" = "blog"."id"` is definitely something we want to
elide and was mentioned in comment:1 as a bug but the reporter said that
they were also getting poor results with the provided ORM examples that
explicitly don't have this criteria (e.g.
`~Exists(Blog.objects.filter(translation=None, id=OuterRef("id"))`) so I
suggest we wait to see how well the SQL examples (that don't have this
criteria) perform for the reporter.

In other words, the `AND "blog"."id" = "blog"."id"` part is definitely a
bug we want to solve but if we are going to address this issue it'd be
great to confirm it's the only factor to blame.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:7>

Django

unread,
May 29, 2023, 10:16:25 AM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Sanders):

> but the reporter said that they were also getting poor results with the
provided ORM examples that explicitly don't have this criteria (e.g.
`~Exists(Blog.objects.filter(translation=None, id=OuterRef("id"))`)

Interesting I tried that and the plan was better, though I don't have any
data to work with.

Am eager to hear what the results are.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:8>

Django

unread,
May 29, 2023, 12:24:38 PM5/29/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Lorand Varga):

Simon, I want to say that I really appreciate your support and patience on
this. If I've been too annoying in my comments, it was not on purpose -
I'm actually really curios if there is an underlying issue or not with
django (since there are a lot of interactions going on). I'm not insisting
for any revert unless needed and your arguments were solid and I'm very
grateful for your support.

Getting back to the needy greedy details, turning off the workers does not
improve anything.
The NOT(Exists):
{{{
SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC


Nested Loop Anti Join (cost=1.13..132780.57 rows=180586 width=1985)
(actual time=211403.273..3034890.176 rows=4 loops=1)
Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
test_app_blog.id))
Rows Removed by Join Filter: 16853800739
-> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual
time=1.080..1356.688 rows=179902 loops=1)
Filter: (is_published AND (NOT api_has_translation))
Rows Removed by Filter: 27456
-> Materialize (cost=0.71..22504.89 rows=1 width=4) (actual
time=0.002..9.083 rows=93684 loops=179902)
-> Merge Left Join (cost=0.71..22504.88 rows=1 width=4) (actual
time=0.016..140.731 rows=194102 loops=1)
Merge Cond: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 59794
-> Index Only Scan using test_app_blog_pkey on
test_app_blog u0 (cost=0.42..15170.42 rows=207267 width=4) (actual
time=0.007..67.559 rows=207358 loops=1)
Heap Fetches: 33518
-> Index Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..6077.43 rows=59809 width=8) (actual
time=0.005..30.515 rows=59794 loops=1)
Planning Time: 0.936 ms
Execution Time: 3034891.393 ms
(16 rows)
}}}

The Exists does work:
{{{
SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_translation" U1
WHERE
U1."blog_id" = "test_app_blog"."id"
LIMIT
1
)
)
ORDER BY
"test_app_blog"."updated_at" DESC

Sort (cost=21037.38..21066.08 rows=11479 width=22) (actual
time=63.818..63.820 rows=4 loops=1)
Sort Key: test_app_blog.updated_at DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1501.37..20263.31 rows=11479 width=22) (actual
time=17.974..63.805 rows=4 loops=1)
-> HashAggregate (cost=1500.95..1632.70 rows=13175 width=4)
(actual time=17.018..20.483 rows=13256 loops=1)
Group Key: u1.blog_id
Batches: 1 Memory Usage: 1425kB
-> Index Only Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..1351.42 rows=59809 width=4) (actual
time=0.007..6.721 rows=59794 loops=1)
Heap Fetches: 735
-> Index Scan using test_app_blog_pkey on test_app_blog
(cost=0.42..1.47 rows=1 width=22) (actual time=0.003..0.003 rows=0
loops=13256)
Index Cond: (id = u1.blog_id)
Filter: (is_published AND (NOT api_has_translation))
Rows Removed by Filter: 1
Planning Time: 0.703 ms
Execution Time: 63.872 ms
}}}

Also played with various values for work_mem and haven't seen any change
in the postgres planning.

I want to also mention that David is on to something.
{{{
SET enable_material='off';
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC;


Sort (cost=143578.39..144029.85 rows=180585 width=22) (actual
time=691.697..691.802 rows=4 loops=1)
Sort Key: test_app_blog.updated_at DESC
Sort Method: quicksort Memory: 25kB
-> Hash Anti Join (cost=21645.24..125693.23 rows=180585 width=22)
(actual time=306.930..691.769 rows=4 loops=1)
Hash Cond: (test_app_blog.id = u0.id)
-> Index Scan using test_app_blog_is_published_4b47c652_uniq on
test_app_blog (cost=0.42..101768.51 rows=180586 width=22) (actual
time=0.684..304.445 rows=179902 loops=1)
Index Cond: (is_published = true)
Filter: (NOT api_has_translation)
Rows Removed by Filter: 12043
-> Hash (cost=21644.81..21644.81 rows=1 width=4) (actual
time=283.265..283.369 rows=194102 loops=1)
Buckets: 262144 (originally 1024) Batches: 2 (originally
1) Memory Usage: 6145kB
-> Gather Merge (cost=1000.73..21644.81 rows=1 width=4)
(actual time=6.321..245.263 rows=194102 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Left Join (cost=0.71..20644.67 rows=1
width=4) (actual time=0.067..105.796 rows=64701 loops=3)
Merge Cond: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 19931
-> Parallel Index Only Scan using
test_app_blog_pkey on test_app_blog u0 (cost=0.42..13961.37 rows=86361
width=4) (actual time=0.030..40.520 rows=69119 loops=3)
Heap Fetches: 33632
-> Index Scan using
test_app_translation_51c6d5db on test_app_translation u1
(cost=0.29..6077.43 rows=59809 width=8) (actual time=0.032..42.470
rows=59627 loops=3)
Planning Time: 2.751 ms
Execution Time: 692.507 ms
}}}

Not sure yet why my postgres seems restricted (I remember it has "enough"
RAM) but will get back with a comment once I find out more.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:9>

Django

unread,
May 30, 2023, 12:54:04 AM5/30/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Thank you for your report Lorand and for providing all these details. I'm
also curious about what might be causing these slowdowns.

Until we figure out what might be to blame for the materialization issues
I started looking into the unnecessary `AND "blog"."id" = "blog"."id"`
generation by generating the exact change in terms of SQL generation that
8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1 resulting in.

The full result can be found
[https://gist.github.com/charettes/5256c4d52470a6f48f8955a92bce2135#file-
ticket-34597-sqlite-diff-L1010-L1025 in this gist] and the issue manifests
itself in two tests
1. [https://gist.github.com/charettes/5256c4d52470a6f48f8955a92bce2135
#file-ticket-34597-sqlite-diff-L1010-L1025
queries.tests.TestTicket24605.test_ticket_24605]
2. [https://gist.github.com/charettes/5256c4d52470a6f48f8955a92bce2135
#file-ticket-34597-sqlite-diff-L263-L280
queries.tests.ExcludeTests.test_ticket14511]

I should have some time tomorrow to dig into the issue further.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:10>

Django

unread,
May 31, 2023, 12:19:27 AM5/31/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

[https://github.com/django/django/pull/16906 I drafted a PR for the
harmful constant expression comparison] in the mean time and I'll see if
there is a way to do saner `NULL` handling so no `LEFT JOIN` for the sole
purpose of doing an `IS NULL` is necessary. I think it's optimizable when
detecting a `filter_rhs is None`.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:11>

Django

unread,
May 31, 2023, 4:24:31 AM5/31/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Unreviewed => Accepted


Comment:

Tentatively accepted.

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:12>

Django

unread,
Jun 14, 2023, 6:52:23 AM6/14/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"1c4f5f314e2b0c77b3fa0c75f703218e7e06f4be" 1c4f5f3]:
{{{
#!CommitTicketReference repository=""
revision="1c4f5f314e2b0c77b3fa0c75f703218e7e06f4be"
Refs #32143 -- Removed superflous constraints on excluded query.

The outer query reference is not necessary when alias can be reused and
can even be harmful by confusing query planers.

Refs #34597.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:13>

Django

unread,
Feb 8, 2024, 10:12:21 PMFeb 8
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

Had
[https://old.reddit.com/r/PostgreSQL/comments/1am2e8x/whats_new_in_the_postgres_16_query_planner/
a discussion with Postgres major contributor about this very issue] and
[https://techcommunity.microsoft.com/t5/azure-database-for-
postgresql/what-s-new-in-the-postgres-16-query-planner-
optimizer/ba-p/4051828#right-anti-join a query planer optimization in
Postgres 16+] that might make the `EXISTS` solution worth investigating.
--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:14>

Reply all
Reply to author
Forward
0 new messages