Re: [Django] #34597: Queryset (split) exclude's usage of Exists is significantly slower than subquery (was: Django ORM query SQL generation changed (and is apparently a lot slower))

2 views
Skip to first unread message

Django

unread,
May 28, 2023, 11:30:15 PM5/28/23
to django-...@googlegroups.com
#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Thank you for providing query plans, this is helpful.

While it's undeniable that in this particular case the query is slower,
[https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN
Postgres wiki itself advise against using NOT IN] so there is certainly
something else to blame here.

The first thing that is suspicious to me is that the new query is using
parallel workers which are know to be causing some slowdown when not
appropriately configured. For example, if running the first query gets you
close to your `work_mem` then the second one will definitely will cause
spills to disk and could explain the significant difference in execution
duration. What I see from the plan is that Postgres believes that it will
be faster to parallelize the query but when it tries to materialize the
results returned by workers [https://explain.depesz.com/s/KPXv#stats it
takes ages].

Tuning this parameter is [https://www.percona.com/blog/parallel-queries-
in-postgresql/ definitely out of scope for this ticket] as it dips into
user / DBA support territory but it'd be great to confirm that it's the
reason you are encountering this issue.

> I've managed to use your advice and modify the query to force the
subquery to use IN but that leads to synthax that is more complex, while
the django 2.2 synthax is a lot simpler: ~Q(translation=None)
>
> While not terrible I do think something has been broken in django since
the ~Q(translation=None) seems a lot cleaner and more ORMish.

I've never argue otherwise, it was solely meant as a way to unblock you
from upgrading from a Django version that has been unsupported for months
and that I assumed you were eager to move away from.

As for what ''ORMish'' means that is open for interpretation. I would
argue that `~Q(translation=None)` is extremely ambiguous given how `None`
doesn't map 1:1 to SQL NULL semantics that the objective of an ORM is to
abstract SQL operations to do the right thing most of the time while
providing escape hatches for the remaining cases. In this particular case
I've provided you a reasoning about why these changes were made so
hopefully it's clear to you why we need substantial evidence before
reverting these changes given they have already been part of two LTS
without prior reports of performance degradation.

In order to determine what might the origin your issue I'd ask you to run
the following queries while preceding them with a `SET
max_parallel_workers_per_gather = 0`

{{{#!sql
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
}}}

{{{#!sql
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_translation" U1
WHERE
U1."blog_id" = "test_app_blog"."id"
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Reply all
Reply to author
Forward
0 new messages