[Django] #36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs

9 views
Skip to first unread message

Django

unread,
Aug 14, 2025, 12:53:58 PMAug 14
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Type:
| Uncategorized
Status: new | Component: Database
| layer (models, ORM)
Version: 5.0 | Severity: Normal
Keywords: queryset filter | Triage Stage:
regression | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
We stumbled upon this issue while upgrading a large code base from Django
4.x to 5.0.

A reasonably simplified (probably not the most trivial) reproduction case
looks like this:


{{{
Alpha.objects.filter(
bravos__charlie_bravos__charlie=9999,
)
.filter(
Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
)
)
)
.values_list("pk")
}}}

In Django 4 the generated query is:


{{{
SELECT "example_alpha"."id"
FROM "example_alpha"
INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" =
"example_bravo_alphas"."alpha_id")
INNER JOIN "example_bravo" ON
("example_bravo_alphas"."bravo_id" = "example_bravo"."id")
INNER JOIN "example_charliebravo" ON ("example_bravo"."id" =
"example_charliebravo"."bravo_id")
WHERE ("example_charliebravo"."charlie_id" = 9999
AND EXISTS
(SELECT 1 AS "a"
FROM "example_delta" U0
INNER JOIN "example_delta_bravos" U1 ON (U0."id" =
U1."delta_id")
WHERE U1."bravo_id" =
("example_charliebravo"."bravo_id")
LIMIT 1))
}}}


In Django 5:


{{{
SELECT "example_alpha"."id"
FROM "example_alpha"
INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" =
"example_bravo_alphas"."alpha_id")
INNER JOIN "example_bravo" ON ("example_bravo_alphas"."bravo_id" =
"example_bravo"."id")
INNER JOIN "example_charliebravo" ON ("example_bravo"."id" =
"example_charliebravo"."bravo_id")
LEFT OUTER JOIN "example_bravo_alphas" T6 ON ("example_alpha"."id" =
T6."alpha_id")
LEFT OUTER JOIN "example_bravo" T7 ON (T6."bravo_id" = T7."id")
LEFT OUTER JOIN "example_charliebravo" T8 ON (T7."id" = T8."bravo_id")
WHERE ("example_charliebravo"."charlie_id" = 9999
AND EXISTS
(SELECT 1 AS "a"
FROM "example_delta" U0
INNER JOIN "example_delta_bravos" U1 ON (U0."id" =
U1."delta_id")
WHERE U1."bravo_id" = (T8."bravo_id")
LIMIT 1))
}}}


Interestingly, this QuerySet generates the expected query both in 4.x and
5.0:


{{{
Alpha.objects
.filter(
Q(bravos__charlie_bravos__charlie=9999)
&
Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
)
)
)
.values_list("pk")
}}}


Example project for reproduction: https://github.com/salomvary/django-5
-filter-regression

The issue is also present in 5.1.x and 5.2.x.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 14, 2025, 12:59:07 PMAug 14
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: queryset filter | Triage Stage:
regression | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Márton Salomváry):

* type: Uncategorized => Bug

--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:1>

Django

unread,
Aug 14, 2025, 1:31:26 PMAug 14
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: queryset filter | Triage Stage:
regression | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Márton Salomváry):

I've git bisected using this test:
https://github.com/salomvary/django/commit/cc2631043e8a2fbf1b6a1eed166bf1e9ef890bc4

The offending commit seems to be this:
https://github.com/django/django/commit/0e1aae7a5f51408b73c5a29e18bd1803dd030930
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:2>

Django

unread,
Aug 18, 2025, 11:49:35 AMAug 18
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* cc: Simon Charette (added)
* keywords: queryset filter regression => chained-filters
* resolution: => invalid
* status: new => closed
* type: Bug => Cleanup/optimization

Comment:

Hello Márton Salomváry, thank you for your report, as well as for the test
project and bisected revision number.

In general, Django does not make guarantees about the exact SQL that is
generated. This means SQL can change from version to version, and we would
not consider that a bug as long as the semantics of the query remain the
same (which, based on my testing, is the case here). The resulting
queryset still returns the correct values.

That said, we do care about potential performance impacts and/or
regressions. When triaging, I focused on how this change affects query
planning, and I can confirm that the query plan is different/worse (see
EXPLAIN ANALYZE results for Postgres comparing `4.2.x` vs `main`). I do
wonder, however, whether the way the queries are currently written might
be more complex than necessary, which could contribute to these
differences in execution plans. For example, this alternative queryset is,
IMHO, equivalent and much more straightfoward:

{{{#!python
Alpha.objects.filter(
Q(bravos__charlie_bravos__charlie=9999)
& Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
)
)
)
}}}

Because of the above, I'll close the ticket as `invalid`. If you encounter
a case where a query written in a way that aligns with Django's ORM
patterns (as shown above) returns incorrect results or performs poorly,
please feel free to reopen it with details. Thanks again!

=== Appendix ===
`EXPLAIN ANALYZE` for `stable/4.2.x`:
{{{
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.013
rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
-> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual
time=0.009..0.011 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.alpha_id
-> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual
time=0.008..0.010 rows=1 loops=1)
Output: test_filter_regression_bravo.id, u1.bravo_id,
test_filter_regression_charliebravo.bravo_id
Inner Unique: true
Join Filter: (test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8)
(actual time=0.007..0.008 rows=1 loops=1)
Output: test_filter_regression_charliebravo.bravo_id,
u1.bravo_id
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output: test_filter_regression_charliebravo.id,
test_filter_regression_charliebravo.name,
test_filter_regression_charliebravo.charlie_id,
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Nested Loop (cost=3.05..13.34 rows=540 width=4)
(actual time=0.002..0.002 rows=0 loops=2)
Output: u1.bravo_id
Inner Unique: true
-> Bitmap Heap Scan on
public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10
width=8) (actual time=0.001..0.001 rows=0 loops=2)
Output: u1.id, u1.delta_id, u1.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89
rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
Index Cond: (u1.bravo_id =
test_filter_regression_charliebravo.bravo_id)
-> Index Only Scan using
test_filter_regression_delta_pkey on public.test_filter_regression_delta
u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
loops=1)
Output: u0.id
Index Cond: (u0.id = u1.delta_id)
Heap Fetches: 1
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4)
(actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_bravo.id
Index Cond: (test_filter_regression_bravo.id =
u1.bravo_id)
Heap Fetches: 1
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using test_filter_regression_alpha_pkey on
public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122)
(actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond: (test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.382 ms
Execution Time: 0.027 ms
}}}

For `main`:
{{{
Hash Join (cost=79.87..131.14 rows=20 width=122) (actual
time=0.074..0.171 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
Hash Cond: (t6.bravo_id = u1.bravo_id)
-> Nested Loop (cost=12.32..63.27 rows=40 width=134) (actual
time=0.046..0.143 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id, t7.id, t8.bravo_id
-> Nested Loop (cost=12.17..52.62 rows=42 width=130) (actual
time=0.041..0.137 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id, t7.id
Inner Unique: true
-> Nested Loop (cost=11.90..39.10 rows=42 width=126)
(actual time=0.034..0.130 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id
Join Filter: (test_filter_regression_alpha.id =
t6.alpha_id)
-> Nested Loop (cost=11.74..33.86 rows=11 width=126)
(actual time=0.030..0.126 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name,
test_filter_regression_bravo_alphas.alpha_id
Inner Unique: true
-> Nested Loop (cost=11.47..30.32 rows=11
width=4) (actual time=0.026..0.121 rows=1 loops=1)
Output:
test_filter_regression_bravo_alphas.alpha_id
-> Hash Join (cost=11.32..28.77 rows=3
width=8) (actual time=0.018..0.112 rows=2 loops=1)
Output:
test_filter_regression_bravo.id,
test_filter_regression_charliebravo.bravo_id
Hash Cond:
(test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Seq Scan on
public.test_filter_regression_bravo (cost=0.00..15.40 rows=540 width=4)
(actual time=0.003..0.051 rows=900 loops=1)
Output:
test_filter_regression_bravo.id, test_filter_regression_bravo.name
-> Hash (cost=11.28..11.28 rows=3
width=4) (actual time=0.006..0.006 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.bravo_id
Buckets: 1024 Batches: 1
Memory Usage: 9kB
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.004..0.004 rows=0 loops=2)
Output:
test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond:
(test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using
test_filter_regression_alpha_pkey on public.test_filter_regression_alpha
(cost=0.28..0.32 rows=1 width=122) (actual time=0.004..0.004 rows=1
loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond:
(test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
-> Index Scan using
test_filter_regression_bravo_alphas_alpha_id_b2c31687 on
public.test_filter_regression_bravo_alphas t6 (cost=0.15..0.35 rows=10
width=8) (actual time=0.004..0.004 rows=1 loops=1)
Output: t6.id, t6.bravo_id, t6.alpha_id
Index Cond: (t6.alpha_id =
test_filter_regression_bravo_alphas.alpha_id)
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo t7 (cost=0.28..0.32 rows=1
width=4) (actual time=0.007..0.007 rows=1 loops=1)
Output: t7.id
Index Cond: (t7.id = t6.bravo_id)
Heap Fetches: 1
-> Index Only Scan using
test_filter_regression_charliebravo_bravo_id_1b159238 on
public.test_filter_regression_charliebravo t8 (cost=0.15..0.22 rows=3
width=4) (actual time=0.005..0.005 rows=1 loops=1)
Output: t8.bravo_id
Index Cond: (t8.bravo_id = t6.bravo_id)
Heap Fetches: 1
-> Hash (cost=65.05..65.05 rows=200 width=4) (actual time=0.020..0.021
rows=2 loops=1)
Output: u1.bravo_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=63.05..65.05 rows=200 width=4) (actual
time=0.017..0.017 rows=2 loops=1)
Output: u1.bravo_id
Group Key: u1.bravo_id
Batches: 1 Memory Usage: 40kB
-> Hash Join (cost=22.15..57.95 rows=2040 width=4) (actual
time=0.015..0.016 rows=2 loops=1)
Output: u1.bravo_id
Inner Unique: true
Hash Cond: (u1.delta_id = u0.id)
-> Seq Scan on
public.test_filter_regression_delta_bravos u1 (cost=0.00..30.40 rows=2040
width=8) (actual time=0.001..0.001 rows=2 loops=1)
Output: u1.id, u1.delta_id, u1.bravo_id
-> Hash (cost=15.40..15.40 rows=540 width=4) (actual
time=0.004..0.004 rows=2 loops=1)
Output: u0.id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on
public.test_filter_regression_delta u0 (cost=0.00..15.40 rows=540
width=4) (actual time=0.001..0.001 rows=2 loops=1)
Output: u0.id
Planning Time: 2.462 ms
Execution Time: 0.238 ms
}}}

For `main` with optimized queryset:
{{{
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.012
rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
-> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual
time=0.009..0.011 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.alpha_id
-> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual
time=0.008..0.010 rows=1 loops=1)
Output: test_filter_regression_bravo.id, u1.bravo_id,
test_filter_regression_charliebravo.bravo_id
Inner Unique: true
Join Filter: (test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8)
(actual time=0.006..0.008 rows=1 loops=1)
Output: test_filter_regression_charliebravo.bravo_id,
u1.bravo_id
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output: test_filter_regression_charliebravo.id,
test_filter_regression_charliebravo.name,
test_filter_regression_charliebravo.charlie_id,
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Nested Loop (cost=3.05..13.34 rows=540 width=4)
(actual time=0.002..0.002 rows=0 loops=2)
Output: u1.bravo_id
Inner Unique: true
-> Bitmap Heap Scan on
public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10
width=8) (actual time=0.001..0.001 rows=0 loops=2)
Output: u1.id, u1.delta_id, u1.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89
rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
Index Cond: (u1.bravo_id =
test_filter_regression_charliebravo.bravo_id)
-> Index Only Scan using
test_filter_regression_delta_pkey on public.test_filter_regression_delta
u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
loops=1)
Output: u0.id
Index Cond: (u0.id = u1.delta_id)
Heap Fetches: 1
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4)
(actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_bravo.id
Index Cond: (test_filter_regression_bravo.id =
u1.bravo_id)
Heap Fetches: 1
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using test_filter_regression_alpha_pkey on
public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122)
(actual time=0.001..0.001 rows=1 loops=1)
Output: test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond: (test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.381 ms
Execution Time: 0.027 ms
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:3>

Django

unread,
Aug 18, 2025, 12:38:34 PMAug 18
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Márton Salomváry):

Hi Natalia,

> In general, Django does not make guarantees about the exact SQL that is
generated. This means SQL can change from version to version, and we would
not consider that a bug as long as the semantics of the query remain the
same (which, based on my testing, is the case here).

I think the semantics did change, maybe the example I provided is not
obvious enough to showcase that. Will provide some more cases where we had
to change code.

> That said, we do care about potential performance impacts and/or
regressions.

In our case the performance impact was extreme. We got from queries that
take a few dozens of milliseconds to ones that practically never complete
and had to kill queries manually in order to recover the PostgreSQL
database server.

> I do wonder, however, whether the way the queries are currently written
might be more complex than necessary, which could contribute to these
differences in execution plans.

They might indeed be written more complex than necessary, but they used to
generate SQL we had no performance or other issues with.

I understand you are suggesting rewriting the queries before upgrading
Django from 4.x to 5.0 but given the size and complexity of our code base,
identifying the queries in a 100% reliable way is nearly impossible,
meaning that we have to keep fighting runaway queries and rewriting until
we no longer have production failures.


> For example, this alternative queryset is, IMHO, equivalent and much
more straightfoward:
>
>
> {{{
> Alpha.objects.filter(
> Q(bravos__charlie_bravos__charlie=9999)
> & Exists(
> Delta.objects.filter(
> bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
> )
> )
>)
> }}}

Sure it might be more straightforward, but repeating `filter()` calls is
often handy if some is only applied conditionally, like this:

{{{
qs = qs.filter(some_unconditional_query...)

if some_condition:
qs = qs.filter(some_conditional_query...)
}}}

I also wonder why repeating `filter()` and combining expressions like
`Q(...) & Q()` do not result in equivalent queries. Ie. shouldn't these
three always be the same? If not, what is the difference in the semantics?

{{{
qs1 = qs.filter(foo=bar).filter(baz=qux)

qs2 = qs.filter(Q(foo=bar) & Q(baz=qux))

qs3 = qs.filter(foo=bar, baz=qux)
}}}

> If you encounter a case where a query written in a way that aligns with
Django's ORM patterns (as shown above) returns incorrect results or
performs poorly, please feel free to reopen it with details.

I would appreciate feedback on how the original query does **not** align
with Django's ORM patterns. The documentation does not discourage repeat-
calls to `filter()` in order to narrow down results.

In any case, I will try to provide more input on how this breaks the
**results** of our queries.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:4>

Django

unread,
Aug 18, 2025, 1:28:58 PMAug 18
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

Replying to [comment:4 Márton Salomváry]:

> They might indeed be written more complex than necessary, but they used
to generate SQL we had no performance or other issues with.
>
> I understand you are suggesting rewriting the queries before upgrading
Django from 4.x to 5.0 but given the size and complexity of our code base,
identifying the queries in a 100% reliable way is nearly impossible,
meaning that we have to keep fighting runaway queries and rewriting until
we no longer have production failures.

The thing is that the commit you reference fixes a bug. Using your
examples:

{{{
Alpha <--M2M--> Bravo <--FK-- CharlieBravo --FK--> Charlie
Delta <--M2M--> Bravo
}}}

* Django `4.2x`: `Alpha --> Bravo --> CharlieBravo` (single INNER JOIN)

* Since `5.0.x`:
* `Alpha --> Bravo --> CharlieBravo` (INNER JOIN for first filter)
* `Alpha --> T6 Bravo --> T7 CharlieBravo` (LEFT OUTER JOIN path for
second filter)

> [...] repeating `filter()` calls is often handy if some is only applied
conditionally, like this:
>
> {{{
> qs = qs.filter(some_unconditional_query...)
>
> if some_condition:
> qs = qs.filter(some_conditional_query...)
> }}}

As a side note, this example of chaining multiple `.filter()` calls is
common and supported, but if you need to ensure joins are combined as
efficiently as possible, the equivalent pattern is to build up a list of
conditions and combine them with `&` into a single `filter`, for example:

{{{#!python
filters = [Q(some_unconditional_query)]
if some_condition:
filters.append(Q(some_conditional_query))
result = qs.filter(reduce(operator.and_, filters))
}}}

This is functionally equivalent to chaining, but ensures the ORM generates
queries consistent with the documented behavior for multi-valued
relationships.

> I also wonder why repeating `filter()` and combining expressions like
`Q(...) & Q()` do not result in equivalent queries. Ie. shouldn't these
three always be the same? If not, what is the difference in the semantics?

This is documented in this
[https://docs.djangoproject.com/en/5.2/topics/db/queries/#spanning-multi-
valued-relationships section].

In fact, prior to the change in Django 5.0, usage of expressions passed to
`.filter()` sometimes contradicted the documentation around multi-valued
relationships. The docs have long stated that when spanning multi-valued
relationships, repeated filters can lead to additional joins, and that
conditions should be combined in a single `.filter()` if you want to avoid
that. The recent change aligns behavior with that documented expectation,
even if it changes some queries that happened to work differently before.

> I would appreciate feedback on how the original query does **not** align
with Django's ORM patterns. The documentation does not discourage repeat-
calls to `filter()` in order to narrow down results.

Chaining `.filter()` is a normal pattern, but when multi-valued
relationships are involved the docs note that it can change semantics and
have query/performance implications. If `JOIN` reuse is necessary due to
performance requirements, combining conditions in a single call is the ORM
pattern to follow.

> In any case, I will try to provide more input on how this breaks the
**results** of our queries.

That would certainly help, thank you!
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:5>

Django

unread,
Aug 18, 2025, 1:52:50 PMAug 18
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Márton Salomváry):

Replying to [comment:5 Natalia Bidart]:
> The thing is that the commit you reference fixes a bug.

I understand that. What I can't decide is whether what I am seeing is
another bug introduced by that bugfix, or I have been successfully relying
on the existence of bug for quite long :)

>
> This is documented in this
[https://docs.djangoproject.com/en/5.2/topics/db/queries/#spanning-multi-
valued-relationships section].
>
> > I would appreciate feedback on how the original query does **not**
align with Django's ORM patterns. The documentation does not discourage
repeat-calls to `filter()` in order to narrow down results.
>
> Chaining `.filter()` is a normal pattern, but when multi-valued
relationships are involved the docs note that it can change semantics and
have query/performance implications. If `JOIN` reuse is necessary due to
performance requirements, combining conditions in a single call is the ORM
pattern to follow.

Oh thanks, now I remember having been bitten by this in the past, but not
often and painfully enough to remember the peculiarities of joins with
multi-valued relationships!

>
> > In any case, I will try to provide more input on how this breaks the
**results** of our queries.
>
> That would certainly help, thank you!

In this example, count is wrongly `2` in Django 5.0 and correctly `1` in
Django 4.x:

https://github.com/salomvary/django-5-filter-
regression/commit/f3cabd4e8987210e9aa17c99bfa622cfc7c580d8

{{{
alpha = Alpha.objects.create(id=7777)
bravo = alpha.bravos.create()
charlie1 = Charlie.objects.create(id=9999)
charlie2 = Charlie.objects.create(id=5555)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie1)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie2)
bravo.deltas.create()

queryset = (
Alpha.objects.filter(
bravos__charlie_bravos__charlie=9999,
)
.filter(
Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
)
)
)
.annotate(count=Count(
"bravos__charlie_bravos__bravo"
))
.values_list("pk", "count")
)

self.assertEqual(
[(7777, 1)],
list(queryset)
)
}}}

Now I wonder how to spot these in an existing code base, and prevent
shooting myself in the foot in the future. I guess I can monkey-patch
QuerySets to log a warning if repeated `filter()` is used with multi-
valued relationships and force using a single `filter()` and combining
conditions with `&` or `|`.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:6>

Django

unread,
Aug 24, 2025, 8:37:25 PMAug 24
to django-...@googlegroups.com
#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> What I can't decide is whether what I am seeing is another bug
introduced by that bugfix, or I have been successfully relying on the
existence of bug for quite long :)

I can confirm that when support for conditional expressions passing to
`filter` was added in #25367 the documented ''chaining'' property to avoid
join reuse was missed (look at the tests added in
4137fc2efce2dde48340728b8006fc6d66b9e3a5). This makes me believe the bug
has been present since 3.0 which is when the feature was introduced.

> In this example, count is wrongly 2 in Django 5.0 and correctly 1 in
Django 4.x:

This is ''expected'' as in Django 5.0 the multi-valued relationship is
joined twice which triggers #10060.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:7>
Reply all
Reply to author
Forward
0 new messages