#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):
* cc: Simon Charette (added)
* keywords: queryset filter regression => chained-filters
* resolution: => invalid
* status: new => closed
* type: Bug => Cleanup/optimization
Comment:
Hello Márton Salomváry, thank you for your report, as well as for the test
project and bisected revision number.
In general, Django does not make guarantees about the exact SQL that is
generated. This means SQL can change from version to version, and we would
not consider that a bug as long as the semantics of the query remain the
same (which, based on my testing, is the case here). The resulting
queryset still returns the correct values.
That said, we do care about potential performance impacts and/or
regressions. When triaging, I focused on how this change affects query
planning, and I can confirm that the query plan is different/worse (see
EXPLAIN ANALYZE results for Postgres comparing `4.2.x` vs `main`). I do
wonder, however, whether the way the queries are currently written might
be more complex than necessary, which could contribute to these
differences in execution plans. For example, this alternative queryset is,
IMHO, equivalent and much more straightfoward:
{{{#!python
Alpha.objects.filter(
Q(bravos__charlie_bravos__charlie=9999)
& Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
)
)
)
}}}
Because of the above, I'll close the ticket as `invalid`. If you encounter
a case where a query written in a way that aligns with Django's ORM
patterns (as shown above) returns incorrect results or performs poorly,
please feel free to reopen it with details. Thanks again!
=== Appendix ===
`EXPLAIN ANALYZE` for `stable/4.2.x`:
{{{
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.013
rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
-> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual
time=0.009..0.011 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.alpha_id
-> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual
time=0.008..0.010 rows=1 loops=1)
Output:
test_filter_regression_bravo.id, u1.bravo_id,
test_filter_regression_charliebravo.bravo_id
Inner Unique: true
Join Filter: (
test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8)
(actual time=0.007..0.008 rows=1 loops=1)
Output: test_filter_regression_charliebravo.bravo_id,
u1.bravo_id
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.id,
test_filter_regression_charliebravo.name,
test_filter_regression_charliebravo.charlie_id,
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Nested Loop (cost=3.05..13.34 rows=540 width=4)
(actual time=0.002..0.002 rows=0 loops=2)
Output: u1.bravo_id
Inner Unique: true
-> Bitmap Heap Scan on
public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10
width=8) (actual time=0.001..0.001 rows=0 loops=2)
Output:
u1.id, u1.delta_id, u1.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89
rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
Index Cond: (u1.bravo_id =
test_filter_regression_charliebravo.bravo_id)
-> Index Only Scan using
test_filter_regression_delta_pkey on public.test_filter_regression_delta
u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
loops=1)
Output:
u0.id
Index Cond: (
u0.id = u1.delta_id)
Heap Fetches: 1
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4)
(actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_bravo.id
Index Cond: (
test_filter_regression_bravo.id =
u1.bravo_id)
Heap Fetches: 1
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using test_filter_regression_alpha_pkey on
public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122)
(actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond: (
test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.382 ms
Execution Time: 0.027 ms
}}}
For `main`:
{{{
Hash Join (cost=79.87..131.14 rows=20 width=122) (actual
time=0.074..0.171 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
Hash Cond: (t6.bravo_id = u1.bravo_id)
-> Nested Loop (cost=12.32..63.27 rows=40 width=134) (actual
time=0.046..0.143 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id,
t7.id, t8.bravo_id
-> Nested Loop (cost=12.17..52.62 rows=42 width=130) (actual
time=0.041..0.137 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id,
t7.id
Inner Unique: true
-> Nested Loop (cost=11.90..39.10 rows=42 width=126)
(actual time=0.034..0.130 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name, t6.bravo_id
Join Filter: (
test_filter_regression_alpha.id =
t6.alpha_id)
-> Nested Loop (cost=11.74..33.86 rows=11 width=126)
(actual time=0.030..0.126 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name,
test_filter_regression_bravo_alphas.alpha_id
Inner Unique: true
-> Nested Loop (cost=11.47..30.32 rows=11
width=4) (actual time=0.026..0.121 rows=1 loops=1)
Output:
test_filter_regression_bravo_alphas.alpha_id
-> Hash Join (cost=11.32..28.77 rows=3
width=8) (actual time=0.018..0.112 rows=2 loops=1)
Output:
test_filter_regression_bravo.id,
test_filter_regression_charliebravo.bravo_id
Hash Cond:
(
test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Seq Scan on
public.test_filter_regression_bravo (cost=0.00..15.40 rows=540 width=4)
(actual time=0.003..0.051 rows=900 loops=1)
Output:
test_filter_regression_bravo.id,
test_filter_regression_bravo.name
-> Hash (cost=11.28..11.28 rows=3
width=4) (actual time=0.006..0.006 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.bravo_id
Buckets: 1024 Batches: 1
Memory Usage: 9kB
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.004..0.004 rows=0 loops=2)
Output:
test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond:
(test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using
test_filter_regression_alpha_pkey on public.test_filter_regression_alpha
(cost=0.28..0.32 rows=1 width=122) (actual time=0.004..0.004 rows=1
loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond:
(
test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
-> Index Scan using
test_filter_regression_bravo_alphas_alpha_id_b2c31687 on
public.test_filter_regression_bravo_alphas t6 (cost=0.15..0.35 rows=10
width=8) (actual time=0.004..0.004 rows=1 loops=1)
Output:
t6.id, t6.bravo_id, t6.alpha_id
Index Cond: (t6.alpha_id =
test_filter_regression_bravo_alphas.alpha_id)
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo t7 (cost=0.28..0.32 rows=1
width=4) (actual time=0.007..0.007 rows=1 loops=1)
Output:
t7.id
Index Cond: (
t7.id = t6.bravo_id)
Heap Fetches: 1
-> Index Only Scan using
test_filter_regression_charliebravo_bravo_id_1b159238 on
public.test_filter_regression_charliebravo t8 (cost=0.15..0.22 rows=3
width=4) (actual time=0.005..0.005 rows=1 loops=1)
Output: t8.bravo_id
Index Cond: (t8.bravo_id = t6.bravo_id)
Heap Fetches: 1
-> Hash (cost=65.05..65.05 rows=200 width=4) (actual time=0.020..0.021
rows=2 loops=1)
Output: u1.bravo_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=63.05..65.05 rows=200 width=4) (actual
time=0.017..0.017 rows=2 loops=1)
Output: u1.bravo_id
Group Key: u1.bravo_id
Batches: 1 Memory Usage: 40kB
-> Hash Join (cost=22.15..57.95 rows=2040 width=4) (actual
time=0.015..0.016 rows=2 loops=1)
Output: u1.bravo_id
Inner Unique: true
Hash Cond: (u1.delta_id =
u0.id)
-> Seq Scan on
public.test_filter_regression_delta_bravos u1 (cost=0.00..30.40 rows=2040
width=8) (actual time=0.001..0.001 rows=2 loops=1)
Output:
u1.id, u1.delta_id, u1.bravo_id
-> Hash (cost=15.40..15.40 rows=540 width=4) (actual
time=0.004..0.004 rows=2 loops=1)
Output:
u0.id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on
public.test_filter_regression_delta u0 (cost=0.00..15.40 rows=540
width=4) (actual time=0.001..0.001 rows=2 loops=1)
Output:
u0.id
Planning Time: 2.462 ms
Execution Time: 0.238 ms
}}}
For `main` with optimized queryset:
{{{
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.012
rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Inner Unique: true
-> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual
time=0.009..0.011 rows=1 loops=1)
Output: test_filter_regression_bravo_alphas.alpha_id
-> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual
time=0.008..0.010 rows=1 loops=1)
Output:
test_filter_regression_bravo.id, u1.bravo_id,
test_filter_regression_charliebravo.bravo_id
Inner Unique: true
Join Filter: (
test_filter_regression_bravo.id =
test_filter_regression_charliebravo.bravo_id)
-> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8)
(actual time=0.006..0.008 rows=1 loops=1)
Output: test_filter_regression_charliebravo.bravo_id,
u1.bravo_id
-> Bitmap Heap Scan on
public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3
width=4) (actual time=0.003..0.003 rows=2 loops=1)
Output:
test_filter_regression_charliebravo.id,
test_filter_regression_charliebravo.name,
test_filter_regression_charliebravo.charlie_id,
test_filter_regression_charliebravo.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17
rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
Index Cond:
(test_filter_regression_charliebravo.charlie_id = 9999)
-> Nested Loop (cost=3.05..13.34 rows=540 width=4)
(actual time=0.002..0.002 rows=0 loops=2)
Output: u1.bravo_id
Inner Unique: true
-> Bitmap Heap Scan on
public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10
width=8) (actual time=0.001..0.001 rows=0 loops=2)
Output:
u1.id, u1.delta_id, u1.bravo_id
Recheck Cond:
(test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
Heap Blocks: exact=1
-> Bitmap Index Scan on
test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89
rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
Index Cond: (u1.bravo_id =
test_filter_regression_charliebravo.bravo_id)
-> Index Only Scan using
test_filter_regression_delta_pkey on public.test_filter_regression_delta
u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
loops=1)
Output:
u0.id
Index Cond: (
u0.id = u1.delta_id)
Heap Fetches: 1
-> Index Only Scan using test_filter_regression_bravo_pkey
on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4)
(actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_bravo.id
Index Cond: (
test_filter_regression_bravo.id =
u1.bravo_id)
Heap Fetches: 1
-> Index Scan using
test_filter_regression_bravo_alphas_bravo_id_042061e7 on
public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10
width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_bravo_alphas.id,
test_filter_regression_bravo_alphas.bravo_id,
test_filter_regression_bravo_alphas.alpha_id
Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
test_filter_regression_bravo.id)
-> Index Scan using test_filter_regression_alpha_pkey on
public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122)
(actual time=0.001..0.001 rows=1 loops=1)
Output:
test_filter_regression_alpha.id,
test_filter_regression_alpha.name
Index Cond: (
test_filter_regression_alpha.id =
test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.381 ms
Execution Time: 0.027 ms
}}}
--
Ticket URL: <
https://code.djangoproject.com/ticket/36552#comment:3>