[Django] #25789: Inefficient Queries Generated due to not using WHERE EXISTS

43 views
Skip to first unread message

Django

unread,
Nov 20, 2015, 4:53:23 PM11/20/15
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
----------------------------------------------+--------------------
Reporter: cancan101 | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 1.8
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
[http://stackoverflow.com/questions/33764737/django-equivalent-of-
sqlalchemy-any-to-filter-where-exists/33765323 Reposting question from SO]
with some more details.

I believe that the Django ORM is generating seriously inefficient SQL due
to it not using `WHERE EXISTS` but instead using a `DISTINCT` with a `LEFT
JOIN`. by comparison, SQLAlchemy will use `WHERE EXISTS`.

I have two models, `Exam` and `Series`. `Series` objects have a foreign
key to an `Exam` object. Both of the models contain a field
`description_user`. I am trying to search for all `Exam`s that have a
search term in `description_user` or have a child `Series` with that term
in its `description_user`. I want to do this for a number of search terms
(requiring all of them). I also want to de-duplicate the results (ie not
get the same Exam multiple times).

This is roughly what the filter looks like:

{{{
a = (Q(**{'series__description_user__icontains': 'bar'}) |
Q(**{'description_user__icontains': 'bar'}))
b = (Q(**{'series__description_user__icontains': 'foo'}) |
Q(**{'description_user__icontains': 'foo'}))
c = (Q(**{'series__description_user__icontains': 'baz'}) |
Q(**{'description_user__icontains': 'baz'}))
Exam.objects.filter(a & b & c).distinct()
}}}

with corresponding SQL:

{{{
SELECT DISTINCT
"exam_storage_exam"."id",
"exam_storage_exam"."description_user"
FROM "exam_storage_exam"
LEFT OUTER JOIN "exam_storage_series"
ON (
"exam_storage_exam"."id" =
"exam_storage_series"."exam_id"
AND (
"exam_storage_series"."removed" IS NULL) )
WHERE (
"exam_storage_exam"."removed" IS NULL
AND (
"exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
OR
"exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\')
AND (
"exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
OR
"exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\')
AND (
"exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
OR
"exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\'))

}}}

The issue is that as the number of search terms grows, the size of the
intermediate data set before the DISTINCT operation grows as well.

Ideally the SQL would look like:
{{{
SELECT *
FROM exam
WHERE (EXISTS (SELECT 1
FROM exam_storage_series
WHERE exam.id = series.exam_id AND (
series.description_user LIKE '%foo%'
)) or exam.description_user LIKE '%foo%') AND
(EXISTS (SELECT 1
FROM exam_storage_series
WHERE exam.id = series.exam_id AND (
series.description_user LIKE '%bar%'
)) or exam.description_user LIKE '%bar%') AND
(EXISTS (SELECT 1
FROM exam_storage_series
WHERE exam.id = series.exam_id AND (
series.description_user LIKE '%baz%'
)) or exam.description_user LIKE '%baz%')
}}}

Currently the performance of Django query is terrible. This style
searching comes up for example in how [https://github.com/tomchristie
/django-rest-
framework/blob/43c45cc9391ec2bed9481a8b309990dec35b6ac8/rest_framework/filters.py#L132-L180
DRF generates search queries].

--
Ticket URL: <https://code.djangoproject.com/ticket/25789>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 23, 2015, 12:14:23 PM11/23/15
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* cc: akaariai (added)
* needs_better_patch: => 0
* type: Uncategorized => Cleanup/optimization
* needs_tests: => 0
* needs_docs: => 0


Comment:

Anssi, is this feasible? The closest existing ticket I could find is
#16603 but not sure if it should be considered a duplicate.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:1>

Django

unread,
Nov 23, 2015, 1:38:38 PM11/23/15
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

Yes, this is possible, and something Django's ORM should do.

Unfortunately this is hard to implement correctly. The big problem is
aggregation, when doing .filter().annotate(), where both operations target
the same relation, the aggregation must use results of the join generated
by the filter. But if the filter doesn't generate a join, then we have a
problem.

It would be Djangoic if the exists query would be generated automatically.
For the above mentioned reason this is hard. Maybe it would be easier if
we had something like .filter(models.Exists(somerel__col=val)), where the
Exists() class would inform Django that an exists subquery should be
generated.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:2>

Django

unread,
Nov 23, 2015, 2:02:38 PM11/23/15
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:3>

Django

unread,
Nov 24, 2015, 7:19:46 AM11/24/15
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by mjtamlyn):

* cc: mjtamlyn (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:4>

Django

unread,
Feb 10, 2016, 11:24:21 AM2/10/16
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* keywords: => QuerySet.extra


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:5>

Django

unread,
Apr 20, 2016, 3:08:14 AM4/20/16
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: cancan101 | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by schinckel):

It's not totally relevant (at least not yet), but I have a version of this
that works with .annotate():

https://github.com/django/django/pull/6478

There's also a django-developers thread that hopefully may get some
discussion.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:6>

Django

unread,
Aug 16, 2018, 9:29:55 AM8/16/18
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody

Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by James Howe):

Similarly, I have a use-case where this query structure has the only
acceptable performance (in PostgreSQL):
{{{#!sql
... FROM model_1
WHERE NOT EXISTS (
SELECT TRUE FROM model_2
WHERE model_1.pkey = model_2.fkey
AND (model_2.property IS NULL OR model_2.property >= 42)
)
}}}

Currently using `extra()` to do it.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:7>

Django

unread,
Aug 16, 2018, 9:31:34 AM8/16/18
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by James Howe):

* cc: James Howe (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:8>

Django

unread,
Aug 17, 2018, 1:01:49 AM8/17/18
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

James, since this ticket was filled support for `Exists` expression was
added so you should be able to avoid using `extra`

{{{#!python
queryset.annotate(
foo_exists=Exists(subqueryset.filter(foo_id=OuterRef('pk'))
).filter(
foo_exists=False,
)
}}}

Once #25367 lands it'll be possible to pass `~Exists` directly to filter.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:9>

Django

unread,
Aug 23, 2022, 12:05:47 PM8/23/22
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Another ''nice to have'' that could be added here would be allow the usage
of transforms on related fields so we could have lookups of the form
`__exists` or `__count`

{{{#!python
Author.objects.filter(books__exists=Q(title__icontains="ring"))
Author.objects.filter(books__count__gt=4)
}}}

instead of

{{{#!python
Author.objects.filter(Exist(Book.objects.filter(author=OuterRef("pk"),
title__icontains="ring")))
Author.objects.filter(
GreaterThan(
Book.objects.filter(
author_id=OutRef("pk"),
).values(
"author_id"
).values(Count("id")), 4
)
)
}}}

I think it would go a long way in making it easier for users to work
around this issue and #10060.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:10>

Django

unread,
Sep 21, 2022, 2:59:47 PM9/21/22
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by David Wobrock):

* cc: David Wobrock (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:11>

Django

unread,
Sep 21, 2022, 3:48:00 PM9/21/22
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

FWIW I had a very limited try at this in
[https://github.com/django/django/compare/main...charettes:django:exists-
m2m-lookup this branch]. Mostly some test and debugging shims but I
figured I'd provide them if someone is interesting in looking this issue.

I'd add that this work has a bit of overlap with #28296 as it basically
needs to turn a series of transforms and lookups against a related field
into a subquery without causing JOINs on the outer query which the default
behaviour of using the `__` syntax.

--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:12>

Django

unread,
Sep 23, 2022, 7:00:58 AM9/23/22
to django-...@googlegroups.com
#25789: Inefficient Queries Generated due to not using WHERE EXISTS
-------------------------------------+-------------------------------------
Reporter: Alex Rothberg | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by John Speno):

* cc: John Speno (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/25789#comment:13>

Reply all
Reply to author
Forward
0 new messages