[Django] #21604: Embed raw queries as subqueries when used with an __in filter

28 views
Skip to first unread message

Django

unread,
Dec 13, 2013, 1:26:45 AM12/13/13
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
----------------------------------------------+--------------------
Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
As per this discussion on django-developers:
https://groups.google.com/d/topic/django-developers/cWrcEHZaMTg/discussion

In brief: At the moment, when passing a `RawQuerySet` to an `__in` filter,
the raw query is evaluated and the result passed as a parameter to the
filter's SQL. It would be nice to have the option of using the raw SQL in
a subquery instead, for performance reasons.

I've had an initial shot at implementing this feature and would love any
feedback:
https://github.com/AlexHill/django/compare/master...raw_subqueries

I've taken the approach of wrapping the raw query in another `SELECT` when
it's used as a subquery, selecting the required columns (either the
primary key or others as specified in `only()` - see below) and adding an
`IS NOT NULL` clause.


== Responding to Anssi's feedback on django-developers about this
approach:

> 1. It should not be assumed that the primary key is the field needed in
the inner query. We do have foreign keys to other fields, and one can use
this approach without foreign keys at all.

We discussed making a `ValuesRawQuerySet` that could be used like this:
`Book.objects.filter(subject_code__in=subject_raw_query.values('code'))`

But that feels a bit too convoluted - as Anssi pointed out, it's a verbose
way of writing raw SQL and getting plain data back, which Django already
has.

What I think makes sense, and what I've implemented, is using `only()` for
this purpose. `QuerySet.only()` and `QuerySet.defer()` appear to silently
ignore requests to defer the PK: if you do `Book.objects.only('title')`,
both `title` and the primary key are fetched. What I've implement will
follow that convention when the RawQuery is executed directly, but leave
the PK out in subqueries when applicable, so that things like
`Author.objects.get(pk__in=raw_book_query.only('author_id'))` work as
expected.

Supporting `only()` requires a pretty big change to RawQuery, and I feel
like it might be all a bit too tricky - it's all contained in the most
recent commit so can be easily rolled back if necessary. Keen to know what
others think.

Using `only()` also doesn't work in SQLite currently, due to #21603.


> 2. If the query is of the form NOT IN, then we need to also filter out
null values, otherwise no results are returned if the raw query contains a
single null (NOT IN is weird...)

This is taken into account in the patch - subqueries are always filtered
for nulls. This will be a database no-op in the common case of selecting a
not-null primary key.

> 3. Does the wrapping of the subquery into (select field from (raw_query)
where field is not null) cause performance problems? It could be possible
that this prevents the DB's optimizer from working correctly.

Postgres does the right thing and as far as I can read SQLite's query
planner, that does too. Have not tried any other backends yet.

> 4. If the query doesn't select the required column (that is, it is
deferred), we don't have any way to know that. This seems like a
documentation issue, but worth mentioning.

The patch in its current state sidesteps this to some degree, in that
including the primary key is already a stated requirement of the queries
passed to RawQuerySet. However, at the moment you just get an error back
from the database backend, instead of the InvalidQuery Django raises when
evaluating a pk-less raw query. I think this is acceptable if documented.

--
Ticket URL: <https://code.djangoproject.com/ticket/21604>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 9, 2014, 7:00:34 AM2/9/14
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------

Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by mjtamlyn):

* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* needs_tests: => 0
* needs_docs: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:1>

Django

unread,
Aug 11, 2014, 8:52:56 AM8/11/14
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------

Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1

Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):

* needs_better_patch: 0 => 1


Comment:

Sorry for not responding to this ticket sooner.

I think the patch is a bit too invasive for this ticket. We only need
support for `somefield__in=rawqs`. It would be hopefully possible to just
alter the way how the SQL is generated when used in an `in` lookup. In
particular there shouldn't be any need

I wonder if it would be better to just add support for
`qs.filter(somefield__in=RawQuery(sql_str, params))` (and also do the same
for other lookup types if at all possible). This would be a lot easier to
support (actually, this will be likely extremely easy to do after #14030).
Of course, this doesn't allow one to use an existing raw query.

If this could be implemented with just wrapping the query in subquery when
used from `__in` lookup, then I think supporting `somefield__in=rawqs`
directly is OK. If not, then lets focus on adding support for
`somefield__in=RawQuery(sql_str, params)`. In other words, the current
patch seems too complicated for the added feature.

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:2>

Django

unread,
Aug 7, 2019, 10:37:42 AM8/7/19
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------

Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

This should be as easy as implementing `RawQuery.as_sql` to return
`(self.sql, self.params)` and setting `has_select_fields=True` as class
attribute.

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:3>

Django

unread,
Oct 31, 2022, 2:47:00 AM10/31/22
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------

Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Andreas Galazis):

Any news? This is a must if you want to optimize subqueries that cannot be
generated by orm

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:4>

Django

unread,
Oct 31, 2022, 9:49:59 AM10/31/22
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------

Reporter: alex@… | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Andreas, if this is something you'd like to see in Django
[https://docs.djangoproject.com/en/4.1/internals/contributing/writing-
code/ you could give a shot submitting a PR] based on the approach
described in comment:3

Something along these lines, with regression tests, should get you almost
all the way there

{{{#!diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index c2a71ff589..4845ae9e90 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -82,6 +82,8 @@ def get_children_from_q(q):
class RawQuery:
"""A single raw SQL query."""

+ has_select_fields = True
+
def __init__(self, sql, using, params=()):
self.params = params
self.sql = sql
@@ -151,6 +153,9 @@ def _execute_query(self):
self.cursor = connection.cursor()
self.cursor.execute(self.sql, params)

+ def as_sql(self, compiler, connection):
+ return self.sql, self.params
+

ExplainInfo = namedtuple("ExplainInfo", ("format", "options"))
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:5>

Django

unread,
Nov 25, 2022, 8:27:09 AM11/25/22
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by William):

* owner: nobody => William
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:6>

Django

unread,
Dec 7, 2023, 4:01:38 AM12/7/23
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aivars Kalvāns):

Hi!

Is this still a thing? I had a similar need recently and solved it by
using RawSQL:

{{{
from django.db.models.expressions import RawSQL

NamedCategory.objects.create(id=1, name="first")
NamedCategory.objects.create(id=2, name="second")
NamedCategory.objects.create(id=3, name="third")
NamedCategory.objects.create(id=4, name="fourth")

query = DumbCategory.objects.filter(
id__in=RawSQL("SELECT id FROM queries_dumbcategory WHERE id >= %s",
params=[3])
)

self.assertEqual(set(query.values_list("id", flat=True)), {3, 4})
print(query.query)
}}}

this works as expected and prints out

{{{
SELECT "queries_dumbcategory"."id" FROM "queries_dumbcategory" WHERE
"queries_dumbcategory"."id" IN (SELECT id FROM queries_dumbcategory WHERE
id >= 3)
}}}

which confirms that the raw subquery was embedded.
Is there something that would make it better or more useful by using
RawQuerySet?

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:7>

Django

unread,
Dec 7, 2023, 4:06:22 AM12/7/23
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aivars Kalvāns):

https://docs.djangoproject.com/en/4.2/ref/models/expressions/#raw-sql-
expressions

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:8>

Django

unread,
Dec 7, 2023, 8:46:47 AM12/7/23
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

The submitted report was about making `RawQuerySet` work but if `RawSQL`
works, and didn't exist at the time when this report was created 10 years
ago, I'd be inclined to close this one as ''wontfix'' at this point.

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:9>

Django

unread,
Dec 7, 2023, 11:03:24 AM12/7/23
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Aivars Kalvāns):

* cc: Aivars Kalvāns (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:10>

Django

unread,
May 5, 2025, 6:05:35 AMMay 5
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* needs_better_patch: 1 => 0

Comment:

Needs the release note moved to 6.0.txt but otherwise ready for a re-
review pending Simon's comment about whether to wontfix after all.
--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:11>

Django

unread,
May 20, 2025, 8:29:06 AMMay 20
to django-...@googlegroups.com
#21604: Embed raw queries as subqueries when used with an __in filter
-------------------------------------+-------------------------------------
Reporter: alex@… | Owner: William
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* needs_better_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/21604#comment:12>
Reply all
Reply to author
Forward
0 new messages