{{{
Window.objects.annotate(row=Window(expression=RowNumber())).filter(row__lt=1)
}}}
is not allowed. Instead, the window function expression should be wrapped
in an inner query, and the filtering should be done in an outer query.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
Old description:
> #26608 will introduce window function expressions, but will disallow
> filtering on the result of them, e.g.:
>
> {{{
> Window.objects.annotate(row=Window(expression=RowNumber())).filter(row__lt=1)
> }}}
>
> is not allowed. Instead, the window function expression should be wrapped
> in an inner query, and the filtering should be done in an outer query.
New description:
#26608 will introduce window function expressions, but will disallow
filtering on the result of them, e.g.:
{{{
Window.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)
}}}
is not allowed. Instead, the window function expression should be wrapped
in an inner query, and the filtering should be done in an outer query.
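As a rough illustration of the inner/outer query split described above, here is a minimal sketch using Python's standard `sqlite3` module rather than the ORM. The `item` table, its data, and the `rn` alias are made up for the example, and it assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Referencing the window function directly in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT id, name FROM item "
        "WHERE ROW_NUMBER() OVER (ORDER BY id) > 1"
    )
    direct_filter_allowed = True
except sqlite3.Error:
    direct_filter_allowed = False

# Computing the window function in an inner query and filtering in the
# outer query is accepted.
rows = conn.execute(
    "SELECT id, name, rn FROM ("
    "  SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM item"
    ") AS inner_query WHERE rn > 1 ORDER BY rn"
).fetchall()

print(direct_filter_allowed)
print(rows)
```

The outer `WHERE` only sees `rn` as an ordinary column of the derived table, which is why the second form is accepted.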
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:1>
Comment (by Bernd Wechner):
This is 2 years old with no action and I am very keen to see it
implemented (need it rather badly).
It strikes me as an aside that a more general approach may kill more birds
with one stone. I noticed the rather excellent ExpressionWrapper(), and it
struck me that a QueryWrapper() would be a more general solution that
covers this particular need and will cover others as well, known and
unknown at present.
In short, QueryWrapper would simply make an inner query of the QuerySet to
date so that subsequent operations act upon it as if it were a table.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:2>
* cc: Alexandr Artemyev (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:3>
* cc: Andy Terra (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:4>
* cc: Étienne Beaulé (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:5>
* keywords: window orm filter subquery => window orm filter subquery GSoC
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:6>
* owner: nobody => Manav Agarwal
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:7>
Comment (by Manav Agarwal):
I was looking throughout this issue and found that we had one such
QueryWrapper class in 2.1 version.
([https://docs.djangoproject.com/en/2.1/_modules/django/db/models/query_utils/]).
Does anyone have any idea why it is not present in the latest releases?
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:8>
Comment (by Manav Agarwal):
I was doing some research on this issue and found a few possible solutions
to the problem. (These are all vague ideas; any suggestions or feedback
that would make them worth implementing would be appreciated.)
1. A separate QueryWrapper class, which would have syntax like this:
a.
{{{Window.objects.annotate(row=QueryWrapper(Window(expression=RowNumber()))).filter(row__gt=1)}}}
OR
b. A QueryWrapper class used internally for window functions to
automatically generate an SQL subquery for all window expressions.
2. Use the Subquery class internally to execute all
window-expression-related queries as subqueries.
3. Pass an alias to the window expression and then, instead of generating
a partial query with just the OVER and ORDER BY clauses, generate a
separate SELECT statement which will then be used as a separate table in
the query.
I personally feel that implementing 1.a would be a good option, but as I
mentioned above this is just a vague idea, and to implement it I need some
guidance from someone who is more experienced.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:9>
* cc: Michael Wheeler (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:10>
* owner: Manav Agarwal => (none)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:11>
* cc: şuayip üzülmez (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:12>
* cc: John Speno (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:13>
* cc: Alex Scott (added)
Comment:
Is there a recommended workaround for not being able to filter on a window
function result? Can you wrap it somehow manually?
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:14>
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:15>
* cc: Ad Timmering (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:16>
Comment (by rossm6):
I have this problem as well.
According to this article
https://learnsql.com/blog/window-functions-not-allowed-in-where/ the
solution is either to use a common table expression or a subquery in the
FROM clause of the SQL query. Unfortunately, neither seems to be supported
by Django. Although I did find this
package for the first option -
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#extra.
Both of these should be options in the Django ORM right? Each would be a
big win for the power of the ORM.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:17>
* cc: Hannes Ljungberg (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:18>
* cc: Dave Johansen (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:19>
Comment (by SafaAlfulaij):
What I'm doing currently is this hack:
Before:
{{{
#!python
queryset = MyModel.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)
}}}
After:
{{{
#!python
queryset = MyModel.objects.annotate(row=Window(expression=RowNumber()))
sql, params = queryset.query.sql_with_params()
queryset = queryset.raw(f"SELECT * FROM ({sql}) AS full WHERE row >= 1",
params)
}}}
I don't see that it's bad to have this currently, with whatever
limitations of `raw` documented in the `Window` filtering section, then
add in more features if real use cases are provided.
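The wrapping step in this hack can be sketched independently of Django. The helper below is hypothetical (`wrap_window_filter` is not a Django API); it only shows how the inner SQL and its parameters might be combined with an outer predicate, and it uses `sqlite3` with a made-up `product` table so that it runs on its own.

```python
import sqlite3


def wrap_window_filter(inner_sql, inner_params, predicate_sql,
                       predicate_params=(), alias="subq"):
    """Hypothetical helper: select from inner_sql as a derived table.

    Placeholders are positional, so the inner query's parameters must come
    before the parameters of the outer predicate.
    """
    sql = f"SELECT * FROM ({inner_sql}) AS {alias} WHERE {predicate_sql}"
    return sql, tuple(inner_params) + tuple(predicate_params)


# Made-up data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("a", 5), ("b", 10), ("c", 20), ("d", 30)],
)

inner_sql = (
    "SELECT name, price, ROW_NUMBER() OVER (ORDER BY price) AS rn "
    "FROM product WHERE price > ?"
)
sql, params = wrap_window_filter(inner_sql, (5,), "rn > ?", (1,))
rows = conn.execute(sql, params).fetchall()
print(sql)
print(sorted(rows))
```

Passing the predicate value as a parameter rather than interpolating it into the f-string keeps the outer filter consistent with how the inner query's values are handled.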
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:20>
Comment (by Simon Charette):
#26780 which is about adding support for slices prefetching (think top-n
results per category) to core would benefit from this feature being
implemented at least partially.
The most difficult part of this issue is not the subquery pushdown itself
(see #24462) but making sure that union filters of the form
`filter(Q(window__lookup=foo) | Q(aggregate__lookup=bar) |
Q(field__lookup=baz))` are resulting in the proper usage of inner query
`WHERE` and `HAVING` and outer query usage of `WHERE` (see the
`Where.split_having`
[https://github.com/django/django/blob/d38324edc840049d42c3454b9487ac370aab5ee9/django/db/models/sql/where.py#L38-L79
method] for the current implementation).
If we were to start by focusing this ticket on the ''simple'' intersection
use cases of the form `filter(window__lookup=foo)` (as reported here and
required by #26780) I suspect we'd cover most of the use cases while
deferring most of the complexity. If someone would like to give this a
shot I'd start by doing the following:
1. Make `Window.filterable = True` for now
2. Adjust `Where.split_having` to properly deal with
`self.contains_over_clause` by returning a triple of the form `(where:
Where, having: Where, window: Where)` and error out when `self.connector
!= AND and self.contains_over_clause`. Possibly rename to
`split_having_window`?
3. Adjust `SQLCompiler.pre_sql_setup` to assign `self.over_where` and use
it in `SQLCompiler.as_sql` to wrap the query in a subquery that `SELECT *
FROM ({subquery_sql}) subquery WHERE {over_where_sql}`
4. Add tests for new supported use cases and disallowed ones.
5. Make `Q.filterable` return `False` when `self.connector != AND and
self.contains_over_clause` but that will result in weird error messages of
the form `Q is disallowed in the filter clause.` so maybe we'll want to
deprecate `Q.filterable` in favour of a `BaseExpression.check_filterable`
method instead that defaults to `raise` the current message and is
overridden in `Q` to raise a proper message with regards to complex
filters window functions.
Happy to review a PR that attempts the above or provide feedback here if
that means this ticket is partially fixed and allows for #26780 to benefit
from this work.
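To make step 2 more concrete, here is a toy model of the proposed three-way split. It is only a sketch under simplifying assumptions: `Leaf` and `Node` are stand-ins invented for this example and not Django's `Where` class, and the function returns plain lists instead of `Where` nodes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

AND, OR = "AND", "OR"


@dataclass
class Leaf:
    """A single predicate; the flags say what kind of expression it uses."""
    sql: str
    contains_aggregate: bool = False
    contains_over_clause: bool = False


@dataclass
class Node:
    """A group of predicates joined by one connector (AND or OR)."""
    connector: str
    children: List[Union["Node", Leaf]] = field(default_factory=list)

    @property
    def contains_aggregate(self) -> bool:
        return any(child.contains_aggregate for child in self.children)

    @property
    def contains_over_clause(self) -> bool:
        return any(child.contains_over_clause for child in self.children)


def split_having_window(node) -> Tuple[list, list, list]:
    """Return (where, having, window) lists; each list is implicitly AND-ed."""
    if isinstance(node, Node) and node.connector == AND:
        where, having, window = [], [], []
        for child in node.children:
            w, h, q = split_having_window(child)
            where += w
            having += h
            window += q
        return where, having, window
    if isinstance(node, Node) and node.contains_over_clause:
        # Mirrors the "error out" rule for non-AND groups with window functions.
        raise ValueError("Cannot split non-AND filters on window functions.")
    if node.contains_over_clause:
        return [], [], [node]
    if node.contains_aggregate:
        return [], [node], []
    return [node], [], []


tree = Node(AND, [
    Leaf('"department" = %s'),
    Leaf('COUNT("past_departments"."id") >= %s', contains_aggregate=True),
    Leaf('"dept_salary_rank" <= %s', contains_over_clause=True),
])
where, having, window = split_having_window(tree)
print([leaf.sql for leaf in where])
print([leaf.sql for leaf in having])
print([leaf.sql for leaf in window])
```

In this model the `where` and `having` lists would feed the inner query, while the `window` list would become the outer query's `WHERE` clause.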
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:21>
* cc: Simon Charette (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:22>
Comment (by Simon Charette):
Had a first stab at the above and it seems to be working relatively well,
[https://github.com/django/django/compare/main...charettes:django:ticket-28333-filter-by-window not too intrusive of a change].
I'll give a shot at implementing #26780 on top of it now to confirm it
could work.
As a side note it seems that the Snowflake database has an SQL extension
to filter against window functions, the
[https://docs.snowflake.com/en/sql-reference/constructs/qualify.html QUALIFY clause].
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:23>
Comment (by Simon Charette):
Submitted [https://github.com/django/django/pull/15922 a PR] that adds
support for jointed predicates but still disallows disjointed ones.
For example, given the following model and queryset
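{{{#!python
from django.db import models
from django.db.models import Count, Max, Window
from django.db.models.functions import Rank

class Employee(models.Model):
    name = models.CharField(max_length=50)
    department = models.CharField(max_length=50)
    salary = models.IntegerField()

class PastEmployeeDepartment(models.Model):
    # on_delete is required by ForeignKey; CASCADE is assumed here.
    employee = models.ForeignKey(
        Employee, on_delete=models.CASCADE, related_name="past_departments"
    )
    department = models.CharField(max_length=50)

queryset = Employee.objects.annotate(
    # Max() needs an expression; "salary" is implied by the annotation name.
    dept_max_salary=Window(Max("salary"), partition_by="department"),
    dept_salary_rank=Window(
        Rank(), partition_by="department", order_by="-salary"
    ),
    past_depths_cnt=Count("past_departments"),
)
}}}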
All of the following is supported
{{{#!python
# window predicate will be pushed to outer query
queryset.filter(dept_max_salary__gte=F("salary"))
# SELECT * FROM (...) "quantify"
# WHERE dept_max_salary >= "quantify"."salary"

# department predicate will be applied in inner query
queryset.filter(department="IT", dept_max_salary__gte=F("salary"))
# SELECT * FROM (... WHERE "department" = 'IT') "quantify"
# WHERE dept_max_salary >= "quantify"."salary"

# aggregate predicate will be applied in the inner query
queryset.filter(past_depths_cnt__gte=1, dept_max_salary__gte=F("salary"))
# SELECT * FROM (... HAVING COUNT("pastemployeedepartment"."id") >= 1) "quantify"
# WHERE dept_max_salary >= "quantify"."salary"
}}}
Some forms of disjointed predicates against window functions (using `OR`)
are also supported as long as they are ''only'' against window functions
{{{#!python
# Disjointed predicates that are only against window functions are supported
queryset.filter(
    Q(dept_max_salary__gte=F("salary")) | Q(dept_salary_rank__lte=2)
)
# SELECT * FROM (...) "quantify"
# WHERE "dept_max_salary" >= "quantify"."salary" OR "dept_salary_rank" <= 2
}}}
And limits are only applied on the outer query, once all window function
filters are applied.
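The effect of applying the limit on the outer query rather than the inner one can be seen with a small stand-alone experiment. This is a sketch using `sqlite3` with made-up employee data, not output from the patch; the table and column names merely echo the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "IT", 100), ("Bob", "IT", 90), ("Cid", "IT", 80),
        ("Dee", "HR", 70), ("Eve", "HR", 60), ("Fay", "HR", 50),
    ],
)

# Limit applied on the outer query, after the window filter.
outer_limit = conn.execute("""
    SELECT name, department, dept_salary_rank FROM (
        SELECT name, department,
               RANK() OVER (PARTITION BY department ORDER BY salary DESC)
                   AS dept_salary_rank
        FROM employee
    ) AS qualify
    WHERE dept_salary_rank <= 2
    ORDER BY department, dept_salary_rank
    LIMIT 3
""").fetchall()

# Limit applied on the inner query, before the window filter: rows are cut
# off first, so fewer rows survive the rank filter.
inner_limit = conn.execute("""
    SELECT name, department, dept_salary_rank FROM (
        SELECT name, department,
               RANK() OVER (PARTITION BY department ORDER BY salary DESC)
                   AS dept_salary_rank
        FROM employee
        ORDER BY department, salary DESC
        LIMIT 3
    ) AS qualify
    WHERE dept_salary_rank <= 2
    ORDER BY department, dept_salary_rank
""").fetchall()

print(outer_limit)
print(inner_limit)
```

With the limit on the outer query three rows are returned; with the limit on the inner query only two rows remain, because the third inner row is discarded by the rank filter.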
The following is not supported
1. Disjointed filters mixing predicates against window functions and
aggregates and/or column references, as this is really hard to emulate
without getting into multiple levels of subquery pushdown, particularly if
aggregation is involved.
2. Filtering against columns masked by the usage of `values`,
`values_list`, or `alias`. This one could be solved by adding another
layer of subquery pushdown that avoids applying the mask in the subquery
but does so in an outermost query over the one used for window filtering.
3. Passing window function instances directly to `filter` and `exclude`
instead of referencing annotated window functions.
Feedback about the proposed supported feature set and implementation is
very welcome.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:24>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:25>
* owner: (none) => Simon Charette
* needs_better_patch: 0 => 1
* status: new => assigned
* needs_tests: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:26>
Comment (by Simon Charette):
The latest version of the patch now supports filtering against annotations
masked by the usage of `values` and friends.
`queryset.filter(dept_max_salary__gte=2000).values("id")` now results in
{{{#!sql
SELECT "col1" FROM (
    SELECT * FROM (
        SELECT "id" AS "col1", MAX(...) OVER (...) AS "dept_max_salary"
        FROM ...
    ) "qualify" WHERE "dept_max_salary" >= 2000
) "qualify_mask"
}}}
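The two-level wrapping can be tried out by hand. The following is a sketch with `sqlite3` and invented data, filling in the elided parts of the SQL above with assumed values (`MAX(salary)` partitioned by `department`); it is not the SQL Django itself emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (department, salary) VALUES (?, ?)",
    [("IT", 3000), ("IT", 1500), ("HR", 1800), ("HR", 1000)],
)

cursor = conn.execute("""
    SELECT "col1" FROM (
        SELECT * FROM (
            SELECT "id" AS "col1",
                   MAX("salary") OVER (PARTITION BY "department")
                       AS "dept_max_salary"
            FROM "employee"
        ) "qualify" WHERE "dept_max_salary" >= 2000
    ) "qualify_mask"
    ORDER BY "col1"
""")
column_names = [description[0] for description in cursor.description]
masked_rows = cursor.fetchall()

print(column_names)
print(masked_rows)
```

The middle query filters on `dept_max_salary`, while the outermost query exposes only `col1`, so the window annotation never reaches the caller.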
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:27>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"35911078fa40eb35859832987fedada76963c01e" 35911078]:
{{{
#!CommitTicketReference repository=""
revision="35911078fa40eb35859832987fedada76963c01e"
Replaced Expression.replace_references() with .replace_expressions().
The latter allows for more generic use cases beyond the currently
limited ones constraints validation has.
Refs #28333, #30581.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:28>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"8c3046daade8d9b019928f96e53629b03060fe73" 8c3046da]:
{{{
#!CommitTicketReference repository=""
revision="8c3046daade8d9b019928f96e53629b03060fe73"
Refs #28333 -- Moved SQLCompiler's forced column aliasing logic to
get_select().
This extends query composability possibilities when dealing with
subqueries which is necessary to implement window function filtering.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:29>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"f387d024fc75569d2a4a338bfda76cc2f328f627" f387d024]:
{{{
#!CommitTicketReference repository=""
revision="f387d024fc75569d2a4a338bfda76cc2f328f627"
Refs #28333 -- Added partial support for filtering against window
functions.
Adds support for joint predicates against window annotations through
subquery wrapping while maintaining errors for disjointed filter
attempts.
The "qualify" wording was used to refer to predicates against window
annotations as it's the name of a specialized Snowflake extension to
SQL that is to window functions what HAVING is to aggregates.
While not complete the implementation should cover most of the common
use cases for filtering against window functions without requiring
the complex subquery pushdown and predicate re-aliasing machinery to
deal with disjointed predicates against columns, aggregates, and window
functions.
A complete disjointed filtering implementation should likely be
deferred until proper QUALIFY support lands or the ORM gains a proper
subquery pushdown interface.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:30>
* needs_better_patch: 1 => 0
* has_patch: 1 => 0
* needs_tests: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:31>
Comment (by GitHub <noreply@…>):
In [changeset:"f210de760b06cd57ff37b416e2bf9eafb0bfe929" f210de76]:
{{{
#!CommitTicketReference repository=""
revision="f210de760b06cd57ff37b416e2bf9eafb0bfe929"
Refs #28333 -- Fixed NonQueryWindowTests.test_invalid_filter() on
databases that don't support window expressions.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:32>
Comment (by GitHub <noreply@…>):
In [changeset:"3ba7f2e9069c54db6d6d9d2fd1945b2dbc935d9c" 3ba7f2e]:
{{{
#!CommitTicketReference repository=""
revision="3ba7f2e9069c54db6d6d9d2fd1945b2dbc935d9c"
Refs #28333 -- Explicitly ordered outer qualify query on window filtering.
While most backends will propagate derived table ordering as long as
the outer query doesn't perform additional processing the SQL specs
doesn't explicitly state the ordering must be maintained.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:33>
Comment (by Mariusz Felisiak):
Referencing outer window expressions in subqueries should also be
supported, see #34368.
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:34>
Comment (by master):
My code was working till 3.2, but with
https://code.djangoproject.com/changeset/8c3046daade8d9b019928f96e53629b03060fe73,
it doesn't anymore.
Here is a simplified (no filter(), less annotate(), fake Value(), ...)
demonstrator:
{{{
>>> from django.db.models import IntegerField, Value
>>> from postman.models import Message
>>> qs1 = Message.objects.values_list('id').order_by()
>>> print(qs1.query)  # correct
SELECT "postman_message"."id" FROM "postman_message"
>>> qs2 = Message.objects.values('thread').annotate(
...     id=Value(2, IntegerField())
... ).values_list('id').order_by()
>>> print(qs2.query)  # correct
SELECT 2 AS "id" FROM "postman_message"
>>> print(qs1.union(qs2).query)  # will cause my problem
SELECT "postman_message"."id" AS "col1" FROM "postman_message" UNION SELECT 2 AS "id" FROM "postman_message"
The noticeable point is the introduction of the alias `AS "col1"`, not
compatible with the `id` in the second part.
In the full code, the union is injected in another query, of the form
`SELECT ... FROM ... INNER JOIN (the union) PM ON (... = PM.id)`
So it leads to the error: `django.db.utils.OperationalError: no such
column: PM.id`
In db/models/sql/compiler.py, get_combinator_sql(), the call is hard-coded
as: `as_sql(with_col_aliases=True)`
I don't know how to solve this problem.
Any advice?
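The SQL-level symptom can be reproduced outside Django. The sketch below uses `sqlite3` with a made-up `message` table standing in for `postman_message`; it only shows why `PM.id` is not resolvable and does not propose a fix for the ORM behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, thread INTEGER)")
conn.executemany("INSERT INTO message (thread) VALUES (?)", [(1,), (1,)])

union_sql = (
    'SELECT "message"."id" AS "col1" FROM "message" '
    'UNION SELECT 2 AS "id" FROM "message"'
)

# The column name of a UNION comes from its first SELECT, so it is "col1".
cursor = conn.execute(union_sql)
union_columns = [description[0] for description in cursor.description]

# Referencing the second SELECT's alias from an enclosing query fails.
try:
    conn.execute(f"SELECT PM.id FROM ({union_sql}) PM")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Referencing the first SELECT's alias works.
rows = conn.execute(
    f"SELECT PM.col1 FROM ({union_sql}) PM ORDER BY PM.col1"
).fetchall()

print(union_columns)
print(error)
print(rows)
```

The derived table exposes only the alias of the first `SELECT`, which matches the `no such column: PM.id` error reported above.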
--
Ticket URL: <https://code.djangoproject.com/ticket/28333#comment:35>