[Django] #35396: QuerySet filters incorrectly pushed to the inner query when applied after a window function filter

11 views
Skip to first unread message

Django

unread,
Apr 22, 2024, 1:56:52 PM4/22/24
to django-...@googlegroups.com
#35396: QuerySet filters incorrectly pushed to the inner query when applied after a
window function filter
-------------------------------------+-------------------------------------
Reporter: Gary Chen | Owner: nobody
Type: Bug | Status: new
Component: Database | Version: 4.2
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I may be doing something funky here with my window function, but I'm
trying to get the first row in each partition of a window, then filter the
results by another column. I expect the last filter to be applied to the
outer query created by the window function filter, but it's "pushed" up to
the inner query leading to incorrect results. Filtering on window
functions was introduced in 4.2 and I wonder if this is a case that wasn't
caught.

== Example
A simple model:
{{{#!python
class Player(Model):
name = CharField()
city = CharField()
score = IntegerField()
active = BooleanField()
}}}

Some data:
||= id =||= name =||= city =||= score =||=active=||
||0||Cary||Phoenix||17||false||
||1||Joe||Phoenix||15||true||
||2||Katie||Phoenix||13||true||
||3||Bob||Springfield||12||true||
||4||Alice||Springfield||10||true||

The queryset:
{{{#!python
Player.objects.annotate(
first=Window(
expression=functions.FirstValue("id"),
partition_by=[F("city")],
order_by=("-score"),
),
).filter(id=F("first"), active=True)
}}}

The generated sql looks like this:
{{{#!sql
SELECT
*
FROM
(
SELECT
`myapp_player`.`id` AS `col1`,
`myapp_player`.`name` AS `col2`,
`myapp_player`.`city` AS `col3`,
`myapp_player`.`score` AS `col4`,
`myapp_player`.`active` AS `col5`,
FIRST_VALUE(`myapp_player`.`id`) OVER (
PARTITION BY `myapp_player`.`city`
ORDER BY
`myapp_player`.`score` DESC
) AS `first`
FROM
`myapp_player`
WHERE
`myapp_player`.`active` = True
) `qualify`
WHERE
`col1` = (`first`)
}}}

This would return this result:

||= id =||= name =||= city =||= score =||=active=||
||1||Joe||Phoenix||15||true||
||3||Bob||Springfield||12||true||

== Expected
I would expect the generated SQL from that queryset to look like this:

{{{#!sql
SELECT
*
FROM
(
SELECT
`myapp_player`.`id` AS `col1`,
`myapp_player`.`name` AS `col2`,
`myapp_player`.`city` AS `col3`,
`myapp_player`.`score` AS `col4`,
`myapp_player`.`active` AS `col5`,
FIRST_VALUE(`myapp_player`.`id`) OVER (
PARTITION BY `myapp_player`.`city`
ORDER BY
`myapp_player`.`score` DESC
) AS `first`
FROM
`myapp_player`
) `qualify`
WHERE
`col1` = (`first`) AND `col5` = True
}}}

With a result of:
||= id =||= name =||= city =||= score =||=active=||
||3||Bob||Springfield||12||true||
--
Ticket URL: <https://code.djangoproject.com/ticket/35396>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 22, 2024, 4:04:25 PM4/22/24
to django-...@googlegroups.com
#35396: QuerySet filters incorrectly pushed to the inner query when applied after a
window function filter
-------------------------------------+-------------------------------------
Reporter: Gary Chen | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)
* resolution: => wontfix
* status: new => closed

Comment:

Changing the behavior would trivial, adjust `Where.split_having_qualify`
to always return everything in `qualify_node` when it's present and
nothing in `where_node` or `having_node` but I don't think it's the right
thing to do.

The real problem here is the ambiguity of what should be done when users
filter against non-windowed and windowed expressions at the same time.

The current implementation defaults to respecting what fetching the result
set without the qualify outer query emulation would do to make sure non-
window referencing filters are applied against the set of rows windowered
over.

In other words if you do
{{{#!python
objects = list(Player.objects.annotate(
first=Window(
expression=functions.FirstValue("id"),
partition_by=[F("city")],
order_by=("-score"),
),
).filter(active=True))
}}}

And you iterate over `objects` you'll only get two matches of `id` and
`first`. The way it's implemented is coherent with how
[https://docs.snowflake.com/en/sql-reference/constructs/qualify QUALIFY is
implemented] in backends that support it.

What I suspect you want here is more control over subquery wrapping
instead (see #24462).

If you want to want to filter out highest score by city you should also
window by `active` and filter against it

{{{#!python
Player.objects.annotate(
first=Window(
expression=functions.FirstValue("id"),
partition_by=[F("city")],
order_by=("-score"),
),
first_active=Window(
expression=functions.FirstValue("active"),
partition_by=[F("city")],
order_by=("-score"),
),
).filter(id=F("first"), first_active=True)
}}}

{{{#!sql
SELECT
*
FROM
(
SELECT
`myapp_player`.`id` AS `col1`,
`myapp_player`.`name` AS `col2`,
`myapp_player`.`city` AS `col3`,
`myapp_player`.`score` AS `col4`,
`myapp_player`.`active` AS `col5`,
FIRST_VALUE(`myapp_player`.`id`) OVER (
PARTITION BY `myapp_player`.`city`
ORDER BY
`myapp_player`.`score` DESC
) AS `first`,
FIRST_VALUE(`myapp_player`.`active`) OVER (
PARTITION BY `myapp_player`.`city`
ORDER BY
`myapp_player`.`score` DESC
) AS `first_active`
FROM
`myapp_player`
) `qualify`
WHERE
`col1` = (`first`) AND `first_active` = True
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35396#comment:1>

Django

unread,
Apr 23, 2024, 9:46:26 AM4/23/24
to django-...@googlegroups.com
#35396: QuerySet filters incorrectly pushed to the inner query when applied after a
window function filter
-------------------------------------+-------------------------------------
Reporter: Gary Chen | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Gary Chen):

>>The real problem here is the ambiguity of what should be done when users
filter against non-windowed and windowed expressions at the same time.

I would think the ordering of the filter statements would at least make
this clear. The django docs call out that annotation/filter ordering
matters, so I'd expect filters before the window annotation to apply to
the inner query, and after to apply to the outer query.
--
Ticket URL: <https://code.djangoproject.com/ticket/35396#comment:2>
Reply all
Reply to author
Forward
0 new messages