Example query:
{{{
Book.objects.all().annotate(
pub_year=TruncYear('pubdate')
).order_by().values('pub_year').annotate(
total_pages=Sum('pages'),
top_rating=Subquery(
Book.objects.filter(
pubdate__year=OuterRef('pub_year')
).order_by('rating').values('rating')[:1]
)
).values('pub_year', 'total_pages', 'top_rating')
}}}
Generated SQL on 3.0.6:
{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate") AS
"pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate")
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate"),
"top_rating"
}}}
Generated SQL on current master:
{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
),
"aggregation_regress_book"."pubdate"
}}}
----
I see two separate issues here:
- {{{"aggregation_regress_book"."pubdate"}}} is being added to the group
by clause incorrectly (this issue probably predates the patch mentioned
above)
- Even though the subquery is in the select statement, the alias is not
being used and instead the subquery is reevaluated. This nearly doubles
the cost of one of my queries that is experiencing this problem.
----
I don't know much about the ORM internals, but here is my naive patch for
the second issue:
{{{
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index ee98984826..6ea287d6cb 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -2220,7 +2220,7 @@ class Query(BaseExpression):
# the selected fields anymore.
group_by = []
for expr in self.group_by:
- if isinstance(expr, Ref) and expr.refs not in
field_names:
+ if isinstance(expr, Ref) and expr.refs not in field_names
+ annotation_names:
expr = self.annotations[expr.refs]
group_by.append(expr)
self.group_by = tuple(group_by)
}}}
I'd appreciate anyone with deeper knowlege of the ORM to chime in and let
me know if I'm on the right track. Tests are passing locally.
The resulting query on master:
{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
"top_rating"
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by Christian Klus):
Here's the test case I used (not sure if it's in the right location). It
succeeds on 3.0.6 and fails on subsequent releases.
{{{
diff --git a/tests/aggregation_regress/tests.py
b/tests/aggregation_regress/tests.py
index 7604335257..dac995e1fc 100644
--- a/tests/aggregation_regress/tests.py
+++ b/tests/aggregation_regress/tests.py
@@ -8,9 +8,10 @@ from django.contrib.contenttypes.models import
ContentType
from django.core.exceptions import FieldError
from django.db import connection
from django.db.models import (
- Aggregate, Avg, Case, Count, DecimalField, F, IntegerField, Max, Q,
StdDev,
- Sum, Value, Variance, When,
+ Aggregate, Avg, Case, Count, DecimalField, F, IntegerField, Max,
OuterRef,
+ Q, StdDev, Subquery, Sum, Value, Variance, When
)
+from django.db.models.functions import TruncYear
from django.test import TestCase, skipUnlessAnyDBFeature,
skipUnlessDBFeature
from django.test.utils import Approximate
@@ -102,6 +103,21 @@ class AggregationTests(TestCase):
s2.books.add(cls.b1, cls.b3, cls.b5, cls.b6)
s3.books.add(cls.b3, cls.b4, cls.b6)
+ def test_groupby_after_aggregation_and_subquery(self):
+ self.assertEqual(
+ Book.objects.all().annotate(
+ pub_year=TruncYear('pubdate')
+ ).order_by().values('pub_year').annotate(
+ total_pages=Sum('pages'),
+ top_rating=Subquery(
+ Book.objects.filter(
+ pubdate__year=OuterRef('pub_year')
+ ).order_by('rating').values('rating')[:1]
+ )
+ ).values('pub_year', 'total_pages', 'top_rating').count(),
+ 4
+ )
+
def assertObjectAttrs(self, obj, **kwargs):
for attr, value in kwargs.items():
self.assertEqual(getattr(obj, attr), value)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:1>
Old description:
New description:
Starting in django 3.0.7, specifically after patch #31566 I noticed some
of my more complex queries returning incorrect results. I think I've
narrowed it down to a simpler test case:
Example query:
{{{#!python
Book.objects.all().annotate(
pub_year=TruncYear('pubdate')
).order_by().values('pub_year').annotate(
total_pages=Sum('pages'),
top_rating=Subquery(
Book.objects.filter(
pubdate__year=OuterRef('pub_year')
).order_by('rating').values('rating')[:1]
)
).values('pub_year', 'total_pages', 'top_rating')
}}}
Generated SQL on 3.0.6:
{{{#!sql
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate") AS
"pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate")
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate"),
"top_rating"
}}}
Generated SQL on current master:
{{{#!sql
----
----
{{{#!sql
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
"top_rating"
}}}
--
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:2>
* stage: Unreviewed => Accepted
Comment:
Thanks for the report and test Christian, I think you analysis is right;
all members of the `SELECT` clause should be considered.
Could you submit a PR on Github including your test and the following
changes
{{{#!diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index ee98984826..96819803c0 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -2205,8 +2205,10 @@ def set_values(self, fields):
field_names.append(f)
self.set_extra_mask(extra_names)
self.set_annotation_mask(annotation_names)
+ selected = frozenset(field_names + extra_names +
annotation_names)
else:
field_names = [f.attname for f in
self.model._meta.concrete_fields]
+ selected = frozenset(field_names)
# Selected annotations must be known before setting the GROUP BY
# clause.
if self.group_by is True:
@@ -2220,7 +2222,7 @@ def set_values(self, fields):
# the selected fields anymore.
group_by = []
for expr in self.group_by:
- if isinstance(expr, Ref) and expr.refs not in
field_names:
+ if isinstance(expr, Ref) and expr.refs not in selected:
expr = self.annotations[expr.refs]
group_by.append(expr)
self.group_by = tuple(group_by)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:3>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:5>
Comment (by Simon Charette):
Patch could likely be simplified a bit but it captures the essence of the
issue. Thanks a lot for the report and investigation Christian.
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:6>
* owner: nobody => Christian Klus
* status: new => assigned
* version: 3.1 => 3.0
* severity: Normal => Release blocker
Comment:
Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:7>
* needs_better_patch: 0 => 1
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:8>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"4ac2d4fa42e1659f328c35b6b8d4761b3419c11a" 4ac2d4fa]:
{{{
#!CommitTicketReference repository=""
revision="4ac2d4fa42e1659f328c35b6b8d4761b3419c11a"
Fixed #32152 -- Fixed grouping by subquery aliases.
Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.
Thanks Simon Charette for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:9>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"ab951d242e3de232c7fc9a2f6facfd6aa95f5fe2" ab951d24]:
{{{
#!CommitTicketReference repository=""
revision="ab951d242e3de232c7fc9a2f6facfd6aa95f5fe2"
[3.1.x] Fixed #32152 -- Fixed grouping by subquery aliases.
Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.
Thanks Simon Charette for the review.
Backport of 4ac2d4fa42e1659f328c35b6b8d4761b3419c11a from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:10>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"b0a6798de59151163b94d1c0ccf0d010a3d46ac2" b0a6798d]:
{{{
#!CommitTicketReference repository=""
revision="b0a6798de59151163b94d1c0ccf0d010a3d46ac2"
[3.0.x] Fixed #32152 -- Fixed grouping by subquery aliases.
Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.
Thanks Simon Charette for the review.
Backport of 4ac2d4fa42e1659f328c35b6b8d4761b3419c11a from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:11>