[Django] #32152: Groupby after aggregation and subquery produces subtly wrong results

5 views
Skip to first unread message

Django

unread,
Oct 27, 2020, 1:30:06 PM10/27/20
to django-...@googlegroups.com
#32152: Groupby after aggregation and subquery produces subtly wrong results
-------------------------------------+-------------------------------------
Reporter: Christian | Owner: nobody
Klus |
Type: Bug | Status: new
Component: Database | Version: 3.1
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Starting in django 3.0.7, specifically after patch #31566 I noticed some
of my more complex queries returning incorrect results. I think I've
narrowed it down to a simpler test case:

Example query:
{{{
Book.objects.all().annotate(
pub_year=TruncYear('pubdate')
).order_by().values('pub_year').annotate(
total_pages=Sum('pages'),
top_rating=Subquery(
Book.objects.filter(
pubdate__year=OuterRef('pub_year')
).order_by('rating').values('rating')[:1]
)
).values('pub_year', 'total_pages', 'top_rating')
}}}

Generated SQL on 3.0.6:

{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate") AS
"pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate")
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate"),
"top_rating"
}}}

Generated SQL on current master:

{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
),
"aggregation_regress_book"."pubdate"
}}}

----

I see two separate issues here:
- {{{"aggregation_regress_book"."pubdate"}}} is being added to the group
by clause incorrectly (this issue probably predates the patch mentioned
above)
- Even though the subquery is in the select statement, the alias is not
being used and instead the subquery is reevaluated. This nearly doubles
the cost of one of my queries that is experiencing this problem.

----

I don't know much about the ORM internals, but here is my naive patch for
the second issue:

{{{
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index ee98984826..6ea287d6cb 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -2220,7 +2220,7 @@ class Query(BaseExpression):
# the selected fields anymore.
group_by = []
for expr in self.group_by:
- if isinstance(expr, Ref) and expr.refs not in
field_names:
+ if isinstance(expr, Ref) and expr.refs not in field_names
+ annotation_names:
expr = self.annotations[expr.refs]
group_by.append(expr)
self.group_by = tuple(group_by)
}}}

I'd appreciate anyone with deeper knowlege of the ORM to chime in and let
me know if I'm on the right track. Tests are passing locally.

The resulting query on master:

{{{
SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
"top_rating"
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 27, 2020, 1:33:50 PM10/27/20
to django-...@googlegroups.com
#32152: Groupby after aggregation and subquery produces subtly wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Christian Klus):

Here's the test case I used (not sure if it's in the right location). It
succeeds on 3.0.6 and fails on subsequent releases.

{{{
diff --git a/tests/aggregation_regress/tests.py
b/tests/aggregation_regress/tests.py
index 7604335257..dac995e1fc 100644
--- a/tests/aggregation_regress/tests.py
+++ b/tests/aggregation_regress/tests.py
@@ -8,9 +8,10 @@ from django.contrib.contenttypes.models import
ContentType
from django.core.exceptions import FieldError
from django.db import connection
from django.db.models import (
- Aggregate, Avg, Case, Count, DecimalField, F, IntegerField, Max, Q,
StdDev,
- Sum, Value, Variance, When,
+ Aggregate, Avg, Case, Count, DecimalField, F, IntegerField, Max,
OuterRef,
+ Q, StdDev, Subquery, Sum, Value, Variance, When
)
+from django.db.models.functions import TruncYear
from django.test import TestCase, skipUnlessAnyDBFeature,
skipUnlessDBFeature
from django.test.utils import Approximate

@@ -102,6 +103,21 @@ class AggregationTests(TestCase):
s2.books.add(cls.b1, cls.b3, cls.b5, cls.b6)
s3.books.add(cls.b3, cls.b4, cls.b6)

+ def test_groupby_after_aggregation_and_subquery(self):
+ self.assertEqual(
+ Book.objects.all().annotate(
+ pub_year=TruncYear('pubdate')
+ ).order_by().values('pub_year').annotate(
+ total_pages=Sum('pages'),
+ top_rating=Subquery(
+ Book.objects.filter(
+ pubdate__year=OuterRef('pub_year')
+ ).order_by('rating').values('rating')[:1]
+ )
+ ).values('pub_year', 'total_pages', 'top_rating').count(),
+ 4
+ )
+
def assertObjectAttrs(self, obj, **kwargs):
for attr, value in kwargs.items():
self.assertEqual(getattr(obj, attr), value)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:1>

Django

unread,
Oct 27, 2020, 1:40:52 PM10/27/20
to django-...@googlegroups.com
#32152: Groupby after aggregation and subquery produces subtly wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Christian Klus:

Old description:

New description:

Starting in django 3.0.7, specifically after patch #31566 I noticed some
of my more complex queries returning incorrect results. I think I've
narrowed it down to a simpler test case:

Example query:
{{{#!python


Book.objects.all().annotate(
pub_year=TruncYear('pubdate')
).order_by().values('pub_year').annotate(
total_pages=Sum('pages'),
top_rating=Subquery(
Book.objects.filter(
pubdate__year=OuterRef('pub_year')
).order_by('rating').values('rating')[:1]
)
).values('pub_year', 'total_pages', 'top_rating')
}}}

Generated SQL on 3.0.6:

{{{#!sql


SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate") AS
"pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate")
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate"),
"top_rating"
}}}

Generated SQL on current master:

{{{#!sql

----

----

{{{#!sql


SELECT
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL) AS "pub_year",
SUM("aggregation_regress_book"."pages") AS "total_pages",
(
SELECT U0."rating"
FROM "aggregation_regress_book" U0
WHERE
django_date_extract('year', U0."pubdate") =
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL)
ORDER BY U0."rating" ASC LIMIT 1
) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL,
NULL),
"top_rating"
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:2>

Django

unread,
Oct 27, 2020, 1:58:43 PM10/27/20
to django-...@googlegroups.com
#32152: Groupby after aggregation and subquery produces subtly wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted


Comment:

Thanks for the report and test Christian, I think you analysis is right;
all members of the `SELECT` clause should be considered.

Could you submit a PR on Github including your test and the following
changes

{{{#!diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index ee98984826..96819803c0 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -2205,8 +2205,10 @@ def set_values(self, fields):
field_names.append(f)
self.set_extra_mask(extra_names)
self.set_annotation_mask(annotation_names)
+ selected = frozenset(field_names + extra_names +
annotation_names)
else:
field_names = [f.attname for f in
self.model._meta.concrete_fields]
+ selected = frozenset(field_names)
# Selected annotations must be known before setting the GROUP BY
# clause.
if self.group_by is True:
@@ -2220,7 +2222,7 @@ def set_values(self, fields):


# the selected fields anymore.
group_by = []
for expr in self.group_by:
- if isinstance(expr, Ref) and expr.refs not in
field_names:

+ if isinstance(expr, Ref) and expr.refs not in selected:


expr = self.annotations[expr.refs]
group_by.append(expr)
self.group_by = tuple(group_by)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:3>

Django

unread,
Oct 27, 2020, 2:28:41 PM10/27/20
to django-...@googlegroups.com
#32152: Groupby after aggregation and subquery produces subtly wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Christian Klus):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:5>

Django

unread,
Oct 27, 2020, 3:35:33 PM10/27/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Patch could likely be simplified a bit but it captures the essence of the
issue. Thanks a lot for the report and investigation Christian.

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:6>

Django

unread,
Oct 27, 2020, 4:04:33 PM10/27/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------
Reporter: Christian Klus | Owner: Christian
| Klus
Type: Bug | Status: assigned
Component: Database layer | Version: 3.0
(models, ORM) |
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: nobody => Christian Klus
* status: new => assigned
* version: 3.1 => 3.0
* severity: Normal => Release blocker


Comment:

Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:7>

Django

unread,
Oct 29, 2020, 5:32:26 AM10/29/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------

Reporter: Christian Klus | Owner: Christian
| Klus
Type: Bug | Status: assigned
Component: Database layer | Version: 3.0
(models, ORM) |
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 0 => 1
* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:8>

Django

unread,
Oct 29, 2020, 6:29:52 AM10/29/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------

Reporter: Christian Klus | Owner: Christian
| Klus
Type: Bug | Status: closed

Component: Database layer | Version: 3.0
(models, ORM) |
Severity: Release blocker | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"4ac2d4fa42e1659f328c35b6b8d4761b3419c11a" 4ac2d4fa]:
{{{
#!CommitTicketReference repository=""
revision="4ac2d4fa42e1659f328c35b6b8d4761b3419c11a"
Fixed #32152 -- Fixed grouping by subquery aliases.

Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.

Thanks Simon Charette for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:9>

Django

unread,
Oct 29, 2020, 6:32:55 AM10/29/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------

Reporter: Christian Klus | Owner: Christian
| Klus
Type: Bug | Status: closed
Component: Database layer | Version: 3.0
(models, ORM) |
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"ab951d242e3de232c7fc9a2f6facfd6aa95f5fe2" ab951d24]:
{{{
#!CommitTicketReference repository=""
revision="ab951d242e3de232c7fc9a2f6facfd6aa95f5fe2"
[3.1.x] Fixed #32152 -- Fixed grouping by subquery aliases.

Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.

Thanks Simon Charette for the review.

Backport of 4ac2d4fa42e1659f328c35b6b8d4761b3419c11a from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:10>

Django

unread,
Oct 29, 2020, 6:40:11 AM10/29/20
to django-...@googlegroups.com
#32152: Aggregation over subquery annotation GROUP BY produces wrong results
-------------------------------------+-------------------------------------

Reporter: Christian Klus | Owner: Christian
| Klus
Type: Bug | Status: closed
Component: Database layer | Version: 3.0
(models, ORM) |
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"b0a6798de59151163b94d1c0ccf0d010a3d46ac2" b0a6798d]:
{{{
#!CommitTicketReference repository=""
revision="b0a6798de59151163b94d1c0ccf0d010a3d46ac2"
[3.0.x] Fixed #32152 -- Fixed grouping by subquery aliases.

Regression in 42c08ee46539ef44f8658ebb1cbefb408e0d03fe.

Thanks Simon Charette for the review.

Backport of 4ac2d4fa42e1659f328c35b6b8d4761b3419c11a from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32152#comment:11>

Reply all
Reply to author
Forward
0 new messages