[Django] #25095: Fields annotated for filtering which are not included in values are included in GROUP BY clause in SQL

25 views
Skip to first unread message

Django

unread,
Jul 9, 2015, 2:40:47 PM7/9/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
----------------------------------------------+--------------------
Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.8
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
I am running a complex query, which annotates certain fields solely to
filter on them. They are not included in a call to values. Another
annotation is included, but it is incorrect due to the previous
annotations being included in the GROUP BY clause.

Here is a not so simple example:

{{{
# We are filtering on the journey date, using the datediff function
# We annotate the reported tds fields, since we need two aliases of them
for the different filterings applied

print EpTag.objects.filter(statuschanges__status=attached)\
.annotate(attach_date=F('statuschanges__reported_tds'))\
.filter(statuschanges__status=detached)\
.annotate(detach_date=F('statuschanges__reported_tds'))\
.annotate(journey_days=Func(F('detach_date'), F('attach_date'),
function='DATEDIFF', output_field=IntegerField()))\
.filter(journey_days__lt=10)\
.annotate(name=F('object__producer__name'))\
.values('object__producer', 'name')\
.annotate(count=Count('pk', distinct=True))\
.order_by().query

SELECT `ep_object`.`cid`, `ep_entity`.`name` AS `name`, COUNT(DISTINCT
`ep_tag`.`tid`) AS `count`
FROM `ep_tag`
INNER JOIN `ep_statuschange` ON ( `ep_tag`.`tid` = `ep_statuschange`.`tid`
)
INNER JOIN `ep_statuschange` T4 ON ( `ep_tag`.`tid` = T4.`tid` )
LEFT OUTER JOIN `ep_object` ON ( `ep_tag`.`oid` = `ep_object`.`oid` )
LEFT OUTER JOIN `ep_entity` ON ( `ep_object`.`cid` = `ep_entity`.`eid` )
WHERE (`ep_statuschange`.`statusid` = 201 AND T4.`statusid` = 1000 AND
DATEDIFF(T4.`reported_tds`, `ep_statuschange`.`reported_tds`) < 10)
# Group by still includes reported_tds and DATEDIFF fields, even though
they were not selected. This breaks my count
GROUP BY `ep_object`.`cid`, `ep_statuschange`.`reported_tds`,
T4.`reported_tds`, DATEDIFF(T4.`reported_tds`,
`ep_statuschange`.`reported_tds`), `ep_entity`.`name` ORDER BY NULL
}}}

In django/db/models/sql/query.py, line 1751 - all annotations are included
in the group by. I believe this should be using annotation_select instead
of annotations.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 10, 2015, 9:29:12 PM7/10/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jarshwah):

* needs_docs: => 0
* needs_better_patch: => 0
* version: 1.8 => master
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

I've replicated the problem:

{{{

class FTest(models.Model):
talk = models.IntegerField()
hold = models.IntegerField()

In [8]: from scratch.models import *; from django.db.models import F,
Count

In [9]:
print(FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold').annotate(thecount=Count('pk')).query)
SELECT "scratch_ftest"."hold", COUNT("scratch_ftest"."id") AS "thecount"
FROM "scratch_ftest" WHERE "scratch_ftest"."talk" <= 30 GROUP BY
"scratch_ftest"."hold", "scratch_ftest"."talk"

}}}

It seems like your proposed fix works for your case, but my test VM is
broken at the moment so I can't run the full test suite.

{{{
$ git diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index df65405..2f2375d 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -1674,8 +1674,8 @@ class Query(object):
for col in self.select:
self.group_by.append(col)

- if self._annotations:
- for alias, annotation in six.iteritems(self.annotations):
+ if self.annotation_select:
+ for alias, annotation in
six.iteritems(self.annotation_select):
for col in annotation.get_group_by_cols():
self.group_by.append(col)

}}}

With the patch applied:

{{{
In [2]:
print(FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold').annotate(thecount=Count('pk')).query)
SELECT "scratch_ftest"."hold", COUNT("scratch_ftest"."id") AS "thecount"
FROM "scratch_ftest" WHERE "scratch_ftest"."talk" <= 30 GROUP BY
"scratch_ftest"."hold"
}}}

To hopefully clear up the report somewhat, the issue is that all
annotations are added to the group by list, even if they aren't in the
select list. By applying the above patch, we only add the annotations in
the select list to the group by list.

I'd like to see a test that had an annotation in the .order_by() clause
that wasn't included in .values(), just to ensure we aren't removing too
much from the group by.

Thanks for the great report and the fix. If you'd like to open up a pull
request with the proposed change and some tests, that'd be very welcome.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:1>

Django

unread,
Jul 11, 2015, 10:46:32 AM7/11/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

We need to also test that:
{{{
FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold',
'talking').annotate(thecount=Count('pk')).values('hold')
}}}
does use the talking annotation in the group by.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:2>

Django

unread,
Jul 15, 2015, 5:31:58 PM7/15/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Old description:

New description:

I am running a complex query, which annotates certain fields solely to
filter on them. They are not included in a call to values. Another
annotation is included, but it is incorrect due to the previous
annotations being included in the GROUP BY clause.

Here is a not so simple example:

{{{
# We are filtering on the journey date, using the datediff function
# We annotate the reported tds fields, since we need two aliases of them
for the different filterings applied

print EpTag.objects.filter(statuschanges__status=attached)\
.annotate(attach_date=F('statuschanges__reported_tds'))\
.filter(statuschanges__status=detached)\
.annotate(detach_date=F('statuschanges__reported_tds'))\
.annotate(journey_days=Func(F('detach_date'), F('attach_date'),
function='DATEDIFF', output_field=IntegerField()))\
.filter(journey_days__lt=10)\
.annotate(name=F('object__producer__name'))\
.values('object__producer', 'name')\
.annotate(count=Count('pk', distinct=True))\
.order_by().query

SELECT `ep_object`.`cid`, `ep_entity`.`name` AS `name`, COUNT(nnn


`ep_tag`.`tid`) AS `count`
FROM `ep_tag`
INNER JOIN `ep_statuschange` ON ( `ep_tag`.`tid` = `ep_statuschange`.`tid`
)
INNER JOIN `ep_statuschange` T4 ON ( `ep_tag`.`tid` = T4.`tid` )
LEFT OUTER JOIN `ep_object` ON ( `ep_tag`.`oid` = `ep_object`.`oid` )
LEFT OUTER JOIN `ep_entity` ON ( `ep_object`.`cid` = `ep_entity`.`eid` )
WHERE (`ep_statuschange`.`statusid` = 201 AND T4.`statusid` = 1000 AND
DATEDIFF(T4.`reported_tds`, `ep_statuschange`.`reported_tds`) < 10)
# Group by still includes reported_tds and DATEDIFF fields, even though
they were not selected. This breaks my count
GROUP BY `ep_object`.`cid`, `ep_statuschange`.`reported_tds`,
T4.`reported_tds`, DATEDIFF(T4.`reported_tds`,
`ep_statuschange`.`reported_tds`), `ep_entity`.`name` ORDER BY NULL
}}}

In django/db/models/sql/query.py, line 1751 - all annotations are included
in the group by. I believe this should be using annotation_select instead
of annotations.

--

Comment (by mitchelljkotler):

I created a pull request, 5002.

It adds one test to test for this issue.

@akaariai, I am not sure I understand what you want to test with that test
case.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:3>

Django

unread,
Jul 15, 2015, 8:49:07 PM7/15/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

akaariai wants to ensure that adding the annotation to the `values()` call
still includes that annotation in the group by, even though the final
`values()` call does not include the annotation. This is important,
because we still need to add the annotation to the group by, since it was
included before the aggregate, even though it's excluded from the final
select list.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:4>

Django

unread,
Jul 22, 2015, 7:34:14 AM7/22/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* needs_better_patch: 0 => 1
* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:5>

Django

unread,
Jul 22, 2015, 10:07:52 AM7/22/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by mitchelljkotler):

* needs_better_patch: 1 => 0


Comment:

I have added the test that akaariai requested. I believe the patch is
ready to go now.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:6>

Django

unread,
Jul 22, 2015, 7:48:46 PM7/22/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

The second test you added doesn't actually test what akaariai wants
tested. I've made a comment on the new commit though, with a code example
of what type of query needs to be tested. Hope that's clearer.

Copying the test query below:

{{{
vals = list(
Book.objects.annotate(xprice=F('price'))
.filter(xprice__lte=30).values('publisher', 'xprice') #
xprice should now be in the group by clause
.annotate(count=Count('pk')).values('publisher', 'count') #
but xprice won't be in final select list returned
.order_by('publisher'))
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:7>

Django

unread,
Jul 23, 2015, 10:42:26 AM7/23/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by mitchelljkotler):

Sorry about that. I have updated the test and improved the formatting.
Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:8>

Django

unread,
Jul 25, 2015, 8:18:46 AM7/25/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------

Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* stage: Accepted => Ready for checkin


Comment:

This is good to go pending a fix for the test on Oracle.

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:9>

Django

unread,
Jul 27, 2015, 7:47:32 AM7/27/15
to django-...@googlegroups.com
#25095: Fields annotated for filtering which are not included in values are
included in GROUP BY clause in SQL
-------------------------------------+-------------------------------------
Reporter: mitchelljkotler | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"6024fd5dc2022f3724ea6440c319440d457a7366" 6024fd5]:
{{{
#!CommitTicketReference repository=""
revision="6024fd5dc2022f3724ea6440c319440d457a7366"
Fixed #25095 -- Fixed annotate() + values() group by bug

Thanks Josh Smeaton for help on the tests.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25095#comment:10>

Reply all
Reply to author
Forward
0 new messages