[Django] #34750: Group by removal optimization generates in correct count query

6 views

Skip to first unread message

Django

unread,

Jul 28, 2023, 11:57:19 AM7/28/23

to django-...@googlegroups.com

#34750: Group by removal optimization generates in correct count query
-----------------------------------------+------------------------
Reporter: Toan Vuong | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 4.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
I believe this is related to the work in
https://code.djangoproject.com/ticket/28477 (and follow-up issues/changes
due to that ticket).

Tested on:
Django 4.2.2
OS X 13.4.1
Python 3.9.16

For the Oracle backend:
cx-Oracle 8.3.0 with instantclient 19.8

For the Postgres backend:
psycopg/psycopg-binary 3.1.9

Attached is a sample project, and the relevant query is below:

{{{
def populate_data():
q1 = Question(question_text='t1')
q1.save()

for i in range(10):
c = Choice(question=q1, choice_text='c1', votes=i)
c.save()

# Need to populate the data only once, so comment it out on subsequent
runs
populate_data()

qs = (
Question.objects.values('id', 'question_text')
.annotate(mysum=Sum('choice__votes'))
.annotate(choice__votes_threshold=Case(
When(choice__votes__gt=1, then=Value(1000)),
default=Value(-1)))
)
print(qs.count() == len(qs))

}}}

When issuing the count query (`qs.count()`), Django generates this:

{{{
SELECT COUNT(*) FROM (SELECT "polls_question"."id" AS "col1",
"polls_question"."question_text" AS "col2" FROM "polls_question" LEFT
OUTER JOIN "polls_choice" ON ("polls_question"."id" =
"polls_choice"."question_id") GROUP BY 1
}}}

I've chased it down to
[https://github.com/django/django/blob/main/django/db/models/sql/query.py#L490
this optimization]. This count would return 1 because it's just a simple
join.

However, the actual query is this:

{{{
SELECT "polls_question"."id", "polls_question"."question_text",
SUM("polls_choice"."votes") AS "mysum", CASE WHEN "polls_choice"."votes" >
1 THEN 1000 ELSE -1 END AS "choice__votes_threshold" FROM
"polls_question" LEFT OUTER JOIN "polls_choice" ON ("polls_question"."id"
= "polls_choice"."question_id") GROUP BY "polls_question"."id", 4
}}}

Due to the group by and the varying `choice__votes_threshold`, there
should be 2 rows in the result set.

This _did_ used to work in 3.2.18, and I didn't dig too much into it but I
suspect the optimization was introduced after that version, hence why it
worked.

--
Ticket URL: <https://code.djangoproject.com/ticket/34750>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Jul 28, 2023, 11:57:29 AM7/28/23

to django-...@googlegroups.com

#34750: Group by removal optimization generates in correct count query

-------------------------------+--------------------------------------

Reporter: Toan Vuong | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 4.2

Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0

-------------------------------+--------------------------------------
Changes (by Toan Vuong):

* Attachment "mysite.tar.gz" added.

Sample project

Reply all

Reply to author

Forward

0 new messages