I have a complex query interacting with a legacy database, which I
simplified below to hilight the issue:
{{{
>>> from user.models import Sponsor
>>> from django.db.models import ExpressionWrapper, Count, DecimalField
>>> from django.db.models.expressions import RawSQL
>>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE
objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", [])
>>>
str(Sponsor.objects.all().annotate(report_rate=ExpressionWrapper(nb_reports
/ Count('deliveries'),
output_field=DecimalField())).order_by('-report_rate').query)
}}}
This code, in **Django 1.11.9,** gives me the following query:
{{{
SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM
pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND
state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON
(`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId` ORDER BY `report_rate` DESC
}}}
This is the expected behavior and it works well.
However, in **Django 2.0.5**, the same code gives me this query:
{{{
SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM
pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND
state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON
(`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation
WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2)
ORDER BY `report_rate` DESC
}}}
As you can see, the ORM appended the subquery `(SELECT COUNT(*) FROM
pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND
state=2)` to the GROUP BY clause. Which is wrong, and takes forever to
execute.
I tried to play with `.values('id')` or such as I usually do when I get
unexpected GROUP BY. I spent an afternoon on it but there's no way I could
get rid of this undesired group by clause. The `order_by` is not to blame
either. Here is another example of what I tried:
{{{
str(Sponsor.objects.all().values('id').annotate(report_rate=ExpressionWrapper(nb_reports
/ Count('deliveries'), output_field=DecimalField())).order_by().query)
}}}
Which gives:
{{{
SELECT `ala_sponsor`.`sponId`, ((SELECT COUNT(*) FROM pro_moderation WHERE
objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) /
COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON
(`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation
WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2)
ORDER BY NULL
}}}
Also note that this is the
`annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'),
output_field=DecimalField()))` that causes this issue. If I only do
`annotate(nb_reports=nb_reports)` or
`annotate(nb_deliveries=COUNT('deliveries'))` there is no additional GROUP
BY clause generated.
{{{
In [40]:
str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports).order_by().query)
Out[40]: "SELECT `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM
pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND
state=2) AS `nb_reports` FROM `ala_sponsor`"
In [41]:
str(Sponsor.objects.all().values('id').annotate(nb_deliveries=Count('deliveries')).order_by().query)
Out[41]: 'SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`)
AS `nb_deliveries` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need`
ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY
`ala_sponsor`.`sponId` ORDER BY NULL'
In [42]:
str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports,
nb_deliveries=Count('deliveries')).order_by().query)
Out[42]: "SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`)
AS `nb_deliveries`, (SELECT COUNT(*) FROM pro_moderation WHERE
objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) AS
`nb_reports` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON
(`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY
`ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE
objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) ORDER BY NULL"
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
-----
-----
Which gives:
--
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:1>
Old description:
> I am facing an issue while upgrading from Django 1.11 to Django 2.0.
> -----
> -----
New description:
I am facing an issue while upgrading from Django 1.11 to Django 2.0.
I have a complex query interacting with a legacy MySQL database, which I
simplified below to highlight the issue:
-----
-----
Which gives:
--
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:2>
Comment (by Tim Graham):
Can you [https://docs.djangoproject.com/en/dev/internals/contributing
/triaging-tickets/#bisecting-a-regression bisect] to find where the
behavior changed?
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:3>
Comment (by Antoine Pinsard):
So, here is the commit that introduced the change of behavior:
https://github.com/django/django/commit/1d070d027c218285b66c0bde8079034b33a87f11
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:4>
* Attachment "test_regression_29416.py" added.
tests/annotations/test_regression_29416.py
* cc: felixxm (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:5>
Comment (by Antoine Pinsard):
I think the issue is the `getattr(expr, 'alias', None) not in pk_aliases`.
Maybe it should be `getattr(expr, 'alias', None) not in pk_aliases and not
isinstance(expr, RawSQL)` or something like that?
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:6>
* status: new => assigned
* owner: nobody => felixxm
Comment:
Probably this condition should be more complex. I will dig into it on
DjangoConEU sprints.
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:7>
Comment (by Antoine Pinsard):
OK, FYI I was able to workaround the issue:
{{{
>>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE
objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", [])
>>> nb_reports.alias = 'nb_reports'
>>> nb_reports.target = type('bare', (object,), {'primary_key': True})
}}}
So I will be able to upgrade to 2.0 meanwhile.
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:8>
* severity: Normal => Release blocker
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:9>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/9981 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:10>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:11>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"4ab1f559e8d1264bcb20bb497988973194f5d8f2" 4ab1f559]:
{{{
#!CommitTicketReference repository=""
revision="4ab1f559e8d1264bcb20bb497988973194f5d8f2"
Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause on MySQL
when using a RawSQL annotation.
Regression in 1d070d027c218285b66c0bde8079034b33a87f11.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:12>
Comment (by Tim Graham <timograham@…>):
In [changeset:"b6e48f514ebe4e31b76e1750e043d4f296e645dc" b6e48f51]:
{{{
#!CommitTicketReference repository=""
revision="b6e48f514ebe4e31b76e1750e043d4f296e645dc"
[2.1.x] Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause
on MySQL when using a RawSQL annotation.
Regression in 1d070d027c218285b66c0bde8079034b33a87f11.
Backport of 4ab1f559e8d1264bcb20bb497988973194f5d8f2 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:13>
Comment (by Tim Graham <timograham@…>):
In [changeset:"74bbef4ee0112144f302bb6509fd41a6b09ed63e" 74bbef4e]:
{{{
#!CommitTicketReference repository=""
revision="74bbef4ee0112144f302bb6509fd41a6b09ed63e"
[2.0.x] Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause
on MySQL when using a RawSQL annotation.
Regression in 1d070d027c218285b66c0bde8079034b33a87f11.
Backport of 4ab1f559e8d1264bcb20bb497988973194f5d8f2 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:14>
* status: closed => new
* has_patch: 1 => 0
* resolution: fixed =>
* stage: Ready for checkin => Accepted
Comment:
This broke a MySQL GIS test on MySQL 5.7:
{{{
gis_tests.geoapp.test_expressions.GeoExpressionsTests.test_multiple_annotation
Expression #2 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'test_django.geoapp_multifields.point' which is not
functionally dependent on columns in GROUP BY clause; this is incompatible
with sql_mode=only_full_group_by")
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:15>
Comment (by Simon Charette):
MySQL documentation for reference.
https://dev.mysql.com/doc/refman/5.7/en/sql-
mode.html#sqlmode_only_full_group_by
Tim, is the only MySQL 5.7 failing against this patch?
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:16>
Comment (by Tim Graham):
Yes, that's the only test failure.
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:17>
Comment (by felixxm):
IMO query in this test (even if we add
"`ST_Distance(geoapp_multifields.point, ST_GeomFromText('POINT (-95.363151
29.763374)'))`" to the GROUP BY clause) doesn't have much sense. This is
not a realistic use case. It also didn't work before
1d070d027c218285b66c0bde8079034b33a87f11.
{{{
#!sql
SELECT
`geoapp_city`.`name`,
ST_Distance(`geoapp_multifields`.`point`, ST_GeomFromText('POINT
(-95.363151 29.763374)')) AS `distance`,
COUNT(`geoapp_multifields`.`id`) AS `count`
FROM `geoapp_city`
LEFT OUTER JOIN `geoapp_multifields` ON (`geoapp_city`.`id` =
`geoapp_multifields`.`city_id`)
GROUP BY `geoapp_city`.`id`
ORDER BY `geoapp_city`.`id` ASC
LIMIT 1
}}}
We can fix this test by changing:
{{{
diff --git a/tests/gis_tests/geoapp/test_expressions.py
b/tests/gis_tests/geoapp/test_expressions.py
index 2d0ebbcae0..89e83a782f 100644
--- a/tests/gis_tests/geoapp/test_expressions.py
+++ b/tests/gis_tests/geoapp/test_expressions.py
@@ -3,7 +3,7 @@ from unittest import skipUnless
from django.contrib.gis.db.models import F, GeometryField, Value,
functions
from django.contrib.gis.geos import Point, Polygon
from django.db import connection
-from django.db.models import Count
+from django.db.models import Count, Min
from django.test import TestCase, skipUnlessDBFeature
from ..utils import postgis
@@ -56,7 +56,7 @@ class GeoExpressionsTests(TestCase):
poly=Polygon(((1, 1), (1, 2), (2, 2), (2, 1), (1, 1))),
)
qs = City.objects.values('name').annotate(
- distance=functions.Distance('multifields__point',
multi_field.city.point),
+ distance=Min(functions.Distance('multifields__point',
multi_field.city.point)),
).annotate(count=Count('multifields'))
self.assertTrue(qs.first())
}}}
or by adding `multifields__point` to the `values`, i.e.
`City.objects.values('name', 'multifields__point')`.
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:18>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/10012 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:19>
Comment (by Tim Graham <timograham@…>):
In [changeset:"d0ad03cded64fc307b15668c81d0c65fd8486eff" d0ad03c]:
{{{
#!CommitTicketReference repository=""
revision="d0ad03cded64fc307b15668c81d0c65fd8486eff"
Refs #29416 -- Fixed GeoExpressionsTests.test_multiple_annotation() on
MySQL 5.7+.
Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:20>
Comment (by Tim Graham <timograham@…>):
In [changeset:"31b9cf97b9b91a057bef65f21092e5c00083b0c9" 31b9cf9]:
{{{
#!CommitTicketReference repository=""
revision="31b9cf97b9b91a057bef65f21092e5c00083b0c9"
[2.1.x] Refs #29416 -- Fixed
GeoExpressionsTests.test_multiple_annotation() on MySQL 5.7+.
Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.
Backport of d0ad03cded64fc307b15668c81d0c65fd8486eff from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:21>
Comment (by Tim Graham <timograham@…>):
In [changeset:"b57ea27d6b25e200a780e8dff07b8289d8e6c844" b57ea27d]:
{{{
#!CommitTicketReference repository=""
revision="b57ea27d6b25e200a780e8dff07b8289d8e6c844"
[2.0.x] Refs #29416 -- Fixed
GeoExpressionsTests.test_multiple_annotation() on MySQL 5.7+.
Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.
Backport of d0ad03cded64fc307b15668c81d0c65fd8486eff from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:22>
* status: new => closed
* resolution: => fixed
--
Ticket URL: <https://code.djangoproject.com/ticket/29416#comment:23>