[Django] #23557: annotate gives different results on postgresql and mysql

40 views
Skip to first unread message

Django

unread,
Sep 25, 2014, 8:26:29 PM9/25/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
----------------------------------------------+--------------------
Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.7
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
Hello,

With the following db model:

{{{
@python_2_unicode_compatible
class CPUJob(models.Model):
account = models.ForeignKey(Account, blank=True, null=True)
username = models.CharField(max_length=50, blank=True, null=True)
project = models.ForeignKey(Project, null=True, blank=True)
machine = models.ForeignKey(Machine, blank=True, null=True)
date = models.DateField(db_index=True, blank=True, null=True)
queue = models.ForeignKey(Queue, blank=True, null=True)
cpu_usage = models.BigIntegerField(blank=True, null=True)
mem = models.BigIntegerField(blank=True, null=True)
vmem = models.BigIntegerField(blank=True, null=True)
ctime = models.DateTimeField(blank=True, null=True)
qtime = models.DateTimeField(blank=True, null=True)
etime = models.DateTimeField(blank=True, null=True)
start = models.DateTimeField(blank=True, null=True)
act_wall_time = models.BigIntegerField(blank=True, null=True)
est_wall_time = models.BigIntegerField(blank=True, null=True)
jobid = models.CharField(max_length=50, blank=True, null=True,
unique=True)
cores = models.BigIntegerField(blank=True, null=True)
list_mem = models.BigIntegerField(blank=True, null=True)
list_pmem = models.BigIntegerField(blank=True, null=True)
list_vmem = models.BigIntegerField(blank=True, null=True)
list_pvmem = models.BigIntegerField(blank=True, null=True)
exit_status = models.BigIntegerField(blank=True, null=True)
jobname = models.CharField(max_length=256, blank=True, null=True)
software = models.ManyToManyField(SoftwareVersion, blank=True,
null=True)

class Meta:
ordering = ['-date']
db_table = 'cpu_job'
}}}

The following on mysql produces a good result:

{{{
q = CPUJob.objects.values('project').annotate(usage=Sum('cpu_usage'),
jobs=Count('id'))
print q.query
}}}

of

{{{
SELECT `cpu_job`.`project_id`, SUM(`cpu_job`.`cpu_usage`) AS `usage`,
COUNT(`cpu_job`.`id`) AS `jobs` FROM `cpu_job` GROUP BY
`cpu_job`.`project_id` ORDER BY `cpu_job`.`date` DESC
}}}

However on Postgresql, with the same data, I get the following query:

{{{
SELECT "cpu_job"."project_id", SUM("cpu_job"."cpu_usage") AS "usage",
COUNT("cpu_job"."id") AS "jobs" FROM "cpu_job" GROUP BY
"cpu_job"."project_id", "cpu_job"."date" ORDER BY "cpu_job"."date" DESC
}}}

Note additional term "cpu_job"."date" in the GROUP BY. I did not ask for
it, it got put there.

I suspect the problem is the sort order on the table. The SQL seems to be
fine on mysql (really?????), but appears to be invalid on Postgresql (this
actually makes more sense to me). So Django appears to be silently
adjusting the request to make it valid on Postgresql.

{{{
karaage=> SELECT "cpu_job"."project_id", SUM("cpu_job"."cpu_usage") AS
"usage", COUNT("cpu_job"."id") AS "jobs" FROM "cpu_job" GROUP BY
"cpu_job"."project_id" ORDER BY "cpu_job"."date" DESC
;
ERROR: column "cpu_job.date" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 1: ...cpu_job" GROUP BY "cpu_job"."project_id" ORDER BY "cpu_job"....
}}}

I would much rather Django gives an error under Postgresql (and maybe even
MYSQL too) rather then silently changing the query, and giving different
results to what I expected.

(for this particular query, I don't need sorting, and had not noticed it
would cause problems - specifying order_by() seems to solve this problem).

Thanks

--
Ticket URL: <https://code.djangoproject.com/ticket/23557>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 26, 2014, 7:43:12 PM9/26/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jarshwah):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

The behaviour you're seeing with Postgres is correct, and documented:
https://docs.djangoproject.com/en/dev/topics/db/aggregation/#interaction-
with-default-ordering-or-order-by

Columns in the ORDER BY clause must be added to the GROUP BY clause on
Postgres, and any RDBMS that conforms to the spec. MySQL and sqlite allow
columns in the select list that aren't also in the GROUP BY. Django should
be consistent though - it's weird that different results are returned
based on the underlying engine.

I would propose that the MySQL backend should add the order by columns
into the group by list.

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:1>

Django

unread,
Sep 26, 2014, 7:59:11 PM9/26/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

Additionally, it may be worth talking about removing the non-standard
behaviour from mysql in django by setting the session to
"ONLY_FULL_GROUP_BY": http://dev.mysql.com/doc/refman/5.0/en/sql-
mode.html#sqlmode_only_full_group_by but that may be backwards
incompatible.

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:2>

Django

unread,
Sep 26, 2014, 7:59:20 PM9/26/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by jarshwah):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:3>

Django

unread,
Sep 26, 2014, 8:11:42 PM9/26/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by shaib):

It should be noted that "full" group-by causes significant performance
degradations on mysql; this is related to #17144.

In other respects, @jarshwah's comments are mostly correct -- except for
one point, and that is that the SQL spec is retarded, and an ORM ''can''
know when it's safe to group by a subset of the retrieved fields (which
are unique for the group).

I suspect the only way to resolve this without breaking stuff is to close
this ticket as "wontfix".

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:4>

Django

unread,
Sep 28, 2014, 7:04:19 PM9/28/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by brian):

My personal opinion is that Django should never alter silently alter
queries. If there is something wrong with the query, it should generate an
error rather then attempt to fix the problem by altering the query. The
fact the current Postgresql behaviour is documented doesn't make it any
more correct IMHO. The programmer should be forced to explicitly state the
preferred solution rather then Django guess it.

I would suggest that Django has some sort of legacy mode, enabled by
default, for preserving the current behaviour, and a new mode where errors
such as these trigger an error condition. That way any problems can be
identified and fixed, rather then silently await some change (e.g.
different database, or somebody deciding to add an ordering field where
there was none), and nobody noticing that the results generated are now
wrong.

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:5>

Django

unread,
Sep 29, 2014, 12:31:53 AM9/29/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by jarshwah):

I would be in favour of clearing out the ordering when using a .values
call, if an entry in values does not explicitly reference the ordered
column. I think this would neatly solve the issue. Unfortunately, I think
that'd be backwards incompatible. A flag could be used, but I'm not sure
it'd be very nice. It could also be retired after a deprecation period
where the behaviour was swapped.

{{{
Meta:
ordering = ['col']
values_respects_ordering = False # defaults to True
}}}


But then I'm sure someone else will say django is silently altering the
query by not respecting the Meta.ordering once the behaviour is swapped.

I wish Meta.ordering didn't exist.

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:6>

Django

unread,
Sep 29, 2014, 2:25:17 AM9/29/14
to django-...@googlegroups.com
#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------

Reporter: brian | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

I am with brian here - I think Django's way to calculate the GROUP BY is
complex enough without the GROUP BY clause being altered by .order_by()
calls.

In my opinion Django's group by logic should be as follows:
- Group by the primary key of the table in the query by default. The
group by will need to be appended by functionally dependent columns as
many databases need this. This means that we can add any column from the
same table to the group by, or any column from any table pointed by direct
foreign key or reverse one to one field.
- Group by the user's defined .values() call. Possibly allow extending
it by functionally dependent columns. That is, if some table's primary key
is in the values() list, then allow extension from that primary key along
foreign keys and o2ofields.
- If any other column is needed in the group by, then error out. This
means that:
`CPUJob.objects.values('project').annotate(usage=Sum('cpu_usage'),
jobs=Count('id')).order_by('date')` is an error.

The functionally dependent columns appending means that we need to do a
group by author.id, author.name on some databases even though technically
group by author.id gives the same result. Similarly, we need to group by
author.id, author.name, book.id, book.name when
.select_related('favorite_book') is applied, where favorite_book is a
foreign key from author to book.

The main point here is that we really shouldn't alter the results because
.order_by() was added to the query. Just error out in that case. Resist
the temptation to guess, explicit is better than implicit and so on...

I think we can deprecate the current behavior of silently altering the
group by. Checking if a column is functionally dependent on some subset of
columns in the group by isn't exactly easy, but should be doable at least
for primary key case. Multi-column unique indexes are going to be a bit
harder to support...

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:7>

Django

unread,
Aug 24, 2022, 12:50:29 AM8/24/22
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I'll note that the issue initially reported issue here is not reproducible
anymore since #14357 which deprecated `Meta.ordering` from being
considered when doing annotation. Even if that was not the case the
optimization of selected primary key grouping in #19259 would also have
solve the

I still think that it would be worth to keep this ticket open though to go
through a deprecation period and then raise an error when `values` is used
for grouping and is paired with an explicit `order_by` that would result
in extra `GROUP BY` entries. This can likely be achieved in
[https://github.com/django/django/blob/897f38fabea5e1b196f11250ff6dadfffa489840/django/db/models/sql/compiler.py#L149-L154
SQLCompiler.get_group_by] by branching off `self.query.group_by is not
True` and comparing the results of calling `self.collapse_group_by` with
expressions originating from `order_by` and ones without and if there is
any difference warn/raise.

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:8>

Django

unread,
Oct 20, 2022, 6:20:06 PM10/20/22
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ryan Cheley):

* cc: Ryan Cheley (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:9>

Django

unread,
Apr 21, 2023, 7:45:33 PM4/21/23
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Some interesting development in SQL:2023 about the possibility to `ORDER
BY` non-selected columns when using aggregation `F868`.

[http://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-
whats-new#order-by-in-grouped-table-f868 If I understand correctly
Postgres 16+ should implement this feature].

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:10>

Django

unread,
Jan 6, 2024, 2:33:36 PM1/6/24
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: Ryan
| Cheley
Type: Bug | Status: assigned

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ryan Cheley):

* owner: nobody => Ryan Cheley
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:11>

Django

unread,
Jan 6, 2024, 2:34:22 PM1/6/24
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: (none)
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ryan Cheley):

* owner: Ryan Cheley => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:12>

Django

unread,
Jun 7, 2025, 3:04:23 AMJun 7
to django-...@googlegroups.com
#23557: Prevent silent extension of explicit GROUP BY when using order_by
-------------------------------------+-------------------------------------
Reporter: Brian May | Owner: ontowhee
Type: Bug | Status: assigned
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by ontowhee):

* has_patch: 0 => 1
* needs_better_patch: 0 => 1
* owner: nobody => ontowhee
* status: new => assigned

Comment:

I opened a [https://github.com/django/django/pull/19536 draft pr]. It is
still a work in progress and quite incomplete. I've looked into Postgres
only so far.

There are a couple of items that still need to be addressed:
- Handle ordering by primary keys, including composite pks. There is
existing support for ordering by primary keys for some databases, so it
seems we should allow ordering by pk even if it is not included in the
select list. Does this sound right? My thought is to get a list of pks
across all table involved in the query and use that to compare against the
columns from order by that are being added to group by. I'm not exactly
sure how to get the models and their pks yet, but I'll dig around.
- Handle ordering by annotations, including when the annotations get
reduced to pks in the group by clause. Similar to above, I'll dig around
to see how this can be done.

Does this sound like it is heading in the right direction?
--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:11>
Reply all
Reply to author
Forward
0 new messages