Django aggregation does excessive GROUP BY clauses

114 views
Skip to first unread message

christian.oudard

unread,
Nov 1, 2011, 6:19:59 PM11/1/11
to Django users
I am doing a very simple aggregation using the Django ORM, and it is
producing a GROUP BY clause that includes the data field, which is
very large, and is slowing down the query by over 100-fold.

Here is a simplified version of the model:

class Document(models.Model):
data = models.TextField()

class Attachment(models.Model):
document = models.ForeignKey(Document)

And the query I am running:

Document.objects.annotate(num_attachments=Count('attachment'))

And the SQL output:

SELECT
`document_document`.`id`,
`document_document`.`data`,
COUNT(`document_attachment`.`id`) AS `num_attachments`
FROM `document_document`
LEFT OUTER JOIN `document_attachment`
ON (`document_document`.`id` =
`document_attachment`.`document_id`)
GROUP BY
`document_document`.`id`,
`document_document`.`id`,
`document_document`.`data`
ORDER BY NULL

Doing GROUP BY on the data field is unnecessary and ridiculous. I can
stop this by doing a values query:

Document.objects.values('pk').annotate(num_attachments=Count('attachment'))

But then how do I get a real, annotated Document query as the result?

Note: I cross-posted this to StackOverflow:
http://stackoverflow.com/questions/7973461/django-aggregation-does-excessive-group-by-clauses

Karen Tracey

unread,
Nov 1, 2011, 9:06:09 PM11/1/11
to django...@googlegroups.com
On Tue, Nov 1, 2011 at 6:19 PM, christian.oudard <christia...@gmail.com> wrote:
I am doing a very simple aggregation using the Django ORM, and it is
producing a GROUP BY clause that includes the data field, which is
very large, and is slowing down the query by over 100-fold.

Here is a simplified version of the model:

class Document(models.Model):
   data = models.TextField()

class Attachment(models.Model):
   document = models.ForeignKey(Document)

And the query I am running:

Document.objects.annotate(num_attachments=Count('attachment'))


The SQL generated by the ORM for this query changed between Django version 1.2 and 1.3. The 1.2 SQL did a group by only on the id field. With 1.3 we're getting id twice and then all other fields in the model. Bisection shows the change was made with r14715:

https://code.djangoproject.com/changeset/14715

It certainly looks to me like the old SQL was correct and preferable for this particular case. In a brief search I did not find a ticket reporting this issue -- could you open one?

Karen
--
http://tracey.org/kmt/

christian.oudard

unread,
Nov 2, 2011, 9:42:53 AM11/2/11
to Django users
Opened a bug:

https://code.djangoproject.com/ticket/17144

I think I might take a crack at fixing this one.

On Nov 1, 9:06 pm, Karen Tracey <kmtra...@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 6:19 PM, christian.oudard <christian.oud...@gmail.com
Reply all
Reply to author
Forward
0 new messages