Bug with distinct + aggregate ?

119 views
Skip to first unread message

donato.gr

unread,
Sep 13, 2010, 11:51:59 AM9/13/10
to Django users
Hi,
when using 'aggregate(Sum(...))' on a queryset that is also using
'distinct()', 'DISTINCT' is misplaced in resulting SQL query...

Here is a silly sample code:

I have the following classes:
class Son(models.Model):
name = models.CharField(max_length=20)

def __unicode__(self):
return self.name

class Father(models.Model):
name = models.CharField(max_length=20)
age = models.IntegerField()
sons = models.ManyToManyField(Son)

def __unicode__(self):
return '%s - %s' %(self.name, self.age)

I want to compute the total age of fathers who have a son called Adam
or Bernard:

so:

>>> f = Father.objects.create(name='Chris', age=30)
>>> adam = f.sons.create(name='Adam')
>>> bernard = f.sons.create(name='Bernard')
>>> f.sons.all() #Check that everything is ok
[<Son: Bernard>, <Son: Adam>]

>>> Father.objects.filter(sons__in=[adam, bernard]) #Let's see the fathers that have either Adam or Bernard among their sons
[<Father: Chris - 30>, <Father: Chris - 30>]

>>> Father.objects.filter(sons__in=[adam, bernard]).distinct() #Since Chris have both, he appears twice; so, I have to apply 'distinct'
[<Father: Chris - 30>]

>>> Father.objects.filter(sons__in=[adam, bernard]).distinct().aggregate(Sum('age')) #Let's see the total age...
{'age__sum': 60}

I think this is an error: the total age should be 30...

The SQL query is:
>>> from django.db import connection
>>> connection.queries[-1]
{'time': '0.000',
'sql': u'SELECT DISTINCT SUM(`testing_father`.`age`) AS `age__sum`
FROM `testing_father` INNER JOIN `testing_father_sons` ON
(`testing_father`.`id` = `testing_father_sons`.`father_id`) WHERE
`testing_father_sons`.`son_id` IN (7, 6)'}

So, Django applies DISTINCT on values already summed, while it shouls
apply DISTINCT BEFORE summing.


What do you think?

Note that my actual problem is that I have to apply aggregation on a
queryset which is provided by a 'black-box' function: the queryset is
already provided with the '__in' filter and the 'distinct' clause and
I cannot change its behaviour, nor taking distinct back...

Any suggestion?

Thanks a lot

donato.gr

unread,
Sep 14, 2010, 6:36:04 AM9/14/10
to Django users
I managed in this way:
>>> subquery = Father.objects.filter(sons__in=[adam, bernard])
>>> Father.objects.filter(pk__in=subquery).aggregate(Sum('age'))

this gives the correct result.

Is there a better way to do so? (maybe more efficient?)


Thanks
Reply all
Reply to author
Forward
0 new messages