[Django] #16025: distinct does not apply to aggregated querysets

6 views
Skip to first unread message

Django

unread,
May 14, 2011, 1:58:36 PM5/14/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
----------------------+----------------------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer (models, ORM)
Version: 1.3 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Easy pickings: 0
----------------------+----------------------------------------------
When applying .distinct() to a QuerySet, performing an aggregation using
.aggregate() on that will only ensure that the aggregation result is
distinct. As a result, this prevents you from doing a query where you
want the input to the aggregation to have distinct rows.

Take the example models:

{{{
class Category(Model):
label = CharField(max_length=128)

class Food(Model):
name = CharField(max_length=128)
categories = ManyToManyField(Category)
kilojoules_per_kg = PositiveIntegerField()
}}}

If you then did a query on which you selected a couple of Categories as
foreign keys (eg: "Pizza Toppings" and "Fruits"), you'd end up with a list
of Foods that would contain duplicates where a food item fits in multiple
categories (eg: "Apple", "Orange", "Pineapple", "Pineapple", "Bacon" and
"Cheese" ...).

If you then wanted to then find the average number of kilojoules per
kilogram for items in those groups and eliminate duplicates (Pineapple),
the documentation would lead you to do this:

{{{
qs.distinct().aggregate(Average('kilojoules_per_kg'))
}}}

However instead of counting Pineapple once, Pineapple would be counted
twice, as the distinct() call would only apply to the result of the
aggregation, not to the input of it.

{{{
>>> qs.distinct().aggregate(Average('kilojoules_per_kg')) ==
qs.aggregate(Average('kilojoules_per_kg'))
True
}}}

Instead, .distinct() before .aggregate() should instead cause a subquery
to be executed to filter the results before passing it to the aggregation
functions. There doesn't seem a way to presently execute a query like
this using the Django ORM.

.distinct() after an aggregation should cause the results of the
aggregation to be filtered so only distinct records are returned (which is
the current behavior).

I've managed to confirm this behavior in Django 1.2.3 and 1.3.

--
Ticket URL: <http://code.djangoproject.com/ticket/16025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 15, 2011, 4:25:06 AM5/15/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
-------------------------------------+-------------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.3 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords:
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Changes (by micolous):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

I'm aware wrote "Average" here instead of "Avg" -- I meant "Avg". The
rest of the bug still applies, I was trying to make an example of the
error that was effecting my application using different models (because my
application's models are rather complex).

--
Ticket URL: <http://code.djangoproject.com/ticket/16025#comment:1>

Django

unread,
May 20, 2011, 11:58:05 PM5/20/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
------------------------------------+---------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: ORM aggregation
Version: 1.3 | Severity: Normal
Resolution: | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
------------------------------------+---------------------------------
Changes (by ramiro):

* component: Database layer (models, ORM) => ORM aggregation
* stage: Unreviewed => Accepted


Comment:

Does your Food model have any `Meta.ordering` model option or does the
queryset have a `order_by()` call somewhere?

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:2>

Django

unread,
Jun 2, 2011, 2:50:12 AM6/2/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
------------------------------------+---------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: ORM aggregation
Version: 1.3 | Severity: Normal
Resolution: | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
------------------------------------+---------------------------------

Comment (by micolous):

In the example application I developed it doesn't (the only difference is
I've implemented `__unicode__`, but in the application I'm actually
writing it does.

Both ways give the same (erroneous) result.

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:3>

Django

unread,
Jun 2, 2011, 7:14:08 AM6/2/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
------------------------------------+---------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: ORM aggregation
Version: 1.3 | Severity: Normal
Resolution: | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
------------------------------------+---------------------------------
Description changed by ramiro:

Old description:
New description:

When applying .distinct() to a QuerySet, performing an aggregation using
.aggregate() on that will only ensure that the aggregation result is
distinct. As a result, this prevents you from doing a query where you
want the input to the aggregation to have distinct rows.

Take the example models:

{{{
class Category(Model):
label = CharField(max_length=128)

class Food(Model):
name = CharField(max_length=128)
categories = ManyToManyField(Category)
kilojoules_per_kg = PositiveIntegerField()
}}}

If you then did a query on which you selected a couple of Categories as
foreign keys (eg: "Pizza Toppings" and "Fruits"), you'd end up with a list
of Foods that would contain duplicates where a food item fits in multiple
categories (eg: "Apple", "Orange", "Pineapple", "Pineapple", "Bacon" and
"Cheese" ...).

If you then wanted to then find the average number of kilojoules per
kilogram for items in those groups and eliminate duplicates (Pineapple),
the documentation would lead you to do this:

{{{
qs.distinct().aggregate(Avg('kilojoules_per_kg'))
}}}

However instead of counting Pineapple once, Pineapple would be counted
twice, as the distinct() call would only apply to the result of the
aggregation, not to the input of it.

{{{
>>> qs.distinct().aggregate(Avg('kilojoules_per_kg')) ==
qs.aggregate(Avg('kilojoules_per_kg'))
True
}}}

Instead, .distinct() before .aggregate() should instead cause a subquery
to be executed to filter the results before passing it to the aggregation
functions. There doesn't seem a way to presently execute a query like
this using the Django ORM.

.distinct() after an aggregation should cause the results of the
aggregation to be filtered so only distinct records are returned (which is
the current behavior).

I've managed to confirm this behavior in Django 1.2.3 and 1.3.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:4>

Django

unread,
Sep 3, 2011, 6:55:41 PM9/3/11
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
------------------------------------+---------------------------------
Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: ORM aggregation
Version: 1.3 | Severity: Normal
Resolution: | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------+---------------------------------
Changes (by anonymous):

* ui_ux: => 0


Comment:

This bug is also experienced here:
http://stackoverflow.com/questions/4093910/django-aggregates-sums-in-
postgresql-dont-use-distinct-is-this-a-bug

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:5>

Django

unread,
Apr 29, 2014, 3:30:55 AM4/29/14
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
-------------------------------------+-------------------------------------

Reporter: micolous | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by vsajip):

* version: 1.3 => 1.6


Comment:

Still there in 1.6 :-(

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:7>

Django

unread,
Mar 2, 2015, 10:28:57 AM3/2/15
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
-------------------------------------+-------------------------------------
Reporter: micolous | Owner: mikalai
Type: Bug | Status: assigned

Component: Database layer | Version: 1.6
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by mikalai):

* status: new => assigned
* owner: nobody => mikalai


--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:8>

Django

unread,
Mar 8, 2015, 7:18:40 AM3/8/15
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
-------------------------------------+-------------------------------------
Reporter: micolous | Owner:
Type: Bug | Status: new

Component: Database layer | Version: 1.6
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by mikalai):

* owner: mikalai =>
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:9>

Django

unread,
Mar 11, 2019, 11:55:15 PM3/11/19
to django-...@googlegroups.com
#16025: distinct does not apply to aggregated querysets
-------------------------------------+-------------------------------------
Reporter: Michael | Owner: (none)
Type: Bug | Status: closed

Component: Database layer | Version: 1.6
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* status: new => closed
* resolution: => fixed


Comment:

I'm not sure when this got fixed but I can confirm that the `DISTINCT` is
now applied to the subquery aggregated over by the expression provided in
the `aggregate` call and not on the outer query which was effectively
useless. I suspect that #23875 fixed it by moving the
[https://github.com/django/django/commit/c7fd9b242d2d63406f1de6cc3204e35aaa025233
#diff-0edd853580d56db07e4020728d59e193R339 distinct special casing] from
the `get_count` method to the `get_aggregation` one.

--
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:10>

Reply all
Reply to author
Forward
0 new messages