[Django] #24385: Problem with aggregate(Sum())

39 views
Skip to first unread message

Django

unread,
Feb 21, 2015, 7:27:26 PM2/21/15
to django-...@googlegroups.com
#24385: Problem with aggregate(Sum())
----------------------------------------------+--------------------
Reporter: mcagl | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 1.6
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
Hello,
I setup a minimal project here:
https://github.com/mcagl/minimal_django_sum_test to demonstrate the
problem.

I'm working with Django 1.6.x and I noticed something that I don't
understand.

As you can see from the github repository, I have a Tag model and a Row
model with a m2m towards Tag and a DecimalField called amount.

If I filter Row objects for more than one Tag, and there is/are Row
objects that have more than one Tag among the one filtered by,
Sum('amount') counts it/them once per Tag, even if I use distinct().

Please note also that I assert, in the test, that the filtered queryset is
composed by three Row objects, as expected, but the next assert fails,
with 40 != 30.

I added a test that instead of aggregate(Sum('amount')) does sum([x.amount
for x in rows]) which passes.

Is this a bug in Sum() or am I missing something?

Kind regards,
Mark

--
Ticket URL: <https://code.djangoproject.com/ticket/24385>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 21, 2015, 7:40:43 PM2/21/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------

Reporter: mcagl | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by mcagl):

* needs_better_patch: => 0
* needs_tests: => 0
* version: 1.6 => 1.7
* needs_docs: => 0


Old description:

> Hello,
> I setup a minimal project here:
> https://github.com/mcagl/minimal_django_sum_test to demonstrate the
> problem.
>
> I'm working with Django 1.6.x and I noticed something that I don't
> understand.
>
> As you can see from the github repository, I have a Tag model and a Row
> model with a m2m towards Tag and a DecimalField called amount.
>
> If I filter Row objects for more than one Tag, and there is/are Row
> objects that have more than one Tag among the one filtered by,
> Sum('amount') counts it/them once per Tag, even if I use distinct().
>
> Please note also that I assert, in the test, that the filtered queryset
> is composed by three Row objects, as expected, but the next assert fails,
> with 40 != 30.
>
> I added a test that instead of aggregate(Sum('amount')) does
> sum([x.amount for x in rows]) which passes.
>
> Is this a bug in Sum() or am I missing something?
>
> Kind regards,
> Mark

New description:

Hello,
I setup a minimal project here:
https://github.com/mcagl/minimal_django_sum_test to demonstrate the
problem.

I'm working with Django 1.6.x and I noticed something that I don't
understand.

Edit: I checked also with the last stable version, 1.7.4 downloaded right
from the website, and the problem is still there.

As you can see from the github repository, I have a Tag model and a Row
model with a m2m towards Tag and a DecimalField called amount.

If I filter Row objects for more than one Tag, and there is/are Row
objects that have more than one Tag among the one filtered by,
Sum('amount') counts it/them once per Tag, even if I use distinct().

Please note also that I assert, in the test, that the filtered queryset is
composed by three Row objects, as expected, but the next assert fails,
with 40 != 30.

I added a test that instead of aggregate(Sum('amount')) does sum([x.amount
for x in rows]) which passes.

Is this a bug in Sum() or am I missing something?

Kind regards,
Mark

--

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:1>

Django

unread,
Feb 21, 2015, 9:50:20 PM2/21/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

Is this a symptom of the same problem described by
https://code.djangoproject.com/ticket/10060 ?

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:2>

Django

unread,
Feb 22, 2015, 7:45:13 AM2/22/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by mcagl):

It seems similar, or at least somewhat related.

I did a little more inspecting, doing this in {{{./manage.py shell}}} to
obtain the executed raw SQL statements:

{{{
import logging
l = logging.getLogger('django.db.backends')
l.setLevel(logging.DEBUG)
l.addHandler(logging.StreamHandler())
import datetime
from decimal import Decimal
import my_app.models as app_models
t0 = app_models.Tag.objects.create(name="Tag0")
t1 = app_models.Tag.objects.create(name="Tag1")
t2 = app_models.Tag.objects.create(name="Tag2")
t3 = app_models.Tag.objects.create(name="Tag3")
app_models.Row.objects.create(day=datetime.date(2014, 1, 1),
amount=Decimal('10.00'))
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2),
amount=Decimal('10.00'))
r.tags.add(t0)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3),
amount=Decimal('10.00'))
r.tags.add(t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4),
amount=Decimal('10.00'))
r.tags.add(t0, t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5),
amount=Decimal('10.00'))
r.tags.add(t2, t3)
}}}

so mainly adding a DEBUG level logger for SQL queries, and recreating the
objects in the setUp() method of the TestCase in the github repository.
Then I tried the queries, obtaining this output, first without
{{{distinct()}}}:
{{{
rows = app_models.Row.objects.filter(tags__in=[t0, t1]
list(rows)
}}}

{{{
>>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount",
"my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
"my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
"my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row
object>]
}}}

{{{
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
}}}

{{{
>>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum"
FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" =
"my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)'
- PARAMS = (1, 2); args=(1, 2)
>>> 40.00
}}}

Then I added {{{distinct()}}}, which gets added as expected (and the rows
queryset gets 3 objects instead of 4, also expected):

{{{
rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct()
list(rows)
}}}

{{{
>>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id",
"my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN
"my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" )
WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1,
2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
}}}

but somewhat fails when doint aggregation in this version of the queryset,
because the {{{DISTINCT}}} is added before {{{SUM}}}, whereas I expect
that {{{distinct()}}} gets executed before doing the aggregation, as it's
the python code I wrote:

{{{
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
}}}

{{{
>>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS
"amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
"my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
"my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00
}}}

And obviously {{{agg['amount__sum']}}} contains an unexpected value,
because at an higher level it's like the Row object with two tags is
counted twice, completely disregarding the {{{distinct()}}} method I
called before aggregating.


Thanks for the quick response, I hope that this can be useful,
Mark

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:3>

Django

unread,
Mar 3, 2015, 1:49:18 PM3/3/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* type: Uncategorized => Bug
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:4>

Django

unread,
Mar 25, 2015, 9:43:32 AM3/25/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody

Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by camillobruni):

http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-
aggregates-and-derived-tables describes this issue on SQL-level. Their
solution was to use a derived table, something that is not directly
possible with django queries, I think. The only solution I came up with
was to sum up the values in a python loop, relying on `distinct()` to
properly do its job.

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:5>

Django

unread,
Apr 13, 2015, 7:28:47 PM4/13/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody

Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by kutenai):

I used the test project that mcagl provided.
The test fails under stable/1.7.x but it passes on stable/1.8.x and also
on master.

It would appear that this is fixed in 1.8.

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:6>

Django

unread,
Apr 14, 2015, 12:50:33 PM4/14/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody

Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by kutenai):

* cc: kutenai (added)
* has_patch: 0 => 1


Comment:

Added a pull request with a test to validate this issue and insure it does
not get re-introduced.

https://github.com/django/django/pull/4507

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:7>

Django

unread,
Apr 14, 2015, 1:41:06 PM4/14/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody

Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* stage: Accepted => Ready for checkin


Comment:

Looks good, pending some comment cleanups and squashing of commits.

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:8>

Django

unread,
Apr 14, 2015, 6:58:00 PM4/14/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody

Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"910638fc4e4f83bda31dd91db5017acb06dfc8d2" 910638f]:
{{{
#!CommitTicketReference repository=""
revision="910638fc4e4f83bda31dd91db5017acb06dfc8d2"
Refs #24385 -- Added tests for distinct sum issue fixed in
c7fd9b242d2d63406f1de6cc3204e35aaa025233
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:9>

Django

unread,
Apr 14, 2015, 6:59:05 PM4/14/15
to django-...@googlegroups.com
#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
Reporter: mcagl | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* status: new => closed
* resolution: => fixed


--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:10>

Reply all
Reply to author
Forward
0 new messages