I'm working with Django 1.6.x and I noticed something that I don't
understand.
As you can see from the github repository, I have a Tag model and a Row
model with a m2m towards Tag and a DecimalField called amount.
If I filter Row objects for more than one Tag, and there is/are Row
objects that have more than one Tag among the one filtered by,
Sum('amount') counts it/them once per Tag, even if I use distinct().
Please note also that I assert, in the test, that the filtered queryset is
composed by three Row objects, as expected, but the next assert fails,
with 40 != 30.
I added a test that instead of aggregate(Sum('amount')) does sum([x.amount
for x in rows]) which passes.
Is this a bug in Sum() or am I missing something?
Kind regards,
Mark
--
Ticket URL: <https://code.djangoproject.com/ticket/24385>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* version: 1.6 => 1.7
* needs_docs: => 0
Old description:
> Hello,
> I setup a minimal project here:
> https://github.com/mcagl/minimal_django_sum_test to demonstrate the
> problem.
>
> I'm working with Django 1.6.x and I noticed something that I don't
> understand.
>
> As you can see from the github repository, I have a Tag model and a Row
> model with a m2m towards Tag and a DecimalField called amount.
>
> If I filter Row objects for more than one Tag, and there is/are Row
> objects that have more than one Tag among the one filtered by,
> Sum('amount') counts it/them once per Tag, even if I use distinct().
>
> Please note also that I assert, in the test, that the filtered queryset
> is composed by three Row objects, as expected, but the next assert fails,
> with 40 != 30.
>
> I added a test that instead of aggregate(Sum('amount')) does
> sum([x.amount for x in rows]) which passes.
>
> Is this a bug in Sum() or am I missing something?
>
> Kind regards,
> Mark
New description:
Hello,
I setup a minimal project here:
https://github.com/mcagl/minimal_django_sum_test to demonstrate the
problem.
I'm working with Django 1.6.x and I noticed something that I don't
understand.
Edit: I checked also with the last stable version, 1.7.4 downloaded right
from the website, and the problem is still there.
As you can see from the github repository, I have a Tag model and a Row
model with a m2m towards Tag and a DecimalField called amount.
If I filter Row objects for more than one Tag, and there is/are Row
objects that have more than one Tag among the one filtered by,
Sum('amount') counts it/them once per Tag, even if I use distinct().
Please note also that I assert, in the test, that the filtered queryset is
composed by three Row objects, as expected, but the next assert fails,
with 40 != 30.
I added a test that instead of aggregate(Sum('amount')) does sum([x.amount
for x in rows]) which passes.
Is this a bug in Sum() or am I missing something?
Kind regards,
Mark
--
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:1>
Comment (by jarshwah):
Is this a symptom of the same problem described by
https://code.djangoproject.com/ticket/10060 ?
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:2>
Comment (by mcagl):
It seems similar, or at least somewhat related.
I did a little more inspecting, doing this in {{{./manage.py shell}}} to
obtain the executed raw SQL statements:
{{{
import logging
l = logging.getLogger('django.db.backends')
l.setLevel(logging.DEBUG)
l.addHandler(logging.StreamHandler())
import datetime
from decimal import Decimal
import my_app.models as app_models
t0 = app_models.Tag.objects.create(name="Tag0")
t1 = app_models.Tag.objects.create(name="Tag1")
t2 = app_models.Tag.objects.create(name="Tag2")
t3 = app_models.Tag.objects.create(name="Tag3")
app_models.Row.objects.create(day=datetime.date(2014, 1, 1),
amount=Decimal('10.00'))
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2),
amount=Decimal('10.00'))
r.tags.add(t0)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3),
amount=Decimal('10.00'))
r.tags.add(t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4),
amount=Decimal('10.00'))
r.tags.add(t0, t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5),
amount=Decimal('10.00'))
r.tags.add(t2, t3)
}}}
so mainly adding a DEBUG level logger for SQL queries, and recreating the
objects in the setUp() method of the TestCase in the github repository.
Then I tried the queries, obtaining this output, first without
{{{distinct()}}}:
{{{
rows = app_models.Row.objects.filter(tags__in=[t0, t1]
list(rows)
}}}
{{{
>>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount",
"my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
"my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
"my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row
object>]
}}}
{{{
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
}}}
{{{
>>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum"
FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" =
"my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)'
- PARAMS = (1, 2); args=(1, 2)
>>> 40.00
}}}
Then I added {{{distinct()}}}, which gets added as expected (and the rows
queryset gets 3 objects instead of 4, also expected):
{{{
rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct()
list(rows)
}}}
{{{
>>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id",
"my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN
"my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" )
WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1,
2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
}}}
but somewhat fails when doint aggregation in this version of the queryset,
because the {{{DISTINCT}}} is added before {{{SUM}}}, whereas I expect
that {{{distinct()}}} gets executed before doing the aggregation, as it's
the python code I wrote:
{{{
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
}}}
{{{
>>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS
"amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
"my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
"my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00
}}}
And obviously {{{agg['amount__sum']}}} contains an unexpected value,
because at an higher level it's like the Row object with two tags is
counted twice, completely disregarding the {{{distinct()}}} method I
called before aggregating.
Thanks for the quick response, I hope that this can be useful,
Mark
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:3>
* type: Uncategorized => Bug
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:4>
Comment (by camillobruni):
http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-
aggregates-and-derived-tables describes this issue on SQL-level. Their
solution was to use a derived table, something that is not directly
possible with django queries, I think. The only solution I came up with
was to sum up the values in a python loop, relying on `distinct()` to
properly do its job.
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:5>
Comment (by kutenai):
I used the test project that mcagl provided.
The test fails under stable/1.7.x but it passes on stable/1.8.x and also
on master.
It would appear that this is fixed in 1.8.
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:6>
* cc: kutenai (added)
* has_patch: 0 => 1
Comment:
Added a pull request with a test to validate this issue and insure it does
not get re-introduced.
https://github.com/django/django/pull/4507
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:7>
* stage: Accepted => Ready for checkin
Comment:
Looks good, pending some comment cleanups and squashing of commits.
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:8>
Comment (by Tim Graham <timograham@…>):
In [changeset:"910638fc4e4f83bda31dd91db5017acb06dfc8d2" 910638f]:
{{{
#!CommitTicketReference repository=""
revision="910638fc4e4f83bda31dd91db5017acb06dfc8d2"
Refs #24385 -- Added tests for distinct sum issue fixed in
c7fd9b242d2d63406f1de6cc3204e35aaa025233
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:9>
* status: new => closed
* resolution: => fixed
--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:10>