Consider the following model:
{{{
class Author(models.Model):
name = models.CharField(max_length=100)
class Book(models.Model):
author = models.ForeignKey(Author, related_name='books')
title = models.CharField(max_length=100)
published = models.DateField()
pages = models.IntegerField()
}}}
then the query:
{{{
qs = Author.objects.filter(books__published__gte=date(2012, 11, 1))
qs = qs.filter(books__published__lte=date(2012, 11, 30))
qs = qs.values('name').annotate(num_pages=Sum('books__pages'))
print qs.query
}}}
will result in the following SQL:
{{{
SELECT "app_author"."name", SUM("app_book"."pages") AS "num_pages" FROM
"app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" =
"app_book"."author_id") INNER JOIN "app_book" T3 ON ("app_author"."id" =
T3."author_id") WHERE ("app_book"."published" >= 2012-11-01 AND
T3."published" <= 2012-11-30 ) GROUP BY "app_author"."name"
}}}
The problem is that in this case the upper date filter is not taken into
account when calculating the 'num_pages' aggregate.
I would have expected to see the following SQL:
{{{
SELECT "app_author"."name", SUM("app_book"."pages") AS "num_pages" FROM
"app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" =
"app_book"."author_id") WHERE ("app_book"."published" >= 2012-11-01 AND
"app_book"."published" <= 2012-11-30 ) GROUP BY "app_author"."name"
}}}
Note that a query with just a single filter seems to work correctly:
{{{
qs = Author.objects.filter(books__published__gte=date(2012, 11, 1))
qs = qs.values('name').annotate(num_pages=Sum('books__pages'))
print qs.query
}}}
which gives:
{{{
SELECT "app_author"."name", SUM("app_book"."pages") AS "num_pages" FROM
"app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" =
"app_book"."author_id") WHERE "app_book"."published" >= 2012-11-01 GROUP
BY "app_author"."name"
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19415>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => closed
* needs_docs: => 0
* resolution: => invalid
* needs_tests: => 0
* needs_better_patch: => 0
Comment:
As far as I can make out, this is working as expected (for suitably
confusing values of expected :-).
The catch is that:
{{{
Author.objects.filter(books__published__gte=date(2012, 11,
1)).filter(books__published__lte=date(2012, 11, 30))
}}}
and
{{{
Author.objects.filter(books__published__gte=date(2012, 11, 1),
books__published__lte=date(2012, 11, 30))
}}}
are not interchangeable. By using a single filter clause, you're telling
Django to use a single join table. Splitting the filter means 2 joins are
used (hence the T3 table in the "wrong" query). The involvement of
aggregates here is a red herring; the underlying issue is the filters.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:1>
Comment (by svniemeijer@…):
Thanks! Combining everything into a single .filter() does indeed solve the
problem.
However, the aggregate part is not entirely a red herring and some
clarification in the documentation on aggregation could be useful to
prevent others making this mistake.
In the documentation topics/db/queries/#spanning-multi-valued-
relationships it is already clarified how subsequent filter() calls are
treated. However, it may be good to have a similar section in
topics/db/aggregation/#filtering-on-annotations that explains how
aggregates on multi-valued relationships behave in case multiple filter()
statements are used.
There is also no example that shows how to filter using multiple
conditions in topics/db/aggregation/#filtering-on-annotations . Adding
such an example (using a single .filter() call) would have probably
already guided me in the right direction.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:2>
* status: closed => reopened
* resolution: invalid =>
* component: ORM aggregation => Documentation
* easy: 0 => 1
* type: Bug => Cleanup/optimization
* stage: Unreviewed => Accepted
Comment:
That's a fair point. Reopening and accepting as a documentation fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:3>
* status: new => assigned
* owner: nobody => susan
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:5>
* owner: susan =>
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:6>
* version: 1.5-beta-1 => master
* easy: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:7>
* Attachment "19415Docs.diff" added.
* has_patch: 0 => 1
Comment:
Added a documentation patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:8>
Comment (by timgraham):
If you are able to submit a pull request rather than an attachment, that's
ideal. Just let me know if not.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:9>
Comment (by svniemeijer):
This documentation fragment does indeed get to the core of what is
happening, but I still had to read it several times to understand the
difference. To me ''and'' and ''as well as'' mean the same thing.
It may be useful to refer to the concepts of union and intersection,
because the first example gives an annotation on the intersection of books
matching the two conditions whereas the second one gives an annotation on
the union of them.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:10>
Comment (by benred42):
I can go ahead and make a PR. I'll also look into changing the wording to
be more transparent. Thanks for the feedback!
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:11>
Comment (by benred42):
Made a PR for the patch (PR 5729
https://github.com/django/django/pull/5729) and attempted to improve the
wording.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:12>
Comment (by svniemeijer):
Looks good to me. Thanks for the effort.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:13>
* needs_better_patch: 0 => 1
Comment:
The proposed example queries use local fields instead of relations, so I
think it doesn't quite address the ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:14>
Comment (by benred42):
PR has been updated with an example using queries across relationships
instead of local fields.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:15>
Comment (by benred42):
I did some more digging on this issue as I was working on editing my Doc
patch and I learned a bit more about what's going on. Consider the
following test case:
{{{
class Author(models.Model):
name = models.CharField(max_length=10)
class Book(models.Model):
title = models.TextField()
pages = models.IntegerField(null=True)
rating = models.IntegerField(null=True)
authors = models.ManyToManyField(Author)
a1 = Author.objects.create(name='a1')
a2 = Author.objects.create(name='a2')
a3 = Author.objects.create(name='a3')
a4 = Author.objects.create(name='a4')
b1 = Book.objects.create(title='b1', pages=100, rating=4)
b1.authors.set([a1])
b2 = Book.objects.create(title='b2', pages=50, rating=2)
b2.authors.set([a2])
b3 = Book.objects.create(title='b3', pages=150, rating=3)
b3.authors.set([a4])
b4 = Book.objects.create(title='b4', pages=400, rating=5)
b4.authors.set([a4])
b5 = Book.objects.create(title='b5', pages=100, rating=1)
b5.authors.set([a1,a4])
}}}
Now, let's look at two queries on that test case:
{{{
qs1 = Author.objects.filter(book__pages__lt=200,
book__pages__gt=100).annotate(num_book=Count('book'))
qs2 =
Author.objects.filter(book__pages__lt=200).filter(book__pages__gt=100).annotate(num_book=Count('book'))
}}}
The first query gives us the expected results:
{{{
for i in qs1:
print(i, i.num_book)
>>> a4, 1
print(qs1.query)
>>> SELECT "queries_author"."id", "queries_author"."name",
"queries_author"."num", "queries_author"."extra_id",
COUNT("queries_book_authors"."book_id") AS "num_book" FROM
"queries_author" INNER JOIN "queries_book_authors" ON
("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
"queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
WHERE ("queries_book"."pages" < 200 AND "queries_book"."pages" > 100)
GROUP BY "queries_author"."id", "queries_author"."name",
"queries_author"."num", "queries_author"."extra_id" ORDER BY
"queries_author"."name" ASC
}}}
The second query, however, gives some rather strange results (author4 only
has 3 books, not 4!):
{{{
for i in qs2:
print(i, i.num_book)
>>> a4, 4
print(qs2.query)
>>> SELECT "queries_author"."id", "queries_author"."name",
"queries_author"."num", "queries_author"."extra_id",
COUNT("queries_book_authors"."book_id") AS "num_book" FROM
"queries_author" INNER JOIN "queries_book_authors" ON
("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
"queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
INNER JOIN "queries_book_authors" T4 ON ("queries_author"."id" =
T4."author_id") INNER JOIN "queries_book" T5 ON (T4."book_id" = T5."id")
WHERE ("queries_book"."pages" < 200 AND T5."pages" > 100) GROUP BY
"queries_author"."id", "queries_author"."name", "queries_author"."num",
"queries_author"."extra_id" ORDER BY "queries_author"."name" ASC
}}}
So, the problem appears to be that `Count` is only looking at results from
`queries_book_authors` and cannot see results from the aliased tables
(`T4` and `T5`), thus it only counts from the first joins and not the
latter joins. This results, rather bizarrely, in the annotated value
being the ''product'' of the number of results meeting the first condition
(`"queries_book"."pages" < 200`) times the number of results meeting the
second condition (`T5."pages" > 100`). This result is neither intuitive
nor helpful, so I would suggest that, going forward, the documentation
should provide an example of the first query and a warning not to annotate
queries with multiple filters as in the second query since the results are
not what would be expected or useful.
NOTE: I have not yet tested this case with just a ForeignKey relationship,
so as of now the above statements can only be applied when filtering
across M2M relationships.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:16>
Comment (by timgraham):
Just wanted to drop a quick note to ask if the explanation in
https://docs.djangoproject.com/en/dev/topics/db/aggregation/#order-of-
annotate-and-filter-clauses about the "query bug" and `Count('...',
distinct=True)` might help.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:17>
Comment (by benred42):
That does change the results. Using the same setup I used above but
adding in `Count('...', distinct=True)` results in the following:
{{{
for i in qs1:
print(i, i.num_book)
>>> a4 1
print(qs1.query)
>>> SELECT "queries_author"."id", "queries_author"."name",
"queries_author"."num", COUNT(DISTINCT "queries_book_authors"."book_id")
AS "num_book" FROM "queries_author" INNER JOIN "queries_book_authors" ON
("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
"queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
WHERE ("queries_book"."pages" < 200 AND "queries_book"."pages" > 100)
GROUP BY "queries_author"."id", "queries_author"."name"
for i in qs2:
print(i, i.num_book)
>>> a4 2
print(qs2.query)
>>> SELECT "queries_author"."id", "queries_author"."name",
"queries_author"."num", COUNT(DISTINCT "queries_book_authors"."book_id")
AS "num_book" FROM "queries_author" INNER JOIN "queries_book_authors" ON
("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
"queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
INNER JOIN "queries_book_authors" T4 ON ("queries_author"."id" =
T4."author_id") INNER JOIN "queries_book" T5 ON (T4."book_id" = T5."id")
WHERE ("queries_book"."pages" < 200 AND T5."pages" > 100) GROUP BY
"queries_author"."id", "queries_author"."name"
}}}
So, this does appear to be better, but I still don't think it is giving a
result that is helpful or intuitive. It looks to me like the second query
is counting the same book as the first, just twice. What it does not
appear to be doing is giving a list of Authors with books having either
fewer than 200 pages or greater than 100 pages, nor is it giving a list of
Authors with books having between 100 and 200 pages.
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:18>
* cc: Simon Charette, Josh Smeaton (added)
Comment:
I don't have much experience constructing complex annotations so I'm not
sure if there cases where chains filters as in the second case might be
useful. Simon, Josh, others... any input?
--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:19>