Is this a bug on annotate+filter+annotate over M2M field.

80 views
Skip to first unread message

Todor Velichkov

unread,
Jun 7, 2017, 4:38:26 PM6/7/17
to Django users
Hello, everyone, I just want to get some more feedback on this, before posting it into the tracker.

So according to the docs, the order of annotate and filter matters, since they are not commutative operations.

The annotation is over the full queryset if applied before .filter, and the annotation gets filtered if applied after .filter
We can even use both strategies together, i.e.

publishers = (Publisher.objects
                       
.annotate(num_books=Count('book', distinct=True))
                       
.filter(book__rating__gt=3.0)
                       
.annotate(num_rated=Count('book', distinct=True))
                       
.filter(num_books=F('num_rated'))
)

will produce the following SQL

SELECT
   
"store_publisher"."id",
   
"store_publisher"."name",
   
"store_publisher"."num_awards",
    COUNT
(DISTINCT "store_book"."id") AS "num_books",
    COUNT
(DISTINCT T3."id") AS "num_rated"
FROM
"store_publisher"
LEFT OUTER JOIN
"store_book" ON ("store_publisher"."id" = "store_book"."publisher_id")
INNER JOIN
"store_book" T3 ON ("store_publisher"."id" = T3."publisher_id")
WHERE
    T3
."rating" > 3.0
GROUP BY
   
"store_publisher"."id",
   
"store_publisher"."name",
   
"store_publisher"."num_awards"
HAVING COUNT
(DISTINCT "store_book"."id") = (COUNT(DISTINCT T3."id"))

which is exactly what I expected.

However, in this example the annotation + filter is done over a ForeignKey, if do this over a M2M field:

stores = (Store.objects
               
.annotate(num_books=Count('books', distinct=True))
               
.filter(books__rating__gt=3.0)
               
.annotate(num_rated=Count('books', distinct=True))
               
.filter(num_books=F('num_rated'))
)

we get the following SQL:

SELECT
   
"store_store"."id",
   
"store_store"."name",
   
"store_store"."registered_users",
    COUNT
(DISTINCT "store_store_books"."book_id") AS "num_books",
    COUNT
(DISTINCT "store_store_books"."book_id") AS "num_rated"
FROM
"store_store"
LEFT OUTER JOIN
"store_store_books" ON ("store_store"."id" = "store_store_books"."store_id")
INNER JOIN
"store_store_books" T4 ON ("store_store"."id" = T4."store_id")
INNER JOIN
"store_book" T5 ON (T4."book_id" = T5."id")
WHERE
    T5
."rating" > 3.0
GROUP BY
   
"store_store"."id",
   
"store_store"."name",
   
"store_store"."registered_users"
HAVING
    COUNT
(DISTINCT "store_store_books"."book_id") = (COUNT(DISTINCT "store_store_books"."book_id"))

which is incorrect compared to the first one.
The second annotation is not over the filtered set (i.e. its not using the T4/T5 tables).
What I would have expected here, is the same result as the previous query with publishers.

Any thoughts?

PS. I'm testing this on Django 1, 11, 1

Todor Velichkov

unread,
Jun 12, 2017, 1:50:22 AM6/12/17
to Django users
I believe issue #28297 is related to the problem described above.

https://code.djangoproject.com/ticket/28297

Reply all
Reply to author
Forward
0 new messages