[Django] #31535: Calling annotate with a case statement that references a recursive many to many relationship on the same model as the queryset will create duplicates

25 views
Skip to first unread message

Django

unread,
May 4, 2020, 10:43:51 AM5/4/20
to django-...@googlegroups.com
#31535: Calling annotate with a case statement that references a recursive many to
many relationship on the same model as the queryset will create duplicates
-------------------------------------+-------------------------------------
Reporter: david- | Owner: nobody
cooke |
Type: Bug | Status: new
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
So I have a Challenge model that has an unlocks field which is a
recursive, asymmetric many to many field with a reverse name of
unlocked_by. I call annotate on it to add an "unlocked" field which checks
if the challenges unlocked_by contains any ids in a list of solved
challenges, doing this inserts all the solved challenges into the queryset
that annotate is being called on. There's an example of this here:
https://github.com/david-cooke/djangobug/blob/master/bug/tests.py

The sql being run is
{{{
SELECT "bug_challenge"."id", "bug_challenge"."name", CASE WHEN
"bug_challenge_unlocks"."from_challenge_id" IN (5, 1) THEN True ELSE False
END AS "unlocked" FROM "bug_challenge" LEFT OUTER JOIN
"bug_challenge_unlocks" ON ("bug_challenge"."id" =
"bug_challenge_unlocks"."to_challenge_id")
}}}
so it seems like the left outer join is adding the duplicates.

It is possible to work around this by calling
value_list(flat=True).distinct('pk') however isn't compatible with more
complicated examples such as
https://github.com/ractf/core/blob/master/challenge/views.py#L50

--
Ticket URL: <https://code.djangoproject.com/ticket/31535>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 4, 2020, 10:45:24 AM5/4/20
to django-...@googlegroups.com
#31535: Calling annotate with a case statement that references a recursive many to
many relationship on the same model as the queryset will create duplicates
-------------------------------------+-------------------------------------
Reporter: david-cooke | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by david-cooke:

Old description:

> So I have a Challenge model that has an unlocks field which is a
> recursive, asymmetric many to many field with a reverse name of
> unlocked_by. I call annotate on it to add an "unlocked" field which
> checks if the challenges unlocked_by contains any ids in a list of solved
> challenges, doing this inserts all the solved challenges into the
> queryset that annotate is being called on. There's an example of this
> here: https://github.com/david-cooke/djangobug/blob/master/bug/tests.py
>
> The sql being run is
> {{{
> SELECT "bug_challenge"."id", "bug_challenge"."name", CASE WHEN
> "bug_challenge_unlocks"."from_challenge_id" IN (5, 1) THEN True ELSE
> False END AS "unlocked" FROM "bug_challenge" LEFT OUTER JOIN
> "bug_challenge_unlocks" ON ("bug_challenge"."id" =
> "bug_challenge_unlocks"."to_challenge_id")
> }}}
> so it seems like the left outer join is adding the duplicates.
>
> It is possible to work around this by calling
> value_list(flat=True).distinct('pk') however isn't compatible with more
> complicated examples such as
> https://github.com/ractf/core/blob/master/challenge/views.py#L50

New description:

So I have a Challenge model that has an unlocks field which is a
recursive, asymmetric many to many field with a reverse name of
unlocked_by. I call annotate on it to add an "unlocked" field which checks
if the challenges unlocked_by contains any ids in a list of solved
challenges, doing this inserts all the solved challenges into the queryset
that annotate is being called on. There's an example of this here:
https://github.com/david-cooke/djangobug/blob/master/bug/tests.py

The sql being run is
{{{
SELECT "bug_challenge"."id", "bug_challenge"."name", CASE WHEN
"bug_challenge_unlocks"."from_challenge_id" IN (5, 1) THEN True ELSE False
END AS "unlocked" FROM "bug_challenge" LEFT OUTER JOIN
"bug_challenge_unlocks" ON ("bug_challenge"."id" =
"bug_challenge_unlocks"."to_challenge_id")
}}}

so it seems like the left outer join is adding the duplicates although
thats needed for the case statement.

It is possible to work around this by calling
value_list(flat=True).distinct('pk') however isn't compatible with more
complicated examples such as
https://github.com/ractf/core/blob/master/challenge/views.py#L50

--

--
Ticket URL: <https://code.djangoproject.com/ticket/31535#comment:1>

Django

unread,
May 4, 2020, 12:08:38 PM5/4/20
to django-...@googlegroups.com
#31535: Calling annotate with a case statement that references a recursive many to
many relationship creates duplicates.
-------------------------------------+-------------------------------------
Reporter: David Cooke | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* status: new => closed
* resolution: => invalid


Comment:

This is documented behavior (see the warning in the
[https://docs.djangoproject.com/en/dev/ref/models/querysets/#values
QuerySet.values() docs]):

Because `ManyToManyField` attributes and reverse relations can have
multiple related rows, including these can have a multiplier effect on the
size of your result set. This will be especially pronounced if you include
multiple such fields in your `values()` query, in which case all possible
combinations will be returned.

You can use `Subquery()` or `.distinct(...)` to avoid duplicate.

--
Ticket URL: <https://code.djangoproject.com/ticket/31535#comment:2>

Django

unread,
May 4, 2020, 12:19:35 PM5/4/20
to django-...@googlegroups.com
#31535: Calling annotate with a case statement that references a recursive many to
many relationship creates duplicates.
-------------------------------------+-------------------------------------
Reporter: David Cooke | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

The ORM is able to optimize `.filter(unlocked_by__in=solved_challenges)`
into an `INNER JOIN` but it cannot do the same for annotations because it
could be doing some special handling of `NULL` and it would require some
pretty sophisticated introspection to determine it's safe to do. It would
also be backward incompatible at this point. You should rely on `Subquery`
or even `Exists` in your reported case instead.

--
Ticket URL: <https://code.djangoproject.com/ticket/31535#comment:3>

Reply all
Reply to author
Forward
0 new messages