[Django] #28292: ORing on ManyToMany to same model can result in duplicate items in queryset


Django

Jun 9, 2017, 5:52:11 AM
to django-...@googlegroups.com
#28292: ORing on ManyToMany to same model can result in duplicate items in queryset
-------------------------------------+-------------------------------------
Reporter: Till Theato | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
One of my models has two ManyToManyFields to another model:
{{{
#!python
from django.db import models


class Bar(models.Model):
    created = models.DateTimeField(auto_now_add=True)


class Foo(models.Model):
    # Two independent many-to-many relations to the same target model.
    rel1 = models.ManyToManyField('Bar', related_name='foo_rel1')
    rel2 = models.ManyToManyField('Bar', related_name='foo_rel2')
}}}

I need to query all instances of Foo that point to the same Bar through
either of the two relationships. However, when a Foo instance also has
multiple relationships to other, unrelated Bar instances, the resulting
queryset contains that instance more than once:
{{{
#!python
from django.db.models import Q

b1 = Bar.objects.create()
b2 = Bar.objects.create()
b3 = Bar.objects.create()

f = Foo.objects.create()
f.rel1.set([b2, b3])
f.rel2.set([b1])

assert Foo.objects.filter(rel2=b1).count() == 1 # Good

assert Foo.objects.filter(rel1=b1).count() == 0 # Good

qs = Foo.objects.filter(Q(rel1=b1) | Q(rel2=b1))

assert qs.count() == 1 # AssertionError, qs.count() == 2

print(qs.values('rel1', 'rel2'))
# <QuerySet [{'rel1': 2, 'rel2': 1}, {'rel1': 3, 'rel2': 1}]>
}}}

As the last output shows, the queryset contains one row for each Bar
related through rel1.
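
To illustrate where the second row comes from, here is a minimal sketch using only the standard-library `sqlite3` module. The table names and the SQL are assumptions: they approximate the shape of query the ORM is assumed to emit for the `Q(rel1=b1) | Q(rel2=b1)` filter (two outer joins and an OR in the WHERE clause), not the exact SQL Django generates.

```python
import sqlite3

# Hypothetical stand-in schema: one table for Foo plus the two M2M join tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY);
    CREATE TABLE foo_rel1 (foo_id INTEGER, bar_id INTEGER);
    CREATE TABLE foo_rel2 (foo_id INTEGER, bar_id INTEGER);
    INSERT INTO foo VALUES (1);
    INSERT INTO foo_rel1 VALUES (1, 2), (1, 3);  -- f.rel1.set([b2, b3])
    INSERT INTO foo_rel2 VALUES (1, 1);          -- f.rel2.set([b1])
""")

# Joining both relations yields one row per (rel1, rel2) combination.
# Every combination satisfies the rel2 side of the OR, so all of them survive.
rows = conn.execute("""
    SELECT foo.id, foo_rel1.bar_id, foo_rel2.bar_id
    FROM foo
    LEFT OUTER JOIN foo_rel1 ON foo.id = foo_rel1.foo_id
    LEFT OUTER JOIN foo_rel2 ON foo.id = foo_rel2.foo_id
    WHERE foo_rel1.bar_id = 1 OR foo_rel2.bar_id = 1
    ORDER BY foo_rel1.bar_id
""").fetchall()

print(rows)
```

Under these assumptions the single Foo row appears twice, once per rel1 entry, which matches the count of 2 reported above.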

--
Ticket URL: <https://code.djangoproject.com/ticket/28292>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jun 9, 2017, 8:22:05 AM
to django-...@googlegroups.com
#28292: ORing on ManyToMany to same model can result in duplicate items in queryset
-------------------------------------+-------------------------------------
Reporter: Till Theato | Owner: nobody
Type: Bug | Status: closed
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* status: new => closed
* resolution: => invalid


Comment:

This is documented behavior. See the warning in the
[https://docs.djangoproject.com/en/dev/ref/models/querysets/#values
QuerySet.values() doc]:

Because `ManyToManyField` attributes and reverse relations can have
multiple related rows, including these can have a multiplier effect on the
size of your result set. This will be especially pronounced if you include
multiple such fields in your `values()` query, in which case all possible
combinations will be returned.

--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:1>

Django

Jun 9, 2017, 8:36:53 AM
to django-...@googlegroups.com

Comment (by Till Theato):

In the original code, no ManyToMany fields are included in a values() call.
I added that line only for illustration, which in hindsight was misleading.
Sorry about that.
The actual problem is in the statement before it, the failing assertion: the
count of the queryset is 2 instead of the expected 1.

--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:2>

Django

Jun 9, 2017, 8:59:56 AM
to django-...@googlegroups.com

Comment (by Tim Graham):

I believe you need to use
[https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct
QuerySet.distinct()]:

By default, a `QuerySet` will not eliminate duplicate rows. In practice,
this is rarely a problem, because simple queries such as
`Blog.objects.all()` don’t introduce the possibility of duplicate result
rows. However, if your query spans multiple tables, it’s possible to get
duplicate results when a `QuerySet` is evaluated. That’s when you’d use
`distinct()`.
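
As a rough illustration of what `distinct()` changes, the sketch below runs the same joined query with and without `DISTINCT`, using the standard-library `sqlite3` module. The schema and SQL are assumptions that approximate the query from the ticket; they are not the SQL Django itself produces.

```python
import sqlite3

# Hypothetical stand-in for the Foo table and its two M2M join tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY);
    CREATE TABLE foo_rel1 (foo_id INTEGER, bar_id INTEGER);
    CREATE TABLE foo_rel2 (foo_id INTEGER, bar_id INTEGER);
    INSERT INTO foo VALUES (1);
    INSERT INTO foo_rel1 VALUES (1, 2), (1, 3);
    INSERT INTO foo_rel2 VALUES (1, 1);
""")

# Shared FROM/WHERE part: the query spans multiple tables through joins.
JOINED = """
    FROM foo
    LEFT OUTER JOIN foo_rel1 ON foo.id = foo_rel1.foo_id
    LEFT OUTER JOIN foo_rel2 ON foo.id = foo_rel2.foo_id
    WHERE foo_rel1.bar_id = 1 OR foo_rel2.bar_id = 1
"""

plain = conn.execute("SELECT foo.id" + JOINED).fetchall()
distinct = conn.execute("SELECT DISTINCT foo.id" + JOINED).fetchall()

print(len(plain), len(distinct))
```

Only the selected columns take part in the comparison, so `DISTINCT` collapses the rows here because nothing but `foo.id` is selected.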

--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:3>

Django

Jun 9, 2017, 9:22:58 AM
to django-...@googlegroups.com

Comment (by Till Theato):

I am aware of QuerySet.distinct(); however, I could not believe that the
observed behavior was expected. Before reporting this bug, I had already
tried distinct() on the original query where the problem occurred, but it
did not help, because the query annotates a cumulative sum using a custom
Func:
{{{#!python
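from django.db.models import Func

class CumSum(Func):
    # Rendered as a window function: sum(...) OVER (ORDER BY created)
    function = 'sum'
    template = '%(function)s(%(expressions)s) OVER (ORDER BY created)'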
}}}
The annotation is evaluated before distinctness is enforced, so the
duplicated rows are included in the sum and the CumSum values are wrong.
It looks like I will have to compute the sum outside the database, then,
if the reported behavior really is expected.
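
The interaction between the window function and DISTINCT can be sketched with the standard-library `sqlite3` module (window functions need SQLite 3.25 or newer). Everything here is an assumption made for illustration: the `amount` and `created` columns on `foo`, the second Foo row, and the SQL itself, which only approximates what the ORM would emit. The second query shows one possible way to keep the filtering inside the database, by matching ids through a subquery instead of joins so that no rows are duplicated before the window function runs.

```python
import sqlite3

# Hypothetical schema: `amount` and `created` are illustrative columns only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, amount INTEGER, created INTEGER);
    CREATE TABLE foo_rel1 (foo_id INTEGER, bar_id INTEGER);
    CREATE TABLE foo_rel2 (foo_id INTEGER, bar_id INTEGER);
    INSERT INTO foo VALUES (1, 10, 1), (2, 5, 2);
    INSERT INTO foo_rel1 VALUES (1, 2), (1, 3), (2, 1);
    INSERT INTO foo_rel2 VALUES (1, 1);
""")

# Joined form: the window sum runs over the duplicated rows first,
# and DISTINCT is applied only afterwards.
joined = conn.execute("""
    SELECT DISTINCT foo.id,
           SUM(foo.amount) OVER (ORDER BY foo.created) AS running
    FROM foo
    LEFT OUTER JOIN foo_rel1 ON foo.id = foo_rel1.foo_id
    LEFT OUTER JOIN foo_rel2 ON foo.id = foo_rel2.foo_id
    WHERE foo_rel1.bar_id = 1 OR foo_rel2.bar_id = 1
    ORDER BY foo.id
""").fetchall()

# Subquery form: each foo row appears once, so the running sum is not inflated.
deduped = conn.execute("""
    SELECT foo.id,
           SUM(foo.amount) OVER (ORDER BY foo.created) AS running
    FROM foo
    WHERE foo.id IN (
        SELECT foo_id FROM foo_rel1 WHERE bar_id = 1
        UNION
        SELECT foo_id FROM foo_rel2 WHERE bar_id = 1
    )
    ORDER BY foo.id
""").fetchall()

print(joined)
print(deduped)
```

In the joined form the amount of Foo 1 is counted twice in every running total; in the subquery form it is counted once. Whether the same restructuring is practical through the ORM depends on the original query, which is not shown in this thread.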

--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:4>

Django

Jun 9, 2017, 9:59:22 AM
to django-...@googlegroups.com

Comment (by Tim Graham):

I'll admit that I don't have a deep understanding of the ORM. If you can
offer a patch showing why Django is at fault, feel free to reopen the
ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:5>
