class Foo(models.Model):
rel1 = models.ManyToManyField('Bar', related_name='foo_rel1')
rel2 = models.ManyToManyField('Bar', related_name='foo_rel2')
}}}
I have to query all instances of Foo that point to the same Bar via any of
the both relationships. However when an instance of Foo had multiple
relationships to unrelated Bar instances, I got duplicated instances in
the resulting queryset:
{{{
#!python
b1 = Bar.objects.create()
b2 = Bar.objects.create()
b3 = Bar.objects.create()
f = Foo.objects.create()
f.rel1.set([b2,b3])
f.rel2.set([b1])
assert Foo.objects.filter(rel2=b1).count() == 1 # Good
assert Foo.objects.filter(rel1=b1).count() == 0 # Good
qs = Foo.objects.filter(Q(rel1=b1) | Q(rel2=b1))
assert qs.count() == 1 # AssertionError, qs.count() == 2
print(qs.values('rel1', 'rel2'))
# <QuerySet [{'rel1': 2, 'rel2': 1}, {'rel1': 3, 'rel2': 1}]>
}}}
As the last output shows, one entry per relationship in rel1 is returned.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => closed
* resolution: => invalid
Comment:
This is documented behavior. See the warning in the
[https://docs.djangoproject.com/en/dev/ref/models/querysets/#values
QuerySet.values() doc]:
Because `ManyToManyField` attributes and reverse relations can have
multiple related rows, including these can have a multiplier effect on the
size of your result set. This will be especially pronounced if you include
multiple such fields in your `values()` query, in which case all possible
combinations will be returned.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:1>
Comment (by Till Theato):
In the original code no ManyToMany fields are included in the values call.
I added this only for documentation purposes, but that appears to make no
sense. Sorry about that.
The actual problem lies one line ahead in the failing assertion: The count
of the queryset is 2 instead of the expected 1.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:2>
Comment (by Tim Graham):
I believe you need to use
[https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct
QuerySet.distinct()]:
By default, a `QuerySet` will not eliminate duplicate rows. In practice,
this is rarely a problem, because simple queries such as
`Blog.objects.all()` don’t introduce the possibility of duplicate result
rows. However, if your query spans multiple tables, it’s possible to get
duplicate results when a `QuerySet` is evaluated. That’s when you’d use
`distinct()`.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:3>
Comment (by Till Theato):
I am aware of Queryset.distinct(), however I could not believe that the
observed behavior was expected. Before reporting this bug I already tried
using distinct on the original query where the problem occurred, but this
did not help because I calculate a cumulative sum in an annotation of the
query using a custom Func:
{{{#!python
class CumSum(Func):
function = 'sum'
template = '%(function)s(%(expressions)s) OVER (ORDER BY created)'
}}}
The annotation happens before distinctness is enforced and therefore the
CumSum of entries after duplication is wrong.
Looks like I will have to pull that out of DB then – if the reported
problem is really expected behavior.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:4>
Comment (by Tim Graham):
I'll admit that I don't have a deep understanding of the ORM. If you can
offer a patch showing why Django is at fault, feel free to reopen the
ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/28292#comment:5>