Re: [Django] #2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates


Django

Jun 8, 2016, 8:58:33 PM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: adrian
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* Attachment "2361-test.diff" added.


--
Ticket URL: <https://code.djangoproject.com/ticket/2361>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

May 15, 2018, 3:37:52 PM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Calvin
| DeBoer
Type: Bug | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Calvin DeBoer):

* owner: Adrian Holovaty => Calvin DeBoer
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:6>

Django

Sep 24, 2020, 5:33:48 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Calvin
| DeBoer
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Almaz Kunpeissov):

* status: assigned => closed
* resolution: => fixed


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:8>

Django

Sep 24, 2020, 5:56:31 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Calvin
| DeBoer
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* status: closed => new
* resolution: fixed =>


Comment:

Almaz, thanks for checking this ticket; however, it still doesn't work for
me. Have you checked the
[https://code.djangoproject.com/attachment/ticket/2361/2361-test.diff
test] attached by Tim? It returns duplicates on master:
{{{
>>> Item.objects.filter(tags__isnull=False)
[<Item: four>, <Item: one>, <Item: one>, <Item: two>, <Item: two>]
}}}
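
A minimal sketch of where the duplicates come from, assuming the filter compiles to roughly an inner join against the m2m through table. The table names (`item`, `tag`, `item_tags`) and the rows are hypothetical, chosen to mirror the output above. The SQL is hand-written with the standard-library `sqlite3` module and is not the exact statement the ORM generates.

```python
import sqlite3

# Hypothetical schema: items, tags, and an m2m through table linking them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER);
    INSERT INTO item VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    INSERT INTO tag VALUES (1, 't1'), (2, 't2'), (3, 't3'), (4, 't4');
    -- 'one' and 'two' each have two tags, 'four' has one, 'three' has none.
    INSERT INTO item_tags VALUES (1, 1), (1, 2), (2, 1), (2, 3), (4, 4);
""")

# Approximation of filter(tags__isnull=False): the join yields one row per
# (item, tag) pair, so an item with two tags appears twice.
join_sql = """
    SELECT item.name
    FROM item
    INNER JOIN item_tags ON item.id = item_tags.item_id
    WHERE item_tags.tag_id IS NOT NULL
    ORDER BY item.name
"""
joined_names = [row[0] for row in conn.execute(join_sql)]
print(joined_names)
```

Under these assumptions, each item is repeated once per tag it links to, which matches the shape of the result shown above.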

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:9>

Django

Jun 29, 2022, 5:01:51 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Calvin
| DeBoer
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aleks-Daniel Jakimenko-Aleksejev):

Hit the same bug in 3.2.13 (and was *very* surprised!). Adding .distinct()
helps, but it feels like a workaround.

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:10>

Django

Jul 22, 2022, 6:14:50 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: (none)
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: Calvin DeBoer => (none)
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:11>

Django

Jul 22, 2022, 6:14:55 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:12>

Django

Sep 24, 2022, 11:29:08 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Norbert Stüken):

* owner: (none) => Norbert Stüken
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:13>

Django

Sep 24, 2022, 12:36:09 PM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Norbert Stüken):

I was able to reproduce the problem in Django 4.2.dev20220923141503 and
have adapted and extended Tim's test a bit:

{{{
def test_ticket_2361(self):
    """
    Prove that QuerySet.filter(m2mfield__isnull=False) may return
    duplicates.
    """
    # Items with tags are returned, but several times if they link to
    # multiple tags.
    self.assertQuerysetEqual(
        Item.objects.filter(tags__isnull=False),
        ['<Item: four>', '<Item: one>', '<Item: one>', '<Item: two>',
         '<Item: two>'],
        transform=repr
    )

    # Adding distinct helps, but feels like a workaround
    self.assertQuerysetEqual(
        Item.objects.filter(tags__isnull=False).distinct(),
        ['<Item: four>', '<Item: one>', '<Item: two>'],
        transform=repr
    )
}}}

To finally close the ticket after 16 years, it needs a decision from the
Django team with the following options:

1. We change the ORM in such a way that it automatically executes a
`distinct()` in the described edge case. **This would be a breaking
change**.
2. We don't change anything in the Django code, but add a note to the
Django documentation that the unexpected behavior may occur in this
particular case and can be fixed with a `distinct()`.
3. Another solution is found.
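
As a rough illustration of what the `distinct()` workaround in option 2 amounts to at the SQL level, here is a sketch using the standard-library `sqlite3` module. The schema and rows are hypothetical, and the queries only approximate what the ORM would emit.

```python
import sqlite3

# Hypothetical schema and data: 'one' and 'two' have two tags each,
# 'four' has one tag, 'three' has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER);
    INSERT INTO item VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    INSERT INTO item_tags VALUES (1, 1), (1, 2), (2, 1), (2, 3), (4, 4);
""")

base_sql = """
    SELECT {distinct} item.name
    FROM item
    INNER JOIN item_tags ON item.id = item_tags.item_id
    WHERE item_tags.tag_id IS NOT NULL
    ORDER BY item.name
"""

# Without DISTINCT: one row per (item, tag) pair.
plain_names = [r[0] for r in conn.execute(base_sql.format(distinct=""))]
# With DISTINCT: the database collapses identical rows after the join.
distinct_names = [r[0] for r in conn.execute(base_sql.format(distinct="DISTINCT"))]

print(plain_names)
print(distinct_names)
```

The join still produces every (item, tag) row in both cases; `DISTINCT` only removes the repeats afterwards, which is why it reads as a workaround rather than a fix.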

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:14>

Django

Sep 27, 2022, 11:46:53 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

Hi Norbert. Thanks for looking at this!

An option may be to document the current situation whilst also waiting for
the other solution, so 2 and 3.

Would you like to prepare a suggestion for 2?

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:15>

Django

Oct 26, 2022, 6:59:53 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Norbert Stüken):

Hi Carlton,

It took me a while to work through the contributor documentation, but here
are my suggested changes:
https://github.com/django/django/compare/main...stueken:django:ticket_2361

The first commit proves the mentioned behavior in Django, the second adds
a note with an example to the documentation.

Could you have a look at it?

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:16>

Django

Oct 26, 2022, 7:19:16 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

The current behavior is already
[https://docs.djangoproject.com/en/stable/topics/db/queries/#spanning-multi-valued-relationships documented]
in the ''"Spanning multi-valued relationships"'' section. I think there is no
need to document the same thing twice 🤔.

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:17>

Django

Oct 26, 2022, 8:31:52 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Norbert Stüken):

Hi Mariusz,

Both examples address potential duplicates when using m2m relationships.
However, the already documented example at the end of the "Spanning
multi-valued relationships" section states that the duplicates arise
because multiple filters are chained, which results in multiple joins
where a single filter would use only one.

My example does not involve multiple filters. It shows the quite
unexpected behavior of m2m relations with just one filter, which is the
reason this ticket was created.
I can't derive this behavior from the example in "Spanning multi-valued
relationships".

As stated, in my opinion the two examples show different scenarios in
which m2m relationships can yield duplicates. Maybe linking the examples
to each other and explaining their difference would make it clearer.
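
To make the difference between the two scenarios concrete, here is a hedged sketch with the standard-library `sqlite3` module. It assumes that chained filters on a multi-valued relation each get their own join, while a single filter uses one join. The `blog` and `entry` tables, their columns, and the rows are hypothetical, and the SQL is hand-written rather than taken from the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY, blog_id INTEGER, headline TEXT, pub_year INTEGER
    );
    INSERT INTO blog VALUES (1, 'a'), (2, 'b');
    -- Blog 'a' has two entries; blog 'b' has none.
    INSERT INTO entry VALUES
        (1, 1, 'Lennon tribute', 2008),
        (2, 1, 'Lennon live', 2008);
""")


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Scenario 1, filter(A).filter(B): two independent joins, so matching rows
# multiply (2 entries x 2 entries = 4 rows for blog 'a').
chained = names("""
    SELECT blog.name FROM blog
    INNER JOIN entry e1 ON e1.blog_id = blog.id
    INNER JOIN entry e2 ON e2.blog_id = blog.id
    WHERE e1.headline LIKE '%Lennon%' AND e2.pub_year = 2008
""")

# Scenario 2, a single isnull-style filter: one join, yet blog 'a' still
# appears once per related entry.
single_isnull = names("""
    SELECT blog.name FROM blog
    INNER JOIN entry ON entry.blog_id = blog.id
    WHERE entry.id IS NOT NULL
""")

print(len(chained), len(single_isnull))
```

In this sketch both queries repeat blog `a`, but for different reasons: the first because of the extra join, the second simply because one join already returns a row per related object.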

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:18>

Django

Oct 26, 2022, 9:05:02 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

FWIW I also didn't read the "Spanning multi-valued relationships"
`filter(A).filter(B)` vs `filter(A, B)` example as entailing the
`filter(m2m__isnull)` example here.
(I'd have to sit down with a piece of paper to see why that's equivalent.
🙂)

Maybe expanding that section though, so it's on topic (by section title at
least)?

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:19>

Django

Oct 26, 2022, 10:22:00 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Norbert Stüken):

Thanks for the quick replies!

I like your names for the examples, very clear (they could be used as
the example names):
- Example 1: `filter(A).filter(B)` vs `filter(A, B)`
- Example 2: `filter(m2m__isnull)`

I think, both examples are placed correctly under the respective headings
and would possibly be missed if put below the other section:
- **Making Queries** > **Retrieving objects** > **Lookups that span
relationships** > **Spanning multi-valued relationships** --> Note with
example 1 for possible duplicates when using multiple filters
- **Making Queries** > **Related objects** > **Many-to-many
relationships** --> Note with example 2 for possible duplicates when using
`isnull` in a filter

So if the examples address different causes for duplicate entries in m2m
relationships, I would prefer to link the examples to each other.

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:20>

Django

Oct 6, 2023, 10:16:39 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Fabio Sangiovanni):

* cc: Fabio Sangiovanni (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:21>

Django

Aug 26, 2024, 10:37:06 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Csirmaz Bendegúz):

I think this is a well-known Django "gotcha". {{{distinct()}}} has a big
performance impact, so I would definitely advise against it.
--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:22>

Django

Aug 26, 2024, 1:27:42 PM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: Norbert
| Stüken
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

FWIW the most straightforward way to work around this issue today is to
use `Exists` instead

{{{#!python
from django.db.models import Exists, OuterRef

Blog.objects.filter(
    Exists(Entry.objects.filter(blog=OuterRef("pk")))
)
}}}

Unfortunately, the framework doesn't allow you to register transforms on
related field lookups; otherwise this could be as simple as

{{{#!python
Blog.objects.filter(entries__exists=True)
}}}

Here's [https://github.com/django/django/compare/main...charettes:django:exists-m2m-lookup
a very stale and early attempt] at playing with this concept
if anyone is interested in trying.

My thinking at the time was that allowing such transforms could greatly
reduce the boilerplate associated with multi-valued relationships and
filtering, in the hope of eventually resolving #2361.

For example

{{{#!python
Blog.objects.filter(entries__exists=Q(published=True))
# Instead of
Blog.objects.filter(
    Exists(Entry.objects.filter(blog=OuterRef("pk"), published=True))
)

Blog.objects.filter(entries__count__gte=10)
# Instead of
Blog.objects.annotate(
    entries__count=Count("entries")
).filter(entries__count__gte=10)
}}}
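
For comparison, the `Exists` workaround corresponds roughly to a correlated `EXISTS` subquery, which tests for related rows without joining them into the result. The sketch below uses the standard-library `sqlite3` module with a hypothetical `blog`/`entry` schema and made-up rows. It approximates the SQL shape only and is not the statement the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, published INTEGER);
    INSERT INTO blog VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Blog 'a' has two entries (one published), 'b' has one unpublished
    -- entry, 'c' has none.
    INSERT INTO entry VALUES (1, 1, 1), (2, 1, 0), (3, 2, 0);
""")

# Semi-join: each blog is tested once, so it can appear at most once.
exists_sql = """
    SELECT blog.name FROM blog
    WHERE EXISTS (SELECT 1 FROM entry WHERE entry.blog_id = blog.id)
    ORDER BY blog.name
"""
with_entries = [row[0] for row in conn.execute(exists_sql)]

# The same shape with an extra condition inside the subquery.
published_sql = """
    SELECT blog.name FROM blog
    WHERE EXISTS (
        SELECT 1 FROM entry
        WHERE entry.blog_id = blog.id AND entry.published = 1
    )
    ORDER BY blog.name
"""
with_published = [row[0] for row in conn.execute(published_sql)]

print(with_entries, with_published)
```

Because the subquery only answers yes or no per blog, no deduplication step is needed afterwards.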
--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:23>

Django

Oct 24, 2025, 3:58:25 AM
to django-...@googlegroups.com
#2361: QuerySet.filter(m2mfield__isnull=False) may return duplicates
-------------------------------------+-------------------------------------
Reporter: daniel.tietze@… | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: Norbert Stüken => (none)
* status: assigned => new

--
Ticket URL: <https://code.djangoproject.com/ticket/2361#comment:24>