[Django] #34393: A filter query returns more items than the original queryset provides after applying INNER JOIN

7 views

Skip to first unread message

Django

unread,

Mar 8, 2023, 2:59:38 PM3/8/23

to django-...@googlegroups.com

#34393: A filter query returns more items than the original queryset provides after
applying INNER JOIN
-------------------------------------+-------------------------------------
Reporter: Ľuboš | Owner: nobody
Mjachky |
Type: | Status: new
Uncategorized |
Component: Database | Version: 3.2
layer (models, ORM) | Keywords: filter query
Severity: Normal | duplicate distinct
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
In our project, we identified that the filter query returns more entries
than the number of entries stored in the initial queryset.

The following piece of code is involved:
{{{
# qs.count() == 4
scoped_repos = repo_viewset.get_queryset().values_list("pk", flat=True)
filtered_content = qs.filter(repositories__in=scoped_repos)
# filtered_content.count() == 8
}}}

The generated query:
{{{
SELECT * FROM "rpm_package" INNER JOIN "core_content" ON
("rpm_package"."content_ptr_id" = "core_content"."pulp_id")
INNER JOIN "core_repositorycontent" ON ("core_content"."pulp_id" =
"core_repositorycontent"."content_id")
WHERE "core_repositorycontent"."repository_id" IN (c35b7039-2c2c-48e3
-8f4f-b0eeabad8af1, ee39a78b-9dd5-4bdf-85d9-eb6406b6ef49)
}}}

One of the things being noticed is that the query is constructed with an
INNER JOIN clause instead of a LEFT JOIN clause. The
core_repositorycontent table contains a lot of duplicates. We believe that
this should not be a problem. Adding the distinct() query at the end of
the call resolves the issue. See
https://github.com/pulp/pulpcore/pull/3642.

The question is whether this is a bug in Django (i.e., a filter query can
return more elements than there are in the original queryset) or on our
side, and we should restructure the query in a specific way. Any advice is
welcome.

--
Ticket URL: <https://code.djangoproject.com/ticket/34393>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Mar 9, 2023, 12:07:29 AM3/9/23

to django-...@googlegroups.com

#34393: A filter query returns more items than the original queryset provides after
applying INNER JOIN
-------------------------------------+-------------------------------------

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => invalid

Comment:

This is a [https://docs.djangoproject.com/en/stable/topics/db/queries
/#spanning-multi-valued-relationships documented] behavior. You can use
`Subquery()` to prevent duplicates.