Re: [Django] #15819: Admin searches should use distinct, if query involves joins

26 views
Skip to first unread message

Django

unread,
Apr 2, 2012, 8:15:39 AM4/2/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: SVN
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by johnfink8):

* status: closed => reopened
* ui_ux: => 0
* resolution: fixed =>
* version: 1.3 => SVN


Comment:

Still produces duplicate rows under the right circumstances. The
lookup_needs_distinct check only goes one join deep looking for a
ManyToMany or a reverse to a ForeignKey, but if the search field is more
deeply nested than that than the possibility of needing distinct() rises
sharply.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:16>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 2, 2012, 8:56:04 AM4/2/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: SVN
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

Maybe the queryset should expose some way to tell if distinct should be
used? The queryset should contain that logic, the admin should not try to
duplicate / guess that. It would be pretty easy to set a flag
"contains_multijoin" whenever such a join is seen in
sql/query/setup_joins, and after that Admin could just do if
qs.query.contains_multijoin: qs = qs.distinct().

It would be possible to generate queries in a way that distinct isn't
needed at all. This can be done using subqueries instead of joins for
reverse fk/m2m filters. Unfortunately some backends will likely choke on
such queries (while others will benefit) so just changing the ORM to use
subqueries isn't possible, or at least the backwards compatibility issues
should be investigated thoroughly. In addition, changing the ORM code to
generate such queries isn't the easiest thing to do, so that is another
problem for this path.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:17>

Django

unread,
Apr 2, 2012, 11:02:04 AM4/2/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: SVN
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

Here is a quick patch moving the "needs distinct" logic into the sql/query
class. No new tests, but I am pretty sure it works correctly even for
deeper join structures.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:18>

Django

unread,
Aug 9, 2012, 2:39:48 PM8/9/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by hcarvalhoalves):

The problem, though, is that `DISTINCT` is '''painfully slow''' on most
databases. A query like...

{{{
SELECT DISTINCT "mytable"."id", ... ORDER BY "mytable"."somefield" DESC
LIMIT 100
}}}

..will force a full table scan and query time will increase linearly with
the number of rows. Django shouldn't ''guess'' when to use it internally
without a way for client code to override it, ''specially'' if this logic
is moved to the ORM.

This currently happens on `contrib.admin` if you include a M2M relation in
the `list_filters`, and makes the `changelist` views unusable if your
database has more than a couple hundred queries. See ticket #18729 for
that.

More often than not, you want to optimize the query to use a subquery
(giving hints to the query planner because it increases selectivity), or
querying ''without'' `DISTINCT` and removing duplicates on Python side.
Here's a case study on optimizing queries to avoid use of `DISTINCT`:
http://www.databasejournal.com/features/postgresql/article.php/3437821
/SELECT-DISTINCT-A-SQL-Case-Study.htm

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:19>

Django

unread,
Aug 9, 2012, 4:26:42 PM8/9/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

Yeah, I agree about distinct being slow, and as mentioned in the article
it is usually a sign of badly written query.

I find it hard to believe doing the distinct operation on Python side
would lead to a net win. Databases are written in C and they will be
faster at doing the distinct than transferring the whole dataset to Python
and then doing the distinct on that side.

Making the ORM to do transformation of m2m filter to subquery should be
easy in the trivial cases, as we already do them for m2m exclude joins.
The harder problem is doing them for queries like this:
{{{
qs.filter(Q(m2m_rel__foo='Bar')|Q(m2m_rel__foo='Baz'))
}}}
The problem is that lookups inside one filter should target the same join
(IIRC), but the transform-to-subquery logic doesn't know anything about
that.

Still, as a long term goal it would be nice to move away from the current
confusing situation where filters to m2m can create duplicate results.
However, doing that cleanly in the ORM is hard, and then there are
backwards compatibility issues too.

There is a closely related issue: use aggregations and m2m filters in the
same query -> wrong results.

So, for the time being, I think we will just need to use the ORM to detect
when to use distinct. It should be a small step forward. I hope we can
have automatic subqueries, but this isn't likely to happen in the close
future.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:20>

Django

unread,
Aug 14, 2012, 7:27:11 PM8/14/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by hcarvalhoalves):

Replying to [comment:20 akaariai]:

> Yeah, I agree about distinct being slow, and as mentioned in the article
it is usually a sign of badly written query.
>
> I find it hard to believe doing the distinct operation on Python side
would lead to a net win.


Depends completely on the use case. If I'm querying 1 million tuples with
`DISTINCT ... LIMIT 100` my database dies because that's a full table scan
(twice) just to retrieve 100. If I query without `DISTINCT`, I send 100
tuples over the wire and remove duplicates in Python - that's trivial.

That's what I mean about the ORM not ''guessing'' when to use `DISTINCT`,
specially if it's going to happen deep inside ORM code without a clear way
to override.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:21>

Django

unread,
Sep 7, 2012, 10:08:23 AM9/7/12
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: reopened
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by tswicegood):

#18729 was opened as a dupe of this.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:22>

Django

unread,
Apr 13, 2013, 12:59:02 PM4/13/13
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: new

Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by ikelly):

The DISTINCT also causes an Oracle error if the model contains a
TextField. This was reported on django-users:

https://groups.google.com/forum/?fromgroups=#!topic/django-
users/eYTdpy4eWLQ

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:24>

Django

unread,
May 23, 2013, 5:41:36 AM5/23/13
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: new
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by deschler):

* cc: eschler@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:25>

Django

unread,
Aug 28, 2013, 11:05:53 AM8/28/13
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: new
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by iorlas):

* needs_tests: 0 => 1


Comment:

#20991 - one more dupe of this bug..., mkay

So, I'll try to write few tests on this week.
Is it that patch that should be fine?
https://code.djangoproject.com/attachment/ticket/15819/query_contains_multijoin.diff

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:26>

Django

unread,
Aug 28, 2013, 11:09:50 AM8/28/13
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: ryankask
<aip@…> | Status: new
Type: Bug | Version: master
Component: contrib.admin | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 1

Needs tests: 1 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by timo):

* needs_better_patch: 0 => 1


Comment:

Yes, the patch needs to be updated to apply cleanly. You can probably
model the test for the reverse foreign key situation on the one that was
added in [065e6b6153].

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:27>

Django

unread,
May 28, 2021, 6:45:13 AM5/28/21
to django-...@googlegroups.com
#15819: Admin searches should use distinct, if query involves joins
-------------------------------------+-------------------------------------
Reporter: Adam Kochanowski | Owner: Ryan
<aip@…> | Kaskel
Type: Bug | Status: closed
Component: contrib.admin | Version: dev
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => fixed


Comment:

Nested case was fixed in d084176cc1273d5faf6f88eedb4c490e961f3a68.

--
Ticket URL: <https://code.djangoproject.com/ticket/15819#comment:28>

Reply all
Reply to author
Forward
0 new messages