[Django] #21760: prefetch_related uses an inefficient query to get the related data

12 views
Skip to first unread message

Django

unread,
Jan 9, 2014, 11:55:23 PM1/9/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related data
----------------------------------------------+----------------------------
Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.6
Severity: Normal | Keywords:
Triage Stage: Unreviewed | prefetch_related
Easy pickings: 0 | Has patch: 0
| UI/UX: 0
----------------------------------------------+----------------------------
Consider the following models and query:

{{{
#!python

class Author(models.Model):
name = models.CharField(max_length = 20)

class Book(models.Model):
name = models.CharField(max_length = 20)
author = models.ForeignKey(Author, related_name = 'books')

Book.objects.all().prefetch_related('author')
}}}

The query I'd expect to be issued by the `prefetch_related` is:

{{{
#!sql

SELECT * FROM author
WHERE id IN (... the author_ids of the books from the main query ...)
}}}

However, the query that actually get executed are:

{{{
#!sql

SELECT a.* FROM author a INNER JOIN book b ON (a.id = b.author_id)
WHERE b.id IN (... the ids of the books from the main query ...)
}}}

There are two problems with this approach:

1. It does a join, which wouldn't be necessary if `author_id`s were used
instead of `book.id`s
2. When one uses `prefetch_related` in such a case (that is, to prefetch a
single fk per object rather than a many to many) instead of
`select_related`, it's usually because there are many primary objects
(books) and few unique related objects (authors); `select_related`,
despite getting all the data in one query, could end up returning much
more data than is needed because the related objects are duplicated many
times over, and will be slower overall. Thus, the approach currently used
by `prefetch_related` ends up using the set of ids that will almost always
be larger (the book ids, rather than the authors).

The reverse case (`Author.objects.all().prefetch_related('books')`) does
the efficient, joinless query I'd expect -- namely:

{{{
#!sql

SELECT * FROM book
WHERE author_id IN (... the ids of the authors from the main query ...)
}}}

I've attached a sample project with a test.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jan 9, 2014, 11:57:32 PM1/9/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: prefetch_related | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by valtron2000@…):

* needs_better_patch: => 0
* needs_docs: => 0
* needs_tests: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:1>

Django

unread,
Jan 10, 2014, 12:03:37 AM1/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: prefetch_related | Unreviewed
ForeignKey | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |

-------------------------------------+-------------------------------------
Changes (by valtron2000@…):

* keywords: prefetch_related => prefetch_related ForeignKey


--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:2>

Django

unread,
Jan 10, 2014, 1:30:21 AM1/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: prefetch_related | Unreviewed
ForeignKey | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by loic84):

* cc: loic@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:3>

Django

unread,
Jan 10, 2014, 1:37:45 AM1/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0

Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):

* stage: Unreviewed => Accepted


Comment:

Seems OK to me.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:4>

Django

unread,
Jan 10, 2014, 4:57:04 PM1/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0

Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by valtron2000@…):

* has_patch: 0 => 1


Comment:

Added a patch & test; seems to do the trick and all tests pass.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:5>

Django

unread,
Jan 10, 2014, 5:06:28 PM1/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by valtron2000@…):

My bad, I only ran the prefetch_related tests. Patch breaks
`foreign_object.tests.MultiColumnFKTests.test_prefetch_foreignkey_forward_works`.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:6>

Django

unread,
Apr 8, 2014, 11:58:16 AM4/8/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by intgr):

* cc: marti@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:7>

Django

unread,
Apr 8, 2014, 12:07:49 PM4/8/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by intgr):

I am in the process of migrating an app from Django 1.5 to 1.6; this
regression also affects us (it did not occur in 1.5).

I can confirm that the attached patch fixes this issue when applied on top
of Django 1.6.2. Can it be included in 1.6.3?

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:8>

Django

unread,
Apr 8, 2014, 1:02:23 PM4/8/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage: Accepted

Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by timo):

* severity: Normal => Release blocker


Comment:

Haven't verified the claim that it's a regression from 1.5, but marking it
as a release blocker assuming that's the case. A note in the 1.6.3 release
notes would also be helpful.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:9>

Django

unread,
Apr 9, 2014, 5:21:30 AM4/9/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by loic84):

Related to #21410.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:10>

Django

unread,
Apr 10, 2014, 12:22:37 AM4/10/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by loic84):

We discussed this issue with @akaariai on IRC.

Like we did for #21410, we'll take a practical approach for fixing the
regression from Django 1.6 to master and go for the proposed fix. It's
worth noting that the inefficient query will still be present for
multicolumn relations, but these are not documented and one needs to use
private APIs to make them, so that doesn't qualify as a regression.

The implementation of
`ReverseSingleRelatedObjectDescriptor.get_prefetch_queryset()` really
ought to be cleaned up at some point, especially when we land the patch
for composite fields but it'll require a fair amount of refactoring so it
only qualifies for master.

@valtron2000, would you mind updating the patch?

- The "FIXME" comment would have to be updated to explain this the new
case with a reference to this ticket.
- The current test is pretty fragile, could you rather check in QS.query
if there is a join?

Feel free to open a PR on github.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:11>

Django

unread,
Apr 11, 2014, 8:09:25 PM4/11/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by valtron2000@…):

I'll submit a PR soon; thanks for the suggestions.

The fix is a one-liner in the `if`, and the rest of the changes were just
cleanup.

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:12>

Django

unread,
Apr 11, 2014, 9:31:09 PM4/11/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------

Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.6
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by valtron2000@…):

https://github.com/django/django/pull/2538

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:13>

Django

unread,
Apr 12, 2014, 1:46:06 PM4/12/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------
Reporter: valtron2000@… | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 1.6
(models, ORM) | Resolution: fixed

Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by Loic Bistuer <loic.bistuer@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"d3b71b976dee4b02908235b8007db254a9795268"]:
{{{
#!CommitTicketReference repository=""
revision="d3b71b976dee4b02908235b8007db254a9795268"
Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:14>

Django

unread,
Apr 12, 2014, 1:53:19 PM4/12/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------
Reporter: valtron2000@… | Owner: nobody

Type: Bug | Status: closed
Component: Database layer | Version: 1.6
(models, ORM) | Resolution: fixed
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by Loic Bistuer <loic.bistuer@…>):

In [changeset:"6b3a8d27052bb5b242ab6c1c2fb0d6da5b49681d"]:
{{{
#!CommitTicketReference repository=""
revision="6b3a8d27052bb5b242ab6c1c2fb0d6da5b49681d"
[1.7.x] Fixed #21760 -- prefetch_related used an inefficient query for
reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Backport of d3b71b976d from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:15>

Django

unread,
Apr 12, 2014, 2:08:45 PM4/12/14
to django-...@googlegroups.com
#21760: prefetch_related uses an inefficient query to get the related fk data
-------------------------------------+-------------------------------------
Reporter: valtron2000@… | Owner: nobody

Type: Bug | Status: closed
Component: Database layer | Version: 1.6
(models, ORM) | Resolution: fixed
Severity: Release blocker | Triage Stage: Accepted
Keywords: prefetch_related | Needs documentation: 0
ForeignKey | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by Loic Bistuer <loic.bistuer@…>):

In [changeset:"1252b77824639fba398cf70586bdd75c42f86fdf"]:
{{{
#!CommitTicketReference repository=""
revision="1252b77824639fba398cf70586bdd75c42f86fdf"
[1.6.x] Fixed #21760 -- prefetch_related used an inefficient query for
reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Conflicts:
tests/prefetch_related/tests.py

Backport of d3b71b976d from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21760#comment:16>

Reply all
Reply to author
Forward
0 new messages