[Django] #35947: prefetch_related makes duplicate queries for the same records by different relations

15 views
Skip to first unread message

Django

unread,
Nov 27, 2024, 1:39:15 PM11/27/24
to django-...@googlegroups.com
#35947: prefetch_related makes duplicate queries for the same records by different
relations
-------------------------------------+-------------------------------------
Reporter: Jake Douglas | Type:
| Uncategorized
Status: new | Component: Database
| layer (models, ORM)
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Prefetching the same records via different relations using
`prefetch_related` results in duplicated queries for the same set of
records. This is a problem because many web applications traverse
relations by many different paths, even in the same request. The smallest
example I could reproduce follows:

{{{
from django.db import models

# Create your models here.
class House(models.Model):
pass

class Story(models.Model):
house = models.ForeignKey(House, related_name = "stories", on_delete =
models.CASCADE)

class Room(models.Model):
story = models.ForeignKey(Story, related_name = "rooms", on_delete =
models.CASCADE)
house = models.ForeignKey(House, related_name = "rooms", on_delete =
models.CASCADE)

class Window(models.Model):
story = models.ForeignKey(Story, related_name = "windows", on_delete =
models.CASCADE)
room = models.ForeignKey(Room, related_name = "windows", on_delete =
models.CASCADE)

House.objects.prefetch_related(
"rooms__windows",
"stories__rooms__windows"
).all()
}}}

This results in something like this:

{{{
SELECT ••• FROM "houses_house"
SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) #
DUPLICATE
SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1)
SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) #
DUPLICATE
}}}

The same behavior occurs when the last relation in the chain is one-to-
one.
--
Ticket URL: <https://code.djangoproject.com/ticket/35947>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 27, 2024, 2:35:27 PM11/27/24
to django-...@googlegroups.com
#35947: prefetch_related makes duplicate queries for the same records by different
relations
-------------------------------------+-------------------------------------
Reporter: Jake Douglas | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* resolution: => wontfix
* status: new => closed
* type: Uncategorized => Cleanup/optimization

Comment:

I don't think we can fix this without a significant refactor to how
prefetch related work given the Django ORM does not maintain a map of
object identities by related fields so I'm going to ''won't fix'' this
one.

In order to implement such a feature the prefetch related logic would have
to pass a memoization object of the form `dict[tuple[type[Model],
models.Field], dict[Any, Model]]` that maps model types and from field to
another map of from field value to model instances to
`prefetch_related_objects` and `prefetch_one_level`. All that to support
the likely rare cases where the same relationship is prefetched through
different paths and happens to target the exact set of objects (as any
difference would require an extra query anyway).

If you disagree with the resolution of this ticket
[https://docs.djangoproject.com/en/5.1/internals/contributing/bugs-and-
features/#requesting-features feel free to open a thread on the forums]
about the benefits vs complexity tradeoff of this requested optimization.
--
Ticket URL: <https://code.djangoproject.com/ticket/35947#comment:1>

Django

unread,
Nov 27, 2024, 3:19:51 PM11/27/24
to django-...@googlegroups.com
#35947: prefetch_related makes duplicate queries for the same records by different
relations
-------------------------------------+-------------------------------------
Reporter: Jake Douglas | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jake Douglas):

''All that to support the likely rare cases where the same relationship is
prefetched through different paths and happens to target the exact set of
objects''

For the record, here's an example request from our production application:

{{{
(313 queries including 279 similar and 242 duplicates)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35947#comment:2>

Django

unread,
Nov 27, 2024, 4:58:29 PM11/27/24
to django-...@googlegroups.com
#35947: prefetch_related makes duplicate queries for the same records by different
relations
-------------------------------------+-------------------------------------
Reporter: Jake Douglas | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> For the record, here's an example request from our production
application:
> (313 queries including 279 similar and 242 duplicates)

The example you provided should limit the number of SQL queries to 6 which
you have detailed to be

{{{#!sql
SELECT ••• FROM "houses_house"
SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) #
DUPLICATE
SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1)
SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) #
DUPLICATE
}}}

Which is one for the top-level houses queryset, two for the
`rooms__windows` prefetch, and three for the `stories__rooms__windows`
prefetch where two are duplicates. As explained above, invasive changes
would be required to take that number to 4 under circumstances that your
specific schema design seems to be exhibiting.

If your application is executing 50x the number of 6 queries that could be
reduced to 4 by implementing this optimization there is likely something
else to look on your side before implementing this optimization. In other
words `prefetch_related` is not a feature that systemically make elevated
number of queries issues go away; it's a tool that bounds the N+1 query
problem and not an identity mapper solution #17.
--
Ticket URL: <https://code.djangoproject.com/ticket/35947#comment:3>

Django

unread,
Nov 27, 2024, 5:37:17 PM11/27/24
to django-...@googlegroups.com
#35947: prefetch_related makes duplicate queries for the same records by different
relations
-------------------------------------+-------------------------------------
Reporter: Jake Douglas | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jake Douglas):

The example code I provided is not from our application, it's just a
simplification that demonstrates the behavior. Most occurrences of
duplicate prefetch queries in our application do not involve denormalized
relations.

The quoted query stats ''are'' from a request in our application and were
intended to demonstrate that this behavior is not rare, but I realize the
greater context is missing: almost all of the duplicate queries are coming
from `prefetch_related`, with some queries being issued 20+ times for the
most commonly prefetched relations in this request.
--
Ticket URL: <https://code.djangoproject.com/ticket/35947#comment:4>
Reply all
Reply to author
Forward
0 new messages