#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+-----------------------------------------
Reporter: Tim McCurrach | Type: Bug
Status: new | Component: Uncategorized
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+-----------------------------------------
When prefetching related models. If you apply `.only()` to the related
queryset, django performs additional lookups for related IDs that have
been left out of the `only`.
It is probably easiest to explain the issue with an example. Suppose you
have these models:
{{{
class Blog(models.Model):
name = models.CharField(max_length=100)
class Post(models.Model):
name = models.CharField(max_length=100)
blog = models.ForeignKey(Blog, on_delete=models.CASCADE,
related_name="posts")
...lots of other big fields
}}}
And you create a few items for each:
{{{
blog = Blog.objects.create(name="Django Tricks")
blog2 = Blog.objects.create(name="React Tricks")
Post.objects.create(name="prefetching", blog=blog)
Post.objects.create(name="models", blog=blog)
Post.objects.create(name="templates", blog=blog)
Post.objects.create(name="hooks", blog=blog2)
Post.objects.create(name="components", blog=blog2)
}}}
If I wish to pre-fetch the posts for some blogs, but only want the names
of each post, rather than the content of each post I can do the following:
{{{
Blog.objects.prefetch_related(Prefetch("posts",
queryset=Post.objects.only("name")))
}}}
I would expect this to result in just 2 database queries. One to fetch the
data for the `Blog` instances, and another to fetch the the data for the
related `Post`s. Instead, there is an n+1 issue where there are 5 extra
follow up requests for each of the related `Post` instances. This is the
SQL that is generated:
{{{
SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE
"app_post"."blog_id" IN (1, 2)
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 1 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 2 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 3 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 4 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 5 LIMIT 21
}}}
I can understand it might be a good idea to have the related-id's for the
blog on hand should you need them later. But I also think, that by using
`.only()` you are explicitly telling django - I don't need these. This is
a real problem for larger data-sets, where you end up with thousands of
extra round-trips to the database.
--
Ticket URL: <
https://code.djangoproject.com/ticket/36157>
Django <
https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.