[Django] #36157: Unusual behaviour when pre-fetching with only applied on the related fields

Django

Jan 29, 2025, 5:20:00 PM

to django-...@googlegroups.com

#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+-----------------------------------------
Reporter: Tim McCurrach | Type: Bug
Status: new | Component: Uncategorized
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+-----------------------------------------
When prefetching related models, if you apply `.only()` to the related
queryset, Django performs additional lookups for related IDs that have
been left out of the `only()` call.

It is probably easiest to explain the issue with an example. Suppose you
have these models:

{{{
from django.db import models


class Blog(models.Model):
    name = models.CharField(max_length=100)


class Post(models.Model):
    name = models.CharField(max_length=100)
    blog = models.ForeignKey(
        Blog, on_delete=models.CASCADE, related_name="posts"
    )
    # ...lots of other big fields
}}}
And you create a few items for each:
{{{
blog = Blog.objects.create(name="Django Tricks")
blog2 = Blog.objects.create(name="React Tricks")
Post.objects.create(name="prefetching", blog=blog)
Post.objects.create(name="models", blog=blog)
Post.objects.create(name="templates", blog=blog)
Post.objects.create(name="hooks", blog=blog2)
Post.objects.create(name="components", blog=blog2)
}}}
If I wish to pre-fetch the posts for some blogs, but only want the names
of each post, rather than the content of each post I can do the following:
{{{
Blog.objects.prefetch_related(
    Prefetch("posts", queryset=Post.objects.only("name"))
)
}}}
I would expect this to result in just 2 database queries: one to fetch the
data for the `Blog` instances, and another to fetch the data for the
related `Post`s. Instead, there is an n+1 issue: 5 extra follow-up
queries, one for each of the related `Post` instances. This is the
SQL that is generated:
{{{
SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE
"app_post"."blog_id" IN (1, 2)
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 1 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 2 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 3 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 4 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 5 LIMIT 21
}}}
I can understand it might be a good idea to have the related IDs for the
blog on hand should you need them later. But I also think that by using
`.only()` you are explicitly telling Django: I don't need these. This is
a real problem for larger data sets, where you end up with thousands of
extra round-trips to the database.
--
Ticket URL: <https://code.djangoproject.com/ticket/36157>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jan 29, 2025, 5:27:03 PM

to django-...@googlegroups.com

#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+--------------------------------------
Reporter: Tim McCurrach | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Description changed by Tim McCurrach:

Old description:

New description:

When prefetching related models, if you apply `.only()` to the related
queryset, Django performs additional lookups for related IDs that have
been left out of the `only()` call.

It is probably easiest to explain the issue with an example.

=== Example Situation ===

=== Context ===

This is an issue that came up in the wild. I'm using a third-party
optimiser that improves the performance of GraphQL queries by decorating
querysets with `only()`, `select_related()` etc. It correctly identifies
that I am only using certain fields and applies `only()` to them, knowing
I will never need to access certain related fields. This results in Django
producing many additional hits to the database.

--
Ticket URL: <https://code.djangoproject.com/ticket/36157#comment:1>

Django

Jan 29, 2025, 5:27:55 PM

to django-...@googlegroups.com

#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+--------------------------------------
Reporter: Tim McCurrach | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by Tim McCurrach:

Old description:

> When prefetching related models, if you apply `.only()` to the related
> queryset, Django performs additional lookups for related IDs that have
> been left out of the `only()` call.
>
> It is probably easiest to explain the issue with an example.
>

I will never need to access certain related fields. This unfortunately
results in django producing many additional hits to the database. I don't
think this is expected behaviour from django.

--
Ticket URL: <https://code.djangoproject.com/ticket/36157#comment:2>

Django

Jan 29, 2025, 6:23:27 PM

to django-...@googlegroups.com

#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+--------------------------------------
Reporter: Tim McCurrach | Owner: (none)
Type: Bug | Status: closed
Component: Uncategorized | Version: 5.1
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Changes (by Simon Charette):

* resolution: => duplicate
* status: new => closed

Comment:

Hello Tim,

This is a duplicate of #33835, which was closed as wontfix.

> I can understand it might be a good idea to have the related IDs for
> the blog on hand should you need them later. But I also think that by
> using `.only()` you are explicitly telling Django: I don't need these. This
> is a real problem for larger data sets, where you end up with thousands of
> extra round-trips to the database.

The thing is, prefetching **must** have `blog_id`; otherwise it has no way
to build the associative map between `Blog` instances and `Post` instances
to populate `blog.posts.all()`. In other words, if I gave you a list of the
form `posts = [{"id": 1, "name": "Some blog Post"}, {"id": 2, "name":
"Some other blog Post"}]`, how would you partition it by `blog_id`?

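The partitioning question can be made concrete with a minimal plain-Python sketch. It is not Django's prefetch implementation; `partition_posts` and the sample rows are hypothetical names and data chosen for illustration. It only shows that grouping rows by a foreign key is impossible once that key has been left out of the rows.

```python
from collections import defaultdict


def partition_posts(posts, key="blog_id"):
    """Group post rows (plain dicts) by their foreign key value.

    Hypothetical helper for illustration only; it is not part of Django.
    """
    grouped = defaultdict(list)
    for post in posts:
        # A row without the key raises KeyError: there is nothing to group by.
        grouped[post[key]].append(post)
    return dict(grouped)


# Rows as they would look if the foreign key column had been selected.
rows_with_fk = [
    {"id": 1, "name": "prefetching", "blog_id": 1},
    {"id": 2, "name": "models", "blog_id": 1},
    {"id": 4, "name": "hooks", "blog_id": 2},
]

# Rows as they look when only "id" and "name" were selected.
rows_without_fk = [
    {"id": 1, "name": "prefetching"},
    {"id": 4, "name": "hooks"},
]

by_blog = partition_posts(rows_with_fk)

try:
    partition_posts(rows_without_fk)
    could_partition = True
except KeyError:
    could_partition = False
```

Here `by_blog` maps each blog ID to its rows, while `could_partition` ends up `False` for the rows that lack `blog_id`. Django does not fail at that point; as the SQL in the report shows, it fetches the missing `blog_id` for each `Post` with a separate query.
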
Adapting `prefetch_related` to error out if provided an inadequate
`Prefetch(queryset)` could potentially be done, but that would not solve
your actual problem. The third-party library you are using to automatically
generate these queries is flawed and should include `blog_id` in the
select mask (that is, the `only()` call) if it relies on prefetching.

If you'd like to catch these early I'd suggest looking at the potential
solution to have warnings emitted on query leaks in
ticket:33835#comment:2.
--
Ticket URL: <https://code.djangoproject.com/ticket/36157#comment:3>

Django

Jan 29, 2025, 6:56:22 PM

to django-...@googlegroups.com

#36157: Unusual behaviour when pre-fetching with only applied on the related fields
-------------------------------+--------------------------------------
Reporter: Tim McCurrach | Owner: (none)
Type: Bug | Status: closed
Component: Uncategorized | Version: 5.1
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Comment (by Tim McCurrach):

Hello Simon,

Thanks for the speedy reply, and the explanation. That's very useful, and
makes a lot of sense.

I'll raise the issue with the third-party, and check out your library.
--
Ticket URL: <https://code.djangoproject.com/ticket/36157#comment:4>
