[Django] #35779: ORM to avoid deferring the reference_id of a prefetch


Django

Sep 20, 2024, 3:25:36 PM
to django-...@googlegroups.com
#35779: ORM to avoid deferring the reference_id of a prefetch
-------------------------------------+-------------------------------------
Reporter: Thiago Bellini Ribeiro     | Type: Uncategorized
Status: new                          | Component: Database layer
                                     |   (models, ORM)
Version: 4.2                         | Severity: Normal
Keywords:                            | Triage Stage: Unreviewed
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Suppose that I have `User` and `Email` models, related through
`Email.user_id`.

If I query `User` like this:

{{{
from django.db.models import Prefetch
User.objects.prefetch_related(Prefetch("email_set", queryset=Email.objects.only("email")))
}}}

I'll still see N+1 queries, because `user_id` was deferred and the ORM
has to refetch each object from the database to read it.
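
To make the extra queries concrete, here is a toy model in plain Python. It is only a sketch and not Django's code: `FakeDB`, `LazyRow`, and `prefetch` are hypothetical names, and the assumption is that the prefetch step groups the fetched rows by `user_id`, so a row loaded without that column has to go back to the database for it.

```python
# Toy model only: mimics deferred-column loading in plain Python.
# This is NOT Django's implementation; every name here is hypothetical.
from collections import defaultdict


class FakeDB:
    """In-memory stand-in for a database that counts the queries it serves."""

    def __init__(self, rows):
        self.rows = {row["id"]: row for row in rows}
        self.queries = 0

    def select(self, fields):
        # One query returning only the requested columns for every row.
        self.queries += 1
        return [{f: row[f] for f in fields} for row in self.rows.values()]

    def select_one(self, pk, field):
        # One query fetching a single column of a single row.
        self.queries += 1
        return self.rows[pk][field]


class LazyRow:
    """Row object that loads any column it was not given on first access."""

    def __init__(self, db, loaded):
        self._db = db
        self._loaded = dict(loaded)

    def get(self, name):
        if name not in self._loaded:
            # Deferred column: costs one extra query for this row.
            self._loaded[name] = self._db.select_one(self._loaded["id"], name)
        return self._loaded[name]


def prefetch(db, fields):
    """Fetch all rows with `fields`, then group them by user_id."""
    rows = [LazyRow(db, data) for data in db.select(fields)]
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get("user_id")].append(row)
    return grouped


emails = [
    {"id": 1, "user_id": 10, "email": "a@example.com"},
    {"id": 2, "user_id": 10, "email": "b@example.com"},
    {"id": 3, "user_id": 11, "email": "c@example.com"},
]

db_deferred = FakeDB(emails)
prefetch(db_deferred, ["id", "email"])  # user_id left out, like only("email")

db_loaded = FakeDB(emails)
prefetch(db_loaded, ["id", "user_id", "email"])  # like only("user_id", "email")

print(db_deferred.queries, db_loaded.queries)
```

With the three sample rows, the deferred variant issues four queries (one for the list plus one per row), while the variant that loads `user_id` up front issues one.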

Although the fix for this is easy, just changing it to:

{{{
User.objects.prefetch_related(
    Prefetch("email_set", queryset=Email.objects.only("user_id", "email"))
)
}}}

it can catch some users off guard, as it is not obvious that the ORM
will not do "the correct thing".

Based on that, I was wondering if maybe the ORM could force the
reference_id of a relation to not be deferred when it is used in a
`Prefetch` object, the same way it already does with `pk`. It makes sense
to me at least.
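
A user-side way to get a similar effect could be a small helper that always adds the join column to the field list before it is passed to `only()`. The sketch below is plain Python, and `with_required_fields` is a hypothetical name, not a Django API or the ORM change proposed here.

```python
def with_required_fields(fields, required):
    """Return `fields` plus any missing names from `required`, order preserved."""
    result = list(fields)
    for name in required:
        if name not in result:
            result.append(name)
    return tuple(result)


# The join column is appended only when the caller forgot it.
print(with_required_fields(("email",), ("user_id",)))
print(with_required_fields(("user_id", "email"), ("user_id",)))
```

In a project it might be used as `Email.objects.only(*with_required_fields(("email",), ("user_id",)))`, assuming the foreign key column is named `user_id` as in the example above.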

---

On a side note, this caused some unexpected behavior at work today: we
were fixing `RemovedInDjango50Warning`s, and simply adding `chunk_size`
to an `.iterator()` call that used a `Prefetch` started causing
`Model.DoesNotExist` errors.

After digging for a while I found this issue. It meant our
`prefetch_related` was effectively being thrown into the trash, and
because the iteration took a long time (very large table), some related
objects were deleted in the meantime. When the ORM then tried to
`refresh_from_db` to get the related id, it failed with that error.
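
The failure mode can be reproduced outside Django with the standard `sqlite3` module. This is a sketch under assumptions, not the ORM's actual code path: the table layout, `load_user_id`, and `RowDoesNotExist` are made up for illustration, and the `DELETE` stands in for another process removing rows during a long iteration.

```python
import sqlite3


class RowDoesNotExist(LookupError):
    """Hypothetical stand-in for a model's DoesNotExist exception."""


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (id INTEGER PRIMARY KEY, user_id INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO email (id, user_id, email) VALUES (?, ?, ?)",
    [(1, 10, "a@example.com"), (2, 11, "b@example.com")],
)

# Step 1: fetch the rows without user_id, as only("email") would.
partial_rows = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()

# Step 2: a row disappears while the iteration is still in progress.
conn.execute("DELETE FROM email WHERE id = ?", (2,))


def load_user_id(pk):
    """Lazily load the deferred column for one row (one query per call)."""
    row = conn.execute("SELECT user_id FROM email WHERE id = ?", (pk,)).fetchone()
    if row is None:
        raise RowDoesNotExist(f"email row {pk} no longer exists")
    return row[0]


# Step 3: reading the deferred column now fails for the deleted row.
results = []
for pk, _email in partial_rows:
    try:
        results.append((pk, load_user_id(pk)))
    except RowDoesNotExist:
        results.append((pk, "missing"))

print(results)
```

Had `user_id` been selected in step 1, step 3 would need no further queries and the deleted row could not raise.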
--
Ticket URL: <https://code.djangoproject.com/ticket/35779>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Sep 20, 2024, 5:45:52 PM
to django-...@googlegroups.com
#35779: ORM to avoid deferring the reference_id of a prefetch
-------------------------------------+-------------------------------------
Reporter: Thiago Bellini Ribeiro     | Owner: GunSliger00007
Type: Uncategorized                  | Status: assigned
Component: Database layer            | Version: 4.2
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords:                            | Triage Stage: Unreviewed
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by GunSliger00007):

* owner: (none) => GunSliger00007
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/35779#comment:1>

Django

Sep 20, 2024, 5:52:33 PM
to django-...@googlegroups.com
#35779: ORM to avoid deferring the reference_id of a prefetch
Comment (by GunSliger00007):

Consider optimizing your queries by prefetching only the fields you need.
In your case, you’re already using only() to limit the fields loaded by
the query, which helps reduce the amount of data loaded into memory. If the
related Email objects are large, consider using .values() or
.values_list() to reduce the amount of data further.

{{{
User.objects.prefetch_related(
    Prefetch("email_set", queryset=Email.objects.only("user_id", "email"))
)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35779#comment:2>

Django

Sep 20, 2024, 5:56:02 PM
to django-...@googlegroups.com
#35779: ORM to avoid deferring the reference_id of a prefetch
Comment (by Thiago Bellini Ribeiro):

Replying to [comment:2 GunSliger00007]:
> Consider optimizing your queries by prefetching only the fields you
> need. In your case, you’re already using only() to limit the fields
> loaded by the query, which helps reduce the amount of data loaded into
> memory. If the related Email objects are large, consider using .values()
> or .values_list() to reduce the amount of data further.
>
> {{{
> User.objects.prefetch_related(
>     Prefetch("email_set", queryset=Email.objects.only("user_id", "email"))
> )
> }}}

That `User`/`Email` example was mostly to illustrate the issue :)
--
Ticket URL: <https://code.djangoproject.com/ticket/35779#comment:3>

Django

Sep 27, 2024, 9:16:13 AM
to django-...@googlegroups.com
#35779: ORM to avoid deferring the reference_id of a prefetch
-------------------------------------+-------------------------------------
Reporter: Thiago Bellini Ribeiro     | Owner: GunSliger00007
Type: Uncategorized                  | Status: closed
Component: Database layer            | Version: 4.2
  (models, ORM)                      |
Severity: Normal                     | Resolution: duplicate
Keywords:                            | Triage Stage: Unreviewed
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* resolution: => duplicate
* status: assigned => closed

Comment:

Duplicate of #33835, which was closed as wontfix. See the
[https://code.djangoproject.com/ticket/33835#comment:1 explanation in the
ticket].
--
Ticket URL: <https://code.djangoproject.com/ticket/35779#comment:4>