[Django] #35810: Provide `Select` class for `select_related` (like `Prefetch` exists for `prefetch_related`)

8 views
Skip to first unread message

Django

unread,
Oct 3, 2024, 12:34:30 PM10/3/24
to django-...@googlegroups.com
#35810: Provide `Select` class for `select_related` (like `Prefetch` exists for
`prefetch_related`)
-------------------------------------+-------------------------------------
Reporter: Bart van Andel | Type: New
| feature
Status: new | Component: Database
| layer (models, ORM)
Version: 5.1 | Severity: Normal
Keywords: query optimization | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Use case:

Given a model `Chat` with many linked `Message`s:

{{{
class Chat(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
users = models.ManyToManyField(User)

def last_message(self):
return self.jobmatch_set.latest("created_at")

def last_message_using_prefetched_set(self):
return max(self.message_set.all(), key=lambda item:
item.created_at, default=None)

def last_message_using_prefetched_single_item_sets(self):
assert hasattr(self, "_last_message_set"), "_last_message_set not
populated":
return self._last_message_set[0] if len(self._last_message_set) >
0 else None

class Message(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
chat = models.ForeignKey(Chat)
}}}

We'd like to load a list of chats with the latest message of each chat.
This doesn't seem to be easily achievable at the moment. Example:

{{{
# Valid, but causes an additional trip for each chat:
chats = Chat.objects.all()
last_messages = [chat.last_message for chat in chats]

# Invalid:
chats = Chat.objects.select_related("last_message")

# Also invalid:
chats = Chat.objects.prefetch_related("last_message")

# Valid approach (2 queries), which may however load a huuuuge amount of
data, so in general this should be avoided
# Also, there is no guarantee that the prefetched set is not filtered from
the perspective of the last message getter property
chats = Chat.objects.prefetch_related("message_set")
last_messages = [chat.last_message_using_prefetched_set for chat in chats]

# Workaround (2 queries) using prefetch to get the single latest (if
exist) message per chat:
chats = Chat.objects.prefetch_related(
Prefetch(
"chat_set",
queryset=Chat.objects.order_by("-created_at")[:1],
to_attr="_last_match_set",
),
)
last_messages = [chat.last_message_using_prefetched_single_item_sets for
chat in chats]
}}}

I do appreciate that there are probably very good reasons why the invalid
calls in the example above won't be able to work, unless maybe with some
still-to-be-invented annotations.

Now, the `Prefetch` class was introduced quite recently to even make the
above possible. It can only be used to prefetch _sets of items_ though,
not _single values_, so using something like `.latest("created_at")` is
out of the question.

Could something like that also be implemented to support this syntax?
Basically the same effect as `Prefetch`, except with a single output (or
None).

{{{
chats = Chat.objects.select_related(
Select(
"chat_set",
queryset=Chat.objects.latest("created_at"),
to_attr="last_match",
),
)
}}}

Note: I couldn't find a similar question in the existing ticket list; if I
missed something, I do apologize.
--
Ticket URL: <https://code.djangoproject.com/ticket/35810>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 3, 2024, 1:11:40 PM10/3/24
to django-...@googlegroups.com
#35810: Provide `Select` class for `select_related` (like `Prefetch` exists for
`prefetch_related`)
-------------------------------------+-------------------------------------
Reporter: Bart van Andel | Owner: (none)
Type: New feature | Status: new
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: query optimization | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Bart van Andel):

Note: I tried to optimize our use case like this (pseudo code):

{{{
if self.message_set.is_loaded:
return max(self.message_set.all(), key=lambda item: item.created_at,
default=None)
# or, more clearly, but also not possible:
return self.message_set.latest("created_at", trust_cached_data=True)
else:
return self.message_set.latest("created_at")

# or, even more clearly, just get the value directly, using preloaded data
if it exists:
return self.message_set.latest("created_at", trust_cached_data=True)
}}}

There is no `is_loaded` or `trust_cached_data` (or equivalents) however,
so this obviously won't work. Also, obviously this should be used with
extreme care, but then again, when optimizing you sometimes need to be
careful and creative anyway.
--
Ticket URL: <https://code.djangoproject.com/ticket/35810#comment:1>

Django

unread,
Oct 3, 2024, 7:00:13 PM10/3/24
to django-...@googlegroups.com
#35810: Provide `Select` class for `select_related` (like `Prefetch` exists for
`prefetch_related`)
-------------------------------------+-------------------------------------
Reporter: Bart van Andel | Owner: (none)
Type: New feature | Status: closed
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: query optimization | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* resolution: => wontfix
* status: new => closed

Comment:

I'm struggling to find it now but I know the idea of allow custom classes
to be provided to `select_related` was already discussed somewhere in the
past years.

I think the question here is less about whether we'd want to support such
pattern but how exactly it should be achieved.

For example, for reverse relationships it could be done using `LEFT JOIN
LATERAL` on backends that support it

{{{#!python
chats = Chat.objects.select_related(
Select(
"messages",
queryset=Message.objects.order_by("created_at"),
to_attr="lastest_message",
),
)
}}}

{{{#!sql
SELECT
chat.*,
latest_message.*
LEFT JOIN LATERAL (
SELECT *
FROM message
WHERE chat_id = chat.id
ORDER BY created_at DESC
LIMIT 1
) latest_message
}}}

I'm not sure what the best way to express this through the ORM should be
though. It feels like it could also be something that is expressed through
a `Subquery` that doesn't make use of `values` to limit to a single field

{{{#!python
chats = Chat.objects.annotate(
lastest_message=Message.objects.filter(
chat_id=OuterRef("id"),
).order_by("created_at")[1]
)
}}}

In all cases, just like any new large feature request,
[https://docs.djangoproject.com/en/stable/internals/contributing/bugs-and-
features/#requesting-features it should be discussed on the forum] before
hand to gather consensus.
--
Ticket URL: <https://code.djangoproject.com/ticket/35810#comment:2>
Reply all
Reply to author
Forward
0 new messages