[Django] #35317: Add the possibility to do prefetches for only a subset of instances

28 views
Skip to first unread message

Django

unread,
Mar 19, 2024, 2:15:51 PM3/19/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent | Owner: nobody
Lyaudet |
Type: New | Status: new
feature |
Component: Database | Version: 5.0
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Hello,

I have use cases with serializers returning a lot of distinct
informations.
For example, my serialiser can be called with many=True and a query of
2000 Order instances.
And for a small part of these 2000 instances, I need very expensive
prefetches.
Currently, I deal with that by overloading the _fetch_all() method and
make by hand the hyper-specifics and expensive prefetches,
after everything else.
What I propose is to add a filter_callback kwarg to Prefetch object.
filter_callback would be either None (default) or a function taking only
one instance of the previous level of prefetch
and return a boolean to know if the instance needs that prefetch.
A simple example is when you will filter on some status field of Order,
but in that case, you can just modify the query somehow.
And unfortunately, the order id would still be injected in the prefetch
query.
But for more realistic examples, you do not want to recode the business
logic
in the SQL query to know what needs the prefetch or not.

Example :
{{{#!python
orders = (
Order.objects.filter(...)
.exclude(...)
.annotate(...)
.prefetch_related(
"items",
"address",
Prefetch(
"packing_task__zonings",
queryset=Zonings.objects.filter(...),
filter_callback=lambda packing_task:
packing_task.order.needs_to_consider_packing_problematic_zonings(),
to="problematic_zonings",
)
)
}}}

Best regards,
Laurent Lyaudet
--
Ticket URL: <https://code.djangoproject.com/ticket/35317>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Mar 19, 2024, 2:18:30 PM3/19/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Laurent Lyaudet:

Old description:
New description:
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:1>

Django

unread,
Mar 19, 2024, 4:16:08 PM3/19/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Laurent Lyaudet):

I have been able to make a very fast proof of concept right now in Django
shell.
Here is the redacted transcript. I will prepare a PR :)

{{{#!python
import random

from django.db import connection
from django.db.models import query


old_prefetch_one_level = query.prefetch_one_level
def new_prefetch_one_level(instances, prefetcher, lookup, level):
if hasattr(lookup, "filter_callback") and lookup.filter_callback is
not None:
instances = [instance for instance in instances if
lookup.filter_callback(instance)]
if len(instances) == 0:
return [], ()
return old_prefetch_one_level(instances, prefetcher, lookup, level)


query.prefetch_one_level = new_prefetch_one_level


from DeleevCore.orders.models import Order, OrderProduct

from django.db.models import Prefetch


some_random_prefetch = Prefetch("items",
queryset=OrderProduct.objects.all())
some_random_prefetch.filter_callback = lambda x: bool(random.randint(0,
1))

orders =
list(Order.objects.all().prefetch_related(some_random_prefetch)[:10])

print(connection.queries)
[{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON
typnamespace = ns.oid WHERE typname = 'hstore'",
'time': '0.001'},
{'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'",
'time': '0.000'},
{'sql': 'SELECT @redacted@ FROM "orders_order" ORDER BY
"orders_order"."shipping_time" ASC LIMIT 10',
'time': '0.019'},
{'sql': 'SELECT @redacted@ FROM "orders_orderproduct" WHERE
"orders_orderproduct"."order_id" IN (114407, 84181, 84183, 25369)',
'time': '0.006'}]

Look only 4 ids in last query instead of 10 :)


}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:2>

Django

unread,
Mar 19, 2024, 5:25:40 PM3/19/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Laurent Lyaudet):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:3>

Django

unread,
Mar 19, 2024, 8:33:08 PM3/19/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

I'm not convinced we should add a feature for that given
`prefetch_related_objects` is already
[https://docs.djangoproject.com/en/5.0/ref/models/querysets/#django.db.models.prefetch_related_objects
a documented public API] that allows for this feature to be built on top
of

{{{#!python
orders = (
Order.objects.filter(...)
.exclude(...)
.annotate(...)
.prefetch_related(
"items",
"address",
)
)
predicate = lambda packing_task:
order.needs_to_consider_packing_problematic_zonings()
problematic_orders = filter(predicate, orders)
prefetch_related_objects(
orders,
Prefetch(
"packing_task__zonings",
queryset=Zonings.objects.filter(...),
to="problematic_zonings"
),
)
}}}

I'd a bit more trickier when dealing with predicates for nested prefetches
but it's still doable.

Just like any other non-trivial feature addition like this one I suggest
trying to gather consensus on a general need for
[https://docs.djangoproject.com/en/5.0/internals/contributing/bugs-and-
features/#requesting-features this feature] on the forum and / or
developer mailing list. From providing support over this channels in the
past years I don't remember anyone else running into a similar issue that
couldn't be solved with `prefetch_related_objects` directly.
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:4>

Django

unread,
Mar 20, 2024, 12:50:16 AM3/20/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* resolution: => wontfix
* status: new => closed

Comment:

I agree with Simon. The ORM is already packaged with features, so we
generally don't add new options for things that are already possible.
Especially for niche use cases.

You can [https://docs.djangoproject.com/en/stable/internals/contributing
/triaging-tickets/#closing-tickets follow triaging guidelines with regards
to wontfix tickets] and take this to DevelopersMailingList, if you don't
agree.
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:5>

Django

unread,
Mar 20, 2024, 7:39:47 AM3/20/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Laurent Lyaudet):

TLDR : Use https://pypi.org/project/django-monkey-patches/

Replying to [comment:4 Simon Charette]:
> I'm not convinced we should add a feature for that given
`prefetch_related_objects` is already
[https://docs.djangoproject.com/en/5.0/ref/models/querysets/#django.db.models.prefetch_related_objects
a documented public API] that allows for this feature to be built on top
of
> I'd be a bit more trickier when dealing with predicates for nested
prefetches but it's still doable.
What do you mean by a little bit trickier ?...
I'm using querysets with Django Rest Framework serializers.
I find it convenient to add a static method `enhance_queryset()` in each
model serializer that adds the needed prefetches.
Moreover, when I have nested model serializers,
I call recursively the methods `enhance_queryset()` of the nested model
serializers
in the method `enhance_queryset()` of the main model serializer.
If you use `prefetch_related_objects()` instead, either you do not inject
querysets in your serializers,
or you have to deal with all of this in the __init__() method of your
serializer.
Moreover, you have to gather the relevant objects by hand with nested
loops to put them in `prefetch_related_objects()`...
whilst it is already dealed with with nested prefetches using my patch...
I think my solution is superior, but I will not lose too much time
arguing.
> Just like any other non-trivial feature addition like this one I suggest
trying to gather consensus on a general need for
[https://docs.djangoproject.com/en/5.0/internals/contributing/bugs-and-
features/#requesting-features this feature] on the forum and / or
developer mailing list. From providing support over this channels in the
past years I don't remember anyone else running into a similar issue that
couldn't be solved with `prefetch_related_objects` directly.
Sorry, if I do that it will split the discussion in many places;
and frankly, I already know how are recieved most of the feature requests
on forums/mailing lists/whatever.
Bureaucracy is not for me.

Replying to [comment:5 Mariusz Felisiak]:
> I agree with Simon. The ORM is already packaged with features, so we
generally don't add new options for things that are already possible.
Especially for niche use cases.
>
> You can [https://docs.djangoproject.com/en/stable/internals/contributing
/triaging-tickets/#closing-tickets follow triaging guidelines with regards
to wontfix tickets] and take this to DevelopersMailingList, if you don't
agree.
Idem.

I don't see any way to upvote a feature request here. It would be a nice
way to know what dev using Django wants.
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:6>

Django

unread,
Mar 21, 2024, 6:56:16 PM3/21/24
to django-...@googlegroups.com
#35317: Add the possibility to do prefetches for only a subset of instances
-------------------------------------+-------------------------------------
Reporter: Laurent Lyaudet | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Laurent Lyaudet):

I did a second PR https://github.com/django/django/pull/18003 adding a
post_prefetch_callback argument to Prefetch.
Here is a redacted production code example that I could develop with the
corresponding patch in my package django-monkey-patches.
I doubt you can find an efficient way and requiring around the same amount
of code
to do the same thing without these features.

{{{#!python
class OModelSerializer(...):
@staticmethod
def enhance_queryset(
query_set,
prefix: str = "",
c_id: Optional[CId] = None,
):
query_set = SomeMixin1.enhance_queryset(query_set, prefix)

def ventilate_ps_by_c_id(
lookup,
done_queries,
):
prefetch_to = lookup.prefetch_to
prefetch_before = get_previous_prefetch_to(prefetch_to)
for obj in prefetch_before:
if hasattr(obj, "needed_ps"):
if not hasattr(obj, "needed_p_by"):
obj.needed_p_by = {}
for obj2 in obj.needed_ps:
obj.needed_p_by[obj2.c_id] = obj2

return query_set.prefetch_related(
f"{prefix}p2",
Prefetch(
f"{prefix}s",
post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple(
retrieve_forward_cache_callback=lambda o: [o.s] if
o.s_id else [],
backward_cache_name="current_o_ancestors",
),
),
Prefetch(
f"{prefix}c2",
post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple(
retrieve_forward_cache_callback=lambda o:[o.c2] if
o.c2_id else [],
backward_cache_name="current_order_ancestors",
),
),
Prefetch(
f"{prefix}s__ps",
queryset=(
P.objects.filter(c_id=c_id)
if c_id
else P.objects.all()
),
to_attr="needed_ps",
filter_callback=lambda p: hasattr(p,
"_prefetched_objects_cache")
and
p._prefetched_objects_cache.get("current_order_ancestors")
and any(
map(
lambda o: o.c2_id is not None,
p._prefetched_objects_cache.get(
"current_o_ancestors"
).values(),
)
),
post_prefetch_callback=ventilate_ps_by_c_id,
),
Prefetch(
f"{prefix}c2__u",
queryset=C2U.objects.filter(p2_id__isnull=False)
.distinct("c2_id")
.prefetch_related(
"p2",
Prefetch(
f"p2__ps",
queryset=(
P.objects.filter(c_id=c_id)
if c_id
else P.objects.all()
),
to_attr="needed_ps",
post_prefetch_callback=ventilate_ps_by_c_id,
),
),
to_attr="needed_u",
filter_callback=lambda c: (
hasattr(c2, "_prefetched_objects_cache")
and
c2._prefetched_objects_cache.get("current_o_ancestors")
and any(
map(
lambda o: o.c2_id is not None
and o.s_id is None,
c._prefetched_objects_cache.get(
"current_o_ancestors"
).values(),
)
)
),
),
)


class DModelSerializer(...):
...

@staticmethod
def enhance_queryset(
query_set,
prefix: str = "",
c_id: Optional[CId] = None,
):
query_set = SomeMixin2.enhance_queryset(
query_set,
f"{prefix}l__",
)
query_set = OModelSerializer.enhance_queryset(
query_set,
f"{prefix}l__",
c_id=c_id,
)
query_set = query_set.prefetch_related(
f"{prefix}l__s2",
f"{prefix}l__p3",
f"{prefix}l2__s3__u",
Prefetch(
f"{prefix}l__r1",
queryset=R1.objects.filter(
d_id=F("o__s2__d_id"),
r2__s4=SOME_CONSTANT,
).select_related("r2"),
to_attr="pertinent_r3",
),
)

return query_set
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35317#comment:7>
Reply all
Reply to author
Forward
0 new messages