[Django] #22757: prefetch_related isn't as effecient as it could be with GenericForeignKey and proxy models

40 views
Skip to first unread message

Django

unread,
Jun 3, 2014, 7:00:28 PM6/3/14
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
--------------------------------------+--------------------
Reporter: gwahl@… | Owner: nobody
Type: Uncategorized | Status: new
Component: contrib.contenttypes | Version: 1.6
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+--------------------
`prefetch_related` is able to prefetch across a GenericForiegnKey.
However, it does more queries than necessary when using proxy models.

Our models:

{{{#!python
from django.db import models
from django.contrib.contenttypes.generic import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class Parent(models.Model):
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
child = GenericForeignKey('content_type', 'object_id',
for_concrete_model=False)

class Child(models.Model):
pass

class ProxiedChild(Child):
class Meta:
proxy = True
}}}

Now create some instances

{{{
>>> child = Child.objects.create()
>>> proxied_child = ProxiedChild.objects.create()
>>> p1 = Parent.objects.create(child=child)
>>> p2 = Parent.objects.create(child=proxied_child)
}}}

And query using prefetch_related:

{{{
>>> Parent.objects.all().prefetch_related('child')
SELECT "foo_parent"."id", "foo_parent"."content_type_id",
"foo_parent"."object_id" FROM "foo_parent" LIMIT 21

SELECT "foo_child"."id" FROM "foo_child" WHERE "foo_child"."id" IN (3)

SELECT "foo_child"."id" FROM "foo_child" WHERE "foo_child"."id" IN (2)
}}}

This is doing 3 queries instead of the expected 2. The two queries for the
`foo_child` table should be combined to be `"foo_child"."id" IN (2, 3)`

--
Ticket URL: <https://code.djangoproject.com/ticket/22757>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 4, 2014, 12:22:46 PM6/4/14
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: prefetch_related | Patch needs improvement: 0
Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by charettes):

* needs_better_patch: => 0
* component: contrib.contenttypes => Database layer (models, ORM)
* needs_tests: => 0
* version: 1.6 => master
* keywords: => prefetch_related
* needs_docs: => 0
* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:1>

Django

unread,
Jul 10, 2017, 5:58:14 PM7/10/17
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: Paulo
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Paulo):

* owner: nobody => Paulo
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:2>

Django

unread,
Aug 22, 2022, 11:56:30 PM8/22/22
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* owner: Paulo => (none)
* status: assigned => new


Comment:

This is achievable by grouping queries by concrete model in
`GenericForeignKey.get_prefetch_queryset` but will also likely require
adjusting the prefetching logic to allow for some custom logic to be run
on assignment so returned `Child` instances can be turned into
`ProxiedChild`.

--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:3>

Django

unread,
Oct 19, 2023, 3:06:07 PM10/19/23
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: Nolan
Type: | Little
Cleanup/optimization | Status: assigned

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nolan Little):

* owner: (none) => Nolan Little


* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:4>

Django

unread,
Nov 3, 2023, 10:35:13 AM11/3/23
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nolan Little):

* owner: Nolan Little => (none)


* status: assigned => new


Comment:

I started working on this at DjangoCon US. I had difficulty coming up with
a true optimization. I started from
`GenericForeignKey.get_prefetch_queryset` like Simon recommended. Grouping
the queries by concrete model was fairly straightforward, the difficulty
began with turning all instances back into their correct concrete/proxy
types correctly.

The only method I could come up with required iterating over every object
and checking if its instance type was `proxy`. I created a dict of the
instances content types to instance PK during the query grouping so that I
could use it to identify the correct `ContentType`. In the below code
`fkeys` was a list of grouped ids for all concrete and proxy objects:

{{{
objs = concrete_ct.get_all_objects_for_this_type(pk__in=fkeys)
for obj in objs:
instance = instance_dict[obj.pk]
if instance.content_type.model_class()._meta.proxy:
obj.__dict__.pop("_state") # necessary?
ret_obj =
instance.content_type.model_class()(**obj.__dict__)
else:
ret_obj = obj
ret_val.append(ret_obj)
}}}

There may be a cleaner way to get the correct Instance type back that I'm
unaware of as this piece felt quite ugly:
{{{
obj.__dict__.pop("_state") # necessary?
ret_obj = instance.content_type.model_class()(**obj.__dict__)
}}}

Unless there is a way to avoid iterating over all objects to determine the
correct instance I question the viability of optimizing the amount of
queries here. Realistically it seems that the amount of objects returned
is likely to be far higher than the amount of proxy models queried.

If i'm way off on this approach or if there is a way to get the correct
type for return instances without iterating all objects I'd love to learn
about it! I don't anticipate having a lot of time to make progress on this
in the near future so I'm going to un-assign myself, but I'm happy to pick
it back up in the future if anyone has recommendations on the approach.

--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:5>

Django

unread,
Nov 3, 2023, 7:21:14 PM11/3/23
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:6>

Django

unread,
Nov 3, 2023, 9:46:22 PM11/3/23
to django-...@googlegroups.com
#22757: prefetch_related isn't as effecient as it could be with GenericForeignKey
and proxy models
-------------------------------------+-------------------------------------
Reporter: gwahl@… | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch_related | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Thanks for having a shot at this Nolan!

I think that a lot of the complexity here stems from the fact doesn't
support polymorphic querysets that is querysets that can return multiple
models.

This poses a challenge from an implementation perspective for this issue
because the (simplified here) return signature of `get_prefetch_querysets`
contains three members

1. A collection of prefetch querysets
2. A callable that takes a model instance returned from one of the
prefetch queries and return a tuple uniquely identifying the item (that's
`lambda obj: (obj.pk, obj.__class__)`)
3. A callable that take model instances related objects are prefetched for
and return a tuple uniquely identifying the related items (that's
`gfk_key`)

The `prefetch_related` logic then executes all the querysets from 1.,
build a map from 2, and inserts it with 3. to do the proper
`_prefetch_related_cache` assignments.

Since Django only allows for a single model class to be used to construct
objects returned from a queryset, remember the lack of polymorphism
support, then one might be tempted to annotate objects with a
`_content_type_id` attribute to have the logic in 2. changed to `lambda
obj: (obj.pk, self.get_content_type(id=obj._content_type_id,
using=obj._state.db).model_class()` so the proper mapping occurs. Doing do
doesn't solve the problem of proper `__class__` assignment as you've
brought up Nolan nor does it address another edge case that would have
probably caused during review.

Even if we're able to return a specialized `Queryset` that overrides the
`_iterable_class` to return mixed models ([https://github.com/django-
polymorphic/django-
polymorphic/blob/5111fb070e4e0b0153e816d163871f2039ca2ae2/polymorphic/query.py#L25
it's the tactic used by packages like django-polymorphic] for example) how
we deal with the case where the same id is needs to be prefetched both
concretely and for it's proxy.

In other words, given the reported scenario, what kind of SQL query would
we generate for

{{{#!python
child = Child.objects.create(id=1)
proxy_child = ProxiedChild.objects.get(id=1)

parent = Parent.objects.create(child=child)
proxy_parent = Parent.objects.create(child=proxy_child)

Parent.objects.prefetch_related("child")
}}}

In this case the same `foo_child` row needs to returned twice one as a
child and one as a proxy_child which cannot be achieved with a simple `id
IN (1, 1)`.

The only way I can think of dealing with this edge case is by using a
`UNION ALL` with combined annotation so the following SQL would be
generated

{{{#!sql
-- Assuming the content type of the Child model is 2 and the content type
of the ProxiedChild model is 3
SELECT "foo_child"."id", 2 AS "_content_type_id" FROM "foo_child" WHERE
"foo_child"."id" IN (1)
UNION ALL
SELECT "foo_child"."id", 3 AS "_content_type_id" FROM "foo_child" WHERE
"foo_child"."id" IN (1)
}}}

And that must then be combined with the `_iterable_class` override
mentioned above.

There's even more fun now that we've added `GenericPrefetch` in 5.0 as it
allows you to specify querysets for each of the models involved which
means that it allows passing querysets with `select_related` and
annotations which cannot use the `UNION ALL` approach described above so
the ''optimization'' would have to be disabled.

After looking at this into more details I also agree that the addition in
complexity to avoid this query is likely not worth it as it would either
require some potentially expensive Python workaround or significant boiler
plate to get working properly. I suspect we'll want to ''wontfix'' this
one unfortunately.

--
Ticket URL: <https://code.djangoproject.com/ticket/22757#comment:7>

Reply all
Reply to author
Forward
0 new messages