[Django] #25544: prefetch_related sends duplicate ids to database

31 views
Skip to first unread message

Django

unread,
Oct 12, 2015, 7:53:43 AM10/12/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
----------------------------------------------+----------------------------
Reporter: spectras | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 1.8
Severity: Normal | Keywords: prefetch
| duplicate
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+----------------------------
That may be intended, but it seems dubious, so…

Issue happens when prefetching a model relation from another one, linked
by a M2M field. Sample case:

{{{#!python
class Picture(Model):
# some fields

class Collection(Model):
pictures = ManyToManyField(Picture, blank=True,
related_name='collections')
# some fields

class CollectionPermission(Model):
content_object = ForeignKey(Collection,
related_name='permissions')
# fk to users and fk to permissions
}}}

When prefetching collection permissions from a picture queryset, like
this:

{{{#!python
Picture.objects.prefetch_related('collections__permissions')
}}}

The third query (the second level of prefetching) looks like this:
{{{#!sql
SELECT "gallery_collectionpermission"."id",
"gallery_collectionpermission"."user_id",
"gallery_collectionpermission"."permission_id",
"gallery_collectionpermission"."content_object_id"
FROM "gallery_collectionpermission"
WHERE "gallery_collectionpermission"."content_object_id" IN (
11, 11, 11, 11,
11, 11, 11, 11,
11, 11, 10, 10,
10, 10, 10, 10,
10, 10, 10, 10,
10, 10, 10, 10,
10, 10, 10, 10,
10, 10, 10, 10,
10, 10, 9, 9,
9, 9, 9 )
}}}

The issue arises in two steps:
* As prefetch has no identity mapping, the first prefetching step will
build a different Collection instance for every picture that uses it.
* Then the second prefetching step will gather all instances, and dump
their ids to the database without looking for duplicates.

Ideally, this could be fixed at the first step: when a queryset prefetches
the same row several times, it would make sense to return a single
instance. In this example, all Pictures' cache should point to the same
Gallery instances.

The second step could also be fixed: the related manager could eliminate
duplicates when building the query. For some reason the
ReverseSingleRelatedObjectDescriptor does it, but none of the other
descriptors or managers do.

The cost seems huge. Typically, I have collections of hundreds of
pictures, but each picture will only appear in 4 or 5 collections. The
query thus sends the db server tens of thousands of ids instead of a few
dozens.

Is there something I missed? Or is this use case out of the normal scope
of `prefetch_related`?

--
Ticket URL: <https://code.djangoproject.com/ticket/25544>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 21, 2015, 12:03:39 PM10/21/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* type: Uncategorized => Cleanup/optimization
* needs_tests: => 0
* needs_docs: => 0


Comment:

I'm not too familiar with this, but accepting on the basis for further
investigation.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:1>

Django

unread,
Nov 7, 2015, 7:53:08 AM11/7/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned

Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ian-Foote):

* owner: nobody => Ian-Foote
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:2>

Django

unread,
Nov 7, 2015, 11:19:22 AM11/7/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Ready for
| checkin

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by carljm):

* stage: Accepted => Ready for checkin


Comment:

Patch is ready, IMO. Waiting for actual merge until the CI catches up...

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:3>

Django

unread,
Nov 7, 2015, 2:01:57 PM11/7/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* needs_better_patch: 0 => 1
* has_patch: 0 => 1
* stage: Ready for checkin => Accepted


Comment:

`postgres_tests.test_array.TestQuerying.test_in` fails.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:4>

Django

unread,
Nov 7, 2015, 8:28:42 PM11/7/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by carljm):

@IanFoote Oops, my fault. I didn't consider the possibility that non-
hashable values (e.g. lists for an ArrayField) could be passed to `__in`
lookups. I don't really see a way to preserve this fix in the face of that
possibility (going through the value and trying to convert everything to
something hashable doesn't seem like a good plan), so unless you have any
ideas I'm not seeing, I think we may need to revert to your previous fix
at the `prefetch_related` layer.

Sorry for the bad direction!

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:5>

Django

unread,
Nov 8, 2015, 1:53:03 AM11/8/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by charettes):

@carljm, what do you think of simply special casing `ArrayField`'s `__in`
lookup which is something we already do for `__exact` anyway.

Something like the following comes to mind.

{{{#!python
@ArrayField.register_lookup
class ArrayInLookup(InLookup):
def get_prep_lookup(self):
values = super(ArrayInLookup, self).get_prep_lookup()
return [tuple(value) for value in values]
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:6>

Django

unread,
Nov 8, 2015, 5:18:00 AM11/8/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by carljm):

@charettes I think that's a good idea! As long as we're confident that a)
there are no other valid cases outside ArrayField where an unhashable
value might be passed in an `__in` lookup rhs, and b) every value passed
to an ArrayField lookup should always be iterable. But I can't come up
with any counter-examples to either of those.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:7>

Django

unread,
Nov 8, 2015, 6:37:23 AM11/8/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

A possibility is to try-except the deduping of the values. If it doesn't
succeed, just use the original values.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:8>

Django

unread,
Nov 8, 2015, 10:55:49 AM11/8/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by spectras):

I do not know if it is related, but ̀ArrayField.get_db_prep_value` always
converts tuples to lists.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:9>

Django

unread,
Nov 8, 2015, 8:59:39 PM11/8/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by charettes):

Might be worth pinging Mark about this.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:10>

Django

unread,
Nov 10, 2015, 5:04:37 AM11/10/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ian-Foote):

* needs_better_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:11>

Django

unread,
Nov 10, 2015, 5:48:04 PM11/10/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by carljm):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:12>

Django

unread,
Nov 12, 2015, 1:44:42 PM11/12/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* needs_better_patch: 0 => 1

* stage: Ready for checkin => Accepted


Comment:

Latest comments on the PR indicates a few more tests should be added.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:13>

Django

unread,
Dec 14, 2015, 3:07:25 PM12/14/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: master

(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* needs_better_patch: 1 => 0

* version: 1.8 => master


Comment:

The comments have been addressed.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:14>

Django

unread,
Dec 17, 2015, 7:09:09 PM12/17/15
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: spectras | Owner: Ian-Foote
Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"86eccdc8b67728d84440a46e5bf62c78f2eddf6d" 86eccdc8]:
{{{
#!CommitTicketReference repository=""
revision="86eccdc8b67728d84440a46e5bf62c78f2eddf6d"
Fixed #25544 -- Removed duplicate ids in prefetch_related() queries.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:15>

Django

unread,
Jun 18, 2018, 10:21:46 AM6/18/18
to django-...@googlegroups.com
#25544: prefetch_related sends duplicate ids to database
-------------------------------------+-------------------------------------
Reporter: Julien Hartmann | Owner: Ian Foote

Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: prefetch duplicate | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nick Pope):

The fix to this resulted in a regression where parameters passed to `__in`
were reordered due to the use of `set()`. See #29503.

--
Ticket URL: <https://code.djangoproject.com/ticket/25544#comment:16>

Reply all
Reply to author
Forward
0 new messages