[Django] #26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated model


Django

Mar 3, 2016, 5:07:14 PM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: aleontiev                  | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      | Keywords: queryset
Severity: Normal                     |   prefetch_related duplicate
Triage Stage: Unreviewed             | Has patch: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
We discovered an issue in the {{{django.db.models.query.Prefetch}}} logic
that results in duplicate queries being made when the leaves of certain
prefetch trees are accessed.

With these models:

{{{
from django.db import models


class Publisher(models.Model):
    name = models.CharField(max_length=128)


class Author(models.Model):
    name = models.CharField(max_length=128)
    publishers = models.ManyToManyField('Publisher', related_name='authors')


class Book(models.Model):
    name = models.CharField(max_length=128)
    publisher = models.ForeignKey('Publisher', related_name='books')
}}}

The following test fails:

{{{
from django.db.models.query import Prefetch
from django.test import TestCase
from .models import Author, Book, Publisher


def flatten(ls):
    return [i for s in ls for i in s]


class PrefetchTestCase(TestCase):

    def setUp(self):
        publisher = Publisher.objects.create(name='Publisher0')
        Book.objects.create(name='Book0', publisher=publisher)
        author = Author.objects.create(name='Author0')
        author.publishers.add(publisher)

    def test_prefetch_nested(self):
        publishers = Publisher.objects.prefetch_related(
            Prefetch(
                'books',
                Book.objects.all().prefetch_related(
                    Prefetch(
                        'publisher',
                        Publisher.objects.all().prefetch_related('authors')
                    )
                )
            )
        )
        with self.assertNumQueries(4):
            publishers = list(publishers)

        with self.assertNumQueries(0):
            books = flatten([p.books.all() for p in publishers])
        with self.assertNumQueries(0):
            publishers = [b.publisher for b in books]
        with self.assertNumQueries(0):
            authors = flatten([p.authors.all() for p in publishers])
}}}

For more details (comments, queries executed) and an analogous passing
test case that uses the flat prefetch form to prefetch the same tree, see
the attached test package.
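
The attached package is not reproduced in this thread. As a rough sketch of what "flat prefetch form" presumably means here (plain double-underscore lookup paths instead of nested `Prefetch` objects), the hypothetical helper below, `flatten_lookups`, expands the prefetch tree from the failing test into flat paths. The helper and the dictionary representation of the tree are illustrative assumptions, not part of Django or of the attachment.

```python
def flatten_lookups(tree, prefix=""):
    """Expand a nested {relation: {sub_relation: ...}} mapping into flat
    double-underscore lookup paths, listing parents before children.

    Hypothetical helper for illustration only; not a Django API.
    """
    paths = []
    for name, children in tree.items():
        path = f"{prefix}__{name}" if prefix else name
        paths.append(path)
        paths.extend(flatten_lookups(children, path))
    return paths


# Prefetch tree from the failing test: Publisher -> books -> publisher -> authors.
nested_tree = {"books": {"publisher": {"authors": {}}}}
flat_lookups = flatten_lookups(nested_tree)
print(flat_lookups)

# With the models above, the flat form would then be written as
# (not executed here, since it needs a configured Django project):
#   Publisher.objects.prefetch_related(*flat_lookups)
```

Passing only the deepest path (`books__publisher__authors`) should be equivalent, since `prefetch_related` also prefetches the intermediate relations of a lookup path.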

To run the tests:

{{{
tar -zxvf prefetch-bug-test.tar.gz
cd prefetch-bug-test
make test
}}}

This issue seemed very similar to
https://code.djangoproject.com/ticket/25546, but unfortunately the patch
for that ticket
(https://code.djangoproject.com/changeset/bdbe50a491ca41e7d4ebace47bfe8abe50a58211)
did not fix this problem.

Tested in Django 1.8, 1.9, and master.

--
Ticket URL: <https://code.djangoproject.com/ticket/26318>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 3, 2016, 5:07:30 PM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------------------+-------------------------
Reporter: aleontiev                              | Owner: nobody
Type: Bug                                        | Status: new
Component: Database layer (models, ORM)          | Version: master
Severity: Normal                                 | Resolution:
Keywords: queryset prefetch_related duplicate    | Triage Stage: Unreviewed
Has patch: 0                                     | Easy pickings: 0
UI/UX: 0                                         |
-------------------------------------------------+-------------------------
Changes (by aleontiev):

* Attachment "prefetch-bug-test.tar.gz" added.

Django

Mar 3, 2016, 6:42:29 PM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: aleontiev                  | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* needs_docs: => 0
* needs_better_patch: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

Managed to reproduce against master.

I suppose the algorithm gets confused because the `Publisher` model has to
be prefetched twice.

--
Ticket URL: <https://code.djangoproject.com/ticket/26318#comment:1>

Django

Apr 28, 2016, 7:56:20 PM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: aleontiev                  | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by melinath):

I just ran into this as well, but without recursion, i.e. (with apologies
for the implied structure):

{{{
users = User.objects.prefetch_related(
    Prefetch(
        'person',
        queryset=Person.objects.prefetch_related(
            'books',
        ),
    ),
)
}}}

would generate four queries instead of three (because the books query is
duplicated).

To be clear, this is a trimmed-down version of the actual query I'm trying
to construct. I haven't tested it (and I don't know whether I'll be able
to any time soon). All I'm saying is that someone should try tweaking the
test case so that it doesn't use recursion.

--
Ticket URL: <https://code.djangoproject.com/ticket/26318#comment:2>

Django

Apr 7, 2018, 4:55:43 AM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: Anthony Leontiev           | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Yuri Kanivetsky):

I've run into it as well. Consider a model `M1` that has a many-to-many
relationship to `M3` via `M2`, and `M3` in turn has a one-to-many
relationship to `M4`:

{{{
M1 <- M2 -> M3 <- M4
}}}

And the following statement, which is supposed to fetch all models:

{{{#!python
m1s = M1.objects.all().prefetch_related(
    Prefetch('m2s', queryset=M2.objects.prefetch_related('m3__m4s'))
)
}}}

This statement results in two calls to
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1437
prefetch_related_objects]: one for the `M1`s, and a nested one for the `M2`s.
The nested call performs
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1455
one prefetch lookup], `m3__m4s`. The outer call performs two, `m2s` and
`m2s__m3__m4s`, because inner lookups
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1641-L1644
get passed] to the outer `prefetch_related_objects` call.
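
As a toy model of the lookup hoisting described above (not Django's actual code), the sketch below shows how an outer call that starts with only `m2s` ends up processing `m2s__m3__m4s` as well once the inner queryset's lookups are prefixed and queued. The function name `hoist_inner_lookups` and the mapping of relation names to inner lookups are assumptions made for illustration.

```python
from collections import deque


def hoist_inner_lookups(outer_lookups, inner_lookups_by_relation):
    """Toy model: process the outer lookups in order; whenever the queryset
    behind a lookup carries its own lookups, prefix them with the outer
    path and queue them on the outer call.

    Hypothetical helper for illustration only; not a Django API.
    """
    pending = deque(outer_lookups)
    processed = []
    while pending:
        lookup = pending.popleft()
        if lookup in processed:
            continue  # each lookup path is handled once per call
        processed.append(lookup)
        hoisted = [
            f"{lookup}__{inner}"
            for inner in inner_lookups_by_relation.get(lookup, [])
        ]
        # Hoisted lookups go to the front of the queue, keeping their order.
        pending.extendleft(reversed(hoisted))
    return processed


# Outer call: M1.objects.prefetch_related(Prefetch('m2s', queryset=...)),
# where the inner queryset itself asks for 'm3__m4s'.
outer_call_lookups = hoist_inner_lookups(["m2s"], {"m2s": ["m3__m4s"]})
print(outer_call_lookups)
```

In this model the outer call handles both `m2s` and `m2s__m3__m4s`, which matches the two lookups named above.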

Now,
[https://github.com/django/django/blob/2.0.4/django/db/models/fields/related_descriptors.py#L456-L510
reverse many-to-one descriptors] (which are used to access the `M4`s from
the `M3`s) don't have a
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1587
get_prefetch_queryset] method, so the models are cached in
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1706
the instances].

The problem is that in our case `cache_name` is `m4`, but `get_prefetcher`
(or basically `prefetch_related_objects`) expects to see
[https://github.com/django/django/blob/2.0.4/django/db/models/query.py#L1606
m4s] there. As a result, both the outer and the inner
`prefetch_related_objects` calls retrieve the `M4`s from the database.

This can be worked around by moving the nested lookup to the top level:

{{{#!python
m1s = M1.objects.all().prefetch_related('m2s', 'm2s__m3__m4s')
}}}

I'm not sure whether that is possible in every case.

--
Ticket URL: <https://code.djangoproject.com/ticket/26318#comment:3>

Django

Apr 7, 2018, 5:02:04 AM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: Anthony Leontiev           | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Yuri Kanivetsky):

* Attachment "nested_prefetch_create_app.sh" added.

Creates an app in the `nested_prefetch` directory to easily reproduce the issue.

Django

Apr 7, 2018, 5:05:12 AM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: Anthony Leontiev           | Owner: nobody
Type: Bug                            | Status: new
Component: Database layer            | Version: master
  (models, ORM)                      |
Severity: Normal                     | Resolution:
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Yuri Kanivetsky):

* Attachment "nested_prefetch_create_app.sh" added.

Creates an app in the `nested_prefetch` directory to easily reproduce the issue.


Django

May 2, 2022, 7:34:22 AM
to django-...@googlegroups.com
#26318: Unexpected / duplicated queries on nested Prefetch queryset with repeated
model
-------------------------------------+-------------------------------------
Reporter: Anthony Leontiev           | Owner: nobody
Type: Bug                            | Status: closed
Component: Database layer            | Version: dev
  (models, ORM)                      |
Severity: Normal                     | Resolution: duplicate
Keywords: queryset                   | Triage Stage: Accepted
  prefetch_related duplicate         |
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => duplicate


Comment:

Fixed by f5233dce309543c826224be9dfa9c9f4f855f73c. Duplicate of #32511.

--
Ticket URL: <https://code.djangoproject.com/ticket/26318#comment:4>
