[Django] #22014: `prefetch_related` recursion protection does not cover all cases

22 views
Skip to first unread message

Django

unread,
Feb 11, 2014, 11:33:25 AM2/11/14
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------+----------------------
Reporter: StillNewb | Owner: nobody
Type: Bug | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Keywords: prefetch
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+----------------------
This issue related to #17014

Commit (see line 1655):
https://github.com/django/django/commit/64da8eec3077eaaba5cfa62775523da772f4de44
#diff-5b0dda5eb9a242c15879dc9cd2121379R1655

== Summary: ==
`prefetch_related` collects additional prefetch lookups made by querysets
that are created during prefetching, and handle them to in addition to
lookups defined by user.
Sometimes it can casue infinite recursion, so for preventing that, there
needed some mechanism that ensures that prefetches that was already done,
would not be performed again.

== Problems in the code: ==
Now that mechanism stores descriptors of relational fields and checks
against them every iteration.
Python descriptor is shared against instances of classes. So descriptor
represents relational field of model, not field of instance.
For same relation and different sets of instances there are different
prefetch queries, so descriptors cant be used as identifiers for facts
that lookup was already prefetched.
And I also would add that passing descriptors around is very much
unpythonic and not healthy practice in general. Descriptors are advanced
properties, they must serve for same purpose as properties - hiding some
logic in attribute assigment/deletion/retrieving.

Reason here - is to identify lookups while traversing data relationships
and never repeat them.
In code this reason is not well exporessed:

* There is `done_queries` and `followed_descriptors` - two places for same
thing.
* Check is against descriptors, but in `done_queries` lookup paths is
being added.
* Check against lookup type (auto lookup or normal) is redundant, there is
no difference between auto-lookups and normal lookups in matter of which
allowed to spam, they must be checked for which was already done
uniformly.

**Specific problem** is in the check condition for adding in
`done_lookups` - `not (lookup in auto_lookups and descriptor in
followed_descriptors)` - intention here was "if this is not autolookup,
descriptor for which is already seen, and presumably lookup path for it
was added to done_queries - not adding it to done_queries. So autolookup-
spam will be stopped".
But what if descriptor for field, throug lookup-spam assumed to flow, was
already added while performing different lookup with different
data/parameters? If that lookup will be autolookup - it will never be
added to done_queries and would be allowed to be performed infinity number
of times.

There is too much unneeded complexity in that code.

ps.
Ive made the patch with two tests.

--
Ticket URL: <https://code.djangoproject.com/ticket/22014>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 16, 2014, 12:08:41 PM5/16/14
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------+--------------------------------------
Reporter: StillNewb | Owner: nobody
Type: Bug | Status: closed
Component: Uncategorized | Version: master
Severity: Normal | Resolution: needsinfo

Keywords: prefetch | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by bmispelon):

* status: new => closed
* needs_docs: => 0
* resolution: => needsinfo
* needs_tests: => 0
* needs_better_patch: => 0


Comment:

Hi,

Correct me if I'm wrong, but from what I can tell, the test you provide
doesn't demonstrate an infinite loop.

There does appear to be some repeated queries but I'm not familiar enough
with the ORM internals to say whether it's actually a bug or a performance
issue that we could improve.


I'll mark this ticket as "needs info". Can you reopen it with a testcase
that shows the infinite recursion?

Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/22014#comment:1>

Django

unread,
Jun 18, 2014, 5:41:56 AM6/18/14
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------+--------------------------------------

Reporter: StillNewb | Owner: nobody
Type: Bug | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Resolution:

Keywords: prefetch | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by Pashkin):

* status: closed => new
* resolution: needsinfo =>


Comment:

Yes, you right, in the patch I've demostrated one "step" of infinite
recursion, now I wrote third test which really reproduces infinite
recursion.
Here is the commit:
https://github.com/AndrewPashkin/django/commit/87d60b4958dc9c63c1dd970930e8c20d03e5eb36

--
Ticket URL: <https://code.djangoproject.com/ticket/22014#comment:2>

Django

unread,
Jul 28, 2014, 3:50:33 PM7/28/14
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------------+-------------------------------------

Reporter: StillNewb | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by timo):

* component: Uncategorized => Database layer (models, ORM)
* stage: Unreviewed => Accepted


Comment:

Confirmed the provided tests do fail.

--
Ticket URL: <https://code.djangoproject.com/ticket/22014#comment:3>

Django

unread,
Jul 30, 2014, 4:49:28 AM7/30/14
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------------+-------------------------------------

Reporter: StillNewb | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: prefetch | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by anonymous):

I actually have fixed this bug in this app:
https://bitbucket.org/andrew_pashkin/django-deep-prefetch/

It adds feature for prefetching sequential GFK lookups and fixes this bug.
Code from it can be ported to Django.

--
Ticket URL: <https://code.djangoproject.com/ticket/22014#comment:4>

Django

unread,
Dec 16, 2019, 12:18:00 PM12/16/19
to django-...@googlegroups.com
#22014: `prefetch_related` recursion protection does not cover all cases
-------------------------------------+-------------------------------------
Reporter: StillNewb | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: prefetch | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Baptiste Mispelon):

* status: new => closed

* resolution: => needsinfo


Comment:

Using a modified version of the `prefetch_recursion_protection_fail.diff​`
patch attached at the top of the ticket, I was able to run the provided
test against the latest master and the problem seems to have been fixed.

Using `git bisect`, I found that the commit that fixed the attached
failing test was bdbe50a491ca41e7d4ebace47bfe8abe50a58211.

Sadly, link from comment:2 is now dead so I can't reproduce any infinite
recursion error so I'm going to draw the same conclusion as I did in
comment:1:

I'm going to close this ticket as `needsinfo`. Please reopen if you can
provide some steps to reproduce the issue (ideally in the form of a unit
test against a current version of Django).

Thanks.

For reference, here's the modified testcase I used:
{{{#!python
class RecursionProtectionTests(TestCase):
def setUp(self):
self.book1 = Book.objects.create()
self.book2 = Book.objects.create()
self.author1 = Author.objects.create(first_book=self.book1,
name='Author #1')
self.author2 = Author.objects.create(first_book=self.book1,
name='Author #2')
self.publisher = Publisher.objects.create()
self.author1.books.add(self.book1)
self.author2.books.add(self.book2)
self.book1.publishers.add(self.publisher)
self.publisher.authors.add(self.author1, self.author2)

def walk(self, authors):
fetched_books = []
for author in authors:
for book in author.books.all():
for publisher in book.publishers.all():
for published_author in publisher.authors.all():
for fetched_book in published_author.books.all():
fetched_books.append(fetched_book)
return fetched_books

@override_settings(DEBUG=True)
def test_descriptors_storing_protection1(self):
qs1 =
Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books',
Book.objects.prefetch_related('publishers__authors__books')),
'books__publishers__authors__books')
qs2 =
Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books',
Book.objects.prefetch_related('publishers__authors__books')),
'books__publishers__authors__books', 'books__publishers__authors__books')


def count_queries(qs):
offset = len(connection.queries)
results = self.walk(qs)
return results, len(connection.queries[offset:])

results1, queries1 = count_queries(qs1)
results2, queries2 = count_queries(qs2)

self.assertEqual(len(results1), 2)
self.assertEqual(len(results2), 2)

self.assertEqual(queries1, queries2)


@override_settings(DEBUG=True)
def test_only_unique_queries(self):
qs =
Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books',
Book.objects.prefetch_related('publishers__authors__books')),
'books__publishers__authors__books')
offset = len(connection.queries)
self.walk(qs)
queries = [q['sql'] for q in connection.queries[offset:]]
self.assertEqual(len(queries), len(set(queries)))
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/22014#comment:5>

Reply all
Reply to author
Forward
0 new messages