So calling `get_path_info` everywhere when there's no filtered relation
being used is expensive:
{{{
In [2]: %timeit User.groups.rel.get_path_info()
2.27 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each
}}}
By defining additionally (and the opposite for places where
`get_reverse_path_info` is defined):
{{{
@cached_property
def path_infos(self):
return self.get_path_info()
}}}
we can amortize that cost over the life of the process such that:
{{{
In [3]: %timeit User.groups.rel.path_infos
164 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each
}}}
Thus, given 101 Users, each with 1 user permission, and 1 group (also with
single permission) as outlined here:
{{{
perms = tuple(Permission.objects.all())
for x in range(n):
y = get_random_string(10)
user = User.objects.create(username=y, email=f'{y}@{y}', password=y)
random_group_perm = random.choice(perms)
random_user_perm = random.choice(perms)
user.user_permissions.add(random_user_perm)
group = Group.objects.create(name=y)
group.permissions.add(random_group_perm)
user.groups.add(group)
}}}
we can execute the following before changing over to using a cached
property:
{{{
In [2]: %prun -stime -lpath_info for _ in range(100):
tuple(User.objects.prefetch_related('groups', 'user_permissions',
'groups__permissions'))
ncalls tottime percall cumtime percall filename:lineno(function)
31500 0.033 0.000 0.048 0.000 related.py:728(get_path_info)
600 0.002 0.000 0.009 0.000 related.py:1545(_get_path_info)
600 0.002 0.000 0.003 0.000
related.py:920(get_reverse_path_info)
600 0.001 0.000 0.009 0.000
related.py:1574(get_reverse_path_info)
600 0.000 0.000 0.010 0.000
reverse_related.py:197(get_path_info)
}}}
and then after:
{{{
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3 0.000 0.000 0.000 0.000 related.py:1559(_get_path_info)
3 0.000 0.000 0.000 0.000
related.py:934(get_reverse_path_info)
4 0.000 0.000 0.000 0.000 related.py:734(get_path_info)
3 0.000 0.000 0.000 0.000
related.py:1592(get_reverse_path_info)
3 0.000 0.000 0.000 0.000
reverse_related.py:207(get_path_info)
4 0.000 0.000 0.000 0.000 related.py:748(path_infos)
3 0.000 0.000 0.000 0.000
related.py:766(reverse_path_infos)
3 0.000 0.000 0.000 0.000
reverse_related.py:210(path_infos)
}}}
Note that those `ncalls` aren't a ''true'' reflection, because after the
cached property replaces itself subsequent attribute accesses won't be
tracked, hence the `timeit` information above.
Across the test suite, beforehand:
{{{
> python -m cProfile runtests.py -v0 | grep "path_info"
...
211111 0.227 0.000 0.335 0.000
related.py:728(get_path_info)
5349 0.021 0.000 0.073 0.000
related.py:1545(_get_path_info)
6804 0.019 0.000 0.027 0.000
related.py:920(get_reverse_path_info)
3469 0.006 0.000 0.013 0.000 wsgi.py:152(get_path_info)
Unrelated
5568 0.005 0.000 0.068 0.000
reverse_related.py:197(get_path_info)
3969 0.004 0.000 0.058 0.000
related.py:1574(get_reverse_path_info)
1380 0.001 0.000 0.021 0.000
related.py:1571(get_path_info)
124 0.000 0.000 0.001 0.000 fields.py:399(get_path_info)
113 0.000 0.000 0.000 0.000
related.py:742(get_reverse_path_info)
27 0.000 0.000 0.000 0.000
fields.py:416(get_reverse_path_info)
6 0.000 0.000 0.000 0.000
fields.py:359(_get_path_info_with_parent)
15 0.000 0.000 0.000 0.000
empty_join.py:54(get_path_info)
10 0.000 0.000 0.000 0.000
empty_join.py:67(get_reverse_path_info)
1 0.000 0.000 0.002 0.002
tests.py:23(test_bad_path_info Unrelated
1 0.000 0.000 0.000 0.000
tests.py:196(test_environ_path_info_type) Unrelated
1 0.000 0.000 0.000 0.000
tests.py:164(test_wsgirequest_path_info) Unrelated
2 0.000 0.000 0.000 0.000
request.py:148(get_full_path_info) Unrelated
}}}
and after switching to a cached property:
{{{
> python -m cProfile runtests.py -v0 | grep "path_info"
...
3469 0.006 0.000 0.013 0.000 wsgi.py:152(get_path_info)
Unrelated
873 0.003 0.000 0.004 0.000
related.py:734(get_path_info)
396 0.002 0.000 0.003 0.000
related.py:934(get_reverse_path_info)
203 0.001 0.000 0.006 0.000
related.py:1559(_get_path_info)
843 0.001 0.000 0.004 0.000 related.py:748(path_infos)
342 0.001 0.000 0.006 0.000
reverse_related.py:207(get_path_info)
256 0.000 0.000 0.006 0.000
reverse_related.py:210(path_infos)
196 0.000 0.000 0.002 0.000
related.py:766(reverse_path_infos)
130 0.000 0.000 0.004 0.000
related.py:1592(get_reverse_path_info)
73 0.000 0.000 0.002 0.000
related.py:1585(get_path_info)
17 0.000 0.000 0.000 0.000 fields.py:399(get_path_info)
67 0.000 0.000 0.002 0.000 related.py:1588(path_infos)
7 0.000 0.000 0.000 0.000
related.py:752(get_reverse_path_info)
15 0.000 0.000 0.000 0.000 fields.py:416(path_infos)
4 0.000 0.000 0.000 0.000
fields.py:420(get_reverse_path_info)
1 0.000 0.000 0.001 0.001
tests.py:23(test_bad_path_info) Unrelated
2 0.000 0.000 0.000 0.000
fields.py:359(_get_path_info_with_parent)
1 0.000 0.000 0.000 0.000
tests.py:196(test_environ_path_info_type) Unrelated
2 0.000 0.000 0.000 0.000
empty_join.py:54(get_path_info)
1 0.000 0.000 0.000 0.000
tests.py:164(test_wsgirequest_path_info) Unrelated
1 0.000 0.000 0.000 0.000
empty_join.py:67(get_reverse_path_info)
2 0.000 0.000 0.000 0.000
request.py:148(get_full_path_info) Unrelated
}}}
As you can see, even within the test suite there are a substantial number
of calls to effectively get the same information, across the many and
various test models. Obviously if you look at the timing numbers, it's all
kind of insubstantial. It's really just work that doesn't need repeatedly
doing, I don't think.
I have a branch which I'll tidy up and push to check CI and allow for
discussion, should it be accepted.
--
Ticket URL: <https://code.djangoproject.com/ticket/32996>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
Comment:
Accepting so discussion can continue with the code in front of us.
--
Ticket URL: <https://code.djangoproject.com/ticket/32996#comment:1>
* needs_better_patch: 0 => 1
* has_patch: 0 => 1
Comment:
Yes, it does rather sound like the last line of the ticket is me holding
the implementation hostage until it's triaged. Wasn't the intent, just
forgot to come back and update it once CI had run :) Feel free to put it
back to untriaged if that was a deciding factor :)
https://github.com/django/django/pull/14750
--
Ticket URL: <https://code.djangoproject.com/ticket/32996#comment:2>
* needs_better_patch: 1 => 0
Comment:
Unticking because I think it's ready to be looked at. Naturally it may go
back to needing improvement :)
--
Ticket URL: <https://code.djangoproject.com/ticket/32996#comment:3>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32996#comment:4>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"a697424969f7f464bf6492b09a6cdac135499e02" a6974249]:
{{{
#!CommitTicketReference repository=""
revision="a697424969f7f464bf6492b09a6cdac135499e02"
Fixed #32996 -- Cached PathInfos on relations.
PathInfo values are ostensibly static over the lifetime of the object
for which they're requested, so the data can be memoized, quickly
amortising the cost over the process' duration.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32996#comment:5>