[Django] #28455: Create "inplace" QuerySets to speed up certain operations


Django

Jul 31, 2017, 1:56:50 PM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Part two of #20880: inplace querysets behave exactly like full-clone
querysets, except that no cloning happens at all between operations. This
allows speeding up certain internal operations, such as model.save() and
prefetch_related(), and also lets users optimise heavy-to-create querysets
by avoiding clone() overhead.
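To make the cost model concrete, here's a toy (non-Django) sketch of the difference: a chaining query builder that copies its filter state on every call, versus the same builder flipped into an in-place mode that mutates itself. The class and attribute names are illustrative only, not Django's API.

```python
import copy

class ToyQuerySet:
    """Toy query builder; names are illustrative, not Django's API."""
    clone_calls = 0  # class-wide counter, for demonstration

    def __init__(self):
        self.filters = []
        self._inplace = False

    def _clone(self):
        if self._inplace:
            return self  # inplace mode: mutate self, skip the copy
        ToyQuerySet.clone_calls += 1
        new = ToyQuerySet()
        new.filters = copy.deepcopy(self.filters)
        return new

    def filter(self, **conditions):
        qs = self._clone()
        qs.filters.append(conditions)
        return qs

# Normal chaining: every .filter() pays for a full copy of the state.
qs = ToyQuerySet().filter(x=1).filter(y=2).filter(z=3)
cloning_cost = ToyQuerySet.clone_calls  # 3 copies for 3 operations

# Inplace chaining: the same chain mutates one object, zero extra copies.
qs2 = ToyQuerySet()
qs2._inplace = True
qs2.filter(x=1).filter(y=2).filter(z=3)
```

In the inplace case the chain still reads identically at the call site; only the per-operation copy disappears.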

--
Ticket URL: <https://code.djangoproject.com/ticket/28455>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Aug 10, 2017, 1:24:08 PM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

I [https://github.com/django/django/compare/master...timgraham:28455-inplace?expand=1 rebased the work] from [https://github.com/akaariai/django/commits/splitted_clone Anssi's branch]. I guess the plan was to keep "inplace" as a private API initially (hence the underscore prefix), but I'm not sure. Someone else is welcome to continue this work; I'm not planning to return to it soon.

--
Ticket URL: <https://code.djangoproject.com/ticket/28455#comment:1>

Django

Oct 14, 2019, 9:07:05 PM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28455#comment:2>

Django

Jul 20, 2021, 8:19:32 AM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* owner: nobody => Keryn Knight
* status: new => assigned


Comment:

I don't think I'm stepping on anyone's toes at this juncture, but do shout
if so!

I'd accidentally been wading into similar waters when I stumbled across
this ticket, and I've ended up at roughly the same place and conclusions. I
have a rough sketch, based on the prior branches, for an MVP to resume
discussion on. I'll tidy up what I have, try to gather some simple
benchmarks, and throw it into the PR queue for sanity checking.

I guarantee there's more nuance to this than I've anticipated so far, but
we've got to restart somewhere, right? If it works out, all the better :)

--
Ticket URL: <https://code.djangoproject.com/ticket/28455#comment:3>

Django

Jul 20, 2021, 11:47:12 AM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Keryn Knight):

PR is https://github.com/django/django/pull/14675

Based heavily on Anssi's original implementation idea. This introduces two
ways of opting out of cloning, both ostensibly private API. First and
primarily, a context manager which temporarily makes the current QuerySet
(and, implicitly, its Query) mutable:

{{{
with User.objects.all()._avoid_cloning() as queryset1:
    queryset1.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])

# OR

queryset2 = User.objects.all()
with queryset2._avoid_cloning():
    queryset2.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])
}}}

And secondly, an imperative form, for use cases where there might be
multiple filters spread across a long run of lines, where it's not
desirable to muddy the VCS history and/or the indent depth with the
context manager:
{{{
queryset = User.objects.all()._disable_cloning()
queryset.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])
queryset._enable_cloning()
queryset2 = queryset.filter(abc='test')  # This should be a new clone, as usual.
}}}

There's currently no support for any sort of nesting or ensured balancing
(same as the original implementation, as I understand it), so
`._disable_cloning()._disable_cloning()._disable_cloning()` would be
undone by a single `._enable_cloning()` call: it's a toggle rather than a
stack of pushes and pops, hence the preference for the context manager.
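For comparison, one hypothetical way to make the imperative form nest safely would be a depth counter rather than a boolean, so each enable pops exactly one disable. This is only a sketch of the idea, not part of the PR; the `CloneControl` name and methods are invented for illustration.

```python
class CloneControl:
    """Illustrative sketch only: a nesting-safe variant of the toggle."""

    def __init__(self):
        self._no_clone_depth = 0

    def disable_cloning(self):
        self._no_clone_depth += 1  # push one level
        return self

    def enable_cloning(self):
        if self._no_clone_depth:
            self._no_clone_depth -= 1  # pop exactly one level
        return self

    @property
    def cloning_enabled(self):
        return self._no_clone_depth == 0

cc = CloneControl()
cc.disable_cloning().disable_cloning().disable_cloning()
cc.enable_cloning()                        # pops one of three pushes
still_disabled = not cc.cloning_enabled    # cloning is still off
cc.enable_cloning().enable_cloning()       # now balanced again
```

With a plain boolean toggle, the single `enable_cloning()` call above would have re-enabled cloning immediately, which is exactly the imbalance the context manager sidesteps.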

Given the following SQLite data:
{{{
In [1]: from django.contrib.auth.models import User, Group, Permission
In [2]: Group.objects.count()
Out[2]: 101
In [3]: User.objects.count()
Out[3]: 101
In [4]: Permission.objects.count()
Out[4]: 24
In [5]: User.user_permissions.through.count()
Out[5]: 0
In [6]: for user in User.objects.all():
   ...:     print(f'{user.groups.count()}, {user.groups.first().pk}, {user.user_permissions.count()}')
1, 1, 0
1, 2, 0
1, 3, 0
1, 4, 0
1, 5, 0
... they all have 1 group and zero permissions
In [7]: from django.db import connection
In [8]: connection.queries[-2:]
[{'sql': 'SELECT "auth_user"."id", ... FROM "auth_user"',
  'time': '0.001'},
 {'sql': 'SELECT ("auth_user_groups"."user_id") AS "_prefetch_related_val_user_id", "auth_group"."id", "auth_group"."name" FROM "auth_group" INNER JOIN "auth_user_groups" ON ("auth_group"."id" = "auth_user_groups"."group_id") WHERE "auth_user_groups"."user_id" IN (1, 2, 3, [...] 99, 100, 101)',
  'time': '0.000'}]
}}}

Before applying any of the changes:
{{{
In [2]: %timeit tuple(User.objects.all())
1.26 ms ± 9.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
(0 calls to ._clone)
In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
6.52 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(105 calls to ._clone)
In [4]: %timeit tuple(User.objects.prefetch_related('groups', 'user_permissions'))
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(213 calls to ._clone)
}}}
After enabling the clone-avoidance technique for prefetch_related only:
{{{
In [2]: %timeit tuple(User.objects.all())
1.28 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
(0 calls to ._clone)
In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
5.93 ms ± 53.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(4 calls to ._clone)
In [4]: %timeit tuple(User.objects.prefetch_related('groups', 'user_permissions'))
10.2 ms ± 59.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(7 calls to ._clone)
}}}

So for one prefetch it's a decrease of roughly 9%, and for two prefetches
a decrease of roughly 15%. I make no pretense of being able to express the
improvement as a factor rather than a percentage without risk of getting
it wrong.

The number of calls to clone was obtained by re-running the queries in
fresh `ipython` sessions with `_clone()` patched to increment a global
integer each time it's called; that instrumentation isn't part of the
actual timeit numbers.
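The counting technique can be sketched like this: wrap `_clone` so every call bumps a module-level counter while otherwise behaving unchanged. `FakeQuerySet` here is a stand-in for illustration, not Django's `QuerySet`.

```python
import functools

class FakeQuerySet:
    """Stand-in for django.db.models.QuerySet, for illustration only."""
    def _clone(self):
        return FakeQuerySet()

clone_count = 0

def counted(method):
    """Wrap a method so each call increments a module-level counter."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        global clone_count
        clone_count += 1
        return method(self, *args, **kwargs)
    return wrapper

# Patch the class in place; every instance is now instrumented.
FakeQuerySet._clone = counted(FakeQuerySet._clone)

qs = FakeQuerySet()
qs._clone()._clone()  # two instrumented calls
```

Because the wrapper is applied to the class, all call sites are counted without touching the timed code paths themselves.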

These improved prefetch timings only materialise when hitting the
`manager.get_queryset()` path of `prefetch_one_level`, because there we
can assume a fresh QuerySet instance (or subclass) is returned that holds
valid data and doesn't need cloning.
The same cannot be said of the `if leaf and lookup.queryset [...]` branch,
I don't think, because that queryset is shared and in an unknown state, so
at least for the initial MVP it remains cloned. It ''may'' do one fewer
clone operation ''if'' the database would be changed by `using()`.

Calling `_chain()` directly, without disabling cloning:
{{{
In [1]: from django.contrib.auth.models import User, Group, Permission
In [2]: x = User.objects.all()
In [4]: %timeit y = x._chain()
7.05 µs ± 62.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}
With cloning disabled:
{{{
In [6]: %timeit y = x._chain()
275 ns ± 3.15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
}}}

As I write this, I have not implemented the other updates Anssi did, such
as updating `get_contenttypes_and_models` or `create_permissions` ''et
al'', so that the proposed API changes can be decided on and looked over
before moving forward with other potential usages.

Additionally, as I write this, I've added no new tests to properly
demonstrate correctness, for the same reasons. To kick off the discussion,
it should be enough to see whether the test suite passes or fails and
iterate from there.

The number of `QuerySet._clone` operations done by the test suite (Ran
14876 tests, skipped=1192, expected failures=4) drops from `144851` to
`142305`, which doesn't inspire huge confidence by itself, but it's a
start.

--
Ticket URL: <https://code.djangoproject.com/ticket/28455#comment:4>

Django

Aug 2, 2021, 6:41:50 AM
to django-...@googlegroups.com
#28455: Create "inplace" QuerySets to speed up certain operations
-------------------------------------+-------------------------------------
Reporter: Anssi Kääriäinen | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* needs_better_patch: 0 => 1
* has_patch: 0 => 1


Comment:

Updated slightly, and now I've sat down to get cProfile information using
`%prun for _ in range(100): tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))`

First, the baseline, showing only operations related to the change, for
brevity (so no `Model.__init__` etc):
{{{
ncalls tottime percall cumtime percall filename:lineno(function)
31200 0.165 0.000 0.303 0.000 query.py:290(clone)
300 0.098 0.000 3.134 0.010
query.py:1860(prefetch_one_level)
31200 0.066 0.000 0.444 0.000 query.py:1337(_clone)
30200 0.055 0.000 0.655 0.000
related_descriptors.py:883(_apply_rel_filters)
40600 0.052 0.000 1.231 0.000 query.py:45(__iter__)
30200 0.051 0.000 0.897 0.000
related_descriptors.py:899(get_queryset)
31200 0.046 0.000 0.500 0.000 query.py:1325(_chain)
40200 0.041 0.000 0.327 0.000 base.py:511(from_db)
30500 0.039 0.000 0.931 0.000
query.py:982(_filter_or_exclude)
30600 0.030 0.000 0.194 0.000 manager.py:142(get_queryset)
31200 0.029 0.000 0.337 0.000 query.py:341(chain)
31200 0.028 0.000 0.070 0.000 where.py:142(clone)
}}}

Using the `@contextmanager` decorator (lines shown in the same order as
above, so they're ordered by the baseline's internal time):
{{{
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  1000    0.007    0.000    0.013    0.000  query.py:290(clone)
   300    0.082    0.000    2.932    0.010  query.py:1881(prefetch_one_level)
  1000    0.003    0.000    0.019    0.000  query.py:1358(_clone)
 30200    0.097    0.000    0.444    0.000  related_descriptors.py:884(_apply_rel_filters)
 40600    0.052    0.000    0.947    0.000  query.py:46(__iter__)
 30200    0.054    0.000    1.039    0.000  related_descriptors.py:901(get_queryset)
 31200    0.029    0.000    0.056    0.000  query.py:1343(_chain)
 40200    0.041    0.000    0.376    0.000  base.py:511(from_db)
 30500    0.041    0.000    0.496    0.000  query.py:984(_filter_or_exclude)
 30600    0.032    0.000    0.545    0.000  manager.py:142(get_queryset)
  1000    0.001    0.000    0.014    0.000  query.py:341(chain)
  1000    0.002    0.000    0.003    0.000  where.py:142(clone)
...
 30600    0.081    0.000    0.086    0.000  contextlib.py:86(__init__)
 60400    0.024    0.000    0.034    0.000  query.py:1335(_avoid_cloning)
 30600    0.017    0.000    0.067    0.000  contextlib.py:121(__exit__)
 30600    0.016    0.000    0.102    0.000  contextlib.py:242(helper)
 30600    0.016    0.000    0.042    0.000  contextlib.py:112(__enter__)

And then finally with a custom context manager class, again using the same
ordering as the baseline:
{{{
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  1000    0.010    0.000    0.016    0.000  query.py:290(clone)
   300    0.095    0.000    3.293    0.011  query.py:1888(prefetch_one_level)
  1000    0.004    0.000    0.023    0.000  query.py:1365(_clone)
 30200    0.102    0.000    0.412    0.000  related_descriptors.py:884(_apply_rel_filters)
 40600    0.062    0.000    1.133    0.000  query.py:46(__iter__)
 30200    0.063    0.000    1.032    0.000  related_descriptors.py:901(get_queryset)
 31200    0.077    0.000    0.111    0.000  query.py:1350(_chain)
 40200    0.049    0.000    0.391    0.000  base.py:511(from_db)
 30500    0.070    0.000    0.650    0.000  query.py:996(_filter_or_exclude)
 30600    0.038    0.000    0.563    0.000  manager.py:142(get_queryset)
  1000    0.002    0.000    0.018    0.000  query.py:341(chain)
  1000    0.002    0.000    0.004    0.000  where.py:142(clone)
...
 30200    0.006    0.000    0.006    0.000  query.py:178(__init__)
 30200    0.016    0.000    0.022    0.000  query.py:1347(_avoid_cloning)
 30200    0.014    0.000    0.020    0.000  query.py:184(__exit__)
 30200    0.015    0.000    0.021    0.000  query.py:181(__enter__)

Using the custom context manager is substantially faster than the
`contextlib` decorator, at least in the simple form I currently have it.
Here's the decorator version:


{{{
In [1]: from django.contrib.auth.models import User

In [2]: x = User.objects.all()

In [3]: %timeit x._avoid_cloning()
1.01 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit with x._avoid_cloning(): pass
1.98 µs ± 63.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}
and then using the custom class:
{{{
In [3]: %timeit x._avoid_cloning()
276 ns ± 2.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit with x._avoid_cloning(): pass
819 ns ± 39.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
}}}
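The gap is plausible because `@contextmanager` constructs a fresh generator plus a wrapper object on every call, whereas a hand-rolled class with `__enter__`/`__exit__` only does attribute work. Here's a minimal stand-alone illustration of the two shapes; the `Holder`/`AvoidCloning` names are invented for this sketch, not taken from the PR.

```python
from contextlib import contextmanager

class Holder:
    """Stand-in for a QuerySet that tracks whether cloning is active."""
    def __init__(self):
        self.cloning = True

# Generator-based version: convenient, but each use builds a generator
# and a _GeneratorContextManager wrapper.
@contextmanager
def avoid_cloning_gen(obj):
    obj.cloning = False
    try:
        yield obj
    finally:
        obj.cloning = True

# Class-based version: __enter__/__exit__ do only attribute assignments.
class AvoidCloning:
    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __enter__(self):
        self.obj.cloning = False
        return self.obj

    def __exit__(self, exc_type, exc_value, tb):
        self.obj.cloning = True
        return False  # never swallow exceptions

h = Holder()
with avoid_cloning_gen(h):
    inside_gen = h.cloning  # False while the block runs
with AvoidCloning(h):
    inside_cls = h.cloning  # False while the block runs
# Both variants restore the flag on exit, even on exceptions.
```

Both forms give identical semantics; the class-based one just amortises better when entered tens of thousands of times per query, as in the prefetch loop above.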

py-spy (a sampling profiler) also suggests `sql.Query.clone` accounts for
roughly 3% of the time spent before the change; once cloning avoidance is
introduced, it barely gets sampled enough to register at all.

Meanwhile, linters are the bane of my life, so the patch remains 'needs
improvement'.

--
Ticket URL: <https://code.djangoproject.com/ticket/28455#comment:5>
