The PR I made for optimising Q combining/inverting
(https://github.com/django/django/pull/14673) wants to introduce a
`__copy__` method onto `Q` objects to indicate what level of operation is
needed. Unlike that PR, it's decidedly not possible to take the same
approach with `WhereNode` and just do `clone.__dict__ =
self.__dict__.copy()` (things broke horribly), because attributes like
`contains_aggregate` and `output_field` are cached properties on the
`WhereNode` instance, and `resolved` gets set after the fact by
`resolve_expression`. But, as far as I can tell, the looping/testing/child
cloning ''may be removable''. Perhaps once upon a time it was required,
and other components have since changed such that it's now safe to
reconsider. I can't readily say, but initial experimentation suggests it's
worth looking at, at least in theory.
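To illustrate why the `__dict__.copy()` approach blows up, here's a minimal sketch (these are hypothetical stand-in classes, not Django's): once a `cached_property` has fired, its value lives in the instance `__dict__`, so copying the dict wholesale drags a stale answer onto the clone.

```python
# Hypothetical stand-ins, not Django's actual classes.
from functools import cached_property

class Agg:
    contains_aggregate = True  # a child that holds an aggregate

class Node:
    def __init__(self, children):
        self.children = children

    @cached_property
    def contains_aggregate(self):
        # Computed once, then stored in the instance __dict__.
        return any(getattr(c, 'contains_aggregate', False) for c in self.children)

node = Node(children=[])
assert node.contains_aggregate is False  # caches False in node.__dict__

clone = Node.__new__(Node)
clone.__dict__ = node.__dict__.copy()    # drags the cached False along
clone.children = [Agg()]                 # the clone now *does* contain an aggregate...

assert clone.contains_aggregate is False  # ...but the stale cache says otherwise
assert Node(children=[Agg()]).contains_aggregate is True  # a fresh node answers correctly
```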
There is a PR already for this, here:
https://github.com/django/django/pull/14709, which is currently marked
draft/WIP because I couldn't easily run the whole test suite locally and
needed to see whether the more exotic parts caused problems. They didn't,
which was unexpected, so here I am. The PR currently replaces this, in the
`WhereNode.clone` method:
{{{
clone = self.__class__._new_instance(
    children=[], connector=self.connector, negated=self.negated)
for child in self.children:
    if hasattr(child, 'clone'):
        clone.children.append(child.clone())
    else:
        clone.children.append(child)
}}}
with:
{{{
clone = self.__class__._new_instance(
    children=None, connector=self.connector, negated=self.negated)
clone.children = self.children[:]
}}}
But I ''think'' that Q and WhereNode ''can'' both just return a shallow
copy by only shallow-copying the `children` list, at which point I think
the `__copy__` method could be implemented on `Node` directly and mirror
the existing `__deepcopy__` method. By preferring those stdlib names it
draws a line in the sand over what level of copy should be expected. The
existing `clone` and `copy` methods can become aliases of it.
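As a rough sketch of that shape (a hypothetical simplified `Node`, not the real one), aliasing `clone` and `copy` to `__copy__` also means the stdlib `copy.copy()` picks up the same behaviour for free:

```python
# Hypothetical simplified Node, not Django's; sketches the shallow-copy shape.
import copy

class Node:
    def __init__(self, children=None, connector='AND', negated=False):
        self.children = children if children is not None else []
        self.connector = connector
        self.negated = negated

    def __copy__(self):
        clone = self.__class__(connector=self.connector, negated=self.negated)
        clone.children = self.children[:]  # new list, same child objects
        return clone

    clone = __copy__  # existing call sites keep working
    copy = __copy__

node = Node(children=['a', 'b'])
dupe = copy.copy(node)       # the stdlib protocol finds __copy__
dupe.children.append('c')    # the list itself is independent...
assert node.children == ['a', 'b']
assert dupe.children[0] is node.children[0]  # ...but its elements are shared
```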
=== Before any changes
Let's now examine the before and after with as much data as I can easily
gather. First, a bit of prelude:
{{{
In [1]: from django.db.models.sql.where import WhereNode
In [2]: from django.db.models import QuerySet
In [3]: from django.db.models.sql import Query
In [4]: from django.contrib.auth.models import User
In [5]: x = User.objects.all()
In [6]: y = User.objects.filter(username='test', email='te...@test.test').exclude(username='test').filter(email='nottest')
In [7]: %load_ext line_profiler
}}}
Now let's establish a ballpark for how much time is spent where. I've
stripped the line-profiling output down to just the relevant bits for
clarity.
==== Queryset with no clauses (`x`)
{{{
In [8]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): x._chain()

Function: QuerySet._chain
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1325                                           def _chain(self, **kwargs):
   ...
  1330      1000      88169.0     88.2     97.8      obj = self._clone()

Function: QuerySet._clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1337                                           def _clone(self):
   ...
  1342      1000      80796.0     80.8     86.3      c = self.__class__(model=self.model, query=self.query.chain(), using=self._db, hints=self._hints)

Function: Query.chain
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   341                                           def chain(self, klass=None):
   ...
   346      2000     116961.0     58.5     94.8      obj = self.clone()

Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      18294.0      9.1     27.0      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      4000       6227.0      1.6     63.7      clone = self.__class__._new_instance(
   149      2000       1170.0      0.6     12.0          children=[], connector=self.connector, negated=self.negated)
   150      2000       1407.0      0.7     14.4      for child in self.children:
   151                                                   if hasattr(child, 'clone'):
   152                                                       clone.children.append(child.clone())
   153                                                   else:
   154                                                       clone.children.append(child)
   155      2000        969.0      0.5      9.9      return clone
}}}
==== Queryset with where clauses (`y`)
You can see the cost begin to go up:
{{{
In [9]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): y._chain()

Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      60065.0     30.0     59.1      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   148      8000      11473.0      1.4     33.1      clone = self.__class__._new_instance(
   149      4000       2379.0      0.6      6.9          children=[], connector=self.connector, negated=self.negated)
   150     14000       7241.0      0.5     20.9      for child in self.children:
   151     10000       5553.0      0.6     16.0          if hasattr(child, 'clone'):
   152      2000       1255.0      0.6      3.6              clone.children.append(child.clone())
   153                                                   else:
   154      8000       5069.0      0.6     14.6              clone.children.append(child)
   155      4000       1715.0      0.4      4.9      return clone
}}}
and the timings for those two querysets (`x` and `y`), plus an example of
prefetching related data:
{{{
In [10]: %timeit x._chain()
6.43 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit y._chain()
8.24 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [12]: %timeit y.query.where.clone()
2.79 µs ± 38.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %prun for _ in range(100000): y._chain()
         3400003 function calls (3300003 primitive calls) in 1.506 seconds
        ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        100000    0.357    0.000    1.007    0.000 query.py:290(clone)
 200000/100000    0.301    0.000    0.539    0.000 where.py:142(clone)  <-- WhereNode
        100000    0.172    0.000    1.344    0.000 query.py:1337(_clone)
        100000    0.068    0.000    1.086    0.000 query.py:341(chain)
        100000    0.058    0.000    1.415    0.000 query.py:1325(_chain)

In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
18.2 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
}}}
=== After replacing the clone method
Given those data points, here are the same bits of line profiling, showing
only those where the needle has moved:
==== Queryset with no clauses (`x`)
{{{
Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      14134.0      7.1     22.8      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      2000       5586.0      2.8     71.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149      2000       1412.0      0.7     18.0      clone.children = self.children[:]
   150      2000        865.0      0.4     11.0      return clone
}}}
==== Queryset with where clauses (`y`)
{{{
Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      13730.0      6.9     24.7      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      2000       5273.0      2.6     69.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149      2000       1498.0      0.7     19.6      clone.children = self.children[:]
   150      2000        871.0      0.4     11.4      return clone
}}}
And the timings: in the empty case (`x`) it's ''basically'' just paying
the cost of class creation, but where there are clauses it's more
consistent (i.e. ''nearer'' constant time vs. linear growth).
{{{
In [10]: %timeit x._chain()
6.51 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit y._chain()
6.25 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [12]: %timeit y.query.where.clone()
1.16 µs ± 9.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [13]: %prun for _ in range(100000): y._chain()
         3800003 function calls in 1.926 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.355    0.000    0.606    0.000 query.py:290(clone)
   100000    0.169    0.000    0.939    0.000 query.py:1337(_clone)
   100000    0.068    0.000    0.139    0.000 where.py:142(clone)  <-- WhereNode
   100000    0.067    0.000    0.683    0.000 query.py:341(chain)
   100000    0.058    0.000    1.009    0.000 query.py:1325(_chain)

In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
17.9 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
}}}
Gut feeling says the "final" implementation could be something like the
following, directly on the `Node` class.
{{{
def __copy__(self):
    clone = self._new_instance(
        children=None, connector=self.connector, negated=self.negated)
    clone.children = self.children[:]
    return clone

copy = __copy__
clone = __copy__

def __deepcopy__(self, memo):
    ...
    clone.children = copy.deepcopy(self.children, memo)
    ...
}}}
which would remove the need for those methods on either subclass. Note
also that aliasing the methods that way, rather than doing `def copy(self):
return self.clone()` as `WhereNode` currently does, is also ''slightly''
quicker. We're talking `0.10µs` - small potatoes, but it doesn't hurt to
have, and it's one fewer stack frame.
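That difference can be eyeballed with a toy micro-benchmark (hypothetical classes; the absolute numbers will vary by machine, but the delegating version always pays for one extra call frame per call):

```python
# Toy comparison of a delegating copy() vs. an aliased one; not Django code.
import timeit

class Delegating:
    def clone(self):
        return object.__new__(self.__class__)

    def copy(self):
        return self.clone()  # extra stack frame: copy() -> clone()

class Aliased:
    def clone(self):
        return object.__new__(self.__class__)

    copy = clone  # same function object bound under a second name

d, a = Delegating(), Aliased()
print('delegating:', timeit.timeit(d.copy, number=1_000_000))
print('aliased:   ', timeit.timeit(a.copy, number=1_000_000))
```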
A final note, then: if this properly shallow clone is done, I suspect
`resolve_expression` can be altered too, to avoid calling `_resolve_node`
if it's already been done on `self` - the shallow copy would ensure that
anything previously resolved is kept around and doesn't need doing again,
I think.
Other stats: of the 14,000 tests I can run easily, only once was
`resolved` in `self.__dict__` when `clone` was called (the value was
`True`). `contains_aggregate` turned up 209 times (207 with a value of
`False`, 2 with `True`), and `output_field` appeared only twice (both
`BooleanField`). Those are the only extra attributes I spotted.
--
Ticket URL: <https://code.djangoproject.com/ticket/32970>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:1>
* stage: Unreviewed => Accepted
Comment:
OK, we can certainly ''Investigate''. Thanks for the work Keryn. :)
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:2>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:3>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"e441847ecae99dd1ccd0d9ce76dbcff51afa863c" e441847e]:
{{{
#!CommitTicketReference repository=""
revision="e441847ecae99dd1ccd0d9ce76dbcff51afa863c"
Fixed #32970 -- Changed WhereNode.clone() to create a shallow copy of
children.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:4>
Comment (by GitHub <noreply@…>):
In [changeset:"903aaa35e5ceaa33bfc9b19b7f6da65ce5a91dd4" 903aaa3]:
{{{
#!CommitTicketReference repository=""
revision="903aaa35e5ceaa33bfc9b19b7f6da65ce5a91dd4"
Fixed #33159 -- Reverted "Fixed #32970 -- Changed WhereNode.clone() to
create a shallow copy of children."
This reverts commit e441847ecae99dd1ccd0d9ce76dbcff51afa863c.
A shallow copy is not enough because querysets can be reused and
evaluated in nested nodes, which shouldn't mutate JOIN aliases.
Thanks Michal Čihař for the report.
}}}
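For anyone landing here later, the hazard the revert describes can be sketched in miniature (hypothetical stand-in classes, not Django's): `children[:]` gives each clone its own list, but the child objects inside it are shared, so in-place mutation during compilation (e.g. relabelling a JOIN alias) leaks across every holder of that child.

```python
# Hypothetical stand-ins illustrating the shared-children hazard.
class Lookup:
    """Stand-in for a child expression that carries a table alias."""
    def __init__(self, alias):
        self.alias = alias

class Node:
    def __init__(self, children):
        self.children = children

    def clone(self):
        # Shallow: a new list, but the same Lookup instances inside it.
        return Node(self.children[:])

original = Node([Lookup('T1')])
cloned = original.clone()
cloned.children[0].alias = 'T2'            # mutate shared state via the clone
assert original.children[0].alias == 'T2'  # the original sees it too
```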
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:5>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"93a42d43a6995993b9bbcb743ab3c2a2b8414ebd" 93a42d43]:
{{{
#!CommitTicketReference repository=""
revision="93a42d43a6995993b9bbcb743ab3c2a2b8414ebd"
[4.0.x] Fixed #33159 -- Reverted "Fixed #32970 -- Changed
WhereNode.clone() to create a shallow copy of children."
This reverts commit e441847ecae99dd1ccd0d9ce76dbcff51afa863c.
A shallow copy is not enough because querysets can be reused and
evaluated in nested nodes, which shouldn't mutate JOIN aliases.
Thanks Michal Čihař for the report.
Backport of 903aaa35e5ceaa33bfc9b19b7f6da65ce5a91dd4 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:6>
* resolution: fixed => wontfix
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:7>