[Django] #32940: Consider removing unused/unnecessary functionality in tree.Node.add

2 views
Skip to first unread message

Django

unread,
Jul 18, 2021, 6:58:57 AM7/18/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
------------------------------------------------+------------------------
Reporter: Keryn Knight | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: dev
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
I'm proposing for discussion the removal of 2 things within the `Node.add`
method, or at least further investigation on why it's ''not'' possible to
do so; those 2 bits are outlined immediately below.

If we instrument the `Node.add` method, used by `query_utils.Q` and
`where.WhereNode` extensively, like so:
{{{
...
if data in self.children:
print('v- data existed')
traceback.print_stack()
if self.connector == conn_type and data in self.children:
print('^- data existed and connector matched')
return data
if not squash:
print('<- squash is not Truthy')
self.children.append(data)
return data
...
}}}

and run the test suite (or as much as I'm able easily):

{{{
...
Ran 14834 tests in 453.388s
OK (skipped=1188, expected failures=4)
}}}

We will find that the `data in self.children` barely registers as
tested/used, and `squash` is entirely unused (unless those 1188 tests hide
the truth).

Having separately instrumented the add method previously to check the
number of ''calls'' by class type, it appears as below:

- `1493` calls to `Q.add` of which `0` have a `data in children` match.
- `2` calls to `Node.add` of which `1` is a `data in children` match.
- `210862` calls to `WhereNode.add` of which `2` are a `data in children
match.

But of all of those matches, the connector is never the same, so it never
does that return early expected optimisation.

The tests which exercise those 3 matches are:
- `tests.queries.tests.DisjunctiveFilterTests.test_ticket8283`
- `tests.queries.tests.Queries4Tests.test_combine_or_filter_reuse`
-
`tests.utils_tests.test_tree.NodeTests.test_add_eq_child_mixed_connector`

The last of those is ''intentionally'' making sure that it ''does'' get
added due to the connector difference.
The first two, as they could affect where clause pushdown & merging, or
cause different query plans due to repeats, we should check to make sure
that the queries (additionally to the results) remain the same before and
after by doing the following back of the napkin job for each of them:
{{{
self.assertEqual(
str(x.query),
"???"
)
}}}
and confirm that they remain the same (they do), which are:
{{{
'SELECT "queries_extrainfo"."id", "queries_extrainfo"."info",
"queries_extrainfo"."note_id", "queries_extrainfo"."value",
"queries_extrainfo"."date_id", "queries_extrainfo"."filterable" FROM
"queries_extrainfo" WHERE (("queries_extrainfo"."note_id" = 1 OR
"queries_extrainfo"."info" = e2) AND "queries_extrainfo"."note_id" = 1)
ORDER BY "queries_extrainfo"."info" ASC';

'SELECT "queries_extrainfo"."id", "queries_extrainfo"."info",
"queries_extrainfo"."note_id", "queries_extrainfo"."value",
"queries_extrainfo"."date_id", "queries_extrainfo"."filterable" FROM
"queries_extrainfo" WHERE (("queries_extrainfo"."info" = e2 OR
"queries_extrainfo"."note_id" = 1) AND "queries_extrainfo"."note_id" = 1)
ORDER BY "queries_extrainfo"."info" ASC';

'SELECT "queries_author"."id", "queries_author"."name",
"queries_author"."num", "queries_author"."extra_id" FROM "queries_author"
WHERE (("queries_author"."name" = a1 OR "queries_author"."name" = a3) AND
"queries_author"."name" = a1)';
}}}

Removing the `data in self.children` check should be a small optimisation
win because it's an `O(n)` scan across the entire list of direct children,
and uses the `Node.__eq__` method which does an equality scan on it's own
children (so grandchildren).

{{{
In [2]: %%timeit
...: x = Q(('a', 1), ('b', 2), c=3, d=4, b=1)
...: y = Q(aa=1, bb=2, cc=3)
...: x.add(y, Q.OR)
4.81 µs ± 16.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}

after removing:
{{{
4.66 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}

It's worth noting that the concept of checking `data in self.children` is
semi-flawed anyway, because `Q(('a', 1), a=2)` or `Q(('a', 1), ('a', 2))`
very ''simply'' allow duplicate entries to occur immediately and up-front.

I have a patch which I'll push up shortly once I've tidied it of all my
instrumentation and `pdb` usage. The hope is that the entire CI passes, if
it doesn't this is all moot :)

--
Ticket URL: <https://code.djangoproject.com/ticket/32940>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 18, 2021, 7:07:43 AM7/18/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Utilities | Version: dev
Severity: Normal | Resolution:

Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* owner: nobody => Keryn Knight
* status: new => assigned


Comment:

Draft PR is https://github.com/django/django/pull/14658
Let's see what it says about whether or not this ''could'' even move to
accepted.

--
Ticket URL: <https://code.djangoproject.com/ticket/32940#comment:1>

Django

unread,
Jul 18, 2021, 5:01:17 PM7/18/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Utilities | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nick Pope):

* stage: Unreviewed => Accepted


Old description:

New description:

--

Comment:

The `squash` argument was added in
d3f00bd5706b35961390d3814dd7e322ead3a9a3. It seems to have been unused
since it was introduced. (I didn't find `squash=False` anywhere.)

The `if self.connector == conn_type and data in self.children:` line had
the first half of the expression added recently in
b81c7562fc33f50166d5120138d6398dc42b13c3. Interestingly this was actually
fixing a regression in d3f00bd5706b35961390d3814dd7e322ead3a9a3 as prior
to that commit the condition was the same!

It'd probably be a good idea to see if Simon can cast an eye over this.

--
Ticket URL: <https://code.djangoproject.com/ticket/32940#comment:2>

Django

unread,
Jul 20, 2021, 2:14:12 AM7/20/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Utilities | Version: dev

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"ff661dbd506efbdf51cc8da89cb98731c8a62f49" ff661db]:
{{{
#!CommitTicketReference repository=""
revision="ff661dbd506efbdf51cc8da89cb98731c8a62f49"
Refs #32940 -- Removed unnecessary branch in Node.add().

The "data in self.children" branch was causing data.__eq__ to be
called for each entries in "self.children" which resulted in a huge
slowdown during queryset construction.

It's purpose was to prevent queries of the form
Model.objects.filter(foo='bar').filter(foo='bar')
from resulting in
WHERE foo='bar' AND foo='bar'
but it's not covered by the suite and has arguable performance benefits
since it's not very common and SQL engines are usually very good at
folding/optimizing these.

See also #32632 for prior discussion around comparing data to the
Node's children.

Co-authored-by: Nick Pope <ni...@nickpope.me.uk>
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32940#comment:4>

Django

unread,
Jul 20, 2021, 2:14:13 AM7/20/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Utilities | Version: dev

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"fb35e0a2feb36e60b93a12dd43eb9eed2015adda" fb35e0a2]:
{{{
#!CommitTicketReference repository=""
revision="fb35e0a2feb36e60b93a12dd43eb9eed2015adda"
Refs #32940 -- Removed Node.add()'s unused squash parameter.

Unused since its introduction in d3f00bd5706b35961390d3814dd7e322ead3a9a3.

Co-authored-by: Nick Pope <ni...@nickpope.me.uk>
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32940#comment:3>

Django

unread,
Jul 20, 2021, 2:15:00 AM7/20/21
to django-...@googlegroups.com
#32940: Consider removing unused/unnecessary functionality in tree.Node.add
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: closed
Component: Utilities | Version: dev
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: assigned => closed
* resolution: => fixed


--
Ticket URL: <https://code.djangoproject.com/ticket/32940#comment:5>

Reply all
Reply to author
Forward
0 new messages