[Django] #33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless necessary, for performance.

8 views
Skip to first unread message

Django

unread,
Aug 14, 2021, 11:12:54 AM8/14/21
to django-...@googlegroups.com
#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn | Owner: Keryn Knight
Knight |
Type: | Status: assigned
Cleanup/optimization |
Component: Database | Version: dev
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Currently, `build_lookup` contains:
{{{
# For Oracle '' is equivalent to null. The check must be done at this
# stage because join promotion can't be done in the compiler. Using
# DEFAULT_DB_ALIAS isn't nice but it's the best that can be done here.
# A similar thing is done in is_nullable(), too.
if
(connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
and
lookup_name == 'exact' and lookup.rhs == ''):
return lhs.get_lookup('isnull')(lhs, True)
}}}
but going through `connections` requires accessing the thread critical
`asgiref.Local` which is only strictly necessary if the cheap comparisons
are truthy and can be short-circuited like so:
{{{
if (lookup_name == 'exact' and lookup.rhs == '' and
connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
return lhs.get_lookup('isnull')(lhs, True)
}}}

This is coming off the back of the report in #33015, which is a nice
pathological case we can use:
{{{
In [1]: from django.db.models import Q
In [2]: from testapp.models import ExampleModel
In [3]: items_to_fetch = [(i, j) for i in range(0,100) for j in
range(0,100)]
In [4]: query = Q()
...: for v1, v2 in items_to_fetch:
...: query |= Q(val1=v1, val2=v2)
%timeit x = ExampleModel.objects.filter(query)
644 ms ± 6.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
If we cProfile that `ExampleModel.objects.filter(query)` we get something
like:
{{{
ncalls tottime percall cumtime percall filename:lineno(function)
30001/1 0.178 0.000 1.268 1.268 query.py:1221(build_filter)
40000 0.142 0.000 0.157 0.000 query.py:1457(names_to_path)
10002/1 0.077 0.000 1.268 1.268 query.py:1386(_add_q)
20000 0.046 0.000 0.379 0.000 query.py:1156(build_lookup)
}}}
and if we do `%lprun -f Query.build_filter -f Query.build_lookup
ExampleModel.objects.filter(query)` on it we'll get (snipped for brevity):
{{{
Function: build_filter at line 1221
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1329 20000 651989.0 32.6 24.9 condition =
self.build_lookup(lookups, col, value)

Function: build_lookup at line 1156
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1195 20000 200730.0 10.0 38.1 if
(connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
and
1196
lookup_name == 'exact' and lookup.rhs == ''):
1197 return
lhs.get_lookup('isnull')(lhs, True
}}}
Because I'm using sqlite, that 38% of time is essentially free to reclaim
by switching the if condition as above, in which case we get cProfile
output like so:
{{{
ncalls tottime percall cumtime percall filename:lineno(function)
30001/1 0.171 0.000 1.082 1.082 query.py:1220(build_filter)
40000 0.145 0.000 0.159 0.000 query.py:1456(names_to_path)
10002/1 0.076 0.000 1.082 1.082 query.py:1385(_add_q)
20000 0.042 0.000 0.177 0.000 query.py:1559(setup_joins)
220005 0.039 0.000 0.064 0.000 {built-in method
builtins.isinstance}
20000 0.037 0.000 0.223 0.000 query.py:1156(build_lookup)
}}}
and line profile information like so:
{{{
Function: build_filter at line 1220
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1328 20000 447682.0 22.4 18.9 condition =
self.build_lookup(lookups, col, value)

Function: build_lookup at line 1156
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1195 20000 12744.0 0.6 3.9 if (lookup_name
== 'exact' and lookup.rhs == '' and
1196
connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
1197 return
lhs.get_lookup('isnull')(lhs, True
}}}

which ultimately changes the timed performance for non oracle backends to:
{{{
In [5]: %timeit x = ExampleModel.objects.filter(query)
547 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
(YMMV; it's being tested on a laptop and I've seen it ranging from 519ms
to 600ms average.)

Patch will be forthcoming as a PR shortly, to confirm it doesn't break CI.

--
Ticket URL: <https://code.djangoproject.com/ticket/33025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 14, 2021, 11:30:24 AM8/14/21
to django-...@googlegroups.com
#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* needs_better_patch: 0 => 1


Comment:

PR is https://github.com/django/django/pull/14769
So far, a linter is telling me off ... because of course it is. I hate it.
And 3 CI jobs died ''immediately'' (shockingly quickly, never even saw
them go to pending) having failed to do a git checkout?

I'm going to hope that at least some of the others pass...

--
Ticket URL: <https://code.djangoproject.com/ticket/33025#comment:1>

Django

unread,
Aug 14, 2021, 12:25:59 PM8/14/21
to django-...@googlegroups.com
#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by David Smith):

* stage: Unreviewed => Accepted


Comment:

Yes asgiref's `Local` is "slow" see #32312, although I'm not sure what's
possible to improve its performance.

--
Ticket URL: <https://code.djangoproject.com/ticket/33025#comment:2>

Django

unread,
Aug 19, 2021, 7:30:43 AM8/19/21
to django-...@googlegroups.com
#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 1 => 0
* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/33025#comment:3>

Django

unread,
Aug 19, 2021, 7:53:48 AM8/19/21
to django-...@googlegroups.com
#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn
Type: | Knight
Cleanup/optimization | Status: closed

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"bf5abf1bdcedb15e949db419c61eeec7c88414ea" bf5abf1]:
{{{
#!CommitTicketReference repository=""
revision="bf5abf1bdcedb15e949db419c61eeec7c88414ea"
Fixed #33025 -- Avoided accessing the database connections in
Query.build_lookup() when not necessary.

Of the built-in backends, only Oracle treats empty strings and nulls as
equal, so avoid testing the default connection backend for
interprets_empty_strings_as_nulls if it can be established from the
lookup that it wouldn't affect the lookup instance returned. This
improves performance a small amount for most lookups being built,
because accessing the connections requires touching the thread critical
`Local` which is an expensive operation.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33025#comment:4>

Reply all
Reply to author
Forward
0 new messages