[Django] #32824: Potential micro-optimisations for NodeList.render


Django

Jun 7, 2021, 11:08:58 AM6/7/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
------------------------------------------------+------------------------
Reporter: Keryn Knight | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Template system | Version: dev
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
The current implementation of `NodeList.render` looks like:
{{{
def render(self, context):
    bits = []
    for node in self:
        if isinstance(node, Node):
            bit = node.render_annotated(context)
        else:
            bit = node
        bits.append(str(bit))
    return mark_safe(''.join(bits))
}}}

which, for a single render of the template in question (template + code to
be attached, but it's basically the same fakery as in #32814), is called 41
times. The cProfile output, limited to the portions relevant to this ticket:
{{{
   41/1    0.000    0.000    0.007    0.007 base.py:956(render)
    591    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
    191    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
115/113    0.000    0.000    0.000    0.000 safestring.py:53(mark_safe)
}}}

Because those numbers are too minuscule to register, and to surface the
most time-consuming parts (pushing things like the importlib machinery to
the bottom of the profile output), here's the result of rendering the same
template 1000 times:
{{{
4897753 function calls (4539046 primitive calls) in 2.832 seconds

  41000/1000    0.193    0.000    2.671    0.003 base.py:956(render)
 70045/68045    0.042    0.000    0.095    0.000 safestring.py:50(mark_safe)
      128045    0.053    0.000    0.053    0.000 safestring.py:26(__init__)
      159432    0.023    0.000    0.023    0.000 {built-in method builtins.hasattr}
      114077    0.012    0.000    0.012    0.000 {built-in method builtins.callable}
      686582    0.076    0.000    0.076    0.000 {built-in method builtins.isinstance}
      318184    0.036    0.000    0.036    0.000 {method 'append' of 'list' objects}
}}}
(To ensure the `SafeString.__init__` calls are registered, I manually
created the method, so the timings for it aren't accurate, but the counts
are.)
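
The actual template and driver code are to be attached rather than
reproduced here, but a minimal sketch of the kind of cProfile harness that
produces output like the above could look like this (the inline template is
a trivial stand-in, ''not'' the complex template from the attachment):
{{{
import cProfile
import pstats

import django
from django.conf import settings
from django.template import engines

settings.configure(
    TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}]
)
django.setup()

# Trivial stand-in template; the attached benchmark uses one complex
# template with a fake/minimal context.
template = engines["django"].from_string(
    "{% for item in items %}{{ item }} {% endfor %}"
)
context = {"items": range(100)}

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    template.render(context)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
}}}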

==== Proposed change

The primary change I'm suggesting is deviating from using `mark_safe` to
directly returning a new `SafeString`. As far as I know, the result of
`''.join(...)` is ''always'' of type `str`, regardless of whether any of
the underlying bits are themselves `SafeString` etc, so for each call to
`NodeList.render` we'd save one `hasattr(...)` test and one `callable(...)`
test, which would change the 1000 runs to:
{{{
4774747 function calls (4416040 primitive calls) in 2.710 seconds

  41000/1000    0.194    0.000    2.669    0.003 base.py:956(render)
 29045/27045    0.019    0.000    0.063    0.000 safestring.py:50(mark_safe)
      128045    0.048    0.000    0.048    0.000 safestring.py:26(__init__)
      118432    0.019    0.000    0.019    0.000 {built-in method builtins.hasattr}
       73077    0.008    0.000    0.008    0.000 {built-in method builtins.callable}
      686582    0.073    0.000    0.073    0.000 {built-in method builtins.isinstance}
      318184    0.038    0.000    0.038    0.000 {method 'append' of 'list' objects}
}}}

You can see that whilst the timings don't change ''substantially'', it's a
whole bunch fewer calls.
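
As a tiny standalone illustration (not part of the ticket) of why the
shortcut is sound: joining `SafeString`s always yields a plain `str`, which
`mark_safe` only re-wraps after paying for its `hasattr`/`callable` checks,
whereas `SafeString(...)` wraps it directly.
{{{
from django.utils.safestring import SafeString, mark_safe

bits = [SafeString("<b>already safe</b>"), SafeString("also safe")]
joined = "".join(bits)

# str.join() returns a plain str even when every element is a SafeString,
# so the joined result has lost its "safe" marking...
assert type(joined) is str

# ...and both of these restore it, but mark_safe() first checks
# hasattr(s, '__html__') and callable(s) before constructing the
# SafeString, whereas SafeString(joined) skips straight there.
assert type(mark_safe(joined)) is SafeString
assert type(SafeString(joined)) is SafeString
}}}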

==== Worth investigating?

But ''possibly'' more interesting than that is the rest of the method. As
far as I can tell from running the test suite (which, to be fair, does
output a boatload of skips), the condition `if isinstance(node, Node):` is
**always** true -- that is, there's no test anywhere for what happens if
you somehow, somewhere include a non-`Node` instance in a `NodeList`. If
that's the case (which, again, it may not be, given the skips) then there
are two possibilities:

1. add a test demonstrating what should happen when a non-`Node` appears
in a `NodeList`, and canonically accept it as part of the DTL, or
2. take the stance that `NodeList`s are inherently composed only of nodes
(or at least of things with a `render_annotated(context)` method) and
sweep away the `isinstance` check.

If option 2 were taken, the final method could be (setting aside any
intermediate deprecation steps, if those were deemed necessary):
{{{
def render(self, context):
    bits = []
    append = bits.append
    for node in self:
        append(str(node.render_annotated(context)))
    return SafeString(''.join(bits))
}}}

which, compounded with the change from `mark_safe` to `SafeString`
proposed above, would give (again, 1000 renders):

{{{
4575748 function calls (4217041 primitive calls) in 2.685 seconds

  41000/1000    0.152    0.000    2.567    0.003 base.py:956(render)
 29045/27045    0.018    0.000    0.062    0.000 safestring.py:53(mark_safe)
      128045    0.048    0.000    0.048    0.000 safestring.py:26(__init__)
      118432    0.019    0.000    0.019    0.000 {built-in method builtins.hasattr}
       73077    0.008    0.000    0.008    0.000 {built-in method builtins.callable}
      487582    0.058    0.000    0.058    0.000 {built-in method builtins.isinstance}
      318184    0.039    0.000    0.039    0.000 {method 'append' of 'list' objects}
}}}
for a total saving of 199,000 `isinstance` checks, 41,000 `hasattr` checks
and 41,000 `callable` checks (from a starting point of `4897753` function
calls and `4539046` primitive calls, down to `4575748` and `4217041`
respectively).

Were that final method implementation the target/goal, it ''may'' be OK to
move from stacking up an intermediate list (the `bits` variable) to a
generator expression passed into the `''.join(...)`, but briefly using
`memory_profiler` to `@profile` the method shows it's all 0 allocations
anyway (presumably references & gc detection?).
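
For reference, a minimal sketch of the `@profile` approach mentioned above,
assuming `memory_profiler` is installed; wrapping the method at runtime
avoids editing `django/template/base.py` directly:
{{{
from memory_profiler import profile

from django.template import base

# Monkeypatch NodeList.render with memory_profiler's line-by-line profiler;
# every subsequent render prints a per-line "Mem usage / Increment" table
# for the method to stdout.
base.NodeList.render = profile(base.NodeList.render)
}}}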

==== Additional notes

Yappi is slower than cProfile but suggests the same small wins (amplified
in both cases by the profiling machinery slowing everything down).

The difference (when measured via yappi, and reported via PyCharm's handy
profile visualiser) between the **original method** and the **ideal
replacement** would be a decrease from `7.6%` of time being spent
(directly) on this method to `5.7%`.

The difference from just swapping the `mark_safe` call for the `SafeString`
suggestion is obviously less pronounced, but yappi suggests that the time
spent in `mark_safe` goes from `1.9%` to `0.9%`.

Those proportions are ''vaguely'' the same for cProfile, give or take.
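
For completeness, a sketch of how the yappi numbers could be collected; the
`template`/`context` names below are the same stand-ins as in the cProfile
sketch earlier, not code from the ticket:
{{{
import yappi

yappi.set_clock_type("wall")  # wall-clock timings
yappi.start()
for _ in range(1000):
    template.render(context)
yappi.stop()

stats = yappi.get_func_stats()
stats.sort("ttot").print_all()
# Export in pstat format so it can be opened in a GUI viewer such as
# PyCharm's profile visualiser:
stats.save("nodelist_render.pstat", type="pstat")
}}}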

--
Ticket URL: <https://code.djangoproject.com/ticket/32824>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 7, 2021, 11:10:00 AM6/7/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* Attachment "textnode_test2.py" added.

yappi/cprofile/timeit oriented template rendering (1 complex template,
fake/minimal context)

Django

unread,
Jun 7, 2021, 12:58:38 PM6/7/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Jacob Walls):

Tim McCurrach proposed directly returning `SafeString` in several places
in #32568.

Regarding the rest of the proposal, `NodeList` is documented to consist of
`Node` objects, so option #2 sounds like the right track to me:
{{{
``parser.parse()`` takes a tuple of names of block tags ''to parse until''.
It returns an instance of ``django.template.NodeList``, which is a list of
all ``Node`` objects that the parser encountered ''before'' it encountered
any of the tags named in the tuple.
}}}
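
As an aside (not from the ticket), a minimal custom block tag shows that
contract in action: the `NodeList` handed back by `parser.parse()` only
ever contains `Node` instances:
{{{
from django import template

register = template.Library()


class UpperNode(template.Node):
    def __init__(self, nodelist):
        self.nodelist = nodelist

    def render(self, context):
        # self.nodelist.render() is exactly the method this ticket optimises.
        return self.nodelist.render(context).upper()


@register.tag("upper")
def do_upper(parser, token):
    # parser.parse() returns a NodeList containing only Node instances,
    # collected up to (but not including) the {% endupper %} tag.
    nodelist = parser.parse(("endupper",))
    parser.delete_first_token()
    return UpperNode(nodelist)
}}}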

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:1>

Django

unread,
Jun 8, 2021, 12:13:08 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Unreviewed => Accepted


Comment:

We don't even need `bits`, so maybe:
{{{
def render(self, context):
    return SafeString(''.join([
        str(node.render_annotated(context))
        for node in self
    ]))
}}}
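
A brief aside on the shape of that suggestion: feeding `''.join(...)` a
list comprehension rather than a bare generator expression is usually no
worse and often slightly faster, since `join()` has to materialise its
argument before it can size the result anyway. A quick standalone `timeit`
sketch (numbers will vary by machine):
{{{
import timeit

setup = "items = [str(i) for i in range(1000)]"
print(timeit.timeit("''.join([s for s in items])", setup=setup, number=10_000))
print(timeit.timeit("''.join(s for s in items)", setup=setup, number=10_000))
}}}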

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:2>

Django

unread,
Jun 8, 2021, 7:22:39 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* owner: nobody => Keryn Knight
* status: new => assigned


Comment:

I think I'm happier still with that change. I hadn't proposed it previously
because I couldn't get `memory_profiler` to play ball, but (at least in the
pathological case of a stupid number of child nodes) it does skip the
intermediate list, which has some small effect on runtime performance.

Using a small stub for benchmarking, in IPython:
{{{
from nodelist_render_test import render
from django.template.base import NodeList
%load_ext memory_profiler
%mprun -f NodeList.render render()

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   957     58.8 MiB     51.1 MiB          10   def render(self, context):
   958                                         #     return SafeString(''.join([
   959                                         #         str(node.render_annotated(context))
   960                                         #         for node in self
   961                                         #     ]))
   962     58.8 MiB      0.0 MiB          10       bits = []
   963     58.8 MiB      0.0 MiB          10       append = bits.append
   964     59.5 MiB      0.0 MiB     1000010       for node in self:
   965     59.5 MiB      7.4 MiB     1000000           append(str(node.render_annotated(context)))
   966     59.6 MiB      1.1 MiB          10       return SafeString(''.join(bits))
}}}
and:
{{{
%timeit render()
216 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
timeit is consistently between 215ms and 225ms across repeated runs of the
above (rather than using `-nX00`).

Switching to the list comprehension version, now that I've got
`memory_profiler` to report ''something'':
{{{
%mprun -f NodeList.render render()

Line #    Mem usage     Increment  Occurences   Line Contents
============================================================
   957     58.7 MiB      51.3 MiB          10   def render(self, context):
   958     58.9 MiB  -11945.8 MiB     1000040       return SafeString(''.join([
   959     58.8 MiB  -11939.6 MiB     1000000           str(node.render_annotated(context))
   960     58.8 MiB  -25012.3 MiB     1000010           for node in self
   961                                              ]))
   962                                         #     bits = []
   963                                         #     append = bits.append
   964                                         #     for node in self:
   965                                         #         append(str(node.render_annotated(context)))
   966                                         #     return SafeString(''.join(bits))
}}}
I don't know what to make of the deallocations, but I guess zeroes or
negative numbers are better than `7.1MB` ... maybe I really am generating
11GB of nodes in this synthetic benchmark :)

and timeit:
{{{
%timeit render()
202 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
timeit was consistently around 200 to 205ms repeatedly running it.
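
The attached stub isn't reproduced in this thread, but a hypothetical
reconstruction of something along those lines (the `FakeNode` name comes
from a later comment; the node counts are inferred from the `Occurences`
column above) might look like:
{{{
# nodelist_render_test.py -- hypothetical reconstruction, not the actual
# attachment: a NodeList with a pathologically large number of trivial
# child nodes, rendered repeatedly.
import django
from django.conf import settings

settings.configure()  # no TEMPLATES needed; the NodeList is built by hand
django.setup()

from django.template import Context
from django.template.base import Node, NodeList


class FakeNode(Node):
    # Subclassing Node matters: render_annotated() must actually be hit
    # (see the correction in a later comment for what happens when it isn't).
    def render(self, context):
        return "x"


nodelist = NodeList(FakeNode() for _ in range(100_000))
context = Context({})


def render():
    # 10 renders of 100,000 nodes ~= the 1,000,000 hits per line seen in
    # the %mprun output above.
    for _ in range(10):
        nodelist.render(context)


# In IPython:
#   %load_ext memory_profiler
#   %mprun -f NodeList.render render()
#   %timeit render()
}}}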

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:3>

Django

unread,
Jun 8, 2021, 7:23:29 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* Attachment "nodelist_render_test.py" added.

synthetic test for isolating just this method, with memory profiler etc

Django

unread,
Jun 8, 2021, 7:46:28 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Keryn Knight):

Oh, and for completeness, here's the **original** method before any of
these proposed changes, for historic reference:

{{{


%mprun -f NodeList.render render()

Line #    Mem usage       Increment  Occurences   Line Contents
============================================================
   957     62.3 MiB       -43.6 MiB          10   def render(self, context):
   958     62.3 MiB         0.0 MiB          10       bits = []
   959     72.8 MiB  -4336382.0 MiB     1000010       for node in self:
   960     72.8 MiB  -4336466.3 MiB     1000000           if isinstance(node, Node):
   961                                                        bit = node.render_annotated(context)
   962                                                    else:
   963     72.8 MiB  -4333316.3 MiB     1000000               bit = node
   964     72.8 MiB  -4333210.0 MiB     1000000           bits.append(str(bit))
   965     72.8 MiB        10.1 MiB          10       return mark_safe(''.join(bits))
}}}
and the timeit (remember this is the pathological case in terms of nodes
to iterate over):
{{{
578 ms ± 4.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
consistently between 850-615ms across repeated runs.

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:4>

Django

unread,
Jun 8, 2021, 8:04:20 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:5>

Django

unread,
Jun 8, 2021, 8:23:02 AM6/8/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Keryn Knight):

Sigh. Apologies for the noise, but I went for a coffee and realised I was
misrepresenting the performance of the **original** method (last set of
benchmarks above) by not making my `FakeNode` a subclass of `Node` (so
`render_annotated` wasn't being called, and the string being returned was
the class repr, inflating things massively) ... and that's not useful to
anyone, so here's a repeat which should more accurately demonstrate the
change, for the historic record:

{{{


Line #    Mem usage     Increment  Occurences   Line Contents
============================================================
   956     60.2 MiB      52.9 MiB          10   def render(self, context):
   957     60.2 MiB       0.0 MiB          10       bits = []
   958     60.4 MiB  -11945.9 MiB     1000010       for node in self:
   959     60.4 MiB  -11946.1 MiB     1000000           if isinstance(node, Node):
   960     60.4 MiB  -11946.1 MiB     1000000               bit = node.render_annotated(context)
   961                                                  else:
   962                                                      bit = node
   963     60.4 MiB  -11939.6 MiB     1000000           bits.append(str(bit))
   964     60.5 MiB       1.0 MiB          10       return mark_safe(''.join(bits))
}}}
and the timeit:
{{{
306 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
(consistently between 305 and 315ms across repeated runs)

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:6>

Django

unread,
Jun 10, 2021, 8:37:40 AM6/10/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: assigned
Component: Template system | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim McCurrach):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:7>

Django

unread,
Jun 11, 2021, 6:52:20 AM6/11/21
to django-...@googlegroups.com
#32824: Potential micro-optimisations for NodeList.render
-------------------------------------+-------------------------------------
Reporter: Keryn Knight | Owner: Keryn Knight
Type: Cleanup/optimization | Status: closed
Component: Template system | Version: dev
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"854e9b066850b9b4eb1171966e996322b2c16d27" 854e9b0]:
{{{
#!CommitTicketReference repository="" revision="854e9b066850b9b4eb1171966e996322b2c16d27"
Fixed #32824 -- Improved performance of NodeList.render().

This avoids the following:
- checking that each item in the nodelist is a subclass of Node,
- calling str() on the render_annotated() output, because it's
  documented that Node.render() must return a string,
- calling mark_safe() on the output, when the value to be wrapped is
  definitively known to be a string because the result of ''.join()
  is always of that type,
- using an intermediate list to store each individual string.
}}}
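
Reading those bullet points together with the suggestion in comment 2, the
merged method presumably reduces to something like the following sketch
(inferred from the commit message, not a verbatim quote of the commit diff):
{{{
def render(self, context):
    return SafeString(''.join([node.render_annotated(context) for node in self]))
}}}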

--
Ticket URL: <https://code.djangoproject.com/ticket/32824#comment:8>
