[Django] #36242: NodeList render overhead with huge templates


Django

Mar 10, 2025, 9:57:08 AM
to django-...@googlegroups.com
#36242: NodeList render overhead with huge templates
-------------------------------------+-------------------------------------
     Reporter:  Michal Čihař          |           Type:  Cleanup/optimization
       Status:  new                   |      Component:  Uncategorized
      Version:  5.1                   |       Severity:  Normal
     Keywords:                        |   Triage Stage:  Unreviewed
    Has patch:  0                     |   Needs documentation:  0
  Needs tests:  0                     |   Patch needs improvement:  0
Easy pickings:  0                     |   UI/UX:  0
-------------------------------------+-------------------------------------
While debugging the rendering of some huge templates, I noticed that
rendering is slower and uses more memory than necessary because of:

{{{
    def render(self, context):
        return SafeString(
            "".join([node.render_annotated(context) for node in self])
        )
}}}

which unnecessarily builds an intermediate list and then passes it to
`join()`, which could consume an iterable directly.
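
For illustration, the change I have in mind is roughly the following (a
sketch only, not the final patch):

{{{
    def render(self, context):
        # Sketch: pass a generator expression straight to join() instead
        # of materializing the intermediate list first.
        return SafeString(
            "".join(node.render_annotated(context) for node in self)
        )
}}}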

I will prepare a pull request with a fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/36242>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 10, 2025, 12:41:06 PM
to django-...@googlegroups.com
#36242: NodeList render overhead with huge templates
-------------------------------------+-------------------------------------
     Reporter:  Michal Čihař          |          Owner:  (none)
         Type:  Cleanup/optimization  |         Status:  closed
    Component:  Uncategorized         |        Version:  5.1
     Severity:  Normal                |     Resolution:  wontfix
     Keywords:                        |   Triage Stage:  Unreviewed
    Has patch:  0                     |   Needs documentation:  0
  Needs tests:  0                     |   Patch needs improvement:  0
Easy pickings:  0                     |   UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* resolution: => wontfix
* status: new => closed

Comment:

There is nothing to fix here. A list comprehension is preferable here, as
`str.join()` converts its argument to a list internally anyway, so it
performs better to provide a list up front.
- https://stackoverflow.com/questions/9060653/list-comprehension-without-in-python/9061024#9061024
- https://github.com/adamchainz/flake8-comprehensions/issues/156
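
In Python terms, the point is roughly the following (an illustrative
sketch, not the CPython source):

{{{
# Illustrative only: str.join() materializes a non-list/tuple argument
# into a list (via PySequence_Fast) before joining, so the generator
# form does roughly the work of the explicit list() call below, plus
# the overhead of driving the generator.
items = (str(n) for n in range(1000))
result = "".join(list(items))
}}}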
--
Ticket URL: <https://code.djangoproject.com/ticket/36242#comment:1>

Django

Mar 10, 2025, 3:13:57 PM
to django-...@googlegroups.com
#36242: NodeList render overhead with huge templates
-------------------------------------+-------------------------------------
     Reporter:  Michal Čihař          |          Owner:  (none)
         Type:  Cleanup/optimization  |         Status:  closed
    Component:  Uncategorized         |        Version:  5.1
     Severity:  Normal                |     Resolution:  wontfix
     Keywords:                        |   Triage Stage:  Unreviewed
    Has patch:  0                     |   Needs documentation:  0
  Needs tests:  0                     |   Patch needs improvement:  0
Easy pickings:  0                     |   UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Michal Čihař):

Thanks for sharing this. I saw an improvement with my data, but apparently
it depends on the actual content, and I should have done more research.

With short strings, using a list clearly wins:

{{{
(py3.14)$ python -m timeit '"".join([str(n) for n in range(1000)])'
5000 loops, best of 5: 79.8 usec per loop
(py3.14)$ python -m timeit '"".join(str(n) for n in range(1000))'
5000 loops, best of 5: 102 usec per loop
}}}

With long strings, it is the other way around:

{{{
(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 3.27 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 750 usec per loop
}}}

But Django templates are more likely to be handling short strings.
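
To measure something closer to the actual template path rather than
isolated `join()` calls, one could time a template that emits many short
fragments; a rough sketch using Django's standalone template
configuration (details and numbers will obviously vary):

{{{
import timeit

import django
from django.conf import settings

# Minimal standalone configuration so templates can be rendered outside
# a project; the defaults are assumed to be fine for this measurement.
settings.configure(
    TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}]
)
django.setup()

from django.template import Context, Template

# Many short rendered fragments, roughly the common case in templates.
template = Template("{% for n in items %}{{ n }} {% endfor %}")
context = Context({"items": range(1000)})

print(timeit.timeit(lambda: template.render(context), number=1000))
}}}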
--
Ticket URL: <https://code.djangoproject.com/ticket/36242#comment:2>

Django

Mar 10, 2025, 6:23:32 PM
to django-...@googlegroups.com
#36242: NodeList render overhead with huge templates
-------------------------------------+-------------------------------------
     Reporter:  Michal Čihař          |          Owner:  (none)
         Type:  Cleanup/optimization  |         Status:  closed
    Component:  Uncategorized         |        Version:  5.1
     Severity:  Normal                |     Resolution:  wontfix
     Keywords:                        |   Triage Stage:  Unreviewed
    Has patch:  0                     |   Needs documentation:  0
  Needs tests:  0                     |   Patch needs improvement:  0
Easy pickings:  0                     |   UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

I just ran your second benchmark, and for me the list was consistently
faster, even for the larger strings:
{{{
% python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 451 usec per loop
% python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 456 usec per loop
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/36242#comment:3>

Django

Mar 11, 2025, 4:09:30 AM
to django-...@googlegroups.com
#36242: NodeList render overhead with huge templates
-------------------------------------+-------------------------------------
     Reporter:  Michal Čihař          |          Owner:  (none)
         Type:  Cleanup/optimization  |         Status:  closed
    Component:  Uncategorized         |        Version:  5.1
     Severity:  Normal                |     Resolution:  wontfix
     Keywords:                        |   Triage Stage:  Unreviewed
    Has patch:  0                     |   Needs documentation:  0
  Needs tests:  0                     |   Patch needs improvement:  0
Easy pickings:  0                     |   UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Michal Čihař):

This all made me look into the implementation, and the list comprehension
seems like the best approach in this case. `str.join()` calls
`PySequence_Fast`, which converts the iterable into a list if it is not
already a list or a tuple. Creating the list with a comprehension should
therefore be faster than creating a generator and then converting it to a
list; my result most likely depends on the CPU cache size, and that is the
corner case I am observing (the strings fill the CPU cache in my case).

Additionally, there is a fast path when the separator and all items are
Unicode strings of the same width, so pure ASCII is:

{{{
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 4.34 msec per loop
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 772 usec per loop
}}}

But once you mix non-ASCII text into it:

{{{
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join(["š" * 5000 for n in range(1000)])'
1000 loops, best of 5: 10.5 msec per loop
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join("š" * 5000 for n in range(1000))'
1000 loops, best of 5: 10.4 msec per loop
}}}

And now any difference is gone. So, indeed, this is not a way to optimize.
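
The memory side of the original report can be checked in a similar way; a
rough sketch with `tracemalloc` (since `str.join()` materializes the
generator anyway, the peaks should come out close):

{{{
import tracemalloc

def peak_bytes(fn):
    # Measure the peak traced allocation while fn() runs.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

print(peak_bytes(lambda: "".join([str(n) for n in range(100_000)])))
print(peak_bytes(lambda: "".join(str(n) for n in range(100_000))))
}}}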
--
Ticket URL: <https://code.djangoproject.com/ticket/36242#comment:4>