[Django] #36896: Optimize TruncateCharsHTMLParser.process() to avoid redundant sum() calculation


Django

Jan 31, 2026, 8:36:57 AM
to django-...@googlegroups.com
#36896: Optimize TruncateCharsHTMLParser.process() to avoid redundant sum()
calculation
-------------------------------------+-------------------------------------
Reporter: Tarek Nakkouch | Type: Cleanup/optimization
Status: new | Component: Utilities
Version: 6.0 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
The `TruncateCharsHTMLParser.process()` method in `django/utils/text.py`
recalculates `sum(len(p) for p in self.output)` every time it processes a
text chunk. For HTML with multiple text nodes, this repeatedly iterates
over the growing output list unnecessarily.

{{{#!python
def process(self, data):
    self.processed_chars += len(data)
    if (self.processed_chars == self.length) and (
        sum(len(p) for p in self.output) + len(data) == len(self.rawdata)
    ):
        self.output.append(data)
        raise self.TruncationCompleted
    output = escape("".join(data[: self.remaining]))
    return data, output
}}}

== Suggested optimization ==

Cache the output length as `self.output_len` and increment it when
appending to `self.output`:

* Initialize `self.output_len = 0` in `TruncateHTMLParser.__init__()`
* Increment in `handle_starttag()`, `handle_endtag()`, `handle_data()`,
`feed()`, and `process()`
* Replace `sum(len(p) for p in self.output)` with `self.output_len`

This eliminates redundant iteration over already-processed output.
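
As a rough illustration of the trade-off, here is a minimal sketch that is independent of Django. The classes `SumEachTime` and `RunningTotal` and the `visited` counter are hypothetical names invented for this example; they are not part of `django/utils/text.py`. The sketch asks for the total after every append purely for illustration; how often the quoted `process()` actually reaches its `sum()` depends on the first comparison in its condition.

```python
class SumEachTime:
    """Recomputes the total length from the list on every request."""

    def __init__(self):
        self.output = []
        self.visited = 0  # hypothetical counter: list items touched while summing

    def append(self, piece):
        self.output.append(piece)

    def total(self):
        self.visited += len(self.output)
        return sum(len(p) for p in self.output)


class RunningTotal:
    """Keeps a cached length that is updated on every append."""

    def __init__(self):
        self.output = []
        self.output_len = 0

    def append(self, piece):
        self.output.append(piece)
        self.output_len += len(piece)

    def total(self):
        return self.output_len


pieces = ["<p>", "Hello", "</p>", "<p>", "world", "</p>"]
slow, fast = SumEachTime(), RunningTotal()
for piece in pieces:
    slow.append(piece)
    fast.append(piece)
    # Both strategies must always agree on the total length.
    assert slow.total() == fast.total()
```

With six pieces, the recomputing variant touches 1 + 2 + ... + 6 list items in total across the six requests, while the cached variant does constant work per append.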
--
Ticket URL: <https://code.djangoproject.com/ticket/36896>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Feb 1, 2026, 11:30:59 AM
to django-...@googlegroups.com
Changes (by absyol):

* owner: (none) => absyol
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/36896#comment:1>

Django

Feb 1, 2026, 12:19:47 PM
to django-...@googlegroups.com
Comment (by absyol):

Hello, I am new here (and to open source contributions in general). Please
feel free to let me know if there is anything I can improve.

I agree that iterating over every item in `self.output` is redundant; we
can track the length while adding to the list instead. I have a solution
that implements a helper method to append the items of a list to
`self.output` and update `self.output_length` on the fly. Every
modification of `self.output` will go through this helper, so extending
the class later will be less error-prone.

{{{
def update_output_fields(self, outputs):
    for output in outputs:
        self.output.append(output)
        self.output_length += len(output)
}}}
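
To show how the helper might be exercised on its own, here is a small sketch. `OutputTracker` is a hypothetical stand-in class written for this example, not Django's `TruncateHTMLParser`; only the `update_output_fields()` body is taken from the comment above.

```python
class OutputTracker:
    """Hypothetical stand-in holding only the two fields the helper touches."""

    def __init__(self):
        self.output = []
        self.output_length = 0

    def update_output_fields(self, outputs):
        # Append each piece and keep the cached length in step with the list.
        for output in outputs:
            self.output.append(output)
            self.output_length += len(output)


tracker = OutputTracker()
# Single pieces are wrapped in a one-element list.
tracker.update_output_fields(["<b>"])
tracker.update_output_fields(["bold"])
# Several pieces at once, e.g. one closing tag per open tag.
tracker.update_output_fields([f"</{tag}>" for tag in ["b"]])

# The cached length should match a full recomputation.
assert tracker.output_length == sum(len(p) for p in tracker.output)
```

An assertion like the last line could serve as a regression check that no code path appends to `self.output` without going through the helper.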

When modifying the other methods, we can call it as follows:

{{{
def handle_data(self, data):
    data, output = self.process(data)
    data_len = len(data)
    if self.remaining < data_len:
        self.remaining = 0

        # call here
        self.update_output_fields(
            [add_truncation_text(output, self.replacement)]
        )
        raise self.TruncationCompleted
    self.remaining -= data_len

    # call here
    self.update_output_fields([output])
}}}

The helper takes a list as input to support `feed()`, which appends one
closing tag per open tag:
{{{
def feed(self, data):
    try:
        super().feed(data)
    except self.TruncationCompleted:
        self.update_output_fields([f"</{tag}>" for tag in self.tags])
        self.tags.clear()
        self.reset()
    else:
        # No data was handled.
        self.reset()
}}}

Would this be sufficient for this optimization?
--
Ticket URL: <https://code.djangoproject.com/ticket/36896#comment:2>

Django

Feb 1, 2026, 12:21:25 PM
to django-...@googlegroups.com
Changes (by absyol):

* stage: Unreviewed => Accepted

--
Ticket URL: <https://code.djangoproject.com/ticket/36896#comment:3>

Django

Feb 1, 2026, 1:20:46 PM
to django-...@googlegroups.com
Changes (by absyol):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/36896#comment:4>

Django

Feb 1, 2026, 1:21:49 PM
to django-...@googlegroups.com
Comment (by absyol):

Here is the pull request: https://github.com/django/django/pull/20624
--
Ticket URL: <https://code.djangoproject.com/ticket/36896#comment:5>