#36896: Optimize TruncateCharsHTMLParser.process() to avoid redundant sum() calculation
-------------------------------------+-------------------------------------
Reporter: Tarek Nakkouch | Type: Cleanup/optimization
Status: new | Component: Utilities
Version: 6.0 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
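The `TruncateCharsHTMLParser.process()` method in `django/utils/text.py`
recalculates `sum(len(p) for p in self.output)` from scratch whenever it
checks whether the output so far plus the current text chunk accounts for
all of `self.rawdata`. Each evaluation walks the entire `self.output` list,
which grows with every start tag, end tag, and text chunk, even though the
lengths of the earlier chunks never change. (Because the `and`
short-circuits, the sum is only computed once
`self.processed_chars == self.length`.)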
{{{#!python
def process(self, data):
    self.processed_chars += len(data)
    if (self.processed_chars == self.length) and (
        # Walks every chunk in self.output each time this is evaluated.
        sum(len(p) for p in self.output) + len(data) == len(self.rawdata)
    ):
        self.output.append(data)
        raise self.TruncationCompleted
    output = escape("".join(data[: self.remaining]))
    return data, output
}}}
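The cost of each evaluation of that check grows linearly with the size of `self.output`. The micro-benchmark below is a minimal sketch, not Django code: the `output` list and its contents are made-up stand-ins for what the parser accumulates, and the timings it prints will vary by machine.

{{{#!python
import timeit

# Made-up stand-in for the parser's output list: many small tag and text chunks.
output = ["<p>", "some text", "</p>"] * 10_000

# In the proposal this value would be maintained incrementally, never recomputed.
output_len = sum(len(p) for p in output)


def recompute():
    # Mirrors the current check: walks every chunk on each call, O(len(output)).
    return sum(len(p) for p in output)


def cached():
    # Mirrors the proposal: reads a stored integer, O(1).
    return output_len


assert recompute() == cached()
t_sum = timeit.timeit(recompute, number=50)
t_cached = timeit.timeit(cached, number=50)
print(f"recompute: {t_sum:.4f}s  cached: {t_cached:.6f}s")
}}}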
== Suggested optimization ==
Cache the output length as `self.output_len` and increment it when
appending to `self.output`:
* Initialize `self.output_len = 0` in `TruncateHTMLParser.__init__()`
* Increment in `handle_starttag()`, `handle_endtag()`, `handle_data()`, `feed()`, and `process()`
* Replace `sum(len(p) for p in self.output)` with `self.output_len`
This turns the length check into a constant-time lookup instead of a walk over already-processed output.
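A minimal sketch of the cached-length pattern, assuming a toy subclass of the standard library's `html.parser.HTMLParser` rather than Django's `TruncateHTMLParser`. The class name `LengthTrackingParser` and the `_append()` helper are hypothetical. Routing every append through one helper is a variant of the proposal (which increments the counter at each call site), chosen so that the invariant `output_len == sum(len(p) for p in output)` cannot drift if a new append site is added later.

{{{#!python
from html.parser import HTMLParser


class LengthTrackingParser(HTMLParser):
    """Toy parser (hypothetical, not Django's TruncateHTMLParser) that keeps
    a running total of the characters appended to self.output."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.output = []
        # Invariant: self.output_len == sum(len(p) for p in self.output)
        self.output_len = 0

    def _append(self, chunk):
        # Hypothetical helper: the only place that touches self.output,
        # so the cached length is updated in lockstep with the list.
        self.output.append(chunk)
        self.output_len += len(chunk)

    def handle_starttag(self, tag, attrs):
        # Keep the original start-tag text, attributes included.
        self._append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._append(f"</{tag}>")

    def handle_data(self, data):
        self._append(data)


parser = LengthTrackingParser()
parser.feed("<p>Hello <b>world</b>!</p>")
parser.close()
# O(1) read instead of re-summing the list.
print(parser.output_len)  # 26
}}}

A real patch would also need to update the counter where `feed()` extends `self.output` with the remaining closing tags, for example by looping over those chunks and calling the same helper.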
--
Ticket URL: <https://code.djangoproject.com/ticket/36896>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.