[Django] #36520: Performance Regression in parse_header_params

31 views
Skip to first unread message

Django

unread,
Jul 22, 2025, 6:48:03β€―PMJul 22
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-----------------------------+-----------------------------------------
Reporter: David Smith | Type: Uncategorized
Status: new | Component: HTTP handling
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+-----------------------------------------
https://github.com/django/django/pull/18424 / #35440 simplified the
`parse_header_params()` function.

This introduced a performance regression than can be seen on django-asv.
[https://django.github.io/django-asv/regressions.xml link] shows 5
performance regressions on or arround 28 March 2025, all of which relate
to this commit.

A minimal reproduction shows a 5x performance regression.

Before:

{{{
In [2]: %timeit parse_header_parameters("text/plain")
500 ns Β± 1.42 ns per loop (mean Β± std. dev. of 7 runs, 1,000,000 loops
each)
}}}

After
{{{
In [2]: %timeit parse_header_parameters("text/plain")
2.65 ΞΌs Β± 44.7 ns per loop (mean Β± std. dev. of 7 runs, 100,000 loops
each)
}}}

Running a [https://pypi.org/project/line-profiler/#quick-start-modern
line-profile] shows:

{{{
Line # Hits Time Per Hit % Time Line Contents
==============================================================
320 @line_profiler.profile
321 def
parse_header_parameters(line, max_length=MAX_HEADER_LENGTH):
322 """
323 Parse a Content-type
like header.
324 Return the main
content-type and a dictionary of options.
325
326 If `line` is longer
than `max_length`, `ValueError` is raised.
327 """
328 1 11.0 11.0 2.4 if max_length is not
None and line and len(line) > max_length:
329 raise
ValueError("Unable to parse header parameters (value too long).")
330
331 1 47.0 47.0 10.3 m = Message()
332 1 51.0 51.0 11.2 m["content-type"] =
line
333 1 332.0 332.0 73.0 params =
m.get_params()
334
335 1 2.0 2.0 0.4 pdict = {}
336 1 7.0 7.0 1.5 key =
params.pop(0)[0].lower()
337 1 2.0 2.0 0.4 for name, value in
params:
338 if not name:
339 continue
340 if
isinstance(value, tuple):
341 value =
collapse_rfc2231_value(value)
342 pdict[name] =
value
343 1 3.0 3.0 0.7 return key, pdict
}}}

As I think this function is called with every request, I thought it worth
raising.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 22, 2025, 10:01:16β€―PMJul 22
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Changes (by Natalia Bidart):

* severity: Normal => Release blocker
* stage: Unreviewed => Accepted
* type: Uncategorized => Bug
* version: 5.2 => dev

Comment:

Thank you David for raising this. IIUC, the slowest code path comes from
the Python implementation when calling `get_params()`?

Also the 5x regression comes from your test, not asv, right? I see better
values in asv if I'm reading that correctly.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:1>

Django

unread,
Jul 24, 2025, 7:18:51β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by David Smith):

Yes, `get_params()` seems to be where most of the time is being spent.

The 5x claim I made above is just looking at this function. Other things
are happening in the ASV benchmarks, so the impact shown there is much
smaller. Maybe that's a good reason to suggest that even though this
function is much slower, it is small in the overall request/response
cycle. However, I think that as this is called with every request even
small gains are worth having.

I have done a bit more work on this. We could optimise for content types
that do not include parameters and avoid a lot of the work in this case.
Something like this:

I'm rather unsure how common parameters are. Some evidence would be useful
to avoid optimising for the wrong thing.

{{{

diff --git a/django/utils/http.py b/django/utils/http.py
index 1c7aec7141..111740cd02 100644
--- a/django/utils/http.py
+++ b/django/utils/http.py
@@ -326,6 +326,10 @@ def parse_header_parameters(line,
max_length=MAX_HEADER_LENGTH):
if max_length is not None and line and len(line) > max_length:
raise ValueError("Unable to parse header parameters (value too
long).")

+ if line and ";" not in line:
+ # No parameters, just return the content type.
+ return line.strip().lower(), {}
+
m = Message()
m["content-type"] = line
params = m.get_params()

}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:2>

Django

unread,
Jul 24, 2025, 7:38:40β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

Thank you David for following up on this ticket!

I was thinking that maybe it may be worth reporting this issue to Python
upstream? I feel like the proposed shortcut could benefit the broader
ecosystem and thus be part of the `Message` implementation. Would you be
available to pursue that?

Thanks again!
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:3>

Django

unread,
Jul 24, 2025, 8:24:04β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

Update: I built a script to benchmark this and I think I can report the
issue myself to Python.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:4>

Django

unread,
Jul 24, 2025, 9:02:32β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

I created [https://discuss.python.org/t/performance-regression-from-using-
email-message-message-get-params-as-replacement-of-cgi-parse-header-
following-pep-594/99963 this Python forum post].
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:5>

Django

unread,
Jul 24, 2025, 9:40:20β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Carlton Gibson):

Since the regression is only on `main` so far, should we treat this as a
release blocker on 6.0? Regardless of a resolution in Python (which will
probably be slow if coming at all) David's patch looks preferable to the
regression here. WDYT?
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:6>

Django

unread,
Jul 24, 2025, 10:02:32β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Changes (by Carlton Gibson):

* cc: Carlton Gibson (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:7>

Django

unread,
Jul 24, 2025, 10:05:23β€―AMJul 24
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Jake Howard):

I'm rather unsure how common parameters are.

Intentionally, probably not especially common. But browses seem eager to
add `charset=utf-8` when they default generate the `Content-Type`. Also,
parameters are used to set the `boundary` for form submissions, so could
be sent with the majority of `POST` requests.

Because any form submission will probably hit the slow path, as could many
other clients, short-circuiting probably isn't the way to go. Given
reverting #35440 ought not to be too complex, we can probably afford to
hold off and see what the response from upstream is.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:8>

Django

unread,
Jul 29, 2025, 11:07:52β€―AMJul 29
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

So! I followed the suggestion from the Python Discourse post, where Barry
Scott suggested to benchmark *bytes* header parsing instead of unicode
since:

> When I worked on code to parse HTTP headers we worked in bytes to save
the cost of decoding to unicode.

And I was surprised to see that the slowdown is actually a speed gain for
complex headers. See these comparison tables:

||= Python 3.11 (real `cgi`, email.message gets `bytes`) =||= `cgi` =||=
`email.message` =||= ratio =||
|| `text/plain` || 0.316 ||
1.618 || ~5.1x ||
|| `text/html; charset=UTF-8; boundary=something` || 1.362 ||
4.364 || ~3.2x ||
|| `application/x-stuff; title*=...; foo=bar; foo2*=...` || 1.822 ||
2.405 || ~1.3x ||


||= Python 3.13 (`cgi` shim, email.message gets `bytes`) =||= `cgi` =||=
`email.message` =||= ratio =||
|| `text/plain` || 0.312 ||
1.676 || ~5.4x ||
|| `text/html; charset=UTF-8; boundary=something` || 1.188 ||
3.887 || ~3.3x ||
|| `application/x-stuff; title*=...; foo=bar; foo2*=...` || 4.358 ||
2.175 || ~0.5x ||

So maybe we could indeed shortcircuit the parsing if no `;` is found in
the header line (to avoid paying the 5x slowdown), but then is unclear to
me how we could resort back to the header bytes when what we get in the
WSGI `environ` is str.

Carlton, Jake, David, any ideas?
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:9>

Django

unread,
Jul 29, 2025, 11:37:32β€―AMJul 29
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Changes (by Natalia Bidart):

* cc: Jake Howard (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:10>

Django

unread,
Jul 29, 2025, 12:08:15β€―PMJul 29
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Jake Howard):

Encoding a string into bytes is incredibly fast. Perhaps doing the short-
circuit, converting the data to bytes and then parsing it with
`email.message` is fast enough? It's still going to be a performance
regression, but hopefully not by quite as much.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:11>

Django

unread,
Jul 29, 2025, 1:05:03β€―PMJul 29
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

Replying to [comment:11 Jake Howard]:
> Encoding a string into bytes is incredibly fast. Perhaps doing the
short-circuit, converting the data to bytes and then parsing it with
`email.message` is fast enough? It's still going to be a performance
regression, but hopefully not by quite as much.

Below are benchmark results using my testing code, where `line =
line.encode("utf-8")` is called inside the benchmarked function, just
before invoking `Message.get_params()`. This approach could work: we see a
potential 2x performance improvement for very large headers (which are
often the source of security reports). However, the average case (having a
`charset` and potentially a `boundary`) appears to suffer a 3x slowdown,
and I don't think we could easily avoid that penalty.

||= Python 3.11 =||= cgi =||= get_params(str) =||= get_params(bytes) =||=
ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.337 || 1.635 || 1.672 || 0.20 || 5.0x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.362 || 4.348 ||
4.514 || 0.30 || 3.3x slower ||
|| `application/x-stuff; ...` || 1.955 || 8.675 || 2.494 || 1.27 || 1.3x
slower ||

||= Python 3.12 =||= cgi =||= get_params(str) =||= get_params(bytes) =||=
ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.356 || 1.657 || 1.725 || 0.21 || 4.8x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.407 || 4.582 ||
4.697 || 0.30 || 3.3x slower ||
|| `application/x-stuff; ...` || 2.017 || 9.609 || 2.645 || 1.31 || 1.3x
slower ||

||= Python 3.13 =||= cgi =||= get_params(str) =||= get_params(bytes) =||=
ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.325 || 1.613 || 1.717 || 0.19 || 5.3x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.167 || 3.862 ||
3.943 || 0.30 || 3.4x slower ||
|| `application/x-stuff; ...` || 4.263 || 9.445 || 2.252 || 1.89 || 1.9x
faster ||

||= Python 3.14 =||= cgi =||= get_params(str) =||= get_params(bytes) =||=
ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.258 || 1.601 || 1.725 || 0.16 || 6.7x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.037 || 3.773 ||
3.870 || 0.27 || 3.7x slower ||
|| `application/x-stuff; ...` || 3.978 || 8.789 || 2.132 || 1.87 || 1.9x
faster ||
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:12>

Django

unread,
Jul 30, 2025, 3:07:04β€―AMJul 30
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Carlton Gibson):

I wonder if the best option isn't to revert
9aabe7eae3eeb3e64c5a0f3687118cd806158550, reopen #35440, and assess again
in more leisurely time, keeping this performance issue in mind. πŸ€”
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:13>

Django

unread,
Jul 30, 2025, 4:46:58β€―AMJul 30
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Jake Howard):

That sounds sensible to me.

As an aside, the parsing implementations from
[https://github.com/psf/requests/blob/4b9c546aa3f35fca6ca24945376fe7462bb007c4/src/requests/utils.py#L504
requests] and
[https://github.com/overdosyu/werkzeug/blob/19a4a145f68535ad533d5a34bfad8474c87c54e2/werkzeug/http.py#L318
werkzeug] (ie `flask`) are both hand-rolled, and differ a fair bit from
Django's. It's possibly using those implementations may be faster (or at
least be used to accelerate what Django already has).

I'd be interested to see how those benchmark against `cgi` and
`email.message`.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:14>

Django

unread,
Aug 8, 2025, 3:19:31β€―PMAug 8
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

Following up on your interest, Jake, here are the comparisons using the
linked implementations. All ratios are calculated against the base
`cgi`-copied implementation currently in Django 5.2.x (and real `cgi` in
3.11 and 3.12). The next step, which I'd like to know if you're interested
in, is to verify whether the comparisons with Requests and Werkzeug are
fair and whether these implementations can be considered both robust and
accurate.


||= Python 3.14 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params(bytes)` =||
|| ` text/plain ` || 0.257 || 0.114 πŸš€ -2.3x ||
0.717 🐒 2.8x || 1.664 🐒 6.5x || 1.690 🐒 6.6x ||
|| `text/html; charset=UTF-8; b...` || 1.033 || 0.494 πŸš€ -2.1x ||
1.538 🐒 1.5x || 3.693 🐒 3.6x || 3.907 🐒 3.8x ||
|| `application/x-stuff; title*...` || 3.866 || 0.695 πŸš€ -5.6x ||
1.982 πŸš€ -2.0x || 8.487 🐒 2.2x || 2.152 πŸš€ -1.8x ||

||= Python 3.13 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params(bytes)` =||
|| ` text/plain ` || 0.308 || 0.134 πŸš€ -2.3x ||
0.764 🐒 2.5x || 1.661 🐒 5.4x || 1.689 🐒 5.5x ||
|| `text/html; charset=UTF-8; b...` || 1.135 || 0.527 πŸš€ -2.2x ||
1.617 🐒 1.4x || 3.769 🐒 3.3x || 3.980 🐒 3.5x ||
|| `application/x-stuff; title*...` || 4.224 || 0.721 πŸš€ -5.9x ||
2.069 πŸš€ -2.0x || 9.363 🐒 2.2x || 2.164 πŸš€ -2.0x ||

||= Python 3.12 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params(bytes)` =||
|| ` text/plain ` || 0.329 || 0.136 πŸš€ -2.4x ||
0.756 🐒 2.3x || 1.683 🐒 5.1x || 1.816 🐒 5.5x ||
|| `text/html; charset=UTF-8; b...` || 1.441 || 0.623 πŸš€ -2.3x ||
1.718 🐒 1.2x || 4.599 🐒 3.2x || 4.709 🐒 3.3x ||
|| `application/x-stuff; title*...` || 1.997 || 0.827 πŸš€ -2.4x ||
2.199 🐒 1.1x || 9.565 🐒 4.8x || 2.521 🐒 1.3x ||

||= Python 3.11 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params(bytes)` =||
|| ` text/plain ` || 0.322 || 0.123 πŸš€ -2.6x ||
0.696 🐒 2.2x || 1.671 🐒 5.2x || 1.667 🐒 5.2x ||
|| `text/html; charset=UTF-8; b...` || 1.360 || 0.573 πŸš€ -2.4x ||
1.604 🐒 1.2x || 4.337 🐒 3.2x || 4.498 🐒 3.3x ||
|| `application/x-stuff; title*...` || 1.868 || 0.757 πŸš€ -2.5x ||
2.106 🐒 1.1x || 8.624 🐒 4.6x || 2.472 🐒 1.3x ||
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:15>

Django

unread,
Aug 8, 2025, 3:28:49β€―PMAug 8
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

The benchmark script can be found
[https://gist.github.com/nessita/6b4e659f8af0ce9fbd4c78ca0093a0fe here].

Since `werkzeug` performs nearly as slowly as `get_params(bytes)`, and
although I’m hesitant (but not opposed) to reverting the fix, would it be
acceptable to address this by encoding to bytes the header line while
continuing to use Python's stdlib `Message().get_params()`?
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:16>

Django

unread,
Aug 20, 2025, 8:11:18β€―AMAug 20
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Jake Howard):

The `requests` implementation was the most interesting to me, being so
much faster. After plugging it in to Django however, it seems it's not
quite as robust - many of the test cases fail and produce very unexpected
results. `werkzeug`'s implementation appears to be more robust to the
weird edge cases we need to handle, but still isn't perfect.

To revert or not to revert feels difficult. On the one hand, the `cgi`
implementation is faster almost across the board. However, it's a fair bit
of work to maintain, and parser code like that tends to be where security
issues get introduced. `Message().get_params` is slower (at least on the
easy cases), but using `bytes` makes it slightly less slower than it
currently is, is the officially recommended path forward, and could get
improvements as time goes on (removing the need to create a new `Message`
instance each time improves performance quite a bit). Personally, I'd vote
for adding the short-circuit and `bytes` changes over reverting.

As an aside, to integrate `requests` and `werkzeug`'s implementations into
Django, I had to modify how multi-part form parsing worked, since it
passed the entire header to `parse_header_parameters`, rather than just
the value. I doubt it'll be a performance improvement, but probably
something worth cleaning up regardless, especially if we do swap out the
implementation. I can push up a PR if it's useful?
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:17>

Django

unread,
Aug 20, 2025, 9:54:18β€―AMAug 20
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

Replying to [comment:17 Jake Howard]:
> The `requests` implementation was the most interesting to me, being so
much faster. After plugging it in to Django however, it seems it's not
quite as robust - many of the test cases fail and produce very unexpected
results. `werkzeug`'s implementation appears to be more robust to the
weird edge cases we need to handle, but still isn't perfect.

Thanks for checking!

> To revert or not to revert feels difficult. On the one hand, the `cgi`
implementation is faster almost across the board. However, it's a fair bit
of work to maintain, and parser code like that tends to be where security
issues get introduced. `Message().get_params` is slower (at least on the
easy cases), but using `bytes` makes it slightly less slower than it
currently is, is the officially recommended path forward, and could get
improvements as time goes on (removing the need to create a new `Message`
instance each time improves performance quite a bit). Personally, I'd vote
for adding the short-circuit and `bytes` changes over reverting.

I fully agree with this rationale.

> As an aside, to integrate `requests` and `werkzeug`'s implementations
into Django, I had to modify how multi-part form parsing worked, since it
passed the entire header to `parse_header_parameters`, rather than just
the value. I doubt it'll be a performance improvement, but probably
something worth cleaning up regardless, especially if we do swap out the
implementation. I can push up a PR if it's useful?

Yes please, a PR with these fixes would be much appreciated. Let me know
when it's up so I can prioritize it for reviews. Thank you for
investigating further!
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:18>

Django

unread,
Aug 28, 2025, 1:25:48β€―PMΒ (7 days ago)Β Aug 28
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by nessita <124304+nessita@…>):

In [changeset:"41ff30f6f9d072036be1f74db8f0c8b21565299f" 41ff30f]:
{{{#!CommitTicketReference repository=""
revision="41ff30f6f9d072036be1f74db8f0c8b21565299f"
Refs #36520 -- Ensured only the header value is passed to
parse_header_parameters for multipart requests.

Header parsing should apply only to the header value. The previous
implementation happened to work but relied on unintended behavior.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:19>

Django

unread,
Sep 2, 2025, 4:39:51β€―PMΒ (2 days ago)Β Sep 2
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* owner: (none) => Natalia Bidart
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:20>

Django

unread,
Sep 3, 2025, 11:04:04β€―AMΒ (yesterday)Β Sep 3
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

I started looking into this AGAIN to see how the branch using bytes would
look like. In my head, "passing bytes" to `Message.get_params` would just
be:

{{{#!diff
diff --git a/django/utils/http.py b/django/utils/http.py
index 504f28c678..54dc85aead 100644
--- a/django/utils/http.py
+++ b/django/utils/http.py
@@ -327,7 +327,7 @@ def parse_header_parameters(line,
max_length=MAX_HEADER_LENGTH):
raise ValueError("Unable to parse header parameters (value too
long).")

m = Message()
- m["content-type"] = line
+ m["content-type"] = line.encode("utf-8")
params = m.get_params()

pdict = {}
}}}

But this change generates all sort of test failures: some are "expected"
because we'd need to `decode` the results into what's actually expected as
a `str` result, but other show jusy plain wrong results. For example, for
a `rfc2231` like this:

{{{
Content-Type: application/x-stuff; title*=us-ascii'en-
us'This%20is%20%2A%2A%2Afun%2A%2A%2A
}}}

Passing str to `Message.get_params` results in the correct:

{{{
[('Content-Type: application/x-stuff', ''), ('title', ('us-ascii', 'en-
us', 'This is ***fun***'))]
}}}

But passing bytes yields:

{{{
[('b"content-type: application/x-stuff; title*', 'us-ascii\'en-
us\'This%20is%20%2A%2A%2Afun%2A%2A%2A"')]
}}}

I further debugged and ended up in a rabbit hole in the Python code, to
realize that `get_params` is not, at least to my exhausted eyes, expecting
bytes in any way! I posted an update in the Python Discourse, but I would
truly appreciate your opinion on this since at this point my head is very
confused.

I dug into `Message.get_params()`, and from the call chain it seems clear
that everything is built around `str` headers. For example,
`_parseparam()` (the call chain is `get_params` -> `_get_params_preserve`
-> `_parseparam`) starts with these two lines
(https://github.com/python/cpython/blob/main/Lib/email/message.py#L73):

{{{#!python
def _parseparam(s):
# RDM This might be a Header, so for now stringify it.
s = ';' + str(s)
}}}

It then string-splits, string-strips, and string-compares all the way
down. If I assign bytes to the header, the parsing doesn’t work as
expected: you basically just get the `repr()` of the bytes object. So
while I understand the idea of avoiding decoding overhead, I don't see a
way to feed bytes into `Message` and get equivalent results. The API
really assumes `str` throughout.

=== Side note===

There is ALSO the question of whether the header name should or should not
be passed to `parse_header_parameters`. On a second look in our code, I
think that `line` is expected to have the header name in it. Tests are
super confusing in this regard, see how `test_basic` does not pass it but
the rest of test cases do!
https://github.com/django/django/blob/main/tests/utils_tests/test_http.py
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:21>

Django

unread,
Sep 3, 2025, 4:07:33β€―PMΒ (yesterday)Β Sep 3
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

Going with David's short-circuit suggestion in comment:2 seems wise to me.
I don't think it's worth reverting and waiting to hear from upstream,
since if I were CPython I don't think I'd take any action:
- the performance impact although measurable is extremely minor
- it's inside a
[https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_param
"legacy method"] for the "legacy" `Message` class
- the burden of optimizing for the "right thing" in David's words is
probably on us (Django)

Re: "legacy", porting to `EmailMessage` performs 10x worse. For posterity
I had something like this, but with test failures:
{{{#!diff
diff --git a/django/utils/http.py b/django/utils/http.py
index 504f28c678..8a05780160 100644
--- a/django/utils/http.py
+++ b/django/utils/http.py
@@ -3,7 +3,7 @@ import re
import unicodedata
from binascii import Error as BinasciiError
from datetime import UTC, datetime
-from email.message import Message
+from email.message import EmailMessage
from email.utils import collapse_rfc2231_value, formatdate
from urllib.parse import quote
from urllib.parse import urlencode as original_urlencode
@@ -326,13 +326,13 @@ def parse_header_parameters(line,
max_length=MAX_HEADER_LENGTH):
if max_length is not None and line and len(line) > max_length:
raise ValueError("Unable to parse header parameters (value too
long).")

- m = Message()
+ m = EmailMessage()
m["content-type"] = line
- params = m.get_params()
+ params = m["content-type"].params

pdict = {}
- key = params.pop(0)[0].lower()
- for name, value in params:
+ key = m["content-type"].split(";")[0]
+ for name, value in params.items():
if not name:
continue
if isinstance(value, tuple):
}}}

----
I checked DRF, and DRF calls Django's `parse_header_parameters` in
[https://github.com/encode/django-rest-
framework/blob/14728485018098a477e302afebc6d61547a0789f/rest_framework/negotiation.py#L51
a nested loop], about 17 times for the view I was testing. The values it
tested were:

{{{
application/json
text/html; q=1.0
*/*
application/json
text/html; q=1.0
text/html
text/html; q=1.0
text/html; q=1.0
text/html
application/json
application/json
application/json
application/json
text/html; q=1.0
text/html; q=1.0
application/json
multipart/form-data
}}}

Simplifying the bench (to remove the red herring bytes stuff and add in
David's suggestion from comment:2):
||= Python 3.13 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params#comment:2` =||
|| ` application/json ` || 0.358 || 0.145 πŸš€ -2.5x ||
0.806 🐒 2.3x || 1.758 🐒 4.9x || 0.089 πŸš€ -4.0x ||
|| ` text/html; q=1.0 ` || 0.887 || 0.375 πŸš€ -2.4x ||
1.210 🐒 1.4x || 2.954 🐒 3.3x || 3.497 🐒 3.9x ||


So, with comment:2, all but a handful of these calls will be faster. I
think we could call this a wash at that point.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:22>

Django

unread,
Sep 3, 2025, 4:09:03β€―PMΒ (yesterday)Β Sep 3
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by David Smith):

Looking at the original [https://peps.python.org/pep-0333/#environ-
variables PEP 333] under envion variables it says...

> Also note that CGI-defined variables must be strings, if they are
present at all. It is a violation of this specification for a CGI
variable’s value to be of any type other than str.

And the modern [https://peps.python.org/pep-3333/#a-note-on-string-types
PEP3333] it says:

> β€œNative” strings (which are always implemented using the type named str)
that are used for request/response headers and metadata.

So the comments above about `get_params` working with strings and WSGI
environ being `str` seems to be in line with the spec.

Given how close to the feature freeze we are, I support Carlton's comment
about reverting the change and coming back to it at a more leisurely pace.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:23>

Django

unread,
Sep 3, 2025, 4:33:25β€―PMΒ (yesterday)Β Sep 3
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* cc: Mike Edmunds (added)

Comment:

Mike (added as cc), would you have an opinion? I consider your an expert
in everything relating to Python email's API (legacy and modern message
classes).
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:24>

Django

unread,
Sep 3, 2025, 9:21:20β€―PMΒ (yesterday)Β Sep 3
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Mike Edmunds):

I don't have any deep insights here (and my expertise in Python's email
APIs is, at best, at the "enough knowledge to be dangerous" level). But
from what I've seen:

* I was… surprised to see Python recommending the ''legacy'' email APIs as
a replacement for the cgi package. I've been under the impression the
legacy APIs are being gently edged toward deprecation.
* Python's email package (both APIs) suffers from a severe shortage of
maintainer bandwidth. Even security fixes can take months or years to
land. Having two differentβ€”but intertwinedβ€”email APIs doesn't help. (To be
fair, the cgi package had exactly zero maintainers, which is strictly
worse.)
* I don't think performance has been a primary consideration in either
email API's design. (The modern API prioritizes simplified DX around
complying with newer RFCs. It has several extra layers compared to the
legacy API, so it's probably going to be slower. But maybe more accurate
in some corner cases.)
* Although both email APIs are specced to parse HTTP headers (MIME is
MIME), I'm guessing neither has had as much real-world use for that
purpose as the cgi package did. It wouldn't be surprising to see a bunch
of HTTP-specific email package issues emerge over the next few years.
(Call this the first one.)

Given all that, I suspect Django would be better off ''not'' using
Python's email package to parse HTTP headers. The technical and
maintenance requirements aren't well aligned. Jacob's analysis in
comment:22 seems spot on. James Webber's [https://discuss.python.org/t
/performance-regression-from-using-email-message-message-get-params-as-
replacement-of-cgi-parse-header-following-pep-594/99963/4 reply] to
Natalia's Python forum post also puts it pretty well.

Post-6.0, perhaps there could be a collaboration with Flask's maintainers
on a high-performance, secure, maintained HTTP header parser purpose-built
for web framework needs? (Or maybe there's already one in rust just
waiting for some Python bindings?)
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:25>

Django

unread,
10:45β€―AMΒ (11 hours ago)Β 10:45β€―AM
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

Replying to [comment:25 Mike Edmunds]:
> Given all that, I suspect Django would be better off ''not'' using
Python's email package to parse HTTP headers. The technical and
maintenance requirements aren't well aligned. Jacob's analysis in
comment:22 seems spot on. James Webber's [https://discuss.python.org/t
/performance-regression-from-using-email-message-message-get-params-as-
replacement-of-cgi-parse-header-following-pep-594/99963/4 reply] to
Natalia's Python forum post also puts it pretty well.

Thanks so much, Mike! This is incredibly useful. Jacob and I will chat
more in person next week at DjangoCon US, but it looks like we may need to
go with the revert for now. I really appreciate your perspective, it helps
a lot in framing the next steps.

> Post-6.0, perhaps there could be a collaboration with Flask's
maintainers on a high-performance, secure, maintained HTTP header parser
purpose-built for web framework needs? (Or maybe there's already one in
rust just waiting for some Python bindings?)

Current Flask parsing is slower than our backported `cgi` version. Were
you envisioning a collaboration to optimize their implementation for
performance (perhaps reusing some of `cgi` bits) or to find a way to use
ours across projects under a shared agreement?
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:26>

Django

unread,
12:44β€―PMΒ (9 hours ago)Β 12:44β€―PM
to django-...@googlegroups.com
#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Mike Edmunds):

Replying to [comment:26 Natalia Bidart]:
> Replying to [comment:25 Mike Edmunds]:
> > Post-6.0, perhaps there could be a collaboration with Flask's
maintainers on a high-performance, secure, maintained HTTP header parser
purpose-built for web framework needs? (Or maybe there's already one in
rust just waiting for some Python bindings?)
>
> Current Flask parsing is slower than our backported `cgi` version. Were
you envisioning a collaboration to optimize their implementation for
performance (perhaps reusing some of `cgi` bits) or to find a way to use
ours across projects under a shared agreement?

I was thinking about your comment in the Python forum that "relying on the
Python stdlib whenever possible improves maintainability, reduces
potential security risks, and benefits from broader community scrutiny."
Although generally true, it seems like that isn't the case for this
particular use. But that maybe there's an opportunity for web frameworks
with that shared need (should've included Tornado, probably others) to
create a shared, de facto standard package outside of Python stdlib.

(That said, I don't see anything wrong with Django maintaining its own
backported bits of `cgi`.)
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:27>
Reply all
Reply to author
Forward
0 new messages