Re: [Django] #36293: Extend `django.utils.text.compress_sequence()` to optionally flush data written to compressed file


Django
Apr 4, 2025, 5:06:09 AM
to django-...@googlegroups.com
#36293: Extend `django.utils.text.compress_sequence()` to optionally flush data
written to compressed file
-------------------------------+--------------------------------------
Reporter: huoyinghui | Owner: (none)
Type: New feature | Status: closed
Component: HTTP handling | Version: dev
Severity: Normal | Resolution: needsinfo
Keywords: gzip flush | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Comment (by Carlton Gibson):

I'd like to see more exploration of solutions in project-space before we
add API here in Django. In particular, from the PR, the magic
`"text/event-stream"` restriction here is... well... a little ad hoc.

Better would be to subclass the GZip middleware and just skip compression
for SSE responses (presuming events of less than 17KB). That would be a
small number of lines and quite a simple approach.
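
A minimal sketch of such a subclass (the class name and where you put it
are illustrative, not an existing Django API):

{{{#!python
# Sketch only: skip gzip for Server-Sent Events so chunks are not buffered.
from django.middleware.gzip import GZipMiddleware


class SSEAwareGZipMiddleware(GZipMiddleware):
    def process_response(self, request, response):
        # Leave SSE responses uncompressed; defer to the parent otherwise.
        if response.get("Content-Type", "").startswith("text/event-stream"):
            return response
        return super().process_response(request, response)
}}}

You would then point `MIDDLEWARE` at this class instead of
`django.middleware.gzip.GZipMiddleware`.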
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:5>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django
Apr 7, 2025, 6:08:35 AM
to django-...@googlegroups.com
Comment (by huoyinghui):

I’ve already resolved this in my own project by subclassing GZipMiddleware
to skip compression for responses with Content-Type: text/event-stream.
However, I believe this is a common enough use case that it deserves
better support at the framework level.

SSE relies on real-time delivery, and buffering caused by gzip (especially
under the 17KB threshold) can introduce noticeable latency on the client
side. Developers may not immediately realize gzip is the cause, leading to
unnecessary debugging time.

It would be helpful if Django could:
1. Document this behavior and its impact on SSE;
2. Automatically skip compression for text/event-stream responses;
3. Or offer a more explicit way to opt out of compression per response.

Supporting this natively would improve the developer experience and better
accommodate streaming use cases.

Here’s my test case:

def test_sse_middleware_skipped(self):
    import time

    from django.http import StreamingHttpResponse
    from django.middleware.gzip import GZipMiddleware

    class DummyRequest:
        META = {"HTTP_ACCEPT_ENCODING": "gzip"}

    def event_stream():
        for i in range(3):
            yield f"data: {i}\n\n".encode("utf-8")
            time.sleep(1)

    response = StreamingHttpResponse(
        event_stream(), content_type="text/event-stream"
    )
    middleware = GZipMiddleware(lambda req: response)
    result = middleware(DummyRequest())

    first_chunk = next(result.streaming_content)
    self.assertFalse(
        first_chunk.startswith(b"\x1f\x8b"),
        "SSE response should not be gzipped",
    )

Test results:
• Case 1: flush_each=True
  ✅ The client receives each event promptly — no visible delay.
• Case 2: flush_each=False (default gzip behavior)
  ⚠️ All events are buffered and may be delivered only after ~17KB, which
  defeats SSE’s purpose. The client appears stuck until the buffer
  threshold is reached.
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:6>

Django
Apr 7, 2025, 6:19:01 AM
to django-...@googlegroups.com
Changes (by huoyinghui):

* Attachment "image-20250407-181853.png" added.

--
Ticket URL: <https://code.djangoproject.com/ticket/36293>

Django
Apr 7, 2025, 6:19:32 AM
to django-...@googlegroups.com
Changes (by huoyinghui):

* Attachment "image-20250407-181923.png" added.

Django
Apr 7, 2025, 6:35:32 AM
to django-...@googlegroups.com
Comment (by Carlton Gibson):

As per the
[https://docs.djangoproject.com/en/5.1/ref/middleware/#django.middleware.gzip.GZipMiddleware GZip middleware documentation],
response content will not be compressed if (among other options):

> The response has already set the Content-Encoding header.

You should then be able to set the
[https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding `identity` content encoding]
before sending the response to bypass the middleware here.
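
A minimal sketch of what that looks like (the view and generator names are
illustrative):

{{{#!python
# Sketch only: an SSE view that opts out of GZipMiddleware by pre-setting
# an explicit Content-Encoding before returning the response.
from django.http import StreamingHttpResponse


def sse_view(request):
    def events():
        yield b"data: hello\n\n"

    response = StreamingHttpResponse(
        events(), content_type="text/event-stream"
    )
    # Any already-set Content-Encoding makes GZipMiddleware skip the response.
    response["Content-Encoding"] = "identity"
    return response
}}}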
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:7>

Django
Apr 8, 2025, 1:42:50 AM
to django-...@googlegroups.com
Comment (by huoyinghui):

Replying to [comment:5 Carlton Gibson]:

This is a good suggestion—it’s simple to implement and avoids the
real-time delivery of the SSE response being blocked by
`compress_sequence`. I accept it.

I think the significance of this issue is that when users build SSE
endpoints with Django, they may hit sudden blocking that is not directly
caused by their own code, which makes it hard to understand and debug.

Perhaps documentation can be added to explain how Django handles SSE
responses: to avoid blocking, users need to set
`response["Content-Encoding"] = "identity"`.
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:8>

Django
Oct 10, 2025, 4:24:58 PM
to django-...@googlegroups.com
Comment (by Natalia Bidart):

#36656 seems to be a dupe.
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:9>

Django
Oct 10, 2025, 7:30:15 PM
to django-...@googlegroups.com
Comment (by Adam Johnson):

I think Natalia got mixed up and meant to report #36655 as a dupe.

In my report there, I demonstrated the issue using HTML, rather than SSE.
It could be similarly troublesome to buffer content when streaming HTML,
especially if there's time between chunks, such as from database queries.

In fact, I think if you're using `StreamingHttpResponse`, any buffering
from outer layers is unacceptable. It can always be done inside your
iterator, if necessary, but once a chunk is ready, Django should pass it
out as fast as possible.

#24242 removed the `flush()` call to reduce the total data transfer. I
don't think this was the right call—latency was not discussed at all. The
OP also seemed to be using a very naive approach, hooking up a streaming
JSON library that yields individual tokens, which sounds like an uncommon
way of using `StreamingHttpResponse`.

So I would say reopen this ticket and revert #24242—no option to flush.
That's the approach I effectively implemented in django-http-compression
([https://github.com/adamchainz/django-http-compression/pull/8 PR]), back
when I wasn't aware that `flush()` had ever been removed.
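
For reference, per-chunk flushing can be sketched with the stdlib alone;
this is only an illustration of the behaviour under discussion, not
Django's actual `compress_sequence()`:

{{{#!python
# Sketch only: gzip an iterator of byte chunks, flushing after each chunk
# so the client can decode every chunk as soon as it arrives, at the cost
# of a somewhat worse compression ratio.
import zlib


def gzip_with_flush(chunks):
    # wbits=31 selects the gzip container (header + trailer).
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 31)
    for chunk in chunks:
        data = compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield compressor.flush(zlib.Z_FINISH)
}}}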
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:10>

Django
Oct 17, 2025, 1:14:45 PM
to django-...@googlegroups.com
Comment (by Jacob Walls):

Thanks for the script in #36655, Adam.

> I don't think this was the right call—latency was not discussed at all. The OP also seemed to be using a very naive approach, hooking up a streaming JSON library that yields individual tokens, which sounds like an uncommon way of using StreamingHttpResponse.

It seems reasonable to revert #24242 if that is indeed uncommon and
doesn't reintroduce a performance problem -- so long as we don't introduce
additional API, see comment:5 -- but I checked our example of
[https://docs.djangoproject.com/en/stable/howto/outputting-csv/#streaming-large-csv-files streaming a large CSV],
and it yields every row, producing a similar problem.

Adjusting your script like this...

{{{#!py
#!/usr/bin/env uv run --script
# /// script
# requires-python = ">=3.14"
# dependencies = [
#     "django",
# ]
# ///
import csv
import os
import sys

from django.conf import settings
from django.core.wsgi import get_wsgi_application
from django.http import StreamingHttpResponse
from django.urls import path

settings.configure(
    # Dangerous: disable host header validation
    ALLOWED_HOSTS=["*"],
    # Use DEBUG=1 to enable debug mode
    DEBUG=(os.environ.get("DEBUG", "") == "1"),
    # Make this module the urlconf
    ROOT_URLCONF=__name__,
    # Only gzip middleware
    MIDDLEWARE=[
        "django.middleware.gzip.GZipMiddleware",
    ],
)


class Echo:
    """An object that implements just the write method of the file-like
    interface.
    """

    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    return StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        # Show the response in the browser to more easily measure response.
        content_type="text/plain; charset=utf-8",
        # content_type="text/csv",
        # headers={"Content-Disposition": 'attachment; filename="somefilename.csv"'},
    )


urlpatterns = [
    path("", some_streaming_csv_view),
]

app = get_wsgi_application()

if __name__ == "__main__":
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)
}}}

...gives:
{{{
gzipped, no flushing: 315KB, 194ms
gzipped, with flushing: 859KB, 1.5sec
no gzipping: 1.09MB, 450ms
}}}
In the case of adding flushing, the gzipping triples the response time to
produce a 10% smaller response.

What do you suggest we do here?
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:11>

Django
Oct 18, 2025, 3:08:29 AM
to django-...@googlegroups.com
Comment (by Carlton Gibson):

> …so long as we don't introduce additional API…

Just to clarify, I'm not anti more API here ''per se''. Rather, I'd like
to see us be sure we've thought it through before we do that. (I have a
pretty strong suspicion that an ounce of docs would go a long way here,
especially given the identity encoding suggestion above.)

> In the case of adding flushing, the gzipping triples the response time to produce a 10% smaller response.

I think there's a typo there somewhere. 🤔 (The numbers as written have
triple response time for ≈double response size, unless I'm failing to read
it right.)
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:12>

Django
Oct 18, 2025, 8:35:58 AM
to django-...@googlegroups.com
Comment (by Jacob Walls):

> (The numbers as written have triple response time for ≈double response size, unless I'm failing to read it right.)

Ah, I was comparing GZipMiddleware + Adam's proposal (859KB) to *no
GZipMiddleware* (1.09MB), so it's about ~~10%~~ 22% smaller. (This was
because in #24242 the ask was "flushing makes the middleware useless in my
case", and I wanted to evaluate Adam's suggestion that that was a deviant
case.)
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:13>

Django
Oct 24, 2025, 7:10:41 PM
to django-...@googlegroups.com
Comment (by Adam Johnson):

Thanks for using that example, Jacob.

However, I don’t think it’s a great template for streaming responses,
pushing one line at a time. The approach is quite wasteful for HTTP, as
each line is pushed in its own packet. Moreover, it would be wasteful to
load one record at a time from the database.

A better example would paginate a queryset and yield one page of rows at a
time. This would be more optimal in all layers, compression included. And
this should stand for streaming all kinds of data: HTML, JSON, CSV, ...

I modified the example to send batches of 100 rows at a time using
`itertools.batched`:

{{{#!python
#!/usr/bin/env uv run --script
# /// script
# requires-python = ">=3.14"
# dependencies = [
#     "django",
# ]
#
# [tool.uv.sources]
# django = { path = "../../../django", editable = true }
# ///
from __future__ import annotations

import csv
import os
import sys
from io import StringIO
from itertools import batched

from django.conf import settings
from django.core.wsgi import get_wsgi_application
from django.http import StreamingHttpResponse
from django.urls import path

settings.configure(
    # Dangerous: disable host header validation
    ALLOWED_HOSTS=["*"],
    # Use DEBUG=1 to enable debug mode
    DEBUG=(os.environ.get("DEBUG", "") == "1"),
    # Make this module the urlconf
    ROOT_URLCONF=__name__,
    # Only gzip middleware
    MIDDLEWARE=[
        "django.middleware.gzip.GZipMiddleware",
    ],
)


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = ([f"Row {idx}", str(idx)] for idx in range(65536))
    buffer = StringIO()
    writer = csv.writer(buffer)

    def stream_rows():
        for batch in batched(rows, 100):
            buffer.seek(0)
            buffer.truncate()
            writer.writerows(batch)
            yield buffer.getvalue()

    return StreamingHttpResponse(
        stream_rows(),
        # Show the response in the browser to more easily measure response.
        content_type="text/plain; charset=utf-8",
        # content_type="text/csv",
        # headers={"Content-Disposition": 'attachment; filename="somefilename.csv"'},
    )


urlpatterns = [
    path("", some_streaming_csv_view),
]

app = get_wsgi_application()

if __name__ == "__main__":
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)
}}}

(Note the editable Django install in the script metadata; it needs the
correct local path.)

I used this cURL command to measure the same stats you were checking:

{{{
curl -w "%{size_download} %{time_total}" -o /dev/null -s \
  -H "Accept-Encoding: gzip" --raw "http://127.0.0.1:8000/" \
  | awk '{printf "%.0fKB, %.0fms\n", $1/1024, $2*1000}'
}}}

It gave me these results:

gzipped, no flushing: 307KB, 44ms
gzipped, flushing: 287KB, 52ms
no gzipping: 1066KB, 30ms

This change makes the flushing version *more* optimal than the flushless
one (73% savings versus 71%), while a bit (~15%) slower. I think we can
always expect some slowdown because we’ll send more packets, but the per-
chunk latency saving is worth it.

So maybe we should rewrite that example, as well as restore the flushing?
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:14>