#36293: Extend `django.utils.text.compress_sequence()` to optionally flush data written to compressed file
-------------------------------+--------------------------------------
     Reporter:  huoyinghui     |                    Owner:  (none)
         Type:  New feature    |                   Status:  closed
    Component:  HTTP handling  |                  Version:  dev
     Severity:  Normal         |               Resolution:  needsinfo
     Keywords:  gzip flush     |             Triage Stage:  Unreviewed
    Has patch:  1              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------
Comment (by Jacob Walls):
Thanks for the script in #36655, Adam.
> I don't think this was the right call—latency was not discussed at all.
The OP also seemed to be using a very naive approach, hooking up a
streaming JSON library that yields individual tokens, which sounds like an
uncommon way of using StreamingHttpResponse.
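For reference, a minimal sketch of that pattern (my guess at the shape of
it, not the OP's actual code): a generator that yields one JSON token at a
time, so every token reaches GZipMiddleware as its own chunk:
{{{#!py
import json

from django.http import StreamingHttpResponse


def token_stream_view(request):
    """Hypothetical view yielding individual JSON tokens."""

    def tokens():
        yield "["
        for idx in range(10000):
            if idx:
                yield ","
            # Each serialized object arrives as a separate streamed chunk.
            yield json.dumps({"id": idx})
        yield "]"

    return StreamingHttpResponse(tokens(), content_type="application/json")
}}}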
It seems reasonable to revert #24242 if that is indeed uncommon and
doesn't reintroduce a performance problem -- so long as we don't introduce
additional API, see comment:5 -- but I checked our example of
[https://docs.djangoproject.com/en/stable/howto/outputting-csv/#streaming-large-csv-files streaming a large CSV],
and it yields every row, producing a similar problem.
Adjusting your script like this...
{{{#!py
#!/usr/bin/env uv run --script
# /// script
# requires-python = ">=3.14"
# dependencies = [
#     "django",
# ]
# ///
import csv
import os
import sys

from django.conf import settings
from django.core.wsgi import get_wsgi_application
from django.http import StreamingHttpResponse
from django.urls import path

settings.configure(
    # Dangerous: disable host header validation
    ALLOWED_HOSTS=["*"],
    # Use DEBUG=1 to enable debug mode
    DEBUG=(os.environ.get("DEBUG", "") == "1"),
    # Make this module the urlconf
    ROOT_URLCONF=__name__,
    # Only gzip middleware
    MIDDLEWARE=[
        "django.middleware.gzip.GZipMiddleware",
    ],
)


class Echo:
    """An object that implements just the write method of the file-like
    interface.
    """

    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number
    # of rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    return StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        # Show the response in the browser to more easily measure response.
        content_type="text/plain; charset=utf-8",
        # content_type="text/csv",
        # headers={"Content-Disposition": 'attachment; filename="somefilename.csv"'},
    )


urlpatterns = [
    path("", some_streaming_csv_view),
]

app = get_wsgi_application()

if __name__ == "__main__":
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)
}}}
...gives:
{{{
gzipped, no flushing: 315KB, 194ms
gzipped, with flushing: 859KB, 1.5sec
no gzipping: 1.09MB, 450ms
}}}
With flushing added, gzipping triples the response time (1.5s vs. 450ms
with no gzipping) while producing a response only about 20% smaller (859KB
vs. 1.09MB).
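The size blow-up is easy to reproduce with `zlib` directly; the sketch
below approximates the effect a per-chunk flush in `compress_sequence()`
would have (it is not Django's actual code path). Each sync flush
terminates the current deflate block and byte-aligns the output, so very
small chunks compress poorly:
{{{#!py
import zlib

chunks = [f"Row {i},{i}\r\n".encode() for i in range(65536)]

# Compress the whole stream, flushing only at the end.
c = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
unflushed = b"".join(map(c.compress, chunks)) + c.flush()

# Sync-flush after every chunk, roughly what per-chunk flushing would do.
c = zlib.compressobj(wbits=31)
flushed = (
    b"".join(c.compress(ch) + c.flush(zlib.Z_SYNC_FLUSH) for ch in chunks)
    + c.flush()
)

print(len(unflushed), len(flushed))  # the flushed stream is much larger
}}}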
What do you suggest we do here?
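For what it's worth, one application-level mitigation (just a sketch,
assuming per-batch latency is acceptable; `join_batches` is a hypothetical
helper, not Django API) is to coalesce rows before they reach the
middleware, so gzip sees fewer, larger chunks:
{{{#!py
from itertools import batched  # Python 3.12+


def join_batches(chunks, size=1000):
    """Coalesce many small chunks into fewer large ones."""
    for batch in batched(chunks, size):
        yield "".join(batch)


# In the view above, wrap the row generator:
#   StreamingHttpResponse(
#       join_batches(writer.writerow(row) for row in rows),
#       content_type="text/plain; charset=utf-8",
#   )
}}}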
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:11>