I'd like to revisit the discussion surrounding #7581 [1], a ticket about streaming responses that is getting quite long in the tooth now. Jacob finally "accepted" it 11 months ago (after a long time as DDN) and said that it is clear we have to do *something*, but *what* remains to be seen.
I'd like to try to provide a little refresher, summarise the options that have been suggested, and ask any core devs to please weigh in with their preference so that I can work up a new patch that is more likely to gain acceptance.
THE PROBLEM:
1. There are bugs and surprising behaviour that arise when creating an HttpResponse with a generator as its content, as a result of the "quantum state" of `HttpResponse.content` (measuring it changes it).
>>> from django.http import HttpResponse
>>> r = HttpResponse(iter(['a', 'b']))
>>> r.content
'ab'
>>> r.content
''
>>> r2 = HttpResponse(iter(['a', 'b']))
>>> print r2.content
ab
>>> print r2.content
>>> r3 = HttpResponse(iter(['a', 'b']))
>>> r3.content == r3.content
False
2. Some middleware prematurely consume generator content by accessing `HttpResponse.content`, which can use a lot of memory and cause browser timeouts when attempting to stream large amounts of data or slow-to-generate data.
There have been several tickets [2] [3] [4] and django-developers discussions [5] [6] [7] [8] about these issues.
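To make problem 2 concrete without needing Django installed, here is a minimal sketch. `FakeResponse` is a hypothetical stand-in that only mimics the one-shot `content` behaviour shown above (it is not Django's implementation), and `content_length_middleware` is an invented example of middleware that reads `content`.

```python
class FakeResponse(object):
    """Hypothetical stand-in for HttpResponse; not Django's actual code."""

    def __init__(self, content):
        self._container = iter(content)

    @property
    def content(self):
        # Joining exhausts the underlying iterator, so a second read is empty.
        return ''.join(self._container)

    def __iter__(self):
        return self._container


produced = []  # records which chunks have actually been generated


def slow_chunks():
    # Stands in for rows pulled lazily from a database or a remote file.
    for i in range(3):
        produced.append(i)
        yield 'chunk%d,' % i


def content_length_middleware(response):
    # Hypothetical middleware: measuring the body forces full generation.
    return len(response.content)


response = FakeResponse(slow_chunks())
length = content_length_middleware(response)  # whole body is built in memory
remaining = list(response)                    # nothing is left to stream
```

By the time the handler would start iterating the response, every chunk has already been generated and held in memory, and the iterator has nothing left to send.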
SOME USE CASES FOR STREAMING RESPONSES:
A. Generating and exporting CSV data directly from the database.
B. Restricting file access to authenticated users for files that may be hosted on external servers.
C. Drip-feeding chunks of content to prevent timeout when requesting a page that takes a long time to generate.
OPTION 1:
Remove support for "streaming" responses. If an iterator is passed in as content to `HttpResponse`, consume it in `HttpResponse.__init__()` to eliminate buggy behaviour. Middleware won't have to worry about what type of content a response has.
Now that Jacob has accepted #7581 and said that it is clear we need to do *something*, I hope we can rule out this option.
OPTION 2:
Make `HttpResponse.__init__()` consume any iterator content, and add an `HttpResponseStreaming` class or an `HttpResponse(streaming=False)` argument. Allow middleware to check via `hasattr()` or `isinstance()` whether or not the response has generator content, and conditionally skip code that is incompatible with streaming responses.
Some middleware will have to be updated for compatibility with streaming responses, and any 3rd party middleware that prematurely consumes generator content will continue to work, only without the bugs (and potentially with increased memory usage and browser timeouts).
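A rough sketch of how option 2 could look, written as standalone Python rather than against Django. The class names `BufferedResponse` and `StreamingResponse`, the `streaming` and `streaming_content` attributes, and `content_length_middleware` are all assumptions made for illustration; they are not taken from any patch on the ticket.

```python
class BufferedResponse(object):
    """Hypothetical regular response: iterator content is consumed up front."""
    streaming = False

    def __init__(self, content=''):
        if not isinstance(content, str):
            # Consume the iterator once, so `content` is stable afterwards.
            content = ''.join(content)
        self.content = content
        self.headers = {}


class StreamingResponse(object):
    """Hypothetical streaming response: deliberately has no `content`."""
    streaming = True

    def __init__(self, iterator):
        self.streaming_content = iter(iterator)
        self.headers = {}

    def __iter__(self):
        return self.streaming_content


def content_length_middleware(response):
    # Skip work that needs the whole body when the response is streaming.
    if getattr(response, 'streaming', False):
        return response
    response.headers['Content-Length'] = str(len(response.content))
    return response


buffered = content_length_middleware(BufferedResponse(iter(['a', 'b'])))
streamed = content_length_middleware(StreamingResponse(iter(['a', 'b'])))
```

Middleware that has not been updated would still see a stable `content` on the regular response, while updated middleware can leave the streaming response untouched.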
OPTION 3:
Build a capabilities API for `HttpResponse` objects, and have middleware inspect responses to determine "can I read content?", "can I replace content?", "can I change etag?", etc. This will likely become a bigger and more complicated design decision as we work out what capabilities we want to support. Some have argued that it should be sufficient to know if we have generator content or not, for all the cases that people have reported so far.
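For comparison, a capabilities API might be sketched like this. The capability names, the `can()` method and both classes are hypothetical; which capabilities to support is exactly the open design question.

```python
class CapableResponse(object):
    # Hypothetical capability names, mirroring the questions listed above.
    capabilities = frozenset(['read_content', 'replace_content', 'change_etag'])

    def __init__(self, content=''):
        self.content = content

    def can(self, capability):
        return capability in self.capabilities


class StreamingCapableResponse(CapableResponse):
    # Generator-backed: nothing that needs the whole body is permitted.
    capabilities = frozenset()

    def __init__(self, iterator):
        self.streaming_content = iter(iterator)


def upper_case_middleware(response):
    # Toy middleware that rewrites the body only if the response allows it.
    if response.can('read_content') and response.can('replace_content'):
        response.content = response.content.upper()
    return response


plain = upper_case_middleware(CapableResponse('ab'))
streamed = upper_case_middleware(StreamingCapableResponse(iter(['a', 'b'])))
```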
OPTION 4:
Provide a way for developers to specify on an `HttpResponse` object or subclass that specific middleware should be skipped for that response. This would be problematic because 3rd party views won't know what other middleware is installed in a project in order to name them for exclusion.
OPTION 5:
Add Yet Another Setting that would allow developers to define `CONDITIONAL_MIDDLEWARE_CLASSES` at a project level. At the project level, developers would know which middleware classes they are using, and when they should be executed or skipped. This would give very fine grained control at a project level to match middleware conditionally with `HttpResponse` subclasses, without requiring any changes to existing or 3rd party middleware. This could look something like this:
    MIDDLEWARE_CLASSES = (
        'django.middleware.common',
    )
    CONDITIONAL_MIDDLEWARE_CLASSES = {
        'exclude': {
            'django.http.HttpResponseStreaming': ['django.middleware.common', 'otherapp.othermiddleware', ...],
        },
        'include': {
            'myapp.MyHttpResponse': ['myapp.mymiddleware', ...],
        },
    }
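As a sketch of how such a setting might be resolved for a given response, assuming 'exclude' removes entries from `MIDDLEWARE_CLASSES` and 'include' appends extra middleware for that response class only. The `middleware_for` helper is hypothetical, the semantics are my assumption, and it matches on the exact dotted path only (no subclass handling).

```python
def middleware_for(response_path, middleware_classes, conditional):
    """Return the middleware paths to run for one response class path."""
    excluded = conditional.get('exclude', {}).get(response_path, [])
    included = conditional.get('include', {}).get(response_path, [])
    # Start from the project-wide list, minus anything excluded.
    selected = [m for m in middleware_classes if m not in excluded]
    # Then add middleware that should run only for this response class.
    selected.extend(m for m in included if m not in selected)
    return selected


# Example values only, loosely based on the setting above.
MIDDLEWARE_CLASSES = (
    'django.middleware.common',
    'otherapp.othermiddleware',
)
CONDITIONAL_MIDDLEWARE_CLASSES = {
    'exclude': {
        'django.http.HttpResponseStreaming': ['django.middleware.common'],
    },
    'include': {
        'myapp.MyHttpResponse': ['myapp.mymiddleware'],
    },
}

for_streaming = middleware_for(
    'django.http.HttpResponseStreaming',
    MIDDLEWARE_CLASSES, CONDITIONAL_MIDDLEWARE_CLASSES)
for_custom = middleware_for(
    'myapp.MyHttpResponse',
    MIDDLEWARE_CLASSES, CONDITIONAL_MIDDLEWARE_CLASSES)
```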
MY TAKE:
I think that option 1 and option 4 are non-starters.
I think option 3 is perhaps a little overkill and will be more difficult to get committed once we start thinking about what capabilities we want to support.
I think option 2 is probably going to be the easiest solution. It's practically implemented and up-to-date already (it is only missing docs and tests).
Although it does involve Yet Another Setting, I think option 5 provides the most flexibility, where it is most needed. It gives developers working at the project level a way to override and conditionally skip or execute 3rd party middleware without having to make any changes to 3rd party middleware.
I would be happy with either option 2 or 5, or a variation.
NEXT STEPS:
I'd really like to see this and the related tickets closed (preferably marked "fixed"!) :)
I'm specifically looking for opinions and direction from any of the core devs, especially those who have previously commented on the ticket or in the discussions, even if it is just to permanently reject some of the options.
I'd also like to hear from anyone who has a new use case or a new solution to suggest.
I will happily work up a new patch if required (including docs and tests), even if just a proof of concept. I just need to know which way the core devs would like me to go.
Thanks.
Tai.