HttpResponseStreaming, ticket #7581, and [soc2009/http-wsgi-improvements] Final(?) status update

ccahoon

unread,

Aug 14, 2009, 4:13:30 AM8/14/09

to Django developers

My last task before the GSoC hard pencils down date on Monday is an
attempt at a solution to ticket #7581. The essential issue is that a
lot of middleware requires access to the whole of
HttpResponse.content, but this prevents streaming responses (to
prevent timeouts).

To allow for streaming responses, I have created a special
HttpResponse class, HttpResponseStreaming, which does not consume
generators/iterators until it is specifically instructed to, and which
only uses certain middleware. See changeset 11449 for my attempt at it
on my branch. It has pretty extensive documentation on which
middleware do work with it and brief instructions for adapting third-
party middleware to that end.

I imagine that some or much of this changeset will be controversial,
which is why I am posting it here. I think it provides a good balance
of not overcomplicating HttpResponse to handle both streaming and
normal responses, as well as only making minor changes to
core.handlers.base and some response middleware to make the whole
thing groovy. It works very well in my tests. However, I agree with
Malcolm's assessment that it is not quite a "slam-dunk" solution.

One bit of advice I am looking for is what people's opinions are on
the use of isinstance versus hasattr to determine functionality. I am
personally partial to hasattr, because it covers subclasses, but I do
not know what the community tends to go towards. Right now, in
core.handlers.base on my branch, I am using both to deal with
selective application of response middleware, but I would like to
choose one or the other.

Tai (Mr Machine) suggests that perhaps we should have HttpResponse
ALWAYS consume data passed to it on init (which it already does on my
branch), and have HttpStreamingResponse throw an exception when it is
not passed an iterator. I think this is reasonable, because there is
no point to using the semi-crippled (with respect to middleware)
HttpStreamingResponse if you don't take advantage of its main goal.

From his email to me: "If we can be sure HttpResponseStreaming always
has an iterator for content, middleware can just check if the response
is an instance of HttpResponseStreaming rather than checking for the
`content_generator` attribute, and if so, it can access/replace the
public `content_generator` attribute directly."

This also has some relevance to my hasattr/isinstance question from
earlier. I think I agree with trying to keep the _container/content as
an iterator, which would make replacing the generator a bit less
confusing than it is now (for an example, see the regression test in
the changeset).

I look forward to your input.

James Bennett

unread,

Aug 14, 2009, 4:57:32 AM8/14/09

to django-d...@googlegroups.com

On Fri, Aug 14, 2009 at 3:13 AM, ccahoon<chris....@gmail.com> wrote:
> One bit of advice I am looking for is what people's opinions are on
> the use of isinstance versus hasattr to determine functionality. I am
> personally partial to hasattr, because it covers subclasses, but I do
> not know what the community tends to go towards. Right now, in
> core.handlers.base on my branch, I am using both to deal with
> selective application of response middleware, but I would like to
> choose one or the other.

One thing I'm curious about here is the HTTP Content-Length header.
It's late and I'm tired and cranky, so I may have missed it, but
outside of the infrastructure for runserver I can't find any place
where Django reliably sets it on outgoing responses. And the WSGI
spec's stance is that if the client supports it, a missing
Content-Length can be taken as a cue to chunk the response
(unfortunately WSGI forbids the one HTTP header that truly serves as a
reliable source of information for this).

If we could get some sort of mechanism for having HttpResponse set the
Content-Length header (or not) as needed, we'd have something which
would be a lot more consistent, and not just at the Django level.
Middlewares which want to look at the response body could check for
the header: if it's there the response is safe to
inspect/consume/fiddle with, and if it's missing the response is not
safe to inspect/consume/fiddle with. Servers would still be free to do
what they like, and presumably people who want to stream out a
response piece by piece will use servers which implement the behavior
permitted by WSGI. This sidesteps your question about type checking,
and uses a mechanism which is at least semi-blessed by the gateway
protocol, so it has interoperability benefits beyond Django middleware
classes.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Tai Lee

unread,

Aug 14, 2009, 7:07:20 AM8/14/09

to Django developers

I think I prefer keeping HttpResponse and HttpResponseStreaming
separate. Using the latter indicates explicitly that the user wants to
send a streaming response and disable any middleware that doesn't
explicitly support it. Using the former should always consume any
content passed to it so that middleware doesn't have to know or care
what type of content was passed to the HttpResponse.

This way only specific middleware are required to opt-in to support
streaming responses, rather than requiring all middleware to
repetitively examine the content type and opt-out to avoid breaking
streaming responses.

It also keeps the interface simple. HttpResponse.content will always
be a string, HttpResponseStreaming.content will always be an iterator,
and HttpResponse classes will only need to expose one attribute for
content either way.

And it's more explicit. Django will only disable middleware that
doesn't explicitly assert that it works with HttpResponseStreaming,
and only when the user explicitly creates an HttpResponseStreaming
object.

Regarding `hasattr` vs `isinstance`, the latter should also work with
subclasses and has the benefit of detecting only HttpResponseStreaming
(and derivative) objects, not just any object with a
`content_generator` attribute. If we use `isinstance`, the
HttpResponseStreaming class won't even need a `content_generator`
attribute, nor will middleware need to poke around in it's internals
to replace the content with a new generator.

Regarding the Content-Length header, HttpResponse should always set it
because the content will always be a string, but HttpResponseStreaming
should never set it because we can never know the length without
consuming the content.

ccahoon

unread,

Aug 14, 2009, 1:19:22 PM8/14/09

to Django developers

I'm in agreement with Tai. Keeping them separate gives them consistent
behavior for view authors to expect.

"One thing I'm curious about here is the HTTP Content-Length header.
It's late and I'm tired and cranky, so I may have missed it, but
outside of the infrastructure for runserver I can't find any place
where Django reliably sets it on outgoing responses."

This is the case, yes, if the HTTP and GZip middleware are not used.
We had to modify both of those to make sure they didn't consume the
content generator to find the Content-Length. I imagine, IF we wanted
to, setting the Content-Length header could be done in the base
handler. One thing I am not certain about (I am about to head out the
door) is if it isn't set BECAUSE the middleware can modify it. If so,
then it would certainly have to be set after the response middleware
were applied.

Tai Lee

unread,

Aug 14, 2009, 8:25:22 PM8/14/09

to Django developers

On Aug 15, 3:19 am, ccahoon <chris.cah...@gmail.com> wrote:
> This is the case, yes, if the HTTP and GZip middleware are not used.
> We had to modify both of those to make sure they didn't consume the
> content generator to find the Content-Length. I imagine, IF we wanted
> to, setting the Content-Length header could be done in the base
> handler. One thing I am not certain about (I am about to head out the
> door) is if it isn't set BECAUSE the middleware can modify it. If so,
> then it would certainly have to be set after the response middleware
> were applied.

We could add a method to HttoResponse.response_fixes that sets the
Content-Length header if not already set, but remove that fix from
HttpResponseStreaming. The response fixes are applied after response
middleware, so this should be a good place to ensure that all
HttpResponse objects have a Content-Length applied while respecting
any existing headers set by middleware and any changes to the response
content made by middleware.

We could also then simplify ConditionalGetMiddleware and
GZipMiddleware, neither of which would need to set the Content-Length
anymore. I think that right now depending on the order of these two
middleware, the Content-Length might be calculated twice. It makes
sense to do this only once after all middleware that might alter the
response have been applied.

Reply all

Reply to author

Forward