In GzipMiddleware you don't touch Content-Length if the response is not
string. This means that if a response has Content-Length already set it
will become incorrect upon gzipping the content. In this case
Content-Length can only be deleted from headers, though sending
responses with no determined length is a bit ugly...
I don't think we should be supporting on-demand generation like this. Go
back a year or 18 months in the archives and look at where Brian Harring
put up some patches to do that sort of thing. He and I and others spent
ages chasing down unintended side-effects and in the end it was
considered not worth it. That was tip-of-the-iceberg stuff. Yes, it's
probably possible to get it right, but I'm not convinced it's worth it.
It's quite a large restriction to say that no middleware should ever try
to examine the contents of the HttpResponse since it might be an
iterator that shouldn't be consumed. You're proposing a bunch of
specific changes for the middleware we have now to work around that, but
you're also imposing a huge functionality constraint on any future
middleware (all middleware will have to work with the possibility that a
non-consumable generator is there). You're also effectively said that
only certain content types can be streamed (because a whole bunch of
them *will* be affected by consuming middleware). "Just say no" is a
cleaner solution here.
I thought we'd fixed it, but apparently we haven't: if any iterator is
passed into an HttpResponse, it should be converted to a string
immediately so that things can index into it without doing
non-repeatable consumption.
The only thing that might be worth doing in this are is adding a way to
say "middleware should never be called on this response" and then
somebody can write their own HttpResponse subclass and be in complete
control of their destiny.
Adding all the complexity to the existing HttpResponse gets a -1 from
me. It's too easy to have unintended side-effects and not really in the
sweet spot of Django. It also doesn't seem to be a 1.0 requirement.
Malcolm
Malcolm, sorry, that won't work. I've implemented iterable HttpResponse
in the first place out of purely practical reason: web server times out
if a particularly long file was passed into an HttpResponse. And also it
was really slow and was consuming all the memory.
In fact this ticket appeared out of our conversation with Tai Lee. I
don't think you argument on adding complexity makes sense because not
all middleware should care about this. Why, say, TransactionMiddleware
should be affected? Or some 3rd-party HTMLValidationMiddleware which
won't touch huge iterators because it checks only HTML? This patch just
establishes a good practice to follow: if you expect a "reasonably
short" response in your middleware -- check for it.
Any middleware that examines the content has to pull the content into
memory in case it's an iterator. If they don't they're buggy because
they're consuming the content ahead of the web server. If somebody has
particular requirements like yours, then, absolutely, they should use a
subclass of HttpResponse that doesn't do that. And they know the
restrictions from that decision. But the default behaviour shouldn't
require repetitive practices from normal, everyday middleware.
> In fact this ticket appeared out of our conversation with Tai Lee. I
> don't think you argument on adding complexity makes sense because not
> all middleware should care about this. Why, say, TransactionMiddleware
> should be affected? Or some 3rd-party HTMLValidationMiddleware which
> won't touch huge iterators because it checks only HTML? This patch just
> establishes a good practice to follow: if you expect a "reasonably
> short" response in your middleware -- check for it.
If I write a third-party middleware that behaves exactly as the
content-length middleware does now, it won't work with something that
can only be used as a generator. Since I won't know ahead of time what
responses will be give to this middleware I am therefore forced to
assume such a generator will be passed in. Thus, a change like this is
saying that from now on, for all time, no middleware can look at the
content of the response unless it fits into the very narrow profile in
that ticket (matching one or other content types, etc).
Malcolm
Agreed.
> But the default behaviour shouldn't
> require repetitive practices from normal, everyday middleware.
What exactly you mean by "repetitive practices"? I'm afraid my English
fails me again, sorry!
> If I write a third-party middleware that behaves exactly as the
> content-length middleware does now, it won't work with something that
> can only be used as a generator. Since I won't know ahead of time what
> responses will be give to this middleware I am therefore forced to
> assume such a generator will be passed in. Thus, a change like this is
> saying that from now on, for all time, no middleware can look at the
> content of the response unless it fits into the very narrow profile in
> that ticket (matching one or other content types, etc).
If a middleware has to look at the content - it will. But why not make
certain standard middleware not required to look at the content? They
will be more useful then, and it doesn't imply, IMHO, that everyone
should do it. No?
This got me thinking that the real problem is, in fact, a bit different.
Right now middleware applies to all responses indifferently. But it
looks like there are at least one more or less common case -- returning
a big iterable -- for which most middleware don't make sense anyway and
it's better handled as an exception. So may be we should devise a
mechanism for such exceptions that won't require a middleware itself to
be aware of it.
At the moment, lots of pieces of middleware are buggy because they
consume the content from response.content and neither the content
property nor the middleware itself puts the content back into the
response. Those pieces of middleware should, with the current state of
the code, be checking if the response was created from an iterator and,
if so, be careful to replace the content after use by poking into the
internals of the response. Every single piece of middleware that
examines response.content needs to do this (this is the bit that
requires the repetition).
Of course, that's a silly requirement (and requires poking at non-public
pieces of the response). The response should just return a copy of the
content when response.content is accessed, which means turning any
iterator into a proper string.
> > If I write a third-party middleware that behaves exactly as the
> > content-length middleware does now, it won't work with something that
> > can only be used as a generator. Since I won't know ahead of time what
> > responses will be give to this middleware I am therefore forced to
> > assume such a generator will be passed in. Thus, a change like this is
> > saying that from now on, for all time, no middleware can look at the
> > content of the response unless it fits into the very narrow profile in
> > that ticket (matching one or other content types, etc).
>
> If a middleware has to look at the content - it will. But why not make
> certain standard middleware not required to look at the content? They
> will be more useful then, and it doesn't imply, IMHO, that everyone
> should do it. No?
But it *does* require that everybody will have to behave this way. If I
am writing a piece of middleware I have to assume the worst case, which
here means that the content is a non-repeatable generator. Since
generators would be permitted in that world, I have to treat them as
possible in my middleware code. So I cannot examine the content.
Otherwise my middleware is crippled as a reusable piece of code: it
would have to carry a big warning saying "cannot be used unless you know
your responses will never use generators", which means I have to
investigate the details of every third-party app I have installed before
I can safely use my middleware. That simply is not going to be good
practice.
I really don't see that it's worth the enormous complexity it brings at
the moment to introduce this. It's not something that cannot wait and be
trialled in various approaches after 1.0, but rushing to create some
public interface and functionality promises right now so that generators
become fully supported in HttpResponse is going to lead to bugs. The
ticket itself already proposes some things that will lead to bugs: for
example, content-length must *always* be set, since chunked
transfer-encoding isn't permitted with WSGI, so it's not a question of
checking _is_string or not -- which, btw, is an internal attribute that
middleware should never access -- but, rather, a question of whether the
header has already been set.
Malcolm
Ouch.. I thought it already does just that. Yeah, it's a bug then.
Though simple to fix.
> But it *does* require that everybody will have to behave this way. If I
> am writing a piece of middleware I have to assume the worst case, which
> here means that the content is a non-repeatable generator.
Now I understand what you don't like. If we simply make HttpResponse to
replace its source generator with a string upon exhaustion, the problem
will disappear.
But it's another bug. I'm talking here along the lines "why don't we
make GzipMiddleware not read response.content if it doesn't have to".
> It's not something that cannot wait and be
> trialled in various approaches after 1.0
Oh, yes, definitely post-1.0.
> The
> ticket itself already proposes some things that will lead to bugs: for
> example, content-length must *always* be set
I was thinking about it myself... But don't things like Comet work with
infinitely long responses that don't have content length?