On 7/21/08 12:35 AM, Dan Kubb (dkubb) wrote:
> > Recently I was developing an app using Merb + Thin + HAML, and I
> > noticed that the Content-Length header wasn't being set by Thin, while
> > it was when using Mongrel.  Initially this led me to contribute a
> > patch to Thin which would set the Content-Length when possible.
> >
> > (For more info see http://thin.lighthouseapp.com/projects/7212/tickets/74-content-length-should-be-added-when-possible)
> >
> > After macournoyer and I discussed this, we came to the conclusion that
> > the framework using Rack would be in a better position to calculate
> > the Content-Length and that my patch was more of a band-aid than a
> > true solution.
> >
> > My idea was to update Rack::Lint so that it ensured that the Content-
> > Length is always set unless:
> >
> >   1) the Status code is 1xx, 204 or 304
> >   2) the Transfer-Encoding header is set to chunked
> >
> > According to RFC 2616 this is correct behavior for servers -- *all
> > responses*, aside from the ones for the statuses mentioned above, must
> > either set the Content-Length or Transfer-Encoding: chunked and return
> > a properly chunked body.
> >
> > Would a patch be accepted to Rack::Lint to add this functionality?
> > This will at least help ensure all the frameworks are "doing the right
> > thing" with respect to RFC 2616, while allowing servers like Thin and
> > Ebb to not have to perform this type of "hack".
This led to additional checks in Rack::Lint and the
Rack::ContentLength middleware. I think we got it wrong, though.
Here's my current thinking as to how we should handle Content-Length
and variable-length bodies:
First: it is simply not possible to require the Content-Length
response header. Some bodies are impossible to know the length of
until you iterate over them. With large bodies, the headers must be
sent before iteration is complete (unless the entire body is slurped
into memory/temporary storage). I think there's general agreement on
this.
But, 'Transfer-Encoding: chunked' should probably not ever be applied
within application / middleware logic. It's not an application level
concern -- it's purely a technicality of transferring the body between
two ends of a network connection. It's of no use within the Rack
middleware chain and is actually intrusive. Consider what happens when
a Rack::Chunker middleware is run before Rack::Deflator (or any other
middleware that modifies the response body) in the middleware chain
for instance.
If a response without a Content-Length header makes it to the server
handler, it should be the handler's job to decide how to deal with it
based on something like the following rules:
1. The handler should first attempt to determine and set the
Content-Length header. The Content-Length can be set if -- and only if
-- the body's length can be established without iteration. i.e., The
body would have to be some fixed-length object: Array or String. This
is what the Rack::ContentLength middleware does today. It should be
safe to simply move this logic into shared handler code so that it's
always applied.
2. If the Content-Length can not be established without iteration, the
handler should deliver the response based on the version of HTTP being
used. When the request is from an HTTP/1.1 client, the handler should
set the 'Transfer-Encoding: chunked' response header and encode the
response appropriately while iterating over and sending the body on
the socket. When the request is from an HTTP/1.0 client, the handler
must not apply any Transfer-Encoding and instead close the socket
after writing the response body. That proper handling of this requires
certain behavior of the server handler is why I believe the whole
problem should be moved into the handler's problem domain.
I propose the following changes and am working on getting them onto a
branch as I write this:
* Include the above logic in some kind of shared handler code that all
handlers could use if needed (Rack::Handler or Rack::Utils?). Servers
like Thin that implement this correctly already would require no
changes. Servers like Mongrel (and maybe Webrick) that do not perform
chunked handling would need to have their handlers perform this logic.
* Remove the Content-Length header requirement from Rack::Lint and
replace it with a check that verifies the Content-Length, when
present, matches the length of the body if the body length can be
established. A missing Content-Length header should be allowed.
Thoughts?
Thanks,
Ryan
I agree. It should be handled be the server, or the handler if the
server didn't do it. Making it middleware implies that correct
implementation of HTTP is somehow optional, or an application concern.
There is the oddity of the trailing headers in the last chunk. I've
never seen this used (except for one company I worked at where our
server implemented and required it to be used, which later developers
had come to curse).
Cheers,
Sam
I think I based the implementation on RFC 2616 so I wouldn't be
surprised if things are slightly different and quirky with real world
client implementations. The Rack::Chunker example I threw out is
likely not an acceptable final implementation but I figured it would
serve as an example of the basic logic required and one way of
approaching it.
Thanks,
Ryan
Yep. There are definitely a variety of chunk sizing optimizations that
could happen here.
> Would you consider altering the spec to say explicitly that the
> response body is an object which may respond to #read(n) or #each?
> This would give the handler writer a licence to try calling read()
> first.
I'd be against it. Don't return IO objects directly as bodies if you
don't like the characteristics of IO#each; or, redefine the IO
instance's #each method to yield 4K/8K chunks before returning as a
body.
I wouldn't be against some kind of File wrapper in Rack::Utils. There
were a few 5-10 LOC example implementations throw out in a recent
thread.
>> If the Content-Length can not be established without iteration
>
> By which I presume you mean: if body.is_a?(Array) and all Array
> elements are Strings, sum their :bytesize or :size. So you *are*
> actually iterating, but only when it is cheap and safe.
Right. Sorry, I should have said, "Unless the Content-Length can be
established without destructive iteration". And that basically means
only objects that respond to #to_ary as of Rack 1.0.
This is actually something I've long felt is left out of the Rack
SPEC. How a body is to behave on successive calls to #each is not
specified. For example, IO objects have a destructive #each
implementation - once you've iterated over it, it's exhausted and will
not give the same result on successive calls. Arrays are
non-destructive in that #each yields the same results on multiple
successive calls. Since we don't know whether a body has a destructive
#each implementation, we basically have to assume it always does,
except for the Array special case. This same problem plagues
rack.input and gives rise to all of the sometimes-rewind idioms that I
don't think anyone is very happy with. I don't have a good solution,
though.
> It might occasionally be useful for an app to be able to force a
> chunked response for a large reply even if its size is known up-front.
> It can do this by wrapping the response in some arbitrary non-Array
> object which only responds to #each.
There's no upside to using "Transfer-Encoding: chunked" when you know
the Content-Length up front. Note that just because a Content-Length
header is present doesn't mean that responses are sent out in one
large hunk. It's possible to return a body that responds to #each and
generates the response iteratively while also including a
Content-Length header in the response. In other words, whether
responses are chunked encoded and whether they're "streamed" are
totally separate issues.
> Another aside: if Rack::ContentLength is to go, perhaps it could be
> replaced by Rack::ContentType which applies a default Content-Type?
> This is so I can be lazy and write
>
>    [200, {}, "The response"]
>
> But if we're going to be strict about RFC 2616 (7.2.1), it only says
> that responses SHOULD have a Content-Type. It is legal to omit it, and
> the recipient can either guess the type, or treat as application/octet-
> stream. So I'm not sure that the Rack spec should make a stronger
> requirement than the RFC.
A simple middleware that applies a default Content-Type when none is
present might be a worthwhile addition:
use Rack::ContentType, "text/plain"
Put a patch together and let's see what happens.
Thanks,
Ryan
> A simple middleware that applies a default Content-Type when none is
> present might be a worthwhile addition:
>
>     use Rack::ContentType, "text/plain"
+1
+100 for "chunk-when-no-content-length-is-given".
-- 
Christian Neukirchen  <chneuk...@gmail.com>  http://chneukirchen.org
Patches attached for review:
0001 Add Rack::Utils.bytesize function, use everywhere
0002 Add Rack::Chunked (Transfer-Encoding: chunked) middleware
0003 Rack::Lint no longer requires a Content-Length response header
0004 Handlers use ContentLength and Chunked middleware where needed
Generated from the "chunky" branch here:
http://github.com/rtomayko/rack/commits/chunky
I went a slightly different direction with this than I originally
planned. The "Content-Length" and "Transfer-Encoding: chunked" logic
are implemented as middleware instead of Utils and the server handlers
wrap their apps in either or both pieces of middleware as needed.
Handlers have not used middleware like this in the past and it might
be confusing that the ContentLength and Chunked middleware are really
for handler use only now. I'm willing to move this logic around if we
think that's a better option.
All handlers use the ContentLength middleware; some use the Chunked
middleware. See the commit message on the following for more details
on why some handlers require different behavior:
http://github.com/rtomayko/rack/commit/17feb4c52dd7807ee930645c2eb129b6886f66fa
Some other things worth mentioning:
 * Thin and Mongrel include their own chunked encoding
implementations. For those handlers, we don't actually have to apply
CE as they'll do it automatically when no Content-Length header is
present. Still, I've put our chunking in place for these handlers
because Thin wants to remove its chunking code, IIRC, and I figured it
would be good for consistency across servers.
 * Buffering chunks to a certain size would likely be a nice
performance enhancement. As of now, each string yielded from body.each
is encoded as a separate chunk.
Thanks,
Ryan
http://tomayko.com/about
> 0001 Add Rack::Utils.bytesize function, use everywhere
> 0002 Add Rack::Chunked (Transfer-Encoding: chunked) middleware
> 0003 Rack::Lint no longer requires a Content-Length response header
> 0004 Handlers use ContentLength and Chunked middleware where needed
First, thanks!
The patches look good to me, but since they change the internals of
Rack quite heavily, it would be good if more people reviewed them.
It would be really nice to get rid of Content-Length enforcement. :)
Agreed. I've split this out into a couple of patches and opened
tickets in lighthouse:
"Rack::Utils.bytesize"
http://rack.lighthouseapp.com/projects/22435/tickets/34
"Automatic Content-Length and Transfer-Encoding: chunked handling "
http://rack.lighthouseapp.com/projects/22435/tickets/35
> It would be really nice to get rid of Content-Length enforcement. :)
+1 to that.
Thanks,
Ryan
I don't think we want to monkeypatch webrick. Those changes should go
upstream, IMO.
> It does not allow streaming of response bodies without Content-Length
> to HTTP/1.0 clients, as that required too deep monkey patching to
> WEBrick. But I don't suppose there are too many HTTP/1.0 clients out
> there any more.
Closing the socket after sending the response is how you signal the
end of the response under HTTP/1.0 (i.e., there is no
Tranfer-Encoding:chunked and Content-Length is more hint than protocol
requirement). WEBrick should support this by default, I think. What
made streaming under HTTP/1.0 troublesome?
Thanks,
Ryan
> They have been sat at http://redmine.ruby-lang.org/issues/show/855
> since last November, and posted to ruby-core some months before that,
> with no response. This is a shame, since WEBrick is actually a pretty
> good platform, one you can rely on being present in a minimal Ruby
> installation, and requires no native code compilation.
>
> I could release an entirely separate monkey-patch, but Rack would need
> to know whether it's present in order to be able to do
I think it's okay to monkey-patch that in.  I'd like to support 1.8.6
for some time.
Rack processes are unlikely to use WEBrick in "other" ways, so this
should not do much harm.
I wonder if the rack webrick handler could use as body a class the defines
#is_a?() to return true if it's asked if it's an IO, and that
implements the interface webrick expects (by calling #each on the Rack
response object).
More horrible, but would probably work, would be to actually derive
from IO, and override what you need to ignoring all of your base
classes implementation.
Thanks,
Sam
You don't have to worry about the implementation, its a feature of the
standard library:
http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/index.html
As an implementation technique it uses call/cc.
Also, I'm not sure why you would call fibers nasty (other than the
name is terrible). I use coroutines all the time in lua. They are one
of the things I look forward to most about ruby1.9.
Speaking of "nasty", the alternative to using the generator to
construct an external iterator is donkey patching WEBrick...
Cheers,
Sam
> 
> On Fri, Mar 20, 2009 at 12:31 AM, candlerb <b.ca...@pobox.com>
> wrote:
> >> I wonder if the rack webrick handler could use as body a class the
> >> defines #is_a?() to return true if it's asked if it's an IO, and
> >> that implements the interface webrick expects (by calling #each on
> >> the Rack response object).
> >
> > I don't think that can work without nastiness like threads/fibers.
> 
> You don't have to worry about the implementation, its a feature of the
> standard library:
> 
> http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/index.html
> 
> As an implementation technique it uses call/cc.
Which is known to leak memory like crazy.