Content-Length and Transfer-Encoding: chunked

Ryan Tomayko

Mar 5, 2009, 12:20:02 AM
to rack-...@googlegroups.com
I believe the following is the original discussion that led to the
current thinking around Content-Length and Transfer-Encoding.
I'd like to revisit this in light of the new Content-Length
requirements that have been in place since Rack 0.9's release.

On 7/21/08 12:35 AM, Dan Kubb (dkubb) wrote:
> > Recently I was developing an app using Merb + Thin + HAML, and I
> > noticed that the Content-Length header wasn't being set by Thin, while
> > it was when using Mongrel. Initially this led me to contribute a
> > patch to Thin which would set the Content-Length when possible.
> >
> > (For more info see http://thin.lighthouseapp.com/projects/7212/tickets/74-content-length-should-be-added-when-possible)
> >
> > After macournoyer and I discussed this, we came to the conclusion that
> > the framework using Rack would be in a better position to calculate
> > the Content-Length and that my patch was more of a band-aid than a
> > true solution.
> >
> > My idea was to update Rack::Lint so that it ensured that the
> > Content-Length is always set unless:
> >
> > 1) the Status code is 1xx, 204 or 304
> > 2) the Transfer-Encoding header is set to chunked
> >
> > According to RFC 2616 this is correct behavior for servers -- *all
> > responses*, aside from the ones for the statuses mentioned above, must
> > either set the Content-Length or Transfer-Encoding: chunked and return
> > a properly chunked body.
> >
> > Would a patch be accepted to Rack::Lint to add this functionality?
> > This will at least help ensure all the frameworks are "doing the right
> > thing" with respect to RFC 2616, while allowing servers like Thin and
> > Ebb to not have to perform this type of "hack".

This led to additional checks in Rack::Lint and the
Rack::ContentLength middleware. I think we got it wrong, though.
Here's my current thinking as to how we should handle Content-Length
and variable-length bodies:

First: it is simply not possible to require the Content-Length
response header. Some bodies are impossible to know the length of
until you iterate over them. With large bodies, the headers must be
sent before iteration is complete (unless the entire body is slurped
into memory/temporary storage). I think there's general agreement on
this.

But, 'Transfer-Encoding: chunked' should probably not ever be applied
within application / middleware logic. It's not an application level
concern -- it's purely a technicality of transferring the body between
two ends of a network connection. It's of no use within the Rack
middleware chain and is actually intrusive. Consider what happens when
a Rack::Chunker middleware is run before Rack::Deflater (or any other
middleware that modifies the response body) in the middleware chain
for instance.
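
To make the ordering problem concrete, here is a minimal sketch. It is not Rack code: `chunk` and `dechunk` are hypothetical helpers standing in for what a chunking middleware and an HTTP/1.1 client would do, and Zlib stands in for any body-modifying middleware. Once a body is chunk-framed, anything that transforms it afterwards treats the framing bytes as content:

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper: frame a string as one HTTP/1.1 chunk plus the
# terminating zero-length chunk.
def chunk(data)
  "#{data.bytesize.to_s(16)}\r\n#{data}\r\n0\r\n\r\n"
end

# Hypothetical helper: undo the framing, as an HTTP/1.1 client would.
def dechunk(wire)
  io = StringIO.new(wire)
  out = String.new
  while (size = io.gets("\r\n").to_i(16)) > 0
    out << io.read(size)
    io.read(2) # skip the CRLF that ends the chunk
  end
  out
end

body = "Hello, world"

# Chunk first, compress second: the compressor treats the framing as
# content, so after inflating, the chunk headers are still in the payload.
wrong = Zlib::Inflate.inflate(Zlib::Deflate.deflate(chunk(body)))

# Compress first, chunk last (at the socket): the client de-chunks,
# inflates, and recovers the original body.
right = Zlib::Inflate.inflate(dechunk(chunk(Zlib::Deflate.deflate(body))))
```

In the first case the client ends up with the chunk size lines and the terminator mixed into the content; in the second the body round-trips intact.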

If a response without a Content-Length header makes it to the server
handler, it should be the handler's job to decide how to deal with it
based on something like the following rules:

1. The handler should first attempt to determine and set the
Content-Length header. The Content-Length can be set if -- and only if
-- the body's length can be established without iteration, i.e., the
body would have to be some fixed-length object: Array or String. This
is what the Rack::ContentLength middleware does today. It should be
safe to simply move this logic into shared handler code so that it's
always applied.

2. If the Content-Length can not be established without iteration, the
handler should deliver the response based on the version of HTTP being
used. When the request is from an HTTP/1.1 client, the handler should
set the 'Transfer-Encoding: chunked' response header and encode the
response appropriately while iterating over and sending the body on
the socket. When the request is from an HTTP/1.0 client, the handler
must not apply any Transfer-Encoding and instead close the socket
after writing the response body. Because handling this properly
requires specific behavior from the server handler, I believe the whole
problem should be moved into the handler's problem domain.
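
The two rules above can be sketched as a single decision function. This is only an illustration under stated assumptions: `delivery_strategy` is a hypothetical name, "fixed-length" is taken to mean that the body responds to #to_ary, and header keys are assumed to use exactly the capitalization shown.

```ruby
# Hypothetical helper, not part of Rack: pick a delivery strategy for a
# response given the HTTP version of the request.
def delivery_strategy(status, headers, body, http_version)
  status = status.to_i
  # 1xx, 204 and 304 responses carry no body, so no framing is needed.
  return :no_body if (100..199).include?(status) || [204, 304].include?(status)
  # Assumption: header keys use this exact capitalization.
  return :as_is if headers.key?('Content-Length') || headers.key?('Transfer-Encoding')

  if body.respond_to?(:to_ary)
    # Rule 1: the length can be summed without destructive iteration.
    length = body.to_ary.inject(0) { |sum, part| sum + part.bytesize }
    headers['Content-Length'] = length.to_s
    :content_length
  elsif http_version == 'HTTP/1.1'
    # Rule 2, HTTP/1.1: the handler chunk-encodes while writing the body.
    headers['Transfer-Encoding'] = 'chunked'
    :chunked
  else
    # Rule 2, HTTP/1.0: no Transfer-Encoding; closing the socket ends the body.
    :close_after_body
  end
end
```

A handler would call this once per response and then write the body according to the returned symbol.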

I propose the following changes and am working on getting them onto a
branch as I write this:

* Include the above logic in some kind of shared handler code that all
handlers could use if needed (Rack::Handler or Rack::Utils?). Servers
like Thin that implement this correctly already would require no
changes. Servers like Mongrel (and maybe Webrick) that do not perform
chunked handling would need to have their handlers perform this logic.

* Remove the Content-Length header requirement from Rack::Lint and
replace it with a check that verifies the Content-Length, when
present, matches the length of the body if the body length can be
established. A missing Content-Length header should be allowed.

Thoughts?

Thanks,
Ryan

Sam Roberts

Mar 5, 2009, 2:05:04 AM
to rack-...@googlegroups.com
On Wed, Mar 4, 2009 at 9:20 PM, Ryan Tomayko <r...@tomayko.com> wrote:
> But, 'Transfer-Encoding: chunked' should probably not ever be applied
> within application / middleware logic. It's not an application level
> concern -- it's purely a technicality of transferring the body between
> two ends of a network connection.

I agree. It should be handled by the server, or the handler if the
server didn't do it. Making it middleware implies that correct
implementation of HTTP is somehow optional, or an application concern.

There is the oddity of the trailing headers in the last chunk. I've
never seen this used (except for one company I worked at where our
server implemented and required it to be used, which later developers
had come to curse).

Cheers,
Sam

Ryan Tomayko

Mar 5, 2009, 3:54:49 AM
to rack-...@googlegroups.com

I think I based the implementation on RFC 2616 so I wouldn't be
surprised if things are slightly different and quirky with real world
client implementations. The Rack::Chunker example I threw out is
likely not an acceptable final implementation but I figured it would
serve as an example of the basic logic required and one way of
approaching it.

Thanks,
Ryan

candlerb

Mar 5, 2009, 4:17:00 AM
to Rack Development
In general, I think this is the right thing to do.

* It would make the Rack API easier to use for endpoints.
* It means Rack::ContentLength middleware can be dropped. Aside: this
appears to be 1.9-broken at present - it uses part.length instead of
(byte)size
* It would allow apps to send bodies of indeterminate size to
HTTP/1.0 clients (which is legal, albeit dodgy in the face of network
errors, but useful for things like multipart/x-mixed-replace)

> Thoughts?

A naive handler will call #each on the body and output one HTTP chunk
for each yielded string. This may lead to some strange behaviour for
those apps which return an open File for the body: one chunk for each
line!!

So a sensible handler should either check for read() and use it if
available, or concatenate yielded strings to build up a sensible chunk
size. This could live in shared handler code.
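
The second option (concatenating yielded strings) could look roughly like the following. This is a sketch only; `CoalescingBody` and its `min_size` parameter are hypothetical names, not anything in Rack.

```ruby
# Hypothetical wrapper, not part of Rack: coalesces the strings yielded by
# body#each into pieces of at least `min_size` bytes, so a body that yields
# one line at a time does not become one tiny HTTP chunk per line.
class CoalescingBody
  def initialize(body, min_size = 8192)
    @body = body
    @min_size = min_size
  end

  def each
    buffer = String.new
    @body.each do |part|
      buffer << part.b # treat everything as raw bytes
      if buffer.bytesize >= @min_size
        yield buffer
        buffer = String.new
      end
    end
    # Flush whatever is left once the wrapped body is exhausted.
    yield buffer unless buffer.empty?
  end
end
```

A handler could wrap any body that is not an Array in this before chunk-encoding it.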

Would you consider altering the spec to say explicitly that the
response body is an object which may respond to #read(n) or #each?
This would give the handler writer a licence to try calling read()
first.

> If the Content-Length can not be established without iteration

By which I presume you mean: if body.is_a?(Array) and all Array
elements are Strings, sum their :bytesize or :size. So you *are*
actually iterating, but only when it is cheap and safe.

It might occasionally be useful for an app to be able to force a
chunked response for a large reply even if its size is known up-front.
It can do this by wrapping the response in some arbitrary non-Array
object which only responds to #each.
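
As an illustration, such a wrapper can be very small. `EachOnly` is a hypothetical name; the point is only that the object does not respond to #to_ary, so a handler cannot sum its length up front.

```ruby
# Hypothetical wrapper: hides the Array behind an object that only
# responds to #each.
class EachOnly
  def initialize(parts)
    @parts = parts
  end

  def each(&block)
    @parts.each(&block)
  end
end

body = EachOnly.new(["part one\n", "part two\n"])
```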

Another aside: if Rack::ContentLength is to go, perhaps it could be
replaced by Rack::ContentType which applies a default Content-Type?
This is so I can be lazy and write

[200, {}, "The response"]

But if we're going to be strict about RFC 2616 (7.2.1), it only says
that responses SHOULD have a Content-Type. It is legal to omit it, and
the recipient can either guess the type, or treat as
application/octet-stream. So I'm not sure that the Rack spec should make a stronger
requirement than the RFC.

Regards,

Brian.

Ryan Tomayko

Mar 5, 2009, 5:47:25 AM
to rack-...@googlegroups.com
On Thu, Mar 5, 2009 at 1:17 AM, candlerb <b.ca...@pobox.com> wrote:
> In general, I think this is the right thing to do.
>
> * It would make the Rack API easier to use for endpoints.
> * It means Rack::ContentLength middleware can be dropped. Aside: this
> appears to be 1.9-broken at present - it uses part.length instead of
> (byte)size
> * It would allow apps to send bodies of indeterminate size to
> HTTP/1.0 clients (which is legal, albeit dodgy in the face of network
> errors, but useful for things like multipart/x-mixed-replace)
>
>> Thoughts?
>
> A naive handler will call #each on the body and output one HTTP chunk
> for each yielded string. This may lead to some strange behaviour for
> those apps which return an open File for the body: one chunk for each
> line!!
>
> So a sensible handler should either check for read() and use it if
> available, or concatenate yielded strings to build up a sensible chunk
> size. This could live in shared handler code.

Yep. There are definitely a variety of chunk sizing optimizations that
could happen here.

> Would you consider altering the spec to say explicitly that the
> response body is an object which may respond to #read(n) or #each?
> This would give the handler writer a licence to try calling read()
> first.

I'd be against it. Don't return IO objects directly as bodies if you
don't like the characteristics of IO#each; or, redefine the IO
instance's #each method to yield 4K/8K chunks before returning as a
body.

I wouldn't be against some kind of File wrapper in Rack::Utils. There
were a few 5-10 LOC example implementations thrown out in a recent
thread.

>> If the Content-Length can not be established without iteration
>
> By which I presume you mean: if body.is_a?(Array) and all Array
> elements are Strings, sum their :bytesize or :size. So you *are*
> actually iterating, but only when it is cheap and safe.

Right. Sorry, I should have said, "Unless the Content-Length can be
established without destructive iteration". And that basically means
only objects that respond to #to_ary as of Rack 1.0.

This is actually something I've long felt is left out of the Rack
SPEC. How a body is to behave on successive calls to #each is not
specified. For example, IO objects have a destructive #each
implementation - once you've iterated over it, it's exhausted and will
not give the same result on successive calls. Arrays are
non-destructive in that #each yields the same results on multiple
successive calls. Since we don't know whether a body has a destructive
#each implementation, we basically have to assume it always does,
except for the Array special case. This same problem plagues
rack.input and gives rise to all of the sometimes-rewind idioms that I
don't think anyone is very happy with. I don't have a good solution,
though.
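
A small demonstration of the difference, using StringIO as a stand-in for an IO body (the `collect` helper is just for illustration):

```ruby
require 'stringio'

# Collect everything a body yields from a single call to #each.
def collect(body)
  parts = []
  body.each { |part| parts << part }
  parts
end

array_body = ["a\n", "b\n"]
io_body    = StringIO.new("a\nb\n")

# Iterate over each body twice.
array_first = collect(array_body)
array_second = collect(array_body)
io_first = collect(io_body)
io_second = collect(io_body)

p array_second == array_first # the Array replays the same parts
p io_second                   # the IO-like body has nothing left to yield
```

Measuring the length of the IO-style body by iterating would therefore leave nothing to send afterwards, unless the parts were buffered somewhere.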

> It might occasionally be useful for an app to be able to force a
> chunked response for a large reply even if its size is known up-front.
> It can do this by wrapping the response in some arbitrary non-Array
> object which only responds to #each.

There's no upside to using "Transfer-Encoding: chunked" when you know
the Content-Length up front. Note that just because a Content-Length
header is present doesn't mean that responses are sent out in one
large hunk. It's possible to return a body that responds to #each and
generates the response iteratively while also including a
Content-Length header in the response. In other words, whether
responses are chunked encoded and whether they're "streamed" are
totally separate issues.

> Another aside: if Rack::ContentLength is to go, perhaps it could be
> replaced by Rack::ContentType which applies a default Content-Type?
> This is so I can be lazy and write
>
>    [200, {}, "The response"]
>
> But if we're going to be strict about RFC 2616 (7.2.1), it only says
> that responses SHOULD have a Content-Type. It is legal to omit it, and
> the recipient can either guess the type, or treat as
> application/octet-stream. So I'm not sure that the Rack spec should make a stronger
> requirement than the RFC.

A simple middleware that applies a default Content-Type when none is
present might be a worthwhile addition:

use Rack::ContentType, "text/plain"

Put a patch together and let's see what happens.
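
For reference, a middleware along those lines could be as small as the sketch below. The class name `DefaultContentType` and the exact-case header lookup are assumptions made for illustration; this is not the implementation that ended up in Rack.

```ruby
# Hypothetical middleware sketch: applies a default Content-Type when the
# downstream app did not set one.
class DefaultContentType
  def initialize(app, content_type)
    @app = app
    @content_type = content_type
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Assumption: the header key uses exactly this capitalization.
    headers['Content-Type'] ||= @content_type
    [status, headers, body]
  end
end

# Usage with a bare lambda app that sets no headers at all.
app = DefaultContentType.new(lambda { |env| [200, {}, ["The response"]] }, "text/plain")
status, headers, body = app.call({})
```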

Thanks,
Ryan

Christian Neukirchen

Mar 5, 2009, 6:46:09 AM
to rack-...@googlegroups.com
Ryan Tomayko <r...@tomayko.com> writes:

> A simple middleware that applies a default Content-Type when none is
> present might be a worthwhile addition:
>
> use Rack::ContentType, "text/plain"

+1

+100 for "chunk-when-no-content-length-is-given".

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

Ryan Tomayko

Mar 6, 2009, 12:48:10 AM
to rack-...@googlegroups.com
On Thu, Mar 5, 2009 at 3:46 AM, Christian Neukirchen
<chneuk...@gmail.com> wrote:
> +100 for "chunk-when-no-content-length-is-given".

Patches attached for review:

0001 Add Rack::Utils.bytesize function, use everywhere
0002 Add Rack::Chunked (Transfer-Encoding: chunked) middleware
0003 Rack::Lint no longer requires a Content-Length response header
0004 Handlers use ContentLength and Chunked middleware where needed

Generated from the "chunky" branch here:

http://github.com/rtomayko/rack/commits/chunky

I went a slightly different direction with this than I originally
planned. The "Content-Length" and "Transfer-Encoding: chunked" logic
are implemented as middleware instead of Utils and the server handlers
wrap their apps in either or both pieces of middleware as needed.
Handlers have not used middleware like this in the past and it might
be confusing that the ContentLength and Chunked middleware are really
for handler use only now. I'm willing to move this logic around if we
think that's a better option.

All handlers use the ContentLength middleware; some use the Chunked
middleware. See the commit message on the following for more details
on why some handlers require different behavior:

http://github.com/rtomayko/rack/commit/17feb4c52dd7807ee930645c2eb129b6886f66fa

Some other things worth mentioning:

* Thin and Mongrel include their own chunked encoding
implementations. For those handlers, we don't actually have to apply
CE as they'll do it automatically when no Content-Length header is
present. Still, I've put our chunking in place for these handlers
because Thin wants to remove its chunking code, IIRC, and I figured it
would be good for consistency across servers.

* Buffering chunks to a certain size would likely be a nice
performance enhancement. As of now, each string yielded from body.each
is encoded as a separate chunk.
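
For readers who have not opened the patches, the per-string encoding described in the last bullet amounts to something like this. It is a sketch written from the RFC 2616 chunked format, not the code from the patch, and `ChunkedBody` is a hypothetical name.

```ruby
# Hypothetical wrapper: encodes each string yielded by the wrapped body as
# one HTTP/1.1 chunk: size in hex, CRLF, data, CRLF.
class ChunkedBody
  TERM = "\r\n"

  def initialize(body)
    @body = body
  end

  def each
    @body.each do |part|
      size = part.bytesize
      next if size.zero? # a zero-length chunk would end the body early
      yield [size.to_s(16), TERM, part, TERM].join
    end
    yield "0#{TERM}#{TERM}" # last-chunk marker, no trailers
  end
end

wire = []
ChunkedBody.new(["Hello, ", "", "world!"]).each { |chunk| wire << chunk }
```

Empty strings are skipped because a zero-length chunk is the end-of-body marker.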

Thanks,
Ryan
http://tomayko.com/about

0001-Add-Rack-Utils.bytesize-function-use-everywhere.patch
0002-Add-Rack-Chunked-Transfer-Encoding-chunked-middl.patch
0003-Rack-Lint-no-longer-requires-a-Content-Length-respo.patch
0004-Handlers-use-ContentLength-and-Chunked-middleware-wh.patch

Christian Neukirchen

Mar 6, 2009, 7:57:41 AM
to rack-...@googlegroups.com
Ryan Tomayko <r...@tomayko.com> writes:

> 0001 Add Rack::Utils.bytesize function, use everywhere
> 0002 Add Rack::Chunked (Transfer-Encoding: chunked) middleware
> 0003 Rack::Lint no longer requires a Content-Length response header
> 0004 Handlers use ContentLength and Chunked middleware where needed

First, thanks!

The patches look good to me, but since they change the internals of
Rack quite heavily, it would be good if more people reviewed them.

It would be really nice to get rid of Content-Length enforcement. :)

candlerb

Mar 6, 2009, 1:15:12 PM
to Rack Development
> I went a slightly different direction with this than I originally
> planned. The "Content-Length" and "Transfer-Encoding: chunked" logic
> are implemented as middleware instead of Utils and the server handlers
> wrap their apps in either or both pieces of middleware as needed.
> Handlers have not used middleware like this in the past and it might
> be confusing that the ContentLength and Chunked middleware are really
> for handler use only now.

Sounds like we need a new name. How about "Bottomware"? :-)

Anyway, I haven't had a chance to look through this yet, but I did
think a bit more about the problem where the WEBrick handler is
buffering the entire POST body (whether chunked or not) into RAM. The
offending code is just this:

"rack.input" => StringIO.new(req.body.to_s),

Now, WEBrick provides an API for reading bodies as a stream
(#read_body), which maps directly to Rack's #each. All good so far.

However the Rack spec requires us to implement gets() and read(n) as
well. Connecting a 'pull' sink (read) to a 'push' source (each) isn't
straightforward: without using threads or fibers, you've got to buffer
the lot. So one solution might look like this:

- if #each(&blk) is called, call #read_body(&blk)
- elsif #read or #gets is called, wrap the entire body in StringIO as
before, and return that

This is better, but has the same problem for large body uploads if the
app uses read/gets instead of each. To improve this, we could start
reading into a StringIO, and if the body is larger than a certain
size, switch to a TempFile. This is adding quite a lot of complexity
into the handler.
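
A rough sketch of that StringIO-then-Tempfile spooling, to show how much code is involved. `spool`, its `limit` parameter and the 'rack-input' tempfile prefix are hypothetical, and `source` is assumed to be anything that yields strings from #each (such as a wrapper around WEBrick's #read_body).

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copies the strings yielded by `source` into a
# StringIO, switching to a Tempfile once more than `limit` bytes arrive.
# The result is rewound and supports read, gets and each.
def spool(source, limit = 1024 * 1024)
  io = StringIO.new(String.new)
  source.each do |chunk|
    if io.is_a?(StringIO) && io.size + chunk.bytesize > limit
      file = Tempfile.new('rack-input')
      file.binmode
      file.write(io.string) # carry over what was buffered in RAM so far
      io = file
    end
    io.write(chunk)
  end
  io.rewind
  io
end
```

The caller is responsible for closing and unlinking the Tempfile when the request is finished.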

And this, I think, is the nub of the problem: Rack input streams are
required to implement three different APIs, one push and two pull. If
input streams were defined only to implement #each, this would be
symmetrical with output streams, and would make any middleware which
wants to modify the input stream much simpler. It's also very simple
for handlers, because connecting a 'pull' source into a 'push' sink is
trivial, and if they're dechunking they probably want to 'push'
anyway.

So what about apps which really want read(n) or gets()? Well, we could
provide some buffering middleware for them. As has been observed
before, some apps want a rewindable input stream. This will always
necessitate buffering to RAM or disk.

So if we're clarifying the roles of webservers, handlers and apps with
regards to chunking, I think we should also clarify where the
responsibility lies for spooling - with the webserver/handler, or with
middleware/framework/app. If not, code portability will be a problem.

Ryan Tomayko

Mar 6, 2009, 5:00:59 PM
to rack-...@googlegroups.com
On Fri, Mar 6, 2009 at 4:57 AM, Christian Neukirchen
<chneuk...@gmail.com> wrote:
>
> Ryan Tomayko <r...@tomayko.com> writes:
>
>> 0001 Add Rack::Utils.bytesize function, use everywhere
>> 0002 Add Rack::Chunked (Transfer-Encoding: chunked) middleware
>> 0003 Rack::Lint no longer requires a Content-Length response header
>> 0004 Handlers use ContentLength and Chunked middleware where needed
>
> First, thanks!
>
> The patches look good to me, but since they change the internals of
> Rack quite heavily, it would be good if more people reviewed them.

Agreed. I've split this out into a couple of patches and opened
tickets in lighthouse:

"Rack::Utils.bytesize"
http://rack.lighthouseapp.com/projects/22435/tickets/34

"Automatic Content-Length and Transfer-Encoding: chunked handling "
http://rack.lighthouseapp.com/projects/22435/tickets/35

> It would be really nice to get rid of Content-Length enforcement. :)

+1 to that.

Thanks,
Ryan

candlerb

Mar 7, 2009, 9:17:13 AM
to Rack Development
> And this, I think, is the nub of the problem: Rack input streams are
> required to implement three different APIs, one push and two pull.

Furthermore: the requirement to implement gets() theoretically
requires the whole input stream to be read into RAM. After all, it's
possible that someone could provide a 2GB upload which consists of
only one line.

candlerb

Mar 12, 2009, 5:57:24 PM
to Rack Development
I finally got a chance to look at these patches.

(1) Firstly, I looked at chunking in the client->server (POST)
direction. Here's my test server:

-----
$:.unshift "/home/brian/git/rack/lib"
require 'rack'
require 'digest/md5'

app = lambda { |env|
  digest = Digest::MD5.new
  env["rack.input"].each do |chunk|
    digest << chunk
  end
  res = digest.hexdigest + "\n"
  [200, {"Content-Type"=>"text/plain","Content-Length"=>res.size.to_s}, res]
}

opt = {:Port => 9292, :Host => "0.0.0.0", :AccessLog => []}
Rack::Handler::WEBrick.run app, opt
#$:.unshift "/usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib"
#Rack::Handler::Mongrel.run app, opt
-----

And here's the test client for sending chunked uploads:

-----
require 'net/http'
file = '/v/downloads/linux/ubuntu-8.04-desktop-i386.iso'
File.open(file,'rb') do |f|
  req = Net::HTTP::Post.new('/')
  req['Transfer-Encoding'] = 'chunked'
  #req['Content-Length'] = f.stat.size
  req.body_stream = f
  res = Net::HTTP.new('127.0.0.1',9292).start { |http| http.request(req) }
  puts res.body
end
-----

[Note: performance from net/http is dire by default, but if you change
the chunksize from 1024 to 65536 in net/http.rb it works much faster]

* WEBrick: upload is buffered entirely into RAM (as shown by 'top')
* Mongrel: upload doesn't work at all (raises Errno::EPIPE in the
client)

So neither of these work.

(2) Now I'll try the opposite direction, server->client. Server code:

-----
$:.unshift "/home/brian/git/rack/lib"
require 'rack'

file = '/v/downloads/linux/ubuntu-8.04-desktop-i386.iso'

class Sender
  def initialize(fn)
    @fn = fn
  end
  def each
    File.open(@fn,"rb") do |f|
      while blk = f.read(65536)
        yield blk
      end
    end
  end
end

app = lambda { |env|
  [200, {"Content-Type"=>"text/plain"}, Sender.new(file)]
  #[200, {"Content-Type"=>"text/plain","Content-Length"=>File.size(file).to_s}, Sender.new(file)]
}

opt = {:Port => 9292, :Host => "0.0.0.0", :AccessLog => []}
Rack::Handler::WEBrick.run app, opt
#$:.unshift "/usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib"
#Rack::Handler::Mongrel.run app, opt
-----

Client: curl http://127.0.0.1:9292/ | md5sum

* WEBrick: not chunking :-(
- top shows that the entire file is read into RAM
- curl shows the transfer doesn't start until the entire file has
been read in
- however the file is transferred correctly
- sending a smaller file (/etc/motd) and using telnet shows a
Content-Length: header and no chunking

* Mongrel: this is working fine
- telnet shows chunked encoding
- curl receives the file as a stream
- top shows memory usage is not growing

=============================

Finally, I repeated all these tests using standard Content-Length
streaming rather than chunking.

(3) POST client: enable the Content-Length header and disable the
Transfer-Encoding header

* WEBrick: entire upload buffered in RAM
* Mongrel: works correctly (streams without reading into RAM)

(4) Large file server: change to

[200, {"Content-Type"=>"text/plain","Content-Length"=>File.size(file).to_s}, Sender.new(file)]

* WEBrick: entire download buffered in RAM
* Mongrel: works correctly (streams without reading into RAM)

I had a quick look at the WEBrick code to see what needs to be done.

- downloads (server->client) require some monkey-patching. This is
because HTTPResponse will only stream if the body is a concrete
instance of IO, and also responds to #read

I've already written the necessary code, and posted here:
http://redmine.ruby-lang.org/issues/show/855 (see in particular point
4)

The patch itself is http://redmine.ruby-lang.org/attachments/download/161
and the bits of interest are send_body_proc and ChunkedWrapper

- uploads (client->server) should be OK, by wrapping request.input in
some custom object instead of StringIO, at least for rack.input#each.
However the Rack spec requires gets() and read() to be implemented
too, which complicates things rather, especially if you want read() to
be able to support streaming when the client is uploading using
chunked encoding. Personally I don't think that Rack should be
requiring all three APIs.

Regards,

Brian.

candlerb

Mar 19, 2009, 5:40:43 AM
to Rack Development
I have made a patch which fixes points (2) and (4) under WEBrick -
that is, chunking of large response bodies and streaming of large
response bodies with Content-Length, without reading the entire body
into RAM first.

http://rack.lighthouseapp.com/projects/22435-rack/tickets/39-patch-streaming-of-response-bodies-under-webrick

It does not allow streaming of response bodies without Content-Length
to HTTP/1.0 clients, as that required too deep monkey patching to
WEBrick. But I don't suppose there are too many HTTP/1.0 clients out
there any more.

Ryan Tomayko

Mar 19, 2009, 5:45:47 AM
to rack-...@googlegroups.com
On Thu, Mar 19, 2009 at 2:40 AM, candlerb <b.ca...@pobox.com> wrote:
> I have made a patch which fixes points (2) and (4) under WEBrick -
> that is, chunking of large response bodies and streaming of large
> response bodies with Content-Length, without reading the entire body
> into RAM first.
>
> http://rack.lighthouseapp.com/projects/22435-rack/tickets/39-patch-streaming-of-response-bodies-under-webrick

I don't think we want to monkeypatch webrick. Those changes should go
upstream, IMO.

> It does not allow streaming of response bodies without Content-Length
> to HTTP/1.0 clients, as that required too deep monkey patching to
> WEBrick. But I don't suppose there are too many HTTP/1.0 clients out
> there any more.

Closing the socket after sending the response is how you signal the
end of the response under HTTP/1.0 (i.e., there is no
Transfer-Encoding: chunked and Content-Length is more hint than protocol
requirement). WEBrick should support this by default, I think. What
made streaming under HTTP/1.0 troublesome?

Thanks,
Ryan

candlerb

Mar 19, 2009, 8:01:46 AM
to Rack Development
On Mar 19, 9:45 am, Ryan Tomayko <r...@tomayko.com> wrote:
> I don't think we want to monkeypatch webrick. Those changes should go
> upstream, IMO.

They have been sat at http://redmine.ruby-lang.org/issues/show/855
since last November, and posted to ruby-core some months before that,
with no response. This is a shame, since WEBrick is actually a pretty
good platform, one you can rely on being present in a minimal Ruby
installation, and requires no native code compilation.

I could release an entirely separate monkey-patch, but Rack would need
to know whether it's present in order to be able to do

res.body = lambda { |out| ... write each chunk to out ... }

> > It does not allow streaming of response bodies without Content-Length
> > to HTTP/1.0 clients, as that required too deep monkey patching to
> > WEBrick. But I don't suppose there are too many HTTP/1.0 clients out
> > there any more.
>
> Closing the socket after sending the response is how you signal the
> end of the response under HTTP/1.0

Exactly.

> WEBrick should support this by default, I think. What made streaming under HTTP/1.0 troublesome?

Buried deep inside the method setup_header() is a hard-coded
assumption that only IO objects can behave in this way:

elsif @header['content-length'].nil?
  unless @body.is_a?(IO)
    @header['content-length'] = @body ? @body.size : 0
  end
end

That is: anything which is not an IO is a String. But to enable
streaming with Rack, I had to add a Proc as a third type of response
body.

This in turn is due to WEBrick's expectation that anything which can
be streamed is an IO (where you "pull" chunks of data using #read),
whereas Rack provides an object which "pushes" chunks of data using
#each

candlerb

Mar 19, 2009, 10:46:20 AM
to Rack Development
I've now forked WEBrick and made all the updates as separate commits:

http://github.com/candlerb/webrick/commits/master
http://github.com/candlerb/webrick/commits/ruby18
http://github.com/candlerb/webrick/commits/ruby186

With luck this means that these commits will find their way into Ruby
subversion. If not, then it becomes feasible to release a standalone
version of WEBrick.

I haven't put anything in to identify whether send_body_proc is
available, but perhaps WEBrick's version number will be bumped.

Regards,

Brian.

Christian Neukirchen

Mar 19, 2009, 5:04:20 PM
to rack-...@googlegroups.com
candlerb <b.ca...@pobox.com> writes:

> They have been sat at http://redmine.ruby-lang.org/issues/show/855
> since last November, and posted to ruby-core some months before that,
> with no response. This is a shame, since WEBrick is actually a pretty
> good platform, one you can rely on being present in a minimal Ruby
> installation, and requires no native code compilation.
>
> I could release an entirely separate monkey-patch, but Rack would need
> to know whether it's present in order to be able to do

I think it's okay to monkey-patch that in. I'd like to support 1.8.6
for some time.

Rack processes are unlikely to use WEBrick in "other" ways, so this
should not do much harm.

Sam Roberts

Mar 19, 2009, 5:41:09 PM
to rack-...@googlegroups.com
On Thu, Mar 19, 2009 at 5:01 AM, candlerb <b.ca...@pobox.com> wrote:
>> WEBrick should support this by default, I think. What made streaming under HTTP/1.0 troublesome?
>
> Buried deep inside the method setup_header() is a hard-coded
> assumption that only IO objects can behave in this way:
>
> elsif @header['content-length'].nil?
> unless @body.is_a?(IO)
> @header['content-length'] = @body ? @body.size : 0
> end
> end
>
> That is: anything which is not an IO is a String. But to enable
> streaming with Rack, I had to add a Proc as a third type of response
> body.
>
> This in turn is due to WEBrick's expectation that anything which can
> be streamed is an IO (where you "pull" chunks of data using #read),
> whereas Rack provides an object which "pushes" chunks of data using
> #each

I wonder if the rack webrick handler could use as body a class that defines
#is_a?() to return true if it's asked if it's an IO, and that
implements the interface webrick expects (by calling #each on the Rack
response object).

More horrible, but would probably work, would be to actually derive
from IO and override what you need to, ignoring all of your base
class's implementation.

Thanks,
Sam

candlerb

Mar 20, 2009, 3:31:39 AM
to Rack Development
> I wonder if the rack webrick handler could use as body a class that defines
> #is_a?() to return true if it's asked if it's an IO, and that
> implements the interface webrick expects (by calling #each on the Rack
> response object).

I don't think that can work without nastiness like threads/fibers.

Notice that WEBrick is doing a loop which calls read() repeatedly to
fetch individual chunks, e.g.

while buf = input.read(BUFSIZE)
  _write_data(output, buf)
end

This means that WEBrick is in control of the flow, invoking #read
multiple times as required. It "pulls".

However, Rack response provides a #each method, which you invoke once
and it yields multiple times. That is, the Rack response object is in
control of the flow. It "pushes".

It's easy to convert something which implements #read into something
which implements #each, but it's not easy the other way round.
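
The asymmetry can be sketched in code. Both class names are hypothetical. The second adapter is one possible way to do the hard direction, assuming Ruby 1.9 or later: it relies on Enumerator#next, which suspends the #each call inside a Fiber, so it is exactly the fiber-based approach rather than a way around it.

```ruby
require 'stringio'

# Pull -> push is a plain loop: give anything with #read(n) an #each.
class EachFromRead
  def initialize(io, bufsize = 8192)
    @io = io
    @bufsize = bufsize
  end

  def each
    while (buf = @io.read(@bufsize))
      yield buf
    end
  end
end

# Push -> pull needs a suspended iteration. Enumerator#next resumes the
# wrapped #each just far enough to obtain the next yielded string.
class ReadFromEach
  def initialize(body)
    @enum = body.to_enum(:each)
    @buffer = String.new
  end

  # Returns up to n bytes, or nil once the body is exhausted.
  def read(n)
    while @buffer.bytesize < n
      begin
        @buffer << @enum.next.b
      rescue StopIteration
        break
      end
    end
    return nil if @buffer.empty?
    @buffer.slice!(0, n)
  end
end
```

With the second adapter, a loop like the WEBrick one above could pull from a Rack body, at the cost of one Fiber per response.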

Sam Roberts

Mar 20, 2009, 6:10:25 PM
to rack-...@googlegroups.com
On Fri, Mar 20, 2009 at 12:31 AM, candlerb <b.ca...@pobox.com> wrote:
>> I wonder if the rack webrick handler could use as body a class that defines
>> #is_a?() to return true if it's asked if it's an IO, and that
>> implements the interface webrick expects (by calling #each on the Rack
>> response object).
>
> I don't think that can work without nastiness like threads/fibers.

You don't have to worry about the implementation, it's a feature of the
standard library:

http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/index.html

As an implementation technique it uses call/cc.

Also, I'm not sure why you would call fibers nasty (other than the
name is terrible). I use coroutines all the time in lua. They are one
of the things I look forward to most about ruby1.9.

Speaking of "nasty", the alternative to using the generator to
construct an external iterator is donkey patching WEBrick...

Cheers,
Sam

Michael Fellinger

Mar 20, 2009, 8:45:42 PM
to rack-...@googlegroups.com
On Fri, 20 Mar 2009 15:10:25 -0700
Sam Roberts <vieu...@gmail.com> wrote:

>
> On Fri, Mar 20, 2009 at 12:31 AM, candlerb <b.ca...@pobox.com>
> wrote:
> >> I wonder if the rack webrick handler could use as body a class that
> >> defines #is_a?() to return true if it's asked if it's an IO, and
> >> that implements the interface webrick expects (by calling #each on
> >> the Rack response object).
> >
> > I don't think that can work without nastiness like threads/fibers.
>
> You don't have to worry about the implementation, it's a feature of the
> standard library:
>
> http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/index.html
>
> As an implementation technique it uses call/cc.

Which is known to leak memory like crazy.
