Adding #size to rack.input

77 views
Skip to first unread message

Ryan Tomayko

unread,
Feb 22, 2010, 5:46:26 PM2/22/10
to rack-...@googlegroups.com
On Mon, Feb 22, 2010 at 3:14 AM, Hongli Lai wrote:
> Hi Ryan.
>
> Rack.input is already required to be rewindable, meaning the input has
> to be backed by a file or a StringIO at some point. It should be
> trivial for pretty much all Rack web servers to implement #size. Do
> you have any objections against including #size in the spec?
>
> With kind regards,
> Hongli Lai

That's a good point. Any objections to adding #size to the SPEC and
Rack::RewindableInput? I'd be happy to put the patches together.

Thanks,
Ryan

Hongli Lai

unread,
Feb 22, 2010, 5:52:28 PM2/22/10
to Rack Development

It should be noted that this request came from Chad Fowler who
encountered the problem in the Gemcutter website. The existing code
already assumes that #size is available, which is the case for e.g.
Thin which exposes a StringIO or Tempfile as rack.input.

Brian Lopez

unread,
Feb 22, 2010, 5:59:08 PM2/22/10
to rack-...@googlegroups.com
What does this mean for streaming requests?

-brianmario

Magnus Holm

unread,
Feb 22, 2010, 6:01:39 PM2/22/10
to rack-...@googlegroups.com
Isn't the Content-Length already sent as a HTTP header, or isn't it always there?
// Magnus Holm

James Tucker

unread,
Feb 22, 2010, 9:30:48 PM2/22/10
to rack-...@googlegroups.com

On 22 Feb 2010, at 22:59, Brian Lopez wrote:

> What does this mean for streaming requests?

Block. This change just takes us further from async support, but it seems there's a need.

Ryan Tomayko

unread,
Feb 22, 2010, 10:33:26 PM2/22/10
to rack-...@googlegroups.com

Yeah. I've always hated #rewind to be honest but, as long as we have
it, I can't think of a good argument to not also support #size. Let's
either move away from seekable input as part of the SPEC or accept it
and allow stuff like this in. Also, Rack 2.0 (?!) is probably the
earliest we can drop rewind if we want to.

Thanks,
Ryan

James Tucker

unread,
Feb 23, 2010, 5:29:24 AM2/23/10
to rack-...@googlegroups.com

We could also say that rack.input may be either:

1. a string-like, supporting to_s
2. a file-like, supporting IO methods

And allow server authors to make it optional.

>
> Thanks,
> Ryan

Hongli Lai

unread,
Feb 23, 2010, 6:29:00 PM2/23/10
to Rack Development
On Feb 23, 12:01 am, Magnus Holm <judo...@gmail.com> wrote:
> Isn't the Content-Length already sent as a HTTP header, or isn't it always
> there?

No, HTTP allows chunked file uploads.

Hongli Lai

unread,
Feb 23, 2010, 6:30:52 PM2/23/10
to Rack Development
On Feb 23, 3:30 am, James Tucker <jftuc...@gmail.com> wrote:
> Block. This change just takes us further from async support, but it seems there's a need.

Does it really break async support? It should be possible to support
async even when #size is supported, async frameworks/apps just musn't
call #size.

Yehuda Katz

unread,
Feb 23, 2010, 9:57:52 PM2/23/10
to rack-...@googlegroups.com
I've actually been really bugged about the rewindable requirement since it was introduced.

I like a variant of James' proposal, which would work like this:

env["rack.input"] must either be an IO that, when #read, returns the full input.

This would mean that if you #read from a non-rewindable input, you should put a StringIO in place with the read data. If you read a rewindable input, you should rewind it.

Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325

James Tucker

unread,
Feb 24, 2010, 9:24:30 AM2/24/10
to rack-...@googlegroups.com
On 24 Feb 2010, at 02:57, Yehuda Katz wrote:

I've actually been really bugged about the rewindable requirement since it was introduced.

I like a variant of James' proposal, which would work like this:

env["rack.input"] must either be an IO that, when #read, returns the full input.

This would mean that if you #read from a non-rewindable input, you should put a StringIO in place with the read data. If you read a rewindable input, you should rewind it.

I think this boils down to two rules, with two notes:

1. env["rack.input"] must respond to #read and return the full input

Note: Middleware that consumes rack.input may use #rewind on rack.input if it is available, but:

2. Middleware must always ensure that it restores env['rack.input'] to contain an object that at minimum responds to #read and returns the full input.

Note: StringIO is recommended.

Charles Oliver Nutter

unread,
Feb 26, 2010, 3:05:32 AM2/26/10
to rack-...@googlegroups.com
Agreed. The ability to arbitrarily rewind read input at any time
*requires* adding extra buffering and indirection around the actual
stream, which makes Rack less suitable for handling small requests as
fast as possible. In jruby-rack, for example, we could directly funnel
the request's channel straight through the app with basically no extra
wrapping or buffering if not for this requirement. It doesn't seem
worth it to penalize small, fast requests just for the sake of
middleware that could do the right thing on it's own.

On Wednesday, February 24, 2010, Yehuda Katz <wyc...@gmail.com> wrote:
> I've actually been really bugged about the rewindable requirement since it was introduced.
> I like a variant of James' proposal, which would work like this:
> env["rack.input"] must either be an IO that, when #read, returns the full input.
>
>

> This would mean that if you #read from a non-rewindable input, you should put a StringIO in place with the read data. If you read a rewindable input, you should rewind it.Yehuda Katz

Christian Neukirchen

unread,
Feb 26, 2010, 5:49:35 AM2/26/10
to rack-...@googlegroups.com
Charles Oliver Nutter <hea...@headius.com> writes:

> Agreed. The ability to arbitrarily rewind read input at any time
> *requires* adding extra buffering and indirection around the actual
> stream, which makes Rack less suitable for handling small requests as
> fast as possible. In jruby-rack, for example, we could directly funnel
> the request's channel straight through the app with basically no extra
> wrapping or buffering if not for this requirement. It doesn't seem
> worth it to penalize small, fast requests just for the sake of
> middleware that could do the right thing on it's own.

What are the cases where we need #rewind, anyway?

Parsing the POST parameters twice, what else?

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

Nick Sieger

unread,
Feb 26, 2010, 2:53:38 PM2/26/10
to rack-...@googlegroups.com
On Tue, Feb 23, 2010 at 8:57 PM, Yehuda Katz <wyc...@gmail.com> wrote:
> I've actually been really bugged about the rewindable requirement since it
> was introduced.

I agree, there are environments where it's not possible to provide a
rewindable input (e.g., AppEngine).

> I like a variant of James' proposal, which would work like this:
> env["rack.input"] must either be an IO that, when #read, returns the full
> input.
> This would mean that if you #read from a non-rewindable input, you should
> put a StringIO in place with the read data. If you read a rewindable input,
> you should rewind it.

What about a huge multipart upload? Do you really want to create a
StringIO that holds the whole thing in memory? Shouldn't there be a
size limit or a way to prevent that behavior?

/Nick

Yehuda Katz

unread,
Feb 26, 2010, 4:20:38 PM2/26/10
to rack-...@googlegroups.com
I think in that case, from the perspective of those downstream from the middleware processing the huge upload, there *is no* input.

Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325


Eric Wong

unread,
Feb 26, 2010, 7:33:23 PM2/26/10
to rack-...@googlegroups.com
Christian Neukirchen <chneuk...@gmail.com> wrote:
> Charles Oliver Nutter <hea...@headius.com> writes:
>
> > Agreed. The ability to arbitrarily rewind read input at any time
> > *requires* adding extra buffering and indirection around the actual
> > stream, which makes Rack less suitable for handling small requests as
> > fast as possible. In jruby-rack, for example, we could directly funnel
> > the request's channel straight through the app with basically no extra
> > wrapping or buffering if not for this requirement. It doesn't seem
> > worth it to penalize small, fast requests just for the sake of
> > middleware that could do the right thing on it's own.

Unicorn/Rainbows!/Zbatery cheat in body-less requests and reuse the same
StringIO object for rack.input. The only extra overhead is the constant
lookups and hash assignments:

env[RACK_INPUT] = NULL_IO

> What are the cases where we need #rewind, anyway?
>
> Parsing the POST parameters twice, what else?

One app I've played with in the past is a retrying reverse proxy
with the ability to replay PUT requests if some backends fail.

Rewindability is a nice-to-have, but it can (and should imho) be
implemented via middleware instead of being a requirement.

--
Eric Wong

Yehuda Katz

unread,
Feb 26, 2010, 9:01:47 PM2/26/10
to rack-...@googlegroups.com
Here's how I think about it:

Given the following middleware chain:

Passenger => LargeFileProcessor => StatsCollector => ParamsParser => AppEndpoint

Passenger, the server, is responsible for serving a Rack-compliant request to LargeFileProcessor, which it perceives to be the endpoint. LargeFileProcessor notices that the rack.input contains a large file, and pulls it out, replacing the input with StringIO.new(""). It sends the request to StatsCollector with a new key ("large_file.location"). StatsCollector wraps its call in a benchmarking block, saving the results somewhere. 

From the perspective of StatsCollector, LargeFileProcessor is the server, which serves it a valid Rack request. StatsCollector sends the request to ParamsParser with a new key ("large_file.location"). Again, From the perspective of ParamsParser, StatsCollector is the server, which serves *it* a valid Rack request. It parses the params from the Query string, seeing nothing in the body, and sends the request on.

From the perspective of AppEndpoint, ParamsParser is the server, but with a slightly modified Rack contract ("params.parsed_params" is a required key containing the params and "large_file.location" contains a large file, if it exists). It then returns a response, and each middleware in chain perceives the response to be coming from "the endpoint", before it eventually makes its way back to Passenger.

The important thing is that contract-extending middlewares (like LargeFileProcessor and ParamsParser) do not break transparent middlewares (like StatsCollector). Contract-extending middlewares are responsible for serving a request that is still a valid Rack request (but not necessarily identical to the original HTTP request), but they may provide additional information for downstreams that participate in the modified contract.

The beauty of Rack is that it leverages dynamism via an arbitrary Hash to allow contract extensions to be made at will, while retaining a fairly strict list of requirements for what the resulting Hash (and response tuple) must look like.

Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325


Hongli Lai

unread,
Feb 28, 2010, 10:19:36 AM2/28/10
to Rack Development
I'm not opposed to removing the rewindability requirement, but I think
we're forgetting here what the rewindability requirement was meant to
solve. Eric Wong mentioned that rewindability should be outside the
Rack spec and should be implemented in a middleware instead. However
this was exactly the situation in Rails before the rewindability
requirement was introduced. Rails had a middleware which consumes the
input into a StringIO and replaces rack.input with that StringIO
(ActionController::RewindableInput). This lead to the following
situation:

- Some developers write input-consuming middlewares and place it
*before* ActionController::RewindableInput. As a result the app layer
does not receive any POST data at all. For a good case study, read
http://code.google.com/p/phusion-passenger/issues/detail?id=220. The
problem did not occur in Mongrel or Thin because they always buffer
their input into a StringIO or Tempfile.
- ActionController::RewindableInput's implementation is quite naive
and will happily slurp 2 GB of rack.input data into memory if there's
that much data.

In theory the rewindability requirement can be removed by agreeing on
a protocol, e.g. by demanding that rack.input-consuming middlewares
wrap rack.input, or by implementing rewindability in middleware. I
agree that this is technically more elegant than placing requirements
on the web server. But I'm not arguing technical elegance here. The
problem with the technically elegant method is that it leaves too much
room for developer mistakes and too easily causes confusion among
developers, such as the Facebooker bug mentioned in the Phusion
Passenger bug report.

Let's count the number of Rack web servers. There are about 4 I think
(Mongrel, Thin, Unicorn, Phusion Passenger); in any case probably less
than 10. Now let's denote the number libraries or apps out there that
use middlewares as x; the value of x is unknown but I'm pretty sure
it's greater than 10. The chance that there's at least one rack.input
consuming bug in x apps/libraries is a lot greater than the chance
that there's at least one rack.input consuming bug in one of the Rack
web servers. It's just easier to fix all the web servers than to fix
every broken app or library. Furthermore streaming can be implemented
even with the rewindability requirement, see Unicorn's TeeInput. This
is why I think having the rewindability requirement in the spec is a
good idea: it makes implementing a Rack web server a bit more
complicated, but it prevents library/app developer mistakes while at
the same time not making streaming impossible.

If the rewindability requirement is to be removed then steps must be
taken to ensure that developers cannot make mistakes and that any
input-wrapping code is efficient. I'm thinking about this:
- rack.input MUST NOT respond to #size, #rewind, #seek, etc. If the
web server exposes those methods then developers will rely on them,
and their code will fail when they switch to a web server that does
provide #size, #rewind, etc. and they might think that the web server
is bugged instead of their own code.
- The Rack library must provide a utility class for wrapping
rack.input into something rewindable, so that people don't naively
slurp all of rack.input into a StringIO. This utility class must be
implemented in the most efficient manner possible and must not hog
memory. The spec must make mention of this class so that developers
don't try to write their own naive implementations.

Charles Oliver Nutter

unread,
Feb 28, 2010, 8:56:35 PM2/28/10
to rack-...@googlegroups.com
I don't think it should be a requrement for server vendors to
compensate for bugs in libraries. It was arguably a bug in facebooker
that it consumed a stream in a destructive way. I would not have
modified rack to accommodate destructive plugins, since down that path
lies madness. The common cases should not be penalized for the unusual
(or buggy) cases.

I would propose that Rack nor provided a rewindable input by default,
but potentially provided an easy API to flip the input if needed,
either through the "replace IO with something rewindable" approach or
by providing a call to be made before the first destructive read from
the stream. This will allow requests that don't have destructive
middleware to have fastest-possible request processing without making
such middleware substantially more difficult to support.

FYI, you would want the JRuby based servers in that list as well, like
GlassFish server, GlassFish gem, JBoss TorqueBox, jruby-rack (used by
all WAR file deployments), whatever Google App Engine uses now, and
possibly other one-off servers like jetty-rails.

Kyle Drake

unread,
Mar 8, 2010, 1:40:58 PM3/8/10
to rack-...@googlegroups.com
>
>> What are the cases where we need #rewind, anyway?
>>
>> Parsing the POST parameters twice, what else?
>
> One app I've played with in the past is a retrying reverse proxy
> with the ability to replay PUT requests if some backends fail.
>
> Rewindability is a nice-to-have, but it can (and should imho) be
> implemented via middleware instead of being a requirement.
>

Wow, now that you mention it, I can't really think of much either.
That's a neat use of #rewind though.

Even mp3/flv/ect. audio seeking AFAICT works through sending another
GET request with a byte-range:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1

I'll know by the end of the week, planning on implementing a web app
with seekable content.

-Kyle

Graham

unread,
Mar 11, 2010, 6:56:28 PM3/11/10
to Rack Development
I'd like to second Charles' sentiment that the rewindable requirement
is placing the onus on the wrong part of the stack. In the end, what
this means is that every request pays for a feature only a few
requests need to use. Using a hammer to kill a flea.

There will always, obviously, be more apps and middlewares out there
than handlers, but that doesn't mean that changes to the handler
interface should not be looked at critically. Especially when those
changes to the interface are basically bug fixes for middleware.

Graham.

Reply all
Reply to author
Forward
0 new messages