That's a good point. Any objections to adding #size to the SPEC and
Rack::RewindableInput? I'd be happy to put the patches together.
Thanks,
Ryan
It should be noted that this request came from Chad Fowler who
encountered the problem in the Gemcutter website. The existing code
already assumes that #size is available, which is the case for e.g.
Thin which exposes a StringIO or Tempfile as rack.input.
-brianmario
> What does this mean for streaming requests?
Block. This change just takes us further from async support, but it seems there's a need.
Yeah. I've always hated #rewind to be honest but, as long as we have
it, I can't think of a good argument to not also support #size. Let's
either move away from seekable input as part of the SPEC or accept it
and allow stuff like this in. Also, Rack 2.0 (?!) is probably the
earliest we can drop rewind if we want to.
Thanks,
Ryan
We could also say that rack.input may be either:
1. a string-like, supporting to_s
2. a file-like, supporting IO methods
And allow server authors to make it optional.
>
> Thanks,
> Ryan
No, HTTP allows chunked file uploads.
Does it really break async support? It should be possible to support
async even when #size is supported, async frameworks/apps just musn't
call #size.
I've actually been really bugged about the rewindable requirement since it was introduced.I like a variant of James' proposal, which would work like this:env["rack.input"] must either be an IO that, when #read, returns the full input.This would mean that if you #read from a non-rewindable input, you should put a StringIO in place with the read data. If you read a rewindable input, you should rewind it.
On Wednesday, February 24, 2010, Yehuda Katz <wyc...@gmail.com> wrote:
> I've actually been really bugged about the rewindable requirement since it was introduced.
> I like a variant of James' proposal, which would work like this:
> env["rack.input"] must either be an IO that, when #read, returns the full input.
>
>
> This would mean that if you #read from a non-rewindable input, you should put a StringIO in place with the read data. If you read a rewindable input, you should rewind it.Yehuda Katz
> Agreed. The ability to arbitrarily rewind read input at any time
> *requires* adding extra buffering and indirection around the actual
> stream, which makes Rack less suitable for handling small requests as
> fast as possible. In jruby-rack, for example, we could directly funnel
> the request's channel straight through the app with basically no extra
> wrapping or buffering if not for this requirement. It doesn't seem
> worth it to penalize small, fast requests just for the sake of
> middleware that could do the right thing on it's own.
What are the cases where we need #rewind, anyway?
Parsing the POST parameters twice, what else?
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
I agree, there are environments where it's not possible to provide a
rewindable input (e.g., AppEngine).
> I like a variant of James' proposal, which would work like this:
> env["rack.input"] must either be an IO that, when #read, returns the full
> input.
> This would mean that if you #read from a non-rewindable input, you should
> put a StringIO in place with the read data. If you read a rewindable input,
> you should rewind it.
What about a huge multipart upload? Do you really want to create a
StringIO that holds the whole thing in memory? Shouldn't there be a
size limit or a way to prevent that behavior?
/Nick
Unicorn/Rainbows!/Zbatery cheat in body-less requests and reuse the same
StringIO object for rack.input. The only extra overhead is the constant
lookups and hash assignments:
env[RACK_INPUT] = NULL_IO
> What are the cases where we need #rewind, anyway?
>
> Parsing the POST parameters twice, what else?
One app I've played with in the past is a retrying reverse proxy
with the ability to replay PUT requests if some backends fail.
Rewindability is a nice-to-have, but it can (and should imho) be
implemented via middleware instead of being a requirement.
--
Eric Wong
- Some developers write input-consuming middlewares and place it
*before* ActionController::RewindableInput. As a result the app layer
does not receive any POST data at all. For a good case study, read
http://code.google.com/p/phusion-passenger/issues/detail?id=220. The
problem did not occur in Mongrel or Thin because they always buffer
their input into a StringIO or Tempfile.
- ActionController::RewindableInput's implementation is quite naive
and will happily slurp 2 GB of rack.input data into memory if there's
that much data.
In theory the rewindability requirement can be removed by agreeing on
a protocol, e.g. by demanding that rack.input-consuming middlewares
wrap rack.input, or by implementing rewindability in middleware. I
agree that this is technically more elegant than placing requirements
on the web server. But I'm not arguing technical elegance here. The
problem with the technically elegant method is that it leaves too much
room for developer mistakes and too easily causes confusion among
developers, such as the Facebooker bug mentioned in the Phusion
Passenger bug report.
Let's count the number of Rack web servers. There are about 4 I think
(Mongrel, Thin, Unicorn, Phusion Passenger); in any case probably less
than 10. Now let's denote the number libraries or apps out there that
use middlewares as x; the value of x is unknown but I'm pretty sure
it's greater than 10. The chance that there's at least one rack.input
consuming bug in x apps/libraries is a lot greater than the chance
that there's at least one rack.input consuming bug in one of the Rack
web servers. It's just easier to fix all the web servers than to fix
every broken app or library. Furthermore streaming can be implemented
even with the rewindability requirement, see Unicorn's TeeInput. This
is why I think having the rewindability requirement in the spec is a
good idea: it makes implementing a Rack web server a bit more
complicated, but it prevents library/app developer mistakes while at
the same time not making streaming impossible.
If the rewindability requirement is to be removed then steps must be
taken to ensure that developers cannot make mistakes and that any
input-wrapping code is efficient. I'm thinking about this:
- rack.input MUST NOT respond to #size, #rewind, #seek, etc. If the
web server exposes those methods then developers will rely on them,
and their code will fail when they switch to a web server that does
provide #size, #rewind, etc. and they might think that the web server
is bugged instead of their own code.
- The Rack library must provide a utility class for wrapping
rack.input into something rewindable, so that people don't naively
slurp all of rack.input into a StringIO. This utility class must be
implemented in the most efficient manner possible and must not hog
memory. The spec must make mention of this class so that developers
don't try to write their own naive implementations.
I would propose that Rack nor provided a rewindable input by default,
but potentially provided an easy API to flip the input if needed,
either through the "replace IO with something rewindable" approach or
by providing a call to be made before the first destructive read from
the stream. This will allow requests that don't have destructive
middleware to have fastest-possible request processing without making
such middleware substantially more difficult to support.
FYI, you would want the JRuby based servers in that list as well, like
GlassFish server, GlassFish gem, JBoss TorqueBox, jruby-rack (used by
all WAR file deployments), whatever Google App Engine uses now, and
possibly other one-off servers like jetty-rails.
Wow, now that you mention it, I can't really think of much either.
That's a neat use of #rewind though.
Even mp3/flv/ect. audio seeking AFAICT works through sending another
GET request with a byte-range:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1
I'll know by the end of the week, planning on implementing a web app
with seekable content.
-Kyle
There will always, obviously, be more apps and middlewares out there
than handlers, but that doesn't mean that changes to the handler
interface should not be looked at critically. Especially when those
changes to the interface are basically bug fixes for middleware.
Graham.