Proposal: require rack.input to be seekable

99 views
Skip to first unread message

Hongli Lai

unread,
Apr 12, 2009, 1:15:13 PM4/12/09
to Rack Development
The specification currently does not state whether rack.input has to
be seekable. This means that if rack.input is not seekable, then any
middleware which tries to parse POST parameters will inevitably
consume the POST data, which is then lost forever. In Phusion
Passenger, rack.input is a socket, and thus is not seekable.

Middlewares can work around this problem. For example, Rails tries to
fix this with its RewindableInput middleware: it wraps an object
around rack.input which buffers all data in rack.input into a tempfile
whenever an I/O operation on it occurs. Tempfiles are seekable so
other middlewares can just rewind rack.input if they want to read the
POST data. There are however two problems with this approach:

1. Everybody will implement the workaround in a different way. Rails's
implementation is naive: it slurps rack.input into memory entirely,
and writes the entire buffer into a tempfile.
2. If POST-consuming middleware is inserted above the workaround
middleware, then the workaround will not work. For example, the
Facebooker plugin inserts a POST-consuming middleware above
ActionController::RewindableInput. As a result, any POST data is lost:
http://code.google.com/p/phusion-passenger/issues/detail?id=220

I propose that the spec requires rack.input to be seekable. Then
workarounds like RewindableInput will no longer be necessary, and
middleware developers will not be able to make mistake 2. The Rack
library can provide an efficiently-implemented utility class for
making sockets seekable, which can then be used by all web servers/
Rack handlers.

Joshua Peek

unread,
Apr 12, 2009, 2:28:02 PM4/12/09
to rack-...@googlegroups.com
Please, I want to remove that RewindableInput bandaid.

+1

Hongli, just curious what you will do specifically for passenger since
the socket is not rewindable?

--
Joshua Peek

Hongli Lai

unread,
Apr 12, 2009, 3:31:27 PM4/12/09
to Rack Development
On Apr 12, 8:28 pm, Joshua Peek <j...@joshpeek.com> wrote:
> Please, I want to remove that RewindableInput bandaid.
>
> +1
>
> Hongli, just curious what you will do specifically for passenger since
> the socket is not rewindable?

At the moment, the web server will only buffer POST data that's larger
than 4 KB. The buffer file is, for security reasons, unlinked from the
filesystem for security purposes, so that it becomes an anonymous disk-
backed byte buffer. When all POST data has been received, the data in
the buffer file is then streamed to the backend.

If rack.input is required to be rewindable then I'll modify the Apache
module to always buffer to a non-anonymous file (i.e. it is not
unlinked) regardless of the size of the POST data. This filename is
then passed to the backend, which then opens it and assigns the
resulting File object to env["rack.input"]. No data is copied so this
is highly efficient, moreso than the current implementation.

Hongli Lai

unread,
Apr 12, 2009, 3:53:13 PM4/12/09
to Rack Development
It should be noted that there is one disadvantage if rack.input is
required to be seekable: implementing upload progress handling in Rack
applications will become impossible because all data has to be
buffered first. However, nobody seems to be interested in this anyway,
and all existing web servers that can serve Ruby web apps already make
it impossible by buffering large POST data. People who want to
implement upload progress handling can and probably should do it at
the web server layer, not at the backend application layer.

Yehuda Katz

unread,
Apr 12, 2009, 3:58:51 PM4/12/09
to rack-...@googlegroups.com
The easiest way to do upload progress these days is via Flash on the client-side. That's what I'd do...

-- Yehuda
--
Yehuda Katz
Developer | Engine Yard
(ph) 718.877.1325

Hongli Lai

unread,
Apr 12, 2009, 5:34:03 PM4/12/09
to Rack Development
On Apr 12, 9:58 pm, Yehuda Katz <wyc...@gmail.com> wrote:
> The easiest way to do upload progress these days is via Flash on the
> client-side. That's what I'd do...

Wordpress's Flash uploader causes my browser to lock up half of the
time. I don't know whether Flash is to blame or whether Wordpress's
Flash applet is to blame...

Yehuda Katz

unread,
Apr 12, 2009, 6:01:36 PM4/12/09
to rack-...@googlegroups.com
Is the whole thing in Flash or is it just using a shim to get the "Upload Multiple Files" dialog?

-- Yehuda

candlerb

unread,
Apr 13, 2009, 4:20:44 AM4/13/09
to Rack Development
On Apr 12, 6:15 pm, Hongli Lai <hon...@phusion.nl> wrote:
> The specification currently does not state whether rack.input has to
> be seekable.

The specification also says that three different APIs must be provided
concurrently: each, read and gets. This is another fundamental
problem. Any middleware which touches the body must also provide these
three APIs to the consumer.

The worst of these is 'gets', because a malicious client may force the
webserver to read the entire input into RAM (e.g. by submitting a 2GB
body which doesn't contain a newline)

read(n) isn't quite so bad, because the consumer can specify the
maximum amount to read, and the producer doesn't necessarily have to
provide that much for each chunk. However read(n) and each are
fundamentally different: read(n) is a "pull" (driven by consumer) and
each() is a "push" (driven by producer). It becomes very hard to write
modules which work both ways.

Personally I think that Rack should specify exactly one API for
rack.input. My preference is for each(), since this is simple to
implement both for producers and consumers. This is however orthogonal
to the rewindability question, since you still need to define whether
the consumer can call 'each' only once or more than once.

I think that it should be defined that 'each' can only be called once,
but we can then provide a middleware module which makes the body
rewindable, and in addition provides the 'read' and 'gets' APIs. This
middleware would keep small bodies in RAM but spool larger ones to
disk.

This approach keeps web server adapters and middleware modules simple
but still provides buffered input to those apps which require it. The
disadvantage is that if the webserver itself already provides body
buffering, we would be ignoring this capability. (Do any of them do
this?)

Regards,

Brian.

Hongli Lai

unread,
Apr 13, 2009, 5:53:16 AM4/13/09
to Rack Development
On Apr 13, 10:20 am, candlerb <b.cand...@pobox.com> wrote:
> The specification also says that three different APIs must be provided
> concurrently: each, read and gets. This is another fundamental
> problem. Any middleware which touches the body must also provide these
> three APIs to the consumer.
>
> The worst of these is 'gets', because a malicious client may force the
> webserver to read the entire input into RAM (e.g. by submitting a 2GB
> body which doesn't contain a newline)
>
> read(n) isn't quite so bad, because the consumer can specify the
> maximum amount to read, and the producer doesn't necessarily have to
> provide that much for each chunk. However read(n) and each are
> fundamentally different: read(n) is a "pull" (driven by consumer) and
> each() is a "push" (driven by producer). It becomes very hard to write
> modules which work both ways.
>
> Personally I think that Rack should specify exactly one API for
> rack.input. My preference is for each(), since this is simple to
> implement both for producers and consumers. This is however orthogonal
> to the rewindability question, since you still need to define whether
> the consumer can call 'each' only once or more than once.
>
> I think that it should be defined that 'each' can only be called once,
> but we can then provide a middleware module which makes the body
> rewindable, and in addition provides the 'read' and 'gets' APIs. This
> middleware would keep small bodies in RAM but spool larger ones to
> disk.

The problem with this middleware approach is that app developers can
easily get things wrong. The Facebooker plugin issue is a classic
example: they inserted input consuming middleware above the workaround
middleware. The only reason why Facebooker works in Mongrel is because
the input object happens to be rewindable on Mongrel; it breaks on
Phusion Passenger where the input object is not rewindable.

I think that it's best to force adapter developers to implement
rewindability. If the Rack library provides a utility class which can
make an IO object rewindable, then implementing rewindability should
not be hard for adapter developers at all. Adapter developers who want
to make use of the web server's native buffering capabilities can
still choose to do so.


> This approach keeps web server adapters and middleware modules simple
> but still provides buffered input to those apps which require it. The
> disadvantage is that if the webserver itself already provides body
> buffering, we would be ignoring this capability. (Do any of them do
> this?)

All major Ruby-serving web servers do this. Phusion Passenger,
Mongrel, Thin, they all support body buffering.

Christian Neukirchen

unread,
Apr 13, 2009, 6:22:02 AM4/13/09
to rack-...@googlegroups.com
candlerb <b.ca...@pobox.com> writes:

> The worst of these is 'gets', because a malicious client may force the
> webserver to read the entire input into RAM (e.g. by submitting a 2GB
> body which doesn't contain a newline)

Do we use gets at all...?

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

Hongli Lai

unread,
Apr 13, 2009, 7:17:13 AM4/13/09
to Rack Development
On Apr 13, 12:22 pm, Christian Neukirchen <chneukirc...@gmail.com>
wrote:
> candlerb <b.cand...@pobox.com> writes:
> > The worst of these is 'gets', because a malicious client may force the
> > webserver to read the entire input into RAM (e.g. by submitting a 2GB
> > body which doesn't contain a newline)
>
> Do we use gets at all...?

I don't know any middleware which uses gets. Given the context of web
applications it's quite useless: normal POST data is read with read(env
['Content-Length'].to_i) (assuming a naive implementation), and
multipart POST data is parsed block-by-block instead of line-by-line.
Reply all
Reply to author
Forward
0 new messages