On Apr 11, 11:48 pm, Pavel Kunc <pavel.k...@gmail.com> wrote:
> Multipart POST will arrive as rack.input which is an String::IO.
Big POST request bodies could arrive as a File. I don't think we should
assume anything but the interface specified by Rack.
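
To make that concrete, here is a minimal sketch that touches the body
only through #read and #rewind, so it does not care whether it gets a
StringIO, a File or something else. read_body is a hypothetical helper
name, and it assumes the input is rewindable, which is an assumption
about the Rack version in use.

```ruby
require 'stringio'

# Hypothetical helper: slurp the request body using only methods from
# the Rack input interface (#read, #rewind). Assumes the input is
# rewindable.
def read_body(input)
  input.rewind
  body = input.read   # whole body as one string of raw bytes
  input.rewind        # leave the stream usable for later consumers
  body
end

# Works the same for any object that responds to #read and #rewind.
body = read_body(StringIO.new("a=1&b=2"))
puts body.bytesize
```
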
> I'm not that sure that changing the whole rack.input encoding to
> Encoding.default_external would not break things. This was actually
> solution in the Rack Lighthouse which was dropped due to this fear.
>
> Also in theory each part of the Multipart post can have different
> encoding AFAIK which makes things even more complicated. Or am I wrong
> here?
Probably not. This is The Web where everything is more complicated.
There are also two rather orthogonal concepts: encoding and character
sets. The encoding (the Content-Transfer-Encoding header of a message
part) only specifies the wire format for transmission; the character
set says how the decoded bytes map to text.
Anyway, looking at Thin, rack.input contains raw request data; at
least it should since it doesn't do any fancy interpretation of the
Content-Type header(s). So as I read it, we actually need to
*reinterpret* (as opposed to *convert*) incoming data based on
* each part's Content-Transfer-Encoding header
* each part's Content-Encoding header (yay)
* the type and character set specified in the part's Content-Type
header (defaulting to US-ASCII text/plain)
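
A rough sketch of what "reinterpret, not convert" could look like for
a single part. decode_part is a hypothetical helper, not anything in
Rack or Merb, and it assumes a Ruby with String#force_encoding and
keyword arguments. It undoes the transfer encoding first, then only
relabels the bytes with the declared charset. Content-Encoding (gzip
and friends) is left out to keep it short.

```ruby
# Hypothetical helper, not part of Rack or Merb.
# Step 1: undo the Content-Transfer-Encoding (wire format).
# Step 2: tag the resulting bytes with the part's charset, without
#         changing a single byte (reinterpretation, not conversion).
def decode_part(raw, transfer_encoding: "7bit", charset: "US-ASCII")
  bytes =
    case transfer_encoding.to_s.downcase
    when "base64"           then raw.unpack("m").first
    when "quoted-printable" then raw.unpack("M").first
    else raw.dup            # 7bit, 8bit, binary: bytes are already literal
    end
  bytes.force_encoding(charset)
end

text = decode_part("Y2Fmw6k=", transfer_encoding: "base64", charset: "UTF-8")
puts text.encoding          # the label changed, the bytes did not
puts text.valid_encoding?

# Conversion, by contrast, produces different bytes:
latin1 = text.encode("ISO-8859-1")
puts latin1.bytesize
```

force_encoding is the reinterpretation step; encode is the conversion
we probably do not want to apply to rack.input as a whole.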
Effectively, we should probably consider the content of rack.input as
byte soup with 7bit-clean delimiters. It might still be interesting to
push the processing into Merb::Application and create a merb.input
environment part with more convenient semantics.
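
Something along these lines, purely as a sketch. MerbInput is a
made-up wrapper class and the "merb.input" key does not exist
anywhere yet. It handles only the simple non-multipart case, takes the
charset from CONTENT_TYPE, and leaves rack.input itself untouched.

```ruby
require 'stringio'

# Hypothetical Rack-style wrapper: leaves rack.input as raw bytes and
# adds a "merb.input" entry tagged with the declared charset.
class MerbInput
  def initialize(app)
    @app = app
  end

  def call(env)
    input = env["rack.input"]
    raw = input.read
    input.rewind   # assumes a rewindable input

    # Pull charset=... out of the Content-Type, defaulting to US-ASCII.
    name = env["CONTENT_TYPE"].to_s[/charset="?([^;"\s]+)/i, 1] || "US-ASCII"
    encoding = begin
      Encoding.find(name)
    rescue ArgumentError
      Encoding::BINARY   # unknown charset: keep it as byte soup
    end

    env["merb.input"] = raw.force_encoding(encoding)
    @app.call(env)
  end
end

app = MerbInput.new(lambda { |env| env["merb.input"].encoding })
env = {
  "rack.input"   => StringIO.new("caf\xC3\xA9".b),
  "CONTENT_TYPE" => "text/plain; charset=utf-8"
}
puts app.call(env)
```
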