Webob 1.0.3 and infinite body_file reads

134 views
Skip to first unread message

tl...@kptpf.com

unread,
Mar 9, 2011, 9:45:56 PM3/9/11
to Paste Users
from webob import Request

req = Request.blank('/', body='infinite')
print repr(req.body_file.read())
print repr(req.body_file.read())
body_file = req.body_file
print repr(body_file.read())
print repr(body_file.read())
print repr(body_file.read())

-------------

Webob 1.0.3 outputs:

'infinite'
'infinite'
'infinite'
''
''

--------------

Whereas Webob 1.0-1~lucid2 outputs:

'infinite'
''
''
''
''

Sergey Schetinin

unread,
Mar 10, 2011, 1:27:02 AM3/10/11
to tl...@kptpf.com, Paste Users
Access to body_file property makes it seekable and seeks it to 0.
Previously all accesses to it should have been preceded with a manual
call to make_body_seekable and a seek to 0 and whenever that was not
done the app would be fragile and would work or not depending on what
kind of input stream was passed into it (usually there's some
middleware that makes the body seekable, so it did work most of the
time) or it might break the apps down the stack.

In other words, the behavior you've demonstrated is intentional. If
you want to get wsgi.input as is, use req.body_file_raw. Or if you do
want the seekable body at pos 0, but want to make multiple small
reads, use f = req.body_file; f.read(1)... Things like
lxml.etree.parse(req.body_file) work as expected anyway -- no change
there.

> --
> You received this message because you are subscribed to the Google Groups "Paste Users" group.
> To post to this group, send email to paste...@googlegroups.com.
> To unsubscribe from this group, send email to paste-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/paste-users?hl=en.
>
>

--
Best Regards,
Sergey Schetinin

http://self.maluke.com/ -- My articles and open-source stuff
http://www.maluke.com/ -- My commercial software and custom development services

tl...@kptpf.com

unread,
Mar 10, 2011, 8:55:59 AM3/10/11
to Paste Users
So how does this work with a 5G body? Or should we now stop using
body_file if there's a chance of that?
> > For more options, visit this group athttp://groups.google.com/group/paste-users?hl=en.
>
> --
> Best Regards,
> Sergey Schetinin
>
> http://self.maluke.com/-- My articles and open-source stuffhttp://www.maluke.com/-- My commercial software and custom development services

Sergey Schetinin

unread,
Mar 10, 2011, 9:15:37 AM3/10/11
to tl...@kptpf.com, Paste Users
If you need to gradually read that file and you know it will not need
to be reused, you can just use req.body_file_raw.

> For more options, visit this group at http://groups.google.com/group/paste-users?hl=en.
>
>

--
Best Regards,
Sergey Schetinin

http://self.maluke.com/ -- My articles and open-source stuff
http://www.maluke.com/ -- My commercial software and custom development services

Sergey Schetinin

unread,
Mar 10, 2011, 9:23:50 AM3/10/11
to tl...@kptpf.com, Paste Users
Man.. that does break the openstack swift. Both swift.obj.server and
swift.proxy.server use req.body_file to read the possibly huge input
in chunks. Keeping / reverting to previous semantics of body_file
would make a lot of apps revert to being fickle again. But those must
have accumulated the necessary crutches by now. Hmm...

tl...@kptpf.com

unread,
Mar 10, 2011, 9:25:25 AM3/10/11
to Paste Users
Or just use body_file_raw to begin with? I'm not trying to
argumentative here just for the sake of it. :) Just trying to
understand how body_file works now. If someone sends me a 5G chunked
transfer request and I get body_file and start reading it (in chunks,
i.e. read(65536)), will I eventually use up 5G of memory just for the
off chance I might want to seek?
> >>http://self.maluke.com/--My articles and open-source stuffhttp://www.maluke.com/--My commercial software and custom development services

tl...@kptpf.com

unread,
Mar 10, 2011, 9:27:33 AM3/10/11
to Paste Users
Ah, we're cross-emailing each other. :) No problem, we can switch
Swift to use body_file_raw. I'm just worried about what might happen
to other Webob folks. There could be easy threats where an attacker
gobbles up all memory.
> >>>http://self.maluke.com/--My articles and open-source stuffhttp://www.maluke.com/--My commercial software and custom development services
>
> >> --
> >> You received this message because you are subscribed to the Google Groups "Paste Users" group.
> >> To post to this group, send email to paste...@googlegroups.com.
> >> To unsubscribe from this group, send email to paste-users...@googlegroups.com.

Sergey Schetinin

unread,
Mar 10, 2011, 9:55:44 AM3/10/11
to tl...@kptpf.com, Paste Users
Making the body seekable (by accessing .body_file for example) means
reading all of the input first, but that doesn't mean it will be
stored in memory. If it is known (by means of content-length) to be
over 10K it will be saved to a tempfile. If the content-length is not
present, it will be read into memory and then flushed to temp file if
it's bigger than the 10K. The 10K limit can be configured by setting
req.request_body_tempfile_limit. Of course this is only done once, the
tempfile will not be reread when .body_file is accessed again.

The req.body_file_raw exposes environ['wsgi.input'] directly, so
reading from it will not add any additional processing, prereading
etc. The reason for making sure body_file is seekable by default is to
help the middleware play nice without additional work. Great to hear
that this change isn't too bad for the Swift.

Looking at this as a vector of attack, I imagine any app that uses cgi
module to parse the form data without first checking the body length
can be easily tricked into filling up the disk with tempfiles. And
given the use of readline, the memory as well. So I don't think this
makes the situation any worse.

> For more options, visit this group at http://groups.google.com/group/paste-users?hl=en.
>
>

--
Best Regards,
Sergey Schetinin

http://self.maluke.com/ -- My articles and open-source stuff
http://www.maluke.com/ -- My commercial software and custom development services

tl...@kptpf.com

unread,
Mar 10, 2011, 11:19:53 AM3/10/11
to Paste Users
There was no webob.Request.body_file_raw in 1.0.1. I guess we'll start
using webob.Request.environ['wsgi.input']. :/

Sergey Schetinin

unread,
Mar 10, 2011, 11:57:58 AM3/10/11
to tl...@kptpf.com, Paste Users
Another option is to subclass the Request and add the "body_file_raw =
environ_getter('wsgi.input')" line and possibly commonly used utility
methods such as req.body_file_iterator(chunk_size) etc.

On 10 March 2011 18:19, tl...@kptpf.com <tl...@kptpf.com> wrote:
> There was no webob.Request.body_file_raw in 1.0.1. I guess we'll start
> using webob.Request.environ['wsgi.input']. :/
>

Sergey Schetinin

unread,
Mar 10, 2011, 12:10:55 PM3/10/11
to tl...@kptpf.com, Paste Users
BTW, I wonder why requiring 1.0.3 or newer is not an option?

Sergey Schetinin

unread,
Mar 10, 2011, 2:26:43 PM3/10/11
to tl...@kptpf.com, Paste Users
After thinking about this, I have to admit such a change in semantics
was a bad idea. I've rearranged the code so that req.body_file
semantics is back, there's a new req.body_file_seekable, and the
req.get_response seeks the body back to 0. This way compatibility
should be preserved but the middleware has less of a chance to corrupt
input stream and there's a body_file_seekable for anyone who wants
that.

If there are no objections to this, I'll release this as 1.0.4 in a few days.

Reply all
Reply to author
Forward
0 new messages