WebOb: bug in req.body vs. req.POST?


Jon Nelson

Aug 28, 2009, 4:17:27 PM
to Paste Users
I think I've found a bug in WebOb.
Depending on which is accessed first - either req.body or req.POST,
the contents of req.body are modified.
Example:

>>> from webob import Request
>>> req = Request.blank('/', method='POST')
>>> req.body = "<empty/>"
>>> req.body
'<empty/>'
>>> req.POST
MultiDict([('<empty/>', '')])
>>> req.body
'%3Cempty%2F%3E='
>>>

After assignment, I check req.body.
It's good.
Then I access req.POST, which is also fine.
But then I access req.body again, and it's been altered!

This is not expected!

--
Jon

Jon Nelson

Sep 2, 2009, 8:23:59 AM
to Paste Users

ping?

--
Jon

Sergey Schetinin

Sep 2, 2009, 8:43:33 AM
to Jon Nelson, Paste Users
I don't think this is a problem for form submissions, and when POSTing
XML this behavior is not present:

>>> from webob import Request
>>> req = Request.blank('/', method='POST')
>>> req.body = "<empty/>"
>>> req.content_type = 'application/xml'
>>> print req.POST
<NoVars: Not an HTML form submission (Content-Type: application/xml)>
>>> print req.body
<empty/>

--
Best Regards,
Sergey Schetinin

http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter

Jon Nelson

Sep 2, 2009, 9:09:06 AM
to Paste Users
In my "real" application it certainly was a problem.
I accessed req.POST *first*, and then req.body, and had issues.
When I accessed req.body first, the code worked as expected.
I'll try to work up a better test case, but it really doesn't seem
right to me that accessing req.POST *alters* req.body (presumably,
this depends on the content type).
--
Jon

Jon Nelson

Sep 2, 2009, 9:24:40 AM
to Paste Users
On Wed, Sep 2, 2009 at 7:43 AM, Sergey Schetinin<mal...@gmail.com> wrote:
> I don't think this is a problem for form submissions, and when POSTing
> XML this behavior is not present:

Even when I am using a content-type of application/x-www-form-urlencoded,
if I access req.body *before* req.POST, body looks like this:

'%3Cempty%2F%3E%0A%0A'

if I merely access req.POST, req.body becomes this:

'%3Cempty%2F%3E%0A%0A='

The former was supplied by curl. The fact that webob alters req.body
regardless of the content type, and that it does so after accessing
req.POST, certainly violates the principle of least surprise.


--
Jon

Sergey Schetinin

Sep 2, 2009, 10:43:01 AM
to Jon Nelson, Paste Users

Wait, you're saying that you're using the
'application/x-www-form-urlencoded' content-type, right? And the body
is not a valid urlencoded string, correct? What accessing .POST does
is normalize the request body, which is arguably the right
thing to do. This can be a problem if you don't set the correct
content-type on requests, but I'm not sure why it would be a problem
otherwise.

I think you're wrong in saying "webob alters req.body regardless of
the content type".

Ian Bicking

Sep 2, 2009, 10:55:21 AM
to Jon Nelson, Paste Users
Well, looking at the code, it looks like all request bodies are treated as urlencoded when the content type is application/x-www-form-urlencoded, or multipart/form-data, or empty.  Empty in particular is probably causing the problem here.

When req.POST is processed, it doesn't keep req.body, but instead reconstructs it on-demand from the post variables.  In the case of what you are doing, the request body isn't a set of urlencoded variables, so the reconstruction doesn't work.

I don't know of any way to detect that the body isn't properly urlencoded, so that there is a real semantic change when you reconstruct it.  I don't want to preserve the body in all cases, as this requires a temporary file or other caching.
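
A minimal sketch of the round trip described above, using only Python 3's standard library rather than WebOb itself. It assumes the reconstruction amounts to parsing the body as urlencoded form fields and serializing them back; `roundtrip_form_body` is a hypothetical name for illustration, not a WebOb function.

```python
from urllib.parse import parse_qsl, urlencode


def roundtrip_form_body(raw_body):
    """Parse raw_body as urlencoded form fields, then serialize them back.

    Hypothetical helper: it only models the parse-then-rebuild behaviour
    discussed in this thread and is not WebOb's actual code.
    """
    # keep_blank_values=True keeps fields that have a name but no value,
    # which is how a bare "<empty/>" becomes the pair ('<empty/>', '').
    fields = parse_qsl(raw_body, keep_blank_values=True)
    return fields, urlencode(fields)


fields, rebuilt = roundtrip_form_body("<empty/>")
print(fields)   # [('<empty/>', '')]
print(rebuilt)  # %3Cempty%2F%3E=
```

The rebuilt string matches the altered req.body from the original report: the XML fragment is read as a field name with an empty value, so it comes back percent-encoded with a trailing '='.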

Jon Nelson

Sep 2, 2009, 11:46:58 AM
to Paste Users
On Wed, Sep 2, 2009 at 9:43 AM, Sergey Schetinin<mal...@gmail.com> wrote:
> 2009/9/2 Jon Nelson <jne...@jamponi.net>:
>>
>> On Wed, Sep 2, 2009 at 7:43 AM, Sergey Schetinin<mal...@gmail.com> wrote:
>>> I don't think this is a problem for form submissions, and when POSTing
>>> XML this behavior is not present:
>>
>> Even when I am using a content-type of application/x-www-form-urlencoded,
>> if I access req.body *before* req.POST, body looks like this:
>>
>> '%3Cempty%2F%3E%0A%0A'
>>
>> if I merely access req.POST, req.body becomes this:
>>
>> '%3Cempty%2F%3E%0A%0A='
>>
>> The former was supplied by curl. The fact that webob alters req.body
>> regardless of the content type, and that it does so after accessing
>> req.POST, certainly violates the principle of least surprise.
>
> Wait, you're saying that you're using the
> 'application/x-www-form-urlencoded' content-type, right? And the body
> is not a valid urlencoded string, correct? What accessing .POST does
> is normalize the request body, which is arguably the right
> thing to do. This can be a problem if you don't set the correct
> content-type on requests, but I'm not sure why it would be a problem
> otherwise.

No. In some cases I'm using that content type, and sometimes the body
matches that content type and other times it doesn't. However, in
*both* cases accessing req.POST *changes* req.body.

For example, given:

content-type: application/x-www-form-urlencoded
body: properly URL-encoded content

accessing req.POST does actually change req.body

and when given:

content-type: application/x-www-form-urlencoded
body: content that has *not* been URL-encoded (but is not invalid, either)

accessing req.POST renders req.body into a url-encoded version.

Part of the problem is that content-type is not necessarily correct.
All sorts of clients do all sorts of broken stuff, and I found that
when accessing req.POST, req.body changed, and this was very much a
surprise to me. Accessing (but not changing) req.POST I would have
treated as a "read" operation (currying and memoization aside) but I
got a "write" operation as an unexpected side-effect.

Absolutely it can be argued that accessing req.POST normalizes the
body (using the content type header to do so) but it still violates
the principle of least surprise.

> I think you're wrong in saying "webob alters req.body regardless of
> the content type".

It appears to do so, however. In the example I posted content still
changed, even when the content was url-encoded and the header was set
appropriately - note the addition of the trailing '='. It's
irrelevant if the /decoded/ content is the same, a byte comparison of
req.body /before/ and req.body /after/ still shows that they are
different.

Another way to say it might be that accessing req.POST will rewrite req.body.


--
Jon
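
The byte-level difference described in this message can be illustrated with the standard library alone. This is a sketch that assumes the rebuild is a parse followed by a re-encode; it does not call WebOb, and the variable names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

# The body curl sent in the report above: percent-encoded, with no '='.
before = "%3Cempty%2F%3E%0A%0A"

# Model of the rebuild step (an assumption, not WebOb's code).
after = urlencode(parse_qsl(before, keep_blank_values=True))

# Both strings decode to the same single field with an empty value...
same_fields = (parse_qsl(before, keep_blank_values=True)
               == parse_qsl(after, keep_blank_values=True))

print(after)            # %3Cempty%2F%3E%0A%0A=
print(same_fields)      # True
# ...but a byte-for-byte comparison still shows a difference.
print(before == after)  # False
```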

Ian Bicking

Sep 2, 2009, 2:28:00 PM
to Jon Nelson, Paste Users
On Wed, Sep 2, 2009 at 11:46 AM, Jon Nelson <jne...@jamponi.net> wrote:
> No. In some cases I'm using that content type, and sometimes the body
> matches that content type and other times it doesn't. However, in
> *both* cases accessing req.POST *changes* req.body.
>
> For example, given:
>
> content-type: application/x-www-form-urlencoded
> body: properly URL-encoded content
>
> accessing req.POST does actually change req.body
>
> and when given:
>
> content-type: application/x-www-form-urlencoded
> body: content that has *not* been URL-encoded (but is not invalid, either)
>
> accessing req.POST renders req.body into a url-encoded version.


Yes, in both cases it will normalize the request body.  In the first case, probably without any real harm, in the second probably corrupting the response.
 
> Part of the problem is that content-type is not necessarily correct.
> All sorts of clients do all sorts of broken stuff, and I found that
> when accessing req.POST, req.body changed, and this was very much a
> surprise to me. Accessing (but not changing) req.POST I would have
> treated as a "read" operation (currying and memoization aside) but I
> got a "write" operation as an unexpected side-effect.
>
> Absolutely it can be argued that accessing req.POST normalizes the
> body (using the content type header to do so) but it still violates
> the principle of least surprise.

Well... stuff happens ;)  When the request is invalid in some sense, there's only so much you can promise.  Most other libraries simply make the body inaccessible after parsing the form.  I'm not sure if it is the case, but it *should* be the case that if you access req.body, then it should stay constant, as there's no particular reason with respect to optimization to throw away the cached body; that is, if it's not the case, it's a bug that should be fixed.  If you access req.POST before req.body, I don't think WebOb can promise to keep the original body.
 
>> I think you're wrong in saying "webob alters req.body regardless of
>> the content type".
>
> It appears to do so, however.  In the example I posted content still
> changed, even when the content was url-encoded and the header was set
> appropriately - note the addition of the trailing '='.  It's
> irrelevant if the /decoded/ content is the same, a byte comparison of
> req.body /before/ and req.body /after/ still shows that they are
> different.
>
> Another way to say it might be that accessing req.POST will rewrite req.body.

What Sergey is saying, which I believe is true, is that req.body won't change if you have a request content type that can't be confused for a form submission (the types that can be confused are an empty content type or one of the two form content types).
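
The two points above (the content type decides whether the body is rewritten, and the original body survives only if it was captured beforehand) can be sketched with a toy model. `ToyRequest` and `FORM_LIKE_TYPES` are hypothetical names invented for this illustration; the class is not WebOb's implementation, it simply drops the raw body once the form is parsed and rebuilds it on demand. For brevity it handles only the empty and urlencoded content types, not multipart/form-data.

```python
from urllib.parse import parse_qsl, urlencode

# Content types this toy model treats as a form submission (assumption).
FORM_LIKE_TYPES = ("", "application/x-www-form-urlencoded")


class ToyRequest:
    """Toy model of the behaviour discussed here; NOT WebOb's code."""

    def __init__(self, body, content_type=""):
        self.content_type = content_type
        self._raw = body
        self._fields = None

    @property
    def body(self):
        if self._raw is None:
            # The raw body was dropped, so rebuild it from the fields.
            self._raw = urlencode(self._fields)
        return self._raw

    @property
    def POST(self):
        if self.content_type not in FORM_LIKE_TYPES:
            return None  # not a form: the body is left untouched
        if self._fields is None:
            self._fields = parse_qsl(self._raw, keep_blank_values=True)
            self._raw = None  # the original body is not kept
        return self._fields


req = ToyRequest("<empty/>")       # empty content type
original = req.body                # snapshot before touching POST
req.POST
print(original)                    # <empty/>
print(req.body)                    # %3Cempty%2F%3E=

xml_req = ToyRequest("<empty/>", content_type="application/xml")
xml_req.POST
print(xml_req.body)                # <empty/>
```

In application code the equivalent precaution would be to read and store the body before the form is parsed, or to set an accurate content type on non-form requests.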
 

Jon Nelson

Sep 8, 2009, 4:30:38 PM
to Paste Users
On Wed, Sep 2, 2009 at 1:28 PM, Ian Bicking<ianbi...@gmail.com> wrote:
> On Wed, Sep 2, 2009 at 11:46 AM, Jon Nelson <jne...@jamponi.net> wrote:
>>
>> No. In some cases I'm using that content type, and sometimes the body
>> matches that content type and other times it doesn't. However, in
>> *both* cases accessing req.POST *changes* req.body.
>>
>> For example, given:
>>
>> content-type: application/x-www-form-urlencoded
>> body: properly URL-encoded content
>>
>> accessing req.POST does actually change req.body
>>
>> and when given:
>>
>> content-type: application/x-www-form-urlencoded
>> body: content that has *not* been URL-encoded (but is not invalid, either)
>>
>> accessing req.POST renders req.body into a url-encoded version.
>>
>
> Yes, in both cases it will normalize the request body.  In the first case,
> probably without any real harm, in the second probably corrupting the
> response.

> Well... stuff happens ;)  When the request is invalid in some sense, there's
> only so much you can promise.  Most other libraries simply make the body
> inaccessible after parsing the form.  I'm not sure if it is the case, but it
> *should* be the case that if you access req.body, then it should stay
> constant, as there's no particular reason with respect to optimization to
> throw away the cached body; that is, if it's not the case, it's a bug that
> should be fixed.  If you access req.POST before req.body, I don't think
> WebOb can promise to keep the original body.

It seems to me, then, that perhaps it's a documentation issue.
Maybe req.body should have a big fat caveat which states that
accessing req.POST will always normalize/re-write req.body /according
to the currently set content-type/ and that req.body is "raw". An
alternative might be to say that setting req.body *before*
req.content_type (if it is not already set) is bad/deprecated/a
warning, and then *when* req.body is set it is encoded using the
current content-type (if encoding is appropriate). I'm not sold on
that idea, either, but it seems to me that the disconnect is between
req.body being /sometimes/ treated as though it is raw, and other
times encoded - depending on the order of access between .POST and
.body (and the value of .content_type). Perhaps this is an area that
could use some clarification, at least when it comes to the documentation?

>> Another way to say it might be that accessing req.POST will rewrite
>> req.body

... if it hasn't been re-written already.
Unless you are saying that every time I access req.POST it
normalizes/re-writes req.body, in which case that seems inefficient.

> What Sergey is saying, which I believe is true, is that req.body won't
> change if you have a request content type that can't be confused for a form
> submission (the types that can be confused are an empty content type or one
> of the two form content types).

That seems to be the key here.

Thanks for the replies! I think this bears thinking about, however.
The interaction between .POST, .body, and .content_type seems like a
bit of a booby trap.

--
Jon
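
One way an application might sidestep the interaction between .POST, .body, and .content_type is to decide explicitly, from the declared content type, whether to parse form fields at all. The sketch below uses only the standard library; `read_payload` and `URLENCODED` are hypothetical names, and treating an empty content type as raw data is a design choice of this example, not WebOb's behaviour.

```python
from urllib.parse import parse_qsl

# Explicit opt-in: only this media type is parsed as a form here.
URLENCODED = "application/x-www-form-urlencoded"


def read_payload(content_type, raw_body):
    """Return ('form', fields) or ('raw', raw_body). Hypothetical helper."""
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type == URLENCODED:
        return "form", parse_qsl(raw_body, keep_blank_values=True)
    # Anything else, including a missing content type, is left as-is.
    return "raw", raw_body


print(read_payload("application/x-www-form-urlencoded; charset=UTF-8", "a=1&b="))
# ('form', [('a', '1'), ('b', '')])
print(read_payload("", "<empty/>"))
# ('raw', '<empty/>')
```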
