Fwd: request.copy() != request, webob bug??

Jorge Vargas

Feb 27, 2009, 11:58:45 AM

to paste...@googlegroups.com

I'm sending this to the paste list because I believe it's a bug in WebOb.

I also opened a ticket in the TG trac and committed a temporary patch for
it: http://trac.turbogears.org/ticket/2241

Thoughts?

---------- Forwarded message ----------
From: Jorge Vargas <jorge....@gmail.com>
Date: Tue, Feb 24, 2009 at 8:10 AM
Subject: request.copy() != request, webob bug??
To: turbogea...@googlegroups.com

Hey,

After spending several hours trying to figure out why every method in
codemill's hg was working except push, I noticed that CONTENT_LENGTH was
getting set to -1. A couple of minutes later I ran request==new_req in
weberror and, to my surprise, it returned False. Even better,
request.copy()==request returns False!

While still in shock from all this, I sat down and wrote 3 unit tests: one
for TG, one for Pylons, and one for plain WebOb. To my surprise, all of
them are failing! So what's wrong? How come copy doesn't copy, even though
the WebOb docs say it does
(http://pythonpaste.org/webob/reference.html#modifying-the-request)? Why
am I getting a Content-Length of -1?

Still skeptical about this, I went to the WSGIAppController and added
+ new_req.environ['CONTENT_LENGTH']=request.environ['CONTENT_LENGTH']

and now I can push! Does anyone know what the heck is going on?

patch with the tests
http://paste.chrisarndt.de/paste/dbd7e1ff954542caa647ef65d6df97cc

Ian Bicking

Mar 5, 2009, 2:07:47 PM

to Jorge Vargas, paste...@googlegroups.com

I'm a little confused by this. Why should request.copy() == request?
WebOb doesn't define any __eq__, so it's just object identity. Or is
it the -1 CONTENT_LENGTH that's throwing you off?
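
The point about default equality can be illustrated with a minimal sketch. PlainRequest is a hypothetical stand-in class, not WebOb's Request; it only assumes, as stated above, that no __eq__ is defined, so == falls back to object identity:

```python
class PlainRequest:
    """Hypothetical stand-in for a request object that defines no __eq__."""

    def __init__(self, environ):
        self.environ = environ

    def copy(self):
        # Copy the environ dict so the new object shares no mutable state.
        return PlainRequest(dict(self.environ))


req = PlainRequest({"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "14"})
dup = req.copy()

same_object = req == req                      # same object, so True
copies_equal = req == dup                     # distinct objects, so False
environs_equal = req.environ == dup.environ   # dicts compare by value, so True
```

Comparing the environ dicts (or individual keys) is the way to check whether a copy preserved what you care about.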

--
Ian Bicking | http://blog.ianbicking.org

Jorge Vargas

Mar 7, 2009, 3:48:10 AM

to Ian Bicking, paste...@googlegroups.com

On Thu, Mar 5, 2009 at 3:07 PM, Ian Bicking <ia...@colorstudy.com> wrote:
> I'm a little confused by this. Why should request.copy() == request?
> WebOb doesn't define any __eq__, so it's just object identity. Or is
> it the -1 CONTENT_LENGTH that's throwing you off?
>

hi,

Well yes, it's the content_length. Why is it reset to -1? Shouldn't it
be whatever the original content_length was?

As for the ==, isn't .copy supposed to produce an exact copy of
the object, just at a different memory location? I guess when I said ==
I wasn't talking specifically about the Python notion of copy, but rather
the fact that the copy is not equal because the content_length is
different.

Ian Bicking

Mar 7, 2009, 12:02:30 PM

to Jorge Vargas, paste...@googlegroups.com

On Sat, Mar 7, 2009 at 2:48 AM, Jorge Vargas <jorge....@gmail.com> wrote:
> On Thu, Mar 5, 2009 at 3:07 PM, Ian Bicking <ia...@colorstudy.com> wrote:
>> I'm a little confused by this. Why should request.copy() == request?
>> WebOb doesn't define any __eq__, so it's just object identity. Or is
>> it the -1 CONTENT_LENGTH that's throwing you off?
>>
> hi,
>
> well yes it's the content_length why it is reset to -1? shouldn't it
> be whatever the original content_lenght was?

Once you read the POST variables, WebOb doesn't keep the request body.
Instead it will reproduce the body from the POST variables. This can
change the length, since there isn't a canonical encoding for POST
variables (you can %-encode any value). The -1 tells libraries to
read the body in its entirety, instead of relying on a specific
length.
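
A small sketch of why re-encoding can change the length, using Python 3's urllib.parse rather than WebOb's own code. The form body is a made-up example in which the client %-encoded plain letters:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form body in which a client chose to %-encode plain letters.
original_body = "name=%41%42%43"

fields = parse_qsl(original_body)   # decoded variables: [('name', 'ABC')]
rebuilt_body = urlencode(fields)    # re-encoded form: 'name=ABC'

original_length = len(original_body)  # 14
rebuilt_length = len(rebuilt_body)    # 8
```

Both bodies decode to the same variables, but the original CONTENT_LENGTH of 14 would be wrong for the rebuilt body.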

> as for the == isn't .copy supposed to produce an exact same copy of
> the object? but with a different memory space? I guess when I said ==
> I wasn't talking specifically about the python nothing of copy rather
> the fact that the copy is not equal because the content_lenght is
> different.

During the copy process WebOb makes the body concrete again, which
gives the copied object a specific content_length.

Jorge Vargas

Mar 12, 2009, 1:39:02 AM

to Ian Bicking, paste...@googlegroups.com

On Sat, Mar 7, 2009 at 1:02 PM, Ian Bicking <ia...@colorstudy.com> wrote:
> On Sat, Mar 7, 2009 at 2:48 AM, Jorge Vargas <jorge....@gmail.com> wrote:
>> On Thu, Mar 5, 2009 at 3:07 PM, Ian Bicking <ia...@colorstudy.com> wrote:
>>> I'm a little confused by this. Why should request.copy() == request?
>>> WebOb doesn't define any __eq__, so it's just object identity. Or is
>>> it the -1 CONTENT_LENGTH that's throwing you off?
>>>
>> hi,
>>
>> well yes it's the content_length why it is reset to -1? shouldn't it
>> be whatever the original content_lenght was?
>
> Once you read the POST variables, WebOb doesn't keep the request body.
> Instead it will reproduce the body from the POST variables. This can
> change the length, since there isn't a canonical encoding for POST
> variables (you can %-encode any value). The -1 tells libraries to
> read the body in its entirety, instead of relying on a specific
> length.

I see, so from this it seems my patch is wrong. (see below)

>
>> as for the == isn't .copy supposed to produce an exact same copy of
>> the object? but with a different memory space? I guess when I said ==
>> I wasn't talking specifically about the python nothing of copy rather
>> the fact that the copy is not equal because the content_lenght is
>> different.
>
> During the copy process WebOb makes the body concrete again, which
> gives the copied object a specific content_length.
>

Right. Well, for some reason this isn't working for me inside
TG/Pylons, as you can see from this patch:
http://paste.chrisarndt.de/paste/dbd7e1ff954542caa647ef65d6df97cc
The explanation is here: http://trac.turbogears.org/ticket/2241

The relevant class is a helper we added to mount WSGIapps in a sane
simple way inside TurboGears object dispatch controller and allow our
auth system to protect them
http://trac.turbogears.org/browser/trunk/tg/controllers.py?rev=6442#L756

I know the code is a bit hairy but this is currently an experiment.
improvements are welcome :)

Ian Bicking

Mar 12, 2009, 12:20:16 PM

to Jorge Vargas, paste...@googlegroups.com

On Thu, Mar 12, 2009 at 12:39 AM, Jorge Vargas <jorge....@gmail.com> wrote:
>> During the copy process WebOb makes the body concrete again, which
>> gives the copied object a specific content_length.
>>
> right, well for some reason this isn't working for me, inside
> TG/pylons as you can see from this patch.
> http://paste.chrisarndt.de/paste/dbd7e1ff954542caa647ef65d6df97cc and
> the explanation here http://trac.turbogears.org/ticket/2241
>
> The relevant class is a helper we added to mount WSGIapps in a sane
> simple way inside TurboGears object dispatch controller and allow our
> auth system to protect them
> http://trac.turbogears.org/browser/trunk/tg/controllers.py?rev=6442#L756
>
> I know the code is a bit hairy but this is currently an experiment.
> improvements are welcome :)

Well, I guess I'm wondering why you want the request objects to be
equal? I don't think they will be equal, or even can reasonably be
equal, so avoiding that requirement would be best.

Jorge Vargas

Mar 12, 2009, 10:46:38 PM

to Ian Bicking, paste...@googlegroups.com

I don't want them to be equal. That was a bad assumption on my part.

I want it to maintain the content_length, which for some reason is
being lost. It is the only element that is not equal.

That said, this thread gave me the clue that maybe my
downstream application is not respecting content_length = -1 and
re-reading the body, and is instead generating an error.

Graham Dumpleton

Mar 12, 2009, 11:21:25 PM

to Paste Users

On Mar 8, 4:02 am, Ian Bicking <i...@colorstudy.com> wrote:
> On Sat, Mar 7, 2009 at 2:48 AM, Jorge Vargas <jorge.var...@gmail.com> wrote:

> > On Thu, Mar 5, 2009 at 3:07 PM, Ian Bicking <i...@colorstudy.com> wrote:
> >> I'm a little confused by this. Why should request.copy() == request?
> >> WebOb doesn't define any __eq__, so it's just object identity. Or is
> >> it the -1 CONTENT_LENGTH that's throwing you off?
>
> > hi,
>
> > well yes it's the content_length why it is reset to -1? shouldn't it
> > be whatever the original content_lenght was?
>
> Once you read the POST variables, WebOb doesn't keep the request body.
> Instead it will reproduce the body from the POST variables. This can
> change the length, since there isn't a canonical encoding for POST
> variables (you can %-encode any value). The -1 tells libraries to
> read the body in its entirety, instead of relying on a specific
> length.

Ian, sorry for jumping in on a running thread here, and an old post at
that, but are you finding that setting CONTENT_LENGTH to -1
to indicate that the application should just read all available input
works in practice and is compatible with libraries that process
request content, such as cgi.FieldStorage?

The reason I ask relates to quite an old discussion on Python WEB-SIG
about having WSGI support chunked request encoding, something that
technically it can't at the moment. The suggestion back then from you
was for CONTENT_LENGTH to be set to -1 to indicate that data of unknown
length was present. So, the same idea you are applying here.

Because I keep getting requests to support chunked request
content, in mod_wsgi 3.0 I allow chunked requests, but it is up
to the application to detect that scenario by looking at the
Transfer-Encoding header and to decide to step outside of
what WSGI says you are supposed to do with respect to handling
wsgi.input. I haven't gone as far as setting CONTENT_LENGTH to -1,
although feasibly I could, given that what happens with WSGI for
chunked request content is undefined as far as I can tell.

So, what would Paste do if it ran on mod_wsgi and was
passed a chunked request, both for the case of CONTENT_LENGTH not being
set but Transfer-Encoding being chunked, and for the case of mod_wsgi
passing a CONTENT_LENGTH of -1, if it was changed to do that?

Graham

Ian Bicking

Mar 13, 2009, 11:47:22 AM

to Jorge Vargas, paste...@googlegroups.com

It should be okay to re-read the body, especially if you just do
environ['wsgi.input'].read(int(environ['CONTENT_LENGTH'])). In
that context, -1 means to read everything. Of course, if your app is
expecting to get len(result) == int(environ['CONTENT_LENGTH']), that
won't be true.
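
A minimal sketch of that read, in Python 3. The read_body helper and the sample environ are hypothetical, and io.BytesIO stands in for a wsgi.input that reports EOF. The -1 convention is the one described in this thread, not something the WSGI spec defines:

```python
import io


def read_body(environ):
    """Read the request body, treating a missing length as 0 and -1 as 'read all'."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # read(-1) on a file-like object reads until EOF; this relies on
    # wsgi.input signalling EOF, which a raw socket may not do.
    return environ["wsgi.input"].read(length)


payload = b"a=1&b=2"
environ = {"CONTENT_LENGTH": "-1", "wsgi.input": io.BytesIO(payload)}
body = read_body(environ)
length_matches = len(body) == int(environ["CONTENT_LENGTH"])
```

Here body holds the full payload, while length_matches is False, which is the check a downstream app might trip over.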

Ian Bicking

Mar 13, 2009, 5:22:30 PM

to Graham Dumpleton, Paste Users

On Thu, Mar 12, 2009 at 10:21 PM, Graham Dumpleton
<Graham.D...@gmail.com> wrote:
> Ian, sorry for jumping in on a running thread here and an old post at
> that, but are you finding that having CONTENT_LENGTH being set to -1
> to indicate that application should just read all available input
> working in practice and compatible with code libraries that process
> request content, such as cgi.FieldStorage.

I'm pretty sure it's consistent with cgi.FieldStorage. I'm really
just using -1 in places where previously things were completely broken
(e.g., wedged sockets because two consumers were reading the input).
So there aren't many regressions, because everything was broken
previously ;) But there are some problems; for instance, there was
just a problem with paste.proxy because it was sending an actual HTTP
header with Content-Length: -1.
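
One way a proxy could avoid leaking the marker onto the wire is sketched below. forwardable_content_length is a hypothetical helper, not the actual paste.proxy fix; it only assumes that a negative CONTENT_LENGTH is an in-process signal and must never be sent as a header:

```python
def forwardable_content_length(environ):
    """Return a Content-Length value safe to send upstream, or None to omit it."""
    raw = environ.get("CONTENT_LENGTH")
    if not raw:
        return None
    try:
        length = int(raw)
    except ValueError:
        return None
    # A negative value is an in-process marker, not a valid HTTP header value.
    return str(length) if length >= 0 else None


marker_result = forwardable_content_length({"CONTENT_LENGTH": "-1"})
normal_result = forwardable_content_length({"CONTENT_LENGTH": "12"})
```

When the helper returns None, the proxy would still have to read the body and measure it (or use chunked transfer) before forwarding.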

> The reason I ask relates to quite an old discussion on Python WEB-SIG
> about having WSGI support chunked request encoding, something that
> technically it cant at the moment. The suggestion back then from you
> was for CONTENT_LENGTH to be set to -1 to indicate that unknown length
> data was present. So, same idea you are applying here.
>
> Because I keep having requests about supporting chunked request
> content, in mod_wsgi 3.0 I allow chunked requests, but it would be up
> to application to detect that scenario by looking at Transfer-Encoding
> header to know that is what was supplied and decide to step outside of
> what WSGI says you are supposed to do in respect of handling
> wsgi.input. I haven't gone as far as setting CONTENT_LENGTH to -1,
> although feasibly I could given that what happens with WSGI for
> chunked request content is undefined as far as I can tell.
>
> So, what would Paste do in situation where it ran on mod_wsgi and was
> passed a chunked request, for the case of CONTENT_LENGTH not being set
> but Transfer-Encoding being chunked, and for the case of mod_wsgi
> passing CONTENT_LENGTH of -1, if it was changed to do that.

I find handling wsgi.input is kind of a mess, so at least it doesn't
feel like extensions are breaking an otherwise working thing ;) It
would be nice if there was some way to signal that wsgi.input was a
more file-like object (one that safely returns '' at EOF). I'd really
just prefer everything worked like that, because overreading from a
socket is too frustrating a bug to just let it block and
assume consumers won't misuse the input. Another option might be
something like an optional attribute .chunked(chunk_size), which would
return an iterator that would yield strings of at most chunk_size
characters. (Would that be reentrant, though? Hrm)
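
A rough sketch of what such a .chunked(chunk_size) extension might look like, in Python 3. ChunkedInput is a hypothetical wrapper (nothing like it exists in WSGI or Paste), and it assumes the underlying object returns an empty result at EOF:

```python
import io


class ChunkedInput:
    """Hypothetical wrapper giving a file-like input a .chunked() method."""

    def __init__(self, fileobj):
        self._fileobj = fileobj

    def chunked(self, chunk_size):
        # Yield pieces of at most chunk_size bytes until read() returns b"" at EOF.
        while True:
            piece = self._fileobj.read(chunk_size)
            if not piece:
                return
            yield piece


wsgi_input = ChunkedInput(io.BytesIO(b"abcdefghij"))
pieces = list(wsgi_input.chunked(4))   # [b'abcd', b'efgh', b'ij']
```

In this form it is not reentrant: every iterator shares the one underlying read position, so two consumers would each see only part of the body.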

If you do use -1 and don't have any extension, I don't think there's
any way to receive the body of the request except in one big string.
That's kind of crappy, but it's better than nothing. I'm not really
comfortable reading environ['HTTP_TRANSFER_ENCODING'], because that's
something that existing servers might very well leak in without having
the semantics you intend.

Some servers read all the input before calling to the WSGI app, at
which point they could calculate CONTENT_LENGTH. I actually think
this is quite reasonable, as having the application start work before
the request has been fully received seems kind of unnecessary.
(Except perhaps for 401 responses, or request body too big responses,
which would be nice to be able to implement in WSGI)

Jorge Vargas

Mar 15, 2009, 5:32:32 AM

to Ian Bicking, paste...@googlegroups.com

I see. OK, then I'll take this to the downstream app; maybe it's a bug
that it doesn't respect the -1, and that is why I'm getting the
original error.
