I also made a ticket in the TG trac and committed a temporary patch to
it: http://trac.turbogears.org/ticket/2241
Thoughts?
---------- Forwarded message ----------
From: Jorge Vargas <jorge....@gmail.com>
Date: Tue, Feb 24, 2009 at 8:10 AM
Subject: request.copy() != request, webob bug??
To: turbogea...@googlegroups.com
Hey,
After several hours of trying to figure out why every method in
codemill's hg was working except push, I noticed that CONTENT_LENGTH
was getting set to -1. A couple of minutes later I ran
request==new_req in weberror and to my surprise it returned False.
Even better, request.copy()==request returns False!
While still in shock from all this I sat down and wrote 3 unit tests,
one for TG, one for pylons and one for plain webob. To my surprise all
of them are failing! So what's wrong? How come copy doesn't copy, even
though the webob docs say it does:
http://pythonpaste.org/webob/reference.html#modifying-the-request
Why am I getting a content-length of -1??
Still skeptical about this I went to the WSGIAppController and added
+ new_req.environ['CONTENT_LENGTH']=request.environ['CONTENT_LENGTH']
and now I can push! Does anyone know what the heck is going on??
patch with the tests
http://paste.chrisarndt.de/paste/dbd7e1ff954542caa647ef65d6df97cc
Once you read the POST variables, WebOb doesn't keep the request body.
Instead it will reproduce the body from the POST variables. This can
change the length, since there isn't a canonical encoding for POST
variables (you can %-encode any value). The -1 tells libraries to
read the body in its entirety, instead of relying on a specific
length.
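
To illustrate the point about non-canonical encoding, here is a minimal
sketch using only the Python 3 standard library (not WebOb itself). The
sample body is made up for illustration. It parses a form body into
variables and serializes them again, which is roughly the kind of
round trip described above, and shows that the length can change even
though the variables are the same.

```python
from urllib.parse import parse_qsl, urlencode

# A made-up form body that %-encodes more than it strictly needs to.
original_body = "name=%41lice&x=a%20b"

# Parse the body into variables, then serialize the variables again.
variables = parse_qsl(original_body)
rebuilt_body = urlencode(variables)

print(variables)     # [('name', 'Alice'), ('x', 'a b')]
print(rebuilt_body)  # name=Alice&x=a+b
print(len(original_body), len(rebuilt_body))  # 20 16
```

Both bodies decode to the same variables, but a Content-Length computed
from the original body no longer matches the rebuilt one.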
> as for the == isn't .copy supposed to produce an exact same copy of
> the object? but with a different memory space? I guess when I said ==
> I wasn't talking specifically about the python notion of copy, rather
> the fact that the copy is not equal because the content_length is
> different.
During the copy process WebOb makes the body concrete again, which
gives the copied object a specific content_length.
I see, so from this it seems my patch is wrong. (see below)
>
>> as for the == isn't .copy supposed to produce an exact same copy of
>> the object? but with a different memory space? I guess when I said ==
>> I wasn't talking specifically about the python notion of copy, rather
>> the fact that the copy is not equal because the content_length is
>> different.
>
> During the copy process WebOb makes the body concrete again, which
> gives the copied object a specific content_length.
>
Right, well, for some reason this isn't working for me inside
TG/pylons, as you can see from this patch:
http://paste.chrisarndt.de/paste/dbd7e1ff954542caa647ef65d6df97cc
and the explanation here: http://trac.turbogears.org/ticket/2241
The relevant class is a helper we added to mount WSGIapps in a sane
simple way inside TurboGears object dispatch controller and allow our
auth system to protect them
http://trac.turbogears.org/browser/trunk/tg/controllers.py?rev=6442#L756
I know the code is a bit hairy but this is currently an experiment.
improvements are welcome :)
Well, I guess I'm wondering why you want the request objects to be
equal? I don't think they will be equal, or even can reasonably be
equal, so avoiding that requirement would be best.
I don't want them to be equal. That was a bad assumption on my part.
I want it to maintain the content_length, which for some reason is
being lost. It is the only element that is not equal.
That said, this thread gave me the clue that maybe my downstream
application is not respecting content_length = -1 and re-reading the
body, and is instead generating an error.
It should be okay to re-read the body, especially if you just do
environ['wsgi.input'].read(int(environ['CONTENT_LENGTH'])) -- -1 in
that context means to read everything. Of course if your app is
expecting to get len(result) == int(environ['CONTENT_LENGTH']), that
won't be true.
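
A small sketch of that read, in Python 3 with io.BytesIO standing in
for wsgi.input. The helper name read_body and the sample environ are
made up for illustration. It assumes the input object returns an empty
string at EOF, which a raw socket may not do.

```python
import io

def read_body(environ):
    """Read the request body using CONTENT_LENGTH as the read size."""
    length = int(environ['CONTENT_LENGTH'])
    # For file-like objects, read(-1) means "read until EOF".
    return environ['wsgi.input'].read(length)

# Hypothetical environ where the length is reported as unknown.
environ = {
    'CONTENT_LENGTH': '-1',
    'wsgi.input': io.BytesIO(b'a=1&b=2'),
}
body = read_body(environ)
print(body)       # b'a=1&b=2'
print(len(body))  # 7
```

The whole body comes back, but its length (7) is not equal to
int(environ['CONTENT_LENGTH']), so an application that checks for that
equality would treat the request as broken.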
I'm pretty sure it's consistent with cgi.FieldStorage. I'm really
just using -1 in places where previously things were completely broken
(e.g., wedged sockets because two consumers were reading the input).
So there aren't many regressions because everything was broken
previously ;) But there are some problems, for instance there was
just a problem with paste.proxy because it was sending an actual HTTP
header with Content-Length: -1.
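
One way a proxy could avoid that kind of leak is to treat a negative
or unparseable CONTENT_LENGTH as "unknown" and not forward it. This is
only a sketch with a made-up helper name, not how paste.proxy actually
handles it.

```python
def outgoing_content_length(environ):
    """Return a Content-Length header value, or None if it is unknown."""
    raw = environ.get('CONTENT_LENGTH', '')
    try:
        length = int(raw)
    except ValueError:
        # Missing or malformed value: do not send a header.
        return None
    if length < 0:
        # -1 is an internal "read everything" marker, not a real length.
        return None
    return str(length)

print(outgoing_content_length({'CONTENT_LENGTH': '7'}))   # 7
print(outgoing_content_length({'CONTENT_LENGTH': '-1'}))  # None
print(outgoing_content_length({}))                        # None
```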
> The reason I ask relates to quite an old discussion on Python WEB-SIG
> about having WSGI support chunked request encoding, something that
> technically it can't at the moment. The suggestion back then from you
> was for CONTENT_LENGTH to be set to -1 to indicate that unknown length
> data was present. So, same idea you are applying here.
>
> Because I keep having requests about supporting chunked request
> content, in mod_wsgi 3.0 I allow chunked requests, but it would be up
> to application to detect that scenario by looking at Transfer-Encoding
> header to know that is what was supplied and decide to step outside of
> what WSGI says you are supposed to do in respect of handling
> wsgi.input. I haven't gone as far as setting CONTENT_LENGTH to -1,
> although feasibly I could given that what happens with WSGI for
> chunked request content is undefined as far as I can tell.
>
> So, what would Paste do in situation where it ran on mod_wsgi and was
> passed a chunked request, for the case of CONTENT_LENGTH not being set
> but Transfer-Encoding being chunked, and for the case of mod_wsgi
> passing CONTENT_LENGTH of -1, if it was changed to do that.
I find handling wsgi.input is kind of a mess, so at least it doesn't
feel like extensions are breaking an otherwise working thing ;) It
would be nice if there was some way to signal that wsgi.input was a
more file-like object (that safely returns '' at EOF). I'd really
just prefer everything worked like that, because overreading from a
socket is just too frustrating of a bug to allow it to just block and
assume consumers won't misuse the input. Another option might be
something like an optional attribute .chunked(chunk_size), which would
return an iterator that would yield strings of at most chunk_size
characters. (Would that be reentrant, though? Hrm)
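
A rough sketch of what such a chunked(chunk_size) helper might look
like, written here as a standalone function over any file-like object
rather than as an attribute. Nothing like this exists in WSGI, so the
name and signature are purely hypothetical, and it assumes the stream
returns an empty string at EOF.

```python
import io

def chunked(stream, chunk_size):
    """Yield pieces of at most chunk_size from stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read signals EOF on a well-behaved file-like object.
            return
        yield chunk

chunks = list(chunked(io.BytesIO(b'abcdefgh'), 3))
print(chunks)  # [b'abc', b'def', b'gh']
```

In this form it is not reentrant: two iterators over the same stream
share one read position, so each would only see part of the data.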
If you do use -1 and don't have any extension, I don't think there's
any way to receive the body of the request except in one big string.
That's kind of crappy, but it's better than nothing. I'm not really
comfortable reading environ['HTTP_TRANSFER_ENCODING'], because that's
something that existing servers might very well leak in without having
the semantics you intend.
Some servers read all the input before calling to the WSGI app, at
which point they could calculate CONTENT_LENGTH. I actually think
this is quite reasonable, as having the application start work before
the request has been fully received seems kind of unnecessary.
(Except perhaps for 401 responses, or request body too big responses,
which would be nice to be able to implement in WSGI)
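
The "read everything first, then calculate CONTENT_LENGTH" approach
can also be sketched as a piece of WSGI middleware, for cases where a
downstream app does not cope with -1. The class name BufferInput and
the echo_length app are made up for illustration, and the sketch
assumes wsgi.input returns an empty string at EOF.

```python
import io

class BufferInput:
    """Hypothetical middleware: replace an unknown length with a real one."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if int(environ.get('CONTENT_LENGTH') or 0) < 0:
            # Read the whole body, then hand the app a seekable copy
            # together with a concrete CONTENT_LENGTH.
            body = environ['wsgi.input'].read()
            environ['wsgi.input'] = io.BytesIO(body)
            environ['CONTENT_LENGTH'] = str(len(body))
        return self.app(environ, start_response)

def echo_length(environ, start_response):
    # Toy app that reports the CONTENT_LENGTH it was given.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['CONTENT_LENGTH'].encode('ascii')]

environ = {'CONTENT_LENGTH': '-1', 'wsgi.input': io.BytesIO(b'a=1&b=2')}
result = BufferInput(echo_length)(environ, lambda status, headers: None)
print(result)  # [b'7']
```

The cost is that the whole body is held in memory, which is the
trade-off already noted for reading it as one big string.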
I see, ok then I'll take this to the downstream app, maybe it's a bug
that it is not respecting the -1 and that is why I'm getting the
original error.