While testing Trac-on-mod_wsgi, I ran into a problem with POST request
data. My wiki edits were behaving really strangely, saving an edit when I
asked for a preview and truncating the contents of the new page. So
after initially blaming Trac, I started to dive in, and it seems the
problem is already there by the time the request data gets to Trac. So
I guess the problem may lie inside mod_wsgi.
I'm seeing that environ['wsgi.input'].read() just returns a 1387-
character string, even if there's more. This is with mod_wsgi r203, by
the way. Is it possible that there's a bug like this in mod_wsgi?
Regards,
Manuzhai
That said, I don't discount there could be a problem with mod_wsgi. :-)
So, if you can send me a reference to and/or a snippet of the Trac code that
does the read, that would help.
Thanks.
Graham
The WSGI PEP (PEP 333) describes wsgi.input as:
  An input stream (file-like object) from which the HTTP request body
  can be read.
When performing a read on a file-like object in Python, it doesn't have
to return the full amount of data that was requested. This can occur,
for example, where the file-like object is actually a wrapper around a
non-blocking socket.
Quoting the Python function help for read() on sys.stdin as some
support for this view:
read(...)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
Thus, what you are probably finding is that not all the data was
available for reading at the time the read was performed, and so it
just returned whatever data was available at that time, as a file-like
object is allowed to do.
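To make the partial-read behaviour concrete, here is a minimal Python sketch. TrickleInput is a hypothetical file-like object (not part of mod_wsgi or Trac) that hands back at most a few bytes per read(), much as a wrapper around a non-blocking socket might. A single read(n) comes back short, while looping until n bytes arrive (or EOF is hit) recovers the whole body. read_exactly is likewise an illustrative name, not an existing API.

```python
import io


class TrickleInput(object):
    """Hypothetical file-like object returning at most `chunk` bytes per read()."""

    def __init__(self, data, chunk=4):
        self._buf = io.BytesIO(data)
        self._chunk = chunk

    def read(self, size=-1):
        # Like a non-blocking source, return only what is "available"
        # (here: at most self._chunk bytes), even if no size was given.
        if size is None or size < 0:
            size = self._chunk
        return self._buf.read(min(size, self._chunk))


def read_exactly(fileobj, size):
    """Call read() repeatedly until `size` bytes arrive or EOF is reached."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = fileobj.read(remaining)
        if not data:  # EOF: stop instead of looping forever
            break
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)


body = b'text=Hello+World&preview=Preview'
print(TrickleInput(body).read(len(body)))                   # b'text'
print(read_exactly(TrickleInput(body), len(body)) == body)  # True
```

An application that makes only the single read(n) call, as the original Trac code does, would see the truncated result.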
Now this is where we run into problems with the WSGI PEP as it is
quite hazy on how wsgi.input should actually work and thus some
applications being hosted on top of WSGI may just assume that they
will always be returned the full amount of data they asked for when in
practice a portable WSGI application shouldn't be doing that.
Thus it would seem that Trac may simply be assuming that it will get
back everything it asks for when that is actually incorrect behaviour
because of how Python defines a file-like object.
The problem now is that although what mod_wsgi is doing probably fits
within what the specification says is correct, there are probably
going to be various WSGI applications which aren't robust in dealing
with partial reads.
This is going to be tricky and I'll have to work out what I am going
to do. I may have to provide a directive that can be enabled/disabled
so as to enforce that partial reads will not occur, to cope with these
applications. Maybe it should be called WSGIInputBlocking and default
to On so that these non-robust WSGI applications still work.
I may have to once again raise with the Python WEB-SIG group the
inadequate definition of how wsgi.input behaves, although responses
haven't always been too helpful in the past. :-(
Graham
Hmm, that makes sense (I was just composing a reply pointing out all
of this as well). So I guess it would be alright for me to file a bug
with Trac after all, since this is partly their problem. Good luck
with figuring out a solution for mod_wsgi; it seems like a hard problem.
Regards,
Manuzhai
In the mean time, in trac/web/api.py, change:
def read(self, size=None):
    """Read the specified number of bytes from the request body."""
    fileobj = self.environ['wsgi.input']
    if size is None:
        size = self.get_header('Content-Length')
        if size is None:
            size = -1
        else:
            size = int(size)
    data = fileobj.read(size)
    return data
to something like:
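def read(self, size=None):
    """Read the specified number of bytes from the request body."""
    fileobj = self.environ['wsgi.input']
    if size is None:
        size = self.get_header('Content-Length')
        if size is None:
            size = -1
        else:
            size = int(size)
    if size < 0:
        # Length unknown: fall back to a single unbounded read.
        return fileobj.read(size)
    # A single read() may return fewer bytes than asked for, so keep
    # reading until the full length has arrived.
    chunks = []
    length = size
    while length > 0:
        data = fileobj.read(length)
        if not data:
            # Premature EOF: stop rather than loop forever.
            break
        length -= len(data)
        chunks.append(data)
    return ''.join(chunks)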
Changing mod_wsgi isn't that hard, as I already have code similar to
what is required in readline().
Graham
On 26/04/07, Manuzhai <manu...@gmail.com> wrote:
New version shortly; just trying to work out how I can test the change.
Graham
On 26/04/07, Graham Dumpleton <graham.d...@gmail.com> wrote:
I haven't been able to realistically test this on an example where I
knew it would trigger what you were seeing, but the changes are minimal
enough that they shouldn't have caused any further problems. Simple
POST cases still work okay.
Thanks for catching this issue.
That fixed my primary problem. Unfortunately, I've already reopened
the Trac bug I had opened earlier and pointed it at this thread. See
http://trac.edgewall.org/ticket/5094
Thanks for fixing this!
Manuzhai
If the fix to mod_wsgi solves the problem and is what mod_wsgi should
be doing anyway, will the bug report be closed now, or is there still
some outstanding issue?
Graham
Well, since the Trac behavior is still wrong according to a literal
reading of the WSGI spec, I'd still like for Trac to be fixed to
handle this (admittedly edge) case, even if your mod_wsgi fix also
solved the problem. I'm not exactly sure how invasive your changes in
r204 were or even what exactly they mean in terms of your algorithm,
but I can see from your reasoning that there is a point to your
previous behavior, so Trac should be able to handle that, too.
Regards,
Manuzhai
As I said in my prior email, but perhaps didn't make clear, it is
actually me and mod_wsgi that had it wrong. The WSGI PEP doesn't really
allow for non-blocking input, so although a Python file-like object may
return partial reads on non-blocking input, that behaviour isn't really
valid in the context of WSGI. Thus there is no point in changing Trac.
The changes in mod_wsgi were actually quite trivial; I just had to put
the read in a while loop that runs until the required length is read. It is quite
possible that I had it that way originally but thought I could
optimise it at some point to avoid having to keep doing reads until
all data was read. I more than likely got confused at some point as to
what I was supposed to do due to related issues around how using
read() with no argument should behave.
Graham