Truncated request data?

Manuzhai

Apr 26, 2007, 3:58:59 AM
to modwsgi
Hi there,

While testing Trac-on-mod_wsgi, I ran into a problem with POST request
data. My wiki edits were acting really strange, doing an edit when I
asked for a preview and truncating the contents for the new page. So
after initially blaming Trac, I started to dive in, and it seems the
problem is already there by the time the request data gets to Trac. So
I guess the problem may lie inside mod_wsgi.

I'm seeing that environ['wsgi.input'].read() just returns a 1387-
character string, even if there's more. This is with mod_wsgi r203, by
the way. Is it possible that there's a bug like this in mod_wsgi?
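
For anyone wanting to check the same thing on their own setup, a minimal diagnostic sketch could look like the one below. It is written for Python 3 (so the body is bytes, not the strings used at the time of this thread), and the names `first_read_report` and `application` are just illustrative. It asks for exactly the declared content length once and reports how much came back.

```python
from io import BytesIO


def first_read_report(environ):
    """Compare the declared body length with what a single read() returns."""
    declared = int(environ.get('CONTENT_LENGTH') or 0)
    # Ask for exactly the declared length; never call read() without a size.
    data = environ['wsgi.input'].read(declared)
    return declared, len(data)


def application(environ, start_response):
    """Tiny WSGI app that reports the two numbers back to the client."""
    declared, received = first_read_report(environ)
    body = ('declared=%d first_read=%d\n' % (declared, received)).encode('ascii')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


if __name__ == '__main__':
    # Local check with an in-memory stream standing in for the server's input.
    payload = b'x' * 5000
    fake_environ = {'CONTENT_LENGTH': str(len(payload)),
                    'wsgi.input': BytesIO(payload)}
    print(first_read_report(fake_environ))
```

If the two numbers differ for a large POST, the short read is happening before the application framework ever sees the data.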

Regards,

Manuzhai

Graham Dumpleton

Apr 26, 2007, 6:21:13 AM
to mod...@googlegroups.com
Is Trac supplying a length of data to read, or no arguments? If it
isn't supplying an argument and is expecting to get all available input,
then Trac is wrong, as wsgi.input is a file-like object and thus
doesn't necessarily have to return all available input when read() is
called with no argument. A caller is meant to keep calling it until an
empty string is returned. Also, the way the WSGI specification is
written, it is actually undefined what happens when read() is
called with no argument, and a WSGI adapter is at liberty to hang if an
attempt is made to read more data than is specified by the
content length.

That said, I don't discount there could be a problem with mod_wsgi. :-)

So, if you can send me a reference to and/or snippet of Trac code that
does the read that would help.

Thanks.

Graham

Graham Dumpleton

Apr 26, 2007, 6:52:27 AM
to mod...@googlegroups.com
The problem is the WSGI PEP says:

An input stream (file-like object) from which the HTTP request body
can be read.

When performing a read on a file-like object in Python it doesn't have
to return the full amount of data that was requested to be read. This
would occur for example where the file-like object was actually a
wrapper around a non blocking socket.

Quoting the Python function help for read() on sys.stdin as some
support for this view:

    read(...)
        read([size]) -> read at most size bytes, returned as a string.

        If the size argument is negative or omitted, read until EOF is reached.
        Notice that when in non-blocking mode, less data than what was requested
        may be returned, even if no size parameter was given.

Thus, what you are probably finding is that not all data was available
for reading at the time the read was performed and so it just returned
what data was available at that time as a file-like object is allowed
to do.
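
To make the partial-read behaviour concrete, here is a small simulation. `TrickleInput` is a hypothetical stand-in for wsgi.input, not anything from mod_wsgi, and the 1387-byte chunk size is simply borrowed from the report above. It is a Python 3 sketch, so the body is bytes.

```python
class TrickleInput:
    """Hypothetical stand-in for wsgi.input: at most `chunk` bytes per read()."""

    def __init__(self, payload, chunk):
        self._payload = payload
        self._pos = 0
        self._chunk = chunk

    def read(self, size=-1):
        remaining = len(self._payload) - self._pos
        if size is None or size < 0:
            size = remaining
        # Hand back no more than one chunk per call, like a socket that
        # only has part of the request body buffered so far.
        n = min(size, remaining, self._chunk)
        data = self._payload[self._pos:self._pos + n]
        self._pos += n
        return data


stream = TrickleInput(b'a' * 4000, chunk=1387)
first = stream.read(4000)   # asks for everything, gets only one chunk
rest = stream.read(4000)    # a second call is needed to get more
print(len(first), len(rest))
```

A caller that stops after the first call sees a truncated body even though nothing was lost; the rest is still waiting to be read.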

Now this is where we run into problems with the WSGI PEP as it is
quite hazy on how wsgi.input should actually work and thus some
applications being hosted on top of WSGI may just assume that they
will always be returned the full amount of data they asked for when in
practice a portable WSGI application shouldn't be doing that.

Thus it would seem that Trac may simply be assuming that it will get
back everything it asks for when that is actually incorrect behaviour
because of how Python defines a file-like object.

Problem now is that although what mod_wsgi is doing probably fits
within what the specification says is correct, there probably are
going to be various WSGI applications which aren't robust in dealing
with partial reads.

This is going to be tricky and I'll have to work out what I am going
to do. I may have to provide a directive that can be enabled/disabled
so as to perhaps enforce that partial reads will not occur to cope
with these applications. Maybe should be called WSGIInputBlocking and
make it default to On so that these non robust WSGI applications still
work.

I may have to once again raise with the Python WEB-SIG group the
inadequate definition of how wsgi.input behaves, although responses
haven't always been too helpful in the past. :-(

Graham

Manuzhai

Apr 26, 2007, 6:58:07 AM
to mod...@googlegroups.com
On 4/26/07, Graham Dumpleton <graham.d...@gmail.com> wrote:
> Thus, what you are probably finding is that not all data was available
> for reading at the time the read was performed and so it just returned
> what data was available at that time as a file-like object is allowed
> to do.

Hmm, that makes sense (I was just composing a reply pointing out all
of this as well). So I guess it would be all right for me to file a bug
with Trac after all, since this is partly their problem. Good luck
with figuring out a solution for mod_wsgi, seems like a hard problem.

Regards,

Manuzhai

Graham Dumpleton

Apr 26, 2007, 7:09:04 AM
to mod...@googlegroups.com
Don't file a report with Trac at this point as I am sure they will
argue that there is enough existing precedent with WSGI applications
that a WSGI adapter should give the appearance of being blocking and
therefore input isn't really file-like after all as far as Python
defines it and what WSGI PEP says.

In the meantime, in trac/web/api.py, change:

    def read(self, size=None):
        """Read the specified number of bytes from the request body."""
        fileobj = self.environ['wsgi.input']
        if size is None:
            size = self.get_header('Content-Length')
            if size is None:
                size = -1
            else:
                size = int(size)
        data = fileobj.read(size)
        return data

to something like:

    def read(self, size=None):
        """Read the specified number of bytes from the request body."""
        fileobj = self.environ['wsgi.input']
        if size is None:
            size = self.get_header('Content-Length')
            if size is None:
                size = -1
            else:
                size = int(size)

        input = []
        length = size

        # Keep reading until the requested length has been collected.
        while length:
            data = fileobj.read(length)
            if not data:
                # End of input: stop, so that a short body or size == -1
                # cannot loop forever.
                break
            length -= len(data)
            input.append(data)

        return ''.join(input)

Changing mod_wsgi isn't that hard, as I already have code similar to
what is required in readline().

Graham

Graham Dumpleton

Apr 26, 2007, 7:50:24 AM
to mod...@googlegroups.com
The way I have interpreted the WSGI PEP is probably wrong in some respects,
as the WSGI PEP doesn't really support the whole concept of non-blocking
sockets: there is no way to indicate that a read would in
fact block. Thus I am wrong to assume that read() can return less data than
what was requested.

New version shortly, just trying to work out how I can test the change.

Graham

Graham Dumpleton

Apr 26, 2007, 7:59:48 AM
to mod...@googlegroups.com
Try the code in the subversion repository now. It should be revision 204.

I haven't been able to realistically test this on an example where I knew
it would trigger what you were seeing, but the changes are minimal enough that
they shouldn't have caused any further problems. Simple POST cases still
work okay.

Thanks for catching this issue.

Manuzhai

Apr 26, 2007, 12:13:25 PM
to mod...@googlegroups.com
On 4/26/07, Graham Dumpleton <graham.d...@gmail.com> wrote:
> Try code in subversion repository now. Should be revision 204.

That fixed my primary problem. Unfortunately, I've already reopened
the Trac bug I had opened earlier and pointed it at this thread. See
http://trac.edgewall.org/ticket/5094

Thanks for fixing this!

Manuzhai

Graham Dumpleton

Apr 26, 2007, 5:42:22 PM
to mod...@googlegroups.com
I noted the bug report.

If the fix to mod_wsgi solves the problem and is what mod_wsgi should
be doing anyway, will the bug report be closed now, or is there still
some outstanding issue?

Graham

Manuzhai

Apr 26, 2007, 9:07:14 PM
to mod...@googlegroups.com
On 4/26/07, Graham Dumpleton <graham.d...@gmail.com> wrote:
> If the fix to mod_wsgi solves the problem and is what mod_wsgi should
> be doing anyway, will the bug report be closed now or is there still
> some outstanding issue.

Well, since the Trac behavior is still wrong according to a literal
reading of the WSGI spec, I'd still like for Trac to be fixed to
handle this (admittedly edge) case, even if your mod_wsgi fix also
solved the problem. I'm not exactly sure how invasive your changes in
r204 were or even what exactly they mean in terms of your algorithm,
but I can see from your reasoning that there is a point to your
previous behavior, so Trac should be able to handle that, too.

Regards,

Manuzhai

Graham Dumpleton

Apr 26, 2007, 9:16:07 PM
to mod...@googlegroups.com
As I said in my prior email but perhaps didn't make clear, it is actually
me and mod_wsgi that had it wrong. The WSGI PEP doesn't
really allow non-blocking file input, so the fact that a file-like object
in Python can return partial reads for non-blocking input isn't
really valid in the context of WSGI. Thus, there is no point in Trac being
changed.

The changes in mod_wsgi were actually quite trivial, I just had to put
the read in a while loop until required length attained. It is quite
possible that I had it that way originally but thought I could
optimise it at some point to avoid having to keep doing reads until
all data was read. I more than likely got confused at some point as to
what I was supposed to do due to related issues around how using
read() with no argument should behave.
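
As a rough illustration of that while loop, here is a sketch in Python rather than the C that mod_wsgi is actually written in. It is not the r204 change itself; `FullReadInput` and `ShortReads` are hypothetical names, and capping reads at the content length is an assumption made for the example. It targets Python 3, so the body is bytes.

```python
from io import BytesIO


class ShortReads:
    """Hypothetical raw stream that yields at most 3 bytes per read() call."""

    def __init__(self, payload):
        self._buf = BytesIO(payload)

    def read(self, size):
        return self._buf.read(min(size, 3))


class FullReadInput:
    """Wrap a raw stream so read(size) returns `size` bytes unless the body ends."""

    def __init__(self, raw, content_length):
        self._raw = raw
        self._remaining = content_length  # body bytes not yet consumed

    def read(self, size=-1):
        # Never try to read past the declared content length.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        chunks = []
        while size > 0:
            data = self._raw.read(size)
            if not data:  # underlying stream ended early
                break
            chunks.append(data)
            size -= len(data)
            self._remaining -= len(data)
        return b''.join(chunks)


wrapped = FullReadInput(ShortReads(b'0123456789extra'), content_length=10)
print(wrapped.read(4), wrapped.read())
```

With a wrapper like this on the adapter side, the application gets the full amount it asked for from a single read(size) call, which is the behaviour Trac was relying on.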

Graham
