[Web-SIG] Implementing File Upload Size Limits


Randy Syring

Nov 22, 2008, 12:50:45 AM
to web...@python.org
I am looking for opinions and thoughts on best practice for limiting file upload size.  I have a few considerations:
  • Ultimately, I want my application, with its own method of handling forms, to be able to tell the user that the file was too big.  That means that, however the size is limited, just blanking out wsgi.input and setting content-length to zero doesn't seem correct.  I believe that would make it look like the form was submitted without any data.
  • Given the above, it seems that something would need to be put in the environment to tell middleware and the application that the file input was aborted, but what would be the best way of doing it?  Should it be some kind of standard, or just dependent on your server or middleware?
  • It seems best to implement this functionality as the very first middleware in the stack.  Since other middleware read and manipulate wsgi.input, handling the upload size at the application level wouldn't prevent middleware from wasting resources dealing with a very large file.
Is it possible to prevent the server from even accepting all the data (i.e. trying to save bandwidth and server resources) if the content-length is known to be too big?  Or is the server required to take all the client's data regardless, even if it ends up going in the bit bucket?  I realize some of this is server specific, not WSGI specific, but I would be interested in knowing how the most popular servers handle this or what the HTTP specs require if anyone knows.
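
A minimal sketch of the "first middleware in the stack" idea from the list above, under stated assumptions: the environ key `example.upload_too_large` and the 10 MB default are hypothetical (nothing like them is standardized), and the class only hides the body from downstream code. It does not drain or refuse the bytes on the socket, so whether it is safe depends on the server.

```python
import io

# Hypothetical names: neither the limit nor the environ key is standardized.
MAX_BODY_BYTES = 10 * 1024 * 1024
TOO_LARGE_KEY = "example.upload_too_large"


class UploadLimitMiddleware:
    """Flag oversized requests in the environ instead of silently blanking them."""

    def __init__(self, app, max_bytes=MAX_BODY_BYTES):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        try:
            declared = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            declared = 0
        if declared > self.max_bytes:
            # Record the declared size so the form handler can report it.
            environ[TOO_LARGE_KEY] = declared
            # Hide the body from downstream middleware. The bytes on the
            # socket are NOT drained here.
            environ["wsgi.input"] = io.BytesIO(b"")
            environ["CONTENT_LENGTH"] = "0"
        return self.app(environ, start_response)
```

The application can then look for the key and render the form with an inline error rather than treating the request as an empty submission.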

Thanks in advance for any insight you might be able to provide.
-- 
--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31

Randy Syring

Nov 22, 2008, 4:07:53 AM
to web...@python.org
I did find this:

http://wiki.pylonshq.com/display/pylonscookbook/A+Better+Way+To+Limit+File+Upload+Size

Which was good, but still leaves some unanswered questions:
  • What if one is not using the paste http server?
  • This method gives an unfriendly response.  What would be the best way to propagate this error condition down to the app, so that a message could be given to the user in the context of the form they had just submitted (i.e. an error message under the input field reminding them of the max upload size, and possibly even telling them how big the uploaded file was)?
Thanks.


_______________________________________________ Web-SIG mailing list Web...@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/randy%40rcs-comp.com

Graham Dumpleton

Nov 22, 2008, 4:12:26 AM
to Randy Syring, web...@python.org
2008/11/22 Randy Syring <ra...@rcs-comp.com>:

> I am looking for opinions and thoughts on best practice for limiting file
> upload size. I have a few considerations:
>
> Ultimately, I would want my application with my method of handling forms to
> be able to give the user a message that the file size was too big. That
> means that however, the size is limited, just blanking out wsgi.input and
> setting content-length to zero doesn't seem correct. That would make it
> look like the form wasn't submitted with any data I believe.
> Given the above, it seems that something would need to get put in the
> environment to tell middleware and the application that the file input was
> aborted, but what would be the best way for doing it? Should it be some
> kind of standard, or just dependent on your server or middleware?
> It seems best to implement this functionality as the very first middleware
> in the stack. Since other middleware read and manipulate wsgi.input,
> handling the upload size at the application level wouldn't prevent middleware
> from wasting resources dealing with a very large file.
>
> Is it possible to prevent the server from even accepting all the data (i.e.
> trying to save bandwidth and server resources) if the content-length is
> known to be too big? Or is the server required to take all the client's
> data regardless, even if it ends up going in the bit bucket? I realize some
> of this is server specific, not WSGI specific, but I would be interested in
> knowing how the most popular servers handle this or what the HTTP specs
> require if anyone knows.
>
> Thanks in advance for any insight you might be able to provide.

If you use Apache/mod_wsgi to host your WSGI application, the best way
of handling this is to use the Apache LimitRequestBody directive in the
appropriate context. This will result in Apache returning a
HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
you need a custom error document for that response type use Apache
ErrorDocument directive to specify URL of handler which would generate
it.
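
As a rough illustration of the two directives just mentioned, a configuration fragment might look like the following. The `/upload` location, the 10 MB value and the error URL are assumptions made for the example, not values from this thread.

```apache
# Hypothetical path and limit; adjust to the deployment.
<Location "/upload">
    # Reject request bodies larger than 10 MB (the value is in bytes).
    LimitRequestBody 10485760
</Location>

# Serve a custom page for 413 responses; the URL is an assumed handler.
ErrorDocument 413 /errors/upload-too-large
```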

Except for the custom error document if delegated to the WSGI
application, doing it this way results in it all being handled by
Apache/mod_wsgi and your WSGI application will not even be invoked.
The request body content would also not even be read by Apache at all.
Do note that whether this avoids the client sending the request body
input depends on whether the client was expecting a '100 Continue'
response before it sends the data. I believe most web browsers still
don't use the '100 Continue' response.

This would be the preferred solution for Apache/mod_wsgi, as it is
handled at the lowest level and it is guaranteed that request content
wouldn't be read at that point. It does, however, take control out of
your application.

For Apache/mod_wsgi, if you do not do it this way but instead validate
content length in the WSGI application and have the WSGI application
return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
whether the request content gets read depends on whether you are using
embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't
read the input and just returns the error response, the request
content wouldn't be read at all. If you are using daemon mode however,
then the request content would always be read by Apache child worker
process, even if client asked for '100 Continue' response. This is
because the Apache child worker process will always proxy request
content to the daemon process.
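
A minimal sketch of the alternative described above, where the WSGI application itself validates the content length and returns 413 without touching wsgi.input. The limit and the response text are assumptions for illustration; as noted, whether the request content is actually left unread on the wire depends on how the application is hosted.

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical limit


def application(environ, start_response):
    try:
        declared = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        declared = 0
    if declared > MAX_BODY_BYTES:
        body = b"Uploaded file is too large.\n"
        start_response(
            "413 Request Entity Too Large",
            [("Content-Type", "text/plain"),
             ("Content-Length", str(len(body)))],
        )
        # wsgi.input is deliberately left unread on this path.
        return [body]
    data = environ["wsgi.input"].read(declared)
    body = ("Received %d bytes.\n" % len(data)).encode("ascii")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```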

Anyway, that is how things are for Apache/mod_wsgi.

Graham



Randy Syring

Nov 22, 2008, 1:06:15 PM
to Graham Dumpleton, web...@python.org
[forgot to copy list]

Graham Dumpleton wrote:
> 2008/11/22 Randy Syring <ra...@rcs-comp.com>:
>
>> I am looking for opinions and thoughts on best practice for limiting file
>> upload size. I have a few considerations:
>>

>> <snip>


>>
> If you use Apache/mod_wsgi to host your WSGI application, the best way
> of handling this is to use the Apache LimitRequestBody directive for
> appropriate context. This will result in Apache returning a
> HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
> you need a custom error document for that response type use Apache
> ErrorDocument directive to specify URL of handler which would generate
> it.
>

Graham,

Thank you for your response. What you noted above does seem to be the
lowest level solution possible if you are using apache. I suppose using
an error document that is part of the application would at least allow
me to serve a specific page from my application that could detail the
error. If I wanted to get fancy, each time a form with an input element
was sent to a user, I could save that path in a special variable in the
user's session. My error page could then look for that value in the
user session and if present, load the correct form, giving the user an
error message noting that the file uploaded was too big. The downfall
to that approach is that the form comes back empty. It might be better
to just have the error page give them some details and encourage them to
use the back button, in which case the form's fields would hopefully
still be filled in.


> Except for the custom error document if delegated to the WSGI
> application, doing it this way results in it all being handled by
> Apache/mod_wsgi and your WSGI application will not even be invoked.
> The request body content would also not even be read by Apache at all.
> Do note that whether this avoids the client sending the request body
> input depends on whether the client was expecting a '100 Continue'
> response before it send the data. Most web browsers still I believe
> don't use '100 Continue' response.
>
> This would be the preferred solution for Apache/mod_wsgi as it is
> handled at lowest levels and guaranteed that request content wouldn't
> be read at that point. It is however taking control out of your
> application.
>

Hopefully you can clarify something for me. Let's assume that the client
does not use '100 Continue' but sends data immediately after sending
the headers. If the server never reads the request content, what does
that mean exactly? Does the data get transferred over the wire but then
discarded or does the client not get to send the data until the server
reads the request body? I.e. the client tries to "send" it, but the
content isn't actually transferred across the wire until the server
reads it. I am just wondering if there is a buffer or queue or
something between the server and the client that allows data to be
transferred even if the server doesn't "read" the request body. Or, is
it just like a straight pipe where one end (the client) can't push data
through until the other end (the server) reads it.

I agree that it does take control out of the application. From a
usability perspective, the best solution IMO would be for the user to
get the form back and have a red error message under the input field
indicating the file size uploaded was too big and giving them the max
file size allowed. However, on second thought, that may not be true.
As noted above, because the entire request body was rejected, the form
loaded would have none of the information they submitted and most users
would probably think they have to fill out the whole form again.
Probably better to just give them a non-form error page and let them use
the back button (or even provide a link that uses javascript to go back)
and in so doing hopefully salvage the time they put into the form.

I suppose, though, that two different kinds of file size limits need to
be thought through. The first limit would be an application wide limit
that is set for security/resource reasons. That, I believe, is what we
have been discussing up to this point. I am just realizing that it
would also be fine to limit upload sizes at the application level and
give more user-friendly error messages. So I might decide on a 10MB
application-wide upload limit, but I might also restrict free accounts
and paid accounts to 256k and 5MB respectively. As long as a user
uploads something less than 10MB, they get a friendly in-line error
message. If they upload over 10MB, we handle that at the apache level
and send them to a custom error page.
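
The per-account part of that scheme could be sketched as below. The function name, the account labels and the message wording are hypothetical; the figures (256k, 5MB, 10MB) are the ones from the paragraph above, and requests over the server-wide limit are assumed never to reach this code.

```python
SERVER_LIMIT = 10 * 1024 * 1024  # enforced by the web server, not here
ACCOUNT_LIMITS = {
    "free": 256 * 1024,
    "paid": 5 * 1024 * 1024,
}


def upload_error(size_bytes, account_type):
    """Return an inline error message, or None if the upload is acceptable."""
    limit = ACCOUNT_LIMITS[account_type]
    if size_bytes > limit:
        return (
            "The file is %d bytes; the limit for %s accounts is %d bytes."
            % (size_bytes, account_type, limit)
        )
    return None
```

A form handler would call `upload_error()` after parsing the upload and show the returned text under the file input.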


> For Apache/mod_wsgi, if you do not do it this way but instead validate
> content length in the WSGI application and have the WSGI application
> return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
> whether the request content gets read depends on whether you are using
> embedded mode or daemon mode of mod_wsgi.
>
> If you use embedded mode, so long as your WSGI application doesn't
> read the input and just returns the error response, the request
> content wouldn't be read at all. If you are using daemon mode however,
> then the request content would always be read by Apache child worker
> process, even if client asked for '100 Continue' response. This is
> because the Apache child worker process will always proxy request
> content to the daemon process.
>
>

That's good to know. I think at this point I have talked myself into
thinking that there is no good reason to handle it at the application
level, but would appreciate any further feedback you might have.

One other thing, what would be a good upload size limit? Should it
always be as low as possible? What might be a good "middle-ground" for
the average web application uploading documents and pictures?

Thank you for taking the time to respond.


Brian Smith

Nov 25, 2008, 12:03:22 PM
to ra...@rcs-comp.com, graham.d...@gmail.com, web...@python.org
Randy Syring wrote:
> Hopefully you can clarify something for me. Let's assume that the
> client does not use '100 Continue' but sends data immediately, after
> sending the headers. If the server never reads the request content,
> what does that mean exactly? Does the data get transferred over the
> wire but then discarded or does the client not get to send the data
> until the server reads the request body? I.e. the client tries to
> "send" it, but the content isn't actually transferred across the
> wire until the server reads it. I am just wondering if there
> is a buffer or queue or something between the server and the client
> that allows data to be transferred even if the server doesn't
> "read" the request body. Or, is it just like a straight pipe
> where one end (the client) can't push data through until the other
> end (the server) reads it.

Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
this scenario. The input and the output are buffered separately, and both of
those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the
non-blocking I/O logic needed to prevent deadlocks. I heard (but did not
verify) that mod_fastcgi does not have this deadlocking problem. The sizes
of the buffers determines the size of the inputs and outputs needed to cause
a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.

Therefore, for maximum portability, a WSGI application should ALWAYS consume
the *whole* request body if it wants to avoid the deadlock using the
reference WSGI adapter in PEP 333 or mod_wsgi.
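
One way to follow that advice is to read and discard the body before returning an error. This is only a sketch: the helper name and chunk size are made up for the example, and it assumes CONTENT_LENGTH is present and accurate and that nothing has been read from wsgi.input yet.

```python
def drain_request_body(environ, chunk_size=64 * 1024):
    """Read and discard the request body; return the number of bytes dropped."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    discarded = 0
    while remaining > 0:
        # Never ask for more than the declared length.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # the client sent less than it declared
        discarded += len(chunk)
        remaining -= len(chunk)
    return discarded
```

Note that this still costs the bandwidth of the full upload; it only avoids leaving unread data behind.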

Probably other WSGI gateways have similar issues. It would be nice if there
was a standard entry in the WSGI environment (e.g.
"wsgi.may_ignore_request_body") that could be used to safely detect when we
can skip the request body. It would be even nicer if WSGI gateways were
updated to avoid this problem. However, that is easier said than done.

If you know C, it is relatively simple to modify mod_wsgi to use a different
Apache<->daemon communication protocol so that the daemon mode works as you
would expect (no deadlocks, proper 100-continue support, request body isn't
read unless your application asks for it). A long time ago I had a patch
that did this (among other things) but I don't think I have it any more.

However, once you get to that point, you still run into problems. If your
goal is to avoid reading the request body, then you need to close the
connection in your error response; otherwise, if the request was an HTTP/1.1
request, you still need to read the entire request body in order to process
any requests that follow it in the request pipeline. Unfortunately, a WSGI
application doesn't have any way of signaling that the connection is to be
closed; the WSGI specification forbids the WSGI application from returning
the Connection header since it is hop-by-hop. And, even if there was such a
mechanism, a poorly-coded client is likely to still cause a deadlock if the
server doesn't read its full request. Make sure you test with all your
targeted browsers.
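
For reference, the standard library's wsgiref can tell which headers count as hop-by-hop. The filter below is a small illustrative helper (its name is made up); it only shows that Connection falls into the category an application is not supposed to return.

```python
from wsgiref.util import is_hop_by_hop


def strip_hop_by_hop(headers):
    """Drop headers that a WSGI application is not supposed to return."""
    return [(name, value) for name, value in headers
            if not is_hop_by_hop(name)]
```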

Consequently...

> > If you are using daemon mode however,
> > then the request content would always be read by Apache child worker
> > process, even if client asked for '100 Continue' response. This is
> > because the Apache child worker process will always proxy request
> > content to the daemon process.
> >
> That's good to know. I think at this point I have talked myself into
> thinking that there is no good reason to handle it at the application
> level, but would appreciate any further feedback you might have.

...if your users will often attempt to upload files that exceed your
limits, it is best to mitigate the problem on the client side. First,
document the file size limit clearly on the page where the upload happens.
Second, implement a Flash-based and/or Java-based file upload control that
can be used when the user has Flash installed (fall back to the regular
control otherwise). With such an uploader, you can check the file size on
the client and prevent these requests from even being made (in the typical
case). You will still have to implement the validation logic on the server
to guard against malicious use and disabled JavaScript/Flash/Java. There are
additional benefits to this approach (better UI, multi-file selection,
compression, encryption, doesn't waste the user's time, saves bandwidth) but
it comes with all the drawbacks inherent in Flash/Java/JavaScript.

Regards,
Brian

Andrew Clover

Nov 25, 2008, 3:14:52 PM
to Web SIG
Brian Smith wrote:

> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
> this scenario.

Under IIS CGI it's considerably more likely. The output buffer you get
is smaller than Apache/Linux (at least on Win2K3 it's only 2KB), so even
a relatively small error page spat out before reading the whole input
will result in a cheeky hang.

> Therefore, for maximum portability, a WSGI application should ALWAYS consume
> the *whole* request body if it wants to avoid the deadlock using the
> reference WSGI adapter in PEP 333 or mod_wsgi

(...in daemon mode)

yep.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

Graham Dumpleton

Nov 25, 2008, 5:59:10 PM
to Brian Smith, web...@python.org
2008/11/26 Brian Smith <br...@briansmith.org>:

> Randy Syring wrote:
>> Hopefully you can clarify something for me. Let's assume that the
>> client does not use '100 Continue' but sends data immediately, after
>> sending the headers. If the server never reads the request content,
>> what does that mean exactly? Does the data get transferred over the
>> wire but then discarded or does the client not get to send the data
>> until the server reads the request body? I.e. the client tries to
>> "send" it, but the content isn't actually transferred across the
>> wire until the server reads it. I am just wondering if there
>> is a buffer or queue or something between the server and the client
>> that allows data to be transferred even if the server doesn't
>> "read" the request body. Or, is it just like a straight pipe
>> where one end (the client) can't push data through until the other
>> end (the server) reads it.
>
> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
> this scenario. The input and the output are buffered separately both of
> those buffers can fill up.

It isn't 'many situations', it is a quite specific situation.

The issue applies only to mod_wsgi daemon mode and only occurs where
the request content body is larger than the UNIX socket buffer size
for that platform and the WSGI application doesn't consume all of the
request body. At the same time, the WSGI application would have to
return a set of response headers and a response body which combined
are also larger than the UNIX socket buffer size for
that platform.
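
Restated as a predicate, purely to make the conditions explicit: this is a simplification with hypothetical names, not a model of what mod_wsgi actually does internally.

```python
def deadlock_possible(unread_request_bytes, response_bytes, socket_buffer_bytes):
    """True only when both directions can overflow the socket buffer."""
    request_blocks = unread_request_bytes > socket_buffer_bytes
    response_blocks = response_bytes > socket_buffer_bytes
    return request_blocks and response_blocks
```

With an 8K buffer, for example, a 1 MB unread upload combined with a 200-byte error response does not meet the conditions, while the same upload combined with a 20 KB error page does.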

> Neither mod_wsgi nor mod_cgid implement the
> non-blocking I/O logic needed to prevent deadlocks.

Both mod_wsgi and mod_cgi do have timeouts so that a permanent
deadlock situation at least doesn't arise. This is based on the standard
Apache Timeout directive. AFAIK mod_cgid still has a bug whereby it
doesn't detect the deadlock, which is possibly an easy way to DoS an
Apache server.

As far as changing how mod_wsgi works, there exists the issue:

http://code.google.com/p/modwsgi/issues/detail?id=56

It is low priority though as no one has been reporting it as a problem
in actual use. Scenarios where it technically might be triggered would
generally be SPAM bots trying to POST large amounts of data to
arbitrary URLs. If an application is functioning as intended, the
situation shouldn't really arise as POST requests should be getting
directed at URLs which will consume it.

That issue also references the IIS+CGI issue someone else mentioned:

http://www.doxdesk.com/updates/2006.html#u20060416-cgi

FWIW, mod_scgi also has same problem and it doesn't implement timeouts
so can suffer permanent deadlock.

> I heard (but did not
> verify) that mod_fastcgi does not have this deadlocking problem. The sizes
> of the buffers determines the size of the inputs and outputs needed to cause
> a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.

MacOS X is only system I know of that has small default UNIX socket
buffer sizes. This small buffer size only applies to UNIX socket
buffer sizes, for INET sockets it is much much larger. Since
mod_fastcgi predominantly uses INET sockets, if there is an issue it
may not be obvious as you would need to be returning very large
response. From what I remember when I looked at mod_fastcgi and
mod_proxy for certain types of operations they both try and force all
request content down the socket before trying to read response. Thus,
am not convinced that problem couldn't actually occur for both of
these as well, but since INET socket buffer size much much larger, not
generally triggered.

To work around UNIX socket buffer size on mod_wsgi, there are options
which can be supplied to WSGIDaemonProcess to change the UNIX socket
buffer sizes used to something more sensible.
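
To see what the defaults are on a given platform, something like the following can be used. It is a sketch that assumes a POSIX system, where `socket.socketpair()` creates a connected pair of UNIX sockets by default; the function name is made up, and the values reported vary by operating system.

```python
import socket


def unix_socket_buffer_sizes():
    """Return (send_buffer, receive_buffer) sizes in bytes for a socket pair."""
    a, b = socket.socketpair()
    try:
        send_size = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_size = b.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return send_size, recv_size
    finally:
        a.close()
        b.close()
```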

> Therefore, for maximum portability, a WSGI application should ALWAYS consume
> the *whole* request body if it wants to avoid the deadlock using the
> reference WSGI adapter in PEP 333 or mod_wsgi.
>
> Probably other WSGI gateways have similar issues. It would be nice if there
> was a standard entry in the WSGI environment (e.g.
> "wsgi.may_ignore_request_body") that could be used to safely detect when we
> can skip the request body. It would be even nicer if WSGI gateways were
> updated to avoid this problem. However, that is easier said than done.
>
> If you know C, it is relatively simple to modify mod_wsgi to use a different
> Apache<->daemon communication protocol so that the daemon mode works as you
> would expect (no deadlocks, proper 100-continue support, request body isn't
> read unless your application asks for it). A long time ago I had a patch
> that did this (among other things) but I don't think I have it any more.

Depends on your definition of simple. It would be quite fiddly to do
and get right, or one would have to rewrite a large amount of code. I
wouldn't regard either as really that simple.

> However, once you get to that point, you still run into problems. If your
> goal is to avoid reading the request body, then you need to close the
> connection in your error response; Otherwise, if the request was a HTTP/1.1
> request, you still need to read the entire request body in order to process
> any requests that follow it in the request pipeline. Unfortunately, a WSGI
> application doesn't have any way of signaling that the connection is to be
> closed; the WSGI specification forbids the WSGI application from returning
> the Connection header since it is hop-by-hop. And, even if there was such a
> mechanism, a poorly-coded client is likely to still cause a deadlock if the
> server doesn't read its full request. Make sure you test with all your
> targeted browsers.

Apache, and I would expect any sensible web server, always closes a
client connection when error responses are returned. Thus it will only
allow request pipelining so long as 200 response is returned. Okay, it
isn't this simple as Apache looks at lots of other things as well, but
close enough.

The WSGI specification may forbid returning Connection header, but if
you do do it with mod_wsgi, then Apache will note it and close the
connection even if 200 response is returned.

Graham

Brian Smith

Nov 26, 2008, 10:01:43 AM
to graham.d...@gmail.com, web...@python.org
Graham Dumpleton wrote:
> 2008/11/26 Brian Smith <br...@briansmith.org>:

> > Under Apache CGI or mod_wsgi, in many situations you will get a
> > deadlock in this scenario.
>
> It isn't 'many situations', it is a quite specific situation.

Right. I meant that it can happen quite often (every time that situation
occurs), depending on the characteristics of the application.



> > If you know C, it is relatively simple to modify mod_wsgi to use a
> > different Apache<->daemon communication protocol
>

> Depends on your definition of simple. It would be quite fiddly to do
> and get right, or one would have to rewrite a large amount of code. I
> wouldn't regard either as really that simple.

I did it by implementing the communication protocol that I had proposed on
the mod_wsgi mailing list a while ago. It is straightforward to do, but it
does take a lot of time to learn how mod_wsgi works in order to make the
change, especially if you have never written an Apache module before.

- Brian

Robert Brewer

Nov 27, 2008, 12:07:31 PM
to Brian Smith, ra...@rcs-comp.com, graham.d...@gmail.com, web...@python.org

Indeed. This is covered in RFC 2616 Section 8.2.3:

If an origin server receives a request that does not include an
Expect request-header field with the "100-continue" expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations.

CherryPy's wsgiserver will read any remaining request body (which the
application hasn't read) before sending response headers.


Robert Brewer
fuma...@aminus.org

Graham Dumpleton

Nov 27, 2008, 6:15:17 PM
to Robert Brewer, web...@python.org
2008/11/28 Robert Brewer <fuma...@aminus.org>:

A WSGI application could technically want to send response headers and
only then read remaining request content. I don't believe there is
anything in the WSGI specification which prevents that. If you are
discarding the request content as soon as response headers are
generated, that could technically be a problem for some use cases,
even if they may be obscure.

I can't tell from looking at the latest CherryPy WSGI server code, as it
has been changed since I last looked at it and I haven't yet had time to
grok it and run some tests, but previously, in respect of where the WSGI
specification says:

"""The server is not required to read past the client's specified
Content-Length, and is allowed to simulate an end-of-file condition if
the application attempts to read past that point."""

the CherryPy WSGI server code chose NOT to simulate an end-of-file
condition. This was because the amount of data read from
wsgi.input was never tracked. This meant that if the application did try
to read more content than was available while request pipelining was
occurring, the read would hang, as it would not get an empty string
returned as would be normal for an end-of-file condition on a file-like
object.

If the code is still behaving this way, then it wouldn't be possible
for it to discard remaining input as how much was read wasn't tracked.
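
For illustration, tracking how much has been read and simulating end-of-file could look roughly like this. It is a generic sketch with a made-up class name, not CherryPy's code; `stream` stands in for the connection's socket file and `content_length` for the request's declared length.

```python
class LimitedInput:
    """Wrap a stream so reads stop at the request's Content-Length."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""  # simulated end-of-file, like a normal file object
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data
```

Because `remaining` is tracked, the server also knows how much unread input is left to discard before handling the next pipelined request.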

Looking at latest code I do note the presence of a wrapper around
socket used for wsgi.input, but haven't been able to work out yet
whether it returns a traditional empty string as end-of-file
condition, or whether it is going to instead raise your
MaxSizeExceeded exception and thus not be file-like in its behaviour.

Can you perhaps explain what is going to happen when an attempt is
made to read more content than was available, and whether it is
actually going to raise an exception rather than just return an empty
string like file-like objects would?

Personally I think that that part of WSGI specification should be
amended such that it is required that an end-of-file condition MUST be
indicated using an empty string just like with normal file like
objects. Just this one change would mean that one could call read()
with no arguments and have it return all input, whereas at the moment
the WSGI specification does not allow the argument to read() to be optional.

This would actually negate the whole need for applications to even
check/use CONTENT_LENGTH except for situations where it mattered such
as 413 response or where how it decided to process it was dependent on
size. That is, to get all request content you would just call read()
with no argument. If you wanted to process it in chunks, then it would
just loop reading a set chunk size until empty string returned and it
wouldn't need to track how much it read and short read the last chunk.
If applications worked this way then one could handle mutating input
filters that changed amount of request content, ie., decompression of
data, plus could handle chunked transfer encoding on request content
in a reasonable way without having to read it all in and buffer it
just to work out CONTENT_LENGTH.
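
Under that proposed behaviour, processing the input in chunks reduces to a loop like the one below. It is a sketch with a hypothetical helper name, and it assumes the server's wsgi.input returns an empty string at end-of-file, which is exactly what the current specification does not guarantee.

```python
def iter_body(stream, chunk_size=8192):
    """Yield chunks from a file-like object until it reports end-of-file."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # empty read means end of input
        yield chunk
```

Nothing in the loop refers to CONTENT_LENGTH, so it works the same whether or not the amount of input is known in advance.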

Up till now, the only major WSGI server (ignoring wsgiref perhaps) I
knew of which didn't allow read() with no argument or which didn't
simulate end-of-file through empty string being returned was CherryPy
WSGI server. Now its code has been changed, but not sure if it still
does that or whether it has done something totally different to
everything else by raising an exception instead.

Graham

Robert Brewer

Nov 28, 2008, 12:58:25 AM
to Graham Dumpleton, web...@python.org
Graham Dumpleton wrote:
> 2008/11/28 Robert Brewer <fuma...@aminus.org>:

> > CherryPy's wsgiserver will read any remaining request body (which the
> > application hasn't read) before sending response headers.
>
> A WSGI application could technically want to send response headers and
> only then read remaining request content. I don't believe there is
> anything in the WSGI specification which prevents that. If you are
> discarding the request content as soon as response headers are
> generated, that could technically be a problem for some use cases,
> even if they may be obscure.

I'll look into that further.

> I can't tell from looking at latest CherryPy WSGI server code as has
> been changed since last I looked at it and haven't yet had time to
> grok it and run some tests, but previously in respect of where WSGI
> specification says:
>
> """The server is not required to read past the client's specified
> Content-Length, and is allowed to simulate an end-of-file condition if
> the application attempts to read past that point."""
>
> the CherryPy WSGI server code chose NOT to simulate an end-of-file
> condition. This was the case as the amount of data read from
> wsgi.input was never tracked. This meant that if application did try
> and read more content than available and request pipelining occurring
> then the read would hang as would not get an empty string returned as
> would be normal for end-of-file condition for file like object.
>
> If the code is still behaving this way, then it wouldn't be possible
> for it to discard remaining input as how much was read wasn't tracked.
>
> Looking at latest code I do note the presence of a wrapper around
> socket used for wsgi.input, but haven't been able to work out yet
> whether it returns a traditional empty string as end-of-file
> condition, or whether it is going to instead raise your
> MaxSizeExceeded exception and thus not be file-like in its behaviour.

It still raises MaxSizeExceeded.

I'd be open to changing it to EOF instead of error; amending the WSGI
spec would be nice too.


Robert Brewer
fuma...@aminus.org
