mod_deflate input filter with mod_wsgi lags Apache

Aaron Staley

Oct 13, 2009, 9:14:46 PM
to modwsgi
For an application I'm developing, the user submits a gzipped HTTP
POST request (content-encoding: GZIP) with multipart form data
(content-type: multipart/form-data). I use mod_deflate as an input
filter to decompress and the web request is processed in Django via
mod_wsgi.

Generally, everything is fine. But for certain requests
(deterministic), there is almost a minute lag from request to
response. Investigation shows that the processing in Django is done
immediately, but the response from the server stalls. If the request
is not GZIPed, all works well.

Note that in what is probably a violation of the HTTP specification, I
set the request content-length to the uncompressed message size. This
has thus far worked, but I'm not sure if something "bad" is occurring
within apache or mod_deflate because of this 'hack'. (More info here
in comments on this bug: http://code.djangoproject.com/ticket/10819#comment:1).

Has anyone run into this problem? I've been advised that I just
shouldn't be using input filters with wsgi applications; is that the
consensus here? Will I just have to use a (likely significantly
slower and more memory consuming) decompressing implementation within
Django?

Graham Dumpleton

Oct 13, 2009, 9:26:34 PM
to mod...@googlegroups.com
What is the typical size of your compressed request content?

I am going to write a WSGI middleware wrapper for you that recalculates
the correct CONTENT_LENGTH. However, if the files are too big, it will
need to use a temporary file on disk rather than buffer them in memory.
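
A minimal sketch of the memory-versus-disk buffering described above, assuming Python 3 and the standard library's tempfile.SpooledTemporaryFile. The function name and the 1 MiB threshold are illustrative only and do not come from this thread.

```python
import io
import tempfile


def buffer_request_body(stream, max_in_memory=1024 * 1024, blksize=8192):
    """Copy `stream` into a rewindable buffer and return (buffer, length).

    The data is held in memory up to `max_in_memory` bytes; beyond that,
    SpooledTemporaryFile moves it to a temporary file on disk.
    """
    buffer = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    length = 0
    while True:
        data = stream.read(blksize)
        if not data:
            break
        buffer.write(data)
        length += len(data)
    buffer.seek(0)  # rewind so the application reads from the start
    return buffer, length
```

The returned length is what a middleware would store in CONTENT_LENGTH (as a string), and the returned buffer is what it would store in wsgi.input.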

Also, so I know what to check for, which request header designates
that the content was compressed, and what is it set to? I can look
this up, but I just want to be sure of what you are using.

Regarding the SO comment and the original Django ticket: the
suggestion wasn't that you send the wrong Content-Length from the
client, but that CONTENT_LENGTH be recalculated, which is what the
WSGI middleware I will post when done will do.

Graham

Aaron Staley

Oct 13, 2009, 9:56:39 PM
to modwsgi
Hi Graham,
Thanks a lot. I am setting Content-Encoding to gzip in the request
header.
The typical request is only a few hundred bytes. The one that is
having the issue I described is 40kb uncompressed.

-Aaron

Graham Dumpleton

Oct 13, 2009, 10:02:46 PM
to mod...@googlegroups.com
Just so you know, here is what I expect the issue you are seeing to
be. When Apache finalises a request, if it is HTTP/1.1 and the
connection is to be kept alive for a subsequent request, it has to
read any remaining request content up to Content-Length. If you have
set Content-Length to more than is actually there, Apache will hang
waiting to see if that extra data arrives. Eventually the browser
probably kills the connection and the server gets unstuck.
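
To make the mismatch concrete, here is a small sketch, assuming Python 3, that compares the length the client in this thread declared (the uncompressed size) with the number of bytes it actually puts on the wire. The function name and sample payload are hypothetical; the difference is the amount of request content that never arrives.

```python
import gzip


def content_length_shortfall(payload):
    """Return declared-minus-sent bytes when Content-Length is set to the
    uncompressed size but the gzip-compressed body is what gets sent."""
    compressed = gzip.compress(payload)
    declared = len(payload)   # the 'hack': uncompressed size in Content-Length
    sent = len(compressed)    # bytes actually transmitted
    return declared - sent


if __name__ == "__main__":
    sample = b"field=value&" * 3000  # repetitive form-like data compresses well
    print("bytes declared but never sent:", content_length_shortfall(sample))
```

Any positive result means the server has been promised more request content than the client will ever send.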

I'll try and send the WSGI middleware later. Because requests are
small, buffering can be done in memory. I think Django loads it all in
memory anyway.

Graham

Graham Dumpleton

Oct 13, 2009, 10:45:42 PM
to mod...@googlegroups.com
Try this in your WSGI script file:

import cStringIO

class Wrapper:

    def __init__(self, application):
        self.__application = application

    def __call__(self, environ, start_response):
        # Only rebuffer requests whose content was gzip-compressed.
        # NOTE: the follow-up below corrects this key to
        # 'HTTP_CONTENT_ENCODING'.
        if environ.get('CONTENT_ENCODING', '') == 'gzip':
            buffer = cStringIO.StringIO()
            input = environ['wsgi.input']
            blksize = 8192
            length = 0

            # Read until end of input, ignoring the original CONTENT_LENGTH.
            data = input.read(blksize)
            buffer.write(data)
            length += len(data)

            while data:
                data = input.read(blksize)
                buffer.write(data)
                length += len(data)

            # Wrap the collected data in a fresh buffer positioned at the start.
            buffer = cStringIO.StringIO(buffer.getvalue())

            environ['wsgi.input'] = buffer
            # WSGI expects CONTENT_LENGTH to be a string.
            environ['CONTENT_LENGTH'] = str(length)

        return self.__application(environ, start_response)

import os, sys
sys.path.append('/usr/local/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

import django.core.handlers.wsgi

application = Wrapper(django.core.handlers.wsgi.WSGIHandler())

In other words, wrap the Django WSGI application instance in this
Wrapper object.

When it sees the gzip content encoding, it should read in all the
request content, ignoring CONTENT_LENGTH, then recalculate
CONTENT_LENGTH and replace wsgi.input with the data just read.

The Content-Length in your original request should be what it is
supposed to be, that is, the size of the compressed content being sent.

This will work with mod_wsgi, but not necessarily all other WSGI adapters.
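
For readers on Python 3, the same idea can be sketched roughly as follows. This is an adaptation, not the code posted in this thread: it assumes io.BytesIO in place of cStringIO, checks the HTTP_CONTENT_ENCODING key, and, like the original, relies on wsgi.input returning an empty result at end of input, which the WSGI specification does not guarantee for every server. The class name and the echo_app stand-in are hypothetical.

```python
import io


class GzipLengthFixer:
    """WSGI middleware: rebuffer decompressed input and fix CONTENT_LENGTH."""

    def __init__(self, application):
        self._application = application

    def __call__(self, environ, start_response):
        if environ.get('HTTP_CONTENT_ENCODING', '').lower() == 'gzip':
            stream = environ['wsgi.input']
            # Read until the server reports end of input, ignoring the
            # CONTENT_LENGTH supplied with the request.
            body = b''.join(iter(lambda: stream.read(8192), b''))
            environ['wsgi.input'] = io.BytesIO(body)
            environ['CONTENT_LENGTH'] = str(len(body))
        return self._application(environ, start_response)


def echo_app(environ, start_response):
    """Hypothetical stand-in application that echoes the request body."""
    body = environ['wsgi.input'].read(int(environ['CONTENT_LENGTH']))
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    return [body]
```

Wrapping is the same as before: application = GzipLengthFixer(your_wsgi_app).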

Graham

Aaron Staley

Oct 14, 2009, 8:36:36 AM
to modwsgi
Graham,
Your solution worked perfectly. Thank you so much for your help!

One thing though: I had to use HTTP_CONTENT_ENCODING, as opposed to
CONTENT_ENCODING in the environ.get line. Is the dictionary key
changing in the next version of mod_wsgi?

-Aaron

Graham Dumpleton

Oct 14, 2009, 6:15:00 PM
to mod...@googlegroups.com
2009/10/14 Aaron Staley <usa...@gmail.com>:
>
> Graham,
>  Your solution worked perfectly.  Thank you so much for your help!
>
>  One thing though: I had to use HTTP_CONTENT_ENCODING, as opposed to
> CONTENT_ENCODING in the environ.get line.  Is the dictionary key
> changing in the next version of mod_wsgi?

No. It is supposed to be HTTP_CONTENT_ENCODING. All headers except for
CONTENT_LENGTH and CONTENT_TYPE are prefixed with 'HTTP_'.
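
The naming rule can be written out as a short helper. This is only an illustration of the convention just described, assuming Python 3; the function name is hypothetical and is not part of mod_wsgi or any WSGI library.

```python
def environ_key(header_name):
    """Map an HTTP request header name to its WSGI environ key."""
    key = header_name.upper().replace('-', '_')
    # Content-Length and Content-Type are the two headers without the prefix.
    if key in ('CONTENT_LENGTH', 'CONTENT_TYPE'):
        return key
    return 'HTTP_' + key


if __name__ == "__main__":
    for name in ('Content-Encoding', 'Content-Length', 'Content-Type'):
        print(name, '->', environ_key(name))
```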

Me just not thinking properly.

Graham