Re: Proposal: add request decompression to gzip middleware

903 views
Skip to first unread message

Florian Apolloner

unread,
May 25, 2013, 7:08:01 PM5/25/13
to django-d...@googlegroups.com
Hi,

given that I already think that the GzipMiddleware is a bad idea (it should be left up to the webserver), I don't like the idea of adding this to the current middleware. Any reason why you couldn't do this in the webserver?

Regards,
Florian

Aymeric Augustin

unread,
May 26, 2013, 4:50:21 AM5/26/13
to django-d...@googlegroups.com
On 26 mai 2013, at 01:08, Florian Apolloner <f.apo...@gmail.com> wrote:

Any reason why you couldn't do this in the webserver?

Unlike compression, which is supported by almost all web servers (even though it's next to impossible to handle etags properly for dynamic content at this level), decompression isn't supported in general.

My only concern with adding this to GzipMiddleware is backwards-compatibility. Suddenly every user of GzipMiddleware may see a different behavior.

For example, double-decoding could occur in the following circumstances:
- if an application gzip-decodes the request body without checking the Content-Encoding header, because it knows it's gzipped,
- if a WSGI middleware gzip-decods the request body but doesn't strip the Content-Encoding header.

Arguable, these are bad designs, and I think we can make this change and document it in the release notes.


On 25 mai 2013, at 16:34, Sébastien Béal <seba...@gmail.com> wrote:

The idea behind this is simply to decompress the body of requests containing Content-Encoding: gzip header. 
I provided a working example on this branch: https://github.com/sebastibe/django/tree/gzip-request-middleware

This patch doesn't work at all. request.POST is a MultiValueDict, not a string! You're artificially setting it to a string in the test, and you're unit-testing the implementation, so the test passes, but a real request wouldn't work.

The process for parsing the request body into POST, FILES, or request.body is quite complicated in Django. The only reasonable implementation is to wrap environ['wsgi.input'] in a gzip decoder, and that much easier to implement as a WSGI middleware. I have such an implementation floating around:
import logging
from StringIO import StringIO
import zlib
class UnzipRequestMiddleware(object):
    """A middleware that unzips POSTed data.

For this middleware to kick in, the client must provide a value
for the ``Content-Encoding`` header. The only accepted value is
``gzip``. Any other value is ignored.
"""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        encoding = environ.get('HTTP_CONTENT_ENCODING')
        if encoding == 'gzip':
            data = environ['wsgi.input'].read()
            try:
                uncompressed = zlib.decompress(data)
                environ['wsgi.input'] = StringIO(uncompressed)
                environ['CONTENT_LENGTH'] = len(uncompressed)
            except zlib.error:
                logging.warning(u"Could not decompress request data.", exc_info=True)
                environ['wsgi.input'] = StringIO(data)
        return self.app(environ, start_response)
If you'd like to implement this as a Django middleware, I'll have a look at the code. It will be more complex that your initial attempt and it will require a healthy amount of documentation, especially for middleware ordering. For example, if a middleware accesses request.POST prior to unzipping, that won't work.

You should also check how this interacts with upload handlers. If the decoding is performed at the right level, it should be independent, but please check:

-- 
Aymeric.




Sébastien Béal

unread,
May 26, 2013, 7:48:31 AM5/26/13
to django-d...@googlegroups.com
Thank you for the review and feedbacks. Indeed it seems easier to do it as WSGI middleware. However I think it could still be useful in Django so I am going to rework my code as follow:
  • create a dedicated 'UnzipRequestMiddleware' in the gzip module
  • handle POST and FILES correctly (I was too focus on my use case dealing with JSON hence my error with the string/dict)
  • include documentation regarding middleware ordering
-- 
Sébastien

Mikhail Korobov

unread,
May 26, 2013, 8:23:55 AM5/26/13
to django-d...@googlegroups.com
Request decompression looks scary: how are you going to implement protection against zip bombs (http://en.wikipedia.org/wiki/Zip_bomb)? See also: http://bugs.python.org/issue16043

суббота, 25 мая 2013 г., 20:34:44 UTC+6 пользователь Sébastien Béal написал:
Hi,

I would like to suggest to add requests decompression to the gzip middleware. Although few browsers have the ability to compress the request body, some use cases exists with other type of clients when building REST APIs or WebDAV clients.

To my knowledge, only Apache mod_deflate was providing this feature.

The idea behind this is simply to decompress the body of requests containing Content-Encoding: gzip header. 
I provided a working example on this branch: https://github.com/sebastibe/django/tree/gzip-request-middleware

What do you think about this?

Sébastien Béal

unread,
May 26, 2013, 9:49:48 AM5/26/13
to django-d...@googlegroups.com
I didn't think of it at first but I see 2 ways to do it:

- limit resources available with the resource module. Is there any other parts of Django using this technique?
- use the zlib module instead of the gzip one with a max_size in the decompress function. A pattern could be to require the Content-Length of the request to be equal to the uncompress size and use it as the max_size argument to be more dynamic, or just to set an arbitrary max_size. It will involve more work than just using the gzip module though.

Florian Apolloner

unread,
May 26, 2013, 11:51:28 AM5/26/13
to django-d...@googlegroups.com
Hi,


On Sunday, May 26, 2013 3:49:48 PM UTC+2, Sébastien Béal wrote:
- limit resources available with the resource module. Is there any other parts of Django using this technique?

Using rlimits are imo not an option; as (to my knowledge) it affects the whole process and not just the thread; also you'd have to reset it later on etc… All in all a solution which sounds to complicated imo.
 
- use the zlib module instead of the gzip one with a max_size in the decompress function. A pattern could be to require the Content-Length of the request to be equal to the uncompress size and use it as the max_size argument to be more dynamic, or just to set an arbitrary max_size. It will involve more work than just using the gzip module though.

I am not sure that's going to fly well, since when you use gzip you probably wanna send plenty of data, and as long as the user can control the max_size you are running into the same issues; so the only option would be your suggested max_size.

As Aymeric already pointed out, this is certainly easier in a real WSGI middleware, so the question is on whether we really want a suboptimal and error prone implementation as a django middleware. Personally I don't think it's worth is if the code is really as short as Aymeric demonstrated, ymmv.

Regards,
Florian

Reply all
Reply to author
Forward
0 new messages