Content-Encoding: gzip (patch included)


Louis Gerbarg

Jan 22, 2009, 6:41:28 PM
to moch...@googlegroups.com
I have been using CouchDB as the backend for an iPhone app. Since we
are often running over cellular, data size is a big deal, so I was
pretty surprised when I noticed the server could not send compressed
content.

Attached is a patch that adds gzip support to mochiweb; it is
generated against the build of mochiweb that is currently included
with CouchDB. It only adds gzip support for non-chunked data. In order
to really support chunked gzip'ed data I would need to add some new
API*. I actually have a rough cut of it, but the changes in CouchDB
are moderately tricky, so I don't have high confidence it is correct.
I will probably build a smaller test app to get that working, then
make the changes to CouchDB.

Be kind with this patch, it is the first Erlang code I have ever written ;-)

Louis

* Technically you can gzip each individual chunk, since gzip streams
can be concatenated to produce a compressed file that decompresses to
the concatenation of the individual inputs. The problem with doing
that is that the dictionary is very small, so you get really bad
compression, especially with something like CouchDB where each record
is sent in a separate chunk and is likely to contain a lot of common
strings. So in order to make that work well I need to pass state
around between transmitting chunks. The downside is that it requires a
new API; the upside is that it would also allow coalescing chunks into
specific sizes even when not doing compression.
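
For illustration, a rough sketch (not the attached patch; the function
names are just placeholders) of what carrying that compression state
across chunks could look like with the standard zlib module:

%% Keep one gzip stream alive across chunks so the dictionary/window is
%% shared, instead of resetting per chunk.
start_gzip_stream() ->
    Z = zlib:open(),
    %% 16 + 15 window bits asks zlib for gzip framing.
    ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
    Z.

gzip_chunk(Z, Data) ->
    %% 'sync' flushes enough output for the client to decode what it has
    %% so far, while keeping the compression state for later chunks.
    zlib:deflate(Z, Data, sync).

finish_gzip_stream(Z) ->
    Last = zlib:deflate(Z, <<>>, finish),
    ok = zlib:deflateEnd(Z),
    ok = zlib:close(Z),
    Last.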

gzip.patch

Bob Ippolito

Jan 22, 2009, 7:20:03 PM
to moch...@googlegroups.com
What's the motivation to do this in the web server by default? It
would actually be very bad for us, since a large part of what we serve
with our mochiweb instances is SWF files, which (in our case) already
have a layer of zlib compression applied.

Everything in this patch could be easily done in the application layer.
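
For example (a minimal sketch, assuming the parameterized request API
mochiweb had at the time, i.e. Req:get_header_value/1 and
Req:respond/1), an app could do something like:

%% Gzip a full (non-chunked) response at the application layer when the
%% client advertises support for it.
maybe_gzip_respond(Req, Headers, Body) ->
    AcceptEnc = Req:get_header_value("accept-encoding"),
    case is_list(AcceptEnc) andalso
         string:str(string:to_lower(AcceptEnc), "gzip") > 0 of
        true ->
            Req:respond({200,
                         [{"Content-Encoding", "gzip"} | Headers],
                         zlib:gzip(Body)});
        false ->
            Req:respond({200, Headers, Body})
    end.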

Louis Gerbarg

Jan 22, 2009, 8:16:14 PM
to moch...@googlegroups.com
Usually doing it in the app layer is bad because compression is a
generic service that you really want everything except precompressed
data to use, so you end up repeating the same code everywhere. Since
you wrote mochiweb for services that mostly serve SWF, images, and
other data that has its own compression, I can see how that might seem
foreign, but imagine a few more apps like CouchDB each having their
own compression layer; that would also be quite bad.

The usual solution to this (which I did not implement, since it is not
necessary for my purposes, but which I would be more than happy to
implement) is to automatically disable it for certain MIME types.
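
For example, the check could be as simple as a table of content types
that are already compressed (a hypothetical sketch, not part of the
patch):

%% Content types for which gzip is pointless because the payload
%% already carries its own compression.
already_compressed("application/x-shockwave-flash") -> true;
already_compressed("image/jpeg")                    -> true;
already_compressed("image/png")                     -> true;
already_compressed("image/gif")                     -> true;
already_compressed("application/zip")               -> true;
already_compressed("application/x-gzip")            -> true;
already_compressed(_Other)                          -> false.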

Bob Ippolito

Jan 23, 2009, 1:40:45 AM
to moch...@googlegroups.com
I think it would make the most sense to have it available as some kind
of library that people can use if they want to; I don't want to turn
any potentially surprising behavior on by default. Also note that
Internet Explorer sucks at gzip encoding in many cases.

Louis Gerbarg

Jan 23, 2009, 2:06:54 AM
to moch...@googlegroups.com
It could certainly be set to off by default and the app could turn it
on. I imagine in a lot of cases with CouchDB the clients have a lot of
control over their HTTP stack, so concerns over old versions of IE are
not an issue there. I personally have not had to deal with IE
recently, though I have to concede I find it shocking that their gzip
implementation could be bad enough to make sending uncompressed data
preferable.

The other (potential) benefit of putting the code into the base app
server, even if it is not on by default, is the ability to coalesce
chunks. Currently, if an app uses chunked encoding, a chunk is sent
every time the app pushes any data. In a lot of cases it is better to
buffer up to some size and then push it. There is no real point in
sending a bunch of 40-byte chunks; the OS is going to buffer them up
to MTU size anyway. I would need to implement such buffering anyway to
get a decent-sized window for compression. While I could do that up in
an app, it seems like it would be a decent win for all chunked
encoding (compressed or not), so it makes sense to put it in the app
server so not every app needs to deal with it. It would require a new
interface, though the current one could be implemented on top of the
new one for compatibility. I would only want to do that in the app
server if I am doing compression there, because otherwise I would need
to implement another buffer somewhere else for my compression window,
and double buffering the data in the app and the app server is
nonsensical.

The new interface would look something like:

NewState = send_chunks(State, Data)

and

send_chunks_immediate(State, Data)
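
To make the buffering idea concrete, here is a rough sketch of how
such an interface might coalesce data; the record, threshold, and
Flush callback are all hypothetical, with Flush standing in for
whatever writes a real HTTP chunk today:

-record(chunk_buf, {acc = [], size = 0, threshold = 4096, flush}).

%% Accumulate data; flush automatically once the threshold is reached.
send_chunks(#chunk_buf{acc = Acc, size = Size, threshold = T} = State, Data) ->
    NewSize = Size + iolist_size(Data),
    NewState = State#chunk_buf{acc = [Acc, Data], size = NewSize},
    case NewSize >= T of
        true  -> send_chunks_immediate(NewState, []);
        false -> NewState
    end.

%% Flush whatever is buffered (plus Data) right now.
send_chunks_immediate(#chunk_buf{acc = Acc, flush = Flush} = State, Data) ->
    Flush([Acc, Data]),
    State#chunk_buf{acc = [], size = 0}.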

Louis

Bob Ippolito

Jan 23, 2009, 3:58:11 AM
to moch...@googlegroups.com
None of this needs to be implemented in the core of mochiweb. I'm fine
with including stuff like this as modules that ship with mochiweb that
can be used explicitly, but I don't intend to put anything like this
in the core. All of the ideas you've suggested are smart in some
situations but potentially harmful in others. They're also all
implementable without changing a single line of existing mochiweb
source code...

mochiweb is intended to implement enough of the HTTP protocol such
that you can do whatever you need to, but it should never do weird
shit behind your back. In no case should you have to work around
something in mochiweb, you should be able to layer damn near anything
you want on top.

Chris Anderson

Jan 23, 2009, 4:04:31 AM
to moch...@googlegroups.com
On Fri, Jan 23, 2009 at 12:58 AM, Bob Ippolito <b...@redivi.com> wrote:
>
> mochiweb is intended to implement enough of the HTTP protocol such
> that you can do whatever you need to, but it should never do weird
> shit behind your back. In no case should you have to work around
> something in mochiweb, you should be able to layer damn near anything
> you want on top.
>

I like the way you put this, Bob.

And, Louis, if the implementation is solid, it does sound like it
could be a good option in CouchDB.


--
Chris Anderson
http://jchris.mfdz.com

Pichi

Jan 28, 2009, 1:54:46 PM
to MochiWeb
I have just taken a quick peek inside your patch, and I think it is
not a good idea to use io_list_size just to determine whether the size
is more than zero. It can be done in a much more efficient way with
code like:
is_nonempty_io_list([H|T]) when is_list(H); is_binary(H) ->
    is_nonempty_io_list(H) orelse is_nonempty_io_list(T);
is_nonempty_io_list([_|_]) -> true;
is_nonempty_io_list(B) when is_binary(B) -> size(B) > 0;
is_nonempty_io_list(_) -> false.
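
The point of that shape is that it can return as soon as it finds the
first byte, whereas iolist_size/1 always walks the whole structure.
For example (hedged examples, assuming the function above):

false = is_nonempty_io_list([[], <<>>, ""]),
true  = is_nonempty_io_list([<<>>, [[], "x"], <<"rest">>]).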
>  gzip.patch

Louis Gerbarg

Jan 28, 2009, 11:01:07 PM
to moch...@googlegroups.com
Since Bob does not want this behaviour in mochiweb (regardless of the
implementation) I am actually doing a completely different
implementation that is a library that couchdb calls out to. I am also
throwing away the fixed size implementation (since that is a special
case of streaming, which I need to implement as well).

In other words, all that code is gone ;-)

Bob Ippolito

Jan 28, 2009, 11:54:24 PM
to moch...@googlegroups.com
That's not quite what I said. I said I don't want requests to do this
by default, which is how you implemented the patch. It can be
implemented in many other ways. I would accept a patch that allowed
explicit use of gzip from a mochiweb request. It could work in a
similar way to how mochiweb serves files (explicitly), for example.

Kunthar

Jan 29, 2009, 5:46:45 AM
to moch...@googlegroups.com
AFAIK, Mochiweb has an MIT license. Anyone who has a different
approach than the main idea can put a forked version up on GitHub.
Everyone could then be happy. Quite simple.

Peace
Kunth

Louis Gerbarg

Jan 29, 2009, 3:35:55 AM
to moch...@googlegroups.com
Ah, sorry for the misunderstanding. I may end up sending you a patch
then, as I think it is better in the app server. Let me sketch out
what I have done and how it could work, to see if you find it more
workable.

I implemented a basic buffer that takes two inputs, a transformer and
an emitter. I then implemented zip as a transformer (as well as an
identity transformer), and two emitters: a chunked emitter and a burst
emitter (which gathers up everything until the buffer is flushed and
bursts it out). With both of those emitters no data is actually
buffered, though I would obviously add a coalescing emitter that
really did buffer data on the server prior to it being compressed.

The way I would like to hook this up in mochiweb (and I am thinking
this up as I write this, so bear with me) is to send all data that is
transmitted through mochiweb through the buffer interface, using
either the chunked emitter or the burst emitter (depending on whether
it is chunked or not) and the identity transform. The output should be
exactly the same as it is now, and the processing cost should be
minimal, just an extra level of indirection. Additionally, I could add
an option to specify a custom emitter or transform when a response is
created. It would still leave the header parsing up in CouchDB, but
the actual compression and buffering would happen in mochiweb (though
perhaps using an emitter and gzip transformer living in CouchDB).
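
In rough Erlang, the shape is something like this (all names are
illustrative sketches, not the actual implementation):

%% A transformer fun (which may carry state, e.g. a zlib stream)
%% feeding an emitter fun.
-record(buf, {tstate, transform, emit}).

new_buffer(TState, Transform, Emit) ->
    #buf{tstate = TState, transform = Transform, emit = Emit}.

%% The identity transformer keeps no state and passes data through.
identity(State, Data) -> {State, Data}.

%% Push data through the transformer, hand the result to the emitter
%% (for example a fun that writes one HTTP chunk), keep the new state.
buffer_send(#buf{tstate = S, transform = T, emit = E} = Buf, Data) ->
    {S1, Out} = T(S, Data),
    E(Out),
    Buf#buf{tstate = S1}.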

Louis Gerbarg

Jan 29, 2009, 6:23:34 PM
to moch...@googlegroups.com
Forking has a maintenance cost associated with it, and even though Bob
and I have different needs, I think the fact that he has a lot more
familiarity with Erlang and Mochiweb means his views on how something
like this should be architected are very valid, and I should try to
figure out how to make my needs conform to them.

It might be the case that there is no way to do it cleanly within
Mochiweb in a way that I find usable and that Bob likes, at which
point I have the choice of either implementing it at what I think is
not quite the right layer of the stack, or forking mochiweb. Either of
those is an acceptable option, as I ultimately need this to ship a
product, but doing either of them immediately, especially since I am
the novice here, seems somewhat rash. Instead I should see if I can
come up with something acceptable to everyone; I suspect trying to
make it work within the constraints Bob laid out will result in a
nicer implementation.

Louis

Justin Sheehy

Feb 4, 2009, 11:13:49 AM
to MochiWeb
I'm in complete agreement regarding not doing gzip by default.
Gzip'ed content-encoding is extremely useful in many cases, but can be
problematic in others.

In webmachine, which rests atop mochiweb, one can turn on GZIP
encoding for a given resource just by giving that resource the
following:

encodings_provided(_ReqProps, Context) ->
    {[{"identity", fun(X) -> X end},
      {"gzip", fun(X) -> zlib:gzip(X) end}], Context}.

More commonly, people make the function inspect the request to decide
whether to support gzip encoding depending on method or other
details. If you simply leave out this function then the identity
encoding is the only one supported.
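
For instance, a hedged sketch of that variant; this assumes ReqProps
is a proplist carrying the request method under the key method, which
is an assumption about the webmachine API of the time rather than its
documented behavior:

%% Only offer gzip for GET requests; everything else gets identity.
encodings_provided(ReqProps, Context) ->
    Identity = {"identity", fun(X) -> X end},
    Gzip = {"gzip", fun(X) -> zlib:gzip(X) end},
    Encodings =
        case proplists:get_value(method, ReqProps) of
            'GET' -> [Identity, Gzip];
            _     -> [Identity]
        end,
    {Encodings, Context}.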

I'm only mentioning this to make the same point that I believe Bob
was making: gzip is very useful, but turning it on all the time in the
guts of mochiweb is a bad idea -- and it's not hard to do it at a
slightly higher level. Louis, perhaps this is the "right layer of the
stack" you were thinking of?

-Justin

Louis Gerbarg

Feb 4, 2009, 5:00:33 PM
to moch...@googlegroups.com
So, I actually have a (not very clean) proof of concept for CouchDB
that I think should be architecturally okay with Bob. Here is
basically how it works.

If you do nothing, mochiweb behaves exactly the same as it does now.
Most of the new code lives in method entries that will never be
reached due to pattern matching, the only exception being some changes
in the "write_chunk" method so it knows how to use buffering if it is
enabled.

In addition to being able to specify something as "chunked", you can
specify it as "buffered", "zipped", or "bufferedAndZipped" (well,
atoms, not strings, but I am using quotes here ;-). Buffered coalesces
chunks, zipped zips a whole (unchunked) body, and bufferedAndZipped
does both. All of the interfaces to mochiweb remain the same with
those additions, and I added a utility function in Mochiweb that walks
through the headers and chooses the "densest" available encoding (gzip
or none, I will add deflate as well).
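
A hedged sketch of what such a chooser could look like (the function
name and the way the header is obtained are illustrative, and a real
version should also honor q-values in Accept-Encoding):

%% Pick the densest encoding the client advertises; currently just
%% gzip or identity, as described above.
pick_encoding(undefined) ->
    identity;
pick_encoding(AcceptEncoding) when is_list(AcceptEncoding) ->
    case string:str(string:to_lower(AcceptEncoding), "gzip") > 0 of
        true  -> gzip;
        false -> identity
    end.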

The end result is that all the code that is likely to be duplicated
lives in mochiweb, but most of it stays dormant. If a client wants to
use it for a particular request, they call the utility function to
find out what encoding the client wants, then pass that in when they
create the response.

If you want to see what it looks like now, it is up at
<http://github.com/lgerbarg/couchdb/tree/gzip-support>. It still has a
few bugs I am working through, but it passes most of CouchDB's test
suite, and the failures are intermittent.

Louis