multi-member gzip usage

jefflad

Jul 9, 2012, 7:17:34 PM

What are the use cases for multi-member gzip? Are there any applications that use it?

For instance, would HTTP/1.1 chunked encoding use this?

thanks

Eli the Bearded

Jul 10, 2012, 3:25:49 PM

In comp.compression, jefflad <jlado...@gmail.com> wrote:
> What are the use cases for multi-member gzip ? Are there any
> applications that uses it ?

I've used it in the past for log archives. For cases where you
want to keep more logs around than you can fit uncompressed on
disk, it can fill a niche: daily log files are compressed each
day and appended to monthly archives, which can then be fed to
a log parser able to process a compressed stream.
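
A minimal Python sketch of the log-archive idea, assuming each day's log is
compressed on its own and the resulting members are simply concatenated. The
log lines and variable names are made up for illustration.

```python
import gzip
import io

# Hypothetical daily logs; in practice these would be read from files.
daily_logs = [
    b"2012-07-01 service started\n",
    b"2012-07-02 service restarted\n",
    b"2012-07-03 service stopped\n",
]

# Compress each day separately, then concatenate the gzip members.
monthly_archive = b"".join(gzip.compress(day) for day in daily_logs)

# A multi-member stream decompresses to the concatenation of its members.
restored = gzip.decompress(monthly_archive)

# A parser can also read the archive as a stream, line by line,
# without materializing the uncompressed data first.
with gzip.GzipFile(fileobj=io.BytesIO(monthly_archive)) as stream:
    lines = [line for line in stream]
```

The same thing works at the shell level by concatenating the daily .gz files,
since gzip tools treat the result as one stream.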

> For instance would http 1.1 chunk encoding use this ?

I don't know if it does, but that seems a reasonable use case.

Elijah
------
now just uses daily files

Thomas Pornin

Jul 11, 2012, 7:06:09 AM

According to jefflad <jlado...@gmail.com>:

> What are the use cases for multi-member gzip ? Are there any
> applications that uses it ?

I have seen "mutt" using it. This is a Unix application for reading and
sending emails; it can read mailboxes in the traditional format (i.e.
all mails concatenated in a single file), and it can also use compressed
mailboxes (the same file, compressed with gzip). When adding new emails
to a mailbox (a common case of an "archive mailbox"), it may do it by
appending the gzip of the new emails to the existing file (i.e. a
multi-member gzip), thus avoiding the cost of decompressing and
recompressing the existing mailbox (CPU cost, and also the temporary
storage cost).
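
The append-only approach can be sketched in Python as follows. This is not
mutt's actual code, only an illustration of the idea; the function names, the
file name and the sample messages are assumptions. Opening a gzip file in
append mode writes a new member after the existing bytes.

```python
import gzip
import os
import tempfile

def append_to_compressed_mailbox(path, new_mail):
    """Append new_mail as a new gzip member; existing bytes are not rewritten."""
    with gzip.open(path, "ab") as f:
        f.write(new_mail)

def read_compressed_mailbox(path):
    """Read the whole mailbox; all members are decompressed in order."""
    with gzip.open(path, "rb") as f:
        return f.read()

first_mail = b"From alice@example.com\n\nfirst message\n\n"
second_mail = b"From bob@example.com\n\nsecond message\n\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    mbox_path = os.path.join(tmp_dir, "archive.mbox.gz")

    append_to_compressed_mailbox(mbox_path, first_mail)
    with open(mbox_path, "rb") as raw:
        before = raw.read()

    append_to_compressed_mailbox(mbox_path, second_mail)
    with open(mbox_path, "rb") as raw:
        after = raw.read()

    contents = read_compressed_mailbox(mbox_path)
```

The compressed bytes written by the first call are left untouched by the
second one, which is where the saving in CPU and temporary storage comes from.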

(This was more than 15 years ago, and it was quite handy when the system
was a 25 MHz Sparc system, and the mailbox was in the 10MB+ range. Since
then, machines have grown, and mailboxes have grown, too, but not as
fast.)

--Thomas Pornin

Sven Köhler

Jul 11, 2012, 3:13:29 PM

No. I think chunked transfer-encoding was invented to solve one
essential problem: if you don't know the length of the content in
advance (content-length HTTP response header), you cannot know when the
HTTP response ends. Prior to chunked encoding, you had to read till
EOF and then throw the HTTP connection away. In order to allow the
re-use of the connection, you send the data in chunks, terminating the
stream of chunks with a zero-length chunk. It's simply another way of
encoding an EOF. No content-length header is needed.

Also, Wikipedia states that gzip and chunked encoding are at different
layers: content-encoding (gzip) and transfer-encoding (chunks):
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
So first, the data is gzipped, then the gzip data is split into chunks.
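
A rough Python sketch of that layering, assuming the simplest form of chunk
framing (hex size, CRLF, data, CRLF, ended by a zero-length chunk) and ignoring
chunk extensions and trailers. The function names and the chunk size are
illustrative only.

```python
import gzip

def encode_chunked(body: bytes, chunk_size: int = 16) -> bytes:
    """Frame body as chunks, ending with a zero-length chunk."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # the zero-length chunk plays the role of EOF
    return bytes(out)

def decode_chunked(wire: bytes) -> bytes:
    """Undo encode_chunked (no chunk extensions or trailers supported)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = wire.index(b"\r\n", pos)
        size = int(wire[pos:line_end].decode("ascii"), 16)
        if size == 0:
            break
        start = line_end + 2
        body += wire[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)

payload = b"hello, chunked world\n" * 10

# Content-encoding first: gzip the payload (a single member here).
compressed = gzip.compress(payload)

# Transfer-encoding second: split the gzip bytes into chunks.
wire = encode_chunked(compressed)

# The receiver reverses the two layers in the opposite order.
received = gzip.decompress(decode_chunked(wire))
```

The chunk boundaries fall wherever the chunk size says, independent of any
gzip member boundaries.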

An HTTP-oriented use case of multi-member gzip could be the following:
If the server wants to perform a flush (i.e. emptying any buffers,
making sure any data sent so far successfully arrives at the client),
the server could terminate the current member and start a new one.
But aren't there other means of flushing? Also, I don't know whether
some webservers actually implement flushing of gzip-compressed HTTP
connections.
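
On the question of other means of flushing: zlib does offer a sync flush
(Z_SYNC_FLUSH), which pushes out everything compressed so far without ending
the member. A small Python sketch, assuming wbits=31 to select the gzip
container; the sample data is made up.

```python
import zlib

# wbits=31 (16 + 15) asks zlib for a gzip header and trailer.
compressor = zlib.compressobj(wbits=31)
decompressor = zlib.decompressobj(wbits=31)

# Sender: compress the first part and force it out with a sync flush.
part1 = compressor.compress(b"first part\n")
part1 += compressor.flush(zlib.Z_SYNC_FLUSH)

# Receiver: everything sent so far can already be decoded.
received = decompressor.decompress(part1)
received_after_flush = received

# Sender: compress the rest and finish the (single) gzip member.
part2 = compressor.compress(b"second part\n")
part2 += compressor.flush()

received += decompressor.decompress(part2)
stream_finished = decompressor.eof
```

The stream stays a single member here, so this is an alternative to starting a
new member at each flush point, not an example of multi-member gzip.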

Regards,
Sven