GZIPOutputStream (for C++)


bolson

Mar 1, 2009, 5:01:41 PM
to Protocol Buffers
Has anyone written a ZeroCopyOutputStream that compresses its output
with zlib on the way out? (Ideally there'd be a matching
GZIPInputStream.) This is of course trivial in Java, but my project is
in C++.

I recently noticed that one of my protobuf-data-files still benefitted
from gzip at a ratio slightly better than 7 to 1.

I'm imagining usage like this:
ZeroCopyOutputStream* file_out = new FileOutputStream(fd);
ZeroCopyOutputStream* zlib_out = new GZIPOutputStream(file_out);
CodedOutputStream* code_out = new CodedOutputStream(zlib_out);
code_out->WriteTag(42);
my_message.SerializeToCodedStream(code_out);


If there's any interest and no prior art I may get around to this
myself in a day or two. My files will just have to be big until then.
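
The layering imagined above (a compressing stream that wraps another stream and forwards its output) can be sketched without any dependencies. This is only an illustration of the wrapping pattern, under loud assumptions: `ByteSink`, `StringSink` and `RunLengthSink` are hypothetical names, `ByteSink` is a simplified stand-in and not the protobuf `ZeroCopyOutputStream` interface, and a trivial run-length encoder stands in for zlib.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified byte sink. NOT the protobuf ZeroCopyOutputStream API.
class ByteSink {
 public:
  virtual ~ByteSink() {}
  virtual void Write(const char* data, std::size_t size) = 0;
};

// Innermost sink: collects bytes in memory (stands in for a file stream).
class StringSink : public ByteSink {
 public:
  void Write(const char* data, std::size_t size) override {
    buffer_.append(data, size);
  }
  const std::string& buffer() const { return buffer_; }

 private:
  std::string buffer_;
};

// Wrapping sink: transforms the bytes, then forwards them to the sink it wraps.
// Run-length encoding is used here in place of zlib: each run of identical
// bytes (at most 255 long) is emitted as a (count, byte) pair.
class RunLengthSink : public ByteSink {
 public:
  explicit RunLengthSink(ByteSink* sub)
      : sub_(sub), run_byte_(0), run_length_(0) {}
  ~RunLengthSink() override { Flush(); }

  void Write(const char* data, std::size_t size) override {
    for (std::size_t i = 0; i < size; ++i) {
      if (run_length_ > 0 && data[i] == run_byte_ && run_length_ < 255) {
        ++run_length_;  // extend the current run
      } else {
        Flush();  // emit the previous run, if any
        run_byte_ = data[i];
        run_length_ = 1;
      }
    }
  }

  // Emit the pending run. Called automatically on destruction.
  void Flush() {
    if (run_length_ == 0) return;
    const char pair[2] = {static_cast<char>(run_length_), run_byte_};
    sub_->Write(pair, 2);
    run_length_ = 0;
  }

 private:
  ByteSink* sub_;  // not owned; must outlive this object
  char run_byte_;
  unsigned char run_length_;
};
```

As with the usage imagined above, the wrapper has to be flushed or destroyed before the underlying sink is read or closed, otherwise the last buffered run never reaches it. A real zlib-backed stream would have the same ordering requirement for its trailing compressed block.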

Kenton Varda

Mar 1, 2009, 6:30:25 PM
to bolson, Protocol Buffers
On Sun, Mar 1, 2009 at 2:01 PM, bolson <brian...@gmail.com> wrote:
> I recently noticed that one of my protobuf-data-files still benefitted
> from gzip at a ratio slightly better than 7 to 1.

Just to note, this is not surprising.  PB is a compact format but does not do any actual compression, so if you have a message containing repetitive data, it will still compress well.
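
To make that concrete, here is a minimal sketch of the base-128 varint encoding that protobuf uses for integers. The helper names `AppendVarint` and `EncodeRepeated` are made up for illustration and are not part of the protobuf library. Each value is stored compactly, but every occurrence is encoded independently, so a repeated value produces a repeated byte pattern that a general-purpose compressor such as gzip can still shrink.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` to `out` as a base-128 varint: 7 payload bits per byte,
// least significant group first, high bit set on every byte except the last.
void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<std::uint8_t>(value));
}

// Encode the same value `count` times. Nothing here notices the repetition:
// the output is simply `count` copies of the same short byte sequence.
std::vector<std::uint8_t> EncodeRepeated(std::uint64_t value,
                                         std::size_t count) {
  std::vector<std::uint8_t> out;
  for (std::size_t i = 0; i < count; ++i) {
    AppendVarint(value, &out);
  }
  return out;
}
```

For example, 300 encodes to the two bytes 0xAC 0x02, and `EncodeRepeated(300, 1000)` yields 2000 bytes consisting of that pair repeated 1000 times.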

Mark Assad

Mar 1, 2009, 10:04:42 PM
to bolson, Protocol Buffers
Hi,

I've implemented something like this (attached to this email).
You'll probably have to do a bit of work to get it working with what
you want to do. The InputStream takes another ZeroCopyInputStream as
its input. Unfortunately, the output stream hasn't been implemented this
way; instead it takes a filename or a file descriptor. There is no
technical reason for this: it was just done the wrong way the first
time, and I haven't had the chance to fix it yet.

I've used the boost shared_ptr to store the underlying
ZeroCopyInputStreams. The typedefs are something like:

typedef boost::shared_ptr<ZeroCopyInputStream> ZeroCopyInputStreamPtr;

I've attached the .h/.cpp and the testcases I've written for the input
class. This will need to be linked with zlib.

If you spot any problems, I'd appreciate hearing about them. Thanks.

Mark
GZFileInputStream.cpp
GZFileInputStream.h
GZFileInputStream_unittest.cpp
GZFileOutputStream.cpp
GZFileOutputStream.h