Simple working example of GzipOutputStream and GzipInputStream


Luke

Oct 3, 2011, 6:07:02 PM
to Protocol Buffers
Dear all,

after some days of experimenting with Protocol Buffers I tried to
compress the files.
With Python this is quite simple and does not require working with
streams directly.

Since most of our code is written in C++ I would like to compress and
decompress files in the same language.
I've searched for examples utilising GzipOutputStream and
GzipInputStream with Protocol Buffers but could not find a working
example.
As you have probably noticed by now, I am a beginner at best with streams
and would really appreciate a fully working example like the one in
http://code.google.com/apis/protocolbuffers/docs/cpptutorial.html
(I have my address_book; how do I save it in a gzipped file?)

Thank you in advance.

Cheers,
Luke

Jason Hsueh

Oct 6, 2011, 3:25:23 PM
to Luke, Protocol Buffers
You need to use the SerializeToZeroCopyStream interface, since you want to pass in a custom stream. You use GzipOutputStream just like any other ZeroCopyOutputStream, except it needs to wrap the underlying stream implementation. Since you mention writing to a gzip file, you might do something like:

int fd = ...;
FileOutputStream file_stream(fd);
GzipOutputStream gzip_stream(&file_stream, GzipOutputStream::Options());
msg.SerializeToZeroCopyStream(&gzip_stream);




Luke

Oct 7, 2011, 3:04:13 PM
to Protocol Buffers
Hi,

thanks a lot for the example!
I have some further questions which can be found inline.

On Oct 6, 8:25 pm, Jason Hsueh <jas...@google.com> wrote:
> You need to use the SerializeToZeroCopyStream interface, since you want to
> pass in a custom stream. You use GzipOutputStream just like any other
> ZeroCopyOutputStream, except it needs to wrap the underlying stream
> implementation. Since you mention writing to a gzip file, you might do
> something like:
>
> int fd = ...;
> FileOutputStream file_stream(fd);
> GzipOutputStream gzip_stream(&file_stream, GzipOutputStream::Options());
> msg.SerializeToZeroCopyStream(&gzip_stream);

The solution I got from the example:
int writeEventCollection2(shared_ptr<HEP::EventCollection> eCollection,
                          std::string filename,
                          unsigned int compressionLevel) {
    using namespace google::protobuf::io;
    int fd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
                  S_IREAD | S_IWRITE);

    if (fd == -1) {
        throw "open failed on output file";
    }
    FileOutputStream file_stream(fd);
    GzipOutputStream::Options options;
    options.format = GzipOutputStream::GZIP;
    options.compression_level = compressionLevel;
    GzipOutputStream gzip_stream(&file_stream, options);

    bool ok = eCollection->SerializeToZeroCopyStream(&gzip_stream);
    // Finish the gzip stream first, then flush and close the file.
    // Calling close(fd) while the streams still hold buffered data
    // would lose the tail of the output. FileOutputStream::Close()
    // closes fd itself, so no separate close(fd) is needed.
    ok = gzip_stream.Close() && ok;
    ok = file_stream.Close() && ok;
    if (!ok) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }

    return 0;
}


My current working solution is:
int writeEventCollection(shared_ptr<HEP::EventCollection> eCollection,
                         std::string filename,
                         unsigned int compressionLevel) {
    filtering_ostream out;
    out.push(gzip_compressor(compressionLevel));
    out.push(file_sink(filename, ios_base::out | ios_base::binary));

    if (!eCollection->SerializeToOstream(&out)) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }

    return 0;
}

Timing for writeEventCollection:
real 13m1.185s
user 11m18.500s
sys 0m13.430s

CPU usage: 65-70%
Size of test sample: 4.2 GB

Timing for writeEventCollection2:
real 12m37.061s
user 10m55.460s
sys 0m11.900s

CPU usage: 90-100%
Size of test sample: 3.9 GB

Is it expected that GzipOutputStream shows higher CPU usage, runs
slightly faster, and produces smaller output?

Cheers,
Luke