How to use ZeroCopy*?

Yang Zhang

Feb 10, 2009, 4:46:45 PM

to Protocol Buffers

Hi, is there any documentation on how exactly to use the ZeroCopy*
streams, or at least a high-level description of how they are used/fit
in with everything else? Thanks in advance!

Kenton Varda

Feb 10, 2009, 5:14:33 PM

to Yang Zhang, Protocol Buffers

Just the API docs:

http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream.html

http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream_impl.html

The ZeroCopyStream interfaces are the underlying abstract stream interfaces used by the protocol buffer I/O code. Messages are parsed from ZeroCopyInputStreams and written to ZeroCopyOutputStreams. These classes are particularly useful when reading from / writing to in-memory data structures because they can direct the parser/serializer directly at the memory rather than copying to/from separate buffers.

(The protobuf library does not provide any direct support for zero-copy file or network I/O, but it would be possible to implement ZeroCopy streams that accomplish this.)

Yang Zhang

Feb 10, 2009, 5:50:51 PM

to Kenton Varda, Protocol Buffers

Kenton Varda wrote:
> The ZeroCopyStream interfaces are the underlying abstract stream
> interfaces used by the protocol buffer I/O code. Messages are parsed
> from ZeroCopyInputStreams and written to ZeroCopyOutputStreams. These
> classes are particularly useful when reading from / writing to in-memory
> data structures

I had assumed that protocol buffers are always read from/written to
in-memory data structures. Is there something else?

> because they can direct the parser/serializer directly
> at the memory rather than copying to/from separate buffers.

After I build up a protocol buffer, I need to call
SerializeToOstream/SerializeToString/etc., which performs a copy, right?
So where exactly do these fit in? Are they an intermediate buffer so
as to reduce SerializeTo* to what's basically a memcpy?

And would these have poor worst-case performance if the protocol buffer
is built up using a sequence of operations that is not amenable to
direct serialization? (E.g. insertions that cause subsequent bits to be
shifted.)

And when I use ParseFrom*, I can imagine a buffer-backed implementation,
but this would be unsafe (could be left with a dangling reference once
the backend is destroyed) - which, AFAIK, is not the case with protobufs.
--
Yang Zhang
http://www.mit.edu/~y_z/

Kenton Varda

Feb 10, 2009, 6:28:43 PM

to Yang Zhang, Protocol Buffers

I think the ZeroCopy stuff is a lot less exciting than you seem to imagine. :)

On Tue, Feb 10, 2009 at 2:50 PM, Yang Zhang <yangha...@gmail.com> wrote:

> Kenton Varda wrote:

>> The ZeroCopyStream interfaces are the underlying abstract stream interfaces used by the protocol buffer I/O code. Messages are parsed from ZeroCopyInputStreams and written to ZeroCopyOutputStreams. These classes are particularly useful when reading from / writing to in-memory data structures

> I had assumed that protocol buffers are always read from/written to in-memory data structures. Is there something else?

You can read from / write to files and C++ iostreams, among other things.

>> because they can direct the parser/serializer directly at the memory rather than copying to/from separate buffers.

> After I build up a protocol buffer, I need to call SerializeToOstream/SerializeToString/etc., which performs a copy, right?

SerializeToString() writes the serialized bytes directly into the string as they are generated. You could say that the information is "copied" from the in-memory structures to the serialized bytes, I suppose, but there's no way around that.

The point here is that most I/O stream interfaces require you to write to an intermediate buffer, and then call some sort of stream.write(buffer, size), which then has to make a copy of that buffer. ZeroCopyOutputStream instead provides the serializer with a pointer directly to the final destination of the bytes (if possible).
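To make the contrast concrete, here is a minimal sketch of the pointer-handing idea. `StringSink` and `EmitLetters` are hypothetical names, not protobuf classes; the sketch only assumes the `Next()`/`BackUp()` shape of `ZeroCopyOutputStream`, in which the stream hands the writer a region of its own storage and the writer gives back whatever it did not fill.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a string-backed zero-copy output stream.
// It is NOT the protobuf class; it only mirrors the Next()/BackUp() shape.
class StringSink {
 public:
  explicit StringSink(std::string* target) : target_(target) {}

  // Grows the string and hands the caller a pointer to the new, unwritten
  // region inside the string itself.
  bool Next(void** data, int* size) {
    const std::size_t old_size = target_->size();
    target_->resize(old_size + kChunk);
    *data = &(*target_)[old_size];
    *size = kChunk;
    return true;
  }

  // Returns the unused tail of the region most recently handed out by Next().
  void BackUp(int count) {
    target_->resize(target_->size() - static_cast<std::size_t>(count));
  }

 private:
  static constexpr int kChunk = 16;  // arbitrary region size for the sketch
  std::string* target_;
};

// Toy "serializer": generates `count` bytes and stores each one directly in
// the memory handed out by the sink. There is no staging buffer and no
// write(buffer, size) call that would copy it.
void EmitLetters(StringSink* sink, int count) {
  int produced = 0;
  while (produced < count) {
    void* region = nullptr;
    int region_size = 0;
    if (!sink->Next(&region, &region_size)) return;
    char* out = static_cast<char*>(region);
    int used = 0;
    while (used < region_size && produced < count) {
      out[used++] = static_cast<char>('a' + produced % 26);
      ++produced;
    }
    sink->BackUp(region_size - used);  // give back whatever was not filled
  }
}
```

Starting from an empty `std::string out` wrapped in a `StringSink`, calling `EmitLetters(&sink, 20)` leaves `out` holding the 20 letters `a` through `t`; every byte was written once, at its final location.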

> So where exactly do these fit in?

All of the SerializeTo*() methods simply construct a ZeroCopyOutputStream of the desired type and then call SerializeToZeroCopyStream().

> Are they an intermediate buffer so as to reduce SerializeTo* to what's basically a memcpy?

> And would these have poor worst-case performance if the protocol buffer is built up using a sequence of operations that is not amenable to direct serialization? (E.g. insertions that cause subsequent bits to be shifted.)

The protocol buffer wire format is nothing like the in-memory representation.
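As a small illustration of how different the two forms are, the sketch below encodes one integer field the way the protocol buffer encoding documentation describes it: a tag byte followed by a base-128 varint. `Sample`, `AppendVarint`, and `EncodeSample` are hypothetical names, and field number 1 is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical in-memory form: a plain struct with a fixed-width field.
struct Sample {
  std::int32_t a;  // assumed to be field number 1 in the schema
};

// Appends `value` as a base-128 varint: 7 bits per byte, lowest group first,
// with the high bit set on every byte except the last.
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Encodes field 1 with wire type 0 (varint): a tag, then the value.
// The sketch is only meant for non-negative values.
std::string EncodeSample(const Sample& s) {
  std::string out;
  AppendVarint((1u << 3) | 0u, &out);  // tag = (field_number << 3) | wire_type
  AppendVarint(static_cast<std::uint64_t>(s.a), &out);
  return out;
}
```

Under these assumptions, a value of 150 sits in a fixed-width 32-bit field in the struct but encodes to the three bytes `08 96 01`, while a value of 1 encodes to the two bytes `08 01`. Serializing walks the fields and generates bytes like these; it is not a memcpy of the object.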

> And when I use ParseFrom*, I can imagine a buffer-backed implementation, but this would be unsafe (could be left with a dangling reference once the backend is destroyed) - which, AFAIK, is not the case with protobufs.

When parsing, the input bytes are used to build in-memory data structures. The input is no longer needed once parsing is complete. The point of ZeroCopyInputStream is to avoid having to copy the input bytes into a temporary buffer before parsing them -- instead, ZeroCopyInputStream provides a pointer directly into the original in-memory data structure holding the bytes.
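The same idea on the input side can be sketched as follows. `ArraySource` and `ParseVarints` are hypothetical names, not protobuf classes; the sketch assumes only the `Next()` shape of `ZeroCopyInputStream`, where the stream exposes its bytes in place and the parser builds its own data from them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array-backed zero-copy input stream.
// It is NOT the protobuf class; it only mirrors the Next() shape.
class ArraySource {
 public:
  ArraySource(const std::uint8_t* data, int size, int block_size)
      : data_(data), size_(size), block_size_(block_size), pos_(0) {}

  // Exposes the next block of unread bytes in place; nothing is copied.
  bool Next(const void** data, int* size) {
    if (pos_ == size_) return false;
    const int n = std::min(block_size_, size_ - pos_);
    *data = data_ + pos_;
    *size = n;
    pos_ += n;
    return true;
  }

 private:
  const std::uint8_t* data_;
  int size_;
  int block_size_;
  int pos_;
};

// Toy parser: decodes a run of varints straight out of the source's memory
// into a vector that owns its storage. Decoder state is carried across
// blocks, so a varint may straddle two Next() calls.
std::vector<std::uint64_t> ParseVarints(ArraySource* source) {
  std::vector<std::uint64_t> values;
  std::uint64_t current = 0;
  int shift = 0;
  const void* block = nullptr;
  int block_size = 0;
  while (source->Next(&block, &block_size)) {
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(block);
    for (int i = 0; i < block_size; ++i) {
      current |= static_cast<std::uint64_t>(bytes[i] & 0x7F) << shift;
      if (bytes[i] & 0x80) {
        shift += 7;
        if (shift >= 64) return values;  // malformed input; stop parsing
      } else {
        values.push_back(current);
        current = 0;
        shift = 0;
      }
    }
  }
  return values;
}
```

Because the returned vector holds decoded values rather than pointers into the input, the input array can be destroyed as soon as `ParseVarints` returns without leaving anything dangling.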
