How to use ZeroCopy*?

Yang Zhang

Feb 10, 2009, 4:46:45 PM
to Protocol Buffers
Hi, is there any documentation on how exactly to use the ZeroCopy*
streams, or at least a high-level description of how they are used/fit
in with everything else? Thanks in advance!

Kenton Varda

Feb 10, 2009, 5:14:33 PM
to Yang Zhang, Protocol Buffers
Just the API docs:

The ZeroCopyStream interfaces are the underlying abstract stream interfaces used by the protocol buffer I/O code.  Messages are parsed from ZeroCopyInputStreams and written to ZeroCopyOutputStreams.  These classes are particularly useful when reading from / writing to in-memory data structures because they can direct the parser/serializer directly at the memory rather than copying to/from separate buffers.

(The protobuf library does not provide any direct support for zero-copy file or network I/O, but it would be possible to implement ZeroCopy streams that accomplish this.)

Yang Zhang

Feb 10, 2009, 5:50:51 PM
to Kenton Varda, Protocol Buffers
Kenton Varda wrote:
> The ZeroCopyStream interfaces are the underlying abstract stream
> interfaces used by the protocol buffer I/O code. Messages are parsed
> from ZeroCopyInputStreams and written to ZeroCopyOutputStreams. These
> classes are particularly useful when reading from / writing to in-memory
> data structures

I had assumed that protocol buffers are always read from/written to
in-memory data structures. Is there something else?

> because they can direct the parser/serializer directly
> at the memory rather than copying to/from separate buffers.

After I build up a protocol buffer, I need to call
SerializeToOstream/SerializeToString/etc., which performs a copy, right?
So where exactly do these fit in? Are they an intermediate buffer so
as to reduce SerializeTo* to what's basically a memcpy?

And would these have poor worst-case performance if the protocol buffer
is built up using a sequence of operations that is not amenable to
direct serialization? (E.g. insertions that cause subsequent bits to be
shifted.)

And when I use ParseFrom*, I can imagine a buffer-backed implementation,
but this would be unsafe (could be left with a dangling reference once
the backend is destroyed) - which, AFAIK, is not the case with protobufs.
--
Yang Zhang
http://www.mit.edu/~y_z/

Kenton Varda

Feb 10, 2009, 6:28:43 PM
to Yang Zhang, Protocol Buffers
I think the ZeroCopy stuff is a lot less exciting than you seem to imagine.  :)

On Tue, Feb 10, 2009 at 2:50 PM, Yang Zhang <yangha...@gmail.com> wrote:
> Kenton Varda wrote:
>> The ZeroCopyStream interfaces are the underlying abstract stream interfaces used by the protocol buffer I/O code.  Messages are parsed from ZeroCopyInputStreams and written to ZeroCopyOutputStreams.  These classes are particularly useful when reading from / writing to in-memory data structures
>
> I had assumed that protocol buffers are always read from/written to in-memory data structures.  Is there something else?

You can read from / write to files and C++ iostreams, among other things.
 

>> because they can direct the parser/serializer directly at the memory rather than copying to/from separate buffers.
>
> After I build up a protocol buffer, I need to call SerializeToOstream/SerializeToString/etc., which performs a copy, right?

SerializeToString() writes the serialized bytes directly into the string as they are generated.  You could say that the information is "copied" from the in-memory structures to the serialized bytes, I suppose, but there's no way around that.

The point here is that most I/O stream interfaces require you to write to an intermediate buffer, and then call some sort of stream.write(buffer, size), which then has to make a copy of that buffer.  ZeroCopyOutputStream instead provides the serializer with a pointer directly to the final destination of the bytes (if possible).
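To make the difference concrete, here is a minimal sketch in plain C++. It does not use the protobuf library: `ArraySink` is a hypothetical, simplified stand-in that only mimics the `Next()` / `BackUp()` shape of `ZeroCopyOutputStream`, and `Encode`, `WriteWithCopy`, and `WriteZeroCopy` are names made up for illustration. The toy "serializer" writes a one-byte length followed by the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in that mimics the Next()/BackUp() shape of
// protobuf's ZeroCopyOutputStream. Not the library's own class.
class ArraySink {
 public:
  ArraySink(uint8_t* data, int size) : data_(data), size_(size), pos_(0) {}

  // Hands the caller a pointer to the unused part of the destination.
  bool Next(void** out, int* out_size) {
    if (pos_ == size_) return false;
    *out = data_ + pos_;
    *out_size = size_ - pos_;
    pos_ = size_;  // Everything handed out counts as written until BackUp().
    return true;
  }

  // Gives back the last `count` bytes of the most recent Next() as unused.
  void BackUp(int count) { pos_ -= count; }

  int ByteCount() const { return pos_; }

 private:
  uint8_t* data_;
  int size_;
  int pos_;
};

// Toy serializer: one length byte, then the characters.
// Assumes text.size() < 256. Returns the number of bytes produced.
int Encode(const std::string& text, uint8_t* out) {
  out[0] = static_cast<uint8_t>(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    out[i + 1] = static_cast<uint8_t>(text[i]);
  }
  return static_cast<int>(text.size()) + 1;
}

// Conventional style: encode into a scratch buffer, then copy it out.
int WriteWithCopy(const std::string& text, uint8_t* dest, int dest_size) {
  std::vector<uint8_t> scratch(text.size() + 1);
  int n = Encode(text, scratch.data());
  if (n > dest_size) return -1;
  std::memcpy(dest, scratch.data(), n);  // The extra copy.
  return n;
}

// Zero-copy style: ask the sink for its memory and encode straight into it.
int WriteZeroCopy(const std::string& text, ArraySink* sink) {
  void* buf = nullptr;
  int size = 0;
  if (!sink->Next(&buf, &size)) return -1;
  int needed = static_cast<int>(text.size()) + 1;
  if (needed > size) {
    sink->BackUp(size);  // Nothing was written; return the whole chunk.
    return -1;
  }
  int n = Encode(text, static_cast<uint8_t*>(buf));
  sink->BackUp(size - n);  // Return the part of the chunk we did not use.
  return n;
}
```

Both functions produce the same bytes in the destination. The only difference is that `WriteZeroCopy` never allocates or fills a scratch buffer, which is the saving the real interface is designed for.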
 
> So where exactly do these fit in?

All of the SerializeTo*() methods simply construct a ZeroCopyOutputStream of the desired type and then call SerializeToZeroCopyStream().
 
> Are they an intermediate buffer so as to reduce SerializeTo* to what's basically a memcpy?
>
> And would these have poor worst-case performance if the protocol buffer is built up using a sequence of operations that is not amenable to direct serialization?  (E.g. insertions that cause subsequent bits to be shifted.)

The protocol buffer wire format is nothing like the in-memory representation.
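As a small illustration of why serialization cannot be a plain memcpy, the sketch below hand-encodes a single varint-typed field the way the wire format does: a tag (field number shifted left by three, with wire type 0) followed by the value as a base-128 varint. The function names `AppendVarint` and `EncodeVarintField` are made up for this example and are not part of the protobuf API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 bits per byte, lowest group
// first, with the high bit set on every byte except the last.
void AppendVarint(uint64_t value, std::vector<uint8_t>* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<uint8_t>(value));
}

// Encodes one varint-typed field: the tag (field_number << 3, wire type 0)
// followed by the value. Illustrative helper, not a protobuf function.
std::vector<uint8_t> EncodeVarintField(uint32_t field_number, uint64_t value) {
  std::vector<uint8_t> out;
  AppendVarint(static_cast<uint64_t>(field_number) << 3, &out);
  AppendVarint(value, &out);
  return out;
}
```

In memory, an `int32` field holding 300 is a fixed four-byte integer inside the message object. On the wire, as field 1, it becomes the three bytes `08 AC 02`, while the value 1 would take only two bytes (`08 01`). The encoded size depends on the value, so the serializer has to generate the bytes rather than copy them.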
 
> And when I use ParseFrom*, I can imagine a buffer-backed implementation, but this would be unsafe (could be left with a dangling reference once the backend is destroyed) - which, AFAIK, is not the case with protobufs.

When parsing, the input bytes are used to build in-memory data structures.  The input is no longer needed once parsing is complete.  The point of ZeroCopyInputStream is to avoid having to copy the input bytes into a temporary buffer before parsing them -- instead, ZeroCopyInputStream provides a pointer directly into the original in-memory data structure holding the bytes.
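A rough sketch of the input side, again in plain C++ without the protobuf library. `ArraySource` is a hypothetical stand-in that only mimics the `Next()` / `BackUp()` shape of `ZeroCopyInputStream`; `Greeting`, `ParseGreeting`, and `ParseThenDropInput` are invented for illustration. The toy format is one length byte followed by that many characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that mimics the Next()/BackUp() shape of
// protobuf's ZeroCopyInputStream. Not the library's own class.
class ArraySource {
 public:
  ArraySource(const uint8_t* data, int size)
      : data_(data), size_(size), pos_(0) {}

  // Hands the caller a pointer directly into the original bytes.
  bool Next(const void** out, int* out_size) {
    if (pos_ == size_) return false;
    *out = data_ + pos_;
    *out_size = size_ - pos_;
    pos_ = size_;
    return true;
  }

  // Returns the last `count` bytes of the most recent Next() as unread.
  void BackUp(int count) { pos_ -= count; }

 private:
  const uint8_t* data_;
  int size_;
  int pos_;
};

// Toy parsed "message". It owns its data and holds no pointer to the input.
struct Greeting {
  std::string text;
};

// Reads the bytes in place (no temporary buffer) and builds an owned object.
bool ParseGreeting(ArraySource* source, Greeting* result) {
  const void* buf = nullptr;
  int size = 0;
  if (!source->Next(&buf, &size)) return false;
  const uint8_t* bytes = static_cast<const uint8_t*>(buf);
  int length = bytes[0];
  if (size < 1 + length) {
    source->BackUp(size);  // Truncated input; consume nothing.
    return false;
  }
  result->text.assign(reinterpret_cast<const char*>(bytes + 1), length);
  source->BackUp(size - (1 + length));  // Leave any trailing bytes unread.
  return true;
}

// Parses from a buffer, lets the buffer go out of scope, then uses the result.
std::string ParseThenDropInput() {
  Greeting greeting;
  {
    std::vector<uint8_t> wire = {2, 'h', 'i'};
    ArraySource source(wire.data(), static_cast<int>(wire.size()));
    bool ok = ParseGreeting(&source, &greeting);
    assert(ok);
    (void)ok;
  }  // `wire` is destroyed here; `greeting` still owns its text.
  return greeting.text;
}
```

The parser reads straight out of the caller's memory, but what it builds is a separate owned structure, so nothing dangles once the input goes away.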