Protocol Buffers and asynchronous sockets

Gilad Ben-Ami

Nov 24, 2009, 4:17:44 AM
to Protocol Buffers
Hey,

I'm using the ACE library for C++ and its Reactor pattern to handle
asynchronous reads from and writes to sockets.
I'm trying to integrate Protocol buffers into my solution in order to
exchange data with another process developed in Java.

The way asynchronous I/O works forces me to know the expected message
size in advance, and to try parsing with PB only after I have all the
data.
What is the best way to use PB in this scenario? Is there any stream I
can use to hold the data as it arrives? And can I recover from a parse
that failed because not enough data had arrived yet?

Your help is appreciated.
Thanks.

Mika Raento

Nov 24, 2009, 5:56:17 AM
to Gilad Ben-Ami, Protocol Buffers
Protobufs are pretty much designed to be read all at once. The normal
thing would be to define a stream format that prefixes the serialized
protobufs with their length and buffer the data until a whole protobuf
has been read.

In other words: you should not describe the whole stream as a single
protobuf (like you often would with, say, XML) but instead use a
different format for framing a stream of protobufs.

Regards,
Mika

Gilad Ben-Ami

Nov 24, 2009, 7:40:55 AM
to Protocol Buffers
Hey,

The protocol we've defined for this kind of solution is to send a
fixed 4-byte unsigned integer that gives the length of the PB message
that follows, read the PB message, and then wait again for the next
size.
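
A minimal sketch of that framing, for illustration only. It assumes the 4-byte length is sent in big-endian (network) byte order, which is what Java's DataOutputStream.writeInt produces; the thread does not say which byte order is actually used. The helper names encode_length, decode_length and make_frame are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: write `length` as 4 big-endian bytes.
// Big-endian is an assumption; the thread does not specify the byte order.
inline std::string encode_length(std::uint32_t length) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((length >> 24) & 0xFF);
    out[1] = static_cast<char>((length >> 16) & 0xFF);
    out[2] = static_cast<char>((length >> 8) & 0xFF);
    out[3] = static_cast<char>(length & 0xFF);
    return out;
}

// Hypothetical helper: read a 4-byte big-endian length from `data`.
// The caller must guarantee that at least 4 bytes are readable.
inline std::uint32_t decode_length(const char* data) {
    assert(data != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Build one frame: the length prefix followed by the serialized message
// bytes (for example, the output of SerializeToString on the sender side).
inline std::string make_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // must fit in the 4-byte prefix
    return encode_length(static_cast<std::uint32_t>(payload.size())) + payload;
}
```

The receiver would read 4 bytes, call decode_length on them, and then wait until that many further bytes have arrived.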

So in this case, what is the best way to use PB?
Should I use SerializeToArray and ParseFromArray instead of the
protobuf::io streams (because the data I'm buffering is stored in a
char* array)? Or does PB provide a stream I can feed with data until
I've read all the expected bytes, and only then ask it to parse?

Thanks.


Evan Jones

Nov 24, 2009, 10:29:52 AM
to Gilad Ben-Ami, Protocol Buffers
Gilad Ben-Ami wrote:
> So in this case, what is the best method to use PB?
> Should i use SerializeToArray and ParseFromArray instead of using the
> protobuf::io streams?

To use protocol buffers with an asynchronous library, you need to
collect the data for the message in some data structure until you know
it is all there. If performance is not critical, the least-effort
approach is:

1. Read the message_length from the stream in some way.
2. Create a std::string.
3. Read message_length bytes from the stream, appending them to the
std::string.
4. Use message.ParseFromString() to parse the message.
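
The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: FrameAssembler is a hypothetical name, the length prefix is assumed to be 4 bytes in big-endian order, and the protobuf call itself is left as a comment so the sketch compiles without generated message classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed size of the length prefix (4 bytes, big-endian).
constexpr std::size_t kLengthPrefixSize = 4;

// Hypothetical helper that collects bytes from successive read callbacks
// and hands back one complete message payload at a time.
class FrameAssembler {
public:
    // Append whatever the socket callback delivered, however small.
    void append(const char* data, std::size_t size) {
        buffer_.append(data, size);
    }

    // If a whole frame (length prefix + payload) is buffered, copy the
    // payload into `*payload`, drop the frame from the buffer, and return
    // true. Otherwise leave the buffer untouched and return false.
    bool next_payload(std::string* payload) {
        if (buffer_.size() < kLengthPrefixSize) return false;
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(buffer_.data());
        const std::uint32_t length =
            (static_cast<std::uint32_t>(p[0]) << 24) |
            (static_cast<std::uint32_t>(p[1]) << 16) |
            (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
        if (buffer_.size() - kLengthPrefixSize < length) return false;
        payload->assign(buffer_, kLengthPrefixSize, length);
        buffer_.erase(0, kLengthPrefixSize + length);
        return true;
    }

private:
    std::string buffer_;  // bytes received but not yet consumed
};

// In the read callback, usage would look something like:
//   assembler.append(bytes, n);
//   std::string payload;
//   while (assembler.next_payload(&payload)) {
//       message.ParseFromString(payload);  // only ever sees whole messages
//   }
```

Because parsing is attempted only once a whole payload is buffered, a parse never fails merely because data is still in flight.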

This can be bad for performance because the data may be copied many
times. If performance is really critical, you basically need to
efficiently collect the bytes into some "buffer data structure." I'm
assuming the ACE library probably provides something that does this?
Then, once you have at least message_length bytes, you parse it via a
ZeroCopyInputStream implementation.

For my asynchronous library, my implementation is approximately:

// Assume message_length was already read from input somehow.
if (input.availableBytes() < message_length) {
    // Not enough data yet; get called back later.
    return IO_WAIT;
}

// MyInputWrapper implements google::protobuf::io::ZeroCopyInputStream.
MyInputWrapper wrapper(&input, message_length);
MyProtocolBuffer message;
if (!message.ParseFromZeroCopyStream(&wrapper)) {
    // All the bytes were available, so the message itself is malformed;
    // handle the error here.
}
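
The thread does not show MyInputWrapper itself. As a rough illustration only, a wrapper of that kind over a list of received chunks might look like the sketch below. ChunkedInput and its members are hypothetical names. The four methods mirror the Next / BackUp / Skip / ByteCount interface of google::protobuf::io::ZeroCopyInputStream, but the class deliberately does not derive from it so the sketch compiles without the protobuf headers; a real wrapper would inherit from that class and mark the methods as overrides.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wrapper exposing at most `limit` bytes of a chunked buffer.
class ChunkedInput {
public:
    ChunkedInput(const std::vector<std::string>* chunks, std::int64_t limit)
        : chunks_(chunks), remaining_(limit) {}

    // Hand out the next contiguous run of bytes without copying them.
    bool Next(const void** data, int* size) {
        // Move past chunks that are empty or fully consumed.
        while (index_ < chunks_->size() &&
               offset_ == (*chunks_)[index_].size()) {
            ++index_;
            offset_ = 0;
        }
        if (remaining_ == 0 || index_ == chunks_->size()) return false;
        const std::string& chunk = (*chunks_)[index_];
        std::int64_t n =
            std::min<std::int64_t>(chunk.size() - offset_, remaining_);
        n = std::min<std::int64_t>(n, INT_MAX);  // *size is an int
        *data = chunk.data() + offset_;
        *size = static_cast<int>(n);
        offset_ += static_cast<std::size_t>(n);
        remaining_ -= n;
        consumed_ += n;
        return true;
    }

    // Give back the last `count` bytes returned by the most recent Next().
    void BackUp(int count) {
        offset_ -= static_cast<std::size_t>(count);
        remaining_ += count;
        consumed_ -= count;
    }

    // Skip `count` bytes; returns false if the input ends first.
    bool Skip(int count) {
        while (count > 0) {
            const void* unused = nullptr;
            int size = 0;
            if (!Next(&unused, &size)) return false;
            if (size > count) {
                BackUp(size - count);
                size = count;
            }
            count -= size;
        }
        return true;
    }

    // Total number of bytes handed out so far (net of BackUp calls).
    std::int64_t ByteCount() const { return consumed_; }

private:
    const std::vector<std::string>* chunks_;  // not owned
    std::size_t index_ = 0;                   // current chunk
    std::size_t offset_ = 0;                  // position inside that chunk
    std::int64_t remaining_;                  // bytes left before the limit
    std::int64_t consumed_ = 0;
};
```

The limit plays the role of message_length: the parser sees exactly one message even if bytes of the next one are already buffered.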


I hope this helps,

Evan

--
Evan Jones
http://evanjones.ca/

Gilad Ben-Ami

Nov 24, 2009, 11:10:16 AM
to Protocol Buffers
Thanks for the suggestion.

Do you think that using std::iostream in the following scenario would
work / be a good choice?
1. read message_length
2. buffer message_length bytes into iostream variable.
3. when all data is received, use IstreamInputStream to wrap the
iostream and have it parsed with ParseFromZeroCopyStream()

Does the iostream take care of releasing the bytes PB has already read?

Thanks.

Evan Jones

Nov 24, 2009, 12:07:23 PM
to Gilad Ben-Ami, Protocol Buffers
Gilad Ben-Ami wrote:
> Do you think that using std::iostream in the following scenario would
> work / be a good choice?
> 1. read message_length
> 2. buffer message_length bytes into iostream variable.
> 3. when all data is received, use IstreamInputStream to wrap the
> iostream and have it parsed with ParseFromZeroCopyStream()

If your application doesn't have a buffer already, I recommend using
std::string. AFAIK, the C++ standard library doesn't provide anything
more appropriate. It will do a good enough job, particularly if you
re-use one std::string rather than allocating a new one for each message.

The reason to use something more complicated is because lots of
applications already have some sort of buffer, and you want to try and
avoid extra copies.

Kenton Varda

Nov 24, 2009, 3:35:33 PM
to Evan Jones, Gilad Ben-Ami, Protocol Buffers
Yes, use std::string. The only potential problem is if your messages are very large: allocating large contiguous blocks of memory (as std::string does) could lead to memory fragmentation. But for small and medium-sized messages, there's no reason not to use std::string as the buffer. Parsing from a std::string (or a simple array; they're essentially the same) is (slightly) faster than parsing from any other data structure.
