Message forwarding and partial parsing

villintehaspam

Oct 7, 2009, 8:46:44 AM

to Protocol Buffers

Hi,

I am wondering about the best way of forwarding received protocol
buffer messages from one entity to another without having to parse the
entire message just to serialize it again.

My scenario is the following: I have a process A connected to process
B using local IPC. B is in turn connected to process C on another
machine using TCP, and C is connected to process D using local IPC, i.e.
A->B->C->D.

Process A wants to send messages to process B, C and D, to control the
operations. Process A has no concept of TCP/IP and uses process B to
forward messages to the 'C' processes running on other machines (each
machine has a unique id). Each machine might have several 'D'
processes running (each has a unique id).

The basic message is similar to this:
message MyMessage {
extensions 100 to max;
}

and several messages that would make sense to B, C and D are declared
similar to this:

message MyExtension {
extend MyMessage {
optional MyExtension my_extension = 100;
}
...
}

In a naive implementation, a message sent from A to D would involve
the message being serialized by A, deserialized by B, serialized by B,
deserialized by C, serialized by C and then finally being deserialized
by D. This seems a bit too much to me, so I am hoping that anyone
would be willing to comment on the possible solutions to routing
messages, while minimizing unnecessary serialization/deserialization
overhead.

I have several options:

Option 1:
Extend the MyMessage message with destination information like this:
message MyMessage {
optional MyId destination_id = 1;
...
}

When process B deserializes the message it can look at the
destination_id to decide where to forward the message. The problem
with this would be that some extensions would be recognized by process
B even though they are aimed at process C, which I'm _guessing_ would
mean that the extension would be parsed and then encoded again when
the message is forwarded. So I'm thinking this approach is out.

Option 2:
Extend MyMessage with an internal message:
message MyMessage {
optional MyId destination_id = 1;
optional bytes internal_message = 2;
...
}
Now process B would not have to parse the internal message. However,
process A would have to first serialize the message to a byte
sequence, then insert that into another message and serialize that.
This seems awkward to me.
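
To make the cost of option 2 concrete, here is a rough sketch of what a bytes field looks like on the wire: a tag, a varint length, then the raw bytes copied as-is. The helper names below are made up for illustration and are not part of the protocol buffers library, and the scanner only understands varint and length-delimited fields. It treats destination_id as opaque bytes, on the assumption that MyId is itself a message.

```python
from typing import Optional, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2): tag, length, raw bytes."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def extract_bytes_field(data: bytes, field_number: int) -> Optional[bytes]:
    """Find a length-delimited field without interpreting its contents.

    Hypothetical helper: only varint (0) and length-delimited (2) wire
    types are handled here.
    """
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            _, pos = decode_varint(data, pos)  # skip the varint value
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            if number == field_number:
                return data[pos:pos + length]
            pos += length  # skip the payload unread
        else:
            raise ValueError("wire type not handled in this sketch")
    return None


# Process A: wrap an already-serialized inner message in field 2.
inner = b"\x08\x2a"           # stand-in for a serialized extension message
destination = b"\x08\x07"     # stand-in for a serialized MyId
wrapped = encode_bytes_field(1, destination) + encode_bytes_field(2, inner)

# Process B: pull the inner bytes back out without parsing them.
forwarded = extract_bytes_field(wrapped, 2)
```

So the second serialization in process A amounts to a copy plus a few prefix bytes, and process B never has to look inside internal_message.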

Option 3:
Extend the header sent on the channel with more information. Right now
I am sending the length of the message first, then the actual
serialized message. This could be extended into more of a header with
the destination id as well. Sounds like a protocol buffer message
would be suitable for use as a header... something like this:

message MyHeader {
optional MyId destination_id = 1;
required uint32 message_length = 2;
}

On the wire, I would still need to first send the length of the header
(or possibly make sure that the header has a fixed length), then the
serialized header followed by the serialized message. Process B could
then simply forward the bytes in the message without having to parse
the contents.
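
A minimal sketch of the forwarding step in option 3, with some assumptions: the header here is a fixed-length binary struct (a numeric destination id plus the message length) standing in for a serialized MyHeader, and the function names and the routes table are invented for the example. In-memory streams stand in for the IPC and TCP connections.

```python
import io
import struct

# Fixed-length header: destination_id and message_length, both unsigned
# 32-bit integers in network byte order. This is a stand-in for MyHeader.
HEADER = struct.Struct("!II")


def write_frame(out, destination_id, payload):
    """Write the header followed by the already-serialized message."""
    out.write(HEADER.pack(destination_id, len(payload)))
    out.write(payload)


def read_exact(stream, n):
    """Read exactly n bytes or fail; a raw socket would need a loop here."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a frame")
    return data


def read_frame(stream):
    """Return (destination_id, payload) for the next frame on the stream."""
    destination_id, length = HEADER.unpack(read_exact(stream, HEADER.size))
    return destination_id, read_exact(stream, length)


def forward(inbound, routes):
    """Process B: route one frame by its header, never parsing the payload."""
    destination_id, payload = read_frame(inbound)
    write_frame(routes[destination_id], destination_id, payload)
    return destination_id


# Process A sends a frame for machine 7; process B forwards it untouched.
from_a = io.BytesIO()
write_frame(from_a, 7, b"\x01\x02\x03")
from_a.seek(0)

to_machine_7 = io.BytesIO()
forward(from_a, {7: to_machine_7})
```

Only the final receiver hands the payload to protocol buffers; every hop before it just copies bytes.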

Of these three options, I'm thinking that option 3 is the correct way
to go. Am I missing some functionality provided by protocol buffers
(such as the ability to skip parsing extensions even if they are
recognized or similar or only parse as much as needed)? Am I missing
any problems?

On a somewhat related note, is it possible to parse a partially
transmitted message and continue parsing at a later time when more
data is available? I.e. since I cannot guarantee that all data for a
message is available directly, do I need to buffer data until I know
that I have the entire message (which is what I do today) before
allowing protocol buffers to parse it?

Example: the message X is sent on the wire consisting of a number of
fields. It is delivered on the other side of the connection as a
series of chunks. For instance, in a theoretical scenario the first
chunk could contain the first field descriptor, the first data value
and half the second field descriptor. The next chunk could contain the
second half of the second field descriptor and half the second data
value and the last chunk could contain the rest of the message.

Can I allow protocol buffers to parse the chunks of data as they come
in without having to worry about half field descriptors, half data
values and so on? I see that there are ParsePartialFrom... functions
for messages, but the documentation states that the difference between
these and the regular ParseFrom... functions is that they allow
required fields to be missing. I assume that this means that there is
no partial parse functionality in the sense that partial field
descriptors or partial values can be "continued" at a later time?

Sorry for a lengthy post... Any comments on either problem are
appreciated!

Cheers,
V

Kenton Varda

Oct 7, 2009, 3:44:25 PM

to villintehaspam, Protocol Buffers

On Wed, Oct 7, 2009 at 5:46 AM, villintehaspam <villint...@gmail.com> wrote:

I am wondering about the best way of forwarding received protocol
buffer messages from one entity to another without having to parse the
entire message just to serialize it again.

It looks like you've figured out all the major options.

One thing I'd encourage you to do if you haven't already is actually profile your system to find out if repeated parsing and serialization is a real problem for you. It may not be a real problem in practice even if it feels wrong.

Of these three options, I'm thinking that option 3 is the correct way
to go.

All three options are reasonable. Option 3 is the most complicated solution, but probably the most performant.

Am I missing some functionality provided by protocol buffers
(such as the ability to skip parsing extensions even if they are
recognized or similar or only parse as much as needed)? Am I missing
any problems?

If you are using C++, then all compiled-in extensions will be eagerly parsed. If you only compile in the extensions that each process actually cares about, that solves your problem.

In Java you provide an ExtensionRegistry listing extensions you care about, so it's trivial to include only the ones you want. I'm guessing you aren't using Java.

On a somewhat related note, is it possible to parse a partially
transmitted message and continue parsing at a later time when more
data is available?

Not without blocking. The library is designed to parse an entire message at once. Allowing partial parsing (without blocking) would be quite complicated.
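
For the non-blocking case, the buffering approach described in the question can be sketched roughly as below. It assumes each message is preceded by a 4-byte big-endian length, and the FrameBuffer class is made up for the example, not something the library provides. Chunks may split the length prefix or the body at any point; only complete bodies come out.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # assumed 4-byte big-endian length


class FrameBuffer:
    """Accumulate arbitrary chunks and hand back only complete messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Append a chunk and return a list of complete message bodies."""
        self._buf.extend(chunk)
        messages = []
        while True:
            if len(self._buf) < LENGTH_PREFIX.size:
                break  # the length prefix itself is still incomplete
            (length,) = LENGTH_PREFIX.unpack_from(self._buf, 0)
            end = LENGTH_PREFIX.size + length
            if len(self._buf) < end:
                break  # body not fully received yet
            messages.append(bytes(self._buf[LENGTH_PREFIX.size:end]))
            del self._buf[:end]  # keep any bytes of the next message
        return messages


# Two messages delivered in three awkwardly split chunks.
wire = LENGTH_PREFIX.pack(5) + b"hello" + LENGTH_PREFIX.pack(2) + b"ok"
buffer = FrameBuffer()
first = buffer.feed(wire[:3])    # part of the first length prefix
second = buffer.feed(wire[3:7])  # rest of the prefix, part of the body
third = buffer.feed(wire[7:])    # everything else
```

Each body returned by feed is a whole serialized message, so it can go straight to the regular parse call.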

villintehaspam

Oct 8, 2009, 5:30:57 AM

to Protocol Buffers

Hi Kenton,

Thank you for your quick response and your feedback.

I'm going to use option 3, since as you say this will probably be the
fastest solution and I think that it will fit in the best with our
application. You are probably right that this will not be an issue for
most messages that are going to be forwarded (most messages will be
quite small), but I consider the complexity of the different options
to be roughly the same so I might as well go for the solution that
feels the best.

Thanks,
V
