villintehaspam
Oct 7, 2009, 8:46:44 AM
to Protocol Buffers
Hi,
I am wondering about the best way of forwarding received protocol
buffer messages from one entity to another without having to parse the
entire message just to serialize it again.
My scenario is the following: I have a process A connected to process
B using local IPC. B is in turn connected to process C on another
machine using TCP, and C is connected to process D using local IPC, i.e.
A->B->C->D.
Process A wants to send messages to processes B, C and D to control
their operations. Process A has no concept of TCP/IP and uses process B
to forward messages to the 'C' processes running on other machines (each
machine has a unique id). Each machine might have several 'D'
processes running (each has a unique id).
The basic message is similar to this:
message MyMessage {
extensions 100 to max;
}
and several messages that would make sense to B, C and D are declared
similarly to this:
message MyExtension {
extend MyMessage {
optional MyExtension my_extension = 100;
}
...
}
In a naive implementation, a message sent from A to D would be
serialized by A, deserialized by B, serialized by B, deserialized by C,
serialized by C and then finally deserialized by D. This seems a bit
too much to me, so I am hoping that someone would be willing to comment
on the possible solutions for routing messages while minimizing
unnecessary serialization/deserialization overhead.
I have several options:
Option 1:
Extend the MyMessage message with destination information like this:
message MyMessage {
optional MyId destination_id = 1;
...
}
When process B deserializes the message it can look at the
destination_id to decide where to forward the message. The problem
with this would be that some extensions would be recognized by process
B even though they are aimed at process C, which I'm _guessing_ would
mean that the extension would be parsed and then encoded again when
the message is forwarded. So I'm thinking this approach is out.
Option 2:
Extend MyMessage with an internal message:
message MyMessage {
optional MyId destination_id = 1;
optional bytes internal_message = 2;
...
}
Now process B would not have to parse the internal message. However,
process A would have to first serialize the message to a byte
sequence, then insert that into another message and serialize that.
This seems awkward to me.
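
To make the double serialization in option 2 concrete, here is a minimal Python sketch that hand-encodes the envelope in the protobuf wire format. It is only an illustration under assumptions: destination_id is treated as a plain unsigned integer (varint) rather than the MyId type above, the helper names (wrap, peek_destination) are made up, and real code would normally use the generated message classes instead.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def wrap(destination_id: int, inner: bytes) -> bytes:
    """Envelope: field 1 = varint id (assumed), field 2 = opaque bytes."""
    return (
        b"\x08" + encode_varint(destination_id)          # tag: field 1, varint
        + b"\x12" + encode_varint(len(inner)) + inner    # tag: field 2, bytes
    )


def peek_destination(envelope: bytes) -> int:
    """Read only field 1; assumes the layout produced by wrap()."""
    if envelope[0] != 0x08:
        raise ValueError("expected destination_id as the first field")
    value, _ = decode_varint(envelope, 1)
    return value
```

With this layout a forwarder only needs peek_destination(envelope) and can pass the envelope on unchanged. The cost on the sending side is one extra copy of the already serialized inner message.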
Option 3:
Extend the header sent on the channel with more information. Right now
I am sending the length of the message first, then the actual
serialized message. This could be extended into more of a header with
the destination id as well. Sounds like a protocol buffer message
would be suitable for use as a header... something like this:
message MyHeader {
optional MyId destination_id = 1;
required uint32 message_length = 2;
}
On the wire, I would still need to first send the length of the header
(or possibly make sure that the header has a fixed length), then the
serialized header followed by the serialized message. Process B could
then simply forward the bytes in the message without having to parse
the contents.
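
A minimal Python sketch of the fixed-length-header variant of option 3. The layout is an assumption made for illustration (two unsigned 32-bit big-endian integers, not a protobuf message), and the links dictionary of lists is a hypothetical stand-in for the real outgoing connections.

```python
import struct

# Hypothetical fixed-size header: destination id and payload length,
# both unsigned 32-bit big-endian. This is not a protobuf message.
HEADER = struct.Struct("!II")


def frame(destination_id: int, payload: bytes) -> bytes:
    """Prefix an already serialized message with the routing header."""
    return HEADER.pack(destination_id, len(payload)) + payload


def route(frame_bytes: bytes, links: dict) -> int:
    """Forward the frame unchanged to the link for its destination.

    links maps destination id -> list (stand-in for a socket).
    Returns the destination id read from the header.
    """
    destination_id, length = HEADER.unpack_from(frame_bytes, 0)
    if len(frame_bytes) != HEADER.size + length:
        raise ValueError("frame length does not match header")
    links[destination_id].append(frame_bytes)  # payload is never parsed
    return destination_id
```

Because the header has a known size, the forwarder reads exactly HEADER.size bytes, then relays the next message_length bytes without touching them.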
Of these three options, I'm thinking that option 3 is the correct way
to go. Am I missing some functionality provided by protocol buffers
(such as the ability to skip parsing extensions even when they are
recognized, or to parse only as much as needed)? Am I missing any
problems?
On a somewhat related note, is it possible to parse a partially
transmitted message and continue parsing at a later time when more
data is available? I.e. since I cannot guarantee that all data for a
message is available directly, do I need to buffer data until I know
that I have the entire message (which is what I do today) before
allowing protocol buffers to parse it?
Example: the message X is sent on the wire consisting of a number of
fields. It is delivered on the other side of the connection as a
series of chunks. For instance, in a theoretical scenario the first
chunk could contain the first field descriptor, the first data value
and half the second field descriptor. The next chunk could contain the
second half of the second field descriptor and half the second data
value and the last chunk could contain the rest of the message.
Can I allow protocol buffers to parse the chunks of data as they come
in without having to worry about half field descriptors, half data
values and so on? I see that there are ParsePartialFrom... functions
for messages, but the documentation states that the difference between
these and the regular ParseFrom... functions is that they allow
required fields to be missing. I assume that this means that there is
no partial parse functionality in the sense that partial field
descriptors or partial values can be "continued" at a later time?
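
For reference, the buffering approach (collect chunks until the whole message has arrived, then parse) can be sketched as follows in Python. The 4-byte big-endian length prefix and the FrameBuffer name are assumptions for illustration; the actual prefix format on the channel is not specified here.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # assumed 4-byte big-endian length


class FrameBuffer:
    """Accumulates chunks and yields only complete serialized messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add a chunk; return every message that is now complete."""
        self._buf.extend(chunk)
        complete = []
        while len(self._buf) >= LENGTH_PREFIX.size:
            (length,) = LENGTH_PREFIX.unpack_from(self._buf, 0)
            end = LENGTH_PREFIX.size + length
            if len(self._buf) < end:
                break  # body not fully received yet; wait for more data
            complete.append(bytes(self._buf[LENGTH_PREFIX.size:end]))
            del self._buf[:end]
        return complete
```

Each byte string returned by feed() is a whole message, so it can be handed to a regular ParseFrom... call regardless of where the chunk boundaries fell.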
Sorry for a lengthy post... Any comments on either problem are
appreciated!
Cheers,
V