If you marshal a message, is it preceded by its length?
If so, then each message can be read individually.
If not, then there is nothing on the wire to indicate where the boundary
lies.
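To make that concrete, here is a small Python sketch that hand-encodes two one-field messages and scans their concatenation. The helpers (encode_varint, decode_varint, varint_fields) are hypothetical, handle only varint fields, and are not part of any protobuf library.

```python
# Hand-rolled wire-format helpers for illustration only (not a library API).

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int) -> tuple:
    """Decode a varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def varint_fields(data: bytes) -> list:
    """Parse a buffer that contains only varint (wire type 0) fields."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        fields.append((field_number, value))
    return fields

# Two separate messages, each setting field 1 (wire type 0).
first = encode_varint((1 << 3) | 0) + encode_varint(5)   # b'\x08\x05'
second = encode_varint((1 << 3) | 0) + encode_varint(7)  # b'\x08\x07'

# Concatenated, they parse as ONE message in which field 1 appears twice;
# nothing in the bytes marks where the first message ended.
print(varint_fields(first + second))  # [(1, 5), (1, 7)]
```

A parser for a message type whose field 1 is a singular scalar would keep only the last value, 7, so the first message is silently merged into the second instead of being read on its own.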
So the right way to put two messages next to each other is with each
one's length written first (as when they are embedded as fields).
Or am I missing something?
--
Chris
Could I just manually delimit the end of a message by appending an
end-group tag to the stream myself after the message is encoded? And
if I did, would all implementations properly decode it as multiple messages?
Creating a "Bar { repeated Message foo = 1 }" and reading a Bar means
that all of the Messages must be read in a single operation, but one
might need to read each Message one at a time.
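For reference, a minimal Python sketch of what such a Bar looks like on the wire, using a hypothetical hand-rolled encoder rather than generated code: each element is a (field 1, wire type 2) tag, a varint length, and the Message bytes.

```python
from typing import List

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_bar(messages: List[bytes]) -> bytes:
    """Hand-encode Bar { repeated Message foo = 1 } from serialized Messages."""
    tag = encode_varint((1 << 3) | 2)  # field 1, wire type 2 -> b'\x0a'
    return b"".join(tag + encode_varint(len(m)) + m for m in messages)

msgs = [b"\x08\x05", b"\x08\x07"]
print(encode_bar(msgs))  # b'\n\x02\x08\x05\n\x02\x08\x07'

# Encodings of two Bars concatenate into the encoding of one bigger Bar.
print(encode_bar(msgs[:1]) + encode_bar(msgs[1:]) == encode_bar(msgs))  # True
```

So the bytes inside a Bar are already framed record by record; the limitation is that a normal parse of Bar consumes the whole thing at once, and the Bar itself still has no outer delimiter.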
The only thing that needs to be done is to expose two operations in the API:
(1) Write the varint encoded size of the message followed by the message
(2) Read the varint encoded size of the message followed by the message
And these operations are already present, because this is how a Message
field is written and read!
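Operations (1) and (2) could look roughly like this in Python. This is a sketch assuming messages are already serialized to bytes (for example with the Python library's SerializeToString() and parsed back with ParseFromString()); write_varint, read_varint, write_delimited and read_delimited are hypothetical names, not existing API.

```python
import io

def write_varint(stream, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))
        else:
            stream.write(bytes([byte]))
            return

def read_varint(stream) -> int:
    """Read one base-128 varint; raise EOFError if the stream ends inside it."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def write_delimited(stream, message_bytes: bytes) -> None:
    """Operation (1): varint size, then the serialized message (no tag)."""
    write_varint(stream, len(message_bytes))
    stream.write(message_bytes)

def read_delimited(stream) -> bytes:
    """Operation (2): varint size, then exactly that many message bytes."""
    size = read_varint(stream)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a message")
    return data

buf = io.BytesIO()
write_delimited(buf, b"\x08\x05")
write_delimited(buf, b"\x08\x07")
buf.seek(0)
print(read_delimited(buf), read_delimited(buf))  # b'\x08\x05' b'\x08\x07'
```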
Note that I do not propose writing a (field number, wire type) tag, just
the length.
All of the C++/Java/Python/Haskell/Lisp/C/C#/Matlab/... bindings
will always need (1) and (2) internally, so they need only expose
these in the external API.
In fact, I had already exposed (1) and (2) in Haskell because I assumed
they were going to be useful for this purpose.
I thought this was such a good idea that I assumed the other APIs did
this too, and did not realize it was not part of them.
Perhaps it is part of the current API?
Using this only affects the top-level of a given wire-stream, making it
look more like a protocol channel with a series of (potentially
different) messages.
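With the same framing, a reader for such a channel can hand back one message at a time instead of reading the whole input first. A sketch, assuming a buffered binary stream whose read(n) returns fewer than n bytes only at end of stream; iter_delimited is a hypothetical name. The framing does not say which message type each payload is, so the application still has to know what to parse.

```python
import io
from typing import BinaryIO, Iterator, Optional

def _read_size(stream: BinaryIO) -> Optional[int]:
    """Read a varint size prefix; return None on a clean EOF before any byte."""
    result = 0
    shift = 0
    first = True
    while True:
        b = stream.read(1)
        if not b:
            if first:
                return None
            raise EOFError("stream ended inside a size prefix")
        first = False
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed message as soon as its bytes are read."""
    while True:
        size = _read_size(stream)
        if size is None:
            return  # clean end of stream between messages
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended inside a message")
        yield data

# Three framed payloads back to back; the middle one is an empty message.
wire = io.BytesIO(b"\x02\x08\x05" + b"\x00" + b"\x03\x12\x01a")
messages = list(iter_delimited(wire))
print(messages)  # [b'\x08\x05', b'', b'\x12\x01a']
```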
Cheers,
Chris
Check the post below for some information on how I handled it. Yes, I'm using
CodedInputStream and framing the messages myself.
http://mysqlmusings.blogspot.com/2008/08/missing-pieces-in-protobuf-binary-log.html
Just my few cents,
Mats Kindahl
I have read
http://groups.google.com/group/protobuf/browse_thread/thread/951ed9d0359184ea/7713405ac3599fb1?hl=en&lnk=gst&q=Facilitate#7713405ac3599fb1
but I still do not know what protobuf-net's streaming is doing.
Adding a wrapper message does nothing for the issue of delimiting the
outermost message.
And using a single wrapper message means that one has to read all the
input before using the first contained message.
I do not propose removing the current way of doing things. I merely
propose adding to the API so that length-prefixed messages can be
written and read. This could only be a problem if one does not know
whether a stream has such a length prefix, but in that case I doubt you
would also know which kind of message to try reading.
Question: I have not looked for the answer in the code yet, but how
does the service/method API delimit the request and answer?
--
Chris
I am in favor of putting some kind of delimited message format into the
API; this is the requested feature. It will keep people from being
too creative and reinventing incompatible solutions for delimited
messages, such as the suggestions in this thread.
The LengthDelimited functions are the most obvious: every possible
implementation of messages must already have the functionality
internally (as your code shows). So a few lines of code in the API (not
the generated code files) takes care of this feature request. I have
not checked if the existing API for all languages exposes enough to
write the equivalent of the few lines of code you showed for C++.
The most obvious use to me is reading from a continuous (e.g. network)
stream of bytes. The outermost message needs to be delimited somehow,
currently by the application inventing more protocol rules.
Cheers,
Chris
Would a good choice for a new API be a generalization of those commands?
Hmmm...you also write:
> I think it's best to concentrate on the simple requirement first, and
> not guess too much about what would be needed. Use cases for a
> homogeneous stream are easy to come up with - the simplest being
> logging, for example.
Is writing and reading a field at a time an overly complicated mechanism?
Your fixed-field# API only sinks and sources the message object. A more general API would be able to set the field# on write and return the field# on read. A slight generalization would work on strings and bytes, since their wire encoding is identical to that of messages. A full generalization would work on all allowed field types.
These are each about 2 or 3 lines of Haskell (plus documentation), so I will probably add them all.
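A Python sketch of the "set the field# / return the field#" generalization, limited to wire type 2 (messages, strings and bytes). write_field and read_field are hypothetical names; using the field number to tell the reader which kind of payload follows is an assumed application convention, not something the library defines.

```python
import io
from typing import BinaryIO, Tuple

WIRE_TYPE_LENGTH_DELIMITED = 2  # used by embedded messages, strings and bytes

def write_varint(stream: BinaryIO, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))
        else:
            stream.write(bytes([byte]))
            return

def read_varint(stream: BinaryIO) -> int:
    """Read one base-128 varint from the stream."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def write_field(stream: BinaryIO, field_number: int, payload: bytes) -> None:
    """Write tag (field_number, wire type 2), varint length, then payload."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    write_varint(stream, (field_number << 3) | WIRE_TYPE_LENGTH_DELIMITED)
    write_varint(stream, len(payload))
    stream.write(payload)

def read_field(stream: BinaryIO) -> Tuple[int, bytes]:
    """Read one wire-type-2 field; return (field_number, payload)."""
    tag = read_varint(stream)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != WIRE_TYPE_LENGTH_DELIMITED:
        raise ValueError(f"expected wire type 2, got {wire_type}")
    size = read_varint(stream)
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a field")
    return field_number, payload

buf = io.BytesIO()
write_field(buf, 1, b"\x08\x05")           # a serialized message
write_field(buf, 2, "hi".encode("utf-8"))  # a string: same wire encoding
buf.seek(0)
print(read_field(buf), read_field(buf))  # (1, b'\x08\x05') (2, b'hi')
```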
And it seems that I am coming around to your view that the A+B+C is better than B+C encoding.
Cheers,
Chris
http://groups.google.com/group/protobuf/browse_thread/thread/19ab6bbb364fef35?hl=en#
This is about Alex integrating with Hadoop. Alex writes:
> Now, when I see the stream coming in on the deserialization side, I get
> "<binary>my_string<binary>" The leading binary is the same as the
> original,
> however the trailing binary is something new entirely.
And Kenton replies:
> No, it won't work. Protocol buffers are not self-delimiting. They assume
> that the input you provide is supposed to be one complete message, not a
> message possibly followed by other stuff.
>
> You will need to somehow communicate the size of the message and make sure
> to limit the input to that size.
And so we have a customer for a delimited message API to use in a mixed
protocol binary stream.
I have just posted a message in that thread pointing at this thread.
It looks like (Length + Message) on the wire would work.
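For the mixed-protocol case, Length + Message lets the reader hand the protobuf parser exactly the message bytes and leave the surrounding data alone. A Python sketch with a made-up stream layout (a 4-byte header, one framed message, then a trailer) and a hypothetical read_framed helper:

```python
import io

def read_varint(stream) -> int:
    """Read one base-128 varint from the stream."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def read_framed(stream) -> bytes:
    """Read a varint length, then exactly that many bytes."""
    size = read_varint(stream)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a message")
    return data

# Hypothetical layout: application header, framed message, application trailer.
stream = io.BytesIO(b"HDR:" + b"\x04\x0a\x02hi" + b"TRAILER")
header = stream.read(4)
message = read_framed(stream)  # field 1 = "hi", and nothing more
rest = stream.read()           # the trailer is left for the application
print(header, message, rest)   # b'HDR:' b'\n\x02hi' b'TRAILER'
```

Without the length, a parser given everything after the header would also try to decode b'TRAILER' as protobuf fields.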
I would also like to note that there is another (probably silly) way to
delimit a message: a trailing byte with a value from 0 to 7. Read as a
wire tag, such a byte decodes to field number 0 and wire type 0-7. A
field number of 0 is disallowed by the ".proto" specification, so the
byte cannot be the start of the next field and could be used by a new
API as punctuation after the message.
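A rough Python sketch of the punctuation idea: the writer appends a single 0x00 byte after each message, and the reader walks the message field by field until it meets a tag whose field number is 0. split_punctuated and decode_varint are hypothetical names, and groups (wire types 3 and 4) are not handled. Unlike Length + Message, the reader has to understand the wire format to find the end.

```python
from typing import Tuple

def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]  # IndexError if the data ends mid-varint
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def split_punctuated(data: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (message_bytes, position just after the terminator).

    Walks the fields starting at pos; a tag whose field number is 0
    (e.g. a single byte 0x00-0x07) marks the end of the message.
    """
    start = pos
    while True:
        tag_pos = pos
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 0:
            return data[start:tag_pos], pos
        if wire_type == 0:        # varint
            _, pos = decode_varint(data, pos)
        elif wire_type == 1:      # fixed 64-bit
            pos += 8
        elif wire_type == 2:      # length-delimited
            length, pos = decode_varint(data, pos)
            pos += length
        elif wire_type == 5:      # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if pos > len(data):
            raise ValueError("field runs past the end of the data")

# Writer side: each message is simply followed by one 0x00 byte.
msg1 = b"\x08\x05"    # field 1 (varint) = 5
msg2 = b"\x12\x02hi"  # field 2 (length-delimited) = "hi"
stream = msg1 + b"\x00" + msg2 + b"\x00"

first, pos = split_punctuated(stream)
second, pos = split_punctuated(stream, pos)
print(first, second, pos == len(stream))  # b'\x08\x05' b'\x12\x02hi' True
```

The writer never needs the length in advance, which is the streaming-write benefit; the cost is that the reader must skip over every field to find the terminator.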
I still prefer Tag+Length+Message or Length+Message. But there have
been long threads here with those who think precomputing the Length is
expensive and/or want a streaming write capability. These people might
want a punctuation-delimited API.
Cheers,
Chris