The order is specified by the numbers in the .proto file.
From the official example:
message PhoneNumber {
  required string number = 1;
  optional PhoneType type = 2 [default = HOME];
}
"= 1" means first
"= 2" means second
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Since each encoded field has a header that describes its tag and
wire type, the order in which fields are sent does not matter, so a
protocol decoder must assume that fields can arrive in any order.
For repeated fields, the elements end up in the same order as they
appear 'on the wire'.
The documentation should probably state this more explicitly.
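To make the point about per-field headers concrete, here is a minimal Python sketch of a wire-format decoder. It is not the protobuf library: it handles only varint (wire type 0) and length-delimited (wire type 2) fields, and the helper names encode_varint, decode_varint, encode_key and decode_fields are hypothetical. Each key is (field_number << 3) | wire_type, so the decoder dispatches on the field number it reads and does not care in which order fields arrive; values for a repeated field are appended in wire order.

```python
# Minimal wire-format decoder sketch (varint and length-delimited fields only).
# It dispatches on each field's key, so field order on the wire does not matter.

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_key(field_number: int, wire_type: int) -> bytes:
    # The key packs the field number and the wire type together.
    return encode_varint((field_number << 3) | wire_type)

def decode_fields(buf: bytes) -> dict[int, list]:
    """Return {field_number: [values in wire order]}."""
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.setdefault(field_number, []).append(value)
    return fields

# Usage: the PhoneNumber fields, written in both orders.
# (The enum value 1 used for "type" is illustrative.)
number = encode_key(1, 2) + encode_varint(3) + b"555"
phone_type = encode_key(2, 0) + encode_varint(1)
print(decode_fields(number + phone_type))
print(decode_fields(phone_type + number))
```

For PhoneNumber, field 1 (a string) gets the key byte 0x0A and field 2 (an enum, encoded as a varint) gets 0x10; the decoded result is the same whichever field is written first.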
This could be deduced from the last paragraph of
http://code.google.com/apis/protocolbuffers/docs/encoding.html, which
states that merging two protocol buffers is equivalent to parsing their
concatenated binary representation.
So the order in which fields are written to the wire depends on the
encoder. Typically, of course, you will see the elements written in
the order they appear in the proto file, but you must not assume this
when you write a decoder.
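The "merging equals parsing the concatenation" property can be illustrated with a small Python sketch. Its assumptions are simplifications, not protobuf's full rules: every field is a varint, a singular field keeps the last value seen, and field numbers passed in `repeated` accumulate values in wire order. Real protobuf also merges embedded messages recursively, which this sketch does not model; `varint_field` and `parse` are hypothetical helpers.

```python
# Sketch: parsing the concatenation of two encoded messages merges them.
# Simplified rules: singular varint fields are "last one wins",
# repeated fields are appended in the order they appear on the wire.

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def varint_field(number: int, value: int) -> bytes:
    return encode_varint(number << 3) + encode_varint(value)  # wire type 0

def parse(buf: bytes, repeated: set[int]) -> dict:
    msg: dict = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        if (key & 0x7) != 0:
            raise ValueError("only varint fields are handled in this sketch")
        value, pos = decode_varint(buf, pos)
        number = key >> 3
        if number in repeated:
            msg.setdefault(number, []).append(value)
        else:
            msg[number] = value  # a later occurrence overwrites an earlier one
    return msg

a = varint_field(1, 5) + varint_field(3, 1)
b = varint_field(1, 7) + varint_field(3, 2)
merged = parse(a + b, repeated={3})
print(merged)
```

Because a decoder handling `a + b` sees field 1 twice and field 3 twice in the middle of the stream, it has to accept fields in any order, which is exactly the conclusion above.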
-henner
--
Henner Zeller | h.ze...@acm.org
Buy books and support free software | http://bookzilla.de
On Wed, Jul 9, 2008 at 11:18 PM, Willem de Jong <w.a.d...@gmail.com> wrote:
- I am afraid I don't understand the point about unknown fields. What scenario are you thinking of? A sending system sending fields that are not present in the .proto file that is used on the sending side? In that case, how is the field number for such a field determined?
When you parse a message off the wire, if the parser sees unknown fields, it won't just ignore them. It puts the field values off to the side, in the message's UnknownFieldSet. If you then serialize the message without clearing it in between, the unknown fields are written back out. This way, if you have a server that acts as a proxy -- receiving messages and then forwarding them elsewhere -- you do not have to upgrade it every time you add a new field to your format.
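The proxy scenario can be sketched in a few lines of Python. `ProxyMessage` and its `KNOWN` set are hypothetical names, and this is not the real generated code or the UnknownFieldSet API; it only shows the idea of keeping the raw bytes (key plus payload) of unrecognized fields and writing them back out on serialization. Only wire types 0 and 2 are handled.

```python
# Sketch: a proxy that parses only the fields it knows and keeps the raw
# bytes of every other field, so re-serializing does not drop them.

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

class ProxyMessage:
    KNOWN = {1}  # this (older) proxy only knows field 1, a varint

    def __init__(self) -> None:
        self.known: dict[int, int] = {}
        self.unknown = bytearray()  # raw key + payload of unrecognized fields

    @classmethod
    def parse(cls, buf: bytes) -> "ProxyMessage":
        msg = cls()
        pos = 0
        while pos < len(buf):
            start = pos
            key, pos = decode_varint(buf, pos)
            number, wire_type = key >> 3, key & 0x7
            if wire_type == 0:
                value, pos = decode_varint(buf, pos)
            elif wire_type == 2:
                length, pos = decode_varint(buf, pos)
                value, pos = buf[pos:pos + length], pos + length
            else:
                raise ValueError(f"wire type {wire_type} not handled in this sketch")
            if number in cls.KNOWN and wire_type == 0:
                msg.known[number] = value
            else:
                msg.unknown += buf[start:pos]  # copy the whole field verbatim
        return msg

    def serialize(self) -> bytes:
        out = bytearray()
        for number in sorted(self.known):
            out += encode_varint((number << 3) | 0) + encode_varint(self.known[number])
        out += self.unknown  # unknown fields are written back out unchanged
        return bytes(out)

# A newer sender adds field 5 (a string) that the proxy has never heard of.
wire = (encode_varint((1 << 3) | 0) + encode_varint(150)
        + encode_varint((5 << 3) | 2) + encode_varint(2) + b"hi")
forwarded = ProxyMessage.parse(wire).serialize()
```

Note that this sketch always writes unknown fields after the known ones, so a field order on the wire after forwarding can differ from the order received.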
- If you guarantee that known fields are ordered by field number, then why not make this a part of the encoding specification? I am asking because (a) I have some code that I might reuse for writing a protobuf decoder, but it will be easier if I can rely on fields being in sequence (some additional out-of-sequence fields at the end would be ignored), and (b) I think it would generally be possible to write a more efficient parser if the parser always 'knows what to receive next'. In that case it doesn't have to check a list of things that it might receive; it just has to check the next item. Also, it wouldn't have to check at the end whether all required elements were present: it could fail immediately if it noticed a required element was missing.
It's guaranteed that serializing a protocol message object will write the tags in order of field number. However, there are other ways to construct protocol messages. If you simply concatenate two messages, for example, this has the effect of merging them as if you had used MergeFrom(). This is a useful property which we have actually used in some cases. We have also found several cases where people wanted to write messages by hand without having to write the tags in order, so the format does not mandate an order.
I don't think you could really get that much of a performance improvement by assuming tags are ordered. Unless your message consists entirely of required fields (no optional or repeated fields), you would still have to check each tag. The code we generate now (for C++, at least) actually has an optimization where it predicts that the next tag in the input will be the next tag in sequence, and so it compares against that prediction before falling back to the switch.
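The prediction idea can be illustrated with a Python sketch. It is an illustration only, not the generated C++ code: the message is hypothetical (fields 1, 2 and 3, all varints, declared in that order), unknown fields are rejected rather than preserved, and the `stats` counters exist just to show which path each key took.

```python
# Sketch of "predict the next tag, fall back to a switch" parsing.

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

FIELD_ORDER = [1, 2, 3]
EXPECTED_KEYS = [(n << 3) | 0 for n in FIELD_ORDER]  # wire type 0 (varint)

def parse_with_prediction(buf: bytes) -> tuple[dict[int, int], dict[str, int]]:
    values: dict[int, int] = {}
    stats = {"predicted": 0, "fallback": 0}
    next_index = 0  # index of the field we expect to see next
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        if next_index < len(EXPECTED_KEYS) and key == EXPECTED_KEYS[next_index]:
            # Fast path: one comparison against the predicted key.
            stats["predicted"] += 1
            index = next_index
        else:
            # Slow path: the equivalent of the switch on the tag.
            stats["fallback"] += 1
            number, wire_type = key >> 3, key & 0x7
            if wire_type != 0 or number not in FIELD_ORDER:
                raise ValueError(f"unexpected key {key} (unknown fields not handled here)")
            index = FIELD_ORDER.index(number)
        value, pos = decode_varint(buf, pos)
        values[FIELD_ORDER[index]] = value
        next_index = index + 1  # predict the field declared after this one
    return values, stats

in_order = bytes([0x08, 10, 0x10, 20, 0x18, 30])
reversed_order = bytes([0x18, 30, 0x10, 20, 0x08, 10])
print(parse_with_prediction(in_order))
print(parse_with_prediction(reversed_order))
```

Both inputs decode to the same values; with fields in declaration order every key takes the fast path, while in reverse order every key falls back, paying one extra comparison before the switch.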
But it wouldn't be terribly difficult (or even costly, I would say) to put them in the right order.
That supports my point, doesn't it? The optimization is probably counterproductive if the fields are out of sequence, so it would make sense to add a recommendation to the standard that the fields SHOULD be in sequence whenever feasible.
Note that if the elements can be out of sequence, you have to keep some bookkeeping of what you have parsed in order to check whether you received everything (plus an additional step at the end to execute this check). If the fields are in sequence, you can conclude that a field is missing as soon as you receive the next field, with no bookkeeping. This also works when some of the fields are optional; that doesn't really change it.
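The two approaches to required-field checking can be contrasted in a Python sketch. The message is hypothetical (required fields 1 and 2, and every field on the wire assumed to be a varint so values can be skipped uniformly), and the function names are illustrative. The in-order version keeps only a cursor into the list of required fields instead of a set of seen fields, but it still needs a check at end of input, since the last required field may be absent.

```python
# Sketch contrasting two ways of checking required fields.

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

REQUIRED = [1, 2]  # ascending field numbers

def check_any_order(buf: bytes) -> None:
    """Fields may arrive in any order: record what was seen, check at the end."""
    seen: set[int] = set()
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        _, pos = decode_varint(buf, pos)  # skip the varint value
        seen.add(key >> 3)
    missing = [n for n in REQUIRED if n not in seen]
    if missing:
        raise ValueError(f"missing required fields {missing}")

def check_in_order(buf: bytes) -> None:
    """Assumes ascending field numbers: a gap shows up as soon as a
    higher field number arrives, so only a cursor into REQUIRED is kept."""
    pending = iter(REQUIRED)
    want = next(pending, None)
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        _, pos = decode_varint(buf, pos)
        number = key >> 3
        if want is not None and number > want:
            raise ValueError(f"missing required field {want}")
        if number == want:
            want = next(pending, None)
    # The end of input must still be checked for a missing trailing field.
    if want is not None:
        raise ValueError(f"missing required field {want}")
```

The trade-off shows up with input whose fields are swapped (field 2 before field 1): `check_any_order` accepts it, while `check_in_order` rejects it, so the cheaper check is only safe if every encoder really writes fields in sequence.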