Compatibility issue + max value for the indices/field numbers + are high field numbers slower?

a_teammate

Jun 20, 2016, 2:38:37 PM

to Protocol Buffers

Hey there,

This might be a stupid question, but I haven't found anything definite in the docs/specs about it:

Our main goal is actually to keep compatibility while syncing a tree.

The protocol is actually just one giant oneof containing all possible paths for the tree:

message TreeNodeChanged {
  oneof key {
    sometype path_to_node1 = 1;
    sometype path_to_node2 = 2;
    ...
  }
}

and that's already working.

The problem, however, is that we don't only want forward/backward compatibility between server and client, but also sideways:

e.g. person A introduced field number 2 and so did person B, each meaning something totally different; this should be recognized and ignored (or maybe even accepted, if we find a way to do this).

So the idea of my mate was to make the field number a hash of the path_to_node!

Candidates are e.g. a 32-bit FNV hash or maybe an adapted CRC32. Both would mean field numbers in a pretty high range.

Maybe we're going the totally wrong way here but following this path leads to the following issues:

1) what is the maximum value of the protobuf field numbers?

In the proto language specification it simply says it's of type "intLit", and intLit is:

intLit     = decimalLit | octalLit | hexLit
decimalLit = ( "1" … "9" ) { decimalDigit }
octalLit   = "0" { octalDigit }
hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }

So this means only decimal and hexadecimal values are actually allowed, doesn't it?
Then however given:

decimalDigit = "0" … "9"
hexDigit     = "0" … "9" | "A" … "F" | "a" … "f"

That means it has different limits for hex and decimal notation, is that correct?

I mean:

according to this spec, the max value for decimalLit is one billion - 1 ("999 999 999"), which fits fine in a 32-bit integer (using 30 bits), but for base 16 the allowed length is 16 digits, which would be awesome because that would mean an allowed integer size of 64 bits.

So which one is true? Both?

Which leads to issue 2:

2) are there issues with high field numbers?

And are they even tested at all?

I've read elsewhere that "we have used field numbers in the range 50000-99999. This range is reserved for internal use within individual organizations"

which would suggest that even values above 50 000 are uncommon.

Furthermore, some people mentioned that high values would be less performant, but how relevant is that? Is it only because the field number consumes slightly more memory?

Well, maybe we're asking the wrong questions entirely and there's already a much simpler approach for making protobuf messages version-independent; if so, we would be happy to hear about it!

Thanks in advance and for reading all this stuff :)

Jeremy Ong

Jun 20, 2016, 2:50:17 PM

to a_teammate, Protocol Buffers

https://developers.google.com/protocol-buffers/docs/encoding#structure

Protobuf messages are associative arrays of key-value pairs, where the key combines the field number and the value's wire type: the field number is shifted left by 3 bits, OR-ed with the wire type, and the result is encoded as a varint. Because the key is variable width, the field number's theoretical size is unbounded, but the practical limit is likely implementation dependent, as some programming languages support arbitrarily large numbers while other implementations might use fixed-width types to represent the field for convenience.
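To make the key layout described above concrete, here is a minimal Python sketch of how a key could be built and taken apart again. The helper names (`encode_varint`, `encode_key`, `decode_key`) are made up for illustration and are not part of any protobuf library; the sketch only follows the encoding documentation linked above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte, low group first)."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Key = (field_number << 3) | wire_type, written as a varint."""
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    return encode_varint((field_number << 3) | wire_type)


def decode_key(data: bytes) -> tuple:
    """Read one varint key from the start of data; return (field_number, wire_type)."""
    key = 0
    shift = 0
    for byte in data:
        key |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return key >> 3, key & 0x7
        shift += 7
    raise ValueError("truncated varint")


print(encode_key(1, 0).hex())   # 08
print(encode_key(16, 0).hex())  # 8001
print(decode_key(encode_key(2**29 - 1, 2)))
```

Python integers are arbitrary precision, so this sketch would happily encode field numbers far beyond what a fixed-width implementation could read back, which is exactly the interoperability concern raised in the paragraph above.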

If you are trying to prevent collisions between two people modifying the key space, I recommend making separate embedded messages so there is no chance of collision. In my opinion, CRC-ing field numbers is just too heavyweight for what you're trying to do.

Regarding performance, varint encoding/decoding time is O(n) in the byte length of the result. Whether this is important depends on your application of course, but you're really better off understanding how the encoding works so you can do a quick back of the envelope guess to see if it matters, followed by actually benchmarking if performance is really that important to you.
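As one possible back-of-the-envelope check, the sketch below counts how many bytes the key alone takes for a few field numbers. The function name `key_size` is hypothetical, and this only measures wire size, not actual encode/decode time, so it is no substitute for a benchmark.

```python
def key_size(field_number: int) -> int:
    """Bytes needed for the varint-encoded key of a field.

    The wire type sits in the low 3 bits of the key, so it never changes
    the byte count for field numbers >= 1 and is left out here.
    """
    key = field_number << 3
    size = 1
    while key >= 0x80:  # each varint byte carries 7 payload bits
        key >>= 7
        size += 1
    return size


for n in (1, 15, 16, 2047, 2048, 50_000, 2**29 - 1):
    print(f"field number {n:>9}: {key_size(n)} byte(s) per key")
```

Under these assumptions, small hand-assigned field numbers cost 1 or 2 bytes per key, while numbers near the top of the valid range cost 5 bytes, paid once for every field occurrence on the wire.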

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+u...@googlegroups.com.
To post to this group, send email to prot...@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

--

Jeremy Ong

PlexChat CTO

650.400.6453

Feng Xiao

Jun 20, 2016, 3:47:39 PM

to a_teammate, Protocol Buffers

On Sun, Jun 19, 2016 at 7:56 AM, a_teammate <mad...@web.de> wrote:

> Hey there,
>
> This might be a stupid question, but I haven't found anything definite in the docs/specs about it:
>
> Our main goal is actually to keep compatibility while syncing a tree.
>
> The protocol is actually just one giant oneof containing all possible paths for the tree:
>
> message TreeNodeChanged {
>   oneof key {
>     sometype path_to_node1 = 1;
>     sometype path_to_node2 = 2;
>     ...
>   }
> }
>
> and that's already working.
> The problem, however, is that we don't only want forward/backward compatibility between server and client, but also sideways:
> e.g. person A introduced field number 2 and so did person B, each meaning something totally different; this should be recognized and ignored (or maybe even accepted, if we find a way to do this).

If person A already added a field with field number 2, how can person B add another one with the same field number? Do you have multiple copies of the .proto files and they are not synced?

> So the idea of my mate was to make the field number a hash of the path_to_node!
> Candidates are e.g. a 32-bit FNV hash or maybe an adapted CRC32. Both would mean field numbers in a pretty high range.
>
> Maybe we're going the totally wrong way here but following this path leads to the following issues:
>
> 1) what is the maximum value of the protobuf field numbers?

The range of valid field numbers is 1 to 2^29 - 1:

https://developers.google.com/protocol-buffers/docs/proto#assigning-tags

Some field numbers in this range are reserved (19000 through 19999 are used by the protobuf implementation itself), so you will need to account for those as well.
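For illustration only, here is a rough Python sketch of how a 32-bit FNV-1a hash of a node path could be folded into the valid range. The function `field_number_for`, the modulo folding, and the choice to reject reserved values are all assumptions made for this example, not something protobuf provides or recommends.

```python
FNV_OFFSET_BASIS = 0x811C9DC5      # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193             # standard 32-bit FNV prime
MAX_FIELD_NUMBER = 2**29 - 1       # upper bound of the valid range
RESERVED = range(19000, 20000)     # reserved for the protobuf implementation


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def field_number_for(path: str) -> int:
    """Hypothetical mapping from a node path to a field number in 1..2^29-1."""
    number = fnv1a_32(path.encode("utf-8")) % MAX_FIELD_NUMBER + 1
    if number in RESERVED:
        # Assumed policy: fail loudly instead of silently remapping.
        raise ValueError(f"{path!r} hashes into the reserved range: {number}")
    return number


print(field_number_for("path_to_node1"))
```

Folding 32 bits into 29 discards information, so two distinct paths can still end up with the same number; any real generator would need a collision check across all known paths.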

Jeremy Ong

Jun 20, 2016, 3:51:00 PM

to Feng Xiao, a_teammate, Protocol Buffers

> The range of valid field numbers is 1 to 2^29 - 1

Ah, thanks for pointing this out; I hadn't noticed that when reading the specification. I think there are many protobuf implementations in the wild that do not in fact enforce this, so there are likely bugs out there when those implementations, using dynamically generated field numbers, interoperate with, say, C++ implementations that shift 4-byte integers.

a_teammate

Jun 20, 2016, 6:36:56 PM

to Protocol Buffers

On Monday, June 20, 2016 at 20:50:17 UTC+2, Jeremy Ong wrote:

> https://developers.google.com/protocol-buffers/docs/encoding#structure
>
> Protobuf messages are associative arrays of key-value pairs, where the key combines the field number and the value's wire type: the field number is shifted left by 3 bits, OR-ed with the wire type, and the result is encoded as a varint. Because the key is variable width, the field number's theoretical size is unbounded, but the practical limit is likely implementation dependent, as some programming languages support arbitrarily large numbers while other implementations might use fixed-width types to represent the field for convenience.

Ah! Thank you for pointing that out; that lets me understand the structure of protobuf much better. And thanks especially for the link!

On Monday, June 20, 2016 at 21:47:39 UTC+2, Feng Xiao wrote:

> On Sun, Jun 19, 2016 at 7:56 AM, a_teammate <mad...@web.de> wrote:
>
> > The problem, however, is that we don't only want forward/backward compatibility between server and client, but also sideways:
> > e.g. person A introduced field number 2 and so did person B, each meaning something totally different; this should be recognized and ignored (or maybe even accepted, if we find a way to do this).
>
> If person A already added a field with field number 2, how can person B add another one with the same field number? Do you have multiple copies of the .proto files and they are not synced?

Well, we're an open-source multiplayer game and highly encourage modding, so it would be cool if a modded client meeting a modded server could work together (this could be doable since our scripting uses a similar or the same API; whether that's smart security-wise is another question, of course).

The protobuf code gets generated here from code reflection, so people don't need to deal with syncing themselves. That's where I meant the CRC-ing would come into play; I assume I wasn't quite clear about that initially.

On Monday, June 20, 2016 at 20:50:17 UTC+2, Jeremy Ong wrote:

> If you are trying to prevent collisions between two people modifying the key space, I recommend making separate embedded messages so there is no chance of collision. In my opinion, CRC-ing field numbers is just too heavyweight for what you're trying to do.

Separate embedded messages would involve switches for the code generation (official build vs. modded build) but could be doable, and maybe it is even a bit cleaner.

The CRC thing would have meant a uniform solution, but maybe namespacing the modded messages isn't the worst idea.

Another alternative would be to sync the metadata initially (so the modded server deals with the input according to the client's description of its own protocol, not based on the server's assumption). Well, we've got the choice :)

On Monday, June 20, 2016 at 20:50:17 UTC+2, Jeremy Ong wrote:

> Regarding performance, varint encoding/decoding time is O(n) in the byte length of the result. Whether this is important depends on your application of course, but you're really better off understanding how the encoding works so you can do a quick back of the envelope guess to see if it matters, followed by actually benchmarking if performance is really that important to you.

Yeah, I see. Well, benchmarking will come into play sooner or later, that's for sure!

On Monday, June 20, 2016 at 21:47:39 UTC+2, Feng Xiao wrote:

> On Sun, Jun 19, 2016 at 7:56 AM, a_teammate <mad...@web.de> wrote:
>
> > 1) what is the maximum value of the protobuf field numbers?
>
> The range of valid field numbers is 1 to 2^29 - 1:
>
> https://developers.google.com/protocol-buffers/docs/proto#assigning-tags
>
> Some field numbers in this range are reserved so you will need to account for those as well.

Ah nice! So if our benchmarks suggest a negligible performance impact, hashing could be doable, since that range seems perfectly fine for it.

Jeremy Ong

Jun 20, 2016, 6:56:18 PM

to a_teammate, Protocol Buffers

> Separate embedded messages would involve switches for the code generation (official build vs modded build) but could be doable, and maybe it is even a bit cleaner.

> The CRC thing would have meant a uniform solution, but maybe namespacing the modded messages isn't the worst idea.

If I'm understanding your situation correctly, I'd be really surprised if a CRC ended up being more performant than a single branch that checks whether the mod message exists.

a_teammate

Jun 20, 2016, 7:53:48 PM

to Protocol Buffers, mad...@web.de

On Tuesday, June 21, 2016 at 00:56:18 UTC+2, Jeremy Ong wrote:

> > Separate embedded messages would involve switches for the code generation (official build vs modded build) but could be doable, and maybe it is even a bit cleaner.
> > The CRC thing would have meant a uniform solution, but maybe namespacing the modded messages isn't the worst idea.
>
> I'd be really surprised if a CRC ended up being more performant than a single branch to see if the mod message exists if I'm understanding your situation correctly.

Well, if you used CRCs for the field numbers, collisions wouldn't be a problem (since our variable names are unique, even between modded servers talking with modded clients).

But on the other hand, we could also do something like adding a submessage, e.g. "modding", in which modders need to register submessages under their mod's name.

That would mean quasi-unique field numbers (quasi since different modders could use the same mod name) but would be good enough. :) And if long field numbers are out of the race, that's probably what we'll do. Or did you mean something different?
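As a rough sanity check on how far unique names carry over to unique hashed field numbers, here is a small Python sketch of the usual birthday estimate. It assumes the hash spreads names uniformly over the 2^29 - 1 valid field numbers, which a real FNV or CRC variant only approximates; the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(num_fields: int, space: int = 2**29 - 1) -> float:
    """Chance that at least two of num_fields uniformly hashed names share a number.

    Computes 1 - prod(1 - i/space) for i in 0..num_fields-1 in log space
    to stay numerically stable for small probabilities.
    """
    log_no_collision = sum(math.log1p(-i / space) for i in range(num_fields))
    return -math.expm1(log_no_collision)


for n in (100, 1_000, 10_000, 30_000):
    print(f"{n:>6} hashed names: {collision_probability(n):.4%} chance of a collision")
```

Under that uniformity assumption the risk stays small for a few hundred paths but grows roughly with the square of the number of names, so a hashed scheme would probably still want an explicit collision check at code-generation time.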

Thank you for your answers though!
