Can a proto3 message's bytes be parsed by proto2?


Jason Huang

Mar 14, 2019, 1:21:10 PM
to Protocol Buffers
I chose proto3 to serialize data for my application's cache, and it has been running for several months. Now I want to switch to proto2, because I really need hasField.

The problem is that there is still a lot of data in the cache that was serialized with proto3. If I can't deserialize it with proto2, that will be unacceptable.

My question is: is it safe to switch from proto3 to proto2? I ran some tests and it worked in some cases, but I'm afraid my tests don't give full coverage.

My current protobuf version is 3.6.1, and the version I want to use for proto2 is 2.6.1.

Adam Cozzette

Mar 14, 2019, 2:12:31 PM
to Jason Huang, Protocol Buffers
Going from proto3 to proto2 should be fine. There are some slight differences but I can't think of any major problems. The only thing that comes to mind is that proto2 handles unknown enum values a little bit differently from proto3. I doubt that would be a problem but if you want to be extra cautious you could double-check that you're not storing any unknown enum values.

However, there is no need to downgrade to version 2.6.1 and if anything that would only introduce bugs and make the code slower. The proto2 semantics are still fully supported in all versions going forward, so all you have to do is put syntax = "proto2"; at the top of your .proto files. You can stick with 3.6.1 or even upgrade to any newer version.
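
One of the slight differences that may be worth checking, given that hasField is the reason for the switch: proto3 (in versions without explicit `optional` support, such as 3.6.1) does not write scalar fields that hold their default value, so old cache entries carry no bytes for them. The following is a minimal sketch in plain Python, not the protobuf library; the helper names and the field numbers 1 and 2 are made up for illustration. It hand-encodes varint fields the way a proto3 writer would and shows which fields a presence-tracking reader would find on the wire.

```python
# Hand-rolled sketch of the protobuf varint wire format (not the protobuf library).
# All function names and field numbers here are hypothetical.

WIRE_VARINT = 0  # wire type 0 = varint


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def proto3_style_serialize(fields):
    """Serialize {field_number: int}, skipping zero values as a proto3
    writer does for scalar fields without explicit presence."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value == 0:
            continue  # default value: nothing is written
        out += encode_varint((number << 3) | WIRE_VARINT)
        out += encode_varint(value)
    return bytes(out)


def parse_varint_fields(data):
    """Return {field_number: value} for every varint field found on the wire."""
    present = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        if tag & 0x7 != WIRE_VARINT:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        present[tag >> 3] = value
    return present


cached = proto3_style_serialize({1: 0, 2: 150})
seen = parse_varint_fields(cached)
has_field_1 = 1 in seen  # what a proto2-style presence check would report
has_field_2 = 2 in seen
```

Under these assumptions, a field stored as 0 by the proto3 writer cannot be told apart from a field that was never set once the same bytes are read back as proto2.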


Jason Huang

Mar 14, 2019, 10:29:52 PM
to Protocol Buffers
Thanks for your reply. By `unknown enum values`, do you mean the case of an unset enum field?

Michael Powell

Mar 15, 2019, 2:16:57 PM
to Protocol Buffers


There is "some" backwards compatibility if you need to import proto3 into proto2. Groups are the chief one that the docs mention, as far as I know.

Adam Cozzette

Mar 19, 2019, 6:50:37 PM
to Jason Huang, Protocol Buffers
Not exactly, by unknown enum value I mean an enum value that doesn't appear in the enum definition. For example let's say your enum has only values 0, 1, and 2 but you parse a 3. This could happen if the message was serialized by another binary using a newer version of the schema. Proto2 will store unknown enum values in the unknown field set whereas proto3 will just store them normally in the field.
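
The difference described above can be modeled roughly as follows. This is a simplified sketch in plain Python, not the protobuf library: the enum, the dictionary layout, and the function names are hypothetical, and the proto2 default is assumed to be 0 because that is the first declared value here.

```python
# Simplified model of how an enum number read from the wire is stored.
# Hypothetical enum with only the values 0, 1 and 2 declared.
KNOWN_VALUES = {0: "RED", 1: "GREEN", 2: "BLUE"}


def parse_enum_proto2(wire_value):
    """proto2-style: an undeclared number goes to the unknown field set
    and the field itself stays unset (reads as the default)."""
    if wire_value in KNOWN_VALUES:
        return {"field": wire_value, "has_field": True, "unknown": []}
    return {"field": 0, "has_field": False, "unknown": [wire_value]}


def parse_enum_proto3(wire_value):
    """proto3-style: the number is stored in the field even if undeclared."""
    return {"field": wire_value, "unknown": []}


as_proto2 = parse_enum_proto2(3)
as_proto3 = parse_enum_proto3(3)
```

In this model `as_proto2` reports the field as unset with 3 kept on the side, while `as_proto3` simply holds 3 in the field.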

Michael Powell

Mar 19, 2019, 7:13:59 PM
to Protocol Buffers


I think in the above scenario, that would likely break whether v2 or v3, but I could be wrong.

I would have to re-read the language guide. You can parse an ordinal value where the Enumeration was expected?

i.e. UNKNOWN = 0;

Would it accept 0 or UNKNOWN?

Are we talking descriptor / protobuf specification level? Or binary level? 

I do not read any biases where unexpected ordinal values are concerned, but I would expect that it fails any sort of verification.

That being said, specification versioning is a concern regardless of whether v2 or v3, I think, and not just with Protocol Buffers. It's a concern for this type of framework, regardless.

Adam Cozzette

Mar 19, 2019, 8:25:38 PM
to Michael Powell, Protocol Buffers
Oh, I am talking about the binary format in particular. In that scenario it's important for unknown enum values to be handled in some way, since you might want to add a new enum value but it should still be parseable by older binaries.

Michael Powell

Mar 19, 2019, 9:07:44 PM
to Protocol Buffers


From reading the language guide, etc., I take it that the default is the first value. That can be anything: 0, 1, 99, whatever. I could be wrong there, however; my current interest in Protobuf is in the v2 descriptors only, at the moment.

Adam Cozzette

Mar 20, 2019, 10:41:43 AM
to Michael Powell, Protocol Buffers
That is true, in proto2 the default value of an enum field is the first value. But the behavior is more confusing than one might expect. Let's say the only enum values declared are 0, 1, and 2 but you parse a 3 from the wire. If you examine that field it will appear to be empty and you can read it to get the default value (0 in this case assuming that was the first value defined). But the 3 is still there, hidden in the unknown field set. Once you reserialize the proto, the 3 will be serialized again. Worse still, since unknown fields are serialized after known fields (that's not a requirement but typically happens in practice), the unknown enum value can overwrite a change that your code tried to make. This is why proto3 changed things so that unknown enum values are stored normally instead of in the unknown field set, and this makes it much easier to reason about.
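
The overwrite scenario above can be traced step by step with a small simulation. This is a sketch in plain Python under stated assumptions, not the protobuf library: the wire is modeled as a list of (field number, value) pairs, field number 1 and all function names are hypothetical, and unknown fields are assumed to be written after known fields, which is typical but not required.

```python
# Simulation of the proto2 round trip with an unknown enum value.
DECLARED = {0, 1, 2}  # enum numbers known to the old schema
FIELD = 1             # hypothetical field number of the enum field


def parse_old_proto2(wire):
    """Old proto2 reader: undeclared enum numbers land in the unknown field set."""
    msg = {"value": None, "unknown": []}
    for number, value in wire:
        if number == FIELD and value in DECLARED:
            msg["value"] = value
        else:
            msg["unknown"].append((number, value))
    return msg


def serialize_old_proto2(msg):
    """Write known fields first, then the unknown field set."""
    wire = []
    if msg["value"] is not None:
        wire.append((FIELD, msg["value"]))
    return wire + msg["unknown"]


def parse_new_schema(wire):
    """Reader that knows value 3; for a non-repeated field the last
    occurrence on the wire wins."""
    value = None
    for number, candidate in wire:
        if number == FIELD:
            value = candidate
    return value


msg = parse_old_proto2([(FIELD, 3)])   # a 3 arrives from a newer writer
looks_unset = msg["value"] is None     # the field appears empty
msg["value"] = 1                       # our code sets the field to 1
rewritten = serialize_old_proto2(msg)  # the hidden 3 is written after the 1
final = parse_new_schema(rewritten)    # the newer reader sees 3, not 1
```

The value the code set is still on the wire, but because the hidden 3 follows it, a reader that understands 3 ends up with 3.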

Michael Powell

Mar 20, 2019, 10:58:41 AM
to Protocol Buffers


You may need to do some vetting of your actual domain data before/after parsing. That's the extent of my knowledge, at least as far as the docs describe it. Perhaps there are also posts, blogs, etc., concerning how to manage versioning of assets; it is a problem not exclusive to Protocol Buffers. ZeroC, ZeroMQ, WCF from back in the day, and even nanomsg have it too, I would imagine.