Upcoming Go protobuf release


Joe Tsai

Jan 29, 2018, 5:47:40 PM
to golang-nuts, prot...@googlegroups.com

(If you don't use Go protocol buffers, you can stop reading)

Hello gophers,

This is an announcement that we will be merging the dev branch of github.com/golang/protobuf into master on April 30th (approximately 3 months from now).

This merge will introduce several significant changes:

  • A new table-driven implementation that has been shown to be about 1.3x to 2.1x faster when tested inside Google.
  • The preservation of unknown fields in proto3 messages.
  • Validation that strings are valid UTF-8 as specified in the language guides.

When tested inside Google, these changes turned out to be more disruptive than expected. The issues are mostly caused by the additional fields that protoc-gen-go now adds to generated messages:

  • An XXX_NoUnkeyedLiteral field that the generator now creates to force users to use keyed literals (e.g., foopb.Message{Name: "Golang", Age: 8} as opposed to foopb.Message{"Golang", 8}) to ensure forward compatible usage of messages.
  • An XXX_unrecognized field that is necessary for proto3 to always preserve unknown fields. This breaks users who assume comparability of proto3 messages. Since XXX_unrecognized is of type []byte, proto3 messages cannot be used as map keys nor directly compared using the == operator.
  • An XXX_sizecache field that is part of the internal implementation of the table-driven serializer. The presence of this field can break tests that use reflect.DeepEqual to compare some received messages with some golden message. It is recommended that proto.Equal be used instead for this situation.
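To see why these fields matter, here is a minimal sketch using a hand-written struct that only mimics the shape of a protoc-gen-go message. The `Person` type and the `equalPerson` helper are hypothetical; real generated code also carries struct tags and methods, and for real messages proto.Equal is the comparison to use, not a helper like this one.

```go
package main

import (
	"fmt"
	"reflect"
)

// Person is a hypothetical stand-in for a generated proto3 message.
// The XXX_ fields mirror the ones described above.
type Person struct {
	Name string
	Age  int32

	XXX_NoUnkeyedLiteral struct{}
	XXX_unrecognized     []byte
	XXX_sizecache        int32
}

// equalPerson compares only the user-visible fields. It is a toy
// substitute for proto.Equal, which should be used for real messages.
func equalPerson(a, b *Person) bool {
	return a.Name == b.Name && a.Age == b.Age
}

func main() {
	// Keyed literals keep compiling when fields are added. The unkeyed
	// form Person{"Golang", 8} does not compile, because an unkeyed
	// literal must list every field, including the XXX_ ones.
	golden := &Person{Name: "Golang", Age: 8}
	received := &Person{Name: "Golang", Age: 8}

	// Simulate the serializer caching the encoded size on one message.
	received.XXX_sizecache = 10

	// The []byte field makes the struct non-comparable (no ==, no map keys).
	fmt.Println("comparable:", reflect.TypeOf(Person{}).Comparable())
	fmt.Println("reflect.DeepEqual:", reflect.DeepEqual(golden, received))
	fmt.Println("field-wise equal:", equalPerson(golden, received))
}
```

The DeepEqual result differs from the field-wise result only because of internal bookkeeping, which is exactly the kind of test breakage described above.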

The semantic changes may also cause issues:

  • Strict validation of string fields as valid UTF-8. It is regrettable that the Go implementation has not enforced the encoding thus far, and that was an oversight on our part. However, in order for protobuf messages to be properly parsed by implementations in other languages, it is important to produce an error on the Go side to protect users from generating invalid messages. If your strings do not use UTF-8, then consider using the bytes type.
  • Preservation of unknown fields in proto3. In nearly all use-cases, this change should not cause an issue. However, the preservation of unknown fields can cause issues if you relied upon proto3 to drop unknown fields (e.g., for security reasons to avoid leaking sensitive information). Note that the proto3 specification has previously said such behavior is unspecified. To explicitly drop unknown fields, users should use proto.DiscardUnknown as appropriate.
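Before upgrading, it may help to audit which string values would start failing. The sketch below uses only the standard library's unicode/utf8 package; `checkStringFields` and the map of field values are hypothetical, whereas the updated library is expected to report the error itself when marshaling.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// checkStringFields returns the names of fields whose values are not
// valid UTF-8 and would therefore be rejected by a strict marshaler.
// The map is a hypothetical stand-in for a message's string fields.
func checkStringFields(fields map[string]string) []string {
	var bad []string
	for name, value := range fields {
		if !utf8.ValidString(value) {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad) // map iteration order is random; sort for stable output
	return bad
}

func main() {
	fields := map[string]string{
		"name":    "Gopher",
		"comment": "caf\xe9", // Latin-1 encoded "café": not valid UTF-8
	}
	fmt.Println(checkStringFields(fields))
}
```

Any field reported here either needs its data fixed at the source or, if it is not really text, should be declared as bytes in the .proto file.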

Note that the Go Protocol Buffer compatibility agreement reserves the right to make changes to internal XXX fields, or changes due to specification errors, without violating backwards compatibility. We are still making this announcement in advance so that users can make the appropriate changes beforehand and the eventual merge causes as little disruption as possible.

We recommend that you try vendoring the dev branch to check whether your code works properly with the upcoming Go protobuf changes. If you experience failures, we recommend that you apply the appropriate fix as described in the points above. If you discover any issues with the dev branch, feel free to file an issue on the golang/protobuf tracker.

Thanks,

JT (on behalf of the Go library team)


P.S. Note that this only affects golang/protobuf and not gogo/protobuf, which is a fork of the former.

l...@pinkfroot.com

Jan 30, 2018, 1:45:55 AM
to golang-nuts
Very nice!

Do the speed improvements also benefit Proto2?

thebroke...@gmail.com

Jan 30, 2018, 6:05:33 AM
to golang-nuts
Yes, although not as much as for proto3. The limiting factor for proto2 is the pervasive use of pointers to primitive fields. In order to improve that further, we will likely have to change the generated API for proto2.
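For readers less familiar with the proto2 API: optional scalar fields are generated as pointers so that "unset" (nil) can be distinguished from the zero value, and getters fall back to a default when the pointer is nil. The sketch below hand-writes that shape for a hypothetical `Config` message. Each set field is its own small heap allocation and every read goes through a pointer, which is roughly the overhead described above. The `stringPtr` and `int32Ptr` helpers are hypothetical and play the role that proto.String and proto.Int32 play in golang/protobuf.

```go
package main

import "fmt"

// Config mimics a proto2 message with two optional fields.
// A nil pointer means "not set".
type Config struct {
	Host *string
	Port *int32
}

// GetHost returns the value, or the zero value if unset. Real generated
// getters return the field's declared default, if any.
func (c *Config) GetHost() string {
	if c != nil && c.Host != nil {
		return *c.Host
	}
	return ""
}

func (c *Config) GetPort() int32 {
	if c != nil && c.Port != nil {
		return *c.Port
	}
	return 0
}

// stringPtr and int32Ptr take the address of a copy, like proto.String
// and proto.Int32; each call allocates.
func stringPtr(s string) *string { return &s }
func int32Ptr(v int32) *int32    { return &v }

func main() {
	c := &Config{Host: stringPtr("localhost")}
	fmt.Println(c.GetHost(), c.GetPort(), c.Port == nil)

	var missing *Config
	fmt.Println(missing.GetHost() == "") // getters are nil-safe
}
```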

JT

Alexey Palazhchenko

Jan 30, 2018, 12:37:23 PM
to golang-nuts
Hi,

Can you please add tags to the repository before that? SemVer tags, or even tags with _any_ semantics, would greatly help to roll back to the latest working version when things break.

–-–
Alexey «AlekSi» Palazhchenko

matiasbaruc...@gmail.com

Jan 30, 2018, 2:51:02 PM
to golang-nuts
Awesome! Looking forward to trying it out.

joe...@google.com

Jan 30, 2018, 5:44:37 PM
to golang-nuts
Done. I tagged v1.0.0. When we perform the merge in the future, it will be tagged as v1.1.0.

Walter Schulze

Jan 31, 2018, 10:13:38 AM
to golang-nuts
gogo/protobuf is happy to be acknowledged by Google as an entity in the golang protobuf space.
gogo/protobuf welcomes golang/protobuf to the community and is extremely happy to see this kind of transparency.

gogo/protobuf will also merge these changes and as usual try to stay as close as possible to golang/protobuf, 
including also following the same version tagging.

gogo/protobuf is disappointed that golang/protobuf still thinks that runtime reflection is an efficient way of serializing structures.

go Green go GoGoProtobuf

PS

gogo/protobuf is still open to being merged back into golang/protobuf and has been since its inception 5 years ago.
gogo/protobuf feels for its users, especially those that are not acknowledged by grpc-gateway and grpc-go
and are forced to employ workarounds to preserve their missions of safety and efficiency.
It knows that its existence is not something that anyone prefers, and it welcomes death,
but only if it can preserve its legacy of fast serialization and generating the structures you want to use.

thebroke...@gmail.com

Jan 31, 2018, 1:05:36 PM
to golang-nuts
Thank you, Walter, for your support.

> gogo/protobuf is disappointed that golang/protobuf still thinks that runtime reflection is an efficient way of serializing structures.

The table-driven implementation avoids reflect in the fast and common path. Instead, are you referring to the fact that we don't perform full-code generation of Marshal/Unmarshal like what gogo/protobuf does? We are aware that full-code generation will often out-perform the table-driven approach we took. However, full code-generation drastically bloats the binary size when you have many proto messages linked in. Keeping the binary size smaller was an important design decision for us and seemed to be a better default.
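The trade-off between the two approaches can be illustrated with a toy encoder for a hypothetical two-field message (`string name = 1; int32 age = 2;`). The "generated" version is one unrolled function per message type; the "table-driven" version is a single generic loop over a per-message table whose row builders are shared by every message, so each new message adds only a small table. This is a sketch of the idea under those assumptions, not the golang/protobuf implementation, which as I understand it works on field offsets via package unsafe rather than closures like these.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Person is a hypothetical message: string name = 1; int32 age = 2;
type Person struct {
	Name string
	Age  int32
}

const (
	wireVarint = 0
	wireBytes  = 2
)

func appendVarint(b []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(b, buf[:n]...)
}

func appendTag(b []byte, field, wire int) []byte {
	return appendVarint(b, uint64(field)<<3|uint64(wire))
}

// marshalGenerated is what full code generation emits: fast, but every
// message type adds another function like this to the binary.
func marshalGenerated(p *Person) []byte {
	var b []byte
	if p.Name != "" { // proto3 omits default values
		b = appendTag(b, 1, wireBytes)
		b = appendVarint(b, uint64(len(p.Name)))
		b = append(b, p.Name...)
	}
	if p.Age != 0 {
		b = appendTag(b, 2, wireVarint)
		b = appendVarint(b, uint64(int64(p.Age))) // int32 is sign-extended
	}
	return b
}

// fieldInfo is one row of a per-message encoding table.
type fieldInfo[M any] struct {
	encode func(b []byte, m *M) []byte
}

// stringField and int32Field hold the shared encoding logic; a message
// type only contributes a field number and a getter.
func stringField[M any](num int, get func(*M) string) fieldInfo[M] {
	return fieldInfo[M]{encode: func(b []byte, m *M) []byte {
		s := get(m)
		if s == "" {
			return b
		}
		b = appendTag(b, num, wireBytes)
		b = appendVarint(b, uint64(len(s)))
		return append(b, s...)
	}}
}

func int32Field[M any](num int, get func(*M) int32) fieldInfo[M] {
	return fieldInfo[M]{encode: func(b []byte, m *M) []byte {
		v := get(m)
		if v == 0 {
			return b
		}
		b = appendTag(b, num, wireVarint)
		return appendVarint(b, uint64(int64(v)))
	}}
}

// marshalTable is the single loop shared by all message types.
func marshalTable[M any](table []fieldInfo[M], m *M) []byte {
	var b []byte
	for _, f := range table {
		b = f.encode(b, m)
	}
	return b
}

var personTable = []fieldInfo[Person]{
	stringField(1, func(p *Person) string { return p.Name }),
	int32Field(2, func(p *Person) int32 { return p.Age }),
}

func main() {
	p := &Person{Name: "Golang", Age: 8}
	gen := marshalGenerated(p)
	fmt.Printf("% x\n", gen)
	fmt.Println(bytes.Equal(gen, marshalTable(personTable, p)))
}
```

Both paths produce identical bytes; the difference is where the per-message cost lands (code size for generation, an indirect call per field for tables).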

We are open to considering an option that allows users to specify full-code generation for select messages.

> gogo/protobuf is still open to being merged back into golang/protobuf and has been since its inception 5 years ago.

That is good to hear. I have not yet gone through all of gogo/protobuf to determine what it would take to merge, or what should be merged. This will be future work.

JT

Walter Schulze

Jan 31, 2018, 2:20:31 PM
to thebroke...@gmail.com, golang-nuts
Hi JT, please see my inline replies.

On Wed, 31 Jan 2018 at 19:05 <thebroke...@gmail.com> wrote:
> Thank you, Walter, for your support.
>
> > gogo/protobuf is disappointed that golang/protobuf still thinks that runtime reflection is an efficient way of serializing structures.
>
> The table-driven implementation avoids reflect in the fast and common path. Instead, are you referring to the fact that we don't perform full-code generation of Marshal/Unmarshal like what gogo/protobuf does? We are aware that full-code generation will often out-perform the table-driven approach we took. However, full code-generation drastically bloats the binary size when you have many proto messages linked in. Keeping the binary size smaller was an important design decision for us and seemed to be a better default.

Yes, I was referring to the speed of code generation over runtime reflection.
What I struggle to understand is why the optimize_for file option that is part of proto2 and proto3 is not considered by golang/protobuf as a way to specify when code generation should be used over runtime reflection.
This seems to work for most other languages, including Java, which I heard is quite popular among real software developers.

> We are open to considering an option that allows users to specify full-code generation for select messages.

This is exactly what gogo/protobuf allows users to do.
Using protobuf extensions, gogo/protobuf allows the user to specify per message or per file whether they want to generate marshalers, unmarshalers, etc.
A user can also create a vanity binary to generate these methods if they do not wish to use extensions and want to enforce a specific style across an organization.

> > gogo/protobuf is still open to being merged back into golang/protobuf and has been since its inception 5 years ago.
>
> That is good to hear. I have not yet gone through all of gogo/protobuf to determine what it would take to merge, or what should be merged. This will be future work.

gogo/protobuf is also open to being only partly merged.

One other major advantage of gogo/protobuf is generating the structures you want to use, by allowing you to modify the generated structure using protobuf extensions like customtype.
This way you can avoid copying between the protobuf generated structure and a user defined go structure that you actually want to use.
This is a huge speed and safety gain and probably the most important feature of gogo/protobuf.
proto3 has addressed the biggest concern by allowing the generation of fields without pointers, but there are other cases as well, including casttype, customname for generating more lintable code and even not generating the structure at all, for ultimate customization.
I would hope that merging some of these ideas will also be on the table.
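The copying Walter describes looks roughly like the sketch below: without type customization, the generated struct holds plain Go types and callers convert to and from their own domain types by hand at every boundary. All names here (`pbUser`, `UserID`, `User`, `fromProto`, `toProto`) are hypothetical. With gogo/protobuf's casttype (and customname, which would turn `Id` into the lint-friendly `ID`), the generator can emit a field of the named type directly, so these conversion functions disappear; the exact generated output depends on the options chosen.

```go
package main

import "fmt"

// UserID is a domain type the application wants to use everywhere.
type UserID string

// pbUser mimics a generated message with plain Go field types.
type pbUser struct {
	Id    string
	Email string
}

// User is the hand-written domain struct the application actually uses.
type User struct {
	ID    UserID
	Email string
}

// Without type customization, every boundary needs an explicit copy.
func fromProto(m *pbUser) User {
	return User{ID: UserID(m.Id), Email: m.Email}
}

func toProto(u User) *pbUser {
	return &pbUser{Id: string(u.ID), Email: u.Email}
}

func main() {
	m := &pbUser{Id: "u-42", Email: "gopher@example.com"}
	u := fromProto(m)
	fmt.Println(u.ID, *toProto(u) == *m)
}
```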

Looking forward to working together for a change
Please let me know how I can help

Skeptically hopeful about a new era for protobufs in Go
Walter Schulze

liu...@google.com

Jan 31, 2018, 3:44:21 PM
to golang-nuts


On Wednesday, January 31, 2018 at 11:20:31 AM UTC-8, Walter Schulze wrote:
> What I struggle to understand is why the optimize_for file option that is part of proto 2 and 3 is not considered by golang/protobuf as a way to specify when code generation should be used over runtime reflection.
> This seems to work for most other languages, including Java, which I heard is quite popular among real software developers.

Note that the code-generation approach used by other languages (mostly C++/Java) has its own problems, mostly because of the tight coupling among generated code, the runtime, and embedded sub-messages (which may be generated by a different party using a different version of protoc). These problems don't exist inside Google, as we use a single-repo build system, but they cause significant issues in open source. For instance, Hadoop still ships protobuf v2.5 generated code, which is incompatible with later versions of protobuf. All projects using Hadoop are therefore version-locked to v2.5, as an upgrade in any project (including Hadoop itself) will break the build. A version upgrade can only happen when the entire transitive dependency closure upgrades together atomically.

We solved this problem in protobuf-java v3.0+ by introducing ABI backward/forward compatibility guarantees between generated code and the runtime. However, this introduced a lot of code-maintenance overhead, reduced development velocity, and limited the changes we could make. We are now addressing the issue by introducing a table-driven implementation to Java. Recent benchmark results showed on-par performance on Android platforms, and hopefully we can release the new implementation in a few months.

For Go, if we are going to introduce fully generated code, I'd strongly recommend considering those complications. A major version bump is also expensive for protobufs, as all dependent libraries would have to bump their major versions too.

Walter Schulze

Feb 1, 2018, 1:26:17 AM
to liu...@google.com, golang-nuts

Could we perhaps get a code snippet example that non-Java programmers can follow?

PS

When I previously referred to Java programmers as real software developers, I didn't add some much-needed context.

Sometimes in this java dominated world I personally don't feel like a real software developer.

Jisi Liu

Feb 1, 2018, 2:29:14 AM
to Walter Schulze, golang-nuts
This is not Java-specific; I just used Java as an example. For most protobuf implementations, there is a contract between the runtime and the generated code that is not meant to be public. For example, in go-protobuf the runtime expects the generated structs to define some internal fields, like XXX_unrecognized and XXX_sizecache.

If we move from table-driven to code generation, then I expect there will be more contracts between the runtime and the generated code, and potentially also contracts between a containing message and its embedded message fields: you probably want to extract duplicated methods into the runtime to save code size, and the generated code needs to deal with submessages itself. This is my primary concern.

In a single-repo development model, someone can change such a contract as long as they change protoc-gen and the runtime library at the same time. However, in open source, such a change can cause dependency-hell issues, as different versions of generated protobufs are no longer compatible with each other and/or with the runtime. With a more tightly coupled model, we are more prone to such issues.

One way to solve this is to make the runtime support all previous versions of the contract indefinitely and always be backward compatible. Then users can switch to the newest runtime to support all the old libraries. We are following this approach in protobuf-java 3.x, which causes a lot of development overhead. I'm a bit worried that we would need to do the same for Go if we go with the code-gen approach.
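The "support every previous contract forever" approach can be pictured as a runtime that type-switches over each contract version that generated code has ever implemented. The interfaces `genV1` and `genV2` and both message types below are hypothetical and not part of any protobuf library; the point of the sketch is that each new contract version adds a branch the runtime has to keep for as long as old generated code is still in use.

```go
package main

import (
	"errors"
	"fmt"
)

// genV1 is the contract that older generated code implements.
type genV1 interface {
	MarshalV1() ([]byte, error)
}

// genV2 is a newer contract that lets the runtime supply the buffer.
type genV2 interface {
	SizeV2() int
	MarshalToV2(b []byte) ([]byte, error)
}

// marshal is the runtime entry point. It must keep supporting every
// contract version that any still-deployed generated code implements.
func marshal(m any) ([]byte, error) {
	switch g := m.(type) {
	case genV2:
		return g.MarshalToV2(make([]byte, 0, g.SizeV2()))
	case genV1:
		return g.MarshalV1()
	default:
		return nil, errors.New("unsupported generated code")
	}
}

// oldMsg stands in for code generated by an old protoc plugin.
type oldMsg struct{ data []byte }

func (m oldMsg) MarshalV1() ([]byte, error) { return m.data, nil }

// newMsg stands in for code generated by a newer plugin.
type newMsg struct{ data []byte }

func (m newMsg) SizeV2() int { return len(m.data) }

func (m newMsg) MarshalToV2(b []byte) ([]byte, error) {
	return append(b, m.data...), nil
}

func main() {
	for _, m := range []any{oldMsg{[]byte("a")}, newMsg{[]byte("b")}, 42} {
		b, err := marshal(m)
		fmt.Printf("%q %v\n", b, err)
	}
}
```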

Henrik Johansson

Feb 1, 2018, 2:40:54 AM
to Jisi Liu, Walter Schulze, golang-nuts

Surely these issues already exist in gogo-protobuf? How are they handled there?



Walter Schulze

Feb 1, 2018, 3:02:01 AM
to Henrik Johansson, Jisi Liu, golang-nuts

I cannot recall experiencing these problems. The generated code has also been far easier to maintain than the runtime reflection code that is repeatedly patched by golang/protobuf. Except for import paths, this is probably one of the easiest projects a developer will get to work on. It is deterministic and you have awesome debugging output. The only things that tire me are big patches like this one to the runtime reflection code, and libraries that don't recognize gogoprotobuf and only import the golang/protobuf library.

I have seriously been pondering a protobuf generator that does not have a library as a dependency. It would be incredibly easy to maintain, since I don't have to merge golang/protobuf patches and get around the fact that grpc libraries only want a single protobuf library. I have one missing puzzle piece and then I just need time.

I don't get the argument for small binaries at all, but not importing a library will also decrease the binary size and, given enough flexibility (which I intend to have), will also avoid generating code you don't need.

A contract that expects your generated struct to have Marshal and Unmarshal methods is very simple to maintain.
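The minimal contract Walter describes can be sketched as two single-method interfaces. The method shapes match the `Marshal() ([]byte, error)` and `Unmarshal([]byte) error` pair that golang/protobuf v1 exposes as proto.Marshaler and proto.Unmarshaler, but the local interfaces, the `Point` type, and its encoding are hypothetical: `Point` uses a fixed 8-byte little-endian layout for brevity, not the protobuf wire format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Marshaler and Unmarshaler are the entire runtime/generated-code
// contract in this sketch.
type Marshaler interface {
	Marshal() ([]byte, error)
}

type Unmarshaler interface {
	Unmarshal([]byte) error
}

// Point stands in for a generated message with its own methods.
type Point struct {
	X, Y int32
}

func (p *Point) Marshal() ([]byte, error) {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint32(b[0:4], uint32(p.X))
	binary.LittleEndian.PutUint32(b[4:8], uint32(p.Y))
	return b, nil
}

func (p *Point) Unmarshal(b []byte) error {
	if len(b) != 8 {
		return errors.New("point: want 8 bytes")
	}
	p.X = int32(binary.LittleEndian.Uint32(b[0:4]))
	p.Y = int32(binary.LittleEndian.Uint32(b[4:8]))
	return nil
}

// roundTrip is all the "runtime" needs; it depends only on the interfaces.
func roundTrip(src Marshaler, dst Unmarshaler) error {
	b, err := src.Marshal()
	if err != nil {
		return err
	}
	return dst.Unmarshal(b)
}

func main() {
	var out Point
	err := roundTrip(&Point{X: -3, Y: 7}, &out)
	fmt.Println(out, err)
}
```

The trade-off Jisi raises still applies: this contract stays simple only as long as the generated methods never need shared helpers from a runtime library that could change underneath them.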

I'll be at FOSDEM this weekend if anyone wants to chat in person and will be there anyway.

johan.br...@gmail.com

Feb 12, 2018, 5:02:55 PM
to golang-nuts
I'm hugely in favor of bringing the gogoproto enhancements into golang/protobuf, as a spectator to this discussion. I'm not sure what requirements would put generated binary size over marshalling speed, but maybe I haven't worked on large enough projects yet. We have used gogoproto for about 2 years at work without dependency hell, but again I recognize that I have little experience with large open source projects and dependency chains.

It might be interesting to note that the new kid on the gRPC block, Twitch's Twirp, mentioned a large runtime library as one of the reasons they built their own RPC-over-protobuf implementation.

Would love to hear more thoughts about this from the go protobuf team.

Tharaneedharan Vilwanathan

May 3, 2018, 7:01:44 PM
to johan.br...@gmail.com, golang-nuts
Hi All,

Can someone share some details on this code merge? Has this happened? How can I play with it?

Thanks
dharani


Joe Tsai

May 4, 2018, 1:41:02 PM
to vdha...@gmail.com, johan.br...@gmail.com, golang-nuts
The merge has happened on pull request #591.

JT

Tharaneedharan Vilwanathan

May 5, 2018, 3:31:22 AM
to joe...@google.com, Johan Brandhorst, golang-nuts
Hi Joe,

That's great to know! Thanks!!

Regards
dharani

witold....@gmail.com

Jun 11, 2018, 12:49:33 PM
to golang-nuts
I am also interested in that. What is the increase in memory usage, the allocator/GC impact, and the binary size impact with heavy use of proto messages? Let's say a few hundred message types and 100k messages in memory. Or some rules of thumb. Thanks.
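A rough first answer to the per-message part of this question can come from unsafe.Sizeof, comparing a hand-written struct with and without the three XXX_ fields. Both struct types are hypothetical mimics of generated code. The numbers cover only the struct itself (not string data or any XXX_unrecognized backing array) and depend on the platform's word size, so measuring a real workload, for example with runtime.ReadMemStats or the allocation reports from the testing package, is still necessary.

```go
package main

import (
	"fmt"
	"unsafe"
)

// plain mimics a message struct without the internal fields.
type plain struct {
	Name string
	Age  int32
}

// withXXX adds the three fields introduced by the new generator.
type withXXX struct {
	Name string
	Age  int32

	XXX_NoUnkeyedLiteral struct{}
	XXX_unrecognized     []byte
	XXX_sizecache        int32
}

// overhead is the extra struct size, in bytes, per message.
func overhead() uintptr {
	return unsafe.Sizeof(withXXX{}) - unsafe.Sizeof(plain{})
}

func main() {
	const n = 100_000 // number of messages held in memory
	fmt.Printf("plain: %d bytes, with XXX fields: %d bytes\n",
		unsafe.Sizeof(plain{}), unsafe.Sizeof(withXXX{}))
	fmt.Printf("extra struct memory for %d messages: %d bytes\n", n, overhead()*n)
}
```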