goog.proto2 et al.


Thomas Broyer

Mar 31, 2010, 5:50:30 AM
to Closure Library Discuss
Hi Googlers,

I'm reading through the goog.proto2 package and have some questions
about some missing things:
* test.proto and package_test.proto are missing from the SVN repo,
they could quite easily be reverse-engineered from the generated
*.pb.js files but I don't think open-sourcing the *.proto files
themselves would cause any harm to Google ;-)
* is there a plan to open source the proto->js compiler? is it a
protoc 2.3.0+ plugin or a modified protoc? (fortunately, if it's in
C++ it shouldn't be hard to convert it into a protoc plugin)

Aside from that, can any Googler share some thoughts about how (and
why!) it's used (I mean goog.proto2.*) inside Google products?

Thanks in advance!

Andy Hochhaus

Apr 24, 2010, 4:19:22 PM
to closure-lib...@googlegroups.com
Thomas,

I am in the process of getting protocol buffers working in the closure-library.

On Wed, Mar 31, 2010 at 4:50 AM, Thomas Broyer <t.br...@gmail.com> wrote:
>  * test.proto and package_test.proto are missing from the SVN repo,
> they could quite easily be reverse-engineered from the generated
> *.pb.js files but I don't think open-sourcing the *.proto files
> themselves would cause any harm to Google ;-)

I also was not able to find the test.proto and package_test.proto
files in the repository. As you suggested, I reverse-engineered what I
think the .proto files look like:

https://samegoal.com/s/test.proto
https://samegoal.com/s/package_test.proto

Anyone with access to the original files care to offer hints on
mistakes that I made? :-)

>  * is there a plan to open source the proto->js compiler? is it a
> protoc 2.3.0+ plugin or a modified protoc? (fortunately, if it's in
> C++ it shouldn't be hard to convert it into a protoc plugin)

I wrote a basic protoc extension that compiles *.proto files into
*.pb.js files. Testing the extension on my mock files from above and
comparing with the "golden" *.pb.js files from the repository, my
extension generates the same code with the exception of a few
whitespace differences.

However, while writing my extension, I discovered that the enum
defined in test.proto (proto2.TestAllTypes.NestedEnum) is not
goog.provided() at the top of test.pb.js. Can anyone comment on whether
this is intentional or an oversight in the internal proto->js compiler?

It also appears that the closure-library implementation of protocol
buffers specifies two new (JSON-based) wire formats for the protocol
message. If you are interested, these formats are described in:

http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/proto2/pbliteserializer.js
http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/proto2/objectserializer.js

I am currently in the process of adding support to the c++
implementation of protocol buffers so that we can easily (de)serialize
to/from both of these wire formats for communication from the server
(c++) to the client (js).

If anyone is interested in these protoc extensions let me know and I
will clean them up and release them under APL v2.

> Aside from that, can any Googler share some thoughts about how (and
> why!) it's used (I mean goog.proto2.*) inside Google products?

The reasons that I can think of are:

* It is nice to easily be able to share structured data between the
server and client using the same format on both. Protocol buffers are
widely used on the server, so it is nice to be able to seamlessly pass
those messages to the client.

* Using protocol buffers to access fields in your javascript is
"cleaner" than interacting directly with JSON. Since fields are
accessed with get/set methods instead of hard coded field names the
compiler can catch typos, collapse names, etc.

* Due to the design of the wire formats using "field number" rather
than field name, the size of the protocol messages is smaller than the
trivial JSON encoding for most messages. (In the case of PbJsLite
format the field number is not required -- instead inferring the
number from the array index).

* I think that the PbJsLite version of protocol buffers are used (for
some messages) in gmail. If you install firebug and look at the
response messages some of them look very much like PbJsLite. Can
anyone confirm?
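
The size point in the third bullet can be made concrete with a minimal
sketch in plain JavaScript (no Closure dependency). The field table, the
encodeByName and encodeByIndex helpers, and the sample message are all
made up for illustration; the only thing taken from the discussion is
the idea that the array position stands in for the field number.

```javascript
// Hypothetical field table for illustration: field number -> field name.
const PERSON_FIELDS = [
  {tag: 1, name: 'name'},
  {tag: 2, name: 'id'},
  {tag: 4, name: 'email'},
];

// "Trivial" JSON encoding: every value is keyed by its full field name.
function encodeByName(fields, msg) {
  const out = {};
  for (const f of fields) {
    if (msg[f.name] !== undefined) out[f.name] = msg[f.name];
  }
  return JSON.stringify(out);
}

// Array encoding: the value sits at the index equal to its field number,
// so no key is written at all. Unused slots are filled with null.
function encodeByIndex(fields, msg) {
  const out = [];
  for (const f of fields) {
    if (msg[f.name] !== undefined) out[f.tag] = msg[f.name];
  }
  for (let i = 0; i < out.length; i++) {
    if (out[i] === undefined) out[i] = null;
  }
  return JSON.stringify(out);
}

const person = {name: 'Ada', id: 7, email: 'ada@example.com'};
const byName = encodeByName(PERSON_FIELDS, person);
const byIndex = encodeByIndex(PERSON_FIELDS, person);
console.log(byName.length, byName);
console.log(byIndex.length, byIndex);
```

The array form only wins when field numbers are small and dense. A
message whose only set field is numbered 500 would carry hundreds of
null slots, which is presumably why the claim is "for most messages"
rather than all of them.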

Thanks,
Andy



Thomas Broyer

Apr 24, 2010, 5:16:47 PM
to Closure Library Discuss
First, I received an earlier reply from a Googler, but it didn't make it
to the list for some reason (even though the group was in Cc).
What I learned:
* the .proto->.pb.js compiler is a hacked protoc, because it was
written before protoc had support for plugins
* goog.proto2 *might* not even be used in any Google product
* the *.proto files for test and package_test might be open sourced
later, it's more an oversight than a decision if they're missing.

On Apr 24, 10:19 pm, Andy Hochhaus <ahochh...@samegoal.com> wrote:
> Thomas,
>
> I am in the process of getting protocol buffers working in the closure-library.
[...]
> It also appears that the closure-library implementation of protocol
> buffers specify two new (JSON based) wire formats of the protocol
> message.

I'd rather say 3 if you consider ObjectSerializer actually deals with
2 object-based formats (keyed by field name or by field tag)
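
A minimal sketch of those two object-based variants, in plain
JavaScript rather than against the real goog.proto2.ObjectSerializer.
The serializeObject and deserializeObject helpers, the keyBy parameter,
and the field table are hypothetical names chosen for illustration.

```javascript
// Hypothetical field table: field tag -> field name.
const TEST_FIELDS = [
  {tag: 1, name: 'optional_int32'},
  {tag: 14, name: 'optional_string'},
];

// Serialize `msg` to a plain object keyed either by field name or field tag.
function serializeObject(fields, msg, keyBy) {
  const out = {};
  for (const f of fields) {
    const value = msg[f.name];
    if (value === undefined) continue;  // unset fields are simply omitted
    out[keyBy === 'tag' ? f.tag : f.name] = value;
  }
  return out;
}

// Reverse direction: accept either keying and rebuild a name-keyed object.
function deserializeObject(fields, obj) {
  const out = {};
  for (const f of fields) {
    if (f.name in obj) out[f.name] = obj[f.name];
    else if (f.tag in obj) out[f.name] = obj[f.tag];
  }
  return out;
}

const sample = {optional_int32: 101, optional_string: 'hello'};
const keyedByName = serializeObject(TEST_FIELDS, sample, 'name');
const keyedByTag = serializeObject(TEST_FIELDS, sample, 'tag');
console.log(JSON.stringify(keyedByName));
console.log(JSON.stringify(keyedByTag));
```

Keying by tag keeps the payload short and survives field renames;
keying by name is easier to read when debugging.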

> I am currently in the process of adding support to the c++
> implementation of protocol buffers so that we can easily (de)serialize
> to/from both of these wire formats for communication from the server
> (c++) to the client (js).

I was rather thinking about either a protoc plugin (injecting methods
at messages and builders insertion points) or an approach based on
reflection, just like the current TextFormat or http://code.google.com/p/protobuf-java-format/
(I'm working with Java on the server-side); eventually using a
modified "protolib" to provide the default reflection-based
implementation in AbstractMessage and more importantly add the
appropriate methods to the Message(Lite) for polymorphism.

> > Aside from that, can any Googler share some thoughts about how (and
> > why!) it's used (I mean goog.proto2.*) inside Google products?
>
> The reasons that I can think of are:

My own thoughts on the matter:
http://tbroyer.posterous.com/exploring-using-protobuf-in-the-browser
http://tbroyer.posterous.com/using-protobuf-client-side-with-gwt

> * It is nice to easily be able to share structured data between the
> server and client using the same format on both. Protocol buffers are
> widely used on the server, so it is nice to be able to seamlessly pass
> those messages to the client.
>
> * Using protocol buffers to access fields in your javascript is
> "cleaner" than interacting directly with JSON. Since fields are
> accessed with get/set methods instead of hard coded field names the
> compiler can catch typos, collapse names, etc.

And more importantly: default values!

> * Due to the design of the wire formats using "field number" rather
> than field name, the size of the protocol messages is smaller than the
> trivial JSON encoding for most messages. (In the case of PbJsLite
> format the field number is not required -- instead inferring the
> number from the array index).

And for people reading our conversation without knowing the internals:
PbLite is actually the internal representation within
goog.proto2.Message, and goog.proto2 has a LazySerializer for PbLite,
where it'll decode the nested messages only when first accessed; which
IMO makes it the "lingua franca" or "PB as JSON" (see also the note
about goog.proto.Serializer at the end of this message).

> * I think that the PbJsLite version of protocol buffers are used (for
> some messages) in gmail. If you install firebug and look at the
> response messages some of them look very much like PbJsLite. Can
> anyone confirm?

I agree that it really looks the same, though GMail probably uses
proto "1" (note how goog.proto.Serializer overrides the JSON
serializer's serializeArray to "output empty slots when the value is
null or undefined", which has the consequence of preserving the
indices, just like in the PbLite format).
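
The lazy decoding of nested messages mentioned above can be sketched in
a few lines of plain JavaScript. This is not the goog.proto2
implementation: LazyMessage, the descriptor layout, and the decodeCount
counter are hypothetical, and the sketch assumes the array index equals
the field tag.

```javascript
// Hypothetical descriptors: tag -> {name, type?}. A `type` marks a nested message.
const ADDRESS = {1: {name: 'city'}};
const PERSON = {1: {name: 'name'}, 3: {name: 'address', type: ADDRESS}};

let decodeCount = 0;  // counts how many wrappers were built, for demonstration

class LazyMessage {
  constructor(descriptor, data) {
    this.descriptor = descriptor;
    this.data = data;   // raw array, index == field tag
    this.cache = {};    // tag -> already-wrapped nested message
    decodeCount++;
  }

  get(tag) {
    const raw = this.data[tag];
    if (raw === null || raw === undefined) return undefined;  // empty slot
    const field = this.descriptor[tag];
    if (!field.type) return raw;   // scalars are returned directly
    if (!(tag in this.cache)) {    // nested: wrap on first access only
      this.cache[tag] = new LazyMessage(field.type, raw);
    }
    return this.cache[tag];
  }
}

const wire = JSON.parse('[null,"Ada",null,[null,"London"]]');
const lazyPerson = new LazyMessage(PERSON, wire);
console.log(decodeCount);               // only the outer message so far
console.log(lazyPerson.get(3).get(1));  // the nested message is wrapped here
console.log(decodeCount);
```

The empty (null) slots in the wire array are what keep each value at
the index matching its tag, which is the same effect as the
serializeArray override quoted above.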

Erik Arvidsson

Apr 24, 2010, 5:41:00 PM
to closure-lib...@googlegroups.com, Joseph Schorr (יוסף שור)‎
Offline Gmail uses JsPbLite (goog.proto). goog.proto2 was added at a
later point and it adds a better API that gets compiled down to almost
NOOPs.

I've attached the files Andy was asking about.

--
erik
Attachments: test.proto, package_test.proto

Thomas Broyer

Apr 24, 2010, 8:12:27 PM
to Closure Library Discuss


On Apr 24, 11:41 pm, Erik Arvidsson <a...@google.com> wrote:
>
> Offline Gmail uses JsPbLite (goog.proto). goog.proto2 was added at a
> later point and it adds a better API that gets compiled down to almost
> NOOPs.

Thanks for the info!

> I've attached the files Andy was asking about.

Hmm, so you're using a custom option too (javascript_package)... If
one would want to use an unmodified protoc (i.e. a protoc plugin), it
would have to be changed to "option (javascript_package) = ..." with
an added import for the *.proto that defines the option.
Anyway, thanks for sharing!

Andy Hochhaus

May 7, 2010, 10:42:21 AM
to closure-lib...@googlegroups.com
Thanks for the files Erik. They were very helpful.

I have written two protoc plugins to enable communication between c++
on our server and js in the browser. Once I can clean up the plugin
code and test them a bit more I'll be open sourcing them in case
anyone else has interest (hopefully within a week or so). The first
plugin generates *.pb.js for using protocol buffers in js. The second
plugin adds (de)serialization methods for the three "wire formats"
supported by the goog.proto2 package (described by Thomas in a
previous message).

However, when using js protocol buffers I am a bit confused by how
default values are handled.

According to the protocol buffer tutorial accessing unset optional
fields always returns the default value.

http://code.google.com/apis/protocolbuffers/docs/cpptutorial.html
"""
If an optional field value isn't set, a default value is used. For
simple types, you can specify your own default value, as we've done
for the phone number type in the example. Otherwise, a system default
is used: zero for numeric types, the empty string for strings, false
for bools. For embedded messages, the default value is always the
"default instance" or "prototype" of the message, which has none of
its fields set. Calling the accessor to get the value of an optional
(or required) field which has not been explicitly set always returns
that field's default value.
"""

However, in the js implementation, undefined is returned instead of
the default value. This appears to be expected behavior, as it is
asserted in the unit test:

http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/proto2/proto_test.html
"""
// Check non-set values.
assertUndefined(message.getOptionalInt32());
assertUndefined(message.getOptionalInt64());
assertUndefined(message.getOptionalFloat());
assertUndefined(message.getOptionalString());
assertUndefined(message.getOptionalBytes());
assertUndefined(message.getOptionalNestedMessage());
assertUndefined(message.getOptionalNestedEnum());
"""

This is a rather large inconsistency in how messages are handled
client/server side. Can anyone comment on the rationale behind this
decision?

Thanks,
Andy

Andy Hochhaus

May 7, 2010, 11:38:20 AM
to closure-lib...@googlegroups.com
On Fri, May 7, 2010 at 9:42 AM, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> This appears to be expected behavior, as it is asserted in the unit test

One other reason this looks intentional is that a "get$ValueOrDefault"
method does exist in goog.proto2 so it is trivial to make the behavior
match c++/java/py by modifying the compiler to replace calls to
"get$Value" with the OrDefault variant.

Thanks,
Andy

Andy Hochhaus

May 7, 2010, 12:51:56 PM
to closure-lib...@googlegroups.com
For reference, I had an off-list conversation with Joseph who let me
know that get$Value is used for performance reasons while
get$ValueOrDefault can be used in the event that default values are
required. However, get$ValueOrDefault is not used by default as it
needs to access the metadata (for defaults) and is larger so it may
not be inlined as aggressively by the compiler. He stated that current
uses of protocol buffers in js were not using default values (or
explicitly setting them via code in the event they were required).

Joseph proposed a middle of the road solution allowing the user to
opt-in to getting default values returned (with the performance hit):

"""
function getFoo(opt_returnDefault) {
  return opt_returnDefault ? this.get$ValueOrDefault(..) : this.get$Value(..);
}
"""

Going this route provides the best of both worlds...
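
A runnable sketch of that opt-in accessor, assuming a stripped-down
stand-in for the message class. SketchMessage, FIELD_DEFAULTS, and
getOptionalInt32 are hypothetical; only the get$Value /
get$ValueOrDefault split mirrors what is described above, and the real
goog.proto2 methods read defaults from field metadata rather than from
a plain table.

```javascript
// Hypothetical per-field metadata: tag -> declared default value.
const FIELD_DEFAULTS = {1: 0, 14: ''};

class SketchMessage {
  constructor() {
    this.values = {};  // tag -> explicitly set value
  }

  set$Value(tag, value) {
    this.values[tag] = value;
  }

  // Fast path: no metadata lookup, unset fields come back as undefined.
  get$Value(tag) {
    return this.values[tag];
  }

  // Slower path: consults the metadata when the field is unset.
  get$ValueOrDefault(tag) {
    const value = this.values[tag];
    return value === undefined ? FIELD_DEFAULTS[tag] : value;
  }

  // Generated-style accessor with the proposed opt-in flag.
  getOptionalInt32(opt_returnDefault) {
    return opt_returnDefault ? this.get$ValueOrDefault(1) : this.get$Value(1);
  }
}

const m = new SketchMessage();
console.log(m.getOptionalInt32());      // unset, no opt-in
console.log(m.getOptionalInt32(true));  // unset, opt-in to the default
m.set$Value(1, 42);
console.log(m.getOptionalInt32(true));  // explicitly set value wins
```

Callers that never pass the flag keep the short getter body, so the
cost of the metadata lookup is only paid where defaults are requested.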

Thanks,
Andy

Ivan Kozik

May 26, 2010, 2:07:38 PM
to closure-lib...@googlegroups.com
On Fri, May 7, 2010 at 2:42 PM, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> I have written two protoc plugins to enable communication between c++
> on our server and js in the browser. Once I can cleanup the plugin
> code and test them a bit more I'll be open sourcing them in case
> anyone else has interest (hopefully within a week or so). The first
> plugin generates *.pb.js  for using protocol buffers in js. The second
> plugin adds (de)serialization methods for the three "wire formats"
> supported by the goog.proto2 package (described by Thomas in a
> previous message).

I am very interested in any .proto -> *.pb.js compiler. Are these available
yet? (Or anyone else's implementation?) I'm new to Protocol Buffers and
it would be a big help. If I made my own it would be the third or fourth
redundant implementation.

Thanks a lot,
Ivan

Andy Hochhaus

May 26, 2010, 2:17:52 PM
to closure-lib...@googlegroups.com
Hi Ivan,

On Wed, May 26, 2010 at 1:07 PM, Ivan Kozik <ivan....@gmail.com> wrote:
> I am very interested in any .proto -> *.pb.js compiler. Are these available
> yet?

To the best of my knowledge, no official implementation has been released.

> (Or anyone else's implementation?)

I have written two protoc extensions that we are using internally at my company. Unfortunately, I have not gotten around to cleaning them up enough to be released. Are you looking for only the *.proto -> *.pb.js portion, or do you also need the c++ to json (de)serialization extension? If I know what you need I can try to clean up that portion as a priority and get it out sooner.
 
Thanks,
Andy

Ivan Kozik

May 26, 2010, 2:39:25 PM
to closure-lib...@googlegroups.com
On Wed, May 26, 2010 at 6:17 PM, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> I have written two protoc extensions that we are using internally at my
> company. Unfortunately, I have not gotten around to cleaning them up enough
> to be released. Are you looking for only the *.proto -> *.pb.js portion or
> do you also need the c++ to json (de)serialization extension as well? If I
> know what you need I can try to cleanup that portion as a priority and get
> it out sooner.

Thanks, Andy. I'm only looking for the *.proto -> *.pb.js portion at the moment.
I assume the JSON format is exactly what goog.proto2.PbLiteSerializer does,
right?

Ivan

Andy Hochhaus

May 26, 2010, 2:56:37 PM
to closure-lib...@googlegroups.com
On Wed, May 26, 2010 at 1:39 PM, Ivan Kozik <ivan....@gmail.com> wrote:
> Thanks, Andy. I'm only looking for the *.proto -> *.pb.js portion at the moment.

Sounds good. That is the chunk that requires the least amount of cleanup on my part. I'll do my best to get this out soon.

> I assume the JSON format is exactly what goog.proto2.PbLiteSerializer does,
> right?
 
If you are only communicating in javascript, then the (de)serialization provided by goog.proto2 is sufficient. If you wish to talk to a different language (say c++, java or python on the server) then the other language must be extended to be able to read the three corresponding json formats that goog.proto2 knows about. They are discussed briefly a few messages up in this thread.
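
As a rough illustration of what the other side has to do, here is a sketch of a reader and writer for the array-based format. It is written in JavaScript only to stay in the same language as the rest of the thread; a C++, Java, or Python port would follow the same shape. The descriptor layout and the decodeArray / encodeArray helpers are hypothetical, and the sketch assumes the array index equals the field number, with no handling of enums, 64-bit integers, or bytes.

```javascript
// Hypothetical descriptors: tag -> {name, repeated?, type?}.
const PHONE = {1: {name: 'number'}};
const CONTACT = {
  1: {name: 'name'},
  2: {name: 'id'},
  4: {name: 'phones', repeated: true, type: PHONE},
};

// Decode one array-encoded message into a name-keyed object.
function decodeArray(descriptor, data) {
  const out = {};
  for (const tag of Object.keys(descriptor)) {
    const field = descriptor[tag];
    const raw = data[tag];
    if (raw === null || raw === undefined) continue;  // unset field
    const one = (v) => (field.type ? decodeArray(field.type, v) : v);
    out[field.name] = field.repeated ? raw.map(one) : one(raw);
  }
  return out;
}

// Encode a name-keyed object back into the array form.
function encodeArray(descriptor, msg) {
  const out = [];
  for (const tag of Object.keys(descriptor)) {
    const field = descriptor[tag];
    const value = msg[field.name];
    if (value === undefined) continue;
    const one = (v) => (field.type ? encodeArray(field.type, v) : v);
    out[Number(tag)] = field.repeated ? value.map(one) : one(value);
  }
  return out;  // holes become null once passed through JSON.stringify
}

const received = JSON.parse(
    '[null,"Ada",7,null,[[null,"555-0100"],[null,"555-0199"]]]');
const contact = decodeArray(CONTACT, received);
console.log(contact);
console.log(JSON.stringify(encodeArray(CONTACT, contact)));
```

The server-side work is mostly this kind of descriptor walk; the parts left out here (enums, 64-bit values, bytes) are where the per-language details would differ.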

Andy 

Andy Hochhaus

May 27, 2010, 8:25:06 PM
to closure-lib...@googlegroups.com
Hi Ivan,

On Wed, May 26, 2010 at 1:56 PM, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> I'll do my best to get this out soon.

I did the minimal cleanup necessary to get you a usable proto to pb.js
plugin. Unfortunately, as this is normally built by our build system,
no Makefile or configure scripts exist. Instead, I've included some
generic instructions for how to compile in the README file.

https://samegoal.com/s/protobuf/

I've tested it on the two sample files provided by google and the
dozen or so files we are using it for internally and things seem to
mostly work. That said, I'm sure problems exist, so if you find
anything wrong please let me know. The plugin for c++
(de)serialization isn't ready yet but it is still on my list of things
to do. Let me know if it becomes important for you and I'll try to
prioritize it.

Thanks,
Andy

Ivan Kozik

May 29, 2010, 4:45:03 PM
to closure-lib...@googlegroups.com
On Fri, May 28, 2010 at 00:25, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> I did the minimal cleanup necessary to get you a usable proto to pb.js
> plugin. Unfortunately, as this is normally built by our build system,
> no Makefile or configure scripts exist. Instead, I've included some
> generic instructions for how to compile in the README file.
>
> https://samegoal.com/s/protobuf/

Many thanks! The plugin is working well for me. It might be a little while
before I use it in an application. After I understand the format, I'll write my
own Python serializer/deserializer.

I'm including my own build steps, nearly identical to the README, in case
they help anyone:

step -1: compile and install Google's protobuf to /opt/protobuf
step 0: wget -r -np the above URL; copy js/ into the protobuf source tree.

Then I used this build script, which worked on Ubuntu 10.04:

#!/bin/sh -e

export PROTOBUF="/opt/protobuf"
export OUTDIR="$PROTOBUF/bin"

# step 1: generate the necessary headers to complete plugin compilation

"$PROTOBUF/bin/protoc" -I . -I protobuf/src --cpp_out=. \
protobuf/js/javascript_package.proto

# step 2: compile the plugin

mkdir -p ./protobuf/js/build

g++ -I "$PROTOBUF/include" \
-I . \
./protobuf/src/google/protobuf/compiler/plugin.pb.cc \
./protobuf/src/google/protobuf/compiler/plugin.cc \
./protobuf/js/code_generator.cpp \
./protobuf/js/protoc-gen-js.cpp \
./protobuf/js/javascript_package.pb.cc \
"-l:$PROTOBUF/lib/libprotobuf.a" \
"-l:$PROTOBUF/lib/libprotoc.a" \
-lpthread \
-o ./protobuf/js/build/protoc-gen-js

sudo cp ./protobuf/js/build/protoc-gen-js "$OUTDIR/"

### END OF SCRIPT

This successfully generated both the .js and .cc files:

/opt/protobuf/bin/protoc --plugin=/opt/protobuf/bin/protoc-gen-js \
-I . -I ./protobuf/src --js_out=/tmp --cpp_out=/tmp protobuf/js/test.proto

Thanks again,

Ivan

Hochhaus, Andrew

Apr 17, 2011, 1:03:40 PM
to closure-lib...@googlegroups.com
On Sat, Apr 24, 2010 at 3:19 PM, Andy Hochhaus <ahoc...@samegoal.com> wrote:
> I am currently in the process of adding support to the c++
> implementation of protocol buffers so that we can easily (de)serialize
> to/from both of these wire formats for communication from the server
> (c++) to the client (js).
>
> If anyone is interested in these protoc extensions let me know and I
> will clean them up and release them under APL v2.

I finally got around to releasing the C++ pb json wire format
(de)serialization plugin. You can find it in the "cppjs" directory:

http://code.google.com/p/protobuf-plugin-closure/

This project now has everything you need to share protocol buffers
between a c++ server and a js client.

-Andy

Hochhaus, Andy

Nov 24, 2015, 5:33:35 PM
to closure-lib...@googlegroups.com
Sorry to revive an old thread. In case it is helpful to future searchers, we rewrote the protoc-gen-js protoc plugin in Go, which makes it easier to compile than the previous c++ implementation. The project lives here:

    https://github.com/samegoal/protoclosure

Note that the project contains two distinct components which can be used together (or not) depending on your needs:
  1. Go library to Marshal/Unmarshal from the JSON formats understood by goog.proto2
  2. A protoc compiler plugin (protoc-gen-js) which generates goog.proto2-style *.pb.js files based upon *.proto files

-Andy
