Thoughts on protoc plugins


Kenton Varda

Oct 28, 2009, 6:18:04 PM
to Protocol Buffers
Hi all,

I just had an idea for making protoc more extensible:

Currently, it is possible to write custom code generators and link against libprotoc to build custom protoc-like binaries.  This allows third-party implementations to reuse the protoc front-end while keeping development independent.  The downside is that third-party implementations must publish a separate binary rather than being accessible through protoc itself.  Also, writing third-party code generators in languages other than C++ is tricky.

It has been proposed before that protoc could use a "plugin" infrastructure for code generators.  Third-party generators would be compiled as dynamic libraries and loaded by protoc at runtime.  The disadvantage of this approach is that dynamic loading works very differently on different platforms.  Worse, compiled plugins would be tightly coupled to a particular version of protoc, meaning they would all have to be re-compiled when protoc is updated to a new version.

Instead, I propose a similar architecture, but where each "plugin" is a complete *binary*.  protoc would invoke the binary as a sub-process and communicate with it over stdin and stdout.  The communication protocol would be defined using protocol buffers!

  message CodeGeneratorRequest {
    // FileDescriptorProtos for the .proto files listed on the command line and everything
    // they import.
    optional FileDescriptorSet parsed_files = 1;

    // The .proto files that were explicitly listed on the command-line.
    repeated string files_to_generate = 2;
    
    // The generator parameter passed on the command-line.
    optional string parameter = 3;

    // Directory to which output files should be written.
    optional string output_directory = 4;
  }

  message CodeGeneratorResponse {
    // Error message.  If non-empty, code generation failed.
    optional string error = 1;

    // Names of files that were generated.
    repeated string generated_files = 2;
  }
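To make the proposal concrete, here is a rough sketch (in Python, purely illustrative -- nothing here is a real API) of the plugin side of this protocol.  Since both fields of the proposed CodeGeneratorResponse are length-delimited strings, the response can even be hand-encoded with the standard protobuf wire format, without linking a protobuf runtime; field numbers match the message above:

```python
import sys

def _varint(n: int) -> bytes:
    # Protobuf base-128 varint encoding, used here for length prefixes.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_response(error: str = "", generated_files=()) -> bytes:
    # Hand-encoded CodeGeneratorResponse.  Each string field is written as
    # a tag byte (field_number << 3) | 2, a varint length, then the bytes.
    buf = bytearray()
    if error:
        data = error.encode()
        buf += b"\x0a" + _varint(len(data)) + data   # field 1: error
    for name in generated_files:
        data = name.encode()
        buf += b"\x12" + _varint(len(data)) + data   # field 2: generated_files
    return bytes(buf)

def plugin_main() -> None:
    # A real plugin would parse `request` as a CodeGeneratorRequest,
    # write its output files, and then report what it generated.
    request = sys.stdin.buffer.read()
    sys.stdout.buffer.write(encode_response(generated_files=["example.out"]))
```

The tag bytes follow from the wire format: (1 << 3) | 2 = 0x0a for `error` and (2 << 3) | 2 = 0x12 for `generated_files`.  `plugin_main` would be invoked under a `__main__` guard in an actual plugin binary.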

For code generators written in C++, we'd provide a simple front-end library for plugins that implements the above protocol in terms of the existing CodeGenerator interface.

Plugins may be placed anywhere in the PATH and will be identified by file name:

  protoc-$LANGUAGE

E.g.:

  protoc-cpp  (implements --cpp_out)

Advantages (vs. dynamic libraries):

* No binary compatibility problem.  New versions of protoc will work fine with old plugins and vice versa.  Since the protocol is based on protobufs, we can easily extend it without breaking compatibility.  Note that code generator binaries will have to link against libprotobuf, but they can easily link against a different version than protoc itself links against.

* Code generators can be written in any language.  To avoid the obvious bootstrapping problem when writing a code generator in the language that it generates, we could add an option for protoc to communicate with the plugin in JSON format instead of protobuf binary format.  In fact, perhaps we should always communicate in JSON since it would avoid binary/text conversion issues.

* Code generators can easily define and use custom options that protoc itself doesn't know about.

* If we made the official implementations be plugins as well, then you could actually install and use multiple versions of the code generators.  This is particularly useful since versions of the runtime library are tightly coupled to versions of protoc, and in some cases you may find yourself stuck using an older version of the runtime library.

* Easier to implement.
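For the sake of illustration, a JSON rendering of the request might look something like this (field names follow the messages above; the exact JSON mapping is hypothetical):

```json
{
  "parsed_files": {
    "file": [
      { "name": "foo.proto", "package": "example",
        "message_type": [ { "name": "Foo" } ] }
    ]
  },
  "files_to_generate": ["foo.proto"],
  "parameter": "",
  "output_directory": "."
}
```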

Thoughts?

Peter Keen

Oct 28, 2009, 6:52:17 PM
to Kenton Varda, Protocol Buffers
This sounds great! Communicating using a well-defined JSON spec sounds
like a better idea than trying to communicate using protobufs. One
thing I would suggest is having a protocol version number in there
somewhere so the generator knows what version it's targeting.

--Pete

Kenton Varda

Oct 28, 2009, 7:37:26 PM
to Peter Keen, Protocol Buffers
On Wed, Oct 28, 2009 at 3:52 PM, Peter Keen <peter...@gmail.com> wrote:
> This sounds great! Communicating using a well-defined JSON spec sounds
> like a better idea than trying to communicate using protobufs. One
> thing I would suggest is having a protocol version number in there
> somewhere so the generator knows what version it's targeting.

The code generator should generate exactly the same code regardless of the parser version, since newer parsers should never change the behavior of existing features.

If you want to be able to target different runtime library versions, the way to do this would be to install multiple versions of the code generator under different names.  Or, perhaps the code generator can accept an option for this via the generator parameter -- but this would just be passed verbatim from the command-line, not set by protoc.

So, I don't think the parser version should be part of the protocol.

Peter Keen

Oct 28, 2009, 8:01:06 PM
to Kenton Varda, Protocol Buffers
On Wed, Oct 28, 2009 at 4:37 PM, Kenton Varda <ken...@google.com> wrote:
> The code generator should generate exactly the same code regardless of the
> parser version, since newer parsers should never change the behavior of
> existing features.

Yup, I misunderstood that part of your original email. Makes perfect sense now.

--Pete

Neil T. Dantam

Oct 28, 2009, 11:24:08 PM
to Kenton Varda, prot...@googlegroups.com

Kenton Varda wrote:
> Also, writing third-party code generators in languages other than C++ is
> tricky.

Yes, I opted not to bother with .proto parsing for s-protobuf, using
only the protobuf-encoded FileDescriptorSets that protoc can emit.

> Instead, I propose a similar architecture, but where each "plugin" is a
> complete *binary*. protoc would invoke the binary as a sub-process and
> communicate with it over stdin and stdout. The communication protocol would
> be defined using protocol buffers!

Sounds like an excellent idea, and since you seem to want to reuse
FileDescriptorSet, I could make this work with s-protobuf with
minimal effort.

> To avoid the obvious bootstrapping problem when writing a code
> generator in the language that it generates, we could add an
> option for protoc to communicate with the plugin in JSON format
> instead of protobuf binary format.

Personally, I solved this problem with a (very easy) manual
translation of FileDescriptorSet to the internal data structures
used by my code generator. JSON may be a nice extra, but it doesn't
seem like a hard requirement.

> In fact, perhaps we should always communicate in JSON since it
> would avoid binary/text conversion issues.

Please do provide the protobuf binary encoding, at least as an
option.

> Thoughts?

This would be quite spiffy.

--
Neil

Kenton Varda

Oct 28, 2009, 11:46:03 PM
to Neil T. Dantam, prot...@googlegroups.com
If JSON is an option, how should protoc detect that a plugin wants to use JSON?  We could make it part of the filename, e.g.:
  protoc-foo-json  (implements --foo_out)
This seems ugly.  But, other options I can think of (a config file, some sort of handshake) would be a lot more complicated.

Hmm, perhaps protoc could start out assuming the generator wants binary protos, but if it exits with a special error code, this indicates that it would really prefer JSON instead.  protoc then executes it again with the --json command-line flag.  This still seems convoluted, but less error-prone than making it part of the file name.

I guess another option is for the plugin to start out by writing a line of text to stdout specifying which encoding to use.  protoc then has to wait until it has received this line before it can send the generator request, but that's probably not a big deal.  On the other hand, if the plugin doesn't write the line, protoc hangs, which may be awkward for people trying to write plugins.

Another idea:  protoc could always use binary format, but a tool could be provided that translates between binary protobufs and json.  Then if you want JSON format, what you do is wrap your plugin in a simple shell script which pipes the input and output through the translator tool.  Thus protoc only needs to know binary format and your tool only needs to know JSON.  Assuming most people only need to use JSON to bootstrap their initial implementation, the extra complication of injecting a middleman may not be a problem.
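To sketch that last idea: the wrapper is just a chain of sub-processes.  Below, the translator and the generator are toy stand-ins written inline (neither tool exists; they only demonstrate the pipeline shape):

```python
import subprocess
import sys

# Stand-in for the binary->JSON translator tool: reads raw bytes from
# stdin and emits a JSON "request" describing them.
TO_JSON = [sys.executable, "-c",
           "import sys, json; data = sys.stdin.buffer.read(); "
           "print(json.dumps({'raw_len': len(data)}))"]

# Stand-in for a JSON-speaking code generator: reads the JSON request
# and emits a JSON "response".
GENERATOR = [sys.executable, "-c",
             "import sys, json; req = json.load(sys.stdin); "
             "print(json.dumps({'generated_files': "
             "['out.%d' % req['raw_len']]}))"]

def wrapper(binary_request: bytes) -> bytes:
    # protoc writes the binary request to the wrapper; the wrapper pipes
    # it through the translator so the generator only ever sees JSON.
    p1 = subprocess.run(TO_JSON, input=binary_request,
                        stdout=subprocess.PIPE, check=True)
    p2 = subprocess.run(GENERATOR, input=p1.stdout,
                        stdout=subprocess.PIPE, check=True)
    # A real wrapper would translate the JSON response back to binary
    # before handing it to protoc.
    return p2.stdout
```

In practice the wrapper would be a two-line shell script rather than Python; the point is only that protoc needs to know nothing about JSON, and the generator needs to know nothing about the binary format.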

I like the last idea because it is the most modular.  Any other ideas?