Thinking of implementing: extract documentation in .proto file and store in FileDescriptorProto

Henner Zeller

Dec 22, 2009, 4:53:45 PM

to Protocol Buffers

Hi,
Since this question came up earlier today and I have it on my TODO
list anyway, I think this would make a nice side project for the
holidays ;)

Basically, there are two forms of comments typically found for
messages and fields: block comments in front of the declaration of the
message/field and a single end-of-line comment at the end of a field.
While block comments are usually thought of as a multi-line /* ... */
block, I'd also like to treat multiple consecutive lines of //-style
comments as a multi-line comment:

/* some block
 * comment
 */
message Foo {

  /* some stray comment, not part of a field documentation */

  /*
   * some block comment
   */
  int32 some_field = 1;

  int32 some_other_field = 2;  // short comment.

  // Some stray comment, not part of a field. No documentation.

  // Some block comment
  // consisting of consecutive //-style comments
  // over multiple lines with no newline in-between
  int32 yet_another_field = 3;
}

There are several documentation styles out there such as JavaDoc or
Doxygen that require a particular start of a comment (like /** .. */
or /*! .. */ or ///... ). Is this a constraint we want to have or need?
I think this makes sense for these documentation tools as they are
designed for code that can have some arbitrary comments in-between.
The only requirement I'd propose is that there should be no empty line
between a block comment and the field/message it describes. This
enforces readability and prevents stray comments or file-header
comments being accidentally included in the documentation.
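
The attachment rule proposed above can be sketched in a few lines of Python. This is only an illustration of the rule, not how protoc parses files: the name attach_comments, the line-based scan and the regular expression are all assumptions made for this sketch, and it only recognizes messages and simple fields with a trailing //-style comment.

```python
import re

# Hypothetical, simplified declaration matcher: "message Name" or "type name = N;".
DECL = re.compile(r'^(?:message\s+(\w+)|(?:\w+\s+)+(\w+)\s*=\s*\d+\s*;)')

def attach_comments(proto_text):
    """Map each declared name to (leading_comment_lines, trailing_comment)."""
    docs = {}
    pending = []       # raw comment lines directly above the current line
    in_block = False   # True while inside an unterminated /* ... */ block
    for line in proto_text.splitlines():
        stripped = line.strip()
        if in_block:
            pending.append(stripped)
            in_block = '*/' not in stripped
            continue
        if not stripped:
            pending = []   # an empty line detaches the comment (stray comment)
            continue
        if stripped.startswith('//'):
            pending.append(stripped)
            continue
        if stripped.startswith('/*'):
            pending.append(stripped)
            in_block = '*/' not in stripped
            continue
        match = DECL.match(stripped)
        if match:
            name = match.group(1) or match.group(2)
            _, sep, trailing = stripped.partition('//')
            docs[name] = (pending, sep + trailing if sep else None)
        pending = []
    return docs

SAMPLE = """\
/* some block
 * comment
 */
message Foo {

  /* some stray comment */

  /*
   * some block comment
   */
  int32 some_field = 1;

  int32 some_other_field = 2;  // short comment.

  // Some stray comment.

  // Some block comment
  // over multiple lines
  int32 yet_another_field = 3;
}
"""

docs = attach_comments(SAMPLE)
```

With this sample, the two stray comments are dropped because an empty line separates them from the next declaration, while some_other_field only gets its end-of-line comment.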

Thoughts ?

H.

Kenton Varda

Dec 22, 2009, 5:30:51 PM

to Henner Zeller, Protocol Buffers

I agree, I don't think we should require a specific style for doc comments. Just take whatever comments appear before / on the same line as the field, as you describe.

One tricky issue is formatting. Javadoc requires paragraphs to be explicitly delimited using HTML (<p>). I've always found this really annoying -- why can't it automatically insert paragraph breaks when it sees blank lines?

But more difficult is comments like this:

// Blah blah blah here is a list:
// * blah blah blah
// * blah blah blah blah
// * blah blah

Or comments like this:

// Blah blah blah, here is some example code:
// foo(bar);
// baz.qux();

Not to be confused with comments like this:

// TODO(kenton): Blah blah blah blah. Note that
//   for TODOs I indent lines after the first. I'm not
//   exactly sure why I do this but I always have.

Ideally I'd like to come up with reasonable rules which allow us to infer the proper formatting for comments that already exist today, rather than require developers to add explicit markup. Maybe you could do a survey of some existing .proto files to try to figure out the common patterns, and then detect those?

Henner Zeller

Dec 22, 2009, 5:56:38 PM

to Kenton Varda, Protocol Buffers

On Tue, Dec 22, 2009 at 14:30, Kenton Varda <ken...@google.com> wrote:
> I agree, I don't think we should require a specific style for doc comments.
> Just take whatever comments appear before / on the same line as the field,
> as you describe.
> One tricky issue is formatting. Javadoc requires paragraphs to be
> explicitly delimited using HTML (<p>). I've always found this really
> annoying -- why can't it automatically insert paragraph breaks when it sees
> blank lines?

I would not go with any paragraph formatting as suggested by JavaDoc,
just do a plain conversion of the comment, complete with newlines, but
with comment characters removed. So the final comment would actually
contain newlines so would need to be re-encoded with comment
characters in the particular target language if needed. If there is a
tool that generates HTML out of it, it has to include <p> or <br> as
it pleases.

So the following comment:

> But more difficult is comments like this:
>   // Blah blah blah here is a list:
>   // * blah blah blah
>   // * blah blah blah blah
>   // * blah blah

Would become " Blah blah blah here is a list:\n * blah blah blah ..."
Note that this would remove the comment characters but leave the
whitespace in front of the comment text intact. Maybe we could remove
the common whitespace prefix (i.e. the minimum amount of leading
whitespace found on all lines of a block).

> Or comments like this:
>   // Blah blah blah, here is some example code:
>   // foo(bar);
>   // baz.qux();

> Not to be confused with comments like this:
>   // TODO(kenton): Blah blah blah blah. Note that
>   //   for TODOs I indent lines after the first. I'm not
>   //   exactly sure why I do this but I always have.

Garbage in, garbage out ;) So you would get a multi-line comment with
a different number of whitespaces in front of each line (maybe with
the common number of whitespaces (i.e. one) removed from all of them,
as suggested above).

> Ideally I'd like to come up with reasonable rules which allow us to infer
> the proper formatting for comments that already exist today, rather than
> require developers to add explicit markup.

I am purely for ASCII art formatting here - this is what the developer
writes in the proto file and this is what will end up in generated
source file - so exactly what everyone would expect. If people want to
add additional formatting to make stuff look good in HTML formatting,
then they'll get it, but I don't think the protocol compiler should do
anything with it except passing through.

So in particular, a newline is added by having an empty line in a block comment
/*
* Foo
*
* Bar
*/
Leading and trailing newlines are eaten; this is equivalent to
// Foo
//
// Bar

Both would result in " Foo\n\n Bar" (maybe with the removal of the
leading space).

To accommodate other comment styles, superfluous leading comment
characters are removed as well
//// Foo
//// Bar
or
/** Foo
*** Bar
***/
will both result in " Foo\n Bar".
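
The stripping rules described above could look roughly like this in Python. It is a sketch under assumptions, not protoc code: the function name strip_comment_markers and the dedent flag (for the optional "remove the common whitespace prefix" idea) are made up for this example.

```python
import re

def strip_comment_markers(raw, dedent=False):
    """Remove comment characters but keep line breaks and inner whitespace."""
    lines = []
    for line in raw.splitlines():
        text = line.strip()
        # Drop a closing "*/", including extra "*" in front of it ("***/").
        text = re.sub(r'\**\*/\s*$', '', text)
        # Drop the leading marker: "//", "////", "/*", "/**", "*", "***", ...
        text = re.sub(r'^(?://+|/\*+|\*+)', '', text)
        lines.append(text.rstrip())
    # Leading and trailing empty lines are eaten.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    if dedent:
        # Optionally remove the whitespace prefix common to all non-empty lines.
        indents = [len(l) - len(l.lstrip()) for l in lines if l.strip()]
        common = min(indents, default=0)
        lines = [l[common:] for l in lines]
    return '\n'.join(lines)

block_style = strip_comment_markers("/*\n* Foo\n*\n* Bar\n*/")
line_style = strip_comment_markers("// Foo\n//\n// Bar")
```

Both calls give " Foo\n\n Bar", matching the examples in this message; with dedent=True the single leading space is removed as well, while any extra indentation on continuation lines (as in the TODO example) stays relative to the first line.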

> Maybe you could do a survey of
> some existing .proto files to try to figure out the common patterns, and
> then detect those?

Yeah, I've seen already in descriptor.proto, that you sometimes have
multi-line post-enum-value comments ( "Not ZigZag encoded. ..."). In
my little survey of some other protocol buffers, I've seen this style
sometimes, but I think that is too fragile to support as general
documentation style. So these have to be changed to pre-enum-value block
comments.

I think after the basic implementation we can easily write a JavaDoc
like HTML documentation tool; in the generated doc we then will see
comments that are off and see if things need tweaking.

-h

Christopher Piggott

Dec 22, 2009, 6:00:49 PM

to Protocol Buffers

On Dec 22, 4:53 pm, Henner Zeller <henner.zel...@googlemail.com>
wrote:

> /*
> * some block comment
> */
> int32 some_field = 1;
> int32 some_other_field = 2; // short comment.

I would be fine with that, but I also wouldn't have a problem with you
requiring everything be a block, because you can still do it on one
line:

/**
 * some block comment
 */
int32 some_field = 1;

int32 some_other_field = 2; /** short comment */

Notice I did keep the /** in there, because:

> Is this a constraint we want to have or need

I think so. I think it's helpful to say "This comment is special." I
can see a good argument, though, that it's redundant - especially if
"pass documentation comments to generated code" is a .proto file
"option." I like /** something */ because it fits well with java and
C/C++ (with Doxygen) and because I think the Python triple-quote is
ugly. If you really wanted // then I'd be happiest with ///

Ultimately, though, the .proto is its own language, so decide upon
whatever makes sense to you. It shouldn't be overly cumbersome or
ugly, and it should be reasonably easy for the .proto parsing code to
handle (so you don't wind up hating me).

> The only requirement I'd propose is that there should be no empty line
> between a block comment and the field/message it describes. This
> enforces readability and prevents stray comments or file-header
> comments being accidentally included in the documentation.

I agree.

There are some other things you didn't ask that are bothering me a
little. One has to do with fields. The fields themselves, at least
in java, are private, so documenting them in this way is not
especially useful. What you really want is to have these documents
put something meaningful in the .hasSomething(), .getSomething(),
.getSomethingCount(), etc. and in the builder, to .setSomething()
and .addSomething(), and similar methods.

How to make this work and "look good" is a real question in my mind.
If you do:

message Something {
  required int32 ageField = 1; /** Age of this human */
}

what you really would want for "useful inline documentation" (using
javadoc as an example, but same for Doxygen) would be something like

/** Get ageField
 * @return Age of this human
 */
public int getAgeField() { ... }

for the builder:

/** Set ageField
 * @param value Age of this human
 */
public void setAgeField(int value) { ... }

and similar for lists etc. The "ageField" part I grabbed from the
field name, and the actual comment I applied to the @return and
@param.

I would have to take some time to think about how you would phrase
this so that it makes sense for lists.

This is kind of where I was going / what I was wishing for with regard
to fields. The decisions are a little more straight-forward when it's
documentation for messages, as that documentation I would expect to
more or less pass straight through unchanged, and use it to document
the classes being generated. (The fields are just more complicated,
since the actual fields in the class are private).

I'm not sure where you'd go with services - method calls I suppose,
but for java/doxygen those would be the form @param @param ...
@return. I don't really know python/php so I'm not sure how this maps
over to those languages.

Is that helpful?

Kenton Varda

Dec 22, 2009, 6:04:00 PM

to Henner Zeller, Protocol Buffers

Preserving ASCII art sounds great to me, but I'm not sure how this will mesh with Javadoc. We can't just say "Your comments will be interpreted by the doc tool for the target language" because that makes it very hard to write comments that work nicely in all languages. So for Javadoc we'd presumably need to wrap the whole comment in <pre></pre>. I'm not sure how that would end up looking.

Christopher Piggott

Dec 22, 2009, 6:07:30 PM

to Protocol Buffers

> > But more difficult is comments like this:
> >   // Blah blah blah here is a list:
> >   // * blah blah blah
> >   // * blah blah blah blah
> >   // * blah blah

Hmm. Javadoc would let you encode lists as <ul> <li> ... <li> ... </ul>,
which would be nice, though I suppose not critical. Seems that
you could just pass the HTML through, though.

> Garbage in, garbage out ;) So you would get a multi-line comment with
> a different number of whitespaces in front of each line (maybe with
> the common number of whitespaces (i.e. one) removed from all of them,
> as suggested above).

Yeah.

Not to repeat my other message, but @param @return etc. are important
things for the accessors/setters/etc. Otherwise this won't be able to
generate documentation that IDEs will find useful. Documenting
private fields is of limited use, IMO, and in a lot of cases they aren't
even carried over into the final documentation.

Kenton Varda

Dec 22, 2009, 6:15:53 PM

to Christopher Piggott, Protocol Buffers

On Tue, Dec 22, 2009 at 3:00 PM, Christopher Piggott <cpig...@gmail.com> wrote:

> > Is this a constraint we want to have or need

> I think so. I think it's helpful to say "This comment is special."

I disagree.

There are two cases:

1) The developer had the doc generator in mind when he wrote the file. In this case, he will have written the comments according to whatever rules we specify and therefore it is basically irrelevant what we specify.

2) The developer did not have the doc generator in mind. In that case, if we have a special style for doc comments, presumably there won't be any in the file. What should we do then? Just not provide documentation? I think that instead we should make a best effort, which means assuming that a comment appearing immediately before a definition documents that definition. The results won't be perfect but they are better than nothing.

So in both cases, I don't see any strong argument for forcing the developer to mark his doc comments using a special style.

> There are some other things you didn't ask that are bothering me a
> little. One has to do with fields. The fields themselves, at least
> in java, are private, so documenting them in this way is not
> especially useful. What you really want is to have these documents
> put something meaningful in the .hasSomething(), .getSomething(),
> .getSomethingCount(), etc. and in the builder, to .setSomething()
> and .addSomething(), and similar methods.

This is a good point. I think the best we can do is have the javadoc comments say something to the effect of "this field is documented as:" followed by the extracted documentation. If we try to assume that the comments from the .proto file make sense in any particular context, we're likely to be wrong a lot of the time. For example, in your example you have:

/** Get ageField
 * @return Age of this human
 */
public int getAgeField() { ... }

But what happens if age_field has a doc comment that is many lines long? Then it would no longer make sense to put in the @return clause.

In fact, we should probably only embed the documentation in the getter and then have all the other accessors simply link to the getter.

/** Get the field {@code age_field}.
 *
 * Documentation of {@code age_field} from {@code something.proto}:
 * <pre>
 * [insert docs from .proto file here]
 * </pre>
 */
public int getAgeField() { ... }
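
A generator could produce a comment of that shape along these lines. This is a Python sketch for illustration only: javadoc_for_getter is a hypothetical helper, and escaping HTML characters and "*/" inside the <pre> block is an assumption of this sketch that was not discussed in the thread.

```python
import html

def javadoc_for_getter(field_name, proto_file, doc):
    """Build a Javadoc comment that embeds the extracted .proto docs in <pre>."""
    lines = [
        "/** Get the field {@code %s}." % field_name,
        " *",
        " * Documentation of {@code %s} from {@code %s}:" % (field_name, proto_file),
        " * <pre>",
    ]
    for doc_line in doc.split("\n"):
        # Assumption: escape &, <, > so the text shows up literally, and break up
        # "*/" so the extracted text cannot terminate the Java comment early.
        escaped = html.escape(doc_line, quote=False).replace("*/", "*&#47;")
        lines.append((" * " + escaped).rstrip())
    lines += [" * </pre>", " */"]
    return "\n".join(lines)

comment = javadoc_for_getter(
    "age_field", "something.proto", "Age of this human.\nMust be >= 0.")
```

The other accessors (has, set, clear and so on) would then carry only a short comment linking to the getter, as suggested above.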

Henner Zeller

Dec 22, 2009, 6:21:25 PM

to Christopher Piggott, Protocol Buffers

Hi,

I don't see much gain in having to revisit all my existing protocol
buffer files to add the information that a comment is special ;) If I
commented a field, I probably meant to, uhm, comment it - so this is
what the protocol compiler gets out of it.

Yeah, I haven't thought too much about the target documentation yet,
which will be done in each code generator explicitly. But it would go
along the lines of what you suggest. Some heuristics will evolve there,
I guess (e.g. using the first sentence as short documentation for the
field, a trick JavaDoc uses).
The implementation would be two steps: first get the documentation into
the meta-data, then have the code generators generate the pretty
documentation.
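
The first-sentence heuristic mentioned above might be approximated like this. It is a rough Python sketch, assuming a sentence ends at the first period followed by whitespace or the end of the text; first_sentence is a hypothetical helper name.

```python
import re

def first_sentence(doc):
    """Return the text up to and including the first sentence-ending period."""
    text = " ".join(doc.split())             # collapse newlines and extra spaces
    match = re.search(r'\.(?=\s|$)', text)   # period followed by space or end
    return text[:match.end()] if match else text

summary = first_sentence(" Age of this human.\n Must be non-negative.")
```

Abbreviations such as "e.g. " would cut the summary short under this rule, so a real generator would likely need a few exceptions.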

> This is kind of where I was going / what I was wishing for with regard
> to fields. The decisions are a little more straight-forward when it's
> documentation for messages, as that documentation I would expect to
> more or less pass straight through unchanged, and use it to document
> the classes being generated. (The fields are just more complicated,
> since the actual fields in the class are private).
>
> I'm not sure where you'd go with services - method calls I suppose,
> but for java/doxygen those would be the form @param @param ...
> @return. I don't really know python/php so I'm not sure how this maps
> over to those languages.
>
> Is that helpful?
>

Henner Zeller

Dec 22, 2009, 6:27:30 PM

to Kenton Varda, Protocol Buffers

On Tue, Dec 22, 2009 at 15:04, Kenton Varda <ken...@google.com> wrote:
> Preserving ASCII art sounds great to me, but I'm not sure how this will mesh
> with Javadoc.

If we just pass through HTML formatting for folks that mainly work
with Java, that would be fine. But I don't want my C++ code (and for
that matter, my .proto-files) cluttered with HTML formatting gibberish
;)

> We can't just say "Your comments will be interpreted by the
> doc tool for the target language" because that makes it very hard to write
> comments that work nicely in all languages.

So instead of HTML, maybe some simple wiki-like syntax? But I guess
that would be overthinking the problem right now. The problem is that
parsing arbitrary HTML documentation is even worse if you want to
convert it to a documentation format that does not support HTML.

> So for Javadoc we'd presumably
> need to wrap the whole comment in <pre></pre>. I'm not sure how that would
> end up looking.

Going with <pre></pre>, maybe with the heuristics if there are no
HTML-formattings found in the comment, sounds like a good initial
solution to me.
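
As a sketch of that heuristic in Python: wrap the comment in <pre> only when it does not already appear to contain HTML markup. The tag-detection regex and the name javadoc_body are assumptions for illustration, and the check is deliberately crude.

```python
import html
import re

# Hypothetical detector: anything that looks like <tag>, </tag> or <tag attr=...>.
HTML_TAG = re.compile(r'</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>')

def javadoc_body(doc):
    """Pass comments containing HTML through; wrap plain ASCII art in <pre>."""
    if HTML_TAG.search(doc):
        return doc                       # author already wrote HTML formatting
    return "<pre>\n" + html.escape(doc, quote=False) + "\n</pre>"

plain = javadoc_body("Here is a list:\n * one\n * two")
marked_up = javadoc_body("A list: <ul><li>one</li><li>two</li></ul>")
```

A comment that mentions something like List<Foo> would be misdetected as HTML by this check, which is the kind of case that would need tweaking once generated docs can be inspected.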

Kenton Varda

Dec 22, 2009, 6:36:02 PM

to Henner Zeller, Protocol Buffers

I am certainly not arguing for HTML -- I am arguing against it.

Something wiki-like would be cool (usually I hate wiki but in this context I think it makes sense)... but I think that would be too big a dependency for what we're trying to do. So I think just embedding ASCII art with <pre> makes sense.

Henner Zeller

Dec 22, 2009, 6:47:30 PM

to Protocol Buffers

.. alright, will spend some time tomorrow implementing things.

Christopher Piggott

Dec 22, 2009, 9:08:13 PM

to Protocol Buffers

> > > Is this a constraint we want to have or need
> > I think so. I think it's helpful to say "This comment is special."
> I disagree.

OK, I concede. I tried to think of a good reason why I would have a
comment in the .proto but NOT want to have it in the generated code -
and I couldn't really think of one (not a good one anyway).

> Going with <pre></pre>, maybe with the heuristics if there are no
> HTML-formattings found in the comment, sounds like a good initial
> solution to me.

I'm not thrilled with the idea of precluding HTML formattings in the
comments, because they are useful. An example already presented was
for formatting lists. Some kind of very simple wiki-like approach
sounds interesting. I just worry that would make things difficult for
you - the guys who actually write protoc - and contribute to bloat.

>I don't see much gain in having to revisit all my existing protocol
>buffer files to add the information that a comment is special ;) If I
>commented a field,

Yes. That is a very solid point.

There's another case we haven't talked about yet, and that is Enums,
though they aren't all that different than fields. It would be nice
to allow documentation on the Enum itself, as well as on its values.
See any problems/complications in that?
