Documentation (website vs proto)

Carlo Aldo Curino
May 19, 2022, 8:26:30 PM
to Substrait
Hello Folks,

Chatting with Jesus, we were pondering how we can keep the website and protobuf aligned. While the "spec first" mantra is very valid, it seems hard to keep things aligned. One option could be looking at things like protoc-gen-doc or the like, putting much of our spec into the protobuf itself, and autogenerating the website from that (as Markdown/HTML or what have you). If there is interest, we can poke at it a bit and see if we can make something useful out of it (to avoid file bloat, we could also have a "strip-comments" pass in CI/CD that reduces the .proto files for storage/sharing).

Thanks,
Carlo
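The "strip-comments" pass mentioned above could be quite small. Below is a minimal sketch, assuming a Python step in CI; the function name `strip_proto_comments` is hypothetical and is not part of buf, protoc, or protoc-gen-doc. It removes `//` and `/* */` comments while copying string literals verbatim (so a value like `"a//b"` survives), then drops the blank lines left behind.

```python
def strip_proto_comments(source: str) -> str:
    """Hypothetical CI helper: remove comments from .proto source text."""
    out = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if c in ('"', "'"):
            # Copy a string literal verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and source[j] != c:
                if source[j] == "\\":
                    j += 1
                j += 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("//", i):
            # Line comment: skip to the newline but keep the newline itself.
            j = source.find("\n", i)
            i = n if j == -1 else j
        elif source.startswith("/*", i):
            # Block comment: replace with a space so adjacent tokens stay apart.
            j = source.find("*/", i + 2)
            i = n if j == -1 else j + 2
            out.append(" ")
        else:
            out.append(c)
            i += 1
    # Trim trailing whitespace and drop lines that became empty.
    lines = [line.rstrip() for line in "".join(out).splitlines()]
    return "\n".join(line for line in lines if line) + "\n"
```

Dropping blank lines changes only the layout of the file, not its meaning to protoc; a CI job would run this on copies of the files that are published, keeping the commented originals as the source of truth.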

Carlo Aldo Curino

unread,
May 19, 2022, 10:22:03 PM
to Substrait
Actually, Jesus noticed that buf does provide some of it. Is this what we intend to use?

--
You received this message because you are subscribed to the Google Groups "Substrait" group.
To unsubscribe from this group and stop receiving emails from it, send an email to substrait+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/substrait/f94fa2eb-0d42-43bb-b59b-ca563ce85c4fn%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jacques Nadeau

unread,
May 20, 2022, 12:55:30 PM
to Substrait
It's a good question and something we've struggled with. We discussed it extensively here [1] without a lot of success. 

My general sense is that protobuf should be treated as one of the representations of the spec, not the only one (even though today it is the only one). My hope was actually to embed the protobuf IDLs in the spec, as we did here [2] (see the binary representation, etc. tabs).

We already push to buf so you can review the protobuf info there [3]. Does buf provide anything beyond what this url already provides?


Carlo Aldo Curino

unread,
May 20, 2022, 1:08:55 PM
to Jacques Nadeau, Substrait
I think buf is pretty good, and if we were to move all or much of the spec's textual explanation into the comments that buf surfaces (and link to or surface them on the website), I think it would be great. I agree with the principle of spec-first and protobuf being "just" one incarnation, but in practice I agree with cpcloud that it is likely to be the primary one for a while, and over time I hope we will have many more users of it than people discussing how to change the high-level spec. I think over-indexing on protobuf as the source of truth and keeping it as the best-documented corner of the project is our best bet. While we can try to keep things aligned, I have little faith we will be spot-on at all times, especially since we autorelease (so there is no dedicated curation phase at every release). I like the tabs in [2] a lot, and if we were to use exactly the protobuf message names in the spec and embed tons of those snippets (automatically?), that might be an alternative.

Concretely, the proposal would be to use protobuf as the source of truth, move all comments in there, and autogenerate the docs from there (maybe with a layer of text around them, embedding heavily for ease of organization). When we have a second meaningful format (besides prettyprint), we can tackle how to keep the two aligned.

<2cents>
Carlo

Carlo Aldo Curino

unread,
May 25, 2022, 4:05:36 PM
to Jacques Nadeau, Substrait
I think what we are doing for Plan (Binary Serialization - Substrait: Cross-Language Serialization for Relational Algebra) is pretty good. The .proto has a good level of docs, and they are embedded (with extra explanation) in the website. Should we move to this? I can (clumsily) start doing some of this: taking ideas from the website, porting them to the .proto, and embedding the references in the website (someone should review, as I will make mistakes).

Thanks,
Carlo

Jeroen van Straten

unread,
May 25, 2022, 5:25:38 PM
to Substrait, Jacques Nadeau
Big +1 from me.

Adding enough comments to the .proto files for them to basically become the spec in and of itself has been on my todo list for a while; I'm a big fan of putting docs as close to the relevant sources as possible, to minimize cross references and to promote actually updating the docs when something is changed. The more of the *actual* docs aka website that can be built from that, the better.

The same thing could be done with YAML, using a golden extension file that has a bit of everything as well as copious amounts of comments. That file could then be schema-validated as part of CI. I'd suggest using the schema file instead of an example, but I don't find jsonschema wrapped in YAML all that self-explanatory...

For the type expression grammar, I've rabbit-holed my way into making a tool that converts EBNF (as basic, formal, and descriptive as possible) into ANTLR, into something using nom (the de facto standard for parsing in Rust), and maybe into flex/bison for the hell of it, because otherwise I'm kind of stuck for validating those things. The tool and its docstrings could then be used similarly. (If I actually finish this, it'll probably be a spinoff project like jdot aka jsom. If it's too ridiculous, a sufficiently commented ANTLR grammar would suffice, too.)

I think those things plus an explanation of the type system and a general overview could more or less cover the whole spec.

Jacques Nadeau

unread,
May 29, 2022, 8:27:02 PM
to Carlo Aldo Curino, Jacques Nadeau, Substrait
Yes, I think it would be best to follow that pattern:

Basic detail in proto. Proto embedded in website. Enhanced detail in website. I think we should require future modifications of the proto to include embedding in the correct place in the website so we can get there (similar to plan).
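As an illustration of "proto embedded in website", the sketch below pulls one message (plus the `//` comment block directly above it) out of a .proto file and wraps it in a fenced block for a Markdown page. It is a minimal sketch under stated assumptions: `extract_message` and `to_markdown_snippet` are hypothetical helpers, not part of the Substrait repo or its website tooling, and brace matching assumes no `{` or `}` inside comments or string literals within the message.

```python
import re

FENCE = "`" * 3  # Markdown code fence


def extract_message(proto_source: str, name: str) -> str:
    """Hypothetical helper: return the leading // comments plus `message <name> {...}`."""
    lines = proto_source.splitlines()
    header = re.compile(rf"^\s*message\s+{re.escape(name)}\s*\{{")
    start = next((i for i, line in enumerate(lines) if header.match(line)), None)
    if start is None:
        raise KeyError(f"message {name!r} not found")
    # Include the contiguous block of // comments directly above the message.
    first = start
    while first > 0 and lines[first - 1].strip().startswith("//"):
        first -= 1
    # Count braces until the message's own closing brace brings depth back to 0
    # (simplification: braces inside comments or strings are not handled).
    depth = 0
    for end in range(start, len(lines)):
        depth += lines[end].count("{") - lines[end].count("}")
        if depth == 0:
            break
    return "\n".join(lines[first:end + 1])


def to_markdown_snippet(proto_source: str, name: str) -> str:
    """Wrap the extracted message in a fenced proto block for a Markdown page."""
    return f"{FENCE}proto\n{extract_message(proto_source, name)}\n{FENCE}\n"
```

Keeping the comments attached to the extracted message is what lets the "basic detail" live only in the proto, while the surrounding website page adds the enhanced detail around the embedded snippet.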

Carlo Aldo Curino

unread,
Jun 3, 2022, 8:27:10 PM
to substrait

Should we capture this (and other conventions/choices we are making as a community) in a "how to contribute" section of the website? Also things like what roles we have (contributor/committer/PMC), what distinguishes them, how we mint committers, etc.?

Thanks,
Carlo

Jacques Nadeau

unread,
Jun 4, 2022, 1:07:50 PM
to Substrait
Yes to the first part. We haven't formalized the governance part. I have it on my list to propose a model for that. 

Carlo Aldo Curino

unread,
Jun 5, 2022, 5:05:11 PM
to subs...@googlegroups.com
Sounds good. I propose we settle the overall governance and then we can write it all up as part of the how-to-contribute (or linked pages). 

Thanks,
Carlo
