Reminder: Substrait Sync Meeting Today @ 8AM PDT / 11AM EDT


Weston Pace

Jun 8, 2022, 6:52:59 AM
to Substrait

Weston Pace

Jun 8, 2022, 12:06:16 PM
to Substrait
Notes from the meeting (Notes are taken in the agenda doc so feel free
to correct there and check for historical meeting notes):

# Regarding docs

Carlo: Landing page is very friendly, at a high level of abstraction.
“Page 2” suddenly gets very technical.
Jeroen: Would be nice if the website could be generated from protobuf;
less likelihood of conflicts too.
Jacques: There is a lot of content that doesn’t belong in protobuf.
Some stuff could be done entirely in protobuf. But not all stuff.
Also, might not want to go too far into protobuf if we are going to
add a human readable format later. E.g. some things might be unique
to protobuf and should not be part of the spec. If we constrain
ourselves to the vocabulary of protobuf we are constraining ourselves.
Jeroen: Protobuf-specific stuff does belong only in protobuf. The
other stuff is protobuf-oriented. Users need to know what the
protobuf messages are. Precision is lacking from the website.
Documentation for the type expression grammar is difficult to follow
(it is essentially by example).
Carlo: There are few places where we are out of sync (something in
spec but not in protobuf). Also, if protobuf becomes constraining,
then we need to tackle those constraints regardless. Also, makes it
much clearer which parts of the spec have been implemented / not yet
implemented.
Jacques: There are two levels of abstraction, logical concepts and
physical representation. Original idea was to align on concepts (the
spec is the concepts) and then move onto representation. Although, we
aren’t seeing this translation of spec to protobuf happening when gaps
are encountered (e.g. people just give up if it isn’t in the protobuf
instead of inventing the protobuf needed). Agree there are some gaps
/ lack of precision, but that’s partly just the immaturity of the
project too. Haven’t even had anyone run TPC-H / TPC-DS fully yet.
Would like more real-world use cases.
Carlo: The pattern of spec before protobuf is important now. In the
future, the protobuf is likely to be more important. Planning on more
intense use of Substrait, trying to introduce it between components
internally. Likely to start bringing up new people, and they are
going to be expecting more polish on the protobuf.
Jacques: Let’s get more tactical. We aren’t conceptually far off.
Proposals to improve documentation aren’t being rejected. Let’s just
make some PRs and we’re probably more aligned than we expect.
Anything that improves the content is a win right now. Let’s just
focus on comprehension. Tools, examples, and content that help
understanding are welcome, so let’s just do it.

# Governance

Carlo: I feel like a guest at the moment. People might be a bit timid
because the current governance model doesn’t have a lot of formal
“contributor/committer” roles and if people have such a role they
might be more bold.
Jacques: I’d like to propose / put forward an Apache model.
Conceptually I feel like we are kind of doing this already. People
that are starting a new project are committers. Core repository has a
slightly higher bar at the moment.
E.g. right now: Phillip / Jacques are the PMC. There are a number of
committers (e.g. Rust, C#, etc.) and they are focused on their areas.
Jacques: Minor clarification, people starting new projects need to
have some history of open source elsewhere or have some history on the
Substrait project. So there is a bit of background required.
Carlo: For the existing repository, will we add committership as
people show investment (commits, etc.)?
Jacques: Yes.
Jacques: There is a bit of weirdness that it is easier to become a
committer on a new project than an existing one and that is just kind
of how it works. But we should generally find some good level of
initial investment to show interest in committership. Also, slowly
grow the group of people in the PMC as people show interest /
involvement above and beyond “the stuff I need for my current project”
and more “interested in success of the project as a whole”.
Carlo: I do like that each repository can have its own subcommunity.
There is some risk that each subcommunity (or some particular
subcommunity) is rather small and if they walk away there is a gap.
Jacques: There is some trust concern possible as people come in on a
new niche project and start a repo and become a committer. At some
point though, the project will mature, and the bar for starting a repo
will be higher. Instead, you can create your own repository and get
established first.
Carlo: Does it really matter if you’re working on a new project or an
established project? Let’s just say “X amount of work / investment
means committer”.
Jacques: Sounds good, but just to be clear, “X amount of work” isn’t
really concretely defined.
Jacques: I’ll write up a proposal for the mailing list. Also, can
someone find the Apache documentation on this?

# Java update

Jacques: There are 4-5 people working on the Java repo. Trying to get
to the point where TPC-H and TPC-DS can go from SQL to Substrait. For
example, one problem we are running into now is enumeration arguments
(e.g. extract date); we are tackling issues like this as we go.
Jacques: Trying to round trip SQL / Calcite / Substrait. Goal is to
get a wider selection of plans / etc. and build up the example corpus.
Getting a lot of requests from Microsoft for plans.
Carlo: There is a team that is converting relational algebra to
TensorFlow. They’re looking into using Substrait and that’s probably
where the requests are coming from. Calcite integration will also
help as we use that elsewhere.

# Insert / Update / Create view

Carlo: Are insert, update, delete, create view a part of Substrait?
Jacques: 80% yes. A lot of these can be a pretty simple plan but
there are more complex variations too. Let’s just try proposing
something.

# Protobuf & layers of code

Carlo: Protobuf gives you Java/etc. objects natively. But protobuf is
very much against hierarchies. Might not be the most useful code
representation. Might be interested in tooling to help bridge this
gap.
Jacques: Specific to Kotlin, not sure if there is enough flexibility.
Might be some challenges as a consequence of the Java language.
Jacques: For reference, the Java bindings have two layers. One has a
richer abstraction and more methods around traversal, etc. However,
this is all tightly coupled to the protobuf and a pain to maintain.
Kotlin may have been the answer to avoid some of this maintenance
pain.
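
For readers who were not in the meeting, here is a rough sketch of the
"two layers" idea. It is written in Python only for brevity and is not
how the Java bindings are actually structured. All class and function
names (RelMsg, ReadMsg, FilterMsg, Rel, Read, Filter, from_msg, to_msg)
are hypothetical and are not taken from the Substrait protobuf. The
flat layer imitates a protobuf message with a oneof; the rich layer is
a real class hierarchy with a traversal helper.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

# --- Layer 1: flat, protobuf-style messages (hypothetical schema) ---
# A protobuf "oneof" is imitated by optional fields; exactly one is set.

@dataclass
class ReadMsg:
    table: str

@dataclass
class RelMsg:
    read: Optional[ReadMsg] = None
    filter: Optional["FilterMsg"] = None

@dataclass
class FilterMsg:
    input: RelMsg
    condition: str

# --- Layer 2: richer hierarchy with traversal (also hypothetical) ---

class Rel:
    def inputs(self) -> List["Rel"]:
        return []

    def walk(self) -> Iterator["Rel"]:
        # Pre-order traversal: this node first, then its inputs.
        yield self
        for child in self.inputs():
            yield from child.walk()

@dataclass
class Read(Rel):
    table: str

@dataclass
class Filter(Rel):
    input: Rel
    condition: str

    def inputs(self) -> List[Rel]:
        return [self.input]

# --- Glue between the layers: this is the part that is coupled to the
# message definitions and has to change whenever they change. ---

def from_msg(msg: RelMsg) -> Rel:
    if msg.read is not None:
        return Read(msg.read.table)
    if msg.filter is not None:
        return Filter(from_msg(msg.filter.input), msg.filter.condition)
    raise ValueError("RelMsg has no relation set")

def to_msg(rel: Rel) -> RelMsg:
    if isinstance(rel, Read):
        return RelMsg(read=ReadMsg(rel.table))
    if isinstance(rel, Filter):
        return RelMsg(filter=FilterMsg(to_msg(rel.input), rel.condition))
    raise TypeError(f"unsupported relation: {type(rel).__name__}")

plan_msg = RelMsg(filter=FilterMsg(RelMsg(read=ReadMsg("orders")), "total > 100"))
plan = from_msg(plan_msg)
tables = [r.table for r in plan.walk() if isinstance(r, Read)]
```

Every new relation type in this sketch needs a message class, a rich
class, and a branch in both converters, which is one way to picture the
maintenance cost mentioned above.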