Visualizing protobuf relationships

Dane Pitkin

Jun 20, 2023, 2:56:22 PM
to subs...@googlegroups.com
Hello,

Has anyone tried generating visualizations for Substrait spec protobuf relationships? Does this sound useful? 

As a newcomer to Substrait, I often find myself parsing protobuf definitions and documentation, or reading library implementations, to understand the different combinations of Substrait plans that can be created. It could be helpful if Substrait auto-generated something like a UML diagram depicting the relationship tree of the Substrait spec. If it proves useful enough, it might even be worth archiving one per version to help document the differences.
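
To give a rough idea of what auto-generating such a diagram could involve, here is a minimal Python sketch that turns a message-to-field-type table into Graphviz DOT. `SPEC_EXCERPT` is a hypothetical, hand-written and heavily simplified excerpt, not the full spec. A real generator would presumably read the compiled protobuf descriptors instead of a hard-coded table.

```python
# Hypothetical, simplified excerpt of message relationships:
# message name -> list of (field name, referenced message type).
# Not the complete Substrait spec; real tooling would read descriptors.
SPEC_EXCERPT = {
    "Plan": [("relations", "PlanRel")],
    "PlanRel": [("rel", "Rel"), ("root", "RelRoot")],
    "RelRoot": [("input", "Rel")],
    "Rel": [("read", "ReadRel"), ("filter", "FilterRel"), ("project", "ProjectRel")],
    "FilterRel": [("input", "Rel"), ("condition", "Expression")],
    "ProjectRel": [("input", "Rel"), ("expressions", "Expression")],
    "ReadRel": [],
    "Expression": [],
}


def to_dot(graph: dict) -> str:
    """Render message -> field-type edges as a Graphviz DOT digraph."""
    lines = ["digraph substrait {", "  node [shape=box];"]
    # Declare every message as a node, including leaves with no outgoing edges.
    for message in sorted(graph):
        lines.append(f'  "{message}";')
    for message in sorted(graph):
        for field, target in graph[message]:
            # Label each edge with the field that creates the reference.
            lines.append(f'  "{message}" -> "{target}" [label="{field}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(SPEC_EXCERPT))
```

The output could then be piped through Graphviz (for example `python make_dot.py | dot -Tsvg > spec.svg`, where the script name is hypothetical) to get an image per spec version.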

From my initial research, there are a handful of small libraries out there, but I haven't found anything that I think would be great at visualizing Substrait's protobuf relationships yet. For example, this was generated from an old version of the substrait-python protobuf classes[1] and it is too hard to read IMO.

Thanks,
Dane

Aldrin

Jun 20, 2023, 3:35:47 PM
to subs...@googlegroups.com
> For example, this was generated from an old version of the substrait-python protobuf classes and it is too hard to read IMO.

For this reason, visualizing the spec doesn't seem very useful to me. I often look at the protobuf definitions to figure out what is possible, and that works fine for me.

However, visualizing a specific query plan would be extremely useful for anyone trying to do anything with a Substrait plan itself, which I'll eventually be trying to do. This type of visualization is also how admins and developers work out whether a generated plan is sub-optimal, and from there how the source of the plan should be altered (or how to fix the planner or optimizer).
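
Here is a minimal sketch of what a per-plan visualization might start from. The Python below prints an indented tree of relation types from a plan that has already been decoded from its JSON form into plain dicts, for instance via `google.protobuf.json_format.MessageToDict` on a parsed `Plan`. It assumes child relations sit under `input`, `left`/`right`, or `inputs`, and that each `Rel` dict has a single key naming its type. The sample plan is hypothetical and abbreviated: no schemas, expressions, or extensions.

```python
# Keys under which a relation body nests its child relations
# (assumed from the protobuf JSON mapping; single Rel or repeated Rel).
CHILD_KEYS = ("input", "left", "right", "inputs")


def render(rel: dict, depth: int = 0) -> list:
    """Return one indented line per relation in a Rel-shaped dict."""
    # Assumption: because Rel is a oneof, each dict has exactly one key, e.g. "filter".
    rel_type, body = next(iter(rel.items()))
    lines = ["  " * depth + rel_type]
    for key in CHILD_KEYS:
        child = body.get(key)
        if child is None:
            continue
        # "inputs" (e.g. on a set relation) is a list; the others hold one Rel.
        for sub in child if isinstance(child, list) else [child]:
            lines.extend(render(sub, depth + 1))
    return lines


def render_plan(plan: dict) -> str:
    """Render every top-level relation in a JSON-decoded plan as a text tree."""
    out = []
    for plan_rel in plan.get("relations", []):
        if "root" in plan_rel:
            root = plan_rel["root"]
            out.append("root (names: " + ", ".join(root.get("names", [])) + ")")
            out.extend(render(root["input"], 1))
        else:
            out.extend(render(plan_rel["rel"]))
    return "\n".join(out)


# Hypothetical, abbreviated plan: schemas, expressions and extensions omitted.
JOIN = {"join": {"left": {"read": {}}, "right": {"filter": {"input": {"read": {}}}}}}
SAMPLE_PLAN = {
    "relations": [
        {"root": {"names": ["id", "total"], "input": {"project": {"input": JOIN}}}}
    ]
}

if __name__ == "__main__":
    print(render_plan(SAMPLE_PLAN))
```

Indentation alone is enough to see branching (the join's two inputs appear as siblings). The same walk could emit DOT edges instead if a graphical view is wanted.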

David Sisson

Jun 20, 2023, 6:39:35 PM
to subs...@googlegroups.com
When I started looking at the protobufs, I used protodot to convert them into a graph. To make things more manageable, I concentrated on the relations, since that's the main focus of Substrait anyway (there's a solid argument for expressions as well, although I never really needed them). Once I understood those basics, I've just lived in the protobuf file.
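
If you generate such a graph yourself, one way to mimic that "relations only" focus is to keep just the messages reachable from `Rel` and skip the expression subtree. The sketch below is a hypothetical Python illustration using a breadth-first search over a tiny hand-written adjacency table. It is not how protodot works internally, and the table is a simplified excerpt rather than the full spec.

```python
from collections import deque

# Hypothetical, simplified excerpt: message -> referenced message types.
EXCERPT = {
    "Rel": ["ReadRel", "FilterRel", "JoinRel"],
    "FilterRel": ["Rel", "Expression"],
    "JoinRel": ["Rel", "Expression"],
    "ReadRel": ["Expression"],
    "Expression": ["Literal", "FieldReference"],
    "Literal": [],
    "FieldReference": [],
}


def reachable(graph: dict, start: str, skip=frozenset()) -> set:
    """Return the messages reachable from `start`, never entering `skip`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, []):
            # Pruned types are neither recorded nor expanded.
            if target in skip or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return seen


if __name__ == "__main__":
    # Focus on relations: prune the expression subtree entirely.
    print(sorted(reachable(EXCERPT, "Rel", skip={"Expression"})))
```

The resulting set could then be used to filter which nodes and edges get written to the DOT file.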

My go-to for understanding a plan is to run it through the binary-plan-to-text-plan converter. The pipelines section is really handy for seeing how the relations are interconnected.
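
As a rough illustration of the idea behind a pipelines view, the Python sketch below flattens a relation tree into one leaf-to-root chain per data source. The tree is a hypothetical nested tuple `(name, [children])` with made-up relation names. The `->` output only loosely echoes the idea and is not meant to reproduce the converter's actual text-plan syntax.

```python
from typing import Iterator

# Hypothetical plan shape: (relation name, [child relations]).
PLAN = ("root", [
    ("project", [
        ("join", [
            ("read_orders", []),
            ("filter", [("read_customers", [])]),
        ]),
    ]),
])


def pipelines(node: tuple, downstream: tuple = ()) -> Iterator[str]:
    """Yield each leaf-to-root chain of relation names in the tree."""
    name, children = node
    # This relation feeds everything already collected downstream of it.
    chain = (name,) + downstream
    if not children:
        yield " -> ".join(chain)
        return
    for child in children:
        yield from pipelines(child, chain)


if __name__ == "__main__":
    for line in pipelines(PLAN):
        print(line)
```

Reading one chain per source makes shared downstream relations (here `join -> project -> root`) easy to spot, which is roughly what makes such a section handy.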

substrait-relation.svg.dot.svg.gz

Dane Pitkin

Jun 22, 2023, 10:18:35 AM
to subs...@googlegroups.com
The protodot image does look much better than what I generated! Thanks for sharing.

Do we think the substrait.io documentation could be organized in a more effective way? For example, ExtendedExpression is a top level protobuf message like Plan, but it is buried in the Expressions documentation. Organizing the documentation by message hierarchy could better showcase Substrait's capabilities/purpose. This wouldn't be easy to do, so I initially thought a separate auto-generated image might suffice as well. As a newcomer, this is probably the biggest hurdle for me in being able to quickly understand how to utilize Substrait.

Just my 2c!

Weston Pace

Jun 22, 2023, 11:22:25 AM
to subs...@googlegroups.com
> Do we think the substrait.io documentation could be organized in a more effective way?

This has come up a few times and I'm pretty sure the answer is "Yes, it probably could be organized better, feel free to propose something".  In particular, if you come up with a useful visualization, then please do find a way to upload it somewhere.

One particular thing that has been recommended is that we could use more tutorial / explanatory content.  However, I think we want it sort of parallel / sibling to the specification, and not intertwined with it.

That's kind of unrelated to your point on extended expression though.  I agree that it makes sense to raise that up somewhere higher in the documentation.

Another, slightly related topic, is that David Sisson and I have been talking about adding examples.  We haven't been able to do this in the past because we didn't have a very readable serialization (e.g. the JSON format does not make for good examples).  Now that a text format is coming along I think we could start to add more examples using that.

Dane Pitkin

Jun 23, 2023, 2:12:38 PM
to subs...@googlegroups.com
+1 for additional examples as well! That would be #2 on my list of "most helpful additions for Substrait newcomers." If I find a good way to visualize the protobuf, I'll definitely propose it! My current thinking is that it would be easiest to write it ourselves if we want it done right, so it may or may not be worth the effort.
