Friendlier Substrait Python package


Tim Swast

Nov 29, 2022, 2:20:23 PM
to subs...@googlegroups.com, Amir Hormati
Hello folks,

I'm curious about folks' recommendations on how Python packages could produce (and potentially consume) Substrait expression trees.

I can use the protobufs directly, but they leave a lot to be desired as a client API. Importantly, a client API might want to keep track of intermediate names (which are intentionally omitted from the Substrait expression tree) and types after applying various operations.
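To make the names-and-types point concrete, here is a minimal sketch of what such a layer might track. Everything in it is hypothetical: `TrackedRelation`, `FieldRef`, and the plain dicts standing in for protobuf messages are illustrations only, not part of Substrait or ibis-substrait.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Positional column reference; the plan tree itself carries no names."""
    index: int


@dataclass
class TrackedRelation:
    """Hypothetical wrapper: a plan node plus the names and types the plan omits."""
    plan: dict          # stand-in for a protobuf message (assumption)
    names: list[str]    # client-side column names, kept outside the plan
    types: list[str]    # client-side type names, kept outside the plan

    def column(self, name: str) -> FieldRef:
        # Resolve a user-facing name to the ordinal the plan needs.
        return FieldRef(self.names.index(name))

    def select(self, *names: str) -> TrackedRelation:
        refs = [self.column(n) for n in names]
        plan = {"project": {"input": self.plan, "fields": [r.index for r in refs]}}
        # Names and types are carried forward so later operations can use them.
        return TrackedRelation(
            plan=plan,
            names=list(names),
            types=[self.types[r.index] for r in refs],
        )


orders = TrackedRelation(
    plan={"read": {"named_table": ["orders"]}},
    names=["id", "amount", "region"],
    types=["i64", "fp64", "string"],
)
subset = orders.select("region", "amount")
print(subset.names, subset.types, subset.plan["project"]["fields"])
```

The plan only ever sees ordinals, while the wrapper keeps enough metadata for the next operation to be written against names.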

Is https://github.com/ibis-project/ibis-substrait intended to be that API, at least on the producer side? I worry a bit about pulling in a heavyweight package. The last time I checked, SQLAlchemy was required for most of the backends I tried, but perhaps ibis-substrait can get away with avoiding it?

  •  Tim Swast
  •  Google Cloud Platform
  •  Chicago, IL, USA

Jacques Nadeau

Dec 4, 2022, 10:40:27 PM
to Substrait, Amir Hormati
I'm supportive of a helper library. I don't think ibis serves that purpose. Is this something you'd be interested in bootstrapping?

Tim Swast

Dec 5, 2022, 10:50:58 AM
to subs...@googlegroups.com, Amir Hormati
I started something, but noticed I was duplicating a lot of what's already in Ibis, such as logic to fetch the schema for a table from the backend when creating a "read from named table" relation. This would then have to be made generic in some way to turn it into a general Substrait Python client and not just a BigQuery Substrait client, which would make it look even more like Ibis.

What are you envisioning? Would it cut the boundary somewhere similar to the current ibis-substrait package, in which one needs to supply the schema along with the table name when constructing those relations?
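As a rough illustration of the two possible boundaries, the sketch below builds a "read from named table" node either from a caller-supplied schema or from a pluggable resolver that would ask a backend. The function `named_table_read`, the resolver callback, and the dict layout (which only loosely mirrors the named-table and base-schema parts of a read relation) are all assumptions for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

Schema = Sequence[Tuple[str, str]]        # (column name, type name) pairs
SchemaResolver = Callable[[str], Schema]  # hypothetical backend lookup hook


def named_table_read(
    table: str,
    schema: Optional[Schema] = None,
    resolver: Optional[SchemaResolver] = None,
) -> dict:
    """Build a dict stand-in for a read relation over a named table."""
    if schema is None:
        if resolver is None:
            raise ValueError("supply either a schema or a resolver")
        # Backend-specific step: this is the part that starts to look like Ibis.
        schema = resolver(table)
    return {
        "read": {
            "named_table": [table],
            "base_schema": {
                "names": [name for name, _ in schema],
                "types": [type_name for _, type_name in schema],
            },
        }
    }


# Boundary 1: the caller supplies the schema, so no backend access is needed.
explicit = named_table_read("orders", schema=[("id", "i64")])

# Boundary 2: a resolver fetches the schema; a dict fakes the backend here.
fake_catalog = {"orders": [("id", "i64"), ("amount", "fp64")]}
resolved = named_table_read("orders", resolver=fake_catalog.__getitem__)
print(explicit["read"]["base_schema"], resolved["read"]["base_schema"])
```

Keeping the resolver optional would let a generic package stay backend-free while individual clients plug in their own schema lookup.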

  •  Tim Swast
  •  Senior Software Friendliness Engineer, BigQuery
  •  Google Cloud Platform
  •  Chicago, IL, USA


Andy Grove

Dec 5, 2022, 11:09:39 AM
to subs...@googlegroups.com
One more option to throw in here, although maybe not ideal, is to use DataFusion for this purpose. DataFusion has a query plan representation (and query planner/optimizer) and has a Python API:


DataFusion also supports Substrait via https://github.com/datafusion-contrib/datafusion-substrait

So, with some work, we could enable something like:

from datafusion import SessionContext

c = SessionContext()
c.register_parquet("foo", "/path/to/foo")
df = c.sql("SELECT * FROM foo")  # this can be a complex query
substrait_plan = df.to_substrait()

Aldrin

Jan 10, 2023, 2:45:22 PM
to subs...@googlegroups.com
Hello!

Just wanted to follow up here and ask whether a repo has been started.

Thanks!


Aldrin Montana
Computer Science PhD Student
UC Santa Cruz

