Attempt to convert Flink SQL to Substrait with Isthmus


Vignesh C

Dec 19, 2022, 5:40:26 PM
to substrait
Hi,

I am trying out Isthmus to convert a simple Flink SQL query to Substrait. Once this works, I would like to attempt more Flink SQL queries.


I have hit two issues so far.
  1. It looks like I need to use a different SQL conformance mode, although I am not sure if that is all that is needed.
  2. Isthmus does not handle all the operators yet (example: LogicalTableFunctionScan).
The gist contains additional contextual comments and questions.
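For context, here is a minimal sketch of the kind of Isthmus invocation being attempted. The `SqlToSubstrait` entry point is substrait-java's; the DDL and query are illustrative, and the exact API surface may differ across versions:

```java
import io.substrait.isthmus.SqlToSubstrait;
import io.substrait.proto.Plan;

import java.util.List;

public class FlinkToSubstraitSketch {
  public static void main(String[] args) throws Exception {
    // Isthmus takes the schema as CREATE TABLE DDL alongside the query.
    String ddl = "CREATE TABLE orders (id BIGINT, amount DOUBLE)";
    String query = "SELECT id, SUM(amount) FROM orders GROUP BY id";

    // Parses with Calcite, validates, and emits a Substrait Plan proto.
    Plan plan = new SqlToSubstrait().execute(query, List.of(ddl));
    System.out.println("relations: " + plan.getRelationsCount());
  }
}
```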

I am new to both Substrait and Calcite. Before I spend more time on this, I wanted to get community's thoughts on the approach and potential issues that I might run into.

Thanks,
Vignesh.

James Taylor

Dec 22, 2022, 6:59:20 PM
to substrait
Hi Vignesh,

Thanks for reaching out and thanks for all the context. Super excited to see work done to enable substrait + flink sql. I'm curious - what kind of initial use cases do you have in mind once the two are working together?

We've been adding support incrementally to substrait-java in terms of the breadth of sql support. We're definitely open to increasing this, so adding support for LogicalTableFunctionScan would be a welcome contribution. As far as the sql conformance mode, I can't think of any reason why we'd need to stick to only accepting SqlConformanceEnum. Have you gone far enough into it to see if making this less restrictive led to any issue?
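On the conformance point: Calcite's parser config is typed against the `SqlConformance` interface rather than the enum, so in principle any custom or third-party conformance (such as Flink's) can be plugged in. A small sketch using only Calcite APIs, with a delegating conformance standing in for a Flink-style one:

```java
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.validate.SqlConformance;
import org.apache.calcite.sql.validate.SqlConformanceEnum;
import org.apache.calcite.sql.validate.SqlDelegatingConformance;

public class ConformanceSketch {
  public static void main(String[] args) {
    // Any SqlConformance works, not just the built-in enum values; here
    // we relax a single rule on top of LENIENT as a stand-in for an
    // engine-specific conformance.
    SqlConformance custom =
        new SqlDelegatingConformance(SqlConformanceEnum.LENIENT) {
          @Override
          public boolean isGroupByAlias() {
            return true;
          }
        };
    SqlParser.Config config = SqlParser.config().withConformance(custom);
    System.out.println(config.conformance().isGroupByAlias());
  }
}
```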

Thanks,
James

Vignesh

Dec 23, 2022, 3:04:56 PM
to subs...@googlegroups.com
Hi James,

Thank you for your reply. The use case I am considering, at a high level, is to ensure the ones listed here extend to streaming as well. Example use cases are: i) support the Flink SQL dialect and semantics in other streaming engines; ii) leverage Flink's (or Calcite's) optimization stack in another engine; iii) make developer tools, such as VS Code extensions and plan visualizers built for one streaming engine, usable with a different engine. These use cases require the execution plan for a streaming query to be representable in Substrait. I picked Flink SQL since it is open source and widely used. My main goal is to see what it takes to represent an engine-agnostic execution plan for streaming queries.

I have not spent more time trying to use the SqlConformance interface instead of the SqlConformanceEnum; it is in my backlog to attempt it. Good to know that adding LogicalTableFunctionScan would be acceptable. Using SubstraitRelVisitor directly, instead of going through SQL, might be a faster way to get the first query working and assess feasibility. I am planning to do this first and then look at the SqlConformance mode.
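For reference, bypassing SQL and converting an already-planned Calcite tree directly might look roughly like this. The `SubstraitRelVisitor.convert` entry point, its signature, and the `SimpleExtension` package location are from memory of substrait-java around this time and may well differ by version; the `RelRoot` would come from Flink's planner:

```java
import io.substrait.extension.SimpleExtension;
import io.substrait.isthmus.SubstraitRelVisitor;
import io.substrait.relation.Rel;
import org.apache.calcite.rel.RelRoot;

import java.io.IOException;

public class RelToSubstraitSketch {
  // Convert a Calcite plan (e.g. produced by Flink's planner) straight to
  // Substrait's relational representation, skipping Isthmus's SQL parsing
  // and validation entirely. Names here are assumptions, not a stable API.
  static Rel toSubstrait(RelRoot plannedQuery) throws IOException {
    SimpleExtension.ExtensionCollection extensions = SimpleExtension.loadDefaults();
    return SubstraitRelVisitor.convert(plannedQuery, extensions);
  }
}
```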

Thanks,
Vignesh.


Vignesh

Jan 22, 2023, 11:32:37 PM
to subs...@googlegroups.com
Hi All,

I looked at this more, and here is the current state.

1. To parse Flink CREATE TABLE statements like this, and for other SELECT statements, we need to set the parser factory to the Flink parser factory. This isn't configurable today; Substrait always uses SqlDdlParserImpl.FACTORY.
       i. This could be made configurable, as in my feature branch, but I think that alone would still not be enough.
       ii. SqlConverterBase.java expects SqlCreateTable; Flink returns its own SqlCreateTable with extended properties. This part would need to change as well.
2. I suspect other statements, such as window aggregations, would use extended nodes as well.
3. Given 1 and 2, it wouldn't be possible to update Isthmus as-is to directly convert Flink SQL to Substrait. I think making more parts pluggable (such as how CREATE TABLE is used to add a DefinedTable) and adding derived classes (with a Flink dependency) in a separate project would be a better option. Adding LogicalTableFunctionScan might fix that error, but I suspect we would run into another one and ultimately require a Flink dependency.
4. I updated my feature branch to take a SqlConformance instead of the enum and sent a PR. This alone is not enough, as I shared in (1).
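If it helps, the pluggability point in (1) is small on the Calcite side: the parser factory is just a field on `SqlParser.Config`. A sketch using `SqlDdlParserImpl.FACTORY` (the factory hard-coded today) as the stand-in; Flink's own parser factory would slot in the same way through the same method:

```java
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.SqlParserImplFactory;
import org.apache.calcite.sql.parser.ddl.SqlDdlParserImpl;

public class ParserFactorySketch {
  // A configurable converter would accept any SqlParserImplFactory
  // instead of hard-coding SqlDdlParserImpl.FACTORY.
  static SqlParser.Config configFor(SqlParserImplFactory factory) {
    return SqlParser.config().withParserFactory(factory);
  }

  public static void main(String[] args) {
    SqlParser.Config cfg = configFor(SqlDdlParserImpl.FACTORY);
    System.out.println(cfg.parserFactory() == SqlDdlParserImpl.FACTORY);
  }
}
```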

I will attempt 3 as time permits.

Thanks,
Vignesh.
