Hello everyone,
I would like to propose an extension for the Fetch operation in Substrait [1].
The fetch operator eliminates records outside a specified window, typically corresponding to an OFFSET/FETCH or LIMIT/OFFSET clause in SQL. The fetch operator defines two primary properties:
• Offset (optional): Specifies the number of records to skip before retrieval begins.
• Count: Specifies the number of records to retrieve.
Currently, both offset and count in Substrait are restricted to fixed int64 values. This proposal generalizes these fields to accept expressions that evaluate to a constant integer, giving engines more flexibility.
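To make the intended semantics concrete, here is a minimal Python sketch, not Substrait code. The helper name `fetch` and the modelling of an expression as a zero-argument callable are assumptions made for illustration only. The point it shows is that offset and count are evaluated once, without access to any input row, and the resulting integers define the window.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

# Assumption for this sketch: a "constant expression" is a zero-argument
# callable that returns an int. It never sees a record from the input.
ConstExpr = Callable[[], int]


def fetch(records: Iterable[T],
          count_expr: ConstExpr,
          offset_expr: Optional[ConstExpr] = None) -> Iterator[T]:
    """Hypothetical helper: keep `count` records after skipping `offset`."""
    # Evaluate each expression exactly once, before any record is read.
    offset = offset_expr() if offset_expr is not None else 0
    count = count_expr()
    # islice skips `offset` records, then yields at most `count` records.
    return islice(records, offset, offset + count)
```

For example, `fetch(rows, lambda: page_size, lambda: page * page_size)` would express a paginated query whose offset is computed rather than written as a literal.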
Background
The SQL standard covers this capability with features F860 (dynamic row count) and F865 (dynamic offset). Engines that accept expressions for offset and/or count include:
1. SQL Server: Supports variables, parameters, and constant scalar expressions, including subqueries [2].
2. PostgreSQL: Supports literal constants, parameters, variables, and other expressions [3].
3. DB2: Supports expressions that must not contain column references, scalar full-selects, non-deterministic functions, functions with external actions, or sequence references [4].
4. Oracle: Supports a literal or an expression that evaluates to a numeric value [5].
5. Apache Spark: Allows foldable function expressions for limit and offset [6].
Proposal
The proposal is to extend FetchRel in Substrait to support expressions for both offset and count. Here is a sketch of how it would look:
// The relational operator representing LIMIT/OFFSET or TOP type semantics.
message FetchRel {
  RelCommon common = 1;
  Rel input = 2;
  // the offset expressed in number of records
  // Deprecated: use `offset_rel` instead
  int64 offset = 3 [deprecated = true];
  // Expression evaluated into an integer specifying the number of records to
  // skip.
  Expression offset_rel = 5;
  // the amount of records to return
  // use -1 to signal that ALL records should be returned
  // Deprecated: use `count_rel` instead
  int64 count = 4 [deprecated = true];
  // Expression evaluated into an integer specifying the number of records to
  // return. -1 signals that all records should be returned.
  Expression count_rel = 6;
  substrait.extensions.AdvancedExtension advanced_extension = 10;
}
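As a rough illustration of how a consumer might handle both the deprecated and the new fields, here is a small Python sketch. It does not use the generated protobuf classes: `FetchRelSketch`, `resolve_window`, and `apply_fetch` are hypothetical names, expressions are modelled as zero-argument callables, and the rule that the expression field takes precedence over the deprecated integer when both are present is an assumption, not something the proposal specifies.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Assumption: a constant expression is a zero-argument callable returning int.
ConstExpr = Callable[[], int]


@dataclass
class FetchRelSketch:
    """Hypothetical stand-in for the proposed FetchRel message."""
    offset: int                              # deprecated int64 field
    count: int                               # deprecated int64 field
    offset_rel: Optional[ConstExpr] = None   # proposed expression field
    count_rel: Optional[ConstExpr] = None    # proposed expression field


def resolve_window(rel: FetchRelSketch) -> Tuple[int, int]:
    # Assumed precedence: use the expression field when it is set,
    # otherwise fall back to the deprecated integer field.
    offset = rel.offset_rel() if rel.offset_rel is not None else rel.offset
    count = rel.count_rel() if rel.count_rel is not None else rel.count
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if count < -1:
        raise ValueError("count must be -1 (all records) or non-negative")
    return offset, count


def apply_fetch(rel: FetchRelSketch, records: Iterable) -> Iterator:
    offset, count = resolve_window(rel)
    # -1 signals that all records after the offset should be returned.
    stop = None if count == -1 else offset + count
    return islice(records, offset, stop)
```

A plan that sets only the deprecated fields keeps working under this scheme, which is one way the transition could stay backward compatible.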
If the community agrees that the proposal is reasonable, I will open a GitHub issue and start working on a PR. Please let me know if you have any feedback.
Thanks,