WindowRel

28 views
Skip to first unread message

Timothy Chen

unread,
Mar 30, 2014, 5:31:04 AM3/30/14
to opti...@googlegroups.com
HI Julian,

I'm working on to get Windows working for Drill, and first step is to get correct logical plan operator generated.

I'm using this sql query in sql parser:

select count(*) over w from cp.`employee.json` window w as ( partition by position_id order by position_id);

Just as a quick test, and I was expecting to see a WindowRel being converted.

However I'm getting this plan outputted with the planner:

Plan after converting SqlNode to RelNode
ProjectRel(EXPR$0=[CAST(COUNT() OVER (PARTITION BY $1 ORDER BY $1 ROWS BETWEEN CURRENT ROW AND CURRENT ROW)):BIGINT])
  EnumerableTableAccessRel(table=[[cp, employee.json]])

I wonder if I'm missing something?

Tim


Julian Hyde

unread,
Mar 30, 2014, 9:24:18 PM3/30/14
to opti...@googlegroups.com
Optiq translates SQL to a project, and you need an extra rule to convert from project to window. You need to enable WindowedAggSplitterRule. 

Then you'll need something analogous to EnumerableWindowRule to convert from WindowRel to DrillWindowRel.

Debug say JdbcTest.testWinAgg if you want to see how the process works.

Julian


--
You received this message because you are subscribed to the Google Groups "optiq-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to optiq-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Timothy Chen

unread,
Mar 31, 2014, 4:02:42 AM3/31/14
to opti...@googlegroups.com
I see, I tried to enable WindowsAggSplitterRule but it's complaining that the WindowRel it generates doesn't implement the Drill convention trait set.

Looking at Drill's new sql parsing code it looks like we only enable all the Drill specific Rel -> Logical operator rules but none of the Optiq rules.

Do you know if it's possible and how to perhaps transform a validated RelNode twice that two different rule set?

I was thinking if that's possible I can run the WindowAggSplitterRule first and pass the results into the planner.

I also tried to copy and paste the code from WindowAggSplitterRule, but I'm hitting problem where CalcRelSplitter constructor is package protected and feel like creating a custom inherited class in the same package in Drill is a bit of a hack.

Tim

Julian Hyde

unread,
Mar 31, 2014, 1:30:58 PM3/31/14
to opti...@googlegroups.com
You can apply several planning phases. Note that Frameworks.getPlanner() takes a list of RuleSets. You can select the RuleSet by ordinal when you run Planner.transform().

So, your first RuleSet should be one that contains a WindowedAggSplitterRule (and perhaps very little else) to transform the tree.

Julian

Timothy Chen

unread,
Apr 1, 2014, 6:31:54 AM4/1/14
to opti...@googlegroups.com
I tried having one subset with just the WindowAggSplitterRule and call the index 0 rule set when calling transform.

However I'm getting this error now:

message: "Failure while parsing sql. < CannotPlanException:[ Node [rel#32:Subset#1.NONE.[]] could not be implemented; planner state:\n\nRoot: rel#32:Subset#1.NONE.[]\nOriginal rel:\nProjectRel(subset=[rel#32:Subset#1.NONE.[]], EXPR$0=[CAST(COUNT() OVER (PARTITION BY $1 ORDER BY $1 ROWS BETWEEN CURRENT ROW AND CURRENT ROW)):BIGINT]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 31\n  EnumerableTableAccessRel(subset=[rel#30:Subset#0.ENUMERABLE.[]], table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 28\n\nSets:\nSet#0, type: DrillRecordRow\n\trel#30:Subset#0.ENUMERABLE.[], best=rel#28, importance=0.9\n\t\trel#28:EnumerableTableAccessRel.ENUMERABLE.[](table=[cp, employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}\nSet#1, type: RecordType(BIGINT EXPR$0)\n\trel#32:Subset#1.NONE.[], best=null, importance=1.0\n\t\trel#31:ProjectRel.NONE.[](child=rel#30:Subset#0.ENUMERABLE.[],EXPR$0=CAST(COUNT() OVER (PARTITION BY $1 ORDER BY $1 ROWS BETWEEN CURRENT ROW AND CURRENT ROW)):BIGINT), rowcount=100.0, cumulative cost={inf}\n\t\trel#41:ProjectRel.NONE.[](child=rel#40:Subset#3.NONE.[],EXPR$0=CAST($0):BIGINT), rowcount=1.7976931348623157E308, cumulative cost={inf}\nSet#2, type: RecordType(ANY *, ANY position_id, BIGINT w0$o0)\n\trel#38:Subset#2.NONE.[], best=null, importance=0.81\n\t\trel#34:WindowRel.NONE.[](child=rel#30:Subset#0.ENUMERABLE.[],window#0=window(partition {1} order by [1 Ascending] rows between CURRENT ROW and CURRENT ROW aggs [COUNT()])), rowcount=100.0, cumulative cost={inf}\nSet#3, type: RecordType(BIGINT $0)\n\trel#40:Subset#3.NONE.[], best=null, importance=0.9\n\t\trel#39:ProjectRel.NONE.[](child=rel#38:Subset#2.NONE.[],$0=$2), rowcount=1.7976931348623157E308, cumulative cost={inf}\n\n ]"
]

Not sure if I'm understanding correctly, but seems like my count(*) since it becomes a AggregateRel, the projectRel subset is now empty and somehow optiq can't handle it? I tried sum with a particular column also same exception.

Tim


--
You received this message because you are subscribed to a topic in the Google Groups "optiq-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/optiq-dev/_UMt5UjhgAs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to optiq-dev+...@googlegroups.com.

Julian Hyde

unread,
Apr 1, 2014, 2:07:38 PM4/1/14
to opti...@googlegroups.com
On Apr 1, 2014, at 3:31 AM, Timothy Chen <tna...@gmail.com> wrote:

> I tried having one subset with just the WindowAggSplitterRule and call the index 0 rule set when calling transform.
>
> However I'm getting this error now:

I haven’t had the mental chip implant that allows me to parse \n and \t, so I took the liberty of converting your error stack into English. As follows:

Failure while parsing sql. CannotPlanException: Node [rel#32:Subset#1.NONE.[]] could not be implemented; planner state:

Root: rel#32:Subset#1.NONE.[]
Original rel:
ProjectRel(subset=[rel#32:Subset#1.NONE.[]], EXPR$0=[CAST(COUNT() OVER (PARTITION BY $1 ORDER BY $1 ROWS BETWEEN CURRENT ROW AND CURRENT ROW)):BIGINT]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 31
EnumerableTableAccessRel(subset=[rel#30:Subset#0.ENUMERABLE.[]], table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 28

Sets:
Set#0, type: DrillRecordRow
rel#30:Subset#0.ENUMERABLE.[], best=rel#28, importance=0.9
rel#28:EnumerableTableAccessRel.ENUMERABLE.[](table=[cp, employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
Set#1, type: RecordType(BIGINT EXPR$0)
rel#32:Subset#1.NONE.[], best=null, importance=1.0
rel#31:ProjectRel.NONE.[](child=rel#30:Subset#0.ENUMERABLE.[],EXPR$0=CAST(COUNT() OVER (PARTITION BY $1 ORDER BY $1 ROWS BETWEEN CURRENT ROW AND CURRENT ROW)):BIGINT), rowcount=100.0, cumulative cost={inf}
rel#41:ProjectRel.NONE.[](child=rel#40:Subset#3.NONE.[],EXPR$0=CAST($0):BIGINT), rowcount=1.7976931348623157E308, cumulative cost={inf}
Set#2, type: RecordType(ANY *, ANY position_id, BIGINT w0$o0)
rel#38:Subset#2.NONE.[], best=null, importance=0.81
rel#34:WindowRel.NONE.[](child=rel#30:Subset#0.ENUMERABLE.[],window#0=window(partition {1} order by [1 Ascending] rows between CURRENT ROW and CURRENT ROW aggs [COUNT()])), rowcount=100.0, cumulative cost={inf}
Set#3, type: RecordType(BIGINT $0)
rel#40:Subset#3.NONE.[], best=null, importance=0.9
rel#39:ProjectRel.NONE.[](child=rel#38:Subset#2.NONE.[],$0=$2), rowcount=1.7976931348623157E308, cumulative cost={inf}

> Not sure if I'm understanding correctly, but seems like my count(*) since it becomes a AggregateRel, the projectRel subset is now empty and somehow optiq can't handle it? I tried sum with a particular column also same exception.

I don’t see an AggregateRel. The rule seems to have done the right thing — change rel#32:project into rel#41:project on top of rel#39:project on top of rel#34:window. The two projects are almost trivial, but don’t worry about that. Now you need a rule that will convert WindowRel to say DrillWindowRel.

Julian

Timothy Chen

unread,
Apr 1, 2014, 2:29:51 PM4/1/14
to opti...@googlegroups.com
Hi Julian,

Sorry about the formatting, I just copy and pasted from the console :)

I have a rule in the other rule set to do that, but before I can call transform again on index 1 ruleset it already throws an exception though?

When you mean don't worrying about it should I try/catch CannotPlanException? That seems like it will hide many problems.

Thanks!

Tium


I don't see an AggregateRel. The rule seems to have done the right thing -- change rel#32:project into rel#41:project on top of rel#39:project on top of rel#34:window. The two projects are almost trivial, but don't worry about that. Now you need a rule that will convert WindowRel to say DrillWindowRel.

Julian

Julian Hyde

unread,
Apr 1, 2014, 2:40:19 PM4/1/14
to opti...@googlegroups.com
On Apr 1, 2014, at 11:29 AM, Timothy Chen <tna...@gmail.com> wrote:

I have a rule in the other rule set to do that, but before I can call transform again on index 1 ruleset it already throws an exception though?

The phase 0 planner tries to make an executable plan, but the plan isn’t executable until phase 1 has finished. This a bug in Frameworks.getPlanner (logged https://github.com/julianhyde/optiq/issues/221).

Two options:
1. You could use a single rule set, and throw WindowedAggSplitterRule and DrillWindowRule into the mix.
2. For phase 0, you could build a standalone HepPlanner.

When you mean don't worrying about it should I try/catch CannotPlanException? That seems like it will hide many problems.

I didn’t say that. I said you could ignore InvalidRelException.

Julian

Timothy Chen

unread,
Apr 3, 2014, 4:22:53 PM4/3/14
to opti...@googlegroups.com
I originally wanted to do option 1, but it throws exception as the WindowRel that is generated from the WindowAggSplitterRule doesn't implement the Drill convention so I was thinking I have to do a two phase conversion.

I created a HepPlanner with only the WindowAggSplitterRule, and later passes that to the planner running Drill rules, and still getting the CannotPlanException similiarly before:


CannotPlanException: Node [rel#77:Subset#4.DRILL.[]] could not be implemented; planner state:

Root: rel#77:Subset#4.DRILL.[]
Original rel:
AbstractConverter(subset=[rel#77:Subset#4.DRILL.[]], convention=[DRILL], sort=[[]]): rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 78
  ProjectRel(subset=[rel#76:Subset#4.NONE.[]], EXPR$0=[CAST(CASE(>($0, 0), $1, null)):ANY]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu, 0.0 io}, id = 75
    ProjectRel(subset=[rel#74:Subset#3.NONE.[]], $0=[$1], $1=[$2]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, Infinity cpu, 0.0 io}, id = 73
      WindowRel(subset=[rel#72:Subset#2.NONE.[]], window#0=[window(partition {0} order by [0 Ascending] rows between CURRENT ROW and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu, 0.0 io}, id = 71
        ProjectRel(subset=[rel#70:Subset#1.NONE.[]], position_id=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 69
          EnumerableTableAccessRel(subset=[rel#68:Subset#0.ENUMERABLE.[]], table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 49

Sets:
Set#0, type: DrillRecordRow
rel#68:Subset#0.ENUMERABLE.[], best=rel#49, importance=0.5904900000000001
rel#49:EnumerableTableAccessRel.ENUMERABLE.[](table=[cp, employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
rel#86:AbstractConverter.ENUMERABLE.[](child=rel#85:Subset#0.DRILL.[],convention=ENUMERABLE,sort=[]), rowcount=100.0, cumulative cost={inf}
rel#85:Subset#0.DRILL.[], best=rel#92, importance=0.531441
rel#87:AbstractConverter.DRILL.[](child=rel#68:Subset#0.ENUMERABLE.[],convention=DRILL,sort=[]), rowcount=100.0, cumulative cost={inf}
rel#92:DrillScanRel.DRILL.[](table=[cp, employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}

Set#1, type: RecordType(ANY position_id)
rel#70:Subset#1.NONE.[], best=null, importance=0.6561
rel#69:ProjectRel.NONE.[](child=rel#68:Subset#0.ENUMERABLE.[],position_id=$1), rowcount=100.0, cumulative cost={inf}
rel#89:Subset#1.DRILL.[], best=rel#88, importance=0.32805
rel#90:AbstractConverter.DRILL.[](child=rel#70:Subset#1.NONE.[],convention=DRILL,sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#88:DrillProjectRel.DRILL.[](child=rel#85:Subset#0.DRILL.[],position_id=$1), rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}

Set#2, type: RecordType(ANY position_id, BIGINT w0$o0, ANY w0$o1)
rel#72:Subset#2.NONE.[], best=null, importance=0.7290000000000001
rel#71:WindowRel.NONE.[](child=rel#70:Subset#1.NONE.[],window#0=window(partition {0} order by [0 Ascending] rows between CURRENT ROW and CURRENT ROW aggs [COUNT($0), $SUM0($0)])), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#82:Subset#2.DRILL.[], best=null, importance=0.81
rel#83:AbstractConverter.DRILL.[](child=rel#72:Subset#2.NONE.[],convention=DRILL,sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
Set#3, type: RecordType(BIGINT $0, ANY $1)
rel#74:Subset#3.NONE.[], best=null, importance=0.81
rel#73:ProjectRel.NONE.[](child=rel#72:Subset#2.NONE.[],$0=$1,$1=$2), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#79:Subset#3.DRILL.[], best=null, importance=0.9
rel#80:AbstractConverter.DRILL.[](child=rel#74:Subset#3.NONE.[],convention=DRILL,sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#84:DrillProjectRel.DRILL.[](child=rel#82:Subset#2.DRILL.[],$0=$1,$1=$2), rowcount=1.7976931348623157E308, cumulative cost={inf}
Set#4, type: RecordType(ANY EXPR$0)
rel#76:Subset#4.NONE.[], best=null, importance=0.9
rel#75:ProjectRel.NONE.[](child=rel#74:Subset#3.NONE.[],EXPR$0=CAST(CASE(>($0, 0), $1, null)):ANY), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#77:Subset#4.DRILL.[], best=null, importance=1.0
rel#78:AbstractConverter.DRILL.[](child=rel#76:Subset#4.NONE.[],convention=DRILL,sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
rel#81:DrillProjectRel.DRILL.[](child=rel#79:Subset#3.DRILL.[],EXPR$0=CAST(CASE(>($0, 0), $1, null)):ANY), rowcount=1.7976931348623157E308, cumulative cost={inf}

Is there something I need to do to handle the empty subset rels?

Tim


--

Timothy Chen

unread,
Apr 14, 2014, 11:04:43 PM4/14/14
to opti...@googlegroups.com
HI Julian,

Thanks for helping me to get the sql parsing to work for Windowing.

Right now I'm getting an interesting plan returned with the same query
I've been using:

select sum(position_id) over w from `cp`.`employee.json` window w as
(partition by position_id order by position_id);

The resulting plan contains two aggregation calls, one that is $SUM0
and an extra COUNT op that was generated.

Two questions:

1, Looking at the SqlToRelConverter.java:4762, I'm not sure why Optiq
wants to replace the agg call with a $SUM0? This is causing Drill to
not able to find the SUM function call.

2. RexBuilder.java:312 I see that it is generating a extra
CASE(COUNT....) calls, what is the reason for generating this? I don't
see how we need this in a window function at least for Drill.

Tim

Vladimir Sitnikov

unread,
Apr 14, 2014, 11:38:01 PM4/14/14
to opti...@googlegroups.com

2. I guess the count is there to properly calculate the sum of empty set.

The sum of empty set should be null (sql spec), so sum, avg should somehow tell if the set is empty.

Julian Hyde

unread,
Apr 16, 2014, 12:27:13 AM4/16/14
to opti...@googlegroups.com
Tim,

What Vladimir says is correct. The standard requires SUM and AVG return NULL if applied to the empty set.

You could optimize away if you know that the set of inputs is always non-empty. But there are a lot of cases where you can't:
  • aggregation without group by: select sum(sal) from emp -- may be empty if emp has no rows
  • aggregation with group by: select deptno, sum(sal) from emp group by deptno -- even though no 'deptno' group has 0 rows, if sal is nullable, all values of sal might be null
  • windowed aggregation over rows: select sum(sal) over (order by hire_date rows 10 preceding) from emp -- the window never has 0 rows, but if sal is nullable all values of sal might be null
  • windowed aggregation over range: select sum(sal) over (order by hire_date range between interval '1' month and '1' week preceding preceding) -- the window may have 0 rows, even if sal is not nullable
Julian
Reply all
Reply to author
Forward
0 new messages