Windowing Func with Distinct

39 views
Skip to first unread message

John Pullokkaran

unread,
Jul 23, 2014, 8:27:40 PM7/23/14
to opti...@googlegroups.com
Does Optiq support Windowing func with Distinct?

Ex: select avg(distinct c_int) over(PARTITION BY key ORDER BY value ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as p from t1;
Hive support Windowing functions with "Distinct" (UDAF with Distinct).

I am using "m_cluster.getRexBuilder().makeOver" to build window.
Optiq doesn't seems to support distinct on Windowing Functions.
Am i missing something?


Thanks
John

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

Julian Hyde

unread,
Jul 24, 2014, 2:25:52 AM7/24/14
to opti...@googlegroups.com
I don't think it does at present. 

It would not be difficult for the parser and validator to support it, or to represent the DISTINCT flag in the SqlNode AST, to add a 'boolean distinct' parameter to RexBuilder.makeOver.

It might be a bit more difficult to actually implement it, so one could run queries in Enumerable or other conventions. Maybe this could be done using a rewrite rule, converting a distinct-windowed-agg to a regular windowed-agg, but I can't quite think of the details.

Can you please log a JIRA case for this.

Julian

--
You received this message because you are subscribed to the Google Groups "optiq-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to optiq-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vladimir Sitnikov

unread,
Jul 29, 2014, 11:49:39 AM7/29/14
to opti...@googlegroups.com
distinct UDFs might be related to https://issues.apache.org/jira/browse/OPTIQ-337 (Support IGNORE NULLS in window aggregates).

It might be a bit more difficult to actually implement it, so one could run queries in Enumerable or other conventions. Maybe this could be done using a rewrite rule, converting a distinct-windowed-agg to a regular windowed-agg, but I can't quite think of the details.
Main question for me is how we should support different capabilities of the executor engines.

I think it is quite easy to implement Enumerable way of calculating distinct UDFs.
What I am not sure is I do not quite get how "rewrite rule, converting a distinct-windowed-agg to a regular windowed-agg" should work and handle executors that do not support "distinct".

Vladimir

Julian Hyde

unread,
Jul 30, 2014, 5:48:54 PM7/30/14
to opti...@googlegroups.com
I’m not sure that a rewrite rule is possible, but it would be nice if there was one — it would allow us to parallelize/distribute queries, and to run on back-ends that didn’t explicitly support “distinct".

There is a rewrite rule for distinct non-windowed aggregates, which involves adding another level of aggregation: "select x, count(distinct y) from t group by x” becomes “select x, count(y) from (select x, y from t group by x, y) group by x”. (It gets a bit more complicated if there are multiple distincts, but is still possible.)

Julian

Vladimir Sitnikov

unread,
Nov 22, 2014, 12:55:36 PM11/22/14
to opti...@googlegroups.com
I've opened a JIRA case at https://issues.apache.org/jira/browse/CALCITE-476
Welcome to Calcite JIRA and/or dev @ calcite.incubator.apache.org and/or https://github.com/apache/incubator-calcite

Vladimir
Reply all
Reply to author
Forward
0 new messages