I have a couple of questions:
1. Currently only a single distinct column is allowed, but the plan itself
permits multiple ones:
   DISTINCT
     DISTINCT
       SEQUENTIAL SCAN of "LOG"
Should I account for this case while pushing down distinct nodes? Or
should an exception be thrown if more than one distinct node is
encountered, similar to the handleDistinct code?
2. If we do allow it, is it possible for a distinct node to have more
than one child? If so, can the children be a mixture of distinct and
other nodes?
   DISTINCT
     DISTINCT
       SEQUENTIAL SCAN of "LOG"
     DISTINCT
       SEQUENTIAL SCAN of "LOG"
Or
   DISTINCT
     DISTINCT
       SEQUENTIAL SCAN of "LOG"
     SEQUENTIAL SCAN of "LOG"
It shouldn’t be a big deal to accommodate the first plan, but I am
not sure how to handle the next two.
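A minimal Python sketch of one possible policy covering both questions, assuming a hypothetical `PlanNode` class with an assumed `columns` field (the real planner classes are not shown in this thread). Directly stacked DISTINCT nodes over the same columns are collapsed, because applying DISTINCT twice on the same columns returns the same rows as applying it once. A DISTINCT with anything other than exactly one child is rejected, in the spirit of the exception thrown by the handleDistinct code. `normalize_distinct` is an illustrative name, not an existing function.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlanNode:
    # Hypothetical stand-in for a planner node, not the real planner class.
    kind: str                                  # e.g. "DISTINCT", "SEQUENTIAL SCAN"
    target: str = ""                           # table name, for scans
    columns: Tuple[str, ...] = ()              # DISTINCT columns (assumed field)
    children: List["PlanNode"] = field(default_factory=list)


def normalize_distinct(node: PlanNode) -> PlanNode:
    """Return the plan with same-column DISTINCT stacks collapsed.

    Raises ValueError for a DISTINCT that does not have exactly one child.
    Children are rewritten in place, bottom-up.
    """
    node.children = [normalize_distinct(c) for c in node.children]
    if node.kind != "DISTINCT":
        return node
    if len(node.children) != 1:
        raise ValueError(
            f"DISTINCT expects exactly one child, got {len(node.children)}")
    child = node.children[0]
    if child.kind == "DISTINCT" and child.columns == node.columns:
        # DISTINCT(DISTINCT(x)) on the same columns yields the same rows
        # as DISTINCT(x), so the outer node is redundant.
        return child
    return node
```

Under this policy the first plan collapses to a single DISTINCT over the scan, while both multi-child plans raise. Stacked DISTINCT nodes on different columns are left alone, since dropping one of them would change the result.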
Thanks,
Mike
select distinct a,b from t;
where a and b are columns in t.
Mike
Mike,
The DISTINCT processing in this case has to happen on the combinations of column values from the same row/scan, so the structure of the plan does not change compared to the single-column case.
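To illustrate the point about combinations, here is a small Python sketch; `distinct_rows` is a hypothetical helper, not the engine's executor. The deduplication key is the tuple of all requested columns taken from one row, so a single DISTINCT over a single scan is enough for `select distinct a,b from t`.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple


def distinct_rows(rows: Iterable[Dict[str, object]],
                  columns: Sequence[str]) -> Iterator[Tuple[object, ...]]:
    """Yield each distinct combination of `columns`, in first-seen order."""
    seen = set()
    for row in rows:
        # One key per row, built from every DISTINCT column of that same row.
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            yield key


rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"},
        {"a": 1, "b": "x"}, {"a": 2, "b": "x"}]
print(list(distinct_rows(rows, ["a", "b"])))  # [(1, 'x'), (1, 'y'), (2, 'x')]
```

Here `a = 1` appears in two results because each (a, b) pair is compared as a whole; nothing in the plan has to be nested per column.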
BTW, having looked at the description of ENG 2503/2504, I'm not sure that it's as simple as a push-down issue, though the push-down is a good first step.
I am new to the code and haven't had a chance to study the DISTINCT implementation, but I suspect that there may be ways to tune it to better suit the case of low-cardinality columns. I'm hoping to look deeper into this in the next few days.
I spoke with Michael last night, and he's going to try to make the distinct node executor work with subsets of tuples rather than single rows.
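A rough Python sketch of what batch-at-a-time DISTINCT could look like, assuming the executor receives its input as lists of hashable tuples. `batches` and `distinct_batches` are hypothetical names, and the real executor interface may differ. The seen-set is kept across batches, so a row that reappears in a later batch is still dropped.

```python
from typing import Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)


def batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group an input stream into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so yielded batches are not mutated
    if batch:
        yield batch


def distinct_batches(batched: Iterable[List[T]]) -> Iterator[List[T]]:
    """Deduplicate a stream of batches, one output batch per input batch."""
    seen = set()  # persists across batches, so cross-batch repeats are removed
    for batch in batched:
        out = [row for row in batch if not (row in seen or seen.add(row))]
        if out:  # skip batches that became empty after deduplication
            yield out
```

For example, the input `[1, 2, 2, 3, 1, 4, 4]` in batches of 3 becomes `[[1, 2], [3, 4]]`: the third batch contains only a repeat of 4 and is dropped. Whether empty batches should be skipped or forwarded downstream is a design choice for the real executor.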