--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/D8421593-DCDB-4BA1-9A35-25351D3B5519%40imply.io.
For more options, visit https://groups.google.com/d/optout.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/D8421593-DCDB-4BA1-9A35-25351D3B5519%40imply.io.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/B97CDE7D-AC4C-4044-9790-74052879AE21%40apache.org.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CACZNdYDP65%3DnC7yrFH%3DEH6%2Bv-KE-Om0Mfr%2BLF-x0Qq5OuVs2vg%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAG0p_PHk90Bg%2BGUJ%2BeBgTqfV9Do1cPg8OOd%2BkCd-0z6h%3DXenUg%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3DweKRM6muXit%3DtcuQdF941yF2uLGwVxwEq6xM8%2Bo%3DgacvA%40mail.gmail.com.
I think native expression filters can make sense but I really think that including methods in ColumnSelectorFactory and ValueMatcherFactory that are expression-aware is adding the "bad" kind of complexity: links between subsystems that don't necessarily need to be linked. In this case, the storage adapter and expression subsystems.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3DwdCuh4nZUsQxR6X13yUqywP3A934MuW1csCQM-yd%3DNcYA%40mail.gmail.com.
IMO a "makeValueMatcher" method on VirtualColumn isn't really very leaky, since ValueMatcher is just a boolean-typed selector. "ValueMatcher makeValueMatcher(factory)" is essentially the same thing as "BooleanColumnSelector makeBooleanColumnSelector(factory)" method, but just with a funny name. The implementation would not need to know about filters, it just has to return a boolean.Filter.makeMatcher accepting ColumnSelectorFactory sounds good to me right now. I haven't thought through it that much, but if it works then I think it'd also be a good solution.
Gian
I skimmed through #2511 and #3758 today, For me, here are the preferred high level expectations.1- Virtual columns should work universally across all query types, aggregators, filters and whatever else deals with virtual columns.2- Virtual columns work "transparently", that is without all extensions to not necessarily implement them. For example, "all" aggregators and filters should be able to have things like expressions without needing to implement that specially in that aggregator or filter. expansion of same idea is that adding more things like "expressions" should just need implementing another virtual column and no more change.3- We should be able to do the grouping in groupBy using virtual columns.4- Semantics of virtual column should be same everywhere. For example, currently it looks like that "select" query would include the virtual column output in the response returned to the user while "groupBy" would not and they are there only to be referenced by aggregators/filters.5- Folding extractioFn into virtual columns.6- During ingestion, users should be able to combine raw input row columns using virtual columns. However, this one is not super high priority because this "can be" achieved by doing simple ETL upfront or writing a custom InputRowParser to handle any row level transformation.With those in mind, I think https://github.com/druid-io/druid/pull/3758 is a step in right direction as it does take care of (1), (2) and (6).I see (3) mentioned in the discussions and possibly in future PR , but we should at least think that through to avoid too much rework later.(5) is great to support the cases where user want to apply extraction function on multiple columns, That said, There are already some open PRs to enable this particular use case (haven't looked at those yet) and if those fulfill the usecase without too much of a hack and maintenance burden then (5) doesn't remain too important.None of the discussion talked about (4) but I believe that is important to keep the mental model of virtual columns consistent for the end users. That means, we might have a different notion than virtual columns to report virtual column output in select query response.-- Himanshu
On Friday, 9 December 2016 15:27:53 UTC-6, Gian Merlino wrote:
IMO a "makeValueMatcher" method on VirtualColumn isn't really very leaky, since ValueMatcher is just a boolean-typed selector. "ValueMatcher makeValueMatcher(factory)" is essentially the same thing as "BooleanColumnSelector makeBooleanColumnSelector(factory)" method, but just with a funny name. The implementation would not need to know about filters, it just has to return a boolean.Filter.makeMatcher accepting ColumnSelectorFactory sounds good to me right now. I haven't thought through it that much, but if it works then I think it'd also be a good solution.
Gian
On Fri, Dec 9, 2016 at 1:15 PM, Roman Leventov <roman.l...@metamarkets.com> wrote:--On Fri, Dec 9, 2016 at 10:12 AM, Gian Merlino <gi...@imply.io> wrote:I think native expression filters can make sense but I really think that including methods in ColumnSelectorFactory and ValueMatcherFactory that are expression-aware is adding the "bad" kind of complexity: links between subsystems that don't necessarily need to be linked. In this case, the storage adapter and expression subsystems.I agree with this. However your proposal makes VirtualColumn a leaked abstraction, because it starts to know that it "could be a filter".Another way to isolate ValueMatcherFactories and StorageAdapters from expressions is to make Filter.makeMatcher() to accept ColumnSelectorFactory. So simple Filters could pull a single dimensionSelector, expression filters may several ones. And ValueMatcherFactory interface could be removed.
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3DwdCuh4nZUsQxR6X13yUqywP3A934MuW1csCQM-yd%3DNcYA%40mail.gmail.com.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/d6c42c45-8d38-4f8d-9945-0e3d0972289f%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3DwdCuh4nZUsQxR6X13yUqywP3A934MuW1csCQM-yd%3DNcYA%40mail.gmail.com.
Roman,
in your suggestion, how would ExpressionFilter.makeMatcher(ColumnSelectorFactory) be implemented? Would it do something like:parsedExpression.eval(bindingsFrom(columnSelectorFactory)).asBoolean()or:new ExpressionVirtualColumn(expressionString).makeValueMatcher(columnSelectorFactory)[of course caching either the ExpressionVirtualColumn or the parsedExpression – just did one line for brevity.]I'm okay with either one. I'm guessing you had the first one in mind since you had argued against the makeValueMatcher method appearing on ExpressionVirtualColumn.
Gian
Also, does anyone see issues with getting rid of "ValueMatcherFactory"? If we did that, we could still have some convenience methods on Filters like Filters.makeValueMatcher(ColumnSelectorFactory, String columName, DruidPredicateFactory) and Filters.makeValueMatcher(ColumnSelectorFactory, String columName, String value). Most existing filters would use those convenience methods.
GianOn Tue, Dec 13, 2016 at 4:26 PM, Gian Merlino <gi...@imply.io> wrote:Roman,in your suggestion, how would ExpressionFilter.makeMatcher(ColumnSelectorFactory) be implemented? Would it do something like:parsedExpression.eval(bindingsFrom(columnSelectorFactory)).asBoolean()or:new ExpressionVirtualColumn(expressionString).makeValueMatcher(columnSelectorFactory)[of course caching either the ExpressionVirtualColumn or the parsedExpression – just did one line for brevity.]I'm okay with either one. I'm guessing you had the first one in mind since you had argued against the makeValueMatcher method appearing on ExpressionVirtualColumn.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3Dwfoppp7aF6i3O-frq8kqrg5Y1AA5CETW_onV9xoQTM5vQ%40mail.gmail.com.
How does this sound for moving forward:1) Make Filters use ColumnSelectorFactory directly for building row-based matchers.- Remove ValueMatcherFactory completely.- Change Filter.makeMatcher(ValueMatcherFactory) to Filter.makeMatcher(ColumnSelectorFactory).- Add some helper methods to Filters.java: Filters.makeValueMatcher(ColumnSelectorFactory, String columnName, DruidPredicateFactory) and Filters.makeValueMatcher(ColumnSelectorFactory, String columnName, String value). Most existing filters should use these.- If needed for performance, add DimensionSelector.makeValueMatcher(value) and DimensionSelector.makeValueMatcher(predicateFactory). If added, then the helper methods in Filters.java should use these when appropriate.2) Remove the dependency of aggregators and storage adapters on expressions.- Remove makeMathExpressionSelector from ColumnSelectorFactory.- Remove "expression" from aggregatorFactories, in favor of users using fieldName + an "expression" typed virtual column.- Add some helper method somewhere that makes an ObjectBindings from a ColumnSelectorFactory. A new ExpressionFilter and ExpressionVirtualColumn should use this.3) Support filtering on expressions using an "expression" filter.- Create ExpressionFilter that uses the helper from (2).4) Support filtering on virtual columns.- Have BitmapIndexSelector impls return null on getBitmapIndex(columnName) when a virtual column exists for columnName, to force ValueMatcher-based filtering. This, combined with (1) above should be enough.- This does imply that filtering on virtual columns will always be done matcher-style and will not use indexes. I think this is fine for a start, although one day we might want to figure out a way to use indexes when possible.
Gian
Notably missing from the previous post is moving extractionFns into virtual columns, which a couple people have suggested doing. The biggest issue there, to me, is how to support filtering on extractionFn'ed dimensions, if done through virtual columns, while still retaining the ability to use indexes. But I think we can deal with that later.
Gian
On Wed, Dec 14, 2016 at 11:20 AM, Gian Merlino <gi...@imply.io> wrote:
How does this sound for moving forward:1) Make Filters use ColumnSelectorFactory directly for building row-based matchers.- Remove ValueMatcherFactory completely.- Change Filter.makeMatcher(ValueMatcherFactory) to Filter.makeMatcher(ColumnSelectorFactory).- Add some helper methods to Filters.java: Filters.makeValueMatcher(ColumnSelectorFactory, String columnName, DruidPredicateFactory) and Filters.makeValueMatcher(ColumnSelectorFactory, String columnName, String value). Most existing filters should use these.- If needed for performance, add DimensionSelector.makeValueMatcher(value) and DimensionSelector.makeValueMatcher(predicateFactory). If added, then the helper methods in Filters.java should use these when appropriate.2) Remove the dependency of aggregators and storage adapters on expressions.- Remove makeMathExpressionSelector from ColumnSelectorFactory.- Remove "expression" from aggregatorFactories, in favor of users using fieldName + an "expression" typed virtual column.- Add some helper method somewhere that makes an ObjectBindings from a ColumnSelectorFactory. A new ExpressionFilter and ExpressionVirtualColumn should use this.3) Support filtering on expressions using an "expression" filter.- Create ExpressionFilter that uses the helper from (2).4) Support filtering on virtual columns.- Have BitmapIndexSelector impls return null on getBitmapIndex(columnName) when a virtual column exists for columnName, to force ValueMatcher-based filtering. This, combined with (1) above should be enough.- This does imply that filtering on virtual columns will always be done matcher-style and will not use indexes. I think this is fine for a start, although one day we might want to figure out a way to use indexes when possible.
Gian
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CACZNdYCGC1MXZGgadU17CucRphmNw6sPmPJOvzOX37hRjJrpow%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3DwecRmCsb1Be3ZVVdT1xckiBsEpg9g-R9cA3LxRCSPxcDQ%40mail.gmail.com.
Gian, I am in general agreement with what you said in the last 2 posts. I guess, I can give more comments when there is PR.regarding select query, I agree that it should just have columns to output stuff.regarding extractionFn, it has two use cases1) to just transform the column value and do groupBy/topN etc on the mapped value ... no filtering here. this one should automatically be achieved by doing the "groupBy should be able to have virtual columns as groupBy keys"2) filtering on transformed values .... we are already enabling filters on virtual columns (that don't use indexes), so one version of filtering on transformed values is getting implemented anyway.with (1) and (2) done, filters with extractionFn is just an optimization (because (2) can do this too) and can be removed once we can manage index usage in filtering with virtual columns that can be left for the future.-- Himanshu
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CALuYO4eiXtmQPsjg1f6J8rvPvt3QUnVtqjx68%2BH69zrB%3DZMjgA%40mail.gmail.com.
PR for part 1: https://github.com/druid-io/druid/pull/3797. I didn't add DimensionSelector.makeValueMatcher(value) or DimensionSelector.makeValueMatcher(predicateFactory) because the benchmarks looked fine without them.
Gian
On Thu, Dec 15, 2016 at 12:26 AM, Himanshu <g.him...@gmail.com> wrote:
Gian, I am in general agreement with what you said in the last 2 posts. I guess, I can give more comments when there is PR.regarding select query, I agree that it should just have columns to output stuff.regarding extractionFn, it has two use cases1) to just transform the column value and do groupBy/topN etc on the mapped value ... no filtering here. this one should automatically be achieved by doing the "groupBy should be able to have virtual columns as groupBy keys"2) filtering on transformed values .... we are already enabling filters on virtual columns (that don't use indexes), so one version of filtering on transformed values is getting implemented anyway.with (1) and (2) done, filters with extractionFn is just an optimization (because (2) can do this too) and can be removed once we can manage index usage in filtering with virtual columns that can be left for the future.-- Himanshu
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/66dab10a-642d-436d-9103-ab8d12f1c507%40googlegroups.com.