join on sparkline tables


manasa danda

Oct 20, 2016, 2:34:21 PM10/20/16
to sparklinedata
Hi,

Is it possible to join two Sparkline tables that point to two different Druid datasources?

Thanks,
Manasa

Laljo John Pullokkaran

Oct 20, 2016, 2:48:37 PM10/20/16
to manasa danda, sparklinedata
Yes, it is possible.
You have to set nonAggregateQueryHandling to push_project_and_filters in the data source options.

Our optimizer will try to push Group-Bys and filters below the join.
If no Group-By or filter gets pushed, Druid acts as a simple scan.
With a simple scan, query performance won't be that great.
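As a rough sketch, the option goes into the OPTIONS clause of the Druid-backed table definition. The table, datasource, and host names below are hypothetical, and the surrounding options follow the project's usual datasource definition, so check them against your own setup:

```sql
-- Hypothetical names; only nonAggregateQueryHandling is the setting
-- discussed above.
CREATE TABLE sales_druid
USING org.sparklinedata.druid
OPTIONS (
  sourceDataframe "sales_base",
  timeDimensionColumn "time",
  druidDatasource "sales",
  druidHost "localhost",
  nonAggregateQueryHandling "push_project_and_filters"
);
```

With this set, non-aggregate plans (plain projections and filters) can also be handed to Druid instead of falling back to a full scan on the Spark side.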

We are planning to add a semi-join optimization, where we would ship the keys from the small table as a filter to the larger fact table.
This would reduce the amount of data that needs to be scanned.
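Conceptually, the rewrite turns a star join into a filtered scan of the fact table. A minimal sketch, with hypothetical table and column names:

```sql
-- Before: Druid must scan the full fact index to feed the join.
SELECT f.product_id, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product d ON f.product_id = d.product_id
WHERE d.category = 'Books'
GROUP BY f.product_id;

-- After the semi-join rewrite: the qualifying dimension keys are
-- shipped as an IN-filter, so the fact scan touches far fewer rows.
SELECT f.product_id, SUM(f.revenue)
FROM fact_sales f
WHERE f.product_id IN
  (SELECT product_id FROM dim_product WHERE category = 'Books')
GROUP BY f.product_id;
```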

Hope that helps.

Please reach out if you have any additional questions.

Thanks
John



harish

Oct 20, 2016, 5:35:07 PM10/20/16
to sparklinedata, manas...@gmail.com, jo...@sparklinedata.com
Just to reiterate:

- We execute Spark plans, so you can run joins on any two tables, including when one or both have a Druid index. Any Spark plan will run; the key thing we strive for is to optimize it as much as possible.
- To get faster execution, use the nonAggregateQueryHandling option; this pushes non-aggregation plans to Druid, so you get the benefit of filtered scans in these cases.
- We go further by pushing Group-Bys under joins when possible. See https://github.com/SparklineData/spark-druid-olap/wiki/Introducing-Logical-Optimizer
- We will provide semi-join optimization in an upcoming release; this will considerably speed up things like star joins (where dimension columns are not part of the index).

Would love to understand your use-case better.

Harish. 
