Druid with Spark SQL


pde...@cloudfabrix.com

Feb 18, 2016, 1:52:11 PM
to Druid User
Hi there,

I am looking for examples of how to integrate Druid with Spark SQL. My analytics jobs currently run on Spark SQL, and I would like to try using DataFrames with Druid to see whether it improves performance.
I am not clear on how to set up a Druid index that converts Spark SQL DataFrames into Druid datasources.

Any help is really appreciated.

Thanks,
Purvi

Fangjin Yang

Feb 18, 2016, 2:02:35 PM
to Druid User

pde...@cloudfabrix.com

Feb 18, 2016, 2:19:37 PM
to Druid User
Thanks for the quick response, Fangjin. Yes, I have looked at the example, but I am not clear on how to do it.

I have sample data in JSON format. I pass that data to Spark SQL and convert it to a DataFrame, using the Java API.

The example shows that I next need to create a Druid datasource by creating a temporary table.
I am not sure what to set for the druiddatasource and starschema values when creating the temporary table. Also, do I need to set up Druid indexes before doing anything here?

Thanks,
Purvi

Harish Butani

Feb 18, 2016, 3:18:32 PM
to Druid User
Hi Purvi,

1. Yes, you need to index your dataset. Currently you have to do this outside of Spark, using one of the indexing methods in Druid (indexing service, HadoopIndexer, etc.).
2. You need to set up a DruidDataSource in Spark to expose a Druid datasource in Spark.

Can you send me a separate email so we can help you get started?

regards,
Harish.
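
As a rough illustration of step 1 above, the following Python sketch builds a batch index task and shows how it could be submitted to the indexing service (Overlord) over HTTP. The datasource name, timestamp and dimension column names, file paths, host, and port are all placeholders invented for this example. The spec layout follows the older `index` task format with a `firehose`, and field names differ between Druid versions, so treat it as a starting point and check the ingestion docs for the version you run.

```python
import json
import urllib.request


def build_index_task(datasource, base_dir, file_filter, dimensions, interval):
    """Build a batch 'index' task spec as a plain dict (older firehose layout)."""
    return {
        "type": "index",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "json",
                        # Assumes each JSON row has a "timestamp" field.
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": dimensions},
                    },
                },
                # A simple row count; add sum/min/max metrics as needed.
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index",
                "firehose": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
            },
        },
    }


def submit_task(task, overlord_url):
    """POST the task JSON to the Overlord and return the parsed JSON response."""
    req = urllib.request.Request(
        overlord_url + "/druid/indexer/v1/task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical datasource, path, and columns, for illustration only.
task = build_index_task(
    "events", "/tmp/data", "events.json", ["country", "device"], "2016-02-01/2016-03-01"
)
print(json.dumps(task, indent=2))
# submit_task(task, "http://localhost:8090")  # requires a running Overlord
```

Once the task succeeds and the segments are loaded, the datasource name used here is the one a DruidDataSource in Spark would point at (step 2).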

Federico Nieves

Nov 15, 2016, 3:47:39 PM
to Druid User
Hi Harish, how are you?

I'm also trying to connect Spark to our current Druid environment, but I'm having trouble starting the Thrift server. Is there any more detailed documentation?

As you said, I need to create a DruidDataSource to be able to query structured data that is already loaded into Druid, with Hadoop as deep storage. Any help will be much appreciated!

Thanks,

Chanh Le

Nov 16, 2016, 10:11:11 PM
to Druid User
I tried spark-line-druid for a month and it works well. It makes your queries faster because parts of the execution plan, such as filters, sums, and so on, are pushed down to Druid.
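
To make the pushdown idea concrete: a Spark SQL statement such as `SELECT SUM(revenue) FROM events WHERE country = 'US'` could in principle be answered by one Druid native query rather than by scanning rows in Spark. The sketch below builds such a timeseries query in Python. It is not the query the connector actually generates, and the datasource, dimension, and metric names are made up for illustration.

```python
import json


def timeseries_sum_query(datasource, interval, filter_dim, filter_value, metric):
    """Build a Druid native timeseries query: one filter plus one sum."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",  # a single result bucket for the whole interval
        "intervals": [interval],
        # Equivalent of: WHERE <filter_dim> = '<filter_value>'
        "filter": {
            "type": "selector",
            "dimension": filter_dim,
            "value": filter_value,
        },
        # Equivalent of: SUM(<metric>) AS total
        "aggregations": [
            {"type": "doubleSum", "name": "total", "fieldName": metric}
        ],
    }


# Hypothetical names, for illustration only.
query = timeseries_sum_query(
    "events", "2016-02-01/2016-03-01", "country", "US", "revenue"
)
print(json.dumps(query, indent=2))
```

A query like this is normally sent as JSON in an HTTP POST to the Broker's `/druid/v2/` endpoint, so only the aggregated result travels back to Spark.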

mona kumar

Feb 15, 2017, 2:37:20 PM
to Druid User
Hello,
We are doing a POC with Druid and are looking for how to create a Druid datasource from a Java Spark program. We also need to set up Druid indexes before querying.
Any help or pointers would be really appreciated.

Thanks
Mona


