The doc titled 'Spark and Druid: a suitable match' mentions that the package expects the following picture:
* raw event data is landing in HDFS/S3, and a Druid index is kept up to date
(i) Could someone elaborate on how the Druid index needs to be created (spec, etc.) and kept up to date? (I've sketched my current understanding of the batch route below.)
- Should the user write the raw data to HDFS/S3 and then explicitly update the Druid index after each load?
- Or is it sufficient to write the raw data to Druid directly and rely on Druid persisting the segments to HDFS/S3 (deep storage)?
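To make the first option concrete, here is roughly what I imagine the batch route would look like: a Hadoop-based ingestion spec submitted to the Overlord (`POST /druid/indexer/v1/task`) each time a new batch of raw data lands. This is just my own sketch, not taken from the doc; the datasource name ("events"), input path, columns, and interval are all placeholders:

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["country", "device"] }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "revenue", "fieldName": "revenue" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "intervals": ["2016-01-01/2016-01-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "hdfs://namenode/raw/events/2016-01-01" }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}
```

If that is the intended workflow, is the expectation to submit one such task per new interval as data arrives, or does the package assume the segments come from real-time ingestion instead?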
Thanks,
Jithin