The doc titled 'Spark and Druid: a suitable match' mentions that the package expects the following picture:
* raw event data is landing in HDFS/S3, and a Druid index is kept up to date
(i) Could someone elaborate on how the Druid index needs to be created (spec, etc.) and kept up to date? (I've sketched my current understanding of the batch route below.)
- Should the user write the raw data to HDFS/S3 and then explicitly update the Druid index after each load?
- Or is it sufficient to write the raw data to Druid directly and rely on Druid persisting the segments to HDFS/S3 (deep storage)?
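To make the first option concrete, here is roughly what I imagine the batch route would look like: a Hadoop-based ingestion spec submitted to the Overlord (`POST /druid/indexer/v1/task`) each time a new batch of raw data lands. This is just my own sketch, not taken from the doc; the datasource name ("events"), input path, columns, and interval are all placeholders:

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["country", "device"] }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "revenue", "fieldName": "revenue" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "intervals": ["2016-01-01/2016-01-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "hdfs://namenode/raw/events/2016-01-01" }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}
```

If that is the intended workflow, is the expectation to submit one such task per new interval as data arrives, or does the package assume the segments come from real-time ingestion instead?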
Thanks,
Jithin