Druid supports ingesting both raw and pre-aggregated data (aggregated on dimensions). What are the advantages and disadvantages of providing pre-aggregated data? Also, how does Druid aggregate raw data?
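The question of how raw data gets aggregated can be illustrated with a small Python model of rollup. This is a sketch under stated assumptions (an hourly granularity and a made-up schema with two dimensions, a `count` metric and a hypothetical `bytes` metric), not Druid's implementation. It floors each timestamp to the hour, groups rows by (hour, dimension values), and sums the metrics. Raw events enter with `count` = 1, while pre-aggregated rows carry the number of events they already summarize, so both inputs produce the same rolled-up table as long as the count column is summed rather than counted.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_hour(ts):
    # Floor a timestamp to the start of its hour (assumed hourly granularity).
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup(rows):
    """Group (timestamp, dims, count, bytes) rows by (hour, dims) and sum metrics.

    Raw events carry count=1; pre-aggregated rows carry the number of events
    they already summarize. Summing count treats both kinds of input alike.
    """
    totals = defaultdict(lambda: [0, 0])
    for ts, dims, count, nbytes in rows:
        key = (truncate_to_hour(ts), dims)
        totals[key][0] += count
        totals[key][1] += nbytes
    return {key: tuple(vals) for key, vals in totals.items()}


# Hypothetical raw events: one row per event, count=1, "bytes" is made up.
raw = [
    (datetime(2016, 5, 1, 10, 1), ("US", "web"), 1, 100),
    (datetime(2016, 5, 1, 10, 7), ("US", "web"), 1, 250),
    (datetime(2016, 5, 1, 10, 59), ("US", "app"), 1, 40),
]

# The same traffic, pre-aggregated upstream to the hour.
pre_aggregated = [
    (datetime(2016, 5, 1, 10, 0), ("US", "web"), 2, 350),
    (datetime(2016, 5, 1, 10, 0), ("US", "app"), 1, 40),
]

print(rollup(raw) == rollup(pre_aggregated))  # → True
```

This also hints at the trade-off being asked about: pre-aggregating reduces the volume you send and ingest, but any time resolution or dimension you aggregate away before ingestion cannot be recovered later.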
--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/756b84a3-be3f-4822-aada-689ddccfa2b3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thanks for the quick response, Fangjin and Slim. I am adding raw data to Druid using Spark Streaming and Tranquility. Now I have two more questions.
1. My real-time streaming job runs at an interval of 2 minutes, but I need a time granularity of one hour in Druid. Because my streaming interval is 2 minutes, I can only pre-aggregate data over 2 minutes. I have to ingest data immediately because I also need to support queries up to the present moment. I can run reindexing to get pre-aggregated data for past hours. Is there any other way to achieve the same result?
2. While querying Druid using PlyQL, I am getting rows with already aggregated data. For example, if I add 100 events with the same dimensions and only count as my metric, then querying "select * from datasource" returns 1 row with a count of 100. So is Druid itself aggregating some data before ingestion?
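For question 1, one option worth checking is letting Druid do the hourly bucketing instead of pre-aggregating to the hour yourself. The Python sketch below is a simplified model, assuming the datasource's `queryGranularity` is set to hour, in which case timestamps are floored to the hour at ingestion and 2-minute batches for the same hour share the same key. Each hypothetical batch is rolled up on its own, as it might be if batches end up stored as separate pieces, and the query step sums the count metric across them. In this model the hourly totals match rolling up all events at once; only the number of stored rows differs.

```python
from collections import Counter
from datetime import datetime


def truncate_to_hour(ts):
    # Floor to the hour, modelling an assumed queryGranularity of hour.
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup_batch(events):
    """Roll up one micro-batch: key = (hour, dimension), value = event count."""
    return Counter((truncate_to_hour(ts), dim) for ts, dim in events)


# Two hypothetical 2-minute Spark Streaming batches within the same hour.
batch_a = [(datetime(2016, 5, 1, 10, 0, 30), "US"),
           (datetime(2016, 5, 1, 10, 1, 10), "US")]
batch_b = [(datetime(2016, 5, 1, 10, 2, 5), "US"),
           (datetime(2016, 5, 1, 10, 3, 40), "FR")]

# If each batch is stored separately, each keeps its own rolled-up rows.
stored = [rollup_batch(batch_a), rollup_batch(batch_b)]

# A query at hourly granularity sums the count metric across stored rows.
query_result = sum(stored, Counter())

print(sum(len(rows) for rows in stored))  # → 3
print(len(query_result))                  # → 2
print(query_result == rollup_batch(batch_a + batch_b))  # → True
```

Under these assumptions, reindexing past hours would mainly compact storage rather than change the hourly query results.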
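The behavior in question 2 is consistent with Druid's ingestion-time rollup: rows that share the same truncated timestamp and the same dimension values are stored as a single row, and an ingestion-time `count` metric records how many raw events were combined. The Python sketch below models that with 100 identical hypothetical events (the field names `country` and `device` are made up). A consequence worth noting is that counting rows at query time counts stored rows, not events; to get the number of events you sum the `count` column (in Druid, a `longSum` aggregator over it).

```python
from collections import defaultdict

# 100 hypothetical raw events with identical dimensions and the same
# timestamp (assumed already floored to the query granularity).
events = [{"time": "2016-05-01T10:00:00Z", "country": "US", "device": "web"}] * 100


def ingest_with_rollup(events, dims):
    """Model ingestion-time rollup: one stored row per (time, dims) key,
    with a 'count' metric holding the number of raw events it represents."""
    table = defaultdict(int)
    for e in events:
        key = (e["time"],) + tuple(e[d] for d in dims)
        table[key] += 1
    return [dict(zip(("time",) + dims, key), count=n) for key, n in table.items()]


rows = ingest_with_rollup(events, ("country", "device"))
print(len(rows))                      # stored rows → 1
print(sum(r["count"] for r in rows))  # raw events → 100
```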