Hi Fangjin,

Exactly-once Kafka -> Druid ingestion is good to know about, but I am asking about duplicate handling during batch ingestion. Say we ingest 100k data points every 30 minutes and want to ensure the same data point was not already ingested by the previous 30-minute job, without doing any checks before ingesting. Are there any ways to handle this in Druid?

Also, even if I do ingest duplicates, how do I get the unique count considering all dimension and metric values?

Thanks,
Manish
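As background on how Druid handles this case: a batch index task replaces any existing segments in the intervals listed in its granularitySpec, so re-running the same 30-minute window overwrites that window's data rather than duplicating it. A minimal sketch of such a spec, with the parser, ioConfig, and tuningConfig sections omitted, and the dataSource name and interval made up for illustration:

    {
      "type": "index",
      "spec": {
        "dataSchema": {
          "dataSource": "events",
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "THIRTY_MINUTE",
            "queryGranularity": "MINUTE",
            "intervals": ["2016-01-14T06:00:00Z/2016-01-14T06:30:00Z"]
          }
        }
      }
    }

Because the intervals list pins the task to one window, submitting the same task twice leaves one copy of the data for that window. Druid's rollup also collapses rows within a batch that share the same truncated timestamp and dimension values into a single aggregated row.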
Hi Manish, if you use batch ingestion, it should be 100% accurate in terms of the data you put in. If you are looking for exactly-once streaming ingestion, we are working towards this for Kafka -> Druid, and you should follow this PR: https://github.com/druid-io/druid/pull/2220
On Thursday, January 14, 2016 at 6:18:43 AM UTC-6, Manish Deora wrote:

Are there any ways to avoid duplicate data ingestion in Druid? I couldn't find any in the documentation. Also, what metric spec should be used to count the unique values of a particular dimension?
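On the metric spec part of the question: Druid's hyperUnique aggregator gives an approximate distinct count and can be registered at ingestion time in the metricsSpec. A sketch, with illustrative field names:

    "metricsSpec": [
      { "type": "count", "name": "count" },
      { "type": "hyperUnique", "name": "unique_user_ids", "fieldName": "user_id" }
    ]

At query time the metric is read back with a hyperUnique aggregator of the same name; alternatively, the query-time cardinality aggregator can compute an approximate distinct count over a dimension that was not pre-aggregated. Both are HyperLogLog-based estimates rather than exact counts.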