Druid.io: update/override existing data via streams from Kafka (Druid Kafka indexing service)

Ilya Bochkov

Mar 21, 2018, 5:00:34 AM

to Druid Development

Hi,

I asked the same question in the Druid User forum but did not receive any reply, so I'm trying here.

I'm loading streams from Kafka using the Druid Kafka indexing service.

However, the data I load keeps changing, so I need to reload it while avoiding duplicates and collisions with data that was already loaded.

I read the docs on Updating Existing Data in Druid, but all the information there is about Hadoop batch ingestion and lookups.

Is it possible to update existing Druid data during Kafka streams?

In other words, I need to overwrite the old values with new ones using the Kafka indexing service (streams from Kafka).

Is there perhaps some setting to overwrite duplicates?

Thanks!

Ilya Bochkov

Apr 24, 2018, 3:28:08 AM

to Druid Development

So, nobody knows?

Gian Merlino

May 3, 2018, 2:51:11 PM

to druid-de...@googlegroups.com

Hi Ilya,

Kafka-based indexing is append only, so it's not possible to do updates that originate from the Kafka stream.

You would want to do updates in batch mode, or potentially use one of the strategies on the doc pages you linked (like lookups).

Gian
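To make the append-only point concrete, here is a toy Python model. It is not Druid code, and the class and method names are hypothetical. It loosely mimics Druid's versioned time chunks: streaming ingestion only appends rows, so a corrected record shows up as a second row next to the old one. A batch reingestion of an interval instead writes a newer version, which replaces everything queries see for that interval. Real segment partitioning, rollup, and task specs are deliberately ignored.

```python
from collections import defaultdict


class ToyDatasource:
    """Hypothetical, heavily simplified model of a Druid datasource.

    Each interval (time chunk) holds rows grouped by an integer version.
    Queries only see the highest version per interval, loosely mimicking
    how newer segments overshadow older ones. Not a Druid API.
    """

    def __init__(self):
        # interval -> {version -> list of rows}
        self._chunks = defaultdict(dict)

    def stream_append(self, interval, rows, version):
        # Streaming ingestion (e.g. from Kafka) can only add rows to the
        # current version; earlier rows for the same key are never replaced.
        self._chunks[interval].setdefault(version, []).extend(rows)

    def batch_overwrite(self, interval, rows, version):
        # Batch reingestion writes a complete, newer version of the interval.
        self._chunks[interval][version] = list(rows)

    def query(self, interval):
        # Return only the rows of the newest version for this interval.
        versions = self._chunks.get(interval)
        if not versions:
            return []
        return versions[max(versions)]


ds = ToyDatasource()
ds.stream_append("2018-03-21", [{"id": 1, "value": 10}], version=1)
# A corrected value for the same id arrives later on the stream.
ds.stream_append("2018-03-21", [{"id": 1, "value": 12}], version=1)
print(ds.query("2018-03-21"))  # both rows are visible: a duplicate for id 1

# Reloading the interval in batch mode replaces what queries see.
ds.batch_overwrite("2018-03-21", [{"id": 1, "value": 12}], version=2)
print(ds.query("2018-03-21"))  # only the corrected row remains
```

The model is only meant to show why updates have to be done per interval in batch: the stream has no way to retract a row it has already appended.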
