Kafka Indexing Service - Multiple Kafka topics per task


Arul Govindarajan

Nov 17, 2016, 1:41:11 PM
to Druid User
Is there any reason why the ability to handle multiple Kafka topics (using a pattern) was removed from the new Kafka indexing service? That would have been hugely useful for my use case. I have tens of Kafka topics (and will eventually hit 100) that feed data to my Druid cluster. As it stands now, each topic needs to be handled by a different task, which means a worker per topic (not including the replicas and partitions). Each worker, if I understand it right, is a JVM, and the Kafka tasks stay attached to the worker for their lifetime, which is pretty much never-ending. That means a ton of resources just to run the Kafka indexing tasks.

Any thoughts on how I can work around this issue?

Thanks, Arul

David Lim

Nov 21, 2016, 3:36:07 PM
to Druid User
Hey Arul,

There's no technical reason multiple topics couldn't be implemented in the indexing service, but it was omitted in the initial implementation because of the added complexity in supporting exactly-once ingestion across multiple topics. One possible solution is to add a stream processor (maybe look at Kafka Streams?) before Druid that will merge the different feeds into a single Kafka topic.

Linbo Jin

Nov 21, 2016, 7:30:54 PM
to Druid User
Hi David,

Actually, we have the same resource limitation problem after moving from batch indexing to the Kafka indexing service. Based on your reply, we can use Kafka Streams to merge multiple Kafka topics into a single one, but how can Druid then index data into multiple datasources from this single merged topic?

David Lim

Nov 21, 2016, 7:42:52 PM
to Druid User
Right now, topics to datasources are mapped 1-to-1, and any joining or splitting of streams needs to be done before Druid ingestion. You should be able to use a stream processing technology to transform n original topics into m transformed topics which will map 1-to-1 with m Druid datasources.
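
As a rough illustration of the n-to-m mapping described above, here is a minimal Python sketch of the routing logic only. It assumes in-memory lists standing in for Kafka topics; the topic names, the `ROUTING` table, the `route` helper, and the `source_topic` field are all hypothetical and not part of Druid or Kafka. In a real deployment this logic would live inside a stream processor such as Kafka Streams, reading from the original topics and writing to the transformed ones.

```python
from collections import defaultdict

# Hypothetical routing table: original topic -> transformed topic.
# Each transformed topic would then map 1-to-1 to a Druid datasource.
ROUTING = {
    "clicks-us": "clicks",
    "clicks-eu": "clicks",
    "orders-us": "orders",
    "orders-eu": "orders",
}


def route(records, routing):
    """Group (source_topic, event) pairs by their transformed topic."""
    out = defaultdict(list)
    for source_topic, event in records:
        # An unmapped topic raises KeyError rather than being dropped silently.
        target = routing[source_topic]
        # Keep the origin as a field so it can still be used as a dimension.
        out[target].append({**event, "source_topic": source_topic})
    return dict(out)


# Stand-in for records consumed from the n original topics.
records = [
    ("clicks-us", {"user": "a"}),
    ("orders-eu", {"order_id": 7}),
    ("clicks-eu", {"user": "b"}),
]
routed = route(records, ROUTING)
```

Mapping every original topic to the same target gives the single merged topic case (m = 1); with several targets, one indexing task per transformed topic is needed instead of one per original topic.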

Jason Cheow

Nov 24, 2016, 3:35:52 AM
to Druid User
Hi David,

Are there any plans in the roadmap for a single task to handle multiple topics?

Arul Govindarajan

Nov 25, 2016, 9:04:31 AM
to druid...@googlegroups.com
Thanks David.

I will take a look at Kafka Streams.

David Lim

Nov 28, 2016, 3:09:40 PM
to Druid User
Hey Jason,

Reading from multiple Kafka topics into a single datasource isn't currently on the roadmap. If this is an important feature for you, could you raise an issue for it so that we can track it?

Jason Cheow

Dec 6, 2016, 10:48:38 PM
to Druid User
Thanks, David. Actually, what I was referring to is the ability for a single task to read from N Kafka topics into N datasources.

I have created an issue for this: https://github.com/druid-io/druid/issues/3752

Saurabh Gupta

Feb 17, 2017, 1:11:04 AM
to Druid User
I too have a feature request for the original question: support for a topic pattern in the Kafka indexing service extension. I have created this issue for it: https://github.com/druid-io/druid/issues/3945