Kafka rebalancing produces duplicate events

Lukáš Havrlant

Nov 3, 2014, 2:37:23 AM
to druid-de...@googlegroups.com
Hi!
We've encountered a problem with Kafka rebalancing. Normally, when Kafka starts a rebalance, the consumer first stops consuming new messages and commits its current offset, and only then does the rebalance itself happen. In our Druid spec file, however, we have "auto.commit.enable": "false", because the realtime node takes care of committing the offset. That setting is (probably) also telling the Kafka consumer not to save the offset when a rebalance starts, so after each rebalance some messages are read twice. Is this a known issue? How can we prevent it? We've tried a workaround: we set "auto.commit.enable": "true" and "auto.commit.interval.ms" to a really large number, so autocommit is enabled but never actually happens.
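
A minimal sketch of that workaround, written in Python only to make the two property changes explicit. The helper name `workaround_consumer_props` and the interval value are assumptions for illustration: the post does not say which large number was used, so the sketch takes the largest 32-bit signed integer. Any other consumer properties in the spec file are left untouched.

```python
import json

# Assumption: "a really large number" is taken here as the largest 32-bit
# signed integer, roughly 24.9 days in milliseconds. The post gives no value.
NEVER_MS = 2**31 - 1


def workaround_consumer_props(base_props):
    """Return a copy of the consumer properties with the workaround applied.

    Hypothetical helper: autocommit is switched on, but the interval is so
    long that a periodic commit should effectively never fire.
    """
    props = dict(base_props)  # copy, so the caller's dict is not modified
    props["auto.commit.enable"] = "true"
    props["auto.commit.interval.ms"] = str(NEVER_MS)
    return props


original = {"auto.commit.enable": "false"}
patched = workaround_consumer_props(original)
print(json.dumps(patched, indent=2))
```

Whether the consumer then commits on rebalance depends on the Kafka client version in use, so this is worth verifying against a test topic before relying on it.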

Fangjin Yang

Nov 3, 2014, 5:29:37 PM
to druid-de...@googlegroups.com
Hi Lukáš, it is a known issue that Druid can potentially duplicate messages during real-time ingestion. It is currently not possible to do exactly-once ingestion into Druid, but we do plan to address this in the future. For now, you can run a companion batch process to clean up the data.
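
As a rough illustration of the deduplication step such a batch process might perform, here is a minimal Python sketch. It is not Druid's batch ingestion API. It assumes each raw event was stored together with its Kafka topic, partition, and offset, which is one possible identity key; the function name `deduplicate` and the field names are hypothetical.

```python
def deduplicate(events, key_fields=("topic", "partition", "offset")):
    """Keep the first occurrence of each event, identified by key_fields.

    Assumption: every event is a dict that carries the Kafka coordinates it
    was read from, so a message re-read after a rebalance has the same key.
    """
    seen = set()
    unique = []
    for event in events:
        key = tuple(event[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


raw = [
    {"topic": "events", "partition": 0, "offset": 41, "value": "a"},
    {"topic": "events", "partition": 0, "offset": 42, "value": "b"},
    # Same message read a second time after a rebalance.
    {"topic": "events", "partition": 0, "offset": 42, "value": "b"},
]
clean = deduplicate(raw)
print(len(raw), "->", len(clean))
```

The cleaned events would then be re-ingested in batch to replace the real-time segments for the same interval.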