What features does Kafka have for data deduplication?


Kafka123

Dec 13, 2016, 4:20:33 PM
to Confluent Platform
Hello,
I want to know what Kafka offers for handling duplicates. For example, I have two producers, and sometimes they might produce the same message to a Kafka topic (note: in the ideal case every message is unique). How can Kafka avoid storing both copies in the topic? I do not want to do anything on the producer side to avoid duplication.
1) What does Kafka offer for deduplication, or anything related to it?
2) If Kafka doesn't offer anything, we'll probably have to do that on the consumer side.

Thank you!


Eno Thereska

Dec 14, 2016, 2:15:04 AM
to Confluent Platform
Kafka does not do data deduplication for your scenario.

Sometimes, under failure (for example, when a producer retries after a lost acknowledgement), the same message might be written twice. For that case there is currently a KIP, KIP-98, that proposes exactly-once delivery: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

However, I believe you are talking about a different kind of deduplication, one that is not caused by failure: two producers independently sending messages with the same content.
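Since the broker will not drop the second copy in that case, the filtering would have to happen in the consumer, as the question suggests. Below is a minimal sketch of one way to do it in Python, not a Kafka feature. It assumes that two messages count as duplicates when their payload bytes are identical, and that duplicates arrive close enough together that remembering only the most recent payloads is sufficient. The names `Deduplicator` and `unique_messages` are hypothetical helpers, and `messages` stands in for whatever iterable of payloads your consumer client gives you.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Hypothetical helper: remembers digests of the most recent payloads."""

    def __init__(self, capacity=100_000):
        # Assumption: a bounded window is enough; older digests are forgotten.
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_duplicate(self, payload: bytes) -> bool:
        # Store a fixed-size digest rather than the full payload.
        digest = hashlib.sha256(payload).digest()
        if digest in self._seen:
            # Refresh its position so recently repeated payloads stay remembered.
            self._seen.move_to_end(digest)
            return True
        self._seen[digest] = None
        if len(self._seen) > self.capacity:
            # Evict the least recently seen digest.
            self._seen.popitem(last=False)
        return False


def unique_messages(messages, dedup):
    """Yield only payloads that the deduplicator has not seen recently."""
    for payload in messages:
        if not dedup.is_duplicate(payload):
            yield payload


if __name__ == "__main__":
    # Stand-in for payloads read from a topic by two producers.
    incoming = [b"order-1", b"order-2", b"order-1", b"order-3"]
    for payload in unique_messages(incoming, Deduplicator()):
        print(payload.decode())
```

Note that this state lives in one consumer's memory, so it is lost on restart and is not shared between consumers. With several consumers, duplicates would only be caught if both copies land on the same consumer, for example by having the producers use the same message key so the copies go to the same partition.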

Thanks
Eno