kafka-hadoop


Daniel

Nov 28, 2016, 6:59:25 AM
to Camus - Kafka ETL for Hadoop
hi,

We have been using Camus for a long time, and we have an issue with duplicated messages.

Our flow is: publish events to Kafka -> consume and write to HDFS with Camus.

After the whole process finishes, we have noticed that some events are written to Hadoop more than once.

Has anyone had the same issue, or does anyone have an idea what the cause could be?

thanks in advance.



Félix GV

Nov 29, 2016, 11:50:59 AM
to Daniel, Camus - Kafka ETL for Hadoop
You need to use the camus-sweeper module to dedupe.
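
To illustrate what deduplication means in this context, here is a minimal Python sketch. It is not how camus-sweeper is implemented. It assumes each event carries a unique identifier in a hypothetical `event_id` field and that the set of identifiers fits in memory, which may not hold for large HDFS partitions.

```python
from typing import Dict, Iterable, Iterator


def dedupe_events(events: Iterable[Dict], key_field: str = "event_id") -> Iterator[Dict]:
    """Yield each event once, keeping the first occurrence of every key.

    `key_field` is a hypothetical name for a unique per-event identifier.
    """
    seen = set()
    for event in events:
        key = event[key_field]
        if key in seen:
            continue  # already emitted an event with this key, skip the duplicate
        seen.add(key)
        yield event


if __name__ == "__main__":
    # A small batch in which the event "a1" was written twice.
    batch = [
        {"event_id": "a1", "payload": "login"},
        {"event_id": "b2", "payload": "click"},
        {"event_id": "a1", "payload": "login"},
    ]
    for event in dedupe_events(batch):
        print(event)
```

A pass like this only works if the events have a stable unique key; without one, duplicates cannot be told apart from legitimately repeated events.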

You should also note that Camus is not actively maintained anymore.

Confluent is now supporting Kafka Connect.

LinkedIn now supports the open-source Gobblin project, which has a migration guide from Camus to Gobblin here: https://github.com/linkedin/gobblin/blob/master/gobblin-docs/miscellaneous/Camus-to-Gobblin-Migration.md

-F

Daniel

Dec 8, 2016, 10:15:35 AM
to Camus - Kafka ETL for Hadoop, daniel....@gmail.com
thanks!