Setting frequency for Kafka jobs


Prachi G

Nov 5, 2015, 6:07:25 AM
to Camus - Kafka ETL for Hadoop
Hi,

I have just started with Camus. I am planning to run a Camus job every hour. We receive ~80,000,000 messages (~4 KB average size) every hour.
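
For scale, here is a rough back-of-the-envelope calculation of what each hourly run would have to pull. It assumes 1 KB = 1024 bytes and that the figures above are steady averages; the variable names are only illustrative.

```python
# Figures quoted in the post; "KB" is assumed to mean 1024 bytes.
MESSAGES_PER_HOUR = 80_000_000
AVG_MESSAGE_BYTES = 4 * 1024

# Total volume one hourly run would need to pull.
bytes_per_hour = MESSAGES_PER_HOUR * AVG_MESSAGE_BYTES
gib_per_hour = bytes_per_hour / 1024**3

# Sustained rates if the load is spread evenly over the hour.
messages_per_second = MESSAGES_PER_HOUR / 3600
mib_per_second = bytes_per_hour / 3600 / 1024**2

print(f"{gib_per_hour:.1f} GiB per hour")
print(f"{messages_per_second:.0f} messages per second")
print(f"{mib_per_second:.1f} MiB per second")
```

So each run has roughly 305 GiB to move, which the job has to finish in under an hour to keep up.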

How do I set the following properties:

# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3
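
One possible reading of the two comments above, sketched in Python. This is an assumption based only on the comment text, not Camus's actual implementation: the function `classify_event` and all of its arguments are hypothetical, and the real job may anchor its time windows differently.

```python
from datetime import datetime, timedelta


def classify_event(event_time, run_time, first_pulled_time,
                   max_historical_days, max_pull_hrs):
    """Hypothetical helper: decide what happens to one event in a partition.

    Returns "discard" for events older than the historical cutoff,
    "stop" once the pull window for the partition is exceeded,
    and "pull" otherwise.
    """
    # Assumed meaning of kafka.max.historical.days:
    # events older than (run time - N days) are discarded.
    if event_time < run_time - timedelta(days=max_historical_days):
        return "discard"
    # Assumed meaning of kafka.max.pull.hrs:
    # a run pulls at most N hours of event time per partition,
    # measured from the first event it pulled.
    if event_time > first_pulled_time + timedelta(hours=max_pull_hrs):
        return "stop"
    return "pull"


run_time = datetime(2015, 11, 5, 6, 0)
first_pulled = run_time - timedelta(hours=5)

old_event = classify_event(run_time - timedelta(days=4), run_time,
                           first_pulled, 3, 1)
in_window = classify_event(first_pulled + timedelta(minutes=30), run_time,
                           first_pulled, 3, 1)
past_window = classify_event(first_pulled + timedelta(hours=2), run_time,
                             first_pulled, 3, 1)
print(old_event, in_window, past_window)
```

Under this reading, the days property sets how far back events are accepted at all, while the hours property caps how much event time a single run consumes per partition.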

I cannot work out what these settings mean. Should I set the days property to 1 and the hours property to 2?
How does Camus pull the data? I also often see the following error:



"ERROR kafka.CamusJob: Offset range from kafka metadata is outside the previously persisted offset

Please check whether kafka cluster configuration is correct. You can also specify config parameter: kafka.move.to.earliest.offset to start processing from earliest kafka metadata offset."


How do I set the configurations correctly to run every hour and avoid that error?