Setting frequency for Kafka jobs

Prachi G

Nov 5, 2015, 6:07:25 AM

to Camus - Kafka ETL for Hadoop

Hi,

I have just started with Camus and am planning to run the Camus job every hour. We get ~80,000,000 messages (~4KB average size) every hour.
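
For scale, here is a rough sketch of what those figures imply for each hourly run. It assumes "4KB" means 4 KiB and that the averages hold; the names are illustrative only and are not Camus settings.

```python
# Back-of-the-envelope volume estimate from the figures quoted above.
MESSAGES_PER_HOUR = 80_000_000
AVG_MESSAGE_BYTES = 4 * 1024  # assumption: "4KB" means 4 KiB


def hourly_volume_bytes(messages: int, avg_bytes: int) -> int:
    """Total bytes produced in one hour at the given average message size."""
    return messages * avg_bytes


total = hourly_volume_bytes(MESSAGES_PER_HOUR, AVG_MESSAGE_BYTES)
print(f"{total / 1024**3:.1f} GiB per hour")          # 305.2 GiB per hour
print(f"{total / 3600 / 1024**2:.1f} MiB per second")  # 86.8 MiB per second
```

So each hourly run would have to move roughly 305 GiB, which is the sustained rate the job needs to keep up with.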

How do I set the following properties:

# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3

I am not able to make out what these settings do. Should I set the days property to 1 and the hours property to 2?
How does Camus pull the data? I also often see the following error:


"ERROR kafka.CamusJob: Offset range from kafka metadata is outside the previously persisted offset

Please check whether kafka cluster configuration is correct. You can also specify config parameter: kafka.move.to.earliest.offset to start processing from earliest kafka metadata offset."

How do I set the configurations correctly to run every hour and avoid that error?
