Hi,
I have just started using Camus. I am planning to run the Camus job every hour. We get ~80,000,000 messages (~4 KB average size) every hour.
How do I set the following properties:
# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3
I don't clearly understand what these two settings do. Should I set kafka.max.historical.days to 1 and kafka.max.pull.hrs to 2?
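To make the two property comments concrete, here is a minimal Java sketch of one possible reading of them. It is not Camus's code, and the class and method names are hypothetical. It assumes that max.historical.days is a discard cutoff measured back from the start of the run. It also assumes that the max.pull.hrs window is measured from the first event read from a partition in that run. The comments themselves do not state either reference point.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of how the two settings might be applied per message.
// NOT the actual Camus implementation; reference points are assumptions.
public class PullWindowSketch {

    // kafka.max.historical.days (assumed reading): an event older than
    // (runStart - maxHistoricalDays) is discarded rather than written out.
    static boolean isDiscarded(Instant eventTime, Instant runStart, int maxHistoricalDays) {
        Instant cutoff = runStart.minus(Duration.ofDays(maxHistoricalDays));
        return eventTime.isBefore(cutoff);
    }

    // kafka.max.pull.hrs (assumed reading): stop pulling a partition once the
    // event time passes (first event time read in this run + maxPullHrs).
    static boolean pastPullWindow(Instant eventTime, Instant firstEventTime, int maxPullHrs) {
        return eventTime.isAfter(firstEventTime.plus(Duration.ofHours(maxPullHrs)));
    }

    public static void main(String[] args) {
        Instant runStart = Instant.parse("2015-01-04T12:00:00Z");
        Instant firstEvent = Instant.parse("2015-01-04T10:00:00Z");

        // Older than runStart - 3 days, so it would be discarded.
        System.out.println(isDiscarded(Instant.parse("2015-01-01T00:00:00Z"), runStart, 3));
        // 90 minutes after the first event, beyond a 1-hour window.
        System.out.println(pastPullWindow(Instant.parse("2015-01-04T11:30:00Z"), firstEvent, 1));
    }
}
```

Under this assumed reading the two values are independent: one is a per-run cap on how much event time is consumed from each partition, the other only decides which old events are thrown away.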
How does Camus pull the data? I also often see the following error:
"ERROR kafka.CamusJob: Offset range from kafka metadata is outside the previously persisted offset
Please check whether kafka cluster configuration is correct. You can also specify config parameter: kafka.move.to.earliest.offset to start processing from earliest kafka metadata offset."
How should I configure Camus to run every hour and avoid that error?
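The error text suggests a check along the following lines. This minimal Java sketch restates it under stated assumptions. It is not Camus's actual code, and the class and method names are hypothetical. The only real setting it refers to is kafka.move.to.earliest.offset, which the error message itself names. The sketch assumes a partition can resume from the offset persisted by the previous run only if that offset still lies between the earliest and latest offsets reported by Kafka.

```java
// Hypothetical sketch of the offset check described by the error message.
// NOT the actual Camus implementation.
public class OffsetCheckSketch {

    // persisted: offset saved by the previous Camus run for this partition
    // earliest/latest: offsets currently reported by Kafka metadata
    // moveToEarliest: stands in for kafka.move.to.earliest.offset
    static long resolveStartOffset(long persisted, long earliest, long latest, boolean moveToEarliest) {
        if (persisted >= earliest && persisted <= latest) {
            return persisted; // normal case: resume where the last run stopped
        }
        if (moveToEarliest) {
            return earliest; // skip whatever is no longer available in Kafka
        }
        throw new IllegalStateException(
            "Offset range from kafka metadata is outside the previously persisted offset");
    }

    public static void main(String[] args) {
        System.out.println(resolveStartOffset(150, 100, 200, false)); // in range
        System.out.println(resolveStartOffset(50, 100, 200, true));   // fell off retention
    }
}
```

One likely cause, which is an assumption and not confirmed by the log, is that the topic's retention deleted messages before Camus read them. At roughly 80,000,000 × 4 KB ≈ 320 GB per hour, a short retention window or a few failed runs could be enough. Enabling move-to-earliest avoids the error but silently skips the lost range, so retention would be worth checking first.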