We're using Kafka for our current data pipeline. We need to implement a monitoring application that compares, for each hour, the records produced with the records that landed in HDFS. We've thought about a few implementations:
1) The Producer sends an audit record with the number of records it has produced in the past hour. The Consumer application does the same, and the two counts are compared. But there are many loopholes in this approach. Is there any other way this can be done?
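To make the audit-count idea above concrete, here is a minimal sketch of the comparison step only. It assumes each side can provide a list of record timestamps in epoch milliseconds. The function names (`hour_bucket`, `count_per_hour`, `compare_counts`) are hypothetical and not part of Kafka or Kafka Connect.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts_ms):
    """Truncate an epoch-millisecond timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def count_per_hour(timestamps_ms):
    """Count records per UTC hour (hypothetical helper for the audit record)."""
    return Counter(hour_bucket(ts) for ts in timestamps_ms)


def compare_counts(produced, landed):
    """Return {hour: (produced, landed)} for every hour where the counts differ."""
    hours = sorted(set(produced) | set(landed))
    return {
        h: (produced.get(h, 0), landed.get(h, 0))
        for h in hours
        if produced.get(h, 0) != landed.get(h, 0)
    }


if __name__ == "__main__":
    # Made-up timestamps: two records in the first hour, one in the second.
    produced = count_per_hour([0, 1_000, 3_600_000])
    # Assumed HDFS side: one record from the first hour is missing.
    landed = count_per_hour([0, 3_600_000])
    print(compare_counts(produced, landed))
```

One of the loopholes shows up here: both sides must bucket by the same timestamp (for example the record's event time rather than the wall-clock time at which it was counted), otherwise records near an hour boundary land in different buckets and produce false mismatches.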
NOTE: We have an HDFS sink connector for writing records to HDFS. Can we include monitoring in the connector itself? If so, how?
Please help!
Thanks in advance