Duplicate Message Processing with Kafka Rebalance in Sarama

Krithika Vijaykumar

Aug 15, 2024, 12:48:15 PM
to kafka-clients
Hi all,

We have a scenario where we are consuming from a high-throughput, high-partition topic (approx. 10K TPS across 36 partitions). The Kafka consumer group runs in Kubernetes, where pods are frequently rescheduled due to underlying node activity, and whenever that happens we see duplicate messages being processed. We use Sarama's MarkOffset method to commit the offsets.
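For reference, here is roughly the shape of our consumer (a simplified sketch; the broker, topic, and group names are placeholders, and we assume the github.com/IBM/sarama consumer-group API). We mark each offset after processing the message, and Sarama's auto-commit then flushes the marks in the background, so there is always a window between "processed" and "committed":

// Simplified sketch of our consumer loop (names are placeholders).
package main

import (
	"context"
	"log"

	"github.com/IBM/sarama"
)

type handler struct{}

func (handler) Setup(sarama.ConsumerGroupSession) error   { return nil }
func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }

func (handler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		process(msg) // business logic
		// Mark the next offset as processed; the actual commit happens
		// asynchronously on the auto-commit interval (default 1s).
		sess.MarkOffset(msg.Topic, msg.Partition, msg.Offset+1, "")
	}
	return nil
}

func process(*sarama.ConsumerMessage) {}

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_1_0_0
	cfg.Consumer.Offsets.AutoCommit.Enable = true // the default
	group, err := sarama.NewConsumerGroup([]string{"broker:9092"}, "my-group", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer group.Close()
	for {
		// Consume returns whenever a rebalance happens; the loop re-joins.
		if err := group.Consume(context.Background(), []string{"my-topic"}, handler{}); err != nil {
			log.Println("consume error:", err)
		}
	}
}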

Pod/Consumer deleted: pod-124

We see the following in our logs; all are from the same pod/consumer (pod-123):

10:41:11.671 - Read Message with ID 111
10:41:12.306 - Read Message with ID 111
10:41:12.435 - Ack Message with ID 111
10:41:12.436 - Ack Message with ID 111

The important thing to note is that none of these log lines are from the consumer that is going away due to the deletion (pod-124); they are all from the consumer that is intact and has been running for a while (pod-123), yet it is still affected by the rebalance.
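One thing we are wondering about: since marked offsets are only committed on the auto-commit interval, a rebalance can revoke and re-assign a partition before the latest marks are flushed, so the new owner (or the same consumer, after re-assignment) re-reads from the last committed offset. Would forcing a synchronous commit in Cleanup narrow that window? A rough sketch, assuming our Sarama version exposes ConsumerGroupSession.Commit():

// Flush all marked offsets synchronously while the session is torn down
// for a rebalance, before the partitions are handed to another member.
// Assumes ConsumerGroupSession.Commit() is available (newer Sarama versions).
func (handler) Cleanup(sess sarama.ConsumerGroupSession) error {
	sess.Commit()
	return nil
}

We understand this cannot eliminate duplicates entirely (anything in flight when the rebalance fires is still replayed, as at-least-once delivery implies), but it should shrink the replay window to less than one auto-commit interval.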

PS: I used dummy IDs here for reference, as I did not want to share the real data.

Any insights would be appreciated, as we want to avoid this duplicate data processing.