Debezium connector restart results in duplicate events into Kafka Topic


Aishwarya Balu

Sep 19, 2023, 2:52:31 PM
to debezium

Hi Team,

We have observed that whenever the Debezium connector is restarted, duplicate messages are published to the Kafka topic by the connector. This is particularly concerning because it occurs in our production environment, which demands high data consistency and integrity.

In an effort to address this issue, we have reviewed the official Debezium documentation, specifically:

https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-when-things-go-wrong

https://debezium.io/documentation/faq/#why_must_consuming_applications_expect_duplicate_events

These resources have provided insight into why duplicate events occur during connector restarts. However, to maintain the reliability and data consistency of our system, we are seeking your assistance in resolving this behaviour permanently.

We kindly request your support and expertise in addressing and rectifying this issue within the Debezium connector. Our primary objective is to ensure that the connector restarts do not result in the generation of duplicate messages in the Kafka topic.

We understand that Debezium is a widely adopted tool for change data capture (CDC), and we value the benefits it provides. However, the duplicate message issue is a critical concern for us, and we believe that, with your assistance, we can overcome this challenge.

Please let us know how we can proceed with this.

Thanks.


Abdul Wasay

Sep 20, 2023, 12:22:38 AM
to debezium
1) You can handle deduplication in your downstream consumers.
2) Check the `key.converter` and `value.converter` settings.
3) Check the data in the source database.
4) Check the connector configuration.
5) Check the Kafka topic configuration; partitioning and replication settings can sometimes lead to apparent duplication.

Aishwarya Balu

Sep 20, 2023, 1:14:58 AM
to debezium
Hi,

Thanks for the reply.
These are the connector properties we have (attached). Does this help in any way towards a solution?


Thanks.

connectorconfig-properties.txt

jiri.p...@gmail.com

Sep 20, 2023, 2:11:16 AM
to debezium
Hi,

does it really happen to you on a regular connector restart, and not only when the connector crashes?

J.

Aishwarya Balu

Sep 20, 2023, 3:34:38 AM
to debezium
Hi,

We use MSK, where we don't have the option to start and stop the connector directly; "restarting" the connector here means deleting the existing connector and re-creating it. What we observed is that the issue happens whenever a binlog file fills up: while switching to the new binlog file, the connector tries to find the old binlog file and, when it cannot, it takes a snapshot and duplicates some events (not all). I have attached the connector properties file for your reference.
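The behaviour described above (re-snapshot when the recorded binlog position is gone) matches what Debezium's `snapshot.mode` controls. A hedged, illustrative sketch of the relevant MySQL connector properties, assuming the envelope-style configuration and not your actual attached config:

```properties
# Illustrative Debezium MySQL connector settings (sketch, not your full config).
# With snapshot.mode=when_needed the connector re-snapshots whenever the stored
# binlog position is no longer available (e.g. the file was purged by retention),
# which re-emits existing rows and looks like duplication downstream.
snapshot.mode=when_needed

# A mode such as "initial" snapshots only on first start; with a lost binlog
# position the connector then stops with an error instead of re-snapshotting,
# trading duplicates for an explicit failure you must handle operationally.
# snapshot.mode=initial
```

Which trade-off is right depends on whether a re-snapshot (duplicates) or a hard stop (manual intervention) is more acceptable in your environment.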


Thanks.

Abdul Wasay

Sep 20, 2023, 5:58:30 AM
to debezium
You may filter duplicate events in your downstream, i.e. in the Kafka consumer application.
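A minimal sketch of such downstream filtering, assuming the unflattened Debezium envelope where each MySQL change event carries a `source` block with `file`, `pos`, and `row` (the event structure here is illustrative; a production deduplicator would need a bounded or persistent store rather than an in-memory set):

```python
def event_key(event):
    """Build an identity key from the Debezium MySQL `source` block.

    For the MySQL connector, (binlog file, position, row index) identifies
    a change event, so a replayed event after a restart maps to the same key.
    """
    src = event["source"]
    return (src["file"], src["pos"], src.get("row", 0))


class Deduplicator:
    """Drop events whose identity key has already been seen."""

    def __init__(self):
        # In-memory set for illustration only; in production this should be
        # bounded (e.g. TTL cache) or persistent to survive consumer restarts.
        self._seen = set()

    def accept(self, event):
        """Return True if the event is new, False if it is a duplicate."""
        key = event_key(event)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

The consumer would call `accept()` on each record and skip those returning `False`; this implements the at-least-once-to-effectively-once pattern the Debezium FAQ recommends.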

jiri.p...@gmail.com

Sep 20, 2023, 6:26:49 AM
to debezium
Hi,

I think you have described a completely different problem - and that is why Debezium is not able to find the old file. Is it a retention issue?

Generally, if the connector cannot find the position at which it stopped, the result would be data loss.

J.

Aishwarya Balu

Sep 22, 2023, 7:44:40 AM
to debezium
Hi, 

The issue was in prod, where the connector looks for a binlog file that has already been deleted per the retention policy.

The duplicates appear whenever the connector is restarted.
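If binlog retention is indeed the root cause, extending it so files outlive connector downtime would prevent the forced re-snapshot. A hedged operational sketch, assuming a MySQL 8.x source; the RDS/Aurora stored procedures shown only exist on AWS-managed MySQL (the 72-hour value is an example, not a recommendation):

```sql
-- Self-managed MySQL 8.x: check how long binlog files are kept (seconds).
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

-- RDS / Aurora MySQL: inspect and extend binlog retention (hours).
CALL mysql.rds_show_configuration;
CALL mysql.rds_set_configuration('binlog retention hours', 72);
```

Retention should comfortably exceed the longest window during which the connector may be deleted and re-created.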




Thanks.

