MySQL CDC connector completeness/correctness validation

Liang Mou

Mar 26, 2023, 5:29:41 PM
to debezium
Hi experts,

I'm trying to replace our current batch pipeline (a daily DB dump) with a CDC pipeline. One important question I often get is how to guarantee that the CDC pipeline doesn't lose data and that the data is correct. To answer questions like this, I'm thinking of building a validation system. One way is to take a MySQL dump as the source of truth and compare it with the CDC data (the downside is that I still need to take a dump, which is exactly what I want to replace). The other way is to query the MySQL server directly, which has many downsides as well.

I think this might be a common question for many people. Do you have any good suggestions?


Thanks.
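
For readers following along, the dump-versus-CDC comparison described above could be sketched roughly like this. It is only an illustration under stated assumptions: both sides are already loaded as lists of row dicts, the primary key column is called `id`, and the helper names `row_fingerprint` and `diff_tables` are made up for this example rather than taken from any tool.

```python
import hashlib
import json


def row_fingerprint(row):
    """Return a stable hash of a row given as a dict of column -> value."""
    # sort_keys makes the hash independent of column order;
    # default=str covers values such as dates that json cannot encode.
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_tables(source_rows, cdc_rows, key="id"):
    """Compare two row collections by primary key and content hash."""
    source = {r[key]: row_fingerprint(r) for r in source_rows}
    cdc = {r[key]: row_fingerprint(r) for r in cdc_rows}
    return {
        "missing_in_cdc": sorted(source.keys() - cdc.keys()),
        "extra_in_cdc": sorted(cdc.keys() - source.keys()),
        "mismatched": sorted(
            k for k in source.keys() & cdc.keys() if source[k] != cdc[k]
        ),
    }


# Hypothetical sample data: rows from the dump vs. rows built from CDC.
dump = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
cdc = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]

report = diff_tables(dump, cdc)
print(report)
# {'missing_in_cdc': [3], 'extra_in_cdc': [4], 'mismatched': [2]}
```

Hashing each row keeps the comparison cheap to store and ship around, but both sides have to reflect the same point in time, otherwise in-flight changes will show up as false mismatches.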

Chris Cranford

Mar 27, 2023, 8:06:03 AM
to debe...@googlegroups.com
Hi -

Depending on how critical the daily ETL is to your business, I might actually suggest the inverse. When shifting technology, it's not uncommon to run both in parallel for a short period of time and do this validation. You'd let the current daily ETL process continue to feed the downstream system and simply have the Debezium connectors capture changes; you then compare the data in the Kafka topics to the ETL output and validate it that way. This reduces the production deployment impact while also giving you the confidence and certainty you need for the switch. Once you're confident things are working well, you can schedule a time to stop the ETL and switch over to Debezium.

Chris
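
As a rough illustration of the parallel-run check described in this reply, the sketch below replays change events into an in-memory table and compares the result with an ETL snapshot. It assumes the events have already been read from the Kafka topic and reduced to the Debezium envelope fields `op`, `before` and `after`, that `None` stands for a tombstone record, and that the primary key column is `id`. The function name `apply_change_events` and the sample data are hypothetical, and consuming from Kafka is left out on purpose.

```python
def apply_change_events(events, key="id"):
    """Replay simplified Debezium change events into a dict keyed by primary key."""
    state = {}
    for event in events:
        if event is None:
            # Tombstone record emitted after a delete; nothing to apply.
            continue
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, update: the new row image is in "after".
            row = event["after"]
            state[row[key]] = row
        elif op == "d":
            # delete: the removed row image is in "before".
            state.pop(event["before"][key], None)
        else:
            raise ValueError(f"unhandled op: {op!r}")
    return state


# Hypothetical events, in the order they would appear for each key.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "a"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "b"}},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "A"}},
    {"op": "d", "before": {"id": 2, "name": "b"}, "after": None},
    None,
    {"op": "c", "before": None, "after": {"id": 3, "name": "c"}},
]

# Hypothetical ETL output for the same table at the same point in time.
etl_snapshot = {1: {"id": 1, "name": "A"}, 3: {"id": 3, "name": "c"}}

cdc_state = apply_change_events(events)
print(cdc_state == etl_snapshot)  # True
```

The comparison is only meaningful if the replay stops at the position that corresponds to the moment the ETL dump was taken; otherwise changes made after the dump will look like discrepancies.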
