Hi experts,
I'm trying to replace our current batch pipeline (a daily DB dump) with a CDC pipeline. One important question I often get is how I can guarantee that the CDC pipeline doesn't lose data and that the data is correct. To answer questions like this, I'm thinking of building a validation system. One way is to take a MySQL dump as the source of truth and compare it with the CDC data (the downside is that I still need to take a dump, which is exactly what I want to replace). The other way is to query the MySQL server directly, which has many downsides as well.
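For illustration, the dump-versus-CDC comparison could look roughly like the Python sketch below. It is only a sketch under assumptions: both sides are already loaded as tuples, the first column is the primary key, and both sides were captured at the same point in time. The names `row_digest` and `compare` are hypothetical, not part of any CDC tool.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # assumption: row[0] is the primary key


def row_digest(row: Row) -> str:
    """Hash all column values of a row into a stable hex digest."""
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare(source_rows: Iterable[Row], cdc_rows: Iterable[Row]) -> Dict[str, List]:
    """Report keys that are missing, extra, or different on the CDC side."""
    src = {row[0]: row_digest(row) for row in source_rows}
    dst = {row[0]: row_digest(row) for row in cdc_rows}
    return {
        "missing_in_cdc": sorted(k for k in src if k not in dst),
        "extra_in_cdc": sorted(k for k in dst if k not in src),
        "mismatched": sorted(k for k in src if k in dst and src[k] != dst[k]),
    }


if __name__ == "__main__":
    # Made-up sample rows, purely to show the shape of the report.
    dump = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 30)]
    cdc = [(1, "alice", 10), (2, "bob", 25), (4, "dave", 40)]
    print(compare(dump, cdc))
```

Comparing per-row digests rather than full rows keeps the in-memory footprint small, but it still requires a consistent snapshot of the source, which is the cost mentioned above.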
I think this is probably a common question for many people. Do you have any good suggestions?
Thanks.