[ORACLE] CDC sync

22 views
Skip to first unread message

Vitor Pereira

unread,
Sep 9, 2025, 7:57:53 AM (2 days ago) Sep 9
to debezium
Hi, 

Hope this message finds you well.

Today, I came here with a different question. We are considering implementing a mechanism that allows us to ensure that both databases (source and sink) are synchronised. 


So, my problem is the following: our transitional system is suffering DML, as any other system like this, and our analytics system is being fed with CDC. Knowing that, we intend to run some reports in the analytics system, but we need to ensure that the data is synced even in large transactions. 

So, my question is: Is there any built-in parameter/configuration in debezium architecture that enables this feature?

Do you have any suggestion to accomplish this goal?

Best regards,
Vitor Pereira

Chris Cranford

unread,
Sep 9, 2025, 2:13:42 PM (2 days ago) Sep 9
to debe...@googlegroups.com
Hi Vitor -

This is a great question and as you've likely uncovered, one of the more challenging aspects of CDC, especially in highly active systems.

Whether we're talking about Oracle or any other source database, there is always going to be this notion that at any moment in time there could be in-transit changes. Even if you look at a connector's JMX metrics, these most often represent state at the time Debezium processes the event, not state that has been flushed to offsets, written to the topics, and most certainly does not take into account any aspect of how quickly those changes are consumed by any sink. So we can't rely on JMX metrics alone.

The most obvious way to know data is synchronized is to have a way to stop changes on the source, wait until consumers no longer poll any changes and then do a comparison between the source and sink to identify row variances. But this isn't practical for obvious reasons. Unfortunately an approximation is really as close as we can get, where you take an extract from the source and target, compare them, identifying variances and spot checking whether those variances eventually synchronize with the source extract a few minutes later.

With Oracle in particular, it makes this process even more complicated because row changes do not have any real unique identifier. System change numbers are non-unique and even multiple commits can share the same system change number. And because Oracle writes changes from the redo buffers to disk in an interwoven way, the system change number can move forward and backward across DML changes in the event stream based on their transaction's commit order; all the while, transaction commit change numbers are always equal to or greater than the prior commit for a given thread.

As you likely guessed, there is no Debezium built-in solution here. Events can be transformed into different representations as part of your pipeline, and may further be enriched/refined in Kafka Streams into new topics where the event no longer resembles how it left Debezium. So this is really an area that falls into the user's domain.

-cc
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/1b801791-676e-457b-8113-c82cd3c17839n%40googlegroups.com.

Marcelo Costa

unread,
Sep 10, 2025, 8:12:48 PM (8 hours ago) Sep 10
to debe...@googlegroups.com
Hi Vitor, Further to what Chris wrote, it all depends on the context and what I call "the nature of the data." You need to delve deeper and understand in detail how this data is generated at the source and thus understand its seasonality. Another aspect is preparing the connector and the database so you can collect data at the speed your business needs. And this analysis needs to be done end-to-end, with fine-tuning at each stage of your entire process. Furthermore, working with your Oracle-specialized DBA will be essential for faster collection; even log details and how they are recorded by LogMiner are important for a good setup.

Regards,
Marcelo Costa
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+unsubscribe@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/5fecdc42-422f-474e-a879-02936d935963%40gmail.com.


--
Marcelo Costa
-------------------------------------------------
http://www.infoq.com/br/author/Marcelo-Costa



Reply all
Reply to author
Forward
0 new messages