Hi,
We are running Debezium 3.1.3 with Kafka 4.0, using Oracle 19c as the source.
We observe an issue that occurs every 5 to 10 minutes, and we have no clue so far about the root cause. To summarize:
- Initially our database was snapshotted using the incremental method (snapshot + log), and our Debezium is still in this mode although all snapshots are over (so only log events for now)
- About 5,000 events/s from log-based CDC
- About 1 or 2 events every 5 to 10 minutes trigger a warning on our sink (see below), so not frequent but recurrent
- We checked both databases (source Oracle and target PostgreSQL) on every table: same number of rows, and matching values in partial checks (more than 100 million rows in several tables)
- The next step will be to switch to "log" only (no snapshot)
The issue:
When we get an UPDATE or DELETE (where the condition is on the PK), the sink complains that the row does not exist. Our sink works around the issue by retrying (update-or-insert, so the missing row is inserted; delete ignores the case where no row exists).
But as far as we have checked, the full snapshot is complete, so we should have all rows. The snapshot finished a month ago, so we now receive log events only.
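To make the sink fallback described above concrete, here is a minimal sketch. It is illustrative only: the function name `apply_event` and the in-memory dict standing in for the PostgreSQL table are assumptions, not our actual sink code.

```python
# Hypothetical stand-in for the target table: a dict keyed by primary key.
def apply_event(table, op, pk, row=None):
    """Apply one CDC event to `table`.

    Returns a warning string when the fallback path is taken
    (UPDATE or DELETE on a row that does not exist), otherwise None.
    """
    if op == "INSERT":
        table[pk] = row
        return None
    if op == "UPDATE":
        missing = pk not in table
        table[pk] = row  # update-or-insert: a missing row is simply inserted
        return f"UPDATE on missing row {pk}: inserted instead" if missing else None
    if op == "DELETE":
        missing = pk not in table
        table.pop(pk, None)  # delete while ignoring a row that is not there
        return f"DELETE on missing row {pk}: ignored" if missing else None
    raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    target = {}
    print(apply_event(target, "UPDATE", 42, {"name": "a"}))  # warning, row inserted
    print(apply_event(target, "UPDATE", 42, {"name": "b"}))  # None, normal update
    print(apply_event(target, "DELETE", 43))                 # warning, nothing to delete
```

With this kind of fallback the target ends up consistent with the source, which matches the row counts we see; the warnings are the only visible symptom.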
The only explanation we can think of (though unlikely in my opinion) is that Debezium does the following:
- In the same "log" slot, at least 2 events (1 INSERT and 1 UPDATE, or 1 INSERT and 1 DELETE) occur on the very same row (same PK); this seems possible in the source application writing to Oracle
- Debezium sends only one of the events (the last one) because of a deduplication operation, so the sink receives only 1 UPDATE or 1 DELETE
- This could explain the occasional warning
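The hypothesis above can be sketched as a small simulation. To be clear, this is only a model of our guess: I do not know whether Debezium performs any such deduplication, and `keep_last_per_key` and `ops_hitting_missing_rows` are hypothetical helpers written for this illustration.

```python
def keep_last_per_key(batch):
    """Hypothetical deduplication: keep only the last event seen for each PK."""
    last = {}
    for op, pk in batch:
        last[pk] = op  # a later event on the same PK overwrites the earlier one
    return [(op, pk) for pk, op in last.items()]


def ops_hitting_missing_rows(events, existing_pks):
    """Return the UPDATE/DELETE events that would target a row absent from the sink."""
    present = set(existing_pks)
    misses = []
    for op, pk in events:
        if op == "INSERT":
            present.add(pk)
            continue
        if pk not in present:
            misses.append((op, pk))
        if op == "DELETE":
            present.discard(pk)
    return misses


if __name__ == "__main__":
    # Two new rows, each inserted and then touched again in the same "slot".
    batch = [("INSERT", 42), ("UPDATE", 42), ("INSERT", 43), ("DELETE", 43)]
    print(ops_hitting_missing_rows(batch, existing_pks=[]))
    print(ops_hitting_missing_rows(keep_last_per_key(batch), existing_pks=[]))
```

With the full stream no event targets a missing row, while the deduplicated stream produces one UPDATE and one DELETE on rows the sink has never seen, which is the pattern of warnings we observe.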
Could this happen while Debezium is in "incremental snapshot" mode, even though the snapshots are fully over and log events are now flowing (no delay other than the standard CDC delay)?
If so (unlikely in my view, but we are trying to understand the root cause), could it be fixed in more recent versions?
Best regards,
Frederic