[Oracle Connector] Full Load + CDC count mismatch (Oracle 19c, Debezium 3.2.0)

Олександр Тупчієнко

Nov 26, 2025, 7:32:24 PM

to debezium

Hi all,

I’m seeing a discrepancy in total record counts when combining Full Load and CDC using Debezium Oracle 3.2.0 with AWS MSK.

Details:

Oracle version: 19c
Snapshot modes tried: initial, incremental, and blocking
Table receives changes during full load
The full load matches the source count as of the snapshot SCN, but once CDC is applied, the total (full load + CDC) does not match the source
It seems some changes during full load might be lost

Questions:

Could changes during full load be skipped in CDC?
Are there log mining / SCN gap settings to prevent missing records?
Does internal.log.mining.read.only=true risk missing changes?
Any recommended practices to ensure full load + CDC consistency?

Thanks in advance for advice or example configs!

Chris Cranford

Nov 26, 2025, 7:33:25 PM

to debe...@googlegroups.com

Hi -

There were a few issues with 3.2.0; please update to 3.2.5.Final and let us know if that solves your issue.

Thanks,
-cc

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/43afddb9-103c-41ab-9029-fd97070610e7n%40googlegroups.com.

Олександр Тупчієнко

Nov 28, 2025, 9:23:30 AM

to debezium

Hello! Thanks for the suggestion. I tried running a blocking snapshot, but after it completed I found that the total number of records does not match the source table. The snapshot produced fewer records than expected.

This setup is running on AWS MSK, and before starting the snapshot I upgraded Debezium to 3.2.5.Final, but the issue still occurred.

Could this be caused by the connector configuration?
Here are my current settings:
{
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"database.hostname": "----",
"database.port": "----",
"database.user": "----",
"database.password": "----",
"database.dbname": "----",
"database.connection.adapter": "logminer",
"table.include.list": " BD.SCHEMANM.DEBEZIUM_SIGNAL, ----",
"tasks.max": "1",

"snapshot.mode": "initial",
"snapshot.fetch.size": "20000",
"incremental.snapshot.chunk.size": "150000",
"internal.log.mining.read.only": "true",
"log.mining.transaction.retention.ms": "600000",
"include.transaction.details": "true",

"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"value.converter.ignore.default.for.nullables": "true",
"value.converter.replace.null.with.default": "false",
"key.converter.replace.null.with.default": "false",
"decimal.handling.mode": "string",
"tombstones.on.delete": "false",

"transforms": "changes,route",
"transforms.changes.type": "io.debezium.transforms.ExtractChangedRecordState",
"transforms.changes.header.changed.name": "Changed",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "topic.startsWith('BD.SCHEMANM.') && value?.op != null && value.op != 'r' ? topic + '_cdc' : topic",

"heartbeat.interval.ms": "5000",
"heartbeat.topics.prefix": "debezium-heartbeat",
"signal.data.collection": " BD.SCHEMANM.DEBEZIUM_SIGNAL",
"signal.enabled.channels": "source,kafka",
"notification.enabled.channels": "sink",

"signal.kafka.bootstrap.servers": "----",
"signal.kafka.topic": "kafka_debezium_signal",
"schema.history.internal.kafka.bootstrap.servers": "----",
"schema.history.internal.kafka.topic": "bdddl",

"schema.history.internal.producer.security.protocol": "SASL_SSL",
"schema.history.internal.consumer.security.protocol": "SASL_SSL",
"signal.consumer.security.protocol": "SASL_SSL",

"schema.history.internal.consumer.sasl.mechanism": "AWS_MSK_IAM",
"schema.history.internal.producer.sasl.mechanism": "AWS_MSK_IAM",
"signal.consumer.sasl.mechanism": "AWS_MSK_IAM",

"signal.consumer.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
"schema.history.internal.consumer.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
"schema.history.internal.producer.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",

"signal.consumer.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"schema.history.internal.consumer.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"schema.history.internal.producer.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",

"connect.timeout.ms": "60000",
"request.timeout.ms": "30000",
"creation.default.partitions": "3",
"metrics.jmx.enabled": "false"
}
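
For counting purposes, it may help to spell out what the `route` transform in the config above does. The Python sketch below mirrors the posted Groovy expression: a record on a `BD.SCHEMANM.*` topic with a non-null `op` other than `r` goes to `<topic>_cdc`, and everything else stays on the original topic. The table name `ORDERS` is hypothetical and only used for illustration.

```python
from typing import Optional

SCHEMA_PREFIX = "BD.SCHEMANM."  # prefix taken from the posted expression


def route_topic(topic: str, op: Optional[str]) -> str:
    """Python mirror of the Groovy expression in transforms.route.topic.expression."""
    if topic.startswith(SCHEMA_PREFIX) and op is not None and op != "r":
        return topic + "_cdc"
    return topic


if __name__ == "__main__":
    # "ORDERS" is a hypothetical table name, not one from the thread.
    for op in ("r", "c", "u", "d", None):
        print(op, "->", route_topic("BD.SCHEMANM.ORDERS", op))
```

Snapshot reads (`op` = `r`), including those produced by incremental snapshots, stay on the base topic, so a per-topic "full load + CDC" count has to take both topics into account.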

On Thursday, November 27, 2025 at 02:33:25 UTC+2, Chris Cranford wrote:

Chris Cranford

Dec 1, 2025, 3:58:10 AM

to debe...@googlegroups.com

Hi -

Are you able to determine if the missing records were added after the blocking snapshot and the connector simply hadn't read the entry from the logs yet?

-cc

Олександр Тупчієнко

Dec 1, 2025, 4:46:56 AM

to debezium

Hi,

I'm not sure yet. I will try to find out.

On Monday, December 1, 2025 at 10:58:10 UTC+2, Chris Cranford wrote:

Олександр Тупчієнко

Dec 1, 2025, 4:59:12 AM

to debezium

I also wanted to clarify one more thing.
Could the following configuration option cause row-count mismatches when using a primary (non-standby) Oracle database?

"internal.log.mining.read.only": "true"

I’m wondering whether this setting might affect LogMiner behavior or lead to missed changes during the snapshot or CDC phase.

Thanks!

On Monday, December 1, 2025 at 11:46:56 UTC+2, Олександр Тупчієнко wrote:

Chris Cranford

Dec 1, 2025, 6:01:03 AM

to debe...@googlegroups.com

Hi. No, this option wouldn't have any impact.

Could you please enable TRACE logging for `io.debezium`, retrigger a blocking snapshot, and share the logs? In addition, could you identify which rows are missing by their primary key and share that information?

Thanks,
-cc
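
One possible way to identify the missing rows by primary key is sketched below in Python, under several assumptions: the primary keys of the source table have been exported to a list, and the captured events from both the base topic and the `_cdc` topic have been merged into commit order (for example by the SCN in each event's `source` block). The function names and the sample keys are hypothetical. The sketch replays the events to work out which keys should still exist, then diffs that set against the source.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def replay_live_keys(events: Iterable[Tuple[Hashable, str]]) -> Set[Hashable]:
    """Replay (primary_key, op) events in order; return keys that should still exist."""
    live: Set[Hashable] = set()
    for key, op in events:
        if op in ("r", "c", "u"):   # snapshot read, insert, update
            live.add(key)
        elif op == "d":             # delete
            live.discard(key)
    return live


def diff_keys(
    source_keys: Iterable[Hashable],
    events: Iterable[Tuple[Hashable, str]],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (keys in the source but never captured, keys captured but absent from the source)."""
    live = replay_live_keys(events)
    source = set(source_keys)
    return sorted(source - live), sorted(live - source)


if __name__ == "__main__":
    # Hypothetical sample data.
    events = [(1, "r"), (2, "r"), (3, "c"), (2, "d")]
    print(diff_keys([1, 3, 4], events))
```

The first list in the result is the set of primary keys worth sharing along with the logs.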

Олександр Тупчієнко

Dec 1, 2025, 8:36:23 AM

to debezium

Thanks for the clarification. I'll look into how to enable TRACE logging and will try to collect the logs and identify the missing row(s) by their primary key.

One more question: since I'm running the connector on AWS MSK Connect, is there anything specific I should keep in mind when enabling TRACE logging in that environment?

Thanks!

On Monday, December 1, 2025 at 13:01:03 UTC+2, Chris Cranford wrote:

Chris Cranford

Dec 1, 2025, 8:58:15 AM

to debe...@googlegroups.com

Hi -

I'm not super familiar with Amazon MSK Connect and what you can and cannot do there, so I would recommend checking the AWS documentation.

-cc

Олександр Тупчієнко

Dec 1, 2025, 9:01:13 AM

to debezium

Thanks, got it — I’m working on that right now.

On Monday, December 1, 2025 at 15:58:15 UTC+2, Chris Cranford wrote:

Олександр Тупчієнко

Dec 1, 2025, 2:35:51 PM

to debezium

I’m still working on enabling TRACE logging for io.debezium — in AWS MSK Connect this is turning out to be more complicated than I expected.

However, I can already share some findings from my initial analysis.

I triggered a blocking snapshot, and the final result contained fewer records than expected (full load + CDC).

After verifying the data in detail, I found the following:

The full load at the assigned SCN completed successfully, and the record count fully matched the source table.
There was a ~5-minute gap between the last snapshot record and the first CDC record (based on creation timestamps in the database).
None of the changes that occurred during this period appeared in CDC.
I also noticed that some CDC events are missing later as well, so the issue is not limited to the snapshot–CDC transition.
The connector is replicating about 60 tables, so the workload is quite significant.
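
Since the findings above mix two symptoms (rows lost in the snapshot-to-CDC gap and rows lost later), it may be useful to bucket the missing rows by creation timestamp before digging further. The Python sketch below assumes the missing rows are available as `(primary_key, created_at)` pairs and that the timestamps of the last snapshot record and the first CDC record are known; the function name, bucket names, and sample values are hypothetical.

```python
from datetime import datetime
from typing import Dict, Hashable, Iterable, List, Tuple


def classify_missing(
    missing_rows: Iterable[Tuple[Hashable, datetime]],
    last_snapshot_ts: datetime,
    first_cdc_ts: datetime,
) -> Dict[str, List[Hashable]]:
    """Bucket missing rows relative to the snapshot-to-CDC gap."""
    buckets: Dict[str, List[Hashable]] = {"before_gap": [], "in_gap": [], "after_gap": []}
    for key, created_at in missing_rows:
        if created_at <= last_snapshot_ts:
            buckets["before_gap"].append(key)
        elif created_at < first_cdc_ts:
            buckets["in_gap"].append(key)
        else:
            buckets["after_gap"].append(key)
    return buckets


if __name__ == "__main__":
    # Hypothetical timestamps describing a five-minute gap.
    gap_start = datetime(2025, 12, 1, 10, 0, 0)
    gap_end = datetime(2025, 12, 1, 10, 5, 0)
    rows = [
        (101, datetime(2025, 12, 1, 9, 59, 0)),
        (102, datetime(2025, 12, 1, 10, 2, 0)),
        (103, datetime(2025, 12, 1, 10, 30, 0)),
    ]
    print(classify_missing(rows, gap_start, gap_end))
```

Rows in the `in_gap` bucket point at the boundary between snapshot and streaming, while rows in `after_gap` suggest a separate streaming problem.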

So my question is:
Could these missing CDC events be caused by insufficient resources allocated to the MSK Connect worker running the Debezium connector?

On Monday, December 1, 2025 at 16:01:13 UTC+2, Олександр Тупчієнко wrote:

Chris Cranford

Dec 1, 2025, 9:56:10 PM

to debe...@googlegroups.com

Hi,

That's unlikely. If these are changes that are performed in transactions that span the snapshot<->streaming boundary, that could explain the variance. By default, any in-progress transaction that spans the boundary is not replicated because the snapshot position appears mid-transaction. You could try playing around with the internal experimental feature called "internal.log.mining.transaction.snapshot.boundary.mode". By default this is set to "skip", but can be set to any one of the following:

skip
Any in-progress transactions that span the snapshot<->streaming boundary are skipped.

transaction_view_only
Any in-progress transaction that appears in the V$TRANSACTION performance view will be replicated.
This is the least IO intensive and fastest solution if you do not want to skip transactions in-progress at boundary.

all
This captures all in-progress transactions by using a special mining phase to read the transaction logs.
This is very expensive and can take a while if your system is very active.
This is because the connector needs to mine logs backward until we identify a position where there are no overlapping, in-progress transactions.
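
To make the boundary case concrete, here is a minimal Python sketch. It assumes that a transaction "spans the boundary" when it started before the snapshot SCN and committed after it; the exact edge handling inside the connector may differ. The property name and its three values come from the description above, while the helper names and SCN values are hypothetical.

```python
from typing import Dict

BOUNDARY_MODE_KEY = "internal.log.mining.transaction.snapshot.boundary.mode"
BOUNDARY_MODES = ("skip", "transaction_view_only", "all")


def spans_boundary(start_scn: int, commit_scn: int, snapshot_scn: int) -> bool:
    """True when the transaction was still open at the snapshot SCN (assumed definition)."""
    return start_scn < snapshot_scn < commit_scn


def with_boundary_mode(config: Dict[str, str], mode: str) -> Dict[str, str]:
    """Return a copy of the connector config with the boundary mode set."""
    if mode not in BOUNDARY_MODES:
        raise ValueError(f"unknown boundary mode: {mode!r}")
    updated = dict(config)
    updated[BOUNDARY_MODE_KEY] = mode
    return updated


if __name__ == "__main__":
    # Hypothetical SCNs: the transaction opens at 100, the snapshot is taken at 150,
    # and the commit arrives at 200, so the default "skip" mode would drop it.
    print(spans_boundary(100, 200, 150))
    print(with_boundary_mode({"snapshot.mode": "initial"}, "transaction_view_only"))
```

The resulting dictionary could then be submitted as the connector configuration; since the option is internal and experimental, it is worth trying on a non-production connector first.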

Once you have TRACE logging enabled, we can move forward. We need both the logs and the extract tool [1], so that we can compare what Debezium sees (SCN range and transactions) with the raw data in the transaction logs.

Let me know when you have the logging set up and we can continue.

Thanks,
-cc

[1]: https://github.com/Naros/debezium-oracle-query-tool
