[ORACLE] Ad-hoc snapshot - Incremental


Vitor Pereira

Nov 24, 2025, 12:09:24 PM
to debezium
Hi,

Hope this message finds you well.

Today I'm here with a different issue: I'm in the middle of tuning an ad-hoc snapshot.

I tried adding a few tables of different sizes and got stuck during the process.

I'm trying to add a table after the initial snapshot, using the ad-hoc snapshot functionality that allows me to do that. This table has almost 20M rows.
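
For context, the execute-snapshot signal I'm inserting into the signalling table is roughly the following (the id, schema, and table values are placeholders, redacted like the rest of this post):

INSERT INTO xxxx.xxxx (id, type, data)
VALUES ('adhoc-snapshot-1', 'execute-snapshot',
        '{"data-collections": ["xxxx.xxxx"], "type": "incremental"}');
COMMIT;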

The problem is that the snapshot appears to get stuck, or at least frozen/paused, partway through the process. I let it run over the whole weekend and it barely added 1.5M rows.

Am I doing anything wrong? Is the process supposed to be like this?


This is my connector config:

{
    "name": "xxxx-source",
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "xxxx",
        "database.port": "xxx",
        "database.user": "xxxx",
        "database.password": "xxxx",
        "database.dbname": "xxxx",
        "database.name": "xxxx",
        "database.server.name": "xxx",
        "database.history.kafka.topic": "xxx-schema-changes",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "xxxx-schema-changes",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.include.list": "xxxxx, xxxxx",
        "table.include.list": "xxxxxx",
        "datatype.propagate.source.type": "xxxxx",
        "schema.history.internal.store.only.captured.tables.ddl": "true",
        "schema.history.internal.store.only.captured.databases.ddl": "true",
        "signal.data.collection": "xxxx.xxxx.xxxx",
        "decimal.handling.mode": "precise",
        "snapshot.mode": "initial",
        "snapshot.max.threads": "8",
        "incremental.snapshot.chunk.size": "5000",
        "snapshot.fetch.size": "2000",
        "topic.prefix": "xxxx",
        "heartbeat.interval.ms": "5000",
        "heartbeat.action.query": "INSERT INTO xxxx.xxxx VALUES ('heartbeat')",
        "log.mining.strategy": "hybrid",
        "log.mining.transaction.retention.ms": "172800000",
        "log.mining.flush.table.name": "xxxxx",
        "log.mining.batch.size.min": "10000",
        "log.mining.batch.size.max": "20000",
        "log.mining.sleep.time.min.ms": "50",
        "log.mining.sleep.time.max.ms": "200",
        "log.mining.archive.log.only.mode": "false",
        "log.mining.database.connection.reuse.interval.ms": "60000",
        "key.converter.schemas.enable": "true",
        "value.converter.schemas.enable": "true",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081"
    }
}

Please feel free to suggest anything that might help.

Best regards
 

Chris Cranford

Nov 25, 2025, 5:33:56 AM
to debe...@googlegroups.com
Hi Vitor -

I suspect that the chunk size of 5000 is not very efficient. When a table is large, like your ~20 million rows, dividing the row count by the chunk size gives the number of iterations it takes to snapshot that table, in your case about 4,000. The average of your JMX metric LastBatchProcessingTimeInMilliseconds gives you an idea of how long each iteration takes to process change events. Let's assume it takes 10 seconds on average to process a batch; that's roughly 40,000 seconds (~11 hours).
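
If it helps with finding that metric, it lives on the connector's streaming metrics MBean; for the Oracle connector the name is typically along these lines (newer versions may also add a task attribute, and the server value is your topic.prefix, so redacted here):

debezium.oracle:type=connector-metrics,context=streaming,server=xxxx
  -> attribute: LastBatchProcessingTimeInMilliseconds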

So it's likely best to increase your chunk size. Even a modest bump to 25000 would drop the iterations to 800; with the same 10-second average, that's about 8,000 seconds (~2.2 hours).

Note that the 10-second average is just an arbitrary number; please review your own JMX metrics for a more accurate calculation.
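
Config-wise it's a single property to bump on your existing connector, re-applied through the usual Kafka Connect REST call (PUT /connectors/<name>/config); 25000 is only the example value from above, so tune it against your own metrics:

    "incremental.snapshot.chunk.size": "25000"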

-cc