Incremental Snapshot for Oracle Taking Too Long to Process


Helmi Aziz Muhammad

unread,
Jan 3, 2024, 3:40:53 AM1/3/24
to debezium
Hi,

Currently we have a cluster of three containerized Kafka running in a single machine. We also have a source connection to Oracle database version 12.2 with this configuration:

{
    "name": "FRUITS",
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",
        "tasks.max" : "1",
        "topic.prefix": "FRUITS",
        "database.hostname" : "oracle.host.name",
        "database.port" : "1521",
        "database.user" : "c##dbzuser",
        "database.password" : "debezium_pass",
        "database.dbname" : "cdb_name",
        "database.pdb.name": "pdb_name",
        "table.include.list" : "C##DBZUSER.CDC_KAFKA, FRUITS_BASKET.(GREEN|BLUE|RED|PURPLE|MAGENTA|BLACK|WHITE)_BANANA, FRUITS_BASKET.(YELLOW|ORANGE|LIGHT_BLUE|NAVY|BROWN|GREY)_APPLE",
        "schema.include.list": "C##DBZUSER, FRUITS_BASKET",
        "errors.log.enable": true,
        "include.schema.changes": true,
        "decimal.handling.mode": "double",
        "snapshot.mode": "initial",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://example.schema.registry:8085",
        "value.converter.schema.registry.url": "http://example.schema.registry:8085",
        "key.converter.schemas.enable":true,
        "value.converter.schemas.enable":true,
        "schema.history.internal.kafka.topic": "schema_history.FRUITS",
        "schema.history.internal.kafka.bootstrap.servers": "kafka-0:9092,kafka-1:9095,kafka-2:9098",
        "schema.history.internal.store.only.captured.tables.ddl": true,
        "log.mining.session.max.ms": 1800000,
        "log.mining.batch.size.max": 1000000,
        "log.mining.scn.gap.detection.gap.size.min": 500000,
        "log.mining.scn.gap.detection.time.interval.max.ms": 10000,
        "signal.data.collection": "FRUITS.C##DBZUSER.CDC_KAFKA",
        "time.precision.mode": "connect"
    }
}

The connector captures changes from all of the selected tables in real time as intended, but every time we want to run an incremental snapshot, Kafka Connect takes too long to process it, and sometimes it doesn't capture the snapshot at all, even after the new signal row has been inserted into the signal table and its change event has appeared in the topic.
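For reference, we trigger the snapshot by inserting a row into the signal table roughly like this (a sketch assuming the standard Debezium signal table columns `id`, `type`, `data`; the ID and table name here are illustrative):

```sql
-- Sketch of the ad-hoc snapshot signal we insert.
-- Column layout assumes the standard Debezium signal table: id, type, data.
INSERT INTO C##DBZUSER.CDC_KAFKA (id, type, data)
VALUES (
    'signal-001',
    'execute-snapshot',
    '{"data-collections": ["FRUITS_BASKET.GREEN_BANANA"], "type": "incremental"}'
);
COMMIT;
```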

Please help me solve this issue.

Thanks,
Helmi Aziz Muhammad.

Helmi Aziz Muhammad

unread,
Jan 3, 2024, 3:56:49 AM1/3/24
to debezium
Edit: the config looks like this:

{
    "name": "FRUITS",
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",
        "tasks.max" : "1",
        "topic.prefix": "FRUITS",
        "database.hostname" : "oracle.host.name",
        "database.port" : "1521",
        "database.user" : "c##dbzuser",
        "database.password" : "debezium_pass",
        "database.dbname" : "cdb_name",
        "database.pdb.name": "fruits",

Chris Cranford

unread,
Jan 3, 2024, 8:41:16 AM1/3/24
to debe...@googlegroups.com
Hi,

I would suggest trying to adjust the incremental.snapshot.chunk.size [1] and see if that helps.

It's important to understand that incremental snapshots will be a tad slower than initial snapshots due to how they work. These snapshots are chunk-based: rather than issuing a single query like an initial snapshot and then iterating the result set, an incremental snapshot issues multiple queries over its lifetime. Each query uses a chunk range to dictate how many rows are fetched per chunk, which defaults to only 1024.  Additionally, each chunk is compared against any pending data that has been mined from Oracle and is about to be sent to Kafka, and is de-duplicated, so there is a bit more going on with incremental snapshots that can affect their performance.  However, incremental snapshots are resumable, so if the connector stops or you have to restart it for any reason, the incremental snapshot resumes where it left off, not from the beginning.
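As a sketch, this is a single property you'd add to the "config" block of the connector you posted (the value is just an illustrative bump from the 1024 default):

```json
{
    "incremental.snapshot.chunk.size": "8192"
}
```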

Thanks,
Chris

[1]: https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-property-incremental-snapshot-chunk-size

Helmi Aziz Muhammad

unread,
Jan 3, 2024, 9:25:25 AM1/3/24
to debezium
Hello,

Thank you for the reply, Chris. I assume that to make the snapshot faster, I need to raise the chunk size. Say I have a proper multi-node Kafka cluster, the table I want to ingest has 100 million rows, and I set the chunk size to 25 million. Which would run into performance issues first, the Kafka cluster or the database? Also, would you recommend setting the chunk size that high in the first place for a production environment where incremental snapshots are rare (probably once or twice every two months)?

Thanks,
Helmi Aziz Muhammad.

Chris Cranford

unread,
Jan 3, 2024, 10:16:02 AM1/3/24
to debe...@googlegroups.com
Hi,

If I understand you correctly, you certainly do not want to set the value to 25M; you likely don't have the heap for such a data set, nor would that be wise, as it can also impact the performance of your database and the temp tablespace used.  I would first try changing the default of 1024 to perhaps 8192 or even 16384 and see how the performance works out.  The goal is to ingest enough data per chunk to reduce the number of chunks queried, while keeping each chunk small enough that it doesn't cause lag for your streaming ingestion of existing changes.  Do be mindful too that with larger sizes like this, you may want to adjust the max.batch.size and max.queue.size parameters accordingly, so that there is sufficient space in the internal buffer between Debezium and Kafka Connect and the database read side of the pipeline isn't blocked because the queue is too small.
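Concretely, a tuning sketch along those lines might look like this in the connector config (all values are illustrative starting points, not recommendations; max.queue.size should stay comfortably larger than max.batch.size):

```json
{
    "incremental.snapshot.chunk.size": "8192",
    "max.batch.size": "4096",
    "max.queue.size": "16384"
}
```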

One thing I see often is that people seem to think that bumping tuning options from 1024 or 2048 to tens or hundreds of thousands will help, and that isn't always the case.  It's generally best to increase these values incrementally and get an understanding of where the apex point is for your hardware and infrastructure.  A great example is the default fetch size: we used to ship with a value of 2,000, but after some investigation, a community user found that the sweet spot was typically around 10,000.  I would strongly suggest taking the same approach here and finding what works best.

Thanks,
Chris

Helmi Aziz Muhammad

unread,
Jan 3, 2024, 10:51:31 PM1/3/24
to debezium
Hello,

I've increased the chunk size value and snapshots are working well now. I've also just remembered that the initial problem we had wasn't that the incremental snapshot process itself was slow, but that activating the first snapshot window took too long, to the point where sometimes the snapshot just wouldn't run at all. The problem occurred both for small tables with only hundreds or thousands of rows and for bigger tables with more than 200 thousand rows. Granted, we tried it on Debezium version 2.4.x (we're now using 2.5.0.Final).

Is the problem really caused by the low chunk size value, or is there actually a bug in the previous version that prevented the connector from activating the snapshot window?

Thanks,
Helmi Aziz Muhammad.

Helmi Aziz Muhammad

unread,
Jan 3, 2024, 11:07:29 PM1/3/24
to debezium
Also, is it okay to truncate the signal table once in a while to free up space? Would we still be able to use the signal table after that?

Thanks,
Helmi Aziz Muhammad.

Chris Cranford

unread,
Jan 4, 2024, 6:07:49 AM1/4/24
to debe...@googlegroups.com
Hi -

Well, it would depend greatly on whether you're referring to 2.4.0, 2.4.1, or 2.4.2 :).  If you were using 2.4.0 or 2.4.1, then yes, it's possible you hit the race condition with the Oracle ARC process that caused issues with the capture process, which could explain why the signal wasn't detected.  This was fixed in 2.4.2.Final and 2.5.0.Final.

Chris

Chris Cranford

unread,
Jan 4, 2024, 6:09:50 AM1/4/24
to debe...@googlegroups.com
Yes, as long as you do not have any in-progress incremental snapshots, you can truncate the table.
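As a sketch, once any in-progress incremental snapshot has finished (or been halted with a `stop-snapshot` signal), the cleanup is just an ordinary truncate of the signal table from the thread above:

```sql
-- Safe only when no incremental snapshot is in progress.
TRUNCATE TABLE C##DBZUSER.CDC_KAFKA;
```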

Chris

Helmi Aziz Muhammad

unread,
Jan 4, 2024, 6:29:51 AM1/4/24
to debezium
Hello,

There's one last thing I want to clarify. We tried truncating the signal table and then inserting a new snapshot signal row with the same unique ID as one we had inserted before the truncate, and the signal did not activate the snapshot. Only after we inserted another signal with a different ID did the snapshot start. Does the database or the connector somehow keep all previously processed signal IDs, even when those rows no longer exist in the signal table, so that a new signal reusing an old ID doesn't work? Or is this caused by some other problem we don't yet know about?

Thanks,
Helmi Aziz Muhammad.

Chris Cranford

unread,
Jan 4, 2024, 8:48:02 AM1/4/24
to debe...@googlegroups.com
Hi,

The only time incremental snapshot state is persisted to the offsets is while one is in progress.  Once the incremental snapshot finishes, that state is discarded, so there shouldn't be an issue reusing the same unique ID after the truncate.  Looking at our test suite, this isn't a combination we explicitly test; all our use cases explicitly drop and re-create the table, but they do reuse the same signal primary key of `ad-hoc` with no problem.  Technically, whether the table is dropped or truncated shouldn't matter.  I'll take a look at this corner case, see if there is anything lurking, and report back.

Chris

Helmi Aziz Muhammad

unread,
Jan 4, 2024, 9:24:46 AM1/4/24
to debezium
Hello,

Please keep us posted if you find anything about it. Good luck.

Helmi Aziz Muhammad.