Snapshot Performance Observation – Debezium Connector v2.5


Ramesh K

Sep 17, 2025, 3:34:34 AM
to cran...@gmail.com, debe...@googlegroups.com

Hi,

I deployed Debezium connector v2.5 with snapshot.mode=initial. The snapshot completed quickly for most tables, but performance significantly slowed for one specific table — emitting only ~50 records over an extended period.

Upon investigation, I noticed this table has a composite primary key spanning three columns, which may be contributing to the slowdown. It appears that table structure and key complexity can impact snapshot throughput, especially during incremental snapshotting.

Let me know if you’ve encountered similar behavior or have recommendations for optimizing performance in such cases.

What else could be the reason?

The logs and config are attached below.

Best,
Ramesh.


config and logs.txt

jiri.p...@gmail.com

Sep 17, 2025, 6:38:50 AM
to debezium
Hi,

We had a similar problem with incremental snapshots, and that should be fixed in Debezium 3.x. I am not aware of any performance issues with initial snapshots that would relate to a composite key.

Jiri

Chris Cranford

Sep 17, 2025, 6:48:53 AM
to debezium
Hi Ramesh -

During the initial snapshot, there should be little to no performance impact from a table with a composite key, because this process issues a simple SELECT and iterates over all the rows one by one. However, during incremental snapshots, a composite key can most definitely impact performance. This is because the database optimizer may decide not to use the primary key index due to the predicate conditions, which makes the chunk-based query execute slower as it falls back to a full table scan to determine each chunk of rows.

Which specific database source are you working with, and is there a single column in the table that is considered unique that you could use as a surrogate key instead?
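For reference, when an incremental snapshot is triggered via the signal table, the signal's `data` payload can name a surrogate key so the chunk queries use that column instead of the composite primary key. A minimal sketch of the payload, where the table and column names are placeholders:

```json
{
  "data-collections": ["MYSCHEMA.MYTABLE"],
  "type": "incremental",
  "surrogate-key": "MY_UNIQUE_COL"
}
```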

-cc

Chris Cranford

Sep 17, 2025, 6:49:44 AM
to debezium
Looks like Jiri and I cross posted, please disregard.

Ramesh K

Sep 17, 2025, 7:31:04 AM
to debe...@googlegroups.com
Hi

I’m using Oracle 11g with the Debezium connector in snapshot.mode=initial, and the snapshot completed quickly for most tables. However, one table — — is noticeably slower.

Here’s what I’ve found so far about the table:

  • it has a composite primary key,

  • no single-column unique key exists,

  • it stores large payloads in the TEXT field.

Do any of these have an effect?

Appreciate any suggestions you might have to optimise the snapshot.

Best regards,
Ramesh.


--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/5a9671ff-79ed-4368-93de-b843570982e1%40gmail.com.

Chris Cranford

Sep 17, 2025, 11:47:29 AM
to debe...@googlegroups.com
Hi Ramesh -

The initial snapshot on Oracle tables that use CLOB columns is significantly slower due to how Oracle stores and retrieves LOB (Large Object) data. BLOB and CLOB column values are stored "out-of-line", meaning the actual LOB data is stored in separate database blocks from the main table row. The table row only contains a LOB locator (pointer) that references where the actual LOB data is stored.

When the initial snapshot is executed, Oracle must read the main table row that contains the LOB locator, make an additional IO call to retrieve the actual LOB content from the separate storage, and finally assemble the data from the row and the out-of-line storage. So reads of LOB data require additional block IO operations beyond the normal table scan, and if the table has many rows, this multiplies the IO operations substantially. In addition, LOB data is typically much larger and will consume the vast majority of the network bandwidth as result set data is received by Debezium. The larger the LOB data, the longer it takes to serialize the row or rows to the client. Lastly, LOB column data requires substantially more memory for buffering the result set, and if your fetch size is too large, this can lead to frequent garbage collection cycles that will impact the JVM's performance.
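If memory pressure from large LOB rows is the bottleneck, lowering the snapshot fetch size is one knob to try. A sketch of the connector setting, where the value is only an illustrative assumption and not a recommendation:

```json
{
  "snapshot.fetch.size": "1000"
}
```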

Hope that helps explain why you see slower performance with LOB columns.

-cc

Ramesh K

Sep 17, 2025, 12:28:54 PM
to debe...@googlegroups.com
Hi Chris,


Thank you for the detailed explanation.

Could you please outline the exact corrective steps? Our initial snapshot is hanging on one of our tables, and each time we restart the connector with snapshot.mode=initial, it reruns the full snapshot for all tables.

How can we allow the snapshot for that specific table to complete normally? Should we change field types, or is there an alternative approach?

Additionally, if we temporarily remove that table from the connector’s include list, what steps should we follow to prevent the other tables from being resnapshotted on restart?


Regards,
Ramesh


Chris Cranford

Sep 17, 2025, 12:46:43 PM
to debe...@googlegroups.com
Hi Ramesh -

How important is the CLOB column when it comes to CDC? If the CLOB column is not really relevant, perhaps you could use `column.exclude.list` and exclude that specific column for the table. In this case, the column would not be part of the SELECT statement and the snapshot should run faster. However, if the column data is important, then you will need to pick between initial and incremental snapshots. Outside of excluding the column, I'm afraid there is little Debezium can do to improve the performance of fetching your row data and the CLOB data from external storage.
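As a sketch, the exclusion is a fully-qualified column name in the connector configuration; the schema, table, and column names below are placeholders:

```json
{
  "column.exclude.list": "MYSCHEMA.MYTABLE.MY_TEXT_COL"
}
```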

-cc

Ramesh K

Sep 18, 2025, 4:20:33 AM
to debe...@googlegroups.com, jiri.p...@gmail.com
Hi Chris/Jiri,

The problematic table looks like the snippet below.

image.png

I'm running the connector in snapshot.mode=initial with 11 tables. Snapshots for 10 tables have completed (row counts match the database), but the 11th table is still hanging, and I never saw an explicit “snapshot complete” log for each finished table. During a connector restart, it performs full snapshots for all 11 tables.

My goal: skip snapshotting that last problematic table and start CDC for all 10, without triggering a full resnapshot on connector or cluster restart.

Questions:

  1. Excluding the problematic table

    • Should I use the Debezium signal table, or

    • remove it from table.include.list?

  2. Restart behavior

    • If the problematic table remains in include.list, will a connector or full cluster restart resume only that table’s snapshot?

    • If it’s removed, will Debezium skip all snapshots and go straight to CDC for the other 10?

  3. Snapshot tracking

    • Does Debezium track “snapshot complete” on a per-table basis (so it can resume individually), or only when all tables have finished?


Regards,
Ramesh.

Chris Cranford

Sep 18, 2025, 11:39:06 PM
to debe...@googlegroups.com
Hi Ramesh

I apologize for the misunderstanding; when you said TEXT before, I assumed you meant CLOB. Unfortunately, LONG data types are not supported by Debezium [1].

Now, that should not cause the connector to hang. Can you please do me a favor: include only this specific table in your include list, enable DEBUG logging for `io.debezium`, and start the connector. Let it run for a bit, and then please share the logs.

Thanks,
-cc

[1]: https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-binary-character-lob-types

mohd fasil

Sep 19, 2025, 12:15:48 AM
to debe...@googlegroups.com
Hi Chris,

I have already shared the logs; please check the first email.

And how do we exclude this specific column, which has a LONG value?

Best regards,
Mohammed Fasil.

image.png

Ramesh K

Sep 19, 2025, 12:45:40 AM
to debe...@googlegroups.com
Thanks Fasil.


Chris, can you please outline the next steps?

Thanks,
Ramesh

Chris Cranford

Sep 19, 2025, 12:59:53 AM
to debe...@googlegroups.com
Hi -

The log in the first email does not have DEBUG enabled, nor is it the full and complete log. If you could please set up a second connector that captures changes only for the table with the LONG column, with debug logging enabled, and share the complete log, that would be great.

Thanks,
-cc

Ramesh K

Sep 19, 2025, 6:11:05 AM
to debe...@googlegroups.com
Hi Chris,

Logs are attached below.

How can we exclude one LONG column from the snapshot? What config should we use?

I appreciate your help.

Thanks.


logs.txt

Ramesh K

Sep 22, 2025, 7:54:40 AM
to debe...@googlegroups.com
Hi Chris,

Thanks, I included the config "column.exclude.list" for the LONG column and, yes, performance increased.

I have one more question: while the connector is running in initial mode and there are no live changes in the database, how can we know that the snapshot completed? Even though the record counts in the database and the topic are in sync, the logs still show snapshot completed = false (screenshot attached below). How can we identify that the snapshot is complete?
image.png

Regards,
Ramesh

Chris Cranford

Sep 22, 2025, 8:09:38 AM
to debe...@googlegroups.com
Hi Ramesh -

The more reliable way is to set `heartbeat.interval.ms` to a value greater than `0`. This way, when the snapshot finishes, a heartbeat event is emitted to Kafka and the offsets will reflect `snapshot=false` as you would expect. Otherwise, this only happens when the first streaming event is emitted.
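A minimal sketch of that setting in the connector configuration, where the interval value is just an example:

```json
{
  "heartbeat.interval.ms": "5000"
}
```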

-cc

Ramesh K

Sep 22, 2025, 11:37:50 AM
to debe...@googlegroups.com
Hi Chris,

We’ve set heartbeat.interval.ms = 6000, and after the initial snapshot completes (even though no explicit log message confirms it), we verified completion by matching the record count between the database and Kafka topics.

However, we’ve observed that when the connector is restarted with snapshot.mode=initial and no live changes occurring in the database, the snapshot is re-triggered for the same existing data.
Q1: How can we prevent this re-snapshot from happening upon connector restart?

Additionally, as we prepare for production deployment, where there will be both existing data and ongoing CDC activity:
Q2: What is the recommended snapshot mode to use during startup so that it captures both the existing data and continues with live CDC seamlessly?

Looking forward to your guidance.


Regards,
Ramesh


Chris Cranford

Sep 22, 2025, 11:06:53 PM
to debe...@googlegroups.com
Hi Ramesh -

You mention there is no explicit log message confirming the snapshot completed, so are you not seeing logs like this:

    Snapshot step 7 - Snapshotting data   
    ...
    Snapshot - Final stage
    Snapshot completed
    Snapshot ended with SnapshotResult [status=COMPLETED, offset=OracleOffsetContext [scn=..., commit_scn=[], lcr_position=null]]

If you are not seeing "Snapshot completed" and "Snapshot ended" in the logs, then I would argue the snapshot isn't finishing properly. Can you provide the full connector logs?

As for your Q2, the default `snapshot.mode` of `initial` is sufficient for this purpose in most cases. With Oracle, users may elect to use `no_data` instead and rely on incremental snapshots when the historical data is large and taking an initial snapshot would exceed what the database's undo retention allows; in that case the flashback query's SCN ages out of the undo retention space and the snapshot fails. This typically results in an ORA-01555 error, which we discuss in the FAQ [1] under "What's the cause for ORA-01555 and how to handle it?"
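A sketch of that alternative setup, assuming a signal table named MYSCHEMA.DEBEZIUM_SIGNAL (a placeholder name):

```json
{
  "snapshot.mode": "no_data",
  "signal.data.collection": "MYSCHEMA.DEBEZIUM_SIGNAL"
}
```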

Thanks,
-cc

[1]: https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-frequently-asked-questions

Ramesh K

Sep 23, 2025, 2:24:58 AM
to debe...@googlegroups.com
Hi Chris,

Thanks.

The connector was down for a few hours, and when we restarted it, it fails with this error:

    Online REDO LOG files or archive log files do not contain the offset scn 12337276408659. Please perform a new snapshot.
        at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:166)

image.png

What steps should be taken, please?

Regards,
Ramesh.

Ramesh K

Sep 23, 2025, 5:28:58 AM
to debe...@googlegroups.com
Hi,

I tried the suggestion below:

    Consider removing the offsets/history topic and redeploy the connector, taking a new snapshot.

but I still get the same error:

    Online REDO LOG files or archive log files do not contain the offset scn 12337276408659. Please perform a new snapshot.
        at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:166)

Please suggest further steps.


Chris Cranford

Sep 23, 2025, 7:51:05 AM
to debe...@googlegroups.com
Hi Ramesh -

In this case, you need to remove the offsets for the connector, delete its schema history topic, and restart the connector to re-take a full snapshot.
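In distributed Kafka Connect, clearing the offsets typically means publishing a tombstone (null-value) record to the Connect offsets topic, keyed by the connector name and the connector's `topic.prefix`. A sketch of that record key, where the connector and server names are placeholders and the record value must be null:

```json
["my-oracle-connector", {"server": "myserver"}]
```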

-cc

Chris Cranford

Sep 23, 2025, 8:06:30 AM
to debe...@googlegroups.com
Hi Ramesh -

How are you deleting the connector's offsets? The fact that the same error appears with the same SCN value would indicate the offsets were likely not cleared properly.

-cc