Hi
I deployed Debezium connector v2.5 with snapshot.mode=initial. The snapshot completed quickly for most tables, but it slowed significantly for one specific table, which emitted only ~50 records over an extended period.
Upon investigation, I noticed this table has a composite primary key made up of three columns, which may be contributing to the slowdown. It appears that table structure and key complexity can affect snapshot throughput, especially during incremental snapshotting.
Let me know if you’ve encountered similar behavior or have recommendations for optimizing performance in such cases.
What else could be the reason?
Logs and config are attached below.
Best,
Ramesh.
I’m using Oracle 11g with the Debezium connector in snapshot.mode=initial, and the snapshot completed quickly for most tables. However, one table is noticeably slower.
Here’s what I’ve found so far: the table has a composite primary key, TEXT field.
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/5a9671ff-79ed-4368-93de-b843570982e1%40gmail.com.
Thank you for the detailed explanation.
Could you please outline the exact corrective steps? Our initial snapshot is hanging on one of our tables, and each time we restart the connector with snapshot.mode=initial, it reruns the full snapshot for all tables.
How can we allow the snapshot for that specific table to complete normally? Should we change field types, or is there an alternative approach?
Additionally, if we temporarily remove that table from the connector’s include list, what steps should we follow to prevent the other tables from being resnapshotted on restart?
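For reference, the "temporarily remove that table" option amounts to editing the comma-separated table.include.list value. Below is a minimal Python sketch of that edit. The table names (APP.ORDERS, APP.CUSTOMERS, APP.SLOW_TABLE) and the helper drop_table are hypothetical. Note that Debezium treats the entries of table.include.list as regular expressions, whereas this sketch assumes plain fully-qualified names and compares them literally. It only rewrites the property value; whether the remaining tables are resnapshotted on restart depends on the offsets the connector has stored, which this sketch does not address.

```python
def drop_table(config: dict, table: str) -> dict:
    """Return a copy of the connector config with `table` removed
    from the comma-separated table.include.list property."""
    raw = config.get("table.include.list", "")
    tables = [t.strip() for t in raw.split(",") if t.strip()]
    remaining = [t for t in tables if t != table]  # literal match, not regex
    updated = dict(config)  # leave the caller's config untouched
    updated["table.include.list"] = ",".join(remaining)
    return updated


# Hypothetical connector config; only the keys relevant here are shown.
config = {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "snapshot.mode": "initial",
    "table.include.list": "APP.ORDERS, APP.CUSTOMERS, APP.SLOW_TABLE",
}

new_config = drop_table(config, "APP.SLOW_TABLE")
print(new_config["table.include.list"])
```

The updated dictionary could then be submitted through whatever mechanism is used to manage the connector's configuration.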
The connector is running in snapshot.mode=initial with 11 tables. Snapshots for 10 tables have completed (row counts match the database), but the 11th table is still hanging, and I never saw an explicit “snapshot complete” log for each finished table. On connector restart, it performs full snapshots for all 11 tables again.
My goal: skip snapshotting that last problematic table and start CDC for all 10, without triggering a full resnapshot on connector or cluster restart.
Questions:
1. Excluding the problematic table
   - Should I use the Debezium signal table, or remove the table from table.include.list?
2. Restart behavior
   - If the problematic table remains in table.include.list, will a connector or full cluster restart resume only that table’s snapshot?
   - If it’s removed, will Debezium skip all snapshots and go straight to CDC for the other 10?
3. Snapshot tracking
   - Does Debezium track “snapshot complete” on a per-table basis (so it can resume individually), or only when all tables have finished?
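To make the signal-table option concrete: an ad hoc incremental snapshot is requested by inserting a row with id, type, and data columns into the signalling table, where type is execute-snapshot and data is a JSON document listing the data collections. The Python sketch below only builds such a row. The helper name build_snapshot_signal and the table name ORCL.APP.SLOW_TABLE are hypothetical, and it assumes the connector already has signal.data.collection configured and that the naming format matches what the Oracle connector expects.

```python
import json
import uuid


def build_snapshot_signal(tables):
    """Build the (id, type, data) values for an execute-snapshot signal row.

    `tables` is a list of fully-qualified table names (hypothetical here).
    """
    payload = {"data-collections": list(tables), "type": "incremental"}
    return {
        "id": str(uuid.uuid4()),      # any unique string identifying the signal
        "type": "execute-snapshot",
        "data": json.dumps(payload),  # stored as a JSON string in the data column
    }


signal = build_snapshot_signal(["ORCL.APP.SLOW_TABLE"])
print(signal["type"], signal["data"])
```

The resulting values would be inserted into the signalling table with an ordinary SQL INSERT; the sketch does not talk to the database.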
We’ve set heartbeat.interval.ms = 6000, and after the initial snapshot completes (even though no explicit log message confirms it), we verified completion by matching the record count between the database and Kafka topics.
However, we’ve observed that when the connector is restarted with snapshot.mode=initial and no live changes are occurring in the database, the snapshot is re-triggered for the same existing data.
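The record-count check described above can be sketched as a simple per-table comparison. This is a hedged Python illustration only: the function count_mismatches, the table names, and all numbers are made up for the example, and how the counts are obtained (SQL COUNT(*) on the database side, consuming each topic on the Kafka side) is left out. Keep in mind that after a re-triggered snapshot a topic may contain duplicate records, so a topic count higher than the database count does not necessarily mean data is wrong.

```python
def count_mismatches(db_counts, topic_counts):
    """Return {table: (db_rows, topic_records)} for tables whose counts differ."""
    mismatches = {}
    for table, db_rows in db_counts.items():
        topic_records = topic_counts.get(table, 0)  # missing topic counts as 0
        if topic_records != db_rows:
            mismatches[table] = (db_rows, topic_records)
    return mismatches


# Illustrative numbers only, not taken from a real system.
db_counts = {"APP.ORDERS": 1200, "APP.CUSTOMERS": 300, "APP.SLOW_TABLE": 5000}
topic_counts = {"APP.ORDERS": 1200, "APP.CUSTOMERS": 300, "APP.SLOW_TABLE": 50}

print(count_mismatches(db_counts, topic_counts))
```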
Q1: How can we prevent this re-snapshot from happening upon connector restart?
Additionally, we are preparing for production deployment, where there will be both existing data and ongoing CDC activity.
Q2: What is the recommended snapshot mode to use during startup so that it captures the existing data and then continues with live CDC seamlessly?
Looking forward to your guidance.
Regards,
Ramesh