Snapshot workaround for SQL Server

45 views
Skip to first unread message

Аяна А

unread,
Oct 31, 2025, 3:37:14 AM (13 days ago) Oct 31
to debezium
Hello everyone,

I am fairly new to Debezium, currently trying to set up cdc pipeline between our source DB on SQL Server and Postgres DWH.
After speaking with the DevOps team, it turned out that we cannot mindfully take the snapshot from source before starting streaming(current hardware limitations regarding Debezium server), do you think that such workaround would be feasible:
1. Turning on cdc on needed tables in prod and a read only replica
2. Figuring out the max LSN in read only
3. Taking a manual snapshot from the replica through ETL(bypassing debezium)
4. Storing said max LSN in the config for the connector
5. Starting up the connector with schema_only_recovery (gonna connect to prod, taking into considerations that logs are still kept alive)

In theory this sounds like a feasible solution, but I probably lack enough insight to be completely sure and we cannot start the testing just yet, so I thought about asking here.

If it is indeed feasible, what are some things beside log retention I have to consider?
Any tips, critique and help appreciated.
Kind regards,
Aiana.

jiri.p...@gmail.com

unread,
Oct 31, 2025, 3:39:31 AM (13 days ago) Oct 31
to debezium
Hi,

I'd recommend to modify the procedure a bit

1. Enable CDC one prod only
2. Start Debezium with snapshot mode no data
3. Stop Debezium
4. Execute the snapshot from replica
5. Modify the LSN in offsets if needed
6. Restart Debezium

Jiri

Аяна А

unread,
Oct 31, 2025, 4:14:28 AM (13 days ago) Oct 31
to debezium
Hello Jiri.
Thank you for the quick answer and the insights.
I have a couple of follow up questions, if you do not mind:

1. Would you consider this workflow (manual snapshot + no_data) production-safe for large SQL Server deployments, or mainly a workaround for initial load?” (need it for the documentation)
2. In theory would such set up entail any duplicates or gaps in events?

Again thank you for your help.
Aiana.

jiri.p...@gmail.com

unread,
Oct 31, 2025, 4:22:30 AM (13 days ago) Oct 31
to debezium
I personally consider this solution to be more or less workaround. The proper solution as I can envision it would be a new snapshot mode (probably external) in Debezium connectors.
It would work in the way that the connector will start, take snapshot of schema, record the offets and then pause.
Then the initial load is done externally and when completed a signal is sent to Debezium and it will resume the operation and start streaming.

WRT the events - there should be no gaps but you have to expect duplicates which is mostly unavoidable given  the processes cannot cooperat at the database level.

Jiri

Аяна А

unread,
Oct 31, 2025, 4:46:10 AM (13 days ago) Oct 31
to debezium
Thank you!
Reply all
Reply to author
Forward
0 new messages