Hi Community,
Let’s say I have a running Debezium Oracle source connector, and I am currently capturing changes from 15 tables after performing an initial snapshot. I would like to add a new table and start with an initial snapshot for that table only, while continuing streamed data capture for the existing tables. Once the snapshot for the new table is complete, I want the streamed data capture to continue for it as well.
With signaling, I can initiate a blocking snapshot, but I have some timing concerns. Suppose I update the configuration and add the new table to the include list. When I deploy the updated configuration, Debezium will start streamed data capture for the newly added table and may produce some records to the Kafka topic from current updates. If I then initiate a blocking snapshot, those streamed records will already be in the topic, so the snapshot records will follow them. Once the blocking snapshot is completed, streamed data capture will continue.
This results in the sequence:
[streamed data capture #1][snapshot][streamed data capture #2]
I would like to do this consistently, so that the baseline (snapshot) is published first to the topic, followed by subsequent changes:
[snapshot][streamed data capture]
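For reference, this is roughly how I would trigger the blocking snapshot once the new table is in the include list (just a sketch of my setup; the signaling table DEBEZIUM.DEBEZIUM_SIGNAL, the table INVENTORY.NEW_TABLE, and the connection details are placeholders):

import oracledb  # python-oracledb

# Insert an ad-hoc snapshot signal into the signaling table; the connector
# picks it up and runs a blocking snapshot for the listed table only.
with oracledb.connect(user="debezium", password="<password>", dsn="dbhost:1521/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO DEBEZIUM.DEBEZIUM_SIGNAL (id, type, data) VALUES (:1, :2, :3)",
            [
                "adhoc-snapshot-new-table-1",
                "execute-snapshot",
                '{"data-collections": ["INVENTORY.NEW_TABLE"], "type": "blocking"}',
            ],
        )
    conn.commit()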
One option would be to use a separate connector, but if I add a new connector every time I start CDC for a new table, I will end up managing a large number of connectors.
I would really appreciate any guidance on how to achieve this.
Thanks in advance,
Greg
--
With this approach you keep just the one connector, but it does require a downtime window for the main connector while you perform the snapshot. If the table's snapshot takes a while, you risk archive logs expiring and being removed, so this needs to be managed carefully. Avoiding that window is one of the main benefits of something like incremental snapshots over these other approaches.

1. Stop the main connector
2. Deploy a temporary connector (see the first sketch after this list)
- Use a new schema history topic for this connector
- Use a different connector name
- Use the same topic prefix as the main connector so that topic naming is the same
- Include only the new table in the include list
- Set the snapshot mode to initial_only
3. Once the temporary connector finishes the snapshot, remove it and delete its schema history topic
4. Add the new table to the main connector config (see the second sketch after this list)
5. Restart the main connector
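Something like this for step 2, registering the temporary connector through the Kafka Connect REST API (a rough sketch only, assuming Debezium 2.x+ property names; the connector name, topics, hosts, and the new table's name are placeholders, and the database.* connection settings should be copied from your main connector):

import requests

CONNECT_URL = "http://connect:8083"  # Kafka Connect REST endpoint (placeholder)

temp_connector = {
    "name": "oracle-temp-new-table",  # different name than the main connector
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        # Same topic prefix as the main connector so the new table's topic
        # gets the same name it will have once the main connector owns it.
        "topic.prefix": "server1",
        "table.include.list": "INVENTORY.NEW_TABLE",  # only the new table
        "snapshot.mode": "initial_only",              # snapshot, then stop capturing
        # Separate schema history topic so the main connector's history is untouched.
        "schema.history.internal.kafka.topic": "schema-history.oracle-temp-new-table",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        # ...plus the same database.hostname / database.port / database.user /
        # database.password / database.dbname settings as the main connector...
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=temp_connector, timeout=30)
resp.raise_for_status()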
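And a sketch of steps 1, 4, and 5 against the same REST API (again, names are placeholders; note that PUT /connectors/{name}/config replaces the whole configuration, so the full updated config is sent back, and I use pause/resume here, with stop as an alternative on newer Connect versions):

import requests

CONNECT_URL = "http://connect:8083"  # placeholder Kafka Connect REST endpoint
MAIN = "oracle-main"                 # placeholder name of the main connector

# Step 1: pause the main connector while the temporary connector takes the snapshot.
requests.put(f"{CONNECT_URL}/connectors/{MAIN}/pause", timeout=30).raise_for_status()

# ... deploy the temporary connector, wait for its snapshot to finish,
#     then delete it and its schema history topic (steps 2 and 3) ...

# Step 4: fetch the current config and append the new table to the include list.
config = requests.get(f"{CONNECT_URL}/connectors/{MAIN}/config", timeout=30).json()
config["table.include.list"] += ",INVENTORY.NEW_TABLE"
requests.put(f"{CONNECT_URL}/connectors/{MAIN}/config", json=config, timeout=30).raise_for_status()

# Step 5: resume streaming (the config update restarts tasks, but the connector
# stays paused from step 1 until it is resumed).
requests.put(f"{CONNECT_URL}/connectors/{MAIN}/resume", timeout=30).raise_for_status()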