Adding a New Table to a Running Debezium Oracle Connector


Gergely Jahn

Aug 17, 2025, 5:09:24 PM
to debezium

Hi Community,

Let’s say I have a running Debezium Oracle source connector, and I am currently capturing changes from 15 tables after performing an initial snapshot. I would like to add a new table and start with an initial snapshot for that table only, while continuing streamed data capture for the existing tables. Once the snapshot for the new table is complete, I want the streamed data capture to continue for it as well.

With signaling, I can initiate a blocking snapshot, but I have some timing concerns. Suppose I update the configuration and add the new table to the include list. When I deploy the updated configuration, Debezium will start streamed data capture for the newly added table and may already produce records to the Kafka topic from ongoing updates. If I then initiate a blocking snapshot, those streamed records will already be in the topic, so the snapshot records will follow them. Once the blocking snapshot completes, streamed data capture continues.

This results in the sequence:

[streamed data capture #1][snapshot][streamed data capture #2]

I would like to do this consistently, so that the baseline (snapshot) is published first to the topic, followed by subsequent changes:

[snapshot][streamed data capture]

One option would be to use a separate connector, but if I add a new connector every time I start CDC for a new table, I will end up managing a large number of connectors.

I would really appreciate any guidance on how to achieve this.


Thanks in advance,

Greg

Chris Cranford

Aug 18, 2025, 9:49:09 AM
to debe...@googlegroups.com
Hi Greg -

I think the closest you could get to this working would be to rely on the File or Kafka-topic signal channels to send the blocking snapshot signal while the connector is down and being reconfigured. On stream start-up, the connector should read the blocking snapshot request almost instantly.
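For reference, a blocking snapshot signal sent to the Kafka signal topic would look something like this (a sketch based on the documented signal format; the table name is a placeholder, and the message key must match the connector's `topic.prefix`):

Key: the `topic.prefix` value, e.g. `server1`
Value:
```json
{
  "type": "execute-snapshot",
  "data": {
    "data-collections": ["MYSCHEMA.NEWTABLE"],
    "type": "blocking"
  }
}
```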

Otherwise I am afraid that if you rely on database signaling, the signals are processed in the commit order of the transactions, and if the connector is mining changes in the past, there could be changes between the read position and the signal insert for the newly added table, resulting in the scenario you described.

You could also try to accomplish this with a transformation configured with a list of topic names: it would discard any event for those topics until the first `r` (snapshot read) event is observed, then allow that `r` event and any future changes thereafter. You would add this transformation to the connector configuration when adding the new table and remove it after the snapshot for the table ends. Such a transformation would only work for blocking snapshots, not incremental ones. It is also extremely brittle: if the transform's tracking isn't persisted across restarts, a rebalance immediately after the snapshot concludes and before you drop the transformation could lead to data loss.
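For what it's worth, a minimal in-memory sketch of such a gate transformation (the class name and `gated.topics` property are hypothetical, and the state is deliberately not persisted, which is exactly the brittleness above):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Drops events on the configured topics until the first snapshot read
 * ("op" = "r") is observed, then lets that event and everything after pass.
 * State is in-memory only, so it does NOT survive restarts or rebalances.
 */
public class SnapshotGate<R extends ConnectRecord<R>> implements Transformation<R> {

    private static final String TOPICS_CONFIG = "gated.topics"; // hypothetical property

    private Set<String> gatedTopics;
    private final Set<String> snapshotSeen = new HashSet<>();

    @Override
    public void configure(Map<String, ?> configs) {
        gatedTopics = new HashSet<>();
        Object topics = configs.get(TOPICS_CONFIG);
        if (topics != null) {
            for (String t : topics.toString().split(",")) {
                gatedTopics.add(t.trim());
            }
        }
    }

    @Override
    public R apply(R record) {
        if (!gatedTopics.contains(record.topic()) || snapshotSeen.contains(record.topic())) {
            return record; // not gated, or the gate is already open
        }
        Object value = record.value();
        // Debezium data events carry the envelope's "op" field; tombstones (null
        // values) and anything before the first "r" are dropped.
        if (value instanceof Struct && "r".equals(((Struct) value).getString("op"))) {
            snapshotSeen.add(record.topic()); // open the gate
            return record;                    // pass the first snapshot read through
        }
        return null;
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define(TOPICS_CONFIG, ConfigDef.Type.STRING, "",
                ConfigDef.Importance.HIGH, "Comma-separated list of topics to gate");
    }

    @Override
    public void close() {
    }
}
```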

-cc

Gergely Jahn

Aug 25, 2025, 5:04:28 PM
to debezium
Hi Chris,

Thank you for the suggestions. I've tried the Kafka-topic signal with the following scenario:
- stopped the connector
- added a record to the new table
- sent the blocking snapshot signal to the Kafka topic
- updated connector config with the new table
- resumed connector

The connector picked up the change (`op: c`) first and then the snapshot (`op: r`).
The transformation-based solution sounds risky.

I think it would be possible to solve this with a stream processing application (for example Flink or Kafka Streams), discarding the changes until the first snapshot message and using a durable state store.
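For example, a minimal Kafka Streams sketch of that gate, assuming string-serialized Debezium JSON events and placeholder topic names. The persistent store is backed by a changelog topic, so the "gate open" decision survives restarts:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class SnapshotGateTopology {

    private static final String STORE = "snapshot-seen"; // durable, changelog-backed

    // Drops events from inTopic until the first snapshot read (op = "r"),
    // then relays that event and everything after it to outTopic.
    public static StreamsBuilder build(String inTopic, String outTopic) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.addStateStore(Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore(STORE),
                Serdes.String(), Serdes.String()));

        builder.stream(inTopic, Consumed.with(Serdes.String(), Serdes.String()))
               // KStream#process with attached state stores needs Kafka Streams 3.3+
               .process(() -> new Processor<String, String, String, String>() {
                   private KeyValueStore<String, String> store;
                   private ProcessorContext<String, String> ctx;

                   @Override
                   public void init(ProcessorContext<String, String> context) {
                       ctx = context;
                       store = context.getStateStore(STORE);
                   }

                   @Override
                   public void process(Record<String, String> rec) {
                       boolean open = store.get(inTopic) != null;
                       // Naive check for the envelope's op field; a real
                       // implementation would parse the JSON properly.
                       boolean snapshotRead = rec.value() != null
                               && rec.value().contains("\"op\":\"r\"");
                       if (!open && snapshotRead) {
                           store.put(inTopic, "open"); // persisted via the changelog
                           open = true;
                       }
                       if (open) {
                           ctx.forward(rec);
                       }
                       // otherwise: drop pre-snapshot streamed events
                   }
               }, STORE)
               .to(outTopic, Produced.with(Serdes.String(), Serdes.String()));
        return builder;
    }
}
```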

It seems like adding another connector is the easiest and safest way when we need an initial snapshot for a newly added table or tables.
Can we consider this a best practice?
Are there any drawbacks besides the increased load on the source database?

Best,
Greg



Chris Cranford

Aug 25, 2025, 5:43:00 PM
to debe...@googlegroups.com
Hi Greg -

Flink or Kafka Streams is definitely a much better alternative than a transformation, 100%.

The main drawback with multiple connectors is the load on the source, as you point out. I'd like to propose another option: a temporary connector. The steps would look something like:
1. Stop the main connector
2. Deploy temporary connector
    - Use new schema history topic for this connector
    - Use different connector name
    - Use same topic prefix as main connector so that topic naming is the same
    - Include only the new table in the include list
    - Set snapshot mode to `initial_only`
3. Once it finishes the snapshot, remove the temporary connector along with its schema history topic.
4. Add new table to the main connector config
5. Restart the connector
With this approach you keep to a single permanent connector, but it does require a downtime window for the main connector while you perform the snapshot. If the table's snapshot takes a while, you risk archive logs expiring and being removed, so this needs to be managed well. This is one of the main benefits of something like incremental snapshots over these other approaches.
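For illustration, the step-2 temporary connector's config might look something like this (names and the table are placeholders; the database connection settings, omitted here, would match the main connector):

```json
{
  "name": "oracle-tmp-newtable",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "topic.prefix": "server1",
    "table.include.list": "MYSCHEMA.NEWTABLE",
    "snapshot.mode": "initial_only",
    "schema.history.internal.kafka.topic": "schema-changes.oracle-tmp-newtable"
  }
}
```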

One noteworthy point to consider is the potential overlap of changes. If we assume the main connector is stopped at SCN 1000 and the temporary connector snapshots at SCN 2000, then when the main connector is restarted in step 5, any changes to this table between SCN 1000 and 2000 will make the state of the row appear to go back in time and then roll forward again. The net result is idempotent, but since I don't know what your consumers expect or allow, it's just worth mentioning.

Hope that helps.
-cc

Gergely Jahn

Aug 26, 2025, 3:22:06 AM
to debezium
Hi Chris,

Let me summarize the solutions and please let me know if I've missed something:

Solution #1:
New connector for the new table(s), with a different connector name and `schema.history.internal.kafka.topic`.

Pros:
- Easy and quick
- No downtime
- Exactly once
- In order (first the snapshot, then the changes)
Cons:
- Additional load on the connect cluster
- Additional load on the source system
- Operational overhead

Solution #2
A temporary connector, as you described above:

1. Stop the main connector
2. Deploy temporary connector
    - Use new schema history topic for this connector
    - Use different connector name
    - Use same topic prefix as main connector so that topic naming is the same
    - Include only the new table in the include list
    - Set snapshot mode to `initial_only`
3. Once it finishes the snapshot, remove the temporary connector along with its schema history topic.
4. Add new table to the main connector config
5. Restart the connector

Pros:
- Easy and quick
- No lasting additional load on the Connect cluster (the temporary connector is removed after the snapshot)
- No lasting additional load on the source system
- No ongoing operational overhead
- In order (first the snapshot, then the changes)
Cons:
- Downtime
- At least once (there will be some change events after the snapshot that do not represent a real data change)

Solution #3

Blocking snapshot via signaling. As we tested above, it is possible that we receive some change messages before the snapshot, but we can easily drop them with a stream processing application.
1. Create temporary topic(s) for the new table(s)
2. Start the stream processing application. It consumes messages from the temporary topic(s) and either drops them (per table, until the first `op: r`) or relays them to the final topic(s)
3. Add the new table(s) to the connector config and route them to the temporary topic(s) (see the routing sketch after this list)
4. Restart the connector
5. Send blocking snapshot signal(s) 
6. When the blocking snapshots are completed stop the connector
7. Let the stream processing application consume all messages from the temporary topics
8. Stop the stream processing application
9. Delete the temporary topics
10. Update connector config and route the new table(s) to the final topic(s)
11. Restart the connector
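The routing in step 3 could use Kafka Connect's stock RegexRouter, added to the connector config, something like this (topic names are placeholders):

```json
"transforms": "tmproute",
"transforms.tmproute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.tmproute.regex": "server1\\.MYSCHEMA\\.NEWTABLE",
"transforms.tmproute.replacement": "tmp.server1.MYSCHEMA.NEWTABLE"
```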

Pros:
- No additional load on the connect cluster
- No additional load on the source system
- No operational overhead
- Exactly once
- In order (first the snapshot, then the changes)
Cons:
- More difficult process
- Downtime
- Additional implementation

Solution #4

Incremental snapshots.
1. Add new table(s) to the connector configuration
2. Restart the connector
3. Send a signal to trigger an incremental snapshot (example below)
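The step-3 signal can go through any channel; via the signaling table it would look something like this (the signal table name and the data-collection are placeholders):

```sql
INSERT INTO DEBEZIUM_SIGNAL (id, type, data)
VALUES ('adhoc-1', 'execute-snapshot',
        '{"data-collections": ["MYSCHEMA.NEWTABLE"], "type": "incremental"}');
```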

Pros:
- Easy and quick
- No additional load on the connect cluster
- No additional load on the source system
- No operational overhead
- Exactly once
- No downtime
Cons:
- Out of order (the snapshot overlaps with the changes)

Let me know your thoughts.

Best,
Greg

Chris Cranford

Aug 26, 2025, 10:29:48 AM
to debe...@googlegroups.com
Hi Greg -

That covers all 4 solutions fairly accurately.

-cc

Gergely Jahn

Aug 26, 2025, 10:37:17 AM
to debezium
Thank you Chris for the help.