Adding new tables to existing CDC MySQL Connector


Kirk Prybil

Jul 18, 2018, 4:19:25 PM
to debezium
We are currently using the v 7.5 Debezium MySQL Connector for CDC and need to update it to capture additional tables and schemas/databases.

The initial setup designated:
database.whitelist: D1, D2, D3
table.whitelist: D1.t1, D2.t1, D3.t1

We need to add an additional database to the whitelist, as well as additional tables to the existing D1, D2...

Is this possible to do in the existing connector? If so, what are the steps?

If not, do we need to define another connector with a different database.history.kafka.topic?

Thanks for your help.

Gunnar Morling

Jul 19, 2018, 4:54:42 AM
to debezium
Hi Kirk,

Updating the whitelist/blacklist configuration is a long-standing feature request which still needs to be implemented. There's an issue (DBZ-175) for tracking it, and some work towards it has also been done (see the pending PR for it), but it's not quite there yet. I hope we can provide this functionality some time soon.

> Is this possible to do in the existing connector? If so, what are the steps?

It's not fully possible atm. Only if you don't need an initial snapshot of any newly added tables (i.e. it's enough for your use case to see changes for those tables from "now" on, which might be OK for new tables before they are actually put to use) is there a way which might work for you. See the "schema_only_recovery" snapshot mode in the connector docs and the situations in which it can be used (in particular, no schema changes must have happened since the last recorded offset). The following procedure could then work:

* Add the new tables, but don't write to them yet
* Shut down the connector
* Change the filter configuration and set snapshot mode to "schema_only_recovery"
* Delete (or rename) the existing DB history topic
* Restart the connector; this will re-create the internal history of the DB schema and then continue streaming from the previously recorded offset
* Begin writing to the new tables

I'd recommend trying this out on a testing system first.
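For illustration, the reconfigured connector for this procedure might look roughly like the following (the connector name, server name, history topic, and the added D4/t2 entries are hypothetical; the property keys are the standard MySQL connector options):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.whitelist": "D1,D2,D3,D4",
    "table.whitelist": "D1.t1,D1.t2,D2.t1,D3.t1,D4.t1",
    "database.history.kafka.topic": "dbhistory.inventory",
    "snapshot.mode": "schema_only_recovery"
  }
}
```

You'd apply this (e.g. via the Kafka Connect REST API) while the connector is stopped and after the history topic has been deleted or renamed, per the steps above.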

> If not, do we need to define another connector with a different database.history.kafka.topic?

That'd be the easiest approach, esp. as it'd also allow for an initial snapshot of the newly added tables.
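A sketch of such a second connector (all names hypothetical): it needs its own connector name, database.server.name, and database.history.kafka.topic so it doesn't clash with the existing one, and with snapshot.mode=initial it will snapshot the newly added tables:

```json
{
  "name": "additional-tables-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.server.name": "dbserver-additional",
    "database.history.kafka.topic": "dbhistory.additional",
    "database.whitelist": "D4",
    "table.whitelist": "D4.t1",
    "snapshot.mode": "initial"
  }
}
```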



Manav Dewan

Jul 19, 2018, 2:22:27 PM
to debezium

Can't we follow the steps below?

Precondition: the binlog event of the table creation is still retained in MySQL.
Let's assume the table-creation event is at time T1.

1. Stop the Debezium connector task.
2. Move the connector offset to a binlog position before the table was added (T1).
3. Whitelist the new table in the filter.
4. Restart the connector task.

Benefit: any data added after the new table's creation will be captured too.

Disadvantage: duplicate CDC events, since finding the exact location of T1 is difficult. I can move the offset to the start of the last known binlog location. So if I have a binlog retention of 7 days and the table was added within those 7 days, Debezium will replay all events again.
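The offset move in step 2 isn't supported via configuration; one known workaround pattern is to publish a modified offset record to the Kafka Connect offsets topic while the connector is stopped. A minimal sketch of constructing such a record (the connector name, server name, and binlog file/position below are all hypothetical; the key/value layout follows the MySQL connector's offset format):

```python
import json

# Hypothetical names; adjust to your deployment.
connector_name = "inventory-connector"   # the Connect connector name
server_name = "dbserver1"                # the connector's database.server.name

# The offset record key identifies the connector; the value holds the binlog
# position to resume from. Rewinding means publishing a record with an
# earlier file/pos (here: an assumed position before T1).
offset_key = json.dumps([connector_name, {"server": server_name}])
offset_value = json.dumps({"file": "mysql-bin.000007", "pos": 4})

print(offset_key)
print(offset_value)
```

The resulting key/value pair would then be produced to the Connect offsets topic (same key and partition as the existing offset record) with a console producer, while the connector is stopped. Definitely try this on a test system first.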

Gunnar Morling

Jul 20, 2018, 3:10:17 AM
to debezium
Yes, this should work in general, but as you said, one might end up with lots of duplicated events. Also, it would only really work if the additional table you'd like to capture is rather new; if it has already existed for a long time, its creation event won't be in the binlog any more.

But this is all not very convenient, and we should try to automate as much as possible. Essentially, after a filter change the connector should identify the tables to be captured in addition, take an initial snapshot of just those, and then continue reading the logs for all the tables. See DBZ-175 for a discussion of some details. I hope we'll get this sorted out some time soon.

Chris Riccomini

Jul 20, 2018, 11:30:55 AM
to debezium
FYI- Moira Tagle has made substantial progress on DBZ-175. There are still a couple of minor things to be worked out, but I expect that we'll see this feature soon. @gunnar, I suggest reaching out to her for status.

Eero Koplimets

Jul 20, 2018, 4:19:52 PM
As we'll need the same thing in the future, I've been thinking about it as well... What do you think about this approach?

So let's say we have a very typical Debezium job with snapshot.mode=initial and table whitelist=A,B, to make it more difficult :)

But now we need to add table C to the whitelist. C is an old table, already filled with data and created a long time ago.

With current 0.8, executed during low activity:
1) Pause the existing job with whitelisted tables A,B.
2) Create a new job with new offset and history topics and table whitelist=C (only the new table). Let it perform the initial data snapshot.
3) When the snapshot is done, delete this new job plus its offset/history etc. topics. This gets us the existing data from table C.
4) Change the existing job's config to snapshot.mode=schema_only_recovery and change the whitelist to A,B,C.
5) Delete the existing job's history topic.
6) Resume the existing job and let it recreate the history topic, now with table C's definition as well. The connector should resume from the old offset, so there's only a very small amount of data duplication from table C.
7) Change the configuration back to snapshot.mode=initial, just to be back at the starting point.
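Step 2's throwaway connector could be sketched like this (all names hypothetical); the important part is that its connector name, server name, and history topic differ from the existing job's, so its snapshot of C doesn't interfere:

```json
{
  "name": "temp-snapshot-c",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.server.name": "temp-snapshot-server",
    "database.history.kafka.topic": "dbhistory.temp-snapshot-c",
    "table.whitelist": "C",
    "snapshot.mode": "initial"
  }
}
```

Once the snapshot completes, this connector and its offset/history topics are deleted again, per step 3.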


Gunnar Morling

Aug 2, 2018, 3:44:52 AM
to debezium
Sorry for the late reply; this fell through the cracks a bit. This approach should work in general, but as you said, you need to be prepared to see some duplicated events for the newly added table. In particular, the state could be somewhat inconsistent in the interval between the point where you continue reading after the shutdown and the point where the snapshot of "C" was taken. Once reading has proceeded beyond that snapshotting point, everything should be consistent again.

That all being said, as Chris rightly points out, these are all workarounds and we really ought to provide the capability for whitelist changes in the connectors themselves. I'll get back to that PR by Moira and we'll hopefully be able to merge it soon.



Aug 13, 2021, 11:52:17 PM
to debezium
Hi all, how is this feature going? If it has been merged, which version is it in?

In my test, I found that when I use snapshot.new.tables=parallel, then add a new table to the table whitelist and update the connector configuration, some of the existing tables are randomly selected to re-execute the snapshot, resulting in duplicate data in the table topics.
