Create a MySQL Debezium Connector that only captures Schemas

Drew von Zweck

Jul 24, 2024, 4:33:10 PM
to debezium
I am trying to play with the configs on Debezium 2.4.0 to create a connector that only captures the schemas in the initial snapshot, and then only streams DDL for all the databases from there on out. I don't want ANY CDC data. This is essentially feeding a Kafka consumer that looks for specific DDL statements and notifies people.

I was hoping that these would work...
"schema.history.internal.store.only.captured.tables.ddl": "false",
"database.exclude.list": ".*",
"table.exclude.list": ".*"

My thinking was: let's ignore all the tables and databases from a data streaming perspective, but still capture all schemas and schema changes. It seems like the snapshot works, but no DDL after that is picked up. Is there any way to do this using the available configuration?

Drew von Zweck

Jul 24, 2024, 6:16:32 PM
to debezium
Setting table.include.list = "impossible-regex" seems to work. We aren't totally sure why, but is there some special logic with how an include list works with DDL changes vs an exclude list? We can work with this I believe, but would like to understand it better.

Based on our results we are wondering if...
We had an exclude list containing table1, with "schema.history.internal.store.only.captured.tables.ddl": "false". Then we removed table1 from the exclude list. Why wouldn't its schema exist?
vs.
We had an include list without table1, with "schema.history.internal.store.only.captured.tables.ddl": "false". Then we added table1 to the include list, and we see that the schema does exist in this scenario (as the documentation also states).
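
As a rough illustration of the "impossible-regex" idea above, here is a minimal Python sketch. It assumes the include list is a comma-separated set of regular expressions matched against the whole table identifier; the pattern `(?!)`, the helper `matches_any`, and the table names are hypothetical and not taken from the thread. Debezium itself uses Java regular expressions, so this only approximates the matching.

```python
import re

# Assumption: "(?!)" is one example of a pattern that can never match.
# The thread does not say which pattern was actually used.
IMPOSSIBLE = r"(?!)"


def matches_any(identifier: str, patterns: str) -> bool:
    """Return True if the whole identifier matches any comma-separated regex."""
    return any(
        re.fullmatch(p.strip(), identifier)
        for p in patterns.split(",")
        if p.strip()
    )


# Hypothetical fully qualified table names.
tables = ["inventory.customers", "inventory.orders", "sales.t1"]

# With an include list that matches nothing, no table is selected for change events.
included = [t for t in tables if matches_any(t, IMPOSSIBLE)]
print(included)
```

With an ordinary pattern such as `sales\..*` the same helper would select `sales.t1`, which is why a never-matching pattern is a simple way to select no tables at all.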

Chris Cranford

Jul 25, 2024, 6:00:22 AM
to debe...@googlegroups.com
Hi Drew -

So if "schema.history.internal.store.only.captured.tables.ddl" is set to "false", then the table structures of all non-built-in schemas will be captured, irrespective of your include or exclude lists.  When that setting is set to "true", only the tables that match the include list, or don't match the exclude list, will be added to the schema history topic.

So in other words, if "schema.history.internal.store.only.captured.tables.ddl" is set to "false", it shouldn't matter whether you use "table.include.list" or "table.exclude.list".  The schema for that table should be captured regardless.  The only thing the table include/exclude lists influence is whether CDC events for that table are sent as change events from Debezium or not.
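
The rule described above can be restated as a small toy model in Python. This is not Debezium's implementation: the function names are hypothetical, the regex matching is approximated with Python's `re.fullmatch`, and the model deliberately ignores built-in schemas and any database-level settings.

```python
import re
from typing import Optional


def _matches(table: str, patterns: Optional[str]) -> bool:
    """True if the whole table identifier matches any comma-separated regex."""
    return bool(patterns) and any(
        re.fullmatch(p.strip(), table) for p in patterns.split(",") if p.strip()
    )


def emits_change_events(table: str, include: Optional[str] = None,
                        exclude: Optional[str] = None) -> bool:
    """Toy model: the include/exclude lists decide whether CDC events are sent."""
    if include is not None:
        return _matches(table, include)
    if exclude is not None:
        return not _matches(table, exclude)
    return True


def schema_is_stored(table: str, store_only_captured_tables_ddl: bool,
                     include: Optional[str] = None,
                     exclude: Optional[str] = None) -> bool:
    """Toy model of the description above, not Debezium's actual code."""
    if not store_only_captured_tables_ddl:
        # "false": table structure is kept regardless of include/exclude lists.
        return True
    # "true": only tables selected for capture have their structure kept.
    return emits_change_events(table, include, exclude)


# Hypothetical table name, with every table excluded from CDC.
print(emits_change_events("db.t1", exclude=".*"))
print(schema_is_stored("db.t1", False, exclude=".*"))
print(schema_is_stored("db.t1", True, exclude=".*"))
```

Under this model, excluding every table stops change events while the table structure is still stored, as long as the tables-level setting is "false".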

If you observe anything different, we'd need to understand your configuration & sequence of events.

Thanks,
-cc
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/d6abe15d-0182-48ce-9598-8a9eed5bcaaen%40googlegroups.com.

Drew von Zweck

Jul 25, 2024, 10:29:59 AM
to debe...@googlegroups.com, Brandon Stanley
Hey Chris, in response to
'If you observe anything different, we'd need to understand your configuration & sequence of events.'
we can review what I am seeing.

Goal: We want to exclude CDC data for ALL tables, but include DDL for ALL tables (snapshot and subsequent) in the schema change topic.

Noteworthy configs that are shared across both scenarios...
'schema.history.internal.store.only.captured.tables.ddl': 'false',
'topic.creation.groups': 'schema_topic',
'topic.creation.schema_topic.include': '__schema_only_debezium_schema_change_data_factory_dev',
'topic.creation.schema_topic.replication.factor': -1,
'topic.creation.schema_topic.partitions': 1,
'topic.creation.schema_topic.cleanup.policy': 'delete',
'topic.creation.schema_topic.retention.bytes': -1,
'topic.creation.schema_topic.compression.type': 'lz4',
'transforms.AddPrefix.regex': 'data_factory_dev-debezium-schemaonly-202407242155',
'transforms.AddPrefix.replacement': '__schema_only_debezium_schema_change_data_factory_dev',
'schema.history.internal.kafka.topic': '__debezium_internal_schema_history_data_factory_dev-debezium-schemaonly-202407242155'

Scenario 1:
'table.exclude.list': '.*'
1. Create connector
2. See all messages from snapshot in __schema_only_debezium_schema_change_data_factory_dev topic
3. Alter schema of t1
4. I see a new message added to


Drew von Zweck

Jul 25, 2024, 10:31:21 AM
to debe...@googlegroups.com
Not sure why that message seems to have been formatted with duplicate content, but the highlighted Scenario 1-4 items are the main focus.

Chris Cranford

Jul 26, 2024, 8:20:02 AM
to debe...@googlegroups.com
Hi Drew

Now I understand.  So there are two key configuration properties:

    schema.history.internal.store.only.captured.tables.ddl
    schema.history.internal.store.only.captured.databases.ddl

Now by default, out of the box, both of these settings are `false`, except for MySQL and MariaDB, where "schema.history.internal.store.only.captured.databases.ddl" defaults to `true`.

So if you are connecting to a MySQL or MariaDB source database, you will need to override this property as `false` to get the desired behavior.
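
Putting that suggestion together with Scenario 1, a connector definition might look like the sketch below, written as a Python dict serialized to JSON. This is untested and the thread does not confirm the outcome. The connector name is hypothetical, connection settings (hostname, credentials, topic prefix, and so on) are omitted, and only the `connector.class` value is added beyond what the thread mentions.

```python
import json

# Hypothetical connector name; connection settings are intentionally omitted.
connector = {
    "name": "schema-only-example",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        # From Scenario 1: exclude every table from CDC change events.
        "table.exclude.list": ".*",
        # Keep DDL for all tables, not only the captured ones.
        "schema.history.internal.store.only.captured.tables.ddl": "false",
        # Defaults to true for MySQL/MariaDB per the reply above, so override it.
        "schema.history.internal.store.only.captured.databases.ddl": "false",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

The resulting JSON is the shape typically submitted to Kafka Connect when creating a connector; the remaining required properties would need to be filled in for a real deployment.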

Does that align with the scenarios in that case?

Thanks,
-cc