database.history.store.only.captured.tables.ddl


Gali Winer

Jul 25, 2022, 8:32:57 AM
to debezium

Hi,  

I want to use this property and set it to "true", but I see that there is some kind of downside and I don't understand why. Can you please explain it to me?

"Set to true with care because missing data might become necessary if you change which tables have their changes captured. The safe default is false."

Thanks,
Gali

Chris Cranford

Jul 25, 2022, 12:03:24 PM
to debe...@googlegroups.com, Gali Winer
Hi Gali -

When "database.history.store.only.captured.tables.ddl" is false, we capture all table DDL changes.  This allows you to safely add tables to the "table.include.list" configuration, whether they existed prior to the connector being created or were created during the connector's lifetime, because we have continuously tracked every table's schema evolution. 
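For illustration, a minimal sketch of the relevant part of a connector configuration (the connector name, class, and table names here are hypothetical, not from the thread):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "table.include.list": "inventory.customers,inventory.orders",
    "database.history.store.only.captured.tables.ddl": "false"
  }
}
```

With the property at its default of "false", DDL for tables outside "table.include.list" is still recorded in the history topic, which is what makes later inclusion of those tables safe.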

However, when this is set to true, you lose this safety: if you add a table to the "table.include.list" that existed prior to the connector being created, or a table that was created during the connector's lifetime, you may run into problems with specific operations.  For example, incremental snapshots expect the table's schema to be registered with the in-memory relational model, but when the table's DDL isn't captured, no schema will exist and incremental snapshots will fail. 

There are two ways forward in this use case:

The first is to send a 'schema-changes' signal with all the JSON-based schema change metadata that the connector would generate, including column mappings, data types, etc.  This is quite advanced; you may want to use a temporary second connector to generate this data in a separate history topic for the table in question and then use that data as the basis for your signal.

The other way (and I'm not entirely sure every connector supports this) relies on a check some connectors perform during streaming: if a change event arrives for a table that is included by the configuration but does not yet have an in-memory relational model, the connector generates the model before emitting the event.  You can safely trigger this with a small update to an existing row or an insert of a new row.  Once you've completed either approach, you should be able to safely perform an incremental snapshot.
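A concrete sketch of the second approach, assuming a signaling table has been configured via "signal.data.collection" (the schema, table, and column names below are hypothetical):

```sql
-- Touch a row in the newly included table so the connector builds
-- its in-memory schema model from the streamed change event.
UPDATE inventory.new_table SET updated_at = NOW() WHERE id = 1;

-- Then request an incremental snapshot of that table by inserting
-- an 'execute-snapshot' signal into the configured signaling table.
INSERT INTO inventory.debezium_signal (id, type, data)
VALUES ('signal-1', 'execute-snapshot',
        '{"data-collections": ["inventory.new_table"]}');
```

The signaling table's three-column shape (id, type, data) and the 'execute-snapshot' signal type are standard Debezium; everything else above is illustrative.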

Hope that clarifies.
Chris
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/7b797814-7c88-4950-a4e2-46ee4819ba28n%40googlegroups.com.

Alon P

Jan 29, 2023, 12:22:44 AM
to debezium
Hi Chris,

I'm currently facing the exact situation you described, but in my case the parameter "database.history.store.only.captured.tables.ddl" was not set in the connector's config (MySQL connector, version 2.0.0.Final).  According to the documentation (https://debezium.io/documentation/reference/2.0/connectors/mysql.html#mysql-property-database-history-store-only-captured-tables-ddl) it should default to "false", but the only DDLs that were captured for excluded tables are the CREATE TABLE statements from when the connector was created.

So, a few follow-up questions:
  1. Can you please verify that this parameter indeed defaults to "false" when not specified, and that it's not a documentation bug?
  2. Is there any open/resolved bug regarding the scenario I described?
  3. Where in the documentation is the 'schema-changes' signal explained? I could only find the 'log' and 'snapshot' signals there.
  4. Does the MySQL connector support generating the model during streaming when it's missing, as you described?
Thanks,
Alon

Chris Cranford

Jan 30, 2023, 1:54:06 PM
to debe...@googlegroups.com
Hi Alon,

I can confirm that for (1), the default is indeed "false". 

If the table was never mutated via DDL, then for (2) all you would see is the "CREATE TABLE" statements from start-up, with one exception.  If you have configured "database.history.skip.unparseable.ddl=true", it's also possible the table was mutated via "ALTER TABLE" but the statement couldn't be parsed; in that case a warning is written to the logs and no entry is emitted to the history topic, which would also explain the missing entries you expect to exist in the topic.
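For reference, that setting is an ordinary connector property; a minimal fragment (all other properties elided) would look like:

```properties
# If true, DDL the parser cannot handle is logged as a warning and
# skipped instead of failing the connector -- but such statements
# never reach the history topic.
database.history.skip.unparseable.ddl=true
```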

Regarding (3), it's not currently documented and is going to be connector-specific.  I'll raise a Jira on this, but for now, if you have any specific question about how to use it or what to send, we can discuss that in the context of MySQL or whichever connector you're using, to make sure the data is properly serialized.

And finally, regarding (4), MySQL does not.  If the table is not known to the connector during deserialization, it reports an unknown-table warning/error.

Chris

Alon P

Jan 30, 2023, 3:55:18 PM
to debezium
Hi Chris,

Thanks for the quick response!

Could you please send me an example of what the warning message in the log would look like if a statement couldn't be parsed (for the MySQL connector), so I could search my logs for it?

I'd also like an example of how to use the 'schema-changes' signal for MySQL.

- Alon

Alon P

Jan 30, 2023, 5:08:14 PM
to debezium
One more question - would you recommend switching the connector to "snapshot.mode: schema_only_recovery" and then back to "initial", instead of adding the missing schema changes via signals?
Are there any specific risks in doing the "schema_only_recovery"?
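For context, "schema_only_recovery" is a standard MySQL connector snapshot mode intended to rebuild a lost or corrupted database history topic from the current table schemas; the temporary configuration change being asked about would be a sketch like this (only the changed property shown, everything else left as-is):

```json
{
  "snapshot.mode": "schema_only_recovery"
}
```

Because the history is rebuilt from the schemas as they exist at restart, any intermediate schema states that existed while the connector was stopped are not recoverable this way.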

Alon P

Jan 31, 2023, 4:45:53 PM
to debezium
Hi Chris,
I know we also split this conversation into Zulip, but could you please provide guidance on how you'd recommend resolving the issue - would you recommend using the 'schema-changes' signal or the 'schema_only_recovery' snapshot mode?