Hi folks,
I believe we've encountered a bug in the Oracle connector in how it rebuilds a table's schema when it receives a DML event for a table that is not in its schema history. Specifically, it can fail to mark a primary key column as non-optional.
Running Debezium in Kafka Connect:
- version 2.7.3.Final
- standalone mode
- schema history written to a local file
- reading from an XStream outbound server
- configured to capture only a specific set of tables
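
For context, the relevant connector configuration looks roughly like the sketch below. The property names are real Debezium Oracle connector options; the host, credentials, topic prefix, table list, and outbound server name are placeholders:

```properties
connector.class=io.debezium.connector.oracle.OracleConnector
topic.prefix=server1
database.hostname=oracle.example.com
database.port=1521
database.user=c##dbzuser
database.password=secret
# Service name, not the db name
database.dbname=DB1_SVC.WORLD
database.connection.adapter=xstream
database.out.server.name=dbzxout
# Capture only a specific set of tables
table.include.list=MYSCHEMA.MYTABLE
# Schema history written to a local file
schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
schema.history.internal.file.filename=/path/to/schema-history.dat
```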
This bug surfaces at the intersection of two things:
1. Debezium is connecting to the database using a service name, e.g. `DB1_SVC.WORLD`, rather than the db name `DB1.WORLD`.
Debezium is configured with `database.dbname=DB1_SVC.WORLD`.
Note that `database.pdb.name` is not currently set, though we'll investigate it, as it will likely let us work around this issue.
2. The table has an `ID` primary key. However, its CREATE TABLE statement neither marks `ID` as NOT NULL nor declares the primary key inline. A separate ALTER TABLE statement adds the primary key constraint on `ID` (sketched below).
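
A minimal illustration of the DDL shape described in (2), with hypothetical table and constraint names. Note that in Oracle, adding the primary key constraint implicitly makes `ID` NOT NULL:

```sql
-- ID is not declared NOT NULL, and no primary key is declared inline
CREATE TABLE MYSCHEMA.MYTABLE (
  ID   NUMBER(19),
  NAME VARCHAR2(100)
);

-- The primary key is added separately; this implicitly makes ID NOT NULL
ALTER TABLE MYSCHEMA.MYTABLE
  ADD CONSTRAINT MYTABLE_PK PRIMARY KEY (ID);
```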
When Debezium first starts, the initial table schemas are built via a JDBC connection. This uses the database name from `database.dbname` (or `database.pdb.name`, if set), so the initial schema uses `DB1_SVC.WORLD` in its `TableId`. This code path correctly marks the `ID` primary key column as non-optional.
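
A rough sketch of why the JDBC path gets this right: the live catalog metadata reflects the net effect of all DDL, so the column comes back as a non-nullable primary key column regardless of how the DDL was split across statements. (Connection details and names below are hypothetical; this is not the connector's actual code.)

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class JdbcSchemaProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; note we connect via the service name
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//oracle.example.com:1521/DB1_SVC.WORLD",
                "c##dbzuser", "secret")) {
            DatabaseMetaData md = conn.getMetaData();

            // The catalog reports ID as a primary key column...
            try (ResultSet pk = md.getPrimaryKeys(null, "MYSCHEMA", "MYTABLE")) {
                while (pk.next()) {
                    System.out.println("PK column: " + pk.getString("COLUMN_NAME"));
                }
            }
            // ...and as non-nullable (NULLABLE = DatabaseMetaData.columnNoNulls)
            try (ResultSet cols = md.getColumns(null, "MYSCHEMA", "MYTABLE", "ID")) {
                while (cols.next()) {
                    System.out.println("NULLABLE: " + cols.getInt("NULLABLE"));
                }
            }
        }
    }
}
```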
Unfortunately, DML events come through with the database name `DB1.WORLD`, so the table's schema can't be found. This triggers the `LcrEventHandler` to build the schema for the table again. However, it builds the table schema using a different method: parsing the table's DDL statements. This path appears to have a bug: it does not mark the `ID` primary key column as non-optional when it applies the second ALTER TABLE statement.
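
To illustrate the pitfall with a toy model (this is not Debezium's parser code): when the ALTER TABLE is replayed, marking the column as part of the primary key is not enough; its optional flag must also be cleared, because an Oracle primary key column is implicitly NOT NULL:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy column model showing the two flags that must stay in sync
class Column {
    final String name;
    boolean optional = true;   // no NOT NULL in the CREATE TABLE
    boolean primaryKey = false;

    Column(String name) { this.name = name; }
}

public class DdlReplay {
    public static void main(String[] args) {
        Map<String, Column> table = new LinkedHashMap<>();
        // CREATE TABLE MYSCHEMA.MYTABLE (ID NUMBER(19), NAME VARCHAR2(100))
        table.put("ID", new Column("ID"));
        table.put("NAME", new Column("NAME"));

        // ALTER TABLE MYSCHEMA.MYTABLE ADD CONSTRAINT MYTABLE_PK PRIMARY KEY (ID)
        addPrimaryKey(table, List.of("ID"));

        Column id = table.get("ID");
        // Expected: primaryKey=true optional=false
        System.out.println("ID primaryKey=" + id.primaryKey
                + " optional=" + id.optional);
    }

    static void addPrimaryKey(Map<String, Column> table, List<String> pkColumns) {
        for (String name : pkColumns) {
            Column c = table.get(name);
            c.primaryKey = true;
            // The step the buggy path appears to skip: a PK column is
            // implicitly NOT NULL, so it must also become non-optional.
            c.optional = false;
        }
    }
}
```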
This change in schema later causes Avro schema serialisation failures for us, since the rebuilt schema treats the `ID` field as optional where the initial schema did not.
Thought I'd put this out there given I've investigated and found the cause. Should I raise a ticket?
Thanks,
Andy