Problem with non-string required toasted columns in PostgreSQL connector


Alexander Chermenin

Mar 14, 2021, 1:11:26 AM
to debezium

Hello everyone,

I want to share a small problem that we ran into: when a table has required TOASTed columns of a non-string type (for example, an array), the following exception can be raised on an update event:

org.apache.kafka.connect.errors.DataException: Invalid value: null used for required field: "...", schema type: ARRAY


It's related to the code where the placeholder is returned: https://github.com/debezium/debezium/blob/b2ce5104139c86cc95548409630134c0d9715959/debezium-connector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresValueConverter.java#L1032

The placeholder is returned as a string and cannot be cast to an array, so a null value is used instead, which leads to the exception.

As a short-term solution, I suggest that this method throw an exception instead of returning the placeholder when an unchanged TOASTed value has an unknown data type, to inform the user that REPLICA IDENTITY FULL is required for tables with such columns.
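To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the actual PostgresValueConverter code: the class, the enum, the marker object, and the method name are all hypothetical, and the placeholder string is only assumed to match the connector's default.

```java
// Hypothetical sketch of the proposed short-term behaviour; none of these
// names come from the Debezium code base.
class ToastPlaceholderSketch {

    // Simplified stand-in for the schema types a value converter would see.
    enum ColumnType { STRING, ARRAY, INT64 }

    // Marker standing in for "the database sent no value: unchanged TOASTed column".
    static final Object UNCHANGED_TOAST = new Object();

    // Assumed placeholder for string columns (configurable in the real connector).
    static final String STRING_PLACEHOLDER = "__debezium_unavailable_value";

    static Object convert(String column, ColumnType type, Object value) {
        if (value != UNCHANGED_TOAST) {
            return value; // regular value, nothing special to do
        }
        switch (type) {
            case STRING:
                // A string placeholder fits a string column.
                return STRING_PLACEHOLDER;
            default:
                // No meaningful placeholder exists for this type, so fail with
                // a clear message instead of silently falling back to null.
                throw new IllegalStateException(
                        "Unchanged TOASTed value for column '" + column + "' of type " + type
                                + " cannot be represented; use REPLICA IDENTITY FULL for this table");
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("body", ColumnType.STRING, UNCHANGED_TOAST));
        try {
            convert("tags", ColumnType.ARRAY, UNCHANGED_TOAST);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of failing early is that the error message names the column and the remedy, whereas the current DataException only reports a null in a required field.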

A long-term solution requires some additional discussion, because the problem is slightly more complex than it might initially appear. There seems to be no flag or attribute in Kafka Connect to mark a field value as unchanged, but a placeholder can be used for that, as is already implemented for `string`-typed values. First, though, as I described above, this is impossible for TOASTed columns of other types (except perhaps arrays of strings), because it's not known what value to use as a placeholder in those cases. Second, it's unclear how to distinguish possible placeholder values from real values in the database.

So, thanks a lot for the attention, and what do you think about all of that?

Gunnar Morling

Mar 16, 2021, 3:45:23 PM
to debezium
Hi,

Thanks for raising this; of what type is that column exactly? Note there's a binary placeholder already, which also is used in some cases. All in all, I agree that this needs some more consideration. 

> for toasted columns with other types, it's impossible

Do we know which columns can be TOASTed to begin with?

> how to differ possible placeholder values and real values from a database.

That's why the placeholder is configurable, so you can set it to a value that's distinct from any actual values. But it looks as if we may have to expand this for more column types.
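On the consuming side, that distinction could be checked roughly as follows. This is only a sketch: the class and method names are hypothetical, the placeholder is passed in as whatever value the connector was configured with, and it assumes the binary placeholder is simply the UTF-8 bytes of the string placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical consumer-side helper; not part of Debezium or Kafka Connect.
class PlaceholderCheckSketch {

    // Returns true if the field value equals the configured placeholder,
    // meaning the column was TOASTed and unchanged rather than really set.
    static boolean isPlaceholder(Object fieldValue, String placeholder) {
        if (fieldValue instanceof String) {
            return placeholder.equals(fieldValue);
        }
        if (fieldValue instanceof byte[]) {
            // Assumption: the binary placeholder is the UTF-8 encoding of the string one.
            return Arrays.equals((byte[]) fieldValue,
                    placeholder.getBytes(StandardCharsets.UTF_8));
        }
        // Other types (arrays, numbers, ...) have no placeholder representation here.
        return false;
    }

    public static void main(String[] args) {
        String placeholder = "__my_unique_placeholder"; // example value chosen by the user
        System.out.println(isPlaceholder("__my_unique_placeholder", placeholder));
        System.out.println(isPlaceholder("real text", placeholder));
    }
}
```

This only works if the chosen placeholder can never occur as a real column value, which is exactly the limitation raised in the quoted question.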

Could you log this in Jira please?

Thanks,

--Gunnar