Hi folks,
I know that the documentation clearly states that "Debezium currently supports only database with UTF-8 character encoding," but I'm asking if there's any possible work-around for this? We're stuck with a PostgreSQL SQL ASCII database which for the most part contains valid UTF-8 data, but there are a few rows which do not. Yes, there are reasons we chose to use SQL ASCII when we created this database, and perhaps we shall re-evaluate that decision, but for now it's SQL ASCII.
I set up the Debezium Connector for PostgreSQL using the wal2json plugin just to see what would happen. Predictably, I soon saw this error:
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0xe9 0x66 0x69
Yep, not surprising. When I was previously evaluating Kafka Connect, my work-around was to specify some custom SQL to extract the data, using encode(<column>::bytea, 'base64') to encode the data in base64, then write a custom SMT (Single Message Transform) to decode the data back before it landed in Kafka. Seemed to work. But I don't know if it's possible to do that sort of trick with Debezium.
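For context, the core of that SMT was just reversing the base64 step so the raw bytes survive the trip through Connect. A minimal sketch of the idea (class and method names here are hypothetical, not the actual SMT; a real transform would implement Kafka Connect's Transformation interface and rewrite the record value):

```java
import java.util.Base64;

public class Base64DecodeSketch {
    // Reverses encode(col::bytea, 'base64') from the extraction query,
    // recovering the original raw bytes (including non-UTF-8 ones)
    // before the value is written onward.
    static byte[] decodeField(String base64Value) {
        return Base64.getDecoder().decode(base64Value);
    }

    public static void main(String[] args) {
        // 0xe9 is the byte from the error above (likely Latin-1 'é'),
        // which is invalid as a UTF-8 lead byte followed by 'f' 'i'.
        byte[] raw = {(byte) 0xe9, 'f', 'i'};
        String encoded = Base64.getEncoder().encodeToString(raw);
        byte[] decoded = decodeField(encoded);
        System.out.println(decoded.length == 3
                && (decoded[0] & 0xff) == 0xe9);
    }
}
```

The point is that base64 keeps the payload pure ASCII end-to-end, so the invalid byte sequence never has to survive a UTF-8 decode on the server side.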
I realize this is probably a futile effort unless we change the encoding of our database, but any ideas?
Thanks,
Jerrell