PostgreSQL databases with non UTF-8 character encoding


Jerrell Schivers

Aug 2, 2018, 8:50:04 PM
to debezium
Hi folks,

I know that the documentation clearly states that "Debezium currently supports only database with UTF-8 character encoding," but is there any possible work-around for this? We're stuck with a PostgreSQL SQL_ASCII database which for the most part contains valid UTF-8 data, but there are a few rows which do not. Yes, there are reasons we chose SQL_ASCII when we created this database, and perhaps we shall re-evaluate that decision, but for now it's SQL_ASCII.

I set up the Debezium Connector for PostgreSQL using the wal2json plugin just to see what would happen. Predictably, I soon saw this error:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0xe9 0x66 0x69

Yep, not surprising. When I was previously evaluating Kafka Connect, my work-around was to specify some custom SQL to extract the data, use encode(<column>::bytea, 'base64') to encode it in base64, and then write a custom SMT (Single Message Transform) to decode the data back before it landed in Kafka. Seemed to work. But I don't know if it's possible to do that sort of trick with Debezium.
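For illustration, the decode step of that work-around might look roughly like the sketch below. It is written in Python for brevity; a real SMT would be a Java class implementing Kafka Connect's Transformation interface. The function name decode_column and the Latin-1 fallback are assumptions made for this example, not part of the original setup: the idea is simply to try UTF-8 first and fall back to a single-byte encoding for the few rows that are not valid UTF-8.

```python
import base64


def decode_column(b64_value: str, fallback_encoding: str = "latin-1") -> str:
    """Decode a base64-encoded column value back into text.

    Hypothetical helper: tries UTF-8 first, then falls back to an
    assumed single-byte encoding (Latin-1 by default) for rows that
    are not valid UTF-8.
    """
    raw = base64.b64decode(b64_value)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail,
        # but it is only correct if the bad rows really are Latin-1.
        return raw.decode(fallback_encoding)


# The byte sequence from the error message (0xe9 0x66 0x69) is not
# valid UTF-8, so it goes through the fallback path.
bad_bytes = base64.b64encode(b"\xe9\x66\x69").decode("ascii")
print(decode_column(bad_bytes))

# A valid UTF-8 value round-trips unchanged.
good_bytes = base64.b64encode("café".encode("utf-8")).decode("ascii")
print(decode_column(good_bytes))
```

Whether Latin-1 is the right fallback depends entirely on what actually wrote the non-UTF-8 rows; with SQL_ASCII the database itself records no encoding, so that has to be known from the application side.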

I realize this is probably a futile effort unless we change the encoding of our database, but any ideas?

Thanks,
Jerrell

Jiri Pechanec

Aug 3, 2018, 3:07:26 AM
to debezium
Hi,

The problem is that the change would need to be made not only in Debezium itself but also in the logical decoding plugins. So it is possible that this will be added in the future, but it is not a priority now. If you are able to look at the code, though, it would be a great help to us.

J.