I'd like to understand how to properly convert data from a binary blob to a text blob with encoding

Nickolas Deluca · Sep 5, 2025, 7:58:53 AM · to firebird-support
Recently I caught a "wrong practice" in our database: binary blobs (sub_type 0) were being used to store text.

Even though that works in itself, binary blobs don't store an encoding, and that can cause issues, especially when you're working with third-party libraries like `node-firebird`.

Our database is `WIN1252` encoded, a legacy practice from more than 10 years ago.

I started converting those binary blobs to text blobs with the proper encoding: I created another field, copied the data from one to the other, renamed the old field, and then renamed the new field to the old name. Should have been easy, right?
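For reference, the migration sketched above looked roughly like this (the table and field names `MY_TABLE`, `MEMO`, `MEMO_NEW`, `MEMO_OLD` are made up for illustration; only the domains `D_BLOB` and `D_MEMO` are real):

```sql
-- Add a new text-blob field using the WIN1252 domain
ALTER TABLE MY_TABLE ADD MEMO_NEW D_BLOB;

-- Copy the binary blob data into it as-is
UPDATE MY_TABLE SET MEMO_NEW = MEMO;

-- Swap the names so callers see the new field under the old name
ALTER TABLE MY_TABLE ALTER COLUMN MEMO TO MEMO_OLD;
ALTER TABLE MY_TABLE ALTER COLUMN MEMO_NEW TO MEMO;
```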

Nope. Now, whenever we perform queries on that field using `LIKE` or `CONTAINING`, we get the following exception:

`Cannot transliterate character between character sets.`

I've come to understand that this is because some of the data stored in that field is considered `UTF8` and some UTF8 characters can't be translated to WIN1252.
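The mismatch is easy to reproduce outside Firebird. In Python (where WIN1252 is the `cp1252` codec), UTF-8 bytes read back as WIN1252 silently turn into mojibake, while a character with no WIN1252 code point cannot be converted at all, which is the analogue of Firebird's transliteration error:

```python
# UTF-8 bytes for "é" decoded as WIN1252 produce mojibake, not an error:
utf8_bytes = "é".encode("utf-8")      # b'\xc3\xa9'
print(utf8_bytes.decode("cp1252"))    # prints "Ã©"

# A character absent from WIN1252 (e.g. a Cyrillic letter) cannot be
# encoded at all -- the analogue of "Cannot transliterate character
# between character sets":
try:
    "ш".encode("cp1252")
except UnicodeEncodeError as exc:
    print("cannot transliterate:", exc.reason)
```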

Just to sanity-check, I created a new table, added a field using the same domain as the field I described earlier, added some records manually, and performed the LIKE/CONTAINING queries. They worked like a charm.

What I want to know is how to properly transfer my data from one field to the other, encoding it as WIN1252, so that I can perform those kinds of queries again.

(We managed to work around the issue by casting the new field to the old field's domain when performing the LIKE/CONTAINING queries; thankfully, that made them work again.)

I don't know if this will help, but here's the DDL for both domains:

The new BLOB Text field

```sql
CREATE DOMAIN D_BLOB AS
BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET WIN1252
COLLATE WIN1252;
```

The old BLOB Binary field

```sql
CREATE DOMAIN D_MEMO AS
BLOB SUB_TYPE 0 SEGMENT SIZE 80;
```

Nickolas Deluca · Sep 5, 2025, 4:55:21 PM · to firebird-support

Never mind, guys. The issue was that SOME characters in the field were badly encoded, and they came from a poorly set up connection to the database. In this case, someone connected to our WIN1252 database using RUSSIAN_CHARSET and edited some triggers and procedures that contained string literals; those strings became corrupted, and that corruption was inserted into the field, causing the issue.

When the field was a binary blob, there was no problem. But when we imposed a WIN1252 encoding on it, the corrupted data caused the issue.

Pavel Cisar · Sep 6, 2025, 7:06:23 AM · to firebird...@googlegroups.com
Hi,

the binary blob to text blob with charset conversion issue is basically
the same as converting [var]char CHARACTER SET NONE to [var]char with
another charset. Without a charset specification, Firebird uses the
provided data AS IS, and it works in 99.9% of cases (the remaining 0.1%
are cases where no error is raised, but a wrong result is returned).
It's quite common for some "wrong" data to get into the database this
way: from clients that use an unexpected charset, from software that
makes different assumptions about what the database expects, or from
simple bugs in applications.

When you want to convert data from this unspecified-charset state to a
clean charset specification, it's always recommended to validate (and
correct) the data first. Write a short script (Python is a great choice
for this, but use whatever suits you) that reads the data and converts
it to the desired charset. Catch all errors and print out the PKs of
those rows. You can then fix them in your favorite DB admin tool.
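A minimal sketch of that validation pass, assuming hypothetical table/column names (`MY_TABLE`, `ID`, `MEMO`) and the `firebird-driver` package for connecting; WIN1252 is Python's `cp1252` codec. The pure helper does the actual check:

```python
def find_undecodable_rows(rows, charset="cp1252"):
    """Given (pk, raw_bytes) pairs, return the PKs whose bytes do not
    decode cleanly in the target charset (WIN1252 == Python's cp1252)."""
    bad = []
    for pk, raw in rows:
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            bad.append(pk)
    return bad

# Fetching the rows would look roughly like this with firebird-driver
# (hedged: adapt the DSN, credentials, and names to your setup):
#
#   from firebird.driver import connect
#   with connect("localhost:mydb.fdb", user="SYSDBA", password="...") as con:
#       cur = con.cursor()
#       cur.execute("SELECT ID, MEMO FROM MY_TABLE")
#       rows = [(pk, blob if isinstance(blob, bytes) else blob.read())
#               for pk, blob in cur]
#       print(find_undecodable_rows(rows))
```

Note that this only catches bytes with no WIN1252 meaning at all (cp1252 decodes almost everything); data that is valid WIN1252 but semantically mojibake, like the RUSSIAN_CHARSET corruption described earlier, needs an extra heuristic or manual review.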

best regards
Pavel Cisar
IBPhoenix
