Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Translate function

217 views
Skip to first unread message

Luiz da Silva

unread,
Jan 22, 2015, 3:13:29 PM1/22/15
to
Hi all,

This works fine in a database with ISO8859-1 code set, but fails in UTF-8:

SELECT TRANSLATE(LTRIM(RTRIM('Blablabla ')),
'''{} ',
'"[]' || X'000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9ca',
' ')
FROM SYSIBM.SYSDUMMY1

The second, third or fourth argument of the TRANSLATE scalar function is incorrect.. SQLCODE=-176, SQLSTATE=42815

In SQL0176N description this drew my attention:

"The translate scalar function does not allow replacement of a character by another character which is encoded using a different number of bytes. For example, a single-byte character cannot be replaced with a double-byte character nor can a double-byte character be replaced with a single-byte character."

Would that be the case? Any ideas why it fails in UTF-8?

Thanks fellows!

Peter H. Coffin

unread,
Jan 22, 2015, 5:25:07 PM1/22/15
to
How many bytes is UTF-8? Especially beyond hex 7E?

--
87. My vats of hazardous chemicals will be covered when not in use.
Also, I will not construct walkways above them.
--Peter Anspach's list of things to do as an Evil Overlord

Luiz da Silva

unread,
Jan 23, 2015, 7:40:34 AM1/23/15
to
from what I understand, it's double byte.

Peter H. Coffin

unread,
Jan 23, 2015, 9:55:08 AM1/23/15
to
Nope, it's variable-byte, with some specific ranges of bytes being
reserved as markers for OTHER ranges of characters. So 'n' is one byte,
'ñ' is two bytes. And about a quarter the bytes in that pad range don't make
valid characters at all.

--
I think it's a beautiful day to go to the zoo and feed the ducks.
To the lions.
-- Brian Kantor

Lennart Jonsson

unread,
Jan 25, 2015, 1:07:45 AM1/25/15
to
To elaborate a bit, UCS-2 (predecessor of UTF-16) which is used in
GRAPHIC, is double byte. Java for example uses UTF-16 (also double byte)
in it's String's.

Personally I find UTF-8 annoyingly complicated, but I guess it is the
price one have to pay for backward compliance with for example ISO8859-1.


0 new messages