Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Unpleasant experiences with CLOB encoding

242 views
Skip to first unread message

claes_douglas

unread,
Jan 24, 2013, 8:45:23 AM1/24/13
to
Hello,

This applies to db2 10 for z/OS.
I’m returning an XML structure into a CLOB using the db2 XML functions.
The XMLdeclaration in the XML says the data is in UTF-8, as does the documentation of the XMLSERIALIZE function when the result is a CLOB.
The variable declaration which receives the output:
SQL TYPE IS CLOB(1M) xmlClob;

I’m surprised to see however that when I look at the data in variable xmlCLob it is not in UTF-8 but (almost) in my “normal” EBCDIC (278), with the exception of special characters such as “åä” which are nonsense characters like #, { and the like.
I would have expected the xmlClob data to be in either pure UTF-8 or pure EBCDIC, but now it’s _mostly_ EBCDIC but without proper handling of any non A-Z characters.
Eventually, I need to get the data moved to an EBCDIC string. But when the data in xmlClob is in a weird hybrid encoding that’s not easy.
What is the suggested remedy for this? Any system setups that need to be in place for a CLOB to be in it’s right CCSID? Use DBCLOB instead?

Thanks.

Peter H. Coffin

unread,
Jan 24, 2013, 9:05:39 AM1/24/13
to
On Thu, 24 Jan 2013 05:45:23 -0800 (PST), claes_douglas wrote:

> Hello,
>
> This applies to db2 10 for z/OS. I?m returning an XML structure into a
> CLOB using the db2 XML functions. The XMLdeclaration in the XML says
> the data is in UTF-8, as does the documentation of the XMLSERIALIZE
> function when the result is a CLOB. The variable declaration which
> receives the output: SQL TYPE IS CLOB(1M) xmlClob;
>
> I?m surprised to see however that when I look at the data in variable
> xmlCLob it is not in UTF-8 but (almost) in my ?normal? EBCDIC (278),
> with the exception of special characters such as ???? which are
> nonsense characters like #, { and the like. I would have expected the
> xmlClob data to be in either pure UTF-8 or pure EBCDIC, but now it?s
> _mostly_ EBCDIC but without proper handling of any non A-Z characters.
> Eventually, I need to get the data moved to an EBCDIC string. But when
> the data in xmlClob is in a weird hybrid encoding that?s not easy.
> What is the suggested remedy for this? Any system setups that need to
> be in place for a CLOB to be in it?s right CCSID? Use DBCLOB instead?

Are the data UTF-8 or not? It's sometimes very difficult to tell whether
the problem is with improper encoding of correct data, or proper
encoding of bad data.

--
40. I will be neither chivalrous nor sporting. If I have an unstoppable
superweapon, I will use it as early and as often as possible instead
of keeping it in reserve.
--Peter Anspach's list of things to do as an Evil Overlord

claes_douglas

unread,
Jan 24, 2013, 5:41:04 PM1/24/13
to
> Are the data UTF-8 or not? It's sometimes very difficult to tell whether
>
> the problem is with improper encoding of correct data, or proper
>
> encoding of bad data.

Well, after reading a bit closer in the Applications programming guide it states quite clearly actually that data is converted to the "encoding scheme of the subsystem". So in my case that encoding scheme is EBCDIC which means that regardless if I read UTF-8 up from somewhere it's going to end up in EBCDIC. Unless, which is what I will try tomorrow, I follow my host variable declarations with a DECLARE :hostvar VARIABLE CCSID 1208 statement.
So if this is correct I will be able to declare my CLOB as 1208 and then iconv() it from there to my EBCDIC.
Of course it would have been nice if this had worked automatically, but for all of us who are using EBCDICs other than 1047 it seems to only work for A-Z type characters. A bit silly, since the umlauts I'm getting transformed to junk all exists in both UTF-8 and CCSID 278.

0 new messages