Odd character conversion from Java strings

Dennis Thrysøe

unread,

Nov 21, 2002, 5:25:27 AM11/21/02

to

Hi,

I'm having an odd problem with 'illegal' characters being stored in DB2
instead of the characters being given to JDBC as values for an INSERT
statement.

Whenever I insert for instance the character 0x93 (decimal 147), DB2
stores a 0x1a (decimal 26 same as CTRL-Z).

I have another Java program developed by a third party, using the very
same JDBC driver, for which the character is inserted fine.

The problem isn't really very logical to me. Partly because the third
party program (isql.sourceforge.net) is open source, and I can see that
nothing obvious different happens. But also because the characters
showing the problem aren't all characters (but almost!) within range
0x80 - 0x9f:

0x80, *not 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a,
0x8b, 0x8b, 0x8c, * not 0x8d*, 0x8e, *not 0x8f*, *not 0x90*, 0x91, 0x92,
0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, *not 0x9d*,
0x9e, 0x9f

Any ideas appreciated!

Sincerely,

Dennis Thrysře

Dirk Wollscheid

unread,

Nov 21, 2002, 2:22:26 PM11/21/02

to

If you're dealing with character data (CHAR, VARCHAR, CLOB,...). What
the JDBC driver is going to do is to convert the caharacters java String
in unidcode to the characters in the database code page.

If you don't want that, use a BLOB.

So why do you think the characters in the database are wrong? How do you
check that?

Dirk

--
-------------- This is not an IBM official response --------------

All the information and opinions expressed herein are those of the
author and are not an official IBM statement.

It's also not OK to contact the author with questions or advertisement
using his email adress.

Sherman Lau

unread,

Nov 21, 2002, 3:19:18 PM11/21/02

to

The character conversion depends on the application environment and database
code page. In your case, it is very possible that your database code page
does not have the equivalent character for 0x93 in the application code
page.

When you says 0x93, do you mean U+0093 defined in Unicode? U+0093 is a
control character, set transmit state. Or do you mean 0x93 in Windows
enviroment? 0x93 in Windows code page 1252 is the left double quotation
mark.

What is your data base code page?

Sherman

"Dirk Wollscheid" <wol...@us.ibm.com> wrote in message
news:3DDD3272...@us.ibm.com...

Dennis Thrysøe

unread,

Nov 22, 2002, 2:24:53 AM11/22/02

to

Dirk Wollscheid wrote:
> Dennis Thrysře wrote:
>
>> Hi,
>>
>> I'm having an odd problem with 'illegal' characters being stored in
>> DB2 instead of the characters being given to JDBC as values for an
>> INSERT statement.
>>
>> Whenever I insert for instance the character 0x93 (decimal 147), DB2
>> stores a 0x1a (decimal 26 same as CTRL-Z).
>>
>> I have another Java program developed by a third party, using the very
>> same JDBC driver, for which the character is inserted fine.
>>
>>
>> The problem isn't really very logical to me. Partly because the third
>> party program (isql.sourceforge.net) is open source, and I can see
>> that nothing obvious different happens. But also because the
>> characters showing the problem aren't all characters (but almost!)
>> within range 0x80 - 0x9f:
>>
>> 0x80, *not 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a,
>> 0x8b, 0x8b, 0x8c, * not 0x8d*, 0x8e, *not 0x8f*, *not 0x90*, 0x91,
>> 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, *not
>> 0x9d*, 0x9e, 0x9f
>
>
> If you're dealing with character data (CHAR, VARCHAR, CLOB,...). What
> the JDBC driver is going to do is to convert the caharacters java String
> in unidcode to the characters in the database code page.

Right. But I would think that 0x93 (decimal 147) wouldn't be altered,
since it's within the first single byte?

> If you don't want that, use a BLOB.

I just want a 0x93 back if that was what I gave the database :) (It also
strikes me as quite odd that only some of the characters in the
0x80-0x9f range are replaced. I don't see the pattern here)

> So why do you think the characters in the database are wrong? How do you
> check that?

select hex(field) from table

Thanks for your help,

-dennis

Dennis Thrysøe

unread,

Nov 22, 2002, 2:28:17 AM11/22/02

to

Sherman Lau wrote:
> The character conversion depends on the application environment and database
> code page. In your case, it is very possible that your database code page
> does not have the equivalent character for 0x93 in the application code
> page.

The application codepage (according to the Java VM is Cp1252), so is the
database code page.

> When you says 0x93, do you mean U+0093 defined in Unicode? U+0093 is a
> control character, set transmit state. Or do you mean 0x93 in Windows
> enviroment? 0x93 in Windows code page 1252 is the left double quotation
> mark.

I mean the left double quotation mark.

But the really odd part is that 0x93 doesn't work, but for instance 0x90
works fine.

> What is your data base code page?

Cp1252

-dennis

Dirk Wollscheid

unread,

Nov 25, 2002, 1:34:48 PM11/25/02

to

Dennis Thrysře wrote:

> Dirk Wollscheid wrote:
>
>> If you're dealing with character data (CHAR, VARCHAR, CLOB,...). What
>> the JDBC driver is going to do is to convert the caharacters java
>> String in unidcode to the characters in the database code page.
>
>
> Right. But I would think that 0x93 (decimal 147) wouldn't be altered,
> since it's within the first single byte? >> If you don't want that, use a BLOB.
>
>
> I just want a 0x93 back if that was what I gave the database :) (It also
> strikes me as quite odd that only some of the characters in the
> 0x80-0x9f range are replaced. I don't see the pattern here)
>
>> So why do you think the characters in the database are wrong? How do
>> you check that?
>
>
> select hex(field) from table

Why don't you use BLOB if you're interested in the hex values?

If you use a character type the only thing that's guaranteed is that
when you store the character and you read it back as a charcter you get
the same character in the apps representation. If you read it back as a
hex value code page conversion won't be done. This means you apply
codepage conversion on stor but not on load and of course you get
different results.

Does that make sense?

Dennis Thrysøe

unread,

Nov 27, 2002, 2:51:13 AM11/27/02

to

Dirk Wollscheid wrote:
> Dennis Thrysře wrote:
>
>> Dirk Wollscheid wrote:
>>
>>> If you're dealing with character data (CHAR, VARCHAR, CLOB,...). What
>>> the JDBC driver is going to do is to convert the caharacters java
>>> String in unidcode to the characters in the database code page.
>>
>>
>>
>> Right. But I would think that 0x93 (decimal 147) wouldn't be altered,
>> since it's within the first single byte? >> If you don't want that,
>> use a BLOB.
>>
>>
>> I just want a 0x93 back if that was what I gave the database :) (It
>> also strikes me as quite odd that only some of the characters in the
>> 0x80-0x9f range are replaced. I don't see the pattern here)
>>
>>> So why do you think the characters in the database are wrong? How do
>>> you check that?
>>
>>
>>
>> select hex(field) from table
>
>
> Why don't you use BLOB if you're interested in the hex values?

I'm not.

> If you use a character type the only thing that's guaranteed is that
> when you store the character and you read it back as a charcter you get
> the same character in the apps representation.

That's the problem: I don't get the same character. I get a Ctrl-Z
character instead.

> If you read it back as a
> hex value code page conversion won't be done.

That was just for debug. It illustrated that the database actualy
*stores* 0x1a instead of 0x93.

> This means you apply
> codepage conversion on stor but not on load and of course you get
> different results.

Ordinarily - of course - I don't use the hex() function

> Does that make sense?

Yes, but unfortunately it doesn't help me :/

-dennis

Sherman Lau

unread,

Nov 28, 2002, 1:30:33 PM11/28/02

to

What is your database code page?

Sherman
"Dennis Thrysře" <den...@geysirheste.dk> wrote in message
news:PP_E9.43817$HU.29...@news010.worldonline.dk...

Dennis Thrysøe

unread,

Nov 29, 2002, 2:21:26 AM11/29/02

to

Asmentioned in another post in this thread, it's Cp1252.

-dennis

Sherman Lau

unread,

Nov 29, 2002, 1:52:03 PM11/29/02

to

Oh, I forgot to ask if you are running on Windows machine. If so, what is
your system locale setting?

Sherman

"Sherman Lau" <she...@ca.ibm.com> wrote in message
news:as8cto$s4a$1...@hanover.torolab.ibm.com...
> This is the first time you mention your database code page is 1252. If the
> database code page is 1252, the 0x93 character should be preserved. I was
> thinking you might be using a database with code page 819. Code page 819
> does not contain the character you mentioned in the first note and DB2
> replaces non-exisitng characters with substitution character (i.e. 0x1a).
>
> Did you set the encoding in your Java Application? There must be a code
> page conversion occur somewhere along the line and I am sure one of the
> target code page is not 1252 nor Unicode. And that target code page does
not
> contain that left double quotation mark character.

>
> Sherman
>
>
> "Dennis Thrysře" <den...@geysirheste.dk> wrote in message

> news:UzEF9.47629$HU.32...@news010.worldonline.dk...

Sherman Lau

unread,

Nov 29, 2002, 1:50:35 PM11/29/02

to

This is the first time you mention your database code page is 1252. If the
database code page is 1252, the 0x93 character should be preserved. I was
thinking you might be using a database with code page 819. Code page 819
does not contain the character you mentioned in the first note and DB2
replaces non-exisitng characters with substitution character (i.e. 0x1a).

Did you set the encoding in your Java Application? There must be a code
page conversion occur somewhere along the line and I am sure one of the
target code page is not 1252 nor Unicode. And that target code page does not
contain that left double quotation mark character.

Sherman

"Dennis Thrysře" <den...@geysirheste.dk> wrote in message

news:UzEF9.47629$HU.32...@news010.worldonline.dk...

Dennis Thrysøe

unread,

Dec 2, 2002, 2:50:47 AM12/2/02

to

Sherman Lau wrote:
> This is the first time you mention your database code page is 1252.

Yes, except for me relpy on the 22. Nov. to your post from the 21st. Nov. :)

> If the
> database code page is 1252, the 0x93 character should be preserved.

I agree. That was what I thought.

> I was
> thinking you might be using a database with code page 819. Code page 819
> does not contain the character you mentioned in the first note and DB2
> replaces non-exisitng characters with substitution character (i.e. 0x1a).

This is not (to my knowledge) the case. I only see Cp1252 in the Control
Center ('Configure' on a database).

> Did you set the encoding in your Java Application?

It is Cp1252.

> There must be a code
> page conversion occur somewhere along the line and I am sure one of the
> target code page is not 1252 nor Unicode.

I agree that a conversion must be occuring. I just don't see where.
According to the server-side DB2 trace a 0x93 character actually arrives
at the server. So it must be the server doing something wrong/undesired.

One of our customers run DB2 on a solrais box with codepage 819 and codeset
ISO8859-1. They don't have the problem.

Thanks for taking the time to help me out here.

-dennis

Dennis Thrysøe

unread,

Dec 2, 2002, 2:52:23 AM12/2/02

to

Sherman Lau wrote:
> Oh, I forgot to ask if you are running on Windows machine. If so, what is
> your system locale setting?

I am. Danish/Denmark on both the client and the server.

-dennis

Sherman Lau

unread,

Dec 4, 2002, 10:56:42 AM12/4/02

to

ok. we took this off-line as the number of replies grew too long. Here is
what we found. As I expected, it is just a minor problem in the java
application, not a DB2 problem.

Sherman

> Hi Dennis,

> Here is your input:

> 0069 006e 0073 0065 0072 0074 0020 0069 .i.n.s.e.r.t. .i

> 006e 0074 006f 0020 0063 006d 0064 0024 .n.t.o. .c.m.d.$

> 0061 0072 0074 0069 0063 006c 0065 0028 .a.r.t.i.c.l.e.(

> 0069 0064 002c 0020 0074 0069 0074 006c .i.d.,. .t.i.t.l

> 0065 002c 0020 0073 0074 0061 0074 0065 .e.,. .s.t.a.t.e

> 002c 0020 0073 0079 006e 006f 0070 0073 .,. .s.y.n.o.p.s

> 0069 0073 0029 0020 0076 0061 006c 0075 .i.s.). .v.a.l.u

> 0065 0073 0028 0039 0039 0039 0039 0039 .e.s.(.9.9.9.9.9

> 0039 002c 0020 0027 0074 0069 0074 006c .9.,. .'.t.i.t.l

> 0065 0027 002c 0020 0027 0045 0044 0049 .e.'.,. .'.E.D.I

> 0054 0027 002c 0020 0027 0093 0027 0029 .T.'.,. .'...'.)

>

> Your input is U+0093 which is the control <set transmit state> character.

Okay. That sounds like somethings wrong then. The odd things are:

1) The character works fine in the server-side application until it is

inserted and retrieved from DB2 (displays correctly in browser). It is

infact a 0x93 Java character.

2) A number of characters greater than 127 are stored and handles

correctly. For instance the special scandinavian danish characters.

3) The same code works fine with Oracle.

> Code page 1252 does not have this character

> (http://www.microsoft.com/globaldev/reference/sbcs/1252.htm); thus DB2

> replaced the character with the substitution character 0x1a during

> conversion.

Right. But I thought that my application was inserting a Cp1252 (Latin1)

0x93 value since both my JVM and the database are running Latin1.

Otherwise I don't get where the Latin1 0x93 is converted into a Unicode

0x93. And I also don't see which control I should have over this conversion.

> If you want to insert the "Open Double Quote" character 0x93 in code page

> 1252, your java application should have inserted U+201C, not U+0093.

I understand. Now I only need to figure out how to do that :)

-dennis