I'm having an odd problem with 'illegal' characters being stored in DB2
instead of the characters being given to JDBC as values for an INSERT
statement.
Whenever I insert for instance the character 0x93 (decimal 147), DB2
stores a 0x1a (decimal 26 same as CTRL-Z).
I have another Java program developed by a third party, using the very
same JDBC driver, for which the character is inserted fine.
The problem isn't really very logical to me. Partly because the third
party program (isql.sourceforge.net) is open source, and I can see that
nothing obvious different happens. But also because the characters
showing the problem aren't all characters (but almost!) within range
0x80 - 0x9f:
0x80, *not 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a,
0x8b, 0x8b, 0x8c, * not 0x8d*, 0x8e, *not 0x8f*, *not 0x90*, 0x91, 0x92,
0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, *not 0x9d*,
0x9e, 0x9f
Any ideas appreciated!
Sincerely,
Dennis Thrysře
If you're dealing with character data (CHAR, VARCHAR, CLOB,...). What
the JDBC driver is going to do is to convert the caharacters java String
in unidcode to the characters in the database code page.
If you don't want that, use a BLOB.
So why do you think the characters in the database are wrong? How do you
check that?
Dirk
--
-------------- This is not an IBM official response --------------
All the information and opinions expressed herein are those of the
author and are not an official IBM statement.
It's also not OK to contact the author with questions or advertisement
using his email adress.
When you says 0x93, do you mean U+0093 defined in Unicode? U+0093 is a
control character, set transmit state. Or do you mean 0x93 in Windows
enviroment? 0x93 in Windows code page 1252 is the left double quotation
mark.
What is your data base code page?
Sherman
"Dirk Wollscheid" <wol...@us.ibm.com> wrote in message
news:3DDD3272...@us.ibm.com...
Right. But I would think that 0x93 (decimal 147) wouldn't be altered,
since it's within the first single byte?
> If you don't want that, use a BLOB.
I just want a 0x93 back if that was what I gave the database :) (It also
strikes me as quite odd that only some of the characters in the
0x80-0x9f range are replaced. I don't see the pattern here)
> So why do you think the characters in the database are wrong? How do you
> check that?
select hex(field) from table
Thanks for your help,
-dennis
The application codepage (according to the Java VM is Cp1252), so is the
database code page.
> When you says 0x93, do you mean U+0093 defined in Unicode? U+0093 is a
> control character, set transmit state. Or do you mean 0x93 in Windows
> enviroment? 0x93 in Windows code page 1252 is the left double quotation
> mark.
I mean the left double quotation mark.
But the really odd part is that 0x93 doesn't work, but for instance 0x90
works fine.
> What is your data base code page?
Cp1252
-dennis
Why don't you use BLOB if you're interested in the hex values?
If you use a character type the only thing that's guaranteed is that
when you store the character and you read it back as a charcter you get
the same character in the apps representation. If you read it back as a
hex value code page conversion won't be done. This means you apply
codepage conversion on stor but not on load and of course you get
different results.
Does that make sense?
I'm not.
> If you use a character type the only thing that's guaranteed is that
> when you store the character and you read it back as a charcter you get
> the same character in the apps representation.
That's the problem: I don't get the same character. I get a Ctrl-Z
character instead.
> If you read it back as a
> hex value code page conversion won't be done.
That was just for debug. It illustrated that the database actualy
*stores* 0x1a instead of 0x93.
> This means you apply
> codepage conversion on stor but not on load and of course you get
> different results.
Ordinarily - of course - I don't use the hex() function
> Does that make sense?
Yes, but unfortunately it doesn't help me :/
-dennis
Sherman
"Dennis Thrysře" <den...@geysirheste.dk> wrote in message
news:PP_E9.43817$HU.29...@news010.worldonline.dk...
-dennis
Sherman
"Sherman Lau" <she...@ca.ibm.com> wrote in message
news:as8cto$s4a$1...@hanover.torolab.ibm.com...
> This is the first time you mention your database code page is 1252. If the
> database code page is 1252, the 0x93 character should be preserved. I was
> thinking you might be using a database with code page 819. Code page 819
> does not contain the character you mentioned in the first note and DB2
> replaces non-exisitng characters with substitution character (i.e. 0x1a).
>
> Did you set the encoding in your Java Application? There must be a code
> page conversion occur somewhere along the line and I am sure one of the
> target code page is not 1252 nor Unicode. And that target code page does
not
> contain that left double quotation mark character.
>
> Sherman
>
>
> "Dennis Thrysře" <den...@geysirheste.dk> wrote in message
> news:UzEF9.47629$HU.32...@news010.worldonline.dk...
Did you set the encoding in your Java Application? There must be a code
page conversion occur somewhere along the line and I am sure one of the
target code page is not 1252 nor Unicode. And that target code page does not
contain that left double quotation mark character.
Sherman
"Dennis Thrysře" <den...@geysirheste.dk> wrote in message
news:UzEF9.47629$HU.32...@news010.worldonline.dk...
Yes, except for me relpy on the 22. Nov. to your post from the 21st. Nov. :)
> If the
> database code page is 1252, the 0x93 character should be preserved.
I agree. That was what I thought.
> I was
> thinking you might be using a database with code page 819. Code page 819
> does not contain the character you mentioned in the first note and DB2
> replaces non-exisitng characters with substitution character (i.e. 0x1a).
This is not (to my knowledge) the case. I only see Cp1252 in the Control
Center ('Configure' on a database).
> Did you set the encoding in your Java Application?
It is Cp1252.
> There must be a code
> page conversion occur somewhere along the line and I am sure one of the
> target code page is not 1252 nor Unicode.
I agree that a conversion must be occuring. I just don't see where.
According to the server-side DB2 trace a 0x93 character actually arrives
at the server. So it must be the server doing something wrong/undesired.
One of our customers run DB2 on a solrais box with codepage 819 and codeset
ISO8859-1. They don't have the problem.
Thanks for taking the time to help me out here.
-dennis
I am. Danish/Denmark on both the client and the server.
-dennis
Sherman
> Hi Dennis,
> Here is your input:
> 0069 006e 0073 0065 0072 0074 0020 0069 .i.n.s.e.r.t. .i
> 006e 0074 006f 0020 0063 006d 0064 0024 .n.t.o. .c.m.d.$
> 0061 0072 0074 0069 0063 006c 0065 0028 .a.r.t.i.c.l.e.(
> 0069 0064 002c 0020 0074 0069 0074 006c .i.d.,. .t.i.t.l
> 0065 002c 0020 0073 0074 0061 0074 0065 .e.,. .s.t.a.t.e
> 002c 0020 0073 0079 006e 006f 0070 0073 .,. .s.y.n.o.p.s
> 0069 0073 0029 0020 0076 0061 006c 0075 .i.s.). .v.a.l.u
> 0065 0073 0028 0039 0039 0039 0039 0039 .e.s.(.9.9.9.9.9
> 0039 002c 0020 0027 0074 0069 0074 006c .9.,. .'.t.i.t.l
> 0065 0027 002c 0020 0027 0045 0044 0049 .e.'.,. .'.E.D.I
> 0054 0027 002c 0020 0027 0093 0027 0029 .T.'.,. .'...'.)
>
> Your input is U+0093 which is the control <set transmit state> character.
Okay. That sounds like somethings wrong then. The odd things are:
1) The character works fine in the server-side application until it is
inserted and retrieved from DB2 (displays correctly in browser). It is
infact a 0x93 Java character.
2) A number of characters greater than 127 are stored and handles
correctly. For instance the special scandinavian danish characters.
3) The same code works fine with Oracle.
> Code page 1252 does not have this character
> (http://www.microsoft.com/globaldev/reference/sbcs/1252.htm); thus DB2
> replaced the character with the substitution character 0x1a during
> conversion.
Right. But I thought that my application was inserting a Cp1252 (Latin1)
0x93 value since both my JVM and the database are running Latin1.
Otherwise I don't get where the Latin1 0x93 is converted into a Unicode
0x93. And I also don't see which control I should have over this conversion.
> If you want to insert the "Open Double Quote" character 0x93 in code page
> 1252, your java application should have inserted U+201C, not U+0093.
I understand. Now I only need to figure out how to do that :)
-dennis