Unexpected conversion from AL32UTF8 to WE8ISO8859P1

Jaap W. van Dijk

unread,

Jan 8, 2009, 5:16:15 AM1/8/09

to

Hi,

I have two databases:

the source, version 10.2.0.3.0 with AL32UTF8, and
the target, version 9.2.0.5.0, with WE8ISO8859P1.

I copy a table, containing single and double byte characters, from the
source database to the target database with "insert into ... select
from ..." over a database link. The copy is done in a database
procedure in the target database that is started in a Windows sqlplus
session.

What happens is that in the target database all the originally double
byte characters are converted to an inverted question mark, including
characters that are defined within WE8ISO8859P1.

For instance: a SMALL E ACUTE (code point C265) in the source database
is converted to an INVERTED QUESTION MARK (code point BF) in the
target database.

Why is code point C265 not converted to code point E9, which is SMALL
E ACUTE in WE8ISO8859P1?

Regards,
Jaap.

Laurenz Albe

unread,

Jan 8, 2009, 6:35:06 AM1/8/09

to

Don't know, it works for me between two 10.2.0.4 databases:

I start an sqlplus connection to DB1 (charset AL32UTF8):

CREATE TABLE nlsdata (
id NUMBER(4) CONSTRAINT nlsdata_pk PRIMARY KEY,
val VARCHAR2(10 CHAR)
);
INSERT INTO nlsdata VALUES (1, 'normal');
INSERT INTO nlsdata VALUES (2, 'été');
COMMIT;

Then I connect with sqlplus to DB2 (charset WE8ISO8859P1):

CREATE DATABASE LINK db1
CONNECT TO db1_user IDENTIFIED BY db1pwd USING 'db1';

CREATE TABLE nlsdata (
id NUMBER(4) CONSTRAINT nlsdata_pk PRIMARY KEY,
val VARCHAR2(10 CHAR)
);

CREATE PROCEDURE copy_data AUTHID CURRENT_USER AS
BEGIN
INSERT INTO nlsdata SELECT * FROM nlsdata@db1;
END;
/

Then I connect from a Windows machine with sqlplus and run the procedure:

EXEC copy_data;
COMMIT;

Now when I select from table "nlsdata" on db2, everything looks fine:

SELECT * FROM nlsdata;

ID VAL
---------- ----------------------------------------
1 normal
2 été

Maybe you can try to copy my example and see if it works for you.

Can you post your stored procedure and how you call it?

Yours,
Laurenz Albe

Jonathan Lewis

unread,

Jan 8, 2009, 7:06:13 AM1/8/09

to

"Jaap W. van Dijk" <j.w.v...@hetnet.nl> wrote in message
news:1066c292-96b1-4722...@k36g2000pri.googlegroups.com...

Just guessing, but possibly the simplest coding option
was to convert every two-byte code point that didn't
start with a zero into '0xBF'.

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Jaap W. van Dijk

unread,

Jan 8, 2009, 9:31:48 AM1/8/09

to

On Jan 8, 12:35 pm, Laurenz Albe <inv...@spam.to.invalid> wrote:

> Laurenz Albe- Hide quoted text -
>
> - Show quoted text -

Hi Laurenz,

Just to be sure about what is in the database: could you execute a
'select dump(val) from nlsdate' on both databases?

Furthermore my, procedure is coded as follows:

CREATE OR REPLACE procedure jvd_conv
as
begin
delete from test_jvd;
dbms_output.put_line('delete: '||sql%rowcount);
insert into test_jvd
select * from test_jvd@apps_cjibda;
dbms_output.put_line('insert: '||sql%rowcount);
commit;
end;
/

Regards,
Jaap.

Laurenz Albe

unread,

Jan 8, 2009, 11:21:56 AM1/8/09

to

Jaap W. van Dijk <j.w.v...@hetnet.nl> wrote:

> Just to be sure about what is in the database: could you execute a
> 'select dump(val) from nlsdate' on both databases?

I already deleted all the stuff...
But I'm sure that I did not encounter your problem because there were
no inverted question marks.

> Furthermore my, procedure is coded as follows:
>
> CREATE OR REPLACE procedure jvd_conv
> as
> begin
> delete from test_jvd;
> dbms_output.put_line('delete: '||sql%rowcount);
> insert into test_jvd
> select * from test_jvd@apps_cjibda;
> dbms_output.put_line('insert: '||sql%rowcount);
> commit;
> end;
> /

Looks simple enough.

Could you run my test case and see if it works for you or not?

Yours,
Laurenz Albe

joel garry

unread,

Jan 8, 2009, 4:05:56 PM1/8/09

to

I note there are other problems over dblinks not fixed until 11 (like
bug 3419260)... you may have hit a known or unknown bug. Worth trying
in 9208 and creating a simple demo for support. Also see if exp/imp
works right. Also print out your NLS from the procedure.

jg
--
@home.com is bogus.
Webex advertising on radio?

Jaap W. van Dijk

unread,

Jan 9, 2009, 3:02:10 AM1/9/09

to

On Jan 8, 5:21 pm, Laurenz Albe <inv...@spam.to.invalid> wrote:

This is my testcase (comments between square brackets):

[Logging on to the source database:]

SQL*Plus: Release 10.1.0.2.0 - Production on Vr Jan 9 08:40:14 2009

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select *
2 from v$nls_parameters
3 where parameter = 'NLS_CHARACTERSET';

PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8

[Logging on to the target database:]

SQL*Plus: Release 10.1.0.2.0 - Production on Vr Jan 9 08:41:31 2009

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production
With the Partitioning option
JServer Release 9.2.0.5.0 - Production

[Here I have got a procedure with the following code:]

CREATE OR REPLACE procedure DWH.jvd_conv_rbn
as
charset varchar2(100);
begin
select value
into charset
from v$nls_parameters
where parameter = 'NLS_CHARACTERSET'
DBMS_OUTPUT.PUT_LINE(CHARSET);

delete from test_jvd;
dbms_output.put_line('delete: '||sql%rowcount);
insert into test_jvd

select * from test_jvd@robein_ontw01_rbn;

dbms_output.put_line('insert: '||sql%rowcount);
commit;
end;
/

[If I execute this procedure it fails because the table does not
exists in the source database:]

SQL> exec jvd_conv_rbn
BEGIN jvd_conv_rbn; END;

*
ERROR at line 1:
ORA-00942: table or view does not exist
ORA-02063: preceding line from ROBEIN_ONTW01_RBN
ORA-06512: at "DWH.JVD_CONV_RBN", line 12
ORA-06512: at line 1

[Now I create and fill the table at the source database:]

SQL> create table test_jvd (teken varchar2(1 character),tekst varchar2
(100));

Table created.

SQL> insert into test_jvd (teken,tekst) values (chr(49765),'Small E
acute');

1 row created.

SQL> commit;

Commit complete.

SQL> select dump(teken)
2 from test_jvd;

DUMP(TEKEN)
--------------------------------------------------------------------------------
Typ=1 Len=2: 194,101

SQL>

[And I execute the procedure again at the target database:]

SQL> set serveroutput on
SQL> drop table test_jvd;

Table dropped.

SQL> create table test_jvd (teken varchar2(1),tekst varchar2(100));

Table created.

SQL> exec jvd_conv_rbn
WE8ISO8859P1
delete: 0
insert: 1

PL/SQL procedure successfully completed.

SQL> select dump(teken),tekst
2 from test_jvd;

DUMP(TEKEN)
--------------------------------------------------------------------------------
TEKST
--------------------------------------------------------------------------------
Typ=1 Len=1: 191
Small E acute

SQL>

[Which is the wrong value]

Regards,
Jaap.

Gerard H. Pille

unread,

Jan 9, 2009, 12:41:07 PM1/9/09

to

Jaap W. van Dijk schreef:

Probeer de oefening eens vanuit de bron-gegevensbank, die de recentste versie is. Je weet nooit
dat de oudere versie een fout maakt (als ik me niet vergis gebeurt de vertaling door SQL*Net).

Jaap W. van Dijk

unread,

Jan 13, 2009, 4:11:46 AM1/13/09

to

I want to thank all who have contributed to this thread, but I found
the cause: I was handed a list of characters with their hexadecimal
codes, but now it appears that those were not the UTF-8 codes. For
instance SMALL E ACUTE is not C265 but it is C3A9. I managed to find
the correct codes and now all is working properly.

Thanks again for your input,
Jaap.

Laurenz Albe

unread,

Jan 13, 2009, 11:38:54 AM1/13/09

to

Jaap W. van Dijk <j.w.v...@hetnet.nl> wrote:

> This is my testcase (comments between square brackets):
>
> [Logging on to the source database:]
>
> SQL*Plus: Release 10.1.0.2.0 - Production on Vr Jan 9 08:40:14 2009
>
> Copyright (c) 1982, 2004, Oracle. All rights reserved.
>
>
> Connected to:
> Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
> With the Partitioning, OLAP and Data Mining options
>
> SQL> select *
> 2 from v$nls_parameters
> 3 where parameter = 'NLS_CHARACTERSET';
>
> PARAMETER
> ----------------------------------------------------------------
> VALUE
> ----------------------------------------------------------------
> NLS_CHARACTERSET
> AL32UTF8

[...]

> [Now I create and fill the table at the source database:]
>
> SQL> create table test_jvd (teken varchar2(1 character),tekst varchar2
> (100));
>
> Table created.
>
> SQL> insert into test_jvd (teken,tekst) values (chr(49765),'Small E
> acute');
>
> 1 row created.
>
> SQL> commit;
>
> Commit complete.
>
> SQL> select dump(teken)
> 2 from test_jvd;
>
> DUMP(TEKEN)
> ------------------------------------------------------------------------

> Typ=1 Len=2: 194,101

Ok, now I know what your problem is!

CHR(49765) is *NOT* a small e acute.

It is in fact an invalid UTF-8 sequence, and if Oracle were not so
incredibly sloppy about encoding, it would report an error.

As it is, it stores garbage in the table, and this garbage will turn to
question marks as soon as a conversion takes place.

A small e acute in UTF-8 is actually 50089.

Try again with CHR(50089) and voila! it will work.

Yours,
Laurenz Albe

Jaap W. van Dijk

unread,

Jan 14, 2009, 2:48:50 AM1/14/09

to

On 13 Jan 2009 16:38:54 GMT, Laurenz Albe <inv...@spam.to.invalid>
wrote:

Hi Laurenz,

Thanks for diving into this! In an other posting in this thread I came
to the same conclusion. The list with erroneous codes that was handed
to me put me on the wrong track. I had some trouble determining that
the codes were wrong because I could not find a list of correct utf-8
codes for 1 and 2 bytes on the internet. Unicode code point are easily
found, but it wasn't until I found a document with the conversion
algorithm from unicode codepoint to UTF-8 byte representations that I
discovered the the codes handed to me were wrong.

How did you come by your information on valid UTF-8 codes?

Regards,
Jaap.

Laurenz Albe

unread,

Jan 14, 2009, 5:11:43 AM1/14/09

to

Jaap W. van Dijk <j.w.vandijk...@hetnet.nl> wrote:
> Thanks for diving into this! In an other posting in this thread I came
> to the same conclusion.

I saw your posting only after I had written this :^/

> How did you come by your information on valid UTF-8 codes?

I got an error message from "iconv", but here is the official way:

Read RFC 2279 and look at the possible octet sequences in chapter 2.

A two-byte UTF-8 sequence looks like this:

110xxxxx 10xxxxxx

Now if you convert 49765 to binary, you get

11000010 01100101

which clearly dows not match this pattern.

Yours,
Laurenz Albe