Issues while storing a long dash in oracle

Sandy80

unread,

Mar 11, 2008, 2:53:23 AM3/11/08

to

Hi,

I work on an Oracle9 database which is loaded through sqlldr. The
issue that I am facing is because of a special character known as long
dash "-" (the normal one is "-") which appears in the flat file used
to load the database. Whenver such character appears in the file the
database processes it and it appears as "¿" in the database. So the
data coming as "Finance - Fulfillment Factory" in the flat file is
stored as "Finance ¿ Fulfillment Factory" in the database.
The main reason for me posting this issue is because when I format
this data according to the needs of my application it gives me the
wrong result. I use a command like below:

nvl(rpad(UPPER(substr(nvl( substr(column1,1,instr(column1,'(')-1),column1),
1,24)),24),' ')

This command is supposed to return me 24 characters in all situations
but it fails when the data contains that character "¿". It returns
only 23 characters in this situation. I guess that is because that
character takes 2 bytes instead of normal 1 and the rpad pads the data
with 24 bytes and not 24 characters. So since this character takes 2
bytes it returns only 23 characters.
I have also checked and the database uses the UTF8 character set.

Not sure how to solve this issue...any help would be much appreciated.

Thanks!

Thomas Kellerer

unread,

Mar 11, 2008, 3:31:47 AM3/11/08

to

Sandy80, 11.03.2008 07:53:

> Hi,
>
> I work on an Oracle9 database which is loaded through sqlldr. The
> issue that I am facing is because of a special character known as long
> dash "-" (the normal one is "-") which appears in the flat file used
> to load the database. Whenver such character appears in the file the

> database processes it and it appears as "ż" in the database. So the

> data coming as "Finance - Fulfillment Factory" in the flat file is

> stored as "Finance ż Fulfillment Factory" in the database.

> The main reason for me posting this issue is because when I format
> this data according to the needs of my application it gives me the
> wrong result. I use a command like below:
>

> I have also checked and the database uses the UTF8 character set.

Is your input file UTF8? If yes, did you tell sqlldr that your input file is UTF8?

Thomas

Sandy80

unread,

Mar 11, 2008, 3:36:09 AM3/11/08

to

> I have also checked and the database uses the UTF8 character set.
>
> Is your input file UTF8? If yes, did you tell sqlldr that your input file is UTF8?
>
> Thomas

Not sure whether the input file is UTF8 or not but yes the sqlldr does
have the setting to know that the input file would be UTF8.

Thomas Kellerer

unread,

Mar 11, 2008, 3:49:14 AM3/11/08

to

Sandy80, 11.03.2008 08:36:

>> I have also checked and the database uses the UTF8 character set.
>>
>> Is your input file UTF8? If yes, did you tell sqlldr that your input file is UTF8?
>>

> Not sure whether the input file is UTF8 or not but yes the sqlldr does
> have the setting to know that the input file would be UTF8.

That sounds like the /input file/ is not UTF8

Sandy80

unread,

Mar 11, 2008, 4:24:11 AM3/11/08

to

But my problem is that even if that character is processed and stored
correctly in the database it would still be stored as a 2 byte
character. In that case how do I make the following command, which
works on the basis of no. of bytes, work correctly:

Thomas Kellerer

unread,

Mar 11, 2008, 4:35:48 AM3/11/08

to

Sandy80, 11.03.2008 09:24:

I don't see why the statement shouldn't work. substr and instr work on *characters* not bytes:

http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/functions162.htm#i87066
"SUBSTR calculates lengths using characters as defined by the input character set"

http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/functions068.htm#i77598
"INSTR calculates strings using characters as defined by the input character set"

So why do you think your statement doesn't work?

Thomas

Sandy80

unread,

Mar 11, 2008, 5:11:41 AM3/11/08

to

Not sure as to why is it not working but when I run this command on
the data "Finance ¿ Fulfillment Factory" the result returned is:

"FINANCE ¿ FULFILLMENT F" whereas the correct result that I am looking
for is "FINANCE ¿ FULFILLMENT FA".

Any ideas?

Laurenz Albe

unread,

Mar 11, 2008, 6:02:29 AM3/11/08

to

Sandy80 <svarsh...@gmail.com> wrote:
> nvl(rpad(UPPER(substr(nvl( substr(column1,1,instr(column1,'(')-1),column1),
> 1,24)),24),' ')

> Not sure as to why is it not working but when I run this command on

It works fine on a 10g database here.

Most likely the encoding of the data does not match the client character
set.

You must find out the encoding of your text data.
Ask the people who produced the data, or examine the file and guess.

You could for example use "od -c". Here are two samples:

0000000 F i n a n c e 342 200 224 F u l f
0000020 i l l m e n t F a c t o r y \n
0000040

This is in UTF-8 encoding. The em-dash (UNICODE 0x2014) is encoded as three
bytes in UTF-8. They are represented in octal in the "od" output.

0000000 F i n a n c e 227 F u l f i l
0000020 l m e n t F a c t o r y
0000035

This is WIN-1252 encoding. The em-dash is a single byte here.

Next step is to set the client character set correctly.

For the first case, NLS_LANG should be AMERICAN_AMERICA.AL32UTF8, and
for the second case it should be AMERICAN_AMERICA.WE8MSWIN1252
(the AMERICAN_AMERICA is not important - choose what you want).

If data and client character set match, SUBSTR will work and sqlldr should
too.

Yours,
Laurenz Albe