I work on an Oracle9 database which is loaded through sqlldr. The
issue that I am facing is because of a special character known as long
dash "-" (the normal one is "-") which appears in the flat file used
to load the database. Whenver such character appears in the file the
database processes it and it appears as "¿" in the database. So the
data coming as "Finance - Fulfillment Factory" in the flat file is
stored as "Finance ¿ Fulfillment Factory" in the database.
The main reason for me posting this issue is because when I format
this data according to the needs of my application it gives me the
wrong result. I use a command like below:
nvl(rpad(UPPER(substr(nvl( substr(column1,1,instr(column1,'(')-1),column1),
1,24)),24),' ')
This command is supposed to return me 24 characters in all situations
but it fails when the data contains that character "¿". It returns
only 23 characters in this situation. I guess that is because that
character takes 2 bytes instead of normal 1 and the rpad pads the data
with 24 bytes and not 24 characters. So since this character takes 2
bytes it returns only 23 characters.
I have also checked and the database uses the UTF8 character set.
Not sure how to solve this issue...any help would be much appreciated.
Thanks!
Is your input file UTF8? If yes, did you tell sqlldr that your input file is UTF8?
Thomas
Not sure whether the input file is UTF8 or not but yes the sqlldr does
have the setting to know that the input file would be UTF8.
That sounds like the /input file/ is not UTF8
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/functions162.htm#i87066
"SUBSTR calculates lengths using characters as defined by the input character set"
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/functions068.htm#i77598
"INSTR calculates strings using characters as defined by the input character set"
So why do you think your statement doesn't work?
Thomas
"FINANCE ¿ FULFILLMENT F" whereas the correct result that I am looking
for is "FINANCE ¿ FULFILLMENT FA".
Any ideas?
> Not sure as to why is it not working but when I run this command on
It works fine on a 10g database here.
Most likely the encoding of the data does not match the client character
set.
You must find out the encoding of your text data.
Ask the people who produced the data, or examine the file and guess.
You could for example use "od -c". Here are two samples:
0000000 F i n a n c e 342 200 224 F u l f
0000020 i l l m e n t F a c t o r y \n
0000040
This is in UTF-8 encoding. The em-dash (UNICODE 0x2014) is encoded as three
bytes in UTF-8. They are represented in octal in the "od" output.
0000000 F i n a n c e 227 F u l f i l
0000020 l m e n t F a c t o r y
0000035
This is WIN-1252 encoding. The em-dash is a single byte here.
Next step is to set the client character set correctly.
For the first case, NLS_LANG should be AMERICAN_AMERICA.AL32UTF8, and
for the second case it should be AMERICAN_AMERICA.WE8MSWIN1252
(the AMERICAN_AMERICA is not important - choose what you want).
If data and client character set match, SUBSTR will work and sqlldr should
too.
Yours,
Laurenz Albe