Database
Oracle 9.2.0.8
NLS_LANG=AMERICAN_AMERICA.UTF8
NLS_CHARACTERSET UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
Server
Red Hat Linux 2.6.9-67.ELsmp
LANG=en_US.UTF-8
I cannot figure this one out. When I try to populate a table using SQl
Loader I am getting corrupt characters in the table e.g. Alien³
appears in the table as Alien¿
I have even set the CHARACTERSET to UTF8 in the control file for good
measure!
Can anyone throw some light on this one?
Thanks in advance for your replies.
TB3101
I'll take a guess that you're doing this from a client machine. What
is the NLS_LANG on the machine where you're running the loader?
David Fitzjarrell
Thanks for reply David,
I am actually logged onto the server and running the sql loader
process. The NLS_LANG is set to AMERICAN_AMERICA.UTF8
Cheers,
TB3101
We'll need to see your control file and data file.
David Fitzjarrell
Hi David,
Here is the control file:
LOAD DATA
CHARACTERSET UTF8
INFILE '/opt/ora/oracle/oradata/data/xxxx.dat' "str '\n'"
BADFILE '/opt/ora/oracle/oradata/xxxx.bad'
DISCARDFILE '/opt/ora/oracle/oradata/xxxx.dis'
TRUNCATE
INTO TABLE film_film
TRAILING NULLCOLS
(
START_TIME DATE "DD/MM/YYYY,HH24:MI:SS,"
TERMINATED BY '^',
CHANNEL_ID CHAR(300) TERMINATED BY '^',
REGION_CODE CHAR(10) TERMINATED BY '^',
TITLE CHAR(200) TERMINATED BY '^',
PERFORMERS CHAR(4000) TERMINATED BY '^',
DIRECTOR CHAR(150) TERMINATED BY '^',
FILM_RATING INTEGER EXTERNAL TERMINATED BY
'^',
YEAR CHAR(15) TERMINATED BY '^',
DESCRIPTION CHAR(4000) TERMINATED BY '^',
PREMIERE CHAR(1) TERMINATED BY '^',
TV_MOVIE CHAR(1) TERMINATED BY '^',
FILM_CERTIFICATE CHAR(10) TERMINATED BY '^',
WARNING CHAR(200) TERMINATED BY '^',
REVIEW_AUTHOR CHAR(50)
)
and here is the data file:
30/06/2009 19:00:00^252^^Alien³^Warrant Officer Ripley*Sigourney
Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
first two films stand up as sci-fi classics in their own right, Se7en
director David Fincher, in his feature film debut, had a virtually
impossible act to follow with this second sequel. He makes a
surprisingly good fist of it, developing the maternal themes of first
sequel Aliens and providing an exhilarating final showdown. Sigourney
Weaver returns as Ripley, who this time crash-lands on a prison colony
where another lethal alien is let loose. A familiar cast of Brits
(Charles Dance, Paul McGann, Brian Glover) provides the alien food
and, while it isn't in the same class as the first two films, this
provides a satisfactory entry in the series. Still, it would have been
interesting to see what second-choice director Vincent Ward (of The
Navigator: a Medieval Odyssey fame) would have made of it -
apparently, he was brought in when Renny Harlin left after script
disagreements, but was himself replaced when it emerged that his
version of the movie would be set in a monastery and the alien itself
wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.
^JF
Thanks again for your help,
TB3101
Hi David,
When I run 'cat -e' on the data file the output for the title is
different!
30/06/2009 19:00:00^252^^AlienM-3^Warrant Officer Ripley*Sigourney
Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
first two films stand up as sci-fi classics in their own right, Se7en
director David Fincher, in his feature film debut, had a virtually
impossible act to follow with this second sequel. He makes a
surprisingly good fist of it, developing the maternal themes of first
sequel Aliens and providing an exhilarating final showdown. Sigourney
Weaver returns as Ripley, who this time crash-lands on a prison colony
where another lethal alien is let loose. A familiar cast of Brits
(Charles Dance, Paul McGann, Brian Glover) provides the alien food
and, while it isn't in the same class as the first two films, this
provides a satisfactory entry in the series. Still, it would have been
interesting to see what second-choice director Vincent Ward (of The
Navigator: a Medieval Odyssey fame) would have made of it -
apparently, he was brought in when Renny Harlin left after script
disagreements, but was himself replaced when it emerged that his
version of the movie would be set in a monastery and the alien itself
wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.
^JF$
Cheers,
TB3101
Can you also post the SQL*loader log file? Do you see something like
this at the very beginning of the log:
Character Set UTF8 specified for all input.
First primary datafile xxxx.dat has a
utf8 byte order mark in it.
I mean, you gotta make sure the file is really in UTF-8 so that
SQL*Loader could process it properly.
Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com
Hi Vladimir,
The log file has 'Character Set UTF8 specified for all input.' in it.
Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
an observation on your behalf or was this also supposed to be in the
log file?
Thanks for your reply,
TB3101
Maybe the data is stored correctly and you're getting the wrong results when retrieving?
And the input file you show, is not in UTF-8, is it?
Thanks for your reply Gerard,
After consulting one of the devs earlier it would appear that file is
in 'ISO-8859-1' character encoding format.
I guess I will now need to convert this to UTF-8 character encoding
somehow.
The Linux iconv command does seem to make any difference the data is
still appearing corrupted in the database.
Anyone?
Cheers,
TB3101
I observed it while testing your case and I suppose it should be there
for the loader to properly process/convert your data. In my test (on
10.2.0.4/Win32,) a file in UTF-8 with BOM at the very beginning of the
file was loaded correctly. It is not mandatory to have the BOM in an
UTF-8 file as the byte order in UTF-8 is the same on all platforms and
BOM's only purpose in UTF-8 is to mark otherwise unmarked plain text
as being in Unicode.
Since your file is actually in ISO-8859-1, maybe it's easier to
reflect this in the control file (CHARACTERSET WE8ISO8859P1) and let
Oracle do the conversion to the database charset (it's pretty good at
that.)
Hth,
Thanks for your reply Vladimir,
I had hoped that the solution would have been something as easy as
changing the characterset in the control file to WE9ISO8859P1 however,
this does not seem to make any difference what so ever.
Just to satisfy my curiousity which character is the BOM?
Thanks again,
TB3101
U+FEFF (that is, Unicode code point 0xFEFF, also known as "zero width
non-breaking space" if encountered in the middle of the stream, though
this use of the code point is deprecated.) In UTF-8 it's encoded as
sequence of bytes 0xEF 0xBB 0xBE. So whenever you see these three
bytes at the very beginning of a text stream, you can be sure this is
UTF-8.
Thanks Vladimir,
So that file was UTF-8 encode?
Hi Vladimir,
I am not sure what happened to my last post!!!! I take then that
because the file in question has a BOM it is a UTF-8 encoded file.
Thanks again for your assistance,
TB3101
Hi all,
I managed to rectify this issue a little while back now. I created a
script using the unix/linux lconv command to convert the data file
from ISO-8859-1 to UTF-8 and that was it. Fairly simple in the end!
Thanks for all the feedback everyone!
TB3101