Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Problems loading data into the database using SQL Loader.

1,656 views
Skip to first unread message

trub3101

unread,
Jun 23, 2009, 12:42:03 PM6/23/09
to orac...@lazydba.com
Hi all,

Database
Oracle 9.2.0.8
NLS_LANG=AMERICAN_AMERICA.UTF8
NLS_CHARACTERSET UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16

Server
Red Hat Linux 2.6.9-67.ELsmp
LANG=en_US.UTF-8

I cannot figure this one out. When I try to populate a table using SQl
Loader I am getting corrupt characters in the table e.g. Alien³
appears in the table as Alien¿

I have even set the CHARACTERSET to UTF8 in the control file for good
measure!

Can anyone throw some light on this one?

Thanks in advance for your replies.

TB3101

ddf

unread,
Jun 23, 2009, 12:53:25 PM6/23/09
to

I'll take a guess that you're doing this from a client machine. What
is the NLS_LANG on the machine where you're running the loader?


David Fitzjarrell

trub3101

unread,
Jun 23, 2009, 2:38:10 PM6/23/09
to
> David Fitzjarrell- Hide quoted text -
>
> - Show quoted text -

Thanks for reply David,

I am actually logged onto the server and running the sql loader
process. The NLS_LANG is set to AMERICAN_AMERICA.UTF8

Cheers,

TB3101

ddf

unread,
Jun 23, 2009, 2:53:46 PM6/23/09
to
> TB3101- Hide quoted text -

>
> - Show quoted text -

We'll need to see your control file and data file.


David Fitzjarrell

trub3101

unread,
Jun 23, 2009, 3:30:40 PM6/23/09
to

Hi David,

Here is the control file:

LOAD DATA
CHARACTERSET UTF8
INFILE '/opt/ora/oracle/oradata/data/xxxx.dat' "str '\n'"
BADFILE '/opt/ora/oracle/oradata/xxxx.bad'
DISCARDFILE '/opt/ora/oracle/oradata/xxxx.dis'
TRUNCATE
INTO TABLE film_film
TRAILING NULLCOLS

(
START_TIME DATE "DD/MM/YYYY,HH24:MI:SS,"
TERMINATED BY '^',
CHANNEL_ID CHAR(300) TERMINATED BY '^',
REGION_CODE CHAR(10) TERMINATED BY '^',
TITLE CHAR(200) TERMINATED BY '^',
PERFORMERS CHAR(4000) TERMINATED BY '^',
DIRECTOR CHAR(150) TERMINATED BY '^',
FILM_RATING INTEGER EXTERNAL TERMINATED BY
'^',
YEAR CHAR(15) TERMINATED BY '^',
DESCRIPTION CHAR(4000) TERMINATED BY '^',
PREMIERE CHAR(1) TERMINATED BY '^',
TV_MOVIE CHAR(1) TERMINATED BY '^',
FILM_CERTIFICATE CHAR(10) TERMINATED BY '^',
WARNING CHAR(200) TERMINATED BY '^',
REVIEW_AUTHOR CHAR(50)
)

and here is the data file:

30/06/2009 19:00:00^252^^Alien³^Warrant Officer Ripley*Sigourney
Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
first two films stand up as sci-fi classics in their own right, Se7en
director David Fincher, in his feature film debut, had a virtually
impossible act to follow with this second sequel. He makes a
surprisingly good fist of it, developing the maternal themes of first
sequel Aliens and providing an exhilarating final showdown. Sigourney
Weaver returns as Ripley, who this time crash-lands on a prison colony
where another lethal alien is let loose. A familiar cast of Brits
(Charles Dance, Paul McGann, Brian Glover) provides the alien food
and, while it isn't in the same class as the first two films, this
provides a satisfactory entry in the series. Still, it would have been
interesting to see what second-choice director Vincent Ward (of The
Navigator: a Medieval Odyssey fame) would have made of it -
apparently, he was brought in when Renny Harlin left after script
disagreements, but was himself replaced when it emerged that his
version of the movie would be set in a monastery and the alien itself
wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.
^JF

Thanks again for your help,

TB3101


trub3101

unread,
Jun 23, 2009, 3:38:12 PM6/23/09
to

Hi David,

When I run 'cat -e' on the data file the output for the title is
different!

30/06/2009 19:00:00^252^^AlienM-3^Warrant Officer Ripley*Sigourney


Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
first two films stand up as sci-fi classics in their own right, Se7en
director David Fincher, in his feature film debut, had a virtually
impossible act to follow with this second sequel. He makes a
surprisingly good fist of it, developing the maternal themes of first
sequel Aliens and providing an exhilarating final showdown. Sigourney
Weaver returns as Ripley, who this time crash-lands on a prison colony
where another lethal alien is let loose. A familiar cast of Brits
(Charles Dance, Paul McGann, Brian Glover) provides the alien food
and, while it isn't in the same class as the first two films, this
provides a satisfactory entry in the series. Still, it would have been
interesting to see what second-choice director Vincent Ward (of The
Navigator: a Medieval Odyssey fame) would have made of it -
apparently, he was brought in when Renny Harlin left after script
disagreements, but was himself replaced when it emerged that his
version of the movie would be set in a monastery and the alien itself
wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.

^JF$

Cheers,

TB3101

Vladimir M. Zakharychev

unread,
Jun 24, 2009, 2:55:30 AM6/24/09
to

Can you also post the SQL*loader log file? Do you see something like
this at the very beginning of the log:

Character Set UTF8 specified for all input.
First primary datafile xxxx.dat has a
utf8 byte order mark in it.

I mean, you gotta make sure the file is really in UTF-8 so that
SQL*Loader could process it properly.

Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com

trub3101

unread,
Jun 24, 2009, 9:47:08 AM6/24/09
to
On 24 June, 07:55, "Vladimir M. Zakharychev"
>    http://www.dynamicpsp.com- Hide quoted text -

>
> - Show quoted text -

Hi Vladimir,

The log file has 'Character Set UTF8 specified for all input.' in it.

Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
an observation on your behalf or was this also supposed to be in the
log file?

Thanks for your reply,

TB3101

Gerard H. Pille

unread,
Jun 24, 2009, 1:43:50 PM6/24/09
to
trub3101 schreef:


Maybe the data is stored correctly and you're getting the wrong results when retrieving?


And the input file you show, is not in UTF-8, is it?

trub3101

unread,
Jun 24, 2009, 2:55:17 PM6/24/09
to
> And the input file you show, is not in UTF-8, is it?- Hide quoted text -

>
> - Show quoted text -

Thanks for your reply Gerard,

After consulting one of the devs earlier it would appear that file is
in 'ISO-8859-1' character encoding format.
I guess I will now need to convert this to UTF-8 character encoding
somehow.

The Linux iconv command does seem to make any difference the data is
still appearing corrupted in the database.

Anyone?

Cheers,
TB3101

Vladimir M. Zakharychev

unread,
Jun 25, 2009, 3:14:40 AM6/25/09
to
> >    http://www.dynamicpsp.com-Hide quoted text -

>
> > - Show quoted text -
>
> Hi Vladimir,
>
> The log file has 'Character Set UTF8 specified for all input.' in it.
>
> Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
> an observation on your behalf or was this also supposed to be in the
> log file?
>
> Thanks for your reply,
>
> TB3101

I observed it while testing your case and I suppose it should be there
for the loader to properly process/convert your data. In my test (on
10.2.0.4/Win32,) a file in UTF-8 with BOM at the very beginning of the
file was loaded correctly. It is not mandatory to have the BOM in an
UTF-8 file as the byte order in UTF-8 is the same on all platforms and
BOM's only purpose in UTF-8 is to mark otherwise unmarked plain text
as being in Unicode.

Since your file is actually in ISO-8859-1, maybe it's easier to
reflect this in the control file (CHARACTERSET WE8ISO8859P1) and let
Oracle do the conversion to the database charset (it's pretty good at
that.)

Hth,

trub3101

unread,
Jun 25, 2009, 8:04:26 AM6/25/09
to
On 25 June, 08:14, "Vladimir M. Zakharychev"
> > >    http://www.dynamicpsp.com-Hidequoted text -

>
> > > - Show quoted text -
>
> > Hi Vladimir,
>
> > The log file has 'Character Set UTF8 specified for all input.' in it.
>
> > Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
> > an observation on your behalf or was this also supposed to be in the
> > log file?
>
> > Thanks for your reply,
>
> > TB3101
>
> I observed it while testing your case and I suppose it should be there
> for the loader to properly process/convert your data. In my test (on
> 10.2.0.4/Win32,) a file in UTF-8 with BOM at the very beginning of the
> file was loaded correctly. It is not mandatory to have the BOM in an
> UTF-8 file as the byte order in UTF-8 is the same on all platforms and
> BOM's only purpose in UTF-8 is to mark otherwise unmarked plain text
> as being in Unicode.
>
> Since your file is actually in ISO-8859-1, maybe it's easier to
> reflect this in the control file (CHARACTERSET WE8ISO8859P1) and let
> Oracle do the conversion to the database charset (it's pretty good at
> that.)
>
> Hth,
>    Vladimir M. Zakharychev
>    N-Networks, makers of Dynamic PSP(tm)
>    http://www.dynamicpsp.com- Hide quoted text -

>
> - Show quoted text -

Thanks for your reply Vladimir,

I had hoped that the solution would have been something as easy as
changing the characterset in the control file to WE9ISO8859P1 however,
this does not seem to make any difference what so ever.

Just to satisfy my curiousity which character is the BOM?

Thanks again,

TB3101

Vladimir M. Zakharychev

unread,
Jun 25, 2009, 8:42:34 AM6/25/09
to
> > > >    http://www.dynamicpsp.com-Hidequotedtext -

>
> > > > - Show quoted text -
>
> > > Hi Vladimir,
>
> > > The log file has 'Character Set UTF8 specified for all input.' in it.
>
> > > Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
> > > an observation on your behalf or was this also supposed to be in the
> > > log file?
>
> > > Thanks for your reply,
>
> > > TB3101
>
> > I observed it while testing your case and I suppose it should be there
> > for the loader to properly process/convert your data. In my test (on
> > 10.2.0.4/Win32,) a file in UTF-8 with BOM at the very beginning of the
> > file was loaded correctly. It is not mandatory to have the BOM in an
> > UTF-8 file as the byte order in UTF-8 is the same on all platforms and
> > BOM's only purpose in UTF-8 is to mark otherwise unmarked plain text
> > as being in Unicode.
>
> > Since your file is actually in ISO-8859-1, maybe it's easier to
> > reflect this in the control file (CHARACTERSET WE8ISO8859P1) and let
> > Oracle do the conversion to the database charset (it's pretty good at
> > that.)
>
> > Hth,
> >    Vladimir M. Zakharychev
> >    N-Networks, makers of Dynamic PSP(tm)
> >    http://www.dynamicpsp.com-Hide quoted text -

>
> > - Show quoted text -
>
> Thanks for your reply Vladimir,
>
> I had hoped that the solution would have been something as easy as
> changing the characterset in the control file to WE9ISO8859P1 however,
> this does not seem to make any difference what so ever.
>
> Just to satisfy my curiousity which character is the BOM?
>
> Thanks again,
>
> TB3101

U+FEFF (that is, Unicode code point 0xFEFF, also known as "zero width
non-breaking space" if encountered in the middle of the stream, though
this use of the code point is deprecated.) In UTF-8 it's encoded as
sequence of bytes 0xEF 0xBB 0xBE. So whenever you see these three
bytes at the very beginning of a text stream, you can be sure this is
UTF-8.

trub3101

unread,
Jun 25, 2009, 8:52:35 AM6/25/09
to
On 25 June, 13:42, "Vladimir M. Zakharychev"
> > >    http://www.dynamicpsp.com-Hidequoted text -

>
> > > - Show quoted text -
>
> > Thanks for your reply Vladimir,
>
> > I had hoped that the solution would have been something as easy as
> > changing the characterset in the control file to WE9ISO8859P1 however,
> > this does not seem to make any difference what so ever.
>
> > Just to satisfy my curiousity which character is the BOM?
>
> > Thanks again,
>
> > TB3101
>
> U+FEFF (that is, Unicode code point 0xFEFF, also known as "zero width ...
>
> read more »- Hide quoted text -

>
> - Show quoted text -

Thanks Vladimir,

So that file was UTF-8 encode?

trub3101

unread,
Jun 25, 2009, 11:17:09 AM6/25/09
to
On 25 June, 13:42, "Vladimir M. Zakharychev"
> > >    http://www.dynamicpsp.com-Hidequoted text -

>
> > > - Show quoted text -
>
> > Thanks for your reply Vladimir,
>
> > I had hoped that the solution would have been something as easy as
> > changing the characterset in the control file to WE9ISO8859P1 however,
> > this does not seem to make any difference what so ever.
>
> > Just to satisfy my curiousity which character is the BOM?
>
> > Thanks again,
>
> > TB3101
>
> U+FEFF (that is, Unicode code point 0xFEFF, also known as "zero width ...
>
> read more »- Hide quoted text -

>
> - Show quoted text -

Hi Vladimir,

I am not sure what happened to my last post!!!! I take then that
because the file in question has a BOM it is a UTF-8 encoded file.

Thanks again for your assistance,
TB3101

trub3101

unread,
Jul 16, 2009, 6:38:35 AM7/16/09
to
> > > >    http://www.dynamicpsp.com-Hidequotedtext -

>
> > > > - Show quoted text -
>
> > > Thanks for your reply Vladimir,
>
> > > I had hoped that the solution would have been something as easy as
> > > changing the characterset in the control file to WE9ISO8859P1 however,
> > > this does not seem to make any difference what so ever.
>
> > > Just to satisfy my curiousity which character is the BOM?
>
> > > Thanks again,
>
> > > TB3101
>
> > U+FEFF (that is, Unicode code point 0xFEFF, also known as "zero width ...
>
> > read more »- Hide quoted text -
>
> > - Show quoted text -
>
> Hi Vladimir,
>
> I am not sure what happened to my last post!!!! I take then that
> because the file in question has a BOM it is a UTF-8 encoded file.
>
> Thanks again for your assistance,
> TB3101

Hi all,

I managed to rectify this issue a little while back now. I created a
script using the unix/linux lconv command to convert the data file
from ISO-8859-1 to UTF-8 and that was it. Fairly simple in the end!
Thanks for all the feedback everyone!

TB3101

Nagaraj Radder

unread,
Feb 7, 2023, 4:16:19 AM2/7/23
to
Could you please provide the script.
0 new messages