I'm trying to load an UTF8 file with fixed length record using SQL
Loader for Oracle 8i.
Unfortunately, SQL Loader does not make the difference between bytes
and characters. i.e. that if two bytes are used to code an accentuated
character (like é) SQL loader will consider that two caracters has
been read meaning that beginning of next field will be shifted.
Example:
LOAD DATA
CHARACTERSET UTF8
INFILE 'XXX.txt'
REPLACE
INTO TABLE tmp.MY_TABLE
(
LASTN POSITION (1:20), -- Last name
FIRSN POSITION (21:40), -- First name
STATU POSITION (41:41), -- Status
BIRDT POSITION (42:49) -- Date of birth
)
With this data :
Thévenat Jacques 119460508
I'll get in my DB :
Field 1 : Thévenat
Field 2 : Jacques (note the leading space)
Field 3 : (blank)
Field 4 : 11946050 (wrong date)
This because the "é" is coded like C3 A9 (= 2 bytes) in UTF8.
I know that there is a LENGTH SEMANTICS CHAR option in Oracle 9i but
I haven't found anything similar in 8i.
Does anybody know a work around ?
Thanks in advance.
: I'm trying to load an UTF8 file with fixed length record using SQL
: Loader for Oracle 8i.
But utf-8 isn't fixed length.
: Unfortunately, SQL Loader does not make the difference between bytes
: and characters. i.e. that if two bytes are used to code an accentuated
: character (like é) SQL loader will consider that two caracters has
: been read meaning that beginning of next field will be shifted.
: Example:
: LOAD DATA
: CHARACTERSET UTF8
: INFILE 'XXX.txt'
: REPLACE
: INTO TABLE tmp.MY_TABLE
: (
: LASTN POSITION (1:20), -- Last name
: FIRSN POSITION (21:40), -- First name
: STATU POSITION (41:41), -- Status
: BIRDT POSITION (42:49) -- Date of birth
: )
: With this data :
: Thévenat Jacques 119460508
: I'll get in my DB :
: Field 1 : Thévenat
: Field 2 : Jacques (note the leading space)
: Field 3 : (blank)
: Field 4 : 11946050 (wrong date)
: This because the "é" is coded like C3 A9 (= 2 bytes) in UTF8.
As a said, not fixed length data.
: I know that there is a LENGTH SEMANTICS CHAR option in Oracle 9i but
: I haven't found anything similar in 8i.
: Does anybody know a work around ?
One possibility...
Make the fields in the load file long enough to contain the maximum length
of utf-8 data for that field, and then pad the bytes of each field to be
that length when you create the load file, and then trim off the padding
as the field is inserted into the table.
--
This space not for rent.
> Hi all,
>
> I'm trying to load an UTF8 file with fixed length record using SQL
> Loader for Oracle 8i.
>
> Unfortunately, SQL Loader does not make the difference between bytes
> and characters. i.e. that if two bytes are used to code an accentuated
> character (like é) SQL loader will consider that two caracters has
> been read meaning that beginning of next field will be shifted.
>
(Just pointers, as I haven't tried this yet ...)
You might find Case Study 11 "Loading Data in the Unicode Character Set"
and Chapter 5 "Handling Different Character Encoding Schemes" of interest.
For Oracle9i R2 Look up Chapter 10 of the Utilities manual at
http://www.tahiti.com
HTH
/Hans
This post is a followup of previous thread with same name, but I had a
problem when replying :-(
>Nicolas Mayoraz (nm...@dkweb.ch) wrote:
>
>: I'm trying to load an UTF8 file with fixed length record using SQL
>: Loader for Oracle 8i.
>
>But utf-8 isn't fixed length.
>
I know utf-8 is not fixed length. I'm not referring to the byte length
(number of bytes needed to encode) but to the characters length
(number of characters seen when displayed on the screen) = number of
characters seized by the end-user
>One possibility...
>
>Make the fields in the load file long enough to contain the maximum
length
>of utf-8 data for that field, and then pad the bytes of each field to
be
>that length when you create the load file, and then trim off the
padding
>as the field is inserted into the table.
Unfortunately, I have no way to change the received file :-( But
thanks for the idea.
What happens if you define the ctl file as?:
LOAD DATA
CHARACTERSET UTF8
INFILE 'XXX.txt'
REPLACE
INTO TABLE tmp.MY_TABLE
(
LASTN char(20), -- Last name
FIRSN char(20), -- First name
STATU char(1), -- Status
BIRDT char(8) -- Date of birth
)
?
Anurag
>What happens if you define the ctl file as?:
>
>LOAD DATA
>CHARACTERSET UTF8
> INFILE 'XXX.txt'
>REPLACE
>
> INTO TABLE tmp.MY_TABLE
> (
> LASTN char(20), -- Last name
> FIRSN char(20), -- First name
> STATU char(1), -- Status
> BIRDT char(8) -- Date of birth
>)
>
>?
>
>Anurag
With SQL Loader 8i it's not working properly. But I managed to install
SQL Loader 9i and it's working well with the LENGTH SEMANTICS CHAR
option and using the syntax you proposed above.
Best regards,
Nik
hmm .. well I see different results
.. I did a test on a 8.1.7.4 database and found that it works with the syntax I posted above.
I tested it with greek and french data.
Although POSITION(x:y) refers to bytes in 8i, the char (afaik) refers to characters.
Anurag