Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

SQL Loader Problem and How to remove duplicates ?

197 views

Skip to first unread message

thielm

unread,

Jan 10, 1997, 3:00:00 AM1/10/97

to bpih...@nla.gov.au, dav...@sfmic.com, stu...@mit.edu

Hey Guys

Maybe some of you can help me

I've got a asci file with about 600,000 records (about 100MB). Each
record is identified by a 40 character key. There are a lot of
duplicates in the file.

I'm trying to use sql loader to load this file into a table with a
primary key on the 40 characters. i.e. I don't want the duplicates.

When I use sql loader conventional path (GUI Interface) I get the
following behaviour. The first 20,000 records load in 10 seconds.
After the 20,000 (first duplicate occurs) the load suddenly becomes
unbarebly slow and seems to get slower over time. I don't know why, but
I'm suspecting that the bad file is part of it (since it is getting
bigger and bigger).

Maybe somebody can answer these questions for me.

1. How do I turn of the bad file (and discard file). I know there are a
lot of dups and I'd like to turn off the overhead of having to create a
bad file. I couldn't find anything in the docs to turn the bad file off
all together.

2. Does anyone have a theory why the load suddenly gets so slow ?

If I drop the index and do a load of the same file it absolutly flies,
but now I'm stuck with a lot of duplicate keys. I don't want these
duplicates.
I tried using a insert into from select distinct .... statement after
the load but since a duplicate key doesn't necesseraly mean that all the
other fields in the record are the same this is a problem too.

Example File

Key Info CollectionDate

abc Hello 1/3/97
abc Hello 1/8/97
def Hi 2/2/97

If I issue :

insert into sample(Key, Info, CollectDate)
select t.key, t.info, t.collectdate
from (select distinct key, info, collectdate from temptable) t;

I still get 3 records because the collectiondate is different in all
three records. I don't want the second occurence of 'abc'. I do want the
first occurence including the collectdate. This means that I can't
remove the collect date from the statement since I need it.

Maybe somebody can answer this questions for me.

3. Is there an easy way to remove the duplicate keys but still retain
all the fields associated with the first occurence of the key. For
Instance, Other DBMS tools will create a violation table with all the
duplicate keys when you create a primary index.
This would be perfect but it doesn't seem oracle does this, it just
fails when you create the index.

One more problem :

I tried to use the direct path method and got the following message :

SQL Loader version 7.3.2.2.0 etc etc etc
Error checking path homogeneity
ORACLE-02371 : Loader must be at least version 7.3.2.2.0 for Direct Path

I don't know what this means, I clearly have version 7.3.2.2.0

Thanks to anyone who can answer some of my questions.

Mike.

0 new messages