
Bulk import data into table.


Leszek Dubiel
Aug 21, 2004, 4:58:53 PM
----------------------------------------
BACKGROUND:

My company (www.glass.biz) keeps information about products in a
network-shared (Samba) filesystem. Each product has a directory
assigned to it, where our engineers save information for the ERP
system. This information is gathered by traversing the whole
directory structure and reading the text files. Our engineers work
all day producing more and more information about new products
(individual orders).

----------------------------------------
REAL PROBLEM:

Now I want to make the ERP system faster and more reliable by using
PostgreSQL. But one question arises that I can't answer myself, nor
by googling around.

At some regular interval (for example every day, hour, or minute) I
will have to synchronize my database with the information stored in
the text files. So how do I tell Postgres to:

1. insert a row if data about a product doesn't already exist in the table

2. delete a row if data about a product was deleted from the text files

3. update a row if data has changed


----------------------------------------
MY SOLUTIONS:

1. Start transaction. For each entry in text file do:
-- use SELECT to check if entry exists
-- if entry exists update values
-- if entry doesn't exist insert values
Commit transaction.

In this solution rows that should be deleted are never deleted,
which is wrong. Furthermore, I do a SELECT for each entry,
which could take very long, so I think this is not a good
solution.
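
For illustration, a minimal sketch of solution 1, assuming a
hypothetical table products (path text primary key, info text) keyed
by the product's directory path; the paths and values below are
made-up placeholders:

    BEGIN;
    -- for each entry in the text file:
    SELECT 1 FROM products WHERE path = '/orders/1234';  -- does it exist?
    UPDATE products SET info = '...'
     WHERE path = '/orders/1234';                        -- if it does
    INSERT INTO products (path, info)
    VALUES ('/orders/1234', '...');                      -- if it doesn't
    COMMIT;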

2. Start transaction. Delete all rows from table.
Insert rows from text file. Commit transaction.

In this solution everything works okay, but it seems strange
to replace the whole table by wrapping the operation in a
transaction. From the documentation I know that Postgres will
keep the old rows until a VACUUM is done.
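
A sketch of solution 2 with the same hypothetical table; the file
path and layout are made up, and COPY reads a server-side file (from
a client, psql's \copy does the same job):

    BEGIN;
    DELETE FROM products;
    COPY products (path, info) FROM '/tmp/products.txt';
    COMMIT;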


----------------------------------------

I would be very grateful if someone experienced could
advise a reasonable solution.

Leszek Dubiel

Christopher Browne
Aug 21, 2004, 9:14:20 PM
The world rejoiced as les...@dubiel.pl (Leszek Dubiel) wrote:
> 2. Start transaction. Delete all rows from table.
> Insert rows from text file. Commit transaction.
>
> In this solution everything works okay, but it seems strange
> to replace the whole table by wrapping the operation in a
> transaction. From the documentation I know that Postgres will
> keep the old rows until a VACUUM is done.

Option #2 is certainly the simplest way to do this.

In #1, you missed one of the steps:

-- if an entry in the database does not exist in the file, then
   delete it from the database

which essentially isn't something you can do "for each entry in the
text file."

It's _way_ more complex to do it row by row, particularly when part of
the logic isn't row-by-row.
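
That missing step can be expressed set-wise instead: load the file
into a staging table first (all names here hypothetical), then delete
whatever the file no longer mentions:

    CREATE TEMP TABLE products_new (path text, info text);
    COPY products_new FROM '/tmp/products.txt';
    DELETE FROM products
     WHERE path NOT IN (SELECT path FROM products_new);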

You'll want to VACUUM the table in question a little while after each
time it gets replaced, by the way.
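
For example, a little while after the reload (table name as above):

    VACUUM ANALYZE products;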
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxdistributions.html
All ITS machines now have hardware for a new machine instruction --
XOI Execute Operator Immediate.
Please update your programs.

Leszek Dubiel
Aug 22, 2004, 12:16:00 PM
> > 2. Start transaction. Delete all rows from table.
> > Insert rows from text file. Commit transaction.
>
> Option #2 is certainly the simplest way to do this.
>

Thank you very much. I thought that it was too simple
to be true.

Leszek Dubiel

Christopher Browne
Aug 22, 2004, 6:19:05 PM
Clinging to sanity, les...@dubiel.pl (Leszek Dubiel) mumbled into her beard:

If the table you keep replacing gets Real Big, this approach will get
steadily Less Nice, as you'll be replacing a whole lot of data on a
regular basis. So that would be bad.

But if, as you say, the data in the data file is always the new,
"authoritative" source of data, then while there may be clever ways to
diminish the amount of work needed to load it, regularly replacing the
NonAuthoritative Data in the database with the Authoritative Data in
the file is surely appropriate.
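
One such clever way, sketched with a hypothetical staging table: load
the file, then touch only the rows that actually differ, so far fewer
dead tuples pile up between VACUUMs:

    CREATE TEMP TABLE incoming (path text, info text);
    COPY incoming FROM '/tmp/products.txt';

    BEGIN;
    -- drop rows the file no longer mentions
    DELETE FROM products
     WHERE path NOT IN (SELECT path FROM incoming);
    -- rewrite only the rows whose contents changed
    UPDATE products SET info = i.info
      FROM incoming i
     WHERE products.path = i.path
       AND products.info <> i.info;
    -- add rows that are new in the file
    INSERT INTO products (path, info)
    SELECT path, info FROM incoming
     WHERE path NOT IN (SELECT path FROM products);
    COMMIT;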

"Replace all the data" is definitely a "Exercise Brute Force" sort of
method.

When Brute Force works, it works. Sometimes we get query plans that
involve Seq Scans, which are the query equivalent to Brute Force,
because that is, like it or not, the best way to get the answer.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www3.sympatico.ca/cbbrowne/lisp.html
Computers in the future may weigh no more than 1.5 tons. -Popular
Mechanics, forecasting the relentless march of science, 1949
