Update too slow !!!

Hung Phi

unread,

Jul 24, 1998, 3:00:00 AM7/24/98

to

Hi

I have two tables (t1, t2)
t1 t2
id recid
col1
col2
t1.id is indexed primary key. It has 1,000,000 rows.
t2.recid is indexed primary key. It has 50 rows.
I have tried several queries below. They all take more than 3 minutes
to update 50 rows in table t1.
update t1
set col1 = 'A',
col2 = 0
where t1.id in (select recid from t2)
update t1
set col1 = 'A',
col2 = 0
where exists (select recid from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where exists (select 'a' from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where t1.id = (select recid from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where t1.id IN (select recid from t2
where recid = t1.id)

It has to scan 1,000,000 rows in t1 table.
How can I improve this Update query ?
How can I let it scan the t2 instead of t1 table ?

My machine is HP5900, 2 CPU, 1G of memory.
DBMS is Informix 7.3.

Will Dunton

unread,

Jul 25, 1998, 3:00:00 AM7/25/98

to

You may want to try;
update statistics for table t1
then your statement

Stefan Weideneder

unread,

Jul 25, 1998, 3:00:00 AM7/25/98

to Hung Phi

Hi,

your problem is not the sequential scan on table t1, but in almost
all cases the 1,000,000 sub-selects on the small table. I thought the
correlated sub-queries would be optimized a bit better in version 7.3
( as they are optimized in IDS/EPO earlier XPS - includes really a
"cool" optimizer ).

Because you know that there are just 50 rows to be updated, transform
your query in this way.

Schema:

for each row select t1.id INTO savedVal from t1, t2
where t1.id = t2.recid
update t1 set col1 = 'A', col2 = 0 where id = savedVal;
end foreach;

Okay, it looks as if you would need a programm. But the best way
I think is to use a temporary stored procedure.

-------------------------- Detail --------------------------------
dbaccess yourdatabase - <<eof

BEGIN WORK; -- if you use transactions, put it in a transaction
CREATE PROCEDURE HungTemp()
DEFINE t1id_i INT; -- use the same datatype as for the t1 column

FOREACH SELECT t1.id INTO t1id_i FROM t1, t2 WHERE
t1.id = t2.recid
UPDATE t1 SET col1 = 'A', col2 = 0 WHERE id = t1id_i;
END FOREACH;
END PROCEDURE;

EXECUTE PROCEDURE HungTemp(); -- will execute the stored procedure
DROP PROCEDURE HungTemp();
COMMIT WORK;
eof
-----------------------------------------------------------------

The whole script should take about 50 * ( 5ms + 20ms ) + 400ms = 1.65sec

If the query will take longer than 2sec, then have a detailled look at
the sqexplain.out file of the optimizer.

dbaccess yourdatabase - <<eof
SET EXPLAIN ON;
SELECT t1.id FROM t1, t2 WHERE t1.id = t2.recid;
eof

The optimizer should start with table t2 and should continue
with t1.
The contents of the file "sqexplain.out" must look like:

Query:
======
SELECT t1.id FROM t1, t2 WHERE t1.id = t2.recid;

Estimated Cost: #####
Estimated #of rows returned: #####

1) t2: SEQUENTIAL SCAN

2) t1: INDEX PATH

Index Keys: id
Lower Index Filter: t1.id = t2.recid

If the order of the path is wrong ( t1 -> t2 ), then you forgot
your "update statistics" for the tables. Do the update statistics
and try it again.
If this will not help, add the optimizer directive into the
SELECT statement and don't forget to enable the directives in
your server environment:

$ export IFX_DIRECTIVES="ON"
$ dbaccess youdatabase - <<eof
...
CREATE PROCEDURE ...
...
FOREACH SELECT { +USE_INDEX(t1, idxnameof_id) } t1.id
INTO t1id_i FROM t2, t1 WHERE t2.recid + 0 = t1.id
...
END FOREACH;
END PROCEDURE;
...
COMMIT WORK;
eof

This will force the optimizer to do everything correct.

Bye

Stefan Weideneder

PS:
For Germans only: Das und vieles mehr lernt man im Informix
Seminar T225 ( man sollte auch schon mal Werbung machen ).

Douglas Wilson

unread,

Jul 25, 1998, 3:00:00 AM7/25/98

to

On Fri, 24 Jul 1998 14:24:18 GMT, "Hung Phi" <hp...@amtelco.com> wrote:

>Hi

>t1.id is indexed primary key. It has 1,000,000 rows.
>t2.recid is indexed primary key. It has 50 rows.
>I have tried several queries below. They all take more than 3 minutes
>to update 50 rows in table t1.
>update t1
> set col1 = 'A',
> col2 = 0
>where t1.id in (select recid from t2)

This one shouldn't be bad with only 50 rows in
t2 and an index on t1.id. And it not a
correlated subquery like the other
ones you list (those'll be slow).

Maybe you need an 'update statistics' on the tables.

Hope that helps,
Douglas Wilson

Pete Bloxam

unread,

Jul 27, 1998, 3:00:00 AM7/27/98

to

I am no expert on the Informix Optimizer, but you could always try putting
the contents of t2 into a temporary table. Informix may take the view that
since the table is in memory (temporary stuff below about 2Mb doesn't
actually hit the temp dbspace - at least on NT) and then try one of your
queries below (probably not the WHERE IN one though, they're usually
slower).

If that doesn't work, (and this will sound odd), try putting a silly
criteria on t2 which will return all rows, like maybe...

update t1
set col1 = 'A',
col2 = 0

where t1.id in (select recid from t2 where recid < (largest possible recid
in t2 + 1) )

A silly criteria like this sometimes causes optimizers to scan t2 rather
than t1.

Worth a try.

Pete Bloxam.
(To reply please mail pe...@bloxam.force9.co.uk. Do not reply to sender as
an Anti Spam e-mail address was entered)

Douglas Wilson

unread,

Jul 28, 1998, 3:00:00 AM7/28/98

to

On Fri, 24 Jul 1998 14:24:18 GMT, "Hung Phi" <hp...@amtelco.com> wrote:

>update t1
> set col1 = 'A',
> col2 = 0

>where t1.id in (select recid from t2)

Did you do a 'set explain on' and get
an 'sqexplain.out' file? Are the two tables
actually in the same database, or is one of them
a synonym for a table in another db? Did you update
statistics?

Good Luck,
Douglas Wilson