I have two tables (t1, t2)
t1 t2
id recid
col1
col2
t1.id is indexed primary key. It has 1,000,000 rows.
t2.recid is indexed primary key. It has 50 rows.
I have tried several queries below. They all take more than 3 minutes
to update 50 rows in table t1.
update t1
set col1 = 'A',
col2 = 0
where t1.id in (select recid from t2)
update t1
set col1 = 'A',
col2 = 0
where exists (select recid from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where exists (select 'a' from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where t1.id = (select recid from t2
where recid = t1.id)
update t1
set col1 = 'A',
col2 = 0
where t1.id IN (select recid from t2
where recid = t1.id)
It has to scan 1,000,000 rows in t1 table.
How can I improve this Update query ?
How can I let it scan the t2 instead of t1 table ?
My machine is HP5900, 2 CPU, 1G of memory.
DBMS is Informix 7.3.
your problem is not the sequential scan on table t1, but in almost
all cases the 1,000,000 sub-selects on the small table. I thought the
correlated sub-queries would be optimized a bit better in version 7.3
( as they are optimized in IDS/EPO earlier XPS - includes really a
"cool" optimizer ).
Because you know that there are just 50 rows to be updated, transform
your query in this way.
Schema:
for each row select t1.id INTO savedVal from t1, t2
where t1.id = t2.recid
update t1 set col1 = 'A', col2 = 0 where id = savedVal;
end foreach;
Okay, it looks as if you would need a programm. But the best way
I think is to use a temporary stored procedure.
-------------------------- Detail --------------------------------
dbaccess yourdatabase - <<eof
BEGIN WORK; -- if you use transactions, put it in a transaction
CREATE PROCEDURE HungTemp()
DEFINE t1id_i INT; -- use the same datatype as for the t1 column
FOREACH SELECT t1.id INTO t1id_i FROM t1, t2 WHERE
t1.id = t2.recid
UPDATE t1 SET col1 = 'A', col2 = 0 WHERE id = t1id_i;
END FOREACH;
END PROCEDURE;
EXECUTE PROCEDURE HungTemp(); -- will execute the stored procedure
DROP PROCEDURE HungTemp();
COMMIT WORK;
eof
-----------------------------------------------------------------
The whole script should take about 50 * ( 5ms + 20ms ) + 400ms = 1.65sec
If the query will take longer than 2sec, then have a detailled look at
the sqexplain.out file of the optimizer.
dbaccess yourdatabase - <<eof
SET EXPLAIN ON;
SELECT t1.id FROM t1, t2 WHERE t1.id = t2.recid;
eof
The optimizer should start with table t2 and should continue
with t1.
The contents of the file "sqexplain.out" must look like:
Query:
======
SELECT t1.id FROM t1, t2 WHERE t1.id = t2.recid;
Estimated Cost: #####
Estimated #of rows returned: #####
1) t2: SEQUENTIAL SCAN
2) t1: INDEX PATH
Index Keys: id
Lower Index Filter: t1.id = t2.recid
If the order of the path is wrong ( t1 -> t2 ), then you forgot
your "update statistics" for the tables. Do the update statistics
and try it again.
If this will not help, add the optimizer directive into the
SELECT statement and don't forget to enable the directives in
your server environment:
$ export IFX_DIRECTIVES="ON"
$ dbaccess youdatabase - <<eof
...
CREATE PROCEDURE ...
...
FOREACH SELECT { +USE_INDEX(t1, idxnameof_id) } t1.id
INTO t1id_i FROM t2, t1 WHERE t2.recid + 0 = t1.id
...
END FOREACH;
END PROCEDURE;
...
COMMIT WORK;
eof
This will force the optimizer to do everything correct.
Bye
Stefan Weideneder
PS:
For Germans only: Das und vieles mehr lernt man im Informix
Seminar T225 ( man sollte auch schon mal Werbung machen ).
>Hi
>t1.id is indexed primary key. It has 1,000,000 rows.
>t2.recid is indexed primary key. It has 50 rows.
>I have tried several queries below. They all take more than 3 minutes
>to update 50 rows in table t1.
>update t1
> set col1 = 'A',
> col2 = 0
>where t1.id in (select recid from t2)
This one shouldn't be bad with only 50 rows in
t2 and an index on t1.id. And it not a
correlated subquery like the other
ones you list (those'll be slow).
Maybe you need an 'update statistics' on the tables.
Hope that helps,
Douglas Wilson
If that doesn't work, (and this will sound odd), try putting a silly
criteria on t2 which will return all rows, like maybe...
update t1
set col1 = 'A',
col2 = 0
where t1.id in (select recid from t2 where recid < (largest possible recid
in t2 + 1) )
A silly criteria like this sometimes causes optimizers to scan t2 rather
than t1.
Worth a try.
Pete Bloxam.
(To reply please mail pe...@bloxam.force9.co.uk. Do not reply to sender as
an Anti Spam e-mail address was entered)
>update t1
> set col1 = 'A',
> col2 = 0
>where t1.id in (select recid from t2)
Did you do a 'set explain on' and get
an 'sqexplain.out' file? Are the two tables
actually in the same database, or is one of them
a synonym for a table in another db? Did you update
statistics?
Good Luck,
Douglas Wilson