a bit of background: we have a stored procedure that needs to lock
several rows via SELECT ... FOR UPDATE. Currently this stored
procedure has a single VARCHAR2 argument which contains a comma
separated list of integers (ids) and consequently uses EXECUTE
IMMEDIATE. Since the application has a lot duplicate SQL which would
be reduced by using bind variables I did some exploration into how
this could be solved via JDBC.
In the spirit of Richard, Jonathan and Tom I did some experimenting to
find out how different approaches would work, especially from a
performance point of view. For anyone interested here is the test
case - and an question follows further down because there is one
decision of the CBO I do not understand.
Kind regards
robert
SQL> set pagesize 100 linesize 200
SQL> set serverout on
SQL> create table t1 (
2 id number(10) primary key,
3 dat varchar2(100) not null
4 )
5 /
Table created.
SQL> timing start insert
SQL> insert into t1
2 select level, rpad('dat ' || level, 90,'*')
3 from dual
4 connect by level <= 1000000
5 /
1000000 rows created.
SQL> commit
2 /
Commit complete.
SQL> timing stop
timing for: insert
Elapsed: 00:00:11.78
SQL> timing start "Gather stats"
SQL> begin
2 dbms_stats.gather_schema_stats(
3 user,
4 cascade => true,
5 estimate_percent => null,
6 method_opt => 'FOR ALL COLUMNS'
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
SQL> timing stop
timing for: Gather stats
Elapsed: 00:00:51.25
SQL> create or replace type iset as table of number(10)
2 /
Type created.
SQL> alter session set tracefile_identifier = 'nested_short'
2 /
Session altered.
SQL> alter session set events '10053 trace name context forever, level
1';
Session altered.
SQL> /
Session altered.
SQL> set autotrace on
SQL> timing start "Literals with IN clause"
SQL> select length(dat) from t1 where id in (-1,2,3)
2 /
LENGTH(DAT)
-----------
90
90
Execution Plan
----------------------------------------------------------
Plan hash value: 47079033
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 285
| 6 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | |
| | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 3 | 285
| 6 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | SYS_C007311 | 3 |
| 5 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("ID"=(-1) OR "ID"=2 OR "ID"=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
363 bytes sent via SQL*Net to client
377 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> timing stop
timing for: Literals with IN clause
Elapsed: 00:00:00.06
SQL> timing start "Subselect with IN clause"
SQL> select length(dat) from t1 where id in ( select column_value from
table(iset(-1,2,3)) )
2 /
LENGTH(DAT)
-----------
90
90
Execution Plan
----------------------------------------------------------
Plan hash value: 949238496
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 255
| 24735 | 529 (1)| 00:00:03 |
| 1 | NESTED LOOPS | | 255
| 24735 | 529 (1)| 00:00:03 |
| 2 | SORT UNIQUE | |
| | | |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| |
| | | |
| 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1
| 95 | 2 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C007311 | 1
| | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("ID"=VALUE(KOKBF$))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
32 consistent gets
0 physical reads
0 redo size
363 bytes sent via SQL*Net to client
377 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> timing stop
timing for: Subselect with IN clause
Elapsed: 00:00:00.06
SQL> timing start "Join"
SQL> select length(dat) from t1 join table(iset(-1,2,3)) tt on
tt.column_value = t1.id
2 /
LENGTH(DAT)
-----------
90
90
Execution Plan
----------------------------------------------------------
Plan hash value: 472081508
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes
| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 |
773K| 2054 (5)| 00:00:10 |
|* 1 | HASH JOIN | | 8168 |
773K| 2054 (5)| 00:00:10 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | |
| | |
| 3 | TABLE ACCESS FULL | T1 | 1000K|
90M| 2015 (4)| 00:00:10 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"=VALUE(KOKBF$))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
14192 consistent gets
0 physical reads
0 redo size
363 bytes sent via SQL*Net to client
377 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> timing stop
timing for: Join
Elapsed: 00:00:00.23
SQL> drop type iset
2 /
Type dropped.
SQL> drop table t1
2 /
Table dropped.
SQL> exit
I wondered: why is it that the join version does a full table scan but
the IN ( subselect ) version does not? I did a 10053 trace and found
out that the same access strategy which leads to the INDEX UNIQUE SCAN
in the case of IN ( subselect ) is costed much higher (~ factor of 30)
in the JOIN case. BASE STATISTICS are identical in both cases and it
seems the culprit is the join cardinality (255 vs. 8168) which seems
derived from the outer table cardinality. The big myth to me is: where
does the CBO get the cardinality of 255 from all of a sudden since it
originally assumed 8168?
Base stats are identical, just tables are differently ordered. This is
from the good case:
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 1000000 #Blks: 14177 AvgRowLen: 95.00
Index Stats::
Index: SYS_C007309 Col#: 1
LVLS: 2 #LB: 1875 #DK: 1000000 LB/K: 1.00 DB/K: 1.00 CLUF:
14083.00
***********************
Table Stats::
Table: KOKBF$ Alias: KOKBF$ (NOT ANALYZED)
#Rows: 8168 #Blks: 100 AvgRowLen: 100.00
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Table: KOKBF$ Alias: KOKBF$
Card: Original: 8168 Rounded: 8168 Computed: 8168.00 Non
Adjusted: 8168.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 15.45 Resp: 15.45 Degree: 0
Cost_io: 15.00 Cost_cpu: 1937344
Resp_io: 15.00 Resp_cpu: 1937344
Best:: AccessPath: TableScan
Cost: 15.45 Degree: 1 Resp: 15.45 Card: 8168.00 Bytes: 0
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Table: T1 Alias: T1
Card: Original: 1000000 Rounded: 1000000 Computed: 1000000.00
Non Adjusted: 1000000.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 2014.78 Resp: 2014.78 Degree: 0
Cost_io: 1952.00 Cost_cpu: 270960655
Resp_io: 1952.00 Resp_cpu: 270960655
Best:: AccessPath: TableScan
Cost: 2014.78 Degree: 1 Resp: 2014.78 Card: 1000000.00
Bytes: 0
Good case: select length(dat) from t1 where id in ( select
column_value from table(iset(-1,2,3)) )
Join order[2]: KOKBF$[KOKBF$]#1 T1[T1]#0
SORT resource Sort statistics
Sort width: 349 Area size: 307200 Max Area size:
61446144
Degree: 1
Blocks to Sort: 13 Row size: 13 Total
Rows: 8168
Initial runs: 1 Merge passes: 0 IO Cost /
pass: 0
Total IO sort cost: 0 Total CPU sort cost: 9098366
Total Temp space used: 0
***************
Now joining: T1[T1]#0
***************
NL Join
Outer table: Card: 255.25 Cost: 17.56 Resp: 17.56 Degree: 1
Bytes: 2
Inner table: T1 Alias: T1
Access Path: TableScan
NL Join: Cost: 513378.14 Resp: 513378.14 Degree: 1
Cost_io: 497366.00 Cost_cpu: 69106002705
Resp_io: 497366.00 Resp_cpu: 69106002705
Access Path: index (UniqueScan)
Index: SYS_C007309
resc_io: 2.00 resc_cpu: 16313
ix_sel: 1.0000e-06 ix_sel_with_filters: 1.0000e-06
NL Join: Cost: 528.52 Resp: 528.52 Degree: 1
Cost_io: 525.00 Cost_cpu: 15195495
Resp_io: 525.00 Resp_cpu: 15195495
Access Path: index (AllEqUnique)
Index: SYS_C007309
resc_io: 2.00 resc_cpu: 16313
ix_sel: 1.0000e-06 ix_sel_with_filters: 1.0000e-06
NL Join: Cost: 528.52 Resp: 528.52 Degree: 1
Cost_io: 525.00 Cost_cpu: 15195495
Resp_io: 525.00 Resp_cpu: 15195495
Best NL cost: 528.52
resc: 528.52 resc_io: 525.00 resc_cpu: 15195495
resp: 528.52 resp_io: 525.00 resp_cpu: 15195495
Join Card: 255.25 = outer (255.25) * inner (1000000.00) * sel
(1.0000e-06)
Join Card - Rounded: 255 Computed: 255.25
Bad case: select length(dat) from t1 join table(iset(-1,2,3)) tt on
tt.column_value = t1.id
Join order[1]: KOKBF$[KOKBF$]#0 T1[T1]#1
***************
Now joining: T1[T1]#1
***************
NL Join
Outer table: Card: 8168.00 Cost: 15.45 Resp: 15.45 Degree: 1
Bytes: 2
Inner table: T1 Alias: T1
Access Path: TableScan
NL Join: Cost: 16443617.43 Resp: 16443617.43 Degree: 1
Cost_io: 15930808.00 Cost_cpu: 2213208566404
Resp_io: 15930808.00 Resp_cpu: 2213208566404
Access Path: index (UniqueScan)
Index: SYS_C007309
resc_io: 2.00 resc_cpu: 16313
ix_sel: 1.0000e-06 ix_sel_with_filters: 1.0000e-06
NL Join: Cost: 16382.32 Resp: 16382.32 Degree: 1
Cost_io: 16351.00 Cost_cpu: 135180948
Resp_io: 16351.00 Resp_cpu: 135180948
Access Path: index (AllEqUnique)
Index: SYS_C007309
resc_io: 2.00 resc_cpu: 16313
ix_sel: 1.0000e-06 ix_sel_with_filters: 1.0000e-06
NL Join: Cost: 16382.32 Resp: 16382.32 Degree: 1
Cost_io: 16351.00 Cost_cpu: 135180948
Resp_io: 16351.00 Resp_cpu: 135180948
Best NL cost: 16382.32
resc: 16382.32 resc_io: 16351.00 resc_cpu: 135180948
resp: 16382.32 resp_io: 16351.00 resp_cpu: 135180948
Join Card: 8168.00 = outer (8168.00) * inner (1000000.00) * sel
(1.0000e-06)
Join Card - Rounded: 8168 Computed: 8168.00
> I wondered: why is it that the join version does a full table scan but
> the IN ( subselect ) version does not?
From a logical point of view I would say: because what you ask for is
something different:
- For an IN clause, 1 occurence is enough to decide if there's a match
- For a join, you specifically ask for all matching occurences.
Shakespeare
Yes, that's certainly the major difference. But why does this lead
the CBO to suddenly change its idea of the table variable's
cardinality? I could not find anything in the trace that would
explain this. Of course it could be that the CBO simply does not
trace that bit of information...
Kind regards
robert
If you check the cardinality, you will see that it is 255.25, which
is EXACTLY 8168 / 32.
When Oracle doesn't know what the real statistics are, it uses some
arbitrary estimate. In this case, because the IN approach transforms
into an inline 'select distinct column_value' and Oracle doesn't know
how many distinct values there are, I guess is simply assumes 32.
Just as 8168 is a side effect of the default block size, this 32 may also
be based on the block size. So if you've got a database with a
different block size handy, you might like to test what happens there.
--
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
Yes that's what the report for the join analysis said. I didn't bother
to quote the fractional digits in the text because I did not notice this
relationship. Thanks for the heads up!
> When Oracle doesn't know what the real statistics are, it uses some
> arbitrary estimate. In this case, because the IN approach transforms
> into an inline 'select distinct column_value' and Oracle doesn't know
> how many distinct values there are, I guess is simply assumes 32.
>
> Just as 8168 is a side effect of the default block size, this 32 may also
> be based on the block size. So if you've got a database with a
> different block size handy, you might like to test what happens there.
Hm... I'll see whether I find the time to retest with a different non
standard sized tablespace.
Interestingly an explicit DISTINCT does not save the join and we now get
a hash join - concluding from the rows the CBO does not assume any
duplicate rows - which is in line with the 10053 trace of that run,
which assumes 8162 rows for the subquery:
SQL> timing start "Join and distinct"
SQL> select length(dat) from t1 join ( select distinct column_value from
table(iset(-1,2,3)) ) tt on tt.column_value = t1.id
2 /
LENGTH(DAT)
-----------
90
90
Execution Plan (truncated bytes and other cols for readability)
----------------------------------------------------------
Plan hash value: 731532569
------------------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 |
|* 1 | HASH JOIN | | 8168 |
| 2 | VIEW | | 8168 |
| 3 | HASH UNIQUE | | 8168 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | |
| 5 | TABLE ACCESS FULL | T1 | 1000K|
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TT"."COLUMN_VALUE"="T1"."ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
14192 consistent gets
0 physical reads
0 redo size
363 bytes sent via SQL*Net to client
377 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> timing stop
timing for: Join and distinct
Elapsed: 00:00:00.23
Basically since the IN with subselect is the version that suits my needs
best I am quite happy that it is blessed with a good plan. Although I
have to say that it makes me a bit wary that I do not exactly understand
what goes on in the other cases. I guess it's time for "CBO
Fundamentals" - if only I had enough time...
Again, thanks for your help!
Kind regards
robert
The explain plan changes back to nested loops when the optimizer goal is
set to first rows.
Shakespeare
> The explain plan changes back to nested loops when the optimizer goal is
> set to first rows.
Interesting. Incidentally there is a recent blog post:
http://jonathanlewis.wordpress.com/2008/11/11/first_rows_n/
Cheers
robert
Thanks for the link! I wasn't advertising on using first rows, but was
trying to show that the plans depend on statistics and optimizer settings.
By the way: I noticed in a test database that all selects on constructs
of table(iset(1,2,3)) boil down to the optimizer guessing 8168 rows...
no matter what type is used or how many rows/blocks.
Shakespeare
I just thought the link fits the context. :-)
> but was
> trying to show that the plans depend on statistics and optimizer settings.
Point taken.
> By the way: I noticed in a test database that all selects on constructs
> of table(iset(1,2,3)) boil down to the optimizer guessing 8168 rows...
> no matter what type is used or how many rows/blocks.
Basically I see the same. There is only this anomaly (?) that for
"... WHERE id in ( select column_value from table(iset(-1,2,3)) )" it
assumes cardinality 255 for table(iset(-1,2,3)):
Since I still have no clear idea where this comes from I am a bit wary
to use this. Who guarantees that the plan won't degrade?
Kind regards
robert