not in VS not exists

R.W. Fairbairn

unread,

Nov 14, 1999, 3:00:00 AM11/14/99

to

What is the difference between these two?

Thomas Kyte

unread,

Nov 14, 1999, 3:00:00 AM11/14/99

to

A copy of this was sent to "R.W. Fairbairn" <richard....@tesco.net>
(if that email address didn't require changing)

On Sun, 14 Nov 1999 10:27:00 -0000, you wrote:

>What is the difference between these two?
>

Logically, they operate differently. consider:

scott@8.0> create table t1 ( x int );
Table created.

scott@8.0> create table t2 ( y int );
Table created.

scott@8.0> insert into t1 values ( 1 );
scott@8.0> insert into t2 values ( 2 );
scott@8.0> insert into t2 values ( null );

scott@8.0> select * from t1 where x not in ( select y from t2 );
no rows selected

scott@8.0> select * from t1 where not exists ( select 1 from t2 where y=x );

X
----------
1

The first one returns zero rows always (due to the inclusion of a NULL in the
subquery). Tri-valued logic returns "unknown" when it asks "is x not in ( NULL
)". therefore, no rows are returned.

In the second query -- it is asking a much more specific question -- Return a
row such that there exists NO exact match in the other table.

So, logically they are very different. Physically they are different as well
(how they get processed). Typically a NOT IN is very expensive to process as
the subquery tends to get evaluated once per row from the outer query (with RBO
this is the case, with the CBO -- it might not be but usually is). The entire
set of values to 'not be in' is generated. You can see this with a query like:

tkyte@8.0> create table emp as select * from scott.emp;
Table created.

tkyte@8.0> create index emp_sal_idx on emp(sal);
Index created.

tkyte@8.0> set autotrace on

tkyte@8.0> select * from emp where sal not in ( select sal from emp );
no rows selected

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 TABLE ACCESS (FULL) OF 'EMP'

Statistics
----------------------------------------------------------
0 recursive calls
45 db block gets
15 consistent gets

tkyte@8.0> select * from emp where not exists ( select 1 from emp b where b.sal
= emp.sal );

no rows selected

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 INDEX (RANGE SCAN) OF 'EMP_SAL_IDX' (NON-UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
3 db block gets
3 consistent gets

The not exists -- using the index -- did less work here (3 db block gets vs 45).
On a large subquery, or a subquery that is expensive to compute, this can be a
very large performance impact.

If the logic permits it, i prefer to use NOT EXISTS when possible. Another
alternative is an outer join. Using t1 and t2 again from above with different
data we see:

tkyte@8.0> insert into t1 values (1);
tkyte@8.0> insert into t1 values (2);
tkyte@8.0> insert into t2 values (2);
tkyte@8.0> insert into t2 values (null);

tkyte@8.0> select t1.* from t1, t2 where t1.x = t2.y(+) and t2.y is null;

X
----------
1

tkyte@8.0> select * from t1 where x not in ( select y from t2 );

no rows selected

tkyte@8.0> select * from t1 where not exists ( select 1 from t2 where y=x );

X
----------
1

An outer join works much like a NOT EXISTS logically. The NOT EXISTS may be
more performant however since it stops evaluating the subquery as soon as it
finds a row that does match whereas the join will evaulate all rows and then
throw out matches.

--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'...
Current article is "Part I of V, Autonomous Transactions" updated June 21'st

Thomas Kyte tk...@us.oracle.com
Oracle Service Industries Reston, VA USA

Opinions are mine and do not necessarily reflect those of Oracle Corporation

R.W. Fairbairn

unread,

Nov 14, 1999, 3:00:00 AM11/14/99

to

Many thanks for this comprehensive response. It is very much appreciated.

Thomas Kyte wrote in message <6ccuOHl1o2azn3...@4ax.com>...

>A copy of this was sent to "R.W. Fairbairn" <richard....@tesco.net>
>(if that email address didn't require changing)
>On Sun, 14 Nov 1999 10:27:00 -0000, you wrote:
>
>>What is the difference between these two?
>>
>
>
>Logically, they operate differently. consider:
>
>scott@8.0> create table t1 ( x int );
>Table created.
>
>scott@8.0> create table t2 ( y int );
>Table created.
>
>scott@8.0> insert into t1 values ( 1 );
>scott@8.0> insert into t2 values ( 2 );
>scott@8.0> insert into t2 values ( null );
>
>scott@8.0> select * from t1 where x not in ( select y from t2 );
>no rows selected
>
>scott@8.0> select * from t1 where not exists ( select 1 from t2 where
y=x );
>
> X
>----------
> 1
>
>
>The first one returns zero rows always (due to the inclusion of a NULL in
the
>subquery). Tri-valued logic returns "unknown" when it asks "is x not in