Which is better... IN() or EXISTS()

Mike Wazowski

unread,

Oct 7, 2008, 7:35:35 AM10/7/08

to

Hello

We're trying to resolve some performance issues and would be grateful for
your thoughts on the benefits of each of these methods to get the same
result.

Which is more performant?

SELECT Fields
FROM Table
WHERE FKeyField IN(
SELECT KeyField FROM OtherTable WHERE OtherValue = 'FilterValue'
)

or...

SELECT Fields
FROM Table
WHERE EXISTS(
SELECT Anything FROM OtherTable WHERE KeyField = Table.FKeyField AND
OtherValue = 'FilterValue'
)

Many thanks

Mike

Uri Dimant

unread,

Oct 7, 2008, 7:40:46 AM10/7/08

to

Mike
I prefer EXISTS/NOT EXISTS
For example if you use NOT IN and the column has a NULL value you are about
to get a wrong result

SELECT * FROM tbl WHERE col NOT IN (SELECT col FROM anothertbl ...)

"Mike Wazowski" <bookham_meas...@yahoo.com> wrote in message
news:O$6jTDHK...@TK2MSFTNGP04.phx.gbl...

Frank Uray

unread,

Oct 7, 2008, 7:54:01 AM10/7/08

to

Hi Mike

The meaning of IN/NOT IN is different to
EXISTS/NOT EXISTS ...

You will use IN when the SubQuery returns more than
one result. In fact, SQL Server is just joining the tables.

In you case you can also use "WHERE FKeyField = (SELECT ...)"

Regards
Frank

Dan Guzman

unread,

Oct 7, 2008, 8:04:05 AM10/7/08

to

> Which is more performant?

Note that SQL is a declarative rather than a procedural language. If the
queries are semantically the same, the SQL Server cost-based optimizer ought
to be able to come up with the same optimal execution plan in both cases.

Proper index use is the key to performance. I suggest you view the query
execution plan (in SQL Server Management Studio, select the query text and
press ctrl-L) to ensure that indexes are used as expected. I would expect a
composite one on KeyField and OtherValue would be optimal for the subquery
performance and perhaps one on KeyField too.

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"Mike Wazowski" <bookham_meas...@yahoo.com> wrote in message
news:O$6jTDHK...@TK2MSFTNGP04.phx.gbl...

Dan Guzman

unread,

Oct 7, 2008, 8:38:10 AM10/7/08

to

> The meaning of IN/NOT IN is different to
> EXISTS/NOT EXISTS ...

True but EXISTS and IN can often be expressed as a JOIN or correlated
subquery. I get identical execution plans with all of the methods below:

CREATE TABLE dbo.MyTable(
MyTableKey int NOT NULL
CONSTRAINT PK_MyTable PRIMARY KEY,
Fields int NOT NULL,
FKeyField int NOT NULL
);

CREATE TABLE dbo.OtherTable(
KeyField int NOT NULL
CONSTRAINT PK_OtherTable PRIMARY KEY,
OtherValue varchar(20) NOT NULL
);

CREATE INDEX index1 ON dbo.OtherTable(
OtherValue, KeyField);

ALTER TABLE dbo.MyTable
ADD CONSTRAINT FK_MyTable_OtherTable
FOREIGN KEY (FKeyField)
REFERENCES dbo.OtherTable(KeyField);

SELECT Fields
FROM dbo.MyTable
WHERE FKeyField IN(
SELECT OtherTable.KeyField
FROM dbo.OtherTable
WHERE OtherValue = 'FilterValue'
);

SELECT Fields
FROM dbo.MyTable
WHERE EXISTS(
SELECT *
FROM dbo.OtherTable
WHERE OtherTable.KeyField = MyTable.FKeyField AND
OtherTable.OtherValue = 'FilterValue'
);

SELECT Fields
FROM dbo.MyTable
JOIN dbo.OtherTable ON
OtherTable.KeyField = MyTable.FKeyField
AND OtherTable.OtherValue = 'FilterValue';

SELECT Fields
FROM dbo.MyTable
WHERE FKeyField =
(
SELECT KeyField
FROM dbo.OtherTable
WHERE
OtherTable.OtherValue = 'FilterValue'
AND OtherTable.KeyField = MyTable.FKeyField
);

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"Frank Uray" <Fran...@discussions.microsoft.com> wrote in message
news:0C72B4A6-20B3-4E6E...@microsoft.com...

Aaron Bertrand [SQL Server MVP]

unread,

Oct 7, 2008, 9:11:06 AM10/7/08

to

> You will use IN when the SubQuery returns more than
> one result.

Do you mean compared to EXISTS, or compared to = ? You can also use EXISTS
regardless of the number of results returned by the subquery.

For the OP, I prefer EXISTS for a couple of reasons. One is the reason that
Uri pointed out; if your subquery can return NULLs, then IN/NOT IN will not
necessarily produce correct results. The other is that EXISTS has the
possibility of performing better; it can stop "looking" once it has found
its first result, and you can also set it to return a constant (e.g. SELECT
1) instead of a column... This makes it easier to spot as a
non-data-returning element in your query.

A

Mike Wazowski

unread,

Oct 7, 2008, 9:14:12 AM10/7/08

to

Thank you all for your responses.

I have been advised that the first method, using IN(), results in one
subquery where the results are then used to filter the parent result set.
Allegedly using the second EXISTS method, a subquery is effectively
performed for each record in the parent result set, before any other filter
criteria is applied.

Just going to do some more research in Query Analyser. Thanks again.

Mike

"Mike Wazowski" <bookham_meas...@yahoo.com> wrote in message
news:O$6jTDHK...@TK2MSFTNGP04.phx.gbl...

Aaron Bertrand [SQL Server MVP]

unread,

Oct 7, 2008, 9:23:42 AM10/7/08

to

I'd love to see a case where IN performs better. It sounds to me like your
advisor is either guessing, or making facts up, or speaking to one very
specific case... If they are looking at estimated execution plan, for
example, remind them that they should be checking statistics i/o and actual
execution plan. There may be some cases where this is true but in general
you should always test and compare instead of building some imaginary
"always use x" rule. The only rule is, it always depends.

Mike Wazowski

unread,

Oct 7, 2008, 10:17:00 AM10/7/08

to

Thanks Aaron.

It was my thoughts too that EXISTS does not need to retrieve or return
actual data, so would perform better.

"Aaron Bertrand [SQL Server MVP]" <ten...@dnartreb.noraa> wrote in message
news:C510D82A.13E59%ten...@dnartreb.noraa...

Mike Wazowski

unread,

Oct 7, 2008, 10:18:53 AM10/7/08

to

Hi Guys

Thanks again for the comments of those following this thread.

I produced execution plans for both versions and both plans are the same! I
guess now I need to dig in to the statistics mentioned by Aaron to see if I
can get a definitive 'best method' for my DB.

Mike

"Mike Wazowski" <bookham_meas...@yahoo.com> wrote in message
news:O$6jTDHK...@TK2MSFTNGP04.phx.gbl...

Roy Harvey (SQL Server MVP)

unread,

Oct 7, 2008, 11:32:10 AM10/7/08

to

On Tue, 7 Oct 2008 15:18:53 +0100, "Mike Wazowski"
<bookham_meas...@yahoo.com> wrote:

>I produced execution plans for both versions and both plans are the same! I
>guess now I need to dig in to the statistics mentioned by Aaron to see if I
>can get a definitive 'best method' for my DB.

One more thought on the IN vs EXISTS question. IN only works with
single-column keys. EXISTS works with single or multiple column keys.
My tendency is to want to use a consistent style, and EXISTS lets me
use the same approach in both cases.

Roy Harvey
Beacon Falls, CT

Tibor Karaszi

unread,

Oct 7, 2008, 4:13:44 PM10/7/08

to

My guess is that the person who offered that explanation confused the logical query flow from the
physical execution plan. It might be true that *logically* a query is performed in this or that way,
but since we have optimizers, we, with a reasonable optimizer, tend to end up with identical
execution plan in many of the cases (some type of join)...

--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi

"Aaron Bertrand [SQL Server MVP]" <ten...@dnartreb.noraa> wrote in message

news:C510DB1E.13E63%ten...@dnartreb.noraa...