Indexing Strategy: clustered vs nonclustered index

Jean-Sébastien Philippe

unread,

May 6, 2003, 7:41:09 AM5/6/03

to

Hi,

I am a bit confused about indexing strategies. I could find several
general guidelines - but I would be interested in your opinion on a
specific configuration:

CREATE TABLE table1
(
col1 char(16) NOT NULL,
col2 char(16) NOT NULL,
col3 varchar(8) NULL,
col4 datetime NOT NULL,
col5 datetime NOT NULL,
col6 varchar(10) NULL,
col7 varchar(50) NULL,
col8 float NULL
)

The only candidate primary key is col1.
All requests will be made against the primary key, and no other column
needs to be indexed.
The table is quite large (1,5 million rows).
It is mostly accessed by SELECT queries and sometimes (10% of the
time) by inserts (no contention problems), updates or deletes.
There are no range queries and no ORDER BY - most queries include a
WHERE col1 = 'abcdefghijklmnop' clause.

Should I build a clustered or nonclustered index on col1 - and does it
really make a difference ?
According to what I have read here and there:
- A nonclustered index is very appropriate because I'll be accessing a
single row on a highly selective column.
- But it is advisable to have one clustered index on each table (I
don't understand why), and I only need one index on this table. So be
it clustered.

In this second example, consider col2 is a foreign key that references
table2. Most queries on table1 include a join on table2.
And table2 is as large as table1.
I would build a nonclustered index on col1 and the clustered index on
col2. What do you think ?

Thanks for your help

Jean-Sébastien

Erland Sommarskog

unread,

May 6, 2003, 6:07:04 PM5/6/03

to

Jean-Sébastien Philippe (js.ph...@nomade.fr) writes:
> Should I build a clustered or nonclustered index on col1 - and does it
> really make a difference ?
> According to what I have read here and there:
> - A nonclustered index is very appropriate because I'll be accessing a
> single row on a highly selective column.
> - But it is advisable to have one clustered index on each table (I
> don't understand why), and I only need one index on this table. So be
> it clustered.

I would go for the latter in this case. Much the routine reason "it's
good to have a clustered index". There might also a slightly performance
gain for retrieval. About one page read less to access the data, as there
is no bookmark lookup.

> In this second example, consider col2 is a foreign key that references
> table2. Most queries on table1 include a join on table2.
> And table2 is as large as table1.
> I would build a nonclustered index on col1 and the clustered index on
> col2. What do you think ?

This question is not answerable, because on from which corner you
will come in. If your queries are:

SELECT *
FROM table1 a
JOIN table2 b ON a.col2 = b.col2
WHERE a.col1 = @value

I can't see but that the index on col2 is just dead bytes on the disk.
But for a query like:

SELECT *
FROM table2 b
WHERE EXISTS (SELECT * FROM table1 a WHERE a.col2 = b.col2)

A non-clustered index on table1.col2 would be good.

There are of course queries where a clustered index on table1.col2 is
the desirable as well.

--
Erland Sommarskog, SQL Server MVP, som...@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

BP Margolin

unread,

May 6, 2003, 6:47:27 PM5/6/03

to

Jean-Sébastien,

> There are no range queries and no ORDER BY - most queries include a
> WHERE col1 = 'abcdefghijklmnop' clause.

OK, **most** queries will include a test on the "col1" column, but how about the queries that don't include a test on the "col1" table. Are they important? Do you need good response time for those as well?

I can't argue with Erland's suggestion, given what you posted, but I just feel that it's not the complete picture. Erland is correct that you might save one I/O by creating a clustered index on the "col1" column. But since it is going to be the primary key, SQL Server is going to be able to access the specific row very quickly regardless if you have a clustered or non-clustered index on that column.

Unless you are expecting enormous volume, and every millisecond is critical, I'd recommend going with a non-clustered index on the "col1" column, and taking a moment to review your other queries, and see if creating a clustered index over one or more of the other columns is going to increase the performance of those queries.

Remember that your goal is to optimize the performance of your entire application, not just a subset of the queries.

Another thought: you seem to be addressing only the retrieval of data (SELECTs). Most applications also have to modify data (INSERTs, UPDATEs, DELETEs). Particularly with UPDATEs and DELETEs, one usually has criteria to determine which rows are affected by those commands, and indexes come into play there as well. Are you expecting to modify only single rows identified by the "col1" column, or might you be modifying blocks of data identified by something other than the "col1" column? If you are, perhaps there's an excellent candidate for the clustered index.

-------------------------------------------
BP Margolin
Please reply only to the newsgroups.
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
can be cut and pasted into Query Analyzer is appreciated.

"Jean-Sébastien Philippe" <js.ph...@nomade.fr> wrote in message news:88c17020.03050...@posting.google.com...

Erland Sommarskog

unread,

May 7, 2003, 6:20:12 PM5/7/03

to

BP Margolin (bpm...@attglobal.net) writes:
> Unless you are expecting enormous volume, and every millisecond is
> critical, I'd recommend going with a non-clustered index on the "col1"
> column, and taking a moment to review your other queries, and see if
> creating a clustered index over one or more of the other columns is
> going to increase the performance of those queries.

Completely agree. As soon as there is another query that could benefit
from the clustered index, then that's the way to go.

The only exception from that rule is if this query is a low-priority
query you run once a month during off-hours, and you have such a huge
insert/delete frequency during peak hours, that any extra index is
undesired. But this is a very odd scenario.