question on clustered indexes in sql-server

Lennart Jonsson

Nov 29, 2011, 10:39:57 AM

What is the purpose of a clustered index in sql-server? (As you have
probably guessed, I have little to no experience with sql-server.)

The reason I ask is that I am looking at a database where more or less
all tables are designed like this:

create table T (
x int IDENTITY(1,1) NOT NULL,
[...]
CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ...

In db2 I would look at the range predicates and ORDER BY clauses in the
queries to determine which index should be clustered (in order to avoid
sorts).

What is the rationale for using a clustered index like the one above?

/Lennart

Bob Barrows

Nov 29, 2011, 12:03:24 PM

Kimberly Tripp sums up the debate (and there is, indeed, some controversy
about this) quite nicely here:
http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx

Make sure you follow the links near the beginning of the article.

Lennart Jonsson

Nov 29, 2011, 12:39:19 PM

On 2011-11-29 18:03, Bob Barrows wrote:
[...]

> Kimberly Tripp sums up the debate (and there is, indeed, some controversy
> about this) quite nicely here:
> http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx
>
> Make sure you follow the links near the beginning of the article.
>

Thanks Bob. I've skimmed through the article and the links, and I think I
got most of it. A clustering index is a different creature in sql-server
than in db2. The clustered indexes used here make more sense given this
information. Thanks.

/Lennart

Erland Sommarskog

Nov 29, 2011, 5:45:42 PM

Probably database design on autopilot.

I don't know about DB2, but I know that in Oracle, heaps are the norm,
and index-organised tables are something you rarely use. In SQL Server,
it is the other way around. The clustered index is the normal thing,
and heaps are something you use only sometimes. Except in SQL Azure,
where heaps are not even supported. The whole mindset in SQL Server is
geared towards clustered indexes, and you had better know what you are
doing if you use a heap.

As a consequence of this, by default when you define a primary key,
it will be clustered, unless there already is a clustered index. Which
there rarely is, since the PK is typically the first index.
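
A minimal T-SQL sketch of that default and of the alternative. The table,
column, and constraint names (T1, T2, PK_T1, PK_T2, CIX_T2_d) are made up
for illustration and are not from the database discussed in this thread.

```sql
-- Hypothetical tables, for illustration only.

-- No CLUSTERED/NONCLUSTERED keyword: the primary key becomes the
-- clustered index, because the table has no clustered index yet.
CREATE TABLE dbo.T1 (
    x int IDENTITY(1,1) NOT NULL,
    d datetime NOT NULL,
    CONSTRAINT PK_T1 PRIMARY KEY (x)
);

-- Alternative: keep the primary key nonclustered and cluster on a
-- column that range queries filter and sort on.
CREATE TABLE dbo.T2 (
    x int IDENTITY(1,1) NOT NULL,
    d datetime NOT NULL,
    CONSTRAINT PK_T2 PRIMARY KEY NONCLUSTERED (x)
);
CREATE CLUSTERED INDEX CIX_T2_d ON dbo.T2 (d);

-- Check which index ended up clustered on each table.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS index_name,
       type_desc
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.T1'), OBJECT_ID('dbo.T2'));
```

The last query should report PK_T1 as CLUSTERED, and for T2 the primary
key as NONCLUSTERED with CIX_T2_d as the clustered index.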

Furthermore, many inexperienced developers slap an IDENTITY column on
each table (often with all other columns nullable), so that is why
you get it.

That said, there are also sound reasons to have a clustered index
on an IDENTITY column to avoid fragmentation. But that does not
apply to all the tables you see. Nor, incidentally, does it apply if you
want really good INSERT performance, since you get a single hot spot.
Thomas Kejser had a good presentation on this at SQL Rally, where he
talked about getting really good INSERT performance on flash drives.
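
One way to see how fragmented a table's indexes actually are is the
sys.dm_db_index_physical_stats function. This is only a sketch: dbo.T is
a placeholder table name, and what counts as "too fragmented" depends on
the workload.

```sql
-- dbo.T is a placeholder; run this in the database that owns the table.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),              -- current database
         OBJECT_ID('dbo.T'),   -- the table to inspect
         NULL,                 -- all indexes
         NULL,                 -- all partitions
         'LIMITED') AS ps      -- cheapest scan mode
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```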

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Lennart Jonsson

Nov 30, 2011, 9:30:05 AM

On 2011-11-29 23:45, Erland Sommarskog wrote:
[...]

>
> I don't know about DB2, but I know that in Oracle, heaps are the norm,
> and index-organised tables are something you rarely use. In SQL Server,
> it is the other way around. The clustered index is the normal thing,
> and heaps are something you use only sometimes. Except in SQL Azure,
> where heaps are not even supported. The whole mindset in SQL Server is
> geared towards clustered indexes, and you had better know what you are
> doing if you use a heap.
>

Hi Erland, reading the links provided by Bob made me realize that there
are some essential differences between db2 and sql-server (when it comes
to clustering indexes, anyhow). If I got it right, the leaf pages of a
clustered index in sql-server are the data pages. In db2 the leaf pages
contain pointers to the data pages, just like in any other index.

In db2 the main focus is on how queries may benefit from the clustering
index (reduced I/O and sorting), and a clustering index is therefore not
added until one knows what the typical queries are. This may of
course be known at design time, but is often not discovered until later.
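
As a sketch of what that looks like in db2 syntax (the table, column, and
index names are made up for illustration), the clustering index is
declared with the CLUSTER keyword and chosen to match a typical range
query:

```sql
-- DB2 syntax; orders, order_id, order_date and ix_orders_date are
-- hypothetical names.
CREATE INDEX ix_orders_date ON orders (order_date) CLUSTER;

-- The kind of query the clustering index is chosen for: a range
-- predicate and an ORDER BY on the same column.
SELECT order_id, order_date
FROM orders
WHERE order_date BETWEEN '2011-11-01' AND '2011-11-30'
ORDER BY order_date;
```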

Where the clustering strategy used in my example would have been
absolutely braindead in db2, it makes a whole lot more sense in
sql-server (even if there seems to be some controversy about which
strategy to choose).

Looking at other constructions in the data model, I'll bet a dollar or
two on your autopilot hypothesis.

Cheers
/Lennart

[...]

Erland Sommarskog

Nov 30, 2011, 5:33:43 PM

Lennart Jonsson (erik.lenna...@gmail.com) writes:
> If I got it right, the leaf pages of a clustered index in sql-server are
> the data pages.

Yes, that is exactly the essence.

> In db2 the leaf pages contain pointers to the data pages, just like in
> any other index.

So then in DB2, what is the difference between a clustered index and a
non-clustered index?

Lennart Jonsson

Dec 1, 2011, 4:25:08 AM

On 2011-11-30 23:33, Erland Sommarskog wrote:
[...]

>> In db2 the leaf pages contain pointers to the data pages, just like in
>> any other index.
>
> So then in DB2, what is the difference between a clustered index and a
> non-clustered index?
>

If the index is clustered, the data pages are ordered according to the
clustering index. When a row is inserted, db2 finds the page where the
row should reside. If the row doesn't fit there, db2 looks in the
neighbourhood of that page. If that does not succeed either, the row is
put at the end. In these cases a pointer to the chosen page is stored in
the page where the row should have been. If there are many "overflow"
pages in a table, additional I/O is required when reading pages, and a
reorg of the table should be performed.

For a table without a clustering index, db2 stores the rows in no
particular order.

/Lennart

Erland Sommarskog

Dec 1, 2011, 5:55:43 PM

Lennart Jonsson (erik.lenna...@gmail.com) writes:
> If the index is clustered, the data pages are ordered according to the
> clustering index. When a row is inserted, db2 finds the page where the
> row should reside. If the row doesn't fit there, db2 looks in the
> neighbourhood of that page. If that does not succeed either, the row is
> put at the end. In these cases a pointer to the chosen page is stored in
> the page where the row should have been. If there are many "overflow"
> pages in a table, additional I/O is required when reading pages, and a
> reorg of the table should be performed.

So the good news is that there are no page splits.

The bad news is that scans along the clustered index can end up jumping
back and forth.

Am I right?

Lennart Jonsson

Dec 2, 2011, 2:32:30 AM

On 2011-12-01 23:55, Erland Sommarskog wrote:
> Lennart Jonsson (erik.lenna...@gmail.com) writes:
>> If the index is clustered, the data pages are ordered according to the
>> clustering index. When a row is inserted, db2 finds the page where the
>> row should reside. If the row doesn't fit there, db2 looks in the
>> neighbourhood of that page. If that does not succeed either, the row is
>> put at the end. In these cases a pointer to the chosen page is stored in
>> the page where the row should have been. If there are many "overflow"
>> pages in a table, additional I/O is required when reading pages, and a
>> reorg of the table should be performed.
>
> So the good news is that there are no page splits.
>
> The bad news is that scans along the clustered index can end up jumping
> back and forth.
>
> Am I right?
>

Indeed. A typical rule of thumb is that when overflows (jumps) exceed 3%
of the rows read, it's time to reorganize the table (and possibly the
indexes on the table as well). There are other indicators of when a reorg
is needed, but this is the one I use most.
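
A rough sketch of how that ratio might be checked, assuming the
MON_GET_TABLE monitoring table function and its ROWS_READ and
OVERFLOW_ACCESSES columns are available (db2 9.7 or later). The 3%
threshold is just the rule of thumb above, and the counters only cover
activity since the database was activated.

```sql
-- DB2 syntax. Empty strings mean all schemas and all tables;
-- -2 means all database members.
SELECT tabschema,
       tabname,
       rows_read,
       overflow_accesses,
       DECIMAL(100.0 * overflow_accesses / rows_read, 9, 2) AS overflow_pct
FROM TABLE(MON_GET_TABLE('', '', -2)) AS t
WHERE rows_read > 0                                  -- avoid division by zero
  AND 100.0 * overflow_accesses / rows_read > 3      -- rule-of-thumb threshold
ORDER BY overflow_pct DESC;
```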

/Lennart