Ok, I just reread this. Rick Sawtell, lance mortensen, and Joseph Jorden do not say the clustered index sorts the data physically. They just say it sorts the data. We have all admitted as much. The index sorts the data logically, so SQL Server can follow pointers from one page to the next, but if you look at the output from DBCC IND, which shows all the pages that belong to a table, and which page is previous and which page is next, you will see the pages are frequently not in order.
> Very good. I thought I remembered a topic on this subject while studying > for my MCP in SS2K administration.
> IN the book SQL Server 2000 administration (sybex) by Rick Sawtell, lance > mortensen, and Joseph Jorden it states on page 493 under the topic "Proper > Indexing"
> "If you want sql server queries to run as fast as possible, you must > create proper indexes. But to choose an index you must know what type of > queries you will be executing. There are four primary types:" > ... > ... > "Searches that return sorted data. If you are constantly using the order > by clause of a select statement, then the column is a good candidate for a > clustered index because the clustered index will sort the data > automatically."
> In the microsoft press book "Microsoft SQL Server 2000 Database design and > implementation (exam 70-229) it states on page 385 when talking about when > to use a clustered index:
> "The following query types seperately or in combination benefit from > indexes." > ... > ... > "Range Queries. Queries that search for a sequence of values:
> Select contactname, customerid from customers where customerid between > 'b%' and 'c%'.
> Clustered indexes are an excellent choice for this type of query because > the index pages are *physically sorted in sequential* order. Therefore > once the first record is located, it is likely that the other records in > the range will be adjacent or at least nearby.
> so lance mortensen, Rick Sawtell, Joseph Jorden, and ms press all say > ***YOU ARE WRONG***
> You are a self professed Expert who's advantage over everyone else is > fragile at best.
> BTW I do have Kalen's book Inside SQL server 2000 which I used and > referred to alot while studying for my MCP.
> David Portas wrote: >> NumbLock wrote: >>> And what I am saying is that according to everything I have read, >>> physical order IS index order on a clustered index. Read my quote from >>> bol. It is clear that if you claim to be experts, a little old guy like >>> me won't shake your thinking. But if you READ the SQL BOL Excerpt, you >>> can arrive at no other conclusion. If you refuse to see the facts as >>> presented, then that is your own problem. I am saying that BOL is >>> saying that IF you use an order by clause on a clustered index column it >>> will not need to be sorted because it comes back sorted already.
>>> I;ve had enough of this. You *experts* can talk your terminology all >>> you want. I am *not* an expert and never claimed to be. But I can read >>> and I do believe what I read in BOL.
>> BOL is plain wrong about physical sorting. It ought to be corrected but >> you should also read better books.
>> Amazingly, I just got hold of the "TSQL Querying" volume of the new >> "Inside SQL Server 2005". Starting on page 124 it describes ordered and >> unordered scans of a clustered index, with a diagram included! You >> should like this book because Lubor Kollar is a contributor and he does >> work for MS and actually developed much of the engine functionality.
>> Whenever possible please post enough code to reproduce your problem. >> Including CREATE TABLE and INSERT statements usually helps. >> State what version of SQL Server you are using and specify the content >> of any error messages.
I believe that what you say about future behavior of a clustered indexed column can change. I agree with that although I do doubt it. My last post was just to illustrate as you said that this is obviously a subject that has not been fully addressed by Microsoft in their documentation. I am going on past experience when I say what I said. I am not 100 years old and don't know all but when I hear all of these supposed experts who purport to know so much chiding me on a fact that is published and believed by other supposed *experts* and the proofing staff at sybex and ms press to be correct, I have an obligation to explain my point. Have you considered that your book might be wrong? And a clustered index scan will produce ordered results based on the index?
A further quote from Sawtell's book: Clustered indexes are a lot like a dictionary. They change the physical structure of the data to match the constraints of the index. That means that if you index on last name, for example Adams would be placed physically in front of Barnes in the data file.
I don't see why they would change this behavior when physically ordering the data is the main purpose of the index. You might be right about the data getting out of order in some instances. But this goes against the facts I've stated. You've done the research.
But I leave it at that. I do respect your work. But I know that nobody is perfect, not me or you or ms press and if I am wrong, I am wrong.
Kalen Delaney wrote: >> so lance mortensen, Rick Sawtell, Joseph Jorden, and ms press all say >> ***YOU ARE WRONG***
> Like Microsoft itself, MS Press is not a person. There are a lot of people > writing for MS Press, myself included, and I say that they are wrong. Why > don't you quote from my book?
> (I keep telling myself this will be my last post in thread, because it > really doesn't matter to me if you go through the rest of your SQL life with > this misunderstanding, but somehow, I can't help myself. :-) )
> Is the issue about the usefulness of clustered indexes in many situations, > or is the issue about whether clustered index leaf level pages are stored > physically contiguous. > I will not argue the first point. They are useful in many situations. They > just don't always guarantee sorted results.
> The second issue is really easy to prove. Just look at my SQL Server > Magazine articles on fillfactor. I show you how you can insert into a full > page in a table with a clustered index, and then see what page numbers are > used for the table after the page has to split. The pages are not > contiguous.
> What if I can get Rick Sawtell you respond to this thread and say his > statement was not true? But why would you believe Rick's book and not mine? > I do a lot more work at the lower levels of storage than Rick does.
> Have you considered that your book might be wrong? And a clustered index > scan will produce ordered results based on the index?
As I have said, many times a clustered scan will produce ordered results based on the index. But it is not guaranteed. The optimizer can theoretically choose not to follow the pointers if you don't say ORDER BY.
The other issue is that the rows and pages are almost never physically ordered after any inserts or updates have occurred that have incurred splits. And you can prove it using the commands I have mentioned.
> A further quote from Sawtell's book: > Clustered indexes are a lot like a dictionary. They change the physical > structure of the data to match the constraints of the index. That means > that if you index on last name, for example Adams would be placed > physically in front of Barnes in the data file.
This is most likely true when you first create the index. But the physical order is not maintained. Yes, inserting into full pages is expensive, but not THAT expensive. This you can prove also. Find or create a table with 100s of pages. Create a clustered index with fillfactor = 100 so all your pages are full. Now insert into the table a few rows with values at the beginning of the clustered sequence while you are monitoring statistics (SET STATISTICS IO ON). You will see that the number of pages accessed is small, whereas if the pages had to be maintained in physically sorted sequence, you would see the number of page access would be just about equal to the number of pages in the table.
>I believe that what you say about future behavior of a clustered indexed >column can change. I agree with that although I do doubt it. My last post >was just to illustrate as you said that this is obviously a subject that >has not been fully addressed by Microsoft in their documentation. I am >going on past experience when I say what I said. I am not 100 years old >and don't know all but when I hear all of these supposed experts who >purport to know so much chiding me on a fact that is published and believed >by other supposed *experts* and the proofing staff at sybex and ms press to >be correct, I have an obligation to explain my point. Have you considered >that your book might be wrong? And a clustered index scan will produce >ordered results based on the index?
> A further quote from Sawtell's book: > Clustered indexes are a lot like a dictionary. They change the physical > structure of the data to match the constraints of the index. That means > that if you index on last name, for example Adams would be placed > physically in front of Barnes in the data file.
> I don't see why they would change this behavior when physically ordering > the data is the main purpose of the index. You might be right about the > data getting out of order in some instances. But this goes against the > facts I've stated. You've done the research.
> But I leave it at that. I do respect your work. But I know that nobody > is perfect, not me or you or ms press and if I am wrong, I am wrong.
> Kalen Delaney wrote: >>> so lance mortensen, Rick Sawtell, Joseph Jorden, and ms press all say >>> ***YOU ARE WRONG***
>> Like Microsoft itself, MS Press is not a person. There are a lot of >> people writing for MS Press, myself included, and I say that they are >> wrong. Why don't you quote from my book?
>> (I keep telling myself this will be my last post in thread, because it >> really doesn't matter to me if you go through the rest of your SQL life >> with this misunderstanding, but somehow, I can't help myself. :-) )
>> Is the issue about the usefulness of clustered indexes in many >> situations, or is the issue about whether clustered index leaf level >> pages are stored physically contiguous. >> I will not argue the first point. They are useful in many situations. >> They just don't always guarantee sorted results.
>> The second issue is really easy to prove. Just look at my SQL Server >> Magazine articles on fillfactor. I show you how you can insert into a >> full page in a table with a clustered index, and then see what page >> numbers are used for the table after the page has to split. The pages are >> not contiguous.
>> What if I can get Rick Sawtell you respond to this thread and say his >> statement was not true? But why would you believe Rick's book and not >> mine? I do a lot more work at the lower levels of storage than Rick does.
If indexes are properly tuned and the proper fill factor maintained, the physical reordering will be done at index creation. There will be adequate room to add new rows at the appropriate place. If you have database with a 100% fill factor and you insert a row at the top it will take a looooooooong time to insert. That I have seen. If it is not physically reordering the data, what is it doing? It does not really matter at this point. If it is not physically reordering the data, how do you explain the use of the word physically in reference to a clustered index? And why does rick say the following:
Clustered indexes are a lot like a dictionary. They change the physical structure of the data to match the constraints of the index. That means that if you index on last name, for example Adams would be placed physically in front of Barnes in the data file.
Hey if I am wrong, I sure have a good reason to be wrong when experts write books with such topics.
David Portas wrote: > NumbLock wrote: >> Very good. I thought I remembered a topic on this subject while >> studying for my MCP in SS2K administration.
>> IN the book SQL Server 2000 administration (sybex) by Rick Sawtell, >> lance mortensen, and Joseph Jorden it states on page 493 under the topic >> "Proper Indexing"
>> "If you want sql server queries to run as fast as possible, you must >> create proper indexes. But to choose an index you must know what type >> of queries you will be executing. There are four primary types:" >> ... >> ... >> "Searches that return sorted data. If you are constantly using the >> order by clause of a select statement, then the column is a good >> candidate for a clustered index because the clustered index will sort >> the data automatically."
>> In the microsoft press book "Microsoft SQL Server 2000 Database design >> and implementation (exam 70-229) it states on page 385 when talking >> about when to use a clustered index:
>> "The following query types seperately or in combination benefit from >> indexes." >> ... >> ... >> "Range Queries. Queries that search for a sequence of values:
>> Select contactname, customerid from customers where customerid between >> 'b%' and 'c%'.
>> Clustered indexes are an excellent choice for this type of query because >> the index pages are *physically sorted in sequential* order. Therefore >> once the first record is located, it is likely that the other records in >> the range will be adjacent or at least nearby.
>> so lance mortensen, Rick Sawtell, Joseph Jorden, and ms press all say >> ***YOU ARE WRONG***
>> You are a self professed Expert who's advantage over everyone else is >> fragile at best.
> Actually I would never claim to be an expert on engine-level details. I > know some stuff but I'm not nearly at the same level as Kalen and Itzik > so don't just take my word for it.
> The comments you quoted are wrong if you interpret them literally and > in isolation. However, "physical sort" might perhaps be used loosely > (and rather carelessly IMO) to refer to the ordering of pages in the > IAM. More reasonably, it might refer to the state of the clustered > index assuming the data file is contiguous immediately after the index > is created. Maybe this is clarified elsewhere in the books you > mentioned. There's always a risk that statements may mislead if taken > out of context.
> Common sense should tell you that a clustered index that was literally > ordered on disk wouldn't be feasible. If that were so then inserting a > row at the top of a 1 terabyte clustered index would require the entire > table to be rewritten! - and that demonstrably does not happen.
> -- > David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages.
FOr what it is worth, if you used the clustered index on the column which you are doing a range query on it will come back ordered. I don't care what your mileage says.
>>> Create a table with say a date column. Put a clustered index on that >>> column. Do a query on the column for a range of dates. IT will *ALWAYS* >>> come back ordered.
>> Not even with dates (YMMV).
>> CREATE TABLE T1 (x DATETIME PRIMARY KEY CLUSTERED, z CHAR(1) UNIQUE); >> INSERT INTO T1 VALUES ('20050101','C'); >> INSERT INTO T1 VALUES ('20050102','B'); >> INSERT INTO T1 VALUES ('20050103','A');
>> Whenever possible please post enough code to reproduce your problem. >> Including CREATE TABLE and INSERT statements usually helps. >> State what version of SQL Server you are using and specify the content >> of any error messages.
First of all, your query does not even run as written. And you totally missed the point of the discussion. Observe: (btw I shortened this to 1000 rows, but you get the point)
CREATE TABLE TestStuff ( -- this column is clustered and this is the column queried ID uniqueidentifier CONSTRAINT UNQ1 UNIQUE CLUSTERED, -- this column Data INT PRIMARY KEY NONCLUSTERED ) SET NOCOUNT ON
DECLARE @Ind INT SET @Ind = 1000 WHILE @Ind > 0 BEGIN SET @Ind = @Ind - 1 INSERT INTO TestStuff VALUES (newid(), @Ind) END
select id from teststuff WHERE id between '00000000-0000-0000-0000-000000000000' AND 'ffffffff-ffff-ffff-ffff-ffffffffffff'
On Wed, 24 May 2006 14:55:14 -0500, NumbLock wrote: >So the optimizer can read your mind? What it says is that the data is >already ordered if you use an order by. How much clearer can that be?
Hi Galen,
What this means is NOT that the data will also be returned sorted when not using an order by.
What this DOES mean is:
- IF you specify an ORDER BY, - AND the optimizer decides to use the clustered index, - THEN no additional sort step is needed, because traversing the clustered index pages in logical order (by their pointer chain) will result in the rows being read in the required order.
This might of course result in a faster execution.
However, the optimizer is still free to choose a different index (though it will have to add a sort step in that case).
Without the ORDER BY, the optimizer will be even more likely to choose a different index, since it can do so without the required extra sort step.
Of course, you are right not to believe what people say on face value. (Though you should IMO apply the same critical attitude to what you read in Books Online). The truth can only be found by testing. In other posts, you've been saying that if you query a column with a clustered index for a range, the result will always be ordered by that clustered index. Below is a script to prove that this is not true.
CREATE TABLE test (Cluster int NOT NULL, Other int NOT NULL, Filler char(2000) NOT NULL, PRIMARY KEY CLUSTERED (Cluster)); go CREATE INDEX ix_Other ON test(Other); go DECLARE @i int; SET @i = 1; WHILE @i < 10000 BEGIN INSERT INTO test (Cluster, Other, Filler) SELECT @i, CASE WHEN @i % 789 = 0 THEN @i % 987 ELSE 0 END, ''; SET @i = @i + 1; END; go PRINT 'Without order by:'; SELECT Cluster, Other FROM test WHERE Cluster BETWEEN 100 AND 4500 AND Other > 0; go PRINT 'With order by:'; SELECT Cluster, Other FROM test WHERE Cluster BETWEEN 100 AND 4500 AND Other > 0 ORDER BY Cluster; go DROP TABLE test; go
Results (on SQL Server 2005 and SQL Server 2000 SP4):
Without order by: Cluster Other ----------- ----------- 3156 195 2367 393 1578 591 789 789 3945 984
With order by: Cluster Other ----------- ----------- 789 789 1578 591 2367 393 3156 195 3945 984
Aw, Hugo, I thought you had him for a second, but unfotunately he has apparantly narrowed the scope of his original statement. In other words, according to him, a test will only be valid IF the following conditions are met:
1. There must be a clustered index 2. The WHERE clause must reference the columns in the index AND no other columns covered by an additional index. 3. The SELECT statement cannot contain aggragates.
Just pointing it out, because it took me several posts of his to ascertain those conditions.
It works fine on my system. It may be a configuration issue. I am running on a mulitprocessor system. Without a ORDER BY clause on the ID column the rows do not come out in order. 1000 rows isn't enough to see the effect, on my system if there are only 1000 rows the result always comes out in order on my system too.
> First of all, your query does not even run as written. And you > totally missed the point of the discussion. Observe: (btw I shortened > this to 1000 rows, but you get the point)
> CREATE TABLE TestStuff > ( > -- this column is clustered and this is the column queried > ID uniqueidentifier CONSTRAINT UNQ1 UNIQUE CLUSTERED, -- this column > Data INT PRIMARY KEY NONCLUSTERED > ) > SET NOCOUNT ON > DECLARE @Ind INT > SET @Ind = 1000 > WHILE @Ind > 0 > BEGIN > SET @Ind = @Ind - 1 > INSERT INTO TestStuff VALUES (newid(), @Ind) > END > select id from teststuff WHERE id > between > '00000000-0000-0000-0000-000000000000' > AND > 'ffffffff-ffff-ffff-ffff-ffffffffffff' > drop table teststuff
This is what I needed to see. I agree that you have proved my statement wrong. This is what I consider to be *expert*. Someone who can back up their claim. Why don't you write a book, Hugo?
Hugo Kornelis wrote: > On Wed, 24 May 2006 14:55:14 -0500, NumbLock wrote:
>> So the optimizer can read your mind? What it says is that the data is >> already ordered if you use an order by. How much clearer can that be?
> Hi Galen,
> What this means is NOT that the data will also be returned sorted when > not using an order by.
> What this DOES mean is:
> - IF you specify an ORDER BY, > - AND the optimizer decides to use the clustered index, > - THEN no additional sort step is needed, because traversing the > clustered index pages in logical order (by their pointer chain) will > result in the rows being read in the required order.
> This might of course result in a faster execution.
> However, the optimizer is still free to choose a different index (though > it will have to add a sort step in that case).
> Without the ORDER BY, the optimizer will be even more likely to choose a > different index, since it can do so without the required extra sort > step.
> Of course, you are right not to believe what people say on face value. > (Though you should IMO apply the same critical attitude to what you read > in Books Online). The truth can only be found by testing. > In other posts, you've been saying that if you query a column with a > clustered index for a range, the result will always be ordered by that > clustered index. Below is a script to prove that this is not true.
> CREATE TABLE test > (Cluster int NOT NULL, > Other int NOT NULL, > Filler char(2000) NOT NULL, > PRIMARY KEY CLUSTERED (Cluster)); > go > CREATE INDEX ix_Other ON test(Other); > go > DECLARE @i int; > SET @i = 1; > WHILE @i < 10000 > BEGIN > INSERT INTO test (Cluster, Other, Filler) > SELECT @i, CASE WHEN @i % 789 = 0 THEN @i % 987 ELSE 0 END, ''; > SET @i = @i + 1; > END; > go > PRINT 'Without order by:'; > SELECT Cluster, Other > FROM test > WHERE Cluster BETWEEN 100 AND 4500 > AND Other > 0; > go > PRINT 'With order by:'; > SELECT Cluster, Other > FROM test > WHERE Cluster BETWEEN 100 AND 4500 > AND Other > 0 > ORDER BY Cluster; > go > DROP TABLE test; > go
> Results (on SQL Server 2005 and SQL Server 2000 SP4):
> Without order by: > Cluster Other > ----------- ----------- > 3156 195 > 2367 393 > 1578 591 > 789 789 > 3945 984
> With order by: > Cluster Other > ----------- ----------- > 789 789 > 1578 591 > 2367 393 > 3156 195 > 3945 984
>Aw, Hugo, I thought you had him for a second, but unfotunately he has >apparantly narrowed the scope of his original statement. In other >words, according to him, a test will only be valid IF the following >conditions are met:
>1. There must be a clustered index >2. The WHERE clause must reference the columns in the index AND no >other columns covered by an additional index. >3. The SELECT statement cannot contain aggragates.
>Just pointing it out, because it took me several posts of his to >ascertain those conditions.
>Stu
Hi Stu,
My goal was not to "have" anyone - I believe we are all here to learn or to share knowledge - both in my case!
I may well have missed some points. I've just been reading the entire thread and I didn't check every detail.
If the requirements are as you state, it will be hard to post a repro. But I still feel uncomfortable leaving off the ORDER BY. There is at least one situation (unfortunately not something you can reproduce on a test system) that might mess up the order of results. I once caught a post by someone from the MS team who helped building the optimizer and query engine that mentioned a rare optimization technique. Suppose you submit a query that forces a clustered index scan. Just a split second later, I submit my query that has to scan the same index. If my query has no ORDER BY, the engine will let my query "look over your shoulder", picking up data from the pages read for your query. Once your query is done, my query will go back to the first page of the clustered index and start reading until it gets to the point where it started looking over your shoulder.
That would ruin the expected ordering. It would only happen on very busy systems. And only intermittently. Very nasty bug to track down!!
BTW, I also think that Galen aka NumbLock makes some very valid points. Way to many books just claim that a clustered index controls the physical ordering of the data, where in reality it stores data in a pointer chain that is logically ordered (if you follow the pointers), but can physically be scattered all around the DB. And way to many books state things that might mislead one into thinking that the results from a query that uses a clustered index will always be ordered. The point that Galen takes in this thread is incorrect - but it's very easy to see how he has come to his conclusions. The writers of BOL and the authors of several books are to blame. As are many frequent usenet posters (me probably included).
My original statement was and is that a if you omit the order by clause on a column which has a clustered index on it, it would always come back in index order. I said an index on another column would effect the outcome. But Hugo did not use any other columns in his where clause, he used only the between clause on the clustered column. Directly disproving what I thought to be true.
Stu wrote: > Aw, Hugo, I thought you had him for a second, but unfotunately he has > apparantly narrowed the scope of his original statement. In other > words, according to him, a test will only be valid IF the following > conditions are met:
> 1. There must be a clustered index > 2. The WHERE clause must reference the columns in the index AND no > other columns covered by an additional index. > 3. The SELECT statement cannot contain aggragates.
> Just pointing it out, because it took me several posts of his to > ascertain those conditions.
Further, even if you omit the AND other > 0 it still has the last couple of rows out of order. But if you do a select * instead of specifying a column list, it comes back in index order. Still, I have been proved wrong. Thanks for enlightening me.
Stu wrote: > Aw, Hugo, I thought you had him for a second, but unfotunately he has > apparantly narrowed the scope of his original statement. In other > words, according to him, a test will only be valid IF the following > conditions are met:
> 1. There must be a clustered index > 2. The WHERE clause must reference the columns in the index AND no > other columns covered by an additional index. > 3. The SELECT statement cannot contain aggragates.
> Just pointing it out, because it took me several posts of his to > ascertain those conditions.
On Thu, 25 May 2006 01:40:10 +0200, Hugo Kornelis wrote:
(snip)
> There is at >least one situation (unfortunately not something you can reproduce on a >test system) that might mess up the order of results.
Afterthought....
After reading Dan's latest post, I realize that I have missed a situation. One that would be easier to repro.
Use a multi-processor system with multiple drives. Set up a distributeed partitioned view (on SQL 2000 or SQL 2005) or a partitioned table (SQL 2005 only) that uses multiple drives. Execute a query with a range selection on the clustering/partitioning column, such that part of the data is on one drive and part is on the other drive. I expect SQL Server to come up with a plan that assigns two processors to each get matching rows from one of the drives, then merge the results. It would be unsorted.
Since it's almost 2AM in my time zone, I'll leave it to someone else to post the actual SQL code to verify this expectation.
On Wed, 24 May 2006 18:38:29 -0500, NumbLock wrote: >This is what I needed to see. I agree that you have proved my statement >wrong.
Hi Galen,
Thanks -- but according to Stu, I missed one of the requirements that was hidden somewhere in this monster thread.
> This is what I consider to be *expert*. Someone who can back up >their claim.
<blush>
Most of what I know about this comes from reading Kalen Delaney's books. The rest is partially from reading postings by many other people who have contributed to this thread and many who have not, and from experimenting.
> Why don't you write a book, Hugo?
My native language is Dutch. Almost all Dutch SQL professionals are able to read English quite easily. The market for Dutch SQL books is non-existing.
> On Wed, 24 May 2006 18:38:29 -0500, NumbLock wrote:
>>This is what I needed to see. I agree that you have proved my statement >>wrong.
> Hi Galen,
> Thanks -- but according to Stu, I missed one of the requirements that > was hidden somewhere in this monster thread.
>> This is what I consider to be *expert*. Someone who can back up >>their claim.
> <blush>
> Most of what I know about this comes from reading Kalen Delaney's books. > The rest is partially from reading postings by many other people who > have contributed to this thread and many who have not, and from > experimenting.
>> Why don't you write a book, Hugo?
> My native language is Dutch. Almost all Dutch SQL professionals are able > to read English quite easily. The market for Dutch SQL books is > non-existing.
> On Wed, 24 May 2006 14:55:14 -0500, NumbLock wrote:
>>So the optimizer can read your mind? What it says is that the data is >>already ordered if you use an order by. How much clearer can that be?
> Hi Galen,
> What this means is NOT that the data will also be returned sorted when > not using an order by.
> What this DOES mean is:
> - IF you specify an ORDER BY, > - AND the optimizer decides to use the clustered index, > - THEN no additional sort step is needed, because traversing the > clustered index pages in logical order (by their pointer chain) will > result in the rows being read in the required order.
> This might of course result in a faster execution.
> However, the optimizer is still free to choose a different index (though > it will have to add a sort step in that case).
> Without the ORDER BY, the optimizer will be even more likely to choose a > different index, since it can do so without the required extra sort > step.
> Of course, you are right not to believe what people say on face value. > (Though you should IMO apply the same critical attitude to what you read > in Books Online). The truth can only be found by testing. > In other posts, you've been saying that if you query a column with a > clustered index for a range, the result will always be ordered by that > clustered index. Below is a script to prove that this is not true.
> CREATE TABLE test > (Cluster int NOT NULL, > Other int NOT NULL, > Filler char(2000) NOT NULL, > PRIMARY KEY CLUSTERED (Cluster)); > go > CREATE INDEX ix_Other ON test(Other); > go > DECLARE @i int; > SET @i = 1; > WHILE @i < 10000 > BEGIN > INSERT INTO test (Cluster, Other, Filler) > SELECT @i, CASE WHEN @i % 789 = 0 THEN @i % 987 ELSE 0 END, ''; > SET @i = @i + 1; > END; > go > PRINT 'Without order by:'; > SELECT Cluster, Other > FROM test > WHERE Cluster BETWEEN 100 AND 4500 > AND Other > 0; > go > PRINT 'With order by:'; > SELECT Cluster, Other > FROM test > WHERE Cluster BETWEEN 100 AND 4500 > AND Other > 0 > ORDER BY Cluster; > go > DROP TABLE test; > go
> Results (on SQL Server 2005 and SQL Server 2000 SP4):
> Without order by: > Cluster Other > ----------- ----------- > 3156 195 > 2367 393 > 1578 591 > 789 789 > 3945 984
> With order by: > Cluster Other > ----------- ----------- > 789 789 > 1578 591 > 2367 393 > 3156 195 > 3945 984
NumbLock wrote: > My original statement was and is that a if you omit the order by clause > on a column which has a clustered index on it, it would always come back > in index order. I said an index on another column would effect the > outcome. But Hugo did not use any other columns in his where clause, he > used only the between clause on the clustered column. Directly > disproving what I thought to be true.
SELECT Cluster, Other FROM test WHERE Cluster BETWEEN 100 AND 4500 AND Other > 0; -- What do you call this, if not an additional column?
I'm sorry; I think you must be getting tired. I fail to see how Hugo's query is substantially different than the first query I posted (several hours ago).
I don't want to spoil the party, but I don't get it.
Hugo's query proves that the clustered index is not necessarily used if you have a WHERE clause that filters on the clustered index's column! Look at the query plan: no clustered index anywhere in sight.
And that is basic knowledge, since the optimizer simply makes a tradeoff between the cost of using a nonclustered index (and potential bookmark lookups) or following the clustered index.
In this case the tradeoff is extremely simple, since the nonclustered index is covering the query and thus will require no bookmark lookups. Sorry, but that is a no brainer.
Please note that the person you do recognize as being an export (Hugo) also described that merry-go-round scans (on Enterprise Edition, as of SQL Server 2000) could mess up the order of the clustered index scan, as well as his excellent idea that a partitioned table on multiple disks (as of SQL Server 2005) might do the same.
In the future I expect more innovations in the Storage Engine which will increase performance and exacerbate this behavior.
> This is what I needed to see. I agree that you have proved my statement > wrong. This is what I consider to be *expert*. Someone who can back up > their claim. Why don't you write a book, Hugo?
> Hugo Kornelis wrote: > > On Wed, 24 May 2006 14:55:14 -0500, NumbLock wrote:
> >> So the optimizer can read your mind? What it says is that the data is > >> already ordered if you use an order by. How much clearer can that be?
> > Hi Galen,
> > What this means is NOT that the data will also be returned sorted when > > not using an order by.
> > What this DOES mean is:
> > - IF you specify an ORDER BY, > > - AND the optimizer decides to use the clustered index, > > - THEN no additional sort step is needed, because traversing the > > clustered index pages in logical order (by their pointer chain) will > > result in the rows being read in the required order.
> > This might of course result in a faster execution.
> > However, the optimizer is still free to choose a different index (though > > it will have to add a sort step in that case).
> > Without the ORDER BY, the optimizer will be even more likely to choose a > > different index, since it can do so without the required extra sort > > step.
> > Of course, you are right not to believe what people say on face value. > > (Though you should IMO apply the same critical attitude to what you read > > in Books Online). The truth can only be found by testing. > > In other posts, you've been saying that if you query a column with a > > clustered index for a range, the result will always be ordered by that > > clustered index. Below is a script to prove that this is not true.
> > CREATE TABLE test > > (Cluster int NOT NULL, > > Other int NOT NULL, > > Filler char(2000) NOT NULL, > > PRIMARY KEY CLUSTERED (Cluster)); > > go > > CREATE INDEX ix_Other ON test(Other); > > go > > DECLARE @i int; > > SET @i = 1; > > WHILE @i < 10000 > > BEGIN > > INSERT INTO test (Cluster, Other, Filler) > > SELECT @i, CASE WHEN @i % 789 = 0 THEN @i % 987 ELSE 0 END, ''; > > SET @i = @i + 1; > > END; > > go > > PRINT 'Without order by:'; > > SELECT Cluster, Other > > FROM test > > WHERE Cluster BETWEEN 100 AND 4500 > > AND Other > 0; > > go > > PRINT 'With order by:'; > > SELECT Cluster, Other > > FROM test > > WHERE Cluster BETWEEN 100 AND 4500 > > AND Other > 0 > > ORDER BY Cluster; > > go > > DROP TABLE test; > > go
> > Results (on SQL Server 2005 and SQL Server 2000 SP4):
I neglected to mention here that if you leave off the AND other > 0 it still comes back out of order in the last couple of rows. I did a followup post to explain what I meant.
Do not worry. The coup against your self-professed expert status has been put down. You can have your little newsgroup back now.
Stu wrote: > NumbLock wrote: >> My original statement was and is that a if you omit the order by clause >> on a column which has a clustered index on it, it would always come back >> in index order. I said an index on another column would effect the >> outcome. But Hugo did not use any other columns in his where clause, he >> used only the between clause on the clustered column. Directly >> disproving what I thought to be true.
> SELECT Cluster, Other > FROM test > WHERE Cluster BETWEEN 100 AND 4500 > AND Other > 0; -- What do you call this, if not an additional > column?
> I'm sorry; I think you must be getting tired. I fail to see how Hugo's > query is substantially different than the first query I posted (several > hours ago).
Whatever. You're the one that claimed you liked a good argument; I just wanted to point out a fallacythatt developed over the course of your postings. You seem to be the only one whose feelings are hurt.
Kalen, I have agreed that Hugo has proved my theory incorrect. However, based on our argument about the physical ordering of data in a clustered index, msdn2 states the following at
"Clustered indexes are not a good choice for the following attributes:
* Columns that undergo frequent changes This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile."
This statement completely goes against what you and expert David Portas and others have said about physical ordering of the data in a clustered index.
I've seen that the results come back skewed in Hugo's example. Perhaps you should use some of your contacts at MS to resolve this ambiguity.
Yet another quote about the physical ordering of a clustered index: I understand the concept of a page split, but if a proper fill-factor is maintained there should be no page splits. Isn't that correct?
When you create a clustered index, the data in the table is stored in the data pages of the database according to the order of the values in the indexed columns. When new rows of data are inserted into the table or the values in the indexed columns are changed, Microsoft® SQL Server™ may have to reorganize the storage of the data in the table to make room for the new row and maintain the ordered storage of the data. This also applies to nonclustered indexes; when data is added or changed, SQL Server may have to reorganize the storage of the data in the nonclustered index pages. When adding a new row to a full index page, SQL Server will move approximately half the rows to a new page to make room for the new row, a reorganization known as a page split. This can impair performance and fragment the storage of the data in a table.
Yes stu... Both of my feelings are hurt. :) If you are gonna call yourself an expert, you will have to prove it in my book. And I do love a good argument. Fact is what I said about your query was true.
Stu wrote: > Whatever. You're the one that claimed you liked a good argument; I > just wanted to point out a fallacythatt developed over the course of > your postings. You seem to be the only one whose feelings are hurt.