Go to the download section at www.actoug.org.au, make what you can of the
PowerPoint presentation, wait for the accompanying whitepaper/book, read it
carefully too and hopefully you'll get the picture.
Cheers
Richard
Trust me, that's the last thing you want.
Leave indexes alone. They cope quite nicely on their own.
Regards
HJR
You sound so much like an Oracle 7 DBA! First take a look at
www.jlcomp.demon.co.uk/faq/table_frag.html, then take a look at the
referenced Steve Adam's paper if you want more details. The only time
that you should rebuild an index is if you notice that a specific SQL
statement is significantly slowed down by index fragmentation. This
probably won't ever be an issue if you use locally-managed tablespaces
(preferably with "automatic" extent management, as opposed to uniform
extent management).
HTH
Daniel
And of course, if you don't believe these guys now, look at
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=182699.1
Here's a bunch of handy scripts, but be sure you understand why you
run any you happen to run, some are elderly and have lost correctness:
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=131704.1
Don't forget to post version info, it can make a difference.
http://members.cox.net/oracleunix/readme-cdos.htm
jg
--
@home.com
http://www.myrtlebeachonline.com/mld/myrtlebeachonline/news/politics/8369624.htm
Is there an emoticon for a big, BIG, BIIIIIIGGGG beaming smiley face.
After having read the first slides containing the delicious quotes from
Herr Burleson und seiner freunde, I had to go and bake a doughnut.
After that I had indigestion with your 50-50 leaf block claim. Jonathan
claims it's 50-50 *on average*, but not by volume, I think. Interesting.
However: I just wanna know why that Sharman guy gets all the invites and
I don't :-((
Regards
HJR
Hi Howard,
I'm not sure. What about :) x 100 !!
>
> After having read the first slides containing the delicious quotes from
> Herr Burleson und seiner freunde, I had to go and bake a doughnut.
The difficult part was selecting which quotes to use, there are so many. I
didn't want to single out any one person and I wanted to highlight how so
many have got it so wrong. So I focused on those "experts" that seem to
publish a lot of "stuff" ...
>
> After that I had indigestion with your 50-50 leaf block claim. Jonathan
> claims it's 50-50 *on average*, but not by volume, I think. Interesting.
No, it's definitely by volume. I showed this in a thread here a while back
where after a leaf block split, the number of index entries in each block
varied but the amount of used space sat at the 50% mark in each block.
Look here:
http://groups.google.com/groups?q=g:thl1973251333d&dq=&hl=en&lr=&ie=UTF-8&oe
=UTF-8&selm=ant%259.39604%24jM5.100537%40newsfeeds.bigpond.com. where I
demonstrate my "volume" claim The white paper also has an example.
It's the same thread whereby Don Burleson showed his total ignorance of how
indexes work and kinda opened my eyes that he wasn't quite what he claimed.
Interestingly, almost the entire history of Don Burleson posts have been
removed from the google archives, there's hardly a one left. As an example,
all the posts in this thread where he made a fool of himself are no longer
there, although thankfully for prosperity sake, his "words of wisdom" can
still be found in among the posts of others. It's all very interesting don't
you think, although it's probably a good move from Don's point of view to
reduce the embarrassment of some of the stuff his written.
>
> However: I just wanna know why that Sharman guy gets all the invites and
> I don't :-((
>
A lot of it has to do with the fact he's a local lad who lives in the
neighbouring suburb from me !! However, our next User Group event is on 23
June so if your interested in conducting a presentation in front of 60 odd
folks and would like a lovely free lunch to boot, please let me know and
I'll grab you a spot no worries at all !!
Cheers
Richard
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
June 2004 UK - Optimising Oracle Seminar
July 2004 USA West Coast, Optimising Oracle Seminar
August 2004 Charlotte NC, Optimising Oracle Seminar
September 2004 USA East Coast, Optimising Oracle Seminar
September2004 UK - Optimising Oracle Seminar
"Howard J. Rogers" <h...@dizwell.com> wrote in message
news:409334fa$0$25007$afc3...@news.optusnet.com.au...
>
> After that I had indigestion with your 50-50 leaf block claim. Jonathan
> claims it's 50-50 *on average*, but not by volume, I think. Interesting.
>
I'm not sure I've ever been precise in my wording about how the 50/50
works - by default the tests I've done to date have always had uniform
sized keys, so 50/50 by key count would tend to be 50/50 by volume,
so I've never gone beyond saying just "roughly 50/50".
This, of course, demonstrates, that for every test result and repeatable
test case, there is always an enhancement to the test and refinement of
understanding. (Richard's "50% by volume" is such an obvious detail
to check - after it's been pointed out !! - it's also an extremely practical
point to be aware of, some systems will have keys with lots of columns
with lots of nulls, leading to all sorts of odd variation in key length).
In fact, Steve Adams described a further detail to the split process when
he was doing the Miracle Masterclass in Denmark in 2003, the preliminary
target is a 'fair' split - which Richard now tells us is 50/50 by volume -
but
Oracle will then allow a little side-stepping on the split to mimimise the
size
of the necessary entries in the branch blocks.
> many have got it so wrong. So I focused on those "experts" that seem to
> publish a lot of "stuff" ...
you mean the quantity of books published is not in direct
proportion to expertise? Now now, next you're gonna tell us that
BCHR is not the most performance gauge...
> No, it's definitely by volume. I showed this in a thread here a while back
> where after a leaf block split, the number of index entries in each block
> varied but the amount of used space sat at the 50% mark in each block.
Great prezzie, Richard. Excellent info. I do particularly like the point
you raise about clustering the table data, not the index! I've been advising
people to load their data in physical sequence of range scan keys for years
and have had tremendous performance improvements from just physically
sorting data in tables where range scans are involved. Now I know precisely
the reason why.
After reading it, I've got a question for you if I may:
Let's assume a situation where we have a root block, three branch blocks
and a bucketload of leaf blocks, like in your slide #22.
Assume as well that this index was created on a sequence-generated
column (regular but not necessarily consecutive increase). As such in the
pictorial representation the index leaf blocks would be organised left to
right in increase value of the key.
Now let's look at the physical (in the disk) distribution of this thing.
Presumably the root block and the first branch and a few leaf blocks
would be on consecutive blocks on the disk partition - assuming a raw
disk, an "intelligent" file system might do other things...
Followed by another branch block and more leaf blocks,then another branch
block and more leaf blocks. Correct assumption?
If so, then let's assume that later interspersed insertions of new keys
cause the third block from the left to split. Like you say in your slides,
no problemo: the new block is allocated from index freelist, the initial
block gets "emptied" 50% into the new one and we do not get an increase in
index level. Fine and dandy.
But now consider the physical distribution. Isn't that new block coming
from the freelist? Which might be pointing to a first free block on "the
other side of the disk", so to speak. This block will now LOGICALLY be
between the original third and fourth leaf blocks, but physically very
remote from these two original blocks.
You know where I'm getting at now, don't you? If someone is now doing an
index range scan - which for the sake of example will involve the third,
new fourth and old fourth leaf blocks using the block-to-block link
pointers, wouldn't it involve a heap of I/O wait while the arm got
repositioned to get the new fourth block from "the other side of the disk"?
Wouldn't that be a case for an index rebuild, based purely on physical
distribution rather than hazy "broken b-tree" concepts? What would be
a way of finding out if this was indeed the case, apart from the obvious
system I/O wait queue increasing in this device?
Thanks in advance for any feedback.
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
Your description is correct - if you do a treedump
(and have a version that doesn't dump entire leaf blocks)
you can see the effect very clearly.
Taking you example and pushing it, you might see
that in a clean growth of an index in an initially clean
tablespace, with a monotonic key, that the index
blocks when read in "physical" order go:
root,
100 x leaf blocks
2 x branch
100 x leaf blocks
branch
100 x leaf blocks
branch
100 x leaf blocks
branch
100 x leaf blocks
branch
100 x leaf blocks
branch
100 x leaf blocks
branch
A few more leaf blocks
If you now insert a row that has to go into
leaf block 50, then that the entries from
that block are shared between leaf block
50 and a new block stuck up at the HWM,
which might be block 750 in my sketch above.
So a range scan might now go:
block 49, 50, 750, 51, 52
BUT - having worked out that a particular
phenomenon can occur, you then have to
ask about the circumstances, and frequency,
before you can comment on the impact on
performance.
Because of that one unlucky split, I have to
do one extra physical I/O if I scan across
two or three index leaf blocks - but if I'm
doing that, I'm probably collecting lots of
table rows anyway and if my table rows
are allowed to arrive out of sequence order,
how many different table blocks will I have
to visit to use all the rowids I get from the
index ? Even if the table is sorted perfectly
in order for the index, table row lengths tend
to be much larger than index row length -
so I could still have to read many more table
blocks than index blocks. Will the one extra
read matter enough to make it worth rebuilding
the index.
If you are suspicious of a particular index, and
want to examine the details, you really need to
investigate the tree dump, or even the full block
dump. There's a site I've just found with an
interesting couple of articles about internal structures
that also has a product that uses an external procedure
to turn the internal structure of Oracle blocks into
a view - the articles are at
www.tlingua.com
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
June 2004 UK - Optimising Oracle Seminar
July 2004 USA West Coast, Optimising Oracle Seminar
August 2004 Charlotte NC, Optimising Oracle Seminar
September 2004 USA East Coast, Optimising Oracle Seminar
September2004 UK - Optimising Oracle Seminar
"Noons" <wizo...@yahoo.com.au> wrote in message
news:40936644$0$12740$afc3...@news.optusnet.com.au...
> So a range scan might now go:
> block 49, 50, 750, 51, 52
I was trying to figure out if this could occur.
It appears to me that assuming:
1- a very volatile table where its index also gets "pushed around",
2- a separate partition for the index,
there might be a case for needing to re-build an index.
Nothing to do with a problem with the way Oracle stores the
indexes or the nature of their implementation of B-trees,
but just on pure physical optimisation grounds.
Obviously only to be done if needed. The problem of course
would be to prove the need. I'm just wondering if there is
any way it could be done by querying meta-data on the database
rather than rely on potentially ad-hoc deltas of system I/O
wait counters, always notoriously unreliable.
I suppose one could build a query to analyse distribution of
blocks by rowid on the sequence of the range scans, but I'm having
difficulty figuring out how to produce the results in a way that
would be helpful or even make sense.
> If you are suspicious of a particular index, and
> want to examine the details, you really need to
> investigate the tree dump, or even the full block
> dump. There's a site I've just found with an
> interesting couple of articles about internal structures
> that also has a product that uses an external procedure
> to turn the internal structure of Oracle blocks into
> a view - the articles are at
> www.tlingua.com
Thanks. INTERESTING stuff, that BCB.SQL script! It might solve a
problem I had before, must spend some time digesting it.
Hi Nuno
Just to add to Jonathan's comments. I think the issue you raise is not
entirely dissimilar to the question of whether it's beneficial to have
fewer, larger extents in a segment so that physically it might prove more
efficient to read consecutive data blocks. The answer is almost certainly no
for similar reasons.
Firstly, having consecutive blocks in an extent doesn't necessarily mean
that they have to be consecutive on disk. There may be many reasons why this
may not be the case.
However, assuming a raw device as you suggest, we next have the issue of
what happens when Oracle performs an index range scan. When Oracle reads the
leaf block, it then needs to read the corresponding data blocks in the
corresponding table via the obtained rowids. Now there could very well be,
and generally are, hundreds of index entries per leaf block, meaning we have
to wait the relatively very very long time for all these LIOs to be
completed before we're interested in moving onto the next leaf block.
Remember, we read one leaf block, then read hundreds of table blocks, read
the next leaf block, read hundreds of tables blocks, etc.
Now if the leaf blocks are likely to be physical I/Os (hence your concern),
then there is a very very good chance that the corresponding reads to the
table are also likely to be physical I/Os as indexes by nature of their size
generally have a greater likelihood of having cached blocks than their
generally larger tables. So it's likely then that we have to wait for a good
proportion of hundreds of PIOs before we concern ourselves with the next
leaf block. As I mention in my presentation, the clustering factor of the
table has a big influence in how many PIOs might eventuate.
And this then brings us round to the third issue. Most databases are multi
user environments and while we wait for the current index leaf block to be
processed and for all the corresponding table blocks to be accessed, there
is a very probable likelihood of other users or processes needing to perform
some other PIO on the device. So while we wait for all this, some other
bugger has requested read (or write) access somewhere else on the device,
resulting in that good old, slow, mechanical arm being moved anyways from
the ideal location we might like. Note this other bugger could very well be
ourselves if we store index, or table, or rollback, etc. data on the same
device.
This is "contention" is highly probable while processing one leaf block and
waiting to move onto the next. The chances that an entire index range scan
can be performed uninterrupted from contention on the device by any other
processes is almost certain zero on any remotely multi-user environment.
This is the key point. We are not the only user process on the database, nor
the only process wanting to use that same physical device.
So although yes, index splits can indeed result in the physical result you
describe, the chances that it will actually impede and adversely affect the
performance of index range scans in any measurable manner in multi-user
database environments are practically zero. Having all your index leaf
blocks consecutively on disk will likely take exactly the same time to read
compared with having an index with leaf blocks that are not so consecutive
for this very reason.
Again, it would be an easyish thing to test. Rebuild such an index and see
what effect it has on response times. My predication is that within the
boundaries of index rebuilds as I describe in the white paper, the effect
would be negligible.
Hope it makes sense.
Cheers
Richard
> Firstly, having consecutive blocks in an extent doesn't necessarily mean
> that they have to be consecutive on disk. There may be many reasons why this
> may not be the case.
Of course. Not the least of which might be the file system in use in that
same disk. Always had a chuckle when I saw recommendatins to "defragment"
tables in NT servers when NTFS by default will ensure allocation segments
are NOT contiguous!
> Remember, we read one leaf block, then read hundreds of table blocks, read
> the next leaf block, read hundreds of tables blocks, etc.
It all seems to point to the table's "clustering factor" being much, much
more relevant, isn't it? BTW, the reason I'm using ""s is that I don't
see it really as a clustering of the table but more a physical serialization
of rows with same and/or consecutive values on the relevant columns.
Don't know if there is a better term to describe this. Clustering to me
implies the keeping together of rows with same value of keys, which is
only partially the need here. I also want the next value of the keys to
be physically close to the previous value. All this to make sure that
I minimise the I/O on a range scan on those same values.
Am I explaining myself well?
> This is "contention" is highly probable while processing one leaf block and
> waiting to move onto the next. The chances that an entire index range scan
> can be performed uninterrupted from contention on the device by any other
> processes is almost certain zero on any remotely multi-user environment.
> This is the key point. We are not the only user process on the database, nor
> the only process wanting to use that same physical device.
Good point. I'm reminded of the arguments for/against the SAME technique
a coupla years ago or so. In my mind, what I want is an averaged disk I/O
wait time, without hot spots. Which multi-user gives me a lot of and
SAME helps with even more.
Given a typical multi-user, multi-application instance nowadays the arguments
for physical placement stop being "place this here and there" to become more
of an "average the I/O needed to get this". IOW, what SAME is supposed to
give us in the first place.
> Again, it would be an easyish thing to test. Rebuild such an index and see
> what effect it has on response times. My predication is that within the
Oh as far as response times go, there is nothing that will improve them
more dramatically than more efficient SQL and db design, faster comms and
faster CPU, in that order. I/O on average is not a major factor at all.
Except of course in the odd extreme or boundary situations.
> Hope it makes sense.
Sure does. Thanks a lot.
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
June 2004 UK - Optimising Oracle Seminar
July 2004 USA West Coast, Optimising Oracle Seminar
August 2004 Charlotte NC, Optimising Oracle Seminar
September 2004 USA East Coast, Optimising Oracle Seminar
September2004 UK - Optimising Oracle Seminar
"Noons" <wizo...@yahoo.com.au> wrote in message
news:4093a2c2$0$20085$afc3...@news.optusnet.com.au...
> "Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in message
> > So a range scan might now go:
> > block 49, 50, 750, 51, 52
>
> I was trying to figure out if this could occur.
> It appears to me that assuming:
> 1- a very volatile table where its index also gets "pushed around",
> 2- a separate partition for the index,
>
> there might be a case for needing to re-build an index.
> Nothing to do with a problem with the way Oracle stores the
> indexes or the nature of their implementation of B-trees,
> but just on pure physical optimisation grounds.
>
I think the "rules" are always changing in areas like this,
sometimes for reasons that we won't think of because
our habits keep us looking in the wrong direction.
Two (contradictory) thoughts:
1) Why would reading blocks 50, 51, 52
in serial fashion be any quicker than
reading blocks 50, 750, 51 ?
After all, between the first and second
reads, someone else will almost certainly
have put in a read request for a completely
different location on the disk anyway.
2) Oracle has an 'index prefetch' algorithm
that I believe is supposed to kick in for large
index range scans. I think I've seen it happen
a couple of times on an index FULL scan (NB
I don't mean 'fast full scan'). I think the code
works on the assumption that the leaf blocks
for a large range scan are likely to be adjacent,
so a read request is made for multiple blocks,
and irrelevant ones are discarded.
I think every case has to be considered on
its own merits - and we keep finding more pros
and cons that have to be weighed.
Of course, as the default action, I would still
assume that indexes don't need to be rebuilt
to schedule, but still be inclined to rebuild
partitions just before making them read-only.
> But now consider the physical distribution. Isn't that new block coming
> from the freelist? Which might be pointing to a first free block on "the
> other side of the disk", so to speak. This block will now LOGICALLY be
> between the original third and fourth leaf blocks, but physically very
> remote from these two original blocks.
>
> You know where I'm getting at now, don't you? If someone is now doing an
> index range scan - which for the sake of example will involve the third,
> new fourth and old fourth leaf blocks using the block-to-block link
> pointers, wouldn't it involve a heap of I/O wait while the arm got
> repositioned to get the new fourth block from "the other side of the
disk"?
>
> Wouldn't that be a case for an index rebuild, based purely on physical
> distribution rather than hazy "broken b-tree" concepts? What would be
> a way of finding out if this was indeed the case, apart from the obvious
> system I/O wait queue increasing in this device?
It seems to me that your scenario rather assumes that nothing else of
interest is happening on this device - as soon as you allow other segments
to be placed on this device, or other processes to be interested in this
device then any head movement during the scan will likely be lost amongst
head movement due to the multi-user nature of the system. You could also ask
questions about the efficiency of a query that is doing a range scan over 3
or more leaf blocks - though naturally you'd need to see the query.
Interestingly though the conventional wisdom among sql-server folks (and I
remain to be convinced of it but smart folks do argue it) is that one should
do exactly what you describe above as a routine maintenance procedure, in
fact on that system since most tables will in fact have a clustered index
upon them you will likely be reorganising tables, probably weekly if not
daily, exactly to avoid this 'fragmentation' type of issue. dbcc showcontig
is the command that is what you are looking for in that environment. I'm not
aware of an equivalent in the Oracle world.
Cheers
--
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in message
news:c70ngi$r20$1...@sparta.btinternet.com...
> It seems to me that your scenario rather assumes that nothing else of
> interest is happening on this device - as soon as you allow other segments
> to be placed on this device, or other processes to be interested in this
> device then any head movement during the scan will likely be lost amongst
> head movement due to the multi-user nature of the system. You could also ask
Well, pushing that concept to the extreme means also that we should never
bother with clustering anything because multi-user access will destroy any
benefits that may be gained. Which is probably why SAME came about?
Average the I/O wait across all tablespaces and forget about any space
allocation optimisation other than the basics of blocking and reducing
excessive recursive SQL? Maybe that is the way we should face this?
I'm all for simplification and if it means all I have to look at is
the system I/O distribution counters and balancing I/O across disk
subsystems, I love it! I'd rather control I/O distribution through
the OS itself alone rather than have to learn two or three layers of
optimization in order to tune I/O. Or alternatively, "trust it all to
ASM and think of England"? ;)
> Interestingly though the conventional wisdom among sql-server folks (and I
> remain to be convinced of it but smart folks do argue it) is that one should
> do exactly what you describe above as a routine maintenance procedure, in
> fact on that system since most tables will in fact have a clustered index
> upon them you will likely be reorganising tables, probably weekly if not
> daily, exactly to avoid this 'fragmentation' type of issue. dbcc showcontig
> is the command that is what you are looking for in that environment. I'm not
> aware of an equivalent in the Oracle world.
Well, their rows are logically clustered around a given value of a non-unique
key. This has been one of their "workhorses" for performance for well over
a decade, when it first appeared in Sybase. I'm not sure it means anything
in a NTFS environment where physical allocation is all over the place unless
one takes precautions? Or with any modern SAN?
Look in the 10053 traces for lines line:
Index prefetching is on for index XXX
especially on index full scan paths.
(just because it's on, by the way, doesn't
mean you'll see it happen in a 10046 wait
trace).
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
June 2004 UK - Optimising Oracle Seminar
July 2004 USA West Coast, Optimising Oracle Seminar
August 2004 Charlotte NC, Optimising Oracle Seminar
September 2004 USA East Coast, Optimising Oracle Seminar
September2004 UK - Optimising Oracle Seminar
"Noons" <wizo...@yahoo.com.au> wrote in message
news:4094e4c7$0$12033$afc3...@news.optusnet.com.au...
I think this extreme ignores the reality that so many "typical
business systems" will still have times of batch processing. Even
with multiple, er, streams, of batches, whatever scheduling algorithm
is used will hopefully allow some chunks of time for each.
>
> Average the I/O wait across all tablespaces and forget about any space
> allocation optimisation other than the basics of blocking and reducing
> excessive recursive SQL? Maybe that is the way we should face this?
The idea is to account for common possibilities. SAME may be ok for
high-transactional times, but I think the allocation optimisation this
thread is talking about may make some difference during "other" times.
>
> I'm all for simplification and if it means all I have to look at is
> the system I/O distribution counters and balancing I/O across disk
> subsystems, I love it! I'd rather control I/O distribution through
> the OS itself alone rather than have to learn two or three layers of
> optimization in order to tune I/O. Or alternatively, "trust it all to
> ASM and think of England"? ;)
I'd rather have the tools to optimizer multiple layers for different
possible configurations, _and have the tools work together_. This
"trust us, we're experts" stuff from Oracle and MS is condescending
crap inevitably leading to stuff like NT defragmentation and
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=BUG&p_id=3455402
(During large insert into a table, create unrelated RO TS; drop RO TS
cascade including contents and datafiles; SMON barfs on restart, need
to recreate db ;).
We've seen over and over again that increasing the number and depth of
abstraction layers with the hope that Moore's law will bail us out
leads to putting stuff in the wrong layers.
>
> > Interestingly though the conventional wisdom among sql-server folks (and I
> > remain to be convinced of it but smart folks do argue it) is that one should
> > do exactly what you describe above as a routine maintenance procedure, in
> > fact on that system since most tables will in fact have a clustered index
> > upon them you will likely be reorganising tables, probably weekly if not
> > daily, exactly to avoid this 'fragmentation' type of issue. dbcc showcontig
> > is the command that is what you are looking for in that environment. I'm not
> > aware of an equivalent in the Oracle world.
Somehow, I don't think it would be all that difficult to add an option
to "Tablespace Map" in OEM to color-code contiguity. But if everyone
thinks it would be useless...
>
> Well, their rows are logically clustered around a given value of a non-unique
> key. This has been one of their "workhorses" for performance for well over
> a decade, when it first appeared in Sybase. I'm not sure it means anything
> in a NTFS environment where physical allocation is all over the place unless
> one takes precautions? Or with any modern SAN?
I think predictive read-ahead could mean a lot in a modern SAN.
jg
--
@home.com is bogus.
http://www.thememoryhole.org/war/coffin_photos/dover/clarification.htm
> I think this extreme ignores the reality that so many "typical
> business systems" will still have times of batch processing. Even
> with multiple, er, streams, of batches, whatever scheduling algorithm
> is used will hopefully allow some chunks of time for each.
<brain-storming>
Yeah, but is that still a determining factor? I mean, maybe we should
ignore fine tuning of batch and go all out for an even I/O performance
across the board? Much simpler to keep going, no?
> The idea is to account for common possibilities. SAME may be ok for
> high-transactional times, but I think the allocation optimisation this
> thread is talking about may make some difference during "other" times.
Well, I think the thread is saying very clearly that index-level clustering
and other "defrag" techniques are not that important as soon as multi-user
is involved. Which is the vast majority of the time. Should we really bother
with the odd "other" conditions? Does it really gain us much anymore?
</brain-storming>
> I'd rather have the tools to optimizer multiple layers for different
> possible configurations, _and have the tools work together_. This
> "trust us, we're experts" stuff from Oracle and MS is condescending
> crap inevitably leading to stuff like NT defragmentation and
>
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=BUG&p_id=3455402
> We've seen over and over again that increasing the number and depth of
> abstraction layers with the hope that Moore's law will bail us out
> leads to putting stuff in the wrong layers.
Amen!
The problem is of course how to find a middle ground...
> > a decade, when it first appeared in Sybase. I'm not sure it means anything
> > in a NTFS environment where physical allocation is all over the place unless
> > one takes precautions? Or with any modern SAN?
>
> I think predictive read-ahead could mean a lot in a modern SAN.
Used to think so but not so sure nowadays. The SAN knows remarkably
little about the db structure and allocation strategy. With something
like ASM (or any SQL Server equivalent in future) it can all go horribly
wrong. If "intelligence" is to be added in terms of predictive behaviour,
I'd rather it was applied in one place only: either the SAN or the db.
Mixing two "intelligences" is a recipe for disaster IME.
Not sure yet what Oracle's recommendation on ASM is gonna be, but
I suspect they'll eventually ask anyone relying on it to turn off
all the "intelligent" bits everywhere else.
Hi Nuno,
Need to be a little careful here.
Any techniques that can group or cluster like data together so that we can
read this data by accessing fewer blocks is a good thing providing the
benefits we gain are not counter-balanced by the costs of maintaining such
structures.
So for example as I mention in my presentation, if by re-ordering the data
in a table to match the order of our most "important" index via business
critical index range scan operations, we can dramatically reduce the number
of different data blocks we need to visit, which could then noticeably
improves important business response times, then that's a good thing. Same
potentially goes for some clustered data structures. Like I said, it all
depends on the benefits vs. the maintenance costs.
Cheers
Richard
> > Well, I think the thread is saying very clearly that index-level
> clustering
>snippage
> Any techniques that can group or cluster like data together so that we can
> read this data by accessing fewer blocks is a good thing providing the
> benefits we gain are not counter-balanced by the costs of maintaining such
> structures.
>
> So for example as I mention in my presentation, if by re-ordering the data
> in a table to match the order of our most "important" index via business
> critical index range scan operations, we can dramatically reduce the number
> of different data blocks we need to visit, which could then noticeably
> improves important business response times, then that's a good thing. Same
> potentially goes for some clustered data structures. Like I said, it all
> depends on the benefits vs. the maintenance costs.
Like I said: index-level clustering.
Table-level clustering is a totally different kettle.
Well, since those conditions are often prerequisite to the ones where
the Big Bosses are staring at their lackeys and impatiently tapping
their fingers, it gains in the "business requirements" arena.
>
>
> > I'd rather have the tools to optimizer multiple layers for different
> > possible configurations, _and have the tools work together_. This
> > "trust us, we're experts" stuff from Oracle and MS is condescending
> > crap inevitably leading to stuff like NT defragmentation and
> >
> http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=BUG&p_id=3455402
> > We've seen over and over again that increasing the number and depth of
> > abstraction layers with the hope that Moore's law will bail us out
> > leads to putting stuff in the wrong layers.
>
>
> Amen!
> The problem is of course how to find a middle ground...
>
>
> > > a decade, when it first appeared in Sybase. I'm not sure it means anything
> > > in a NTFS environment where physical allocation is all over the place unless
> > > one takes precautions? Or with any modern SAN?
> >
> > I think predictive read-ahead could mean a lot in a modern SAN.
>
> Used to think so but not so sure nowadays. The SAN knows remarkably
> little about the db structure and allocation strategy. With something
> like ASM (or any SQL Server equivalent in future) it can all go horribly
> wrong. If "intelligence" is to be added in terms of predictive behaviour,
> I'd rather it was applied in one place only: either the SAN or the db.
> Mixing two "intelligences" is a recipe for disaster IME.
>
> Not sure yet what Oracle's recommendation on ASM is gonna be, but
> I suspect they'll eventually ask anyone relying on it to turn off
> all the "intelligent" bits everywhere else.
I'd hope they'd allow something like hints to keep the benefits of
everywhere else. Or even communication between everywhere else and
ASM.
jg
--
@home.com is bogus.
http://www.signonsandiego.com/uniontrib/20040504/news_1b4ipo.html
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
"Joel Garry" <joel-...@home.com> wrote in message
news:91884734.0405...@posting.google.com...
Richard, Roger,
My what a mutual appreciation society! Warms my heart to see you two
so cuddly. Richard, interesting presentation, of course as usual you
ignore all the caveats and expansions that go with the expert quotes,
rather reminds me of the fundamentalist bible thumpers we have in
Georgia who take quotes and twist them to support all sorts of
things...but I digress, I just wish you would give complete examples,
of course it is always easier to tear someone else down than come up
with original material. I try to never denigreate other folks in
public, I may show why general old saws are bad, but I don't take the
delite in castigating others they way you seem to!
For example, speaking for myself, when I tell people to rebuild
indexes I also say they should look at the type of index, how it is
used, if it is on concatenated columns and several other factors. I
also tell them to compare it to the number of dirty blocks in the
table and if it is several times that value the index may benefit from
rebuilding. I also tell them sometimes rebuilding will not improve the
clustering factor and then, they should consider re-ordering the table
data (hmmm...where have we heard that one from?)
I have also showed papers and proofs that clustering factor is
dramatically affected by column order (which should be intuitive) in a
concatenated index and that since it is a major factor in cost
calculations, reordering the columns in the indexes and the SQL it
supports, can make a less attractive index more palatible.
It is odd that in nearly every case where a rebuild has reduced
clustering factor, performance is also improved. From all of your
postings this should not be the case. It is also odd that when indexes
show up very broad, they also usually have large clustering factors
and, a high percentage of non-reused space. Yes, an index will reuse
space, but not very often in a high IUD environment and a rebuild or
coalesce will reduce these indexes in size, thus reducing clustering
factor and blocks used, making them once again more palatible for the
optimizer and give higher performance.
Other than violating relational tenents, reordering table data will
dramatically improve access times in many cases. Of course as the
table ages it must be rebuilt over and over again, along with the
index, to maintain this ordering. I hope those that follow this advice
have lots of maintenance window time! Most of my clients barely have
time to backup their terabyte size databases, let alone reorder all of
their major, multi-million row tables. I heard that first from Don
Burleson when I reviewed one of his first Oracle books. It was a carry
over from Mainframe tuning that is still valid. Nice of you to give
the same advice as Don!
Anyway, I wish if you would quote my suggestions, you would also quote
the qualifying material that goes with them, but that wouldn't be as
much fun now would it?
Mike
> Richard, Roger,
Er, that would be "Howard", not "Roger".
[snip]
>I may show why general old saws are bad, but I don't take the
> delite in castigating others they way you seem to!
>
> For example, speaking for myself, when I tell people to rebuild
> indexes
...you make your first mistake, since the advice should be 'not to
unless', not 'rebuild unless'.
[snip]
>I also say they should look at the type of index, how it is
> used, if it is on concatenated columns and several other factors. I
> also tell them to compare it to the number of dirty blocks in the
> table and if it is several times that value the index may benefit from
> rebuilding.
That sentence doesn't even make sense to me. "I tell them to compare it
[what "it" are we talking about? From the context, it sounds like it
should be "the number of leaf nodes of the index"] and compare it to the
number of dirty blocks in the table [if it's "in" the table, safely on
disk, it's not a dirty block is it? Since I know you know that, I have
to assume you mean 'blocks from the table currently in the buffer cache
that have been dirtied'] and if it is several times that value [again,
what value are we talking about? Index leaf node count?] the index may
benefit from rebuilding."
So if I've got this right, you're saying "if the number of table dirty
buffers at any one moment is several times larger than the number of
index nodes, a rebuild might be beneficial".
But that is clearly a ludicrous proposition, so that can't be what you
meant at all. I would like to know what exactly we're supposed to be
comparing to what.
>I also tell them sometimes rebuilding will not improve the
> clustering factor and then, they should consider re-ordering the table
> data (hmmm...where have we heard that one from?)
>
> I have also showed papers and proofs that clustering factor is
> dramatically affected by column order (which should be intuitive) in a
> concatenated index and that since it is a major factor in cost
> calculations, reordering the columns in the indexes and the SQL it
> supports, can make a less attractive index more palatible.
I would suggest that changing the column order in a concatenated index
is not so much an index rebuild as an index redesign. And of course
column order is important, and if you get it wrong, you should of course
change it. And I'm sure Richard wouldn't disagree with that either.
> Anyway, I wish if you would quote my suggestions, you would also quote
> the qualifying material that goes with them, but that wouldn't be as
> much fun now would it?
It would mean trying to squeeze one hell of a lot of material into a
presentation (and apparently using mind-reading techniques to work out
what some of it means in the first place)! If you don't like being
quoted, then don't say it in the first place. If it is so easy to quote
you "out of context", then it rather suggests you have been careless
with your choice of words. If your published works told the facts of the
matter, that an index rebuild hardly ever makes a difference, but that
there are always exceptions, then I doubt anyone would be quoting you
otherwise.
HJR
mike...@earthlink.net (Mike Ault) wrote in message news:<37fab3ab.04050...@posting.google.com>...
>
> Richard, Roger,
Mike,
In your typical fashion, you can't even get the names right ...
>
> My what a mutual appreciation society! Warms my heart to see you two
> so cuddly.
I know it's lovely. Still, we have a long way to go to match the
"closeness" between Don and yourself ...
>Richard, interesting presentation, of course as usual you
> ignore all the caveats and expansions that go with the expert quotes,
> rather reminds me of the fundamentalist bible thumpers we have in
> Georgia who take quotes and twist them to support all sorts of
> things...but I digress,
Not only do you digress , but you're of course utterly wrong (again).
I haven't twisted anything, each quote stands up in it's own right
(your's especially) and this attempt (again) to suggest otherwise and
to try and deflect the issues is both a little tiresome and immature.
> I just wish you would give complete examples,
> of course it is always easier to tear someone else down than come up
> with original material.
I haven't torn anyone down, Mike, you're getting emotional again. I've
simply highlighted how many of these silly Oracle myths are spread by
many of these so-called experts, by giving very clear and concise
examples. I purposely listed a range of quotes by a range of different
authors to highlight how common this myth spreading is. Each quote is
inaccurate no matter the context, as we'll look at later with your
specific example.
With respect to "original material", if you actually *read* the
presentation, you'll find a great deal of it is original and based on
subject matter that is not so well documented.
I try to never denigreate other folks in
> public, I may show why general old saws are bad, but I don't take the
> delite in castigating others they way you seem to!
Mike, you're getting emotional again. Highlighting common errors and
misunderstandings and *correcting" them is not castigating anyone.
It's a real shame that you find it so hard to understand that ...
Dealing with constructive criticisms is not one of your strengths is
it Mike.
>
> For example, speaking for myself, when I tell people to rebuild
> indexes I also say they should look at the type of index, how it is
> used, if it is on concatenated columns and several other factors. I
> also tell them to compare it to the number of dirty blocks in the
> table and if it is several times that value the index may benefit from
> rebuilding. I also tell them sometimes rebuilding will not improve the
> clustering factor and then, they should consider re-ordering the table
> data (hmmm...where have we heard that one from?)
Yes, let's speak about you for example. Your quote above about the
"number of dirty blocks in the table" is a clear example of you not
having a clue what you're talking about. Your other quote about
"sometimes rebuilding will not improve the CF" is yet an example.
Mike, please do me a favour and try out the following little test.
Pick any index you want, any, compute stats so they're 100% accurate
and note the CF in dba_indexes. Jot it down on a piece of paper.
Then rebuild this index with any pctfree you like, compact it as much
as you like. Then re-compute the stats and check out the new CF with
the previous value.
What do you see ?
The values are identical. While you scratch your head and manually
close your lower jaw, pick another index, and repeat the test.
Keep repeating the test until you finally get the point that by simply
rebuilding the index, you do not change the CF. Why is this ? Because
the order of the index entries is unchanged, the order of the table
remains unchanged, therefore if you only realized what the CF actually
represents, you would see that the CF can therefore not change as the
relative order of the *rows in the table* do not change with respect
to the index.
You've just confirmed in your post here that you clearly don't
understand this but let's now look at your quote in my presentation:
"Deleted space is not reclaimed automatically unless there is an exact
match key inserted. This leads to index broadening and increase in the
indexes clustering factor. You need to reorganize
to reclaim white space. Generally rebuild index when the clustering
factor exceeds eight times the number of dirty blocks in the base
table, when the levels exceed two or when there are
excessive brown nodes in the index."
Looking at the quote "Generally rebuild index when the clustering
factor exceeds eight times the number of dirty blocks in the base
table", we can now see that it is entirely wrong, no matter in what
*context* you want to look at it. Rebuilding an index based purely on
the value of the CF as you describe is rubbish, the CF remains
unchanged.
Now, let's look at the other parts of the quote:
"Deleted space is not reclaimed automatically unless there is an exact
match key inserted." is rubbish, again no matter the context. My
presentation clearly demonstrates why this is the case.
"This leads to index broadening and increase in the indexes clustering
factor" is rubbish, again no matter the context. Having deleted or
wasted space does not increase the CF. Inserting (or migrating) rows
so that the order becomes less aligned *in the table* in relation to
the ordering of the index is how the CF degrades.
"You need to reorganize to reclaim white space" is in most cases
wrong. My presentation describes those rare scenarios where excessive
white space may be an issue but the suggestion that you "need" to
reorganise to reclaim white space is wrong.
"Generally rebuild index when the clustering factor exceeds eight
times the number of dirty blocks in the base table, when the levels
exceed two or when there are excessive brown nodes in the index" is
wrong, wrong wrong. The CF bit we've discussed, totally wrong. The
exceeds 2 levels myth is covered in the presentation. Why you
recommend rebuilding *all* your large indexes again and again for no
benefit is beyond me. Simple fact. Large indexes require more levels
and if by rebuilding these larger indexes you don't reduce the level,
why bother. The excessive brown nodes is discussed in the presentation
where it shows how this space is reusable.
So really Mike, your quote, no matter the *context*, doesn't really
stand up at all well does it ...
>
> I have also showed papers and proofs that clustering factor is
> dramatically affected by column order (which should be intuitive) in a
> concatenated index and that since it is a major factor in cost
> calculations, reordering the columns in the indexes and the SQL it
> supports, can make a less attractive index more palatible.
Yes Mike but aren't you clouding the waters here. This is dropping and
re-creating a totally *different* index, which is hardly the same
thing as recommending the rebuilding of an existing index so that you
can end up with the same CF ...
>
> It is odd that in nearly every case where a rebuild has reduced
> clustering factor, performance is also improved.
Really odd I would say as rebuilding an index doesn't effect the CF !!
The old "performance has definitely improved although I can't really
tell why, or precisely by how much ..."
And you *still* promote this rubbish. How many books have you written
...
>From all of your
> postings this should not be the case.
If you *actually read* my presentation you'll see that an improved CF
most definitely can improve performance. Unfortunately, you have no
clue how to improve the CF ...
>It is also odd that when indexes
> show up very broad, they also usually have large clustering factors
> and, a high percentage of non-reused space.
Mike, suggesting there is a co-relation between the size of an index
and the *effective* CF is wrong wrong wrong. Yes, large tables have
larger "values" for the CF because they obviously have more rows.
However, large indexes can have a perfect CF (equal to the number of
blocks in the table) and small indexes can have an awful CF (equal to
the number of rows in the table). Mike, I have the very very strong
suspicion you have no idea what the CF represents and how it's the
order of the *table* in relation to the index that governs the CF,
pure and simple.
My suspicion is right isn't it ?
>Yes, an index will reuse space,
Didn't you say only if the new index value is the same ? I'm sure you
did, oh, look at your quote ...
That's funny you're got to admit !!
>but not very often in a high IUD environment
Why is that Mike ? Actually read my presentation first, then answer my
question, it might just change your mind ...
> and a rebuild or
> coalesce will reduce these indexes in size, thus reducing clustering
Mike, this is rubbish, we've been through this. The fact you get this
wrong again and again kinda highlights your lack of understanding here
...
Was that 10 books Mike ...
> factor and blocks used, making them once again more palatible for the
> optimizer and give higher performance.
All sounds very good and practical in these performance books you
write. Unfortunately, the truth is somewhat different to your sense of
reality. *Read* my presentation to learn and find out why.
Perhaps I should have my presentation published ...
>
> Other than violating relational tenents, reordering table data will
> dramatically improve access times in many cases. Of course as the
> table ages it must be rebuilt over and over again, along with the
> index, to maintain this ordering. I hope those that follow this advice
> have lots of maintenance window time! Most of my clients barely have
> time to backup their terabyte size databases, let alone reorder all of
> their major, multi-million row tables.
Of course rebuilding a table is not something one can do at the drop
of the hat, if at all. But at least by rebuilding and re-ordering the
table, you *do* influence the CF.
>I heard that first from Don
> Burleson when I reviewed one of his first Oracle books. It was a carry
> over from Mainframe tuning that is still valid. Nice of you to give
> the same advice as Don!
When Don is right, I agree with him. When Don is wrong, I don't agree
with him. For some reason, you have a problem with that.
>
> Anyway, I wish if you would quote my suggestions, you would also quote
> the qualifying material that goes with them, but that wouldn't be as
> much fun now would it?
As I've explained, your quote in whatever *context* you like to place
it in, is utter tripe. Pure and simple. My presentation explains, as
I've briefly attempted to here, why that is the case.
Mike, why don't you get off your little pedestal of how you are
sooooooo good and how I'm soooooo bad, and simply thank me for
pointing out why you are so utterly wrong and for teaching you how
indexes and the CF in particular work in Oracle. You might even
mention me in your next book for showing you the light (thought I'll
use a biblical example too).
Be a nice touch don't you think.
One last point.
Just sit back and think about all the people you've taught over the
years and who have read your book and who now wrongly think that
deleted index entries are only reused if identical values are
inserted, who rebuild indexes in the naive hope that the CF will
improve, who rebuild all their massive indexes again and again in the
hope they will drop a level, who think that a large value for the CF
is bad simply because the table is large, etc. etc. etc.
And yet Mike, you think *I'm* the bad guy ...
Now that really is funny.
Richard
> You sound so much like an Oracle 7 DBA! First take a look at
> www.jlcomp.demon.co.uk/faq/table_frag.html, then take a look at the
> referenced Steve Adam's paper if you want more details.
<snip>
Hi Daniel,
looks like I have at least one reader then :o)
Cheers,
Norman.
Hi Richard,
in fairness to at l;east one of the quotes experts, Jon Wang, when I
saw his article on DBAZine, I wrote up and sent him, and DBAzine, an
article proving without doubt that deleted entries in an index are
reused as soon as an attempt is made to write any entry that fits into
the block(s) in question.
John replied and acknowleged his error and promised a following
article would correct his information. DBAzine replied too and said
that when they had heard back from John, they would probably publish a
correction.
As yet, nothing :o(
Cheers,
Norm. (Behind the firewall from hell !)
> Other than violating relational tenents, reordering table data will
> dramatically improve access times in many cases. Of course as the
> table ages it must be rebuilt over and over again, along with the
> index, to maintain this ordering. I hope those that follow this advice
> have lots of maintenance window time!
Dunno, Mike. I won't dwell on the relational tennets: no one seems
to care about those nowadays...
IME what happens is tables are often initially loaded with data in
random order regarding what the arrival rate will be in future.
So, you end up with a table that has initial data not clustered at all,
then all future data in the so-called "right" sequence.
Simple example: where a "INVOICE_HEADER" table is loaded with data
in sequence of invoice number, togetherr with an "INVOICE_LINE" table
with data in sequence of a surrogate key. Not the sequence of the FK
to "INVOICE_HEADER".
Now, in future all new rows in the "LINE" table will most likely
arrive "clustered" on invoice number FK: apps often enter the
lines of an invoice in one batch. But the initialy loaded ones will
never be so.
The result: you get tremendous variance in efficiency of access via
an index on the FK: some queries will run like lightning, others will
cause huge I/O. Others yet will have crazy behaviour, sometimes taking
a long time, others just running OK-ish. The bigger the delta between
initial table volume and table load rates, the bigger this problem
will be.
Now: re-order/re-cluster the table data ONCE,
and Cod - er,sorry: Bob - will be your uncle.
> their major, multi-million row tables. I heard that first from Don
> Burleson when I reviewed one of his first Oracle books. It was a carry
> over from Mainframe tuning that is still valid. Nice of you to give
> the same advice as Don!
Actually, the first time it was mentioned in the specific Oracle context
was long before Don wrote any books: in Compuserve, in the Oracle
User's Group that Chris Wooldridge used to run there. And where many
"original" ideas and insights were "pinched" along the years. Not just
on this subject either, and by a great many writers.
> Now: re-order/re-cluster the table data ONCE,
> and Cod - er,sorry: Bob - will be your uncle.
Good point, I think. If more people worried just very occasionally about
their table organisation, and worried not at all about their index
(re)organisation, we might just have the balance just about right.
> Actually, the first time it was mentioned in the specific Oracle context
> was long before Don wrote any books:
You mean there *was* a time before Don wrote books??
Regards
HJR
Hi Norm,
I also contacted John and DBAzine approximately a year or so ago after I
read all his articles and detailed briefly a whole collection of errors I
found in many of them. I strongly advised he spend more time researching and
getting his facts together and less time writing.
John responded with a "thanks, you spent a bit of effort there" and to his
credit, has only written the one article since. I received not a word from
DBAzine.
As his article was still there the last time I looked and as this particular
error (one of many, many) was so basic and yet so typical of many, I
included it in my little list.
What did you think of the slides ?
Cheers
Richard
The biggest chuckle I got was the line "For now, the most any Oracle
professional can do is to explore their indexes and learn how the
software manages to [sic] b-tree structures."
I couldn't agree more, and I can think of at least one "Oracle
professional" to whom that advice particularly applies.
There are, for example, the inevitable doses of Burlesonian error: "To
illustrate, assume I have an index on the last_name column of a
1,000,000 row table and the clustering_factor is 1,000,000 indicating
that the rows are in the same sequence as the index". A clustering
factor that is equal to the number of rows in the table indicates
nothing of the kind, of course.
But whatever: for wimping out, the article can't be beat! "So who is
right [those who say you should rebuild indexes or those who say you
shouldn't]? I suspect that they are both correct." Uh huh. I suspect
that to be both a semantic and logical impossibility myself. But if we
wait long enough, I think he might actually just get to the facts of the
matter.
We can but hope.
In the context of the present discussion, however, was is most
interesting is the distance that apparently now exists between Don and
Mike on the issue. I hope the divorce won't be too messy.
Regards
HJR
Snip: > Er, that would be "Howard", not "Roger".
So sorry Howard, won't happen again.
Snip: > ...you make your first mistake, since the advice should be
'not to
> unless', not 'rebuild unless'.
Again, you cut the sentence short only quoting enough to "prove" your
point which is no proof at all.
Snip: That sentence doesn't even make sense to me. "I tell them to
compare it
> [what "it" are we talking about? From the context, it sounds like it
> should be "the number of leaf nodes of the index"]
Again, in the original context, the presentation in which I was
quoted, incompletely, the context was clustering factors. Sorry I
thought you knew this. But I should perhaps have requoted myself to
those who didn't back track to the presentation.
Snip: If you don't like being
> quoted, then don't say it in the first place.
Advice we should all follow, however, delibrately misleading quotes
that don't take into account supporting material stray in to dangerous
territory, especially when they are used to damage the reputations of
others.
Sinp: If your published works told the facts of the
> matter, that an index rebuild hardly ever makes a difference, but that
> there are always exceptions, then I doubt anyone would be quoting you
> otherwise.
If people look at the recommendations in total, then they know I only
advocate rebuilds when it is statistically indicated or performance
points at an index that is showing the signs of aging I have
discussed. In the situations I discuss rebuilds have reduced
processing times from 5 hours to 2 hours, reduced index sizes from 80
gig to 20-30 gig and in another case reduced index tablespace
utilization by 90%. In testing changing the column orders in a
concatenated index can reduce cluetering factors by an order of
magnitude. I could go on, but it is probably casting pearls.
Mike
Hi Howard,
I actually went to some considerable trouble (again) to write directly to
Don when this article first came out. I highlighted for him all the numerous
errors and inconsistencies that the article contained and offered
suggestions on how to rewrite the thing. Note it was a long, long detailed
list of suggestions. He initially responded by saying opps, they published
the wrong version, only for the second version be just as appalling.
Subsequent suggestions were totally ignored so I simply gave up on it all.
In fact, part of my motivation (all be it a small part) for my Oracle Index
Internals paper was to try and write a document that was the exact opposite
to Don's. I wanted it to be technically accurate, I wanted it to be
relatively detailed, I wanted it to make conclusions that were proven and
demonstrable, I wanted to have simple demos that were easy for others to
copy and apply in their own situations, I wanted it to motivate others to
explore and experiment and make their own *valid* conclusions about indexes.
I feel very comfortable with anyone who wishes to compare the two ;)
Cheers
Richard
Just thought I'd quote out of context a bit - seemed right somehow :)
You'll note that I've already recommended the presentation - I like it a
lot.
I do agree somewhat with Mike though, attaching names to the quotes has both
advantages and disadvantages.
First of all, it isn't unreasonable for someone so quoted to consider
themselves attacked personally - especially if they have a personality type
that says a disagreement with what I have said is a disagreement with me as
a person.
Secondly, some websites allow articles to be revised without any indication
that this has happened - for example unless I am being blind Don's December
03 article on dbazine no longer contains the sentence you quote - namely "If
the index clustering factor is high, an index
rebuild may be beneficial" doesn't appear anywhere in the inside oracle
indexing article that is up currently. This could lead the unwary to quote
something that later got corrected and help perpetuate the myth. FWIW I see
this as bad journalism on the part of the website not the article author -
good journalism would peer review the article before publication and not any
changes. .
On the other hand naming quotes lends credence to the fact that these ideas
*are* real and are in circulation.
--
Niall Litchfield
Oracle DBA
Audit Commission UK
http://www.niall.litchfield.dial.pipex.com/
.
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
"Howard J. Rogers" <h...@dizwell.com> wrote in message
news:409a2cf9$0$20347$afc3...@news.optusnet.com.au...
> used, if it is on concatenated columns and several other factors. I
> also tell them to compare it to the number of dirty blocks in the
> table and if it is several times that value the index may benefit from
> rebuilding.
>
> Mike
My apologies ... I try to stay out of these love fests ... but I am
quite confused by your reference to dirty blocks in a table?
Can you please explain how a table can contain a dirty block?
Thanks.
--
Daniel Morgan
http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp
http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp
damo...@x.washington.edu
(replace 'x' with a 'u' to reply)
> Howard,
>
>
>>[what "it" are we talking about? From the context, it sounds like it
>>should be "the number of leaf nodes of the index"]
>
>
> Again, in the original context, the presentation in which I was
> quoted, incompletely, the context was clustering factors.
You weren't in the original context, but making a post to a newsgroup.
You might at least make it comprehensible. But whatever. I take it
therefore that the original quote should have read
"If the number of dirty blocks in the table is several times the
clustering factor, the index might benefit from a rebuild"
Is that right now?
In which case, would you tell me what a "dirty block in a table" is.
Dirty buffers, I am familiar with. Blocks I am familiar with. But dirty
blocks seems to me to be a contradiction in terms.
> Advice we should all follow, however, delibrately misleading quotes
> that don't take into account supporting material stray in to dangerous
> territory, especially when they are used to damage the reputations of
> others.
Well, did you or did you not write "Deleted space is not reclaimed
automatically unless there is an exact match key inserted."?
Let's just start with that one, shall we? How much "supporting material"
is actually needed to make that even approximately resemble the actual
truth (which happens to be that deleted space is reclaimed whenever an
entry that needs it visits a block that contains it, and the entry
doesn't have to match at all). That must be some supporting material!
> If people look at the recommendations in total, then they know I only
> advocate rebuilds when it is statistically indicated
I think that's the point, don't you? Your statistics are questionable.
>or performance
> points at an index that is showing the signs of aging I have
> discussed. In the situations I discuss rebuilds have reduced
> processing times from 5 hours to 2 hours, reduced index sizes from 80
> gig to 20-30 gig and in another case reduced index tablespace
> utilization by 90%. In testing changing the column orders in a
> concatenated index can reduce cluetering factors by an order of
> magnitude.
See, there you go again. You seem to think that dropping an index and
re-creating it with a different column order comes under the rubrick of
an index *rebuild*, when it is no such thing, but is actually a bit of
index *redesign*. It doesn't help clarify an issue if you keep changing
your mind on what the issue we're talking about actually is.
Your claims of 3 hour reduction in run times or 90% reduction in space
utilization are utterly meaningless if we can't be certain whether you
are talking about 'alter index blah rebuild' or some other bit of DML
you happen to have fired off at the time.
>I could go on, but it is probably casting pearls.
Good of you to hold off on the personal abuse for as long as you
managed. I realise you have to make an effort in these things.
HJR
> On the other hand naming quotes lends credence to the fact that these ideas
> *are* real and are in circulation.
Kowing certain of those authors' propensity to fire off "I'll be suing
you" emails at the drop of a hat (and never quite managing to actually
do so), I thought it was a bit brave of Richard, too, actually. But if
they didn't have names attached, you'd never believe they were ever
really uttered, but were just some horrors Richard had knocked up
himself -precisely as you suggest here. It is also educational to know
which Oracle "experts" are best avoided.
So in the end, I thought what he did was necessary and appropriate.
Regards
HJR
Well, maybe one guy...
http://searchdatabase.techtarget.com/originalContent/0,289142,sid13_gci962948,00.html
:) (article is about Date for those who don't want to click there)
I thought the tenet was "the order is not guaranteed" as opposed to
"reordering must not be done."
>
> Actually, the first time it was mentioned in the specific Oracle context
> was long before Don wrote any books: in Compuserve, in the Oracle
> User's Group that Chris Wooldridge used to run there. And where many
> "original" ideas and insights were "pinched" along the years. Not just
> on this subject either, and by a great many writers.
Still wondering if that stuff is archived anywhere. I thought it was
run by Michael mumblesomebodyfromminneapolis...
jg
--
@home.com is bogus.
"Marijuana has not killed anybody in the history of the world alive
today." - Joe Rogan
Beyond necessary. References are a prerequisite for academic
veracity.
Still not convinced that a few inaccuracies cast doubt on an entire
body of work. But agree that reproducible test cases are much better
than bald assertions. But am also concerned that not everything in a
complex system is reproducible or testable.
jg
--
@home.com is bogus.
Be seeing you! http://www.retroweb.com/prisoner.html
> Beyond necessary. References are a prerequisite for academic
> veracity.
>
> Still not convinced that a few inaccuracies cast doubt on an entire
> body of work. But agree that reproducible test cases are much better
> than bald assertions. But am also concerned that not everything in a
> complex system is reproducible or testable.
The old 'ghost in the machine' idea. True enough, which is why one
always allows for exceptions (or should do), and tries not to sound like
Moses (difficult, perhaps, to achieve the required tone of moderation
when first advancing an idea that flies in the face of what passes for
accepted practice, but there eventually, we hope).
What it isn't, however, is two completely contradictory things at the
same time, nor a matter of 'high emotion'.
Not being able to pin down every last nuance doesn't mean abandoning any
effort to be scientific about gathering test-cases and evidence; or
proposing a hypothesis and trying hard to blow it to bits. As, indeed,
you say. And as, indeed, I thought Richard's paper was.
Regards
HJR
>
> What did you think of the slides ?
>
I think they make a dramatic contribution to road safety. (you
probably don't remember that UK road safety advert !)
Actually, I think I need to spend a lot of time going through them to
see if my brain is able to understand some of the more technical
parts. However, from a first walk through, they look excellent.
Thanks.
Cheers,
Norm (Behind the firewall).
Well, I certainly thought it was appropriate. I don't think one can or
should cast doubt on the validity of a 548 slide presentation/seminar on the
basis of a few quotes. I prefer to let the content speak for itself.
As far as avoiding experts goes, I can only think of one really well known
name that I shudder when I discover that they have produced something new -
and I don't feel good with myself for doing that much either - and that in
the end is less to do with the quality of the content (though that quality
can be shocking) as with the unprofessional attitude evidenced in the way
they deal with other members of the Oracle community. As always YMMV.
> Well, maybe one guy...
>
http://searchdatabase.techtarget.com/originalContent/0,289142,sid13_gci962948,00.html
>
> :) (article is about Date for those who don't want to click there)
I see "best-selling author" Celko is also there. Now, if there was a
contest between Celko and DB, my money would be on the later winning on
relevance... But anyways, Fabian and Chris do a super job exposing
that one.
> I thought the tenet was "the order is not guaranteed" as opposed to
> "reordering must not be done."
Well spotted. Very much so.
> Still wondering if that stuff is archived anywhere. I thought it was
> run by Michael mumblesomebodyfromminneapolis...
Dunno. My OS2 disk is long gone, where I stored it all. Pity:
I should have held on to it if nothing else, for fun.
As usual, totally wrong. . .
Actually, anybody with the balls to bother arguing with you loosers
should be commended.
In fact, I'm so impressed with his posts here that I just hired him.
He starts in 3-weeks:
http://dba-oracle.com/oracle_news/2004_5_11_burleson.htm
If you want to retain him to explain index internals to you, here are
his rates:
>>In the context of the present discussion, however, was is most
>>interesting is the distance that apparently now exists between Don and
>>Mike on the issue. I hope the divorce won't be too messy.
>
>
> As usual, totally wrong. . .
Oh, quite probably. I was having to wade through your illiterate
ramblings to work out what you meant; and having to parse Mike's obscure
sentence structure to work out what he meant. It is not at all unlikely
that I, in that heroic effort of interpretation, got it wrong. That,
after all, is the usual meaning of the word "apparently".
> Actually, anybody with the balls to bother arguing with you loosers
> should be commended.
A sentiment that does your professional credibility, er. Well. Whatever.
> In fact, I'm so impressed with his posts here that I just hired him.
Great. So we now know that by avoiding your company, we can avoid *two*
lots of bad advice in one easy decision. That's quite considerate of
you, Don.
HJR
VC
"Howard J. Rogers" <h...@dizwell.com> wrote in message
news:409c0b38$0$30603$afc3...@news.optusnet.com.au...
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
"Don Burleson" <d...@burleson.cc> wrote in message
news:998d28f7.04050...@posting.google.com...
> Actually, anybody with the balls to bother arguing with you loosers
> In fact, I'm so impressed with his posts here that I just hired him.
I don't mind when folks disagree, however, when you attach names to
quotes and then say in effect this person is an idiot and this is
rubbish, it does appear to be a personal attack rather than a
professional one.
To distill my advice on index rebuilds, I say to only rebuild indexes
which show, through proper analysis, to have problems. Examples of
possible probelms are: a large clustering factor to dirty base table
block ratio, a large number of levels (of course large is relative,
some folks say 4 other 2 is where "large" starts), or a large amount
of white space (I call it browned nodes, following with the leaf and
tree metaphore.) I do not advocate rebuilding all indexes.
In the case of a truely random index (such as some text indexes and
concatenated indexes) rebuild of the index will achieve nothing.
Unless you order the table inaccordance with the index, the clustering
factor will remain nearer to the number of rows than to the number of
dirty blocks. However, reordering the index columns to better align it
with the underlying table order can reduce clustering factor
significantly.
In special indexes such as bitmaps, the clustering factor can increase
dramatically with IUD activity, and a rebuild will reduce it just as
dramatically.
Reduction in the size of the clustering factor improves the chance the
index will be selected by the CBO since the clustering factor is a
multiplier in the index cost.
When this advice has been followed we (the folks at TUSC and other
consultants) have seen batch times reduced from 5 hours to 2 hours,
significant reduction in index space usage and significant reduction
in required IO against the index tablespace.
As to whether an index rebuild will affect clustering factor, yes it
will if the index has been broadened due to node splits and other
activities. Remember that as old data is deleted and new data is
inserted into the base table the order of rows in the base table
changes as blocks are moved on and off of the free/used extent areas.
By definition the order of rows in a relational table is random. So a
preloaded table that is perfectly ordered by key eventually falls into
Oracle entropy if it undergoes delete and insert activities (the
property which forces and ordered table into disorder.) This means
that you may start with a perfect clustering factor but with no
changes to the index, table level changes will alter the clustering
factor and a rebuild will reduce it.
I agree that if the base table is held constant and only inserted into
then there is little likelyhood that the indexes will need rebuilds.
However, we all don't live in the town Perfect and we have to deal
with the real world where both tables and indexes get messy and must
periodically be cleaned up. Most of the arguments against index
rebuilds seem to ignore the fact that both the base table and the
index are in a state of flux in a high IUD environment and over time
get misaligned, hence causing the clustering factor to not truly
represent the ordering of the table against the the index. A rebuild
realigns the index nodes to better reflect the structure of the
underlying base table.
I completely understand the concept of clustering factor, both in an
ideal world and in the real one. I must deal with the real one.
Mike Ault
> Ok.
>
> I don't mind when folks disagree, however, when you attach names to
> quotes and then say in effect this person is an idiot and this is
> rubbish, it does appear to be a personal attack rather than a
> professional one.
I agree. I think everyone involved should keep their personal opinions
of the others to themselves. Lets just deal with Oracle, David Bowie,
and Rugby.
> To distill my advice on index rebuilds, I say to only rebuild indexes
> which show, through proper analysis, to have problems.
I doubt anyone would disagree.
Examples of
> possible probelms are: a large clustering factor to dirty base table
But to make this 'discussion' more valuable to everyone can we get
explanations or examples of these things. I, for one, have no idea
what a "dirty base table" is and would appreciate it if explanations
for clarity were included.
> Reduction in the size of the clustering factor improves the chance the
> index will be selected by the CBO since the clustering factor is a
> multiplier in the index cost.
It seems to me that there is a dispute as to whether the clustering
factor is affected by a rebuild. Can someone please provide evidence,
and I mean a demo, proving that their point of view is valid. Is
there a demonstration of the clustering factor changing with a rebuild?
I haven't seen one. And yet my recollection is that someone posted a
demo that showed the opposite.
> When this advice has been followed we (the folks at TUSC and other
> consultants) have seen batch times reduced from 5 hours to 2 hours,
> significant reduction in index space usage and significant reduction
> in required IO against the index tablespace.
I'm not disputing this but it is just anecdotal. Can you produce a
demo ... CREATE TABLE, INSERT INTO TABLE, etc. that demonstrates this
so that those of us in the bleachers can follow along?
> Mike Ault
Well, it's actually appreciated by me at least that you have stuck
around long enough this time to actually discuss the issue. Don's last
post kind of indicated that you weren't going to.
So, thanks for continuing to discuss the issue, and fair enough if we
don't agree, so long as neither one nor other of us actually misleads,
right?
>
> To distill my advice on index rebuilds, I say to only rebuild indexes
> which show, through proper analysis, to have problems. Examples of
> possible probelms are: a large clustering factor to dirty base table
> block ratio,
Well, it appears we are never going to get an explanation of what a
dirty base table block is (I've only asked three times, and Daniel's
asked too).
But whatever this mysterious beastie happens to be, what possible
difference can it make to such a ratio to rebuild an index??
The ratio is based on an index's clustering factor. An index's
clustering factor DOES NOT CHANGE WHEN AN INDEX IS REBUILT. Ever. So if
the ratio is bad *before* a rebuild, it will be *exactly as bad* after
the rebuild.
The ratio must therefore be considered meaningless, and this cannot
therefore be an example of "proper analysis".
In fact what it is, is an example of the "ratio snake-oil" that has been
peddled for ages (not always by you, Mike) which seeks to make easy
decisions about things like 'when do I add memory to my buffer cache';
'when is an index useful'; 'when should I increase my shared pool size';
and 'when should I rebuild an index'.
It would be lovely if there was a nice simple couple of figures you
could capture, divide, and thus get your answer. But any such figure
that involves the clustering factor of an index had better understand
what the clustering factor actually is, and deal with the fact that it
never, ever, changes as a result of a simple index rebuild.
>a large number of levels (of course large is relative,
> some folks say 4 other 2 is where "large" starts), or a large amount
> of white space (I call it browned nodes, following with the leaf and
> tree metaphore.) I do not advocate rebuilding all indexes.
>
> In the case of a truely random index (such as some text indexes and
> concatenated indexes) rebuild of the index will achieve nothing.
> Unless you order the table inaccordance with the index, the clustering
> factor will remain nearer to the number of rows than to the number of
> dirty blocks. However, reordering the index columns to better align it
> with the underlying table order can reduce clustering factor
> significantly.
That is NOT an index rebuild. That is an index REDESIGN.
To suggest otherwise is to either be (a) extremely imprecise in your use
of language or (b) intellectually dishonest.
> In special indexes such as bitmaps, the clustering factor can increase
> dramatically with IUD activity, and a rebuild will reduce it just as
> dramatically.
>
> Reduction in the size of the clustering factor improves the chance the
> index will be selected by the CBO since the clustering factor is a
> multiplier in the index cost.
ALTER INDEX X REBUILD never, ever, ever, ever, ever alters an index's
clustering factor. And it is *that* command that everyone has in mind
when the words "should I rebuild my indexes" comes up in polite
conversation. Not 'drop index X...create index newx'.
> When this advice has been followed we (the folks at TUSC and other
> consultants) have seen batch times reduced from 5 hours to 2 hours,
> significant reduction in index space usage and significant reduction
> in required IO against the index tablespace.
Translation: when we completely redesign our indexes, so that they are
built on different columns and with different column orders, we get
improvements in their efficiency.
Why am I not surprised at that? Why is *no-one* surprised at that?
Because index redesign is obviously and powerfully capable of yielding
performance benefits. Index rebuilding, however, isn't.
> As to whether an index rebuild will affect clustering factor, yes it
> will if the index has been broadened due to node splits and other
> activities.
Rebuilding an index never, ever, ever, ever, alters its clustering
factor, no matter how much fresh air is in the index:
SQL> create table T1 as select * from dba_objects order by owner;
SQL> create index I1 on T1(object_id);
SQL> analyze table T1 compute statistics;
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
4529
SQL> alter index I1 rebuild pctfree 99;
SQL> analyze table T1 compute statistics;
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
4529
SQL> alter index I1 rebuild pctfree 0;
SQL> analyze table T1 compute statistics;
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
4529
Strangely, my clustering factor never changes as a result of an index
rebuild. The only thing that fixes it is to rebuild my *table*:
SQL> create table Ttemp as select * from T1;
SQL> truncate table T1;
SQL> insert into T1 select * from ttemp order by object_id;
SQL> analyze table T1 compute statistics;
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
76
You CANNOT alter your clustering factor by an index rebuild, and I don't
care how many node splits or "other activities" you've been suffering
from. That's because clustering factor is an indication of how well the
*table* rows are ordered vis-a-vis an index ordering, not how much fresh
air your index may or may not contain.
>Remember that as old data is deleted and new data is
> inserted into the base table the order of rows in the base table
> changes as blocks are moved on and off of the free/used extent areas.
Absolutely true.
> By definition the order of rows in a relational table is random.
Also absolutely true.
>So a
> preloaded table that is perfectly ordered by key eventually falls into
> Oracle entropy if it undergoes delete and insert activities (the
> property which forces and ordered table into disorder.)
Yet again, absolutely true.
> This means
> that you may start with a perfect clustering factor but with no
> changes to the index, table level changes will alter the clustering
> factor and a rebuild will reduce it.
Absolute nonsense. 'The table order of rows has degenerated, so an index
rebuild will fix it'?? An index rebuild will not affect the order of
rows in the table at all (quite obviously). An index rebuild will not
affect the order of leaf entries in the index (quite obviously). And the
clustering factor is a measure of the similarity of the two orderings,
so an index rebuild cannot, ever, cause the clustering factor to change.
See the above test. Except that I know you will claim that's not a valid
test, because no DML is going on. So let's do some DML:
SQL> update T1 set object_id=object_Id+10;
6230 rows updated.
SQL> commit;
Commit complete.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
4541
Yup. DML sure does change the Clustering Factor (it's 4541 now, not
4529). DML changing the order of rows in a table, and thus affecting the
clustering factor, is not at issue. It's whether you can fix the thing
up with an index rebuild:
SQL> alter index I1 rebuild;
Index altered.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
4541
And no you can't. The clustering factor was affected by DML, and unless
you REBUILD THE TABLE, the clustering factor is not going to change. Try
again. Let's delete every other row in the table:
SQL> delete from T1 where mod(object_id,2)=1;
3109 rows deleted.
SQL> commit;
Commit complete.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
2049
Clustering factor has come down a fair bit as a result of DML. It's
true. But you're saying, Mike, that an INDEX REBUILD will fix it up:
SQL> alter index I1 rebuild;
Index altered.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
2049
Oh no it won't. And one last test:
SQL> insert into T1 select * from dba_objects order by secondary;
6230 rows created.
SQL> commit;
Commit complete.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
8471
Horror of horrors, my clustering factor has shot up as a result of the
insert. But no worries, Mike's advice is that the rebuild will fix it:
SQL> alter index I1 rebuild;
Index altered.
SQL> analyze table T1 compute statistics;
Table analyzed.
SQL> select clustering_factor from user_indexes
2 where index_name='I1';
CLUSTERING_FACTOR
-----------------
8471
Ooops. There are, in other words, NO CIRCUMSTANCES where an index
rebuild changes the clustering factor. If DML over time has caused your
clustering factor to go sky-high, sure that's a problem and needs
looking at. But the one thing that is guaranteed NOT to fix the problem
is an INDEX rebuild.
> I agree that if the base table is held constant and only inserted into
> then there is little likelyhood that the indexes will need rebuilds.
> However, we all don't live in the town Perfect and we have to deal
> with the real world where both tables and indexes get messy and must
> periodically be cleaned up. Most of the arguments against index
> rebuilds seem to ignore the fact that both the base table and the
> index are in a state of flux in a high IUD environment and over time
> get misaligned, hence causing the clustering factor to not truly
> represent the ordering of the table against the the index. A rebuild
> realigns the index nodes to better reflect the structure of the
> underlying base table.
Jeez. How in God's name does an index rebuild "realign the index nodes
to better reflect the structure of underlying base table"? There is only
one way for index nodes to be ordered or aligned, and that's in
ascending alphabetic or numeric order, depending on what column(s)
you've built the index on, of course. An index cannot "reflect the
structure of the table". And a rebuild doesn't change that fact at all,
ever.
As is evidenced by the fact that I can rebuild my index until the cows
come how and the clustering factor NEVER, ever changes.
>
> I completely understand the concept of clustering factor, both in an
> ideal world and in the real one. I must deal with the real one.
You don't, Mike. Clearly you don't. Otherwise you wouldn't make such
silly statements to the effect that a rebuild will alter it; the
fluffiness of an index will affect it; or that a rebuild somehow makes
the index realign itself to match the table structure.
You are trying to offer quick solutions when none exist, and what is
worse, the solutions exhibit a lack of understanding about the very
materials we are dealing with. And worst of all, your advice achieves
nothing. Nothing at all. Rebuild with gay abandon as you may, and your
clustering factors will remain unchanged, every time.
Which means, in turn, that any ratio or measure that you can possibly
devise that uses the clustering factor as a factor in determining when
to rebuild the index is just stupid. Because it's going to measure as
much after the rebuild as before.
HJR
>>Mike Ault wrote:
> Well, it appears we are never going to get an explanation of what a
> dirty base table block is (I've only asked three times, and Daniel's
> asked too).
>
> But whatever this mysterious beastie happens to be, what possible
> difference can it make to such a ratio to rebuild an index??
>
Please, please Mike, add me to the list of requestors for this explanation.
Hopefully a clear definition of "dirty base table block" will add more light
than heat to this discussion.
>
> > By definition the order of rows in a relational table is random.
>
> Also absolutely true.
>
Howard, are you sure of this? Relational theorists please correct me if I am
wrong, but I thought the best we could say about the order of rows is that
it is indeterminate, which means we can't even make statistical arguments
based on assumed randomness.
>
> HJR
My only experience with this was with a third party vendor who slipped a
weekly "rebuild every index in the system" job into a production system I
was babysitting without saying a word to anyone (Grrrr!) So I can add a
note here that index rebuilds of any real size are expensive in both CPU and
log archive space as well as being almost always ineffective and pointless.
Thank you folks for an informative and so far entertaining thread. I, for
one, promise faithfully never ever to rebuild an index (not that I ever
would or did, mark you) without revisiting this thread and rereading every
post in it.
Roger S Gay
"Roger S Gay" <roge...@shaw.ca> wrote in message
news:47gnc.395459$Pk3.216572@pd7tw1no...
> "Howard J. Rogers" <h...@dizwell.com> wrote in message
> news:409d7a18$0$442$afc3...@news.optusnet.com.au...
>
>
>>>Mike Ault wrote:
>
>
>>Well, it appears we are never going to get an explanation of what a
>>dirty base table block is (I've only asked three times, and Daniel's
>>asked too).
>>
>>But whatever this mysterious beastie happens to be, what possible
>>difference can it make to such a ratio to rebuild an index??
>>
>
> Please, please Mike, add me to the list of requestors for this explanation.
> Hopefully a clear definition of "dirty base table block" will add more light
> than heat to this discussion.
>
>>>By definition the order of rows in a relational table is random.
>>
>>Also absolutely true.
>>
>
> Howard, are you sure of this? Relational theorists please correct me if I am
> wrong, but I thought the best we could say about the order of rows is that
> it is indeterminate, which means we can't even make statistical arguments
> based on assumed randomness.
Well, OK.... what do we mean by "random" and what do we mean by
"indeterminate"?
Now you maths wonks will tell me that the two are not the same. But for
us poor schmucks trying to find a pair of socks to wear of a morning,
the fact that the ordering in my clothes closet is not random but merely
indeterminate is of no practical consequence. It still takes me
excessive minutes to find what I am looking for.
I doubt that Mike was arguing for "randomness" in its
mathematically-rigorous incarnation. I certainly wouldn't. But a mess of
clothes on the bedroom floor certainly *looks* random enough, and that
is all that was being said here, I think.
> My only experience with this was with a third party vendor who slipped a
> weekly "rebuild every index in the system" job into a production system I
> was babysitting without saying a word to anyone (Grrrr!) So I can add a
> note here that index rebuilds of any real size are expensive in both CPU and
> log archive space as well as being almost always ineffective and pointless.
That is a very important point, of course, and one that hasn't been done
justice so far, because we are all still trying to work out what on
Earth Mike is talking about. But when we do, and when we finally get
over the clustering factor as an issue, we will still have to do
hand-to-hand combat with the fact that index rebuilds are bloody
expensive, so you'd better be pretty certain before you start that the
benefits are going to outweigh the costs.
> Thank you folks for an informative and so far entertaining thread. I, for
> one, promise faithfully never ever to rebuild an index (not that I ever
> would or did, mark you) without revisiting this thread and rereading every
> post in it.
>
> Roger S Gay
I think that is a victory of sorts, then, Roger!
Thanks
HJR
> Me too, me too !.
Don't forget, I really want to find out as well, because it doesn't help
*at all* to have widely-read authors using language in an indeterminate
and imprecise way.
But whatever it turns out to be, don't lose site of the bigger picture:
whatever ratio Mike uses to suggest a rebuild, the rebuild can never,
ever alter the clustering factor of the index.
So one hopes from logical necessity that the clustering factor is not an
element of the ratio he uses to suggest a rebuild.
Because you would then be prompted to rebuild by something which doesn't
change after the rebuild; which therefore promts you to rebuild; which
therefore prompts you to rebuild; which therefore prompts you to
rebuild; which therefore prompts you ...
You get the idea.
Regards
HJR
>>>By definition the order of rows in a relational table is random.
>>
>>Also absolutely true.
>>
>
> Howard, are you sure of this? Relational theorists please correct me if I am
> wrong, but I thought the best we could say about the order of rows is that
> it is indeterminate, which means we can't even make statistical arguments
> based on assumed randomness.
You are correct. The use of the word "random" is exactly what is the
problem with much of this thread: It is imprecise. I know that both
Mike and Howard know that what you said is correct. And both undoubtedly
will respond with "that is what I meant." But it may well be that much
of this dispute is the fact the words are being thrown around without
the clarity that would be provided by the demos we are asking for.
Mike, for example, must be able to quickly and easily determine whether
a clustering factor is altered by a rebuild. The fact that he continues
to repeat his statement indicates to me that we likely have more
sloppinless of language than sloppiness of code. Problem is we have yet
to see that code.
> My only experience with this was with a third party vendor who slipped a
> weekly "rebuild every index in the system" job into a production system I
> was babysitting without saying a word to anyone (Grrrr!) So I can add a
> note here that index rebuilds of any real size are expensive in both CPU and
> log archive space as well as being almost always ineffective and pointless.
Which is a point to which I will agree and I suspect Howard will too.
That it appears that Mike does not is what I would like to see resolved
in this thread if we can just stick to Oracle and not personal insults.
> Thank you folks for an informative and so far entertaining thread. I, for
> one, promise faithfully never ever to rebuild an index (not that I ever
> would or did, mark you) without revisiting this thread and rereading every
> post in it.
>
> Roger S Gay
Let the entertainment continue.
> Roger S Gay wrote:
>
>>>> By definition the order of rows in a relational table is random.
>>>
>>>
>>> Also absolutely true.
>>>
>>
>> Howard, are you sure of this? Relational theorists please correct me
>> if I am
>> wrong, but I thought the best we could say about the order of rows is
>> that
>> it is indeterminate, which means we can't even make statistical arguments
>> based on assumed randomness.
>
>
> You are correct. The use of the word "random" is exactly what is the
> problem with much of this thread: It is imprecise.
No, the problem is intervention by people who don't know what the point
of this thread is.
Imprecision in the use of the word "random" is not the issue.
The issue is "should I rebuild my indexes". And moving on from that,
"how do I know if an index should be rebuilt". That is all. Everything
else is just so much navel-gazing.
Mike Ault claims he has a ratio which tells you when an index should be
rebuilt. That ratio is built on one number we can't quite work out what
he means. And another number, the clustering factor, which we all know
presicely what it means.
Either that ratio is a load of old nonsense, or it isn't. That's all you
have to be precise about. True or false. Black or White. Right or Wrong.
All Mike has to do is to show us how to calculate this ratio, and show
us the ratio improving after an index rebuild (NOT an index redesign).
That's all. It isn't rocket science. And it isn't very difficult. And
it's not at all imprecise.
> I know that both
> Mike and Howard know that what you said is correct. And both undoubtedly
> will respond with "that is what I meant."
Bzzt. I replied, and reply, that I couldn't give a monkeys whether its
random or indeterminate or painted bright blue and playing Knees Up
Mother Brown on the harpsichord. It *doesn't* *actually* *matter*.
> But it may well be that much
> of this dispute is the fact the words are being thrown around without
> the clarity that would be provided by the demos we are asking for.
No, the dispute is: can you give me a ratio that tells me when to
rebuild an index?
Not when one of the components of that ratio is a number which is
invariant when rebuilding, you can't, is my reply.
It is a simple enough dispute.
> Mike, for example, must be able to quickly and easily determine whether
> a clustering factor is altered by a rebuild. The fact that he continues
> to repeat his statement indicates to me that we likely have more
> sloppinless of language than sloppiness of code. Problem is we have yet
> to see that code.
This I will grant you. If Mike keeps stating something when the simplest
of test proves the contrary, he either is merely obtuse, or he is
talking about something we mere mortals aren't.
We have already seen this, in fact, when Mike subtley shifts the meaning
of the words "rebuild an index" to actually encompass "drop index X...
create index newX".
So maybe there is something else there he hasn't yet explained, and it
would do us all a lot of good for him to so explain it. That is indeed true.
> Which is a point to which I will agree and I suspect Howard will too.
> That it appears that Mike does not is what I would like to see resolved
> in this thread if we can just stick to Oracle and not personal insults.
I truly wish you would call a spade a spade. It is insulting, frankly,
to be accused of not discussing technical points. Which you do by your
"equanimity" in such comments.
>
> Let the entertainment continue.
>
It isn't entertainment. This is factual science, pure and simple. Either
Mike knows what he's talking about; or he's not talking about it very
clearly; or he doesn't. There are not many other possibilities.
If you find that fun, so be it. Personally, I find it immensely
irritating that the *knowledge* element of this is having to be spelled
out as if for the first time.
Regards
HJR
There IS a ratio...Its
effort to rebuild / benefit of rebuild < 1
Of course, you can't precisely define the benefit until you've done it..
:-)
--
Connor McDonald
Co-author: "Mastering Oracle PL/SQL - Practical Solutions"
ISBN: 1590592174
web: http://www.oracledba.co.uk
web: http://www.oaktable.net
email: connor_...@yahoo.com
Coming Soon! "Oracle Insight - Tales of the OakTable"
"GIVE a man a fish and he will eat for a day. But TEACH him how to fish,
and...he will sit in a boat and drink beer all day"
------------------------------------------------------------
--
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
"Daniel Morgan" <damo...@x.washington.edu> wrote in message
news:1084072587.374768@yasure...
"Howard J. Rogers" <h...@dizwell.com> wrote in message
news:409d9d9d$0$4548$afc3...@news.optusnet.com.au...
> VC wrote:
>
> > Me too, me too !.
>
> Don't forget, I really want to find out as well, because it doesn't help
> *at all* to have widely-read authors using language in an indeterminate
> and imprecise way.
I am genuinely curious about what the term is supposed to mean. It might be
that I am missing something and it would useful to get an alternative
viewpoint from Mike...
>But whatever it turns out to be, don't lose site of the bigger picture:
>whatever ratio Mike uses to suggest a rebuild, the rebuild can never,
>ever alter the clustering factor of the index.
No argument about that.
Regards.
VC
Connor,
First let me say, excellent book! I suggest it to the PL/SQL
developers I come across and use it to review, shall we say,
suboptimal code?
Anyway, I thought the concept of dirty base table blocks was
self-evident to those using Oracle, but since there seems to be some
confusion let me digress into an explanation. When a table is created
all of its blocks are in a sense "clean", that is, they contain no
data. As Oracle writes data to the database tables, their blocks
become "dirty", a reference no doubt to the "dirty buffers" the
dirty-buffer writer writes to them (while the politically correct
"database writer" has supplanted the original "dirty buffer writer" in
the original database documentation in version 6 and earlier it was
"dirty buffer writer".) So, a dirty block contains data, while a clean
one does not. Obviously, unless it is corrupt, an index can only point
to dirty blocks in a table. Hence a "perfect" clustering factor would
be equal or perhaps less than the number of dirty blocks, while for a
very random index, it would be closer to the number of actual table
entries (for some bitmap indexes, I have seen it greater than the
number of rows.) Usually, clustering factor shouldn't be greater than
the number of rows in the underlying table.
Now, as a block fills and then reaches pctused, it is taken off of the
free list and is not eligible for any more insertions, however it can
under go updates which can cause row chaining, but that is another
issue. Once deletions from the block cause the available space to drop
below pctused and it is placed back on the free list for more inserts.
This insertion back on the free list could be hours, days or some
greater time later than the last time it was used for inserts. This
means the index nodes that are referencing its inserted rows are most
likely way across the index tree from the original ones that hold most
of its entries.
Add to this the various node splitting algorithms Oracle uses for
non-sequential inserts and updates and you can easily see why
clustering factor increases and can become out of sync with reality.
An index rebuild coalesces nodes and aligns them with the underlying
table. Now, in many cases this reduces the clustering factor, however,
I have seem it stay the same (usually on truly random keys such as
text, or concatenated columns), decrease (the desired outcome) or
increase!
However, I may have stated things unclearly, the goal in index
rebuilding is not to reduce clustering factor, that is actually a
desired by-product, the goal is to ensure that the index is properly
aligned with the underlying table and that its entries are not sparse.
Clustering factor ratios are just one of several indicators that can
tell you an index needs to be investigated.
Mike
This sounds great, but please quantify.
Based upon what specific & quantifiable metrics
can anyone conclude that any specific index is
"properly aligned with the underlying table and
that its entries are not sparse."
>Clustering factor ratios are just one of several indicators
At what values does the clustering factor ratio turn from
a goodness indicator to a badness indicator.
What is YOUR formula for computing the clustering factor?
Is a high clustering factor ratio a good or bad indicator?
What are some of the other unspecified indicators and
how are they calculated or obtained?
> Anyway, I thought the concept of dirty base table blocks was
> self-evident to those using Oracle, but since there seems to be some
> confusion let me digress into an explanation. When a table is created
> all of its blocks are in a sense "clean", that is, they contain no
> data. As Oracle writes data to the database tables, their blocks
> become "dirty", a reference no doubt to the "dirty buffers" the
> dirty-buffer writer writes to them (while the politically correct
> "database writer" has supplanted the original "dirty buffer writer" in
> the original database documentation in version 6 and earlier it was
> "dirty buffer writer".) So, a dirty block contains data, while a clean
> one does not.
Extraordinary. Mike just keeps on re-writing the Oracle lexicon. A dirty
block is one with data in it (we all know it as a used block, or even
just a block below the high water mark). A clean block is an empty block
or a block above the high water mark. Fair enough. He has his own
private language on these matters that none of us can understand until
it's translated, but whatever keeps him happy, I suppose.
Anyway, we can now, finally, work out what Mike was talking about a few
posts ago.
Here is his original quote:
"To distill my advice on index rebuilds, I say to only rebuild indexes
which show, through proper analysis, to have problems. Examples of
possible probelms are: a large clustering factor to dirty base table
block ratio, [and he does go on to list a number of other tests which
are not at issue in this thread]"
That can now be translated to: "I say to rebuild an index if its
clustering factor exceeds the number of used table blocks by a large margin"
That's all.
I honestly thought (and hoped) it would be subtler and more cunning than
that. But that's what he was saying. If DBA_INDEXES.CLUSTERING_FACTOR >
DBA_TABLES.BLOCKS (by a lot) then REBUILD.
The only slight problem is that an index rebuild cannot possibly change
the number of BLOCKS for the table (hopefully, that one is obvious). And
as I showed in my post which Mike won't actually reply to, the
clustering factor won't change as a result of an index rebuild either.
Meaning that the formula Mike is using to suggest the rebuild suggests
it equally as well after the rebuild as before it -which by any
definition makes it meaningless.
No doubt, however, we will now discover that Mike has his own definition
of an index clustering factor and that he doesn't use the one in
DBA_INDEXES at all. Or some other definitional issue will arise. So in
the interests of clarity.
Mike: could you please confirm that you use the formula
DBA_INDEXES.CLUSTERING_FACTOR > DBA_TABLES.BLOCKS (by a lot). In other
words, to assess an index's clustering factor, you take the figure shown
in the column of that name in the DBA_INDEXES view. And that to assess
the number of "dirty base table blocks" you use the BLOCKS column in the
DBA_TABLES view. If you get your figures from some other source or
computation, could you please explain how you do it?
And you claim to have seen CLUSTERING_FACTOR in DBA_INDEXES change after
an index rebuild, right?
Thanks
HJR
I assume he means a dirty block in the table is a data (as opposite to
index)
block which has changed since the last index rebuild. Presumably if the
number (and hence percentage) of
a) new blocks (i.e. rows put into the index in a sub-optimal manner)
b) deleted blocks which leave sub-optimal free space in an index
i.e. leave free space in a block which could be occupied by other
index entries using optimal placement.
c) updated blocks which have have rows moved and hence non-optimal
placment of index entries.
is large relative to the size of the table (relative being a subjective
value
on how much optimal index placement gains you and how often sub-optimal
index placement occurs) than you rebuild. IT IS SUBJECTIVE. If you have
an idle system all Sunday night then why not rebuild after the weekly
full
backup to POTENTIALLY gain a slight increase in performance
in the next week? What do you lose by using idle machine time to
potentially improve (and certainly not decrease) performance?
Isn't that what these java 'hot spot' performance enhancing runtime
on-the-fly optimzers are about. It's just doing the optimzing during
idle time rather then the java ones trying to squeeze it in amongst
actually
running the application! Surely that's what Oracle self-managing and
IBM self-optimizing stuff is all about - use idle time to help
performance.
Automating this stuff is what the big guys are essentially pushing (even
if
they wrap it in sales talk).
USE YOUR IDLE CYCLES MAN!!
Hmm that could be the hippie salesman coming out in me!
> First let me say, excellent book! I suggest it to the PL/SQL
> developers I come across and use it to review, shall we say,
> suboptimal code?
I agree.
> Anyway, I thought the concept of dirty base table blocks was
> self-evident to those using Oracle, but since there seems to be some
> confusion let me digress into an explanation. When a table is created
> all of its blocks are in a sense "clean", that is, they contain no
> data. As Oracle writes data to the database tables, their blocks
> become "dirty", a reference no doubt to the "dirty buffers" the
> dirty-buffer writer writes to them (while the politically correct
> "database writer" has supplanted the original "dirty buffer writer" in
> the original database documentation in version 6 and earlier it was
> "dirty buffer writer".) So, a dirty block contains data, while a clean
> one does not.
Your verbiage may have been "common" back with version 6, I don't recall
that but I'll make allowances for my memory perhaps being faulty. But no
document finable at otn.oracle.com, tahiti.oracle.com, docs.oracle.com,
etc. uses this language. Nor could I find it in any of my reference
books. Including one written by you.
Perhaps much of the problem here is one of communication and I would
like to suggest on behalf of myself, my students, and others that watch
these exchanges, and a few that participate, that such language be
brought current to version 9i or later.
> Obviously, unless it is corrupt, an index can only point to dirty blocks
> in a table.
Given that one wishes to "invent" new terminology. Why can't you use the
terms used by Oracle and in standard usage by developers and DBAs? It
would make this thread much more usable and useful.
Hence a "perfect" clustering factor would
> be equal or perhaps less than the number of dirty blocks, while for a
> very random index, it would be closer to the number of actual table
> entries (for some bitmap indexes, I have seen it greater than the
> number of rows.) Usually, clustering factor shouldn't be greater than
> the number of rows in the underlying table.
>
> Now, as a block fills and then reaches pctused, it is taken off of the
> free list and is not eligible for any more insertions, however it can
> under go updates which can cause row chaining, but that is another
> issue. Once deletions from the block cause the available space to drop
> below pctused and it is placed back on the free list for more inserts.
> This insertion back on the free list could be hours, days or some
> greater time later than the last time it was used for inserts. This
> means the index nodes that are referencing its inserted rows are most
> likely way across the index tree from the original ones that hold most
> of its entries.
Freelist? I don't think anything I've been doing has used freelists for
years.
> Add to this the various node splitting algorithms Oracle uses for
> non-sequential inserts and updates and you can easily see why
> clustering factor increases and can become out of sync with reality.
> An index rebuild coalesces nodes and aligns them with the underlying
> table. Now, in many cases this reduces the clustering factor, however,
> I have seem it stay the same (usually on truly random keys such as
> text, or concatenated columns), decrease (the desired outcome) or
> increase!
Can you create a demo we can try to see this?
>
> However, I may have stated things unclearly, the goal in index
> rebuilding is not to reduce clustering factor, that is actually a
> desired by-product, the goal is to ensure that the index is properly
> aligned with the underlying table and that its entries are not sparse.
> Clustering factor ratios are just one of several indicators that can
> tell you an index needs to be investigated.
>
> Mike
Once again ... could you create a demo that would show this?
Thanks.
> I assume he means a dirty block in the table is a data (as opposite to
> index)
> block which has changed since the last index rebuild. Presumably if the
> number (and hence percentage) of
>
> a) new blocks (i.e. rows put into the index in a sub-optimal manner)
> b) deleted blocks which leave sub-optimal free space in an index
> i.e. leave free space in a block which could be occupied by other
> index entries using optimal placement.
> c) updated blocks which have have rows moved and hence non-optimal
> placment of index entries.
>
Assuming you are correct ... and that involves a lot of assuming ...
you are misusing the phrase "dirty block". I have no idea how a row
can be put into an index in a sub-optimal manner or even what
"sub-optimal" actually means. I can't see what is sub-optimal about
free space in an index block? Is that space less optimal than having
no space when an insert comes along with a block that belongs in the
block, etc.?
In short ... we have a huge amount of imprecise language being bandied
about by people who have yet to produce demos that supports their
contentions.
Which means to me that those of us trying to learn from these "experts"
are still in the dark. So lets stop making assumptions and get
clarifications and demos.
Never confuse movement with progress.
Going around in circles is movement,
but only few folks would consider it progress.
This is the equivalent of giving chicken soup to a dead man.
It cerntainly won't hurt him & maybe a miracle will occur & it help him.
I am an Informix dba who is posting in an Oracle group. I do not know
your terminology but I do not btree index structures.
> can be put into an index in a sub-optimal manner or even what
> "sub-optimal" actually means. I can't see what is sub-optimal about
sub-optimal means having all the index entries in as few blocks as
possible
to read/update/delete the blocks required whilst still allowing room for
expected
growth. Having to read nore index blocks to satisfy a query is not
optimal.
If the index entries you need to read can fit in less blocks than less
reads are
done and it is faster yes?
If two index entries are in the same block than less writes are done when
you
update both rows at the same time, yes?
If the index blocks are such that range scans can be done using
sequential
rather than random disk reads than it is faster yes?
> free space in an index block? Is that space less optimal than having
> no space when an insert comes along with a block that belongs in the
> block, etc.?
Is depends on how many rows you will insert. If inserting no more rows
or few rows (e.g. customer table where you do not get many new customers
per month and most of you business is repeat business than you need a
small
percentage free than a phone records table for a phone company!). If this
is a
table that get no more rows (e.g. prices table where prices only change
once
per year) than yes having no free space is optimal).
>
> In short ... we have a huge amount of imprecise language being bandied
> about by people who have yet to produce demos that supports their
> contentions.
>
> Which means to me that those of us trying to learn from these "experts"
> are still in the dark. So lets stop making assumptions and get
> clarifications and demos.
It is subjective and every case needs to be considered independently.
You need to use some general rules and the skill is in applying them to
specific cases. We need a few examples but do not assume the hugh
speedups you get from contrived examples will apply to you. Getting
a small speedup from lots of little tweaks can add up to a lot -
compound interest. Index rebuilds are just part of it.
What else would you using idle time on the machine for? Running seti
on all your servers...or sitting idle doing nothing? Of course even this
advice needs to be balanced. Using every free second is not wise
as well.
Exactly and this is a bad thing??
> "Howard J. Rogers" <h...@dizwell.com> wrote in message
> news:409e95ac$0$13706$afc3...@news.optusnet.com.au...
>
>>Mike Ault wrote:
>>
>>
>>>Anyway, I thought the concept of dirty base table blocks was
>>>self-evident to those using Oracle, but since there seems to be some
>>>confusion let me digress into an explanation. When a table is created
>>>all of its blocks are in a sense "clean", that is, they contain no
>>>data. As Oracle writes data to the database tables, their blocks
>>>become "dirty", a reference no doubt to the "dirty buffers" the
>>>dirty-buffer writer writes to them (while the politically correct
>>>"database writer" has supplanted the original "dirty buffer writer" in
>>>the original database documentation in version 6 and earlier it was
>>>"dirty buffer writer".) So, a dirty block contains data, while a clean
>>>one does not.
>>
>>
>>Extraordinary. Mike just keeps on re-writing the Oracle lexicon. A dirty
>>block is one with data in it (we all know it as a used block, or even
>>just a block below the high water mark). A clean block is an empty block
>>or a block above the high water mark. Fair enough. He has his own
>>private language on these matters that none of us can understand until
>>it's translated, but whatever keeps him happy, I suppose.
>>
>
>
> I assume he means a dirty block in the table is a data (as opposite to
> index)
> block which has changed since the last index rebuild.
There's no need to assume anything, David. Mike has told us what he means.
Quote: "So, a dirty block contains data, while a clean
>>>one does not."
> Presumably if the
> number (and hence percentage) of
>
> a) new blocks (i.e. rows put into the index in a sub-optimal manner)
Rows are not put into the index in a sub-optimal manner. They are placed
where their value determines they should be placed. A new entry for
"Borodin" must be inserted between one for "Beethoven" and one for
"Britten". That's not going to be placed "sub-optimally".
> b) deleted blocks which leave sub-optimal free space in an index
> i.e. leave free space in a block which could be occupied by other
> index entries using optimal placement.
Deleted entries can always be re-used for entries which need to make use
of it.
> c) updated blocks which have have rows moved and hence non-optimal
> placment of index entries.
I'm sorry. You're introducing a term here which has no meaning.
"Sub-optimal placement of index entries" can have no possible meaning,
because index entries must always be placed where their data value tells
us they must be placed.
> is large relative to the size of the table (relative being a subjective
> value
> on how much optimal index placement gains you and how often sub-optimal
> index placement occurs) than you rebuild. IT IS SUBJECTIVE.
It most certainly isn't. Mike Ault says that when the clustering factor
is bigger than the number of blocks in the table by a large margin (and
large he has defined elsewhere as about 8 times, but we don't need to
hold him to that for now), then a rebuild is called for.
It is objectively true that such a rebuild will leave both the
clustering factor and the number of table blocks unchanged, and hence
the measurement he is using to suggest an index rebuild will continue to
suggest a rebuild.
> If you have
> an idle system all Sunday night then why not rebuild after the weekly
> full
> backup to POTENTIALLY gain a slight increase in performance
> in the next week? What do you lose by using idle machine time to
> potentially improve (and certainly not decrease) performance?
Because you can't back up that "certainly not decrease" statement, can
you? I have seen perfomance slow down as a result of an index rebuild
because the index now has to start reclaiming the extents it lost during
the rebuild the very next time a user starts inserting into the table on
the Monday morning.
But I suppose this is progress: "we have an enormous maintenance window,
we have nothing better to do with our time or our CPU cycles, so why not
rebuild".
It certainly beats trying to claim there is an objective measure based
on clustering factor to tell you when to rebuild.
> Isn't that what these java 'hot spot' performance enhancing runtime
> on-the-fly optimzers are about. It's just doing the optimzing during
> idle time rather then the java ones trying to squeeze it in amongst
> actually
> running the application! Surely that's what Oracle self-managing and
> IBM self-optimizing stuff is all about - use idle time to help
> performance.
Regardless of the fact of the crassness of any approach that is
effectively saying "I've got nothing else to do so I might as well
rebuild my indexes", you are making one huge assumption: that a rebuild
always improves performance and never degrades it. That assumption is
simply not true.
> Automating this stuff is what the big guys are essentially pushing (even
> if
> they wrap it in sales talk).
>
> USE YOUR IDLE CYCLES MAN!!
A more sensible slogan might be "use your idle cycles intelligently, man".
Regards
HJR
> I am an Informix dba who is posting in an Oracle group. I do not know
> your terminology but I do not btree index structures.
It explains a lot.
>
>>can be put into an index in a sub-optimal manner or even what
>>"sub-optimal" actually means. I can't see what is sub-optimal about
>
>
> sub-optimal means having all the index entries in as few blocks as
> possible
> to read/update/delete the blocks required whilst still allowing room for
> expected
> growth. Having to read nore index blocks to satisfy a query is not
> optimal.
And how many extra blocks, precisely, do you think you have to read when
you create an index on EMPNO with 99% free space in the index nodes
compared to 0% free space, and your query says 'select * from emp where
empno=8867'? (I'll give you a clue. It's a nice round number. Very round).
What you are saying is that indexes can grow over time, and that affects
the speed with which we can perform index range scans. Absolutely true.
And absolutely nothing to do with the current discussion, which is about
whether the clustering factor of an index has anything to tell us about
whether to rebuild an index.
So you've effectively diluted a thread with observations which have
nothing to do with the matter being discussed. Mike will no doubt thank
you for the diversion.
It also does not follow that because a large index takes longer to scan
than a smaller one, that you should thus rebuild it. Because a large
index contains a lot of empty space which subsequent inserts into the
table can occupy without causing (expensive) block splits and extent
acquisitions. A compact index will block split and have to acquire new
extents as soon as users start doing DML on the table. There is
therefore a cost/benefit analysis to perform.
Index management is a lot subtler a subject than you appear to be aware.
>
> If the index entries you need to read can fit in less blocks than less
> reads are
> done and it is faster yes?
>
> If two index entries are in the same block than less writes are done when
> you
> update both rows at the same time, yes?
>
> If the index blocks are such that range scans can be done using
> sequential
> rather than random disk reads than it is faster yes?
Oh dear.
You presumably think that extents are contiguous on disk, or that making
them so speeds things up.
I give up. You're making silly assumptions and assertions without
apparently knowing very much about the matter.
> It is subjective
No it's not. Or it oughtn't to be. Mike Ault claims there is an
objective set of test you can perform that indicate when an index should
be rebuilt. I say that objectively his tests mean nothing. Objectively,
that can be measured, assessed and verified.
If we are all going to take refuge in the "it's all subjective"
argument, then we might as well all pack up and go home and never post
here again. "Anything goes, so long as it "feels" right to you"?
Very scientific. Not.
> and every case needs to be considered independently.
> You need to use some general rules
Some general rules that are verifiably true would help, don't you think?
>and the skill is in applying them to
> specific cases. We need a few examples but do not assume the hugh
> speedups you get from contrived examples will apply to you. Getting
> a small speedup from lots of little tweaks can add up to a lot -
> compound interest. Index rebuilds are just part of it.
>
> What else would you using idle time on the machine for? Running seti
> on all your servers...or sitting idle doing nothing? Of course even this
> advice needs to be balanced. Using every free second is not wise
> as well.
Well, it's an original argument, that's for sure. "I don't know how
indexes work, or whether a rebuild will speed things up or slow them
down, but I've got nothing better to do, so I might as well rebuild them
anyway".
It's not going to win you any prizes, I think.
HJR
How much did the chicken cost? How long did you spend cooking it? What
about the starving children down the road who could have actually made
rather better use of the chicken than Mr. Corpse? What about the fact
that Mr. Corpse is in his current predicament because he was suffering
from Ebola, which your totally superfluous soup ministrations have now
caused you to acquire?
So yes, doing anything because you've got nothing better to do but you
don't actually know what you are doing, is indeed a bad thing.
> There IS a ratio...Its
>
> effort to rebuild / benefit of rebuild < 1
>
> Of course, you can't precisely define the benefit until you've done it..
>
> :-)
I prefer a subtraction
benefit of rebuild - cost of rebuild > 0
Of course this suffers from the problem you identify twice, you can't
measure the benefit until you have performed the action, but you probably
can't measure the cost until you have performed the action either, not often
I agree with Rumsfeld but that unknowns quote seems appropriate here. I know
that I should be measuring the benefits and the costs, but i'm not sure what
either of them are.
cheers
--
Niall Litchfield
Oracle DBA
Audit Commission UK
http://www.niall.litchfield.dial.pipex.com/
> > > a) new blocks (i.e. rows put into the index in a sub-optimal manner)
There is no such thing as a "sub-optimal manner" in the Oracle
b-tree implementation since at least release 8i. Previous to that one
there wasn't either. But there were a few nasty bugs here and there
that could cause heaps of problems. Mostly gone.
> > > b) deleted blocks which leave sub-optimal free space in an index
> > > i.e. leave free space in a block which could be occupied by other
> > > index entries using optimal placement.
They ARE occupied by other index entries using optimal placement...
> > > c) updated blocks which have have rows moved and hence non-optimal
> > > placment of index entries.
Repeat after me please: In Oracle, a move of a row in a data block
will never ever cause a "non-optimal placement" of its corresponding
entry in the index. The ONLY thing that can happen is for the rowid in
the index to be updated to a new one. That is NOT "non-optimal"
by ANY stretch of the imagination.
> I am an Informix dba who is posting in an Oracle group. I do not know
> your terminology but I do not btree index structures.
Cool. Welcome. I hope you will concede that b-trees a-la
Informix are NOT the same as b-trees a-la Oracle?
> sub-optimal means having all the index entries in as few blocks as
> possible
That is controlled in Oracle by the index creation process.
It's fixed and it has no progressive disorganization.
> to read/update/delete the blocks required whilst still allowing room for
> expected
> growth. Having to read nore index blocks to satisfy a query is not
> optimal.
The only way that can really happen in Oracle is if the number of index
branch levels increases. You need a LOT of inserts for that to happen
in a significant manner. And I mean a *LOT*!
> If the index entries you need to read can fit in less blocks than less
> reads are
> done and it is faster yes?
The only way that can happen is if someone changes the allocation
parameters for the index and re-creates it or if the length of the keys
has changed. But none of that happens as a result of progressive
degradation of the index.
>
> If two index entries are in the same block than less writes are done when
> you
> update both rows at the same time, yes?
Sure. And it ain't gonna change over time. Unless your key size
has changed DRAMATICALLY.
> If the index blocks are such that range scans can be done using
> sequential
> rather than random disk reads than it is faster yes?
Yes. Absolutely. Compactness. And re-building an index gives you
squat guarantee it will happen. You have to do a LOT more than just
rebuild to ensure that.
> Is depends on how many rows you will insert. If inserting no more rows
> or few rows (e.g. customer table where you do not get many new customers
> per month and most of you business is repeat business than you need a
> small
> percentage free than a phone records table for a phone company!). If this
> is a
> table that get no more rows (e.g. prices table where prices only change
> once
> per year) than yes having no free space is optimal).
Yes, it CAN affect the index compactness. At build time.
Not during normal use. Once you define the per block free space,
it is the same for ever new block. It ain't gonna get worse because
you pumped a few million rows into the table...
> speedups you get from contrived examples will apply to you. Getting
> a small speedup from lots of little tweaks can add up to a lot -
> compound interest. Index rebuilds are just part of it.
Yes, but the problem is that we all doubt that an index re-build
provides ANY speed improvement, let alone small increments.
IF you also change the NATURE of the index, then YES INDEED you may
see an improvement. That is the case when you change the number or
order of columns in a concatenated index. Or you may also change the
space allocation parameters for the index to make it more compact.
Or even the actual columns being indexed.
Sure. But that is NOT the rebuild we're talking about. That is design
tuning, which can be done at ANY time in ANY database and have good
to excellent performance results. The "index rebuild" that is in
question here is the arbitrary ALTER INDEX REBUILD used as a "preventive"
maintenannce strike by some DBAs and which serves no purpose whatsoever
and achieves nothing, nowadays. Kinda like reorganising tables to reduce
number of extents.
> What else would you using idle time on the machine for? Running seti
> on all your servers...or sitting idle doing nothing? Of course even this
> advice needs to be balanced. Using every free second is not wise
> as well.
>
I'd settle for having ANY free time...
;)
Cheers
Nuno Souto
wizo...@yahoo.com.au.nospam
> There is no such thing as a "sub-optimal manner" in the Oracle
> b-tree implementation since at least release 8i.
>
> Cheers
> Nuno Souto
> wizo...@yahoo.com.au.nospam
I am beginning to wonder how much of this thread is based upon
information that may have been valid many releases ago but is
no longer valid. Certainly some of the terminology used is not
words that have been used to describe things in more than a decade.
You know, when all is said and done, the simple, code, hard fact is that you
really *don't* understand the significance of the CF and how Oracle
implicitly maintains the CF. You *don't* understand how to alter the CF and
you *don't* understand how rebuilding an index has *no* effect on the value
of the CF.
Mike, you just don't get it !!
And because you don't get it you, I understand now how you just don't see
how the quote of yours in my presentation is so utterly and totally wrong.
Ignorance is making you blind to the truth.
See comments below ...
"Mike Ault" <mike...@earthlink.net> wrote in message
news:37fab3ab.04050...@posting.google.com...
Not necessarily at all. This implies that the previously inserted rows are
mostly aligned with the index entries. With a random index (say a table
containing a list of random customers), the customer name index may be never
be aligned with the index entries. Mr Simith can be immediately followed by
Mr Adams that can be immediately followed by Mr Zulu that can be immediately
followed by Mr Foote, etc. in the *table*. But the *index* is always sorted
in the order of the index. So if you delete say Mr Adams and replace it with
another row where the Mr Adams row previously existed, the impact with the
CF is also random (it may be 1 better, 1 worse or unchanged)
Also, even if you happen to have *1* index that currently has a good CF that
may be impacted by subsequent deletes and inserts, what about all the
*other* indexes on the table. You likely only have 1 index that has a good
CF anyways, the table can only be ordered significantly on more than 1
column.
Also, you continually confuse the *physical* and *logically* location of the
index nodes as if they have some significance with regard to the CF. They do
*not* !! It's the physical order of the rows in the *table* in relation to
each of the indexes that determines the CF.
The CF is a value that represent the number of physical reads of the *table*
likely required to read the entire *table* via a full scan of the index. The
index *must* be sorted in the order of the index entries. It must.
Therefore, you are concentrating on the wrong object when you're trying the
determine the impact of inserts and deletions on the CF. The logical order
of the index is not impacted by DML. Therefore, when navigating the index
during a full scan, you will always read the values starting by A, then the
values starting by B. It's how well aligned the table is in relation to the
index that's important.
Mike, you've got it the wrong way around !!
>
> Add to this the various node splitting algorithms Oracle uses for
> non-sequential inserts and updates and you can easily see why
> clustering factor increases and can become out of sync with reality.
This is totally and fundamentally wrong. An index node split has *NO* impact
on the CF. None. After the node split, the index entries are still
*logically* ordered. Therefore a full index scan will required exactly the
same estimated physical reads to read the *table* before the index split as
it does after the index split. When you insert an index entry, it has *NO*
impact on the CF. None !! It's where the *row* in the *table* is physically
located in relation to the index(es) that's relevant.
Mike, this is where your logic is no longer aligned with reality. Your
statement above *proves* you simply don't understand the CF. Sad but very
much true.
> An index rebuild coalesces nodes and aligns them with the underlying
> table.
Mike, Mike Mike ...
This is totally, absolutely and completely wrong, wrong, wrong !!
When you rebuild an index (or coalesce), the logical order of the index
entries remain exactly the same. Let me say that again, it might just help.
*When you rebuild an index (or coalesce), the logical order of the index
entries remain exactly the same.* Therefore, after the rebuild, even if you
reduce the number of nodes by half, the logical order of the index entries
remain exactly the same. Therefore, the number of physical I/Os required to
read the *table* remains *exactly* the same. It must. Therefore, the CF
remains *exactly* the same. It must.
For some reason, you are under the delusion that the CF is impacted if you
rebuild an index. If you rebuild an index, the CF is not impacted.
Again, the statement above, which to your embarrassment you keep repeating
publicly *proves* that you do not understand how the CF works. If you did,
you would know that the CF is not changed by simply rebuilding the index.
>Now, in many cases this reduces the clustering factor,
Mike no. This is getting silly. By simply rebuilding an index, the CF is
*not* changed. Please, give us one demo, just one, only one, just a tiny,
tiny little one demo that show the CF of an index changing after a simple
rebuild of an index. Like I requested in my previous email that you've
conveniently not replied to, you simply can't.
>however, I have seem it stay the same (usually on truly random keys such as
> text, or concatenated columns),
actually, you've seen it stay the same *every single time* you've bother to
look
>decrease (the desired outcome) or
> increase!
Mike, only because (possibly) your statistics were not up to date to begin
with.
Mike, please, please, please, please show one example where you have fresh,
up to date stats, you perform an index rebuild, you take new stats and the
CF has changed. Just one example. I throw this out to anyone listening in.
Please pick any index, no matter how "old", no matter how "broad", no matter
after any number of "node splits", no matter what "ratio" of dirty or brown,
or green or pink with yellow spots blocks, no matter what, analyze the
index, note the CF, rebuild the index, check out the new CF. In *every*
case, the CF will be the same. Everyone, please prove it for yourselves ....
>
> However, I may have stated things unclearly, the goal in index
> rebuilding is not to reduce clustering factor, that is actually a
> desired by-product,
No Mike it is not. Please someone, how many times has Mike made this
fundamental error ? Not just here but in his other posts as well. I've truly
lost count. And no matter how many more times he says it, it simply is not t
rue.
Anyone who truly understands the CF would realize this.
> the goal is to ensure that the index is properly
> aligned with the underlying table
But Mike, that's what the CF is, only the other way around !! A measurement
of how aligned the *table* is in relation to the index with the CF value
representing the estimated number of physical I/O required to read the
entire table based on a full index scan of the corresponding index.
Why can't you see that ...
> and that its entries are not sparse.
> Clustering factor ratios are just one of several indicators that can
> tell you an index needs to be investigated.
Mike. You are publicly recommending rebuilding indexes based on some ratio
(how typical) of blocks to CF. If the ratio is too high, rebuild the index.
Problem is Mike, after the rebuild, the CF remains *the same*. So you then
recommend that this index be rebuilt again. And again, and again, and again,
and again, and again, and again, and again, and again, and again, and again,
and again, and again, and again, and again, and again, and again, and again,
and again, and again, and again, and again, and again, and again, and again,
and again, and again, and again, and again, and again, and again, and again,
and again, and again, and again, and again, and again, and again, and again,
and again, .....
Doesn't that strike you as being rather pointless. Hummm, me too.
And that Mike, is why your quote is featured in my presentation. Because
such a recommendation is total and utter tripe. *No matter the context* !!
Fortunately, your posts have confirmed you lack understanding in this area
and the validity of using your quote in the first place. Hopefully, you will
now, finally get it and realize the errors of your understanding.
And hopefully, finally, you'll be big enough to admit it and maybe even
considerate enough to thank me (rather than attack me) for showing you the
light.
You're confused a lot of people with your rubbish but if it means finally,
such rubbish will now cease, it might just be worth it. My presentation is
simply an attempt of mine to try and put such rubbish where it belongs. In
the bin.
One last time, repeat after me. The CF is *not* changed after an index
rebuild.
Cheers
Richard
Hi Daniel,
May I suggest none.
This thread is based *entirely* on someone's misunderstanding of how the CF
works in Oracle and someone's misunderstanding that a way to improve the CF
is to rebuild an index. If someone can just grasp the concept that the CF
remains unchanged after an index rebuild, much of the confusion will
disappear.
I'm doing my best to make this happen both with my presentation and with my
contributions in this thread ...
Regards
Richard
</snip>
And show us how application performance improves as a result. And I'd
like to see more than just SELECT statements improving. One also has to
take into account DML.
Cheers,
Brian
--
===================================================================
Brian Peasland
dba@remove_spam.peasland.com
Remove the "remove_spam." from the email address to email me.
"I can give it to you cheap, quick, and good. Now pick two out of
the three"
>
> I'm doing my best to make this happen both with my presentation and with
> my contributions in this thread ...
Might I suggest a version of your presentation that can be presented by
others using a variant of the GPL OR some other way of easily getting
permission to present your material.
(I would love to have you present to our local UG here in Canada. However I
suspect the compensatory air fare and accomodations would be difficult to
arrange.)
/Hans
Welcome back, we've all missed you, we really have.
I'm really looking forward to reading your post, it's sure to be full of
accurate, interesting, technical information ...
"Don Burleson" <d...@burleson.cc> wrote in message
news:998d28f7.04050...@posting.google.com...
> > In the context of the present discussion, however, was is most
> > interesting is the distance that apparently now exists between Don and
> > Mike on the issue. I hope the divorce won't be too messy.
>
> As usual, totally wrong. . .
>
> Actually, anybody with the balls to bother arguing with you losers
> should be commended.
Now Don, that's not very nice (and not very professional either).
We're not arguing with Mike, we're just trying to show him why he is so
utterly wrong. Hopefully, it might eventually sink in.
Why are you calling all of us "losers" ? It seems a remarkably ungracious
thing to say considering all the many times your "expert" articles have been
discussed here and considering the many times we have highlighted the
numerous errors that they've often contained. These highlighted errors have
been most beneficial to you because of the numerous times they have led you
to correct and in some cases totally rewrite your articles as a consequence.
Don, please tell us. How many times have you *directly* made amendments to
your articles based on what I and others have discussed here. Please Don,
how many times ? And how many of those times have you publicly acknowledged
these contributions ? Please Don, how many times have you thanked and
publicly acknowledged changes you're made to your articles based on
discussions here ?
I really don't think "losers" is a nice thing to call people who have helped
with your knowledge of Oracle do you ?
>
> In fact, I'm so impressed with his posts here that I just hired him.
> He starts in 3-weeks:
>
> http://dba-oracle.com/oracle_news/2004_5_11_burleson.htm
>
So you've hired Mike to work for you based on his posts here in the last few
days ? That's quite remarkable !! Not sure how many other people would hire
someone who :
- has no idea how Oracle indexes reuse space and thinks index entries are
only reused if identical (as per his quote he's attempted to defend)
- incorrectly thinks that the CF is affected by index splits
- incorrectly thinks that the CF is affected by how "broad" an index is
- incorrectly thinks that the CF is affected by DML changes to the *indexes*
- incorrectly thinks that an index should be rebuilt based a ration between
the CF and the blocks in a table
- incorrectly thinks that an index rebuild will improve or alter the CP
- in summary, has no idea about the CF
- thinks that indexes should be rebuilt based on x (2) levels in an index
(also per the quote he's attempted to defend)
What's perhaps worse, is that when faced with the cold hard facts of logic
and the technical truth, he either chooses not to or is incapable of
accepting it.
And this has *impressed* you ? Only you Don, only you. You have an
interesting selection criteria to say the least ...
> If you want to retain him to explain index internals to you, here are
> his rates:
>
> http://www.dba-oracle.com/consulting_prices.htm#ault
You mean you're going to charge people to have Oracle myths in relation to
index internals explained to them !! That doesn't sound quite fair to me ...
I tell you what Don, here's the deal. I'm going to give you the perfect
opportunity to prove all your doubters wrong. This will be the perfect
opportunity to show to the world what an Oracle "expert" you really are. You
too make the claim that the CF is a reason for an index rebuild, your
memorable "Index Internals" article said so as per your quote in my
presentation (what do you think of my presentation btw).
All you have to do is simply explain to us all how Mike is right and I'm
wrong.
1) Why, why should the CF improve when you rebuild an index ? What are the
technical reasons for such a phenomenon ?
2) How, how does a rebuild that results in no change to the index order
result in an improved CF ? What are the magical steps that Oracle performs,
please a technical description ?
3) Where, where is a demo that proves that a rebuild changes the CF ? Where
is your proof ?
4) and finally in relation to your specific quote in my presentation, please
explain how an Oracle index can become unbalanced. Please explain to us all
why and how an Oracle index will "spawn to a fourth level only in areas of
the index where a massive insert occurred, such that 99% of the index has
three levels, but the index is reported as having four levels". How is this
possible in Oracle ?
This is your chance to make a *positive* contribution by explaining why you
and your new employee believe in such things.
And a final little question for you Don. How come most of all your previous
posts here in these newsgroups have totally disappeared from the google
archives ? Most of them have simply vanished, including your "classic"
comments in the Index Rebuilding thread where you initially made your quote
above. Where have they all gone ? Are you devastated that all your wonderful
contributions have gone, or have you something to hide ?
Just interested Don.
But please, answer the questions above, I would dearly love to hear your
technical views on these things ...
Cheers
Richard
</snip>
That wasn't clear to me, maybe because I didn't get involved with Oracle
until v7.1. However, I'm not sure that I'll be referring to my data on
disk as "dirty". It's good data. It's valid data. But thanks for the
explanation of your term.
<snip>
> Now, as a block fills and then reaches pctused, it is taken off of the
> free list and is not eligible for any more insertions, however it can
> under go updates which can cause row chaining, but that is another
> issue.
</snip>
Didn't you mean row migration?
<snip>
> Add to this the various node splitting algorithms Oracle uses for
> non-sequential inserts and updates and you can easily see why
> clustering factor increases and can become out of sync with reality.
> An index rebuild coalesces nodes and aligns them with the underlying
> table. Now, in many cases this reduces the clustering factor, however,
> I have seem it stay the same (usually on truly random keys such as
> text, or concatenated columns), decrease (the desired outcome) or
> increase!
</snip>
Just to make sure that we are all talking about the same clustering
factor....
According to the Oracle docs, the clustering factor "Indicates the
amount of order of the rows in the table based on the values of the
index". How does rebuilding an index alter the clustering factor? An
ALTER INDEX REBUILD does not change the order of rows in the table. Pure
and simple. And Howard has a nice example which illustrates this exact
point.
Is it possible that your definition of "clustering factor" differs from
the above? If so, can you please post your definition? If not, then I'm
afraid that your statements are false.
Hi Hans,
There's a white paper that goes hand in hand with the presentation that
discusses all the concepts in a lot more detail. I'm still polishing the
thing up but time is always the killer. I'm currently a technical editor for
a great new book from the OakTable gang which is taking up a lot of my spare
time. If I do my job right, it should be "myth free" !!
I should finally finish the damn thing in a week or two, I'll keep you all
posted.
I'm glad you like it, I can't believe the positive feedback it's all
received (except from the odd, disgruntled Oracle "author" !!).
I would just *love* to go to Canada :)
Cheers
Richard
As usual, you only quote the part of the posting that supports your
argument and ignore other relevant data. Using this method I can prove
that because we in the USA drive on the opposite side of the road than
those in England, we get tornados and they don't. As I have tried to
tell you in words of one syllable or less, the clustering factor to
"used base table block" ratio is only one thing that should be used to
see if an index needs to be rebuilt. The number of levels and the
amount of "white" space in the index are also determining factors. As
someone so appropriately stated, rebuild the index when the benefits
of doing so exceed the cost, unfortunately you can't determine this
until after they are rebuilt. I am just trying to provide guidelines
as to when the index becomes a candidate for rebuild. Notice, I said
"guideline" not absolute rule, not law of nature, just guideline.
You want to force me into this idiot box you have created, sorry, I
don't fit, I hope you like it in there. If you refuse to see the
entire picture, then no one can correct your misunderstandings.
The idea that you should optimize an index for inserts and updates is
totally odd to me as indexes are used for optimization of select
processing. If indexes are causing that big of a problem during
inserts and updates then maybe you need to review what you are
indexing and why. Avoiding index rebuilds by crippling the performance
of indexes by making them too-broad (i.e. sparse) is also an odd idea.
The next time I have a client who needs to consider index rebuilds to
improve performance I will capture some stats for the list and will
post before and after data. Those that want to see what report I use
to gather the index stats data can email me, I supply it to any who
ask. I have no magic formula for clustering factor, Oracle calculates
that readily enough and provides it in the dba_indexes view. The
number of dirty, used, occupied or whatever euphemism you wish to use,
blocks is also readily available in dba_tables. A ratio is determined
by the division of two values, so when I say the ratio of clustering
factor (a single, easy to obtain number) to dirty blocks (another easy
to obtain number) then I have stated my formula. Since the actual
value of this ratio for "badness" varies from table to table and index
to index, I usually just say, when it exceeds several times the number
of dirty blocks and when really pressed I say start at 8 to 10 times
the value and do further analysis (many people want a hard number,
sorry, there isn't one.)
I don't say, rebuild all indexes, which would be as bad as never
rebuild indexes. Both of these can lead to problems. Analyze (as in
look at various statistics), rebuild when indicated. Think for
yourself.
Mike Ault
> Going around in circles is movement,
> but only few folks would consider it progress.
My tires on my vehicle went around in circles this morning. I eventually
got to work. Since it is Monday, I'd consider that progress!!!
Can you show us some concrete examples of how rebuilding an index
changes the clustering factor?
> The idea that you should optimize an index for inserts and updates is
> totally odd to me as indexes are used for optimization of select
> processing.
So an index can't be used for UPDATE statements? What if I issued the
following:
UPDATE emp
SET salary=salary*1.03
WHERE empid=1001;
Surely an index on EMPID would be beneficial in this case if the table
is sufficiently large (making the normal assumption in this case that
EMPID is unique).
You didn't mention DELETE statements, so I'm just adding it here....an
index can also help certain DELETE statements as well. Taking my above
example....
DELETE FROM emp WHERE empid=1001;
If the table is sufficiently large, then an index can help here too.
I strongly disagree with your statement "indexes are used for
optimization of select processing." As I've stated above, an UPDATE and
a DELETE statement can use indexes to speed processing. Here is an
example of an EXPLAIN PLAN on a simple DELETE statement:
ORA9I SQL> set autotrace on explain
ORA9I SQL> delete from emp where empno=1001;
1 row deleted.
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=13)
1 0 DELETE OF 'EMP'
2 1 INDEX (UNIQUE SCAN) OF 'EMP_PK' (UNIQUE) (Cost=1 Card=1
Bytes=13)
If an index where, as you say "used for optimization of select
processing" then why use the index in the DELETE statement?
As for INSERT statements...an index won't *help* a statement similar to
the following:
INSERT INTO emp (empid,ename) VALUES (1099,'Bob');
However, rebuilding an index and losing any white space can slow down
the above insert should a block split need to take effect. Instead of an
index on EMPID (which is a bad example for this case), an index on ENAME
would show the point. If my index on ENAME had leaf values similar to:
.... Bill, Brad, Brian, ....
And I delete 'Brad', then when I insert 'Bob', it will take the slot
previously occupied by 'Brad'. If I rebuild the index and remove that
slot, then an insert of 'Bob' can require block splits.
> I have no magic formula for clustering factor, Oracle calculates
> that readily enough and provides it in the dba_indexes view. The
> number of dirty, used, occupied or whatever euphemism you wish to use,
> blocks is also readily available in dba_tables. A ratio is determined
> by the division of two values, so when I say the ratio of clustering
> factor (a single, easy to obtain number) to dirty blocks (another easy
> to obtain number) then I have stated my formula.
Thanks for providing that. I now know that your clustering factor is the
same as Oracle's definition, since you are using the CLUSTERING_FACTOR
column of DBA_INDEXES. And I now fully understand your ratio you have
been talking about.
> I don't say, rebuild all indexes, which would be as bad as never
> rebuild indexes. Both of these can lead to problems. Analyze (as in
> look at various statistics), rebuild when indicated. Think for
> yourself.
On this last point, you and I agree, sort of. I don't say rebuild
indexes just because of some magic number. And I don't say to never
rebuild indexes. There are times to rebuild indexes and times not to.
But for me, those times are more dictated by the expected flow of data,
not by some magic number. If an index has a tendancy to get one-sided
(normally due to the use of a monotic sequence as the value to the
indexed column followed by deletions of "older" values) then rebuilding
periodically can help. And if I delete massive amounts of data and I
don't expect to add much more data in the future, then rebuilding can
help. And sometimes rebuilding an index can help in one area and hurt in
others, but the one area that I am fixing is more important to me than
the other areas. But in none of these circumstances can I arrive at any
ratio which defines the need to rebuild or not. There is simply no query
you can apply the index and table metrics to arrive at these
conclusions. A DBA has to know about the data flows in, and out of,
their database.
"Mike Ault" <mike...@earthlink.net> wrote in message
news:37fab3ab.04051...@posting.google.com...
> Snip>"To distill my advice on index rebuilds, I say to only rebuild
> indexes
> which show, through proper analysis, to have problems. Examples of
> possible probelms are: a large clustering factor to dirty base table
> block ratio, [and he does go on to list a number of other tests which
> are not at issue in this thread]">
>
> As usual, you only quote the part of the posting that supports your
> argument and ignore other relevant data. Using this method I can prove
> that because we in the USA drive on the opposite side of the road than
> those in England, we get tornados and they don't. As I have tried to
> tell you in words of one syllable or less, the clustering factor to
> "used base table block" ratio is only one thing that should be used to
> see if an index needs to be rebuilt. The number of levels and the
> amount of "white" space in the index are also determining factors. As
> someone so appropriately stated, rebuild the index when the benefits
> of doing so exceed the cost, unfortunately you can't determine this
> until after they are rebuilt.
> I am just trying to provide guidelines
> as to when the index becomes a candidate for rebuild. Notice, I said
> "guideline" not absolute rule, not law of nature, just guideline.
Thanks for the nice remark - I do try to stay on topic most of the time :).
There is a quite handy guideline for when to do index rebuilds (defined as
issue ALTER INDEX ... REBUILD..). That would be 'don't do it'. I'd hope that
guidelines covered the usual state of affairs and not unusual or edge
scenarios.
You have offered the following if I understand you correctly.
1. Ratio of DBA_INDEXES.CLUSTERING_FACTOR/(DBA_TABLES.NUM_BLOCKS -
DBA_TABLES.EMPTY_BLOCKS)
This cannot ever be changed by ALTER INDEX ... REBUILD all 3 components
refer to the table not the index. There is no way that
this makes sense for a guidline I'm afraid. Your good indexes after 'fixing'
will have the same ratio they did before you fixed them and thus will be
candidates for fixing again.
2. The number of levels in an index.
This *can* be changed by ALTER INDEX ... REBUILD but such a change will be
very rare indeed most 4 level indexes for example are that height because
they have to be because of the number of rows they are indexing. In addition
it is likely to only save you a very few logical IOs (usually precisely 1
per sql statement). Thus I'd be very wary of this factor as well.
3. The amount of 'white space' in an index. This would seem to be the most
likely factor to consider - I assume you mean large numbers of almost empty
index blocks. For queries that do index access by primary or unique key this
will also be irrelevant (it will make no difference at all to the IO if you
rebuild the index). For some range scans this *might* lead you to rebuild
the index, I guess then the question will be how many index leaf blocks
fewer will you visit. This however is the change most likely to have a cost
associated with it. AFTER you rebuild the index in a tightly packed manner
you are highly likely to have increased leaf block splits due to DML.
> The next time I have a client who needs to consider index rebuilds to
> improve performance I will capture some stats for the list and will
> post before and after data
I appreciate this offer - I'd like to see the before and after stats.
> I don't say, rebuild all indexes, which would be as bad as never
> rebuild indexes.
Never rebuild any b*tree indexes is actually much better advice than rebuild
all indexes :(. It goes too far, but not much too far. The onus will always
be on the person advocating a change to an object to make the case -
preferably with numbers ahead of time to back it up.
> Both of these can lead to problems. Analyze (as in
> look at various statistics), rebuild when indicated. Think for
> yourself.
Cheers
Don't be bloody stupid Mike. I quoted the part which makes reference to
the thing under discussion, namely the clustering factor. The other
tests, which I acknowledged existed, aren't to do with the clustering
factor.
Mike, in case you haven't noticed, we're not particularly interested in
this thread any more with your words of wisdom on a range of topics.
We are interested in one simple fact: do you actually have even the
vaguest idea of what an index's clustering factor is?
That's all.
> Using this method I can prove
> that because we in the USA drive on the opposite side of the road than
> those in England, we get tornados and they don't. As I have tried to
> tell you in words of one syllable or less, the clustering factor to
> "used base table block" ratio is only one thing that should be used to
> see if an index needs to be rebuilt.
And as I have gone to great lengths to point out, since an index rebuild
DOES NOT change the order of entries in an index; and since an index
rebuild DOES NOT change the order of rows in a table; and since the
clustering factor is a measure of the correlation between those two sort
orders... the clustering factor will remain precisely, exactly the same
AFTER a rebuild as BEFORE.
Now I don't care if you have 57 other tests to determine whether an
index should be rebuilt: it's not relevant to this thread. You happen to
have this one particular test in your kit bag... and it is *that* *one*
*test* that is under discussion here, and which usefulness or lack
thereof is being assessed.
The scope of this thread, despite many diversionary tactics by some, is
really very narrow. Does Mike Ault's test for clustering factor/in-use
table blocks ratio have any actual meaning or value? That's all.
Once we get an answer to that, I suppose readers will be able to draw
their own conclusions about a related, but subsidiary matter: Does Mike
Ault have a clue about what he's talking about?
But we can't answer that question until we get an answer to the
clustering factor one.
>The number of levels and the
> amount of "white" space in the index are also determining factors. As
> someone so appropriately stated, rebuild the index when the benefits
> of doing so exceed the cost, unfortunately you can't determine this
> until after they are rebuilt. I am just trying to provide guidelines
> as to when the index becomes a candidate for rebuild. Notice, I said
> "guideline" not absolute rule, not law of nature, just guideline.
Guideline or 11th commandment. It doesn't matter if, whatever it is, is
meaningless drivel, wouldn't you say? So the question before us is to
assess whether this particular guidelines actually tells us something
useful, or is a complete waste of time.
> You want to force me into this idiot box you have created, sorry, I
> don't fit, I hope you like it in there. If you refuse to see the
> entire picture, then no one can correct your misunderstandings.
Blather, blather, blather. Jeez Louise: you're making an awful mess as
you try and wriggle and slither out of this one.
No-one has put you in an idiot box Mike. I spent several posts trying to
work out what a 'dirty table block' was. I've now spent several posts
trying to work out what you mean by 'clustering factor'. I have, in
short, spent considerable time and effort trying to get you to elaborate
what you mean, precisely so that I don't mischaracterise what it is that
you're saying.
I wanted to be absolutely 100% sure that what you called the clustering
factor is what ordinary users of Oracle would know as the clustering
factor. That wasn't to force you into a box. It was merely to be certain
we are talking the same language.
Because *when* you use Oracle's definition of clustering factor, which
is invariant after an index rebuild, your proposed test, 1 of several
hundred I am sure, is *stupid*. A complete waste of time. A wild goose
chase.
Now, you can defend your test if you like, and prove it does what you
claim it can do. You can post some actual evidence that the clustering
factor is variable under the impact of an index rebuild. Or you can try
and kick up some "I've-never-been-so-insulted-in-my-life" diversionary
dust so we all don't notice that you can do none of those things.
I'll give you a clue, though, Mike. I haven't put you in an idiot box,
but your continual refusal to post anything resembling *evidence* is
making an awful lot of people consign you to that location in their own
minds. One simple bit of evidence that we can all try out at home would
be enough to severely embarrass me, and justify you. Just one.
> The idea that you should optimize an index for inserts and updates is
> totally odd to me as indexes are used for optimization of select
> processing. If indexes are causing that big of a problem during
> inserts and updates then maybe you need to review what you are
> indexing and why. Avoiding index rebuilds by crippling the performance
> of indexes by making them too-broad (i.e. sparse) is also an odd idea.
Well, I have news for you Mike. In the wide world of "odd", nothing is
odder than advocating spending more than, Oh I dunno... a nanosecond?,
on calculating a ratio to indicate whether an index should be rebuilt.
Only to have that ratio remain unaltered AFTER the rebuild. That really
*is* odd advice.
> The next time I have a client who needs to consider index rebuilds to
> improve performance I will capture some stats for the list and will
> post before and after data.
You mean, you don't actually *have* any data to hand? You mean, the
author of so many Oracle books, the self-proclaimed expert, can't
actually lay his hands on a bit of research data right now to justify
his preposterous claims?
Why am I not surprised?
And no doubt this is also the answer to those several posters here who
have asked to see a demo they can run at home to see the clustering
factor change after a rebuild: no you can't kiddies, because this is
much too hard and complex to be easily reproducible. World-renowned
expert has to acquire data from big, important clients before the effect
of which he has written ever becomes visible.
It's a total load of Horlicks, bombast and fudge. Which is not as
delicious a confection as it might sound.
> Those that want to see what report I use
> to gather the index stats data can email me, I supply it to any who
> ask. I have no magic formula for clustering factor, Oracle calculates
>
> The
> number of dirty, used, occupied or whatever euphemism you wish to use,
> blocks is also readily available in dba_tables. A ratio is determined
> by the division of two values, so when I say the ratio of clustering
> factor (a single, easy to obtain number) to dirty blocks (another easy
> to obtain number) then I have stated my formula. Since the actual
> value of this ratio for "badness" varies from table to table and index
> to index, I usually just say, when it exceeds several times the number
> of dirty blocks and when really pressed I say start at 8 to 10 times
> the value and do further analysis (many people want a hard number,
> sorry, there isn't one.)
Right. Finally, we have an answer.
You DO use DBA_TABLES.BLOCKS. And you DO use DBA_INDEXES.CLUSTERING_FACTOR.
And you therefore cannot possibly, ever, have seen the ratio change
after a simple alter index X rebuild command for a regular old b*tree index.
Not once, not ever.
And any claims you make to that effect are just so much hot air.
And we are left with the uncontestable and astonishing fact that Mike
Ault, prolific "author" and self-proclaimed "expert", advocates the
calculation of a ratio which DOES NOT CHANGE after you take the action
he claims you should consider taking when the ratio is bad.
Mike: when will the penny finally drop? Users don't want hard numbers or
soft numbers, but MEANINGFUL numbers. And truthful ones.
> I don't say, rebuild all indexes, which would be as bad as never
> rebuild indexes.
Sniff. Uh huh. Smells like a strawman.
Nobody said you ever suggested "rebuild all indexes", Mike. We've just
had it confirmed in your own words, however, that you do recommend
thinking about rebuilding indexes if the ratio of clustering factor to
in-use table blocks gets too high. That is quite bad enough.
>Both of these can lead to problems. Analyze (as in
> look at various statistics), rebuild when indicated. Think for
> yourself.
I couldn't agree more. Let's be just a *little* bit more specific,
however: "Don't let Mike Ault do the thinking for you, because at least
in one measurable and testable respect, he hasn't a clue what he's
talking about".
There. A little more accuracy never did anyone any harm.
HJR
>
> Thanks for providing that. I now know that your clustering factor is the
> same as Oracle's definition, since you are using the CLUSTERING_FACTOR
> column of DBA_INDEXES. And I now fully understand your ratio you have
> been talking about.
And that's it? That's all you have to *say* about this ratio that you
now fully understand?
Would you not care to elaborate just a little on how this ratio behaves
after an index rebuild?
Regards
HJR
> On this last point, you and I agree, sort of.
And it's a point Mike just made up. No-one has ever accused him of
advocating rebuilding all indexes.
> I don't say rebuild
> indexes just because of some magic number. And I don't say to never
> rebuild indexes. There are times to rebuild indexes and times not to.
> But for me, those times are more dictated by the expected flow of data,
> not by some magic number. If an index has a tendancy to get one-sided
> (normally due to the use of a monotic sequence as the value to the
> indexed column followed by deletions of "older" values) then rebuilding
> periodically can help.
See, this is where that paper you wrote is wrong, too, IIRC. If the
index has got "lop-sided" because you deleted a lot of older values,
then you now have a lot of empty leaf nodes on the left-hand edge of
your index, and those empty blocks can now receive the next inserts from
your monotonically incrementing sequence number. Which means that a
rebuild is *not* warranted.
If you have performed *scattered* deletes, different story. There are
always exceptions, after all.
> And if I delete massive amounts of data and I
> don't expect to add much more data in the future, then rebuilding can
> help.
Well, I'd suggest that if you'd done that to your table, you are going
to get a much better performance improvement from re-compacting your
*table* (alter table X move), because you are now suffering from High
Water Mark Inflation Syndrome. Of course, after a table move, an index
must be rebuilt for it to be usable. And we could discuss, forever I
suspect, how to disentangle the effects of both operations on
performance. But I think anyone who would carefully rebuild their index,
but happily leave their table full of fresh air, has lost the plot.
In which case, we are *still* not talking about just rolling out 'alter
index X rebuild' commands.
>And sometimes rebuilding an index can help in one area and hurt in
> others, but the one area that I am fixing is more important to me than
> the other areas.
That's getting rather vague (which I think is really your point: these
things are vague and subtle and no one number is going to give you the
green light or red). But I'll buy into it. There are always the odd
exceptions.
> But in none of these circumstances can I arrive at any
> ratio which defines the need to rebuild or not. There is simply no query
> you can apply the index and table metrics to arrive at these
> conclusions.
The starting point for any discussion you may have about the need to
rebuild an index should be, I think, that Tom Kyte has rebuilt 7 of them
in his entire professional career.
And I don't mean he just got unlucky with the allocation of keyboard
duties over time, but that he positively and actively decided to rebuild
only 7.
Your justification for rebuilding an index when taken in that light had
better be pretty good, I think.
Regards
HJR
All I said was that I understood his ratio and how it was computed. I
guess I shouldn't have used the word "fully". But I now know how Mike
computed it. I never said anything about how this ratio behaves after an
index rebuild. And if you've read my other posts in this thread, you'll
see that I agreed with you on the clustering factor not changing. And
since the number of blocks in the table doesn't change, it should be a
logical conclusion that I do understand that this ratio *does not*
change after an index rebuild. If you are looking for more, then you'll
have to ask Mike. All I said was that I now understood how Mike
calculated his ratio...
Well, no... you said you 'understood it (the ratio) fully'. Implying,
I'm afraid, that there is actually some informational content in this
ratio that is to be understood.
Because one can 'understand how Mike calculated his ratio', and one can
'fully understand the ratio'. And those aren't the same thing at all.
But since you clarify, that's fine.
Regards
HJR
I would disagree with you here. The empty leaf nodes on the left-hand
edge of the index cannot receive new inserts because those inserts will
always be on the right hand side! I'll draw a crude diagram to
illustrate my point. I have a table called EMP with an index on EMPID.
The EMPID column is populated from my EMPID_SEQ sequence, so the next
employee id will always be larger than previous employee id. Let's say I
have an index that crudely looks like:
|
---------------------------------------
| |
-------------------- ---------------
| | | |
---------------- ---------------- ---------------- ------
| | | | | | | | | | | | | |
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014
(Please excuse the very rough diagram of this simple B-tree structure.
And I do hope that the formatting does go through for people.)
Anyway, The next employee id to this index will be 1015 which will be
added to the right side of the B-tree. Since this key is increasing, it
must be larger than all other previous index entries.
|
---------------------------------------
| |
-------------------- ---------------
| | | |
---------------- ---------------- ----------------
-----------
| | | | | | | | | | | | | |
|
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014
1015
Now as life goes one, people leave the company for various reasons. So
let's assume that some people have left the company. The leaf nodes in
the tree might look something like the following:
|
---------------------------------------
| |
-------------------- ---------------
| | | |
---------------- ---------------- ----------------
-----------
| | | | | | | | | | | | | |
|
1003 1005 1009 1010 1011 1012 1013 1014
1015
Will the next employee id use one of those empty slots? It can't. The
next employee will be numbered 1016. How can it reuse one of those empty
slots?
Given enough time, you can have a tree structure that starts to look
similar to the following:
|
---------------------------------------
| |
-------------------- ---------------
| | | |
---------------- ---------------- ----------------
-----------
| | | | | | | | | | | | | |
|
1009 1010 1011 1012 1013 1014
1015
The entire left side of the tree is unnecessarily contributing to the
height of the index.
This same concept gets even worse for something like invoices. The
invoice numbers are generated by a sequence. All new invoices will get
added to the right side of the tree. What makes this worse is that
companies tend to remove the oldest invoices all at once. For instance,
the company may decide to remove all of last year's invoices (or older)
from the table. In that case, there won't be a sparse look to the tree
like I indicated in my 3rd diagram.
Looking at these examples, how is it that those empty blocks can now
receive the next inserts from your monotonically incrementing sequence
number"?
> > And if I delete massive amounts of data and I
> > don't expect to add much more data in the future, then rebuilding can
> > help.
>
> Well, I'd suggest that if you'd done that to your table, you are going
> to get a much better performance improvement from re-compacting your
> *table* (alter table X move), because you are now suffering from High
> Water Mark Inflation Syndrome. Of course, after a table move, an index
> must be rebuilt for it to be usable. And we could discuss, forever I
> suspect, how to disentangle the effects of both operations on
> performance. But I think anyone who would carefully rebuild their index,
> but happily leave their table full of fresh air, has lost the plot.
>
> In which case, we are *still* not talking about just rolling out 'alter
> index X rebuild' commands.
I fully appreciate that! Yes, if you delete a very large amount of data
from your table, then you might want to consider re-compacting your
table as the HWM accounts for a large amount of empty space, thus
affecting things like full table scans.
> >And sometimes rebuilding an index can help in one area and hurt in
> > others, but the one area that I am fixing is more important to me than
> > the other areas.
>
> That's getting rather vague (which I think is really your point: these
> things are vague and subtle and no one number is going to give you the
> green light or red). But I'll buy into it. There are always the odd
> exceptions.
Yes it is getting vague, and that was part of my point. Lots to think
about.
> > But in none of these circumstances can I arrive at any
> > ratio which defines the need to rebuild or not. There is simply no query
> > you can apply the index and table metrics to arrive at these
> > conclusions.
>
> The starting point for any discussion you may have about the need to
> rebuild an index should be, I think, that Tom Kyte has rebuilt 7 of them
> in his entire professional career.
>
> And I don't mean he just got unlucky with the allocation of keyboard
> duties over time, but that he positively and actively decided to rebuild
> only 7.
>
> Your justification for rebuilding an index when taken in that light had
> better be pretty good, I think.
To me, as I've tried to say, the justification for rebuilding an index
isn't as simple as coming up with a metric. There is more to it than
that. Like many things in database tuning, a simple metric can't always
give an accurate picture on whether or not you need to take
Course_of_action_A or Course_of_action_B.
And to be honest with you, then only time I've rebuilt an index in the
last two years or so was when it was marked UNUSABLE due to some other
operation that I've performed, such as an ALTER TABLE MOVE command.
Hi Brian,
Because empty index leaf nodes are placed on the freelist and are reused by
subsequent index leaf splits. The fact that leaf blocks are not reused is
yet another indexing myth.
See my presentation/white paper for the details.
Cheers
Richard
I don't know about your closet, but in mine there are a couple of
physically ordered subsets within, basically "clothes favored for
work" and "clothes favored for play" subtypes within various
organizations such as "shirts," "pants," "underwear, guns and
balloons," "sock drawer" and so forth. There is physical ordering
since the clothes are laundered in batch jobs, ie, clothes worn to
work tend to be washed together once a week, and then put away
together, all jammed into the open space left by previous removal.
The practical consequence is I go directly to them when selecting for
work, usually when rushing about in the morning when excessive minutes
are to be avoided. The only full closet scans occur when purging or
if looking for a particular article of clothing not used in a while.
>
> I doubt that Mike was arguing for "randomness" in its
> mathematically-rigorous incarnation. I certainly wouldn't. But a mess of
> clothes on the bedroom floor certainly *looks* random enough, and that
> is all that was being said here, I think.
Whether you're a theory wonk or looking at physical ordering, there is
a very great difference between random and un-ordered. As an
overly-broad generalization, I usually see the result of an ordered
load from the use of tools followed by some amount of chaos (depending
on design, of course) with additional transactions. Some of the
performance issues that may result from this may not be visible to
Oracle. For example, I went to look at a system that I was going to
be doing some ETL on, expecting to see usual late-morning
transactions, and then evaluate whether those people would notice
degradation when I started. There were no transactions, only one
program maxing out 1 cpu. Such unexpected always gets my attention,
so I investigated further. There actually was a lull in transactions,
just one of those random times when no one is calling in orders. The
one program was one that normally takes about 3 hours. Oracle doesn't
show any waits, including cpu. Explain plan shows sort (order
by)/header table by index rowid/ header index range scan/ detail table
by index rowid/detail index range scan. So, why would cpu be maxed
out if Oracle isn't waiting on it? (The answer is the application
program is actually doing homegrown temporary tables in memory to
summarize data). Certainly a case for throwing hardware (cpu Hz) at
it, were anybody to want it faster than 3 hours. But if it wasn't
cpu-bound, the physical ordering of the indices getting range-scanned
might make a difference - and there is a detail version of the program
that may be doing just that, taking 12+ hours. So that is why I'm
interested in this whole CF argument, and it got my attention when
Mike mentioned batches. I see in my test db (loaded by imp and very
few subsequent transactions) the header index CF is 99045 and the
detail index CF is 210097. In my production db (many months and many
[would be monotonically increasing, except for 3 leading characters of
low cardinality] transactions since loaded) the header index CF is
blank and the detail index CF is blank. 1.3M rows in detail, 150K
rows in header, 8i, hpux, DMT, RBO.
So do I need to rebuild indexes? (Pretty sure the answer would be
"no" since there is no key updating and hasn't yet been deletion...
but there will be, and massive... hopefully, after going to 9 and
partitioning. But it sure looks bad in Tablespace Map. Of course,
this whole thread only applies to CBO, right?).
(Richard, I think slide 24 has a typo in the first line)
>
> > My only experience with this was with a third party vendor who slipped a
> > weekly "rebuild every index in the system" job into a production system I
> > was babysitting without saying a word to anyone (Grrrr!) So I can add a
> > note here that index rebuilds of any real size are expensive in both CPU and
> > log archive space as well as being almost always ineffective and pointless.
>
> That is a very important point, of course, and one that hasn't been done
> justice so far, because we are all still trying to work out what on
> Earth Mike is talking about. But when we do, and when we finally get
> over the clustering factor as an issue, we will still have to do
> hand-to-hand combat with the fact that index rebuilds are bloody
> expensive, so you'd better be pretty certain before you start that the
> benefits are going to outweigh the costs.
>
> > Thank you folks for an informative and so far entertaining thread. I, for
> > one, promise faithfully never ever to rebuild an index (not that I ever
> > would or did, mark you) without revisiting this thread and rereading every
> > post in it.
> >
> > Roger S Gay
>
> I think that is a victory of sorts, then, Roger!
Maybe the OP should summarize :-)
>
> Thanks
> HJR
jg
--
@home.com is bogus.
http://www.signonsandiego.com/news/metro/20040507-9999-7m7breach.html
> Whether you're a theory wonk or looking at physical ordering, there is
> a very great difference between random and un-ordered.
It makes no difference to the theory or practice of computing a
clustering factor for an index in Oracle.
HJR
This one popped up in my email:
http://searchoracle.techtarget.com/tip/0,289483,sid41_gci960148,00.html?track=NL-94&ad=482086
For those who don't want to register there, it basically does an
analyse index validate structure on all non-sys% indices, then builds
a rebuild command if > 20% of the leafs are deleted.
Interested in people's opinions on it, rants about arbitrary
percentages always appreciated.
jg
--
@home.com is bogus. After performing for the British royal family,
the story goes, he was introduced to Queen Elizabeth. "How do you do,
Mr. King?" she is reported to have said. "How do you do, Mrs. Queen?"
he is said to have replied.
RIP Irwin Alan Kniberg 1927 - 2004 AKA Alan King
> I would disagree with you here. The empty leaf nodes on the left-hand
> edge of the index cannot receive new inserts because those inserts will
> always be on the right hand side!
Excuse me??!
Indexes don't have "sides". They have blocks. And those blocks are
wherever Oracle happens to source them. And if you delete all your rows
from the table with sequence number 0 to 10,000 you have just emptied a
lot of blocks. And those blocks are available for the next insert. Even
if the next insert is of sequence number 5,000,000.
This is *very* basic stuff.
>I'll draw a crude diagram to
> illustrate my point.
I wouldn't bother.
Really. It's not difficult. If you delete employee 456, and employee
982, then of course employee 5,000,001 can't use those slots.
But if you delete all employees from 0 to 10,000 then employee 5,000,001
will quite happily use an earlier empty block, because it is now
COMPLETELY EMPTY AND THUS HAS NO POSITIONAL SIGNIFICANCE IN THE INDEX.
> Given enough time, you can have a tree structure that starts to look
> similar to the following:
>
>
> |
> ---------------------------------------
> | |
> -------------------- ---------------
> | | | |
> ---------------- ---------------- ----------------
> -----------
> | | | | | | | | | | | | | |
> |
> 1009 1010 1011 1012 1013 1014
> 1015
No, you can't Brian.
I have no intention of doing ASCII artwork, but no you can't.
A block ceases to have any positional significance in the index tree
structure when it no longer has any leaf entries in it at all.
And an index such as you draw here with a nice lot of empty nodes
between the left-hand edge and the 1009 can therefore accomodate the
next entry for 1016 in the LEFT-HAND-MOST leaf node.
Because left- and right-handness have no physical significance.
> The entire left side of the tree is unnecessarily contributing to the
> height of the index.
But it's usable space.
> This same concept gets even worse for something like invoices. The
> invoice numbers are generated by a sequence. All new invoices will get
> added to the right side of the tree.
>What makes this worse is that
> companies tend to remove the oldest invoices all at once.
Which is GOOD NEWS. Because it means vast swathes of their index are
emptied AND ARE THUS RE-USABLE.
Even with a monotonically incrementing sequence number for the invoice
number.
I can't believe you don't know this. You're just having fun, aren't you?
For instance,
> the company may decide to remove all of last year's invoices (or older)
> from the table. In that case, there won't be a sparse look to the tree
> like I indicated in my 3rd diagram.
>
>
> Looking at these examples, how is it that those empty blocks can now
> receive the next inserts from your monotonically incrementing sequence
> number"?
Oh for gawd's sake. Please find out how indexes work.
The clue is in the words "empty blocks". What do you think PCTUSED of
zero means for an index block, exactly?
Mike Ault's stunning amount of ignorance on the subject is just about
bearable. But I refuse to add any more to my plate, especially when you
should know better.
HJR
> This one popped up in my email:
>
> http://searchoracle.techtarget.com/tip/0,289483,sid41_gci960148,00.html?track=NL-94&ad=482086
>
> For those who don't want to register there, it basically does an
> analyse index validate structure on all non-sys% indices, then builds
> a rebuild command if > 20% of the leafs are deleted.
>
> Interested in people's opinions on it, rants about arbitrary
> percentages always appreciated.
It's been done to death here, surely. It is irrelvant if deleted leaf
rows exceeds any particular number. The issue is whether it *stays* that
way.
HJR