Search on the internet, some expert say
"In the overwhelming majority of cases, indexes are extremely well
self-maintained and index rebuilds are NOT required, ever."
Tom Kytes in his website says
"The time lag between index rebuilds should be approximately FOREVER.
Tell me -- why why why WHY are you rebuilding them on a schedule? What is
the scientific
basis in reality that is driving you to do this???? "
And some other site says:
"First rule of thumb is if the index has height greater than four, rebuild
the index. For most indexes, the height of the index will be quite low,
i.e. one or two. I have seen an index on a 3 million-row table that had
height three. An index with height greater than four may need to be rebuilt
as this might indicate a skewed tree structure. "
Which one is right?
Thanks for your comments
Thanks
You have to be certifiably insane to do index rebuilds just because
some rule-of-thumb somewhere says that if some metric you don't
understand reaches some value whose significance you don't understand
says you should.
What Tom Kyte is saying is "APPROXIMATELY forever". In other words,
not absolutely never, but very, very rarely when the specific
circumstances require it. A skewed index would be such a circumstance
(as would moving a table, doing DDL on a partitioned table with global
indexes, realising your queries are for Column A, B and your index is
built on Column B, A (though that's a re-create not a rebuild, but
close enough) and so on).
If you simply say, "Height 4... rebuild", that is a silly way to
manage a database. It's akin to saying, "Buffer Cache Hit Ratio = 75%,
must increase db_cache_size!" Likewise "deleted leaf rows > 20% of
leaf rows" is a daft way to go, because deleted leaf row space will
sort itself out over time... unless the index is on something like a
sequence (or time), and thus monotonically incrementing, because then
you will never re-use that deleted leaf row space.... unless the
deleted leaf row space comes in such large chunks that it causes
entire blocks to be vacated, in which case you will...
I mean, in short, the subject is complicated and boiling it down to
moronic rules of thumb is a mug's games.
But, generally and with due caution, it is more or less fair to say
that if you HAD to have a rule of thumb that applied in a rhetorical
95% of cases for a rhetorical 95% of the time, it would be, "don't
rebuild your indexes". The risks of adopting that approach far
outweigh the costs associated with the other one of "rebuild your
indexes routinely because otherwise your CPU cycles just go to waste".
Certainly, you do not want to be ROUTINELY rebuilding your indexes.
(Having just scheduled a routine rebuild of an Intermedia index, I
know there are exceptions even to that rule provided you get precise
enough about it! But I'm assuming we're talking about regular b-tree
indexes here otherwise).
I meant, of course, "...the BENEFITS of adopting that approach far
outweigh the costs associated with the other one...."
Apologies for mangling my grammar.
:0
See the thread "Proving or Debunking the need for rebuilding"
in this group for a report...
http://groups.google.be/group/comp.databases.oracle.server/browse_thread
/thread/9579b57c3434d990/9e32aa3cf9282f81?
hl=fr&lnk=gst&q=+Proving+or+Debunking+the+need+for+rebuilding&rnum=1#
9e32aa3cf9282f81
--- Raoul
>
> We do rebuild indexes, because it was measured to improve
> performance on our system (9i).
>
> See the thread "Proving or Debunking the need for rebuilding"
> in this group for a report...
>
> http://groups.google.be/group/comp.databases.oracle.server/browse_thread
> /thread/9579b57c3434d990/9e32aa3cf9282f81?
> hl=fr&lnk=gst&q=+Proving+or+Debunking+the+need+for+rebuilding&rnum=1#
> 9e32aa3cf9282f81
>
> --- Raoul
Raoul,
Did you ever find time to work out which indexes were
particularly relevant to the massive change in the batch
job, and what the difference was between the slow and
the fast runs ?
Running statspack at its default level 5 once per hour across
the batch job might give you some useful information about
the SQL with the biggest time change, and what made the
time change - level 6 would allow you to check if this was
due to plan changes, and level 7 might show you where the
I/O changed most.
--
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
Well, from what I read of that thread, you didn't measure very much
before, you weren't entirely certain on what to measure afterwards,
and it was all a bit vague and airy!
But that's standard fare in the 'I know my index rebuilds are
beneficial' industry: no-one actually every properly measures a darn'd
thing!
I don;t mean that in a critical way, by the way. More a frustration
thing: everyone has anecdotes, few have facts and figures.
Anyway, even if you happen to be of that rare breed, someone for whom
an index rebuild genuinely produces tangible, repeatable and
measureable benefits, fine: I have no problem with that. So long as
you, some third party with a bouffant hair-do or some poor innocent
newbie passing over this thread months or years from now understands:
that is not usual and it doesn't provide the basis for a simplistic
rule of thumb of the sort that says, "If Height>= 4 then rebuild;"
Proceed carefully in making any changes to an environment that is
running in production.
Take the advice of Tom Kyte and Jonathan Lewis. Read up on the topic
and the relevant history.
Beware of the silver bullet type of advice that tells you "when you
have to rebuild an index". Most of the time it is un-needed in oracle.
Delete intensive applications are usually trying to keep only a
certain volume of data in the active database and have some process
that deletes data on a frequent basis. Another type of app "reloads"
data with the same key values but some additional data columns
populated in later versions/data sets - instead of updates or merges -
the code deletes and then inserts. In both of these cases, you will
see the amount of space used by the index increase steadily while the
table size remains relatively stable. Rebuilding the index results in
a smaller index and improved performance on queries that use the
index.
On Oct 4, 10:10 pm, "charlie cs" <charliecs(nospam)@hotmail.com>
wrote:
No, Jonathan, Unfortunatly I never investigated more as
of today, for lack of time and because the system is
working fine as is.
I certainly agree that only a few indexes actually
benefit from rebuild, and rebuilding them all is overkill
and detrimental to some extent.
On the other hand, rebuilding all indexes is quick
- on my system -, had no observable side effects in routine
work, and we dont have to carefully analyze and measure
whether a freshly added index actually needs a rebuild.
From my experience index rebuilds are from the
proactive-tuning-business which actually doesn't make sense at all. I
never experienced a measurable performance gain by an index rebuild.
Jan
Well, I cannot really agree, Howard
As I recall, that was a very long, very controlled measurement effort,
whose goal was to check whether our system did benefit from index
rebuilds.
I really expected to see that index rebuilds were not needed.
To my surprise, the test showed that the rebuilds were beneficial.
Trouble is that while the tests show with certainty a very
measurable change, I dont have data to know with certainty why.
The best hypothesis right now is :
* Many of our queries are optimized with index range scans.
* Some important indexes do index monotonically increasing fields -
mainly sequence and date primary keys.
* The delete pattern is such that 9 out of ten rows with consecutive
dates or sequences are deleted, leaving the corresponding indexes
with non empty but sparsely populated leaf blocks.
- Therefore range scans take longer
As I recall, and this is only an hypothesis... All the
information I have is in the referenced thread - albeit in
bits and pieces, sorry.
About repeatability... For various reasons it has happened
that some of our customers have been running without rebuilds
for a few weeks. And complained to our technical support that
the system was becoming slow. The effect I have measured is
therefore very real, experienced by actual users in the field.
>
> But that's standard fare in the 'I know my index rebuilds are
> beneficial' industry: no-one actually every properly measures a darn'd
> thing!
>
> I don;t mean that in a critical way, by the way. More a frustration
> thing: everyone has anecdotes, few have facts and figures.
>
> Anyway, even if you happen to be of that rare breed, someone for whom
> an index rebuild genuinely produces tangible, repeatable and
> measureable benefits, fine: I have no problem with that. So long as
> you, some third party with a bouffant hair-do or some poor innocent
> newbie passing over this thread months or years from now understands:
> that is not usual and it doesn't provide the basis for a simplistic
> rule of thumb of the sort that says, "If Height>= 4 then rebuild;"
>
Sure. I am not claiming anything more than what I have observed
*on our system* - and not suggesting anything more than "measure your
system"
BTW a favorite optimization technique here is that critical
queries get their data from the index alone. The underlying
table is not accessed at all.
--- Raoul
And that may be, however your posts relating your efforts appear
disjointed and vague and throw that same light on your testing and
findings..
> I really expected to see that index rebuilds were not needed.
>
> To my surprise, the test showed that the rebuilds were beneficial.
> Trouble is that while the tests show with certainty a very
> measurable change, I dont have data to know with certainty why.
I realise this would take time, but have you considered a treedump of
the indexes before and after the rebuild:
alter session set evnts = 'immediate trace name treedump level <index
object id>';
You should see considerable differences in the dump files for a given
index if the rebuild of it improves performance. And those file pairs
with differences would indicate the indexes upon which you should
concentrate your rebuild efforts.
> system"- Hide quoted text -
>
> - Show quoted text -
David Fitzjarrell
Test this one on a small index before trying it on a large index.
On some versions of Oracle it does a full dump of all leaf blocks,
rather than one line per block.
Remember, it reads the index one block at a time anyway, so
it can be very slow even when it doesn't to a full dump.
> I realise this would take time, but have you considered a treedump of
> the indexes before and after the rebuild:
>
> alter session set evnts = 'immediate trace name treedump level <index
> object id>';
>
> You should see considerable differences in the dump files for a given
> index if the rebuild of it improves performance. And those file pairs
> with differences would indicate the indexes upon which you should
> concentrate your rebuild efforts.
Comparing index sizes before and after rebuild would probably reveal
all there is to know.
Probably not. I wouldn't indicate that a huge waste of time and
resources had just taken place which is what would be the end-result
the majority of the time.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Waste of time for majority of cases - yes. In a small number of cases
where performance is critical and benefits are tangible, a good DBA
should know which ones they are, and what he needs to do, and how
often.
No, it wouldn't be all there is to know.
If an index settles down at about 75% space utilisation in a steady-
state scenario, it stands to reason that a rebuild will show it small
after the rebuild that before. But if you were to conclude from that
that the rebuild was obviously of benefit, and that the reduction in
size was "all you needed to know", you'd be missing a trick or three.
Does the now-compacted index now suffer from higher contention rates
than it used to? It could well do so. Do the next X-number of inserts
to the table now take place slower than they did before because they
now have to cause the index to re-acquire the empty space it
previously already had available to it? It could well be the case.
Size is NOT everything, in short.
Ah, the perils of making sweeping statements! Let me try again. I was
replying to the treedump post and comparing looking at index sizes vs
doing treedumps.
In a bulk rebuild, most indexes would flip from 75% to 90% utilisation
(or whatever you set the new pctfree to be), and the sizes would vary
accordingly. No surprises there.
However, some indexes would shrink in size by a far greater
proportion. Those ones you examine a little closer. That and the fact
that I (and I suspect the majority of people reading this, though I
could be wrong) am not able to draw any meaningful conclusions as to
whether a rebuild is warranted from looking at treedumps. The last
time I looked, it just looked like a tree to me, something which I
already knew.
Size, on the other hand, tells me that either a bulk delete has
occurred, or the index is possibly suffering from monotonic inserts/
deletes. The first one needs no future action if the delete was a
known one-off. The second one is a candidate for future rebuilds. And
that is all I need to know.
If I inherit a database with a dubious history, I would schedule a
mass rebuild for these reasons, plus the fact that you get every
extent size possible under the blue sky. Especially, when you are
required to add/resize datafiles at a rate out of kilter with expected
growth rates. The trick however is deciding on a good pctfree for the
individual larger indexes, especially when you have a quite a few to
contend with. But that is another story.
> Waste of time for majority of cases - yes. In a small number of cases
> where performance is critical and benefits are tangible, a good DBA
> should know which ones they are, and what he needs to do, and how
> often.
I agree with you. But I doubt you will find one DBA in 10,000 that can
define the metric that will tell them when that is.
Hint: I would expect the correct answer might make reference to SYS_OP_LBID.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Thanks for the advice, Jonathan. No, David, I never did
a treedump in my life.
Hmm.. If there is interest here, I could try to
collect data from an actual production site, before
and after a rebuild.
Which data would you (collective) want to see ?
Keep in mind that this would be from a customer
production site. Therefore,
* Nothing destructive, please
* Collected data take a reasonable amount of space
* Collection takes at most ten minutes or so,
preferably non blocking
* You'll see changes induced by a one day activity
(night job + daily activity)
Best regards, everybody
--- Raoul
Recently in a 9.2.0.6 database where the application attempts to keep
only the last 4 days of completed orders online, index sizes dropped
an average of 50% with an average performance improvement of 150%.
Please note that "orders" enter the system in batches and stay in the
systems for 4 days after they have completed; however, most of the
orders loaded on day 1 are not deleted on day 5 or even by day 10.
It's the widely spread deletion pattern of small numbers of records
(particularly when the deletion order does not correlate to the
insertion order) that creates an index that benefits from rebuilding.
Other customers using the same application code with a dense insert-
deletion pattern (50% of day 1 inserts are deleted on day 5) do not
experience the performance drop off.
So if the index avg row length is a fraction of the table avg row
length and the index storage is still a multiple of the table storage
(30 bytes on the index, 200 bytes on the table, 10g for the table, 20g
for the index) - you may want consider reindexing. The ratio does not
necessarily mean rebuilding is needed - it's just a tool/method for
identifying indexes to examine. A better method may be segment
activity - fragmented indexes that are causing performance problems in
OLTP also tend to be among the most active segments.
> Jan- Hide quoted text -
Picking just one potentially interesting index, as it does
a fast full scan with aggregate of all entries.
Code changes needed:
supply the index name in the 'select from user_objects'
(which may have to be dba_objects)
Where I have 'column is not null' change the column
names (and add predicates if necessary) to cover the
columns in the chosen index.
A note of the number of leaf_blocks in the index (from
dba_indexes/user_indexes) before and after would also
be useful, and the block size.
--
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
"Hasta" <hast...@hotmail.com> wrote in message
news:MPG.21729f89b...@news.dommel.be...
I have a situation where the production schema had a major delete at
the end of last year, and has slowly been adding since. When I exp/
imp the schema in question to a test db, the size is a good 20%
smaller. I then look at the largest index that OEM tells me has
significantly fewer blocks with your program, and I see a much broader
distribution of number of rows per index blocks.
>From production:
[snippage]
ROWS_PER_BLOCK BLOCKS
-------------- ----------
113 1877
114 4984
115 3167
116 28250
117 2096
118 4314
119 2301
120 5501
121 2526
122 9402
123 1803
ROWS_PER_BLOCK BLOCKS
-------------- ----------
124 3403
125 1255
126 2421
127 919
128 2528
129 985
130 1775
131 714
132 2354
133 727
134 1589
[snippage of more rows like this]
ROWS_PER_BLOCK BLOCKS
-------------- ----------
----------
sum 172843
176 rows selected.
SQL> select leaf_blocks from user_indexes where
index_name='IC_MOVEMENTS_2';
LEAF_BLOCKS
-----------
172697
>From test:
ROWS_PER_BLOCK BLOCKS
-------------- ----------
113 1
121 1
131 1
160 1
165 1
188 210
189 2
192 1
193 697
197 1
198 2
ROWS_PER_BLOCK BLOCKS
-------------- ----------
199 52109
200 230
201 288
202 468
203 692
204 2554
205 10419
206 8775
207 3780
208 1040
209 463
ROWS_PER_BLOCK BLOCKS
-------------- ----------
210 32797
211 3
212 3
213 5
214 2
215 5
216 1
217 3012
218 1
219 1
220 1
ROWS_PER_BLOCK BLOCKS
-------------- ----------
223 358
224 1
225 1
227 1
228 1
230 1
231 222
233 1
235 1
----------
sum 118154
42 rows selected.
SQL> select leaf_blocks from user_indexes where
index_name='IC_MOVEMENTS_2';
LEAF_BLOCKS
-----------
118334
Therefore, exp/imp must indeed be the answer to life, indices and
everything! :-)
The question that is begged: Can-or-should I just compare the
leaf_blocks on all indices between production and test and rebuild
those with the largest difference? Seems too simple. (Especially
since I know if I mention this to this customer they will just want me
to exp/imp the whole db, MS-think abounds). I've never felt any need
to do anything to these indices, except due to the space issue -
between hardware upgrades and such, this is the first time they've
gone a couple of years with no maintenance.
jg
--
@home.com is bogus.
http://www.signonsandiego.com/uniontrib/20071006/news_1b6halo.html
I quite like that advice. It's not simplistic ("If index is lots
bigger than table, then it needs further investigation") and it asks
for cross-checking with other metrics (in this case, segment activity,
but I wouldn't rule out throwing in the del_lf_rows/lf_rows measure,
too). Much, much better than "if some ratio=20%, rebuild"!!
Hi joel,
i'm very surprised for this approach.
i'm sorry but do you make these operation for know if your indexes
must be rebuild?
Tell more please.
Regards
Alberto
No, I've never rebuilt any indexes on this db. I'm thinking about it
now, for space reasons, not performance - I don't think there are any
fast full index scans on this index, for instance. There's just a
number of gigabytes of wasted space in this schema.
> Tell more please.
Jonathan is the expert :-)
http://en.wikipedia.org/wiki/Answer_to_Life%2C_the_Universe%2C_and_Everything
>
jg
--
@home.com is bogus.
I'd like to see nominations for the sysadmin from hell.
http://www.sysadminoftheyear.com/
Nice, Jonathan.
Allow me a few days, OK ?
Best Regards
--- Raoul
Joel,
Sorry for taking so long to get back on this - I've been
a bit busy for the last week.
Simple solutions are desirable - and there's no such thing as
"too simple", although there is "too simple-minded".
It's an interesting set of figures on the rebuild - especially
with those little spikes. I'm guessing that there's been a little
bit of activity that would explain why you've got a few blocks
with about 113 rows in - those look like the result of a few
leaf block splits.
Could I guess that the leading column of this index has a few
(perhaps just 3) values which are particularly popular - perhaps
it's a single column index with a lot of skew, or a multi-column index
with a skew on the first column. That might explain the spikes at
199, 205, and 210 rows per block. If that's the case, then compressing
on that column would be sensible at any time.
Since you've got 176 rows reported before the rebuild that suggests
you've got some blocks with 290 rows per block - which is odd
because that doesn't show up after the rebuild. (At a default 10%
free on a rebuild that 290 peak ought to drop to about 261, and
your figures are way below that - do the counts start to taper
off at about 230 rows per block so that blocks with more rows
are just the odd one or two ?)
The post-rebuild figures would encourage me to look closely at
the index because it isn't a "boring, random" index; but (if I hadn't
seen the post-rebuild) I would have looked at the pre-rebuild
figures and decided that I wasn't really going to see a DIRECT
performance improvement from the rebuild - so I would only
consider rebuilding it if
a) It was extremely popular
and I was fairly confident that
b) the 60,000 (call it 40,000 after initial degeneration) drop
in the block count meant I would get a constant saving in
the buffer cache that paid for the work and time spent in
rebuilding.
However, given that blocks seem to start splitting soon after the
rebuild I might then take a little care to balance the frequency of
rebuilds against the pctfree I set so that I managed to rebuild
before any serious amount of block splits started to occur.
Bottom line - if you see a performance issue related to db file
sequential reads, and if the segment statistics (v$segstat) highlights
a particular index as being a major culprit, and if the query shows
the index blocks to be thinly populated (spread like yours, or with
a very long near-empty tail that you think is being scanned) then
consider rebuilding a thinly populated index, and coalescing an index
with a long tail.
Remember: a B-tree index on randomly arriving data will spread
to an average 70% utilisation. A rebuild index will (by default) run
at 90% utilisation. Therefore rebuilding indexes which show a 20%
drop between prod and test is too simple-minded. An index that
shows an interesting pattern before or after rebuild may be worthy
of special consideration.
No problem!
>
> Simple solutions are desirable - and there's no such thing as
> "too simple", although there is "too simple-minded".
I run into that a lot.
>
> It's an interesting set of figures on the rebuild - especially
> with those little spikes. I'm guessing that there's been a little
> bit of activity that would explain why you've got a few blocks
> with about 113 rows in - those look like the result of a few
> leaf block splits.
Guessing? What would Alex say! :-)
>
> Could I guess that the leading column of this index has a few
> (perhaps just 3) values which are particularly popular - perhaps
> it's a single column index with a lot of skew, or a multi-column index
> with a skew on the first column. That might explain the spikes at
> 199, 205, and 210 rows per block. If that's the case, then compressing
> on that column would be sensible at any time.
All indices on this table have lots of columns - Primary has 10
columns, another has 5, the one in question 6. Being a generalized
ERP inventory system, they all start with a company code (just one in
each instance, so far, 1 character), and something else with few
values next. So that's a spot-on guess. The first 5 columns in this
index are limited in value range, the last is a document number which
comes from several systems with several definitions, some of which are
steadily increasing.
>
> Since you've got 176 rows reported before the rebuild that suggests
> you've got some blocks with 290 rows per block - which is odd
> because that doesn't show up after the rebuild. (At a default 10%
> free on a rebuild that 290 peak ought to drop to about 261, and
> your figures are way below that - do the counts start to taper
> off at about 230 rows per block so that blocks with more rows
> are just the odd one or two ?)
Yes, sorry I didn't post the whole list, I didn't think it would be
that interesting :-)
...
ROWS_PER_BLOCK BLOCKS
-------------- ----------
223 153
224 603
225 137
226 427
227 143
228 554
229 141
230 210
231 118
232 125
233 134
ROWS_PER_BLOCK BLOCKS
-------------- ----------
234 122
235 11
236 12
237 24
238 12
239 15
240 15
241 15
244 2
245 1
249 1
>
> The post-rebuild figures would encourage me to look closely at
> the index because it isn't a "boring, random" index; but (if I hadn't
> seen the post-rebuild) I would have looked at the pre-rebuild
> figures and decided that I wasn't really going to see a DIRECT
> performance improvement from the rebuild - so I would only
> consider rebuilding it if
> a) It was extremely popular
No evidence one way or the other, but no one complains...
> and I was fairly confident that
> b) the 60,000 (call it 40,000 after initial degeneration) drop
> in the block count meant I would get a constant saving in
> the buffer cache that paid for the work and time spent in
> rebuilding.
I have no such confidence. But I do have a routine from the dark ages
that imp/exp the whole schema on a weekend night. I'm trying to
figure out if I can justify to management anything less simple-
minded. So far I haven't.
> However, given that blocks seem to start splitting soon after the
> rebuild I might then take a little care to balance the frequency of
> rebuilds against the pctfree I set so that I managed to rebuild
> before any serious amount of block splits started to occur.
This does seem worthwhile to watch on certain indices.
>
> Bottom line - if you see a performance issue related to db file
> sequential reads, and if the segment statistics (v$segstat) highlights
> a particular index as being a major culprit, and if the query shows
> the index blocks to be thinly populated (spread like yours, or with
> a very long near-empty tail that you think is being scanned) then
> consider rebuilding a thinly populated index, and coalescing an index
> with a long tail.
No one's complaining... (except when I added a repository instance
with the wrong sga_max_size and started swapping [D'Oh!]).
>
> Remember: a B-tree index on randomly arriving data will spread
> to an average 70% utilisation. A rebuild index will (by default) run
> at 90% utilisation. Therefore rebuilding indexes which show a 20%
> drop between prod and test is too simple-minded. An index that
> shows an interesting pattern before or after rebuild may be worthy
> of special consideration.
Most of the indices have some sort of steadily increasing component,
like a job number in each of several factories or sales order number
in each of several divisions. This particular index has the document
number last. I was kind of surprised I didn't see a size difference
(either way) in some of the other more normal large indices. I need
to look more closely at what else shrank, and what didn't.
Thanks!
jg
--
@home.com is bogus.
http://www.kasamba.com/professional/viewexpert.aspx?expid=128108&catid=10337
Couldn't agree more.
This is what I have come to refer to as the KISS method.
"Keep It Stupid Simple"