Account Options

  1. Sign in
Google Groups Home
« Groups Home
Message from discussion index usage
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Richard Foote  
View profile  
 More options Jan 20 2005, 6:36 am
Newsgroups: comp.databases.oracle.server
From: "Richard Foote" <richard.fo...@bigpond.nospam.com>
Date: Thu, 20 Jan 2005 11:36:43 GMT
Local: Thurs, Jan 20 2005 6:36 am
Subject: Re: index usage

"hastenthunder" <hastenthun...@hotmail.com> wrote in message

news:56uHd.2452$Ny6.4229@mencken.net.nih.gov...

> Hello,

> I've read many documentations online stating to only create an index if
> queries against this table frequently retrieve less than 15% of the rows.
> However, if the query returns, say, 40% of the rows, wouldn't indexing the
> column still help by cutting the work by roughly half?

> hastenthunder

A much *simplified* example on how I teach this stuff...

Let's say we have a table that has 10,000,000 rows which are stored in
1,000,000 data blocks meaning we have approximately 10 rows per block on
average.

Let's say we have an index on this table that has 100,000 leaf blocks
meaning we have on average approximately 100 leaf entries per leaf block the
index has 3 levels.

Let's also say we have an "effective" multi-block read capability of 10
blocks per I/O (meaning Oracle will read 10 "consecutive" blocks at a time
on average during a full table scan multiblock read).

Finally, let's say we're interested in accessing *just* 10% of the data (or
1,000,000 of the total 10,000,000 rows). Will Oracle use the index or won't
it ? Hopefully, I've picked an easy set of numbers to help illustrate the
answer ...

Firstly, to calculate the "cost" of using the index access path.

We need to read the root block + a branch block in order to get to the first
leaf block of interest. That's 2 logical I/Os (LIOs). We then need to read
approximately 10% of the leaf blocks in order to get our 1,000,000 leaf
entries required to directly access our 1,000,000 rows of interest, that's
10% of the 100,000 leaf blocks = 10,000 leaf blocks. Because we're reading
an index via a range scan and because the leaf blocks are not (necessarily)
physically co-related, Oracle must read each leaf block via a single I/O. So
that's 10,000 LIOs. So, just to read the index alone, we require 2 + 10,000
= 10,002 LIOs.

Note by default, Oracle assumes the above "cost" to be physical I/Os (PIOs).
Now assuming this index is heavily accessed, a good number of these index
blocks may already be cached in memory. The optimizer_index_caching
parameter can be used to adjust the above cost by suggesting that x% are
actually already cached and so are "cheaper" to access. To keep things
simple, we'll assume the default value of 0% or that no index blocks are
actually likely to be cached (generally not a wise assumption but let's keep
the arithmetic simple).

To access the corresponding table blocks, again Oracle can only perform
these reads via a single block read as each index entry points to a table
block that contains it's specific table row . Now we're after 1,000,000 rows
which means we require 1,000,000 LIOs in order to access the required rows.
Question is, how many *different* table blocks do we need to access ? Well,
this is entirely dependent on the Clustering Factor (CF) of the index, or
how closely aligned are the corresponding rows in the table in relation to
the order of the index (which must be in the order of the index values). In
the "best" possible case, all the required rows are all ordered and grouped
together in the same "collection" of table blocks meaning we only have to
access 10% of the 1,000,000 table blocks or 100,000 table blocks in a
roughly *consecutively* manner.

However, as is more common, if the required rows are randomly and evenly
distributed among the table blocks, then on average we need to read 1 row
(10%) from *each and every table block*. Note in your case of wanting to
access 40% of the data, we might depending on a poor CF need to visit on
average *each and every* data block *4 times*. This is the key point (no pun
intended).

The greater the number of differing blocks we access, then the less likely
we will find the block in memory from it being previously read and the more
likely that the block will need to be read from disk (PIO). Oracle considers
this and uses the CF in it's costing calculations. Assuming a randomly
distributed set of required rows, note we will need to visit *all* the table
blocks on average because on average we are interested in 1 in 10 of the
rows that each block contains (yes, some blocks may not actually be visited
and some may be visited a number of times but with such volume of blocks, it
conceivably might be a significant duration between reads to the same block
meaning it could easily have been aged and be physically re-read anyways).

The point though is that it's 1,000,000 LIOs regardless, of which a very
significant number *could* be *actual distinct* (or differing) blocks. So
that's 10,002 for the index + 1,000,000 for the table = 1,010,002 LIOs to
read *just* 10% of the data via an index.

Now to calculate the "cost" of a FTS. A FTS has a number of advantages over
an index access path. Firstly, because we read each block "consecutively"
(kinda) Oracle can investigate the appropriate selectiveness of each row
within the block ensuring that each table block is read just *once* (special
blocks such as extent maps withstanding). Secondly, again because each block
is read consecutively, Oracle can perform a multi-block read and read
multiple blocks within the one LIO. This is based on factors such as
db_file_multiblock_read_count, system statistics, OS I/O characteristics,
the caching characteristics of the table and the "fudge-factor" that the
Oracle CBO applies in it's calculations.

For simplicity (and to keep the numbers really simple), assuming an
effective multi-block read of 10, we can read the entire table in
approximately 1,000,000 table blocks / 10 = 100,000 LIOs.  Note that
although these are larger and potentially more "costly" I/Os than the single
block I/Os used by the index, Oracle assumes by default that the actual cost
of each type of I/O to be the same. The optimizer_index_cost_adj parameter
can be used to more accurately estimate (if necessary) the relative cost of
a single block I/O to that of a FTS multi-block I/O. Again for simplicity,
we'll assume the default of 100 meaning that the cost of a single block I/O
is 100% (or the same) as a FTS I/O.

So, we now have our two comparative costings. The index access has a rough
cost of 1,010,002 and the FTS has a rough cost of just 100,000. The FTS wins
hands down.... Note for 40% of the data, the relative costs would have been
roughly 4,040,002 vs. 100,000. Even more hands down ...

The break-even point can now be calculated based on the above criteria, some
of which include:

- the selectivity of the query
- number of leaf blocks
- average number of leaf entries per leaf block
- height of index
- caching characteristics of index
- clustering factor of index
- number of table blocks (below HWM)
- average number of rows per block
- effective (or calculated) multi-block read
- caching characteristics of the table (which can influence the effective
multi-block read)
- relative cost of a single block I/O vs. a multi-block I/O
- amount of row migration / row chaining (although the CBO is not so good
with this)
- parallelism (potentially a major factor)

So your assumption that reading 40% of rows would cut the work by roughly
half is not correct. In the example above, it would actually cost about 40
times as much. In my long-winded manner, I hope this makes some kinda sense
and goes some way to explaining why.

One final piece of advice. Ignore any writings or suggestions that there is
a magical break even point is x% (where x could be 2% or 10% or 50% or
whatever). Hopefully the above will hint that there is *no* such percentage
as it all depends on too many factors. I can easily give you an example
where an index is most efficient when reading 0% of data and I can easily
give you an example where an index is most efficient when reading *100%* of
data (and *any* value in between). When one understands how the CBO
functions, one understands why such so-called rules of thumb are a nonsense.

Cheers

Richard


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.