index usage

hastenthunder

unread,

Jan 19, 2005, 9:26:09 AM1/19/05

to

Hello,

I've read many documentations online stating to only create an index if
queries against this table frequently retrieve less than 15% of the rows.
However, if the query returns, say, 40% of the rows, wouldn't indexing the
column still help by cutting the work by roughly half?

hastenthunder

Arun Mathur

unread,

Jan 19, 2005, 10:57:39 AM1/19/05

to

Hello,

I would start by gathering some concrete metrics on your query.

There are several rules of thumb regarding when and when not to use an
index. I typically fire up SQL*Plus, set autotrace on, and then perform
the query that's of interest. My goal is to bring the number of
consistent gets down, and of course, the time it takes to run the
query. If the number of consistent gets decreases after creating an
index, the optimizer is most probably taking advantage of it. You'll
know for sure by looking at its explain plan. Apart from that, it's
equally important to look at the query itself, as many times in my
experience, the performance suffered because of the way I wrote the
query.

Good luck, and let me know how things go.

Regards,
Arun

Holger Baer

unread,

Jan 19, 2005, 11:40:10 AM1/19/05

to

The number actually varies (down to 2% in some cases) and, as they say, depends
on many factors.

But just think about it:
If you're going to read about half a book - would you really go through the index
or would you rather just leafe through the book and skip those pages that don't
interest you?

The same logic applies to table access via an index. So in fact using an index might
increase the work instead of cutting down. And the threshold where using the index
is increases the work depends on many factors, that's why those numbers when an index
makes sense vary so much.

HTH
Holger

Richard Foote

unread,

Jan 20, 2005, 6:36:43 AM1/20/05

to

"hastenthunder" <hasten...@hotmail.com> wrote in message
news:56uHd.2452$Ny6....@mencken.net.nih.gov...

A much *simplified* example on how I teach this stuff...

Let's say we have a table that has 10,000,000 rows which are stored in
1,000,000 data blocks meaning we have approximately 10 rows per block on
average.

Let's say we have an index on this table that has 100,000 leaf blocks
meaning we have on average approximately 100 leaf entries per leaf block the
index has 3 levels.

Let's also say we have an "effective" multi-block read capability of 10
blocks per I/O (meaning Oracle will read 10 "consecutive" blocks at a time
on average during a full table scan multiblock read).

Finally, let's say we're interested in accessing *just* 10% of the data (or
1,000,000 of the total 10,000,000 rows). Will Oracle use the index or won't
it ? Hopefully, I've picked an easy set of numbers to help illustrate the
answer ...

Firstly, to calculate the "cost" of using the index access path.

We need to read the root block + a branch block in order to get to the first
leaf block of interest. That's 2 logical I/Os (LIOs). We then need to read
approximately 10% of the leaf blocks in order to get our 1,000,000 leaf
entries required to directly access our 1,000,000 rows of interest, that's
10% of the 100,000 leaf blocks = 10,000 leaf blocks. Because we're reading
an index via a range scan and because the leaf blocks are not (necessarily)
physically co-related, Oracle must read each leaf block via a single I/O. So
that's 10,000 LIOs. So, just to read the index alone, we require 2 + 10,000
= 10,002 LIOs.

Note by default, Oracle assumes the above "cost" to be physical I/Os (PIOs).
Now assuming this index is heavily accessed, a good number of these index
blocks may already be cached in memory. The optimizer_index_caching
parameter can be used to adjust the above cost by suggesting that x% are
actually already cached and so are "cheaper" to access. To keep things
simple, we'll assume the default value of 0% or that no index blocks are
actually likely to be cached (generally not a wise assumption but let's keep
the arithmetic simple).

To access the corresponding table blocks, again Oracle can only perform
these reads via a single block read as each index entry points to a table
block that contains it's specific table row . Now we're after 1,000,000 rows
which means we require 1,000,000 LIOs in order to access the required rows.
Question is, how many *different* table blocks do we need to access ? Well,
this is entirely dependent on the Clustering Factor (CF) of the index, or
how closely aligned are the corresponding rows in the table in relation to
the order of the index (which must be in the order of the index values). In
the "best" possible case, all the required rows are all ordered and grouped
together in the same "collection" of table blocks meaning we only have to
access 10% of the 1,000,000 table blocks or 100,000 table blocks in a
roughly *consecutively* manner.

However, as is more common, if the required rows are randomly and evenly
distributed among the table blocks, then on average we need to read 1 row
(10%) from *each and every table block*. Note in your case of wanting to
access 40% of the data, we might depending on a poor CF need to visit on
average *each and every* data block *4 times*. This is the key point (no pun
intended).

The greater the number of differing blocks we access, then the less likely
we will find the block in memory from it being previously read and the more
likely that the block will need to be read from disk (PIO). Oracle considers
this and uses the CF in it's costing calculations. Assuming a randomly
distributed set of required rows, note we will need to visit *all* the table
blocks on average because on average we are interested in 1 in 10 of the
rows that each block contains (yes, some blocks may not actually be visited
and some may be visited a number of times but with such volume of blocks, it
conceivably might be a significant duration between reads to the same block
meaning it could easily have been aged and be physically re-read anyways).

The point though is that it's 1,000,000 LIOs regardless, of which a very
significant number *could* be *actual distinct* (or differing) blocks. So
that's 10,002 for the index + 1,000,000 for the table = 1,010,002 LIOs to
read *just* 10% of the data via an index.

Now to calculate the "cost" of a FTS. A FTS has a number of advantages over
an index access path. Firstly, because we read each block "consecutively"
(kinda) Oracle can investigate the appropriate selectiveness of each row
within the block ensuring that each table block is read just *once* (special
blocks such as extent maps withstanding). Secondly, again because each block
is read consecutively, Oracle can perform a multi-block read and read
multiple blocks within the one LIO. This is based on factors such as
db_file_multiblock_read_count, system statistics, OS I/O characteristics,
the caching characteristics of the table and the "fudge-factor" that the
Oracle CBO applies in it's calculations.

For simplicity (and to keep the numbers really simple), assuming an
effective multi-block read of 10, we can read the entire table in
approximately 1,000,000 table blocks / 10 = 100,000 LIOs. Note that
although these are larger and potentially more "costly" I/Os than the single
block I/Os used by the index, Oracle assumes by default that the actual cost
of each type of I/O to be the same. The optimizer_index_cost_adj parameter
can be used to more accurately estimate (if necessary) the relative cost of
a single block I/O to that of a FTS multi-block I/O. Again for simplicity,
we'll assume the default of 100 meaning that the cost of a single block I/O
is 100% (or the same) as a FTS I/O.

So, we now have our two comparative costings. The index access has a rough
cost of 1,010,002 and the FTS has a rough cost of just 100,000. The FTS wins
hands down.... Note for 40% of the data, the relative costs would have been
roughly 4,040,002 vs. 100,000. Even more hands down ...

The break-even point can now be calculated based on the above criteria, some
of which include:

- the selectivity of the query
- number of leaf blocks
- average number of leaf entries per leaf block
- height of index
- caching characteristics of index
- clustering factor of index
- number of table blocks (below HWM)
- average number of rows per block
- effective (or calculated) multi-block read
- caching characteristics of the table (which can influence the effective
multi-block read)
- relative cost of a single block I/O vs. a multi-block I/O
- amount of row migration / row chaining (although the CBO is not so good
with this)
- parallelism (potentially a major factor)

So your assumption that reading 40% of rows would cut the work by roughly
half is not correct. In the example above, it would actually cost about 40
times as much. In my long-winded manner, I hope this makes some kinda sense
and goes some way to explaining why.

One final piece of advice. Ignore any writings or suggestions that there is
a magical break even point is x% (where x could be 2% or 10% or 50% or
whatever). Hopefully the above will hint that there is *no* such percentage
as it all depends on too many factors. I can easily give you an example
where an index is most efficient when reading 0% of data and I can easily
give you an example where an index is most efficient when reading *100%* of
data (and *any* value in between). When one understands how the CBO
functions, one understands why such so-called rules of thumb are a nonsense.

Cheers

Richard

Richard Foote

unread,

Jan 20, 2005, 6:58:10 AM1/20/05

to

"Holger Baer" <holge...@science-computing.de> wrote in message
news:csm2hc$ssu$1...@news.BelWue.DE...

> hastenthunder wrote:
>> Hello,
>>
>> I've read many documentations online stating to only create an index if
>> queries against this table frequently retrieve less than 15% of the rows.
>> However, if the query returns, say, 40% of the rows, wouldn't indexing
>> the
>> column still help by cutting the work by roughly half?
>>
>>
>> hastenthunder
>>
>>
>
> The number actually varies (down to 2% in some cases) and, as they say,
> depends
> on many factors.

Hi Holger,

And much lower than 2% in other cases ...

>
> But just think about it:
> If you're going to read about half a book - would you really go through
> the index
> or would you rather just leafe through the book and skip those pages that
> don't
> interest you?

The point that some people find hard "to picture" is that to read half a
book, you might need to actually access *every* page. Why, because you might
only be interested on average with the information on 1/2 of every page. 1/2
a book doesn't necessarily mean 1/2 the pages but 1/2 of every page.

With Oracle "pages", this is more likely the higher the selectivity and the
worse the CF of the table in relation to the required data set.

Cheers

Richard

Holger Baer

unread,

Jan 20, 2005, 9:47:57 AM1/20/05

to

Richard Foote wrote:
> "Holger Baer" <holge...@science-computing.de> wrote in message
> news:csm2hc$ssu$1...@news.BelWue.DE...
>
>>hastenthunder wrote:
>>
>>>Hello,
>>>
>>>I've read many documentations online stating to only create an index if
>>>queries against this table frequently retrieve less than 15% of the rows.
>>>However, if the query returns, say, 40% of the rows, wouldn't indexing
>>>the
>>>column still help by cutting the work by roughly half?
>>>
>>>
>>>hastenthunder
>>>
>>>
>>
>>The number actually varies (down to 2% in some cases) and, as they say,
>>depends
>>on many factors.
>
>
> Hi Holger,
>
> And much lower than 2% in other cases ...

What I meant was, that the numbers I've read sofar went down to 2%, I didn't
intend to create a lower bound, but thanks for the correction.

>
>
>>But just think about it:
>>If you're going to read about half a book - would you really go through
>>the index
>>or would you rather just leafe through the book and skip those pages that
>>don't
>>interest you?
>
>
> The point that some people find hard "to picture" is that to read half a
> book, you might need to actually access *every* page. Why, because you might
> only be interested on average with the information on 1/2 of every page. 1/2
> a book doesn't necessarily mean 1/2 the pages but 1/2 of every page.
>
> With Oracle "pages", this is more likely the higher the selectivity and the
> worse the CF of the table in relation to the required data set.
>
> Cheers
>
> Richard
>
>

I've still got to hone my skills in writing analogies in a language that is not my own,
so while struggling for the right words I missed this important part. ;-(

Thanks.

Oh, btw, I seem to remember that you once did a presentation how block splitting in
indexes work. Is that still available somewhere? Any pointers would be much appreciated.

Holger

Niall Litchfield

unread,

Jan 20, 2005, 3:57:23 PM1/20/05

to

"Richard Foote" <richar...@bigpond.nospam.com> wrote in message
news:fJMHd.126019$K7.8...@news-server.bigpond.net.au...

> I can easily give you an example where an index is most efficient when
> reading 0% of data

snip all the rest of the good stuff to say

1. welcome back. and
2. er, wouldnt not executing the query at all be more efficient in the 0%
case?

--
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com

Richard Foote

unread,

Jan 20, 2005, 4:18:51 PM1/20/05

to

"Niall Litchfield" <niall.li...@dial.pipex.com> wrote in message
news:41f01b1e$0$16583$cc9e...@news-text.dial.pipex.com...

> "Richard Foote" <richar...@bigpond.nospam.com> wrote in message
> news:fJMHd.126019$K7.8...@news-server.bigpond.net.au...
>> I can easily give you an example where an index is most efficient when
>> reading 0% of data
>
> snip all the rest of the good stuff to say
>
> 1. welcome back. and

Hi Niall

Thank you

> 2. er, wouldnt not executing the query at all be more efficient in the 0%
> case?
>

Possibly but then how can you determine that there are no space seats left
on flight Q123 from Canberra to Brisbane on Jan 26 2005 without first
querying the database and selecting no rows ...

An index is the perfect way to select no rows (btw, Oracle uses them all the
time to do the same thing when validating a value in a unique constraint).

Cheers ;)

Richard

Richard Foote

unread,

Jan 20, 2005, 4:22:58 PM1/20/05

to

"Holger Baer" <holge...@science-computing.de> wrote in message

news:csogau$r6v$1...@news.BelWue.DE...

> Oh, btw, I seem to remember that you once did a presentation how block
> splitting in
> indexes work. Is that still available somewhere? Any pointers would be
> much appreciated.
>

Hi Holger,

You can google the newsgroup archives as I posted a demonstration right here
a while back. Recently, I found the same post at this web-site:
www.dbaclick.com (it's about the third item down).

Cheers

Richard

lamb...@yahoo.com

unread,

Jan 20, 2005, 5:35:15 PM1/20/05

to

Richard,

Very informational. Thanks a lot.

Ram.

Niall Litchfield

unread,

Jan 21, 2005, 11:13:25 AM1/21/05

to

Richard Foote wrote:
> >> I can easily give you an example where an index is most efficient
when
> >> reading 0% of data
> >
> > snip all the rest of the good stuff to say
> >
> > 1. welcome back. and
>
> Hi Niall
>
> Thank you
>
> > 2. er, wouldnt not executing the query at all be more efficient in
the 0%
> > case?
> >
>
> Possibly but then how can you determine that there are no space seats
left
> on flight Q123 from Canberra to Brisbane on Jan 26 2005 without first

> querying the database and selecting no rows ...
>
> An index is the perfect way to select no rows (btw, Oracle uses them
all the
> time to do the same thing when validating a value in a unique
constraint).

Ah. at the risk of driving you away again you referred to *reading* 0%
of the data rather than *returning* 0% of the data - it was an attempt
at pedant humour, albeit with the serious point that apps often query
databases for data they already know, I'd promise not to do it again
but I fear that would be a broken promise.

What I want is a seat from Heathrow to Pretoria for England winning the
test series on tuesday - for GBP1. In this case I really don't need to
read the relevant table - I know that there will be an empty resultset
:(.

Niall

DA Morgan

unread,

Jan 21, 2005, 2:35:41 PM1/21/05

to

Richard Foote wrote:

A well written and remarkable piece of writing. I have copied this post
to my indexes page with full attribution for its source.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu
(replace 'x' with 'u' to respond)

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---