Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
multi-column indexes, statistics and selectivity
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  12 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
bugbear  
View profile  
 More options Dec 4 2009, 11:48 am
Newsgroups: comp.databases.oracle.server
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Date: Fri, 04 Dec 2009 16:48:43 +0000
Local: Fri, Dec 4 2009 11:48 am
Subject: multi-column indexes, statistics and selectivity
Can anyone tell me (or point me at information) about
how statistics are generated/used for multi-column indexes?

We have a 2 column status model (major and minor status, if you like).
This is key to our application.

"Naturally" we have put a 2 column index on these two columns.

But we have "doubts" about whether the CBO is always
doing as good as job as we'd like.

The key question seems to be:

Are the columns "treated as a pair" so that the
frequency estimate for a pair of statuses depends
on both of them or...

Are the columns treated in isolation
so that the frequency estimate
is simply obtained by getting the estimate
for each column, and combining them mathematically.

This would make a big difference in our app,
since our major/minor status (in practice) have interesting
and complex correlations.

    BugBear


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
bugbear  
View profile  
 More options Dec 4 2009, 12:09 pm
Newsgroups: comp.databases.oracle.server
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Date: Fri, 04 Dec 2009 17:09:39 +0000
Local: Fri, Dec 4 2009 12:09 pm
Subject: Re: multi-column indexes, statistics and selectivity

bugbear wrote:
> Can anyone tell me (or point me at information) about
> how statistics are generated/used for multi-column indexes?

Apologies: I should have said "in 10g".

I've also found a reference to a new feature in 11g;
"Multi-column histograms"

If this is new in 11g it has "unfortunate" implications
for our running on 10g...

   BugBear


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
joel garry  
View profile  
 More options Dec 4 2009, 12:29 pm
Newsgroups: comp.databases.oracle.server
From: joel garry <joel-ga...@home.com>
Date: Fri, 4 Dec 2009 09:29:58 -0800 (PST)
Local: Fri, Dec 4 2009 12:29 pm
Subject: Re: multi-column indexes, statistics and selectivity
On Dec 4, 9:09 am, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:

> bugbear wrote:
> > Can anyone tell me (or point me at information) about
> > how statistics are generated/used for multi-column indexes?

> Apologies: I should have said "in 10g".

> I've also found a reference to a new feature in 11g;
> "Multi-column histograms"

> If this is new in 11g it has "unfortunate" implications
> for our running on 10g...

>    BugBear

Check this out:
http://richardfoote.wordpress.com/2008/09/16/index-monitoring-and-ind...

jg
--
@home.com is bogus.
"...Network hardware includes Dell servers, Dell workstations, IBM
RS6000 and P570 RISK based systems, Cisco routers, and HP ProCurve
switch’s.
MINIMUM REQUIREMENTS
Qualified candidates must have a Bachelor’s degree in Computer Science
or a related degree with 4-6 years programming experience with HTML/
CGI/Java, ASP.Net and an understanding of internet working and
knowledge of applications available on the WEB, Business Objects XIR2,
Crystal Reports XIR2, VB6 Programming Language, Visual Basic and
Studio, in an Oracle/AIX/2003 environment, or..." - metrolink job ad.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mladen Gogala  
View profile  
 More options Dec 4 2009, 5:21 pm
Newsgroups: comp.databases.oracle.server
From: Mladen Gogala <n...@email.here.invalid>
Date: Fri, 4 Dec 2009 22:21:09 +0000 (UTC)
Local: Fri, Dec 4 2009 5:21 pm
Subject: Re: multi-column indexes, statistics and selectivity

10g doesn't do multi-column histograms. 10g will create histograms for
every single column of the index and will estimate the number of entries
retrieved for each column by using those histograms. After that, it will
calculate the selectivity of the conditions by multiplying the
selectivities for every single column. Selectivity is the estimated
number of retrieved values divided by the total number of values in that
column. When the CBO gets the selectivity, it will multiply it by the
index clustering factor to estimate the number of blocks that need to be
retrieved. The correlation between the number of blocks and the price of
the query is known only to the God and Jonathan, but there definitely is
one.

--
http://mgogala.byethost5.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Robert Klemme  
View profile  
 More options Dec 5 2009, 8:18 am
Newsgroups: comp.databases.oracle.server
From: Robert Klemme <shortcut...@googlemail.com>
Date: Sat, 05 Dec 2009 14:18:38 +0100
Local: Sat, Dec 5 2009 8:18 am
Subject: Re: multi-column indexes, statistics and selectivity
On 12/04/2009 11:21 PM, Mladen Gogala wrote:

Are there cases where it makes sense to use a FBI to get "multi column"
histogram information?  Of course, queries then would also have to use
that concatenated value as query criteria which makes usage of this
quite nasty (especially if the SQL is generated by some kind of
persistence container).

Cheers

        robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark D Powell  
View profile  
 More options Dec 5 2009, 10:26 am
Newsgroups: comp.databases.oracle.server
From: Mark D Powell <Mark.Powe...@hp.com>
Date: Sat, 5 Dec 2009 07:26:21 -0800 (PST)
Local: Sat, Dec 5 2009 10:26 am
Subject: Re: multi-column indexes, statistics and selectivity
On Dec 4, 11:48 am, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:

To add to what others have said, how the CBO treats the two columns is
going to in part depend on how the sql statement is written.  If the
query uses two bind variables, one for each indexed column, in the
where clause conditions then the CBO (ignoring bind variable peeking)
uses the default selectivity (.05) for the first column times the
default selectivity of the second column in its calculations.  If the
data is not skewed then the presence or absence of histograms will
probably not have a major mpact on the plan.

My question to you though is major/minor status of what?  Will you be
seeking all occurrences of whatever with a specific major status or
all rows with a specific minor status?  What I am trying to bring into
consideration is if the status columns need to be associated with
another key.  That is you might need  the index to be major/minor/
part_no or something like this depending on it major/minor is the PK
of one of your tables or jsut really an attribute set of some of your
tables.  The three part key would allow more selective determination
of the target rows before the table has to be accessed.

HTH -- Mark D Powell --


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mladen Gogala  
View profile  
 More options Dec 5 2009, 4:35 pm
Newsgroups: comp.databases.oracle.server
From: Mladen Gogala <gogala.mla...@gmail.com>
Date: Sat, 5 Dec 2009 21:35:37 +0000 (UTC)
Local: Sat, Dec 5 2009 4:35 pm
Subject: Re: multi-column indexes, statistics and selectivity

On Sat, 05 Dec 2009 14:18:38 +0100, Robert Klemme wrote:
> Are there cases where it makes sense to use a FBI to get "multi column"
> histogram information?

Hmmm, let me quote from the 11g book: if the columns are related in the
way that breaks the 3rd normal form, such statistics would make sense.
The famous example by Jonatan was about zodiac signs and months of birth.
The 10g CBO would produce an incorrect estimate for the number of people
with the sign of Capricorn born in June. There are, of course, none, but
you can't know that without a multi-column histogram.

--
http://mgogala.byethost5.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Robert Klemme  
View profile  
 More options Dec 6 2009, 7:33 am
Newsgroups: comp.databases.oracle.server
From: Robert Klemme <shortcut...@googlemail.com>
Date: Sun, 06 Dec 2009 13:33:06 +0100
Local: Sun, Dec 6 2009 7:33 am
Subject: Re: multi-column indexes, statistics and selectivity
On 05.12.2009 22:35, Mladen Gogala wrote:

> On Sat, 05 Dec 2009 14:18:38 +0100, Robert Klemme wrote:

>> Are there cases where it makes sense to use a FBI to get "multi column"
>> histogram information?

> Hmmm, let me quote from the 11g book: if the columns are related in the
> way that breaks the 3rd normal form, such statistics would make sense.
> The famous example by Jonatan was about zodiac signs and months of birth.
> The 10g CBO would produce an incorrect estimate for the number of people
> with the sign of Capricorn born in June. There are, of course, none, but
> you can't know that without a multi-column histogram.

That makes a lot of sense!  Thank you, Mladen!

Kind regards

        robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mladen Gogala  
View profile  
 More options Dec 6 2009, 3:58 pm
Newsgroups: comp.databases.oracle.server
From: Mladen Gogala <gogala.mla...@gmail.com>
Date: Sun, 6 Dec 2009 20:58:51 +0000 (UTC)
Local: Sun, Dec 6 2009 3:58 pm
Subject: Re: multi-column indexes, statistics and selectivity

Give the credit where the credit is due. The original source is Jonathan
Lewis, not me. I do confess to reading his articles vewy, vewy cawefully.
I was unable to find the example quickly so here is where Tom Kyte also
invokes the same famous example: http://tinyurl.com/yjw5gbe

--
http://mgogala.freehostia.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
bugbear  
View profile  
 More options Dec 7 2009, 9:54 am
Newsgroups: comp.databases.oracle.server
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Date: Mon, 07 Dec 2009 14:54:23 +0000
Local: Mon, Dec 7 2009 9:54 am
Subject: Re: multi-column indexes, statistics and selectivity

Robert Klemme wrote:
> Are there cases where it makes sense to use a FBI to get "multi column"
> histogram information?  Of course, queries then would also have to use
> that concatenated value as query criteria which makes usage of this
> quite nasty (especially if the SQL is generated by some kind of
> persistence container).

I though about this rather carefully.

It is possible to do this (in my case, at least)

However, the "accurate" statistics associated
with the FBI make it quite important
that there are not "too many" other indexes that
the CBO might consider.

Because due to the inapropriate estimation
of selectivity the CBO is quite determined to
use almost-anything-other that YOUR FBI.

It doesn't "know" that the apparently low
selectivity of your FBI is because it's
the only index with accurate stats!

   BugBear


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jonathan Lewis  
View profile  
 More options Dec 11 2009, 10:43 am
Newsgroups: comp.databases.oracle.server
From: "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
Date: Fri, 11 Dec 2009 15:43:16 -0000
Local: Fri, Dec 11 2009 10:43 am
Subject: Re: multi-column indexes, statistics and selectivity
"bugbear" <bugbear@trim_papermule.co.uk_trim> wrote in message

news:i7ednZdyeur2oITWnZ2dnUVZ7txi4p2d@brightview.co.uk...

If you create histograms on the two columns separately and
use literals in the query, then Oracle can use the histograms
to calculate the selectivity of the two columns separately and
multiply to get the overall selectivity.  If you are using bind
variables, then the values used for this optimisation step will
be the peeked bind values on the first call - and that may not
be good for subsequent calls).

For a pair of status columns with (we assume) a small number
of distinct values, it makes sense to use literals and histograms.
(And then it's a good idea to construct histograms, rather than
letting the automatic stats collection gather them - see:
http://jonathanlewis.wordpress.com/2009/05/28/frequency-histograms/

If you don't have histograms, then there are various possibilities
depending on version of Oracle. The optimizer may multiply the
column selectivities, but it may base its arithmetic on the number
of distinct keys in the index.  The behaviour varies between
10.2.0.1, 10.2.0.3 and 10.2.0.4.

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jonathan Lewis  
View profile  
 More options Dec 11 2009, 10:49 am
Newsgroups: comp.databases.oracle.server
From: "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
Date: Fri, 11 Dec 2009 15:49:52 -0000
Local: Fri, Dec 11 2009 10:49 am
Subject: Re: multi-column indexes, statistics and selectivity

"Mark D Powell" <Mark.Powe...@hp.com> wrote in message
news:80408c17-9e36-4602-a787-94f017a1edcc@g23g2000vbr.googlegroups.com...

To add to what others have said, how the CBO treats the two columns is
going to in part depend on how the sql statement is written.  If the
query uses two bind variables, one for each indexed column, in the
where clause conditions then the CBO (ignoring bind variable peeking)
uses the default selectivity (.05) for the first column times the
default selectivity of the second column in its calculations.

Mark

That 5% figure does appear in various places in the optimizer's arithmetic,
but in this case is would be using the product of 1/num_distinct for the
two columns, or 1/distinct_keys for the index (depending on version)

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »