On Sep 13, 8:45 am, Charles Hooper <
hooperc2...@gmail.com> wrote:
> “The columns that really do need histograms, where it really does
> matter, are the indexed columns. Those are the only columns where the
> optimizer can choose the access path.”
> That is a common misconception: that histograms only matter for
> indexed columns. Choosing an access method ( index or tablescan ) for
> a row source is only part of the task of the CBO when putting together
> an access plan. Other, at least equally important, tasks are the types
> and especially order of joins – unless none of your sql contain joins.
Which, strangely enough, is the precise case of the examples provided
by Greg in that entry.
Sorry. but trying to convince anyone that joins can be affected with
an example that doesn't use joins is, IMHO, a bit of a dream.
Yes, we can see the difference in cardinality. Yes, the query in the
examples uses no indexes in ALL cases exposed. As it should, given
there are no indexes in those columns.
What exactly is the problem?
What, you telling me that in a join somehow magically the CBO would
create an index on the fly and use it if there were histograms for all
columns?
You don't use Peoplesoft HR much, do you? I can provide some concrete
examples where using histograms in all columns, indexed or not, can be
deadly in 10gr2.
In fact, there is now (FINALLY!!!) a documented process that
specifically eliminates histograms from certain tables used by PS for
paycalc runs and as a result gets reasonable performance instead of
the horrible pot-luck that is histograms combined with bind variable
peeking in 10gr2.
So, please: before we start crucifying those who by choice wouldn't
give a fig about being an ACE or OCP, wouldn't it be better to
actually provide proper examples if at all possible?
Particularly where relevant to a specific release?
It's not like the CBO behaves consistently across releases or has EVER
done so in recent history!...
> And it is there where column selectivities, derived from column
> statistics, play a crucial role. If you deprive the optimizer of this
> vital information it is bound to produce suboptimal, even very bad
> plans. If you have not yet encountered that count yourself lucky."
I've never encountered that situation and I am not particularly
lucky.
Given lack of indexes and partition pruning, the CBO can only produce
full table scans to get data, end of story and period.
That may or may not be bad, depending on a huge set of factors and
conditions.
Nothing to do with histograms on indexed or non-indexed columns.
> snippage...
> Now consider that table T1 has an unindexed column named STATUS - the
> status is one of RELEASED, CLOSED, and CANCELLED. Your particular
> query is looking for the few rows in table T1 with a STATUS of
> CANCELLED, with just 200 of the 1,000,000 rows matching that
> criteria. If a histogram were created on that column, the optimizer
> might change the cardinality estimates like this (note that it likely
> will not be exactly correct due to the selectivity estimated being
> multiplied together when multiple predicates are present with AND
> specified between the predicates:
> T1: 5
> T2: 500,000
> T3: 100,000
>
> In the above case, the selected join order might be T1 -> T3 -> T2
>
> I think that the above is a case where just collecting histograms on
> indexed columns might be limiting.
"might", "likely to change", and so on. See the problem?
Yes, there may be a particular case, once in a blue moon, when the
wind is blowing in the right direction, where this might be
necessary. In the other 99.999999% of the cases, it will never
happen. Production dbas deal with the 0.000001% case when/if needed,
otherwise they just set one size fits all and go with it. Nothing
wrong with that and certainly not an appropriate subject for
"crucification".
> Richard Foote stated:
> "I wouldn’t recommend ALL INDEXED COLUMNS. Non-indexed columns can
> benefit from stats and histograms as well as indexed columns as the
> stats can provide important cardinality estimates of steps within an
> execution plan that can influence join orders and when steps are
> performed"
Like I said: "can" is not a synonym for "must" or "will"
> Why not index the STATUS column in my example? You could, but then
> what if you are looking for something different, like sales in the
> state of Alaska (probably much fewer than for California) - would you
> necessarily index a column that indicates the state of a customer?
Yes, of course I would! In the case where we are after a high
cardinality value, the indexes would work perfectly and as desired.
In the case of low cardinality, hopefully the histogram accompanying
that index - the case of "FOR ALL INDEXED COLUMNS" - would allow the
CBO to drop the index and FTS. And in a join the same would happen.
Isn't that what one would hope for (abstracting the usual CBO bugs)?
I've said it many times before: a specific example is not proof of a
much wider case's validity. I think it applies here more than
anything else. Although I'm not sure I'd follow Mladen's idea of "256
always" without further scrutiny, I can't see what the "crucification"
problem is with histograms on indexed columns.
> (On second thought, I probably would not have thought to manually
> build a histogram on this column either, but the statistics collection
> process might).
The lesser said about the "auto" stats gathering process, the better...