wildcard search and full tablescans

wage...@yahoo.com

unread,

Dec 7, 2006, 6:55:32 AM12/7/06

to

Oracle10g 10.2.0.1

select * from <table_name> where <column_name> like '%abc%';

does not use the index and does a full tablescan (guess this is the
expected behaviour as per explain plan).

Any suggestions as to how to use the index (index hint didn't help), IF
possible or other alternatives.

thanks

sybrandb

unread,

Dec 7, 2006, 7:03:46 AM12/7/06

to

Index skip scan might work, but I doubt whether it will work when both
the leading and the trailing part of a column are unknown.
Actually Oracle Context/InterMedia, with the CONTAINS operator was
invented for queries like this. I have never used it, so I can't help
you further, but the very goal of CONTAINS is to address queries like
this one.

Hth,
--
Sybrand Bakker
Senior Oracle DBA

hpuxrac

unread,

Dec 7, 2006, 8:52:53 AM12/7/06

to

If you CANNOT begin the search with some leading characters ( not the
percent sign ) ... ie where some_col like 'abc%' ... then normally I
would expect to the optimizer to pick either a full table scan or some
kind of index scan.

You might want to do a 10053 trace and some analysis of where the
optimizer is going with your query.

You can put in a hint like this select /*+ INDEX(table_or_view_name
optional_index_name) */ column_list FROM etc ...

Eiher a full table scan or a full index scan ( of some type ) produces
an application that has scalability limitations built into it. ( Your
logical IO's are very high ).

For full searching within a column there are options including the TEXT
indexes but they have some complications ( different SQL where syntax,
maintenance and cpu implications etc ).

Oracle will work nicely with regular character indexes IF you put a
where clause and a like that DOESN'T start with a percent sign.

Charles Hooper

unread,

Dec 8, 2006, 5:42:15 AM12/8/06

to

Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF)
http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf
"Pattern Matching on Indexed Columns
When you use LIKE to search an indexed column for a pattern, Oracle can
use the
index to improve performance of a query if the leading character in the
pattern is not %
or _. In this case, Oracle can scan the index by this leading
character. If the first
character in the pattern is % or _, then the index cannot improve
performance because
Oracle cannot scan the index."

I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or
others that detail why the index cannot be used in such a such a case,
but cannot locate a reference to one of the write-ups. As a general
rule, if more than 20% to 25% of a table's rows are expected to be
read, it is usually less expensive to perform a full table scan, rather
than a scan of the index and then an access by ROWID to read the data
from the table. Note that using a % or _ wildcard for the initial
character, Oracle would be required to access every block in the index
and then potentially every row in the table by ROWID.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.

hpuxrac

unread,

Dec 8, 2006, 7:09:15 AM12/8/06

to

Charles Hooper wrote:
> wage...@yahoo.com wrote:
> > Oracle10g 10.2.0.1
> >
> > select * from <table_name> where <column_name> like '%abc%';
> >
> > does not use the index and does a full tablescan (guess this is the
> > expected behaviour as per explain plan).
> >
> > Any suggestions as to how to use the index (index hint didn't help), IF
> > possible or other alternatives.
> >
> > thanks
>
> Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF)
> http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf
> "Pattern Matching on Indexed Columns
> When you use LIKE to search an indexed column for a pattern, Oracle can
> use the
> index to improve performance of a query if the leading character in the
> pattern is not %
> or _. In this case, Oracle can scan the index by this leading
> character. If the first
> character in the pattern is % or _, then the index cannot improve
> performance because
> Oracle cannot scan the index."
>
> I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or
> others that detail why the index cannot be used in such a such a case,
> but cannot locate a reference to one of the write-ups.

Because the regular indexes are built based on all the "characters"
starting with the first character. If you search for LIKE 'X%' then you
start with index entries ( range scan ) that have an X in the first
character. If the SQL says LIKE '%X%' then you need to look at the
index entries that have "AX%", 'BX%', 'CX%' etc.

In other words all of the index entries. In certain cases it still
"might be" better to look at them then get rowids versus tablescans.
Cases could be built either way probably.

> As a general
> rule, if more than 20% to 25% of a table's rows are expected to be
> read, it is usually less expensive to perform a full table scan, rather
> than a scan of the index and then an access by ROWID to read the data
> from the table. Note that using a % or _ wildcard for the initial
> character, Oracle would be required to access every block in the index
> and then potentially every row in the table by ROWID.

I think the big point to emphasize is that if you specify LIKE
'%PATTERN%' the most straightforward execution path for oracle is
either a tablescan or a complete index search. Both of those are not
good candidates for producing scalable applications. The logical IO is
a huge problem for systems that need to support concurrent access to
more than a few queries per time interval.

Many applications don't really require searching within a string.
Customer last name searches can usually begin with one or more
characters of the last name.

If you really need to support searching within a string there are other
things to consider such as text indexes.

DA Morgan

unread,

Dec 8, 2006, 11:39:24 AM12/8/06

to

Jonathan and other, oh how I hate to try paraphrasing others accurately,
have been generally dismissive of throwing out that 20-25% number. I
have run tests in my lab where I have been able to show wildly different
numbers for the optimizer depending on many factors including version,
caching related parameters, etc. But as a general rule it is certainly a
good thing to keep in mind that >50% is probably not a winning
percentage for index usage.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

hpuxrac

unread,

Dec 8, 2006, 12:28:53 PM12/8/06

to

Anytime you have an application that is either doing full tablescans or
chewing up all of an index to return the results of a select within a
character column, you have a problem.

You might be able to get away with it in a low volume data warehouse
environment. A high volume OLTP environment that's going to be a
scalability killer.

The important point is to consider changes to your indexing choices IF
you really require searching anywhere within a text string.

Charles Hooper

unread,

Dec 8, 2006, 3:43:21 PM12/8/06

to

DA Morgan wrote:
> Charles Hooper wrote:

Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte:
B*Tree indexes should be used only when retrieving a small portion of
the rows in a table - less than 20% of the rows.

I have seen similar comments from other respectable sources. The 20%
to 25% is a rough guideline. A read through "Cost-Based Oracle
Fundamentals" leaves the impression that nothing is that simple. If
the database has an 8KB block size, and the average row length is 20
bytes, does the 20% to 25% rough guideline hold true? If the database
has an 8KB block size, and the average row length is 8000 bytes, does
the 20% to 25% rough guideline hold true? What if the last 80% of the
rows added to the table are deleted causing a high water mark that is
far beyond the last used block, does the 20% to 25% rough guideline
hold true? Yes, No, and Maybe are all possibly correct answers.

Maybe Gaja Vaidyanatha put it best in "Oracle Performance Tuning 101"
(page 76), when referring to an article written by Cary Millsap in
1993:
"It is pretty clear that the use of an index for a query should not be
determined by some arbitrary percentage of the rows processed or
selected from the table, instead it should be determined by the number
of blocks visited to retrieve the data. If the number of blocks
visited for an index is lower than a full table scan, then the index
will be useful. Otherwise, a full tablescan will provide much better
performance."

But then, you must also consider the number of index and table blocks
that may already be in the buffer cache, the multiblock read count, the
time required to retrieve a single block compared with the time to
retrieve multiple blocks, etc.

I am not disagreeing with your comments on this topic. However, if
someone asks me why Oracle is not using my index, it is very easy to
ask, does Oracle expect to retrieve more than 20% to 25% of the rows by
index? If the answer is Yes or Maybe, then that is why Oracle is not
using your index.

hpuxrac

unread,

Dec 8, 2006, 7:57:57 PM12/8/06

to

There's no magic formula.

To me it doesn't matter in the case of a search as submitted by the OP
whether an index or a full tablescan is used.

Either type of approach is likely to produce an application that has
inherent scalability limitations due to a large number of Logical IO's.

Searching "anywhere" within a text string requires thought and research
and normally an approach that does not rely on using a regular oracle
index based on some character or varchar column.

Searching from the beginning of a string is different.

Discussion of percentages etc in this particular case is missing the
forest because of the trees.

DA Morgan

unread,

Dec 11, 2006, 9:39:38 PM12/11/06

to

Charles Hooper wrote:

> Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte:
> B*Tree indexes should be used only when retrieving a small portion of
> the rows in a table - less than 20% of the rows.

Oh I know Tom wrote that and I suspect he regrets having done so. Not
because it isn't often correct ... but because often it is not correct.

I can, for example, fiddle with optimizer_index_cost_adj and
optimizer_index_caching and make that number come out just about
anywhere I want.

Remember Connor's demo of how he could dial in just about any hit ratio
he wanted? Somewhere I have one that does the same thing with B*Trees.
I'll see if I can find it.

Regards

StefanKapitza

unread,

Dec 12, 2006, 2:31:45 AM12/12/06

to

you could try

http://www.dominicgiles.com/technicalpapers.html

(the Article about Bitmap Index)

regards

s.kapitza

thoma...@oracle.com

unread,

Dec 12, 2006, 9:01:40 AM12/12/06

to

DA Morgan wrote:
> Charles Hooper wrote:
>
> > Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte:
> > B*Tree indexes should be used only when retrieving a small portion of
> > the rows in a table - less than 20% of the rows.
>
> Oh I know Tom wrote that and I suspect he regrets having done so. Not
> because it isn't often correct ... but because often it is not correct.
>

Not at all - as was said "paraphrase" which could also be equated with
"out of context"

This was in a section where I was talking about indexes in general -
and I actually used "somewhere between 1 to 20%" - but caveatted it
heavily. You might use an index to retrieve EVERY row from a table
(first rows optimization for example), but in general, you are using
indexes to retrieve a relatively small percentage of the tables row.

I never said this was a hard and fast number - never said "never use it
for more than 20%". I don't really "regret" writing it as I did

....
Here, only the index was used to answer the query-it would not matter
now what percentage of rows we were accessing, as we would use the
index only. We can see from the plan that the underlying table was
never accessed; we simply scanned the index structure itself.

It is important to understand the difference between the two concepts.
When we have to do a TABLE ACCESS BY INDEX ROWID, we must ensure we are
accessing only a small percentage of the total blocks in the table,
which typically equates to a small percentage of the rows, or that we
need the first rows to be retrieved as fast as possible (the end user
is waiting for them impatiently). If we access too high a percentage of
the rows (larger than somewhere between 1 and 20 percent of the rows),
then it will generally take longer to access them via a B*Tree than by
just full scanning the table.

With the second type of query, where the answer is found entirely in
the index, we have a different story. We read an index block and pick
up many "rows" to process, then we go on to the next index block,
and so on-we never go to the table. There is also a fast full scan we
can perform on indexes to make this even faster in certain cases. A
fast full scan is when the database reads the index blocks in no
particular order; it just starts reading them. It is no longer using
the index as an index, but even more like a table at that point. Rows
do not come out ordered by index entries from a fast full scan.

In general, a B*Tree index would be placed on columns that we use
frequently in the predicate of a query, and we would expect some small
fraction of the data from the table to be returned or the end user
demands immediate feedback. On a thin table (i.e., a table with few or
small columns), this fraction may be very small. A query that uses this
index should expect to retrieve 2 to 3 percent or less of the rows to
be accessed in the table. On a fat table (i.e., a table with many
columns or very wide columns), this fraction might go all the way up to
20 to 25 percent of the table. This advice doesn't always seem to
make sense to everyone immediately; it is not intuitive, but it is
accurate. An index is stored sorted by index key. The index will be
accessed in sorted order by key. The blocks that are pointed to are
stored randomly in a heap. Therefore, as we read through an index to
access the table, we will perform lots of scattered, random I/O. By
"scattered," I mean that the index will tell us to read block 1,
block 1,000, block 205, block 321, block 1, block 1,032, block 1, and
so on-it won't ask us to read block 1, then block 2, and then block
3 in a consecutive manner. We will tend to read and reread blocks in a
very haphazard fashion. This single block I/O can be very slow.
...............................

DA Morgan

unread,

Dec 12, 2006, 12:29:40 PM12/12/06

to

thoma...@oracle.com wrote:
> DA Morgan wrote:

> I never said this was a hard and fast number - never said "never use it
> for more than 20%". I don't really "regret" writing it as I did

I stand corrected. Thank you for the clarification of your intent.

hpuxrac

unread,

Dec 12, 2006, 12:47:13 PM12/12/06

to

Here's the question as posed by the OP ...