> select * from <table_name> where <column_name> like '%abc%';
> does not use the index and does a full tablescan (guess this is the > expected behaviour as per explain plan).
> Any suggestions as to how to use the index (index hint didn't help), IF > possible or other alternatives.
> thanks
Index skip scan might work, but I doubt whether it will work when both the leading and the trailing part of a column are unknown. Actually Oracle Context/InterMedia, with the CONTAINS operator was invented for queries like this. I have never used it, so I can't help you further, but the very goal of CONTAINS is to address queries like this one.
> select * from <table_name> where <column_name> like '%abc%';
> does not use the index and does a full tablescan (guess this is the > expected behaviour as per explain plan).
> Any suggestions as to how to use the index (index hint didn't help), IF > possible or other alternatives.
> thanks
If you CANNOT begin the search with some leading characters ( not the percent sign ) ... ie where some_col like 'abc%' ... then normally I would expect to the optimizer to pick either a full table scan or some kind of index scan.
You might want to do a 10053 trace and some analysis of where the optimizer is going with your query.
You can put in a hint like this select /*+ INDEX(table_or_view_name optional_index_name) */ column_list FROM etc ...
Eiher a full table scan or a full index scan ( of some type ) produces an application that has scalability limitations built into it. ( Your logical IO's are very high ).
For full searching within a column there are options including the TEXT indexes but they have some complications ( different SQL where syntax, maintenance and cpu implications etc ).
Oracle will work nicely with regular character indexes IF you put a where clause and a like that DOESN'T start with a percent sign.
> select * from <table_name> where <column_name> like '%abc%';
> does not use the index and does a full tablescan (guess this is the > expected behaviour as per explain plan).
> Any suggestions as to how to use the index (index hint didn't help), IF > possible or other alternatives.
> thanks
Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF) http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf "Pattern Matching on Indexed Columns When you use LIKE to search an indexed column for a pattern, Oracle can use the index to improve performance of a query if the leading character in the pattern is not % or _. In this case, Oracle can scan the index by this leading character. If the first character in the pattern is % or _, then the index cannot improve performance because Oracle cannot scan the index."
I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or others that detail why the index cannot be used in such a such a case, but cannot locate a reference to one of the write-ups. As a general rule, if more than 20% to 25% of a table's rows are expected to be read, it is usually less expensive to perform a full table scan, rather than a scan of the index and then an access by ROWID to read the data from the table. Note that using a % or _ wildcard for the initial character, Oracle would be required to access every block in the index and then potentially every row in the table by ROWID.
Charles Hooper PC Support Specialist K&M Machine-Fabricating, Inc.
Charles Hooper wrote: > wagen...@yahoo.com wrote: > > Oracle10g 10.2.0.1
> > select * from <table_name> where <column_name> like '%abc%';
> > does not use the index and does a full tablescan (guess this is the > > expected behaviour as per explain plan).
> > Any suggestions as to how to use the index (index hint didn't help), IF > > possible or other alternatives.
> > thanks
> Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF) > http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf > "Pattern Matching on Indexed Columns > When you use LIKE to search an indexed column for a pattern, Oracle can > use the > index to improve performance of a query if the leading character in the > pattern is not % > or _. In this case, Oracle can scan the index by this leading > character. If the first > character in the pattern is % or _, then the index cannot improve > performance because > Oracle cannot scan the index."
> I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or > others that detail why the index cannot be used in such a such a case, > but cannot locate a reference to one of the write-ups.
Because the regular indexes are built based on all the "characters" starting with the first character. If you search for LIKE 'X%' then you start with index entries ( range scan ) that have an X in the first character. If the SQL says LIKE '%X%' then you need to look at the index entries that have "AX%", 'BX%', 'CX%' etc.
In other words all of the index entries. In certain cases it still "might be" better to look at them then get rowids versus tablescans. Cases could be built either way probably.
> As a general > rule, if more than 20% to 25% of a table's rows are expected to be > read, it is usually less expensive to perform a full table scan, rather > than a scan of the index and then an access by ROWID to read the data > from the table. Note that using a % or _ wildcard for the initial > character, Oracle would be required to access every block in the index > and then potentially every row in the table by ROWID.
I think the big point to emphasize is that if you specify LIKE '%PATTERN%' the most straightforward execution path for oracle is either a tablescan or a complete index search. Both of those are not good candidates for producing scalable applications. The logical IO is a huge problem for systems that need to support concurrent access to more than a few queries per time interval.
Many applications don't really require searching within a string. Customer last name searches can usually begin with one or more characters of the last name.
If you really need to support searching within a string there are other things to consider such as text indexes.
Charles Hooper wrote: > wagen...@yahoo.com wrote: >> Oracle10g 10.2.0.1
>> select * from <table_name> where <column_name> like '%abc%';
>> does not use the index and does a full tablescan (guess this is the >> expected behaviour as per explain plan).
>> Any suggestions as to how to use the index (index hint didn't help), IF >> possible or other alternatives.
>> thanks
> Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF) > http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf > "Pattern Matching on Indexed Columns > When you use LIKE to search an indexed column for a pattern, Oracle can > use the > index to improve performance of a query if the leading character in the > pattern is not % > or _. In this case, Oracle can scan the index by this leading > character. If the first > character in the pattern is % or _, then the index cannot improve > performance because > Oracle cannot scan the index."
> I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or > others that detail why the index cannot be used in such a such a case, > but cannot locate a reference to one of the write-ups. As a general > rule, if more than 20% to 25% of a table's rows are expected to be > read, it is usually less expensive to perform a full table scan, rather > than a scan of the index and then an access by ROWID to read the data > from the table. Note that using a % or _ wildcard for the initial > character, Oracle would be required to access every block in the index > and then potentially every row in the table by ROWID.
> Charles Hooper > PC Support Specialist > K&M Machine-Fabricating, Inc.
Jonathan and other, oh how I hate to try paraphrasing others accurately, have been generally dismissive of throwing out that 20-25% number. I have run tests in my lab where I have been able to show wildly different numbers for the optimizer depending on many factors including version, caching related parameters, etc. But as a general rule it is certainly a good thing to keep in mind that >50% is probably not a winning percentage for index usage. -- Daniel A. Morgan University of Washington damor...@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org
DA Morgan wrote: > Charles Hooper wrote: > > wagen...@yahoo.com wrote: > >> Oracle10g 10.2.0.1
> >> select * from <table_name> where <column_name> like '%abc%';
> >> does not use the index and does a full tablescan (guess this is the > >> expected behaviour as per explain plan).
> >> Any suggestions as to how to use the index (index hint didn't help), IF > >> possible or other alternatives.
> >> thanks
> > Oracle Database SQL Reference 10g Release 2 Page 7-16 (page 432 in PDF) > > http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200.pdf > > "Pattern Matching on Indexed Columns > > When you use LIKE to search an indexed column for a pattern, Oracle can > > use the > > index to improve performance of a query if the leading character in the > > pattern is not % > > or _. In this case, Oracle can scan the index by this leading > > character. If the first > > character in the pattern is % or _, then the index cannot improve > > performance because > > Oracle cannot scan the index."
> > I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or > > others that detail why the index cannot be used in such a such a case, > > but cannot locate a reference to one of the write-ups. As a general > > rule, if more than 20% to 25% of a table's rows are expected to be > > read, it is usually less expensive to perform a full table scan, rather > > than a scan of the index and then an access by ROWID to read the data > > from the table. Note that using a % or _ wildcard for the initial > > character, Oracle would be required to access every block in the index > > and then potentially every row in the table by ROWID.
> > Charles Hooper > > PC Support Specialist > > K&M Machine-Fabricating, Inc.
> Jonathan and other, oh how I hate to try paraphrasing others accurately, > have been generally dismissive of throwing out that 20-25% number. I > have run tests in my lab where I have been able to show wildly different > numbers for the optimizer depending on many factors including version, > caching related parameters, etc. But as a general rule it is certainly a > good thing to keep in mind that >50% is probably not a winning > percentage for index usage.
Anytime you have an application that is either doing full tablescans or chewing up all of an index to return the results of a select within a character column, you have a problem.
You might be able to get away with it in a low volume data warehouse environment. A high volume OLTP environment that's going to be a scalability killer.
The important point is to consider changes to your indexing choices IF you really require searching anywhere within a text string.
DA Morgan wrote: > Charles Hooper wrote: > > I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or > > others that detail why the index cannot be used in such a such a case, > > but cannot locate a reference to one of the write-ups. As a general > > rule, if more than 20% to 25% of a table's rows are expected to be > > read, it is usually less expensive to perform a full table scan, rather > > than a scan of the index and then an access by ROWID to read the data > > from the table. Note that using a % or _ wildcard for the initial > > character, Oracle would be required to access every block in the index > > and then potentially every row in the table by ROWID.
> > Charles Hooper > > PC Support Specialist > > K&M Machine-Fabricating, Inc.
> Jonathan and other, oh how I hate to try paraphrasing others accurately, > have been generally dismissive of throwing out that 20-25% number. I > have run tests in my lab where I have been able to show wildly different > numbers for the optimizer depending on many factors including version, > caching related parameters, etc. But as a general rule it is certainly a > good thing to keep in mind that >50% is probably not a winning > percentage for index usage. > -- > Daniel A. Morgan > University of Washington > damor...@x.washington.edu > (replace x with u to respond) > Puget Sound Oracle Users Group > www.psoug.org
Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte: B*Tree indexes should be used only when retrieving a small portion of the rows in a table - less than 20% of the rows.
I have seen similar comments from other respectable sources. The 20% to 25% is a rough guideline. A read through "Cost-Based Oracle Fundamentals" leaves the impression that nothing is that simple. If the database has an 8KB block size, and the average row length is 20 bytes, does the 20% to 25% rough guideline hold true? If the database has an 8KB block size, and the average row length is 8000 bytes, does the 20% to 25% rough guideline hold true? What if the last 80% of the rows added to the table are deleted causing a high water mark that is far beyond the last used block, does the 20% to 25% rough guideline hold true? Yes, No, and Maybe are all possibly correct answers.
Maybe Gaja Vaidyanatha put it best in "Oracle Performance Tuning 101" (page 76), when referring to an article written by Cary Millsap in 1993: "It is pretty clear that the use of an index for a query should not be determined by some arbitrary percentage of the rows processed or selected from the table, instead it should be determined by the number of blocks visited to retrieve the data. If the number of blocks visited for an index is lower than a full table scan, then the index will be useful. Otherwise, a full tablescan will provide much better performance."
But then, you must also consider the number of index and table blocks that may already be in the buffer cache, the multiblock read count, the time required to retrieve a single block compared with the time to retrieve multiple blocks, etc.
I am not disagreeing with your comments on this topic. However, if someone asks me why Oracle is not using my index, it is very easy to ask, does Oracle expect to retrieve more than 20% to 25% of the rows by index? If the answer is Yes or Maybe, then that is why Oracle is not using your index.
Charles Hooper PC Support Specialist K&M Machine-Fabricating, Inc.
Charles Hooper wrote: > DA Morgan wrote: > > Charles Hooper wrote: > > > I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or > > > others that detail why the index cannot be used in such a such a case, > > > but cannot locate a reference to one of the write-ups. As a general > > > rule, if more than 20% to 25% of a table's rows are expected to be > > > read, it is usually less expensive to perform a full table scan, rather > > > than a scan of the index and then an access by ROWID to read the data > > > from the table. Note that using a % or _ wildcard for the initial > > > character, Oracle would be required to access every block in the index > > > and then potentially every row in the table by ROWID.
> > > Charles Hooper > > > PC Support Specialist > > > K&M Machine-Fabricating, Inc.
> > Jonathan and other, oh how I hate to try paraphrasing others accurately, > > have been generally dismissive of throwing out that 20-25% number. I > > have run tests in my lab where I have been able to show wildly different > > numbers for the optimizer depending on many factors including version, > > caching related parameters, etc. But as a general rule it is certainly a > > good thing to keep in mind that >50% is probably not a winning > > percentage for index usage. > > -- > > Daniel A. Morgan > > University of Washington > > damor...@x.washington.edu > > (replace x with u to respond) > > Puget Sound Oracle Users Group > > www.psoug.org
> Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte: > B*Tree indexes should be used only when retrieving a small portion of > the rows in a table - less than 20% of the rows.
> I have seen similar comments from other respectable sources. The 20% > to 25% is a rough guideline. A read through "Cost-Based Oracle > Fundamentals" leaves the impression that nothing is that simple. If > the database has an 8KB block size, and the average row length is 20 > bytes, does the 20% to 25% rough guideline hold true? If the database > has an 8KB block size, and the average row length is 8000 bytes, does > the 20% to 25% rough guideline hold true? What if the last 80% of the > rows added to the table are deleted causing a high water mark that is > far beyond the last used block, does the 20% to 25% rough guideline > hold true? Yes, No, and Maybe are all possibly correct answers.
> Maybe Gaja Vaidyanatha put it best in "Oracle Performance Tuning 101" > (page 76), when referring to an article written by Cary Millsap in > 1993: > "It is pretty clear that the use of an index for a query should not be > determined by some arbitrary percentage of the rows processed or > selected from the table, instead it should be determined by the number > of blocks visited to retrieve the data. If the number of blocks > visited for an index is lower than a full table scan, then the index > will be useful. Otherwise, a full tablescan will provide much better > performance."
> But then, you must also consider the number of index and table blocks > that may already be in the buffer cache, the multiblock read count, the > time required to retrieve a single block compared with the time to > retrieve multiple blocks, etc.
> I am not disagreeing with your comments on this topic. However, if > someone asks me why Oracle is not using my index, it is very easy to > ask, does Oracle expect to retrieve more than 20% to 25% of the rows by > index? If the answer is Yes or Maybe, then that is why Oracle is not > using your index.
There's no magic formula.
To me it doesn't matter in the case of a search as submitted by the OP whether an index or a full tablescan is used.
Either type of approach is likely to produce an application that has inherent scalability limitations due to a large number of Logical IO's.
Searching "anywhere" within a text string requires thought and research and normally an approach that does not rely on using a regular oracle index based on some character or varchar column.
Searching from the beginning of a string is different.
Discussion of percentages etc in this particular case is missing the forest because of the trees.
Charles Hooper wrote: > Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte: > B*Tree indexes should be used only when retrieving a small portion of > the rows in a table - less than 20% of the rows.
Oh I know Tom wrote that and I suspect he regrets having done so. Not because it isn't often correct ... but because often it is not correct.
I can, for example, fiddle with optimizer_index_cost_adj and optimizer_index_caching and make that number come out just about anywhere I want.
Remember Connor's demo of how he could dial in just about any hit ratio he wanted? Somewhere I have one that does the same thing with B*Trees. I'll see if I can find it.
Regards -- Daniel A. Morgan University of Washington damor...@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org
> > Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte: > > B*Tree indexes should be used only when retrieving a small portion of > > the rows in a table - less than 20% of the rows.
> Oh I know Tom wrote that and I suspect he regrets having done so. Not > because it isn't often correct ... but because often it is not correct.
Not at all - as was said "paraphrase" which could also be equated with "out of context"
This was in a section where I was talking about indexes in general - and I actually used "somewhere between 1 to 20%" - but caveatted it heavily. You might use an index to retrieve EVERY row from a table (first rows optimization for example), but in general, you are using indexes to retrieve a relatively small percentage of the tables row.
I never said this was a hard and fast number - never said "never use it for more than 20%". I don't really "regret" writing it as I did
.... Here, only the index was used to answer the query-it would not matter now what percentage of rows we were accessing, as we would use the index only. We can see from the plan that the underlying table was never accessed; we simply scanned the index structure itself.
It is important to understand the difference between the two concepts. When we have to do a TABLE ACCESS BY INDEX ROWID, we must ensure we are accessing only a small percentage of the total blocks in the table, which typically equates to a small percentage of the rows, or that we need the first rows to be retrieved as fast as possible (the end user is waiting for them impatiently). If we access too high a percentage of the rows (larger than somewhere between 1 and 20 percent of the rows), then it will generally take longer to access them via a B*Tree than by just full scanning the table.
With the second type of query, where the answer is found entirely in the index, we have a different story. We read an index block and pick up many "rows" to process, then we go on to the next index block, and so on-we never go to the table. There is also a fast full scan we can perform on indexes to make this even faster in certain cases. A fast full scan is when the database reads the index blocks in no particular order; it just starts reading them. It is no longer using the index as an index, but even more like a table at that point. Rows do not come out ordered by index entries from a fast full scan.
In general, a B*Tree index would be placed on columns that we use frequently in the predicate of a query, and we would expect some small fraction of the data from the table to be returned or the end user demands immediate feedback. On a thin table (i.e., a table with few or small columns), this fraction may be very small. A query that uses this index should expect to retrieve 2 to 3 percent or less of the rows to be accessed in the table. On a fat table (i.e., a table with many columns or very wide columns), this fraction might go all the way up to 20 to 25 percent of the table. This advice doesn't always seem to make sense to everyone immediately; it is not intuitive, but it is accurate. An index is stored sorted by index key. The index will be accessed in sorted order by key. The blocks that are pointed to are stored randomly in a heap. Therefore, as we read through an index to access the table, we will perform lots of scattered, random I/O. By "scattered," I mean that the index will tell us to read block 1, block 1,000, block 205, block 321, block 1, block 1,032, block 1, and so on-it won't ask us to read block 1, then block 2, and then block 3 in a consecutive manner. We will tend to read and reread blocks in a very haphazard fashion. This single block I/O can be very slow. ...............................
> I can, for example, fiddle with optimizer_index_cost_adj and > optimizer_index_caching and make that number come out just about > anywhere I want.
> Remember Connor's demo of how he could dial in just about any hit ratio > he wanted? Somewhere I have one that does the same thing with B*Trees. > I'll see if I can find it.
> Regards > -- > Daniel A. Morgan > University of Washington > damor...@x.washington.edu > (replace x with u to respond) > Puget Sound Oracle Users Group > www.psoug.org
thomas.k...@oracle.com wrote: > DA Morgan wrote: > I never said this was a hard and fast number - never said "never use it > for more than 20%". I don't really "regret" writing it as I did
I stand corrected. Thank you for the clarification of your intent. -- Daniel A. Morgan University of Washington damor...@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org
thomas.k...@oracle.com wrote: > DA Morgan wrote: > > Charles Hooper wrote:
> > > Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte: > > > B*Tree indexes should be used only when retrieving a small portion of > > > the rows in a table - less than 20% of the rows.
> > Oh I know Tom wrote that and I suspect he regrets having done so. Not > > because it isn't often correct ... but because often it is not correct.
> Not at all - as was said "paraphrase" which could also be equated with > "out of context"
> This was in a section where I was talking about indexes in general - > and I actually used "somewhere between 1 to 20%" - but caveatted it > heavily. You might use an index to retrieve EVERY row from a table > (first rows optimization for example), but in general, you are using > indexes to retrieve a relatively small percentage of the tables row.
> I never said this was a hard and fast number - never said "never use it > for more than 20%". I don't really "regret" writing it as I did
> .... > Here, only the index was used to answer the query-it would not matter > now what percentage of rows we were accessing, as we would use the > index only. We can see from the plan that the underlying table was > never accessed; we simply scanned the index structure itself.
> It is important to understand the difference between the two concepts. > When we have to do a TABLE ACCESS BY INDEX ROWID, we must ensure we are > accessing only a small percentage of the total blocks in the table, > which typically equates to a small percentage of the rows, or that we > need the first rows to be retrieved as fast as possible (the end user > is waiting for them impatiently). If we access too high a percentage of > the rows (larger than somewhere between 1 and 20 percent of the rows), > then it will generally take longer to access them via a B*Tree than by > just full scanning the table.
> With the second type of query, where the answer is found entirely in > the index, we have a different story. We read an index block and pick > up many "rows" to process, then we go on to the next index block, > and so on-we never go to the table. There is also a fast full scan we > can perform on indexes to make this even faster in certain cases. A > fast full scan is when the database reads the index blocks in no > particular order; it just starts reading them. It is no longer using > the index as an index, but even more like a table at that point. Rows > do not come out ordered by index entries from a fast full scan.
> In general, a B*Tree index would be placed on columns that we use > frequently in the predicate of a query, and we would expect some small > fraction of the data from the table to be returned or the end user > demands immediate feedback. On a thin table (i.e., a table with few or > small columns), this fraction may be very small. A query that uses this > index should expect to retrieve 2 to 3 percent or less of the rows to > be accessed in the table. On a fat table (i.e., a table with many > columns or very wide columns), this fraction might go all the way up to > 20 to 25 percent of the table. This advice doesn't always seem to > make sense to everyone immediately; it is not intuitive, but it is > accurate. An index is stored sorted by index key. The index will be > accessed in sorted order by key. The blocks that are pointed to are > stored randomly in a heap. Therefore, as we read through an index to > access the table, we will perform lots of scattered, random I/O. By > "scattered," I mean that the index will tell us to read block 1, > block 1,000, block 205, block 321, block 1, block 1,032, block 1, and > so on-it won't ask us to read block 1, then block 2, and then block > 3 in a consecutive manner. We will tend to read and reread blocks in a > very haphazard fashion. This single block I/O can be very slow. > ...............................
> > I can, for example, fiddle with optimizer_index_cost_adj and > > optimizer_index_caching and make that number come out just about > > anywhere I want.
> > Remember Connor's demo of how he could dial in just about any hit ratio > > he wanted? Somewhere I have one that does the same thing with B*Trees. > > I'll see if I can find it.
> > Regards > > -- > > Daniel A. Morgan > > University of Washington > > damor...@x.washington.edu > > (replace x with u to respond) > > Puget Sound Oracle Users Group > > www.psoug.org
Here's the question as posed by the OP ...
Oracle10g 10.2.0.1
select * from <table_name> where <column_name> like '%abc%';
does not use the index and does a full tablescan (guess this is the expected behaviour as per explain plan).
Any suggestions as to how to use the index (index hint didn't help), IF