low cardinality columns and indexing.

ayrobins

unread,

Dec 5, 2001, 10:58:04 AM12/5/01

to

I am using the standard edition of oracle. I realize that bitmap indexes
aren't supported.

I have a table that can contain around 50 million records and can be looked
up by column
X. X has a cardinality of 20. i.e. it can contain 20 distinct values. A
lookup by 'X' can
will be done often.

Does it make sense to not use an index at all. Or is a Btree index better
then nothing.

Brian Tkatch

unread,

Dec 5, 2001, 12:21:31 PM12/5/01

to

On Wed, 05 Dec 2001 15:58:04 GMT, "ayrobins" <ant...@lumos.com>
wrote:

Isn't this a perfect case for INDEX COMPRESSion?

Anyway,

CREATE TABLE Moo (Cow NUMBER);

BEGIN
FOR A IN 1..20 LOOP
FOR B IN 1..1000 LOOP
INSERT INTO Moo VALUES (A);
END LOOP;
END LOOP;
END;
/

COMMIT;

SELECT COUNT(*) FROM Moo WHERE Cow = 5;
SELECT COUNT(*) FROM Moo WHERE Cow = 6;
SELECT COUNT(*) FROM Moo WHERE Cow = 7;

For any of them:

First time: Elapsed: 00:00:00.03
After that: Elapsed: 00:00:00.01

CREATE INDEX A ON Moo(Cow);

SELECT COUNT(*) FROM Moo WHERE Cow = 8;
SELECT COUNT(*) FROM Moo WHERE Cow = 9;
SELECT COUNT(*) FROM Moo WHERE Cow = 10;

For any of them:
First time: Elapsed: 00:00:00.01
After that: Elapsed: 00:00:00.00

I'd say the index helps. That's with only 20,000 records. You may just
want to test yourself on a larger set.

Brian

Daniel

unread,

Dec 5, 2001, 3:19:37 PM12/5/01

to

I will suggest that you store these 50 millions rows into a
partitioned table by using each cardinality as the key for each
partition.

"ayrobins" <ant...@lumos.com> wrote in message news:<gurP7.80$6e6....@dca1-nnrp2.news.digex.net>...

ayrobins

unread,

Dec 5, 2001, 4:06:24 PM12/5/01

to

But this becomes non-maintanable if you have, say, 5 columns in a table that
you can do lookups on that have a low cardinality.

"Daniel" <twi...@yahoo.com> wrote in message
news:e0e46f5b.01120...@posting.google.com...

Dan Peacock

unread,

Dec 5, 2001, 4:44:08 PM12/5/01

to

That depends on how those values are distributed and how the rows are
distributed in your blocks. Depending on your block size, you could
have one of your 'X' rows in nearly every block. Thus, you would do a
range scan of the index and then virtually do a full tablescan as you
would still read in every block. Now, if you have skewed values, that
is your column X has some values that take up most of the rows, then
an index would help when going after the values that do not exist in
most of the blocks (this is where histograms and the cost based
optimizer come in handy!) Unfortunately, you've stumbled upon the
biggest problem with using cardinality as a benchmark for the
usefulness of an index. A classic example is indexing by gender. The
cardinality will (generally) be 2. Which would lead you to think that
an index wouldn't be a good idea. In the general population, this is
true. Odds are, you will fetch all of the blocks anyway in a query
using gender. Now, take the case of VMI or the Citidel. The
cardinality is still two, but you and I both know that 50% of the
school isn't made up of women cadets. In that case, an index would
REALLY help when you were trying to access female cadet information,
but hurt you when accessing male cadet information.

I apologize for being a bit long winded, but you can't rely solely on
cardinality when making index choices. You need to know the nature of
your data and the distribution of values in the data set.

HTH

Dan Peacock
DBA
Wolverine World Wide

"ayrobins" <ant...@lumos.com> wrote in message news:<gurP7.80$6e6....@dca1-nnrp2.news.digex.net>...

ayrobins

unread,

Dec 5, 2001, 6:07:43 PM12/5/01

to

Yes, i understand your point. I'm assumming that our column 'X' will be
distributed
most evenly. But you're right, if there is an almost-unique value in a low
cardinality column,
then maybe an index is worth doing.

"Dan Peacock" <peac...@yahoo.com> wrote in message
news:67044b3b.01120...@posting.google.com...