Primary key, Foreign key and indexes in Informix

KalpanaPai

unread,

Apr 27, 2004, 8:11:08 PM4/27/04

to

Dear All,

We r using informix 9.21 on solaris.
We have a small database of 1.5GB with 100 tables.

I have staretd analysing the appliaction queries which have written
in the past. I have few questions regarding PK,FK and index creation
in informix.

1. When i create a Primary key it creates an index automatically. Do i
need to create explicit index for better performance?

2. When i create a foreign key it also creates an index, do i need to
create an index on the foreign key also explicitly , if it is used in
where cluase etc.

3. Will it imprve the performance of the queries if we create a
explicit indexes .

4. If i am using 4 columns in one query which runs very frequently,
and one of the same column i use in another qury, and another column
in another query. In this case is it better to create a composite
index on all 4 columns , and separate indexes on each column?

Any suggestions highly appreciated..
Thanks in Advance
Kpai

rkusenet

unread,

Apr 27, 2004, 8:50:19 PM4/27/04

to

"KalpanaPai" <kalpa...@hotmail.com> wrote in message news:8b77f6f5.04042...@posting.google.com...

> Dear All,
>
> We r using informix 9.21 on solaris.
> We have a small database of 1.5GB with 100 tables.
>
> I have staretd analysing the appliaction queries which have written
> in the past. I have few questions regarding PK,FK and index creation
> in informix.
>
> 1. When i create a Primary key it creates an index automatically. Do i
> need to create explicit index for better performance?

PKY creates index automatically. Trying to create index on the
same column will result in error, unless of course you create it
as a DESC index.

>
> 2. When i create a foreign key it also creates an index, do i need to
> create an index on the foreign key also explicitly , if it is used in
> where cluase etc.

Same as PKY. FKY will create an index for you.

> 3. Will it imprve the performance of the queries if we create a
> explicit indexes .

Moot.

> 4. If i am using 4 columns in one query which runs very frequently,
> and one of the same column i use in another qury, and another column
> in another query. In this case is it better to create a composite
> index on all 4 columns , and separate indexes on each column?

assuming the columns to be indexed are a,b,c,d
in this case composite index is worthless for those queries which
uses column b or c or d.
Composite index will work only for
WHERE a = jujuju AND b = jujuju AND c = jujuju AND d = jujuju.

composite index will also work for this query
WHERE a = jujuju
because column a is the leading column in this index.

so for a query like WHERE b = jujuju, there will be no index scan.
same for column c and d.

You have no other choice but to create separate index. I would also
suggest creating a composite index and individual index as required.
For all said and done, composite index perform order of magnitude
better than separate index for queries like
WHERE a = jujuju AND b = jujuju AND c = jujuju AND d = jujuju.

Andrew Hamm

unread,

Apr 27, 2004, 9:42:48 PM4/27/04

to

KalpanaPai wrote:
>
> 1. When i create a Primary key it creates an index automatically. Do i
> need to create explicit index for better performance?
>
> 2. When i create a foreign key it also creates an index, do i need to
> create an index on the foreign key also explicitly , if it is used in
> where cluase etc.
>
> 3. Will it imprve the performance of the queries if we create a
> explicit indexes .

As you've observed, the engine will create a hidden index if it cannot find
an existing explicit index that will cover the task. I find that it is a
little bit better administratively to create an explicit index first and
then let the primary or foreign keys "piggy back" on the existing indexes.
The same also goes for a UNIQUE constraint attached to columns, by the way.

As for performance, an index is an index. If the engine can use it then it
doesn't matter where the index comes from. The will be no performance
difference from using an explicit index, or depending on the implicit index.

There is, however, a potential problem with foreign key indexes, and it
comes about due to an effect that causes trouble with any indexes that have
a very poor data distribution. To understand the problem you need to know a
bit about the internal structure of an index.

For every unique value in an index, the engine stores one copy of the value
(this can include a set of more than one field; it doesn't matter how the
stored value is constructed). Then for every value "inserted" into the
index, if the value already exists in the index then only the address of the
new row is stored. All of the addresses are stored in a linked list of
rowids that are attached to the key value.

If you insert 1, 1, 2, 2, 1, 3, 2, 3 into a table and there is an index on
that column, then there will be 3 entries for (1), 3 entries for (2), and 2
entries for (3). These will be stored something like this [this diagram is
very simplified]:

1: rowid1 rowid2 rowid5
2: rowid3 rowid4 rowid7
3: rowid6 rowid8

If you insert or delete a row then the engine must find the exact rowid in
the list for that value. If there is only a few values then that search is
extremely quick, since many fit on one page. However, when the number of
rows sharing the same key approaches hundreds, then the search for the rowid
can cause quite a few page reads and will take a lot of time. Also, the
adjustment of the list to fill in the gaps will probably take a lot of time
[If anyone can describe engine optimisations here I'd be most interested to
read]

Therefore, if you are going to put an index onto a column or set of columns
that has a large number of duplicates, then you would probably be better off
NOT using the index.

A good example of a bad index would be to index the State field of a
customer order table in the US - perhaps for IBM customer orders, for
example.

Let's say there are 50 US states (close enough) and there are 200,000
customer orders in the table at any one time. On average there will be
200,000 / 50 rows all sharing the same index value, which means there will
be on average 4000 rows attached to one key. Given the population spread of
America, there would probably be 50,000 orders for the states of CA and NY
each. That is a very large list of rowids to be stored in an index.

Therefore, every time an order is added or removed from the table, there
will be a lot of hard work to be done adjusting the index. This index would
be A Very Bad Idea.

Now, if you are a Relational Database Purist, you might insist that your
order table MUST have a foreign key on the state table. If you believe this,
you will suffer.

It is just as easy (generally) to make the programs check these
relationships. To get the best from FKs, I suggest that you should use as
many foreign keys as you need in a development database, but you should
remove the bad FK and index definitions from production databases. If you do
adequate testing in development then any software faults should be caught
before it ever gets to production. At the very least, the existence of the
FK in development is a very strong hint to the programmers that they need to
validate the state field.

> 4. If i am using 4 columns in one query which runs very frequently,
> and one of the same column i use in another qury, and another column
> in another query. In this case is it better to create a composite
> index on all 4 columns , and separate indexes on each column?

If an index exists on columns (A,B,C,D) then the engine can use the index on
queries that refer to (A,B,C,D) or (A,B,C) or (A,B) or (A) columns. So I try
to order (A,B,C,D) in a way that can be useful for other queries. If you
also want to query by column C only, then you may need an index on (C).

On the other hand, the engine can do other query paths that don't
necessarily need indexes. This applies especially when you are selecting
large parts of the tables involved in the joins. The engine can either make
a temporary index, or perform a sort operation at Just The Right Time, or it
can perform a hash-join.

The easiest way to know if an index is useful is to try it and see what
happens. With experience you'll be able to make better guesses in the future
without needing to test every time.

Jack Parker

unread,

Apr 27, 2004, 11:00:49 PM4/27/04

to

----- Original Message -----
From: "KalpanaPai" <kalpa...@hotmail.com>
To: <inform...@iiug.org>
Sent: Tuesday, April 27, 2004 9:11 PM
Subject: Primary key, Foreign key and indexes in Informix

> Dear All,
>
> We r using informix 9.21 on solaris.
> We have a small database of 1.5GB with 100 tables.
>
> I have staretd analysing the appliaction queries which have written
> in the past. I have few questions regarding PK,FK and index creation
> in informix.
>

> 1. When i create a Primary key it creates an index automatically. Do i
> need to create explicit index for better performance?

When you create an index you can tell the engine where to put it. When you
use the primary key syntax - the engine decides where to put it. If you
create a unique key that matches the primary key requirements - the engine
will use that index to satisfy the primary key - hence it is generally best
to first define you own unique index which the engine can use to satisfy the
primary key.

>
> 2. When i create a foreign key it also creates an index, do i need to
> create an index on the foreign key also explicitly , if it is used in
> where cluase etc.

Ditto. fer shure.

>
> 3. Will it imprve the performance of the queries if we create a
> explicit indexes .

Perhaps (assuming the same line of questioning). Index location is more of
a storage location consideration and spindle usage consideration - which is
generally better known by the DBA than by the engine. The index itself is a
b(+)-tree, so you are going to navigate the b-tree whenever you try to read
it. You want that tree to be as tight as possible and removed from
potentially contentious situations - hence why we bother to take control
over where it goes.

>
> 4. If i am using 4 columns in one query which runs very frequently,
> and one of the same column i use in another qury, and another column
> in another query. In this case is it better to create a composite
> index on all 4 columns , and separate indexes on each column?
>

It depends. (;-)). The first column (or set of columns) of a composite
index can be used by queries that do not fully qualify the entire index
contents. So if I have an index on columns a,b,c and d, this will be used
to satisfy queries where a,b,c,d are all spelled out - or where a and b are
spelled out, or just 'a'. It will NOT be used when only 'b' is spelled out
(or just c or just d - or some combination thereof).

> Any suggestions highly appreciated..
> Thanks in Advance
> Kpai
>

sending to informix-list

Colin Bull

unread,

Apr 28, 2004, 4:54:30 AM4/28/04

to

Andrew Hamm wrote

> There is, however, a potential problem with foreign key indexes, and it
> comes about due to an effect that causes trouble with any indexes that have
> a very poor data distribution. To understand the problem you need to know a
> bit about the internal structure of an index.

What would you say about a table that returns the following -

select count(*), id_code from tab1 group by id_code

1254696 X
1069 Y
1530855 Z

Tab1 is only ever added to, but is used a lot in queries.
id_code is a non composite index.
id_code is a foreign key on another table.

Field and table name changed to protect the innocent ( or guilty!! )

Colin Bull
c.b...@videonetworks.com

sending to informix-list

Dragi Raos

unread,

Apr 28, 2004, 5:57:23 AM4/28/04

to

----- Original Message -----
From: "Andrew Hamm" <ah...@mail.com>
To: <inform...@iiug.org>
Sent: Wednesday, April 28, 2004 3:42 AM
Subject: Re: Primary key, Foreign key and indexes in Informix

[On indices on potential FK columns with poor data distribution]

> If you do
> adequate testing in development then any software faults should be
caught
> before it ever gets to production.

Heh, the famous last words :-)

I am still in favour of some kind of db-level consistency checking - be
it using referential constraint, check constraint, trigger.... No such
thing as perfectly tested application.

BTW, good old Informix 5 was terrible with this kind of indices - not
only when inserting or updating; since meaningful distribution
statistics were missing, the optimizer very often got it completely
wrong.

Bonzi

sending to informix-list

David Williams

unread,

Apr 28, 2004, 2:39:13 PM4/28/04

to

"Andrew Hamm" <ah...@mail.com> wrote in message
news:c6n282$dqs2i$1...@ID-79573.news.uni-berlin.de...

>
> Therefore, if you are going to put an index onto a column or set of
columns
> that has a large number of duplicates, then you would probably be better
off
> NOT using the index.

Nope...add a serial field to the table and build the index as
a unique index on (myfield1,serial_field). Then the index is used but is
unique
so no rowid lists are used!

>
> Therefore, every time an order is added or removed from the table, there
> will be a lot of hard work to be done adjusting the index. This index
would
> be A Very Bad Idea.
>

See above.

> Now, if you are a Relational Database Purist, you might insist that your
> order table MUST have a foreign key on the state table. If you believe
this,
> you will suffer.

I am assuming that the foreign key will be smart enough to use the
above index? ..wait a sec I have 9.40 on windows here.....ok
playing in dbaccess I can create

create table "Administrator".t2
(
p1 integer
);
revoke all on "Administrator".t2 from "public";

create unique index "Administrator".pi1 on "Administrator".t2
(p1) using btree ;
alter table "Administrator".t2 add constraint primary key (p1)
constraint "Administrator".c1 ;

and

create table "Administrator".t1
(
a serial not null ,
f1 integer
);
revoke all on "Administrator".t1 from "public";

create unique index "Administrator".fi1 on "Administrator".t1
(f1,a) using btree ;
alter table "Administrator".t1 add constraint (foreign key (f1)
references "Administrator".t2 constraint "Administrator".fc1);

...even though my index fi1 contains the serial column a as well as the
foreign key f1 the constraint does NOT create it own index!

>
> It is just as easy (generally) to make the programs check these
> relationships. To get the best from FKs, I suggest that you should use as
> many foreign keys as you need in a development database, but you should

You mean like Oracle Financials which I heard has hundreds of tables
and only 3 foreign keys even though Oracle recommend using foreign keys?

> remove the bad FK and index definitions from production databases. If you
do
> adequate testing in development then any software faults should be caught
> before it ever gets to production. At the very least, the existence of the
> FK in development is a very strong hint to the programmers that they need
to
> validate the state field.

The indexes should be used anyway. When adding a row in the foreign table
using the index on the primary. When deleting (or flagging as disabled) a
row
in the primary check for no references in the foreign.

>
> > 4. If i am using 4 columns in one query which runs very frequently,
> > and one of the same column i use in another qury, and another column
> > in another query. In this case is it better to create a composite
> > index on all 4 columns , and separate indexes on each column?
>
> If an index exists on columns (A,B,C,D) then the engine can use the index
on
> queries that refer to (A,B,C,D) or (A,B,C) or (A,B) or (A) columns. So I
try
> to order (A,B,C,D) in a way that can be useful for other queries. If you
> also want to query by column C only, then you may need an index on (C).
>
> On the other hand, the engine can do other query paths that don't
> necessarily need indexes. This applies especially when you are selecting
> large parts of the tables involved in the joins. The engine can either
make
> a temporary index, or perform a sort operation at Just The Right Time, or
it
> can perform a hash-join.

A temporary index (autoindex path) is usually a very bad idea!

David Williams

unread,

Apr 28, 2004, 2:40:13 PM4/28/04

to

"Colin Bull" <c.b...@videonetworks.com> wrote in message
news:c6nsl3$dr6$1...@terabinaries.xmission.com...

> What would you say about a table that returns the following -
>
> select count(*), id_code from tab1 group by id_code
>
> 1254696 X
> 1069 Y
> 1530855 Z
>
> Tab1 is only ever added to, but is used a lot in queries.
> id_code is a non composite index.
> id_code is a foreign key on another table.
>

As a serial. create composite index on (id_code,serial), remove index on
id_code.

Andrew Hamm

unread,

Apr 28, 2004, 11:32:02 PM4/28/04

to

Colin Bull wrote:
>
> What would you say about a table that returns the following -
>
> select count(*), id_code from tab1 group by id_code
>
> 1254696 X
> 1069 Y
> 1530855 Z
>
> Tab1 is only ever added to, but is used a lot in queries.
> id_code is a non composite index.
> id_code is a foreign key on another table.

Ouch! I'd say, fragment - that will satisfy equality clauses asking for X Y
or Z, and you can eliminate that index.

I would actually expect that add operations are fairly cheap; hoping that
the database keeps a pointer to the end page, or maybe a freelist of slots
in the linked list of rowids. But pity the poor person who has to change a
value or rollback a transaction.

If it's definitely a 3-valued index then fragment by expression for = "X", =
"Y" and ="Z". "Y" would probably be better as an OTHERWISE (or is it ELSE?)
clause since it will collect other stray values that creep in. If you don't
want other stray values to creep in, then don't use the OTHERWISE clause,
and the engine will spit it out.

If the values of X Y and Z can change occasionally or be added to, then you
need to consider small administrative activity on the fragments when new
values arrive. This is not uncommon for slowly creeping fragmentation rules.

Since it's FK'd to another table, can you make the entry programs validate
it? However, if you do use an explicit list of values in the fragmentation
expressions, then you'll still get an engine error not unlike a FK violation
if a naughty value is inserted.

Andrew Hamm

unread,

Apr 28, 2004, 11:36:58 PM4/28/04

to

Dragi Raos wrote:
>
>> If you do
>> adequate testing in development then any software faults should be
>> caught before it ever gets to production.
>
> Heh, the famous last words :-)

yup - very famous. An alternative which would be just as rigorous would be
to use a trigger on the production database. Set the trigger to fire on
insert or update of the field, and it can do a lookup into the little
reference table. That way you get the strong validation without having to
suffer the cost of the FK index.

> I am still in favour of some kind of db-level consistency checking -
> be it using referential constraint, check constraint, trigger.... No
> such thing as perfectly tested application.

No, but when something can so severely compromise performance you need to
shift the compromise. Perhaps the paranoid could run a regular process
looking for bad values that have crept in to a column that has a removed FK.
However then you'd only know that there is a problem, but not which program
is responsible.

> BTW, good old Informix 5 was terrible with this kind of indices - not
> only when inserting or updating; since meaningful distribution
> statistics were missing, the optimizer very often got it completely
> wrong.

yeah true. the 7's and 9's seem to get less confused with crap indexes.