Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

implicit index creation on foreign key addition

15 views
Skip to first unread message

Coniglio Sgabbiato

unread,
Dec 17, 2009, 12:27:53 PM12/17/09
to
Hi,
Is there an option in PG to do it?

Andreas Kretschmer

unread,
Dec 18, 2009, 1:45:48 AM12/18/09
to
Coniglio Sgabbiato <nob...@nowhere.net> wrote:
> Hi,
> Is there an option in PG to do it?

No. Only for primary keys.

Andreas
--
Andreas Kretschmer
Linux - weil ich es mir wert bin!
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net

Coniglio Sgabbiato

unread,
Dec 18, 2009, 9:44:19 AM12/18/09
to
Il 18/12/2009 7.45, Andreas Kretschmer ha scritto:
> Coniglio Sgabbiato<nob...@nowhere.net> wrote:
>> Hi,
>> Is there an option in PG to do it?
>
> No. Only for primary keys.
>
> Andreas

What a shame! Is it possible to request such a feature for PG next
release? Im my opinion it should be active by default (it is expected
that foreign keys will be used in joins besides referential integrity).

Thomas Kellerer

unread,
Dec 18, 2009, 9:46:39 AM12/18/09
to
Coniglio Sgabbiato, 18.12.2009 15:44:

>> No. Only for primary keys.
>>
>> Andreas
>
> What a shame! Is it possible to request such a feature for PG next
> release? Im my opinion it should be active by default (it is expected
> that foreign keys will be used in joins besides referential integrity).

Do you know any DBMS where FK columns are automatically indexed?

Thomas

Coniglio Sgabbiato

unread,
Dec 18, 2009, 10:55:34 AM12/18/09
to
[CUT]

> Do you know any DBMS where FK columns are automatically indexed?
>
> Thomas
>

Here:

http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html

it _seems_ to say that Mysql _could_ do that:

"InnoDB requires indexes on foreign keys and referenced keys so that
foreign key checks can be fast and not require a table scan. In the
referencing table, there must be an index where the foreign key columns
are listed as the first columns in the same order. Such an index is
created on the referencing table automatically if it does not exist.
(This is in contrast to some older versions, in which indexes had to be
created explicitly or the creation of foreign key constraints would
fail.) index_name, if given, is used as described previously."

I tested it but it seems it does not work. I am not so accustomed to
Mysql, so I could did a mistake:

mysql> create table test (id int primary key) engine=innodb;
Query OK, 0 rows affected (0.04 sec)

mysql> create table test2 (id int primary key, fk int references test)
engine=innodb;
Query OK, 0 rows affected (0.05 sec)

mysql> show keys from test2;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
| Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test2 | 0 | PRIMARY | 1 | id | A
| 0 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)

Andreas Kretschmer

unread,
Dec 18, 2009, 10:57:16 AM12/18/09
to

In short: -1 from me.

No, there are a LOT of use cases where nobody needs such an index,
especially with small refereced tables.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°

Marco Mariani

unread,
Dec 18, 2009, 11:05:41 AM12/18/09
to
Coniglio Sgabbiato wrote:


[ talking about foreign keys in mysql ]


> http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
>
> it _seems_ to say that Mysql _could_ do that:

[...]

> I tested it but it seems it does not work.

Maybe you could wait for a full moon and retry..


Coniglio Sgabbiato

unread,
Dec 18, 2009, 11:18:32 AM12/18/09
to
[CUT]

> In short: -1 from me.
>
> No, there are a LOT of use cases where nobody needs such an index,
> especially with small refereced tables.

I meant creation of index in referencing table, not referenced, what do
you think about?

Andreas Kretschmer

unread,
Dec 18, 2009, 11:29:49 AM12/18/09
to

Ahhh...

test=# create table "Coniglio Sgabbiato" (id serial primary key);
NOTICE: CREATE TABLE will create implicit sequence "Coniglio Sgabbiato_id_seq" for serial column "Coniglio Sgabbiato.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "Coniglio Sgabbiato_pkey" for table "Coniglio Sgabbiato"
CREATE TABLE

Read the NOTICE - lines careful ...

But:

test=*# create table foobar(i int references "Coniglio Sgabbiato" on delete cascade);
CREATE TABLE

Marco Mariani

unread,
Dec 18, 2009, 11:39:37 AM12/18/09
to
Coniglio Sgabbiato wrote:


Ok, so I took the time to write a test case.

Can you explain me the effect of the index bigone_dayname_idx, apart
from taking space and slowing insertions?

BTW, most of my foreign keys are never used in joins, are there purely
for referential integrity, and reference small tables.


> -- create a set of rows with low cardinality
>
> CREATE TABLE week ( dayname VARCHAR PRIMARY KEY, stuff FLOAT );
> INSERT INTO week VALUES
> ('monday', 3.1415926535897931),
> ('tuesday', 2.7182818284590451),
> ('wednesday', 1.4142135623730951),
> ('thurstday', 0.3010299956639812),
> ('friday', 0.8660254037844386),
> ('saturday', 0.99627207622074998),
> ('sunday', 0);
>
> -- and a table that will contain a lot of data, referencing the first one
>
> CREATE TABLE bigone ( pk SERIAL, dayname VARCHAR REFERENCES week );
>
> -- 700k rows
>
> INSERT INTO bigone (dayname)
> SELECT dayname FROM (
> SELECT * FROM week CROSS JOIN (SELECT * FROM generate_series(1,100000)) numbers
> ) days
> ORDER BY random();
>
> INSERT INTO bigone (dayname) VALUES ((SELECT dayname FROM week ORDER BY RANDOM() LIMIT 1));
>


now, join them


> test=# EXPLAIN SELECT pk, stuff FROM bigone natural JOIN week;
> QUERY PLAN
> ---------------------------------------------------------------------
> Hash Join (cost=36.10..14089.86 rows=440832 width=12)
> Hash Cond: ((bigone.dayname)::text = (week.dayname)::text)
> -> Seq Scan on bigone (cost=0.00..7992.32 rows=440832 width=36)
> -> Hash (cost=21.60..21.60 rows=1160 width=40)
> -> Seq Scan on week (cost=0.00..21.60 rows=1160 width=40)
> (5 rows)
>

create the index you want (and I don't)
> test=# CREATE INDEX bigone_dayname_idx ON bigone (dayname);
> CREATE INDEX

and retry.

> coges=# EXPLAIN SELECT pk, stuff FROM bigone natural JOIN week;
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join (cost=36.10..20245.12 rows=700001 width=12)
> Hash Cond: ((bigone.dayname)::text = (week.dayname)::text)
> -> Seq Scan on bigone (cost=0.00..10584.01 rows=700001 width=12)
> -> Hash (cost=21.60..21.60 rows=1160 width=40)
> -> Seq Scan on week (cost=0.00..21.60 rows=1160 width=40)

Marco Mariani

unread,
Dec 18, 2009, 11:50:19 AM12/18/09
to
Marco Mariani wrote:

>> INSERT INTO bigone (dayname) VALUES ((SELECT dayname FROM week ORDER
>> BY RANDOM() LIMIT 1));

of course this is unneded

>> coges=# EXPLAIN SELECT pk, stuff FROM bigone natural JOIN week;
>> QUERY PLAN

Fact is, even if you ANALYZE the table, postgres doesn't make use of the
index. It need a bigger week table. How big I don't know, and I'd like
to know if there are some heuristics.

Coniglio Sgabbiato

unread,
Dec 18, 2009, 12:02:55 PM12/18/09
to
Il 18/12/2009 17.29, Andreas Kretschmer ha scritto:
> Coniglio Sgabbiato<nob...@nowhere.net> wrote:
>> [CUT]
>>> In short: -1 from me.
>>>
>>> No, there are a LOT of use cases where nobody needs such an index,
>>> especially with small refereced tables.
>>
>> I meant creation of index in referencing table, not referenced, what do
>> you think about?
>
> Ahhh...
>
> test=# create table "Coniglio Sgabbiato" (id serial primary key);
> NOTICE: CREATE TABLE will create implicit sequence "Coniglio Sgabbiato_id_seq" for serial column "Coniglio Sgabbiato.id"
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "Coniglio Sgabbiato_pkey" for table "Coniglio Sgabbiato"
> CREATE TABLE
>
> Read the NOTICE - lines careful ...
>
> But:
>
> test=*# create table foobar(i int references "Coniglio Sgabbiato" on delete cascade);
> CREATE TABLE

Sorry I cant understand what you mean here, I already know about
implicit creation of indexes on primary keys, I wanted a similar feature
on foreign keys... :)

Andreas Kretschmer

unread,
Dec 18, 2009, 12:03:51 PM12/18/09
to

Sure, the planner works with costs, and so you can calculate this costs.

I haven't time at the moment to illustrate this, maybe later...

Coniglio Sgabbiato

unread,
Dec 18, 2009, 1:13:42 PM12/18/09
to
Il 18/12/2009 17.39, Marco Mariani ha scritto:
> Coniglio Sgabbiato wrote:
>
>>> In short: -1 from me.
>>>
>>> No, there are a LOT of use cases where nobody needs such an index,
>>> especially with small refereced tables.
>>
>> I meant creation of index in referencing table, not referenced, what
>> do you think about?
>
>
> Ok, so I took the time to write a test case.
>
> Can you explain me the effect of the index bigone_dayname_idx, apart
> from taking space and slowing insertions?
>
> BTW, most of my foreign keys are never used in joins, are there purely
> for referential integrity, and reference small tables.
[CUT]

I got your point, good one. Otherwise, I believe that an index on
foreign keys in little tables is a negligible overhead, either in disk
space and in cpu time on inserts. In big tables it might be useful if
the join condition is associate to another condition, for example, if
t1.foo references t2.foo then:

select * from t1 join t2 on t1.foo=t2.foo and t1.foo>'something'

I am assuming (did not test it) that when t1 is huge the planner will
use the index, if present, on t2.foo. What do you think about?

Coniglio Sgabbiato

unread,
Dec 18, 2009, 1:17:22 PM12/18/09
to

have to check relative moisture and wind direction first :)

Coniglio Sgabbiato

unread,
Dec 18, 2009, 1:19:46 PM12/18/09
to

typo:

I am assuming (did not test it) that when t1 is huge the planner will

use the index, if present, on _t1.foo_. What do you think about?

Marco Mariani

unread,
Dec 20, 2009, 5:18:06 PM12/20/09
to
On Dec 18, 7:13 pm, Coniglio Sgabbiato <nob...@nowhere.net> wrote:

> I got your point, good one. Otherwise, I believe that an index on
> foreign keys in little tables is a negligible overhead,

We're not talking about an index in a little table -- but about an
index in a possibly HUGE table
So this is an index that slows down a lot of things.
An index that references a (relatively) small table, hence has a low
selectivity, so it is useless because using it would be slower than a
full scan.

Coniglio Sgabbiato

unread,
Dec 21, 2009, 4:40:35 PM12/21/09
to

True, I was assuming here that a fk on a big table might be there not
only for referential integrity. I was thinking about complex joins with
some "where" clause involved in the join, apart the identity fk-id (se
my post above), that will hold the planner from performing a full scan.
But my assumption it is not the general case, it is only _my_ case, so
your statement against the fk idx automatic creation is right, many
times it would be actually a waste of resources.
So... what about Mysql? It is OT but now I'm curious about their
decision to make it automatic :)

Jasen Betts

unread,
Dec 22, 2009, 3:35:52 AM12/22/09
to
On 2009-12-21, Coniglio Sgabbiato <nob...@nowhere.net> wrote:

> So... what about Mysql? It is OT but now I'm curious about their
> decision to make it automatic :)

it it seems like just more arbitrary mysql stuff.

Marco Mariani

unread,
Dec 22, 2009, 4:44:16 AM12/22/09
to
Coniglio Sgabbiato wrote:

> So... what about Mysql? It is OT but now I'm curious about their
> decision to make it automatic :)

Mysql is even worse in that regard, since its SQL optimizer is... a very
special kid, as some parents would say, it will use indexes even when
scans would be more appropriate.
Several explanations with tests and charts are given in "Refactoring SQL
Applications", which I recommend
(http://oreilly.com/catalog/9780596514976) where the authors compare
Oracle, MSSQL and Mysql.
Needless to say, Postgres behaves more like the first two.

Coniglio Sgabbiato

unread,
Dec 22, 2009, 7:51:32 AM12/22/09
to

Mysql has several issues I really don't like. Its "group by" behavior
simply shocked me. This is just another point against it. I cannot
understand why people choose it over PG. Perhaps built-in replication is
the only functionality Mysql has over PG.

Mladen Gogala

unread,
Dec 22, 2009, 1:32:40 PM12/22/09
to

That's why MySQL will be assimilated. The resistance is futile.

--
http://mgogala.byethost5.com

0 new messages