In many normalisation articles, they sometimes mention to mark the alternate
key's.
Why is it important to 'mark' these? Where are they used during the
normalisation-steps?
If you choose for a certain combination of attributes as a primary key, does
the other attributes, (that are part of the alternate key), influence the
normalisation process?
Regards,
Jos
Yes, they do. And the texts that suggest that you only need to look at the
primary key are simply wrong. In fact, the notion of primary key has no
place in normalization theory, at all. You just talk about the set of
candidate keys. For example, if you want to know if you are in 3NF you
should check that for every non-trivial dependency that ends in a non-key
attribute (i.e., an attribute not in any candidate key) the determinant is
the superset of a candidate key. The words "primary key" are nowhere to be
found in that definition, as you can see.
-- Jan Hidders>
If a uniqueness relationship exists in your data model, then you must
show it because it is the truth! That is very simple.
There is a story about the whole PRIMARY KEY thing that we don't like
to talk about. In his early papers, Dr. Codd created the idea of a
PRIMARY KEY, which was one of several candidate keys that was picked
for this honor. Later, he realized that he was still being influenced
by sequential file systems (think of a magnetic tape), which must have
a single key that is used to sort the file. All operations in a
sequential file system depend on (1) sorted order and (2) record
position within the file. The relational model does not have these
limits.
The current RM says a key is a key and nobody is special.
However, this insight was too late for SQL-86. We inherited the
PRIMARY KEY concept, and several SQL products made special PHYSICAL
allowances for it in their access methods and storage. It is the
default for FOREIGN KEY references in SQL-92 and cannot be NULL.
Here is an old article I did for INTELLIGENT ENTERPRISE magazine
http://www.intelligententerprise.com/010101/celko.shtml
===========================
One of the basic tricks in SQL is representing a many-to-many
relationship. You create a third table that references the two (or
more) tables involved by their primary keys. This third table has
quite a few popular names, such as "junction table" or "join table,"
but I know that it is a relationship. People tell you this, then leave
you on your own to figure out the rest.
For example, here are two tables:
CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
...);
CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
... );
Yes, I know using names for a key is a bad practice, but doing so will
make my examples easier to read. You can make a lot of different
many-to-many relationships between these two tables. (If you don't
believe me, just watch the Jerry Springer show some time.) The
simplest relationship table looks like this:
CREATE TABLE Pairs
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name));
The "Pairs" table let’s you insert rows like this:
('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
('Joe Celko', 'Brooke Shields')
Oops! I am shown twice with 'Brooke Shields' because the "Pairs" table
does not have its own key. This mistake is easy to make, but how to
fix it is not always obvious.
The "Orgy" table gets rid of the duplicated rows and makes this a
proper table:
CREATE TABLE Orgy
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key
The primary key for the table comprises two or more columns and is
called a compound key because of that fact.
('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
But the only restriction on the pairs is that they appear only once.
Every boy can be paired with every girl, much to the dismay of the
moral majority. Now, I want to make a rule that guys can have as many
gals as they want, but the gals have to stick to one guy.
The way I do this is to use a NOT NULL UNIQUE constraint on the
girl_name column, which makes it a key. It’s a simple key
because it is only one column, but it is also a nested key because it
appears as a subset of the compound PRIMARY KEY.
"Playboys" is a proper table, without duplicated pairs, but it also
enforces the condition that I get to play around with one or more
ladies:
CREATE TABLE Playboys
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE, -- nested key
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key
('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
CREATE TABLE Playgirls
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key
The Playgirls table would permit these rows from our original set:
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
The moral majority is pretty upset about this Hollywood scandal and
would love for us to stop running around and settle down in nice,
stable couples:
CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE -- nested key,
REFERENCES Girls(girl_name),
PRIMARY KEY(boy_name, girl_name)); -- compound key
The "Couples" table lets you insert these rows from the original set:
('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')
Think about this table for a minute. The PRIMARY KEY is now redundant.
If each boy appears only once in the table and each girl appears only
once in the table, then each (boy_name, girl_name) pair can appear
only once.
From a theoretical viewpoint, I could drop the compound key and make
either boy_name or girl_name the new primary key, or I could just
leave them as candidate keys.
SQL products and theory do not always match. Many products make the
assumption that the PRIMARY KEY is in some way special in the data
model and will be the way to access the table most of the time.
In fairness, making a special provision for the primary key is not a
bad assumption because the REFERENCES clause uses the PRIMARY KEY of
the referenced table as a default. Many programmers are not aware that
a FOREIGN KEY constraint can also reference any UNIQUE constraint in
the same table or in another table. The following nightmare will give
you an idea of the possibilities. The multiple column versions follow
the same syntax.
CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
...
self_ref INTEGER NOT NULL
REFERENCES Foo(fookey),
outside_ref_1 INTEGER NOT NULL
REFERENCES Bar(bar_key),
outside_ref_2 INTEGER NOT NULL
REFERENCES Bar(other_key),
...);
CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,
other_key INTEGER NOT NULL UNIQUE,
...);
But getting back to the nested keys, just how far can you go with
them? My favorite example is a teacher's schedule kept in a table like
this (I am leaving off reference clauses and CHECK() constraints):
CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));
That choice of a primary key is the most obvious one -- use all the
columns. Typical rows would look like this: ('Mr. Celko', 'Database
101', 222, 6) The rules you want to enforce are:
A teacher is in only one room each period
A teacher teaches only one class each period
A room has only one class each period
A room has only one teacher in it each period.
Stop reading and see what you come up with for an answer.
CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period), -- rule #3
UNIQUE (teacher, room, period), -- rule #4
PRIMARY KEY (teacher, class, room, period));
You know that you have 24 ways to pick three objects from a set of
four in an ordered sequence (permutation). If order does not matter,
then you have a combination and only four subsets, all of which I have
used in the UNIQUE constraints. Although column order is important in
creating an index, you can ignore it for now and then worry about
index tuning later.
I probably want to drop the PRIMARY KEY as redundant if I have all
four of these constraints in place. But what happens if I drop the
PRIMARY KEY and then one of the constraints?
CREATE TABLE Schedule-2
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period)); -- rule #3
I can now insert these rows in the second version of the table:
('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
This gives me a very tough sixth period class load because I have to
be in two different rooms at the same time. Things can get even worse
when another teacher is added to the schedule:
('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
('Ms. Shields', 'Database 101', 223, 6)
Ms. Shields and I are both in room 223, trying to teach different
classes at the same time. I think you get the idea that a relationship
table is not simple. And I did not even get into referential actions
and the fact that a relationship can have its own attributes apart
from those of its participants.
Thank you for your insites ...
In newbie-language:
The attributes shall be dependend of ALL the key's
Nothing else then ALL the key's?
Regards,
Jos
--CELKO-- <71062...@compuserve.com> wrote in message ...
>>> In many normalisation articles, they sometimes mention to mark the
>alternate
>key's. Why is it important to 'mark' these? Where are they used during
>the normalisation-steps? <<
>
>If a uniqueness relationship exists in your data model, then you must
>show it because it is the truth! That is very simple.
>
>There is a story about the whole PRIMARY KEY thing that we don't like
>to talk about. In his early papers, Dr. Codd created the idea of a
>PRIMARY KEY, which was one of several candidate keys that was picked
>for this honor. Later, he realized that he was still being influenced
>by sequential file systems (think of a magnetic tape), which must have
>a single key that is used to sort the file. All operations in a
>sequential file system depend on (1) sorted order and (2) record
>position within the file. The relational model does not have these
>limits.
>
>The current RM says a key is a key and nobody is special.
>
>>> snipped <<<<
That is Chris Date's short description of Third Normal Form (3NF). The
actual quote is "the key, the whole key and nothing but the key" which
is a parody of the oath that witnesses take in a trial in the United
States -- "Do you swear to tell the Truth, the whole Truth and
nothing but the Truth, so help you God?"
Chris has a short piece at www.dbdebunk.com on primary keys and how to
pick them.
Not BCNF? Because for 3NF it should read "The non-key attributes shall ..."
>The actual quote is "the key, the whole key and nothing but the key" which
>is a parody of the oath that witnesses take in a trial in the United States
>-- "Do you swear to tell the Truth, the whole Truth and nothing but the
>Truth, so help you God?"
.. and then it turns out that there is not always such a thing as *the* key.
Go figure. :-)
-- Jan Hidders
The circle is complete ...
We are back at my original point/question .. how to interprete 'the key' in
this parodie/saying?
Only the primary key that you choose, or 'all' the alternate key's.
>Chris has a short piece at www.dbdebunk.com on primary keys and how to
>pick them.
Great source of DB-information ... added to my favorites ... thanks,
Jos
Neither is really correct. Remember that it is just an asses bridge and
should not be regarded as a substitute for the actual definitions of 3NF
and BCNF. The closest thing would probably be:
"Every attribute depends upon a candidate key, a whole candidate key and
nothing but a candidate key."
But that of course doesn't have the ring of the original. :-)
-- Jan Hidders