Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Choosing a surrogate key vs. a composite key - "holy war" issue or clear best practice?

1,048 views
Skip to first unread message

dan...@yahoo.com

unread,
Jun 28, 2008, 7:36:45 AM6/28/08
to
Early in my career, I was accustomed to seeing simple, sequence-
generated surrogate keys. Seemed clean and logical. Later in my
career, I saw seemingly monstrous (to me) composite keys of up to 5
fields. Found them initially repulsive because they made queries
difficult to read and write. I'm wondering if there is a clear best
practice on choosing a surrogate key over a composite key--or if this
is a "holy war" issue no one (collectively) currently agrees on.

Some pros to Composite primary keys:
--------------------------------------------------------

* They enforce uniqueness; meaning you could have a sequence-generated
surrogate key, yet redundant rows may still creep into your table.
They will be uniquely identifiable in that, yes, the surrogate key
values are unique; but all other column values in > 1 rows could be
identical.

* Simpler in the aspect of not having to create, manage, and have
programmers reference a surrogate key sequence

* Can't think of any other Pros to composite keys. Anyone?

Some cons to Composite primary keys:
---------------------------------------------------------

* Composite PKs with a large # of columns make queries difficult to
read and write

* Composite PKs with a large # of columns makes referencing tables
larger; e.g. a foreign key in a child table is a copy of a composite
PK in a parent table. This means more data storage is required in
referencing child tables and the burdens that presents.

* Some applications, like ESRI's ArcGIS Desktop, seem not to be
composite key aware, e.g. to those who know the product, you seemingly
can't do a "relate" to any existing external tables that use composite
keys; this makes database integration with legacy relational and
current relational databases, difficult. This may be the case
generally in OO programming and approaches and not at all unique to
ESRI--the presumption that all primary keys are only ever composed of
a single-column.

Some pros to surrogate primary keys:
-------------------------------------------------------

* Aside from the obvious ones, given the cons I've listed for
composite keys, it seems you can have a surrogate key yet still
enforce more than nominal row uniqueness by creating a unique
constraint/index on the fields you'd otherwise.

* Sometimes there's no good combination of keys for making a composite
primary key

What have I missed in this debate?

I'm open minded. And though it seems I'm biased toward surrogate keys,
I've found myself creating composite keys recently where either a
maximum of two or three fields make a suitable composite key (there's
a certain pleasure in not adding more columns than necessary; shaving
Chris Dates' opponents' beards with Occam's Razor?) and/or the table
I'm creating the composite key on is a "leaf" table--no other tables
are likely to reference it as a parent (saving people the difficulty
of reading/writing queries involving a composite pk).

If anyone knows of any good articles or book chapters fairly weighing
the pros and cons of surrogate vs. composite keys, I'm all eyes. Would
be interesting to see someone make the best case for one, then another
author make the best case for the other.

Thanks.

Dana

Frank van Bortel

unread,
Jun 28, 2008, 9:15:12 AM6/28/08
to
dan...@yahoo.com wrote:
> Early in my career, I was accustomed to seeing simple, sequence-
> generated surrogate keys. Seemed clean and logical. Later in my

They are not. They're called technical keys for a reason. They
are invented by technocrats.

> career, I saw seemingly monstrous (to me) composite keys of up to 5
> fields. Found them initially repulsive because they made queries
> difficult to read and write. I'm wondering if there is a clear best
> practice on choosing a surrogate key over a composite key--or if this
> is a "holy war" issue no one (collectively) currently agrees on.
>
> Some pros to Composite primary keys:
> --------------------------------------------------------
>
> * They enforce uniqueness; meaning you could have a sequence-generated
> surrogate key, yet redundant rows may still creep into your table.
> They will be uniquely identifiable in that, yes, the surrogate key
> values are unique; but all other column values in > 1 rows could be
> identical.

You do not seem to know the difference between Primay and Unique keys

>
> * Simpler in the aspect of not having to create, manage, and have
> programmers reference a surrogate key sequence

Yeah - let's keep it simple. Lets call all PK "ID".


>
> * Can't think of any other Pros to composite keys. Anyone?

They are logical.
Because of that, they are self-documenting.

>
> Some cons to Composite primary keys:
> ---------------------------------------------------------
>
> * Composite PKs with a large # of columns make queries difficult to
> read and write

Well, learn how to read and write!


>
> * Composite PKs with a large # of columns makes referencing tables
> larger; e.g. a foreign key in a child table is a copy of a composite
> PK in a parent table. This means more data storage is required in
> referencing child tables and the burdens that presents.

Storage - who cares? I don't, but I know some employers
do. They just cannot understand many cheap disks will be as fast
as many expensive "enterprise-class" disks. No - those disks
rotate faster, so the "have less latency" - yeah, and so what?

>
> * Some applications, like ESRI's ArcGIS Desktop, seem not to be
> composite key aware, e.g. to those who know the product, you seemingly
> can't do a "relate" to any existing external tables that use composite
> keys; this makes database integration with legacy relational and
> current relational databases, difficult. This may be the case
> generally in OO programming and approaches and not at all unique to
> ESRI--the presumption that all primary keys are only ever composed of
> a single-column.

ESRI and Oracle is not a match, made in heaven. I understand they
are in therapy, so who knows...

> Some pros to surrogate primary keys:
> -------------------------------------------------------
>
> * Aside from the obvious ones, given the cons I've listed for
> composite keys, it seems you can have a surrogate key yet still
> enforce more than nominal row uniqueness by creating a unique
> constraint/index on the fields you'd otherwise.
>
> * Sometimes there's no good combination of keys for making a composite
> primary key

Then, redo your (home)work, and apply analysis.

>
> What have I missed in this debate?

There is no debate. You're mumbling to yourself, and throwing
flamebait. This seems to happen every now and them - searching the
archives is not difficult, but hey, getting your name out is
far more important.
[snip]

--

Regards,
Frank van Bortel

hpuxrac

unread,
Jun 28, 2008, 9:44:31 AM6/28/08
to

Yes this is a religious topic like flavors of unix and linux.

Mostly it is known as surrogate versus natural keys your terminology
seems to imply something else.

It's been thrown around and debated and flamed on all/most of the
major database forums numerous times. You can find several long ones
here on cdos if you search the archive ( accessible thru google groups
interface and possibly other ones ).

dan...@yahoo.com

unread,
Jun 28, 2008, 9:59:33 AM6/28/08
to
On Jun 28, 9:44 am, hpuxrac <johnbhur...@sbcglobal.net> wrote:

> Yes this is a religious topic like flavors of unix and linux.
>
> Mostly it is known as surrogate versus natural keys your terminology
> seems to imply something else.

Thanks for correcting me. The same issue could be stated as synthetic
vs. natural keys. But your terminology is better paired than mine.
There are simple keys and composite keys (also known as complex keys
and concatenated keys). I tend to think of simple keys as surrogate/
synthetic, but a Social Security Number is both a simple key and a
natural key.

> It's been thrown around and debated and flamed on all/most of the
> major database forums numerous times. You can find several long ones
> here on cdos if you search the archive ( accessible thru google groups
> interface and possibly other ones ).

Thanks HP, will do. I'm trying to learn. Not push a particular agenda
and certainly not in a rude manner.

Dana

Mark D Powell

unread,
Jun 28, 2008, 10:11:13 AM6/28/08
to
On Jun 28, 9:15 am, Frank van Bortel <frank.van.bor...@gmail.com>
wrote:

A well designed system will likely have a mix of natural and surrogate
keys. Never use a surrogate key where a suitable natural key exists.
The practice of tacking a surrogate key on every table complicates SQL
more than multi-column PK keys. You still end up with multi-column
keys from inheritance when you attach a surrogate key to every table
plus you often have to create indexes on the natural key anyway since
the natural key value is what the customer uses to search the data.

Worse while the FK will enforce that no child rows exist without a
valid parent key value the FK is usually useless for enforcing
business rules involving data integrity. So if you did not have to
create an index on the natural key for access reasons you end up
creating unique indexes to enforce rules like no two rows will exist
with the same value combination in col-C and col-D.

Even in the case where the natural key value might change it is
usually fairly easy to handle the requirement by just duplicating the
existing key row data with a new PK, updating all existing child rows,
and then deleting the old PK row. We have discovered that using
surrogate keys does not always eliminate the need to update child data
where a PK value changes where corporate mergers, distributors buying
dealers, and similar activities exist. In fact updating or
recognizing the change with historical data is often easier where
natural keys exist within the data rather than surrogate keys
especially if the FK relationship was not defined to the database.

There are pro’s and con’s to each approach. The use of intelligence
in making the key selections rather than relying on dogma will result
in a better design.

IMHO -- Mark D Powell --

hpuxrac

unread,
Jun 28, 2008, 8:00:13 PM6/28/08
to
On Jun 28, 9:59 am, dana...@yahoo.com wrote:
> On Jun 28, 9:44 am, hpuxrac <johnbhur...@sbcglobal.net> wrote:
>
> > Yes this is a religious topic like flavors of unix and linux.
>
> > Mostly it is known as surrogate versus natural keys your terminology
> > seems to imply something else.
>
> Thanks for correcting me. The same issue could be stated as synthetic
> vs. natural keys. But your terminology is better paired than mine.
> There are simple keys and composite keys (also known as complex keys
> and concatenated keys). I tend to think of simple keys as surrogate/
> synthetic, but a Social Security Number is both a simple key and a
> natural key.

Well the SSN example again brings up problems. You don't want a
primary key that can ever change ( fk relationships for example ) or
be duplicated.

But what about identity fraud? How about when people ( as they can do
rarely ) actually change their SSN.

In most cases and applications yes a lot of people do use SSN ... does
it sometimes cause problems though?

> > It's been thrown around and debated and flamed on all/most of the
> > major database forums numerous times.  You can find several long ones
> > here on cdos if you search the archive ( accessible thru google groups
> > interface and possibly other ones ).
>
> Thanks HP, will do. I'm trying to learn. Not push a particular agenda
> and certainly not in a rude manner.

Not sure what Frank got into his bonnet for his reply. No worries. I
don't hang out much on any of the other database forums but there's a
lot of them. No shortage of discussions here in the past on cdos
though. Of course some good discussions on Tom Kyte's
asktom.oracle.com also.

Message has been deleted

DA Morgan

unread,
Jun 30, 2008, 3:56:58 PM6/30/08
to
Mark D Powell wrote:

> Here is another fact involving Social Security Numbers that many
> people do not know. SSN's are not truely unique as the Social
> Security Administration re-issued around 10,000 numbers (technically
> due to an error). While I believe the original holder for all these
> numbers were supposed to be dead some insurance companies and credit
> card issuers with extensive databases have probably encountered the
> issue.

On one hand you are correct ... SSN's, in the US, are not unique.
On the other hand for purposes of computer systems they are a great
natural key in that those circumstances where you need to have them
they'd better be unique.

What I'm thinking here is banking, credit cards, taxes, etc.

We all know there are dups but something must be done if a dup is
found. A surrogate key that allows this to happen sidesteps the
quick way to identify the dups and leaves open the possibility
of far worse things happening.

The sad fact, too, is that 99% of dup SSNs are the result of
identity theft ... not governmental incompetence.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

0 new messages