Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Surrogate primary key plus unique constraint vs. natural primary key: data integrity?

592 views
Skip to first unread message

Wolfgang Keller

unread,
Mar 8, 2013, 2:22:33 PM3/8/13
to
Hello,

I'm aware that this might be an FAQ, but I've googled to no avail so
far. The discussion/documents that I've found relating to the
subject revolve around other aspects than data integrity, and this is
what I care for.

I've always been used to using exclusively natural (comosite) primary
keys, to warrant data integrity. Now computer science people (I'm just
an engineer) keep telling me that concerning data integrity, it would be
absolutely equivalent to use a surrogate primary key together with a
unique constraint on the natural key.

I seem to vaguely remember from my classes long ago that there is an
issue with data integrity, especially referential integrity with this
alternative, but I can't figure anymore out what it was precisely.

Am I mistaken?

Or could anyone give an example where the approach of using a surrogate
primary key together with a unique constraint on the natural key would
"break" data integrity (especially referential integrity) which could be
avoided by using a natural (composite) primary key?

When they taught us the different levels of normalisation, the professor
proceeded this way; by giving an example of a not-normalised schema for
each level and how this would break integrity in case of certain
(legal) operations on the data.

Is there a similar example for my case?

Any hints to corresponding explanations that I haven't found yet are
welcome.

Sorry for opening a potential can of worms.

TIA,

Sincerely,

Wolfgang

P.S.: In my specific application case surrogate keys are a non-starter
anyway, since the schema I am working on serves for
exchanging/federating data from several independent sources, and if
records coming from different input systems referring to the same
"real world" item have different keys, the data from the different
sources can't be federated/exchanged without prohibitive manual extra
work.

But I'm just looking for the generic case.

Roy Hann

unread,
Mar 8, 2013, 3:05:17 PM3/8/13
to
Wolfgang Keller wrote:

> Hello,
>
> I'm aware that this might be an FAQ, but I've googled to no avail so
> far. The discussion/documents that I've found relating to the
> subject revolve around other aspects than data integrity, and this is
> what I care for.
>
> I've always been used to using exclusively natural (comosite) primary
> keys, to warrant data integrity. Now computer science people (I'm just
> an engineer) keep telling me that concerning data integrity, it would be
> absolutely equivalent to use a surrogate primary key together with a
> unique constraint on the natural key.

I'm a computer science person, albeit one who dates back to the 1970s
and I will tell you (a) there's no such thing as a primary key; a key is
a key or it's not. Preferring one candidate key over another is purely
psychological. There is no theoretical/logical basis to introduce a
surrogate primary key in the presence of a natural key. Nor, of course
is there a basis for excluding a surrogate primary key either. Use
surrogates or don't, but don't pretend the surrogate key improves your
data integrity/quality because it doesn't.

And (b) in the absence of a natural key any surrogate key is pure hope
and conjecture (i.e. error dressed up as "best practice"). Doing that
certainly would mask a data massiveintegrity problem.

>
> I seem to vaguely remember from my classes long ago that there is an
> issue with data integrity, especially referential integrity with this
> alternative, but I can't figure anymore out what it was precisely.

It's a pain in the jacksie when surrogate keys spread deep into the
database abandoning the natural keys at the door so you can never find
anything except by joining (and joining and joining...).

I don't think there are any integrity problems provided you enforce
unique constraints on the natural keys.

But why introduce surrogate keys at all? Even in SQL databases we can
define a foreign key with ON UPDATE CASCADE.

> Am I mistaken?

Only a bit.

> Sorry for opening a potential can of worms.

Not at all. We like a good worm here.

--
Roy

Wolfgang Keller

unread,
Mar 8, 2013, 5:11:29 PM3/8/13
to
> (a) there's no such thing as a primary key;

PostgreSQL seems to think differently. >;->

It requires definition of a primary key afaik.

> a key is a key or it's not. Preferring one candidate key over
> another is purely psychological. There is no theoretical/logical
> basis to introduce a surrogate primary key in the presence of a
> natural key.

Don't tell that to me. ;-)

Tell that to those DB admins who refuse natural (composite) keys
and to the developers of certain DB application frameworks that
don't allow composite primary keys.

> Nor, of course is there a basis for excluding a surrogate primary key
> either. Use surrogates or don't, but don't pretend the surrogate key
> improves your data integrity/quality because it doesn't.

The surrogate key was not supposed to *improve* data integrity.

The issue was whether a surrogate primary key plus a unique constraint
on the natural key is *equivalent* concerning data integrity, especially
relational integrity, to a natural primary key.

> I don't think there are any integrity problems provided you enforce
> unique constraints on the natural keys.

So I'm mistaken...?

> But why introduce surrogate keys at all?

I don't want to open *that* can of worms, since for me the decision is
taken anyway. By the requirement to federate/exchange data
referring to the same "real world" item but coming from different
sources. Which would require prohibitive amounts of handwork with
surrogate keys, if it is possible at all.

I just wanted to know if there's a "killer" argument concerning
data integrity *against* the approach to use a surrogate primary key
plus a unique constraint.

> > Sorry for opening a potential can of worms.
>
> Not at all. We like a good worm here.

Bon appetit. >;->

Sincerely,

Wolfgang

James K. Lowden

unread,
Mar 8, 2013, 7:49:57 PM3/8/13
to
On Fri, 8 Mar 2013 23:11:29 +0100
Wolfgang Keller <feli...@gmx.net> wrote:

> I just wanted to know if there's a "killer" argument concerning
> data integrity *against* the approach to use a surrogate primary key
> plus a unique constraint.

The killer argument: You don't make unique things more unique by
numbering them. You don't even make nonunique things unique by
numbering them; it just seems that way.

> > There is no theoretical/logical basis to introduce a surrogate
> > primary key in the presence of a natural key.
>
> Tell that to those DB admins who refuse natural (composite) keys
> and to the developers of certain DB application frameworks that
> don't allow composite primary keys.

Roy correctly said that the relational model identifies a row by its
key. A relational database wouldn't require one of the keys to be
"primary".

I'm shocked, shocked to hear of DBAs and frameworks that don't
understand elementary aspects of the relational model. Shocked, I
say.

I have worked with tables whose natural key was four columns or more.
The worst was a tax lot table with IIRC at 13-column key: the
combination of account number, security id, quantity, date, and price
wasn't enough. Assigning each one a number made them more convenient
to refer to. But (as in your case) it was an extra column, another
number to generate and maintain, another opportunity for error.

HTH.

--jkl

Norbert_Paul

unread,
Mar 8, 2013, 2:37:07 PM3/8/13
to
Hi Wolfgang,

your concerns seem to have a contradiction:

> I've always been used to using exclusively natural (comosite) primary
> keys, to warrant data integrity. Now computer science people (I'm just
> an engineer) keep telling me that concerning data integrity, it would be
> absolutely equivalent to use a surrogate primary key together with a
> unique constraint on the natural key.

Note that they say "absolutely equivalent". Though I have never
heard of a "relative equivalence" this means that what applies to
A applies to B and vice versa.

> Or could anyone give an example where the approach of using a surrogate
> primary key together with a unique constraint on the natural key would
> "break" data integrity (especially referential integrity) which could be
> avoided by using a natural (composite) primary key?

When the approaches are, in fact, equivalent such an example cannot exist.

Hope that helped.

Norbert
an engineer, too :)

George Neuner

unread,
Mar 9, 2013, 2:24:35 AM3/9/13
to
On Fri, 8 Mar 2013 20:05:17 +0000 (UTC), Roy Hann
<spec...@processed.almost.meat> wrote:

>I'm a computer science person,

Ditto.

>and I will tell you (a) there's no such thing as a primary key; a key is
>a key or it's not. Preferring one candidate key over another is purely
>psychological. There is no theoretical/logical basis to introduce a
>surrogate primary key in the presence of a natural key.

Correct. However there may be a practical basis ... e.g., it may be
computationally expensive to perform comparisons on the natural key
and so you would prefer to work with a less expensive surrogate when
possible.

>But why introduce surrogate keys at all? Even in SQL databases we can
>define a foreign key with ON UPDATE CASCADE.

Many intro DB texts use strings - typically names - as keys in their
examples, while eliding (or glossing over) practical consideration of
using strings as keys in real world systems. Unfortunately, there are
too many people creating/administering databases who have, at best,
only intro level exposure.

Ideally [and in most cases], a foreign key in a dependent table will
be an internal object reference to the record in the primary table
rather than a distinct copy of the key value ... but you can't count
on that if you're working with older software or with a distributed
database [yes Virginia, there are reasons to have the primary and
dependent tables be on different hosts (or DBMS instances)].

Regardless of the internal FK representation, inserts/updates on the
dependent table still involve comparisons against the referenced key
column of the primary table. If your design makes extensive use of
FKs in dynamic tables, then for performance reasons you want to make
those comparisons as inexpensive as possible.

As always, YMMV.
George

Jan Hidders

unread,
Mar 9, 2013, 9:18:53 AM3/9/13
to
Op vrijdag 8 maart 2013 20:22:33 UTC+1 schreef Wolfgang Keller het volgende:
> Hello,
>
>
>
> I'm aware that this might be an FAQ, but I've googled to no avail so
>
> far. The discussion/documents that I've found relating to the
>
> subject revolve around other aspects than data integrity, and this is
>
> what I care for.
>
>
>
> I've always been used to using exclusively natural (comosite) primary
>
> keys, to warrant data integrity. Now computer science people (I'm just
>
> an engineer) keep telling me that concerning data integrity, it would be
>
> absolutely equivalent to use a surrogate primary key together with a
>
> unique constraint on the natural key.
>
>
>
> I seem to vaguely remember from my classes long ago that there is an
>
> issue with data integrity, especially referential integrity with this
>
> alternative, but I can't figure anymore out what it was precisely.
>
>
>
> Am I mistaken?
>
>
>
> Or could anyone give an example where the approach of using a surrogate
>
> primary key together with a unique constraint on the natural key would
>
> "break" data integrity (especially referential integrity) which could be
>
> avoided by using a natural (composite) primary key?


As you probably have understood by now from the other responses, there is no such example. The reasons to be against surrogate identifiers are usually more of the pragmatic type, e.g.,
- Schema designers tend to specify only the artificial PK and forget about the other keys
- Tables become harder to understand and interpret for people (can become very relevant at maintenance time)

An argument that might be considered more fundamental is about the updatability of columns. If you believe that columns of PKs should not be updatable (and some DBMSs enforce this) because such updates are in principle better thought of as a delete plus an insert, then introducing a surrogate identifier changes the updatability. But most DBMSs actually do support updating such columns, and IMO the arguments for their non-updatability are not strong anyway.

-- Jan Hidders

Roy Hann

unread,
Mar 9, 2013, 9:52:27 AM3/9/13
to
Wolfgang Keller wrote:

>> (a) there's no such thing as a primary key;
>
> PostgreSQL seems to think differently. >;->

Well like any product PostgreSQL will manifest the psychological quirks
of its implementors. I'll take your word that PostgreSQL does this.

> It requires definition of a primary key afaik.

If it is made to insist that you must prefer one key over all others,
even if you *really* don't want to, just because it suits someone
sometimes, that would be a serious flaw IMO.

>> a key is a key or it's not. Preferring one candidate key over
>> another is purely psychological. There is no theoretical/logical
>> basis to introduce a surrogate primary key in the presence of a
>> natural key.
>
> Don't tell that to me. ;-)
>
> Tell that to those DB admins who refuse natural (composite) keys

I encounter those people frequently. Usually they refuse on the basis
of some second-hand dogma about efficiency or performance, if they have
any argument at all.

Provided a key is familiar, stable, unique and simple it is a good key
to use. The desire for simplicity does not exclude composite keys. It
excludes unweildy composite keys. The cut-off beyond which you
judge a composite key to be unweildy is psychological, but we can
probably all agree that more than four or five components is a nuisance.
Two is definitely manageable. Three probably is too.

If a composite key leads to simpler, more comrehensible code and fewer
joins to get required data, with no great loss of performance, the case
for a composite is water-tight.

> and to the developers of certain DB application frameworks that
> don't allow composite primary keys.

Again, frameworks are products and they will manifest the mental tics
and quirks of their developers. They will also be designed around the
misconceptions of the developers, and since they are invariably
programmers--usually OO programmers--they will usually be deeply
ignorant of database technology, and proud of it.

Choosing a so-called "framework" (I'll pick on Hibernate by name) is the
first step to completely botching the database design. You end up with
an inefficient object repository in which all knowledge of the business
is subordinated to knowledge of the objects.

No one gives a shit about objects except an object-oriented programmer.
The end-users and the business CERTAINLY don't want to know about
objects.

>> Nor, of course is there a basis for excluding a surrogate primary key
>> either. Use surrogates or don't, but don't pretend the surrogate key
>> improves your data integrity/quality because it doesn't.
>
> The surrogate key was not supposed to *improve* data integrity.
>
> The issue was whether a surrogate primary key plus a unique constraint
> on the natural key is *equivalent* concerning data integrity, especially
> relational integrity, to a natural primary key.

OK, I will admit I missed the point of the question.

Assuming the unique constraints are defined that's the job done. You
can add any other columns you like to the table, including a surrogate
key.

A surrogate key plus a unique constraint is not some much equivalent to
using natural keys; the surrogate key is irrelevant. You can have it or
not. Your choice.

Repeating an argument I made above, if a surrogate key leads to simpler,
more comrehensible code and fewer joins to get required data, with no
great loss of performance, the case for a surrogate is water-tight.

The point here being dogma isn't helpful. You have to think.

But I can be as dogmatic as the next man, and my dogma is use natural
keys till you know they're a problem.

>> But why introduce surrogate keys at all?
>
> I don't want to open *that* can of worms, since for me the decision is
> taken anyway. By the requirement to federate/exchange data
> referring to the same "real world" item but coming from different
> sources. Which would require prohibitive amounts of handwork with
> surrogate keys, if it is possible at all.

I think you are arguing that using natural keys make federation not
just easier, but more possible. I would agree, in general.

> I just wanted to know if there's a "killer" argument concerning
> data integrity *against* the approach to use a surrogate primary key
> plus a unique constraint.

Most DBAs and all programmers love writing code to solve problems. The
more the better. So my killer argument--natural keys are
usually way easier to work with--won't sway them one bit.

--
Roy


com...@hotmail.com

unread,
Mar 9, 2013, 6:02:29 PM3/9/13
to
On Friday, March 8, 2013 2:11:29 PM UTC-8, Wolfgang Keller wrote:
> The issue was whether a surrogate primary key plus a unique constraint
> on the natural key is *equivalent* concerning data integrity, especially
> relational integrity, to a natural primary key.

Wolfgang,

Key columns should be declared to be unique. If there are no other columns then declaring such a constraint is unnecessary but not wrong. ("Primary" might have some meaning to a product but it doesn't in the relational model.)

Applying this to having the natural key columns plus a surrogate key column means that each of these key column sets needs an explicit unique constraint even if there aren't any further columns, because each is the other's "other columns".

A row is in a table when and only when it makes some membership condition true. The way you are using keys, your membership condition for the original table is that the natural key column values identify something AND that some further condition is true of those plus any further column values. Also, your membership condition for the new table is that condition/conjunction AND that the surrogate key column value identifies the same thing as the natural key column values do.

Clearly there is a one-to-one correspondence between the tables/designs given that each surrogate & natural key value pair in the new table is in fact how you are identifying things.

You want tables/designs to be "equivalent". What does this mean? One sense is that for an old expression involving the old table giving an old value you want a corresponding new expression involving the new table to give a corresponding new value. Note that this involves corresponding new/old table variable membership conditions, values of table variables and query results, and query and constraint expressions (including literals).

One correspondence is "the new version of a condition/value/expression everywhere looks like the old one with the appropriate surrogate key columns/values added to the natural key columns/values". Another correspondence is "the new version of a condition/value/expression involving just that one table variable adds the appropriate surrogate key columns/values but in every other condition/value/expression it replaces the natural key columns/values by the the appropriate surrogate key columns/values".

In this sense, each of those new designs is "equivalent" to the old one.

Note that "appropriate column/value" means that you have to use/expect a surrogate-natural value pair that appears in the new table!

Note also that the correspondence applies to membership conditions and constraint expressions, not just values and query expressions. A foreign key (or equivalent) constraint on the natural key has to be converted to the corresponding constraint on the new key (natural+surrogate pair or just surrogate).

If you do not convert constraints appropriately when a new table has both natural+surrogate key columns then the two databases will not be equivalent. Because there will possibly be new rows not corresponding to old rows, because their natural+surrogate pairs will not be constrained appropriately. Integrity will not have been maintained.

In summary: I explained a notion of equivalence of designs under a correspondence. Every key needs a constraint. Wherever a natural+surrogate same-identification pair appears in any new version of a table variable it must be constrained to appear in the new version of the natural+surrogate pairs table variable. If your approach is NOT that then the argument of the preceding paragraph kills it.

philip

Harry Tuttle

unread,
Mar 11, 2013, 3:48:12 AM3/11/13
to
Wolfgang Keller, 08.03.2013 23:11:
>> (a) there's no such thing as a primary key;
>
> PostgreSQL seems to think differently. >;->
>
> It requires definition of a primary key afaik.

No it doesn't.

You can have a unique constraint without having a primary key (and you can have a FK to that unique constraint)

The only way the SQL database differ is how they treat NULL values if the unique constraint is made of nullable columns.




rob...@gmail.com

unread,
Mar 11, 2013, 5:48:45 AM3/11/13
to
For a single relation, obviously there is no difference. But at the database level there is at least one case - composite overlapping keys - when a surrogate key design can lead to anomalies if not supplemented by other restrictions.

Maybe is not the best example, but let’s take the following database (with the obvious foreign keys):

clients {client}
client_orders {client, order}
client_locations {client, location}

deliveries {delivery_no, client, order, location}

Now add surrogate keys to all relations, and replace the last one with:

deliveries {delivery_no, order_id, location_id}

You may see that client attribute is no longer needed since it can be determined from either order_id or location_id. However this design no longer enforce the rule that the order and the location should refer to the same client.

Wolfgang Keller

unread,
Mar 11, 2013, 2:53:56 PM3/11/13
to
> For a single relation, obviously there is no difference. But at the
> database level there is at least one case - composite overlapping
> keys

I have lots of those in the schema that I use. The entities are in
fact arranged into a deeply nested hierarchy (which corresponds very
well to the semantics of the application domain), where the (natural)
keys of "child" entities include foreign keys to (often several
different) "parent" entities. With lots of overlapping.

Due to the level of depth of the nesting, primary keys, especially for
n:m(:o:...) tables can be long - I stopped counting at 15 columns or so.

> - when a surrogate key design can lead to anomalies if not
> supplemented by other restrictions.

That's exactly the kind of application case I had in mind.

Lots of tables with lots of foreign key constraints, often several of
them in one primary key of a "child" entity.

> Maybe is not the best example, but let’s take the following database
> (with the obvious foreign keys):
>
> clients {client}
> client_orders {client, order}
> client_locations {client, location}
>
> deliveries {delivery_no, client, order, location}
>
> Now add surrogate keys to all relations, and replace the last one
> with:
>
> deliveries {delivery_no, order_id, location_id}
>
> You may see that client attribute is no longer needed since it can be
> determined from either order_id or location_id. However this design
> no longer enforce the rule that the order and the location should
> refer to the same client.

Thanks a *lot* for that one. And even if you do include the client_id in
deliveries (as the standard suggestion would probably be), that still
won't enforce the consistency here. And I wouldn't even *want to* try
and investigate what constraints I would need to enforce integrity in my
case, given the shear number of entities and the complexity of the
dependencies between the primary/foreign keys.

Since I couldn't find an example like that in the (online as well as
printed) literature, is there a textbook (or online documentation) about
these issues? First, to quote it when I have to argue with people, and
second, to improve my own education.

Sincerely,

Wolfgang

Wolfgang Keller

unread,
Mar 11, 2013, 2:54:05 PM3/11/13
to
> >> (a) there's no such thing as a primary key;
> >
> > PostgreSQL seems to think differently. >;->
> >
> > It requires definition of a primary key afaik.
>
> No it doesn't.

I have to admit, my knowledge of PostgreSQL did not reach very far. ;-)

Thanks for the correction.

As too often, I just mis-took a personal habit (always define a primary
key - don't slap me please) for a requirement.

Sincerely,

Wolfgang

Jan Hidders

unread,
Mar 11, 2013, 5:38:50 PM3/11/13
to
Op maandag 11 maart 2013 19:53:56 UTC+1 schreef Wolfgang Keller het volgende:
Not so fast. :-)

Are your hierarchies perhaps inheritance or part-of hierarchies? In that case there is probably still no problem. It's not so much the overlapping that creates the problem, but the partial overlapping. Pretty much anything that you can model in EER models is not really a problem under that type of mapping transformation.

The example shows indeed an important insight, namely that this particular way to transform your schema does not work. However, it is certainly possible to map it to an equivalent relational schema while only using keys and common foreign keys over surrogate identifiers. For example:

clients {id, client}
key {id}
key {client}

orders{id, order}
key {id}
key {order}

client_orders {id, client-id, order-id}
key {id}
key {client-id, order-id}
fk (client-id) -> clients(client-id)
fk (order-id) -> orders(order-id)

locations{id, location}
key {id}
key {location}

client_locations {id, client-id, location-id}
key {id}
key {client-id, location-id}
fk (client-id) -> clients(id)
fk (location-id) -> locations(id)

deliveries {id, delivery_no, client-id, order-id, location-id}
key {id}
key {delivery_no}
fk (client-id, order-id) -> client_orders(client-id, order-id)
fk (client-id, location-id) -> client_locations(client-id, location-id)

I'm not saying it's pretty, just possible. ;-)

-- Jan Hidders

James K. Lowden

unread,
Mar 11, 2013, 5:41:03 PM3/11/13
to
On Mon, 11 Mar 2013 19:53:56 +0100
Wolfgang Keller <feli...@gmx.net> wrote:

> Since I couldn't find an example like that in the (online as well as
> printed) literature, is there a textbook (or online documentation)
> about these issues? First, to quote it when I have to argue with
> people, and second, to improve my own education.

If you want to argue with people, database design is a fine place to
start. ;-)

The design you've described

> deeply nested hierarchy (which corresponds very
> well to the semantics of the application domain

sounds like what sometimes happens when the designer understood object
orientation better than the relational model.

> Due to the level of depth of the nesting, primary keys, especially for
> n:m(:o:...) tables can be long - I stopped counting at 15 columns or
> so.

Most objects in the real world can be identified by just a few
attributes. If each of those 15 columns were just a single digit, you
could describe 10^15 rows, or 1000 for every star in our galaxy. That
suggests to me that the design could be simplified, perhaps with a
flamethrower from a safe distance.

I suspect your instinct is right, and that the issue you've hit on is
only the tip of the iceberg. If so, the question isn't how to identify
the entities in your database, but how to choose the entities to
identify. IOW how to design the database in the first place, the
logical model.

What to read about that? Fabian Pascal has lately been updating his
papers on database design. You might want to check
http://www.dbdebunk.com/p/papers_3.html.

HTH.

--jkl

com...@hotmail.com

unread,
Mar 11, 2013, 6:14:44 PM3/11/13
to
On Monday, March 11, 2013 11:53:56 AM UTC-7, Wolfgang Keller wrote:
> > For a single relation, obviously there is no difference. But at the
> > database level there is at least one case - composite overlapping
> > keys

Last message I mentioned two particular ways of changing your database: adding surrogate side-by-side with naturals everywhere, and adding a surrogate in a 1:1 base table and replacing the naturals by the surrogate everywhere else. Here you are more or less talking about adding a surrogate everywhere.

> And I wouldn't even *want to* try
> and investigate what constraints I would need to enforce integrity in my
> case, given the shear number of entities and the complexity of the
> dependencies between the primary/foreign key]s.

So your base tables using natural columns that get surrogates added need the constraint added that each surrogate is paired with the naturals in its 1:1 table. You should add a redundant but optimization-helping fk definition involving a surrogate corresponding to each fk involving its naturals. Why is this overwhelming?

> Since I couldn't find an example like that in the (online as well as
> printed) literature, is there a textbook (or online documentation) about
> these issues? First, to quote it when I have to argue with people, and
> second, to improve my own education.

When you add a surrogate column corresponding to some natural key columns you have to add the fact the surrogate has to be paired with those naturals in the 1:1 base table. Do you need any more theory than the definition/reason for "constraint", "a constraint restricts a base table to values that can validly arise"?

philip

Cimode

unread,
Mar 12, 2013, 11:56:45 AM3/12/13
to
All unique identifiers are at some point in time a surrogate key. Only designer's subjectivity qualify them as natural or primary key.

In the context of guaranteeing data integrity, I am still amazed at how such implementation-driven minor point triggers so much debate.

Roy Hann

unread,
Mar 12, 2013, 12:26:12 PM3/12/13
to
Cimode wrote:

> All unique identifiers are at some point in time a surrogate key.

That is a truism. No identification exists in nature.

> Only designer's subjectivity qualify them as natural or primary key.

Maybe I mistake your point but you seem to suggest it is decided on a
whim. It is a consequence of defining the enterprise of interest. A
value assigned outside the enterprise of interest that is a key
within it is "natural".

A credit card number is a synthetic/surrogate key in the card issuer's
database but it's a natural key in the merchant's database.

I will quckly concede that "natural" key is a poor choice of
terminology. "Alien key" might be better, or "extant key". I'm sure we
can think of yet better names.

> In the context of guaranteeing data integrity, I am still amazed at
> how such implementation-driven minor point triggers so much debate.

Well when you work with bone-heads who actually ban natural keys from
being propagated as foreign keys, and who introduce a new surrogate in
every table, and in effect create link-lists of tables that you have
to navigate to do anything, you don't give in without a fight.

--
Roy


rob...@gmail.com

unread,
Mar 12, 2013, 12:47:31 PM3/12/13
to
On Monday, March 11, 2013 8:53:56 PM UTC+2, Wolfgang Keller wrote:
>
> Since I couldn't find an example like that in the (online as well as
> printed) literature, is there a textbook (or online documentation) about
> these issues? First, to quote it when I have to argue with people, and
> second, to improve my own education.
>

Unfortunately I cannot recommend any paper on this particular subject (dbdebunk is still a good reference), my observations are mainly based on practice, I use composite natural keys a lot.

The main point is that foreign keys do not represent references to rows, but subsets of values. And an overlapping foreign key actually denotes a subset of an intersection (or join) of some relations. This has no equivalence in OO world, that’s why the analogy between foreign keys and references is so dangerous (in spite it apparently works for single attribute keys).

rob...@gmail.com

unread,
Mar 12, 2013, 12:54:12 PM3/12/13
to
On Monday, March 11, 2013 11:38:50 PM UTC+2, Jan Hidders wrote:
> deliveries {id, delivery_no, client-id, order-id, location-id}
>
> key {id}
> key {delivery_no}
> fk (client-id, order-id) -> client_orders(client-id, order-id)
> fk (client-id, location-id) -> client_locations(client-id, location-id)
>
> I'm not saying it's pretty, just possible. ;-)
>


Well, would be simpler to add foreign keys to superkeys, this would solve the problem (I actually use this technique in practice to simplify constraint implementations). But let's not forget that the main argument of surrogate keys supporters is the elimination of composite foreign keys.

Cimode

unread,
Mar 12, 2013, 2:24:19 PM3/12/13
to
Le mardi 12 mars 2013 17:26:12 UTC+1, Roy Hann a écrit :
> Cimode wrote:
>
>
>
> > All unique identifiers are at some point in time a surrogate key.
>
>
>
> No identification exists in nature.
An no need to distinguish things either. Any theory is nothing but natural, including relational theory.

> > Only designer's subjectivity qualify them as natural or primary key.
>
>
>
> Maybe I mistake your point but you seem to suggest it is decided on a
>
> whim.
Your assumption. Not mine.

> It is a consequence of defining the enterprise of interest. A
>
> value assigned outside the enterprise of interest that is a key
>
> within it is "natural".
An *enterprise of interest* does not say anything about the fact that a surrogate key may at some future point in time be considered a natural key. But the point is that subjectivity can not be taken from the equation in any scheme involving establishing a unique identifier.

> A credit card number is a synthetic/surrogate key in the card issuer's
>
> database but it's a natural key in the merchant's database.
See above.

> I will quckly concede that "natural" key is a poor choice of
>
> terminology. "Alien key" might be better, or "extant key". I'm sure we
>
> can think of yet better names.
>
>
>
> > In the context of guaranteeing data integrity, I am still amazed at
>
> > how such implementation-driven minor point triggers so much debate.
>
>
>
> Well when you work with bone-heads who actually ban natural keys from
>
> being propagated as foreign keys, and who introduce a new surrogate in
>
> every table, and in effect create link-lists of tables that you have
>
> to navigate to do anything, you don't give in without a fight.
Sorry, but this is your battle.

I stopped making myself sick on this issue for a while since I know who my audience is and what it knows about database theory.

Explaining distinguishibility seems a more important challenge to me than focusing the debate on whether they feel more comfortable with a surrogate or a natural. I have too many developers who do not understand the concept on a logical level. IMHO.

Regards
> Roy

Jan Hidders

unread,
Mar 12, 2013, 3:22:38 PM3/12/13
to
Op dinsdag 12 maart 2013 17:54:12 UTC+1 schreef rob...@gmail.com het volgende:
Is it? I'm agnostic on the issue myself, and I would never use that exact argument. Rather I would claim that in general the approach tends to reduce their number, which is true.

-- Jan Hidders


Roy Hann

unread,
Mar 12, 2013, 3:56:47 PM3/12/13
to
Cimode wrote:

> Any theory is nothing but natural, including relational theory.

Huh? Is that what you intended to write? That makes no sense as it
stands.

>> value assigned outside the enterprise of interest that is a key
>> used within it is "natural".

> An *enterprise of interest* does not say anything about the fact
> that a surrogate key may at some future point in time be considered a
> natural key.

The enterprise of interest (EoI) doesn't "say" anything, ever. It is
merely the context for a conceptual model. The EoI is whatever my
customer/user tells me it is. If my customer tells me he is given some
information and I am satisfied it uniquely determines an entity in his
EoI then it is a natural key in his EoI. There really is no room to
argue that it is not.

> But the point is that subjectivity can not be taken from
> the equation in any scheme involving establishing a unique identifier.

I think you have abbreviated this argument excessively. I really don't
see your point.

>> A credit card number is a synthetic/surrogate key in the card issuer's
>> database but it's a natural key in the merchant's database.

> See above.

See what above? And why? Your brevity borders on cryptic.

> Explaining distinguishibility seems a more important challenge

Another cryptic squib. What is challenging about it?

--

Roy

Cimode

unread,
Mar 12, 2013, 5:09:34 PM3/12/13
to
On 12 mar, 20:56, Roy Hann <specia...@processed.almost.meat> wrote:
> Cimode wrote:
> > Any theory is nothing but natural, including relational theory.
>
> Huh?  Is that what you intended to write?  That makes no sense as it
> stands.
Apologies but I have no other way to say this. It means exactly what
it says.

> >> value assigned outside the enterprise of interest that is a key
> >> used within it is "natural".
> > An *enterprise of interest* does not say anything about the fact
> > that a surrogate key may at some future point in time be considered a
> > natural key.
>
> The enterprise of interest (EoI) doesn't "say" anything, ever.  It is
> merely the context for a conceptual model.

> The EoI is whatever my
> customer/user tells me it is. If my customer tells me he is given some
> information and I am satisfied it uniquely determines an entity in his
> EoI then it is a natural key in his EoI. There really is no room to
> argue that it is not.
A design effort should also be an as-faithful-as-possible
representation of a segment of reality in time.
As I mentioned, I am not interested discussing this specific subject
at length.

> > But the point is that subjectivity can not be taken from
> > the equation in any scheme involving establishing a unique identifier.
>
> I think you have abbreviated this argument excessively.  I really don't
> see your point.
I believe the hidden point of the thread was about what would give the
impression that there is such thing as a natural or primary key since
there is not.

I rephrase my take on that :

1> Any key is at some point in time a surrogate key . There is no
such thing as a natural key.
2> A design effort to represent a segment of reality is bound to
designer's subjectivity. (Designers are not machines).
3> Tuple distinguishibility is a part of the design effort, involving
upgrade of surrogate to natural key.

1 + 2 + 3 implies designer's own subjectivity partly explains the
impression hat there would be such thing as a primary key.

> >> A credit card number is a synthetic/surrogate key in the card issuer's
> >> database but it's a natural key in the merchant's database.
> > See above.
>
> See what above?  And why?  Your brevity borders on cryptic.
I make the exact same point you are making. I am merely using
different terminology.

> > Explaining distinguishibility seems a more important challenge
>
> Another cryptic squib.  What is challenging about it?
Explaining distinguishibility of tuples vs distinguishibility of
physical rows is a challenge to most developers since most of them
confuse Logical/Physical Layers. .

> Roy

James K. Lowden

unread,
Mar 12, 2013, 7:42:54 PM3/12/13
to
On Tue, 12 Mar 2013 11:24:19 -0700 (PDT)
Cimode <cim...@hotmail.com> wrote:

> > It is a consequence of defining the enterprise of interest. A
> > value assigned outside the enterprise of interest that is a key
> > within it is "natural".
>
> An *enterprise of interest* does not say anything about the fact that
> a surrogate key may at some future point in time be considered a
> natural key. But the point is that subjectivity can not be taken
> from the equation in any scheme involving establishing a unique
> identifier.

I suppose you're on solid ground there, in a way. The determination of
what is an entity, of what will be modelled in the database, is surely
subjective. We might even say "establishing a unique identifier" is,
at best, heuristic. It's unique until it's not. I remember when
currency mapped 1:1 to country.

Roy's point is that the surrogate/natural nomenclature derives from
(subjective) perspective. Do you disagree with that, or just think
the matter too trifling to discuss?

--jkl

Cimode

unread,
Mar 12, 2013, 9:37:38 PM3/12/13
to
On 13 mar, 00:42, "James K. Lowden" <jklow...@speakeasy.net> wrote:
> On Tue, 12 Mar 2013 11:24:19 -0700 (PDT)
>
> Cimode <cim...@hotmail.com> wrote:
> > > It is a consequence of defining the enterprise of interest.  A
> > > value assigned outside the enterprise of interest that is a key
> > > within it is "natural".
>
> > An *enterprise of interest* does not say anything about the fact that
> > a surrogate key may at some future point in time be considered a
> > natural key.  But the point is that subjectivity can not be taken
> > from the equation in any scheme involving establishing a unique
> > identifier.
>
> I suppose you're on solid ground there, in a way.  The determination of
> what is an entity, of what will be modelled in the database, is surely
> subjective.  We might even say "establishing a unique identifier" is,
> at best, heuristic.  It's unique until it's not.  I remember when
> currency mapped 1:1 to country.
That is not what I said.

The point I am trying to express is that the process of making a
natural key/primary key is inherently subjective even though a unique
identifier has to respond to precise criteria. I am not talking about
entities in general.


> Roy's point is that the surrogate/natural nomenclature derives from
> (subjective) perspective.  Do you disagree with that, or just think
> the matter too trifling to discuss?
I have nothing to agree or disagree with.

The precise point of my comment is that any unique identifier is
*fundamentally* artificial prior to a designer's effort which may
never occur. It only becomes natural once a designer's effort at
formalizing a unique identifier imposes a reasonable (but subjective)
choice of attribute subset to constitute a candidate key, then a
natural key. As far I am concerned, the process of *natural key*
generation in itself is way more interesting than the *surrogate vs
natural key* debate, which is more an implementation issue than a
theoretical issue in RM.

But that is just me.

vldm10

unread,
Mar 13, 2013, 3:46:54 AM3/13/13
to
Dana petak, 8. ožujka 2013. 20:22:33 UTC+1, korisnik Wolfgang Keller napisao je:


> I've always been used to using exclusively natural (comosite) primary
>
> keys, to warrant data integrity. Now computer science people (I'm just
>
> an engineer) keep telling me that concerning data integrity, it would be
>
> absolutely equivalent to use a surrogate primary key together with a
>
> unique constraint on the natural key.


Hi Wolfgang,

It seems to me that you did not understand what is surrogate key. I suggest you look at the paper called "RM / T" by E. Codd and there you will find the definition of the surrogate key.

Please note that a surrogate key is the primary key. Note also that surrogate key is very bad solution. We note also that the surrogate key is used only in the binary relation, the so-called P-relations.

Vladimir Odrljin

Roy Hann

unread,
Mar 13, 2013, 5:51:02 AM3/13/13
to
Cimode wrote:

>> > But the point is that subjectivity can not be taken from
>> > the equation in any scheme involving establishing a unique identifier.
>>
>> I think you have abbreviated this argument excessively. �I really don't
>> see your point.
> I believe the hidden point of the thread was about what would give the
> impression that there is such thing as a natural or primary key since
> there is not.
>
> I rephrase my take on that :
>
> 1> Any key is at some point in time a surrogate key . There is no
> such thing as a natural key.

You are giving excessive weight to the formal logic and no weight at all
the *application" of formal logic. Sure it's all just symbols and there
is no truth either, only valid derivations. But I want to *apply*
logic and set theory to database management. If you reject natural keys
you cannot be talking about databases. That's not to say your point is
wrong; just that it is literally useless.

> 2> A design effort to represent a segment of reality is bound to
> designer's subjectivity. (Designers are not machines).

Whole-heartedly agreed. Nor are the user/customers who commission the
designers.

> 3> Tuple distinguishibility is a part of the design effort, involving
> upgrade of surrogate to natural key.

Now you're starting to make some sense, but we are way past needing
to be persuaded that formal logic underlies what we do when we design a
database.

> 1 + 2 + 3 implies designer's own subjectivity partly explains the
> impression hat there would be such thing as a primary key.

Nah, you've lost me again.

>> > Explaining distinguishibility seems a more important challenge
>>
>> Another cryptic squib. �What is challenging about it?
> Explaining distinguishibility of tuples vs distinguishibility of
> physical rows is a challenge to most developers since most of them
> confuse Logical/Physical Layers. .

More sense.

Really, you need to show your work. It's a good thing I am willing to
press for explanations of your gnomic utterances.

--
Roy


rob...@gmail.com

unread,
Mar 13, 2013, 6:37:00 AM3/13/13
to
On Tuesday, March 12, 2013 9:22:38 PM UTC+2, Jan Hidders wrote:
>
> Is it? I'm agnostic on the issue myself, and I would never use that exact argument. Rather I would claim that in general the approach tends to reduce their number, which is true.
>

Application programmers usually invoke this argument, they find it easier to deal with an integer key column. Also many ORMs and frameworks do not support composite keys. And actually I’ve seen many programmers that didn’t heard about composite keys, I’ve even been asked once by a PM why I did not added a primary key on a table. It took me a while to realize that by primary key he was meaning a surrogate.

Roy Hann

unread,
Mar 13, 2013, 7:06:52 AM3/13/13
to
rob...@gmail.com wrote:

> IοΏ½ve even been asked once by a PM why I did not added a primary key on
> a table. It took me a while to realize that by primary key he was
> meaning a surrogate.

Did he also tell you it was "unprofessional" not to add a primary
surrogate key? I've been told that. :-|

--
Roy


Cimode

unread,
Mar 13, 2013, 7:20:56 AM3/13/13
to
On 13 mar, 10:51, Roy Hann <specia...@processed.almost.meat> wrote:
> Cimode wrote:
> >> > But the point is that subjectivity can not be taken from
> >> > the equation in any scheme involving establishing a unique identifier.
>
> >> I think you have abbreviated this argument excessively.  I really don't
> >> see your point.
> > I believe the hidden point of the thread was about what would give the
> > impression that there is such thing as a natural or primary key since
> > there is not.
>
> > I rephrase my take on that :
>
> > 1> Any key is at some point in time a surrogate key .  There is no
> > such thing as a natural key.
>
> You are giving excessive weight to the formal logic and no weight at all
> the *application" of formal logic.
Interesting. The former determines the latter.

> Sure it's all just symbols and there
> is no truth either, only valid derivations.  But I want to *apply*
> logic and set theory to database management. If you reject natural keys
> you cannot be talking about databases.  That's not to say your point is
> wrong; just that it is literally useless.
Clarifying the process of natural key generation neither equates to
rejecting them nor does it say anything about their usefulness in
database design.

I wish I had your confidence.

> > 2> A design effort  to represent a segment of reality is bound to
> > designer's subjectivity.  (Designers are not machines).
>
> Whole-heartedly agreed.  Nor are the user/customers who commission the
> designers.
>
> > 3> Tuple distinguishibility is a part of the design effort, involving
> > upgrade of surrogate to natural key.
>
> Now you're starting to make some sense, but we are way past needing
> to be persuaded that formal logic underlies what we do when we design a
> database.


> > 1 + 2 + 3 implies designer's own subjectivity partly explains the
> > impression  hat there would be such thing as a primary key.
>
> Nah, you've lost me again.
Perhaps that is because, I only view primary key as a design milestone
that is no more or less important than the rest of the process.

> >> > Explaining distinguishibility seems a more important challenge
>
> >> Another cryptic squib.  What is challenging about it?
> > Explaining distinguishibility of tuples vs distinguishibility of
> > physical rows is a challenge to most developers since most of them
> > confuse Logical/Physical Layers.  .
>
> More sense.


> Really, you need to show your work.  It's a good thing I am willing to
> press for explanations of your gnomic utterances.
Please allow me to take some burden off your shoulders. Please ignore
my useless comments.

"Lie is not the main enemy of Truth, Certainty is" (Nietzche)

> Roy

Regards

Roy Hann

unread,
Mar 13, 2013, 9:38:59 AM3/13/13
to
Cimode wrote:

>> Really, you need to show your work. �It's a good thing I am willing to
>> press for explanations of your gnomic utterances.
> Please allow me to take some burden off your shoulders. Please ignore
> my useless comments.

You may be sure I will do that, just as soon as I figure out which ones
are useless.

--
Roy

Cimode

unread,
Mar 13, 2013, 1:00:38 PM3/13/13
to
On 13 mar, 14:38, Roy Hann <specia...@processed.almost.meat> wrote:
> Cimode wrote:

> You may be sure I will do that, just as soon as I figure out which ones
> are useless.
Now, you are making sense. :)

Regards...
> Roy

Wolfgang Keller

unread,
Mar 13, 2013, 1:18:37 PM3/13/13
to
> > Since I couldn't find an example like that in the (online as well as
> > printed) literature, is there a textbook (or online documentation)
> > about these issues? First, to quote it when I have to argue with
> > people, and second, to improve my own education.
>
> Unfortunately I cannot recommend any paper on this particular subject
> (dbdebunk is still a good reference),

I just had to find the right keywords to search for, in this case it's
the term "overlapping foreign keys". It's actually addressed in some
textbooks.

It's astounding imho that e.g. normalisation seems to be sufficiently
addressed in database design classes, textbooks etc., while the problem
of proper unification appears to be quite underrated, even ignored, even
by lots of computer scientists by education. Which has also been
confirmed by the number of answers I got in this group that
apparently totally ignored the issue.

Thank goodness I had professors who had a clue (and taught me to always
stay away from surrogate keys) even though I'm "just" an engineer.

Sincerely,

Wolfgang

P.S.: I consider the issue as closed. I am using an industry standard
database model that has been established in cooperation by domain
experts and (apparently less clueless) database developers and which I
want (and need) to use as is. It matches the domain semantics really
well and it's pretty complex (over 400 entities and likely to grow). It
has been in development and use for over 10 years now and of course,
transforming it and adding ridiculous kludges just to satisfy the
professional ignorance of DB admins and "developers" while creating lots
of incompatibility problems as well as potential sources for data
inconsistency is totally out of question.

Jan Hidders

unread,
Mar 13, 2013, 5:50:36 PM3/13/13
to
On 2013-03-13 17:18:37 +0000, Wolfgang Keller said:

>>>
>>> Since I couldn't find an example like that in the (online as well as
>>> printed) literature, is there a textbook (or online documentation)
>>> about these issues? First, to quote it when I have to argue with
>>> people, and second, to improve my own education.
>>
>> Unfortunately I cannot recommend any paper on this particular subject
>> (dbdebunk is still a good reference),
>
> I just had to find the right keywords to search for, in this case it's
> the term "overlapping foreign keys". It's actually addressed in some
> textbooks.
>
> It's astounding imho that e.g. normalisation seems to be sufficiently
> addressed in database design classes, textbooks etc., while the problem
> of proper unification appears to be quite underrated, even ignored, even
> by lots of computer scientists by education. Which has also been
> confirmed by the number of answers I got in this group that
> apparently totally ignored the issue.
>
> Thank goodness I had professors who had a clue (and taught me to always
> stay away from surrogate keys) even though I'm "just" an engineer.

Yes, always listen to your database professor! :-)

-- Jan Hidders

paul c

unread,
Mar 13, 2013, 6:37:37 PM3/13/13
to
On 12/03/2013 4:42 PM, James K. Lowden wrote:
> Roy's point is that the surrogate/natural nomenclature derives from
> (subjective) perspective. Do you disagree with that, or just think
> the matter too trifling to discuss?

Actually, that was the point of Cimode's very first sentence in this
thread though I imagine some literalists might trip over the term "point
in time". Even though my first language is English, the term doesn't
upset me because I know whenever people try to hone in on meaning that
precedes requirements and therefore is relevant to db design, the real
meaning is "in a sense". He also fairly obviously stated in his very
first message that the matter is "too trifling to discuss".

paul c

unread,
Mar 13, 2013, 6:40:44 PM3/13/13
to
On 13/03/2013 3:37 PM, paul c wrote:
> therefore is relevant to db design

sorry about the typo', it should read 'irrelevant' (as the follow-on
sidetracks were).

Jan Hidders

unread,
Mar 13, 2013, 7:38:36 PM3/13/13
to
Sure. But that does not mean that all arguments in favor of surrogate
identifiers are therefore complete nonsense, even if most of the people
that favor them tend to be crappy data modelers with a weak grasp of
the importance of having proper integrity constraints. :-) I'd like to
see a bit more precise analysis on when and why they are a problem,
rathern then a naive sweeping generalization that declarese them as
always evil.

-- Jan Hidders

rob...@gmail.com

unread,
Mar 14, 2013, 6:55:42 AM3/14/13
to
On Thursday, March 14, 2013 1:38:36 AM UTC+2, Jan Hidders wrote:
> Sure. But that does not mean that all arguments in favor of surrogate
> identifiers are therefore complete nonsense, even if most of the people
> that favor them tend to be crappy data modelers with a weak grasp of
> the importance of having proper integrity constraints. :-) I'd like to
> see a bit more precise analysis on when and why they are a problem,
> rathern then a naive sweeping generalization that declarese them as
> always evil.
>
> -- Jan Hidders


I summarized my pros and cons on surrogates:
Pros:
- Stability
- "Convenience" to use
- Performance (?)

Cons:
- Difficult to synchronize with external systems.
- You need more joins (performance?)
- You lose the simplicity and power of overlapping composite foreign keys constraints (discussed in this thread).
- In some cases you lose the ability to manipulate data as a set (see bellow*).


* If you need the newly allocated value you have to process data one row at time and use some system functions such as "get_next" and "get_current". Things get complicated if you need the value twice, because you have to use "get_next" first time and "get_current" further on. Since you can’t control the order of evaluation of expression terms you have to rely on some procedural language.

If you have an alternate natural key, you can manipulate data as a set, but have to do it in two steps, first insert, then retrieve the surrogate values by using the natural key. In practice you may also need a temporary table to avoid constraint violations.


As can be seen, two "pros" are disputable, and two "cons" are very subtle (and many users may never encounter in practice). So ultimately is a matter of tradeoff.

Eric

unread,
Mar 14, 2013, 8:17:46 AM3/14/13
to
On 2013-03-13, Wolfgang Keller <feli...@gmx.net> wrote:
<snip>
> It's astounding imho that e.g. normalisation seems to be sufficiently
> addressed in database design classes, textbooks etc., while the problem
> of proper unification appears to be quite underrated, even ignored, even
> by lots of computer scientists by education. Which has also been
> confirmed by the number of answers I got in this group that
> apparently totally ignored the issue.

It is inherent in the nature of groups like this that you will get
answers from people who genuinely misunderstand the question, wilfully
misunderstand the question, or deliberately ignore the question.
Sometimes that is all you get, but you have been rather luckier.
Unfortunately sifting the answers is your responsibility.

I think in the end the answer is:
Don't use a surrogate key unless you have to, and then be aware of the
pitfalls. A common pitfall is to forget to include some constraints in
the design which would be there automatically with only natural keys.

Your concerns about data integrity are a database design issue.

Those people (and systems) that use surrogate keys as a matter of course
are usually wrong, not because surrogate keys are inherently bad, but
because those people are not aware of the implications for database
design of using surrogate keys.

Eric
--
ms fnd in a lbry

Wolfgang Keller

unread,
Mar 14, 2013, 12:07:37 PM3/14/13
to
> I remember when currency mapped 1:1 to country.

It *never* did.

Trivial counter-examples are e.g. all those tiny european nations such
as Liechtenstein, Andorra, San Marino, Monaco etc.

Sincerely,

Wolfgang

James K. Lowden

unread,
Mar 14, 2013, 1:01:28 PM3/14/13
to
On Thu, 14 Mar 2013 00:38:36 +0100
Jan Hidders <hid...@gmail.com> wrote:

> > Application programmers usually invoke this argument, they find it
> > easier to deal with an integer key column.
>
> Sure. But that does not mean that all arguments in favor of surrogate
> identifiers are therefore complete nonsense, ...
> I'd like to see a bit more precise analysis on when and why they are
> a problem, rathern then a naive sweeping generalization that
> declarese them as always evil.

I said earlier in this thread that numbering things doesn't make them
uniquer.

Almost every table I've ever seen that used a surrogate key failed
to identify the natural key -- and the uniqueness constraint. The
sweeping generalization declaring evil holds: if you're creating
uniqueness by enumeration, you're doing it wrong.

So the first argument against is that adding a surrogate key, if done
harmlessly, is *extra* work. The job of analyzing the data still
needs to be done. Else duplicates will be permitted, even though the
surrogate "ensures uniqueness".

Having recognized that it's extra work, why do it? That is, is there
any good, nonreligious, engineering reason? There can be. There's no
theoretical reason insofar as the theory speaks only of keys, not
kinds of keys, and is silent on the subject of adding an identifier
for convenience. But there can be an engineering reason.

I remember one time.

In 1997 I was working with a table with over a million rows (the
horror!) whose natural key was 9 columns in my recollection. It was the
biggest table on a server with 512 MB RAM and a 90 MHz processor. For
us, it was the mother of all tables.

This table needed a history kept of its changes, although only the
quantity column could change over time. Nightly processing also
involved some self-joins. For reasons of efficiency, we added an ID
column. It made certain operations faster, and some queries easier to
write.

I reckon by now I've created thousands of tables. That's the only one
I remember inventing an identifier for. We thought it was necessary at
the time, and maybe it was.

--jkl

James K. Lowden

unread,
Mar 14, 2013, 1:08:32 PM3/14/13
to
On Thu, 14 Mar 2013 17:07:37 +0100
Wolfgang Keller <feli...@gmx.net> wrote:

> > I remember when currency mapped 1:1 to country.
>
> It *never* did.
>
> Trivial counter-examples are e.g. all those tiny european nations such
> as Liechtenstein, Andorra, San Marino, Monaco etc.

Ah, right. What was it Roy said about "enterprise of interest"?

Thirty years in the finanical industry will inform certain assumptions
about which countries matter and which don't. I guess we never had
securities issued in or traded in those countries. CWA: not there,
don't exist. ;-)

--jkl

.NET Sales Support

unread,
Mar 14, 2013, 6:00:41 PM3/14/13
to
"Roy Hann" <spec...@processed.almost.meat> wrote in message <news:khnkv4$o4a$1...@speranza.aioe.org>...

> Cimode wrote:
>
>> All unique identifiers are at some point in time a surrogate key.
>
> That is a truism. No identification exists in nature.
>
>> Only designer's subjectivity qualify them as natural or primary key.
>
> Maybe I mistake your point but you seem to suggest it is decided on a
> whim. It is a consequence of defining the enterprise of interest. A
> value assigned outside the enterprise of interest that is a key
> within it is "natural".
>
> A credit card number is a synthetic/surrogate key in the card issuer's
> database but it's a natural key in the merchant's database.
>
> I will quckly concede that "natural" key is a poor choice of
> terminology. "Alien key" might be better, or "extant key". I'm sure we
> can think of yet better names.
>
>> In the context of guaranteeing data integrity, I am still amazed at
>> how such implementation-driven minor point triggers so much debate.
>
> Well when you work with bone-heads who actually ban natural keys from
> being propagated as foreign keys, and who introduce a new surrogate in
> every table, and in effect create link-lists of tables that you have
> to navigate to do anything, you don't give in without a fight.
>
> --
> Roy

When you do that navigation, do the bone-heads then b*tch about your bringing the
DBMS to its knees just to link a row in Z to its distantly corresponding row in A?

select A.ID, A.Foo, A.Bar
from A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
where A.ID = B.A_ID and B.ID = C.B_ID and C.ID = D.C_ID and ... and Y.ID = Z.Y_ID and Z.ID = ?


Jan Hidders

unread,
Mar 14, 2013, 8:03:43 PM3/14/13
to
On 2013-03-14 10:55:42 +0000, rob...@gmail.com said:

> On Thursday, March 14, 2013 1:38:36 AM UTC+2, Jan Hidders wrote:
>> Sure. But that does not mean that all arguments in favor of surrogate>
>> identifiers are therefore complete nonsense, even if most of the
>> people> that favor them tend to be crappy data modelers with a weak
>> grasp of> the importance of having proper integrity constraints. :-)
>> I'd like to> see a bit more precise analysis on when and why they are a
>> problem,> rathern then a naive sweeping generalization that declarese
>> them as> always evil.
>>
>> -- Jan Hidders
>
>
> I summarized my pros and cons on surrogates:
> Pros:
> - Stability
> - "Convenience" to use
> - Performance (?)

Doubtful in my experience. Programmers tend to think that since the
join condition becomes simpeler the joins become faster, but that is
usually not the bottle neck. There can be a observable gain if the size
of many records decreases, which might typically happen for
cross-reference tables representing relationships. But significant
gains are rare in my experience.

> Cons:
> - Difficult to synchronize with external systems.

Not so much if there is also a natural key.

> - You need more joins (performance?)

Really? Why you think that? One typically would have the same number of
tables in the schema, so why would we need more joins?

> - You lose the simplicity and power of overlapping composite foreign
> keys constraints (discussed in this thread).

Using surrogate keys does not imply that you cannot use composite foreign keys.

> - In some cases you lose the ability to manipulate data as a set (see bellow*).
>
>
> * If you need the newly allocated value you have to process data one
> row at time and use some system functions such as "get_next" and
> "get_current". Things get complicated if you need the value twice,
> because you have to use "get_next" first time and "get_current" further
> on. Since you can’t control the order of evaluation of expression terms
> you have to rely on some procedural language.
>
> If you have an alternate natural key, you can manipulate data as a set,
> but have to do it in two steps, first insert, then retrieve the
> surrogate values by using the natural key. In practice you may also
> need a temporary table to avoid constraint violations.

Fair enough, although set-operations are rare at insertion-time in my
experience. But indeed, if you are inserting two linked records you
have to do some extra work. Could still be done in just two INSERTs,
though. :-)

> As can be seen, two "pros" are disputable, and two "cons" are very
> subtle (and many users may never encounter in practice). So ultimately
> is a matter of tradeoff.

Indeed it is.

-- Jan Hidders

rob...@gmail.com

unread,
Mar 15, 2013, 8:01:45 AM3/15/13
to
On Friday, March 15, 2013 2:03:43 AM UTC+2, Jan Hidders wrote:
>
> > - Performance (?)
>
>
> Doubtful in my experience. Programmers tend to think that since the
> join condition becomes simpeler the joins become faster, but that is
> usually not the bottle neck. There can be a observable gain if the size
> > of many records decreases, which might typically happen for
> cross-reference tables representing relationships. But significant
> gains are rare in my experience.
>

That’s why I put a "?".

> > - You need more joins (performance?)
>
> Really? Why you think that? One typically would have the same number of
> tables in the schema, so why would we need more joins?
>

Let’s consider the relation reviews{reviewer, review-language, book, rating} and tuple {34, 14, 66734, 4}. Can you tell anything about it? Now replace with {34, 'EN', 'ISBN-XXXXX’, 'verry good'}. Do we need any join if we want to search for a specific book, review language or rating?

Also if you have something like D -> C -> B -> A, each relation having it’s own surrogate, then you cannot join A and D directly (pretty stupid design, but found in practice).


>
> > - You lose the simplicity and power of overlapping composite foreign
> > keys constraints (discussed in this thread).
>
> Using surrogate keys does not imply that you cannot use composite foreign keys.
>

The problem may appear only if you replace a natural composite key by a surrogate.

Jan Hidders

unread,
Mar 16, 2013, 7:27:46 AM3/16/13
to
On 2013-03-15 12:01:45 +0000, rob...@gmail.com said:

> On Friday, March 15, 2013 2:03:43 AM UTC+2, Jan Hidders wrote:
>>
>>> - Performance (?)
>>
>>
>> Doubtful in my experience. Programmers tend to think that since the>
>> join condition becomes simpeler the joins become faster, but that is>
>> usually not the bottle neck. There can be a observable gain if the
>> size> > of many records decreases, which might typically happen for>
>> cross-reference tables representing relationships. But significant>
>> gains are rare in my experience.
>>
>
> That’s why I put a "?".
>
>>> - You need more joins (performance?)
>>
>> Really? Why you think that? One typically would have the same number
>> of> tables in the schema, so why would we need more joins?
>>
>
> Let’s consider the relation reviews{reviewer, review-language, book,
> rating} and tuple {34, 14, 66734, 4}. Can you tell anything about it?
> Now replace with {34, 'EN', 'ISBN-XXXXX’, 'verry good'}. Do we need any
> join if we want to search for a specific book, review language or
> rating?

Ok. Fair enough. If the query only needs the parts of the entity that
is in the foreign key that was replaced with the surrogate key, then,
yes, you need an extra join.

> Also if you have something like D -> C -> B -> A, each relation having
> it’s own surrogate, then you cannot join A and D directly (pretty
> stupid design, but found in practice).

Hmm, yeah, but that's just stupid design and not really inherent in the
use of surrogate keys. You could argue that the mistake cannot even be
made if you wouldn't have had surrogate keys.

>>> - You lose the simplicity and power of overlapping composite foreign> >
>>> keys constraints (discussed in this thread).
>>
>> Using surrogate keys does not imply that you cannot use composite foreign keys.
>>
>
> The problem may appear only if you replace a natural composite key by a
> surrogate.

As in "it's because you are using them you might be misled into
thinking you don't need composite keys"? Maybe, but what I would like
to know is if they cause some inherent problems that cannot be avoided,
even when modelling carefully.

-- Jan Hidders

rob...@gmail.com

unread,
Mar 18, 2013, 6:26:55 AM3/18/13
to
On Saturday, March 16, 2013 1:27:46 PM UTC+2, Jan Hidders wrote:
> As in "it's because you are using them you might be misled into
> thinking you don't need composite keys"? Maybe, but what I would like
> to know is if they cause some inherent problems that cannot be avoided,
> even when modelling carefully.

If you never replace a composite by a surrogate then the design should be 100% equivalent since there is a 1:1 correspondence between naturals and surrogates. In practice some constraints might be more difficult to implement since many DBMSs don’t allow foreign keys to views.

Jan Hidders

unread,
Mar 18, 2013, 2:22:02 PM3/18/13
to
Why would we need views for that/ And if we do replace composites by
surrogates?

-- Jan Hidders

rob...@gmail.com

unread,
Mar 19, 2013, 6:00:04 AM3/19/13
to
On Monday, March 18, 2013 8:22:02 PM UTC+2, Jan Hidders wrote:
> Why would we need views for that

Because surrogates do not carry information, so you can’t write constraints based on their values. In practice you may need to replace simple check constraints with some SELECT / WHERE constructs (which is not well supported).

>/ And if we do replace composites by surrogates?

Because if (a,b) <=> (x) and (b,c) <=>(y) then {(a,b) and (b,c)} is not equivalent to (x,y) (you replace a join by a product). So you will need some other constraint.

Jan Hidders

unread,
Mar 19, 2013, 7:53:09 AM3/19/13
to
On 2013-03-19 10:00:04 +0000, rob...@gmail.com said:

> On Monday, March 18, 2013 8:22:02 PM UTC+2, Jan Hidders wrote:
>> Why would we need views for that
>
> Because surrogates do not carry information, so you can’t write
> constraints based on their values. In practice you may need to replace
> simple check constraints with some SELECT / WHERE constructs (which is
> not well supported).

Ah, row constraints, yes, agreed. You mentioned foreign keys, so that
had me thinking of other constraints. And even if row constraints for
views would be possible, they would probably be bad for performance.

>> / And if we do replace composites by surrogates?
>
> Because if (a,b) <=> (x) and (b,c) <=>(y) then {(a,b) and (b,c)} is
> not equivalent to (x,y) (you replace a join by a product). So you will
> need some other constraint.

Agreed.

-- Jan Hidders


Wolfgang Keller

unread,
Mar 19, 2013, 10:53:28 AM3/19/13
to
> > > I remember when currency mapped 1:1 to country.
> >
> > It *never* did.
> >
> > Trivial counter-examples are e.g. all those tiny european nations
> > such as Liechtenstein, Andorra, San Marino, Monaco etc.
>
> Ah, right. What was it Roy said about "enterprise of interest"?
>
> Thirty years in the finanical industry will inform certain assumptions
> about which countries matter and which don't.

You bet that those countries do matter for "financial services".

If it wasn't for their "financial services", they simply wouldn't exist
any more.

Sincerely,

Wolfgang

Wolfgang Keller

unread,
Mar 19, 2013, 10:54:42 AM3/19/13
to
> Those people (and systems) that use surrogate keys as a matter of
> course are usually wrong, not because surrogate keys are inherently
> bad, but because those people are not aware of the implications for
> database design of using surrogate keys.

That's exactly what my own empiric experience has taught me: If they
"advocate" surrogate keys by default, they don't have clue or they don't
care or both. Mental lazyness.

And that's probably why my professor had told us to use natural
(composite) keys by default; because then you *have to* think about
*proper* unification *and* such integrity issues as the one with the
overlapping foreign keys are much less likely to remain unnoticed.

Sincerely,

Wolfgang

Cimode

unread,
Mar 19, 2013, 11:16:54 AM3/19/13
to

> And that's probably why my professor had told us to use natural
> (composite) keys by default; because then you *have to* think about
> *proper* unification *and* such integrity issues as the one with the
> overlapping foreign keys are much less likely to remain unnoticed.
Unique identification is EOI specific which makes defining *default*
type of identifier at best a moot concept, at worst a subjective
choice.

IMHO

Regards,

Jan Hidders

unread,
Mar 19, 2013, 9:32:21 PM3/19/13
to
That's why we have conceptual data models (as opposed to logical data
models) where you don't have to make that choice. Why does this worry
you so much?

-- Jan Hidders

Cimode

unread,
Mar 20, 2013, 9:56:23 AM3/20/13
to
On 20 mar, 01:32, Jan Hidders <hidd...@gmail.com> wrote:
<Snipped context>
> That's why we have conceptual data models (as opposed to logical data
> models) where you don't have to make that choice. Why does this worry
> you so much?
You misunderstand my intent.
My comments only aim at encouraging people to educate themselves
further by triggering further questionning above surface issues.
> -- Jan Hidders

Jan Hidders

unread,
Mar 20, 2013, 12:30:47 PM3/20/13
to
And you think having a serious discussion about that issue would not be
educational?

-- Jan Hidders

Cimode

unread,
Mar 20, 2013, 12:43:04 PM3/20/13
to
No. I think educational implies you must know who your audience is
and how one may want to best convey a message. Exposing formal
definitions is one way, raising inclusive questions that lead to
formal definitions and knowledge thirst is another.

IMHO.
> -- Jan Hidders

com...@hotmail.com

unread,
Mar 20, 2013, 2:31:56 PM3/20/13
to
On Wednesday, March 13, 2013 10:18:37 AM UTC-7, Wolfgang Keller wrote:
> I just had to find the right keywords to search for, in this case it's
> the term "overlapping foreign keys". It's actually addressed in some
> textbooks.

Beware that "overlapping foreign keys" is just a special case of introducing constraints with a surrogate key. Here what you are doing can require a constraint even if there is no other foreign.

Eg. Suppose you have S{a,...} and T{x,...} with constraint (among others) S(a,...) IMPLIES T(f(a),...). Ie for row s in S EXISTS t in T [f(s.a)=t.x]. You add surrogate k as key in S and fk in T to get S{k,a,...} and T{k,x,...}. Then the new constraint is S(k,a,...) IMPLIES T(k,f(a),...). Ie for row s in S EXISTS t in T [f(s.a)=t.x AND s.k=t.k]. So in the SQL style you have to add the "AND s.k=t.k". This is independent of whether S origninally had any fk into T.

To not miss any cases yet be mechanical about it you have to do what I said in my first message. Determine:
> corresponding new/old table variable membership conditions, values of table variables and query results, and query and constraint expressions (including literals).

philip

com...@hotmail.com

unread,
Mar 20, 2013, 2:41:14 PM3/20/13
to
On Wednesday, March 20, 2013 11:31:56 AM UTC-7, com...@hotmail.com wrote:
> Eg. Suppose you have S{a,...} and T{x,...} with constraint (among others) S(a,...) IMPLIES T(f(a),...). Ie for row s in S EXISTS t in T [f(s.a)=t.x]. You add surrogate k as key in S and fk in T to get S{k,a,...} and T{k,x,...}. Then the new constraint is S(k,a,...) IMPLIES T(k,f(a),...). Ie for row s in S EXISTS t in T [f(s.a)=t.x AND s.k=t.k]. So in the SQL style you have to add the "AND s.k=t.k". This is independent of whether S origninally had any fk into T.

I meant: add surrogate k as key in T (target) and fk in S (source). (This is reflected in the rest.)

philip

karl.s...@o2online.de

unread,
Aug 21, 2013, 10:03:30 AM8/21/13
to
Am Freitag, 8. März 2013 20:22:33 UTC+1 schrieb Wolfgang Keller:
>
>
>
> Or could anyone give an example where the approach of using a surrogate
>
> primary key together with a unique constraint on the natural key would
>
> "break" data integrity (especially referential integrity) which could be
>
> avoided by using a natural (composite) primary key?
>
>
>
Hello Wolfgang,

I am a engineer too and develop applications for work preparation in plant
inspection and maintainance. In our field of work unique surrogate keys
never violate referential integrity. On the contrary we had to migrate from
natural (composite) keys to Guids (surrogate keys with guaranteed uniqueness)
for referential integrity in complex structured data (moving subsystems with
all components from one system to another).

The advantage of surrogate keys is, they will be created, but never changed.
Changing the identification of entities never affects the referential integrity between entities. Using natural keys for linking of entities results in
massive updates.

Hope this is useful

m.f.G.
Karl Scheurer




Wolfgang Keller

unread,
Aug 25, 2013, 8:23:44 AM8/25/13
to
> I am a engineer too and develop applications for work preparation in
> plant inspection and maintainance.

CMMS/EAM?

I happen to be a reliability engineer, btw. and thus a user of such
systems.

And the datamodel I work with is MIMOSA's CRIS, in case you happen to
know that one. It uses composite primary keys well over ten columns in
length for certain entities.

> In our field of work unique surrogate keys never violate referential
> integrity.

They inevitably will, depending on the database schema, as
rob...@gmail.com has explained. As soon as you have overlapping
foreign key relationships. Which you will inevitably get sooner or
later during the evolution of your data model, according to Murphy's
law.

Meanwhile I've found out that half-decent database design textbooks
mention the issue right under that label ("overlapping foreign keys").

> On the contrary we had to migrate from natural (composite) keys to
> Guids (surrogate keys with guaranteed uniqueness) for referential
> integrity in complex structured data (moving subsystems with all
> components from one system to another).

GUIDs (which are "unique" only within the virtual IT "world" and have
absolutely no identification function in the real world) *must* fail
in this case.

If you need to identify data relating to a specific real-world physical
item (such as a compoent of a machine) throughout its entire lifespan
(from forge to scrapyard...), across different IT systems from
different vendors operated by different companies (original
manufacturer, operator, maintenance workshop, sometimes several
different owners over the lifespan...), a natural key is the only way
to accomplish this.

And with technical components, which usually hold a serial number from
the manufacturer, the "natural" key to use it just that: naturally
obvious; manufacturer's key (can be a composite itself) + serial number.

That's what MIMOSA uses, btw. Although they had the problem that
there seems to be no world-wide unique identification method for
companies or other organisations.

> The advantage of surrogate keys is, they will be created, but never
> changed.

They will. See the "information logistics" chain over the lifespan of
those physical items.

Or do you want to create a world where everything in the "real world"
is identified by its "SAP number" (or "personal IDs" in Sweden and
Norway)? I know the mess that results from that requirement in
companies that use SAP. The users who have to deal with that *hate*
it.

> Changing the identification of entities never affects the
> referential integrity between entities. Using natural keys for
> linking of entities results in massive updates.

In the technical world, a serialised component never changes its serial
number during its entire lifespan. In fact this would be a crime in any
half-civilised country, since it makes tracability impossible. Which is
a legal requirement for safety-critical components.

Sincerely,

Wolfgang

karl.s...@o2online.de

unread,
Aug 26, 2013, 8:45:15 AM8/26/13
to
Am Sonntag, 25. August 2013 14:23:44 UTC+2 schrieb Wolfgang Keller:

>
>
> > In our field of work unique surrogate keys never violate referential
>
> > integrity.
>
>
>
> They inevitably will, depending on the database schema, as
>
> rob...@gmail.com has explained. As soon as you have overlapping
>
> foreign key relationships. Which you will inevitably get sooner or
>
> later during the evolution of your data model, according to Murphy's
>
> law.

When mixing (external) identification and (refential integrity then you
are right. We strictly separat external and internal identity. Only
internal identity is used for referential integrity. Try tracability
with "natural keys" when moving a subsystem e.g. "TY12" with all its
components to "TZ12" and it should be "TX12". Changing parts of a plant
design is bread and butter for our company
>
>
>
> GUIDs (which are "unique" only within the virtual IT "world" and have
>
> absolutely no identification function in the real world) *must* fail
>
> in this case.
>
In our field of work we have to deal with function (User Identity),
actual hardware (serial number) and internal identity (planung
inspections and tests to ordered components with currently unknown
serial number). Linking hardware to function with guids is a simple
way to identify unknown hardware with functional ident. At end the
serial number is added as documentation.


>
> > The advantage of surrogate keys is, they will be created, but never
>
> > changed.
>
>
>
> They will. See the "information logistics" chain over the lifespan of
>
> those physical items.
>
>
Explain !


>
> Or do you want to create a world where everything in the "real world"
>
> is identified by its "SAP number" (or "personal IDs" in Sweden and
>
> Norway)? I know the mess that results from that requirement in
>
> companies that use SAP. The users who have to deal with that *hate*
>
> it.
>
No! Mixing external identification, uniqueness and linking between
entities might work, but not for our applications
>
>

>
> In the technical world, a serialised component never changes its serial
>
> number during its entire lifespan. In fact this would be a crime in any
>
> half-civilised country, since it makes tracability impossible. Which is
>
> a legal requirement for safety-critical components.
>
That is not the point. In powerplant stations components never been identified
by serial number but the intended function. The same valve, pump or motor
could serve different functions and has different idents accordingly. Deinstallation and reinstallation yields to a changed installtion.

m.f.G
Karl Scheurer

com...@hotmail.com

unread,
Aug 28, 2013, 6:20:44 PM8/28/13
to
On Monday, August 26, 2013 5:45:15 AM UTC-7, karl.s...@o2online.de wrote:
> Only
> internal identity is used for referential integrity.

Karl seems to have a very limited notion of referential integrity, that it is enough to constrain by an id FK. This ignores relevant natural key constraints. He doesn't seem to understand the problem discussed in this thread.

On Monday, March 11, 2013 2:48:45 AM UTC-7, rob...@gmail.com wrote:
> clients {client}
> client_orders {client, order}
> client_locations {client, location}
> deliveries {delivery_no, client, order, location}
>
> Now add surrogate keys to all relations, and replace the last one with:
> deliveries {delivery_no, order_id, location_id}
>
> You may see that client attribute is no longer needed since it can be determined from either order_id or location_id. However this design no longer enforce the rule that the order and the location should refer to the same client.

philip

karl.s...@o2online.de

unread,
Aug 31, 2013, 10:10:09 AM8/31/13
to
Am Donnerstag, 29. August 2013 00:20:44 UTC+2 schrieb com...@hotmail.com:

> > clients {client}
>
> > client_orders {client, order}
>
> > client_locations {client, location}
>
> > deliveries {delivery_no, client, order, location}
>
> >
>
> > Now add surrogate keys to all relations, and replace the last one with:
>
> > deliveries {delivery_no, order_id, location_id}
>
> >
>
> > You may see that client attribute is no longer needed since it can be determined from either order_id or location_id. However this design no longer enforce the rule that the order and the location should refer to the same client.
>
Philip is right, my notion of notion of referential integrity is very limited.
Based on a design in one of our applications (repair_orders) I would change
his design in this way

clients {client, client_id}
client_orders {client_id, order_no, location}
client_locations {client_id, location}
deliveries {delivery_no, order_no}

As long as clients can have several locations, I see the need to secure the
address of the order ("who has issued the order"). Referencing clients with
natural keys vioalates referential integrity when companies change names. One
of our customer recently changed from BHR to EHR. Client location can change
too (new factories) so the actual location has to be copied to orders.
Deliveries and Orders need no surrogate keys, since finished, commited business transactions never change.

m.f.G.

Karl Scheurer



James K. Lowden

unread,
Aug 31, 2013, 1:56:48 PM8/31/13
to
On Sat, 31 Aug 2013 07:10:09 -0700 (PDT)
karl.s...@o2online.de wrote:

> Referencing clients with natural keys vioalates referential integrity
> when companies change names. One of our customer recently changed
> from BHR to EHR.

The trading symbol "T" on the NYSE currently belongs to AT&T. The same
was true in 1970. But those aren't the same company or the same T!
During that same time, the company that was AT&T underwent many
changes and had many names (and symbols) until it was acquired by one
of its spinoffs, IIRC Cingular, which then opted to call itself AT&T.

Is that a reason not to use "exchange, symbol" as a natural key for the
stock? It is not. On any one day, there is only one T on the NYSE.
Permitting two in the database would make the model not reflect the
real world events it was created to capture.

In your case, you had a customer BHR and now have a customer EHR.
Same company, same address, same people, maybe. But the fact -- the
real-world fact -- is that you received some orders from BHR and
nowadays you're getting them from EHR. If you record those facts as
they occur, you can't go wrong. If you don't, you can.

The name change might be incidental or important. Old orders might be
handled by the new entity, or they might not. (The specifics of the
BHR/EHR transaction don't matter. What matters is that some name
changes are significant, and recording actual reality deals with
both cases equally well.)

Let's say all your orders use the customer name as a FK. You can
distinguish orders according to the name used. You can get the union
of orders with the union of names used (probably relying on some kind
of customer history table). The RI rules can require that each order
carry an order date for a name that lies between the name's valid start
and end dates.

There is no need to update all the history, or *any* of the history.
Just record the facts, and be prepared for the fact that facts change
over time.

Let's say instead you use CUSTID for the key, and just keep the name by
reference. If you don't keep the history, you have no way to query by
historical name, and no way to distinguish between orders received
under new and old names. If you do keep the history, you have at best
a fragile way: you can *infer* that orders received prior to the change
used the old name, but that inference relies on that date being
correct, something the DBMS cannot ensure. Changing that date changes
history, intentionally or not.

Perhaps the course I'm recommending would be more work, perhaps not
worthwhile, at least not yet. That's OK. But more work is not a
failing of RI, nor is using natural keys a problem for it.

--jkl

karl.s...@o2online.de

unread,
Sep 1, 2013, 4:59:19 AM9/1/13
to
You are right! Based on our application, querying for history client name
was newer a item. A correct identification for aggregating queries is all
what was required. Adding historical question is easy with
client_orders {client_id, order_no, client, location}

m.f.G.
Karl Scheurer
0 new messages