Use of "surrogate" vs. "natural" keys

Cdenman3

May 26, 2000

After having solicited the opinion of others on my DB tables and the
organization thereof, I am now thoroughly confused by the differing opinions.
In a nutshell, I need to track my clients' illnesses and the treatments they
receive for them. I have the following tables:

tblClient
ClientID (SSN)
etc

tblImpairments
ImpairmentID (PK)(Autonumber)
Name ("Schizophrenia", etc)
Etc

tblinkClientToImpairment
ClientImpairmentID (PK)(Autonumber)
ClientID
ImpairmentID

tblTreatmentEvent
TreatmentEventID (PK)(Autonumber)
ClientImpairmentID (ForeignKey)
Doctor (who prescribed)
StartDate
StopDate
Result

tblMeds
TreatmentID (number)(PK)
Name
Dosage
SideEffect

tblTreatments
TreatmentID (number)(PK)
Nature (therapy, surgery)
Location (where was it done)

Now you will note that tblTreatments and tblMeds are subentities of
tblTreatmentEvent. I found something on Dev Ashish's site about this that made
sense to me, unless I have perverted it. Taking a medicine and undergoing
surgery have some common attributes, but some different ones. Some of my
commenters have said this is a bad technique.

Another criticism I got was the use of the autonumber, ClientImpairmentID, to ID
the conjunction of the client with an illness. They thought that ClientID and
ImpairmentID together should be the primary key. But then how would that get
related to tblTreatmentEvent? The suggestion was to plug both ClientID and
ImpairmentID right into tblTreatments. That struck me as redundant. Does not
the creation of the "surrogate" key avoid this? In fact, when you use a
combination key like my reviewer suggests, aren't you "at the end of the line,"
so to speak, for relational purposes?
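
To make the two options concrete, here is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Access/Jet and assuming the reviewer meant the composite key to be carried into the treatment event table. The client ID value and the helper names build and rejected are made up for illustration. It suggests that a composite key is not "the end of the line": a child table can reference it with a two-column foreign key, at the cost of repeating both columns.

```python
import sqlite3

COMMON = """
CREATE TABLE tblClient (ClientID TEXT PRIMARY KEY);
CREATE TABLE tblImpairments (
    ImpairmentID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
"""

# Option A: surrogate key on the link table, plus a unique index on the pair.
SURROGATE = COMMON + """
CREATE TABLE tblinkClientToImpairment (
    ClientImpairmentID INTEGER PRIMARY KEY,
    ClientID TEXT NOT NULL REFERENCES tblClient (ClientID),
    ImpairmentID INTEGER NOT NULL REFERENCES tblImpairments (ImpairmentID),
    UNIQUE (ClientID, ImpairmentID)
);
CREATE TABLE tblTreatmentEvent (
    TreatmentEventID INTEGER PRIMARY KEY,
    ClientImpairmentID INTEGER NOT NULL
        REFERENCES tblinkClientToImpairment (ClientImpairmentID),
    StartDate TEXT
);
"""

# Option B: composite primary key, carried into the child as a two-column FK.
COMPOSITE = COMMON + """
CREATE TABLE tblinkClientToImpairment (
    ClientID TEXT NOT NULL REFERENCES tblClient (ClientID),
    ImpairmentID INTEGER NOT NULL REFERENCES tblImpairments (ImpairmentID),
    PRIMARY KEY (ClientID, ImpairmentID)
);
CREATE TABLE tblTreatmentEvent (
    TreatmentEventID INTEGER PRIMARY KEY,
    ClientID TEXT NOT NULL,
    ImpairmentID INTEGER NOT NULL,
    StartDate TEXT,
    FOREIGN KEY (ClientID, ImpairmentID)
        REFERENCES tblinkClientToImpairment (ClientID, ImpairmentID)
);
"""


def build(schema):
    """Create an in-memory database with one client and one impairment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(schema)
    conn.execute("INSERT INTO tblClient VALUES ('000-00-0001')")  # made-up ID
    conn.execute("INSERT INTO tblImpairments VALUES (1, 'Schizophrenia')")
    return conn


def rejected(conn, sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

In either layout the database refuses a treatment event that points at a client/impairment pairing that does not exist. In option A the UNIQUE constraint is what still stops the same client being linked to the same impairment twice; the autonumber alone would not.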

Jimmy Smith

May 26, 2000

Surrogate keys rule for at least two reasons.
1) If the table that has them ever has to relate to a third where it is on
the one side, it is easier to make the AutoNumber (your surrogate) a
foreign key than to screw around with two or more composite keys.
2) When writing code to search for a record and pass whatever, it is a lot
easier to pass or work with one value than many.
Mike Hnatt

"Cdenman3" <cden...@cs.com> wrote in message
news:20000526172838...@ng-cm1.news.cs.com...

Tom Mitchell

May 27, 2000

I used to be a firm believer in natural keys, mainly because it made sense
with a minimum amount of data. However, recently I was burned by a circular
reference with natural keys. When I realized that I had made a keystroke
error in data entry and went to change one of the key values, the cascading
update wouldn't make it around the horn so to speak. Major PIA. That has
caused me to seriously rethink my aversion to surrogate keys.

Bottom line, neither is right or wrong. As with everything else, they both
have their advantages and disadvantages. Pick the system you are most
comfortable with and go with it. If others don't like it, tough (unless
they are the ones paying the bills!).

Good Luck.

Michael (michka) Kaplan

May 27, 2000

I would ALWAYS recommend against cascading updates and deletes as THEY are
to blame here for the problem you saw. RI that is enforced through natural
keys was in no way to blame, only cascades as I said, as well as a UI that
allowed you to make the bad change (but mainly the cascades).

Think about it, and about what the RI is designed to protect, and you will
see what I am getting at.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Tom Mitchell" <rtm...@swbell.net> wrote in message
news:JPXX4.2149$tK3.2...@nnrp1.sbc.net...

Tom Mitchell

May 27, 2000

Oh I agree. RI worked the way it should have and the application defaulted
to the "safe state" which is always preferable. However, in this rare (?)
case the combination of typos, table design, RI, cascading updates and
natural keys conspired to create a situation which was less than desirable.
Take away any of those factors and the situation wouldn't have occurred.
Since (I think) the table design was sound, RI was definitely called for and
there is no way to stop typos, the only way to effectively prevent the
situation would have been to replace cascade updates with code to
essentially do the same work, or to have used surrogate keys to begin with.
Thus, it gave me reason to reconsider my previous aversion to using
surrogate keys (and in no way affected my opinion that RI is a good thing).

Further thought and reading has led me to the opinion that the natural vs.
surrogate key argument comes down to a style issue. If a client or employer
has a standard one way or another, fine, use it. If not, use whichever
method you prefer, recognizing that there may be some annoyances you face
down the road. But don't let somebody criticize you because your style
doesn't fit his preconceived notions.
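
For readers who have not seen the mechanics, here is a minimal sketch of correcting a mistyped natural key with and without a cascading update. It assumes SQLite (through Python's sqlite3 module) rather than Access/Jet, and the table names, key values and the function correct_key are invented for the example. It does not reproduce the circular-reference case described above; it only shows the two basic outcomes.

```python
import sqlite3


def correct_key(cascade):
    """Fix a typo in a natural primary key that a child row already references.

    Returns the child's key value after the fix, or "rejected" if the
    engine refuses the update.
    """
    rule = "ON UPDATE CASCADE" if cascade else ""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(f"""
        CREATE TABLE Parent (Code TEXT PRIMARY KEY);
        CREATE TABLE Child (
            ChildID INTEGER PRIMARY KEY,
            Code TEXT NOT NULL REFERENCES Parent (Code) {rule}
        );
        INSERT INTO Parent VALUES ('SMTIH');
        INSERT INTO Child (Code) VALUES ('SMTIH');
    """)
    try:
        conn.execute("UPDATE Parent SET Code = 'SMITH' WHERE Code = 'SMTIH'")
    except sqlite3.IntegrityError:
        return "rejected"  # the "safe state": RI blocks the change
    return conn.execute("SELECT Code FROM Child").fetchone()[0]
```

With the cascade rule the child row follows the corrected key; without it the update is refused and the correction has to be done by hand or in code.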

So with that said, why do you recommend against cascades? Surely you don't
make your users delete child records before parent records are deleted? I
assume that means you write code to handle the cascade effects, giving the
user plenty of options to back out. Why is that preferable to using
something that is "built into" Access as part of RI? It seems like a lot of
effort for not a lot of return. Finally, where do you stand on the surrogate
vs. natural key issue as per the original question? I'd be interested in
your opinion/practices.

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:#UQmr1Cy$GA.325@cpmsnbbsa07...

Michael (michka) Kaplan

May 28, 2000

Access/Jet support for cascading updates and deletes is VERY dangerous. The
very nature of key values is that they should never or SELDOM change. When
they do change, it should be treated with the sort of care that any major
operation should be, ESPECIALLY if there are FK records that will be
affected, x10 if there are FK records to those child tables.

I am wholeheartedly in favor of NATURAL KEYS any time that (by looking at
the structure) it is provable that one of the RI goals of the database will
not be met if the natural keys are not used, which would include any time that
the natural key does not have a unique index on it (for example). But this
is just to keep people from getting lazy and thwarting design principles out
of laziness. IF two structures ARE equal in all regards, then
I would say that either is okay, and at THAT point it comes down to
style.

But that is a major "IF" to make, and most dbs whose schema I have reviewed
would fail that test, and would include cascades, and would use surrogate
keys. In most cases these structural issues would end up being one of the
reasons I was called in to do the review, and I was able to solve problems
with these suggestions. It kind of goes on from there.

--
MichKa

"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:9t%X4.1198$u35.2...@nnrp2.sbc.net...

Adam Cogan

May 28, 2000

I prefer natural keys in most cases. However, I think you had better be careful
if replication is in the picture.

The support for cascading updates in SQL Server 2000 was important for the
future viability of this preference.

Adam
adam...@ssw.com.au
--------------------------------------------------------
Check out these HOT UTILITIES FOR ACCESS AND VB DEVELOPERS....
www.ssw.com.au
* SSW Data PRO - Version Control for your data.mdb
* SSW Data Renovator - Compare the differences between two data.mdb's
* SSW Upsize PRO! - Don't UPSIZE to SQL Server without it
--------------------------------------------------------

"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:9t%X4.1198$u35.2...@nnrp2.sbc.net...

Michael (michka) Kaplan

May 28, 2000

In what way would replication make natural keys a bad idea? IMHO replication
makes the issues I raised elsewhere in this thread even more vital, as it's
even more important to maintain integrity of the DATA itself, rather than
just some artificial keys.

As for cascading updates and deletes, they are always a bad idea as they make
it way too easy to do bad things. The prior bar to causing this kind of
damage has a LOT to do with the perceived stability of SQL Server, and many
people will incorrectly start thinking SQL Server is less reliable as they
find they are messing up their data through such features.

--
MichKa

"Adam Cogan" <adam...@ssw.com.au> wrote in message
news:8gqjtv$6f5$1...@argon.syd.dav.net.au...

Tom Mitchell

May 28, 2000

So that goes back to my point that maybe surrogate keys aren't such a bad
thing. You can never stop typos. Therefore typos will happen in natural
key fields. Somebody will eventually realize the typo is there and will
want to change it to the "correct value." So now you are stuck with:

1. cascade updates as part of Access/Jet (which you are obviously against)
2. cascade updates by custom coding (which strikes me as a lot of work on a
part of the system that will be seldom used)
3. relying on the ad hoc abilities of the local DB administrator (gives me
the willies just thinking about it)
4. telling the client they have to pay you the developer $$ every time one
of their minimum wage data entry people makes a typo (sign me up for those
clients) or
5. telling the client, no, your people %*&&^-up, now live with it (not very
elegant).

Surrogate keys are ONE way to avoid the situation.
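
As a rough illustration of that last point, the sketch below keeps the mistyped value as an ordinary (but still unique) column and relates the tables through a surrogate key. It assumes SQLite through Python's sqlite3 module rather than Access/Jet, and the table and column names are invented. Fixing the typo then touches one row and nothing has to cascade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,     -- surrogate, never shown to users
        CustomerCode TEXT NOT NULL UNIQUE   -- natural candidate key, kept unique
    );
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID)
    );
    INSERT INTO Customers VALUES (1, 'SMTIH');
    INSERT INTO Orders (CustomerID) VALUES (1);
    INSERT INTO Orders (CustomerID) VALUES (1);
""")

# Correct the typo: only the Customers row changes, the Orders rows are untouched.
cursor = conn.execute(
    "UPDATE Customers SET CustomerCode = 'SMITH' WHERE CustomerID = 1"
)
rows_touched = cursor.rowcount

# Every order still resolves to the (now corrected) customer code.
codes = [
    row[0]
    for row in conn.execute(
        "SELECT c.CustomerCode FROM Orders o "
        "JOIN Customers c ON c.CustomerID = o.CustomerID"
    )
]
```

The unique constraint on CustomerCode is kept on purpose: the surrogate key handles the relationships, while the constraint still guards the real data.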

Again, I am not advocating surrogate keys as superior to natural keys. All
I'm saying is that they are not the evil thing that some (including me a few
months ago) thought they were.

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote

in message news:u3nT#hHy$GA.310@cpmsnbbsa07...

Michael (michka) Kaplan

May 28, 2000

Actually, surrogate keys compound the problem by making it harder to find
the problem.

THAT is exactly what is wrong with people using surrogate keys.... they lose
the built-in integrity that natural keys give them in terms of making sure
the data is right at entry time.

--
MichKa

"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:jVbY4.2652$tK3.3...@nnrp1.sbc.net...

David W. Fenton

May 30, 2000

forme...@spamfree.trigeminal.nospam.com (Michael $michka$
Kaplan) wrote in <Obi5UASy$GA.188@cpmsnbbsa09>:

>Actually, surrogate keys compound the problem by making it harder
>to find the problem.
>
>THAT is exactly what is wrong with people using surrogate keys....
>they lose the built-in integreity that natural keys give them in
>terms of making sure the data is right at entry time.

There is only one kind of surrogate key that anyone should ever
use, and that's an AutoNumber. And users should never see or be
allowed to edit the key field.

I can't imagine a circumstance in which any other surrogate key is
ever justified.

BTW, I'm pretty much opposed to natural keys in general, since I've
run into circumstances where they could be used so seldom that I
just don't use them, on principle. The only case where I use them
would be in lookup tables, but I consider that completely trivial.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc

Adam Cogan

May 30, 2000

> In what way would replication make natural keys a bad idea? IMHO
replication
> makes the issues I raised elsewhere in this thread even more vital as its
> even more important to maintain integrity of the DATA itself, rather than
> just some articifical keys.

Imagine Northwind is replicated.....
Enter Company 'Superior Software' and Client ID 'SUPER' and a few orders

On another replica....
Enter Company 'Super League' and Client ID 'SUPER' and a few orders

Synchronise.....

Conflicts... OK in Jet - Unresolvable in SQL Server
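
The scenario can be mimicked with a toy model. The sketch below is plain Python, not Jet or SQL Server replication; the merge function and the use of random UUIDs as surrogate keys are assumptions made for illustration only. It shows the same two inserts colliding when the primary key is the natural Client ID, and merging without a key conflict when each replica generates its own surrogate key.

```python
import uuid


def merge(replica_a, replica_b):
    """Merge two replicas keyed by primary key.

    Returns (merged_rows, conflicting_keys). A conflict is the same key
    holding different rows in the two replicas.
    """
    merged = dict(replica_a)
    conflicts = []
    for key, row in replica_b.items():
        if key in merged and merged[key] != row:
            conflicts.append(key)
        else:
            merged[key] = row
    return merged, conflicts


# Natural key: both replicas used Client ID 'SUPER' for different companies.
natural_merged, natural_conflicts = merge(
    {"SUPER": "Superior Software"},
    {"SUPER": "Super League"},
)

# Surrogate key: each replica generates its own key for the new row.
surrogate_merged, surrogate_conflicts = merge(
    {uuid.uuid4(): ("SUPER", "Superior Software")},
    {uuid.uuid4(): ("SUPER", "Super League")},
)
```

Note what the surrogate version hides: after the merge there are two customers both carrying the code 'SUPER', so unless a unique index on the code catches it, the duplicate goes unnoticed.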

Adam
adam...@ssw.com.au

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:uNCZpdKy$GA.321@cpmsnbbsa07...

Michael (michka) Kaplan

May 30, 2000

But again, if you do not do the work to maintain integrity, then surrogate
keys are not ok.

In other words you cannot sacrifice data integrity. :-)

--
MichKa

"David W. Fenton" <dXXXf...@bway.net> wrote in message
news:8F43D0352df...@news1.bway.net...

> forme...@spamfree.trigeminal.nospam.com (Michael $michka$
> Kaplan) wrote in <Obi5UASy$GA.188@cpmsnbbsa09>:
>

> >Actually, surrogate keys compound the problem by making it harder
> >to find the problem.
> >
> >THAT is exactly what is wrong with people using surrogate keys....
> >they lose the built-in integreity that natural keys give them in
> >terms of making sure the data is right at entry time.
>

Craig Alexander Morrison

Jun 1, 2000

David

<<<BTW, I'm pretty much opposed to natural keys in general, since I've run
onto circumstances where they could be used so seldom that I
just don't use them, on principle. The only case where I use them would be
in lookup tables, but I consider that completely trivial.>>>

How do you guarantee the uniqueness of your records?

As you know the AutoNumber sometimes referred to as a Surrogate Key is
pseudo-randomly (or unwisely incrementally) generated and does nothing more
than provide a reference to the record. Nothing the Surrogate does allows it
to ensure that records in that table are unique.

As far as the AutoNumber goes it should probably not be seen by any user of
the database including the developer. Indeed would it not be preferable that
the Natural Key (should it exist) be defined by the user/developer as the
Primary Key and then Jet automatically generate the internal number (using
its own hashing algorithm) to be used to enforce the relationships. The
external view being of the Primary Key and the Foreign Keys.

Practically as Access/Jet does not support this the AutoNumber is (sadly)
exposed and the designer has to use this as the Primary Key and part or all
of the Foreign Key. The designer should also take care to ensure that at
least one of the candidate keys is defined as a No Nulls, Required, Unique
Index.

Failure to do this makes a mockery of the Relational Model (as does SQL with
its support of duplicate records).
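
A minimal sketch of the point, assuming SQLite (through Python's sqlite3 module) in place of Access/Jet, with AUTOINCREMENT standing in for the AutoNumber. The table names Loose and Strict and the helper insert_twice are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: nothing says what makes a row unique.
    CREATE TABLE Loose (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT,
        City TEXT
    );
    -- Surrogate key plus a no-nulls unique index on the candidate key.
    CREATE TABLE Strict (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT NOT NULL,
        City TEXT NOT NULL,
        UNIQUE (Name, City)
    );
""")


def insert_twice(table):
    """Insert the same data twice; return how many rows were accepted."""
    accepted = 0
    for _ in range(2):
        try:
            conn.execute(
                f"INSERT INTO {table} (Name, City) VALUES ('Acme', 'Leeds')"
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the unique index rejected the duplicate
    return accepted
```

Calling insert_twice("Loose") stores both copies, each with its own autonumber, while insert_twice("Strict") stores only the first.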

Slainte

Craig Alexander Morrison, CData SystemsHouse

BTW If you do not have a natural key you are sometimes required to create a
field to be used as the (or part of the) Primary Key, this is not a
surrogate, this is a generated key.

Michael (michka) Kaplan

Jun 1, 2000

Exactly! I knew someone would know what I was talking about!

Using the autonumber instead of data integrity is about as good as that
guillotine cure for dandruff that Marie Antoinette used....

--
MichKa

"Craig Alexander Morrison" <CraigAlexan...@NoSpamNoMail.com> wrote
in message news:8h4neh$jt5$1...@supernews.com...

Adam Cogan

Jun 2, 2000

Using Northwind as an example - what WOULD you use as the primary key in the
following tables:
* Customers
* Orders
* Products
* Products Category

Adam
adam...@ssw.com.au

David W. Fenton

Jun 2, 2000

forme...@spamfree.trigeminal.nospam.com (Michael $michka$
Kaplan) wrote in <eD$918Bz$GA.321@cpmsnbbsa07>:

>Exactly! I knew someone would know what I was talking about!
>
>Using the autonumber instead of data integrity is about as good as
>that guillotine cure for dandruff that Marie Antionette used....

When there is no candidate natural key (and there is not in the
vast majority of applications I've encountered; but we had this
discussion a long, long time ago), a surrogate key is required.

An AutoNumber is a perfectly good surrogate key, and one that the
user should never see (as with every surrogate key).

It was clear to me the last time we had this discussion that people
with perspectives different from mine see things differently.
Particularly, it seemed that those writing financial applications
had a much greater number of viable natural keys than in the kinds
of applications that I've been involved with, which mostly store
information about people and their activities.

When there is a natural key that is not subject to change, I'll use
it. That happens so seldom that I spend little time looking for
them during schema design.

Craig Alexander Morrison

Jun 2, 2000

David

How do you guarantee the uniqueness of your records?

Do you define a No Nulls, Required, Unique Index to do this, or do you allow
duplicate records, all for the AutoNumber?

I have seen you write about normalisation in other conversations yet you
seem to fail to observe step 1. In first normal form you are, in addition to
other things, expected to ensure that no two rows in a table are identical.
By creating a meaningless value such as the AutoNumber (used as a Surrogate
Key) you have not satisfied the above condition.

Also, what is this about the PK changing? It can, it may; you should choose
(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile; however, everything
changes or can change.

I ask again: How do you guarantee the uniqueness of your records?

How do your users choose between two identical records?

As you have said the users do not (should not) see the Surrogate Key
(AutoNumber) so all they have available to them is the remaining data which
could exist, duplicated, in one, two or thousands of records.

I believe that you are using a Surrogate Key in place of what you should be
using which is a generated field that can be used in conjunction with the
other fields in the record to identify uniqueness. This does not preclude
you using a Surrogate Key such as the Jet AutoNumber so long as you define
the No Nulls, Required, Unique Index on the field(s) that are included in
the candidate key(s).

Until you can identify each record as a unique record using the REAL (even a
generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form.

Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?

Keri Hardwick

Jun 2, 2000

What you "know" so far is simply false in two of your 4 areas.

Database passwords are not security, they are a toy. Proper mdb security
does not have passwords in an easily cracked area.

There is certainly a way to enforce uniqueness of data; it's called a unique
index.

Of course Access does not have the same functionality as Oracle. It is not
intended for the same market, and its cost is several orders of magnitude
lower to reflect this fact.

Keri
"Brad Allan" <brad...@home.com> wrote in message
news:G1HZ4.5456$F9.1...@news1.gvcl1.bc.home.com...
> Craig (or anyone else relatively familiar with Access' engine/inner
> workings),
>
> I hope you read this and are able/willing to give me a hand with at least
> some of this. Recently, I've gone back to university and am currently taking
> a databases course. A question on our first assignment asks us to briefly
> determine Access' compliance with the primary functions of a reasonable
> database management system. From this thread (and my own experience), I can
> see that there is a substantial divergence from what we would hope to find
> in terms of data integrity. Staying with the aspect of controlled access,
> could you explain some of Access' shortcomings in the following areas? I
> will include what I know already.
>
> Security
> - the way I understand it, the passwords are stored in a relatively easy to
> access area using a known (and therefore cracked) encryption routine
>
> Integrity
> - there is no way to enforce true uniqueness of data
> - the de facto database manipulation language, SQL - which Access uses -
> allows duplicate records
>
> Concurrency
> - I believe, from my experiences, that shared access is not too bad
>
> Recovery
> - well, I've used the recovery tools in Oracle, and quite frankly, Access is
> a LONG way from that (whether that constitutes a failure or shortcoming is a
> matter of opinion, I suppose :) )
>
> Thanks to anyone who is able to contribute to this.
>
> Brad Allan

Michael (michka) Kaplan

Jun 2, 2000

Sheesh Brad, this is not such a smart list, I hope your grade does not
depend on it!

Security -- only true about the database password or poorly implemented
security schemes. But you can say the latter about ANY system.

Integrity -- You are simply dead wrong.

Concurrency -- Works fine for me, not sure what you want to know about it.

Recovery -- It's a DESKTOP database. Comparing it to a server db is hardly a
fair comparison. Do you pit your 6-year-old against a professional boxer?

--
MichKa

"Brad Allan" <brad...@home.com> wrote in message

Arvin Meyer

Jun 2, 2000

Michael (michka) Kaplan wrote in message ...

>Do you pit your 6year old against a professional boxer?

Really more like a bar room brawler against a professional boxer, I'd say.

Most of the time, the professional will win, but there is always that 1 time
<g>
---
Arvin Meyer

Michael (michka) Kaplan

Jun 3, 2000

You would need to redesign several tables to start. Northwind is a really
disgusting example of database schema.

I would never take a contract that required me to be responsible for bad
design.

--
MichKa

"David W. Fenton" <dXXXf...@bway.net> wrote in message
news:8F48D417Adf...@news1.bway.net...
> CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
> wrote in <8h778k$ojp$1...@supernews.com>:

>
> >"Adam Cogan" <adam...@ssw.com.au> wrote in message

> >news:8h6u1o$gd8$1...@argon.syd.dav.net.au...

> >
> ><<< Using Northwind as an example - what WOULD you use as the
> ><<< primary key in
> >the following tables:
> ><<< * Customers * Orders * Products * Products Category
> >

> >I had a look at Northwind's database design (for the first time)
> >in Access 97 and there you have it complete and utter... Of course
> >I do not believe that Northwind is anything more than a sample to
> >show off programming techniques, it was not intended to show how
> >to design relational databases (at least I hope not).
>
> He didn't ask you to critique the Northwinds schema, which we all
> already knew was a disaster.
>
> He asked what the candidate keys are for that data structure.
>
> What primary key would *you* use for Customers? For Orders? For
> Products? For Category?
>
> Answer the question, and then I'll explain to you why you will need
> a surrogate key in several of those tables.

David W. Fenton

Jun 4, 2000

CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8h7cfn$khk$1...@supernews.com>:

>How do you guarantee the uniqueness of your records?

Er, as a primary key:

>Do you define a No Nulls, Required, Unique Index to do this or do
>you allow duplicate records all for the AutoNumber.

Unique index, no nulls. D'oh. I set the Autonumber type, and then
click the wee little primary key button on the toolbar.

I guess I could do it in DAO or something, and claim to be doing
something very complicated and geeky and extra robust, but it would
seem like flim-flammery to me.

>I have seen you write about normalisation in other conversations
>yet you seem to fail to observe step 1. In first normal form you
>are, in addition to other things, expected to ensure that no two
>rows in a table are identical. By creating a meaningless value
>such as the AutoNumber (used as a Surrogate Key) you have not
>satisfied the above condition.

I said that when there are no candidate natural keys (that is, that
which defines the unique record cannot be guaranteed to have no
nulls), you have to use a surrogate key, and then, yes, of course,
you have to ensure uniqueness through programmatic logic.

>Also what is this about the PK changing, it can, it may, you
>should choose (if you have a choice) the candidate key whose
>field(s) are the least volatile. I do not know where this idea
>that the value of the PK is static (set in stone) came from. It
>should not be volatile, however everything changes or can change.

I've never written an application with alterable PKs that was
stable or easy to maintain. Now that more and more of my apps are
replicated, I feel even more strongly about this, since propagating
PK changes cannot in those circumstances be done via cascading
updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context.

>I ask again: How do you guarantee the uniqueness of your records?

When there is no viable natural key, programmatically.

>How do your users choose between two identical records?

I build routines to help users avoid duplicates at record creation
and administrative tools to de-dup the data if users have made
incorrect choices in data entry.

>As you have said the users do not (should not) see the Surrogate
>Key (AutoNumber) so all they have available to them is the
>remaining data which could exist, duplicated, in one, two or
>thousands of records.
>
>I believe that you are using a Surrogate Key in place of what you
>should be using which is a generated field that can be used in
>conjunction with the other fields in the record to identify

>uniqueness. . . .

This is becoming a rehash of a thread we had a few months ago. I
described a circumstance in which I was storing companies, and only
the company name could be set to No Nulls, but there were multiple
locations for each company, and more than one possible in each city
(though not at the same address). But the only thing the actual
users *always* had was the name of the company, and, most often,
the location. This meant that Nulls had to be allowed in City,
State, Country and Address (which is the only compound key that
could ensure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of
duplicates, and administrative tools to clean up any duplicates
that were created.
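
A rough sketch of what such program-level duplicate checking might look like, assuming SQLite through Python's sqlite3 module rather than Access/Jet. The Company table layout and the functions add_company and possible_duplicates are hypothetical, not the routines described above. In SQLite a unique index treats Nulls as distinct from one another, so it cannot stop two rows that carry only a company name; how Jet treats Nulls in a unique index may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (
        CompanyID INTEGER PRIMARY KEY,   -- surrogate key
        Name TEXT NOT NULL,              -- the only field users always have
        City TEXT,
        Address TEXT,
        UNIQUE (Name, City, Address)     -- SQLite: Nulls never collide here
    );
""")


def add_company(name, city=None, address=None):
    """Insert a company row; missing values are stored as Null."""
    conn.execute(
        "INSERT INTO Company (Name, City, Address) VALUES (?, ?, ?)",
        (name, city, address),
    )


def possible_duplicates(name, city=None, address=None):
    """Return IDs of existing rows that could be the same company.

    A missing (Null) value on either side counts as "might match", so the
    user can be asked to confirm before a new record is created.
    """
    sql = """
        SELECT CompanyID FROM Company
        WHERE Name = :name
          AND (:city IS NULL OR City IS NULL OR City = :city)
          AND (:address IS NULL OR Address IS NULL OR Address = :address)
        ORDER BY CompanyID
    """
    params = {"name": name, "city": city, "address": address}
    return [row[0] for row in conn.execute(sql, params)]
```

A data entry form could call possible_duplicates before saving and list the candidates for the user; the same query could feed an administrative de-duplication tool.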

The same thing applies to people. When the only information you
are guaranteed to have for a person is their last name, you cannot
use a natural key.

Since nearly every application I have ever written stores data on
people and companies, and has exactly the same operating
requirements, I have never used a natural key for either of these
types of data.

Do you honestly believe that I am missing out on a candidate
natural key for these two entities, given the described operating
requirements?

If you do, I'm all ears. I'd love to be able to eliminate the
programmatic logic and de-duping tools.

> . . . This does not preclude

>you using a Surrogate Key such as the Jet AutoNumber so long as
>you define the No Nulls, Required, Unique Index on the field(s)
>that are included in the candidate key(s).

Why you assumed that I would not do that, I can't say. We *were*
talking about primary keys, so that was an implicit part of the
definition.

>Until you can identify each record as a unique record using the
>REAL (even a generated field) fields you cannot have a normalised
>relational database design, you are not even in first normal form.

And real-life applications very seldom allow full normalization.
This does not disturb me, since my job is to provide application
solutions to my clients, not normalized data schemas.

And, BTW, if you have a generated field for your PK, you're not in
1NF.

I have *never* used a generated PK field. That's an abomination, in
my opinion, since it violates the first principle of normalization,
dependencies between fields within a record.

>Ultimately you can do whatever you want, I just do not think it is
>fair to call it Relational Database Design, do you?

Call it whatever you like. Or not. I'm talking about real life
applications. You build a data model that is as normalized as is
possible given the operating conditions. Surrogate keys are very
often required for certain kinds of data. It matters not one iota
to me that you would not call these data structures a "Relational
Database Design" since it seems to me that you've defined the term
so narrowly as to preclude its use in 99% of real-world operating
conditions.

Of course, YMMV.

Craig Alexander Morrison

unread,

Jun 4, 2000, 3:00:00 AM6/4/00

to

David

>>> Answer the question, and then I'll explain to you why you will need a
surrogate key in several of those tables.<<<

Oh I did. You don't "need" Surrogates, but I can show you that you can use
Surrogates in all the tables if you want, that is not my point.

I have no problem with use of Surrogates, David, I do have a problem with
the absence of uniqueness in the records in that database, note the ability
to copy and paste records where the only unique index is on the AutoNumber
PK (Surrogate).

Explain to me how you guarantee uniqueness and I'll explain to you why you
need a generated key to be used in conjunction with other fields to create a
composite key that can be set as a No Nulls, Required, Unique Index. No need
to wait for your explanation; see my other message.

They contain AutoNumbers as PKs without a No Nulls, Required, Unique Index.
(The only table with such an index is a lookup.)

See my other message.

If you give someone a fish they can feed their family today; if you give
that person a fishing rod they can feed them for years.

Discussing a particular database design does not deal with the principles of
design; I did however take some examples from that design to illustrate my
points. Database design by e-mail is dangerous. To design Northwind properly
you need to talk to the users or have a clearly defined problem domain that
clearly sets out the requirements and objectives of the system.

Choice of the primary keys depends on the answers the users give to our
questions about the data. If I make a choice on the Northwind design anyone
could raise a point in objection, and these points may or may not be valid;
who could tell? In the REAL world you have REAL users and they would help
you to decide upon the correct candidate key(s), one of which could be made
the PK, or you could even use a Surrogate (it must be clear to you now that I
am concerned about uniqueness of the records within the database, not
Surrogates).

Craig Alexander Morrison

unread,

Jun 4, 2000, 3:00:00 AM6/4/00

to

David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary
key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow
duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly. Do you understand my point about uniqueness?
Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.
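
A minimal sketch of the point, assuming SQLite through Python's sqlite3 module rather than Jet; the Product table and its columns are hypothetical. With only the surrogate as primary key the same real data goes in twice, and a unique index on the candidate key is what stops the repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        ProductID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate only
        ProductCode TEXT NOT NULL,
        ProductName TEXT NOT NULL
    )
""")

# The surrogate gives each row a different ID, so the engine sees no clash.
conn.execute("INSERT INTO Product (ProductCode, ProductName) VALUES ('P-100', 'Widget')")
conn.execute("INSERT INTO Product (ProductCode, ProductName) VALUES ('P-100', 'Widget')")
rows_with_surrogate_only = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]

# Remove the copy, then protect the candidate key with a unique index.
conn.execute("DELETE FROM Product WHERE ProductID = (SELECT MAX(ProductID) FROM Product)")
conn.execute("CREATE UNIQUE INDEX idxProductCode ON Product (ProductCode)")

try:
    conn.execute("INSERT INTO Product (ProductCode, ProductName) VALUES ('P-100', 'Widget')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows_with_surrogate_only, duplicate_rejected)
```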

>>> I guess I could do it in DAO or something, and claim to be doing
something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.
Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that
which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic. <<<

Wrong! Surrogates on their own should not be used when there is no natural
key; they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers,
not applications, are used to ensure uniqueness.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK; you can still
use a Surrogate as long as you define the candidate key as a No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose
(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading updates (i.e., at the engine level)
because the results can be unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility
is subjective, you decide). However we agree on this, don't we?
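
For readers unfamiliar with the engine-level cascading update mentioned in the quote, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than Jet, uses hypothetical Customer and Invoice tables, and does not attempt to model replication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# A volatile natural key used as the PK, referenced from a child table.
conn.execute("CREATE TABLE Customer (CustomerCode TEXT PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Invoice (
        InvoiceID    INTEGER PRIMARY KEY,
        CustomerCode TEXT NOT NULL
            REFERENCES Customer (CustomerCode) ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO Customer VALUES ('ACME', 'Acme Ltd')")
conn.execute("INSERT INTO Invoice (CustomerCode) VALUES ('ACME')")

# Changing the PK value is propagated to the child rows by the engine.
conn.execute("UPDATE Customer SET CustomerCode = 'ACME-UK' WHERE CustomerCode = 'ACME'")
invoice_codes = [row[0] for row in conn.execute("SELECT CustomerCode FROM Invoice")]

print(invoice_codes)
```

With a surrogate as the PK the child rows would hold the surrogate value instead, so a change to the code would touch a single row in Customer and nothing would need to cascade.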

"I ask again: How do you guarantee the uniqueness of your records?" >>> When
there is no viable natural key, programmatically. <<<

How? I mean, if the records are duplicates and you cannot differentiate
between them for the database, how can you differentiate between them when
dealing with them programmatically? I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build
routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and
if so why not enforce this at database level rather than leave it to the whim
of the application? This approach requires that every operation performed by
the application include routines to handle the fact that the database
is not doing its job. And, as above, how do you deal with duplicate records if
even the database doesn't know that they are duplicates?

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only
the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field, used in
conjunction with the name, to be the PK or the No Nulls, Required, Unique
Index? The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key. The argument
given above proves that the compound candidate key suggested is not
adequate; indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.
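
A minimal sketch of the composite key described above, assuming SQLite through Python's sqlite3 module rather than Jet. The column name CompanyCode for the generated field is hypothetical, as are the sample values. The surrogate stays as the PK while the unique constraint sits on CompanyName plus the generated field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Company (
        CompanyID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate, hidden from users
        CompanyName TEXT NOT NULL,
        CompanyCode TEXT NOT NULL,                      -- generated field, shown to users
        City        TEXT,                               -- optional, may be NULL
        UNIQUE (CompanyName, CompanyCode)               -- composite candidate key
    )
""")

# Two companies with the same name coexist because their codes differ.
conn.execute("INSERT INTO Company (CompanyName, CompanyCode, City) VALUES ('Acme', '0001', NULL)")
conn.execute("INSERT INTO Company (CompanyName, CompanyCode, City) VALUES ('Acme', '0002', 'Leeds')")

# Reusing an existing name and code pair is refused by the engine.
try:
    conn.execute("INSERT INTO Company (CompanyName, CompanyCode, City) VALUES ('Acme', '0001', 'York')")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

# What a user would see when choosing a company.
choices = conn.execute(
    "SELECT CompanyName, CompanyCode FROM Company ORDER BY CompanyCode"
).fetchall()

print(repeat_rejected, choices)
```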

>>> The same thing applies to people. When the only information you are
guaranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people
and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if
you did find a natural key and you did not make it the PK you would have
defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.
If you wish to use Surrogates you can; however, when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate; they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records, as you have agreed; you even have to write code to try to control
and eliminate these duplicates. We never had to do this.

>>> If you do, I'm all ears. I'd love to be able to eliminate the
programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph.

" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the
candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even
a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalized data schemas. <<<

Please, I build REAL systems and we respect the clients' data. It is very
possible to design high performance, professional, reliable systems by
adhering to the Relational Model; we do it every day. Of course it depends
what you load into your comment about "full normalisation". I too object to
what I call "Anal Normalisation"; after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my
opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated
field is not an "intelligent number"; it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I were talking about an "Intelligent Number" you would be dead right: that
is an abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within, say, the Company table for rows with the
same CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.
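
The routine described above could look something like the following sketch. It assumes SQLite through Python's sqlite3 module rather than Jet; the function names next_company_code and insert_company, the CompanyCode column, and the confirm callback standing in for the user prompt are all hypothetical.

```python
import sqlite3


def next_company_code(conn, company_name):
    """Return the next zero-padded code for this name, starting at '0001'."""
    (highest,) = conn.execute(
        "SELECT COALESCE(MAX(CAST(CompanyCode AS INTEGER)), 0) "
        "FROM Company WHERE CompanyName = ?",
        (company_name,),
    ).fetchone()
    return f"{highest + 1:04d}"


def insert_company(conn, company_name, city, confirm):
    """Insert a company, asking `confirm` first when the name already exists.

    `confirm` receives the existing (CompanyCode, City) rows and returns
    True to go ahead or False to cancel. Returns the new code, or None.
    """
    code = next_company_code(conn, company_name)
    if code != "0001":
        # Code above "0001": the user may be creating a duplicate.
        existing = conn.execute(
            "SELECT CompanyCode, City FROM Company "
            "WHERE CompanyName = ? ORDER BY CompanyCode",
            (company_name,),
        ).fetchall()
        if not confirm(existing):
            return None
    conn.execute(
        "INSERT INTO Company (CompanyName, CompanyCode, City) VALUES (?, ?, ?)",
        (company_name, code, city),
    )
    return code


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Company (
        CompanyID   INTEGER PRIMARY KEY AUTOINCREMENT,
        CompanyName TEXT NOT NULL,
        CompanyCode TEXT NOT NULL,
        City        TEXT,
        UNIQUE (CompanyName, CompanyCode)
    )
""")

shown_to_user = []
first = insert_company(conn, "Acme", "Leeds", confirm=lambda rows: True)
cancelled = insert_company(conn, "Acme", "Leeds", confirm=lambda rows: False)
second = insert_company(
    conn, "Acme", "York",
    confirm=lambda rows: shown_to_user.extend(rows) or True,
)

print(first, cancelled, second, shown_to_user)
```

The first insert never prompts because its code is "0001". Reading the highest code and adding one is not safe against two users inserting at once; in this sketch the unique index remains the backstop, since the second of two simultaneous inserts would be rejected by the engine.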

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.

>>> You build a data model that is as normalized as is possible given the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out
with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world; you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for
implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK.

>>> It matters not one iota to me that you would not call these data
structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want. Relational Database Design, properly
done, meets the real world requirements of most clients. I build REAL
systems, please do not try to justify your approach on that basis.

>>> Of course, YMMV. <<<

I have no idea what that means; enlighten me, unless it is derogatory (of
course).

SUMMARY

Surrogates can and often should be used in Access as the Primary Key (even
using the AutoNumber). AGREED?

If you do use a Surrogate as the PK you must define a No Nulls, Required,
Unique Index on the candidate key(s). AGREED?

If there is no natural key, create a generated field to be used and exposed
to the user to differentiate between records. This composite (a key
consisting of more than one field and including a generated field) candidate
key should be a No Nulls, Required, Unique Index. AGREED? THIS IS WHERE WE
DIFFER, I THINK.

The ultimate, yet achievable, goal is to ensure and enforce unique records
at the database engine level. AGREED?

Slainte

Craig Alexander Morrison, CData SystemsHouse

Note: I refer to composite keys and compound keys, by which I mean:

Compound Key - A key made up of 2 or more REAL fields in the
table.
Composite Key - A key made up of 1 or more REAL fields plus a
generated field.

Further Note: A generated field is meaningless (like a surrogate) and it is
(unlike a surrogate) exposed to the user to allow them to differentiate
between records. It is most definitely NOT an "Intelligent Number"; those are
not very clever at all.

Craig Alexander Morrison

unread,

Jun 4, 2000, 3:00:00 AM6/4/00

to

David

>>> It matters not one iota to me that you would not call these data
structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly