Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Use of "surrogate" vs. "natural" keys

6 views
Skip to first unread message

Cdenman3

unread,
May 26, 2000, 3:00:00 AM5/26/00
to
After having solicited the opinion of others on my DB tables and the
organization thereof, I am now thoroughly confused by the differing opinions.
In a nutshell, I need to track my client's illnesses and the treatments they
receive for them. I have the following tables:

tblClient
ClientID (SSN)
etc

tblImpairments
ImpairmentID (PK)(Autonumber)
Name ("Schizophrenia", etc)
Etc

tblinkClientToImpairment
ClientImpairmentID (PK)(Autonumber)
ClientID
ImpairmentID

tblTreatmentEvent
TreatmentEventID (PK)(Autonumber)
ClientImpairmentID (ForeignKey)
Doctor (who prescribed)
StartDate
StopDate
Result

tblMeds
TreatmentID (number)(PK)
Name
Dosage
SideEffect

tblTreatments
TreamentID(number)(PK)
Nature(therapy, surgery)
Location(where was it done)


Now you will not that tblTreatments and tblMeds are a subentity of
TreamentEvents. I found something on Dev Ashish's site about this that made
sense to me, unless I have perverted it. Taking a medicine and undergoing
surgery have common attributes, but some different ones. Some of my commenters
have said this is a bad technique.

Another critism I got was the use of the autonumber, ClientImpairmentID, to ID
the conjucture of the client with an illness. They thought that ClientID and
ImpairmentID together should be the primary key. But then how to get related
to tblTreatmentEvents? The suggestion was to plug bout ClientID and
ImpairmentID right into tblTreatments. That struck me as redundant. Does not
the creation of the "surrogate" key avoid this? In fact, when you use a
combination key like my reviewer suggests, aren't you "at the end of the line"
so to speak for relational purpose?

Jimmy Smith

unread,
May 26, 2000, 3:00:00 AM5/26/00
to
Surrogate keys rule for at least two reasons.
1) If the table that has them ever has to relate to a third where it is on
the one side (thus it is easier to make the AutoNumber (your surrogate) a
foreign rather than screw around with two or more composite keys.
2) When doing code to search for a record and pass whatever, it is alot
easier to pass or work with one value than many.
Mike Hnatt


"Cdenman3" <cden...@cs.com> wrote in message
news:20000526172838...@ng-cm1.news.cs.com...

Tom Mitchell

unread,
May 27, 2000, 3:00:00 AM5/27/00
to
I used to be a firm believer in natural keys, mainly because it made sense
with a minimum amount of data. However, recently I was burned by circular
reference with natural keys. When I realized that I had made a keystroke
error in data entry and went to change one of the key values, the cascading
update wouldn't make it around the horn so to speak. Major PIA. That has
caused me to seriously rethink my aversion to surrogate keys.

Bottom line, neither is right or wrong. As with everything else, they both
have their advantages and disadvantages. Pick the system you are most
comfortable with and go with it. If others don't like it, tough (unless
they are the ones paying the bills!).

Good Luck.

Michael (michka) Kaplan

unread,
May 27, 2000, 3:00:00 AM5/27/00
to
I would ALWAYS recommend against cascading updates and deletes as THEY are
to blame here for the problem you saw. RI that is enforced through natural
keys was in no way to blame, only cascades as I said, as well as a UI that
allowed you to make the bad change (but mainly the cascades).

Think about it, and about what the RI is designed to protect, and you will
see what I am getting at.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Tom Mitchell" <rtm...@swbell.net> wrote in message
news:JPXX4.2149$tK3.2...@nnrp1.sbc.net...

Tom Mitchell

unread,
May 27, 2000, 3:00:00 AM5/27/00
to
Oh I agree. RI worked the way it should have and the application defaulted
to the "safe state" which is always preferable. However, in this rare (?)
case the combination of typo's, table design, RI, cascading updates and
natural keys conspired to create a situation which was less than desirable.
Take away any of those factors and the situation wouldn't have occurred.
Since (I think) the table design was sound, RI was definitely called for and
there is no way to stop typo's, the only way to effectively prevent the
situation would have been to replace cascade updates with code to
essentially do the same work, or to have used surrogate keys to begin with.
Thus, it gave me reason to reconsider my previous aversion to using
surrogate keys (and in no way affected my opinion that RI is a good thing).
Further thought and reading has lead me to the opinion that the natural vs
surrogate key argument comes down to a style issue. If a client or employer
has a standard one way or another, fine use it. If not, use whichever you
method you prefer, recognizing that there may be some annoyances you face
down the road. But don't let somebody criticize you because your style
doesn't fit his preconceived notions.

So with that said, why do you recommend against cascades? Surely you don't
make your users delete child records before Parent records are deleted? I
assume that means you write code to handle the cascade effects giving the
user plenty of options to back out. Why is that preferable to using
something that is "built into" Access as part of RI? - it seems like alot of
effort for not alot of return. Finally, where do you stand on the surrogate
vs natural key issue as per the original question? I'd be interested in
your opinion/practices.


"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:#UQmr1Cy$GA.325@cpmsnbbsa07...

Michael (michka) Kaplan

unread,
May 28, 2000, 3:00:00 AM5/28/00
to
Access/Jet support for cascading updates and deletes is VERY dangerous. The
very nature of key values is that they should never or SELDOM change. When
they do change, it should be treated with the sort of care that any major
operation should be, ESPECIALLY if there are FK records that will be
affected, x10 if there are FK records to those child tables.

I am wholeheartedly in favor of NATURAL KEYS any time that (by looking at
the structure) it is proveable that one of the RI goals of the database will
not be met if the natural keys not use.... which would include any time that
the natural key does not have a unique index on it (for example). But this
is just to keep people from getting lazy and thwarting design principles out
of laziness. Assuming that IF two structures ARE equal in all regards, then
I would recommend that either is okay and at THAT point it comes down to
style.

But that is a major "IF" to make, and most dbs I have reviewed schema for
would fail that test, and would include cascades, and would use surrogate
keys. In most cases these structural issues would end up being one of the
reasons I was called into do the review, and I was able to solve problems
with these suggestions. It kind of goes on from there.


--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:9t%X4.1198$u35.2...@nnrp2.sbc.net...

Adam Cogan

unread,
May 28, 2000, 3:00:00 AM5/28/00
to
I prefer Natural is most cases. However I think you have better be careful
if replication is in the picture.

The support in SQL 2000 of Cascading Updates was important for the future
viability of this preference.

Adam
adam...@ssw.com.au
--------------------------------------------------------
Check out these HOT UTILITIES FOR ACCESS AND VB DEVELOPERS....
www.ssw.com.au
* SSW Data PRO - Version Control for your data.mdb
* SSW Data Renovator - Compare the differences between two data.mdb's
* SSW Upsize PRO! - Don't UPSIZE to SQL Server without it
--------------------------------------------------------


"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:9t%X4.1198$u35.2...@nnrp2.sbc.net...

Michael (michka) Kaplan

unread,
May 28, 2000, 3:00:00 AM5/28/00
to
In what way would replication make natural keys a bad idea? IMHO replication
makes the issues I raised elsewhere in this thread even more vital as its
even more important to maintain integrity of the DATA itself, rather than
just some articifical keys.

As for cascading updates and deletes, they are always a bad idea as it makes
it way to easy to do bad things. The prior bar to causing this kind of
damage has a LOT to do with the perceived stability of SQL Server, and many
people will incorrectly start thinking SQL server is less reliable as they
find they are messing up their data through such features.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Adam Cogan" <adam...@ssw.com.au> wrote in message
news:8gqjtv$6f5$1...@argon.syd.dav.net.au...

Tom Mitchell

unread,
May 28, 2000, 3:00:00 AM5/28/00
to
So that goes back to my point that maybe surrogate keys aren't such a bad
thing. You can never stop typos. Therefore typos will happen in natural
key fields. Somebody will eventually realize the typo is there and will
want to change it to the "correct value." So now you are stuck with:

1. cascade updates as part of Access/Jet (which you are obviously against)
2. cascade updates by custom coding (which strikes me as alot of work on a
part of the system that will be seldom used)
3. Relying on the adhoc abilities of the local DB administrator (gives me
the willies just thinking about it)
4. telling the client they have to pay you the developer $$ every time one
of there minimum wage data entry people make a typo (sign me up for those
clients) or
5. telling the client, no your people %*&&^-up, now live with it (not very
elegant).

Surrogate keys are ONE way to avoid the situation.

Again, I am not advocating surrogate keys as superior to natural keys. All
I'm saying is that they are not the evil thing that some (including me a few
months ago) thought they were.

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote

in message news:u3nT#hHy$GA.310@cpmsnbbsa07...

Michael (michka) Kaplan

unread,
May 28, 2000, 3:00:00 AM5/28/00
to
Actually, surrogate keys compound the problem by making it harder to find
the problem.

THAT is exactly what is wrong with people using surrogate keys.... they lose
the built-in integreity that natural keys give them in terms of making sure
the data is right at entry time.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Tom Mitchell" <rtm...@swbell.net> wrote in message

news:jVbY4.2652$tK3.3...@nnrp1.sbc.net...

David W. Fenton

unread,
May 30, 2000, 3:00:00 AM5/30/00
to
forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
Kaplan) wrote in <Obi5UASy$GA.188@cpmsnbbsa09>:

>Actually, surrogate keys compound the problem by making it harder
>to find the problem.
>
>THAT is exactly what is wrong with people using surrogate keys....
>they lose the built-in integreity that natural keys give them in
>terms of making sure the data is right at entry time.

There is only one kind of surrogate key that anyone should ever
use, and that's an AutoNumber. And users should never see or be
allowed to edit the key field.

I can't imagine a circumstance in which any other surrogate key is
ever justified.

BTW, I'm pretty much opposed to natural keys in general, since I've
run onto circumcstances where they could be used so seldom that I
just don't use them, on principle. The only case where I use them
would be in lookup tables, but I consider that completely trivial.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc

Adam Cogan

unread,
May 30, 2000, 3:00:00 AM5/30/00
to
> In what way would replication make natural keys a bad idea? IMHO
replication
> makes the issues I raised elsewhere in this thread even more vital as its
> even more important to maintain integrity of the DATA itself, rather than
> just some articifical keys.

Image Northwind is replicated.....
Enter Company 'Superior Software' and Client ID 'SUPER' and a few orders

On another replica....
Enter Company 'Super League' and Client ID 'SUPER' and a few orders

Synchronise.....

Conflicts... OK in Jet - Unresolvable in SQL Server

Adam
adam...@ssw.com.au
--------------------------------------------------------
Check out these HOT UTILITIES FOR ACCESS AND VB DEVELOPERS....
www.ssw.com.au
* SSW Data PRO - Version Control for your data.mdb
* SSW Data Renovator - Compare the differences between two data.mdb's
* SSW Upsize PRO! - Don't UPSIZE to SQL Server without it
--------------------------------------------------------

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:uNCZpdKy$GA.321@cpmsnbbsa07...

Michael (michka) Kaplan

unread,
May 30, 2000, 3:00:00 AM5/30/00
to
But again, if you do not do the work to maintain integrity, then surrogate
keys are not ok.

In other words you cannot sacrifice data integrity. :-)

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"David W. Fenton" <dXXXf...@bway.net> wrote in message
news:8F43D0352df...@news1.bway.net...


> forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
> Kaplan) wrote in <Obi5UASy$GA.188@cpmsnbbsa09>:
>

> >Actually, surrogate keys compound the problem by making it harder
> >to find the problem.
> >
> >THAT is exactly what is wrong with people using surrogate keys....
> >they lose the built-in integreity that natural keys give them in
> >terms of making sure the data is right at entry time.
>

Craig Alexander Morrison

unread,
Jun 1, 2000, 3:00:00 AM6/1/00
to
David

<<<<BTW, I'm pretty much opposed to natural keys in general, since I've run

onto circumstances where they could be used so seldom that I


just don't use them, on principle. The only case where I use them would be
in lookup tables, but I consider that completely trivial.>>>

How do you guarantee the uniqueness of your records?

As you know the AutoNumber sometimes referred to as a Surrogate Key is
pseudo-randomly (or unwisely incrementally) generated and does nothing more
than provide a reference to the record. Nothing the Surrogate does allows it
to ensure that records in that table are unique.

As far as the AutoNumber goes it should probably not be seen by any user of
the database including the developer. Indeed would it not be preferable that
the Natural Key (should it exist) be defined by the user/developer as the
Primary Key and then Jet automatically generate the internal number (using
it's own hashing algorithm) to be used to enforce the relationships. The
external view being of the Primary Key and the Foreign Keys.

Practically as Access/Jet does not support this the AutoNumber is (sadly)
exposed and the designer has to use this as the Primary Key and part or all
of the Foreign Key. The designer should also take care to ensure that at
least one of the candidate keys is defined as a No Nulls, Required, Unique
Index.

Failure to do this makes a mockery of the Relational Model (as does SQL with
its support of duplicate records).

Slainte

Craig Alexander Morrison, CData SystemsHouse

BTW If you do not have a natural key you are sometimes required to create a
field to be used as the (or part of the) Primary Key, this is not a
surrogate, this is a generated key.


Michael (michka) Kaplan

unread,
Jun 1, 2000, 3:00:00 AM6/1/00
to
Exactly! I knew someone would know what I was talking about!

Using the autonumber instead of data integrity is about as good as that
guillotine cure for dandruff that Marie Antionette used....

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Alexander Morrison" <CraigAlexan...@NoSpamNoMail.com> wrote
in message news:8h4neh$jt5$1...@supernews.com...

Adam Cogan

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to
Using Northwind as an example - what WOULD you use as the primary key in the
following tables:
* Customers
* Orders
* Products
* Products Category

Adam
adam...@ssw.com.au
--------------------------------------------------------
Check out these HOT UTILITIES FOR ACCESS AND VB DEVELOPERS....
www.ssw.com.au
* SSW Data PRO - Version Control for your data.mdb
* SSW Data Renovator - Compare the differences between two data.mdb's
* SSW Upsize PRO! - Don't UPSIZE to SQL Server without it
--------------------------------------------------------

David W. Fenton

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to
forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
Kaplan) wrote in <eD$918Bz$GA.321@cpmsnbbsa07>:

>Exactly! I knew someone would know what I was talking about!
>
>Using the autonumber instead of data integrity is about as good as
>that guillotine cure for dandruff that Marie Antionette used....

When there is no candidate natural key (and there is not in the
vast majority of applications I've encountered; but we had this
discussion a long, long time ago), a surrogate key is required.

An AutoNumber is a perfectly good surrogate key, and one that the
user should never see (as with every surrogate key).

It was clear to me the last time we had this discussion that people
with perspectives different from mine see things differently.
Particularly, it seemed that those writing financial applications
had a much greater number of viable natural keys than in the kinds
of applications that I've been involved with, which mostly store
information about people and their activities.

When there is a natural key that is not subject to change, I'll use
it. That happens so seldom that I spend little time looking for
them during schema design.

Craig Alexander Morrison

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to
David

How do you guarantee the uniqueness of your records?

Do you define a No Nulls, Required, Unique Index to do this or do you allow
duplicate records all for the AutoNumber.

I have seen you write about normalisation in other conversations yet you
seem to fail to observe step 1. In first normal form you are, in addition to
other things, expected to ensure that no two rows in a table are identical.
By creating a meaningless value such as the AutoNumber (used as a Surrogate
Key) you have not satisfied the above condition.

Also what is this about the PK changing, it can, it may, you should choose
(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change.

I ask again: How do you guarantee the uniqueness of your records?

How do your users choose between two identical records?

As you have said the users do not (should not) see the Surrogate Key
(AutoNumber) so all they have available to them is the remaining data which
could exist, duplicated, in one, two or thousands of records.

I believe that you are using a Surrogate Key in place of what you should be
using which is a generated field that can be used in conjunction with the
other fields in the record to identify uniqueness. This does not preclude
you using a Surrogate Key such as the Jet AutoNumber so long as you define
the No Nulls, Required, Unique Index on the field(s) that are included in
the candidate key(s).

Until you can identify each record as a unique record using the REAL (even a
generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form.

Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?

Keri Hardwick

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to
What you "know" so far is simply false in two of your 4 areas.

Database passwords are not security, they are a toy. Proper mdb security
does not have passwords in an easily cracked area.

There is certainly a way to enforce uniqueness of data; it's called a unique
index.

Of course Access does not have the same functionality as Oracle. It is not
intended for the same market, and its cost is several orders of magnitude
lower to reflect this fact.

Keri
"Brad Allan" <brad...@home.com> wrote in message
news:G1HZ4.5456$F9.1...@news1.gvcl1.bc.home.com...
> Craig (or anyone else relatively familiar with Access' engine/inner
> workings),
>
> I hope you read this and are able/willing to give me a hand with at least
> some of this. Recently, I've gone back to university and am curently
taking
> a databases course. A question on our first assignment asks us to beifly
> determine Access' compliance with the primary functions of a reasonable
> batabase management system. From this thread (and my own experience), I
can
> see that there is a substantial divergence from what we would hope to find
> in terms of data integrity. Staying with the aspect of controlled access,
> could you explain some of Access' shortcomings in the following areas? I
> will include what I know already.
>
> Security
> - the way I understand it, the passwords are stored in a relatively easy
to
> access area using a known (and therefore cracked) encryption routine
>
> Integrity
> - there is no way to enforce true uniqueness of data
> - the de facto database manipulation language, SQL - which Access uses -
> does allows duplicate records
>
> Concurrency
> - I believe, from my experiences, that shared access is not too bad
>
> Recovery
> - well, I've used the recovery tools in Oracle, and quite frankly, Access
is
> a LONG way from that (whether that constitutes a failure or shortcoming is
a
> matter of opinion, I suppose :) )
>
> Thanks to anyone who is able to contribute to this.
>
> Brad Allan


>
>
> "Craig Alexander Morrison" <CraigAlexan...@NoSpamNoMail.com> wrote
> in message news:8h4neh$jt5$1...@supernews.com...
> > David
> >
> > <<<<BTW, I'm pretty much opposed to natural keys in general, since I've
> run
> > onto circumstances where they could be used so seldom that I
> > just don't use them, on principle. The only case where I use them would
be
> > in lookup tables, but I consider that completely trivial.>>>
> >

> > How do you guarantee the uniqueness of your records?
> >

> > As you know the AutoNumber sometimes referred to as a Surrogate Key is
> > pseudo-randomly (or unwisely incrementally) generated and does nothing
> more
> > than provide a reference to the record. Nothing the Surrogate does
allows
> it
> > to ensure that records in that table are unique.
> >
> > As far as the AutoNumber goes it should probably not be seen by any user
> of
> > the database including the developer. Indeed would it not be preferable
> that
> > the Natural Key (should it exist) be defined by the user/developer as
the
> > Primary Key and then Jet automatically generate the internal number
(using
> > it's own hashing algorithm) to be used to enforce the relationships. The
> > external view being of the Primary Key and the Foreign Keys.
> >
> > Practically as Access/Jet does not support this the AutoNumber is
(sadly)
> > exposed and the designer has to use this as the Primary Key and part or
> all
> > of the Foreign Key. The designer should also take care to ensure that at

> > least one of the candidate keys is defined as a No Nulls, Required,
Unique


> > Index.
> >
> > Failure to do this makes a mockery of the Relational Model (as does SQL
> with
> > its support of duplicate records).
> >

> > Slainte
> >
> > Craig Alexander Morrison, CData SystemsHouse
> >

Michael (michka) Kaplan

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to
Sheesh Brad, this is not such a smart list, I hope your grade does not
depend on it!

Security -- only true about the database password or poorly implemented
security schemes. But you can say the latter about ANY system.

Integrity -- You are simply dead wrong.

Concurrency -- Works fine for me, not sure what you want to know about it.

Recovery -- Its a DESKTOP database. Comparing it to a server db is hardly a
fair comparison. Do you pit your 6year old against a professional boxer?

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Brad Allan" <brad...@home.com> wrote in message

Arvin Meyer

unread,
Jun 2, 2000, 3:00:00 AM6/2/00
to

Michael (michka) Kaplan wrote in message ...

>Do you pit your 6year old against a professional boxer?

Really more like a bar room brawler against a professional boxer, I'd say.

Most of the time, the professional will win, but there is always that 1 time
<g>
---
Arvin Meyer

Michael (michka) Kaplan

unread,
Jun 3, 2000, 3:00:00 AM6/3/00
to
You would need to redesign several tables to start. Northwind is a really
disgusting example of database schema.

I would never take a contract that required me to be responsible for bad
design.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"David W. Fenton" <dXXXf...@bway.net> wrote in message
news:8F48D417Adf...@news1.bway.net...
> CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
> wrote in <8h778k$ojp$1...@supernews.com>:


>
> >"Adam Cogan" <adam...@ssw.com.au> wrote in message

> >news:8h6u1o$gd8$1...@argon.syd.dav.net.au...


> >
> ><<< Using Northwind as an example - what WOULD you use as the
> ><<< primary key in
> >the following tables:
> ><<< * Customers * Orders * Products * Products Category
> >

> >I had a look at Northwind's database design (for the first time)
> >in Access 97 and there you have it complete and utter... Of course
> >I do not believe that Northwind is anything more than a sample to
> >show off programming techniques, it was not intended to show how
> >to design relational databases (at least I hope not).
>
> He didn't ask you to critique the Northwinds schema, which we all
> already knew was a disaster.
>
> He asked what the candidate keys are for that data structure.
>
> What primary key would *you* use for Customers? For Orders? For
> Products? For Category?
>
> Answer the question, and then I'll explain to you why you will need
> a surrogate key in several of those tables.

David W. Fenton

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

David W. Fenton

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8h7cfn$khk$1...@supernews.com>:

>How do you guarantee the uniqueness of your records?

Er, as a primary key:

>Do you define a No Nulls, Required, Unique Index to do this or do
>you allow duplicate records all for the AutoNumber.

Unique index, no nulls. D'oh. I set the Autonumber type, and then
click the wee little primary key button on the toolbar.

I guess I could do it in DAO or something, and claim to be doing
something very complicated and geeky and extra robust, but it would
seem like flim-flammery to me.

>I have seen you write about normalisation in other conversations
>yet you seem to fail to observe step 1. In first normal form you
>are, in addition to other things, expected to ensure that no two
>rows in a table are identical. By creating a meaningless value
>such as the AutoNumber (used as a Surrogate Key) you have not
>satisfied the above condition.

I said that when there are no candidate natural keys (that is, that
which defines the unique record cannot be guaranteed to have no
nulls), you have to use a surrogate key, and then, yes, of course,
you have to insure uniqueness through programmatic logic.

>Also what is this about the PK changing, it can, it may, you
>should choose (if you have a choice) the candidate key whose
>field(s) are the least volatile. I do not know where this idea
>that the value of the PK is static (set in stone) came from. It
>should not be volatile, however everything changes or can change.

I've never written an application with alterable PKs that was
stable or easy to maintain. Now that more and more of my apps are
replicated, I feel even more strongly about this, since propagating
PK changes cannot in those circumstances be done via cascading
updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context.

>I ask again: How do you guarantee the uniqueness of your records?

When there is no viable natural key, programmatically.

>How do your users choose between two identical records?

I build routines to help users avoid duplicates at record creation
and administrative tools to de-dup the data if users have made
incorrect choices in data entry.

>As you have said the users do not (should not) see the Surrogate
>Key (AutoNumber) so all they have available to them is the
>remaining data which could exist, duplicated, in one, two or
>thousands of records.
>
>I believe that you are using a Surrogate Key in place of what you
>should be using which is a generated field that can be used in
>conjunction with the other fields in the record to identify

>uniqueness. . . .

This is becoming a rehash of a thread we had a few months ago. I
described a circumstance in which I was storing companies, and only
the company name could be set to No Nulls, but there were multiple
locations for each company, and more than one possible in each city
(though not at the same address). But the only thing the actual
users *always* had was the name of the company, and, most often,
the location. This meant that Nulls had to be allowed in City,
State, Country and Address (which is the only compound key that
could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of
duplicates, and administrative tools to clean up any duplicates
that were created.

The same thing applies to people. When the only information you you
are guranteed to have for a person is their last name, you cannot
use a natural key.

Since nearly every application I have ever written stores data on
people and companies, and has exactly the same operating
requirements, I have never used a natural key for either of these
types of data.

Do you honestly believe that I am missing out on a candidate
natural key for these two entities, given the described operating
requirements?

If you do, I'm all ears. I've love to be able to eliminate the
programmatic logic and de-duping tools.

> . . . This does not preclude


>you using a Surrogate Key such as the Jet AutoNumber so long as
>you define the No Nulls, Required, Unique Index on the field(s)
>that are included in the candidate key(s).

Why you assumed that I would not do that, I can't say. We *were*
talking about primary keys, so that was an implicit part of the
definition.

>Until you can identify each record as a unique record using the
>REAL (even a generated field) fields you cannot have a normalised
>relational database design, you are not even in first normal form.

And real-life applications very seldom allow full normalization.
This does not disturb me, since my job is to provide application
solutions to my clients, not normalizaed data schemas.

And, BTW, if you have a generated field for your PK, you're not in
1NF.

I have *never* used a generated PK field. That's an abomination, in
my opinion, since it violates the first principle of normalization,
dependencies between fields within a record.

>Ultimately you can do whatever you want, I just do not think it is
>fair to call it Relational Database Design, do you?

Call it whatever you like. Or not. I'm talking about real life
applications. You build a data model that is as normalized as is
possible giving the operating conditions. Surrogate keys are very
often required for certain kinds of data. It matters not one iota
to me that you would not call these data structures a "Relational
Database Design" since it seems to me that you've defined the term
so narrowly as to preclude its use in 99% of real-world operating
conditions.

Of course, YMMV.

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

>>> Answer the question, and then I'll explain to you why you will need a
surrogate key in several of those tables.<<<

Oh I did. You don't "need" Surrogates, but I can show you that you can use
Surrogates in all the tables if you want, that is not my point.

I have no problem with use of Surrogates, David, I do have a problem with
the absence of uniqueness in the records in that database, note the ability
to copy and paste records where the only unique index is on the AutoNumber
PK (Surrogate).

Explain to me how you guarantee uniqueness and I'll explain to you why you
need a generated key to be used in conjunction with other fields to create a
composite key that can be set as a No Nulls, Required, Unique Index. No need
to wait for your explanation see my other message.

They contain AutoNumbers as PKs without a No Nulls, Required, Unique Index.
(The only table with such an index is a lookup.)

See my other message.

If you give someone a fish they can feed their family today, if you give
that person a
fishing rod they can feed them for years.

Discussing a particular database design does not deal with the principles of
design, I did however take some examples from that design to illustrate my
points. Database design by e-mail is dangerous.To design Northwind properly
you need to talk to the users or have a clearly defined problem domain that
clearly sets out the requirement and objectives of the system.

Choice of the primary keys depends on the answers to our questions about the
data made to the users. If I make a choice on the Northwind design anyone
could raise a point in objection and these points may or may not be valid
who could tell. In the REAL world you have REAL users and they would help
you to decide upon the correct candidate key(s) one of which could be made
the PK or you could even use a Surrogate (it must be clear to you now that I
am concerned about uniqueness of the records within the database, not
Surrogates).

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow
duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?
Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing
something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.
Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that
which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural
key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility
is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When
there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate
between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build
routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and
if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only
the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in
conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people
and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if
you did find a natural key and you did not make it the PK you would have
defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.
If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the
programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the
candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even
a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very
possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated
field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.


"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out
with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for
implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data
structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly
done meets the real world requirements of most clients. I build REAL
systems, please do not try to justify your approach on that basis.

>>> Of course, YMMV. <<<

I have no idea what that means, enlighten me, unless it is derogatory (of
course)

SUMMARY

Surrogates can and often should be used in Access as the Primary Key (even
using the AutoNumber). AGREED?

If you do use a Surrogate as the PK you must define a No Nulls, Required,
Unique Index on the candidate key(s). AGREED?

If there is no natural key, create a generated field to be used and exposed
to the user to differentiate between records. This composite (a key
consisting of more than one field and including a generated field) candidate
key should be a No Nulls, Required, Unique Index. AGREED? THIS IS WHERE WE
DIFFER I THINK

The ultimate, yet achievable, goal is to ensure and enforce unique records
at the database engine level. AGREED?

Slainte

Craig Alexander Morrison, CData SystemsHouse


Note: I refer to composite keys and compound keys by which I mean:

Compound Key - A key made up of 2 or more fields of REAL fields in the
table.
Composite Key - A key made up of 1 or more fields of REAL fields plus a
generated field

Further Note: A generated field is meaningless (like a surrogate) and it is
(unlike a surrogate) exposed to the user to allow them to differentiate
between records. It is most definitely NOT an "Intelligent Number" which are
not very clever at all.

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still
use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a
Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

See my other message.

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
David

"How do you guarantee the uniqueness of your records?" >>> Er, as a primary

key: <<< An AutoNumber won't do that!

"Do you define a No Nulls, Required, Unique Index to do this or do you allow


duplicate records all for the AutoNumber."

>>> Unique index, no nulls. D'oh. I set the Autonumber type, and then click
the wee little primary key button on the toolbar. <<<

That confirms my point clearly, do you understand my point about uniqueness?


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

>>> I guess I could do it in DAO or something, and claim to be doing


something very complicated and geeky and extra robust, but it would seem
like flim-flammery to me. <<<

You can create a No Nulls, Required, Unique Index on the UI. Nothing fancy.


Why would one need to resort to code?

>>> I said that when there are no candidate natural keys (that is, that


which defines the unique record cannot be guaranteed to have no nulls), you
have to use a surrogate key, and then, yes, of course, you have to insure
uniqueness through programmatic logic.

Wrong! Surrogates on their own should not be used when there is no natural


key, they can be used when there is a natural key and that candidate key is
defined as a No Nulls, Required, Unique Index. Databases/Database Managers
are used to ensure uniqueness not applications.

When there is no natural key you need to create a generated field to
differentiate one record from another and this field needs to be available
to the users. This composite candidate key need not be the PK you can still

use a Surrogate as long as you define the candidate key as an No Nulls,
Required, Unique Index.

"Also what is this about the PK changing, it can, it may, you should choose


(if you have a choice) the candidate key whose field(s) are the least
volatile. I do not know where this idea that the value of the PK is static
(set in stone) came from. It should not be volatile, however everything
changes or can change."

>>> I've never written an application with alterable PKs that was stable or
easy to maintain. Now that more and more of my apps are replicated, I feel
even more strongly about this, since propagating PK changes cannot in those
circumstances be done via cascading
> updates (i.e., at the engine level) because the results can be
unpredictable in a replicated context. <<<

You could of course use Surrogates if the PKs are volatile (this volatility


is subjective, you decide). However we agree on this, don't we?

"I ask again: How do you guarantee the uniqueness of your records?" >>> When


there is no viable natural key, programmatically. <<<

How? I mean if the records are duplicates and you cannot differentiate


between them for the database how can you differentiate between them when
dealing with them programmatically. I am sure that you could reduce the
amount of programming required in your applications should you accept and
understand the point I am making.

"How do your users choose between two identical records?" >>> I build


routines to help users avoid duplicates at record creation and
administrative tools to de-dup the data if users have made incorrect choices
in data entry. <<<

These routines would surely make reference to the fields in the record, and


if so why not enforce this at database level not leave it to the whim of the
application. This approach requires that every operation performed by the
application needs to include routines to handle the fact that the database
is not doing its job. And as above how do you deal with duplicate records if
even the database doesn't know that they are.

>>>This is becoming a rehash of a thread we had a few months ago.<<<

I was not present at that time, so perhaps this time we can come to the
right conclusion.

>>> I described a circumstance in which I was storing companies, and only


the company name could be set to No Nulls, but there were multiple locations
for each company, and more than one possible in each city (though not at the
same address). But the only thing the actual users *always* had was the name
of the company, and, most often, the location. This meant that Nulls had to
be allowed in City, State, Country and Address (which is the only compound
key that could insure uniqueness). Therefore, a surrogate key was required,
and program-level duplicate checking to minimize the creation of duplicates,
and administrative tools to clean up any duplicates that were created. <<<

What's wrong with CompanyName and a generated field to be used in


conjunction with the name to be the PK or the No Nulls, Required, Unique
Index. The users could then see the Name and the Generated Field when
choosing the Company. This does not preclude you using an AutoNumber
(Surrogate) as the PK.

Thus you use a composite key as opposed to a compound key, the argument
given above proves that the compound candidate key suggested is not
adequate, indeed many systems would require more than one address and this
information would not exist in the Company table anyway. This information
would (more likely) reside in a CompanyAddress table.

>>> The same thing applies to people. When the only information you are


guranteed to have for a person is their last name, you cannot use a natural
key. <<<

Again use a generated field.

>>> Since nearly every application I have ever written stores data on people


and companies, and has exactly the same operating requirements, I have never
used a natural key for either of these types of data. <<<

I take it you meant to say you have never found a natural key, because if


you did find a natural key and you did not make it the PK you would have

defined it as a No Nulls, Required, Unique Index, wouldn't you?

>>> Do you honestly believe that I am missing out on a candidate natural key
for these two entities, given the described operating requirements? <<<

No. Whether or not you have a natural key does not change what I am saying.


If you wish to use Surrogates you can, however when using them on a table
that does not have a natural key you need a generated field for the users.
They cannot (should not) see the Surrogate, they should see the data
including the generated field to allow them to differentiate between two
records that without the generated field would be duplicates. Using a
Surrogate as the PK does not do anything to protect against duplicate
records as you have agreed, you even have to write code to try to control
and eliminate these duplicates, we never had to do this.

>>> If you do, I'm all ears. I've love to be able to eliminate the


programmatic logic and de-duping tools. <<<

I hope you will consider the above paragraph


" . . . This does not preclude you using a Surrogate Key such as the Jet
AutoNumber so long as you define the No Nulls, Required, Unique Index on the
field(s) that are included in the candidate key(s)."

>>> Why you assumed that I would not do that, I can't say. We *were* talking
about primary keys, so that was an implicit part of the definition. <<<

No, I was asking you. A Primary Key is just the chosen one from the


candidate key(s). No Nulls, Required, Unique Index should be set on all
candidate keys not chosen as the Primary Key. This applies even more so if
you employ a Surrogate for use as the Primary Key.

"Until you can identify each record as a unique record using the REAL (even


a generated field) fields you cannot have a normalised relational database
design, you are not even in first normal form."

>>> And real-life applications very seldom allow full normalization. This
does not disturb me, since my job is to provide application solutions to my
clients, not normalizaed data schemas.<<<

Please, I build REAL systems and we respect the clients data. It is very


possible to design high performance professional reliable systems by
adhering to the Relational Model, we do it every day. Of course it depends
what you load into your comment about "full normalisation" I too object to
what I call "Anal Normalisation" after all we are all building real systems
for use by real people in real companies.

>>> And, BTW, if you have a generated field for your PK, you're not in 1NF.
<<<

Wrong! Product Code, Account Number, Registration Number

Anyway if you cannot identify a natural key the generated field is usually a
part of the PK not the PK itself, and you can of course still use a

Surrogate such as the AutoNumber and then define a, you guessed it, No
Nulls, Required, Unique Index.

>>> I have *never* used a generated PK field. That's an abomination, in my


opinion, since it violates the first principle of normalization,
dependencies between fields within a record.<<<

I beg your pardon, what dependency are you talking about? The generated


field is not an "intelligent number" it is a reference much like the
Surrogate, except that it needs to be visible and normally forms part of the
unique index. Unlike the Surrogate it needs to be visible to permit the user
to distinguish between records that are very similar.

If I was talking about an "Intelligent Number" you are dead right that is an
abomination and creates a functional dependency that is totally
unacceptable.

A simple routine can be added to prompt the user when the generated field is
incremented above "0001". If the generated field is a simple code such as
"0001" you can increment it within say the Company table within the same
CompanyName. So whenever the code generated exceeds "0001" you can prompt
the user that they may be creating a duplicate, allowing them to view the
other record(s) and confirm or cancel the insert. For "0001" you could read
"AAAB" or "X001" or whatever you fancy. I would most strongly advise no-one
to use parts of the other fields to create this field.

Using this generated key you can alert the user when selecting a Company
that others exist in the database that have the same name.

"Ultimately you can do whatever you want, I just do not think it is fair to
call it Relational Database Design, do you?"

>>> Call it whatever you like. Or not. I'm talking about real life
applications. <<<

Oh so am I. Where there are no duplicate entities in real life.


>>> You build a data model that is as normalized as is possible giving the
operating conditions. <<<

I could not agree with you more here. Normalisation should be carried out


with a clear understanding of the problem domain. Anal Normalisation ignores
what the real business area requires and tries to model the real world, you
should always model the business information in its context. In fact we have
already covered this earlier.

>>> Surrogate keys are very often required for certain kinds of data. <<<

Again I agree and have already said as much. It is often preferable for


implementing RI in many products and when you have a complex candidate key.
If all the candidate keys are volatile it is highly desirable not to use
them as the PK

>>> It matters not one iota to me that you would not call these data


structures a "Relational Database Design" since it seems to me that you've
defined the term so narrowly as to preclude its use in 99% of real-world
operating conditions.<<<

As I said you can do whatever you want, Relational Database Design, properly


done meets the real world requirements of most clients. I build REAL
systems, please do not try to justify your approach on that basis.

>>> Of course, YMMV. <<<

I have no idea what that means, enlighten me, unless it is derogatory (of
course)

SUMMARY

Surrogates can and often should be used in Access as the Primary Key (even
using the AutoNumber). AGREED?

If you do use a Surrogate as the PK you must define a No Nulls, Required,
Unique Index on the candidate key(s). AGREED?

If there is no natural key, create a generated field to be used and exposed
to the user to differentiate between records. This composite (a key
consisting of more than one field and including a generated field) candidate
key should be a No Nulls, Required, Unique Index. AGREED? THIS IS WHERE WE
DIFFER I THINK

The ultimate, yet achievable, goal is to ensure and enforce unique records
at the database engine level. AGREED?

Slainte

Craig Alexander Morrison, CData SystemsHouse

Arvin Meyer

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
This is as good a place as any to jump in, so here I go. BTW, Craig, fix
your newsreader 7 to 8 14K message duplicates is NOT necessary.

If you are familiar with the Bob Newhart show of years ago:

"Hi, I'm Larry. This is my brother Darryl, and this is my other brother
Darryl"

Duplicates exist when you attempt to reduce the real world to
representational mathematics. Of course they are usually not duplicates, but
the database may see them that way. Any attempt to enforce uniqueness at the
simplified level requires over-complexity for a very few cases, or
frustrating the users, who will then munge the hell out of the data just to
"get the job done".

In a highly normalized database, the person record in the people table may
not have many attributes, so the uniqueness may not exist for a natural
key. I.e: There are undoubtedly quite a few John Smith's, there may be in a
large company several in the same department, and indeed, several at the
same apartment complex.

Then what happens, for instance, in an association database, if one of the
John Smith's moves to another company and applies for a new membership
without informing, the association that he is, indeed, the same John Smith
who used to work for CompanyX? What do you think the over-worked $8/hr. data
entry person at the association is going to do with that if you enforce
uniqueness? And do we really need to spend $250, or more, in programming
time to deal with the few occurrences of that? My experience is that 100% of
the time, that data entry person will defeat the perfect relational model.
They will do it by either munging the data or simply by not entering it.
What they will never do, is take the time to hunt down the reason for the
apparent duplicity when they are busy. If you inform, but not enforce, they
*may* (if they are conscientious) take the time later to deal with it.

Your argument is for a "theoretical" relational model. But David (and I),
would argue that the real world doesn't support that. What we are talking
about is a "functional" relational model. The "theoretical" model is the one
we aspire to, but for cost/benefit reasons the "functional" model is what we
produce. How closely the "functional" model parallels the "theoretical"
model is a factor of our skills and the client's budget.

I enforce uniqueness where it is absolutely required (like in lookup
tables), but allow duplicates in other areas after informing the user that
the possibility exists ("Do you realize that you have another John Smith
working here?") Certain tables, like tblAddresses, have only an AutoNumber
to enforce the uniqueness as the PK, because the FK may certainly exist as a
duplicate. Too many indices on the data itself only slows bulk inserts, and
enforcing uniqueness at the data level means that you cannot allow nulls in
places where the schema may require it ("Oh my, what do I do now? He didn't
fill in the City, State, and ZipCode! I guess I'll just have to put one in,
or just forget it!") Often, by looking at the wear pattern on the user's
{Esc} key, you can easily tell how much enforcement is present.

So, I agree with your first premise. I would disagree with the second
statement in your summary. For the reasons above, I'd disagree with your 3rd
premise. And while I'd agree that the 4th statement is the goal, I'd
disagree that it is 100% achievable. "My other brother, Darryl" will
certainly exist as a duplicate in my databases, although there *may* be a
note to that effect (memos are non-indexable, and I sure as hell won't be
adding a field to my databases to allow for that one case in several hundred
thousand).
---
Arvin Meyer

Craig Alexander Morrison wrote in message <8hdaja$l71$1...@supernews.com>...

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
Sorry everyone, my ISP has had problems the connection appeared to hang when
it had in fact sent the messages

Arvin, will get back to you when I have read your message, just wanted to
apologise for the DUPLICATES, it was not a comment on the existence of
duplicates honestly. (g)

"Arvin Meyer" <a...@m.com> wrote in message
news:8hdsd6$6j1$1...@slb7.atl.mindspring.net...


> This is as good a place as any to jump in, so here I go. BTW, Craig, fix
> your newsreader 7 to 8 14K message duplicates is NOT necessary.

Slainte

Craig Alexander Morrison

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
Arvin

As you may suspect I disagree with you. (g)

All I have said in essence is that you should not rely on the Surrogate when
you have no natural key.

The generated field of which I speak can be automatically generated too and
it in conjunction with other real fields can be assigned as a No Nulls,
Required, Unique Index, if you wish to continue to use the surrogate.

At data entry the user can be prompted that this may be a duplicate and then
create it/save it with the real fields together with the generated field.

>>> Your argument is for a "theoretical" relational model. But David (and
I), would argue that the real world doesn't support that. <<<

Absolutely NOT. We have REAL databases, working. Why do we always get this
"theoretical" label as an argument, I consider that intellectually
indefensible. The relational model is practical and we use it for real world
solutions. Relational Theory is not "theoretical" it is proven.

Anyway this is incorrect terminology "theoretical and functional". You
define a logical design and then the physical design, a product like Access
supports the Relational Model and permits a fairly clean almost compromise
free translation from the logical to the physical. Unique records are the
foundation of all good relational database designs, in fact it is NOT
relational at all if duplicates are supported. This is the main problem with
SQL. Simply put SQL is not relational and neither are databases that do not
enforce unique records.

>>> I enforce uniqueness where it is absolutely required (like in lookup
tables), but allow duplicates in other areas after informing the user that
the possibility exists ("Do you realize that you have another John Smith
working here?") <<<

That is also what a generated field would allow you to do. The generated
field has the advantage of being visible to the user allowing them to know
which John Smith they are doing things to, even is that is a simple
reference number. It means that they know which of the suspected duplicates
they are dealing with. The surrogate is hidden and should not be used for
this purpose.

I find it hard to believe that you do not agree with "If you do use a


Surrogate as the PK you must define a No Nulls, Required, Unique Index on

the candidate key(s)." Why if you have at least one acceptable candidate key
would you not define it so. I really find this quite incredible. This
statement clearly states that when you have acceptable candidate keys you
should define them and is separate to when you do not have a natural key.

>>> And while I'd agree that the 4th statement is the goal, I'd disagree
that it is 100% achievable. <<<

You can't be a little bit pregnant...

Are you really saying that you would not enforce data integrity at database
level. Any application not aware of the design flaws in the database could
hose the database's data integrity.

>>> (memos are non-indexable, and I sure as hell won't be adding a field to
my databases to allow for that one case in several hundred thousand).<<<

Well this is a whole new topic, Elimination of Nulls and the use of
Supertype/Subtype, If you have a field that only applies to a subset of the
entities then it would be best located in a table with a 1 to 1 relationship
where one side contains the main data and the other side(s) contain data
specific to that particular type.

Actually you already do add a field if you use a surrogate as primary key
instead of the natural key.

I have to say I am very concerned that you and others do not see the
importance of unique records. And seem to think that Data Integrity can be
adequately controlled by applications.

Anyway just to reiterate Relational Theory is proven it is not "theoretical"
as you seem to think it is. Check the various definitions of Theory you
will note that it has several meanings some of which are the antithesis of
others. Relational Theory is subject to proofs; we make real "PRACTICAL"
systems based on this "THEORY" and our clients keep coming back for more.

So as I said earlier in this thread you can do what you like, but please
don't think it is Relational Database Design.

You can argue that it works for you but please, it is NOT Relational
Database Design, it fails first normal form.

I see no point in continuing this discussion as we are obviously talking
about different things. I am a relational database designer building real
systems for real companies in the real world. You and David would obviously
not be comfortable being described as relational database designers,
database designers would be more fitting, no doubt building
real systems for real companies in the real world too.

If it works for you fine, it sure works fine for me.

I can't see that we could ever agree on this, you are certainly not going to
convince me and it seems I am not going to convince you.

Michael (michka) Kaplan

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
Yes it is. I was being asked how I would handle Northwind. And I answered
it.

If you ask me a question, I answer it the way I would for a client.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"David W. Fenton" <dXXXf...@bway.net> wrote in message

news:8F49F3688df...@news1.bway.net...
> forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
> Kaplan) wrote in <OljTS8bz$GA.331@cpmsnbbsa09>:


>
> >You would need to redesign several tables to start. Northwind is a
> >really disgusting example of database schema.
> >
> >I would never take a contract that required me to be responsible
> >for bad design.
>

> That's not the point of the inquiry, though.

Michael (michka) Kaplan

unread,
Jun 4, 2000, 3:00:00 AM6/4/00
to
Craig is saying the same thing I was: that very often, surrogate keys are
used in a way that basically allows you to circumvent database integrity.

Everyone needs to calm down a little bit!

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"David W. Fenton" <dXXXf...@bway.net> wrote in message

news:8F49F77A1df...@news1.bway.net...
> CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
> wrote in <8hdb36$gjd$1...@supernews.com>:


>
> >I do have a problem with
> >the absence of uniqueness in the records in that database, note
> >the ability to copy and paste records where the only unique index
> >is on the AutoNumber PK (Surrogate).
>

> You have assumed something not in evidence, that the records are
> not unique. The AutoNumber makes them unique, even if the records
> are otherwise identical.
>
> Need I explain in more detail?

Brendan Reynolds

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
I'm finding the discussion extremely interesting, and I'm glad you posted
your views, regardless of whether we ever achieve a consensus or not.

I'm trying to relate all this to a concrete example. I worked on an app
recently that, among other things, records sales enquires. Sometimes, the
people making the enquiries will not give a full name. They may only give a
first name, or they may only give a last name.

What I did in this particular application is to make the last name required.
If the caller won't give a last name, the users enter 'Unknown' as the last
name. I don't like it much, but given that some callers will only give a
first name, and some callers will only give a last name, I can't see any
alternative. The app assigns a number to each contact, and I have a no
nulls, required, unique index on the combination of last name and number.

At the end of the day, though, the result is two records, one with last name
'Unknown' and number '001', one with last name 'Unknown' and number '002'.
There are no duplicate records in the database, but what about the real word
entities that these records represent? Are 'Unknown 001' and 'Unknown 002'
the same person? No one knows.

I'm not at all sure that I shouldn't have just discarded these records, on
the basis that there isn't enough information available to justify recording
them at all.

--
Brendan Reynolds
bren...@indigo.ie

Craig Alexander Morrison <CraigAlexan...@NoSpamNoMail.com> wrote in

message news:8he7nm$2i2$1...@supernews.com...

> I find it hard to believe that you do not agree with "If you do use a


> Surrogate as the PK you must define a No Nulls, Required, Unique Index on

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8hdbb6$go4$1...@supernews.com>:

>"How do you guarantee the uniqueness of your records?" >>> Er, as
>a primary key: <<< An AutoNumber won't do that!
>
>"Do you define a No Nulls, Required, Unique Index to do this or do
>you allow duplicate records all for the AutoNumber."
>
>>>> Unique index, no nulls. D'oh. I set the Autonumber type, and
>>>> then click
>the wee little primary key button on the toolbar. <<<
>
>That confirms my point clearly, do you understand my point about
>uniqueness? Just creating a random number and sticking it on the
>record does nothing for the uniqueness of the REAL data.

You've just described the flaw in all uses of surrogate keys.
Congrats.

>>>> I guess I could do it in DAO or something, and claim to be
>>>> doing
>something very complicated and geeky and extra robust, but it
>would seem like flim-flammery to me. <<<
>
>You can create a No Nulls, Required, Unique Index on the UI.
>Nothing fancy. Why would one need to resort to code?

Well, you seem to think that I wasn't doing that, so I figured it
must be that doing it through the UI must somehow be a less
"respectable" way to do it.

I said "primary key" -- that means, by definition, that it has the
characteristics you described. I didn't think it was necessary to
indicate explicitly that I knew what the definition of a primary
key is. Otherwise, it wouldn't be a primary key.

>>>> I said that when there are no candidate natural keys (that is,
>>>> that
>which defines the unique record cannot be guaranteed to have no
>nulls), you have to use a surrogate key, and then, yes, of course,
>you have to insure uniqueness through programmatic logic.
>
>Wrong! Surrogates on their own should not be used when there is no
>natural key, they can be used when there is a natural key and that
>candidate key is defined as a No Nulls, Required, Unique Index.
>Databases/Database Managers are used to ensure uniqueness not
>applications.

You seem to be switching back and forth between your points of
reference. When I said "you have to insure uniqueness through
programmatic logic" I was quite clearly referring to the uniqueness
of the record, not the uniqueness of the primary key value, which
is (obviously) insured by the index, and enforced by the database
engine.

You removed the context of my remark, and thus lost the meaning.

>When there is no natural key you need to create a generated field
>to differentiate one record from another and this field needs to
>be available to the users. This composite candidate key need not
>be the PK you can still use a Surrogate as long as you define the
>candidate key as an No Nulls, Required, Unique Index.

An AutoNumber is simply a form of generated field. I don't see
anything wrong with it, and it has the distinct virtue in Access of
being the only possible way of generating a unique value at the
engine level.

>"Also what is this about the PK changing, it can, it may, you
>should choose (if you have a choice) the candidate key whose
>field(s) are the least volatile. I do not know where this idea
>that the value of the PK is static (set in stone) came from. It
>should not be volatile, however everything changes or can change."
>
>>>> I've never written an application with alterable PKs that was
>>>> stable or
>easy to maintain. Now that more and more of my apps are
>replicated, I feel even more strongly about this, since
>propagating PK changes cannot in those circumstances be done via
>cascading
>> updates (i.e., at the engine level) because the results can be
>unpredictable in a replicated context. <<<
>
>You could of course use Surrogates if the PKs are volatile (this
>volatility is subjective, you decide). However we agree on this,
>don't we?

I'm not sure, since you seem to be changing your argument with
every paragraph.

I believe that a volatile field doesn't belong as a primary key,
since changing it could cause collisions with other records. If
you're going to have to write code to handle *that*, I don't see
any reason not to trade the one problem for the other, and avoid
the engine-level collisions by doing the de-duping in the
application.

>"I ask again: How do you guarantee the uniqueness of your
>records?" >>> When there is no viable natural key,
>programmatically. <<<
>
>How? I mean if the records are duplicates and you cannot
>differentiate between them for the database how can you
>differentiate between them when dealing with them
>programmatically. I am sure that you could reduce the amount of
>programming required in your applications should you accept and
>understand the point I am making.

Read my lips: in the circumstances I am describing, there is no
candidate key (including composite keys) that will guarantee
uniqueness that can be guaranteed to avoid the need to store nulls.

And, therefore, the user has to decide. I did not mean to imply
that the determination of the distinctions was done programtically,
but that it was handled programmatically by presenting the user
with the tools to decide that which can not be decided on the basis
of the data alone.

>"How do your users choose between two identical records?" >>> I
>build routines to help users avoid duplicates at record creation
>and administrative tools to de-dup the data if users have made
>incorrect choices in data entry. <<<
>
>These routines would surely make reference to the fields in the
>record, and if so why not enforce this at database level not leave
>it to the whim of the application. This approach requires that
>every operation performed by the application needs to include
>routines to handle the fact that the database is not doing its
>job. And as above how do you deal with duplicate records if even
>the database doesn't know that they are.

I give up.

>>>>This is becoming a rehash of a thread we had a few months
>>>>ago.<<<
>
>I was not present at that time, so perhaps this time we can come
>to the right conclusion.

I was hinting that perhaps you should check Deja.com.

And your conclusion is *not* the right conclusion. It is simply
*your* conclusion.

>>>> I described a circumstance in which I was storing companies,
>>>> and only
>the company name could be set to No Nulls, but there were multiple
>locations for each company, and more than one possible in each
>city (though not at the same address). But the only thing the
>actual users *always* had was the name of the company, and, most
>often, the location. This meant that Nulls had to be allowed in
>City, State, Country and Address (which is the only compound key
>that could insure uniqueness). Therefore, a surrogate key was
>required, and program-level duplicate checking to minimize the
>creation of duplicates, and administrative tools to clean up any
>duplicates that were created. <<<
>
>What's wrong with CompanyName and a generated field to be used in
>conjunction with the name to be the PK or the No Nulls, Required,
>Unique Index. The users could then see the Name and the Generated
>Field when choosing the Company. This does not preclude you using
>an AutoNumber (Surrogate) as the PK.

The generated field would still have to be created via code, since
Jet does not have triggers, so you might as well use the
AutoNumber.

>Thus you use a composite key as opposed to a compound key, the
>argument given above proves that the compound candidate key
>suggested is not adequate, indeed many systems would require more
>than one address and this information would not exist in the
>Company table anyway. This information would (more likely) reside
>in a CompanyAddress table.

Fine. Let's normalize more.

And what do you choose when you don't know the location? What do
you choose when you know the state, but not the city?

You have just moved the problem from the Company table to the
Address table.

>>>> The same thing applies to people. When the only information
>>>> you are
>guranteed to have for a person is their last name, you cannot use
>a natural key. <<<
>
>Again use a generated field.

I can't believe you are calling for the use of a generated field.
How does one guarantee uniquess of the generated field? An
AutoNumber with the proper index is the best way of generating a
unique key in Jet.

I assume you *know* that I am only speaking of Jet databases here?

>>>> Since nearly every application I have ever written stores data
>>>> on people
>and companies, and has exactly the same operating requirements, I
>have never used a natural key for either of these types of data.
><<<
>
>I take it you meant to say you have never found a natural key,
>because if you did find a natural key and you did not make it the
>PK you would have defined it as a No Nulls, Required, Unique
>Index, wouldn't you?

There *is* a natural key, if there were perfect data. In reality,
all companies have a name and a unique location. Therefore, there
is a proper candidate natural key, a compound of company name and
whatever fields it takes to define a unique location.

In reality, it is often the case that one needs to create company
records representing these unique entitites, each and everone
having a unique company name and location, but the details of which
are often unknown at the time the record is created. Therefore, the
natural key cannot be used in the database schema, for it would
prohibit users adding records unless they had 100% complete data
for the fields that make up the compound key.

Therefore, you've got to use some key field to take the place of
the real natural key, a surrogate, and the AutoNumber is the best
such surrogate in Access databases, because it can be completely
handled by the database engine.

>>>> Do you honestly believe that I am missing out on a candidate
>>>> natural key
>for these two entities, given the described operating
>requirements? <<<
>
>No. Whether or not you have a natural key does not change what I
>am saying. If you wish to use Surrogates you can, however when
>using them on a table that does not have a natural key you need a
>generated field for the users.

Why do the users need a generated field? That, so far as I can see,
is old-fashioned, main-frame-style logic, where users have to use a
list of Vendor codes or Vendor numbers, because the UI was
insufficient to make it possible for them to choose between the
*real* records.

>They cannot (should not) see the Surrogate, they should see the
>data including the generated field to allow them to differentiate
>between two records that without the generated field would be
>duplicates. Using a Surrogate as the PK does not do anything to
>protect against duplicate records as you have agreed, you even
>have to write code to try to control and eliminate these
>duplicates, we never had to do this.

Generated fields (whether derived or arbitrary) are worse than an
AutoNumber field, and users should never be exposed to them.

To be explicit, in the case of companies, I give the users a
dropdown list of company names, and a second dropdown with the
locations of that company. This is completely sufficient for them
to determine which company to pick, and no form of generated key
would accomplish that, since they'd have to refer to *something* in
order to figure out which generated key was the appropriate one.

>>>> If you do, I'm all ears. I've love to be able to eliminate the
>programmatic logic and de-duping tools. <<<
>
>I hope you will consider the above paragraph

I have, and I categorically reject it.

It is nonsense, and programmer-centric instead of user-centric.

>" . . . This does not preclude you using a Surrogate Key such as
>the Jet AutoNumber so long as you define the No Nulls, Required,
>Unique Index on the field(s) that are included in the candidate
>key(s)."
>
>>>> Why you assumed that I would not do that, I can't say. We
>>>> *were* talking
>about primary keys, so that was an implicit part of the
>definition. <<<
>
>No, I was asking you. A Primary Key is just the chosen one from
>the candidate key(s). No Nulls, Required, Unique Index should be
>set on all candidate keys not chosen as the Primary Key. This
>applies even more so if you employ a Surrogate for use as the
>Primary Key.

I begin to suspect that you really don't know what you are talking
about, or that you are being intentionally obtuse.

>"Until you can identify each record as a unique record using the
>REAL (even a generated field) fields you cannot have a normalised
>relational database design, you are not even in first normal
>form."
>
>>>> And real-life applications very seldom allow full
>>>> normalization. This
>does not disturb me, since my job is to provide application
>solutions to my clients, not normalizaed data schemas.<<<
>
>Please, I build REAL systems and we respect the clients data. It
>is very possible to design high performance professional reliable
>systems by adhering to the Relational Model, we do it every day.
>Of course it depends what you load into your comment about "full
>normalisation" I too object to what I call "Anal Normalisation"
>after all we are all building real systems for use by real people
>in real companies.

Generated keys (and I'm not referring to derived ones here) are a
requirement of poor user interface design. If that's your solution,
I reject it categorically.

>>>> And, BTW, if you have a generated field for your PK, you're
>>>> not in 1NF.
><<<
>
>Wrong! Product Code, Account Number, Registration Number

All generated keys. Meaningless to a human being.

An AutoNumber would suffice in all cases, *unless* it were the case
that one needed to use that generated number to match the database
records up with physical objects (like inventory).

>Anyway if you cannot identify a natural key the generated field is
>usually a part of the PK not the PK itself, and you can of course
>still use a Surrogate such as the AutoNumber and then define a,
>you guessed it, No Nulls, Required, Unique Index.

I give up.

>>>> I have *never* used a generated PK field. That's an
>>>> abomination, in my
>opinion, since it violates the first principle of normalization,
>dependencies between fields within a record.<<<
>
>I beg your pardon, what dependency are you talking about? The
>generated field is not an "intelligent number" it is a reference
>much like the Surrogate, except that it needs to be visible and
>normally forms part of the unique index. Unlike the Surrogate it
>needs to be visible to permit the user to distinguish between
>records that are very similar.

I disagree strenuously. A user should only ever have to pick based
on human-meaningful information. In the case of companies, company
name and location (country, state, city, address) should allow a
user to distinguish the records. Generated fields are just a poorer
version of AutoNumbers.

The exception to this is things like invoice numbers or part
numbers that human beings actually *do* deal with at some level,
because they are the simplest method for matching up the database
records with pieces of paper or little objects on shelves, items
that would otherwise be completely indistinguishable to a human
being.

>If I was talking about an "Intelligent Number" you are dead right
>that is an abomination and creates a functional dependency that is
>totally unacceptable.

What kind of generated field are you talking about that is superior
to an AutoNumber? With Jet, there is no way at the engine level to
generate a key value except with an AutoNumber field, so you've
simply moved the problem back to the programmatic level.

>A simple routine can be added to prompt the user when the
>generated field is incremented above "0001". If the generated
>field is a simple code such as "0001" you can increment it within
>say the Company table within the same CompanyName. So whenever the
>code generated exceeds "0001" you can prompt the user that they
>may be creating a duplicate, allowing them to view the other
>record(s) and confirm or cancel the insert. For "0001" you could
>read "AAAB" or "X001" or whatever you fancy. I would most strongly
>advise no-one to use parts of the other fields to create this
>field.
>
>Using this generated key you can alert the user when selecting a
>Company that others exist in the database that have the same name.

This sounds far, far worse than anything I've suggested, since
you've got to do the same thing in code that my schema with
AutoNumbers requires.

>"Ultimately you can do whatever you want, I just do not think it
>is fair to call it Relational Database Design, do you?"
>
>>>> Call it whatever you like. Or not. I'm talking about real life
>applications. <<<
>
>Oh so am I. Where there are no duplicate entities in real life.

But there is lots of incomplete information, so you often have to
create records that are, to the database engine, identical, but
function as representations of unique real-life entities.

[]

>>>> Surrogate keys are very often required for certain kinds of
>>>> data. <<<
>
>Again I agree and have already said as much. It is often
>preferable for implementing RI in many products and when you have
>a complex candidate key. If all the candidate keys are volatile it
>is highly desirable not to use them as the PK

Good lord, have we come all this way to find out that we *agree*
with each other? AARRRGGHHH!!!

>>>> It matters not one iota to me that you would not call these
>>>> data
>structures a "Relational Database Design" since it seems to me
>that you've defined the term so narrowly as to preclude its use in
>99% of real-world operating conditions.<<<
>
>As I said you can do whatever you want, Relational Database
>Design, properly done meets the real world requirements of most
>clients. I build REAL systems, please do not try to justify your
>approach on that basis.

You have simply moved all the problems that you have adduced in my
approach into different domains. The problems remain because the
data are imperfectly recorded, and your schema has to account for
that. Your approach requires code outside the database engine just
as much as mine does. And you put a burden on your end users to
distinguish records based on meaningless (i.e., generated) data.

Your solution sounds far worse to me.

>>>> Of course, YMMV. <<<
>
>I have no idea what that means, enlighten me, unless it is
>derogatory (of course)

Your mileage my vary.

>SUMMARY
>
>Surrogates can and often should be used in Access as the Primary
>Key (even using the AutoNumber). AGREED?

Of course.

>If you do use a Surrogate as the PK you must define a No Nulls,
>Required, Unique Index on the candidate key(s). AGREED?

That all depends on whether or not you *can*. I've described to you
circumstances in which No Nulls and Required are just not possible.

You seem to have missed that.

If it were possible to have a No Nulls, Required, Unique index,
then I wouldn't use a surrogate, since I've got a perfect natural
primary key. When the natural key cannot meet all those
requirements a surrogate key is required, and then programmatic
logic must be used to enforce the uniqueness that cannot be
enforced by the database engine itself.

>If there is no natural key, create a generated field to be used
>and exposed to the user to differentiate between records. This
>composite (a key consisting of more than one field and including a
>generated field) candidate key should be a No Nulls, Required,
>Unique Index. AGREED? THIS IS WHERE WE DIFFER I THINK

This is definitely where we differ. If your candidate key is not
sufficient in and of itself, no amount of adding columns to
construct a composite key will produce anything better. In that
case, you are just as well off from the standpoint of uniqueness
using an AutoNumber as surrogate key. From the standpoint of ease
of use (for the programmer) and performance, you're much better off
with the AutoNumber.

Again, take my example (regardless of whether or not you normalize
CompanyNames in one table and Locations in another):

Company -- required, cannot be null
Country -- could be null
State -- could be null
City -- could be null
Address -- could be null

You can't make the fields required and exclude nulls without making
it impossible to input data that hasn't been fudged with fake
values. Adding a generated key to this won't do anything but move
the problem to a different place. The problem here is that
operating conditions require the entry of "imperfect" (i.e.,
incomplete) records.

Using a generated field plus the Company name is no better than
using an AutoNumber, since it's just as subject to the possibility
of the creation of duplicates.

And the generated field requires code, while the AutoNumber does
not.

*Both* would require code to prevent the creation of duplicate
records.

>The ultimate, yet achievable, goal is to ensure and enforce unique
>records at the database engine level. AGREED?

Of course. And you can't do it in certain operating conditions.

More important, your solution, the generated key, does not
accomplish it since, in Access, generated fields cannot be
populated at engine level.

>Note: I refer to composite keys and compound keys by which I mean:
>
>Compound Key - A key made up of 2 or more fields of REAL fields in
>the table.
>Composite Key - A key made up of 1 or more fields of REAL fields
>plus a generated field
>
>Further Note: A generated field is meaningless (like a surrogate)
>and it is (unlike a surrogate) exposed to the user to allow them
>to differentiate between records. It is most definitely NOT an
>"Intelligent Number" which are not very clever at all.

On that I misread you.

In all cases, under the operating circumstances I described, your
composite key (with the generated field) is in no way superior to
the AutoNumber, since it has exactly the same set of problems from
the standpoint of enforcing uniqueness as the AutoNumber, while
adding the burden of maintaining the field via code.

And, again, unless the generated number actually has meaning (like
an invoice number or a case number) that connects the database
record to a physical object, then it seems to me that the user
should *never* see it.

I can't imagine that there is anything more to be said on the
subject that hasn't already been said twice or more.

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8he7nm$2i2$1...@supernews.com>:

>I have to say I am very concerned that you and others do not see
>the importance of unique records.

I think you are really obtuse.

Of course we have unique records in our databases. That's what the
AutoNumbers are used for. The goal is that one AutoNumber record
will correspond to one unique instance of the entities being
modelled in the data table.

So it is simply the case that our databased do indeed have unique
records in them, records artificially created to be unique by the
use of AutoNumber fields (which, it seems to me, are completely
indistinguishable in function from the generated fields you
describe). And we use the user interface to help our users keep
from creating multiple records representing unique entities.

Sounds to me as if we are talking entirely about the same thing,
but that you are willfully choosing to imagine that we are not
doing the basics to insure data integrity.

You seem to shoot first (assuming we are stupid) and ask questions
later.

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8he7nm$2i2$1...@supernews.com>:

>I am a relational database designer building real
>systems for real companies in the real world. You and David would
>obviously not be comfortable being described as relational
>database designers, database designers would be more fitting, no
>doubt building real systems for real companies in the real world
>too.

This is a very arrogant and insulting thing to say, as well as 100%
untrue.

I shall have nothing more to do with you.

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
bren...@indigo.ie (Brendan Reynolds) wrote in
<aUB_4.922$v4....@news.indigo.ie>:

>At the end of the day, though, the result is two records, one with
>last name 'Unknown' and number '001', one with last name 'Unknown'
>and number '002'. There are no duplicate records in the database,
>but what about the real word entities that these records
>represent? Are 'Unknown 001' and 'Unknown 002' the same person? No
>one knows.
>
>I'm not at all sure that I shouldn't have just discarded these
>records, on the basis that there isn't enough information
>available to justify recording them at all.

Why bother making it a compound key? Why not just use 001 and 002?

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
Kaplan) wrote in <OljTS8bz$GA.331@cpmsnbbsa09>:

>You would need to redesign several tables to start. Northwind is a
>really disgusting example of database schema.
>
>I would never take a contract that required me to be responsible
>for bad design.

That's not the point of the inquiry, though.

--

David W. Fenton

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8hdb36$gjd$1...@supernews.com>:

>I do have a problem with
>the absence of uniqueness in the records in that database, note
>the ability to copy and paste records where the only unique index
>is on the AutoNumber PK (Surrogate).

You have assumed something not in evidence, that the records are
not unique. The AutoNumber makes them unique, even if the records
are otherwise identical.

Need I explain in more detail?

--

Craig Alexander Morrison

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
David

>>> I shall have nothing more to do with you. <<<

Sorry to hear that, I don't believe that I was being insulting, so whilst I
apologise for appearing to you to be insulting I thought it was fair
comment.

And while we are on the subject of things that appear to be insulting I
plead not guilty to being obtuse, but you would have got me bang to rights
if you had said verbose. (g)

Please re-read my comments in this thread and think about them carefully;
think of them not as a personal criticism but as a technical exposition.

Craig Alexander Morrison

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
Brendan,

>>> I'm finding the discussion extremely interesting, and I'm glad you
posted your views, regardless of whether we ever achieve a consensus or not.
<<<

I am sure that we (the group) will not reach a consensus.

The problem you describe is fairly widespread, however your approach (in
general) seems to be the right one to me.

By using the LastName and the generated field as the No Nulls, Required,
Unique Index you ensured the uniqueness of the records and I take it you
used the AutoNumber as the Surrogate and made it the Primary Key.

Now the AutoNumber was hidden from the users and the Generated Field along
with the other data was visible to the users.

Now that is fine as all data in a relation should be explicit and not
implicit. If you had not used the Generated Field and as you should not
expose users to Surrogates (I think we all agree on that) your data would
not have explicitly shown that you had two records (that may be different)
it would have shown you two duplicate records. Now in this particular
example it may be that these records are in fact duplicates however the
Generated Field indicates that two records are actually two different
records. Looking at the record without a view of the Surrogate (when there
is no Generated Field) may leave one to assume that they are duplicates as
all of the explicit data was identical.

However I am a bit nervous in the use of the "Unknown" bit as this is what
Nulls are for. Perhaps in this particular example the Generated Field should
be used on its own as suggested by David. You would still (if desired be
using the Surrogate as the PK) set this field alone as a No Nulls, Required,
Unique Index, it would be a candidate key and would be the Primary Key if
you did not use a Surrogate.

>>> I'm not at all sure that I shouldn't have just discarded these records,
on the basis that there isn't enough information available to justify
recording them at all. <<<

Well this would surely depend upon the use that these results would be put
to. Obviously with a series of records where you only had a first name or a
last name (imagine the luxury of a first and last name) and some without
even that would not be much use as a prospects list. Although the records
that did have further details could be used in such a way. The overall
results would (I assume) be used to allow the company to better understand
what products/services are attracting interest and used in conjunction with
the actual sales of these products/services give them an indication of the
level of interest and the conversion rate of prospective customers into
customers. Other fields may also give them reasons for high and low
conversion rates.

The discussion thus far boils down to: that as Surrogates are invisible to
the user (a point on which we all seem to agree) you need something to give
the user so that they know which possible "duplicate" they are dealing with,
and this something is a Generated Field (which is very similar to a
Surrogate, except it is visible to the users). If you have to generate a
field for use as (or part of the Primary Key) you may ask why even bother
with a Surrogate at all. Well if the generated field forms part of a complex
Composite Key that will not be a good choice for the Primary Key. If the
generated field is the Primary Key then there is little need of the
Surrogate as well.

Slainte,

Brendan Reynolds

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
No very good reason that I can remember right now, David. I was describing
what I did in one specific real life situation, as an illustration of the
problem. I'm not claiming to have found the 'perfect' solution or anything.
Far from it. Regardless of whether the unique index is on the combination of
the two fields or just the one generated field, the end result is two
records that are identical except for the AutoNumber and the generated
field, instead of two records that are identical except for the AutoNumber,
which really doesn't get us very far at all, does it? If I've understood
your posts correctly, I believe this is one of the points you were making?

--
Brendan Reynolds
bren...@indigo.ie


David W. Fenton <dXXXf...@bway.net> wrote in message

news:8F49FE00Ddf...@news1.bway.net...


> bren...@indigo.ie (Brendan Reynolds) wrote in
> <aUB_4.922$v4....@news.indigo.ie>:
>

> >At the end of the day, though, the result is two records, one with
> >last name 'Unknown' and number '001', one with last name 'Unknown'
> >and number '002'. There are no duplicate records in the database,
> >but what about the real word entities that these records
> >represent? Are 'Unknown 001' and 'Unknown 002' the same person? No
> >one knows.
> >
> >I'm not at all sure that I shouldn't have just discarded these
> >records, on the basis that there isn't enough information
> >available to justify recording them at all.
>

> Why bother making it a compound key? Why not just use 001 and 002?
>

Craig Alexander Morrison

unread,
Jun 5, 2000, 3:00:00 AM6/5/00
to
Chuck,

>>> For what it's worth, Craig, you were being insulting. <<<

I don't agree, re-read what I said. Anyway I apologised to the person who
had taken offence, unfortunately.

I cast no aspersions on that persons ability as a database designer or
application developer, I did question their understanding of the relational
model through a series of messages. Anyway re-read it.

It is NOT strictly relational database design to allow duplicates in a
design. As already made clear a Surrogate is not a real piece of data and
therefore sticking one on the end of a record does not deal with duplicate
data in the real world.

A Surrogate is not visible to the user (should not even be visible to the
developer either) it is not an explicit part of what a record in a relation
(table) contains.

Explicit data is the only data that can be used to enforce uniqueness. It is
NOT relational database design to design tables that do not guarantee
uniqueness with the explicit data. Check out what the definition of a
relation is.

There are many developers who like to think of themselves as relational
database designers, they are not. They may be using products that purport to
support the relational model but they themselves design databases that could
never be called relational. I am not saying they do not do good work that
can support a business quite well, I just think they have to do a lot of
extra work at the application level to handle the anomalies that should have
been dealt with at the database engine level.

If data integrity is not handled by the database engine it is simple to
subvert, any errant application can cause havoc because the database is not
designed to protect itself.

Using Access does not make you a Relational Database Developer nor does it
force you to design relational databases.

If you want we could call them quasi-relational database designers, or
perhaps application developers and/or database designers would do just fine.

>>> There are =lots= of developers here who do "real world" relational
databases, <<<

What is the difference between "real world" relational databases and REAL
WORLD REAL RELATIONAL DATABASES.

Remember a REAL relational database consists of relations (a specific
relational definition of a table)

>>> Please don't take the fact that we don't post our credentials, or the
company we work for, as a indicator that we don't have any. <<<

I do not care about the credentials of the individual, and I can not see
where your comment is coming from. I care about what they say and what their
arguments are. If the "top" developer at a Fortune 1 company is talking
rubbish they are talking rubbish, if a first year student is talking sense
they are talking sense.

>>> Who you do work for has =no= basis on how "good" that person's work is.
<<<

Oh I do so agree, but wonder why you think I don't.

Slainte

David W. Fenton

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
Kaplan) wrote in <uChEGwqz$GA.332@cpmsnbbsa09>:

>random junk of dubious value, at the multilingual,
>no scripts required, http://www.trigeminal.com/
>

>"David W. Fenton" <dXXXf...@bway.net> wrote in message

>news:8F49F3688df...@news1.bway.net...


>> forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
>> Kaplan) wrote in <OljTS8bz$GA.331@cpmsnbbsa09>:
>>
>> >You would need to redesign several tables to start. Northwind
>> >is a really disgusting example of database schema.
>> >
>> >I would never take a contract that required me to be
>> >responsible for bad design.
>>
>> That's not the point of the inquiry, though.
>

>Yes it is. I was being asked how I would handle Northwind. And I
>answered it.
>
>If you ask me a question, I answer it the way I would for a
>client.

I haven't gone back and checked, but I was under the impression
that the question was being asked of Craig Alexander.

And the question had nothing to do with the structure of Northwind,
but with what the proper primary key is for tables holding the kind
of information found in Northwind.

David W. Fenton

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
forme...@spamfree.trigeminal.nospam.com (Michael \(michka\)
Kaplan) wrote in <u49Npwqz$GA.331@cpmsnbbsa09>:

>Craig is saying the same thing I was: that very often, surrogate
>keys are used in a way that basically allows you to circumvent
>database integrity.
>
>Everyone needs to calm down a little bit!

When Craig stops telling me and Arvin that we are incompetent, I'll
calm down a bit.

(I've never seen anyone who writes so poorly as he)

David W. Fenton

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
bren...@indigo.ie (Brendan Reynolds) wrote in
<93M_4.999$v4....@news.indigo.ie>:

>David W. Fenton <dXXXf...@bway.net> wrote in message

>news:8F49FE00Ddf...@news1.bway.net...

>> Why bother making it a compound key? Why not just use 001 and
>> 002?
>

>No very good reason that I can remember right now, David. I was
>describing what I did in one specific real life situation, as an
>illustration of the problem. I'm not claiming to have found the
>'perfect' solution or anything. Far from it. Regardless of whether
>the unique index is on the combination of the two fields or just
>the one generated field, the end result is two records that are
>identical except for the AutoNumber and the generated field,
>instead of two records that are identical except for the
>AutoNumber, which really doesn't get us very far at all, does it?
>If I've understood your posts correctly, I believe this is one of
>the points you were making?

That's precisely my point, and the whole reason why I am arguing
that an AutoNumber is superior to a generated field in all cases
where the generated field itself does not refer to something in the
real world (like a printed invoice, or a physical part number). And
the AutoNumber has the virtue of being generated at the engine
level instead of in code, so I just can't understand the call for
use of generated numbers instead of AutoNumbers.

Add to that the fact that our correspondent has assumed (for no
obvious reason) that I'm calling for the use of improperly indexed
AutoNumber fields (despite having said they are being used as
surrogate primary keys, which itself automatically implies that
they are unique and cannot be null), and, well, you end up with an
enormous amount of confusion.

I've never felt so frustrated in my life as I was last night
writing that long reply, where it took until the last screen to
actually comprehend what my correspondent was saying. I then had to
go back and re-write *every single paragraph* of my response, since
what was intended was only clear in retrospect.

Sorry for the spectacle.

Brendan Reynolds

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
No apologies for 'the spectacle' are necessary as far as I'm concerned,
David. I'm learning from this thread.

The only purpose I can see for the generated field at this stage, and the
reason I used one in the app I described, was that the generated field was
visible to the user, while the AutoNumber PK was not. Without the generated
field, there was no _visible_ way to distinguish the two records.
--
Brendan Reynolds
bren...@indigo.ie

David W. Fenton <dXXXf...@bway.net> wrote in message

news:8F4AD2788df...@news1.bway.net...

Brendan Reynolds

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
That's all right, Craig. That has happened once or twice before! :-)

--
Brendan Reynolds
bren...@indigo.ie


Craig Alexander Morrison <CraigAlexan...@NoSpamNoMail.com> wrote in

message news:8hfioe$i7c$2...@supernews.com...
<snip>

Arvin Meyer

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
And just to add more to this "spectacle", I sometimes allow my PK's to be
visible (although grayed out and disabled) if I think the user will benefit.
For instance, when opening 2 forms to the same record (like a popup instead
of a subform) This permits quick verification. The user is aware that he/she
can't change anything, but knows that the 2 forms are synched.
---
Arvin Meyer

Brendan Reynolds wrote in message ...

Arvin Meyer

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to

David W. Fenton wrote in message <8F4AD8DD3df...@news1.bway.net>...

>
>When Craig stops telling me and Arvin that we are incompetent, I'll
>calm down a bit.

Personally, I could care less what Craig thinks of me. The measure of the
quality of my work is my list of satisfied clients and my bank account.
---
Arvin Meyer

Arvin Meyer

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to

Craig Alexander Morrison wrote in message <8hg99p$r3u$1...@supernews.com>...
(lots snipped)

>A Surrogate is not visible to the user (should not even be visible to the
>developer either) it is not an explicit part of what a record in a relation
>(table) contains.

Usually, but should the user need verification of that specific row, your
proposal to add a derived field, does nothing more than add bulk, and
violates dependency rules. A unique index is a unique index. It matters not
one wit whether that unique index is the PK or any other key. And as far as
the developer is concerned, I always leave my keys visible until my
applications are fully debugged and ready for delivery. That aids
immeasurably in debugging. Then I hide them in about 99% of the cases, since
the user only rarely needs to see any key or index.

Seems to me that you're doing the same thing by adding a field and building
a derived key that does exactly the same thing. What David said about
AutoNumbers is true, they are never used as anything but a PK which by
definition is ALWAYS REQUIRED, NON-NULL, AND UNIQUE.

One other thing, for data entry ease, an AutoNumber is ALWAYS more efficient
than a natural key. I use natural keys, but rarely. Natural PK's require
coded enforcement, and slow efficiency and data entry when that code is
exercised. Seems to me that you do the client a disservice if you waste time
coding uniqueness, when the database engine will enforce it for you at no
cost.
---
Arvin Meyer

Craig Alexander Morrison

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
Arvin

>>> your proposal to add a derived field, does nothing more than add bulk,
and violates dependency rules. <<<

I did not propose adding a DERIVED field I proposed adding a generated field
which does not create any functional dependency.

I won't bother with the other points about surrogates as they have all been
addressed in earlier messages as has this one. However this one seriously
misrepresents what I had said. Admittedly in some rather long and winding
messages. (g)

Craig Alexander Morrison

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
Arvin

I should also add that these generated fields are only required when there
is no natural key.

It is used to (help to) explicitly declare uniqueness and be all or part of
the Primary Key or a No Nulls, Required, Unique Index where a surrogate is
to be used.

All this is database engine stuff, nothing to do with the applications as
you seem to infer. Once generated the field enforces uniqueness at engine
level.

Brendan Reynolds

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
But, surely, the AutoNumber PK enforces uniqueness at the engine level. As I
understand it (and do please correct me if I'm wrong) your objection to
using the AutoNumber PK _alone_ to enforce uniqueness is that it is
artificial, not a 'real' part of the data. If you'll forgive me for quoting
from two of your earliest posts in this thread:

<begin quote>
In first normal form you are, in addition to
other things, expected to ensure that no two rows in a table are identical.
By creating a meaningless value such as the AutoNumber (used as a Surrogate
Key) you have not satisfied the above condition.
<end quote>

<begin quote>


Just creating a random number and sticking it on the record does nothing for
the uniqueness of the REAL data.

<end quote>

What I don't understand is how the generated field is preferable to the
AutoNumber PK from this point of view. How does it satisfy the condition
(that no two rows in a table are identical) any better than the AutoNumber
PK? Isn't it equally meaningless, artificial, and divorced from the 'real'
data?

Let's focus for a moment on that second quote above, where you talk about
the uniqueness of the real data. Surely, in the example I gave before (a
telephone caller who will not give a full name), the real data is not, in
fact, unique. The caller may be, but the data (if we define the data as the
facts available to us about that caller) is not.

--
Brendan Reynolds
bren...@indigo.ie

Craig Alexander Morrison <CraigAlexan...@NoSpamNoMail.com> wrote in

message news:8hj6gh$2q1$1...@supernews.com...

Lyle Fairfield

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
Whew, I was afraid it was girth.

a...@m.com (Arvin Meyer) wrote in
<8hiqtj$act$1...@nntp9.atl.mindspring.net>:

>Personally, I could care less what Craig thinks of me. The
>measure of the quality of my work is my list of satisfied
>clients and my bank account.
---
>Arvin Meyer

--
Lyle
http://www.cyriv.com/

Craig Luna

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
Don't forget about the ability to use GUID just like an AutoNumber
field.

I know of a few situations where a GUID is a good surrogate key (think
of a distributed database or replication).
You can use it just like a Counter (aka: AutoNumber or Identity) by
setting the Default Value equal to "=GenGuid()"

There are a few problems with FindFirst on a GUID if using DAO code
but that is easily circumvented by appending it to the where clause.

Having to produce and manually verify a key as others have suggested
can be a pain and too expensive CPU wise. AutoNumber and GUIDs are
almost free and already populated before you update the record. How
can you lose if you don't have a natural key? (and in some cases it
wouldn't hurt to have both)


-CAL

"David W. Fenton" <dXXXf...@bway.net> wrote in message

news:8F43D0352df...@news1.bway.net...
> There is only one kind of surrogate key that anyone should ever
> use, and that's an AutoNumber. And users should never see or be
> allowed to edit the key field.
>
> I can't imagine a circumstance in which any other surrogate key is
> ever justified.

Arvin Meyer

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
That's what I was saying.
---
Arvin Meyer

David W. Fenton wrote in message <8F4BE7DFDdf...@news1.bway.net>...


>a...@m.com (Arvin Meyer) wrote in

><8hiqmg$nch$1...@slb0.atl.mindspring.net>:


>
>>One other thing, for data entry ease, an AutoNumber is ALWAYS more
>>efficient than a natural key. I use natural keys, but rarely.
>>Natural PK's require coded enforcement, and slow efficiency and
>>data entry when that code is exercised. Seems to me that you do
>>the client a disservice if you waste time coding uniqueness, when
>>the database engine will enforce it for you at no cost.
>

>Well, there's uniqueness of the data record, and then there's
>uniqueness of the entity. You have to build code to ensure the
>latter when you use a surrogate key. But Craig's proposal would
>mean you *also* have to write code to enforce uniqueness of the key
>itself. That seems like way too much work.
>
>I was working on my second commercial app today, and it uses
>natural keys. It's a damned pain in the ass (actually, it
>originally used more natural keys than it does now, but I changed
>it several years ago to use AutoNumber surrogate keys in most of
>the tables) to maintain and work with.
>
>And that app is the one that trained me to avoid natural keys.
>
>And it's a financially oriented app.

Michael (michka) Kaplan

unread,
Jun 6, 2000, 3:00:00 AM6/6/00
to
Craig, this is COMPLETELY wrong.

I have a colleage who thought it was quite funny that I made more last year
on this ONE ISSUE than she had made all that same year. This is a BAD and
dangerous practice in Access.

I have a link off my website that explains some of the MANY problems you
will have.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Luna" <cl...@midsouth.rr.com> wrote in message
news:38g%4.40221$L3.3...@typhoon.midsouth.rr.com...

David W. Fenton

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
bren...@indigo.ie (Brendan Reynolds) wrote in
<yQ2%4.1227$v4....@news.indigo.ie>:

>The only purpose I can see for the generated field at this stage,
>and the reason I used one in the app I described, was that the
>generated field was visible to the user, while the AutoNumber PK
>was not. Without the generated field, there was no _visible_ way
>to distinguish the two records.

No "Date created" field? No "Updated by" field? Absolutely nothing
that indicated any of the details about the record?

Yes, I guess that's what you'd have to do, but I don't see any
reason not to simply use the AutoNumber itself. Who cares whether
the numbers are within the duplicates (i.e., John Smith 001, John
Smith 002, John Doe 001, John Doe 002)?

David W. Fenton

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
a...@m.com (Arvin Meyer) wrote in
<8hiqmg$nch$1...@slb0.atl.mindspring.net>:

>One other thing, for data entry ease, an AutoNumber is ALWAYS more
>efficient than a natural key. I use natural keys, but rarely.
>Natural PK's require coded enforcement, and slow efficiency and
>data entry when that code is exercised. Seems to me that you do
>the client a disservice if you waste time coding uniqueness, when
>the database engine will enforce it for you at no cost.

Well, there's uniqueness of the data record, and then there's
uniqueness of the entity. You have to build code to ensure the
latter when you use a surrogate key. But Craig's proposal would
mean you *also* have to write code to enforce uniqueness of the key
itself. That seems like way too much work.

I was working on my second commercial app today, and it uses
natural keys. It's a damned pain in the ass (actually, it
originally used more natural keys than it does now, but I changed
it several years ago to use AutoNumber surrogate keys in most of
the tables) to maintain and work with.

And that app is the one that trained me to avoid natural keys.

And it's a financially oriented app.

--

David W. Fenton

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
cl...@NopeToSpaMmidsouth.rr.com (Craig Luna) wrote in
<38g%4.40221$L3.3...@typhoon.midsouth.rr.com>:

>Don't forget about the ability to use GUID just like an AutoNumber
>field.
>
>I know of a few situations where a GUID is a good surrogate key
>(think of a distributed database or replication).
>You can use it just like a Counter (aka: AutoNumber or Identity)
>by setting the Default Value equal to "=GenGuid()"

You are quite clearly talking out your ass here. If you had ever
tried to implement a GUID PK, whether in a replicated app or not,
you'd know that it is *impossible* to create a working data
structure without a massive amount of extra work *outside the
database engine*.

>There are a few problems with FindFirst on a GUID if using DAO
>code but that is easily circumvented by appending it to the where
>clause.

This isn't the only problem.

>Having to produce and manually verify a key as others have
>suggested can be a pain and too expensive CPU wise. AutoNumber and
>GUIDs are almost free and already populated before you update the
>record. How can you lose if you don't have a natural key? (and in

>some cases it wouldn't hurt to have both).

This paragraph seems completely contradictory of everything you've
posted in this thread. You are such a poor writer that I am left
without any clue as to what the hell you've been talking about in
this entire thread.

Craig Alexander Morrison

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
Brendan

You actually answered this yourself in an earlier post - it is to do with
visibility.

>>>The only purpose I can see for the generated field at this stage, and the
reason I used one in the app I described, was that the generated field was
visible to the user, while the AutoNumber PK was not. Without the generated
field, there was no _visible_ way to distinguish the two records.<<<

That sums it up nicely, Brendan. You had used the AutoNumber as a surrogate,
which is why the generated field was required. If you had used the generated
field on its own rather than in conjunction with the LastName (as LastName
was sometimes "unknown" and therefore NULL), there would have been no need
for the surrogate as the generated field would have been a good candidate as
the Primary Key as it was a simple (single field) key.

The difference between the AutoNumber (used as a surrogate) and the
generated field is visibility.

The AutoNumber in this discussion has been used as a surrogate, surrogates
should not be visible to the user. (even the developer should not really see
them)

If you made the AutoNumber visible then it is not a surrogate it becomes
real data.

If you generate a number you could make it the Primary Key or part of the
Primary Key, and you need not use a surrogate. However you can also use a
surrogate if you wish, however there is no point if the generated field is
the only field in the Primary Key. If the generated field is part of say a
four field candidate key you may (wisely) decide to use a surrogate (for
which many would use an AutoNumber). You should of course set up a No Nulls,
Required, Unique Index on the generated field and any other fields that make
up the candidate key.

> <begin quote>
> Just creating a random number and sticking it on the record does nothing
for
> the uniqueness of the REAL data.
> <end quote>

I am not sure which message this quote was from and if I didn't I should
have said that as this random number was not visible it did not form part of
the real and explicit data. If it was visible then it would be part of the
real data.

A surrogate is by definition invisible to the user.

If you use an AutoNumber and make it visible to the user then it is not a
surrogate. The generated field could be an AutoNumber although I think most
here would agree that it is not a good idea in general to allow the users to
see AutoNumbers, surrogate or not.

Slainte,

Craig Alexander Morrison

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
Nothing to do with me, David

Sorry you don't seem to able to understand my other posts. The one you
responded to here has nothing to do with me.

Slainte

Craig Alexander Morrison, CData SystemsHouse

"David W. Fenton" <dXXXf...@bway.net> wrote in message

news:8F4BE0FCBdf...@news1.bway.net...

Brendan Reynolds

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
This is were our thinking has diverged up to now. To you, if the generated
field is visible, it is part of the real data. I had not been thinking about
it in that way up to now - I had been thinking about the visibility of the
generated field merely as a convenience to the user. Now that you mention
it, though, I can see your point. There are, after all, many artificially
generated numbers or codes that we commonly treat as 'real' data - invoice
numbers, account numbers, staff numbers, cheque numbers, etc. When you get
right down to it, I guess even a name is not really all that fundamentally
different. I'll have to give some more thought to this, but I certainly do
concede that you have a point.

--
Brendan Reynolds
bren...@indigo.ie


Craig Alexander Morrison <CraigAlexan...@NoSpamNoMail.com> wrote in

message news:8hkh14$4vr$1...@news6.svr.pol.co.uk...

Brendan Reynolds

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
Yes, in many cases the combination of date/time and user name might be a
better choice. But of course, if security is not implemented, all users will
be 'Admin'. And the particular app I described was used by people who
regularly changed their system time - something to do with some other app
they were using, nothing to do with me. So the combination of user name and
date created was not guaranteed to be unique either.

--
Brendan Reynolds
bren...@indigo.ie


David W. Fenton <dXXXf...@bway.net> wrote in message

news:8F4BE5987df...@news1.bway.net...


> bren...@indigo.ie (Brendan Reynolds) wrote in
> <yQ2%4.1227$v4....@news.indigo.ie>:
>

> >The only purpose I can see for the generated field at this stage,
> >and the reason I used one in the app I described, was that the
> >generated field was visible to the user, while the AutoNumber PK
> >was not. Without the generated field, there was no _visible_ way
> >to distinguish the two records.
>

> No "Date created" field? No "Updated by" field? Absolutely nothing
> that indicated any of the details about the record?
>
> Yes, I guess that's what you'd have to do, but I don't see any
> reason not to simply use the AutoNumber itself. Who cares whether
> the numbers are within the duplicates (i.e., John Smith 001, John
> Smith 002, John Doe 001, John Doe 002)?
>

Craig Luna

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
I didn't say it was the best way but it can be used the same way.

> > Don't forget about the ability to use GUID just like an AutoNumber
> > field.

I guess what I should have said was that facilities are in place to
allow for Jet to automatically create a key. That is all.

Its too much of a pain to used them. The only place I have had the
real need is in a code data synchronization routine and in a queuing
operation.

Regardless of preference (and I use counters personally) ,
lets discuss your problems you have listed on your web site.

(Keep in mind this is from an app programmer's POV so I view the GUIDs
as being a little more flexible. If you are keeping the database open
and visible then I will concur with your points)

Basically it appears your major quarrel is with the type variance
among platforms and version. Right?

-CAL

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com>
wrote in message news:ueBrEbD0$GA.329@cpmsnbbsa07...

Craig Luna

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
Well dipshit David, I was letting others know that the AutoNumber
isn't the
only generated field avaliable in JET.

Thats all. If you already know that then great, you won't have to hurt
ears again the next time you pull your head out of your ass.

Look at the post I responded to (I notice you clipped it) and you will
see that was all that was stated.

I agree that it is a pain to implement a GUID and yes code is required
to make it work right. Even though, there are situations that require
uniqueness outside of a single db and this is can be the only option
you have.


"David W. Fenton" wrote:

>>(Craig Luna) wrote in


> >Don't forget about the ability to use GUID just like an AutoNumber
> >field.
> >

> >I know of a few situations where a GUID is a good surrogate key
> >(think of a distributed database or replication).
> >You can use it just like a Counter (aka: AutoNumber or Identity)
> >by setting the Default Value equal to "=GenGuid()"

> You are quite clearly talking out your ass here. If you had ever
> tried to implement a GUID PK, whether in a replicated app or not,
> you'd know that it is *impossible* to create a working data
> structure without a massive amount of extra work *outside the
> database engine*.


Sure whats your point? DAO is required and in some cases the overhead
is worth it.
I hope you don't have an aversion to code since your are on your 2nd
commercial application and oviously can write code in your sleep.

> >There are a few problems with FindFirst on a GUID if using DAO
> >code but that is easily circumvented by appending it to the where
> >clause.

> This isn't the only problem.

How 'bout a little editting on my part then.
_There are a few problems: a major one is with using FindFirst on a
GUID type field _

> >Having to produce and manually verify a key as others have
> >suggested can be a pain and too expensive CPU wise. AutoNumber and
> >GUIDs are almost free and already populated before you update the
> >record. How can you lose if you don't have a natural key? (and in
> >some cases it wouldn't hurt to have both).

> This paragraph seems completely contradictory of everything you've
> posted in this thread. You are such a poor writer that I am left
> without any clue as to what the hell you've been talking about in
> this entire thread.

Ploop. (the sound of your head being pulled out of ass)

This was my first post to this thread!!!!!!!!!!

But even though. No arguements were made by me as to whether or not
the use of a GUID is good or bad. It once again was presented as
another way of producing uniqueness without having to manually
produce
a unique key via code. Autonumber fields are preferrable in almost all
instances.

-CAL


Arvin Meyer

unread,
Jun 7, 2000, 3:00:00 AM6/7/00
to
Now that you mention it, I do need to go on a diet. I've been eating way to
good. <g>
---
Arvin Meyer

Lyle Fairfield wrote in message <8F4B754B7lyl...@24.9.0.17>...


>Whew, I was afraid it was girth.
>

>a...@m.com (Arvin Meyer) wrote in

David W. Fenton

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
cl...@NopeToSpaMmidsouth.rr.com (Craig Luna) wrote in
<5Sy%4.40841$L3.3...@typhoon.midsouth.rr.com>: [To MichKa:]

>Basically it appears your major quarrel is with the type variance
>among platforms and version.

You're posting in an Access group. The only relevant issue is
whether it's a workable solution in Access.

It's not.

I was reckless and irresponsible of you to suggest it was a viable
alternative.

David W. Fenton

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
CraigAlexan...@NoSpamNoMail.com (Craig Alexander Morrison)
wrote in <8hkhkg$rk9$1...@newsg1.svr.pol.co.uk>:

>Nothing to do with me, David
>
>Sorry you don't seem to able to understand my other posts. The one

>you responded to here has nothing to do with me.

You're right. I apologize for that mistake.

Craig Luna

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
In what way is it not viable in Access? Educate me. You have the same
functions I do, so what is the difference?

-CAL

"David W. Fenton" <dXXXf...@bway.net> wrote in message

news:8F4CD80E9df...@news1.bway.net...


> cl...@NopeToSpaMmidsouth.rr.com (Craig Luna) wrote in
> <5Sy%4.40841$L3.3...@typhoon.midsouth.rr.com>: [To MichKa:]
>
> >Basically it appears your major quarrel is with the type variance
> >among platforms and version.
>
> You're posting in an Access group. The only relevant issue is
> whether it's a workable solution in Access.
>
> It's not.
>
> I was reckless and irresponsible of you to suggest it was a viable
> alternative.
>

Michael (michka) Kaplan

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
Did you read the article?

My MAJOR problem is the bugs that occur in trying to use the flipping
things, and the fact that their "fixes" in each version break the
workarounds of the prior versions.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Luna" <cl...@midsouth.rr.com> wrote in message

news:5Sy%4.40841$L3.3...@typhoon.midsouth.rr.com...


> I didn't say it was the best way but it can be used the same way.
>

> > > Don't forget about the ability to use GUID just like an AutoNumber
> > > field.
>

> I guess what I should have said was that facilities are in place to
> allow for Jet to automatically create a key. That is all.
>
> Its too much of a pain to used them. The only place I have had the
> real need is in a code data synchronization routine and in a queuing
> operation.
>
> Regardless of preference (and I use counters personally) ,
> lets discuss your problems you have listed on your web site.
>
> (Keep in mind this is from an app programmer's POV so I view the GUIDs
> as being a little more flexible. If you are keeping the database open
> and visible then I will concur with your points)
>

> Basically it appears your major quarrel is with the type variance

Michael (michka) Kaplan

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
READ THE ARTICLE.

There are HUGE bugs there that have cost thousands of hours of development
time that I know of, personally.

I would be happy to go to those customers and have them come beat the crap
out of you for suggesting that others make the same mistakes, just on
principle.

And I am only half kidding here.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Luna" <cl...@midsouth.rr.com> wrote in message

news:n4E%4.40901$L3.3...@typhoon.midsouth.rr.com...


> In what way is it not viable in Access? Educate me. You have the same
> functions I do, so what is the difference?
>
> -CAL
>
> "David W. Fenton" <dXXXf...@bway.net> wrote in message
> news:8F4CD80E9df...@news1.bway.net...
> > cl...@NopeToSpaMmidsouth.rr.com (Craig Luna) wrote in
> > <5Sy%4.40841$L3.3...@typhoon.midsouth.rr.com>: [To MichKa:]
> >

> > >Basically it appears your major quarrel is with the type variance
> > >among platforms and version.
> >

Craig Luna

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
Yes, I read your article. (Good job btw)

Cite me a specific example though because I haven't been bitten by it
yet.

I would guess that my acceptance of using GUIDs is because I use them
sparingly and only if there is a need of uniqueness across multiple dbs.
That doesn't happen very often. (I've only had the need to use GUIDs
twice in the past 8-10 years) Even if they were not more difficult to
use, they still bloat the db as compared to an autonumber because of
their larger storage requirement.

In both instances that I've needed to use a GUID there was only a simple
relationship between two tables. The saving grace to me was probably
because each table had a counter or identity field as the PK and the
GUID (although generated by dbs) was externally enforced. So almost all
of the operations relied upon the autonumber and not the GUID.


"Michael (michka) Kaplan" wrote:
>
> READ THE ARTICLE.
...cut


> I would be happy to go to those customers and have them come beat the crap
> out of you for suggesting that others make the same mistakes, just on
> principle.
>
> And I am only half kidding here.


Don't make me smack you bitch! :)

I agree they pretty much suck for almost all scenarios but they have a
place. My only intention was to relate the capability and document the
usage to other readers. I was not suggesting that the be used as a
replacement for counters.

-CAL

Michael (michka) Kaplan

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
Any time you need to perform operations like form/subform, domain aggregate
functions, trying to query based on guid criteria, and other problems, you
can be bitten.

Look, I am not going to argue about it. I am just stating that your
recommendation was really very irresponsible in an Access newsgroup.

If I walked into a Delphi newsgroup and answered a question with my limited
knowledge of object pascal and someone corrected me saying I had just given
dangerous advice, I definitely would apologize and hope that no one read my
original post and tried it. I guess its too much to expect others to think
of things the same way.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Luna" <cl...@mc2-ice.com> wrote in message
news:393FB7D3...@mc2-ice.com...

Craig Luna

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to

You are cited as an author of the Access Replication FAQ. Does the fact
that you state distinctly on the second entry that a GUID can
__absoulutely__ be used as a Primary Key Field make you irresponsible
too????? No of course not.

http://support.microsoft.com/support/access/content/repl/replfaq.asp

In case you didn't read carefully before, I agreed with you concerning
JET's GUID and usage as a primary key. I did not recommend its use all
the time, only in cases that need it.

As with anything in this industry, you have to know when to *properly*
use something and when it shouldn't be used. This thread has been more
of universal discussion and not usage specific. Therefore, mentioning
the possible use of a GUID as a PK in JET is acceptable. I maybe should
have added more disclaimers to the initial post, but that has been more
than covered by now.


"Michael (michka) Kaplan" wrote:
>
> Any time you need to perform operations like form/subform, domain aggregate
> functions, trying to query based on guid criteria, and other problems, you
> can be bitten.


If you used a guid like that then you should expect to have been bitten
hard. That is not the purpose of a guid as you well know. I would
equate that to using a double precision field to store an integer,
overkill and wasteful.


> Look, I am not going to argue about it. I am just stating that your
> recommendation was really very irresponsible in an Access newsgroup.

See your own FAQ on the MS site and you state the samething. (link
above)


> If I walked into a Delphi newsgroup and answered a question with my limited
> knowledge of object pascal and someone corrected me saying I had just given
> dangerous advice, I definitely would apologize and hope that no one read my
> original post and tried it. I guess its too much to expect others to think
> of things the same way.


I've been using Jet professionally since the day it was made public. I
guess "limited knowledge" isn't a good example on your part. Just
because *you* don't find value in something doesn't mean its useless.

It just means that either you have a problem knowing the proper way to
implement it or just haven't stumbled upon the situation that really
needs it. It has a purpose. It even works quite well internally for
acting as a primary key for records in the case of the replication
system tables. But yes, for most apps in using Jet it is overkill, buggy
and unnecessary.

-CAL

Michael (michka) Kaplan

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
Actually, there are many authors to the FAQ and I have often made it quite
clear to the people responsbile what my feelings are on the topic.

I do not EVER think it is acceptable to use them in an Access database.
Period. In Jet (say from VB) its fine, there are only Access bugs here.

--
MichKa
"Cause it's a bittersweet symphony, thats life..." -- The Verve

random junk of dubious value, at the multilingual,
no scripts required, http://www.trigeminal.com/

"Craig Luna" <cl...@mc2-ice.com> wrote in message

news:393FEF23...@mc2-ice.com...

Craig Luna

unread,
Jun 8, 2000, 3:00:00 AM6/8/00
to
I like that answer.

-CAL

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com>
wrote in message news:uuHSe$X0$GA.325@cpmsnbbsa07...

David W. Fenton

unread,
Jun 9, 2000, 3:00:00 AM6/9/00
to
cl...@mc2-ice.com (Craig Luna) wrote in <393FEF23.C4011623@mc2
-ice.com>:

>As with anything in this industry, you have to know when to
>*properly* use something and when it shouldn't be used. This
>thread has been more of universal discussion and not usage
>specific. Therefore, mentioning the possible use of a GUID as a PK
>in JET is acceptable. I maybe should have added more disclaimers
>to the initial post, but that has been more than covered by now.

Damned straight you should have put in disclaimers. You should have
said it is an alternative the doesn't really work, and that is very
prone to problems.

YOU DID NOT DO THAT.

No amount of hemming and hawing about "well, I knew that" or "I was
thinking that when I posted" will change the fact that you
suggested a course of action AND DID NOT WARN PEOPLE ABOUT THE
MASSIVE PITFALLS INVOLVED IN THAT COURSE OF ACTION.

Face up to it -- you fucked up.

Craig Luna

unread,
Jun 9, 2000, 3:00:00 AM6/9/00
to
Sir, I believe I said there are a few problems with GUID PKs in the
inital post and that autonumbers are my preference. Findfirst being the
problem that bothers me the most. It could be taken care of using the
where clause in the recordset selection. Of course if you open up the
table, a rs.seek works great. Internally GUIDs are used for replication
without a fault.

Please learn to read a little closer.

GUIDs are perfectly fine PKs in Jet 3.51 but you have better options in
most cases. I still haven't heard your reasons for not liking them. So
far you haven't contributed to this subthread in the least. Almost troll
like.

-CAL

Craig Luna

unread,
Jun 12, 2000, 3:00:00 AM6/12/00
to
I said this in jest because you proved my point that sometimes the use
of a GUID in access as a key is acceptable regardless of the problems
the current version of the DBengine has with them.

In the two cases I mentioned before, I leveraged the GenGuid default
value. In one I have a database that acts like a repository and queue.
On the clients machine and database all keys are autonumbers. So yes
they were very simplistic uses of the autonumber GUID.

I have other cases (like with Visio objects) where I have a string field
that contains the objects guid but that is more of a property than a key
(in the way I use it at least). This relationship requires enforcement
via code. I wasn't alluding to those situations when discussing a guid
PK.


Now onto a newer topic. Has anyone verified a problem with GUIDs in the
MDE and how is it performance?

-CAL


Arvin Meyer wrote:
>
> Craig Luna wrote in message ...
> >
> >I find it funny that given the "unreliablity" of GUIDs, you would even
> >consider using it as a foreign key. That sound awfully hypocritical.


> That's the key exposed by the Outlook application. Perhaps your development
> only includes simplistic solutions, but I work for clients who expect the
> full use of Office automation. As a result, Contacts reside in Exchange
> databases and are linked to tables in Access using the GUID from Outlook.
> While not perfect, by any means, it's far better than entering the same data
> in two places. Exchange 2000 offers a better object model, but still not as
> good as the one in Access. I could, of course, build an Access Contacts
> database, but that would not be as cost effective for them.
> ---
> Arvin Meyer

Craig Luna

unread,
Jun 12, 2000, 3:00:00 AM6/12/00
to
David, I realize that this is a ms-access ng but I don't know of any Jet
specific forums on USENET. Considering they both use the same db and
there is so little difference between the coded version of each, I don't
perceive too much difference between the two when it comes to the VB/VBA
database coding. Maybe this is a wrong "assumption" (yeah the magic
word). If so I sorry but last I checked, the COM interface DAO isn't
different between VB,Access or VC++ .

The main reason I came here was really to ask a question but I tried to
give a little before taking.

I'm sorry if I offended but considering the way I was attacked outright,
it was inline with the post i responded to. (though after the fact I
think we determined that your initial reply was really because you
thought I was someone else).

I still stand behind the use of a guid as a key in certain circumstances
even though it has a few quirks. (Now if you want to really stir up the
crowd, imagine the one case I need to use a double as a key. Now there
is a really fun entity to work with as a key, a floating value. Thank
GOD for autonumbers )

-CAL

"David W. Fenton" wrote:
>
> cl...@NopeToSpaMmidsouth.rr.com (Craig Luna) wrote in

> <hmX05.44195$L3.3...@typhoon.midsouth.rr.com>:
>
> >As for expericence with the Access engine, I've been using it
> >professionally since its weak implemention in VB2 and have ~3000
> >customers using one product, ~1500 with another product and ~2500
> >others that uses the third. This is not including the other
> >products that these have replaced or the not JET based products.
>
> You've apparently failed to notice that you are posting in an
> Access, not a Jet, newsgroup, and that those who have taken issue
> with your recommendation have always explicitly qualified their
> objections as applying only to Access, not to Jet.

Craig Luna

unread,
Jun 12, 2000, 3:00:00 AM6/12/00
to
Na.... Ok I won't say it.

Chuck Grimsby wrote:
>
> Ok people... enough!
>
> No one's actually said the magic word to invoke Godwin's Law, but
> let's stop it before it happens.
>
> Time to end this thread!

0 new messages