Data Primary key vs. Artificial (Autonumber) primary key

M.

unread,

Dec 4, 2007, 2:49:02 PM12/4/07

to

Dear all,

Although many Microsoft Access books advise to set an autonumber field as
primary index (a so called pseudo primary key),
http://www.blueclaw-db.com/database_link_tables.htm advises to use real data
to define a (composite) primary key. In summary, this results in the
following two designs:

>Microsoft Acces books setup for Employee table
Employee_ID (autonumber, primary key)
SSN (social security number, composite index key1)
Employee_Name (full employee name, composite index key2)
composite index SSN + Employee_Name = unique

>BlueClaw setup for Employee table
Employee_ID (autonumber, unique index)
SSN (social security number, (composite) primary key1)
Employee_Name (full employee name, (composite) primary key2)

In both approaches, Employee_ID would be used as a foreign key in other
tables to define the relationship with the Employee table.

Are there any negative aspects associated with the BlueClaw approach?

Pros of BlueClaw approach
*Display of table is meaningful, because it's sorted on primary index
*No cascaded update necessary of linked relationship fields in other tables,
because autonumber is only used for linking tables and therefore will never
change.
*Prevention of duplicates is improved, since data fields are used to check
for duplicates, instead of an (always unique) autonumber field > this can
also be achieved with the composite unique index as shown above in the Access
books example.

Cons of BlueClaw approach
*???

I would appreciate your comments / opinion on the BlueClaw approach, because
I currently have the feeling that I'm missing something that explains why so
many people use autonumber fields as primary (artificial) key. If the
BlueClaw approach is the best one, I'm considering to use it as a standard in
new database design questions.

Best regards,

M.

Jeff Boyce

unread,

Dec 4, 2007, 4:15:25 PM12/4/07

to

You may have just inflamed a long-running religious war about proper primary
keys.<g>

See comments in-line below...

"M." <M...@discussions.microsoft.com> wrote in message
news:5C60984F-9FD8-4E90...@microsoft.com...

> Dear all,
>
> Although many Microsoft Access books advise to set an autonumber field as
> primary index (a so called pseudo primary key),

"psuedo" implies "not real" -- a primary key is a unique identifier, no
matter where it comes from. It isn't a question of "real".

> http://www.blueclaw-db.com/database_link_tables.htm advises to use real
> data
> to define a (composite) primary key. In summary, this results in the
> following two designs:
>
>>Microsoft Acces books setup for Employee table
> Employee_ID (autonumber, primary key)
> SSN (social security number, composite index key1)

you will want to be very careful about capturing/displaying SSNs. Moreover,
not every "person" has (or cares to share) one. How will you handle a Null
SSN?

> Employee_Name (full employee name, composite index key2)

No, no no! If you combine more than one fact in a single field, you have to
work extra hard to do simple things, like, say, sort by LAST NAME! Use
FirstName and LastName fields, then use a query to concatenate them when
needed.

> composite index SSN + Employee_Name = unique

can you say "identity theft"? what makes you think that SSN + Employee_Name
will be unique?

>
>>BlueClaw setup for Employee table
> Employee_ID (autonumber, unique index)

Why? By definition, an Autonumber is supposed to already be unique, so you
wouldn't gain anything by indexing it.

> SSN (social security number, (composite) primary key1)

(see above)

> Employee_Name (full employee name, (composite) primary key2)

(see above)

>
> In both approaches, Employee_ID would be used as a foreign key in other
> tables to define the relationship with the Employee table.

If you go to the effort of creating a composite primary key, then why would
you not also go to the effort to "migrate" that key (i.e., all fields) to
the "child" tables?

>
> Are there any negative aspects associated with the BlueClaw approach?
>
> Pros of BlueClaw approach
> *Display of table is meaningful, because it's sorted on primary index

No, NO, NO!! Access tables store data, Access forms (and reports) display
it. Using Access tables to display data is asking for trouble! (can you
tell I have some strong feelings on this topic?<g> - check this newsgroup
for others' ideas about using tables to display data. From experience, I
don't want inexperienced users mucking about directly in my tables.
Instead, I'll guide their use of the data via forms. This is a major
difference between, say, Word {everyone knows how to move words around} and
Access, a relational database {how many normal people understand relational
database design?})

And "meaningful"?! To whom? Just because a set of data is sorted in one
order doesn't mean that EVERYONE wants to see it in that order. I, for one,
prefer to see a list of employees sorted by last name when I'm considering
Human Resources activities, but by firstname when I'm looking for their
phone numbers.

> *No cascaded update necessary of linked relationship fields in other
> tables,
> because autonumber is only used for linking tables and therefore will
> never
> change.

The implication is that the SSN and Employee_Name MAY change. So what?
There's next to zero effort required to set Cascading Updates when you set
the relationships among tables. And while an Autonumber may not change, you
can re-record a row of data and get a NEW autonumber, then delete the old
record. Where's your foreign key now?!

> *Prevention of duplicates is improved, since data fields are used to check
> for duplicates, instead of an (always unique) autonumber field > this can
> also be achieved with the composite unique index as shown above in the
> Access
> books example.

Ahem! ?"Duplicates"? Are the following employees the same person:
John Doe
J. J. Doe
And what about John and his son John, who both work for your company, both
live at the same address, and both have the same last name. ?Duplicates?!

>
> Cons of BlueClaw approach
> *???
>
> I would appreciate your comments / opinion on the BlueClaw approach,
> because
> I currently have the feeling that I'm missing something that explains why
> so
> many people use autonumber fields as primary (artificial) key. If the
> BlueClaw approach is the best one, I'm considering to use it as a standard
> in
> new database design questions.

JOPO (just one person's opinion) Any approach to this that claims to be the
one and only appropriate way to do this is probably wrong! Use what works
for you.

Regards

Jeff Boyce
Microsoft Office/Access MVP

>
> Best regards,
>
> M.

M.

unread,

Dec 4, 2007, 4:56:05 PM12/4/07

to

Thanks for your reply, but unfortunately this doesn't answer my question. I'm
sorry to distract you with minor issues, like the SSN and name fields. These
were copied from the Blue Claw example on their website. Of course I'm aware
of full name issues, but that's not my question.

My main question is: are there negative aspects associated with using a
primary key based on data fields versus using a primary key based on an
artificial primary key as generated with an autonumber field? In both cases
the autonumber field would be used for defining relations between tables.

Until now my answer would be: there are no negative aspects associated with
the data fields approach.

Best regards,

M.

Dennis

unread,

Dec 4, 2007, 5:53:01 PM12/4/07

to

ANY data can be used as a Primary Key AS LONG AS you are 100% sure of no
duplication and/or have code in place to prevent such an occurrance. I've
been a database designer for over 30 years, and until I started working with
MS Access, there was NO SUCH THING as an "autonumber key." EVERY key/primary
key was based on data. And several of my projects written in Access use
data-based keys/primaries.

Douglas J. Steele

unread,

Dec 4, 2007, 5:53:43 PM12/4/07

to

"M." <M...@discussions.microsoft.com> wrote in message

news:124E0144-170B-4DC5...@microsoft.com...

>
> My main question is: are there negative aspects associated with using a
> primary key based on data fields versus using a primary key based on an
> artificial primary key as generated with an autonumber field? In both
> cases
> the autonumber field would be used for defining relations between tables.

There's no reason to have an Autonumber field AND a "natural" primary key.

If you're going to create relationships, they will always be based on the
primary key. You cannot change that.

As Jeff said, this really is a religious war, so I won't say any more. <g>

--
Doug Steele, Microsoft Access MVP
http://I.Am/DougSteele
(no private e-mails, please)

Boyd Trimmell aka HiTechCoach via AccessMonster.com

unread,

Dec 4, 2007, 5:57:21 PM12/4/07

to

A primary key is a SINGLE field used as the record identifier. It really
should have no meaning to the user and in most cases never seam by users.

In all the databases that have worked with the have multiple fields for a
primary key have made it a lot harder to do anything, even things that should
be simple. I would never create one that way. I probably am glad the there
are people that recommend it (like Blue Claw) or actually do it so that I
will get lots of work fixing the issues it causes.

--
Boyd Trimmell
aka HiTechCoach
http://www.hitechcoach.com
http://www.officeprogramming.com

Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/Forums.aspx/access-tablesdbdesign/200712/1

Michael Gramelspacher

unread,

Dec 4, 2007, 6:58:20 PM12/4/07

to

On Tue, 4 Dec 2007 17:53:43 -0500, "Douglas J. Steele"
<NOSPAM_djsteele@NOSPAM_canada.com> wrote:

>"M." <M...@discussions.microsoft.com> wrote in message
>news:124E0144-170B-4DC5...@microsoft.com...
>>
>> My main question is: are there negative aspects associated with using a
>> primary key based on data fields versus using a primary key based on an
>> artificial primary key as generated with an autonumber field? In both
>> cases
>> the autonumber field would be used for defining relations between tables.
>
>There's no reason to have an Autonumber field AND a "natural" primary key.
>
>If you're going to create relationships, they will always be based on the
>primary key. You cannot change that.
>
>As Jeff said, this really is a religious war, so I won't say any more. <g>

This works for me. It seems to go against what you are saying.

Sub CreateTest()
With CurrentProject.Connection

.Execute _
"CREATE TABLE Cities" & _
" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (city_name));"

.Execute _
"CREATE TABLE States" & _
" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
",state_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (state_name));"

.Execute _
" CREATE TABLE Locations" & _
" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_id INTEGER NOT NULL" & _
" REFERENCES Cities (city_id)" & _
",state_id INTEGER NOT NULL" & _
" REFERENCES States (state_id)" & _
",PRIMARY KEY (city_id, state_id));"

End With
End Sub

Tony Toews [MVP]

unread,

Dec 4, 2007, 8:22:45 PM12/4/07

to

M. <M...@discussions.microsoft.com> wrote:

>>BlueClaw setup for Employee table

BlueClaw are full of cr*p. The middle sentence in the following is
exceedingly arrogant.

"You may look at this design and say you have always seen the
Employee_ID set as the table primary key. No matter what your teacher
or books say - this would be incorrect. See why you almost never use
an artificially generated numeric ID as a primary key."

As far as thier "Table Design Example - Detail Table" goes it's based
on a very faulty assumption. That the employee only does one task
throughout the day. And that's seldom the case.

And they don't explain themselves very well either.

As Jeff states this can become a religious war.

In my opinion use autonumber primary keys on every table and unique
and duplicate indexes as appropriate.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/

Douglas J. Steele

unread,

Dec 4, 2007, 9:23:43 PM12/4/07

to

A primary key does NOT have to be a single field.

A primary key is simply a special unique index, and therefore can contain up
to ten separate fields.

Perhaps you're thinking of an Autonumber field when you say "It really

should have no meaning to the user and in most cases never seam by users."

--
Doug Steele, Microsoft Access MVP
http://I.Am/DougSteele
(no private e-mails, please)

"Boyd Trimmell aka HiTechCoach via AccessMonster.com" <u14500@uwe> wrote in
message news:7c3433c66efd9@uwe...

John W. Vinson

unread,

Dec 5, 2007, 2:10:17 AM12/5/07

to

On Tue, 04 Dec 2007 17:58:20 -0600, Michael Gramelspacher <gram...@psci.net>
wrote:

>
>This works for me. It seems to go against what you are saying.
>
>Sub CreateTest()
>With CurrentProject.Connection
>
>.Execute _
>"CREATE TABLE Cities" & _
>" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
>",city_name VARCHAR (30) NOT NULL" & _
>",PRIMARY KEY (city_name));"
>
>.Execute _
>"CREATE TABLE States" & _
>" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
>",state_name VARCHAR (30) NOT NULL" & _
>",PRIMARY KEY (state_name));"
>
>.Execute _
>" CREATE TABLE Locations" & _
>" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
>",city_id INTEGER NOT NULL" & _
>" REFERENCES Cities (city_id)" & _
>",state_id INTEGER NOT NULL" & _
>" REFERENCES States (state_id)" & _
>",PRIMARY KEY (city_id, state_id));"
>
>End With
>End Sub

Well, sure, it works. You're creating autonumber primary keys. It *works*,
everyone agrees with that; the question would be is it the *only* way that
works, or even the *best* way? Many would disagree.

In one particular case I'd really have to disagree with you. The US and
Canadian postal services have defined unique, stable (well *almost* stable),
short two-letter state codes. My States table would be more like

.Execute _
"CREATE TABLE States" & _

" (state_code CHAR(2) NOT NULL UNIQUE" & _

",state_name VARCHAR (30) NOT NULL" & _

",PRIMARY KEY (state_code));"

As for making the city_name or state_name the primary key of its table...
state_name you can get away with but certainly not city_name; there are many
cities named Springfield, and New Mexico even has two towns named Los Alamos
(the famous one with the rad lab, and a tiny village north of Las Vegas - no,
not the one in Nevada).

John W. Vinson [MVP]

Jamie Collins

unread,

Dec 5, 2007, 7:07:06 AM12/5/07

to

On Dec 5, 7:10 am, John W. Vinson <jvinson@STOP_SPAM.WysardOfInfo.com>
wrote:

> >With CurrentProject.Connection
>
> >.Execute _
> >"CREATE TABLE Cities" & _
> >" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
> >",city_name VARCHAR (30) NOT NULL" & _
> >",PRIMARY KEY (city_name));"
>
> >.Execute _
> >"CREATE TABLE States" & _
> >" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
> >",state_name VARCHAR (30) NOT NULL" & _
> >",PRIMARY KEY (state_name));"
>
> >.Execute _
> >" CREATE TABLE Locations" & _
> >" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
> >",city_id INTEGER NOT NULL" & _
> >" REFERENCES Cities (city_id)" & _
> >",state_id INTEGER NOT NULL" & _
> >" REFERENCES States (state_id)" & _
> >",PRIMARY KEY (city_id, state_id));"
>
> >End With
> >End Sub
>
> Well, sure, it works. You're creating autonumber primary keys.

No, he isn't.

> It *works*,
> everyone agrees with that

What exactly is everyone agreeing with, in your opinion?

Jeff Boyce said: "If you're going to create relationships, they will

always be based on the primary key. You cannot change that."

Michael Gramelspacher said (paraphrasing): "I can create a FOREIGN KEY
REFERENCE (a.k.a. Access Relationship with referential integrity
enforced) based on a UNIQUE constraint, which goes against what Jeff
Boyce said about relationships always being based on the primary key."

Jamie says: If by 'relationships' Jeff Boyce meant 'Access
Relationships' (i.e. entities created in the Relationships window in
the Access user interface) then there is no requirement for them to be
based on either PKs or even UNIQUEs; heck, the columns in the
respective tables don't even need to be the same data type!

Jamie.

--

Jeff Boyce

unread,

Dec 5, 2007, 7:51:46 AM12/5/07

to

Jamie

I'm old and forgetful, so help me remember...

I don't recall making the statement you quote me as making (or maybe you
know another Jeff Boyce):

"Jamie Collins" <jamiec...@xsmail.com> wrote in message
news:d7f2cfab-9e90-44fd...@s36g2000prg.googlegroups.com...

>
> Jeff Boyce said: "If you're going to create relationships, they will
> always be based on the primary key. You cannot change that."
>

I'm pretty sure I did not say that in the current thread. Is there some
thread out of my past in which I said this? Could you provide the context
in which I said this? (Again, I don't recall ...)

Regards

Jeff Boyce

unread,

Dec 5, 2007, 7:55:41 AM12/5/07

to

I may have misunderstood your post. I got the impression you were looking
to compare artificial vs. natural primary keys, trying to determine a "best
practice" approach.

--
Regards

Jeff Boyce
www.InformationFutures.net

Microsoft Office/Access MVP
http://mvp.support.microsoft.com/

Microsoft IT Academy Program Mentor
http://microsoftitacademy.com/

"M." <M...@discussions.microsoft.com> wrote in message

news:124E0144-170B-4DC5...@microsoft.com...

Jamie Collins

unread,

Dec 5, 2007, 8:12:22 AM12/5/07

to

On Dec 5, 12:51 pm, "Jeff Boyce" <JeffBoyce...@msn.com-

DISCARD_HYPHEN_TO_END> wrote:
> I don't recall making the statement you quote me as making (or maybe you
> know another Jeff Boyce):

Check for the quote upthread. You were calling yourself 'Douglas J.
Steele' at the time but you didn't fool me.

Seriously, apologies for attributing the quote to the wrong Access
MVP. You guys all look the same to me <g>.

Jamie.

--

Fred

unread,

Dec 5, 2007, 8:18:01 AM12/5/07

to

I've been both creating and using databases heavily for 17 years in business,
personal and organizational environments. And I've seen both ways done.
I'd like to weigh in on the side of primary keys not containing real world
data. If it's real world data, there will inevitably be an expectation that
it be current / correct. And that means at least occasionally correcting or
changing it. Even if you have the extremely rare case where the real world
data key is so stable it wil never never never change, (not even
state/province abbreviations are that clean/stable e.g. Quebec due to their
political problems) you still have the problem of making corrections to that
data for wrong entries.

I've also seen plans that try to blend the two. That generate a primary
key from actual data (e.g. a company name) but then never ever change it.
This presents a dilemma. You can't have you cake and eat it too: first,
does it show real world data or not? If the answer is yes, then the data is
wrong as soon as there is a change or a correction. If the answer is no,
then what are you putting real world info into the key.

If the key is abstract, then it is controlled by the database....there are
no hooks that could allow outsides forces to mess with it.

Fred

Jamie Collins

unread,

Dec 5, 2007, 8:59:38 AM12/5/07

to

On Dec 4, 9:56 pm, M. <M...@discussions.microsoft.com> wrote:
> My main question is: are there negative aspects associated with using a
> primary key based on data fields versus using a primary key based on an
> artificial primary key as generated with an autonumber field? In both cases
> the autonumber field would be used for defining relations between tables.

As a guess: for reasons of optimization because PRIMARY KEY determines
physical ordering on disk (clustering). See:

Microsoft Jet 3.5 Performance Overview and Optimization Techniques
http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/dnaraccessdev/html/ODC_MicrosoftOfficeDeveloperForumMicrosoftAccessMicrosoftJetDatabaseEngine.asp

"From a performance perspective, there are many reasons to frequently
compact a database. One reason is that compacting will create a new
database that stores all table rows in a contiguous order. If a
primary key or unique index is defined, the rows will be sorted in
order of the primary key or unique index. This allows Microsoft Jet to
take full advantage of its read-ahead cache and also reduces disk I/O
when doing sequential scans of a table."

Put the other way around, assigning the table's PRIMARY KEY
designation to the Autonumber column could have a negative impact on
the performance of queries which use a sequential scan on a table e.g.
using the BETWEEN keyword in SQL code.

Jamie.

--

Jamie Collins

unread,

Dec 5, 2007, 9:00:31 AM12/5/07

to

On Dec 4, 9:15 pm, "Jeff Boyce" <nonse...@nonsense.com> wrote:

Calling all Jeff Boyces...

> >>BlueClaw setup for Employee table
> > Employee_ID (autonumber, unique index)
>
> Why? By definition, an Autonumber is supposed to already be unique, so

An Autonumber is supposed to auto-generate a value according to an
algorithm; the choices are increment, random or GUID. Show me the
section of the Jet specification which says Autonumber is supposed to
be unique. If you like I can post some code to demonstrate the fact
that Jet can auto-generate duplicate Autonumber values (hint: you
change increment value from the default value of one to a value very
close to the maximum for Long Integer).

> you
> wouldn't gain anything by indexing it.

Can you say "data integrity"? what makes you think that values in an
Autonumber column will be unique unless you put a unique index (or
constraint) on it? Again, I can post code to explicitly insert
duplicate values into an Autonumber column if you like.

Jamie.

--

Jamie Collins

unread,

Dec 5, 2007, 9:53:42 AM12/5/07

to

On Dec 4, 10:53 pm, Dennis <Den...@discussions.microsoft.com> wrote:
> ANY data can be used as a Primary Key AS LONG AS you are 100% sure of no
> duplication and/or have code in place to prevent such an occurrance.

IIRC a column of type OLEOBJECT cannot be PRIMARY KEY.

Perhaps you meant to say "as long as the engine can be 100% sure of no
duplication"? While I can be sure that I am putting unique values into
a MEMO column, the engine only considers the first 255 characters when
checking a PRIMARY KEY constraint. Also I'd be vary wary of
approximate numeric types (e.g. REAL and FLOAT).

Jamie.

--

Jeff Boyce

unread,

Dec 5, 2007, 9:54:17 AM12/5/07

to

Oh, you are in so much trouble now!

Just because Douglas and I (Jeff) have traditionally male surnames doesn't
mean we're males... (oh wait, we are, never mind<g>)

Regards

Jeff

"Jamie Collins" <jamiec...@xsmail.com> wrote in message

news:fc2ab864-f094-4d66...@s12g2000prg.googlegroups.com...

Jeff Boyce

unread,

Dec 5, 2007, 9:56:44 AM12/5/07

to

Thanks! I suspect you're right.

I don't use Autonumbers, ... except as Primary Keys, so they are, by (my)
definition, unique!

I keep forgetting that not everyone shares MY definition!

--
Regards

Jeff Boyce
www.InformationFutures.net

Microsoft Office/Access MVP
http://mvp.support.microsoft.com/

Microsoft IT Academy Program Mentor
http://microsoftitacademy.com/

"Jamie Collins" <jamiec...@xsmail.com> wrote in message
news:9a015145-d39b-4ae8...@w40g2000hsb.googlegroups.com...

Jamie Collins

unread,

Dec 5, 2007, 10:09:18 AM12/5/07

to

On Dec 5, 2:54 pm, "Jeff Boyce" <JeffBoyce...@msn.com-

DISCARD_HYPHEN_TO_END> wrote:
> Just because Douglas and I (Jeff) have traditionally male surnames doesn't
> mean we're males... (oh wait, we are, never mind<g>)

I've also seen the 'traditionally male' beards in the MVP mug
shots ;-)

Jamie.

--

Jeff Boyce

unread,

Dec 5, 2007, 11:05:38 AM12/5/07

to

Facial hair is not restricted to males...

(hence the facination of the "Bearded Lady" in carnivals!)

You are STILL in trouble!

Regards

Jeff Boyce
Microsoft Office/Access MVP

"Jamie Collins" <jamiec...@xsmail.com> wrote in message
news:9a49e146-37f1-496d...@o42g2000hsc.googlegroups.com...

Jamie Collins

unread,

Dec 5, 2007, 11:22:53 AM12/5/07

to

On Dec 5, 4:05 pm, "Jeff Boyce" <nonse...@nonsense.com> wrote:
> Facial hair is not restricted to males...

...in the set of Access MVPs?

Jamie.

--

Dennis

unread,

Dec 5, 2007, 11:48:03 AM12/5/07

to

I was speaking from a pure theory perspective, not from an Access-specific one.

OLE objects aren't "data" per se; they're, well, OBJECTS. And memo fields
cannot be used as indexes or searched on, so as a key they'd be impossible to
use. In the "old-timey world" one needed to keep one's keys as short as
possible while being meaningful. A key-length over about 15 characters almost
guaranteed terrible application performance. But that was then....

John W. Vinson

unread,

Dec 5, 2007, 12:40:47 PM12/5/07

to

On Wed, 5 Dec 2007 04:07:06 -0800 (PST), Jamie Collins
<jamiec...@xsmail.com> wrote:

> "If you're going to create relationships, they will
>always be based on the primary key. You cannot change that."
>
>Michael Gramelspacher said (paraphrasing): "I can create a FOREIGN KEY
>REFERENCE (a.k.a. Access Relationship with referential integrity
>enforced) based on a UNIQUE constraint, which goes against what Jeff
>Boyce said about relationships always being based on the primary key."

You're right (about the meaning if not the attribution), Jamie - I
misinterpreted Michael's point.

John W. Vinson [MVP]

John W. Vinson

unread,

Dec 5, 2007, 12:44:25 PM12/5/07

to

On Wed, 5 Dec 2007 05:18:01 -0800, Fred <Fr...@discussions.microsoft.com>
wrote:

> And that means at least occasionally correcting or
>changing it.

well... there *is* Cascade Updates to cover that possibility.

But I agree, it's ideal if the primary key is rock solid stable. If it's only
San Andreas faultline rock solid stable, you need to depend on cascade
updates, which do work but have their own problems!

John W. Vinson [MVP]

M.

unread,

Dec 5, 2007, 5:11:01 PM12/5/07

to

Thanks for this helpful answer.

Is the following summary correct:

The (unique) index that would be used most frequently for sorting or
filtering the table records, should be defined as primary index from a
performance point of view, because the primary index determines the
sequential physical order on harddisk ?

M.

Fred

unread,

Dec 5, 2007, 5:57:01 PM12/5/07

to

I'm from Chicago, we don't know what fault lines are. Could you use a
tornado analogy so that I can understand?

:-)

Fred

Jeff Boyce

unread,

Dec 5, 2007, 6:10:49 PM12/5/07

to

(psst, Fred, can you say New Madrid?)

Jeff
"Fred" <Fr...@discussions.microsoft.com> wrote in message
news:2F3B5D7D-5C2B-4E96...@microsoft.com...

David W. Fenton

unread,

Dec 5, 2007, 9:52:56 PM12/5/07

to

"Jeff Boyce" <nons...@nonsense.com> wrote in
news:#B2dKrrN...@TK2MSFTNGP02.phx.gbl:

> "M." <M...@discussions.microsoft.com> wrote in message
> news:5C60984F-9FD8-4E90...@microsoft.com...

>>>BlueClaw setup for Employee table
>> Employee_ID (autonumber, unique index)
>
> Why? By definition, an Autonumber is supposed to already be

> unique, so you wouldn't gain anything by indexing it.

Eh? The unique index has an effect on the type of relationship. If
both sides of the join lack a unique index, it's an undefined
relationship type. If one side does, it's a many-to-one. If both
sides do, it's a 1:1.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

David W. Fenton

unread,

Dec 5, 2007, 9:57:16 PM12/5/07

to

=?Utf-8?B?TS4=?= <M...@discussions.microsoft.com> wrote in
news:124E0144-170B-4DC5...@microsoft.com:

> My main question is: are there negative aspects associated with
> using a primary key based on data fields versus using a primary
> key based on an artificial primary key as generated with an
> autonumber field? In both cases the autonumber field would be used
> for defining relations between tables.
>

> Until now my answer would be: there are no negative aspects
> associated with the data fields approach.

Have you tried it? For instance, try creating a join between two
tables where the key they are joined on has 5 fields in it.

My philosophy:

Single-field natural PKs are OK, though they should be for data that
is never updated once entered.

Multi-column keys are a horrid amount of work, and duplicate a
helluva lot of data.

It also makes your joins less efficient.

Last of all, very few of the tables in the apps I'm asked by clients
to build can have no Nulls, so natural keys are only usable if you
assign a default value. If you use a zero-length string, then that
means you can't query on Null, and it means that you can't propagate
Nulls in concatenation operations without using an intermediary
function to convert ZLS's to Null.

David W. Fenton

unread,

Dec 5, 2007, 10:02:54 PM12/5/07

to

=?Utf-8?B?TS4=?= <M...@discussions.microsoft.com> wrote in
news:5CE1CB34-7B5F-41DE...@microsoft.com:

> The (unique) index that would be used most frequently for sorting
> or filtering the table records, should be defined as primary index
> from a performance point of view, because the primary index
> determines the sequential physical order on harddisk ?

No, unless you're limiting the discussion to Jet.

In SQL Server, for instance, you can cluster on any unique index
(i.e., write the table in that index's order). It doesn't have to be
the PK.

Jamie Collins

unread,

Dec 6, 2007, 4:30:40 AM12/6/07

to

On Dec 5, 10:57 pm, Fred <F...@discussions.microsoft.com> wrote:
> > But I agree, it's ideal if the primary key is rock solid stable. If it's only
> > San Andreas faultline rock solid stable, you need to depend on cascade
> > updates, which do work but have their own problems!
>

> I'm from Chicago, we don't know what fault lines are. Could you use a
> tornado analogy so that I can understand?

I'm from the UK, where both weather and geology is relatively benign.
So that we all may understand the point, can you please use an example
(in place of an analogy) of an industry standard key with a trusted
source which manages changes. As a suggestion, consider ISBN which
recently changed from 10 to 13 characters (which Amazon and many
public libraries seem to have survived <g>) and for which duplication
occasionally occurs.

Jamie.

--

Jamie Collins

unread,

Dec 6, 2007, 4:45:24 AM12/6/07

to

On Dec 5, 10:11 pm, M. <M...@discussions.microsoft.com> wrote:
> Thanks for this helpful answer.
>
> Is the following summary correct:
>
> The (unique) index that would be used most frequently for sorting or
> filtering the table records, should be defined as primary index from a
> performance point of view, because the primary index determines the
> sequential physical order on harddisk ?

Not exactly. To repeat: "This allows Microsoft Jet to take full

advantage of its read-ahead cache and also reduces disk I/O when doing

sequential scans of a table." The best example I can think of for
sequential scans of a table is using BETWEEN in SQL because the data
would be on contiguous pages. Tthink of a paper copy telephone
directory ordered by last name then first names and I asked you to get
me all the telephone numbers of people whose last names begin with the
letter 'C' and how easy it would be to rip out the appropriate pages,
then think about how difficult it would be to do the same if the task
was to get all the telephone numbers which began with the numeral '5'.
The next best example would be GROUP BY.

I suggest you do some research on clustered indexes *generally*; you
won't find much in the Jet world because most users disregard or are
unaware of (or even are in denial of <g>) the effects of clustering in
Jet. David W. Fenton's suggestion of looking at SQL Server (where you
have an explicit choice of clustered index) seems a good one. Which
reminds me, David came up with the brilliant suggestion of making a
random Autonumber column the sole PK to favour concurrency (because
values created close together in transaction time have more chance of
appearing on *different* pages).

Jamie.

--

Jamie Collins

unread,

Dec 6, 2007, 4:59:47 AM12/6/07

to

On Dec 6, 2:57 am, "David W. Fenton" <XXXuse...@dfenton.com.invalid>
wrote:
> =?Utf-8?B?TS4=?= <M...@discussions.microsoft.com> wrote innews:124E0144-170B-4DC5...@microsoft.com:

> Have you tried it? For instance, try creating a join between two
> tables where the key they are joined on has 5 fields in it.
>

> Multi-column keys are a horrid amount of work, and duplicate a
> helluva lot of data.

Isn't this the problem Access Relationships were invented to solve?
i.e. defining the Relationship on those five columns is a one-time
task, after which you simply add the tables to the query builder tool
thing and it types out the JOIN syntax for you. Me, I can type fast
(and the parser does spell chekcing for me <g>) so I don't really mind
the 'hard' work. And if you are concerned about disk space, port to a
back end without a 2GB (or whatever) limit and buy a bigger drive
(relatively inexpensive these days).

Using meaningless keys in your tables results in many table joins to
get basic key data so you have to balance these things out. I'm wary
of people who use the same hammer to bash every kind of fixing <g>.

> Last of all, very few of the tables in the apps I'm asked by clients
> to build can have no Nulls, so natural keys are only usable if you
> assign a default value.

I also have only a few nullable columns (subatomic elements, usually
of type DATETIME) and absolutely zero columns that allow a zero length
string but still your point alludes me. I suspect the answer is that
you are have to denormalized tables but could you please clarify with
an example? TIA.

Jamie.

--

Jamie Collins

unread,

Dec 6, 2007, 6:10:46 AM12/6/07

to

On Dec 5, 4:48 pm, Dennis <Den...@discussions.microsoft.com> wrote:
> I was speaking from a pure theory perspective, not from an Access-specific one.

I get into trouble myself for doing that <g>.

> OLE objects aren't "data" per se; they're, well, OBJECTS.

LOL! Did I interpret the word 'data' too literally <g>? Is OLEOBJECT a
*data* type? Can columns of type OLEOBJECT be found in a Jet
'database'?

> And memo fields
> cannot be used as indexes or searched on

I don't know what you mean but memo fields can be indexed and can be
used in WHERE and HAVING clauses; you can cast them using CSTR() in
SQL code to prevent them from being truncated in certain
circumstances.

Jamie.

--

Rod Plastow

unread,

Dec 6, 2007, 7:17:01 AM12/6/07

to

Jamie,

You forgot to mention the penultimate phone number change that meant phone
numbers would never have to change again - that is until the next phone
number change about four years later. :-)

Let me nail my colours to the mast; I belong to the religious sect of a
'meaningless' pk. I believe the job of a pk is to identify the row and
nothing more and I've the dog collar to demonstrate my faith - well that is
until I come across a situation where to include a meaningless key simply
doubles the effort such as an accounting period table that I key with yyyymm
- surely there will never be another Pope Gregory!

As for 'Rock Solid:' I'm in the Philippines and we experienced an earthquake
last week so geological analogies do not work here.

Regards,

Rod

Jamie Collins

unread,

Dec 6, 2007, 8:21:34 AM12/6/07

to

On Dec 6, 12:17 pm, Rod Plastow <RodPlas...@discussions.microsoft.com>
wrote:

> Let me nail my colours to the mast; I belong to the religious sect of a
> 'meaningless' pk. I believe the job of a pk is to identify the row and
> nothing more and I've the dog collar to demonstrate my faith

I'm borderline 'atheist' because I don't use the PRIMARY KEY on every
table, and I have a relationally sound basis for doing so because a
key is a key is a key; calling one 'primary' is arbitrary.

But really it has nothing to do with 'religion': PRIMARY KEY behaves
the same in Jet regardless of your beliefs. So let's stick to the
facts.

What does PRIMARY KEY do for you that other keys ('non-primary keys'?)
do not? Doesn't allow the NULL value but I've got NOT NULL at the
column level for that. Let's you omit the PK columns in the RFERENCES
clause in SQL DDL but I prefer to write things out longhand. Puts the
columns in bold text in the Access UI but I don't look at the
Relationships picture (I find SQL code easier to interpret). So
clustering is the only aspect of PRIMARY KEY which I find appealing
for Jet. But try telling someone to use NOT NULL UNIQUE for their
existing PKs and leveraging the clustering nature of PRIMARY KEY as a
'physical' index and in most cases it's too much of a paradigm shift
for them to comprehend.

Jamie.

--

Fred

unread,

Dec 6, 2007, 10:11:00 AM12/6/07

to

Can someone tell me what this mere mortal is missing regarding how to look at
these long threads? The summary page says 40 posts (which looks correct)
but a look at the posts (2 pages with only one "more" between them) only
shows 30, and some that I know were there before are gone.

Fred

unread,

Dec 6, 2007, 10:33:00 AM12/6/07

to

That's about 325 miles away. Although the shocks will travel farther in
Midwest geology, that's still like another country to us.

:-)

Fred

Jamie Collins

unread,

Dec 6, 2007, 10:43:05 AM12/6/07

to

On Dec 6, 3:11 pm, Fred <F...@discussions.microsoft.com> wrote:
> Can someone tell me what this mere mortal is missing regarding how to look at
> these long threads?

This is what I see:

http://groups.google.com/group/microsoft.public.access.tablesdbdesign/browse_frm/thread/8d21e479a6765df2/

Jamie.

--

Dennis

unread,

Dec 6, 2007, 11:06:07 AM12/6/07

to

> > And memo fields
> > cannot be used as indexes or searched on
>
> I don't know what you mean but memo fields can be indexed and can be
> used in WHERE and HAVING clauses; you can cast them using CSTR() in
> SQL code to prevent them from being truncated in certain
> circumstances.

Then you're a better man that I, because I could never get them to work for
me in that regard. But that wasn't the first time I've said that, and it
CERTAINLY won't be the last - heh....

Dennis

Dale Fye

unread,

Dec 6, 2007, 1:34:02 PM12/6/07

to

M.

I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

This seems like a lot of extra work to me, so I avoid it where I can.

It also occurs to me that if you want to use the composite key, then if you
want to use this stuff in a list or combo box, and want to be able to select
the specific record, then you need to include all of the fields necessary for
the PK (which increases the load on your network).

Just my 2 cents!

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.

"M." wrote:

> Dear all,
>
> Although many Microsoft Access books advise to set an autonumber field as
> primary index (a so called pseudo primary key),
> http://www.blueclaw-db.com/database_link_tables.htm advises to use real data
> to define a (composite) primary key. In summary, this results in the
> following two designs:
>
> >Microsoft Acces books setup for Employee table
> Employee_ID (autonumber, primary key)
> SSN (social security number, composite index key1)
> Employee_Name (full employee name, composite index key2)
> composite index SSN + Employee_Name = unique

>
> >BlueClaw setup for Employee table
> Employee_ID (autonumber, unique index)

> SSN (social security number, (composite) primary key1)
> Employee_Name (full employee name, (composite) primary key2)
>
> In both approaches, Employee_ID would be used as a foreign key in other
> tables to define the relationship with the Employee table.
>
> Are there any negative aspects associated with the BlueClaw approach?
>
> Pros of BlueClaw approach
> *Display of table is meaningful, because it's sorted on primary index
> *No cascaded update necessary of linked relationship fields in other tables,
> because autonumber is only used for linking tables and therefore will never
> change.
> *Prevention of duplicates is improved, since data fields are used to check
> for duplicates, instead of an (always unique) autonumber field > this can
> also be achieved with the composite unique index as shown above in the Access
> books example.
>
> Cons of BlueClaw approach
> *???
>
> I would appreciate your comments / opinion on the BlueClaw approach, because
> I currently have the feeling that I'm missing something that explains why so
> many people use autonumber fields as primary (artificial) key. If the
> BlueClaw approach is the best one, I'm considering to use it as a standard in
> new database design questions.
>
> Best regards,
>
> M.

no email@aol.com Pat Hartman

unread,

Dec 6, 2007, 1:50:50 PM12/6/07

to

I'm not sure why I'm jumping in here and I'm not sure if someone has already
said what I am going to say so I apologize in advance. I also have over 30
years of database design experience and in my mainframe days, all primary
keys were "natural" and many were compound. It took me a few years of using
Access to become converted to the "autonumber as PK camp" but I have. I
find autonumbers extremely easy to work with. Access does have an issue
with autonumbers that are not defined as primary keys so I would NEVER use a
"natural" PK in a table that also had an autonumber. the autonumber would
ALWAYS be the PK and would be used for all relationships. When I have
business rules to satisfy that revolve around uniqueness of "natural" keys,
I use unique indexes.
The benefits of using an autonumber PK are:
1. there is never a need to cascade PK updates since the PK is an autonumber
and by definition cannot be changed.
2. Business rules regarding uniqueness can be implemented with unique
indexes.
3. In the cases where one of the "natural" keys may validly be blank, I
don't have an issue with a unique index whereas I would have an issue if I
defined a compound primary key since PKs may not contain null values. SSN
is a good example here (aside from the fact that there are laws prohibiting
its use as a PK) since not everyone will have one (h1b visa holders for one)
and there are only limited cases where a person actually has to provide one
to the requestor.
4. Combo/List boxes require code behind the scenes to work if the unique
identifier is compound.
5. As the hierarchy gets deeper, the compound key grows and joins become
more unwieldy. It is way too easy to omit one field out of 5 in a join with
disastrous results if the error isn't discovered quickly.

The only pro for using autonumber PKs is that a lot of your tables will end
up with lots of numeric columns that you would prefer to view as text when
you open the table. The solution is NOT table level lookups, it is creating
queries that join the tables so you can see the identifying "natural" key
data from the parent table. You only have to create one of these views for
each table so it's not like it is a great deal of work. Then for yourself
(because ONLY you should be viewing tables or queries directly), just
remember to open the query rather than the table when you want to see the
natural key values.

That's my opinion and I'm stickin' to it:)

"M." <M...@discussions.microsoft.com> wrote in message
news:5C60984F-9FD8-4E90...@microsoft.com...

M.

unread,

Dec 6, 2007, 2:04:01 PM12/6/07

to

Dear Dale,

You could still use your preference for the (unique) autonumber field in the
WHERE statement (as suggested for defining relations in my example), even if
it's not the primary key. The same goes for the lookup box example. I'm in
the mean time convinced that it's a matter of taste (and maybe of performance
for the Jet engine) to decide between an artificial primary key and a real
data primary key.

What still interests me, however, is how pepole prevent entry of duplicate
records when they use a autonumber field as primary index. This choice
doesn't offer you any guarantee that the real data in your record is
accidentally entered a second time in a later record. In my opinion you would
still need a unique single (e.g. ISBN number) or multifield index (e.g. book
title + author name) to prevent duplicate entry. In the end, it's then a
matter of taste whether you name the autonumber field index primary or the
composite field(s) index primary.

If you disagree, please let me know,

regards,

M.

Tony Toews [MVP]

unread,

Dec 6, 2007, 3:03:07 PM12/6/07

to

Dale Fye <dale...@nospam.com> wrote:

>I always love these discussions, because you have some purists, some
>"religious fanatics", and some simpletons; call me a simpleton. The reason I
>use "meaningless" primary keys is that it saves me work!!!

<chuckle> Excellent comment.

>When I want to update a record, or select a record with an autonumber PK, I
>type:
>
>WHERE ID = 123
>
>If I use a composite PK, I have to type:
>
>WHERE [somefield] = somevalue
>AND [someotherfield] = somevalue
>AND [athirdfield] = somevalue
>AND ....

Consider if you have a parent and subform based a composite PK?
That's a lot of extra work there figuring out what fields go where.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/

Dale Fye

unread,

Dec 6, 2007, 3:04:00 PM12/6/07

to

I am certain that there are some performance issues associated with either
decision.

I generally use the Autonumber field as my PK, but use multi-field unique
indexes, and application level business logic, to prevent "duplicate
entries". As long as you restrict your users access to the raw data, it is
frequently easier to implement the business logic in the application rather
then through table constraints.

Dale Fye

unread,

Dec 6, 2007, 3:07:03 PM12/6/07

to

Amen, brother!

--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.

no email@aol.com Pat Hartman

unread,

Dec 6, 2007, 3:22:40 PM12/6/07

to

BTW, I meant con not pro.

"Pat Hartman" <please no em...@aol.com> wrote in message
news:uKz0sjDO...@TK2MSFTNGP06.phx.gbl...

Fred

unread,

Dec 6, 2007, 3:36:01 PM12/6/07

to

Thanks,

That Google portal is a whole new way to look at it.

Fred

David W. Fenton

unread,

Dec 6, 2007, 6:39:23 PM12/6/07

to

=?Utf-8?B?TS4=?= <M...@discussions.microsoft.com> wrote in
news:2D3FCA4D-2535-41D3...@microsoft.com:

> What still interests me, however, is how pepole prevent entry of
> duplicate records when they use a autonumber field as primary
> index. This choice doesn't offer you any guarantee that the real
> data in your record is accidentally entered a second time in a
> later record. In my opinion you would still need a unique single
> (e.g. ISBN number) or multifield index (e.g. book title + author
> name) to prevent duplicate entry. In the end, it's then a matter
> of taste whether you name the autonumber field index primary or
> the composite field(s) index primary.

Well, if you're using a surrogate key, that doesn't mean you leave
out the unique indexes on fields and combinations of fields that
should also be unique. However, in many cases, composite keys can't
be unique because you have to allow storage of Nulls. In that case,
your app has to do duplicate checking of some sort. Given that with
things like names, you can't rely on a unique index even when Nulls
are disallowed (because Bob and Robert could still be the same
person), you often end up putting some form of duplicate checking
into your app even when you already have unique indexes on the
fields you're checking.

So, to me, having unique indexes is a question that is answered
field by field (or composite key by composite key), and is wholly
independent of which unique index you happen to use for your primary
key. The unique indexes are a function of the data, while the
primary key is a function of your database structure. One field or
collection of fields may serve both functions, but they don't have
to.

Jamie Collins

unread,

Dec 7, 2007, 3:37:22 AM12/7/07

to

On Dec 6, 4:06 pm, Dennis <Den...@discussions.microsoft.com> wrote:
> > I don't know what you mean but memo fields can be indexed and can be
> > used in WHERE and HAVING clauses; you can cast them using CSTR() in
> > SQL code to prevent them from being truncated in certain
> > circumstances.
>
> Then you're a better man that I, because I could never get them to work for
> me in that regard. But that wasn't the first time I've said that, and it
> CERTAINLY won't be the last - heh....

Try this:

CREATE TABLE Test1 (memo_col MEMO NOT NULL)
;
CREATE INDEX idx__test1__memo_col ON Test1 (memo_col)
;
CREATE TABLE Test2 (memo_col MEMO NOT NULL PRIMARY KEY)
;
CREATE TABLE Test3 (memo_col MEMO NOT NULL UNIQUE)
;

All the above work in Jet. The real issue is that Jet only tests the
first 255 characters of a MEMO column when testing a unique constraint
and in other contexts e.g. GROUP BY, UNION (but not UNION ALL), etc.
For example:

INSERT INTO Test2 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345x')
;
INSERT INTO Test2 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345z')
;

The above represent different values when considering all characters
but the engine considers them non-unique when checking the PK because
only the first 255 characters are used in the check. Off the top of my
head (so probably not the best), here's one way of achieving a unique
constraint with a MEMO column (ANSI-92 Query Mode syntax):

CREATE TABLE Test4 (
memo_col MEMO NOT NULL,
CONSTRAINT Test4__memo_col__unique
CHECK (1 = (
SELECT COUNT(*)
FROM Test4 AS T4
WHERE CSTR(Test4.memo_col) = CSTR(T4.memo_col))))
;
INSERT INTO Test4 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345x')
;
INSERT INTO Test4 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345z')
;

Jamie.

--

Jamie Collins

unread,

Dec 7, 2007, 4:04:59 AM12/7/07

to

On Dec 6, 6:34 pm, Dale Fye <dale....@nospam.com> wrote:
> I always love these discussions, because you have some purists, some
> "religious fanatics", and some simpletons; call me a simpleton. The reason I
> use "meaningless" primary keys is that it saves me work!!!
>
> When I want to update a record, or select a record with an autonumber PK, I
> type:
>
> WHERE ID = 123
>
> If I use a composite PK, I have to type:
>
> WHERE [somefield] = somevalue
> AND [someotherfield] = somevalue
> AND [athirdfield] = somevalue
> AND ....
>
> This seems like a lot of extra work to me, so I avoid it where I can.

Adding an artificial key to each table introduces data elements not
present in the reality being modelled. This seems like a lot of extra
work to me, so I avoid it where I can. The obvious example is a single-
column lookup table.

I've just got back from a project where an entity's four-column
'natural' key was replaced with a single-column meaningless key. Note,
however, the existing model was flawed because the real world key
would require six columns (not that columns exist in the real world of
course <g>) and the required data is not currently being captured. So
I guess I'm not a purist because, although I am inclined towards a
natural key e.g. a single column lookup table) I can open my mind up
to an artificial key when appropriate.

If you imagine I spent my time on this project 'simplifying' SQL
queries by trimming down WHERE clauses and reducing the SELECT clause
by three columns then you'd be dead wrong. Those columns involved in
the 'natural' key are still required by the application in most
scenarios so a lot of the donkey work has been creating JOINs back to
the entity tables to find out the real key values, sometimes six
tables deep. I've yet to do the metrics but it seems to me that the
additional table joins have had a noticeable negative impact on
performance.

Another minor issue I have is that the tables are less readable. For
example, looking at the InvoicePayments table I now only see the
meaningless invoice_ID whereas I used to see customer_ number; to view
that I now have to create four JOINs, and typing those ON clauses
wastes any effort I may have saved in the now simplified WHERE clause.

Jamie.

--

Jamie Collins

unread,

Dec 7, 2007, 4:18:30 AM12/7/07

to

On Dec 6, 6:50 pm, "Pat Hartman" <please no em...@aol.com> wrote:
> The only [con] for using autonumber PKs is that a lot of your tables will end

> up with lots of numeric columns that you would prefer to view as text when
> you open the table.

I've suggested another in this thread: that you have no influence over
physical ordering on disk in Jet other than by using PRIMARY KEY and
there is virtually no case in favour of clustering on a incremental
Autonumber column, being the Autonumber algorithm of choice because it
generates low value positive integers that are easy to type -- see
Dale Fye's 2 cents in this thread. In other words, putting the PK on
an Autonumber at best wastes an opportunity for better optimization
and at worst causes poor performance.

> Access does have an issue
> with autonumbers that are not defined as primary keys

What is the issue you are alluding too? TIA.

Jamie.

--

Jamie Collins

unread,

Dec 7, 2007, 4:35:50 AM12/7/07

to

On Dec 6, 6:50 pm, "Pat Hartman" <please no em...@aol.com> wrote:

> the autonumber would
> ALWAYS be the PK and would be used for all relationships. When I have
> business rules to satisfy that revolve around uniqueness of "natural" keys,
> I use unique indexes.

The main issue I have with the 'Autonumber PK' movement is that the
message about having table constraints on the candidate keys (and
other business rules) doesn't always come across. You've said the
right things (IMO) about so-called surrogates in this thread but I
would wager than most users in the 'Autonumber PK' camp put a PK on
the Autonumber and think, "Job done" as regards table constraints.

Another significant issue is when Autonumber values are exposed to
users. Most informed people agree this should be avoided but again the
message isn't always cleat and so it does happen. In an ideal world,
surrogates would be implemented under the hood and not even be exposed
to DBA, developers, etc. I encourage people to use the random flavour
of Autonumber to discourage this practice (and persistent offenders
should be forced to use GUIDs <g>).

Jamie.

--

Jamie Collins

unread,

Dec 7, 2007, 4:51:20 AM12/7/07

to

On Dec 6, 11:39 pm, "David W. Fenton" <XXXuse...@dfenton.com.invalid>
wrote:
> =?Utf-8?B?TS4=?= <M...@discussions.microsoft.com> wrote innews:2D3FCA4D-2535-41D3...@microsoft.com:

> The unique indexes are a function of the data, while the
> primary key is a function of your database structure.

I don't see how the fact that your SQL DBMS of choice implements
unique constraints using contiguous storage and indexes has anything
to do with the 'function of the data'.

Jamie.

--

Jamie Collins

unread,

Dec 7, 2007, 4:57:19 AM12/7/07

to

On Dec 6, 8:03 pm, "Tony Toews [MVP]" <tto...@telusplanet.net> wrote:
> Consider if you have a parent and subform based a composite PK?
> That's a lot of extra work there figuring out what fields go where.

This implies you design your database schema (tables etc) for the
convenience of your front end (forms and subforms) and/or for your own
convenience?! My approach is to design the database schema entirely
independent of the front ends (which is done by another person
anyhow).

Jamie.

--

no email@aol.com Pat Hartman

unread,

Dec 7, 2007, 10:59:31 AM12/7/07

to

I agree with you that way too many people do not understand the need for
unique indexes to support business rules when using autonumbers as primary
keys. I don't believe I said that autonumbers should be exposed.
Presumably, if there are candidate keys available, they are what the user
would see and use.

Clustered indexes are a double-edged sword and we won't discuss how they
work in "real" databases. In a Jet database, each table is reordered in
sequence by its primary key whenever the database is compacted. That is
what fools people into thinking queries without sort orders will always
return rows in a predictable order. The reordering acts sort of like a
clustered index. However, unlike "real" databases, all new records and some
changed records (if the record size increases) are stored outside of the
clustered sequence and must be reordered when the database is compacted.
Clustered indexes only provide benefit when large numbers of rows are
regularly retrieved in key sequence. When records are retrieved randomly
(one at a time) or based on other attributes, the clustering has no impact
whatsoever and alternate indexes are much more important for optimizing
query performance.

The most useful clustered index that I have ever used segregated a 9 million
row transaction table into months (days actually). Almost all reporting
access was to the most recent three months so the index was extremely
useful. And since the transaction date didn't change, there wasn't an issue
with outliers.

"Jamie Collins" <jamiec...@xsmail.com> wrote in message
news:b6165f27-f889-4fe0...@t1g2000pra.googlegroups.com...

Jamie Collins

unread,

Dec 7, 2007, 11:32:05 AM12/7/07

to

On Dec 7, 3:59 pm, "Pat Hartman" <please no em...@aol.com> wrote:
> > users. Most informed people agree this should be avoided but again the

> > message isn't always clear

>
> I don't believe I said that autonumbers should be exposed.

You hadn't said that Autonumbers shouldn't be exposed either ;-)

> Clustered indexes are a double-edged sword

Better IMO to try to wield the sword rather than ignore it and risk
stabbing yourself in the foot <g>.

[BTW I never got that idiom: surely a double-edged blade pierces the
victim much more effectively than a single edged blade, making a
better weapon of attack...?]

Jamie.

--

John W. Vinson

unread,

Dec 7, 2007, 12:22:24 PM12/7/07

to

On Fri, 7 Dec 2007 01:35:50 -0800 (PST), Jamie Collins
<jamiec...@xsmail.com> wrote:

> I encourage people to use the random flavour
>of Autonumber to discourage this practice (and persistent offenders
>should be forced to use GUIDs <g>).

eeep! Is that allowed under the Geneva Convention?

<bg>

John W. Vinson [MVP]

Dale Fye

unread,

Dec 7, 2007, 12:37:00 PM12/7/07

to

I disagree, on two points.

1. I design my schemas as close to 3rd or 4th normal form as feasible.
But, rather than carry around the excess baggage of a 3, 4, or 5 field PK, I
have elected to take the path that is most efficient given my chosen
front/back tools (Access). However, even when I get the opportunity to take
advantage of the server side processing that SQL Server provides, I still use
a 'meaningless' PKs to speed development (makes my job easier) and minimize
the risk of mistakes (as one of the other posts mentioned, an update query
where one of the joins is missing can be disasterous).

2. I really enjoy the challenge of developing an application that is both
functional and user friendly. Unlike most "programmers", I also consider
myself to be extremely literate in the "database" side of things, and take
great exception with programmers who kludge their database together based on
their desired form layout.

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.

Tony Toews [MVP]

unread,

Dec 7, 2007, 1:46:11 PM12/7/07

to

Jamie Collins <jamiec...@xsmail.com> wrote:

>> Consider if you have a parent and subform based a composite PK?
>> That's a lot of extra work there figuring out what fields go where.
>
>This implies you design your database schema (tables etc) for the
>convenience of your front end (forms and subforms) and/or for your own
>convenience?!

That would be "and" not "and/or". And yes that's a small part of the
reason yes.

>My approach is to design the database schema entirely
>independent of the front ends (which is done by another person
>anyhow).

I do everything.

David W. Fenton

unread,

Dec 8, 2007, 2:43:07 PM12/8/07

to

Jamie Collins <jamiec...@xsmail.com> wrote in

news:b6165f27-f889-4fe0...@t1g2000pra.googlegroups.com
:

> On Dec 6, 6:50 pm, "Pat Hartman" <please no em...@aol.com> wrote:
>> the autonumber would
>> ALWAYS be the PK and would be used for all relationships. When I
>> have business rules to satisfy that revolve around uniqueness of
>> "natural" keys, I use unique indexes.
>
> The main issue I have with the 'Autonumber PK' movement is that
> the message about having table constraints on the candidate keys
> (and other business rules) doesn't always come across. You've said
> the right things (IMO) about so-called surrogates in this thread
> but I would wager than most users in the 'Autonumber PK' camp put
> a PK on the Autonumber and think, "Job done" as regards table
> constraints.

I'm not sure I think it's a good idea to define "best practices"
based on trying to avoid what stupid people do. If someone fails to
put in place proper constraints on fields other than your surrogate
key, then that's pilot error. It's a mistake.

On the other hand, in a lot of situations, forcing someone to use
the natural key can lead to all sorts of other problems, like
default values to avoid Nulls, and then you have to program around
those (e.g., you won't be able to use this to concatenate the name
fields: Mid(("12" + [LastName]) & (", " + [FirstName]),3), because
with default values, none of your "empty" fields will be Null).

Seems to me you're trading one set of problems for another.

I'm assuming that people will not handle the multi-column key
correctly and you're assuming that they won't handle their
constraints on non-PK keys. Six of one, etc., so I see no reason why
your approach would be superior on the basis of the argument made
above, because to me it applies equally well (and is, in fact,
worse) with multi-column natural keys (I have no objection on
principle to single-column natural keys, though I use them only on a
case-by-case basis).

> Another significant issue is when Autonumber values are exposed to
> users. Most informed people agree this should be avoided but again
> the message isn't always cleat and so it does happen.

This is, once again, an example of pilot error.

> In an ideal world,
> surrogates would be implemented under the hood and not even be
> exposed to DBA, developers, etc. I encourage people to use the
> random flavour of Autonumber to discourage this practice (and
> persistent offenders should be forced to use GUIDs <g>).

But random Autonumber PKs have problems with clustering and table
order (although in high-concurrency environments, it can be turned
into an advantage).

I know you're joking about GUIDs, but just in case someone reading
this takes you seriously, they should read this:

http://trigeminal.com/usenet/usenet011.asp?1033

(the article is specifically about replication, but many of the
problems with GUIDs in Access apply equally to non-replicated MDBs)

--
David W. Fenton http://www.dfenton.com/

usenet at dfenton dot com http://www.dfenton.com/DFA/

David W. Fenton

unread,

Dec 8, 2007, 2:46:16 PM12/8/07

to

Jamie Collins <jamiec...@xsmail.com> wrote in

news:7d00416c-c6e8-426a...@s8g2000prg.googlegroups.com
:

> My approach is to design the database schema entirely
> independent of the front ends (which is done by another person
> anyhow).

I'll bet you have developers cursing you all the time.

I learn a lot about how the properties and structure of the entities
represented in database schema in the process of designing the UI.
It's a two-way process, with feedback flowing from the schema design
to the UI and back again.

This is not to say that the schema design is driven by the UI, only
that in designing a UI *for people* one can learn things about the
schema that were not in the spec, or not obvious from it.

Tony Toews [MVP]

unread,

Dec 8, 2007, 2:51:52 PM12/8/07

to

Jamie Collins <jamiec...@xsmail.com> wrote:

>In an ideal world,
>surrogates would be implemented under the hood and not even be exposed
>to DBA, developers, etc. I encourage people to use the random flavour
>of Autonumber to discourage this practice

I disagree. Sometimes when trouble shooting problems it's a lot
easier to remember two or three digit numbers when viewing data from
different tables. I'm using my own test databases with a few or a
few dozen records. Later I'll test against client databases.

Tony Toews [MVP]

unread,

Dec 9, 2007, 12:04:24 AM12/9/07

to

M. <M...@discussions.microsoft.com> wrote:

>What still interests me, however, is how pepole prevent entry of duplicate
>records when they use a autonumber field as primary index. This choice
>doesn't offer you any guarantee that the real data in your record is
>accidentally entered a second time in a later record. In my opinion you would
>still need a unique single (e.g. ISBN number) or multifield index (e.g. book
>title + author name) to prevent duplicate entry. In the end, it's then a
>matter of taste whether you name the autonumber field index primary or the
>composite field(s) index primary.

Like Dale said I use either unique indexes on other fields if
applicable and which surprisingly often aren't possible. So now you
have to use some business logic and depend on humans.

Book titles can be duplicate (Golden is one example with two different
authors in 2006) as well as having multiple authors, translators,
illustrator, etc.

Humans can have duplicate names and birth dates. Or Fred Jones and
his son Fred Jones live at the same house.

Jamie Collins

unread,

Dec 10, 2007, 3:38:25 AM12/10/07

to

On Dec 7, 5:37 pm, Dale Fye <dale....@nospam.com> wrote:
> I design my schemas as close to 3rd or 4th normal form as feasible.

Doubtful <g>. 3NF is a historical anomaly and 4NF is rarely desirable.
I'd wager you actually achieve BCNF and 5NF more often than you
think :)

> But, rather than carry around the excess baggage of a 3, 4, or 5 field PK, I
> have elected to take the path that is most efficient given my chosen
> front/back tools (Access). However, even when I get the opportunity to take
> advantage of the server side processing that SQL Server provides, I still use
> a 'meaningless' PKs to speed development (makes my job easier) and minimize
> the risk of mistakes (as one of the other posts mentioned, an update query
> where one of the joins is missing can be disasterous).

It seems you use a so-called surrogate (what Codd meant by 'surrogate'
is open to debate e.g. when he said users "have no control over its
value, nor is its value ever displayed to them" did he mean DBAs and
Access Developers?) for FK references. Well, that's your design choice
and, assuming you have table constraints (rather than front end code)
to cover your candidate keys, then fair enough; I'd hope you wouldn't
use another's threads to evangelize <g>. However, note the essence of
the OP's question seems to me to be: given that someone has used
Autonumber as a so-called surrogate, what reasons would they have for
not making the Autonumber the PK? I cordially invite you to attempt to
answer that question.

> I ... take

> great exception with programmers who kludge their database together based on
> their desired form layout.

Then why did you reply to me rather than take great exception with
Tony Toews [MVP] who made the point...?

> Don''t forget to rate the post if it was helpful!

...Ah, perhaps I see why ;-)

Jamie.

--

Jamie Collins

unread,

Dec 10, 2007, 5:34:33 AM12/10/07

to

On Dec 8, 7:46 pm, "David W. Fenton" <XXXuse...@dfenton.com.invalid>
wrote:

> > My approach is to design the database schema entirely
> > independent of the front ends (which is done by another person
> > anyhow).
>
> I'll bet you have developers cursing you all the time.

We have design reviews where discussion is encouraged. Cursing is not
a great way of raising an objection, IMO.

> I learn a lot about how the properties and structure of the entities
> represented in database schema in the process of designing the UI.
> It's a two-way process, with feedback flowing from the schema design
> to the UI and back again.
>
> This is not to say that the schema design is driven by the UI, only
> that in designing a UI *for people* one can learn things about the
> schema that were not in the spec, or not obvious from it.

If you are saying that your logical model is only as good as the
domain expert's spec then I am in full agreement.

Jamie.

--

Jamie Collins

unread,

Dec 10, 2007, 5:44:40 AM12/10/07

to

On Dec 8, 7:51 pm, "Tony Toews [MVP]" <tto...@telusplanet.net> wrote:
> >In an ideal world,
> >surrogates would be implemented under the hood and not even be exposed
> >to DBA, developers, etc. I encourage people to use the random flavour
> >of Autonumber to discourage this practice
>
> I disagree. Sometimes when trouble shooting problems it's a lot
> easier to remember two or three digit numbers when viewing data from
> different tables. I'm using my own test databases with a few or a
> few dozen records. Later I'll test against client databases.

And if you make it the PK, your client and their users will end up
with the double whammy of poor concurrency and poor clustering. But,
hey, the convenience of the developer is what counts for you, eh <g>?

Jamie.

--

Jamie Collins

unread,

Dec 10, 2007, 5:55:58 AM12/10/07

to

On Dec 9, 5:04 am, "Tony Toews [MVP]" <tto...@telusplanet.net> wrote:
> I use either unique indexes on other fields if
> applicable and which surprisingly often aren't possible.

That's why we were given table-level CHECK constraints.

> Book titles can be duplicate (Golden is one example with two different
> authors in 2006) as well as having multiple authors, translators,
> illustrator, etc.

I sure wish someone would hurry up and invent an industry standard
identifier with a trusted source which will be familiar to users. Oh,
hang on: they already did (ISBN).

> Humans can have duplicate names and birth dates. Or Fred Jones and
> his son Fred Jones live at the same house.

Fred Jones's credit card number is unique and he is not supposed to
disclose his PIN number but how do you know it is not his son Fred
Jones using it? Fred Jones has different fingerprints to Fred Jones
but are you authorized to retain that data? It's all about trust i.e.
you need a trusted source for identifiers.

An Autonumber in an Access database cannot verify people in reality
therefore cannot be used as a trusted source. Yes that's a straw man
argument, as are your faulty examples of natural keys.

Jamie.

--

no email@aol.com Pat Hartman

unread,

Dec 10, 2007, 10:32:11 AM12/10/07

to

If you want to conduct an intellectual discussion, you really need to be
able to handle it when people disagree with you and stop taking cheap shots.

"Jamie Collins" <jamiec...@xsmail.com> wrote in message
news:8a3ba570-4433-44a8...@b1g2000pra.googlegroups.com...

Tony Toews [MVP]

unread,

Dec 11, 2007, 12:31:13 AM12/11/07

to

Jamie Collins <jamiec...@xsmail.com> wrote:

>> I disagree. Sometimes when trouble shooting problems it's a lot
>> easier to remember two or three digit numbers when viewing data from
>> different tables. I'm using my own test databases with a few or a
>> few dozen records. Later I'll test against client databases.
>
>And if you make it the PK, your client and their users will end up
>with the double whammy of poor concurrency and poor clustering. But,
>hey, the convenience of the developer is what counts for you, eh <g>?

How does clustering apply to Access databases? And I've only ever had
one problem with concurrency in all these years. So not a problem
for me.

Jamie Collins

unread,

Dec 11, 2007, 3:23:41 AM12/11/07

to

On Dec 10, 3:32 pm, "Pat Hartman" <please no em...@aol.com> wrote:
> If you want to conduct an intellectual discussion

> you really need to be
> able to handle it when people disagree with you and stop taking cheap shots.

If I really was conducting this discussion I would have answers to my
direct questions (ping Dale Fye: "given that someone has used

Autonumber as a so-called surrogate, what reasons would they have for

not making the Autonumber the PK?")

And if I couldn't handle people around here disagreeing with me I'd
have left years ago!

FYI:

The Access Web: Netiquette
http://www.mvps.org/access/netiquette.htm
Be thoughtful of bandwidth and other folks' thought processes:
[Quote] Look for Smileys :-), grins <g>, and other "Emoticons". When
you see one, the preceding statement was not meant to be really
serious, don't take it as such. [Unquote]

I don't think the poster was being serious either (hence my short
dismissal with a grin). Designing database for one's own convenience
rather than your client's doesn't sound like a credible proposition to
me.

Jamie.

--

Jamie Collins

unread,

Dec 11, 2007, 3:27:32 AM12/11/07

to

On Dec 11, 5:31 am, "Tony Toews [MVP]" <tto...@telusplanet.net> wrote:
> How does clustering apply to Access databases?

If by "Access databases" you mean Jet then it has already been
mentioned several times in this thread alone. Here I even got accused
by the OP of providing a "helpful answer":

http://groups.google.com/group/microsoft.public.access.tablesdbdesign/msg/41add6aaab8ef045

Jamie.

--

Jamie Collins

unread,

Dec 11, 2007, 3:59:40 AM12/11/07

to

On Dec 8, 7:43 pm, "David W. Fenton" <XXXuse...@dfenton.com.invalid>
wrote:
> Jamie Collins <jamiecoll...@xsmail.com> wrote innews:b6165f27-f889-4fe0...@t1g2000pra.googlegroups.com
> :

I broadly agree with everything you've said here. I do still think
that many (perhaps most) users of Autonumber columns use it for their
one and only unique constraint per table; for the record, I think such
people are ill informed rather than stupid. Some should now be better
informed after reading your post :)

One point:

> in a lot of situations, forcing someone to use
> the natural key can lead to all sorts of other problems, like
> default values to avoid Nulls, and then you have to program around
> those (e.g., you won't be able to use this to concatenate the name
> fields: Mid(("12" + [LastName]) & (", " + [FirstName]),3), because
> with default values, none of your "empty" fields will be Null).

I don't design my schemas to fit your ready-rolled formulas <g>!. I
have no doubt that if I define my middle_name column as NOT NULL with
a DEFAULT of '{{NONE}}' (plus CHECK constraints to disallow the empty
string, spaces, etc) then you would be able to come up with a similar
formula to derive a full name. My create/update stored procs handle
the NULL value and the empty string so it shouldn't be too onerous on
the 'front end guy' to replace '{{NONE}}', '{{NK}}', etc, with
whatever is appropriate in context.

That said, person's name rarely makes a good key, so is not IMO a good
example of a natural key. My favourite is the ISBN: industry standard
(ask at your local public library or look at the URLs on Amazon),
trusted source (will resolve rare cases of duplication), familiar and
verifiable in reality (take a look at the bar code on the back cover
of your favourite book), stable (sure, they have recently changed from
10 to 13 characters but this has been managed by the trusted source to
provide an upgrade path), fixed length with a check digit (easy to
validate at time of keying in), a so-called 'intelligent' key (it
comprises detail of region, publisher, etc) and so on.

Jamie.

--

no email@aol.com Pat Hartman

unread,

Dec 11, 2007, 11:01:31 AM12/11/07

to

I think the answer you've gotten is that none of us can think of a good
reason to include an autonumber in a table if it is NOT the primary key.

"Jamie Collins" <jamiec...@xsmail.com> wrote in message

news:d3c45586-be83-4216...@d21g2000prf.googlegroups.com...

Jamie Collins

unread,

Dec 14, 2007, 4:10:35 AM12/14/07

to

On Dec 11, 4:01 pm, "Pat Hartman" <please no em...@aol.com> wrote:
> I think the answer you've gotten is that none of us can think of a good
> reason to include an autonumber in a table if it is NOT the primary key.

I'm assuming that by 'primary key' you mean the SQL keyword PRIMARY
KEY rather than the logical meaning of 'primary key', on the basis
that an Autonumber only comes into existence in the physical
implementation and has no place in the logical model. I'm also
assuming that by 'us' you aren't alienating me <g>.

1) Clustering: as an extreme example, imagine a table with a ten-
column compound key and you wanted to cluster on just one of those
columns being a DATETIME consisting of non-unique values. In SQL
Server, if you created a clustered index solely on a column without a
unique constraint then the system would create an auto-increment
column under the hood to 'uniqueify' the values. In Jet, the only way
of influencing clustering is to use the PRIMARY KEY designation and
Jet will not allow you to create a PK solely on a column containing
non-unique values. You can do the same trick as SQL Server but you
have to explicitly create the 'uniqueifier' yourself and an Autonumber
column is a very convenient way of doing this. You'd then create a
compound PK on (datetime_col, unique_col) _in that order_ of course
(BTW how does one do this using the Access interface rather than
code?) In this scenario, the autonumber in the table is not the
primary key, though it is _part of_ the PRIMARY KEY.

2) PRIMARY KEY is arbitrary: you can put it on *any* columns (bar
those of type OLEOBJECT) in the table (unless existing values are NULL
and/or non-unique) or you can opt to omit a PRIMARY KEY completely: if
you have a unique constraint (e.g. unique index and NOT NULL
designation) on a candidate key I don't think anyone could fault you
(though they could point out that you were potentially missing out on
an opportunity for optimization). Therefore, a table with an
Autonumber column but without a PRIMARY KEY is perfectly acceptable.

I'm sure there are other reasons which smarter people than me could
come up with; perhaps a little thinking outside the box is required?

Jamie.

--