dbdebunk 'Quote of Week' comment

Mike Meng

unread,

Aug 18, 2005, 3:25:57 AM8/18/05

to

Hi all,
I'm enjoying reading C. J. Date's new book "Database in Depth", in
which I know the www.dbdebunk.com website. I visited the site just now,
and find the following statement in its "QUOTE OF THE WEEK" section:

** QUOTE **
Don't use primary keys that have meaning for the end user, such as
invoice number or the ISBN value.

--Giuseppe DiMauro & Francesco Balena, PRACTICAL GUIDELINES AND BEST
PRACTICES FOR MICROSOFT VISUAL BASIC AND VISUAL C# DEVELOPERS
** QUOTE **

It seems that they quote this idea to point out how wrong it is. But
I myself always regard this idea as a good practice in database design!
Am I right? If not, what's wrong with it? Please comment.

Troels Arvin

unread,

Aug 18, 2005, 3:59:57 AM8/18/05

to

On Thu, 18 Aug 2005 00:25:57 -0700, Mike Meng wrote:
> ** QUOTE **
> Don't use primary keys that have meaning for the end user, such as
> invoice number or the ISBN value.

[...]

> ** QUOTE **
>
> It seems that they quote this idea to point out how wrong it is. But
> I myself always regard this idea as a good practice in database design!
> Am I right? If not, what's wrong with it? Please comment.

This is probably an area where Darwen/Date and Celko would agree(!).

It has often been discussed in this group and comp.databases. And it has
been written about at various places on the web, eg

http://www.intelligententerprise.com/030320/605celko1_1.jhtml
http://c2.com/cgi/wiki?AutoKeysVersusDomainKeys

My position is that generated columns are sometimes needed because the
world isn't perfect. But it's a stupid habit when people _automatically_
(by heart) create a generated "ID" column in tables and make it the
primary key without thinking. Often, it will contain useless information
because columns can be uniquely identified by other column(s) in the
table. Sometimes, using such generated values for primary keys will allow
wrong, redundant information to accumulate in the non-primary key columns
(if the domain key(s) aren't also enforced with UNIQUE constraints).

--
Greetings from Troels Arvin

Roy Hann

unread,

Aug 18, 2005, 4:28:31 AM8/18/05

to

"Mike Meng" <meng...@gmail.com> wrote in message
news:1124349957.1...@g49g2000cwa.googlegroups.com...

It might seem like an acceptable way of preventing an incorrect manually
entered key value from being propagated through the database. If a user
enters a licence plate number wrong, and that is used as a foreign key in
dozens of tables, you've got dozens of places to correct the error (meaning
lots of code has to exist or be written--and added to in perpetuity as the
database design grows).

The better solution is to create foreign keys with an ON UPDATE CASCADE
constraint so that when the users enter a wrong value and subsequently
correct it, the DBMS automatically propagates the correction. This is
preferable because you don't need a key generator function; you don't need a
cross-reference table; you don't obfuscate data with surrogate values; and
most importantly IMO, you don't conceal duplications. The latter allows
apparently correctly-formed queries to give incorrect answers.

Roy

John

unread,

Aug 18, 2005, 4:39:52 AM8/18/05

to

I agree.

I often take the following:

"one should not increase, beyond what is necessary, the number of
entities required to explain anything"

and modify it to this:

"one should not increase, beyond what is necessary, the number of
entities required to store anything"

IME the best designs are always the simple ones.

John

David Cressey

unread,

Aug 18, 2005, 8:43:55 AM8/18/05

to

"Mike Meng" <meng...@gmail.com> wrote in message
news:1124349957.1...@g49g2000cwa.googlegroups.com...

It depends. There are times when it's wise to use keys like ISBN and
invoice number, and times when it's not.

The important factor is not whether the user understands the keys or does
not, but whether new instances of real world entities come with dependable
keys or not.

Examples: depending on Social Security # to identify persons is a bad idea,
if you are going to be faced with a significant number of people that don't
have an account with Social Security. example: the prison inmate
population.

Depending on ISBN to key books in a library is a bad idea if you are going
to have a significant number of books that come without an ISBN, like
certain government publications.

And, of course, the generator has to avoid duplicates, among other issues.

There are other cases where surrogate keys are a good idea.

But, in general, Date is right. Natural keys are better, as long as they
are dependable.

It's not a dogmatic question. There are situation where one choice is
right, and others where the other choice is right.

And that's what I don't like about dogmatic responses. They tend to assume
that design judgements fall on a simple linear scale from badness to
goodness. Goodness is not a one dimensional measure. And that's why I stay
way from dbdebunk.
At a deeper level, they don't get it.

-CELKO-

unread,

Aug 18, 2005, 6:00:31 PM8/18/05

to

>> .. if you are going to be faced with a significant number of people that don't have an account with Social Security. example: the prison inmate population. <<

Actually, we either deport them or give them an SSN on release or
parole as part of the attempt to do job placement. If they are
deported then they get a number from the INS on the way out.

-CELKO-

unread,

Aug 18, 2005, 6:06:06 PM8/18/05

to

The fact that this question is being considered is a sign that we have
a lot of really bad SQL programmers. By definition a relational key is
a subset of attributes of an entity or relationship being modeled in
the schema. This is absolute foundations of the Relational Model kind
of stuff.

IDENTITY is an exposed PHYSICAL locator. What does that phrase mean?
The value is created by looking at the internal state of the hardware
at the time a PHYSICAL record containing a row is inserted into
storage. Its purpose is to locate the row without any regard to what
the data means.

Think about using a pointer or a track/sector number; same thing but
different mechanism. But SQL does not have a pointer data type or the
mechanisms to handle pointer operators, garbage collection and
housekeeping, so 20+ years ago the original Sybase SQL Server exposed
an integer that can map back to the contiguous storage model used under
the covers.

IDENTITY is not an attribute in the data model and can never be an
attribute in the data model because it does not exist in the reality
from which you derive your data model. The purpose is to fake a
sequential file's positional record number, so I can reference the
PHYSICAL storage location. Sure, I lose all the advantages of an
abstract data model, SQL set oriented programming, carry extra data,
destroy the portability of code and have no data integrity. Who cares?
It is easier to write than a real normalized RDBMS schema.

More and more programmers who have absolutely no database training are
being told to design a database. It takes six years to become a
Journeyman Union Carpenter in New York State. But a programmer with
only a few years in a procedural or OO language is expected to produce
a correct and usable database.

What is he going to do? The smart ones will get some help and beg for
training, knowing they are over their heads and can ruin the company.
Most of them simply start programming SQL as if it were their native
programming language. They grab at GUIDs, IDENTITY, ROWID and other
proprietary auto-numbering "features" in SQL products to imitate either
a record number (sequential file system mindset) or OID (OO mindset)
since they don't know anything else.

They write code with cursors to mimic record-at-a-time file handling --
it is easy do a global replace of READ() with FETCH. They write to temp
tables to mimic scratch files in a series of procedural steps - it is
hard to think of a single declarative statement to get the answer. They
use dynamic SQL and let the user figure out how the system should work
on the fly - they never had a software engineering course and don't
know what coupling and cohesion are.

Experienced database designers look for industry standard codes for
their keys first. Try to tell the IRS you don't have anybody's SSN at
tax time but Cindy Lou Who was employee 42 when you put her into the
Personnel table and you have the IDENTITY value to prove it. Try to
sell a car without a VIN - using IDENTITY for this would be like
identifying a vehicle by the current local parking space number (think
about it - that is exactly what an IDENTITY value is -- a parking space
in one physical location).

We know that we have to use industry standard codes, either de facto or
de jure. We also appreciate the fact that we can exchange data with the
rest of the world. We appreciate the fact that someone else will
maintain and define these codes. A trusted external source is a good
thing to have.

If you don't have an industry standard or natural key and have to
design your own codes, it is hard work to do it right. I know that
newbie programmers want to start coding first and thinking later. It
does not work that way and all the "wish magic" in the world will
not change that fact. I have several chapters in my books on how to
design encoding schemes, but that is another topic.

The first practical consideration is that IDENTITY is proprietary and
non-portable, so you know that you will have maintenance problems when
you change releases or port your system to other SQL products. Newbies
actually think they will never port code! Perhaps they only work for
companies that are failing and will be gone. Perhaps their code is such
crap nobody else want their application. Otherwise, you will port code;
you will share data with some other database; data does not exist in
isolation.

But let's look at the logical problems. First try to create a table
with two columns and try to make them both IDENTITY. If you cannot
declare more than one column to be of a certain data type, then that
thing is not a data type at all, by definition. It is a property which
belongs to the PHYSICAL table, not the LOGICAL data in the table.

Next, create a table with one column and make it an IDENTITY. Now try
to insert, update and delete different numbers from it. If you cannot
insert, update and delete rows from a table, then it is not a table by
definition.

Finally create a simple table with one IDENTITY and a few other
columns. Use a few statements like

INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

To put a few rows into the table and notice that the IDENTITY
sequentially numbered them in the PHYSICAL order they were presented.
If you delete a row, the gap in the sequence is not filled in and the
sequence continues from the highest number that has ever been used in
that column in that particular table. This is how we did record numbers
in pre-allocated sequential files in the 1950's, by the way. A utility
program would then "pack" or "compress" the records that were flagged
as deleted or unused to move the empty space to the PHYSICAL end of the
PHYSICAL file. IDENTITY leaves the gaps unless you write your own
routine to do the compression.

But now use a statement with a query expression in it, like this:

INSERT INTO Foobar (a, b, c)
SELECT x, y, z
FROM Floob;

Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one did you pick? Why?
The answer has been to use whatever the *PHYSICAL* order of the result
set happened to be. That non-relational phrase "PHYSICAL order" again!

But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different PHYSICAL order. Indexes and statistics are not part of the
logical model.

Can you explain from a logical model why the same rows in the second
query get different IDENTITY numbers? In the relational model, they
should be treated the same if all the values of all the attributes are
identical.

How do you verify that an entity has the right key when you use an
IDENTITY? If I use a VIN for a vehicle, I can go to parking lot and
read it off the dashboard. If I use a UPC for a candy bar, I can read
the bar code on the wrapper. But why is little Cindy Lou Who employee
42? If I turn her upside, will I find that she has 42 tattooed
somewhere? If I call an external trusted source, will they know that
she is employee 42?

In the Relational Model, you do not invent a key in the storage. You
discover a key (and the other attributes) in the real world and model
it. If you create your own encoding for a key, then you have to
maintain it, provide audit trails and do all the work that an industry
standard organization would do for you.

If I lean on a mouse button, I can insert the same data with a new
IDENTITY over and over. Cindy Lou Who is now employed two times and
none of my reports are right! Now I have to write some procedural code
like a trigger or a UNIQUE constraint on her SSN to prevent this, thus
making the IDENTITY redundant. But we were assuming that we use only
IDENTITY as a key, so we are screwed.

Newbies often design tables without bothering to look for a relational
key, so they are so surprised when they do a data warehouse and nobody
else has any idea what they are doing in their subsystem.

Cindy Lou Who now has two rows in Personnel. When we sign her up for
the Dental Plan, we get the row with 42. When we sign her up for the
Bowling Team, we get the row with 43. We find our error, and delete the
row with 42 because we have a row that was created later and we assume
it is more current.

Another common way to get this is to have two procedures, one for
inserting a new employee to the Dental Plan and one for inserting a new
employee to the Bowling Team. Both procedures create a row in Personnel
since they use only IDENTITY as a key.

Well, now we have an orphan row in Personnel. In fact, in such systems,
you will find a lot of orphans. I worked for a company that used GUIDs
for OIDs substitutes and our software fell apart in about a year of
actual use by a client.

Another cute way to destroy data integrity:

BEGIN ATOMIC
DELETE FROM Foobar
WHERE id = <<some identity value>>;
INSERT INTO Foobar
VALUES (<<recreate deleted row>>)
END;

Logically this should do nothing, but since IDENTITY has gaps, it
trashes the data. When one query uses the IDENTITY and another uses the
real key, you are like a man with two watches, you are never sure what
time it is.

Finally, an appeal to authority, with a quote from Dr. Codd:
"..Database users may cause the system to generate or delete a
surrogate, but they have no control over its value, nor is its value
ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
E. (1979), Extending the database relational model to capture more
meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] - is
to introduce entity domains which contain system-assigned surrogates.
Database users may cause the system to generate or delete a surrogate,
but they have no control over its value, nor is its value ever
displayed to them....." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434

See this article for problems in replication with IDENTITY columns.
http://www.simple-talk.com/2005/07/05/replication/

--CELKO--
jcel...@earthlink.net

JGE

unread,

Aug 18, 2005, 8:58:40 PM8/18/05

to

Hello Joe:

I'd be an idiot to question anything above, but I will make a comment about
SSN's. Here in Canada, a SIN. There are many situations where one needs to
store information about people where one has no right to ask for and the
person has no obligation to deliver, their SSN. Also, is not the SSN simply
an artificial key, generated by the government? As such, how does it differ
from the lovely primary key or 'identity' that Access is so generous to
provide? In principle, rather than in scale.

Cheers.

JE

paul c

unread,

Aug 18, 2005, 10:14:47 PM8/18/05

to

JGE wrote:
> ...

> ...There are many situations where one needs to store information about people

where one has no right to ask for and the
> person has no obligation to deliver, their SSN. Also, is not the SSN simply
> an artificial key, generated by the government? As such, how does it differ
> from the lovely primary key or 'identity' that Access is so generous to
> provide? In principle, rather than in scale.

>...

i thought that the long message about IDENTITY being an exposed PHYSICAL
indicator was more mystical than logical. i'd say your message is
closer to natural logic, even if Access is flawed. don't claim to know
the answer, since the question, if i've got it right makes no sense, ie.
which should we prefer, a generated key, or a 'natural' one? all
depends on the situation. my own opinion at the moment is that
databases don't have to be global to be useful. in fact, local
databases may be safer. a few months ago i was handcuffed by an Oregon
state trooper who thought i was using stolen plates. fault of a
database that didn't recognize canuck plates. on the other hand, i was
released about an hour later after the canuck db was consulted. shudder
to think what would have happened had there been only one, wrong, db.

pc

Roy Hann

unread,

Aug 19, 2005, 5:29:39 AM8/19/05

to

"David Cressey" <david....@earthlink.net> wrote in message
news:fo%Me.9177$Je....@newsread2.news.atl.earthlink.net...

>
> And that's what I don't like about dogmatic responses. They tend to
assume
> that design judgements fall on a simple linear scale from badness to
> goodness. Goodness is not a one dimensional measure. And that's why I
stay
> way from dbdebunk.

I fear you miss the point of DBDebunk. I doubt Pascal's purpose is to tell
us how to do things (since--for one thing--he knows very well we haven't the
tools or products to do things "his" way). I believe his purpose is to show
that a systematic, logical way of thinking can take us places where common
sense fails or misleads. His purpose is to show that "design judgements"
are improved when one has a grasp of a mathematically respectable theory for
what one is doing (and I don't mean theory in the sense it is often used, as
a synonym for conjecture or speculation). And his often stated purpose is
to point out that there are good reasons to think that better tools are
possible, and that we won't get them if we don't know to ask for them.

Making it a policy to stay away from DBDebunk seems rather dogmatic to me.
There is lots there that is good, and most of the rest is worth testing
intellectually even if you ultimately decide it's wrong. Staying away from
DBDebunk as a matter of policy seems dogmatic at best, and may even be
arrogant.

Roy

Tony Andrews

unread,

Aug 19, 2005, 7:12:12 AM8/19/05

to

David Cressey wrote:
> And that's what I don't like about dogmatic responses. They tend to assume
> that design judgements fall on a simple linear scale from badness to
> goodness. Goodness is not a one dimensional measure. And that's why I stay
> way from dbdebunk.
> At a deeper level, they don't get it.

But at dbdebunk they DO NOT dogmatically say avoid surrogates - e.g.
here:

http://www.dbdebunk.com/page/page/857309.htm

"We do not recommend the use of surrogate keys as a rule, but only in
some rare circumstances and for very specific reasons. Chapter 3 in my
book gives ah explanation of the criteria for choosing keys, which also
help determining when those special circumstances apply."

It is the "quote of the week" that is dogmatic: "Don't use primary keys
that have meaning for the end user..."

Frank_Hamersley

unread,

Aug 19, 2005, 9:21:09 AM8/19/05

to

"JGE" <finar...@hotmail.com> wrote in

> On 2005-08-18, -CELKO- <jcel...@earthlink.net> wrote:

[..]

> Hello Joe:
>
> I'd be an idiot to question anything above, but I will make a comment
about
> SSN's. Here in Canada, a SIN. There are many situations where one needs to
> store information about people where one has no right to ask for and the
> person has no obligation to deliver, their SSN. Also, is not the SSN
simply
> an artificial key, generated by the government? As such, how does it
differ
> from the lovely primary key or 'identity' that Access is so generous to
> provide? In principle, rather than in scale.

There is no difference in its genesis - only in the interest that we attach
to it after it is created!

Errr, Access - shudders violently - surely the single most damaging product
ever to be foisted on an unsuspecting world when measured in terms of
promoting the bad bad practices Joe is railing against. BTW (before I get
flamed) I will declare that I have developed a few Access based apps that
were quite viable outcomes - but it required lots of experience and
discipline to get to achieve those results. Not exactly the skillset of the
targeted market.

It does remind me of a recent experience with one of my internet banking
facilities (sniggers at the memory). The account was suddenly locked
because of 3 failed logins. Suspecting a keylogger hack I quickly rang the
help desk to get some forensic info so I could start to id (sic) the
affected PC. Oh no she said "its probably a customer who transposed a few
digits, didn't notice and then used their password to get your account
locked! We used to allocate the numbers sequentially but found it was
causing too many problems, so we now use random numbers to avoid typo
collisions". My paranoia faded instantly and my evil twin contemplated the
DOS from hell - imagine the help desk meltdown trying to reset irate
customer passwords if you started pounding from 1 onwards, and then did it
again a day later! Simple decisions made by simpletons - BIG headaches
later!

Cheers, Frank.

Frank_Hamersley

unread,

Aug 19, 2005, 9:33:33 AM8/19/05

to

"-CELKO-" <jcel...@earthlink.net> wrote

> The fact that this question is being considered is a sign that we have
> a lot of really bad SQL programmers. By definition a relational key is
> a subset of attributes of an entity or relationship being modeled in
> the schema. This is absolute foundations of the Relational Model kind
> of stuff.
>

[..] (or I would have top posted).

I understand all the points and concerns they generate...but in my own
personal toolkit of techniques I have found I can use surrogate keys and
suffer none of the grief you attribute to them.

I guess I should declare 27 years of involvement with gadgets starting in
the age of PDP-8/11's, Altairs, Alpha Micro, CDC Cybers (what a machine that
was) might have taught me a few tricks in this regard. So all you pups
searching for gems should prolly accord 80 in an 80/20 measure of Joes
dissertation.

Cheers, Frank.

-CELKO-

unread,

Aug 19, 2005, 9:50:31 AM8/19/05

to

>> Here in Canada, a SIN. There are many situations where one needs to store information about people where one has no right to ask for and the person has no obligation to deliver, their SSN. <<

We have some of the same privacy laws here, but you still need it to
pay taxes.

>> Also, is not the SSN simply an artificial key, generated by the government? As such, how does it differ from the lovely primary key or 'identity' that Access is so generous to provide? In principle, rather than in scale. <<

TOTALLY different In principle! -- I have an article on how to
validate and verify an SSN sitting at www.dbazine.com in the slush
pile. The SSN has nothing to do with the internal state of a single
piece of hardware. It is generated by a set of rules in a rigid fomat,
verified by the Social Security Adminstration and issued numbers are
posted once a month on the Internet so you can look them up. Certain
SSN are never issued, certain ranges are available for assignment to
persons without an SSN, etc. This is not just a serial number.

-CELKO-

unread,

Aug 19, 2005, 9:54:50 AM8/19/05

to

>> Oh no she said "its probably a customer who transposed a few digits, didn't notice and then used their password to get your account locked! We used to allocate the numbers sequentially but found it was causing too many problems, so we now use random numbers to avoid typo collisions". <<

I have to use that story! People keep bitching about my mania for
check digits because they cannot or do not wish to do the math. Well,
until the data is corrupted ..

Roy Hann

unread,

Aug 19, 2005, 10:01:22 AM8/19/05

to

"Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message
news:NclNe.4651$FA3....@news-server.bigpond.net.au...

> I understand all the points and concerns they generate...but in my own
> personal toolkit of techniques I have found I can use surrogate keys and
> suffer none of the grief you attribute to them.

I believe you are fooling yourself.

> I guess I should declare 27 years of involvement with gadgets starting in
> the age of PDP-8/11's, Altairs, Alpha Micro, CDC Cybers (what a machine
that
> was) might have taught me a few tricks in this regard. So all you pups
> searching for gems should prolly accord 80 in an 80/20 measure of Joes
> dissertation.

I have been in this field just as long as you. (I still have a copy of the
MITS Altair newsletter with a Chirstmas photo of Bill Gates and the rest of
the Microsoft team--all 16 of them.)

In that time I have seen/written/endured untold mountains of spaghetti code
do to foolish and misguided things, that had I known and followed the advice
of people like Date and Pascal (and maybe even Celko on occasion) I could
have mostly avoided.

If you were to explain to me how unnecessary surrogate keys silently conceal
corruption, and how you guaranteed that they wouldn't, and that it all ended
up being cheaper, simpler, and more maintainable than if you hadn't used
them, then I might be prepared to accept that you haven't suffered any of
the grief. But since you haven't explained that, I'm more inclined to think
you just haven't noticed or recognized it. Not noticing that you have
cancer is not the same as not having it.

It is not too late to explain now. I keep an open (but properly sceptical)
mind.

Roy (No pup)

Alex Papadimoulis

unread,

Aug 19, 2005, 11:15:08 AM8/19/05

to

> I have found I can use surrogate keys and
> suffer none of the grief you attribute to them.

> [...]

> I guess I should declare 27 years of involvement

In my travels, I have been to all corners of North America. I've been
atop mountans. I've peered out of skyscraper. I've even flown in a
plane. With all my traveling experience, I have yet to find any
evidence whatsoever that the earth is round; as far as I can tell, it's
flat as a board (moutants et al withstanding).

Despite my personal observations, I have not joined the Flat Earth
Society. I rely on the expertise of others to mathematically prove a
round earth. I trust the experience of those who have personally
observed that the earth is in fact round.

When you teach yourself something, you end up relying on your own
observations and assumptions. When you rely on what you know, you lose
out on volumes of collective knowledge and experience. Had you not read
it in a science book, do you really think you'd come to the conclusion
that the earth is round?

The same holds true for databases. When a programmer self-teaches
himself databases, he applies his own experience: procedural
programming. This is what happened in the fifties when there were no
databases and, therefore, no experience or expertise to go by.
Programmers back then modeled data exactly like you do now: like a
programmer.

Fifty years, the data field has made a lot of progress. The best model
we have now is the relational model. One of the fundamentals of this
model is that data do not need to be accessed by pointers (meaningless
keys); they can be accessed (and related) by their value.

When you ignore this fundamental, your model becomes closer and closer
to the ineffective ways of the past. Worse still, you lose the tools
they had back then (IDMS pointer functions, for example) to keep their
models running. As Santayana said, "Those who cannot remember the past
are condemned to repeat it."

Dan

unread,

Aug 19, 2005, 11:44:58 AM8/19/05

to

I am strong believer in separating the data model from any and all
client programming model, OOP be dammed (thanks Mr. Fabian Pascal, your
concise book on RDMS should be required reading for all computer
"professionals".). An analogy is collecting real world data for
population studies and then using a variety of statistical models to
analyze it.

Dan

unread,

Aug 19, 2005, 11:51:51 AM8/19/05

to

On 8/19/2005 10:15 AM, Alex Papadimoulis wrote:
>>I have found I can use surrogate keys and
>>suffer none of the grief you attribute to them.
>>[...]
>>I guess I should declare 27 years of involvement
>
>
> In my travels, I have been to all corners of North America. I've been
> atop mountans. I've peered out of skyscraper. I've even flown in a
> plane. With all my traveling experience, I have yet to find any
> evidence whatsoever that the earth is round; as far as I can tell, it's
> flat as a board (moutants et al withstanding).
>

Ever watch a ship disappear over the horizon?

paul c

unread,

Aug 19, 2005, 12:39:07 PM8/19/05

to

Dan wrote:
> On 8/19/2005 10:15 AM, Alex Papadimoulis wrote:
>

>>> ...

>> In my travels, I have been to all corners of North America. I've been
>> atop mountans. I've peered out of skyscraper. I've even flown in a
>> plane. With all my traveling experience, I have yet to find any
>> evidence whatsoever that the earth is round; as far as I can tell, it's
>> flat as a board (moutants et al withstanding).
>>
>
> Ever watch a ship disappear over the horizon?

> ...

Heh, all I was told was that the horizon was a line and the closer the
ship got to it, the smaller it would seem until it disappeared. Didn't
occur to me that the ship might be going downhill. I figured that the
curved horizon must be a side-effect of the thick windows in airplanes.
Personally, I think it's just that it disappeared because of my
imperfect eyesight.

Cheers,
p

Gene Wirchenko

unread,

Aug 19, 2005, 12:58:34 PM8/19/05

to

On Fri, 19 Aug 2005 02:14:47 GMT, paul c <toledob...@oohay.ac>
wrote:

[snip]

>i thought that the long message about IDENTITY being an exposed PHYSICAL
>indicator was more mystical than logical. i'd say your message is

Not mystical at all. If you expose physical details and they get
used, you have lock-in.

>closer to natural logic, even if Access is flawed. don't claim to know
>the answer, since the question, if i've got it right makes no sense, ie.
>which should we prefer, a generated key, or a 'natural' one? all
>depends on the situation. my own opinion at the moment is that
>databases don't have to be global to be useful. in fact, local
>databases may be safer. a few months ago i was handcuffed by an Oregon

You got handcuffed because of a local database. This was safer
because why?

>state trooper who thought i was using stolen plates. fault of a
>database that didn't recognize canuck plates. on the other hand, i was
>released about an hour later after the canuck db was consulted. shudder
>to think what would have happened had there been only one, wrong, db.

Think of the consequences if the database consulted in the first
place had been right.

Sincerely,

Gene Wirchenko

unread,

Aug 19, 2005, 12:58:34 PM8/19/05

to

On Fri, 19 Aug 2005 10:51:51 -0500, Dan <d...@nospam.com> wrote:

>On 8/19/2005 10:15 AM, Alex Papadimoulis wrote:
>>>I have found I can use surrogate keys and
>>>suffer none of the grief you attribute to them.
>>>[...]
>>>I guess I should declare 27 years of involvement
>>
>>
>> In my travels, I have been to all corners of North America. I've been
>> atop mountans. I've peered out of skyscraper. I've even flown in a
>> plane. With all my traveling experience, I have yet to find any
>> evidence whatsoever that the earth is round; as far as I can tell, it's
>> flat as a board (moutants et al withstanding).
>>
>
>Ever watch a ship disappear over the horizon?

Did you figure out yourself that that meant the earth is round,
or did you find out about that from elsewhere?

[snip]

>> When you teach yourself something, you end up relying on your own
>> observations and assumptions. When you rely on what you know, you lose
>> out on volumes of collective knowledge and experience. Had you not read
>> it in a science book, do you really think you'd come to the conclusion
>> that the earth is round?

Being "practical" as it is often used means ignoring what anyone
else has to say.

Sincerely,

Gene Wirchenko

unread,

Aug 19, 2005, 12:58:35 PM8/19/05

to

On Fri, 19 Aug 2005 15:01:22 +0100, "Roy Hann"
<spec...@processed.almost.meat> wrote:

[snip]

>If you were to explain to me how unnecessary surrogate keys silently conceal

^^^
=Mr. Celko.

>corruption, and how you guaranteed that they wouldn't, and that it all ended
>up being cheaper, simpler, and more maintainable than if you hadn't used
>them, then I might be prepared to accept that you haven't suffered any of
>the grief. But since you haven't explained that, I'm more inclined to think

He has explained it several times, and just recently (this week),
too.

>you just haven't noticed or recognized it. Not noticing that you have
>cancer is not the same as not having it.
>
>It is not too late to explain now. I keep an open (but properly sceptical)
>mind.

Sincerely,

Gene Wirchenko

Alexandr Savinov

unread,

Aug 19, 2005, 1:03:06 PM8/19/05

to

David Cressey schrieb:

The main problem of dbdebunk is that they do not propose anything new
and their main argument is that we do not need anything new because we
have already one good theory and that is enough -- the main problem (in
their opinion) is that this theory is not used correctly and there exist
bad boys who do not understand it (bad programmers, bad database
developers and so on). Those bad boys produce bad products and try to
enhance this ultimate theory and its is crime. Why that theory is good?
Just because it is based on set theory (stupid argument, is not it). I
would qualify dbdebunk as religious fanatics who identify themselves
with this one theory and speak on bahalf of it. They think their role
consists in protecting this theory and the way they chose consists in
isolating it from the outside world, frozing in its current state and
fighting with any type of heretics. So it is really dogmatic and
orthodox site (unfortunately). And anybody who wants to learn something
new, unusual, surprising and interesting should keep a distance from
this site because they think that they know what truth is.

--
http://conceptoriented.com

Roy Hann

unread,

Aug 19, 2005, 1:12:20 PM8/19/05

to

"Gene Wirchenko" <ge...@ucantrade.com.NOTHERE> wrote in message
news:lt2cg15rddif79mcn...@4ax.com...

> On Fri, 19 Aug 2005 15:01:22 +0100, "Roy Hann"
> <spec...@processed.almost.meat> wrote:
>
> [snip]
>
> >If you were to explain to me how unnecessary surrogate keys silently
conceal
> ^^^
> =Mr. Celko.

Err, no. =Frank Hamersley. On this one I think Joe Celko is pretty much
right.

Roy

Gene Wirchenko

unread,

Aug 19, 2005, 1:18:03 PM8/19/05

to

On Fri, 19 Aug 2005 19:03:06 +0200, Alexandr Savinov
<sav...@host.com> wrote:

[snip]

Mr. Savinov might be feeling a bit stung since one of his remarks
recently was a Quote of the Week at dbdebunk.com.

>The main problem of dbdebunk is that they do not propose anything new

You have not been reading the site, have you?

>and their main argument is that we do not need anything new because we
>have already one good theory and that is enough -- the main problem (in

They do not argue that we do not need anything new.

>their opinion) is that this theory is not used correctly and there exist
>bad boys who do not understand it (bad programmers, bad database
>developers and so on). Those bad boys produce bad products and try to
>enhance this ultimate theory and its is crime. Why that theory is good?

They do not enhance it. They pervert it. For example, the whole
XML craze is a throwback. dbdebunk.com says why.

>Just because it is based on set theory (stupid argument, is not it). I

Straw man.

>would qualify dbdebunk as religious fanatics who identify themselves
>with this one theory and speak on bahalf of it. They think their role
>consists in protecting this theory and the way they chose consists in
>isolating it from the outside world, frozing in its current state and
>fighting with any type of heretics. So it is really dogmatic and

They would like to see it implemented.

>orthodox site (unfortunately). And anybody who wants to learn something
>new, unusual, surprising and interesting should keep a distance from
>this site because they think that they know what truth is.

Do you think that you know what truth is?

If you do not, why should we listen to you? If you do, according
to you, we should keep a distance from you. (Your argument lacks
rigor.)

Sincerely,

Gene Wirchenko

unread,

Aug 19, 2005, 3:00:13 PM8/19/05

to

On Fri, 19 Aug 2005 18:12:20 +0100, "Roy Hann"
<spec...@processed.almost.meat> wrote:

>"Gene Wirchenko" <ge...@ucantrade.com.NOTHERE> wrote in message
>news:lt2cg15rddif79mcn...@4ax.com...
>> On Fri, 19 Aug 2005 15:01:22 +0100, "Roy Hann"
>> <spec...@processed.almost.meat> wrote:
>>
>> [snip]
>>
>> >If you were to explain to me how unnecessary surrogate keys silently
>conceal
>> ^^^
>> =Mr. Celko.
>
>Err, no. =Frank Hamersley. On this one I think Joe Celko is pretty much
>right.

I apologise for reading that wrong.

Sincerely,

Gene Wirchenko

David Cressey

unread,

Aug 19, 2005, 3:20:10 PM8/19/05

to

"-CELKO-" <jcel...@earthlink.net> wrote in message
news:1124402431....@g49g2000cwa.googlegroups.com...

If I read this right, you are talking about the population of former prison
inmates. I'm talking about the inmate population.

-CELKO-

unread,

Aug 19, 2005, 4:07:48 PM8/19/05

to

I read that the average age of an Internet user is now in his/her 40's
and that it just tipped to 51% female. I guess we are the extreme end
of the curve. My topper is: "Have you ever wired plug boards?"

The problem is not surrogate keys -- read Codd's definition. The
system handles them, just like it would an index, and hide it.

The problem is IDENTITY and attempting to use it as a surrogate key.
The problem is the very poor architecture in SQL Server.

In the Nucleus engine from Sand Technology, I have domains that are
linked together by a compressed bit vector scheme to build tables on
the fly -- no contigous storage at all. In Sybase SQL Anywhere, the
domains have one occurrence of each value and linked list structure
makes all the PK=FK references Therse are true surrogates.

Doesn't it bother you that you have to put an index on the FK side of a
REFERENCES manually in SDQL Server? That the domain value is
physically repeated in contigous storage?

That is what happens when the product is designed with a "table =
contigous storage file" mindset instead of thinking of the whole schema
as the unit of work, with data element domains as the foundation and
tables as beign built from domains and constraints.
.

Alex Papadimoulis

unread,

Aug 19, 2005, 4:11:19 PM8/19/05

to

> Ever watch a ship disappear over the horizon?

Can't say that I have, but that would probably be a bit convincing.

OK, what about the Earth revolving around the Sun? Where's your
"evidence" for that? And please, don't get into long detailed
explanations about math, physics, models, yadda yadda; I'm a
"practical" guy and don't have time for that!

Gene Wirchenko

unread,

Aug 19, 2005, 4:28:19 PM8/19/05

to

On 19 Aug 2005 13:11:19 -0700, "Alex Papadimoulis"
<ernestp...@gmail.com> wrote:

>> Ever watch a ship disappear over the horizon?

^^^^
Why is this word not "under"? Think about it for a moment.
Where is the ship, and where is the horizon?

You can not relate to that? <BEG>

Sincerely,

Gene Wirchenko

JGE

unread,

Aug 19, 2005, 7:04:05 PM8/19/05

to

On 2005-08-19, -CELKO- <jcel...@earthlink.net> wrote:
>>> Here in Canada, a SIN. There are many situations where one needs to store information about people where one has no right to ask for and the person has no obligation to deliver, their SSN. <<
>
> We have some of the same privacy laws here, but you still need it to
> pay taxes.
>

Well obviously. How does that justify (as an example) Radio Shack asking for
a social security number? Because it is a natural key? (They don't of course
because most people would tell them to bugger off. As such, they can't use
the SSN as a primary key). My point here is actually a pet peeve of mine.
The SIN (here in Canada) and the SSN are NOT intended for general usage. If
you have a database that involves income, then you MAY have a right to ask
for the SSN. Otherwise, I'm not going to tell you and you can't make me. And
if I'm a customer in a customer database, you have to ask: Is the customer
wrong?

>>> Also, is not the SSN simply an artificial key, generated by the government? As such, how does it differ from the lovely primary key or 'identity' that Access is so generous to provide? In principle, rather than in scale. <<
>
> TOTALLY different In principle! -- I have an article on how to
> validate and verify an SSN sitting at www.dbazine.com in the slush
> pile. The SSN has nothing to do with the internal state of a single
> piece of hardware. It is generated by a set of rules in a rigid fomat,
> verified by the Social Security Adminstration and issued numbers are
> posted once a month on the Internet so you can look them up. Certain
> SSN are never issued, certain ranges are available for assignment to
> persons without an SSN, etc. This is not just a serial number.
>

I'm not sure why the fact that the algorithm for generating a key is more
complicated than x=x+1 changes the fact that an SSN is still a generated key
that is intended to be an artificial and unique identifier of a member of a set
and that identifier has nothing to do with the elements of the set. Yes it is
more than a serial number. No, in principal, it is no different from that
automatically generated key thing in Access. More complicated yes.
Platonically different. No. I could be wrong here Joe. I'll leave this
thread and the last word to you (if you're interested). Also, I'm not quite
as well versed on the theory as you are.

Cheers.

JE

Frank_Hamersley

unread,

Aug 20, 2005, 1:23:46 AM8/20/05

to

"-CELKO-" <jcel...@earthlink.net> wrote

It's a weird world - the maths isn't that hard! And the same ppl prolly
think the likelihood of an attack affecting their business is so close to
zero that it doesn't warrant the effort.

Cheers, Frank.

Frank_Hamersley

unread,

Aug 20, 2005, 1:52:00 AM8/20/05

to

"Alex Papadimoulis" <ernestp...@gmail.com> wrote

>
> In my travels, I have been to all corners of North America.>

Heh heh ... as a famous TV commentator once said when interviewing a real
life Forrest Gump ...

"thats amazing, you have really traveled the world then!"

As for me I recognise I have only barely scratched the surface with 1
continent and 5 summits left to go!

[..]

> When you ignore this fundamental, your model becomes closer and closer
> to the ineffective ways of the past. Worse still, you lose the tools
> they had back then (IDMS pointer functions, for example) to keep their
> models running. As Santayana said, "Those who cannot remember the past
> are condemned to repeat it."

For me the prime determinant is the talent, not the tool. Sadly (IMO) most
practitioners have limited talent. To add my pithy 10c ...

"A good workman never blames his tools" (apols for the lack of PC)

Regards, Frank.

Frank_Hamersley

unread,

Aug 20, 2005, 3:45:43 AM8/20/05

to

"Roy Hann" <spec...@processed.almost.meat> wrote

> "Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message

> > I understand all the points and concerns they generate...but in my own
> > personal toolkit of techniques I have found I can use surrogate keys and
> > suffer none of the grief you attribute to them.
>
> I believe you are fooling yourself.

I'll concede from where you stand its a reasonable bet to place given the
run rate of the "profession" and the fact you don't know me from Adam!

> > I guess I should declare 27 years of involvement with gadgets starting
in
> > the age of PDP-8/11's, Altairs, Alpha Micro, CDC Cybers (what a machine
that
> > was) might have taught me a few tricks in this regard. So all you pups
> > searching for gems should prolly accord 80 in an 80/20 measure of Joes
> > dissertation.
>
> I have been in this field just as long as you. (I still have a copy of
the
> MITS Altair newsletter with a Chirstmas photo of Bill Gates and the rest
of
> the Microsoft team--all 16 of them.)

Not autographed?

> In that time I have seen/written/endured untold mountains of spaghetti
code
> do to foolish and misguided things, that had I known and followed the
advice
> of people like Date and Pascal (and maybe even Celko on occasion) I could
> have mostly avoided.

Bad luck! I was fortunate to be educated by a prof who was a Pascal addict
and insisted on showing us, via recursion amongst other things, how power
could be achieved using elegant designs and coding. I have carried this
mantra over the years and it has served me well. The other aspect I place
great stead in is discipline - which IMO is an inherent trait, albeit
reinforced by observations like yours. Most often these traits are not
recognised by management it the search for quick thrills ;-)

> If you were to explain to me how unnecessary surrogate keys silently
conceal
> corruption, and how you guaranteed that they wouldn't, and that it all
ended
> up being cheaper, simpler, and more maintainable than if you hadn't used
> them, then I might be prepared to accept that you haven't suffered any of
> the grief. But since you haven't explained that, I'm more inclined to
think
> you just haven't noticed or recognized it. Not noticing that you have
> cancer is not the same as not having it.

If the surrogate keys were indeed "unnecessary " then you would have a valid
point. In my current experimental work they are not (IMO at least)
unnecessary.

When looking at the horror foisted on use by Access suggesting (nay almost
insisting) on id columns as PK's I would further agree. Against my
inclinations I have built a few small but reliable multi-user apps in Access
over the years (even with id columns) but the cost of managing the risks you
identify was quite traumatising when the product was marketed as Gods gift
to "ease of use".

> It is not too late to explain now. I keep an open (but properly
sceptical) mind.

I knew this question would arise and I beg the readers tolerance for I
haven't the time (right now) or inclination (as I am still moderately
proprietal about the techniques) to lay it all out here before I have
resolved a potential business opportunity. However I will say that I am not
using more than 1 table with an identity attribute and in a similar vein to
Ken's Andromeda I am not going to code it all by hand. As I said above the
work is experimental at the moment but the key challenges still to be solved
are not related to the use of surrogate keys.

Cheers, Frank.

Frank_Hamersley

unread,

Aug 20, 2005, 4:00:26 AM8/20/05

to

"-CELKO-" <jcel...@earthlink.net> wrote

> I read that the average age of an Internet user is now in his/her 40's
> and that it just tipped to 51% female. I guess we are the extreme end
> of the curve. My topper is: "Have you ever wired plug boards?"

Yep - just small kits - never been paid to do it (perhaps a sign of
frustrated elec engineer)!

I used to marvel at the Cyber 73 in use at my first real employment. The
backplane was actually a mat of wire wraps so thick you couldn't see the
substrate - admittedly it was wired by a robot - but impressive nonetheless.

> The problem is not surrogate keys -- read Codd's definition. The
> system handles them, just like it would an index, and hide it.
>
> The problem is IDENTITY and attempting to use it as a surrogate key.

This is a very succinct way of describing the real problem.

Cheers Frank.

David Cressey

unread,

Aug 20, 2005, 5:09:26 AM8/20/05

to

"Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message

news:NclNe.4651$FA3....@news-server.bigpond.net.au...

> I guess I should declare 27 years of involvement with gadgets starting in
> the age of PDP-8/11's, Altairs, Alpha Micro, CDC Cybers (what a machine
that
> was) might have taught me a few tricks in this regard. So all you pups
> searching for gems should prolly accord 80 in an 80/20 measure of Joes
> dissertation.

Hmmm. I started out with an IBM 7090 and a PDP-1 (That's a one, not an
eleven), From there moving on to PDP-6 and 10, DECsystem-10 and 20, and
VAX. SQL came much later for me, after Pascal, Datatrieve, and VAX Rdb/VMS.

Celko is right in his diatribe, but he lays it on too thick.

David Cressey

unread,

Aug 20, 2005, 5:14:48 AM8/20/05

to

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:uc-dnWcwKdg...@pipex.net...

> "Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message
> news:NclNe.4651$FA3....@news-server.bigpond.net.au...

> > I guess I should declare 27 years of involvement with gadgets starting

in
> > the age of PDP-8/11's, Altairs, Alpha Micro, CDC Cybers (what a machine
> that
> > was) might have taught me a few tricks in this regard. So all you pups
> > searching for gems should prolly accord 80 in an 80/20 measure of Joes
> > dissertation.
>
> I have been in this field just as long as you. (I still have a copy of
the
> MITS Altair newsletter with a Chirstmas photo of Bill Gates and the rest
of
> the Microsoft team--all 16 of them.)
>

So have I. In my reply, I mentioned the PDP-1.

> In that time I have seen/written/endured untold mountains of spaghetti
code
> do to foolish and misguided things, that had I known and followed the
advice
> of people like Date and Pascal (and maybe even Celko on occasion) I could
> have mostly avoided.

I've learned from my own mistakes, from the mistakes of others, and
occasionally, from fundamental principles.

>
> If you were to explain to me how unnecessary surrogate keys silently
conceal
> corruption, and how you guaranteed that they wouldn't, and that it all
ended
> up being cheaper, simpler, and more maintainable than if you hadn't used
> them, then I might be prepared to accept that you haven't suffered any of
> the grief. But since you haven't explained that, I'm more inclined to
think
> you just haven't noticed or recognized it. Not noticing that you have
> cancer is not the same as not having it.
>
> It is not too late to explain now. I keep an open (but properly
sceptical)
> mind.

I suggest you read Ralph Kimball for a good treatment on the benefits of
using surrogate keys.

As I said elsewhere in this thread, surrogate keys are not useful when
natural keys will do the same job.

Incidentally, most "natural keys" are just as "artificial" as surrogate
keys. They are simply assigned externally to the system being disucssed.

Frank_Hamersley

unread,

Aug 21, 2005, 7:40:23 AM8/21/05

to

"David Cressey" <david....@earthlink.net> wrote in message

>
> "Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message

[..]

>
> Hmmm. I started out with an IBM 7090 and a PDP-1 (That's a one, not an
> eleven), From there moving on to PDP-6 and 10, DECsystem-10 and 20, and
> VAX. SQL came much later for me, after Pascal, Datatrieve, and VAX
Rdb/VMS.

Woot! How many PDP-1's did they end up making? As I recall my uni had a
PDP-6 (with a single digit serial number) but I never got to play with it.
One of my associates in the club I believe got it (or perhaps a PDP-8) to
play "Fur Elise" by resting a transistor radio inside the card cage and
executing a program of repeated instructions that induced the required tones
by interference. We used to use a PDP-8 running early Unix to cross compile
Modula-2 for an LSI-11 in the undergrad lab and the DEC-10 was the recipient
of my first <cough> crack. Apols to less nostalgic readers for suffering my
flashbacks.
</OT>

> Celko is right in his diatribe, but he lays it on too thick.

Perhaps - he made the best point in another post (excuse my paraphrasing)
that the problem is not the use of surrogate keys per se, but more the use
of identity columns for PK's. I guess in the hands of the great unwashed
there is not much distinction between the two acts, and religious sermons
might prove to be the most effective way to try and hold back the tide.

Cheers Frank.

Anne & Lynn Wheeler

unread,

Aug 21, 2005, 10:10:11 AM8/21/05

to

"David Cressey" <david....@earthlink.net> writes:
> Hmmm. I started out with an IBM 7090 and a PDP-1 (That's a one, not
> an eleven), From there moving on to PDP-6 and 10, DECsystem-10 and
> 20, and VAX. SQL came much later for me, after Pascal, Datatrieve,
> and VAX Rdb/VMS.

recent posting related to 50th anniv share meeting in boston starting
today & having given presentation as undergraduate at the aug68 boston
share meeting
http://www.garlic.com/~lynn/2005o.html#12

part of aug68 presentation
http://www.garlic.com/~lynn/94.html#18

share web page
http://www.share.org/

recent posting mentioning platform original relational/sql
work was done
http://www.garlic.com/~lynn/2005o.html#10

other postings about original relational/sql work
http://www.garlic.com/~lynn/subtopic.html#systemr

and a recent thread that had some pascal discussion
http://www.garlic.com/~lynn/2005o.html#11

slightly older thread discussing vax/vms
http://www.garlic.com/~lynn/2005n.html#10

and even older posting giving number of vax shipments, sliced and
diced by model, year, us, non-us
http://www.garlic.com/~lynn/2002f.html#0

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

vldm10

unread,

Aug 21, 2005, 5:32:19 PM8/21/05

to

You can find Balena's explanation at
http://www.amazon.com/exec/obidos/tg/detail/-/0735621721/104-1613950-6789558?v=glance
regarding "meaningless keys". (Belana is the author of
this Quote of the week)
There is another aspect of this problem I would like to mention.
Generally speaking, in your DB you can use data of another DB and this
data can be changed, deleted ( for example, SSN ) or a mistake can
occur (made by another person from another DB) in the data upon which
you act (or already have ), but don't know about these changes.
There is also a theoretical question when you design DB in which some
of the attributes were designed by another person. (Keys, meaning,
levels of abstraction,...)
Finally, is there a good specific solution for this "second hand
data"?

vldm10

unread,

Aug 21, 2005, 7:10:37 PM8/21/05

to

-CELKO- wrote:
> The fact that this question is being considered is a sign that we have
> a lot of really bad SQL programmers. By definition a relational key is
> a subset of attributes of an entity or relationship being modeled in
> the schema. This is absolute foundations of the Relational Model kind
> of stuff.

Here, I suppose that by relational key you mean a candidate key and it
is, by definition, a subset of the set of attributes of the relation R
(it is not the subset of attributes of an entity as you wrote). I
believe that this is what you meant, and if it is so we should
emphasize two things regarding surrogate keys.
First, a surrogate key should support the primary key. Now I used
primary instead candidate key. The difference is that when we chose one
candidate key to be the primary key it means that DBMS will add many
things to this key, which is related to tuple level addressing
mechanism, about which you wrote. The surrogate key now should be the
primary key, and it is a primary key like any other primary key -
meaning it will have the DBMS support as any other primary key. (it can
include audit trails, for example)
Secondly, the surrogate key should provide identification of an entity.

(Of course, we can choose another successful way to work with
surrogates).
I add this because in your text it is not clear when and how we can use
surrogate keys.

It seems that you misunderstood Codd. Your quote from Codd's paper

"Extending the database relational model to capture more meaning"

is not related to users who somehow generate a surrogate key from the
system. Codd mentions in one sentence the solutions from papers [4] and
[14]. Even more so, Codd's solution is "in the same direction" as
[4] and [14].
Anyway, in this paper Codd doesn't forbid surrogate keys, he
doesn't classify them into standard or some other ones.
Codd explicitly writes in this paper that his solution is one
possibility more for users who work with entity identifiers that are
totally under their control.

Vladimir Odrljin

David Cressey

unread,

Aug 22, 2005, 4:49:29 AM8/22/05

to

"Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message

news:HKZNe.6342$FA3....@news-server.bigpond.net.au...

> "David Cressey" <david....@earthlink.net> wrote in message
> >
> > "Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message
> [..]
> >
> > Hmmm. I started out with an IBM 7090 and a PDP-1 (That's a one, not an
> > eleven), From there moving on to PDP-6 and 10, DECsystem-10 and 20,
and
> > VAX. SQL came much later for me, after Pascal, Datatrieve, and VAX
> Rdb/VMS.
>
> Woot! How many PDP-1's did they end up making? As I recall my uni had a
> PDP-6 (with a single digit serial number) but I never got to play with it.
> One of my associates in the club I believe got it (or perhaps a PDP-8) to
> play "Fur Elise" by resting a transistor radio inside the card cage and
> executing a program of repeated instructions that induced the required
tones
> by interference. We used to use a PDP-8 running early Unix to cross
compile
> Modula-2 for an LSI-11 in the undergrad lab and the DEC-10 was the
recipient
> of my first <cough> crack. Apols to less nostalgic readers for suffering
my
> flashbacks.
> </OT>

According to this website
http://www.computermuseum.li/Testpage/DEC-PDP1-1960.htm
53. One of the ones I programmed was serial number two. A guy named Paul
Samson made
good music on the one.

There were only about 18 PDP-6 computers made. Ahead of its time.

The DEC-10 was about the peak of my "techie" lifetime. After that, I began
to be more interested in information than in technology. That's what
eventually led me to relational databases.

My apologies to the less nostalgic participants as well. "... nothing but
boring stories about glory days".

David Cressey

unread,

Aug 22, 2005, 5:00:14 AM8/22/05

to

"Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message

news:HKZNe.6342$FA3....@news-server.bigpond.net.au...

> > Celko is right in his diatribe, but he lays it on too thick.
>
> Perhaps - he made the best point in another post (excuse my paraphrasing)
> that the problem is not the use of surrogate keys per se, but more the use
> of identity columns for PK's. I guess in the hands of the great unwashed
> there is not much distinction between the two acts, and religious sermons
> might prove to be the most effective way to try and hold back the tide.
>

Look, I'm in favor of education as much as the next man. A significant part
of my career was as an instructor for DEC.

And there's no question that about half the databases being built today
would be better built if the builders knew how to design a database at least
halfway decently. I'm not being a devil's advocate here. But religious
sermons hold back the tide about as well as king Canute's command did.

Another question is, how many of the programmers are among the great
unwashed, and why is this the case?

The deeper question, IMO, is whether databases are still being used as a
tool for integrating data, and making it more widely accessible, or
whether most of today's databases are private parts of an application, and
there is no useful information to be obtained via SQL, knowing only the
business model and the metadata. By "more widely accessible" I mean useable
in more contexts, not just by more people.

Programmers will always gravitate towards viewing the data in "their
databases" as their private bailiwick, and insist that users of the data
access it through their own API. Learning SQL is certainly better than
learning a hundred programmer's different APIs.

David Cressey

unread,

Aug 22, 2005, 5:02:18 AM8/22/05

to

"Anne & Lynn Wheeler" <ly...@garlic.com> wrote in message
news:m3vf1zs...@lhwlinux.garlic.com...

> recent posting related to 50th anniv share meeting in boston starting
> today & having given presentation as undergraduate at the aug68 boston

I might want to attend the 50th Anniversary of DECUS, when it rolls around.

Frank_Hamersley

unread,

Aug 22, 2005, 8:07:54 AM8/22/05

to

"David Cressey" <david....@earthlink.net> wrote

>
> "Frank_Hamersley" <tera...@isat.bigpond.com> wrote in message
>

> > > Celko is right in his diatribe, but he lays it on too thick.
> >
> > Perhaps - he made the best point in another post (excuse my
paraphrasing)
> > that the problem is not the use of surrogate keys per se, but more the
use
> > of identity columns for PK's. I guess in the hands of the great
unwashed
> > there is not much distinction between the two acts, and religious
sermons
> > might prove to be the most effective way to try and hold back the tide.
> >
> Look, I'm in favor of education as much as the next man. A significant
part
> of my career was as an instructor for DEC.
>
> And there's no question that about half the databases being built today
> would be better built if the builders knew how to design a database at
least
> halfway decently. I'm not being a devil's advocate here. But religious
> sermons hold back the tide about as well as king Canute's command did.

It might stop some of the newbies from walking out to see how far the tide
ebbs, any to have too scramble to safety when it suddenly turns. :-)

I often wonder if the recent arrivals have sufficient grounding in the
underlying technologies to understand the implications of design and
implementation decisions? Whilst I never wrote any microcode and only built
a few ttl (gate) based devices, when combined with the undergrad course and
lots of time in the UCC clubroom, I find I have no trouble analysing design
implications to identify where problems might arise (or could be
neutralised). Today I suspect ppl are more mono-cultural and expect the
underlying layers to simply deliver (and in spades too) without any effort
required to rationalise any implications.

> Another question is, how many of the programmers are among the great
> unwashed, and why is this the case?

Lots - I feel because most ppl "don't get it" about computers period! This
has however has not been an impediment to many attaining gainful employment
because the employers get it even less!

> The deeper question, IMO, is whether databases are still being used as a
> tool for integrating data, and making it more widely accessible, or
> whether most of today's databases are private parts of an application,
and
> there is no useful information to be obtained via SQL, knowing only the
> business model and the metadata. By "more widely accessible" I mean
useable
> in more contexts, not just by more people.

My recent experiences suggest there is a trend to use database systems to
provide direct access for those so bold as to use it as a means of
reassuring RFP evaluators that the business data will not be locked away in
a black box. Of course without the "helper" app the cost of entry often
seems high, but I haven't found anything I can't reverse engineer after
building the right tools for the job. And of course there is plenty of
action in the BI and DW arena to reinforce the trend of using an "SQL"
layer.

> Programmers will always gravitate towards viewing the data in "their
> databases" as their private bailiwick, and insist that users of the data
> access it through their own API. Learning SQL is certainly better than
> learning a hundred programmer's different APIs.

Definitely - but regardless of the merits of any tool a determined
individual will still find a way to cross thread (or tear the head off) a
stud.

Cheers, Frank.

Hugo Kornelis

unread,

Aug 22, 2005, 6:20:09 PM8/22/05

to

On 18 Aug 2005 15:06:06 -0700, -CELKO- wrote:

(snip)

>IDENTITY is an exposed PHYSICAL locator. What does that phrase mean?
>The value is created by looking at the internal state of the hardware
>at the time a PHYSICAL record containing a row is inserted into
>storage. Its purpose is to locate the row without any regard to what
>the data means.

Hi Joe,

I can't comment on IDENTITY in Sybase, but in MS SQL Server, IDENTITY is
not "an exposed PHYSICAL locator" at all. Instead, it is a reliable and
scalable mechanism for the generation of surrogate key values. The value
created is not based on the internal state of the hardware, but at the
number of preceding inserted rows, including the failed INSERTS that
were rolled back. If you would say that anything that ignores ROLLBACK
has no place in a relational database, you'd have a point. But your
statement that IDENTITY exposes anything physical is just silly.

>Think about using a pointer or a track/sector number; same thing but
>different mechanism.

No, completely different. A pointer to a track/sector number has to
change when the location of the data is changed; the identity value
never changes.

>IDENTITY is not an attribute in the data model and can never be an
>attribute in the data model because it does not exist in the reality
>from which you derive your data model. The purpose is to fake a
>sequential file's positional record number, so I can reference the
>PHYSICAL storage location.

The first statement is correct. The second statement is rubbish. There
is absolutely no relationship between identity values and positional
record numbers.

Run this in MS SQL Server 2000:

CREATE TABLE Foobar
(ID INT NOT NULL IDENTITY,
Name CHAR (10) NOT NULL,
PRIMARY KEY (Name)
)
INSERT INTO Foobar(Name)
VALUES ('Joe')
INSERT INTO Foobar(Name)
VALUES ('Celko')

If you use the undocumented DBCC PAGE command to inspect how the data is
laid out on the disk, you'll find that the row with ID = 2 comes first,
and the row with ID = 1 comes second.

>More and more programmers who have absolutely no database training are
>being told to design a database.

(snip)

>Most of them simply start programming SQL as if it were their native
>programming language. They grab at GUIDs, IDENTITY, ROWID and other
>proprietary auto-numbering "features" in SQL products to imitate either
>a record number (sequential file system mindset) or OID (OO mindset)
>since they don't know anything else.

Yes, that's what newbies often do. They fail to see when to use and
especially when not to use an IDENTITY. They think that IDENTITY can be
used as the only key in a table. Of course, experienced database
designers like you know better - you know that the choice to use a
surrogate key requires you to include the real key as well, and to
include a constraint that will ensure uniqueness of the real key.

BTW, newbies also abuse CHARACTER columns to store formatted dates. Does
this mean that the CHARACTER datatype is inherently flawed? Or does this
only mean that almost any feature of SQL can be abused by stupid people?

>Experienced database designers look for industry standard codes for
>their keys first.

Indeed, they do. And if performance is important, or if there is a fair
chance that the industry standard code might change one day (longer UPC
code, ZIP+, legal issues requiring you to stop using SSN, etc) and they
don't want to accept the downtime to propagate such a time to all their
billion-row tables, they will add a suitable surrogate key.

>Try to tell the IRS you don't have anybody's SSN at
>tax time but Cindy Lou Who was employee 42 when you put her into the
>Personnel table and you have the IDENTITY value to prove it.

(more of these examples snipped)

The essence of a surrogate key is that acts as a surrogate for the real
key. The real key is still in the entities' own table, but the surrogate
is used to refer to the row in foreign key relationships.

Cindy Lou Who's SSN will be in the Personnel tabel. But in the
Gratifications table, you'll only find the number 42.

>But let's look at the logical problems. First try to create a table
>with two columns and try to make them both IDENTITY. If you cannot
>declare more than one column to be of a certain data type, then that
>thing is not a data type at all, by definition. It is a property which
>belongs to the PHYSICAL table, not the LOGICAL data in the table.

Correct. IDENTITY is not a data type. It's a property. Look at the table
definition above - the data type is INTEGER.
The identity property is a generator for surrogate keys - no point in
having two surrogate keys in the same table.

>Next, create a table with one column and make it an IDENTITY. Now try
>to insert, update and delete different numbers from it. If you cannot
>insert, update and delete rows from a table, then it is not a table by
>definition.

If you have only a surrogate key column but no other column, then what
is the surrogate key a surrogate for? Indeed - that is not a table.

>Finally create a simple table with one IDENTITY and a few other
>columns. Use a few statements like
>
>INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
>INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
>INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
>
>To put a few rows into the table and notice that the IDENTITY
>sequentially numbered them in the PHYSICAL order they were presented.
>If you delete a row, the gap in the sequence is not filled in and the
>sequence continues from the highest number that has ever been used in
>that column in that particular table. This is how we did record numbers
>in pre-allocated sequential files in the 1950's, by the way.

Incidentally, yes. But note that it also behaves exactly the way Dr.Codd
said surrogate keys should behave: generated by the system, and users
have no control over it's value.

> A utility
>program would then "pack" or "compress" the records that were flagged
>as deleted or unused to move the empty space to the PHYSICAL end of the
>PHYSICAL file. IDENTITY leaves the gaps unless you write your own
>routine to do the compression.

You can't, since SQL Server doesn't allow you to change the values in an
IDENTITY column. Remember: "Database users may cause the system to

generate or delete a surrogate, but they have no control over its value"

>But now use a statement with a query expression in it, like this:

>
>INSERT INTO Foobar (a, b, c)
>SELECT x, y, z
>FROM Floob;
>
>Since a query result is a table, and a table is a set which has no
>ordering, what should the IDENTITY numbers be? The entire, whole,
>completed set is presented to Foobar all at once, not a row at a time.
>There are (n!) ways to number (n) rows, so which one did you pick? Why?

Who cares? As long as they're unique, they serve their purpose.

>The answer has been to use whatever the *PHYSICAL* order of the result
>set happened to be. That non-relational phrase "PHYSICAL order" again!

So if MS decides for it's next release to use a randomizer to reshuffle
the identity values assigned before storing them, you wouldn't object?

>But it is actually worse than that. If the same query is executed
>again, but with new statistics or after an index has been dropped or
>added, the new execution plan could bring the result set back in a
>different PHYSICAL order. Indexes and statistics are not part of the
>logical model.

So? Assuming you did not forget to declare the real key as either
PRIMARY KEY or UNIQUE, executing the same query again would simply
result in a constraint violation, and no rows inserted into the table.

>Can you explain from a logical model why the same rows in the second
>query get different IDENTITY numbers? In the relational model, they
>should be treated the same if all the values of all the attributes are
>identical.

Yes, as a matter of fact I can. To quote Dr.Codd: "Database users (...)
have no control over its value".

>How do you verify that an entity has the right key when you use an
>IDENTITY?

By using the real key. As I said before: experienced database designers
will never forget to include the real key along with the surrogate key;
that's a mistake only newbies make.

>If I lean on a mouse button, I can insert the same data with a new
>IDENTITY over and over.

So you DID forget to include the real key in the table? Or to include
the appropriate constraint?

Oh dear.

> Cindy Lou Who is now employed two times and
>none of my reports are right! Now I have to write some procedural code
>like a trigger or a UNIQUE constraint on her SSN to prevent this, thus
>making the IDENTITY redundant.

Eh? Since when is a UNIQUE constraint procedural?

And the IDENTITY is not made redundant if you declare the SSN to be
UNIQUE (like you should have done in the first palce) - it's a surrogate
key; it serves a different purpose. The SSN is there to relate the row
in the Personnel table to the employee in front of your desk. The
IDENTITY value is there to relate rows in other tables to the row in the
Personnel table.

> But we were assuming that we use only
>IDENTITY as a key, so we are screwed.

Not "we" were - you were assuming that. You are collecting examples of
bad code involving IDENTITY and using that to say that IDENTITY is bad.
But you ignore examples of code where the IDENTITY property is used
without any problems.

It's like saying that cars should be abolished because stupid people
drive drunk and cause accidents.

>Another cute way to destroy data integrity:
>
>BEGIN ATOMIC
>DELETE FROM Foobar
>WHERE id = <<some identity value>>;
>INSERT INTO Foobar
>VALUES (<<recreate deleted row>>)
>END;
>
>Logically this should do nothing, but since IDENTITY has gaps, it
>trashes the data.

Eh? I see two possibilities:

1. The row you try to delete is still refered to in another table. In
that case, the DELETE will fail bacuse of a FOREIGN KEY constraint
violation.
2. The row you try to delete is not refered to in any other table. In
that case, it is deleted and a new row with the same values in inserted.
The IDENTITY value will indeed be different, but that's not important
since it's just a surrogate. For the user (who doesn't get to see the
IDENTITY value), the same data has reappeared, and there are still no
other rows referencing it.

In short - the effect is no different from executing the same commands
against a table without surrogate key.

>Finally, an appeal to authority, with a quote from Dr. Codd:
>"..Database users may cause the system to generate or delete a
>surrogate, but they have no control over its value, nor is its value
>ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
>E. (1979), Extending the database relational model to capture more
>meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.
>

>This means that a surrogate ought to act like an index; created by the
>user, managed by the system and NEVER seen by a user.

Indeed.

> That means never
>used in queries, DRI or anything else that a user does.

But here, you miss the mark. If I write a query to join two tables on a
surrogate key column, do I see any values? Heck, no!
If I write a FOREIGN KEY constraint, do I see any value? Heck, no!

>See this article for problems in replication with IDENTITY columns.
>http://www.simple-talk.com/2005/07/05/replication/

Did you actually read the article?

The article mentions three problems.
For the first two, a repro script is given - but if you scroll down,
you'll see that there are also easy solutions supplied for these two
problems.
For the third problem, no solution is given. But there's allso no repro
script. Which is not surprising, when you read:
"The third problem is typically caused when all of the columns
in a row are updated by the application, including the identity
column."
Have you ever tried to update an identity column in SQL Server? Run the
CREATE TABLE and INSERT statements above, then this:

UPDATE Foobar
SET ID = 8,
Name = 'John'
WHERE ID = 1
AND Name = 'Joe'

The result is

Server: Msg 8102, Level 16, State 1, Line 1
Cannot update identity column 'ID'.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

BobTheDataBaseBoy

unread,

Aug 23, 2005, 9:42:05 PM8/23/05

to

can you please go yell at the knuckleheads i work with?? half are
COBOL/VSAM
refugees of an older age. the other half are java/object folks of a
much younger age. neither half realizes that they both do the same
thing. it is very frustrating to hear talk of "open systems". blech.

BTDB

paul c

unread,

Aug 23, 2005, 9:52:26 PM8/23/05

to

BobTheDataBaseBoy wrote:
> ...

> can you please go yell at the knuckleheads i work with?? half are
> COBOL/VSAM
> refugees of an older age. the other half are java/object folks of a
> much younger age. neither half realizes that they both do the same
> thing. it is very frustrating to hear talk of "open systems". blech.

> ...

heh, especially frustrating when one goal of the RM is to be a 'closed'
system. i wonder if some marketing hack deliberatly coined the 'open
system' BS just to jab the RM people. on second thought it probably
wasn't a jab, but i'll bet it was a marketing hack who coined it.

pc

Alan

unread,

Aug 23, 2005, 10:54:22 PM8/23/05

to

My topper is: "Have you ever wired plug boards?"
>

Yes, I have. here's another one to use: Have you ever drawn a diagonal line
on the side of a deck of punch cards?

Frank_Hamersley

unread,

Aug 24, 2005, 1:51:56 AM8/24/05

to

"Alan" <not...@uhuh.rcn.com> wrote in message
news:yjROe.2793$k92.2212@trndny05...

Yeehar! Using CDC Update to mod the OS on the 73 using 80 column cards was
the best time!

It was very formative - in fact I still style my code like it must fit in 72
chars :-)

Cheers, Frank.

David Cressey

unread,

Aug 24, 2005, 8:45:27 AM8/24/05

to

"BobTheDataBaseBoy" <"xxx at rcn dot com"> wrote in message
news:hJWdneyPOJe...@rcn.net...
> David Cressey wrote:

> > Programmers will always gravitate towards viewing the data in "their
> > databases" as their private bailiwick, and insist that users of the
data
> > access it through their own API. Learning SQL is certainly better than
> > learning a hundred programmer's different APIs.
> >
> can you please go yell at the knuckleheads i work with?? half are
> COBOL/VSAM
> refugees of an older age. the other half are java/object folks of a
> much younger age. neither half realizes that they both do the same
> thing. it is very frustrating to hear talk of "open systems". blech.
>

For $100 an hour, I'll go yell at knuckleheads. For $10 an hour, I won't.
I'm sorry to be so crass about it, but that's the way I feel about such an
activity.

I used to teach "Rdb Programming and Design" for DEC as a contractor, after
I left their employ. Some of the programmers taking the course were
knuckleheads, some weren't.

Mike

unread,

Aug 24, 2005, 10:04:26 AM8/24/05

to

On 21 Aug 2005 14:32:19 -0700, "vldm10" <vld...@yahoo.com> wrote:

[snip]

>Finally, is there a good specific solution for this "second hand
>data"?

In legal systems, we usually model this kind of thing as as assertion
(what someone said). In the example statements below, '1' is what
designers usually find at the beginning of an ORM design; '2' is what
I'm talking about.

1. "Fred Flintstone's driver's license number is B1234567."

2. a) "Officer Rubble said that
Fred Flintstone
had California driver's license number B1234567."

b) "The California DMV said that
Fred Flintstone
has never been issued a California driver's license."

--
Mike Sherrill
Information Management Systems

vldm10

unread,

Aug 24, 2005, 3:30:07 PM8/24/05

to

To avoid ambiguity we should emphasize that two types of surrogate were
discussed in this thread.
Surrogates in the Codd sense - system generated and maintained entity
identifiers, and surrogates that are user defined and user controlled,
which were in the RM before 1979.

Vladimir Odrljin

x

unread,

Aug 19, 2005, 5:47:41 AM8/19/05

to

"Mike Meng" <meng...@gmail.com> wrote in message
news:1124349957.1...@g49g2000cwa.googlegroups.com...
> Hi all,
> I'm enjoying reading C. J. Date's new book "Database in Depth", in
> which I know the www.dbdebunk.com website. I visited the site just now,
> and find the following statement in its "QUOTE OF THE WEEK" section:
>
> ** QUOTE **
> Don't use primary keys that have meaning for the end user, such as
> invoice number or the ISBN value.
>
> --Giuseppe DiMauro & Francesco Balena, PRACTICAL GUIDELINES AND BEST
> PRACTICES FOR MICROSOFT VISUAL BASIC AND VISUAL C# DEVELOPERS
> ** QUOTE **
>
> It seems that they quote this idea to point out how wrong it is. But
> I myself always regard this idea as a good practice in database design!
> Am I right? If not, what's wrong with it? Please comment.

The quote does not mention if the key is generated by the DBMS or by the
application.
It only say that it has no meaning for the end user.

This means that there must be a one to one mapping between the generated key
and some key with a meaning for the end user. Therefore that meaningless
primary key is a pointer. But one of the goals of the relational model is to
eliminate pointers from the data model.

Other goal is to give the end user direct access to data.

x

unread,

Aug 19, 2005, 5:50:55 AM8/19/05

to

x

unread,

Aug 22, 2005, 3:18:39 AM8/22/05

to

x

unread,

Aug 23, 2005, 1:59:56 AM8/23/05

to

x

unread,

Aug 22, 2005, 3:14:24 AM8/22/05

to

Alexandr Savinov

unread,

Aug 26, 2005, 7:30:16 AM8/26/05

to

x schrieb:

> This means that there must be a one to one mapping between the generated key
> and some key with a meaning for the end user. Therefore that meaningless
> primary key is a pointer. But one of the goals of the relational model is to
> eliminate pointers from the data model.

What is the difference between meaningless and meaningful pointer? I
think the model itself does not know anything about the meaning of the
pointers/keys it uses. In this sense the question about the meaning of
pointers/keys relates to data modeling in general rather than to the RM.
In other words, the question is if we should choose meaningful or
meaningless identifiers for our entites.

> Other goal is to give the end user direct access to data.

Any access is indirect (by definition). By direct access we normally
mean some lower level mechanism of access w.r.t. this level.
(Absolutely) direct access does not exist just like instant interaction
does not exist. Possibly you mean an illusion of direct access like in
OOP where we manipulate object like if they were directly accessible.

--
http://conceptoriented.com

Marshall Spight

unread,

Aug 26, 2005, 12:34:30 PM8/26/05

to

x wrote:
>
> This means that there must be a one to one mapping between the generated key
> and some key with a meaning for the end user. Therefore that meaningless
> primary key is a pointer. But one of the goals of the relational model is to
> eliminate pointers from the data model.

I'm not sure I share this point of view. I propose that
every pointer is meaningless; every key is meaningful,
whether system generated or not. The meaning is exactly
that it is the identity of the row. A randomly-generated
customer id still means something.

Another difference between keys and pointers is that
keys are content-addressible, while pointers are
location-addressible.

The differences between keys and pointers are small and
sometimes subtle, but useful nonetheless.

Marshall

dawn

unread,

Aug 26, 2005, 2:10:06 PM8/26/05

to

Marshall Spight wrote:
> x wrote:
> >
> > This means that there must be a one to one mapping between the generated key
> > and some key with a meaning for the end user. Therefore that meaningless
> > primary key is a pointer. But one of the goals of the relational model is to
> > eliminate pointers from the data model.
>
> I'm not sure I share this point of view. I propose that
> every pointer is meaningless; every key is meaningful,
> whether system generated or not. The meaning is exactly
> that it is the identity of the row. A randomly-generated
> customer id still means something.
>
> Another difference between keys and pointers is that
> keys are content-addressible, while pointers are
> location-addressible.

and there have been several prior discussions on pointers that I think
got most of us to the point of understanding that the pointers that the
relational model were eliminating were related to memory locations.
When talking about data that serve as references to other data at the
logical level, there is nothing in the relational model that prohibits
or even discourages such.

> The differences between keys and pointers are small and
> sometimes subtle, but useful nonetheless.
>

It is interesting to me that the relational model that some say is
intended as a logical model for data had as one of its goals a physical
issue. What do you make of that? --dawn

David Cressey

unread,

Aug 26, 2005, 5:43:24 PM8/26/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1125074070....@z14g2000cwz.googlegroups.com...

> x wrote:
> >
> > This means that there must be a one to one mapping between the generated
key
> > and some key with a meaning for the end user. Therefore that meaningless
> > primary key is a pointer. But one of the goals of the relational model
is to
> > eliminate pointers from the data model.
>
> I'm not sure I share this point of view. I propose that
> every pointer is meaningless; every key is meaningful,
> whether system generated or not. The meaning is exactly
> that it is the identity of the row. A randomly-generated
> customer id still means something.
>

I'm not sure I share YOUR point of view, either. I would suggest that the
key is not the identity of the row, but the identity
of the entity described in the row. If we were to delete the row, and then
insert another row with a different key, who is to say that the system
didn't reuse the unused row?

> Another difference between keys and pointers is that
> keys are content-addressible, while pointers are
> location-addressible.

And the important thing, as far as I'm concerned, is that rows are
"unpinned" in the RM.
In practice, this is only partly true. If the DBMS alters either the
content or the location of a row, it may have to do some index maintenance
to reflect that change. But at least a row is unpinned except for the pins
in the index.

Marshall Spight

unread,

Aug 26, 2005, 6:18:47 PM8/26/05

to

dawn wrote:

> Marshall Spight wrote:
> >
> > Another difference between keys and pointers is that
> > keys are content-addressible, while pointers are
> > location-addressible.
>
> and there have been several prior discussions on pointers that I think
> got most of us to the point of understanding that the pointers that the
> relational model were eliminating were related to memory locations.
> When talking about data that serve as references to other data at the
> logical level, there is nothing in the relational model that prohibits
> or even discourages such.

I don't use the word "pointer" for such things. If we broaden
the term "pointer" that far, it stops meaning much of anything.
The appropriately generic term for data that *could* serve
as references to other data at the logical level is "data".

> > The differences between keys and pointers are small and
> > sometimes subtle, but useful nonetheless.
> >
> It is interesting to me that the relational model that some say is
> intended as a logical model for data had as one of its goals a physical
> issue. What do you make of that? --dawn

I don't know what you're referring to.

Marshall

Marshall Spight

unread,

Aug 26, 2005, 6:24:37 PM8/26/05

to

David Cressey wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message
> news:1125074070....@z14g2000cwz.googlegroups.com...
> > x wrote:
> > >
> > > This means that there must be a one to one mapping between the generated
> key
> > > and some key with a meaning for the end user. Therefore that meaningless
> > > primary key is a pointer. But one of the goals of the relational model
> is to
> > > eliminate pointers from the data model.
> >
> > I'm not sure I share this point of view. I propose that
> > every pointer is meaningless; every key is meaningful,
> > whether system generated or not. The meaning is exactly
> > that it is the identity of the row. A randomly-generated
> > customer id still means something.
> >
> I'm not sure I share YOUR point of view, either. I would suggest that the
> key is not the identity of the row, but the identity
> of the entity described in the row. If we were to delete the row, and then
> insert another row with a different key, who is to say that the system
> didn't reuse the unused row?

I use the term "row" as a logical term, not a physical one.

> > Another difference between keys and pointers is that
> > keys are content-addressible, while pointers are
> > location-addressible.
>
> And the important thing, as far as I'm concerned, is that rows are
> "unpinned" in the RM.

Yes.

> In practice, this is only partly true. If the DBMS alters either the
> content or the location of a row, it may have to do some index maintenance
> to reflect that change. But at least a row is unpinned except for the pins
> in the index.

I think I'd just change this a tiny bit to say "In the implementation,

this is only partly true."

Marshall

dawn

unread,

Aug 26, 2005, 11:38:12 PM8/26/05

to

Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:
> > >
> > > Another difference between keys and pointers is that
> > > keys are content-addressible, while pointers are
> > > location-addressible.
> >
> > and there have been several prior discussions on pointers that I think
> > got most of us to the point of understanding that the pointers that the
> > relational model were eliminating were related to memory locations.
> > When talking about data that serve as references to other data at the
> > logical level, there is nothing in the relational model that prohibits
> > or even discourages such.
>
> I don't use the word "pointer" for such things. If we broaden
> the term "pointer" that far, it stops meaning much of anything.
> The appropriately generic term for data that *could* serve
> as references to other data at the logical level is "data".

That works for me too, but I read what was written at
http://en.wikipedia.org/wiki/DBMS about multivalue databases and
pointers and I don't get it. I've heard it before and I think it is
hogwash, but it keeps coming back up, and we have established in the
past (or at least I have) that sometimes I'm the one bringing the
hogwash, so I want to get this straight.

So, is this wikipedia entry correct? I think it is only suggesting
that the mv systems navigate using foreign keys (rather than only
permitting joins and set functions in the logical layer and doing the
navigation under the covers). So, why do they use the term "pointer"
for this? Is that page correct the way it is, or should I (or someone)
change it?

Thanks. --dawn

Marshall Spight

unread,

Aug 27, 2005, 12:41:15 AM8/27/05

to

dawn wrote:
> Marshall Spight wrote:
> >
> > I don't use the word "pointer" for such things. If we broaden
> > the term "pointer" that far, it stops meaning much of anything.
> > The appropriately generic term for data that *could* serve
> > as references to other data at the logical level is "data".
>
> That works for me too, but I read what was written at
> http://en.wikipedia.org/wiki/DBMS about multivalue databases and
> pointers and I don't get it. I've heard it before and I think it is
> hogwash, but it keeps coming back up, and we have established in the
> past (or at least I have) that sometimes I'm the one bringing the
> hogwash, so I want to get this straight.
>
> So, is this wikipedia entry correct?

I didn't read the whole thing, but what I did read was crap.
I wasn't sure what specific part of the article you're referring
to, so I can't directly address whether that specific part was
crap. But I bet it was crap.

Wikipedia is highly variable. It may be that in a few years
this entry will be a lot better. You could try to improve it
yourself, of course, in the spirit of wiki. I confine myself
to correcting obvious factual errors, spelling errors, and
grammatical errors. I help in my own small way. :-)

> I think it is only suggesting
> that the mv systems navigate using foreign keys (rather than only
> permitting joins and set functions in the logical layer and doing the
> navigation under the covers). So, why do they use the term "pointer"
> for this? Is that page correct the way it is, or should I (or someone)
> change it?

You should do some edits if you feel up to it.

Marshall

David Cressey

unread,

Aug 27, 2005, 2:08:19 AM8/27/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125094727.3...@g14g2000cwa.googlegroups.com...

> dawn wrote:
> > Marshall Spight wrote:
> > >
> > > Another difference between keys and pointers is that
> > > keys are content-addressible, while pointers are
> > > location-addressible.
> >
> > and there have been several prior discussions on pointers that I think
> > got most of us to the point of understanding that the pointers that the
> > relational model were eliminating were related to memory locations.
> > When talking about data that serve as references to other data at the
> > logical level, there is nothing in the relational model that prohibits
> > or even discourages such.
>
> I don't use the word "pointer" for such things. If we broaden
> the term "pointer" that far, it stops meaning much of anything.
> The appropriately generic term for data that *could* serve
> as references to other data at the logical level is "data".
>
>

Agreed.

David Cressey

unread,

Aug 27, 2005, 2:15:12 AM8/27/05

to

"dawn" <dawnwo...@gmail.com> wrote in message
news:1125113887.4...@o13g2000cwo.googlegroups.com...
> Marshall Spight wrote:

> > I don't use the word "pointer" for such things. If we broaden
> > the term "pointer" that far, it stops meaning much of anything.
> > The appropriately generic term for data that *could* serve
> > as references to other data at the logical level is "data".
>
> That works for me too, but I read what was written at
> http://en.wikipedia.org/wiki/DBMS about multivalue databases and
> pointers and I don't get it. I've heard it before and I think it is
> hogwash, but it keeps coming back up, and we have established in the
> past (or at least I have) that sometimes I'm the one bringing the
> hogwash, so I want to get this straight.

I followed your link, and I saw no reference to multivalue databases. Did I
miss something?

David Cressey

unread,

Aug 27, 2005, 2:20:44 AM8/27/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125095077.4...@g49g2000cwa.googlegroups.com...

I'm not following you. I *think* I use the term "row" as a logical one as
well.

>
>
> > > Another difference between keys and pointers is that
> > > keys are content-addressible, while pointers are
> > > location-addressible.
> >
> > And the important thing, as far as I'm concerned, is that rows are
> > "unpinned" in the RM.
>
> Yes.
>
>
> > In practice, this is only partly true. If the DBMS alters either the
> > content or the location of a row, it may have to do some index
maintenance
> > to reflect that change. But at least a row is unpinned except for the
pins
> > in the index.
>
> I think I'd just change this a tiny bit to say "In the implementation,
> this is only partly true."

Agreed, with one more tiny change: "In current implementations, this is
only partly true"

>
>
> Marshall
>

David Cressey

unread,

Aug 27, 2005, 2:22:32 AM8/27/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125117675....@f14g2000cwb.googlegroups.com...
> dawn wrote:

> > So, is this wikipedia entry correct?
>
> I didn't read the whole thing, but what I did read was crap.
> I wasn't sure what specific part of the article you're referring
> to, so I can't directly address whether that specific part was
> crap. But I bet it was crap.

I didn't read the whole thing either, but the part I read sounded ok to me.
What did you think was crap?

Marshall Spight

unread,

Aug 27, 2005, 2:50:30 AM8/27/05

to

David Cressey wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message

> > > If we were to delete the row, and then
> > > insert another row with a different key, who is to say that the system
> > > didn't reuse the unused row?
> >
> > I use the term "row" as a logical term, not a physical one.
>
> I'm not following you. I *think* I use the term "row" as a logical one as
> well.

I think we're almost entirely in agreement here. I just thought the
idea of "reus[ing] the unused row" was really only an issue at the
physical level.

> > > In practice, this is only partly true. If the DBMS alters either the
> > > content or the location of a row, it may have to do some index
> maintenance
> > > to reflect that change. But at least a row is unpinned except for the
> pins
> > > in the index.
> >
> > I think I'd just change this a tiny bit to say "In the implementation,
> > this is only partly true."
>
> Agreed, with one more tiny change: "In current implementations, this is
> only partly true"

Perfect!

Marshall

Marshall Spight

unread,

Aug 27, 2005, 2:56:22 AM8/27/05

to

David Cressey wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message
> news:1125117675....@f14g2000cwb.googlegroups.com...
> > dawn wrote:
>
> > > So, is this wikipedia entry correct?
> >
> > I didn't read the whole thing, but what I did read was crap.
>

> I didn't read the whole thing either, but the part I read sounded ok to me.
> What did you think was crap?

"But the query performance in fully normalized relational
databases can be quite poor. For instance, to find the address
of the user named Bob, these implementations may look up Bob
in the USER table, find his "primary key" (the login name),
and then search the ADDRESS table for that key. Although this
appears to be a single operation to the user, in most
implementations it requires a complex and time consuming
search through the tables."

Complex and time consuming; yeah right. Two indexed lookups.

Marshall

David Cressey

unread,

Aug 27, 2005, 6:41:01 AM8/27/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125125430.5...@g43g2000cwa.googlegroups.com...
> David Cressey wrote:

> I think we're almost entirely in agreement here. I just thought the
> idea of "reus[ing] the unused row" was really only an issue at the
> physical level.
>

I think we are in basic agreement.

I've thought about this some more, and here's what I come up with:

When I'm being self-conscious about my language (as now), I think of a row
as a logical data structure, and therefore a container. I don't think of
the row as the data thus contained.

Most of the time, when I speak or write, I'm not self-conscious. And I
don't bother to distinguish between the row and its contents. So I think of
<code>

select * from employees where employee_id = 12345
</code>

as "retrieving a row" rather than "retrieving some row uniquely specified by
its partial contents".

So, to me, a foreign key is not just "a logical pointer to the row in the
reference table". It is
"the conventional name of an entity catalogued in the reference table".

This is really splitting hairs.

David Cressey

unread,

Aug 27, 2005, 6:41:01 AM8/27/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125125782.0...@g47g2000cwa.googlegroups.com...

You are right. This is crap.

dawn

unread,

Aug 27, 2005, 10:05:26 AM8/27/05

to

They mention "pick" (aka multivalue) in the multidimensional section
(and there a logical connection, but we aren't talking typical MOLAP
with pick) and sometimes pick db's do call themselves multidimensional,
so I opted not to mention that little issue.

--dawn

David Cressey

unread,

Aug 28, 2005, 3:46:04 PM8/28/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1125117675....@f14g2000cwb.googlegroups.com...

> > So, is this wikipedia entry correct?

>
> I didn't read the whole thing, but what I did read was crap.
> I wasn't sure what specific part of the article you're referring
> to, so I can't directly address whether that specific part was
> crap. But I bet it was crap.
>
> Wikipedia is highly variable. It may be that in a few years
> this entry will be a lot better. You could try to improve it
> yourself, of course, in the spirit of wiki. I confine myself
> to correcting obvious factual errors, spelling errors, and
> grammatical errors. I help in my own small way. :-)

I looked into this some more, and I find the following surprising
divergence between two entries in wikipedia.

http://en.wikipedia.org/wiki/DBMS

and

http://en.wikipedia.org/wiki/Database

The second one makes a great deal more sense to me. Who coordinates
wikipedia? The former chief architect at the tower of Babel?

Marshall Spight

unread,

Aug 28, 2005, 4:07:15 PM8/28/05

to

David Cressey wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message
> >

> > Wikipedia is highly variable...

>
> I looked into this some more, and I find the following surprising
> divergence between two entries in wikipedia.
>
> http://en.wikipedia.org/wiki/DBMS
>
> and
>
> http://en.wikipedia.org/wiki/Database
>
> The second one makes a great deal more sense to me. Who coordinates
> wikipedia? The former chief architect at the tower of Babel?

Everyone coordinates it. This is the same thing as saying that
no one coordinates it. :-)

If you want to make changes, you just make them. Someone else might
change what you wrote, etc. etc. For example, just yesterday I
came across a mistake in this entry:

http://en.wikipedia.org/wiki/Boolean_algebra

The Huntington equation and the Robbins equation were both
just fragments of the right thing:

Huntington equation: n(n(x)+y) + n(n(x)+n(y)).
Robbins Equation: n(n(x+y)+n(x+n(y))).

But it should be:

Huntington equation: n(n(x)+y) + n(n(x)+n(y)) = x.
Robbins Equation: n(n(x+y)+n(x+n(y))) = x.

So I just fixed it.

(I like this example because it makes me sound smart, but in fact
I have zero understanding of either equation. I just observed
that they weren't, in fact, equations, so I went and looked them
up elsewhere. A variety of other trustworthy sources agreed
as to what it should be. But I am left with questions about
boolean algebra. Maybe another thread.)

Marshall

x

unread,

Aug 29, 2005, 3:09:44 AM8/29/05

to

"Alexandr Savinov" <sp...@conceptoriented.com> wrote in message
news:430efd5d$1...@news.fhg.de...

> x schrieb:
> > This means that there must be a one to one mapping between the generated
key
> > and some key with a meaning for the end user. Therefore that meaningless
> > primary key is a pointer. But one of the goals of the relational model
is to
> > eliminate pointers from the data model.

> What is the difference between meaningless and meaningful pointer?

If the key is meaningful for the end-user, it is not a pointer.

> I think the model itself does not know anything about the meaning of the
> pointers/keys it uses. In this sense the question about the meaning of
> pointers/keys relates to data modeling in general rather than to the RM.

> In other words, the question is if we should choose meaningful or
> meaningless identifiers for our entites.

If they are meaningless, they are not identifiers.

> > Other goal is to give the end user direct access to data.

> Any access is indirect (by definition). By direct access we normally
> mean some lower level mechanism of access w.r.t. this level.
> (Absolutely) direct access does not exist just like instant interaction
> does not exist. Possibly you mean an illusion of direct access like in
> OOP where we manipulate object like if they were directly accessible.

Direct access as in access to data by standard tools, not by specialized
applications.

x

unread,

Aug 29, 2005, 4:15:24 AM8/29/05

to

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1125074070....@z14g2000cwz.googlegroups.com...
> x wrote:

> > This means that there must be a one to one mapping between the generated
key
> > and some key with a meaning for the end user. Therefore that meaningless
> > primary key is a pointer. But one of the goals of the relational model
is to
> > eliminate pointers from the data model.

> I'm not sure I share this point of view. I propose that
> every pointer is meaningless; every key is meaningful,
> whether system generated or not.

Meaningless for whom ?
Meaningful for whom ?

>The meaning is exactly
> that it is the identity of the row. A randomly-generated
> customer id still means something.

That customer_id is meaningful for the end-user even if it was once
randomly-generated.
If it is randomly-generated every time some customer buy something, then it
is not a customer_id.

x

unread,

Aug 29, 2005, 5:11:01 AM8/29/05

to

"dawn" <dawnwo...@gmail.com> wrote in message

news:1125079806....@g47g2000cwa.googlegroups.com...

> and there have been several prior discussions on pointers that I think
> got most of us to the point of understanding that the pointers that the
> relational model were eliminating were related to memory locations.

> When talking about data that serve as references to other data at the
> logical level, there is nothing in the relational model that prohibits
> or even discourages such.

One goal of the relational model is making all data explicit.

Example:
E1(p1, id1, ...) with p1 primary key, id1 candidate key
E2(p2, id2,...) with p2 primary key, id2 candidate key
R(p1,p2,...) with (p1,p2) primary key

What is the reason for introducing p1 and p2 with no meaning for the
end-user ?
Are p1 and p2 references to id1 and id2 ?
E1(id1,...)
E2(id2,...)
R(id1,id2,...)
is a better or an worse model ?

> It is interesting to me that the relational model that some say is
> intended as a logical model for data had as one of its goals a physical
> issue. What do you make of that? --dawn

One goal of a logical model is to be (a) logical (model).

x

unread,

Aug 29, 2005, 5:39:49 AM8/29/05

to

"dawn" <dawnwo...@gmail.com> wrote in message
news:1125079806....@g47g2000cwa.googlegroups.com...

> When talking about data that serve as references to other data at the

> logical level, there is nothing in the relational model that prohibits
> or even discourages such.

Then let's :
1) introduce for every domain D a relation DR(val_id, value)
2) replace every distinct occurence of a value with occurence_id
3) introduce the relation OV(occurence_id, val_id) such as the value
referenced by occurence_id is the same with the one referenced by val_id
4) normalize the relations resulted from the transformation at 2)

:-)

Alexandr Savinov

unread,

Aug 29, 2005, 6:33:42 AM8/29/05

to

x schrieb:

You have only one serious flaw in your reasoning:

we need an element in our model that will denote an end-user

In this sense I find your approach to defining meaningfulness rather
useful but unfortunately I do not know a formal theory that could deal
with end-users as an integral part of the model. We can dras boxes,
arrows, circles but it will not be a formal model.

In general I think that we lack information on "identtity modeling"
althoug it is as important as data modeling itself. Identity modeling is
a separate topic, a dual part for data modeling. In other words, we can
model identity ignoring object properties. And it may well be rather
complex model. It will involve entities without properties - only
identities. The following properties of identity make this task rather
difficult:

- Identity is distributed among many entites. It can be hierarchical.
For example, an element of categorization might have several segments
each specifying relative position. A fully qualified identifier then is
composed of several identifiers (for example, several primary keys taken
from different tables - having one primary key is not enough).

- Identity cannot be considered without its scope. For example, a
physical address is retricted by the scope of one computer, a primary
key might be restricted by one database etc.

- Logical/physical is a relative characterization rather than absolute.
Memory handle is really physical for an application program that uses
it, but it is logical for operating system w.r.t. to absolute offsets
in physical memory (offset may change while memory handle does not
change). In this sense all those disputes about lgoical/physical are
meaningless without specifying the context. Primary key may well be
viewed as a physical identifier from the point of view of some higher
level identification mechanism, say, global id. This means that global
id is permanent while primary key it substitues may change.

- Any identifier is based on some environemnt that provides a coordinate
system that it uses to produce its own identifiers. In other words, any
new identification system is based on some lower level identification
system (environemtn or context) with its scope and structure.

Currently no one model provides anything that would deal with the topic
of identity modeling. We can model columns and object properties more or
less successfully. But another side is still in darkness. We have tools
to implement identification mechanisms ourselves but we do not have a
theory for that.

--
http://conceptoriented.com

x

unread,

Aug 29, 2005, 10:18:03 AM8/29/05

to

"Alexandr Savinov" <sp...@conceptoriented.com> wrote in message

news:4312e49d$1...@news.fhg.de...

> You have only one serious flaw in your reasoning:
>
> we need an element in our model that will denote an end-user

> In this sense I find your approach to defining meaningfulness rather
> useful but unfortunately I do not know a formal theory that could deal
> with end-users as an integral part of the model. We can dras boxes,
> arrows, circles but it will not be a formal model.

But we have the three level architecture of DBMSs.
About formal models, I don't know either but I found this
http://people.cs.vt.edu/~jaburge/abstract.html .

Thank you for this list.

x

unread,

Aug 29, 2005, 10:29:25 AM8/29/05

to

"Alexandr Savinov" <sp...@conceptoriented.com> wrote in message

news:4312e49d$1...@news.fhg.de...

> You have only one serious flaw in your reasoning:

One is usually enough :-)

Marshall Spight

unread,

Aug 29, 2005, 11:19:15 AM8/29/05

to

x wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message

> > > eliminate pointers from the data model.
>
> > I'm not sure I share this point of view. I propose that
> > every pointer is meaningless; every key is meaningful,
> > whether system generated or not.
>
> Meaningless for whom ?
> Meaningful for whom ?

People.

For computers, which one is meaningful is reversed.

> >The meaning is exactly
> > that it is the identity of the row. A randomly-generated
> > customer id still means something.
>
> That customer_id is meaningful for the end-user even if it was once
> randomly-generated.
> If it is randomly-generated every time some customer buy something, then it
> is not a customer_id.

Exactly!

Marshall

mAsterdam

unread,

Aug 29, 2005, 3:29:35 PM8/29/05

to

Alexandr Savinov wrote:
[snip]

This deserves - at least - a thread of it's own.
If it would be a /. moderation I'ld have a hard time to choose between
'Insightful' and 'Interesting'.
No time to comment at this time, though :-(

Gene Wirchenko

unread,

Aug 29, 2005, 4:25:36 PM8/29/05

to

On Fri, 19 Aug 2005 12:47:41 +0300, "x" <x...@not-exists.org> wrote:

[snip]

>The quote does not mention if the key is generated by the DBMS or by the
>application.
>It only say that it has no meaning for the end user.

>
>This means that there must be a one to one mapping between the generated key
>and some key with a meaning for the end user. Therefore that meaningless
>primary key is a pointer. But one of the goals of the relational model is to

No. It is a key value. Unless it states where the data is
located (as in storage medium location (i.e. address)), it is not a
pointer.

>eliminate pointers from the data model.

"pointers" not "keys".

[snip]

Sincerely,

Gene Wirchenko

dawn

unread,

Aug 30, 2005, 12:02:15 AM8/30/05

to

mAsterdam wrote:
> Alexandr Savinov wrote:
> [snip]
> > In general I think that we lack information on "identtity modeling"
> > althoug it is as important as data modeling itself. Identity modeling is
> > a separate topic, a dual part for data modeling. In other words, we can
> > model identity ignoring object properties.

Would this be what I have been simply calling entity modeling, but
where I leave out all weak entities and non-key, non-foreign-key
properties? Or are you suggesting modeling the ID's rather than the
entities with their id's?

> > And it may well be rather
> > complex model. It will involve entities without properties - only
> > identities. The following properties of identity make this task rather
> > difficult:
> >
> > - Identity is distributed among many entites.

Hmmm. Several entities can all use the same attributes as unique
identifiers, but I'm not sure what it means that identity is
distributed among many entities.

> > It can be hierarchical. For example, an element of categorization might have several segments
> > each specifying relative position. A fully qualified identifier then is
> > composed of several identifiers (for example, several primary keys taken
> > from different tables - having one primary key is not enough).

Can you give an example?

> > - Identity cannot be considered without its scope. For example, a
> > physical address is retricted by the scope of one computer, a primary
> > key might be restricted by one database etc.
> >
> > - Logical/physical is a relative characterization rather than absolute.

agreed.

> > Memory handle is really physical for an application program that uses
> > it, but it is logical for operating system w.r.t. to absolute offsets
> > in physical memory (offset may change while memory handle does not
> > change). In this sense all those disputes about lgoical/physical are
> > meaningless without specifying the context.

or at least without a common understanding of the context

> > Primary key may well be
> > viewed as a physical identifier from the point of view of some higher
> > level identification mechanism, say, global id. This means that global
> > id is permanent while primary key it substitues may change.
> >
> > - Any identifier is based on some environemnt that provides a coordinate
> > system that it uses to produce its own identifiers. In other words, any
> > new identification system is based on some lower level identification
> > system (environemtn or context) with its scope and structure.
> >
> > Currently no one model provides anything that would deal with the topic
> > of identity modeling.

it might if you don't try to extract these identities from their
entities. If you model your strong entities and their identifiers,
what falls through the cracks routinely if you use OO modeling with
composition, aggregation, inheritance, etc?

> > We can model columns and object properties more or
> > less successfully. But another side is still in darkness. We have tools
> > to implement identification mechanisms ourselves but we do not have a
> > theory for that.

interesting.

>
> This deserves - at least - a thread of it's own.
> If it would be a /. moderation I'ld have a hard time to choose between
> 'Insightful' and 'Interesting'.

agreed. --dawn

x

unread,

Aug 30, 2005, 3:34:05 AM8/30/05

to

http://www.dbdebunk.com/page/page/622757.htm

http://docs.hp.com/en/32650-90386/ch04s02.html

"Gene Wirchenko" <ge...@ucantrade.com.NOTHERE> wrote in message
news:fcr6h15tdd96ik63v...@4ax.com...

David Cressey

unread,

Aug 30, 2005, 5:32:12 AM8/30/05

to

"mAsterdam" wrote:
> Alexandr Savinov wrote:
[snip]

> > - Identity cannot be considered without its scope. For example, a

> > physical address is retricted by the scope of one computer, a primary
> > key might be restricted by one database etc.

Agreed.

> >
> > - Logical/physical is a relative characterization rather than absolute.
> > Memory handle is really physical for an application program that uses
> > it, but it is logical for operating system w.r.t. to absolute offsets
> > in physical memory (offset may change while memory handle does not
> > change). In this sense all those disputes about lgoical/physical are
> > meaningless without specifying the context. Primary key may well be
> > viewed as a physical identifier from the point of view of some higher
> > level identification mechanism, say, global id. This means that global
> > id is permanent while primary key it substitues may change.

Agreed.

There are multiple levels of abstraction, and what is logical at one level
is physical at another.

One interesting case is that of the URL. Is a URL a key or a pointer? Does
it name the resource, or does it locate the resource?

x

unread,

Aug 30, 2005, 5:49:13 AM8/30/05

to

"David Cressey" <david....@earthlink.net> wrote in message
news:wIVQe.3985$_84....@newsread1.news.atl.earthlink.net...

> There are multiple levels of abstraction, and what is logical at one level
> is physical at another.

> One interesting case is that of the URL. Is a URL a key or a pointer?
Does
> it name the resource, or does it locate the resource?

URL = uniform resource locator
URN = uniform resource name
URI = uniform resource identifier
http://www.w3.org/TR/uri-clarification/

x

unread,

Aug 30, 2005, 6:08:28 AM8/30/05

to

"David Cressey" <david....@earthlink.net> wrote in message
news:wIVQe.3985$_84....@newsread1.news.atl.earthlink.net...

> One interesting case is that of the URL. Is a URL a key or a pointer?

Does
> it name the resource, or does it locate the resource?

It is a pointer, not a key.
The URL and the resource are not tied together.

Christopher Browne

unread,

Aug 30, 2005, 8:41:50 AM8/30/05

to

> One interesting case is that of the URL. Is a URL a key or a
> pointer? Does it name the resource, or does it locate the resource?

It can definitely be either.

<http://www.w3.org/Provider/Style/URI.html> "Cool URIs don't change"

There are a lot of lame reasons for having URLs that have components
that *are* expected to change:

- status changes
- authors leave
- file "extensions" are a throwback to the filesystem styles of the
past
- software mechanisms are *really* likely to change (e.g. - where
you have some sort of ".../exec/foo.cgi?latest?092348921fas"
portion)

A "cool URI" is a name for the resource; uncool URIs merely present
the current location...
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/
If we were meant to fly, we wouldn't keep losing our luggage.

x

unread,

Aug 30, 2005, 9:38:42 AM8/30/05

to

"Alexandr Savinov" <sp...@conceptoriented.com> wrote in message

news:4312e49d$1...@news.fhg.de...

> In general I think that we lack information on "identtity modeling"
> althoug it is as important as data modeling itself. Identity modeling is
> a separate topic, a dual part for data modeling. In other words, we can
> model identity ignoring object properties. And it may well be rather
> complex model. It will involve entities without properties - only
> identities. The following properties of identity make this task rather
> difficult:

Why modeling identity ignoring properties ?
Maybe identity is not that important.
If a database cannot distinguish some "objects", maybe they are
interchangeable.

Gene Wirchenko

unread,

Aug 30, 2005, 12:43:57 PM8/30/05

to

It is a key since it does not specify the location. In a way,
DNS is concerned with translating this key into a pointer.

Sincerely,

Gene Wirchenko

Marshall Spight

unread,

Aug 30, 2005, 1:46:37 PM8/30/05

to

To me, one of the most important characteristics of a pointer
is that it has an operation, dereference, on it, that gets the
pointed-to information. There also exists a reverse operation,
reference, that gets the pointer for a datum. In C, these are
* and & respectively. Note that SQL has no such operations.

It strikes me that GET http://example.com/index.html is a
dereference, so I'd call URLs pointers. DNS likewise has
pointer operations.

To a large extent it's a matter of perspective, though.

Marshall

mAsterdam

unread,

Aug 30, 2005, 2:14:22 PM8/30/05

to

dawn wrote:
> mAsterdam wrote:
>
>>Alexandr Savinov wrote:
[snip]

>>>- Logical/physical is a relative characterization rather than absolute.
>
>
> agreed.

Hi Dawn,
could I lure you into writing a draft [logical/physical] entry
for the glossary?

mAsterdam

unread,

Aug 30, 2005, 2:53:28 PM8/30/05

to

Gene Wirchenko wrote:
> x wrote:

>>David Cressey wrote:
>>
>>>One interesting case is that of the URL.
>>>Is a URL a key or a pointer?
>>>Does it name the resource, or does
>>>it locate the resource?
>>
>>It is a pointer, not a key.
>>The URL and the resource are not tied together.
>
> It is a key since it does not specify the location.
> In a way, DNS is concerned with
> translating this key into a pointer.

The URL is just an alias for another locator.
The existence of what is there (at the pointed-at)
location does not depend on the existence of the URL.

Real keys do not have things that are existentially
dependent on them, but my intuition (not checking
any definitons anywhere) is that primary keys do:
A tuple (relational value) can only exist /with/ it.

So, I (almost) agree with x:
the URL is a pointer(-alias), no key.

The distinction between pointer and reference is that
the referenced object's physical location may change
without a need to change the reference. The alias is
one way (an indirection) to accomplish that.

So the URL may be a reference - but still no key.

My 2 Eurocents.

paul c

unread,

Aug 30, 2005, 3:12:43 PM8/30/05

to

> ...

Apologies if I'm taking David's original question above out of context,
but 'out of context' seems important in this case. As far as the RM is
concerned, my understanding is that within the RM something (ie.
attributes) is a key only within a relation. It's a set of attributes
with some constraint on their values that allows other single-valued
attributes to be dependent on the key values. There are deeper reasons
for it, but that's the only essential one I know of.

Some people say a key is an "address" because keys have been described
as the RM's "addressing mechanism". It doesn't follow that a physical
storage address is at all the same thing. For example, outside the RM,
how could a physical address possibly point to a tuple in a relation?

As far as the RM is concerned, once you take it out of the database
context, it can be anything one might want it to be even if it doesn't
make sense. You could call it Margaret Thatcher or a kumkwat and the RM
wouldn't care. This is okay even if it doesn't make sense. It must be,
witness all the applications that don't make sense.

There was some talk recently to the effect, if I read it right, that a
generated IDENTITY (perhaps the capitals have some special meaning to
some particular product) somehow pollutes or entangles or locks-in a
logical database. Not sure whether this particular IDENTITY was
intended to be a value that is merely generated somehow or if it refers
to, say, a storage location of some physical machine.

I'd say whether it is one or the other doesn't matter at all to the RM.
It may matter to some application that happens to be mapping memory
but that's neither here nor there for the RM. There's nothing I know of
in the RM that forbids a certain value. If some user or some program
inserts it, all the database wants to know are the internal rules it
should follow in deciding whether to allow the insert. If it is a key,
then one mandatory rule will be that no tuple with the same value for
those attributes can be the same. There might be other rules too and a
designer might write some kind of trigger or procedure that goes outside
the database to see if it's a real address. That sounds like
application to me.

Same thing goes for generated keys. If they happen to be sequential,
the RM won't care, as long as they are 'unique'. Some application could
depend on sequence but it would have to guarantee the sequence itself.
Good argument for user domains. Guaranteeing sequence without gaps is a
bigger problem than many people realize in fact I believe even though I
can't prove it except anecdotally that there are always situations when
it won't work. So an application would have to do this itself and make
whatever conditions it need to stick, stick. It might happen to use
some DBMS features to make this happen such as locking support, but I
don't think there is any RM feature it could use or at least any that
would be the only way to do it.

The arguments about IDENTITY or anything else for that matter, being
problematic to a 'relational' database because they are physical and not
logical *outside* the database are apples and oranges and straw at the
same time. Just noise for consultants and various writers to make when
they can't think what to say next. Confusing silence with stupidity.

I detect a consultant's trick in the argument that the database must
contain only the 'logical' values and not 'physical' ones, where
customers or readers might be confused by all the red herrings thrown up
into thinking that if values are only "logical" then the corollary
being that the database that has been designed for them must be logical
too, a kind of sine qua non for their application being logical to boot.
This is complete BS.

There's also nothing in the RM, at least what I've read about it, that
prefers generated keys over 'natural' keys. A generated key might be a
tell-tale that some designer wasn't capable of considering the uses of
natural keys in future applications and simply copped out. But it could
mean the opposite as well - that the designer was fully aware of all
intentions for the database and was explicitly ruling out natural keys.

There's at least one other time when a generated key could help, for
example when you've screwed up the design and people have put lots of
data into the db. Then you could, conceivably, at least append a key,
take some projections and join / union them into the tables you should
have designed in the first place.

What a user thinks a key means is something the RM isn't really
concerned with. It's up to the designer to tell them how it should be
used, ie. what it means as far as the user doing the user's job is
concerned. The designer might tell one set of users, say the Human
Remains people, that there are two sets of departments, one that the
Payroll users see in the view defined for them and another one that the
HR people can see as well. Just two attributes in the DEPT table, only
the Payroll people can't see one of them. Why would you do this? Well,
the HR people could update one row and move everybody in one department
to another. Only the HR people would have to remember that the two
attributes had different meanings. Contrived example, I'll admit, but
it does show why users have to be educated by the designer.

Ah, that feels better,
thanks,
p