A real world example

Brian Selzer

unread,

Aug 9, 2006, 10:35:18 AM8/9/06

to

In an earlier thread, Marshall asked for a real-world example, and while
this isn't exactly what was asked for, it illustrates the problem. Here's
the situation. I have two databases containing orders. One is the
customer's database containing purchase orders, lines, schedules and
releases. The other is the supplier's database containing sales orders and
lines. There should be a 1:1 relationship between the releases in the
customer's database and the lines in the supplier's database.

If you join the respective tables in each of the databases, you get the
following:

Customer purchase order detail:
{PO#, POLine#, CustPart#, Schedule#, ReqestedDate, RequiredQuantity,
ReceivedQuantity}

The primary key is {PO#, POLine#, Schedule#, RequestedDate}

Supplier sales order detail:
{SO#, PO#, SOLine#, Part#, CustPart#, RequestedDate, PromiseDate,
RequiredQuantity, ShippedQuantity}

The primary key is {SO#, SOLine#},
an alternate key is {PO#, CustPart#, RequestedDate, SO#}

The first problem is that the customer uses a blanket PO, and therefore,
(usually) reuses the same PO# and Line# to order a part--the schedule number
differentiates one set of releases from another.

The second problem is that the supplier's database doesn't track schedules,
so there's no way to differentiate sets of releases, except that the SO#
will be different for different schedules.

Aside from those there have been keying errors, some rows don't have a
customer part #, some rows don't have a PO number, etc.

What I've been asked to do is to produce exception reports, showing what is
different between the two databases so that changes to purchase orders that
appear in the customer's database can be verified after they're entered into
the supplier's database.

My point is that here is a real-world example where the universes of two
databases overlap, but the set of attributes used to identify something
common to the two universes is different for different databases. I
understand that this is a common problem when merging or otherwise
connecting databases, and I've run into this numerous times and have it
covered, but the fact remains that a similar situation can occur within a
single database. With natural keys, that is, keys whose value can be
different in successive database states, it is possible for the values of
one set of identifying attributes of something in the universe to be
different in successive universe states, and when this new information is
imparted to the database, it is possible for there to be propositions in
successive database states that should correspond, but cannot because the
only relevant set of identifying attributes of something is different in
successive universe states. It doesn't matter how stable a set of
identifying attributes is, if there's the slightest chance that its values
can be different in successive universe states, you have to assume that they
will. The problem is one of relevance. If the DNA of a person is not
relevant, but is the only set of attribute values that is guaranteed to
remain constant throughout the discussion, then what do you do? If you need
to know what was known about a thing, then there are only three choices: (1)
use a surrogate for the DNA and make it the primary key, (2) identify each
statement in the database so that it can be discussed in successive database
states (in other words, use a surrogate for each statement), or (3) define a
tuple-level assignment operator (A tuple-level assignment operator would
permit the system to correlate tuples in successive database states and to
act accordingly. For example, Oracle has a FOR EACH ROW trigger.)

Keith H Duggar

unread,

Aug 11, 2006, 1:43:55 PM8/11/06

to

Brian Selzer wrote:
> In an earlier thread, Marshall asked for a real-world
> example, and while this isn't exactly what was asked for,
> it illustrates the problem.

(Brian note that I'm just a novice. So any questions or
points I raise are not to argue a position but rather to
provoke further detail or explanation for my Level 6
Fraudulent Education.)

Later you say that it illustrates not "the" but rather "a
similar" problem. Can you provide a real-world example of
"the" actual problem as Marshall asked?

[snip example of a "similar" but different problem]

> My point is that here is a real-world example where the
> universes of two databases overlap, but the set of
> attributes used to identify something common to the two
> universes is different for different databases.

So isn't the actual problem that the databases employ
different models of the same data? In other words, it seems
in that example (and in "the" problem), that the problem is
not different /values/ but different /models/?

> the fact remains that a similar situation can occur within
> a single database. With natural keys, that is, keys whose
> value can be different in successive database states, it
> is possible for the values of one set of identifying
> attributes of something in the universe to be different in
> successive universe states, and when this new information
> is imparted to the database, it is possible for there to
> be propositions in successive database states that should
> correspond, but cannot because the only relevant set of
> identifying attributes of something is different in
> successive universe states.

Why "should [they] correspond"? Are you envisioning some
meta-data external to the data model that remains constant?
Some property not modeled that corresponds? If this external
information and "correspondence" is important, then why is
it not internalized explicitly as part of the data model?

Just as in the previous example, it seems that in your mind
you maintain a /different/ model than the database. So again
the problem seems not to be the changing values but rather
that the two DBMS (in this case the machine and your mind)
have /different models/. No?

Can you give a real-world example of two entities sharing the
/same/ data model having such a correspondence mismatch?

-- Keith -- Fraud 6

JOG

unread,

Aug 11, 2006, 8:39:13 PM8/11/06

to

It may be tiredness, but I found that example hard to digest. However
if I understand your gist correctly, you are pointing out a situation
where an external entity is identified differently, in two distinct and
separate databases? If so, then this will always necessarily require a
'situated' sentient-being to act as middleware. After all a db doesn't
represent a universal truth about the real world, it just stores an
encoding of it acceptable enough for a target user to interpret what it
tells them in a useful way. In the end its always the human user
putting the semantic pieces back together.

>
> My point is that here is a real-world example where the universes of two
> databases overlap, but the set of attributes used to identify something
> common to the two universes is different for different databases. I
> understand that this is a common problem when merging or otherwise
> connecting databases, and I've run into this numerous times and have it
> covered, but the fact remains that a similar situation can occur within a
> single database. With natural keys, that is, keys whose value can be
> different in successive database states, it is possible for the values of
> one set of identifying attributes of something in the universe to be
> different in successive universe states

huh? different identifying attributes, different things. How can they
be the same, unless there is an incorrect key choice? Liebniz equality,
etc., etc. (I know you've heard this from me before now, so apologies
for the repetition, but I still can't see how I'm not connecting with
you on it). The successive database states example, unless i've
misconstrued it, appears analagous to talking about a company who uses
a primary key of firstname for its employees relation, only for them to
realise a year later that they've cocked up when they hire someone with
a clashing firstname. Thats no mistake in the theory, just in the
coconuts trying to apply it.

>, and when this new information is
> imparted to the database, it is possible for there to be propositions in
> successive database states that should correspond, but cannot because the
> only relevant set of identifying attributes of something is different in
> successive universe states.

Again I'm foxed by this Brian. How can propositions with different keys
correspond between database states? That notion seems nonsensical to
me. Only entities can 'correspond', and of course they are part of the
conceptual not logical model right?

> It doesn't matter how stable a set of
> identifying attributes is, if there's the slightest chance that its values
> can be different in successive universe states, you have to assume that they
> will.

Well, only if there is sufficent risk involved to necessitate worrying
about such a situation for that specific database. It is _possible_
that two people's DNA may match for example, but the chances are so
absurdly infintessimely small that it is of no concern.

> The problem is one of relevance. If the DNA of a person is not
> relevant, but is the only set of attribute values that is guaranteed to
> remain constant throughout the discussion, then what do you do? If you need
> to know what was known about a thing, then there are only three choices: (1)
> use a surrogate for the DNA and make it the primary key, (2) identify each
> statement in the database so that it can be discussed in successive database
> states (in other words, use a surrogate for each statement), or (3) define a
> tuple-level assignment operator (A tuple-level assignment operator would
> permit the system to correlate tuples in successive database states and to
> act accordingly. For example, Oracle has a FOR EACH ROW trigger.)

Naturally (1) (with that 'if there is sufficient risk' caveat) - after
all, that's what a surrogate is for. A replacement for an unobtainable
distinguishing attribute. It is the only option which makes any sense
to me logically. However you do not specify which you would choose? My
impression from your previous posts is that you would also choose (1)
but would desire it to be hidden? All best, J.

Brian Selzer

unread,

Aug 11, 2006, 10:32:50 PM8/11/06

to

"Keith H Duggar" <dug...@alum.mit.edu> wrote in message
news:1155318235.1...@m79g2000cwm.googlegroups.com...

> Brian Selzer wrote:
>> In an earlier thread, Marshall asked for a real-world
>> example, and while this isn't exactly what was asked for,
>> it illustrates the problem.
>
> (Brian note that I'm just a novice. So any questions or
> points I raise are not to argue a position but rather to
> provoke further detail or explanation for my Level 6
> Fraudulent Education.)
>
> Later you say that it illustrates not "the" but rather "a
> similar" problem. Can you provide a real-world example of
> "the" actual problem as Marshall asked?
>
> [snip example of a "similar" but different problem]
>
>> My point is that here is a real-world example where the
>> universes of two databases overlap, but the set of
>> attributes used to identify something common to the two
>> universes is different for different databases.
>
> So isn't the actual problem that the databases employ
> different models of the same data? In other words, it seems
> in that example (and in "the" problem), that the problem is
> not different /values/ but different /models/?
>

No. It's not the data that's the same: it's that the things that the data
is about are the same. A database contains information about things in the
universe. In a relational database, this information is organized into
relations, but that doesn't change the fact that each proposition describes
things in the universe. The problem is that a thing can have multiple sets
of identifying properties, that some of these identifying properties can
have different values in different situations, and that when propositions
about a thing refer to it by using identifying properties that can be
different in different situations, there's not enough information available
to know that you're talking about the same thing.

>> the fact remains that a similar situation can occur within
>> a single database. With natural keys, that is, keys whose
>> value can be different in successive database states, it
>> is possible for the values of one set of identifying
>> attributes of something in the universe to be different in
>> successive universe states, and when this new information
>> is imparted to the database, it is possible for there to
>> be propositions in successive database states that should
>> correspond, but cannot because the only relevant set of
>> identifying attributes of something is different in
>> successive universe states.
>
> Why "should [they] correspond"? Are you envisioning some
> meta-data external to the data model that remains constant?
> Some property not modeled that corresponds? If this external
> information and "correspondence" is important, then why is
> it not internalized explicitly as part of the data model?
>
> Just as in the previous example, it seems that in your mind
> you maintain a /different/ model than the database. So again
> the problem seems not to be the changing values but rather
> that the two DBMS (in this case the machine and your mind)
> have /different models/. No?
>

No. See below.

> Can you give a real-world example of two entities sharing the
> /same/ data model having such a correspondence mismatch?
>
> -- Keith -- Fraud 6
>

The universe of discourse is the set of all things that can be discussed.
Each element has a set of characteristics, or properties, that define and
describe it. As I see it, these properties fall into two categories,
individual and universal. An individual property defines the essence of
something, either by itself or when combined with other individual
properties. Individual properties remain constant because their values
depend only on the existence of the thing that they define. For example,
someone's DNA sequence could be considered an individual property (though in
the case where there can exist identical twins or clones, it may not be
sufficient by itself to identify a particular person). A universal
property, on the other hand, is descriptive in nature and its value depends
on a state of the universe (a situation). The value of a universal property
of something can be different in different situations. For example,
person's location can be different at different points in time. (I grant
that the distinction between individual properties and universal properties
assumes that something can come into existence only once.)

So, individual properties define something, and universal properties
describe something; it is possible, however, for a set of universal
properties to distinguish something from all other things within the scope
of a particular situation. For example, consider a queue of people:

(Bob, Brian, Lynn, Susan, Mark, Lisa)

Lynn is third in the queue, so if you say something about Lynn or if you say
something about the third person in the queue, then you're talking about the
same person. But, if Bob is removed from the queue, you get:

(Brian, Lynn, Susan, Mark, Lisa)

Anything that was said about the third person in the queue now applies to
Susan, not Lynn. This is because a person's position in the queue is a
universal property, rather than an individual property, even though that
property can identify a person within a particular situation. To clarify,
consider the following relation schema, QUEUE, with two values, the first
representing what was known prior to Bob's removal, and the second
representing what is known after:

QUEUE: (DNA sequence, Name, Position)

QUEUE Before:
{(1234..., "Bob", 1),
(2345..., "Brian", 2),
(3456..., "Lynn", 3),
(4567..., "Susan", 4),
(5678..., "Mark", 5),
(6789..., "Lisa", 6)}

QUEUE After:
{(2345..., "Brian", 1),
(3456..., "Lynn", 2),
(4567..., "Susan", 3),
(5678..., "Mark", 4),
(6789..., "Lisa", 5)}

[For the sake of this discussion, please assume that "1234...," "2345....,"
etc. are a representation of different people's DNA sequences. Also assume
that identical twins and clones cannot exist in this universe. While I
understand that such a representation is contrived, I'm using DNA sequence
as an attribute to prevent the idiotic argument that a person's name can't
be a candidate key from detracting from the points I'm trying to make.]

This relation schema has two candidate keys, "DNA sequence" and "Position,"
because each meets the criteria for a candidate key: its values determine
all other attribute values within the same relation value, it is minimal,
and these conditions apply to every legal relation value; but it should be
obvious that "DNA sequence" refers to an individual property, whereas
"Position" refers to a universal property. Confusion arises because the
definition of a candidate key ensures only that *statements* about things
are unique within any *single* relation value, not that the things under
discussion are unique, nor that the values of all of the identifying
properties of the things under discussion will remain the same in successive
database states. It is possible, therefore, to define a relation schema
with a single candidate key that refers to a universal property as one of
its domains, for example, Social Security Number.

Since a relation schema can be so defined and since the value of a universal
property can be different in different situations, it is possible for
propositions in successive database states that describe the same aspect of
the same thing to have different candidate key values. In other words, you
can't tell if you're talking about the same thing.

The points I was trying to make in the earlier posts was that this inability
to correlate the propositions in one database state to those in the next
poses several problems. For example, it precludes enforcement of transition
constraints. If you can't tell that you're talking about the same thing,
then how can you determine whether a new attribute value is legal? I refer
you back to those earlier posts for enumeration and examples of the other
problems.

Brian Selzer

unread,

Aug 11, 2006, 11:21:19 PM8/11/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1155343153....@h48g2000cwc.googlegroups.com...

I was pretty tired when I posted it. Maybe it was contageous. It was
possible to correlate most of the entries by using temporal sequencing, but
it requires two successive database states from the customer database; I
just created an extra few reports to deal with the duplicates.

>>
>> My point is that here is a real-world example where the universes of two
>> databases overlap, but the set of attributes used to identify something
>> common to the two universes is different for different databases. I
>> understand that this is a common problem when merging or otherwise
>> connecting databases, and I've run into this numerous times and have it
>> covered, but the fact remains that a similar situation can occur within a
>> single database. With natural keys, that is, keys whose value can be
>> different in successive database states, it is possible for the values of
>> one set of identifying attributes of something in the universe to be
>> different in successive universe states
>
> huh? different identifying attributes, different things.

No, you can have more than one set of identifying attributes for a thing. I
just posted a detailed explanation in response to keith's post. I think
that instead of revisiting the explanations here, I should refer you there.

>... How can they

> be the same, unless there is an incorrect key choice? Liebniz equality,
> etc., etc. (I know you've heard this from me before now, so apologies
> for the repetition, but I still can't see how I'm not connecting with
> you on it). The successive database states example, unless i've
> misconstrued it, appears analagous to talking about a company who uses
> a primary key of firstname for its employees relation, only for them to
> realise a year later that they've cocked up when they hire someone with
> a clashing firstname. Thats no mistake in the theory, just in the
> coconuts trying to apply it.
>

Most of what I do is cleaning up other people's messes, so I understand
fully about poor key choice, but I think the problem goes deeper than that.

>>, and when this new information is
>> imparted to the database, it is possible for there to be propositions in
>> successive database states that should correspond, but cannot because the
>> only relevant set of identifying attributes of something is different in
>> successive universe states.
>
> Again I'm foxed by this Brian. How can propositions with different keys
> correspond between database states? That notion seems nonsensical to
> me. Only entities can 'correspond', and of course they are part of the
> conceptual not logical model right?
>

If a proposition describes some aspect of one or more things in the
universe, then you should be able to correlate that proposition in one
database state with a proposition that describes the same aspect of the same
thing or things in the next database state. In other words, you may need to
know what is known about something in order to assert something new about
that thing.

I think it depends on the situation. (1) if the universe doesn't contain
the database, (2) if it does, and (3) if changing the schema will break a
bunch of applications. (2) is much more appealing for temporal databases.

Bob Badour

unread,

Aug 12, 2006, 11:26:03 AM8/12/06

to

JOG wrote:

It's hard to digest because it is incomplete and artificial. It seems
difficult because the idiot expects a 1:1 correspondence between tuples
when none should exist based on the given information. A summary of two
relations projected on PO# must reconcile, and that is all. If they must
reconcile, then presumably the workers at the sales organization must
have kludged their system probably by combining the customer's PO# and
Schedule# into the sales PO#.

Supplying common business sense where details were omitted, the total
amount invoiced must reconcile with the total amount authorized on the
purchase order. The total requested quantities must match and the total
quantity shipped must reconcile with the quantity requested and authorized.

If things don't reconcile, someone will have to go through the whole
thing line by line and probably also locating the physical material.
Forensic studies are necessarily meticulous and labour intensive.

The whole example suggests to me that the sales organization hired a
crank like Selzer in the first place who then failed to correctly
capture the requirements for progress draws.

However
> if I understand your gist correctly, you are pointing out a situation
> where an external entity is identified differently, in two distinct and
> separate databases?

It goes deeper than that. Will the purchasing organization accept or pay
for shipments they cannot trace to an exact (PO#,Schedule#) ? Why should
they pay for something they didn't order? How else do they know they
ordered it?

If so, then this will always necessarily require a
> 'situated' sentient-being to act as middleware. After all a db doesn't
> represent a universal truth about the real world, it just stores an
> encoding of it acceptable enough for a target user to interpret what it
> tells them in a useful way. In the end its always the human user
> putting the semantic pieces back together.
>
>
>>My point is that here is a real-world example where the universes of two
>>databases overlap, but the set of attributes used to identify something
>>common to the two universes is different for different databases. I
>>understand that this is a common problem when merging or otherwise
>>connecting databases, and I've run into this numerous times and have it
>>covered, but the fact remains that a similar situation can occur within a
>>single database. With natural keys, that is, keys whose value can be
>>different in successive database states, it is possible for the values of
>>one set of identifying attributes of something in the universe to be
>>different in successive universe states
>
> huh? different identifying attributes, different things. How can they
> be the same, unless there is an incorrect key choice?

Since when did a natural key become an unstable key? If you haven't
figured it out yet, it is impossible to communicate with a crank who
makes up his own definitions for words.

A natural key is simply a familiar surrogate. Nothing more. Nothing
less. While stability is an important design criterion for choosing
keys, both surrogate and natural keys can be stable or unstable.

If his whole argument is that stability is an important design
criterion, the appropriate answer is "Well, duh!"
http://www.dbdebunk.com/page/page/622344.htm

Liebniz equality,
> etc., etc. (I know you've heard this from me before now, so apologies
> for the repetition, but I still can't see how I'm not connecting with
> you on it). The successive database states example, unless i've
> misconstrued it, appears analagous to talking about a company who uses
> a primary key of firstname for its employees relation, only for them to
> realise a year later that they've cocked up when they hire someone with
> a clashing firstname. Thats no mistake in the theory, just in the
> coconuts trying to apply it.

Some kind of nuts anyway.

>>, and when this new information is
>>imparted to the database, it is possible for there to be propositions in
>>successive database states that should correspond, but cannot because the
>>only relevant set of identifying attributes of something is different in
>>successive universe states.
>
> Again I'm foxed by this Brian. How can propositions with different keys
> correspond between database states? That notion seems nonsensical to
> me. Only entities can 'correspond', and of course they are part of the
> conceptual not logical model right?

If you haven't figured it out yet, Selzer thinks physical location in
storage has some kind of theoretical or logical importance.

>>It doesn't matter how stable a set of
>>identifying attributes is, if there's the slightest chance that its values
>>can be different in successive universe states, you have to assume that they
>>will.
>
> Well, only if there is sufficent risk involved to necessitate worrying
> about such a situation for that specific database. It is _possible_
> that two people's DNA may match for example, but the chances are so
> absurdly infintessimely small that it is of no concern.

Clones are more common than you think. An incidence of 4 per 1000 for
identical twins makes them just a little more scarce than Canadians in
the world population.

>>The problem is one of relevance. If the DNA of a person is not
>>relevant, but is the only set of attribute values that is guaranteed to
>>remain constant throughout the discussion, then what do you do?

That would be a poor assumption given the existence of chimeras and
transplant recipients--not to mention early recipients of stem cell
treatments.

If you need
>>to know what was known about a thing, then there are only three choices: (1)
>>use a surrogate for the DNA and make it the primary key,

i.e. Choose a valid candidate key. Familiarity, while also an important
design criterion, is irrelevant to the point discussed. After all, a
natural key is merely a familiar surrogate.

(2) identify each
>>statement in the database so that it can be discussed in successive database
>>states (in other words, use a surrogate for each statement),

If one has a valid key, the statements already have logical identity. If
that changes, it is no longer the same statement.

or (3) define a
>>tuple-level assignment operator (A tuple-level assignment operator would
>>permit the system to correlate tuples in successive database states and to
>>act accordingly. For example, Oracle has a FOR EACH ROW trigger.)

That would be stupid. Physical identification has numerous problems that
logical identification obviates.

> Naturally (1) (with that 'if there is sufficient risk' caveat) - after
> all, that's what a surrogate is for. A replacement for an unobtainable
> distinguishing attribute. It is the only option which makes any sense
> to me logically. However you do not specify which you would choose? My
> impression from your previous posts is that you would also choose (1)
> but would desire it to be hidden? All best, J.

Who can say what the hell he is trying to say? He makes up his own
definitions for words and expects everyone to know them. He is an idiot
and a crank.

Brian Selzer

unread,

Aug 12, 2006, 1:46:23 PM8/12/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

No. I stepped in to fix a problem that already existed. The problem was
real--it was in no way artificial. The supplier's database was royally
screwed up. Schedules were being missed and they were in danger of losing a
significant percentage of their business. And yes, the tuples should
correspond because they refer to the same thing. You are obviously too
pig-headed and closed-minded to comprehend that. To add fuel to the fire,
the customer makes frequent changes to the schedules for parts, so there is
no way to correlate by using total amounts or quantities. Any new database
state from the supplier can contain changes to and cancellations of existing
schedules and releases, along with additional schedules and releases for
existing POs, as well as totally new POs. Therefore, the supplier's
database will always lag behind the customer's, in the same way that what is
known by the database lags behind the current state of the universe.

>
> However
>> if I understand your gist correctly, you are pointing out a situation
>> where an external entity is identified differently, in two distinct and
>> separate databases?
>
> It goes deeper than that. Will the purchasing organization accept or pay
> for shipments they cannot trace to an exact (PO#,Schedule#) ? Why should
> they pay for something they didn't order? How else do they know they
> ordered it?
>

The supplier has been hand writing this information on the packing slips.
Scary, yes?

No. Primary keys should not just be stable, they should be immutable.
Unless values for key attributes are guaranteed to remain constant
throughout the discussion, then it doesn't matter how stable a key is: if it
can change, it will, and if you don't have a mechanism to detect such
changes and deal with them, then you'll end up with garbage in your
database. It's as simple as that. Any system that relies on the stability
of a key cannot be proven to be correct. It may operate correctly 99.999%
of the time, but it will eventually fail. I've seen it happen many times.
You must either limit the types of updates that can occur, or you must add a
ton of extra code.

>
> Liebniz equality,
>> etc., etc. (I know you've heard this from me before now, so apologies
>> for the repetition, but I still can't see how I'm not connecting with
>> you on it). The successive database states example, unless i've
>> misconstrued it, appears analagous to talking about a company who uses
>> a primary key of firstname for its employees relation, only for them to
>> realise a year later that they've cocked up when they hire someone with
>> a clashing firstname. Thats no mistake in the theory, just in the
>> coconuts trying to apply it.
>
> Some kind of nuts anyway.
>
>
>>>, and when this new information is
>>>imparted to the database, it is possible for there to be propositions in
>>>successive database states that should correspond, but cannot because the
>>>only relevant set of identifying attributes of something is different in
>>>successive universe states.
>>
>> Again I'm foxed by this Brian. How can propositions with different keys
>> correspond between database states? That notion seems nonsensical to
>> me. Only entities can 'correspond', and of course they are part of the
>> conceptual not logical model right?
>
> If you haven't figured it out yet, Selzer thinks physical location in
> storage has some kind of theoretical or logical importance.
>

That's an asinine comment, and a misrepresentation. Why do you feel the
need to lie? Is your argument so weak that you must resort to misdirection,
misrepresentation, and outright falsehoods? Are you a Democratic
politician, or an adolescent?

As I stated before, if it can change, it will.

>
> (2) identify each
>>>statement in the database so that it can be discussed in successive
>>>database
>>>states (in other words, use a surrogate for each statement),
>
> If one has a valid key, the statements already have logical identity. If
> that changes, it is no longer the same statement.
>

This proves your inability to see beyond your nose. The scope of a
candidate key's ability to provide logical identity is a single database
value. Perhaps you should review the definition. I'm sure there are also a
lot of people on this forum who could walk you through it if that's what you
need. I suggest, however, that you treat them with a little more respect
than you have in the past. Most people don't respond well to abuse.

>
> or (3) define a
>>>tuple-level assignment operator (A tuple-level assignment operator would
>>>permit the system to correlate tuples in successive database states and
>>>to
>>>act accordingly. For example, Oracle has a FOR EACH ROW trigger.)
>
> That would be stupid. Physical identification has numerous problems that
> logical identification obviates.
>

I agree that it would be stupid, but not with your reason. What is physical
identification, and what does it have to do with correlating tuples? Are
you talking about the address on the disk of a particular tuple? Using that
would be the height of stupidity.

>
>> Naturally (1) (with that 'if there is sufficient risk' caveat) - after
>> all, that's what a surrogate is for. A replacement for an unobtainable
>> distinguishing attribute. It is the only option which makes any sense
>> to me logically. However you do not specify which you would choose? My
>> impression from your previous posts is that you would also choose (1)
>> but would desire it to be hidden? All best, J.
>
> Who can say what the hell he is trying to say? He makes up his own
> definitions for words and expects everyone to know them. He is an idiot
> and a crank.

No, I'm a maniac because my mind operates faster than yours; you, on the
other hand, appear to be doing 25mph in the passing lane.

Brian Selzer

unread,

Aug 13, 2006, 6:03:24 AM8/13/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

[snip]

> A natural key is simply a familiar surrogate. Nothing more. Nothing less.

I disagree. A the value of a surrogate (at least according to Codd, and
also Date, if I recall correctly) should permanently identify something.
That has always been my understanding, and that has always been how I've
used the term. Natural keys can change and still refer to the same thing.
It's easy to prove. Consider a relation schema that describes employees and
has two candidate keys, Social Security Number and Badge Number. If an
employee gets a new Badge Number because he lost his badge, does the new
Badge Number refer to the same employee? The answer is obvious: if it
didn't, then the fact that the Social Security Number didn't change
contradicts that. The definition of a candidate key guarantees that the
propositions in a single relation value are unique; therefore, a candidate
key value can identify a tuple, but only within a single relation value. In
order to span multiple database states, that value must be permanent. Codd
understood this even if you can't get it through your head: I refer you to
the paper he wrote in 1979, "Extending the Database Relational Model to
Capture More Meaning."

[snip]

anit...@gmail.com

unread,

Aug 14, 2006, 9:00:20 AM8/14/06

to

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:MNCDg.8096$9T3...@newssvr25.news.prodigy.net...

>
> "Bob Badour" <bba...@pei.sympatico.ca> wrote in message
> news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...
>
> [snip]
>
>> A natural key is simply a familiar surrogate. Nothing more. Nothing less.
>
> I disagree. A the value of a surrogate (at least according to Codd, and
> also Date, if I recall correctly) should permanently identify something.

The only way one can have an attribute whose values never change is by
violating information principle.

> That has always been my understanding, and that has always been how I've
> used the term. Natural keys can change and still refer to the same thing.

Nothing in surrogacy suggests immutability of values.

> It's easy to prove. Consider a relation schema that describes employees
> and has two candidate keys, Social Security Number and Badge Number. If
> an employee gets a new Badge Number because he lost his badge, does the
> new Badge Number refer to the same employee? The answer is obvious: if it
> didn't, then the fact that the Social Security Number didn't change
> contradicts that. The definition of a candidate key guarantees that the
> propositions in a single relation value are unique; therefore, a candidate
> key value can identify a tuple, but only within a single relation value.
> In order to span multiple database states, that value must be permanent.
> Codd understood this even if you can't get it through your head: I refer
> you to the paper he wrote in 1979, "Extending the Database Relational
> Model to Capture More Meaning."

Perhaps you may want to read through the contradictions in Codd's RM/T
paper.
http://www.intelligententerprise.com/db_area/archives/1999/990106/online1.jhtml

--
Anith
>
> [snip]
>
>

Bob Badour

unread,

Aug 14, 2006, 9:44:56 AM8/14/06

to

anit...@gmail.com wrote:

> "Brian Selzer" <br...@selzer-software.com> wrote in message
> news:MNCDg.8096$9T3...@newssvr25.news.prodigy.net...
>
>>"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
>>news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...
>>
>>[snip]
>>
>>
>>>A natural key is simply a familiar surrogate. Nothing more. Nothing less.
>>
>>I disagree. A the value of a surrogate (at least according to Codd, and
>>also Date, if I recall correctly) should permanently identify something.

Cite? The value of a candidate key is it permanently identifies
something; hence, the importance of stability as a design criterion.
That is as true for natural keys as for surrogates.

> The only way one can have an attribute whose values never change is by
> violating information principle.

How so? I agree that hiding an attribute violates the information
principle, but I don't see how precluding changes does.

>>That has always been my understanding, and that has always been how I've
>>used the term. Natural keys can change and still refer to the same thing.

Selzer's understanding is flawed. I suggest he should try harder to use
correct terms.

> Nothing in surrogacy suggests immutability of values.

Hear! Hear!

>>It's easy to prove.

If one accepts Selzer's definitions of terms, then by definition nothing
needs proof. However, his definitions of terms differ greatly from the
generally accepted definitions, which makes any pretense of
communication with him pointless.

Consider a relation schema that describes employees
>>and has two candidate keys, Social Security Number and Badge Number. If
>>an employee gets a new Badge Number because he lost his badge, does the
>>new Badge Number refer to the same employee?

The badge number logically identifies a badge and does so immutably. The
attribute describing the employee merely describes the current
assignment of a badge to the employee.

Suppose a criminally minded individual finds the lost badge and uses it
to gain unauthorized access to the employer's resources. Where does
Selzer's proposed schema record who the badge is assigned to?

In the example he gave above, he proposed two arbitrarily assigned
numbers as candidate keys. What is a surrogate if not an arbitrarily
assigned number?

The numbers happen to be somewhat familiar because the employee already
uses the SSN for his taxes, and the badge number presumably appears on a
physical artifact handed to the employee: ie. the badge. The familiarity
of the keys makes them natural keys.

The answer is obvious: if it
>>didn't, then the fact that the Social Security Number didn't change
>>contradicts that.

Ironically, the idiot is too stupid to realize that SSN's are just as
mutable as badge numbers. While one may not legally have two SSN's at
the same time, transcription errors and identity theft mean SSN's do
change over time from the perspective of the organization.

The definition of a candidate key guarantees that the
>>propositions in a single relation value are unique; therefore, a candidate
>>key value can identify a tuple

True.

, but only within a single relation value.

Bullshit. See foreign key references.

>>In order to span multiple database states, that value must be permanent.

Well, duh. This is why stability is an important design criterion for
choosing candidate keys. Saying that stability is an important design
criterion is a wholly unremarkable statement. If that is what the idiot
was trying to say, why didn't he just say it?

>>Codd understood this even if you can't get it through your head: I refer
>>you to the paper he wrote in 1979, "Extending the Database Relational
>>Model to Capture More Meaning."
>
> Perhaps you may want to read through the contradictions in Codd's RM/T
> paper.
> http://www.intelligententerprise.com/db_area/archives/1999/990106/online1.jhtml

Indeed. Thank you.

anit...@gmail.com

unread,

Aug 14, 2006, 10:30:35 AM8/14/06

to

Bob Badour wrote:
> anit...@gmail.com wrote:
>
> > "Brian Selzer" <br...@selzer-software.com> wrote in message
> > news:MNCDg.8096$9T3...@newssvr25.news.prodigy.net...
> >
> >>"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
> >>news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...
> >>
> >>[snip]
> >>
> >>
> >>>A natural key is simply a familiar surrogate. Nothing more. Nothing less.
> >>
> >>I disagree. A the value of a surrogate (at least according to Codd, and
> >>also Date, if I recall correctly) should permanently identify something.
>
> Cite? The value of a candidate key is it permanently identifies
> something; hence, the importance of stability as a design criterion.
> That is as true for natural keys as for surrogates.
>
>
> > The only way one can have an attribute whose values never change is by
> > violating information principle.
>
> How so? I agree that hiding an attribute violates the information
> principle, but I don't see how precluding changes does.
>

Ah.. data independence, not information principle. My bad.

I was thinking of the facilities in some SQL products like version
timestamps, row ids etc. used as column values, but non-updateable
purely due to its dependence to the physical model.

[snipped]

--
Anith

Brian Selzer

unread,

Aug 14, 2006, 12:58:04 PM8/14/06

to

<anit...@gmail.com> wrote in message
news:1155560420.7...@b28g2000cwb.googlegroups.com...

> "Brian Selzer" <br...@selzer-software.com> wrote in message
> news:MNCDg.8096$9T3...@newssvr25.news.prodigy.net...
>>
>> "Bob Badour" <bba...@pei.sympatico.ca> wrote in message
>> news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...
>>
>> [snip]
>>
>>> A natural key is simply a familiar surrogate. Nothing more. Nothing
>>> less.
>>
>> I disagree. A the value of a surrogate (at least according to Codd, and
>> also Date, if I recall correctly) should permanently identify something.
>
> The only way one can have an attribute whose values never change is by
> violating information principle.

How so? In what way does the information principle require that attributes
be mutable? A relational database contains knowledge about things, not
things. Now, if you're going to talk about something, then that thing must
either exist or have existed in the universe. If a thing exists in the
universe and is relevant to the discussion, then there must be some way to
distinguish that thing from all other things that exist, that have existed
and that can exist, otherwise there isn't any way to be sure that you're
talking about the same thing in successive database instances.

The RM is value-based: a database schema determines the possible values that
that database can take. For the database to change, two values, or
instances, must exist and be consistent prior to the change, that is, the
preceding instance and the succeeding instance. Thus, a series of changes
to a database results in a succession of database instances. It is often
necessary to know what was known about something in order to assert
something new. For example, when a credit card charge clears, the bank must
know the balance of the account in order to compute the new balance. More
importantly, the bank must be able to identify the account that is about to
change, and that identity must remain constant in both the preceding and
succeeding database instances. Because changes are set-based, without some
means to correlate the propositions in the preceding instance with those in
the succeeding instance, you cannot be certain that you're talking about the
same thing; therefore, you cannot be certain if the succeeding instance is
correct. The question is: should the solution to this problem be handled in
implementations, or should the theory be strengthened to eliminate it?

>
>> That has always been my understanding, and that has always been how I've
>> used the term. Natural keys can change and still refer to the same
>> thing.
>
> Nothing in surrogacy suggests immutability of values.

I believe Codd used the term "permanent" to describe surrogates. That
implies immutability. He did mention drastic circumstances, such as merging
databases that could require that they change, but the impression I got was
that they should be permanent.

>
>> It's easy to prove. Consider a relation schema that describes employees
>> and has two candidate keys, Social Security Number and Badge Number. If
>> an employee gets a new Badge Number because he lost his badge, does the
>> new Badge Number refer to the same employee? The answer is obvious: if
>> it
>> didn't, then the fact that the Social Security Number didn't change
>> contradicts that. The definition of a candidate key guarantees that the
>> propositions in a single relation value are unique; therefore, a
>> candidate
>> key value can identify a tuple, but only within a single relation value.
>> In order to span multiple database states, that value must be permanent.
>> Codd understood this even if you can't get it through your head: I refer
>> you to the paper he wrote in 1979, "Extending the Database Relational
>> Model to Capture More Meaning."
>
> Perhaps you may want to read through the contradictions in Codd's RM/T
> paper.
> http://www.intelligententerprise.com/db_area/archives/1999/990106/online1.jhtml
>

Been there. I understand the ramifications of hiding attributes. I've
argued before on this forum that surrogates shouldn't be hidden by the
database, but rather from end users by the applications that need them, or
by the DBA if the DBMS provides a means to provide views without them.
Applications need to see them because then they can be sure that the updates
they're making are correct in a concurrent environment. End users, on the
other hand, generally have no need to see them, so as a best practice, they
should be hidden *by the application* from them.

> --
> Anith
>>
>> [snip]
>>
>>
>

Brian Selzer

unread,

Aug 14, 2006, 1:53:50 PM8/14/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:s7%Dg.44810$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

> anit...@gmail.com wrote:
>
>> "Brian Selzer" <br...@selzer-software.com> wrote in message
>> news:MNCDg.8096$9T3...@newssvr25.news.prodigy.net...
>>
>>>"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
>>>news:fqmDg.39802$pu3.5...@ursa-nb00s0.nbnet.nb.ca...
>>>
>>>[snip]
>>>
>>>
>>>>A natural key is simply a familiar surrogate. Nothing more. Nothing
>>>>less.
>>>
>>>I disagree. A the value of a surrogate (at least according to Codd, and
>>>also Date, if I recall correctly) should permanently identify something.
>
> Cite?

I think I did. Did you read Codd's paper? I found the Date reference as
well: "An Introduction to Database Systems," Seventh Edition, page 444.
Look it up.

>... The value of a candidate key is it permanently identifies something;

>hence, the importance of stability as a design criterion. That is as true
>for natural keys as for surrogates.
>

I think you need to review the definition of a candidate key. According to
Date in "Database in Depth," page 63:

<<<<

Definition: Let K be a subset of the heading of relvar R. Then K is a
candidate key (or just key for short) for R if and only if it possesses both
of the following properties:

Uniqueness
No possible value for R contains two distinct tuples with the same value
for K.

Irreducibility
No proper subset of K has the uniqueness property.

>>>>

This is slightly different from his definition in "An Introduction to
Database Systems," Seventh Edition, page 258:

<<<<

Let K be a set of attributes of relvar R. Then K is a candidate key for R
if and only if it possesses both of the following properties:

a. Uniqueness: No legal value of R ever contains two distinct tuples with
the same value for K.

b. Irreducibility: No proper subset of K has the uniqueness property.

>>>>

It's clear from both of these definitions that the scope of the uniqueness
property of a candidate key is *any* legal value for R, not *every* legal
value for R. In other words, the value of a candidate key does *not*
permanently identify something.

>
>> The only way one can have an attribute whose values never change is by
>> violating information principle.
>
> How so? I agree that hiding an attribute violates the information
> principle, but I don't see how precluding changes does.
>
>
>>>That has always been my understanding, and that has always been how I've
>>>used the term. Natural keys can change and still refer to the same
>>>thing.
>
> Selzer's understanding is flawed. I suggest he should try harder to use
> correct terms.
>
>
>> Nothing in surrogacy suggests immutability of values.
>
> Hear! Hear!
>
>
>>>It's easy to prove.
>
> If one accepts Selzer's definitions of terms, then by definition nothing
> needs proof. However, his definitions of terms differ greatly from the
> generally accepted definitions, which makes any pretense of communication
> with him pointless.
>

I guess C.J. Date is out of the mainstream, yes?

>
> Consider a relation schema that describes employees
>>>and has two candidate keys, Social Security Number and Badge Number. If
>>>an employee gets a new Badge Number because he lost his badge, does the
>>>new Badge Number refer to the same employee?
>
> The badge number logically identifies a badge and does so immutably. The
> attribute describing the employee merely describes the current assignment
> of a badge to the employee.

No, by the definition of a candidate key, the badge number identifies an
employee.

>
> Suppose a criminally minded individual finds the lost badge and uses it to
> gain unauthorized access to the employer's resources. Where does Selzer's
> proposed schema record who the badge is assigned to?
>

If the security system identifies an employee by his badge number, then a
lookup by badge number for a lost badge will fail, and access won't be
granted. By the way, what has this to do with my argument?

> In the example he gave above, he proposed two arbitrarily assigned numbers
> as candidate keys. What is a surrogate if not an arbitrarily assigned
> number?
>

What does this have to do with my argument?

> The numbers happen to be somewhat familiar because the employee already
> uses the SSN for his taxes, and the badge number presumably appears on a
> physical artifact handed to the employee: ie. the badge. The familiarity
> of the keys makes them natural keys.
>

Again, what has this to do with my argument?

>
> The answer is obvious: if it
>>>didn't, then the fact that the Social Security Number didn't change
>>>contradicts that.
>
> Ironically, the idiot is too stupid to realize that SSN's are just as
> mutable as badge numbers. While one may not legally have two SSN's at the
> same time, transcription errors and identity theft mean SSN's do change
> over time from the perspective of the organization.
>

It appears that you can't counter my argument, and you're trying to deflect
attention away from that fact.

>
> The definition of a candidate key guarantees that the
>>>propositions in a single relation value are unique; therefore, a
>>>candidate
>>>key value can identify a tuple
>
> True.
>
>
> , but only within a single relation value.
>
> Bullshit. See foreign key references.
>

What do you mean by that? Are you trying to say that the value of a foreign
key identifies a tuple within the relation value that contains the foreign
key? How can the values in a referencing relation value guarantee
uniqueness in the referenced relation value?

JOG

unread,

Aug 15, 2006, 9:51:44 AM8/15/06

to

Just as it is good to describe a person by something that will remain
consistent (for example telling someone that they will recognise 'john'
by his 'green jumper' is not particularly useful if he does actually
change his clothes once in a while), so in database design it is vital
to pick a good candidate key. Desirable properties are:

1) very likely to maintain uniqueness over relation values.
2) very unlikely to change between relation values.

The RM does not require that anything is immutable, but clearly,
finding a key that tends to immutability is desirable for (2).

It appears to me reading the posts in this thread, that everyone is
agreed on these points, so I would like to suitably bang all your heads
together, as in some instances you are all just agreeing loudly, just
with the vociferous bluster of miscommunication.

Given that all are agreed that surrogates are useful, this leaves the
_only_ point that seem to be in contention:

"Should a surrogate be made hidden from the user"

I'd contend no, absolutely not. Surrogates are instigated by real world
difficulties in distinguishing items even if they are generated by a
DBMS (in fact it does not matter a jot what generates them logically).
Hence if they are hidden, it is a clear violation of the information
principle, which is vital for the real world identification that
precipitated the need for them in the first place.

Brian Selzer

unread,

Aug 15, 2006, 12:14:39 PM8/15/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155649903.6...@i3g2000cwc.googlegroups.com...

> Just as it is good to describe a person by something that will remain
> consistent (for example telling someone that they will recognise 'john'
> by his 'green jumper' is not particularly useful if he does actually
> change his clothes once in a while), so in database design it is vital
> to pick a good candidate key. Desirable properties are:
>
> 1) very likely to maintain uniqueness over relation values.
> 2) very unlikely to change between relation values.
>
> The RM does not require that anything is immutable, but clearly,
> finding a key that tends to immutability is desirable for (2).
>
> It appears to me reading the posts in this thread, that everyone is
> agreed on these points, so I would like to suitably bang all your heads
> together, as in some instances you are all just agreeing loudly, just
> with the vociferous bluster of miscommunication.
>
> Given that all are agreed that surrogates are useful, this leaves the
> _only_ point that seem to be in contention:
>

My point is that the Relational Model has a serious limitation that can
compromise integrity.

>
> "Should a surrogate be made hidden from the user"
>
>
> I'd contend no, absolutely not. Surrogates are instigated by real world
> difficulties in distinguishing items even if they are generated by a
> DBMS (in fact it does not matter a jot what generates them logically).
> Hence if they are hidden, it is a clear violation of the information
> principle, which is vital for the real world identification that
> precipitated the need for them in the first place.
>

I agree that it is a violation of the information principle. I also agree
that if there's no other way to distinguish things, that is, if there is no
relevant natural key, such as for distinguishing individual quarters in a
new roll, then they should definitely be exposed to end-users. I don't
agree that the lack of real world identification is the only reason to use
surrogates; I contend that the fact that that real world identification is
not *guaranteed* to remain constant precipitates that need. If there IS a
natural key, then as a best practice, the surrogate should not appear on any
data entry screen or report. In that case, the surrogate is only needed to
work around the limitations of the Relational Model, and since the users
have another means of identify something, they don't have a need to know.

Here's a simple example of what can happen:

create table P
(
x int primary key,
y int
)
create table F
(
x int primary key references P(x),
z int
)
insert P (x, y) values (1, 1)
insert P (x, y) values (2, 3)
insert P (x, y) values (3, 2)
insert F (x, z) values (1, 5)
insert F (x, z) values (2, 4)
select * from P join F on (p.x = F.x)
update P
set x = case x
when 1 then 2
when 2 then 1
end
where x in (1, 2)
select * from P join F on (p.x = f.x)

RESULTS:

Before:
x y x z
------ -------- ------- --------
1 1 1 5
2 3 2 4

After:
x y x z
------ ------- -------- --------
1 3 1 5
2 1 2 4

Should the new values for x in P have cascaded into F? Assume that the
system only has the before image and the after image (the first two columns
in RESULTS) in order to complete the update. How can such a system
differentiate between the above update and the following update?

update P
set y = case y
when 1 then 3
when 3 then 1
end
where x in (1, 2)

Clearly first update affects a key, and consequently, the changes should
cascade, but with the information available (both updates produce the exact
same before and after images), the system cannot differentiate between the
two updates; therefore, it cannot determine whether or not to cascade the
changes. Updates within the Relational Model are are constrained in the
same way as this hypothetical system. All that is available is the
preceding instance and the succeeding instance, and if the only key can
change, then there is no way to correlate tuples.

JOG

unread,

Aug 15, 2006, 1:15:26 PM8/15/06

to

Brian Selzer wrote:
[big snips for clarity]

> I agree that it is a violation of the information principle. I also agree
> that if there's no other way to distinguish things, that is, if there is no
> relevant natural key, such as for distinguishing individual quarters in a
> new roll, then they should definitely be exposed to end-users.

fair enough. We are on the same page there then.

> I don't agree that the lack of real world identification is the only reason to use
> surrogates;

It is the only logical reason though. Anything else is a kludge.

> I contend that the fact that that real world identification is
> not *guaranteed* to remain constant precipitates that need. If there IS a
> natural key, then as a best practice, the surrogate should not appear on any
> data entry screen or report. In that case, the surrogate is only needed to
> work around the limitations of the Relational Model, and since the users
> have another means of identify something, they don't have a need to know.

This still makes no sense to me. In the real world if you have a value
that is insufficient to identify me over time, then it will be no good
for the database either.

For instance if I change my name, then _in the real world_ this is of
no use as identifying me when I was younger. It is an identifier for me
now, but not at all for me as an entity that stretches over my
lifetime. Same for my height, hair colour, age - they all change and
hence do NOT identify me as a temporal concept. And thats in the real
world, nothing to do with databases. Do you agree that?

Now, there are things that will identify me over my lifetime (in this
case the universe of discourse is objects over a period of time), that
will remain consistent. THESE are the correct identifying keys to
choose if the lifetime of a changing entity is what one is concerned
with. I do not see how one can argue against this.

Now sure, a surrogate can represent these perhaps unrecordable
identifiers. But if I want to refer to myself at any point in time, I
_need to know_ that surrogate (or something very intelligent in the
middle has to deduce it from me giving it lots of other information).
Either way the relational model must not hide it.

Hence I contend to you Brian that this is nothing to do with the
relational model. Rather it is a mistake in the *universe of discourse*
that the schema designer has made. They mistakenly believed they were
identifying 'snapshot entities'. In reality they needed to identify 'an
entity over its lifetime'. The domain they were considering was
completely wrong, and hence the key they chose was completely
incorrect.

This is exactly the same as the basic philosophical question of change.
Are you the same person you were when you were 10? In one context the
answer is absolutely not, so an appropriate identifier is not DNA. In
another yes you are the same person, so DNA is perfect. It depends on
the domain you are considering, and one must pick the correct key
accordingly.

I'll consider the example in due course. RL work to do in the meantime,
sigh. All best, Jim.

Brian Selzer

unread,

Aug 15, 2006, 4:30:34 PM8/15/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155662126.3...@m73g2000cwd.googlegroups.com...

I think that every natural key falls into this category, because they CAN
change. It may not happen often. It may not happen at all, but the fact
that it CAN happen is critical, because it shifts the responsibility for
maintaining integrity during updates onto either implementational extensions
or application programs. I think that that violates data independence,
because it ties the database to a particular implementation or application.

> For instance if I change my name, then _in the real world_ this is of
> no use as identifying me when I was younger. It is an identifier for me
> now, but not at all for me as an entity that stretches over my
> lifetime. Same for my height, hair colour, age - they all change and
> hence do NOT identify me as a temporal concept. And thats in the real
> world, nothing to do with databases. Do you agree that?
>

Sure, I agree.

> Now, there are things that will identify me over my lifetime (in this
> case the universe of discourse is objects over a period of time), that
> will remain consistent. THESE are the correct identifying keys to
> choose if the lifetime of a changing entity is what one is concerned
> with. I do not see how one can argue against this.
>
> Now sure, a surrogate can represent these perhaps unrecordable
> identifiers. But if I want to refer to myself at any point in time, I
> _need to know_ that surrogate (or something very intelligent in the
> middle has to deduce it from me giving it lots of other information).
> Either way the relational model must not hide it.
>

The Relational Model has no temporal component, with the exception of
change. The Relational Model is value-based, meaning that any change
involves replacing an existing database instance with a new database
instance, or selecting a different database instance in favor of another.
Only two database instances are important: those bounded by the change. And
again, unless there's a way to correlate the propositions in one database
instance to those in the next, you can't determine whether or not you're
talking about the same thing in both database instances. This requires the
use of implementation extensions or code in an application program to ensure
that the succeeding database instance is consistent.

> Hence I contend to you Brian that this is nothing to do with the
> relational model. Rather it is a mistake in the *universe of discourse*
> that the schema designer has made. They mistakenly believed they were
> identifying 'snapshot entities'. In reality they needed to identify 'an
> entity over its lifetime'. The domain they were considering was
> completely wrong, and hence the key they chose was completely
> incorrect.
>

Sure it is. The definition of a key is not strong enough to identify
something in a universe that can change. A key value does not directly
identify something: it identifies a statement about something in a single
relation value from a single database instance. Once you need to identify
something in more than one database instance, such as during a change, the
model falls apart and implementational extensions or code in application
programs must pick up the slack.

Another example is trying to write an update trigger in Sql Server to log
changes to one table in another. The deleted and inserted pseudotables can
be thought of as the preceding and succeeding database instances
respectively. Unless you constrain updates to a single row, or reject
updates that affect both key and non-key columns, or use a surrogate, you
can't tell what changed to a particular row. In fact, it's possible for it
to appear that nothing at all has changed, even though new key values have
been propogated into referencing tables. (Sql Server does not faithfully
implement the relational model. It can tell the difference between the two
updates above, even though within a trigger on P, the deleted and inserted
pseudotables would appear identical.)

JOG

unread,

Aug 15, 2006, 7:44:10 PM8/15/06

to

No, if you have chosen the correct key, none of that is necessary. Now,
I reckon your probably saying that "any key can change", and you are
right, and I'll address this further on.

>
> > Hence I contend to you Brian that this is nothing to do with the
> > relational model. Rather it is a mistake in the *universe of discourse*
> > that the schema designer has made. They mistakenly believed they were
> > identifying 'snapshot entities'. In reality they needed to identify 'an
> > entity over its lifetime'. The domain they were considering was
> > completely wrong, and hence the key they chose was completely
> > incorrect.
> >
>
> Sure it is. The definition of a key is not strong enough to identify
> something in a universe that can change. A key value does not directly
> identify something: it identifies a statement about something in a single
> relation value from a single database instance. Once you need to identify
> something in more than one database instance, such as during a change, the
> model falls apart and implementational extensions or code in application
> programs must pick up the slack.

Well for my standpoint, this I think highlights the mistake in your
argument. First the definition of a key is a red herring because we are
all agreed on what a key is. Rather it is picking an _appropriate_ key
for the task at hand that is vital.

It is important to see that "Me now" is a completely different entity
to "Me over my whole lifetime". The temporal issue is irrelevant, all
that matters is to recognise they are just different entites. I know
this is initially seems an obtuse philosophical point, but it has
_real_ consequences for how to model those entities.

Something must remain constant to compare entities - something must
identify them. If nothing remains constant the things being compared,
by liebniz equality, are different things full stop. This is what
mathematical logic is grounded in, we can't just avoid it. You seem to
be saying it is possible that "every attribute of something has
changed, yet it is still the same thing". Surely that's logical
nonsense!

>
> > This is exactly the same as the basic philosophical question of change.
> > Are you the same person you were when you were 10? In one context the
> > answer is absolutely not, so an appropriate identifier is not DNA. In
> > another yes you are the same person, so DNA is perfect. It depends on
> > the domain you are considering, and one must pick the correct key
> > accordingly.

I worry you have glazed over this, but it highlights how the context
information is more complex than we initially think, and we must
understand that context to make the right key choices (especially if
you want to make inter-relation value comparisons).

My DNA don't change over my lifetime, so its a good key for that
conceptual entity (n.b. the representative label to record the DNA
might change, but the value is the same.)

Keith H Duggar

unread,

Aug 15, 2006, 7:52:31 PM8/15/06

to

JOG wrote:
[snip excellent post]

It's interesting. I came to a similar conclusion back when
Brian responded to my question. Ie that it's either a poor
choice of "universe" or an attempt to redefine the RM notion
of "identity". I wrote this long reply then decided it was
too metaphysical to belong in cdt so I didn't post. But what
the hell, it's written might as well post it. Feel free to
ignore what lies below.

------

The bulk of your post was a philosophical and metaphysical
discussion. While this helped me understand (I think) your
points it does not constitute a "real-world example". Can
you please provide a real-world (though perhaps cleaned and
simplified) example? No rush but when you have time I think
it will help clarify some of your concerns.

Brian Selzer wrote:

> Keith H Duggar wrote:
> > > My point is that here is a real-world example where the
> > > universes of two databases overlap, but the set of
> > > attributes used to identify something common to the two
> > > universes is different for different databases.
> >

> > So isn't the actual problem that the databases employ
> > different models of the same data? In other words, it
> > seems in that example (and in "the" problem), that the
> > problem is not different /values/ but different
> > /models/?
>
> No. It's not the data that's the same: it's that the
> things that the data is about are the same.

The terms "universe", "overlap", "something common", and
"things" are all very unclear to me. By "universe" do you
mean a "relation universe" as in the set of all relations
matching the given headers? By thing do you mean a tuple? Or
do you mean some metaphysical concept? etc. Perhaps using
clearer or more formal language would be helpful.

> A database contains information about things in the
> universe. In a relational database, this information is
> organized into relations, but that doesn't change the fact
> that each proposition describes things in the universe.

Again what universe? If this is a philosophical discussion
of how the relational model fairs when modeling a particular
notion of the physical or metaphysical /Universe/ (capital U
there) then I think you should first expound more accurately
your philosophical position.

For example:

> The problem is that a thing can have multiple sets of
> identifying properties, that some of these identifying
> properties can have different values in different
> situations, and that when propositions about a thing refer
> to it by using identifying properties that can be
> different in different situations, there's not enough

> information available to know that you're talking about
> the same thing.

If here by "thing" you are referring to metaphysical things
then the notions of identity I'm familiar with are first-
order and second-order variations of Leibniz identity (ie x
is y if all that is true of x is true of y). In this case it
makes no sense to claim "a thing can have multiple sets of
identifying properties" since identity is the value of
/all/ properties.

If by thing you mean mean tuple then the set theoretic
definition of identity is again the obvious Leibniz "all
elements (or attributes) are identical". On the other hand
and if I understand correctly, the relational model uses a
weaker notion of identity based on a /primary key/? That is
identity is determined by a subset of attributes holding as
a /candidate/ key and designated /primary/. So in this case,
by definition there is one and only one "set of identifying
properties", the /primary key/.

Is this correct? If so then I admit I'm confused as to what
precisely the primary key identifies. I sorta understand
the "guaranteed access rule" from an "access" perspective
but not an "identity" perspective. It seems either a little
physical to me or if a primary key "identifies" a tuple
then I'm having trouble escaping the usual Leibniz notion.

Finally, much of your previous discussion has focused on
identity over time: both diachronic and more recently
synchronic. You seem to take a perdurance perspective. That
is "things" have an identity over all time that /persists/
over time. Hence your language such as "how can we tell if
they /are the same thing/". However, logically this view
requires that time is essentially yet another property. This
fits well in the RM if you store time as an attribute. Ie
something like a log file that has already been suggested to
you before. However, by either Leibniz or key identity the
tuples cannot be "the same thing" and hence the "thing" that
you are "identifying" must become a set of tuples not a
single tuple. Example in plain language I Keith am the set
off all my states past present and future.

If you do not maintain time as an attribute then since a
relation is set, only a single /present/ time slice of the
entities entirety is ever present. Hence the RM simply does
maintain the complete perdurance concept of identity you
hold unless time is an attribute.

So in RM it seems either you need to store time as an
attribute or ditch the concept that "things" have identity
beyond their key. Thus the tuple is the "thing". You could
see these tuples as "temporal parts" of a "thing" that the
database never actually holds in it's entirety. This is the
endurance concept of diachronic identity. Though this is
the "usual" way people view time as a "special" dimension
over which "things" can change yet maintain a meta-identity,
it simply does not seem to fit well in the RM given that the
"thing" cannot be an element of a relational domain lacking
a time attribute and hence cannot be discussed.

JOG

unread,

Aug 15, 2006, 7:55:09 PM8/15/06

to

Ultimately I think it's comes to this:

* If you need to compare an entity over time, you must pick a key that
will allow consistent identification over time (a surrogate if
necessary).

* If there is _nothing_ that is consistent over time, then the entites
you thought were the same thing, are not.

* If a key value must be changed for some reason, but it is only the
"label" that is changing (e.g. if two companies surrogates must be
merged, the new surrogates are still subsitutes for the exact same
distinguishing attributes as they were before), then this also must be
done retroactively to old relation values to maintain inter-rv
integrity.

As an example to illustrate the last point, consider if the company was
taken over by a french corporation, who demanded the database be
translated to french. This necessarily requires changing old relation
values too if one wants to make comparisons, and thats certainly no
flaw in relational theory.

Keith H Duggar

unread,

Aug 15, 2006, 8:21:06 PM8/15/06

to

JOG wrote:
> It is important to see that "Me now" is a completely
> different entity to "Me over my whole lifetime". The
> temporal issue is irrelevant, all that matters is to
> recognise they are just different entites. I know this is
> initially seems an obtuse philosophical point, but it has
> _real_ consequences for how to model those entities.
>
> Something must remain constant to compare entities -
> something must identify them. If nothing remains constant
> the things being compared, by liebniz equality, are
> different things full stop. This is what mathematical
> logic is grounded in, we can't just avoid it. You seem to
> be saying it is possible that "every attribute of
> something has changed, yet it is still the same
> thing". Surely that's logical nonsense!

It's not necessarily nonsense. It's the "endurance" concept
of diachronic identity. There are certainly philosophers who
stand by that view. However, I too feel that it leaves a lot
to be desired. Personally I find the "perdurance" notion far
more appealing. In that notion time is yet-another-attribute
and the indentity of a thing is the totality of it's past,
present, and future self. This view of the Universe as a
static whole is appealing in many ways. The mystery it
leaves us with is to explain our /perception/ of change.
Oh and the asymmetric temporal distribution of entropy.

JOG

unread,

Aug 15, 2006, 9:16:41 PM8/15/06

to

Aye, identity is appealing to humans because we need the notion of self
for our own sanity. As such essentialism biases how we view
information, and so the preponderance with OID's, hidden surrogates,
entity models, etc. Predicate logic is pretty immune to this though and
that's part of why, imo, its a vital grounding point for data models.

Having said that, in general metaphysics for its own sake of it isn't
of much interest to me (life's too short hey) - its only any use if the
stuff has important consequences for the day to day practice of
databases.

Brian Selzer

unread,

Aug 16, 2006, 1:01:39 AM8/16/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155685450.5...@b28g2000cwb.googlegroups.com...

I disagree. The definition of a key is critical. Because its scope is a
single database instance, it cannot permanently identify things in a
universe that can change. Stability is irrelavent. If it can change, then
it must be assumed that it will. In the example below, the tuples in the
preceding state CAN be correlated to the tuples in the succeeding state, yet
as shown below, it cannot be determined whether x changed or y. If you had
a transition constraint on y, it could not be enforced--it would accept
changes that should be rejected and would reject changes that should be
accepted. If you tried to write a trigger to cascade the update, it would
introduce garbage into the system--sometimes it would propogate the change,
sometimes it wouldn't. If you tried to write a trigger to log changes, it
would record incorrect information.

Note: I categorically reject the notion that the application should issue
key updates separately. If the system allows ad-hoc queries (and most do),
then one could issue such a change, bypassing the application.

> It is important to see that "Me now" is a completely different entity
> to "Me over my whole lifetime". The temporal issue is irrelevant, all
> that matters is to recognise they are just different entites. I know
> this is initially seems an obtuse philosophical point, but it has
> _real_ consequences for how to model those entities.
>

I see the difference, but I can't see how you can shrug off the temporal
issue. They must have a common property because they're related. I would
argue that "Me now" is part of "Me over my whole lifetime."

> Something must remain constant to compare entities - something must
> identify them. If nothing remains constant the things being compared,
> by liebniz equality, are different things full stop. This is what
> mathematical logic is grounded in, we can't just avoid it. You seem to
> be saying it is possible that "every attribute of something has
> changed, yet it is still the same thing". Surely that's logical
> nonsense!
>

In the mathematical sense, if *any* attribute is different, then they're
different, but that's beside the point. I'm not saying that it is possible

that "every attribute of something has changed, yet it is still the same

thing" What I'm saying is that "every universal property of something can
change, yet it is still the same thing." I know I'm going to get beat up
about this, but as I see it, there is a difference between a property that
defines something and a property that describes something. In an earlier
post, I used the terms, "individual" and "universal" to describe these
categories of properties. Individual properties define the essence of
something and remain constant throughout its lifetime. Universal properties
depend on the state of the universe during that lifetime. Universal
properties are the only ones that can change.

"Are you the same person you were when you were 10?" depends on the context
of the query, not necessarily the data, but in either context, you must be
able to correlate the state when you were 10 to the state you are now in
order to do the comparison. In the one context, you would compare the
individual properties; in the other you would compare the universal
properties.

Brian Selzer

unread,

Aug 16, 2006, 4:31:10 AM8/16/06

to

"Keith H Duggar" <dug...@alum.mit.edu> wrote in message

news:1155685951....@m73g2000cwd.googlegroups.com...

I guess it's time to wax philosophical. (I'm going to try to use the
correct terms, but I'm not a philosopher, so please, be gentle.) The
universe of discourse is the set of all things that are interesting. (By
"things," I mean also people and places.) "Interesting" limits the universe
to only those things that can be relevant. Interesting things can be
discussed, relevant things are being discussed.
Things have properties that define and describe them. Properties can be
interesting; they can also be relevant. There are two categories of
properties: individual and universal. An individual property defines the

essence of something, either by itself or when combined with other

individual properties. Individual properties remain constant throughout
something's lifetime because their values depend only on the existence of
the thing that they define. A universal property, on the other hand, is

descriptive in nature and its value depends on a state of the universe (a

situation). The value of a universal property of something can be different
in different situations. The distinction between individual and universal
properties implies that the universe can change. Things persist, and they
have both endurant and purdurent properties. Individual properties are
endurant; universal properties are purdurent.
Events change the universe. They can alter the appearance of things, or
they can affect the existence of things. They can create new things; they
can transform things; they can destroy things. Events shape the universe,
unraveling it into a succession of developing situations. Events imply
order (Something must exist before it can be destroyed.), but not
necessarily duration. Events can occur simultaneously, each affecting a
different set of things, or they can occur in a definite sequence, each
affecting the same set of things. Each event marks the temporal boundary
between one situation and its successor. The effect of an event is embodied
in three sets: the set of things destroyed, the set of things changed, and
the set of things created. For things created, the event is an individual
property that provides a link to the circumstances that brought them into
being. For things destroyed, the event relegates them to history. For
things altered, universal properties will have new values, while the old
values become history. Each thing that exists has a universal property that
identifies the event that caused it to appear as it does. Each thing that
is history has no universal properties: it's individual properties include
all of the properties of the thing for which it became history, their values
originating from the situation whose end was bounded by the event that
created it. Thus, an interval exists within the individual properties of a
history that is bounded by the event that caused the originating thing to
have the values that became history and by the event that created the
history.

What is known about the universe is markedly different from what the
universe is. What is known about the universe is part of the universe. The
universe is the set of all interesting things that can be discussed; a
database records the discussion. Events precede knowledge: a situation must
arise before it can be discussed. A database consists of a schema, a
predicate, and a succession of database instances separated by a series of
informative statements. Each database instance contains knowledge about
things in the universe that was relevant to the discussion when that
instance came into being. This knowledge is a set of propositions about
things in the universe. A relational database instance is organized into a
set of relation values containing sets of tuples that correspond to and
represent the set of propositions. A database instance contains only true
propositions (at least they're assumed to be true) about relevant things. A
relational database instance is one where history is not relevant; a
temporal database instance is one where history is relevant.

JOG

unread,

Aug 16, 2006, 6:51:32 AM8/16/06

to

Brian Selzer wrote:
> [snippage]

> I disagree. The definition of a key is critical. Because its scope is a
> single database instance, it cannot permanently identify things in a
> universe that can change.

Ok, I understand your point there - you want a form of key that is
consistent across relation values. I agree that is useful, and
terminology is sketchy. However given that the definition of a database
relation is still argued (date/pascal for example), I think we have to
concede there is also ambiguity at the very edges of the term key. As
ever as long as we are talking about the same thing we'll be ok.

> [snippage]

> > It is important to see that "Me now" is a completely different entity
> > to "Me over my whole lifetime". The temporal issue is irrelevant, all
> > that matters is to recognise they are just different entites. I know
> > this is initially seems an obtuse philosophical point, but it has
> > _real_ consequences for how to model those entities.
> >
>
> I see the difference, but I can't see how you can shrug off the temporal
> issue.

Temporal differences are just a context change.
Geographical/Cultural/etc may be another generating exactly the same
problems. There is no way I'm shrugging it off, I'm just saying we
can't just externalise it - its integral to the universe of discourse
we are concerned with.

> They must have a common property because they're related. I would
> argue that "Me now" is part of "Me over my whole lifetime."

"Me now" and "Me over my lifetime" could have a common property, but
they don't _have to_ as they are entities from separate domains. Again
it's the universe of discourse under concern that matters. At some
point the chances are they will have common properties, but a DNA value
might not be recorded for "Me now" and it still be perfectly valid in
its domain.

>
> > Something must remain constant to compare entities - something must
> > identify them. If nothing remains constant the things being compared,
> > by liebniz equality, are different things full stop. This is what
> > mathematical logic is grounded in, we can't just avoid it. You seem to
> > be saying it is possible that "every attribute of something has
> > changed, yet it is still the same thing". Surely that's logical
> > nonsense!
> >
>
> In the mathematical sense, if *any* attribute is different, then they're
> different, but that's beside the point. I'm not saying that it is possible
> that "every attribute of something has changed, yet it is still the same
> thing" What I'm saying is that "every universal property of something can
> change, yet it is still the same thing." I know I'm going to get beat up
> about this, but as I see it, there is a difference between a property that
> defines something and a property that describes something. In an earlier
> post, I used the terms, "individual" and "universal" to describe these
> categories of properties. Individual properties define the essence of
> something and remain constant throughout its lifetime. Universal properties
> depend on the state of the universe during that lifetime. Universal
> properties are the only ones that can change.

I've wrestled with definitional and descriptional properties too, and
I've come to the conclusion (painfully) from reading the philosophy
that there isn't any difference. Consider two towers, identical in
every way, one in paris and one in london. The only difference to
identify them is their geographical position. Then, what we may have
thought of as a descriptional property originally. is now necessarily
definitional. Its a murky distinction. I'll post a better example at
some point, as I think this also contributes to why we shouldn't hide
surrogates.

>
> "Are you the same person you were when you were 10?" depends on the context
> of the query, not necessarily the data, but in either context, you must be
> able to correlate the state when you were 10 to the state you are now in
> order to do the comparison. In the one context, you would compare the
> individual properties; in the other you would compare the universal
> properties.

No! You have already presumed they are the same person if you do that.

All one needs to do is see if the two entities _do correlate or not_
(And not assume they do a priori). If they don't, as in the identity is
not the same, I conclude they are different people. If they do, I can
say they are the same person, but in different states. I apologise for
banging on about liebniz all the time, but thats exactly how we
distinguish whether something is the same thing or not (conceding that
functional dependency means we can rely on the key for identity in RM).
I think this is a big point where we are differing.

Brian Selzer

unread,

Aug 16, 2006, 7:48:34 AM8/16/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155725491.9...@p79g2000cwp.googlegroups.com...

Ah, but there's the rub. Functional dependencies are the root of the
problem. There existence implies that data is something more than just
data, but their scope is limited to a single database instance.

JOG

unread,

Aug 16, 2006, 8:34:02 AM8/16/06

to

Can you expand on this. Use of functional dependency allows one to
identify a proposition not an entity. As such we'd expect them to
correspond to a single relation value no?

I wonder if we differ because of epistimology. You believe in
genidentity. i.e.:

"An existential relationship underlying the genesis of an object from
one moment to the next. According to this interpretation, what we
usually consider an object really consists of a multitude of entities,
as it were, the phases of the object at various times. Two objects are
not genidentical because they have the same properties and
characteristics, but because one has developed from the other."

Whereas I do not, as in RL this form of identity cannot always be
observed (and hence it opposes RM's information principle). Rather I
believe that humans handle information in terms of 'enduring-identity',
given by at least one consistent, but recordable attribute.

You hence, as a perdurist, believe in hidden surrogates (in fact
surrogate is now the wrong word as it is not a substitute for any
characteristic), whereas as an endurist I believe in visible
surrogates.

If this is true our debate concerning databases, although damn
interesting, is fubar.

Bob Badour

unread,

Aug 16, 2006, 11:52:08 AM8/16/06

to

JOG wrote:

> Just as it is good to describe a person by something that will remain
> consistent (for example telling someone that they will recognise 'john'
> by his 'green jumper' is not particularly useful if he does actually
> change his clothes once in a while), so in database design it is vital
> to pick a good candidate key. Desirable properties are:
>
> 1) very likely to maintain uniqueness over relation values.
> 2) very unlikely to change between relation values.
>
> The RM does not require that anything is immutable, but clearly,
> finding a key that tends to immutability is desirable for (2).
>
> It appears to me reading the posts in this thread, that everyone is
> agreed on these points, so I would like to suitably bang all your heads
> together, as in some instances you are all just agreeing loudly, just
> with the vociferous bluster of miscommunication.

I disagree with your analysis.

A self-aggrandizing ignorant showed up and posted nonsense. In fact, he
continues to post nonsense. However, he now has you trained to re-define
your existing vocabulary to turn his nonsense into something not only
sensible but wholly unremarkable.

Design criteria for choice of candidate keys include: familiarity,
irreducibility, simplicity and stability.

See http://www.dbdebunk.com/page/page/622344.htm

> Given that all are agreed that surrogates are useful

I disagree that the concept of surrogate vs. natural is useful. A
natural key is merely a familiar surrogate. The self-aggrandizing
ignorant now has you redefining terms to make "surrogate" synonymous
with "stable" and to make "natural" synonymous with "unstable".

By accepting his misuse of vocabulary, you encourage and legitimize the
illegitimate while you interfere with communication and comprehension.
Is it your goal to understand theory or to market the services of
ignorants at the expense of their potential clients and other stakeholders?

, this leaves the
> _only_ point that seem to be in contention:
>
> "Should a surrogate be made hidden from the user"

I disagree that the above is the only point that remains in contention,
and I would further note that the issue does not remain in contention
among any who are reasonably informed and intelligent.

Familiarity is generally as important a design criterion as stability.
In specific contexts, one or the other may have greater pragmatic
repercussions. Hiding values not only precludes familiarity but violates
the Information Principle with all that that implies.

> I'd contend no, absolutely not. Surrogates are instigated by real world
> difficulties in distinguishing items even if they are generated by a
> DBMS (in fact it does not matter a jot what generates them logically).
> Hence if they are hidden, it is a clear violation of the information
> principle, which is vital for the real world identification that
> precipitated the need for them in the first place.

All identifiers--even in 'the real world'--are surrogates. Every last
one of them. What then is the use of the term? Twenty years ago there
was some discussion of surrogate keys vs. natural keys, but further
reflection reveals that natural keys are nothing more or less than
familiar surrogates.

The term 'intelligent key' at least defines a proper subset of keys. The
drawback of an intelligent key is increased instability. Sometimes,
though, intelligent keys offer mnemonic or verification advantages
related to familiarity. Whether to use an intelligent key then becomes a
pragmatic tradeoff between stability and familiarity.

There really is little more to discuss about natural keys vs. surrogates
except that self-aggrandizing ignorants periodically appropriate the
terms in the nonsense they spout. The more you try to make sense of the
nonsense the more you legitimize the self-aggrandizing ignorants without
any further benefit to anyone.

If you are going to engage the self-aggrandizing ignorants, please, do
better at calling them on their bullshit.

JOG

unread,

Aug 16, 2006, 12:41:08 PM8/16/06

to

Bob Badour wrote:
> [snip]

> By accepting his misuse of vocabulary, you encourage and legitimize the
> illegitimate while you interfere with communication and comprehension.
> Is it your goal to understand theory or to market the services of
> ignorants at the expense of their potential clients and other stakeholders?

Of course not. All corrections are welcomed Bob.

> [snip]

> I disagree that the concept of surrogate vs. natural is useful. A
> natural key is merely a familiar surrogate.

By this I am unclear what a natural key is a surrogate for. If you
could spend some time expanding this definition, or referencing it, I'd
appreciate it.

> The self-aggrandizing
> ignorant now has you redefining terms to make "surrogate" synonymous
> with "stable" and to make "natural" synonymous with "unstable".

Piffle, bob. I have not adopted any of his terminology in my effort to
engage him, and you are misrepresenting my understanding of terms.

I view a surrogate as a 'substitute' for an unrecordable distinguishing
attribute or set of attributes. This is the only logical definition
that I am currently happy with. It does not require stability by
definition, and I have only ever stated that stability is an attractive
quality and hence often appropriate to good design. That's my current
standpoint, and so that is what should or should not be criticized.

Nevertheless I do not think you understood where Brian's mistakes stem
from, and that's what I have been looking for. It appears to be a
fundamental difference in what he views _identity_ to be. If the OP
does not maintain this basic concept of identity as we do (and as Codd
and Liebniz did) then there is little point in debating natural or
surrogate keys with him at all. The problem goes far deeper than that.

Jim.

Brian Selzer

unread,

Aug 16, 2006, 1:16:57 PM8/16/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:IaHEg.48774$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

> JOG wrote:
>
>> Just as it is good to describe a person by something that will remain
>> consistent (for example telling someone that they will recognise 'john'
>> by his 'green jumper' is not particularly useful if he does actually
>> change his clothes once in a while), so in database design it is vital
>> to pick a good candidate key. Desirable properties are:
>>
>> 1) very likely to maintain uniqueness over relation values.
>> 2) very unlikely to change between relation values.
>>
>> The RM does not require that anything is immutable, but clearly,
>> finding a key that tends to immutability is desirable for (2).
>>
>> It appears to me reading the posts in this thread, that everyone is
>> agreed on these points, so I would like to suitably bang all your heads
>> together, as in some instances you are all just agreeing loudly, just
>> with the vociferous bluster of miscommunication.
>
> I disagree with your analysis.
>
> A self-aggrandizing ignorant showed up and posted nonsense. In fact, he
> continues to post nonsense. However, he now has you trained to re-define
> your existing vocabulary to turn his nonsense into something not only
> sensible but wholly unremarkable.
>

I may be ignorant when it comes to terminology, but at least I know the
definition of a candidate key and intimately understand its scope and
limitations. Most of your arguments have been completely debunked because
they're based upon a false premise. The rest of your bloviating
concentrates only on denigrading anybody who doesn't share your views, even
after having been given ample evidence to the contrary.

Brian Selzer

unread,

Aug 16, 2006, 2:19:35 PM8/16/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155746467....@b28g2000cwb.googlegroups.com...

This thread started out because I objected to Bodour's statement, "natural
keys are just familiar surrogates." That goes counter to the understanding
I've held for years, that is supported by both Codd and Date. Codd used the
term "permanent" to describe surrogates. Date implied permanence also by
describing a surrogate's value as one that has never been used and will
never be reused. Therefore, when I think of a surrogate key, I think in
terms of permanence, not just stability.

Do you agree that in one relation value a candidate key value can identify a
tuple?

Do you agree that a relation schema can have more than one candidate key?

If only one candidate key value is different in successive relation values,
can the others identify corresponding tuples in both relation values? I
guess more importantly, do you think that the tuples correspond?

If tuples can correspond in successive relation values that have multiple
candidate keys where at least one remains constant, then why can't they
correspond in successive relation values where none remain constant? What
is being identified by corresponding tuples in successive relation values
that have more than one candidate key where only one of the values have
changed? The tuple? No, I don't think so. I think that it is the thing
that the corresponding propositions are referring to.
This is the source of the confusion. This is the impetus of my argument.
If the thing that is indirectly identified by a candidate key can have its
appearance altered without altering its identity (which must be possible if
tuples that are different can correspond), then it is possible to have
tuples in successive relation values that should correspond but don't. This
is why I'm arguing for some mechanism to guarantee the ability to correlate
tuples.

> Jim.
>

kvnkr...@gmail.com

unread,

Aug 16, 2006, 2:52:28 PM8/16/06

to

JOG wrote:
> Bob Badour wrote:
> > [snip]
> > By accepting his misuse of vocabulary, you encourage and legitimize the
> > illegitimate while you interfere with communication and comprehension.
> > Is it your goal to understand theory or to market the services of
> > ignorants at the expense of their potential clients and other stakeholders?
>
> Of course not. All corrections are welcomed Bob.
>
> > [snip]
> > I disagree that the concept of surrogate vs. natural is useful. A
> > natural key is merely a familiar surrogate.
>
> By this I am unclear what a natural key is a surrogate for. If you
> could spend some time expanding this definition, or referencing it, I'd
> appreciate it.
>

I think I see where Bob is going with this. About 6 weeks ago, my wife
and I strung together a couple of arbitrary values, "Wade" and "James",
appended a surname, and wound up with a new "natural key" for our
son... or is that a surrogate key? The US government runs a database,
and in order to add an entry for him into the database, it generated a
new surrogate key for our son - but many people treat SSN as a "natural
key". Presumably, our son may one day gain employment at the same
company at which I work, in which case, he will be assigned a
"surrogate key" employee ID. Bob's point, I think, is that there is no
fundamental difference between the "natural key" my wife and I
generated for our son; the SSN natural/surrogate key the US government
generated for our son, and my company's surrogate key Emp_id.

On the other hand, is my DNA sequence a "familiar surrogate key"? It
seems there *is* a fundamental difference between "Line 3 of invoice
12" and "line_id 1433" - both may be candidates for the same entity,
yet it seems like the first should qualify as a natural key and the
second should qualify as a surrogate... making Bob's claim that "A
natural key is merely a familiar surrogate." seem a bit of an
overgeneralization.

Bob Badour

unread,

Aug 16, 2006, 3:00:37 PM8/16/06

to

JOG wrote:

> Bob Badour wrote:
>
>>[snip]
>>By accepting his misuse of vocabulary, you encourage and legitimize the
>>illegitimate while you interfere with communication and comprehension.
>>Is it your goal to understand theory or to market the services of
>>ignorants at the expense of their potential clients and other stakeholders?
>
> Of course not. All corrections are welcomed Bob.
>
>
>>[snip]
>>I disagree that the concept of surrogate vs. natural is useful. A
>>natural key is merely a familiar surrogate.
>
> By this I am unclear what a natural key is a surrogate for. If you
> could spend some time expanding this definition, or referencing it, I'd
> appreciate it.

It is a surrogate for whatever a surrogate key is for. Think of any
natural key. How is it not a surrogate?

My name is not me. It is an arbitrary identifier chosen by my parents.
It is familiar because I was conditioned from an early age to respond to it.

My SSN is not me. It is an arbitrary identifier chosen by the IRS to
identify tax filings related to my income. It is familiar because I was
given a little blue card with it inscribed, and I was instructed to
transcribe it to a variety of documents.

My driver's license is not me. It is an arbitrary identifyer chosen by
the province to identify records related to my driving history. It is
familiar because I was given a little card with it inscribed next to my
picture, and I was instructed to present the card or transcribe the
number in a variety of situations.

I am not suitably represented for machine processing. Whatever
identifier we use for me in the database is not me. It is only a
surrogate for me.

Historically, back when folks debated the merits of natural keys, what
folks identified as natural keys were nothing more or less than familiar
surrogate keys. The arguments against natural keys relate mostly to control.

The IRS controls my SSN. The province controls my driver's license
number. When any other organization chooses to use those identifiers,
they choose to have no control.

When any other organization chooses to create a surrogate, they simply
create another arbitrary identifier but under their control. That
identifier then becomes a natural key.

For example, when I get hired at a company, they give me an employee
number. They tell me what it is and instruct me to transcribe it to a
number of documents (some on a recurring basis) etc.

>>The self-aggrandizing
>>ignorant now has you redefining terms to make "surrogate" synonymous
>>with "stable" and to make "natural" synonymous with "unstable".
>
> Piffle, bob. I have not adopted any of his terminology in my effort to
> engage him, and you are misrepresenting my understanding of terms.

When he uses 'surrogate' as a synonym for 'stable' and 'natural' as a
synonym for 'unstable', you respond to what he posts as if he made the
substitution and without suggesting the definitions are unsuitable.
Thus, you implicitly accept his terminology whether you understand it
that way or not. To any external observer, such as myself, your actions
indicate you accept the definitions.

> I view a surrogate as a 'substitute' for an unrecordable distinguishing
> attribute or set of attributes. This is the only logical definition
> that I am currently happy with.

That definition is fair enough. Thus, any time we use a value to
identify anything other than the value itself, it is a surrogate for
something else.

It does not require stability by
> definition, and I have only ever stated that stability is an attractive
> quality and hence often appropriate to good design. That's my current
> standpoint, and so that is what should or should not be criticized.

With all due respect, you have also stated that agreement exists where
none does, and you have also denied the existence of contention that
does exist.

> Nevertheless I do not think you understood where Brian's mistakes stem
> from, and that's what I have been looking for. It appears to be a
> fundamental difference in what he views _identity_ to be. If the OP
> does not maintain this basic concept of identity as we do (and as Codd
> and Liebniz did) then there is little point in debating natural or
> surrogate keys with him at all. The problem goes far deeper than that.

His mistakes are simple. He is a self-aggrandizing ignorant spouting
nonsense. He hasn't a clue what the terms he uses even mean. He spouts
nonsensical malaprops expecting you to transform them into something
cogent, and you happily comply.

You are now giving him a platform from which to pretend to debate
Liebniz and Codd as if he were a peer. ::rolls eyes::

There is little point debating anything with such a self-aggrandizing
ignorant, which leaves me wondering why you continue to debate him.

anit...@gmail.com

unread,

Aug 16, 2006, 3:15:11 PM8/16/06

to

"Brian Selzer" <br...@selzer-software.com> wrote in message

news:XkJEg.10231$o27....@newssvr21.news.prodigy.com...

All you had to do to prove him wrong was to provide an example of a
natural key that is not a familiar surrogate.

> That goes counter to the understanding
> I've held for years, that is supported by both Codd and Date. Codd used the
> term "permanent" to describe surrogates. Date implied permanence also by
> describing a surrogate's value as one that has never been used and will
> never be reused. Therefore, when I think of a surrogate key, I think in
> terms of permanence, not just stability.

Neither Codd nor Date suggested the "permanence" of surrogate values,
but the permanence of the surrogate key.

>
> Do you agree that in one relation value a candidate key value can identify a
> tuple?
>
> Do you agree that a relation schema can have more than one candidate key?
>
> If only one candidate key value is different in successive relation values,
> can the others identify corresponding tuples in both relation values? I
> guess more importantly, do you think that the tuples correspond?
>
> If tuples can correspond in successive relation values that have multiple
> candidate keys where at least one remains constant, then why can't they
> correspond in successive relation values where none remain constant? What
> is being identified by corresponding tuples in successive relation values
> that have more than one candidate key where only one of the values have
> changed? The tuple? No, I don't think so. I think that it is the thing
> that the corresponding propositions are referring to.
> This is the source of the confusion. This is the impetus of my argument.
> If the thing that is indirectly identified by a candidate key can have its
> appearance altered without altering its identity (which must be possible if
> tuples that are different can correspond), then it is possible to have
> tuples in successive relation values that should correspond but don't. This
> is why I'm arguing for some mechanism to guarantee the ability to correlate
> tuples.

So you are arguing for tuple identifiers rather than a candidate key.
Here is an old post that references the quote on the distinction:
http://groups.google.com/group/comp.databases.theory/msg/bab7d889d157f137

>
>> Jim.
>>
>
>

Brian Selzer

unread,

Aug 16, 2006, 3:21:37 PM8/16/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:pXJEg.49046$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

If I'm an ignorant, then where do you fall. You can't come up with a decent
argument to counter mine. Yours are baseless and inane. Yet you continue
with these arguments even when confronted with citations from both Codd and
Date. Who, then, is self-aggrandizing? I think you need to take the log
out of your own eye.

erk

unread,

Aug 16, 2006, 3:52:28 PM8/16/06

to

Brian Selzer wrote:
> How so? In what way does the information principle require that attributes
> be mutable? A relational database contains knowledge about things, not
> things. Now, if you're going to talk about something, then that thing must
> either exist or have existed in the universe.

Referring to "things" and "entities" may be part of the problem, as
it's on the slippery slope to First Great Blunderville. Facts are
facts, and don't change, though our understanding of them does. A
database state is simply a set of propositions, which uses predicates
(relations) as the organizing principle. Constraints keep our
propositions consistent, with respect to identifiable values. I don't
think there's any necessity for a group of facts in one database state
to be somehow traceable to a previous one. I could certainly be wrong;
it just feels to me like this discussion is pivoting on identifiable
"thingies," rather than logical constructs.

> If a thing exists in the
> universe and is relevant to the discussion, then there must be some way to
> distinguish that thing from all other things that exist, that have existed
> and that can exist, otherwise there isn't any way to be sure that you're
> talking about the same thing in successive database instances.

Why does one have to "talk about the same thing" in successive database
instances (values)?

> It is often
> necessary to know what was known about something in order to assert
> something new. For example, when a credit card charge clears, the bank must
> know the balance of the account in order to compute the new balance.

Maybe, but from a functional standpoint, that operator is just a
function (e.g. "subtract $500 from X), in which the balance is a free
variable. Only in an imperative world does that involve "knowing"
(referencing) the "previous" balance. Function application means
there's no "query" of the value prior to the update.

> More
> importantly, the bank must be able to identify the account that is about to
> change, and that identity must remain constant in both the preceding and
> succeeding database instances.

Why? As long as it can be identified via some query, what difference
does it make? For example, if I make a database schema change and
introduce a new key, with appropriate view changes to support old
application code, is there some logical distinction? If the external
queries all still produce the same results, excepting the specific
values being updated, what does "identity" have to do with it?

> Because changes are set-based, without some
> means to correlate the propositions in the preceding instance with those in
> the succeeding instance, you cannot be certain that you're talking about the
> same thing;

There is no "thing." These are propositions, or assertions if you like,
nothing more. The only meaning is in the correlation of queries to
external phenonema of interest.

> therefore, you cannot be certain if the succeeding instance is
> correct. The question is: should the solution to this problem be handled in
> implementations, or should the theory be strengthened to eliminate it?

What "strengthening" would eliminate the "problem"?

> I believe Codd used the term "permanent" to describe surrogates. That
> implies immutability. He did mention drastic circumstances, such as merging
> databases that could require that they change, but the impression I got was
> that they should be permanent.

"Impression," "should," and "implies" are all insufficient explanation
- if I missed your solution in a previous post, let me know, but Codd's
writings aren't (especially in cases like this where there appears to
be room for interpretation) holy writ.

- erk

Brian Selzer

unread,

Aug 16, 2006, 3:55:51 PM8/16/06

to

<anit...@gmail.com> wrote in message
news:1155755711....@74g2000cwt.googlegroups.com...

I disagree. Codd emphasized the need for permanent identifiers for entities
immediately before introducing the concept. I grant that he did cite
drastic circumstances that may require them to change, such as merging
databases.

No. I'm arguing for a theoretical mechanism to correlate tuples during an
update. If you can reliably correlate tuples without an immutable key, then
it doesn't matter whether or not the only key changes, you can still be sure
that your talking about the same thing during an update. My point is that
regardless of the mechanism, whether that's surrogate keys, tuple
identifiers, or a tuple-level assignment operator (Oracle has a FOR EACH ROW
trigger, as do several other implementations, but I would argue for a
non-procedural mechanism.), the model is incomplete without it. I think
that surrogate keys make the most sense, since they are essentially
properties of things in the universe of discourse, but an argument can be
made for tuple identifiers as well, at least for temporal databases.

>>
>>> Jim.
>>>
>>
>>
>

JOG

unread,

Aug 16, 2006, 4:38:35 PM8/16/06

to

I appreciate the explanation - I'll discuss it in a distinct thread
away from the bluster of the rest of this one.

>
>
> >>The self-aggrandizing
> >>ignorant now has you redefining terms to make "surrogate" synonymous
> >>with "stable" and to make "natural" synonymous with "unstable".
> >
> > Piffle, bob. I have not adopted any of his terminology in my effort to
> > engage him, and you are misrepresenting my understanding of terms.
>
> When he uses 'surrogate' as a synonym for 'stable' and 'natural' as a
> synonym for 'unstable', you respond to what he posts as if he made the
> substitution and without suggesting the definitions are unsuitable.
> Thus, you implicitly accept his terminology whether you understand it
> that way or not. To any external observer, such as myself, your actions
> indicate you accept the definitions.

I have attempted to focus on what he meant, to address the mistake in
his underlying argument, and yes have not been concerned with
observers. Others such as yourself had already objected to his
defintions. However I do not accept that any point I appeared to have
agreed in the slightest with his views on hidden values.

>
>
> > I view a surrogate as a 'substitute' for an unrecordable distinguishing
> > attribute or set of attributes. This is the only logical definition
> > that I am currently happy with.
>
> That definition is fair enough. Thus, any time we use a value to
> identify anything other than the value itself, it is a surrogate for
> something else.
>
>
> It does not require stability by
> > definition, and I have only ever stated that stability is an attractive
> > quality and hence often appropriate to good design. That's my current
> > standpoint, and so that is what should or should not be criticized.
>
> With all due respect, you have also stated that agreement exists where
> none does, and you have also denied the existence of contention that
> does exist.

Agreed, I was overly optimistic it seems.

>
>
> > Nevertheless I do not think you understood where Brian's mistakes stem
> > from, and that's what I have been looking for. It appears to be a
> > fundamental difference in what he views _identity_ to be. If the OP
> > does not maintain this basic concept of identity as we do (and as Codd
> > and Liebniz did) then there is little point in debating natural or
> > surrogate keys with him at all. The problem goes far deeper than that.
>
> His mistakes are simple. He is a self-aggrandizing ignorant spouting
> nonsense. He hasn't a clue what the terms he uses even mean. He spouts
> nonsensical malaprops expecting you to transform them into something
> cogent, and you happily comply.
>
> You are now giving him a platform from which to pretend to debate
> Liebniz and Codd as if he were a peer. ::rolls eyes::

No, the intention is that anyone involved in a debate, when shown that
they are contesting the conclusions of those such as Liebniz, should
realise they are on a sticky wicket and attempt to reassess their
logic. Should we be so cynical as to think they someone is incapable of
that realisation?

>
> There is little point debating anything with such a self-aggrandizing
> ignorant, which leaves me wondering why you continue to debate him.

Few reasons: First, and most importantly, attempting to convince
someone of my own standpoint gives me much needed practice formulating
the correct arguments, and clarifies my own opinions. Second, by
actually isolating the root mistake (in this case the view of
identity), the next time someone has the same mistaken view (and we all
know that will happen) I can cut to the chase. Third, maybe, just maybe
I will be able to sway the OP. Granted the last now seems impossible
given that I believe Brian doesn't agree with the underpinnings of
predicate logic.

Bob Badour

unread,

Aug 16, 2006, 4:52:36 PM8/16/06

to

JOG wrote:
> Bob Badour wrote:
>
>>JOG wrote:
>>
>>
>>>Bob Badour wrote:

[snip]

>>>Nevertheless I do not think you understood where Brian's mistakes stem

>>>from, and that's what I have been looking for. It appears to be a
>>>fundamental difference in what he views _identity_ to be. If the OP
>>>does not maintain this basic concept of identity as we do (and as Codd
>>>and Liebniz did) then there is little point in debating natural or
>>>surrogate keys with him at all. The problem goes far deeper than that.
>>
>>His mistakes are simple. He is a self-aggrandizing ignorant spouting
>>nonsense. He hasn't a clue what the terms he uses even mean. He spouts
>>nonsensical malaprops expecting you to transform them into something
>>cogent, and you happily comply.
>>
>>You are now giving him a platform from which to pretend to debate
>>Liebniz and Codd as if he were a peer. ::rolls eyes::
>
> No, the intention is that anyone involved in a debate, when shown that
> they are contesting the conclusions of those such as Liebniz, should
> realise they are on a sticky wicket and attempt to reassess their
> logic. Should we be so cynical as to think they someone is incapable of
> that realisation?

All I can say is: You'll learn. You'll learn.

>>There is little point debating anything with such a self-aggrandizing
>>ignorant, which leaves me wondering why you continue to debate him.
>
> Few reasons: First, and most importantly, attempting to convince
> someone of my own standpoint gives me much needed practice formulating
> the correct arguments, and clarifies my own opinions.

Again, I ask that you not ignore the most obviously glaring nonsense
before proceeding to deeper issues. Alternatively, you could try doing
so in private communication.

Second, by
> actually isolating the root mistake (in this case the view of
> identity), the next time someone has the same mistaken view (and we all
> know that will happen) I can cut to the chase. Third, maybe, just maybe
> I will be able to sway the OP. Granted the last now seems impossible
> given that I believe Brian doesn't agree with the underpinnings of
> predicate logic.

Given the way he abuses language, how would you even know?

Bob Badour

unread,

Aug 16, 2006, 5:13:03 PM8/16/06

to

JOG wrote:

> Brian Selzer wrote:
>
>>[snippage]
>>I disagree. The definition of a key is critical. Because its scope is a
>>single database instance, it cannot permanently identify things in a
>>universe that can change.

Please note that the self-aggrandizing ignorant, Selzer, argues for
stability and then denies the possible benefits of familiarity. He
doesn't address irreducibility or simplicity at all.

> Ok, I understand your point there - you want a form of key that is
> consistent across relation values. I agree that is useful, and
> terminology is sketchy. However given that the definition of a database
> relation is still argued (date/pascal for example), I think we have to
> concede there is also ambiguity at the very edges of the term key. As
> ever as long as we are talking about the same thing we'll be ok.
>
>
>>[snippage]
>>
>>>It is important to see that "Me now" is a completely different entity
>>>to "Me over my whole lifetime". The temporal issue is irrelevant, all
>>>that matters is to recognise they are just different entites. I know
>>>this is initially seems an obtuse philosophical point, but it has
>>>_real_ consequences for how to model those entities.
>>
>>I see the difference, but I can't see how you can shrug off the temporal
>>issue.
>
> Temporal differences are just a context change.
> Geographical/Cultural/etc may be another generating exactly the same
> problems. There is no way I'm shrugging it off, I'm just saying we
> can't just externalise it - its integral to the universe of discourse
> we are concerned with.

In fact, the differences serve to define the universe of discourse much
in the same way immovable objects and irresistible forces define their
universes.

>>They must have a common property because they're related. I would
>>argue that "Me now" is part of "Me over my whole lifetime."
>
> "Me now" and "Me over my lifetime" could have a common property, but
> they don't _have to_ as they are entities from separate domains. Again
> it's the universe of discourse under concern that matters. At some
> point the chances are they will have common properties, but a DNA value
> might not be recorded for "Me now" and it still be perfectly valid in
> its domain.

You overestimate the use and importance of DNA. That will change in
coming decades.

>>>Something must remain constant to compare entities - something must
>>>identify them. If nothing remains constant the things being compared,
>>>by liebniz equality, are different things full stop. This is what
>>>mathematical logic is grounded in, we can't just avoid it. You seem to
>>>be saying it is possible that "every attribute of something has
>>>changed, yet it is still the same thing". Surely that's logical
>>>nonsense!

"This hammer has been in our family for over 150 years. In that time, we
replaced the handle five times and the head twice!"

>>In the mathematical sense, if *any* attribute is different, then they're
>>different, but that's beside the point. I'm not saying that it is possible
>>that "every attribute of something has changed, yet it is still the same
>>thing" What I'm saying is that "every universal property of something can
>>change, yet it is still the same thing." I know I'm going to get beat up
>>about this, but as I see it, there is a difference between a property that
>>defines something and a property that describes something. In an earlier
>>post, I used the terms, "individual" and "universal" to describe these
>>categories of properties. Individual properties define the essence of
>>something and remain constant throughout its lifetime.

Thus any change in the property separates lifetimes. This then describes
any property.

Given that Selzer makes up his own meanings for words, what makes you
think you have any insight into what he thinks?

>>>>>This is exactly the same as the basic philosophical question of change.
>>>>>Are you the same person you were when you were 10? In one context the
>>>>>answer is absolutely not, so an appropriate identifier is not DNA. In
>>>>>another yes you are the same person, so DNA is perfect. It depends on
>>>>>the domain you are considering, and one must pick the correct key
>>>>>accordingly.
>>>
>>>I worry you have glazed over this, but it highlights how the context
>>>information is more complex than we initially think, and we must
>>>understand that context to make the right key choices (especially if
>>>you want to make inter-relation value comparisons).
>>>
>>>My DNA don't change over my lifetime, so its a good key for that
>>>conceptual entity (n.b. the representative label to record the DNA
>>>might change, but the value is the same.)

Actually, your DNA change constantly. The length of the 10 year old's
telomeres, for example, are longer. (But size isn't everything--or so
I've been told.)

Of course not. Otherwise, you would have included "ON UPDATE CASCADE"
somewhere in the foreign key declaration.

Assume that
>>>>>>the
>>>>>>system only has the before image and the after image (the first two
>>>>>>columns
>>>>>>in RESULTS) in order to complete the update. How can such a system
>>>>>>differentiate between the above update and the following update?
>>>>>>
>>>>>>update P
>>>>>> set y = case y
>>>>>> when 1 then 3
>>>>>> when 3 then 1
>>>>>> end
>>>>>> where x in (1, 2)
>>>>>>
>>>>>>Clearly first update affects a key, and consequently, the changes
>>>>>>should
>>>>>>cascade,

Actually, they should not cascade as explained above. However, if they
were to cascade, how they do so would depend on the implementation of
the dbms. Hopefully, the dbms implementers would at least document the
implementation.

but with the information available (both updates produce the
>>>>>>exact
>>>>>>same before and after images), the system cannot differentiate between
>>>>>>the
>>>>>>two updates; therefore, it cannot determine whether or not to cascade
>>>>>>the
>>>>>>changes. Updates within the Relational Model are are constrained in
>>>>>>the
>>>>>>same way as this hypothetical system. All that is available is the
>>>>>>preceding instance and the succeeding instance, and if the only key
>>>>>>can
>>>>>>change, then there is no way to correlate tuples.

I don't recall where either set algebra or predicate calculus cover
cascading updates for assignment statements. Perhaps Selzer could cite
his source.

>>>>>I'll consider the example in due course. RL work to do in the meantime,
>>>>>sigh. All best, Jim.
>>>>
>>>>Another example is trying to write an update trigger in Sql Server to log
>>>>changes to one table in another. The deleted and inserted pseudotables
>>>>can
>>>>be thought of as the preceding and succeeding database instances
>>>>respectively. Unless you constrain updates to a single row, or reject
>>>>updates that affect both key and non-key columns, or use a surrogate, you
>>>>can't tell what changed to a particular row. In fact, it's possible for
>>>>it
>>>>to appear that nothing at all has changed, even though new key values
>>>>have
>>>>been propogated into referencing tables. (Sql Server does not faithfully
>>>>implement the relational model. It can tell the difference between the
>>>>two
>>>>updates above, even though within a trigger on P, the deleted and
>>>>inserted
>>>>pseudotables would appear identical.)

Since when does the RM say anything about the implementation of
triggered procedures? Perhaps Selzer could cite his reference.

Bob Badour

unread,

Aug 16, 2006, 5:28:46 PM8/16/06

to

Because the surrogate--in the sense of an arbitrarily assigned number
that forces uniqueness and is never reused--provides no semantic hints
regarding identity whatsoever, surrogates similarly force integrity into
applications unless one declares all candidate keys to the dbms. If one
declares all candidate keys to the dbms, how do Selzer's arguments
regarding cascaded updates change in any way shape or form? The unstable
keys that he ignorantly calls natural keys will still change and the
dbms will still have the same dilemmas.

If it happens that someone records duplicate information, one of the
surrogates must change eventually. How then does a surrogate differ from
any other candidate key?

[remaining nonsense snipped without further comment]

Brian Selzer

unread,

Aug 16, 2006, 5:50:45 PM8/16/06

to

"erk" <eric...@gmail.com> wrote in message
news:1155757948.2...@75g2000cwc.googlegroups.com...

> Brian Selzer wrote:
>> How so? In what way does the information principle require that
>> attributes
>> be mutable? A relational database contains knowledge about things, not
>> things. Now, if you're going to talk about something, then that thing
>> must
>> either exist or have existed in the universe.
>
> Referring to "things" and "entities" may be part of the problem, as
> it's on the slippery slope to First Great Blunderville. Facts are
> facts, and don't change, though our understanding of them does. A
> database state is simply a set of propositions, which uses predicates
> (relations) as the organizing principle. Constraints keep our
> propositions consistent, with respect to identifiable values. I don't
> think there's any necessity for a group of facts in one database state
> to be somehow traceable to a previous one. I could certainly be wrong;
> it just feels to me like this discussion is pivoting on identifiable
> "thingies," rather than logical constructs.
>

Facts are facts about things. If things can change, then the facts about
them must also or must be replaced.

>> If a thing exists in the
>> universe and is relevant to the discussion, then there must be some way
>> to
>> distinguish that thing from all other things that exist, that have
>> existed
>> and that can exist, otherwise there isn't any way to be sure that you're
>> talking about the same thing in successive database instances.
>
> Why does one have to "talk about the same thing" in successive database
> instances (values)?
>

Again, facts are facts about things. If things can change--especially if
things can alter their appearance without altering their identity, then the
facts about those things must either change or be replaced.

>> It is often
>> necessary to know what was known about something in order to assert
>> something new. For example, when a credit card charge clears, the bank
>> must
>> know the balance of the account in order to compute the new balance.
>
> Maybe, but from a functional standpoint, that operator is just a
> function (e.g. "subtract $500 from X), in which the balance is a free
> variable. Only in an imperative world does that involve "knowing"
> (referencing) the "previous" balance. Function application means
> there's no "query" of the value prior to the update.
>

Not necessarily. For example, consider a sales order that can have several
states, proposed, open, firm, shipped, received, closed, cancelled. Assume
that the order stated is the normal set of state changes for the order. Now
consider that an order that cannot become proposed once it is firm, it
cannot become received unless it has been shipped. It cannot become closed
unless it has been received. Unless you define special operators to deal
with the states, you need to know what the old value was in order to
maintain the consistency of the database throughout the update.

>> More
>> importantly, the bank must be able to identify the account that is about
>> to
>> change, and that identity must remain constant in both the preceding and
>> succeeding database instances.
>
> Why? As long as it can be identified via some query, what difference
> does it make? For example, if I make a database schema change and
> introduce a new key, with appropriate view changes to support old
> application code, is there some logical distinction? If the external
> queries all still produce the same results, excepting the specific
> values being updated, what does "identity" have to do with it?
>

Because changes are set-based, and if the identity of the account can
change, then it's possible to update the wrong row, or to allow a charge to
clear that shouldn't be allowed.

>> Because changes are set-based, without some
>> means to correlate the propositions in the preceding instance with those
>> in
>> the succeeding instance, you cannot be certain that you're talking about
>> the
>> same thing;
>
> There is no "thing." These are propositions, or assertions if you like,
> nothing more. The only meaning is in the correlation of queries to
> external phenonema of interest.
>

What are the propositions or assertions about? If they're about values then
they're just hot air. A database contains knowledge. Knowledge about what?
Scalar values? I don't think so.

>> therefore, you cannot be certain if the succeeding instance is
>> correct. The question is: should the solution to this problem be handled
>> in
>> implementations, or should the theory be strengthened to eliminate it?
>
> What "strengthening" would eliminate the "problem"?
>

The relational model doesn't have a correct theoretical mechanism to
correlate tuples during updates. The scope of a key value's ability to
identify a tuple is a single relation value from a single database instance.
I think that the model is incomplete without such a mechanism, because there
are some constraints that cannot be enforced, and certain update anomalies
can occur, as I've provided examples of in other posts.

>> I believe Codd used the term "permanent" to describe surrogates. That
>> implies immutability. He did mention drastic circumstances, such as
>> merging
>> databases that could require that they change, but the impression I got
>> was
>> that they should be permanent.
>
> "Impression," "should," and "implies" are all insufficient explanation
> - if I missed your solution in a previous post, let me know, but Codd's
> writings aren't (especially in cases like this where there appears to
> be room for interpretation) holy writ.
>

True, but it felt really good to spank Badour after all of the abuse he's
dished out, unprovoked for the most part, I might add. I really loved the
fact that he doesn't even understand the definition of a candidate key!

> - erk
>

Gene Wirchenko

unread,

Aug 16, 2006, 5:53:43 PM8/16/06

to

On Wed, 16 Aug 2006 19:21:37 GMT, "Brian Selzer"
<br...@selzer-software.com> wrote:

[snip]

>If I'm an ignorant, then where do you fall. You can't come up with a decent
>argument to counter mine. Yours are baseless and inane. Yet you continue
>with these arguments even when confronted with citations from both Codd and
>Date. Who, then, is self-aggrandizing? I think you need to take the log
>out of your own eye.

You are. I will take Bob on a bad day over you at your best any
day.

Sincerely,

Gene Wirchenko

Bob Badour

unread,

Aug 16, 2006, 5:56:41 PM8/16/06

to

Gene Wirchenko wrote:

> On Wed, 16 Aug 2006 19:21:37 GMT, "Brian Selzer"
> <br...@selzer-software.com> wrote:
>
> [snip]
>
>>If I'm an ignorant, then where do you fall. You can't come up with a decent
>>argument to counter mine. Yours are baseless and inane. Yet you continue
>>with these arguments even when confronted with citations from both Codd and
>>Date.

Confronted? Me? I long ago exiled Selzer to the twit filter as soon as I
realised he was full of shit. I only see what others excerpt. Anith
already addressed the only citation I saw.

[snip]

Bob Badour

unread,

Aug 16, 2006, 6:13:35 PM8/16/06

to

kvnkr...@gmail.com wrote:

What makes you think you have a single unique DNA sequence?

It
> seems there *is* a fundamental difference between "Line 3 of invoice
> 12" and "line_id 1433" - both may be candidates for the same entity,
> yet it seems like the first should qualify as a natural key and the
> second should qualify as a surrogate... making Bob's claim that "A
> natural key is merely a familiar surrogate." seem a bit of an
> overgeneralization.

Awkwardness will impair familiarity; however, if users have to interact
with line_id 1433, they will have to become familiar with it. Only by
hiding line_id 1433 entirely from users can one keep users from becoming
familiar with it. That works right up until the moment in time when it
fails.

Keith H Duggar

unread,

Aug 16, 2006, 6:46:18 PM8/16/06

to

Bob Badour wrote:
> kvnkr...@gmail.com wrote:

Bob and kvn, both of your near simultaneous posts explaining
surrogate vs natural/(familiar surrogate) keys with examples
such as names and SSN were helpful and clarifying.

Thank you!

Brian Selzer

unread,

Aug 16, 2006, 7:03:14 PM8/16/06

to

"Gene Wirchenko" <ge...@ucantrade.com.NOTHERE> wrote in message
news:5807e2l0c4k52jubb...@4ax.com...

Well, you're welcome to your opinion. I don't think I'm self-aggrandizing.
I haven't pretended that I know everything, nor have I tried to say that I'm
greater than anybody else. All I've been trying to do is to describe a
situation that I've run into many times in the decades that I've been in
this business and that I preceive as something that should be prevented
verifiably in theory and not left up to implementations or applications.
You may think that I'm full of it, and you're welcome to say it. But Badour
has been abusive and condescending and at some points, just plain wrong
(verifiably so). He continually sticks his nose in just to be contrary,
apparently because he doesn't have anything of substance to contribute. I'm
usually a very civil person, but I don't take kindly to being called an
idiot, an ignorant, a crank, among other things. So again, you're welcome
to your opinion, but I'm surprised that anybody would stand behind such a
personality.

Sincerely,

Brian Selzer

unread,

Aug 16, 2006, 7:39:39 PM8/16/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:zTLEg.49101$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

For a thing that persists, doesn't an endurant part overlap the same spatial
location as thing of which it is a part during the entire interval that the
thing exists? And doesn't a purdurent temporal part overlap the same
spatial location as the thing of which it is a part only during particular
intervals or instants during a thing's lifetime?

In the above example and the next example I'm trying to illustrate the
problem with correlating tuples in the preceding and succeeding database
instances, because that's what a hypothetical implementation that faithfully
implemented the Relational Model would be presented with. Relational
assignment is set-based, meaning that the hypothetical implementation would
be presented with the current database instance and a proposed relation
value, and then must be able to determine whether or not to reject the
proposed relation value.

JOG

unread,

Aug 16, 2006, 7:41:20 PM8/16/06

to

Bob Badour wrote:
> JOG wrote:

> >>They must have a common property because they're related. I would
> >>argue that "Me now" is part of "Me over my whole lifetime."
> >
> > "Me now" and "Me over my lifetime" could have a common property, but
> > they don't _have to_ as they are entities from separate domains. Again
> > it's the universe of discourse under concern that matters. At some
> > point the chances are they will have common properties, but a DNA value
> > might not be recorded for "Me now" and it still be perfectly valid in
> > its domain.
>
> You overestimate the use and importance of DNA. That will change in
> coming decades.

I'm well versed in DNA and RNA given my other half is a geneticist
dealing with ID2 knockout stem cell research. Using the concept as it
is viewed by layman served well enough to illustrate the principle, so
enough pedantry bob.

> [snip]

> >>>Something must remain constant to compare entities - something must
> >>>identify them. If nothing remains constant the things being compared,
> >>>by liebniz equality, are different things full stop. This is what
> >>>mathematical logic is grounded in, we can't just avoid it. You seem to
> >>>be saying it is possible that "every attribute of something has
> >>>changed, yet it is still the same thing". Surely that's logical
> >>>nonsense!
>
> "This hammer has been in our family for over 150 years. In that time, we
> replaced the handle five times and the head twice!"

As I said (il)logical nonsense. Washington's hammer, theseus' ship, it
doesn't matter (Marshall called me on this when I first arrived. I
think he referred to it as "High School philosophy"). If two things
have the same identity there has to be a common attribute to identify
it as such, full stop. No oid's here thank you.

> [snip]

> > Brian Selzer wrote:
> >>"Are you the same person you were when you were 10?" depends on the context
> >>of the query, not necessarily the data, but in either context, you must be
> >>able to correlate the state when you were 10 to the state you are now in
> >>order to do the comparison. In the one context, you would compare the
> >>individual properties; in the other you would compare the universal
> >>properties.
> >
> > No! You have already presumed they are the same person if you do that.
> >
> > All one needs to do is see if the two entities _do correlate or not_
> > (And not assume they do a priori). If they don't, as in the identity is
> > not the same, I conclude they are different people. If they do, I can
> > say they are the same person, but in different states. I apologise for
> > banging on about liebniz all the time, but thats exactly how we
> > distinguish whether something is the same thing or not (conceding that
> > functional dependency means we can rely on the key for identity in RM).
> > I think this is a big point where we are differing.
>
> Given that Selzer makes up his own meanings for words, what makes you
> think you have any insight into what he thinks?

Well, I recognise him making the same mistakes I once did. That gives
me insight into what might rectify them, and its why I can't be as
fatalist about it as yourself.

Jim.

Brian Selzer

unread,

Aug 16, 2006, 8:12:34 PM8/16/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:i6MEg.49108$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

Of course one must declare all candidate keys to the dbms. But if there is
a surrogate key in the sense you describe, then it would be stupid to define
foreign keys that reference any but the surrogate key, since it is more than
stable, it is permanent (unless you're merging databases). Since surrogates
don't change, there will be no cascading updates.

> If it happens that someone records duplicate information, one of the
> surrogates must change eventually. How then does a surrogate differ from
> any other candidate key?
>

If you declare all other candidate keys, then there won't a duplicate
problem, unless there are none. And then you're in the same boat as you
would be if you were using a natural key. A surrogate differs from other
candidate keys in that its values are permanent (unless you're merging
databases). That's the only difference.

Keith H Duggar

unread,

Aug 16, 2006, 8:16:14 PM8/16/06

to

Brian, let's just take a step back to where this began. It
began back in the "Surrogate Keys: an Implementation Issue"
thread when you argued the following:

Brian Seizer wrote:
> There is always a compelling reason to use surrogates:
> natural keys can change.

Now, my opinions on relational theory count for very little
at this point given my n00b status. However, I've followed
the discussion as best I can, and given recent compelling
arguments that natural = familiar surrogate, I believe and
think you may even agree that your above argument is false?

If a natural can change so can a surrogate since a natural
/is a/ surrogate, a familiar surrogate. Therefore, both can
change and hence the possibility of change cannot argue in
favor of one over the other.

You not knowing this and Bob Badour knowing, seems to have
led to a spiral of communication failure and ultimate doom.

Now I wish that had not happened because I have learned a
lot from the discussion. And frankly, you seem to me a very
civil and intelligent person.

You certainly did not cause the kind of c'motion that
was the dawning of Fraud 6 ;-)

J M Davitt

unread,

Aug 16, 2006, 9:31:02 PM8/16/06

to

Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message

> news:1155685450.5...@b28g2000cwb.googlegroups.com...

>
>>Brian Selzer wrote:
>>
>>>"JOG" <j...@cs.nott.ac.uk> wrote in message
>>>news:1155662126.3...@m73g2000cwd.googlegroups.com...
>>>
>>>>Brian Selzer wrote:
>>>>[big snips for clarity]

[more snips for more clarity]

> Note: I categorically reject the notion that the application should issue
> key updates separately. If the system allows ad-hoc queries (and most do),
> then one could issue such a change, bypassing the application.

I think the "bypassing the application" concern reveals
more of your perspective than everything else you've
written so far. The fact that you somehow see tension
between application requirements and data requirements
and seem to regard the application as more important
indicates shortsightedness.

(The terms are mine and I apologize if my use of the
terms is a mischaracterization of Brian's thoughts.)

>>It is important to see that "Me now" is a completely different entity
>>to "Me over my whole lifetime". The temporal issue is irrelevant, all
>>that matters is to recognise they are just different entites. I know
>>this is initially seems an obtuse philosophical point, but it has
>>_real_ consequences for how to model those entities.
>
> I see the difference, but I can't see how you can shrug off the temporal

> issue. They must have a common property because they're related. I would

> argue that "Me now" is part of "Me over my whole lifetime."

If some sort of temporal continuity is important, the
design must include features that meet the requirement --
but some naively suggested surrogate key isn't the answer...

[snip]

> In an earlier
> post, I used the terms, "individual" and "universal" to describe these
> categories of properties. Individual properties define the essence of

> something and remain constant throughout its lifetime. Universal properties

> depend on the state of the universe during that lifetime. Universal
> properties are the only ones that can change.

..even if you proclaim that your made-up value is an
"individual property."

All this...

>>>>>Should the new values for x in P have cascaded into F? Assume that

>>>>>the
>>>>>system only has the before image and the after image (the first two
>>>>>columns
>>>>>in RESULTS) in order to complete the update. How can such a system
>>>>>differentiate between the above update and the following update?
>>>>>
>>>>>update P
>>>>> set y = case y
>>>>> when 1 then 3
>>>>> when 3 then 1
>>>>> end
>>>>> where x in (1, 2)
>>>>>
>>>>>Clearly first update affects a key, and consequently, the changes
>>>>>should

>>>>>cascade, but with the information available (both updates produce the

>>>>>exact
>>>>>same before and after images), the system cannot differentiate between
>>>>>the
>>>>>two updates; therefore, it cannot determine whether or not to cascade
>>>>>the
>>>>>changes. Updates within the Relational Model are are constrained in
>>>>>the
>>>>>same way as this hypothetical system. All that is available is the
>>>>>preceding instance and the succeeding instance, and if the only key
>>>>>can
>>>>>change, then there is no way to correlate tuples.

..doesn't describe a general case. Without specific
requirements, one can't say "Clearly...the changes
should cascade." Yes, "[u]dates within the Relational
Model" occur as you depicted -- but whether changes to
any value is appropriate or not has nothing to do with
relational theory. To me, it seems that you're trying
to invent requirements for the model because you can't
see how to fulfill them in your design.

Brian Selzer

unread,

Aug 16, 2006, 10:14:21 PM8/16/06

to

"Keith H Duggar" <dug...@alum.mit.edu> wrote in message
news:1155773774.4...@m73g2000cwd.googlegroups.com...

> Brian, let's just take a step back to where this began. It
> began back in the "Surrogate Keys: an Implementation Issue"
> thread when you argued the following:
>
> Brian Seizer wrote:
>> There is always a compelling reason to use surrogates:
>> natural keys can change.
>
> Now, my opinions on relational theory count for very little
> at this point given my n00b status. However, I've followed
> the discussion as best I can, and given recent compelling
> arguments that natural = familiar surrogate, I believe and
> think you may even agree that your above argument is false?
>

I think that there is a fundamental difference. Over the years I've worked
with many databases that used surrogates and many that didn't. In all cases
where surrogates were used, it was always assumed that surrogates could not
change. That meant less application code that was more reliable, more
scalable, and in most cases, performed better. There were times when it was
better for some queries to have all columns of a compound natural key
enumerated in a single table. There were also times when the number of
joins could have a detrimental effect on performance. Ten or twelve years
ago, this was in some cases a show stopper, but today, you can use indexed
views and other implementaional improvements to achieve similar performance
while still retaining the reduced code requirements and increased
reliability that surrogates provide. More importantly, in the last several
years the paradigm has shifted. With the advent of ADO.NET, most of the
work an application does involves disconnected datasets. (You have to go
way out of your way to bypass this, unlike with ADO.) This has significant
implications because the pattern is based on collision detection rather than
collision avoidance. First you read the data without locking it, work on
it, and then when you're ready to commit, flush it to the database, usually
in a stored procedure. The flush procedure re-reads the data with an update
lock to make sure that nothing changed and that nobody else can change it,
and then if nothing has changed, writes the new data. The use of
disconnected datasets will result in that there will more often be a
significant interval between the time that the data is read and the time
that the data is flushed. This increased interval means that it is that
much more likely that the data read out will have been changed by another
user by the time that it is ready to be flushed. This makes it much more
important to be able to ensure that the rows that you're about to update
haven't changed, and more importantly, that they're the same rows. With a
surrogate, there's no question. If the row isn't found, then it no longer
exists. If it is found, then you can be sure that it's the same row. With
natural keys, however, it's possible that when the row isn't found, that
someone else changed the key. Of course, you can't tell: all you know is
that the row no longer exists. Whether a row was changed or deleted may be
important information that a user could use. But worse, it's possible that
between the time that the data was read out and the time that the data is
about to be flushed that more than one change occured that resulted in a
different row having the same key value that was read out. I've seen this
happen, and it's not pretty. (But then, I've seen a teller terminal execute
the video memory, where symptoms of the failure depended on the balance that
was displayed on the screen, so the fact that I've seen something isn't
indicative as to how often it occurs.) Usually, something other than the
key is also different, so this doesn't happen very often, but it definitely
shouldn't be ignored. This situation cannot occur with surrogates.

Brian Selzer

unread,

Aug 16, 2006, 11:05:03 PM8/16/06

to

"J M Davitt" <jda...@aeneas.net> wrote in message
news:qFPEg.76061$Eh1....@tornado.ohiordc.rr.com...

> Brian Selzer wrote:
>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>> news:1155685450.5...@b28g2000cwb.googlegroups.com...
>>
>>>Brian Selzer wrote:
>>>
>>>>"JOG" <j...@cs.nott.ac.uk> wrote in message
>>>>news:1155662126.3...@m73g2000cwd.googlegroups.com...
>>>>
>>>>>Brian Selzer wrote:
>>>>>[big snips for clarity]
> [more snips for more clarity]
>
>> Note: I categorically reject the notion that the application should issue
>> key updates separately. If the system allows ad-hoc queries (and most
>> do), then one could issue such a change, bypassing the application.
>
> I think the "bypassing the application" concern reveals
> more of your perspective than everything else you've
> written so far. The fact that you somehow see tension
> between application requirements and data requirements
> and seem to regard the application as more important
> indicates shortsightedness.
>
> (The terms are mine and I apologize if my use of the
> terms is a mischaracterization of Brian's thoughts.)
>

One argument that I've heard in the past is that key changes should be
segregated. The only way to do that is in an application; hence the point.
As far as tension, it's there, but not in the way you've characterized. I
think that the data requirements are more important: if the information
can't be relied upon, then it doesn't matter how good an application is.

>>>It is important to see that "Me now" is a completely different entity
>>>to "Me over my whole lifetime". The temporal issue is irrelevant, all
>>>that matters is to recognise they are just different entites. I know
>>>this is initially seems an obtuse philosophical point, but it has
>>>_real_ consequences for how to model those entities.
>>
>> I see the difference, but I can't see how you can shrug off the temporal
>> issue. They must have a common property because they're related. I
>> would argue that "Me now" is part of "Me over my whole lifetime."
>
> If some sort of temporal continuity is important, the
> design must include features that meet the requirement --
> but some naively suggested surrogate key isn't the answer...
>

If you've read the rest of my argument, then I ask you: how can you possibly
use a temporal attribute to make the tuples in the database instance
preceding a change correspond to tuples in the database instance succeeding
a change?

That's not the case at all. I see exactly how to fulfill them. The point
is: I shouldn't have to. I should be able to define declarative constraints
to ensure the integrity of the data. That means not only within each
database instance, but also between two successive database instances
because the model allows a database to be updated and describes how it
happens, that is, updates are set-based. Now, if an update involves more
than one tuple, and that update depends on the current database instance,
then there should be a way to correleate the tuples in the current database
instance with those in the proposed instance. I've shown that candidate
keys as defined are not sufficient. Can you think of another way? I
shouldn't be able to compromise integrity with an ad-hoc update.

J M Davitt

unread,

Aug 16, 2006, 11:22:45 PM8/16/06

to

Tell me, then: what was the point you were trying to
make when you used the phrase "bypassing the application?"

Brian, it's not that key constraints are insufficient,
it's that the key constraints you declared in the
example are insufficient. As I've been saying all
along: the design is lacking.

I first dealt with questions like this almost 20 years
ago. The requirement was that "the system should know"
that what is, today, for example, known as "American
Airlines Flight 3" was previously known as "Pan American
Flight 3" and before that "TransWold Airlines Flight 1."

Surrogates - i.e., system generated "uniquifiers" - were
part of that solution -- but I know better, now. And
I grant you: modern SQL products are hard pressed to
meet the need. But, if you think about it, there's no
reason to expect that changing a value constrained as
a key value should be as simple as an SQL UPDATE and
there's no reason to expect the underlying model to
provide convenient support for something that would
make some developers' lives easier. Thinking like
that is part of the reason that commercial DBMSs are
such a mess.

Bob Badour

unread,

Aug 17, 2006, 12:08:56 AM8/17/06

to

J M Davitt wrote:

> Brian Selzer wrote:
>
>> "J M Davitt" <jda...@aeneas.net> wrote in message
>> news:qFPEg.76061$Eh1....@tornado.ohiordc.rr.com...
>>
>>> Brian Selzer wrote:
>>>
>>>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>>>> news:1155685450.5...@b28g2000cwb.googlegroups.com...
>>>>
>>>>
>>>>> Brian Selzer wrote:
>>>>>
>>>>>
>>>>>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>>>>>> news:1155662126.3...@m73g2000cwd.googlegroups.com...
>>>>>>
>>>>>>
>>>>>>> Brian Selzer wrote:
>>>>>>> [big snips for clarity]
>>>
>>>
>>> [more snips for more clarity]

>>>>> It is important to see that "Me now" is a completely different entity
>>>>> to "Me over my whole lifetime". The temporal issue is irrelevant, all
>>>>> that matters is to recognise they are just different entites. I know
>>>>> this is initially seems an obtuse philosophical point, but it has
>>>>> _real_ consequences for how to model those entities.
>>>>
>>>> I see the difference, but I can't see how you can shrug off the
>>>> temporal issue. They must have a common property because they're
>>>> related. I would argue that "Me now" is part of "Me over my whole
>>>> lifetime."
>>>
>>> If some sort of temporal continuity is important, the
>>> design must include features that meet the requirement --
>>> but some naively suggested surrogate key isn't the answer...
>>
>> If you've read the rest of my argument, then I ask you: how can you
>> possibly use a temporal attribute to make the tuples in the database
>> instance preceding a change correspond to tuples in the database
>> instance succeeding a change?

By mapping some other stable attribute to the unstable attribute
recording the duration of the mapping. For instance.

>>> [snip]
>>>
>>>> In an earlier post, I used the terms, "individual" and "universal"
>>>> to describe these categories of properties. Individual properties
>>>> define the essence of something and remain constant throughout its
>>>> lifetime. Universal properties depend on the state of the universe
>>>> during that lifetime. Universal properties are the only ones that
>>>> can change.
>>>
>>>
>>> ..even if you proclaim that your made-up value is an
>>> "individual property."
>>>
>>> All this...
>>>
>>>

>> That's not the case at all. I see exactly how to fulfill them. The
>> point is: I shouldn't have to. I should be able to define declarative
>> constraints to ensure the integrity of the data.

The problem is not Selzer's declarative constraint but his expectation
that the dbms will automagically do things no user ever instructed it to
do. Data management has not yet advanced to the point of reading minds.

Hear! Hear!

JOG

unread,

Aug 17, 2006, 6:08:14 AM8/17/06

to

Brian Selzer wrote:
> [snip]

> If you've read the rest of my argument, then I ask you: how can you possibly
> use a temporal attribute to make the tuples in the database instance
> preceding a change correspond to tuples in the database instance succeeding
> a change?

I'm running out of ways to explain why this doesn't make sense Brian,
so let me summarize:

* Tuples can /only/ correspond if they have a common key attribute. To
do so otherwise makes no logical sense.

* If there is /no/ attribute value that remains the same, then the
items the propositions refer to have no correspondence. They are
different.

* This isn't a problem because there is /always/ an attribute that will
identify something. If your aim is to model the entity that consists of
an 'item over its lifetime', then it is up to the designer to determine
that key to identify that lifetime.

* If no natural key is recordable then the designer must use an
artificial surrogate for it (remember the underlying natural attribute
it represents does exist somewhere). That artificial surrogate /must
not/ be hidden as it is an external identifer.

That's it. Nothing more to it. Can anyone else see any holes in this
logic, or am I losing the plot?

David Cressey

unread,

Aug 17, 2006, 7:03:05 AM8/17/06

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:pXJEg.49046$pu3.5...@ursa-nb00s0.nbnet.nb.ca...

> My SSN is not me. It is an arbitrary identifier chosen by the IRS to
> identify tax filings related to my income. It is familiar because I was
> given a little blue card with it inscribed, and I was instructed to
> transcribe it to a variety of documents.
>

Good point. But the real situation is actually even worse.

First, a side issue. The IRS chooses to use the SSN, but it doesn't assign
it. A new SSN is assigned by the SSA (Social Security Administration).
Thus, if the SSA screws up, and assigns the same SSN twice, the IRS is now
stuck.

The IRS got a key that it is locked into by force of habit, but that is
beyond its control, except to the extent that there is formal collaboration
between the IRS and the SSA.

All of this may seem beside the point. I think it's right on topic. In
essence, a key is natural if "they" assign it, and it's surrogate if "we"
assign it. Defining "we" and "they" is beyond the scope of this comment.

I'm leaving aside yet another topic, and that is that an SSN doesn't,
technically identidy a person. It identifies a social security account.
There is expected to be a one-to-one relationship between (relevant) persons
and social security accounts, and that is why some people use it to
identify persons.

David Cressey

unread,

Aug 17, 2006, 7:17:23 AM8/17/06

to

"Brian Selzer" <br...@selzer-software.com> wrote in message

news:1iQEg.8919$1f6....@newssvr27.news.prodigy.net...

> I think that there is a fundamental difference. Over the years I've
worked
> with many databases that used surrogates and many that didn't. In all
cases
> where surrogates were used, it was always assumed that surrogates could
not
> change.

PMFJI. I'm offering a point of view that doesn't necessarily conform to
either yours or Bob's.

I also have worked with many databases, some that used surrogates, and some
that
used so called "natural" keys.

The distinction in my mind, is whether "we" control the key or whether
"they" control the key.
If "we" control the key, then it will only get screwed up if "we" screw it
up, and "we" aren't going to do that, are "we"?

If, on the other hand, "they" control the key, then sooner or later they
are going to screw it up. That's the way "they" are.
If "they" screw it up, and "we" have a problem, then "we" have an
intractable situation.

I can give you real world examples of exactly this happening, but this
comment is getting long enough.

> That meant less application code that was more reliable, more
> scalable, and in most cases, performed better.

This, and most of the rest of what you wrote is a natural consequence of the
fact that "we" can rely on "ourselves" more than we can rely on "them".

I'm netither flaming nor agreeing with what you wrote.

JOG

unread,

Aug 17, 2006, 7:51:00 AM8/17/06

to

Bob Badour wrote:
> I disagree that the concept of surrogate vs. natural is useful.

Ok, I've had time to digest this now, and I have to say that I /do/
believe the distinction can be important, and I think your
interpretation is slightly awry. Let me explain:

Bob Badour wrote:
> It is a surrogate for whatever a surrogate key is for. Think of any
> natural key. How is it not a surrogate?

o.k., a surrogate is a subsitute for something. That's agreed.

> My name is not me. It is an arbitrary identifier chosen by my parents.
> It is familiar because I was conditioned from an early age to respond to it.

I /strongly/ contest that your name is a surrogate. Your name is a
'label' applied to you, it is not a 'substitute' for you. I know it is
a subtle distinction but it is important. (n.b. these are not really my
deductions but a regurgitation of the writings of William Kent, highly
rated by perople such as Date.)

> My SSN is not me. It is an arbitrary identifier chosen by the IRS to
> identify tax filings related to my income. It is familiar because I was
> given a little blue card with it inscribed, and I was instructed to
> transcribe it to a variety of documents.

Again an SSN is a label applied to you just in a different context, and
not a substitute for you. Same for the other examples supplied.

> [snip]

> I am not suitably represented for machine processing.

Agreed, but the labels applied to you /are/ suitable for machine
processing. Hence we don't need to provide any substitutes for them -
they can go straight into propositions and ultimately the database.

However, there are some identifiers that we do not have suitably
formatted labels for. Attributes that are currently not easy to enter
into a proposition. Fingerprints for example. Some attributes might not
be easily recordable even though we know they exist. These are
attributes not suitable for machine processing.

Hence we 'subsitute' that key with a different artificial key we have
generated, to act as its representative. That is what surrogacy is.

The blur comes in that this is only really useful at design time,
because as soon as the attribute is used externally, it becomes a new
natural key. Hence I agree that after the fact the distinction is not
useful, but a priori it is important to /understand/ exactly what's
going on as it eliminates any foolhardy temptation to try and hide such
attributes and violate the information principle.

Brian Selzer

unread,

Aug 17, 2006, 7:50:01 AM8/17/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155809294.4...@m73g2000cwd.googlegroups.com...

> Brian Selzer wrote:
>> [snip]
>> If you've read the rest of my argument, then I ask you: how can you
>> possibly
>> use a temporal attribute to make the tuples in the database instance
>> preceding a change correspond to tuples in the database instance
>> succeeding
>> a change?
>
> I'm running out of ways to explain why this doesn't make sense Brian,
> so let me summarize:
>

I understand why it doesn't make sense. That's my entire argument. If
something can have its appearance altered and still be the same thing, and
if the only way to compare things is by their appearance, then you can't
tell if two sets contain the same things unless you know how each of their
appearances has changed or unless you know that the appearance of only one
of them has changed.

> * Tuples can /only/ correspond if they have a common key attribute. To
> do so otherwise makes no logical sense.
>

If you know that the tuples in the proposed instance correspond to those in
the current instance, and you know that only one tuple has a different key
value, then the one that doesn't correspond by the key values is the one
that is different.

If you know how to identify tuples in one instance and you know how to
identify tuples in another and if you know how they're supposed to
correspond, then instead of issuing a single assignment containing all of
the changes, you can issue a series of individual assignments where only one
tuple is different in each. (e.g., FOR EACH ROW)

Of course, issuing a series of assignments introduces its own problems. For
example, consider a self-referencing relation. If any of the assignments
changes the only key in a tuple that is referenced by another, then the
database will be inconsistent until the new value for the referencing tuple
is assigned.

> * If there is /no/ attribute value that remains the same, then the
> items the propositions refer to have no correspondence. They are
> different.
>

This is true only if by "attributes" you mean the properties of something,
including those that are not included in the propositions about it. If you
are referring to "attributes" as the set of attributes included in a
proposition, then "They are different." must become "You can't tell whether
or not they're the same."

> * This isn't a problem because there is /always/ an attribute that will
> identify something. If your aim is to model the entity that consists of
> an 'item over its lifetime', then it is up to the designer to determine
> that key to identify that lifetime.
>

Then the model should take this into account in its definition. That may
embody changing the definition of a key, or changing its treatment of
attributes in the definition of a relation schema, or both. You can define
multiplicity constraints, and that is defined in the model. Maybe you could
define mutability constraints, and include that in the model. Maybe the
entity integrity rule could be changed to include restrictions against
mutable attributes as well as nullable attributes. I don't know. All I
know is that I can break it, and that should be addressed somehow.

> * If no natural key is recordable then the designer must use an
> artificial surrogate for it (remember the underlying natural attribute
> it represents does exist somewhere). That artificial surrogate /must
> not/ be hidden as it is an external identifer.
>

I would have said, "If no natural key is both recordable and immutable then
the designer must use an artifical surrogate for it."

JOG

unread,

Aug 17, 2006, 8:05:27 AM8/17/06

to

Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1155809294.4...@m73g2000cwd.googlegroups.com...
> > Brian Selzer wrote:
> >> [snip]
> >> If you've read the rest of my argument, then I ask you: how can you
> >> possibly
> >> use a temporal attribute to make the tuples in the database instance
> >> preceding a change correspond to tuples in the database instance
> >> succeeding
> >> a change?
> >
> > I'm running out of ways to explain why this doesn't make sense Brian,
> > so let me summarize:
> >
>
> I understand why it doesn't make sense.

I was referring to why tuples between relation-values cannot correspond
if they do not have a common key.

> That's my entire argument. If
> something can have its appearance altered and still be the same thing, and
> if the only way to compare things is by their appearance, then you can't
> tell if two sets contain the same things unless you know how each of their
> appearances has changed or unless you know that the appearance of only one
> of them has changed.
>
> > * Tuples can /only/ correspond if they have a common key attribute. To
> > do so otherwise makes no logical sense.
> >
>
> If you know that the tuples in the proposed instance correspond to those in
> the current instance,

You can /only/ know that by comparing a key value that maintains
identity over the lifetime of the item, as specified by your design
choice.

> and you know that only one tuple has a different key
> value, then the one that doesn't correspond by the key values is the one
> that is different.
>
> If you know how to identify tuples in one instance and you know how to
> identify tuples in another and if you know how they're supposed to
> correspond,

wuh? Now you are referring to a schema change? If not, tuples are
identified the same way across relation values.

Perhaps bob was right, I give up. I think your notion of how we record
identity is flawed. Maybe oid's are what you are looking for if that is
the case.

JOG

unread,

Aug 17, 2006, 8:08:29 AM8/17/06

to

Apologies for the cross posting. I have jumped ship on this thread.

Brian Selzer

unread,

Aug 17, 2006, 8:35:19 AM8/17/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1155816327....@p79g2000cwp.googlegroups.com...

> Brian Selzer wrote:
>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>> news:1155809294.4...@m73g2000cwd.googlegroups.com...
>> > Brian Selzer wrote:
>> >> [snip]
>> >> If you've read the rest of my argument, then I ask you: how can you
>> >> possibly
>> >> use a temporal attribute to make the tuples in the database instance
>> >> preceding a change correspond to tuples in the database instance
>> >> succeeding
>> >> a change?
>> >
>> > I'm running out of ways to explain why this doesn't make sense Brian,
>> > so let me summarize:
>> >
>>
>> I understand why it doesn't make sense.
>
> I was referring to why tuples between relation-values cannot correspond
> if they do not have a common key.
>

And I agree entirely.

>> That's my entire argument. If
>> something can have its appearance altered and still be the same thing,
>> and
>> if the only way to compare things is by their appearance, then you can't
>> tell if two sets contain the same things unless you know how each of
>> their
>> appearances has changed or unless you know that the appearance of only
>> one
>> of them has changed.
>>
>> > * Tuples can /only/ correspond if they have a common key attribute. To
>> > do so otherwise makes no logical sense.
>> >
>>
>> If you know that the tuples in the proposed instance correspond to those
>> in
>> the current instance,
>
> You can /only/ know that by comparing a key value that maintains
> identity over the lifetime of the item, as specified by your design
> choice.
>

The database can /only/ know that. The originator of the change can know,
however. But again, that pushes the responsibility to enforce constraints
onto the user.

>> and you know that only one tuple has a different key
>> value, then the one that doesn't correspond by the key values is the one
>> that is different.
>>
>> If you know how to identify tuples in one instance and you know how to
>> identify tuples in another and if you know how they're supposed to
>> correspond,
>
> wuh? Now you are referring to a schema change? If not, tuples are
> identified the same way across relation values.
>

No. Not if "you" is the originator of the change.

Bob Badour

unread,

Aug 17, 2006, 10:48:14 AM8/17/06

to

Technically, all candidate keys identify sets of related facts and not
people or external things. There is expected to be a one-to-one
relationship between (relevant) people or external things and whatever
commonality relates the facts.

The above is as true for paper records and files as it is for computer
records and files or for databases.

Bob Badour

unread,

Aug 17, 2006, 10:50:50 AM8/17/06

to

You and I are in complete agreement. I note that the distinguishing
characteristic between natural keys and surrogates keys is familiarity,
and I note that all objections to natural keys relate to control.

erk

unread,

Aug 17, 2006, 11:13:14 AM8/17/06

to

Brian Selzer wrote:
> Facts are facts about things. If things can change, then the facts about
> them must also or must be replaced.

I think the "thing-orientation" here gets in the way of thinking about
facts as instances of predicates that we care about. When we talk about
the database, it's the predicates that are primary, not the things they
concern. A change in the database doesn't necessarily reflect a "thing
changing", unless you collapse "thing" to mean predicate rather than
some entity.

> > Maybe, but from a functional standpoint, that operator is just a
> > function (e.g. "subtract $500 from X), in which the balance is a free
> > variable. Only in an imperative world does that involve "knowing"
> > (referencing) the "previous" balance. Function application means
> > there's no "query" of the value prior to the update.
>
> Not necessarily. For example, consider a sales order that can have several
> states, proposed, open, firm, shipped, received, closed, cancelled. Assume
> that the order stated is the normal set of state changes for the order. Now
> consider that an order that cannot become proposed once it is firm, it
> cannot become received unless it has been shipped. It cannot become closed
> unless it has been received. Unless you define special operators to deal
> with the states, you need to know what the old value was in order to
> maintain the consistency of the database throughout the update.

Maybe the discrepancy hinges on the phrase "you need to know." I'd
argue that no query is needed, merely constraints.

> >> More
> >> importantly, the bank must be able to identify the account that is about
> >> to
> >> change, and that identity must remain constant in both the preceding and
> >> succeeding database instances.
> >
> > Why? As long as it can be identified via some query, what difference
> > does it make? For example, if I make a database schema change and
> > introduce a new key, with appropriate view changes to support old
> > application code, is there some logical distinction? If the external
> > queries all still produce the same results, excepting the specific
> > values being updated, what does "identity" have to do with it?
>
> Because changes are set-based, and if the identity of the account can
> change, then it's possible to update the wrong row, or to allow a charge to
> clear that shouldn't be allowed.

There is no "wrong row," only a set of propositions. The same
possibility for human error would seem to be present in any update:
that you might issue an update without knowing about a change made
between the time you last loaded the page, and the time you pressed
Save, and therefore could violate a constraint which you wouldn't
violate if only the database were in the state you think it is (based
on what's on the screen). This issue seems to be a particular variant.

> > There is no "thing." These are propositions, or assertions if you like,
> > nothing more. The only meaning is in the correlation of queries to
> > external phenonema of interest.
>
> What are the propositions or assertions about? If they're about values then
> they're just hot air. A database contains knowledge. Knowledge about what?
> Scalar values? I don't think so.

They're about what is in our heads - the application (business) domain.
The database doesn't care about that; it's in crafting predicates and
constraints that we tell the database as much as it needs to (or can)
"know."

> The relational model doesn't have a correct theoretical mechanism to
> correlate tuples during updates. The scope of a key value's ability to
> identify a tuple is a single relation value from a single database instance.
> I think that the model is incomplete without such a mechanism, because there
> are some constraints that cannot be enforced, and certain update anomalies
> can occur, as I've provided examples of in other posts.

Since we're not talking about a machine that "really knows" the real
world, I don't understand what sort of mechanism you have in mind -
what is an example of a "correct theoretical mechanism"? The relational
model already allows surrogate keys.

> True, but it felt really good to spank Badour after all of the abuse he's
> dished out, unprovoked for the most part, I might add. I really loved the
> fact that he doesn't even understand the definition of a candidate key!

I didn't gather from that exchange that he had, but don't feel like
diving into that particular argument...

- erk

Bob Badour

unread,

Aug 17, 2006, 11:27:46 AM8/17/06

to

erk wrote:

> Brian Selzer wrote:
>
>>True, but it felt really good to spank Badour after all of the abuse he's
>>dished out, unprovoked for the most part, I might add. I really loved the
>>fact that he doesn't even understand the definition of a candidate key!

At least Selzer's delusions entertain him.

JOG

unread,

Aug 17, 2006, 11:36:21 AM8/17/06

to

erk wrote:
> Brian Selzer wrote:
> > Facts are facts about things. If things can change, then the facts about
> > them must also or must be replaced.
>
> I think the "thing-orientation" here gets in the way of thinking about
> facts as instances of predicates that we care about. When we talk about
> the database, it's the predicates that are primary, not the things they
> concern. A change in the database doesn't necessarily reflect a "thing
> changing", unless you collapse "thing" to mean predicate rather than
> some entity.

Very true. Originally Codd used the terms "roles and values" in the
1970 paper. Perhaps the commonm use of 'attributes and values' to
describe a relation encourages the "thing-oriented" approach.

erk

unread,

Aug 17, 2006, 12:08:23 PM8/17/06

to

JOG wrote:
> Originally Codd used the terms "roles and values" in the
> 1970 paper. Perhaps the commonm use of 'attributes and values' to
> describe a relation encourages the "thing-oriented" approach.

Probably true - on its own, the word "attribute" implies a thing (noun)
its value is attributed to.

I try to remember Date's loose analogy of predicates as sentences, and
attributes as nouns in the sentence. A predicate is a type of
statement; each tuple is a specific statement about specific values, a
statement we assert is true (and Closed World means those not in the
database are assumed false). Constraints, then, are our means of
keeping statements consistent in meaning, since the DBMS can't know the
real-world meaning of any of them.

I'm not sure "roles and values" is any better; "predicates and free
variables," or "predicates and terms," while less intuitive, are as
close as I can get.

- erk

paul c

unread,

Aug 17, 2006, 1:10:01 PM8/17/06

to

erk wrote:
> JOG wrote:
>> Originally Codd used the terms "roles and values" in the
>> 1970 paper. Perhaps the commonm use of 'attributes and values' to
>> describe a relation encourages the "thing-oriented" approach.
>
> Probably true - on its own, the word "attribute" implies a thing (noun)
> its value is attributed to.
>

I'd agree that for many people there is such an implication (but I'd
rather use the word 'suggests' rather than 'implies'). Personally, at
the level of a dbms, I like 'attribute' but I'd rather think of
'attributing' in a very narrow way that doesn't invite comparisons with
some external reality, eg., when I fasten an attribute to a relation,
all I'm doing is associating some domain's values with values of other
domains. It is up to users to agree on some interpretation, not for the
system to try to do.

> I try to remember Date's loose analogy of predicates as sentences, and
> attributes as nouns in the sentence. A predicate is a type of
> statement; each tuple is a specific statement about specific values, a
> statement we assert is true (and Closed World means those not in the
> database are assumed false). Constraints, then, are our means of
> keeping statements consistent in meaning, since the DBMS can't know the
> real-world meaning of any of them.
>
> I'm not sure "roles and values" is any better; "predicates and free
> variables," or "predicates and terms," while less intuitive, are as
> close as I can get.
>
> - erk
>

More good stuff. Even though many here have read Date, I think it's not
repeated enough (even though I'm a bit ignorant about the/any formal
connection between set logic and predicate logic). I'd quibble a bit
about "constraints ... keeping statements consistent" - I tend to think
that in the first instance it is relations that do that whereas
constraints are a convenient way of making sure all apps don't exceed
the range of values intended nor the intended range of mappings among
values. Put another way, I think the act of defining a relation is
defining the 'first constraint', so to speak.

Also (and I know I'm going a bit off-track here), saying a salary can't
be more than say, $10,000, might be thought of as a domain constraint,
not a relation constraint, but I think that would mean some
sophisticated type support in a dbms, eg. to come up with implications
using join. Whereas saying the salary must be less than such-and-such
if the job is clerk is more of a relation constraint.

p

erk

unread,

Aug 17, 2006, 1:50:45 PM8/17/06

to

paul c wrote:

> erk wrote:
> > Probably true - on its own, the word "attribute" implies a thing (noun)
> > its value is attributed to.
>
> I'd agree that for many people there is such an implication (but I'd
> rather use the word 'suggests' rather than 'implies').

I agree, that was sloppy on my part.

> Personally, at
> the level of a dbms, I like 'attribute' but I'd rather think of
> 'attributing' in a very narrow way that doesn't invite comparisons with
> some external reality, eg., when I fasten an attribute to a relation,
> all I'm doing is associating some domain's values with values of other
> domains. It is up to users to agree on some interpretation, not for the
> system to try to do.

Yes, that association depends on the meaning of the predicate. The
various definitions of the verb "attribute" imply a source, person, or
thing that the value is inherent in. There's definitely a "conceptual
containership," while attributes of relations are placeholders for
specific values about which statements are being made.

>> Constraints, then, are our means of
>> keeping statements consistent in meaning, since the DBMS can't know the
>> real-world meaning of any of them.

By consistency, I meant the more general case of consistency between N
different predicates - being able to limit statements by reference to
other statements. All that can be derived only from what we know about
the real-world domain that we're modeling, and the invariants we need
to enforce.

In most cases, those constraints involve only a small subset of the
database's relvars, but it needn't be so.

> [...] I tend to think

> that in the first instance it is relations that do that whereas
> constraints are a convenient way of making sure all apps don't exceed
> the range of values intended nor the intended range of mappings among
> values. Put another way, I think the act of defining a relation is
> defining the 'first constraint', so to speak.

I agree with the spirit - relations are the first principle of
relational (note to self: duh).

> Also (and I know I'm going a bit off-track here), saying a salary can't
> be more than say, $10,000, might be thought of as a domain constraint,
> not a relation constraint, but I think that would mean some
> sophisticated type support in a dbms, eg. to come up with implications
> using join. Whereas saying the salary must be less than such-and-such
> if the job is clerk is more of a relation constraint.

What do you mean by "implications using join"? A domain mustn't be
defined by reference to relvars.

- erk

paul c

unread,

Aug 17, 2006, 2:05:05 PM8/17/06

to

erk wrote:
> paul c wrote:
>> erk wrote:

...

>> Also (and I know I'm going a bit off-track here), saying a salary can't
>> be more than say, $10,000, might be thought of as a domain constraint,
>> not a relation constraint, but I think that would mean some
>> sophisticated type support in a dbms, eg. to come up with implications
>> using join. Whereas saying the salary must be less than such-and-such
>> if the job is clerk is more of a relation constraint.
>
> What do you mean by "implications using join"? A domain mustn't be
> defined by reference to relvars.

I meant only that if I had Execs{Empdomain Emp,BigMoneydomain salary}
and Clerks{Empdomain,PocketChangedomain salary} I might want to say
Execs JOIN Clerks and get the result {Empdomain Emp, Allmoneydomain
salary}, BigMoneydomain and PocketChangedomain are sub-types of
Allmoneydomain. Maybe that's a bit contrived, but as I say, I'm weak on
type theory - even if it's reasonable, I'm not sure the complexity is
worth it for mundane apps.

p

J M Davitt

unread,

Aug 17, 2006, 11:16:55 PM8/17/06

to

Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1155809294.4...@m73g2000cwd.googlegroups.com...
>
>>Brian Selzer wrote:
>>
>>>[snip]

[more snip]

> Then the model should take this into account in its definition. That may
> embody changing the definition of a key, or changing its treatment of
> attributes in the definition of a relation schema, or both. You can define
> multiplicity constraints, and that is defined in the model. Maybe you could
> define mutability constraints, and include that in the model. Maybe the
> entity integrity rule could be changed to include restrictions against
> mutable attributes as well as nullable attributes. I don't know. All I
> know is that I can break it, and that should be addressed somehow.

Stop right there!

On "conceptual model of transactions" we learned of /replacement/
updates and /modification/ updates and some obscure theory of
transactions.

Earlier we got /individual/ and /universal/ attributes and some
vague requirement that the relational model - or DBMS - keep track
of which was which and somehow treat them differently.

Now we see there exist /multiplicity constraints/ and /mutability
constraints/ and /entity integrity/ rules and /restrictions against
nullable attributes/.

[Such a fertile field, this database theory; so much is unknown!]

Brian, please stop making this stuff up! You say, "I can break
it, and that should be addressed somehow." Then you carry on trying
to convince us that the database should provide a solution to the
problem you face. All along, we've been saying, "If these things
are problems, your design is broken."

Relational theory provides all you need to meet the requirements
you've described here.

> I would have said, "If no natural key is both recordable and immutable then
> the designer must use an artifical surrogate for it."

Let me ask: is the surrogate immutable?

Brian Selzer

unread,

Aug 18, 2006, 3:40:15 AM8/18/06

to

"erk" <eric...@gmail.com> wrote in message
news:1155827594....@h48g2000cwc.googlegroups.com...

> Brian Selzer wrote:
>> Facts are facts about things. If things can change, then the facts about
>> them must also or must be replaced.
>
> I think the "thing-orientation" here gets in the way of thinking about
> facts as instances of predicates that we care about. When we talk about
> the database, it's the predicates that are primary, not the things they
> concern. A change in the database doesn't necessarily reflect a "thing
> changing", unless you collapse "thing" to mean predicate rather than
> some entity.
>

If a constraint is defined in terms of successive states of a database, then
facts cannot be thought of just in terms of instances of predicates. Any
such instance has identity only within a single relation value. Since the
value of a candidate key determines the values of all other attributes, it
can be used to identify a single tuple within a single relation value;
therefore, it can be used in other relation values as a substitute for
enumerating all of the attribute values of the referenced tuple in every
referencing tuple within the same database state--but only within the same
database state. Extending the scope of a candidate key's ability to
identify instances of predicates from a single database state to successive
database states would require that those instances be identical, not just
the candidate key values. If the values determined by the candidate key
value in the proposed state were different than those in the current state,
then the instances would not have the same properties, and therefore, would
not be the same. Instances are values; values do not change. Therefore,
relaxing this restriction so that a candidate key value can identify
instances in successive database states that are not necessarily identical,
but have identical candidate key values can only be possible if the
instances of a predicate represent things in the universe of discourse that
can have their appearance altered without altering their identity, and what
is identified by a candidate key value is not just a tuple, but something in
the universe.

>> > Maybe, but from a functional standpoint, that operator is just a
>> > function (e.g. "subtract $500 from X), in which the balance is a free
>> > variable. Only in an imperative world does that involve "knowing"
>> > (referencing) the "previous" balance. Function application means
>> > there's no "query" of the value prior to the update.
>>
>> Not necessarily. For example, consider a sales order that can have
>> several
>> states, proposed, open, firm, shipped, received, closed, cancelled.
>> Assume
>> that the order stated is the normal set of state changes for the order.
>> Now
>> consider that an order that cannot become proposed once it is firm, it
>> cannot become received unless it has been shipped. It cannot become
>> closed
>> unless it has been received. Unless you define special operators to deal
>> with the states, you need to know what the old value was in order to
>> maintain the consistency of the database throughout the update.
>
> Maybe the discrepancy hinges on the phrase "you need to know." I'd
> argue that no query is needed, merely constraints.
>

What type of constraints? I don't understand how you could define a
constraint. Could you please show me?

But it does not require them. Nor does it define mutability constraints in
conjunction with entity integrity. Nor does it define a tuple-level
assignment operator. I think that the definition of the model should be
strong enough so that I can't break it.

Brian Selzer

unread,

Aug 18, 2006, 5:18:37 AM8/18/06

to

"J M Davitt" <jda...@aeneas.net> wrote in message

news:HiaFg.65037$u11....@tornado.ohiordc.rr.com...

> Brian Selzer wrote:
>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>> news:1155809294.4...@m73g2000cwd.googlegroups.com...
>>
>>>Brian Selzer wrote:
>>>
>>>>[snip]
>
> [more snip]
>
>> Then the model should take this into account in its definition. That may
>> embody changing the definition of a key, or changing its treatment of
>> attributes in the definition of a relation schema, or both. You can
>> define multiplicity constraints, and that is defined in the model. Maybe
>> you could define mutability constraints, and include that in the model.
>> Maybe the entity integrity rule could be changed to include restrictions
>> against mutable attributes as well as nullable attributes. I don't know.
>> All I know is that I can break it, and that should be addressed somehow.
>
> Stop right there!
>
> On "conceptual model of transactions" we learned of /replacement/
> updates and /modification/ updates and some obscure theory of
> transactions.
>

In the context of a transaction, there is a definite difference between the
semantics of replacement and modification. And that difference can affect
concurrency. If you can't perceive that difference, let me know, and I'll
try again to explain it.

> Earlier we got /individual/ and /universal/ attributes and some
> vague requirement that the relational model - or DBMS - keep track
> of which was which and somehow treat them differently.
>

Those terms stuck in my head. I'm not sure where I got them, but I think it
was in a discussion or paper about onatology that talked about the
categories of urelements. If I remember correctly, (it was a long time
ago), individual urelements, like substance, are part of the essence of
something, whereas universal urelements, such as time, location, etc., are
situational, that is, they depend on the state of the universe.

> Now we see there exist /multiplicity constraints/ and /mutability
> constraints/ and /entity integrity/ rules and /restrictions against
> nullable attributes/.
>

Isn't NULL/NOT NULL a multiplicity constraint? Isn't the entity integrity
rule one of Codd's rules? Doesn't it restrict the use of nullable
attributes? I didn't make any of that up. Mutability is a common
programming concept, all I suggested was that it might improve the
situation. And what's wrong with suggesting improvements?

> [Such a fertile field, this database theory; so much is unknown!]

>
> Brian, please stop making this stuff up! You say, "I can break
> it, and that should be addressed somehow." Then you carry on trying
> to convince us that the database should provide a solution to the
> problem you face. All along, we've been saying, "If these things
> are problems, your design is broken."
>

And I'm saying that no matter how good your design is, if it includes a
transition constraint without an immutable key, I can issue a multi-tuple
update that will violate that constraint, leaving the database in an
inconsistent state. Doesn't that violate the Golden Rule? There are
implementation-specific extensions that can prevent it, but that would tie
the database to a particular implementation.

I've illustrated the problem in I don't even remember how many ways. I've
been begging Microsoft to implement FOR EACH ROW triggers in Sql Server to
work around a problem I've encountered many times that mirrors exactly the
limitation I perceive in the model as it is defined. If the only key can
change, then you can't correlate the rows in the deleted pseudotable with
the inserted pseudotable, and therefore, you cannot determine with certainty
what changed (unless there's only one row). If the only key can change,
then you can't correlate the tuples in the current instance with those in
the proposed instance, and therefore, you cannot enforce transition
constraints (unless there's only one tuple).

> Relational theory provides all you need to meet the requirements
> you've described here.
>

>> I would have said, "If no natural key is both recordable and immutable
>> then the designer must use an artifical surrogate for it."
>
> Let me ask: is the surrogate immutable?

It should be. The value, once assigned, shouldn't change for as long as the
entity is referenced by the database, and even after it's not, that value
shouldn't be used again.

erk

unread,

Aug 18, 2006, 9:34:40 AM8/18/06

to

Brian Selzer wrote:
> "erk" <eric...@gmail.com> wrote in message
> news:1155827594....@h48g2000cwc.googlegroups.com...
> > Brian Selzer wrote:

> > [snip]

> If a constraint is defined in terms of successive states of a database, then
> facts cannot be thought of just in terms of instances of predicates.

Yes, that's all facts are, though constraints mean that your facts are
consistent with one another, based on how you've defined the predicates
of relevance to you (i.e. inter-relation constraints reflect
requirements as well as "reality"). I'm speaking out of some degree of
ignorance; no database I've ever worked with has had state-transition
constraints like this. The only constraints have been what constitutes
a valid relation (and database) value.

> Any such instance has identity only within a single relation value. Since the
> value of a candidate key determines the values of all other attributes, it
> can be used to identify a single tuple within a single relation value;
> therefore, it can be used in other relation values as a substitute for
> enumerating all of the attribute values of the referenced tuple in every
> referencing tuple within the same database state--but only within the same
> database state.

I don't think that's a valid interpretation. It's not a substitute for
referencing all of the attribute values, nor, given only the key value,
can I determine the values of those attributes. The key has a specific
meaning in all the predicates in which it's referenced, as distinct
from the other attributes - it's not a surrogate for them. Relations,
even those with foreign key constraints, are standalone facts. Joins
and constraints reflect references to the same types.

I think this difference is important, and am not just spitting hairs
for the sake of it.

> Extending the scope of a candidate key's ability to
> identify instances of predicates from a single database state to successive
> database states would require that those instances be identical, not just
> the candidate key values.

So this ability to uniquely identify a fact across successive database
states is important purely for state-transition constraints? Is there
any other use for them?

> If the values determined by the candidate key
> value in the proposed state were different than those in the current state,
> then the instances would not have the same properties, and therefore, would
> not be the same.

Sameness is irrelevant; a fact is a fact, and if database value N+1
differs from database value N, it's because facts have changed, and we
need the database to reflect reality more accurately.

> Instances are values; values do not change. Therefore,
> relaxing this restriction so that a candidate key value can identify
> instances in successive database states that are not necessarily identical,
> but have identical candidate key values can only be possible if the
> instances of a predicate represent things in the universe of discourse that
> can have their appearance altered without altering their identity, and what
> is identified by a candidate key value is not just a tuple, but something in
> the universe.

It's a fact, not a thing. I have to admit that I'm still quite leery of
the notion that facts have "identity." I have no current
counterproposal; I'm just trying to understand why this is necessary. A
fact is a statement about things. One attribute of such a statement
might be that a real-world thing has a real-world unique identifying
value, and I understand the need to sometimes introduce surrogates. But
given that a separate "identity attribute" for a fact can have no
possible correlation with anything in the real world, including natural
candidate keys, I smell potential problems.

> >> > Maybe, but from a functional standpoint, that operator is just a
> >> > function (e.g. "subtract $500 from X), in which the balance is a free
> >> > variable. Only in an imperative world does that involve "knowing"
> >> > (referencing) the "previous" balance. Function application means
> >> > there's no "query" of the value prior to the update.
> >>
> >> Not necessarily. For example, consider a sales order that can have
> >> several
> >> states, proposed, open, firm, shipped, received, closed, cancelled.
> >> Assume
> >> that the order stated is the normal set of state changes for the order.
> >> Now
> >> consider that an order that cannot become proposed once it is firm, it
> >> cannot become received unless it has been shipped. It cannot become
> >> closed
> >> unless it has been received. Unless you define special operators to deal
> >> with the states, you need to know what the old value was in order to
> >> maintain the consistency of the database throughout the update.
> >
> > Maybe the discrepancy hinges on the phrase "you need to know." I'd
> > argue that no query is needed, merely constraints.
>
> What type of constraints? I don't understand how you could define a
> constraint. Could you please show me?

Sorry, my comment didn't directly address the issue, so I'll rephrase:
given that an application can produce several different state
transitions in a row, I'm not sure what value these state-transition
constraints would have. For example, an application can issue two
updates in a row: set order status to open, then immediately set to
firm. Such state transition constraints seem to have no meaning at all
if they don't reference other relations (e.g. you can't set the status
to received if there's no corresponding fact regarding the receipt
date/time), aren't they just "static" relation/database constraints,
which can be enforced for every update?

I have to give this much more thought, but is this a case where
syntactic sugar for state-transition constraints could simply
transparently introduce "surrogate relations" to implement the state
transition constraints as "static" database constraints?

Aren't tuple-level assignment operators unnecessary if you have the
surrogate keys you seek?

In any event, I'm not sure that mandatory surrogate keys solve more
problems than they create. The ability to change the non-surrogate keys
arbitrarily seems to indicate that The anything, given that they allow
arbitrary combinations of the various keys. In other words, by
definition the surrogate key is unrelated to anything else, and as such
it seems the keys can be shuffled at will. In the case of a relation
where there is only a single surrogate key (e.g. the tuples represent
facts that are nearly indistinguishable, like events in a trace), it
doesn't matter.

> I think that the definition of the model should be
> strong enough so that I can't break it.

I think you really expect way too much from models, and I'm not sure
this is a fracture any worse than the cure would produce. I think these
are fundamental identity problems, not just ones particular to the
relational model.

At the risk of being accused of waving my hands, I'll quote from one of
Bill Kent's papers, The Unsolvable Identity Problem
(http://www.idealliance.org/papers/extreme/proceedings/html/2003/Kent01/EML2003Kent01.html):

"Why is the identity problem unsolvable? To begin with, as just shown,
we don't agree on what the identity problem is."

"A general unified theory of identity is elusive. It probably doesn't
exist. The main reasons:
* The problem is not well defined.
* There are theoretical and practical limitations to what can be
achieved.
* There are too many semantic issues.
* There are too many domains. We can't achieve a consistent
solution across all of them.

But this quest for the Holy Grail is educational."

"So what do we do? Cope, as we always do. If there is no ideal
solution, we develop solutions that are good enough. The trouble is
that what's good enough for you today isn't good enough for me
tomorrow. We are forever doomed to compromise, extend, patch and rework
to make our good enough solutions a little better. We'll never get it
right. That's life.

Human beings manage to cope somehow with imperfect identification
schemes. Our computer systems might do no better than that."

- erk

Brian Selzer

unread,

Aug 18, 2006, 8:49:55 AM8/18/06

to

"Brian Selzer" <br...@selzer-software.com> wrote in message

news:NBfFg.1912$q63...@newssvr13.news.prodigy.com...

sorry, typo, "ontology" not "onatology"

I'd like to add that it doesn't make any sense to stuff old and new values
in the same database instance in order to enforce a constraint, because then
you're relying on the user to reassert both the old and the new values, and
if they fail to do so, there's nothing to compare, and garbage gets in.
Therefore, you must be able to define a transition constraint, or any
temporal constraint for that matter, in terms of the current instance and
the proposed instance. Which means that you must be able to correlate
tuples in order to enforce transition constraints. (I understand that there
can be exceptions, such as a transition constraint where you're comparing
aggregate values, but I would consider that a degenerate form.)

erk

unread,

Aug 18, 2006, 10:13:51 AM8/18/06

to

Nothing is wrong with the suggestion, but it's a programming concept
rife with problems, one that many functional languages do without, as
it turns a program into a complex state machine and inhibits many
useful forms of analysis (not to mention debugging).

> And I'm saying that no matter how good your design is, if it includes a
> transition constraint without an immutable key, I can issue a multi-tuple
> update that will violate that constraint, leaving the database in an
> inconsistent state.

Here is what you wrote in the original post, and re-reading it, I have
questions about it. Pardon me if I'm re-treading well-worn ground; I'm
slow on the uptake.

> With natural keys, that is, keys whose value can be
> different in successive database states,
> it is possible for the values of
> one set of identifying attributes of something in the universe to be
> different in successive universe states,

Perhaps my eyes glazed over in a previous discussion of key mutability,
but doesn't simply disabling key updates prevent this, with less
machinery than mandatory surrogates?

I would suggest that this case indicates design problems - that someone
chose predicates poorly. Design problems can't be solved without some
redesign. And I'm not sure that the above statement indicates anything
about databases at all; the identity problem is hairy at best and
intractable at worst. We make decisions, none of which are perfect, and
sometimes realize another decision would have been better (although we
often aren't considering that choice's inevitable downside as we're
wishing away).

> and when this new information is
> imparted to the database, it is possible for there to be propositions in
> successive database states that should correspond,

Why "should" they correspond? Once the database is in a new state, what
difference does the correspondence make? If I update two tuples,
swapping every bit of information except the surrogate key, then what
difference does the correspondence make? If I swap any possibly
identifying attributes, I get the same situation - a consistent
surrogate key with inconsistent natural keys.

> but cannot because the
> only relevant set of identifying attributes of something is different in
> successive universe states.

The only set? I thought you were talking about a relation that has one
or more natural keys, but to which you're adding a surrogate. Maybe I'm
mistaken.

> Doesn't that violate the Golden Rule? There are
> implementation-specific extensions that can prevent it, but that would tie
> the database to a particular implementation.
>
> I've illustrated the problem in I don't even remember how many ways. I've
> been begging Microsoft to implement FOR EACH ROW triggers in Sql Server to
> work around a problem I've encountered many times that mirrors exactly the
> limitation I perceive in the model as it is defined. If the only key can
> change,

If you only have one key, and it needs to change, then on what basis
would you correlate anyway? And I still don't know why you'd have that
situation, or why you'd need a transition constraint based on it.

> then you can't correlate the rows in the deleted pseudotable with
> the inserted pseudotable, and therefore, you cannot determine with certainty
> what changed (unless there's only one row). If the only key can change,
> then you can't correlate the tuples in the current instance with those in
> the proposed instance, and therefore, you cannot enforce transition
> constraints (unless there's only one tuple).
>
> > Relational theory provides all you need to meet the requirements
> > you've described here.
> >
>
> >> I would have said, "If no natural key is both recordable and immutable
> >> then the designer must use an artifical surrogate for it."
> >
> > Let me ask: is the surrogate immutable?
>
> It should be. The value, once assigned, shouldn't change for as long as the
> entity is referenced by the database, and even after it's not, that value
> shouldn't be used again.

- erk

Bob Badour

unread,

Aug 18, 2006, 10:22:59 AM8/18/06

to

J M Davitt wrote:

It's gets a lot more fertile after the self-aggrandizing ignorants
spread so much fertilizer on it.

> Brian, please stop making this stuff up! You say, "I can break
> it, and that should be addressed somehow." Then you carry on trying
> to convince us that the database should provide a solution to the
> problem you face. All along, we've been saying, "If these things
> are problems, your design is broken."

Why not call the fertilizer what it is? It's horseshit plain and simple.
Brian spreads it liberally. It requires almost no effort on his part to
make up a new malaprop. I suggest you think about the meaning of Date's
_Principle of Incoherence_ and what it says about who has the advantage
when it comes to addressing cranks and charlatans.

> Let me ask: is the surrogate immutable?

You just asked him to expend no effort whatsoever making up nonsense to
require a great deal of effort on your part to reply. That doesn't make
a whole lot of sense to me.

Brian Selzer

unread,

Aug 18, 2006, 11:07:38 AM8/18/06

to

"erk" <eric...@gmail.com> wrote in message

news:1155908080....@p79g2000cwp.googlegroups.com...

But if a relation value includes a foreign key, a value for that key along
with the definition of the foreign key constraint determines the values for
all of the other attributes in the referenced relation value because the key
value identifies the referenced tuple. I thought that a foreign key
constraint defined a whole-part relationship between tuples in the
referencing relation value and tuples in the referenced relation value.

>
>> Extending the scope of a candidate key's ability to
>> identify instances of predicates from a single database state to
>> successive
>> database states would require that those instances be identical, not just
>> the candidate key values.
>
> So this ability to uniquely identify a fact across successive database
> states is important purely for state-transition constraints? Is there
> any other use for them?

For any temporal constraint (unless there's only one fact).

JOG

unread,

Aug 18, 2006, 12:01:51 PM8/18/06

to

Brian believes tuples in seperate relation values should correspond to
each other, even when /none of the items/ they discuss have even a
single key attribute in common.

I don't believe as bob does that I have to scream 'bullshitter' (I'm
british, and quite frankly I find it vulgar), but having tried to help,
at some point you just have to give up.

There is just no way to reason with Brian's logic about hiding
artificial keys, as it is based on a flawed assumption about
identification, and means he genuinely can't see that the problems he
is imagining are just due to poor design.

Brian Selzer

unread,

Aug 19, 2006, 5:08:42 AM8/19/06

to

"Brian Selzer" <br...@selzer-software.com> wrote in message

news:_IkFg.14326$gY6....@newssvr11.news.prodigy.com...

Facts are things, and thus can have identity, but the distinction between
the identity of a fact and the identity of the thing that a fact is about is
important. When I say that something has identity, I'm saying that that
thing has properties that distinguish it from all other things in the same
frame of reference. The frame of reference for a candidate key is a
relation value in a database instance. The designation of a candidate key
specifies the attributes in the relation value in the database instance that
represent the identity of each tuple in the relation value. Therefore, a
candidate key value identifies a single fact in a single database instance.
On the other hand, in order to enforce a transition constraint, the frame of
reference for the thing that a fact is about must include not only one
database instance, but both the current and proposed database instances.
Therefore, the properties that distinguish the thing from all others must
not change throughout the update.

This difference between the frame of reference of a candidate key and the
frame of reference of an update is the root of the problem. If you can
correlate facts about something in successive database instances, you can
determine how the relevant properties of something changed, and thus you can
enforce transition constraints. If you can't correlate facts about
something, then you can't enforce transition constraints. A change defines
a temporal boundary that marks the end of one situation and the beginning of
another. The current database instance corresponds to the situation that's
ending and the proposed instance corresponds to the one that's beginning.
It's possible for a candidate key value in the current instance to
indirectly identify something in the situation that's ending, but not in the
situation that's beginning. It's also possible for a candidate key value
that indirectly identifies something in the situation that's ending to
identify something else in the situation that's beginning. How? It's
possible for something to have its appearance altered without altering its
essence, and it's also possible for something to be identified by a property
that can change. For example, consider a line of people at the bank. Both
Person and Position are identifying properties. Assume that you're third in
line, so Person is you, and Position is 3. When the guy at the head of the
line leaves, your Position changes to 2. Now let's put that in the context
of a database. You have a relation with candidate keys Person and Position.
So the current instance might look something like

{(Bob, 1), (Brian, 2), (You, 3)}.

The proposed instance would look something like

{(Brian, 1), (You, 2)}

Even though Position is a candidate key in each situation and indirectly
identifies an entry in the queue, the value 2 from the current instance
identifies the tuple containing You in the proposed instance, not the one
containing Brian. This illustrates the difference in the frame of reference
for a candidate key and that for an update, and Position is an example of an
identifying property that can change.

I see updates as database updates, so in order for the proposed instance to
become current, all state constraints along with all transition constraints
must be satisfied. If there are other relations involved, then the update
must include any changes to them as well. Because dependencies between
relations can be mutual, for example, circular inclusion dependencies, I
think it's important to think in terms of "database assignment" instead of
"relational assignment" when defining constraints. (I think Date calls it
"multiple assignment," and for partitions of related relation variables, it
often makes sense to think in terms of what will affect the smallest number
of relations required, but I prefer to keep it simple.)

>> I have to give this much more thought, but is this a case where
>> syntactic sugar for state-transition constraints could simply
>> transparently introduce "surrogate relations" to implement the state
>> transition constraints as "static" database constraints?
>>

I don't think so. I think that a transition constraint must always involve
two database instances, the current instance and the proposed instance. If
you only rely on the proposed instance, then you're relying on the user to
reassert the old values along with the new values. If they fail to do so,
then they can circumvent the constraint. From another perspective, the
number of all possible database instances that satisfy both the state
constraints and the transition constraints can be smaller than the number of
all possible database instances that satisfy the state constraints, even if
you include "surrogate relations." In this case, possible instances that
shouldn't be possible include those that do not include a tuple containing
the old value. The "surrogate relations" solution shifts responsibility for
enforcing the transition constraint onto the user.

J M Davitt

unread,

Aug 19, 2006, 10:44:34 AM8/19/06

to

/Transition constraint/, too?

> I've illustrated the problem in I don't even remember how many ways.

Have any of those illustrations been successful? With
all the ad hoc terms you've invented, I understand
neither the problem nor your vision of "the way things
ought to be."

> I've
> been begging Microsoft to implement FOR EACH ROW triggers in Sql Server to
> work around a problem I've encountered many times that mirrors exactly the
> limitation I perceive in the model as it is defined. If the only key can
> change, then you can't correlate the rows in the deleted pseudotable with
> the inserted pseudotable, and

/Row correlation/? /Pseudotable/?

> therefore, you cannot determine with certainty
> what changed (unless there's only one row).

But if there *is* only one row, you have no potential
comparand, right? How could you determine whether
*anything* has changed?

> If the only key can change,
> then you can't correlate the tuples in the current instance with those in
> the proposed instance, and therefore, you cannot enforce transition
> constraints (unless there's only one tuple).

/Current instance/? /Proposed instance/?

>>Relational theory provides all you need to meet the requirements
>>you've described here.
>>
>
>
>>>I would have said, "If no natural key is both recordable and immutable
>>>then the designer must use an artifical surrogate for it."
>>
>>Let me ask: is the surrogate immutable?
>
>
> It should be. The value, once assigned, shouldn't change for as long as the
> entity is referenced by the database, and even after it's not, that value
> shouldn't be used again.

One-shot, immutable, universal uniquifier engines!

I'm gonna get me a domain name, set up a web service,
and start selling some! How many do you need?

I'm gonna license a companion product that can
verify that an OSIUUID (TM) is unperturbed. This
tool can be delivered as a plug-in for SOA solutions
or can be linked-in to your favorite database engine
and made available to database programmers as a
custom (but standard) function. Java, of course,
won't be left behind. C#? Got you covered.

I gotta get busy before someone else makes this
dream reality.

Brian Selzer

unread,

Aug 19, 2006, 11:12:23 AM8/19/06

to

"erk" <eric...@gmail.com> wrote in message

news:1155908080....@p79g2000cwp.googlegroups.com...

The problem is that none of them are defined in the model. If the model
even stated "there must exist some means to correlate tuples during an
update," then every implementation of the model must provide that
capability. There may be other ways to correlate tuples that I haven't
enumerated or even conceived. Whether it's object identifiers, tuple
identifiers, mutability constraints in the context of entity integrity,
tuple-level assignment, redefining the assignment operator to be three
separate operators that accept ordered sets instead of sets, or something
else, is less important than recognizing that the need exists. The
discussion about which is the best solution is secondary.

> In any event, I'm not sure that mandatory surrogate keys solve more
> problems than they create. The ability to change the non-surrogate keys
> arbitrarily seems to indicate that The anything, given that they allow
> arbitrary combinations of the various keys. In other words, by
> definition the surrogate key is unrelated to anything else, and as such
> it seems the keys can be shuffled at will. In the case of a relation
> where there is only a single surrogate key (e.g. the tuples represent
> facts that are nearly indistinguishable, like events in a trace), it
> doesn't matter.
>
>> I think that the definition of the model should be
>> strong enough so that I can't break it.
>
> I think you really expect way too much from models, and I'm not sure
> this is a fracture any worse than the cure would produce. I think these
> are fundamental identity problems, not just ones particular to the
> relational model.
>

I don't think that expecting constraints to be enforcable is expecting too
much.

paul c

unread,

Aug 19, 2006, 9:39:51 PM8/19/06

to

Brian Selzer wrote:
...

I snipped the whole body because my comment is about the topic. I'd
like to suggest that whenever "real world" is used in the subject line,
the first paragraph should state what the rest of the body has to do
with demonstrable theory.

p

JOG

unread,

Aug 19, 2006, 10:03:03 PM8/19/06

to

Brian Selzer wrote:
> It's possible for something to have its appearance altered without altering its
> essence

Pish! There is no such thing as an 'essence'. See 'King Milindi's
chariot' for a good explication.

> and it's also possible for something to be identified by a property
> that can change.

Nope. Not if you want to compare it over time (which is what you are
talking about). Then it's not an identifying property at all. And that
sort of consideration should be occuring way before the RM is applied.

> For example, consider a line of people at the bank. Both
> Person and Position are identifying properties. Assume that you're third in
> line, so Person is you, and Position is 3. When the guy at the head of the
> line leaves, your Position changes to 2. Now let's put that in the context
> of a database. You have a relation with candidate keys Person and Position.
> So the current instance might look something like
>
> {(Bob, 1), (Brian, 2), (You, 3)}.
>
> The proposed instance would look something like
>
> {(Brian, 1), (You, 2)}
>
> Even though Position is a candidate key in each situation and indirectly
> identifies

'Indirect identity'? There is no such distinction to be made.

> an entry in the queue, the value 2 from the current instance
> identifies the tuple containing You in the proposed instance, not the one
> containing Brian. This illustrates the difference in the frame of reference
> for a candidate key and that for an update, and Position is an example of an
> identifying property that can change.

Why not extend this? Perhaps brian changed his name to bob while he was
waiting, and queueing positions are changed from numerical to
alphabetical by the bank

rv1: { (Brian, 2) }
rv2: { (Bob, A) }

...and you want to automagically correlate these things? Rather than
think in hindsight maybe the identifiers chosen for the entities
concerned might have been a wee bit of a mistake? Does that not strike
you as making more sense?

Brian Selzer

unread,

Aug 19, 2006, 11:12:27 PM8/19/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1156039383.9...@m73g2000cwd.googlegroups.com...

> Brian Selzer wrote:
>> It's possible for something to have its appearance altered without
>> altering its
>> essence
>
> Pish! There is no such thing as an 'essence'. See 'King Milindi's
> chariot' for a good explication.
>
>> and it's also possible for something to be identified by a property
>> that can change.
>
> Nope. Not if you want to compare it over time (which is what you are
> talking about). Then it's not an identifying property at all. And that
> sort of consideration should be occuring way before the RM is applied.
>

So, what you're saying is, "Never use natural keys." Right?

>> For example, consider a line of people at the bank. Both
>> Person and Position are identifying properties. Assume that you're third
>> in
>> line, so Person is you, and Position is 3. When the guy at the head of
>> the
>> line leaves, your Position changes to 2. Now let's put that in the
>> context
>> of a database. You have a relation with candidate keys Person and
>> Position.
>> So the current instance might look something like
>>
>> {(Bob, 1), (Brian, 2), (You, 3)}.
>>
>> The proposed instance would look something like
>>
>> {(Brian, 1), (You, 2)}
>>
>> Even though Position is a candidate key in each situation and indirectly
>> identifies
>
> 'Indirect identity'? There is no such distinction to be made.
>

I didn't say indirect identity, I said indirectly identifies. The candidate
key value identifies a fact which in turn identifies a thing. There is
indeed a distinction.

>> an entry in the queue, the value 2 from the current instance
>> identifies the tuple containing You in the proposed instance, not the one
>> containing Brian. This illustrates the difference in the frame of
>> reference
>> for a candidate key and that for an update, and Position is an example of
>> an
>> identifying property that can change.
>
> Why not extend this? Perhaps brian changed his name to bob while he was
> waiting, and queueing positions are changed from numerical to
> alphabetical by the bank
>
> rv1: { (Brian, 2) }
> rv2: { (Bob, A) }
>
> ...and you want to automagically correlate these things? Rather than
> think in hindsight maybe the identifiers chosen for the entities
> concerned might have been a wee bit of a mistake? Does that not strike
> you as making more sense?
>

If a key can change, it will. It doesn't matter how stable it is. Choosing
a stable key only reduces the probability that a change will occur or
reduces the frequency of the changes. It does not eliminate the
possibility. Your example above supports my argument. Imagine a very large
database that is updated tens or hundreds of thousands of times a day. Now
assume that the probability of a change ocurring is .01%. This means that
at least once a day there's a possibility of corrupting the database. The
point, even if you can't see it, is that it is not a matter of choosing a
more stable key. No update should *ever* be able to violate or circumvent
the database predicate. If an update *can* violate integrity rules, then
either the data model is broken or the implementation is broken. If the
definition of the model cannot prevent it, then the model is broken.

Brian Selzer

unread,

Aug 19, 2006, 11:15:53 PM8/19/06

to

I'll keep that in mind.

"paul c" <toledob...@oohay.ac> wrote in message
news:H3PFg.425242$iF6.217107@pd7tw2no...

JOG

unread,

Aug 20, 2006, 12:23:43 AM8/20/06

to

Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1156039383.9...@m73g2000cwd.googlegroups.com...
> > Brian Selzer wrote:
> >> It's possible for something to have its appearance altered without
> >> altering its
> >> essence
> >
> > Pish! There is no such thing as an 'essence'. See 'King Milindi's
> > chariot' for a good explication.
> >
> >> and it's also possible for something to be identified by a property
> >> that can change.
> >
> > Nope. Not if you want to compare it over time (which is what you are
> > talking about). Then it's not an identifying property at all. And that
> > sort of consideration should be occuring way before the RM is applied.
> >
>
> So, what you're saying is, "Never use natural keys." Right?

No.

Yes, yes, I follow your logic, but it is still flawed: one attribute
out there will be completely stable for your specific problem space, if
you so desire to find it. That's the nature of identity and what allows
you identification in the first place.

With a surrogate you are representing that attribute anyhow. You're
just completely wrong to hide it from the thing that needs to use it as
identification. (all your examples rely on an entity having some sort
of memory of what previous state it was in, which is an incredible
assumption)

>
> >>
> >> >>> >> > Maybe, but from a functional standpoint, that operator is just a
> >> >>> >> > function (e.g. "subtract $500 from X), in which the balance is a
> >> >>> >> > free
> >> >>> >> > variable. Only in an imperative world does that involve
> >> >>> >> > "knowing"
> >> >>> >> > (referencing) the "previous" balance. Function application means
> >> >>> >> > there's no "query" of the value prior to the update.

> [snip]

Brian Selzer

unread,

Aug 20, 2006, 11:21:26 AM8/20/06

to

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1156047823.6...@m79g2000cwm.googlegroups.com...

Then the definition of the model should reflect that, but I'm not sure that
surrogates are the only answer. All that is required is the ability to
correlate tuples during an update. A possibility: require that the user
reassert the original candidate key values during the update. The predicate
of a database along with the current database instance determines the set of
all possible instances that can become current. There should be a way to
obtain the set of possible instances from a set of possible transitions,
each of which would contain what is different on a tuple by tuple basis
between the current instance and a proposed instance. A transition could be
defined as a set of triples (r, t, t') where r is the name of a relation, t
is a tuple from the current instance, and t' is a tuple from the proposed
instance. t would be empty for an inserted tuple, t' would be empty for a
deleted tuple, and neither would be empty for a corresponding tuple. I
positive that I read somewhere that it is possible to transform all state
constraints into transition constraints (I'll have to find out where I read
this). So, given a current instance and a set of transition constraints,
you should be able to construct a set of possible transitions. What I'm not
absolutely sure of is whether or not there are more possible transitions
than possible instances. For some reason it seems likely, unless infinity
is involved. Two different possible transitions could result in the same
possible instance.

I was just thinking: if more than one possible transition can result in the
same possible instance, then given only the current instance and a proposed
instance, can you determine which transition a transition constraint should
enforce? Does it matter, that is, is it possible for one transition that
determines a possible instance to be prohibited but for another transition
that determines the same possible instance to be accepted? If it does
matter, then the notions of relational assignment and multiple assignment
are broken: updates would have to be submitted as transitions or something
equivalent, not just as sets of relation values.

A light bulb just came on! Whether or not different candidate key values
identify the same thing in successive database instances is something that
the user must determine, not the model, but there must be some mechanism for
a user to assert that, otherwise transition constraints cannot always be
enforced. It is up to the user to correlate tuples, not the model, but if a
transition constraint is to be enforced, then the system must be informed
how they correlate, and thus the model should reflect that. Perhaps this
supports Fabian Pascal's objections to variables and relational assignment.

> With a surrogate you are representing that attribute anyhow. You're
> just completely wrong to hide it from the thing that needs to use it as
> identification. (all your examples rely on an entity having some sort
> of memory of what previous state it was in, which is an incredible
> assumption)
>

I concede that it is wrong to hide it. Whether or not it's hidden diverts
attention away from the main issue I'm trying to describe.

I don't think it's an assumption at all. Enforcement of any transition
constraint depends not only on the definition of the constraint, but also on
the current database instance in order to determine whether or not a
proposed database instance should be rejected.