Database design, Keys and some other things

vldm10

nelasīta,

2005. gada 23. sept. 09:36:4123.09.05

uz

Here is a simple example about two entities and one relationship - Car,
Person and Owner. It tries to describe more realistically Real World
situations. A car had its color changed twice. In the Real World
attributes are often changed. In another entity, the person Mary
changed her last name because she got married. In the relationship
Owner it can happen that one person buys the same car twice during a
period of time. It can also happen that two entities can be in the same
relationship many times. So,I believe that this new approach is more
appropriate.

Given the table Car :
CarKey CarID Make Color ...
______________________________________________________
...
23 vin1 Buick silver ...
24 vin1 Buick blue ...
25 vin1 Buick red ...
26 vin2 Honda silver ...
27 vin3 Ford black ...
...

In the Car table VIN = Vehicle Identification Number

Now, let Person be the following table:
PersonKey PersonID PersonName ...
_______________________________________________________
...
208 ssn1 Mary Jones ...
209 ssn1 Mary Adams ...
210 ssn2 John Stewart ...
...

In the Person table SSN = Social Security Number

Then the relationship Owner, which is the act of owning a car, can have
the following
values:
OwnerKey Person Key CarKey Year ...
___________________________________________
...
54 210 26 2003 ...
55 210 24 2004 ...
56 210 26 2005 ...
...

More details about this example can be found on my website
www.dbdesign10.com
where I gave a new definition of Key and tried to develop a new Data
Model.
Any feedback is greatly appreciated.

Vladimir Odrljin

mAsterdam

nelasīta,

2005. gada 23. sept. 14:21:0523.09.05

uz

vldm10 wrote:
> Here is a simple example about two entities and one relationship - Car,
> Person and Owner. It tries to describe more realistically Real World
> situations. A car had its color changed twice. In the Real World
> attributes are often changed. In another entity, the person Mary
> changed her last name because she got married. In the relationship
> Owner it can happen that one person buys the same car twice during a
> period of time. It can also happen that two entities can be in the same
> relationship many times. So,I believe that this new approach is more
> appropriate.
>
> Given the table Car :
> CarKey CarID Make Color ...
> ______________________________________________________
> ...
> 23 vin1 Buick silver ...
> 24 vin1 Buick blue ...
> 25 vin1 Buick red ...
> 26 vin2 Honda silver ...
> 27 vin3 Ford black ...
> ...
>
> In the Car table VIN = Vehicle Identification Number

What kind of facts do you have in this table?
What does one row mean?

The ... needs some date/interval columns
to make any sense out of this table.

I don't see "CarKey" as a car identifier here - it may
identify something, but not a car.

> Now, let Person be the following table:
> PersonKey PersonID PersonName ...
> _______________________________________________________
> ...
> 208 ssn1 Mary Jones ...
> 209 ssn1 Mary Adams ...
> 210 ssn2 John Stewart ...
> ...
>
> In the Person table SSN = Social Security Number

m.m. same here.

[snip combination]

David Cressey

nelasīta,

2005. gada 23. sept. 16:13:2223.09.05

uz

"vldm10" <vld...@yahoo.com> wrote in message
news:1127482601....@f14g2000cwb.googlegroups.com...

You might want to compare your approach to that of Ralph Kimball. In one of
his books on Data Warehousing, he suggests that it is better to use inserts
only on dimension tables. Thus whenever, in the normal course of events, an
attribute takes on a new value, and in an "ordinary" database we would
update an existing row, Kimball suggests instead to insert a new row.

This new row will have a new surrogate key, but the same natural key as a
pre existing row. Facts that already reference the prior dimension row
will continue to do so. New facts that are added later will somehow be
linked to the newer row. Perhaps a date field, as has already been
suggested by Masterdam, will be needed for this purpose.

It seems to me that there are numerous points of similarity between
Kimball's method and yours.

dawn

nelasīta,

2005. gada 24. sept. 12:36:1624.09.05

uz

vldm10 wrote:
> Here is a simple example about two entities and one relationship - Car,
> Person and Owner. It tries to describe more realistically Real World
> situations. A car had its color changed twice. In the Real World
> attributes are often changed. In another entity, the person Mary
> changed her last name because she got married. In the relationship
> Owner it can happen that one person buys the same car twice during a
> period of time. It can also happen that two entities can be in the same
> relationship many times. So,I believe that this new approach is more
> appropriate.
>
> Given the table Car :
> CarKey CarID Make Color ...
> ______________________________________________________
> ...
> 23 vin1 Buick silver ...
> 24 vin1 Buick blue ...
> 25 vin1 Buick red ...
> 26 vin2 Honda silver ...
> 27 vin3 Ford black ...
> ...

You named this table "Car" which might suggest that a row in this table
is-a car and that no two rows are the same car. That is not the case
with your table here. If you are going to model Cars, then model Cars.

> In the Car table VIN = Vehicle Identification Number

That sounds like a candidate key for a Car table.

> Now, let Person be the following table:
> PersonKey PersonID PersonName ...
> _______________________________________________________
> ...
> 208 ssn1 Mary Jones ...
> 209 ssn1 Mary Adams ...
> 210 ssn2 John Stewart ...
> ...
>
> In the Person table SSN = Social Security Number

That sounds like a candidate key for a table of USA people. Yet, you
have multiple rows for the same person. If you are going to model a
Person, then model a Person.

> Then the relationship Owner, which is the act of owning a car, can have
> the following
> values:
> OwnerKey Person Key CarKey Year ...
> ___________________________________________
> ...
> 54 210 26 2003 ...
> 55 210 24 2004 ...
> 56 210 26 2005 ...
> ...
>
> More details about this example can be found on my website
> www.dbdesign10.com
> where I gave a new definition of Key and tried to develop a new Data
> Model.
> Any feedback is greatly appreciated.

To the extent feasible, I would suggest modeling real world entities as
whole at the outset. A person in the real world would be modeled by a
row in the Person table. A Car in the real world would be modeled by a
row in the Car table. Feel free at the start of your modeling to add
properties that are lists to an entity (car colors, for example).

Then you can move to 1NF if needed for the target implementation. A
Car might be referenced in multiple CarColor rows after you split out
the list properties. Similarly, a Person might have a list of former
names, but stay the same person. So, you do not want two PersonKeys
for this one person.

This design issue might have a real name, but I think of it as doing a
"1NF in place" rather than splitting out new tables for list
properties. You start with a car and then you realize it has at least
one multivalued property, so you distort your Car table to a
CarProperty relationship table, losing the model for Car in the
process. --dawn

> Vladimir Odrljin

vldm10

nelasīta,

2005. gada 24. sept. 13:58:1824.09.05

uz

The tables are simplified for the purpose paying attention to keys.
There is the complete explanation of this example, definition of Key
and others things related to it, on my website www.dbdesign10.com.
CarKey is Key and in my opinion it has more appropriate definition of
Key. Current definition of Key in the database theory and its
implementation has limitations, especially for the complex database
projects.
I would like to emphasizes the relation between CarKey and CarID ( I
call this E-relation in my Data Model because this is the entity
level).
For the table Car following set of the pairs
(23, vin1), (24, vin1) and (25, vin1)
identify one car in the Real World. (much better then the pair (23,
vin1))
And for these values it is "True" that they are in the relation.

Vladimir Odrljin

Marshall Spight

nelasīta,

2005. gada 24. sept. 14:56:3224.09.05

uz

vldm10 wrote:
> The tables are simplified for the purpose paying attention to keys.
> There is the complete explanation of this example, definition of Key
> and others things related to it, on my website www.dbdesign10.com.
> CarKey is Key and in my opinion it has more appropriate definition of
> Key.

I'm not clear what you're setting out to do, or what problems
you're setting out to solve. Often these kinds of discussions
are best begun with a specific problem statement. Just saying
that what you're doing is "more like the Real World(tm)" is
not a useful statement; all modelling is an approximation of
reality; the question is, what is it about reality that we
want to model? Studying the definition of "abstraction" may
prove useful.

> Current definition of Key in the database theory and its
> implementation has limitations, especially for the complex database
> projects.

Can you be specific about what these limitations are? And why
do you distinguish between the definition and the implementation?
The implementations of keys in the databases I've worked with
have matched the definition precisely. Are you saying there
is a problem with the implementation relative to the existing
definition? If you're saying there's a problem with the definition,
how does that mean there's a problem with the implementation?

> I would like to emphasizes the relation between CarKey and CarID ( I
> call this E-relation in my Data Model because this is the entity
> level).
> For the table Car following set of the pairs
> (23, vin1), (24, vin1) and (25, vin1)
> identify one car in the Real World.

If (23, vin1), (24, vin1) and (25, vin1) identify one car, that
says that vin identifies car.

Your examples suggest that what you're trying to do is capture
the history of changes to an entity. Is that your area of concern?
Are you familiar with any of the current approaches? What
deficiencies do you identify with them that your model overcomes?

Marshall

mAsterdam

nelasīta,

2005. gada 24. sept. 17:47:1224.09.05

uz

vldm10 wrote:
> The tables are simplified for the purpose paying attention to keys.
> There is the complete explanation of this example, definition of Key
> and others things related to it, on my website www.dbdesign10.com.

At that page, you say:
> Here the CarKey is the Identifier of the state of the entity Car
> and this is the only column of the table which has unique values.
> So the attribute CarKey is the primary key.

If you 'ld instead have said:
"Here the CarStateKey is the Identifier of the state of the entity Car"
I would have thought: "Ha, nice, an investigation into
modeling state, and the difficult key issues which arise when
attempting such an endevour" and I would have been interested to
read on - but you didn't. Now my first reaction was: this can't
be at all very interesting, because the first example starts with
lies due to misnaming things - just another sign of disrespect of
/meaning/. This happens a lot when one is in a state of overrating
form/structure at the cost of substance/meaning.
Building on that he'll just run into all the obvious errors.

The only purpose of formal reasoning is to support meaning.
Ignoring that is a basic mistake.

> CarKey is Key and in my opinion it has more appropriate definition of
> Key. Current definition of Key in the database theory and its
> implementation has limitations, especially for the complex database
> projects.
> I would like to emphasizes the relation between CarKey and CarID ( I
> call this E-relation in my Data Model because this is the entity
> level).
> For the table Car following set of the pairs
> (23, vin1), (24, vin1) and (25, vin1)
> identify one car in the Real World. (much better then the pair (23,
> vin1))
> And for these values it is "True" that they are in the relation.

Shlaer, S., and Mellor, S., Object Lifecycles:
Modeling the World in States, Prentice-Hall, NJ (1992)
is pre-UML, but it is a nice read.

mAsterdam

nelasīta,

2005. gada 24. sept. 18:45:0324.09.05

uz

dawn wrote:
> ...If you are going to model Cars, then model Cars.
and
>... If you are going to model a Person, then model a Person.
and

> To the extent feasible, I would suggest modeling real world entities as
> whole at the outset. A person in the real world would be modeled by a
> row in the Person table. A Car in the real world would be modeled by a
> row in the Car table. Feel free at the start of your modeling to add
> properties that are lists to an entity (car colors, for example).
>
> Then you can move to 1NF if needed for the target implementation. A
> Car might be referenced in multiple CarColor rows after you split out
> the list properties. Similarly, a Person might have a list of former
> names, but stay the same person. So, you do not want two PersonKeys
> for this one person.

He forgot "state" - but this will have multiple consequences/lead to
re-edits in his later remarks.

> This design issue might have a real name, but I think of it as doing a
> "1NF in place" rather than splitting out new tables for list
> properties. You start with a car and then you realize it has at least
> one multivalued property, so you distort your Car table to a
> CarProperty relationship table, losing the model for Car in the
> process. --dawn

When are you coming to the Netherlands? :-)

vldm10

nelasīta,

2005. gada 24. sept. 19:58:3224.09.05

uz

I didn't read any book from Ralph Kimball. If he completed a theory
about this then it is good. If there is some new explanation what is
the key, how we should construct the Conceptual Model and all other
things, then we have new Data Model. If his suggestion is with old
definition of key then it can cause some inconsistency.

> his books on Data Warehousing, he suggests that it is better to use inserts
> only on dimension tables. Thus whenever, in the normal course of events, an
> attribute takes on a new value, and in an "ordinary" database we would

If you noticed in my example it is not related to new value of
attribute, rather it is related to event. It can be the "closing"
of data, i.e. there is no new attribute value. It can be 5 same rows.
Only the key's values are different, i.e. again there is no a new
value of attribute.

vldm10

nelasīta,

2005. gada 24. sept. 22:14:4424.09.05

uz

I have two questions:
1. Let VIN be the primary key for the Car table. How we can enter
information in the table Car that vin1 (that is the Buick) is again
painted to the blue.
2. Let SSN be the primary key for the table Person. How we can enter
information in the
table Person that Mary Adams changed her name again to Mary Jones
- she got divorced.
These two questions show one limitation in the RM.

I built this model so that the relations are at lest in the BC normal
form automatically. There is only one candidate key, there are no
compound keys and all the other values that are not Key can be repeated
any number of times. If this reason (generally speaking) is not good
please let me know.

dawn

nelasīta,

2005. gada 24. sept. 22:17:1224.09.05

uz

> When are you coming to the Netherlands? :-)

No time soon, I suspect, but hopefully sometime before all of the known
relatives die. Pass me your e-mail address again if you could,
mAsterdam -- I have not recovered all of my contacts from the theft of
my PC early this year.
Thanks --dawn

Roy Hann

nelasīta,

2005. gada 25. sept. 07:47:5525.09.05

uz

"vldm10" <vld...@yahoo.com> wrote in message

news:1127614484.1...@z14g2000cwz.googlegroups.com...

> These two questions show one limitation in the RM.

You just earned "one good horse laugh".

Roy

dawn

nelasīta,

2005. gada 25. sept. 12:01:1525.09.05

uz

If you are using the RM as typically taught rather than the RM that
includes nested relations, then along with a PaintColor table, you
would have a relationship table between PaintColor and Car, perhaps
named CarColor. That table could be seen as an event table with
paintDate and colorReplacedDate or paintDate and paintStatus (a date
range or a combination of date and status).

So the Car table includes facts about Cars and the relationship table
CarColor includes facts about the car colors. The relationship table
has a candidate key that consists at least of the Car ID and the Color
ID. If you want it to be more activity/event based about when the car
was painted, then it would also include a starting date. Otherwise a
compound candidate key of car id and color id might work. Make sense?

> 2. Let SSN be the primary key for the table Person. How we can enter
> information in the
> table Person that Mary Adams changed her name again to Mary Jones
> - she got divorced.
> These two questions show one limitation in the RM.

This is pretty much the same question, so you can use a similar
approach. This does not show a limitation in the RM, although we might
way that it shows a human-computer-interaction issue. --dawn

vldm10

nelasīta,

2005. gada 25. sept. 16:42:0225.09.05

uz

Marshall Spight wrote:
> I'm not clear what you're setting out to do, or what problems

You can find this on my website.
I added Events and the new way of construction of the entities and
relationships to the Conceptual Model. The intention is that there is
only one key in the Conceptual Model. No the candidate keys, and no the
compound keys. The same intention is in the Logical Model. The
definition of key is also new.
I included the knowledge which is related to data in this Data Model.

> you're setting out to solve. Often these kinds of discussions
> are best begun with a specific problem statement. Just saying
> that what you're doing is "more like the Real World(tm)" is
> not a useful statement; all modelling is an approximation of
> reality; the question is, what is it about reality that we
> want to model? Studying the definition of "abstraction" may
> prove useful.
>

> Can you be specific about what these limitations are? And why

The limitations are basically related to above mention. Let me give you
one example.
Let Key for a relation be a compound key which has two attributes. If
you want to build TransRelational Model then you should "split"
your compound key. This can complicate the reconstruction of the
"record".
(I will replay to dawn also regarding some limitations )

> do you distinguish between the definition and the implementation?
> The implementations of keys in the databases I've worked with
> have matched the definition precisely. Are you saying there
> is a problem with the implementation relative to the existing
> definition? If you're saying there's a problem with the definition,
> how does that mean there's a problem with the implementation?
>

> If (23, vin1), (24, vin1) and (25, vin1) identify one car, that
> says that vin identifies car.

vin1, also can determine the set S = {x: tE(x, vin1) = T}
where E(x, vin1) is the sentence: " the x is in the relation E with
the vin1"

>
> Your examples suggest that what you're trying to do is capture
> the history of changes to an entity. Is that your area of concern?

No. The idea is that we can identify one thing using our knowledge. So
there are many way to identify one thing. We can use our knowledge
instead of the identifier. For example, if you ask your friend to bring
your car from a parking lot then he will rather use his knowledge to
find your car than look for VIN. One person can be identified as father
of his son or as husband of his wife or a man who is living in that
house or by his name.
Set of CarKeys which are in a relation with one CarID, forms better
knowledge to identify one car.

dawn

nelasīta,

2005. gada 25. sept. 17:59:0125.09.05

uz

vldm10 wrote:
> Marshall Spight wrote:
> > I'm not clear what you're setting out to do, or what problems
>
> You can find this on my website.

I took a quick look. I like your focus on events triggering changes to
information. I'm still happy with 70's lingo talking about "master
data" "history data" and "transaction data" that adds, changes, or
deletes data in the master or history "files". (This is just a subset
of data, uh, types? ;-) but a significant subset)

> I added Events and the new way of construction of the entities and
> relationships to the Conceptual Model. The intention is that there is
> only one key in the Conceptual Model.

You give no rationale for that. The model I work with most closely
does what you suggest. I give an "after the fact" rational for why I
like that -- because the relation is more obviously a function of that
ID. In IBM's UniData product that is referred to by the non-standard
name of @ID. Every "file" has a unique @ID. But since it makes little
sense to gen a key for every relationship file, you can specify all
"dimensions" (if you have OLAP blood) for a relationship table in that
single key with delimiters. You can do this in XML too, by having a
parent tag of <ID> and then identify the components as <CarID> and
<ColorID> for example.

So, what is your rationale for choosing a single, often meaningless,
identifier for each relation rather than working with a more natural
candidate key in a relationship table?

> No the candidate keys, and no the
> compound keys. The same intention is in the Logical Model. The
> definition of key is also new.
> I included the knowledge which is related to data in this Data Model.

I don't like the way you confuse (IMO) entities with entities in a
particular state. State properties are properties like others. You
can track my mood, but if you have a Person relation and you include my
mood in there, then you are going to duplicate my name in every one of
those Person rows, one per mood over time? That's a lot of duplicate
data.

Now you decide that you want to also track the colors I'm wearing over
time? Now you have multvalued state information that would need to be
somehow combined with the mood information so you can ask questions
such as "Is there a correlation between the colors worn and the mood?"
Would you have cartesian cross-product with the person-mood, the
person-attire and the attire-colors all modeled in one relation?

> > you're setting out to solve. Often these kinds of discussions
> > are best begun with a specific problem statement. Just saying
> > that what you're doing is "more like the Real World(tm)" is
> > not a useful statement; all modelling is an approximation of
> > reality; the question is, what is it about reality that we
> > want to model? Studying the definition of "abstraction" may
> > prove useful.
> >
> > Can you be specific about what these limitations are? And why
>
> The limitations are basically related to above mention. Let me give you
> one example.
> Let Key for a relation be a compound key which has two attributes. If
> you want to build TransRelational Model then you should "split"
> your compound key. This can complicate the reconstruction of the
> "record".
> (I will replay to dawn also regarding some limitations )

I don't track with this. I can't recall what the TransRelational model
is right now to understand the relevance -- remind me if you can. What
do you mean by splitting your compound key?

> > do you distinguish between the definition and the implementation?
> > The implementations of keys in the databases I've worked with
> > have matched the definition precisely. Are you saying there
> > is a problem with the implementation relative to the existing
> > definition? If you're saying there's a problem with the definition,
> > how does that mean there's a problem with the implementation?
> >
> > If (23, vin1), (24, vin1) and (25, vin1) identify one car, that
> > says that vin identifies car.
>
> vin1, also can determine the set S = {x: tE(x, vin1) = T}
> where E(x, vin1) is the sentence: " the x is in the relation E with
> the vin1"
>
> >
> > Your examples suggest that what you're trying to do is capture
> > the history of changes to an entity. Is that your area of concern?
>
> No. The idea is that we can identify one thing using our knowledge. So
> there are many way to identify one thing. We can use our knowledge
> instead of the identifier. For example, if you ask your friend to bring
> your car from a parking lot then he will rather use his knowledge to
> find your car than look for VIN.

No doubt. You can model that with a SELECT statement, right?

> One person can be identified as father
> of his son or as husband of his wife or a man who is living in that
> house or by his name.
> Set of CarKeys which are in a relation with one CarID, forms better
> knowledge to identify one car.

It almost sounds like you are modeling views of the data for retrieval
which is a good thing to do, but not quite the same as modeling for
quality transaction processing, for example. If you are modeling
read-only data that would be different from data that will be
maintained. If I change my name, you don't want to zip through all my
attrire color mood rows to make (or neglect) that change. That is the
reason for "master data" including modeling entities and current
attributes/properties -- change it in one place and now we can ask
questions about moods using my new name in the search criteria.

>
> > Are you familiar with any of the current approaches? What
> > deficiencies do you identify with them that your model overcomes?

I, too, am interested in what problems or opportunities you are
addressing. It isn't yet clear to me. --dawn

> >
> > Marshall

JOG

nelasīta,

2005. gada 25. sept. 18:39:3125.09.05

uz

Very interesting post, as I think the effect you have produced is
simliar to transclusion. Consider that a car cannot simultaneously be
silver and blue. Let us also agree, that despite its current colour the
buick vin1 is the same car throughout its life, despite its changing
characteristics.

So now you have an interesting problem - you want to record the same
entity in a set multiple times (it has the same carID right, and this
is its primary key?), but the definition of a set precludes this. Yet
your overall aim seems reasonable. Something, somewhere is awry.

Philosophically, according to the constructivist approach anyhow, the
answer lies in the fact that the car now and after its new paint job,
are different items (this seems counterintuitive at first) - yet two
items obviously extremely closely related to each other. From this
viewpoint an object is not defined by some magical, universal
fingerprint, for there is none. Rather it can be decomposed
(or deconstructed) into a combination of:

(1) an entity, representing the concept of the car, which is
irreducible (a chosen, human artifice).
(2) this instance's relations with other items in the world.

What is chosen as irreducible for (1) and at what level you are
deconstructing (2) is up to what level of detail is useful for you and
your users - for a driver, colour, interior style, top speed, etc, seem
fine, a mechanic might deconstruct down to engine specs and lengths of
nuts and bolts, a physicist down its atoms, and so on ad infinitum.

Ok, the point of this is not to try and be clever or existential, but
rather because there is a practical result to this line of thought:

Your car now and a year ago are the same item (1 is the same), however
its defining relations to the world (2) have changed. It is in one
sense the same item, yet in another sense a different item, as it is
existing in a different context. In computer science this is analagous
to Ted Nelson's concept of transclusion, and is a potentially powerful
mechanism.

This concept however does not fit particularly neatly into RDBMS as,
because of the constructivist approach, a car entity has no physical
representation - and so no value for relational calculus to reference
against. However in RM, giving the entity item an artificial key and
you have an effective fudge, doing the job.

In your example this car entity is represented by its (carID and Make),
but its instance is represented by its (CarKey).

J. Goulding

(This sort of approach is in early stages of development at University
of Nottingham, UK, but I'd be interested to hear any critiques -
throughout, the use of the word instance is obviously distinct from any
programming connotations of the term. For constructivism see
http://en.wikipedia.org/wiki/Constructivist_epistemology and for
transclusion see google)

Marshall Spight

nelasīta,

2005. gada 25. sept. 19:23:2625.09.05

uz

vldm10 wrote:
> Marshall Spight wrote:
> > I'm not clear what you're setting out to do, or what problems
>
> You can find this on my website.

I didn't find that to be so. There's a small section near the
bottom that says "Solutions for two significant problems" but
I didn't find any clear problem statement in there. The closest
was "recognize who created the data and how it was created"
but it's not at all clear what that means. I haven't seen
anything in your examples that tracks who created the data.

> I added Events and the new way of construction of the entities and
> relationships to the Conceptual Model. The intention is that there is
> only one key in the Conceptual Model. No the candidate keys, and no the
> compound keys. The same intention is in the Logical Model. The
> definition of key is also new.
> I included the knowledge which is related to data in this Data Model.

Okay. So you're starting with the relational model and restricting
it by disallowing multiple keys or compound keys. That's already
not a good sign. You've significantly limited the expressiveness
of the model, and you haven't explained what you feel you've
gained in return.

> > you're setting out to solve. Often these kinds of discussions
> > are best begun with a specific problem statement. Just saying
> > that what you're doing is "more like the Real World(tm)" is
> > not a useful statement; all modelling is an approximation of
> > reality; the question is, what is it about reality that we
> > want to model? Studying the definition of "abstraction" may
> > prove useful.
> >
> > Can you be specific about what these limitations are? And why
>
> The limitations are basically related to above mention. Let me give you
> one example.
> Let Key for a relation be a compound key which has two attributes. If
> you want to build TransRelational Model then you should "split"
> your compound key. This can complicate the reconstruction of the
> "record".
> (I will replay to dawn also regarding some limitations )

I would not say this qualifies as specific problem statement.
You've also introduced a non-standard term ("TransRelational
Model") without any definition or references. The only
transrelational model that I know of is one occasionally
referenced on dbdebunk.com, and as I understand it, it's
purely an implementation technique, so I have no idea why
you'd be discussing it at the logical level.

> > do you distinguish between the definition and the implementation?
> > The implementations of keys in the databases I've worked with
> > have matched the definition precisely. Are you saying there
> > is a problem with the implementation relative to the existing
> > definition? If you're saying there's a problem with the definition,
> > how does that mean there's a problem with the implementation?
> >
> > If (23, vin1), (24, vin1) and (25, vin1) identify one car, that
> > says that vin identifies car.
>
> vin1, also can determine the set S = {x: tE(x, vin1) = T}
> where E(x, vin1) is the sentence: " the x is in the relation E with
> the vin1"

I don't understand this notation at all. What does the : mean?
What is T? I also note that you didn't answer any of my questions
in the earlier paragraph.

> > Your examples suggest that what you're trying to do is capture
> > the history of changes to an entity. Is that your area of concern?
>
> No. The idea is that we can identify one thing using our knowledge.

Okay. So you're saying that your area of concern is that we can
identify one thing using our knowledge? I'm not sure what that
means. Again, a specific problem statement would help.

> So
> there are many way to identify one thing. We can use our knowledge
> instead of the identifier.

SQL already does quite well at this. How do you propose to make
it better? Alternatively, what aspects of its current capabilities
do you hope to improve?

> For example, if you ask your friend to bring
> your car from a parking lot then he will rather use his knowledge to
> find your car than look for VIN. One person can be identified as father
> of his son or as husband of his wife or a man who is living in that
> house or by his name.

I don't find this kind of analogy very useful. Generally with
computers we are trying to solve algorithmic problems, not find
cars in parking lots. I don't see the commonality.

> Set of CarKeys which are in a relation with one CarID, forms better
> knowledge to identify one car.
>
> > Are you familiar with any of the current approaches? What
> > deficiencies do you identify with them that your model overcomes?

I'm assuming that since you didn't answer this, you're *not*
familiar with any of the current approaches.

The general impression I'm getting of you is of someone who is
smart and fairly creative and also lacking in any real-world
experience. It might be useful to develop your ideas if you
got a job in data management, and thereby collected a set of
*really* Real-World problems-- that is, problems that actually
come up in business or science. In general, recording different
colors of car paint or finding cars in parking lots do not
fall into the category of hard problems for computer science;
but there are plenty of other problems that *are* hard that
a smart and creative person could really help with. But you
absolutely need the experience first, and also the education
to know what else has been tried. Otherwise you'll just spend
years rehashing what everyone else has already thought of.

Good luck!

Marshall

vldm10

nelasīta,

2005. gada 25. sept. 21:48:0025.09.05

uz

>
> If you are using the RM as typically taught rather than the RM that
> includes nested relations, then along with a PaintColor table, you
> would have a relationship table between PaintColor and Car, perhaps
> named CarColor. That table could be seen as an event table with
> paintDate and colorReplacedDate or paintDate and paintStatus (a date
> range or a combination of date and status).
>
> So the Car table includes facts about Cars and the relationship table
> CarColor includes facts about the car colors. The relationship table
> has a candidate key that consists at least of the Car ID and the Color
> ID.

If one car changed color (or any attribute) and had same color
(attribute value) in two different periods this key doesn't work. You
need to add date or some status or anithing else.

>If you want it to be more activity/event based about when the car
> was painted, then it would also include a starting date. Otherwise a
> compound candidate key of car id and color id might work. Make sense?
>
> > 2. Let SSN be the primary key for the table Person. How we can enter
> > information in the
> > table Person that Mary Adams changed her name again to Mary Jones
> > - she got divorced.
> > These two questions show one limitation in the RM.
>
> This is pretty much the same question, so you can use a similar
> approach. This does not show a limitation in the RM, although we might
> way that it shows a human-computer-interaction issue. --dawn

This is the way how people usually practically work, but there are some
things which are not defined in the theory. In the RM and the ER
Conceptual Model there are no events. You mentioned Event table, but
what is event. We know that event is one of the most general concepts
in the Real World.
The second think, that we use often is the date or the time in the
construction of an entity and a relationship. Can a date be an
attribute of an entity? Can for example a date be an attribute of some
person? I don't believe. Not to mention that time or date for some
attribute's value can has three different values in the Real World,
the Conceptual Model and the Logical Model respectively. I defined a
date as a part of a knowledge related to some data, so it is not an
attribute. Now I have a problem with the assumption that date is not
attribute, I can't use date as key (or as a part of the key) because
in the key definition (for RM) states that the key is an attribute or
set of the attributes. It is similarly with things like a status, an
indicator, etc. I think that these are limitation in the RM. I also
think that the Relational Model is the best database model, but there
are things which are not cavered with the theory. (Of course this is
just one opinion)

Vladimir Odrljin

vldm10

nelasīta,

2005. gada 25. sept. 22:18:0525.09.05

uz

JOG wrote:
> Very interesting post, as I think the effect you have produced is
> simliar to transclusion. Consider that a car cannot simultaneously be
> silver and blue. Let us also agree, that despite its current colour the
> buick vin1 is the same car throughout its life, despite its changing
> characteristics.
>
> So now you have an interesting problem - you want to record the same
> entity in a set multiple times (it has the same carID right, and this
> is its primary key?),

The primary key is the CarKey attribute. CarKey and CarID together
provide a things related to the Real World, CarKey alone is related to
the database i.e. CarKey is the Primary Key or simple Key.
So CarKey and CarID together provide identification, meanig and some
knowledge about the Real World entity's (or relationship's) instance.

Vladimir Odrljin

dawn

nelasīta,

2005. gada 26. sept. 10:06:1926.09.05

uz

vldm10 wrote:
> >
> > If you are using the RM as typically taught rather than the RM that
> > includes nested relations, then along with a PaintColor table, you
> > would have a relationship table between PaintColor and Car, perhaps
> > named CarColor. That table could be seen as an event table with
> > paintDate and colorReplacedDate or paintDate and paintStatus (a date
> > range or a combination of date and status).
> >
> > So the Car table includes facts about Cars and the relationship table
> > CarColor includes facts about the car colors. The relationship table
> > has a candidate key that consists at least of the Car ID and the Color
> > ID.
>
> If one car changed color (or any attribute) and had same color
> (attribute value) in two different periods this key doesn't work.

I said the key has AT LEAST these two.

> You
> need to add date or some status or anithing else.

I indicated that above with the date range or combination of date &
status.

> >If you want it to be more activity/event based about when the car
> > was painted, then it would also include a starting date. Otherwise a
> > compound candidate key of car id and color id might work. Make sense?
> >
> > > 2. Let SSN be the primary key for the table Person. How we can enter
> > > information in the
> > > table Person that Mary Adams changed her name again to Mary Jones
> > > - she got divorced.
> > > These two questions show one limitation in the RM.
> >
> > This is pretty much the same question, so you can use a similar
> > approach. This does not show a limitation in the RM, although we might
> > way that it shows a human-computer-interaction issue. --dawn
>
>
> This is the way how people usually practically work, but there are some
> things which are not defined in the theory. In the RM and the ER
> Conceptual Model there are no events.

I agree it is useful to identify what prompts changes to the database
"facts" and when. In the RM nothing is identified as a "master file"
or a "code file" or a "history file" or a "transaction file" as was
common when designing indexed sequential files in years gone by. That
is not so much a gap in the theory as something a relational theorist
might consider "orthogonal" to the RM.

However, the extensions that Date & company are developing (of which I
am ignorant) to address time in a different way suggest (again, I'm
guessing) that they have identified a reason to think not only in terms
of a set of facts as they stand, but facts as they change over time.
So I will agree with you that the identification of events and change
over time is a good thing. I haven't found your solution to be
enticing, however.

> You mentioned Event table, but
> what is event. We know that event is one of the most general concepts
> in the Real World.
> The second think, that we use often is the date or the time in the
> construction of an entity and a relationship. Can a date be an
> attribute of an entity? Can for example a date be an attribute of some
> person?

A date can be an attribute of a "transaction" or event. I agree this
is important and not handled well today. Time & date "stamping" of
data (e.g. startingDate, endingDate, deceasedDate) have handled the
most basic needs, but these are often properties of attribute values
(ColorStatusChangeDate) or weak entities, not properties of strong
entities.

> I don't believe. Not to mention that time or date for some
> attribute's value can has three different values in the Real World,
> the Conceptual Model and the Logical Model respectively. I defined a
> date as a part of a knowledge related to some data, so it is not an
> attribute.

OK, so you have a means of getting at the properties of attributes?
You have a hierarchy of entity > attribute > property? What does a
query look like? XML has entities & attributes with nesting that often
causes us to look at RM entities as container entities and attributes
as child entities, then use the xml attributes of an xml entity for
these properties. That can be useful, but it often results in a
confusion between sub-entities & attributes and makes for a language
like XQuery (not exactly lean & mean).

> Now I have a problem with the assumption that date is not
> attribute, I can't use date as key (or as a part of the key) because
> in the key definition (for RM) states that the key is an attribute or
> set of the attributes. It is similarly with things like a status, an
> indicator, etc. I think that these are limitation in the RM.

OK, I'll buy that -- it is a limitation of the RM that you cannot model
data with entities, properties of entities, properties of values of
properties, properties of properties of values of properties, ... using
such a hierarchy of values.

But simplifying this complexity into entities & attributes is also part
of the charm, allowing for simplicity in querying, sort-a, kind-a. We
could simply further so there is only one "thing" (entity, for example)
and relationships between this thing and other things. Some products
do this. Both too much simplification and too little simplification
cause problems.

> I also
> think that the Relational Model is the best database model,

I don't, but I do think it is the best laid out and best taught.

> but there
> are things which are not cavered with the theory. (Of course this is
> just one opinion)

I'm trying to get us to move beyond the RM myself, so I want to agree
with your questions & answers, but it isn't resonating yet. In
particular, I really don't like the confusion between modeling an
entity and modeling the relationship between that entity and some
property of that entity. Keep trying and maybe it will click. Cheers!
--dawn

> Vladimir Odrljin

JOG

nelasīta,

2005. gada 26. sept. 12:44:0326.09.05

uz

Dawn,

I think the problems that are encountered with the RM are to do with
the fact that it's fundamental principle that we should use values to
define something (as opposed to an identifier) is flawed. Its analagous
to the movement called realism, which has its roots as far back as
Plato. Specifically a sub-philosophy called essentialism:

"Essentialism is the belief and practice centered around a
philosophical claim that for any specific kind of entity it is at least
theoretically possible to specify a finite list of characteristics, all
of which any entity must have to belong to the group defined."

Well this is exactly what the RM does - an entity's form defines it,
and thats why in principal you should be using natural keys, and using
any other abritrary id's is a bit of a fudge within RM.

*However* in over the years this essentialist viewpoint has been pretty
much anhiallated, and nowadays seems extremely antiquated. Rather a
stronger belief now is nominalism:

"Nominalism is the position in metaphysics that there exist no
universals outside of the mind."

This would mean there is no natural primary key can produce some
universal fingerprint from which to describe something - yet that's
what RM's want you to do. And so to fix this incorrect viewpoints in
RM, people bang in surrogate keys to represent the artificial uniqeness
of an item that we necessarily create in our minds.

JOG

nelasīta,

2005. gada 26. sept. 12:48:5526.09.05

uz

Ok, to describe that better with an example:

Consider you have a car and over 20 years you gradually bit by bit
change every single part of it. The paint job, the engine, the
interior, the chassis... everything, bit by bit. After 20 years it
doesn't have a single component that was originally in it. It is still
the SAME car?

If yes, then you might agree that the RM model is insufficient and
breaks down. If no, then at what point did it become a different car?
When the first component changed, when the last component changed, when
it was 50% changed? There is absolutely no satisfactory answer.

The implication is that to refer to it, every single tuple should have
an artificial key representing our concept of it.

paul c

nelasīta,

2005. gada 26. sept. 13:18:0926.09.05

uz

JOG wrote:
> Dawn,
>
> I think the problems that are encountered with the RM are to do with
> the fact that it's fundamental principle that we should use values to

> define something (as opposed to an identifier) is flawed. ...

i think the prominent writers about the RM have never said this, it's
more like they have said that relations are expressed only through
values. nor do i believe they have ever said that an 'identifier'
somehow defines "something" rather it only identifies it. any
definition of what is real or what is not real is outside the RM as far
as i know.

cheers,
paul c

JOG

nelasīta,

2005. gada 26. sept. 14:03:1726.09.05

uz

You're right, my wording is sloppy (I don't mean identifier, I mean
reference, and when i say a tuple's values defines it in RM, I am
inferring this from its values being the only way to access and refer
to it). I also don't mean to be overly critical of the RM - after all
through the use of surrogate keys, you can achieve the desired effect.

I maintain the overall point though, in that any tuple (which after all
is representing a concept, the finite partial map, which in
mathematical notation would happily have a denotation) must have a
reference to it available, as its existence cannot be referred to by a
composite of its values alone (see the 20 year old car example above).

Yet any such reference is not an attribute such as the car's colour or
engine size are - it's a way of denoting a value-less concept,
representing the tuple as a whole. Given this is valid for any tuple,
the data model imo should be handling this for you, not forcing you to
crowbar a solution by fudging in user-generated artificial keys.

Gene Wirchenko

nelasīta,

2005. gada 26. sept. 14:15:3526.09.05

uz

On 25 Sep 2005 18:48:00 -0700, "vldm10" <vld...@yahoo.com> wrote:

[snip]

>The second think, that we use often is the date or the time in the
>construction of an entity and a relationship. Can a date be an
>attribute of an entity? Can for example a date be an attribute of some
>person? I don't believe. Not to mention that time or date for some

Yes. The most common example is probably birthdate.

[snip]

Sincerely,

Gene Wirchenko

vldm10

nelasīta,

2005. gada 26. sept. 21:00:1026.09.05

uz

dawn wrote:
>
> So, what is your rationale for choosing a single, often meaningless,
> identifier for each relation rather than working with a more natural
> candidate key in a relationship table?

I do not generate only a key; it is also the attribute which is the
name of the relation's instance. In current RM, typically, you do not
have an attribute which is the name of the relation's instance, you
have only the key. The key can be compound - meaning a key can be a set
of the attributes; i.e. the key is not an attribute. So my solution is
the attribute and it can't be meaningless because attribute has
origins in Conceptual Model and meaning in the Real World. A key which
has number as value is annoying, but we always use some additional
information to get meaning in the Real World. It is same case with SSN,
VIN etc.
If we have the Car table with the cars which attributes are all the
time same, it is okay to use existing RM solutions. However the RM's
definition of the key can't support cases which we discussed. In
these cases we need maybe to split the table. This involves a FK and
the data integrity or we need some additional "fields" which are
not the attributes, or we need some additional attributes. If an
entity has an attribute which repeats its value then I prefer my
solution. Relationships repeat their values more frequently then the
entities.
Finally current implementation of the Relational Model causes some
problems.
The TransRelational™ Model is what the database world is waiting. It
provides a completely new approach to implementation. In this Model
columns are stored separately. But the key definition in the RM should
be changed. The logic of the compound key is directly opposite to the
logic of this model.
In 2.4 under 2. on my website I set more general representation of
storing the columns separately. An example 2.5 shows how knowledge can
be represented regarding one data (column Amount) and using the key
which is related to the state.

Vladimir Odrljin

JOG

nelasīta,

2005. gada 26. sept. 21:39:4126.09.05

uz

It boils down to this. artificial keys are a workaround. No data model
should rely on the user having to fudge identifiers in to get the job
done (Though I understand that Codd was a pragmatist, and his
recommendation of surrogate keys is to get the system working)

The real world is stating a predicate such as:

"There is a car that is a buick, blue and built in 2000."

NOT

"There is a car that is a buick, blue and built in 2000 with an
invisible universal id of 24."

These key's are fudges, hacks, and indicate something lacking in a
model, which should natively be able to handle what you are trying to
do.

jog
(I believe I may have used the word fudge too many times)

Marshall Spight

nelasīta,

2005. gada 26. sept. 22:37:2626.09.05

uz

JOG wrote:
>
> The real world is stating a predicate such as:
>
> "There is a car that is a buick, blue and built in 2000."
>
> NOT
>
> "There is a car that is a buick, blue and built in 2000 with an
> invisible universal id of 24."

What's a VIN, then? Are we all imagining them? Are they somehow
outside of the real world? Is your bank account id somehow
Not Of This Earth? If you bank account didn't have an id,
would it be enough to identify it just by the balance?
"There is a bank account that is a checking account, with
$1,000,000.00 in it." Could I then authenticate myself to
the bank by saying, yeah, I'm the guy with the million
dollars in the bank. If you forgot your balance, would
that mean you lost access to your money?

Marshall

vldm10

nelasīta,

2005. gada 27. sept. 00:18:3127.09.05

uz

If you mean that birth date is exactly one date when a person began to
live, then It doesn't seem to me that it is the attribute of the
person. The Age of the person is the attribute of the person and is
derived from birth date and todays data

Vladimir Odrljin

Marshall Spight

nelasīta,

2005. gada 27. sept. 01:36:4727.09.05

uz

How can you derive age if birthdate isn't an attribute of the
person? Where would you get the birthdate from?

Marshall

x

nelasīta,

2005. gada 27. sept. 02:32:5527.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1127753335....@f14g2000cwb.googlegroups.com...

> Ok, to describe that better with an example:

> Consider you have a car and over 20 years you gradually bit by bit
> change every single part of it. The paint job, the engine, the
> interior, the chassis... everything, bit by bit. After 20 years it
> doesn't have a single component that was originally in it. It is still
> the SAME car?

> If yes, then you might agree that the RM model is insufficient and
> breaks down.

I would hope it becomed a better car :-)

> If no, then at what point did it become a different car?
> When the first component changed, when the last component changed, when
> it was 50% changed? There is absolutely no satisfactory answer.

When a car is moved. :-)

> The implication is that to refer to it, every single tuple should have
> an artificial key representing our concept of it.

Our concept is artificial :-)
A car is artificial. :-)

x

nelasīta,

2005. gada 27. sept. 03:03:5127.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1127753335....@f14g2000cwb.googlegroups.com...

> Ok, to describe that better with an example:
>
> Consider you have a car and over 20 years you gradually bit by bit
> change every single part of it. The paint job, the engine, the
> interior, the chassis... everything, bit by bit. After 20 years it
> doesn't have a single component that was originally in it. It is still
> the SAME car?

Remember those Western movies ?
Those cowboys and their cattle ?
How do they distingush their herd from those of their neighbors ?

x

nelasīta,

2005. gada 27. sept. 04:34:3727.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1127753335....@f14g2000cwb.googlegroups.com...

> Ok, to describe that better with an example:

> Consider you have a car and over 20 years you gradually bit by bit
> change every single part of it. The paint job, the engine, the
> interior, the chassis... everything, bit by bit. After 20 years it
> doesn't have a single component that was originally in it. It is still
> the SAME car?

What are the attributes of this car ?
Certainly not the paint, the engine, the chassis, the chairs, the wheels,
etc.
Could the car be represented as an atomic value, a relation, a database or
what ?

David Cressey

nelasīta,

2005. gada 27. sept. 08:22:5927.09.05

uz

"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1127799407.7...@g49g2000cwa.googlegroups.com...

> How can you derive age if birthdate isn't an attribute of the
> person? Where would you get the birthdate from?

The decomposition of the universe into entities and attributes is somewhat
subjective. (See the discussion of "nominalism" elsewhere in this thread).

The modeling of attributes is somewhat situation dependent, much as we would
like to pretend that it is not.

If I'm creating a data model for a personnel system, I'm going to model
"date of birth" as an attribute of a person.

If I'm creating a data model for a maternity ward, I'm going to model "date
of birth" as an attribute of a birth.
The person(s) born would inherit the attribute due to their relationship to
the birth.

It's a different model for a different situation.

David Cressey

nelasīta,

2005. gada 27. sept. 08:28:2427.09.05

uz

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1127788646.6...@g44g2000cwa.googlegroups.com...

> What's a VIN, then? Are we all imagining them? Are they somehow
> outside of the real world? Is your bank account id somehow
> Not Of This Earth? If you bank account didn't have an id,
> would it be enough to identify it just by the balance?
> "There is a bank account that is a checking account, with
> $1,000,000.00 in it." Could I then authenticate myself to
> the bank by saying, yeah, I'm the guy with the million
> dollars in the bank. If you forgot your balance, would
> that mean you lost access to your money?

A VIN, a bank account number, and an SSN are all surrogate keys. We tend
not to recognize them as such because they are created and managed by an
agency outside the control of the system we are building or designing. But
they are just as artificial as PersonId, created by our system when a "new"
person is introduced to the system.

x

nelasīta,

2005. gada 27. sept. 10:02:1727.09.05

uz

"David Cressey" <david....@earthlink.net> wrote in message
news:DQa_e.4897$QE1....@newsread2.news.atl.earthlink.net...

> If I'm creating a data model for a personnel system, I'm going to model
> "date of birth" as an attribute of a person.

> If I'm creating a data model for a maternity ward, I'm going to model
"date
> of birth" as an attribute of a birth.
> The person(s) born would inherit the attribute due to their relationship
to
> the birth.

Then, after the translation of this model to the relational model, the date
become the attribute of some relation (not of some entity) and the problem
"which is the entity of this attribute ?" is solved.

Marshall Spight

nelasīta,

2005. gada 27. sept. 10:47:3227.09.05

uz

Agreed on all counts.

What I was tring to get at was the contradiction between
saying "birthdate is not a attribute of person" and
"age is a derived attribute of person from birthdate
and now()." But I fully agree that "The modeling of
attributes is somewhat situation dependent."

Marshall

Marshall Spight

nelasīta,

2005. gada 27. sept. 10:53:1827.09.05

uz

Definitely! But artificial isn't the same as "not part of the real
world." The claim was made that the real world didn't contain
predicates that included artificial keys. So I guess that
means VIN, bank account number, and SSN are supernatural! Spooky!

The car itself is also artificial, but we don't say it's
not part of the real world.

Marshall

JOG

nelasīta,

2005. gada 27. sept. 11:45:4327.09.05

uz

>> " The claim was made that the real world didn't contain
predicates that included artificial keys. So I guess that
means VIN, bank account number, and SSN are supernatural! Spooky!"

Marshall,

You have rather misunderstood. Noone has made any such claim the real
world doesn't contain artificial identifiers such as VIN's and SSN's.

The claim was that if you are modelling the real world you should not
have to add new imaginary keys, you're encoded predicate pretending as
though they were properties of the real world statement, just for the
database to work. It's a hack.

Marshall Spight

nelasīta,

2005. gada 27. sept. 12:57:5927.09.05

uz

JOG wrote:
> > " The claim was made that the real world didn't contain
> > predicates that included artificial keys. So I guess that
> > means VIN, bank account number, and SSN are supernatural! Spooky!"
>
> Marshall,
>
> You have rather misunderstood. Noone has made any such claim the real
> world doesn't contain artificial identifiers such as VIN's and SSN's.

You said "The real world is stating a predicate such as [...]
NOT 'There is a car [...] with an invisible universal id of 24."
I took your "invisible universal id" to be a reference to
artificial identifiers. If you're saying artificial identifiers
are okay but invisible universal ids are not, I'm not clear
what the distinction is.

> The claim was that if you are modelling the real world you should not
> have to add new imaginary keys, you're encoded predicate pretending as
> though they were properties of the real world statement, just for the
> database to work. It's a hack.

When you say it's a hack, do you mean that there are problems with
the appoach, and that there are better ways to do it that don't
have those problems? Can you state what the problems are, and
what the better approach is?

Marshall

JOG

nelasīta,

2005. gada 27. sept. 13:21:5327.09.05

uz

I think it comes down to the fact, it is easy to confuse a predicate
with the situation it describes, but there is a real difference.
Consider the predicate:

"The sky is blue in the daytime"

This has may be represented a a set of three items {sky, blue,
daytime}. But we can say some extra things about this set. First who
stated it - me. Second when it was stated. Third that it is the first
thing I have said on the matter. Fourth, we can comment on the truth of
the statement - that it, in england, is only true.. say 25% of the
time. tops.

Now these are all pieces of information about the PREDICATE itself, not
the situation it describes. They are metadata, and encoding them with
the information about the situation is entirely incorrect - they are
attributes of the container, not of the content.

Mathematically we might write:

P = {"sky", "blue", "daytime"}
Author = {(P, "James")}
Created = {(P, "27th Sept 2005")}
etc...

Saying { sky, blue, daytime, James, "27th Sept 2005", 25% } is wholly
wrong. It is a confusion of data and meta data.

A surrogate key, created specifically to represent a statement, is no
different. It is an artificial way of referencing the predicate. This
is meta_data about the predicate and has no place existing in the same
area as the elements of the predicate - the real information, that
exists in the real world.

vc

nelasīta,

2005. gada 27. sept. 13:42:1827.09.05

uz

JOG wrote:
> I think it comes down to the fact, it is easy to confuse a predicate
> with the situation it describes, but there is a real difference.
> Consider the predicate:
>
> "The sky is blue in the daytime"
>
> This has may be represented a a set of three items {sky, blue,
> daytime}.

I am sorry to say but you are confused. 'The sky is blue in the
daytime' ain't no predicate. It's an [ambiguous] proposition which
could be false or true if it were not ambiguous. It may be useful to
know that the predicate's interpretation is not just any set, but a
mathematical relation. What relation are talking about ?

>But we can say some extra things about this set. First who
> stated it - me. Second when it was stated. Third that it is the first
> thing I have said on the matter. Fourth, we can comment on the truth of
> the statement - that it, in england, is only true.. say 25% of the
> time. tops.

I am not sure what you are trying to say here.

>
> Now these are all pieces of information about the PREDICATE itself, not
> the situation it describes. They are metadata, and encoding them with
> the information about the situation is entirely incorrect - they are
> attributes of the container, not of the content.
>
> Mathematically we might write:
>
> P = {"sky", "blue", "daytime"}
> Author = {(P, "James")}
> Created = {(P, "27th Sept 2005")}
> etc...
>
> Saying { sky, blue, daytime, James, "27th Sept 2005", 25% } is wholly
> wrong. It is a confusion of data and meta data.

That's a confusion allright, but of a different kind.

>
> A surrogate key, created specifically to represent a statement, is no
> different. It is an artificial way of referencing the predicate. This
> is meta_data about the predicate and has no place existing in the same
> area as the elements of the predicate - the real information, that
> exists in the real world.

This just does not make any sense.

paul c

nelasīta,

2005. gada 27. sept. 14:41:4827.09.05

uz

philosophically, i'm likely in over my head but this issue has
interested me for some years as i think there are important practical
consequences to it. to me, as far as db or maybe i should say the RM
rather than db as that is my interest, whether some value is metadata
seems relative to the way it is used. when all is said and done, whether
it is or isn't is in the hands of the beholder.

not to be flip, "{ sky, blue, daytime, James, "27th Sept 2005", 25% }"
*does* exist in the real world, because i saw it in JOG's post! i see no
problem if some user understands the two rightmost values as metadata,
but the db needn't be so precise and can still be useful and precise in
its own realm.

talking only as far as the RM is concerned, i believe that all
metaphysical "meaning" is intended to be external to the db's operation.
(although i have heard there was some controversy when Dr. Codd seemed
to abandon this position in his 1979 paper). i think this would mean
that so as far as as the RM is concerned, the quote could equally mean
"the daytime sky above James was 25% blue on Sept 27th" as well as lots
of other things such as "James gets it right 25% of the time", or even
"the sky *wasn't* blue, etc.". from the RM's perspective, or at least
the earlier descriptions of the RM, none of that matters, only the RM's
internal logic in manipulating the values involved.

you may well be speaking of things which go beyond the RM and i mean no
criticism of that. just thought i'd give my practical db interpretation
of what you say.

maybe i'm saying the same thing as x from not-exists.org when he or she
said "Then, after the translation of this model to the relational model,

the date become the attribute of some relation (not of some entity) and

the problem "which is the entity of this attribute ?" is solved.". to
me, this is part of the potential for db - to be able to transfigure
such a predicate and combine it with others in some internally
consistent way *without* getting tied up knots about the meaning of reality!

cheers,
paul c.

vldm10

nelasīta,

2005. gada 27. sept. 15:21:2027.09.05

uz

This answer needs "longer" explanation. I defined knowledge in
section 2. As knowledge is the monster concept, I limited knowledge
to data which are basically in database. Second, I have concentrated on
data which are in facts attributes values.
Idea was to gradually built knowledge. Incorporate the "attribute"
knowledge in the entity knowledge etc. Almost all the data in the
database are the attribute level. My key which is the name of the
relation's instance and the entity's instance (Ack and Ark
attributes) is the entity level in the database. Gene's message about
the birth date is tricky because it is knowledge about entity. It is
not about an entity's attribute. It is about when the person starts
to be the person.
The building of knowledge begins at the attributes because they are
"atoms". The atoms do not have the parts. But there are some facts
about the atoms. For example: "This atom was created on 1 September
2005." or " John put this atom there", etc. Let mi denote these
facts by Fi ( i = 1,...,n). Is it possible to work only with the
facts? For example, can we subtract f (Fi) - g (Fj) where Fi is the
fact: "Mery was born on 22/8/1950" and Fj is "This is today date
which is calculated by the system's procedure" and f is mapping
which somehow assign a number1 and g is mapping which somehow assign a
number2 and number2 - number1 = age and the age is the attribute of the
corresponding person. I didn't write about this in my text, but I
believe that there are a lot of interesting things there.
I wrote that the age is derived from the birth date because knowledge
about the data that I tried to define can make a bridge (or a
connection) between a data and many other things. If there is no
knowledge about one person birth date then we will never know when
he/she was born and how old is this person.
Let me explain this on one example. Fifteen years ago I made a very
complicated DB project where I tried to store columns separately, to
represent some level of knowledge and some other things (BTW, I have
some experience :-)) It was working so good that I couldn't believe.
Knowledge representation helps me tremendously. It connected the
database, the systems knowledge, and knowledge of people who were out
of IT department. www.dbdesign10.com example 2.5 is an example similar
to technique which was applied in that project. So the example 2.5 is
very close to the real life project. In this example an operator can
enter 5 (or 10) same rows (except the primary key which is always
unique), nothing stops him, i.e. operator can cheats, or enter what he
wants, there are no security things (it is supposed that the operator
has a data entry screen). The goal is that database can solve who made
the fault (or cheats), where is fault, by which procedure etc. So the
database should have kind of "immune system", i.e. to recognize
some things. Here there is only one attribute. But knowledge added
here connects many "small worlds" which are related to this one
attribute. So there are the things which connects different things but
this is I believe another monster subject matter.

Vladimir Odrljin

-CELKO-

nelasīta,

2005. gada 27. sept. 20:14:2827.09.05

uz

>> A VIN, a bank account number, and an SSN are all surrogate keys. <<

No; read Codd's definiiton of a surrogate key. These are all
industry-standard, externally verifiable keys with known validation
rules. Honking BIG difference!! The big part of this is that they
are EXTERNAL to the database.

When you use (longitude, latitude) for a location, is it also a
surrogate? If so, wouldn't every key be a surrogate? I verify a
location with a GPS; I verify a VIN, a bank account number, and an SSN
by computers or phone calls. Just a different device ..

paul c

nelasīta,

2005. gada 27. sept. 21:37:0727.09.05

uz

-CELKO- wrote:
>>>A VIN, a bank account number, and an SSN are all surrogate keys. <<
>
>
> No; read Codd's definiiton of a surrogate key.

could you please tell me where this definiiton can be found?

thanks,
paul c.

JOG

nelasīta,

2005. gada 27. sept. 22:06:5427.09.05

uz

vc wrote:
>'The sky is blue in the daytime' ain't no predicate. It's an [ambiguous]
> proposition which could be false or true if it were not ambiguous.

Ok. There exists a meteorlogical observation where [the sky is blue in
the daytime]. That is the sentence (not a very good one i'll agree, but
a propositional sentence nonetheless), with the predicate in square
brackets.

> It may be useful to know that the predicate's interpretation is not just any
> set, but a mathematical relation. What relation are talking about ?

Totally right about the sloppy representation i used however. Let me
rectify that for the "meteorological_observations" table - in terms of
the finite partial maps that represent the rows of table, correctly
written the corresponding relation to define it extensionally:

P = { <feature: sky>, <colour: blue>, <period: daytime> }

The extra information I specified in the previous post however, is
absolutely not part of this statement about the world. Rather it is
metadata about P:

M = { <creator: James>, <created: 1127871055>, <statement: P> }

Now if one can't reference P through natural keys, or it is inefficient
processing-wise to do so, it is acceptable to use some sort of
surrogate. However this is also not part of our statement about the
world but rather a reference for P, so it should be an attribute in M.

Hope this makes more sense. All best, jog.

Anith Sen

nelasīta,

2005. gada 27. sept. 23:51:2727.09.05

uz

The idea behind surrogates is introduced not by Codd, but by Hall, Owlett &
Todd in their '74 paper "Relations and Entities"; Codd just incorporated in
to RM/T later in his 79 paper. Here is the relevant section of his ACM
paper:
http://www.scism.sbu.ac.uk/~rmkemp/codd1979.pdf

Before taking Joe's interpretation of Codd's suggestions, consider Date's
exposition of its inconsistencies at:
http://www.intelligententerprise.com/db_area/archives/1999/990106/online1.jhtml

It is interesting to note that, little importance has been given to notion
of surrogates in formal database studies. Also, standard bodies like ISO,
have never deemed surrogates to be worthy of a formal definition, due to its
negligible distinction from other logical identifiers. But it is often a hot
topic in newsgroups though.

For practical purposes, consider the following from Date's An Introduction
to Database Systems, 7th Ed. ( ISBN 0201385902 ), p. 444 has the
explanation:

[Quote]
Surrogate keys are keys in the usual relational sense but have the following
specific properties:

* They always involve exactly one attribute.

* Their values serve solely as surrogates ( hence the name ) for the
entities they stand for. In other words, such values serve merely to
represent the fact that the corresponding entities exists -- they carry no
additional information or meaning whatsoever.

* When a new entity is inserted into the database, it is given a surrogate
key value that has never been used before and will never be used again, even
if the entity in question is subsequently deleted.

Ideally, surrogate key values would be system-generated, but whether they
are system- or user-generated has nothing to do with the basic idea of
surrogate keys as such.

It is worth emphasizing that surrogates are not ( as some writers seem to
think ) the same thing as "tuple IDs." For one thing-to state the
obvious-tuple IDs identify tuples and surrogates identify entities, and
there is certainly nothing like a one-to-one correspondence between tuples
and entities ( think of tuple IDs for derived tuples in particular ).
Furthermore, tuple IDs have performance connotations, while surrogates do
not; access to a tuple via its tuple ID is usually assumed to be fast (we
are assuming here that tuples-at least, tuples in base relations-map fairly
directly to physical storage, as is in fact the case in most of today's
products). Also, tuple IDs are usually concealed from the user, while
surrogates must not be ( because of The Information Principle ); in other
words, it is not possible to store a tuple ID as an attribute value, while
it certainly is possible to store a surrogate as an attribute value.

In a nutshell: Surrogates are a logical concept; tuple IDs are a physical
concept.
[End Quote]

--
Anith

x

nelasīta,

2005. gada 28. sept. 04:24:2428.09.05

uz

"vldm10" <vld...@yahoo.com> wrote in message
news:1127848880.1...@g14g2000cwa.googlegroups.com...

> The goal is that database can solve who made
> the fault (or cheats), where is fault, by which procedure etc. So the
> database should have kind of "immune system", i.e. to recognize
> some things. Here there is only one attribute. But knowledge added
> here connects many "small worlds" which are related to this one
> attribute. So there are the things which connects different things but
> this is I believe another monster subject matter.

http://www.oracle.com/technology/deploy/availability/htdocs/Flashback_Overview.htm

David Cressey

nelasīta,

2005. gada 28. sept. 09:04:3528.09.05

uz

"Marshall Spight" <marshal...@gmail.com> wrote in message

news:1127832798.3...@o13g2000cwo.googlegroups.com...

> Definitely! But artificial isn't the same as "not part of the real
> world." The claim was made that the real world didn't contain
> predicates that included artificial keys. So I guess that
> means VIN, bank account number, and SSN are supernatural! Spooky!

Agreed.

But that raises another question: is the system we are building going to be
"part of the real world" once it is built?
My answer is a definite yes. But that means that 'PersonId' is no more and
no less artificial than 'PersonSSN'.
What may be a difference between these two data items is that PersonId will
be kept hidden from the users of the data, while SSN will not.

However, I've seen numerous systems, in the real world, where what was
originally envisioned as a hidden internal key ends up being shared with
special administrative users at first, and later with the general database
user community. So the principle of information hiding of surrogate keys
is, at best, fragile in practice.

The next question is whether a surrogate key and what we are pleased to call
a 'natural key' really identify the same thing or not. In many cases, there
is strong evidence that a surrogate key identifies a tuple, while a natural
key identifies a real world entity that the contents of the tuple describe.
If you'll go back to the origins of this thread, I think you'll see an
example of what I've just said.

If, however, you accept the idea that a surrogate key identifies a tuple,
and not a real world entity directly, then you have to come up with a whole
different response than yours or mine to dawn's claim that a foreign key
amounts to a logical pointer.

If the sole purpose of a foreign key is to act as a surrogate for the
address of the record that represents the tuple, then the argument that
it's really a pointer in disguise begins to carry more merit.

David Cressey

nelasīta,

2005. gada 28. sept. 09:13:5328.09.05

uz

"x" <x...@not-exists.org> wrote in message
news:dhbjde$tru$1...@domitilla.aioe.org...

> Then, after the translation of this model to the relational model, the
date
> become the attribute of some relation (not of some entity) and the problem
> "which is the entity of this attribute ?" is solved.

The problem is not so much solved as obviated. It's not quite the same
thing.

We can carry out a whole class of relational transformations on the data
that give the same result, and have the same validity, regardless of the
ultimate semantics of the data.

However if, at some future time, we get deeply involved in the meaning of
the data, sooner or later the question that's obviated by the relational
model will re-emerge. That's not to say that the RM doesn't achieve
something of value by abstracting this question out of the logical model.
It does.

Much of mathematics is about using the power of abstraction to good
advantage.

-CELKO-

nelasīta,

2005. gada 28. sept. 09:29:4228.09.05

uz

Dr. Codd: "..Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
E. (1979), Extending the database relational model to capture more
meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates. Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them....." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434

-CELKO-

nelasīta,

2005. gada 28. sept. 09:38:3628.09.05

uz

>> Consider you have a car and over 20 years you gradually bit by bit change every single part of it. The paint job, the engine, the interior, the chassis... everything, bit by bit. After 20 years it doesn't have a single component that was originally in it. It is still the SAME car? <<

This is the problem of King Somebody's Charriot from Budhist
literature. The auto industry solved it by saying that a car is the
engine block with the VIN number stamped into it. Everything else is
an add-on.

x

nelasīta,

2005. gada 28. sept. 09:57:3328.09.05

uz

"-CELKO-" <jcel...@earthlink.net> wrote in message
news:1127914182.4...@g47g2000cwa.googlegroups.com...

> Dr. Codd: "..Database users may cause the system to generate or delete
> a surrogate, but they have no control over its value, nor is its value
> ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
> E. (1979), Extending the database relational model to capture more
> meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.

> This means that a surrogate ought to act like an index; created by the
> user, managed by the system and NEVER seen by a user. That means never
> used in queries, DRI or anything else that a user does.

I understood that the VALUE of the SURROGATE cannot be seen and controlled,
not the SURROGATE. Are the VALUE and the SURROGATE the same thing ?

x

nelasīta,

2005. gada 28. sept. 10:06:1628.09.05

uz

"David Cressey" <david....@earthlink.net> wrote in message

news:lGw_e.4989$vw6....@newsread1.news.atl.earthlink.net...

When the question will re-emerge, we have tools to decide "which is the
relation of this attribute".

vc

nelasīta,

2005. gada 28. sept. 10:34:3728.09.05

uz

JOG wrote:
> vc wrote:
> >'The sky is blue in the daytime' ain't no predicate. It's an [ambiguous]
> > proposition which could be false or true if it were not ambiguous.
>
> Ok. There exists a meteorlogical observation where [the sky is blue in
> the daytime]. That is the sentence (not a very good one i'll agree, but
> a propositional sentence nonetheless), with the predicate in square
> brackets.

Still not good enough. In logic, a predicate is a statement whose
truth depends on the variable(s)ranging over some domains(s). E.g 'x
is_older_than y' is a predicate, but 'John is older than Jim' is a
proposition. You need to restructure your sentence some more in order
to make a predicate out of it.

>
> > It may be useful to know that the predicate's interpretation is not just any
> > set, but a mathematical relation. What relation are talking about ?
>
> Totally right about the sloppy representation i used however. Let me
> rectify that for the "meteorological_observations" table - in terms of
> the finite partial maps that represent the rows of table, correctly
> written the corresponding relation to define it extensionally:
>
>
> P = { <feature: sky>, <colour: blue>, <period: daytime> }

As I said above, this row/tuple is not a predicate, it's a
proposition.

>
>
> The extra information I specified in the previous post however, is
> absolutely not part of this statement about the world. Rather it is
> metadata about P:
>
>
> M = { <creator: James>, <created: 1127871055>, <statement: P> }
>

OK. All the talk about predicates aside, why the piece of information
above should be treated in a special way and called by the nebulous
word 'metadata' ? What is gained by this in comparison to treating the
additional attributes as part of the original entity (or being a
separate entity) ?

JOG

nelasīta,

2005. gada 28. sept. 12:47:0728.09.05

uz

vc wrote:
> JOG wrote:
> > vc wrote:
> > >'The sky is blue in the daytime' ain't no predicate. It's an [ambiguous]
> > > proposition which could be false or true if it were not ambiguous.
> >
> > Ok. There exists a meteorlogical observation where [the sky is blue in
> > the daytime]. That is the sentence (not a very good one i'll agree, but
> > a propositional sentence nonetheless), with the predicate in square
> > brackets.
>
> Still not good enough. In logic, a predicate is a statement whose
> truth depends on the variable(s)ranging over some domains(s). E.g 'x
> is_older_than y' is a predicate, but 'John is older than Jim' is a
> proposition. You need to restructure your sentence some more in order
> to make a predicate out of it.

Ok, thank you for the correction. I'm using weak terminology in a very
specific area (using predicate in terms of its grammatical meaning
instead of math logic).

> > P = { <feature: sky>, <colour: blue>, <period: daytime> }
> >

> > M = { <creator: James>, <created: 1127871055>, <statement: P> }
> >
> OK. All the talk about predicates aside, why the piece of information
> above should be treated in a special way and called by the nebulous
> word 'metadata' ?

Yes talk of the terminology is a side issue. First I agree the term
'metadata' is woefully blurry, and given all data may be metadata and
vice versa, its somewhat of a meaningless concept.

> What is gained by this in comparison to treating the
> additional attributes as part of the original entity (or being a
> separate entity) ?

Great question, and not one I'm sure I can answer yet - outside the
fact that its mathematically correct to do so. Storing data _about_ a
proposition in with the encoding of the proposition itself seems to me
like attributing a quote to an author by putting his name inside the
quote marks.

I mean, RM's still do a grand job, but given part of their allure is
their theoretical correctness, it seems to me worthy of checking out as
an area of possible improvement.

vldm10

nelasīta,

2005. gada 28. sept. 14:14:4828.09.05

uz

It seems that you missed the point.
The point is that the solution is on the Conceptual Model & the Logical
Model level rather then on software's house level.
It is the consequence of the DB Design.

Vladimir Odrljin

Gene Wirchenko

nelasīta,

2005. gada 28. sept. 15:03:2628.09.05

uz

On 27 Sep 2005 19:06:54 -0700, "JOG" <j...@cs.nott.ac.uk> wrote:

[snip]

>P = { <feature: sky>, <colour: blue>, <period: daytime> }

>The extra information I specified in the previous post however, is
>absolutely not part of this statement about the world. Rather it is

Of course it is. If nothing else, it is bibliographic, but I
would restructure the statement to:
[Gene's P =] James says that [James's] P.

>metadata about P:
>
>
>M = { <creator: James>, <created: 1127871055>, <statement: P> }

I am concerned about who makes a statement. When assessing
reliability, this is important.

[snip]

Sincerely,

Gene Wirchenko

mAsterdam

nelasīta,

2005. gada 28. sept. 16:06:1628.09.05

uz

It boils down to "what do we need to know?"
The topic of the propositions doesn't matter
in a way that would affect the way we design
a database. Data about data - metadata - so what?

Databases and databased thinking help when
there are a lot of propositions to just a
few predicates. It breaks down when we want
to have propositions, not knowing the
predicates in advance.

>>What is gained by this in comparison to treating the
>>additional attributes as part of the original entity (or being a
>>separate entity) ?
>
> Great question, and not one I'm sure I can answer yet - outside the
> fact that its mathematically correct to do so. Storing data _about_ a
> proposition in with the encoding of the proposition itself seems to me
> like attributing a quote to an author by putting his name inside the
> quote marks.

So you want to know something about the propositions.
Great! Tell me exacly what it is you want to know about
them and together we can design the database
just for you. I'll repeat: the topic does not matter.

You are not the first who wants data about data,
facts about facts. Disasters like Enron's caused
a lot of people to seriously rethink about what
we need to know about what we know, and Larry E
was rather fast to respond with several features
in Oracle 9 & 10 to cater for frequent needs.

vldm10

nelasīta,

2005. gada 28. sept. 17:22:3228.09.05

uz

I would like to use your valuable input about the industry-standard,
externally verifiable keys, to add more about identifying an entity in
the Real World.
Let me start with policemen who check the picture on a driver license
and then licenceID and plate number, name etc. This is an example of
using knowledge of more attributes rather then simple identification by
one attribute. We can observe that the license's picture should be
changed periodically. Same case with a passport, a membership card etc.
And I believe that identifying by the picture and by a non key
attributes (and new technologies) will have significant role in future.
Second example is checking by the phone. If it is about credit card
they ask for last four digits and other additional information. Often
these phone identifications start with information about names,
address, and mother's maiden name or last for digits of the ssn.
That was the reason why I didn't define that key identifies entity in
the Real World.
I wrote that key provides identification. In the discussion I used the
terms 'key helps identification' or 'key together with other
attributes provides' identification of the entity in the Real World.
Finally we also have to identify the relationships in the Real World.
Many relationships are widely checked, by more then one attribute.
Once again I appreciate your input and my input is addition to this
important subject.

Vladimir Odrljin

JOG

nelasīta,

2005. gada 28. sept. 17:29:3328.09.05

uz

Gene Wirchenko wrote:
> >P = { <feature: sky>, <colour: blue>, <period: daytime> }
>
> >The extra information I specified in the previous post however, is
> >absolutely not part of this statement about the world. Rather it is
>
> Of course it is. If nothing else, it is bibliographic, but I
> would restructure the statement to:
> [Gene's P =] James says that [James's] P.

If you are saying there is a distinction between the observer of the
proposition and the creator of the tuple, just as there is a
distinction between the time of the observation and the time of the
tuples creation in the db, well, yes I agree. This was the intention of
the example, although it may have become obfuscated along the way. But
there is a distinction, and if Gene were the person who observed the
the fact that the sky is blue in the day, we have:

P = { <feature: sky>, <colour: blue>, <period: day>, <observer: Gene>}

M = { <creator: James>, <created: 1127871055>, <statement: P> }

P is just a finite partial mapping, and as such my (granted often
unreliable) spider-sense is still telling me that like any function,
there may exist relationships that do not belong as part of P's
extensional representation (orderings, set membership, etc). Easy
employment and manipulation of these would utilise an implicit
conceptual reference (as I would use the letter P in the mathematical
notation - but instead I have to hack in and manage an explicit
artificial key, or specify "on update cascade"s all over the shop to
maintain integrity.)

On a side note, it has of course been pissing it down in the UK, and
the sky is absolutely, categorically and unequivicably, not blue :)

paul c

nelasīta,

2005. gada 28. sept. 19:32:5228.09.05

uz

-CELKO- wrote:
> Dr. Codd: "..Database users may cause the system to generate or delete
> a surrogate, but they have no control over its value, nor is its value
> ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
> E. (1979), Extending the database relational model to capture more
> meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.
>
> This means that a surrogate ought to act like an index; created by the
> user, managed by the system and NEVER seen by a user. That means never
> used in queries, DRI or anything else that a user does.

> ...

thanks, i'd never paid much attention to the idea before. seems like a
bizarre one, if i may say so. i can just imagine trying to explain it
to a properly nosy user, eg.: "well the database did it that way because
of the surrogate key ... what? ... how do i know there aren't two of
them? ... well, i just *know* ... what? ... no, you can't see it!"

if one agrees that there are more basic problems to be solved, it
doesn't seem like a very important idea, but if a guy as smart as Codd
put it forth, i guess i should try harder to understand it.

p

JOG

nelasīta,

2005. gada 28. sept. 20:35:5228.09.05

uz

Great example. King Milindi's chariot - a bhuddist philosopher used it
as an argument against essentialism, to illustrate that there is no one
attribute of an item that can define its essense.

The monk might say to you that the VIN is a great identifier until the
car is stolen (this happens a lot in my neighbourhood), and the thieves
cleave off the VIN and repaint it. It would still be my ruddy car they
were driving around in (despite what philosophers like Leibniz would
probably tell me).

Primary keys are supposed to be immutable. RM relies on this for smooth
running, preventing dangling references and isolated records. But given
that in reality no primary key, or combination, can be gauranteed to be
so, and no natural keys can represent an entity's "essence", every
predicate is a King Milindi Chariot. Course, this doesn't stop us
having usable database admin's handle it with good old common sense,
but imo it's certainly not a rock solid foundation for the theory.

Marshall Spight

nelasīta,

2005. gada 29. sept. 02:03:3529.09.05

uz

JOG wrote:
>
> Great example. King Milindi's chariot - a bhuddist philosopher used it
> as an argument against essentialism, to illustrate that there is no one
> attribute of an item that can define its essense.

Sure, sure. Or the Viking Ship, or even just Grandfather's axe.
Actually grandfather's axe is the best version of the story
because it's the minimal version of the story. This is the
sort of thing I found really interesting in high school. Ahem.

> Course, this doesn't stop us
> having usable database admin's handle it with good old common sense,
> but imo it's certainly not a rock solid foundation for the theory.

Oh, please.

Marshall

x

nelasīta,

2005. gada 29. sept. 02:20:4229.09.05

uz

"vldm10" <vld...@yahoo.com> wrote in message

news:1127931288....@o13g2000cwo.googlegroups.com...

> It seems that you missed the point.

Yes. It seems so.

> The point is that the solution is on the Conceptual Model & the Logical
> Model level rather then on software's house level.
> It is the consequence of the DB Design.

I am aware of this.

But why do this extra work if the software automatically provide ?
I don't know the implementation so I'm not sure if "the solution" isn't at
the logical model level.

x

nelasīta,

2005. gada 29. sept. 02:57:4729.09.05

uz

In the relational database, relations usually have more than an attribute
and keys are not restricted to one attribute. In the "Real World" policemen
do integrity checking in addition to identity checking. Also a database need
to be kept updated. etc.

"vldm10" <vld...@yahoo.com> wrote in message

news:1127942552.0...@g49g2000cwa.googlegroups.com...

JOG

nelasīta,

2005. gada 29. sept. 07:33:2129.09.05

uz

If you were involved in AI research, whose systems have no common sense
input due to their lack of situatedness, you would realise that these
distinctions for the foundations of a data structure can be crucial.

Either way your post is sadly unconstructive, and something I would
expect to see on Slashdot, not within a database theory discussion.

Marshall Spight

nelasīta,

2005. gada 29. sept. 11:01:1229.09.05

uz

JOG wrote:
> Marshall Spight wrote:
> >
> > Oh, please.

>
> If you were involved in AI research, whose systems have no common sense
> input due to their lack of situatedness, you would realise that these
> distinctions for the foundations of a data structure can be crucial.

You keep saying that, but you don't back it up with anything.
You don't provide any formal or even vaguely mathematical arguments,
just handwavy ones. You ignore all my requests for clarification,
and more conveniently, do not respond to the questions I ask
which illustrate what look to me to be dramatic flaws in your
arguments. They might not be, but if you won't actually *have*
the debate, it's hard to say.

If you want to critique a mathematical model, you could show
lack of soundness, lack of completeness, lack of expressiveness.
And even if you could do that, it wouldn't mean much unless
you could show an alternative that didn't suffer from these
problems. This is not a philosophy group, but a theory one.
And the particular theory being discussed has a long tradition
of rolling up its sleeves and doing work, in the real world
I might add.

And by the way, I bump into AI researchers in the lunch line
every day, and so far, all the systems they have produced
use keys.

> Either way your post is sadly unconstructive, and something I would
> expect to see on Slashdot, not within a database theory discussion.

You ride in here on King Milindi's chariot, make some
vague ominous claims about set theory bolstered only by trite
anecdotes about "identity", ignore my efforts to engage you,
and at the first sign of my impatience, start in with the
personal attacks? And you call *me* "unconstructive?" So far
I rate your contribution to the newsgroup as negative. If you
want a "database theory discussion" then you have to actually
have one.

Marshall

JOG

nelasīta,

2005. gada 29. sept. 13:23:4529.09.05

uz

I will try and reply civilly in spite of the vitriol.

> You ride in here on King Milindi's chariot, make some
> vague ominous claims about set theory bolstered only by trite
> anecdotes about "identity", ignore my efforts to engage you,
> and at the first sign of my impatience, start in with the
> personal attacks? And you call *me* "unconstructive?" So far
> I rate your contribution to the newsgroup as negative. If you
> want a "database theory discussion" then you have to actually
> have one.

No, I said your last post was unconstructive, which it plainly was.
You say you're brusqueness was borne of impatience but hey, noones
forcing you to read the thread and a few posts of mine among the many
here shouldn't cause you so much distress for you to lose your cool.

Either way, I certainly have said nothing about you - I don't know you
and absolutely would not personally attack you. In fact, the opposite,
I would love to engage with you, as I want to come to a conclusion
myself. I am certainly not waltzing about saying anything like I've got
it right - I'm just offering a possible opinion for discussion, in an
area that is of interest. I have also not ignored you and endeavoured
to reply to every one of your posts in an attempt to clarify (If I've
failed to do so then just let me know and I can try and rectify that):

Your very first comment:
> What's a VIN, then? Are we all imagining them? Are they somehow
> outside of the real world? Is your bank account id somehow
> Not Of This Earth? If you bank account didn't have an id,
> would it be enough to identify it just by the balance?
> "There is a bank account that is a checking account, with
> $1,000,000.00 in it." Could I then authenticate myself to
> the bank by saying, yeah, I'm the guy with the million
> dollars in the bank. If you forgot your balance, would
> that mean you lost access to your money?

Someone else replied in between my reading the boards:

Your next comment:

> Definitely! But artificial isn't the same as "not part of the real
> world." The claim was made that the real world didn't contain
>predicates that included artificial keys. So I guess that
> means VIN, bank account number, and SSN are supernatural! Spooky!

I replied to you:
> The claim was that if you are modelling the real world you should not have
> to add new imaginary keys, you're encoded predicate pretending as
> though they were properties of the real world statement,
> just for the database to work.

You then justifiably asked:
> If you're saying artificial identifiers
> are okay but invisible universal ids are not, I'm not clear
> what the distinction is.

A VIN exists in the real world, and as such is part of the proposition
we are encoding. A surrogate key is an artifice of the database, does
not exist in the world you are modelling, and is implemented to get
your system working. This seems a very clear distinction to me. I
attempted to highlight this with an example. VC then helped me rid the
example of weak terminology and Gene emphasised the difference between
the observer within the example, and the creator of the tuple. Then
finally you chime in with what reads a very terse post, which adds
nothing at all. Again, I am unclear as to how this engendered so much
aggrievement for you.

Nonetheless I hope you can see you have got the wrong end of the stick
in this case and I welcome discussion with you. I disagree
wholeheartedly with you that the philosophy of what information
actually is, and the consequences that may have for encoding and
manipulation of it, is irrelevant to database theories. After all it is
Codd's information Principle that is at the heart of what we are really
debating here - and that is the philosophical viewpoint, upon which his
mathematical model is based.

James.

vldm10

nelasīta,

2005. gada 29. sept. 13:29:1029.09.05

uz

JOG wrote:
> Gene Wirchenko wrote:
> > >P = { <feature: sky>, <colour: blue>, <period: daytime> }
> >
> > >The extra information I specified in the previous post however, is
> > >absolutely not part of this statement about the world. Rather it is
> >
> > Of course it is. If nothing else, it is bibliographic, but I
> > would restructure the statement to:
> > [Gene's P =] James says that [James's] P.
>
> If you are saying there is a distinction between the observer of the
> proposition and the creator of the tuple, just as there is a
> distinction between the time of the observation and the time of the
> tuples creation in the db, well, yes I agree. This was the intention of
> the example, although it may have become obfuscated along the way. But
> there is a distinction, and if Gene were the person who observed the
> the fact that the sky is blue in the day, we have:
>
> P = { <feature: sky>, <colour: blue>, <period: day>, <observer: Gene>}
>

Instead:

P = {< feature: sky>, <colour: blue>, <period: day>, <observer:
Gene>}

James creates:
P = {< feature: ocean>, <colour: blue>, <period: day>, <observer:
Gene>}

In fact James here avoids the truth.

The DB Designer should to solve some bizarre situations especially if
he decides to allow input of the facts related to attribute.
Besides the cheating a data entry person also can enter the fact, no
fact (he believe that it is a fact but it isn't) as well as thing for
which he doesn't know is it a fact or not.
You can have the VIN88 in your database and the car in the Real World
with the VIN88.
But it can happen that an operator entered the wrong value for some
attribute. For example, he entered VIN88 instead VIN89. Only difference
between these two cars is that VIN88 has "light silver" and VIN89
has "dark silver" for the InteriorColor attribute. Now you can
"identify" the Real World entity using VIN88 number, but this is
not it (because of the colour). People usually solve this kind of the
problems even if it is an intelligent crime. But mapping between the
entities from the Real World (and also from the Conceptual Model) and
your database is not a 1-1. So some keys are meaningless. You can have
many things in your database.

Gene Wirchenko

nelasīta,

2005. gada 29. sept. 14:23:4729.09.05

uz

On 29 Sep 2005 10:23:45 -0700, "JOG" <j...@cs.nott.ac.uk> wrote:

>I will try and reply civilly in spite of the vitriol.
>
>> You ride in here on King Milindi's chariot, make some
>> vague ominous claims about set theory bolstered only by trite
>> anecdotes about "identity", ignore my efforts to engage you,
>> and at the first sign of my impatience, start in with the
>> personal attacks? And you call *me* "unconstructive?" So far
>> I rate your contribution to the newsgroup as negative. If you
>> want a "database theory discussion" then you have to actually
>> have one.
>
>No, I said your last post was unconstructive, which it plainly was.

It was? How did I miss that, if it really was so plain?

Rather, I find Marshall's posts to be generally quite clear and
useful.

[snip]

Sincerely,

Gene Wirchenko

JOG

nelasīta,

2005. gada 29. sept. 15:28:3929.09.05

uz

> This is the sort of thing I found really interesting in high school. Ahem.

>> Course, this doesn't stop us having usable database admin's handle it with good old
>> common sense, but imo it's certainly not a rock solid foundation for the theory.

> Oh, please.

I don't see the constructiveness in those statements Gene, especially
given that Marshall says it was borne of impatience :( ... and while
the dig is quite clear I don't see how it's useful? What have I missed
there? Perhaps I'm being over-sensitive, in which case I apologise.

As I said, questioning the philosophy of Codd's information principle
still seems worthy of discussion to me. Either way, no probs, I don't
want to derail vldm10's thread.

all best, J.

vldm10

nelasīta,

2005. gada 29. sept. 16:02:3329.09.05

uz

vldm10 wrote:
> dawn wrote:
> >
> > So, what is your rationale for choosing a single, often meaningless,
> > identifier for each relation rather than working with a more natural
> > candidate key in a relationship table?
>
>
> I do not generate only a key; it is also the attribute which is the
> name of the relation's instance. In current RM, typically, you do not

Here I made mistake it is not a name it is the key and the attribute.
So it is as I wrote in my text at www.dbdesign10.com Sorry about this
The point was that in my definition it is a key and an attribute. In
current RM it is key. It can be an attribut but not in case of a
compaund key.

> have an attribute which is the name of the relation's instance, you
> have only the key. The key can be compound - meaning a key can be a set
> of the attributes; i.e. the key is not an attribute. So my solution is
> the attribute and it can't be meaningless because attribute has
> origins in Conceptual Model and meaning in the Real World. A key which
> has number as value is annoying, but we always use some additional
> information to get meaning in the Real World. It is same case with SSN,
> VIN etc.
> If we have the Car table with the cars which attributes are all the
> time same, it is okay to use existing RM solutions. However the RM's
> definition of the key can't support cases which we discussed. In
> these cases we need maybe to split the table. This involves a FK and
> the data integrity or we need some additional "fields" which are
> not the attributes, or we need some additional attributes. If an
> entity has an attribute which repeats its value then I prefer my
> solution. Relationships repeat their values more frequently then the
> entities.
> Finally current implementation of the Relational Model causes some
> problems.
> The TransRelational™ Model is what the database world is waiting. It
> provides a completely new approach to implementation. In this Model
> columns are stored separately. But the key definition in the RM should
> be changed. The logic of the compound key is directly opposite to the
> logic of this model.
> In 2.4 under 2. on my website I set more general representation of
> storing the columns separately. An example 2.5 shows how knowledge can
> be represented regarding one data (column Amount) and using the key
> which is related to the state.
>
>
> Vladimir Odrljin

Gene Wirchenko

nelasīta,

2005. gada 29. sept. 16:07:3729.09.05

uz

On 29 Sep 2005 12:28:39 -0700, "JOG" <j...@cs.nott.ac.uk> wrote:

[snip]

>I don't see the constructiveness in those statements Gene, especially

>given that Marshall says it was borne of impatience :( ... and while
>the dig is quite clear I don't see how it's useful? What have I missed
>there? Perhaps I'm being over-sensitive, in which case I apologise.
>
>As I said, questioning the philosophy of Codd's information principle
>still seems worthy of discussion to me. Either way, no probs, I don't
>want to derail vldm10's thread.

I have seen it a number of times in this newsgroup. Someone
mentions supposed weaknesses in the RM, but when called on to
substantiate his point, does not do so. It is your obligation to
support your claim. Otherwise, you can expect a rather cool
reception. Many people have made claims. Proving them is quite
another matter.

Sincerely,

Gene Wirchenko

JOG

nelasīta,

2005. gada 29. sept. 17:33:4029.09.05

uz

Okay fair enough. The issue clearly has baggage. But I have used,
advocated, and taught the RM for over a decade now, and I am not just
trying to be contrary. But I'm not doing down the RM - the primary
attraction of RM os that it was not created in an ad-hoc fashion, but
rather based on fundamental principles. All my posts were intended to
be, were a discussion of a different possible principle to base a
mathematical model upon.

Proposition:
I am putting forward for discussion that Codd's Information Principle
(based on an Liebniz/essentialist viewpoint that an entity is defined
by its attributes, and is nothing more and nothing less than the
aggregate of them, [Celko - D&DB]) is not a complete model of the
nature of information. Rather I am proposing that there is no single
attribute of an entity that can be gauranteed to act as a permenant
immutable reference for it (nominalism - there are no universals), and
so an external one to represent our concept of an entity be provided.

Difficulties arising from the IP:
Primary Key non-mutability - often necessary to employ internal
surrogates that require management [Codd - RM/T paper].
Lack of Internal Referencing checks - often necessary to explicitly use
cascade mechanisms to prevent dangling references, orphan records.

Possible Solution:
provide any encoded proposition with an automated reference that
guarantees primary key mutability, but is externalised from the data,
which then remains a pure encoding of the representation of real-world
attributes (essentially the partial map that is the tuple becomes a
datatype) [W, Kent - D&R]. Joins may be pivoted on this reference, and
integrity is continually maintained by the DBMS between these
references to prevent orphan tuples.

Now I'm afraid, I have no more than that to offer at the moment. And
obviously its all up for assassination, and thats no problem, shoot.
That's why i'm posting - to get your educated opinions. Just don't tell
me this sort of discussion isn't valid here (where else?), that I don't
roll my sleeves up and do real work or that the questioning the
philosophy of axioms is for high school.

Gene Wirchenko

nelasīta,

2005. gada 29. sept. 18:38:4929.09.05

uz

On 29 Sep 2005 14:33:40 -0700, "JOG" <j...@cs.nott.ac.uk> wrote:

>Okay fair enough. The issue clearly has baggage. But I have used,
>advocated, and taught the RM for over a decade now, and I am not just
>trying to be contrary. But I'm not doing down the RM - the primary
>attraction of RM os that it was not created in an ad-hoc fashion, but
>rather based on fundamental principles. All my posts were intended to
>be, were a discussion of a different possible principle to base a
>mathematical model upon.

You are going to have to do better than that. We hear noise
about inadequacies of the RM quite frequently. Not much ever comes of
the noise.

>Proposition:
>I am putting forward for discussion that Codd's Information Principle
>(based on an Liebniz/essentialist viewpoint that an entity is defined
>by its attributes, and is nothing more and nothing less than the
>aggregate of them, [Celko - D&DB]) is not a complete model of the
>nature of information. Rather I am proposing that there is no single
>attribute of an entity that can be gauranteed to act as a permenant
>immutable reference for it (nominalism - there are no universals), and
>so an external one to represent our concept of an entity be provided.
>
>Difficulties arising from the IP:
>Primary Key non-mutability - often necessary to employ internal
>surrogates that require management [Codd - RM/T paper].
>Lack of Internal Referencing checks - often necessary to explicitly use
>cascade mechanisms to prevent dangling references, orphan records.
>
>Possible Solution:
>provide any encoded proposition with an automated reference that
>guarantees primary key mutability, but is externalised from the data,
>which then remains a pure encoding of the representation of real-world
>attributes (essentially the partial map that is the tuple becomes a
>datatype) [W, Kent - D&R]. Joins may be pivoted on this reference, and
>integrity is continually maintained by the DBMS between these
>references to prevent orphan tuples.

This appears to be simply the idea of a surrogate key. If it is
not, you have explaining ahead of you.

>Now I'm afraid, I have no more than that to offer at the moment. And
>obviously its all up for assassination, and thats no problem, shoot.
>That's why i'm posting - to get your educated opinions. Just don't tell
>me this sort of discussion isn't valid here (where else?), that I don't
>roll my sleeves up and do real work or that the questioning the
>philosophy of axioms is for high school.

Not me. Have at it. Just do not blow smoke or handwave (much).

Sincerely,

Gene Wirchenko

JOG

nelasīta,

2005. gada 29. sept. 20:48:4729.09.05

uz

> This appears to be simply the idea of a surrogate key.
> If it is not, you have explaining ahead of you.

Well it appears i'm flailing. Marshall's frustrated jibe aside, i've
appreciate the responses received, and anyone who has put the time
therein deserves a pint next time they are in Nottingham.

I don't have the terminology yet to express the answer to the question
above, outside of a cogent philosophical argument which would come
across as smoke. Nor do I have any mathematical extension to provide
positive evidence for the approach. Sod it, one last handwave...

The difference is analagous to the following - below is the graph
component for some arbitrary mapping:

f = { <a,1>, <b,2>, <c,3> }

In the math, f is the symbol by which I will reference this graph, and
would use to refer to the mapping inside further relations. I won't
reference it by one of its tuples, and the f symbol certainly belongs
outside of the set it represents, not encoded as a tuple within it. The
distinction is between container and the content.

I need a drink, J.

Marshall Spight

nelasīta,

2005. gada 30. sept. 01:11:2830.09.05

uz

JOG wrote:
> ... I would love to engage with you, ...

I'll take you at your word.

> Your next comment:
> > Definitely! But artificial isn't the same as "not part of the real
> > world." The claim was made that the real world didn't contain
> >predicates that included artificial keys. So I guess that
> > means VIN, bank account number, and SSN are supernatural! Spooky!
>
> I replied to you:
> > The claim was that if you are modelling the real world you should not have
> > to add new imaginary keys, you're encoded predicate pretending as
> > though they were properties of the real world statement,
> > just for the database to work.
>
> You then justifiably asked:
> > If you're saying artificial identifiers
> > are okay but invisible universal ids are not, I'm not clear
> > what the distinction is.
>
> A VIN exists in the real world, and as such is part of the proposition
> we are encoding. A surrogate key is an artifice of the database, does
> not exist in the world you are modelling, and is implemented to get
> your system working. This seems a very clear distinction to me.

Perhaps we have different definitions of surrogate keys. The VIN
may be something that's not a surrogate key in database A, but
it's certainly a surrogate key in *some* database, and hence I
don't see any particular difference between it and any other
made-up identifiers. SOMEone, some database, made up the VIN
out of thin air; we could well say they "implemented [it] to get
[their] system working." So I still don't see the distinction
you're drawing here.

> I disagree
> wholeheartedly with you that the philosophy of what information
> actually is, and the consequences that may have for encoding and
> manipulation of it, is irrelevant to database theories.

I don't believe I ever took that position.

However, there is a definite trap in this general area, which
is the trap of thinking that there is anything profound to
be said about the relationship between the internal and the
external predicate. That relationship, between the database
and the world it models, exists only in the minds of the users
of the system, and it is wholly an illusion, if you will.
This is an important point, and it is also a philosophical one.

Marshall

x

nelasīta,

2005. gada 30. sept. 03:56:2730.09.05

uz

"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1128057088.2...@z14g2000cwz.googlegroups.com...
> JOG wrote:

> > A VIN exists in the real world, and as such is part of the proposition
> > we are encoding. A surrogate key is an artifice of the database, does
> > not exist in the world you are modelling, and is implemented to get
> > your system working. This seems a very clear distinction to me.

> Perhaps we have different definitions of surrogate keys. The VIN
> may be something that's not a surrogate key in database A, but
> it's certainly a surrogate key in *some* database, and hence I
> don't see any particular difference between it and any other
> made-up identifiers. SOMEone, some database, made up the VIN
> out of thin air; we could well say they "implemented [it] to get
> [their] system working." So I still don't see the distinction
> you're drawing here.

A VIN exists in the "real world" (outside every database), doesn't matter
if it is artificial or not (maybe it was born at the same time with the car,
so it is natural after all :-)).

An "invisible id" never leave the database where it was born. It exists in
exactly one database.
Or

An "invisible id" never escape the database world. It doesn't exists in the
real world (outside every database).

> However, there is a definite trap in this general area, which
> is the trap of thinking that there is anything profound to
> be said about the relationship between the internal and the
> external predicate. That relationship, between the database
> and the world it models, exists only in the minds of the users
> of the system, and it is wholly an illusion, if you will.
> This is an important point, and it is also a philosophical one.

The database have to be in sync with the world it models .
When they go out of sync, the database and/or the world need to be updated
somehow.

The SSN and the VIN are a bit different because the SSN is not tattooed on
your skull at birth (yet).

mAsterdam

nelasīta,

2005. gada 30. sept. 04:06:0230.09.05

uz

Marshall Spight wrote:
> JOG wrote:

[snip]

>>A VIN exists in the real world, and as such is part of the proposition
>>we are encoding. A surrogate key is an artifice of the database, does
>>not exist in the world you are modelling, and is implemented to get
>>your system working. This seems a very clear distinction to me.
>
>
> Perhaps we have different definitions of surrogate keys. The VIN
> may be something that's not a surrogate key in database A, but
> it's certainly a surrogate key in *some* database, and hence I
> don't see any particular difference between it and any other
> made-up identifiers. SOMEone, some database, made up the VIN
> out of thin air; we could well say they "implemented [it] to get
> [their] system working." So I still don't see the distinction
> you're drawing here.

There is an important difference. Unless we are talking about
that specific "*some*" database, the VIN is /not/ a surrogate key in
the database at hand.
A while ago we had some discussion as to wether an url is a key
or not. My stance on that was: only if the content is in our
database the url may be a key, so - in general: an url is not a key.
There is a similar distinction to be made here.
In general, a VIN is not a surrogate key (though it
is in the "*some*" database - only there).

x

nelasīta,

2005. gada 30. sept. 04:17:4930.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1128041327....@g49g2000cwa.googlegroups.com...

> The difference is analagous to the following - below is the graph
> component for some arbitrary mapping:

> f = { <a,1>, <b,2>, <c,3> }

> In the math, f is the symbol by which I will reference this graph, and
> would use to refer to the mapping inside further relations. I won't
> reference it by one of its tuples, and the f symbol certainly belongs
> outside of the set it represents, not encoded as a tuple within it. The
> distinction is between container and the content.

In the RM there are no entities. There are only values and relations.
Also it is based on the first order logic.
Anything else need to be solved outside the model.

If you have a problem with this, and some ideeas to solve it, you are
wellcome to share them.

x

nelasīta,

2005. gada 30. sept. 04:33:3930.09.05

uz

"mAsterdam" <mAst...@vrijdag.org> wrote in message
news:433cf1d0$0$11072$e4fe...@news.xs4all.nl...

> Marshall Spight wrote:
> > JOG wrote:
> [snip]
>
> >>A VIN exists in the real world, and as such is part of the proposition
> >>we are encoding. A surrogate key is an artifice of the database, does
> >>not exist in the world you are modelling, and is implemented to get
> >>your system working. This seems a very clear distinction to me.
> >
> >
> > Perhaps we have different definitions of surrogate keys. The VIN
> > may be something that's not a surrogate key in database A, but
> > it's certainly a surrogate key in *some* database, and hence I
> > don't see any particular difference between it and any other
> > made-up identifiers. SOMEone, some database, made up the VIN
> > out of thin air; we could well say they "implemented [it] to get
> > [their] system working." So I still don't see the distinction
> > you're drawing here.

> There is an important difference. Unless we are talking about
> that specific "*some*" database, the VIN is /not/ a surrogate key in
> the database at hand.
> A while ago we had some discussion as to wether an url is a key
> or not. My stance on that was: only if the content is in our
> database the url may be a key, so - in general: an url is not a key.
> There is a similar distinction to be made here.
> In general, a VIN is not a surrogate key (though it
> is in the "*some*" database - only there).

I think we need to use other word like identifier when talking about
entities.
In the RM one can say the url is a key only if there exist a relation like
At(url, content) in the database.
It doesn't make sense to talk about relational keys when we talk about
entities.
We can talk how an entity is represented in a relational database.
In general, an url is just a value.

vldm10

nelasīta,

2005. gada 30. sept. 04:48:3730.09.05

uz

Let the data from the relation Car be represented with the following
relations:
(this is the relation Car from the beginning of the thread)

S-Relation: CarKey CarID
-------------------
23 vin1
24 vin1
25 vin1
26 vin2
27 vin3

K1-Relation: CarKey Make

------------------
23 Buick
24 Buick
25 Buick
26 Honda
27 Ford

K2-Relation: CarKey Color
-------------------
23 Silver
24 Blue
25 Red
26 Silver
27 Black

Now we can set the query on above S-Relation, K1-Relation and
K2-relation and we can get for example the following tuple:

( 24, vin1, Buick, Blue )

This is what you want: that part of the tuple identifies tuple.
Key "24" uniquely identifies the above tuple which it is part.
First key "24" was set in the S-Relation as well as in the
K1-Relation and K2-Relation which are real and in the database.
( Similar ideas exist in the math and the computer science)
This Key also has the semantic line, which beginning is one event in
the Real World. (see the definition of the key).
(We also can notice that this is not a surrogate key by definition.)

In www.dbdesign10.comm 2.4 under 2. I defined K-Relations, E-relation
and
S-Relation.
Here P,E,A1,...,An are the attributes of a relation from RM.
(They are also the pictures of some attributes P, E, A1,...An from the
Conceptual Model)

Here in RM, S-Relation:
a) Determines the keys in the database
b) Shows relationship to entity's identifier E
c) Associates knowledge related to P

I also wrote in my text that we create keys using Constructors that is
we can for example click on a button, New or Close to get the
Constructor's screen with many fields on it. So I can create a key
and the knowledge related to this key: who initialize this Constructor,
name of the Constructor (it can be more then one), which procedure
created key, what is the previous key,... Now besides above mentioned
the database "can recognize" which procedures (logic) created its
data. So we can get some meaning and knowledge about the key.If we
really need this.
(My notice to dawn about my mistake was put because, the relation was
in form 2.4 under 1)

Vladimir Odrljin

x

nelasīta,

2005. gada 30. sept. 04:49:5330.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:1128029620.6...@f14g2000cwb.googlegroups.com...

> Proposition:
> I am putting forward for discussion that Codd's Information Principle
> (based on an Liebniz/essentialist viewpoint that an entity is defined
> by its attributes, and is nothing more and nothing less than the
> aggregate of them, [Celko - D&DB]) is not a complete model of the
> nature of information. Rather I am proposing that there is no single
> attribute of an entity that can be gauranteed to act as a permenant
> immutable reference for it (nominalism - there are no universals), and
> so an external one to represent our concept of an entity be provided.

The usual answer to this is: there must be some way to discriminate among
any two entities, otherwise how do you know there are two of them.

Immutability in a changing world and absolute guaranties are another matter.
:-)

> Difficulties arising from the IP:
> Primary Key non-mutability - often necessary to employ internal
> surrogates that require management [Codd - RM/T paper].
> Lack of Internal Referencing checks - often necessary to explicitly use
> cascade mechanisms to prevent dangling references, orphan records.

It is hard to keep a database up to date.
Do you have a solution (even partial) to this problem ?

> Possible Solution:
> provide any encoded proposition with an automated reference that
> guarantees primary key mutability, but is externalised from the data,
> which then remains a pure encoding of the representation of real-world
> attributes (essentially the partial map that is the tuple becomes a
> datatype) [W, Kent - D&R]. Joins may be pivoted on this reference, and
> integrity is continually maintained by the DBMS between these
> references to prevent orphan tuples.

This is an encrypted text for me.

x

nelasīta,

2005. gada 30. sept. 05:36:4830.09.05

uz

"x" <x...@not-exists.org> wrote in message
news:dhiu7k$kjm$1...@domitilla.aioe.org...

>
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1128029620.6...@f14g2000cwb.googlegroups.com...
>
>
> > Proposition:
> > I am putting forward for discussion that Codd's Information Principle
> > (based on an Liebniz/essentialist viewpoint that an entity is defined
> > by its attributes, and is nothing more and nothing less than the
> > aggregate of them, [Celko - D&DB]) is not a complete model of the
> > nature of information. Rather I am proposing that there is no single
> > attribute of an entity that can be gauranteed to act as a permenant
> > immutable reference for it (nominalism - there are no universals), and
> > so an external one to represent our concept of an entity be provided.

> The usual answer to this is: there must be some way to discriminate among
> any two entities, otherwise how do you know there are two of them.

And if you care which is which, you'll find a way to tell.

x

nelasīta,

2005. gada 30. sept. 05:54:0830.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1128029620.6...@f14g2000cwb.googlegroups.com...

> Proposition:
> I am putting forward for discussion that Codd's Information Principle
> (based on an Liebniz/essentialist viewpoint that an entity is defined
> by its attributes, and is nothing more and nothing less than the
> aggregate of them, [Celko - D&DB]) is not a complete model of the
> nature of information. Rather I am proposing that there is no single
> attribute of an entity that can be gauranteed to act as a permenant
> immutable reference for it (nominalism - there are no universals), and
> so an external one to represent our concept of an entity be provided.

You are discussing here
a) entity = abstract concept
or
b) entity = object ?

if a), why do you need to reference it ?

x

nelasīta,

2005. gada 30. sept. 06:36:0730.09.05

uz

"vldm10" <vld...@yahoo.com> wrote in message

news:1128070117....@g14g2000cwa.googlegroups.com...

What about the derived relations ?

x

nelasīta,

2005. gada 30. sept. 08:00:0430.09.05

uz

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1128029620.6...@f14g2000cwb.googlegroups.com...

> Proposition:

> I am putting forward for discussion that Codd's Information Principle
> (based on an Liebniz/essentialist viewpoint that an entity is defined
> by its attributes, and is nothing more and nothing less than the
> aggregate of them, [Celko - D&DB]) is not a complete model of the
> nature of information. Rather I am proposing that there is no single
> attribute of an entity that can be gauranteed to act as a permenant
> immutable reference for it (nominalism - there are no universals), and
> so an external one to represent our concept of an entity be provided.

> Now I'm afraid, I have no more than that to offer at the moment. And
> obviously its all up for assassination, and thats no problem, shoot.
> That's why i'm posting - to get your educated opinions. Just don't tell
> me this sort of discussion isn't valid here (where else?), that I don't
> roll my sleeves up and do real work or that the questioning the
> philosophy of axioms is for high school.

It seems to me you have an interest in philosophy.
I've just found this : http://216.25.45.103/pubs.htm

JOG

nelasīta,

2005. gada 30. sept. 09:56:0430.09.05

uz

> Marshall Spight wrote:
> > JOG wrote:
> [snip]
>
> >>A VIN exists in the real world, and as such is part of the proposition
> >>we are encoding. A surrogate key is an artifice of the database, does
> >>not exist in the world you are modelling, and is implemented to get
> >>your system working. This seems a very clear distinction to me.
> >
> >
> > Perhaps we have different definitions of surrogate keys. The VIN
> > may be something that's not a surrogate key in database A, but
> > it's certainly a surrogate key in *some* database, and hence I
> > don't see any particular difference between it and any other
> > made-up identifiers. SOMEone, some database, made up the VIN
> > out of thin air; we could well say they "implemented [it] to get
> > [their] system working." So I still don't see the distinction
> > you're drawing here.
>
> There is an important difference. Unless we are talking about
> that specific "*some*" database, the VIN is /not/ a surrogate key in
> the database at hand.

I'd answer along the same lines as mAmsterdam. The difference is that a
VIN is controlled externally to the database (owned by the real world
as it were) and a pure surrogate key is controlled internally within
the RDBMS (owned by the database). The distinction guarantees that the
latter is immutable and unique, which is not the case with the former.

And of course, yes, as soon as these lines blur and the pure surrogate
key (and by pure i mean solely db owned), starts to be used outside of
the db, the real world has taken over ownership of it, and it loses all
its guarantees.

Marshall Spight

nelasīta,

2005. gada 30. sept. 10:37:4130.09.05

uz

x wrote:
> "Marshall Spight" <marshal...@gmail.com> wrote in message
> news:1128057088.2...@z14g2000cwz.googlegroups.com...
> > JOG wrote:
>
> > > A VIN exists in the real world, and as such is part of the proposition
> > > we are encoding. A surrogate key is an artifice of the database, does
> > > not exist in the world you are modelling, and is implemented to get
> > > your system working. This seems a very clear distinction to me.
>
> > Perhaps we have different definitions of surrogate keys. The VIN
> > may be something that's not a surrogate key in database A, but
> > it's certainly a surrogate key in *some* database, and hence I
> > don't see any particular difference between it and any other
> > made-up identifiers. SOMEone, some database, made up the VIN
> > out of thin air; we could well say they "implemented [it] to get
> > [their] system working." So I still don't see the distinction
> > you're drawing here.
>
> A VIN exists in the "real world" (outside every database), doesn't matter
> if it is artificial or not (maybe it was born at the same time with the car,
> so it is natural after all :-)).
>
> An "invisible id" never leave the database where it was born. It exists in
> exactly one database.
> Or
>
> An "invisible id" never escape the database world. It doesn't exists in the
> real world (outside every database).

Okay. By that line of thought, if we, say, print out the "invisible id"
on a piece of paper, does it then transform into the same kind
of thing as a VIN? To this way of thinking, what category does
a bank account id fall into? I can see my bank account id on my
account statement ...

Are these invisible ids we're discussing supported in any
existing DBMS product, or are we speaking of them exclusively
as a theoretical construct?

> > However, there is a definite trap in this general area, which
> > is the trap of thinking that there is anything profound to
> > be said about the relationship between the internal and the
> > external predicate. That relationship, between the database
> > and the world it models, exists only in the minds of the users
> > of the system, and it is wholly an illusion, if you will.
> > This is an important point, and it is also a philosophical one.
>
> The database have to be in sync with the world it models .
> When they go out of sync, the database and/or the world need to be updated
> somehow.

Well, yeah. But that doesn't mean the two have any actual relationship.
The only way they relate is though a human doing data entry. (Or
an automated sensor, or whatever.) That doesn't fall into the
category of "anything profound."

Marshall

Marshall Spight

nelasīta,

2005. gada 30. sept. 10:39:0930.09.05

uz

mAsterdam wrote:

> Marshall Spight wrote:
> >
> > Perhaps we have different definitions of surrogate keys. The VIN
> > may be something that's not a surrogate key in database A, but
> > it's certainly a surrogate key in *some* database, and hence I
> > don't see any particular difference between it and any other
> > made-up identifiers. SOMEone, some database, made up the VIN
> > out of thin air; we could well say they "implemented [it] to get
> > [their] system working." So I still don't see the distinction
> > you're drawing here.
>
> There is an important difference. Unless we are talking about
> that specific "*some*" database, the VIN is /not/ a surrogate key in
> the database at hand.

So you are saying that whether something is a surrogate key
or not is determined within the context of a specific database?

Marshall

Marshall Spight

nelasīta,

2005. gada 30. sept. 10:45:3530.09.05

uz

vldm10 wrote:
> Let the data from the relation Car be represented with the following
> relations:

It looks to me like what you're doing is ordinary data modelling.
(Although I will add you are doing it somewhat formally and quite
well.) I am not sure if you are claiming to be doing anything more.
Are you?

For example, the three relations you describe strike me as
overly complex unless the application has a requirement to
store past values of the color of the car. Without that requirement,
you could simply collapse the three relations into one, keep
only the current color, use VIN as the key, and lose the
CarKey. (I know this happens in the real world, because
my wife sometimes loses her CarKeys.)

Marshall

Marshall Spight

nelasīta,

2005. gada 30. sept. 10:52:0930.09.05

uz

JOG wrote:
> > Marshall Spight wrote:
> > >

> > > Perhaps we have different definitions of surrogate keys. The VIN
> > > may be something that's not a surrogate key in database A, but
> > > it's certainly a surrogate key in *some* database, and hence I
> > > don't see any particular difference between it and any other
> > > made-up identifiers. SOMEone, some database, made up the VIN
> > > out of thin air; we could well say they "implemented [it] to get
> > > [their] system working." So I still don't see the distinction
> > > you're drawing here.
> >
> > There is an important difference. Unless we are talking about
> > that specific "*some*" database, the VIN is /not/ a surrogate key in
> > the database at hand.
>
> I'd answer along the same lines as mAmsterdam. The difference is that a
> VIN is controlled externally to the database (owned by the real world
> as it were) and a pure surrogate key is controlled internally within
> the RDBMS (owned by the database). The distinction guarantees that the
> latter is immutable and unique, which is not the case with the former.

I don't think I agree with the last sentence. In fact, I don't
see how the guarantees are any different. Some piece of machinery
in the DMV (a dbms, I'm sure) allocates VINs which are then stamped
on cars. If I have a customer database table with a lousy autoincrement
key (not to make Joe C cringe, but just to keep the argument concrete
and simple) then it is every bit as immutable and unique as the
VIN in the DMV database. And once I print out statements or a
customer list or whatever, that id exists in the real world
as well.

> And of course, yes, as soon as these lines blur and the pure surrogate
> key (and by pure i mean solely db owned), starts to be used outside of
> the db, the real world has taken over ownership of it, and it loses all
> its guarantees.

I can't make sense of this sentence. Through what mechanism are
the guarantees lost? In what sense are you using the word "guarantee?"
Generally when we talk about database guarantees, we are talking
about dbms-enforced constraints; that doesn't appear to be what you
mean.

Marshall

mAsterdam

nelasīta,

2005. gada 30. sept. 11:28:0630.09.05

uz

Just as the key-ness of something is.
Is column x a key for table y?
The question only makes sense within the context
of a specific database.

But I may have misunderstood your question.

JOG

nelasīta,

2005. gada 30. sept. 11:39:4330.09.05

uz

This is tough. The distinction is really clear to me, but I'm not sure
the slightly ambiguous terminology of 'surrogate key' is helping us -
we're using it in two senses, so i'm going to invent some terms for
those senses to help with this post:

"substitute key" - a key made up by datbase D to represent items within
its own data. It is owned, generated and controlled by that database D.

"representative key" - an identifier that exists in the real world (by
which I mean it exists independently of D), and is controlled by some
mechanism other than D.

Now Database D wants to store information about a set of cars and it
needs to specify a primary key - it might use a VIN (a representative
key) or it could use some internally generated key (a substitute key).

* If D uses a substitute key as a primary it can absolutely ensure that
this key is unique and immutable.

* However if it uses a representative key it has no such assurance.
Even a great representative key like a VIN, may be input incorrectly,
may have been recorded on paper wrongly, the DMV's generating mechanism
may have be faulty - who knows? At some point the DMV might even decide
its time for a brand new scheme, and replace VINs completely.

All best, J.

David Cressey

nelasīta,

2005. gada 1. okt. 09:28:3001.10.05

uz

"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1128057088.2...@z14g2000cwz.googlegroups.com...

> However, there is a definite trap in this general area, which

> is the trap of thinking that there is anything profound to
> be said about the relationship between the internal and the
> external predicate. That relationship, between the database
> and the world it models, exists only in the minds of the users
> of the system, and it is wholly an illusion, if you will.
> This is an important point, and it is also a philosophical one.

I've seen an interesting way of presenting this idea, or a similar one.

The authors call what the database serves up 'opinions' rather than
'facts'.
That is, 'in the opinion of the database, the employee with employee id
12345 has first name "Marshall" and last name "Spight"'.
Or, 'in the opinion of the database, there is no employee with id 567890'.
Or 'the database has no opinion as what Donald Trump's e-mail address might
be, if he has one'.

I find this enormously refreshing, especially when it comes to missing data.

David Cressey

nelasīta,

2005. gada 1. okt. 09:30:5301.10.05

uz

"x" <x...@not-exists.org> wrote in message

news:dhir3m$dtp$1...@domitilla.aioe.org...

> The SSN and the VIN are a bit different because the SSN is not tattooed on
> your skull at birth (yet).

What does an SSN identify? Does it identify a person? Or does it identify
an account inside the social security system?

David Cressey

nelasīta,

2005. gada 1. okt. 09:38:5401.10.05

uz

"mAsterdam" <mAst...@vrijdag.org> wrote in message
news:433cf1d0$0$11072$e4fe...@news.xs4all.nl...

> There is an important difference. Unless we are talking about

> that specific "*some*" database, the VIN is /not/ a surrogate key in
> the database at hand.

Agreed. This is an important difference.

Perhaps the difference is between "externally supplied data" and "internally
generated data". In a typical personnel system, new employee ids are
assigned by the HR department, and they are responsible, ultimately, for
avoiding duplicate assignments. The IT department ultimately is only a
custodian of data that the HR department "owns". Of course, the IT
department codes duplicate rejection somewhere into their systems, so as to
help HR discover their errors in a timely fashion.

So, when we have a "surrogate key in the database at hand" (or, if you can
accept it, "in the system at hand") what we have is data that is "owned" by
the system itself, rather than merely kept in custody by the system for its
owners.

mAsterdam

nelasīta,

2005. gada 1. okt. 10:08:3201.10.05

uz

David Cressey wrote:

Makes sense to me. Along these lines:
Exposing system owned data to users should
be done only when necessary:
error codes/messages, hm... what more?