Another view on analysis and ER

David Cressey

Dec 4, 2007, 11:34:26 AM

Here's a website I stumbled across:

http://www.islandnet.com/~tmc/html/articles/datamodl.htm

Note that, at the start of the introduction, the author says that analysis
is the most important part of any project. That's rather different from the
impression I've gotten in response to my topic on "what is analysis".

By the way, I don't like the author's dialect of ER. In particular, his
topic on "resolving many-to-many relationships" is, I believe, extraneous
to ER. His reification of a "watering" reminds me of the term "association
entity" that someone wrote in response to me a few days ago.

In analysis, there is nothing to resolve in a many-to-many relationship.
You only have to resolve it when you are designing relational tables or
relvars. So the resolution is a feature of the solution, not a feature of
the problem to be solved. And ER diagrams have no problem diagramming
many-to-many relationships: just put crow's feet at both ends of the
relationship line.
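
To make the analysis/design split concrete, here is a minimal sketch of what "resolving" looks like once you get to table design. It uses Python's sqlite3 module, and the plant/treatment schema and all names in it are hypothetical, chosen only for illustration; they are not taken from the linked article.

```python
import sqlite3

# Hypothetical schema: plants and treatments in a many-to-many relationship.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE plant     (plant_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE treatment (treatment_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The design-time "resolution": one row per (plant, treatment) pairing.
    CREATE TABLE plant_treatment (
        plant_id     INTEGER NOT NULL REFERENCES plant(plant_id),
        treatment_id INTEGER NOT NULL REFERENCES treatment(treatment_id),
        PRIMARY KEY (plant_id, treatment_id)
    );
""")
conn.executemany("INSERT INTO plant VALUES (?, ?)", [(1, "fern"), (2, "rose")])
conn.executemany("INSERT INTO treatment VALUES (?, ?)",
                 [(1, "watering"), (2, "pruning")])
conn.executemany("INSERT INTO plant_treatment VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2)])


def treatments_for(plant_name):
    """Return the names of all treatments linked to the named plant."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM plant p
        JOIN plant_treatment pt ON pt.plant_id = p.plant_id
        JOIN treatment t ON t.treatment_id = pt.treatment_id
        WHERE p.name = ?
        ORDER BY t.name
        """,
        (plant_name,),
    )
    return [name for (name,) in rows]
```

The third table exists only because the relational design needs somewhere to put the pairings; nothing in the problem statement asked for it.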

ER diagrams are a little awkward at diagramming n-ary relationships, but a
diamond representing the relationship is good enough, even if a little
cluttered.

Aside to Cimode: you are trying to give ER a fair chance. But if you
indeed do not separate analysis from design, and if I am correct that ER is
useful in analysis but not in design, then it seems logical that you will
eventually conclude that ER is of no value to you.

David Cressey

Dec 4, 2007, 11:56:28 AM

OOPS, I goofed. What I meant to say under resolving a many-to-many
relationship was the reification of "treatment", rather than "watering".

Drat the loss of short term memory!

"David Cressey" <cres...@verizon.net> wrote in message
news:mkf5j.3247$QS.1019@trndny03...

TroyK

Dec 4, 2007, 12:33:49 PM

It looks like the author is using the Barker notation for his diagrams. I
like to use the same for business modeling due to the accessibility of the
notation to the business users. I find, however, that the diagrams
definitely require accompanying text for full elucidation of the business
rules at play.

An example can be found in an article of mine (the first in a 5-part series)
that was just published on a product-specific "tips and tricks" website:
http://www.sqlservercentral.com/articles/Data+Modeling/61526/

Note that for this example, I am deferring the "resolution" of the
many-to-many relationship to the logical model. In practice, I like to think
about the so-called "associating entity" to check whether it is or is not
something that should be surfaced to the subject matter experts.

Behind the scenes, the design is all predicates documented in mathematical
notation, but I'm the only one that sees that version.

TroyK

JOG

Dec 4, 2007, 3:04:31 PM

Genuine question guys. From an E/R perspective (one of the good
variants, that allows relationships to have attributes), if I'm faced
with the following data.

-- Fred married Wilma in Bedrock.
-- Barney and Betty married in Paris.

How do I decide whether I am dealing with a marriage entity or a
marriage relationship?

The literature I'm reading here is telling me that the choice is based
on what 'things' are key to the business. If my business is concerned
with people (tax collection say), 'marriage' is best modelled as a
relationship, whereas if the marriages themselves are my focus
(perhaps I run a church) then it's probably better as an entity.

Have I made the right interpretation here, and is there general
agreement here? I am much more comfortable seeing that some variants
allow relationships to themselves have attributes, and that there is
nothing sacred about choices between using relationships or entities,
making it a design decision instead.

Thanks in advance, J.

Bob Badour

Dec 4, 2007, 3:28:18 PM

But then again, as has been repeated ad nauseam, analysis is not design.
What business has a design decision in an analysis artefact?

David Cressey

Dec 4, 2007, 3:33:06 PM

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:58c47eeb-adf0-414a...@l1g2000hsa.googlegroups.com...

My answer is that it's subjective. If the subject matter experts all treat
a marriage as a relationship, follow their lead. If the subject matter
experts all treat it as an entity, follow their lead, but insist that
there's a key attribute that identifies it. (Do the SMEs have any such
thing as a "marriage ID" attribute?)

Once you switch over from analysis to design, here's what happens: the
attributes that you discovered during analysis and attached to entities or
relationships will be carried over from your ER model to your design model,
which I presume will be a relational model.
The entities and relationships themselves, as such, will all disappear!

Another thing that will carry over is the keys, used to identify instances
of entities in the subject matter (or UoD if you prefer). Sometimes the
keys used by the SME (subject matter experts) are a little too informal and
require "common sense" to disambiguate. That's a special case, and doesn't
affect this discussion.

I'm used to expressing a relational model in terms of tables (relational
tables), but I presume that what follows could be transliterated into terms
of relvars without any difficulty.

Each entity will have a table of its own, with the attributes that pertain
to that entity. The primary key of the table will be the key attribute of
the entity.
Each relationship will have a table of its own, except for a few that can
be piggybacked onto entity tables. The primary key of a relationship table
will be a compound key consisting of two or more foreign keys. These tables
will automatically be normalized up to 3NF unless your analysis put an
attribute on the "wrong entity". I'm not sure about normalization beyond 3NF.

The above process is so automatic that you can have software that does it
for you. Indeed, that's what several tools do. They express an ER model
in terms of metadata, and likewise express the relational model in terms of
metadata. And they have a programmed process that will create a relational
model from an ER model. The only tool I know calls the relational model a
"physical model" and makes it specific to some product like Oracle or DB2,
etc. But that's a trivial detail. The software also turns models into
diagrams and/or create scripts for you.
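
The mechanical nature of that translation can be sketched in a few lines of Python. This is an illustration only, not how any particular tool works: the metadata layout, the function name er_to_ddl, and the customer/car/rental example are all assumptions made up for the sketch. Every column is typed TEXT for brevity, and it does not handle a relationship that links an entity to itself.

```python
import sqlite3


def er_to_ddl(entities, relationships):
    """Generate CREATE TABLE statements from hypothetical ER metadata.

    entities:      {name: {"key": column, "attrs": [columns]}}
    relationships: {name: {"between": [entity names], "attrs": [columns]}}
    """
    ddl = []
    # Each entity becomes a table keyed on the entity's key attribute.
    for name, ent in entities.items():
        cols = [f"{ent['key']} TEXT PRIMARY KEY"]
        cols += [f"{attr} TEXT" for attr in ent["attrs"]]
        ddl.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    # Each relationship becomes a table whose primary key is the
    # compound of foreign keys to the participating entities.
    for name, rel in relationships.items():
        fks = [entities[ent]["key"] for ent in rel["between"]]
        cols = [
            f"{key} TEXT NOT NULL REFERENCES {ent}({key})"
            for ent, key in zip(rel["between"], fks)
        ]
        cols += [f"{attr} TEXT" for attr in rel["attrs"]]
        cols.append(f"PRIMARY KEY ({', '.join(fks)})")
        ddl.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    return ddl


entities = {
    "customer": {"key": "customer_id", "attrs": ["name"]},
    "car": {"key": "car_id", "attrs": ["model"]},
}
relationships = {
    "rental": {"between": ["customer", "car"], "attrs": ["rental_date"]},
}

ddl = er_to_ddl(entities, relationships)

# Check that the generated statements are accepted by a real engine.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
```

Note that the output contains only tables, columns and keys. Nothing in it records which table started life as an entity and which as a relationship.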

So where did the entities and relationships go? They disappeared into the
ether! However, when the application people get around to designing screens
and reports, they can tie each feature back to an original entity or
relationship. That can make the resulting system coherent for the users.

I apologize for this response. It's really a lot more than you asked for.
But it's actually easier to use than it is to describe. It may also
oversimplify. The relational model that software constructs may not be the
best relational model that could be designed to deal with the original
problem.

David Cressey

Dec 4, 2007, 3:40:05 PM

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:4755b869$0$5294$9a56...@news.aliant.net...

Bingo! That's the big problem with the literature on ER. Many ER
proponents use ER as if it were a design artifact. I think that's a
misapplication of the artifact, and I'm pretty sure Peter Chen would agree.
If one is designing a relational system (including but not limited to a
relational database) then using the relational model to capture the design
is a much better idea.

Ruud de Koter

Dec 4, 2007, 6:42:37 PM

Very clear, this answer. One minor point I'd like to add is that it is
not subjective. Instead, the choices are governed by the goal to be
served with the application (assuming the analysis aims at building an
application). As clearly stated, there is a difference in perspective
between tax inspectors and priests (and spouses, for that matter). There
simply is no single authoritative model for a marriage; there are several
points of view, depending on the universe of discourse one operates in.
What we, in analysis, can do is to make sure we are aware of these UoDs,
and make a conscious choice. That is something else than being subjective.

Hope this helps,

Ruud de Koter.

JOG

Dec 4, 2007, 7:38:59 PM

Shared data anyone? Isn't the point that we _don't_ necessarily know
all the applications?

> As clearly stated, there is a difference in perspective
> between tax inspectors and priests (and spouses, for that matter). There
> simply is no single authorative model for a marriage, there are several
> points of view, depending on the universe of discourse one operates in.
> What we, in analysis, can do is to make sure we are aware of these UoDs
> , and make a conscious choice. That is something else than being subjective.

Why make the choice? Keep the data neutral and it's good for both tax
inspectors and priests, right?

JOG

Dec 4, 2007, 8:10:11 PM

Ok so one might summarize the following steps:
1) initial analysis of business processes and important concepts.
2) Formulation of an initial conceptual model (that is necessarily
slanted to a certain viewpoint of the UoD).
3) Translation into a nicely normalized logical model, that's query
neutral.
4) On demand, extract data back out from the neutral logical model,
shaping it either the original conceptual view, or other conceptual
views as needs arise from new applications.

Great. This all makes perfect sense, and is very clear to boot. A
simple process for creating a thorough yet flexible system. It seems
obvious even, right?

So why on earth would /anyone/ want to drop step 3? I'm at a loss as
to why certain cdt'ers (who are clearly intelligent people) seem to be
advocating this. An absolute loss I tell you.

>
> I apologize for this repsonse. It's really a lot more than you asked for.

Yes, how dare you respond with such clarity and thoroughness. Shame on
you.

David Cressey

Dec 4, 2007, 10:46:47 PM

"JOG" <j...@cs.nott.ac.uk> wrote in message

news:21fa63eb-49f4-4e70...@l16g2000hsf.googlegroups.com...

Excellent summary. I'd just add one more piece, that's really part of step
1.

1b) Discovery of the data.

Ruud de Koter

Dec 5, 2007, 1:09:03 AM

There are two troublesome points in your reaction. First of all 'keep
the data neutral' doesn't mean no choices are made. Staying neutral is a
choice as well. One of the hardest choices I'd say, because in order to
stay neutral, a thorough knowledge of the universes of discourse is
necessary. Also, these universes should not be mutually exclusive.

A second point: we can only keep the data neutral if we know all possible
perspectives. It is only then that we can consciously model the data to
fit all the universes of discourse. Yet, you rightly observe we don't
necessarily know all the applications, which amounts to saying we don't
know all the universes of discourse. So choices can not be avoided. In
that case I'd much rather make these conscious choices instead of
keeping up a pretense of neutrality. At the very least we should be
aware that the model resulting from analysis may be biased, and is not
the final word on the world out there.

Jan Hidders

Dec 5, 2007, 5:49:50 AM

You might want to add here that sometimes in this step you have to
integrate several different UoDs that do not necessarily agree on the
overlapping parts. I think that is mainly what Ruud is talking about,
plus the fact that at this stage you might want to anticipate a bit on
other UoDs that might have to be integrated in the future. You could
call that "making it more objective" but I think "making it less
subjective" is more precise. ;-)

> 3) Translation into a nicely normalized logical model, that's query
> neutral.

Normalization is really only a very minor issue here IMO. I've not had
that much personal practical experience in my life but I did work
briefly for two big Dutch companies that both had an organization in
charge of maintaining the global company data model that integrated
all data models from the applications and databases they had. I worked
with the guys that did this, and I remember being completely blown
away by how much variation there was in concepts such as employee and
order, even within a single company. I still admire these guys.

> 4) On demand, extract data back out from the neutral logical model,
> shaping it either the original conceptual view, or other conceptual
> views as needs arise from new applications.
>
> Great. This all makes perfect sense, and is very clear to boot. A
> simple process for creating a thorough yet flexible system. It seems
> obvious even, right?
>
> So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> to why certain cdt'ers (who are clearly intelligent people) seem to be
> advocating this. An absolute loss I tell you.

I'm not sure that is what Ruud is saying. Anyone else?

-- Jan Hidders

JOG

Dec 5, 2007, 7:24:31 AM

That's a fair point. Neutrality is something I've promoted for a long
time on cdt, and I understand there are issues for processor cycles.
However one can still take a single conceptual view of data in
analysis and flatten it out in the logical layer. Take David's
breakdown of the marriage example for instance - even though it is
translated from a single conceptual view, by the time it is in the
logical layer data may be extracted from it via the perspective of
marriage as an entity, or marriage as a relationship, with equal ease.
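
As a rough sketch of that last claim, using sqlite3 from Python: one table holds the marriage facts, and two views present them from either perspective. The table, view, and column names are invented for the illustration, and keying the table on the pair of spouses is a simplification (it would not stop the same couple being stored in reverse order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Neutral logical model: one table of facts about marriages.
    CREATE TABLE marriage (
        spouse_a TEXT NOT NULL,
        spouse_b TEXT NOT NULL,
        place    TEXT NOT NULL,
        PRIMARY KEY (spouse_a, spouse_b)
    );
    INSERT INTO marriage VALUES ('Fred', 'Wilma', 'Bedrock');
    INSERT INTO marriage VALUES ('Barney', 'Betty', 'Paris');

    -- People-centred view: marriage as a relationship between persons.
    CREATE VIEW spouse_of AS
        SELECT spouse_a AS person, spouse_b AS spouse FROM marriage
        UNION
        SELECT spouse_b, spouse_a FROM marriage;

    -- Marriage-centred view: each marriage as a thing with its own attributes.
    CREATE VIEW wedding AS
        SELECT spouse_a || ' & ' || spouse_b AS couple, place FROM marriage;
""")

# The tax collector's perspective: who is married to whom.
spouses = dict(conn.execute("SELECT person, spouse FROM spouse_of"))

# The church's perspective: a list of marriages and where they took place.
weddings = conn.execute(
    "SELECT couple, place FROM wedding ORDER BY couple"
).fetchall()
```

Neither view is privileged by the base table; the choice between them is made by whoever writes the query.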

>
> A second point: we can only keep the data neutral if know all possible
> perspectives. It is only then that we can consciously model the data to
> fit all the universes of discourse. Yet, you rightly observe we don't
> necessarily know all the applications, which amounts to saying we don't
> know all the universes of discourse. So choices can not be avoided. In
> that case I 'd much rather make these conscious choices instead of
> keeping up a pretense of neutrality. At the very least we should be
> aware that the model resulting from analysis may be biased, and is not
> the final word on the world out there.

I think maybe we are referring to a slightly different definition of
neutrality. I'm suggesting that a logical model should have no bias as
to whether things are relationships or entities, and leave that to be
determined by the person generating the queries. Regards, J.

David Cressey

Dec 5, 2007, 7:38:28 AM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:2b32638c-8c58-4ccc...@i12g2000prf.googlegroups.com...

> On 5 dec, 02:10, JOG <j...@cs.nott.ac.uk> wrote:

> > Ok so one might summarize the following steps:
> > 1) initial analysis of business processes and important concepts.
> > 2) Formulation of an initial conceptual model (that is necessarily
> > slanted to a certain viewpoint of the UoD).
>
> You might want to add here that sometimes in this step you have to
> integrate several different UoDs that do not necessarily agree on the
> overlapping parts. I think that is mainly what Ruud is talking about,
> plus the fact that at this stage you might want to anticipate a bit on
> other UoDs that might have to be integrated in the future. You could
> call that "making it more objective" but I think "making it less
> subjective" is more precise. ;-)

Excellent point. Integrating previously disjoint UofDs is a major step that
precedes effective information sharing. And the need for integration
generally eludes the people who have maintained their own data within their
own private fiefdom.

I don't agree that integrated data is "less subjective" than data from
separate UofDs. I think that what we achieve when we integrate UofDs is to
substitute one "catholic" subjective viewpoint for several "parochial"
subjective viewpoints.

My vision of database design is biased towards considering this the central
problem in database design. My bias comes from my history with database
development. When I was first coming up to speed on database matters, it
was impossible to build a database of any size or significance whatsoever
without obtaining data that was previously tucked away in files-and-records
applications that did not talk to each other. Neither the users nor the
programmers had the remotest idea of what integrated data would look like or
why it would be useful.

So coming up with an integrated UofD that everyone could more or less accept
was critical to the success of the database project. I think this is still
true if you think of databases as serving multiple applications, some of
which are not yet conceived, as a couple of people have pointed out in this
topic. Too many databases today are as parochial in their mission as files
of records were in my day.

>
> > 3) Translation into a nicely normalized logical model, that's query
> > neutral.
>
> Normalization is really only a very minor issue here IMO. I've not had
> that much personal practical experience in my life but I did work
> briefly for two big Dutch companies that both had an organization in
> charge of maintaining the global company data model that integrated
> all data models from the applications and databases they had. I worked
> with the guys that did this, and I remember being completely blown
> away buy how much variation there was in concepts such as employee and
> order, even within a single company. I still admire these guys.
>

"Normalization" as such is a corrective measure against data that turns out
to have been designed in an unfortunate manner. As such, it's a bottom up
approach. The approach I outlined above is a top down approach, and
results in a normalized model (at least up to 3NF) with no effort at
normalization as such. The ER model is neither normalized nor denormalized.
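
Here is a small sketch of what an attribute on the "wrong entity" looks like in data terms. It assumes a made-up reservation table where customer_city was attached to the reservation rather than to the customer; the helper fd_holds and the sample rows are illustrative only. Keep in mind that sample data can refute a functional dependency but never prove one.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the sample rows do not contradict the dependency lhs -> rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[col] for col in lhs)
        dependent = tuple(row[col] for col in rhs)
        # setdefault stores the first dependent value seen for this determinant;
        # any later row with a different value contradicts the dependency.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical rows, keyed on reservation_no.
reservations = [
    {"reservation_no": "R1", "customer_id": "C1", "customer_city": "Oslo"},
    {"reservation_no": "R2", "customer_id": "C1", "customer_city": "Oslo"},
    {"reservation_no": "R3", "customer_id": "C2", "customer_city": "Oslo"},
]

# customer_city is determined by customer_id, which is not a key of this
# table: a transitive dependency, so the table is not in 3NF.
misplaced = fd_holds(reservations, ["customer_id"], ["customer_city"])

# The reverse does not hold: two customers share a city.
reverse = fd_holds(reservations, ["customer_city"], ["customer_id"])
```

Moving customer_city to the customer table removes the problem, which is the same thing as saying the attribute belonged on the customer entity in the first place.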

I strongly suspect that Bob Badour's approach, centered on propositions,
is also a top down approach, and that the appropriate binding of attributes
into relvars is a natural consequence of discovering well formed
propositions rather than a corrective measure taken after an initial design
that deviates from normalization. But I don't have up close experience with
Bob's approach, so I'll defer to Bob's comments in this regard.

> > 4) On demand, extract data back out from the neutral logical model,
> > shaping it either the original conceptual view, or other conceptual
> > views as needs arise from new applications.
> >
> > Great. This all makes perfect sense, and is very clear to boot. A
> > simple process for creating a thorough yet flexible system. It seems
> > obvious even, right?
> >
> > So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> > to why certain cdt'ers (who are clearly intelligent people) seem to be
> > advocating this. An absolute loss I tell you.
>
> I'm not sure that is what Ruud is saying. Anyone else?

I'm not sure who the people in c.d.t. that want to drop step 3 are. I'm
surely not one.

I suspect that if anyone wants to drop step 3 it would be someone who wants
to use ER for database design purposes, or someone who wants to use a
pervasive object model to substitute for a relational logical model. Does
anyone here hold either of these views?

>
> -- Jan Hidders

JOG

Dec 5, 2007, 7:52:31 AM

Yes, it's this variation in concepts that is the issue for me really.
People on here are sharp developers in general, and understand the
subjective nature of the issues involved. But outside of here, in the
monkey world of IT, many who learn and employ ERM end up with the view
that an entity is a nice neat /objective/ partition of the world
(again David's example of how many different views of a 'course'
construct would serve them well). It's not necessarily the tool, but
the ethos of E/R and OOP literature seems to perpetuate this
misunderstanding. XML, for its sins, has never really shared this due
to its long history of schema transformation via XSLT.

As ever just musings ;)

>
> > 4) On demand, extract data back out from the neutral logical model,
> > shaping it either the original conceptual view, or other conceptual
> > views as needs arise from new applications.
>
> > Great. This all makes perfect sense, and is very clear to boot. A
> > simple process for creating a thorough yet flexible system. It seems
> > obvious even, right?
>
> > So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> > to why certain cdt'ers (who are clearly intelligent people) seem to be
> > advocating this. An absolute loss I tell you.
>
> I'm not sure that is what Ruud is saying. Anyone else?

Is this not what you were proposing yourself, Jan, in another thread -
that the logical layer should not be neutral, but rather based on
entities and relationships (and hence taking a specific conceptual
viewpoint)?

>
> -- Jan Hidders

Jon Heggland

Dec 5, 2007, 7:54:42 AM

Quoth David Cressey:

> Here's a website I stmbled across:
>
> http://www.islandnet.com/~tmc/html/articles/datamodl.htm
>
> Note that, at the start of the introduction, the author says that analysis
> is the most important part of any project. That's rather different from the
> impression I've gotten in response to my topic on "what is analysis".

Well, that depends on what analysis is. It seems this guy thinks it's
the same as data modeling, which in turn is the same as developing a
graphical representation of the client's needs and processes. Is it?
Furthermore, you could interpret Marshall's and my response as "we don't
do analysis, we just start coding", but I don't think that's what we
mean. Myself, I'm skeptical of presenting analysis as a very separate,
distinct kind of activity, defined by the kinds of artifacts it
produces, i.e. "pretty pictures" to use Bob's term.

But I digress. This was what I meant to respond to:

> By the way, I don't like the author's dialect of ER. In particular, his
> topic on "resolving many-to-many relationships" is, I believe extraneous
> to ER. His reification of a "watering" reminds me of the term "association
> entity" that someone wrote in reposnse to me a few days ago.
>
> In analysis, there is nothing to resolve in a many-to-many relationship.
> You only have to resolve it when you are designing relational tables or
> relvars.

Both yes and no. Reifying relationships can be helpful, but /not/
because "Many-to-many relationships cannot be directly converted into
database tables and relationships". The point is rather to make it
easier to discover their properties---their attributes, mainly, but
potentially also other things, e.g. constraints. When I discover a
many-to-many-relationship, I usually make it a box, with a name, and ask
if there is anything else we want to be able to say about this thing.
Often, there is. If there isn't, I can demote it to a line again.

This mainly applies to many-to-many-relationships, because business
rules / attributes / constraints regarding a one-to-many-relationship
are often better relegated to the entity on the many-side (though not
always, of course). It has little to do with the implementation (or
design?) of many-to-many-relationships in relational databases.

Some might argue that reifying relationships is unnecessary, since
relationships in "good" E/R dialects can have attributes. What, then, is
the difference between an entity and a relationship? The best answer I
can think of is that an entity is identified by itself, while a
relationship is identified by its entities. But what if something has
more than one way of identification (i.e. multiple keys)? This is where
classic E/R breaks down for me. A "relationship" may be identified by
its entities, but also by (say) just one of its entities in combination
with a subset of its attributes. And/or perhaps a subset of its
attributes, disregarding any entities. Is it then a relationship, a weak
entity, or an entity?

This is turning into a rant against the classic(?) E/R notation, but
here goes anyway. I think it's a bad idea that more than one kind of
thing can have attributes. I think it's a bad idea that there are two
(or more) different ways of indicating how something is identified.
Relationship diamonds are required for non-binary relationships, but are
just clutter for binary ones---bad idea.

Fortunately, there is (at least) one E/R dialect that resolves all these
issues, and in so doing, even makes the distinction between entities and
relationships far less important.

Apropos this distinction: As to whether marriage is a relationship or an
entity, you said that one should listen to the subject matter experts. I
have never had such an expert say to me, "No, that's not a relationship,
that's an entity!" or vice versa. Have you?
--
Jon

JOG

Dec 5, 2007, 8:02:04 AM

I told my other half that we weren't in a relationship, and that we
should go off and get entity counselling.

Now she thinks I'm weird.

Bob Badour

Dec 5, 2007, 10:27:03 AM

Jon Heggland wrote:

The difference between entity and relationship is neither more nor less
than psychological prejudice or bias. The distinction is entirely imagined.

> This is turning into a rant against the classic(?) E/R notation, but
> here goes anyway. I think it's a bad idea that more than one kind of
> thing can have attributes. I think it's a bad idea that there are two
> (or more) different ways of indicating how something is identified.
> Relationship diamonds are required for non-binary relationships, but are
> just clutter for binary ones---bad idea.
>
> Fortunately, there is (at least) one E/R dialect that resolves all these
> issues, and in so doing, even makes the distinction between entities and
> relationships far less important.
>
> Apropos this distinction: As to whether marriage is a relationship or an
> entity, you said that one should listen to the subject matter experts. I
> have never had such an expert say to me, "No, that's not a relationship,
> that's an entity!" or vice versa. Have you?

Most SMEs just agree when they look at a pretty picture because they
don't want to look stupid for not understanding the notation. NIAM's
formalized English works so much better because when an SME disagrees
with a statement, the SME speaks up. They immediately say: "No, that's
wrong." or "No, that's not always right."
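
To give a feel for the idea, here is a toy sketch that turns one side of a binary relationship into a plain sentence an SME can accept or reject. It is only loosely in the spirit of that approach and is not NIAM's actual grammar; the function name verbalize, its parameters, and the sentence templates are all made up for the illustration.

```python
def verbalize(subject, verb, obj, many):
    """Render one side of a binary relationship as a checkable sentence."""
    if many:
        return f"A {subject} can {verb} more than one {obj}."
    return f"Each {subject} can {verb} at most one {obj}."


# Hypothetical statements to read back to the subject matter experts.
statements = [
    verbalize("customer", "hold", "reservation", many=True),
    verbalize("reservation", "belong to", "customer", many=False),
]

for sentence in statements:
    print(sentence)
```

A crow's foot on a diagram and the first sentence carry the same constraint, but only the sentence invites a "No, that's not always right."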

David Cressey

Dec 5, 2007, 10:50:26 AM

"Jon Heggland" <jon.he...@ntnu.no> wrote in message
news:fj6737$o2p$1...@orkan.itea.ntnu.no...

> Quoth David Cressey:
> > Here's a website I stmbled across:
> >
> > http://www.islandnet.com/~tmc/html/articles/datamodl.htm
> >
> > Note that, at the start of the introduction, the author says that
> > analysis
> > is the most important part of any project. That's rather different from
> > the
> > impression I've gotten in response to my topic on "what is analysis".
>
> Well, that depends on what analysis is. It seems this guy thinks it's
> the same as data modeling, which in turn is the same as developing a
> graphical representation of the client's needs and processes. Is it?
> Furthermore, you could interpret Marshall's and my response as "we don't
> do analysis, we just start coding", but I don't think that's what we
> mean. Myself, I'm skeptical of presenting analysis as a very separate,
> distinct kind of activity, defined by the kinds of artifacts it
> produces, i.e. "pretty pictures" to use Bob's term.
>

Not all modeling is analysis. Some of it is design. In particular, I'm
going to claim that you discover attributes, but you design relvars. I've
already had the second claim confirmed by Bob and others.

Bob's distaste for pretty pictures should not obscure the main theme. A
model isn't a "pretty picture" as such. Rather, a "pretty picture" is the
projection of a model on a flat screen. Other projections have been
proposed. A table written on a whiteboard, with some imaginary sample data
written into it, proposed by another participant, is another projection of
a model on a flat screen.

Whether a pretty picture was worth the cost of making it depends on what
happens next.

If you look at the metadata in the implemented database, none.

> The best answer I
> can think of is that an entity is identified by itself, while a
> relationship is identified by its entities. But what if something has
> more than one way of identification (i.e. multiple keys)? This is where
> classic E/R breaks down for me. A "relationship" may be identified by
> its entities, but also by (say) just one of its entities in combination
> with a subset of its attributes. And/or perhaps a subset of its
> attributes, disregarding any entities. Is it then a relationship, a weak
> entity, or an entity?
>
> This is turning into a rant against the classic(?) E/R notation, but
> here goes anyway. I think it's a bad idea that more than one kind of
> thing can have attributes. I think it's a bad idea that there are two
> (or more) different ways of indicating how something is identified.
> Relationship diamonds are required for non-binary relationships, but are
> just clutter for binary ones---bad idea.
>
> Fortunately, there is (at least) one E/R dialect that resolves all these
> issues, and in so doing, even makes the distinction between entities and
> relationships far less important.
>
> Apropos this distinction: As to whether marriage is a relationship or an
> entity, you said that one should listen to the subject matter experts. I
> have never had such an expert say to me, "No, that's not a relationship,
> that's an entity!" or vice versa. Have you?

Not in so many words. But they have said things like "a reservation for a
certain car, on a certain date, by a certain customer has a way of
identifying it. We call it a 'reservation number'." What you have now
learned is that the UofD people think of a reservation as a thing in and of
itself and not just an association between a customer and a car on some
future date.

This tells you something you need to know about the problem statement: The
database has to store reservation numbers.

It also tells you something you need to know about database design: you
have two candidate keys for identifying a relationship, and eventually, a
relvar. One is reservation number. The other is customer ID, car type,
and date. If you declare primary keys in your database, you need to pick
one of these. This could have consequences for performance, ease of
programming, "natural joins" etc. etc. You also need to anticipate that
the application programmers are going to want to be able to find a
reservation, or the absence of a reservation (CWA), based on the
reservation number, based on a slip of paper the customer hands the clerk,
or based on the customer, the car type, and the date.

In some cases, the business rules will make the design decision for you. In
other cases, the business rules are silent on this score.

Bob Badour

Dec 5, 2007, 11:46:26 AM

David Cressey wrote:

Pretty pictures have subtle pitfalls and limiting characteristics.
Learning to think without them and to communicate without them improves
both thought and communication.

Where do you have to look to find any difference? (Other than one is
drawn as a box and the other as a line or diamond.)

Protecting the integrity of data is a primary goal of data management.
If one wants to manage one's data, one must declare all candidate keys.
Whether one needs to pick one to designate as primary is secondary to this.
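
For the reservation example, declaring both candidate keys might look like the following sketch in sqlite3 from Python. The table and column names are hypothetical, and which key gets the PRIMARY KEY label here is arbitrary; the point is that both are declared and both are enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservation (
        reservation_no TEXT NOT NULL,
        customer_id    TEXT NOT NULL,
        car_type       TEXT NOT NULL,
        pickup_date    TEXT NOT NULL,
        PRIMARY KEY (reservation_no),                -- candidate key 1
        UNIQUE (customer_id, car_type, pickup_date)  -- candidate key 2
    )
""")
conn.execute(
    "INSERT INTO reservation VALUES ('R100', 'C1', 'compact', '2007-12-24')"
)


def try_insert(row):
    """Insert a reservation; return False if either key would be violated."""
    try:
        conn.execute("INSERT INTO reservation VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


def find_by_number(reservation_no):
    """Look up by reservation number; None means no such reservation."""
    return conn.execute(
        "SELECT customer_id, car_type, pickup_date "
        "FROM reservation WHERE reservation_no = ?",
        (reservation_no,),
    ).fetchone()


def find_by_details(customer_id, car_type, pickup_date):
    """Look up by the other candidate key; None means no such reservation."""
    row = conn.execute(
        "SELECT reservation_no FROM reservation "
        "WHERE customer_id = ? AND car_type = ? AND pickup_date = ?",
        (customer_id, car_type, pickup_date),
    ).fetchone()
    return row[0] if row else None
```

A duplicate reservation number and a duplicate customer/car type/date combination are both rejected, and a reservation can be found, or shown to be absent, through either key.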

> This could have consequences for performance, ease of
> programming, "natural joins" etc. etc.

Performance is independent of choices at the logical level of discourse
where one identifies candidate keys or designates primary keys.
Performance is only affected at the physical level of discourse.

> You also need to anticipate that
> the application programmers are going to want to be able to find a
> reservation, or the absence of a reservation (CWA), based on the
> reservation number, based on a slip of paper the customer hands the clerk,
> or based on the customer, the car type, and the date.
>
> In some cases, the business rules will make the design decision for you. In
> other cases, the business rules are silent on this score.

I disagree. First, what the application programmers want is irrelevant.
They are paid to meet the needs of the organization not their own whim.
Second, business rules are essentially synonymous with what the
organization needs.

Brian Selzer

Dec 5, 2007, 12:09:37 PM

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message

news:4756d5e5$0$5261$9a56...@news.aliant.net...

A picture is worth a thousand words. Only an idiot would try to use a
screwdriver to drive a nail. Only a lunatic would choose to use a
screwdriver when there is a perfectly good hammer available.

David Cressey

Dec 5, 2007, 2:15:01 PM

"paul c" <toledob...@ooyah.ac> wrote in message
news:0LB5j.6$iU.1@pd7urf2no...

> David Cressey wrote:
> > "Jan Hidders" <hid...@gmail.com> wrote in message

> ...

> > ...
>
> I suspect that two or more equally "correct" normalizations, correct in
> terms of theory, often suggest themselves.

I agree with that, but it doesn't address the issue of whether you design
an unnormalized schema and then normalize it on the one hand or on the other
hand start with something from which you can design a schema that will
already be normalized.

David Cressey

Dec 5, 2007, 4:38:10 PM

"paul c" <toledob...@ooyah.ac> wrote in message

news:9uD5j.1478$sg.563@pd7urf1no...

> David Cressey wrote:
> > "paul c" <toledob...@ooyah.ac> wrote in message
> > news:0LB5j.6$iU.1@pd7urf2no...

> ...
> >> I suspect that two or more equally "correct" normalizations, correct in
> >> terms of theory, often suggest themselves.
> >
> > I agree with that, but it doesn't address the issue of whether you design
> > an unnormalized schema and then normalize it on the one hand or on the other
> > hand start with something from which you can design a schema that will
> > already be normalized.
>

> Why should anybody except possibly recipe-followers, conclude that, eg.,
> a top-down, or bottom-up, or indeed stepwise or iterative style dictates
> any such order, eg., anything but re-visiting motives such as
> normalization during implementation, as necessary? Also I would think
> "as necessary" has more to do with the motives behind normalization.
>
> The notion of "point-in-time" process analysis/design steps has more to
> do with collective/group project management and organizational dogmas
> than with skilled interpretation aimed at implementation of those steps.

Good point. Good stopping point (at least for me).

Jan Hidders

Dec 5, 2007, 7:18:07 PM

Why would an ER schema be necessarily less neutral than a relational
schema?

-- Jan Hidders

JOG

Dec 5, 2007, 8:47:33 PM

On Dec 6, 12:18 am, Jan Hidders <hidd...@gmail.com> wrote:
> On 5 dec, 13:52, JOG <j...@cs.nott.ac.uk> wrote:
>
>
>
> > On Dec 5, 10:49 am, Jan Hidders <hidd...@gmail.com> wrote:
>
> > > On 5 dec, 02:10, JOG <j...@cs.nott.ac.uk> wrote:
>
> > > > So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> > > > to why certain cdt'ers (who are clearly intelligent people) seem to be
> > > > advocating this. An absolute loss I tell you.
>
> > > I'm not sure that is what Ruud is saying. Anyone else?
>

> > Is this not what you were proposing yourself Jan in another thread -

> > that the logical layer should not be neutral, but rather based on
> > entities and relationships (and hence taking a specific conceptual
> > viewpoint)?

Did I misunderstand, or is this not what you currently favour?

>
> Why would an ER schema be necessarily less neutral than a relational
> schema?
>
> -- Jan Hidders

Either we are crossing terminology, or this has already been
highlighted earlier in the thread with reference to marriages. E/R
forces one to pick a single conceptual viewpoint (marriage as
relationship/marriage as entity, etc), whereas a propositional
encoding is neutral on the topic.

David BL

Dec 5, 2007, 11:33:59 PM

On Dec 6, 10:47 am, JOG <j...@cs.nott.ac.uk> wrote:

> Either we are crossing terminology, or this has already been
> highlighted earlier in the thread with reference to marriages. E/R
> forces one to pick a single conceptual viewpoint (marriage as
> relationship/marriage as entity, etc), whereas a propositional
> encoding is neutral on the topic.

There is something I may be misunderstanding - can you put me right?

I think an entity is characterised as something in the real world that
we want to be able to reference using a reasonably small identifier.
Typically an entity is subject to change over time, and we may not
have convenient access to a nice, convenient, small number of
measurable attributes that uniquely identifies it. Nevertheless we
believe that the identifier can be used successfully in the real
world. This could depend on the population statistics, such as the
way we identify people around us by their facial characteristics, and
it is helpful that people don't tend to have extensive plastic
surgery, exchange body parts on a regular basis or cross dress. This
allows them to be identified by names like "John".

By contrast I think a relationship is characterised as only being
identified by the entities that it relates.

Now I see the same distinction being made in a propositional
encoding.

The intensional definition of the following predicate implicitly
assumes Husband, Wife and Location correspond to entity types that can
be identified using domain values. Evidently marriage is merely a
relationship between them.

married(Husband, Wife, Location) :- Husband married Wife at
Location

By contrast the following predicates are consistent with thinking of a
marriage itself as an entity

husband(MarriageId, Husband).
wife(MarriageId, Wife).
location(MarriageId, Location).

or maybe just

married(MarriageId, Husband, Wife, Location)

Now whether one "thinks in ER" or "thinks in propositional encodings",
there has to be good reason to introduce a MarriageId.

Aren't you implying that a propositional encoding doesn't commit you
to a decision about whether a marriage is implicitly or explicitly
identified? I fail to see how that is possible.

Jon Heggland

Dec 6, 2007, 3:35:10 AM

Quoth Bob Badour:

> David Cressey wrote:
>> This could have consequences for performance, ease of
>> programming, "natural joins" etc. etc.
>
> Performance is independent of choices at the logical level of discourse
> where one identifies candidate keys or designates primary keys.

Minor point: Is designation of primary keys really at the logical level?
Are there any substantial logical consequences?
--
Jon

Jon Heggland

Dec 6, 2007, 3:37:32 AM

Quoth David BL:

> By contrast I think a relationship is characterised as only being
> identified by the entities that it relates.
>
> Now I see the same distinction being made in a propositional
> encoding.
>
> The intensional definition of the following predicate implicitly
> assumes Husband, Wife and Location correspond to entity types that can
> be identified using domain values. Evidently marriage is merely a
> relationship between them.
>
> married(Husband, Wife, Location) :- Husband married Wife at
> Location

So you intend that Location is necessary for identification? What if
Location does not correspond to an entity type? I assume that might be
the case if it were simply a spatial coordinate. Is marriage still a
relationship? If not, what is it?

(And "merely" a relationship? Are entities better than relationships in
some way?)

> By contrast the following predicates are consistent with thinking of a
> marriage itself as an entity
>
> husband(MarriageId, Husband).
> wife(MarriageId, Wife).
> location(MarriageId, Location).
>
> or maybe just
>
> married(MarriageId, Husband, Wife, Location)
>
> Now whether one "thinks in ER" or "thinks in propositional encodings",
> there has to be good reason to introduce a MarriageId.

Perhaps marriage certificates have a unique number stamped on them, and
you want to record this in your database.

Anyway, your first predicate breaks down if two people can marry each
other more than once at the same location, which is the case in many
parts of the world. I'd suggest you add a DateTime as well. Does that
make the predicate an entity? Assuming monogamy, it can be identified by
{ Husband, DateTime }, { Wife, DateTime } or perhaps even { Location,
DateTime } if we preclude mass weddings. What if the DateTime
"corresponds to an entity type"? (What determines whether there is such
a correspondence?)
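
To make the alternatives concrete, here is a rough sketch that enumerates the minimal attribute sets that are unique within a sample of marriage facts. The sample rows and attribute names are made up. Note that sample data can only refute a proposed key; whether a key really holds is a business rule, not something the data can prove.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this sample, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


ATTRS = ("husband", "wife", "location", "datetime")

# Hypothetical sample: one couple marries twice at the same location,
# and two weddings at different locations share a DateTime.
marriages = [
    {"husband": "Adam", "wife": "Beth", "location": "Oslo", "datetime": "t1"},
    {"husband": "Adam", "wife": "Beth", "location": "Oslo", "datetime": "t2"},
    {"husband": "Carl", "wife": "Dana", "location": "Oslo", "datetime": "t3"},
    {"husband": "Carl", "wife": "Dana", "location": "Rome", "datetime": "t1"},
]

for key in candidate_keys(marriages, ATTRS):
    print(key)
```

With this sample, { Husband, Wife, Location } is not unique, while each of the three pairs that include DateTime is.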

> Aren't you implying that a propositional encoding doesn't commit you
> to a decision about whether a marriage is implicitly or explicitly
> identified? I fail to see how that is possible.

You define relationships as anything that can be identified /only/ by
its related entities---if there is an alternate key, the thing is an
entity. Correct? What is the rationale behind this rule?

(And what if the alternative means of identification is also a
combination of entities?)
--
Jon

Jon Heggland

Dec 6, 2007, 4:30:00 AM

Quoth David Cressey:

> "Jon Heggland" <jon.he...@ntnu.no> wrote in message
> news:fj6737$o2p$1...@orkan.itea.ntnu.no...
>> Quoth David Cressey:

> Not all modeling is analysis. Some of it is design. In particular, I'm
> going to claim that you discover attributes, but you design relvars. I've
> already had the second claim confirmed by Bob and others.

Yet when you are discovering attributes, you presumably write them down
somehow. Is it the case that if you do it using E/R notation, you are
doing analysis, but if you do it using some relation- or predicate-based
representation, you are doing design?

Or perhaps it's simpler: Analysis is what you're doing when you're
talking with the subject matter experts; design is what you're doing
when you're not. :)

> Bob's distaste for pretty pictures should not obscure the main theme. A
> model isn't a "pretty picture" as such. Rather, a "pretty picture" is the
> projection of a model on a flat screen. Other projections have been
> proposed. A table written on a whiteboard, with some imaginary sample data
> written into it, proposed by another participant, is another projection of
> a model on a flat screen.

Ceci n'est pas une pipe... Then is the model wholly intangible, existing
only in a platonic sense inside the designer's (or analyst's) head? Never
mind, I see your point---but that doesn't answer my question: When is
modelling design, and when is it analysis? A bald statement that
relation-based models are designed doesn't cut it, even if it is
seconded; I need rational arguments.

>> issues, and in so doing, even makes the distinction between entities and
>> relationships far less important.
>>
>> Apropos this distinction: As to whether marriage is a relationship or an
>> entity, you said that one should listen to the subject matter experts. I
>> have never had such an expert say to me, "No, that's not a relationship,
>> that's an entity!" or vice versa. Have you?
>
> Not in so many words. But they have said things like "a reservation for a
> certain car, on a certain date, by a certain customer has a way of
> identifying it. We call it a 'reservation number'." What you have now
> learned is that the UofD people think of a reservation as a thing in and of
> itself and not just an association between a customer and a car on some
> future date.
>
> This tells you something you need to know about the problem statement: The
> database has to store reservation numbers.

Definitely. But this is the (only) important bit. That a reservation is
a "thing" is fluff, not crunch.

> It also tells you something you need to know about database design: you
> have two candidate keys for identifying a relationship, and eventually, a
> relvar. One is reservation number. The other is customer ID, car type,
> and date.

So the thing in and of itself is still a relationship?

In most E/R notations, you cannot represent the alternate
key---reservation number---if a reservation is a relationship. Vice
versa, if it is an entity, you cannot represent the { CustomerID, CarID,
Date } key. This means that you have to have an underlying model, of
which any graphical E/R diagrams are merely simplified views. I agree with
this, but it raises two points:

1. The underlying model cannot have a strict distinction between
entities and relationships, since the same concept---reservation---can
be thought of and presented as both. This relegates entity/relationship
thinking to a question of presentation, not analysis.

2. What is the formalism (if any) of this underlying model? Is
relational theory forbidden if I'm doing analysis? What can I use instead?

> If you declare primary keys in your database, you need to pick
> one of these. This could have consequences for performance, ease of
> programming, "natural joins" etc. etc. You also need to anticipate that
> the application programmers are going to want to be able to find a
> reservation, or the absence of a reservation (CWA), based on the
> reservation number, based on a slip of paper the customer hands the clerk,
> or based on the customer, the car type, and the date.
>
> In some cases, the business rules will make the design decision for you. In
> other cases, the business rules are silent on this score.

You started with users telling you that reservations have reservation
numbers; I believe all will agree that that is analysis. And you end up
here with a "design decision". At what point did the transition from
analysis to design happen?

I think I'll retract my initial definition of what analysis and design
are, and use the one at the top of this post instead. :)
--
Jon

Jan Hidders

Dec 6, 2007, 4:56:52 AM

On 6 dec, 02:47, JOG <j...@cs.nott.ac.uk> wrote:
> On Dec 6, 12:18 am, Jan Hidders <hidd...@gmail.com> wrote:
>
>
>
> > On 5 dec, 13:52, JOG <j...@cs.nott.ac.uk> wrote:
>
> > > On Dec 5, 10:49 am, Jan Hidders <hidd...@gmail.com> wrote:
>
> > > > On 5 dec, 02:10, JOG <j...@cs.nott.ac.uk> wrote:
>
> > > > > So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> > > > > to why certain cdt'ers (who are clearly intelligent people) seem to be
> > > > > advocating this. An absolute loss I tell you.
>
> > > > I'm not sure that is what Ruud is saying. Anyone else?
>
> > > Is this not what you were proposing yourself Jan in another thread -
> > > that the logical layer should not be neutral, but rather based on
> > > entities and relationships (and hence taking a specific conceptual
> > > viewpoint)?
>
> Did I misunderstand, or is this not what you currently favour?

I thought my question made that clear, but to be completely explicit
about this: I certainly agree that the logical layer should be as
neutral as possible, but I don't accept the idea that formulating it
as an ER schema makes it necessarily less neutral than formulating it
in a relational schema.

> > Why would an ER schema be necessarily less neutral than a relational
> > schema?
>

> Either we are crossing terminology, or this has already been
> highlighted earlier in the thread with reference to marriages.

Quite possible. To save time I am a bit picky about what I read and
don't read, even within a thread I find interesting, so I may have
missed something.

> E/R
> forces one to pick a single conceptual viewpoint (marriage as
> relationship/marriage as entity, etc), whereas a propositional
> encoding is neutral on the topic.

I'm sorry but that is complete nonsense. The two possible ER diagrams
you're talking about here are mapped to different relational schemas
by the usual mapping. So in the relational setting you also have to
make that choice.

-- Jan Hidders

JOG

Dec 6, 2007, 5:29:53 AM

On Dec 6, 4:33 am, David BL <davi...@iinet.net.au> wrote:
> On Dec 6, 10:47 am, JOG <j...@cs.nott.ac.uk> wrote:
>
> > Either we are crossing terminology, or this has already been
> > highlighted earlier in the thread with reference to marriages. E/R
> > forces one to pick a single conceptual viewpoint (marriage as
> > relationship/marriage as entity, etc), whereas a propositional
> > encoding is neutral on the topic.
>
> There is something I may be misunderstanding - can you put me right?

I'll certainly lend you my point of view. I offer no guarantees that
it's right :)

>
> I think an entity is characterised as something in the real world that
> we want to be able to reference using a reasonably small identifier.

I agree with the identification aspect, but see a small identifier as
being an attractive quality as opposed to a definitional one.

> Typically an entity is subject to change over time, and we may not
> have convenient access to a nice, convenient, small number of
> measurable attributes that uniquely identifies it.

Sure. And if the identifier is a large unwieldy set of attributes for
ease we might invent a surrogate to represent that set.

> Nevertheless we
> believe that the identifier can be used successfully in the real
> world. This could depend on the population statistics, such as the
> way we identify people around us by their facial characteristics, and
> it is helpful that people don't tend to have extensive plastic
> surgery, exchange body parts on a regular basis or cross dress. This
> allows them to be identified by names like "John".

I'm not sure I'm following this entirely David. "John" is just another
attribute, and it can be used to refer to someone, but we know it
wouldn't identify them uniquely. The reason we can get away with using
it in day to day conversation is because humans are incredibly good at
resolving context (whereas a computer is not). We'd only use someone's
first name alone when we know its use will not be ambiguous.

>
> By contrast I think a relationship is characterised as only being
> identified by the entities that it relates.

Well, coupla things. Relationships can have attributes too, and they
can be part of the identifying set. And second, the attributes of an
entity... well why can't they too be entities? Consider a 'team'
entity say. It could be identified by the players in it. And those
players are all entities themselves right? Perhaps there is a formal
distinction between relationships and entities but I'm yet to read
one, so until then, I have to posit that there is no difference
between them.

>
> Now I see the same distinction being made in a propositional
> encoding.

I'm not sure why - there are no entities in a propositional encoding,
just roles and values.

>
> The intensional definition of the following predicate implicitly
> assumes Husband, Wife and Location correspond to entity types that can
> be identified using domain values. Evidently marriage is merely a
> relationship between them.
>
> married(Husband, Wife, Location) :- Husband married Wife at
> Location
>
> By contrast the following predicates are consistent with thinking of a
> marriage itself as an entity
>
> husband(MarriageId, Husband).
> wife(MarriageId, Wife).
> location(MarriageId, Location).
>
> or maybe just
>
> married(MarriageId, Husband, Wife, Location)

Summarizing the above I gathered that you are saying that:
RELATIONSHIP: married(Husband, Wife, Location)
ENTITY: married(MarriageId, Husband, Wife, Location)

So the only difference is that the entity has a marriageID? I am not
clear why you think the addition of this surrogate would change a
relationship into an entity!

>
> Now whether one "thinks in ER" or "thinks in propositional encodings",
> there has to be good reason to introduce a MarriageId.

I'm not clear at all why you have introduced a MarriageID. If the
'relationship' is identifiable by {H,W} say then so is the 'entity'.
Why have you introduced the surrogate?
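
A small sketch of that point, using Python sets of tuples as stand-ins for the two predicates. The names and values are hypothetical, and the rule that {H,W} identifies a marriage is assumed rather than derived:

```python
# "Relationship" form: married(Husband, Wife, Location)
relationship = {
    ("Adam", "Beth", "Oslo"),
    ("Carl", "Dana", "Rome"),
}

# "Entity" form: married(MarriageId, Husband, Wife, Location)
entity = {
    (1, "Adam", "Beth", "Oslo"),
    (2, "Carl", "Dana", "Rome"),
}


def project(rows, positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(row[i] for i in positions) for row in rows}


def is_key(rows, positions):
    """True if the projection keeps every row distinct in this sample."""
    return len(project(rows, positions)) == len(rows)


# Dropping the surrogate gives back the original facts.
print(project(entity, (1, 2, 3)) == relationship)

# {H,W} still identifies each row once the surrogate has been added.
print(is_key(relationship, (0, 1)), is_key(entity, (1, 2)))
```

In this sample the surrogate adds a second way of identifying a row without taking the first one away.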

>
> Aren't you implying that a propositional encoding doesn't commit you
> to a decision about whether a marriage is implicitly or explicitly
> identified?

I certainly hope not ;) All things are identified by their
attributes, so I don't really see this implicit/explicit split.

David Cressey

Dec 6, 2007, 5:58:57 AM

"Jon Heggland" <jon.he...@ntnu.no> wrote in message

news:fj8fff$3ed$1...@orkan.itea.ntnu.no...

If you are modeling features of the problem, it's analysis. If you are
modeling features of both the problem and the solution, it's design.

Jon Heggland

Dec 6, 2007, 6:22:31 AM

Quoth David Cressey:
> "Jon Heggland" <jon.he...@ntnu.no> wrote in message

> news:fj8fff$3ed$1...@orkan.itea.ntnu.no...

>> mind, I see your point---but that doesn't answer my question: When is
>> modelling design, and when is it analysis? A bald statement that
>> relation-based models are designed doesn't cut it, even if it is
>> seconded; I need rational arguments.
>
> If you are modeling features of the problem, it's analysis. If you are
> modeling features of both the problem and the solution, it's design.

Still too hand-wavy for my taste. You say you discover attributes as
part of analysis, but those attributes are also features of the solution.

Are you going to respond to the other points in my post?
--
Jon

JOG

Dec 6, 2007, 7:19:48 AM

On Dec 6, 10:58 am, "David Cressey" <cresse...@verizon.net> wrote:
> "Jon Heggland" <jon.heggl...@ntnu.no> wrote in message

>
> news:fj8fff$3ed$1...@orkan.itea.ntnu.no...
>
>
>
> > Quoth David Cressey:

> > > "Jon Heggland" <jon.heggl...@ntnu.no> wrote in message

May I also add that research journals don't seem to agree with the
definition of E/R as an analysis tool.

This is from July 2007, Transactions on Information Systems, one of
the ACM's top rated journals:

"Entity-relationship (ER) modeling is a widely accepted technique for
conceptual database /design/."

Bob Badour

Dec 6, 2007, 8:57:37 AM

Jon Heggland wrote:

> Quoth Bob Badour:
>
>>David Cressey wrote:
>>
>>>This could have consequences for performance, ease of
>>>programming, "natural joins" etc. etc.
>>
>>Performance is independent of choices at the logical level of discourse
>>where one identifies candidate keys or designates primary keys.
>
>
> Minor point: Is designation of primary keys really at the logical level?

Yes.

> Are there any substantial logical consequences?

The preponderance of foreign key references will refer to the primary
key when one designates a primary key.

Jon Heggland

Dec 6, 2007, 9:16:56 AM

Quoth Bob Badour:

> Jon Heggland wrote:
>> Minor point: Is designation of primary keys really at the logical level?
>
> Yes.
>
>> Are there any substantial logical consequences?
>
> The preponderance of foreign key references will refer to the primary
> key when one designates a primary key.

Granted, but it will be explicit which key a foreign key refers to (or
it will be implicitly given by the attribute names in the foreign key).
Primary key designation is neither necessary nor sufficient for this.
--
Jon

Jon Heggland

Dec 6, 2007, 9:20:33 AM

Quoth Jon Heggland:

> Granted, but it will be explicit which key a foreign key refers to (or
> it will be implicitly given by the attribute names in the foreign key).

Oops, sorry. SQL does indeed use primary key designation, not column
names, if you're not explicit. But SQL is well-known for mixing logical
and physical concerns. Anyway, I don't think this is a good enough
reason to say that key primacy is a logical issue.
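
A sketch of the two ways of naming the referenced key, using SQLite through Python's sqlite3 module as one example of an SQL system (other products may behave differently). All table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE reservation (
        reservation_no INTEGER PRIMARY KEY,
        customer_id    INTEGER NOT NULL,
        car_type       TEXT    NOT NULL,
        pickup_date    TEXT    NOT NULL,
        UNIQUE (customer_id, car_type, pickup_date)
    );

    -- No column list: the reference falls back to the primary key.
    CREATE TABLE payment (
        reservation_no INTEGER NOT NULL REFERENCES reservation,
        amount         INTEGER NOT NULL
    );

    -- Explicit column list: the reference targets the other candidate key.
    CREATE TABLE pickup (
        customer_id INTEGER NOT NULL,
        car_type    TEXT    NOT NULL,
        pickup_date TEXT    NOT NULL,
        FOREIGN KEY (customer_id, car_type, pickup_date)
            REFERENCES reservation (customer_id, car_type, pickup_date)
    );

    INSERT INTO reservation VALUES (1001, 7, 'compact', '2007-12-24');
""")


def accepts(sql, params):
    """Return True if the insert satisfies every declared constraint."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts("INSERT INTO payment VALUES (?, ?)", (1001, 50)))
print(accepts("INSERT INTO payment VALUES (?, ?)", (9999, 50)))
print(accepts("INSERT INTO pickup VALUES (?, ?, ?)", (7, "compact", "2007-12-24")))
print(accepts("INSERT INTO pickup VALUES (?, ?, ?)", (7, "suv", "2007-12-24")))
```

Both references are enforced in the same way; the primary designation only decides which key is meant when the column list is left out.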
--
Jon

Bob Badour

Dec 6, 2007, 9:31:15 AM

Jon Heggland wrote:

Keys and references are logical issues and not physical issues. Physical
issues affect only performance.

Jon Heggland

Dec 6, 2007, 9:48:02 AM

Quoth Bob Badour:

> Keys and references are logical issues and not physical issues. Physical
> issues affect only performance.

Keys are, but key /primacy/ is not. If one ignores any performance
differences between a primary key and any other key (as of course one
should, although it may not be possible in current SQL systems), the
remaining difference is merely syntactical convenience---hardly a
logical issue.
--
Jon

David Cressey

Dec 6, 2007, 10:39:28 AM

"Jon Heggland" <jon.he...@ntnu.no> wrote in message

news:fj909f$div$1...@orkan.itea.ntnu.no...

"So often times it happens that we live our lives in chains
And we never even know we have the key "

The Eagles (Already Gone).

David Cressey

Dec 6, 2007, 10:41:00 AM

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:9870400b-bfd4-452a...@s8g2000prg.googlegroups.com...

Unfortunate nomenclature.

Bob Badour

Dec 6, 2007, 11:41:13 AM

Jon Heggland wrote:

That's an interesting take on it. While it is a syntactical shorthand,
it is a shorthand for a logical consideration.

Brian Selzer

Dec 6, 2007, 12:07:14 PM

"Jon Heggland" <jon.he...@ntnu.no> wrote in message

news:fj8fff$3ed$1...@orkan.itea.ntnu.no...

> Quoth David Cressey:
>> "Jon Heggland" <jon.he...@ntnu.no> wrote in message
>> news:fj6737$o2p$1...@orkan.itea.ntnu.no...
>>> Quoth David Cressey:
>> Not all modeling is analysis. Some of it is design. In particular, I'm
>> going to claim that you discover attributes, but you design relvars. I've
>> already had the second claim confirmed by Bob and others.
>
> Yet when you are discovering attributes, you presumably write them down
> somehow. Is it the case that if you do it using E/R notation, you are
> doing analysis, but if you do it using some relation- or predicate-based
> representation, you are doing design?
>
> Or perhaps it's simpler: Analysis is what you're doing when you're
> talking with the subject matter experts; design is what you're doing
> when you're not. :)
>

It's even simpler yet: if you're trying to understand a problem, then you're
doing analysis; if you're trying to solve a problem you already understand,
then you're doing design.

>> Bob's distaste for pretty pictures should not obscure the main theme. A
>> model isn't a "pretty picture" as such. Rather, a "pretty picture" is the
>> projection of a model on a flat screen. Other projections have been
>> proposed. A table written on a whiteboard, with some imaginary sample data
>> written into it, proposed by another participant, is another projection of
>> a model on a flat screen.
>
> Ceci n'est pas une pipe... Then is the model wholly intangible, existing
> only in a platonic sense inside the designer's (or analyst's) head? Never
> mind, I see your point---but that doesn't answer my question: When is
> modelling design, and when is it analysis? A bald statement that
> relation-based models are designed doesn't cut it, even if it is
> seconded; I need rational arguments.
>

If the model represents a requirement, then I would consider the activity
that produced it to be analysis; if the model represents a possible
implementation, then I would consider the activity that produced it to be
design.

What does presentation have to do with the classification of collections of
individuals into entities and relationships? Discovering and understanding
the individuals that are interesting and how those individuals relate and
interact is analysis. Presentation is about communicating that information.

rpost

Dec 6, 2007, 12:50:51 PM

David Cressey wrote:

>Bingo! That's the big problem with the literature on ER. Many ER
>proponents use ER as if it were a design artifact. I think that's a
>misapplication of the artifact, and I'm pretty sure Peter Chen would agree.
>If one is designing a relational system (including but not limited to a
>relational database) then using the relational model to capture the design
>is a much better idea.

Well, that reflects what "we" teach: make a model in ER then convert it
into a logical relational design. I thought it was how ER is *always* used.
Apparently I'm wrong.

--
Reinier Post
TU Eindhoven

rpost

Dec 6, 2007, 1:13:58 PM

David Cressey wrote:

>> Genuine question guys. [...]
>> How do I decide whether I am dealing with a marriage entity or a
>> marriage relationship?

[...]

>> The literature I'm reading here is telling me that the choice is based
>> on what 'things' are key to the business. If my business is concerned
>> with people (tax collection say), 'marriage' is best modelled as a
>> relationship, whereas if the marriages themselves are my focus
>> (perhaps I run a church) then it's probably better as an entity.
[...]

That's a meta-answer: only model the information you're going to need
and have. E.g. only put information about the marriage certificate into
the model if you need it and have it.

But that doesn't really help you with the question whether to model
a particular notion as a plain domain value, an entity, or a relationship.
To answer that question, you have to determine the functional dependencies
between them. (In E/R terms: the cardinalities of your relationships.)

Marriage is a relationship iff the only thing that identifies a marriage
is the people involved. Is this the case in your situation? E.g.
when you need to accommodate Elizabeth Taylor, who married the same
person twice, will you need to distinguish between her individual
marriages? If you do, you're going to need an additional attribute
(e.g. date of marriage will do), turning the relationship into
an entity (by definition).
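
As a rough illustration of that test, the sketch below checks whether a functional dependency holds in a sample. The rows, attribute names, and certificate numbers are invented, and a sample can only show that a dependency fails, never that it is guaranteed:

```python
def determines(rows, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs (lhs -> rhs holds here)."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs seen for each lhs; any later mismatch breaks the FD.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample in which the same two people marry each other twice.
marriages = [
    {"spouse_a": "Adam", "spouse_b": "Beth", "date": "2001-05-01", "certificate": "C-17"},
    {"spouse_a": "Adam", "spouse_b": "Beth", "date": "2006-09-09", "certificate": "C-42"},
]

# The people alone no longer determine the marriage...
print(determines(marriages, ("spouse_a", "spouse_b"), ("certificate",)))

# ...but the people together with the date of marriage do.
print(determines(marriages, ("spouse_a", "spouse_b", "date"), ("certificate",)))
```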

[...]

>My answer is that it's subjective. If the subject matter experts all treat
>a marriage as a relationship, follow their lead. If the subject matter
>experts all treat it as an entity, follow their lead, but insist that
>there's a key attribute that identifies it. (Do the SME's have any such
>thing as a "marriage ID" attribute?)

The SME have a fairly unusual concept of marriage, as I just found out:

http://www.sme.org/cgi-bin/find-articles.pl?&ME05ART9&ME&20050209&&SME&

("Highlights of the tour include the body trim areas and the chassis/body
marriage operations.")

--
Reinier

rpost

Dec 6, 2007, 1:18:06 PM

JOG wrote:

>Ok so one might summarize the following steps:
>1) initial analysis of business processes and important concepts.
>2) Formulation of an initial conceptual model (that is necessarily
>slanted to a certain viewpoint of the UoD).
>3) Translation into a nicely normalized logical model, that's query
>neutral.
>4) On demand, extract data back out from the neutral logical model,
>shaping it either the original conceptual view, or other conceptual
>views as needs arise from new applications.
>
>Great. This all makes perfect sense, and is very clear to boot. A
>simple process for creating a thorough yet flexible system. It seems
>obvious even, right?

>
>So why on earth would /anyone/ want to drop step 3? I'm at a loss as
>to why certain cdt'ers (who are clearly intelligent people) seem to be
>advocating this. An absolute loss I tell you.

Being new here, I have no idea whom you're referring to.
You'll never see *me* omit step 3, but I don't buy your "neutral".
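
For reference, steps 3 and 4 of the quoted list can be sketched as one normalized base table plus views that reshape it on demand. This assumes SQLite through Python's sqlite3 module; the table, view, and column names are hypothetical, and the sketch takes no position on whether the base table is "neutral":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: a normalized base table with both candidate keys declared.
    CREATE TABLE marriage (
        husband  TEXT NOT NULL,
        wife     TEXT NOT NULL,
        wed_on   TEXT NOT NULL,
        location TEXT NOT NULL,
        PRIMARY KEY (husband, wed_on),
        UNIQUE (wife, wed_on)
    );
    INSERT INTO marriage VALUES ('Adam', 'Beth', '2001-05-01', 'Oslo');
    INSERT INTO marriage VALUES ('Carl', 'Dana', '2003-07-12', 'Rome');

    -- Step 4: a people-centred shape of the same facts.
    CREATE VIEW spouse AS
        SELECT husband AS person, wife AS partner FROM marriage
        UNION
        SELECT wife, husband FROM marriage;

    -- Step 4 again: an event-centred shape for a different application.
    CREATE VIEW wedding_count AS
        SELECT location, COUNT(*) AS weddings
        FROM marriage
        GROUP BY location;
""")

print(conn.execute("SELECT partner FROM spouse WHERE person = 'Beth'").fetchall())
print(conn.execute("SELECT weddings FROM wedding_count WHERE location = 'Rome'").fetchone())
```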

--
Reinier

rpost

Dec 6, 2007, 1:26:04 PM

JOG wrote:

>[...] It's not necessarily the tool, but
>the ethos of E/R and OOP literature seems to perpetuate this
>misunderstanding. XML, for its sins, has never really shared this due
>to its long history of schema transformation via XSLT.

!! I never realized this. Good point.

--
Reinier

rpost

Dec 6, 2007, 1:36:13 PM

Jon Heggland wrote:

[...]

>You define relationships as anything that can be identified /only/ by
>its related entities---if there is an alternate key, the thing is an
>entity. Correct? What is the rationale behind this rule?

It's the common definition. At least Silberschatz/Korth/Sudarshan
defines it this way.

--
Reinier

rpost

Dec 6, 2007, 1:40:20 PM

JOG wrote:

>> By contrast I think a relationship is characterised as only being
>> identified by the entities that it relates.
>
>Well, coupla things. Relationships can have attributes too, and they
>can be part of the identifying set. And second, the attributes of an
>entity... well why can't they too be entities? Consider a 'team'
>entity say. It could be identified by the players in it. And those
>players are all entities themselves right? Perhaps there is a formal
>distinction between relationships and entities but I'm yet to read
>one, so until then, I have to posit that there is no difference
>between them.

Hmm ... and to think that I am having this whole discussion with you
elsewhere, in which my argument is *entirely based* on this distinction.

I guess we humans are not so good at unique identification of concepts
as we like to think :(

--
Reinier

rpost

Dec 6, 2007, 1:43:54 PM

Jon Heggland wrote:

[...]

>Well, that depends on what analysis is. It seems this guy thinks it's
>the same as data modeling, which in turn is the same as developing a
>graphical representation of the client's needs and processes. Is it?

There we go again ... if a language is graphical it doesn't have to be
imprecise or informal. Textual languages tend to be more expressive
(in the sense of needing fewer square inches to express things) but
they are not automatically better in any other respect.

--
Reinier

rpost

Dec 6, 2007, 2:02:28 PM

David Cressey wrote:

>> Apropos this distinction: As to whether marriage is a relationship or an
>> entity, you said that one should listen to the subject matter experts. I
>> have never had such an expert say to me, "No, that's not a relationship,
>> that's an entity!" or vice versa. Have you?
>
>Not in so many words. But they have said things like "a reservation for a
>certain car, on a certain date, by a certain customer has a way of
>identifying it.

Maybe not even that. Maybe they just use the word "reservation", a noun,
which implies that it apparently has some way of being identified.
As NIAM says: nouns -> entities, verbs -> relationships.
(Not a hard and fast rule, of course.)

>We call it a 'reservation number'.

To me this is an independent step: establishing the attributes on our
entity (or relationship), and whether they are compulsory and/or identifying.

>What you have now
>learned is that the UofD people think of a reservation as a thing in and of
>itself and not just an association between a customer and a car on some
>future date.

Funny, "thing" is exactly the word Chen uses in his paper, but I think
what it really means in this context is: whether it can be identified
independently.

>This tells you something you need to know about the problem statement: The
>database has to store reservation numbers.

If they mention reservation numbers, yes. If not, better ask them
how they (reliably) identify reservations, before you end up
using a surrogate.

[...]

>In some cases, the business rules will make the design decision for you. In
>other cases, the business rules are silent on this score.

In other cases, we end up carrying whole databases around of what should
really be database-internal surrogate ids just because the database
designers found it easier to make us remember which ids we happen to
have in their databases, than to provide us with identification facilities
in terms we already use. And I can't even blame them.

--
rein...@win.tue.nl aka rp...@campus.tue.nl aka r.d.j...@tue.nl aka ......


David BL

Dec 6, 2007, 2:49:33 PM

On Dec 6, 7:29 pm, JOG <j...@cs.nott.ac.uk> wrote:
> On Dec 6, 4:33 am, David BL <davi...@iinet.net.au> wrote:
>
> > On Dec 6, 10:47 am, JOG <j...@cs.nott.ac.uk> wrote:
>
> > > Either we are crossing terminology, or this has already been
> > > highlighted earlier in the thread with reference to marriages. E/R
> > > forces one to pick a single conceptual viewpoint (marriage as
> > > relationship/marriage as entity, etc), whereas a propositional
> > > encoding is neutral on the topic.
>
> > There is something I may be misunderstanding - can you put me right?
>
> I'll certainly lend you my point of view. I offer no guarantees that
> it's right :)
>
> > I think an entity is characterised as something in the real world that
> > we want to be able to reference using a reasonably small identifier.
>
> I agree with the identification aspect, but see a small identifier as
> being an attractive quality as opposed to a definitional one.

Ok.

> > Typically an entity is subject to change over time, and we may not
> > have convenient access to a nice, convenient, small number of
> > measurable attributes that uniquely identifies it.
>
> Sure. And if the identifier is a large unwieldy set of attributes for
> ease we might invent a surrogate to represent that set.

Yes.

> > Nevertheless we
> > believe that the identifier can be used successfully in the real
> > world. This could depend on the population statistics, such as the
> > way we identify people around us by their facial characteristics, and
> > it is helpful that people don't tend to have extensive plastic
> > surgery, exchange body parts on a regular basis or cross dress. This
> > allows them to be identified by names like "John".
>
> I'm not sure I'm following this entirely David. "John" is just another
> attribute, and it can be used to refer to someone, but we know it
> wouldn't identify them uniquely. The reason we can get away with using
> it in day to day conversation is because humans are incredibly good at
> resolving context (whereas a computer is not). We'd only use someone's
> first name alone when we know its use will not be ambiguous.

My only point is that in particular contexts we are comfortable with
thinking of humans as entities that we are able to identify.

> > By contrast I think a relationship is characterised as only being
> > identified by the entities that it relates.
>
> Well, coupla things. Relationships can have attributes too, and they
> can be part of the identifying set.

Agreed, but that can be accommodated without upsetting the distinction
I'm alluding to.

> And second, the attributes of an
> entity... well why can't they too be entities? Consider a 'team'
> entity say. It could be identified by the players in it. And those
> players are all entities themselves right? Perhaps there is a formal
> distinction between relationships and entities but I'm yet to read
> one, so until then, I have to posit that there is no difference
> between them.

I doubt whether there is a *formal* difference.

In the above example, a team that is independently identified would be
regarded as an entity not a relationship, despite the fact that it
could be identified by the players in it.

> > Now I see the same distinction being made in a propositional
> > encoding.
>
> I'm not sure why - there are no entities in a propositional encoding,
> just roles and values.

The RM formalism has attributes, domains, tuples and relations. Are
roles part of the formalism or outside?

I see entities come into the propositional encoding in the
instantiations of the intentional definitions of the predicates. This
is outside the mathematical formalism. However, of what practical use
is a relation without a well defined intensional definition?

> > The intensional definition of the following predicate implicitly
> > assumes Husband, Wife and Location correspond to entity types that can
> > be identified using domain values. Evidently marriage is merely a
> > relationship between them.
>
> > married(Husband, Wife, Location) :- Husband married Wife at
> > Location
>
> > By contrast the following predicates are consistent with thinking of a
> > marriage itself as an entity
>
> > husband(MarriageId, Husband).
> > wife(MarriageId, Wife).
> > location(MarriageId, Location).
>
> > or maybe just
>
> > married(MarriageId, Husband, Wife, Location)
>
> Summarizing the above I gathered that you are saying that:
> RELATIONSHIP: married(Husband, Wife, Location)
> ENTITY: married(MarriageId, Husband, Wife, Location)

> So the only difference is that the entity has a marriageID? I am not
> clear why you think the addition of this surrogate would change a
> relationship into an entity!

In the first case the marriage is a relationship because it is only
identified indirectly by the entities that are involved. In the
second case the marriage has been directly identified. That seems
like a significant difference in the logical layer. It shows up in
the intensional definitions where a marriage takes on a "role" as an
entity.

In the second case there must be some underlying reason to introduce a
marriage identifier. That reason points at a significant difference
in requirements. Don't assume the marriage identifier is a surrogate
id! Instead assume this is a well conceived design, and it's a
natural identifier.

> > Now whether one "thinks in ER" or "thinks in propositional encodings",
> > there has to be good reason to introduce a MarriageId.
>
> I'm not clear at all why you have introduced a MarriageID. If the
> 'relationship' is identifiable by {H,W} say then so is the 'entity'.
> Why have you introduced the surrogate?

Circumstances are different in the second case, and that's what made
the DBA come up with a different logical design. Maybe {H,W} doesn't
identify a marriage. Maybe the model needs to store detailed
information about polygamous marriage ceremonies and the MarriageId is
a stable, natural key.

David BL

Dec 6, 2007, 2:53:09 PM

On Dec 6, 5:37 pm, Jon Heggland <jon.heggl...@ntnu.no> wrote:
> Quoth David BL:
>
> > By contrast I think a relationship is characterised as only being
> > identified by the entities that it relates.
>
> > Now I see the same distinction being made in a propositional
> > encoding.
>
> > The intensional definition of the following predicate implicitly
> > assumes Husband, Wife and Location correspond to entity types that can
> > be identified using domain values. Evidently marriage is merely a
> > relationship between them.
>
> > married(Husband, Wife, Location) :- Husband married Wife at
> > Location
>
> So you intend that Location is necessary for identification?

I wasn't actually intending that Location be necessary for
identification of a marriage. I'll make the intensional definition
clearer:-

married(Husband, Wife, Location) :-
    Husband is *currently* married to Wife
    and they (last) got married at Location

Candidate keys are { Husband } or { Wife }, enforcing monogamy
integrity constraints.
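
A minimal sketch of how those two candidate keys behave, using Python's standard sqlite3 module. The table layout, the helper names (make_current_marriages, try_marry) and the sample people are assumptions made up for illustration; the two UNIQUE constraints stand in for the keys { Husband } and { Wife }.

```python
import sqlite3


def make_current_marriages():
    """Create a relation for *current* marriages only.

    Husband and Wife are each declared UNIQUE, so each one alone
    is a candidate key (assumed layout, for illustration).
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE married ("
        " husband  TEXT NOT NULL UNIQUE,"
        " wife     TEXT NOT NULL UNIQUE,"
        " location TEXT NOT NULL)"
    )
    return db


def try_marry(db, husband, wife, location):
    """Insert one tuple; return False if a key constraint rejects it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO married VALUES (?, ?, ?)",
                (husband, wife, location),
            )
        return True
    except sqlite3.IntegrityError:
        return False


db = make_current_marriages()
first = try_marry(db, "Jack", "Jill", "Oslo")    # accepted
bigamy = try_marry(db, "Jack", "Mary", "Perth")  # rejected: Jack already appears
```

Under this reading, Location is carried along as a plain attribute and plays no part in identification.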

> What if
> Location does not correspond to an entity type? I assume that might be
> the case if it were simply a spatial coordinate. Is marriage still a
> relationship? If not, what is it?

I agree that it is strange to think of a spatial coordinate as an
entity type.

I guess this example illustrates why ERM allows relationships between
entities to support attributes.

> (And "merely" a relationship? Are entities better than relationships in
> some way?)

No definitely not. I seem to have an unfortunate habit of using
"merely" when I should use "only".

Maybe my subconscious was accounting for the notion that entities are
characterised as things in our perception that for some reason
"deserve" to be identified rather directly - often with their own
special name, whereas relationships are identified indirectly.

> > By contrast the following predicates are consistent with thinking of a
> > marriage itself as an entity
>
> > husband(MarriageId, Husband).
> > wife(MarriageId, Wife).
> > location(MarriageId, Location).
>
> > or maybe just
>
> > married(MarriageId, Husband, Wife, Location)
>
> > Now whether one "thinks in ER" or "thinks in propositional encodings",
> > there has to be good reason to introduce a MarriageId.
>
> Perhaps marriage certificates have a unique number stamped on them, and
> you want to record this in your database.

Yes, that could be a reason. Consider the following intensional
definition

married(MarriageId, Husband, Wife, Location) :-
    The marriage identified by MarriageId was
    between Husband and Wife at Location.

In this case the relation can record marriages that are no longer
current, and the only candidate key is { MarriageId }.
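
As a rough illustration, here is a small Python check of which attribute sets still distinguish every tuple once past marriages are recorded. The sample rows, the certificate-style identifiers ("M-001" and so on) and the helper is_key are all hypothetical. Note that the helper only tests one particular extension, whereas a real key is a constraint on every value the relation may take.

```python
from typing import NamedTuple


class Marriage(NamedTuple):
    marriage_id: str  # hypothetical certificate number
    husband: str
    wife: str
    location: str


def is_key(rows, attrs):
    """Return True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(getattr(row, a) for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


history = [
    Marriage("M-001", "Jack", "Jill", "Oslo"),
    Marriage("M-002", "Jack", "Jill", "Oslo"),  # same couple, remarried at the same place
    Marriage("M-003", "John", "Mary", "Perth"),
]

id_is_key = is_key(history, ["marriage_id"])
couple_is_key = is_key(history, ["husband", "wife", "location"])
```

With these rows only marriage_id separates the two Jack and Jill tuples, which matches the claim that { MarriageId } is the only candidate key here.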

> Anyway, your first predicate breaks down if two people can marry each
> other more than once at the same location, which is the case in many
> parts of the world.

No, the first predicate doesn't break down if the intension is
*current* marriages.

> I'd suggest you add a DateTime as well.

Yes, that would be useful in practice.

> Does that
> make the predicate an entity? Assuming monogamy, it can be identified by
> { Husband, DateTime }, { Wife, DateTime } or perhaps even { Location,
> DateTime } if we preclude mass weddings. What if the DateTime
> "corresponds to an entity type"? (What determines whether there is such
> a correspondence?)

Well I can see you are technically picking holes in my
"characterisation" of entity, but I wasn't really putting forward a
definition. I think the "characterisation" was very informal and more
along the lines of being necessary but not sufficient.

Actually, I would prefer to steer clear of trying to pin down such
informal words as "entity" and "relationship".

However, I still believe particular entity types are implicit in the
intensional definitions of the predicates. How can they not be?

Although the intensional definitions are strictly outside the
mathematical formalism of the RM, they are nevertheless fundamental to
the meaning and purpose of the database.

> > Aren't you implying that a propositional encoding doesn't commit you
> > to a decision about whether a marriage is implicitly or explicitly
> > identified? I fail to see how that is possible.

This is the question I would like you to comment on! Do you agree
that a predicate can treat a marriage like a relationship, or
otherwise like an entity, which seems at odds with this idea of a
neutral logical layer?

> You define relationships as anything that can be identified /only/ by
> its related entities---if there is an alternate key, the thing is an
> entity. Correct? What is the rationale behind this rule?

It is only a rather vague characterisation, and about as precise as
the entity/relationship distinction deserves!

My rationale is as follows: If you study some real life ER diagrams
you will tend to find that entities are associated with nouns, and
relationships are associated with verbs or actions. Here are some
binary relationships taken from chapter 14 in Date's Introduction to
Database Systems

entity       relationship   entity
-------------------------------------
supplier     ships          part
department   employs        employee
employee     works on       project

Now in natural language we don't normally name instances of verbs or
actions, but instead name subject and object. Eg "Jack kicks John".
We don't think of the kick action as an entity that needs to be
identified independently of Jack and John. Furthermore a kick is less
tangible - it has a fleeting existence.

Note furthermore that verb phrases include other types of
relationships that don't correspond to actions, such as "is less
than", "has" or "is the father of". Note how silly it would seem to
name a particular instance of a "has" relationship - ie between a
particular pair of entities.

> (And what if the alternative means of identification is also a
> combination of entities?)

Relationships won't tend to do that!

At this rather informal level of discussion, relationships are
counterparts to relations and as we know, 1) tuple identifiers are not
needed in the RM, and 2) it would be strange indeed if a proposition
could be stated in some equivalent form on an alternative set of
entities.

JOG

Dec 6, 2007, 3:36:59 PM

On Dec 6, 7:49 pm, David BL <davi...@iinet.net.au> wrote:
> [snip general agreement]

> > > Nevertheless we
> > > believe that the identifier can be used successfully in the real
> > > world. This could depend on the population statistics, such as the
> > > way we identify people around us by their facial characteristics, and
> > > it is helpful that people don't tend to have extensive plastic
> > > surgery, exchange body parts on a regular basis or cross dress. This
> > > allows them to be identified by names like "John".
>
> > I'm not sure I'm following this entirely David. "John" is just another
> > attribute, and it can be used to refer to someone, but we know it
> > wouldn't identify them uniquely. The reason we can get away with using
> > it in day to day conversation is because humans are incredibly good at
> > resolving context (whereas a computer is not). We'd only use someone's
> > first name alone when we know its use will not be ambiguous.
>
> My only point is that in particular contexts we are comfortable with
> thinking of humans as entities that we are able to identify.

Er, ok. I can't disagree with you there. (Although I think I'm missing
where you were going with that point.)

>
> > > By contrast I think a relationship is characterised as only being
> > > identified by the entities that it relates.
>
> > Well, coupla things. Relationships can have attributes too, and they
> > can be part of the identifying set.
>
> Agreed, but that can be accommodated without upsetting the distinction
> I'm alluding to.

Ok. gotcha.

> [schnnippp]

> > > Now I see the same distinction being made in a propositional
> > > encoding.
>
> > I'm not sure why - there are no entities in a propositional encoding,
> > just roles and values.
>
> The RM formalism has attributes, domains, tuples and relations. Are
> roles part of the formalism or outside?

Apologies my fault there. I use the term roles as per the ORM, but I
should have said attributes (even though they are pretty analogous).
I prefer to use the term role because it is more specific, whereas
attribute is a subtly ambiguous word - sometimes it's used to denote
attribute name, sometimes the attribute-value, and sometimes both. But
yes anyhow, I meant "there are no entities in a propositional
encoding, just attributes and values".

>
> I see entities come into the propositional encoding in the
> instantiations of the intentional definitions of the predicates. This
> is outside the mathematical formalism. However, of what practical use
> is a relation without a well defined intensional definition?

Yup, I see what you're saying (if you meant intenSional definitions
that is!), but I'm not sure I agree. Intensional definitions only
refer to rules concerning valid values for predicate variables, not
valid entities. If I've missed a trick there, perhaps it's worth an
example?

> [ker-shhhnip]

> > Summarizing the above I gathered that you are saying that:
> > RELATIONSHIP: married(Husband, Wife, Location)
> > ENTITY: married(MarriageId, Husband, Wife, Location)
> > So the only difference is that the entity has a marriageID? I am not
> > clear why you think the addition of this surrogate would change a
> > relationship into an entity!
>
> In the first case the marriage is a relationship because it is only
> identified indirectly by the entities that are involved. In the
> second case the marriage has been directly identified.

Ok, sorry if I'm being dense, but are you saying that the second case
is an entity because it can be identified without reference to another
entity? And would a logical consequence be therefore that no
identifying attributes of an entity may be entities themselves?

> That seems
> like a significant difference in the logical layer. It shows up in
> the intensional definitions where a marriage takes on a "role" as an
> entity.
>
> In the second case there must be some underlying reason to introduce a
> marriage identifier. That reason points at a significant difference
> in requirements. Don't assume the marriage identifier is a surrogate
> id! Instead assume this is a well conceived design, and it's a
> natural identifier.

Well hey, I don't think a surrogate means a poorly conceived design.
But I take your point, the Marriage ID in this case is coming from
some external source, and hasn't been instigated by the designer of
this model. Regards, J.

Jonathan Leffler

Dec 6, 2007, 6:32:48 PM

Consider a database containing a relation containing information about
'the elements' - as in hydrogen, helium, etc.

The elements table has 3 keys (candidate keys):
    Element name
    Atomic number
    Element symbol

Now, consider what else is stored in the database. For the analysis of
isotopes, the atomic number is the important key - the different
isotopes of hydrogen all share the same atomic number, but have
different names (deuterium and tritium) even though chemically they are
all hydrogen.

For the analysis of chemical compounds, it is much more familiar to use
the element symbol - more people have come across H2O and CO2 than are
familiar with 1/2, 8/1 and 6/1, 8/2 (where I'm using atomic number /
multiplicity in the second notation). I'm glossing over some notational
inconveniences (consider the relational representation of your old
friend C2H5OH, for example), but the point remains - for some purposes,
the better key to use is atomic number and for other purposes, the
better key to use is element symbol.

Which key to use is a logical issue here, isn't it?
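
To make that concrete, a minimal Python sketch: one elements relation indexed by each of its three candidate keys, and the same compound written against two of them. The index names and the normalise helper are assumptions for illustration, and a formula is simplified here to a mapping from element to multiplicity.

```python
# One relation, three candidate keys: (name, atomic_number, symbol).
ELEMENTS = [
    ("hydrogen", 1, "H"),
    ("carbon", 6, "C"),
    ("oxygen", 8, "O"),
]

# One lookup index per candidate key.
by_name = {e[0]: e for e in ELEMENTS}
by_number = {e[1]: e for e in ELEMENTS}
by_symbol = {e[2]: e for e in ELEMENTS}

# Water, referenced through two different keys.
water_by_symbol = {"H": 2, "O": 1}  # H2O
water_by_number = {1: 2, 8: 1}      # 1/2, 8/1


def normalise(formula, index):
    """Rewrite a formula keyed by any candidate key as one keyed by element name."""
    return {index[key][0]: count for key, count in formula.items()}


same = normalise(water_by_symbol, by_symbol) == normalise(water_by_number, by_number)
```

Both spellings resolve to the same facts, so the choice between them is about which referencing relation reads more naturally and not about what can be represented.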

--
Jonathan Leffler #include <disclaimer.h>
Email: jlef...@earthlink.net, jlef...@us.ibm.com
Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/

publictimestamp.org/ptb/PTB-1963 sha224 2007-12-06 21:00:03
0AC762E1452FAE2896292EA605A8D66B9FEE09F8E55C0B87073F31DA

David BL

Dec 6, 2007, 7:41:53 PM

On Dec 7, 5:36 am, JOG <j...@cs.nott.ac.uk> wrote:
> On Dec 6, 7:49 pm, David BL <davi...@iinet.net.au> wrote:
>
> > [snip general agreement]
> > > > Nevertheless we
> > > > believe that the identifier can be used successfully in the real
> > > > world. This could depend on the population statistics, such as the
> > > > way we identify people around us by their facial characteristics, and
> > > > it is helpful that people don't tend to have extensive plastic
> > > > surgery, exchange body parts on a regular basis or cross dress. This
> > > > allows them to be identified by names like "John".
>
> > > I'm not sure I'm following this entirely David. "John" is just another
> > > attribute, and it can be used to refer to someone, but we know it
> > > wouldn't identify them uniquely. The reason we can get away with using
> > > it in day to day conversation is because humans are incredibly good at
> > > resolving context (whereas a computer is not). We'd only use someone's
> > > first name alone when we know its use will not be ambiguous.
>
> > My only point is that in particular contexts we are comfortable with
> > thinking of humans as entities that we are able to identify.
>
> Er, ok. I can't disagree with you there. (Although I think I'm missing
> where you were going with that point though.)

So am I :)

> > > > By contrast I think a relationship is characterised as only being
> > > > identified by the entities that it relates.
>
> > > Well, coupla things. Relationships can have attributes too, and they
> > > can be part of the identifying set.
>
> > Agreed, but that can be accommodated without upsetting the distinction
> > I'm alluding to.
>
> Ok. gotcha.
>
> > [schnnippp]
> > > > Now I see the same distinction being made in a propositional
> > > > encoding.
>
> > > I'm not sure why - there are no entities in a propositional encoding,
> > > just roles and values.
>
> > The RM formalism has attributes, domains, tuples and relations. Are
> > roles part of the formalism or outside?
>
> Apologies my fault there. I use the term roles as per the ORM, but I
> should have said attributes (even though they are pretty analogous).
> I prefer to use the term role because it is more specific, whereas
> attribute is a subtly ambiguous word - sometimes it's used to denote
> attribute name, sometimes the attribute-value, and sometimes both. But
> yes anyhow, I meant "there are no entities in a propositional
> encoding, just attributes and values".

In the formalism I think of "attribute" as a pair of name and domain,
and certainly not a value.

> > I see entities come into the propositional encoding in the
> > instantiations of the intentional definitions of the predicates. This
> > is outside the mathematical formalism. However, of what practical use
> > is a relation without a well defined intensional definition?
>
> Yup, I see what you're saying (if you meant intenSional definitions
> that is!)

Oops.

> , but I'm not sure I agree. Intensional definitions only
> refer to rules concerning valid values for predicate variables, not
> valid entities. If I've missed a trick there, perhaps it's worth an
> example?

An intensional definition should uniquely define a corresponding
extension. For example

predicate:
    album(N)

intension:
    String N is the name of a studio album
    released before 2003, of the band Garbage

extension:
    {
        {N=Garbage},
        {N=Version 2.0},
        {N=beautifulgarbage}
    }

An instantiation of the intensional definition is

String "Version 2.0" is the name of a studio
album released before 2003, of the band Garbage

This natural language sentence refers to a studio album entity in the
real world. The name value is distinct from the entity. The
existence of the entity is implied by the intensional definition.
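
The same example can be sketched in a few lines of Python, assuming a closed-world reading in which a proposition counts as true exactly when its tuple is in the extension. The function names instantiate and holds are made up for this sketch.

```python
# Extension of album(N), copied from the example above.
ALBUM_EXTENSION = {"Garbage", "Version 2.0", "beautifulgarbage"}


def instantiate(name):
    """Substitute a value for N in the intensional definition."""
    return (
        f'String "{name}" is the name of a studio album '
        "released before 2003, of the band Garbage"
    )


def holds(name):
    """Closed-world assumption: true if and only if the tuple is in the extension."""
    return name in ALBUM_EXTENSION


sentence = instantiate("Version 2.0")
```

The code only ever handles the string value; the album the sentence talks about exists solely on the natural-language side, which is the distinction being drawn.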

> > [ker-shhhnip]
> > > Summarizing the above I gathered that you are saying that:
> > > RELATIONSHIP: married(Husband, Wife, Location)
> > > ENTITY: married(MarriageId, Husband, Wife, Location)
> > > So the only difference is that the entity has a marriageID? I am not
> > > clear why you think the addition of this surrogate would change a
> > > relationship into an entity!
>
> > In the first case the marriage is a relationship because it is only
> > identified indirectly by the entities that are involved. In the
> > second case the marriage has been directly identified.
>
> Ok, sorry if I'm being dense, but are you saying that the second case
> is an entity because it can be identified without reference to another
> entity? And would a logical consequence be therefore that no
> identifying attributes of an entity may be entities themselves?

Yes.

X is an entity in the context of the model if there exists set A of
attributes + values that identify X and there doesn't exist a subset
of A that identifies a different entity Y.

Actually this attempted definition isn't quite right. For example you
could determine that a team is an entity except for the fact that you
end up identifying the team captain as well. Perhaps that problem
can be fixed by talking about maximal entity types corresponding to
cartesian products of domains and noting that team identifiers aren't
suitable for identifying players more generally. Maybe Reinier can
help.

Jon Heggland

Dec 7, 2007, 3:21:51 AM

Quoth rpost:

Straw man. I don't believe I've said anything in general about the
qualities of graphical languages versus textual. I merely observed that
this guy's assumption that analysis = data modeling = drawing something
is dubious.
--
Jon

Jon Heggland

Dec 7, 2007, 3:36:56 AM

Quoth Brian Selzer:

> "Jon Heggland" <jon.he...@ntnu.no> wrote in message
> news:fj8fff$3ed$1...@orkan.itea.ntnu.no...

>> Or perhaps it's simpler: Analysis is what you're doing when you're
>> talking with the subject matter experts; design is what you're doing
>> when you're not. :)
>
> It's even simpler yet: if you're trying to understand a problem, then you're
> doing analysis; if you're trying to solve a problem you already understand,
> then you're doing design.

Too simple. It assumes that understanding is binary: either you
understand the problem, or you don't. Furthermore, that you know whether
or not you understand it.

> If the model represents a requirement, then I would consider the activity
> that produced it to be analysis; if the model represents a possible
> implementation, then I would consider the activity that produced it to be
> design.

And if it represents both? How can you sharply delineate one from the other?

>> In most E/R notations, you cannot represent the alternate
>> key---reservation number---if a reservation is a relationship. Vice
>> versa, if it is an entity, you cannot represent the { CustomerID, CarID,
>> Date } key. This means that you have to have an underlying model, of
>> which any graphical E/R diagrams are merely simplified views. I agree to
>> this, but it raises two points:
>>
>> 1. The underlying model cannot have a strict distinction between
>> entities and relationships, since the same concept---reservation---can
>> be thought of and presented as both. This relegates entity/relationship
>> thinking to a question of presentation, not analysis.
>
> What does presentation have to do with the classification of collections of
> individuals into entities and relationships? Discovering and understanding
> the individuals that are interesting and how those individuals relate and
> interact is analysis. Presentation is about communicating that information.

I can only repeat what I've said: As far as I can tell, the decision of
whether or not an "individual" is an entity or a relationship is quite
arbitrary---it may have aspects of both. To communicate all these
aspects, it may be necessary/useful to present it sometimes as an
entity, and sometimes as a relationship. If instead you insist on
classifying your individual as /either/ an entity /or/ a relationship,
you lose information.
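
A small sketch of the reservation case, using Python's standard sqlite3 module. The table and column names, the sample row and the two lookup helpers are assumptions for illustration. One relation carries both keys, so the same tuple can be reached "as an entity" through its number or "as a relationship" through the things it relates.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE reservation (
        reservation_no INTEGER NOT NULL UNIQUE,   -- key 1
        customer_id    INTEGER NOT NULL,
        car_id         INTEGER NOT NULL,
        day            TEXT    NOT NULL,
        UNIQUE (customer_id, car_id, day)         -- key 2
    );
    INSERT INTO reservation VALUES (1001, 7, 42, '2007-12-24');
    """
)


def find_as_entity(reservation_no):
    """Look the reservation up by its own identifier."""
    return db.execute(
        "SELECT * FROM reservation WHERE reservation_no = ?",
        (reservation_no,),
    ).fetchone()


def find_as_relationship(customer_id, car_id, day):
    """Look the reservation up by the customer, car and date it relates."""
    return db.execute(
        "SELECT * FROM reservation"
        " WHERE customer_id = ? AND car_id = ? AND day = ?",
        (customer_id, car_id, day),
    ).fetchone()


same_row = find_as_entity(1001) == find_as_relationship(7, 42, "2007-12-24")
```

Neither key is privileged in the table itself; which one a diagram chooses to show is a presentation decision.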
--
Jon

Jon Heggland

Dec 7, 2007, 3:42:27 AM

Quoth rpost:

But this conversion is fairly mechanical. Is "design" in this case just
the little bit of human input that enters this process?
--
Jon

Jon Heggland

Dec 7, 2007, 4:08:35 AM

Quoth Jonathan Leffler:

> Now, consider what else is stored in the database. For the analysis of
> isotopes, the atomic number is the important key - the different
> isotopes of hydrogen all share the same atomic number, but have
> different names (deuterium and tritium) even though chemically they are
> all hydrogen.
>
> For the analysis of chemical compounds, it is much more familiar to use
> the element symbol - more people have come across H2O and CO2 than are
> familiar with 1/2, 8/1 and 6/1, 8/2 (where I'm using atomic number /
> multiplicity in the second notation). I'm glossing over some notational
> inconveniences (consider the relational representation of your old
> friend C2H5OH, for example), but the point remains - for some purposes,
> the better key to use is atomic number and for other purposes, the
> better key to use is element symbol.
>
> Which key to use is a logical issue here, isn't it?

Which key to use for what is definitely a logical issue, but designating
one as primary does not mandate how it is used. I'm not sure if you are
agreeing or disagreeing with me..?
--
Jon

Jon Heggland

Dec 7, 2007, 6:17:34 AM

Quoth David BL:

> I wasn't actually intending that Location be necessary for
> identification of a marriage. I'll make the intensional definition
> clearer:-
>
> married(Husband, Wife, Location) :-
> Husband is *currently* married to Wife
> and they (last) got married at Location
>
> Candidate keys are { Husband } or { Wife }, enforcing monogamy
> integrity constraints.

So Marriage is a relationship between a Husband and a Wife, yet it is
identified by either, not the combination? I thought I finally had the
common definition of "relationship" pegged, and then this comes along.

I suppose I am looking for rigor where there is none, though. The
definition of entity---something that is identified independently of
other entities---is also rather half-baked. Take weak entities, for
instance.

>> What if
>> Location does not correspond to an entity type? I assume that might be
>> the case if it were simply a spatial coordinate. Is marriage still a
> relationship? If not, what is it?
>
> I agree that it is strange to think of a spatial coordinate as an
> entity type.

My attempted point was to posit that a Location might be a non-entity
attribute, /or/ an entity (say, a church or temple or official building
of some kind), and to ask whether this made any difference as to whether
the marriage is a relationship or an entity.

>>> By contrast the following predicates are consistent with thinking of a
>>> marriage itself as an entity
>>> husband(MarriageId, Husband).
>>> wife(MarriageId, Wife).
>>> location(MarriageId, Location).
>>> or maybe just
>>> married(MarriageId, Husband, Wife, Location)
>>> Now whether one "thinks in ER" or "thinks in propositional encodings",
>>> there has to be good reason to introduce a MarriageId.
>> Perhaps marriage certificates have a unique number stamped on them, and
>> you want to record this in your database.
>
> Yes, that could be a reason. Consider the following intensional
> definition
>
> married(MarriageId, Husband, Wife, Location) :-
> The marriage identified by MarriageId was
> between Husband and Wife at Location.
>
> In this case the relation can record marriages that are no longer
> current, and the only candidate key is { MarriageId }.

And you say this now is an entity, right? But what if the MarriageId
represents an entity, just like Husband and Wife presumably do? Do we
not then have a situation analogous to the first case, except that the
relationship is ternary? It is, after all, identified by one of the
entities it relates, just like the first Marriage.

> Well I can see you are technically picking holes in my
> "characterisation" of entity, but I wasn't really putting forward a
> definition. I think the "characterisation" was very informal and more
> along the lines of being necessary but not sufficient.

Why necessary?

> Actually, I would prefer to steer clear of trying to pin down such
> informal words as "entity" and "relationship".

But that is one of the main points of my involvement in this discussion!

> However, I still believe particular entity types are implicit in the
> intensional definitions of the predicates. How can they not be?

I think the burden of proof is on your side here. I have tried to show
that predicates can be interpreted as both entities and relationships
(and probably even as parts of entities). You seem to admit as much,
given that you use the word "informal" about the whole E/R shebang.
I note that Jan Hidders claimed that the E/R distinction carries through
to the logical/relational level, but I cannot see how---except by
arbitrary claims along the lines of "if a relvar has but one key, and
this key's components are all foreign keys, then the relvar represents a
relationship".
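
The quoted rule of thumb can be written down in a few lines, which may help show how arbitrary it is. This is only a sketch in Python; the function name `is_relationship`, the attribute names, and the assumption that Husband, Wife, Student and Course are foreign keys are all illustrative.

```python
def is_relationship(keys, foreign_keys):
    """Quoted rule of thumb: exactly one key, made up only of foreign keys.

    keys: list of candidate keys, each a set of attribute names.
    foreign_keys: set of attribute names assumed to reference other relvars.
    """
    return len(keys) == 1 and keys[0] <= foreign_keys

# married(Husband, Wife, Location) with keys {Husband} and {Wife}:
# two keys, so the rule does not call it a relationship.
first_marriage = is_relationship(
    [{"Husband"}, {"Wife"}], {"Husband", "Wife"}
)

# A hypothetical enrolled(Student, Course) relvar with one all-foreign key.
enrolment = is_relationship(
    [{"Student", "Course"}], {"Student", "Course"}
)
```

Under this rule the first marriage relvar fails the test only because monogamy gives it two keys, even though it was introduced as the relationship case.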

I have also tried to show that the predicates, with keys, represent more
information---important information, at that---than a classification of
things into entities and relationships, with entities having a single
key and relationships being identified by their entities (or some subset
thereof?). And that the predicate representation is simpler, in a way,
than E/R, since it does not have the fuzzy distinction between entities
and relationships---it only has the predicate construct. I therefore
consider an E/R model insufficient, or too simplified to be of much use.
This, of course, is a matter of taste.

> Although the intensional definitions are strictly outside the
> mathematical formalism of the RM, they are nevertheless fundamental to
> the meaning and purpose of the database.

Fundamental? Why?

I have a hypothesis. It is that you are so used to thinking about a
database in terms of entities and relationships that it is impossible
for you to view it any other way. (The idea of viewing a database as a
collection of facts was a revelation for me in that regard.) In fact, I
have the opposite problem; I am unable to look at an E/R diagram without
thinking about relations.

Consider this proposition: "Jon was born in 1974", encoded in a relvar
of the form Born(Person, Year). I think we'll agree that represents a
fact about Jon. You would probably assume that Jon is an entity (though
I'm unsure about what you'd call the relvar/predicate in itself---is it
an entity (type)?). But I would also say that the proposition is as much
a fact about the year 1974! Is 1974 an entity? I really don't care.
Facts are all.
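
A small Python sketch of the same idea: the relvar holds facts, and a fact can be retrieved through any of its attribute values. Only ("Jon", 1974) comes from the post; the other tuples and the helper `facts_about` are invented for illustration.

```python
# Hypothetical extension of Born(Person, Year).
born = {("Jon", 1974), ("Ann", 1974), ("Per", 1980)}

def facts_about(relation, value):
    """All tuples that mention value in any attribute position."""
    return {t for t in relation if value in t}

about_jon = facts_about(born, "Jon")
about_1974 = facts_about(born, 1974)
```

The query does not need to know whether "Jon" or 1974 counts as an entity; both are just values that appear in facts.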

>>> Aren't you implying that a propositional encoding doesn't commit you
>>> to a decision about whether a marriage is implicitly or explicitly
>>> identified? I fail to see how that is possible.
>
> This is the question I would like you to comment on! Do you agree
> that a predicate can treat a marriage like a relationship, or
> otherwise like an entity, which seems at odds with this idea of a
> neutral logical layer?

A propositional encoding does specify how this marriage is identified,
yes. What I dispute is the distinction between implicit and explicit
identification; between entities and relationship. A tuple/fact is
identified by some combination of its attributes, that is all.

In order to be able to say that a predicate treats something like an
entity or a relationship, you would need to define precisely what this
treatment entails---i.e., you would have to 'pin down such informal
words as "entity" and "relationship"'.

Even if you were able and willing to do that, it would be backwards to
claim that the logical layer makes the distinction. You could with as
much justification define some mapping from predicates to either ducks
or cats, and then claim that the logical model is not animal-independent.

>> You define relationships as anything that can be identified /only/ by
> its related entities---if there is an alternate key, the thing is an
>> entity. Correct? What is the rationale behind this rule?
>
> It is only a rather vague characterisation, and about as precise as
> the entity/relationship distinction deserves!
>
> My rationale is as follows: [...]
>
> Now in natural language we don't normally name instances of verbs or
> actions, but instead name subject and object. Eg "Jack kicks John".
> We don't think of the kick action as an entity that needs to be
> identified independently of Jack and John. Furthermore a kick is less
> tangible - it has a fleeting existence.

It seems to me that the more significant point is whether or not the
kick exists independently of Jack and John. Presumably, it doesn't, and
introducing/discovering an alternate identifier should not change this.

Is "X kicked Y at time T" (i.e. Jack can kick John multiple times) still
a relationship? Perhaps the definition is "something that cannot be
identified independently of other things", instead of "something that is
identified solely by other things"? Although that would make a weak
entity a kind of relationship...
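
As a rough illustration of "X kicked Y at time T", the Python sketch below checks which attribute sets identify a kick. The sample tuples, the integer timestamps and the helper `is_key` are assumptions made for the example.

```python
# Hypothetical extension of kicked(X, Y, T); T is an illustrative timestamp.
kicked = {
    ("Jack", "John", 1),
    ("Jack", "John", 2),
    ("John", "Jack", 2),
}

def is_key(relation, positions):
    """True if the projection on the given positions has no duplicates."""
    projected = [tuple(t[p] for p in positions) for t in relation]
    return len(projected) == len(set(projected))

pair_identifies = is_key(kicked, (0, 1))         # X and Y only
pair_and_time_identify = is_key(kicked, (0, 1, 2))  # X, Y and T
```

Once repeated kicks are allowed, the pair alone no longer identifies a kick, yet the kick is still not identified independently of Jack and John.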

> Note furthermore that verb phrases include other types of
> relationships that don't correspond to actions, such as "is less
> than", "has" or "is the father of". Note how silly it would seem to
> name a particular instance of a "has" relationship - ie between a
> particular pair of entities.

I have an apartment, and I can put a name to my ownership. But I see
your point.

>> (And what if the alternative means of identification is also a
>> combination of entities?)
>
> Relationships won't tend to do that!

Never mind; with my current understanding of the "definition" of
relation, it wouldn't make a difference.

> At this rather informal level of discussion, relationships are
> counterparts to relations and as we know,

They are? What, then, are the counterparts of entities and attributes?
--
Jon

JOG

Dec 7, 2007, 7:11:27 AM

On Dec 7, 12:41 am, David BL <davi...@iinet.net.au> wrote:
> On Dec 7, 5:36 am, JOG <j...@cs.nott.ac.uk> wrote:
> > On Dec 6, 7:49 pm, David BL <davi...@iinet.net.au> wrote:

> [snip]

> > , but I'm not sure I agree. Intensional definitions only
> > refer to rules concerning valid values for predicate variables, not
> > valid entities. If I've missed a trick there, perhaps it's worth an
> > example?
>
> An intensional definition should uniquely define a corresponding
> extension. For example
>
> predicate:
> album(N)
>
> intension:
> String N is the name of a studio album
> released before 2003, of the band Garbage
>
> extension:
> {
> {N=Garbage},
> {N=Version 2.0},
> {N=beautifulgarbage}
> }
>
> An instantiation of the intensional definition is
>
> String "Version 2.0" is the name of a studio
> album released before 2003, of the band Garbage
>
> This natural language sentence refers to a studio album entity in the
> real world. The name value is distinct from the entity. The
> existence of the entity is implied by the intensional definition.

Yup, we did have crossed wires - I thought you were referring to
integrity constraints (which of course are very much part of a
relation's intension too).
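
The quoted album example maps fairly directly onto code: the intension is a membership rule and the extension is the set of values satisfying it. The Python below is only a sketch; the release years are discography facts supplied for the example (they are not in the thread), and the dictionary name is made up.

```python
# Release years are supplied for illustration, not taken from the thread.
GARBAGE_STUDIO_ALBUMS = {
    "Garbage": 1995,
    "Version 2.0": 1998,
    "beautifulgarbage": 2001,
    "Bleed Like Me": 2005,
}

def album(n):
    """Intension: string n names a Garbage studio album released before 2003."""
    year = GARBAGE_STUDIO_ALBUMS.get(n)
    return year is not None and year < 2003

# Extension: the set of values for which the predicate holds.
extension = {n for n in GARBAGE_STUDIO_ALBUMS if album(n)}
```

Note that the function only ever sees the string N; the album it refers to exists outside the program, which is the distinction between value and entity drawn in the quoted text.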

>
>
>
> > > [ker-shhhnip]
> > > > Summarizing the above I gathered that you are saying that:
> > > > RELATIONSHIP: married(Husband, Wife, Location)
> > > > ENTITY: married(MarriageId, Husband, Wife, Location)
> > > > So the only difference is that the entity has a marriageID? I am not
> > > > clear why you think the addition of this surrogate would change a
> > > > relationship into an entity!
>
> > > In the first case the marriage is a relationship because it is only
> > > identified indirectly by the entities that are involved. In the
> > > second case the marriage has been directly identified.
>
> > Ok, sorry if I'm being dense, but are you saying that the second case
> > is an entity because it can be identified without reference to another
> > entity? And would a logical consequence be therefore that no
> > identifying attributes of an entity may be entities themselves?
>
> Yes.

O.k., so the next natural question is back to our team entity...

Team(goalkeeper:Jim, defender:David, Midfielder:Jon, Attacker:Bob)

But by the definition you are positing, there is no team entity here
at all right? Because the identifying attributes are entities
themselves (assuming people are entities of course)? Hmmm....

>
> X is an entity in the context of the model if there exists set A of
> attributes + values that identify X and there doesn't exist a subset
> of A that identifies a different entity Y.
>
> Actually this attempted definition isn't quite right. For example you
> could determine that a team is an entity except for the fact that you
> end up identifying the team captain as well. Perhaps that problem
> can be fixed by talking about maximal entity types corresponding to
> cartesian products of domains and noting that team identifiers aren't
> suitable for identifying players more generally.

Maximal entity types... er.... sounds like things are getting more
convoluted. Which brings me to the real question. What's the point of
all this? Why not give up on trying to make what I still feel is an
artificial split between entities and relationships? What is gained by
that split that makes it worth the effort?

As ERM has evolved, relationships are looking more and more like
entities anyhow. Now they are shapes, not lines; now they can have
attributes too... One more step, make them boxes instead of diamonds,
and the job is done. What would we have lost then? Might I invoke
Occam's razor and say that a system with one type is preferable to
two, unless making a split has any real benefits?

>
> > > That seems
> > > like a significant difference in the logical layer. It shows up in
> > > the intensional definitions where a marriage takes on a "role" as an
> > > entity.
>
> > > In the second case there must be some underlying reason to introduce a
> > > marriage identifier. That reason points at a significant difference
> > > in requirements. Don't assume the marriage identifier is a surrogate
> > > id! Instead assume this is a well conceived design, and it's a
> > > natural identifier.
>
> > Well hey, I don't think a surrogate means a poorly conceived design.
> > But I take your point, the Marriage ID in this case is coming from
> > some external source, and hasn't been instigated by the designer of
> > this modeller.

Jon seems to have covered a lot of comments I might make, so I won't
replicate his post. Instead I shall drink tea.

mAsterdam

Dec 7, 2007, 7:55:41 AM

Sorry for butting in this late, and not even completely on topic.
'Facts' triggered my interest.

Jon Heggland wrote:

> ...(The idea of viewing a database as a
> collection of facts was a revelation for me in that regard.) In fact, I
> have the opposite problem; I am unable to look at an E/R diagram without
> thinking about relations.
>
> Consider this proposition: "Jon was born in 1974", encoded in a relvar
> of the form Born(Person, Year). I think we'll agree that represents a
> fact about Jon. You would probably assume that Jon is an entity (though
> I'm unsure about what you'd call the relvar/predicate in itself---is it
> an entity (type)?). But I would also say that the proposition is as much
> a fact about the year 1974! Is 1974 an entity? I really don't care.
> Facts are all.

Consider the statement "Jon is 33 years old". It conveys the same real
world fact in a clumsier way than "Jon was born in 1974". Next year it
won't even convey the same fact anymore. "Jon was born in 1974"
catches the invariant better than "Jon is 33 years old".
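
A tiny Python sketch of why the birth year is the better thing to store: the age can be derived from it for any year, while a stored age goes stale. The function name `age_in` is an assumption, and the calculation ignores whether the birthday has passed yet.

```python
BIRTH_YEAR_OF_JON = 1974  # the stored, time-invariant fact

def age_in(birth_year, year):
    """Age a person born in birth_year reaches during the given year."""
    return year - birth_year

age_2007 = age_in(BIRTH_YEAR_OF_JON, 2007)
age_2008 = age_in(BIRTH_YEAR_OF_JON, 2008)
```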

Consider "John is in Canada". When? The fact isn't
complete without that piece of information.

"<Person> is at <Location>" needs a time to become a possibly
interesting facttype:
"At <Time>, <Person> is/was/will be at <Location>"

The idea of viewing a database as a collection of
possibly interesting facts was a revelation for me :-)
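
The fact type "At <Time>, <Person> is/was/will be at <Location>" could be sketched in Python as follows. The class name `WasAt`, the ISO date strings and the sample facts are all invented for illustration.

```python
from typing import NamedTuple

class WasAt(NamedTuple):
    """Fact type: at <time>, <person> is/was/will be at <location>."""
    time: str       # ISO date string, an illustrative choice
    person: str
    location: str

# Hypothetical facts; the dates are invented.
facts = {
    WasAt("2007-12-01", "John", "Canada"),
    WasAt("2007-12-07", "John", "Norway"),
}

def location_of(facts, person, time):
    """Locations recorded for person at exactly the given time."""
    return {f.location for f in facts if f.person == person and f.time == time}
```

Without the time attribute the two facts about John would contradict each other; with it, both can be recorded.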

[snip]

> A propositional encoding does specify how this marriage is identified,
> yes. What I dispute is the distinction between implicit and explicit
> identification; between entities and relationship. A tuple/fact is
> identified by some combination of its attributes, that is all.

minus "that is all": agreed. The choice of the combination of
attributes takes care, some of which is specific to the facttype
(was_born vs. has_age), and some of which is more general (At time:
...).

David Cressey

Dec 7, 2007, 8:15:28 AM

"Jon Heggland" <jon.he...@ntnu.no> wrote in message

news:fjba56$p98$1...@orkan.itea.ntnu.no...

Unlike Jan, I do not claim that the distinction between entities and
relationships carries through to the logical/relational level.

Indeed, the more I look at entity tables, the more I think that they don't
really describe entities at all! What they really describe are "unary
relationships" that span the attributes that "pertain to the same entity".
In this way of looking at things, the only thing that makes it down to the
logical/relational level are relationships. The entities are abstractions.

Given that relvars, as currently implemented, identify the attributes that
make up the tuples by name, rather than by position, I think it's fair to
say that the state of a relvar is a "relationship" rather than a
"relation". Most of the regulars in here use the word "relation" in the
data modeling sense, and don't make any distinction between tuples whose
components are known by name and tuples whose components are known by
position. I adopt that terminology myself.
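
The difference between components known by position and components known by name can be shown in a short Python sketch. The attribute names `person` and `year` and the values are assumptions chosen for the example.

```python
# Positional tuples: attribute order is part of the value.
positional_a = ("Jon", 1974)
positional_b = (1974, "Jon")

# Tuples with named components: order is irrelevant, only names matter.
named_a = frozenset({"person": "Jon", "year": 1974}.items())
named_b = frozenset({"year": 1974, "person": "Jon"}.items())

# A relvar state is then a set of such named tuples.
state = {named_a, named_b}
```

The two positional tuples are different values, while the two named ones are the same tuple and collapse to a single element of `state`.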

David Cressey

Dec 7, 2007, 8:27:14 AM

"mAsterdam" <mAst...@vrijdag.org> wrote in message
news:475941b3$0$235$e4fe...@news.xs4all.nl...

> Sorry for butting in this late, and not even completely on topic.
> 'Facts' triggered my interest.
>
> Jon Heggland wrote:
>
> > ...(The idea of viewing a database as a
> > collection of facts was a revelation for me in that regard.) In fact, I
> > have the opposite problem; I am unable to look at an E/R diagram without
> > thinking about relations.
> >
> > Consider this proposition: "Jon was born in 1974", encoded in a relvar
> > of the form Born(Person, Year). I think we'll agree that represents a
> > fact about Jon. You would probably assume that Jon is an entity (though
> > I'm unsure about what you'd call the relvar/predicate in itself---is it
> > an entity (type)?). But I would also say that the proposition is as much
> > a fact about the year 1974! Is 1974 an entity? I really don't care.
> > Facts are all.
>
> Consider the statement "Jon is 33 years old". It conveys the same real
> world fact in a clumsier way than "Jon was born in 1974". Next year it
> won't even convey the same fact anymore. "Jon was born in 1974"
> catches the invariant better than "Jon is 33 years old".
>
> Consider "John is in Canada". When? The fact isn't
> complete without that piece of information.
>

Time for a Clinton moment. The above discussion depends on what the meaning
of the word "is" is.

In Spanish, "John is 33 years old" will be expressed roughly like this:
"John has 33 years."
The verb "to be" is not used.

"John is a man" will be expressed using one of the Spanish verbs "to be".

"John is in Canada" will be expressed using the other of the Spanish verbs
"to be".

This distinction is wasted on a person who thinks about the facts in
English. But it isn't wasted, at all, on a person who thinks about the
facts in Spanish. There are even statements in Spanish that differ only by
which verb is used.

To a Spanish speaker, the following are two different facts:
"Juan es loco."
"Juan está loco."

Does this mean that the content of the database is different, depending on
the first language of the observer?

I apologize for using Spanish rather than a more common language. Spanish
is the only language, other than English, that I know well enough to use to
illustrate the point.

I recall that Bob Badour attributed to Dijkstra the motto that one should
always do computer science in a second language.

David BL

Dec 7, 2007, 10:37:30 AM

On Dec 7, 8:17 pm, Jon Heggland <jon.heggl...@ntnu.no> wrote:
> Quoth David BL:
>
> > I wasn't actually intending that Location be necessary for
> > identification of a marriage. I'll make the intensional definition
> > clearer:-
>
> > married(Husband, Wife, Location) :-
> > Husband is *currently* married to Wife
> > and they (last) got married at Location
>
> > Candidate keys are { Husband } or { Wife }, enforcing monogamy
> > integrity constraints.
>
> So Marriage is a relationship between a Husband and a Wife, yet it is
> identified by either, not the combination? I thought I finally had the
> common definition of "relationship" pegged, and then this comes along.

I see your point, and I can see a similarity to the example of how a
team identifier also identifies the team captain in the last post to
Jim. Evidently my "definitions" are too simplistic. In this case,
it was only a desire for strong integrity constraints that caused the
problem so presumably the distinction between entities and
relationships could be adjusted to accommodate these functional
dependency integrity constraints.

> I suppose I am looking for rigor where there is none, though. The
> definition of entity---something that is identified independently of
> other entities---is also rather half-baked. Take weak entities, for
> instance.

Yes, rather half baked.

> >> What if
> >> Location does not correspond to an entity type? I assume that might be
> >> the case if it were simply a spatial coordinate. Is marriage still a
> >> relationship? If not, what is it?
>
> > I agree that it is strange to think of a spatial coordinate as an
> > entity type.
>
> My attempted point was to posit that a Location might be a non-entity
> attribute, /or/ an entity (say, a church or temple or official building
> of some kind), and to ask whether this made any difference as to whether
> the marriage is a relationship or an entity.

I don't think so.

> >>> By contrast the following predicates are consistent with thinking of a
> >>> marriage itself as an entity
> >>> husband(MarriageId, Husband).
> >>> wife(MarriageId, Wife).
> >>> location(MarriageId, Location).
> >>> or maybe just
> >>> married(MarriageId, Husband, Wife, Location)
> >>> Now whether one "thinks in ER" or "thinks in propositional encodings",
> >>> there has to be good reason to introduce a MarriageId.
> >> Perhaps marriage certificates have a unique number stamped on them, and
> >> you want to record this in your database.
>
> > Yes, that could be a reason. Consider the following intensional
> > definition
>
> > married(MarriageId, Husband, Wife, Location) :-
> > The marriage identified by MarriageId was
> > between Husband and Wife at Location.
>
> > In this case the relation can record marriages that are no longer
> > current, and the only candidate key is { MarriageId }.
>
> And you say this now is an entity, right?

That depends on what you mean by "this". The tuple is always a tuple
and represents a fact.

The intensional definition is now explicitly identifying a marriage as
an identifiable entity.

> But what if the MarriageId
> represents an entity, just like Husband and Wife presumably do?

Why do you ask that question? That is how it's interpreted.

> Do we
> not then have a situation analogous to the first case, except that the
> relationship is ternary? It is, after all, identified by one of the
> entities it relates, just like the first Marriage.

Yes, exactly. The propositions are still associated with stating
relationships between entities, but this time the marriage is taking
an explicit part as a named entity.

> > Well I can see you are technically picking holes in my
> > "characterisation" of entity, but I wasn't really putting forward a
> > definition. I think the "characterisation" was very informal and more
> > along the lines of being necessary but not sufficient.
>
> Why necessary?

You're trying to draw me into some formal definitions. I'm only
considering how examples tend to pan out in practice, towards some
understanding of this neutrality question.

> > Actually, I would prefer to steer clear of trying to pin down such
> > informal words as "entity" and "relationship".
>
> But that is one of the main points of my involvement in this discussion!

Not mine!

> > However, I still believe particular entity types are implicit in the
> > intensional definitions of the predicates. How can they not be?
>
> I think the burden of proof is on your side here. I have tried to show
> that predicates can be interpreted as both entities and relationships
> (and probably even as parts of entities). You seem to admit as much,
> given that you use the word "informal" about the whole E/R shebang.

I don't interpret predicates as anything other than predicates. An
instantiation of a predicate can be used to record a relationship that
exists between entities. A set of attribute values can sometimes
identify an entity.

You are looking at this from the perspective of the RM mathematical
formalism, where entities don't exist. I'm considering the
intensional definitions which tend to be stated in natural language
and are directly interpreted as factual statements about things in the
real world.

> I note that Jan Hidders claimed that the E/R distinction carries through
> to the logical/relational level, but I cannot see how---except by
> arbitrary claims along the lines of "if a relvar has but one key, and
> this key's components are all foreign keys, then the relvar represents a
> relationship".

> I have also tried to show that the predicates, with keys, represent more
> information---important information, at that---than a classification of
> things into entities and relationships, with entities having a single
> key and relationships being identified by their entities (or some subset
> thereof?). And that the predicate representation is simpler, in a way,
> than E/R, since it does not have the fuzzy distinction between entities
> and relationships---it only has the predicate construct. I therefore
> consider an E/R model insufficient, or too simplified to be of much use.
> This, of course, is a matter of taste.

I agree with that. I'm not a fan of E/R models. I prefer to directly
write down the predicates. In my opinion the predicates
simultaneously encode logical and conceptual design decisions. The
predicates have this entity-less RM formalism on the one hand, and the
intensional definitions involving entity types on the other. The
latter is messy and informal, but cannot be denied.

An important part of the design is to work out what things need to be
identified. If we feel that it's necessary to directly identify a
thing in order to state facts about it, then we consider it to be an
entity. Treat that characterisation as definitional, meant only to give
some rough idea of that rather vague distinction between entity and
relationship.

> > Although the intensional definitions are strictly outside the
> > mathematical formalism of the RM, they are nevertheless fundamental to
> > the meaning and purpose of the database.
>
> Fundamental? Why?

Consider the following predicates

married(H,W) :- H is currently married to W and H is an
Australian citizen.

married(H,W) :- W has at some time been married to H and W is a
current employee of company X.

Aren't the intensional definitions fundamentally important?
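
A rough Python sketch of the point: two relvars can share a heading and even a tuple, yet the tuple states a different fact under each intensional definition. The names `interpret_current` and `interpret_past` and the sample tuple are assumptions for illustration.

```python
# One tuple over the heading (H, W); the names are hypothetical.
row = ("Adam", "Beth")

def interpret_current(h, w):
    """Reading under the first definition of married(H, W)."""
    return f"{h} is currently married to {w} and {h} is an Australian citizen."

def interpret_past(h, w):
    """Reading under the second definition of married(H, W)."""
    return (f"{w} has at some time been married to {h} and "
            f"{w} is a current employee of company X.")

fact_1 = interpret_current(*row)
fact_2 = interpret_past(*row)
```

Nothing in the tuple itself says which reading applies; that information lives only in the intensional definition.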

> I have a hypothesis. It is that you are so used to thinking about a
> database in terms of entities and relationships that it is impossible
> for you to view it any other way.
> (The idea of viewing a database as a
> collection of facts was a revelation for me in that regard.) In fact, I
> have the opposite problem; I am unable to look at an E/R diagram without
> thinking about relations.

Your hypothesis is way off the mark.

> Consider this proposition: "Jon was born in 1974", encoded in a relvar
> of the form Born(Person, Year). I think we'll agree that represents a
> fact about Jon. You would probably assume that Jon is an entity (though
> I'm unsure about what you'd call the relvar/predicate in itself---is it
> an entity (type)?). But I would also say that the proposition is as much
> a fact about the year 1974! Is 1974 an entity? I really don't care.
> Facts are all.

My use of the word "entity" is no more presumptuous than your use of
the word "fact". Can you define what a fact is? In the formalism of
the propositional calculus, a proposition is just a formula.

Can you show me a factual statement stated in English that doesn't
reference something that could informally and reasonably be called an
identifiable entity? I'm assuming the fact must have actual
information content (ie no tautologies allowed) and could reasonably
be recorded in a real database.

> >>> Aren't you implying that a propositional encoding doesn't commit you
> >>> to a decision about whether a marriage is implicitly or explicitly
> >>> identified? I fail to see how that is possible.
>
> > This is the question I would like you to comment on! Do you agree
> > that a predicate can treat a marriage like a relationship, or
> > otherwise like an entity, which seems at odds with this idea of a
> > neutral logical layer?
>
> A propositional encoding does specify how this marriage is identified,
> yes. What I dispute is the distinction between implicit and explicit
> identification; between entities and relationship. A tuple/fact is
> identified by some combination of its attributes, that is all.

When you say you dispute it, what do you mean? I regard the
distinction as definitional, and to the extent that we can make sense
of it in particular examples, gives us some consistency in the
difference between entities and relationships.

I agree that a tuple/fact is identified by some combination of its
attributes. I also agree that a fact must not be thought of as
directly representing an entity or a relationship.

> In order to be able to say that a predicate treats something like an
> entity or a relationship, you would need to define precisely what this
> treatment entails---i.e., you would have to 'pin down such informal
> words as "entity" and "relationship"'.

I don't agree. I think the identification distinction applies
appropriately in lots of common examples. Just because something
breaks down in the fully general case doesn't make it worthless.

I posted because Jim claimed that the marriage example illustrated
neutrality in propositional encoding whereas ERM forces bias in the
design because of the entity/relationship distinction.

> Even if you were able and willing to do that, it would be backwards to
> claim that the logical layer makes the distinction. You could with as
> much justification define some mapping from predicates to either ducks
> or cats, and then claim that the logical model is not animal-independent.

That's only because the definition of that mapping is at odds with
people's conception of ducks and cats.

> >> You define relationships as anything that can be identified /only/ by
> >> its related entities---if there is an alternate key, the thing is an
> >> entity. Correct? What is the rationale behind this rule?
>
> > It is only a rather vague characterisation, and about as precise as
> > the entity/relationship distinction deserves!
>
> > My rationale is as follows: [...]
>
> > Now in natural language we don't normally name instances of verbs or
> > actions, but instead name subject and object. Eg "Jack kicks John".
> > We don't think of the kick action as an entity that needs to be
> > identified independently of Jack and John. Furthermore a kick is less
> > tangible - it has a fleeting existence.
>
> It seems to me that the more significant point is whether or not the
> kick exists independently of Jack and John. Presumably, it doesn't, and
> introducing/discovering an alternate identifier should not change this.

Exactly, so if we assume the propositional encoding by the DBA is well
conceived, it won't directly identify an instance of a kick.

> Is "X kicked Y at time T" (i.e. Jack can kick John multiple times) still
> a relationship?

I think it should be.

> Perhaps the definition is "something that cannot be
> identified independently of other things", instead of "something that is
> identified solely by other things"?

That seems an improvement.

> Although that would make a weak
> entity a kind of relationship...

Hmmm.

> > Note furthermore that verb phrases include other types of
> > relationships that don't correspond to actions, such as "is less
> > than", "has" or "is the father of". Note how silly it would seem to
> > name a particular instance of a "has" relationship - ie between a
> > particular pair of entities.
>
> I have an apartment, and I can put a name to my ownership. But I see
> your point.
>
> >> (And what if the alternative means of identification is also a
> >> combination of entities?)
>
> > Relationships won't tend to do that!
>
> Never mind; with my current understanding of the "definition" of
> relation, it wouldn't make a difference.
>
> > At this rather informal level of discussion, relationships are
> > counterparts to relations and as we know,
>
> They are? What, then, are the counterparts of entities and attributes?

I didn't say that well at all. Very informally, a set of attributes
(of the RM formalism) plus their values can identify an entity.
Amongst other things, relations are used to record the relationships
between the entities.

Jan Hidders

Dec 7, 2007, 11:04:00 AM

On 7 dec, 12:17, Jon Heggland <jon.heggl...@ntnu.no> wrote:
> Quoth David BL:
>
> > I wasn't actually intending that Location be necessary for
> > identification of a marriage. I'll make the intensional definition
> > clearer:-
>
> > married(Husband, Wife, Location) :-
> > Husband is *currently* married to Wife
> > and they (last) got married at Location
>
> > Candidate keys are { Husband } or { Wife }, enforcing monogamy
> > integrity constraints.
>
> So Marriage is a relationship between a Husband and a Wife, yet it is
> identified by either, not the combination? I thought I finally had the
> common definition of "relationship" pegged, and then this comes along.
>
> I suppose I am looking for rigor where there is none, though. The
> definition of entity---something that is identified independently of
> other entities---is also rather half-baked. Take weak entities, for
> instance.

Allow me to make an attempt at a few definitions:

Entities are things.
Relationships are predicates.

What's wrong with this picture?

-- Jan Hidders

David Cressey

unread,

Dec 7, 2007, 11:12:51 AM12/7/07

to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:0c5c22b8-cdb3-462c...@a39g2000pre.googlegroups.com...

This sounds right to me. If we can go one step further, and say that unary
relationships are predicates that reference only one entity, now we have a
basis for moving forward. The entire logical level is based on predicates.

We can say, as others have said, that relvars can be designed once the
predicates are known.
And this, unlike the discussions on ER, remains silent on the subject of
whether the mode of expression has to be a diagram. Predicates can be
expressed in plain English.

What's the difference between a predicate and a proposition?

> -- Jan Hidders

mAsterdam

unread,

Dec 7, 2007, 11:26:51 AM12/7/07

to

David Cressey schreef:
> mAsterdam wrote

>> Sorry for butting in this late, and not even completely on topic.
>> 'Facts' triggered my interest.
>>

>> Jon Heggland wrote:
>>
>>> ...(The idea of viewing a database as a
>>> collection of facts was a revelation for me in that regard.) In fact, I
>>> have the opposite problem; I am unable to look at an E/R diagram without
>>> thinking about relations.
>>>
>>> Consider this proposition: "Jon was born in 1974", encoded in a relvar
>>> of the form Born(Person, Year). I think we'll agree that represents a
>>> fact about Jon. You would probably assume that Jon is an entity (though
>>> I'm unsure about what you'd call the relvar/predicate in itself---is it
>>> an entity (type)?). But I would also say that the proposition is as much
>>> a fact about the year 1974! Is 1974 an entity? I really don't care.
>>> Facts are all.

>> Consider the statement "Jon is 33 years old". It conveys the same real
>> world fact in a clumsier way than "Jon was born in 1974". Next year it
>> won't even convey the same fact anymore. "Jon was born in 1974"
>> catches the invariant better than "Jon is 33 years old".
>>
>> Consider "John is in Canada". When? The fact isn't
>> complete without that piece of information.
>
> Time for a Clinton moment. The above discussion depends on what the meaning
> of the word "is" is.

Did Clinton's argument hold up? Oh wait :-)

Under which interpretation/condition of "is" would
"John is in Canada" not need a time to be of interest?

> In Spanish, "John is 33 years old" will be expressed roughly like this:
> "John has 33 years."
> The verb "to be" is not used.
>
> "John is a man" will be expressed using one of the Spanish verbs "to be".
>
> "John is in Canada" will be expressed using the other of the Spanish verbs
> "to be".
>
> This distinction is wasted on a person who thinks about the facts in
> English. But it isn't wasted, at all, on a person who thinks about the
> facts in Spanish. There are even statements in Spanish that differ only by
> which verb is used.

For the fact type of which "John is 33" is a statement, it is possible
to give a language-specific intension and a language-neutral extension:

[English]: <Person> is <Age> years old
[Nederlands]: <Persoon> is <Leeftijd> jaar oud
----------------------------------------------
John, 33
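
A minimal Python sketch of that split, assuming one sentence template per language and a single shared set of tuples. The names INTENSION, EXTENSION and render are made up for the illustration and come from no particular tool:

```python
# Language-specific intension: one sentence template per language.
INTENSION = {
    "English": "{person} is {age} years old",
    "Nederlands": "{person} is {age} jaar oud",
}

# Language-neutral extension: the same tuples, whoever reads them.
EXTENSION = {("John", 33)}


def render(language):
    """Verbalize every tuple of the extension in the requested language."""
    template = INTENSION[language]
    return [template.format(person=p, age=a) for p, a in sorted(EXTENSION)]


for language in INTENSION:
    print(language, render(language))
```

Only the templates differ per reader; the stored tuple is shared.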

> To a Spanish speaker, the following are two different facts:
> "Juan es loco."
> "Juan está loco."

What do they mean? My Spanish sucks.

> Does this mean that the content of the database is different, depending on
> the first language of the observer?

One requirement for a database can be:
make sure that the content is language-neutral.

> I apologize for using Spanish rather than a more common language. Spanish
> is the only language, other than English, that I know well enough to use to
> illustrate the point.
>
> I recall that Bob Badour attributed to Dijkstra the motto that one should
> always do computer science in a second language.

Quoting Dijkstra can make anyone look sensible.

Bob Badour

unread,

Dec 7, 2007, 12:29:49 PM12/7/07

to

David Cressey wrote:

Instantiation.

Jan Hidders

unread,

Dec 7, 2007, 1:17:50 PM12/7/07

to

On 7 dec, 17:12, "David Cressey" <cresse...@verizon.net> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message

Indeed. Both the RM and ER modelling are ultimately based on
predicates. That is not where the essential difference lies. Also not
in the fact that you can draw a diagram, because this can be done in
both cases.

> What's the difference between a predicate and a proposition?

My usual explanation is something like the following. A proposition is
a specific statement that is either true or false. A predicate is a
proposition where zero or more references have been left open such as
"X is having a conversation with Y". If you instantiate the predicate,
i.e., you fill in the X and Y with well understood references, like in
"Jan is having a conversation with David", then it becomes a
proposition. Roughly speaking you could say that the relationship
between propositions and predicates is like the relationship between
entities and entity types, or between objects and object classes.
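
As a rough illustration only, here is a small Python sketch of that idea: a predicate is a statement form with open references, and filling all of them in gives a proposition. The Predicate class and its instantiate method are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A statement form whose references have been left open."""
    template: str       # e.g. "{X} is having a conversation with {Y}"
    parameters: tuple   # names of the open references

    def instantiate(self, **bindings):
        """Fill in every open reference; the result is a proposition."""
        missing = set(self.parameters) - set(bindings)
        if missing:
            raise ValueError(f"unbound references: {sorted(missing)}")
        return self.template.format(**bindings)


conversation = Predicate("{X} is having a conversation with {Y}", ("X", "Y"))
proposition = conversation.instantiate(X="Jan", Y="David")
print(proposition)
```

Leaving a reference unbound raises an error here, which mirrors the point that a statement with open references is not yet something that can be true or false.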

Somehow I suspect that what you actually wanted to hear was "they are
the same". :-)

-- Jan Hidders

David Cressey

unread,

Dec 7, 2007, 2:08:55 PM12/7/07

to

"mAsterdam" <mAst...@vrijdag.org> wrote in message

news:47597326$0$227$e4fe...@news.xs4all.nl...
> David Cressey schreef:

> > To a Spanish speaker, the following are two different facts:
> > "Juan es loco."
> > "Juan está loco."

In casual conversation, I would say "John is crazy" for either one of them.
But they don't express the same fact. For the first one, if John were to
be not crazy tomorrow, it would be a sign of a most unusual and unexpected
recovery from a chronic condition. For the second one, if John were not
crazy tomorrow, it would probably mean that he went through a psychotic
episode of short duration.

>
> What do they mean? My Spanish sucks.
>

> > Does this mean that the content of the database is different, depending
> > on the first language of the observer?
>
> One requirement for a database can be:
> make sure that the content is language-neutral.
>

This could get into some deep waters. The linguists and programmers who
have attempted to perform automatic translation between two natural
languages have repeatedly come up against the obstacle that a language
neutral scheme for expressing human thought is far more elusive than it
seems.

David Cressey

unread,

Dec 7, 2007, 2:12:48 PM12/7/07

to

"Jan Hidders" <hid...@gmail.com> wrote in message

news:b0c48336-ae41-4041...@s8g2000prg.googlegroups.com...

Could they also be the difference between a relvar's intension and its
extension, as I read in here?

> Somehow I suspect that what you actually wanted to hear was "they are
> the same". :-)
>

No, actually what I wanted to hear was "the same as the difference between
the two Wikipedia definitions." But I will be content if I can learn the
truth, regardless of what I wanted.

> -- Jan Hidders

TroyK

unread,

Dec 7, 2007, 2:21:33 PM12/7/07

to

For any given conceptual model (assuming in this context
that we take the ER model mentioned to be a conceptual
model), there are (usually) more than one valid logical designs
that will faithfully model it.

The act of choosing among the tradeoffs that those logical
models imply, to me, is the act we call designing. I'm unaware
of any tools that do this mechanically (at least adequately),
so I would say that the human input is more than a "little bit",
which you said.

TroyK

Tegiri Nenashi

unread,

Dec 7, 2007, 2:25:25 PM12/7/07

to

On Dec 7, 8:12 am, "David Cressey" <cresse...@verizon.net> wrote:
> What's the difference between a predicate and a proposition?

proposition is a 0-ary predicate

TroyK

unread,

Dec 7, 2007, 2:49:53 PM12/7/07

to

On Dec 7, 9:12 am, "David Cressey" <cresse...@verizon.net> wrote:
[snip]

> What's the difference between a predicate and a proposition?

I like C.J. Date's explanation for this:
"Observe, incidentally, that a proposition can be regarded as a
degenerate predicate; to be precise, it's a predicate for which the
corresponding set of parameters is empty (and the function thus always
returns the same result, either TRUE or FALSE, every time it's
invoked). In other words, all propositions are predicates, but most
predicates aren't propositions."

"The Logic Of Business Rules", dbdebunk.com "Practical Database
Foundations" paper.
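
Date's "function" wording can be mimicked in Python, purely as a sketch: bind every parameter of a predicate and what remains is a callable with an empty parameter list that returns the same truth value on each call. The is_less_than and arity names are assumptions of this example, not anything taken from Date's paper.

```python
from functools import partial
from inspect import signature


def is_less_than(x, y):
    """A binary predicate: its truth value depends on both parameters."""
    return x < y


def arity(fn):
    """Number of parameters that are still open on a callable."""
    return len(signature(fn).parameters)


# Binding every parameter leaves a degenerate, zero-parameter predicate.
three_less_than_five = partial(is_less_than, 3, 5)

print(arity(is_less_than))           # 2
print(arity(three_less_than_five))   # 0
print(three_less_than_five())        # True
```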

TroyK

mAsterdam

unread,

Dec 7, 2007, 3:31:42 PM12/7/07

to

paul c schreef:

> mAsterdam wrote:
>> One requirement for a database can be:
>> make sure that the content is language-neutral.
>

> No need, the RM already is language-neutral, eg.,
>
> <John,33>
> <Johan,33>
> <Jaan,33>
>
> are different, not the same.
>
> Same goes for headings:
>
> <Nomme, Ans>
> <Name, Years>
>
> are different headings.

Yes. However, having multiple headings in one relation
is not part of RM AFAIK. Is it? Maybe you or somebody else
knows of some work in this area?

David Cressey

unread,

Dec 7, 2007, 3:36:05 PM12/7/07

to

"TroyK" <cs_t...@juno.com> wrote in message
news:ac66edf3-bc57-47c4...@l1g2000hsa.googlegroups.com...

Exactly.

> I'm unaware
> of any tools that do this mechanically (at least adequately),
> so I would say that the human input is more than a "little bit",
> which you said.
>

Data Architect from Sybase did this for me very well back in 1999. It did
physical design in the same step, but no matter.

mAsterdam

unread,

Dec 7, 2007, 3:45:24 PM12/7/07

to

David Cressey wrote:
> mAsterdam wrote:

>> David Cressey wrote:
>
>>> To a Spanish speaker, the following are two different facts:
>>> "Juan es loco."
>>> "Juan está loco."
>
> In casual conversation, I would say "John is crazy" for either one of them.
> But they don't express the same fact. For the first one, if John were to
> be not crazy tomorrow, it would be a sign of a most unusual and unexpected
> recovery from a chronic condition. For the second one, if John were not
> crazy tomorrow, it would probably mean that he went through a psychotic
> episode of short duration.
>

Thanks.

[snip]

>>> Does this mean that the content of the database is different, depending
>>> on the first language of the observer?
>> One requirement for a database can be:
>> make sure that the content is language-neutral.
>
> This could get into some deep waters.

Deep and rich.

> The linguists and programmers who
> have attempted to perform automatic translation between two natural
> languages have repeatedly come up against the obstacle that a language
> neutral scheme for expressing human thought is far more elusive than it
> seems.

Yes. Yet, facts in shared databases make a comparatively
simple subset of what can be expressed in a narrative way.

Development of a database for a multilingual
organization does face the requirement that
the extension should be language-neutral.
This seems more achievable than automatic translation.

mAsterdam

unread,

Dec 7, 2007, 4:07:02 PM12/7/07

to

paul c schreef:
> mAsterdam wrote:

>> paul c schreef:

>>> <Nomme, Ans>
>>> <Name, Years>
>>>
>>> are different headings.
>>
>> Yes. However, having multiple headings in one relation

>> is not part of RM AFAIK. ...
>
> Who said anything about multiple headings in one relation?

I did. It is the way I labelled:

>>> <Nomme, Ans>
>>> <Name, Years>

... apparently not something you intended.
Why not - or, better: what are they?

mAsterdam

unread,

Dec 7, 2007, 4:11:37 PM12/7/07

to

paul c schreef:
> mAsterdam wrote:

>> Development of a database for a multilingual
>> organization does face the requirement that
>> the extension should be language-neutral.
>> This seems more achievable than automatic translation.
>

> I'd bet that most United Nations db headers are uni-lingual,
> propositions too. I doubt that they translate their ER diagrams into 99
> languages, although from stories I've heard, I might be wrong about that!

I won't take your bet, I'll side with you on that.

However, even with the uni-lingual db headers there will be a lot of
multi-lingual user-interfaces to those databases. In a way they take
the role of those headers.

Jan Hidders

unread,

Dec 7, 2007, 4:37:57 PM12/7/07

to

Yes, in the sense that the predicate is the intension of the relvar
and each tuple in the content of the relvar represents a proposition
in the extension of the predicate.
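
A hedged sketch in Python, reusing the Born(Person, Year) example from earlier in the thread. The helper names are hypothetical, and the holds function assumes the closed-world reading, in which a proposition counts as asserted exactly when its tuple is present:

```python
# Intension: the predicate, i.e. what a tuple in the relvar is taken to mean.
BORN_HEADING = ("person", "year")
BORN_PREDICATE = "{person} was born in {year}"

# Current value of the relvar. ("Jon", 1974) is the example from this thread;
# the second tuple is invented purely for illustration.
born = {("Jon", 1974), ("Wilma", 1970)}


def propositions(heading, predicate, relation):
    """Instantiate the predicate once per tuple, giving one proposition each."""
    return sorted(predicate.format(**dict(zip(heading, row))) for row in relation)


def holds(relation, *row):
    """Closed-world reading: asserted true exactly when the tuple is present."""
    return tuple(row) in relation


for p in propositions(BORN_HEADING, BORN_PREDICATE, born):
    print(p)
```

Assigning a different set to born changes the extension while the predicate stays put, which is the time-varying aspect that matters for databases.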

> > Somehow I suspect that what you actually wanted to hear was "they are
> > the same". :-)
>
> No, actually what I wanted to hear was "the same as the difference between
> the two Wikipedia definitions."

The entry for "proposition" seems fine to me and emphasizes correctly
that it is not the statement itself but rather its meaning, something
I glossed over. The entry for "predicate" is lousy, and seems to think
all predicates are unary. The "predicate (mathematics)" is better, but
considers only static ones, i.e., ones whose extension is fixed and
cannot change over time (hence the claim that a predicate can be
defined as a relation). This is not wrong in itself, this is indeed
how the term is usually used in mathematics, but in the context of
database theory it is important to realize that we also are
considering the time-varying kind.

-- Jan Hidders

Ruud de Koter

unread,

Dec 7, 2007, 4:59:32 PM12/7/07

to

JOG wrote:
> On Dec 5, 6:09 am, Ruud de Koter <nob...@internet.org> wrote:
>> JOG wrote:
>>> On Dec 4, 11:42 pm, Ruud de Koter <nob...@internet.org> wrote:
>>>> David Cressey wrote:
>>>>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>>>>> news:58c47eeb-adf0-414a...@l1g2000hsa.googlegroups.com...
>>>>>> Genuine question guys. From an E/R perspective (one of the good
>>>>>> variants, that allows relationships to have attributes), if I'm faced
>>>>>> with the following data.
>>>>>> -- Fred married Wilma in Bedrock.
>>>>>> -- Barney and Betty married in Paris.
>>>>>> How do I decide whether I am dealing with a marriage entity or a
>>>>>> marriage relationship?
>>>>>> The literature I'm reading here is telling me that the choice is based
>>>>>> on what 'things' are key to the business. If my business is concerned
>>>>>> with people (tax collection say), 'marriage' is best modelled as a
>>>>>> relationship, whereas if the marriages themselves are my focus
>>>>>> (perhaps I run a church) then it's probably better as an entity.
>>>>>> Have I made the right interpretation here, and is there general
>>>>>> agreement here? I am much more comfortable seeing that some variants
>>>>>> allow relationships to themselves have attributes, and that there is
>>>>>> nothing sacred about choices between using relationships or entities,
>>>>>> making it a design decision instead.
>>>>>> Thanks in advance, J.
>>>>> My answer is that it's subjective. If the subject matter experts all treat
>>>>> a marriage as a relationship, follow their lead. If the subject matter
>>>>> experts all treat it as an entity, follow their lead, but insist that
>>>>> there's a key attribute that identifies it. (Do the SMEs have any such
>>>>> thing as a "marriage ID" attribute?)
>>>>> Once you switch over from analysis to design, here's what happens: the
>>>>> attributes that you discovered during analysis and attached to entities or
>>>>> relationships will be carried over from your ER model to your design model,
>>>>> which I presume will be a relational model.
>>>>> The entities and relationships themselves, as such, will all disappear!
>>>>> Another thing that will carry over is the keys, used to identify instances
>>>>> of entities in the subject matter (or UoD if you prefer). Sometimes the
>>>>> keys used by the SME (subject matter experts) are a little too informal and
>>>>> require "common sense" to disambiguate. That's a special case, and doesn't
>>>>> affect this discussion.
>>>>> I'm used to expressing a relational model in terms of tables (relational
>>>>> tables), but I presume that what follows could be transliterated into terms
>>>>> of relvars without any difficulty.
>>>>> Each entity will have a table of its own, with the attributes that pertain
>>>>> to that entity. The primary key of the table will be the key attribute of
>>>>> the entity.
>>>>> Each relationship will have a table of its own, except for a few that can
>>>>> be piggy backed onto entity tables. The primary key of a relationship table
>>>>> will consist of two or more foreign keys, compound. These tables will
>>>>> automatically be normalized up to 3NF unless your analysis put an attribute
>>>>> on the "wrong entity". I'm not sure about normalization beyond 3NF.
>>>>> The above process is so automatic that you can have software that does it
>>>>> for you. Indeed, that's what several tools do. They express an ER model
>>>>> in terms of metadata, and likewise express the relational model in terms of
>>>>> metadata. And they have a programmed process that will create a relational
>>>>> model from an ER model. The only tool I know calls the relational model a
>>>>> "physical model" and makes it specific to some product like Oracle or DB2,
>>>>> etc. But that's a trivial detail. The software also turns models into
>>>>> diagrams and/or create scripts for you.
>>>>> So where did the entities and relationships go? They disappeared into the
>>>>> ether! However,
>>>>> when the application people get around to designing screens and reports,
>>>>> they can tie each feature back to an original entity or relationship. That
>>>>> can make the resulting system coherent for the users.
>>>>> I apologize for this response. It's really a lot more than you asked for.
>>>>> But it's actually easier to use than it is to describe. It may also
>>>>> oversimplify. The relational model that software constructs may not be the
>>>>> best relational model that could be designed to deal with the original
>>>>> problem.
>>>> Very clear, this answer. One minor point I 'd like to add is that it is
>>>> not subjective. Instead, the choices are governed by the goal to be
>>>> served with the application (assuming the analysis aims at building an
>>>> application).
>>> Shared data anyone? Isn't the point that we _don't_ necessarily know
>>> all the applications?
>>>> As clearly stated, there is a difference in perspective
>>>> between tax inspectors and priests (and spouses, for that matter). There
>>>> simply is no single authoritative model for a marriage, there are several
>>>> points of view, depending on the universe of discourse one operates in.
>>>> What we, in analysis, can do is to make sure we are aware of these UoDs,
>>>> and make a conscious choice. That is something else than being subjective.
>>> Why make the choice? Keep the data neutral and it's good for both tax
>>> inspectors and priests, right?
>> There are two troublesome points in your reaction. First of all 'keep
>> the data neutral' doesn't mean no choices are made. Staying neutral is a
>> choice as well. One of the hardest choices I 'd say, because in order to
>> stay neutral, a thorough knowledge of the universes of discourse is
>> necessary. Also, these universes should not be mutually exclusive.
>
> That's a fair point. Neutrality is something I've promoted for a long
> time on cdt, and I understand there are issues for processor cycles.
> However one can still take a single conceptual view of data in
> analysis and flatten it out in the logical layer. Take David's
> breakdown of the marriage example for instance - even though it is
> translated from a single conceptual view, by the time it is in the
> logical layer data may be extracted from it via the perspective of
> marriage as an entity, or marriage as a relationship, with equal ease.

Re-reading your posts gave me a better understanding. I see how
flattening-out may help to make the model more accessible for other UoDs
than the one that was used in the original analysis. Nevertheless, I
keep thinking this accessibility is only a surface matter: any
attributes that were not part of the analysis will not be in the logical
model. To stay with the example: the IRS may get at the marriage data,
but it will probably want to know some specific legal information that
determines how the spouse's taxes will be dealt with. It will not be
that hard to add columns in a relational model to represent these
attributes (an obvious and classical bonus of the relational approach).
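
To make this concrete, here is a small sketch using Python's built-in sqlite3 module. The table layout, the UNIQUE constraints (echoing the monogamy candidate keys discussed elsewhere in the thread) and the filing_status column are all assumptions made for the example, not anyone's actual design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY);
    CREATE TABLE marriage (
        husband  TEXT NOT NULL UNIQUE REFERENCES person(name),
        wife     TEXT NOT NULL UNIQUE REFERENCES person(name),
        location TEXT NOT NULL
    );
    INSERT INTO person VALUES ('Fred'), ('Wilma'), ('Barney'), ('Betty');
    INSERT INTO marriage VALUES ('Fred', 'Wilma', 'Bedrock'),
                                ('Barney', 'Betty', 'Paris');
""")

# Relationship-style question: who is Fred married to?
spouse = conn.execute(
    "SELECT wife FROM marriage WHERE husband = 'Fred'"
).fetchone()[0]

# Entity-style question: which marriages took place in Paris?
paris = conn.execute(
    "SELECT husband, wife FROM marriage WHERE location = 'Paris'"
).fetchall()

# A later universe of discourse wants an attribute nobody analysed up front.
# The column name filing_status is hypothetical.
conn.execute("ALTER TABLE marriage ADD COLUMN filing_status TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(marriage)")]

print(spouse, paris, columns)
```

Both styles of question run against the same table, and the new column is added without touching the existing rows. What the column should mean still has to come from analysing the new UoD.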
>
>> A second point: we can only keep the data neutral if we know all possible
>> perspectives. It is only then that we can consciously model the data to
>> fit all the universes of discourse. Yet, you rightly observe we don't
>> necessarily know all the applications, which amounts to saying we don't
>> know all the universes of discourse. So choices can not be avoided. In
>> that case I 'd much rather make these conscious choices instead of
>> keeping up a pretense of neutrality. At the very least we should be
>> aware that the model resulting from analysis may be biased, and is not
>> the final word on the world out there.
>
> I think maybe we are referring to a slightly different definition of
> neutrality. I'm suggesting that a logical model should have no bias as
> to whether things are relationships or entities, and leave that to be
> determined by the person generating the queries.

Yes, we are clearly using a different definition of neutrality. I must
have missed that. Apparently my background as a political scientist
still cuts in from time to time.

Regards,

Ruud de Koter.

> Regards, J.
>
>>
>>
>>>> Hope this helps,
>>>> Ruud de Koter.
>

Ruud de Koter

unread,

Dec 7, 2007, 5:05:22 PM12/7/07

to

Jan Hidders wrote:
> On 5 dec, 02:10, JOG <j...@cs.nott.ac.uk> wrote:

>> Ok so one might summarize the following steps:
>> 1) initial analysis of business processes and important concepts.
>> 2) Formulation of an initial conceptual model (that is necessarily
>> slanted to a certain viewpoint of the UoD).
>
> You might want to add here that sometimes in this step you have to
> integrate several different UoDs that do not necessarily agree on the
> overlapping parts. I think that is mainly what Ruud is talking about,
> plus the fact that at this stage you might want to anticipate a bit on
> other UoDs that might have to be integrated in the future. You could
> call that "making it more objective" but I think "making it less
> subjective" is more precise. ;-)
>
>> 3) Translation into a nicely normalized logical model, that's query
>> neutral.
>
> Normalization is really only a very minor issue here IMO. I've not had
> that much personal practical experience in my life but I did work
> briefly for two big Dutch companies that both had an organization in
> charge of maintaining the global company data model that integrated
> all data models from the applications and databases they had. I worked
> with the guys that did this, and I remember being completely blown
> away by how much variation there was in concepts such as employee and
> order, even within a single company. I still admire these guys.
>
>> 4) On demand, extract data back out from the neutral logical model,
>> shaping it either the original conceptual view, or other conceptual
>> views as needs arise from new applications.
>>
>> Great. This all makes perfect sense, and is very clear to boot. A
>> simple process for creating a thorough yet flexible system. It seems
>> obvious even, right?
>>
>> So why on earth would /anyone/ want to drop step 3? I'm at a loss as
>> to why certain cdt'ers (who are clearly intelligent people) seem to be
>> advocating this. An absolute loss I tell you.
>
> I'm not sure that is what Ruud is saying. Anyone else?

This position is so far from what I think, that I hadn't even considered
I would be one of these people before I read this. If anybody read my
post as to mean that normalization is a useless operation, they misread
it. As for the query neutrality (as I understand it now), no criticism
either.

Ruud de Koter.

>
> -- Jan Hidders

mAsterdam

unread,

Dec 7, 2007, 9:13:36 PM12/7/07

to

> Oh, it was you, was it? Whew, that was a close one. Maybe I misled
> with the tilted carets or whatever they're called and should have used
> braces, also by abbreviating them without type names, which seems common
> whenever the purpose isn't affected.

Explicit types do not reduce the number of headers.

> Anyway, I suppose there might not be anything theoretically wrong with
> an rdbms that allowed multi-lingual headings, so that the Frenchman
> could pretend the db was using his lingo and the Englishman his,
> although they might get up to more hijinks whenever RENAME came into
> play than those two nationalities ever did in the last 900 years.
>
> If you insist on such a thing, I hope you'll call them some kind of
> alias as I think there is already more than enough multi-lingual false
> correctness in the world.

Please do not attribute your invention to me just because
I labelled it.

This seems like a good place to completely re-quote your claim.

> No need, the RM already is language-neutral, eg.,
>
> <John,33>
> <Johan,33>
> <Jaan,33>
>
> are different, not the same.
>
> Same goes for headings:
>

> <Nomme, Ans>
> <Name, Years>
>
> are different headings.

I read your example as having multiple headings to one relation -
which is outside the RM as I know it. You prefer to call them aliases?
If so: aliases of what?

> As the Mott's Clamato man said, why stop there? Might as well have
> multi-lingual aliases for relation and relvar names too. What the heck,
> do similar for values in tuples.

Making the values in tuples language-neutral makes sense if the
people sharing the data do not share a language.

Exposure: Where I live most people have to learn at least
two foreign languages (English and one of French and German).

> For a while, the effect might be
> drastically deleterious for update performance, but eventually an
> optimization theory might appear. So typical of the IT world to
> optimize the tool rather than the problem. The DB world being so
> over-endowed with clarity, I guess adding a good dose of obscurity can't
> hurt it either!
>
> (Just my not too blunt way of doing my bit and helping us all yet again
> that explaining analysis and design is harder than doing it.)

Bob Badour

unread,

Dec 7, 2007, 9:16:43 PM12/7/07

to

paul c wrote:

> Oh, it was you, was it? Whew, that was a close one. Maybe I misled
> with the tilted carets or whatever they're called and should have used
> braces, also by abbreviating them without type names, which seems common
> whenever the purpose isn't affected.
>

> Anyway, I suppose there might not be anything theoretically wrong with
> an rdbms that allowed multi-lingual headings, so that the Frenchman
> could pretend the db was using his lingo and the Englishman his,
> although they might get up to more hijinks whenever RENAME came into
> play than those two nationalities ever did in the last 900 years.

If one wants this feature, all one has to do is declare a bunch of views.

> If you insist on such a thing, I hope you'll call them some kind of
> alias as I think there is already more than enough multi-lingual false
> correctness in the world.

View === some kind of alias
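
A minimal sketch of what that could look like, using Python's built-in sqlite3 module. The table and view names, and the French column names nom and ans, are assumptions made for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_age (name TEXT PRIMARY KEY, years INTEGER NOT NULL);
    INSERT INTO person_age VALUES ('John', 33);

    -- Same rows, French heading: the view only renames the attributes.
    CREATE VIEW personne_age AS
        SELECT name AS nom, years AS ans FROM person_age;
""")

cursor = conn.execute("SELECT * FROM personne_age")
french_heading = [column[0] for column in cursor.description]
french_rows = cursor.fetchall()

print(french_heading, french_rows)
```

The view renames the heading only. The stored values ('John', 33) come through unchanged, so this does nothing about how domain values are represented.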

> As the Mott's Clamato man said, why stop there? Might as well have
> multi-lingual aliases for relation and relvar names too. What the heck,

> do similar for values in tuples. For a while, the effect might be

mAsterdam

unread,

Dec 7, 2007, 9:38:54 PM12/7/07

to

Bob Badour schreef:

Which feature exactly? Views on what?
Who asked for a feature of any kind?
Explanations of solutions to problems one does not grasp
do not help. They obfuscate. Who wrote this?

>> If you insist on such a thing, I hope you'll call them some kind of
>> alias as I think there is already more than enough multi-lingual false
>> correctness in the world.
>
> View === some kind of alias

The art of stating the obvious ... Argh! It's Bob!
Oh well. I'll push the send button (called 'Verzenden' in my
Thunderbird) anyway.

mAsterdam

unread,

Dec 7, 2007, 10:04:57 PM12/7/07

to

paul c schreef:

> Bob Badour wrote:
>> If one wants this feature, all one has to do is declare a bunch of views.
>

> Good one, even if it doesn't handle domain value reps, AFAIK. ...
Just a restatement of your aliases. Aliases of what?

> Even being an atheist, all I can think of so far is "Good God!". I can
> barely read French so English is the only tongue I can attempt glibness
> in. I had a feeling my couple of short examples might explode into
> mysticism, just my own fault I guess.

You made a claim. The mysticism itself is yours - not what you
perceive as the explosion of it.

> Actually, I see there is evidence hereabouts that Dikjstra was only

Dijkstra.

> partly right, it depends on the language. Personally, I think French
> would be a better language for db's to use, more precise as the longer
> length of any English to French and vice-versa translation will show. No
> offence, but English assumes more intelligence than precision in its
> audience, eg., in English we might say John is thirty-three whereas in
> French that couldn't happen, we'd say Jean a vingt-treize ans.

Trente trois.