It seems to me that most theoretical texts talk about 'conceptual',
'logical' and 'physical' design. These steps produce conceptual, logical and
database schema. However in the 'real world' conceptual and logical steps
seem to be combined into a single step.
Academic texts seem to use the chen notation and the E-R data model to
identify entity and relationship sets (pure conceptual design independent of
any DBMS, no asumption of an RDBMS implementation). Is this ever done in the
'real world'?
It seems to me that professionals simply engage in logical and physical
design. They produce a normalised logical data design (identifying all
attributes and primary and foreign keys - possibly also domains). They also
use a 'crows foot' notation identifying relations and the relationships
between them. Thus they are assuming an RDMBS implementation (but no
particular brand of RDBMS). This 'logical' design activity is sometimes
referred to as 'conceptual design', even though an RDBMS is being targeted.
From the logical design a database schema is produced (physical design).
Am i going wrong somewhere?
Here's my take on it.  A "conceptual model"  is typically the result of
analysis,  not design.  The best conceptual models capture the information
requirements of some proposed system or database,  from a data centric point
of view.  The ER model, when it was first proposed and for about a decade
following,  was a good model for this purpose,  because of the way
relationships are treated.  In the ER model, relationships are identified,
but not implemented.
Again,  taking the point of view of 20 years ago,  databases of that era
were mostly hierarchical or network,  with "newfangled" relational databases
making rapid inroads.  Hierarchical and Network databases implement
relationships by embedding pointers in records, alongside the data.
Relational databases implement relationships in the data itself.  If a
reference is made to the data in a table row,  a foreign key at the point of
reference is used,  and not a pointer.
This point has come to the forefront again, because many OODBMS products
make the same use of pointers that Hierarchical and network DBMS did.
A Relational DBMS may have pointers and keys associated together in indexes,
but these are managed by the RDBMS itself,  and are nearly transparent to
the application programmer.
So what's the difference between a conceptual (ER) model, and a logical
(Relational) model?  Well, one difference is how tied to a relational
implementation you are.  A relational model is somewhat more bound to a
relational physical implementation than an ER model is.  Another difference
is normalization.  An ER model isn't particularly normalized,  although a
"good" ER model tends to produce a fairly normalized design, almost
automatically.
But the big difference, in my mind,  is the difference between analysis and
design.  The conceptual model captures the analysis, while the logical model
captures the logical design. I expect to hear a lot of contending views on
this topic.
In our work, we use the 'conceptual design' to document the high level
business
entities and subject areas of the database.  For example a sales agent can
have one or
more customers, a sales agent must be an employee.  An employee is
identified by
one and only one identification number.  This gives us three conceptual
entities -
employee, sales agent, and customer (independent of applications or RDBMS)
Our logical model then defines in more detail the attributes associated with
each
entitiy.  Generally, we don't keep a seperate conceptual, but use it to
build the logical.
The physical is then defined based on the RDBMS you are going to be using in
the
final application
Hope this helps.
D
"David Cressey" <da...@dcressey.com> wrote in message
news:Z_bs9.196$0I3....@petpeeve.ziplink.net...
>Well, one difference is how tied to a relational
> implementation you are.  A relational model is somewhat more bound to a
> relational physical implementation than an ER model is.
Why is this exactly?  Maybe you are referring to how most (well a lot
anyway) people don't really build a logical model first even though they
might say they are.  They just jump right in to building a schema with a
target RDBMS in mind.  If you don't do that and just build a data model with
whatever tool you want, or just a sheet of paper and a pencil, then I submit
that you are not bound to a physical implementation at all.
RD
If your audience are developers, they see a CM and start to think
about tables and design instead of concentrating on business rules and
discovering entities/things. You have to get them to think about the
"What" and not the "How".
The CM, IMO, is really intended for non-developers such as clients and
analyst. BUT.......If you really don't have a lot of business rules
and its a simple application, such as a survey datbase with two or
three tables, I guess you could just jump to the logical model. With
large-enterprise database development however, the CM is a must!
"Ray DiMarcello" <ra...@comcast.net> wrote in message news:<rk-dnUl4DJr...@comcast.com>...
>> It seems to me that most theoretical texts talk about 'conceptual',
>>  'logical' and 'physical' design. These steps produce conceptual,
>> logical
> 
> and
> 
>> database schema. However in the 'real world' conceptual and logical
>> steps seem to be combined into a single step.
> 
> Here's my take on it.  A "conceptual model"  is typically the result
> of analysis,  not design.  The best conceptual models capture the
> information requirements of some proposed system or database,  from a
> data centric point of view.  The ER model, when it was first proposed
> and for about a decade following,  was a good model for this purpose,
> because of the way relationships are treated.  In the ER model,
> relationships are identified, but not implemented.
	Wrong, ER can't capture all of the organisation's rules, because it has 
no equivalent to all integrity constraints in the database.
	Actually, conceptual model would be the rules derived from analysis.  A 
logical model would be the realisation of that as a relational database.
> A Relational DBMS may have pointers and keys associated together in 
> indexes, but these are managed by the RDBMS itself,  and are nearly
> transparent to the application programmer.
Should be not "nearly", but "totally" transparent.
> So what's the difference between a conceptual (ER) model, and a
> logical (Relational) model?  Well, one difference is how tied to a
> relational implementation you are.  A relational model is somewhat
> more bound to a relational physical implementation than an ER model
> is.
	Wrong again, the physical model is determined by the DBA from both the 
logical model and the system implementation physical constraints and 
utilisation requirements.  On the other direction, the logical model is 
totally independent from the physical implementation.
> But the big difference, in my mind,  is the difference between 
> analysis and design.  The conceptual model captures the analysis,
> while the logical model captures the logical design. I expect to hear
> a lot of contending views on this topic.
Actually this is what you got kinda right.
Sure it has. It can use the same language that you can use for the
relational model: 1st order logic.
-- Jan Hidders
> In article , Leandro Guimarães Faria Corsetti Dutra wrote:
> 
>> Wrong, ER can't capture all of the organisation's rules, because it
>> has no equivalent to all integrity constraints in the database.
> 
> Sure it has. It can use the same language that you can use for the 
> relational model: 1st order logic.
	Now I want to see it.  References, please.  You may want to be informed 
of articles against your view in http://dbdebunk.com./, for instance.
	BTW, it is not simply "1st order logic", but Set Theory *and* 1st Order 
_Predicate_ Logic.
-- 
  _
/ \ Leandro Guimarães Faria Corsetti Dutra        +41 (21) 216 15 93
\ / http://homepage.mac.com./leandrod/        fax +41 (21) 216 19 04
  X  http://tutoriald.sourceforge.net./      Orange Communications CH
/ \ Campanha fita ASCII, contra correio HTML      +41 (21) 644 23 01
Dbdebunk is not the gospel. The fact that you can use FOL to describe
constraints in the ER model is so plainly obvious to people who know about
logic and ER models that no serious researcher would write an article about
that.
If you think there is an inherent problem there then I suggest you share it
with us and I will be happy to explain why it isn't a problem. :-)
>	BTW, it is not simply "1st order logic", but Set Theory *and* 1st
> Order _Predicate_ Logic.
Yes, predicate logic, any other first order logics you know? :-)
But what makes you think you need set theory to specify constraints? Is
there any constraint expressible in set theory but not in logic? What you
_may_ sometimes need is higher order logics to express constraints you
cannot express in FOL, but that's about it.
-- Jan Hidders
How does an application programmer optimize access to information
if they are unaware of the existing indexes? Or is that not what is
meant by totally transparent?
> How does an application programmer optimize access to information if
> they are unaware of the existing indexes? Or is that not what is
> meant by totally transparent?
The RDBMS' optimizer handles all the optimization.  This allows for a
level of abstraction between the developer and the RDBMS.
Furthermore, by employing stored procedures (SP's), this builds
another layer of abstraction between the developer and the DBA.  The
SP's allow the DBA to tweak and tune the SQL, once the application is
in production, without affecting the application.
-- 
Pablo Sanchez, High-Performance Database Engineering
http://www.hpdbe.com
> Leandro Guimarães Faria Corsetti Dutra  wrote:
> 
>> Jan.Hidders wrote:
>> 
>>> Sure it has. It can use the same language that you can use for
>>> the relational model: 1st order logic.
>> 
>> Now I want to see it.  References, please.  You may want to be 
>> informed of articles against your view in http://dbdebunk.com./,
>> for instance.
> 
> Dbdebunk is not the gospel.
No, but it is a nice reference site and eye-opener.
> The fact that you can use FOL to describe
> constraints in the ER model is so plainly obvious to people who know
> about logic and ER models that no serious researcher would write an
> article about that.
	What about Fabian Pascal as "serious"?  What about 
http://dbdebunk.com./fp4a.htm as an article?  Or take Date and 
http://dbdebunk.com/kimball1.htm.
> If you think there is an inherent problem there then I suggest you 
> share it with us and I will be happy to explain why it isn't a
> problem. :-)
	OK.  How do you express non-RI constraints?  How one does express each 
domain, attribute, relation and database integrity constraints?  Or in a 
different taxiology, take transition constraints, how would ERDs 
represent them?  I'm not saying you mightn't have a partial answer, but 
I very much doubt you could give (or point to) a complete answer.
> But what makes you think you need set theory to specify constraints?
	Nothing.  But you need set theory to come up with the idea of relations 
in the first place, and all its attendant operations, and to have what 
to create constraints on.
Sure, but that doesn't mean that everything they say is true. So pointing to
an article of theirs and say "and therefore you are wrong" is not a good
argument.
>> The fact that you can use FOL to describe constraints in the ER model is
>> so plainly obvious to people who know about logic and ER models that no
>> serious researcher would write an article about that.
>
>	What about Fabian Pascal as "serious"?  What about 
>http://dbdebunk.com./fp4a.htm as an article?  Or take Date and 
>http://dbdebunk.com/kimball1.htm.
Where exactly in those articles do they claim that you cannot use FOL to
describe constraints in the ER model?
>> If you think there is an inherent problem there then I suggest you 
>> share it with us and I will be happy to explain why it isn't a
>> problem. :-)
>
>	OK.  How do you express non-RI constraints?  How one does express each 
>domain, attribute, relation and database integrity constraints?  Or in a
>different taxiology, take transition constraints, how would ERDs represent
>them?  I'm not saying you mightn't have a partial answer, but I very much
>doubt you could give (or point to) a complete answer.
All of the above can be done in first-order logic with the obvious
limitation that you cannot express what you can express in higer-order
logics but not in first-order logic. Again, what do you think is the
problem? You do know what first-order logic is, do you?
> Leandro Guimarães Faria Corsetti Dutra  wrote:
> 
>> Jan Hidders wrote:
>> 
>>> Leandro Guimarães Faria Corsetti Dutra  wrote:
>>> 
>>>> Now I want to see it.  References, please.  You may want to be 
>>>> informed of articles against your view in
>>>> http://dbdebunk.com./, for instance.
>>> 
>>> Dbdebunk is not the gospel.
>> 
>> No, but it is a nice reference site and eye-opener.
> 
> Sure, but that doesn't mean that everything they say is true. So 
> pointing to an article of theirs and say "and therefore you are
> wrong" is not a good argument.
	I don't think I did.  I only pointed to it for interesting, useful 
arguments in case you weren't aware of them.
	I said "you were wrong" yes, and then asked myself to be proven wrong. 
  But I never offered DBDebunk as any kind of proof, even because I 
beware of the argument from authority.
>>> The fact that you can use FOL to describe constraints in the ER
>>> model is so plainly obvious to people who know about logic and ER
>>> models that no serious researcher would write an article about
>>> that.
>> 
>> What about Fabian Pascal as "serious"?  What about 
>> http://dbdebunk.com./fp4a.htm as an article?  Or take Date and 
>> http://dbdebunk.com/kimball1.htm.
> 
> Where exactly in those articles do they claim that you cannot use FOL
> to describe constraints in the ER model?
	No, they claim that ERDs can't represent all possible constraints. 
They are not arguing limitations on first-order logic, but on ERDs.
	Now, I never saw ERDs proposed as a complete, practical representation 
of neither logic nor a relational database.  I would love to be proved 
wrong, because this would make the RM itself more palatable to GUI 
weenies that currently dominate all over the world.
>>> If you think there is an inherent problem there then I suggest
>>> you share it with us and I will be happy to explain why it isn't
>>> a problem. :-)
>> 
>> OK.  How do you express non-RI constraints?  How one does express
>> each domain, attribute, relation and database integrity
>> constraints?  Or in a different taxiology, take transition
>> constraints, how would ERDs
> represent
>> them?  I'm not saying you mightn't have a partial answer, but I
>> very much doubt you could give (or point to) a complete answer.
> 
> All of the above can be done in first-order logic with the obvious 
> limitation that you cannot express what you can express in
> higer-order logics but not in first-order logic. Again, what do you
> think is the problem? You do know what first-order logic is, do you?
Again, how can all of first-order logic be represented in ERDs?
Of course. (Do you think that the Relational Model can, by the way?) But I
said that you can use FOL to describe constraints in the ER model. So if you
extend the ER model with FOL (which is not very difficult) then you can
describe those constraints.
-- Jan Hidders
>Dbdebunk is not the gospel. The fact that you can use FOL to describe
>constraints in the ER model is so plainly obvious to people who know about
>logic and ER models that no serious researcher would write an article about
>that.
Do you mean to attach a written list of rules to the ER diagram?
I think It would be more logic to do the opposite.
Attach ER or ORM diagrams to the written database design, in order to
give a high level quick view.
As Date says: database design is constraint definition.
Yes.
>I think It would be more logic to do the opposite.
You mean attach an ER diagram to the list of rules? That would be very
strange since the ER diagram defines the universe of discourse for the rules.
-- Jan Hidders
According to the standard definitions:
1. Humans process information while machines process data.
2. The conceptual model captures an information model and need not represent
anything for machine consumption. In other words, it captures the concepts
used by human beings.
3. The logical model represents some or all of a conceptual model in an
abstract form suitable for machine manipulation.
4. A physical model maps the abstract logical model to actual storage
structures and locations.
You asked a simple question, and you deserve the simple answer above. If you
have access to a library that has them, you can verify the answer from the
ISO standard vocabularies (ISO/IEC 2382)
I can think of no better confirmation of Fabian Pascal's point ( see
http://www.dbdebunk.com ) regarding fundamentals ignorance ubiquitous in our
industry than the example given by this newsgroup where so many
practitioners can blow so much wind and completely bypass a simple
straightforward answer. It's simple to conclude they are completely ignorant
of all fundamentals -- even such simple ones as these.
"ad" <noS...@hotmail.com> wrote in message news:pI_r9.172$OC2.19355@wards...
> Leandro Guimarães Faria Corsetti Dutra  wrote:
> 
>> Jan Hidders wrote:
>> 
>>> Where exactly in those articles do they claim that you cannot use
>>> FOL to describe constraints in the ER model?
>> 
>> No, they claim that ERDs can't represent all possible constraints.
> 
> Of course. (Do you think that the Relational Model can, by the way?)
	The RM can represent all machine-enforceable constraints AFAIU.  If I'm 
in the wrong, please prove so.
> But I said that you can use FOL to describe constraints in the ER
> model. So if you extend the ER model with FOL (which is not very
> difficult) then you can describe those constraints.
What do you mean by "extend the ER model with FOL"?
	If it means complementing ERDs with textual representations of 
constraints, then we just have proved ERDs aren't enough.  Then it is 
just a matter of taste or circumstancial expediency to draw diagrams for 
quick visualisation and later flesh them out with test, or to go 
straight for text and later draw parts for presentation.  As we wanted 
to prove.
> Alfredo Novoa wrote:
> 
>> Do you mean to attach a written list of rules to the ER diagram?
> 
> Yes.
"That's the most stupid thing I ever heard!" -- BillG III.
What that gives one?
>> I think It would be more logic to do the opposite.
> 
> 
> You mean attach an ER diagram to the list of rules? That would be
> very strange since the ER diagram defines the universe of discourse
> for the rules.
	Obviously he means to write the full model, from domains to relvars and 
constraints, and then choose specific portions that need to be 
graphically represented as ERDs or whatever.
	I wonder what ERDs gives us besides yet another fuzzy way of thinking 
and nice presentation.  Good for ppt slides for PHBs, but hardly useful 
for serious designs.
>2. The conceptual model captures an information model and need not represent
>anything for machine consumption. In other words, it captures the concepts
>used by human beings.
>
>3. The logical model represents some or all of a conceptual model in an
>abstract form suitable for machine manipulation.
Fabian Pascal's samples seems to suggest that the data is also a part
of the logical model, but not a part of the conceptual model.
http://www.dbazine.com/pascal1.html
Am I wrong?
Alfredo
>>> I think It would be more logic to do the opposite.
>> 
>> 
>> You mean attach an ER diagram to the list of rules? That would be
>> very strange since the ER diagram defines the universe of discourse
>> for the rules.
>
>	Obviously he means to write the full model, from domains to relvars and 
>constraints, and then choose specific portions that need to be 
>graphically represented as ERDs or whatever.
Yes, domain and relvar definitions are also constraint definitions or
rule definitions if you like more.
And ERDs are never necessary, but it may be useful in order to give a
quick view.
And ORM diagrams seems to be better than ERDs in all cases.
Alfredo
Can you point me to the exact and complete definition of *the* language
in which you define constraints in the RM?
>> But I said that you can use FOL to describe constraints in the ER
>> model. So if you extend the ER model with FOL (which is not very
>> difficult) then you can describe those constraints.
>
>	What do you mean by "extend the ER model with FOL"?
>
>	If it means complementing ERDs with textual representations of 
>constraints, then we just have proved ERDs aren't enough.
I never said they were.
>Then it is just a matter of taste or circumstancial expediency to draw
>diagrams for quick visualisation and later flesh them out with test, or to
>go straight for text and later draw parts for presentation.
ERD diagrams can be given an exact semantics and are in that no better or
worse than textual table definitions. However, they are often closer to how
the user sees the data because, for example, you don't have to add
attributes to model certain relationships.
-- Jan Hidders
> Can you point me to the exact and complete definition of *the* language
> in which you define constraints in the RM?
"The Third Manifesto" by Date&Darwen defines a set of languages called "D"
that are able to define arbitrary constraints.  The same book defines
"Tutorial-D", a sample language that has not been implemented.
Dataphor "D4" belongs to "D", so I would say that "D4" is _a_ language
in which you can define constraints.
For documentation, see http://docs.alphora.com/D4LGConstraints.html
+ others on the same web site
regards,
Lauri Pietarinen
Is this set of languages really a set in the sense that there is a proper
definition of it in set theory? Or is there just a set of properties that
should hold for the language (in which case it might be a class rather then
a set). That sounds a bit as if there is not really such a thing as "the
language in which you specify constraints in the RM". So how should I then
understand Leandro's claim that in this language (which one) you can express
every machine-enforceable constraint? Anyway, is there a proof somewhere
that Tutorial-D is computationally complete?
>Dataphor "D4" belongs to "D", so I would say that "D4" is _a_ language
>in which you can define constraints.
Ok. Can I express that relation R2 is the transitive closure of relation R1?
-- Jan Hidders
>Is this set of languages really a set in the sense that there is a proper
>definition of it in set theory?
No.
> Or is there just a set of properties that
>should hold for the language (in which case it might be a class rather then
>a set).
It is a set of prescriptions, proscriptions and recomendations for
constructing relational languages.
>Anyway, is there a proof somewhere
>that Tutorial-D is computationally complete?
Yes, because it also has imperative extensions.
>Ok. Can I express that relation R2 is the transitive closure of relation R1?
Yes:
R2 := TClose R1;
Agreed.
> And ORM diagrams seems to be better than ERDs in all cases.
Can you expand or provide pointers?
> On Sun, 3 Nov 2002 20:10:01 -0500, "Bob Badour" wrote:
> 
>> 2. The conceptual model captures an information model and need not
>> represent
>> anything for machine consumption. In other words, it captures the
>> concepts used by human beings.
>> 
>> 3. The logical model represents some or all of a conceptual model
>> in an abstract form suitable for machine manipulation.
> 
> Fabian Pascal's samples seems to suggest that the data is also a part
> of the logical model, but not a part of the conceptual model.
> 
> http://www.dbazine.com/pascal1.html
	I think they are only that, samples.  The data is there for 
illustrative purposes only.
AFAIU you could say that metadata is data, but that would be stretching it.
thanks for clarifying.
To be spesific the closure stuff in TTM is actually
a "Very Strong Suggestion" :
<quote>
6. D should provide some shorthand for expressing the generalized
transitive closure operation, including the ability to specify
generalized concatenate and aggregate operations.
</quote>
For the complete set of the
prescriptions, proscriptions and recomendations
see
http://www.hughdarwen.freeola.com/TheThirdManifesto.web/CHAP03.pdf
regards,
Lauri Pietarinen
> For the complete set of the prescriptions, proscriptions and
> recomendations see
> 
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/CHAP03.pdf
	Just to be picky, actually the canonical address is 
http://www.thethirdmanifesto.com/, like in 
http://www.hughdarwen.freeola.com/TheThirdManifesto.web/CHAP03.pdf.  Not 
just picky, usually the authors and webmasters would rather we use the 
canonical URLs instead of the redirected ones.
Not all imperative extensions make it computationally complete. In fact,
even if you add enough to simulate a Turing machine, you still could be
computationally incomplete. Moreover I thought that it was Date's position
that the constraint language and the query language should be declarative.
Has that principle been abandoned?
>>Ok. Can I express that relation R2 is the transitive closure of relation R1?
>
>Yes:
>
>R2 := TClose R1;
Interesting. Can I also use that to express the constraint that the relvar
R2 always contains the transitive closure of R1?
-- Jan Hidders
Strange, because in the prescriptions (thanks for the pointer, btw.) it
already says that recursion should be allowed. Any idea why then there is
still the need for an explicit transitive closure operation?
Another thing I wondered about while reading it is if I can express the
NEST and the UNNEST operations of the nested relational algebra. I seem to
remember that Date & Darwen were heavily opposed to those operations, but
the language would otherwise be certainly computationally incomplete. So,
can I?
-- Jan Hidders
>Not all imperative extensions make it computationally complete.
I thougth you only need a if-then and while-do instructions.
>even if you add enough to simulate a Turing machine, you still could be
>computationally incomplete.
Sure?
I always heard that a Turing machine is an example of computationally
complete "language".
And a quick google search points to that.
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s01/lectures/lect22/NP-complete.pdf
> Moreover I thought that it was Date's position
>that the constraint language and the query language should be declarative.
>Has that principle been abandoned?
Not, but Date says that it will never feasible to procure a
declarative syntax for all possible referential actions.
"An Introduction to Database Systems" Chapter 8.8
>Interesting. Can I also use that to express the constraint that the relvar
>R2 always contains the transitive closure of R1?
var R2 virtual relation TClose R1;
Alfredo
That is not always enough. For example, if you just take the relational
algebra (select, project, join, union, difference) and add a while construct
(you can simulate the if-then-else with that) it is easy to see that you can
only compute queries that can be computed in polynomial space. Morever, it
can also be shown that there are some very simple queries, like does this
relation contain an even number of tuples, that cannot be expressed in it.
There's a very big and very interesting body of research literature on this
subject, and to get an impression of that you might want to take a look at
the chapters 16, 17 and 18 from "the Alice book":
  Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. 
  Addison-Wesley 1995, ISBN 0-201-53771-0
http://www.informatik.uni-trier.de/~ley/db/books/dbtext/abiteboul95.html
Note that the book assumes a pretty good formal background in computer
science from its readers.
>I always heard that a Turing machine is an example of computationally
>complete "language".
Yes it is, but for computations over strings. If you change your domain of
computation to relational databases and you show that you can simulate a
Turing machine then you only have shown that on a certain subset of your
domain (the part you use to simulate the Turing tape) you are
computationally complete, but "computationally complete" means of course
that you are complete on the *whole* of your domain. For more information
and a formal treatment see chapter 2 of "the Alice book".
-- Jan Hidders
> >
> ><quote>
> >
> >6. D should provide some shorthand for expressing the generalized
> >transitive closure operation, including the ability to specify
> >generalized concatenate and aggregate operations.
> >
> ></quote>
> 
> Strange, because in the prescriptions (thanks for the pointer, btw.) it
> already says that recursion should be allowed. Any idea why then there is
> still the need for an explicit transitive closure operation?
>
I don't necessarily have a complete answer for you but
I could point out that the above "strong suggestion" talks
about "providing some _shorthand_ for expressing the generalized
transitive closure".
Compare it to "Strong suggestion #2:"
<quote>
2. D should include some declarative shorthand for expressing
referential constraints (also known as foreign key
constraints).
</quote>
 
> Another thing I wondered about while reading it is if I can express the
> NEST and the UNNEST operations of the nested relational algebra. I seem to
> remember that Date & Darwen were heavily opposed to those operations, but
> the language would otherwise be certainly computationally incomplete. So,
> can I?
> 
> -- Jan Hidders
Again, no expert on this but see Prescription 6 in the
same link.
regards,
Lauri Pietarinen
> Another thing I wondered about while reading it is if I can express the
> NEST and the UNNEST operations of the nested relational algebra. I seem to
> remember that Date & Darwen were heavily opposed to those operations, but
> the language would otherwise be certainly computationally incomplete. So,
> can I?
I took a look at the printed version of TTM (2ed)
(which is not available on line).
On pages 137-138 there is a discussion of NEST and UNNEST
(in Tutorial-D the operations are called GROUP and UNGROUP).
So for the relationtype
RELATION { S# S#, PQ RELATION { P# P#, QTY QTY } }
we can get to
RELATION { S# S#, P# P#, QTY QTY }
with
(assuming SPQ1 is a relvar of the first type and SPQ2 is
a relvar of the second type):
SPQ1 := SPQ2 GROUP { P#, QTY } AS PQ
and the opposite operation would be
SPQ2 := SPQ1 UNGROUP PQ
regards,
Lauri Pietarinen
>That is not always enough. For example, if you just take the relational
>algebra (select, project, join, union, difference) and add a while construct
>(you can simulate the if-then-else with that) it is easy to see that you can
>only compute queries that can be computed in polynomial space. Morever, it
>can also be shown that there are some very simple queries, like does this
>relation contain an even number of tuples, that cannot be expressed in it.
With the aggregate operators you can.
In Tutorial D:
(count(R) mod 2) = 0;
Of course it will return a boolean value.
Tutorial D also has scalar types and variables, and variable
assignements, thus I think it is clear it is computationally complete
having a while construct.
>There's a very big and very interesting body of research literature on this
>subject, and to get an impression of that you might want to take a look at
>the chapters 16, 17 and 18 from "the Alice book":
>
>  Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. 
>  Addison-Wesley 1995, ISBN 0-201-53771-0
Thanks It was yet in my reading queue, I will try to move it to the
top of the queue :-)
>For more information
>and a formal treatment see chapter 2 of "the Alice book".
Ok, thanks
Alfredo
Very good point, and it made me realize that I should have known the answer
myself. The answer consists probably of two parts:
1. There are special efficient algorithms available to the query optimizer
   for the transitive closure. And because it is not easy for the optimizer
   to always recognize when a general recursive query computes a transitive
   closure it is useful if the user indicates this explicitly.
2. The same probabaly holds for the human reader who can now more easily
   understand what it is that is computed.
-- Jan Hidders
This is of course their good right but I hope the authors realize how much
better for the acceptance of their model it would be if it were on line.
Think of ODMG/OQL and W3C/XML.
>On pages 137-138 there is a discussion of NEST and UNNEST
>(in Tutorial-D the operations are called GROUP and UNGROUP).
Wow!! Do you realize what this means? Chris Date has recanted! The
relational model is dead, long live the nested relational model. We can
finally kiss the 1NF good bye. Good riddance, I say. I think I'm beginning
to like Tutorial-D. :-)
So can I abuse you a little more as my tutorial-D tutor? :-) Can I have
recursive types? For example (probably not the right syntax, but I think you
will understand what I mean):
TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
-- Jan Hidders
Why would that be clear?
-- Jan Hidders
From "OO PRESCRIPTIONS" :
<quote>
3. D shall be computationally complete. That is, D may support,
but shall not require, invocation from so-called "host
programs" written in languages other than D. Similarly, D may
support, but shall not require, the use of other programming
languages for implementation of user-defined operators.
</quote>
So, well, need I say more?
:-)
regards,
Lauri Pietarinen
No. :-)
-- Jan Hidders
> >On pages 137-138 there is a discussion of NEST and UNNEST
> >(in Tutorial-D the operations are called GROUP and UNGROUP).
> 
> Wow!! Do you realize what this means? Chris Date has recanted! The
> relational model is dead, long live the nested relational model. We can
> finally kiss the 1NF good bye. Good riddance, I say. I think I'm beginning
> to like Tutorial-D. :-)
To quote further from TTM (pages 136-137):
<quote>
Note that we permit relations (and tuples) to include attributes whose
values are relations. (However, we explicitly do not espouse "NF squared"
relations as described in e.g. reference [121], because they
involve major extensions to the classical relational algebra,
extensions that we find unnecessary.)
[121] Mark A. Roth, Henry F. Korth, and Abraham Silberschatz:
"Extended Algebra and Calculus for Nested Relational Databases,"
ACM TODS 13, No.4 (December 1988)
</quote>
And further in "An Introduction to Database Systems" (7ed)
on page 143 there is more annotation on the mentioned reference
which is too long to repeat here.
What I understand of this is that the attribute is seen
by the relational operators as just an attribute.
We have to separately "open up" that attribute to see
what's inside it and further operate on it.
> So can I abuse you a little more as my tutorial-D tutor? :-) Can I have
> recursive types? For example (probably not the right syntax, but I think you
> will understand what I mean):
> 
> TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
>
I don't really know if what you suggest here is permissible - perhaps
it is.  However there are no relational operators that would "traverse" the
tree.  You would have to explicitely "open up" each tree-tuple separately.
What are you trying to achieve with this construct?  Is it something
that can't be done any other way?  What would this be useful for?
regards,
Lauri Pietarinen
> Very good point, and it made me realize that I should have known the answer
> myself. The answer consists probably of two parts:
> 
> 1. There are special efficient algorithms available to the query optimizer
>    for the transitive closure. And because it is not easy for the optimizer
>    to always recognize when a general recursive query computes a transitive
>    closure it is useful if the user indicates this explicitly.
> 
> 2. The same probabaly holds for the human reader who can now more easily
>    understand what it is that is computed.
> 
That sounds like a good answer to me!
regards,
Lauri Pietarinen
These "major extensions" they talk about are the NEST and UNNEST operators.
You have just yourself told me that these can be done in Tutorial-D. The
inescapable conclusion then is that it is therefore a superset of the nested
relational algebra. So it walks like a duck, it talks like duck, and it
swims like a duck. Why Date and Darwen want to deny that it is is a duck is
a complete mystery to me.
>What I understand of this is that the attribute is seen by the relational
>operators as just an attribute. We have to separately "open up" that
>attribute to see what's inside it and further operate on it.
Roughly that is also what happens in the nested relational model.
>> So can I abuse you a little more as my tutorial-D tutor? :-) Can I have
>> recursive types? For example (probably not the right syntax, but I think you
>> will understand what I mean):
>> 
>> TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
>>
>
>I don't really know if what you suggest here is permissible - perhaps
>it is.  However there are no relational operators that would "traverse" the
>tree.  You would have to explicitely "open up" each tree-tuple separately.
>
>What are you trying to achieve with this construct?  Is it something
>that can't be done any other way?  What would this be useful for?
Recursive types are used in lots of programming languages. For a concrete
example think of parsing trees or XML documents. Of course you can simulate
those with unnested tables and non-recursive types but then you have to
introduce artificial node identifiers.
-- Jan Hidders
Date indeed admits and even explains his recantation in "An Introduction to
Databases, 7th Edition" in Chapter 5 under the subsection entitled
"Relation-Valued Attributes".  I am somewhat surprised by your surprise.
His 7th edition has been out for some time now.
I hate to dampen any glee concerning Date's apparent abandonment of 1NF and
the classical relational model, but he does also write the following in
reference to what Ms. Pietarinen's commented on in Date's "Introduction...."
annotation:
"Some writer's would reject this definition and define an 'NF^2 relation' to
be any relation with at least one attribute that is relation-valued.  We
have our own reasons for arguing that such a relation in fact is in first
normal form."
My thoughts on the foregoing are that Date interpreted Roth, et. al's
definition of a NF^2 relation to be a relation that held non-relational
structures (i.e. quasi-relations holding repeating groups).  He seems to
maintain that true relations with true relation-valued attributes still meet
the criteria to be categorized as both 1NF, and even more importantly also
as true relations based on their classical definition.  I interpret his
rationale for this as being that since each relation is a kind of value, or
type, it is considered a single "value" for each intersection of tuple and
attribute (as opposed to repeating values),  much along the lines of Ms.
Pietarinen's assessment where each "value" needs to be opened up
independently for a further breakdown of composite scalar values.
So please don't kiss 1NF good-bye yet. ;-)
Moreover, Mr. Date, in chapter 11, does go on to say that it seems
prefereable to avoid relation-valued attributes, at least for base relation
variables in most cases, because they are asymmetric and provide a degree of
needless complexity.
Regards,
Dan Guntermann
"Lauri Pietarinen" <lauri.pi...@atbusiness.com> wrote in message
news:e9d83568.02111...@posting.google.com...
> >
> >To quote further from TTM  (pages 136-137):
> >
> ><quote>
> >
> >Note that we permit relations (and tuples) to include attributes whose
> >values are relations. (However, we explicitly do not espouse "NF squared"
> >relations as described in e.g. reference [121], because they
> >involve major extensions to the classical relational algebra,
> >extensions that we find unnecessary.)
> >
> >[121] Mark A. Roth, Henry F. Korth, and Abraham Silberschatz:
> >"Extended Algebra and Calculus for Nested Relational Databases,"
> >ACM TODS 13, No.4 (December 1988)
> >
> ></quote>
> 
> These "major extensions" they talk about are the NEST and UNNEST operators.
> You have just yourself told me that these can be done in Tutorial-D. The
> inescapable conclusion then is that it is therefore a superset of the nested
> relational algebra. So it walks like a duck, it talks like duck, and it
> swims like a duck. Why Date and Darwen want to deny that it is is a duck is
> a complete mystery to me.
OK, I am no expert on this subject - you could be correct.
> >
> >What are you trying to achieve with this construct?  Is it something
> >that can't be done any other way?  What would this be useful for?
> 
> Recursive types are used in lots of programming languages. For a concrete
> example think of parsing trees or XML documents. Of course you can simulate
> those with unnested tables and non-recursive types but then you have to
> introduce artificial node identifiers.
OK, I get the picture.  But to me it looks like you
are trying to undermind the original power of the RM
by somehow introducing new kinds of data dependencies,
i.e. entities not identified by value but by "position".
What use are my powerfull relational operators if I have to write
a program to navigate the structure?
Anyway, whats so frightening about artificial node identifiers?
Granted current SQL-products (except for DB2) do not have
decent recursive operators, but there are other ways (e.g.
nested set-representation).
The beauty of RM relies in the fact that we can use
very powerfull operators on the data.  
Why do we want to throw that away by using nested relations
(excessively)?
regards,
Lauri Pietarinen
No, in fact, that is exactly the problem. If the nodes or subtrees are
identified by position then I could easily model it in the flat relational
model. But in the examples I gave they are true values and identified by
their components.
>What use are my powerfull relational operators if I have to write
>a program to navigate the structure?
All you need is a while-loop or recursion and Tutorial-D has that.
>Anyway, whats so frightening about artificial node identifiers?
Not much, the usual arguments are that they don't have "business meaning"
and they make it necessary to add explicit other key constraints to indicate
the "real identifier" and in this case you also need extra explicit
constraints to indicate that they form a tree. So if we can easily avoid
them by extending the data model a little I think we should.
-- Jan Hidders
How much difference do you think it would make? I agree that their model could
certainly be 'marketed' with more, err pazazz.
> >On pages 137-138 there is a discussion of NEST and UNNEST
> >(in Tutorial-D the operations are called GROUP and UNGROUP).
>
> Wow!! Do you realize what this means? Chris Date has recanted! The
> relational model is dead, long live the nested relational model. We can
> finally kiss the 1NF good bye. Good riddance, I say. I think I'm beginning
> to like Tutorial-D. :-)
>
> So can I abuse you a little more as my tutorial-D tutor? :-) Can I have
> recursive types? For example (probably not the right syntax, but I think you
> will understand what I mean):
>
> TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
I don't have the book on me today, but I'm sure that they explicitly disallow
such recursive definition of both tuples and relvars.
Are there arguments in favour of recursively defined types in the relational
model?  I don't recall D&D explaining why they disallow them.
What would a valid value of Tree look like?
{"NodeA:Value1", ("NodeB:Value2", ("", {}) )}
How do we avoid recursing infinitely?
Regards
Paul Vernon
Business Intelligence, IBM Global Services
Indeed, and if the tone at dbdebunk became a bit less aggressive people
might actually start to listen again.
>> [...] Can I have recursive types? For example (probably not the right
>> syntax, but I think you will understand what I mean):
>>
>> TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
>
>I don't have the book on me today, but I'm sure that they explicitly disallow
>such recursive definition of both tuples and relvars.
>
>Are there arguments in favour of recursively defined types in the relational
>model?  I don't recall D&D explaining why they disallow them.
Trees are a very common data structure. Also think of semistructured data.
If you don't support that then people will add their own domains to store
them in and then they will be out of reach of the query optimizer.
>What would a valid value of Tree look like?
>
>{"NodeA:Value1", ("NodeB:Value2", ("", {}) )}
Yeah, something like that.
>How do we avoid recursing infinitely?
Usually by limiting recursion to structural recursion.
-- Jan Hidders
You do have a point there - I might try to pass it on.
> >Are there arguments in favour of recursively defined types in the
relational
> >model?  I don't recall D&D explaining why they disallow them.
>
> Trees are a very common data structure. Also think of semistructured data.
> If you don't support that then people will add their own domains to store
> them in and then they will be out of reach of the query optimizer.
True, but would your tree tuple example be the best way of implementing a tree
data-structure anyway?
> >What would a valid value of Tree look like?
> >
> >{"NodeA:Value1", ("NodeB:Value2", ("", {}) )}
>
> Yeah, something like that.
>
So
     ("", {})
and therefore
    {}
would be valid values of that type? Doesn't look quite right to me.
Implementing? This is the logical level we are talking about, so a better
word would be "modelling" or "representing" and it should IMO be up to the
user if this is how he or she wants to represent that data that way or not.
>> >What would a valid value of Tree look like?
>> >
>> >{"NodeA:Value1", ("NodeB:Value2", ("", {}) )}
>>
>> Yeah, something like that.
>
>So
>     ("", {})
>and therefore
>    {}
>would be valid values of that type? Doesn't look quite right to me.
That's why I said "something like that". :-) If you want to be precise it
should be for example:
  ( node-value : "book",
    subtrees : { ( node-value : "chapter1", subtrees : {} ),
                 ( node-value : "chapter2", subtrees : {} ) } )
and in this value the following nested values would also belong to the type:
( node-value : "chapter1", subtrees: {} )
( node-value : "chapter2", subtrees: {} )
Note that his example is a bit artificial because the chapters are more
likely to be list then a set, but that is besides the point.
-- Jan Hidders
 So {} is a valid value for this tuple type. Excuse my ignorance, but what is
it? A tuple with two attributes but no 'rows'?!
Yes, I have to admit a feel a bit stupid here since the book has been lying
on my desk for the last 3 monts. But I used to teach from the 6th edition
and since then we only use a few chapters and use Ullman for when things get
serious.
>I hate to dampen any glee concerning Date's apparent abandonment of 1NF and
>the classical relational model, but he does also write the following in
>reference to what Ms. Pietarinen's commented on in Date's "Introduction...."
>annotation:
>
>"Some writer's would reject this definition and define an 'NF^2 relation' to
>be any relation with at least one attribute that is relation-valued.  We
>have our own reasons for arguing that such a relation in fact is in first
>normal form."
>
>My thoughts on the foregoing are that Date interpreted Roth, et. al's
>definition of a NF^2 relation to be a relation that held non-relational
>structures (i.e. quasi-relations holding repeating groups).
Quasi relations holding repeating groups? Of course they do because that is
what nested relations do! But Roth et al give a decent formal definition of
their relations so I fail to see how Date could have misunderstood that.
>So please don't kiss 1NF good-bye yet. ;-)
>
>Moreover, Mr. Date, in chapter 11, does go on to say that it seems
>prefereable to avoid relation-valued attributes, at least for base relation
>variables in most cases, because they are asymmetric and provide a degree
>of needless complexity.
I fully agree with Date here. In fact, I would even go further then that.
IMO we should stick with the original flat interpretation of the relational
model, (so not even relation-valued attributes) but with the addition that
we allow domains of abstract identifiers (or object identifiers if you want
to call them that). With that you can represent anything you want (nested
sets, lists, trees, graphs, et cetera) and still exploit the full power of
the relational algebra (although we would need some new operators to create
new identifers). People often forget that the requirement that the domains
should not contain abstract identifiers is something where the relational
model deviates from first-order logic.
But perhaps one of these days Date will explain to use that this is what
Codd had meant all along. :-)
-- Jan Hidders
> >OK, I get the picture.  But to me it looks like you
> >are trying to undermind the original power of the RM
> >by somehow introducing new kinds of data dependencies,
> >i.e. entities not identified by value but by "position".
> 
> No, in fact, that is exactly the problem. If the nodes or subtrees are
> identified by position then I could easily model it in the flat relational
> model. But in the examples I gave they are true values and identified by
> their components.
They are - in your approach - in fact identified by their parents
values.
I think that kind of modeling of trees is a disaster.  How do you
search for nodes that - say - have the value 'x' in some attribute.
Let's be concrete:  if you are saving an XML-tree as you have
proposed, how do you search for all elements with element type =
'person'?
Do you need a do-while loop for that??
What's wrong with the 'traditional' way, say
create table node
  (nodeID int primary key,
   blah blah...
  );
create table edge
   (parentNodeID int references node,
    childNodeId int primary key references node,
    childOrder int,
    unique(parentNodeId, childOrder );
This requires recursive support (available
in DB2, and partly Oracle)
(Needless to say: Dataphor has it)
Or, you can take a look at
"On Accelerating XPath Evaluation in Any RDBMS"
for an approach that works on _any_ (SQL)-database.
> 
> >What use are my powerfull relational operators if I have to write
> >a program to navigate the structure?
> 
> All you need is a while-loop or recursion and Tutorial-D has that.
But I want to use my RELATIONAL OPERATORS!! I don't want
to use any stupid loops!  They are hard to write and
the optimizer won't understand them anyway.
> 
> >Anyway, whats so frightening about artificial node identifiers?
> 
> Not much, the usual arguments are that they don't have "business meaning"
> and they make it necessary to add explicit other key constraints to indicate
> the "real identifier" and in this case you also need extra explicit
> constraints to indicate that they form a tree. So if we can easily avoid
> them by extending the data model a little I think we should.
This is shooting with a canon to kill a fly!
regards,
Lauri Pietarinen
You mean where Dr. Codd states the following in his "A Relational Model of
Data for Large Shared Data Banks." paper (p. 380)?
Excerpt below:
"So far , we have discussed examples of relations which are defined on
simple domains-domains whose elements are atomic (nondecomposable) values.
Nonatomic values can be discussed within the relational framework.  Thus,
some domains may have relations as elements.  These relations may, in turn,
be defined on nonsimple domains, and so on."
E.F. Codd.  "Relational Model of Data for Large Shared Databanks.  Comm. ACM
13, 6 (June 1970)
Regards,
Daniel Guntermann
Have you been reading my PhD thesis? But seriously, they are not. Two trees
are the same if they have the same attributes and set of children, just like
two tupels are the same if they have the same attributes. Period. The parent
doesn't come into that.
>I think that kind of modeling of trees is a disaster.  How do you
>search for nodes that - say - have the value 'x' in some attribute.
By using recursion:
Let's say the type is defined by
  TYPE Tree = TUPLE { Name : STRING,
                      Attr : STRING,
                      Subtrees : RELATION { Tree } }
and we have a relation Trees : RELATION { Tree }
Then you can do (I hope you will forgive me if I use the conventional nested
relational algebra):
R1 := Trees UNION (UNNEST[Subtrees](PROJECT[Subtrees](R1)))
R2 := PROJECT[Name,Attr](R1)
R3 := SELECT[Attr="Lauri"](R2)
>Do you need a do-while loop for that??
No.
>What's wrong with the 'traditional' way, say [...]
You are inventing a nodeID that was not in the original data model of the
user. Don't forget that we are talking about a *logical* data model here.
There's nothing wrong with using surrogate identifiers but that is an
implementation issue and the user shouldn't have to be aware of that.
>This requires recursive support (available
>in DB2, and partly Oracle)
>
>(Needless to say: Dataphor has it)
Good. So everything we need is already there.
>Or, you can take a look at 
>
>"On Accelerating XPath Evaluation in Any RDBMS"
>
>http://www.google.com/url?sa=U&start=1&q=http://wwwhome.cs.utwente.nl/~keulen/onderzoek/XPathAccel/tods2002_xpath-accel.pdf&e=747
I know that research very well. (It's from the university where I studied.)
In fact, part of my current research is about showing how simple relational
algebra is enough to implement almost the whole of XPath, axes, filters and
all. That shows that even something as complicated and messy as XML (=
eXtremely Messy Language) can be effectively tackled by the simple relational
model.
>But I want to use my RELATIONAL OPERATORS!!
You can. But I don't think you should be using algebra operators for that
but a more declarative type of language. I assume Dataphor has that?
>I don't want to use any stupid loops!
You don't even have to use smart loops. :-)
>They are hard to write and the optimizer won't understand them anyway.
It should be able to deal with recursion. A lot of research has been done in
that area, but at the moment very little of that has found its way into real
DBMSs. Btw., while-loops can often be simulated with recursion (depends upon
the type of recursion that is allowed and what its semantics is) so a good
query optimizer should also be able to deal with that kind of construct.
-- Jan Hidders
Where does Codd speak here about abstract identifiers? Or do you mean that
Codd already here suggested nested relations? That is certainly
true, but he then proceeds with explaining in section 1.4 how these should
be normalized away.
-- Jan Hidders
Look in the type definition. The type of 'subtrees' is a RELATION type. The
{} denotes the empty set, i.e., an empty relation.
-- Jan Hidders
Just prior to 1.4, he discusses confusion due to failure to distinguish
between type and instance with a very direct reference to "record".  I
gather that he considers a relation value in a relation tuple to be of a
_complex domain_ because he considers them to be repeating/multiple
instances of tuple types rather than one encapsulated instance of a relation
type.
His naming of attribute roles as _jobhistory_, _salaryhistory_, and
_children_ in Figure 3(a) almost seemed like abstract identifiers at one
point.  Could there be such a thing as "conceptual encapsulation"?
Regards,
Daniel Guntermann
With user defined types, one can define whatever type one wants. If you want
a type to use as an abstract identifier, just define one. Ordered, unordered
whatever you want. Just tell the DBMS what operations it has.
> People often forget that the requirement that the domains
> should not contain abstract identifiers is something where the relational
> model deviates from first-order logic.
How so? The relational model allows full support of user defined types.
> But perhaps one of these days Date will explain to use that this is what
> Codd had meant all along. :-)
Perhaps, one day.
> -- Jan Hidders
Sorry to say - no I haven't. Can you give me a
pointer?
> But seriously, they are not. Two trees
> are the same if they have the same attributes and set of children, just like
> two tupels are the same if they have the same attributes. Period. The parent
> doesn't come into that.
But we are looking at the tree from two (or more?) viewpoints:
- the nodes themselves
- the nodes in relation to other nodes
Say I have I file structure with files and
folders (or just folders to simplify the picture).
Now the folders them selves are interesting on their
own right, but what is also interesting is their
relation to one another.  But these two aspects
are distinct.
In the "nested relation" approach we would emphasize
the structural aspect only.
And what if these folders have files in them?
Do we just nest the files inside the folders?
And if the files have access rights - we
just nest them inside the files?
And the nodes _are_ identified via their parents.
If you dont know their parents, how do you get
a hold on them?
  
> 
> By using recursion:
> 
> Then you can do (I hope you will forgive me if I use the conventional nested
> relational algebra):
Sure.
> 
> R1 := Trees UNION (UNNEST[Subtrees](PROJECT[Subtrees](R1)))
> R2 := PROJECT[Name,Attr](R1)
> R3 := SELECT[Attr="Lauri"](R2)
> 
OK - no do-while loops.  But how on earth does the optimiser
handle this query besides materialising the whole tree?
Can it use a traditional B-tree index to find Attr="Lauri"?
I think we are loosing so much for so little gain.
> >Do you need a do-while loop for that??
> 
> No.
> 
> >What's wrong with the 'traditional' way, say [...]
> 
> You are inventing a nodeID that was not in the original data model of the
> user. Don't forget that we are talking about a *logical* data model here.
> There's nothing wrong with using surrogate identifiers but that is an
> implementation issue and the user shouldn't have to be aware of that.
Well, speaking of XML specifically - the nodes are identified
by their parents and their position relative to the other siblings,
and that causes lot's of problems.  So you either identify
the nodes by position or by some surrogate.  My vote goes
for surrogates.
> 
> >This requires recursive support (available
> >in DB2, and partly Oracle)
> >
> >(Needless to say: Dataphor has it)
>
> Good. So everything we need is already there.
Sorry.
Not the recursive definitions, but an explode-operator.
> 
> >Or, you can take a look at 
> >
> >"On Accelerating XPath Evaluation in Any RDBMS"
> >
> >http://www.google.com/url?sa=U&start=1&q=http://wwwhome.cs.utwente.nl/~keulen/onderzoek/XPathAccel/tods2002_xpath-accel.pdf&e=747
> 
> I know that research very well. (It's from the university where I studied.)
> In fact, part of my current research is about showing how simple relational
> algebra is enough to implement almost the whole of XPath, axes, filters and
> all. That shows that even something as complicated and messy as XML (=
> eXtremely Messy Language) can be effectively tackled by the simple relational
> model.
> 
> >But I want to use my RELATIONAL OPERATORS!!
> 
> You can. But I don't think you should be using algebra operators for that
> but a more declarative type of language. I assume Dataphor has that?
What on earth could be more declarative than relational operators???
> >They are hard to write and the optimizer won't understand them anyway.
> 
> It should be able to deal with recursion. A lot of research has been done in
> that area, but at the moment very little of that has found its way into real
> DBMSs. Btw., while-loops can often be simulated with recursion (depends upon
> the type of recursion that is allowed and what its semantics is) so a good
> query optimizer should also be able to deal with that kind of construct.
Sorry, I am not convinced.
Maybe you could find an example where the recursive
definition is _really_ needed and we can't do without
it?
regards,
Lauri Pietarinen
This might be beside the point but TTM
includes two standard tables, i.e.
TableDee := 
 relation {
   row {} };
and TableDum :=
 relation { };
They are the '0' and '1' elements 
of the relational algebra.
-- proscription #5:
5. D shall not forget that relations with no attributes are
respectable and interesting, nor that candidate keys with no
components are likewise respectable and interesting.
For further elaboration see Date's
selected writings (1989-1991).
There are some beautiful articles
by Darwen.
Sorry if this was a distraction.
regards,
Lauri Pietarinen
Indeed, but the million dollar question is here what exactly is the
definiton of "encapsulated instance" here. Have you seen a formal
definition of this anywhere? In OO and in the context of ADT's I know what
this means but there it concernes the *implementation* of a data type, but
that cannot be approriate here because the nested relational model is a
logical data model and therefore does not speak about implementation. The
only definition I can think of is that you cannot unnest the set,
but that is exactly what Date now has allowed.
>His naming of attribute roles as _jobhistory_, _salaryhistory_, and
>_children_ in Figure 3(a) almost seemed like abstract identifiers at one
>point.
They are readable strings and therefore not abstract.
>Could there be such a thing as "conceptual encapsulation"?
Exactly! My compliments. You are asking the right questions. I would say
that term encapsulation in a logical conceptual data model is a
contradiction in terms. This is actually one of the best and deepest
arguments I know against the use of OO data models as logical data models.
-- Jan Hidders
That depends upon what the minimal requirements are for defining it. If I
have to define a function that outputs the representation of the value then
there is a problem. It also depends on how exactly you can denote or create
new values. As long as these things are not exactly specified it is not
clear if you can use abstract identifiers or not.
-- Jan Hidders
Yes, sorry. Attribute Subtrees is a RELATION of type {TREE Tree}.
So by {} you mean a RELATION of type {TREE Tree} with no rows.
And to hummer me, {} = {} iff they are empty sets of the same relation type.
Yes?
Ealier, I said
>>True, but would your tree tuple example be the best way of implementing a
>>tree data-structure anyway?
>Implementing? This is the logical level we are talking about, so a better
>word would be "modelling" or "representing" and it should IMO be up to the
>user if this is how he or she wants to represent that data that way or not
OK, wrong usage of that word. IMO the fewer arbitrary options a user has the
better.
Agreed. This is Date saying about the same thing (I think)
Encapsulation Is a Red Herring
C.J.Date
http://www.dbpd.com/vault/9809date.html
It's not really that relevant except that the question came up whether
nested values are identified by just their components are also by their
containing values. Just look in Google on my name, and you will find it.
>> But seriously, they are not. Two trees are the same if they have the same
>> attributes and set of children, just like two tupels are the same if they
>> have the same attributes. Period. The parent doesn't come into that.
>
>But we are looking at the tree from two (or more?) viewpoints:
>
>- the nodes themselves
>- the nodes in relation to other nodes
>
>Say I have I file structure with files and
>folders (or just folders to simplify the picture).
>
>Now the folders them selves are interesting on their
>own right, but what is also interesting is their
>relation to one another.  But these two aspects
>are distinct.
>
>In the "nested relation" approach we would emphasize
>the structural aspect only.
>
>And what if these folders have files in them?
>Do we just nest the files inside the folders?
>
>And if the files have access rights - we
>just nest them inside the files?
>
>And the nodes _are_ identified via their parents.
>If you dont know their parents, how do you get
>a hold on them?
You are speaking about a specific kind of trees; those where the nodes have
identifiers. In the example I gave this is not the case.
>> By using recursion:
>> 
>> Then you can do (I hope you will forgive me if I use the conventional nested
>> relational algebra):
>
>Sure.
>
>> 
>> R1 := Trees UNION (UNNEST[Subtrees](PROJECT[Subtrees](R1)))
>> R2 := PROJECT[Name,Attr](R1)
>> R3 := SELECT[Attr="Lauri"](R2)
>> 
>
>OK - no do-while loops.  But how on earth does the optimiser handle this
>query besides materialising the whole tree?
By pushing the selection into the recursion and using a query optimization
technique called "magic sets".
>Can it use a traditional B-tree index to find Attr="Lauri"?
Yes.
>I think we are loosing so much for so little gain.
What loss? Everything that could be done can still be done. Loss of
optimizability? Not true; in the physical model the data might be
represented in the same way as when the user had used surrogate identifiers?
Loss of simplicity? That is up to the user, if he or she wants to represent
their data that way then it is probably the simpelest for them.
>> You are inventing a nodeID that was not in the original data model of the
>> user. Don't forget that we are talking about a *logical* data model here.
>> There's nothing wrong with using surrogate identifiers but that is an
>> implementation issue and the user shouldn't have to be aware of that.
>
>Well, speaking of XML specifically - the nodes are identified
>by their parents and their position relative to the other siblings,
>and that causes lot's of problems.  So you either identify
>the nodes by position or by some surrogate.  My vote goes
>for surrogates.
There is actually in this case a natural key: the position in the document.
>> >This requires recursive support (available
>> >in DB2, and partly Oracle)
>> >
>> >(Needless to say: Dataphor has it)
>>
>> Good. So everything we need is already there.
>
>Sorry.
>Not the recursive definitions, but an explode-operator.
Then the claim that it has recursion is incorrect. The explode operator is
ony a very limited form of recursion. Why does it not have recursion?
>> You can. But I don't think you should be using algebra operators for that
>> but a more declarative type of language. I assume Dataphor has that?
>
>What on earth could be more declarative than relational operators???
The relational calculus.
-- Jan Hidders
No problem, I just noticed that I switched my use of brackets between the
type and the instance to the usual mathematical convention ( () for tuples,
{} for sets), so the confusion is my fault.
>So by {} you mean a RELATION of type {TREE Tree} with no rows.
Yes.
>And to hummer me, {} = {} iff they are empty sets of the same relation type.
>Yes?
If that's what makes you happy. :-) If I had done it properly then I would
have written with every relation the relation type it belongs to.
>>>Ealier, I said True, but would your tree tuple example be the best way of
>>>implementing a tree data-structure anyway?
>
>>Implementing? This is the logical level we are talking about, so a better
>>word would be "modelling" or "representing" and it should IMO be up to the
>>user if this is how he or she wants to represent that data that way or not
>
>OK, wrong usage of that word. IMO the fewer arbitrary options a user has the
>better.
It is very difficult to judge in advance if such a choice is always and
everywhere under all circumstances arbitrary or not. There is no data
structure so weird or somewhere on the world there is somebody who has good
reasons to model their data that way. And simulating it with surrogate
identifiers means that you have to add a set of not so trivial constraints
that guarantee that is a tree and that there are no duplicats in the sets,
and you burden the user with computing explicitly the deep equality. The
bottom-line is that you cannot make the complexity go away by modelling your
data in a simpeler data model.
-- Jan Hidders
Agreed. As simple as possiable, but no simpler. :-)
Do you have a reference that explores the err, 'pros/cons' of recursive types?
From what you have said so far, I could certainly entertain the thought that
they might be a good idea.
In some sense all the research that is being done on storing semistructured
data or XML in databases is basically an exercise in storing nested data for
which we don't know the nesting depth.
What is also interesting is the work in type theory that was done already in
the past (thing Algol) on recursive types. That turned not to be so
difficult as one might expect. Think subsumption for DFAs. If you want a
reference for more recent work:
http://citeseer.nj.nec.com/amadio93subtyping.html
-- Jan Hidders
> >OK - no do-while loops.  But how on earth does the optimiser handle this
> >query besides materialising the whole tree?
> 
> By pushing the selection into the recursion and using a query optimization
> technique called "magic sets".
> 
> >Can it use a traditional B-tree index to find Attr="Lauri"?
> 
> Yes.
So this is in a sense just a short cut to tell the system
"this is a tree"?
> What loss? Everything that could be done can still be done. Loss of
> optimizability? Not true; in the physical model the data might be
> represented in the same way as when the user had used surrogate identifiers?
> Loss of simplicity? That is up to the user, if he or she wants to represent
> their data that way then it is probably the simpelest for them.
Well, more stuff to implement...  more bugs... 
On the other hand, if this really is necessary then why not.
But I just got the impression that you thought that
these recursive definitions were very important.  Now
it looks like they _might_ be useful in _some_ situations.
 
> There is actually in this case a natural key: the position in the document. 
True. Actually prefix and postfix orders will give the structure as well.
> 
> >> >This requires recursive support (available
> >> >in DB2, and partly Oracle)
> >> >
> >> >(Needless to say: Dataphor has it)
> >>
> >> Good. So everything we need is already there.
> >
> >Sorry.
> >Not the recursive definitions, but an explode-operator.
> 
> Then the claim that it has recursion is incorrect. The explode operator is
> ony a very limited form of recursion. 
Sorry - a missunderstanding.
>Why does it not have recursion?
Poor guys! They have done quite a lot in
1,5 years! Don't be too harsh on them ;-)
> 
> >> You can. But I don't think you should be using algebra operators for that
> >> but a more declarative type of language. I assume Dataphor has that?
> >
> >What on earth could be more declarative than relational operators???
> 
> The relational calculus.
Are they not equivalent?
regards,
Lauri Pietarinen
Is there a theoretical problem with just using plain
integers as surrogates or "object id's"?  Or is it just that we 
want to prevent the user from comparing apples and oranges?
regards,
Lauri Pietarinen
The Third Manifesto gives a 
relational Calculus Version
of Tutorial D in Appendix A.
Dataphor does not have the calculus.
regards,
Lauri Pietarinen
Exactly. It saves you the trouble of having to specify explicitly with
constraint that the nodes form trees. Also, if you want to say that two
nodes represent the tree you can say T1 = T2. Now try to express that in
the surrogate key simulation.
>> What loss? Everything that could be done can still be done. Loss of
>> optimizability? Not true; in the physical model the data might be
>> represented in the same way as when the user had used surrogate identifiers?
>> Loss of simplicity? That is up to the user, if he or she wants to represent
>> their data that way then it is probably the simpelest for them.
>
>Well, more stuff to implement...  more bugs... 
>On the other hand, if this really is necessary then why not.
>But I just got the impression that you thought that
>these recursive definitions were very important.  Now
>it looks like they _might_ be useful in _some_ situations.
That is what I meant. I'm not saying that they should be possible. What I'm
wondering about is if there are very good reasons to forbid them.
>>Why does it not have recursion?
>
>Poor guys! They have done quite a lot in
>1,5 years! Don't be too harsh on them ;-)
Oh, nothing but respect fo these guys.
>> >What on earth could be more declarative than relational operators???
>> 
>> The relational calculus.
>
>Are they not equivalent?
Yes, but the general idea is that the calculus is closer to how people
actually think and does not imply so much an execution order as the algebra
does. 
-- Jan Hidders
Thanks Jan.
I was afraid that you'd point me in that direction.  I notice that the latest
Systems Journal is dedicated to 'Information Integration' and includes a
number of XML+Relational articles. I guess I should bite the bullet and see
what the great and the good have been upto on reconciling a extremely messy
language with relational.  :-)
http://www.research.ibm.com/journal/sj41-4.html
> What is also interesting is the work in type theory that was done already in
> the past (thing Algol) on recursive types. That turned not to be so
> difficult as one might expect. Think subsumption for DFAs. If you want a
> reference for more recent work:
>
>  http://citeseer.nj.nec.com/amadio93subtyping.html
>
Regards
No, there is a deeper problem here. Suppose you use numbers to represent
the nodes of a nested value like a tree. Normally in a query in a model with
explicit nested values you can ask a query that transforms the given nested
values into another nested value, e.g., it transforms the trees into other
trees. If you want to do that in your surrogate-identifier simulation you
have to generate new identifiers in your query. There are theoretical
results that prove that whatever computation on numbers you use you cannot
simulate every query that you could do with explicit nested values.
However, these results also apply to abstract identifiers if you only have a
new_id() constructor and only flat relations. But it is also known that you
can remedy this by either allowing 1 level of nesting (no more is needed) or
introduce a special algebra operator that creates new abstract identifiers
for each of the sets defined by a (flat) binary relation.
Sorry that this is so abstract, but it takes a lot of time to explain this
properly. If you want to know more:
 
  Jan Van den Bussche, Dirk Van Gucht, Marc Andries, and Marc Gyssens. On
  the completeness of object-creating query languages (extended abstract). In
  33rd Annual Symposium on Foundations of Computer Science, pages 372-379,
  Pittsburgh, Pennsylvania, 24-27 October 1992. IEEE.
http://citeseer.nj.nec.com/420972.html
Btw., they use the term "object-oriented" a lot in their paper, but if you
look closely you will see that what they are talking about is in fact the
(flat) relational model extended with object identifiers.
-- Jan Hidders
Thanks for the reference.
Date has mentioned/hinted/eluded to a new revised definition for 1NF that I
assume takes into account his acceptance of nested relations as attribute
values.  Any pointers to a definitive reference in this regard?  I don't
have TTM, unfortuntately.
Regards,
Daniel Guntermann
Consider a simple recursion example:
with iota (n) as
( values(1)
   union all
  select n+1 from iota
  where n<100000
)
select * from iota where n*n=25;
Could n*n=25 predicate be pushed into the recursion? When a predicate
can be pushed into recursion?
That's not so easy to explain. If you have "the Alice book" see section 13.3
on ... Magic :-).
-- Jan Hidders
OK, I got it now, thanks.  How high in priority are recursive definitions
in your opinion?
> >>Why does it not have recursion?
> >
> >Poor guys! They have done quite a lot in
> >1,5 years! Don't be too harsh on them ;-)
>
> Oh, nothing but respect fo these guys.
But you haven't even used Dataphor!
Do you really take their word for it?? ;-)
> >> >What on earth could be more declarative than relational operators???
> >>
> >> The relational calculus.
> >
> >Are they not equivalent?
>
> Yes, but the general idea is that the calculus is closer to how people
> actually think and does not imply so much an execution order as the algebra
> does.
Agreed. But algebra only _looks_ like imposing an order. It doesn't, of course.
regards,
Lauri Pietarinen
>
Not very high. As you already know it can be simlulated if you really want
them, and the theoretical problems I mentioned can be solved by moving some
work to the application layer.
>> >Poor guys! They have done quite a lot in
>> >1,5 years! Don't be too harsh on them ;-)
>>
>> Oh, nothing but respect fo these guys.
>
>But you haven't even used Dataphor!
>Do you really take their word for it?? ;-)
No, but implementing a DBMS is one of the hardest things around. It's right
up there with building games, compilers and operating systems. But perhaps
the term DBMS doesn't apply to Dataphor since it doesn't really have its own
storage engine. Or does it?
-- Jan Hidders
That's true.  They use SQL-databases as a storage engines and
they translate the D4-statements into SQL (that was perhaps a bit simplified
statement).
They have currently concentrated on implementing the TTM-stuff and
on an application generator that "understands" the db-schema (and
hence generates a lot of stuff "for free").
I understand that they have plans to build their own storage engine.
regards,
Lauri Pietarinen
>From "OO PRESCRIPTIONS" :
>
><quote>
>
>3. D shall be computationally complete. That is, D may support,
>but shall not require, invocation from so-called "host
>programs" written in languages other than D. Similarly, D may
>support, but shall not require, the use of other programming
>languages for implementation of user-defined operators.
>
></quote>
>
>So, well, need I say more?
Well, it proves that a valid D language is computationally complete,
but it is not a proof about if Tutorial D is a valid D language :-)
Alfredo
> So can I abuse you a little more as my tutorial-D tutor? :-) Can I have
> recursive types? For example (probably not the right syntax, but I think you
> will understand what I mean):
>
> TYPE Tree = TUPLE { NODE-VALUE STRING, SUBTREES RELATION { TREE Tree } }
just returning to this question.  I did not find anything related to recursive
definitions
in TTM (www.thethirdmanifesto.com).  However, here is an example of a recursive
operator expressed
in Tutorial-D (page 163, 2ed):
OPERATOR TRANCLO ( XY RELATION { X P#, Y P# } )
                                             RETURNS RELATION { X P#, Y P# } ;
    RETURN
         WITH ( XY UNION ( ( XY COMPOSE
                                             ( XY RENAME ( Y AS Z, X AS Y ) ) )
                     RENAME Z AS Y ) ) AS TTT :
          IF TTT = XY THEN TTT     /* unwind recursion */
               ELSE TRANCLO ( TTT )    /* recursive invocation */
         END IF;
 END OPERATOR;
Your Tutorial-D Tutor,
Lauri Pietarinen
[...]
Sorry that this is so abstract, but it takes a lot of time to explain this
properly. If you want to know more:Jan Van den Bussche, Dirk Van Gucht, Marc Andries, and Marc Gyssens. On
the completeness of object-creating query languages (extended abstract). In
33rd Annual Symposium on Foundations of Computer Science, pages 372-379,
Pittsburgh, Pennsylvania, 24-27 October 1992. IEEE.
OK, thanks.  I  plodded through the paper, but it was a bit too
heavy for me
(my algebra is kinda rusty).  Is this bound to be a serious problem
for
normal database users, if such indentifiers are not provided for?
A note on object identifiers:  The "Other Orthogonal" Proscription
nr.2
in Manifesto (www.thethirdmanifesto.com) proscribes the use of object
id's.
Frankly, I don't know how the term is usually used, say in the context
of the
previous article (it was mentioned only once), but while we are on
the
subject I would like to point out a presentation by Darwen where
he describes the subtyping scheme of the Manifesto and also makes
(in my opinion) a convincing case against object id's (in the sense
that they are used in the Object Oriented world).
Presentation: http://www.TheThirdManifesto.com/taamoti.slides.pdf
Also Date's keynote address in this year's VLDB touches on the subject:
http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/slides/S01P02slides.pdf
regards,
Lauri Pietarinen
 
 
 
 
Ok. Thanks for looking. When I browse through it I get the impression from
their terminology that they haven't really considered it but it is not
explicitly ruled out either.
>However, here is an example of a recursive operator expressed in Tutorial-D
>(page 163, 2ed):
>
>OPERATOR TRANCLO ( XY RELATION { X P#, Y P# } )
>                                             RETURNS RELATION { X P#, Y P# } ;
>    RETURN
>         WITH ( XY UNION ( ( XY COMPOSE
>                                             ( XY RENAME ( Y AS Z, X AS Y ) ) )
>                     RENAME Z AS Y ) ) AS TTT :
>          IF TTT = XY THEN TTT     /* unwind recursion */
>               ELSE TRANCLO ( TTT )    /* recursive invocation */
>         END IF;
> END OPERATOR;
Wow. That's a pretty powerful type of recursion. It will be interesting to
see how well query optimization for such recursive expressions will perform
once this gets really implemented. Dataphor has this I presume?
-- Jan Hidders
No. If you have also something like nested relations then there is actually
no problem. But it would be more a problem if you insist on flat relations,
but even then you can solve these problems with moving the identifier
manipulation outside the DBMS into the application.
>A note on object identifiers:  The "Other Orthogonal" Proscription nr.2
>in Manifesto (www.thethirdmanifesto.com) proscribes the use of object id's.
>Frankly, I don't know how the term is usually used, say in the context of the
>previous article (it was mentioned only once), but while we are on the
>subject I would like to point out a presentation by Darwen where
>he describes the subtyping scheme of the Manifesto and also makes
>(in my opinion) a convincing case against object id's (in the sense
>that they are used in the Object Oriented world).
>
>Presentation:  http://www.TheThirdManifesto.com/taamoti.slides.pdf
I don't think he argues here against object identifers per se, but rather
against the confusion between types for pointers and types for the things
that pointers point to in many OO typing systems, which causes all these
subtyping headaches that makes you see circles as ellipses. :-)
>Also Date's keynote address in this year's VLDB touches on the subject:
>
> http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/slides/S01P02slides.pdf
Yes, but the object identifier as pointer is not the type of object
identifer I (and the article I mentioned) was talking about. What I meant
could also be called "abstract value" and is in some sense the theoretical
clean version of surrogate identifiers. The big difference with surrogate
identifiers is that the user can never get to see their value and the only
operation that is defined for them is equality. Such a restricted type may
seem a bit strange and not very useful at first sight but it makes the life
of the DBMS easier (for example the DBMS can reuse identifiers that are no
longer used) and allows the user to define views and queries in which new
identifiers are generated without using some clever arithmetic to generate
new surrogate identifiers. In fact, if such object identifiers are possible
(including special operators to generate them) IMO the good old flat
relational model would be all we need.
Well, they also go on to say that
"We make no claim that TRANCLO is very efficcient but it surely can
be improved in this regard"
> Dataphor has this I presume?
I think they do not yet support relation valued operators.
regards,
Lauri Pietarinen
OK, that's what I suspected.
> What I meant
> could also be called "abstract value" and is in some sense the theoretical
> clean version of surrogate identifiers. The big difference with surrogate
> identifiers is that the user can never get to see their value and the only
> operation that is defined for them is equality. Such a restricted type may
> seem a bit strange and not very useful at first sight but it makes the life
> of the DBMS easier (for example the DBMS can reuse identifiers that are no
> longer used) and allows the user to define views and queries in which new
> identifiers are generated without using some clever arithmetic to generate
> new surrogate identifiers. In fact, if such object identifiers are possible
> (including special operators to generate them) IMO the good old flat
> relational model would be all we need.
So is the function of these identifiers to act as an "internal glue"?  Obviously
they are not used as external identifiers because the "user" can't see them.
What is the problem with generating such values?  If they are just integer
values (say) and there is a process inside the engine that gives the system the
next integer?  Or maybe generates a random value within a large space
and just quickly checks if it is reserved?
And one more question:  what would be a practical query in which
these abstract identifiers would be needed.
regards,
Lauri Pietarinen
In a sense, yes. Suppose you want to represent a nested relation R(a,B) (B
is the nested column) with the flat relations R1(a,c) and R2(c,b) where in
R1.c you use an identifier to replace the set in R.B, and in R2 you record
what the elements of the sets. You could indeed say that the identifiers in
colomns c is what allows us to glue the two relations together and
reconstruct the nested relation R.
>What is the problem with generating such values?  If they are just integer
>values (say) and there is a process inside the engine that gives the system
>the next integer?  Or maybe generates a random value within a large space
>and just quickly checks if it is reserved?
For surrogate identifers the DBMS must somehow remember all identifiers
that have ever been used, also those that are no longer used in any
relation. It could do this by either storing the set of all identifiers that
were ever used (could be big) or remembering the highest integer that it has
every generated, but then it cannot reuse identifiers that are no longer
used.
Another problem is if you define views with new identifiers in them. Should
these be regenerated every time the view is accessed? What would these
identifiers then mean?
>And one more question:  what would be a practical query in which
>these abstract identifiers would be needed.
Suppose I have a relation
Bought(customer, product)
and I want as the result a relation
Cust_class(class_id, customer)
that groups the customers into classes that have bought the same set of
prodcucts and gives them a new identifier.
-- Jan Hidders
> >So is the function of these identifiers to act as an "internal glue"?
>
> In a sense, yes. Suppose you want to represent a nested relation R(a,B) (B
> is the nested column) with the flat relations R1(a,c) and R2(c,b) where in
> R1.c you use an identifier to replace the set in R.B, and in R2 you record
> what the elements of the sets. You could indeed say that the identifiers in
> colomns c is what allows us to glue the two relations together and
> reconstruct the nested relation R.
OK. Why do they have to be abstract?
> >What is the problem with generating such values?  If they are just integer
> >values (say) and there is a process inside the engine that gives the system
> >the next integer?  Or maybe generates a random value within a large space
> >and just quickly checks if it is reserved?
>
> For surrogate identifers the DBMS must somehow remember all identifiers
> that have ever been used, also those that are no longer used in any
> relation. It could do this by either storing the set of all identifiers that
> were ever used (could be big) or remembering the highest integer that it has
> every generated, but then it cannot reuse identifiers that are no longer
> used.
Why does it have to remember all identifiers ever been used if it want's
to reuse them anyway?  On the other hand if it does not reuse the identifiers
what's wrong with an integer counter (say 2**64).  And still: why do the
identifiers have to be globally unique? Surely it suffices to have them
unique within each relation?
And how do "abstract identifiers" come to the resque?  Surely
the same pertains to them?
> Another problem is if you define views with new identifiers in them. Should
> these be regenerated every time the view is accessed? What would these
> identifiers then mean?
Why not? What are they used for if they mean nothing?
> >And one more question:  what would be a practical query in which
> >these abstract identifiers would be needed.
>
> Suppose I have a relation
>
>   Bought(customer, product)
>
> and I want as the result a relation
>
>   Cust_class(class_id, customer)
>
> that groups the customers into classes that have bought the same set of
> prodcucts and gives them a new identifier.
OK, but
- what's wrong with having an ID that can be seen
 - I would suppose that the id's could be generated
each time the query was executed - what's the
problem? They can't possibly overlap with
any other identifiers.  In a sense the view
does not exist until it is executed?
regards,
Lauri Pietarinen
Thanks,
Dan
"Jan Hidders" <hid...@REMOVE.THIS.uia.ua.ac.be> wrote in message
news:3dd59681$1...@news.uia.ac.be...
> Lauri Pietarinen  wrote:
> >Jan Hidders wrote:
> >
[snip]
True object identifiers are abstract, i.e., the user never gets to see them.
So you don't need a global mechanism and there is no reconciliation
problem. You have to think of them as for example the nodes in a graph. A
directed graph is a binary relation but how the nodes are exactly
represented is immaterial.
-- Jan Hidders
That's the wrong question. If their representation has no meaning for you,
why do you want to see it? This is a logical data model we are talking
about, it should model what is relevant for the user and nothing else. If
you're puzzled by what I mean see how it works in formal logic and model
theory, which are the roots of the relational model.
>> >What is the problem with generating such values?  If they are just
>> >integer values (say) and there is a process inside the engine that gives
>> >the system the next integer?  Or maybe generates a random value within a
>> >large space and just quickly checks if it is reserved?
>>
>> For surrogate identifers the DBMS must somehow remember all identifiers
>> that have ever been used, also those that are no longer used in any
>> relation. It could do this by either storing the set of all identifiers
>> that were ever used (could be big) or remembering the highest integer
>> that it has every generated, but then it cannot reuse identifiers that
>> are no longer used.
>
>Why does it have to remember all identifiers ever been used if it want's
>to reuse them anyway?
No, they cannot be reused and that is why you have to remember them. This is
because otherwise a user might come back after a while and ask if entity
with id 12345 is still there and get the answer "yes" but in reality this
entity was removed and after that a new identity was created the reused its
identifier.
>On the other hand if it does not reuse the identifiers
>what's wrong with an integer counter (say 2**64).
If you are going to use identifiers to simulated nested values you will have
to create new identifiers lots of times and then you might run out.
>And still: why do the identifiers have to be globally unique? Surely it
>suffices to have them unique within each relation?
That depends on what they represent and what the columns mean they are in.
>And how do "abstract identifiers" come to the resque?  Surely the same
>pertains to them?
Since the user cannot see them, the system can reuse them freely.
>> Another problem is if you define views with new identifiers in them.
>> Should these be regenerated every time the view is accessed? What would
>> these identifiers then mean?
>
>Why not?  What are they used for if they mean nothing?
Glue, as you called it, as in the example above with relation R and R1 and
R2.
>> >And one more question:  what would be a practical query in which
>> >these abstract identifiers would be needed.
>>
>> Suppose I have a relation
>>
>>   Bought(customer, product)
>>
>> and I want as the result a relation
>>
>>   Cust_class(class_id, customer)
>>
>> that groups the customers into classes that have bought the same set of
>> prodcucts and gives them a new identifier.
>
>OK, but
>
> - what's wrong with having an ID that can be seen
Like I said, no reuse and it's bad data modelling. The concrete
representation has no meaning so any operation that uses it should not be allowed
which leaves only equality.
> - I would suppose that the id's could be generated
>each time the query was executed - what's the problem?
If it is executed a lot you may run out of them.
-- Jan Hidders
thanks for the trouble for trying to clarify.
There must be some underlying assumptions that
I juat don't understand, because I fail to follow your
argumentation.
The only question I know how to ask is the following:
>>> Suppose I have a relation
>>>
>>>   Bought(customer, product)
>>>
>>> and I want as the result a relation
>>>
>>>   Cust_class(class_id, customer)
>>>
>>> that groups the customers into classes that have bought the same set of
>>> prodcucts and gives them a new identifier.
> > - I would suppose that the id's could be generated
> >each time the query was executed - what's the problem?
>
> If it is executed a lot you may run out of them.
Surely if that same view (that generates - perhaps abstract -
identifiers) is queried many times, it represents each time
the values of the execution at that point of time.  So if
I query it twice it makes no difference whether the
identifiers are the same or not, or what the result
of the query is.
So if I say
 select * from cust_class
     and get the answer {(1,1),(1,2),(2,3)}
and re-execute it after 10 minutes
   and get the answer {(1,1),(2,2),(2,3)}
that would be perfectly correct because
it would just be the result of the query
and that's it.
regards,
Lauri Pietarinen
Well, to be honest, it is a bit hard to explain it thoroughly in short
usenet postings. But if you keep asking the right questions we will get
there.
>The only question I know how to ask is the following:
>
>>>> Suppose I have a relation
>>>>
>>>>   Bought(customer, product)
>>>>
>>>> and I want as the result a relation
>>>>
>>>>   Cust_class(class_id, customer)
>>>>
>>>> that groups the customers into classes that have bought the same set of
>>>> prodcucts and gives them a new identifier.
>
>> > - I would suppose that the id's could be generated
>> >each time the query was executed - what's the problem?
>>
>> If it is executed a lot you may run out of them.
>
>Surely if that same view (that generates - perhaps abstract -
>identifiers) is queried many times, it represents each time
>the values of the execution at that point of time.  So if
>I query it twice it makes no difference whether the
>identifiers are the same or not, or what the result
>of the query is.
>
>So if I say
>
> select * from cust_class
>     and get the answer {(1,1),(1,2),(2,3)}
>
>and re-execute it after 10 minutes
>   and get the answer {(1,1),(2,2),(2,3)}
>
>that would be perfectly correct because
>it would just be the result of the query
>and that's it.
Yes, what could also happen is:
first answer: {(1,1),(1,2),(2,3)}
second answer: {(2,1),(2,2),(1,3)}
since a relation is a set and there is no fixed order in which the
identifiers are generated. This is theoretically a bit annoying; nothing has
changed in the database, you ask the same question, and yet you get a
different answer. Another more practical problem is that there may be
another view that (on the basis of this view) computes certain properties of
these groups like Avg_income(class_id, avg_inc) that will have computed
certain properties of these classes. If the identifiers change every time
you query the view the connection between this tables is lost.
-- Jan Hidders
Jan:
first you say that "you may run out of them [integers]"
then the problem is that different id's get assigned.
???
We must be assuming that the database is changing
all the time anyway, so if we want the class_id to mean
something, we would have to have a separate
table to describe the meaning of each class.
Anyway:  how do abstract identifiers help?  Now
we have to assign the same abstract indentifier
to each class - same problem I suppose?
regards,
Lauri Pietarinen
I had a look in the book (you should get a copy Jan :-), but could find no
comment either way on recursively defined types. That suggest to me that they
have not occasioned to consider such a possibility.
Yes, both are a problem. Why do you think these remarks are not compatible?
>We must be assuming that the database is changing
>all the time anyway, so if we want the class_id to mean
>something, we would have to have a separate
>table to describe the meaning of each class.
No, no, they already have a meaning; the represent a set of customers as
difened by the Bought table.
>Anyway:  how do abstract identifiers help?  Now
>we have to assign the same abstract indentifier
>to each class - same problem I suppose?
The user doesn't assign anything. That's the whole point. And since the user
cannot see the identifiers he or she can also not see whether the database
is now using other surrogate identifers or not.
I'm a bit time pressed at the moment, so if you don't mind I will try to
explain it a bit more fully another time.
-- Jan Hidders