
Lucid statement of the MV vs RM position?


ralph...@gmail.com

Apr 20, 2006, 1:18:17 AM
A number of people on this group are proponents of
Pick (MultiValue) DBMS. I've been trying to find a
definition for MultiValue to give me a better handle
on the arguments MV types often advance against
the relational model.

As I understand it, an MV database is a collection
of files, a file is a collection of records, records in
a file all have the same structure, a record is indexed
by a unique key, a record is a collection of fields, a
field is a collection of (atomic?) values.
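
To make that concrete, here is a rough Python sketch of the structure
as I picture it. The file name, keys, and data are all invented, and
this says nothing about how Pick actually stores anything:

# Illustrative only: an MV database as nested Python structures.
# A database is a collection of files; a file maps unique keys to
# records; a record is a collection of fields; a field may hold
# one value or several.
database = {
    "CUSTOMERS": {                                # a file
        "C100": {                                 # a record, keyed by "C100"
            "NAME": "Acme Ltd",                   # single-valued field
            "PHONES": ["555-1234", "555-9999"],   # multi-valued field
        },
        "C101": {
            "NAME": "Widget Co",
            "PHONES": ["555-0000"],
        },
    },
}

record = database["CUSTOMERS"]["C100"]   # file -> record by unique key
print(record["NAME"])                    # Acme Ltd
print(record["PHONES"][1])               # 555-9999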

If that is correct, it seems to me that MV is an
implementation technology and the RM is a logical
formalism and that to compare the two is to compare
apples and oranges.

That said, debate on the topic still goes on and on
in this group, so I assume I have failed to grasp
something important about MV. Is there a clear,
*concise* explanation somewhere of (a) a formal
(preferably set theoretic) model of MV, and (b)
how integrity constraints are expressed and enforced
in an MV database?

(On www.pickwiki.com I came across "A formal
mathematical critique of Relational Theory and its
MultiValue opposition", which I hoped would be a
lucid expression of the MV vs RM position, but
unfortunately it turned out to be just plain wrong
where it wasn't incoherent.)

-- Ralph

Bob Badour

Apr 20, 2006, 9:49:57 AM
ralph...@gmail.com wrote:

> A number of people on this group are proponents of
> Pick (MultiValue) DBMS. I've been trying to find a
> definition for MultiValue to give me a better handle
> on the arguments MV types often advance against
> the relational model.
>

> [snip]


>
> That said, debate on the topic still goes on and on
> in this group, so I assume I have failed to grasp
> something important about MV.

The Pickies are self-aggrandizing ignorants who are undeterred by either
their ignorance or their stupidity. Hence the endless 'debate'.

Rational thinking people will have concluded years ago that pick is a
primitive file processor with no integrity function whatsoever and a
seriously crippled manipulation function.

That said, you are confusing two parallel comparisons: MV (or NFNF) vs
RM is one comparison, pick (or any other primitive file processor) vs
any real DBMS is another comparison.

NFNF = non-first normal form
RM = relational model ie. first normal form and above
DBMS = database management system


> Is there a clear,
> *concise* explanation somewhere of (a) a formal
> (preferably set theoretic) model of MV, and (b)
> how integrity constraints are expressed and enforced
> in an MV database?

The NFNF proponents do not agree on a single model. The models that
exist seem quite arbitrary. Certainly, any I have examined have had
serious and obvious flaws.


> (On www.pickwiki.com I came across "A formal
> mathematical critique of Relational Theory and its
> MultiValue opposition", which I hoped would be a
> lucid expression of the MV vs RM position, but
> unfortunately it turned out to be just plain wrong
> where it wasn't incoherent.)

"Plain wrong" and "incoherent" are exactly what I expect from the pick
crowd. I concluded three years ago that prolonged pick use causes severe
cognitive damage.

dawn

Apr 20, 2006, 11:01:26 AM
ralph...@gmail.com wrote:
> A number of people on this group are proponents of
> Pick (MultiValue) DBMS.

I am an advocate of the data model underlying the MV databases.
However, because fewer dollars have been spent in recent years to put
pretty front-ends on these databases, among other things, most of them
are dated.

> I've been trying to find a
> definition for MultiValue to give me a better handle
> on the arguments MV types often advance against
> the relational model.

You could read my little trilogy of flashcards at

http://www.tincat-group.com/mv/trilogy.html
(I have some javascript cleanup to do, but hopefully you can read each
set of cards easily)

> As I understand it, an MV database is a collection
> of files, a file is a collection of records, records in
> a file all have the same structure, a record is indexed
> by a unique key, a record is a collection of fields, a
> field is a collection of (atomic?) values.

pretty close.

> If that is correct, it seems to me that MV is an
> implementation technology and the RM is a logical
> formalism and that to compare the two is to compare
> apples and oranges.

I have been trying to address this in my blog (same domain as above,
but select the blog, starting with Is Codd Dead?)

> That said, debate on the topic still goes on and on
> in this group, so I assume I have failed to grasp
> something important about MV.

My points are intended to be more about the lapses in the relational
model that are shown up by seeing other models, such as the MV (which
is the main alternative with which I have experience).

> Is there a clear,
> *concise* explanation somewhere of (a) a formal
> (preferably set theoretic) model of MV,

No. As best I can tell, it has never had a formal model developed,
and definitely not an exclusively set-theoretic one. I did include the
earliest write-up that anyone I know has found related to the model in
one of my blog entries (The LIST of GIRLS, IIRC).

> and (b)
> how integrity constraints are expressed and enforced
> in an MV database?

Yes, that has been discussed in this forum many times. It is a big
topic, but the small version is that there is nothing resembling a
SQL-DBMS variety of constraints in Pick.

> (On www.pickwiki.com I came across "A formal
> mathematical critique of Relational Theory and its
> MultiValue opposition", which I hoped would be a
> lucid expression of the MV vs RM position, but
> unfortunately it turned out to be just plain wrong
> where it wasn't incoherent.)

I'll have to find that. Thanks. --dawn

Pickie

Apr 20, 2006, 5:41:30 PM
There doesn't seem to be a formal model anywhere. Integrity
constraints are not enforced at the database level.

http://www.pickwiki.com/cgi-bin/wiki.pl?PickDataStructure gives my
description of the data structure, which is the point of difference in
Pick. To quote - "one can see how the layout somewhat resembles that
used in most SQL DBMS's". There is a brief comparison with objects and
XML too. Pick 'sort-of' looks like everything, but is unique - and
very useful.

http://www.pickwiki.com/cgi-bin/wiki.pl?PhilosophyOfPick gives my views
of the philosophy behind Pick. I think even Bob would agree with this
quote - "In the Multi-Value model there is no DBMS as such."

The above are informal descriptions - I certainly don't have any
pretensions to ability at higher mathematics. C.D.T. certainly is the
place for someone to raise this issue, and the lack of a formal
definition is frustrating. But I'm not sure that Pick IS the right
sort of thing to give a formal description of. It seems to me that there is
something missing, in that it does not enforce constraints. In my
view, one uses Pick to build an application which is, itself, the DBMS.

There is a mindset about the Relational Model that is disturbing. The
point of view that says that there is no TRULY Relational DBMS because
of incompetence or wickedness on the part of the SQL DBMS providers is
just outright wrong. The problem is that it is difficult in the
extreme to build a data store of whatever size desired, that can have
some arbitrarily huge number of people changing the data in it, and
that will provide the answer to any conceivable query - as if the data
store were to be frozen until the query is done. Every time you put an
index in, or some other cute little wrinkle to more cleverly do this,
you are arguably de-normalising your database. Well, you are storing
data in multiple places, anyway.

The idea of having a horrendously complex physical implementation - in
order to provide the appearance of a clear logical model - is
uncomfortable to me. I question, not the Relational Model, but whether
implementing this aspect of it in this way is worth the trouble.

Were there any formal definitions of the Hierarchical and Network
Models? I was going to write "exempting the ones that Codd set up in
order to show the RM was better", but then I realised that I haven't
even seen these, and that maybe they would teach me something. Does
anyone have a link to them?

Mikito Harakiri

Apr 20, 2006, 5:52:37 PM
> I think even Bob would agree with this
> quote - "In the Multi-Value model there is no DBMS as such."

You certainly meant to say:
"In the Multi-Value model there is no *model* as such."
?

dawn

Apr 20, 2006, 6:06:42 PM

That would depend on your definition of a model. --dawn

Neo

Apr 20, 2006, 6:24:42 PM
> so I assume I have failed to grasp something important about MV.

There are three men with limited vision in a room with a large but
docile elephant. The first person, feeling the elephant's trunk, declares
it's a vacuum cleaner and it works really well, especially when there
are peanuts on the ground. The second person, feeling the breeze from the
elephant's swaying ears, declares it is a fan and it works really well
to keep him cool. The third person standing by the tail declares it is
a fly-swatter and it also works really well.

Gene Wirchenko

Apr 20, 2006, 7:08:03 PM
On 20 Apr 2006 14:41:30 -0700, "Pickie" <keith....@datacom.co.nz>
wrote:

[snip]

>There is a mindset about the Relational Model that is disturbing. The
>point of view that says that there is no TRULY Relational DBMS because
>of incompetence or wickedness on the part of the SQL DBMS providers is
>just outright wrong. The problem is that it is difficult in the

IBM went with SQL over Codd's objections. Market inertia
continued from there.

>extreme to build a data store of whatever size desired, that can have
>some arbitrarily huge number of people changing the data in it, and
>that will provide the answer to any conceivable query - as if the data
>store were to be frozen until the query is done. Every time you put an
>index in, or some other cute little wrinkle to more cleverly do this,
>you are arguably de-normalising your database. Well, you are storing
>data in multiple places, anyway.

It does not matter what the implementation does as long as it is
not visible to the programmer, that is, no dependencies on the
physical are created.

>The idea of having a horrendously complex physical implementation - in
>order to provide the appearance of a clear logical model - is
>uncomfortable to me. I question, not the Relational Model, but whether
>implementing this aspect of it in this way is worth the trouble.

Why? Have you ever used a compiler? Compilers create horribly
complex lower-level code to implement a clearer (one hopes)
higher-level language. Is this worth the trouble? Many would say so.

What about having the same freedom from the lower-level details
in a DBMS?

>Were there any formal definitions of the Hierarchical and Network
>Models? I was going to write "exempting the ones that Codd set up in
>order to show the RM was better", but then I realised that I haven't
>even seen these, and that maybe they would teach me something. Does
>anyone have a link to them?

Apparently yes, but they are much more complex. It is not that
the RM is the only model possible, just that it is the simplest that
works.

Sincerely,

Gene Wirchenko

paul c

Apr 20, 2006, 7:18:11 PM


Have the pickle people ever implemented the RM with a pickle product?
This seems like a useful project if the pickle products are so great.
Heh, maybe the person who wrote the checkers program could write a
pickle product in sql.

p

paul c

Apr 20, 2006, 7:21:09 PM
Neo wrote:
>>so I assume I have failed to grasp something important about MV.
>
>
> There are three men with limited vision in a room with a large but
> docile elephant. ...

They aren't the only ones in that room. The one I like is where the
scientist cuts off one of a frog's legs, claps his hands and the frog
jumps. After he's cut off the other three legs, he concludes the frog
has gone deaf.

p

ralph...@gmail.com

Apr 20, 2006, 8:28:03 PM
Pickie wrote:
> There doesn't seem to be a formal model anywhere. Integrity
> constraints are not enforced at the database level.

Yikes.

> http://www.pickwiki.com/cgi-bin/wiki.pl?PhilosophyOfPick gives
> my views of the philosophy behind Pick.

I've just read your article. Two misconceptions of the RM seem
to crop up again and again in comparisons between MV and the RM:

(1) the RM is *not* based on the idea of storing data in tables.
Under the RM, a relation is a set of rows with the same
signature, and a row is a partial function from column names to
values. The signature of a row is the domain of the function
(i.e., a set of column names). It is purely accidental that
relations of this kind can be conveniently portrayed on the page
as two-dimensional tables. The emphasis in the RM should be
that a relation is a *set* of rows, a row is a *set* of (column
name, value) pairs, and sets are unordered, duplicate-free
collections.
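
For concreteness, a toy Python sketch of that view (purely
illustrative; the relation and column names are invented, and
frozensets merely stand in for the abstract sets):

# A row is a set of (column name, value) pairs; a relation is a set
# of rows sharing the same set of column names. Sets are unordered
# and duplicate-free, which is exactly the point above.
def row(**columns):
    return frozenset(columns.items())

employees = {
    row(emp_id=1, name="Alice", dept="Sales"),
    row(emp_id=2, name="Bob", dept="Ops"),
    row(emp_id=1, name="Alice", dept="Sales"),   # duplicate row collapses
}

print(len(employees))                            # 2
# Column order is irrelevant: these denote the same row.
print(row(dept="Sales", emp_id=1, name="Alice") ==
      row(emp_id=1, name="Alice", dept="Sales"))  # True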

(2) The RM says *nothing* about *how* a database should be
implemented. It would be a mistake to think that because
relations are often called "tables" and are often portrayed as
2D arrays, that is how they are stored in memory or on disc.
Any implementation taking that route would have shocking
performance. The point of the RM is to separate the model (how
one thinks of the data) from the representation (how it is
stored). A good RDBMS implementation should make good decisions
concerning representation (perhaps under the guidance of the
DBA), but that is purely an optimization issue. Conflating
representation and model is akin to hand-optimizing a program
before you're sure it's correct: it will surely lead to a world
of pain.

> There is a mindset about the Relational Model that is
> disturbing. The point of view that says that there is no
> TRULY Relational DBMS because of incompetence or wickedness on
> the part of the SQL DBMS providers is just outright wrong.

My gut feeling is that it's partly to do with poor early choices
having become the standard and partly to do with the fact that
not many people finish a CS degree with any understanding of
theory or how the careful application of theory can save huge
amounts of time and effort. Given things such as the lack of
any decent type theory or the addition of terrible ideas like
NULLs into SQL, I'm inclined to think the latter is more
significant than the former.

> The problem is that it is difficult in the extreme to build a
> data store of whatever size desired, that can have some
> arbitrarily huge number of people changing the data in it,
> and that will provide the answer to any conceivable query -
> as if the data store were to be frozen until the query is
> done.

The DBMS has to
- ensure data integrity
- ensure data availability
- protect against hardware failure
- manage distribution
- manage concurrent access
- optimize *dynamically* for *multiple* applications.

There is no way it makes sense to implement each of these
aspects in every new application. Implementing any one of them
well is a huge undertaking.

> Every time you put an index in, or some other cute little
> wrinkle to more cleverly do this, you are arguably
> de-normalising your database. Well, you are storing data in
> multiple places, anyway.

If you bugger up your model ("denormalise" it) you should expect
trouble. A good DBMS should allow the DBA to suggest
optimizations, but the DBMS should be responsible for
implementing those optimizations, which should not affect the
model in any way.

> The idea of having a horrendously complex physical
> implementation - in order to provide the appearance of a clear
> logical model - is uncomfortable to me. I question, not the
> Relational Model, but whether implementing this aspect of it
> in this way is worth the trouble.

As someone else said, the same could be said of compilers for
high level languages. But as I said above, there are things
that you Just Have to Have in a DBMS, and it's better to get
them right just once, in one place: the DBMS, not every
application.

-- Ralph

Bob Badour

Apr 20, 2006, 8:50:35 PM
Pickie wrote:

> There doesn't seem to be a formal model anywhere.

Which limits the application of anything resembling a formal method.


> Integrity
> constraints are not enforced at the database level.

Which is a central database management function.


> Pick 'sort-of' looks like everything, but is unique - and
> very useful.

We have already established that it has less utility than most of the
alternatives. How does that make it 'very useful'?


> http://www.pickwiki.com/cgi-bin/wiki.pl?PhilosophyOfPick gives my views
> of the philosophy behind Pick. I think even Bob would agree with this
> quote - "In the Multi-Value model there is no DBMS as such."

Why would you compare a model and a dbms? I would agree that at best
primitive file processors like Pick are nothing better than dbms
construction kits. I have seen several NFNF logical models, but I have
seen nothing I would call a consensus on an MV model.


> The above are informal descriptions - I certainly don't have any
> pretensions to ability at higher mathematics. C.D.T. certainly is the
> place for someone to raise this issue, and the lack of a formal
> definition is frustrating. But I'm not sure that Pick IS the right
> thing to have a formal description.

See Dijkstra's comments on the illusion of power.


> It seems to me that there is
> something missing, in that it does not enforce constraints. In my
> view, one uses Pick to build an application which is, itself, the DBMS.

Doesn't this then require every pick application developer to have the
rare expert skills of a dbms developer? Do you think it is wise to
require such expertise from all developers?


> There is a mindset about the Relational Model that is disturbing. The
> point of view that says that there is no TRULY Relational DBMS because
> of incompetence or wickedness on the part of the SQL DBMS providers is
> just outright wrong.

Why is it wrong to point out actual incompetence? The profit motive is
not wicked, but it is the profit motive. If it is cheaper to baffle
people with bullshit than dazzle them with brilliance, what outcome
would you expect?


> The problem is that it is difficult in the
> extreme to build a data store of whatever size desired, that can have
> some arbitrarily huge number of people changing the data in it, and
> that will provide the answer to any conceivable query - as if the data
> store were to be frozen until the query is done.

Ironically, it is easier to build a relational dbms than a
non-relational dbms.


> Every time you put an
> index in, or some other cute little wrinkle to more cleverly do this,
> you are arguably de-normalising your database.

Ignorants can argue anything. That doesn't mean the arguments even begin
to rise to the level of legitimate discourse. Normalization is a logical
concept and indexes are physical structures that do not affect the
logical view of the data. As such, any argument related to indexes and
normalization is just plain old dumb.


> Well, you are storing
> data in multiple places, anyway.

True. But if redundancy achieves the desired performance characteristics
and is entirely managed by the dbms, what is the objection to the
redundancy?


> The idea of having a horrendously complex physical implementation - in
> order to provide the appearance of a clear logical model - is
> uncomfortable to me.

Physical implementations in heterogeneous hardware environments are by
their nature horrendously complex and performance requirements often
increase the demand for physical complexity. Physical independence and a
clear logical data model serve to alleviate (often obviate) the need for
casual users to account for this complexity.

As a general principle, one seeks to shield human users from complexity
not relevant to their needs. What about this principle disturbs you?


> I question, not the Relational Model, but whether
> implementing this aspect of it in this way is worth the trouble.

Have you considered the consequences of doing so? Fewer errors. Better
overall performance. Greater utility for more users. Higher reliability.
Greater availability. Easier maintenance.

How much trouble are those benefits worth? I think they are worth a lot
of trouble.


> Were there any formal definitions of the Hierarchical and Network
> Models? I was going to write "exempting the ones that Codd set up in
> order to show the RM was better", but then I realised that I haven't
> even seen these, and that maybe they would teach me something. Does
> anyone have a link to them?

The footnotes of this article by Chris Date identify several relevant
sources:
http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml

1. E. F. Codd and C. J. Date. "Interactive Support for
Nonprogrammers: The Relational and Network Approaches." IBM Research
Report RJ1400 (June 6th, 1974). Republished in Randall J. Rustin (ed.),
Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol.
II, Ann Arbor, Michigan (May 1974). Also in C. J. Date, Relational
Database: Selected Writings. Reading, Mass.: Addison-Wesley (1986).

I suggest a used copy of _Relational Database: Selected Writings_ would
be the wisest investment of the options given.

Christopher Browne

Apr 20, 2006, 10:05:37 PM

If you look at the ACM TODS (Transactions on Database Systems), a
goodly number of the papers present views of relational systems in a
fashion that looks *way* more like Prolog than anything else.

It tends to be pretty easy to represent relational facts as well as
queries as sets of Prolog clauses.

There is a conspicuous disconnect from Darwen/Date, there, in that
they trumpet loudly about strong data typing, whilst Prolog tends to
be nearly type-free. Mind you, I'm conflating representation and
model there, a bit...

>> There is a mindset about the Relational Model that is disturbing.
>> The point of view that says that there is no TRULY Relational DBMS
>> because of incompetence or wickedness on the part of the SQL DBMS
>> providers is just outright wrong.
>
> My gut feeling is that it's partly to do with poor early choices
> having become the standard and partly to do with the fact that not
> many people finish a CS degree with any understanding of theory or
> how the careful application of theory can save huge amounts of time
> and effort. Given things such as the lack of any decent type theory
> or the addition of terrible ideas like NULLs into SQL, I'm inclined
> to think the latter is more significant than the former.

I'd tend to agree with that.

One issue I'd take is with the notion of the *forcible* importance of
type theory. That's certainly the sort of thing that falls out of a
focus on type-oriented systems like ML, which extend the importance of
explicit typing as is typical in the spectrum of computer languages
like FORTRAN, PL/I, *descendants* of the typeless BCPL like C, C++,
and Java, and Pascal descendants like Ada.

In contrast, there are also a lively set of languages that eschew
strong typing, like, well, in the ancient past, BCPL, Tcl, and Perl.
And lively sets of languages that have strong typing, but which mostly
eschew type annotations, like Scheme, Common Lisp. I'm not certain
how to classify Smalltalk...

At any rate, there's enough diversity there that I don't think I can
go along with type theory being entirely essential...

>> The problem is that it is difficult in the extreme to build a data
>> store of whatever size desired, that can have some arbitrarily
>> huge number of people changing the data in it, and that will
>> provide the answer to any conceivable query - as if the data store
>> were to be frozen until the query is done.
>
> The DBMS has to
> - ensure data integrity
> - ensure data availability
> - protect against hardware failure
> - manage distribution
> - manage concurrent access
> - optimize *dynamically* for *multiple* applications.
>
> There is no way it makes sense to implement each of these aspects in
> every new application. Implementing any one of them well is a huge
> undertaking.
>
>> Every time you put an index in, or some other cute little wrinkle
>> to more cleverly do this, you are arguably de-normalising your
>> database. Well, you are storing data in multiple places, anyway.
>
> If you bugger up your model ("denormalise" it) you should expect
> trouble. A good DBMS should allow the DBA to suggest
> optimizations, but the DBMS should be responsible for
> implementing those optimizations, which should not affect the
> model in any way.

And storing index key values in multiple places is *not* an addition
of "unnecessary redundancy."

>> The idea of having a horrendously complex physical
>> implementation - in order to provide the appearance of a clear
>> logical model - is uncomfortable to me. I question, not the
>> Relational Model, but whether implementing this aspect of it
>> in this way is worth the trouble.
>
> As someone else said, the same could be said of compilers for
> high level languages. But as I said above, there are things
> that you Just Have to Have in a DBMS, and it's better to get
> them right just once, in one place: the DBMS, not every
> application.

Indeed.
--
output = ("cbbrowne" "@" "gmail.com")
http://linuxdatabases.info/info/spreadsheets.html
"Avoid the Gates of Hell. Use Linux" -- Unknown source

Jay Dee

Apr 21, 2006, 3:31:38 AM

True,

> There is a conspicuous disconnect from Darwen/Date, there, in that
> they trumpet loudly about strong data typing, whilst Prolog tends to
> be nearly type-free. Mind you, I'm conflating representation and
> model there, a bit...

true,

>
>>>There is a mindset about the Relational Model that is disturbing.
>>>The point of view that says that there is no TRULY Relational DBMS
>>>because of incompetence or wickedness on the part of the SQL DBMS
>>>providers is just outright wrong.
>>
>>My gut feeling is that it's partly to do with poor early choices
>>having become the standard and partly to do with the fact that not
>>many people finish a CS degree with any understanding of theory or
>>how the careful application of theory can save huge amounts of time
>>and effort. Given things such as the lack of any decent type theory
>>or the addition of terrible ideas like NULLs into SQL, I'm inclined
>>to think the latter is more significant than the former.
>
>
> I'd tend to agree with that.
>
> One issue I'd take is with the notion of the *forcible* importance of
> type theory. That's certainly the sort of thing that falls out of a
> focus on type-oriented systems like ML, which extend the importance of
> explicit typing as is typical in the spectrum of computer languages
> like FORTRAN, PL/I, *descendants* of the typeless BCPL like C, C++,
> and Java, and Pascal descendants like Ada.
>
> In contrast, there are also a lively set of languages that eschew
> strong typing, like, well, in the ancient past, BCPL, Tcl, and Perl.
> And lively sets of languages that have strong typing, but which mostly
> eschew type annotations, like Scheme, Common Lisp. I'm not certain
> how to classify Smalltalk...

true,

> At any rate, there's enough diversity there that I don't think I can
> go along with type theory being entirely essential...

and true -- depending on your system's intended purpose.

[Here comes the rub.]

The "knowledge" DBMSs, which store facts and functional
dependencies explicitly and independently, behave very
differently than most "relational" DBMSs. I don't want to
spend too much time on this point, but let me simply say
that a KDBMS can be expected to expunge everything which is
inconsistent with the most recently presented knowledge.
Most DBMSs that posters to this group are familiar with
would expect the system to reject such inconsistent data -
usually because some constraint is violated - rather than
view it as better knowledge.

This, I feel, is an essential distinction between the two
worlds -- and one that has parallels when the discussion
moves to "strongly typed" v. "typeless" languages.

If languages are arranged along a continuum extending
from "machine oriented" to "problem oriented," we should
have little trouble recognizing that those on the machine
oriented end have to be strongly typed and that those types
must directly correlate to the hardware. On the other
end: it depends -- and the decision involves a trade-off
between flexibility and predictability. In the case of
the in-between languages - like C++, for instance - it
is entirely possible that you can't predict exactly which
class will provide the methods that a pure virtual class
needs to instantiate and operate on objects. As long
as everything's working well, everything works well.
But if something "unexpected" crops up: all bets are off.

Date and Darwen concern themselves with systems which
exhibit completely predictable behavior. The relational
model they describe is completely silent with regard to
"other" data types, other than prescribing that the
system provide mechanisms for the user to describe,
store, and operate on data of other types. What are
those other types? Everything which isn't a truth value,
a tuple value, or a relation value.

They described a set of operators which operate on those
relational values with completely predictable results.
Beyond that, they have described a system for other types
of data and operations on those types which are also
completely predictable.

But they are very careful to say that their type system
is orthogonal to the relational model. I don't think it's
correct to say that Date and Darwen advocate strong typing
because of the relational model; I think they see their
type system as an appropriate adjunct to the relational
model and they deem it so because of the reliability their
technique provides.

ralph...@gmail.com

Apr 21, 2006, 4:04:58 AM
Jay Dee wrote:

> Christopher Browne wrote:
> > There is a conspicuous disconnect from Darwen/Date, there, in that
> > they trumpet loudly about strong data typing, whilst Prolog tends to
> > be nearly type-free. Mind you, I'm conflating representation and
> > model there, a bit...

Prolog can hardly be held up either as a model declarative
language or as a decent software engineering tool.

> If languages are arranged along a continuum extending
> from "machine oriented" to "problem oriented," we should
> have little trouble recognizing that those on the machine
> oriented end have to be strongly typed and that those types
> must directly correlate to the hardware.

That's completely the wrong way around: to the hardware,
it's pretty much all just bit patterns. When writing
applications, I want a strong, expressive, statically typed
language to detect as many bugs in my code as possible
*before* it ever gets to run.

(In my experience, when people complain about the discipline
imposed by a strong, static type system, they are really
complaining about the compiler refusing to generate a binary
for a provably broken program.)

That said, the only operations data types in an RDBMS need
to support are equality, comparison, and (maybe) hashing.

But all this is getting off topic.

-- Ralph

Bob Badour

Apr 21, 2006, 9:18:35 AM
Jay Dee wrote:

So, one careless mistake can wipe out the entire database?!? Yikes!!!

Jay Dee

Apr 23, 2006, 12:15:29 AM

Christopher Browne wrote:

If you look at the ACM TODS (Transactions on Database Systems), a
goodly number of the papers present views of relational systems in a
fashion that looks *way* more like Prolog than anything else.

***


There is a conspicuous disconnect from Darwen/Date, there, in that
they trumpet loudly about strong data typing, whilst Prolog tends to
be nearly type-free. Mind you, I'm conflating representation and
model there, a bit...


Jay Dee wrote:

The "knowledge" DBMSs, which store facts and functional
dependencies explicitly and independently, behave very
differently than most "relational" DBMSs. I don't want to
spend too much time on this point, but let me simply say
that a KDBMS can be expected to expunge everything which is
inconsistent with the most recently presented knowledge.
Most DBMSs that posters to this group are familiar with
would expect the system to reject such inconsistent data -
usually because some constraint is violated - rather than
view it as better knowledge.


Bob Badour wrote:

So, one careless mistake can wipe out the entire database?!? Yikes!!!

-----

Well, yes -- but it would be considered an education.

In other words, KDBs presume that tuples presented for storage are true;
if something "new" (inconsistent) shows up all the "old" knowledge is
expected to be discarded.

As I tried to point out: this is significantly different from the sorts
of things Date and Darwen are working on, and different languages "work
better."

Pickie

Apr 23, 2006, 7:57:02 PM
Maybe I ranged too widely in my post.

RE Pick's usefulness. I think Pick is in fact a very good DBMS
constructor kit. The 'rare expert skills of a dbms developer' are not
too much different to having an understanding of the Relational Model.
Even with the fairly grunty SQL DBMS toolsets there seem to be a lot of
crap design decisions, so maybe forcing people to learn a few
disciplines is good?

RE The term 'model'. Maybe if I state that capitalised 'Model', as in
'Relational Model', refers to formal mathematical Model; while lower
case 'model' is a generic, plastic term. Even using the most primitive
file mechanisms and Fortran or Cobol, surely we are trying to 'model'
some aspect of the real world.

RE "See Dijkstra's comments on the illusion of power." Did I make any
case for a magic elixir? Or did I want to get rid of the programmer?
I just don't see the relevance of the reference. It's usually the SQL
DBA who fears programmers, warns how badly they act around data, and
wants to get rid of them.

RE Incompetence, wickedness/profit motive, etc. Date & Pascal were
plugging a TRULY RDBMS a while ago... seems to have died the proverbial
(Dataphor? Alphapro? something like that). Maybe there's just not
enough brilliance in the world to do this job (building a relational
DBMS). I don't see how you can say this is easier than building a
non-relational DBMS unless you are saying that one cannot build any
sort of DBMS at all! I do this all the time with my Pick multivalue
list-oriented database constructor kit!

RE Redundancy in Indexes. Even Codd's paper mentions this. I wasn't
making a strong case, but I have seen the argument made that a
relational database is very non-redundant. Obviously this is
implementation-specific. Actually, the storage structure used in Pick
is very compact. Used correctly, it is more compact than you would
think from looking at the logical model. Mind you, the phrase "used
correctly" can be a real killer sometimes.

RE Complexity. I was unclear. I was questioning the requirement that
any and every query must be carried out in such a way that the database
appears to be static. As I understand it, Oracle uses the undo logs to
do this and I can only imagine how complex a task it is. Moreover,
apparently there is a possibility that the logs aren't big enough to
cope with a query, and I cannot imagine what happens then. It's
obvious that some data is highly time-sensitive and other data is not
very time-sensitive at all. Bank transactions would be a good example
of the former. It seems to me that timestamping data as and when
necessary is a lot simpler than providing this capability everywhere.

RE References. I had hoped to avoid "Buy a book by Chris Date" as
being the answer to my request. From the look of it, there are a few
of his in this series - would I get the right ones? Surely someone can
point to an open-source web resource!

Thanks for the comments, Bob.

Pickie

Apr 23, 2006, 8:37:03 PM
Hi Ralph,

I just don't understand how you could read my little article and think
I got those two points wrong.

I didn't say the RM was based on tables. I said 'The method used is
formal, mathematical, and uses data which is in a tabular form.' When
the basic concept is rows and columns, I think that's a fair statement.
Maybe I was wrong to say "The tabular form was inspired by data
storage structures where data was held in tables of fixed length
fields." However, if Codd did or did not draw inspiration from the
source I thought he had - so what?


I also didn't say that the RM said anything about how the data was
stored - I said this

"...we now have three levels. They are

- The presentation layer
- The logical layer
- The storage layer

The logical layer is the core part of the Relational Model."

This seems to be exactly what you are saying!

Are you rebutting some other thing you read, or do you not read
properly, or what?


I don't think the CS degree results have anything to do with what
Oracle or IBM or Microsoft do for their design decisions. I just don't
follow your logic on this one.


Does any DBMS optimise dynamically correctly? My Systems Analysis days
are long past, but it seems to me this would be one of those really
complex problems that can never be solved.

Your statement "A good DBMS should allow the DBA to suggest
optimizations..." seems more than a little anthropomorphic, don't you
think?

ralph...@gmail.com

Apr 23, 2006, 10:02:51 PM
Pickie wrote:
> I didn't say the RM was based on tables. I said 'The method used is
> formal, mathematical, and uses data which is in a tabular form.' When
> the basic concept is rows and columns, I think that's a fair statement.
> Maybe I was wrong to say "The tabular form was inspired by data
> storage structures where data was held in tables of fixed length
> fields."

The latter is the statement I was mainly taking issue with.
The RM notion of a table has nothing to do with storage
structures or fixed length fields: it has nothing to do with
representation at all.

Sorry if I misunderstood you, but together with the earlier
statement, "The method used is formal, mathematical, and
uses data which is in a tabular form." suggested strongly to
me that you were thinking in terms of a representation
rather than the abstraction.

> I also didn't say that the RM said anything about how the data was
> stored - I said this
>
> "...we now have three levels. They are
>
> - The presentation layer
> - The logical layer
> - The storage layer
>
> The logical layer is the core part of the Relational Model."
>
> This seems to be exactly what you are saying!

I beg your pardon: it seemed to me that you were saying the
presentation and storage layers are also part of the RM. I
got this impression from the full introductory sentence, "As
the overall Relational model has been refined over the
years, we now have three levels."

> I don't think the CS degree results have anything to do with what
> Oracle or IBM or Microsoft do for their design decisions. I just don't
> follow your logic on this one.

My experience is that few people have a good grasp of CS
theory and consequently, IMHO, often end up being poor
designers. This often causes real problems where great
ideas end up with terrible solutions (e.g., great idea: a
standardised data exchange format, terrible solution: XML;
great idea: representing absent information in a database,
terrible solution: NULLs in SQL).

> Does any DBMS optimise dynamically correctly? My Systems Analysis days
> are long past, but it seems to me this would be one of those really
> complex problems that can never be solved.

I would expect Oracle etc. to provide good performance
measurement tools and the means to change the representation
of a live database as appropriate.

Similarly, I would imagine they would put a great deal of
effort into query optimization.

I'm no expert on the technology of the day, but I'm sure
other posters here could give you an informed answer.

> Your statement "A good DBMS should allow the DBA to suggest
> optimizations..." seems more than a little anthropomorphic, don't you
> think?

Do you think so? The declarative programming language I
work on supports source code pragmas that the compiler can
ignore, implement, or use as hints for other optimisations.
A pragma, then, is just a suggestion by the programmer for
the compiler.

-- Ralph

Bob Badour

Apr 23, 2006, 11:17:18 PM
Pickie wrote:

> Maybe I ranged too widely in my post.
>
> RE Pick's usefulness. I think Pick is in fact a very good DBMS
> constructor kit. The 'rare expert skills of a dbms developer' are not
> too much different to having an understanding of the Relational Model.

I disagree. Anyone who can understand highschool-level set theory can
use the relational model effectively. Only those writing the dbms need
to have the rare expert skills of a dbms developer. Only those analysing
the requirements and designing the schema need to have those rare and
expert skills. Only those establishing the physical structure to meet
the performance requirements of disparate applications need to have
those rare and expert skills.

But as I said, anyone who can understand highschool-level set theory can
use the relational model.


> Even with the fairly grunty SQL DBMS toolsets there seem to be a lot of
> crap design decisions, so maybe forcing people to learn a few
> disciplines is good?

Forcing casual users to have the expertise of designers and analysts is
a disaster. Forcing them to have the expertise of dbms implementers is
too stupid for words.


> RE The term 'model'. Maybe if I state that capitalised 'Model', as in
> 'Relational Model', refers to formal mathematical Model; while lower
> case 'model' is a generic, plastic term. Even using the most primitive
> file mechanisms and Fortran or Cobol, surely we are trying to 'model'
> some aspect of the real world.

If by real world you mean a winchester drive with multiple read/write
heads on parallel arms, I suppose so. However, we use higher level
abstractions to shield casual users from such gobbledygook.


> RE "See Dijkstra's comments on the illusion of power." Did I make any
> case for a magic elixir? Or did I want to get rid of the programmer?

You don't have to. Pick exhibits exactly the sort of imprecision
Dijkstra refers to which is the root of the illusion of power.


> I just don't see the relevance of the reference.

Since you are a pickie, that doesn't surprise me in the least.


> RE Incompetence, wickedness/profit motive, etc. Date & Pascal were
> plugging a TRULY RDBMS a while ago... seems to have died the proverbial
> (Dataphor? Alphapro? something like that). Maybe there's just not
> enough brilliance in the world to do this job (building a relational
> DBMS).

You are an idiot. If I built the perfect dbms tomorrow for $15, I would
have to spend billions re-educating the market, and even then, plenty of
legacy systems would remain.

Alphora remains in business, and I hope they do so for a very long time
indeed.


> I don't see how you can say this is easier than building a
> non-relational DBMS unless you are saying that one cannot build any
> sort of DBMS at all!

Your lack of vision is common among pickies. I am convinced the tool
damages your brains. First, one has to contemplate building a dbms. Pick
is not one of those. If we look at dbmses that do exist, like oracle or
sybase, they have to put in a whole bunch of useless shit to deal with
their own lack of logical identity and the horrors of null.

Implementing those dbmses would be a lot easier without all that stupid
non-relational shit.


> I do this all the time with my Pick multivalue
> list-oriented database constructor kit!

I doubt, very much, that you have the capability to build a dbms with
any kit. Just because you have a file processor and one could
conceivably build a dbms around the file processor doesn't mean you have
ever succeeded in doing so.


> RE Redundancy in Indexes. Even Codd's paper mentions this. I wasn't
> making a strong case, but I have seen the argument made that a
> relational database is very non-redundant.

Not from me you haven't. I doubt you understood whatever argument you read.


> RE Complexity. I was unclear. I was questioning the requirement that
> any and every query must be carried out in such a way that the database
> appears to be static.

So? Question the requirement. That's not part of the RM. While every
dbms must have concern for consistency, that concern is separate from
and orthogonal to the RM.


> RE References. I had hoped to avoid "Buy a book by Chris Date" as
> being the answer to my request. From the look of it, there are a few
> of his in this series - would I get the right ones? Surely someone can
> point to an open-source web resource!

Don't sweat the book. You wouldn't understand it in any case. Your mind
has been too badly mangled.

Pickie

Apr 24, 2006, 1:06:45 AM
Hi Bob,

RE 'rare and expert skills' A general purpose dbms is a far more
complex affair than an application-specific dbms that CAN be built with
Pick. So on this point we are not comparing apples with apples.

RE 'higher level abstractions' I was just pointing out that I was using
a non-rigorous term 'model' as opposed to 'Relational Model'

RE 'Dijkstra's comment' Was it the quote about magic elixir? You may
dislike Pick, but that's no reason to denigrate anyone just because
they use it. As a good workman, I can use a tool even as poor as Pick;
while others are still moaning about Oracle and Sybase.

RE Alphora. They may be in business, but are they still in the dbms
business? OK, there will be a huge legacy overhang to bring in a
better RDBMS, but I didn't expect this crashing silence given Date's
enthusiasm. He is a respected figure, after all.

RE Redundancy. Hey, I brought up the redundancy of Indexing originally.
I was clarifying my original post, not replying to an argument that
you hadn't made.

RE Complexity. I suppose you could make the case that time-varying data
is not specifically addressed by the RM. That's a point that Date
criticises as well. Good point.

Oh by the way, stuff your insults.

David Cressey

Apr 24, 2006, 4:08:43 AM

"Pickie" <keith....@datacom.co.nz> wrote in message
news:1145836622....@j33g2000cwa.googlegroups.com...

> Maybe I ranged too widely in my post.
>
> RE Pick's usefulness. I think Pick is in fact a very good DBMS
> constructor kit. The 'rare expert skills of a dbms developer' are not
> too much different to having an understanding of the Relational Model.
> Even with the fairly grunty SQL DBMS toolsets there seem to be a lot of
> crap design decisions, so maybe forcing people to learn a few
> disciplines is good?


PMFJI. I haven't been following this thread, but the above comment caught
my eye.

In all the comments made about Pick in this forum, I've never seen it
described as a DBMS constructor kit.
I have no opinion on that, but I'm interested.

What really interests me is the idea that a DBMS should be constructed each
time an application is to be built.
The way I see it, there are three alternatives:

Buy a DBMS, and use it as a platform to build the application.
Buy a DBMS constructor kit, build a DBMS, and use it to build the
application.
Build an application on a platform that isn't a DBMS.

I'm using the word "buy" in the broadest sense, meaning something like
"obtain and accept". And, of course, I'm oversimplifying, because buying a
DBMS may or may not mean buying a programming language to go with it.

But I would strongly suggest that the level of knowledge of the RM (or,
presumably, of the MV) needed to design a good database and build a good
application is quite a bit less than the level of knowledge needed to build
a good DBMS. And the level of effort needed to build a good DBMS should be
recovered over many projects, not just one.

For that matter, building a database to support just one application strikes
me as quaint. But then, my mental set is from another era, when databases
were thought of as large hubs where entire clusters of applications came
together to collaborate by sharing data.

I agree with you about the large number of crap designs, which I call
"stupid database tricks". But I think the solution is to teach better
design skills, and develop better platforms, rather than to dismiss the
RM.

David Cressey

Apr 24, 2006, 4:15:12 AM

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:2rX2g.65027$VV4.1...@ursa-nb00s0.nbnet.nb.ca...


> But as I said, anyone who can understand highschool-level set theory can
> use the relational model.

I disagree, although my disagreement may be merely a quibble.

I would have said: "anyone who can understand highschool-level set
theory can *learn to* use the relational model."

Even with a good understanding of highschool-level set theory, there is still
some additional learning needed to design well and to implement well.


dawn

Apr 24, 2006, 6:55:47 AM

I had not seen Pick referred to as a DBMS constructor kit before
either, but I do not disagree. I call it a DBMS because there are
definitely features beyond a simple file system and I would agree with
the vendors of such products that they are database management systems.
However, I did not think it to be a dbms the first time I saw it
because it was missing so many features I expected to find. For
example, there was no talk of before and after images as I was
accustomed to with IMS. There was no prescriptive schema, only
descriptive. No one builds these features into Pick when they use
their "constructor kit".

They do typically build CRUD services where such things as referential
integrity are placed. And, yes, it does seem primitive to build in RI
when it could come with the dbms. However, when you have a philosophy
of a descriptive schema (rather than prescriptive), one of the
downsides is that you need to roll your own constraint-handling of all
kinds. The schema does not constrain.

> For that matter, building a database to support just one application strikes
> me as quaint.

Me too. I have typically worked with enterprise systems, whether Pick
or not. I do not think of it as a single-app approach at all.

> But then, my mental set is from another era, when databases
> were thought of as large hubs where entire clusters of applictions came
> together to collaborate by sharing data.

Another era, really?

> I agree with you about the large number of crap designs, which I call
> "stupid database tricks". But I think the solution is to teach better
> design skills, and develop better platforms, rather than to dismiss the
> RM.

We can do both ;-) We ought not dismiss relations, nor modeling with
relations, but we do need to go beyond the RM to include lists, for
example, which the Information Principle (of the RM) does not permit.
Cheers! --dawn

Jon Heggland

Apr 24, 2006, 8:26:50 AM
dawn wrote:

> David Cressey wrote:
>> I agree with you about the large number of crap designs, which I call
>> "stupid database tricks". But I think the solution is to teach better
>> design skills, and develop better platforms, rather than to dismiss the
>> RM.
>
> We can do both ;-) We ought not dismiss relations, nor modeling with
> relations, but we do need to go beyond the RM to include lists, for
> example, which the Information Principle (of the RM) does not permit.

I think (hope?) you mean "we do want to go beyond SQL to include lists,
for example, which SQL does not permit".
--
Jon

Bob Badour

Apr 24, 2006, 9:16:09 AM

I used "can" in the meaning "is capable of".

Bob Badour

Apr 24, 2006, 9:21:59 AM
Jon Heggland wrote:

She is a self-aggrandizing ignorant who hasn't a clue what she is
talking about. Whatever intellect she may have once had has been
destroyed by years of Pick use.

Ask her to point to the field of logic where anyone has ever improved
predicate logic by extending it with lists. After all, she claims to
have a background in mathematics.

dawn

Apr 24, 2006, 9:28:21 AM

Well, no, I was sticking to the Information Principle for RM. Are you
suggesting that attributes whose values are lists are now acceptable
within the RM?

If so, then some of the big issues I originally had with the RM,
including 1NF (normalization per Codd), 3VL (Codd's 3rd Rule), and lack
of support for lists would all now be accepted as if they were the RM.
If the IP is gone, then I'm in (there are still typing and constraints
issues, but the RM takes a reasonable approach to those even if I can
also see the value in other strategies).

Maybe the RM has been redefined so as to dump much of what has been
taught as the RM and much of what has been implemented as the RM. Then
what's left is to get the word out. I think the easiest way to do that
would be to say that the RM is no longer the model -- let's give it a
new name. It is much harder to get the word out that something has
changed so drastically as to no longer be what it was and what people
think it is if the terms and rules keep getting redefined. Changing
1NF to mean something altogether different (or nothing at all) does not
help make it clear that the industry is moving on, for example.

So, let me know if the IP is no longer applicable and where there are
any implementations of this new approach. Thanks. --dawn

Jan Hidders

Apr 24, 2006, 10:02:18 AM

dawn wrote:.

>
> So, let me know if the IP is no longer applicable and where there are
> any implementations of this new approach.

Come on, Dawn, this has already been discussed to death in this
newsgroup. The IP only says that all the information should be
represented in relations; what is in the columns of those relations it
does not speak about, so there could very well be lists there as far as
the IP is concerned.
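
To illustrate (a toy sketch in Python rather than any particular SQL
product; the relation and attribute names are invented): the list is
just another attribute value inside a tuple inside a relation.

# All information is still carried as attribute values within tuples
# within relations; one attribute's value just happens to be a list
# (a Python tuple here, so the row stays hashable).
def row(**columns):
    return frozenset(columns.items())

orders = {
    row(order_id=1, customer="Acme", items=("bolt", "nut", "washer")),
    row(order_id=2, customer="Widget", items=("gear",)),
}

for r in sorted(orders, key=lambda r: dict(r)["order_id"]):
    cols = dict(r)
    print(cols["order_id"], list(cols["items"]))   # the list-valued attribute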

-- Jan Hidders

dawn

Apr 24, 2006, 10:18:18 AM

And lists as attribute values are implemented by whom, where? If you
are saying that every site can implement it, including making any
relevant changes to the query language, has anyone done that? --dawn

Jan Hidders

Apr 24, 2006, 11:21:38 AM

dawn wrote:
> Jan Hidders wrote:
> > dawn wrote:.
> > >
> > > So, let me know if the IP is no longer applicable and where there are
> > > any implementations of this new approach.
> >
> > Come on, Dawn, this has already been discussed to death in this
> > newsgroup. The IP only say that all the information should be
> > represented in relation, what is in the columns of those relations it
> > does not speak about, so there could be very well lists there as far as
> > the IP is concerned.
>
> And lists as attribute values is implemented by whom, where?

Er, Dawn, quick follow-up questions are not going to hide the fact that
you yet again managed to misunderstand the exact meaning of one of the
most fundamental principles of the relational model.

> If you
> are saying that every site can implement it, including making any
> relevant changes to the query language, has anyone done that?

Yes. User-defined types (UDTs) were already added in SQL:1999. See for
example on

http://www.wiscorp.com/SQLStandards.html

under the, for you perhaps, intriguing title: "Great News, The
Relational Model is Dead"

UDTs (or something similar) are possible in Sybase, DB2, Oracle, SQL
server, PostgreSQL and probably more I cannot think of right now. Come
to think of it, a list-of-something datatype can of course also be
defined easily in Alphora's Dataphor.

-- Jan Hidders

Jan Hidders

Apr 24, 2006, 11:47:51 AM

Jan Hidders wrote:
> dawn wrote:
> > If you
> > are saying that every site can implement it, including making any
> > relevant changes to the query language, has anyone done that?
>
> Yes. User-defined types (UDTs) were already added in SQL:1999.

Silly me. More to the point is of course the ARRAY type in SQL:1999.
For an example and short explanation see:

http://www.cl.cam.ac.uk/Teaching/2003/Databases/sql1999.pdf

There they are still bounded; in SQL:2003 they are also allowed to be
unbounded.

-- Jan Hidders

dawn

Apr 24, 2006, 12:24:02 PM
Jan Hidders wrote:
> dawn wrote:
> > Jan Hidders wrote:
> > > dawn wrote:.
> > > >
> > > > So, let me know if the IP is no longer applicable and where there are
> > > > any implementations of this new approach.
> > >
> > > Come on, Dawn, this has already been discussed to death in this
> > > newsgroup. The IP only says that all the information should be
> > > represented in relations; what is in the columns of those relations it
> > > does not speak about, so there could very well be lists there as far as
> > > the IP is concerned.
> >
> > And lists as attribute values are implemented by whom, where?
>
> Er, Dawn, quick follow-up questions are not going to hide the fact that
> you yet again managed to misunderstand the exact meaning of one of the
> most fundamental principles of the relational model.

OK, I really do want to understand and appreciate any patience you can
muster. Date quoting Codd says

"The Information Principle (which I heard Ted refer to on occasion as
the fundamental principle underlying the relational model)...

'The entire information content of a relational database is
represented in one and only one way: namely, as attribute values within
tuples within relations.'"

Historically there have additionally been statements about these values
being scalar values or atomic values, but I will set that aside and
figure the RM has evolved beyond that, even if current usage (SQL-92 is
where database independent SQL seems to be sitting) lags behind.

By this description, these attribute values could be of any type,
including a user-defined type that is a list. Implementations and even
dreamed-about implementations, to my knowledge, have the DBMS and
related languages recognize only relations and scalar values of defined
types (of varying primitive types), permitting site-specific
definitions of user-defined types.

Are you suggesting that there have been implementations where list
attributes have been defined (as UDTs) in such a way that the query
language understands and properly handles values that are lists
(ordered)? When defining this type, is it easy enough to implement it
so that ALL and EVERY (or whatever) will function properly from SQL
against these lists (not just against RVAs)?

I have not tried writing any UDF's so my guesses might be wrong -- I
anticipated that the type definitions used something along the lines of
a toString() for display, along with possibly other functions to
extract values such as List[2] but that grouping and ungrouping within
SQL would only be for RVA's and no other collection types.

I very much appreciate you helping me get this right.

> > If you
> > are saying that every site can implement it, including making any
> > relevant changes to the query language, has anyone done that?
>
> Yes. User-defined types (UDTs) were already added in SQL:1999. See for
> example on
>
> http://www.wiscorp.com/SQLStandards.html
>
> under the, for you perhaps, intriguing title: "Great News, The
> Relational Model is Dead"

[There's a clue! I do understand that once the RM is dead, then we can
get the job done ;-) And if we redefine it so that much of what it
stood for is removed, that would work too, although likely harder to
get the industry to move beyond the early definitions.]

> UDTs (or something similar) are possible in Sybase, DB2, Oracle, SQL
> server, PostgreSQL and probably more I cannot think of right now. Come
> to think of it, a list-of-something datatype can of course also be
> defined easily in Alphora's Dataphor.

I read a lot of the Tutorial D information a while back, but I saw only
the grouping and ungrouping constructs for RVAs and not lists or other
collection types. Has anyone used Dataphor to define lists in such a
way that they are handled as one might expect in the query language?

It is conceivable to me, Jan, that in the inner sanctum the RM is
already dead (or altered so much that 1NF, 3VL, and the IP understood
as relations and primitive scalars are all gone), in which case I'm
only adding my voice to what is already known by theorists. That would
be great as it would much better align what I have seen as best
practices with the theory.

My goals would still include getting the word out about this fatality.
I doubt many undergraduate courses teach that 1NF, as we knew it, is
dead, for example. I sure don't see that knowledge having made it into
the practitioners "common knowledge" as yet. I see little movement
along these lines in industry other than various struggles with
persisting XML. If you do know of an implementation of lists that
aligns with current relational theory (which was once described as "no
lists" or "no repeating groups"), I'm very interested.

As always, thanks for your help. --dawn

Tony Andrews

unread,
Apr 24, 2006, 1:02:35 PM4/24/06
to
dawn wrote:
> 'The entire information content of a relational database is
> represented in one and only one way: namely, as attribute values within
> tuples within relations.'"
<SNIP>

> Are you suggesting that there have been implementations where list
> attributes have been defined (as UDTs) in such a way that the query
> language understands and properly handles values that are lists
> (ordered)? When defining this type, is it easy enough to implement it
> so that ALL and EVERY (or whatever) will function properly from SQL
> against these lists (not just against RVAs)?

Why would the query language have to understand lists? The whole point
of UDTs is that they are User Defined, which means that their
implementation and meaning is not known to the query language. The
definer of the UDT would define "operators" (aka "methods") to do the
things that can be done with values of that UDT, and you could use
those operators in your queries. Lists are just one of a squillion
UDTs that could be defined, and surely you don't expect the DBMS's
built-in query language to "understand" them all?
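
For illustration only, a rough SQL:1999-flavoured sketch of that idea --
the type, table and method names are all invented, the method body would
be supplied separately, and real products spell the details differently:

    -- a user-defined type with one user-defined operator (method)
    CREATE TYPE topping_list AS (csv VARCHAR(500)) NOT FINAL
        METHOD contains(t VARCHAR(40)) RETURNS BOOLEAN;

    -- queries then use the operator, not the representation
    SELECT p.pizza_id
    FROM   pizza AS p
    WHERE  (p.toppings).contains('onions');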

dawn

unread,
Apr 24, 2006, 1:47:29 PM4/24/06
to
Tony Andrews wrote:
> dawn wrote:
> > 'The entire information content of a relational database is
> > represented in one and only one way: namely, as attribute values within
> > tuples within relations.'"
> <SNIP>
> > Are you suggesting that there have been implementations where list
> > attributes have been defined (as UDTs) in such a way that the query
> > language understands and properly handles values that are lists
> > (ordered)? When defining this type, is it easy enough to implement it
> > so that ALL and EVERY (or whatever) will function properly from SQL
> > against these lists (not just against RVAs)?
>
> Why would the query language have to understand lists?

To make things easier (less expensive) for users (developers and
end-users)

> The whole point
> of UDTs is that they are User Defined, which means that their
> implementation and meaning is not known to the query language.

That is how I understood it too.

>The
> definer of the UDT would define "operators" (aka "methods") to do the
> things that can be done with values of that UDT, and you could use
> those operators in your queries. Lists are just one of a squillion
> UDTs that could be defined, and surely you don't expect the DBMSs
> built-in query language to "understand" them all?

Nope, just lists, lists of lists (2D arrays), sets, and scalars. --dawn

dawn

unread,
Apr 24, 2006, 2:06:29 PM4/24/06
to

Cool. We would surely want variable length (and the ability to ask the
length). I still don't see where you can work with these arrays as one
(I, at least) would want to, with grouping and ungrouping, for example.
I would want to be able to ask for all students with every major not
equal to 'PHIL' for example, even if we opted to implement majors in a
list. I would hope there is some way to do that with the functions
provided, but my guess (without being able to find examples right off)
is that the SQL might not be really sweet.
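
What I would hope for is something roughly this simple (sketched in
PostgreSQL-flavoured syntax, with a made-up student table and an
array-valued majors column):

    -- students none of whose majors is 'PHIL'
    SELECT s.student_id
    FROM   student AS s
    WHERE  'PHIL' <> ALL (s.majors);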

OK, so let's say that relational theory has caught up with Pick ;-) in
these ways (NF2, 2VL, and LVAs). Where is this implemented? How can
I get access to this data from any reporting or BI tools?

This is very helpful. Thanks. --dawn

David Cressey

unread,
Apr 24, 2006, 5:27:52 PM4/24/06
to

<ralph...@gmail.com> wrote in message
news:1145510297....@j33g2000cwa.googlegroups.com...
> A number of people on this group are proponents of
> Pick (MultiValue) DBMS. I've been trying to find a
> definition for MultiValue to give me a better handle
> on the arguments MV types often advance against
> the relational model.
>
> As I understand it, an MV database is a collection
> of files, a file is a collection of records, records in
> a file all have the same structure, a record is indexed
> by a unique key, a record is a collection of fields, a
> field is a collection of (atomic?) values.

The best description of the data structure in a Pick file was given a few
months ago, in here, by DonR, IIRC. By going to Google groups, you may be
able to dredge it up. Or DonR, or the actual author, may be willing to
give it again.

If you are familiar with the basics of data manipulation in files in the era
before database management systems, much of Pick will be familiar to you.
My understanding, which comes from reading the description I've already
mentioned, agrees for the most part with what you wrote.

Pick data is stored in files. Files are made up of records. Records are
set up for direct access. As I understand it, records are accessed by
number, not by key. In any event, translating a key to a number is a small
problem, probably offered as a service inside Pick. Records are made up of
ASCII characters. Certain special characters are used to separate fields
within records, and sub-fields within fields, and values within
sub-fields.

So far this is just a rather simple minded specific case of a hierarchy.
Nothing to offer scathing criticism about, but nothing to wax lyrical about
either.

Where it starts to get interesting is in the following: many sub-fields
consist of only one value, but this can be for one of two reasons: The
first is that the context is such that multiple values would be meaningless.

The second is a case of a list with only one element in the list. Thus a
pizza with onions on it will have only one value, "onions" in the
appropriate place. A pizza with onions and mushrooms on it will have two
values in the same place, separated by whatever the separator is.

This raises two questions in my mind:

First, how does one disambiguate between a list consisting of only one
entry, and the entry itself. In other words how does one disambiguate
between "onions" and "(onions)" since they both have the same
representation in Pick.

The second question for me is how one distinguishes between a list used as a
poor man's representation of a set, and a list used as a list, where
placement in the list is supposed to carry information.

Thus the question is whether the list "(onions, mushrooms)" conveys the
same information or different information than the list "(mushrooms,
onions)".

The answer I keep getting from Pickies is (after I've stripped away the
veneer) seems to be: "It's all in the mind of the programmer! Isn't that
wonderful!"

My response is that it's not wonderful. The whole reason I migrated from
files to databases was to get away from data whose description was buried in
some other programmer's mind. If you think that keeping the ultimate key to
decoding the data ought to be in the mind of the programmer, then I think
you should stay away from databases. Either that or hang a suitable warning
sign in front of any databases you have built.


>

> If that is correct, it seems to me that MV is an
> implementation technology and the RM is a logical
> formalism and that to compare the two is to compare
> apples and oranges.
>
> That said, debate on the topic still goes on and on
> in this group, so I assume I have failed to grasp
> something important about MV. Is there a clear,
> *concise* explanation somewhere of (a) a formal
> (preferably set theoretic) model of MV, and (b)
> how integrity constraints are expressed and enforced
> in an MV database?

Once again, "it's all in the mind of the programmer". Pick programmers are
supposed to be so smart that they never create rogue programs or unintended
data. I'll believe it when I see it.


David Cressey

unread,
Apr 24, 2006, 5:38:33 PM4/24/06
to

"dawn" <dawnwo...@gmail.com> wrote in message
news:1145876147.9...@j33g2000cwa.googlegroups.com...

> > I agree with you about the large number of crap designs, which I call
> > "stupid database tricks". But I think the solution is to teach better
> > design skills, and develop better platforms, rather than to dismiss the
> > RM.
>
> We can do both ;-) We ought not dismiss relations, nor modeling with
> relations, but we do need to go beyond the RM to include lists, for
> example, which the Information Principle (of the RM) does not permit.

This is a far cry from your earlier rants, where you basically said, "Now
that Ed Codd is dead, we can bury him!" and, by implication, his life's work
along with him. "And now we can stop normalizing data."

A good data architect, whether a programmer or not, will know how to
normalize data, and will also know something about when to normalize data,
based on the probable consequences of a normalized design, and the probable
consequences of a different design. The same goes for how far to carry
normalization.

Your idea that 1NF data is difficult to create and difficult to deal with
is one that has eluded me for as long as I've been reading what you write in
here. I'm wondering why I never ran into such difficulties when I built
databases. It seemed pretty straightforward to me. In other words, you
seem to be spending an enormous amount of energy addressing a problem that I
see as either minor or non-existent.


dawn

unread,
Apr 24, 2006, 6:05:19 PM4/24/06
to
David Cressey wrote:
> <ralph...@gmail.com> wrote in message
> news:1145510297....@j33g2000cwa.googlegroups.com...
> > A number of people on this group are proponents of
> > Pick (MultiValue) DBMS. I've been trying to find a
> > definition for MultiValue to give me a better handle
> > on the arguments MV types often advance against
> > the relational model.
> >
> > As I understand it, an MV database is a collection
> > of files, a file is a collection of records, records in
> > a file all have the same structure, a record is indexed
> > by a unique key, a record is a collection of fields, a
> > field is a collection of (atomic?) values.
>
> The best description of the data structure in a Pick file was given a few
> months ago, in here, by DonR, IIRC. By going to Google groups, you may be
> able to dredge it up. Or DonR, or the actual author, may be willing to
> give it again.
>
> If you are familiar with the basics of data manipulation in files in the era
> before database managment systems, much of Pick will be familiar to you.
> My understanding, which comes from reading the description I've already
> mentioned, agrees for the most part with what you wrote.
>
> Pick data is stored in files. Files are made up of records. Records are
> set up for direct access. As I understand it, records are accessed by
> number, not by key.

It is accessed by a key. Each file has a single primary key which may
be a multi-part key (which would be similar to a composite/compound
key).

> In any event, translating a key to a number is a small
> problem, probably offered as a service inside Pick.

Scratch the above statement.

> Records are made up of
> ASCII characters. Certain special characters are used to separate fields
> within records, and sub-fields within fields, and values within
> sub-fields.

Yes, record marks, field marks, value marks, sub-value marks are
characters such as ^253

> So far this is just a rather simple minded specific case of a hierarchy.

In some sense it is. However, having worked with VSAM files and
related applications, it is really very different. For example, a file
of source or object code is a record in a Pick file. Dictionaries are
queried just like files are. Virtual fields are used extensively.

> Nothing to offer scathing criticism about, but nothing to wax lyrical about
> either.

I understand that as an outside perspective. That was mine for the
first few months after joining a shop where I had developers working
with it.

> Where it starts to get interesting is in the following: many sub-fields
> consist of only one value, but this can be for one of two reasons: The
> first is that the context is such that multiple values would be meaningless.
>
> The second is a case of a list with only one element in the list. Thus a
> pizza with onions on it will have only one value, "onions" in the
> appropriate place.

This actually describes values rather than sub-values

> A pizza with onions and mushrooms on it will have two
> values in the same place, separated by whatever the separator is.

Yes.

> This raises two questions in my mind:
>
> First, how does one disambiguate between a list consisting of only one
> entry, and the entry itself. In other words how does one disambiguate
> between "onions" and "(onions)" since they both have the same
> representation in Pick.

Because the "schema" is descriptive rather than prescriptive, it
depends on your dictionary, how you define the attribute, as to whether
it is considered single or multi-valued. If it is multi-valued with
exactly one value, it is stored the same and looks the same at the
logical level as a single-valued attribute.

> The second question for me is how one distinguishes between a list used as a
> poor man's representation of a set, and a list used as a list, where
> placement in the list is supposed to carry information.

That's a good question and I think we have discussed this one before.
The only hints in the vanilla dictionary combined with the data are in
the name, description, and values. Developers have complete control on
whether they treat it as a list, a bag or a set and they do not
advertise this anywhere other than the code.

Many people add attributes to the dictionary, but even then I suspect
that few have found a need to specify whether the attribute really
should be ordered or not. I'm certain there can be misunderstandings,
but I don't even have anecdotes of this lack of metadata causing
significant problems for an organization.

> Thus the question is whether the list "(onions, mushrooms)" conveys the
> same information or different information than the list "(mushrooms,
> onions)".

Because the system is so language based, this concern is much like a
concern for any list that a person might give in a sentence. It is
often clear whether the order was important or not. When it is not
clear, ask someone. I realize this is a less than satisfying answer,
but it is the situation.

> The answer I keep getting from Pickies is (after I've stripped away the
> veneer) seems to be: "It's all in the mind of the programmer! Isn't that
> wonderful!"

The control of whether something implemented as a list is a list, bag,
or set is addressed by developers, but the answer of which is in the
mind of the end-users too.

> My response is that it's not wonderful.

I agree that it is not tight. I wish I could come up with any time
this has botched things up. I can think of a time when a VAR inserted
items in a list in an order that was different than the customer
thought it should be. I'd be happy to have it be tighter (add a
specifier to the metadata to identify whether something is a logical
list, bag, or set), but it really doesn't seem to bubble up as a
significant issue.

> The whole reason I migrated from
> files to databases was to get away from data whose description was buried in
> some other programmer's mind.

I was happy to migrate from VSAM to IMS for the same reason. I don't
know what you could have told me that would have made me think it was a
good idea to use Pick. But having done so, it sure does seem to have
some advantages that could help the industry for the future.

> If you think that keeping the ultimate key to
> decoding the data ought to be in the mind of the programmer, then I think
> you should stay away from databases. Either that or hang a suitable warning
> sign in front of any databases you have built.

I really do understand why you say that. It is very frustrating that I
have not yet been able to explain it.

> > If that is correct, it seems to me that MV is an
> > implementation technology and the RM is a logical
> > formalism and that to compare the two is to compare
> > apples and oranges.

That would be the case except that you can take the same business
domain and implement it with either. So, in the end, they are tools
used to solve pretty much the same problems; address the same
opportunities.

> > That said, debate on the topic still goes on and on
> > in this group, so I assume I have failed to grasp
> > something important about MV. Is there a clear,
> > *concise* explanation somewhere of (a) a formal
> > (preferably set theoretic) model of MV, and (b)
> > how integrity constraints are expressed and enforced
> > in an MV database?
>
> Once again, "it's all in the mind of the programmer". Pick programmers are
> supposed to be so smart that they never create rogue programs or unintended
> data.

No, in fact you likely need more smarts to implement the same set of
requirements in just about any other environment. There is simply a
different approach to quality assurance than with a SQL-DBMS.

> I'll believe it when I see it.

I have some ideas along those lines. Someday, maybe. cheers! --dawn

David Cressey

unread,
Apr 24, 2006, 7:00:07 PM4/24/06
to

"dawn" <dawnwo...@gmail.com> wrote in message

> Because the "schema" is descriptive rather than prescriptive, it
> depends on your dictionary, how you define the attribute, as to whether

I don't understand the distinction between descriptive and prescriptive.


David Cressey

unread,
Apr 24, 2006, 7:58:22 PM4/24/06
to

"dawn" <dawnwo...@gmail.com> wrote in message
news:1145916318.9...@j33g2000cwa.googlegroups.com...
> David Cressey wrote:

> > The second question for me is how one distinguishes between a list used as a
> > poor man's representation of a set, and a list used as a list, where
> > placement in the list is supposed to carry information.
>
> That's a good question and I think we have discussed this one before.

Yes we have discussed this one before.

> The only hints in the vanilla dictionary combined with the data are in
> the name, description, and values. Developers have complete control on
> whether they treat it as a list, a bag or a set and they do not
> advertise this anywhere other than the code.

That's a problem if you are building a database where data can be shared
across databases, and where developers of different applications do not have
access to each other's code.


>
> Many people add attributes to the dictionary, but even then I suspect
> that few have found a need to specify whether the attribute really
> should be ordered or not. I'm certain there can be misunderstandings,
> but I don't even have anecdotes of this lack of metadata causing
> significant problems for an organization.


>
> > Thus the question is whether the list "(onions, mushrooms)" conveys the
> > same information or different information than the list "(mushrooms,
> > onions)".
>
> Because the system is so language based, this concern is much like a
> concern for any list that a person might give in a sentence. It is
> often clear whether the order was important or not. When it is not
> clear, ask someone. I realize this is a less than satisfying answer,
> but it is the situation.

A database isn't a substitute for e-mail or voice mail. It's a more formal,
and more predictable way of sharing data.
Lack of clarity that can quickly be resolved orally in a face to face
dialogue can be disastrous in the kinds of environments where databases are
relied upon.


>
> > The answer I keep getting from Pickies is (after I've stripped away the
> > veneer) seems to be: "It's all in the mind of the programmer! Isn't that
> > wonderful!"
>
> The control of whether something implemented as a list is a list, bag,
> or set is addressed by developers, but the answer of which is in the
> mind of the end-users too.
>
> > My response is that it's not wonderful.
>
> I agree that it is not tight. I wish I could come up with any time
> this has botched things up.

Oh, come on! This has been botched up thousands of times in the pre-database
era, by people using FORTRAN, COBOL, or what have you. If you never ran
across any of those cases, I wonder why. If Pick is somehow ineffably
different from classical programming languages, in a way that prevents this
confusion, while still allowing things to be "loose", then I would think at
least one practitioner of Pick would have been able to explain it to the
rest of the world.


>
> > The whole reason I migrated from
> > files to databases was to get away from data whose description was buried in
> > some other programmer's mind.
>
> I was happy to migrate from VSAM to IMS for the same reason. I don't
> know what you could have told me that would have made me think it was a
> good idea to use Pick. But having done so, it sure does seem to have
> some advantages that could help the industry for the future.
>
> > If you think that keeping the ultimate key to
> > decoding the data ought to be in the mind of the programmer, then I think
> > you should stay away from databases. Either that or hang a suitable warning
> > sign in front of any databases you have built.
>
> I really do understand why you say that. It is very frustrating that I
> have not yet been able to explain it.
>


Allow me to suggest, as gently and politely as I can, that your problems in
explaining it have one root cause. You are unable to explain it, after
years of trying, because it simply isn't true.

Bob Badour

unread,
Apr 24, 2006, 8:15:31 PM4/24/06
to
David Cressey wrote:

> "dawn" <dawnwo...@gmail.com> wrote in message
> news:1145916318.9...@j33g2000cwa.googlegroups.com...
>
>>David Cressey wrote:
>
>>Because the system is so language based, this concern is much like a
>>concern for any list that a person might give in a sentence. It is
>>often clear whether the order was important or not. When it is not
>>clear, ask someone. I realize this is a less than satisfying answer,
>>but it is the situation.

Ahh, the elixir. It is language based so it's like english and we don't
need all those nasty programmers. It's imprecise but pickies like it
that way. It makes their lives easier if the users have no way to
express what is wrong with the product. EWD 898


> A database isn't a substitute for e-mail or voice mail. It's a more formal,
> and more predictable way of sharing data.
> Lack of clarity that can quickly be resolved orally in a face to face
> dialogue can be disastrous in the kinds of environments where databases are
> relied upon.

What?!? It's all only an illusion?!? Say it ain't so!


>>>The answer I keep getting from Pickies is (after I've stripped away the
>>>veneer) seems to be: "It's all in the mind of the programmer! Isn't that
>>>wonderful!"
>>
>>The control of whether something implemented as a list is a list, bag,
>>or set is addressed by developers, but the answer of which is in the
>>mind of the end-users too.
>>
>>
>>>My response is that it's not wonderful.
>>
>>I agree that it is not tight. I wish I could come up with any time
>>this has botched things up.

> Oh, come on! This has been botched up thousands of times in the pre-database
> era, by people using FORTRAN, COBOL, or what have you. If you never ran
> across any of those cases, I wonder why. If Pick is somehow ineffably
> different from classical programming languages, in a way that prevents this
> confusion, while still allowing things to be "loose", then I would think at
> least one practitioner of Pick would have been able to explain it to the
> rest of the world.

In the absence of intellectual honesty, one has no hope of ever
learning. I think the exchange above shows how clearly you waste your time.

Note the immediate fall-back to imprecise terminology like "tight". She
will not venture from the comfort of her illusions or delusions as the
case may be.


>>>If you think that keeping the ultimate key to
>>>decoding the data ought to be in the mind of the programmer, then I think
>>>you should stay away from databases. Either that or hang a suitable warning
>>>sign in front of any databases you have built.
>>
>>I really do understand why you say that. It is very frustrating that I
>>have not yet been able to explain it.
>
> Allow me to suggest, as gently and politely as I can, that your problems in
> explaining it have one root cause. You are unable to explain it, after
> years of trying, because it simply isn't true.

It's hard to explain an hallucination or an axiom. What the
self-aggrandizing ignorant fails to realize is she concludes her axioms,
and her axioms are false.

dawn

unread,
Apr 24, 2006, 9:10:52 PM4/24/06
to

It doesn't matter how you describe the data in any data dictionary
(schema), your CRUD services (the C and U, especially) can put any data
into the database. The metadata does not constrain the values.

There is weak typing (close to no typing). Everything is a string
unless a binary value. Attributes are identified by position (LOC).
You can then describe the value at that position in whatever ways are
useful. These descriptions have to do with how to use or query
attribute values. You can define as many vocabulary terms for your
dictionary as you want for any particular attribute.

So, you could describe (define) an attribute as MultiValue and Pick
will treat it like a list. Then you could define that same attribute
as single valued (not done frequently), in which case you would get the
string with the value marks in it and you could hand-parse it.

You could describe LOC 5 as CUST_ID to be a single-valued string and
CUST_NUM to be a number and have these describe the same attribute. It
then becomes a form of duck typing. As long as a value can act like a
number, for example, if you treat it like a number it will play along.

Additionally, you can define LOC 4 as CAR_COLOR and then later decide
that due to new requirements and new FD's you are going to split out
CAR_COLOR into another file. Then you add CAR_ID at LOC 23, stop using
LOC 4 and redefine CAR_COLOR as a virtual field (no LOC) in this file
so that it "gets" the color from the new file. The queries all still
work because in some sense when it comes to querying, because the
dictionaries are all merely descriptive, they are all like views in
that they are "logical files" where you can redefine LOCs and attribute
names however you need to.
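
The closest SQL analogue I can think of is keeping the old "logical file"
alive as a view after the split (table and column names made up):

    -- CAR_COLOR now lives in its own table; the view re-derives it
    CREATE VIEW vehicle_file AS
    SELECT v.vehicle_id,
           v.owner_name,
           c.car_color         -- the "virtual field", fetched by join
    FROM   vehicle AS v
    JOIN   car     AS c ON c.car_id = v.car_id;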

Like SQL, Pick has miles to go, but there really are concepts worth
moving into the future in a more industry-standard way (NF2, 2VL,
nested lists, variable length everything, duck typing...). There is a
rather big chasm between duck typing and strong typing, however, and I
haven't figured that one out. Cheers! --dawn

Jon Heggland

unread,
Apr 25, 2006, 4:12:07 AM4/25/06
to
Bob Badour wrote:
> Jon Heggland wrote:
>
>> dawn wrote:
>>> We can do both ;-) We ought not dismiss relations, nor modeling with
>>> relations, but we do need to go beyond the RM to include lists, for
>>> example, which the Information Principle (of the RM) does not permit.
>>
>> I think (hope?) you mean "we do want to go beyond SQL to include lists,
>> for example, which SQL does not permit".
>
> She is a self-aggrandizing ignorant who hasn't a clue what she is
> talking about.

Thank you, but I prefer to make up my own mind about people. Which I
have in the case of Dawn, by the way; my comment was primarily an
involuntary reflex. The point shouldn't be unfamiliar to anyone who has
read (and participated) in the recent 1NF threads here, but there you
have it. :s
--
Jon

Tony Andrews

unread,
Apr 25, 2006, 5:01:27 AM4/25/06
to
dawn wrote:

> Tony Andrews wrote:
> > Why would the query language have to understand lists?
>
> To make things easier (less expensive) for users (developers and
> end-users)

Well, if lists are sufficiently useful there is no reason why support
for the list type and its operators should not come pre-built with the
DBMS - just as useful types like DATE already do. If no commecial
DBMSs currently do, perhaps that means most people don't have a need
for them. Some DBMSs (e.g. Oracle) do support arrays as columns.
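
In Oracle, for instance, the sketch would be roughly this (names invented):

    CREATE TYPE topping_list AS VARRAY(10) OF VARCHAR2(30);

    CREATE TABLE pizza (
        pizza_id  NUMBER PRIMARY KEY,
        toppings  topping_list
    );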

> > Lists are just one of a squillion
> > UDTs that could be defined, and surely you don't expect the DBMS's
> > built-in query language to "understand" them all?
>
> Nope, just lists, lists of lists (2D arrays), sets, and scalars. --dawn

Why not lists of lists of lists (3D arrays), lists of sets, and so on?

David Cressey

unread,
Apr 25, 2006, 6:44:13 AM4/25/06
to

"Jon Heggland" <jon.he...@idi.ntnu.no> wrote in message
news:e2klkn$eh8$1...@orkan.itea.ntnu.no...

> Thank you, but I prefer to make up my own mind about people. Which I
> have in the case of Dawn, by the way; my comment was primarily an
> involuntary reflex. The point shouldn't be unfamiliar to anyone who has
> read (and participated) in the recent 1NF threads here, but there you
> have it. :s

I wish I'd said that!


Frank Hamersley

unread,
Apr 25, 2006, 8:16:26 AM4/25/06
to

That's funny - I always thought the "schema" was descriptive _and_
prescriptive!

Cheers, Frank.

Jan Hidders

unread,
Apr 25, 2006, 12:53:49 PM4/25/06
to

dawn wrote:
>
> Cool. We would surely want variable length (and the ability to ask the
> length). I still don't see where you can work with these arrays as one
> (I, at least) would want to, with grouping and ungrouping, for example.
> I would want to be able to ask for all students with every major not
> equal to 'PHIL' for example, even if we opted to implement majors in a
> list. I would hope there is some way to do that with the functions
> provided, but my guess (without being able to find examples right off)
> is that the SQL might not be really sweet.

Sweet is a matter of taste, but I would say it is quite simple. To get
a feeling see slide 31 of

www.starlab.vub.ac.be/staff/robert/InleidingDB/Silberschatz%20ppt%20files%20etc/ch09%20ORDB.ppt
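
In SQL:1999/2003 terms the query you mention comes out something like this
(student and majors are of course made-up names, and how well a given DBMS
handles a correlated UNNEST varies):

    SELECT s.student_id
    FROM   student AS s
    WHERE  NOT EXISTS (
               SELECT *
               FROM   UNNEST(s.majors) AS m(major)
               WHERE  m.major = 'PHIL'
           );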

> OK, so let's say that relational theory has caught up with Pick ;-) in
> these ways (NF2, 2VL, and LVAs). Where is this implemented?

In the DBMSs I mentioned.

> How can I get access to this data from any reporting or BI tools?

The usual way one gets information out of a SQL-DBMS. :-P IIRC JDBC 3.0
has caught up with SQL:1999.

-- Jan Hidders

Bob Badour

unread,
Apr 25, 2006, 1:09:10 PM4/25/06
to
Jan Hidders wrote:

> dawn wrote:
>
>>OK, so let's say that relational theory has caught up with Pick ;-) in
>>these ways (NF2, 2VL, and LVAs). Where is this implemented?
>
> In the DBMSs I mentioned.

Jan, why do you bother with this idiot? The relational theory hasn't
changed significantly, and the recent SQL standard regresses at least as
far down as Pick.

Jan Hidders

unread,
Apr 25, 2006, 1:41:12 PM4/25/06
to

dawn wrote:
>
> It is conceivable to me, Jan, that in the inner sanctum the RM is
> already dead (or altered so much that 1NF, 3VL, and the IP understood
> as relations and primitive scalars are all gone), in which case I'm
> only adding my voice to what is already known by theorists. That would
> be great as it would much better align what I have seen as best
> practices with the theory.

Gee, Dawn, perhaps you should write an e-mail to the friendly people at
dbdebunk to tell them that you and they agree after all.

Ok, now my cheek hurts. :-)

> My goals would still include getting the word out about this fatality.
> I doubt many undergraduate courses teach that 1NF, as we knew it, is
> dead, for example. I sure don't see that knowledge having made it into
> the practitioners "common knowledge" as yet.

It shouldn't. The "theorists" you referred to, although formally not
opposed to nested relations, warn against using them too enthusiastically.
Never mind the sloppy data modelling attitude where one forgets to
indicate that a certain list is actually a set. Finally, whether using
nested relations / lists is in practice a good idea depends on how
efficiently your DBMS supports them and for example can do decent query
optimization on them. AFAIK the jury is still out on that one, so it is
certainly not something that should be promoted in undergraduate
classes as the "currently known best practices".

> [...] If you do know of an implementation of lists that
> aligns with current relational theory (which was once described as "no
> lists" or "no repeating groups"), I'm very interested.

AFAIK the thing you have in mind doesn't exist yet.

-- Jan Hidders

Pickie

unread,
Apr 25, 2006, 8:59:04 PM4/25/06
to
What one does with a Pick-type (MV) system, is build an application.
Some of the tasks of a DBMS are carried out - like record locking,
triggers, and transactioning - by the MV system automatically. Other
tasks, like constraint checking, are usually carried out by the
application program. There is enough provided in the MV system to
enable a total application system to be as good as a relational DBMS
without requiring the capability of building a DBMS.

Perhaps I should have called it a database construction kit, but I
thought the result was so much more than a database that 'DBMS
constructor kit' appealed to me.

It is possible to build different applications which use the same data
files. This is not usually done by independent teams if the data is to
be modified.

It is possible to use SQL, but if this involves modifying an existing
application to follow SQL rules (such as not allowing the full stop in
a column name) then usually it's not worth the financial cost.

Multivalues can violate first normal form in the same way as using the
internal file, block, and row numbers instead of a key violates Codd's
guaranteed access rule. That is to say, while it may be possible to do
something, it may not be good practice to do it. Multivalues are
correctly used if they are seen as a compact storage method for 'child'
relations. They partially automate cascading deletes which otherwise
would require application programming.
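
In SQL terms the equivalent 'child' relation would be something like this
(made-up names), with ON DELETE CASCADE covering the cascading delete that
the multivalued field otherwise automates:

    CREATE TABLE pizza (
        pizza_id  INTEGER PRIMARY KEY
    );

    CREATE TABLE pizza_topping (
        pizza_id  INTEGER NOT NULL REFERENCES pizza ON DELETE CASCADE,
        topping   VARCHAR(30) NOT NULL,
        PRIMARY KEY (pizza_id, topping)
    );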

In my opinion, MV databases tend to collect rubbish data, but the
applications do not fail catastrophically because of this. Usually the
rubbish is only discovered when migrating the data to a DBMS, or
investigating the possibility of doing so. Obviously, these ones are
not quite "as good as a relational DBMS". However, this cruft seems to
gather as the conceptual model evolves, so maybe at one point it was
"as good as".

Pickie

unread,
Apr 25, 2006, 9:04:05 PM4/25/06
to
Is it possible to learn to use the relational model without a good
understanding of highschool-level set theory?

David Cressey

unread,
Apr 26, 2006, 1:59:14 AM4/26/06
to

"Pickie" <keith....@datacom.co.nz> wrote in message
news:1146013144....@j33g2000cwa.googlegroups.com...

> It is possible to build different applications which use the same data
> files. This is not usually done by independent teams if the data is to
> be modified.

This is where I diverge from the Pick culture. It's precisely this
situation that interests me: Independent teams writing applications that
collaborate by exchanging data in a meaningful, formal fashion.

If the data is exchanged interactively, you can use a network. If the data
is to be persistent, and trustworthy over time, you can use a database.
In either case you want to manage the resource that enables data sharing,
and the data to be shared.

This divergence of culture explains more about the ongoing debates than
divergences in the internals of the tool sets, divergences in syntax and
semantics, and divergences in methodologies, IMO.

Perhaps by understanding that we are trying to solve different problems, we
will understand that different things look "good" or "productive" to us.


Dan

unread,
Apr 26, 2006, 2:57:52 AM4/26/06
to
"Thats funny - I always thought the "schema" was descriptive _and_
prescriptive! "

And I thought it was proscriptive as well.

- Dan

Frank Hamersley

unread,
Apr 26, 2006, 9:14:31 AM4/26/06
to

Chuckle - any more for any more 'ives?

Cheers, Frank.

Gene Wirchenko

unread,
Apr 26, 2006, 2:31:30 PM4/26/06
to
On 25 Apr 2006 17:59:04 -0700, "Pickie" <keith....@datacom.co.nz>
wrote:

[snip]

>Multivalues can violate first normal form in the same way as using the
>internal file, block, and row numbers instead of a key violates Codd's
>guaranteed access rule. That is to say, while it may be possible to do
>something, it may not be good practice to do it. Multivalues are
>correctly used if they are seen as a compact storage method for 'child'
>relations. They partially automate cascading deletes which otherwise
>would require application programming.

And require application programming to unpack.

>In my opinion, MV databases tend to collect rubbish data, but the
>applications do not fail catastrophically because of this. Usually the

Sure, they just give wrong answers. People base their decisions
on these wrong answers. It may not be realistically possible to track
back to exactly what caused a bad decision. Bad data is often
involved though.

>rubbish is only discovered when migrating the data to a DBMS, or
>investigating the possibility of doing so. Obviously, these ones are
>not quite "as good as a relational DBMS". However, this cruft seems to
>gather as the conceptual model evolves, so maybe at one point it was
>"as good as".

Or maybe it never was.

Sincerely,

Gene Wirchenko

dawn

unread,
Apr 26, 2006, 4:31:59 PM4/26/06
to
David Cressey wrote:
> "Pickie" <keith....@datacom.co.nz> wrote in message
> news:1146013144....@j33g2000cwa.googlegroups.com...
>
> > It is possible to build different applications which use the same data
> > files. This is not usually done by independent teams if the data is to
> > be modified.
>
> This is where I diverge from the Pick culture. It's precisely this
> situation that interests me: Independent teams writing applications that
> collaborate by exchanging data in a meaningful, formal fashion.

I have possibly worked with larger systems than Pickie has.
Applications using the database in one s/w company had the requirement
(enforced primarily through code reviews) of using the centrally
maintained CRUD services, which took care of some of what a SQL DBMS
person would care about (e.g. RI), but definitely not everything.

> If the data is exchanged interactively, you can use a network. If the data
> is to be persistent, and trustworthy over time, you can use a database.
> In either case you want to manage the resource that enables data sharing,
> and the data to be shared.
>
> This divergence of culture explains more about the ongoing debates than
> divergences in the internals of the tool sets, divergences in syntax and
> semantics, and divergences in methodologies, IMO.

Results will vary. Small shops typically have fewer automated
standards in place.

> Perhaps by understanding that we are trying to solve different problems, we
> will understand that different things look "good" or "productive" to us.

I will grant that the issues and priorities addressed by SQL DBMS tools
and Pick, Cache' or other tools, are different. However, the very same
business problems (accounts receivable, manufacturing, student
registration, call center operations...) can be addressed for small to
large organizations with either approach. MV providers often market
now as "embedded" but are often employed as an enterprise database.

I also have not seen more junk data in software that works with MV
solutions than elsewhere, but I have seen different kinds of junk data
-- character data in numeric fields in an MV system, for example, due
to a bug that was fixed without, apparently, repairing all data. On
the other hand, you don't have users encoding data in comment fields
because the cost of them asking for a change, not to mention the cost
of the change, is too high. There are pros and cons to each approach,
so it makes sense to ask what gives an organization the best results
over time for the least amount of dollars.
Cheers! --dawn

dawn

unread,
Apr 26, 2006, 10:14:46 PM4/26/06
to
Jan Hidders wrote:
> dawn wrote:
> >
> > It is conceivable to me, Jan, that in the inner sanctum the RM is
> > already dead (or altered so much that 1NF, 3VL, and the IP understood
> > as relations and primitive scalars are all gone), in which case I'm
> > only adding my voice to what is already known by theorists. That would
> > be great as it would much better align what I have seen as best
> > practices with the theory.
>
> Gee, Dawn, perhaps you should write an e-mail to the friendly people at
> dbdebunk to tell them that you and they agree after all.
>
> Ok, now my cheek hurts. :-)

;-) You should see the responses I get when I do write dbdebunk.
There are some folks that I just can't charm ;-) and I don't think a
beverage and a chat in person would change that. Ah well...

> > My goals would still include getting the word out about this fatality.
> > I doubt many undergraduate courses teach that 1NF, as we knew it, is
> > dead, for example. I sure don't see that knowledge having made it into
> > the practitioners "common knowledge" as yet.
>
> It shouldn't.

Yes, it should. Should all XML documents be in 1NF? If not, why not?

> The "theorists" you referred to, although formally not
> opposed to nested relations, warn against using them too enthusiastically.

Yes, I know. Baby steps.

> Never mind the sloppy data modelling attitude where one forgets to
> indicate that a certain list is actually a set.

It is sloppy only if there is a business need to make the distinction.
Every set whose values I've ever enumerated for anyone, either in
writing or verbally, has been passed as a list ;-) I can imagine
instances where this could be a problem, but it just doesn't seem to be
the show stopper that some might think. I would prefer the added
precision, of course.

> Finally, whether using
> nested relations / lists is in practice a good idea depends on how
> efficiently your DBMS supports them

Agreed.

> and for example can do decent query
> optimization on them. AFAIK the jury is still out on that one,

Well, there's more than 30 years of production apps out there running
flavors of MUMPS, PICK, and others from which the jury could gather
data. Even if you don't like the lack of DBMS-defined constraints on
the way in, those are not required for determining read-only query
performance. Unfortunately, there are no industry performance measures
of which I am aware that are not designed strictly for SQL-DBMS's (or
do you know of some?)

> so it is
> certainly not something that should be promoted in undergraduate
> classes as the "currently known best practices".

I guess I still have work to do, eh?

> > [...] If you do know of an implementation of lists that
> > aligns with current relational theory (which was once described as "no
> > lists" or "no repeating groups"), I'm very interested.
>
> AFAIK the thing you have in mind doesn't exist yet.

That's what I thought. I'll do what I can (which would be a lot easier
if I were not both ignorant AND stupid, I suspect.) smiles. --dawn

Pickie

unread,
Apr 26, 2006, 11:57:27 PM4/26/06
to
People can collaborate (and exchange data in a meaningful, formal
fashion) without a DBMS. In practice, using a DBMS involves much more
than just providing the database(s). You have to be able to evolve the
database structure for new circumstances, which means you run into a
different set of problems. But I understand where you are coming from.

The original request was for a lucid statement of the MV vs RM
position. In my reply I tried to describe Pick-MV a bit, because there
are a few misconceptions about it. The basic position I took with
respect to the request was that I didn't think it could be compared
with RM because they are different things. Moreover Pick-MV doesn't
have some of the key things required of a DBMS, so it can't be compared
with RDBMSs either. You are, however, left with the fact that one can
build a system which acts as if it was built with a DBMS. People can
access information, update it, report on it, etc. without getting in
each others way. Of course, this does depend on the application
programmer. Similarly you depend on your DBA when you have a DBMS.
These people need to be skilled, and an understanding of normalisation,
etc. is necessary to both.

The only theory I have any knowledge at all about in the database world
is the RM. The major DBMS's (I mean the ones that call themselves
RDBMSs) are based on SQL which doesn't follow the RM. In fact, given
that the RM does not have nulls, Codd's rules for a RDBMS don't seem to
follow the RM either. There are DBMS's that are said to follow the RM
but I have my doubts if SQL is supported.

The truest answer to the request seems to be that the Relational Model
is a theory that has no practical example and Multi-Value (at least the
Pick version) is a practical example that has no theory.

dawn

unread,
Apr 27, 2006, 1:12:04 AM4/27/06
to
Pickie wrote:
<snip>

> The truest answer to the request seems to be that the Relational Model
> is a theory that has no practical example and Multi-Value (at least the
> Pick version) is a practical example that has no theory.

Well said, Keith! I'm not sure either has to be true, but that does
appear to be the current state. On a related note, I sure would like
to have a means of optimizing the best practices & theory combination.
I might think one will find a 2VL, NF2, LVA... data modeling approach
there, but how could I possibly verify that? Mathematics cannot
provide the answer to that question and our industry seems
impoverished when it comes to empirical data.

Cheers! --dawn

David Cressey

unread,
Apr 27, 2006, 9:04:45 AM4/27/06
to

"Pickie" <keith....@datacom.co.nz> wrote in message
news:1146110247....@i39g2000cwa.googlegroups.com...

> People can collaborate (and exchange data in a meaningful, formal
> fashion) without a DBMS. In practice, using a DBMS involves much more
> than just providing the database(s). You have to be able to evolve the
> database structure for new circumstances, which means you run into a
> different set of problems. But I understand where you are coming from.
>

Thank you for a rational and lucid exposition of a point of view that's
different from my own. This is perhaps the best discussion between a Pickie
and a database designer I've seen. I'm going to try to keep the tone high,
while exploring the issues as deeply as I know how.


> The original request was for a lucid statement of the MV vs RM
> position. In my reply I tried to describe Pick-MV a bit, because there
> are a few misconceptions about it. The basic position I took with
> respect to the request was that I didn't think it could be compared
> with RM because they are different things.

I have to agree. First off, it's been asserted that Pick didn't have a
data model in mind when he developed Pick, and that the MV is an attempt to
provide a data model in retrospect. I think of this as largely a side
issue. Having built applications in everything from FORTRAN to SQL, I'm
going to suggest that, in every case, I had some kind of data model
somewhere in my brain, even if it was at the subconscious or subverbal
level. The advantages of making the data model conscious and explicit,
rather than subverbal and implicit, are food for another discussion.


> Moreover Pick-MV doesn't
> have some of the key things required of a DBMS, so it can't be compared
> with RDBMSs either.

I accept this, because I just don't know enough about Pick to question it.
However, some others in the forum have begun talking about products like
UniVerse, and describing them as MV-based DBMS products. At that point, a
comparison between "the multivalue data model" and "the relational data
model" becomes relevant, because we are comparing one DBMS with another.
I've found it necessary to learn at least a flyby overview of the way Pick
data files are structured as a handle on what the MVDM might be about.
There may be better handles.


> You are, however, left with the fact that one can
> build a system which acts as if it was built with a DBMS. People can
> access information, update it, report on it, etc. without getting in
> each others way. Of course, this does depend on the application
> programmer. Similarly you depend on your DBA when you have a DBMS.
> These people need to be skilled, and an undestanding of normalisation,
> etc. is necessary to both.
>

You can certainly build a system without using a DBMS. And I'm the last
person to suggest that every application ought to be layered on a DBMS.
There is a large class of problems that are better solved without a DBMS
than with a DBMS. However, there's also some truth to Spight's law, which
reads: sooner or later every application needs a database.
And I think it's obvious that if you need a database, you're better off with
a good DBMS than without any DBMS.

However, I'm going to take issue with one thing you said: People can access
information and report on it. People can't access information directly,
in the scenario you outline. They can access services provided by the
application, and report on the data, provided they can feed the data
output by the application into a report writer! From 1969 to 2003, I've
seen numerous applications where the API is so grotesque that the
information inside it is unusable for all practical purposes. These aren't
particularly Pick applications, but applications across the board.

This is the problem that databases were invented to solve. If you're a "pro
DBMS partisan" as I am, this is what you want a database to provide:
standard DBMS services, like backup; a standard data interface, like SQL
(with due allowance for its defects); a standard data description format,
like the metadata in SQL after about 1992, I think; a reasonably standard,
or at least not counterintuitive, way of understanding the semantics of the
data; and a reasonably simple and expressive way of connecting the data, as
modeled and stored, with the view of the data expressed by subject matter
experts.

The above is a high cost wish list. I grant that. In order to get
sufficient bang for the buck, you are going to have to get a lot of utility
out of the data that you've made this investment in. That's what I think
the practice of database development is all about.

> The only theory I have any knowledge at all about in the database world
> is the RM. The major DBMS's (I mean the ones that call themselves
> RDBMSs) are based on SQL which doesn't follow the RM. In fact, given
> that the RM does not have nulls, Codd's rules for a RDBMS don't seem to
> follow the RM either. There are DBMS's that are said to follow the RM
> but I have my doubts if SQL is supported.
>

When I'm being careful (which is not always), I try to use the initials,
RDM, rather than RM. It's important to remember that the RDM is a data
model, not just a model. Some people in this forum argue that "NULL" has
no counterpart in mathematics, so it shouldn't be part of the model. That
misses the point. If you are building a DATA model, you have to address
the question: what are you going to do when the data isn't there? as well
as the question, how do you prevent missing data? That is why Codd
addresses the issue of missing data, rather than evading the issue.

Alternatives to RDM are the hierarchical data model, the CODASYL (or network)
data model, various object oriented data models, and the multivalue data
model.


> The truest answer to the request seems to be that the Relational Model
> is a theory that has no practical example and Multi-Value (at least the
> Pick version) is a practical example that has no theory.
>

It depends. As a practitioner, I've been willing to live with SQL as an
approximation to an RDM interface. It's one of those eighty-twenty things.
And, also as a practitioner, I've been able to build some very practical,
and very successful databases, along with applications, that use products
like Oracle. Oracle is another one of those eighty-twenty things.

Soooo...... if you are willing to accept the idea that "SQL is
approximately relational" (an idea that is nonsense to the laudest voices
in this forum) then I'm going to say that there are tens of thousands of
practical examples of the usefulness of RDM.

There are also tens of thousands of poorly built databases based on SQL out
there. Some of those might be traceable to SQL's flaws. More of them, in
my opinion, are traceable to lack of database design skills or lack of data
analysis skills.

I'm sorry this response was so long. I hope it was worth reading.


JOG

unread,
Apr 27, 2006, 1:51:21 PM4/27/06
to
David Cressey wrote:
<snippage>

> Some people in this forum argue that "NULL" has
> no counterpart in mathematics, so it shouldn't be part of the model. That
> misses the point. If you are building a DATA model, you have to address
> the question: what are you going to do when the data isn't there? as well
> as the question, how do you prevent missing data? That is why Codd
> addresses the issue of missing data, rather than evading the issue.
</snippage>

I could not disagree more. There is a reason there is no counterpart in
mathematics - the "null concept" is not necessary. Why then is it
suddenly necessary in a data model based upon mathematics? Because
"missing data" is viewed incorrectly by most participants, as though it
is some obtuse construct external to the underlying theory. This is
absolutely not the case - _everything_ one could possibly say about
missing or unknown data can still happily be described in terms of
predicate logic. It is not a 'hole' in the data, it is different data.
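
To put a concrete face on that, here is a minimal sketch in plain SQL (the
table and column names are my own invention): the "missing" fact simply
gets its own predicate instead of a nullable column.

CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   VARCHAR(60) NOT NULL
);

CREATE TABLE employee_phone (
    emp_id INTEGER PRIMARY KEY REFERENCES employee (emp_id),
    phone  VARCHAR(20) NOT NULL
);

-- An employee with no known phone number simply has no row in
-- employee_phone; nothing NULL is ever stored or tested for.
INSERT INTO employee VALUES (1, 'Barney');
INSERT INTO employee VALUES (2, 'Oscar');
INSERT INTO employee_phone VALUES (1, '555-0100');

-- "Who has no recorded phone number?" is ordinary predicate logic:
SELECT e.name
FROM   employee e
WHERE  NOT EXISTS (SELECT 1
                   FROM   employee_phone p
                   WHERE  p.emp_id = e.emp_id);

Missing data is then just an absent proposition, not a special marker
lodged inside a present one.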

Back to the MV...

Bob Badour

unread,
Apr 27, 2006, 3:19:56 PM4/27/06
to
David Cressey wrote:

> "Pickie" <keith....@datacom.co.nz> wrote in message
> news:1146110247....@i39g2000cwa.googlegroups.com...
>
>>People can collaborate (and exchange data in a meaningful, formal
>>fashion) without a DBMS. In practice, using a DBMS involves much more
>>than just providing the database(s). You have to be able to evolve the
>>database structure for new circumstances, which means you run into a
>>different set of problems. But I understand where you are coming from.

[snip]

> The above is a high cost wish list. I grant that. In order to get
> sufficient bang for the buck, you are going to have to get a lot of utility
> out of the data that you've made this investment in. That's what I think
> the practice of database development is all about.

A good formalism reduces the cost of the wish list. This is as true for
data models as it is for discrete signal analysis, vector calculus,
grammar specification etc.


> When I'm being careful (which is not always), I try to use the initials,
> RDM, rather than RM. It's important to remember that the RDM is a data
> model, not just a model. Some people in this forum argue that "NULL" has
> no counterpart in mathematics, so it shouldn't be part of the model. That
> misses the point. If you are building a DATA model, you have to address
> the question: what are you going to do when the data isn't there? as well
> as the question, how do you prevent missing data? That is why Codd
> addresses the issue of missing data, rather than evading the issue.

Actually, I believe Codd tried to instill some formalism to an ad hoc
feature thrown in by others. All one has to do to allow for some
information to be missing is to structure one's predicates to allow for it.

Null is a toxic elixir. It gives one the illusion of power, and it
restrains the minds of those who use it.

Pickie

unread,
Apr 27, 2006, 5:48:44 PM4/27/06
to
Yes David, it was worth reading, thank you.

One of the problems when discussing the Pick-MV world is that there is
thirty-some years of changes. The original idea is described in a 1965
paper by Don Nelson, while the version known as R83 is looked on as the
archetype for "Pick". Universe now has SQL support, so maybe it is a
full-fledged DBMS?

When Prime Information (Universe's progenitor) came up with the idea of
having the system re-size the files continuously (called "dynamic" vs
the standard "static" files), it didn't work for a while. Re-sizing
was always a horrendous task in Pick machines as the files (tables)
grew - at one site I was on it took over 24 hours to do "the big 'un".
The latest Pick-alike (called QM) was written without any static files
at all. Transactioning was added to the various Pick-alikes at
different times. I'd use it ("like a shot" as I said at a job
interview), but most of the applications I've seen have been around for
a good while and don't use it.

Interestingly, Don Nelson conceived that there would be four 'engines'.
One for getting data out, one for putting data in, another for
formatting reports, and the fourth for changing the data structure. As
Dick Pick developed it, the formatting overtook the engine for getting
data out - that is, the two were merged (at least, that's my view of
the situation). The engine for putting data in was not built, instead
that job was carried out by the application programmer using Ken Simm's
version of BASIC.

The Pick/Nelson system did not have the idea of 'declarative'
statements. You must remember that we are talking early '70's when
most of the ideas were implemented. I might add that the original Pick
"thing" was an extremely weird one. It was written using its own
assembler. When they wanted to port to a new chip, they only had to
re-write the small part that interpreted the assembler code. A really
cool idea, but apparently the way it was coded was immensely
interlinked and very tricky. Apparently the assembler code was largely
a transcription of a set of diagrams Don Nelson did long before any
hardware was available.


You might like to read Don Nelson's paper at
http://www.tincat-group.com/mv/GIRLS.pdf

David Cressey

unread,
Apr 28, 2006, 7:45:45 AM4/28/06
to

"Pickie" <keith....@datacom.co.nz> wrote in message
news:1146174524.5...@i40g2000cwc.googlegroups.com...

> Yes David, it was worth reading, thank you.
>
> One of the problems when discussing the Pick-MV world is that there is
> thirty-some years of changes. The original idea is described in a 1965
> paper by Don Nelson, while the version known as R83 is looked on as the
> archetype for "Pick". Universe now has SQL support, so maybe it is a
> full-fledged DBMS?

SQL support is neither necessary nor sufficient for status as a full-fledged
DBMS. And I say that notwithstanding the fact that I like SQL pretty well.


Jan Hidders

unread,
Apr 30, 2006, 9:51:59 AM4/30/06
to

dawn wrote:
> Jan Hidders wrote:
> > dawn wrote:
> > >
> > > My goals would still include getting the word out about this fatality.
> > > I doubt many undergraduate courses teach that 1NF, as we knew it, is
> > > dead, for example. I sure don't see that knowledge having made it into
> > > the practitioners "common knowledge" as yet.
> >
> > It shouldn't.
>
> Yes, it should. Should all XML documents be in 1NF? If not, why not?

That is neither here nor there. The question is not if you should
always model your data such that it is in 1NF.

> > Finally, whether using
> > nested relations / lists is in practice a good idea depends on how
> > efficiently your DBMS supports them
>
> Agreed.
>
> > and for example can do decent query
> > optimization on them. AFAIK the jury is still out on that one,
>
> Well, there's more than 30 years of production apps out there running
> flavors of MUMPS, PICK, and others from which the jury could gather
> data.

They already have done so, and they already know why they work
efficiently under certain circumstances, and under which circumstances
they have problems.

> Even if you don't like the lack of DBMS-defined constraints on
> the way in, those are not required for determining read-only query
> performance. Unfortunately, there are no industry performance measures
> of which I am aware that are not designed strictly for SQL-DBMS's (or
> do you know of some?)

Simply translate the SQL queries to queries in the ad-hoc query
language of your favorite system. Presuming, of course, that this
ad-hoc query language is powerful enough. Any extra required
programming would of course make the comparison meaningless.

-- Jan Hidders

dawn

unread,
Apr 30, 2006, 4:04:14 PM4/30/06
to
Jan Hidders wrote:
> dawn wrote:
> > Jan Hidders wrote:
> > > dawn wrote:
> > > >
> > > > My goals would still include getting the word out about this fatality.
> > > > I doubt many undergraduate courses teach that 1NF, as we knew it, is
> > > > dead, for example. I sure don't see that knowledge having made it into
> > > > the practitioners "common knowledge" as yet.
> > >
> > > It shouldn't.
> >
> > Yes, it should. Should all XML documents be in 1NF? If not, why not?
>
> That is neither here nor there. The question is not if you should
> always model your data such that it is in 1NF.
>
> > > Finally, whether using
> > > nested relations / lists is in practice a good idea depends on how
> > > efficiently your DBMS supports them
> >
> > Agreed.
> >
> > > and for example can do decent query
> > > optimization on them. AFAIK the jury is still out on that one,
> >
> > Well, there's more than 30 years of production apps out there running
> > flavors of MUMPS, PICK, and others from which the jury could gather
> > data.
>
> They already have done so, and they already know why they work
> efficiently under certain circumstances, and under which circumstances
> they have problems.

I've tried to find those "they" people to see what they have found out.
Can you point to one study out there that compares Pick with SQL-based
products? What is out there that compares any non-SQL (or SQL as a
second language) databases with SQL DBMS's without using SQL against
the non-SQL database? I'm apparently doing a lousy job of searching.
I'm not looking for theory on this one, but actual studies of real
software solutions. It does seem like there would be such, but I'm not
finding 'em.

> > Even if you don't like the lack of DBMS-defined constraints on
> > the way in, those are not required for determining read-only query
> > performance. Unfortunately, there are no industry performance measures
> > of which I am aware that are not designed strictly for SQL-DBMS's (or
> > do you know of some?)
>
> Simply translate the SQL queries to queries in the ad-hoc query
> language of your favorite system. Presuming, of course, that this
> ad-hoc query language is powerful enough.

You would need to implement solutions to the same business problems.
It is highly unlikely you would model the data identically for the two
systems. Then if you ask the same question of each and a compiled
language is executed via a virtual field for one and SQL is used for
the other, is that how you want to compare? I would suggest the SQL
folks would not want to compare on performance, although with some of
the latest approaches to performance improvements with SQL tools, they
might now be at the point where they would be willing to do so if there
were financial incentive to do so.

> Any extra required
> programming would of course make the comparison meaningless.

Not to an end-user, but, yes. --dawn

Jan Hidders

unread,
May 1, 2006, 6:28:40 AM5/1/06
to

dawn wrote:
> Jan Hidders wrote:
> > dawn wrote:
> > >
> > > Well, there's more than 30 years of production apps out there running
> > > flavors of MUMPS, PICK, and others from which the jury could gather
> > > data.
> >
> > They already have done so, and they already know why they work
> > efficiently under certain circumstances, and under which circumstances
> > they have problems.
>
> I've tried to find those "they" people to see what they have found out.
> Can you point to one study out there that compares Pick with SQL-based
> products? What is out there that compares any non-SQL (or SQL as a
> second language) databases with SQL DBMS's without using SQL against
> the non-SQL database? I'm apparently doing a lousy job of searching.
> I'm not looking for theory on this one, but actual studies of real
> software solutions. It does seem like there would be such, but I'm not
> finding 'em.

I don't know any, nor do I expect something like that to exist. Such
comparisons are difficult because of several reasons. The claims of the
RM are mainly wrt. the functioning of an IT department as a whole
within a certain type of context. That is hard to recreate in an
experiment.

> > > [...] Unfortunately, there are no industry performance measures
> > > of which I am aware that are not designed strictly for SQL-DBMS's (or
> > > do you know of some?)
> >
> > Simply translate the SQL queries to queries in the ad-hoc query
> > language of your favorite system. Presuming, of course, that this
> > ad-hoc query language is powerful enough.
>
> You would need to implement solutions to the same business problems.

Yes, if you are only interested in execution performance. But, again,
for a meaningful comparison between the two technologies that approach
could be, depending on your definition of "business problem", too
myopic.

> > Any extra required
> > programming would of course make the comparison meaningless.
>
> Not to an end-user, but, yes.

Not necessarily. Indirectly this sometimes also matters to the
end-user.

-- Jan Hidders

dawn

unread,
May 1, 2006, 5:57:48 PM5/1/06
to

Jan Hidders wrote:
> dawn wrote:
> > Jan Hidders wrote:
> > > dawn wrote:
> > > >
> > > > Well, there's more than 30 years of production apps out there running
> > > > flavors of MUMPS, PICK, and others from which the jury could gather
> > > > data.
> > >
> > > They already have done so, and they already know why they work
> > > efficiently under certain circumstances, and under which circumstances
> > > they have problems.
> >
> > I've tried to find those "they" people to see what they have found out.
> > Can you point to one study out there that compares Pick with SQL-based
> > products? What is out there that compares any non-SQL (or SQL as a
> > second language) databases with SQL DBMS's without using SQL against
> > the non-SQL database? I'm apparently doing a lousy job of searching.
> > I'm not looking for theory on this one, but actual studies of real
> > software solutions. It does seem like there would be such, but I'm not
> > finding 'em.

> I don't know any, nor do I expect something like that to exist. Such
> comparisons are difficult because of several reasons. The claims of the
> RM are mainly wrt. the functioning of an IT department as a whole
> within a certain type of context. That is hard to recreate in an
> experiment.

OK, so you told me the jury was out regarding modeling and implementing
with non-1NF data (or something like that); then I told you that there
are production systems out there, if the jury really wanted to know;
then you said they have already gathered data; then I tried to ask
"where is it?"; and you said "I don't know any, nor do I expect


something like that to exist."

So, I think we talked past each other on that one. If RVAs and other
non-1NF structures are now accepted in theory and implemented in
practice, then when should they be used? In theory (since I don't know
in practice with SQL-DBMS's), there are some good times to model in
NF2, I would think. For example, when a property of a strong entity is
multivalued, it makes sense to model it as a relation or list-valued
attribute. For example, for most implementations, a set or list of
valid e-mail addresses for a person. Of course there are vertical
industries where it makes more sense to retain e-mail addresses even
when there is no person associated with the e-mail address.
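
To make the contrast concrete, here is only a rough sketch with made-up
names; the list-valued column uses a PostgreSQL-style array type, since
collection support differs from one SQL product to the next.

-- 1NF shape: the multivalued property gets its own table.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      VARCHAR(60) NOT NULL
);
CREATE TABLE person_email (
    person_id INTEGER NOT NULL REFERENCES person (person_id),
    email     VARCHAR(120) NOT NULL,
    PRIMARY KEY (person_id, email)
);

-- NF2 shape: the addresses stay with the person as a
-- list-valued attribute (PostgreSQL-style array column).
CREATE TABLE person_nf2 (
    person_id INTEGER PRIMARY KEY,
    name      VARCHAR(60) NOT NULL,
    emails    TEXT[]      NOT NULL
);
INSERT INTO person_nf2 VALUES (1, 'Barney',
    ARRAY['barney@example.com', 'b.dino@example.com']);

The question I keep asking is when the second shape belongs in the
logical data model, not just in a particular implementation.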

If we are going to (re-!)introduce nested structures, we should have a
good idea when a logical data model should include them, or at least
know some best practices. If a particular DBMS is not up to the task
of working well with performance or ease of queries with the nested
structures, then perhaps the actual implementation needs to normalize
(1NF). But I'm not seeing nested structures yet in logical data models
(not that I'm seeing all LDMs).

> > > > [...] Unfortunately, there are no industry performance measures
> > > > of which I am aware that are not designed strictly for SQL-DBMS's (or
> > > > do you know of some?)
> > >
> > > Simply translate the SQL queries to queries in the ad-hoc query
> > > language of your favorite system. Presuming, of course, that this
> > > ad-hoc query language is powerful enough.
> >
> > You would need to implement solutions to the same business problems.
>
> Yes, if you are only interested in execution performance. But, again,
> for a meaningful comparison between the two technolgies that approach
> could be, depending on your definition of "business problem", too
> myopic.

I would be interested in the use over time, with multiple apps.

> > > Any extra required
> > > programming would of course make the comparison meaningless.
> >
> > Not to an end-user, but, yes.
>
> Not necessarily. Indirectly this sometimes also matters to the
> end-user.

I agree that if you have to hold everything and write new code, that
would matter to end-users, but whether a system was written this way or
that, end-users would typically not consider such comparisons
_meaningless_, even if they also care about the underlying technology.
Cheers! --dawn

dawn

unread,
May 1, 2006, 7:07:48 PM5/1/06
to
Pickie wrote:
<snip>

> The truest answer to the request seems to be that the Relational Model
> is a theory that has no practical example and Multi-Value (at least the
> Pick version) is a practical example that has no theory.

Well said, Keith! I'm not sure either has to be true, but that does

JOG

unread,
May 1, 2006, 7:20:57 PM5/1/06
to

I'm confused by this statement Dawn. My understanding is as follows:

1NF is definitional in that it is simply a case of saying that
"everything must be stored as tuples", allowing sets of such to form
relations. By definition, mathematical relations may not have variable
cardinalities across tuples and hence (but only as a consequence)
applying 1NF means that each field must be guaranteed to occur but
once.

However, an element in a tuple may happily be a set, a list, a RVA or
whatever user defined type one wishes to use and still be 1NF, as one
can still form acceptable tuples with them. Currently one has to
manipulate these user defined types externally as the tools aren't
really there to do otherwise (apart from strings and date types).

Is this not how you see it? All best, J.

dawn

unread,
May 1, 2006, 8:21:44 PM5/1/06
to

There are many differing definitions of 1NF, but the traditional one is
that it is equal to what Codd termed "normalized" which he discusses in
the 1970 paper. That is the only definition I can find that has been
popularized enough for most practitioners to have in mind when someone
mentions 1NF. I know you read my "Is Codd Dead?" blog entry, Jim, but
to clarify, it loosely means "no repeating groups" or what Codd called
"nonsimple domains." I recognize this is not precise, but it is the
thinking behind the origins of SQL, the ramifications of which are
still with us big time.

There are now some (Date, for example) who have redefined 1NF to have
no real meaning. If something is a relation, it is in 1NF by this
definition IIRC. That is not the 1NF to which I am referring.

> By definition, mathematical relations may not have variable
> cardinalities across tuples and hence (but only as a consequence)
> applying 1NF means that each field must be guaranteed to occur but
> once.
>
> However, an element in a tuple may happily be a set, a list, a RVA or
> whatever user defined type one wishes to use and still be 1NF,

This is only true if one has redefined 1NF to make it true. This is
not the case with the original definitions, which is why SQL (which
might be a flawed implementation, but was based on the RM as it was
then defined) did not include RVAs (SQL '92 on which ODBC is based does
not include them, for example).

> as one
> can still form acceptable tuples with them. Currently one has to
> manipulate these user defined types externally as the tools aren't
> really there to do otherwise (apart from strings and date types).
>
> Is this not how you see it? All best, J.

No, those are not the definitions that I am using. I'm not doing any
fancy footwork with the original definitions. Since I work with what
are called NF2 databases (or NF^2), which stands for Non-First Normal
Form, I am using the definition of 1NF that prompts the NF2
designation. If you have a better term for
the-approach-formerly-known-as-1NF, let me know and I'll try to
incorporate it where 1NF might otherwise be confusing.

Cheers! --dawn

JOG

unread,
May 1, 2006, 10:52:03 PM5/1/06
to

This take on 1NF has no "meaning" as such, but it does have practical
consequences. And it is pretty much the version of 1NF that I
described. Ok, you're not happy with the original "non-simple" domain
fluff, but I see this as an aspect that has been refined over the years
in the theory (if not in the implementation).

>
> > By definition, mathematical relations may not have variable
> > cardinalities across tuples and hence (but only as a consequence)
> > applying 1NF means that each field must be guaranteed to occur but
> > once.
> >
> > However, an element in a tuple may happily be a set, a list, a RVA or
> > whatever user defined type one wishes to use and still be 1NF,
>
> This is only true if one has redefined 1NF to make it true. This is
> not the case with the original definitions, which is why SQL (which
> might be a flawed implementation, but was based on the RM as it was
> then defined) did not include RVAs (SQL '92 on which ODBC is based does
> not include them, for example).

True, not according to the original paper, but relational theory
appears to me to have developed since then. However, if in some
alternate world the definition of 1NF that I described was the
universally accepted one, and it was implemented perfectly - would that
satisfy your needs?

>
> > as one
> > can still form acceptable tuples with them. Currently one has to
> > manipulate these user defined types externally as the tools aren't
> > really there to do otherwise (apart from strings and date types).
> >
> > Is this not how you see it? All best, J.
>
> No, those are not the definitions that I am using. I'm not doing any
> fancy footwork with the original definitions. Since I work with what
> are called NF2 databases (or NF^2), which stands for Non-First Normal
> Form, I am using the definition of 1NF that prompts the NF2
> designation. If you have a better term for
> the-approach-formerly-known-as-1NF, let me know and I'll try to
> incorporate it where 1NF might otherwise be confusing.

This is important. I don't believe anyone would see Codd's original
paper (including probably Codd himself) as the final word - after all,
it was over three decades ago and refinements are to be expected. It
sounds to me your beef is over the "non-simple" domains aspect - but
with user defined types, which many RM proponents promote, I honestly
think there is no disagreement to be had here.

So your remaining objections with the RM establishment (as I have read
them) seem to distill to whether a statement such as:

"Barney is_colour green and is_colour purple"

translates best to which of the following propositions:

1) colour(Barney, green) && colour(Barney, purple)
2) colour(Barney, {green, purple} )

Would you not say that with a bit of logical manipulation we could
probably show that the first is preferable?
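
As a throwaway sketch of option 1 (plain SQL, with names I have invented),
each atomic fact is its own row, and both directions of question come
straight off the same relation:

CREATE TABLE has_colour (
    char_name   VARCHAR(30) NOT NULL,
    colour_name VARCHAR(30) NOT NULL,
    PRIMARY KEY (char_name, colour_name)
);

-- One row per atomic proposition.
INSERT INTO has_colour VALUES ('Barney', 'green');
INSERT INTO has_colour VALUES ('Barney', 'purple');

-- "What colours is Barney?" and "Who is green?" are symmetric:
SELECT colour_name FROM has_colour WHERE char_name   = 'Barney';
SELECT char_name   FROM has_colour WHERE colour_name = 'green';

J.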

>
> Cheers! --dawn

dawn

unread,
May 1, 2006, 11:45:35 PM5/1/06
to

Yes.

> Ok, you're not happy with the original "non-simple" domain
> fluff, but I see this as an aspect that has been refined over the years
> in the theory (if not in the implementation).

If the theory is up to snuff at this point, then toss in query language
support of LVA's and I'll start falling in line (with some angst over
strong typing, constraints that are only usable by the dbms, and a few
other details).

> > > By definition, mathematical relations may not have variable
> > > cardinalities across tuples and hence (but only as a consequence)
> > > applying 1NF means that each field must be guaranteed to occur but
> > > once.
> > >
> > > However, an element in a tuple may happily be a set, a list, a RVA or
> > > whatever user defined type one wishes to use and still be 1NF,
> >
> > This is only true if one has redefined 1NF to make it true. This is
> > not the case with the original definitions, which is why SQL (which
> > might be a flawed implementation, but was based on the RM as it was
> > then defined) did not include RVAs (SQL '92 on which ODBC is based does
> > not include them, for example).
>
> True, not according to the original paper, but relational theory
> appears to me to have developed since then. However, if in some
> alternate world the definition of 1NF that I described was the
> universally accepted one, and it was implemented perfectly - would that
> satisfy your needs?

I'm insatiable ;-)

> > > as one
> > > can still form acceptable tuples with them. Currently one has to
> > > manipulate these user defined types externally as the tools aren't
> > > really there to do otherwise (apart from strings and date types).
> > >
> > > Is this not how you see it? All best, J.
> >
> > No, those are not the definitions that I am using. I'm not doing any
> > fancy footwork with the original definitions. Since I work with what
> > are called NF2 databases (or NF^2), which stands for Non-First Normal
> > Form, I am using the definition of 1NF that prompts the NF2
> > designation. If you have a better term for
> > the-approach-formerly-known-as-1NF, let me know and I'll try to
> > incorporate it where 1NF might otherwise be confusing.
>
> This is important. I don't believe anyone would see Codd's original
> paper (including probably Codd himself) as the final word -

agreed. But it is hard to find general agreement on some of these
terms, so going to the source can be helpful with that and happens to
align with the way many people still think of the terms.

> after all,
> it was over three decades ago and refinements are to be expected. It
> sounds to me your beef is over the "non-simple" domains aspect -

I've picked off three of my beefs -- 1NF and lack of LVA's and 3VL --
to start with. So one of my beefs is the non-simple domains aspect.

> but
> with user defined types, which many RM proponents promote, I honestly
> think there is no disagreement to be had here.

OK, then what would be either a) the right way or b) a best practice
for including an RVA in a logical data model?

> So your remaining objections with the RM establishment (as I have read
> them) seem to distill to whether a statement such as:
>
> "Barney is_colour green and is_colour purple"
>
> translates best to which of the following propositions:
>
> 1) colour(Barney, green) && colour(Barney, purple)
> 2) colour(Barney, {green, purple} )
>
> Would you not say that with a bit of logical manipulation we could
> probably show that the first is preferable? J.

No. If there were no human beings in the mix, no input, and no output
to and from human beings, and no interpretation of the meaning of the
data ever, then 1) has the charm of mathematical simplicity (1st order
logic) with no downside.

I reserve the right to change my mind on that, however. Cheers! --dawn

David Cressey

unread,
May 2, 2006, 6:55:09 AM5/2/06
to

"dawn" <dawnwo...@gmail.com> wrote in message
news:1146520667.9...@y43g2000cwc.googlegroups.com...

> So, I think we talked past each other on that one. If RVAs and other
> non-1NF structures are now accepted in theory and implemented in
> practice, then when should they be used? In theory (since I don't know
> in practice with SQL-DBMS's), there are some good times to model in
> NF2, I would think.

Could you be a little more specific about what it is that you do not know,
in the above?


David Cressey

unread,
May 2, 2006, 7:06:01 AM5/2/06
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1146525657.2...@v46g2000cwv.googlegroups.com...

> I'm confused by this statement Dawn. My understanding is as follows:
>
> 1NF is definitional in that it is simply a case of saying that
> "everything must be stored as tuples", allowing sets of such to form
> relations. By definition, mathematical relations may not have variable
> cardinalities across tuples and hence (but only as a consequence)
> applying 1NF means that each field must be guaranteed to occur but
> once.
>
> However, an element in a tuple may happily be a set, a list, a RVA or
> whatever user defined type one wishes to use and still be 1NF, as one
> can still form acceptable tuples with them. Currently one has to
> manipulate these user defined types externally as the tools aren't
> really there to do otherwise (apart from strings and date types).
>
> Is this not how you see it? All best, J.

This newsgroup has been down this trail many times before.

In Ted Codd's 1970 paper, he points out that when a system of relations is
devised to store a body of facts, there are other systems of relations that
will express precisely the same body of facts. He then points out that
within a group of such systems that are all logically equivalent, there
will be (at least) one that contains no sets, lists, or RVAs as elements of
a tuple.
He called that the normalized form for the group. Later on, when 2NF was
discovered, "normalized" got renamed to 1NF.

In c. 1992, C.J.Date redefined 1NF to permit RVAs as elements. Not
everyone follows Date's definition. In particular, some introductory
material to the RDM still teaches 1NF, 2NF, and 3NF as they were defined in
the 1970s and 1980s.

I hope the above summary can save a few iterations in the newsgroup.


Jon Heggland

unread,
May 2, 2006, 8:10:42 AM5/2/06
to
David Cressey wrote:
> In Ted Codd's 1970 paper, he points out that when a system of relations is
> devised to store a body of facts, there are other systems of relations that
> will express precisely the same body of facts. He then points out that
> within a group of such systems that are all logically equivalent, there
> will be (at least) one that contains no sets, lists, or RVAs as elements of
> a tuple.

...on two conditions:

"(1) The graph of interrelationships of the nonsimple
domains is a collection of trees.
(2) No primary key has a component domain which is
nonsimple.
The writer knows of no application which would require
any relaxation of these conditions."

Other writers claim they *do* know of such applications, however, and
have proposed (semi-)formal guidelines for identifying cases where RVAs
may be appropriate.

> In c. 1992, C.J.Date redefined 1NF to permit RVAs as elements. Not
> everyone follows Date's definition. In particular, some introductory
> material to the RDM still teaches 1NF, 2NF, and 3NF as they were defined in
> the 1970s and 1980s.

*Most* do, I think---with the obvious weakness that they don't really
define what atomic/simple domains are, or why it matters. Do you know of
any textbooks apart from Date's that use Date's definition? And of
course, some introductory material teach yet another 1NF definition.
Caveat lector, and think for yourself. (BTW, there isn't much
disagreement on 2NF and 3NF, is there? Nor much interest, I think...)

> I hope the above summary can save a few iterations in the newsgroup.

:)
--
Jon

dawn

unread,
May 2, 2006, 9:01:36 AM5/2/06
to

There are some good and bad practices for including multi-valued
attributes in a logical data model when implementing in various NF2
databases. Now that SQL-DBMS's include NF2 structures (RVAs + UDTs),
what are the best practices for an LDM to include such? I've seen
examples, but also read how it is rarely a good idea to use an RVA in a
typical business scenario. So I do not know when, in practice, one
should model with nested relations (for example). What are
practitioners learning or being told about changes in logical data
modeling now that nested relations are viable?

The theory is not constrained by whatever issues the implementations
might have, so my question was: when, in theory, should you model data
with nested relations?

Thanks. --dawn

dawn

unread,
May 2, 2006, 9:20:19 AM5/2/06
to

Jon Heggland wrote:
> David Cressey wrote:
> > In Ted Codd's 1970 paper, he points out that when a system of relations is
> > devised to store a body of facts, there are other systems of relations that
> > will express precisely the same body of facts. He then points out that
> > within a group of such systems that are all logically equivalent, there
> > will be (at least) one that contains no sets, lists, or RVAs as elements of
> > a tuple.
>
> ...on two conditions:
>
> "(1) The graph of interrelationships of the nonsimple
> domains is a collection of trees.
> (2) No primary key has a component domain which is
> nonsimple.
> The writer knows of no application which would require
> any relaxation of these conditions."
>
> Other writers claim they *do* know of such applications, however, and
> have proposed (semi-)formal guidelines for identifying cases where RVAs
> may be appropriate.

Yes, but what about the use of RVAs even when the application could be
implemented with simple domains? For example, we model data for XML
documents for data exchange using multivalues even if we could do
otherwise. Are there best practices for that? Would any of those best
practices be applicable to modeling persisted data as well? If data
for exchange is persisted (perhaps in a database such as IBM Viper when
that comes out), it would make sense that the best practices for
modeling data for exchange and persistence would overlap, right?

> > In c. 1992, C.J.Date redefined 1NF to permit RVAs as elements. Not
> > everyone follows Date's definition. In particular, some introductory
> > material to the RDM still teaches 1NF, 2NF, and 3NF as they were defined in
> > the 1970s and 1980s.

Quite a bit of it does, even if nodding to RVAs, OO, XML, while taking
digs at network and hierarchical structures. CS and CIS/MIS majors
often graduate knowing how to "normalize" data, but ask them what the
difference between master, transaction, and code files or tables might
be and they draw a blank (yes, that was a really old fashioned and
non-academic thing to say).

> *Most* do, I think---with the obvious weakness that they don't really
> define what atomic/simple domains are, or why it matters. Do you know of
> any textbooks apart from Date's that uses Date's definition?

I don't, but I haven't seen them all.

> And of
> course, some introductory material teach yet another 1NF definition.
> Caveat lector, and think for yourself. (BTW, there isn't much
> disagreement on 2NF and 3NF, is there? Nor much interest, I think...)

The disagreement would be on whether to teach them. I think many
people do because it has an academic air about it to teach how to move
from 1NF to 2 to 3 to BCNF..., but in practice you can start with BCNF
(skipping the old 1NF).

> > I hope the above summary can save a few iterations in the newsgroup.

Me too. Thanks. --dawn

JOG

unread,
May 2, 2006, 10:06:05 AM5/2/06
to
dawn wrote:
> JOG wrote:
> > dawn wrote:
> > > JOG wrote:
> > > > dawn wrote:
> > > > > Jan Hidders wrote:
> > > > > > dawn wrote:
> > > > > > > Jan Hidders wrote:
> > > > > > > > dawn wrote:
> [time for a snip]

> > So your remaining objections with the RM establishment (as I have read
> > them) seem to distill to whether a statement such as:
> >
> > "Barney is_colour green and is_colour purple"
> >
> > translates best to which of the following propositions:
> >
> > 1) colour(Barney, green) && colour(Barney, purple)
> > 2) colour(Barney, {green, purple} )
> >
> > Would you not say that with a bit of logical manipulation we could
> > probably show that the first is preferable? J.
>
> No. If there were no human beings in the mix, no input, and no output
> to and from human beings.

Well queries interact with the logical model, not humans or their
notion of an entity directly. It's a layer down. I have read and
understand your standpoint on this, but I really think there is a
distinction to be made - I try and describe it below.

> and no interpretation of the meaning of the data ever

I don't think this follows: both mean the same thing - the original
proposition.

> then 1) has the charm of mathematical simplicity (1st order
> logic) with no downside.

Consider that the propositions:

1) "Barney has green fur and purple fur",
"Oscar has green fur and black fur"

are logically identical to:

2) "Barney and Oscar have green fur",
"Barney has purple fur",
"Oscar has black fur"

If we're agreed there then consider that the first - in MV style -
would give something like:
colour( Barney, {green, purple} ) &&
colour( Oscar, {green, black})

But the second set of propositions in MV style gives us:
colour( green, {Barney, Oscar} ) &&
colour( purple, {Barney}) &&
colour( black, {Oscar})

The first "human interface" refers to entities "Barney" and "Oscar".
The second has an "interface" consisting of the entities
"green"-ness, "purple"-ness and "black"-ness. Which is correct? Neither
and both, because they are artifices. Who knows which will be
appropriate to the user? An XML or MV style prejudges that decision,
and with large shared data, whose future use is unpredictable, it seems
essential to me to avoid that interface-prison.

With RDBMS, despite their 3VL issues, there is no prejudice as to this
choice - both sets of propositions, 1 and 2, encode down to the same
things - just as they should, being logically identical.
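
A throwaway demonstration of that (plain SQL, names invented by me; the
VALUES-in-FROM constructor is standard but support for it varies): load
the facts as phrased the first way, spell out the second phrasing row by
row, and the difference between the two is empty.

CREATE TABLE fur (
    char_name   VARCHAR(30) NOT NULL,
    colour_name VARCHAR(30) NOT NULL,
    PRIMARY KEY (char_name, colour_name)
);

-- The facts as phrased in proposition set 1.
INSERT INTO fur VALUES ('Barney', 'green');
INSERT INTO fur VALUES ('Barney', 'purple');
INSERT INTO fur VALUES ('Oscar',  'green');
INSERT INTO fur VALUES ('Oscar',  'black');

-- Proposition set 2, spelled out row by row, differs by nothing:
SELECT char_name, colour_name FROM fur
EXCEPT
SELECT char_name, colour_name
FROM   (VALUES ('Barney', 'green'),
               ('Oscar',  'green'),
               ('Barney', 'purple'),
               ('Oscar',  'black')) AS set2 (char_name, colour_name);
-- Returns no rows: both phrasings encode to the same relation.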

> I reserve the right to change my mind on that, however. Cheers! --dawn

I'd reckon that this is the most important quality anyone can have in
life. That and a high tolerance for alcohol.

paul c

unread,
May 2, 2006, 12:37:25 PM5/2/06
to
Jon Heggland wrote:
> David Cressey wrote:
>
>>In Ted Codd's 1970 paper, he points out that when a system of relations is
>>devised to store a body of facts, there are other systems of relations that
>>will express precisely the same body of facts. He then points out that
>>within a group of such systems that are all logically equivalent, there
>>will be (at least) one that contains no sets, lists, or RVAs as elements of
>>a tuple.
>
>
> ....on two conditions:

>
> "(1) The graph of interrelationships of the nonsimple
> domains is a collection of trees.
> (2) No primary key has a component domain which is
> nonsimple.
> The writer knows of no application which would require
> any relaxation of these conditions."
>
> Other writers claim they *do* know of such applications, however, and
> have proposed (semi-)formal guidelines for identifying cases where RVAs
> may be appropriate.
> ...

An example that some may not find very interesting (i.e., too simple) but
that still troubles me might be "the sets/combinations of parts that a
supplier will agree to ship", having no other attributes than S# and P#,
e.g., SP { S#, {P#} }, where I'm intending {P#} to mean a set of parts.
I'm interested to know of the other writers or what they say.

pc

dawn

unread,
May 2, 2006, 1:15:50 PM5/2/06
to

JOG wrote:
> dawn wrote:
> > JOG wrote:
> > > dawn wrote:
> > > > JOG wrote:
> > > > > dawn wrote:
> > > > > > Jan Hidders wrote:
> > > > > > > dawn wrote:
> > > > > > > > Jan Hidders wrote:
> > > > > > > > > dawn wrote:
> > [time for a snip]
> > > So your remaining objections with the RM establishment (as I have read
> > > them) seem to distill to whether a statement such as:
> > >
> > > "Barney is_colour green and is_colour purple"
> > >
> > > translates best to which of the following propositions:
> > >
> > > 1) colour(Barney, green) && colour(Barney, purple)
> > > 2) colour(Barney, {green, purple} )
> > >
> > > Would you not say that with a bit of logical manipulation we could
> > > probably show that the first is preferable? J.
> >
> > No. If there were no human beings in the mix, no input, and no output
> > to and from human beings.
>
> Well queries interact with the logical model, not humans

I beg to differ. The logical model does not initiate anything. There
is an asker and a responder. The asker is either a software program,
written by a person (or generated by specs written by a person) or a
person.

> or their
> notion of an entity directly. It's a layer down.

Whatever layer it is in, somewhere in the mix is a human being, often a
software developer (found in some layer of humanity).

> I have read and
> understand your standpoint on this, but I really think there is a
> distinction to be made - I try and describe it below.
>
> > and no interpretation of the meaning of the data ever
>
> I don't think this follows: both mean the same thing - the original
> proposition.

I don't recall what my above statement was about in order to respond to
this response.

> > then 1) has the charm of mathematical simplicity (1st order
> > logic) with no downside.
>
> Consider that the propositions:
>
> 1) "Barney has green fur and purple fur",
> "Oscar has green fur and black fur"
>
> are logically identical to:
>
> 2) "Barney and Oscar have green fur",
> "Barney has purple fur",
> "Oscar has black fur"
>
> If we're agreed there then consider that the first - in MV style -
> would give something like:
> colour( Barney, {green, purple} ) &&
> colour( Oscar, {green, black})
>

so far, so good

> But the second set of propositions in MV style gives us:
> colour( green, {Barney, Oscar} ) &&
> colour( purple, {Barney}) &&
> colour( black, {Oscar})

Not if you use solid multi-value data modeling. Barney and Oscar are
name attributes for strong entities, in this case of type Character. I
would not model strong entities in this way (an example of a best
practice). Strong entities are modeled as "entities" translating in
the case of MV to "files". Properties of entities such as colour [sic
;-) ] would go away if the strong entity were gone. So, I would model
that as a property list. The second proposition then becomes

Character( Barney, {green, purple} )
Character( Oscar, {green, black} )

> The first "human interface" refers to entities "Barney" and "Oscar".
> The second, has an "interface" consisting of the entities
> "green"-ness, "purple"-ness and "black"-ness.

Not entities. Entities are things: a person, place, thing, or
event. Person, place, and event (e.g. transaction) are easy enough to
identify. For things, if you cannot hit it with a stick or print it
out on paper, then think twice about whether it is an entity or a
property of an entity.

> Which is correct? Neither
> and both,

The first, not the second.

> because they are artifices. Who knows which will be
> appropriate to the user?

The systems analyst better be able to give it a good shot. If they
cannot tell whether green-ness is an entity or a property of an entity
wrt the organization's requirements, I would be very surprised.

> An XML or MV style prejudges that decision,

No, it makes a distinction that those modeling with the RM do not make.
I am an entity and my shoe size is a property of me. It can change.
It could even go away if my feet were amputated (sheesh, apologies for
the example), but only as long as I am an entity an organization cares
about would that property be relevant. If I am not of interest, then,
by definition, my feet are not of interest either. In other words, if
you are interested in shoe size, you are interested in me because my
shoe size is a property of me.

[I realize that if a company is doing a shoe size survey and has no
interest in collecting other data about people, then the shoe size
might be a property of a survey or some other entity.]

> and with large shared data, who's future use is unpredictable, it seems
> essential to me to avoid that interface-prison.

I disagree, as you have likely guessed. Yes, there are new
requirements regularly that prompt one change or other, but I would
dare say that there are more changes that would push a property like
color to go from single to multi-valued, prompting a schema redesign
(new table) than there are if you make color a multivalued attribute to
start with.

> With RDBMS, despite their 3VL issues, there is no prejudice as to this
> choice - both sets of propositions, 1 and 2, encode down to the same
> things - just as they should, being logically identical.
>
> > I reserve the right to change my mind on that, however. Cheers! --dawn
>
> I'd reckon that this is the most important quality anyone can have in
> life. That and a high tolerance for alcohol.

I have the first even if I argue until I fully understand another
position, but don't let me drink more than one martini. --dawn

Jan Hidders

unread,
May 2, 2006, 1:28:28 PM5/2/06
to

dawn wrote:
>
> OK, so you told me the jury was out regarding modeling and implementing
> with non-1NF data (or something like that);

Not exactly. What the jury is out on is whether you can still have both
efficiency and data-independence at the same time, which requires
powerful query optimization.

> then I told you that there
> are production systems out there, if the jury really wanted to know;

The jury knows. :-)

> then you said they have already gathered data; then I tried to ask
> "where is it?"; and you said "I don't know any, nor do I expect
> something like that to exist."

... in a published form that you can Google for. But that data is there
in the sense that you can talk to experienced developers and DBMS
implementors.

> So, I think we talked past each other on that one. If RVAs and other
> non-1NF structures are now accepted in theory and implemented in
> practice, then when should they be used?

When they bring you closer to your goals, such as for example
flexibility and efficiency. Whether that is the case or not may depend
upon the DBMS you are using.

> If we are going to (re-!)introduce nested structures, we should have a
> good idea when a logical data model should include them, or at least
> know some best practices. If a particular DBMS is not up to the task
> of working well with performance or ease of queries with the nested
> structures, then perhaps the actual implementation needs to normalize
> (1NF). But I'm not seeing nested structures yet in logical data models
> (not that I'm seeing all LDMs).

To add some confusion: there actually is such a thing as *the* logical
data model, called LDM. Google for "logical data model kuper vardi".
But I digress.

But what do you mean by "I'm not seeing nested structures"? Are we
talking about conceptual data models? About database schema designs in
practice? Proposed data modelling notations? Or what?

-- Jan Hidders

Bob Badour

unread,
May 2, 2006, 2:09:23 PM5/2/06
to
JOG wrote:

Sadly, claiming the right and having the fortitude of character to
exercise the right are very different things.

Bob Badour

unread,
May 2, 2006, 2:20:39 PM5/2/06
to
paul c wrote:

The problem with SP { S#, {P#} } in base relations is it brushes up
against the information principle. It introduces a 'thing' that one
cannot discuss as a simple value, and that 'thing' is a set of parts.

While I can see no particular argument against SP { S#, {P#} } as a
view, I can see some potential problems with having it as a base
relation. Of course, those problems are only potential problems.
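
To sketch the view option (nothing definitive here; plain SQL with a
dialect-specific ARRAY_AGG, as in PostgreSQL, standing in for the 'set of
parts', and names invented for illustration):

-- Flat base relation: one row per supplier/part offering.
CREATE TABLE sp (
    sno CHAR(3) NOT NULL,
    pno CHAR(3) NOT NULL,
    PRIMARY KEY (sno, pno)
);

-- The SP { S#, {P#} } shape derived rather than stored.
CREATE VIEW sp_nested AS
SELECT sno,
       ARRAY_AGG(pno) AS pnos   -- the 'set of parts' per supplier
FROM   sp
GROUP  BY sno;

The base relation keeps every part referable as a simple value; the
nesting is merely a derived presentation.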

dawn

unread,
May 2, 2006, 5:12:20 PM5/2/06
to
Jan Hidders wrote:
> dawn wrote:
> >
> > OK, so you told me the jury was out regarding modeling and implementing
> > with non-1NF data (or something like that);
>
> Not exactly. What the jury is out on is whether you can still have both
> efficiency and data-independence at the same time, which requires
> powerful query optimization.

Ah, that data independence thing again. There are plenty of
requirements and I definitely do not want changes in the need for
additional disk or any physical changes like that to prompt a change in
software, but I still don't grok this requirement fully. Can this
non-functional (theoretical?) requirement of "data independence" be
written as a series of functional requirements?

For example, do we want the requirement that if data are moved from
schema-A on host-A managed by subsidiary-A to schema-B on host-B
managed by subsidiary-B, then must not be a need to change the logical
data model used by the applications, so that applications can run
without changes simply by redirecting (outside of the apps) requests
for such data to another data source? Obviously, that would be nice.
Each such requirement has an associated cost.

I'd like to take the overall functional requirements for a database
management system (which are not the same for every organization, I
will grant) and optimize all together rather than declaring a single
non-functional requirement as fixed in stone, while users might not get
what they need. Obviously, we want to have maintainability,
reliability, security, and all other non-functional requirements met,
but these need to be turned into functional requirements, it seems,
before they can be tested. So what tests does a DBMS need to meet in
order to be branded as having "data independence"? Are all the
functional requirements in that category of equal importance?

> > then I told you that there
> > are production systems out there, if the jury really wanted to know;
>
> The jury knows. :-)

But you cannot point me to what the jury has found out, right?

> > then you said they have already gathered data; then I tried to ask
> > "where is it?"; and you said "I don't know any, nor do I expect
> > something like that to exist."
>
> ... in a published form that you can Google for. But that data is there
> in the sense that you can talk to experienced developers and DBMS
> implementors.

OK, I'm talkin' to you, Jan. How is XQuery performance against a
Cache' (MUMPS), U2 or D3/TigerLogic (PICK), or Viper implementation
going to compare to SQL queries on an SQL-DBMS if the first are NF2 and
the latter are normalized?

> > So, I think we talked past each other on that one. If RVAs and other
> > non-1NF structures are now accepted in theory and implemented in
> > practice, then when should they be used?
>
> When they bring you closer to your goals, such as for example
> flexibility and efficiency. Whether that is the case or not may depend
> upon the DBMS you are using.

Yes, in practice it will matter, but since we teach database theory in
higher ed, what should we teach about use of NF2 other than "it will
depend on your DBMS"? I would suggest that it is a bad idea to use
RVAs for strong entities, for example, while often a good idea to model
multivalued entity properties as RVAs or LVAs. Are logical data models
about performance? (Of course in practice they are, but what about in
theory?) If so, then we would need to teach logical data modeling for
a particular DBMS.

> > If we are going to (re-!)introduce nested structures, we should have a
> > good idea when a logical data model should include them, or at least
> > know some best practices. If a particular DBMS is not up to the task
> > of working well with performance or ease of queries with the nested
> > structures, then perhaps the actual implementation needs to normalize
> > (1NF). But I'm not seeing nested structures yet in logical data models
> > (not that I'm seeing all LDMs).
>
> To add some confusion; there actually is such a thing as *the* logical
> data model, called LDM. Google for "logical data model kuper vardi".
> But I digress.

Which I just did too by googling as suggested. A poor choice of names,
but I like di-graph models and I think I've glanced at this before, but
I'll add to the reading list.

> But what do you mean with "I'm not seeing nested structures". Are we
> talking about conceptual data models?

No, in conceptual data models you will have multivalues. There are
many defs for these terms, and if you consider the logical data model
to be database-dependent, then it is the implementation model, but that
is not quite how I'm thinking of it.

If I use the terms as described below, then what are some best
practices for the LDM?

LDM: logical organization for the data (not physical), like a
conceptual data model it is useful in a database independent way, used
by software developers who are writing software to be db-independent

IDM: implementation data model, this is the implementation of the
logical model optimized for a particular dbms. So, if one dbms does
well with a particular data approach and another does not, the CRUD
services written by software develoeprs for our database-independence
and used by application software developers will have the proper
translations for the specific supported data stores.

I recognize that a lot of SQL-DBMS's are not optimized for
implementation data models with RVAs, but there could still be best
practices with LDMs employing such -- what is a good theory of when to
model data as a nested set, list, or bag?

> About database schema designs in
> practice? Proposed data modelling notations? Or what?

LDMs in practice still (from what I have seen) exclude RVAs and what
I've read either ignores RVAs or suggests they be employed in a model
only rarely, in some extreme cases. The industry seems to be headed toward a
return to nested structures (with XML and RVAs, for example), so what
would be some best practices for when an LDM (as described above)
should include them? Is the answer based on Oracle or SQL Server
performance or can we recognize that there are some tools where nested
structures work (we could choose Cache' for example) and start learning
how to write NF2-ready LDMs. We can then let other DBMS's that want to
play join in when they are NF2-capable.

I'm just not finding anything about NF2 best practices, although I can
think of several. Do you know of such write-ups? Thanks. --dawn

paul c

unread,
May 2, 2006, 8:12:27 PM5/2/06
to
> ...


Interesting you should say that. I thought it was adhering quite well
to the IP, more so than the tack the practical people I've known would
likely have taken. But most of them were excessively practical and I
admit I'm not, probably excessively impractical for that matter. If I
were a supplier, I must admit I'd give my part set offerings
names/identifiers, otherwise nobody would order them. But I'd prefer
the system to make them up for me.


> While I can see no particular argument against SP { S#, {P#} } as a
> view, I can see some potential problems with having it as a base
> relation. Of course, those problems are only potential problems.

My interest is mostly so that I can reconcile such perhaps oddball views
before I finish with my little engine. Actually, I'm interested in it
as an internal expression for want of a better term, more so than as a
view, even though I wouldn't want to prevent such a view. I'm always
suspicious of restrictions in systems since they *usually* end up
demonstrating that the implementor didn't know exactly what to do next
(like most of the developers in the average large project, can't resist).

thanks for comment,
p

Bob Badour

unread,
May 2, 2006, 8:54:35 PM5/2/06
to
paul c wrote:

I said it brushes up against it. It doesn't violate it in the sense that
one can still refer to a set {P#} by the value of the entire set. But
that's an awkward way to refer to it -- especially when the value can
change with time.


> But most of them were excessively practical and I
> admit I'm not, probably excessively impractical for that matter. If I
> were a supplier, I must admit I'd give my part set offerings
> names/identifiers, otherwise nobody would order them. But I'd prefer
> the system to make them up for me.

Nothing I said precludes that. However, if SP { S#, {P#} } is your base
relation, how is the dbms going to refer to what it makes up?


>> While I can see no particular argument against SP { S#, {P#} } as a
>> view, I can see some potential problems with having it as a base
>> relation. Of course, those problems are only potential problems.
>
> My interest is mostly so that I can reconcile such perhaps oddball views
> before I finish with my little engine. Actually, I'm interested in it
> as an internal expression for want of a better term, more so than as a
> view, even though I wouldn't want to prevent such a view. I'm always
> suspicious of restrictions in systems since they *usually* end up
> demonstrating that the implementor didn't know exactly what to do next
> (like most of the developers in the average large project, can't resist).

Certainly such a construct would be useful for describing physical
clustering of data as well, but that ventures beyond the realm of the RM
per se.


> thanks for comment,
> p

You are very welcome.

paul c

unread,
May 2, 2006, 11:26:07 PM5/2/06
to
> ...


If you mean some part of a dbms, mentioning the relation might be the
same way as a developer might do that, eg., SP { S#, {P#} }. But to
sort of answer the question about what it is made up of, I don't think
it's necessary at this point to answer it, except to say that, obeying
the IP, one could regurgitate the parts set for the supplier one might
have in mind.
p

JOG

unread,
May 2, 2006, 11:41:10 PM5/2/06
to
dawn wrote:
>[chop]

> Not entities. Entities are things. It is a person, place, thing or
> event. Person, place, and event (e.g. transaction) are easy enough to
> identify. For things, if you cannot hit it with a stick or print it
> out on paper, then think twice about whether it is an entity or a
> property of an entity.

No, this is way off base - with ERM Chen wrote extensively about
associative entities for example, which are _exactly_ the sort of thing
that cannot be hit with a stick. But I don't want to get sidetracked by
this so hey, if the colour green as an abstract entity is too out
there, fair enough. Just substitute something more physical into
those original statements - fruits for the colours, for example, or
perhaps academic papers:

paper( paper1, {Barney, Oscar} ) &&
paper( paper2, {Barney}) &&
paper( paper3, {Oscar})

or

person( Barney, {paper1, paper2} ) &&
person( Oscar, {paper1, paper3} )

Everything here is clearly an entity. Both are strong. Both
representations come from those logically identical propositions - so
which is the right one?

Instinct might be to go for the first - papers have authors after all.
But my application's first job is to produce academic resumes - hence
the second is the more appropriate one for that 'interface'. Whichever
is picked, the choice enforces some form of arbitrary hierarchy. Unless
I encode both - is that preferable?
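
As a small illustration (the Python representation below is only a sketch,
not anything proposed in the thread), both nestings can be derived from the
same flat set of (person, paper) facts, so picking either one as the base
fixes an access path without adding information:

from collections import defaultdict

# The flat facts behind both nestings above.
authorship = {
    ("Barney", "paper1"), ("Oscar", "paper1"),
    ("Barney", "paper2"), ("Oscar", "paper3"),
}

def group_by(pairs, key_index, value_index):
    """Produce one of the nestings: {key: set of associated values}."""
    grouped = defaultdict(set)
    for pair in pairs:
        grouped[pair[key_index]].add(pair[value_index])
    return dict(grouped)

papers_view = group_by(authorship, key_index=1, value_index=0)
# {'paper1': {'Barney', 'Oscar'}, 'paper2': {'Barney'}, 'paper3': {'Oscar'}}

people_view = group_by(authorship, key_index=0, value_index=1)
# {'Barney': {'paper1', 'paper2'}, 'Oscar': {'paper1', 'paper3'}}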

>
> > Which is correct? Neither
> > and both,
>
> The first, not the second.
>
> > because they are artifices. Who knows which will be
> > appropriate to the user?
>
> The systems analyst better be able to give it a good shot. If they
> cannot tell whether green-ness is an entity or a property of an entity
> wrt to the organization's requirements, I would be very surprised.
>
> > An XML or MV style prejudges that decision,
>
> No, it makes a distinction that those modeling with the RM do not make.
> I am an entity and my shoe size is a property of me. It can change.
> It could even go away if my feet were amputated (sheesh, apologies for
> the example), but only as long as I am an entity an organization cares
> about would that property be relevant. If I am not of interest, then,
> by definition, my feet are not of interest either. In other words, if
> you are interested in shoe size, you are interested in me because my
> shoe size is a property of me.

Yes, a weak entity. This is indeed a good example of something that
does fit hierarchically, but it *doesn't* negate the fact that there
are many examples that don't at all. Using colours as universals was a
poor example on my part, as I feel it's given rise to this red herring.

>
> [I realize that if a company is doing a shoe size survey and has no
> interest in collecting other data about people, then the shoe size
> might be a property of a survey or some other entity.]
>
> > and with large shared data, who's future use is unpredictable, it seems
> > essential to me to avoid that interface-prison.
>
> I disagree, as you have likely guessed. Yes, there are new
> requirements regularly that prompt one change or other, but I would
> dare say that there are more changes that would push a property like
> color to go from single to multi-valued, prompting a schema redesign
> (new table) than there are if you make color a multivalued attribute to
> start with.

But two wrongs don't make a right. Cardinality change is something
I've mentioned before as an interesting area in schema change as a
whole - but it does not mean the theoretical solution should be to
imprison the user in a certain view of the data. All best, J.

Jon Heggland

unread,
May 3, 2006, 2:16:24 AM5/3/06
to
dawn wrote:
> Now that SQL-DBMS's include NF2 structures (RVAs + UDTs),

So now user-defined datatypes violate 1NF as well? Sheesh. What's the
reasoning (if any) behind that? Or does not the word (UDT) mean what I
think it means?
--
Jon

Jon Heggland

unread,
May 3, 2006, 2:40:45 AM5/3/06
to
paul c wrote:
> An example that some may not find very interesting, ie., too simple but
> still troubles me might be "the sets/combinations of parts that a
> supplier will agree to ship" having no other attributes than S# and P#,
> eg., SP { S#, {P#}} where I'm intending {P#} to mean a set of parts. I'm
> interested to know of the other writers or what they say.

The standard examples of Date & co. are relvars and their keys, and
(IIRC) functional dependencies. Those two both involve set-valued
attributes, and they can't be "flattened" without losing information
(unless you introduce an identifying attribute for each set). Your
example is equivalent, I think. Anyway, isn't that you on the TTM list?
You should know about Darwen's group-ungroup normal form, then.
--
Jon
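
As a rough illustration of the GROUP/UNGROUP idea just mentioned (the
dict-and-set representation and the sample data are illustrative
assumptions, not Darwen's formal definitions): grouping SP { S#, P# } into
SP { S#, {P#} } and back is lossless until a supplier is paired with an
empty part set, which has no flat counterpart.

sp_flat = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}

def group(pairs):
    """SP { S#, P# }  ->  SP { S#, {P#} }."""
    nested = {}
    for s, p in pairs:
        nested.setdefault(s, set()).add(p)
    return nested

def ungroup(nested):
    """SP { S#, {P#} }  ->  SP { S#, P# }; empty part sets simply vanish."""
    return {(s, p) for s, parts in nested.items() for p in parts}

sp_nested = group(sp_flat)            # {'S1': {'P1', 'P2'}, 'S2': {'P1'}}
assert ungroup(sp_nested) == sp_flat  # the round trip is lossless so far

sp_nested["S3"] = set()               # a supplier willing to ship nothing
assert ungroup(sp_nested) == sp_flat  # S3 disappears after ungrouping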

Jon Heggland

unread,
May 3, 2006, 2:58:27 AM5/3/06
to
dawn wrote:

> Jon Heggland wrote:
>> Other writers claim they *do* know of such applications, however, and
>> have proposed (semi-)formal guidelines for identifying cases where RVAs
>> may be appropriate.
>
> Yes, but what about the use of RVAs even when the application could be
> implemented with simple domains?

"Simple" meaning "not relation-valued"?

> For example, we model data for XML
> documents for data exchange using multivalues even if we could do
> otherwise.

You do? Why?

I don't see the relationship between XML, RVAs and normalisation.
Normalisation by definition applies to relations/relvars only. (I must
confess I haven't seen the light with regard to using XML for exchange
of data to/from relational systems either.)

> Are there best practices for that?

You tell me. Isn't "modelling" "data for exchange" just producing
reports? In which case you do what is most convenient for the task at
hand. Which is very different from designing the database, where you (or
at least I) want it (i.e. the logical model) to be as orthogonal,
unbiased and simple as possible.

> Would any of those best
> practices be applicable to modeling persisted data as well? If data
> for exchange is persisted (perhaps in a database such as IBM Viper when
> that comes out), it would make sense that the best practices for
> modeling data for exchange and persistence would overlap, right?

I don't know, but I'm skeptical. Why do you want to "persist" "data for
exchange"? (I'm not really comfortable with either of those terms.)
--
Jon

dawn

unread,
May 3, 2006, 4:17:59 AM5/3/06
to
JOG wrote:
> dawn wrote:
> >[chop]
> > Not entities. Entities are things. It is a person, place, thing or
> > event. Person, place, and event (e.g. transaction) are easy enough to
> > identify. For things, if you cannot hit it with a stick or print it
> > out on paper, then think twice about whether it is an entity or a
> > property of an entity.
>
> No, this is way off base - with ERM Chen wrote extensively about
> associative entities for example, which are _exactly_ the sort of thing
> that cannot be hit with a stick.

Well aware. "Entity" is an overloaded term, of course. If using an
ERD approach, the E's you start with (not those used to handle
many-to-manys, nor the R's that turn into E's) are the basics I'm
referring to as entities. I also said to think twice, not to exclude
the possibility. Starting with the basics, look at your problem space
and identify people, places, events, and "things."

> But I don't want to get sidetracked by
> this so hey, if the colour green as an abstract entity is too out
> there, fair enough. Just substitute in something more physical into
> those original statements - fruits for the colours for example, or
> perhaps academic papers:

If you are going to substitute something I can throw for something I
cannot, then we have a new entity, not just a property of an entity.
While color is a property of our characters, a paper is not. I would
model it as a separate entity in the conceptual data model (if we are
using UML class diagrams and not ORM, for example, then I would
indicate color as an attribute in the Character class and Paper as a
separate class).

> paper( paper1, {Barney, Oscar} ) &&
> paper( paper2, {Barney}) &&
> paper( paper3, {Oscar})
>
> or
>
> person( Barney, {paper1, paper2} ) &&
> person( Oscar, {paper1, paper3} )
>
> Everything here is clearly an entity. Both are strong.

Yes.

> Both
> representations come from those logically identical propositions - so
> which is the right one?

Both and neither. These are fine as derivations, but I would model
these propositions the same way you would for implementation, most
likely. It is "best practice" to model strong entities as separate
classes/files/tables/relations/functions/types... However, if the
problem domain is such that our organization could care less about
papers (today) and sees (recognizing we re not all-seeing) nothing on
the horizon that would suggest that it is a big risk factor -- nothing
suggesting our organization will care about any properties of these
papers, we only want to record that Barney has paper1 and paper2 while
Oscar has paper1 and paper3, then in a move from the conceptual to the
logical, we might decide to treat this strong entity as if it were
merely a property. Properties that really are properties are cheaper
than entities. Entities that can be implemented as properties might be
cheaper than entities implemented as entities if you do not have to
change them down the line. What happens if we change our mind in the
future? We "refactor" our solution. We change it to meet our new
requirements. It rarely makes good business sense to design anything
so that it mitigates all possible changes -- risk assessment is key.

> Instinct might be to go for the first - papers have authors after all.
> But my application's first job is to produce academic resumes - hence
> the second is the more appropriate one for that 'interface'. Whichever
> is picked, the choice enforces some form of arbitrary hierarchy.

You are correct now that you have changed the question to two strong
entities.

> Unless
> I encode both - is that preferable?

It might be preferable that both be views of the data (which SQL-92
would not permit, but perhaps people are now coding NF2 views of their
1NF data?), but I would model both as derivations, not base tables.

> >
> > > Which is correct? Neither
> > > and both,
> >
> > The first, not the second.
> >
> > > because they are artifices. Who knows which will be
> > > appropriate to the user?
> >
> > The systems analyst better be able to give it a good shot. If they
> > cannot tell whether green-ness is an entity or a property of an entity
> > wrt to the organization's requirements, I would be very surprised.
> >
> > > An XML or MV style prejudges that decision,
> >
> > No, it makes a distinction that those modeling with the RM do not make.
> > I am an entity and my shoe size is a property of me. It can change.
> > It could even go away if my feet were amputated (sheesh, apologies for
> > the example), but only as long as I am an entity an organization cares
> > about would that property be relevant. If I am not of interest, then,
> > by definition, my feet are not of interest either. In other words, if
> > you are interested in shoe size, you are interested in me because my
> > shoe size is a property of me.
>
> Yes, a weak entity. This is indeed a good example of something that
> does fit hierarchically,

happy, happy, joy, joy!

> but it *doesn't* negate the fact that there
> are many examples that don't at all.

Of course, of course. You don't want to take your entire problem
domain and start willy-nilly nesting relations. It is frustrating to
look at some XML doc designs. Give people, many of whom have been
taught normalization, an inch and they seem to botch it all up. There
are best practices for modeling data when you can use multivalues. You
need to be able to see the HUGE difference between Oscar's color and
Oscar's paper. If I no longer care about Oscar, I no longer care
about his color either.

E-mail addresses are in the fuzzy area where you need to make a call.
It typically makes sense that if a person is no longer of any interest
to your system, then their e-mail address is not either. So, the
e-mail address can be seen as a property of the person. Of course it
is the case that one person could ditch an e-mail address and another
adopt it, so they do each have a life apart from each other, but there
is a low risk for most orgs that their requirements will change enough
to separate e-mail address from person.

> Using colours as universals was a
> poor example on my part as I feel its given rise to this red-herring.

No, no, no -- it was a GREAT example. You can see I jumped for joy
above when you could see it as I did -- color as a multi-valued
property of a strong entity. Then changing your example and seeing how
I changed with it might have further helped.

> > [I realize that if a company is doing a shoe size survey and has no
> > interest in collecting other data about people, then the shoe size
> > might be a property of a survey or some other entity.]
> >
> > > and with large shared data, who's future use is unpredictable, it seems
> > > essential to me to avoid that interface-prison.
> >
> > I disagree, as you have likely guessed. Yes, there are new
> > requirements regularly that prompt one change or other, but I would
> > dare say that there are more changes that would push a property like
> > color to go from single to multi-valued, prompting a schema redesign
> > (new table) than there are if you make color a multivalued attribute to
> > start with.
>
> But, two wrongs don't make a right. Cardinality change is something
> i've mentioned before as an interesting area in schema change as a
> whole - but it does not mean the theoretical solution should be to
> imprison the user into a certain view of the data. all best, J.

I agree. Rule of thumb -- model strong entities as separate, uh,
classes? relations? (pick your favorite collective term, and yes I'm
aware that some think it is a mistake to compare these two terms in any
way). Model properties, whether single or multi-valued, as attributes
of these. Make a call, based on risk assessment, whether you should
turn an entity into a property at the risk of needing to make a change
later. Properties that can stay properties are cheaper than entities.
Now if a property was single-valued and becomes multi-valued or vice
versa you flip a bit on it.
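
As a hedged sketch of the "flip a bit" claim (the Python layout below is
purely illustrative): with a multivalued attribute, going from one colour
to several is a data change, while with strictly single-valued attributes
it becomes a schema change, since a separate relation has to be introduced.

# Multivalued style: colour is a set-valued property of the character.
characters_mv = {"Oscar": {"colours": {"green"}}}
characters_mv["Oscar"]["colours"].add("orange")   # no schema change needed

# Strict 1NF style: a second colour forces a new character_colour relation,
# and the old single-valued colour column goes away.
characters_1nf = {"Oscar": {}}
character_colour = {("Oscar", "green"), ("Oscar", "orange")}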

Whacha think? --dawn

dawn

unread,
May 3, 2006, 4:32:22 AM5/3/06
to

Jon Heggland wrote:
> dawn wrote:
> > Jon Heggland wrote:
> >> Other writers claim they *do* know of such applications, however, and
> >> have proposed (semi-)formal guidelines for identifying cases where RVAs
> >> may be appropriate.
> >
> > Yes, but what about the use of RVAs even when the application could be
> > implemented with simple domains?
>
> "Simple" meaning "not relation-valued"?

Yes.

> > For example, we model data for XML
> > documents for data exchange using multivalues even if we could do
> > otherwise.
>
> You do? Why?

For the same reason I use a multivalued database, perhaps.

> I don't see the relationship between XML, RVAs and normalisation.
> Normalisation by definition applies to relations/relvars only.

You can model an XML document with a relation; you just need
list-valued attributes and RVAs to do so.
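
As a minimal sketch of that claim (the XML fragment and the tuple layout
are illustrative assumptions, not a feature of any particular product):
each element becomes a tuple whose children attribute is list-valued,
since document order matters.

import xml.etree.ElementTree as ET

xml_fragment = '<order id="42"><item sku="A1"/><item sku="B7"/></order>'

def element_as_tuple(elem):
    """Represent one element as (tag, attribute map, list-valued children)."""
    return {
        "tag": elem.tag,
        "attributes": dict(elem.attrib),
        "children": [element_as_tuple(child) for child in elem],
    }

order = element_as_tuple(ET.fromstring(xml_fragment))
# {'tag': 'order', 'attributes': {'id': '42'},
#  'children': [{'tag': 'item', 'attributes': {'sku': 'A1'}, 'children': []},
#               {'tag': 'item', 'attributes': {'sku': 'B7'}, 'children': []}]}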

> (I must
> confess I haven't seen the light with regard to using XML for exchange
> of data to/from relational systems either.)

But it is the relational systems that are going to change to
accommodate it, and there is a reason for that.

> > Are there best practices for that?
>
> You tell me.

Yes, there are, but I don't see them written up anywhere. Ever since
the RM busted onto the scene, business data processing types who use
NF2 approaches seem to have gone into hiding, just going about their
business but talking only amongst themselves so as not to ruffle the RM
feathers (or something). I honestly don't know why I can't find much
related to the practice of data modeling in an NF2 world. That is gonna
have to change, I would think.

> Isn't "modelling" "data for exchange" just producing
> reports?

Output on one side, input on the other. You can think of it as
decidedly different from database data modeling until you decide you
want to persist and report on the exchange data.

> In which case you do what is most convenient for the task at
> hand. Which is very different from designing the database, where you (or
> at least I) want it (i.e. the logical model) to be as orthogonal,
> unbiased and simple as possible.

In order to model as much meaning as feasible, I want my data modeled
in a biased way (a way biased toward meaning where color and paper are
different, see chat with JOG on this topic), and what I think is simple
is not backed by the simplest mathematical model.

> > Would any of those best
> > practices be applicable to modeling persisted data as well? If data
> > for exchange is persisted (perhaps in a database such as IBM Viper when
> > that comes out), it would make sense that the best practices for
> > modeling data for exchange and persistence would overlap, right?
>
> I don't know, but I'm skeptical. Why do you want to "persist" "data for
> exchange"? (I'm not really comfortable with either of those terms.)

OK, I want to store all of the transactions coming in from web
services. Is that any better? --dawn

dawn

unread,
May 3, 2006, 4:37:14 AM5/3/06
to
Jon Heggland wrote:
> dawn wrote:
> > Now that SQL-DBMS's include NF2 structures (RVAs + UDTs),
>
> So now user-defined datatypes violate 1NF as well?

Only if they would be in the category of repeating groups or "nonsimple
domains" such as list-valued attributes.

> Sheesh. What's the
> reasoning (if any)

lists

> behind that? Or does not the word (UDT) mean what I
> think it means?

It does. I didn't want to equate NF2 with RVAs since my favorite
non-1NF structure is still the oft-forbidden list. cheers! --dawn

Bob Badour

unread,
May 3, 2006, 6:54:48 AM5/3/06
to
JOG wrote:

> dawn wrote:
>
>>[chop]
>>Not entities. Entities are things. It is a person, place, thing or
>>event. Person, place, and event (e.g. transaction) are easy enough to
>>identify. For things, if you cannot hit it with a stick or print it
>>out on paper, then think twice about whether it is an entity or a
>>property of an entity.
>
>
> No, this is way off base - with ERM Chen wrote extensively about
> associative entities for example, which are _exactly_ the sort of thing
> that cannot be hit with a stick. But I don't want to get sidetracked by
> this so hey, if the colour green as an abstract entity is too out
> there, fair enough. Just substitute in something more physical into
> those original statements - fruits for the colours for example, or
> perhaps academic papers:

Keep in mind the ignorant you address is utterly incapable of abstract
reasoning. Even with the concrete examples, she doesn't stand a chance
of ever grasping a clue.

Jon Heggland

unread,
May 3, 2006, 8:47:03 AM5/3/06
to
dawn wrote:
> Jon Heggland wrote:
>> So now user-defined datatypes violate 1NF as well?
>
> Only if they would be in the category of repeating groups or "nonsimple
> domains" such as list-valued attributes.

Then please say so. You're confusing the issue even more than usual with
such imprecise statements. Whether a type is user-defined or not is
supremely irrelevant.

(BTW, what "group" is "repeated" in a list of (say) integers?)
--
Jon

Bob Badour

unread,
May 3, 2006, 9:13:42 AM5/3/06
to
Jon Heggland wrote:

Jon, she's a self-aggrandizing ignorant and a crank. You are wasting
your time.
