Dreaming About Redesigning SQL

Seun Osewa

unread,

Oct 3, 2003, 2:52:15 PM10/3/03

to

Hi,

This is for relational database theory experts on one hand and
imlementers of real-world alications on the other hand. If there was
a chance to start again and design SQL afresh, for best
cleaness/power/performance what changes would you make? What would
_your_ query language (and the underlying database concept) look like?

Seun Osewa
PS: I should want to post my ideas too for review but more
experienced/qualified people should come first

Christopher Browne

unread,

Oct 3, 2003, 5:39:03 PM10/3/03

to

After takin a swig o' Arrakan spice grog, seun...@inaira.com (Seun Osewa) belched out...:

> This is for relational database theory experts on one hand and
> imlementers of real-world alications on the other hand. If there was
> a chance to start again and design SQL afresh, for best
> cleaness/power/performance what changes would you make? What would
> _your_ query language (and the underlying database concept) look
> like?

There are two notable 'projects' out there:

1. There's Darwen and Date's "Tutorial D" language, defined as part
of their "Third Manifesto" about relational databases.

2. newSQL <http://newsql.sourceforge.net/>, where they are studying
two syntaxes, one based on Java, and one based on a
simplification (to my mind, oversimplification) of SQL.

The "newSQL" project suffers from their definition being something of
a "chip away everything that doesn't look like an elephant"
definition. They aren't defining, in "mathematical" terms, what their
language is supposed to be able to express; they are instead defining
a big grab-bag of minor syntactic features, and seem to expect that a
database system will emerge from that.

In contrast, "Tutorial D" is _all_ about mathematical definition of
what it is supposed to express, and the text is a tough read,
irrespective of other merits.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://cbbrowne.com/info/thirdmanifesto.html
DOS: n., A small annoying boot virus that causes random spontaneous
system crashes, usually just before saving a massive project. Easily
cured by Unix. See also MS-DOS, IBM-DOS, DR-DOS.
-- from David Vicker's .plan

Seun Osewa

unread,

Oct 3, 2003, 9:59:06 PM10/3/03

to

Thanks for the links.

Christopher Browne <cbbr...@acm.org> wrote in message news:<blkq9n$d9puv$4...@ID-125932.news.uni-berlin.de>...

> There are two notable 'projects' out there:
>
> 1. There's Darwen and Date's "Tutorial D" language, defined as part
> of their "Third Manifesto" about relational databases.
>
> 2. newSQL <http://newsql.sourceforge.net/>, where they are studying
> two syntaxes, one based on Java, and one based on a
> simplification (to my mind, oversimplification) of SQL.

I was able to get a pdf coy of the "Third Manifesto" article here:
http://citeseer.nj.nec.com/darwen95third.html
but the details of tutorial D seem not to be a part of that article.
NewSQL *might* be cool if someone found reason to use it in a DBMS.

Sometimes I wonder why its so important to model data in the "rela-
tional way", to think of data in form of sets of tuples rather than
tables or lists or whatever. I mean, though its elegant and based
on mathematical principles I would like to know why its the _right_
model to follow in designing a DBMS (or database). The way my mind
sees it, should we not rather be interested in what works?

Seun Osewa

Christopher Browne

unread,

Oct 4, 2003, 10:40:39 AM10/4/03

to

The world rejoiced as mas...@mascari.com (Mike Mascari) wrote:
> It's a very provocative read. At a minimum, one can learn what to
> avoid with SQL. The language looks neat on paper. Perhaps one day
> someone will provide an open source implementation. One could envision
> a "D" project along the same lines as the same sort of project that
> added SQL to Postgres...

I think you summed it up nicely. The "manifesto" is a provocative, if
painful, read. It is very useful at pointing out "pointy edges" of
SQL that might be wise to avoid.

I'm not thrilled with the language; I think they have made a mistake
in trying to make it too abbreviation-oriented. They keep operator
names short, to a fault.

As you say, the most likely way for a "D" to emerge in a popular way
would be by someone adding the language to an existing database
system.

There is a project out on SourceForge for a "D implementation," called
"Duro." It takes the opposite approach; the operators are all defined
as C functions, so you write all your code in C. It uses a data store
built atop Berkeley DB.

I think an implementor would be better off using an SQL database
underneath, and using their code layer in between to accomplish the
"divorce" from the aspects of SQL that they disapprove of. Sort of
like MaVerick, a Pick implementation in Java that uses a DBMS such as
PostgreSQL as the underlying data store.

You do a "proof of concept" by building something that translates D
requests to SQL requests. And _then_ get a project going to put a "D
parser" in as an alternative to the SQL parser. (Yes, that
oversimplifies matters. Tough...)
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/rdbms.html
Rules of the Evil Overlord #81. "If I am fighting with the hero atop a
moving platform, have disarmed him, and am about to finish him off and
he glances behind me and drops flat, I too will drop flat instead of
quizzically turning around to find out what he saw."
<http://www.eviloverlord.com/>

Seun Osewa

unread,

Oct 4, 2003, 11:49:08 AM10/4/03

to

Replies to this Thread From PostgreSQL Hackers
************************************************
mas...@mascari.com (Mike Mascari) wrote in message news:<3F7E3F49...@mascari.com>...
I read the Third Manifesto. There are many ideas in the TTM that have
strong arguments, although I most confess I haven't read any
critiques. A few (of many) points:

1) Strict adherence to the relational model, where all of SQL's
short-comings are addressed:

A) No attribute ordering
B) No tuple ordering (sets aren't ordered)
C) No duplicate tuples (relations are sets)
D) No nulls (2VL sufficient. Missing information is meta-data)
E) No nullogical mistakes (ex: SUM of an empty relation is zero, AVG
is an error)
F) Generalized transitive closure
G) Declared attribute, relation variable, and database constraints,
including transition constraints
H) Candidate keys required (this has positive logical consequences for
the DBMS implementor)
I) Tuple and relation-valued attributes
J) No tuple-level operations

a bunch more...

2) The query language should be computationally complete. The user
should be able to author complete applications in the language, rather
than the language being a sublanguage. This reverses Codd's query
sublanguage proposed in "A Relational Model of Data for Large Shared
Data Banks"

http://www.acm.org/classics/nov95/s1p5.html

<sarcasm>
Thanks ACM for just putting part of the paper on-line, complete with
broken links and spelling errors!
</sarcasm>

3) The language (a D implementation) would ensure a separation between
the logical design of the application and the physical implementation.
The programmer should think in terms of the evaluation of relational
algebraic expressions, not manipulating physical records in disk
blocks in a file.

4) The type system should separate the actual, internal representation
from the possible representation, of which there might be many. For
example, a POINT may be internally expressed in cartesian coordinates
but may supply both polar and cartensian THE_ operators.

5) The type system should implement D & D's view of multiple
inheritance, where read-operators are inherited but write-operators
aren't. This eliminates the "Is a Circle an Ellipse?" dilemma imposed
by C++, for example. IOW, in a "D" language, a Circle is an Ellipse.

They reject Stonebreaker's ideas of OIDs and relation variable
inheritance, which of course, are in PostgreSQL.

It's a very provocative read. At a minimum, one can learn what to
avoid with SQL. The language looks neat on paper. Perhaps one day
someone will provide an open source implementation. One could envision
a "D" project along the same lines as the same sort of project that
added SQL to Postgres...

But I'd rather have PITR :-)

Mike Mascari
mas...@mascari.com
************************************************

************************************************
ha...@tm.ee (Hannu Krosing) wrote in message news:<1065254989....@fuji.krosing.net>...

Mike Mascari kirjutas L, 04.10.2003 kell 06:32:
>
> 2) The query language should be computationally complete. The user
> should be able to author complete applications in the language, rather
> than the language being a sublanguage.

To me it seems like requiring that one should be able to author complete
programs in regex.

Yes, when all you have is a hammer everything looks like a nail ;)

----------------
Hannu
************************************************

Lee Fesperman

unread,

Oct 4, 2003, 11:23:55 PM10/4/03

to

Seun Osewa wrote:
>
> Sometimes I wonder why its so important to model data in the "rela-
> tional way", to think of data in form of sets of tuples rather than
> tables or lists or whatever. I mean, though its elegant and based
> on mathematical principles I would like to know why its the _right_
> model to follow in designing a DBMS (or database). The way my mind
> sees it, should we not rather be interested in what works?

Relational is the _right_ model because 'it works'. It's the only truly comprehensive
data model and subject of decades of research. All other data models have been found to
be flawed and (nearly) discarded.

If you don't care for mathematical principles, there's always ad-hoc database models.
Check out Pick, OO and XML databases. They're interested in what works and ignore
elegance and mathematical principles.

--
Lee Fesperman, FirstSQL, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)

Lauri Pietarinen

unread,

Oct 5, 2003, 3:47:44 PM10/5/03

to

Christopher Browne <cbbr...@acm.org> wrote in message news:<m3lls1v...@wolfe.cbbrowne.com>...

> I think an implementor would be better off using an SQL database
> underneath, and using their code layer in between to accomplish the
> "divorce" from the aspects of SQL that they disapprove of.

That is, in fact, the approach taken in a product called Dataphor
(see www.alphora.com). They have implemented a "D"-language (called D4)
that translates into SQL and hence uses underlying SQLServer, Oracle
or DB2- DBMS'es as the engine.

It is, however, not a very easy mapping to do and you have to resort
to all sorts of unclean stuff to make it work...

regards,
Lauri Pietarinen

Anthony W. Youngman

unread,

Oct 5, 2003, 4:46:57 PM10/5/03

to

In article <ba87a3cf.03100...@posting.google.com>, Seun Osewa
<seun...@inaira.com> writes
>Thanks for the links.
>
>Christopher Browne <cbbr...@acm.org> wrote in message news:<blkq9n$d9puv$4@ID-

>125932.news.uni-berlin.de>...
>> There are two notable 'projects' out there:
>>
>> 1. There's Darwen and Date's "Tutorial D" language, defined as part
>> of their "Third Manifesto" about relational databases.
>>
>> 2. newSQL <http://newsql.sourceforge.net/>, where they are studying
>> two syntaxes, one based on Java, and one based on a
>> simplification (to my mind, oversimplification) of SQL.
>
>I was able to get a pdf coy of the "Third Manifesto" article here:
>http://citeseer.nj.nec.com/darwen95third.html
>but the details of tutorial D seem not to be a part of that article.
>NewSQL *might* be cool if someone found reason to use it in a DBMS.

Is Darwen and Date's stuff that where they said SQL was crap. As I
understand it, within about a year of designing SQL, at least one of
Codd and Date said it was rubbish and tried to replace it with something
"better".

>
>Sometimes I wonder why its so important to model data in the "rela-
>tional way", to think of data in form of sets of tuples rather than
>tables or lists or whatever. I mean, though its elegant and based
>on mathematical principles I would like to know why its the _right_
>model to follow in designing a DBMS (or database). The way my mind
>sees it, should we not rather be interested in what works?
>

I couldn't agree more (of course I would). As I like to put it, surely
Occam's Razor says that stuffing the four-dimensional world into a flat-
earth database can't be the optimal solution!

The trouble with so many SQL advocates is that they are so convinced in
the mathematical rightness of the relational model, that they forget it
is a *model* and, as such, needs to be shown as relevant to the real
world.

That said, I always think relationally when designing databases - it
helps. Look at the multi-value databases. Think relationally, you can
still store your data in normal form, but you're not stuffed by all the
irrelevant restrictions that relational databases tend to impose.

Get a freebie copy of jBASE, UniVerse or UniData, and try them out :-)

Cheers,
Wol
--
Anthony W. Youngman <pi...@thewolery.demon.co.uk>
'Yings, yow graley yin! Suz ae rikt dheu,' said the blue man, taking the
thimble. 'What *is* he?' said Magrat. 'They're gnomes,' said Nanny. The man
lowered the thimble. 'Pictsies!' Carpe Jugulum, Terry Pratchett 1998
Visit the MaVerick web-site - <http://www.maverick-dbms.org> Open Source Pick

Anthony W. Youngman

unread,

Oct 5, 2003, 4:51:17 PM10/5/03

to

In article <3F7F8E...@ix.netcom.com>, Lee Fesperman
<firs...@ix.netcom.com> writes

>If you don't care for mathematical principles, there's always ad-hoc database
>models.
>Check out Pick, OO and XML databases. They're interested in what works and
>ignore
>elegance and mathematical principles.

Mathematical principles? You mean like Euclidean Geometry and Newtonian
Mechanics? They're perfectly solid, good, mathematically correct. Shame
they don't actually WORK all the time in the real world.

That's what I feel about relational, too ...

Cheers,
Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. "Let
me through. I'm a nosey person.", she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett

Bob Badour

unread,

Oct 6, 2003, 5:59:28 PM10/6/03

to

"Anthony W. Youngman" <thew...@nospam.demon.co.uk> wrote in message news:<xTDLP1CFRIg$Ew...@thewolery.demon.co.uk>...

> In article <3F7F8E...@ix.netcom.com>, Lee Fesperman
> <firs...@ix.netcom.com> writes
> >If you don't care for mathematical principles, there's always ad-hoc database
> >models.
> >Check out Pick, OO and XML databases. They're interested in what works and
> >ignore
> >elegance and mathematical principles.
>
> Mathematical principles? You mean like Euclidean Geometry and Newtonian
> Mechanics? They're perfectly solid, good, mathematically correct. Shame
> they don't actually WORK all the time in the real world.
>
> That's what I feel about relational, too ...

That explains the generally poor quality of your posts. You substitute
emotion for reason.

Bob Badour

unread,

Oct 6, 2003, 6:03:14 PM10/6/03

to

"Anthony W. Youngman" <thew...@nospam.demon.co.uk> wrote in message news:<qTyNfpCBNIg$Ew...@thewolery.demon.co.uk>...

That's a compelling argument to avoid pick as a flat file processor,
and a strong argument for representing n-dimensional data in n-ary
relations.

> The trouble with so many SQL advocates is that they are so convinced in
> the mathematical rightness of the relational model, that they forget it
> is a *model* and, as such, needs to be shown as relevant to the real
> world.

The trouble with pick/mv advocates is they are ignorant, stupid and
irrational buffoons convinced in the superiority of their product in
spite of all contradictory evidence.

Mike Sherrill

unread,

Oct 6, 2003, 10:47:50 PM10/6/03

to

On 3 Oct 2003 21:39:03 GMT, Christopher Browne <cbbr...@acm.org>
wrote:

>There are two notable 'projects' out there:
>
> 1. There's Darwen and Date's "Tutorial D" language, defined as part
> of their "Third Manifesto" about relational databases.
>
> 2. newSQL <http://newsql.sourceforge.net/>, where they are studying
> two syntaxes, one based on Java, and one based on a
> simplification (to my mind, oversimplification) of SQL.

ISTR that Terry Halpin (of ORM fame) designed a language named
"ConQuer". I don't know the details, but I think Date's latest
edition refers to it in a note. Halpin's working on Visio at
Microsoft now, I think.

--
Mike Sherrill
Information Management Systems

Seun Osewa

unread,

Oct 7, 2003, 2:07:33 AM10/7/03

to

One question that I think must be looked into is this: If SQL
databases are successful today, is it because:
** of the relational _model_ they are based on?

** of the ease with which SQL can be used from within all programming
languages and as an interactive query language?

** The failure or earlier models and the support of major SQL database
vendors once it reached critical mass of adoption?

In other words do we have the model, the language, or standardisation
to blame/praise for the popularity of the relational model?

I would also like to know the classical arguments against the network
model or other "pointer based" models. The only things I know are
that:

** using pointers to positions in memory or disk can be messy when
data has to be moved around. But then is seems there are several
simple ways to solve this, e.g. what I can only call "logical
pointers".

** The difficulty of performing adhoc queries. but I want to think
that if there is a procedural (query?) language many of the advanced
features of SQL e.g. group by and sorting of the data can be done with
user-generated procedural code. Then I observe that most databases in
the world today are being accessed by asp, php or perl web scripts and
only recieve adhoc queries during the development stage. These
queries are written by developers who are skilled enough in procedural
programming not to have problems if they needed such code to access
the data.

Other than that, what's wrong with the network (or similar) models? I
have not found a good link to such a discussion though I hear it
repeated that there are certain classical arguments against them.

Seun Osewa

Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F7F8E...@ix.netcom.com>...

Seun Osewa

unread,

Oct 7, 2003, 5:19:36 AM10/7/03

to

What do you think about the so-called "Associative Model of Data"

"The associative model divides the real-world things about which data
is to be recorded into two sorts: Entities are things that have
discrete, independent existence. An entity's existence does not depend
on any other thing. Associations are things whose existence depends on
one or more other things, such that if any of those things ceases to
exist, then the thing itself ceases to exist or becomes meaningless."

Whitepaper(its ok to skip over the hype):
http://www.lazysoft.com/docs/other_docs/amd_whitepaper.pdf

==Unrelated to the above, but somewhat motivated by it==
I am beginning to think there may a place for a database model that
consists of the most commonly used subset of relational, SQL or
aiming-to-be-relational databases. For example, I may want to make a
generalization of the observation that so many tables begin with
CREATE TABLE XXX (ID INT AUTO_INCREMENT PRIMARY KEY ... ) unless there
is a natural unique key, and that those that don't are often used
merely to indicate many-to-many relationships: CREATE TABLE COUPLES
(MAN_ID INT REFERENCES MEN, WOMAN_ID INT REFERENCES WOMEN).

Now even in the case where there is a natural primary key for a table
(hope my terminology is right) e.g. a social security number for a
PERSONS table, a product part number (like METPLAS012A) for a
PRODUCT_PARTS table, etc, if we can figure a way to generate an
integer primary key (from 0 to 2^32 or 2^64 or as the case may be),
then its easy to think of a technique related to hashing that
eliminates the need for another index on the "natural primary key" and
makes lookups/joins for the table using the "natural" key as fast as
operations using the "meaningless" integer key.

Sometimes I think that, while the the separation of logical from
physical model is a nice concept, it should not prevent those who
understand the underlying physical implementation and the data access
patterns of their target application from tweaking things to achieve
acceptable performance... things they would not have to tweak if their
performance was ok by default. I want to compare it to things like
C++ inline keyword, which help to make things faster in the case where
the programmer knows a little bit more than the compiler. I want to
say if the application designer knows a lower-level way to get his
applications to run faster, give him the chance, that a good language
should give power to the user who wants to go the extra mile.

So pls help me point out any flaws there might be in my logic!

A million thanks,

Seun Osewa.

Jan Hidders

unread,

Oct 7, 2003, 3:37:25 PM10/7/03

to

Seun Osewa wrote:
> What do you think about the so-called "Associative Model of Data"

Looks like a slightly scrambled and rather limited version of ORM.
Certainly nothing new there.

> ==Unrelated to the above, but somewhat motivated by it==
> I am beginning to think there may a place for a database model that
> consists of the most commonly used subset of relational, SQL or
> aiming-to-be-relational databases.

Has already been done. See RM/T by Codd.

> Sometimes I think that, while the the separation of logical from
> physical model is a nice concept, it should not prevent those who
> understand the underlying physical implementation and the data access
> patterns of their target application from tweaking things to achieve
> acceptable performance...

That depends on your situation. If we are talking about a large database
with many users and/or applications that access it then it should be the
DBA that decides what is optimized for whom so he or she gets to decide
what exactly the underlying physical implementation is. In that case the
programmer does not and cannot know what the underlying physical
implementation is. The whole idea of data independence is that the DBA
is able to change this without the application noticing this, except for
a difference in performance, of course.

-- Jan Hidders

Jan Hidders

unread,

Oct 7, 2003, 3:51:39 PM10/7/03

to

Seun Osewa wrote:
> In other words do we have the model, the language, or standardisation
> to blame/praise for the popularity of the relational model?

Don't forget market domination.

> I would also like to know the classical arguments against the network
> model or other "pointer based" models. The only things I know are
> that:
>
> ** using pointers to positions in memory or disk can be messy when
> data has to be moved around. But then is seems there are several
> simple ways to solve this, e.g. what I can only call "logical
> pointers".

Correct. There's absolutely no reason to believe that you cannot have
data independence with logical pointer or references. I would however
argue that allowing entities without representable keys is not a good idea.

> ** The difficulty of performing adhoc queries. but I want to think
> that if there is a procedural (query?) language many of the advanced
> features of SQL e.g. group by and sorting of the data can be done with
> user-generated procedural code.

No. Procedural query language are a very big no no, because you should
let the database choose how to optimize the queries and not let the
users choose some optimizations on assumptions about how they think the
data is stored.

-- Jan Hidders

Lee Fesperman

unread,

Oct 7, 2003, 5:32:06 PM10/7/03

to

Jan Hidders wrote:

>
> Seun Osewa wrote:
> > I would also like to know the classical arguments against the network
> > model or other "pointer based" models. The only things I know are
> > that:
> >
> > ** using pointers to positions in memory or disk can be messy when
> > data has to be moved around. But then is seems there are several
> > simple ways to solve this, e.g. what I can only call "logical
> > pointers".
>
> Correct. There's absolutely no reason to believe that you cannot have
> data independence with logical pointer or references. I would however
> argue that allowing entities without representable keys is not a good idea.

Incorrect. Even though you call them 'logical' pointers, they are still physical
artifacts and have no place in a truly logical view of the database. Databases are about
data, and pointers are not data (or meta-data).

This is very old news.

The OP needs to increase his knowledge of database concepts before he tries to 'fix'
things. A newsgroup is the wrong place to get any real depth of understanding.

Lauri Pietarinen

unread,

Oct 7, 2003, 6:44:03 PM10/7/03

to

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.03100...@posting.google.com>...

> One question that I think must be looked into is this: If SQL
> databases are successful today, is it because:
> ** of the relational _model_ they are based on?

This is the main reason.

>
> ** of the ease with which SQL can be used from within all programming
> languages and as an interactive query language?

I think this "dual mode" concept was proposed by Codd, so in a sense
it is part of the "Relational Model"

>
> ** The failure or earlier models and the support of major SQL database
> vendors once it reached critical mass of adoption?

IMS (a hierarchical DBMS) and IDMS (a network, or CODASYL DBMS) were
VERY successful in the 70's and 80's. SQL databases had an uphill
battle.

>
> In other words do we have the model, the language, or standardisation
> to blame/praise for the popularity of the relational model?

I think the model is the biggest issue. Standardisation has also
helped. Many say that QUEL was better than SQL, so maybe we could say
"inspite of SQL"

>
> I would also like to know the classical arguments against the network
> model or other "pointer based" models. The only things I know are
> that:
>
> ** using pointers to positions in memory or disk can be messy when
> data has to be moved around. But then is seems there are several
> simple ways to solve this, e.g. what I can only call "logical
> pointers".

Well, that's what the relational model does: it uses "logical
pointers".

>
> ** The difficulty of performing adhoc queries. but I want to think
> that if there is a procedural (query?) language many of the advanced
> features of SQL e.g. group by and sorting of the data can be done with
> user-generated procedural code. Then I observe that most databases in
> the world today are being accessed by asp, php or perl web scripts and
> only recieve adhoc queries during the development stage. These
> queries are written by developers who are skilled enough in procedural
> programming not to have problems if they needed such code to access
> the data.

If you take a standard SQL-query with sorting and grouping and some
joins and compare it with hand written "navigational" code you will
notice that what can be expressed in SQL in 5-10 lines of code will
require several pages of hand written code for equivalent result. It
is also very helpfull to pretest your SQL using a query editor, and
when you are convinced of the correctness of your SQL query you just
drop it in the program.

The third issue is that the SQL query can be optimised to a much
higher degree than procedural code. Say you add a new index. The
DBMS will/can immediately start using it without any user intervention
(if it decides to). In the procedural alternative you would have to
recode your query. The optimiser can also take into account the
cardinality of tables (=number of rows) so that it will produce a
different plan for a small database and a big database. An optimiser
can even be sensitive to input from users at run time, say you have

select name, salary
from nurses
where sex = ?

If the parameter given at run time is 'M' then it could be
advantageous to use an index, but not if parameter is 'F'. This
decision could be made by the optimiser "on the fly". I hope you get
the picture...

>
> Other than that, what's wrong with the network (or similar) models? I
> have not found a good link to such a discussion though I hear it
> repeated that there are certain classical arguments against them.

In 1974 there was the BIG DEBATE where Codd defended the RM against
network guys. I wonder if there is a transcript available?

I think one of Codds arguments was that the number of manipulative
operators needed in the RM (i.e. insert, update, delete) was much
lower than in CODASYL (connect, disconnect, etc etc...)

regards,
Lauri Pietarinen

Anthony W. Youngman

unread,

Oct 7, 2003, 5:27:01 PM10/7/03

to

In article <ba87a3cf.03100...@posting.google.com>, Seun Osewa
<seun...@inaira.com> writes

>One question that I think must be looked into is this: If SQL
>databases are successful today, is it because:
>** of the relational _model_ they are based on?

This model, because it's mathematically provable, gives Computer
Scientists the warm fuzzies ...

>
>** of the ease with which SQL can be used from within all programming
>languages and as an interactive query language?

This is an effect, not a cause. SQL is a naff query language - it's a
mathematical notation (which is why computer scientists like it) but an
absolute bummer for normal people.

>
>** The failure or earlier models and the support of major SQL database
>vendors once it reached critical mass of adoption?

Well, with IBM behind it, it wasn't hard to reach critical mass ...

Like most things, once the ball starts rolling, it's hard to stop. The
fact that it may (or may not) be crap tends to be irrelevant :-)

>
>In other words do we have the model, the language, or standardisation
>to blame/praise for the popularity of the relational model?

I think we have what is called the "network effect" - in other words,
it's popular because it's popular. I know that's a circular argument,
but in the real world that sort of argument works.

>
>I would also like to know the classical arguments against the network
>model or other "pointer based" models. The only things I know are
>that:

The relational advocates believe that because their model is
"mathematical" it must be better than anything else. And all too often
they convince themselves that theirs is the only model based on maths -
when in fact most of the others are too ...

>
>** using pointers to positions in memory or disk can be messy when
>data has to be moved around. But then is seems there are several
>simple ways to solve this, e.g. what I can only call "logical
>pointers".
>
>** The difficulty of performing adhoc queries. but I want to think
>that if there is a procedural (query?) language many of the advanced
>features of SQL e.g. group by and sorting of the data can be done with
>user-generated procedural code. Then I observe that most databases in
>the world today are being accessed by asp, php or perl web scripts and
>only recieve adhoc queries during the development stage. These
>queries are written by developers who are skilled enough in procedural
>programming not to have problems if they needed such code to access
>the data.

Jan thinks the DBA should determine how the database should be
optimised. As a Pickie, I'm used to believing that a DBA shouldn't be
necessary. Pick *doesn't* *have* a query optimiser - because it doesn't
*need* one. (It's hard to optimise something that's 97% efficient before
optimisation.)

Let's design a Pick database. Let's normalise it. And store it in the
Pick data structure. I can declare one "table" per entity (with no
attribute or relationship tables - I don't need them). Each Pick FILE is
equivalent to a relational entity-view.

So, given that I know the primary key of the entity I want to access, it
takes me just ONE disk seek to retrieve EVERYTHING about that entity.
Its attributes. Its relationships. THE LOT. Now do you see why I say a
query optimiser is a waste of time?

>
>Other than that, what's wrong with the network (or similar) models? I
>have not found a good link to such a discussion though I hear it
>repeated that there are certain classical arguments against them.
>

There's nothing wrong with other models. The thing about relational is
that it is designed to make "ad hoc" "easy". No one query is harder or
easier than any other. The problem is that by reducing all queries to
the same difficulty, the "easy" queries are made far harder than they
need to be.

Things like hierarchical databases make easy queries easy. The snag is
they leave hard queries hard.

But let me leave you with this - why are DB2, MS SQL-server, Oracle all
adding multi-dimensional features to their relational databases? I'll
tell you. It's because they work - and Pick has had exactly that from
the day it was designed nearly 40 years ago. And the typical company
that runs Pick as its database spends roughly half (as a percentage of
turnover) what a relational-based company spends on its database.

Cheers,
Wol
--

Jerry Gitomer

unread,

Oct 7, 2003, 8:29:57 PM10/7/03

to

The RDBMS wound up as the most popular approach to DBMS for practical
business reasons. The premier DBMS at the time of the introduction of
Oracle (the first RDBMS to hit the market) was IBM's IMS. IMS was much
faster than Oracle and, given the concern about system performance in an
era where supercomputers didn't have the performance of a Pentium, the
fact that Oracle and subsequent RDBMS became more popular was due to the
one factor which concerned businesses even more than bang for the buck.

This was the cost of maintaining a working system. The primary advantage
of the RDBMS over its predecessors in the 1970's was that the RDBMS was
dynamic, while most of its predecessors were static. By that I mean that
if the world changed it was easy to change the RDBMS model and code to
meet the changing conditions. This was not the case with the
alternatives.

IMS was designed to optimize performance and, from what I recall, was very
difficult to change. Even trivial changes in the real world could result
in massive reprogramming efforts. Things were so bad that in some cases
once the world started to change the programmers could barely keep pace
and the use of a database became a liability rather than an asset.

Paul G. Brown

unread,

Oct 8, 2003, 4:06:10 AM10/8/03

to

Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F8330...@ix.netcom.com>...

> Jan Hidders wrote:
> >
> > Seun Osewa wrote:
> > > I would also like to know the classical arguments against the network
> > > model or other "pointer based" models. The only things I know are
> > > that:
> > >
> > > ** using pointers to positions in memory or disk can be messy when
> > > data has to be moved around. But then is seems there are several
> > > simple ways to solve this, e.g. what I can only call "logical
> > > pointers".
> >
> > Correct. There's absolutely no reason to believe that you cannot have
> > data independence with logical pointer or references. I would however
> > argue that allowing entities without representable keys is not a good idea.
>
> Incorrect. Even though you call them 'logical' pointers, they are still physical
> artifacts and have no place in a truly logical view of the database. Databases are about
> data, and pointers are not data (or meta-data).

Erm . .

What's 'physical' about this?

RELATION Dept ( Id Dept_Id KEY, Name String );
RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
Name PersonName, Salary Money );

RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;

RETRIEVE E.Name FROM Emp E, Dept D
WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';

This schema assumes a strict two-value logic, so Emp.Dept cannot, under any
circumstances what-so-ever, have anything in it except a 'logical reference'
to exactly one Dept tuple. How is this achieved under the covers? Who
cares! It's about the logical model.

(I should note that this formulation kind of makes my skin crawl for
aesthetic reasons, but its a REF()/DEREF() model that is essentially
syntactic sugar around well understood key and foreign key semantics.)

Not trying to start a flame war, really (though that's an easy thing to
do when you're as ignorant as I am). I'm just hoping that we can all
agree that it's the semantics of the model, not the syntax, which we ought
to care about. 'Logical pointers' (by which I mean this simple REF/DEREF,
and mindful of a memory I have to the effect that SQL-3 REF/DEREF are not)
are mappable to PK/FK, and to a wide variety of rules stated in terms of the
existance of a tuple with certain properties.

The advantage of predicate logic is that there are various kinds of
rules which can't be stated using REF/DEREF. And when you have a choice
between two models take the one with the broader expressive power.

KR

P "twit filtered since 2003!" b

Jan Hidders

unread,

Oct 8, 2003, 4:53:58 AM10/8/03

to

Lee Fesperman wrote:

> Jan Hidders wrote:
>>
>>Correct. There's absolutely no reason to believe that you cannot have
>>data independence with logical pointer or references. I would however
>>argue that allowing entities without representable keys is not a good idea.
>
> Incorrect. Even though you call them 'logical' pointers, they are
> still physical artifacts and have no place in a truly logical view of
> the database.

Logical pointers can be defined at the logical level and implemented in
various ways. They are just as much physical artifacts as, say,
relations are.

> Databases are about
> data, and pointers are not data (or meta-data).

They carry information. That makes them data. I see no good reason to
use a more restrictive definition.

> The OP needs to increase his knowledge of database concepts before he
> tries to 'fix' things.

His motivation may be wrong, but he asks the right questions.

> A newsgroup is the wrong place to get any real
> depth of understanding.

I fully agree. If only because there seem to be so many in these
newsgroups that confuse "knowledge of database concepts" with "knowing
what Chris Date et al. say about them". If you want to know what the
real experts in industry and the research community think, these
newsgroups are certainly not representative.

-- Jan Hidders

Seun Osewa

unread,

Oct 8, 2003, 4:56:59 AM10/8/03

to

Lee Fesperman <firs...@ix.netcom.com> wrote

> The OP needs to increase his knowledge of database concepts before he tries to 'fix'
> things. A newsgroup is the wrong place to get any real depth of understanding.

In that case, I would like to know what is the _right_ way to get the
depth of understanding. At the moment due to where I live I only have
access to internet resources, so what do you suggest? Besides, I find
that many people who have all followed the right way disagree on a
number of issues. I know that there are certain classic books that
would help (and I would like to be pointed to some more), but there
must be some web resources I've missed?

The observation is appreciated, a little assistance in this area would
be appreciated even more ;-). How else can I get in touch with very
smart people?

Seun Osewa

unread,

Oct 8, 2003, 5:40:02 AM10/8/03

to

Thanks Jerry,

This seems a most reasonable, logical explanation for the current
state of things. And I guess it still explains why OO databases are
not catching on as fast as predicted today. The issue of flexibility,
the fact that the corporate database should be able to support several
generations of several Application programs ... and of course the
"critical mass" effect and the backing of major corporate players like
IBM must have done the rest.

Seun Osewa.

"Jerry Gitomer" <jgit...@erols.com> wrote in message news:<pan.2003.10.08....@erols.com>...

Seun Osewa

unread,

Oct 8, 2003, 6:56:49 AM10/8/03

to

I have tried, twice, to download the evaluation version of the alphora
product for testing and it doesn't work. Guess there would be a lot
to learn from playing with it; the product is more than a RDBMS

Regards,
Seun Osewa

lauri.pi...@atbusiness.com (Lauri Pietarinen) wrote:
> That is, in fact, the approach taken in a product called Dataphor
> (see www.alphora.com). They have implemented a "D"-language (called D4)
> that translates into SQL and hence uses underlying SQLServer, Oracle
> or DB2- DBMS'es as the engine.
>

> regards,
> Lauri Pietarinen

Seun Osewa

unread,

Oct 8, 2003, 7:29:47 AM10/8/03

to

Jan Hidders <jan.h...@REMOVETHIS.pandora.be> wrote in message news:<fPEgb.63802$IA4.3...@phobos.telenet-ops.be>...

> Seun Osewa wrote:
> > In other words do we have the model, the language, or standardisation
> > to blame/praise for the popularity of the relational model?
>
> Don't forget market domination.

By "standardisation" I really meant something close to market
domination.

> No. Procedural query language are a very big no no, because you should
> let the database choose how to optimize the queries and not let the
> users choose some optimizations on assumptions about how they think the
> data is stored.

If procedural languages are so bad for database access, how come all
the major database vendors (and now the SQL99 standard) have
procedural extensions? Procedural coding is flexible, powerful,
although not always the most elegant. There's a line we always have to
draw between what should be done procedurally and what should be done
declaratively because not everything can be done declaratively and you
have to depend on what is included in the database engine.

And personally I think that once you have identified the records you
are interested in, through a 'declarative' query which can be
optimized by the DBMS, one should be able to manipulate is with
relative freedom from a powerful-enough procedural. GROUP BY,
AGGREGATE FUNCTIONS, and any further processing or transformation of
the raw data are probably not so complicated that they cannot be
implemented as _libraries_ callable from the procedural language. And
yes, with such libraries the procedural code should be shortened.

So my developing idea is to have:
- A declarative language for indicating which data I am interested in.
- A procedural language for working on each record of the SELECTed
data: locking, unlocking, updating, creating data that depends on it
...

Does anybody on this list have information about "transactional"
languages or anything like that. A procedural language where all your
actions have no effect until you "commit" and you can rollback, etc?

Regards,
Seun Osewa
"There are things we do primarily because we think they are fun and
might just be useful to someone somewhere sometime ..."

Jan Hidders

unread,

Oct 8, 2003, 8:56:18 AM10/8/03

to

Paul G. Brown wrote:
>
> Erm . .
>
> What's 'physical' about this?
>

> RELATION Dept ( Id Dept_Id KEY, Name String ); [....]

Absolutely nothing, of course, and just about any expert on database
theory (people who had several publications on PODS, ICDT, et cetera)
that I talked to about this agrees with you on this. In fact, I also
talked to some logicians (you know, people who actually know about what
is logical and what is not, :-)) about this question, and their answer
was that from their point of view it was in fact the String-part that
was less logical. This is not easy to explain in a few sentences but if
you are interested in the links between logic and the relational model I
can recommend you:

Jan Van den Bussche: Applications of Alfred Tarski's Ideas in
Database Theory. CSL 2001: 20-37

It's on-line at:

http://citeseer.nj.nec.com/vandenbussche01applications.html

Especially important is the reference

[59] A. Tarski, "What are logical notions?", History and Philosophy
of Logic, 7:143-154, 1986, Edited by J. Corcoran.

but unfortunately I couldn't find that on-line. To get an idea read the
definitions at the end of page 2 and beginning of page 3 in the paper by
Van den Bussche. It's not easy stuff if you're not used to it, but I
hope you will understand that what it says there is that logical notions
treat the atomic values as *abstract* values, which is what references,
object identifiers, logical pointers or whatever you want to call them,
actually are. And Alfred Tarski, who may be considered as one of the
founders of modern logic, consideres this to be such a fundamental
property of logical notions that he in fact uses it as the very
definition of what a logical notion is.

So am I arguing that we should only use abstract values in the columns
of our relations? Certainly not. What Tarski presents here is a
simplified model that he uses to understand certain properties of
logical systems. But what this does tell you is that from a logical
point of view there is no problem with abstract values and in some sense
they can even be regarded as more fundamental than concrete values.

-- Jan Hidders

Jan Hidders

unread,

Oct 8, 2003, 9:31:57 AM10/8/03

to

Seun Osewa wrote:
> Jan Hidders <jan.h...@REMOVETHIS.pandora.be> wrote in message news:<fPEgb.63802$IA4.3...@phobos.telenet-ops.be>...
>>

>>No. Procedural query language are a very big no no, because you should
>>let the database choose how to optimize the queries and not let the
>>users choose some optimizations on assumptions about how they think the
>>data is stored.
>
> If procedural languages are so bad for database access, how come all
> the major database vendors (and now the SQL99 standard) have
> procedural extensions?

I interpreted the term "procedural query langauge" as referring to a
strictly procedural query language. There is no problem with a
procedural extension of a declarative language, as long as you keep the
core language as declarative as possible.

> And personally I think that once you have identified the records you
> are interested in, through a 'declarative' query which can be
> optimized by the DBMS, one should be able to manipulate is with
> relative freedom from a powerful-enough procedural. GROUP BY,
> AGGREGATE FUNCTIONS, and any further processing or transformation of
> the raw data are probably not so complicated that they cannot be
> implemented as _libraries_ callable from the procedural language.

Well, they *are* that complicated. There are quite a few sophisticated
query optimization tricks that databases can apply to them, and some of
these depend upon the existence of certain indices and therefore
knowledge of the physical layer. So those would then be lost.

-- Jan Hidders

Seun Osewa

unread,

Oct 8, 2003, 12:41:44 PM10/8/03

to

Some other things I forgot to mention in my last post:

1) Its unfortunate that this thread shows up as two discussions on
google groups. This is because subscribers to the postgresql hackers
mailing list cannot reply to the other lists. Its a pity, because
there are some interesting people on that list.

2) I still think there are some problems with the principle of "let
the user only have to know the database at the logical level; separate
the model from the implementation". It is nice if the user does not
have to know anything about implementation to in order to execute
queries and manipulate data. Its nice to let the DBA and DBMS be the
one to worry about these things. But in order to do this properly the
DBA _needs_ to know how the user is _likely_ to make use of the data.
The user should be able to supply _hints_ to the DBMS to help it to
optimize the way it lays out data, etc. That's why I used the C++
inline keyword used for small functions. It's a _hint_ to the
compiler, and it does not affect the results of the program, but it
sure as hell can help if the compiler is not smart enough or lacks
data necessary to make correct decisions. Rather than insist that the
all query optimization be done by the DBMS, it seems to me that the
thing to do is to make it _unnecessary_, not impossible, for the user
to make things run faster.

3) I guess that's what all the commercial RDMSs are already doing ;)

Another two cents with regards,
Seun Osewa

Mikito Harakiri

unread,

Oct 8, 2003, 12:45:43 PM10/8/03

to

paul_geof...@yahoo.com (Paul G. Brown) wrote in message news:<57da7b56.03100...@posting.google.com>...

> RELATION Dept ( Id Dept_Id KEY, Name String );
> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> Name PersonName, Salary Money );
>
> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>
> RETRIEVE E.Name FROM Emp E, Dept D
> WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';

I'm not sure I understand what DEREF(E.Dept).Id = E.Id predicate is
saying. Are you matching Employee id against Department id? Therefore,
if ref/deref is indeed just a syntactic sugar around foreign key
constraint, what is the syntax for join predicate?

Hrundi V. Bakshi

unread,

Oct 8, 2003, 12:46:50 PM10/8/03

to

"Seun Osewa" <seun...@inaira.com> wrote in message
news:ba87a3cf.03100...@posting.google.com...

> Some other things I forgot to mention in my last post:
>
> 1) Its unfortunate that this thread shows up as two discussions on
> google groups. This is because subscribers to the postgresql hackers
> mailing list cannot reply to the other lists. Its a pity, because
> there are some interesting people on that list.

Hackers are unable to reply to the other groups. That's just hillarious.

Paul G. Brown

unread,

Oct 8, 2003, 3:22:03 PM10/8/03

to

Jan Hidders <jan.h...@pandora.be.REMOVE.THIS> wrote in message news:<3f840...@news.ruca.ua.ac.be>...

> Paul G. Brown wrote:
> >
> > Erm . .
> >
> > What's 'physical' about this?
> >
> > RELATION Dept ( Id Dept_Id KEY, Name String ); [....]
>
> Absolutely nothing, of course, and just about any expert on database
> theory (people who had several publications on PODS, ICDT, et cetera)
> that I talked to about this agrees with you on this. In fact, I also
> talked to some logicians (you know, people who actually know about what
> is logical and what is not, :-)) about this question, and their answer
> was that from their point of view it was in fact the String-part that
> was less logical. This is not easy to explain in a few sentences

[...]

Little worth knowing is.

[...]

> So am I arguing that we should only use abstract values in the columns
> of our relations? Certainly not. What Tarski presents here is a
> simplified model that he uses to understand certain properties of
> logical systems. But what this does tell you is that from a logical
> point of view there is no problem with abstract values and in some sense
> they can even be regarded as more fundamental than concrete values.

I think I grok the principles supporting 'abstract' types. I've not
read the Tarski but in my earnest youth I railed against REF/DEREF for
all of the 'right reasons' until it was pointed out to me that databases
routinely store filenames and then reason about the filenames perfectly
happily thank you very much and filenames are just references so would
I please just shut up, OK?

But I still can't swallow the syntax.

KR

Pb

Paul G. Brown

unread,

Oct 8, 2003, 6:07:59 PM10/8/03

to

mikha...@yahoo.com (Mikito Harakiri) wrote in message news:<bdf69bdf.03100...@posting.google.com>...

Oh. My bad:

RETRIEVE E.Name FROM Emp E, Dept D

WHERE DEREF(E.Dept).Id = D.Id AND D.Name = 'shoe';
^^^
\_ Dodgy alias in original.

Bob Badour

unread,

Oct 8, 2003, 8:52:32 PM10/8/03

to

paul_geof...@yahoo.com (Paul G. Brown) wrote in message news:<57da7b56.03100...@posting.google.com>...

> Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F8330...@ix.netcom.com>...
> > Jan Hidders wrote:
> > >
> > > Seun Osewa wrote:
> > > > I would also like to know the classical arguments against the network
> > > > model or other "pointer based" models. The only things I know are
> > > > that:
> > > >
> > > > ** using pointers to positions in memory or disk can be messy when
> > > > data has to be moved around. But then is seems there are several
> > > > simple ways to solve this, e.g. what I can only call "logical
> > > > pointers".
> > >
> > > Correct. There's absolutely no reason to believe that you cannot have
> > > data independence with logical pointer or references. I would however
> > > argue that allowing entities without representable keys is not a good idea.
> >
> > Incorrect. Even though you call them 'logical' pointers, they are still physical
> > artifacts and have no place in a truly logical view of the database. Databases are about
> > data, and pointers are not data (or meta-data).
>
> Erm . .
>
> What's 'physical' about this?

Retrieval implies a physical operation, but a name is just a name
after all. Having two attributes called name in a single structure
requires physical location to disambiguate.

> RELATION Dept ( Id Dept_Id KEY, Name String );
> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> Name PersonName, Salary Money );
>
> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>
> RETRIEVE E.Name FROM Emp E, Dept D
> WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';

Whether physical or not physical, the pointer is a superfluous
structural element. Other than increasing complexity, what purpose
does it serve?

Bob Badour

unread,

Oct 8, 2003, 9:00:57 PM10/8/03

to

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.0310...@posting.google.com>...

> Lee Fesperman <firs...@ix.netcom.com> wrote
> > The OP needs to increase his knowledge of database concepts before he tries to 'fix'
> > things. A newsgroup is the wrong place to get any real depth of understanding.
>
> In that case, I would like to know what is the _right_ way to get the
> depth of understanding. At the moment due to where I live I only have
> access to internet resources, so what do you suggest?

Without access to books, I suggest you are in a nearly hopeless situation.
You could try scouring citeseer.

Bob Badour

unread,

Oct 8, 2003, 9:10:30 PM10/8/03

to

"Anthony W. Youngman" <thew...@nospam.demon.co.uk> wrote in message news:<tpLsB7Bl+yg$Ew...@thewolery.demon.co.uk>...

> In article <ba87a3cf.03100...@posting.google.com>, Seun Osewa
> <seun...@inaira.com> writes
> >One question that I think must be looked into is this: If SQL
> >databases are successful today, is it because:
> >** of the relational _model_ they are based on?
>
> This model, because it's mathematically provable, gives Computer
> Scientists the warm fuzzies ...

Mathematically provable? You are speaking nonsense.

> >** of the ease with which SQL can be used from within all programming
> >languages and as an interactive query language?
>
> This is an effect, not a cause. SQL is a naff query language - it's a
> mathematical notation (which is why computer scientists like it) but an
> absolute bummer for normal people.

Most people define normal by reference to themselves. When you speak
of "normal people", I assume you use yourself as paradigm. I agree
that most stupid, obstinately ignorant people will find SQL difficult,
but they will find almost everything difficult.

> >I would also like to know the classical arguments against the network
> >model or other "pointer based" models. The only things I know are
> >that:
>
> The relational advocates believe that because their model is
> "mathematical" it must be better than anything else. And all too often
> they convince themselves that theirs is the only model based on maths -
> when in fact most of the others are too ...

Such as?

> >** using pointers to positions in memory or disk can be messy when
> >data has to be moved around. But then is seems there are several
> >simple ways to solve this, e.g. what I can only call "logical
> >pointers".
> >
> >** The difficulty of performing adhoc queries. but I want to think
> >that if there is a procedural (query?) language many of the advanced
> >features of SQL e.g. group by and sorting of the data can be done with
> >user-generated procedural code. Then I observe that most databases in
> >the world today are being accessed by asp, php or perl web scripts and
> >only recieve adhoc queries during the development stage. These
> >queries are written by developers who are skilled enough in procedural
> >programming not to have problems if they needed such code to access
> >the data.
>
> Jan thinks the DBA should determine how the database should be
> optimised. As a Pickie, I'm used to believing

...a lot of horseshit.

[horseshit snipped]

Mikito Harakiri

unread,

Oct 8, 2003, 9:47:26 PM10/8/03

to

"Paul G. Brown" <paul_geof...@yahoo.com> wrote in message
news:57da7b56.03100...@posting.google.com...

> mikha...@yahoo.com (Mikito Harakiri) wrote in message
news:<bdf69bdf.03100...@posting.google.com>...
> > paul_geof...@yahoo.com (Paul G. Brown) wrote in message
news:<57da7b56.03100...@posting.google.com>...
> > > RELATION Dept ( Id Dept_Id KEY, Name String );
> > > RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> > > Name PersonName, Salary Money );
> > >
> > > RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
> > >
>
> RETRIEVE E.Name FROM Emp E, Dept D

> WHERE DEREF(E.Dept).Id = D.Id AND D.Name = 'shoe';

Now, suppose that indexed nested loops(D, E) is the best access path to the
data. Then DBA or advanced ("self-managed";-) DBMS would have to create
functional index on DEREF(E.Dept).Id. Then, wouldn't it be just be cleaner
to store Id instead of reference in the table itself?

Actually, I'm confused here, since Id is in the other table, so it is
certainly not a functional index, but looks more like a bitmapped join
index. But we don't want to admit anything bitmapped (read non OLTP) in such
a simple problem. So the question is how do we index this access path?

As far as file names stored in a table column, the fundamental difference is
that files themselves are outside of the database. Anything referring to the
outside of the database is a pointer. Person's name is a pointer. It's
impossible to maintain referential integrity for such pointers. One can
delete a file without letting the database to know, and we get a hanging
pointer. A person can change his name without informing the database. The
idea is that there is no benefits having pointers inside of the database
referencing one table from the other.

Paul G. Brown

unread,

Oct 9, 2003, 2:11:50 AM10/9/03

to

"Mikito Harakiri" <mikha...@iahu.com> wrote in message news:<Ze3hb.28$MU....@news.oracle.com>...

> "Paul G. Brown" <paul_geof...@yahoo.com> wrote in message
> news:57da7b56.03100...@posting.google.com...
> > mikha...@yahoo.com (Mikito Harakiri) wrote in message
> news:<bdf69bdf.03100...@posting.google.com>...
> > > paul_geof...@yahoo.com (Paul G. Brown) wrote in message
> news:<57da7b56.03100...@posting.google.com>...
> > > > RELATION Dept ( Id Dept_Id KEY, Name String );
> > > > RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> > > > Name PersonName, Salary Money );
> > > >
> > > > RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
> > > >
> >
> > RETRIEVE E.Name FROM Emp E, Dept D
> > WHERE DEREF(E.Dept).Id = D.Id AND D.Name = 'shoe';
>
> Now, suppose that indexed nested loops(D, E) is the best access path to the
> data. Then DBA or advanced ("self-managed";-) DBMS would have to create
> functional index on DEREF(E.Dept).Id. Then, wouldn't it be just be cleaner
> to store Id instead of reference in the table itself?

[ snip ]

Well, the point of departure here was a discussion about modeling something
at the logical level. Implementation is quite up to you. The simplest
thing to do would be to take a bog-standard RDBMS approach and treat
REF(Dept) as "this says to take the primary key of Dept and embed those
columns invisibly within the Emp table and enforce the necessary RI
constraint rules". At which point a decision about how to process this
query becomes a bog-standard query problem.

The biggest problems come about when you try to make REF() work in the
absence of a primary key. At that point REF/DEREF goes really feral. In
SQL-3 IIRC, there is this wierd thing where a column can ref to a key in
more than one table but I'm not up to gospel on that.

Look: 'logical pointers' done this way isn't new, and it isn't my idea.
Carlo Zaniolo introduced the idea in a 1983 paper called GEM (or something
like that). Besides, once again, I don't like RED/DEREF because it
introduces redundant syntax.

KR

Pb

Lee Fesperman

unread,

Oct 9, 2003, 2:25:16 AM10/9/03

to

Seun Osewa wrote:
>
> Lee Fesperman <firs...@ix.netcom.com> wrote
> > The OP needs to increase his knowledge of database concepts before he tries to 'fix'
> > things. A newsgroup is the wrong place to get any real depth of understanding.
>
> In that case, I would like to know what is the _right_ way to get the
> depth of understanding. At the moment due to where I live I only have
> access to internet resources, so what do you suggest? Besides, I find
> that many people who have all followed the right way disagree on a
> number of issues. I know that there are certain classic books that
> would help (and I would like to be pointed to some more), but there
> must be some web resources I've missed?

I'm not the best one to answer that (most of my grounding in database came from reading
research papers in 70's). I'd suggest books by C. J. Date, Fabian Pascal and Codd
himself. ACM can be a valuable resource.

Decent information is rather scarce on the Web. Database Debunkings
(www.dbdebunk.com) is a good choice, and there are papers on my company's site
(www.firstsql.com) with references.

> The observation is appreciated, a little assistance in this area would
> be appreciated even more ;-). How else can I get in touch with very
> smart people?

That's rather hard. A change in tone (more like your tone here) would help --- less
confrontational, less troll.

Paul G. Brown

unread,

Oct 9, 2003, 2:28:29 AM10/9/03

to

bba...@golden.net (Bob Badour) wrote in message news:<cd3b3cf.03100...@posting.google.com>...

> paul_geof...@yahoo.com (Paul G. Brown) wrote in message news:<57da7b56.03100...@posting.google.com>...
> > Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F8330...@ix.netcom.com>...
> > > Jan Hidders wrote:
> > > >
> > > > Seun Osewa wrote:
> > > > > I would also like to know the classical arguments against the network
> > > > > model or other "pointer based" models. The only things I know are
> > > > > that:
> > > > >
> > > > > ** using pointers to positions in memory or disk can be messy when
> > > > > data has to be moved around. But then is seems there are several
> > > > > simple ways to solve this, e.g. what I can only call "logical
> > > > > pointers".
> > > >
> > > > Correct. There's absolutely no reason to believe that you cannot have
> > > > data independence with logical pointer or references. I would however
> > > > argue that allowing entities without representable keys is not a good idea.
> > >
> > > Incorrect. Even though you call them 'logical' pointers, they are still physical
> > > artifacts and have no place in a truly logical view of the database. Databases are about
> > > data, and pointers are not data (or meta-data).
> >
> > Erm . .
> >
> > What's 'physical' about this?
>
> Retrieval implies a physical operation, but a name is just a name
> after all. Having two attributes called name in a single structure
> requires physical location to disambiguate.

Trouble with the old reading comprehension again, eh Bob?

Have another look at what I typed. See the semi-colon? Two relations. Two
"structures", the attributes of which are unambiguous.

> > RELATION Dept ( Id Dept_Id KEY, Name String );
> > RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> > Name PersonName, Salary Money );
> >
> > RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
> >
> > RETRIEVE E.Name FROM Emp E, Dept D
> > WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';
>
> Whether physical or not physical, the pointer is a superfluous
> structural element. Other than increasing complexity, what purpose
> does it serve?

None. But that's not the point. Pleas read the carefully quoted text
from the previous posts.

Where do I sign up to get back into your twit-filter?

KR

Pb

Jan Hidders

unread,

Oct 9, 2003, 3:50:51 AM10/9/03

to

Seun Osewa wrote:
>
> 2) I still think there are some problems with the principle of "let
> the user only have to know the database at the logical level; separate
> the model from the implementation". It is nice if the user does not
> have to know anything about implementation to in order to execute
> queries and manipulate data. Its nice to let the DBA and DBMS be the
> one to worry about these things.

What is especially nice is that this separation, aka data
independendence, is what gives you the flexibility that was the main
raison d'etre of the relational model and one of the main reasons for
its succes.

> But in order to do this properly the
> DBA _needs_ to know how the user is _likely_ to make use of the data.
> The user should be able to supply _hints_ to the DBMS to help it to

> optimize the way it lays out data, etc. [...]

>
> 3) I guess that's what all the commercial RDMSs are already doing ;)

Indeed.

-- Jan Hidders

Jan Hidders

unread,

Oct 9, 2003, 4:27:12 AM10/9/03

to

Paul G. Brown wrote:
>
> I think I grok the principles supporting 'abstract' types. I've not
> read the Tarski but in my earnest youth I railed against REF/DEREF for
> all of the 'right reasons' until it was pointed out to me that databases
> routinely store filenames and then reason about the filenames perfectly
> happily thank you very much and filenames are just references so would
> I please just shut up, OK?

Er, well, file names are actually not abstract because they have a
concrete representation that you can read.

-- Jan Hidders

Lee Fesperman

unread,

Oct 9, 2003, 4:41:14 AM10/9/03

to

Paul G. Brown wrote:
>
> Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F8330...@ix.netcom.com>...
> > Jan Hidders wrote:
> > > Correct. There's absolutely no reason to believe that you cannot have
> > > data independence with logical pointer or references. I would however
> > > argue that allowing entities without representable keys is not a good idea.
> >
> > Incorrect. Even though you call them 'logical' pointers, they are still physical
> > artifacts and have no place in a truly logical view of the database. Databases
> > are about data, and pointers are not data (or meta-data).
>
> Erm . .
>
> What's 'physical' about this?
>
> RELATION Dept ( Id Dept_Id KEY, Name String );
> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> Name PersonName, Salary Money );
>
> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>
> RETRIEVE E.Name FROM Emp E, Dept D
> WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';
>
> This schema assumes a strict two-value logic, so Emp.Dept cannot, under any
> circumstances what-so-ever, have anything in it except a 'logical reference'
> to exactly one Dept tuple. How is this achieved under the covers? Who
> cares! It's about the logical model.

I still don't see it as truly logical. It's not data. If you add it to the relational
model, you violate the rule about all information being represented as values in tables.
Values are -- 14, 'December 12, 2002', "Sales".

But what do I know? Chris Date tells me I'm completely wrong about nulls.

> (I should note that this formulation kind of makes my skin crawl for
> aesthetic reasons, but its a REF()/DEREF() model that is essentially
> syntactic sugar around well understood key and foreign key semantics.)
>
> Not trying to start a flame war, really (though that's an easy thing to
> do when you're as ignorant as I am). I'm just hoping that we can all
> agree that it's the semantics of the model, not the syntax, which we ought
> to care about. 'Logical pointers' (by which I mean this simple REF/DEREF,
> and mindful of a memory I have to the effect that SQL-3 REF/DEREF are not)
> are mappable to PK/FK, and to a wide variety of rules stated in terms of the
> existance of a tuple with certain properties.

Yes, it is mappable to PK/FK but not equivalent.

> The advantage of predicate logic is that there are various kinds of
> rules which can't be stated using REF/DEREF. And when you have a choice
> between two models take the one with the broader expressive power.

Pointers add nothing but complexity. So there's no choice at all.

Perhaps, you'll come around since your file-name pointer argument didn't hold water. A
data model that supports references to external entities (like files) not under its
control is hopelessly flawed.

Lauri Pietarinen

unread,

Oct 9, 2003, 8:10:11 AM10/9/03

to

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.03100...@posting.google.com>...

> I have tried, twice, to download the evaluation version of the alphora
> product for testing and it doesn't work. Guess there would be a lot
> to learn from playing with it; the product is more than a RDBMS

Aw, that's unfortunate. It took me a while to get working.
It is infact an integrated application development environment where
you can define a great part of your application in a declarative
fashion.

regards,
Lauri Pietarinen

Bob Badour

unread,

Oct 9, 2003, 8:33:35 AM10/9/03

to

Apparently, your reading comprehension is once again a problem,
because I see no semicolon in the following expression:
( E.Name, DEREF(E.Dept).Name FROM Emp E )

> > > RELATION Dept ( Id Dept_Id KEY, Name String );
> > > RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
> > > Name PersonName, Salary Money );
> > >
> > > RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
> > >
> > > RETRIEVE E.Name FROM Emp E, Dept D
> > > WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';
> >
> > Whether physical or not physical, the pointer is a superfluous
> > structural element. Other than increasing complexity, what purpose
> > does it serve?
>
> None. But that's not the point.

It's my point regardless of any other person's point.

> Pleas read the carefully quoted text
> from the previous posts.
>
> Where do I sign up to get back into your twit-filter?

I suggest you send an email to help...@golden.net asking them to fix
their usenet service so I do not have to use google groups. You could
also propose that google look into providing some sort of twit-filter.

Bob Badour

unread,

Oct 9, 2003, 8:38:20 AM10/9/03

to

It also violates the information principle.

Jan Hidders

unread,

Oct 9, 2003, 10:21:57 AM10/9/03

to

Lee Fesperman wrote:
>
> I still don't see it as truly logical. It's not data. If you add it
> to the relational model, you violate the rule about all information
> being represented as values in tables. Values are -- 14, 'December
> 12, 2002', "Sales".

FWIW, I'm probably as much opposed to object identifiers and logical
pointers as you are, but the claim that they are somehow not logical is
such blatant nonsense (unless you start redefining the notion of
"logical" such that it becomes trivially true) that it makes me cringe
when it's used to defend the relational model.

> But what do I know? Chris Date tells me I'm completely wrong about nulls.

Really? I actually believe he has some pretty good arguments there. What
is it that you don't agree with?

-- Jan Hidders

Seun Osewa

unread,

Oct 9, 2003, 2:14:22 PM10/9/03

to

bba...@golden.net (Bob Badour) wrote:
> Without access to books, I suggest you are in a nearly hopeless situation.
> You could try scouring citeseer.

Ouch, it hurts. Hopeless and no help in sight! I do scour citeseer
often, lots of articles to read, and maybe I can "fill in the gaps"
with helpful comments made by people on this group?

Seun Osewa

unread,

Oct 9, 2003, 2:23:41 PM10/9/03

to

Lee Fesperman <firs...@ix.netcom.com> wrote:
> That's rather hard. A change in tone (more like your tone here) would help --- less
> confrontational, less troll.

Have I been confrontational? Have I been a troll? Please tell me the
things I have done that may be percieved that way. I didn't intend to
come here to make people angry, and maybe because of the many old
threads I have gone through, until you mentioned this I never felt
that there is anything I have said that could be regarded as
confrontational :-( or negative.

Seun Osewa

Costin Cozianu

unread,

Oct 9, 2003, 2:40:48 PM10/9/03

to

It has been claimed that logical pointers are harmful and not necessary.

Everything can be expressed in terms of values. Well, logical pointers
are values, for any reasonable definition of value. They are elements of
a well defined types (or sort) in many sorted algebra, just like
integers and strings.

Surrogate keys on the other hand are nothing but logical pointers, they
are an explicit, programmer hand made instantiation of "logical
pointers". Actually logical pointers is maybe a wrong term, a better
name would be abstract identifier. Surrogate keys introduce unnecessary
difficulties in expressing constraints and logical operations. They are
good, insofar as the outside world (i.e. the business world)
incorporates them into their model (their language), but if they are
used just by the software system (as most of them are, in practice) in
order to workaround technical limitation in the DBMS, then they are
definitely not better than logical pointers.

I put a small example justifying the technical difficulties.

Lee Fesperman wrote:

> Jan Hidders wrote:
>
>>Seun Osewa wrote:
>>
>>>I would also like to know the classical arguments against the network
>>>model or other "pointer based" models. The only things I know are
>>>that:
>>>
>>>** using pointers to positions in memory or disk can be messy when
>>>data has to be moved around. But then is seems there are several
>>>simple ways to solve this, e.g. what I can only call "logical
>>>pointers".
>>
>>Correct. There's absolutely no reason to believe that you cannot have
>>data independence with logical pointer or references. I would however
>>argue that allowing entities without representable keys is not a good idea.
>
>
> Incorrect. Even though you call them 'logical' pointers, they are still physical
> artifacts and have no place in a truly logical view of the database. Databases are about
> data, and pointers are not data (or meta-data).
>

> This is very old news.

>
> The OP needs to increase his knowledge of database concepts before he tries to 'fix'
> things. A newsgroup is the wrong place to get any real depth of understanding.
>

Here's a very basic problem for you, that you won't get by foreign keys:

So you have a table Orders where you keep and each Order has to be paid
in full in one payment. So a payment identifier has to stay in the
Orders table.

Now payments can be of different types:
Credit Card payments. Those will be identified and tracked by a
special number we get from the clearing house, and will have a whole
range of other data, typical to credit card.

Check payments: this has another idefntifying number and an
entirely different record (tuple) type.

Paypal payments. Yet another identifier, yet another tuple type,
with different information.

Cryptographic money. Cryptographic money is a self contained
information token that doesn't need a tuple type, they can just stay in
line where they pay for something (for example in the ORDERS table).

And other, unknown at this time, payments have to be added in the future
(for example prepaid coupons or gift cards ), without any impact
whatsoever (like table reorganization) on the existing tables.

For security and data integrity reasons all the identifiers (primary
keys) for different entities have different datatype, because there's
different algorithms to check a credit card payment identifier versus a
check payment identifier, vs. a paypal.

The main requirement is for user to query if the order is paid, when it
was paid, etc, and to display the details of the order together with the
relevant payment information to the user, to resolve payment disputes, etc.

So how do you model this scenario in the ideal relational database
without using abstract identifiers ?

One solution might be to introduce an artificial (surrogate key)
payment_id. I reject this solution, as it introduces an unnecessary
entity, not existent in the business model (users know of "The credit
card payment identified by 10241024, or the check payment 10000009, or
the cryptographic money 0xABCDEFABCDEF.... -- doesn't matter the
representation as long as the software verifies their validity). Plus
the constraints are difficult to express and they have to be
reformulated once we introduce different types of payments.

The solution using abstract identifiers is very easy. Let there be the
abstract datatype PAYMENT, then the ORDERS table just needs to contain a
REF PAYMENT column. New subtypes of payment may be added at any time,
each in their own tables, without anything in ORDERS table being
touched. When the programmer is concerned with the ORDERS table he
doesn't need to worry of the multiple joins involved with a lot of
potential different tables to get the details for credit card, checks,
etc. He just needs to use a simple operator, like the "->" (or DEREF can
be useful if very verbose) to get access to a PAYMENT value.

For example he can use orders.payment->toTextDescription() to get a
string of character to display to the user.

In any case, the mechanism is simple and doesn't pose any logical
problem. Contrary to the claim that logical pointers are illogical. On
the contrary, under the model proposed say in The Third Manifesto, this
very simple problem is difficult to handle.

Seun Osewa

unread,

Oct 9, 2003, 3:00:14 PM10/9/03

to

Lee Fesperman <firs...@ix.netcom.com> wrote in message news:<3F851E...@ix.netcom.com>...
-> Pointers add nothing but complexity. So there's no choice at all.

>
> Perhaps, you'll come around since your file-name pointer argument didn't hold water. A
> data model that supports references to external entities (like files) not under its
> control is hopelessly flawed.

I would like to know how in a relational database column linked to a
"meaningless" primary key in another table is different from a logical
pointer. Obviously I cannot change the primary key without changing
all rows that reference that key and if constraints in a relational
database can solve this problem I do not see why the same cannot be
done in an Object Oriented database. Am I still missing something?

Seun Osewa

Dawn M. Wolthuis

unread,

Oct 9, 2003, 4:12:07 PM10/9/03

to

Good question. Although I would want to move away from relational
databases too, if there is an RDBMS and one wants to query it, what
would I aim for? If you look at XQuery, you will see an example of
what I would definitely NOT aim for. Although the user of such a
language might very well be a technical person, instead of starting
with mathematics (relational calculus, relational algebra) I would
suggest starting with language. The mathematics of language is more
complex than the mathematics of relations, particularly simple
relations (such as 1NF tables).

If you look at the history of data persistence prior to Codd's 1970
ACM paper, you will see several attempts at this. One I have studied
of late is GIRLS (Generalized Information Retrieval Language and
System), specified by Don Nelson and implemented by several folks with
the most famous being Dick Pick. This GIRLS language was specified a
full 40 years ago and lives today in many IT shops under a variety of
about 10 different names, including IBM's UniQuery and Retrieve (for
UniData and Universe respectively). This language is flawed, as are
all, but so very close to what I would think would be a good approach.
It was written at TRW in order to make it so that the military in
Viet Nam could query their data without technical folks in the field.
It went into production in 1969 with the US Army. Prior to the end of
the cold war, it was used by the CIA to track (the associated
database) and query about Russion spies in the US.

I would suggest ditching the entire relational model (as both overly
simplistic in its theory and overly complex in its implementation) and
start with English (that is one of the other names for the GIRLS
language). Note that language is also the starting point for putting
data in XML documents, but it sure doesn't seem to be the starting
point for XQuery, eh?

--dawn
Dawn M. Wolthuis
www.tincat-group.com

Paul G. Brown

unread,

Oct 9, 2003, 5:47:21 PM10/9/03

to

Jan Hidders <jan.h...@pandora.be.REMOVE.THIS> wrote in message news:<3f851...@news.ruca.ua.ac.be>...

We could go round and round on this. I'd observe that there is a whole
class of things which can potentially go into tuple attributes that aren't
'values': for example, you might put a query in there, or a reference, or a
filename, or an executable script to generate a value. The information
content of such values are always 'one step removed' from their
representation, and may in fact be ambiguous.

The Information Principle says that data is stored as values in tuple
attributes. All of the above examples violate this principle. Filenames
then, are 'bad' for the same reasons REF/DEREF is 'bad'. But few people
would say that you shouldn't plonk filenames in attributes.

KR

P "still don't like REF/DEREF, but can live with queries in attributes" b

Seun Osewa

unread,

Oct 9, 2003, 6:24:50 PM10/9/03

to

Hmmm ...

As I see it, problem can be solved with a view which I think is
roughly analogous to the "abstract data type" you propose. And I
disagree that payments_id creates an entity that is not present in the
business model. If it was so there would be no need to group the
various sort of payments together in a query (as the challenge
suggests). I still do not see any real difference between logical
pointers and the relational way except that using the relational model
there is much more flexibility in the way I can manipulate the data.
For example, I can construct queries that make no sense. I can
perform operations on a thousand "pointers" at once.

Having said that i have not yet come accross anyone who knows SQL and
is not grounded in procedural programming. I guess the world is a bit
different from the 1970's afterall. Computers don't come with BASIC
or command line shells anymore. SQL is no longer really the end
user's language. I do not see CEOs asking the DBA for user accounts
so they can use SQL to generate reports. So the advantage of
SQL/relational being able to perform ad-hoc queries may not
necessarily need to be as strong as it used to be. And the advantage
of a newsgroup such as this, I believe, it that if i am wrong the
flaws in my logic will be pointed out.

Seun Osewa.

Costin Cozianu <c_co...@hotmail.com> wrote in message news:<bm49va$io39m$1...@ID-152540.news.uni-berlin.de>...

Mikito Harakiri

unread,

Oct 9, 2003, 7:40:18 PM10/9/03

to

"Paul G. Brown" <paul_geof...@yahoo.com> wrote in message
news:57da7b56.03100...@posting.google.com...

> Well, the point of departure here was a discussion about modeling
something
> at the logical level. Implementation is quite up to you. The simplest
> thing to do would be to take a bog-standard RDBMS approach and treat
> REF(Dept) as "this says to take the primary key of Dept and embed those
> columns invisibly within the Emp table and enforce the necessary RI
> constraint rules". At which point a decision about how to process this
> query becomes a bog-standard query problem.

After half-day investigation I must give some credit to object folks. They
reduced the query to

select e.name from Emp E
where E.Deptref.name = 'shoe'

and pointed me to scoping syntax which allows creating index like this

create index deptref_ix ON emp (Deptref);

Then, the hinted query

select --+leading(d) use_nl(e) index(e deptref_ix)

E.Name FROM Emp E, Dept D

WHERE E.Deptref = REF(d) AND D.Name = 'shoe';

did follow the access path that I wanted to. Of course, it still remains to
see the more challenging task solved with the hints removed.

It is optimizer that is always a problem:-)

Costin Cozianu

unread,

Oct 9, 2003, 7:53:50 PM10/9/03

to

Mikito Harakiri wrote:

Well, I'd be least concerned about the optimizer. If you're using Oracle
, I'd lookup the documentation for the cute name

IS_DANGLING

Cool stuff.

Cheers,
Costin

Seun Osewa

unread,

Oct 9, 2003, 8:47:39 PM10/9/03

to

I started a new thread and waited in vain for google to allow me to
post follow-ups to the first post, so I am putting the follow-ups here
temporarily. Pls review and try to reply to the other thread titled
PlayDB. Thanks.
===
Hi,

The name 'PlayDB' is an attempt to flame-proof the thread from
comments like, "Hey, this is not the way to develop something
_serious_" Afterall, its only PlayDB!

Going through thread archives on the three groups (and sites like
dbdebunk.com) I find that a lot of arguments occur over definitiona of
basic things like "relational", "data model", etc. MySQL may use
"relational" to mean "able to process SQL queries" and the authors of
the third manifesto may mean someone else. All should be well as long
as each speaker explains his usage of each word. Words tend to be
overloaded in real life and if the multiple meanings contiue long
enough, they become the right meanings *sigh*.

I have found these online glossaries/dictionaries to be particularly
interesting, but I would like to know your opinions:
Ibm glossary of computing terms:
http://www-3.ibm.com/ibm/terminology/goc/gocmain.htm
American National Standard Dictionary for Information Systems:
http://www.ncits.org/tc_home/k5htm/ANSDIT.htm

Please point me to any other sources of information. Its easy to miss
some things when there are millions of pages to search, despite google
assistance.

I believe at the beginning of the process it might be difficult to
separate between descriptions of the model, the language, and the
implementation because they affect each other so much but eventualy it
should happen.

Regards,

Seun Osewa

====

Hi,

The architecture I have been thinking of involves storing both the
data and the business logic on the server. "Users" never actually use
the query language (Let's call it PL/PlayQL, which would be
procedural) directly. They instead connect to the PL/PlayQL programs
running on the server, via a simple request/response interface.

The language, while strictly procedural, would include a savepoints
feature that will allow the server to roll back execution in the case
of mysterious errors and try again from last savepoint. SO the
effects of program execution can always be reversed until a savepoint.
And because the program is managed by the server the programs do not
stop running abruptly when clients disconnect. In fact one can have
long-running processes on the server and it becomes something like an
OS (pending: unfuzzification of these statements).

Summary: processes using "PL/PlayQL" running on database server having
flexible access to data. Client programs connect to these processes
and communicate using simplified protocols that are totally
independent of the underlying database. All business logic is
implemented on the server. PL/PlayQL as a language implements
savepoints and can be rolled back to last savepoint. Communication
with client actually takes place at savepoint boundaries (cause it
cannot be reversed).

Seun Osewa

====
Hi,

The name 'PlayDB' is an attempt to flame-proof the thread from
comments like, "Hey, this is not the way to develop something
_serious_" Afterall, its only PlayDB!

Going through thread archives on the three groups (and sites like
dbdebunk.com) I find that a lot of arguments occur over definitiona of
basic things like "relational", "data model", etc. MySQL may use
"relational" to mean "able to process SQL queries" and the authors of
the third manifesto may mean someone else. All should be well as long
as each speaker explains his usage of each word. Words tend to be
overloaded in real life and if the multiple meanings contiue long
enough, they become the right meanings *sigh*.

I have found these online glossaries/dictionaries to be particularly
interesting, but I would like to know your opinions:
Ibm glossary of computing terms:
http://www-3.ibm.com/ibm/terminology/goc/gocmain.htm
American National Standard Dictionary for Information Systems:
http://www.ncits.org/tc_home/k5htm/ANSDIT.htm

Please point me to any other sources of information. Its easy to miss
some things when there are millions of pages to search, despite google
assistance.

I believe at the beginning of the process it might be difficult to
separate between descriptions of the model, the language, and the
implementation because they affect each other so much but eventualy it
should happen.

Regards,

Seun Osewa

====

Hi,

The architecture I have been thinking of involves storing both the
data and the business logic on the server. "Users" never actually use
the query language (Let's call it PL/PlayQL, which would be
procedural) directly. They instead connect to the PL/PlayQL programs
running on the server, via a simple request/response interface.

The language, while strictly procedural, would include a savepoints
feature that will allow the server to roll back execution in the case
of mysterious errors and try again from last savepoint. SO the
effects of program execution can always be reversed until a savepoint.
And because the program is managed by the server the programs do not
stop running abruptly when clients disconnect. In fact one can have
long-running processes on the server and it becomes something like an
OS (pending: unfuzzification of these statements).

Summary: processes using "PL/PlayQL" running on database server having
flexible access to data. Client programs connect to these processes
and communicate using simplified protocols that are totally
independent of the underlying database. All business logic is
implemented on the server. PL/PlayQL as a language implements
savepoints and can be rolled back to last savepoint. Communication
with client actually takes place at savepoint boundaries (cause it
cannot be reversed).

Seun Osewa
====
Hi,

I am thinking of some non-standard definitions for the PlayDB system:

Data Model: a way of representing reality in an information system so
it can be usefully manipulated.

Object: An independent entity, distingushable from all other objects
by a certain unique combination of attributes.

Class: (Most Important)Any arbitrary group of objects thought to
possess certain similarities. And the special thought here is that
classes may overlap in interesting ways.

For example: a=rectangle, b=square, c=paralellogram, d=rhombus
rectangle = (a,b)
parallelogram = (a,b,c,d)
rhombus=(b,d)

This represents no hierarchy. That's why I want to see it as
arbitrary grouping.

Seun Osewa

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.03100...@posting.google.com>...

> Hi,
>
> This is for relational database theory experts on one hand and
> imlementers of real-world alications on the other hand. If there was
> a chance to start again and design SQL afresh, for best
> cleaness/power/performance what changes would you make? What would
> _your_ query language (and the underlying database concept) look like?
>
> Seun Osewa
> PS: I should want to post my ideas too for review but more
> experienced/qualified people should come first

Bob Badour

unread,

Oct 9, 2003, 10:31:23 PM10/9/03

to

With all due respect, relational means something very specific when it comes
to database management just as rational has a very specific meaning when
talking about numeric algebras.

"Seun Osewa" <seun...@inaira.com> wrote in message
news:ba87a3cf.03100...@posting.google.com...

Bob Badour

unread,

Oct 9, 2003, 10:35:56 PM10/9/03

to

With all due respect, Dawn, you are an idiot.

"Dawn M. Wolthuis" <dw...@iserv.net> wrote in message
news:6db906b2.03100...@posting.google.com...

Anith Sen

unread,

Oct 9, 2003, 10:42:01 PM10/9/03

to

>> The mathematics of language is more complex than the mathematics of
relations, particularly simple relations (such as 1NF tables). <<

Are you sure, you know what you are talking about?

>> I would suggest ditching the entire relational model (as both overly

simplistic in its theory and overly complex in its implementation.. <<

Incredible! How about reading some books on the subject?

--
-- Anith
( Please reply to newsgroups only )

Christopher Browne

unread,

Oct 9, 2003, 11:36:18 PM10/9/03

to

A long time ago, in a galaxy far, far away, seun...@inaira.com (Seun Osewa) wrote:
> I started a new thread and waited in vain for google to allow me to
> post follow-ups to the first post, so I am putting the follow-ups here
> temporarily. Pls review and try to reply to the other thread titled
> PlayDB. Thanks.

This is NOT something where it is likely that structure can emerge out
of the chaos of a series of Usenet postings.

Designing a model for data access is a matter that requires an
enormous amount of thought and reflection, and mandates having a clear
crystal of an idea. Crystals need to be left undisturbed until they
have grown to size; that is doubtless true here.

The "committee" approach is what led to the characteristic _problems_
of SQL, where competing agendas assortedly led to:
a) Necessary features being left out because they couldn't agree
on them [SQL '99 finally got around to having the notion of
'sequences']
b) Stuff getting forced in on the basis of political pressure.

Your "chance" of coming up with a design depends on carefully defining
a design, and presenting it coherently. But the chaos of Usenet is
unlikely to be the right place for that.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www.ntlug.org/~cbbrowne/unix.html
Dain bramaged.

Christopher Browne

unread,

Oct 9, 2003, 11:36:17 PM10/9/03

to

A long time ago, in a galaxy far, far away, dw...@iserv.net (Dawn M. Wolthuis) wrote:
> Good question. Although I would want to move away from relational
> databases too, if there is an RDBMS and one wants to query it, what
> would I aim for? If you look at XQuery, you will see an example of
> what I would definitely NOT aim for. Although the user of such a
> language might very well be a technical person, instead of starting
> with mathematics (relational calculus, relational algebra) I would
> suggest starting with language. The mathematics of language is more
> complex than the mathematics of relations, particularly simple
> relations (such as 1NF tables).

No, that _very much_ gets things backwards.

You need to have a clearly defined model of how the data is to be
manipulated before it makes any sense to try to make up a language.
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/rdbms.html
To iterate is human; to recurse, divine.

Seun Osewa

unread,

Oct 10, 2003, 2:17:52 AM10/10/03

to

dw...@iserv.net (Dawn M. Wolthuis) wrote in message news:<6db906b2.03100...@posting.google.com>...

> I would suggest ditching the entire relational model (as both overly
> simplistic in its theory and overly complex in its implementation) and
> start with English (that is one of the other names for the GIRLS
> language). Note that language is also the starting point for putting
> data in XML documents, but it sure doesn't seem to be the starting
> point for XQuery, eh?
>
> --dawn
> Dawn M. Wolthuis
> www.tincat-group.com

Please explain further. What do you really mean? Its natural for
everyone here to think every word in that post was a troll unless you
explain your views more clearly. You could not have expressed a more
unpopular/unsupportable combination of ideas! Exactly how would we go
about using language as a query tool? Is this AI? What would the
underlying model be knowing how redundant and imprecise language can
be? Tell what we may have missed.

Seun Osewa

Ruud de Koter

unread,

Oct 10, 2003, 2:57:50 AM10/10/03

to

Hi Seun,

Who ever said ad-hoc queries would be performed by end-users? Or are you not
implying that?

Regards,

Ruud de Koter

--
--------------------------------------------------------------------------------------
Ruud de Koter HP OpenView Software Business Unit
Senior Software Engineer IT Service Management Operation
Telephone: +31 (20) 514 15 89 Van Diemenstraat 200
Telefax : +31 (20) 514 15 90 PO Box 831
Telnet : 547 - 1589 1000 AV Amsterdam, the Netherlands
Email : ruud_d...@hp.com

internet: http://www.openview.hp.com/products/servicedesk
intranet: http://ovweb.bbn.hp.com/itservicemanager
--------------------------------------------------------------------------------------

Lee Fesperman

unread,

Oct 10, 2003, 5:11:02 AM10/10/03

to

Jan Hidders wrote:
>
> Lee Fesperman wrote:

> > Jan Hidders wrote:
> >>
> >>Correct. There's absolutely no reason to believe that you cannot have
> >>data independence with logical pointer or references. I would however
> >>argue that allowing entities without representable keys is not a good idea.
> >
> > Incorrect. Even though you call them 'logical' pointers, they are
> > still physical artifacts and have no place in a truly logical view of
> > the database.
>

> Logical pointers can be defined at the logical level and implemented in
> various ways. They are just as much physical artifacts as, say,
> relations are.

>
> > Databases are about
> > data, and pointers are not data (or meta-data).
>

> They carry information. That makes them data. I see no good reason to
> use a more restrictive definition.

Because the information they carry is physical, which doesn't belong in a logical model.

Let me try again:

In the relational model, tables (relations) are completely independent except through
the action of inter-table constraints (like referential integrity). The constraints are
declarative and truly logical. They can be removed without changing the contents of a
single column in the participating tables.

Pointers form a rigid bond between table. To my mind, this makes them physical --- if it
walks like a duck, etc.

Pointers don't have the flexibility associated with a logical entity. You must change
table contents to remove that bond.

It's been argued here that pointers (references) could be mapped to pk/fk constructs.
Besides throwing away other benefits of pk/fk, it doesn't work in the general case, for
instance:

+ When the fk is part of the pk of the referencing table,

+ When the fk columns are shared with other fk's.

> > The OP needs to increase his knowledge of database concepts before he
> > tries to 'fix' things.
>

> His motivation may be wrong, but he asks the right questions.

Which makes him a troll (who has lately revealed his agenda).

> > A newsgroup is the wrong place to get any real
> > depth of understanding.
>

> I fully agree. If only because there seem to be so many in these
> newsgroups that confuse "knowledge of database concepts" with "knowing
> what Chris Date et al. say about them". If you want to know what the
> real experts in industry and the research community think, these
> newsgroups are certainly not representative.

Are you saying that someone can be knowledgeable about database concepts without knowing
of the enormous contributions made by relational? That is foolishness.

Perhaps, you are attacking Date (I don't know your agenda, though "et al." is
revealing). In that case, I agree -- you don't have to read Date to become knowledgeable
about database concepts.

Lee Fesperman

unread,

Oct 10, 2003, 5:51:26 AM10/10/03

to

Read the papers on www.firstsql.com.

Seun Osewa

unread,

Oct 10, 2003, 7:00:00 AM10/10/03

to

Ruud de Koter <ruud_d...@hp.com> wrote in message news:<3F865867...@hp.com>...

> Hi Seun,
>
> Who ever said ad-hoc queries would be performed by end-users? Or are you not
> implying that?
>
> Regards,
>
> Ruud de Koter

Hi Ruud,

It puzzles me myself.

Hmmm ... somewhere in my sleepy brain at 4am in the night a brain cell
misfired and told me that the above post is a continuation of the
running "Yes or no, pointers are bad and they are one of the reasons
why the relational database model is good and network/object models is
bad" debate. And its agreed that the adhoc nature of queries you can
run on relational or "almost-relational" databases using SQL is one
reason for its popularity. And I wanted to chip in that comment as a
way of saying that u know based on the way things are today this is
less of a strong reason to use SQL/relational rather than more rigid
database models.

This seems like a good explanation for my post :-P Hope it makes some
sense.

Seun Osewa

Dawn M. Wolthuis

unread,

Oct 10, 2003, 11:22:45 AM10/10/03

to

While I definitely agree that the mathematics of the data persistence
mechanism is not as important to me as whether it works or not, as a
former mathematician, I have done a little study related to the
mathematics of non-relational approaches, such as PICK (the one both
Wol and I have been know to advocate on behalf of).

These models tend to start with language rather than mathematics. So,
what started out as my attempt to show such things as the fact that a
PICK file is more like a mathematical RELATION than an RDBMS table, I
ended up studying the mathematics of language for a short time - one
can see that the mathematics of language, which is what we are storing
when working with text-based objects, is much more complex than simple
relations.

By the way, in case you are wondering how PICK files are more like
mathematical relations -- they do have a numbered position for each
domain (in other words, there is a location for each column within a
row as there is a location in a PICK ITEM/RECORD); they do not by
default request a constraint on the length of values in a given domain
(a quite unnecessary database constraint); and they permit relations
as elements within a relation -- there is no mathematical requirement
that a relation be in first normal form, for example.

I do tire of the thought that a database premised on the relational
model is somehow more mathematically accurate than those premised on a
language model. PICK, like XML, was used to make it easy to think
about storing and retrieving text. If you set aside the need for
storing other objects for now and focus on text-based data
persistence, it is simply a means to persist propositions. If one were
to normalize your sentences before you said them, you might guess that
people would have a harder time figuring out what you were saying.
Similarly, normalizing data before persisting it tends to make it
difficult to retrieve the original propositions, reconstructing
language from normalized data.

It's time to move on from the relational model -- it was a good
academic exercise, but has not proven a very agile means for
persisting and retrieving propositions, methinks. smiles. --dawn

"Anthony W. Youngman" <thew...@nospam.demon.co.uk> wrote in message news:<xTDLP1CFRIg$Ew...@thewolery.demon.co.uk>...
> In article <3F7F8E...@ix.netcom.com>, Lee Fesperman
> <firs...@ix.netcom.com> writes
> >If you don't care for mathematical principles, there's always ad-hoc database
> >models.
> >Check out Pick, OO and XML databases. They're interested in what works and
> >ignore
> >elegance and mathematical principles.
>
> Mathematical principles? You mean like Euclidean Geometry and Newtonian
> Mechanics? They're perfectly solid, good, mathematically correct. Shame
> they don't actually WORK all the time in the real world.
>
> That's what I feel about relational, too ...
>
> Cheers,
> Wol

Patrick K. O'Brien

unread,

Oct 10, 2003, 11:45:31 AM10/10/03

to

Lee Fesperman <firs...@ix.netcom.com> writes:

But references (as opposed to pointers) are not that far off from
surrogate keys, at least in terms of how they can be used to enforce
referential integrity. But don't take that as trolling, I'm just
setting the stage for a few questions below.

> Pointers don't have the flexibility associated with a logical
> entity. You must change table contents to remove that bond.

I agree with that, and the same is true of references - they do
represent a bond between objects, which makes them inflexible. But if
you compared an RDBMS that used surrogate keys to establish RI between
two tables, you've got a certain inflexibility there as well. If
there is no longer a relationship between the two tables, the FK
should be dropped along with the RI constraint, right?

> It's been argued here that pointers (references) could be mapped to
> pk/fk constructs. Besides throwing away other benefits of pk/fk, it
> doesn't work in the general case, for instance:

The only benefit I've come up with is the case where something other
than a surrogate key is used as the fk (like state postal code, for
example), eliminating the need to do a join to get a meaningful
value. What other benefits do you consider to be lost when references
are used instead of values?

> + When the fk is part of the pk of the referencing table,

If I understand you correctly, I disagree with this. I understand you
to say that if a fk makes up part of the pk of the referencing table,
and the fk was a reference, instead of a value, the dbms would not be
able to enforce the uniqueness constraint on the pk of the referencing
table. That is not true about my ODBMS implementation (PyPerSyst).
Did I understand you correctly?

> + When the fk columns are shared with other fk's.

I don't understand this point. Can you provide a bit more detail
about what you mean, or maybe an example? Thanks.

--
Patrick K. O'Brien
Orbtech http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------

Dawn M. Wolthuis

unread,

Oct 10, 2003, 12:13:10 PM10/10/03

to

Thank you, Seun, for asking your question with a bit of logic and not
gut-reaction emotional baggage (and for also asking a question of me
off-list so I could ramble). I'll try to make this more suscinct.

First of all, I have read Codd's 1970 & 1974 ACM papers, as well as
his "The Relational Model for Database Management, Version 2" book. I
have also read several books by Chris Date and by Fabian Pascal. I
held a dialog (that reads more like a monologue) with Pascal and it is
reproduced in total if you scroll to the bottom of the
http://store.tincat-group.com page and click on the Dick Pick / Ted
Codd Blue Brothers parody picture. I had not been reading this news
group until lately, but it strikes me that this is a group that might
be very entertained by that dialog.

I have a master's degree in mathematics and my father is a linguist.
I find the language of mathematics and the mathematics of language
both fascinating. My experience, however, is that I have run IT
project teams working with a variety of databases (and languages) and
have never seen any environment that is as agile for develoeprs (both
productive from the start and easy to maintain) than the teams I have
led that worked with the UniData database. I have worked with SQL as
well as both older and newer database languages.

So, nope, I'm not trolling. I've been doing some research the past
couple of years and I'm convinced that it is time to do something new
(and yet old) with data persistence.

I favor using Java for a variety of reasons, but am comfortable with
other languages as well, and think that using Java both for the
software application and for the constraints on the data, rather than
encoding constraints in some other language within a database, makes
for both a more agile development approach AND, surprisingly enough,
tends to make for better data integrity, although a lousy software
developer can certainly mess up either environment. Separating the
DBA from the software developer has definitely had a negative affect
on the speed with which software is developed and maintained, but my
experience (and intuition -- I don't, yet, have scientific evidence)
tells me that the benefits purported by the approach of having a dba
work on some centralized constraints on the persisted data outside of
the context of the use of that data have not really come to fruition
and/or are not worth the costs of using this approach (more on that
some other time).

So, while some might classify me as an idiot (men can be so emotional
sometimes ;-), I have several graduate classes in logic to my credit
and believe that I am approaching this topic quite logically, even if
my summaries skip some of the logical steps in the process. I have
thought about how to prove my points and since my point is really
about agility and quality in application software development and
maintenance, a competition to see what tools and techniques and what
data persistence approaches win such a competition might be the best
proof. The current industry benchmarks for databases tend to be
SQL-based and highly political, so let's put different approaches to
the test.

Thanks for asking your question and not just assuming I'm a nut
because I disagree with the current state of the industry on this
topic. I'm sure there are gaps in my thinking and I know some of my
opinions are based on intuition that arises from my experience, but I
do hope to have more proof in the future. I am also very willing to
adjust my opinions with convincing arguments and evidence and trust
that there are some on this list who work similarly.
--dawn

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.03100...@posting.google.com>...

Patrick K. O'Brien

unread,

Oct 10, 2003, 12:22:45 PM10/10/03

to

lauri.pi...@atbusiness.com (Lauri Pietarinen) writes:

Please allow me to take exception to a few of your points.

> If you take a standard SQL-query with sorting and grouping and some
> joins and compare it with hand written "navigational" code you will
> notice that what can be expressed in SQL in 5-10 lines of code will
> require several pages of hand written code for equivalent result.

That really depends on two things: the specific SQL query, the
"navigational" programming language, and the capabilities of the
ODBMS. In general, I would agree that a declarative language is going
to be able to express an operation more concisely than a procedural or
object-oriented language. But a language like Python is very powerful
and expressive. I can do the equivalent of a simple SQL query in
about as many lines of code in Python using the PyPerSyst ODBMS. It
would take an incredibly complex SQL query before I'd get to "several
pages of hand written code". PyPerSyst itself is only several pages
of hand written code. ;-)

> The third issue is that the SQL query can be optimised to a much
> higher degree than procedural code. Say you add a new index. The
> DBMS will/can immediately start using it without any user
> intervention (if it decides to). In the procedural alternative you
> would have to recode your query.

You wouldn't necessarily *have to*. PyPerSyst does not yet have a
declarative query capability. But if you add an index, the same
procedural code will now make use of that index (or rather, the
procedural code is calling methods of objects that are smart enough to
look for indexes to optimize themselves without requiring any changes
in application code).

I only point this out becase critics of object databases really need
to improve their understanding of the capabilities of object languages
and object databases. They just aren't as inflexible as some pundits
would like to claim.

> The optimiser can also take into account the cardinality of tables
> (=number of rows) so that it will produce a different plan for a
> small database and a big database.

PyPerSyst doesn't do this, but it is completely within the realm of
possibilities. All we are really talking about is levels of
abstraction. Nobody should be writing procedural code at such a low
level of abstraction that their code can't be optimized at runtime
based on the presence or absence of indexes, or the size of the data
involved. There is nothing that precludes an ODBMS, or procedural
code, from being able to be optimized at runtime without changing
code.

> An optimiser can even be sensitive to input from users at run time,
> say you have
>
> select name, salary
> from nurses
> where sex = ?
>
> If the parameter given at run time is 'M' then it could be
> advantageous to use an index, but not if parameter is 'F'. This
> decision could be made by the optimiser "on the fly". I hope you get
> the picture...

I hope you get the picture that every example you have given can be
true about procedural code. Here is a very simple example of the
source code for the "find" method of the PyPerSyst Extent class:

def find(self, **criteria):
"""Return list of instances exactly matching all criteria."""
instances = []
names = criteria.keys()
spec = self._makekeyspec(names)
if spec in self._altkeys:
# Use alternate keys as a shortcut.
key = self._makekey(criteria, spec)
d = self._altkeys[spec]
if key in d:
instance = d[key]
instances.append(instance)
else:
# Scan all instances for a match.
for instance in self._instances.values():
match = True
for name, value in criteria.items():
if getattr(instance, name) != value:
match = False
break
if match:
instances.append(instance)
return instances

This find method returns a list of instances matching some criteria
supplied by the application code. A specific example of its use would
look like this:

people = db.root['Person'].find(name='Lauri Pietarinen')

If there is an alternate key index on the name attribute, the find
method will use that as an internal optimization. If one does not
exist, then it will scan all instance of the Person class looking for
a match. Let's assume that today there is no index, and all instances
are scanned. If tomorrow the schema is changed to declare that there
should be an alternate key enforced on the name attribute, then
PyPerSyst will build and maintain an index that will be used to
enforce that constraint. As a side benefit, the find method will also
use that index to optimize its performance. The application code will
not have changed.

> > Other than that, what's wrong with the network (or similar)
> > models? I have not found a good link to such a discussion though
> > I hear it repeated that there are certain classical arguments
> > against them.
>
> In 1974 there was the BIG DEBATE where Codd defended the RM against
> network guys. I wonder if there is a transcript available?
>
> I think one of Codds arguments was that the number of manipulative
> operators needed in the RM (i.e. insert, update, delete) was much
> lower than in CODASYL (connect, disconnect, etc etc...)

Any arguments that are true about CODASYL are not necessarily true
about current ODBMSs, in spite of the fact that they both could be
described as implementing a network model. I have also tried to find
information about the weaknesses of the network model and have not
found much of value. Some of the obvious weaknesses of network
products, such as the difficulty in changing your schema because
pointers were static and tied to the physical location of information
on disk, are easily avoided in modern object databases. For example,
PyPerSyst uses references, but has no problem evolving schemas and
migrating instances from the old schema to the new schema. IMO,
references in modern OO languages aren't as evil as pointers in old
CODASYL products. But I'm still looking for information to confirm or
dispute that opinion.

Thanks for listening. :-)

Costin Cozianu

unread,

Oct 10, 2003, 12:54:41 PM10/10/03

to

Seun Osewa wrote:

> Hmmm ...
>
> As I see it, problem can be solved with a view which I think is
> roughly analogous to the "abstract data type" you propose. And I
> disagree that payments_id creates an entity that is not present in the
> business model. If it was so there would be no need to group the
> various sort of payments together in a query (as the challenge
> suggests).

You can group several unrelated things together without giving them an
id. Like apples and oranges that we put in a basket, we don't need to
give them apple_or_orange_id.

Let's think of CUSTOMER, who have a default payment mechanism on record,
where the PAYMENT_MECHANISM can be

Credit Card, (CardtyTYPE, Number )

OR

Electronic Withdrawal From Account: (Bank Number, AccountNumber)

You really don't get the business user to care that we create an extra
ID for these two entities, they are already identifiable through their data.

if we create a PAYMENT_MECHANISM_ID in current databases is just becuase
the DBMS vendor doesn't allow us to say:

type PaymentMechanism = CreditCard | ElectronicWithdrawalFromAccount

So database developers will create a PAYMENT_MECHANISM table and invent
PAYMENT_MECHANISM_ID, and even then it is either impossible or extremely
awkward to specify the needed integrity constraint.

But if the end user sees

PAYMENT_MECHANISM_ID: 1020303030303,

where he expects a credit card number or account details, he will say
"What ???"

Sometimes, they are forced to swallow such invention and make them part
of their business model, just because they have to use our software, but
not because a mathematical model of the business actually needs
surrogate keys.

Costin

Anthony W. Youngman

unread,

Oct 10, 2003, 1:57:42 PM10/10/03

to

In article <bm4sa8$bi4iq$1...@ID-152540.news.uni-berlin.de>, Costin Cozianu
<c_co...@hotmail.com> writes

>> select --+leading(d) use_nl(e) index(e deptref_ix)
>> E.Name FROM Emp E, Dept D
>> WHERE E.Deptref = REF(d) AND D.Name = 'shoe';
>>
>> did follow the access path that I wanted to. Of course, it still remains to
>> see the more challenging task solved with the hints removed.
>>
>> It is optimizer that is always a problem:-)
>>
>>
>
>Well, I'd be least concerned about the optimizer. If you're using Oracle , I'd
>lookup the documentation for the cute name
>
> IS_DANGLING
>
>Cool stuff.

I'd just use a database that doesn't NEED an optimiser :-)

Cheers,
Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. "Let
me through. I'm a nosey person.", she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett

Mikito Harakiri

unread,

Oct 10, 2003, 2:08:23 PM10/10/03

to

"Patrick K. O'Brien" <pob...@orbtech.com> wrote in message
news:m265ixf...@orbtech.com...

> lauri.pi...@atbusiness.com (Lauri Pietarinen) writes:
> You wouldn't necessarily *have to*. PyPerSyst does not yet have a
> declarative query capability. But if you add an index, the same
> procedural code will now make use of that index (or rather, the
> procedural code is calling methods of objects that are smart enough to
> look for indexes to optimize themselves without requiring any changes
> in application code).
>
> I only point this out becase critics of object databases really need
> to improve their understanding of the capabilities of object languages
> and object databases. They just aren't as inflexible as some pundits
> would like to claim.

I just went through simple stupidity test for Object Relational yesterday.
Yes, Object Relational folks are not totally dumb, their stuff doesn't have
*obvious* flaws. Therefore, yes it seems possible in principle to optimize
queries in Object Extensions to SQL, and maybe in even OQL as well.

The problem is that they raise complexity so significatly that it is
extremely challenging for the optimizer to catch up. Note, that sql
optimization is struggling to solve all problems in "simple" area of "flat"
relations. Or, to put it in other words, how large is optimization team in
your little PyPerSyst?

Patrick K. O'Brien

unread,

Oct 10, 2003, 2:36:35 PM10/10/03

to

"Mikito Harakiri" <mikha...@iahu.com> writes:

I don't completely understand all of what you have said, or exactly
what you are asking, but I'll try to provide some information that may
answer your question. First, PyPerSyst is not Relational, nor is it
Object Relational, nor does it support OQL. It is a persistence
system for Python objects. It can persist any Python object graph,
but it also comes with a couple of base classes that can be used to
create an object system that has some *qualities* in common with
Relational databases.

As for optimization, that isn't quite as much of an issue, as
PyPerSyst is an in-memory database, which means that all objects
reside in RAM. Transactions are logged to disk before being executed,
and the combination of system snapshot and transaction log is used to
recover from a crash. So performance is good without a lot of
attention to optimization.

There is no optimization team. As of right now, I have written all
the PyPerSyst code. But I have drawn inspiration from other projects
and academic papers, and there are several developers actively
contributing ideas as well. PyPerSyst is free and open source, so you
can examine the source code yourself:

http://sourceforge.net/projects/pypersyst/

Did that answer your question?

Mikito Harakiri

unread,

Oct 10, 2003, 3:07:36 PM10/10/03

to

"Patrick K. O'Brien" <pob...@orbtech.com> wrote in message

news:m2he2hd...@orbtech.com...

I apologyse for rude tone of my message. Our newsgroup is overflown with
messages like "Could you please answer the following question, and here is
BTW a little system that I developed".

Optimization is a hard problem. It doesn't matter if the system operates in
memory or on hard disk. If it seems simple in your scope, that means that
there are limitations in flexibility, query expressiveness, and/or power.

Paul G. Brown

unread,

Oct 10, 2003, 6:05:06 PM10/10/03

to

"Mikito Harakiri" <mikha...@iahu.com> wrote in message news:<DIChb.34$qu2...@news.oracle.com>...

[ snip ]

> I just went through simple stupidity test for Object Relational yesterday.
> Yes, Object Relational folks are not totally dumb, their stuff doesn't have
> *obvious* flaws. Therefore, yes it seems possible in principle to optimize
> queries in Object Extensions to SQL, and maybe in even OQL as well.
>
> The problem is that they raise complexity so significatly that it is
> extremely challenging for the optimizer to catch up. Note, that sql
> optimization is struggling to solve all problems in "simple" area of "flat"
> relations. Or, to put it in other words, how large is optimization team in
> your little PyPerSyst?

Which bit of object relational drew your ire?

Extensible domains adds some complexity, but it also simplifies some
stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out the
Sequoia 2000 and Bucky benchmarks.

References and table inheritance make a lot harder.

KR

Pb

Jan Hidders

unread,

Oct 10, 2003, 7:09:53 PM10/10/03

to

Paul G. Brown wrote:
> Jan Hidders <jan.h...@pandora.be.REMOVE.THIS> wrote in message news:<3f851...@news.ruca.ua.ac.be>...
>>

>>Er, well, file names are actually not abstract because they have a
>>concrete representation that you can read.
>
> We could go round and round on this. I'd observe that there is a whole
> class of things which can potentially go into tuple attributes that aren't
> 'values': for example, you might put a query in there,

A query is a value.

> or a reference,

A reference, just by itself without some magical DEREF funcion is an
(abstract) value.

> or a filename,

A filename is a value.

> or an executable script to generate a value.

A script is a value.

> The information content of such values are always 'one step removed' from their
> representation, and may in fact be ambiguous.

That you can do an extra step to derive some more information doesn't
mean that they didn't already have some information to begin with. The
crucial question is if in this extra step you need some extra
information that is not to be found anywhere in the tables. For the
evaluation of the query you don't. For dereferencing the reference
without the help of a table that associates them with their destination,
you do. For looking up the contents of the file without the help of a
table that associates them with their contents, you do. For executing
the script you don't.

So as long as you don't expect a DEREF function or a function that
magically looks up the content of a file none of you examples violates
the information principle.

-- Jan Hidders

Mikito Harakiri

unread,

Oct 10, 2003, 7:13:27 PM10/10/03

to

"Paul G. Brown" <paul_geof...@yahoo.com> wrote in message

news:57da7b56.03101...@posting.google.com...

> Which bit of object relational drew your ire?
>
> Extensible domains adds some complexity, but it also simplifies some
> stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out
the
> Sequoia 2000 and Bucky benchmarks.
>
> References and table inheritance make a lot harder.

Yes, I meant References, Nested Collections, and other redundancies, not
User Defined Types (although, the latter are not a total success either).
Recently spatial folks shifted their focus to topologies, so they use a
respectable method, unlike those "Everything is object" dummies.

Christopher Browne

unread,

Oct 10, 2003, 8:47:36 PM10/10/03

to

After takin a swig o' Arrakan spice grog, paul_geof...@yahoo.com (Paul G. Brown) belched out...:

> Extensible domains adds some complexity, but it also simplifies some
> stuff. Queries over UDTs/UDFs works pretty good in Postgres. Check out the
> Sequoia 2000 and Bucky benchmarks.

One of the bigger challenges with PostgreSQL is that having extensible
aggregates (one of the "more advanced" relational features) makes it
tough to do aggregate-specific optimizations.

Typical to this, for instance, is that aggregates like MAX() and MIN()
can't trivially take advantage of indices. As "aggregates," their
values have to be collected by walking across the elements in the set.
But it is often the case that you could do the following transform:

select max(field) from table where [conditions];

--->

select field from table where [conditions] order by field descending
limit 1;

To "special case" this is a very dangerous idea, because:
a) It's a "hack" for some aggregates;
b) If other aggregates are needed, it may be counterproductive;
c) If you haven't got a good index for the purpose, "plodding
through the aggregate" may not be worse.

The flip side is that by having aggregates be "generic," it doesn't
take hideous hacks if you need to add in sophisticated aggregates
(common for statistical work).

Of course, this is pretty divergent from the subject; PostgreSQL is
certainly NOT an "OODB"; its extensions tend to add to its 'relational
fidelity.'
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/linuxxian.html
Rules of the Evil Overlord #128. "I will not employ robots as agents
of destruction if there is any possible way that they can be
re-programmed or if their battery packs are externally mounted and
easily removable." <http://www.eviloverlord.com/>

Bob

unread,

Oct 11, 2003, 1:51:11 AM10/11/03

to

paul_geof...@yahoo.com (Paul G. Brown) wrote in message news:<57da7b56.03101...@posting.google.com>...

Yes, UDTs with inheritance simplifies queries.
This white paper includes good examples:
http://www.matisse.com/pdf/product_information/Whitepapers/SQLTechBrief.pdf

bob

Alfredo Novoa

unread,

Oct 11, 2003, 10:42:38 AM10/11/03

to

pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m2he2hd...@orbtech.com>...

> I don't completely understand all of what you have said, or exactly
> what you are asking, but I'll try to provide some information that may
> answer your question. First, PyPerSyst is not Relational, nor is it
> Object Relational, nor does it support OQL.

Nor it is a DBMS.

> It is a persistence
> system for Python objects. It can persist any Python object graph,

It is only a Phyton library which can save snapshots of main memory
network structures.

It another "DBMS" without DBMS like the infamous Prevayler.

http://www.dbdebunk.com/page/page/743719.htm

> As for optimization, that isn't quite as much of an issue, as
> PyPerSyst is an in-memory database, which means that all objects
> reside in RAM.

What nonsense! Optimization is independent to where data reside. There
are main memory SQL DBMSes with optimizers.

Regards
Alfredo

Seun Osewa

unread,

Oct 11, 2003, 11:02:50 AM10/11/03

to

Hi,

Costin Cozianu <c_co...@hotmail.com> wrote in message news:<bm6o4c$j8uu5$1...@ID-152540.news.uni-berlin.de>...

> Seun Osewa wrote:
>
> > Hmmm ...
> >
> > As I see it, problem can be solved with a view which I think is
> > roughly analogous to the "abstract data type" you propose. And I
> > disagree that payments_id creates an entity that is not present in the
> > business model. If it was so there would be no need to group the
> > various sort of payments together in a query (as the challenge
> > suggests).
>
> You can group several unrelated things together without giving them an
> id. Like apples and oranges that we put in a basket, we don't need to
> give them apple_or_orange_id.

How about fruits_id? :D Seriously! I think it'll always be possible
to support one model of doing things by contrasting its
correct/optimal use to a missaplication of the conflicting model. Its
good to be able to resist the temptation! This is made worse by the
fact that people enspousing one technilogy or one approach to things
usually are not motivated to learn to do things properly in the
conflicting way. For example, how many of us have ever used Pick or
IMS? (I haven't either)

> Let's think of CUSTOMER, who have a default payment mechanism on record,
> where the PAYMENT_MECHANISM can be
>
> Credit Card, (CardtyTYPE, Number )
>
> OR
>
> Electronic Withdrawal From Account: (Bank Number, AccountNumber)
>
> You really don't get the business user to care that we create an extra
> ID for these two entities, they are already identifiable through their data.
>
> if we create a PAYMENT_MECHANISM_ID in current databases is just becuase
> the DBMS vendor doesn't allow us to say:
>
> type PaymentMechanism = CreditCard | ElectronicWithdrawalFromAccount
>
> So database developers will create a PAYMENT_MECHANISM table and invent
> PAYMENT_MECHANISM_ID, and even then it is either impossible or extremely
> awkward to specify the needed integrity constraint.
>
> But if the end user sees
>
> PAYMENT_MECHANISM_ID: 1020303030303,

The end user never sees this. Only the application programmer, and
only if he does not skip over the whole problem by defining a neat
little view over the 4 tables.

> where he expects a credit card number or account details, he will say
> "What ???"
>
> Sometimes, they are forced to swallow such invention and make them part
> of their business model, just because they have to use our software, but
> not because a mathematical model of the business actually needs
> surrogate keys.

The relational model seems to need such an extra table and key in this
case and its a mathematical model ;-) But, yes, a mathematical model
is only a model and we can have a million of them and the only
difference is that some map more closely to reality than others. And
then again it depends on which segment of reality you are talking
about. To be too "religious" robs the mind of flexibility and
creativity. Sometimes with a little work the "stupid" idea opens the
door to improvement. History repeats itself.

>
> Costin

Patrick K. O'Brien

unread,

Oct 11, 2003, 11:10:17 AM10/11/03

to

alf...@ncs.es (Alfredo Novoa) writes:

> pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m2he2hd...@orbtech.com>...
>
> > I don't completely understand all of what you have said, or
> > exactly what you are asking, but I'll try to provide some
> > information that may answer your question. First, PyPerSyst is
> > not Relational, nor is it Object Relational, nor does it support
> > OQL.
>
> Nor it is a DBMS.

It walks like a duck, it quacks like a duck... ;-)

Alfredo Novoa

unread,

Oct 11, 2003, 11:17:21 AM10/11/03

to

pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m265ixf...@orbtech.com>...

> lauri.pi...@atbusiness.com (Lauri Pietarinen) writes:
>
> Please allow me to take exception to a few of your points.
>
> > If you take a standard SQL-query with sorting and grouping and some
> > joins and compare it with hand written "navigational" code you will
> > notice that what can be expressed in SQL in 5-10 lines of code will
> > require several pages of hand written code for equivalent result.

> In general, I would agree that a declarative language is going

> to be able to express an operation more concisely than a procedural or
> object-oriented language. But a language like Python is very powerful
> and expressive. I can do the equivalent of a simple SQL query in
> about as many lines of code in Python using the PyPerSyst ODBMS.

Python is a very low level language compared to SQL.

Lauri was not talking about trivial queries like "select * from A"

PyPerSyst is not a DBMS.

> It
> would take an incredibly complex SQL query before I'd get to "several
> pages of hand written code".

It is completely false.

> PyPerSyst itself is only several pages
> of hand written code. ;-)

It is one of the reasons because it is ridiculous to compare it with
an SQL DBMS like DB2 or Oracle. It is only a little toy which takes
snapshots of main memory object networks in files.

> > The third issue is that the SQL query can be optimised to a much
> > higher degree than procedural code. Say you add a new index. The
> > DBMS will/can immediately start using it without any user
> > intervention (if it decides to). In the procedural alternative you
> > would have to recode your query.
>
> You wouldn't necessarily *have to*. PyPerSyst does not yet have a
> declarative query capability.

One of the many reasons because it can not be considered seriously.

> But if you add an index, the same
> procedural code will now make use of that index (or rather, the
> procedural code is calling methods of objects that are smart enough to
> look for indexes to optimize themselves without requiring any changes
> in application code).

It is something like XBase whith pointers. There is nothing to do with
optimization.

> I only point this out becase critics of object databases really need
> to improve their understanding of the capabilities of object languages
> and object databases.

You are an ignorant. You should educate yourself before trying to
correct real experts like Lauri.

> > The optimiser can also take into account the cardinality of tables
> > (=number of rows) so that it will produce a different plan for a
> > small database and a big database.
>
> PyPerSyst doesn't do this, but it is completely within the realm of
> possibilities.

It is completely out of its possibilities. It does not have optimizer,
nor plans, nor declarative query language, etc.

You are very good discrediting your own product. That code is awful
and shows many of the limitations of your stone age approach.

> I have also tried to find
> information about the weaknesses of the network model and have not
> found much of value.

Where did you tried?

There is plenty of information on any universitary level database
textbook.

Regards
Alfredo

Patrick K. O'Brien

unread,

Oct 11, 2003, 11:21:06 AM10/11/03

to

alf...@ncs.es (Alfredo Novoa) writes:

> pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m2he2hd...@orbtech.com>...
>

> > It is a persistence system for Python objects. It can persist any
> > Python object graph,
>
> It is only a Phyton library which can save snapshots of main memory
> network structures.
>
> It another "DBMS" without DBMS like the infamous Prevayler.

PyPerSyst shares certain things in common with Prevayler, but has gone
beyond the limited features provided by Prevayler. In particular,
PyPerSyst provides base classes with features common in database
management systems. I don't think anyone that conducted a serious
evaluation of PyPerSyst would be troubled by the fact that I call it a
DBMS[1].

[1] http://wikipedia.org/wiki/DBMS

Patrick K. O'Brien

unread,

Oct 11, 2003, 11:29:30 AM10/11/03

to

alf...@ncs.es (Alfredo Novoa) writes:

> pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m2he2hd...@orbtech.com>...
>

> > As for optimization, that isn't quite as much of an issue, as
> > PyPerSyst is an in-memory database, which means that all objects
> > reside in RAM.
>
> What nonsense! Optimization is independent to where data
> reside. There are main memory SQL DBMSes with optimizers.

What I probably should have said is that optimization is not my top
priority at the moment. RAM is much faster than disk, which is much
faster than tape. So the need to optimize or not is definitely
dependent on where the data resides. I never meant to imply that main
memory databases *couldn't* be optimized.

P.S. You might want to switch to decaf. ;-)

P.P.S. I'm glad your house is not made of glass. On the other hand,
you might want to get out a bit more. Life without windows must be
very confining. :-)

Patrick K. O'Brien

unread,

Oct 11, 2003, 12:06:37 PM10/11/03

to

alf...@ncs.es (Alfredo Novoa) writes:

> pob...@orbtech.com (Patrick K. O'Brien) wrote in message news:<m265ixf...@orbtech.com>...
> > lauri.pi...@atbusiness.com (Lauri Pietarinen) writes:
> >
> > Please allow me to take exception to a few of your points.
> >
> > > If you take a standard SQL-query with sorting and grouping and some
> > > joins and compare it with hand written "navigational" code you will
> > > notice that what can be expressed in SQL in 5-10 lines of code will
> > > require several pages of hand written code for equivalent result.
>
> > In general, I would agree that a declarative language is going
> > to be able to express an operation more concisely than a procedural or
> > object-oriented language. But a language like Python is very powerful
> > and expressive. I can do the equivalent of a simple SQL query in
> > about as many lines of code in Python using the PyPerSyst ODBMS.
>
> Python is a very low level language compared to SQL.

I've already made that point myself. What you are missing is that
Python is also a very high level language compared to other imperative
languages.

> Lauri was not talking about trivial queries like "select * from A"

Lauri made a blanket statement. I made a clarifying statement. I can
think of lots of trivial examples of a "standard SQL-query with
sorting and grouping and some joins." The "equivalent result" in
Python would not "require several pages of hand written code." I
stand by my original statement.

Of course, I'm only trying to clarify the issues, not win a war. I'm
not sure what your goal is in these conversations. Perhaps you could
help me understand where you are coming from, and why you feel the
need to be so confrontational and aggressive.

> PyPerSyst is not a DBMS.

PyPerSyst is still in development, so I wouldn't say that it is a
complete DBMS at the moment. But I think it has enough features to
qualify. You can argue semantics all you want.

> > It would take an incredibly complex SQL query before I'd get to
> > "several pages of hand written code".
>
> It is completely false.

That's a useful statement. And your evidence to support this is what?

> > PyPerSyst itself is only several pages of hand written code. ;-)
>
> It is one of the reasons because it is ridiculous to compare it with
> an SQL DBMS like DB2 or Oracle. It is only a little toy which takes
> snapshots of main memory object networks in files.

It does quite a bit more than that. And the fact that it does it in
so few lines of code is a testament to its design and to the
expressive power of the Python language. I think it is foolish to
judge the quality of software by the size of the code. More often
than not, the smaller code base is better.

And, in any case, I'm not comparing PyPerSyst to DB2 or Oracle. There
is plenty of room in the DBMS landscape for solutions of all sorts of
sizes and shapes. Not every application would benefit from the use of
DB2 or Oracle. The kinds of applications that I create work fine with
PyPerSyst. My goal is to have the same integrity enforcement that you
get with DB2 or Oracle. That's why I participate in these groups.
What is your reason for participating?

> > > The third issue is that the SQL query can be optimised to a much
> > > higher degree than procedural code. Say you add a new index.
> > > The DBMS will/can immediately start using it without any user
> > > intervention (if it decides to). In the procedural alternative
> > > you would have to recode your query.
> >
> > You wouldn't necessarily *have to*. PyPerSyst does not yet have a
> > declarative query capability.
>
> One of the many reasons because it can not be considered seriously.

I'll be sure to put you on the announcement list when the declarative
query feature is complete. In the mean time, please live up to your
own statements and stop taking PyPerSyst so seriously. ;-)

> > But if you add an index, the same procedural code will now make
> > use of that index (or rather, the procedural code is calling
> > methods of objects that are smart enough to look for indexes to
> > optimize themselves without requiring any changes in application
> > code).
>
> It is something like XBase whith pointers. There is nothing to do with
> optimization.

You seem to have a very limited definition of optimization.

> > I only point this out becase critics of object databases really
> > need to improve their understanding of the capabilities of object
> > languages and object databases.
>
> You are an ignorant. You should educate yourself before trying to
> correct real experts like Lauri.

I'm not aware of Lauri's qualifications. I'm not sure how they are
relevant. I'm sure Lauri can speak for herself. As for me, I'm
humble enough to know that we are all ignorant to some degree. Thanks
for reminding me. Have a nice day yourself.

> > > The optimiser can also take into account the cardinality of
> > > tables (=number of rows) so that it will produce a different
> > > plan for a small database and a big database.
> >
> > PyPerSyst doesn't do this, but it is completely within the realm
> > of possibilities.
>
> It is completely out of its possibilities. It does not have optimizer,
> nor plans, nor declarative query language, etc.

And it is completely out of the realm of possibility that I could add
these things to PyPerSyst? Your proof is what? That I'm ignorant?

You really do amuse me, Alfredo. Nothing is ever good enough for you.
Oh well, that's okay. Ignorance is bliss, after all. So I'll just
keep plugging away. You'll keep attacking. I'll get stronger.
PyPerSyst will get better. Next thing you know, I'll be taking over
Oracle accounts. :-)

Lauri Pietarinen

unread,

Oct 11, 2003, 3:49:04 PM10/11/03

to

Costin Cozianu <c_co...@hotmail.com> wrote in message news:<bm6o4c$j8uu5$1...@ID-152540.news.uni-berlin.de>...
>

> You can group several unrelated things together without giving them an
> id. Like apples and oranges that we put in a basket, we don't need to
> give them apple_or_orange_id.
>
> Let's think of CUSTOMER, who have a default payment mechanism on record,
> where the PAYMENT_MECHANISM can be
>
> Credit Card, (CardtyTYPE, Number )
>
> OR
>
> Electronic Withdrawal From Account: (Bank Number, AccountNumber)
>
> You really don't get the business user to care that we create an extra
> ID for these two entities, they are already identifiable through their data.
>
> if we create a PAYMENT_MECHANISM_ID in current databases is just becuase
> the DBMS vendor doesn't allow us to say:
>
> type PaymentMechanism = CreditCard | ElectronicWithdrawalFromAccount
>
> So database developers will create a PAYMENT_MECHANISM table and invent
> PAYMENT_MECHANISM_ID, and even then it is either impossible or extremely
> awkward to specify the needed integrity constraint.
>
> But if the end user sees
>
> PAYMENT_MECHANISM_ID: 1020303030303,
>
> where he expects a credit card number or account details, he will say
> "What ???"
>
> Sometimes, they are forced to swallow such invention and make them part
> of their business model, just because they have to use our software, but
> not because a mathematical model of the business actually needs
> surrogate keys.

In my mind the rational for using surrogates in databases is that
they insulate us from cascading changes in the case that we have to
change the primary key.

An example:

we have the following relations:

VEHICLE(license_number primary key, year_of_purchase, etc...)
VEHICLE_INSURANCE(insurance_no primary key, license_number, etc...)

Let's suppose for the sake of argument that no two vehicles have
the same license number.

Now, the user enters the data of a new insurance policy but miss spells
the license number and does not notice it until he has entered quite
a lot of stuff in the system. At this point the value of 'license_number'
has potentially, as a foreing key, been placed in many tables
in the system.

There are now two possibilities:
1) He removes the policy and re-enters everything
2) The system has implemented a function to update all
foreign key values
(OK, some databases, e.g. SQLServer2000 have a 'cascade'
primary key update rule, but would I trust it with a cascading
update of, say 50 tables?)

Alternative 1) means extra work and frustration for user
Alternative 2) means lots of extra coding; note that this
'alter' function will have to be potentially updated over time
as new tables are introduced with license_number as foreign key

Now, surrogates give us some insulation from that problem, because
since the license number is found only in one table it is easy to change.

Our schema with surrogates would be:

VEHICLE(vehicle_id, license_number unique, year_of_purchase, etc...)
VEHICLE_INSURANCE(insurance_no primary key, vehicle_id, etc...)

So in this case the use of surrogates is purely a pragmatic question and
not something that results from some mathematical theory.

Please note, that the user is not supposed to even see the surrogate value,
so it does not burden him in any way.

regards,
Lauri Pietarinen

Lauri Pietarinen

unread,

Oct 11, 2003, 5:15:21 PM10/11/03

to

Patrick K. O'Brien wrote:

>lauri.pi...@atbusiness.com (Lauri Pietarinen) writes:
>
>Please allow me to take exception to a few of your points.
>

Sure!

>>If you take a standard SQL-query with sorting and grouping and some
>>joins and compare it with hand written "navigational" code you will
>>notice that what can be expressed in SQL in 5-10 lines of code will
>>require several pages of hand written code for equivalent result.
>>
>>
>
>That really depends on two things: the specific SQL query, the
>"navigational" programming language, and the capabilities of the
>ODBMS. In general, I would agree that a declarative language is going
>to be able to express an operation more concisely than a procedural or
>object-oriented language. But a language like Python is very powerful
>and expressive. I can do the equivalent of a simple SQL query in
>about as many lines of code in Python using the PyPerSyst ODBMS. It
>would take an incredibly complex SQL query before I'd get to "several
>pages of hand written code". PyPerSyst itself is only several pages
>of hand written code. ;-)
>

Well, just to make things a bit more concrete, how would you write the
following query

SELECT c.custname, p.prodname, sum(od.order_qty*p.prod_price) as
part_cust_total
from customer c,
order o,
order_detail od,
part p
where c.custid= o.custid and
o.orderid = od.orderid and
od.partid = p.partid
group by c.custname, p.partname
order by part_cust_total

with obvious tables:
customer(custid, custname)
order(custid, orderid, orderdate, etc...)
order_detail(orderid,partid,order_qty)
product(prodid, prodname, prodprice)

>>The third issue is that the SQL query can be optimised to a much
>>higher degree than procedural code. Say you add a new index. The
>>DBMS will/can immediately start using it without any user
>>intervention (if it decides to). In the procedural alternative you
>>would have to recode your query.
>>
>>
>
>You wouldn't necessarily *have to*. PyPerSyst does not yet have a
>declarative query capability. But if you add an index, the same
>procedural code will now make use of that index (or rather, the
>procedural code is calling methods of objects that are smart enough to
>look for indexes to optimize themselves without requiring any changes
>in application code).
>

If there are several indexes available, how does it choose which one to use?

What if this method was to return, say 1 million rows, but I only wanted
to show the first
10 on screen? Would I have to wait for all the rows to be read? What
if I want the rows
in a spesific order (not necessarily the same as the search criteria)?

>Thanks for listening. :-)
>
>
No problem!

best regards,
Lauri Pietarinen

Bob Badour

unread,

Oct 11, 2003, 5:53:37 PM10/11/03

to

"Dawn M. Wolthuis" <dw...@iserv.net> wrote in message
news:6db906b2.03101...@posting.google.com...

> Thank you, Seun, for asking your question with a bit of logic and not
> gut-reaction emotional baggage (and for also asking a question of me
> off-list so I could ramble). I'll try to make this more suscinct.

With all due respect, Dawn, you are an idiot. That statement has nothing to
do with my viscera or with my emotions; I write it with complete dispassion.
You simply are an idiot, and Seun simply lacks sufficient education to
recognize that fact from what you have written.

Costin Cozianu

unread,

Oct 11, 2003, 10:42:25 PM10/11/03

to

But this is a physical level concern. The table holding the foreign key,
may just physically hold a pointer to the record from where it will get
the (foreign key/candidate key)value. Then an "ON UPDATE CASCADE" will
be just as easy as if there was nothing to cascade.

Plus the scenario you described is not quite kosher. In general you
don;t allow users to update the logical identifier of an entity. For
example he mistype his license number, so you want to update that
information in all different tables.

But business wise this is not always allowable, what if, for example
that license ended up in a different table, and was validated and
produced business consequences ? Like it was printed on a legal
document, or the guy just got a better insurance rate on the license
number of his friend, after which he "corrected" it. So it is nto always
that we want ON UPDATE CASCADE.

Regardless of the problem related to changing the identity of entities,
in the case I presented, introducing a PAYMENT_ID is a strange way to
plumb the inadequacy of a type system.

Costin

Lauri Pietarinen

unread,

Oct 12, 2003, 4:11:13 AM10/12/03

to

Costin Cozianu wrote:

> Lauri Pietarinen wrote:
>
>>
>> In my mind the rational for using surrogates in databases is that
>> they insulate us from cascading changes in the case that we have to
>> change the primary key.
>>
> But this is a physical level concern. The table holding the foreign
> key, may just physically hold a pointer to the record from where it
> will get the (foreign key/candidate key)value. Then an "ON UPDATE
> CASCADE" will be just as easy as if there was nothing to cascade.

Do you mean some kind of table reference, like has been discussed in
this thread?

RELATION Dept ( Id Dept_Id KEY, Name String );
RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
Name PersonName, Salary Money );

RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;

RETRIEVE E.Name FROM Emp E, Dept D
WHERE DEREF(E.Dept).Id = E.Id AND D.Name = 'shoe';

If so, I think this approach will complicate things unnecessarily.

> Plus the scenario you described is not quite kosher. In general you
> don;t allow users to update the logical identifier of an entity. For
> example he mistype his license number, so you want to update that
> information in all different tables.

That's true, of course, in some situations. Or it might be that after
the policy has reached a certain state the license number can't
be modified. All the same I have seen such situations appear in practice,

>
> Regardless of the problem related to changing the identity of
> entities, in the case I presented, introducing a PAYMENT_ID is a
> strange way to plumb the inadequacy of a type system.

By type system, do you mean datatypes (=Domains) or table types?

Lauri

Costin Cozianu

unread,

Oct 12, 2003, 11:05:43 AM10/12/03

to

Lauri Pietarinen wrote:
> Costin Cozianu wrote:
>
>> Lauri Pietarinen wrote:
>>
>>>
>>> In my mind the rational for using surrogates in databases is that
>>> they insulate us from cascading changes in the case that we have to
>>> change the primary key.
>>>
>> But this is a physical level concern. The table holding the foreign
>> key, may just physically hold a pointer to the record from where it
>> will get the (foreign key/candidate key)value. Then an "ON UPDATE
>> CASCADE" will be just as easy as if there was nothing to cascade.
>
>
> Do you mean some kind of table reference, like has been discussed in
> this thread?
>
> RELATION Dept ( Id Dept_Id KEY, Name String );
> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept), Name
> PersonName, Salary Money );
>
> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>
> RETRIEVE E.Name FROM Emp E, Dept D WHERE DEREF(E.Dept).Id = E.Id AND
> D.Name = 'shoe';
>
>
> If so, I think this approach will complicate things unnecessarily.
>

Not at all, if this mechanism is part of the physical implementation level.

>> Plus the scenario you described is not quite kosher. In general you
>> don;t allow users to update the logical identifier of an entity. For
>> example he mistype his license number, so you want to update that
>> information in all different tables.
>
>
> That's true, of course, in some situations. Or it might be that after
> the policy has reached a certain state the license number can't
> be modified. All the same I have seen such situations appear in practice,
>
>
>>
>> Regardless of the problem related to changing the identity of
>> entities, in the case I presented, introducing a PAYMENT_ID is a
>> strange way to plumb the inadequacy of a type system.
>
>
> By type system, do you mean datatypes (=Domains) or table types?
>

Table types are data types, in other words it is not profitable at all
to have first class types and second class types.

In any case, in the example above it was about a "data type" i.e. a the
type of a value to be put in a column.

> Lauri
>

Costin

Lauri Pietarinen

unread,

Oct 12, 2003, 12:53:47 PM10/12/03

to

Costin Cozianu wrote:

> Lauri Pietarinen wrote:
>
>> Costin Cozianu wrote:
>>
>>> Lauri Pietarinen wrote:
>>>
>>>>
>>>> In my mind the rational for using surrogates in databases is that
>>>> they insulate us from cascading changes in the case that we have to
>>>> change the primary key.
>>>>
>>> But this is a physical level concern. The table holding the foreign
>>> key, may just physically hold a pointer to the record from where it
>>> will get the (foreign key/candidate key)value. Then an "ON UPDATE
>>> CASCADE" will be just as easy as if there was nothing to cascade.
>>
>>
>>
>> Do you mean some kind of table reference, like has been discussed in
>> this thread?
>>
>> RELATION Dept ( Id Dept_Id KEY, Name String );
>> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
>> Name PersonName, Salary Money );
>>
>> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>>
>> RETRIEVE E.Name FROM Emp E, Dept D WHERE DEREF(E.Dept).Id = E.Id
>> AND D.Name = 'shoe';
>>
>>
>> If so, I think this approach will complicate things unnecessarily.
>>
>
>
> Not at all, if this mechanism is part of the physical implementation
> level.

What do you mean by "physical implementation level"? That it is not
visible to the
user (meaning here the one that issued the query)?

>> Regardless of the problem related to changing the identity of
>> entities, in the case I presented, introducing a PAYMENT_ID is a
>> strange way to plumb the inadequacy of a type system.
>>
>> By type system, do you mean datatypes (=Domains) or table types?
>>
>
> Table types are data types, in other words it is not profitable at all
> to have first class types and second class types.

Am I to understand that you do not make a distinction between table
types and scalar, or data types?
What do you understand table types to be? Would, say the table (or relation)
PERSON(ID INTEGER, NAME STRING) be of type (INTEGER, STRING)?

best regards,
Lauri Pietarinen

Seun Osewa

unread,

Oct 12, 2003, 1:58:55 PM10/12/03

to

Hi,

seun...@inaira.com (Seun Osewa) wrote in message news:<ba87a3cf.03100...@posting.google.com>...
> One question that I think must be looked into is this: If SQL
> databases are successful today, is it because:
> ** of the relational _model_ they are based on?
>
> ** of the ease with which SQL can be used from within all programming
> languages and as an interactive query language?
>
> ** The failure or earlier models and the support of major SQL database
> vendors once it reached critical mass of adoption?
>
> In other words do we have the model, the language, or standardisation
> to blame/praise for the popularity of the relational model?

Quote from
http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html

"We knew sort of peripherally that there was some work going on in the
provinces, in San Jose. There was this guy Ted Codd who had some kind
of strange mathematical notation, but nobody took it very seriously.
Ray Boyce was hired at about this time, and we kind of got into this
game called the Query Game where we were thinking of ways to express
complicated queries. But actually before the Query Game started, I had
a conversion experience, and I still remember this. Ted Codd came to
visit Yorktown, I think it might have been at this symposium that Irv
alluded to. He gave a seminar and a lot of us went to listen to him.
This was as I say a revelation for me because Codd had a bunch of
queries that were fairly complicated queries and since I'd been
studying CODASYL, I could imagine how those queries would have been
represented in CODASYL by programs that were five pages long that
would navigate through this labyrinth of pointers and stuff. Codd
would sort of write them down as one-liners. These would be queries
like, "Find the employees who earn more than their managers."
[laughter] He just whacked them out and you could sort of read them,
and they weren't complicated at all, and I said, "Wow." This was kind
of a conversion experience for me, that I understood what the
relational thing was about after that."

Seun Osewa.

Anne & Lynn Wheeler

unread,

Oct 12, 2003, 3:01:24 PM10/12/03

to

seun...@inaira.com (Seun Osewa) writes:
> Quote from
> http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html
>
> "We knew sort of peripherally that there was some work going on in the
> provinces, in San Jose. There was this guy Ted Codd who had some kind
> of strange mathematical notation, but nobody took it very seriously.
> Ray Boyce was hired at about this time, and we kind of got into this
> game called the Query Game where we were thinking of ways to express
> complicated queries. But actually before the Query Game started, I had
> a conversion experience, and I still remember this. Ted Codd came to
> visit Yorktown, I think it might have been at this symposium that Irv
> alluded to. He gave a seminar and a lot of us went to listen to him.
> This was as I say a revelation for me because Codd had a bunch of
> queries that were fairly complicated queries and since I'd been
> studying CODASYL, I could imagine how those queries would have been
> represented in CODASYL by programs that were five pages long that
> would navigate through this labyrinth of pointers and stuff. Codd
> would sort of write them down as one-liners. These would be queries
> like, "Find the employees who earn more than their managers."
> [laughter] He just whacked them out and you could sort of read them,
> and they weren't complicated at all, and I said, "Wow." This was kind
> of a conversion experience for me, that I understood what the
> relational thing was about after that."

... some conjecture that SQL was chosen as a TLA (three letter
acronym) in competition with QBE (query by example) from YKT:
http://www.garlic.com/~lynn/2002e.html#44 SQL wildcard origins?
http://www.garlic.com/~lynn/2002o.html#70 Pismronunciation

i have some memory of QBE presentation in cambridge in the early to
mid 70s ... in advance of some amount of the System/R stuff.

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Internet trivia 20th anv http://www.garlic.com/~lynn/rfcietff.htm

Anne & Lynn Wheeler

unread,

Oct 12, 2003, 3:13:54 PM10/12/03

to

oops; and other query stuff from the same period ... also done on
vm/370 and cms ... nomad, ramis, focus
http://www.garlic.com/~lynn/2003d.html#15 CA-RAMIS
http://www.garlic.com/~lynn/2003d.html#17 CA-RAMIS
http://www.decosta.com/Nomad/tales/history.html

Costin Cozianu

unread,

Oct 12, 2003, 4:55:16 PM10/12/03

to

>>> Do you mean some kind of table reference, like has been discussed in
>>> this thread?
>>>
>>> RELATION Dept ( Id Dept_Id KEY, Name String );
>>> RELATION Emp ( Id Emp_Id KEY, Dept REF(Dept),
>>> Name PersonName, Salary Money );
>>>
>>> RETRIEVE E.Name, DEREF(E.Dept).Name FROM Emp E;
>>>
>>> RETRIEVE E.Name FROM Emp E, Dept D WHERE DEREF(E.Dept).Id = E.Id
>>> AND D.Name = 'shoe';
>>>
>>>
>>> If so, I think this approach will complicate things unnecessarily.
>>>
>>
>>
>> Not at all, if this mechanism is part of the physical implementation
>> level.
>
>
> What do you mean by "physical implementation level"? That it is not
> visible to the
> user (meaning here the one that issued the query)?
>
>

Yes, how it is implemented, ti is not visible to the users. In abstract
we talk about relation over Employee * Departments, the users should not
care that in Oracle the best is to create a surrogate for department, or
in other base they have to work with REF/DEREF. If the relation is
implicit, one can just right

Select * from USERS NATURAL JOIN DEPARTMENTS

Or, you could as easily say

select users.*, users.dept from users

Where users.dept denotes the whole dept record.

Why do we need to let the user care that we implemented the relation
with a VARCHAR DEPT_ID or a AUTOINCREMENT DEPT_ID or whatever other
implementation technique that has nothing to do with Employee * Dept.

>>> Regardless of the problem related to changing the identity of
>>> entities, in the case I presented, introducing a PAYMENT_ID is a
>>> strange way to plumb the inadequacy of a type system.
>>>
>>> By type system, do you mean datatypes (=Domains) or table types?
>>>
>>
>> Table types are data types, in other words it is not profitable at all
>> to have first class types and second class types.
>
>
> Am I to understand that you do not make a distinction between table
> types and scalar, or data types?

No. They are all types.

> What do you understand table types to be?

Types derived from other types through a type constructors.

Would, say the table (or
> relation)
> PERSON(ID INTEGER, NAME STRING) be of type (INTEGER, STRING)?
>

It depends on named vs. unnamed perspective. If we choose names to be
significant, and for all practical purposes they should be, we can
define the table type to be (informally)

SET (RECORD { id:integer, name: string } )

in a ML-like language it would be:

{id: int; name: string} set

> best regards,
> Lauri Pietarinen
>

Best,
Costin

Paul G. Brown

unread,

Oct 12, 2003, 6:18:59 PM10/12/03

to

news:<5%Ghb.70458$rt6.3...@phobos.telenet-ops.be>...

> Paul G. Brown wrote:
> > Jan Hidders <jan.h...@pandora.be.REMOVE.THIS> wrote in message news:<3f851...@news.ruca.ua.ac.be>...
> >>
> >>Er, well, file names are actually not abstract because they have a
> >>concrete representation that you can read.
> >
> > We could go round and round on this. I'd observe that there is a whole
> > class of things which can potentially go into tuple attributes that aren't
> > 'values':

[ snip - but they're all values ]

Yes, but my point is that they're not the same kind of 'value' as, say, the
value '1' of domain INTEGERS, or 'Fred Flintstone' of VARCHAR(32). Recall
the original argument I'm making; it's hard to say "no" to a logical
REF/DEREF on the grounds that it violates data abstraction while at the
same time maintaining that filenames, SQL queries and other misc. scripts
are OK. There are other reasons to reject REF/DEREF related to redundancy
in the language, but that redundancy is not so clear in the other examples
I've given.

The point is that as 'values of some domain', all of these are ambiguous.
Let's call what appears in the tuple attribute a 'token'. There are two ways
to 'interpret' a filename token: either as it literally appears, or else
through some kind of intermediary operation.

TABLE Files ( Name File PRIMARY KEY );

SELECT COUNT(*) FROM Files WHERE Name = "/tmp/Foo";

Does this mean the count of times that the file-name "/tmp/Foo" appears
in the table Files, or is it the number of times a file is found
with *the same contents* as what is in the file "/tmp/Foo"? (Note that
this second interpretation implies that the result of this query can
be at most 1.)

I'm not saying either interpretation is correct, and I know what SQL
says it ought to be. I'm just saying that this is open to more than one
interpretation.

> That you can do an extra step to derive some more information doesn't
> mean that they didn't already have some information to begin with.

We seem to agree that the 'token' is ambiguous (holding aside the point
that SQL-92 allows you only one interpretation.)

> The
> crucial question is if in this extra step you need some extra
> information that is not to be found anywhere in the tables. For the
> evaluation of the query you don't. For dereferencing the reference
> without the help of a table that associates them with their destination,
> you do.

Your point pivots on the question of what counts as being "inside"
the database. Nowhere, in the tables, does anything say when
two strings are "equal". That information is outside the database too,
in the sense that it resides in code implementing domain operators. But
it would be kind of silly to say that values of VARCHAR(32) violate the
information principle because needed information about these values is
"outside" the schema.

> For looking up the contents of the file without the help of a
> table that associates them with their contents, you do. For executing
> the script you don't.
>
> So as long as you don't expect a DEREF function or a function that
> magically looks up the content of a file none of you examples violates
> the information principle.

In all of these examples--which derive from practical systems--you
do apply some DEFEF kind of logic to the token. In Postgres, for example,
(or any of the recent RDBMS releases) you can write functions to 'do
things' with the contents of the file referenced by the file-name. Several
DBMS products provide a mechanism for dynamically executing a SQL query
within another SQL query. By introducing a raft of domain specific
functions and operators in these systems you can get the behavior you want.

Look: one way to overcome this problem is to do away with functions and
operators (like DEREF) altogether. I'm back on this hobby horse again.
Note that relations are disambiguated by their *entire* definition, not
just their names. I know this smells like function overloading and I need
a whole bunch of new relations but it does reduce the amount of "stuff"
in the relational model without reducing the model's power.

RELATION Files ( File FileName KEY );
RELATION EQUALS ( Domain String Key, Range String );
RELATION EQUALS ( Domain Stream Key, Range Stream );
RELATION STREAM ( Domain String Key, Range Stream );

Q1: How many files called "/tmp/Foo"?

VALUES CARD( Files( FileName ) EQUALS( FileName, "/tmp/Foo" ));

Q2: How many files with same contents as "/tmp/Foo

VALUES CARD( Files ( FileName) STREAM ( Filename, F.Stream )
STREAM ( "/tmp/Foo", Stream) EQUALS ( F.Stream, Stream ));

End the tyranny of domain operators, now!!!

KR

P "signing off and heading for the tub" b