On Sun, 1 Sep 2013 03:47:35 -0700 (PDT)
karl.s...@o2online.de wrote:
> On Saturday, 31 August 2013 19:59:52 UTC+2, James K. Lowden wrote:
> > On Sat, 31 Aug 2013 08:22:41 -0700 (PDT)
> > karl.s...@o2online.de wrote:
> > > Codd's model emerged out of the technology of the seventies and
> > > needs urgently a revision.
> >
> > That's a novel observation. I'm sure others would be interested to
> > know of any aspect of the relational model rooted in 1970s
> > technology.
> >
> My observation is based on Codd's paper of 1970 "A Relational Model
> of Data for Large Shared Data Banks"
I appreciate the effort you took to defend your idea, and I think I now
understand what you mean. But you are confusing the incidental
reference to contemporary technology with its influence on the
relational model.
Yes, you can find references to 1970s technology in a 1970s paper and,
yes, Codd was a product of his day just as anyone is. His work was
motivated by commercial needs in a commercial firm, and he was applying
his mathematical expertise to real-world problems. His paper relates
his theoretical work to those problems in terms of the technology then
extant.
What you will not find is any aspect of the theory that is *tied* to any
technology, 1970s or other. The theory is all about relations and
operations and constraints. Naturally both Codd and his contemporaries
were interested in how that theory might be applied to real computers
of the day; then as now many insisted that his pointy-headed math could
never be implemented efficiently. That's quite different from saying the
theory is somehow blinkered by the limitations of those computers.
> He addresses the following problems
> 1.2.1. Ordering Dependence.
> "Let us consider those existing systems which either require or
> permit data elements to be stored in at least one total ordering
> which is closely associated with the hardware-determined ordering of
> addresses. "
If you remove "hardware-determined" from the sentence, it's exactly as
true now as then.
> 1.2.2. Indexing Dependence.
> "...destroy indices from time to time will probably be necessary. The
> question then arises: Can application programs and terminal
> activities remain invariant as indices come and go?..."
>
> In the seventies "bigdata" had to be stored on sequential data
> storages (tapes, cards). Querying data from sequential media cannot
> use indices ("indices go").
Hmm, no, I'm pretty sure VSAM and IMS were available in the 70s.
Cullinet was selling IDMS.
Codd's "competition" wasn't 1890s Hollerith cards. It was (what were
later called) hierarchical and network DBMSs that imposed great
constraints and complexity on application programmers.
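Codd's question from 1.2.2, whether programs can stay invariant as indices come and go, has a plain answer in any SQL DBMS today. Here is a minimal sketch using Python's built-in sqlite3 module; the table, its columns, and the index name are all hypothetical. The application's query text never mentions the index, so creating or dropping it can change performance but not results.

```python
import sqlite3

# In-memory database with a small, hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (account INTEGER PRIMARY KEY, zip TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "10001", 50.0), (2, "94105", 20.0), (3, "10001", 30.0)],
)

QUERY = "SELECT account FROM customer WHERE zip = ? ORDER BY account"


def accounts_in(zip_code):
    """The application's only dependency is the query text, not an access path."""
    return [row[0] for row in conn.execute(QUERY, (zip_code,))]


before = accounts_in("10001")          # no secondary index yet
conn.execute("CREATE INDEX customer_zip ON customer (zip)")
with_index = accounts_in("10001")      # the optimizer may now choose the index
conn.execute("DROP INDEX customer_zip")
after = accounts_in("10001")           # index gone; program unchanged

print(before, with_index, after)       # [1, 3] [1, 3] [1, 3]
```

Whether the planner actually uses the index is its business, which is rather the point.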
Funny how little has changed, right? You move to Hadoop City and build
an entire application around a "known" application domain on a
nonstandard filesystem. Then comes the day you'd like summaries by zip
code instead of by customer account, and you have to write an application
instead of a query. Santayana rides again!
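For contrast, a minimal sketch of the same situation against a relational store, again with Python's sqlite3 and a hypothetical orders table. The report by account and the report by zip code are both just queries; neither required the table to be designed with that report in mind.

```python
import sqlite3

# Hypothetical orders table; nothing in it anticipates either report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (account INTEGER, zip TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "10001", 5.0), (1, "10001", 7.0), (2, "94105", 3.0), (3, "10001", 4.0)],
)

# The report the application was originally built around: totals per account.
by_account = conn.execute(
    "SELECT account, SUM(amount) FROM orders GROUP BY account ORDER BY account"
).fetchall()

# The day someone wants totals per zip code: another query, not another program.
by_zip = conn.execute(
    "SELECT zip, SUM(amount) FROM orders GROUP BY zip ORDER BY zip"
).fetchall()

print(by_account)  # [(1, 12.0), (2, 3.0), (3, 4.0)]
print(by_zip)      # [('10001', 16.0), ('94105', 3.0)]
```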
> 1.2.3. Access Path Dependence.
> "
> One solution to this is to adopt the policy that once a
> user access path is defined it will not be made obsolete until
> all application programs using that path have become
> obsolete. Such a policy is not practical, because the number
> of access paths in the total model for the community of
> users of a data bank would eventually become excessively
> large."
>
> That statement is based on the hardware of the seventies.
On the web I believe it's called "404".
> First normal form and normalization
> "
> So far, we have discussed examples of relations which are
> defined on simple domains-domains whose elements are
> atomic (nondecomposable) values. Nonatomic values can
> be discussed within the relational framework. Thus, some
> domains may have relations as elements. These relations
> may, in turn, be defined on nonsimple domains, and so on.
> "
> It is clear, Codd started 1970 with a design like the
> "document storages" in NOSQL or the N1F systems of the past.
Yes.
> For reasons not comprehensible any more (Codd's reference is
> out of print and not online available), he restricted his model
No mystery. Books in a library are hardly lost texts of Babylon. And
he states his motivation plainly: "the possibility of eliminating
nonsimple domains appears worth investigating!"
> "1.4. NORMAL FORM
> A relation whose domains are all simple can be represented
> in storage by a two-dimensional column-homogeneous
> array of the kind discussed above. Some more
> complicated data structure is necessary for a relation with
> one or more nonsimple domains. For this reason (and others
> to be cited below) the possibility of eliminating nonsimple
> domains appears worth investigating! There is, in fact, a
> very simple elimination procedure, which we shall call
> normalization.
> "
The model is not "restricted". It is *simplified*, a feature, not a
bug. By showing -- more, *proving* -- that logical inferences could be
drawn from data manipulated with a small number of operators closed
over a domain, Codd released programmers from low-level complexity and
man-centuries of work.
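The "very simple elimination procedure" Codd quotes above amounts to this: keep the parent's simple domains, and push the parent's primary key down into each subordinate relation so that nothing is lost. A minimal sketch in Python, representing each relation as a heading plus a set of tuples; the employee data and attribute names are hypothetical, and the function handles only one level of nesting.

```python
# Hypothetical nested relation: each employee tuple has a nonsimple
# domain, jobhistory, whose values are themselves relations.
employee = [
    {"emp": 1, "name": "Ada", "jobhistory": [
        {"year": 1965, "title": "analyst"},
        {"year": 1968, "title": "programmer"},
    ]},
    {"emp": 2, "name": "Ted", "jobhistory": [
        {"year": 1969, "title": "researcher"},
    ]},
]


def normalize(rows, key, nested):
    """Eliminate one nonsimple domain, in the spirit of Codd's procedure.

    The parent keeps only its simple domains. Each subordinate tuple is
    expanded with the parent's primary key, so no information is lost.
    Assumes rows is nonempty and the first row's nested relation is
    nonempty, so both headings can be read off the data.
    """
    parent_heading = [a for a in rows[0] if a != nested]
    child_heading = [key] + list(rows[0][nested][0])
    parent = {tuple(r[a] for a in parent_heading) for r in rows}
    child = {
        (r[key],) + tuple(sub[a] for a in child_heading[1:])
        for r in rows
        for sub in r[nested]
    }
    return (parent_heading, parent), (child_heading, child)


(emp_heading, emp), (job_heading, jobs) = normalize(employee, "emp", "jobhistory")
print(emp_heading, sorted(emp))
print(job_heading, sorted(jobs))
```

Both results are flat, two-dimensional relations, and joining jobs back to emp on the emp attribute recovers the original nesting.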
> Meanwhile are complex dynamic data structures (trees, graphs,
> lists... ) part of standard libraries for common mainstream programming
> languages.
If you're programming a computer, graphs are your natural ally
because they can be mapped directly onto the computer's memory. They're
of no use, though, when you want to manage data logically. How, for
example, do you define a subset of a cyclic graph?
You're right to say that graphs are more complex than relations. It's a
mistake, though, to conclude therefore that they are more powerful.
It's been proved mathematically that graphs and relations are
interchangeable in the sense that they can represent the same
information. The difference is that relational theory is much
simpler. That's its advantage, not a handicap.
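The interchangeability is easy to see once a graph is stored as a binary relation of edges: "a subset of a cyclic graph" becomes an ordinary restriction, and reachability becomes repeated composition of the relation with itself. A minimal sketch in Python with a hypothetical four-node graph; the closure is computed by naive iteration, chosen for clarity rather than efficiency.

```python
# A hypothetical directed graph with a cycle (a -> b -> c -> a), stored as
# a binary relation: a set of (source, target) pairs.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}


def restrict(relation, predicate):
    """Relational restriction: the tuples that satisfy a predicate."""
    return {t for t in relation if predicate(t)}


def transitive_closure(relation):
    """All (x, z) such that z is reachable from x, by repeated composition."""
    closure = set(relation)
    while True:
        step = {(x, z) for (x, y) in closure for (y2, z) in relation if y == y2}
        if step <= closure:
            return closure
        closure |= step


# "A subset of a cyclic graph": the subgraph induced by nodes {a, b, c}.
keep = {"a", "b", "c"}
subgraph = restrict(edges, lambda e: e[0] in keep and e[1] in keep)

# Everything reachable from "a"; the cycle causes no trouble.
reachable_from_a = {z for (x, z) in transitive_closure(edges) if x == "a"}

print(sorted(subgraph))          # [('a', 'b'), ('b', 'c'), ('c', 'a')]
print(sorted(reachable_from_a))  # ['a', 'b', 'c', 'd']
```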
> "Future users of large data banks must be protected from
> having to know how the data is organized in the machine (the
> internal representation)"
>
> For me as a programmer this sounds like a textbook example for
> object design.
OK, but it's not.
Consider the UNIX filesystem, for instance, which you referred to
earlier. Upon a time, when my mother wrote disk access routines for
Univac, the programmer had to know all the particulars of the device,
and read/write data in terms of the device's design. Unix
revolutionized the field by abstracting all disk access into today's
familiar stream of bytes. No addresses, no heads or sectors or tracks.
A catalog to facilitate sharing that anyone (potentially) can update,
not just the system programmers. Works pretty well for nonrotating
media, too, and over the network. And not an object in sight.
On the other hand, you are in some sense right, if the DBMS is
the object. What OO calls "data hiding" is analogous to what RM calls
"data independence". In both cases, the goal is to isolate the
application from details it doesn't need and that might change, to
permit the application programmer to operate at a higher level. Both
also have a notion of "consistent state". I have long thought that
stored procedures are to databases what methods are to objects, and
subscribe to the idea that applications should access the data only
through views and procedures.
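A minimal sketch of the view half of that idea, using Python's sqlite3 (SQLite has no stored procedures, so they are left out). The schema is hypothetical: the application reads only through the view contact, and when the base table is later split in two, redefining the view leaves the application's query and its results untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of a hypothetical schema: one base table behind one view.
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, zip TEXT);
    INSERT INTO person VALUES (1, 'Ada', '10001'), (2, 'Ted', '94105');
    CREATE VIEW contact AS SELECT id, name, zip FROM person;
""")


def app_query(c):
    """The application knows only the view, never the base tables."""
    return c.execute("SELECT name, zip FROM contact ORDER BY id").fetchall()


v1 = app_query(conn)

# Version 2: the base table is split in two and the view redefined over them.
conn.executescript("""
    DROP VIEW contact;
    CREATE TABLE person_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_zip (id INTEGER PRIMARY KEY, zip TEXT);
    INSERT INTO person_name SELECT id, name FROM person;
    INSERT INTO person_zip SELECT id, zip FROM person;
    DROP TABLE person;
    CREATE VIEW contact AS
        SELECT n.id, n.name, z.zip
        FROM person_name AS n JOIN person_zip AS z ON n.id = z.id;
""")

v2 = app_query(conn)
print(v1 == v2)  # True
```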
Part of your critique is actually of DBMSs that we have, not of RM.
SQL DBMSs largely support only a few primitive types that the user may
then further constrain or write functions for. One cannot, for
example, define an aggregate type as a set of columns, and use that
name in, say, FK declarations. Nor can we usually define types of blobs
and comparison functions for them (although I'm unconvinced that's a
good idea).
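To make the FK point concrete, here is a minimal sketch with Python's sqlite3 and a hypothetical address/shipment schema. There is no way to declare the address columns once as a named aggregate type; the composite key is spelled out column by column in the referencing table, and would be again in every other table that refers to an address. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE address (
        street TEXT, city TEXT, zip TEXT,
        PRIMARY KEY (street, city, zip)
    );
    -- No named "address" type to reference: each column is repeated
    -- here, and in every other table that refers to an address.
    CREATE TABLE shipment (
        id INTEGER PRIMARY KEY,
        street TEXT, city TEXT, zip TEXT,
        FOREIGN KEY (street, city, zip) REFERENCES address (street, city, zip)
    );
    INSERT INTO address VALUES ('1 Main St', 'Springfield', '12345');
    INSERT INTO shipment VALUES (1, '1 Main St', 'Springfield', '12345');
""")

try:
    conn.execute("INSERT INTO shipment VALUES (2, '9 Elm St', 'Springfield', '12345')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # no matching address row

print(rejected)  # True
```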
And of course SQL itself -- not RM! -- is deeply rooted in IBM's 1970s
notion of an end-user query language, to let users write their own
reports. You could talk all day to IT departments about math and
logic, but you could close the deal convincing them that their reports
would write themselves. So we're saddled now with a language no one
likes, and that no one thinks expresses relational algebra or calculus
well. I wonder if we're going to have to live through the rediscovery of
the purpose and benefits of the relational model before we see a
re-implementation of it that provides a relational language in which to
express our queries.
--jkl
P.S. Since you've read this far and we're debating the 70s, I hope
you've seen
http://www.masswerk.at/google60/.