A DTD for Personal Identity
From: pierc...@sabi.demon.co.uk (Piercarlo Grandi)
Subject: Re: A DTD for Personal Identity
Date: 1996/10/15
Message-ID: <yf391989lc2.fsf@sabi.demon.co.uk>
X-Deja-AN: 189603275
x-nntp-posting-host: sabi.demon.co.uk
x-disclaimer: Contents reflect my personal views only
references: <AshleyB-0610960226550001@news.halcyon.com> <vwjwwwyj3k3.fsf@osfb.aber.ac.uk> <AshleyB-1110961756180001@news.halcyon.com> <yf3hgo0kmze.fsf@sabi.demon.co.uk> <32631316.895091@news.alink.net>
organization: Home's where my rucksack's
newsgroups: comp.text.sgml,comp.databases.theory
>>> "Charles" == Charles F Goldfarb <Char...@SGMLsource.com> writes:
Charles> I agree that the A element type in HTML defines no relationship
Charles> other than "we are linked".
Charles> In post-TC HyTime, however, one can define a link element type
Charles> such as:
Charles> <!element enrolled empty>
Charles> <!attlist enrolled
Charles> HyTime=hylink
Charles> anchrole CDATA #fixed "person #corlist course #corlist"
Charles> person IDREFS #required
Charles> course IDREFS #required>
Charles> [ ... illustration of the meaning of the DTD ... ]
Charles> An instance could be:
Charles> <enrolled person="person1 person2 person1 person3"
Charles> course="course1 course1 course2 course1">
Charles> and there could be multiple instances in the same document,
Charles> with differing attribute values.
Charles> Can you explain how this differs from your database example?
Well, first of all let me start with the punchline: the above is another
abuse of SGML -- in effect most of HyTime is an abuse of SGML, even if
it is sanctioned by you and the whole WG8 crowd. :-)
This little idea enunciated (an explanation will follow later), let me
note some of the problems with the above as compared with
piercarl> enrolled(person*,course*)
piercarl> person1 course1
piercarl> person2 course1
piercarl> person1 course2
piercarl> person3 course1
The first and biggest is that, even assuming that the two are more or
less equivalent, there is no explicit definition in HyTime of a data
model; it is left largely implicit. The example above in relational
notation is based on a very clearly and explicitly defined data
model, down to some important bits of formal work. If one does
``serious'' data modeling an explicit, possibly formalized, data model
is rather opportune.
An example is that the use of HyTime as schema, as in the definition of
the attributes of 'enrolled' above, is not quite equivalent to the
'enrolled(person*,course*)' bit; for this specifies (by making both the
'person' and 'course' columns part of the primary key, something that is
often conventionally indicated by the '*' suffix, or by underlining)
that there cannot be two identical 'person,course' pairs. I think that
your definition above does not capture this rather essential
semantics. Perhaps there are bits of HyTime where this can be specified;
but I cannot think of anything right now (my limit probably).
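The primary key point can be made concrete with a small sketch. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in for any relational system, and it models the two IDREFS attributes as plain Python lists; the helper names are made up for illustration and the sketch says nothing about what a HyTime engine might additionally check.

```python
import sqlite3

def build_enrolled():
    """Create an in-memory 'enrolled' table keyed on (person, course)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrolled ("
        " person TEXT NOT NULL,"
        " course TEXT NOT NULL,"
        " PRIMARY KEY (person, course))"
    )
    return conn

def try_insert(conn, person, course):
    """Return True if the row was accepted, False if the key rejected it."""
    try:
        conn.execute("INSERT INTO enrolled VALUES (?, ?)", (person, course))
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_enrolled()
pairs = [
    ("person1", "course1"),
    ("person2", "course1"),
    ("person1", "course2"),
    ("person3", "course1"),
]
accepted = [try_insert(conn, p, c) for p, c in pairs]

# A repeated pair violates the composite primary key and is refused.
duplicate_accepted = try_insert(conn, "person1", "course1")
row_count = conn.execute("SELECT COUNT(*) FROM enrolled").fetchone()[0]

# The parallel-list form (modelled here as two Python lists) happily
# carries the same pair twice; nothing in the lists themselves objects.
person_attr = "person1 person2 person1 person3 person1".split()
course_attr = "course1 course1 course2 course1 course1".split()
corlist_pairs = list(zip(person_attr, course_attr))
```

Here the uniqueness rule lives in the schema itself, so every program that touches the table gets it for free; with the parallel lists it would have to be enforced by convention or by extra application code.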
But there are many other problems. The implicit data model is link
based (thus forcing navigational access, which most data modeling
people regard as regrettable nowadays), and the above is not as easy
to update/manipulate as a table. Another issue is that the
relationship is expressed not as part of the data, but of the
``metadata'', the markup (this is related to its being navigational,
where one must then distinguish between data and navigation paths).
The whole WWW is link based, even if vastly less rigorously than with
Hytime; it is telling that many people actually ``navigate'' it not by
following links but by going thru a non-link-based search engine's
database.
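A rough illustration of the update/manipulate point, as a sketch only: the relation is modelled as a Python set of pairs, and the link form as two parallel lists standing in for the 'person' and 'course' attribute values. The function name drop_pair is hypothetical.

```python
# Relational-style view: the relation is simply a set of (person, course) pairs.
enrolled = {
    ("person1", "course1"),
    ("person2", "course1"),
    ("person1", "course2"),
    ("person3", "course1"),
}
# Withdrawing person1 from course2 is one declarative operation.
enrolled_after = enrolled - {("person1", "course2")}

# Link-style view: two parallel lists whose positions must stay in step.
persons = "person1 person2 person1 person3".split()
courses = "course1 course1 course2 course1".split()

def drop_pair(persons, courses, person, course):
    """Return new parallel lists with every matching position removed."""
    kept = [(p, c) for p, c in zip(persons, courses) if (p, c) != (person, course)]
    return [p for p, _ in kept], [c for _, c in kept]

persons_after, courses_after = drop_pair(persons, courses, "person1", "course2")
```

In the second form the pairing exists only through list position, so any edit has to walk both lists together and rebuild them in lockstep.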
Then there is the indirection: the relationship is between ids, not data
values. Admittedly in most cases the enrollment relation would be
really written not as:
enrolled(person*,course*)
but as
enrolled(personid*,courseid*)
but the option to have the former is essential in various cases to
capture the semantics of relationships (for example in the table that
maps personids to person names or whatever).
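To sketch what that means in practice: in the relational setting the table mapping personids to names is itself just another relation over data values, and the two are connected by comparing values rather than by following references. The example below assumes SQLite through Python's sqlite3 module; the person names are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- A relation over plain data values: id and name (names are made up).
    CREATE TABLE person (
        personid TEXT PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE enrolled (
        personid TEXT NOT NULL REFERENCES person(personid),
        courseid TEXT NOT NULL,
        PRIMARY KEY (personid, courseid)
    );
    INSERT INTO person VALUES
        ('person1', 'Ada'), ('person2', 'Grace'), ('person3', 'Edsger');
    INSERT INTO enrolled VALUES
        ('person1', 'course1'), ('person2', 'course1'),
        ('person1', 'course2'), ('person3', 'course1');
    """
)

# The join matches rows by value equality; no navigation path is involved.
rows = conn.execute(
    "SELECT p.name, e.courseid"
    " FROM enrolled AS e JOIN person AS p ON p.personid = e.personid"
    " ORDER BY p.name, e.courseid"
).fetchall()
```

The result is again a relation of values (name, courseid), which could in turn be stored, keyed, or joined like any other.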
Then we are back to the abuse-of-SGML idea: I personally find it
extremely distasteful that such a relationship has to be described using
attribute values.
I could imagine comparing the 'enrolled' relation above with something
like:
<BINARYREL>
<NAME/enrolled/
<BETWEEN/person1/ <AND/course1/
<BETWEEN/person2/ <AND/course1/
<BETWEEN/person1/ <AND/course2/
<BETWEEN/person3/ <AND/course1/
</BINARYREL>
or perhaps more aptly:
<BINARYREL2>
<NAME>enrolled</>
<BETWEEN LINKTO="students/person1.sgml">person1</>
<AND LINKTO="courses/course1.sgml">course1</>
<BETWEEN LINKTO="students/person2.sgml">person2</>
<AND LINKTO="courses/course1.sgml">course1</>
<BETWEEN LINKTO="students/person1.sgml">person1</>
<AND LINKTO="courses/course2.sgml">course2</>
<BETWEEN LINKTO="students/person3.sgml">person3</>
<AND LINKTO="courses/course1.sgml">course1</>
</BINARYREL2>
(I'm making up the markup, I hope the intent is clear). These at least
are documents that _describe_ things directly, not indirectly via
attributes.
At this point let me observe: that you regard the two things as
comparable is, with all due respect, IMNHO, very naive: they are not
even remotely on the same plane (if for no other reason than that the
HyTime notation is awfully ugly and opaque, at least to my taste, and
the HyTime documentation is often impenetrable; but of course it is
not just that).
My main point is that designing a data model and a schema language that
expresses it is as large and difficult a task as designing a
hypermedia/text/... markup system, and only coincidentally could the
latter double up as the former. Both describe some sort of structure and
some sort of relationships; but both SGML and HyTime are far removed
from being good data modeling tools, even given a very optimistic
interpretation.
This is in effect what has happened in the OO world: the very poor data
model implicit in ordinary programming constructs like pointers,
records, arrays, has been haphazardly sanctified into a data model (see
Grady Booch's OOADWA, and many other examples). It simply is not up to
the task...
It is a bit hard to explain why more or less from first principles; it
would take probably a small history of data models from the sixties to
today to illustrate it properly.
But I hope to have found a way to suggest why your example above is so
distasteful to me; and it is by way of (caricatural/comical) analogy.
I'll spare myself the DTDs, but consider two instances/examples of two
hypothetical SGML architectural forms; the first is called MarkDown:
<ELEMENT TAG=html>
<ELEMENT TAG=head>
<ELEMENT TAG=title TEXT="A sample MarkDown document"></>
</>
<ELEMENT TAG=body ATTRS="bgcolor" ATTVALS="#ffffff">
<ELEMENT TAG=h1 ATTRS="center" ATTVALS="yes" TEXT="What is MarkDown?"></>
<ELEMENT TAG=p>
MarkDown is a caricature of SGML; it is an imaginary
architectural form whose semantics are document markup, where
the <ELEMENT TAG=code TEXT="tag"></> attribute is the one to
which the MarkDown semantics are attached.
</>
</>
</>
Now this is a monstrosity, but I hope the analogy is clear, even if a
bit forced in some respects. Perhaps a bit less forced is the following
example for the architectural form PasCool:
<FUNCTION ID=gcd>
<PARM ID=x TYPE=int><PARM ID=y TYPE=int><RESULT TYPE=int>
<!-- or perhaps
FUNCTION ID=gcd PARMS="x y" PARTYPES="int int"
RESTYPE=int
-->
<BEGIN>
<WHILE NEQ="x y">
<IF GT="x y">
<THEN><SET VAR=x><MINUS OPS="x y"></MINUS></SET></THEN>
<ELSE><SET VAR=y><MINUS OPS="y x"></MINUS></SET></ELSE>
</IF>
</WHILE>
<RETURN VALUE=x>
</BEGIN>
</FUNCTION>
<CALL PROC=writeln><APPLY FUNC=gcd ARGS="9 6"></APPLY></CALL>
This is a monstrosity too: note that this is, like so many HyTime
examples/actual uses, including your "enrolled" one, pure markup without
any contents, which seems a bit odd (but implicit in the logic of using
SGML as the EBNF for defining any sort of language).
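For comparison, here is a sketch of what the PasCool instance above is presumably meant to say, written in a conventional notation (Python is an arbitrary choice here, and the reading of the made-up markup as subtraction-based gcd is an interpretation):

```python
def gcd(x: int, y: int) -> int:
    """Greatest common divisor by repeated subtraction (positive inputs only)."""
    while x != y:
        if x > y:
            x = x - y
        else:
            y = y - x
    return x

result = gcd(9, 6)
print(result)  # prints 3
```

The algorithm is the same; only the notation differs, which is rather the point of the caricature.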
I hope that this suggests/illustrates why I am so uncomfortable with the
use of HyTime/SGML as a database schema language. Both caricatures are
in some way ``workable''; it is pretty clear what the meaning is. But
neither quite works well; there are too many obvious loose ends. To me
much the same applies to HyTime, when interpreted as a schema language
for some sort of data model, as you (ab)use it in the example above.
At this point, hoping that you are still suspending your disbelief, and
are prepared to try to see my point of view, let me go back to the
punchline: that HyTime is (largely) an abuse of SGML. I think that I can
now explain why...
Basically SGML is a customized/uglified relative of EBNF; it is
customized in the sense that it is geared to defining context free
grammars based on parenthetical forms, especially suitable for document
markup.
Now, EBNF/SGML can be (ab)used to define _any_ sort of notation, not
just a markup notation for documents; it can be (ab)used to define also
any sort of language, by using EBNF/SGML to define the syntax and
attaching a semantics to it.
It is then very tempting to enjoy riding SGML/EBNF into defining _any_
sort of language by attaching some sort of meaning to its parenthetical
forms. Musical scores? Schema language? Query language? Scripting
language? even Markup Language :-) or Pascal-like Language :-).
But doing so has two drawbacks:
* SGML in particular is really geared to defining document markup, from
an aesthetic point of view if nothing else, though not only that.
* defining new languages, be they schema definition languages or
Pascal-like languages, is a rather nontrivial activity, and it's
easy to get it wrong even in the best of conditions, even when one is
familiar with the language's domain; if the notation gets in the
way, and with SGML as EBNF it does, getting it right is well nigh
impossible, bar a miracle of chance, and miracles don't happen.
My point is then that, and please forgive me for this colourful
expression, just as MarkDown and PasCool above are a Goofy-style
markup notation and programming language respectively, HyTime, if it is
to be regarded as a schema language (and I would spare it the indignity,
but you did not), is a Mickey Mouse schema language (SGML/EBNF can also
be given an interpretation as a schema language, but with somewhat more
difficulty).
Now even a Mickey Mouse schema language may be perfectly apt and
suitable (but as to that there are _some_ reservations) for describing
multidocument structures (a forest of trees with links across trees);
but to express *data* models it has so many problems it's hard even to
start listing them (and I did try to point out some above).