Data-semantics and the evolution of ontologies

Jorn Barger

Dec 5, 2002, 10:20:35 AM

My researches into the prehistory of Linux desktops--
http://www.robotwisdom.com/linux/desktops.html --led me to
Sketchpad, Simula, and Smalltalk as object-oriented turning-
points in the history of electronic knowledge-representation.

But it occurs to me that one can trace that history _backwards_
as well as forwards, by inventorying the various applications
early computers were used for-- ballistics, codebreaking,
navigation, accounting, etc etc etc...

What you should end up with, eventually, is a timeline of
'data semantics' that shows vividly how more and more bits of
the hypothetical universal ontology (cf Cyc, XML) have been
usefully represented in silicon:
http://www.robotwisdom.com/linux/semantics.html

For some reason, this peculiar perspective seems poorly
documented-- how do you do a web-search for the earliest
computerised accounting system? --so tips are welcome.


(The belligerently-stupid faction from afc is requested to
use the anonymous feedback form at that url, or at least
trim can-l from their content-free followups.)

Jorn Barger

Dec 6, 2002, 5:39:10 AM

I wrote in message news:<16e613ec.0212...@posting.google.com>...

> My researches into the prehistory of Linux desktops--
> http://www.robotwisdom.com/linux/desktops.html --led me to
> Sketchpad, Simula, and Smalltalk as object-oriented turning-
> points in the history of electronic knowledge-representation.
> But it occurs to me that one can trace that history _backwards_
> as well as forwards, by inventorying the various applications
> early computers were used for-- ballistics, codebreaking,
> navigation, accounting, etc etc etc...

Of course, I immediately found myself on a slippery slope--
the first apps were all intended to speed math calculations
that were already important enough to require dedicated teams
of 'human computers', including range-tables for artillery,
astronomical/navigational tables, maps, and various advanced
engineering problems.

These human computers used adding machines and sliderules and
books of math-tables. A little further down the scale came
beancounters-- accountants and actuaries and demographers.

But the knowledge-representations used by these professionals
barely had to be changed at all to apply digital technologies--
the real 'AI' innovations took place centuries or millennia
earlier, when the problems of representing facts with _numbers_
(or even words) were first solved.

And I'm tempted to take a shot at mapping these out as well,
but it would have to include, eg, the Code of Hammurabi as an
early knowledgebase of legal principles, and the Linear A
fragments from the Greek bronze age, with their palace
accounts...

> What you should end up with, eventually, is a timeline of
> 'data semantics' that shows vividly how more and more bits of
> the hypothetical universal ontology (cf Cyc, XML) have been
> usefully represented in silicon:
> http://www.robotwisdom.com/linux/semantics.html

The other direction I want to go is forward thru Sketchpad,
Simula and Smalltalk. Sketchpad's great innovation (I think)
was representing _screen-shapes_ as first-class data-objects...
which is quite a conceptual leap from counting business-beans.

Alan Kay generalised this so that interface-widgets were also
first-class. And Simula perhaps first considered _events_ as
objects, allowing them to be symbolically queued. (I assume
Smalltalk, like the Mac, used an event-queue for interface-
events?)
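
As a rough illustration of what "events as objects, symbolically queued" can mean, here is a minimal Python sketch. It is not Simula's or Smalltalk's actual mechanism; the names Event, EventQueue, schedule and run, and the two sample events, are all hypothetical and chosen only for this example. Each event is an ordinary data-object carrying a time, a name and an action, and the queue simply fires them in time order.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Event:
    """An event as a plain data-object: when it fires, its name, what it does."""
    time: float
    name: str
    action: Callable[[], None]


class EventQueue:
    """Hypothetical queue that holds Event objects and fires them by time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Event]] = []
        # Counter breaks ties so two events at the same time never get compared.
        self._counter = itertools.count()

    def schedule(self, event: Event) -> None:
        heapq.heappush(self._heap, (event.time, next(self._counter), event))

    def run(self) -> List[str]:
        """Fire every queued event in time order; return the names fired."""
        fired = []
        while self._heap:
            _, _, event = heapq.heappop(self._heap)
            event.action()
            fired.append(event.name)
        return fired


# Sample use: events are scheduled out of order and fired in time order.
log: List[str] = []
queue = EventQueue()
queue.schedule(Event(2.0, "machine-free", lambda: log.append("free")))
queue.schedule(Event(1.0, "job-arrives", lambda: log.append("arrive")))
order = queue.run()
```

The point of the sketch is only that once an event is a value, it can be stored, sorted and inspected like any other record before it "happens".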

Looking at the origins of LISP c1958--
http://www-formal.stanford.edu/jmc/history/lisp/node2.html
--it appears McCarthy's goal was to represent logic-
propositions (which he called 'sentences') in a form that
mapped closely onto their normal human representations, by
analogy with Fortran's innovation of writing math-programs
that closely resembled algebraic notation.

Which leads us to a grey area between the representation and
the 'things' we're trying to represent-- electronic
computation must have started by representing just the math
that the humans had been doing, but the earliest machine-
representations bore little resemblance to the normal
human notation-system.

Fortran then was a usability-breakthru on the same order as
the mouse-and-windows revolution, by transferring (more of)
the burden of 'compilation' from the user to the machine.

The Fortran compiler has to treat program-statements as
data-objects, but that's not _quite_ the same as trying to
represent the actual notations of the human calculators.

Analogously, early attempts at natural-language translation
must from the first have included representations of real
nat-lang sentences, but I wonder what program first treated
sentences as objects, with their own meta-data...?

jmfb...@aol.com

Dec 6, 2002, 5:45:49 AM

In article <16e613ec.0212...@posting.google.com>,

jo...@enteract.com (Jorn Barger) wrote:
>My researches into the prehistory of Linux desktops--

<snip>

>For some reason, this peculiar perspective seems poorly
>documented-- how do you do a web-search for the earliest
>computerised accounting system? --so tips are welcome.

You're not looking back far enough. You have to go a lot
further back than the desktop generation.


>
>
>(The belligerently-stupid faction from afc is requested to
>use the anonymous feedback form at that url, or at least
>trim can-l from their content-free followups.)

This belligerently-stupid member from afc knows some of
the answers to your questions. You might consider
doing some abject groveling IF, and only if, you really
want to get something accomplished.


/BAH

Subtract a hundred and four for e-mail.

Cameron Laird

Dec 6, 2002, 10:42:23 AM

In article <16e613ec.02120...@posting.google.com>,

I'm warming to this project; it sounds quite instructive.
The unhappy fact is that I can't afford to think at all
deeply about it at least until New Year.

Flotsam that I happen to know was represented as an explicit
data structure early on:
  chess positions
  (certain) differential operations
  human inheritance diagrams
  securities markets' habit of calculating in eighths, ...
  military capabilities, in an "operations research" sense
  English-language text (without regard to case?)
  economic state
  ...
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html

Jorn Barger

Dec 8, 2002, 5:39:51 AM

I wrote in message news:<16e613ec.02120...@posting.google.com>...

> > My researches into the prehistory of Linux desktops--
> > http://www.robotwisdom.com/linux/desktops.html --led me to
> > Sketchpad, Simula, and Smalltalk as object-oriented turning-
> > points in the history of electronic knowledge-representation.
> > But it occurs to me that one can trace that history _backwards_
> > as well as forwards, by inventorying the various applications
> > early computers were used for-- ballistics, codebreaking,
> > navigation, accounting, etc etc etc... [...]
> > http://www.robotwisdom.com/linux/semantics.html

I'm thinking that I can re-title my page "Timeline of Knowledge
Representation" (kr) and that this can encompass both computerized
and non-computerized efforts.

> And I'm tempted to take a shot at mapping these out as well,
> but it would have to include, eg, the Code of Hammurabi as an
> early knowledgebase of legal principles, and the Linear A
> fragments from the Greek bronze age, with their palace
> accounts...

The prospect of starting at 3000BC and trudging forward seems
guaranteed to fail, as does the idea of starting with ENIAC and
trudging forward. One alternative is 'cherry-picking'-- starting
with apps I've heard of, that advanced the state of the art.

But it occurs to me that I can exploit the topical framework(s)
used by librarians-- the Dewey Decimal System and the Library of
Congress Classification. (Does the LoC or its software equivalent
solicit submissions of any and all commercial software? If there
is such a beast, that would be the place for me to start, assuming
they classify by topic.)

A rough overview:

Q. Science (LoC)
500 Natural sciences & mathematics (Dewey)

These are where computation started, and progress in kr has been
steady. In math, Macsyma and Mathematica are classic advances.
In mechanics, Sketchpad and ThingLab were early advances. In
ecology, there's the Club of Rome.

Historically, early breakthrus in kr for math and the sciences
are pretty well documented (eg decimal notation, square-root
algorithms, etc.)

600 Technology (Applied sciences)
R. Medicine
S. Agriculture
T. Technology/Engineering
U. Military Science
V. Naval Science

The early ballistics work fits here, but so do recent war-sims
like Command and Conquer. In medicine, I'm a fan of Weed's
Knowledge-Couplers and Problem-Oriented Medical Records.
Tracking agricultural output has been a high priority since
Ancient Egypt.

Software-engineering in general has to go here, and must
include software-widget theory.

100 Philosophy & psychology
200 Religion
B. Philosophy, Psychology, Religion
300 Social sciences
G. Anthropology
H. Social Sciences
J. Political Science
K. Law (General)
L. Education

Progress here has been very slow. Mortimer Adler's catalog
of 'Great Ideas' is one pre-computer landmark. 'The Sims'
is a crude psychological model. There have been some
interesting games based on politics (Balance of Power, etc).

I've seen anthropologists tabulating the 'toolkits' of different
cultures (eg how they start fires, how they treat wounds).

Embodying philosophical principles has always been an ambition
of game designers. I think 'Black and White' is a recent
landmark.

I'd include my psychology of romantic love (Solace) as well.

400 Language
P-PA Language and Literature
800 Literature & rhetoric

The Text Encoding Initiative fits here, and WordNet, and
some of Grady Ward's 'Moby' stuff. And Propp's analysis of
folktales, and my Anti-Math. And grammar-checkers, etc.

G. Geography, Maps, Anthropology, Recreation
900 Geography & history
C. Auxiliary Sciences of History
D. History (Eastern Hemisphere)
E-F. History (Western Hemisphere)

The representation of geographical features on maps has
evolved slowly-- Tufte offers some innovative experiments.

KR-for-history is non-existent as far as I know-- my own
Internet Timelines Project seeks to remedy this using
pseudo-XML.

700 The arts (Fine and decorative arts)
M. Music
N. Fine arts

Representation of music morphed naturally into MIDI-etc.
Sketchpad made certain kinds of drawing easier. 'Aaron'
succeeds in automatically creating representational 'art'.

Any schemes for classifying the content of paintings and
photos should be included. (My fractal-thickets are one.)

Z. Bibliography, Library Science
000 Generalities
A. General works

LISP and Cyc might go here...?

Brian {Hamilton Kelly}

Dec 8, 2002, 8:31:14 PM

In article <16e613ec.0212...@posting.google.com>
jo...@enteract.com "Jorn Barger" writes:

> But it occurs to me that I can exploit the topical framework(s)
> used by librarians-- the Dewey Decimal System and the Library of
> Congress Classification. (Does the LoC or its software equivalent
> solicit submissions of any and all commercial software? If there
> is such a beast, that would be the place for me to start, assuming
> they classify by topic.)

Serious libraries (as opposed to those lending to hoi polloi) in Europe
use the UDC (Universal Decimal Classification) which is derived from
Thomas Dewey's original principles but also permits of more extensive
cross-indexing of concepts.

Perhaps you should expand your horizons.

--
Brian {Hamilton Kelly} b...@dsl.co.uk
"We can no longer stand apart from Europe if we would. Yet we are
untrained to mix with our neighbours, or even talk to them".
George Macaulay Trevelyan, 1919

Jorn Barger

Dec 9, 2002, 3:20:37 AM

Brian {Hamilton Kelly} <b...@dsl.co.uk> wrote in news:<103939...@dsl.co.uk>:

> Serious libraries (as opposed to those lending to hoi polloi) in Europe
> use the UDC (Universal Decimal Classification) which is derived from
> Thomas Dewey's original principles but also permits of more extensive
> cross-indexing of concepts. Perhaps you should expand your horizons.

I know this is just a cheap shot, and you probably don't have a clue
what I'm talking about... but in fact I found this UDC page very
interesting: http://www.niss.ac.uk/resource-description/udcbrief.html

The top-ten categories are essentially the same as Dewey, but the
following more-detailed summary is well-timed to suggest new
software/knowledge-rep categories. The question I'm asking, which
doesn't seem to have been asked yet by compsci historians, is
what sorts of knowledge-representations have been attempted in each
of these subcategories. (All lines that start with numbers are
from the url above; my comments are interspersed. I'll be adding
relevant links at: http://www.robotwisdom.com/ai/timeline.html )

0 Generalities. Science and knowledge. Information. Documentation.
Librarianship etc
00 Prolegomena. Fundamentals of knowledge and culture

I'll come back to this, but my first guess is still Cyc and LISP.

01 Bibliography and bibliographies. Catalogues

Yahoo and DMoz. Standardised formats for bibliographies (especially
machine-readable).

02 Librarianship

Card-catalog software design?

030 General reference works. Encyclopaedias. Dictionaries

Digitizing the OED was especially hairy, I know. (It's easy to
forget that I'm _not_ interested in just listing reference
websites-- I'm only interested in novel kr-approaches.)

050 Serial publications. Periodicals

I think the special kr-problems here are pretty trivial. Perhaps
some pagelayout programs included special features for periodicals,
though?

06 Organizations and other types of cooperation. Congresses. Museums

Some toy ontologies include simple representations of organization-
structure.

070 Newspapers. The Press. Journalism

There's a very primitive markup standard for news articles.

08 Polygraphies. Collective works

Tagging multiple authors is a semi-trivial markup/metadata problem.

09 Manuscripts. Rare and remarkable works

There are lots of experimental efforts in computerizing manuscript-
drafts.

1 Philosophy
11 Metaphysics. Fundamental problems
122/129 Special metaphysics
13 Philosophy of mind and spirit. Metaphysics of spiritual life
14 Philosophical systems and points of view. Weltanschauung

I've proposed the (eventual) use of simulations as a 'Philosophy Lab'
where one can try out different ethical choices, etc. Multi-user
gaming-systems already have to deal with these (eg cheaters), but
I don't know if they're actually represented symbolically at any
level.

159.9 Psychology

(Odd that the UDC doesn't relocate this and give it more room...?)

Cognitive psych uses very limited computer models.

16 Logic. Epistemology. Theory of knowledge. Methodology of logic

Lots of AI-work here.

17 Moral philosophy. Ethics

There are some very interesting cellular automata that model cheating.

2 Religion. Theology
21 Natural theology
22 The Bible. Holy scripture

Lots of pioneering basic work in semantic-markup, computerized
concordances, etc.

23 Dogmatic theology
24 Practical theology
25 Pastoral theology
26 The Christian Church in general
27 General history of the Christian Church
28 Christian churches, sects, denominations
29 Non-Christian religions

3 Social sciences
30 Theories, methodology etc. Sociography
31 Demography. Sociology. Statistics

Statistics software. Census software.

32 Politics
33 Economics. Economic science

Some interesting sims.

34 Law. Jurisprudence

Indexing, and simple expert systems.

35 Public administration. Government. Military affairs

The history of giant custom-software boondoggles. ;^/

36 Ensuring the mental and material necessities of life. Social aid.
Insurance

(US users please ignore this category!)

37 Education. Teaching. Training. Leisure

Lots of bad theory about software for education. Also 'wizards'?

(How is 'leisure' different from 79 below?)

389 Metrology. Weights and measures

Trivial, I think. Eg unit-conversions.

39 Ethnology. Ethnography. Folklore

Propp et al.

4 (Vacant; linguistics transferred to 81)

(What was Dewey thinking?)

5 Mathematics and natural sciences
50 Generalities. Nature and conservation
51 Mathematics
52 Astronomy. Astrophysics. Space research. Geodesy
53 Physics
54 Chemistry. Crystallography. Mineralogy
55 Earth sciences. Geology. Meteorology etc
56 Palaeontology
57 Biological sciences in general
58 Botany
59 Zoology

Lots of obvious kr-work here.

6 Applied sciences. Medicine. Technology
60 General questions of the applied sciences
61 Medical sciences
62 Engineering. Technology in general
63 Agriculture. Forestry. Farming etc
64 Home economics. Domestic science. Housekeeping
65 Management and organization of industry, trade and communication

Simula.

66 Chemical technology. Chemical and related industries
67 Various industries, trades and crafts
68 Assembled articles. Precision mechanisms
69 Building (construction) trade. Building materials. Building practice
and procedure

I've heard that automakers-etc have very sophisticated inhouse design-
packages (eg CAD-CAM), but supposedly they keep their kr-innovations
secret?

7 The arts. Recreation. Entertainment. Sport
71 Physical planning. Regional, town and country planning. Landscaping
72 Architecture

The original Pattern Language was for architects.

73 Plastic arts. Sculpture
74 Drawing. Design. Applied arts and crafts
75 Painting
76 Graphic arts. Graphics
77 Photography and similar processes
78 Music
79 Recreation. Entertainment. Games. Sport

Dance isn't mentioned, but there are notation-schemes for
choreographers.

Sports-sims, and stat-databases (eg sabermetrics), and whatever
software announcers use on teevee.

8 Language. Linguistics. Literature
80 General questions. Prosody
81 Linguistics and languages
82 Literature generally
821 Literatures of individual languages

Grammar-checkers, translation, markup.

9 Geography. Biography. History
902/908 Related sciences. Archaeology. Prehistory. Historic remains.
Study of a locality

There may be special database software for archeologists by now.

91 Geography. Exploration. Travel

Mapping software. GPS.

913 Regional geography in general. Geography of the ancient world
914/919 Individual regions and countries of the modern world
929 Biographical and related studies. Genealogy. Heraldry etc

Genealogy software. Trivially, the biographical formats of (eg)
Who's Who. Resume/CV markup.

93/99 History
930 Science of history. Diplomatic. Archivistics. Epigraphy.
Palaeography
931/939 Ancient history
94/99 Mediaeval and modern history. History of individual countries

I've seen several history-oriented websites where you can (eg) click
on any mention of a given year and see a generated-page with all the
events of that year, but none of them seemed very useful compared to
handbuilt pages.
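
For concreteness, the kind of generated year-page described above could be sketched roughly as follows in Python. This is an assumption about how such sites might work, not a description of any particular one; the function names and the three sample entries are hypothetical, with dates taken only from this thread.

```python
import re
from collections import defaultdict

# Illustrative entries only; the years are ones quoted in this thread.
EVENTS = [
    (1958, "Origins of LISP (c1958)"),
    (2002, "Thread on data-semantics and ontologies begins"),
    (2002, "UDC summary proposed as a kr-survey framework"),
]


def build_year_index(events):
    """Group event descriptions under their year."""
    index = defaultdict(list)
    for year, description in events:
        index[year].append(description)
    return dict(index)


def years_mentioned(text, index):
    """Return indexed years that appear in text, in order of first mention."""
    found = []
    for match in re.finditer(r"\b\d{4}\b", text):
        year = int(match.group())
        if year in index and year not in found:
            found.append(year)
    return found


def render_year_page(year, index):
    """Produce the plain-text 'page' a reader would see after clicking a year."""
    lines = [f"Events of {year}"]
    lines += [f"- {description}" for description in index.get(year, [])]
    return "\n".join(lines)


index = build_year_index(EVENTS)
page = render_year_page(1958, index)
```

A page built this way is only as good as the flat list behind it, which may be part of why such pages compare poorly with handbuilt ones.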

Returning to the start:

0 Generalities. Science and knowledge. Information. Documentation.
Librarianship etc
00 Prolegomena. Fundamentals of knowledge and culture

Overviews of kr-software have to be very different from overviews
of library-topics. Neither Lisp nor Cyc is an obvious candidate
for UDC's '00'. And my survey above obviously careens between
trivial database-implementations and profound unsolved challenges.

I think I should probably radically re-sort the LoC/UDC categories
to reflect this...?

Jorn Barger

Dec 10, 2002, 9:44:56 AM

I wrote in message news:<16e613ec.02120...@posting.google.com>...
> [...] I found this UDC page very
> interesting: http://www.niss.ac.uk/resource-description/udcbrief.html

A more-extensive version of the UDC is at
http://www.niss.ac.uk/subject2/new95udc.html
and reveals that software is ghettoized at 518, in a handful of
major subcategories that are, disappointingly, alphabetized.

[...]


> I think I should probably radically re-sort the LoC/UDC categories

Here's my thinking so far:

I like the "0 = Generalities" approach, but since Dewey/LoC/UDC
are inherently book-oriented, I want to rethink the range of
'generalities' quite radically to include software (and sims).

I'll propose that for any given _topic_ of discourse, the _language_
of discourse follows the same approximate evolutionary path:

- preverbal
- images
- words and symbols
- clustering and orthogonal tables?
- counting and measurement
- algorithmic relationships
- simulations

The topics that these can range over include physics, chemistry,
astronomy, geology, biology, and psychology. At some point within
the psych-topic we'll have to re-echo the evolution of languages
of discourse (duplicating the 0=generalities categories), but
perhaps we can arbitrarily draw a line between 'topical' motives
(eg food, sex) and topic-free abstract logics.

Topic-neutral _tools_ for each type of discourse/language will
also have to be dealt with in the 0=generalities section, which
forces us to relocate great chunks of the UDC. Roget's thesaurus,
eg, is an all-topic classification of words, and so must go under
word-tools. Macsyma is a generic math-tool, and goes under
algorithms, as do spreadsheets. Simula goes under generic
simulations. Sketchpad (paradoxically?) goes under generic image-
tools.

The UDC itself is a word-tool, since it's classifying all texts.
Grammar-checkers probably have to go here too, although you
could argue that their topic is motivated by a desire to appear
intelligent (and therefore psychological, not purely abstract).

History and geography are not in themselves topical, and must
therefore go in with words-and-symbols-- you can talk about
human-behavior in Egypt c1400BC, but you might instead talk
about the weather or the botany then.

I just went back and added "clustering and orthogonal tables"
with a questionmark because I'm not sure how it fits. After
you name something, you can start to contrast how separate
instances of that phenomenon differ-- this is the principle
of the flatfile database.
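
A minimal Python sketch of that flatfile principle, under stated assumptions: each row names one instance, and each column is an independent ("orthogonal") way in which instances can differ. The language_level values follow the placements suggested earlier in this post; the medium column, the column names and the helper functions are hypothetical, added only for illustration.

```python
import csv
import io

# One named instance per row; columns are independent axes of contrast.
FLAT_FILE = """name,medium,language_level
Roget's thesaurus,book,words and symbols
Macsyma,software,algorithmic relationships
spreadsheets,software,algorithmic relationships
Simula,software,simulations
Sketchpad,software,images
"""


def load_records(text):
    """Parse the flat file into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def select(records, field, value):
    """Names of all instances sharing one value on one axis (a 'cluster')."""
    return [r["name"] for r in records if r[field] == value]


def distinct(records, field):
    """The sorted set of values that a single axis takes across all rows."""
    return sorted({r[field] for r in records})


records = load_records(FLAT_FILE)
algorithm_tools = select(records, "language_level", "algorithmic relationships")
media = distinct(records, "medium")
```

Selecting on one column gives a cluster, and listing a column's distinct values gives the axis itself, which is about all a flat file can express.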

be...@sonic.net

Dec 10, 2002, 6:46:30 PM

Brian {Hamilton Kelly} wrote:
>
> Serious libraries (as opposed to those lending to hoi polloi) in Europe
> use the UDC (Universal Decimal Classification) which is derived from
> Thomas Dewey's original principles but also permits of more extensive
> cross-indexing of concepts.
>
> Perhaps you should expand your horizons.

As long as you contemplate compiling a history, take care to get it
right.

It wasn't Thomas Dewey, it was Melvil Dewey, the outspoken American
efficiency expert, office-supplies dealer, racist, and anti-semite who
made his money by inventing, patenting, and selling the hanging file
folder system. He had the reputation of being a thoroughly nasty man,
but very very sharp.

And I would say that the Dewey system itself, as the first reasonably
general and widely adopted standard for shelving sequence in libraries,
is definitely an achievement of note in the history of knowledge
representation.

The hanging file folder system may also deserve a footnote in there, but
it's media-dependent rather than being about ideas.

Bear

Charles Richmond

Dec 11, 2002, 1:59:30 AM

be...@sonic.net wrote:
>
> Brian {Hamilton Kelly} wrote:
> >
> > Serious libraries (as opposed to those lending to hoi polloi) in Europe
> > use the UDC (Universal Decimal Classification) which is derived from
> > Thomas Dewey's original principles but also permits of more extensive
> > cross-indexing of concepts.
> >
> > Perhaps you should expand your horizons.
>
> As long as you contemplate compiling a history, take care to get it
> right.
>
> It wasn't Thomas Dewey, it was Melvil Dewey, the outspoken American
> efficiency expert, office-supplies dealer, racist, and anti-semite who
> made his money by inventing, patenting, and selling the hanging file
> folder system. He had the reputation of being a thoroughly nasty man,
> but very very sharp.
>
Thomas Dewey was an infamous US political leader...for more
info, see:

<http://gi.grolier.com/presidents/aae/side/dewey.html>

--
+-------------------------------------------------------------+
| Charles and Francis Richmond <rich...@plano.net> |
+-------------------------------------------------------------+
