Lisp XML parser ?


Clint Hyde

Jun 22, 2000, 3:00:00 AM
I'm sure this is a FAQ by now...I'm out of touch...

I want an XML parser written in lisp...is there such a thing? available
free? where?

Erik Naggum must have written this by now if no one else has :)

--

please reply direct to <a href="mailto:ch...@bbn.com">Clint Hyde</a>
I don't have enough time to scan everything I'd like to, and don't
want to miss your answers...

-- clint

Daniel Barlow

Jun 22, 2000, 3:00:00 AM
Clint Hyde <ch...@bbn.com> writes:
> I want an XML parser written in lisp...is there such a thing? available
> free? where?

According to CLiki <URL:http://ww.telent.net/cliki/XML>, available options
include

UncommonXML, at http://alpha.onshore.com/lisp-software/

CLOCC, at http://clocc.sourceforge.net/

If anybody knows of any others (DFSG-free), please do add links for
them. I believe there is something in the Lambda Codex at Everest,
but I can't remember its licensing nor get to their web site right now
to verify.


-dan

--
http://ww.telent.net/cliki/ - CLiki: CL/Unix free software link farm

s...@usa.net

Jun 22, 2000, 3:00:00 AM
to ch...@bbn.com
In article <39523341...@bbn.com>,

Clint Hyde <ch...@bbn.com> wrote:
>
> I want an XML parser written in lisp...is there such a thing?
> available free? where?

xml.lisp is a part of my CLLIB
(http://www.podval.org/~sds/data/cllib.html)
which is a part of CLOCC (http://clocc.sourceforge.net)

Sent via Deja.com http://www.deja.com/
Before you buy.

Erik Naggum

Jun 22, 2000, 3:00:00 AM
* Clint Hyde <ch...@bbn.com>

| Erik Naggum must have written this by now if no one else has :)

Thanks, but I have to disappoint you. I don't consider a parser to
be very valuable by itself (even though they simplify some tasks),
unless it can produce something close to a document structure that
may be traversed with reasonable tools. There is no consensus on
what an XML document means. The failure of the SGML community to
realize that they need to deal with SGML documents the same way Lisp
deals with source code/data also means that there will be no good
agreement on any in-memory representation of SGML documents. (And
DOM is an incredibly ridiculous misunderstanding of "object oriented
technology".)

#:Erik
--
If this is not what you expected, please alter your expectations.

Simon Brooke

Jun 22, 2000, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> * Clint Hyde <ch...@bbn.com>
> | Erik Naggum must have written this by now if no one else has :)
>
> Thanks, but I have to disappoint you. I don't consider a parser to
> be very valuable by itself (even though they simplify some tasks),
> unless it can produce something close to a document structure that
> may be traversed with reasonable tools. There is no consensus on
> what an XML document means.

H'mmmm.... I've always considered that XML syntax was just a prolix
way of writing sexprs. I mean, there's little inherently different
between, say (to deal with something I was working on today),

<question
    answer="Yes, formally, with reviews"
    score="45"
    shortform="Development of a structured management system">
  <text>
    Does the company have a management system in place
  </text>
  <advice>
    You have a formal management system in place, which is providing
    benefits to the company. Have you considered how you could
    introduce greater flexibility within this system or how you could
    integrate other approaches, e.g. environmental management,
    business excellence, into your system to make it more holistic.
  </advice>
</question>

and

(question
  ((answer . "Yes, formally, with reviews")
   (score . 45)
   (shortform . "Development of a structured management system"))
  (text "Does the company have a management system in place")
  (advice
    "You have a formal management system in place, which is providing
benefits to the company. Have you considered how you could
introduce greater flexibility within this system or how you could
integrate other approaches, e.g. environmental management,
business excellence, into your system to make it more holistic."))

A text node is much the same as a string; a non-text node is very much
the same as an atom consed onto the front of an alist. The only
problem in the representation is that XML has two distinct types of
attribute-value pairs, one of which can only take simple data types as
values and the other of which can take structures. You need some way
of indicating the difference but the above scheme (I would have
thought) would make an adequate first cut.
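
A minimal sketch of this first-cut mapping, written in Python so it can be run as-is. It assumes the standard-library xml.etree.ElementTree parser; the function name to_sexpr and the nested-list layout [tag, attribute-pairs, children...] are illustrative choices, not anything proposed in the thread. It ignores entities, namespaces, comments and processing instructions.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Return [tag, attribute-pairs, *children] for one element.

    Attribute pairs play the role of the alist; text nodes become
    plain strings; child elements become nested lists.
    """
    node = [elem.tag, sorted(elem.attrib.items())]
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(to_sexpr(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node


# Hypothetical, shortened version of the <question> example above.
doc = (
    '<question answer="Yes" score="45">'
    "<text>Does the company have a management system in place</text>"
    "</question>"
)
tree = to_sexpr(ET.fromstring(doc))
print(tree)
```

Note that every attribute value comes back as a string, so the (score . 45) in the hand-written form already relies on knowledge the parser does not have.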

Simon, well aware that he is posting in exalted company.

--
si...@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

Wise man with foot in mouth use opportunity to clean toes.
;; the Worlock

Erik Naggum

Jun 23, 2000, 3:00:00 AM
* Simon Brooke <si...@jasmine.org.uk>

| H'mmmm.... I've always considered that XML syntax was just a prolix
| way of writing sexprs.

The element structure has inherent similarities to trees made up of
lists and the significant differences are non-obvious.

| The only problem in the representation is that XML has two distinct
| types of attribute-value pairs, one of which can only take simple
| data types as values and the other of which can take structures.
| You need some way of indicating the difference but the above scheme
| (I would have thought) would make an adequate first cut.

I tend to represent *ML elements as if destructured with

((&rest attlist &key gi &allow-other-keys) &rest contents)

where attlist is a keyword-value plist, at least one key in which is
the generic identifier, a.k.a. the element type name. (There is an
important distinction between attributes and contents as far as
abstraction goes, but I won't go into that.) Attribute values have
a restricted set of types, but I consider this an artificial, not a
significant difference.
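
A rough Python analogue of that destructuring pattern, as a sketch only. It assumes an element is a list whose first item is a flat key/value list (the plist) and whose remaining items are the contents; the helper name destructure and the decision to treat a missing gi as an error are assumptions made for illustration.

```python
def destructure(element):
    """Split an element into (gi, attributes, contents).

    element[0] is a flat [key, value, key, value, ...] list, mirroring
    a Lisp plist; everything after it is the element's contents.
    """
    attlist, *contents = element
    attrs = dict(zip(attlist[0::2], attlist[1::2]))
    gi = attrs["gi"]  # assumption: every element names its type
    return gi, attrs, contents


# Hypothetical element in the ((attlist...) contents...) shape.
question = [
    ["gi", "question", "score", "45"],
    [["gi", "text"], "Does the company have a management system in place"],
]
gi, attrs, contents = destructure(question)
print(gi, attrs["score"], len(contents))
```

Because the generic identifier is just one more key in the attribute list, nothing in this shape privileges it over the other attributes.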

One significant difference is the entity structure, which is mostly
used for special characters, but is really an amazingly powerful and
under-understood mechanism for organizing the input sources. Lisp's
syntax has nothing like it at all, and neither do other languages
that could naturally represent tree structures. It is non-trivial
to represent the entity structure and the element structure side by
side, unless you only refer to entities in attribute values.

Another significant difference is the way identifiers are used to
change the meaning of both the gi and the other attributes. We are
not used to the operator changing meaning if we change an argument,
but this is quite common in *ML contexts, to the point where the
generic identifier may not even name the element type as far as
processing is concerned. This means that the "processing key" is
computed from the entire attribute list. Various other mechanisms
with similar confusability exist, and they are bad enough that you
cannot just gloss over them.

The result is that you cannot really represent an *ML structure
without knowing how it is supposed to be processed, as if you would
have to tell the Lisp reader whether you were reading for code or
reading for data, rejecting perhaps the biggest advantage of Lisp's
syntax. In short: They got it all wrong.

If they had had a less involved syntax, they wouldn't have needed
all the arcane details and would have had fewer chances to go off
the deep end. Given that you can stuff a lot of junk into that
attribute list, it just had to happen that they would do something
harmful to themselves. Both Perl and C++ evolved the way they did
because of syntactic mistakes like that.

Tim Bradshaw

Jun 23, 2000, 3:00:00 AM
* Simon Brooke wrote:
> H'mmmm.... I've always considered that XML syntax was just a prolix
> way of writing sexprs. I mean, there's little inherently different
> between, say (to deal with something I was working on today),

But XML is more complicated and harder to parse, and this is always an
advantage.

--tim

Espen Vestre

Jun 23, 2000, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> There is no consensus on what an XML document means.

well there's always XSL (or what was that acronym again?), but
in general, for some uses of XML a 'meaning' would be a meaning
in the philosophical logic sense, I guess, so we will have to
wait a few hundred years and hope that the fundamentals of epistemology
and semantics are a little better understood.

> agreement on any in-memory representation of SGML documents. (And
> DOM is an incredibly ridiculous misunderstanding of "object oriented
> technology".)

The DOM specification is the most frustrating piece of documentation
I've read in quite a few years. Not that I remember a word, though
(I _hope_ that's 'garbage out - garbage in', and not just me being
lazy ;-)).
--
(espen)

Simon Brooke

Jun 23, 2000, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> One significant difference is the entity structure, which is mostly
> used for special characters, but is really an amazingly powerful and
> under-understood mechanism for organizing the input sources. Lisp's
> syntax has nothing like it at all, and neither do other languages
> that could naturally represent tree structures. It is non-trivial
> to represent the entity structure and the element structure side by
> side, unless you only refer to entities in attribute values.

Is not an entity more or less equivalent to a read macro? A special
notation which is expanded at read-time by applying a function out of a
special namespace? There's nothing very magical about it... unless I'm
missing something very badly?

;; I'd rather live in sybar-space

Erik Naggum

Jun 23, 2000, 3:00:00 AM
* Simon Brooke <si...@jasmine.org.uk>

| Is not an entity more or less equivalent to a read macro?

No. Neither more nor less. The Lisp reader returns whole Lisp
objects from its reader macro functions, which is eminently doable
because Lisp has syntax with a defined meaning. Entities are
sources of characters that sort of "precede" lexical analysis, but
there are rules for where the end of an entity may occur, so the
Entity end "signal" is a special input event. Case in point: When
you give the string "foo&dash;bar" to the parser, and suppose you
have defined dash to mean the string "--", the parser will actually
see "foo&dash;--|bar", where | has the role of the Entity end. Both
the start and end of an entity are at the same level as all other
syntax in SGML, but the parsed result may or may not need to know
this depending on whether you intend to reconstruct the entity
structure (as in edit them) or process the element structure.
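
The "foo&dash;bar" case can be sketched as an event stream in Python. This is a toy model, not an SGML parser: the event names, the events function and the regular expression for entity references are all assumptions made for illustration, and it does not guard against an entity that refers to itself.

```python
import re

# Simplified pattern for a general entity reference such as &dash;
ENTITY_REF = re.compile(r"&([A-Za-z][\w.-]*);")


def events(text, entities):
    """Yield character runs plus explicit entity start and end events."""
    pos = 0
    for match in ENTITY_REF.finditer(text):
        if match.start() > pos:
            yield ("chars", text[pos:match.start()])
        name = match.group(1)
        yield ("entity-start", name)
        # The replacement text is itself scanned for references.
        yield from events(entities[name], entities)
        yield ("entity-end", name)
        pos = match.end()
    if pos < len(text):
        yield ("chars", text[pos:])


stream = list(events("foo&dash;bar", {"dash": "--"}))
for event in stream:
    print(event)
```

A consumer that only wants the element structure can drop the two entity events; one that wants to reconstruct the entity structure has to keep them.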

| There's nothing very magical about it... unless I'm
| missing something very badly?

I think I have made a case for "magical", if not "very magical".

Tim Bradshaw

Jun 23, 2000, 3:00:00 AM
* Erik Naggum wrote:

> I think I have made a case for "magical", if not "very magical".

Can entities also expand to syntactically/lexically-nonsensical
things? I remember (vaguely, thank God), seeing entities in DTDs used
for things like this, in a similar awful way that people use C
preprocessor macros to expand to random chunks of text. But I know
entities in DTDs are not the same as entities in documents, and it was
SGML not XML, and in any case I may be misremembering.

--tim

Chris Brew

Jun 23, 2000, 3:00:00 AM
Tim Bradshaw <t...@cley.com> writes:
> Can entities also expand to syntactically/lexically-nonsensical
> things? I remember (vaguely, thank God), seeing entities in DTDs used
> for things like this, in a similar awful way that people use C
> preprocessor macros to expand to random chunks of text. But I know
> entities in DTDs are not the same as entities in documents, and it was
> SGML not XML, and in any case I may be misremembering.


In XML, entities have to expand to something well-formed. You can't
have a start tag without an end tag. This is explained in 4.3.2 of
the standard, although it isn't straightforward to understand unless
you already understand it.

I know that several of the people on the XML committees have a
thorough and exhaustive grasp of the semantic and syntactic
issues in designing such things. But these things are committees,
and sensible committee members don't necessarily produce ...

C

--

Sunil Mishra

Jun 23, 2000, 3:00:00 AM
in article 87wvjhb...@tninkpad.telent.net, Daniel Barlow at
d...@telent.net wrote on 6/22/00 8:59 AM:

> Clint Hyde <ch...@bbn.com> writes:
>> I want an XML parser written in lisp...is there such a thing? available
>> free? where?
>

> According to CLiki <URL:http://ww.telent.net/cliki/XML>, available options
> include
>
> UncommonXML, at http://alpha.onshore.com/lisp-software/
>
> CLOCC, at http://clocc.sourceforge.net/
>
> If anybody knows of any others (DFSG-free), please do add links for
> them. I believe there is something in the Lambda Codex at Everest,
> but I can't remember its licensing nor get to their web site right now
> to verify.
>
>
> -dan

We (everest) have an FFI layer for James Clark's expat parser at
sourceforge.net. The FFI bindings are for ACL. Here's the full URL:

ftp://lambda-codex.sourceforge.net/pub/lambda-codex/expat-1.0-beta.tgz

Sunil


mato...@iname.com

Jun 23, 2000, 3:00:00 AM
In article <w64s6k3...@wallace.nextel.no>,

Espen Vestre <espen@*do-not-spam-me*.vestre.net> wrote:
> Erik Naggum <er...@naggum.no> writes:
>
> > There is no consensus on what an XML document means.
>
> well there's always XSL (or what was that acronym again?), but

Great.

<function-definition>foo<arglist></arglist>
<application>display "I am paren-challenged"</application>
</function-definition>

--
Fernando D. Mato Mira

Rob Warnock

Jun 24, 2000, 3:00:00 AM
<mato...@iname.com> wrote:
+---------------

| <function-definition>foo<arglist></arglist>
| <application>display "I am paren-challenged"</application>
| </function-definition>
+---------------

Well, don't you really want this: ;-} ;-}

<program>
  <function-definition>foo<arglist></arglist>
    <application>display "I am paren-challenged"</application>
    <application>newline</application>
  </function-definition>
  <application>foo</application>
  <application>exit 0</application>
</program>


-Rob

-----
Rob Warnock, 41L-955 rp...@sgi.com
Applied Networking http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043

Simon Brooke

Jun 24, 2000, 3:00:00 AM
Daniel Barlow <d...@telent.net> writes:

> Clint Hyde <ch...@bbn.com> writes:
> > I want an XML parser written in lisp...is there such a thing? available
> > free? where?
>
> According to CLiki <URL:http://ww.telent.net/cliki/XML>, available options
> include
>
> UncommonXML, at http://alpha.onshore.com/lisp-software/

Uhhh... the page is there, but as of this morning the links to the TAR,
ZIP and .DEB archives are all broken, malheureusement. There is also a
public CVS server advertised, but it too doesn't work:

[simon@gododdin uncommon]$ cvs login
(Logging in to ano...@alpha.onshore.com)
CVS password:
[simon@gododdin uncommon]$ cvs co uncommonxml
cvs server: cannot find module `uncommonxml' - ignored
cvs [checkout aborted]: cannot expand modules

...but have you *seen* the size of the world wide spider?

Erik Naggum

Jun 24, 2000, 3:00:00 AM
* Tim Bradshaw <t...@cley.com>

| Can entities also expand to syntactically/lexically-nonsensical
| things?

Yes. There are some feeble attempts to restrict the nonsense in
SGML and some less feeble, but not particularly strong, attempts at
same in XML.

Fabrice Popineau

Jun 24, 2000, 3:00:00 AM
* Erik Naggum <er...@naggum.no> writes:

> may be traversed with reasonable tools. There is no consensus on


> what an XML document means.

And what about DOM ???

--
Fabrice POPINEAU
------------------------
e-mail: Fabrice....@supelec.fr | The difference between theory
voice-mail: +33 (0) 387764715 | and practice, is that
surface-mail: Supelec, 2 rue E. Belin, | theoretically,
F-57078 Metz Cedex 3 | there is no difference !

Steven M. Haflich

Jun 24, 2000, 3:00:00 AM

Fernando -- I am shocked [shocked!] that you and all the Lisp
bigots on these lists so defame the succinct expressiveness
of XSL:

mato...@iname.com wrote:
> Great.


>
> <function-definition>foo<arglist></arglist>
> <application>display "I am paren-challenged"</application>
> </function-definition>

You miss the elegant, natural terseness of expressing it
this way:

<function-definition>foo<arglist/>
  <application>display "I am paren-challenged"</application>
</function-definition>

When I get caught up on my other work I intend to write an
XML-syntax CL readtable, and then write a CL evaluator in XSL,
and then all my Lisp code will be write once, run anywhere.

Christopher Browne

Jun 24, 2000, 3:00:00 AM
Centuries ago, Nostradamus foresaw a time when Steven M. Haflich would say:

>
>Fernando -- I am shocked [shocked!] that you and all the Lisp
>bigots on these lists so defame the succinct expressiveness
>of XSL:
>
>mato...@iname.com wrote:
>> Great.
>>
>> <function-definition>foo<arglist></arglist>
>> <application>display "I am paren-challenged"</application>
>> </function-definition>
>
>You miss the elegant, natural terseness of expressing it
>this way:
>
> <function-definition>foo<arglist/>
> <application>display "I am paren-challenged"</application>
> </function-definition>

Don't you mean something more like:
(foo :application "display \"I am paren-challenged\"")

>When I get caught up on my other work I intend to write an
>XML-syntax CL readtable, and then write a CL evaluator in XSL,
>and then all my Lisp code will be write once, run anywhere.

Sounds pretty neat...
--
cbbr...@ntlug.org - <http://www.ntlug.org/~cbbrowne/linux.html>
((lambda (foo) (bar foo)) (baz))

Erik Naggum

Jun 25, 2000, 3:00:00 AM
* Fabrice Popineau <Fabrice....@supelec.fr>

| And what about DOM ???

Yes? What about DOM? Giving something an alternate representation
and _nothing_ else does not constitute giving it meaning. Besides,
I wrote what I think about DOM in <31706821...@naggum.no>:

(And DOM is an incredibly ridiculous misunderstanding of "object
oriented technology".)

#:Erik

Christopher Browne

Jun 25, 2000, 3:00:00 AM
Centuries ago, Nostradamus foresaw a time when Erik Naggum would say:

>* Fabrice Popineau <Fabrice....@supelec.fr>
>| And what about DOM ???
>
> Yes? What about DOM? Giving something an alternate representation
> and _nothing_ else does not constitute giving it meaning. Besides,
> I wrote what I think about DOM in <31706821...@naggum.no>:
>
> (And DOM is an incredibly ridiculous misunderstanding of "object
> oriented technology".)

But "Document Object Model" contains the word "Object," so it _MUST_
be object oriented. Right?
--
cbbr...@ntlug.org - <http://www.ntlug.org/~cbbrowne/lsf.html>
"How should I know if it works? That's what beta testers are for. I
only coded it." (Attributed to Linus Torvalds, somewhere in a
posting)

Erik Naggum

Jun 25, 2000, 3:00:00 AM
* Christopher Browne

| But "Document Object Model" contains the word "Object," so it _MUST_
| be object oriented. Right?

The people behind DOM are much less stupid than this implies, so
there's a possibility you're attempting to use this stupid snide
remark towards me, instead. But regardless, couldn't you instead
try to be somewhat constructive in your comments? Bogus as it is,
DOM doesn't deserve outright _disrespect_, lest we thus hinder any
better ideas along the same axis grow, too.

Steven M. Haflich

Jun 25, 2000, 3:00:00 AM

Erik Naggum wrote:
>
> * Christopher Browne
> | But "Document Object Model" contains the word "Object," so it _MUST_
> | be object oriented. Right?
>
> The people behind DOM are much less stupid than this implies, so
> there's a possibility you're attempting to use this stupid snide
> remark towards me, instead. But regardless, couldn't you instead
> try to be somewhat constructive in your comments? Bogus as it is,
> DOM doesn't deserve outright _disrespect_, lest we thus hinder any
> better ideas along the same axis grow, too.

The intelligence of the people behind DOM is not a relevant issue, and
the question whether the DOM is or is not OO is also to me not the most
important one. There were, at least, cogent reasons for the peculiar
OO design even if they turn out not to have been worthwhile.

However, I am very much bothered by a hidden performance issue in
the language design, specifically, the fact that a NodeList returned
by getElementsByTagName is "live" and dynamically reflects any changes
made to the document tree from which it was made.

This seems a neat feature for the programmer until you think _very_
_carefully_ about using it. How is it implemented? The DOM specifies
specifically that the method of implementation is not specified. This
leaves the thoughtful user up in the air: What are the performance
characteristics? A NodeList references its contained nodes by numeric
index 0..(length-1) and this length changes dynamically as elements
are added and removed by operations elsewhere upon the document. How
is this implemented with performance predictable to the user? I can
think of lots of implementation tricks (delayed updating, caching,
weird hashing schemes) that would maintain efficient operation as
Element nodes are added and deleted from the tree, but the problem is
that these techniques are not obvious to the _user_ and eventually
they all break down under some conceivable pattern of document
manipulation. Unpredictable performance knees are to me unacceptable
in a serious programming language.
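
To make the cost concrete, here is a deliberately naive "live" list in Python, built on the standard-library xml.dom.minidom. It is a sketch of one possible implementation strategy (re-walk the tree on every access), not how any particular DOM implementation works; the class LiveNodeList and its walks counter are hypothetical names introduced here. As far as I know, minidom's own getElementsByTagName returns a snapshot, which is why the wrapper is needed at all.

```python
import xml.dom.minidom as minidom


class LiveNodeList:
    """Naive live view: every access re-walks the subtree."""

    def __init__(self, root, tag):
        self.root = root
        self.tag = tag
        self.walks = 0  # counts full tree traversals

    def _snapshot(self):
        self.walks += 1
        return self.root.getElementsByTagName(self.tag)

    def __len__(self):
        return len(self._snapshot())

    def __getitem__(self, index):
        return self._snapshot()[index]


doc = minidom.parseString("<r><a/><a/></r>")
root = doc.documentElement
live = LiveNodeList(root, "a")

n_before = len(live)                      # one traversal
root.appendChild(doc.createElement("a"))  # mutate the tree elsewhere
n_after = len(live)                       # another traversal, sees 3
print(n_before, n_after, live.walks)
```

With this strategy a simple indexed loop over n matches costs n traversals of the whole subtree. Caching avoids that but then needs invalidation on every mutation, which is exactly the kind of hidden trade-off the paragraph above objects to.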

Lisp has lists, vectors, and hashtables to accommodate different kinds
of collection usage. Most of the performance issues are clear to any
programmer beyond the complete beginner. But as both a potential user
and a potential implementor, the appropriate performance of a NodeList
remains opaque, and that means portable programming and portable
programmers are impossible for the language.

Fabrice Popineau

Jun 25, 2000, 3:00:00 AM
* Erik Naggum <er...@naggum.no> writes:

> Yes? What about DOM? Giving something an alternate
> representation and _nothing_ else does not constitute giving it
> meaning. Besides, I wrote what I think about DOM in
> <31706821...@naggum.no>:

> (And DOM is an incredibly ridiculous misunderstanding of "object
> oriented technology".)

This is not the problem. You stated that 'there is no consensus on
what an XML document means'.

The DOM is a recommendation of the W3C, so it is a consensus, even if
you do not like it. From the 'parser problem' point of view, it is the
recommended way to access the document and any parser should ideally
follow it.

From a practical point of view, I have found several DOM modules
for Perl, C/C++ that quickly allowed me to hack XML documents but I
have not been able to find the same thing for Lisp (any hint there ?).
And even if DOM does not follow an ideally good design, it is already
useful. If you have better proposals, just submit them to the W3C.

Fabrice

Erik Naggum

Jun 25, 2000, 3:00:00 AM
* Fabrice Popineau <Fabrice....@supelec.fr>

| This is not the problem. You stated that 'there is no consensus on
| what an XML document means'.

I'm sorry, but could you please pay attention to what I'm saying so
I don't have to reestablish the entire context _every_ time I say
something you apparently are not going to accept and keep bickering
about?

To be blunt: *ML documents derive meaning from sources external to
the documents. Even if you use XSL to obtain meaning as far as
_presentation_ is concerned, you still don't have a clue what you're
dealing with unless you're actually the _same_ application as the
writer of the XML document. *ML is no better than random chunks of
binary data, but it also is no worse -- it could easily have been.

| The DOM is a recommendation of the W3C, so it is a consensus, even if
| you do not like it.

That's the worst non sequitur this newsgroup has suffered in a while.
If you can't argue better than this, go back to school and shut up.

| From the 'parser problem' point of view, it is the recommended way
| to access the document and any parser should ideally follow it.

I'm glad you're providing evidence of your understanding that DOM is
essentially no more than an access mechanism, which I called merely
an alternate representation, not actually representing a _meaning_.
Can you please make the effort to grasp the difference?

| From a practical point of view, I have found several DOM modules for
| Perl, C/C++ that quickly allowed me to hack XML documents but I have
| not been able to find the same thing for Lisp (any hint there ?).
| And even if DOM does not follow an ideally good design, it is
| already useful.

I was not talking about your ability to find useful tools to access
*ML documents via DOM "API"'s, OK? Now, _get_ the idea, damnit!

| If you have better proposals, just submit them to the W3C.

Oh, Christ, another one of those. Just go away. If you don't like
that response, please submit your suggestions for improvements to
the Norwegian government, or better yet: NATO. Wait, try EU! No,
make that the United Nations.

Michael Schuerig

Jun 25, 2000, 3:00:00 AM
Fabrice Popineau <Fabrice....@supelec.fr> wrote:

> If you have better proposals, just submit them to the W3C.

Irrespective of programming language I find it pretty tiresome to deal
with XML on a low level, be it SAX or DOM. This level may be appropriate
for applications that work _on_ XML. For applications that
only _use_ XML for externally representing objects I'd much prefer a
direct mapping between internal and external representation. If I'm not
mistaken, this is very familiar to Lisp people. Incidentally, Sun is
working on something like this for Java (keyword: XML data binding).
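
A small sketch of what such a direct mapping could look like, in Python with the standard-library dataclasses and xml.etree.ElementTree modules. The Question class, its fields and the from_xml/to_xml method names are hypothetical and only loosely follow the <question> example earlier in the thread; this is not the Java data-binding work mentioned above.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Question:
    answer: str
    score: int
    text: str

    @classmethod
    def from_xml(cls, source):
        """Build an object straight from its external XML form."""
        elem = ET.fromstring(source)
        return cls(
            answer=elem.get("answer"),
            score=int(elem.get("score")),
            text=elem.findtext("text").strip(),
        )

    def to_xml(self):
        """Write the object back out as XML."""
        elem = ET.Element("question", answer=self.answer, score=str(self.score))
        ET.SubElement(elem, "text").text = self.text
        return ET.tostring(elem, encoding="unicode")


q = Question.from_xml(
    '<question answer="Yes" score="45"><text> Does it? </text></question>'
)
print(q)
print(q.to_xml())
```

The application code only ever sees Question objects; the knowledge that score is an integer lives in the class, not in the document.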

Michael

--
Michael Schuerig
mailto:schu...@acm.org
http://www.schuerig.de/michael/

Fabrice Popineau

Jun 25, 2000, 3:00:00 AM
* Erik Naggum <er...@naggum.no> writes:

* Fabrice Popineau <Fabrice....@supelec.fr>
Fabrice> This is not the problem. You stated that 'there is no
Fabrice> consensus on what an XML document means'.

Erik> I'm sorry, but could you please pay attention to what I'm
Erik> saying so I don't have to reestablish the entire context
Erik> _every_ time I say something you apparently are not going to
Erik> accept and keep bickering about?

I apologize for not having taken your first assertion to its basic
meaning. From my point of view, it has always been obvious that an XML
document does not convey any meaning by itself (except if it is a
standardized application of XML like MathML) and each of the writer
and reader applications should be aware of the document's semantics.
So I guess we agree on this point.

Erik> To be blunt: *ML documents derive meaning from sources external
Erik> to the documents. Even if you use XSL to obtain meaning as far
Erik> as _presentation_ is concerned, you still don't have a clue
Erik> what you're dealing with unless you're actually the _same_
Erik> application as the writer of the XML document. *ML is no
Erik> better than random chunks of binary data, but it also is no
Erik> worse -- it could easily have been.

I agree. You might expect to describe more semantics using metadata :
RDF and schemas descriptions of your document. But you will still be
far from describing how to generate data structures (say, in Lisp)
from an unknown XML document even if it has associated metadata. So
that's why the DOM lacks semantics. By the way, do you know
of any clear ways to specify semantics of generic documents ? What
would you like to find there ?

Erik> I'm glad you're providing evidence of your understanding that
Erik> DOM is essentially no more than an access mechanism, which I
Erik> called merely an alternate representation, not actually
Erik> representing a _meaning_. Can you please make the effort to
Erik> grasp the difference?

I perfectly grasp the difference. Nobody ever said that an XML
document should convey meaning, and that's why your first assertion
was misleading.

Fabrice Popineau

Joe Marshall

Jun 25, 2000, 3:00:00 AM
rp...@rigden.engr.sgi.com (Rob Warnock) writes:

> <mato...@iname.com> wrote:
> +---------------


> | <function-definition>foo<arglist></arglist>
> | <application>display "I am paren-challenged"</application>
> | </function-definition>

> +---------------
>
> Well, don't you really want this: ;-} ;-}
>
> <program>

> <function-definition>foo<arglist></arglist>
> <application>display "I am paren-challenged"</application>

> <application>newline</application>
> </function-definition>
> <application>foo</application>
> <application>exit 0</application>
> </program>

Hot Damn! Without those parentheses, it suddenly becomes orders of
magnitude more readable! Why didn't we think of this before? Do you
have a DTD for this?

Oh, just noticed a typo (no doubt because it is so much easier to
read):

<application>display &quot;I am paren-challenged&quot;</application>

And of course, we have to consider the crucial question of
indentation. So allow me to be the first to point out that unless the
</function-definition> tag is lined up with the body of the function,
you will be excommunicated.


Steven M. Haflich

Jun 26, 2000, 3:00:00 AM

Erik Naggum wrote:
>
> * Tim Bradshaw <t...@cley.com>
> | Can entities also expand to syntactically/lexically-nonsensical
> | things?
>
> Yes. There are some feeble attempts to restrict the nonsense in
> SGML and some less feeble, but not particularly strong, attempts at
> same in XML.

The kind of splicing enmacrofurbulation made famous by the infamous
string-munching preprocessor of C (and PL/I in 1966) is not allowed in
XML. I don't know how strictly various parsers enforce this
requirement. Any that don't enforce it shouldn't be used, since they
encourage creative misuse of the language.

The XML specification 4.3.2 specifically says:

A consequence of well-formedness in entities is that the logical and
physical structures in an XML document are properly nested;
no start-tag, end-tag, empty-element tag, element, comment, processing
instruction, character reference, or entity reference can begin in one
entity and end in another.
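
One way to check how a particular parser treats this rule is to feed it an entity whose replacement text opens a tag that is closed outside the entity. The Python sketch below does this with the standard-library xml.etree.ElementTree, which sits on top of expat; the two sample documents and the helper name parses are made up for the test, and the result says nothing about other parsers.

```python
import xml.etree.ElementTree as ET

# Entity whose replacement text is a complete, balanced element.
BALANCED = '<!DOCTYPE doc [<!ENTITY e "<b>bold</b>">]><doc>&e;</doc>'

# Entity that opens <b> but leaves the end-tag to the referring text.
UNBALANCED = '<!DOCTYPE doc [<!ENTITY e "<b>bold">]><doc>&e;</b></doc>'


def parses(source):
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


balanced_ok = parses(BALANCED)
unbalanced_ok = parses(UNBALANCED)
print(balanced_ok, unbalanced_ok)
```

A start-tag that begins in one entity and ends in another is what the quoted sentence forbids, so a conforming parser should reject the second document.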

I agree with Erique that the way this requirement is expressed is
indirect, feeble and to me seems the result of a committee compromise
or afterthought. To read the XML specification is to realize that
nothing at all has been learned in the past 30 years of computer science.

"One learns from one's failures, not one's successes."

"Stop me before I flame again..."

Steve Haflich

Christopher Browne

Jun 27, 2000, 3:00:00 AM
Centuries ago, Nostradamus foresaw a time when Erik Naggum would say:
>* Fabrice Popineau <Fabrice....@supelec.fr>
>| This is not the problem. You stated that 'there is no consensus on

>| what an XML document means'.
>
> I'm sorry, but could you please pay attention to what I'm saying so
> I don't have to reestablish the entire context _every_ time I say
> something you apparently are not going to accept and keep bickering
> about?
>
> To be blunt: *ML documents derive meaning from sources external to
> the documents. Even if you use XSL to obtain meaning as far as
> _presentation_ is concerned, you still don't have a clue what you're
> dealing with unless you're actually the _same_ application as the
> writer of the XML document. *ML is no better than random chunks of
> binary data, but it also is no worse -- it could easily have been.

Don't Lisp programs suffer from the same problem?

(CAR WHATEVER) derives meaning from whatever external meaning you've
attached to whatever is in the sequence WHATEVER.

To be sure, DTDs are not as useful in determining semantics as one
might _want_ them to be, but they _do_ provide _some_ indication of
meaning.

Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>

Erik Naggum

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
* Christopher Browne

| Don't Lisp programs suffer from the same problem?

No. Lisp programs do not exist outside of the language definition.

| (CAR WHATEVER) derives meaning from whatever external meaning you've
| attached to whatever is in the sequence WHATEVER.

Nonsense. car has defined meaning regardless of what whatever is,
and the whole form has defined meaning regardless of which operator
is in the first position.

| To be sure, DTDs are not as useful in determining semantics as one
| might _want_ them to be, but they _do_ provide _some_ indication of
| meaning.

Like what?

Larry Elmore

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
"Christopher Browne" <cbbr...@news.hex.net> wrote in message
news:Z9c65.281780$VR.41...@news5.giganews.com...

> Centuries ago, Nostradamus foresaw a time when Erik Naggum would say:
> >* Christopher Browne
> >| Don't Lisp programs suffer from the same problem?
> >
> > No. Lisp programs do not exist outside of the language definition.
> >
> >| (CAR WHATEVER) derives meaning from whatever external meaning you've
> >| attached to whatever is in the sequence WHATEVER.
> >
> > Nonsense. car has defined meaning regardless of what whatever is,
> > and the whole form has defined meaning regardless of which operator
> > is in the first position.
>
> Sure, there's _a_ meaning.
>
> But the _intended_ meaning can vary considerably, depending on the
> context of what data I stuck into WHATEVER, and what Lisp form this
> reference is embedded into.
>
> Based on looking at a bit of code that says (car a1), I can't tell
> much about what it means.
>
> In contrast, if I look at an SGML document fragment:
>
> <sect1> <title> Introduction </title>
>
> it is reasonably likely that, even without knowing anything about the
> DTD, we can readily guess something about the intent of <sect1> and
> <title>.

Yes, the tags convey some information, because they were deliberately
created that way. That can be done with Lisp, too.

> >| To be sure, DTDs are not as useful in determining semantics as one
> >| might _want_ them to be, but they _do_ provide _some_ indication of
> >| meaning.
> >
> > Like what?
>

> Whether it's you writing the code that processes the FOS or sosofo, or
> me, we're likely to have _some_ common realization of the structure of
> the results that should come out of something like:
>
> <sect1> <title> Introduction </title> <para> ... stuff ... </para>
> </sect1>

Yes, but it's needlessly verbose. I can't see that it's any better than:

(sect1
  (title Introduction)
  (para ...stuff...))

And this is a whole lot more readable (to me, at least). It's not too far
from a possible Lisp program, even.
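
As a minimal sketch of that last point: the names *DOC*, ELEMENT-NAME and FIND-CHILD are made up for illustration, and strings stand in for the text content, so this is one possible representation rather than an established convention. Written this way, the document is ordinary list data that Common Lisp can take apart directly:

```lisp
;; Hypothetical representation: an element is a list whose CAR is the
;; element name and whose CDR holds child elements or text strings.
(defparameter *doc*
  '(sect1
    (title "Introduction")
    (para "... stuff ...")))

(defun element-name (node)
  "Return the name of NODE, e.g. SECT1."
  (car node))

(defun find-child (name node)
  "Return the first child element of NODE called NAME, or NIL."
  (find name (cdr node)
        :key (lambda (child) (and (consp child) (car child)))))

(element-name *doc*)                ; the symbol SECT1
(second (find-child 'title *doc*))  ; the string "Introduction"
```

Nothing here says what a SECT1 or a TITLE is for; it only shows that the structure is available without a separate parser.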

Larry

Christopher Browne

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Centuries ago, Nostradamus foresaw a time when Erik Naggum would say:
>* Christopher Browne
>| Don't Lisp programs suffer from the same problem?
>
> No. Lisp programs do not exist outside of the language definition.
>
>| (CAR WHATEVER) derives meaning from whatever external meaning you've
>| attached to whatever is in the sequence WHATEVER.
>
> Nonsense. car has defined meaning regardless of what whatever is,
> and the whole form has defined meaning regardless of which operator
> is in the first position.

Sure, there's _a_ meaning.

But the _intended_ meaning can vary considerably, depending on the
context of what data I stuck into WHATEVER, and what Lisp form this
reference is embedded into.

Based on looking at a bit of code that says (car a1), I can't tell
much about what it means.

In contrast, if I look at an SGML document fragment:

<sect1> <title> Introduction </title>

it is reasonably likely that, even without knowing anything about the
DTD, we can readily guess something about the intent of <sect1> and
<title>.

>| To be sure, DTDs are not as useful in determining semantics as one
>| might _want_ them to be, but they _do_ provide _some_ indication of
>| meaning.
>
> Like what?

Whether it's you writing the code that processes the FOS or sosofo, or
me, we're likely to have _some_ common realization of the structure of
the results that should come out of something like:

<sect1> <title> Introduction </title> <para> ... stuff ... </para>
</sect1>

--
cbbr...@hex.net - <http://www.hex.net/~cbbrowne/linux.html>
Roses are red
Violets are blue
Some poems rhyme
But this one doesn't.

Tim Bradshaw

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
* Christopher Browne wrote:

> Based on looking at a bit of code that says (car a1), I can't tell
> much about what it means.

> In contrast, if I look at an SGML document fragment:

> <sect1> <title> Introduction </title>

> it is reasonably likely that, even without knowing anything about the
> DTD, we can readily guess something about the intent of <sect1> and
> <title>.

Yes, but this is an entirely different thing. Saying that you can
*guess* something about the intent is entirely different from saying
that the DTD tells you what the intent is.

If you start worrying about just what exactly it means to `have an
intent' or `have a meaning' you will rapidly fall into a quagmire of
philosophy and probably be doomed to spend the rest of your life as an
embittered cognitive scientist or something. But you can stay away
from that by asking much more specific questions.

Take the string:

"(lambda () (let ((x '(1 2))) (car x)))"

Then there are several things you can do:

READ (well, READ-FROM-STRING) will accept this and return an
object. So you know that it's well-formed as a lisp form.

COMPILE (something like (compile nil ...)) will accept what
READ gave you and return another object. So you know that
it's well-formed as a lisp program.

FUNCALL will accept that object and return 1. So you know
that that lisp program actually does something.
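
The three steps can be sketched as follows, assuming the standard readtable and any package that uses COMMON-LISP; the variable names *TEXT*, *FORM* and *FN* are only for illustration:

```lisp
(defparameter *text* "(lambda () (let ((x '(1 2))) (car x)))")

;; Stage 1: READ-FROM-STRING turns the characters into a Lisp object,
;; here a list whose first element is the symbol LAMBDA.
(defparameter *form* (read-from-string *text*))

;; Stage 2: COMPILE accepts that lambda expression and returns a
;; compiled function object.
(defparameter *fn* (compile nil *form*))

;; Stage 3: FUNCALL runs the function.
(funcall *fn*)   ; => 1
```

Each stage either returns an object or signals an error, which is what makes the three levels of well-formedness checkable.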

Cognitive scientists will disagree with all this, because CL probably
isn't specified formally enough, but I don't care about them. And
language lawyers will point out that you have to be in the right
package and the readtable has to be sane, and that I've carefully
chosen the string to contain nothing that might be a macro (which
would make it possibly indeterminate whether it's a well-formed
program), but I don't care about them either.

Now the point is that SGML and XML only give you the first two stages,
at best. In fact I think they give only partial bits of them:

I'm not sure (someone will know) whether either assigns a structure
to the string rather than just saying that it's well-formed.
I presume they do.

SGML only gives you the second stage in general: without the
grammar, you can't even tell if a string is readable the way
you can in Lisp. XML, I think, aimed to give both first and
second stages, so you should be able to check an XML document
for first-stage well-formedness even without a grammar. I
don't know if it succeeds.


All this will not satisfy people who care about formal semantics and
so on. All I'm trying to get at is that it's clear that Lisp programs
do have a whole bunch more `meaning' than *ML documents, in a sense
that can be made formal.

--tim

Erik Naggum

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
* Christopher Browne

| But the _intended_ meaning can vary considerably, depending on the
| context of what data I stuck into WHATEVER, and what Lisp form this
| reference is embedded into.

Nonsense. Failure to include information about an enclosing form
does not constitute a change of semantics for the form so enclosed.
It is simply not useful to communicate with other people with a fear
that everything they say might have been enclosed in a `not' form,
and it is not useful to blame the recipient for not having taken
such into account when interpreting the meaning of what they say.

| Based on looking at a bit of code that says (car a1), I can't tell
| much about what it means.

No, obviously _you_ can't, since you have made up your mind that you
can enclose a form in any form at all to _rob_ it of meaning, a
pretty silly move, but necessary in order to argue that SGML _has_
meaning, since SGML has meaning _only_ relative to external sources
and that hypothetical-mythical enclosing form has the same status
for the Lisp forms: The Great Unknown Semantic Modifier.

Once again, an SGML fan is displaying his lack of clue. Boring!

Clint Hyde

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
Clint Hyde wrote:

> I'm sure this is a FAQ by now...I'm out of touch...
>
> I want an XML parser written in lisp...is there such a thing? available
> free? where?

Dan Barlow pointed me at two parsers. I wasn't able to get at the first URL
at the time, so I went with:


Scott ?*, who pointed me at his XML parser. it's ok--won't read/use a DTD
(i.e., it's not a validating parser), does require well-formed XML
(close-tags are required, empty tags will break it), has a couple of
quirks, at least one of which I eliminated (text-strings (like comments)
weren't allowed to have commas in them ?!, just required a quickie fix to
the custom read-table the parser uses, to remove the reader-macro on the
comma).
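
a rough sketch of that kind of readtable fix, NOT the parser's actual code: the names *XML-TEXT-READTABLE* and READ-TOKEN are invented here, and it assumes the custom read-table simply inherited the standard comma macro.

```lisp
;; Work on a copy of the standard readtable so the global one is untouched.
(defparameter *xml-text-readtable* (copy-readtable nil))

;; Give the comma the same syntax as an ordinary letter, which removes
;; its reader-macro behaviour in this readtable only.
(set-syntax-from-char #\, #\a *xml-text-readtable*)

(defun read-token (string)
  "Read one token from STRING using the modified readtable."
  (let ((*readtable* *xml-text-readtable*))
    (read-from-string string)))

(symbol-name (read-token "a,b"))   ; => "A,B"
```

with the standard readtable the same call would signal an error, since a comma is only legal inside a backquote.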

I was able to fire it up without too much trouble (well, a lot, actually,
because it is written to use things I don't/won't use, like the mkant
defsystem--portability is NOT my concern, but build efficiency is).

it may be that the other xml parser is/will-be better, apparently it will
read a DTD...


>
> Erik Naggum must have written this by now if no on else has :)
>

and to my surprise, he hadn't. he did point out that he wanted a tree of
objects coming out of one, so do I,
and that's what I have. he still might not find it satisfactory...

I used it to build a quickie CLIM app, which you can download if you're
interested:

http://phaedrus.gteinetva.bbnplane.net/trs/trs.html

all that's there is a zip with the application (Win-NT only, since that's
the PC ACL I have), and a zip with the source needed to compile this and
build it. I modified the classes in the xml-parser to support being drawn
in my clim app. that might be unsatisfactory to others; the alternative is
to build a shadow tree for use in the window, where each node points to
the corresponding node in the xml. the source should be easy enough to
build in some other clim. I could make a Solaris app if someone wanted it.

the app: uses a real tree view of the xml-structure (using clim's
format-graph-from-root). you can drag-n-drop
to re-arrange the structure, you can add/remove nodes. you can load/save,
and if you save and reload, you do
get back what you wrote out.

if you try to open a DTD, it will break. fortunately, for any errors,
thanks to Jeff Morrill, this is handled cleanly: you get a popup
menu-choose, and you get to pick standard lisp proceed-choices...i.e.,
you can recover back to where the app is waiting for you to click on
something.

I did this in about 12 hours total. very nearly all MY code in it was
cut-n-paste from other projects over the past few years. the app includes
a sample XML file...if you have another one that is
bigger/different/breaks-the-program, I'd like to know what I've
missed/left-out.

of course: GPL applies here. give credit where it is due...feel free to
take the code and modify as you like.
send me any improvements :) or bug-reports :(

--
please reply direct to <a href="mailto:ch...@bbn.com">Clint Hyde</a>
I don't have enough time to scan everything I'd like to, and don't
want to miss your answers...

-- clint

Simon Brooke

unread,
Jun 30, 2000, 3:00:00 AM6/30/00
to
Clint Hyde <ch...@bbn.com> writes:

> http://phaedrus.gteinetva.bbnplane.net/trs/trs.html

Uhhmmm... I think you mistyped 'bbnplanet' (from which I just
downloaded your source - thanks)

Morning had broken, and there was nothing left for us to do
but pick up the pieces.

Lars Marius Garshol

unread,
Jul 10, 2000, 3:00:00 AM7/10/00
to

* Clint Hyde

|
| I want an XML parser written in lisp...is there such a thing? available
| free? where?

Just for the record: you can find a list of free XML parsers written
in Common Lisp at:

<URL: http://www.garshol.priv.no/download/xmltools/plat_ix.html#plat6 >

Note that this list includes one parser not mentioned in this thread
so far: James Anderson's CL-XML. This is the most complete of the four
I have listed and even contains a DOM implementation.

--Lars M.
