I'm sure this is a FAQ by now...I'm out of touch...
I want an XML parser written in lisp...is there such a thing? available free? where?
Erik Naggum must have written this by now if no one else has :)
--
please reply direct to Clint Hyde <ch...@bbn.com>. I don't have enough time to scan everything I'd like to, and don't want to miss your answers...
If anybody knows of any others (DFSG-free), please do add links for them. I believe there is something in the Lambda Codex at Everest, but I can't remember its licensing nor get to their web site right now to verify.
* Clint Hyde <ch...@bbn.com> | Erik Naggum must have written this by now if no one else has :)
Thanks, but I have to disappoint you. I don't consider a parser to be very valuable by itself (even though they simplify some tasks), unless it can produce something close to a document structure that may be traversed with reasonable tools. There is no consensus on what an XML document means. The failure of the SGML community to realize that they need to deal with SGML documents the same way Lisp deals with source code/data also means that there will be no good agreement on any in-memory representation of SGML documents. (And DOM is an incredibly ridiculous misunderstanding of "object oriented technology".)
#:Erik -- If this is not what you expected, please alter your expectations.
Erik Naggum <e...@naggum.no> writes: > * Clint Hyde <ch...@bbn.com> > | Erik Naggum must have written this by now if no one else has :)
> Thanks, but I have to disappoint you. I don't consider a parser to > be very valuable by itself (even though they simplify some tasks), > unless it can produce something close to a document structure that > may be traversed with reasonable tools. There is no consensus on > what an XML document means.
H'mmmm.... I've always considered that XML syntax was just a prolix way of writing sexprs. I mean, there's little inherently different between, say (to deal with something I was working on today),
<question answer="Yes, formally, with reviews"
          score="45"
          shortform="Development of a structured management system">
  <text>
    Does the company have a management system in place
  </text>
  <advice>
    You have a formal management system in place, which is providing benefits to the company. Have you considered how you could introduce greater flexibility within this system or how you could integrate other approaches, e.g. environmental management, business excellence, into your system to make it more holistic.
  </advice>
</question>
and
(question ((answer . "Yes, formally, with reviews")
           (score . 45)
           (shortform . "Development of a structured management system"))
  (text "Does the company have a management system in place")
  (advice "You have a formal management system in place, which is providing benefits to the company. Have you considered how you could introduce greater flexibility within this system or how you could integrate other approaches, e.g. environmental management, business excellence, into your system to make it more holistic."))
A text node is much the same as a string; a non-text node is very much the same as an atom consed onto the front of an alist. The only problem in the representation is that XML has two distinct types of attribute-value pairs, one of which can only take simple data types as values and the other of which can take structures. You need some way of indicating the difference but the above scheme (I would have thought) would make an adequate first cut.
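A minimal sketch of that first cut, going from the s-expression form back to XML text. The function names (attlist-p, escape-xml, sexpr->xml) are made up for illustration, and the rule used to tell an attribute alist from a child element (a list made only of conses) is an assumption read off the example above, not something the scheme itself spells out.

```lisp
(defun attlist-p (x)
  "True if X looks like an attribute alist: a list made only of conses.
Assumption: a child element always starts with a symbol, so it never matches."
  (and (listp x) (every #'consp x)))

(defun escape-xml (string)
  "Return STRING with the XML markup characters replaced by entity references."
  (with-output-to-string (out)
    (loop for ch across string
          do (case ch
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (t (write-char ch out))))))

(defun sexpr->xml (node &optional (out *standard-output*))
  "Write NODE, a string or a list (name [alist] child...), to OUT as XML."
  (if (stringp node)
      (write-string (escape-xml node) out)
      (let* ((name (string-downcase (first node)))
             (has-attrs (attlist-p (second node)))
             (attrs (if has-attrs (second node) '()))
             (children (if has-attrs (cddr node) (cdr node))))
        (format out "<~A" name)
        ;; Attribute values are printed with PRINC, so 45 becomes "45".
        (loop for (key . value) in attrs
              do (format out " ~A=\"~A\""
                         (string-downcase key)
                         (escape-xml (princ-to-string value))))
        (cond ((null children)
               (write-string "/>" out))
              (t
               (write-char #\> out)
               (dolist (child children)
                 (sexpr->xml child out))
               (format out "</~A>" name)))))
  (values))
```

For instance, (sexpr->xml '(question ((score . 45)) (text "a < b"))) writes <question score="45"><text>a &lt; b</text></question> to standard output. Note that the round trip loses information: the number 45 and the string "45" come out the same.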
Simon, well aware that he is posting in exalted company.
* Simon Brooke <si...@jasmine.org.uk> | H'mmmm.... I've always considered that XML syntax was just a prolix | way of writing sexprs.
The element structure has inherent similarities to trees made up of lists and the significant differences are non-obvious.
| The only problem in the representation is that XML has two distinct | types of attribute-value pairs, one of which can only take simple | data types as values and the other of which can take structures. | You need some way of indicating the difference but the above scheme | (I would have thought) would make an adequate first cut.
I tend to represent *ML elements as if destructured with
((&rest attlist &key gi &allow-other-keys) &rest contents)
where attlist is a keyword-value plist, at least one key in which is the generic identifier, a.k.a. the element type name. (There is an important distinction between attributes and contents as far as abstraction goes, but I won't go into that.) Attribute values have a restricted set of types, but I consider this an artificial, not a significant difference.
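As a rough illustration of that lambda list in use, here is a sketch that pulls one element apart with destructuring-bind. The function name describe-element and the sample element are hypothetical; only the lambda list itself comes from the text above.

```lisp
(defun describe-element (element)
  "Split ELEMENT, shaped as ((:gi name key value ...) content ...), into parts."
  (destructuring-bind ((&rest attlist &key gi &allow-other-keys) &rest contents)
      element
    (list :gi gi
          ;; Every attribute except the generic identifier, as an alist.
          :attributes (loop for (key value) on attlist by #'cddr
                            unless (eq key :gi)
                              collect (cons key value))
          :child-count (length contents))))

;; Hypothetical sample: a question element with one attribute and two children.
(defparameter *sample-element*
  '((:gi question :score 45)
    ((:gi text) "Does the company have a management system in place")
    "trailing text"))
```

Evaluating (describe-element *sample-element*) gives (:gi question :attributes ((:score . 45)) :child-count 2). Since the attribute list is matched with &key, an attlist with an odd number of items is rejected at destructuring time rather than silently accepted.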
One significant difference is the entity structure, which is mostly used for special characters, but is really an amazingly powerful and under-understood mechanism for organizing the input sources. Lisp's syntax has nothing like it at all, and neither do other languages that could naturally represent tree structures. It is non-trivial to represent the entity structure and the element structure side by side, unless you only refer to entities in attribute values.
Another significant difference is the way identifiers are used to change the meaning of both the gi and the other attributes. We are not used to the operator changing meaning if we change an argument, but this is quite common in *ML contexts, to the point where the generic identifier may not even name the element type as far as processing is concerned. This means that the "processing key" is computed from the entire attribute list. Various other mechanisms with similar confusability exist, and they are bad enough that you cannot just gloss over them.
The result is that you cannot really represent an *ML structure without knowing how it is supposed to be processed, as if you would have to tell the Lisp reader whether you were reading for code or reading for data, rejecting perhaps the biggest advantage of Lisp's syntax. In short: They got it all wrong.
If they had had a less involved syntax, they wouldn't have needed all the arcane details and would have had fewer chances to go off the deep end. Given that you can stuff a lot of junk into that attribute list, it just had to happen that they would do something harmful to themselves. Both Perl and C++ evolved the way they did because of syntactic mistakes like that.
#:Erik -- If this is not what you expected, please alter your expectations.
* Simon Brooke wrote: > H'mmmm.... I've always considered that XML syntax was just a prolix > way of writing sexprs. I mean, there's little inherently different > between, say (to deal with something I was working on today),
But XML is more complicated and harder to parse, and this is always an advantage.
Erik Naggum <e...@naggum.no> writes: > There is no consensus on what an XML document means.
well there's always XSL (or what was that acronym again?), but in general, for some uses of XML a 'meaning' would be a meaning in the philosophical logic sense, I guess, so we will have to wait a few hundred years and hope that the fundamentals of epistemology and semantics are a little better understood.
> agreement on any in-memory representation of SGML documents. (And > DOM is an incredibly ridiculous misunderstanding of "object oriented > technology".)
The DOM specification is the most frustrating piece of documentation I've read in quite a few years. Not that I remember a word, though (I _hope_ that's 'garbage out - garbage in', and not just me being lazy ;-)). -- (espen)
Erik Naggum <e...@naggum.no> writes: > One significant difference is the entity structure, which is mostly > used for special characters, but is really an amazingly powerful and > under-understood mechanism for organizing the input sources. Lisp's > syntax has nothing like it at all, and neither do other languages > that could naturally represent tree structures. It is non-trivial > to represent the entity structure and the element structure side by > side, unless you only refer to entities in attribute values.
Is not an entity more or less equivalent to a read macro? A special notation which is expanded at read-time by applying a function out of a special namespace? There's nothing very magical about it... unless I'm missing something very badly?
* Simon Brooke <si...@jasmine.org.uk> | Is not an entity more or less equivalent to a read macro?
No. Neither more nor less. The Lisp reader returns whole Lisp objects from its reader macro functions, which is eminently doable because Lisp has syntax with a defined meaning. Entities are sources of characters that sort of "precede" lexical analysis, but there are rules for where the end of an entity may occur, so the Entity end "signal" is a special input event. Case in point: When you give the string "foo&dash;bar" to the parser, and suppose you have defined dash to mean the string "--", the parser will actually see "foo&dash;--|bar", where | has the role of the Entity end. Both the start and end of an entity are at the same level as all other syntax in SGML, but the parsed result may or may not need to know this depending on whether you intend to reconstruct the entity structure (as in edit them) or process the element structure.
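To make the contrast with a reader macro concrete, here is a deliberately simplified sketch of entity expansion as a stream of input events. It is not SGML's actual algorithm: the function name entity-events, the event shapes (:entity-start name) and :entity-end, and the alist of entity definitions are all assumptions made for illustration.

```lisp
(defun entity-events (input entities)
  "Expand entity references of the form &name; in the string INPUT.
ENTITIES is an alist of (name . replacement-string).
Returns a list of events: strings of literal text, (:entity-start name),
and the marker :entity-end where the replacement text runs out.
Caution: a self-referential entity makes this sketch recurse forever."
  (let ((events '())
        (pos 0))
    (loop
      (let ((amp (position #\& input :start pos)))
        (when (null amp)
          ;; No more references: emit whatever literal text remains.
          (when (< pos (length input))
            (push (subseq input pos) events))
          (return (nreverse events)))
        (let* ((semi (position #\; input :start amp))
               (name (and semi (subseq input (1+ amp) semi)))
               (entry (and name (assoc name entities :test #'string=))))
          (unless entry
            (error "Undefined or unterminated entity reference at position ~D"
                   amp))
          (when (< pos amp)
            (push (subseq input pos amp) events))
          (push (list :entity-start name) events)
          ;; The replacement text is itself scanned for further references.
          (dolist (event (entity-events (cdr entry) entities))
            (push event events))
          (push :entity-end events)
          (setf pos (1+ semi)))))))
```

With dash defined as "--", (entity-events "foo&dash;bar" '(("dash" . "--"))) returns ("foo" (:entity-start "dash") "--" :entity-end "bar"). A consumer that only wants the element structure can drop the two markers; one that wants to reconstruct the entity structure has to keep them, which is the bookkeeping a reader macro never needs.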
| There's nothing very magical about it... unless I'm | missing something very badly?
I think I have made a case for "magical", if not "very magical".
#:Erik -- If this is not what you expected, please alter your expectations.
* Erik Naggum wrote: > I think I have made a case for "magical", if not "very magical".
Can entities also expand to syntactically/lexically-nonsensical things? I remember (vaguely, thank God), seeing entities in DTDs used for things like this, in a similar awful way that people use C preprocessor macros to expand to random chunks of text. But I know entities in DTDs are not the same as entities in documents, and it was SGML not XML, and in any case I may be misremembering.
Tim Bradshaw <t...@cley.com> writes: > Can entities also expand to syntactically/lexically-nonsensical > things? I remember (vaguely, thank God), seeing entities in DTDs used > for things like this, in a similar awful way that people use C > preprocessor macros to expand to random chunks of text. But I know > entities in DTDs are not the same as entities in documents, and it was > SGML not XML, and in any case I may be misremembering.
In XML, entities have to expand to something well-formed. You can't have a start tag without an end tag. This is explained in 4.3.2 of the standard, although it isn't straightforward to understand unless you already understand it.
I know that several of the people on the XML committees have a thorough and exhaustive grasp of the semantic and syntactic issues in designing such things. But these things are committees, and sensible committee members don't necessarily produce ...
> If anybody knows of any others (DFSG-free), please do add links for > them. I believe there is something in the Lambda Codex at Everest, > but I can't remember its licensing nor get to their web site right now > to verify.
> -dan
We (everest) have an FFI layer for James Clark's expat parser at sourceforge.net. The FFI bindings are for ACL. Here's the full URL:
Daniel Barlow <d...@telent.net> writes: > Clint Hyde <ch...@bbn.com> writes: > > I want an XML parser written in lisp...is there such a thing? available > > free? where?
Uhhh... the page is there, but as of this morning the links to the TAR, ZIP and .DEB archives are all broken, malheureusement. There is also a public CVS server advertised, but it too doesn't work:
[simon@gododdin uncommon]$ cvs login
(Logging in to anon...@alpha.onshore.com)
CVS password:
[simon@gododdin uncommon]$ cvs co uncommonxml
cvs server: cannot find module `uncommonxml' - ignored
cvs [checkout aborted]: cannot expand modules
> may be traversed with reasonable tools. There is no consensus on > what an XML document means.
And what about DOM ???
--
Fabrice POPINEAU
e-mail: Fabrice.Popin...@supelec.fr
voice-mail: +33 (0) 387764715
surface-mail: Supelec, 2 rue E. Belin, F-57078 Metz Cedex 3
The difference between theory and practice, is that theoretically, there is no difference !
> <function-definition>foo<arglist></arglist>
>   <application>display "I am paren-challenged"</application>
> </function-definition>
You miss the elegant, natural terseness of expressing it this way:
<function-definition>foo<arglist/>
  <application>display "I am paren-challenged"</application>
</function-definition>
When I get caught up on my other work I intend to write an XML-syntax CL readtable, and then write a CL evaluator in XSL, and then all my Lisp code will be write once, run anywhere.
>> <function-definition>foo<arglist></arglist>
>>   <application>display "I am paren-challenged"</application>
>> </function-definition>
>You miss the elegant, natural terseness of expressing it this way:
> <function-definition>foo<arglist/>
>   <application>display "I am paren-challenged"</application>
> </function-definition>
Don't you mean something more like:
(foo :application "display \"I am paren-challenged\"")
>When I get caught up on my other work I intend to write an XML-syntax CL readtable, and then write a CL evaluator in XSL, and then all my Lisp code will be write once, run anywhere.
* Fabrice Popineau <Fabrice.Popin...@supelec.fr> | And what about DOM ???
Yes? What about DOM? Giving something an alternate representation and _nothing_ else does not constitute giving it meaning. Besides, I wrote what I think about DOM in <3170682110777...@naggum.no>:
(And DOM is an incredibly ridiculous misunderstanding of "object oriented technology".)
#:Erik -- If this is not what you expected, please alter your expectations.
Centuries ago, Nostradamus foresaw a time when Erik Naggum would say:
>* Fabrice Popineau <Fabrice.Popin...@supelec.fr> >| And what about DOM ???
> Yes? What about DOM? Giving something an alternate representation > and _nothing_ else does not constitute giving it meaning. Besides, > I wrote what I think about DOM in <3170682110777...@naggum.no>:
> (And DOM is an incredibly ridiculous misunderstanding of "object > oriented technology".)
But "Document Object Model" contains the word "Object," so it _MUST_ be object oriented. Right?
--
cbbro...@ntlug.org - <http://www.ntlug.org/~cbbrowne/lsf.html>
"How should I know if it works? That's what beta testers are for. I only coded it." (Attributed to Linus Torvalds, somewhere in a posting)
* Christopher Browne | But "Document Object Model" contains the word "Object," so it _MUST_ | be object oriented. Right?
The people behind DOM are much less stupid than this implies, so there's a possibility you're attempting to use this stupid snide remark towards me, instead. But regardless, couldn't you instead try to be somewhat constructive in your comments? Bogus as it is, DOM doesn't deserve outright _disrespect_, lest we thus hinder any better ideas along the same axis grow, too.
#:Erik -- If this is not what you expected, please alter your expectations.
> * Christopher Browne > | But "Document Object Model" contains the word "Object," so it _MUST_ > | be object oriented. Right?
> The people behind DOM are much less stupid than this implies, so > there's a possibility you're attempting to use this stupid snide > remark towards me, instead. But regardless, couldn't you instead > try to be somewhat constructive in your comments? Bogus as it is, > DOM doesn't deserve outright _disrespect_, lest we thus hinder any > better ideas along the same axis grow, too.
The intelligence of the people behind DOM is not a relevant issue, and the question whether the DOM is or is not OO is also to me not the most important one. There were, at least, cogent reasons for the peculiar OO design even if they turn out not to have been worthwhile.
However, I am very much bothered by a hidden performance issue in the language design, specifically, the fact that a NodeList returned by getElementsByTagName is "live" and dynamically reflects any changes made to the document tree from which it was made.
This seems a neat feature for the programmer until you think _very_ _carefully_ about using it. How is it implemented? The DOM specifies specifically that the method of implementation is not specified. This leaves the thoughtful user up in the air: What are the performance characteristics? A NodeList references its contained nodes by numeric index 0..(length-1) and this length changes dynamically as elements are added and removed by operations elsewhere upon the document. How is this implemented with performance predictable to the user? I can think of lots of implementation tricks (delayed updating, caching, weird hashing schemes) that would maintain efficient operation as Element nodes are added and deleted from the tree, but the problem is that these techniques are not obvious to the _user_ and eventually they all break down under some conceivable pattern of document manipulation. Unpredictable performance knees are to me unacceptable in a serious programming language.
Lisp has lists, vectors, and hashtables to accommodate different kinds of collection usage. Most of the performance issues are clear to any programmer beyond the complete beginner. But as both a potential user and a potential implementor, the appropriate performance of a NodeList remains opaque, and that means portable programming and portable programmers are impossible for the language.
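To show where such a performance knee can come from, here is a sketch of the most naive way a "live" list could be built: re-walk the tree on every access. This is one hypothetical implementation strategy, not how any particular DOM library works; the node and live-node-list structures and all function names are invented for the example.

```lisp
(defstruct node
  name
  (children '()))

(defun collect-by-name (root name)
  "Return a fresh list of the descendants of ROOT named NAME, in document order."
  (let ((found '()))
    (labels ((walk (node)
               (dolist (child (node-children node))
                 (when (string= (node-name child) name)
                   (push child found))
                 (walk child))))
      (walk root))
    (nreverse found)))

;; A "live" list stores only the query; WALKS counts full tree traversals.
(defstruct live-node-list
  root
  name
  (walks 0))

(defun live-length (nodes)
  "Current number of matches. Costs one full walk of the tree."
  (incf (live-node-list-walks nodes))
  (length (collect-by-name (live-node-list-root nodes)
                           (live-node-list-name nodes))))

(defun live-item (nodes index)
  "Match number INDEX right now, or NIL. Also costs one full walk."
  (incf (live-node-list-walks nodes))
  (nth index (collect-by-name (live-node-list-root nodes)
                              (live-node-list-name nodes))))
```

Under this strategy an innocent-looking loop such as (dotimes (i (live-length nodes)) (live-item nodes i)) walks the whole document once per element, so it is quadratic in document size. A caching implementation would avoid that until the tree is mutated inside the loop, which is exactly the kind of behaviour the user cannot predict from the specification.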
> Yes? What about DOM? Giving something an alternate > representation and _nothing_ else does not constitute giving it > meaning. Besides, I wrote what I think about DOM in > <3170682110777...@naggum.no>: > (And DOM is an incredibly ridiculous misunderstanding of "object > oriented technology".)
This is not the problem. You stated that 'there is no consensus on what an XML document means'.
The DOM is a recommendation of the W3C, so it is a consensus, even if you do not like it. From the 'parser problem' point of view, it is the recommended way to access the document and any parser should ideally follow it.
From a practical point of view, I have found several DOM modules for Perl, C/C++ that quickly allowed me to hack XML documents but I have not been able to find the same thing for Lisp (any hint there ?). And even if DOM does not follow an ideally good design, it is already useful. If you have better proposals, just submit them to the W3C.