I've been doing my best to ignore XML thus far, but repeatedly encountering comparisons of XML to lisp has piqued my interest. I am wondering whether I can advance my understanding of lisp by learning about its relation to XML.
Can you recommend any books or URLs which could help me to learn about XML, with the aim of being able to discuss intelligently the relative merits of the two?
Jacek Generowicz <j...@ecs.soton.ac.uk> writes: > I've been doing my best to ignore XML thus far, but repeatedly > encountering comparisons of XML to lisp has piqued my interest. I am > wondering whether I can advance my understanding of lisp by learning > about its relation to XML.
Roughly said (and with my agent provocateur hat on :) ) XML is a re-invention of the wheel. The wheel is Lisp.
Cheers
-- Marco Antoniotti ======================================================== NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488 719 Broadway 12th Floor fax +1 - 212 - 995 4122 New York, NY 10003, USA http://bioinformatics.cat.nyu.edu "Hello New York! We'll do what we can!" Bill Murray in `Ghostbusters'.
> > I've been doing my best to ignore XML thus far, but repeatedly > > encountering comparisons of XML to lisp has piqued my interest. I am > > wondering whether I can advance my understanding of lisp by learning > > about its relation to XML.
> Roughly said (and with my agent provocateur hat on :) ) XML is a > re-invention of the wheel. The wheel is Lisp.
Similarities:
-o- both are regarded by some as the best thing since sliced bread
-o- both go in heavily for balanced delimiters
-o- both are regarded as overly-bracketful by many people
Differences:
-o- One is a text markup language with little or no semantics
-o- One is a programming language with little or no syntax
In article <g07kvvjf1z....@scumbag.ecs.soton.ac.uk>, Jacek Generowicz wrote: >I've been doing my best to ignore XML thus far, but repeatedly >encountering comparisons of XML to lisp has piqued my interest.
Comparisons between XML and Lisp as a whole are meaningless. Only comparisons between XML and Lisp as a data representation are meaningful. XML is not a programming language, it is merely a syntax for data representation which squanders bandwidth, memory and processing time. Lisp as a data representation is more frugal. It's close to being as compact as you can make a notation for structured data while remaining in readable plain text.
Ian Wild <i...@cfmu.eurocontrol.int> writes: > Differences:
> -o- One is a text markup language with little or no semantics
> -o- One is a programming language with little or no syntax
((:reply :title "Lisp is not just a programming language")
 (:body
  (:p "It is also a text-markup language,
and many other things, as you can see here"
      "For instance with a suitable (small) macro, this is quite legal
Lisp syntax, which is compiled to *ML. I have written significantly-sized
documents in this notation."))
 (:signature "--tim"))
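To make the idea of compiling such a notation to *ML concrete, here is a minimal sketch in Python rather than Lisp. It is not the macro referred to above; it only assumes the shape visible in the example: a node is a sequence whose head is either a tag name or a (tag, attribute-name, attribute-value, ...) sequence, followed by children that are strings or further nodes. The function name `render` and the use of tuples are assumptions made for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def render(node):
    """Render a (head, *children) tuple as markup; strings are character data."""
    if isinstance(node, str):
        return escape(node)
    head, *children = node
    if isinstance(head, str):
        # Bare tag name, no attributes.
        tag, plist = head, ()
    else:
        # (tag, name1, value1, name2, value2, ...)
        tag, *plist = head
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in zip(plist[0::2], plist[1::2])
    )
    body = "".join(render(child) for child in children)
    return f"<{tag}{attrs}>{body}</{tag}>"


doc = (
    ("reply", "title", "Lisp is not just a programming language"),
    ("body", ("p", "It is also a text-markup language.")),
    ("signature", "--tim"),
)
print(render(doc))
```

The head-is-a-sequence test plays the role that checking for a keyword in the car would play in Lisp.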
Ian Wild <i...@cfmu.eurocontrol.int> writes: > Marco Antoniotti wrote:
> > Jacek Generowicz <j...@ecs.soton.ac.uk> writes:
> > > I've been doing my best to ignore XML thus far, but repeatedly > > > encountering comparisons of XML to lisp has piqued my interest. I am > > > wondering whether I can advance my understanding of lisp by learning > > > about its relation to XML.
> > Roughly said (and with my agent provocateur hat on :) ) XML is a > > re-invention of the wheel. The wheel is Lisp.
> Differences:
> -o- One is a text markup language with little or no semantics
> -o- One is a programming language with little or no syntax
I like this one! :)
Cheers
-- Marco Antoniotti ======================================================== NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488 719 Broadway 12th Floor fax +1 - 212 - 995 4122 New York, NY 10003, USA http://bioinformatics.cat.nyu.edu "Hello New York! We'll do what we can!" Bill Murray in `Ghostbusters'.
Jacek Generowicz <j...@ecs.soton.ac.uk> writes: > I've been doing my best to ignore XML thus far, but repeatedly > encountering comparisons of XML to lisp has piqued my interest. I am > wondering whether I can advance my understanding of lisp by learning > about its relation to XML.
> Can you recommend any books or URLs which could help me to learn
> about XML, with the aim of being able to discuss intelligently the
> relative merits of the two.
> ((:reply :title "Lisp is not just a programming language")
>  (:body
>   (:p "It is also a text-markup language,
> and many other things, as you can see here"
>       "For instance with a suitable (small) macro, this is quite legal
> Lisp syntax, which is compiled to *ML. I have written significantly-sized
> documents in this notation."))
>  (:signature "--tim"))
As long as we think aloud in alternative syntaxes, I actually prefer to break the _incredibly_ stupid syntactic-only separation of elements and attribute values. SGML and its descendants have made a crucial mistake: For every level of container (there are about 7 of them), there is a new syntax for _two_ properties of the container: (1) the contents is wrapped in one syntax, but (2) the "writing on the box" is in quite another. This means that information and meta-information are massively different concepts, and this artificial separation runs through the whole SGML design. Each level offers a new way to write the two differently. This is what makes it so goddamn hard to reason about SGML documents and to do reasonably intelligent transformations on them without working your butt off specifying all sorts of irrelevant stuff that does _nothing_ but get in your way.
I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools use and produce, because it makes XML just as evil in Lisp as it was in XML to begin with, and we have gained absolutely nothing in either power of processing or in abstraction, which is so very un-Lisp-like.
<foo bar="zot">quux</foo>
should be read as
(foo (bar "zot") "quux")
and most definitely _NOT_ as ((:foo :bar "zot") "quux"), which turns this fairly reasonable structure into a morass of complexity worse than it was to begin with. And it does _NOT_ help to represent empty elements only with a keyword. Using three different levels of nesting to represent a single concept is Just Plain Wrong. Also, using keywords is not a good idea because there needs to be a lot of related information associated with elements and attributes, in different contexts, not to mention all the things they do with their funny "namespaces" these days.
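The reading proposed above can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under stated assumptions: nested Python lists stand in for Lisp lists, the function name `read_uniform` is made up, and surrounding whitespace in character data is stripped for brevity.

```python
import xml.etree.ElementTree as ET


def read_uniform(element):
    """Return [name, item, ...] where attributes become ordinary [name, value] items."""
    node = [element.tag]
    for name, value in element.attrib.items():
        node.append([name, value])
    if element.text and element.text.strip():
        node.append(element.text.strip())
    for child in element:
        node.append(read_uniform(child))
        # Character data following a child belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node


as_attribute = read_uniform(ET.fromstring('<foo bar="zot">quux</foo>'))
as_subelement = read_uniform(ET.fromstring("<foo><bar>zot</bar>quux</foo>"))
print(as_attribute)   # ['foo', ['bar', 'zot'], 'quux']
print(as_subelement)  # ['foo', ['bar', 'zot'], 'quux']
```

Both spellings of the document come out as the same structure, which is the point being argued: after reading, nothing records whether bar was an attribute or a sub-element.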
Whether something is an attribute or element is _completely_ arbitrary. It is based on some arbitrary choices in the design process that reveal absolutely no inherent qualities. For purely pragmatic reasons, SGML folks will use attributes for some things and elements for others because their tools can deal with some things in attributes and some things in elements. The faulty idea that attributes say something "about" the element and sub-elements somehow constitute its contents is the same premature structuring that premature optimization of code suffers from. The whole language is incredibly misdesigned in making that distinction.
As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of the annoying verbosity of these stupid languages while _retaining_ that mistake between attribute values and elements, because it is quite hard to make simple regular expression-based conversions retain enough data about an element to decide what should be attribute and element. An element has the form <name [attributes] | [contents]>. Attributes have the form <name | value>. Internal whitespace is only for readability.
So I have almost none of the annoying and arbitrary quote/escape mania in attribute values or contents alike, either. Entities I write as [name], and they end up in the Lisp version as symbols if not the character they represent purely for syntactic reasons. Writing "code" in this language is actually amazingly painless compared to the produced noise. Besides, with a few simple modify-syntax-entry calls in Emacs, I get < and > to match and blink and I can move up and down the structure very easily.
For processing this stuff in Common Lisp, it is _sometimes_ neat to convert the single | attribute/content marker into the zero-length symbol, ||, so pathological cases like
<foo bar="zot"><bar>"zot"</bar></foo>
which could have been written like this to show how arbitrary the syntactic distinction in SGML/XML is
<foo <bar|zot>|<bar|zot>>
come out as
(foo (bar "zot") || (bar "zot"))
The really interesting thing is that writing in Enamel and producing XML is so easy that a simple Perl or Lisp function that takes an Enamel string as argument and produces XML is quite simple and straightforward. This makes for some interesting-looking "scripting" that blows the mind of the miserable little wrecks that think they have to type the endtag, the quotes and all the other user-inimical features of SGML/XML.
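As a rough illustration of how small such a converter can be, here is a sketch in Python (not Perl or Lisp) built only from the rules stated in this post: an element is <name [attributes] | [contents]>, and an attribute is itself <name | value>. Everything else is an assumption of the sketch: the function names, the treatment of the bar as mandatory, keeping contents text unstripped, ignoring [name] entities and {expression} interpolation, and emitting the marker only when attributes precede it, as in the (foo (bar "zot") || (bar "zot")) example.

```python
from xml.sax.saxutils import escape, quoteattr

BAR = object()  # sentinel standing in for the zero-length Lisp symbol ||
WHITESPACE = " \t\n"


def parse_enamel(text):
    """Parse one element into a nested list: [name, item, ...]."""
    node, _ = _element(text, 0)
    return node


def _element(text, pos):
    # Truncated input surfaces as IndexError in this sketch.
    if text[pos] != "<":
        raise ValueError(f"expected '<' at offset {pos}")
    pos += 1
    start = pos
    while text[pos] not in WHITESPACE + "<|>":
        pos += 1
    node = [text[start:pos]]
    has_attributes = False
    while True:  # zero or more <name|value> attributes before the bar
        while text[pos] in WHITESPACE:
            pos += 1
        if text[pos] != "<":
            break
        attribute, pos = _element(text, pos)
        node.append(attribute)
        has_attributes = True
    if text[pos] != "|":
        raise ValueError(f"expected '|' at offset {pos}")
    pos += 1
    if has_attributes:
        node.append(BAR)
    while text[pos] != ">":  # contents: text runs and nested elements
        if text[pos] == "<":
            child, pos = _element(text, pos)
            node.append(child)
        else:
            start = pos
            while text[pos] not in "<>":
                pos += 1
            node.append(text[start:pos])
    return node, pos + 1


def to_xml(node):
    """Emit XML; items before the marker become attributes, the rest contents."""
    if isinstance(node, str):
        return escape(node)
    name, *items = node
    if BAR in items:
        split = items.index(BAR)
        attributes, contents = items[:split], items[split + 1:]
    else:
        attributes, contents = [], items
    # Each attribute must be a flat [name, value] pair to fit XML.
    attrs = "".join(f" {key}={quoteattr(value)}" for key, value in attributes)
    body = "".join(to_xml(item) for item in contents)
    return f"<{name}{attrs}>{body}</{name}>"


tree = parse_enamel("<foo <bar|zot>|<bar|zot>>")
print(to_xml(tree))
```

For the pathological example this prints `<foo bar="zot"><bar>zot</bar></foo>`. The end tag and the attribute quoting are produced by the converter, not typed by the author.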
In my personal view, Lisp "markup" has the disadvantage of needing lots of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx is always symbolic and yyy is always a string of characters subject to interpretation by whatever the symbolic part instructs in context.
Since the key feature of markup languages is the separation of text from markup, the simple idea in Enamel should carry enough force to make this a fully realizable goal without making an artificial syntactic separation between information and meta-information at any level. If the syntax is good enough for the information, it should be good enough for the meta-information, and I think Enamel is. Fortunately, I do not have to create a whole new international following and engage in godawful politics to use a better syntax for XML and the like, since XML and the like are only used as interchange syntaxes these days. Nobody in their right mind actually writes anything by hand in such stupid languages that require so much attention to incredibly insignificant details and incomprehensibly irrelevant redundancy, anyway, do they? :)
Finally, note that in Enamel, a complete element is enclosed in <...> and that means it can be subject to a nice little Common Lisp reader macro, and it can be taught to recognize other stuff, as well, such as the neat concept of interpolating expression values where {expression} occurs.
Still at "internal use" stage, I plan to publish some stuff about Enamel not too far into the future.
Erik Naggum <e...@naggum.net> writes: > <foo bar="zot">quux</foo>
> should be read as
> (foo (bar "zot") "quux")
> and most definitely _NOT_ as ((:foo :bar "zot") "quux"), which turns this > fairly reasonable structure into a morass of complexity worse than it was > to begin with. And it does _NOT_ help to represent empty elements only > with a keyword. Using three different levels of nesting to represent a > single concept is Just Plain Wrong. Also, using keywords is not a good > idea because there needs to be a lot of related information associated > with elements and attributes, in different contexts, not to mention all > the things they do with their funny "namespaces" these days.
I don't think I disagree with any of this - my lhtml hack was never meant as more than that - it originated out of dissatisfaction with the WITH-x syntax that CL-HTTP uses which is really painful to type, and you also need to define millions of macros, and it's only meant to be better than that. I consciously ignored the whole namespace stuff, because I was really only interested in spitting out something a browser could render efficiently, and embedding it in lisp programs in such a way that I can skip easily between lhtml and lisp (the macro just checks if the car is a keyword basically...). So really I just want to say that I'm not proposing the syntax I gave as anything other than a quick hack.
I'm curious about your syntax though: If I want to go from Lisp to something (rather than from something to Lisp), it seems that the syntax you give is ambiguous because of this (I cut the lines that don't seem relevant).
What I considered for my own hack was to avoid the whole ((...) ...) thing by always requiring an attribute list, which could be nil, so these would come out as
(foo (bar "zot")) and (foo () (bar () "zot")) respectively. But for most cases that was more typing than I liked (since I was typing in Lisp not a better syntax).
> In my personal view, Lisp "markup" has the disadvantage of needing lots > of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx > is always symbolic and yyy is always a string of characters subject to > interpretation by whatever the symbolic part instructs in context.
Yes, this is a really good point, the quoting gets tedious.
> * Tim Bradshaw <t...@tfeb.org>
> > ((:reply :title "Lisp is not just a programming language")
> >  (:body
> >   (:p "It is also a text-markup language,
> > and many other things, as you can see here"
> >       "For instance with a suitable (small) macro, this is quite legal
> > Lisp syntax, which is compiled to *ML. I have written significantly-sized
> > documents in this notation."))
> >  (:signature "--tim"))
> As long as we think aloud in alternative syntaxes, I actually prefer to > break the _incredibly_ stupid syntactic-only separation of elements and > attribute values. SGML and its descendants have made a crucial mistake: > For every level of container (there are about 7 of them), there is a new > syntax for _two_ properties of the container: (1) the contents is wrapped > in one syntax, but (2) the "writing on the box" is in quite another.
Certainly what you say is undeniably true in terms of practice, and I'd even give you that the notational distinction is not worth the mechanism, but is there somewhere that the language actually forces this "role" relationship?
I wrote a package in Java at a prior employer which automatically generated XML representations for classes as elements based on Java metadata, and the tack I took was not that the XML attributes contain meta-data and the contents data but rather that the XML attributes contain atomic data and the contents contain compound data, since this is IN FACT what the real distinction is. Some type Foo with
In effect, what I got out of this was a description that allowed two syntaxes: an easy syntax for easy things, and a hard syntax for hard things. Of course, there are all kinds of problems even then because of subclass relationships (the analog of the problem of *print-readably* and strings of base-char vs char, or the loss of fill-pointer, etc. in printing a string. Fixing these gets very verbose very quickly. So I'm not defending the notation in that regard.)
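The atomic-versus-compound rule described above can be sketched as follows. This is not the Java package in question; it is a Python approximation using dataclasses as the source of metadata, and the names `to_element`, `Customer` and `Address` are hypothetical. Only nested dataclasses are handled as compound values here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

ATOMIC_TYPES = (str, int, float, bool)


def to_element(obj):
    """Atomic fields become XML attributes; compound fields become child elements."""
    element = ET.Element(type(obj).__name__)
    for field in fields(obj):
        value = getattr(obj, field.name)
        if isinstance(value, ATOMIC_TYPES):
            element.set(field.name, str(value))
        else:
            # Wrap the nested object in an element named after the field.
            slot = ET.SubElement(element, field.name)
            slot.append(to_element(value))
    return element


@dataclass
class Address:  # hypothetical example type
    city: str
    zip_code: str


@dataclass
class Customer:  # hypothetical example type
    name: str
    age: int
    address: Address


root = to_element(Customer("joe", 42, Address("Oslo", "0150")))
print(ET.tostring(root, encoding="unicode"))
```

The easy cases get the short attribute syntax and anything structured falls back to elements, which is the two-tier arrangement described in the paragraph above.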
But what I'm really wondering is whether SGML has some "intended use" spec that tells you that you have to put meta-info in the "car" of the "form", and info in the "cdr". I thought the use of these containers was semantics-free.
> I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools > use and produce, because it makes XML just as evil in Lisp as it was in > XML to begin with, and we have gained absolutely nothing in either power > of processing or in abstraction, which is so very un-Lisp-like.
> <foo bar="zot">quux</foo>
> should be read as
> (foo (bar "zot") "quux")
Maybe. Macsyma used a similar notation for years (though without the restriction on container-ness). I don't think the answer is to change to do the rewrite you suggest. I don't understand why it's not natural to add the following as legal syntaxes:
<foo bar=<zot/>>
or
<foo bar=<string>zot</string>>quux</foo>
This would keep people from feeling the attribute list was a shorthand area and would also allow the storing of complex meta-data. Right now, the fact that a use of <...> in the attribute position is not allowed seems a terrible waste. The only rationales I can figure for this were either the desire to periodically beat someone on the back of the hand for syntax errors by having a regular application of over-applied syntax or else some sort of efficiency bum to make the acquisition of strings in the attribute list uselessly faster. Do you know what the reason was that recursive structures were not allowed in this position in XML?
Or perhaps it was the fact that the "real world" substitutes for "parsed structure" things like that weird assembly code like notation which looks like
(A
AHREF=foo.html
-Text
)A
Perhaps someone was just being uncreative about how a compound-structure could be offered as an attribute.
> As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of > the annoying verbosity of these stupid languages while _retaining_ that > mistake between attribute values and elements, because it is quite hard > to make simple regular expression-based conversions retain enough data > about an element to decide what should be attribute and element. An > element has the form <name [attributes] | [contents]>. Attribute have > the form <name | value>. Internal whitespace is only for readability.
> In my personal view, Lisp "markup" has the disadvantage of needing lots > of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx > is always symbolic and yyy is always a string of characters subject to > interpretation by whatever the symbolic part instructs in context.
I'd like to see a side-by-side elaboration of this problem to better understand it.
> Still at "internal use" stage, I plan to publish some stuff about Enamel > not too far into the future.
Good. I'd hate for it to be "lost" as merely a post here, though I think it's fun that you felt comfortable in sharing your thoughts.
> I'm curious about your syntax though: If I want to go from Lisp to
> something (rather than from something to Lisp), it seems that the syntax
> you give is ambiguous because of this (I cut the lines that don't seem
> relevant).
The key to this is the relationship between foo and bar. Whether bar is an attribute or a sub-element of foo is irrelevant to processing them, but when you need to turn this back into SGML/XML/Enamel, you need to know which it is. This is why I said:
As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of the annoying verbosity of these stupid languages while _retaining_ that mistake between attribute values and elements, because it is quite hard to make simple regular expression-based conversions retain enough data about an element to decide what should be attribute and element.
... implying that I would normally have such information and use it when generating attribute/value or sub-element/contents.
> What I considered for my own hack was to avoid the whole ((...) ...) > thing by always requiring an attribute list, which could be nil, so > these would come out as
> (foo (bar "zot")) and (foo () (bar () "zot")) respectively. But for > most cases that was more typing than I liked (since I was typing in > Lisp not a better syntax).
Wouldn't attribute lists need to have a more `let' like syntax (and behavior).
(foo ((bar "zot")) "text")
or for some HTML:
(font ((size 10) (color :yellow)) "text")
Which is just a lisp program, after applying even my minimal skills with macros. The hard part is not overwhelming the text.
(fragment ((layout :html-like) (feeling :pompous))
" SGML and TeX, being just markup, did their best to preserve the bulk of
text without any transformation. Their goal is to take a normal text
document and " (tquote "mark it up") " for computer interpretation. SGML
markers are ugly but they weren't intended to dominate the file. "
(p)
" As people's interest has moved from SGML to XML, they now talk more of "
(italic "structured data") ", and although this is a somewhat subtle change
of mindset, it makes the markup the dominant part of the file.
Unfortunately, once people start down a course of action they rarely stop
to consider if the original design guidelines and intent may have been
lost. "
(p)
" Just by " (emph "standardizing") " a straightforward mapping of XML into
and back from lisp, the ugliness and verbosity of XML would be less of an
issue. You could use the syntax you liked. I suspect when the
enthusiasm for XML has died down a bit, the benefits of a standardized
lisp notation could become better recognized. "
(p)
" Without such standards, of course, forget it. ")
You do need to step around the native lisp functions like quote.
> Wouldn't attribute lists need to have a more `let' like syntax (and behavior).
No. Please forget the attributes. There _are_ no attributes. Whether something is an attribute or not is completely arbitrary and irrelevant. Your access to that information is _not_ dependent on its representation in SGML/XML. Treat everything as a subordinate element. This is the key idea to gaining power of abstraction over the XML data. Holding on to the mythical distinction between attribute and sub-element is the key idea to losing any and all power of abstraction.
> Just by " (emph "standardizing") " a straightforward mapping of XML into
> and back from lisp, the ugliness and verbosity of XML would be less of an
> issue. You could use the syntax you liked. I suspect when the
> enthusiasm for XML has died down a bit, the benefits of a standardized
> lisp notation could become better recognized.
Please understand that that is what I was trying to do. The only way to deal with the mistake that they made in syntactically separating attributes from contents is to undo that mistake. Any and all catering to it is only making it worse.
> You do need to step around the native lisp functions like quote.
> Certainly what you say is undeniably true in terms of practice, and I'd even > give you that the notational distinction is not worth the mechanism, but > is there somewhere that the language actually forces this "role" relationship?
No, there is nothing that requires there to be element attributes as a distinct concept from element contents. There are, however, a number of practical things that follow from making that arbitrary distinction which can look like rationales, but if you ask yourself "why can it not be a subelement", there are no real answers, only appeals to the idea that there somehow _has_ to be a distinction.
It took me years to figure out that the whole attribute idea is completely vacuous, and I worked with the creator of SGML himself for several years on several SGML-related standards and projects. I started writing "A conceptual introduction to SGML" back in 1994, but as I had pained my way through five chapters, I had to realize that it was all wrong.
There was a basic design mistake in the whole language framework. That mistake is, simply put: "what is good enough for the users of the language is not good enough for its creators". Each and every level of "containership" in SGML has its own syntax, optimized for the task. Each and every level has a different syntax for "the writing on the box" as opposed to "the contents of the box". This follows from a very simple, yet amazingly elusive principle in its design: Meta-data is conceptually incompatible with data. This is in fact wrong. Meta-data is only data viewed from a different angle, and vice versa. SGML forces you to remain loyal to your chosen angle of view.
> I wrote a package in Java at a prior employer which automatically > generated XML representations for classes as elements based on Java > metadata, and the tack I took was not that the XML attributes contain > meta-data and the contents data but rather that the XML attributes > contain atomic data and the contents contain compound data, since this is > IN FACT what the real distinction is.
The key to understanding this is that there is no _one_ real distinction. There are in fact any number of "real distinctions". You just found one way to wrap your world in the attribute/contents dichotomy because it was there. What would you do if it was not? What would you do if you had only sub-elements? Would you have _invented_ attributes? I do not think anyone would have, because using sub-elements exacts no higher cost than using attributes.
> In effect, what I got out of this was a description that allowed two > syntaxes: an easy syntax for easy things, and a hard syntax for hard > things.
I propose an easier syntax for the harder things and a slightly harder syntax for the easier things so they do not impose any easy-vs-hard misconceptions on the user and designer. By making both things cost the same, the decision to use an attribute or a sub-element becomes a very different choice.
> But what I'm really wondering is whether SGML has some "intended use" > spec that tells you that you have to put meta-info in the "car" of the > "form", and info in the "cdr". I thought the use of these containers was > semantics-free.
The intended use has less to do with it than the notion that you can define what is meta-information and what is information at the time you want to decide whether something goes in an attribute or a sub-element. My argument is that this is impossible. Whether it is meta-information or information is a reflection of the actual use, not the intended use.
However, given that the mechanism was created, and I will argue that it was not so much created as it was never thought possible to be any other way, it was used to define several language properties. "Now that we have this, would it not also be nice to have that." This means that several of the attribute types grew very far apart from the contents of sub-elements and you sort of "had" to use them as attributes, but only sort of, because the application can and does define the semantics of everything, and if you want ID and IDREF, you can make the same choice as you would in Common Lisp to use symbols or a hash table of strings.
> > I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools > > use and produce, because it makes XML just as evil in Lisp as it was in > > XML to begin with, and we have gained absolutely nothing in either power > > of processing or in abstraction, which is so very un-Lisp-like.
> > <foo bar="zot">quux</foo>
> > should be read as
> > (foo (bar "zot") "quux")
> Maybe. Macsyma used a similar notation for years (though without the restriction > on container-ness). I don't think the answer is to change to do the rewrite > you suggest.
I cannot follow you here. I am not suggesting a rewrite. I suggest that there is _no_ distinction between attribute and sub-element contents. What I am trying to communicate is so emphatically _NOT_ syntax that we will have a severe communications problem if this is not understood. The syntax has a function, and I am challenging the _function_ of the syntax that is believed by many people to support a concept I _also_ challenge. What do you gain from the attribute-vs-contents dichotomy? Why do you need it? What does it do for you? What would you have done if it were not there? What choices and design decisions went into attributes that would go into contents if you did not have attributes?
> I don't understand why it's not natural to add the > following as legal syntaxes:
> <foo bar=<zot/>>
> or
> <foo bar=<string>zot</string>>quux</foo>
Imagine that all attributes are in fact sub-elements, and this problem just goes away. Please, discard the concept of attributes. They no longer exist. What used to be called "attributes" are only sub-elements with special treatment and a whole bunch of arbitrary restrictions, one of which is lack of internal structure (except insofar as defined by the NOTATION attribute of attributes in SGML).
> This would keep people from feeling the attribute list was a shorthand > area and would also allow the storing of complex meta-data.
But that is not my goal. My goal is to get rid of the idea that there is a distinction that can be made once and for all, and prematurely at that, that some information is meta-data and some information is data. The core philosophical mistake in SGML is that you can specify these things before you know them. SGML is great for after-the-fact description of structures you already know how to deal with perfectly. It absolutely sucks for structures that are in any way yet to be defined. This is _because_ it is impossible to define what is considered meta-information and what is considered information before you actually have a full-blown software application that is hard to change your mind about. SGML was supposedly designed to free data from the vagaries of software, but when it adopted the attribute-content dichotomy, it dove right into dependency on the software design process instead of the information design process.
> Do you know what the reason was that recursive structures were not > allowed in this position in XML?
Yes, as a matter of fact, I do. Recursive structures are in fact allowed in attribute values, provided that your application processes them and not the SGML/XML parser. Back in the SGML days, the NOTATION attribute of both elements and attribute values was designed as an "escape" to the application to let some other syntax processor deal with the string of characters. (Please understand that everything SGML/XML is a string of characters. There are no _values_. Imposing valuedom on strings is the kind of semantics that SGML/XML specifically does _not_ support.)
> Or perhaps it was the fact that the "real world" substitutes for "parsed > structure" things like that weird assembly code like notation which looks > like
> (A
> AHREF=foo.html
> -Text
> )A
> Perhaps someone was just being uncreative about how a compound-structure > could be offered as an attribute.
No, they never actually thought of it that way. You have to understand and appreciate that the design process for SGML was such that some people had a very clear picture of the meta-information-vs-information dichotomy and that it never occurred to anyone that meta-information had exactly the same properties as information.
Whoever first decided to define HTML in such a way that unknown elements should be displayed suffered from exactly the same problem. As a sorry consequence, we have elements that have to contain _comments_ that are the real contents because that somebody did not foresee the need to have meta-information in contents. I argue that this is a result of "getting" the invalid meta-information/information dichotomy. If that person had not been bitten by the false idea that meta-information is fundamentally different from information, he would have realized that there would be a need to use element contents for meta-information, as well.
> Good. I'd hate for it to be "lost" as merely a post here, though I think > it's fun that you felt comfortable in sharing your thoughts.
Well, it took ten years of discomfort with the "attribute" concept before I went back to examine the genesis of the various forms of attributes and persisted in asking the question "could it not have been done with sub-elements", and finally found that the reason it could not was that somebody did not _want_ it to be done with sub-elements, and that the root cause of this was a fundamental misunderstanding of the relationship between information and meta-information. Just like
...
Erik Naggum <e...@naggum.net> writes: > The key to understanding this is that there is no _one_ real distinction. > There are in fact any number of "real distinctions". You just found one > way to wrap your world in the attribute/contents dichotomy because it was > there.
I fully agree with Erik here, and have myself implemented S-expression file formats that in fact collapsed attributes to be just child elements in the same way.
The only useful information was the name of some data, and the assumed or explicit type of the data value. It made no difference in terms of processing if the data was logically an "attribute" or an "element" -- the problem of extraction is exactly the same in both cases.
That there is a difference with XML is only due to its artificial distinction of attributes vs elements.
The only useful distinction I have found for attributes vs elements was aesthetic: how did the element look to a human reader (i.e. me) of the XML file? Whether I would choose a simple vs compound approach depended solely on my mental picture of the data in question, e.g.
<foo name="joe" size="big"/>
vs
<foo>
  <name>joe</name>
  <size>big</size>
</foo>
In an S-expression format however, it doesn't matter, and the aesthetic distinction is only:
(foo (name joe) (size big))
vs
(foo
  (name joe)
  (size big))
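The point that extraction is the same in both cases can be sketched in a few lines of Common Lisp. This is only an illustration: `child-value` is a hypothetical helper, not something from the file formats mentioned above, and it assumes both XML spellings have already been read into the single form `(foo (name joe) (size big))`.

```lisp
;; Hypothetical helper, not taken from any post in this thread.
;; ELEMENT has the form (name child...); a named child is (name value...).
(defun child-value (name element)
  "Return the first value of the first child of ELEMENT named NAME, or NIL."
  (let ((child (find name (rest element)
                     ;; Plain text children are atoms; give them no name.
                     :key (lambda (c) (and (consp c) (first c))))))
    (second child)))

;; Assumption: the attribute spelling and the sub-element spelling
;; both read into this one form, so one accessor serves both.
(child-value 'size '(foo (name joe) (size big)))   ; => BIG
(child-value 'name '(foo (name joe) (size big)))   ; => JOE
```

The caller never asks whether `size` was written as an attribute or as a sub-element, which is the whole claim.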
> Recursive structures are in fact allowed in attribute values, provided > that your application processes them and not the SGML/XML parser.
<rant>
As a separate topic, I just *hate* when people encode complicated data into attributes, forcing applications to solve yet another parsing problem. The whole point of something like XML is to have a standard encoding structure. The "parsing" problem is supposed to be solved once, and only the semantic interpretation should remain.
Of course, S-expressions are much more preferable. The only "cool" things about XML that I like are the ability to specify character encodings (ASCII vs Unicode, etc.) and the schema namespaces business, such that one can mix "tags" from semantically different spaces.
Mind you, the details of Schemas are overly complicated and gross to use, but they are better than DTDs which should just die die die. DTDs are a lesson in why a separate language should *not* be invented.
-- Cheers, Ray Blaak bl...@infomatch.com
The Rhythm is around me, The Rhythm has control. The Rhythm is inside me, The Rhythm has my soul.
Erik Naggum <e...@naggum.net> writes: > No. Please forget the attributes. There _are_ no attributes. Whether > something is an attribute or not is completely arbitrary and irrelevant. > Your access to that information is _not_ dependent on its representation > in SGML/XML. Treat everything as a subordinate element. This is the key > idea to gaining power of abstraction over the XML data. Holding on to > the mythical distinction between attribute and sub-element is the key > idea to losing any and all power of abstraction.
I looked again, and your incantations did not work. Attributes still seem to be in the language. I agree that when XML is used as a data definition they are "completely arbitrary" and make a syntactic separation which is destructive. I, personally, just avoid using them when I have control of the XML I use to define data. But I can't ignore them or re-format them when I need to generate XML which someone or some standard defined to use them. That battle belongs in the XML standards committees, and I am afraid it's a bit late to change their minds.
If I just treat attributes as subordinate elements, I lose the ability to simply translate from lisp into XML. In other news articles you seem to suggest that you use information outside the lisp representation to make that determination. This means that my tools would require a priori knowledge, which I feel a simple lisp->XML (non-interpreting) translator should not need. I don't think lisp->XML translators should have constraints that XML parsers don't have.
In code which interprets the lispified XML, I know what the grammar is, so can't I (at that time) bury any abstraction issues in the access methods? I admit I don't fully understand the abstraction benefits with which you are concerned. I've been overwhelmed in tracking all the XML languages which are being defined. I was hoping that being able to map them into lisp syntax would help avoid being buried in XML's confusing syntax. When looking at them in a lisp syntax, things can become clearer (and seem less innovative).
I don't agree that the distinction between attributes and entities is always arbitrary. SGML does stand for Simple Graphical *Markup* Language, and in a markup language, I think it is important to distinguish the text of a document from its markup. Multiple translators may be used, and they should not need to be kept up to date on what attributes are used in the other translators. In an expression like:
<header1><italic>Wow</italic>, this is difficult.</header1>
or as lisp (which I think is more readable): (header1 (italic "Wow") " this is difficult")
it isn't clear whether "Wow" is text or the value of an attribute unless you have prior knowledge of whether `italic` is an attribute in the context of a header1 directive. So here the distinction is simple, clear, and useful. (I am not commenting on the syntax.)
This is still important for things like xhtml -- and probably docbook, whose standard I have not yet assimilated.
In my previous message I suggested that: <header1 italic="Wow"> this is difficult</header1>
become: (header1 ((italic "Wow")) " this is difficult")
With minimal (but I admit real) damage to the syntax.
> I looked again, and your incantations did not work. Attributes still > seem to be in the language.
Sigh.
> I agree that when XML is used as a data definition they are "completely > arbitrary" and make a syntactic separation which is destructive. I, > personally, just avoid using them when I have control of the XML I use to > define data. But I can't ignore them or re-format them when I need to > generate XML which someone or some standard defined to use them. That > battle belongs in the XML standards committees, and I am afraid it's a bit > late to change their minds.
How you work with XML is not defined by those standards bodies. What your _internal_ representation of XML looks like is not defined by those standards bodies. One of the fundamental properties of Lisp is that we have a very nice and well-defined mapping between external and internal representation for most of our object types. There is no well-defined mapping between XML syntax and internal representation. Lots of ways are equally valid. Insisting on only some of them is counter-productive.
> If I just treat attributes as subordinate elements, I lose the ability to > simply translate from lisp into XML.
You have made up your mind about this, so I shall not try to convince you of the errors of your ways. People who are dead set on their ways should be left alone, mostly because they get cranky when faced with alternatives.
> In other news articles you seem to suggest that you use information > outside the lisp representation to make that determination.
No, you do not understand, and that is because you do not even try.
> This means that my tools would require a priori knowledge, which I feel a > simple lisp->XML (non-interpreting) translator should not need.
I see that you have to be very hard and fast on how you represent your information. This is your choice. I wish you would recognize it as a choice, and not try to impose a very specific view on the reality that is far more flexible and adaptable than you have shown to believe it to be.
> I don't think lisp->XML translators should have constraints that XML > parsers don't have.
Well, that is another choice you have made. Other people, other choices.
> In code which interprets the lispified XML, I know what the grammar is, > so can't I (at that time) bury any abstraction issues in the access > methods?
What does it matter to your access whether something is an attribute or a sub-element? Why do you need to retain the distinction internally?
> I admit I don't fully understand the abstraction benefits with which you > are concerned.
I appreciate that you state this, because you certainly have not.
> I've been overwhelmed in tracking all the XML languages which are being > defined.
Yes, overwhelmed by bad design, most people's brains shut down and they refuse to deal with a massive simplification because it threatens to be as painful as dealing with the complexity they have barely survived.
> I was hoping that being able to map them into lisp syntax would help > avoid being buried in XML's confusing syntax.
That is my idea. I am sorry for you that you have to define away the solution to your problem by insisting on a trivial one-to-one mapping of conceptual elements that effectively blocks your own conceptualization.
> When looking at them in a lisp syntax, things can become clearer (and seem > less innovative).
How very true.
> I don't agree that the distinction between attributes and entities is > always arbitrary.
Attributes and entities are very different concepts and the distinction between them is of fundamental importance. I fail to see how you think I have made any claims about their relationship, however. I am talking about _elements_.
> SGML does stand for Simple Graphical *Markup* Language,
It stands for Standard Generalized Markup Language, actually. The key to understanding the name is that "generalized markup" is something more than mere markup. SGML has aspirations beyond simply marking up text.
> and in a markup language, I think it is important to distinguish the text > of a document from its markup.
I think I already said that.
> Multiple translators may be used, and they should not need to be kept up > to date on what attributes are used in the other translators.
Your value judgments are your choice. I happen to disagree with them. If you try to deny me this, please realize that I do not care at all.
> In an expression like:
> <header1><italic>Wow</italic>, this is difficult.</header1>
> or as lisp (which I think is more readable): > (header1 (italic "Wow") " this is difficult")
> it isn't clear whether "Wow" is text or the value of an attribute > unless you have prior knowledge of whether `italic` is an attribute in > the context of a header1 directive.
Well, first off: You _have_ that prior knowledge. Your application will actually need to know what to do with it whether it is an attribute or a sub-element. If your application does not know what to do with it, I fail to see how whether it is an attribute or an element can matter to you. If you _do_ know what to do with it, how does it matter to you whether it came from an attribute value or a sub-element?
> So here the distinction is simple, clear, and useful.
It is arbitrary.
> This is still important for things like xhtml -- and probably docbook, > whose standard I have not yet assimilated.
No, it is fundamentally unimportant. Please try to accept this premise for the sake of discussion, and see if something you believe falls out and shows itself to you as more important than your simple protestations.
> In my previous message I suggested that: > <header1 italic="Wow"> this is difficult</header1>
> become: > (header1 ((italic "Wow")) " this is difficult")
> With minimal (but I admit real) damage to the syntax.
Keeping the distinction between attributes and content is keeping you from realizing how simply and efficiently you can deal with XML data. But that is your choice. I fully expect that loads of people who have fused their brains shut and have fully "integrated" the false dichotomy of attributes and contents will never be able to unfuse it and open up to a very simple realization that it has absolutely no bearing on anything _other_ than the specific syntax in SGML/XML whether something is an attribute or an element.
Those who grasp the concepts involved, will see that attributes are just another form of contents. Those who do not grasp the concepts involved, will think that attributes are different from contents because they have been given syntactically different expression. But it is always the syntax that follows the function. Someone believed that meta-information should be fundamentally different from information. Someone believed that the contents of elements should be text that wound up in the final document on the printed page and the values of attributes should not, but should only influence the processing of the information. This worked only as long as SGML was used as a markup language for documents and had no aspirations towards being an abstract structuring syntax. When it came to use it as a more abstract syntax, there _is_ no inherent quality that determines whether some value ends up displayed or not. That has to be supplied by the software that processes the information, which is precisely prior knowledge of the structure and its meaning.
Erik Naggum <e...@naggum.net> writes: > * Barry Fishman <barry_fish...@acm.org> >> If I just treat attributes as subordinate elements, I lose the ability to >> simply translate from lisp into XML.
> You have made up your mind about this, so I shall not try to convince you > of the errors of your ways. People who are dead set on their ways should > be left alone, mostly because they get cranky when faced with > alternatives.
Crankiness is just a part of facing new ways of looking at things. It's only a terminal disease in the very young. At my age, old ways of thinking still do not give way without an internal fight. I have a great many years of Java/C/C++/Perl to overcome. I do comprehend that writing the equivalent code in lisp is pointless, although easy.
I wouldn't be taking the time to learn and work in lisp, if I didn't recognize that it could significantly improve the ways I analyze and solve problems. This was made obvious by looking at (what I presume is) good lisp code.
I will follow your suggestion and remove the entity/attribute distinctions in my lisp code. I am then left with a strong desire to keep the XML attribute names in a list, and use that in a generic XML output translator.
I suspect this is still avoiding the issues you have raised. Instead I will start by writing specific code for each case and see if a less "C" like way of sharing code becomes evident.
I am open to any suggestions, although I can not guarantee I will immediately grasp their rationale. (I think I am past my cranky stage. I am never cranky when I get to write code.)
>> SGML does stand for Simple Graphical *Markup* Language,
> It stands for Standard Generalized Markup Language, actually.
Yes, Yes, Yes. I was focused on the _markup_ part, but there really is no excuse when the answer is just an `info psgml' away.
Barry Fishman -- I am used to working from the general to the specific. Problems seem to have the same design patterns in C/C++/Java and probably XML, although the implementations may use slightly different language features. However, this does not seem to follow with lisp. A new set of approaches takes center stage, and I do not yet have the judgement to understand their implications. They just dangle before me benefits which aren't present otherwise. They also seem to carry the seed of complexities which could bury the project as a whole. These disasters of course are present in other languages, but there I can trust my traditional ways of avoiding them. The answer, as my music teacher would say, is practice, practice, practice.
| * Barry Fishman <barry_fish...@acm.org> | | > In an expression like: | > | > <header1><italic>Wow</italic>, this is difficult.</header1> | > | > or as lisp (which I think is more readable): | > (header1 (italic "Wow") " this is difficult") | > | > it isn't clear whether "Wow" is text or the value of an attribute | > unless you have prior knowledge of whether `italic` is an attribute | > in the context of a header1 directive. | | Well, first off: You _have_ that prior knowledge. Your | application will actually need to know what to do with it whether | it is an attribute or a sub-element. If your application does not | know what to do with it, I fail to see how whether it is an | attribute or an element can matter to you. If you _do_ know what | to do with it, how does it matter to you whether it came from an | attribute value or a sub-element?
Well, I agree that in most cases you will know whether something was an attribute or contents, when you're processing it, but what about:
<foo bar="1"><bar>2</bar></foo>
If I understand you correctly (and I'm not exactly sure about that), you would represent this in Lisp as:
(foo (bar 1) (bar 2))
I don't see how you can distinguish attributes and contents in this case, and how you can translate this back into the same XML. Probably I'm missing something.
Boris Schaefer <bo...@uncommon-sense.net> writes: > Well, I agree that in most cases you will know whether something was > an attribute or contents, when you're processing it, but what about:
> <foo bar="1"><bar>2</bar></foo>
> If I understand you correctly (and I'm not exactly sure about that), > you would represent this in Lisp as:
> (foo (bar 1) (bar 2))
> I don't see how you can distinguish attributes and contents in this > case, and how you can translate this back into the same XML. Probably > I'm missing something.
assuming I understood Erik's point, you can surely distinguish attributes and contents when translating back to XML. that is, your Lisp->XML translator should know that `(bar 1)' under `foo' is supposed to be an attribute.
having this knowledge _only_ in the output translator frees you from distinguishing attributes and contents in your program.
this does mean that the value of foo's `bar' attribute (er, content) should magically always be atomic, because if it's not you won't be able to output valid XML. but note that if your program can deal with non-atomic values of `bar', then you probably have chosen the wrong XML format in the first place...
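A minimal sketch of such an output translator, in Common Lisp, under stated assumptions. The table `*attribute-names*` and the functions `split-children` and `write-xml` are hypothetical names invented for this illustration, not anyone's posted code. The sketch assumes that the first child bearing a declared attribute name is the attribute (an attribute cannot repeat) and that any later child of the same name is a sub-element. It does no escaping of `<`, `&` or quotes.

```lisp
;; Assumed table: (element-name attribute-name ...).
;; Only the output translator consults it; the tree itself carries no marks.
(defparameter *attribute-names* '((foo bar)))

(defun split-children (element)
  "Return two values: children to write as attributes, and the remaining contents."
  (let ((declared (rest (assoc (first element) *attribute-names*)))
        (attributes '())
        (contents '()))
    (dolist (child (rest element))
      (if (and (consp child)
               (member (first child) declared)
               ;; An attribute cannot repeat, so a second child with the
               ;; same name is treated as a sub-element.
               (not (assoc (first child) attributes)))
          (push child attributes)
          (push child contents)))
    (values (nreverse attributes) (nreverse contents))))

(defun write-xml (node &optional (stream *standard-output*))
  "Write NODE as XML text to STREAM. No character escaping is attempted."
  (if (atom node)
      (format stream "~A" node)
      (multiple-value-bind (attributes contents) (split-children node)
        (let ((name (string-downcase (first node))))
          (format stream "<~A" name)
          (dolist (attribute attributes)
            (format stream " ~A=\"~A\""
                    (string-downcase (first attribute))
                    (second attribute)))
          (cond ((null contents)
                 (format stream "/>"))
                (t
                 (format stream ">")
                 (dolist (child contents)
                   (write-xml child stream))
                 (format stream "</~A>" name)))))))

;; Boris's example survives the round trip:
(with-output-to-string (s)
  (write-xml '(foo (bar 1) (bar 2)) s))
;; => "<foo bar=\"1\"><bar>2</bar></foo>"
```

The program that manipulates `(foo (bar 1) (bar 2))` never looks at the table; only the printer does.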
-- What is this talk of 'release'? Klingons do not make software 'releases.' Our software 'escapes' leaving a bloody trail of designers and Quality Assurance people in its wake. -- Klingon Programmer
> Well, I agree that in most cases you will know whether something was > an attribute or contents, when you're processing it, but what about:
> <foo bar="1"><bar>2</bar></foo>
> If I understand you correctly (and I'm not exactly sure about that), > you would represent this in Lisp as:
> (foo (bar 1) (bar 2))
> I don't see how you can distinguish attributes and contents in this case, > and how you can translate this back into the same XML. Probably I'm > missing something.
Yes, you are definitely missing the constraints of SGML in real life. There are some problems that are not worth solving because they never come up even if they superficially could appear to come up if you do not pay attention. This is such a problem. You have failed to consider the ramifications of the solutions and pose a problem that simply would not exist if you did. This taxes my patience, which is already legendary in its general absence.
However, I apparently need to insist that you understand that in SGML and XML alike, you do in fact know what attributes an element has. It cannot possibly be ambiguous. If you decide to name a sub-element the same as an attribute, however massively stupid that is even with SGML/XML as it is, you _still_ know that you have an attribute with that name. That there is a sub-element with that name, as well, is coincidental to the representation. There simply is no way you can _not_ know that, unless you go out of your way to destroy the information that SGML provides you with. If you destroy the information that is available to you, you will not get me to do stupid human tricks answering your resulting questions.
I truly wonder what is so hard to understand about this. We Lisp people are quite used to association lists, right? Keyword-value pairs do not need to be in property lists to be understandable by Lisp people, do they? To my mind, whether you store something in a property list or an association list is arbitrary. However, in the reactions that I have seen to obliterating the false dichotomy between attributes and contents in SGML, there somehow seems to be a _fundamental_ difference between property lists and association lists. I completely fail to understand how that can be.
The whole deal is so simple I do not even know how to explain it so people get it if they do not get it immediately. It is somewhat like seeing someone struggle with fractions. They either get it or they do not, and although I have managed to make many a struggling child get the idea, I have _no_ idea what precisely caused them to grasp it. It just happened, and they laughed in relief. This attribute/container thing is equally intuitively evident.
Case in point: An element has a fixed number of attributes. That is reflected in a fixed length of the association list that makes up the attributes. Attributes are not repeatable and not omissible, so if there are n attributes in the attribute list for an element, there will be n conses with attributes in the cdr of the element representation. There are no two ways about this. It is completely and irrevocably unambiguous.
By exploiting the rich information we have about the elements and their makeup in SGML, we can reason about things with much simpler means than by adhering strictly to the particular representational issues in SGML. If it matters to you that some values are attributes, you ask for the attribute information. If it does not matter to you, you can be relieved of the distinction. If you want to transform attribute to contents or vice versa, modify the information about the element, not the element; if and when you print it out, the modifications will manifest themselves in new SGML/XML syntax, but nothing happened to your internal representation.
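One way to read the two paragraphs above, sketched in Common Lisp. Everything named here (`*element-attributes*`, `attribute-count`, `element-attributes`, `element-contents`) is a hypothetical illustration, not the actual tooling being described. The sketch assumes the positional convention stated above: an element declared with n attributes carries them as the first n conses in the cdr of its representation.

```lisp
;; Assumed registry: element name -> list of declared attribute names.
;; This stands in for what a DTD would tell the application.
(defvar *element-attributes* (make-hash-table :test #'eq))

(defun attribute-count (element)
  "Number of leading children of ELEMENT that are attributes."
  (min (length (gethash (first element) *element-attributes*))
       (length (rest element))))

(defun element-attributes (element)
  "The leading children that the declaration says are attributes."
  (subseq (rest element) 0 (attribute-count element)))

(defun element-contents (element)
  "The remaining children, regardless of their names."
  (nthcdr (attribute-count element) (rest element)))

(defparameter *tree* '(foo (bar 1) (bar 2)))

;; With one declared attribute, the first cons is the attribute.
(setf (gethash 'foo *element-attributes*) '(bar))
(element-attributes *tree*)   ; => ((BAR 1))
(element-contents *tree*)     ; => ((BAR 2))

;; Change the information about the element, not the element:
(setf (gethash 'foo *element-attributes*) '())
(element-attributes *tree*)   ; => NIL
(element-contents *tree*)     ; => ((BAR 1) (BAR 2))
```

`*tree*` is never modified. Only the declaration changes, and with it what a printer would emit as attribute or as sub-element.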
* Erik Naggum wrote: > The key to this is the relationship between foo and bar. Whether bar is > an attribute or a sub-element of foo is irrelevant to processing them,
Yes, that's a good point. Whenever I've tried to design DTDs I've always ended up having no attributes but doing everything as subelements, and it's interesting that another very rich & successful markup language - CL - does everything as `subelements' except when people like me try and make it mirror HTML.
> ... implying that I would normally have such information and use it when > generating attribute/value or sub-element/contents.
Yes, that's the crucial information I don't have in my application.
Erik Naggum <e...@naggum.net> writes: > I truly wonder what is so hard to understand about this.
I think in situations like this the answer is that you need to stay concrete. One often can't say specifically why one finds something difficult or hard, but one can generate a test case that is at their fringe. It was asked whether
<foo bar="1"><bar>2</bar></foo>
would be represented as
(foo (bar 1) (bar 2))
and you've sort of hinted yes. You've made allusions to alists as a way of understanding this, but as a sense of intuition, of course, that doesn't help a Lisp programmer a lot since plainly an alist is about the leftmost of each named thing, and people are uneasy about accessing the next-leftmost element behind it--that usually violates some sense of a-list/stack discipline.
You haven't offered an operator whose goal is to be like destructuring-bind and so to get around this, so the burden seems, to those looking on, to be on the programmer to pick apart this structure manually and the set of tools seems light. That's probably only an artifact of not seeing your tools, rather than anyone's belief that you have no such tools.
Likewise you haven't shown any syntax which is, by loose analogy, the equivalent of Lisp's arglist strangeness for keywords where you map a keyword to a differently-named variable by doing

    (lambda (&key ((:foo fu) 3)) fu)

It is by having an abstraction like this that you can assure the person that the caller's name for things will not confuse the callee. I toyed with coming up with an analogously absurd example for Lisp and the following was my best go of it. If it seems unhelpful, just ignore it. But the point is just to show that you can manage:

    (let ((weird 'ee) (apartheid 'ii) (pie 'ii) (pier 'ee))
      (labels ((fn1 (&key ((:ei e)) ((:ie i))) (list 'ee e 'ii i))
               (fn2 (&key ((:ie e)) ((:ei i))) (list 'ee e 'ii i))
               (sort-by-sound (&rest keys &key (first-vowel-wins-p t) &allow-other-keys)
                 (apply (if first-vowel-wins-p #'fn1 #'fn2) :allow-other-keys t keys)))
        (list (sort-by-sound :first-vowel-wins-p t :ei 'weird :ie 'pie)
              (sort-by-sound :first-vowel-wins-p nil :ei 'apartheid :ie 'pier))))
    ((EE WEIRD II PIE) (EE PIER II APARTHEID))

That is, somehow you'd expect the external representation (:ie vs :ei) to have a fixed effect on what two functions that each have the same body might return, but the arglist mappings (the "magic" in your example, of a different kind than the "magic" here of &key, but still magic in a way) manage to sort things out. It isn't their behavior but the cross-bar you plug between them that is doing something cool, and people don't see what that cool thing is, probably only for lack of specificity rather than disbelief that what you say might be true. Just as my example above is ho-hum to a Lisp programmer, not mysterious, once they understand how keyargs work.
I think it would help if you posted the NML which helps you manipulate these, and perhaps a small code fragment that showed an end-to-end use of constructing an expression in Lisp and having it appear in the XML with this notation Boris suggests, and the reverse. Then people would be talking concrete still.