Erik Naggum <e...@naggum.net> wrote:
+---------------
| * Boris Schaefer <bo...@uncommon-sense.net>
| > Well, I agree that in most cases you will know whether something was
| > an attribute or contents, when you're processing it, but what about:
| >   <foo bar="1"><bar>2</bar></foo>
...
| >   (foo (bar 1) (bar 2))
| > I don't see how you can distinguish attributes and contents in this case,
| > and how you can translate this back into the same XML.
...
...
| Case in point: An element has a fixed number of attributes. That is
| reflected in a fixed length of the association list that makes up the
| attributes. Attributes are not repeatable and not omissible, so if there
| are n attributes in the attribute list for an element, there will be n
| conses with attributes in the cdr of the element representation. There
| are no two ways about this. It is completely and irrevocably unambiguous.
+---------------
While not repeatable, attributes *are* omissible if the DTD for that attribute contains either a default value or the "#IMPLIED" status keyword, are they not? So if the DTD said:
<!ELEMENT foo (bar | PCDATA)*>
<!ATTLIST foo bar NUMBER #IMPLIED>
that is, the "foo" element has an optional "bar" attribute *and* also allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar 2)) *would* be ambiguous.
I see two obvious ways to preserve the simplicity you seek:
1. Do what CL does for declarations, that is, reserve a symbol to tag lists of attributes (like "declare" does), which are optional, but if present may only appear before all non-attribute subforms:
(foo (attr (bar 1)) (bar 2))
2. Force attribute names and element names into different packages, e.g.:
(foo (attr:bar 1) (bar 2))
or if the current package is never the keyword package, simply:
(foo (:bar 1) (bar 2))
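As a rough sketch of the first option (the reserved marker), here is one way the attributes could be peeled off the front of an element. The function name `split-marked-element` and the choice of `attr` as the marker are assumptions made for illustration; nothing like this was posted in the thread.

```lisp
;; Hypothetical helper: split an element written in the "reserved marker"
;; style, e.g. (foo (attr (bar 1)) (bar 2)).
(defun split-marked-element (element &optional (marker 'attr))
  "Return three values: the element name, the attribute alist, the contents."
  (destructuring-bind (name &rest body) element
    (let ((first-form (first body)))
      (if (and (consp first-form) (eq (first first-form) marker))
          ;; The marker form is present: its tail holds the attributes.
          (values name (rest first-form) (rest body))
          ;; No marker form: everything after the name is content.
          (values name '() body)))))

(split-marked-element '(foo (attr (bar 1)) (bar 2)))
;; => FOO, ((BAR 1)), ((BAR 2))
(split-marked-element '(foo (bar 2)))
;; => FOO, NIL, ((BAR 2))
```

The price of this scheme is that the marker symbol can never be used as the name of a first sub-element, which is the same trade-off `declare` makes.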
After writing the usual 'when to use elements and when to use attributes' bit for a new book and then spending some time close up with the XLink specs, I'm really starting to wonder if we haven't painted ourselves into a corner by treating leaf elements and attributes differently.
Unfortunately, no significant followups seem to have been posted!!
----- Rob Warnock, 30-3-510 <r...@sgi.com> SGI Network Engineering <http://reality.sgi.com/rpw3/> 1600 Amphitheatre Pkwy. Phone: 650-933-1673 Mountain View, CA 94043 PP-ASEL-IA
> I think it would help if you posted the NML which helps you manipulate
> these, and perhaps a small code fragment that showed an end-to-end use
> of constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse. Then people would be
> talking concrete still.
I'd like to echo this sentiment. I'm intrigued, but Dog knows intriguing things can turn out to be pretty awful in practice, or divine, or anywhere in between, and it takes actual experience to tell the difference most of the time.
> You've made allusions to alists as a way of understanding this, but as a
> sense of intuition, of course, that doesn't help a Lisp programmer a lot
> since plainly an alist is about the leftmost of each named thing, and
> people are uneasy about accessing the next-leftmost element behind
> it--that usually violates some sense of a-list/stack discipline.
Well, this is why association lists work as a metaphor -- attributes in SGML/XML cannot be repeated. If there are more keys in the remainder of the contents, they are not attributes.
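To make the alist metaphor concrete, here is a minimal sketch. It assumes the layout under discussion (attributes first, then contents, every item a list); the name `attribute-value` is made up for the example.

```lisp
;; Hypothetical accessor: ASSOC returns the first (leftmost) entry whose
;; car matches, so with attributes stored first it finds the attribute
;; and never reaches a same-named sub-element further right.
(defun attribute-value (element key)
  "Value of the leftmost KEY entry after the element name, or NIL."
  (second (assoc key (rest element))))

(attribute-value '(foo (bar 1) (bar 2)) 'bar)
;; => 1
```

Note that ASSOC expects every item it scans to be a cons (or NIL), so this only works as written when any text content is wrapped in a list rather than stored as a bare string.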
> You haven't offered an operator whose goal is to be like
> destructuring-bind and so to get around this, so the burden seems, to
> those looking on, to be on the programmer to pick apart this structure
> manually and the set of tools seems light. That's probably only an
> artifact of not seeing your tools, rather than anyone's belief that you
> have no such tools.
Which tools are available for the contents? Why are they _not_ usable directly for the attributes? I fail to grasp what you want to _do_ with the attributes that you cannot do with them if they are sub-elements.
You imply that people are unable to deal with sub-elements and need special tools to deal with attributes. This _must_ be wrong.
> I think it would help if you posted the NML which helps you manipulate
> these, and perhaps a small code fragment that showed an end-to-end use of
> constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse. Then people would be
> talking concrete still.
I assume that people who voice their concerns in this discussion know SGML. I have no inclination to write tutorials for people who do not. It is a waste of my time, and I know that I will hate it. I have about 500 pages of a book entitled "A Conceptual Introduction to SGML" that I swear to whichever deity is on duty today will _never_ be published, because the design flaws of SGML are so pervasive that the only thing I want to do with them is get rid of them. Accept the fact that I deal with a history of personal pain in this regard. I invested 6 years of my life on SGML and related standards, and the more I worked with it, the more I found that SGML actively destroyed any hope of achieving what it had set out to do, because it is introducing several poisons into the conceptual processes of structuring information. Taking a look at what people do with SGML and XML today has not shown _one_ case of anyone waking up and smelling the coffee, and it has been _burning_ in the coffee machine for a decade.
This is my view: You were told that you needed attributes in addition to sub-element contents. Why did you ever _agree_ to that? The onus of proof is normally on he who asserts the positive, and I challenge you to explain to me why you _need_ attributes rather than accepting any challenge to explain why you do _not_ need them when what I say is that _you_ already know perfectly well how to deal with sub-elements. If you have worked with SGML at all, you _know_ that people screw up attributes and sub-elements, and you _have_ had to deal with one that should have been the other in your processing. It is _impossible_ to get them "right" because the notion that there is a "right" solution depends on information that is not available at the time the distinction is made.
Over the years, I have thought of _many_ different ways to deal with the colossal braindamage that is attributes in SGML. One might think of them as (keyword) arguments to functions, but which other information should influence a "function" that deals with an element? Well, first and foremost, its _parentage_. That means that I have already had to get rid of the notion that <foo bar="x" zot="y"> is "really" a function call like (foo :bar "x" :zot "y"). It has to know _so_ much more to do _anything_ right that it is completely useless to cast one's thinking in such terms.
SGML must be _questioned_, not accepted as gospel or natural science reporting on some findings. Somebody made a decision to add attributes, and I know for a fact that that was back in the days of typesetting and document production when the idea was that you should be able to "remove" the "tags" and end up with the readable text of the document as it would be printed. That was the _real_ rationale for attributes. I happen to think that was a brilliant idea at the time -- competing markup languages have a serious problem in using notations that destroy the ability to figure out easily what is intended for humans and what is intended for the machine. (In particular, TeX is a monster.) I tended towards explaining to people that they should not let stuff that should not be displayed be in sub-elements. What a crock of shit that advice is!
As soon as GML became more general than producing print documents, for which it was well suited and still is, the attribute concept had become a mill-stone around its neck and it dragged it down fast. It was _wrong_ to keep attributes around when their rationale had been completely eradicated from its set of operating conditions. It made everything incredibly complex.
I was one of very few people on this planet to really _study_ the standard, and my brain works in such a way that I still _know_ with immediate certainty whether something is or is not supported by the standard language and how to express it. (It works the exact same way with Common Lisp, Ada (1983, unfortunately :), C (1991), and any number of things I have really sat down to study and understand, and it is so efficient that I even get an emotional response to violations before I see the logic of them.) I love the way my brain works, but it also has serious drawbacks: Overriding and updating old information is something I have to work really hard at.
The end result of the way I think and the way the standard is defined is that I immediately saw these massively complex ways to do things that "nobody" understood. Take HyTime and what it calls "architectural forms" -- I vividly remember a long walk around a quiet Tallahassee one summer night with the creator of this concept, when I questioned some of the designs and how it would be implemented, and he was quiet for the longest time before he said that I was probably the first person to have understood what he was _really_ trying to accomplish. That would have been _such_ a great thing if it had been, say, rocket science, but it was not. It was a man-made complexity so great that it had required _months_ of brain-wracking to really get my intuition working. That was the first time I had really serious doubts about the wisdom of SGML's structuring process, because the massive complexity of it all is _completely_ pointless and a result of spreading the semantics so thin that you had to keep mental track of an enormous number of relationships to end up with an idea of what something should do or mean. It does not have to be that way. It was _profoundly_ disappointing to discover that at the end of this long process of grasping something that looked intellectually challenging lay only a complexity that resulted from _rejecting_ simplicity of design at a few crucial points. Hell, it still took me years to figure out what alternatives they _should_ have picked up, and by then it was too late.
Now you are probably thinking "how F hard can it be?" and looking half condescending on a retarded monkey who cannot figure out the purposes of the mathematical relationships in calculus. But it is the same problem we find in C++. The question to be asked of massive complexity like that is not "what wonderful things did you find out that made this necessary", but "whatever did you _miss_ that made this so horribly complex"? You can sometimes see people who are really, really dumb go about some simple tasks in a way that tells you that they have arrived at their ways of performing it through an incredibly painful process that they are loath to reopen or examine at all no matter how hard it is to get it right for them. Some people will construct ways of performing their job so that they utilize all available brainpower, simply because that is indeed a very satisfying feeling. However, when it comes to grasping someone else's _wrong_ ideas, there is no upper bound on complexity. Some people have the most bizarrely convoluted thinking processes and they completely fail to monitor their thinking so they traipse off into oblivion and may or may not come back, but if they do, it is with these spectacularly irrational ideas that they _love_ before they discard them. This is the kind of complexity that befell the SGML community. That I could figure this mess out and think about it and have something dramatic to say about it to the creators, frankly scares me.
In any case, I think the core problem is that a request for a rationale for _removing_ a complexifying misfeature is completely bogus. We should not look at what we wound up with, we should look at _how_ we wound up where we are. I have explained how attributes got invented in the first place and it _was_ a good idea at the time. However, as soon as elements got more abstract and elements could contain _no_ information that would wind up on the printed page, but instead other elements that would, and those "abstract" elements would influence the way their sub-elements' contents would wind up on the printed page, it should have been clear that the attribute concept should be scheduled for extinction because some of its roles had now been moved into a different realm where _all_ of its roles could be moved without sacrificing anything.
> While not repeatable, attributes *are* omissible if the DTD for those
> attribute contains either default values or the "#IMPLIED" status keyword,
> are they not?
That depends on whether you represent the parsed or pre-parsed structure. In a Common Lisp setting, we are dealing with parsed structure. If the attribute value is "implied" in the source, it still needs to be there in the parsed structure.
> So if the DTD said:
> <!ELEMENT foo (bar | PCDATA)*>
> <!ATTLIST foo bar NUMBER #IMPLIED>
> that is, the "foo" element has an optional "bar" attribute *and* also
> allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar
> 2)) *would* be ambiguous.
If you choose to represent a pre-parsed SGML instance in Common Lisp, I would argue strongly against that before I would even attempt to answer anything else.
I _really_ mean it when I say that the attribute list has a fixed length.
I also indicated that for pragmatic reasons, I sometimes use a marker to separate the attributes from the contents in the cdr of the element, such as when the task at hand would be wastefully slow if I were to deal with a fully parsed structure. Dirty hacks should be within reach because the world is sometimes not clean. I am probably not going to get used to the habit of some people who see a problem in one part of a proposal and ignore the fact that there is a solution in another part of the same proposal (like the next paragraph), and I am certainly not patient enough with all the rampant idiocy in the SGML/XML world to explain this over and over, but please go back and read the whole message. If you find a need to use a marker in _some_ cases, I have in fact covered it. In the fully parsed, fully general case, that need does _not_ arise, because the attribute list is a fixed set of "slots" in the structure. This should have no bearing on how to process them, however, but of course it matters to and from SGML/XML representation.
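One way to read "a fixed set of slots" is that the number of attributes is a property of the element type, so the split point is known without any marker. The sketch below assumes a hand-written table `*attribute-counts*` standing in for whatever the DTD declares; the table and the function names are illustrative, not anything posted in this thread.

```lisp
;; Hypothetical table: how many attributes each element type declares.
(defparameter *attribute-counts* '((foo . 1)))

(defun attribute-count (element)
  "Number of attribute entries at the front of ELEMENT's cdr (0 if unknown)."
  (or (cdr (assoc (first element) *attribute-counts*)) 0))

(defun element-attributes (element)
  "The first N conses after the name, N being fixed per element type."
  (subseq (rest element) 0 (attribute-count element)))

(defun element-contents (element)
  "Everything after the fixed-length attribute list."
  (nthcdr (attribute-count element) (rest element)))

(element-attributes '(foo (bar 1) (bar 2)))  ; => ((BAR 1))
(element-contents   '(foo (bar 1) (bar 2)))  ; => ((BAR 2))
```

This only holds for the fully parsed form, where every declared attribute is present; applied to a half-parsed element with an omitted attribute it would silently misclassify the first sub-element.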
Kent M Pitman <pit...@world.std.com> wrote:
+---------------
| r...@rigden.engr.sgi.com (Rob Warnock) writes:
| > 2. Force attribute names and element names into different packages, e.g.:
| >       (foo (attr:bar 1) (bar 2))
| >    or if the current package is never the keyword package, simply:
| >       (foo (:bar 1) (bar 2))
|
| Don't forget XML has a package namespace of its own. You'd need nested
| namespaces to pull this off, no?
+---------------
Oh, heavens! I certainly wasn't trying to open *that* can of worms again! But yes, you're right, of course, if one were to try to use Lisp namespaces directly for XML names. But...
I think Erik's parallel response gets it absolutely correct [which I missed on first reading of his earlier article -- oops!], namely, once parsed (and defaulted, if necessary) all the stuff about what's an "attribute" and what's not should be a property of the Lisp representation of the element [CLOS class, whatever], and not necessarily encoded in any way in the Lisp data structure per se.
Likewise, I suspect the right answer for dealing with XML namespaces will turn out to be to have the Lisp representation of each element worry about that, and use directly-corresponding names for XML elements and Lisp symbols only to the extent that it's convenient, and *NOT* attempt to force any rigid or automatic 1-to-1 correspondence.
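A sketch of what "no automatic 1-to-1 correspondence" might look like in practice: an explicit table from (namespace, local name) pairs to whichever Lisp symbol the application chooses. The table, the function names, and the example namespace URI are all assumptions for illustration.

```lisp
;; Hypothetical registry: the application decides which symbol stands for
;; which XML name, instead of interning XML names mechanically.
(defparameter *element-symbols* (make-hash-table :test #'equal))

(defun register-element (namespace local-name symbol)
  "Associate the XML name NAMESPACE/LOCAL-NAME with SYMBOL."
  (setf (gethash (cons namespace local-name) *element-symbols*) symbol))

(defun element-symbol (namespace local-name)
  "The symbol registered for this XML name, or NIL if none."
  (values (gethash (cons namespace local-name) *element-symbols*)))

(register-element "http://example.org/ns" "foo" 'foo)
(element-symbol "http://example.org/ns" "foo")    ; => FOO
(element-symbol "http://example.org/other" "foo") ; => NIL
```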
I was intending to use Lisp packages only to encode the one bit of "attribute/non-attribute", not encode XML namespace, but Erik rightly showed that approach was still trapped in the SGML/XML worldview. Hence, I retract the suggestion (except in the case that the Lisp representation of a particular element *chooses* to use that distinction, purely for its own convenience).
-Rob
Erik Naggum <e...@naggum.net> wrote:
+---------------
| r...@rigden.engr.sgi.com (Rob Warnock)
| > While not repeatable, attributes *are* omissible if the DTD for those
| > attribute contains either default values or the "#IMPLIED" status keyword,
| > are they not?
|
| That depends on whether you represent the parsed or pre-parsed structure.
| In a Common Lisp setting, we are dealing with parsed structure. If the
| attribute value is "implied" in the source, it still needs to be there
| in the parsed structure.
+---------------
*Doh!* I think I finally get what you were trying to say, thanks!
+---------------
| > So if the DTD said:
| >   <!ELEMENT foo (bar | PCDATA)*>
| >   <!ATTLIST foo bar NUMBER #IMPLIED>
| > that is, the "foo" element has an optional "bar" attribute *and* also
| > allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar
| > 2)) *would* be ambiguous.
|
| If you choose to represent a pre-parsed SGML instance in Common Lisp...
+---------------
Or a half-parsed (i.e., half-assed)? ;-}
+---------------
| I would argue strongly against that before I would even attempt to
| answer anything else.
|
| I _really_ mean it when I say that the attribute list has a fixed length.
+---------------
Got it. Now let's see if I can explain it to others who may not have:
My understanding of what Erik is suggesting [very strongly!] is that one should *NOT* try to invent any kind of direct "Lispified" or S-expr restatement of XML/HTML/SGML *syntax* per se, but instead to *parse* the XML document and choose convenient (potentially element-specific) CL representations for the parsed elements. This parsing process will involve filling in default values for omitted attributes, including those whose default is "#IMPLIED". Once you have done this parsing, there is nothing "optional" at all about any of the attributes -- you now have *all* of their values. [Whether you choose to explicitly store defaulted ones or not is a separate decision -- in any event you know their values.]
Now, having parsed the element and filled in the defaults, how you choose to represent it in CL data is pretty much up to you. One way might be as an instance of a CLOS class, with the attributes as slots [plus a slot for the sub-elements, if it's not an empty element]. This would allow you to use a generic function (print-element elem style) that specialized on both the element type and the desired output style to output completely different texts from the same parsed document.
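A minimal sketch of the CLOS idea, assuming a class `foo-element` with one attribute slot and a contents slot, and style designators `:xml` and `:text`. Only the name `print-element` and its two arguments come from the paragraph above; the class, the slot names, the styles, and the decision to return a string are all invented for the example.

```lisp
;; Hypothetical class for the parsed "foo" element: attributes are slots,
;; with the default filled in by the :initform.
(defclass foo-element ()
  ((bar      :initarg :bar      :initform 0   :accessor foo-bar)
   (contents :initarg :contents :initform '() :accessor foo-contents)))

(defgeneric print-element (element style)
  (:documentation "Render ELEMENT as a string in the given output STYLE."))

;; Specialized on both the element class and the style.
(defmethod print-element ((element foo-element) (style (eql :xml)))
  (format nil "<foo bar=\"~A\">~{~A~}</foo>"
          (foo-bar element) (foo-contents element)))

(defmethod print-element ((element foo-element) (style (eql :text)))
  (format nil "~{~A~}" (foo-contents element)))

(let ((elem (make-instance 'foo-element :bar 1 :contents '("hello"))))
  (list (print-element elem :xml)
        (print-element elem :text)))
;; => ("<foo bar=\"1\">hello</foo>" "hello")
```

The contents here are plain strings to keep the sketch short; a fuller version would presumably call `print-element` recursively on sub-elements.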
Another way is a simple list of the element name[*] followed by the values of the attributes (with or without attendant "keywords" to make them readable to humans debugging the program) followed by the rest of the contained elements (if any). Without any attribute markers at all, this might have a form similar in appearance (only!) to a function call with positional parameters, that is:
<foo bar="1"><bar>2</bar></foo>
after parsing might be internally represented as:
(foo 1 (bar 2))
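With the positional form, the resemblance to a function call means the standard `destructuring-bind` is enough to take an element apart. A small sketch, with the function name `describe-foo` made up for illustration:

```lisp
;; Hypothetical consumer of the positional form (foo <bar-value> . contents).
(defun describe-foo (element)
  "Pick apart a positional FOO element into a property list."
  (destructuring-bind (name bar &rest contents) element
    (list :name name :bar bar :contents contents)))

(describe-foo '(foo 1 (bar 2)))
;; => (:NAME FOO :BAR 1 :CONTENTS ((BAR 2)))
```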
Or if you choose to add some element-like structure to the attributes, you can do that, too. [You might choose to do that if (*ugh!* *shudder!*) some attributes contain further internal structure, and you'd like to represent the *parsed* version of that structure in a pleasing way.] That gets us to:
(foo (bar 1) (bar 2))
But again, since all of the application routines that have to deal with a "foo" element *know* that "foo" has a "bar" attribute, all of the code [that cares about attributes] knows that the CADADR is the attribute value and the CDDR is the content.
Now suppose that the application-implied value for the attribute "bar" is zero, and we are given this to parse:
<foo><bar>2</bar><bar>17</bar></foo>
What I (finally) heard Erik say is that the only reasonable internal representation for that (depending on whether you chose the "positional" or "element-like" representation for foo's attributes) would be one of these forms:
(foo 0 (bar 2) (bar 17))
(foo (bar 0) (bar 2) (bar 17))
That is, the structure of the CL representation *must* be invariant w.r.t. inclusion or omission of attributes in the source text. So in the second form, the CADADR is still the attribute value and the CDDR is still the content, even though the attribute was omitted in the source text.
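The invariance can be sketched with a constructor that always fills the attribute slot, so the accessors never have to ask whether the attribute appeared in the source. The default of zero is the application-implied value assumed above; the names `make-foo`, `foo-attribute-bar`, and `foo-children` are hypothetical.

```lisp
;; Assumed application-implied default for foo's BAR attribute.
(defparameter *foo-bar-default* 0)

(defun make-foo (contents &key (bar *foo-bar-default*))
  "Build the element-like form; the attribute entry is always present."
  `(foo (bar ,bar) ,@contents))

(defun foo-attribute-bar (foo) (cadadr foo)) ; the attribute value
(defun foo-children (foo) (cddr foo))        ; the contents

;; Attribute omitted in the source: <foo><bar>2</bar><bar>17</bar></foo>
(make-foo '((bar 2) (bar 17)))
;; => (FOO (BAR 0) (BAR 2) (BAR 17))

;; Attribute given in the source: <foo bar="1"><bar>2</bar></foo>
(make-foo '((bar 2)) :bar 1)
;; => (FOO (BAR 1) (BAR 2))
```

Either way the same two accessors apply, which is the whole point of parsing before representing.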
+---------------
| I also indicated that for pragmatic reasons, I sometimes use a marker to
| separate the attributes from the contents in the cdr of the element, such
| as when the task at hand would be wastefully slow if I were to deal with
| a fully parsed structure. Dirty hacks should be within reach because the
| world is sometimes not clean.
+---------------
I now understand & agree.
+---------------
| I am probably not going to get used to the habit of some people who
| see a problem in one part of a proposal and ignore the fact that there
| is a solution in another part of the same proposal (like the next
| paragraph), and I am certainly not patient enough with all the rampant
| idiocy in the SGML/XML world to explain this over and over, but please
| go back and read the whole message.
+---------------
I did, and that's when the light finally dawned, but I have to say that until one *does* finally understand it's not at all obvious. No, I don't know how you could have said it any more clearly. I can only say (from personal experience now!) that if one *ever* falls into the trap of trying to "Lispify" the *syntax* of XML instead of represent the *parsed* structure, it can be very hard to let go of that fixation.
Hmmm... Perhaps it's some sort of "figure/ground" thing, as in that classic picture <URL:http://www.lcsc.edu/ss150/u5s1p6.htm> used in gestalt psychology. If you see the young woman first, it's sometimes hard to then see the old hag (or vice versa). And one's history or prejudices may strongly affect which one you see first, e.g., young men tend to see the young woman first.
[Of course, once you've seen *both*, then it's much, much easier to flip your perception back and forth at will between them.]
-Rob
[*] That is, as I mentioned in my parallel reply to Kent, a CL symbol chosen to *represent* the XML element name, not necessarily or even desirably any automatic conversion of the XML element name to a CL symbol.
| That depends on whether you represent the parsed or pre-parsed
| structure. In a Common Lisp setting, we are dealing with parsed
| structure. If the attribute value is "implied" in the source, it
| still needs to be there in the parsed structure.
Aahh, I think this clears things up for me. I think I understand now. Thanks.
| I am probably not going to get used to the habit of some people who
| see a problem in one part of a proposal and ignore the fact that
| there is a solution in another part of the same proposal (like the
| next paragraph), and I am certainly not patient enough with all the
| rampant idiocy in the SGML/XML world to explain this over and over,
| but please go back and read the whole message.
I did. I had actually read the part about the marker before; somehow it just didn't enter my brain. I also really didn't realize that the attribute list _really_ is fixed length after parsing. Thanks for stressing your patience and explaining it again.