I have a system which currently reads an sexp based config file syntax, for which I need to provide (and in fact have provided) an alternative XML-based syntax for political reasons.
I'm wondering if anyone else has been through this and has run into the same problems I have and maybe can offer any solutions. To describe them I need to describe the current syntax slightly.
The config files are read in a `safe' reader environment (as safe as I can easily make it, which may not be really safe). After reading, they are validated by checking that everything read is of a good type, using an occurs check in case of circularity. A `good type' means pretty much non-dotted lists, strings, reals, keywords and a defined list of other symbols. At top level a file consists of conses whose cars are one of these good symbols.
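As a rough illustration of that validation pass (in Python rather than the original Lisp, which is not shown in the post), the sketch below walks a form and rejects anything that is not a good type. The `Symbol` and `Keyword` classes, the contents of `GOOD_SYMBOLS` and the function names are all assumptions made for the example; Python lists stand in for proper (non-dotted) lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Keyword:
    name: str


# Hypothetical whitelist of "good" symbols; a real system would fill this in
# from module declarations.
GOOD_SYMBOLS = {"load-patches", "$include", "$platform-case"}


def is_good(form, _seen=None):
    """Return True if FORM is built only from good types and is not circular."""
    seen = set() if _seen is None else _seen
    if isinstance(form, bool):
        return False  # bool is an int subclass in Python; reject it explicitly
    if isinstance(form, (str, int, float, Keyword)):
        return True
    if isinstance(form, Symbol):
        return form.name in GOOD_SYMBOLS
    if isinstance(form, list):
        if id(form) in seen:  # occurs check: this list contains itself
            return False
        seen.add(id(form))
        ok = all(is_good(item, seen) for item in form)
        seen.discard(id(form))  # shared but non-circular structure stays legal
        return ok
    return False


def is_good_toplevel(form):
    """A top-level form must be a non-empty list headed by a good symbol."""
    return (isinstance(form, list) and len(form) > 0
            and isinstance(form[0], Symbol) and is_good(form))
```

The `seen` set holds only the lists on the current path from the root, so a list that appears twice is accepted while a list that contains itself is not.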
Before anything else happens some metasyntax is expanded which allows file inclusion, and conditionalisation. This results in an `effective file' which may actually be the contents of several files. The metasyntax is just things like ($include ...) or ($platform-case ...).
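A minimal sketch of what such an expansion step might look like, again in Python. The post does not give the argument syntax of `$include` or `$platform-case`, so the clause layout, the `otherwise` fallback, the `load_file` callback and the use of plain strings for symbols are all assumptions. This version only expands metasyntax at top level, whereas the real system allows it almost everywhere.

```python
import sys


def expand(forms, load_file, platform=sys.platform):
    """Yield the 'effective file': FORMS with the metasyntax expanded away.

    load_file is a caller-supplied function mapping a file name to a list
    of already-read forms (hypothetical; the post does not describe it).
    """
    for form in forms:
        head = form[0] if isinstance(form, list) and form else None
        if head == "$include":
            # ($include "name" ...): splice in the expanded contents of each file.
            for name in form[1:]:
                yield from expand(load_file(name), load_file, platform)
        elif head == "$platform-case":
            # ($platform-case (platform form ...) ...): first matching clause wins.
            for clause in form[1:]:
                if clause[0] == platform or clause[0] == "otherwise":
                    yield from expand(clause[1:], load_file, platform)
                    break
        else:
            yield form
```

Because included files are expanded recursively, the caller sees one flat stream of ordinary forms regardless of how many files contributed to it.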
Finally, the resulting forms are passed to a handler function (this is a function passed to the config file reader) which gets to dispatch on the car of the forms, and do whatever it likes.
A top-level form is declared valid by declaring that its car is a `good symbol' (via a macro) and usually by defining a handler for it. In some cases the system wants all forms to be handled, but in many cases all it cares is that the form is `good' (it must be good for the first stage not to reject it) - this depends entirely on the handler.
The end result of this is that a module of the system can very easily declare a new config-file form to be valid and establish a handler for it, thus enabling it to get configured correctly at boot time or whenever else config files are read. The overall system does not have to care about anything other than making sure the files are read.
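The declare-and-dispatch arrangement described above could be sketched roughly as follows. This is an illustration in Python, not the system's actual macro; `declare_form`, `handles`, `dispatch` and the `require_handler` flag are hypothetical names.

```python
HANDLERS = {}          # car of a form -> handler function
GOOD_SYMBOLS = set()   # cars that the reader will accept at top level


def declare_form(name, handler=None):
    """Declare NAME a good top-level symbol, optionally with a handler."""
    GOOD_SYMBOLS.add(name)
    if handler is not None:
        HANDLERS[name] = handler


def handles(name):
    """Decorator form of declare_form, for use by modules."""
    def register(fn):
        declare_form(name, fn)
        return fn
    return register


def dispatch(forms, require_handler=False):
    """Pass each form to the handler registered for its car."""
    results = []
    for form in forms:
        car = form[0]
        if car not in GOOD_SYMBOLS:
            raise ValueError(f"not a good form: {car!r}")
        handler = HANDLERS.get(car)
        if handler is not None:
            results.append(handler(form))
        elif require_handler:
            raise LookupError(f"no handler for {car!r}")
    return results


# A module declares its own syntax without touching the reader.
@handles("load-patches")
def load_patches(form):
    return ("would load", form[1:])
```

The point of the design is that the reader only consults the two tables; a module adds a form by registering it, and nothing central has to change.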
(On top of this there's a reasonably trivial hook mechanism which can let modules run code before or after a config file is read or at other useful points, so they can, for instance, check that the configuration they needed actually happened.)
So I have to make something like this work with XML, and I have to do it without doubling the size of either my brain or the system - as far as I can see if I was to even read most of the vast encrustation of specifications that have accumulated around XML I'd need to do the former, both to make space for them and to invent a time machine so I can do it in time. If I was to actually use code implementing these specs then I'd definitely do the latter.
So what I'm doing instead is using the expat bindings done by Sunil Mishra & Kaelin Colclasure (thanks), writing a tiny tree-builder based on that, and then putting together a sort of medium-level syntax based on XML.
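For readers who want to see how small such a tree-builder can be, here is a sketch using Python's standard `xml.parsers.expat` bindings rather than the Lisp bindings mentioned above. The nested-list shape `[name, attrs, child, ...]` and the `*top*` placeholder are choices made for this example, not the representation used in the original system.

```python
import xml.parsers.expat


def parse_tree(text):
    """Parse XML TEXT into nested lists: [name, attrs, child, ...]."""
    root = ["*top*", {}]
    stack = [root]

    def start(name, attrs):
        node = [name, attrs]
        stack[-1].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        # Ignore whitespace-only runs between elements.
        if data.strip():
            stack[-1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True          # deliver adjacent text as one chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)           # raises ExpatError if not well-formed
    return root[2:]
```

Only well-formedness is checked here; no DTD is consulted, which matches the approach described in the post.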
Because I'm using expat I don't need to care about DTDs, just about well-formedness. But it would be kind of nice (the client thinks) to have DTDs, because it would be additional documentation.
But this seems to be really hard. Firstly, because of the metasyntax, the grammar is kind of not like anything I can easily describe (as a non-expert DTD writer). For instance almost any config file can have metasyntax almost everywhere in it. I could give up and have XML syntax which looks like:
<cons><car><string>...</string></car>...</cons>
or something, and write a DTD for that but this is obviously horrible.
Secondly, my system has modules. These modules want to be able to declare handlers of their own. One day *other people* might write these modules. It looks to me like any little module which currently, say, declares some syntax like:
(load-patches file ...)
now has to involve me in changing the DTD to allow (say)
<load-patches><file>...</file>...</load-patches>
This looks doomed.
When I skim the XML specs (doing more than this would require far longer than I have: and they've also now fallen through my good strong 19th century floor and killed several innocent bystanders in the floors below before finally coming to rest, smoking, embedded in the bedrock a few hundred yards under my flat) it looks like there is stuff to do with namespaces which looks like it might do what I want - it looks like I can essentially have multiple concurrent DTDs and declare which one is valid for a chunk by using namespaces. Then each module could declare its own little namespace. This is kind of complicated.
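Leaving validation aside entirely, namespaces can at least tell the reader which module owns an element. The sketch below uses expat's namespace processing (Python's `ParserCreate(namespace_separator=...)`) to route each child of the root element by namespace URI. The URIs, the `MODULE_HANDLERS` table and the handler signature are invented for the example.

```python
import xml.parsers.expat

# Hypothetical namespace URIs, one per module; nothing in the system defines these.
MODULE_HANDLERS = {
    "urn:example:patches": lambda local, text: ("patches", local, text),
    "urn:example:logging": lambda local, text: ("logging", local, text),
}


def dispatch_by_namespace(text):
    """Route each child of the root element to the module owning its namespace."""
    results, depth, current = [], 0, None

    def start(name, attrs):
        nonlocal depth, current
        depth += 1
        if depth == 2:                       # a direct child of the root element
            # With namespace processing on, expat reports "uri localname".
            uri, _, local = name.rpartition(" ")
            current = [uri, local, ""]

    def chars(data):
        if current is not None:
            current[2] += data

    def end(name):
        nonlocal depth, current
        if depth == 2:
            uri, local, body = current
            handler = MODULE_HANDLERS.get(uri)
            if handler is None:
                raise LookupError(f"no module owns namespace {uri!r}")
            results.append(handler(local, body.strip()))
            current = None
        depth -= 1

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(text, True)
    return results
```

This keeps the per-module extensibility of the sexp syntax, at the cost of every config file having to declare the namespaces it uses.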
Or I could just give up and not care about DTDs: the system doesn't actually care, so why should I? But then, is there any sense in which XML is more than an incredibly complex and somehow less functional version of sexprs? Surely it can't be this bad?
So really, I guess what I'm asking is: am I missing something really obvious here, or is it all really just a very hard and over-complex solution to a problem I've already solved?
* Tim Bradshaw
| So really, I guess what I'm asking is: am I missing something really
| obvious here, or is it all really just a very hard and over-complex
| solution to a problem I've already solved?
XML, being the single suckiest syntactic invention in the history of mankind, offers you several layers at which you can do exactly the same thing very differently, in fact so differently that it takes effort to see that they are even related.
<foo type="bar">zot</foo> actually defines three different views on the same thing: Whether what you are really after is foo, bar, or zot, depends on your application. XML is only an overly complex and otherwise meaningless exercise in syntactic noise around the message you want to send. Its notion of "structure" must be regarded as the same kind of useless baggage that comes with languages that have been designed by people who have completely failed to understand what syntax is all about. It is therefore a mistake to try to shoe-horn things into the "structure" that XML allows you to define.
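To make the three views concrete, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The function name `views` and the dictionary keys are only labels chosen for the illustration.

```python
import xml.etree.ElementTree as ET


def views(xml_text):
    """Return the three candidate 'application-level' names of one element."""
    elem = ET.fromstring(xml_text)
    return {
        "generic identifier": elem.tag,   # the element name
        "attribute": elem.get("type"),    # the value of its type attribute
        "content": elem.text,             # the text between the tags
    }


print(views('<foo type="bar">zot</foo>'))
```

Which of the three an application dispatches on is a design decision; the parser treats them all as equally available.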
In the above example, foo can be the application-level element, or it can be the syntax-level element and bar the application-level element. It is important to realize that SGML and XML offer a means to control only the generic identifier (foo) and their nesting, but that it is often important to use another attribute for the application. This was part of the reason for #FIXED in the attribute default specification and the purpose of omitting attributes from the actual tags. In my view, this is probably the only actually useful role that attributes can play, but there are other, much more elegant, ways to accomplish the same goal, but not within the SGML framework. Now, whether you use one of the parts of the markup, or use the contents of an element for your application is another design choice. The markup may only be useful for validation purposes, anyway.
The XML now contains all the syntax information of the "host" language. Many people think this is the _only_ granularity at which XML should be used, and they try to enforce as much structure as possible, which generally produces completely useless results and so brittle "documents" that they break as soon as anyone gets any idea at all for improvement.
The XML now contains only a "surface level" syntax and the meaning of the form elements is determined by the application, which discards or ignores the "form" element completely and looks only at the attributes. This way of doing things allows for some interesting extensibility that XML cannot do on its own, and for which XML was designed because people used SGML wrong, as in the first example.
The XML is now only a sugar-coating of syntax and the meaning of the entire construct is determined by the contents of the form elements, which are completely irrelevant after they have been parsed into a tree structure, which is very close to what we do with the parentheses in Common Lisp.
I hope this can resolve some of the problems of being forced to use XML, but in all likelihood, lots of people will object to anything but the finest granularity, even though it renders their use of XML so complex that their applications will generally fail to be useful at all. Such is the curse of a literally meaningless syntactic contraption whose verbosity is so enormous that people are loath to use simple solutions.
My preferred syntax these days is one where I use angle brackets instead of parentheses and let the symbol in the first position determine the parsing rules for the rest of that "form". It could be mistaken for XML if you are completely clueless, but then again, if you had any clue, you would not be using XML.
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
In the last exciting episode, Erik Naggum <e...@naggum.net> wrote:

> * Tim Bradshaw
> | So really, I guess what I'm asking is: am I missing something really
> | obvious here, or is it all really just a very hard and over-complex
> | solution to a problem I've already solved?
>
> XML, being the single suckiest syntactic invention in the history
> of mankind, offers you several layers at which you can do exactly
> the same thing very differently, in fact so differently that it
> takes effort to see that they are even related.
Wouldn't the embedding of quasi-XML-like functionality into HTML be considered to suck even worse?

--
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/finances.html
Giving up on assembly language was the apple in our Garden of Eden: Languages whose use squanders machine cycles are sinful. The LISP machine now permits LISP programmers to abandon bra and fig-leaf.
-- Epigrams in Programming, ACM SIGPLAN Sept. 1982
Erik Naggum <e...@naggum.net> writes:

> * Tim Bradshaw
> | So really, I guess what I'm asking is: am I missing something really
> | obvious here, or is it all really just a very hard and over-complex
> | solution to a problem I've already solved?
>
> XML, being the single suckiest syntactic invention in the history of
> mankind, offers you several layers at which you can do exactly the same
> thing very differently, in fact so differently that it takes effort to
> see that they are even related.
I don't think there's anything wrong with XML that a surgeon's knife, removing 80% (or more) of the standard's text, wouldn't fix.
IMO, what makes XML bad is not how little it does but how much it pretends to fix from what came before, yet without changing anything. If it had either attempted less or been willing to make actual changes, it might be respected more.
XML's lifeboat-like attempt to rescue all of SGML's functionality from drowning, yet without applying "lifeboat ethics" and tossing deadweight overboard (i.e., abandoning compatibility), seems to be the problem.
To quote Dr. Amar Bose (of Bose corporation fame): Better implies different.
Tim Bradshaw <t...@cley.com> writes:

> Or I could just give up and not care about DTDs: the system doesn't
> actually care, so why should I?
Give up and don't care about DTDs. Your posting gives a clearer explanation of your format than any DTD would. DTDs that are for humans to read have to be understandable, and if the DTD will be torturous then there is no point.

DTDs' other official purpose is separate validation, a dubious idea in my opinion. The application that finally processes an XML file will need to validate it on its own anyway, so what is the point of validation in advance?
> But then, is there any sense in which XML is more than an incredibly complex
> and somehow less functional version of sexprs? Surely it can't be this bad?
It's really that bad. XML does have the nice notion of support for various character encodings. There are tricks with namespaces you can do that seem more powerful, but on the whole things are confusing and error prone as holy hell.
> So really, I guess what I'm asking is: am I missing something really
> obvious here, or is it all really just a very hard and over-complex
> solution to a problem I've already solved?
You are not missing anything.
--
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
bl...@telus.net                                The Rhythm has my soul.
> XML, being the single suckiest syntactic invention in the history of
> mankind, offers you several layers at which you can do exactly the same
> thing very differently, in fact so differently that it takes effort to
> see that they are even related.
Believe it or not, there are things in actual operational use that syntactically suck worse than XML. Check out:
which describes Object Definition Language (ODL), developed by NASA/JPL in the early 90's to hold metadata for space data sets (primarily planetary probe data).
XML is what you get when you assign the nested property list problem to people who only know SGML. ODL is apparently what you get when you assign the same problem to people who only know FORTRAN.
ODL is the official standard metadata representation for data from the Earth Science Data and Information System, NASA's next generation observe-the-whole-earth data gathering project. I am currently working on a task to take ODL from this system and display it intelligibly. The current solution (chosen before I got here) is to take the ODL, convert it to XML, then bounce the XML off an XSLT stylesheet to generate HTML/Javascript.
So remember as you slog through yet another brain-damaged XML application - it could be worse.
* Ray Blaak <bl...@telus.net>
| DTDs other official purpose is for separate validation, a dubious idea in
| my opinion. The application that finally processes an XML file will need
| to validate it on its own anyway, so what is the point of validation in
| advance?
Remember when C was so young and machines so small that the compiler could not be expected to do everything and we all studiously ran "lint" on our programs? It was a fascinating time, I can tell you.
On Tue, 05 Mar 2002 16:20:55 GMT, Erik Naggum <e...@naggum.net> wrote:

> XML, being the single suckiest syntactic invention in the history of
> mankind,

APL.
Tim Bradshaw wrote:

> I have a system which currently reads an sexp based config file
> syntax, for which I need to provide (and in fact have provided) an
> alternative XML-based syntax for political reasons.

> But then, is there any sense in which
> XML is more than an incredibly complex and somehow less functional
> version of sexprs? Surely it can't be this bad?
XML is an incredibly complex and somehow less functional version of sexprs. It is that bad.
XML thoroughly sucks, but if you have to deal with it, there is an excellent Scheme library for dealing with it, and a defined mapping of the XML "infoset" to scheme, in the form of SXML. It'll go XML to sexprs and vice versa.
I know it's not common lisp, but, in theory, it could be ported with relatively little effort, and it should be food for thought.
About the only vaguely interesting features of XML to me are probably certain aspects of XML-Schema (the replacement for DTDs), and perhaps certain aspects of the extended hyperlinking (xlink/xpointer)
I've occasionally pondered the similarities of XML-Schema to syntax-rules in Scheme, giving some sort of datatyping-of-tree-structures-based-on-their-structure, or some similarly wooly concept - i.e. checking whether a given sexpr would match a given complicated macro definition is vaguely akin to validating an XML document against an XML schema.
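The analogy can be sketched as a tiny structural matcher: a pattern describes the shape a form must have, much as a syntax-rules pattern or a schema content model does. This is a Python illustration of the idea only. It is neither XML-Schema nor syntax-rules, and the pattern conventions (types match instances, strings match literally, a trailing `...` means zero or more) are assumptions made for the example.

```python
def matches(pattern, form):
    """Return True if FORM has the shape described by PATTERN.

    A pattern is a type (matches any instance of it), a literal string
    (matches only itself), or a list of sub-patterns. Inside a list, a
    sub-pattern followed by Ellipsis matches zero or more forms, as `...`
    does in syntax-rules; it is only supported as the last item here.
    """
    if isinstance(pattern, type):
        return isinstance(form, pattern)
    if isinstance(pattern, str):
        return form == pattern
    if not isinstance(pattern, list):
        raise TypeError(f"bad pattern: {pattern!r}")
    if not isinstance(form, list):
        return False
    if len(pattern) >= 2 and pattern[-1] is Ellipsis:
        *fixed, repeated, _ = pattern
        if len(form) < len(fixed):
            return False
        head, tail = form[:len(fixed)], form[len(fixed):]
        return (all(matches(p, f) for p, f in zip(fixed, head))
                and all(matches(repeated, f) for f in tail))
    return (len(pattern) == len(form)
            and all(matches(p, f) for p, f in zip(pattern, form)))


# Shape of (load-patches file ...): the head symbol, then any number of strings.
LOAD_PATCHES = ["load-patches", str, ...]
```

For instance, `matches(LOAD_PATCHES, ["load-patches", "a", "b"])` holds, while a form with a number in place of a file name does not match.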
Erik Naggum <e...@naggum.net> writes:

> Remember when C was so young and machines so small that the compiler
> could not be expected to do everything and we all studiously ran "lint"
> on our programs?
Probably I wasn't born yet, so what is "lint"?
> It was a fascinating time, I can tell you.
I'm sure. I love it when KMP (or someone else) talks about ancient (for me :) software or hardware (PDPs, VAX, TOPS, Lisp Machines, ITS and the like).
David Golden <qnivq.tby...@bprnaserr.arg> writes:

> XML thoroughly sucks, but if you have to deal with it, there is an
> excellent Scheme library for dealing with it, and a defined mapping
> of the XML "infoset" to scheme, in the form of SXML. It'll go XML
> to sexprs and vice versa.

> I know it's not common lisp, but, in theory, it could be ported with
> relatively little effort, and it should be food for thought.
If it's just about getting the job done maybe this will help:
>>>>> On Tue, 05 Mar 2002 20:49:27 GMT, Thaddeus L Olczyk ("Thaddeus") writes:
Thaddeus> On Tue, 05 Mar 2002 16:20:55 GMT, Erik Naggum <e...@naggum.net> wrote:
>> XML, being the single suckiest syntactic invention in the history of
>> mankind,
Thaddeus> APL.
APL syntax is simpler than that of Lisp. Do you program in APL?
olc...@interaccess.com (Thaddeus L Olczyk) writes:
> On Tue, 05 Mar 2002 16:20:55 GMT, Erik Naggum <e...@naggum.net> wrote:
> > XML, being the single suckiest syntactic invention in the history of
> > mankind,
> APL.
I beg to differ. APL *is* weird, but its syntax is amazingly simple and regular. It is the net effect that is unreadable. This net effect is due to the special glyphs required and to the fact that operators have different "semantics" if monadic or dyadic.
Cheers
-- Marco Antoniotti ======================================================== NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488 719 Broadway 12th Floor fax +1 - 212 - 995 4122 New York, NY 10003, USA http://bioinformatics.cat.nyu.edu "Hello New York! We'll do what we can!" Bill Murray in `Ghostbusters'.
Thaddeus L Olczyk wrote:

> On Tue, 05 Mar 2002 16:20:55 GMT, Erik Naggum <e...@naggum.net> wrote:
>> XML, being the single suckiest syntactic invention in the history of
>> mankind,
> APL.
Strange that you'd say that. Most people I know who like Lisp also like APL and Forth (if they know about them in the first place).
Both Forth and APL have simple, elegant, syntax. Kinda like... oh... Lisp...
Note that I'm not talking about asciified APL abominations, which are a royal pain in the backside to read... APL is unusual in that if you DON'T use single-symbol identifiers for things, it gets less readable.
Also, APL programs can look as indecipherable as idiomatic Perl - if you don't know the language. However, like Perl, if you take a little time to learn the language, it all makes much more sense (O.K. a little more sense...)
David Golden <qnivq.tby...@bprnaserr.arg> writes:

> Tim Bradshaw wrote:
>
> > I have a system which currently reads an sexp based config file
> > syntax, for which I need to provide (and in fact have provided) an
> > alternative XML-based syntax for political reasons.
>
> > But then, is there any sense in which
> > XML is more than an incredibly complex and somehow less functional
> > version of sexprs? Surely it can't be this bad?
>
> XML is an incredibly complex and somehow less functional
> version of sexprs. It is that bad.
>
> XML thoroughly sucks, but if you have to deal with it, there is an
> excellent Scheme library for dealing with it, and a defined mapping of the
> XML "infoset" to scheme, in the form of SXML. It'll go XML to sexprs and
> vice versa.
>
> I know it's not common lisp, but, in theory, it could be ported with
> relatively little effort, and it should be food for thought.
Of course, people who do not know Common Lisp are bound to mess things up.
How do you justify something written as
(*TOP*
 (urn:loc.gov:books:book
  (urn:loc.gov:books:title "Cheaper by the Dozen")
  (urn:ISBN:0-395-36341-6:number "1568491379")
  (urn:loc.gov:books:notes
   (urn:w3-org-ns:HTML:p "This is a "
    (urn:w3-org-ns:HTML:i "funny")
    " book!"))))

?
Cheers
* "Eduardo Muñoz" | Probably I wasn't born yet, so what is "lint"?
No big loss. "lint" was a program that would compare actual calls and definitions of pre-ANSI C functions because the language lacked support for prototypes, so header files were not enough to ensure consistency and coherence between separately compiled files, probably not even within the same file, if I recall correctly -- my 7th edition Unix documentation is in natural cold storage somewhere in the loft, and it is too goddamn cold tonight. "lint" also ensured that some of the more obvious problems in C were detected prior to compilation. It was effectively distributing the complexity of compilation among several programs because the compiler was unable to remember anything between each file it had compiled. ANSI C does not prescribe anything useful to be stored after compiling a file, either, so manual header file management is still necessary, even though this is probably the singularly most unnecessary thing programmers do in today's world of programming. "lint" lingers on.
Marco Antoniotti wrote:

> operators have different "semantics" if monadic or dyadic.
ah yes, I saw that in the K language, I was wondering what possessed them, thx for clearing that up. :)
--
kenny tilton clinisys, inc --------------------------------------------------------------- "Be the ball...be the ball...you're not being the ball, Danny." - Ty, Caddy Shack
* David Golden wrote:

> XML is an incredibly complex and somehow less functional
> version of sexprs. It is that bad.
Thanks for this and the other followups. I now feel kind of better about the whole thing.
The really disturbing thing is that huge investments in `web services' are being predicated on using XML, something which (a) is crap and (b) is so complicated that almost no-one will be able to use it correctly (`CORBA was too complicated and hard to use? hey, have XML, it's *even more* complicated and hard to use, it's bound to solve all your problems!'). Papers like The Economist are busy writing plausible-sounding articles about how all this stuff might be the next big thing.
> On Tue, 05 Mar 2002 16:20:55 GMT, Erik Naggum <e...@naggum.net> wrote:
>> XML, being the single suckiest syntactic invention in the history
>> of mankind,
> APL.
What's wrong with the syntax of APL?
If there's anything simpler and more regular than Lisp, it's APL.
It's fair to say that a lot of APL code depends on the "abuse" of quasi-perverse interpretations of matrix operations, but that's not syntax, that's "odd math."

--
(reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa"))
http://www3.sympatico.ca/cbbrowne/linuxxian.html
Oh, no. Not again. -- a bowl of petunias
In an attempt to throw the authorities off his trail, David Golden <qnivq.tby...@bprnaserr.arg> transmitted:
> XML thoroughly sucks, but if you have to deal with it, there is an
> excellent Scheme library for dealing with it, and a defined mapping
> of the XML "infoset" to scheme, in the form of SXML. It'll go XML
> to sexprs and vice versa.

> I know it's not common lisp, but, in theory, it could be ported with
> relatively little effort, and it should be food for thought.
When I have need to do so, I use Pierre Mai's C interface to expat. It uses the expat XML parser, and generates sexp output that can be read in using READ.
It would arguably be nicer to have something paralleling SAX which would generate closures and permit lazy evaluation. But I haven't found cases yet where the "brute force" of XML-READER was unsatisfactory to me.
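As a rough idea of what a lazier, SAX-like reader might look like, the sketch below yields one top-level form at a time instead of reading the whole document into a single sexp. It uses Python's standard `xml.etree.ElementTree.iterparse` rather than the Lisp XML-READER mentioned above, ignores attributes and trailing text for brevity, and the name `top_level_forms` is made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def top_level_forms(stream, depth=1):
    """Lazily yield each element at nesting DEPTH below the root as nested lists."""
    def to_form(elem):
        form = [elem.tag]
        if elem.text and elem.text.strip():
            form.append(elem.text.strip())
        form.extend(to_form(child) for child in elem)
        return form

    level = -1
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            level += 1
        else:
            if level == depth:
                yield to_form(elem)
                elem.clear()       # drop the subtree once it has been handed over
            level -= 1
```

A consumer can then handle and discard each form as it arrives, for example `for form in top_level_forms(open("config.xml", "rb")): ...`, where the file name is of course hypothetical.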
Note that this has the HIGHLY attractive feature of keeping all management of "ugliness" in a library (/usr/lib/libexpat.so.1) that is _widely_ used (including by such notables as Apache, Perl, Python, and PHP) so that it is likely to be kept _quite_ stable.
I'd argue that expat significantly beats doing some automagical conversion of Scheme code into CL...

--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://www3.sympatico.ca/cbbrowne/xml.html
Black holes are where God divided by zero.
Tim Bradshaw wrote:

> Papers like The Economist are busy writing
> plausible-sounding articles about how all this stuff might be the next
> big thing.
I haven't seen what the Economist has to say, but XML /will/ be the next big thing if it works out as a lingua franca for data exchange. Not saying XML does not suck from the syntax standpoint, just that syntax can be fixed or (more likely) hidden.
Kenny Tilton <ktil...@nyc.rr.com> writes:

> Marco Antoniotti wrote:
> > operators have different "semantics" if monadic or dyadic.
>
> ah yes, I saw that in the K language, I was wondering what possessed
> them, thx for clearing that up. :)
Yep. Turns out that K is a language that heavily borrows from APL.
Cheers
> * "Eduardo Muñoz"
> | Probably I wasn't born yet, so what is "lint"?
>
> No big loss. "lint" was a program that would compare actual calls
> and definitions of pre-ANSI C functions because the language lacked
> support for prototypes, so header files were not enough to ensure
> consistency and coherence between separately compiled files,
> probably not even within the same file, if I recall correctly --
> my 7th edition Unix documentation is in natural cold storage
> somewhere in the loft, and it is too goddamn cold tonight. "lint"
> also ensured that some of the more obvious problems in C were
> detected prior to compilation. It was effectively distributing
> the complexity of compilation among several programs because the
> compiler was unable to remember anything between each file it had
> compiled. ANSI C does not prescribe anything useful to be stored
> after compiling a file, either, so manual header file management
> is still necessary, even though this is probably the singularly
> most unnecessary thing programmers do in today's world of
> programming. "lint" lingers on.
There are new variations on lint, notably "LCLint" which has become "Splint" which stands for "Secure Programming Lint." It does quite a bit more than lint used to do.
Chances are that you'd be better off redeploying the code in OCAML where type signatures would catch a whole lot more mistakes...

--
(concatenate 'string "cbbrowne" "@acm.org")
http://www.ntlug.org/~cbbrowne/lisp.html
``What this means is that when people say, "The X11 folks should have done this, done that, or included this or that", they really should be saying "Hey, the X11 people were smart enough to allow me to add this, that and the other myself."'' -- David B. Lewis <d...@motifzone.com>
In the last exciting episode, Tim Bradshaw <t...@cley.com> wrote:
> * David Golden wrote:
>> XML is an incredibly complex and somehow less functional
>> version of sexprs. It is that bad.

> Thanks for this and the other followups. I now feel kind of better
> about the whole thing.

> The really disturbing thing is that huge investments in `web
> services' are being predicated on using XML, something which (a) is
> crap and (b) is so complicated that almost no-one will be able to
> use it correctly (`CORBA was too complicated and hard to use? hey,
> have XML, it's *even more* complicated and hard to use, it's bound
> to solve all your problems!'). Papers like The Economist are busy
> writing plausible-sounding articles about how all this stuff might
> be the next big thing.
The thing is, you don't actually _write_ any XML unless you're the guy writing the library/module/package that _implements_ XML-RPC/SOAP.
Here's a bit of Python that provides the "toy" of allowing you to submit simple arithmetic calculations to a SOAP server. (Of course, that's a preposterously silly thing to do, but it's easy to understand!)
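    # Functions the server exposes over SOAP.
    def add(a, b):
        return a + b

    def add_array(e):
        # Sum every element of the array passed by the client.
        total = 0
        for el in e:
            total = total + el
        return total

A bit of Perl that calls that might be thus:

    $a = 100;
    $b = 15.5;
    # Call the remote add() and extract the value from the response.
    $c = $soap->add($a, $b)->result;
    print $c, "\n";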
I've omitted some bits of "client/server setup," but there's no visible XML in any of that.
The problems with SOAP have to do with it being inefficient almost beyond the wildest dreams of 3Com, Cisco, and Intel (the main beneficiaries of the inefficiency in this case).
It should be unusual to need to look at the XML. Pretend it's like CORBA's IIOP, which you generally don't look too closely at.
The place where you _DO_ look at or write some XML is with the "WSDL" service description scheme, which is more or less similar to CORBA IDL.
But I'd think CLOS/MOP would provide some absolutely _WONDERFUL_ opportunities there; it ought to be possible to write some CL that would generate WSDL given references to classes and methods...

--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://www.ntlug.org/~cbbrowne/finances.html
I have this nagging fear that everyone is out to make me paranoid.