* james anderson | i have, quite coincidentally, recently received a note from a lisp user | who was confounded in his effort to use an application by problems with | symbol name case. it was most amusing that, in keeping with the | respective lisp vendor's documents on the subject, the enquiry was | phrased in terms of what one should do to make the application code | "portable". i had always presumed that "portable" would be confined to | runtime behaviour which itself conforms to "the standard".
Well, I for one intensely dislike the hijacking of "portably" to mean "compatible with both the standard and our violation of the standard".
If anyone wants a deviation from the standard, it should be made very, very explicit. I am working on a defsyntax form that takes care of many aspects of the reader and printer, among them symbol case. You identify the syntax of a source file with an in-syntax form and choose a syntax dynamically with with-syntax, all by name. These forms clearly require a preceding in-package, too, so that a file with neither still uses standard syntax, and a file with a deviating syntax identifies itself.
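A minimal sketch of the named-syntax idea, under loud assumptions: DEFSYNTAX, IN-SYNTAX, WITH-SYNTAX and FIND-SYNTAX below are hypothetical stand-ins, not the forms being described, and this version registers only a readtable per name, ignoring the printer side. It assumes a syntax has been defined (loaded) before any file that names it in IN-SYNTAX is compiled.

```lisp
;;; Hypothetical registry of named syntaxes (here: readtables only).
(defvar *syntaxes* (make-hash-table :test 'equal)
  "Maps a syntax name (a string) to a readtable.")

(defun find-syntax (name)
  "Return the readtable registered under NAME (a string designator)."
  (or (gethash (string name) *syntaxes*)
      (error "No syntax named ~S." name)))

(defmacro defsyntax (name &key ((:readtable-case mode) :upcase))
  "Register a copy of the standard readtable, with case MODE, under NAME."
  `(let ((rt (copy-readtable nil)))
     (setf (readtable-case rt) ,mode)
     (setf (gethash (string ',name) *syntaxes*) rt)
     ',name))

(defmacro in-syntax (name)
  "Like IN-PACKAGE: set *READTABLE* for the rest of the file being
compiled or loaded."
  `(eval-when (:compile-toplevel :load-toplevel :execute)
     (setf *readtable* (find-syntax ',name))))

(defmacro with-syntax ((name) &body body)
  "Bind *READTABLE* to the named syntax around BODY."
  `(let ((*readtable* (find-syntax ',name)))
     ,@body))

;; Files without IN-SYNTAX keep the standard syntax.
(setf (gethash "STANDARD" *syntaxes*) (copy-readtable nil))
```

IN-SYNTAX can simply SETF *READTABLE* because COMPILE-FILE and LOAD rebind it around each file, just as they do *PACKAGE*.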
My goal has been to ensure that all conforming code would work the same, but I am stuck with macros that define symbols. Suppose you have the form (defstruct EvilName ...). As long as you use monocase symbol names, this works just fine when the macro definition captures the syntax in which it was defined, but if the macro builds the constructor name with the string "MAKE-", you get MAKE-EvilName, whereas (defstruct good-name ...) gives you make-good-name as expected. With the aid of hindsight from CLOS, macros that generate symbol names are bad karma; virtually the only hope is that programmers do not use them without understanding how they work. I remain somewhat uncertain whether this pathological case is worth solving, but there is a snag here in that it matters whether you do early or late symbol-name extraction when you use that string to form other symbols. (This is why Franz Inc suggests that late symbol-name extraction is preferable. I am not sure I agree, but it would be a much better proposal with interned-string-style keywords as the names of symbols, even though that would not help against mixing conventions.)
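The EvilName problem can be reproduced with a readtable-case of :INVERT. The helpers CONSTRUCTOR-NAME and READ-INVERTED are hypothetical; CONSTRUCTOR-NAME only mimics the rule by which DEFSTRUCT forms its default constructor name (the string "MAKE-" concatenated with the structure name, interned in the current package).

```lisp
(defun constructor-name (struct-name)
  "Mimic DEFSTRUCT's default constructor name: \"MAKE-\" concatenated
with the structure's name, interned in the current package."
  (values (intern (concatenate 'string "MAKE-" (symbol-name struct-name)))))

(defun read-inverted (string)
  "Read STRING with a copy of the standard readtable set to :INVERT."
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :invert)
    (read-from-string string)))
```

Under :INVERT, a user typing make-good-name reads MAKE-GOOD-NAME and finds the constructor, while make-EvilName is mixed-case, is left alone, and names a different symbol than |MAKE-EvilName|.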
Generally, I think conventions like FooBarZot should be supported with automatic conversion to and from foo-bar-zot. I find the FooBarZot convention so horribly disgusting that it has seriously detracted from solving the case-sensitive lower-case Common Lisp problem. I have even made Emacs help me not see that crap in Java.
| in order for this to work, intern, defpackage, and read/print should all | have readily predictable effects as to the internal and external | representations of objects which they manipulate. i would have been less | likely to use that method if it were not trivially possible to predict | the internal symbol name case in a conforming lisp.
The solution to the whole problem is to make the readtable hold slots for intern, find-symbol, and symbol-name, so that the readtable determines whether (symbol-name 'car) returns "CAR" or "car". Switch to another readtable and you get what you expect. Good Common Lisp style mandates that you do not create dependencies on user-settable reader and printer control variables. (Franz Inc unfortunately does a shoddy job in this area, and this, I believe, is because the case-sensitive lower-case "solution" is there for them to point users at, which is another reason why this form of deviation is so very harmful and causes much unfair distrust in the standard.)
| aside from my general curiosity about the possible advantages of | permitting the programmer to predict the effects of thirty-six possible | combinations of symbol case, readtable case, and print case, i am | concerned whether everything which can be expressed in any given mode | can also be expressed "portably".
Unless "portably" here is the hijacked Franz meaning, the solution is quite simple: wrap your printing and reading in with-standard-io-syntax.
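A minimal illustration of that advice. PRINT-STANDARD and READ-STANDARD are hypothetical helper names; WITH-STANDARD-IO-SYNTAX itself is standard and binds the reader and printer variables, including *READTABLE*, *PACKAGE* and *PRINT-CASE*, to their standard values.

```lisp
(defun print-standard (object)
  "PRIN1 OBJECT to a string with the standard printer settings, whatever
the caller has done to *PRINT-CASE*, *READTABLE* and friends."
  (with-standard-io-syntax
    (prin1-to-string object)))

(defun read-standard (string)
  "READ from STRING with the standard readtable (readtable-case :UPCASE).
WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so turn it off here for
data that did not come from a trusted source."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (read-from-string string))))
```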
| has there been any discussion on that question?
As far as I can tell, the Franz Inc stance on lower-case symbol names is highly irrational and is based on personal animosity towards people, no longer on anything actually technical -- if it were, they would have researched this and done something far better than they have, and far less destructive. I mean, when they wanted to support CORBA, they did it amazingly well. When they wanted to support Java, they did it amazingly well. When they wanted to support Unicode, they did it amazingly well. Botching something so central is clearly not due to incompetence, which in my eyes makes it much more than a nuisance.
That said, I also think the internal symbol-names in Common Lisp should have been lower-case, the case of the symbol when interned should have been stored, and possibly even that matches that differ in case should produce a warning if the user so desired. But that was not the way it went, and if we want to change anything, it is vital that the decision that was made is respected and honored, or any new decision will not be, either. In order to make a transition actually work, we must be just as careful as when Internet mail went from 7-bit to 8-bit characters.
MIME is the result of a clearly erroneous decision _not_ to support 8-bit in Internet mail, dating back to the 7-bit-only characters on the computers that were on the ARPAnet in 1973. MIME is horribly ugly because of this ancient 7-bit restriction, but we can now send 8-bit text with ESMTP, which itself depends on MIME for the appropriate character set. All the same, we still allow only 7-bit characters in the headers, leading to one of the most god-awful syntactic inventions since Perl. This is the kind of thing that happens when an old decision is crippling the future. However, this should have been invisible to users of reasonable-quality mail software. When it is not, when people are actually shown this crap instead of the information it represents, they object vociferously.
Back in the early days, there were a lot of people who argued that we should "just send eight bits", causing massive grief around the world even though it sort of worked most of the time -- but only within a group of people who used the same 8-bit character set. Venture outside that group, and you lose big time with the "just send eight bits" line. In order to make sure that people of different value systems or concrete values can work together, more elaborate systems need to be set up than those that work for a single person or homogeneous group.
Respect for this more complicated system and the procedures necessary to arrive at it is at odds with the desire to "do it my way" and be satisfied with "it works for me". Actually achieving an improved standard or a workable change as a first step that satisfies all sides is very hard work.
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
* Kent M Pitman | This is just not a difficult problem. | SUGGESTION: JUST DON'T USE NON-PORTABLE STUFF.
This is the wrong response to the situation. It is often necessary to use non-portable stuff to get the job done. It is not even sufficient to say "non-standard", because many interesting things are not standardized. To make this as precise as possible, I think the most _accurate_ term should be "anti-standard" or "counter-standard". If the standard mandates something, do it that way. If no standard mandates something, do it any way you want.
| There is simply no good programming need for non-portable naming. It is | a contrived problem.
I actually disagree on this. The problem is very real: when you set up a case-sensitive reader, and that _is_ an option in the standard, the only way to type lower-case symbol names and get the standard symbols is to use :invert, which only inverts monocase symbol names. This means that (symbol-name <whatever>) may be exactly what you want, or the inverted version of what you want if it happened to be monocase. The only option is to use some other function than symbol-name to retrieve the name of the symbol, and then you need some other function than intern to make a new symbol, which effectively inverts the name before calling the regular intern. This is an annoying waste. As a user-code programmer, the only solution is to make your own abstraction for symbols, which leads to messy interaction with other systems. As a systems-code programmer, you should instead provide the necessary features and facilities to choose a lower-case _view_ of the standard symbols and their names. This is what I have been working with. In any case, the _actual_ internal symbol name is immaterial as long as cl:intern and ecl:symbol-name both work with upper-case symbol names for the common-lisp package. What it _actually_ is underneath should not concern anyone: that is _not_ what the standard mandates.
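The "other function than symbol-name" and "other function than intern" described above might look like this, assuming :INVERT is the reader setting being mirrored. INVERT-NAME, INVERTED-SYMBOL-NAME and INVERTED-INTERN are hypothetical names, and the sketch treats every letter in a name as unescaped.

```lisp
(defun invert-name (name)
  "Apply the :INVERT rule to the string NAME: if all its letters have the
same case, flip them; leave mixed-case names alone."
  (cond ((string= name (string-upcase name)) (string-downcase name))
        ((string= name (string-downcase name)) (string-upcase name))
        (t name)))

(defun inverted-symbol-name (symbol)
  "The name a user of an :INVERT readtable types for SYMBOL,
e.g. \"car\" for CL:CAR."
  (invert-name (symbol-name symbol)))

(defun inverted-intern (name &optional (package *package*))
  "INTERN the symbol that an :INVERT reader would produce from NAME."
  (intern (invert-name name) package))
```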
* Duane Rettig | The documents we write about such case-portability are designed to show | a programmer how to write in a style that will work in both implementations.
But standard, conforming code no longer works when you use the standard symbol-names in strings that are used to name symbols. Code that does a fraction of what the reader does, but which upcases the string in order to conform to the standard, will fail miserably. Protocols that specify uppercase names and use that directly to map to symbols will not work if the symbols are used in the program. This is wrong. This is bad. This has nothing to do with how you represent or deal with the symbol-names of symbols internal to the program. It has everything to do with how you treat _external_ input. Not being able to implement your _own_ reader because the implementation uses stealth case switching is fantastically hostile and exposes a decision that should be tacit and invisible to full view and discussion, like we have here periodically.
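An example of the kind of conforming code meant here, with hypothetical function names: it does a small fraction of what the reader does and relies on the standard's upper-case names for the symbols in the COMMON-LISP package. In an implementation that silently stored those names in lower case, STANDARD-SYMBOL would find nothing.

```lisp
(defun protocol-keyword (token)
  "Do a small fraction of what the standard reader does: upcase TOKEN,
as readtable-case :UPCASE would, and intern it as a keyword."
  (values (intern (string-upcase token) "KEYWORD")))

(defun standard-symbol (name)
  "Map a protocol name such as \"car\" or \"CAR\" to the external symbol
of that name in COMMON-LISP, or NIL if there is none."
  (multiple-value-bind (symbol status)
      (find-symbol (string-upcase name) "COMMON-LISP")
    (and (eq status :external) symbol)))
```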
Put slightly differently, you force every programmer to be _aware_ of the case issue when they should instead just _know_ what to do and expect. This is not unlike how you train soldiers to march. If you always start on one foot, you never have a problem -- they just get it right whether they are confused about left or right or not, but if you have an option to start on either foot, you have to communicate "left" or "right" at some point, and lots of people confuse them, so you get an inevitable mess out of just _creating_ the option, even if you think you solve something by offering a choice because you believe that the particular choice is no good. The whole key is to _accept_ that an arbitrary decision has to be made, make it, and move on with it as a declared constant, not keep it a variable as if you could undo the decision.
Now, I sympathize with the problems that do occur when you want to use the symbol-name for something else, but a readtable-case of :invert and an additional inverter in, say, the FFI code, _does_ take care of it.
However, I believe that programmers should also have a choice, but a _visible_ one that they cannot escape noticing. A source or object file does not know which Allegro CL image is used to load it and it should not have to know, either, but if the source file _actually_ and _explicitly_ requests a particular readtable, it knows. This is the same argument that the SGML community had with respect to the SGML declaration. I helped make it possible to use named declarations instead of omitting the declaration, which everybody did because the whole declaration had to be inline.
To this end, Allegro CL's support for named readtables is _very_ nice. Concomitant support for the in-syntax macro that Kent Pitman once offered would have been most welcome.
Duane Rettig <du...@franz.com> writes: > > You are working outside of the spec.
> You can't be sure of this. Mr. Anderson might not even be able to > be sure of this, because he is posting on behalf of another, who > should be contacting the vendor for support.
I didn't mean to be suggesting otherwise. Although not explicit, I guess I meant in fact that this was his main and best option.
> > As far as I'm concerned, you're on your own.
> This is not true. Mr. Anderson (or his "original party") should contact > the vendor for which they are having any problems.
My remark wasn't intended to be in conflict with yours, though I guess I was vague on this.
I just mean _I_ wasn't going to volunteer on this. (And probably that I wasn't alone.) But that's not to say no one would volunteer, especially since indeed some vendors have support.
the responses have convinced me that matters of symbol name case are not trivial and that i should presume as little as possible about internal symbol representation.
the problematic program generates marshalling functions and macros from document type definitions and schema. these functions include numerous pairs, one a macro and one a generic function, which produce an encoded markup stream. they differ primarily in that the macro version makes more decisions at compile time about what and how to encode things. i had been generating the respective symbols in the same package, inverting the case of one or the other of them in accord with readtable case. i believe that method is portable. despite that, it is not as robust as it could be.
it would be better to allow one to specify destination packages and cases for the respective symbols when the marshalling functions are generated from the definitions. the default settings would be carried over from readtable case, but it would be possible to accommodate other approaches.
* james anderson <james.ander...@setf.de> | the responses have convinced me that matters of symbol name case are not | trivial and that i should presume as little as possible about internal | symbol representation.
Well, while this is a valid conclusion, I think you should just know that unless you work _very_ hard to make it otherwise, the standard mandates upper-case internal symbol names, and you should _never_ expect to find a symbol in all lower-case in the package named "COMMON-LISP".
We have no mechanism that could support symbol forwarding in Common Lisp, although it could have been a good thing to have, such that you could rename symbols upon import to a new package. Such a feature could have been used to make lower-case "clones" of symbols. (One _could_ hack the package hashtables to fake this effort.)
| the problematic program generates marshalling functions and macros from | document type definitions and schema. these functions include numerous | pairs, one a macro and one a generic function, which produce an encoded | markup stream. they differ primarily in that the macro version makes | more decisions at compile time about what and how to encode things. i had | been generating the respective symbols in the same package, inverting the | case of one or the other of them in accord with readtable case. i | believe that method is portable. despite that, it is not as robust as it | could be.
Well, it is. The readtable case is not where you want to look at this.
| it would be better to allow one to specify destination packages and cases | for the respective symbols when the marshalling functions are generated | from the definitions. the default settings would be carried over from | readtable case, but it would be possible to accommodate other approaches.
"Destination packages"? Hm. This weird expression _may_ indicate a lack of understanding of the package system. I hope you are aware that the package into which the reader interns symbols that have no package qualification is controlled by the value of the special variable *package*. If you are not using the reader to read Common Lisp code with symbols that (may) live in the common-lisp package, you may use any case you want for your own symbols and adjust the readtable-case so it fits your needs. There is nothing in Common Lisp that requires your own symbols to be in upper-case, it is just massively convenient when also working with symbols in the common-lisp package.
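A small sketch of both points, using a hypothetical package named MARKUP-NAMES and hypothetical helper names: *PACKAGE* decides where unqualified symbols are interned, and a readtable-case of :PRESERVE lets your own symbols keep whatever case you type.

```lisp
;;; Hypothetical package for names read from markup; it uses nothing.
(defpackage "MARKUP-NAMES" (:use))

(defun read-name-into (string package)
  "Read STRING with *PACKAGE* bound to PACKAGE, so an unqualified symbol
is interned there. The current readtable's case still applies."
  (let ((*package* (find-package package))
        (*read-eval* nil))
    (read-from-string string)))

(defun read-name-preserving (string package)
  "Like READ-NAME-INTO, but with a readtable-case of :PRESERVE, so your
own symbols keep the case you typed."
  (let ((*package* (find-package package))
        (*readtable* (copy-readtable nil))
        (*read-eval* nil))
    (setf (readtable-case *readtable*) :preserve)
    (read-from-string string)))
```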
> * james anderson <james.ander...@setf.de> > | the responses have convinced me that matters of symbol name case are not > | trivial and that i should presume as little as possible about internal > | symbol representation.
> Well, while this is a valid conclusion, I think you should just know that > unless you work _very_ hard to make it otherwise, the standard mandates > upper-case internal symbol names, and you should _never_ expect to find a > symbol in all lower-case in the package named "COMMON-LISP".
i am not concerned directly with symbols in the "COMMON-LISP" package. i am concerned that, when a program generation utility creates symbols in other packages, it do so in a way which accommodates a similar expectation with respect to symbols in user packages, but does not fail if the expectation is not met.
> We have no mechanism that could support symbol forwarding in Common Lisp, > although it could have been a good thing to have, such that you could > rename symbols upon import to a new package. Such a feature could have > been used to make lower-case "clones" of symbols. (One _could_ hack the > package hashtables to fake this effort.)
> | the problematic program generates marshalling functions and macros from > | document type definitions and schema. these functions include numerous > | pairs, one a macro and one a generic function, which produce an encoded > | markup stream. they differ primarily in that the macro version makes > | more decisions at compile time about what and how to encode things. i had > | been generating the respective symbols in the same package, inverting the > | case of one or the other of them in accord with readtable case. i > | believe that method is portable. despite that, it is not as robust as it > | could be.
> Well, it is.
was "it" intended to mean "portable", "as robust as it could be", or "not as robust as it could be"?
> The readtable case is not where you want to look at this.
why not? (please see below.)
> | it would be better to allow one to specify destination packages and cases > | for the respective symbols when the marshalling functions are generated > | from the definitions. the default settings would be carried over from > | readtable case, but it would be possible to accommodate other approaches.
> "Destination packages"? Hm. This weird expression _may_ indicate a lack > of understanding of the package system.
the program in question has until now taken things like this
<!ELEMENT html (head, body) >
<!ATTLIST html
  xmlns     CDATA      #FIXED 'http://www.w3.org/1999/xhtml'
  id        ID         #IMPLIED
  dir       (ltr|rtl)  #IMPLIED
  xml:lang  NMTOKEN    #IMPLIED
  lang      NMTOKEN    #IMPLIED
  >
and generated code which included things roughly like this
(DEFMACRO |http://www.w3.org/1999/xhtml|:|html|
          ((&KEY (XMLNS "http://www.w3.org/1999/xhtml") ID DIR |xml|::LANG LANG)
           &REST BODY)
  (LIST* 'XML
         (CONS '{xhtml}html
               (REMOVE NIL
                       (LIST (WHEN XMLNS (LIST '{xmlns}|| XMLNS))
                             (WHEN ID (LIST '{}id ID))
                             (WHEN DIR (LIST '{}dir DIR))
                             (WHEN |xml|::LANG (LIST '{xml}lang |xml|::LANG))
                             (WHEN LANG (LIST '{}lang LANG)))))
         BODY))
(DEFGENERIC |http://www.w3.org/1999/xhtml|::HTML (DATUM &KEY XMLNS ID DIR |xml|::LANG LANG)
  (:METHOD ((GENERATOR FUNCTION)
            &KEY (XMLNS *XHTML-NAMESPACE-NAME*) ID DIR |xml|::LANG LANG)
    (|xhtml|:|html| (:XMLNS XMLNS :ID ID :DIR DIR :|xml|::LANG |xml|::LANG :LANG LANG)
      (FUNCALL GENERATOR)))
  (:METHOD ((DATUM T) &REST ARGS)
    (DECLARE (DYNAMIC-EXTENT ARGS))
    (APPLY #'|http://www.w3.org/1999/xhtml|::HTML
           #'(LAMBDA NIL (ENCODE-NODE DATUM))
           ARGS)))
where XML is a macro which performs primitive element encoding.
the symbols for function, macro, and parameter bindings were generated from document names as follows:
first, select a so-called "destination" package: if the name specified a namespace, use a package with the same name; if an attribute name specified no namespace, then use the package bound to *package*; if an element name specified no namespace, then use the name of the default namespace or, if no default namespace had been specified, then use the package bound to *package*.
second, select a case for the symbol name: an attribute name is transformed or preserved in a manner consistent with readtable case; a function name is also transformed or preserved in a manner consistent with readtable case; a macro name is transformed so as to be the opposite case of the function name.
third, transform the name case as specified and intern the result in the destination package.
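The three steps might be sketched as follows. All function names are hypothetical, the namespace string is used directly as the package name (as in the generated code above), and missing packages are created using nothing.

```lisp
(defun destination-package (namespace kind default-namespace)
  "Step 1: NAMESPACE is the name's namespace (a string) or NIL; KIND is
:ELEMENT or :ATTRIBUTE. Unqualified elements fall back to DEFAULT-NAMESPACE;
anything still without a namespace goes to *PACKAGE*."
  (let ((namespace (or namespace
                       (and (eq kind :element) default-namespace))))
    (if namespace
        (or (find-package namespace) (make-package namespace :use '()))
        *package*)))

(defun reader-case (name)
  "Step 2a: transform NAME as the current readtable case would."
  (ecase (readtable-case *readtable*)
    (:upcase (string-upcase name))
    (:downcase (string-downcase name))
    (:preserve name)
    (:invert (cond ((string= name (string-downcase name)) (string-upcase name))
                   ((string= name (string-upcase name)) (string-downcase name))
                   (t name)))))

(defun opposite-case (name)
  "Step 2b: give the macro name the opposite case of the function name."
  (if (string= name (string-upcase name))
      (string-downcase name)
      (string-upcase name)))

(defun element-symbols (local-name namespace default-namespace)
  "Step 3: intern the function and macro names for an element in its
destination package. Returns the function symbol and the macro symbol."
  (let* ((package (destination-package namespace :element default-namespace))
         (function-name (reader-case local-name)))
    (values (intern function-name package)
            (intern (opposite-case function-name) package))))
```

With the standard readtable-case :UPCASE, "html" yields a function named HTML and a macro named |html| in the package named by the namespace, matching the generated code above; it also shows the mixed-case trouble, since "FooBar" and "foobar" collapse into the same symbols.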
(that this fails where the document definition includes mixed-case names is another problem.)
> I hope you are aware that the > way to control the package into which the reader will intern symbols when > they have no package qualification is controlled by the value of the > special variable *package*.
i did not want to use the reader. i suppose, since the names are produced by parsing a document definition and should be fairly safe, that concern may be unwarranted.
> If you are not using the reader to read > Common Lisp code with symbols that (may) live in the common-lisp package, > you may use any case you want for your own symbols and adjust the > readtable-case so it fits your needs. There is nothing in Common Lisp > that requires your own symbols to be in upper-case, it is just massively > convenient when also working with symbols in the common-lisp package.
there is nothing in common lisp that requires it, but it would be good if the symbol names followed the expectation which you mentioned above. i am trying it out now with independent options for destination packages for macro and function names and for the respective case transformations. the latter options use the readtable case as a default, but may be specified either as one of the readtable case keywords or as a function. at the moment, i upcase the names, since that's my readtable case setting, and intern the macro and function names in distinct packages, for example "_xhtml" and "xhtml".
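The option handling described here could look like the following. CASE-TRANSFORM and GENERATE-NAME are hypothetical; NIL stands for "default to the current readtable case", and unknown packages are created using nothing.

```lisp
(defun case-transform (option)
  "Return a string-to-string function for OPTION: a function, one of the
readtable-case keywords, or NIL for the current readtable case."
  (let ((mode (or option (readtable-case *readtable*))))
    (if (functionp mode)
        mode
        (ecase mode
          (:upcase #'string-upcase)
          (:downcase #'string-downcase)
          (:preserve #'identity)
          (:invert (lambda (name)
                     ;; Flip monocase names, keep mixed-case ones.
                     (cond ((string= name (string-downcase name)) (string-upcase name))
                           ((string= name (string-upcase name)) (string-downcase name))
                           (t name))))))))

(defun generate-name (name package-name case-option)
  "Intern NAME, transformed by CASE-OPTION, in the package PACKAGE-NAME,
creating that package if it does not exist yet."
  (intern (funcall (case-transform case-option) name)
          (or (find-package package-name)
              (make-package package-name :use '()))))
```

Macro and function names then get independent settings, e.g. (generate-name "html" "_xhtml" :downcase) and (generate-name "html" "xhtml" nil).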
why would it be ill-advised to use readtable case as the default?
* james anderson <james.ander...@setf.de> | i am not concerned directly with symbols in the "COMMON-LISP" package. i | am concerned that, when a program generation utility creates symbols in | other packages, it do so in a way which accommodates a similar | expectation with respect to symbols in user packages, but does not fail | if the expectation is not met.
I need to sort out a few things to make sure I understand what you are talking about.
(1) The expectations you want met are not simply that symbols in the common-lisp package are all upper-case and that the reader and printer may produce other cases. If they were, I would conclude that you have no problem at all, since you control the reader and printer. But you give information that contradicts this position, i.e., you continue to have problems.
(2) If the expectation is not met, the only way that could happen, as I see your situation, is precisely if the symbols in the common-lisp package were not all upper-case. You appear to claim that this is not the problem.
(3) I sense a confusion in how you believe the reader and printer control variables affect the reading and printing of symbols. This sense is strongly reinforced by the fact that you neglect to tell us what your settings are.
| and generated code which included things roughly like this
Is this output from something? The reason I ask is that it appears more than unlikely that any of {xmlns}||, {}id, {}dir, {xml}lang, or {}lang would survive a printer that would both need to write |html| and also be happy with :use and :nicknames.
| second, select a case for the symbol name: | an attribute name is transformed or preserved in manner consistent with | readtable case
Why? The readtable-case is completely irrelevant to you. It concerns the mapping from input string to symbol name, but if this is SGML, you already know that the names are case-insensitive and you can canonicalize any way _you_ want, or if this is XML, you already know that the names are case-sensitive, so you should perform no transformations at all.
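For XML, that amounts to interning the local name exactly as written. XML-NAME-SYMBOL, PRINT-AND-REREAD and the package XML-NAMES-DEMO are hypothetical; the point is that under standard syntax the printer escapes lower-case names (|html|), so they read back as the same symbol without any case transformation.

```lisp
;;; Hypothetical package holding names from one XML namespace.
(defpackage "XML-NAMES-DEMO" (:use))

(defun xml-name-symbol (local-name package)
  "Intern an XML local name exactly as written: XML names are
case-sensitive, so no case transformation is applied."
  (values (intern local-name package)))

(defun print-and-reread (symbol)
  "Print SYMBOL with standard syntax and read the result back."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (read-from-string (prin1-to-string symbol)))))
```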
| a function name is also transformed or preserved in a manner consistent | with readtable case | a macro name is transformed so as to be the opposite case of the | function name.
This part is _really_ weird. The macro appears to be a completely useless massive complication of the whole node creation process. Would you call it on its own? Why a macro in the first place? Why make a new macro for each element when one can do for all? The whole point of the macro seems to be to convert an argument property list to an association list where all properties with a nil (= #IMPLIED?) value are removed while converting from a keyword to some weird internal symbol. This appears to me to be an excellent case for data-driven programming, not code-driven.
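A data-driven version of what the generated macro does might look like this. The table *ELEMENT-ATTRIBUTES*, its (keyword . attribute-name) format, the :XML-LANG keyword, the :XML marker and the function names are all assumptions for illustration, loosely modeled on the XHTML example earlier in the thread.

```lisp
(defparameter *element-attributes*
  '(("html" (:xmlns . "xmlns") (:id . "id") (:dir . "dir")
            (:xml-lang . "xml:lang") (:lang . "lang")))
  "Maps an element name to (keyword . attribute-name) pairs, as a DTD
parser might produce them.")

(defun element-attributes (element plist)
  "Turn the keyword argument PLIST for ELEMENT into an alist of
(attribute-name . value), dropping NIL values (absent #IMPLIED attributes)."
  (let ((table (cdr (assoc element *element-attributes* :test #'string=))))
    (loop for (key value) on plist by #'cddr
          for entry = (assoc key table)
          unless entry
            do (error "~S is not an attribute of ~A." key element)
          when value
            collect (cons (cdr entry) value))))

(defun encode-element (element plist body)
  "Build the same kind of node the generated macro builds, for any element."
  (list* :xml (cons element (element-attributes element plist)) body))
```

One table and one function serve every element, so adding an element means adding data, not generating another macro and generic function.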
| (that this fails where the document definition includes mixed case name | is another problem.)
Well, it is a pretty strong indicator that you screwed up the design.
| i did not want to use the reader. i suppose, since the names are | produced by parsing a document definition and should be fairly safe, | that concern may be unwarranted.
If this is so, where does the reader come in at all?