After trying to find a way to preserve symbol case in CMUCL I have discovered that (setf (readtable-case *readtable*) :invert) preserves symbol case perfectly (also without breaking existing lowercase code):
$ lisp
CMU Common Lisp release x86-linux 3.0.12 18d+ 23 May 2002 build 3350, ...
Loaded subsystems:
    Python 1.0, target Intel x86
    CLOS based on PCL version: September 16 92 PCL (f)
* (setf (readtable-case *readtable*) :invert)
:invert
* 'aa
aa
* 'AA
AA
* 'Aa
Aa
* :abc
:abc
* :Abc
:Abc
* :ABC
:ABC
This is a fantastic find because I wish to include the symbols in XHTML output (which is case sensitive).
Can someone please comment on whether this behaviour accords with the HyperSpec:
When the readtable case is :invert, then if all of the unescaped letters in the extended token are of the same case, those (unescaped) letters are converted to the opposite case.
I'm thankful that all lowercase symbols are not converted to uppercase and vice versa. Does this mean the CMUCL behaviour is non-standard (or what has been called "modern" in other threads?)
* Adam Warner
| After trying to find a way to preserve symbol case in CMUCL I have
| discovered that (setf (readtable-case *readtable*) :invert) preserves
| symbol case perfectly (also without breaking existing lowercase code):
Try the following two forms and report your understanding of the interaction of the symbol reader and the printer:
There is potential enlightenment here.
--
In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
70 percent of American adults do not understand the scientific process.
Adam Warner <use...@consulting.net.nz> writes:
> I'm thankful that all lowercase symbols are not converted to uppercase and
> vice versa. Does this mean the CMUCL behaviour is non-standard (or what
> has been called "modern" in other threads?)
The reader converts all-lower-case symbols to upper case, but the printer converts them back again. See CLHS section 22.1.3.3.2.
On Mon, 27 May 2002 03:33:22 +1200, Erik Naggum wrote:
> * Adam Warner
> | After trying to find a way to preserve symbol case in CMUCL I have
> | discovered that (setf (readtable-case *readtable*) :invert) preserves
> | symbol case perfectly (also without breaking existing lowercase code):
> Try the following two forms and report your understanding of the
> interaction of the symbol reader and the printer:
So when I go to read the symbol it will be the wrong case and I will have to invert it. But at least mixed case will be preserved (and since the inverting is predictable no information is thrown away).
I can't use preserve because none of the built in functions can be called using lower case. Perhaps a custom compiled CMUCL image would be a long term solution.
I'll have to think about this in the (*cough*) morning. I'm really tired.
Adam Warner <use...@consulting.net.nz> writes:
> Hi all,
> After trying to find a way to preserve symbol case in CMUCL I have
> discovered that (setf (readtable-case *readtable*) :invert) preserves
> symbol case perfectly (also without breaking existing lowercase code):
> $ lisp
> CMU Common Lisp release x86-linux 3.0.12 18d+ 23 May 2002 build 3350,
> ...
> Loaded subsystems:
>     Python 1.0, target Intel x86
>     CLOS based on PCL version: September 16 92 PCL (f)
> * (setf (readtable-case *readtable*) :invert)
> :invert
> * 'aa
> aa
* (symbol-name 'aa)
"AA"
* (symbol-name 'AA)
"aa"
* (symbol-name 'Aa)
"Aa"
> This is a fantastic find because I wish to include the symbols in XHTML
> output (which is case sensitive).
> Can someone please comment on whether this behaviour accords with the
> HyperSpec:
> When the readtable case is :invert, then if all of the unescaped
> letters in the extended token are of the same case, those (unescaped)
> letters are converted to the opposite case.
CMUCL does exactly what the HyperSpec demands here.
> I'm thankful that all lowercase symbols are not converted to uppercase and
> vice versa. Does this mean the CMUCL behaviour is non-standard (or what
> has been called "modern" in other threads?)
They are converted as demanded by the HyperSpec (otherwise entering (car (cons 1 2)) in that mode would fail), so this isn't modern mode. The reason why you are confused is that both the reader and the printer collude to give you the intended illusion that case is completely preserved. Quoting from section 22.1.3.3.2 "Effect of Readtable Case on the Lisp Printer":
When printer escaping is disabled, or the characters under consideration are not already quoted specifically by single escape or multiple escape syntax, the readtable case of the current readtable affects the way the Lisp printer writes symbols in the following ways:
:upcase When the readtable case is :upcase, uppercase characters are printed in the case specified by *print-case*, and lowercase characters are printed in their own case.
[...]
:invert When the readtable case is :invert, the case of all alphabetic characters in single case symbol names is inverted. Mixed-case symbol names are printed as is.
So as long as you always use the Lisp Printer (or do something similar), you will get the illusion of having both case preservation, and the ability to access CL-mandated symbols in lower-case. However, behind the scenes 'car is still the symbol CL:CAR, etc.
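The reader/printer collusion described above can be observed directly. The following is a minimal sketch, assuming a conforming implementation; `read-and-print-inverted` is a hypothetical helper name, not something from the thread. Note that it interns any new symbols it reads in COMMON-LISP-USER.

```lisp
;; Sketch only: READ-AND-PRINT-INVERTED is a hypothetical helper name.
(defun read-and-print-inverted (token)
  "Read TOKEN with readtable case :invert and return two values:
the SYMBOL-NAME actually stored, and the text the printer produces."
  (let ((*readtable* (copy-readtable nil))   ; fresh copy of the standard readtable
        (*package* (find-package "COMMON-LISP-USER")))
    (setf (readtable-case *readtable*) :invert)
    (let ((symbol (read-from-string token)))
      (values (symbol-name symbol)            ; what is stored
              (prin1-to-string symbol)))))    ; what is shown

(read-and-print-inverted "car")  ; => "CAR", "car"
(read-and-print-inverted "CAR")  ; => "car", "CAR"
(read-and-print-inverted "Car")  ; => "Car", "Car"
```

Only the mixed-case token is stored exactly as typed; the single-case tokens are inverted on the way in and inverted again on the way out.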
Regs, Pierre.
-- Pierre R. Mai <p...@acm.org> http://www.pmsf.de/pmai/ The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents. -- Nathaniel Borenstein
On Mon, 27 May 2002 04:14:47 +1200, Pierre R. Mai wrote:
> So as long as you always use the Lisp Printer (or do something similar),
> you will get the illusion of having both case preservation, and the
> ability to access CL-mandated symbols in lower-case. However, behind
> the scenes 'car is still the symbol CL:CAR, etc.
Thanks Pierre. I also find this is a clear example of what happens with (setf (readtable-case *readtable*) :invert):
* (string :align)
"ALIGN"
Only (setf (readtable-case *readtable*) :preserve) actually preserves the symbol case:
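As an illustration (a sketch, not the poster's original transcript), the stored name of a keyword can be compared across readtable-case settings. `keyword-name-under` is a hypothetical helper name, and the argument name `mode` is likewise an assumption.

```lisp
;; Sketch only: KEYWORD-NAME-UNDER is a hypothetical helper name.
(defun keyword-name-under (mode token)
  "Read TOKEN with readtable case MODE and return the resulting SYMBOL-NAME."
  (let ((*readtable* (copy-readtable nil)))   ; leave the global readtable alone
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string token))))

(keyword-name-under :upcase   ":align")  ; => "ALIGN"
(keyword-name-under :invert   ":align")  ; => "ALIGN"
(keyword-name-under :preserve ":align")  ; => "align"
(keyword-name-under :invert   ":Align")  ; => "Align"
(keyword-name-under :preserve ":Align")  ; => "Align"
```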
* Adam Warner <use...@consulting.net.nz>
| Yes I have achieved enlightenment Erik.
You made my day!
| I can't use preserve because none of the built in functions can be called
| using lower case. Perhaps a custom compiled CMUCL image would be a long
| term solution.
Well, that way lies madness. One Common Lisp vendor has decided to make a "custom" world in which symbols are in their preferred lower-case. While I also like to read and see lower-case, all I need to do to get that most of the time is to use either :invert, or :upcase with *print-case* set to :downcase. However, if you want to use lower-case names in your own code, you can shadow intern, find-symbol, and symbol-name to invert their argument. Efficient inversion is not necessarily a trivial task, and your implementation may have optimized functions for it, but this is a shot, and intended to be an efficient one. Just how efficient it is seems to vary a lot between implementations:
(defun invert-string (string)
  (declare (optimize (speed 3) (safety 0))
           (simple-string string))
  (check-type string 'string)
  (prog ((invert nil)
         (index 0)
         (length (length string)))
    (declare (simple-string invert)
             (type (integer 0 65536) index length))
   unknown-case
    (cond ((= index length)
           (return string))
          ((upper-case-p (schar string index))
           (when (and (/= (1+ index) length)
                      (lower-case-p (schar string (1+ index))))
             (return string))
           (setq invert (copy-seq string))
           (go upper-case))
          ((lower-case-p (schar string index))
           (setq invert (copy-seq string))
           (go lower-case))
          (t
           (incf index)
           (go unknown-case)))
   upper-case
    (setf (schar invert index) (char-downcase (schar invert index)))
    (incf index)
    (cond ((= index length)
           (return invert))
          ((lower-case-p (schar invert index))
           (return string))
          (t
           (go upper-case)))
   lower-case
    (setf (schar invert index) (char-upcase (schar invert index)))
    (incf index)
    (cond ((= index length)
           (return invert))
          ((upper-case-p (schar invert index))
           (return string))
          (t
           (go lower-case)))))
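For comparison, here is a much simpler, unoptimized sketch that is intended to follow the same rule as the posted invert-string: invert the case when every cased character has the same case, otherwise return the string unchanged. The names `invert-case-sketch` and `inverted-symbol-name` are hypothetical, and the second function only hints at what a shadowed symbol-name might do; it makes no claim about speed.

```lisp
;; Sketch only: both function names are hypothetical, not from the thread.
(defun invert-case-sketch (string)
  "Return STRING with its case inverted if all cased characters share one
case; otherwise return STRING itself."
  (check-type string string)
  (let ((has-upper (some #'upper-case-p string))
        (has-lower (some #'lower-case-p string)))
    (cond ((and has-upper (not has-lower)) (string-downcase string))
          ((and has-lower (not has-upper)) (string-upcase string))
          (t string))))                       ; mixed case or no letters at all

(defun inverted-symbol-name (symbol)
  "What a shadowing SYMBOL-NAME replacement might return."
  (invert-case-sketch (symbol-name symbol)))

(invert-case-sketch "ALIGN")  ; => "align"
(invert-case-sketch "align")  ; => "ALIGN"
(invert-case-sketch "Align")  ; => "Align"
```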
On Mon, 27 May 2002 05:42:04 +1200, Erik Naggum wrote:
> * Adam Warner <use...@consulting.net.nz>
> | Yes I have achieved enlightenment Erik.
> You made my day!
And mine!
> | I can't use preserve because none of the built in functions can be called
> | using lower case. Perhaps a custom compiled CMUCL image would be a long
> | term solution.
> Well, that way lies madness. One Common Lisp vendor has decided to
> make a "custom" world in which symbols are in their preferred
> lower-case. While I also like to read and see lower-case, all I need
> to do to get that most of the time is with either :invert or :upcase
> and *print-case* to :downcase. However, if you want to use lower-case
> names in your own code, you can shadow intern, find-symbol, and
> symbol-name to invert their argument. Efficient inversion is not
> necessarily a trivial task, and your implementation may have optimized
> functions for it, but this is a shot, and intended to be an efficient
> one. Just how efficient it is seems to vary a lot between
> implementations:
Thanks for the code Erik. It appears to be slightly broken. Here's my attempt to fix it:
The above line is undefined. Shouldn't it be (check-type string string)? This is comparing the variable called string against the type string. Luckily Lisp has multiple namespaces. We also have string as a function.
And an optimisation question: Doesn't this declare index and length to be greater than 16-bit unsigned integers? (when starting from 0 the maximum permissible unsigned value is 2^16-1). This probably causes the compiler to optimise using 32-bit integers. On my computer it seems to make no speed difference, probably because 32-bit integers are the minimum size used on 32-bit machines.
* Adam Warner
| Shouldn't it be (check-type string string)?
Yes. I stuffed that line in just prior to posting. I keep making that mistake, yet I think it seems more correct to use the quoted type for the type, not just an unevaluated expression.
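The source of the confusion is that check-type is a macro that takes its type specifier unevaluated, while typep is a function and needs the specifier quoted. A small sketch, with the hypothetical function name `string-or-error`:

```lisp
;; Sketch only: STRING-OR-ERROR is a hypothetical name.
(defun string-or-error (thing)
  "Return THING if it is a string; otherwise signal a TYPE-ERROR."
  (check-type thing string)   ; macro: the type specifier is not evaluated
  thing)

(defun stringish-p (thing)
  "The same test written with TYPEP, where the specifier must be quoted."
  (if (typep thing 'string) t nil))

(string-or-error "abc")  ; => "abc"
(stringish-p 42)         ; => NIL
```

Calling `(string-or-error 42)` signals a correctable error of type type-error.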
| And an optimisation question: Doesn't this declare index and length to be
| greater than 16-bit unsigned integers?
Yes, but this is actually irrelevant, since the point was only to limit these things to less than array-dimension-limit, which it is annoyingly verbose to do. I also keep misremembering that (integer 0 1) and (integer (-1) (2)) are equivalent. I guess I believe upper limits should be exclusive because they are everywhere else in the language. It is surprisingly hard to learn things you believe should be different from what they are. Thanks for reminding me of these things. Just goes to show what happens when I post code I had not visited for weeks and had just rattled off at the time -- it was just useful to me at the time.
| This probably causes the compiler to optimise using 32-bit integers.
Well, we do not generally have 32-bit integers in Common Lisp systems on 32-bit hardware, but at least this makes it use more than 16 bits. It should have been only 65535, of course. A better way to specify this is (unsigned-byte 16).
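The bounds under discussion can be checked with typep and subtypep. This is a small sketch; `in-type-p`, `same-type-p` and the type name `array-index-sketch` are hypothetical names introduced only for illustration.

```lisp
;; Sketch only: all three names below are hypothetical.
(defun in-type-p (object type)
  "Return T or NIL (TYPEP itself only promises a generalized boolean)."
  (if (typep object type) t nil))

(defun same-type-p (type-1 type-2)
  "True when each type specifier is a subtype of the other."
  (and (subtypep type-1 type-2)
       (subtypep type-2 type-1)
       t))

(in-type-p 65536 '(integer 0 65536))    ; => T   (a plain bound is inclusive)
(in-type-p 65536 '(integer 0 (65536)))  ; => NIL (a bound in a list is exclusive)
(in-type-p 65535 '(unsigned-byte 16))   ; => T
(in-type-p 65536 '(unsigned-byte 16))   ; => NIL
(same-type-p '(integer 0 1) '(integer (-1) (2)))      ; => T
(same-type-p '(unsigned-byte 16) '(integer 0 65535))  ; => T

;; One way to say "non-negative and below ARRAY-DIMENSION-LIMIT".
(deftype array-index-sketch ()
  `(integer 0 (,array-dimension-limit)))
```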
| On my computer it seems to make no speed difference, probably because
| 32-bit integers are the minimum size used on 32-bit machines.
(integer-length (- most-positive-fixnum most-negative-fixnum)) is usually less than 32, and can be as low as 16. A quick survey finds that Allegro CL and CMUCL have 30-bit signed fixnums, CLISP has 25-bit, and LispWorks 24-bit, all on a 32-bit Linux system.
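The fixnum width of whatever implementation is at hand can be checked with the expression quoted above. A minimal sketch, wrapped in a hypothetical function named `fixnum-bits`; the result depends on the implementation, so no value is shown.

```lisp
;; Sketch only: FIXNUM-BITS is a hypothetical name.
(defun fixnum-bits ()
  "Number of bits needed to span the fixnum range of this implementation."
  (integer-length (- most-positive-fixnum most-negative-fixnum)))

(fixnum-bits)  ; implementation-dependent, but never less than 16
```

The lower bound of 16 follows from the standard's requirement that most-positive-fixnum be at least 2^15 - 1 and most-negative-fixnum at most -2^15.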