
(setf (readtable-case *readtable*) :invert) completely preserves symbol case in CMUCL


Adam Warner

May 26, 2002, 11:00:44 AM
Hi all,

After trying to find a way to preserve symbol case in CMUCL I have
discovered that (setf (readtable-case *readtable*) :invert) preserves
symbol case perfectly (also without breaking existing lowercase code):

$ lisp
CMU Common Lisp release x86-linux 3.0.12 18d+ 23 May 2002 build 3350,
...

Loaded subsystems:
Python 1.0, target Intel x86
CLOS based on PCL version: September 16 92 PCL (f)
* (setf (readtable-case *readtable*) :invert)

:invert
* 'aa

aa
* 'AA

AA
* 'Aa

Aa
* :abc

:abc
* :Abc

:Abc
* :ABC

:ABC

This is a fantastic find because I wish to include the symbols in XHTML
output (which is case sensitive).

Can someone please comment on whether this behaviour accords with the
HyperSpec:

http://www.xanalys.com/software_tools/reference/HyperSpec/Body/23_ab.htm

When the readtable case is :invert, then if all of the unescaped
letters in the extended token are of the same case, those (unescaped)
letters are converted to the opposite case.

I'm thankful that all lowercase symbols are not converted to uppercase and
vice versa. Does this mean the CMUCL behaviour is non-standard (or what
has been called "modern" in other threads?)

Regards,
Adam

Erik Naggum

May 26, 2002, 11:33:22 AM
* Adam Warner

| After trying to find a way to preserve symbol case in CMUCL I have
| discovered that (setf (readtable-case *readtable*) :invert) preserves
| symbol case perfectly (also without breaking existing lowercase code):

Try the following two forms and report your understanding of the
interaction of the symbol reader and the printer:

(mapcar #'symbol-name '(UPPER UPPER-lower lower))
(mapcar #'intern '("UPPER" "UPPER-lower" "lower"))

There is potential enlightenment here.
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.

70 percent of American adults do not understand the scientific process.

Kalle Olavi Niemitalo

May 26, 2002, 11:46:12 AM
Adam Warner <use...@consulting.net.nz> writes:

> I'm thankful that all lowercase symbols are not converted to uppercase and
> vice versa. Does this mean the CMUCL behaviour is non-standard (or what
> has been called "modern" in other threads?)

The reader converts all-lower-case symbols to upper case, but the
printer converts them back again. See CLHS section 22.1.3.3.2.
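
The interplay between reader and printer can be observed without touching the global readtable. What follows is only a minimal sketch: the name invert-round-trip is made up for illustration, and keywords are used so that nothing is interned in the current package.

```lisp
;; Hypothetical helper, not from the thread: read TEXT and print the
;; result again, both under a private readtable set to :invert.
(defun invert-round-trip (text)
  "Return a list of the symbol-name the reader stores for TEXT and the
text the printer writes for that symbol, under readtable-case :invert."
  (with-standard-io-syntax
    ;; COPY-READTABLE with NIL yields a fresh standard readtable, so the
    ;; global *READTABLE* is never modified.
    (let ((*readtable* (copy-readtable nil)))
      (setf (readtable-case *readtable*) :invert)
      (let ((symbol (read-from-string text)))
        (list (symbol-name symbol)
              (prin1-to-string symbol))))))

(invert-round-trip ":zebra")   ; name "ZEBRA", printed as ":zebra"
(invert-round-trip ":Zebra")   ; name "Zebra", printed as ":Zebra"
(invert-round-trip ":ZEBRA")   ; name "zebra", printed as ":ZEBRA"
```

The printed text always matches what was typed, while the stored name of a single-case symbol is the opposite case.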

Adam Warner

May 26, 2002, 12:15:59 PM
On Mon, 27 May 2002 03:33:22 +1200, Erik Naggum wrote:

> * Adam Warner
> | After trying to find a way to preserve symbol case in CMUCL I have
> | discovered that (setf (readtable-case *readtable*) :invert) preserves
> | symbol case perfectly (also without breaking existing lowercase code):
>
> Try the following two forms and report your understanding of the
> interaction of the symbol reader and the printer:
>
> (mapcar #'symbol-name '(UPPER UPPER-lower lower))
> (mapcar #'intern '("UPPER" "UPPER-lower" "lower"))
>
> There is potential enlightenment here.

Thanks Kalle and Erik. If I had run the test-readtable-case-reading code
it would have been clear. Yes I have achieved enlightenment Erik.

READTABLE-CASE   Input   Symbol-name
------------------------------------
:INVERT          ZEBRA   zebra
:INVERT          Zebra   Zebra
:INVERT          zebra   ZEBRA

So when I go to read the symbol it will be the wrong case and I will have
to invert it. But at least mixed case will be preserved (and since the
inverting is predictable no information is thrown away).
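
Undoing the inversion before writing a symbol name into case-sensitive output could look roughly like this. It is a minimal sketch, not tuned for speed, and invert-name is a hypothetical name rather than a standard function.

```lisp
;; Hypothetical helper: recover the spelling that was typed under
;; readtable-case :invert from the stored symbol name.
(defun invert-name (name)
  "Return NAME with its case inverted if all of its letters share one
case; return NAME unchanged if it is mixed-case or has no letters."
  (let ((has-upper (some #'upper-case-p name))
        (has-lower (some #'lower-case-p name)))
    (cond ((and has-upper (not has-lower)) (string-downcase name))
          ((and has-lower (not has-upper)) (string-upcase name))
          (t name))))

(invert-name "ZEBRA")        ; => "zebra"
(invert-name "Zebra")        ; => "Zebra"
(invert-name "zebra")        ; => "ZEBRA"
(invert-name "UPPER-lower")  ; => "UPPER-lower"
```

Applying the function twice gives back the original string, which is why no information is lost.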

I can't use preserve because none of the built in functions can be called
using lower case. Perhaps a custom compiled CMUCL image would be a long
term solution.

I'll have to think about this in the (*cough*) morning. I'm really tired.

Thanks for all your help.

Regards,
Adam

Pierre R. Mai

May 26, 2002, 12:14:47 PM
Adam Warner <use...@consulting.net.nz> writes:

> Hi all,
>
> After trying to find a way to preserve symbol case in CMUCL I have
> discovered that (setf (readtable-case *readtable*) :invert) preserves
> symbol case perfectly (also without breaking existing lowercase code):
>
> $ lisp
> CMU Common Lisp release x86-linux 3.0.12 18d+ 23 May 2002 build 3350,
> ...
>
> Loaded subsystems:
> Python 1.0, target Intel x86
> CLOS based on PCL version: September 16 92 PCL (f)
> * (setf (readtable-case *readtable*) :invert)
>
> :invert
> * 'aa
>
> aa

* (symbol-name 'aa)

"AA"
* (symbol-name 'AA)

"aa"
* (symbol-name 'Aa)

"Aa"

> This is a fantastic find because I wish to include the symbols in XHTML
> output (which is case sensitive).
>
> Can someone please comment on whether this behaviour accords with the
> HyperSpec:
>
> http://www.xanalys.com/software_tools/reference/HyperSpec/Body/23_ab.htm
>
> When the readtable case is :invert, then if all of the unescaped
> letters in the extended token are of the same case, those (unescaped)
> letters are converted to the opposite case.

CMUCL does exactly what the HyperSpec demands here.

> I'm thankful that all lowercase symbols are not converted to uppercase and
> vice versa. Does this mean the CMUCL behaviour is non-standard (or what
> has been called "modern" in other threads?)

They are converted as demanded by the HyperSpec (otherwise entering
(car (cons 1 2)) in that mode would fail), so this isn't modern mode.
The reason why you are confused is that both the reader and the
printer collude to give you the intended illusion that case is
completely preserved. Quoting from section 22.1.3.3.2 "Effect of
Readtable Case on the Lisp Printer":

When printer escaping is disabled, or the characters under consideration are
not already quoted specifically by single escape or multiple escape syntax,
the readtable case of the current readtable affects the way the Lisp printer
writes symbols in the following ways:

:upcase
When the readtable case is :upcase, uppercase characters are printed
in the case specified by *print-case*, and lowercase characters are
printed in their own case.

[...]

:invert
When the readtable case is :invert, the case of all alphabetic
characters in single case symbol names is inverted. Mixed-case symbol
names are printed as is.

So as long as you always use the Lisp Printer (or do something
similar), you will get the illusion of having both case preservation,
and the ability to access CL-mandated symbols in lower-case. However,
behind the scenes 'car is still the symbol CL:CAR, etc.

Regs, Pierre.

--
Pierre R. Mai <pm...@acm.org> http://www.pmsf.de/pmai/
The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We cause accidents. -- Nathaniel Borenstein

Adam Warner

May 26, 2002, 12:58:47 PM
On Mon, 27 May 2002 04:14:47 +1200, Pierre R. Mai wrote:

> So as long as you always use the Lisp Printer (or do something similar),
> you will get the illusion of having both case preservation, and the
> ability to access CL-mandated symbols in lower-case. However, behind
> the scenes 'car is still the symbol CL:CAR, etc.

Thanks Pierre. I also find this is a clear example of what happens with (setf
(readtable-case *readtable*) :invert):

* (string :align)

"ALIGN"

Only (setf (readtable-case *readtable*) :preserve) actually preserves the
symbol case:

* (STRING :align)

"align"

But unfortunately (string :align) is then undefined, since the lowercase
symbol string does not name any function.
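
The difference between the two modes can be compared side by side without changing the global readtable. This is a sketch under the assumption that the code itself is read with the standard readtable; read-with-case is a made-up helper name.

```lisp
;; Hypothetical helper: read one form from TEXT with a private readtable.
(defun read-with-case (text mode)
  "Read one form from TEXT using a fresh standard readtable whose
readtable-case is MODE (:upcase, :downcase, :preserve or :invert)."
  (with-standard-io-syntax
    (let ((*readtable* (copy-readtable nil)))
      (setf (readtable-case *readtable*) mode)
      (read-from-string text))))

(string (read-with-case ":align" :invert))     ; => "ALIGN"
(string (read-with-case ":align" :preserve))   ; => "align"

;; Under :preserve only the upper-case spelling reaches CL:STRING.
(fboundp (first (read-with-case "(STRING :align)" :preserve)))  ; true
(fboundp (first (read-with-case "(string :align)" :preserve)))  ; false in a fresh image
```

The last form interns a symbol named "string" in CL-USER, which has no function definition unless some other code supplied one.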

Regards,
Adam

Erik Naggum

May 26, 2002, 1:42:04 PM
* Adam Warner <use...@consulting.net.nz>

| Yes I have achieved enlightenment Erik.

You made my day!

| I can't use preserve because none of the built in functions can be called
| using lower case. Perhaps a custom compiled CMUCL image would be a long
| term solution.

Well, that way lies madness. One Common Lisp vendor has decided to make
a "custom" world in which symbols are in their preferred lower-case.
While I also like to read and see lower-case, all I need to do to get
that most of the time is with either :invert or :upcase and *print-case*
to :downcase. However, if you want to use lower-case names in your own
code, you can shadow intern, find-symbol, and symbol-name to invert their
argument. Efficient inversion is not necessarily a trivial task, and
your implementation may have optimized functions for it, but this is a
shot, and intended to be an efficient one. Just how efficient it is
seems to vary a lot between implementations:

(defun invert-string (string)
  (declare (optimize (speed 3) (safety 0))
           (simple-string string))
  (check-type string 'string)
  (prog ((invert nil)
         (index 0)
         (length (length string)))
     (declare (simple-string invert)
              (type (integer 0 65536) index length))
   unknown-case
     (cond ((= index length)
            (return string))
           ((upper-case-p (schar string index))
            (when (and (/= (1+ index) length)
                       (lower-case-p (schar string (1+ index))))
              (return string))
            (setq invert (copy-seq string))
            (go upper-case))
           ((lower-case-p (schar string index))
            (setq invert (copy-seq string))
            (go lower-case))
           (t
            (incf index)
            (go unknown-case)))
   upper-case
     (setf (schar invert index) (char-downcase (schar invert index)))
     (incf index)
     (cond ((= index length)
            (return invert))
           ((lower-case-p (schar invert index))
            (return string))
           (t
            (go upper-case)))
   lower-case
     (setf (schar invert index) (char-upcase (schar invert index)))
     (incf index)
     (cond ((= index length)
            (return invert))
           ((upper-case-p (schar invert index))
            (return string))
           (t
            (go lower-case)))))
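
The :upcase plus *print-case* option mentioned before the function can be tried in isolation. A small sketch follows; print-downcased is a hypothetical name, and the symbol names themselves stay upper-case while only the printed form changes.

```lisp
;; Hypothetical helper: print OBJECT with the standard (:upcase)
;; readtable but with *PRINT-CASE* set to :DOWNCASE.
(defun print-downcased (object)
  "Return the printed representation of OBJECT under standard I/O
syntax with *print-case* bound to :downcase."
  (with-standard-io-syntax
    (let ((*print-case* :downcase))
      (prin1-to-string object))))

(symbol-name 'car)       ; "CAR", assuming the standard reader read this line
(print-downcased 'car)   ; => "car"
(print-downcased :abc)   ; => ":abc"
```

Symbols whose names contain lower-case letters are printed with escape characters under this setting so that they can be read back, which means the approach suits code that sticks to conventional upper-case names.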

Adam Warner

May 26, 2002, 7:34:37 PM
On Mon, 27 May 2002 05:42:04 +1200, Erik Naggum wrote:

> * Adam Warner <use...@consulting.net.nz>
> | Yes I have achieved enlightenment Erik.
>
> You made my day!

And mine!

> | I can't use preserve because none of the built in functions can be called
> | using lower case. Perhaps a custom compiled CMUCL image would be a long
> | term solution.
>
> Well, that way lies madness. One Common Lisp vendor has decided to
> make a "custom" world in which symbols are in their preferred
> lower-case. While I also like to read and see lower-case, all I need
> to do to get that most of the time is with either :invert or :upcase
> and *print-case* to :downcase. However, if you want to use lower-case
> names in your own code, you can shadow intern, find-symbol, and
> symbol-name to invert their argument. Efficient inversion is not
> necessarily a trivial task, and your implementation may have optimized
> functions for it, but this is a shot, and intended to be an efficient
> one. Just how efficient it is seems to vary a lot between
> implementations:

Thanks for the code Erik. It appears to be slightly broken. Here's my
attempt to fix it:

> (defun invert-string (string)
>   (declare (optimize (speed 3) (safety 0))
>            (simple-string string))
>   (check-type string 'string)

The above line is undefined. Shouldn't it be (check-type string string)? This
is comparing the variable called string against the type string. Luckily
Lisp has multiple namespaces. We also have string as a function.

>   (prog ((invert nil)
>          (index 0)
>          (length (length string)))
>      (declare (simple-string invert)
>               (type (integer 0 65536) index length))

And an optimisation question: Doesn't this declare index and length to be
greater than 16-bit unsigned integers? (when starting from 0 the maximum
permissible unsigned value is 2^16-1). This probably causes the compiler
to optimise using 32-bit integers. On my computer it seems to make no
speed difference, probably because 32-bit integers are the minimum size
used on 32-bit machines.

Thanks again Erik.

Regards,
Adam

Erik Naggum

May 26, 2002, 9:10:18 PM
* Adam Warner

| Shouldn't it be (check-type string string)?

Yes. I stuffed that line in just prior to posting. I keep making that
mistake, yet I think it seems more correct to use the quoted type for the
type, not just an unevaluated expression.

| And an optimisation question: Doesn't this declare index and length to be
| greater than 16-bit unsigned integers?

Yes, but this is actually irrelevant, since the point was only to limit
these things to less than array-dimension-limit, which it is annoyingly
verbose to do. I also keep misremembering that (integer 0 1) and
(integer (-1) (2)) are equivalent. I guess I believe upper limits should
be exclusive because they are everywhere else in the language. It is
surprisingly hard to learn things you believe should be different from
what they are. Thanks for reminding me of these things. Just goes to
show what happens when I post code I had not visited for weeks and had
just rattled off at the time -- it was just useful to me at the time.

| This probably causes the compiler to optimise using 32-bit integers.

Well, we do not generally have 32-bit integers in Common Lisp systems on
32-bit hardware, but at least this makes it use more than 16 bits. It
should have been only 65535, of course. A better way to specify this is
(unsigned-byte 16).
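
The type specifiers discussed here can be compared directly at the REPL. This is a sketch only; same-type-p is a hypothetical helper built on the standard subtypep.

```lisp
;; Hypothetical helper: two type specifiers denote the same set when
;; each is a recognizable subtype of the other.
(defun same-type-p (type-1 type-2)
  "True when SUBTYPEP can show that TYPE-1 and TYPE-2 include each other."
  (and (subtypep type-1 type-2)
       (subtypep type-2 type-1)
       t))

;; Plain bounds are inclusive, bounds wrapped in a list are exclusive.
(same-type-p '(integer 0 1) '(integer (-1) (2)))       ; true, both are {0, 1}
(same-type-p '(unsigned-byte 16) '(integer 0 65535))   ; true

(typep 65536 '(integer 0 65536))    ; true, the upper bound is included
(typep 65536 '(unsigned-byte 16))   ; false

;; One way to spell "a valid array index" with an exclusive upper bound.
(typep 0 `(integer 0 (,array-dimension-limit)))        ; true
```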

| On my computer it seems to make no speed difference, probably because
| 32-bit integers are the minimum size used on 32-bit machines.

(integer-length (- most-positive-fixnum most-negative-fixnum)) is usually
less than 32, and can be as low as 16. A quick survey finds that Allegro
CL and CMUCL have 30-bit signed fixnums, CLISP has 25-bit, and LispWorks
24-bit, all on a 32-bit Linux system.
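
To repeat the survey on another system, the expression above can be wrapped up as follows. A minimal sketch; fixnum-width is a made-up name, and the result depends on the implementation and platform.

```lisp
;; Hypothetical helper wrapping the expression from the message.
(defun fixnum-width ()
  "Number of bits needed to cover the whole fixnum range."
  (integer-length (- most-positive-fixnum most-negative-fixnum)))

;; Report the width together with the implementation's own identification.
(format t "~A ~A: ~D-bit fixnums~%"
        (lisp-implementation-type)
        (lisp-implementation-version)
        (fixnum-width))
```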
