
invert-string revisited


Adam Warner

May 31, 2002, 11:29:26 AM
Hi all,

A couple of days ago Erik Naggum provided this (now slightly modified) code
as an efficient shot at inverting case iff a string is all upper case or all
lower case (as a way of inverting symbol names after they have been switched
by a readtable-case of :invert):

(defun invert-string (string)
  (declare (optimize (speed 3) (safety 0))
           (simple-string string))
  (check-type string string)
  (prog ((invert nil)
         (index 0)
         (length (length string)))
     (declare (simple-string invert)
              (type (integer 0 65535) index length))
   unknown-case
     (cond ((= index length)
            (return string))
           ((upper-case-p (schar string index))
            (when (and (/= (1+ index) length)
                       (lower-case-p (schar string (1+ index))))
              (return string))
            (setq invert (copy-seq string))
            (go upper-case))
           ((lower-case-p (schar string index))
            (setq invert (copy-seq string))
            (go lower-case))
           (t
            (incf index)
            (go unknown-case)))
   upper-case
     (setf (schar invert index) (char-downcase (schar invert index)))
     (incf index)
     (cond ((= index length)
            (return invert))
           ((lower-case-p (schar invert index))
            (return string))
           (t
            (go upper-case)))
   lower-case
     (setf (schar invert index) (char-upcase (schar invert index)))
     (incf index)
     (cond ((= index length)
            (return invert))
           ((upper-case-p (schar invert index))
            (return string))
           (t
            (go lower-case)))))


Remember we have (setf (readtable-case *readtable*) :invert)
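
As a quick illustration (not from the original post) of what that :invert
setting does to unescaped symbol names at a REPL (return value of the setf
omitted):

* (setf (readtable-case *readtable*) :invert)
* (symbol-name 'abc)   ; all-lowercase input reads as the symbol named "ABC"
"ABC"
* (symbol-name 'ABC)   ; all-uppercase input reads as the symbol named "abc"
"abc"
* (symbol-name 'Abc)   ; mixed case is left untouched
"Abc"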

Entering the code into CMUCL and performing a crude speed test:

* (defun st1 ()
(time (loop for x from 1 to 10000 do
(invert-string (string :abc)))))

st1
* (st1)
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
2.59 seconds of real time
2.52 seconds of user run time
0.07 seconds of system run time
[Run times include 0.25 seconds GC run time]
424 page faults and
8693064 bytes consed.


Now I have come up with this first attempt unoptimised code:

(defun sym2str (sym)
  (if (equal (string sym) (string-upcase (string sym)))
      (string-downcase (string sym))    ;lowercase
      (if (equal (string sym) (string-downcase (string sym)))
          (string-upcase (string sym))  ;uppercase
          (string sym))))               ;mixedcase

Which converts a symbol to its correct string representation:

* (sym2str :lowercase)

"lowercase"
* (sym2str :UPPERCASE)

"UPPERCASE"
* (sym2str :Mixedcase)

"Mixedcase"

And the code is many times faster:

(defun st2 ()
(time (loop for x from 1 to 10000 do
(sym2str :abc))))

* (st2)
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
0.3 seconds of real time
0.28 seconds of user run time
0.02 seconds of system run time
0 page faults and
1359584 bytes consed.

Even the worst case scenario of an all uppercase symbol is many
times faster:

(defun st2 ()
(time (loop for x from 1 to 10000 do
(sym2str :ABC))))

* (st2)
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
0.47 seconds of real time
0.47 seconds of user run time
0.0 seconds of system run time
[Run times include 0.05 seconds GC run time]
0 page faults and
1917144 bytes consed.


Even if I supply Erik's code with a string to avoid the extra string
conversion it is still much slower:

(defun st1 ()
  (time (loop for x from 1 to 10000 do
          (invert-string "abc"))))

* (st1)
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
2.37 seconds of real time
2.32 seconds of user run time
0.04 seconds of system run time
[Run times include 0.18 seconds GC run time]
0 page faults and
8298216 bytes consed.


So it appears I've come up with a decent (and ridiculously simple)
algorithm for preserving original symbol case when converting symbols to
strings.

Regards,
Adam

Erik Naggum

May 31, 2002, 12:11:06 PM
* Adam Warner

| So it appears I've come up with a decent (and ridiculously simple)
| algorithm for preserving original symbol case when converting symbols to
| strings.

Another decent and ridiculously simple way is to use the compiler.
Timing interpreted code is generally considered quite silly.

Before compilation of any function:

(st1)
; cpu time (non-gc) 3,730 msec user, 40 msec system
; cpu time (gc) 400 msec user, 20 msec system
; cpu time (total) 4,130 msec user, 60 msec system
; real time 4,689 msec
; space allocation:
; 2,230,144 cons cells, 14,594,640 other bytes, 0 static bytes
=> nil

(st2)
; cpu time (non-gc) 1,150 msec user, 0 msec system
; cpu time (gc) 120 msec user, 0 msec system
; cpu time (total) 1,270 msec user, 0 msec system
; real time 1,268 msec
; space allocation:
; 460,136 cons cells, 12,720,160 other bytes, 0 static bytes
=> nil

After compilation of all functions:

(st1)
; cpu time (non-gc) 20 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 20 msec user, 0 msec system
; real time 19 msec
; space allocation:
; 0 cons cells, 240,000 other bytes, 0 static bytes
=> nil

(st2)
; cpu time (non-gc) 30 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 30 msec user, 0 msec system
; real time 39 msec
; space allocation:
; 0 cons cells, 720,000 other bytes, 0 static bytes
=> nil

Test run on a 600 MHz Intel PIII with enough RAM and Allegro CL 6.2.beta.
I am more impressed with the interpreted speed than anything else...

Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
(are these guys Microsoft employees or what?). Before compilation:

* (st1)
Compiling LAMBDA NIL:
Compiling Top-Level Form:

Evaluation took:
2.98 seconds of real time
2.87 seconds of user run time
0.01 seconds of system run time
[Run times include 0.19 seconds GC run time]
447 page faults and
8699088 bytes consed.
NIL
* (st2)
Compiling LAMBDA NIL:
Compiling Top-Level Form:

Evaluation took:
0.37 seconds of real time
0.37 seconds of user run time


0.0 seconds of system run time

[Run times include 0.04 seconds GC run time]
0 page faults and
1351384 bytes consed.
NIL

After compilation:

* (st1)

Evaluation took:
0.0 seconds of real time
0.0 seconds of user run time


0.0 seconds of system run time

0 page faults and
155112 bytes consed.
NIL
* (st2)

Evaluation took:
0.02 seconds of real time
0.02 seconds of user run time


0.0 seconds of system run time

0 page faults and
319488 bytes consed.
NIL

Please note that space is sometimes worth optimizing for. One simple
test I could have done, with two additional states, is to test the
character following a character with case to see if it is in the same
case, to return the string unchanged if so.
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.

70 percent of American adults do not understand the scientific process.

Joe Marshall

May 31, 2002, 12:13:15 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message news:ad84p1$v4j2k$1...@ID-105510.news.dfncis.de...
> Hi all,

> Entering the code into CMUCL and performing a crude speed test:
>
> * (defun st1 ()
> (time (loop for x from 1 to 10000 do
> (invert-string (string :abc)))))
>
> st1
> * (st1)
> Compiling lambda nil:
> Compiling Top-Level Form:
>
> Evaluation took:
> 2.59 seconds of real time
> 2.52 seconds of user run time
> 0.07 seconds of system run time
> [Run times include 0.25 seconds GC run time]
> 424 page faults and
> 8693064 bytes consed.
>
>
> Now I have come up with this first attempt unoptimised code:
>
> (defun sym2str (sym)
> (if (equal (string sym) (string-upcase (string sym)))
> (string-downcase (string sym)) ;lowercase
> (if (equal (string sym) (string-downcase (string sym)))
> (string-upcase (string sym)) ;uppercase
> (string sym) ;mixedcase
> )))
>
>
> (defun st2 ()
> (time (loop for x from 1 to 10000 do
> (sym2str :abc))))
>
> * (st2)
> Compiling lambda nil:
> Compiling Top-Level Form:
>
> Evaluation took:
> 0.3 seconds of real time
> 0.28 seconds of user run time
> 0.02 seconds of system run time
> 0 page faults and
> 1359584 bytes consed.
>

That's an awfully long time for either of those.
I did the same test on my machine, but I had to multiply the
iteration count by 100. Here are my results:

Timing the evaluation of (LOOP FOR X FROM 1 TO 1000000 DO (INVERT-STRING (STRING :ABC)))

user time = 1.882
system time = 0.000
Elapsed time = 0:00:02
Allocation = 24001520 bytes standard / 1826 bytes fixlen
0 Page faults

Timing the evaluation of (LOOP FOR X FROM 1 TO 1000000 DO (SYM2STR :ABC))

user time = 4.556
system time = 0.020
Elapsed time = 0:00:05
Allocation = 48020400 bytes standard / 18139 bytes fixlen
0 Page faults

Did you forget to compile the code?

Adam Warner

May 31, 2002, 12:20:05 PM
On Sat, 01 Jun 2002 03:29:26 +1200, Adam Warner wrote:

> (defun sym2str (sym)
> (if (equal (string sym) (string-upcase (string sym)))
> (string-downcase (string sym)) ;lowercase
> (if (equal (string sym) (string-downcase (string sym)))
> (string-upcase (string sym)) ;uppercase (string sym) ;mixedcase
> )))
>
> Which converts a symbol to its correct string representation:
>
> * (sym2str :lowercase)
>
> "lowercase"
> * (sym2str :UPPERCASE)
>
> "UPPERCASE"
> * (sym2str :Mixedcase)
>
> "Mixedcase"

BTW It also inverts strings. So it is a superset of Erik's invert-string
program (note that Erik explicitly rejected non-strings using check-type):

* (sym2str "lowercase")

"LOWERCASE"
* (sym2str "UPPERCASE")

"uppercase"
* (sym2str "Mixedcase")

"Mixedcase"

If the program was only converting strings it could be made even simpler
and faster:

(defun invert-string (str)
(if (equal str (string-upcase str)) (string-downcase str)
(if (equal str (string-downcase str)) (string-upcase str) str)))

Regards,
Adam

Adam Warner

May 31, 2002, 12:28:17 PM
On Sat, 01 Jun 2002 04:11:06 +1200, Erik Naggum wrote:

> Please note that space is sometimes worth optimizing for. One simple
> test I could have done, with two additional states, is to test the
> character following a character with case to see if it is in the same
> case, to return the string unchanged if so.

Thanks for the tests you ran, and the tip to compile :-)

If you want to be consistent, compare against my simpler version (which
doesn't handle symbols) but with a check-type added, and remember the
maximum-speed compile options (which are built into your code). It's far too
late for me to continue tonight.

Regards,
Adam

Peter Van Eynde

May 31, 2002, 12:28:15 PM
Erik Naggum <er...@naggum.net> writes:

> Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> (are these guys Microsoft employees or what?. Before compilation:

Thanks :-S. No. But sometimes I'm stupid and don't upgrade the version
number. The build number is generated, so should always be different.

Groetjes, Peter

--
It's logic Jim, but not as we know it. | pvan...@debian.org
"God, root, what is difference?" - Pitr|
"God is more forgiving." - Dave Aronson| http://cvs2.cons.org/~pvaneynd/

Christophe Rhodes

May 31, 2002, 12:42:53 PM
Peter Van Eynde <pvan...@debian.org> writes:

> Erik Naggum <er...@naggum.net> writes:
>
> > Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> > (are these guys Microsoft employees or what?. Before compilation:
>
> Thanks :-S. No. But sometimes I'm stupid and don't upgrade the version
> number. The build number is generated, so should always be different.

What happens when it overflows the fixnum range? ;-)

Christophe
--
Jesus College, Cambridge, CB5 8BL +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/ (defun pling-dollar
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)

Adam Warner

May 31, 2002, 12:46:39 PM
Joe Marshall wrote:

> Did you forget to compile the code?

Sure did :-) I now realise how silly that was when Erik's code was
designed to be compiled. I misunderstood the "Compiling" messages in the
output.

At least it's a good demonstration that if you are trying to write
portable Lisp code that may have to be interpreted, compiler-style
optimisation can have a large negative performance impact.

Regards,
Adam

Andy

May 31, 2002, 1:32:41 PM
Erik Naggum wrote:

[snip]

> After compilation of all functions:
>
> (st1)
> ; cpu time (non-gc) 20 msec user, 0 msec system
> ; cpu time (gc) 0 msec user, 0 msec system
> ; cpu time (total) 20 msec user, 0 msec system
> ; real time 19 msec
> ; space allocation:
> ; 0 cons cells, 240,000 other bytes, 0 static bytes
> => nil
>
> (st2)
> ; cpu time (non-gc) 30 msec user, 0 msec system
> ; cpu time (gc) 0 msec user, 0 msec system
> ; cpu time (total) 30 msec user, 0 msec system
> ; real time 39 msec
> ; space allocation:
> ; 0 cons cells, 720,000 other bytes, 0 static bytes
> => nil
>

This might be a nerd question, but I wonder why there are 0 conses. Does
the compiler convert all of them to stack operations?

Best
AHz

Erik Naggum

May 31, 2002, 1:42:45 PM
* Andy <a...@smi.de>

| This might be a nerd question. But i wonder why are there 0 conses.
| Does the compiler convert all of them to stack operations ?

Which conses would that be?

Andy

May 31, 2002, 1:53:49 PM
Erik Naggum wrote:
>
> * Andy <a...@smi.de>
> | This might be a nerd question. But i wonder why are there 0 conses.
> | Does the compiler convert all of them to stack operations ?
>
> Which conses would that be?
Don't know :-(
Since I'm a beginner with Lisp I don't understand the details
of consing completely (but I'm working on that ;-).
I'm just asking because in your mail the ACL compiled function shows the line

; 0 cons cells, 240,000 other bytes, 0 static bytes

while the CMUCL compiled function shows

0 page faults and
155112 bytes consed.

(both for the st1 function).

Best regards
AHz

Joe Marshall

May 31, 2002, 1:59:40 PM
There are two meanings of the word CONS in these contexts.

In the ACL version, they are talking about literal CONS cells.
Since the string inversion doesn't use lists, there *are* no
literal cons cells.

In the CMUCL version, they are talking about bytes allocated
(by CONS or otherwise). In this case CMUCL is allocating a
bit more than half the number of bytes.

Perhaps CMUCL is using 8-bit characters and ACL is using 16-bit
or perhaps ACL is forcing alignment on 16-byte boundaries and
cmucl is forcing alignment on 8-byte boundaries.
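
To make the distinction concrete, a small illustrative snippet (not from the
original exchange):

(cons 1 2)         ; allocates exactly one cons cell
(copy-seq "abc")   ; allocates no cons cells, but still "conses" in the
                   ; broader sense: a fresh string is heap-allocated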

"Andy" <a...@smi.de> wrote in message news:3CF7B8AD...@smi.de...

Andy

May 31, 2002, 2:21:19 PM
Joe Marshall wrote:
>
> There are two meanings of the word CONS in these contexts.
>
> In the ACL version, they are talking about literal CONS cells.
> Since the string inversion doesn't use lists, there *are* no
> literal cons cells.
>
> In the CMUCL version, they are talking about bytes allocated
> (by CONS or otherwise). In this case CMUCL is allocating a
> bit more than half the number of bytes.
>
This also explains the 240,000 other bytes in the ACL version.
Thanks

Best regards
AHz

Erik Naggum

May 31, 2002, 3:53:05 PM
* Andy <a...@smi.de>

| Since i'm a beginner with lisp i don't understand the details of consing
| completly (but working on that ;-).

To "cons" (an object) as a verb is different from a "cons" (cell) as a
noun. See the glossary, the function cons, and the system class cons.

Erik Naggum

May 31, 2002, 4:19:56 PM
* "Joe Marshall"

| Perhaps CMUCL is using 8-bit characters and ACL is using 16-bit
| or perhaps ACL is forcing alignment on 16-byte boundaries and
| cmucl is forcing alignment on 8-byte boundaries.

FWIW, the ACL version employed uses 16-bit characters in strings.

Adam Warner

May 31, 2002, 7:55:02 PM
On Sat, 01 Jun 2002 04:11:06 +1200, Erik Naggum wrote:

> Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> (are these guys Microsoft employees or what?. Before compilation:
>
> * (st1)
> Compiling LAMBDA NIL:
> Compiling Top-Level Form:
>
> Evaluation took:
> 2.98 seconds of real time
> 2.87 seconds of user run time
> 0.01 seconds of system run time
> [Run times include 0.19 seconds GC run time] 447 page faults and
> 8699088 bytes consed.
> NIL
> * (st2)
> Compiling LAMBDA NIL:
> Compiling Top-Level Form:
>
> Evaluation took:
> 0.37 seconds of real time
> 0.37 seconds of user run time
> 0.0 seconds of system run time
> [Run times include 0.04 seconds GC run time] 0 page faults and 1351384
> bytes consed.
> NIL
>
> After compilation:

Help! How did you compile invert-string Erik? It's broken on my platform
(which is similar to yours): CMU Common Lisp release x86-linux 3.0.12
18d+ 23 May 2002 build 3350

Loaded code into vanilla CMUCL image:

INVERT-STRING
* (compile 'invert-string)

Compiling LAMBDA (STRING):

In: LAMBDA (STRING)
(PROG ((INVERT NIL) (INDEX 0) (LENGTH #))
(DECLARE (SIMPLE-STRING INVERT) (TYPE # INDEX LENGTH))
UNKNOWN-CASE
(COND (# #) (# # # #) (# # #) (T # #)) ...)
--> BLOCK
==>
(LET ((INVERT NIL) (INDEX 0) (LENGTH #))
(DECLARE (SIMPLE-STRING INVERT) (TYPE # INDEX LENGTH)) (TAGBODY
UNKNOWN-CASE (COND # # # #) UPPER-CASE (SETF # #) ...))
Warning: The binding of INVERT is not a
(VALUES &OPTIONAL SIMPLE-BASE-STRING &REST T):
NIL

(RETURN STRING)
--> RETURN-FROM
==>
STRING
Note: Deleting unreachable code.

(SCHAR STRING INDEX)
--> AREF
==>
INDEX
Note: Deleting unreachable code.

(1+ INDEX)
--> +
==>
INDEX
Note: Deleting unreachable code.
[Last message occurs 2 times]

(RETURN STRING)
--> RETURN-FROM
==>
STRING
Note: Deleting unreachable code.

(SETQ INVERT (COPY-SEQ STRING))
Note: Deleting unreachable code.

(SETF (SCHAR INVERT INDEX) (CHAR-DOWNCASE (SCHAR INVERT INDEX)))
--> COMMON-LISP::%SCHARSET COMMON-LISP::%ASET THE ==>
INVERT
Note: Deleting unreachable code.

(INCF INDEX)
--> LET* +
==>
INDEX
Note: Deleting unreachable code.

(SCHAR INVERT INDEX)
--> AREF THE
==>
INVERT
Note: Deleting unreachable code.

(RETURN STRING)
--> RETURN-FROM
==>
STRING
Note: Deleting unreachable code.

(RETURN INVERT)
--> RETURN-FROM
==>
INVERT
Note: Deleting unreachable code.

(SCHAR STRING INDEX)
--> AREF
==>
INDEX
Note: Deleting unreachable code.

(SETQ INVERT (COPY-SEQ STRING))
Note: Deleting unreachable code.

(SETF (SCHAR INVERT INDEX) (CHAR-UPCASE (SCHAR INVERT INDEX)))
--> COMMON-LISP::%SCHARSET COMMON-LISP::%ASET THE ==>
INVERT
Note: Deleting unreachable code.

(INCF INDEX)
--> LET* +
==>
INDEX
Note: Deleting unreachable code.

(SCHAR INVERT INDEX)
--> AREF THE
==>
INVERT
Note: Deleting unreachable code.

(RETURN STRING)
--> RETURN-FROM
==>
STRING
Note: Deleting unreachable code.

(RETURN INVERT)
--> RETURN-FROM
==>
INVERT
Note: Deleting unreachable code.

(INCF INDEX)
--> LET* +
==>
INDEX
Note: Deleting unreachable code.

Compiling Top-Level Form:

Compilation unit finished.
1 warning
19 notes


INVERT-STRING
T
T
* (invert-string "abc")

Type-error in KERNEL::OBJECT-NOT-TYPE-ERROR-HANDLER:
NIL is not of type SIMPLE-BASE-STRING

Restarts:
0: [ABORT] Return to Top-Level.

Debug (type H for help)

(INVERT-STRING #<unused-arg>)
Source: (PROG ((INVERT NIL) (INDEX 0) (LENGTH #))
(DECLARE (SIMPLE-STRING INVERT) (TYPE # INDEX LENGTH))
UNKNOWN-CASE
(COND (# #) (# # # #) (# # #) (T # #)) ...)
0]

Here's my code to match yours:

(defun invert-str (str)
  (declare (optimize (speed 3) (safety 0))
           (simple-string str))
  (check-type str string)
  (if (equal str (string-upcase str)) (string-downcase str)
      (if (equal str (string-downcase str)) (string-upcase str) str)))


It appears it will be significantly slower. But I'd like to find out how
to compile yours.

* (compile 'invert-str)
Compiling lambda (str):
Compiling Top-Level Form:

invert-str
nil
nil

* (defun st2 ()
    (time (loop for x from 1 to 1000000 do
            (invert-str "abc"))))

st2
* (compile 'st2)

Compiling lambda nil:
Compiling Top-Level Form:

st2
nil
nil
* (st2)

Evaluation took:
3.97 seconds of real time
3.56 seconds of user run time
0.42 seconds of system run time
[Run times include 1.05 seconds GC run time]
0 page faults and
47808176 bytes consed.

Regards,
Adam

Joe Marshall

May 31, 2002, 8:06:47 PM
The initial value of the variable invert is NIL,
yet it is declared to be a simple string. One solution is
to declare it to be (or null simple-string), but I think a
better solution is to initially bind it to "".
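
To spell that out, a minimal sketch of the two alternatives, using a toy
function rather than Erik's actual code:

;; Problematic: INVERT starts as NIL but is declared SIMPLE-STRING.
(defun sketch-bad ()
  (prog ((invert nil))
     (declare (simple-string invert))
     (setq invert (copy-seq "abc"))
     (return invert)))

;; Fix 1: widen the declaration to admit the NIL initial value.
(defun sketch-fix-1 ()
  (prog ((invert nil))
     (declare (type (or null simple-string) invert))
     (setq invert (copy-seq "abc"))
     (return invert)))

;; Fix 2: keep the declaration and start from an empty string instead.
(defun sketch-fix-2 ()
  (prog ((invert ""))
     (declare (simple-string invert))
     (setq invert (copy-seq "abc"))
     (return invert)))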

"Adam Warner" <use...@consulting.net.nz> wrote in message news:ad92d1$v7crv$1...@ID-105510.news.dfncis.de...

Adam Warner

May 31, 2002, 8:26:39 PM
On Sat, 01 Jun 2002 12:06:47 +1200, Joe Marshall wrote:

> The initial value of the variable invert is NIL,
> yet it is declared to be a simple string. One solution is
> to declare it to be (or null simple-string), but I think a
> better solution is to initially bind it to "".

... (prog ((invert "") ...

Thanks. Got it to compile now.

(defun st1 ()
  (time (loop for x from 1 to 1000000 do
          (invert-string "abc"))))

* (st1)

Evaluation took:
1.05 seconds of real time
0.84 seconds of user run time
0.21 seconds of system run time
[Run times include 0.38 seconds GC run time]
414 page faults and
15939920 bytes consed.

(defun st2 ()
(time (loop for x from 1 to 1000000 do
(invert-str "abc"))))

* (st2)

Evaluation took:
3.91 seconds of real time
3.41 seconds of user run time
0.45 seconds of system run time
[Run times include 1.01 seconds GC run time]
0 page faults and
47802072 bytes consed.

Remarkable. Erik's code turns out to be four times faster when everything
is compiled.

Very impressive. Thanks for helping me out everyone.

Regards,
Adam

Adam Warner

May 31, 2002, 8:35:33 PM
On Sat, 01 Jun 2002 04:11:06 +1200, Erik Naggum wrote:

> Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> (are these guys Microsoft employees or what?. Before compilation:

BTW Erik, since the code could not be compiled on my platform without
changing (prog ((invert nil) ... to (prog ((invert "") ... how come it
compiled for you?

Regards,
Adam

Wade Humeniuk

May 31, 2002, 9:45:13 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message
news:ad87o0$urdlr$1...@ID-105510.news.dfncis.de...

You probably want to use string= instead of equal

Or yet another version:

CL-USER 65 > (defun invert-string (string)
               (loop for char of-type character across string
                     with uppercase-p = nil
                     with lowercase-p = nil
                     when (upper-case-p char) do (setf uppercase-p t)
                     when (lower-case-p char) do (setf lowercase-p t)
                     finally return (cond
                                      ((eq uppercase-p lowercase-p) string)
                                      (uppercase-p (string-downcase string))
                                      (lowercase-p (string-upcase string)))))
INVERT-STRING

CL-USER 66 > (compile 'invert-string)
INVERT-STRING
NIL
NIL

CL-USER 67 > (defun st1 ()
(time (loop for x from 1 to 1000000 do
(invert-string (string :abc)))))

ST1

CL-USER 68 > (compile 'st1)
ST1
NIL
NIL

CL-USER 69 > (st1)

1.1 seconds used.
Standard Allocation 24000256 bytes.
Fixlen Allocation 0 bytes.
NIL

CL-USER 70 >

Wade

Adam Warner

May 31, 2002, 10:08:31 PM
On Sat, 01 Jun 2002 13:45:13 +1200, Wade Humeniuk wrote:


>> (defun invert-string (str)
>> (if (equal str (string-upcase str)) (string-downcase str)
>> (if (equal str (string-downcase str)) (string-upcase str) str)))
>
> You probably want to use string= instead of equal

Thanks for that tip. That alone makes the program significantly faster.
From 3.41s user run time to 2.72s.

Your version is incredibly fast:

* (defun invert-string (string)
    (loop for char of-type character across string
          with uppercase-p = nil
          with lowercase-p = nil
          when (upper-case-p char) do (setf uppercase-p t)
          when (lower-case-p char) do (setf lowercase-p t)
          finally return (cond
                           ((eq uppercase-p lowercase-p) string)
                           (uppercase-p (string-downcase string))
                           (lowercase-p (string-upcase string)))))

INVERT-STRING


* (compile 'invert-string)
Compiling LAMBDA (STRING):

Compiling Top-Level Form:

In: LAMBDA (STRING)
(LOOP FOR CHAR OF-TYPE CHARACTER ...)
--> LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY
==>
(PROGN RETURN (COND (# STRING) (UPPERCASE-P #) (LOWERCASE-P #)))
Warning: Undefined variable: RETURN


Warning: This variable is undefined:
RETURN


Compilation unit finished.
2 warnings


INVERT-STRING
NIL
NIL
* (defun st1 ()
    (time (loop for x from 1 to 1000000 do
            (invert-string "abc"))))

ST1
* (compile 'st1)


Compiling LAMBDA NIL:
Compiling Top-Level Form:

ST1
NIL
NIL
* (st1)

Evaluation took:
0.14 seconds of real time
0.15 seconds of user run time


0.0 seconds of system run time

0 page faults and
0 bytes consed.

It's around six times faster than Erik's on my system:

* (compile 'invert-string)
Compiling LAMBDA (STRING):

Compiling Top-Level Form:

INVERT-STRING
NIL
NIL
* (st1)

Evaluation took:
1.04 seconds of real time
0.9 seconds of user run time
0.14 seconds of system run time
[Run times include 0.38 seconds GC run time]
0 page faults and
15936064 bytes consed.


But your version only returns NIL :-)

(invert-string "abc")

NIL

When it returns the string it might make a difference!

Regards,
Adam

Joe Marshall

May 31, 2002, 11:06:31 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message news:ad94ov$vb8v9$1...@ID-105510.news.dfncis.de...

I don't want to put words in Erik's mouth, but I can answer this one.
Allegro CL doesn't depend on type declarations as much as CMUCL, nor
does it check them for consistency as thoroughly as CMUCL. As a result,
it is very easy to write an inconsistent declaration because the
compiler isn't going to tell you about it.

Wade Humeniuk

May 31, 2002, 11:27:20 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message
news:ad9a79$ugn79$1...@ID-105510.news.dfncis.de...

It looks like your CL has a different implementation of loop. I think my func
is proper ANSI CL.

In the meantime a more "primitive version" could be

CL-USER 16 > (defun invert-string (string)
               (let ((char nil)
                     (index 0)
                     (length (length string))
                     (uppercase-p nil)
                     (lowercase-p nil))
                 (declare (character char))
                 (tagbody
                  start
                   (when (< index length)
                     (progn
                       (setf char (char string index))
                       (when (upper-case-p char) (setf uppercase-p t))
                       (when (lower-case-p char) (setf lowercase-p t))
                       (incf index)
                       (go start))))
                 (cond
                   ((eq uppercase-p lowercase-p) string)
                   (uppercase-p (string-downcase string))
                   (lowercase-p (string-upcase string)))))
INVERT-STRING

CL-USER 17 > (compile 'invert-string)
INVERT-STRING
NIL
NIL

CL-USER 18 > (st1)

1.0 seconds used.
Standard Allocation 24000000 bytes.


Fixlen Allocation 0 bytes.
NIL

CL-USER 19 > (invert-string "WADE")
"wade"

CL-USER 20 > (invert-string "wade")
"WADE"

CL-USER 21 > (invert-string "Wade")
"Wade"

CL-USER 22 >

Adam Warner

May 31, 2002, 11:49:44 PM
On Sat, 01 Jun 2002 13:45:13 +1200, Wade Humeniuk wrote:

> CL-USER 65 > (defun invert-string (string)
> (loop for char of-type character across string
> with uppercase-p = nil
> with lowercase-p = nil
> when (upper-case-p char) do (setf uppercase-p t)
> when (lower-case-p char) do (setf lowercase-p t)
> finally return (cond
> ((eq uppercase-p lowercase-p) string)
> (uppercase-p (string-downcase string))
> (lowercase-p (string-upcase string)))))

Your code was the inspiration for my version, Wade. It seems to be about as
fast as Erik's but uses an approach I can understand:

(defun invert-str (str)
  (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
           (simple-string str) (bit up) (bit down))
  (let ((up 0) (down 0))
    (block skip
      (loop for char of-type character across str do
            (if (upper-case-p char) (setf up 1))
            (if (lower-case-p char) (setf down 1))
            (if (= up down 1) (return-from skip str)))
      (if (= up 1) (string-upcase str) (string-downcase str)))))

With everything compiled:

*(st2)
Evaluation took:
1.12 seconds of real time
0.94 seconds of user run time
0.14 seconds of system run time
[Run times include 0.31 seconds GC run time]
0 page faults and
15937472 bytes consed.

Compiling everything for Erik's version (I included the extra
(compilation-speed 0) (debug 0) optimisation options):

* (st1)
Evaluation took:
1.02 seconds of real time
0.89 seconds of user run time
0.13 seconds of system run time
[Run times include 0.35 seconds GC run time]
0 page faults and
15945384 bytes consed.

We also have somewhat similar performance when the string is mixed case ("Abc"):

* (st1)
Evaluation took:
0.12 seconds of real time
0.12 seconds of user run time


0.0 seconds of system run time

0 page faults and
0 bytes consed.

NIL

* (compile 'st2)


Compiling LAMBDA NIL:
Compiling Top-Level Form:

ST2
NIL


NIL
* (st2)
Evaluation took:

0.18 seconds of real time
0.19 seconds of user run time


0.0 seconds of system run time

0 page faults and
0 bytes consed.

Though Erik's version shows a bit more of a performance gain. At least I'm
now in the ballpark.

One thing I would like to understand is why the compiler warns me
that the variables up and down have not been declared when their type
has been set in declare and they have been explicitly set with a let
statement:

* (compile 'invert-str)
Compiling LAMBDA (STR):
Compiling Top-Level Form:

In: LAMBDA (STR)
#'(LAMBDA (STR)
(DECLARE (OPTIMIZE # # # #) (SIMPLE-STRING STR) (BIT UP) (BIT DOWN))
(BLOCK INVERT-STR
(LET #
#)))
Warning: Undefined variable: DOWN
Warning: Undefined variable: UP


Warning: These variables are undefined:
DOWN UP


Compilation unit finished.
3 warnings


Regards,
Adam

Adam Warner

Jun 1, 2002, 12:00:40 AM
On Sat, 01 Jun 2002 15:27:20 +1200, Wade Humeniuk wrote:

> It looks like your CL has a different implementation of loop. I think

> my func is a proper ANSI CL.

Luckily I used a simple loop in my version then. Graham does state in ANSI
Common Lisp (p 239): "If you are one of the many Lisp programmers who have
been planning one day to understand what loop does, there is some good
news and some bad news. The good news is that you are not alone: almost no
one understands it. The bad news is that you probably never will, because
the ANSI standard does not really give a formal specification of its
behaviour."

The optimisation has been fun. Thanks everyone.

BTW it's a pity the current ilisp code isn't readtable invert compatible.

Regards,
Adam

Joe Marshall

Jun 1, 2002, 12:06:47 AM

"Adam Warner" <use...@consulting.net.nz> wrote in message news:ad9489$us8lt$1...@ID-105510.news.dfncis.de...

> Remarkable. Erik's code turns out to be four times faster when everything
> is compiled.
>

(defun sym2str (sym)
  (if (equal (string sym) (string-upcase (string sym)))
      (string-downcase (string sym))    ;lowercase
      (if (equal (string sym) (string-downcase (string sym)))
          (string-upcase (string sym))  ;uppercase
          (string sym))))               ;mixedcase

In this version, the string is traversed a minimum of 3 times,
and at least 2 copies of the string are created. In the
case that the string is all lower case, it is traversed 5 times,
and three copies are made, one of which is the same as the
original, the other two are the same as each other.

In Erik's version, the string is traversed a minimum of once,
and no copy is made if there are no characters with case.
At most, the string is traversed twice and one copy is made.

Wade Humeniuk

Jun 1, 2002, 12:54:37 AM

"Adam Warner" <use...@consulting.net.nz> wrote in message
news:ad9g52$ubfeq$1...@ID-105510.news.dfncis.de...

Maybe your compiler is really picky, try

(defun invert-str (str)
  (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
           (simple-string str))
  (let ((up 0) (down 0))
    (declare (bit up down))

Tim Moore

Jun 1, 2002, 1:23:53 AM
On Fri, 31 May 2002 21:27:20 -0600, Wade Humeniuk <hume...@cadvision.com>
wrote:

>
>"Adam Warner" <use...@consulting.net.nz> wrote in message
>> * (defun invert-string (string)
>> (loop for char of-type character across string
>> with uppercase-p = nil
>> with lowercase-p = nil
>> when (upper-case-p char) do (setf uppercase-p t)
>> when (lower-case-p char) do (setf lowercase-p t)
>> finally return (cond
>> ((eq uppercase-p lowercase-p) string)
>> (uppercase-p (string-downcase string))
>> (lowercase-p (string-upcase string)))))
>>
>> INVERT-STRING
>> * (compile 'invert-string)
...

>> Warning: This variable is undefined:
>> RETURN
>It looks like your CL has a different implementation of loop. I think my func is proper
>is a proper ANSI CL.

No. One or more compound forms, not a loop clause, follow FINALLY.

Tim

Adam Warner

Jun 1, 2002, 1:26:32 AM
On Sat, 01 Jun 2002 16:54:37 +1200, Wade Humeniuk wrote:

>> One thing I would like to understand is why the compiler warns me
>> that the variables up and down have not been declared when their type
>> has been set in declare and they have been explicitly set with a let
>> statement:
>
> Maybe your compiler is really picky, try
>
> (defun invert-str (str)
> (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
> (simple-string str))
> (let ((up 0) (down 0))
> (declare (bit up down))
> (block skip
> (loop for char of-type character across str do
> (if (upper-case-p char) (setf up 1))
> (if (lower-case-p char) (setf down 1))
> (if (= up down 1) (return-from skip str)))
> (if (= up 1) (string-upcase str) (string-downcase str)))))

Yes, really picky. That works, thanks. If I don't want warnings I have to
declare a variable's type after it exists, which seems odd, because if the
compiler knew the type of a variable in advance it might be able to optimise
the initial storage of the variable.

Regards,
Adam

Tim Moore

Jun 1, 2002, 1:26:08 AM
On Sat, 01 Jun 2002 16:00:40 +1200, Adam Warner <use...@consulting.net.nz>
wrote:

>On Sat, 01 Jun 2002 15:27:20 +1200, Wade Humeniuk wrote:
>
>> It looks like your CL has a different implementation of loop. I think
>> my func is a proper ANSI CL.
>
>Luckily I used a simple loop in my version then. Graham does state in ANSI
>Common Lisp (p 239): "If you are one of the many Lisp programmers who have
>been planning one day to understand what loop does, there is some good
>news and some bad news. The good news is that you are not alone: almost no
>one understands it. The bad news is that you probably never will, because
>the ANSI standard does not really give a formal specification of its
>behaviour."

What a load of shit!

Tim

Adam Warner

Jun 1, 2002, 1:49:19 AM
On Sat, 01 Jun 2002 15:49:44 +1200, Adam Warner wrote:

> (defun invert-str (str)
> (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
> (simple-string str) (bit up) (bit down))
> (let ((up 0) (down 0))
> (block skip
> (loop for char of-type character across str do
> (if (upper-case-p char) (setf up 1))
> (if (lower-case-p char) (setf down 1))
> (if (= up down 1) (return-from skip str)))
> (if (= up 1) (string-upcase str) (string-downcase str)))))

It has been pointed out to me via email that I should be using t and nil
instead of 0 and 1. I thought it would be faster to explicitly declare a
boolean value using bit. It turns out that the more elegant code is the
same speed (as best I can tell):

(defun invert-str (str)
  (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
           (simple-string str))
  (let ((up nil) (down nil))
    (block skip
      (loop for char of-type character across str do
            (if (upper-case-p char) (setf up t))
            (if (lower-case-p char) (setf down t))
            (if (and up down) (return-from skip str)))
      (if up (string-upcase str) (string-downcase str)))))

The CMUCL compiler must be very intelligent to deduce the type of a
variable without it having to be declared (unless this is not what's
happening and it's just that the speed difference isn't apparent).

Regards,
Adam

Adam Warner

Jun 1, 2002, 3:10:16 AM
Thanks to a tip from Paul, using a cond is a little faster in some
circumstances:

(defun invert-str (str)
  (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
           (simple-string str))
  (let ((up nil) (down nil))
    (block skip
      (loop for char of-type character across str do
            (cond ((upper-case-p char) (if down (return-from skip str) (setf up t)))
                  ((lower-case-p char) (if up (return-from skip str) (setf down t)))))
      (if up (string-downcase str) (string-upcase str)))))

The difference here is that if char matches an upper case character
then the next conditional test for a lower case character is skipped. The
order is relevant as most symbols written in lowercase will become
uppercase when using readtable invert.

Putting the ifs in the conditional will make the algorithm faster when
the character is neither upper nor lower case. In such circumstances
there's no need to test whether there has been a character mismatch.

Note that some of my code did not switch the case right at the end:
I had (if up (string-upcase str) ... ;-)

Regards,
Adam

Andy

Jun 1, 2002, 6:01:10 AM
Erik Naggum wrote:
>
> * Andy <a...@smi.de>
> | Since i'm a beginner with lisp i don't understand the details of consing
> | completly (but working on that ;-).
>
> To "cons" (an object) as a verb is different from a "cons" (cell) as a
> noun. See the glossary, the function cons, and the system class cons.
After consulting the HyperSpec glossary I realized that cons is also used as an
idiom for storage allocation. I wasn't aware of this.

Thanks for the help.
Best regards
AHz

Erik Naggum

Jun 1, 2002, 7:29:44 AM
* Adam Warner

| Help! How did you compile invert-string Erik?

First, I learned how to quote less from other people's articles and from
program output -- paring it down to the essentials helps all parties.
That was in June, 1980 or so. Then I learned to _read_ error messages
from computers in the hopes that the author of the software was able to
express more than incoherent babble when things did not go his way. That
was in July 1980 or so. Then I learned that modifying the specific part
of the input about which programs had complained usually altered the
complaint, and over time I found out how to make the complaint go away
more often than not. This was in August 1980 or so. Then, in May
2002, I applied these skills.

| Here's my code to match yours:

Please pay attention. CMUCL complained that the binding of INVERT was
NIL, not a SIMPLE-BASE-STRING. How hard could it be to figure out what
to do?

Sorry, I have lost all interest in this. I offered the code in the
interest of showing you something, but you bicker and bitch. I deeply
and profoundly regret that I tried to help you and I will not try again.

Joe Marshall

Jun 1, 2002, 7:56:19 AM

"Adam Warner" <use...@consulting.net.nz> wrote in message news:ad9g52$ubfeq$1...@ID-105510.news.dfncis.de...

>
> One thing I would like to understand is why the compiler warns me
> that the variables up and down have not been declared when their type
> has been set in declare and they have been explicitly set with a let
> statement:
>
(defun invert-str (str)
  (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
           (simple-string str) (bit up) (bit down))
  (let ((up 0) (down 0))
    (block skip ....

Declarations are lexically scoped. There is no lexically visible binding
for the variables UP and DOWN at the point at which they are declared,
so the declaration is in error.
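
A minimal illustration of the point, with hypothetical names (the declaration
has to sit inside the form that establishes the binding):

;; Declaration made where no binding of UP is visible: the compiler warns
;; that UP is undefined, and the declaration does not apply to the LET below.
(defun scope-wrong ()
  (declare (bit up))
  (let ((up 0))
    up))

;; Declaration attached to the LET that actually binds UP.
(defun scope-right ()
  (let ((up 0))
    (declare (bit up))
    up))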

Adam Warner

Jun 1, 2002, 8:03:58 AM
Joe Marshall wrote:

That's understood, Joe. Just realise that Erik provided benchmark results
using exactly the same version and build of CMUCL as I did.

Regards,
Adam

Adam Warner

Jun 1, 2002, 9:42:36 AM
Erik Naggum wrote:

> * Adam Warner
> | Help! How did you compile invert-string Erik?
>
> First, I learned how to quote less from other people's articles and
> from program output -- paring it down to the essentials helps all
> parties. That was in June, 1980 or so. Then I learned to _read_ error
> messages from computers in the hopes that the author of the software
> was able to express more than incoherent babble when things did not go
> his way. That was in July 1980 or so. Then I learned that modifying
> the specific part of the input about which programs had complained
> usually altered the complaint, and over time I found out how to make
> the complaint go away more often than not. This was in in August 1980
> or so. Then, in May 2002, I applied these skills.

Congratulations.

> | Here's my code to match yours:
>
> Please pay attention. CMUCL complained about the binding of INVERT
> was NIL, not a SIMPLE-BASE-STRING. How hard could it be to figure out
> what to do?

By any chance when you benchmarked your code on CMUCL--using the same
version number and build of CMUCL as myself--did the code fail to compile?
If so, how hard could it have been to note this?

> Sorry, I have lost all interest in this. I offered the code in the
> interest of showing you something, but you bicker and bitch. I deeply
> and profoundly regret that I tried to help you and I will not try
> again.

I am confident that I have not bickered nor bitched nor conducted myself
inappropriately. And with the help of a number of people--including you
Erik--I have learned a great deal. So at least I have nothing to
profoundly regret.

Adam Warner

Raymond Wiker

Jun 1, 2002, 2:12:39 PM
Christophe Rhodes <cs...@cam.ac.uk> writes:

> Peter Van Eynde <pvan...@debian.org> writes:


>
> > Erik Naggum <er...@naggum.net> writes:
> >
> > > Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> > > (are these guys Microsoft employees or what?. Before compilation:
> >

> > Thanks :-S. No. But sometimes I'm stupid and don't upgrade the version
> > number. The build number is generated, so should always be different.
>
> What happens when it overflows the fixnum range? ;-)

Printing the build number becomes slower?

--
Raymond Wiker Mail: Raymon...@fast.no
Senior Software Engineer Web: http://www.fast.no/
Fast Search & Transfer ASA Phone: +47 23 01 11 60
P.O. Box 1677 Vika Fax: +47 35 54 87 99
NO-0120 Oslo, NORWAY Mob: +47 48 01 11 60

Try FAST Search: http://alltheweb.com/

Thomas F. Burdick

Jun 1, 2002, 3:20:57 PM
Adam Warner <use...@consulting.net.nz> writes:

> That's understood Joe. Just realise that Erik provided benchmark results
> using exactly the same version and build of CMUCL as I.

Since the code appeared in one article, and the benchmark results
later, I assume he noticed the error and corrected it, figuring that
anyone who enjoys worrying about speed enough to indulge in a silly
benchmarking thread would also have the wherewithal to notice and fix
the mistake. Let's revisit the debugger output you got:

* (invert-string "abc")

Type-error in KERNEL::OBJECT-NOT-TYPE-ERROR-HANDLER:
NIL is not of type SIMPLE-BASE-STRING

Restarts:
0: [ABORT] Return to Top-Level.

Debug (type H for help)

(INVERT-STRING #<unused-arg>)
Source: (PROG ((INVERT NIL) (INDEX 0) (LENGTH #))
(DECLARE (SIMPLE-STRING INVERT) (TYPE # INDEX LENGTH))
UNKNOWN-CASE
(COND (# #) (# # # #) (# # #) (T # #)) ...)
0]

Honestly, I think it would be hard for it to get any clearer.
Something should be a SIMPLE-BASE-STRING, but it's NIL. Look for NIL
&/or the SIMPLE-STRING declaration, and you've found the problem. It
doesn't look like you put any effort into figuring out what went
wrong, not the least because you said you couldn't compile it, but it
/compiled/ just fine. You got a runtime error. And included all of
CMUCL's chattiness. Frankly, that's rude. It might have been
unintentional, but it's rude nonetheless.

--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'

Thomas F. Burdick

Jun 1, 2002, 3:58:18 PM
Adam Warner <use...@consulting.net.nz> writes:

> Yes really picky. That works thanks. If I don't want warnings I have to
> declare a variable's type after it exists. Which seems odd because if the
> compiler knows what type a variable should be in advance it might be able
> to optimise the initial storage of the variable.

It's not being picky, it's being correct. The declare statement in
the DEFUN is saying that in the environment of the DEFUN, UP and DOWN
are BITs, and at this point, there's no binding for UP or DOWN. Then
you promptly introduce new bindings for UP and DOWN. These are
different variables, so those declarations don't apply. Remember, LET
isn't assignment, it makes a new lexical environment, with new
bindings (variables). The declarations are dealt with at compile
time, and apply to the environment of the form they're attached to.
So when you do:

(let ((up 0) (down 0))
  (declare (bit up down)) ...)

It means the same thing as the imaginary form:

(with-verbose-bindings
    ((up :of-type bit :initially 0)
     (down :of-type bit :initially 0))
  ...)

Thomas F. Burdick

Jun 1, 2002, 4:27:46 PM
Adam Warner <use...@consulting.net.nz> writes:

> (defun invert-str (str)
> (declare (optimize (speed 3) (compilation-speed 0) (debug 0) (safety 0))
> (simple-string str))
> (let ((up nil) (down nil))
> (block skip
> (loop for char of-type character across str do
> (if (upper-case-p char) (setf up t))
> (if (lower-case-p char) (setf down t))
> (if (and up down) (return-from skip str)))
> (if up (string-upcase str) (string-downcase str)))))
>
> The CMUCL complier must be very intelligent to deduce the type of
> variable without it having to be declared (unless this is not what's
> happening and it's just that the speed difference isn't apparent).

Although the CMUCL compiler is generally very intelligent, in this
case it doesn't need to be. Assigning T or NIL to a variable is just
an assignment, no type check is needed. And testing if a variable is
NIL is the same cost, whether it was declared of type BOOLEAN or not.
Either way, the code for the test part of

(lambda (x) (if x ...))

will be

CMP %A0, %NULL ; %A0 = X
BNE L0

Adam Warner

Jun 1, 2002, 8:43:52 PM
On Sun, 02 Jun 2002 07:20:57 +1200, Thomas F. Burdick wrote:

> Adam Warner <use...@consulting.net.nz> writes:
>
>> That's understood Joe. Just realise that Erik provided benchmark
>> results using exactly the same version and build of CMUCL as I.
>
> Since the code appeared in one article, and the benchmark results later,
> I assume he noticed the error and corrected it, figuring that anyone who
> enjoys worrying about speed enough to indulge in a silly benchmarking
> thread would also have the wherewithal to notice and fix the mistake.
> Let's revisit the debugger output you got:

"the wherewithal to notice and fix the mistake." That is the sum of my
human failing: I hastily asked for help in an active thread and received
help within 11 minutes.

(snip)

> Honestly, I think it would be hard for it to get any clearer.

Again I agree Thomas. I failed as a sentient being. I hastily asked for
help in an active thread and received help within 11 minutes.

(snip)

> You got a runtime error. And included all of CMUCL's chattiness.
> Frankly, that's rude. It might have been unintentional, but it's rude
> nonetheless.

Posting a bug report is not rude. If you want an example of what's
profoundly rude Thomas, just tell someone that you deeply and profoundly
regret trying to help them.

After starting this thread by making a fool of myself through lack of
understanding I set about to reproduce the tests. I broadcast that my
simple algorithm was dog slow when compiled, being around four times
slower than Erik's code (what an embarrassment!) With the help of others I
was able to come up with an acceptably performing algorithm that accords
with my sense of aesthetics (for example it avoids five `go' statements).

Lisp throws away symbol information as part of its historical baggage. An
invert workaround is available to preserve the information while still
allowing lowercase code. If symbols are being used as the building blocks
of HTML/XML generation then the invert code will be called many times.
There's a good reason why it should be fast.

Thank you very much for your additional comments in the thread. They are
much appreciated, especially the information about how CMUCL constructs
assembly code and how no type checking is needed when a variable is
assigned t or nil.

Regards,
Adam

Erik Naggum

Jun 2, 2002, 5:33:23 AM
* Adam Warner

| Posting a bug report is not rude. If you want an example of what's
| profoundly rude Thomas, just tell someone that you deeply and profoundly
| regret trying to help them.

Help has to be deserved. If you receive help and turn around to be a
jackass towards the person who has helped you, you have been so rude that
nobody should consider helping you again. That you are arrogant beyond
belief, too, and attempt to take the high moral ground by refusing to
listen to hints to modify your behavior, but say outright that you have
done nothing wrong, that people who perceive you as rude are wrong, is
nothing short of despicable. People like you are the reason I have no
_desire_ to share code with anyone, anymore. People like you are the
reason I generally let others deal with newbies. Too many of you turn
out to remain clueless, anyway.

Nils Goesche

Jun 3, 2002, 9:18:40 AM
tmo...@sea-tmoore-l.dotcast.com (Tim Moore) writes:

Indeed. It was exactly the above paragraph that scared me away from
LOOP, too, when I started learning CL. Then, I saw many simple ways
of using it on comp.lang.lisp, began using some simple cases, too, and
finally looked it up in the HyperSpec. I was very surprised.

Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9

Thomas F. Burdick

Jun 3, 2002, 2:31:28 PM
Adam Warner <use...@consulting.net.nz> writes:

> With the help of others I was able to come up with an acceptably
> performing algorithm that accords with my sense of aesthetics (for
> example it avoids five `go' statements).

For what it's worth, I don't think that's a very good goal (avoiding
GOs). If you're talking about a fairly small algorithm, I think
Erik's code was a great example of why we still have goto statements
in modern languages. There's no twisted spaghetti logic, and the way
to achieve the same thing without using GO is to emulate it using
booleans and CASE or COND, which is both less efficient, and harder to
read[*].

> Lisp throws away symbol information as part of its historical baggage.

It what now? I don't understand what you're trying to say here.

[*] I realize that's a subjective criterion, but that doesn't mean I'm
not right :). I'm pretty sure that anyone who's comfortable with
both well styled "structured" code and well styled code that uses
GO, would have an easier time reading Erik's formulation of the
algorithm than one that used looping and boolean variables. As
for people who aren't comfortable with one or the other, they
don't count, because ignorance is a very lame argument against a
programming style.

Adam Warner

Jun 3, 2002, 9:22:09 PM
On Tue, 04 Jun 2002 06:31:28 +1200, Thomas F. Burdick wrote:

I'll avoid commenting upon which algorithm is easier to understand. As
you concur it's subjective.

BTW compare the speed of the algorithms on clisp which compiles
to non-native/byte code (and thus may give an advantage to a simpler
algorithm). My original algorithm is faster than both of them:

(defun invert-str (str)
  (declare (optimize (speed 3) (safety 0) (debug 0) (safety 0))
           (simple-string str))
  (check-type str string)
  (if (equal str (string-upcase str)) (string-downcase str)
      (if (equal str (string-downcase str)) (string-upcase str) str)))

[46]> (compile 'invert-str)
INVERT-STR ;
NIL ;
NIL
[47]>
(defun st2 ()
  (time (loop for x from 1 to 1000000 do
          (invert-str "abc"))))
ST2
[48]> (compile 'st2)
ST2 ;
NIL ;
NIL
[49]> (st2)

Real time: 5.285 sec.
Run time: 5.28 sec.
Space: 48000000 Bytes
GC: 91, GC time: 0.66 sec.
NIL

5.3 seconds. Not bad. Note that using equal is _faster_ than string=
(time is 5.6 seconds with string=)

This is a very impressive result. CMUCL's compiler can achieve 4.04s using
native code, which is not a large margin for a native-code compiler over byte
code. Note that CMUCL performs much worse if the code is interpreted:

Evaluation took:
52.9 seconds of real time
51.5 seconds of user run time
1.35 seconds of system run time
[Run times include 3.82 seconds GC run time]
435 page faults and
175333976 bytes consed.

Whereas the clisp version performs twice as fast:
$ clisp --quiet

[1]> (defun invert-str (str)
       (declare (optimize (speed 3) (safety 0) (debug 0) (safety 0))
                (simple-string str))
       (check-type str string)
       (if (equal str (string-upcase str)) (string-downcase str)
           (if (equal str (string-downcase str)) (string-upcase str) str)))

INVERT-STR
[2]> (defun st2 ()
       (time (loop for x from 1 to 1000000 do
               (invert-str "abc"))))
ST2
[3]> (st2)

Real time: 25.642069 sec.
Run time: 25.64 sec.
Space: 48000000 Bytes
GC: 91, GC time: 0.64 sec.
NIL


Now my more complicated algorithm ("final code" from this thread):
[51]> (compile 'invert-str)
INVERT-STR ;
NIL ;
NIL
[52]> (st2)

Real time: 6.615702 sec.
Run time: 6.62 sec.
Space: 16000000 Bytes
GC: 31, GC time: 0.23 sec.
NIL

1.3 seconds slower than the simple and compiled algorithm on clisp.


Finally Erik's code (also with max compile speed options):
[54]> (compile 'invert-string)
INVERT-STRING ;
NIL ;
NIL

[58]>

(defun st1 ()
(time (loop for x from 1 to 1000000 do
(invert-string "abc"))))
ST1

[59]> (compile 'st1)
ST1 ;
NIL ;
NIL
[60]> (st1)

Real time: 8.543372 sec.
Run time: 8.54 sec.
Space: 16000000 Bytes
GC: 31, GC time: 0.25 sec.
NIL

1.9 seconds slower again, and 60% slower than the simple algorithm. Note that
both of the more complicated algorithms would be significantly faster in a
best-case mixed-case scenario with longer strings (because even with a longer
string they can detect mixed case after comparing only the first couple of
characters).


>> Lisp throws away symbol information as part of its historical baggage.
>
> It what now? I don't understand what you're trying to say here.

Perhaps pasting back in the context helps:

"Lisp throws away symbol information as part of its historical baggage. An
invert workaround is available to preserve the information while still
allowing lowercase code."

When read in its context it's clear I am commenting upon the case of
entered code being converted to upper case when creating symbols. If Lisp
was being designed today it's very likely that case would be preserved--if
for no other reasons than (a) it assists interoperability with case-sensitive
languages and markup; and (b) it's easier to destroy information than it
is to recreate it (so leave the destruction of case information up to the
programmer).

I've read a lot comments about this issue so I know it's contentious. You
don't have to agree with me but you should be able to understand what I'm
"trying to say here."

Regards,
Adam

Nils Goesche

Jun 3, 2002, 10:22:32 PM
Adam Warner <use...@consulting.net.nz> writes:

> On Tue, 04 Jun 2002 06:31:28 +1200, Thomas F. Burdick wrote:

[Well, first Adam Warner wrote:]

> >> Lisp throws away symbol information as part of its historical baggage.
> >
> > It what now? I don't understand what you're trying to say here.
>
> Perhaps pasting back in the context helps:
>
> "Lisp throws away symbol information as part of its historical baggage. An
> invert workaround is available to preserve the information while still
> allowing lowercase code."
>
> When read in its context it's clear I am commenting upon the case of
> entered code being converted to upper case when creating symbols. If Lisp
> was being designed today it's very likely that case would be preserved--if
> for no other reasons than (a) it assists interoperability with case-sensitive
> languages and markup; and (b) it's easier to destroy information than it
> is to recreate it (so leave the destruction of case information up to the
> programmer).

First of all, the names of interned symbols are only converted to
upper case if you don't escape those symbols, as in |FooBar|, for
example, and only if you don't set (readtable-case) to :preserve.
This is the default setting of the /reader/, but internally,
FOOBAR and |FooBar| are different symbols, so Lisp is in fact
case sensitive. I wouldn't call this ``throwing away of symbol
information''. It was a conscious decision to let the reader
behave this way by default, and past discussions have shown that
it is very unlikely that people would choose another default
behavior today. Your point (a) is not that important, as Lisp as
a general purpose language has a far wider scope than dealing
with markup languages, and you can always set (readtable-case) to
:preserve if you really need that. (b) is a bit silly, as
information that has been destroyed is not hard to recreate but
/impossible/ to recreate. And again, if you want case to be
preserved by the reader, just tell it so, and remember that the
symbol table /is/ case sensitive, so all you are talking about is
the reader, which can be customized to no end.
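
A small REPL illustration of that point (illustrative only, not part of the
original post):

(eq 'foobar '|FOOBAR|)            ; => T   (the reader upcased foobar)
(eq 'foobar '|FooBar|)            ; => NIL (two distinct symbols)
(symbol-name '|FooBar|)           ; => "FooBar"

;; and with READTABLE-CASE set to :PRESERVE the case survives reading:
(let ((*readtable* (copy-readtable)))
  (setf (readtable-case *readtable*) :preserve)
  (read-from-string "FooBar"))    ; => |FooBar|, 6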

Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xC66D6E6F

Adam Warner

unread,
Jun 4, 2002, 12:07:02 AM6/4/02
to
On Tue, 04 Jun 2002 14:22:32 +1200, Nils Goesche wrote:

> Adam Warner <use...@consulting.net.nz> writes:
>
>> On Tue, 04 Jun 2002 06:31:28 +1200, Thomas F. Burdick wrote:
>
> [Well, first Adam Warner wrote:]
>
>> >> Lisp throws away symbol information as part of its historical
>> >> baggage.
>> >
>> > It what now? I don't understand what you're trying to say here.
>>
>> Perhaps pasting back in the context helps:
>>
>> "Lisp throws away symbol information as part of its historical baggage.
>> An invert workaround is available to preserve the information while
>> still allowing lowercase code."
>>
>> When read in its context it's clear I am commenting upon the case of
>> entered code being converted to upper case when creating symbols. If
>> Lisp was being designed today it's very likely that case would be
>> preserved--if for no other reasons than (a) it assists interoperability
>> with case-sensitive languages and markup; and (b) it's easier to
>> destroy information than it is to recreate it (so leave the destruction
>> of case information up to the programmer).
>
> First of all, the names of interned symbols are only converted to upper
> case if you don't escape those symbols, as in |FooBar|, for example

Right. But that doesn't mean you are able to _refer_ to the preserved case
without also supplying the vertical bars or escaping the symbol/function
name:

[1]> (defun |lower| (x)
(write x))
|lower|
[2]> (lower :hello)

*** - EVAL: the function LOWER is undefined

(|lower| :hello)
:HELLO
:HELLO
(\l\o\w\e\r :hello)
:HELLO
:HELLO

(|lower| :|hello|)
:|hello|
:|hello|

(\l\o\w\e\r :\h\e\l\l\o)
:|hello|
:|hello|

> This is the default setting of the /reader/, but internally, FOOBAR and
> |FooBar| are different symbols, so Lisp is in fact case sensitive. I
> wouldn't call this ``throwing away of symbol information''.

Yet below you go on to state that it is /impossible/ to retrieve the case
information.

> It was a conscious decision to let the reader behave this way by
> default, and past discussions have shown that it is very unlikely that
> people would choose another default behavior today. Your point (a) is
> not that important, as Lisp as a general purpose language has a far
> wider scope than dealing with markup languages, and you can always set
> (readtable-case) to :preserve if you really need that.

Which most importantly breaks lowercase code.

> (b) is a bit silly, as
> information that has been destroyed is not hard to recreate but
> /impossible/ to recreate. And again, if you want case to be preserved
> by the reader, just tell it so, and remember that the symbol table /is/
> case sensitive, so all you are talking about is the reader, which can be
> customized to no end.

Yes the reader can be customised to break most legacy code. Even the
:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP and
the latest release).

Believe if you wish that it would be "very unlikely" that people would
choose different default reader behaviour today. Even if reader case
insensitivity was desired I am certain that a decision would be made to
translate to lowercase instead of uppercase characters.

It would also be nice if Lisp generated code (using macros etc.) actually
looked like human generated code (so for example it could be cut and
pasted without having to translate the case). The best way to achieve this
is to preserve the case that the programmer desires (which today is
overwhelmingly lower or mixed case).

I suspect many in the Lisp community will move to using a case sensitive
reader over time, facilitated by interpreters and compilers that accept
both lower and upper case functions and macros. It would be interesting to
know how much of an impact this could have upon interpreter performance.

We'll just have to wait a few years to see whether I turn out to be right
about this suspicion.

Regards,
Adam

Gabe Garza

unread,
Jun 4, 2002, 1:08:53 AM6/4/02
to
Adam Warner <use...@consulting.net.nz> writes:

> On Tue, 04 Jun 2002 06:31:28 +1200, Thomas F. Burdick wrote:
>

> [snip OP pointing out that this in CLisp:]


>
> (defun invert-str (str)
> (declare (optimize (speed 3) (safety 0) (debug 0) (safety 0))
> (simple-string str))
> (check-type str string)
> (if (equal str (string-upcase str)) (string-downcase str)
> (if (equal str (string-downcase str)) (string-upcase str) str)))
>

> (defun st2 ()
> (time (loop for x from 1 to 1000000 do
> (invert-str "abc"))))
>

> Real time: 5.285 sec.
>
> [Was faster then CMUCL:]


>
> Evaluation took:
> 52.9 seconds of real time
> 51.5 seconds of user run time
> 1.35 seconds of system run time
> [Run times include 3.82 seconds GC run time]
> 435 page faults and
> 175333976 bytes consed.
>

I don't think this is a reasonable test to benchmark two
implementations on. You're going to get the biggest advantage from an
optimizing native-code compiler relative to byte-codes when you write
complex functions: otherwise you're benchmarking the implementation's
library more than you are the compiler. When you consider that
STRING-UPCASE, STRING-DOWNCASE, and EQUAL are written in Lisp
in CMUCL, and in C in CLisp, then CMUCL isn't so shabby.

Gabe Garza

Tim Moore

unread,
Jun 4, 2002, 1:21:18 AM6/4/02
to
On Tue, 04 Jun 2002 16:07:02 +1200, Adam Warner <use...@consulting.net.nz>
wrote:

>On Tue, 04 Jun 2002 14:22:32 +1200, Nils Goesche wrote:
>
>> First of all, the names of interned symbols are only converted to upper
>> case if you don't escape those symbols, as in |FooBar|, for example
>
>Right. But that doesn't mean you are able to _refer_ to the preserved case
>without also supplying the vertical bars or escaping the symbol/function
>name:
>
>[1]> (defun |lower| (x)
> (write x))
>|lower|
>[2]> (lower :hello)
>
>*** - EVAL: the function LOWER is undefined
>
>(|lower| :hello)
>:HELLO
>:HELLO
> (\l\o\w\e\r :hello)
>:HELLO
>:HELLO
>
>(|lower| :|hello|)
>:|hello|
>:|hello|
>
>(\l\o\w\e\r :\h\e\l\l\o)
>:|hello|
>:|hello|

You've just shown 4 different ways to refer to the lower case symbol!
What's the problem again?

>
>Yes the reader can be customised to break most legacy code. Even the
>:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP and
>the latest release).

ILISP is not a Common Lisp program, it's an Emacs lisp program with
some interface bits written in Common Lisp. All you've proven is that
ILISP should be more careful about setting print case and read case in
order to be a conforming ANSI Common Lisp program.

>It would also be nice if Lisp generated code (using macros etc.) actually
>looked like human generated code (so for example it could be cut and
>pasted without having to translate the case). The best way to achieve this
>is to preserve the case that the programmer desires (which today is
>overwhelmingly lower or mixed case).

Cut and paste into programs? This is almost never useful. Why not
just use the macro in your code directly?

>
>I suspect many in the Lisp community will move to using a case sensitive
>reader over time, facilitated by interpreters and compilers that accept
>both lower and upper case functions and macros. It would be interesting to
>know how much of an impact this could have upon interpreter
>performance.

How about "zero?"

>
>We'll just have to wait a few years to see whether I turn out to be right
>about this suspicion.

Do report back to us then.

Tim

Adam Warner

unread,
Jun 4, 2002, 1:37:25 AM6/4/02
to
On Tue, 04 Jun 2002 17:08:53 +1200, Gabe Garza wrote:

> I don't think this is a reasonable test to benchmark two implementations
> on.

It was just an example. I haven't seen comprehensive interpreted
benchmarks. Here is a variety of compiled comparisons:

http://ww.telent.net/cliki/Performance%20Benchmarks

One thing that is astonishing is clisp's bignum performance.

> You're going to get the biggest advantage from an optimizing native-code
> compiler relative to byte-codes when you write complex functions:
> otherwise you're benchmarking the implementation's library more than you
> are the compiler.

Which is a great description of what I have come to understand: unless you
are certain that your more complex algorithm is going to be used on a
native code compiler, you might be better off leveraging the built-in
libraries.

> When you consider that STRING-UPCASE, STRING-DOWNCASE, and EQUAL are
> written in Lisp in CMUCL, and in C in CLisp, then CMUCL
> isn't so shabby.

You're right. Thanks for helping me understand that. And it's an important
issue for anyone that expects to be evaluating a lot of code at run time
(and by "code" I mean the Lisp sense where data can be expressed as code
and executed).

Regards,
Adam

Adam Warner

unread,
Jun 4, 2002, 2:50:53 AM6/4/02
to

Hi Tim,

The most efficient way to refer to a lower case symbol is to refer to it
by using the same lower case symbol. Anything else is more verbose. That's
essentially the problem.

Mixing you up with another Tim I was thinking you may have experienced
this issue yourself. But it's Tim Bradshaw's HTML generator:

http://www.tfeb.org/lisp/hax.html#HTOUT

It generates HTML that looks like this:

<h1>Numbers from zero below ten</h1>
<p>Table border width 0</p><br>
<hr NOSHADE><center><table BORDER='0' WIDTH='90%'><tbody><tr><th
ALIGN='left'>English</th><th ALIGN='right'>Arabic</th><th
ALIGN='right'>Roman</th></tr><tr BGCOLOR='blue'><td
ALIGN='left'>zero</td><td ALIGN='right'>0</td><td ALIGN='right'></td></tr>
<tr BGCOLOR='white'><td ALIGN='left'>one</td><td ALIGN='right'>1</td>

All the attributes of the tags that are generated are not XHTML compatible.
Symbols were used to refer to attributes. Those symbols which were entered
as lower case were converted to upper case by the reader and output by the
generator.

http://www.w3.org/TR/2000/REC-xhtml1-20000126/

The XML document object model specifies that element and attribute
names are returned in the case they are specified. In XHTML 1.0,
elements and attributes are specified in lower-case.

Upper case tags are no longer recognised. And since XML in general is case
sensitive, just converting all attributes to lower case is neither a robust
nor a long-term solution (but it is OK for XHTML 1.0). You either need to find
another way to represent symbols (perhaps by replacing the : character
with some new syntax) or use a case sensitive reader or just put up with
more verbose syntax (e.g. :|align| or specifying "align" as a string
instead of a symbol).
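
For example (an illustration only, not actual output from the generator
above), the difference shows up as soon as the attribute designator is
printed:

;; With the default readtable the lower-case keyword is upcased at read
;; time; escaping it, or using a string, keeps the case XHTML wants.
(format nil "<td ~a='left'>" :align)    ; => "<td ALIGN='left'>"
(format nil "<td ~a='left'>" :|align|)  ; => "<td align='left'>"
(format nil "<td ~a='left'>" "align")   ; => "<td align='left'>"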

>>Yes the reader can be customised to break most legacy code. Even the
>>:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP
>>and the latest release).
>
> ILISP is not a Common Lisp program, it's an Emacs lisp program with some
> interface bits written in Common Lisp. All you've proven is that ILISP
> should be more careful about setting print case and read case in order
> to be a conforming ANSI Common Lisp program.

I just illustrated that it can be difficult to even start editing code in
a case sensitive Lisp mode when a widely used tool may not be able to cope
with case inversion.



>>It would also be nice if Lisp generated code (using macros etc.)
>>actually looked like human generated code (so for example it could be
>>cut and pasted without having to translate the case). The best way to
>>achieve this is to preserve the case that the programmer desires
>>(which today is overwhelmingly lower or mixed case).
>
> Cut and paste into programs? This is almost never useful. Why not just
> use the macro in your code directly?

For the situations that fit into categories other than "almost never
useful". Perhaps you can explain why the default of upper case machine
generated Lisp is superior when you want to cut and paste code. Otherwise
this point stands.

(snip)

>>We'll just have to wait a few years to see whether I turn out to be
>>right about this suspicion.
>
> Do report back to us then.

OK. It's just a naive prediction.

Regards,
Adam

Marco Antoniotti

unread,
Jun 4, 2002, 10:49:49 AM6/4/02
to

tmo...@sea-tmoore-l.dotcast.com (Tim Moore) writes:

> On Tue, 04 Jun 2002 16:07:02 +1200, Adam Warner <use...@consulting.net.nz>
> wrote:

...

> >
> >Yes the reader can be customised to break most legacy code. Even the
> >:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP and
> >the latest release).
>
> ILISP is not a Common Lisp program, it's an Emacs lisp program with
> some interface bits written in Common Lisp. All you've proven is that
> ILISP should be more careful about setting print case and read case in
> order to be a conforming ANSI Common Lisp program.

Well, yes. ILISP 5.12.0 is more robust, but still not perfect (and
READTABLE-CASE issues are always lurking around). Please (Adam
Warner, that is) send in a good test case and (possibly) a diagnosis.

> >I suspect many in the Lisp community will move to using a case sensitive
> >reader over time, faciliated by interpreters and compilers that accept
> >both lower and upper case functions and macros. It would be interesting to
> >know how much of an impact this could have upon interpreter
> >performance.
>
> How about "zero?"

Well, I gave a lot of thought to this. My gut feeling is that you
will have to pay a price in "interpreted" code in order to retrieve
the "right" symbol.

Yet this begs the question about "what is an interpreter" for CL.
CMUCL and Corman Lisp (these I know, other implementations may do the
same) essentially compile forms on the fly, so their "execution" will
actually achieve "zero" cost.

Yet, I believe this is a problem of perception. From the
"interpreted-only, single-implementation" language point of view, this
may seem a relevant problem. IMHO it is not.

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

Tim Bradshaw

unread,
Jun 4, 2002, 11:27:01 AM6/4/02
to
* Adam Warner wrote:

> http://www.w3.org/TR/2000/REC-xhtml1-20000126/

> The XML document object model specifies that element and attribute
> names are returned in the case they are specified. In XHTML 1.0,
> elements and attributes are specified in lower-case.

Yet another reason not to use XML. About the only thing SGML got
right was being case-insensitive by default, but the fools who
`designed' XML were blinded by the lOvely naming staNdaRds YoU Have tO
UsE in lanGuaGeS without convEnIent inter-woRd seParaTors. Amazingly
you *can* use hyphens in XML names (they didn't manage to break that),
but nonetheless we are going to be eaten alive by incomprehensible
studly XML.

If you have to use XML (note: htmlify is called htmlify for a reason)
then, assuming you've rejected the obvious choices of finding a new
career or suicide, it's not really a lot of work to support
case-sensitivity per form:


(defvar *cs-readtable*
  (let ((rt (copy-readtable)))
    (setf (readtable-case rt) :preserve)
    rt))

(defvar *proper-readtable* *readtable*)

(set-dispatch-macro-character
 #\# #\~
 #'(lambda (stream char infix)
     (declare (ignore char infix))
     (let ((*proper-readtable* *readtable*)
           (*readtable* *cs-readtable*))
       (read stream t nil t)))
 *proper-readtable*)

(set-dispatch-macro-character
 #\# #\~
 #'(lambda (stream char infix)
     (declare (ignore char infix))
     (let ((*readtable* *proper-readtable*))
       (read stream t nil t)))
 *cs-readtable*)


#~(DEFUN FOO (x) (LIST x '#~x))

You could even use ~ rather than #~ if you wanted it to be really
terse:

~(DEFUN FOO (x) (LIST x '~X))

Incidentally, lest anyone misinterpret my feelings about
case-sensitivity: I don't have any. I do have very strong feelings
about StudlyCaps, and it does look like a case-insensitive reader (or
full-blown case-insensitivity) is the only way known to prevent the
appalling StudliNess that infests programs in C/Java family languages.
Actually, I think Studly code is worse than XML - at least XML will be
dead in a few years, but big C++ programs will live forever, if only
because they won't have finished compiling by the time XML has gone
away.

--tim

Tim Moore

unread,
Jun 4, 2002, 12:13:00 PM6/4/02
to
On Tue, 04 Jun 2002 18:50:53 +1200, Adam Warner <use...@consulting.net.nz>
wrote:

>On Tue, 04 Jun 2002 17:21:18 +1200, Tim Moore wrote:
...

>> You've just shown 4 different ways to refer to the lower case symbol!
>> What's the problem again?
>
>Hi Tim,
>
>The most efficient way to refer to a lower case symbol is to refer to it
>by using the same lower case symbol. Anything else is more verbose. That's
>essentially the problem.

I'm actually pretty agnostic on the whole case issue. I work with a
"modern" Lisp and play with a Lisp that aspires to be ANSI
compatible. And yes, in the modern Lisp it's a bit easier to rip out
macros that deal with mixed case XML. At work I'm lazy about using
mixed case symbols all over the place. However, a very small amount
of additional work would be required on my part to make our stuff work
in ANSI Common Lisp, both as a macro writer and a as user. If
anything this discussion is doing a good job of convincing me that the
default case doesn't matter much for new tasks as long as it's well
defined.

To touch briefly on the stuff I elided above, CL programmers rarely
have to deal with mixed case in any great amount in normal program
code, so worrying about the efficiency of referring to mixed case
symbols in user code leaves me yawning.

>
>Mixing you up with another Tim I was thinking you may have experienced
>this issue yourself. But it's Tim Bradshaw's HTML generator:

The confusion is more flattering to me than it is to Bradshaw :)

>http://www.tfeb.org/lisp/hax.html#HTOUT
>
>It generates HTML that looks like this:
>
><h1>Numbers from zero below ten</h1>
><p>Table border width 0</p><br>
><hr NOSHADE><center><table BORDER='0' WIDTH='90%'><tbody><tr><th
>ALIGN='left'>English</th><th ALIGN='right'>Arabic</th><th
>ALIGN='right'>Roman</th></tr><tr BGCOLOR='blue'><td
>ALIGN='left'>zero</td><td ALIGN='right'>0</td><td ALIGN='right'></td></tr>
><tr BGCOLOR='white'><td ALIGN='left'>one</td><td ALIGN='right'>1</td>

Looks like fine HTML to me.

>All the attributes of the tags that are generated are not XHTML compatible.
>Symbols were used to refer to attributes. Those symbols which were entered
>as lower case were converted to upper case by the reader and output by the
>generator.

Let me get this straight: you're criticizing the default case in
the Common Lisp reader because a certain set of macros doesn't automagically
produce output that conforms to a standard that it wasn't designed to
produce?

>Upper case tags are no longer recognised. And since XML in general is case
>sensitive, just converting all attributes to lower case is neither a robust
>nor a long-term solution (but it is OK for XHTML 1.0). You either need to find
>another way to represent symbols (perhaps by replacing the : character
>with some new syntax) or use a case sensitive reader or just put up with
>more verbose syntax (e.g. :|align| or specifying "align" as a string
>instead of a symbol).

And now, although an extremely simple modification to Bradshaw's macro
would produce valid XHTML (at least in this instance), that's not good
enough because it doesn't apply to all possible XML output?

I think you have a skewed idea of what's hard and what's not when
dealing with XML. This kind of macro system that lets you emit
arbitrary tags and attributes in an SGML style becomes less and less
useful as the complexity of the XML target -- and your own program
-- grows. From a Common Lisp (well, this Common Lisp programmer's)
point of view, real XML is generated from a description of the XML
schema and program objects or function calls. In this world view the
case of the XML output and the case of corresponding names or keywords
in the code are only marginally connected, if at all. Some small
amount of work may need to be done by the person writing the XML
translation code or schema definition, but unless the XML schema is
really perverse it will be a very small amount of work.

>
>>>Yes the reader can be customised to break most legacy code. Even the
>>>:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP
>>>and the latest release).
>>
>> ILISP is not a Common Lisp program, it's an Emacs lisp program with some
>> interface bits written in Common Lisp. All you've proven is that ILISP
>> should be more careful about setting print case and read case in order
>> to be a conforming ANSI Common Lisp program.
>
>I just illustrated that it can be difficult to even start editing code in
>a case sensitive Lisp mode when a widely used tool may not be able to cope
>with case inversion.

This is a good illustration of how often Common Lisp programmers find
it useful to work with case inversion. However, I'm sure Marco will
get right on fixing ILISP :)

>
>>>It would also be nice if Lisp generated code (using macros etc.)
>>>actually looked like human generated code (so for example it could be
>>>cut and pasted without having to translate the case). The best way to
>>>achieve this is to preserve the case that the programmer desires
>>>(which today is overwhelmingly lower or mixed case).
>>
>> Cut and paste into programs? This is almost never useful. Why not just
>> use the macro in your code directly?
>
>For the situations that fit into categories other than "almost never
>useful". Perhaps you can explain why the default of upper case machine
>generated Lisp is superior when you want to cut and paste code. Otherwise
>this point stands.

No, I reject the aesthetics of cutting and pasting machine generated Lisp
code as a useful criterion for judging anything. If you really care
about this it's so simple to get it to come out in lower case! Then
you can start worrying about all the other problems in cutting and
pasting code, from all but the most trivial macros, back into your program.

Tim

Duane Rettig

unread,
Jun 4, 2002, 1:00:01 PM6/4/02
to
Tim Bradshaw <t...@cley.com> writes:

> Incidentally, lest anyone misinterpret my feelings about
> case-sensitivity: I don't have any. I do have very strong feelings
> about StudlyCaps, and it does look like a case-insensitive reader (or
> full-blown case-insensitivity) is the only way known to prevent the
> appalling StudliNess that infests programs in C/Java family languages.

One can still write such studly code when read by an insensitive language.
I believe the real reason why C++ users end up pushed to write in such a
style is because early on (when C was first invented) they "lost" their
most natural delimiter, the hyphen. Of course, I would not want to have
to extend the already-too-complex C/C++ parsing algorithm to accommodate
yet-another-syntax-overload for -, but such an extension early in the C
definition would have removed the perceived need for these studly names.
(the alternative used in most C code is using the underbar, but that
has the disadvantage, at least on my keyboard, of being a shifted key,
and not accepted in some interchanges).

> Actually, I think Studly code is worse than XML - at least XML will be
> dead in a few years, but big C++ programs will live forever, if only
> because they won't have finished compiling by the time XML has gone
> away.

Let me guess - you were writing this while waiting for a C++ compilation
to finish? :-)

--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)

Andy

unread,
Jun 4, 2002, 1:45:19 PM6/4/02
to
Duane Rettig wrote:
> Of course, I would not want to have
> to extend the already-too-complex C/C++ parsing algorithm to accommodate
> yet-another-syntax-overload for -, but such an extension early in the C
> definition would have removed the perceived need for these studly names.

Some years ago I had the chance to look over some code from an old Unix.
Most of the code was lowercase and most variable names were shorter
than 3 chars. This is of course not representative. But my feeling is that the
style found in C/C++ today was mainly introduced by Windows programmers
in conjunction with the reverse-Polish type notation (i.e. pszTheString).

Best regards
AHz

Daniel Barlow

unread,
Jun 4, 2002, 2:11:05 PM6/4/02
to
Duane Rettig <du...@franz.com> writes:

> (the alternative used in most C code is using the underbar, but that
> has the disadvantage, at least on my keyboard, of being a shifted key,

I'm not altogether sure this is a significant problem: so do capital
letters, generally ...


-dan

--

http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources

Joe Marshall

unread,
Jun 4, 2002, 3:28:42 PM6/4/02
to

"Adam Warner" <use...@consulting.net.nz> wrote in message news:adh4k6$111uqr$1...@ID-105510.news.dfncis.de...

> If Lisp was being designed today it's very likely that case would be preserved--if
> for no other reasons than (a) it assists interoperability with case-sensitive
> languages and markup; and (b) it's easier to destroy information than it
> is to recreate it (so leave the destruction of case information up to the
> programmer).
>

Lisp is already case-sensitive.

You cannot destroy information. (Admittedly you can send it into the heat
sink where it isn't coming back, but to be pedantic it isn't being destroyed.)

Joe Marshall

unread,
Jun 4, 2002, 3:32:03 PM6/4/02
to

"Duane Rettig" <du...@franz.com> wrote in message news:4u1ojq...@beta.franz.com...

>
> (the alternative used in most C code is using the underbar, but that
> has the disadvantage, at least on my keyboard, of being a shifted key,
> and not accepted in some interchanges).
>

Boy, hitting that shift key is such a pain in the ass when you have
to type an underscore. I guess studly caps are the way to go.

Thomas F. Burdick

unread,
Jun 4, 2002, 3:48:29 PM6/4/02
to
Adam Warner <use...@consulting.net.nz> writes:

> On Tue, 04 Jun 2002 06:31:28 +1200, Thomas F. Burdick wrote:
>
> I'll avoid commenting upon which algorithm is easier to understand. As
> you concur it's subjective.
>
> BTW compare the speed of the algorithms on clisp which compiles
> to non-native/byte code (and thus may give an advantage to a simpler
> algorithm). My original algorithm is faster than both of them:

Well, of course. CLISP is odd in that your own code, being
interpreted byte codes, is much much slower than the library code,
which is native code. Nonetheless, your initial string inverter is
algorithmically slower (best case scenario, it traverses the string 3
times); you've just seriously changed the constant factors.

> (defun st2 ()
> (time (loop for x from 1 to 1000000 do
> (invert-str "abc"))))

BTW, have you tried it on really long strings? How about really long
strings that change case at the very beginning? If your typical
string you're trying to invert is 3 chars long, then the simple
solution is good enough, and in the case of CLISP, because it relies
very heavily on library code, will be the best. However, "abc" is a
rather special case.
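
A quick way to construct the case described above (a sketch only, assuming
the invert-str and invert-string functions discussed in this thread are
already defined): a long string whose case changes right at the start. The
STRING-UPCASE/STRING-DOWNCASE version still builds two full-length copies
before it can tell the string is mixed case, while the character-by-character
version can return it after looking at only the first two characters.

(defparameter *long-mixed*
  (let ((s (make-string 100000 :initial-element #\a)))
    (setf (char s 0) #\A)
    s))

(time (loop repeat 1000 do (invert-str *long-mixed*)))
(time (loop repeat 1000 do (invert-string *long-mixed*)))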

> [49]> (st2)
>
> Real time: 5.285 sec.
> Run time: 5.28 sec.
> Space: 48000000 Bytes
> GC: 91, GC time: 0.66 sec.
> NIL
>
> 5.3 seconds. Not bad. Note that using equal is _faster_ than string=
> (time is 5.6 seconds with string=)

[ Strange, perhaps string= is written partially in lisp? ]

> This is a very impressive result. CMUCL's compiler can achieve 4.04s
> using native code.

The vast majority of the time spent in this function by CLISP /is/
spent in native code, so it's not surprising.

> >> Lisp throws away symbol information as part of its historical baggage.
> >
> > It what now? I don't understand what you're trying to say here.
>
> Perhaps pasting back in the context helps:

I read it in context, and didn't understand what this sentence was
saying. Now I understand what you were trying to get across, but I
think the way you said it was totally misguided (it doesn't throw away
"symbol information" it loses information about the textual form of
the source code).

Duane Rettig

unread,
Jun 4, 2002, 5:00:01 PM6/4/02
to
"Joe Marshall" <prunes...@attbi.com> writes:

Yes, you and Daniel Barlow are both right. But as I recall, the
underscore is relatively new in the ASCII set - I remember seeing
a left-arrow in its place in old encodings. Perhaps my memory is
failing me, but if that is the case, no amount of shifting will get
you an underscore on some keyboards/computers.

Joe Marshall

unread,
Jun 4, 2002, 5:22:44 PM6/4/02
to

"Duane Rettig" <du...@franz.com> wrote in message news:4ptz6r...@beta.franz.com...

>
> Yes, you and Daniel Barlow are both right. But as I recall, the
> underscore is relatively new in the ASCII set - I remember seeing
> a left-arrow in its place in old encodings. Perhaps my memory is
> failing me, but if that is the case, no amount of shifting will get
> you an underscore on some keyboards/computers.

The underscore has been in the ASCII character set at least since
1967. (See ANSI X3.4-1967)

C came into being in the years 1969 - 1973.

Thomas A. Russ

unread,
Jun 4, 2002, 4:07:51 PM6/4/02
to
Adam Warner <use...@consulting.net.nz> writes:

>
> On Sat, 01 Jun 2002 04:11:06 +1200, Erik Naggum wrote:
>
> > Just for kicks, I ran the same tests with CMUCL 3.0.12 18d+ build 3350
> > (are these guys Microsoft employees or what?. Before compilation:
>
> BTW Erik, since the code could not be compiled on my platform without
> changing (prog ((invert nil) ... to (prog ((invert "") ... how come it
> compiled for you?

Because most other Lisp systems are not as fanatic about type
declarations as CMUCL. They will accept NIL as an initial value for a
variable with type STRING. CMUCL (correctly IMHO) will not.

> Regards,
> Adam

--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu

Thomas A. Russ

unread,
Jun 4, 2002, 4:10:10 PM6/4/02
to
Adam Warner <use...@consulting.net.nz> writes:

>
> On Sat, 01 Jun 2002 13:45:13 +1200, Wade Humeniuk wrote:
>
> Your version is incredibly fast:
>
> * (defun invert-string (string)
> (loop for char of-type character across string
> with uppercase-p = nil
> with lowercase-p = nil
> when (upper-case-p char) do (setf uppercase-p t)
> when (lower-case-p char) do (setf lowercase-p t)
> finally return (cond
> ((eq uppercase-p lowercase-p) string)
> (uppercase-p (string-downcase string))
> (lowercase-p (string-upcase string)))))
>

The finally clause should be:

  finally (return (cond
                    ((eq uppercase-p lowercase-p) string)
                    (uppercase-p (string-downcase string))
                    (lowercase-p (string-upcase string))))

with the extra parens around the return. The "return" keyword is not
strictly allowed at this place in the Loop macro, and is therefore
ignored (although an extension of Loop might process it differently)

> But your version only returns NIL :-)
>
> (invert-string "abc")
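
Putting that fix into the quoted function gives, in full (just a restatement
of the version quoted above with the corrected finally clause):

(defun invert-string (string)
  (loop for char of-type character across string
        with uppercase-p = nil
        with lowercase-p = nil
        when (upper-case-p char) do (setf uppercase-p t)
        when (lower-case-p char) do (setf lowercase-p t)
        finally (return (cond
                          ((eq uppercase-p lowercase-p) string)
                          (uppercase-p (string-downcase string))
                          (lowercase-p (string-upcase string))))))

;; (invert-string "abc") => "ABC"
;; (invert-string "ABC") => "abc"
;; (invert-string "Abc") => "Abc"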

Adam Warner

unread,
Jun 4, 2002, 7:35:02 PM6/4/02
to
On Wed, 05 Jun 2002 02:49:49 +1200, Marco Antoniotti wrote:

> tmo...@sea-tmoore-l.dotcast.com (Tim Moore) writes:
>
>> On Tue, 04 Jun 2002 16:07:02 +1200, Adam Warner
>> <use...@consulting.net.nz> wrote:
>
> ...
>
>
>> >Yes the reader can be customised to break most legacy code. Even the
>> >:invert mode currently breaks ILISP (with CMUCL. I'm yet to test CLISP
>> >and the latest release).
>>
>> ILISP is not a Common Lisp program, it's an Emacs lisp program with
>> some interface bits written in Common Lisp. All you've proven is that
>> ILISP should be more careful about setting print case and read case in
>> order to be a conforming ANSI Common Lisp program.
>
> Well, yes. ILISP 5.12.0 is more robust, but still not perfect (and
> READTABLE-CASE issues are always lurking around). PLease (Adam Warner,
> that is) send in a good test case and (possibly) a diagnosis.

I'll test ILISP 5.12.0 soon. I made a Debian package of the CVS version
but it wasn't entirely successful (ILISP didn't work with clisp for some
reason, perhaps the CVS version was broken at that time). I sent in a way
to reproduce some problems to the SourceForge bug reporter. I'll probably
wait to test the unstable ILISP 5.12.0 Debian package before submitting
another bug report.

>> >I suspect many in the Lisp community will move to using a case
>> >sensitive reader over time, facilitated by interpreters and compilers
>> >that accept both lower and upper case functions and macros. It would
>> >be interesting to know how much of an impact this could have upon
>> >interpreter performance.
>>
>> How about "zero?"
>
> Well, I gave a lot of thought to this. My gut feeling is that you will
> have to pay a price in "interpreted" code in order to retrieve the
> "right" symbol.
>
> Yet this begs the question about "what is an interpreter" for CL. CMUCL
> and Corman Lisp (these I know, other implementations may do the same)
> essentially compile forms on the fly, so their "execution" will actually
> achieve "zero" cost.
>
> Yet, I believe this is a problem of perception. From the
> "interpreted-only, single-implementation" language point of view, this
> may seem a relevant problem. IMHO it is not.

Good to know. And I appreciate the thought you've put into these kinds of
issues in the past.

Regards,
Adam

Adam Warner

unread,
Jun 4, 2002, 10:06:26 PM6/4/02
to
On Wed, 05 Jun 2002 03:27:01 +1200, Tim Bradshaw wrote:

> * Adam Warner wrote:
>
>> http://www.w3.org/TR/2000/REC-xhtml1-20000126/
>
>> The XML document object model specifies that element and attribute
>> names are returned in the case they are specified. In XHTML 1.0,
>> elements and attributes are specified in lower-case.
>
> Yet another reason not to use XML. About the only thing SGML got right
> was being case-insensitive by default, but the fools who `designed' XML
> were blinded by the lOvely naming staNdaRds YoU Have tO UsE in lanGuaGeS
> without convEnIent inter-woRd seParaTors. Amazingly you *can* use
> hyphens in XML names (they didn't manage to break that), but nonetheless
> we are going to be eaten alive by incomprehensible studly XML.

You may not be able to avoid XML. But you may be able to leverage Lisp to
keep its impact to a minimum (instead of embracing the entire tool chain).

> If you have to use XML (note: htmlify is called htmlify for a reason)

Of course. That's also why you write in the text: "In order to do this
`right' for XML you need to deal with issues of Unicode, *case*,
namespaces and so on." (my emphasis)

I was just showing how the approach of using symbols to specify attributes
doesn't automatically translate to XML because symbols are converted to
upper case by the reader.

> then, assuming you've rejected the obvious choices of finding a new
> career or suicide, it's not really a lot of work to support
> case-sensitivity per form:

This is my first experience of creating new syntax (refer "7.5 Macro
Characters" and "14.3 Read-Macros" in Graham). Thanks for the great
example.

> (defvar *cs-readtable*
> (let ((rt (copy-readtable)))
> (setf (readtable-case rt) :preserve)
> rt))

cs stands for case sensitive. *cs-readtable* stores (a pointer to) a copy
of the readtable in case preserving mode.

> (defvar *proper-readtable* *readtable*)

*proper-readtable* is just (a pointer to) the current readtable.



> (set-dispatch-macro-character
> #\# #\~

Just a question at this point. Graham on p235 states that six combinations
of dispatching macro characters are specifically reserved for the user: #!
#? #[ #] #{ #}

You've chosen to use #~. What impact could this have on compatibility?

The HyperSpec says that "The combinations marked by an asterisk (*) are
explicitly reserved to the user. No conforming implementation defines them"
(the ones that Graham lists). ~ is on the list as undefined. So a
conforming implementation could define your character itself and break your
code?

Does anyone know of a "conforming implementation" that does use macro
characters designated as "undefined"?

Six explicitly reserved dispatching macro characters aren't a lot (and
four of them are for bracket style macros). Is there a way to build upon
the reserved macro characters to expand the number of macros allowed (e.g.
something like #!macro1 #!macro2)?
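
One way this could be done, as a rough sketch (illustrative only; the
*bang-readers* table and helper names are made up, not anything standard):

;; Multiplex several readers behind the single user-reserved #! character:
;; #!name <form> looks NAME up in a table and calls whatever reader was
;; registered under it.
(defvar *bang-readers* (make-hash-table :test #'equal))

(defun install-bang-reader (name function)
  (setf (gethash (string-upcase name) *bang-readers*) function))

(set-dispatch-macro-character
 #\# #\!
 #'(lambda (stream char infix)
     (declare (ignore char infix))
     ;; Read the sub-macro name as a symbol (this interns it, which a real
     ;; version might want to avoid), then hand the stream to its reader.
     (let* ((name (symbol-name (read stream t nil t)))
            (reader (gethash name *bang-readers*)))
       (unless reader
         (error "No #! reader named ~A" name))
       (funcall reader stream))))

;; Example registration: #!html <form> would then call this function.
(install-bang-reader "html"
  #'(lambda (stream) (list :html-form (read stream t nil t))))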

> #'(lambda (stream char infix)
> (declare (ignore char infix))
> (let ((*proper-readtable* *readtable*)
> (*readtable* *cs-readtable*))
> (read stream t nil t)))
> *proper-readtable*)

This switches the reader to case sensitivity (by rebinding *readtable* to
the case preserving copy of the readtable). *readtable* does not need to be
switched back because the let only rebinds this special variable dynamically
for the duration of the read. Once the read macro exits, *readtable* reverts
to its original value.

Is the last line redundant? Could the macro have been written:

(set-dispatch-macro-character
 #\# #\~
 #'(lambda (stream char infix)
     (declare (ignore char infix))
     (let ((*proper-readtable* *readtable*)
           (*readtable* *cs-readtable*))
       (read stream t nil t))))

> (set-dispatch-macro-character
> #\# #\~
> #'(lambda (stream char infix)
> (declare (ignore char infix))
> (let ((*readtable* *proper-readtable*))
> (read stream t nil t)))
> *cs-readtable*)

This appears to be another version of the dispatching macro that
translates to the proper readtable when you are already in the case
sensitive readtable. Was the same dispatching macro intended or is this
supposed to use a new symbol so you can "escape" from case sensitive mode?

> #~(DEFUN FOO (x) (LIST x '#~x))
>
> You could even use ~ rather than #~ if you wanted it to be really terse:
>
> ~(DEFUN FOO (x) (LIST x '~X))

It's a very interesting idea thanks Tim. I'll contemplate its
applications.

> Incidentally, lest anyone misinterpret my feelings about
> case-sensitivity: I don't have any. I do have very strong feelings
> about StudlyCaps, and it does look like a case-insensitive reader (or
> full-blown case-insensitivity) is the only way known to prevent the
> appalling StudliNess that infests programs in C/Java family languages.
> Actually, I think Studly code is worse than XML - at least XML will be
> dead in a few years, but big C++ programs will live forever, if only
> because they won't have finished compiling by the time XML has gone
> away.

Some effects are at work:

(a) A case insensitive reader facilitates the use of whatever case you
please, including StudlyCaps. I could go ahead right now and use StudlyCaps
in all my code. So long as the reader translates everything to upper case
everything continues to work.

(b) But even if I use StudlyCaps no one else has to. This prevents me
from leveraging network effects to get someone else to start using
StudlyCaps in their code.

Since network effects have been eliminated whatever case is used is just
based upon the norms and conforming with the community. How many people do
you see writing uppercase Lisp?

There's actually another pressure upon non-uppercase code:

(c) If anyone wants to write code that conforms with the readtable :invert
mode they should write it in lower case or StudlyCaps. Use of uppercase
breaks this mode.

If the Lisp community as a whole liked StudlyCaps there would be nothing
stopping its adoption. The readtable case insensitive mode would
facilitate its adoption because legacy functions can be referred to in
StudlyCaps.

What the readtable case insensitivity does do is moderate the influence of
anyone creating libraries that employ StudlyCaps, since the rest of the
community who like case insensitivity can continue to refer to them in
whatever case they please (unless the library is truly evil and implements
different functions with the same case insensitive name).

Regards,
Adam

Adam Warner

unread,
Jun 4, 2002, 10:16:44 PM6/4/02
to
On Wed, 05 Jun 2002 14:06:26 +1200, Adam Warner wrote:

> Six explicitly reserved dispatching macro characters aren't a lot (and
> four of them are for bracket style macros). Is there a way to build upon
> the reserved macro characters to expand the number of macros allowed
> (e.g. something like #!macro1 #!macro2)?

Oh, I think I can answer this one. If you run out of macro _characters_
you might as well just create standard macros :-)

Regards,
Adam

Christopher C. Stacy

unread,
Jun 4, 2002, 11:38:28 PM6/4/02
to
>>>>> On Tue, 04 Jun 2002 21:22:44 GMT, Joe Marshall ("Joe") writes:

Joe> "Duane Rettig" <du...@franz.com> wrote in message news:4ptz6r...@beta.franz.com...


>>
>> Yes, you and Daniel Barlow are both right. But as I recall, the
>> underscore is relatively new in the ASCII set - I remember seeing
>> a left-arrow in its place in old encodings. Perhaps my memory is
>> failing me, but if that is the case, no amount of shifting will get
>> you an underscore on some keyboards/computers.

Joe> The underscore has been in the ASCII character set at least
Joe> since 1967. (See ANSI X3.4-1967)

Joe> C came into being in the years 1969 - 1973.

However, the most common ASCII terminal up until the late 1970s
was an ASR teletype, which didn't yet have underscore.
It had the left arrow.

Erik Naggum

unread,
Jun 5, 2002, 6:12:58 AM6/5/02
to
* Adam Warner

| Oh, I think I can answer this one. If you run out of macro _characters_
| you might as well just create standard macros :-)

If you run out of macro _characters_, there are thousands and thousands
of "private use" Unicode characters there for the taking. You might want
to run out of the rest of Unicode first, however.
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.

70 percent of American adults do not understand the scientific process.

Tim Bradshaw

unread,
Jun 5, 2002, 8:44:47 AM6/5/02
to
* Adam Warner wrote:

> You may not be able to avoid XML. But you may be able to leverage
> Lisp to keep its impact to a minimum (instead of embracing the
> entire tool chain).

I hope so.

> Of course. That's also why you write in the text: "In order to do this
> `right' for XML you need to deal with issues of Unicode, *case*,
> namespaces and so on." (my emphasis)

I think the underlying issue here is that you need to be able to
represent all the information associated with a tag in XML. CL
symbols have well-defined behaviour & properties (not in the plist
sense), including well-defined read & write behaviour. XML tags
probably do too. Those behaviours might even be kind of almost the
same. But they probably aren't, and if they are this week the XML
monkeys will have changed the behaviour next week, because they're a
lot further from knowing what language elements should do than the CL
community is. It's not just case, as I said, you need to understand
how XML does namespaces, and maybe try and do something to massage the
CL package system to do that, and then understand 900 other things and
try and make symbols do that. This is doomed. Better instead to
invent some new type which you can manipulate as you wish to track the
cancer of XML standards, I think.

Actually, the facility I'd like from CL to support this is some kind
of readtable hook which would allow you to get control at the point
where a token is about to be made into <something> and control what
happened then. That would allow you to easily substitute your own
type for symbols, which is currently not easy to do. It might be nice
if this got information from the reader saying what it would otherwise
try and create - so you'd get some string, and some information saying
`if you don't say anything this will get made into a number' or
something. Maybe there should also be an `after' hook which gets the
new object and could return something else if need be. Obviously it
needs some design to be done.


> You've chosen to use #~. What impact could this have on
> compatibility?

There's a possibility of a clash, if it was real code it should check
for the char already being defined. Unfortunately if you want to stay
within a fairly small character set there just aren't that many
characters around...

> Six explicitly reserved dispatching macro characters aren't a lot
> (and four of them are for bracket style macros). Is there a way to
> build upon the reserved macro characters to expand the number of
> macros allowed (e.g. something like #!macro1 #!macro2)?

Yes, I think this is quite possible. I have a thing (at
http://www.tfeb.org/lisp/hax.html#READ-PACKAGES) which lets you say
things like #@cl-user ... to read `...' in the CL-USER package.
Incidentally this is a good example of wanting a readtable hook - the
macro wants to get the name of a package but I didn't want to have to
write a tokenizer, so it sets the package to something it controls,
reads what it expects to be a symbol, and then uses the symbol's name
as a package name (and uninterns the symbol to avoid leakage). What
it really wants to be able to do is say `get me a token' without going
through all these hoops.
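
A rough reconstruction of that trick from the description (illustrative only,
not the actual READ-PACKAGES code; the scratch package name is made up):

(defpackage "READ-PACKAGE-SCRATCH" (:use))

(set-dispatch-macro-character
 #\# #\@
 #'(lambda (stream char infix)
     (declare (ignore char infix))
     ;; Read the package designator as a symbol in a package we control,
     ;; take its name, and unintern it again so nothing leaks.
     (let* ((scratch (find-package "READ-PACKAGE-SCRATCH"))
            (token (let ((*package* scratch))
                     (read stream t nil t)))
            (name (symbol-name token)))
       (unintern token scratch)
       ;; Then read the following form with *PACKAGE* bound to the
       ;; package that was named.
       (let ((*package* (or (find-package name)
                            (error "No package named ~A" name))))
         (read stream t nil t)))))

;; #@cl-user foo  would then read FOO in the CL-USER package.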

> This appears to be another version of the dispatching macro that
> translates to the proper readtable when you are already in the case
> sensitive readtable. Was the same dispatching macro intended or is
> this supposed to use a new symbol so you can "escape" from case
> sensitive mode?

Yes, the aim was that #~ or ~ would toggle sensitivity, so in
case-insensitive mode it would read the next form case-sensitively and
in case-sensitive mode it would read it insensitively. Of course you
need to use an extra character, but languages like Perl seem to
indicate that using single-character prefixes for variables &c is not
something that puts people off. There are probably better ways of
doing it - what I gave is just what I typed in.

> What the readtable case insentitivity does do is moderate the
> influence of anyone creating libraries that employ StudlyCaps, since
> the rest of the community who like case insensitivity can continue
> to refer to them in whatever case they please (unless the library is
> truly evil and implements different functions with the same case
> insensitive name).

Right. Your[1] use of StudlyCaps can't easily infect me, even if
you're a very large player, so long as you stick to standards. In C,
a large player can force their terrible conventions down everyone's
throat more easily. Of course there is no absolute protection, since
you could expose case-sensitive interfaces anyway, embrace and extend
the standard, or in any one of a number of other ways cause various
other kinds of infection (whatever the horrible type-in-the-identifier
standard that MS uses for instance), but it's another layer of
protection between me and the hordes of monkeys with typewriters.

Damn, I'm not meant to post to cll any more, or even read it. Sorry.

--tim

Footnotes:
[1] not you personally of course. Rather the generic `you: the
enemy'.

Adam Warner

unread,
Jun 5, 2002, 9:14:21 AM6/5/02
to
On Thu, 06 Jun 2002 00:44:47 +1200, Tim Bradshaw wrote:

> Damn, I'm not meant to post to cll any more, or even read it. Sorry.

Well I'm so glad you did. Thanks for all the info and follow up.

Regards,
Adam

Kragen Sitaker

unread,
Jun 5, 2002, 10:13:47 AM6/5/02
to
Tim Bradshaw <t...@cley.com> writes:
> * Adam Warner wrote:
> > http://www.w3.org/TR/2000/REC-xhtml1-20000126/
> > The XML document object model specifies that element and attribute
> > names are returned in the case they are specified. In XHTML 1.0,
> > elements and attributes are specified in lower-case.
>
> Yet another reason not to use XML. About the only thing SGML got
> right was being case-insensitive by default, but the fools who
> `designed' XML were blinded by the lOvely naming staNdaRds YoU Have tO
> UsE in lanGuaGeS without convEnIent inter-woRd seParaTors. Amazingly
> you *can* use hyphens in XML names (they didn't manage to break that),
> but nonetheless we are going to be eaten alive by incomprehensible
> studly XML.

SGML is ASCII; XML is Unicode. It's reasonable for an ASCII language
to depend on a particular case mapping; it's not reasonable for a
Unicode language to do so. You can argue that this is sufficient
reason not to use Unicode, I guess, but you should be aware of the
issues.

Section 4.1 of Unicode version 2.0 reads, in part:

The lowercase letter default case mapping is between the small
character and the capital character. The Unicode Standard case
mapping tables, which are informative, are on the CD-ROM.

In a few instances, upper- and lowercase mappings may differ from
language to language between writing systems that employ the same
letters. Examples include Turkish (...U+0131 LATIN SMALL LETTER
DOTLESS I maps to "I", U+0049 LATIN CAPITAL LETTER I) and French
(...U+00E9 LATIN SMALL LETTER E WITH ACUTE generally maps to
...U+00C9 LATIN CAPITAL LETTER E WITH ACUTE, but in some
circumstances may map to "E", U+0045 LATIN CAPITAL LETTER E).
However, in general the vast majority of case mappings are uniform
across languages.

It is important to note that casing operations do not always
provide a round-trip mapping. Also, since many characters are
really caseless (most of the IPA block, for example) uppercasing a
string does not mean that it will no longer contain any lowercase
letters.

Because there are many more lowercase forms than there are
uppercase or titlecase, it is recommended that the lowercase form
be used for normalization, such as when strings are folded for
loose comparison or indexing.

In short, if you specify that a Unicode language be case-insensitive,
then whether or not two identifiers are equal will depend on the
locale, effectively producing a very subtly different per-locale
dialect of the language. If you specify a particular locale as
"standard", then your French and Turkish users will get frustrated
that your "case-insensitive" language is case-sensitive, from
their point of view, but only sometimes. (Unless your particular
locale is French or Turkish, in which case everybody else in the
world will be confused.) And then it still depends on the canonical
case. And then people start to wonder why your halfwidth variant
forms are distinguished from the U+00?? characters that look identical
to them. And then, since the case mapping tables are so large and
somewhat irregular and include scripts your programmers don't use,
there are bugs in your case mapping.

So far, this is all theoretical. I don't know of any Unicode language
that has actually taken the plunge and declared itself committed
to case-insensitivity.

> appalling StudliNess that infests programs in C/Java family languages.
> Actually, I think Studly code is worse than XML - at least XML will be
> dead in a few years, but big C++ programs will live forever, if only
> because they won't have finished compiling by the time XML has gone
> away.

I'm not a fan of StudlyCaps either. C's standard alternative is the
underscore, which makes identifier phrases readable, but I still
like hyphens better.

ozan s yigit

unread,
Jun 5, 2002, 10:28:19 AM6/5/02
to
Duane Rettig:

> I believe the real reason why C++ users end up pushed to write in such a
> style is because early on (when C was first invented) they "lost" their
> most natural delimiter, the hyphen.

i believe the style is due to charles simonyi ("hungarian" notation)
and the microsoft world of programming. a reasonably detailed discussion
of this naming convention is found in (eg) steve mcconnell's "code complete"
(pp 185). there is no basis for the style in unix; i have the original C++
reports and that style is not bjarne's doing. [if you look at his "the C++
programming language special edition" (2000) you won't find the style. he
writes Zlib_init, not ZlibInit. namespace and class names start with
uppercase only, and a similar practice exists for typedefs in C; but
they don't get carried away.]

oz
--
practically no other tree in the forest looked so tree-like as this tree.
-- terry pratchett

Conrad Scott

unread,
Jun 5, 2002, 10:44:07 AM6/5/02
to
"ozan s yigit" <o...@blue.cs.yorku.ca> wrote in message
news:vi41ybl...@blue.cs.yorku.ca...

> Duane Rettig:
>
> > I believe the real reason why C++ users end up pushed to write in such a
> > style is because early on (when C was first invented) they "lost" their
> > most natural delimiter, the hyphen.
>
> i believe the style is due to charles simonyi ("hungarian" notation)
> and the microsoft world of programming. a reasonably detailed discussion
> of this naming convention is found in (eg) steve mcconnell's "code
complete"
> (pp 185). there is no basis for the style in unix; i have the original C++
> reports and that style is not bjarne's doing. [if you look at his "the C++
> programming language special edition" (2000) you won't find the style. he
> writes Zlib_init, not ZlibInit. namespace and class names start with
> uppercase only, and a similar practice exits for typedefs in C; but
> they don't get carried away.]

I can first recall using mixed lower/upper case identifiers in Smalltalk-80,
where it is the standard scheme (see the various books by Adele Goldberg et
al), so that pushes the date back before Hungarian notation, I would have
thought.

// Conrad


sv0f

unread,
Jun 5, 2002, 11:58:33 AM6/5/02
to
In article <3cfe235f$0$238$cc9e...@news.dial.pipex.com>, "Conrad Scott"
<aconra...@hotmail.com> wrote:

>"ozan s yigit" <o...@blue.cs.yorku.ca> wrote in message
>news:vi41ybl...@blue.cs.yorku.ca...

[...]


>> i believe the style is due to charles simonyi ("hungarian" notation)
>> and the microsoft world of programming. a reasonably detailed discussion
>> of this naming convention is found in (eg) steve mcconnell's "code
>complete"
>> (pp 185).

[...]


>
>I can first recall using mixed lower/upper case identifiers in Smalltalk-80,
>where it is the standard scheme (see the various books by Adele Goldberg et
>al), so that pushes the date back before Hungarian notation I would of
>thought.

FWIW, Simonyi was at Xerox PARC in the early days, the birthplace
of Smalltalk, and may have been influenced by (or responsible for?)
the mixed case style.

(See "Dealers of Lightning", a book on the glory days of PARC,
for more information.)

Marco Antoniotti

unread,
Jun 5, 2002, 3:26:15 PM6/5/02
to

Tim Bradshaw <t...@cley.com> writes:

...


>
> Actually, the facility I'd like from CL to support this is some kind
> of readtable hook which would allow you to get control at the point
> where a token is about to be made into <something> and control what
> happened then. That would allow you to easily substitute your own
> type for symbols, which is currently not easy to do. It might be nice
> if this got information from the reader saying what it would otherwise
> try and create - so you'd get some string, and some information saying
> `if you don't say anything this will get made into a number' or
> something. Maybe there should also be an `after' hook which gets the
> new object and could return something else if need be. Obviously it
> needs some design to be done.

This is exactly what I had in mind with my proposal. The problem in
CL is that the relation among READ, INTERN and FIND-SYMBOL is too
inflexible.

Now, from what I gather from your posts you'd like some visual marker
(in the form of a read macro or something similar). I think this is
not sufficient and (we get into the personal preferences jungle here)
not elegant. The more I think about these issues the more I convince
myself that I was right on track with my very complex proposal of a
couple of years ago.

You need the :SYMBOL-NAME-CASE field in the package data structure and
augment the behavior of FIND-SYMBOL and INTERN appropriately.
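
As a portable approximation of the idea (a sketch only, not the actual
proposal, which would change the behaviour of READ, INTERN and FIND-SYMBOL
themselves rather than add a helper like this):

;; A table of per-package name-case policies, consulted by an INTERN-like
;; helper.  This only emulates the proposal at the application level; it
;; does not change what the reader does.
(defvar *package-name-case* (make-hash-table))

(defun set-package-name-case (package case)
  (setf (gethash (find-package package) *package-name-case*) case))

(defun intern-respecting-case (name &optional (package *package*))
  (let* ((pkg (find-package package))
         (case (gethash pkg *package-name-case* :upcase)))
    (intern (ecase case
              (:upcase   (string-upcase name))
              (:downcase (string-downcase name))
              (:preserve (string name)))
            pkg)))

;; (defpackage "XML-NAMES" (:use))
;; (set-package-name-case "XML-NAMES" :preserve)
;; (intern-respecting-case "FooBar" "XML-NAMES")  => |FooBar|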

...

> Yes, the aim was that #~ or ~ would toggle sensitivity, so in
> case-insensitive mode it would read the next form case-sensitively and
> in case-sensitive mode it would read it insensitively. Of course you
> need to use an extra character, but languages like Perl seem to
> indicate that using single-character prefixes for variables &c is not
> something that puts people off. There are probably better ways of
> doing it - what I gave is just what I typed in.

Of course, this use of #~ (or #^ maybe?) can easily be made to work with
my proposal, since it did not really modify the behavior of READ and
READTABLE-CASE.

John Wiseman

unread,
Jun 5, 2002, 3:28:39 PM6/5/02
to
Adam Warner <use...@consulting.net.nz> writes:

> Does anyone know of a "conforming implementation" that does use
> macro characters designated as "undefined"?

MCL does some fancy things with #$ and #_ to refer to FFI constants
and functions, respectively. It also uses #@ to create 2D points that
are used to represent positions and sizes in both Mac OS graphics/GUI
APIs and the higher-level MCL wrappers around them.

Examples:

(eql (#_Q3Object_IsType thing type) #$kQ3True)

(make-instance 'window :view-size #@(320 200))

(MCL also does some funky stuff with #P, using #P, #2P, #3P, #4P to
mean different things.)

I can't think offhand of any other macro characters used in any other
lisps, but I bet there are some.
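
For illustration only (this is not how MCL actually defines it), a
portable sketch of a #@-style reader macro where a "point" is just a
cons of two integers:

(defun read-point (stream subchar arg)
  ;; #@ must be followed by a list of two numbers, e.g. #@(320 200);
  ;; we read that list and build a "point" out of it.
  (declare (ignore subchar arg))
  (destructuring-bind (h v) (read stream t nil t)
    (cons h v)))

(set-dispatch-macro-character #\# #\@ #'read-point)

;; #@(320 200)  =>  (320 . 200)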


John Wiseman

Kragen Sitaker

unread,
Jun 5, 2002, 4:25:27 PM6/5/02
to
"Conrad Scott" <aconra...@hotmail.com> writes:
> I can first recall using mixed lower/upper case identifiers in Smalltalk-80,
> where it is the standard scheme (see the various books by Adele Goldberg et
> al), so that pushes the date back before Hungarian notation, I would have
> thought.

So perhaps it got picked up in NIHCL, and then in MFC, much of which
is merely a gratuitously incompatible reimplementation of NIHCL? Or
does Microsoft StudlyCaps extend back further than C++ at Microsoft?

Erik Naggum

unread,
Jun 5, 2002, 8:22:15 PM6/5/02
to
* Kragen Sitaker

| So far, this is all theoretical. I don't know of any Unicode language
| that has actually taken the plunge and declared itself committed to
| case-insensitivity.

It seems more appropriate to require lowercase-only names in a Unicode-
based language than to allow and preserve mixed case. I think it is a
serious design blunder to allow the programmer to decide on the uppercase
version of an ordinarily lowercase letter just because of an arbitrary
rule to avoid interword delimiters. If what you say is true about
locales, a Turkish programmer, say, would make a different uppercase
choice than a French one, but now without the benefit of preserved
locale information. So, if you wanted to be the most reasonable and
"international" in a Unicode-based language, you should outlaw the use
of uppercase letters from the language altogether and use an explicit
interword delimiter.
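
Stated as code, that rule is a one-liner; purely illustrative:

(defun lowercase-only-name-p (name)
  ;; Reject any name containing an uppercase letter; word boundaries
  ;; must then be marked with an explicit delimiter such as #\- or #\_.
  (notany #'upper-case-p name))

;; (lowercase-only-name-p "parse-integer")  =>  T
;; (lowercase-only-name-p "parseInteger")   =>  NIL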

I have argued elsewhere that embedding case information in the encoding
of letters, with a resulting near doubling of the code space requirement,
was a huge mistake, much as it was a mistake for early Common Lisp to
encode the font in its character type. In a better world, we would have
developed writing
systems with individual markers for sentence start, not just their end,
and proper name start and end, too. All our other punctuation marks and
conventions developed haphazardly and each has an interesting story to
tell, so it is only an historical accident that we fixed and encoded our
character set(s) at the time we did, and so very much would have been
different if we had just waited a little longer to solidify it all, but
they say that about NTSC and HDTV, too...

Christopher Browne

unread,
Jun 5, 2002, 9:29:18 PM6/5/02
to
Oops! Erik Naggum <er...@naggum.net> was seen spray-painting on a wall:

That would be pretty wild...

Presumably this would mean you'd throw data in and get something
vaguely resembling:

] (setf foo (mksentence "here's a sentence."))
#SENT("HERE'S A SENTENCE")
] (format NIL "~S" foo)
"Here's a sentence."
] (setf bar (mksentence "here's a sentence. anD aNOtHer IN STudLY CApS."))
(#SENT("HERE'S A SENTENCE") #SENT("AND ANOTHER IN STUDLY CAPS"))
] (format NIL "~S" bar)
"Here's a sentence. And another in studly caps."

Proper names make that representation really inadequate, and I'm not
sure (beyond the TeX way, with a ~ forced in here and there...) what
would be a good way to embed indication of initials, as in the
sentence:

"This was all about T.S. Eliott."

I'm not sure that this would have been _so_ much better as to set the
world on fire, but maybe I've not thought about it enough...
--
(reverse (concatenate 'string "moc.enworbbc@" "sirhc"))
http://www3.sympatico.ca/cbbrowne/finances.html
"Thank you for calling PIXAR! If you have a touch tone phone, you can
get information or reach anybody here easily! If your VCR at home is
still blinking '12:00', press '0' at any time during this message and
an operator will assist you."
-- PIXAR'S toll-free line (1-800-888-9856)

Erik Naggum

unread,
Jun 5, 2002, 10:02:16 PM6/5/02
to
* [christopher browne]
| <that would be pretty wild..>

<[i] think the distance from what we have today and we would have had
today had some serious changes been made long ago tend to be a lot larger
than most people are casually able to appreciate> <to rethink and change
a fundamental property of written language is not like some third-rate
time-traveling science fiction movie/joke where going back in time and
changing something, then "returning" ends up with only trivial changes>
<sometimes, you can retrace the development of a person's life and
character to individual events> <if you change or undo these events,
there is no telling what other event would be equally decisive>

| <presumably this would mean you'd throw data in and get something vaguely
| resembling:>

<[i] find such trivializing presumption to be quite insulting> <at least
be willing to consider that our current situation is based on a large
number of completely arbitrary decisions and accidents that could very
easily have been different with dramatically different outcomes> <a
thought experiment with different arbitrary decisions and accidents would
then have to produce different results on a scale similar to that of the
consequences of the arbitrary decisions and accidents that have shaped
our current state> <"what if?"-analyses are generally difficult only
because it is hard to understand just how interrelated and dependent what
we take for granted really is>

| "This was all about T.S. Eliott."

<this was all about [t(homas) s(tearns) eliot]>

| I'm not sure that this would have been _so_ much better as to set the
| world on fire, but maybe I've not thought about it enough...

<suppose you did not get admitted to the college you got admitted to, or
you got a much different {gpa} or even individual grade -- would you be
in the same place you are today?> <suppose telephony was made publicly
available in the [united states] without free local calls -- would we
still have had [usenet]?> <suppose [goldfarb] never met [mosher] and
[lorie] -- would we still have {xml}?> <suppose advertising was taboo --
would the [internet] experiment still drown huge {isp}s like [{kpn}qwest]
in unmanageable debt?>

<please note that this is /not/ about syntax, but about the concepts we
choose to give notation and the methods with which we choose to give them
their specific notations> <please also note that most of these things are
not chosen -- someone just does something useful and other people just
adopt it without consideration for any alternatives because sometimes, to
find the next solution requires even more effort than the first>
--
<in a fight against something, the fight has value, victory has none>
<in a fight for something, the fight is a loss, victory merely relief>

ozan s. yigit

unread,
Jun 6, 2002, 12:51:02 AM6/6/02
to
Conrad Scott:

> I can first recall using mixed lower/upper case identifiers in Smalltalk-80,
> where it is the standard scheme (see the various books by Adele Goldberg et
> al), so that pushes the date back before Hungarian notation, I would have
> thought.

ah, you are right. the books show clearly that they liked that sort of
UpperAndLowerCaseNamePacking in that language:

"private variable names are required to have lowercase initial
letters; shared variable names are required to have uppercase
initial letters." [smalltalk blue book]

[funny how smalltalk is always forgotten in many a language discussion;
i would have liked an essay titled "smalltalk: good news, bad news, how
to be innovative and lose big" but most smalltalkers moved on... :-]

oz
---
The quality of mercy is not fkazji jkafl xi enkf -Monkey #5572.

Christopher Browne

unread,
Jun 6, 2002, 2:24:53 AM6/6/02
to

It's come back as a scripting language in the form of Ruby. Which is
not a half-bad language...
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.cbbrowne.com/info/rdbms.html
"When I was a boy of fourteen, my father was so ignorant I could
hardly stand to have the old man around. But when I got to be
twenty-one, I was astonished at how much the old man had learned in
seven years." -- Mark Twain

Conrad Scott

unread,
Jun 6, 2002, 8:53:54 AM6/6/02
to
o...@cs.yorku.ca (ozan s. yigit) wrote in message news:<4da3d9af.02060...@posting.google.com>...

> Conrad Scott:
>
> > I can first recall using mixed lower/upper case identifiers in Smalltalk-80,
> > where it is the standard scheme (see the various books by Adele Goldberg et
> > al), so that pushes the date back before Hungarian notation, I would have
> > thought.
>
> ah, you are right. the books show clearly that they liked that sort of
> UpperAndLowerCaseNamePacking in that language:
>
> "private variable names are required to have lowercase initial
> letters; shared variable names are required to have uppercase
> initial letters." [smalltalk blue book]

I'd forgotten that bit (it has been a while since I used Smalltalk in
anger). And that's something too that's survived in many C/C++ coding
styles: global names, particularly types, starting with a capital
letter, local scope with lower case. I assume I picked that up from
there: I can't recall anything before that I used with such a naming
scheme.

// Conrad

[Replying via Google Groups since my ISP seems to drop lots of usenet
messages on the floor.]

Marco Antoniotti

unread,
Jun 6, 2002, 10:45:39 AM6/6/02
to

John Wiseman <wis...@server.local.lemon> writes:

I'd rule out #I, since it is defined in the INFIX package (which I use
a lot) :)

Kragen Sitaker

unread,
Jun 6, 2002, 12:12:31 PM6/6/02
to
Erik Naggum <er...@naggum.net> writes:
> * Kragen Sitaker
> | So far, this is all theoretical. I don't know of any Unicode language
> | that has actually taken the plunge and declared itself committed to
> | case-insensitivity.
>
> It seems more appropriate to require lowercase-only names in a Unicode-
> based language than to allow and preserve mixed case.

You know what is uppercase today; but what man can say what will be
uppercase tomorrow? Each month Unicode adds a new script.

Erik Naggum

unread,
Jun 6, 2002, 12:19:22 PM6/6/02
to
* Kragen Sitaker
| So far, this is all theoretical. I don't know of any Unicode language
| that has actually taken the plunge and declared itself committed to
| case-insensitivity.

* Erik Naggum


> It seems more appropriate to require lowercase-only names in a Unicode-
> based language than to allow and preserve mixed case.

* Kragen Sitaker


| You know what is uppercase today; but what man can say what will be
| uppercase tomorrow? Each month Unicode adds a new script.

I am at a loss as to the relevance of this to what you quoted me as
saying. Would you care to elaborate or point me to the missing argument
or reasoning?

Robert Swindells

unread,
Jun 6, 2002, 5:02:05 PM6/6/02
to
"Conrad Scott" <aconra...@hotmail.com> wrote in message news:<3cfe235f$0$238$cc9e...@news.dial.pipex.com>...

You can hardly claim that Smalltalk predates Hungarian notation.

Try looking up who wrote the Bravo text editor on the Alto.

Robert Swindells

Bijan Parsia

unread,
Jun 6, 2002, 7:30:34 PM6/6/02
to
On 6 Jun 2002, Christopher Browne wrote:

> o...@cs.yorku.ca (ozan s. yigit) wrote:

[snip]


> > [funny how smalltalk is always forgotten in many a language discussion;

Always remembered in language discussions I participate in, which are many
:)

> > i would have liked an essay titled "smalltalk: good news, bad news, how
> > to be innovative and lose big" but most smalltalkers moved on... :-]

Actually, quite a few stayed.

> It's come back as a scripting language in the form of Ruby.

Actually, it's been around all along. Ruby use is, I would safely bet, far
behind Smalltalk use. There are *at least* 7 commercially sold and
supported implementations, all, to my knowledge, under reasonably active
development (i.e., major releases in the past year). Plus, several free
(in all senses) implementations, one of which has a very active community.

> Which is
> not a half-bad language...

Ruby is a nice language, even if it has TMPWTDI (too many perl ways to do
it) for my taste, but it's hardly the second coming of
Smalltalk. Smalltalk flourishes and Ruby is just another stream...

Cheers,
Bijan Parsia.

Christopher Browne

unread,
Jun 6, 2002, 9:55:35 PM6/6/02
to
In the last exciting episode, Bijan Parsia <bpa...@email.unc.edu> wrote::

> On 6 Jun 2002, Christopher Browne wrote:
>
>> o...@cs.yorku.ca (ozan s. yigit) wrote:
> [snip]
>> > [funny how smalltalk is always forgotten in many a language discussion;
>
> Always remembered in language discussions I participate in, which are many
> :)
>
>> > i would have liked an essay titled "smalltalk: good news, bad
>> > news, how to be innovative and lose big" but most smalltalkers
>> > moved on... :-]
>
> Actually, quite a few stayed.
>
>> It's come back as a scripting language in the form of Ruby.
>
> Actually, it's been around all along. Ruby use is, I would safely
> bet, far behind Smalltalk use. There are *at least* 7 commercially
> sold and supported implementations, all, to my knowledge, under
> reasonably active development (i.e., major releases in the past
> year). Plus, several free (in all senses) implementations, one of
> which has a very active community.

Squeak, yes...

Smalltalk has the demerit that _most_ of the implementations store
code using a (rather LispOSy) "runtime environment" scheme, as opposed
to loading source code from files.

>> Which is not a half-bad language...

> Ruby is a nice language, even if it has TMPWTDI (too many perl ways
> to do it) for my taste, but it's hardly the second coming of
> Smalltalk. Smalltalk flourishes and Ruby is just another stream...

There are packaged versions of Ruby for major Linux and BSD
variations, so that it's downright _easy_ to pull in dependencies if
you have some apps written in Ruby.

In contrast, Smalltalks tend to play more like a "total environment,"
with the Outer Limits "We control the horizontal; we control the
vertical" thing.

It's quite easy for Ruby to get deployed more; it's not so easy for
Smalltalk...
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Would-be National Mottos:
Tibet: "It's all downhill from here!"

Bijan Parsia

unread,
Jun 7, 2002, 10:41:14 AM6/7/02
to
On 7 Jun 2002, Christopher Browne wrote:

> In the last exciting episode, Bijan Parsia <bpa...@email.unc.edu> wrote::

[snip]


> > sold and supported implementations, all, to my knowledge, under
> > reasonably active development (i.e., major releases in the past
> > year). Plus, several free (in all senses) implementations, one of
> > which has a very active community.
>
> Squeak, yes...

Yep. And GNU Smalltalk is used and developed and rather interesting
(although not widely used to my knowledge). Bistro (a Smalltalk
compatibility package/implementation for Java) is also interesting, but,
to my knowledge, also not very popular. Still neat.

> Smalltalk has the demerit

This demerit doesn't change the *brute* fact that Smalltalk is far more
widely deployed than Ruby. It just is. It has been all along. Ruby may
become more popular sometime, maybe soon. But even then, unless Smalltalk
*declines* (instead of growing, as it's doing now) a *lot*, Ruby won't be
the second coming of Smalltalk. (Smalltalk has to *leave*.)

So, let's keep on the point, here.

> that _most_ of the implementations store
> code using a (rather LispOSy) "runtime environment" scheme, as opposed
> to loading source code from files.

Actually, source code is typically stored in two files, the sources file
and the changes file.

There is a runtime image as well, and you do modify that directly. That
makes it more difficult to reconstruct a system from source code alone.

But most Smalltalks are moving to a variety of modularity and build
systems. <shrug/> Doesn't mean that the language isn't doing well overall.

[snip]


> There are packaged versions of Ruby for major Linux and BSD
> variations, so that it's downright _easy_ to pull in dependencies if
> you have some apps written in Ruby.

There are packaged versions of various Smalltalk implementations for major
Linux and BSD variations, and decent installers for others for Windows and
MacOS. So?



> In contrast, Smalltalks tend to play more like a "total environment,"
> with the Outer Limits "We control the horizontal; we control the
> vertical" thing.

Which also makes it easier to deploy cross-platform apps that work
identically. Which is what some of us want, at least some of the time.

> It's quite easy for Ruby to get deployed more; it's not so easy for
> Smalltalk...

Even if so, it *isn't* deployed more. Q.E.D.

I mean, Smalltalk syntax isn't very much like Perl syntax. More
programmers are comfortable with Perl-like syntax. Ruby has Perl-like
syntax. Duh, it's easier for Ruby to get deployed more (i.e., to such
programmers). But Ruby *isn't* deployed more than Smalltalk right now.

C'mon.

Cheers,
Bijan Parsia.

ozan s yigit

unread,
Jun 7, 2002, 11:10:20 AM6/7/02
to
Bijan Parsia:

> Even if so, it *isn't* deployed more. Q.E.D.

D in QED is for demonstradum, meaning "demonstrated" not "bijan says so."

[one can always type "implemented in ruby" and "implemented in smalltalk"
(with quotes) to google and look at the hit numbers but that is only
slightly better than bijan's say so. :-]

oz
---
mankind confuses opinion with intelligence. -- don juan in hell (g. b. shaw)

Ed L Cashin

unread,
Jun 7, 2002, 11:13:17 AM6/7/02
to
Bijan Parsia <bpa...@email.unc.edu> writes:

...


> This demerit doesn't change the *brute* fact that Smalltalk is far
> more widely deployed than Ruby. It just is.

With all due respect, how do you know?

--
--Ed L Cashin | PGP public key:
eca...@uga.edu | http://noserose.net/e/pgp/

Bijan Parsia

unread,
Jun 7, 2002, 1:01:39 PM6/7/02
to
On 7 Jun 2002, ozan s yigit wrote:

> Bijan Parsia:
>
> > Even if so, it *isn't* deployed more. Q.E.D.
>
> D in QED is for demonstradum, meaning "demonstrated" not "bijan says so."

It helps you follow the whole argument.

> [one can always type "implemented in ruby" and "implemented in smalltalk"
> (with quotes) to google and look at the hit numbers but that is only
> slightly better than bijan's say so. :-]

Hmm. And the point that there are 5-7 commercially supported Smalltalk
implementations, including one by IBM, isn't evidence?

Note that the person I was arguing with shifted ground from Ruby *being*
more widely deployed for it being *easier* for Ruby to *become* more
widely deployed.

Even if it *were* more widely deployed, that doesn't mean that Smalltalk
doesn't have a reasonable life of its own. That fact is, I think, amply
shown by the information I've provided (which is easily verified).

Cheers,
Bijan Parsia.

Bijan Parsia

unread,
Jun 7, 2002, 1:12:21 PM6/7/02
to
On 7 Jun 2002, Ed L Cashin wrote:

> Bijan Parsia <bpa...@email.unc.edu> writes:
>
> ...
> > This demerit doesn't change the *brute* fact that Smalltalk is far
> > more widely deployed than Ruby. It just is.
>
> With all due respect, how do you know?

Well, by the number of commercial implementations, some by very big corps,
a greater than 20 year history, being taught in many universities,
etc. etc. The relative youth of Ruby, etc. etc.

Is this completely conclusive? Of course not. If Ruby is bundled with a
Linux distribution, it might pull ahead in "number of places
installed" (though VisualWorks is/was bundled on the commercial/demo disk
of RedHat).

Of course, are these deployments worth the name? Probably not.

I'm a little surprised that anyone thinks that overall Ruby use has
outstripped overall Smalltalk use, and would welcome *any* evidence in
support of this. Browne offered none first round, then offered Ruby's ease
of getting deployed (something shared by at least some Smalltalks!) as
evidence for...well...something.

Even if Ruby is currently beating out Smalltalk in marketshare (let's
grant it), that doesn't show that Smalltalk is moribund or that it
"needs" Ruby to continue to grow. Of course, Ruby's success is *welcome* to
Smalltalkers, but it's just FUD to say that Smalltalk is dead and Ruby is
its revival.

Odd that *I'm* the only one in this thread to be held to a rather high
standard of proof.

Some parts of all my posts are speculative, natch. The Smalltalk market
could collapse tomorrow.

But I doubt it :)

(Note that one of the more innovative dialects and implementations of
Smalltalk, SmallScript, probably has as much claim as Ruby to being a major
"reviver" (marketshare extender?) of Smalltalk, though it's younger, not
quite done, and not opensource. It's probably of more interest to CLers,
as it borrows quite a bit from CL, including multimethods and an extensive
MOP.)

Cheers,
Bijan Parsia.

Bijan Parsia

unread,
Jun 7, 2002, 1:15:37 PM6/7/02
to
On Fri, 7 Jun 2002, Bijan Parsia wrote:

> On 7 Jun 2002, ozan s yigit wrote:
>
> > Bijan Parsia:
> >
> > > Even if so, it *isn't* deployed more. Q.E.D.
> >
> > D in QED is for demonstradum, meaning "demonstrated" not "bijan says so."
>
> It helps you follow the whole argument.

Erk. I meant to type, "It helps if you had followed the whole argument."

Doesn't matter. It was more than half a joke. (The Q.E.D.)

Cheers,
Bijan Parsia.

Thomas Bushnell, BSG

unread,
Jun 7, 2002, 1:17:30 PM6/7/02
to
ozan s yigit <o...@blue.cs.yorku.ca> writes:

> Bijan Parsia:
>
> > Even if so, it *isn't* deployed more. Q.E.D.
>
> D in QED is for demonstradum, meaning "demonstrated" not "bijan says so."

No, D is for "demonstrandum", meaning "needing to be demonstrated".

(The past tense comes from the E ["erat"], meaning "was".)

Q is quod, the relative pronoun "that" or "which".

So QED == "that which was to be demonstrated".

Thomas

Kragen Sitaker

unread,
Jun 7, 2002, 2:17:31 PM6/7/02
to
Erik Naggum <er...@naggum.net> writes:
> * Kragen Sitaker
> | So far, this is all theoretical. I don't know of any Unicode language
> | that has actually taken the plunge and declared itself committed to
> | case-insensitivity.
>
> * Erik Naggum
> > It seems more appropriate to require lowercase-only names in a Unicode-
> > based language than to allow and preserve mixed case.
>
> * Kragen Sitaker
> | You know what is uppercase today; but what man can say what will be
> | uppercase tomorrow? Each month Unicode adds a new script.
>
> I am at loss to the relevancy to what you quoted me saying. Would you
> care to elaborate or point me to the missing argument or reasoning?

If you allow identifiers written in Unicode in many scripts, you must
either allow unassigned code points in identifiers or forbid them.

If you forbid them, then only scripts added to Unicode before your
language processor is released will be supported, and future versions
will support more scripts. With every release, the language syntax
will change in such a way that previously unparsable documents become
parsable.

If you allow them, then you allow code points that will eventually be
assigned to uppercase characters --- assuming, that is, that more
uppercase characters are eventually added to Unicode, which seems
likely to happen eventually, but very infrequently. If you do not
allow uppercase characters in identifiers, then when uppercase
characters are added to Unicode, the language syntax makes previously
parsable documents become unparsable, which is far worse.

On the other hand, you could freeze your language specification at
some version of Unicode, with the result that either some arbitrary
scripts cannot be used in identifiers or that some arbitrary uppercase
characters are allowed.
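
A tiny sketch of that last option, freezing the identifier syntax at one
version's tables; the ranges below are made-up stand-ins, and a real
implementation would generate them from the frozen version's character
database:

(defparameter *frozen-identifier-ranges*
  ;; Stand-in for "characters the frozen Unicode version allows in names".
  '((#x0061 . #x007A)     ; LATIN SMALL LETTER A .. Z
    (#x002D . #x002D)     ; HYPHEN-MINUS as the interword delimiter
    (#x0030 . #x0039)))   ; DIGIT ZERO .. NINE

(defun frozen-identifier-char-p (char)
  (let ((code (char-code char)))
    (some (lambda (range) (<= (car range) code (cdr range)))
          *frozen-identifier-ranges*)))

(defun frozen-identifier-p (name)
  ;; Characters outside the frozen table are rejected, so adding scripts
  ;; in a later Unicode version cannot change what this parser accepts.
  (and (plusp (length name))
       (every #'frozen-identifier-char-p name)))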

Perhaps this demonstrates that Unicode is a bad idea. I think Unicode
is a terrible thing, but preferable to any alternative I know of.

Thomas Bushnell, BSG

unread,
Jun 7, 2002, 2:58:46 PM6/7/02
to
Kragen Sitaker <kra...@pobox.com> writes:

> If you allow identifiers written in Unicode in many scripts, you must
> either allow unassigned code points in identifiers or forbid them.

Quite wrong.

You can: promise not to reject them, and not promise to keep them
stable.

This allows programs that wish to depend on a specific version of your
system, to do so, by using unassigned code points.

Thomas
