Here is the code, which adds dynamic loading of shared libraries to
CLISP. Each external function has to be defexternaled, external names
are case sensitive, and if possible I would like to avoid having to
give the name of the external twice (name and :entry-name). The
problem is in the line (*), where SYMBOL-NAME returns the upcased
value of `name'.
Thanks in advance for any hints.
(defmacro defexternal (name param-list
                       &key return-type entry-name (library-name "libc.so"))
  (unless entry-name (setq entry-name (symbol-name name)))  ;(*)
  (let* ((return-func (get-returner return-type))
         (arg-names (mapcar #'first param-list))
         (arg-types (mapcar #'second param-list))
         (push-defs (loop for type in arg-types
                          for arg in arg-names
                          and push-func = (get-pusher type)
                          collect `(,push-func ,arg))))
    `(PROGN
       (DEFUN ,name (,@arg-names)
         (LET* ((lib (dl::ensure-library ,library-name))
                (sym (dl::ensure-symbol lib ,entry-name)))
           (dli::init-external-call)
           ,@push-defs
           (,return-func sym))))))
--
Eric Marsden
emarsden @ mail.dotcom.fr
It's elephants all the way down
> Is it possible, within a macro definition, to obtain the name of the
> macro with case preserved? I have tried using readtable-case, but it
> applies too late in the game, applying only to the forms inside the
> macro.
>
> Here is the code, which adds dynamic loading of shared libraries to
> CLISP. Each external function has to be defexternaled, external names
> are case sensitive, and if possible I would like to avoid having to
> give the name of the external twice (name and :entry-name). The
> problem is in the line (*), where SYMBOL-NAME returns the upcased
> value of `name'.
How about entering the name as a string instead of a symbol, or
escaping the symbol?
E.g, (defexternal "FuncWithMixedCase" ...)
or (defexternal |FuncWithMixedCase| ...)
//Raymond.
--
Raymond Wiker, Orion Systems AS
+47 370 61150
"All I want for Christmas is Bill Gates' front teeth..."
No, it's not possible. Case transformation is done by the reader, long
before the macro runs. It doesn't save the actual input anywhere. You can
require the user to use READTABLE-CASE, but that's going to cause havoc
with the rest of his code (he'll have to type all standard CL symbols in
upper case).
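A minimal sketch of the point above, assuming the standard readtable (READTABLE-CASE :UPCASE). The helper name NAME-AS-READ is made up for illustration. It shows that the case is already gone by the time anything downstream of the reader, such as a macro, sees the symbol, and that only an escaped symbol keeps its spelling:

```lisp
;; Hypothetical helper: read one symbol from TEXT using a fresh copy of
;; the standard readtable and return the name the reader gave it.
(defun name-as-read (text)
  (let ((*readtable* (copy-readtable nil)))   ; NIL means the standard readtable
    (symbol-name (read-from-string text))))

(name-as-read "FuncWithMixedCase")    ; => "FUNCWITHMIXEDCASE"
(name-as-read "|FuncWithMixedCase|")  ; => "FuncWithMixedCase"
```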
>Here is the code, which adds dynamic loading of shared libraries to
>CLISP. Each external function has to be defexternaled, external names
>are case sensitive, and if possible I would like to avoid having to
>give the name of the external twice (name and :entry-name). The
>problem is in the line (*), where SYMBOL-NAME returns the upcased
>value of `name'.
All the foreign function interfaces I've ever seen require you to specify
the name twice -- once for the Lisp symbol and then again for the remote
entry name.
--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Don't bother cc'ing followups to me.
(defmacro defexternal (name param-list
                       &key return-type (library-name "libc.so"))
  "Name MUST be a string!"
  (check-type name string)
  ....
  `(PROGN
     (DEFUN ,(intern (string-upcase name)) (,@arg-names)
       .....
then you could have:
(defexternal "foo" (bar))
and call (FOO 42) in your lisp code.
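For illustration, here is a self-contained sketch of that idea which runs without any FFI. CALL-ENTRY and DEFEXTERNAL-FROM-STRING are hypothetical names: the first merely stands in for the real dl:: machinery of the original post, and the second is a cut-down macro that takes only the name and an argument list.

```lisp
;; Hypothetical stand-in for the real foreign-call machinery; it just
;; reports which entry point would have been called, and with what.
(defun call-entry (entry-name &rest args)
  (list entry-name args))

(defmacro defexternal-from-string (name (&rest arg-names))
  "NAME must be a string; it is kept verbatim as the entry name."
  (check-type name string)
  ;; The Lisp-side function name is the upcased string, interned in the
  ;; package that is current when the macro is expanded.
  `(defun ,(intern (string-upcase name)) (,@arg-names)
     (call-entry ,name ,@arg-names)))

(defexternal-from-string "FuncWithMixedCase" (x))

(funcwithmixedcase 42)   ; => ("FuncWithMixedCase" (42))
```

The string keeps its case for the library lookup, while the Lisp caller writes the name in whatever case he likes.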
Bernhard
--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence bern...@ai.univie.ac.at
You might want to avoid the uppercase conversion done by the reader.
For this purpose, clisp has the notion of case sensitive packages.
Example:
(defpackage "EXTERNAL"
  (:case-sensitive t)
  (:nicknames "EXT")
  (:use))
You would then write
(defexternal ext::getenv ...)
or -- inside that package --
(user::defexternal getenv ...)
This has been used for the CLISP to Linux libc bindings, see
clisp-1998-09-09/modules/bindings/linuxlibc[56]/linux.lsp.
Bruno http://clisp.cons.org/~haible
> Eric Marsden <emar...@mail.dotcom.fr> writes:
>
> > Is it possible, within a macro definition, to obtain the name of the
> > macro with case preserved? I have tried using readtable-case, but it
> > applies too late in the game, applying only to the forms inside the
> > macro.
> >
> > Here is the code, which adds dynamic loading of shared libraries to
> > CLISP. Each external function has to be defexternaled, external names
> > are case sensitive, and if possible I would like to avoid having to
> > give the name of the external twice (name and :entry-name). The
> > problem is in the line (*), where SYMBOL-NAME returns the upcased
> > value of `name'.
>
> How about entering the name as a string instead of a symbol, or
> escaping the symbol?
>
> E.g, (defexternal "FuncWithMixedCase" ...)
> or (defexternal |FuncWithMixedCase| ...)
>
That would not work either. You'd have to call the macro using that
symbol.
(|FuncWithMixedCase| .....)
Not pretty.
The reason why the original Common Lisp group chose to stick with the
idea of uppercasing all symbols being interned is the second mystery
of the universe :)
Cheers
--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 10 03 16, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it
> Raymond Wiker <ray...@orion.no> writes:
>
> > How about entering the name as a string instead of a symbol, or
> > escaping the symbol?
> >
> > E.g, (defexternal "FuncWithMixedCase" ...)
> > or (defexternal |FuncWithMixedCase| ...)
> >
>
> That would not work either. You'd have to call the macro using that
> symbol.
>
> (|FuncWithMixedCase| .....)
>
> Not pretty.
True, but if you enter the name as an "escaped" symbol or a
string, you could have a transformation from a C-type name (mixed
case, possibly with underscore characters). I think that ILU and the
proposed CORBA mapping for CL do this: replace underscores with
hyphens, insert hyphens at lowercase-to-uppercase transitions. E.g.:
FuncWithMixedCase -> func-with-mixed-case
Note that this particular scheme may give conflicts, as
func_with_mixed_case -> func-with-mixed-case, but *in principle* it
should be doable.
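A rough sketch of such a transformation, written from the description above rather than taken from ILU or the CORBA mapping. C-NAME-TO-LISP-NAME is a made-up name, and the result is upcased because that is the symbol name the default reader would produce:

```lisp
;; Hypothetical converter: underscores become hyphens, and a hyphen is
;; inserted wherever a lowercase letter is followed by an uppercase one.
(defun c-name-to-lisp-name (c-name)
  "Return an upcased, hyphenated string for the C identifier C-NAME."
  (with-output-to-string (out)
    (loop for prev = nil then ch          ; PREV is the previous character
          for ch across c-name
          do (cond ((char= ch #\_)
                    (write-char #\- out))
                   (t
                    (when (and prev (lower-case-p prev) (upper-case-p ch))
                      (write-char #\- out))
                    (write-char (char-upcase ch) out))))))

(c-name-to-lisp-name "FuncWithMixedCase")     ; => "FUNC-WITH-MIXED-CASE"
(c-name-to-lisp-name "func_with_mixed_case")  ; => "FUNC-WITH-MIXED-CASE"
```

The two calls return the same string, which is exactly the conflict mentioned above.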
> The reason why the original Common Lisp group chose to stick with the
> idea of uppercasing all symbols being interned is the second mystery
> of the universe :)
At least they force the internal representation to *be* all
uppercase. It's much worse with something like Windows NT, which
- does not distinguish between uppercase and lowercase
- preserves case in file names
- and capitalises file names so that they look "prettier"
BP> Why not just change the calling convention of DEFEXTERNAL and
BP> force the name to be a string:
I guess this would work, but it seems a pity to impose a different
syntax for DEFEXTERNAL from that of DEFUN, DEFMACRO etc, where the
semantics (if not the implementation) are so similar.
Out of curiosity, how many CL implementations provide a .h parser
which extracts call-out declarations automatically? Do any
implementations allow the passing of C structures/unions between C and
a mapped CL type?
< Out of curiosity, how many CL implementations provide a .h parser
< which extracts call-out declarations automatically? Do any
< implementations allow the passing of C structures/unions between C and
< a mapped CL type?
I know you can use structs with C/Lisp, and I don't see any difference
between a struct and a union here, since both are just blocks of, hopefully
aligned, memory.
There is a .h parser/ff-automater for acl5 on sun, sgi, and windows in
pub/cbind/ on their ftp server. There is also some code that came with
my acl5.0 in #p"sys:;contrib;defctype;" that does the same thing.
ecm> which extracts call-out declarations automatically? Do any
ecm> implementations allow the passing of C structures/unions between C and
ecm> a mapped CL type?
If I understand your question correctly, I think CMUCL allows this.
You can pass around the C type to and from any lisp function.
Ray
... [Re: Obtaining the case preserved name of a macro] stuff ...
>
> The reason why the original Common Lisp group chose to stick with the
> idea of uppercasing all symbols being interned is the second mystery
> of the universe :)
>
I would like to see here some more discussion on the rationale for doing this
in CL. This is one of the more embarrassing things that I have to explain
to new people that I am introducing to Common Lisp. I can see no reason why
case sensitive symbols by default would not be superior. Implementations
could then give the other three symbol/case possibilities as an option
(See Allegro's symbol/case options for example).
My understanding (correct me if I'm wrong) of the original concept is that a
symbol denotes (or even connotes) an abstraction irrespective of its case.
For example the symbols "red" and "RED" both may denote the concept "the
color red" in a program. However one can make the argument that in the case
of "Bill" and "bill" we have two distinctly different concepts. Furthermore
one can make the argument that "red" and "RED" in fact could denote different
concepts. Or we may even have a use to distinguish "rEd" from "ReD". I
sure hope the reason that it stuck was because the original LISP was all
upper case.
Another argument for case sensitive symbols is that it is easier to
implement and faster to compute.
--
William P. Vrotney - vro...@netcom.com
This and the fact that "historically" some pre-CL Lisps did the
uppercasing are probably the main reasons. The other ones just seem
"rationalizations" of the choice.
--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 10 03 17, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it
>
>In article <lwn25pl...@copernico.parades.rm.cnr.it> Marco Antoniotti
><mar...@copernico.parades.rm.cnr.it> writes:
>
> ... [Re: Obtaining the case preserved name of a macro] stuff ...
>
>>
>> The reason why the original Common Lisp group chose to stick with the
>> idea of uppercasing all symbols being interned is the second mystery
>> of the universe :)
>>
>
>I would like to see here some more discussion on the rationale for doing this
>in CL. This is one of the more embarrassing things that I have to explain
>to new people that I am introducing to Common Lisp. I can see no reason why
>case sensitive symbols by default would not be superior. Implementations
>could then give the other three symbol/case possibilities as an option
>(See Allegro's symbol/case options for example).
>
I have no idea why the designers of CL made that decision. However, it
supports a principle I learned in the 70s (probably from Henry Ledgard's
book, "Programming Proverbs") -- that of ensuring a sufficient "conceptual
distance" (a term intended to be evocative, no formal interpretation
applies) between identifiers. Names that differ in one letter, for example,
may be easy to misread; this could lead to inadvertent substitution of one
identifier for another during code construction. Case sensitivity opens the
door to a more insidious form of this problem; you can have many identifiers
with the same spelling but different capitalization. In languages that can
check usage because they require declaration before use, this is less of a
problem. But in a language that introduces a new identifier at first use in
the source text, this can be a cause of undetected errors.
>My understanding (correct me if I'm wrong) of the original concept is that a
>symbol denotes (or even connotes) an abstraction irrespective of its case.
>For example the symbols "red" and "RED" both may denote the concept "the
>color red" in a program. However one can make the argument that in the case
>of "Bill" and "bill" we have two distinctly different concepts. Furthermore
>one can make the argument that "red" and "RED" in fact could denote different
>concepts. Or we may even have a use to distinguish "rEd" from "ReD". I
>sure hope the reason that it stuck was because the original LISP was all
>upper case.
>
Let's go with the 'Bill' vs. 'bill' distinction. One is a proper name and
the other has something to do with financial matters (or the anatomy of
certain birds). Programs that deal with semantics have a richer means of
expression at their disposal than mere capitalization. Capitalization might
be an important clue when parsing natural language. Once past the parser,
richer representations are available.
>Another argument for case sensitive symbols is that it is easier to
>implement and faster to compute.
I programmed a long time ago (when 5MB of rotating storage occupied a box
the size of a washing machine) using an assembler that was case sensitive
for precisely the reason that it would measurably impact performance to do
case folding. My gut feel is that on today's computers the difference would
not even be measurable -- the tiny extra time spent case folding should be
swamped by the I/O performance of the program.
---
David B. Lamkins <http://www.teleport.com/~dlamkins/>
Case transformation of input allows the system to provide an illusion of
case insensitivity with little overhead. Case insensitivity was probably a
good idea when Maclisp was being designed in the early 70's -- there were
still lots of uppercase-only terminals in use, but users with newer
terminals that had lowercase would generally use lowercase. Canonicalizing
input allowed both classes of users to coexist easily.
There have been a number of case-sensitive Lisps: Multics Maclisp and Franz
Lisp (not to be confused with Allegro CL from Franz) are both case
sensitive, using lowercase for all the built-in symbols. Luckily, most
programmers wrote their files in lowercase by the late 70's, so moving
source files between these environments wasn't too difficult. But Maclisp
begat Lisp Machine Lisp, which begat Common Lisp, and the case
transformation style of Maclisp has been propagated throughout (although
ANSI CL added the ability to configure it on a per-readtable basis).
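To make the per-readtable setting concrete, here is a small sketch of the four READTABLE-CASE modes defined by ANSI CL. NAME-READ-WITH-CASE is a hypothetical helper, not part of any implementation:

```lisp
;; Hypothetical helper: read one symbol from TEXT with the given
;; READTABLE-CASE mode and return its name.
(defun name-read-with-case (text mode)
  (let ((*readtable* (copy-readtable nil)))       ; copy of the standard readtable
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string text))))

(name-read-with-case "FooBar" :upcase)    ; => "FOOBAR"  (the default)
(name-read-with-case "FooBar" :downcase)  ; => "foobar"
(name-read-with-case "FooBar" :preserve)  ; => "FooBar"
(name-read-with-case "FooBar" :invert)    ; => "FooBar"  (mixed case is left alone)
(name-read-with-case "foobar" :invert)    ; => "FOOBAR"  (single case is flipped)
```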
Many languages support case sensitivity because it is just easier
to implement -- let the users deal with it, and hey, let's even
call it a feature, not a bug.
And of course, many old implementations used a small number of
characters to distinguish identifiers so longer names would
not be possible.
-Kelly Murray k...@intellimarket.com
> My understanding (correct me if I'm wrong) of the original concept is that a
> symbol denotes (or even connotes) an abstraction irrespective of its case.
> For example the symbols "red" and "RED" both may denote the concept "the
> color red" in a program. However one can make the argument that in the case
> of "Bill" and "bill" we have two distinctly different concepts.
Still, we manage OK at parsing things like
Finance Bill Passes After Slight Amendment
HELLO, BILL. HOW ARE YOU?
hi, my names bill, i like common lisp, who are you
which suggests that maybe we aren't relying that heavily on
case sensitivity. Indeed, spoken language is generally reckoned
to predate written language by a long way, so this is no
surprise...
--
Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics,
gj...@dpmms.cam.ac.uk Cambridge University, England.
A situation currently important to me: Writing Lisp (well, Scheme) that
gets compiled to C and must co-exist with already case-sensitive external
names in the C environment. Not that I can't use "meaningful"-enough names
in my Lisp code (that can be done easily enough, and the names trivially mapped
to acceptable-to-C names), but that it needs to reference already-defined
case-sensitive names (externals, macros, struct elements, enums, etc.).
Fortunately, the Scheme that I'm using for bootstrapping has a case-sensitive
READ mode that can be turned on...
-Rob
-----
Rob Warnock, 8L-855 rp...@sgi.com
Applied Networking http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
2011 N. Shoreline Blvd. FAX: 650-964-0811
Mountain View, CA 94043 PP-ASEL-IA
>
> I don't see any reason to use case to distinguish symbols.
> I see many reasons not to, most importantly is that it can
> create bugs and confusion. Programmers have enough problems without
> adding yet more sources of them.
> In fact, I would say very emphatically, that if a programmer
> wrote two functions that had names identical except for case,
> they should be taken out back and buggy whipped.
>
C programmers do not seem to have any problem with case sensitive symbols.
In fact C++ programmers use a convention to their advantage to distinguish
compact names such as "functionP" and to distinguish Box_Contents as a class
name versus box_contents as a function name. Of course this is not
necessary in Common Lisp, however the convention of using *foo* as a global
symbol is not necessary either, but it sure helps make more readable code.
I find having to specify the symbol "Bill" in Common Lisp as "|Bill|" not
only awkward but find it non-symmetric in having to specify it differently
than the symbol "bill". The fact that the former prints as |Bill| makes it
difficult to read and parse in general.
I also find having a debugger printing something like the expression
(defun faa (x)
(cond ((x (+ x x)))
(t 0)))
as
(DEFUN FAA (X)
(COND ((X (+ X X)))
(T 0)))
yuckie.
>
>In article <3655F612...@IntelliMarket.Com> Kelly Murray
><k...@IntelliMarket.Com> writes:
>
>>
>> I don't see any reason to use case to distinguish symbols.
>> I see many reasons not to, most importantly is that it can
>> create bugs and confusion. Programmers have enough problems without
>> adding yet more sources of them.
>> In fact, I would say very emphatically, that if a programmer
>> wrote two functions that had names identical except for case,
>> they should be taken out back and buggy whipped.
>>
>
>C programmers do not seem to have any problem with case sensitive symbols.
>In fact C++ programmers use a convention to their advantage to distinguish
>compact names such as "functionP" and to distinguish Box_Contents as a class
>name versus box_contents as a function name.
This is OK so long as you have a reference document to describe the
conventions being followed. It's even nicer if every programmer on a
project follows the same conventions without exception <g>. Who was it that
said "Managing programmers is like herding cats."?
[snip]
>I find having to specify the symbol "Bill" in Common Lisp as "|Bill|" not
>only awkward but find it non-symmetric in having to specify it differently
>than the symbol "bill". The fact that the former prints as |Bill| makes it
>difficult to read and parse in general.
>
I'm not sure I understand the cause for your objection. Obviously, the Lisp
printer and reader have no problems with this. So what is it you're doing
that absolutely requires you to have case-sensitive names? Is there
something intrinsic to your Lisp program that depends on case distinctions,
or are you having a problem interfacing to external libraries with
case-sensitive names?
This thread was spawned by a query on how to preserve case from a macro
definition used to interface CLISP to external libraries. There are at
least three ways to deal with this particular problem:
1. If the names of library entry points don't occur in a canonical case,
then add an optional argument to the macro to accept a string or symbol
giving the proper case-sensitive name of the entry point. For example, if
you're interfacing Lisp to C where Lisp symbols are normally uppercase and C
identifiers are normally lowercase, then the default behavior of the
interface macro would be to use the Lisp symbol as-is for the Lisp side of
the interface and to downcase the print representation of that symbol to
name the C entry point. This would work well if you have only a small
number of exceptions to the uniform case convention on either side of the
interface; the exceptions would be handled by having the programmer provide
appropriate spellings for both sides of the interface, like
(defexternal (lisp-name "C_Name") ...)
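One way the name argument of such a macro might be taken apart, as a sketch only. PARSE-EXTERNAL-NAME is a hypothetical helper: a bare symbol gets the downcased default for the C side, and a two-element list supplies both spellings explicitly.

```lisp
;; Hypothetical helper for the interface macro: accept either SYMBOL or
;; (SYMBOL "C_Name") and return the Lisp name and the C entry name.
(defun parse-external-name (spec)
  "Return two values: the Lisp symbol and the C entry-name string."
  (etypecase spec
    (symbol (values spec (string-downcase (symbol-name spec))))
    (cons   (destructuring-bind (lisp-name c-name) spec
              (check-type lisp-name symbol)
              (check-type c-name string)
              (values lisp-name c-name)))))

(parse-external-name 'getpid)                ; => GETPID, "getpid"
(parse-external-name '(lisp-name "C_Name"))  ; => LISP-NAME, "C_Name"
```

The macro would call this at expansion time and splice the two values into the generated definition.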
Unfortunately, the use of bicapitalization for word breaks became all the
rage sometime in the 70s or 80s (my first exposure was via the Mac OS APIs,
which were originally case-insensitive). Many libraries now use mixed-case
identifiers exclusively, so the previous section's suggestion becomes labor
intensive and error-prone (although you likely need to get it right just
once, then forget about it). Therefore, suggestions 2 and 3...
2. You can extract the proper case from the library itself, assuming that
you can figure out how to parse the library. The pitfalls to watch out for
are different namespaces in the library (e.g. static variables vs. function
entries) and names distinguished in the library only by case (e.g. DoFoo vs.
doFoo). On the latter point, I refer you to Kelly's closing remark,
preserved above <g>.
3. If you can't (or don't want to) parse the library's symbol table(s), you
can probably find a tool that will list identifiers from the library. Put
this listing someplace where your interface macro can find it, then do the
same thing as in suggestion 2.
>I also find having a debugger printing something like the expression
>
> (defun faa (x)
> (cond ((x (+ x x)))
> (t 0)))
>
>as
>
> (DEFUN FAA (X)
> (COND ((X (+ X X)))
> (T 0)))
>
*PRINT-CASE* lets you change this for normal output. Some implementations
also have a way of specifying print case separately for certain tools
(debugger, inspector, ...) -- a quick trip to the documentation or apropos
should help you find these.
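A short sketch of what *PRINT-CASE* does, assuming the standard readtable. PRINTED-WITH-CASE is a made-up helper, and *PRINT-PRETTY* is bound to NIL only to keep the output on one line:

```lisp
;; Hypothetical helper: print OBJECT readably to a string with the
;; given value of *PRINT-CASE*.
(defun printed-with-case (object print-case)
  (let ((*print-case* print-case)
        (*print-pretty* nil))
    (prin1-to-string object)))

(printed-with-case '(defun faa (x) x) :downcase)  ; => "(defun faa (x) x)"
(printed-with-case '(defun faa (x) x) :upcase)    ; => "(DEFUN FAA (X) X)"
(printed-with-case '|Bill| :downcase)             ; => "|Bill|"
```

The last call shows the limit of this setting: a symbol whose name really is mixed case still needs its bars to be read back correctly.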
I think that statement needs some qualification. C programmers do
seem to use various case conventions, and of course, they say they
have no problem doing that. But how do they know that they don't? --
the same people say they have no problem doing manual store
management! To know for sure, you'd need to do some hairy
measurements on productivity with systems that made use of case
sensitivity in various ways vs ones that did not, but were otherwise
equivalent. And those measurements are probably expensive enough to
do that they'll never be done, so we'll never actually know.
Of course the same problem exists for almost any of these issues, so
many or most of them probably have no clear solutions.
--tim
so (setq *print-case* :downcase) and increase your happiness.
#:Erik, who reads the fine standard and all the manuals
--
The Microsoft Dating Program -- where do you want to crash tonight?
> Raymond Wiker <ray...@orion.no> writes:
>
> > How about entering the name as a string instead of a symbol, or
> > escaping the symbol?
> >
> > E.g, (defexternal "FuncWithMixedCase" ...)
> > or (defexternal |FuncWithMixedCase| ...)
> >
>
> That would not work either. You'd have to call the macro using that
> symbol.
>
> (|FuncWithMixedCase| .....)
>
> Not pretty.
Of course, one could just modify the macro itself to compute the name
that you call from lisp to be the one with the name in uppercase.
--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu
Even better, the macro could invoke (at compile time, of course)
a routine that converted the external name to *whatever* your
preferred Lisp name was, with rules as complicated as you like,
e.g. (defexternal |XtGetMultiClickTime| ...) could define a Lisp
name of external::xt-get-multi-click-time, or whatever.
I have never quite understood the desire to treat Common Lisp symbols as
somehow intrinsically related to symbol names in any foreign language.
in my view, a more Lispy symbol name makes sense while a C-like symbol
name (including case hypersensitivity) does not, the latter just being a
string to Lisp, so I would prefer something along these lines:
(defexternal (xt::get-multi-click-time "XtGetMultiClickTime") ...)
this makes even more sense with C++'s reinvention of mutually exclusive
wheels with namespaces, type encoding (mangling) into the name, etc.
#:Erik
--
foggy FFI -- external world isolated
Well, the situation that prompted my comment was wanting to automatically
generate a *huge* number of "defexternal"s, without having to manually
invent & type a Lispy name for each external case-sensitive name.
+---------------
| in my view, a more Lispy symbol name makes sense while a C-like symbol
| name (including case hypersensitivity) does not, the latter just being a
| string to Lisp, so I would prefer something along these lines:
|
| (defexternal (xt::get-multi-click-time "XtGetMultiClickTime") ...)
+---------------
That's fine, and I agree that any reasonable "defexternal" *should* require
both a Lisp symbol and a string for the C name. But if you have a *large*
number of external names that happen to obey some reasonably consistent
construction rule(s), why not save work by creating a macro that constructs
the (case-insensitive) Lisp name from the (case-sensitive) C name for you
automatically? Something like:
(defmacro defexternal-autotrans (string &rest body)
  `(defexternal (,(my-c-name-to-lisp-name string) ,string) ,@body))
...
(defexternal-autotrans "XtGetMultiClickTime" ...)
...
Saves typing, avoids typos, etc.
You can still use the underlying 2-arg form if/when your name-mangler
runs into trouble...
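Here is a runnable sketch of that layering. The two-name DEFEXTERNAL below is a hypothetical stub with no FFI behind it, and the body of MY-C-NAME-TO-LISP-NAME is just one possible rule (a hyphen at each lowercase-to-uppercase step). The detail worth noting is the EVAL-WHEN: the translator runs during macroexpansion, so it must already be defined at compile time.

```lisp
;; Hypothetical stub for the two-name DEFEXTERNAL: the generated function
;; only reports which C entry point it would call.
(defmacro defexternal ((lisp-name c-name) &rest body)
  (declare (ignore body))
  `(defun ,lisp-name () (list :would-call ,c-name)))

;; Needed at macroexpansion time, hence the EVAL-WHEN for COMPILE-FILE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun my-c-name-to-lisp-name (c-name)
    "Intern a symbol for C-NAME, hyphenated at lower-to-upper transitions."
    (intern
     (with-output-to-string (out)
       (loop for prev = nil then ch
             for ch across c-name
             do (when (and prev (lower-case-p prev) (upper-case-p ch))
                  (write-char #\- out))
                (write-char (char-upcase ch) out))))))

(defmacro defexternal-autotrans (string &rest body)
  `(defexternal (,(my-c-name-to-lisp-name string) ,string) ,@body))

(defexternal-autotrans "XtGetMultiClickTime")

(xt-get-multi-click-time)   ; => (:WOULD-CALL "XtGetMultiClickTime")
```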
> In article <lwn25pl...@copernico.parades.rm.cnr.it> Marco Antoniotti
> <mar...@copernico.parades.rm.cnr.it> writes:
>
> ... [Re: Obtaining the case preserved name of a macro] stuff ...
>
> >
> > The reason why the original Common Lisp group chose to stick with the
> > idea of uppercasing all symbols being interned is the second mystery
> > of the universe :)
The reader translates case. INTERN is case-sensitive.
One of the great mysteries language designers stay up late pondering
is why people continually use language that confuses these two. (Even
if you knew what you meant, it risks confusing someone else when you
have not used the precise language to explain it.)
(intern "red") and (intern "RED") yield different symbols.
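The distinction can be checked directly. This is a small sketch assuming the standard readtable; INTERN-VS-READ is a name made up for the example:

```lisp
;; Hypothetical demo: INTERN takes the string as given, while the reader
;; upcases unescaped input before it ever calls INTERN.
(defun intern-vs-read ()
  (let ((*readtable* (copy-readtable nil))   ; standard readtable
        (lower (intern "red"))               ; symbol named "red"
        (upper (intern "RED")))              ; symbol named "RED"
    (list (eq lower upper)                          ; two different symbols
          (eq upper (read-from-string "red"))       ; reader turns red into RED
          (eq lower (read-from-string "|red|")))))  ; escaping keeps the case

(intern-vs-read)   ; => (NIL T T)
```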
> I would like to see here some more discussion on the rationale for doing this
> in CL. This is one of the more embarrassing things that I have to explain
> to new people that I am introducing to Common Lisp. I can see no reason why
> case sensitive symbols by default would not be superior. Implementations
> could then give the other three symbol/case possibilities as an option
> (See Allegro's symbol/case options for example).
There are numerous reasons. If I had more time at the keyboard I'd explain
it but suffice it to say that people disagree about case choices a lot,
and having a case-translating reader ensures that people can program in the
case of their choice with a minimum of impact on others.
> My understanding (correct me if I'm wrong) of the original concept is that a
> symbol denotes (or even connotes) an abstraction irrespective of its case.
I think you're wrong here. I'll assume subsequent discussion has cleared
that up, though.
> For example the symbols "red" and "RED" both may denote the concept "the
> color red" in a program. However one can make the argument that in the case
> of "Bill" and "bill" we have two distinctly different concepts. Furthermore
> one can make the argument that "red" and "RED" in fact could denote different
> concepts. Or we may even have a use to distinguish "rEd" from "ReD". I
> sure hope the reason that it stuck was because the original LISP was all
> upper case.
Officially, uppercase is the default canonical case for compatibility reasons.
Personally, I also think it's the right canonical case. :-)
THIS SENTENCE IS SYNTACTICALLY PROPER ENGLISH.
SENTENCES IN ALL-UPPERCASE APPEAR ON BILLBOARDS ALL THE TIME.
this sentence is not syntactically proper.
only unix and lazy people (sometimes including me) type in all-lowercase.
as such, this is a bad choice of canonical case, since as a single-case
choice it yields syntactically ill-formed sentences.
Besides, in text containing mixed references to text and code, such as
my remark earlier about INTERN, and this one, which refers back to the
INTERN remark but uses INTERN itself three times, it's easy to tell
code from text because of the all-uppercase. I first saw fonted text
on computer screens decades ago when I first came to the MIT AI Lab,
but I still can't reliably send anything more than 7-bit ASCII in mail
messages even today. As such, being able to clearly set off code continues
important to me, and case-sensitivity continues as an issue.
> Another argument for case sensitive symbols is that it is easier to
> implement and faster to compute.
Since symbols in Lisp are case-sensitive, I'm not sure what you mean.
Symbols are interned and compared by pointer-comparison. The symbol
FOO can be typed as "foo" or "Foo" or "FoO" or "FOO" (minus the quotes).
But there are symbols Foo and foo which you can type as "|Foo|" or "|foo|"
(minus the quotes). The cost of the reader doing case-translation is tiny
since it's only one more memory cycle to access a case-translated character
out of an array, and this cost is only incurred when the READ function
is used, which is not in program execution under normal circumstances.
Compiled code usually has faster representations for loading which doesn't
go through read, and executing programs do not typically call READ
unless storing symbolic data in a probably-inefficient way.
> I also find having a debugger printing something like the expression
>
> (defun faa (x)
> (cond ((x (+ x x)))
> (t 0)))
>
> as
>
> (DEFUN FAA (X)
> (COND ((X (+ X X)))
> (T 0)))
>
> yuckie.
I'd have a problem with it the other way around. I'm not arguing you should
have to cope with that; just that it matters that the language supports
both forms and implicitly encourages people not to make more than one name
with the same letters but differing only in case. As long as you don't
walk over someone else's space, setting *PRINT-CASE* should leave you happy.
Once you start choosing to use both Foo and FOO as different symbols, of
course, *PRINT-CASE* won't help you. But once you start to have both Foo and
FOO as different symbols, you are creating many nightmares for many people
that go well beyond what it's reasonable for you to do as a point of personal
choice.
Personally, I often program in uppercase and rather like it. It sets off
code words from non-code words, as in:
(DEFUN FOO (X) ;This is a comment, so occurs in human case.
(WRITE-STRING "This is not code either, really."))
Further, in an interactive session I assume text I see in lowercase is
what I've typed and text in uppercase is what the system has typed back.
I find the visual distinction very helpful. That you don't is fine.
I just didn't want you to think the use of uppercase was without supporters
or without principle.
> THIS SENTENCE IS SYNTACTICALLY PROPER ENGLISH.
> SENTENCES IN ALL-UPPERCASE APPEAR ON BILLBOARDS ALL THE TIME.
>
> this sentence is not syntactically proper.
This is a good point. Although it is funny that you chose billboards, since
I am seeing more and more billboards using all lower case these days. :-)
Don't forget my example of the importance of distinguishing the symbols "A"
and "a" in mathematics.
>
> > Another argument for case sensitive symbols is that it is easier to
> > implement and faster to compute.
>
> Since symbols in Lisp are case-sensitive, I'm not sure what you mean.
> Symbols are interned and compared by pointer-comparison. The symbol
> FOO can be typed as "foo" or "Foo" or "FoO" or "FOO" (minus the quotes).
> But there are symbols Foo and foo which you can type as "|Foo|" or "|foo|"
> (minus the quotes). The cost of the reader doing case-translation is tiny
> since it's only one more memory cycle to access a case-translated character
> out of an array, and this cost is only incurred when the READ function
> is used, which is not in program execution under normal circumstances.
> Compiled code usually has faster representations for loading which doesn't
> go through read, and executing programs do not typically call READ
> unless storing symbolic data in a probably-inefficient way.
That's all I was referring to: the reading and printing of symbols.
We also have to count PRINT (you're right, that is handy) here, since an
implementation's code has to figure out that it needs to print the symbol
"Foo" as |Foo| and not print the symbol "FOO" as |FOO|. I agree that the
time to implement this and the compute time to read and print is small, but
it is just one more thing. When trying to rationally compare the
differences between case sensitive versus case insensitive reading of
symbols, we are talking about lots of very subtle things.
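The reading and printing behaviour under discussion can be checked at a listener. What follows is a minimal sketch, assuming a conforming Common Lisp; WITH-STANDARD-IO-SYNTAX is used so that a customized readtable or *PRINT-CASE* does not change the result, and the helper name ROUND-TRIP is made up for illustration.

```lisp
;; Read TEXT as a symbol under standard syntax and return a list of
;; (internal-name printed-form).  The symbol is interned in CL-USER,
;; because WITH-STANDARD-IO-SYNTAX binds *PACKAGE* to that package.
(defun round-trip (text)
  (with-standard-io-syntax
    (let ((symbol (read-from-string text)))
      (list (symbol-name symbol)
            (prin1-to-string symbol)))))

;; (round-trip "foo")   ; the name is "FOO", and it prints without bars
;; (round-trip "FoO")   ; the same symbol as above: the reader upcased it
;; (round-trip "|Foo|") ; the name is "Foo"; the printer must escape it,
;;                      ; typically as |Foo|
```

The extra work on the printing side is the test for whether a name would survive being read back unescaped; the exact escaping style is left to the implementation.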
I can go either way on the default case mode for reading of symbols, so I am
not ultimately trying to argue for one over the other, I just want to
understand how programmers use case in Lisp. But here is a mind
experiment for you to help me answer my original query into these phenomena.
Suppose that we had originally defined Lisp to be case sensitive reading of
symbols, which eliminates the historical issues, and allow me to assume for
the sake of this mind experiment that we are not going to consider the fact
we can distinguish PRINT in prose (something I like by the way) is an
important enough reason to define a language spec. Then today, would Lisp
programmers have reasons to ever want the case insensitive version and for
what reasons?
In cases like this, I believe that case was used precisely *because* it was
inconvenient. KILL was presumably intended for exceptional situations (I
don't remember this distinction, but I'm guessing it did what "kill -9"
does), and they didn't want users running it inadvertently. Case is also
sometimes used similarly for command options, although more generally case
is used there for the same reason that mathematicians use it: the
convention is that Unix command options are a single character.
not that I ever discussed religion on the old telex machines my first
real employer used back in 1979/80, but they _were_ all lowercase. it's
also my _impression_ that the old Teletype was all lowercase, but I can't
say for sure. what I do know, however, was that on the mainframes of the
time, the canonical case was uppercase. the Cyber computers were short
one bit of a real character code, and they had all uppercase. the DEC-10
had 7-bit codes, and even 8- or 9-bit codes if you really wanted it to,
but still uppercase was the canonical case. I used TOPS-10 and TOPS-20
back then, and we had to quote lowercase letters in filenames with ^V.
(but tell this to the kids these days, and they won't believe you.)
I also observe a change towards less uppercase characters in general.
magazine and newspaper heads move towards lowercase. company logos and
trademarks are moving to all lowercase, too. dictionaries used to be
published with capitalized headwords -- that's history, too. "Lisp" used
to be written LISP, then L<small-caps>ISP</small-caps>, then Lisp, the
same as Unix. my own style is to maintain the case of the word in
electronic text regardless of its position in a sentence, rather than
force an information-losing upcase on the first letter of the sentence.
(thus the deity has nothing to fear from me.)
on the Net, domain names used to be written in uppercase. now they are
universally written in lowercase. fortunately for us all, the DNS is
case insensitive. mail addresses used to be in uppercase, too, but the
standards decreed that the local-part of a mail address not be munged.
this is actually _not_ an argument for changing the canonical case of
Common Lisp symbols. all we need is a slightly better way to ensure that
we can still talk about symbols in lowercase. currently, however, we may
have to expose the uppercaseness of symbol names when constructing them
on the fly with INTERN. this may be a good time to promote my #" reader
macro, which reads the following symbol name (which must end in a ", too),
case-translates it as it would for a symbol, but returns the string
without ever interning the symbol. it's particularly easy to implement
in Allegro CL:
;;; reader for symbol names that does case conversion according to the
;;; rest of the symbol reader. thanks to John Foderaro for the pointer.
(defun symbol-namestring-reader (stream character prefix)
  (declare (ignore prefix))
  (prog1 (excl::read-extended-token stream)
    (unless (char= character (read-char stream))
      (excl::.reader-error stream "invalid symbol-namestring syntax"))))

(loop with readtables = (excl::get-objects 11)
      for i from 1 to (aref readtables 0)
      for readtable = (aref readtables i) do
  (when (excl::readtable-dispatch-tables readtable)
    (set-dispatch-macro-character #\# #\" 'symbol-namestring-reader readtable)))
this latter part actually affects all your readtables. you may have
valid reasons not to want that. this is best suited for customizations
prior to dumping a new Lisp.
this means you can write (apropos #"eric-fun"). I think this is cool.
#:Erik
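For implementations without the Allegro internals used above, the same idea can be approximated with an ordinary function. This is only a sketch: the name SYMBOL-NAMESTRING is hypothetical, it is a function rather than a reader macro, and it ignores escape characters (\ and |) inside the string.

```lisp
;; Hypothetical helper: translate STRING the way the reader would translate
;; an unescaped symbol token under READTABLE, but return a fresh string and
;; intern nothing.
(defun symbol-namestring (string &optional (readtable *readtable*))
  (ecase (readtable-case readtable)
    (:upcase   (string-upcase string))
    (:downcase (string-downcase string))
    (:preserve (copy-seq string))
    ;; :INVERT flips the case only when all the letters share one case.
    (:invert   (cond ((notany #'lower-case-p string) (string-downcase string))
                     ((notany #'upper-case-p string) (string-upcase string))
                     (t (copy-seq string))))))

;; With a standard readtable:
;; (apropos (symbol-namestring "eric-fun"))  ; searches for "ERIC-FUN"
;; (intern (symbol-namestring "make-foo"))   ; same symbol as typing make-foo
```

The point of the design is that code which builds symbol names on the fly never has to spell out the canonical case itself.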
>
> vro...@netcom.com (William Paul Vrotney) writes:
>
> > I can go either way on the default case mode for reading of symbols, so I am
> > not ultimately trying to argue for one over the other, I just want to
> > understand the how programmers use case in Lisp. But here is a mind
> > experiment for you to help me answer my original query into these phenomena.
> > Suppose that we had originally defined Lisp to be case sensitive reading of
> > symbols, which eliminates the historical issues, and allow me to assume for
> > the sake of this mind experiment that we are not going to consider the fact
> > we can distinguish PRINT in prose (something I like by the way) is an
> > important enough reason to define a language spec. Then today, would Lisp
> > programmers have reasons to ever want the case insensitive version and for
> > what reasons?
>
> For you these are thought exercises. For me they are memory exercises.
> This was the situation in some Lisp dialects and it was a disaster for
> those who didn't like lowercase. The problem is that some symbols given by
> the system must be in one case or another. e.g., you must say "car" or "CAR"
> or "Car" or have all three defined. (And in the case of "Hash-Table" or
> "Hash-table" or "HASH-TABLE" or "hash-table" there are four possibilities
> that are possible in ordinary use.) Consequently, while you might choose
> to put things in all uppercase yourself, as in (CAR X), you are then imposing
> capital-CAR on everyone else, even the lowercase fans, who must write
> (CAR x) because (car x) is something else or is undefined. Or if you assume
> lowercase is the default as in (car x), then people who choose uppercase must
> still use lowercase for system symbols. (car X). That's just as offensive.
>
> Case translating allows you to write your code in lowercase if you like it,
> as (car x), and allows me to write in uppercase as (CAR X), and some third
> person to write (Car x) or (Car X). This is, I believe, maximally flexible
> to individual personal models. At that point, the only issue is what the
> canonical internal case is, and that seems to me to be largely arbitrary,
> though slightly influenced by external issues, such as the billboard issue
> and the religious issue I mentioned earlier, both of which point to uppercase
> as a preferred case.
>
Your point above on the problems with "car", "Car" "CAR" is well taken. But
I must add that I've written tons of Elisp and have yet to suffer any kind
of agony because it is case sensitive. Although I have heard programmers
complain about this situation, by and large in actual practice Elisp
programmers seem to have few problems with it. The reason is that
almost all of the Elisp functions are in lower case and programmers tend
to use the utility of case with discretion.
The billboard example is a good point from the perspective of using Lisp
symbols to represent natural language abstractions, but to represent
computer language abstractions, times have changed. When I look at Lisp
programs in my library of LISP books at one point they are all in upper
case, then after that point they are all in lower case. So I think lower
case Lisp documentation is here to stay for a while. This is most
likely due to a silly reason again, keyboards: people find typing
without hitting the SHIFT key more streamlined. Nevertheless this is a
modern trend and is even creeping into natural language situations, such as
the counter billboard example.
For the moment, let me go along with your point that case translation is
better but that canonical internal case is arbitrary. In that case, why not
make the default canonical internal case lower? Then at least new Lisp
users might find learning, and teachers teaching, a bit easier in this
situation. We could argue -- just set up *print-case* to :downcase in some
site init lisp file for the new user. But if the new user is down-loading a
Lisp and trying it out he usually doesn't know about this variable or even
that he has a site or user init file capability. Also keep in mind that if
canonical lower case were the default, then half of us seasoned Lisp users
would not set *print-case* to :upcase in our init file. I don't deny your
arguments for the contrary, which I think are good reasons, such as the
ability to see input and output easily in a listener, I just wanted to point
out that there are legitimate reasons for other default alternatives.
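For readers who have not met the variable, here is a small sketch of what *PRINT-CASE* changes. It assumes the standard readtable (READTABLE-CASE :UPCASE), and the function name PRINT-CASE-DEMO is made up for illustration.

```lisp
;; Print SYMBOL once under each standard value of *PRINT-CASE* and
;; collect the three printed forms.
(defun print-case-demo (symbol)
  (loop for mode in '(:upcase :downcase :capitalize)
        collect (let ((*print-case* mode))
                  (prin1-to-string symbol))))

;; (print-case-demo 'hash-table)
;; => ("HASH-TABLE" "hash-table" "Hash-Table")

;; An init file that prefers lowercase output would contain something like:
;; (setf *print-case* :downcase)
```

Only the printed form changes; the symbol's internal name stays "HASH-TABLE" throughout.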
Gee, and I thought the deity Kent was referring to was "IBM"... ;-}
+---------------
| also my _impression_ that the old Teletype was all lowercase...
+---------------
Nope. The KSR-33 & ASR-33 & KSR-35 were uppercase-only. The KSR-37 was
upper/lower, though.
If Lisp started case sensitive (BSD Franz Lisp was),
then we'd find uses for it that turn
it into a feature, and would not want it changed later.
However, if I were to design a new language, I would most definitely
make it case-insensitive. Imagine if HTML were case sensitive,
what a set of problems that would have caused, or URL query string
names (oops, they CAN BE case sensitive for CGI scripts,
oh what bugs that can cause..)
> make the default canonical internal case lower.
I don't think it really matters much which is the default.
Making it upper does have the benefit of making it a little
clearer when you're dealing with symbol names, upcasing is
probably not done much for normal strings.
-Kelly Murray k...@intellimarket.com
> > Don't forget my example of the importance of distinguishing the symbols "A"
> > and "a" in mathematics.
>
> But this is an artificial situation. Mathematicians sometimes have to refer
> to "small A", "big A", "script A", etc. in speech to refer to these where
> they occur mixed. Personally, I find that abhorrent. In ordinary speech,
> it simply does not occur. ("Oh, you asked for a book? I thought you'd said
> a Book." "No, my name isn't bill, it's Bill." These are statements you
> never hear. Case is not marked in speech. It is an important property
> of a good language that it can be conveniently spoken about.)
>
Good mathematical writing benefits from having a wide range of
characters and symbols available, although it can be overdone. It
helps the reader to have visual cues that differentiate between
different types of data. When speaking different kinds of cues can be
used - inflection; emphasis; volume etc, that aren't available when
writing.
For example, a usual convention is to refer to the carrier of an algebra
$\mathfrak{A}$ as $A$. This helps the reader remember the relationship
between the entities involved, particularly when a number of such are
employed in the same paragraph.
Once conventions have been established in a given field they serve as a
useful shorthand; the risk is that the jargon excludes those who are
not familiar with it...
I thought that the argument about relating code to speech was pretty
weak. Code is written, not spoken. (char= #\a #\A) ==> NIL. So
these things are, fundamentally, different, and the programmer can use
this.
There's one little thing, though. I used to see lots of C code which
used the convention of underscores to delimit words in variable and
function names, e.g. delete_header() or this_language_sux. Nowadays,
people have been using mixed case to avoid nasty underscores,
e.g. deleteHeader() and treeInsert(), etc. So mixed case is really
important there. But Lisp allows the use of dashes, which are so much
nicer than underscores: e.g. tree-insert, delete-header. So mixed
case isn't quite as important. William's Lpp program (which is truly
quite cool: http://www.interhack.net/projects/lpp/) uses this sort of
mixed-case on the C++ side. If he'd chosen to use underscores to
represent Lisp dashes in symbols, then (at least to me) that would be
abhorrent, and I'd stray away just on that note. I'd hate to see
read_from_string(), though I don't mind readFromString().
I like using Allegro, because it lets me say "I want my Lisp case
sensitive". To me, that is a good thing. I'm sure other vendors
provide similar options.
dave
> I can go either way on the default case mode for reading of symbols, so I am
> not ultimately trying to argue for one over the other, I just want to
> understand the how programmers use case in Lisp. But here is a mind
> experiment for you to help me answer my original query into these phenomena.
> Suppose that we had originally defined Lisp to be case sensitive reading of
> symbols, which eliminates the historical issues, and allow me to assume for
> the sake of this mind experiment that we are not going to consider the fact
> we can distinguish PRINT in prose (something I like by the way) is an
> important enough reason to define a language spec. Then today, would Lisp
> programmers have reasons to ever want the case insensitive version and for
> what reasons?
I would want a case insensitive reader, and I would mandate it for any
projects I ran.
Why? Because I find that I spend a huge amount of time when dealing
with a certain common style of C/C++/Java programming which likes to
name everything in weird-mixed-case -- FooInstance, (or should that be
fooInstance?) FooClass, MakeFoo. Whenever I have to deal with these
things I find that: (1) it takes me significantly longer to type
things because I have to blip the shift key all the time, and I am not
a good enough typist to be able to type flat out while inserting
apparently randomly-distributed caps (so I get MakefOo or MakeFOo);
(2) I make mistakes because I can't remember the right case,
especially since the conventions are not always obvious.
Of course, it may be that in the kinds of languages where that style
thrives, things are so hard to write in any case, that being able to
type reasonably fluently is not an issue, so the capitalisation
doesn't hurt you. But in Lisp I often find that, once I've worked out
what I want to do, I type chunks of code almost at the speed I am
typing this message, so it would really hurt me.
(It's interesting that zmacs has a mode where it automagically types
code in CAPS and comments in ordinary case. That works OK for the
style some people like, but it wouldn't generalise to the StudLy style
unless you had very sophisticated analysis in there).
--tim
amusing. DOS doesn't name its files case-sensitively. it _retains_ the
case, yet searches are case-insensitive. try to write the files FOOBAR
and FooBar, and you'll see.
you can still use case if you want to in Common Lisp. I know a few
people don't like this, but (SETF (READTABLE-CASE *READTABLE*) :PRESERVE)
does not convert the case of input symbols. :INVERT likewise inverts it,
so you can type in all lowercase, and it comes out as uppercase, and vice
versa. if, however, you use mixed case, the symbol name is not altered.
the idea was apparently that by mixing case, you would not expect it to
change in the symbol name.
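The READTABLE-CASE modes can be compared directly. A minimal sketch, assuming a conforming Common Lisp; the helper name NAME-AS-READ is made up, and it works on a fresh copy of the standard readtable so the global one is left untouched.

```lisp
;; Read TEXT as a symbol under the given READTABLE-CASE mode and return the
;; resulting symbol name.  The symbol is interned in the current package.
(defun name-as-read (text mode)
  (let ((*readtable* (copy-readtable nil)))   ; fresh standard readtable
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string text))))

;; (name-as-read "FooBar" :upcase)    => "FOOBAR"
;; (name-as-read "FooBar" :preserve)  => "FooBar"
;; (name-as-read "foobar" :invert)    => "FOOBAR"
;; (name-as-read "FOOBAR" :invert)    => "foobar"
;; (name-as-read "FooBar" :invert)    => "FooBar"  ; mixed case is left alone
```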
| I like using Allegro, because it lets me say "I want my Lisp case
| sensitive". To me, that is a good thing. I'm sure other vendors provide
| similar options.
well, even the standard does, and thanks to my bitching and moaning and
John Foderaro's excellent code, Allegro CL 5.0 also behaves correctly in
all combinations of readtable-case, print-case, and case-mode, now.
In C++, I might have something like this:
class Foo;
Foo foo;
This is not a problem for me.
I also deal with things that are case insensitive like HTML. Again, I
prefer to use lower case when typing the tags. I guess I am just too
lazy to use the shift key constantly.
Because of C/C++ parsing rules, the - character can't be used as a
word separator. It turns into a minus sign, a token in its own right.
It makes the use of upper case symbols between words possibly more
desirable than underscores (both require the shift key). Since Lisp
is happy to allow the - in a symbol name, I find it makes a great word
separator. I can go ahead and type in my lazy fashion.
Lisp is rather flexible as far as case goes. You can set the reader
to do case folding or not. It seems to me, you have a choice. You
just risk compatibility with other programmers by violating
convention.
What really counts is expressiveness. Lisp is so expressive that I am
still trying to wrap my brain around it. The primary problem
(actually, a good feature) being the large vocabulary. Sure, I like
the terseness of C. But such a language is in constant danger of
being write only. It is more useful to a programmer to be able to
read the code. That task is performed more frequently.
--
David Steuber (ver 1.31.3a)
http://www.david-steuber.com
To reply by e-mail, replace trashcan with david.
May the source be with you...
One point that I can't remember anyone making here with regard to the
whole symbol case and or sensitivity is the ergonomics of the font and
case used. There is a reason that the majority of printed text that
we read in the day is in lower case, that being that it is easier to
read. (Others also make a case for the fact that serifed fonts are
easier to read than their sans counterparts. The books by Edward
R. Tufte include some nice reasons for why fonts are designed as they
are, among other things.) Some of the comparisons made to the use of
all caps on billboards and other "short duration" texts are not really
relevant in my mind in this case. I will not spend large amounts of
time reading billboards, while I will spend hours reading a book or
code. IMAGINE READING 500+ PAGES OF TEXT THAT IS IN ALL CAPITAL
LETTERS. IT CAN BECOME TIRESOME AFTER A WHILE.
In natural languages case serves a very important role based primarily
on its absence, case accentuates some point of the text or comments on
its content. For instance, case can be used to mark "important"
elements of the text: names, the first word of a sentence, objects in
German. In all of these cases the removal of case will not usually
inhibit the understanding of what is being presented but it will
reduce the ease of understanding. Likewise, in computer languages
this added extra syntactic information can be useful to the reader of
the code. A good Lisp example of this is the practice of using all
caps to refer to arguments and the like in argument strings. This
distinguishes between the two worlds of discourse, that of the natural
language string and the Lisp program. This type of comment would not
be possible where case is syntactically important.
If we were to have a case sensitive reader my primary objection would
be that such a reader precludes the use of case in an extra syntactic
way. However, if we could go back to the days of LispM's where source
code could be written with different type faces to communicate many of
the non-syntactic things that case is used for I would have a lot less
to complain about.
-Eric
> I will not spend large amounts of
> time reading billboards, while I will spend hours reading a book or
> code. IMAGINE READING 500+ PAGES OF TEXT THAT IS IN ALL CAPITAL
> LETTERS. IT CAN BECOME TIRESOME AFTER A WHILE.
I think this claim is anecdotal and without statistical foundation.
Like political debates, all it does is reinforce in both kinds of viewers
the fact that their side is right. Likers of lowercase will naturally
agree with your remark; likers of uppercase will look at it and see how
much more readable the uppercase is and say "gee, this guy is a nut if
he can see something so plain and easy to read and call it tiresome".
Code used to be written in uppercase all the time and has changed in
character I think mostly because of people not wanting to type "shift"
and because there are fewer uppercase-only keyboards. I still
personally program in all-uppercase quite frequently. Mostly I just
don't notice the difference. Over time I've shifted to lowercase more
just because others complain. But personally I think it's like the
coke/pepsi distinction: It is possible to tell the difference and a
few people (myself included for coke/pepsi) can do so quite reliably;
but most people have an opinion on it just because they've been taught
they should. Back when code was all-uppercase routinely, there
weren't nearly as many complaints about it as there are now.
There ARE certain studies that say that lowercase is "easier to read";
I don't know. I doubt there is a study that says that it's "always
easier to read"--I think those things mean "on average" and do not
mean to suggest there's no one who doesn't like or prefer or do better
with uppercase. And like the coke/pepsi, I think the important part
isn't to tell how many people prefer one or the other, the important
thing is to stamp out stupid hotels and restaurants that don't let you
choose the one you want. It's the situations in which you're forced
to use one or the other that are a problem, not the choice itself.
Ditto for case--upper or lowercase is fine--let it be chosen by the
person who is the principal client of a particular piece. What ought
not be determined is that people should have no choice, which is what
"case-sensitive" systems do, because they provide no way for a person
with a different preference to hide.
> In natural languages case serves a very important role based primarily
> on its absence, case accentuates some point of the text or comments on
> its content. For instance, case can be used to mark "important"
> elements of the text: names, the first word of a sentence, objects in
> German. In all of these cases the removal of case will not usually
> inhibit the understanding of what is being presented but it will
> reduce the ease of understanding. Likewise, in computer languages
> this added extra syntactic information can be useful to the reader of
> the code. A good Lisp example of this is the practice of using all
> caps to refer to arguments and the like in argument strings. This
> distinguishes between the two worlds of discourse, that of the natural
> language string and the Lisp program. This type of comment would not
> be possible where case is syntactically important.
Yes, I agree with this part.
> If we were to have a case sensitive reader my primary objection would
> be that such a reader precludes the use of case in an extra syntactic
> way. However, if we could go back to the days of LispM's where source
> code could be written with different type faces to communicate many of
> the non-syntactic things that case is used for I would have a lot less
> to complain about.
Heh. I hated code in bold and italics, even though others liked it.
Again, the key to my not being driven crazy by some of my best friends
was not the availability of fonting but the fact that the font was
discarded by the reader. Incidentally, it was common for us to patch
Lisp Machine code marking the changed part in bold, and it was a VERY
common error to have the bold creep into a string (which didn't
discard fonting) and end up in the program output. That was also a
nightmare... Some problems you just can't get away from.
In any case, I think your overall remark about the "extra syntactic"
annotation that weaves through code (and human textual language)
is a good one.
I suspect it's mostly a matter of what people are used to. I think a study
would also find that Americans find Roman text easier to read than Hebrew,
but does that mean that one is inherently easier to read? Since most
printed text is in lowercase, with uppercase to highlight things, that's
what we find comfortable.
On the other hand, that begs the question of why the letters in various
cases are formed the way they are. Are they arbitrary, or are the lowercase
letters shaped so as to be easier to read (alphabets evolved mostly by
themselves, with little conscious guidance by users -- certainly much less
than has been given to computer languages)?
>
> * eric dahlman <dah...@cs.colostate.edu>
> | For instance, case can be used to mark "important" elements of the text:
> | names, the first word of a sentence, objects in German.
>
> actually, any intrinsic importance of the first word in a sentence is
> lost if you upcase it gratuitously, of course, bad jokes like "Bill Gates
> is great, as long as `bill' is a verb." for some odd reason, the Germans
> are abandoning the (stupid) rule to capitalize nouns, like the rest of
> Europe did hundreds of years ago and capitalizes only proper nouns.
One of the great problems with trying to computationally parse and
analyze natural languages is that the components of the language do
not have a single exact meaning or purpose. That is, elements can be
used to convey information or they can convey "error bits" that help
the receiver correct what they hear. Consider your joke example "Bill
Gates is great, as long as `bill' is a verb." you assert that
intrinsic importance of the first word is lost by upcasing it.
However, the joke interpretation of "bill Gates is great..." does not
parse because of the improper verb form. So without capitalization I
could still rightly assume that "bill" is a proper name in this
context. That is not to say that there are not cases of genuine
ambiguity but this is not one where upcasing necessarily loses.
>
> incidentally, you don't capitalize your own name in the headers. why?
>
Well, to make a long story short the sysadmins here were moving some
accounts around and everyone got downcased by a script one of them
wrote. This wouldn't be so bad except they also disabled the
userland programs for changing these things and since there were 100s
of users affected they thought it would be too much trouble for them
to fix each one by hand...
> | In all of these cases the removal of case will not usually inhibit the
> | understanding of what is being presented but it will reduce the ease of
> | understanding.
>
> ever looked at the code and the work necessary to figure out whether the
> first word of a sentence is a proper name or not? we've grown beyond the
> evolutionary stage where all there is to text is understanding by humans.
I touched on this before, the code for determining this can be quite
bad but this is already in the realm of natural language processing
where nothing is easy. To do it right you need to include enough
semantic/syntactic knowledge to infer whether it is a proper name or
not from the surrounding text, lots o' work. But it is part and parcel
of NLP.
As for moving beyond the point where all text is understandable by
humans, I am not sure that I agree. In this case we are talking about
programming languages which are the interface between the thought
process of the human and the computation process of the machine. It
needs to be understandable by both. In fact this is one of the
brightest points in favor of Lisp as a language because we are able to
"communicate" with the machine on an equal level because we are both
talking about lists. A wonderful example of this is Screamer which
adds non-determinism to Lisp by transforming parts of the original
program into CPS via macros and some elaborate code walking. It is
getting late so if that doesn't make sense I can elaborate on it later
if anyone cares.
>
> | A good Lisp example of this is the practice of using all
> | caps to refer to arguments and the like in argument strings. This
> | distinguishes between the two worlds of discourse, that of the natural
> | language string and the Lisp program. This type of comment would not
> | be possible where case is syntactically important.
>
> that's an odd argument, considering that Emacs Lisp, a case-sensitive
> Lisp, uses upper-case in documentation strings to refer to arguments.
My rebuttal would simply be that elisp tends not to have symbols of
mixed case so the convention would be easily understood in that
community. A preliminary estimate, based on an apropos of '-' just for
fun to sample the symbols in emacs, turned up 50 symbols which were
not all lower case out of a total of 4867 in my emacs. If you do not
make use of a feature is it really there? ;-)
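The same kind of survey is easy to run on the Common Lisp side. The sketch below is an analogue, not a reproduction of the Emacs measurement: it walks the external symbols of a package instead of apropos output, and the function name COUNT-CASE-USAGE is made up for illustration.

```lisp
;; Count how many external symbols of PACKAGE have a lowercase letter in
;; their name.  Returns two values: that count and the total examined.
(defun count-case-usage (package)
  (let ((total 0)
        (with-lowercase 0))
    (do-external-symbols (symbol package)
      (incf total)
      (when (some #'lower-case-p (symbol-name symbol))
        (incf with-lowercase)))
    (values with-lowercase total)))

;; (count-case-usage "COMMON-LISP")
;; The first value is 0: every standard symbol name is all uppercase.
```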
>
> | If we were to have a case sensitive reader my primary objection would be
> | that such a reader precludes the use of case in an extra syntactic way.
> | However, if we could go back to the days of LispM's where source code
> | could be written with different type faces to communicate many of the
> | non-syntactic things that case is used for I would have a lot less to
> | complain about.
>
> then it would matter even less which case the system uses internally.
My point was not for which case was used internally but rather to
point out that in running to the C way of case sensitivity we
sacrifice the ability to use case to communicate "out of stream"
information about our code.
-Eric
I believe that people recognize whole words at a time and rarely look
at single letters. The idea of lowercase (I believe) is that there is
more variation in the shape of words which allow for easier `pattern
matching'. In upper case you have no ascenders or descenders to aid in
this process. Although I do agree with you that I am more comfortable
with what I see most often.
What I find more important than casing is the meaning that's
associated with the chosen symbols of representation. This is
something I always worry about. Anyone have any pointers for this?
I had a function that would return a list of symbols which started
with a question mark, like ?x, ?color, etc. I called them variables
because that's the purpose that they were serving at the `user level'.
It would look like
(rule ((hair ?color)) ...).
(I'm sure you're familiar with what I'm talking about - just trying to
be clear - something which I'm not great at doing).
At one point, I needed to find all the free variables in an expression
like the above. My question was/is what should I call this function.
Should it be `extract-free-variables', or `free-variables-in'? I guess
the question is which is best to describe, the action, the effect, or
something entirely else? I find that this comes up often. Any good
pointers for reaching the `optimal' level of clarity?
[another option would be occurs-check since I believe that this is a
familiar term which is often used]
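Whatever name is chosen, the function itself is short. Here is a minimal sketch under one of the candidate names, assuming that every symbol whose name starts with a question mark counts as a variable and that no binding constructs need to be respected; VARIABLE-P is a made-up helper.

```lisp
;; A variable is any symbol whose name starts with a question mark.
(defun variable-p (object)
  (and (symbolp object)
       (let ((name (symbol-name object)))
         (and (plusp (length name))
              (char= (char name 0) #\?)))))

;; Collect each distinct variable in EXPRESSION, in order of first
;; appearance, by walking the whole tree.
(defun free-variables-in (expression)
  (let ((variables '()))
    (labels ((walk (form)
               (cond ((variable-p form) (pushnew form variables))
                     ((consp form)
                      (walk (car form))
                      (walk (cdr form))))))
      (walk expression))
    (nreverse variables)))

;; (free-variables-in '(rule ((hair ?color)) (likes ?x ?color)))
;; => (?COLOR ?X)
```

A name like FREE-VARIABLES-IN describes the value returned, which reads naturally at the call site; EXTRACT-FREE-VARIABLES describes the action instead.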
actually, any intrinsic importance of the first word in a sentence is
lost if you upcase it gratuitously, of course, bad jokes like "Bill Gates
is great, as long as `bill' is a verb." for some odd reason, the Germans
are abandoning the (stupid) rule to capitalize nouns, like the rest of
Europe did hundreds of years ago and capitalizes only proper nouns.
incidentally, you don't capitalize your own name in the headers. why?
| In all of these cases the removal of case will not usually inhibit the
| understanding of what is being presented but it will reduce the ease of
| understanding.
ever looked at the code and the work necessary to figure out whether the
first word of a sentence is a proper name or not? we've grown beyond the
evolutionary stage where all there is to text is understanding by humans.
| A good Lisp example of this is the practice of using all
| caps to refer to arguments and the like in argument strings. This
| distinguishes between the two worlds of discourse, that of the natural
| language string and the Lisp program. This type of comment would not
| be possible where case is syntactically important.
that's an odd argument, considering that Emacs Lisp, a case-sensitive
Lisp, uses upper-case in documentation strings to refer to arguments.
| If we were to have a case sensitive reader my primary objection would be
| that such a reader precludes the use of case in an extra syntactic way.
| However, if we could go back to the days of LispM's where source code
| could be written with different type faces to communicate many of the
| non-syntactic things that case is used for I would have a lot less to
| complain about.
then it would matter even less which case the system uses internally.
#:Erik
the invention of lower-case came with more widespread literacy. its main
purpose was to reduce the amount of writing materials consumed and the
time it took to write. also, the Roman letters (which were all capital)
are fairly easy to carve in stone. the Latin alphabet (which includes
the lowercase letters) is fairly hard to carve in stone, but much easier
to use on parchment or paper with pen. most of the fonts we use today
were designed to be written by hand, with _very_ little variation into
their printed form. e.g., Times Roman was designed _explicitly_ to fit
more text onto the written page using lead type. Palatino was designed
to look better with offset printing techniques. Lucida was designed to
look better with 300 DPI laser printers. the Gothic alphabet was used
primarily in elaborately decorative hand-copied texts, thus it was the
alphabet Gutenberg used for the first Bibles.
in the history of writing systems and fonts, you will find that modern
technologies have influenced the alphabets and their shapes more than
anything else. it might well be that the previous situation with all
uppercase terminals caused upper-case to be favored by languages that
went through their formative years at the time, while the effort to use
the shift key on those disgustingly hard Teletype terminals caused the
lazy bums who designed Unix to use all lowercase and as few characters as
possible, and also designed C that way because they couldn't bother to
use any "superfluous" CPU instructions to fold case¹². (my MULTICS
material indicates case-insensitivity and canonical upper-case forms.)
I have on good authority that I write very legibly by hand, but I switch
to (tiny) upper-case letters if I'm really cramped for space, as that
means less likelihood of confusion when the relative boldness of the
stroke increases. excessively bold lower-case letters look really bad --
and that's also the reason you find more boldface text in upper-case.
all this said, I still think small caps was a good idea. the fonts I use
these days (Lucida Typewriter on my screen, various serif fonts on paper)
also have upper-case letters that are shorter than the stem on the tall
lower-case letters. such is also a fairly recent invention.
#:Erik
-------
¹ this may not actually be true, but it is unfortunately 100% consistent
with the rest of the Unix history.
² I'm reminded of the C programmer who argued against longer symbol names
because it would take longer to compile. (I'm not making this one up.)
--
don't call people who don't understand statistics idiots. take their money.
I think the studies are pretty carefully done. I don't have
references (sorry) but I read at least a couple, one of which did a lot of
stuff with analysing spatial frequencies of letterforms & text and
making it look very plausible that LC text was better in terms of the
visual system, as well as explaining a lot about line
lengths/spacing/serifs.
*But*. These are for large chunks of natural language text, not code,
on paper not a screen. And that is just different, so there is really
no reason to assume that any of it at all applies to code!
For instance, I find sans-serif, or perhaps slab-serif, typefaces *much*
easier to read on screen. Seriffed ones just end up with all sorts of
little single-pixel visual noise, which is a pain. But I find exactly
the opposite for paper.
And there's this wonderful stuff about margins &c. It's fairly
well-understood that, for paper, people like quite large margins, and
there are reasonably good studies that show this (the typographer's
rule I was taught is that a well-printed book should be 50%
whitespace). So someone came along (again, I read this but I don't
have a reference, sorry) and looked at text on screens. Most people
use(d?) editors which have maybe a couple of pixels between the text
and the border. And this ought to be hard to read, and sure enough it
is. And he designed some editor which had nice big margins, and it
was easier to read. But he totally failed to realise that screen
space is a seriously scarce resource, and you really want to get as
much as you possibly can in that area, *especially* if you are
programming, because you need to be able to see the code rather than
scrolling around all the time. So you sacrifice the margins for the
text, because those pixels are expensive.
So really, I think that experience from the paper & natural language
text world may be pretty irrelevant to things like reading source
code.
--tim
sigh. I think I'll add force to your former assertion by asking you to
consider the insertion of a pair of quotation marks around "bill Gates",
to _make_ it a valid verb-object expression. part of what makes us laugh
at jokes is the effort and intelligence required to make them make sense.
| That is not to say that there are not cases of genuine ambiguity but this
| is not one where upcasing necessarily looses.
how about lose vs loose? (OK, that was a cheap shot, but it isn't just a
spelling mistake.)
| I touched on this before, the code for determining this can be quite
| bad but this is already in the realm of natural language processing
| where nothing is easy.
but why make it so much harder for no good reason? computers can
capitalize the first word of a sentence trivially if they want to. they
can't return the word to what it once was without excessive computational
power, and frequently get it wrong. therefore, I don't capitalize the
first word of a sentence in electronic text, but leave it to the
print/paper/final edition.
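the asymmetry can be sketched in a few lines. CAPITALIZE-SENTENCE is a hypothetical name used only for illustration, and the sketch assumes its argument is a string:

```lisp
(defun capitalize-sentence (sentence)
  "Return SENTENCE with its first character upcased."
  (if (zerop (length sentence))
      sentence
      (concatenate 'string
                   (string (char-upcase (char sentence 0)))
                   (subseq sentence 1))))

;; Both inputs map to the same output, so no function of the output
;; alone can tell which of the two the author wrote:
;; (capitalize-sentence "bill Gates is great.")
;; (capitalize-sentence "Bill Gates is great.")
```

going forward is one call to CHAR-UPCASE. going back means deciding whether the first word is a proper name, and nothing in the capitalized string carries that information.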
| As for moving beyond the point where all text is understandable by
| humans, I am not sure that I agree. In this case we are talking about
| programming languages which are the interface between the thought
| process of the human and the computation process of the machine. It
| needs to be understandable by both.
hm. let's look at what I wrote, compared to what you think it said:
| we've grown beyond the evolutionary stage where all there is to text
| is understanding by humans.
how _did_ this get warped into "beyond the point where all text is
understandable by humans"? I may have thought you were a bad piece of
NLP software for not getting the "bill Gates" joke, but this is getting
to annoy me. do me a favor and expend the effort required to grasp the
meaning of what you respond to. if you don't grasp it, complain. if you
don't want to expend that effort either way, don't reply. OK?
it is _precisely_ "understandable by both" which is my point. your
stupid defense of the ancient ritual of upcasing the first letter of a
sentence gives me a clear indication that you are not ready to understand
what is involved, yet speak a lot about it.
that is, there is no need to agree with me, but there is even less need
to defend the status quo. fact is, I have come to see this argument as a
yardstick on whether people are able to understand that old habits are
just that: old habits, and that they need to think about why they want to
keep them. I say: _always_ question the status quo, learn how it came to
be, but do _not_ accept without understanding, and above all: _never_
defend without deep understanding of the alternatives as well.
#:Erik
I recommend the book Web Site Usability: A Designer's Guide, by Jared
M. Spool (principal investigator), Tara Scanlon, Will Schroeder, Carolyn
Snyder, and Terri DeAngelo, all of User Interface Engineering. ISBN
0-9660641-0-0. available from amazon.com. it's about how users behave
in response to various web designs that you'd _think_ would be just
great, but questions why you think so, and shows through serious empiric
evidence how and why we think wrong. (it also contains the comforting
line "No engineers were harmed in the production of this book" on the
colophon page. :)
| So really, I think that experience from the paper & natural language text
| world may be pretty irrelevant to things like reading source code.
as it turns out, it is very _hurtful_ to the users to assume similarity
between paper and screen, and between prose and code, because what's good
for the users is anathema to the previous-technology schools of design.
> | So really, I think that experience from the paper & natural language text
> | world may be pretty irrelevant to things like reading source code.
>
> as it turns out, it is very _hurtful_ to the users to assume similarity
> between paper and screen, and between prose and code, because what's good
> for the users is anathema to the previous-technology schools of design.
This is certainly true. A point I've made about copyright for example
is that in "creative writing", the goal is to make something
different; you get graded down in English class (oops, sorry, Erik, I
mean "the class one takes in school to study the nitpicky details of
one's mother tongue"--I wish there were a generic English term for
that that didn't use the word "English") for writing text that is the
same as what someone else did. By contrast, in a programming course,
the goal is normative behavior and it's not far from the truth to say
you get graded down if you write something which is not the same as
what your fellow classmates do. In this latter context, it's odd to
be asserting copyright (something that tries to defend against
duplication, though fortunately at least not through independent
creation) on programs. Programming design is only superficially the
same as creative writing, and applying old-world theories of use to it
leads to odd results.
But more generally, there's a meta-point to be made, too, which is
that it's hurtful to assume that there's a similarity between one
person's needs and another's unless there is a strong overriding
reason to force a similarity. Just because some people, perhaps many
people, have a certain preference, doesn't mean others do. Computers
force a great deal of our lives to be the same for practical reasons.
We're all speaking in English, for example. And while the non-English
folks in our midst don't complain about it a lot and even derive some
benefit from the "interchangeability" of English as a de facto
international language for technical exchange, we have to be careful
to stop at the point of "offering places to talk in English" and not
go so far as to insist that the statistical numbers mean everyone
should have to talk in English. We don't want to kill the variation
among us that makes life outside the computer interesting. Ditto with
case and fonts and a variety of other things. It's one thing to work
on ways for us to interchange what we do (since that's a necessity),
but it's quite another to make it hard for people to get by in the
privacy of their own coding. Tolerance takes work. But we should
remember to do that work, and to do it as our FIRST impulse, falling
back to intolerance only where tolerance is unworkable. Case-insensitive
languages are tolerant languages. Case-sensitive languages are not.
In a few cases, not surprisingly including those languages
devoted to introspecting and manipulating case itself, such as SGML
entity references, it doesn't suffice to have case-insensitivity
because you're trying to represent information about cased characters
and the easiest way to keep that mapping is to use case itself.
So ñ and Ñ (or &ntilde; and &Ntilde; for those
people whose news readers get confused about quoting levels in HTML
mail vs quoting levels in plain text mail) might be different
characters and SGML sacrifices "tolerance" to a specific highly
foreseeable need on the part of a large number of users. But for
ordinary programming languages, this kind of thing is rare, and more
tolerant approaches are, I think, morally superior. Not that I think
getting too into moralizing is itself morally superior... but so it
goes. Once in a while one has to take a stand.