char-upcase settable?

Erik Naggum

Aug 21, 1996, 3:00:00 AM

according to ANSI X3.226, a conforming program cannot add setf expanders to
symbols in the COMMON-LISP package (11.1.2.1.2). it also _seems_ that a
conforming implementation cannot add any that aren't specified by the
standard. (I haven't found a similarly strong clause in this case.)

however, with non-trivial (non-ASCII) character sets, a system needs to be
able to control the case conversion. this seems such an obvious thing that
I wonder why nobody thought of it. it is a piece of cake to modify a
system to offer a setf expander for char-upcase (etc), but my program will
be non-conforming if it does, and ugly if I have to use a different
function to do this.

should I care?

#\Erik
--
life is hard and then you post.

Howard R. Stearns

Aug 22, 1996, 3:00:00 AM

Here's what I think:

LAWYER'S ARGUMENT:
11.1.2.1.2 Constraints on the COMMON-LISP Package for Conforming
Programs
(for those who don't have it handy) states:
Except where explicitly allowed, the consequences are undefined if any
of the following actions are performed on an external symbol of the
COMMON-LISP package:
...
13. Defining a setf expander for it (via defsetf or
define-setf-method).
This means YOU can't define a (setf char-upcase) expander. A user must
be able to rely on the implementation doing the same.

TERRORIST'S ARGUMENT:
You can consider yourself to not be bound by these rules by calling
yourself an implementor, not a user. Since no conforming user can
define the expansion, there's no way that you doing so can screw
anybody, so go ahead.

HACKER'S (microsoft?) ARGUMENT:
Just because it's undefined doesn't mean you can't do it. If it doesn't
work in some situation, then you'll have to feature-qualify some of your
code for that platform.

ENGINEER'S ARGUMENT: (my personal opinion)
The symbol CHAR-CASE is NOT in the COMMON-LISP package. Design your
system to use (SETF CHAR-CASE), which takes an argument like :UPPER,
:LOWER, or NIL with the default being NIL. If, when all is said and
done, you find that you never had to use :LOWER as an argument AND the
rest of the world agrees AND the Common Lisp standard is amended to
allow (SETF CHAR-UPCASE), you (and your users) will only have to
globally replace CHAR-CASE with CHAR-UPCASE, because :UPPER is non-null.
The change will be compatible.

(At first blush, though, I don't see how you can implement BOTH-CASE-P
without specifying :lower for some characters.)
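A minimal sketch of the engineer's suggestion, under one reading of it: CHAR-CASE takes the character plus an optional case designator, and a NIL designator means "no conversion". The package name CASE-SKETCH and the two override tables are hypothetical, introduced only for illustration; nothing here touches a symbol in the COMMON-LISP package.

```lisp
;; Hypothetical package: CHAR-CASE is our own symbol, not one from COMMON-LISP.
(defpackage :case-sketch
  (:use :common-lisp)
  (:export #:char-case))
(in-package :case-sketch)

;; Per-direction override tables (assumed design); characters are EQL-comparable keys.
(defvar *upper-overrides* (make-hash-table))
(defvar *lower-overrides* (make-hash-table))

(defun char-case (char &optional case)
  "Return CHAR converted per CASE (:UPPER, :LOWER, or NIL for no conversion)."
  (ecase case
    (:upper (values (gethash char *upper-overrides* (char-upcase char))))
    (:lower (values (gethash char *lower-overrides* (char-downcase char))))
    ((nil) char)))

(defun (setf char-case) (new-char char &optional case)
  "Record NEW-CHAR as the CASE conversion of CHAR; CASE must be :UPPER or :LOWER."
  (ecase case
    (:upper (setf (gethash char *upper-overrides*) new-char))
    (:lower (setf (gethash char *lower-overrides*) new-char))))
```

With this in place, (setf (char-case #\a :upper) #\B) changes only the sketch's own table; CHAR-UPCASE itself is untouched and remains the fallback for characters with no override.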

COMPUTER SCIENTIST'S ARGUMENT:
???

----------------------------
I, too, am confused about what conforming implementations can do. For
example, can a conforming implementation add new &KEY arguments to ANY
Common Lisp function, or just the VERY few which are explicitly
mentioned in the spec? If &KEY arguments are added, must the "keys"
(i.e. argument names) be symbols in a package other than those specified
by the spec (i.e. not in COMMON-LISP or KEYWORD packages), or can they
be keywords?

My opinion is that &KEY arguments can safely be added to any function
(or special form) IFF the argument key is actually a symbol in an
implementation-specific package (i.e. not the keyword package).
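The mechanism for a non-keyword argument key already exists in ordinary lambda lists, so the idea can be sketched in user code. The package ACME-EXT, the key CASE-TABLE, and the function UPCASE-VIA are all hypothetical names; a real implementation would attach such a key to its own functions, which a conforming program cannot do for COMMON-LISP symbols.

```lisp
;; Hypothetical implementation-specific package that owns the argument name.
(defpackage :acme-ext
  (:use)
  (:export #:case-table))

;; The ((key var) default) form lets the key be any symbol, not just a keyword.
(defun upcase-via (char &key ((acme-ext:case-table table) nil))
  "Upcase CHAR, consulting the hash table passed under ACME-EXT:CASE-TABLE first."
  (or (and table (gethash char table))
      (char-upcase char)))
```

A call then reads (upcase-via #\a 'acme-ext:case-table some-table), so a search for the string "acme-ext" finds every use of the extension.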

My guiding rationale for all the issues here is that someone should be
able to locate ALL implementation-specific extensions in user code by
simply searching for the implementation-specific package name. (If the
implementation-specific package name shows up in a package manipulating
form, then of course, one might have to then search for individual
symbols, but at least the set of these individual symbols can be located
by searching for the package name. This strategy can still be broken by
finding packages named by strings which are constructed at run-time, and
although strange, it does mean there is a "security weakness" in using
this strategy to VALIDATE that code is conforming.)

Barry Margolin

Aug 22, 1996, 3:00:00 AM

In article <30496305...@arcana.naggum.no>,

Erik Naggum <er...@naggum.no> wrote:
>according to ANSI X3.226, a conforming program cannot add setf expanders to
>symbols in the COMMON-LISP package (11.1.2.1.2). it also _seems_ that a
>conforming implementation cannot add any that aren't specified by the
>standard. (I haven't found a similarly strong clause in this case.)

Neither have I, so I think it's allowed; the only things an implementation
is prohibited from adding to symbols in the COMMON-LISP package are
function, macro, special operator, global variable, and type definitions.
Of course, a conforming program cannot *use* a setf expansion of a
COMMON-LISP symbol that isn't defined in the standard.

>however, with non-trivial (non-ASCII) character sets, a system needs to be
>able to control the case conversion. this seems such an obvious thing that
>I wonder why nobody thought of it. it is a piece of cake to modify a
>system to offer a setf expander for char-upcase (etc), but my program will
>be non-conforming if it does, and ugly if I have to use a different
>function to do this.

It seems to me that the correspondence between characters in the two cases
is an attribute of a particular character repertoire, not something that a
program would need to set dynamically. I'm not sure what you expect
someone to do with such a capability; something like this:

(setf (char-upcase #\a) #\B)

?
--
Barry Margolin
BBN Planet, Cambridge, MA
bar...@bbnplanet.com - Phone (617) 873-3126 - Fax (617) 873-6351
(BBN customers, please call (800) 632-7638 option 1 for support)

Erik Naggum

Aug 23, 1996, 3:00:00 AM

Barry Margolin correctly observes that case correspondence is an attribute
of a particular character repertoire, but support for multiple repertoires
(at run-time) will have to handle dynamic changes to the mapping for the
non-ASCII characters of, e.g., ISO 8859-1 and ISO 8859-6. Lisp systems
that do not support more than ASCII may still want to allow the programmer
or user(!) to employ a larger character set as the default character set;
at one level or another, the case mappings will need to be changed. seeing
no other means to change the case-mapping tables, `(setf char-upcase)' was
my primary choice, and thus the question of conformance arose.

(ANSI X3.226 clause 13.1.10 Documentation of Implementation-Defined Scripts
goes to great lengths to define what a system must document that it does
with additional scripts. however, this is all `implementation-defined' and
would require `char-upcase' and friends to accept subtypes of character,
which they are not required to do (that's a problem). this _sounds_ good,
if implemented, but since one may want to use a script different from ASCII
as the primary character set, a system must have thought about this _very_
carefully to get it right. the basics are right in Common Lisp (unlike all
other language standards I know), but without sufficient information to get
it right in practice. I'm therefore trying to get by with insufficient
implementations. if anybody has given thought to subclassing the type
CHARACTER, I'd love to talk with them.)

#\Erik
--
and there I was, barking up the wrong cons

Barry Margolin

Aug 23, 1996, 3:00:00 AM

In article <30498010...@arcana.naggum.no>,

Erik Naggum <er...@naggum.no> wrote:
>Barry Margolin correctly observes that case correspondence is an attribute
>of a particular character repertoire, but support for multiple repertoires
>(at run-time) will have to handle dynamic changes to the mapping for the
>non-ASCII characters of, e.g., ISO 8859-1 and ISO 8859-6. Lisp systems
>that do not support more than ASCII may still want to allow the programmer
>or user(!) to employ a larger character set as the default character set;
>at one level or another, the case mappings will need to be changed. seeing
>no other means to change the case-mapping tables, `(setf char-upcase)' was
>my primary choice, and thus the question of conformance arose.

My other objection to (setf char-upcase) is that (setf (<function>
<constant>) <value>) seems conceptually wrong to me. This would be like
allowing (setf (1+ 2) 4)!

I agree that this is a thorny issue. The real problem is that the value of
CHAR-UPCASE is dependent upon more than just the character, but also the
repertoire being used. A character may belong to multiple repertoires, and
it might have different case mappings in them; for instance, in Canada I
don't think they put accents on uppercase letters, but in France they do.
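One way to make that dependence explicit is to pass the repertoire as an argument instead of mutating a global mapping. The following is only a sketch: the names DEFINE-UPCASE and UPCASE-IN, the repertoire designators, and the table layout are all hypothetical, the France/Canada difference is taken at face value from the paragraph above, and the character codes assume an implementation whose CODE-CHAR follows Latin-1 or Unicode numbering.

```lisp
;; Maps a repertoire designator to a table of lowercase -> uppercase overrides.
(defvar *repertoire-upcase* (make-hash-table))

(defun define-upcase (repertoire lower upper)
  "Record that LOWER upcases to UPPER within REPERTOIRE."
  (let ((table (or (gethash repertoire *repertoire-upcase*)
                   (setf (gethash repertoire *repertoire-upcase*)
                         (make-hash-table)))))
    (setf (gethash lower table) upper)))

(defun upcase-in (repertoire char)
  "Upcase CHAR under REPERTOIRE, falling back to CHAR-UPCASE when no override exists."
  (let ((table (gethash repertoire *repertoire-upcase*)))
    (or (and table (gethash char table))
        (char-upcase char))))

;; Assumption: code 233 is e-acute and code 201 is E-acute on this implementation.
(define-upcase :french (code-char 233) (code-char 201))
(define-upcase :canadian-french (code-char 233) #\E)
```

The same character then upcases differently depending on the first argument, and no SETF on a standard function is involved.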

>(ANSI X3.226 clause 13.1.10 Documentation of Implementation-Defined Scripts
>goes to great lengths to define what a system must document that it does
>with additional scripts. however, this is all `implementation-defined' and
>would require `char-upcase' and friends to accept subtypes of character,
>which they are not required to do (that's a problem).

I believe they *are* required to accept subtypes of character. Whenever a
function is defined to accept an argument of type T, it must also accept
all subtypes of T. What's implementation-defined is the value of the
functions, but they must accept them.
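The subtype relationship can be checked with the standard character types; this small sketch only uses types the standard defines, since implementation-defined subtypes cannot be shown portably. The function name UPCASE-STANDARD is hypothetical.

```lisp
;; STANDARD-CHAR is a subtype of BASE-CHAR, which is a subtype of CHARACTER.
(defun character-subtype-chain-p ()
  (and (subtypep 'standard-char 'base-char)
       (subtypep 'base-char 'character)))

;; A function specified to take a CHARACTER is handed a STANDARD-CHAR.
(defun upcase-standard (c)
  (check-type c standard-char)
  (char-upcase c))
```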

Erik Naggum

Aug 24, 1996, 3:00:00 AM

Barry Margolin posts an objection to a function `(setf char-upcase)' that
it would be like `(setf 1+)'. point taken, but I see it much more as the
equivalent of `(setf aref)', which can also take a constant index and
modify it. a function `(setf char-upcase)' may not be the best solution,
but as I indicated previously, the lack of standardized means to define
your own repertoires, encodings and external-format, makes the whole
business of support for international character sets stand on somewhat
loose ground. Barry also points out that upcasing and downcasing is
dependent on the "locale", for lack of a better term, and I must in turn
point out that although, e.g., ANSI/ISO C, has done a bad job at tackling
this issue, it seems to be done right in ANSI Common Lisp, but not taken
far enough to be expected to be useful.

but enough of this, I'll have to think up a more general solution to this
now that I know what the standard says on setf methods. thanks, Barry!

#\Erik
--
my other car is a cdr

Barry Margolin

Aug 25, 1996, 3:00:00 AM

In article <30499151...@arcana.naggum.no>,

Erik Naggum <er...@naggum.no> wrote:
>Barry Margolin posts an objection to a function `(setf char-upcase)' that
>it would be like `(setf 1+)'. point taken, but I see it much more as the
>equivalent of `(setf aref)', which can also take a constant index and
>modify it.

I don't agree with the analogy. (setf aref) takes a non-constant array
element (identified by an array and an index) and modifies it; no one
thinks of (setf (aref foo 1) val) as modifying an attribute of the number
1. The general idea is that the first argument to SETF is conceptually a
"container", and it changes what it contains. It's not used to update
functional mappings.

Notice, for instance, that changes to reader macros are *not* done using
(setf get-macro-character), but rather with set-macro-character.
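The two styles side by side, as a small sketch. The function names are hypothetical; the readtable change is made on a private copy so the global reader is left alone.

```lisp
;; SETF on a place: the array is the container, the index picks a slot in it.
(defun demo-container ()
  (let ((foo (vector 10 20 30)))
    (setf (aref foo 1) 99)
    foo))

;; Updating a mapping through a dedicated setter function instead of SETF.
(defun demo-reader-mapping ()
  (let ((*readtable* (copy-readtable nil)))   ; fresh copy of the standard readtable
    (set-macro-character #\! (lambda (stream char)
                               (declare (ignore stream char))
                               :bang))
    (values (read-from-string "!"))))
```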

Riesbeck

Aug 26, 1996, 3:00:00 AM

In article <4vr6q8$r...@tools.bbnplanet.com>, bar...@tools.bbnplanet.com
(Barry Margolin) wrote:

> ...The general idea is that the first argument to SETF is conceptually a
> "container", and it changes what it contains. It's not used to update
> functional mappings.
>
> Notice, for instance, that changes to reader macros are *not* done using
> (setf get-macro-character), but rather with set-macro-character.

I take it you're not a fan of DEFTABLE, then, where something like

(deftable get-foo)

defines a function get-foo that takes a key and returns a value,
and (setf (get-foo key) value) stores a value for a key.
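For readers who have not seen such a macro, here is a minimal sketch of what a DEFTABLE with that behaviour could look like. It is an illustration written for this purpose, not the actual DEFTABLE source, and the choice of an EQUAL hash table is an assumption.

```lisp
(defmacro deftable (name)
  "Define NAME as a one-argument lookup function plus a matching (SETF NAME)."
  (let ((table (gensym "TABLE")))
    `(let ((,table (make-hash-table :test #'equal)))  ; private to this table
       (defun ,name (key)
         (values (gethash key ,table)))
       (defun (setf ,name) (value key)
         (setf (gethash key ,table) value))
       ',name)))

(deftable get-foo)
(setf (get-foo "key") 42)
```

After this, (get-foo "key") looks the value up again, and an unknown key yields NIL.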

--
-- Chris

Jeff Dalton

Aug 29, 1996, 3:00:00 AM

In article <4vlfi2$p...@tools.bbnplanet.com> bar...@tools.bbnplanet.com (Barry Margolin) writes:
>
>My other objection to (setf char-upcase) is that (setf (<function>
><constant>) <value>) seems conceptually wrong to me. This would be like
>allowing (setf (1+ 2) 4)!

What's wrong with (setf (<fn> <const>) <value>)? Do you also object
to (<fn> <const> <const>), as in e.g. (get 'a 'b)? And how about
setf of symbol-value, symbol-function, and symbol-plist? How are
they conceptually wrong?
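Concretely, these standard forms all have the (setf (<fn> <const>) <value>) shape. The symbol JD-DEMO is a hypothetical name used only for this sketch.

```lisp
;; Property-list entry keyed by two constant symbols.
(setf (get 'a 'b) 1)

;; Global value cell of a constant symbol.
(setf (symbol-value 'jd-demo) 42)

;; Whole property list of a constant symbol.
(setf (symbol-plist 'jd-demo) (list 'color 'red))
```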

-- jd