
strings and characters


Tim Bradshaw

Mar 15, 2000
I've managed to avoid worrying about characters and strings and all
the related horrors so far, but I've finally been forced into having
to care about them.

The particular thing I don't understand is what type a literal string
has. It looks at first sight as if it should be something capable of
holding any CHARACTER, but I'm not really sure if that's right. It
looks to me as if it might be possible read things such that it's OK
to return something that can only hold a subtype of CHARACTER in some
cases.

I'm actually more concerned with the flip side of this -- what if
almost all the time I get some `good' subtype of CHARACTER (probably
BASE-CHAR?) but sometimes I get some ginormous multibyte unicode thing
or something? I have to deal with some C code which is blithely
assuming that unsigned chars are just small integers and strings are
arrays of small integers and so on in the usual C way, and I'm not sure
that I can trust my strings to be the same as its strings.

I realise that people who care about character issues are probably
laughing at me at this point, but my main aim is to keep everything as
simple as I can, and especially I don't want to have to keep copying
my strings into arrays of small integers (which I was doing at one
point, but it's too hairy).

The practical question I guess is -- are there any implementations
which do currently have really big characters in strings? Genera
seems to, but that's of limited interest. CLISP seems to have
internationalisation stuff in it, and I know there's an international
Allegro, so those might have horrors in them.

Thanks for any advice.

--tim `7 bit ASCII was good enough for my father and it's good enough
for me' Bradshaw.

Barry Margolin

Mar 15, 2000
In article <ey3hfe7...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:
>I realise that people who care about character issues are probably
>laughing at me at this point, but my main aim is to keep everything as
>simple as I can, and especially I don't want to have to keep copying
>my strings into arrays of small integers (which I was doing at one
>point, but it's too hairy).

You can call ARRAY-ELEMENT-TYPE on the string to find out if it contains
anything weird. If it's compatible with your foreign function's API,
then you don't need to copy it.
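
A minimal sketch of that check in portable Common Lisp; the helper name
is made up here, and treating BASE-CHAR as "compatible with an 8-bit C
API" is an assumption about the implementation, not a guarantee:

(defun element-type-compatible-p (string)
  ;; ARRAY-ELEMENT-TYPE reports the string's declared element type;
  ;; if it is no wider than BASE-CHAR, nothing "weird" can be in there.
  (subtypep (array-element-type string) 'base-char))

;; (array-element-type "hello")          ; e.g. BASE-CHAR or CHARACTER,
;;                                       ; depending on the implementation
;; (element-type-compatible-p "hello")   ; T or NIL accordingly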

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Erik Naggum

Mar 16, 2000
* Tim Bradshaw <t...@cley.com>

| The particular thing I don't understand is what type a literal string
| has. It looks at first sight as if it should be something capable of
| holding any CHARACTER, but I'm not really sure if that's right. It looks
| to me as if it might be possible read things such that it's OK to return
| something that can only hold a subtype of CHARACTER in some cases.

strings _always_ contain a subtype of character. e.g., an implementation
that supports bits will have to discard them from strings. the only
array type that can contain all character objects has element-type t.

| I'm actually more concerned with the flip side of this -- what if
| almost all the time I get some `good' subtype of CHARACTER (probably
| BASE-CHAR?) but sometimes I get some ginormous multibyte unicode thing
| or something? I have to deal with some C code which is blithely
| assuming that unsigned chars are just small integers and strings are
| arrays of small integers and so on in the usual C way, and I'm not
| sure that I can trust my strings to be the same as its strings.

this is not a string issue, it's an FFI issue. if you tell your FFI that
you want to ship a string to a C function, it should do the conversion
for you if it needs to be performed. if you can't trust your FFI to do
the necessary conversions, you need a better FFI.

| I realise that people who care about character issues are probably
| laughing at me at this point, but my main aim is to keep everything as
| simple as I can, and especially I don't want to have to keep copying my
| strings into arrays of small integers (which I was doing at one point,
| but it's too hairy).

if you worry about these things, your life is already _way_ more complex
than it needs to be. a string is a string. each element of the string
is a character. stop worrying beyond this point. C and Common Lisp
agree on this fundamental belief, believe it or not. your _quality_
Common Lisp implementation will ensure that whatever invariants apply
are maintained in _each_ environment.

| The practical question I guess is -- are there any implementations which
| do currently have really big characters in strings?

yes, and not only that -- it's vitally important that strings take up no
more space than they need. a system that doesn't support both
base-string (of base-char) and string (of extended-char) when it attempts
to support Unicode will fail in the market -- Europe and the U.S. simply
can't tolerate the huge growth in memory consumption from wantonly using
twice as much as you need. Unicode even comes with a very intelligent
compression technique because people realize that it's a waste of space
to use 16 bits and more for characters in a given character set group.

| I know there's an international Allegro, so those might have horrors in
| them.

sure, but in the same vein, it might also have responsible, intelligent
people behind it, not neurotics who fail to realize that customers have
requirements that _must_ be resolved. Allegro CL's international version
deals very well with conversion between the native system strings and its
internal strings. I know -- not only do I run the International version
in a test environment that needs wide characters _internally_, the test
environment can't handle Unicode or anything else wide at all, and it's
never been a problem.

incidentally, I don't see this as any different from whether you have a
simple-base-string, a simple-string, a base-string, or a string. if you
_have_ to worry, you should be the vendor or implementor of strings, not
the user. if you are the user and worry, you either have a problem that
you need to take up with your friendly programmer-savvy shrink, or you
call your vendor and ask for support. I don't see this as any different
from whether an array has a fill-pointer or not, either. if you hand it
to your friendly FFI and you worry about the length of the array with or
without fill-pointer, you're simply worrying too much, or you have a bug
that needs to be fixed.

"might have horrors"! what's next? monster strings under your bed?

#:Erik

Pekka P. Pirinen

Mar 16, 2000
Erik is basically right that you shouldn't have to worry, unless
you're specifically writing localized applications. A string will
hold a character, and the FFI will convert if it can. The details of
how things should work with multiple string types have not been worked
out in the standard, so if you do want more control, it's
non-portable.

Erik Naggum <er...@naggum.no> writes:


> Tim Bradshaw writes:
> | The particular thing I don't understand is what type a literal string
> | has. It looks at first sight as if it should be something capable of
> | holding any CHARACTER, but I'm not really sure if that's right. It looks
> | to me as if it might be possible read things such that it's OK to return
> | something that can only hold a subtype of CHARACTER in some cases.
>
> strings _always_ contain a subtype of character. e.g., an implementation
> that supports bits will have to discard them from strings. the only
> array type that can contain all character objects has element-type t.

If only it were so! Unfortunately, the standard says characters with
bits are of type CHARACTER and STRING = (VECTOR CHARACTER). Harlequin
didn't have the guts to stop supporting them (even though there's a
separate internal representation for keystroke events, now). I guess
Franz did?

However, it's rarely necessary to create strings out of them, and it's
easy to configure LispWorks so that never happens. Basically, there's
a variable called *DEFAULT-CHARACTER-ELEMENT-TYPE* that is the default
character type for all string constructors. That includes the
reader's "-syntax that Tim Bradshaw was worrying about. The reader
will actually silently construct wider strings if it sees a character
that is not in *D-C-E-T*; it's just the default. (Note that if you're
reading from a stream, you have to consider the external format on the
stream first.)
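
To make that concrete, here is a LispWorks-only illustration; the LW
package prefix on the variable is an assumption based on the description
above, so check the LispWorks documentation for the exported name:

#+lispworks
(defun show-string-defaults ()
  ;; Report the default element type for new strings alongside what the
  ;; reader produced for a plain literal and what MAKE-STRING produces
  ;; when asked for full CHARACTER elements.
  (list :default-element-type lw:*default-character-element-type*
        :ascii-literal        (array-element-type "plain ascii")
        :full-character       (array-element-type
                               (make-string 1 :element-type 'character))))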

> | The practical question I guess is -- are there any implementations which
> | do currently have really big characters in strings?

Allegro and LispWorks, at least. Both will use thin strings where possible
(but in slightly different ways).
--
Pekka P. Pirinen
A feature is a bug with seniority. - David Aldred <david_aldred.demon.co.uk>

Tim Bradshaw

Mar 16, 2000
* Erik Naggum wrote:
> strings _always_ contain a subtype of character. e.g., an implementation
> that supports bits will have to discard them from strings. the only
> array type that can contain all character objects has element-type
> t.

I don't think this is right -- rather I agree that they contain
CHARACTERs, but it looks like `bits' -- which I think now are
`implementation-defined attributes' -- can end up in strings, or at
least it is implementation-defined whether they do or not (2.4.5 says
this I think).

> this is not a string issue, it's an FFI issue. if you tell your FFI that
> you want to ship a string to a C function, it should do the conversion
> for you if it needs to be performed. if you can't trust your FFI to do
> the necessary conversions, you need a better FFI.

Unfortunately my FFI is READ-SEQUENCE and WRITE-SEQUENCE, and at the
far end of this is something which is defined in terms of treating
characters as fixed-size (8 bit) small integers. And I can't change
it because it's big important open source software and lots of people
have it, and it's written in C so it's too hard to change anyway... So
I need to be sure that nothing I can do is going to start spitting
unicode or something at it.

At one point I did this by converting my strings to arrays of
(UNSIGNED-BYTE 8)s on I/O, but that was stressful to do for various
reasons.

In *practice* this has not turned out to be a problem but it's not
clear what I need to check to make sure it is not. I suspect that
checking that CHAR-CODE is always suitably small would be a good
start.
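
A sketch of that check; 256 as the limit is an assumption about what the
8-bit far end accepts, not anything the standard fixes:

(defun codes-small-enough-p (string &optional (limit 256))
  ;; True if every character in STRING has a CHAR-CODE below LIMIT.
  (every (lambda (ch) (< (char-code ch) limit)) string))

;; (codes-small-enough-p "hello")  =>  T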

> if you worry about these things, your life is already _way_ more complex
> than it needs to be. a string is a string. each element of the string
> is a character.

Well, the whole problem is that at the far end that's not true. Each
element (they've decided!) is an *8-bit* character...

> yes, and not only that -- it's vitally important that strings take up no
> more space than they need. a system that doesn't support both
> base-string (of base-char) and string (of extended-char) when it attempts
> to support Unicode will fail in the market -- Europe and the U.S. simply
> can't tolerate the huge growth in memory consumption from wantonly using
> twice as much as you need. Unicode even comes with a very intelligent
> compression technique because people realize that it's a waste of space
> to use 16 bits and more for characters in a given character set group.

For what it's worth I think this is wrong (but I could be wrong of
course, and anyway it's not worth arguing over). People *happily*
tolerate doublings of memory & disk consumption if it suits them --
look at windows 3.x to 95, or sunos 5.5 to 5.7, or any successive pair
of xemacs versions ... And they're *right* to do that because Moore's
law works really well. Using compressed representations makes things
more complex -- if strings are arrays, then aref &c need to have hairy
special cases, and everything else gets more complex, and that
complexity never goes away, whereas doubled-storage costs go away in
about a year.

So I think that in a few years compressed representations will look
like the various memory-remapping tricks that DOS did, or the similar
things people now do with 32 bit machines to deal with really big
databases (and push, incredibly, as `the right thing', I guess because
they worship intel and intel are not doing too well with their 64bit
offering). The only place it will matter is network transmission of
data, and I don't see why normal compression techniques shouldn't deal
with that.

So my story is if you want characters twice as big, just have big
characters and use more memory and disk -- it's cheap enough now that
it's dwarfed by labour costs and in a year it will be half the price.

On the other hand, people really like complex fiddly solutions to
things (look at C++!), so that would argue that complex character
compression techniques are here to stay.

Anyway, like I said it's not worth arguing over. Time will tell.

--tim


Tim Bradshaw

Mar 16, 2000
* I wrote:

> For what it's worth I think this is wrong (but I could be wrong of
> course, and anyway it's not worth arguing over).

Incidentally I should make this clearer, as it looks like I'm arguing
against fat strings. Supporting several kinds of strings is
*obviously* sensible, I quibble about the compressing stuff being
worth it.

--tim


Erik Naggum

Mar 16, 2000
* Erik Naggum

| strings _always_ contain a subtype of character. e.g., an implementation
| that supports bits will have to discard them from strings. the only
| array type that can contain all character objects has element-type t.

* Tim Bradshaw


| I don't think this is right -- rather I agree that they contain
| CHARACTERs, but it looks like `bits' -- which I think now are
| `implementation-defined attributes' -- can end up in strings, or at least
| it is implementation-defined whether they do or not (2.4.5 says this I
| think).

trivially, "strings _always_ contain a subtype of character" must be true
as character is a subtype of character, but I did mean in the sense that
strings _don't_ contain full character objects, despite the relegation of
fonts and bits to "implementation-defined attributes". that the type
string-char was removed from the language but the attributes were sort of
retained is perhaps confusing, but it is quite unambiguous as to intent.

so must "the only array type that can contain all character objects has
element-type t" be true, since a string is allowed to contain a subtype
of type character. (16.1.2 is pertinent in this regard.) it may come as
a surprise to people, but if you store a random character object into a
string, you're not guaranteed that what you get back is eql to what you
put into it.

furthermore, there is no print syntax for implementation-defined
attributes in strings, and no implementation is allowed to add any. it
is perhaps not obvious, but the retention of attributes is restricted by
_both_ the string type's capabilities and the stream type's capabilities.

you can quibble with the standard all you like -- you aren't going to see
any implementation-defined attributes in string literals. if you compare
with CLtL1 and its explicit support for string-char which didn't support
them at all, you must realize that in order to _have_ any support for
implementation-defined attributes, you have to _add_ it above and beyond
what strings did in CLtL1. this is an extremely unlikely addition to an
implementation just after bits and fonts were removed from the language
and relegated to "implementation-defined attributes".

I think the rest of your paranoid conspiratorial delusions about what
"horrors" might afflict Common Lisp implementations are equally lacking
in merit. like, nothing is going to start spitting Unicode at you, Tim.
not until and unless you ask for it. it's called "responsible vendors".

| The only place it will matter is network transmission of data, and I
| don't see why normal compression techniques shouldn't deal with that.

then read the technical report and decrease your ignorance. sheesh.

#:Erik, who's actually quite disappointed, now.

Erik Naggum

Mar 16, 2000
* Tim Bradshaw <t...@cley.com>

| Incidentally I should make this clearer, as it looks like I'm arguing
| against fat strings. Supporting several kinds of strings is *obviously*
| sensible, I quibble about the compressing stuff being worth it.

compressing strings for in-memory representation of _arrays_ is nuts.
nobody has proposed it, and nobody ever will. again, read the Unicode
technical report and decrease both your fear and your ignorance.

#:Erik

Barry Margolin

Mar 16, 2000
In article <31622236...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> so must "the only array type that can contain all character objects has
> element-type t" be true, since a string is allowed to contain a subtype
> of type character. (16.1.2 is pertinent in this regard.) it may come as
> a surprise to people, but if you store a random character object into a
> string, you're not guaranteed that what you get back is eql to what you
> put into it.

Isn't (array character (*)) able to contain all character objects?

Tim Bradshaw

Mar 16, 2000
* Erik Naggum wrote:

> I think the rest of your paranoid conspiratorial delusions about what
> "horrors" might afflict Common Lisp implementations are equally lacking
> in merit. like, nothing is going to start spitting Unicode at you, Tim.
> not until and unless you ask for it. it's called "responsible
> vendors".

If my code gets a string (from wherever, the user if you like) which
has bigger-than-8-bit characters in it, and then tries to send it down
the wire, what will happen? I don't see this as a vendor issue, but
perhaps I'm wrong.

Meantime I'm going to put in some optional checks to make sure that
all my character codes are small enough.

--tim


Erik Naggum

Mar 16, 2000
* Barry Margolin <bar...@bbnplanet.com>

| Isn't (array character (*)) able to contain all character objects?

no. specialized vectors whose elements are of type character (strings)
are allowed to store only values of a subtype of type character. this is
so consistently repeated in the standard and so unique to strings that
I'm frankly amazed that anyone who has worked on the standard is having
such a hard time accepting it. it was obviously intended to let strings
be as efficient as the old string-char concept allowed, while not denying
implementations the ability to retain bits and fonts if they so chose.

an implementation that stores characters in strings as if they have null
implementation-defined attributes regardless of their actual attributes
is actually fully conforming to the standard. the result is that you
can't expect any attributes to survive string storage. the consequences
are _undefined_ if you attempt to store a character with attributes in a
string that can't handle it.

the removal of the type string-char is the key to understanding this.

#:Erik

Pekka P. Pirinen

Mar 17, 2000
Erik Naggum <er...@naggum.no> writes:
> * Barry Margolin <bar...@bbnplanet.com>
> | Isn't (array character (*)) able to contain all character objects?
>
> no. specialized vectors whose elements are of type character (strings)
> are allowed to store only values of a subtype of type character. this is
> so consistently repeated in the standard and so unique to strings that
> I'm frankly amazed that anyone who has worked on the standard is having
> such a hard time accepting it.

Who replaced #:Erik with a bad imitation? This one's got all the
belligerence, but not the insight we've come to expect.

You've read a different standard than I, since many places actually
say "of type CHARACTER or a subtype" -- superfluously, since the
glossary entry for "subtype" says "Every type is a subtype of itself."
When I was designing the "fat character" support for LispWorks, I
looked for a get-out clause, and it's not there.

> the consequences are _undefined_ if you attempt to store a
> character with attributes in a string that can't handle it.

This is true. It's also true of all the other specialized arrays,
although different language ("must be") is used to specify that.

> the removal of the type string-char is the key to understanding this.

I suspect it was removed because it was realized that there would have
to be many types of STRING (at least 8-bit and 16-bit), and hence
there wasn't a single subtype of CHARACTER that would be associated
with strings. Whatever the reason, we can only go by what the
standard says.

I think it was a good choice, and LW specifically didn't retain the
type, to force programmers to consider what the code actually meant by
it (and to allow them to DEFTYPE it to the right thing).
Nevertheless, there should be a standard name for the type of simple
characters, i.e., with null implementation-defined attributes.
LispWorks and Liquid use LW:SIMPLE-CHAR for this.
--
Pekka P. Pirinen, Harlequin Limited
The Risks of Electronic Communication
http://www.best.com/~thvv/emailbad.html

Erik Naggum

Mar 17, 2000
* Pekka P. Pirinen

| Who replaced #:Erik with a bad imitation?

geez...

| You've read a different standard than I, since many places actually say
| "of type CHARACTER or a subtype" -- superfluously, since the glossary
| entry for "subtype" says "Every type is a subtype of itself."

sigh. this is so incredibly silly it isn't worth responding to.

| I suspect it was removed because it was realized that there would have to
| be many types of STRING (at least 8-byte and 16-byte), and hence there
| wasn't a single subtype of CHARACTER that would be associated with
| strings. Whatever the reason, we can only go by what the standard says.

the STRING type is a union type, and there are no other union types in
Common Lisp. this should give you a pretty powerful hint, if you can get
away from your "bad imitation" attitude problem and actually listen, but
I guess that is not very likely at this time.

#:Erik

Pekka P. Pirinen

Mar 17, 2000
Tim Bradshaw <t...@cley.com> writes:

> * Erik Naggum wrote:
> > this is not a string issue, it's an FFI issue. if you tell your FFI that
> > you want to ship a string to a C function, it should do the conversion
> > for you if it needs to be performed. if you can't trust your FFI to do
> > the necessary conversions, you need a better FFI.
>
> Unfortunately my FFI is READ-SEQUENCE and WRITE-SEQUENCE, and at the
> far end of this is something which is defined in terms of treating
> characters as fixed-size (8 bit) small integers.

You still need a better FFI: WRITE-SEQUENCE is just as much a foreign
interface as any. In theory, you specify the representation on the
other side by the external format of the stream. If the system
doesn't have an external format that can do this, then you're reduced
to hacking it.

> In *practice* this has not turned out to be a problem but it's not
> clear what I need to check to make sure it is not. I suspect that
> checking that CHAR-CODE is always suitably small would be a good
> start.

In practice, most of us can pretend there's no encoding except ASCII.
If you expect non-ASCII characters on the Lisp side, you need to know
what the encoding is on the other side, otherwise it might come out
wrong.

It might be enough to check the type of your strings (and perhaps the
external format of the stream), instead of every character.
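
A cheap pre-flight check along those lines might look like the sketch
below; the set of external-format names is implementation-specific, and
reading BASE-CHAR as "8-bit" is itself an assumption:

(defun cheap-precheck-p (string stream)
  (and (subtypep (array-element-type string) 'base-char)
       ;; The external format names here are only examples; real
       ;; implementations return their own designators.
       (member (stream-external-format stream)
               '(:default :latin-1 :ascii))))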


--
Pekka P. Pirinen, Harlequin Limited

Technology isn't just putting in the fastest processor and most RAM -- that's
packaging. - Steve Wozniak

Barry Margolin

Mar 17, 2000
In article <31622323...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* Barry Margolin <bar...@bbnplanet.com>
>| Isn't (array character (*)) able to contain all character objects?
>
> no. specialized vectors whose elements are of type character (strings)
> are allowed to store only values of a subtype of type character.

You seem to be answering a different question than I asked. I didn't say
"Aren't all strings of type (array character (*))?".

I realize that there are string types that are not (array character (*)),
because a string can be of any array type where the element type is a
subtype of character. But if you want a type that can hold any character,
you can create it with:

(make-string length :element-type 'character)

In fact, you don't even need the :ELEMENT-TYPE option, because CHARACTER is
the default.

Tim Bradshaw

Mar 17, 2000
* Pekka P Pirinen wrote:

> You still need a better FFI: WRITE-SEQUENCE is just as much a foreign
> interface as any.

Yes, in fact it's worse than most, because I can't rely on the
vendor/implementor to address the issues for me!

> In theory, you specify the representation on the other side by the
> external format of the stream. If the system doesn't have an
> external format that can do this, then you're reduced to hacking it.

Right. And I'm happy to do this -- what I was asking was how I can
ensure that the stream really is producing plain 8-bit output.

> In practice, most of us can pretend there's no encoding except ASCII.
> If you expect non-ASCII characters on the Lisp side, you need to know
> what the encoding is on the other side, otherwise it might come out
> wrong.

Yes. And the problem is that since my stuff is a low-level facility
which others (I hope) will build on, I don't really know what they
will throw at me. And I don't want to check every character of the
strings as this causes severe reduction in maximum performance (though
I haven't spent a lot of time checking that the checker compiles
really well yet, and in practice it will almost always be throttled
elsewhere).

> It might be enough to check the type of your strings (and perhaps the
> external format of the stream), instead of every character.

My hope is that BASE-STRING is good enough, but I'm not sure (I don't
see that a BASE-STRING could not have more than 8-bit characters, if
an implementation chose to have only one string type for instance (can
it?)). Checking the external format of the stream is also obviously
needed but if it's :DEFAULT does that tell me anything, and if it's
not I have to special case anyway.

Obviously at some level I have to just have implementation-dependent
checks because I don't think it says anywhere that characters are at n
bits or any of that kind of grut (which is fine). Or I could just not
care and pretend everything is 8-bit which will work for a while I
guess.

Is there a useful, fast check that (write-sequence x y) will
write (length x) bytes on y if all is well for LispWorks / Liquid (I
don't have a license for these, unfortunately)?

Thanks

--tim

Erik Naggum

Mar 17, 2000
* Tim Bradshaw <t...@cley.com>

| Is there a useful, fast check that (write-sequence x y) will write
| (length x) bytes on y if all is well for LispWorks / Liquid ...?

yes. make the buffer and the stream have type (unsigned-byte 8), and
avoid the character abstraction which you obviously can't trust, anyway.

#:Erik

Erik Naggum

Mar 17, 2000
* Barry Margolin <bar...@bbnplanet.com>

| But if you want a type that can hold any character, you can create it with:
|
| (make-string length :element-type 'character)

no, and that's the crux of the matter. this used to be different from

(make-string length :element-type 'string-char)

in precisely the capacity that you wish is still true, but it isn't.
when the type string-char was removed, character assumed its role in
specialized arrays, and you could not store bits and fonts in strings any
more than you could with string-char. to do that, you need arrays with
element-type t.

but I'm glad we've reached the point where you assert a positive, because
your claim is what I've been trying to tell you guys DOES NOT HOLD. my
claim is: there is nothing in the standard that _requires_ that there be
a specialized array with elements that are subtypes of character (i.e., a
member of the union type "string") that can hold _all_ character objects.

can you show me where the _standard_ supports your claim?

| In fact, you don't even need the :ELEMENT-TYPE option, because CHARACTER is
| the default.

sure. however, I'm trying to penetrate the armor-plated belief that the
resulting string is REQUIRED to retain non-null implementation-defined
attributes if stored into it. no such requirement exists: a conforming
implementation is completely free to provide a single string type that is
able to hold only simple characters. you may think this is a mistake in
the standard, but it's exactly what it says, after the type string-char
was removed.

methinks you're stuck in CLtL1 days, Barry, and so is this bad imitation
jerk from Harlequin, but that's much less surprising.

#:Erik

Tim Bradshaw

Mar 17, 2000

Which is precisely what I want to avoid unfortunately, as it means
that either this code or the code that calls it has to deal with the
issue of copying strings to and from arrays of (UNSIGNED-BYTE 8)s,
which simply brings back the same problem somewhere else.

(My first implementation did exactly this in fact)

--tim

Barry Margolin

Mar 17, 2000
In article <31623029...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* Barry Margolin <bar...@bbnplanet.com>
>| But if you want a type that can hold any character, you can create it with:
>|
>| (make-string length :element-type 'character)
>
> no, and that's the crux of the matter. this used to be different from
>
>(make-string length :element-type 'string-char)
>
> in precisely the capacity that you wish is still true, but it isn't.
> when the type string-char was removed, character assumed its role in
> specialized arrays, and you could not store bits and fonts in strings any
> more than you could with string-char. to do that, you need arrays with
> element-type t.

I'm still not following you. Are you saying that characters with
implementation-defined attributes (e.g. bits or fonts) might not satisfy
(typep c 'character)? I suppose that's possible. The standard allows
implementations to provide implementation-defined attributes, but doesn't
require them; an implementor could instead provide their own type
CHAR-WITH-BITS that's disjoint from CHARACTER rather than a subtype of it.
I'm not sure why they would do this, but nothing in the standard prohibits
it.

On the other hand, something like READ-CHAR would not be permitted to
return a CHAR-WITH-BITS -- it has to return a CHARACTER. So I'm not sure
how a program that thinks it's working with characters and strings would
encounter such an object unexpectedly.

Gareth McCaughan

Mar 17, 2000
Erik Naggum wrote:

> * Barry Margolin <bar...@bbnplanet.com>
> | But if you want a type that can hold any character, you can create it with:
> |
> | (make-string length :element-type 'character)
>
> no, and that's the crux of the matter. this used to be different from
>
> (make-string length :element-type 'string-char)
>
> in precisely the capacity that you wish is still true, but it isn't.
> when the type string-char was removed, character assumed its role in
> specialized arrays, and you could not store bits and fonts in strings any
> more than you could with string-char. to do that, you need arrays with
> element-type t.
>

> but I'm glad we've reached the point where you assert a positive, because
> your claim is what I've been trying to tell you guys DOES NOT HOLD. my
> claim is: there is nothing in the standard that _requires_ that there be
> a specialized array with elements that are subtypes of character (i.e., a
> member of the union type "string") that can hold _all_ character objects.
>
> can you show me where the _standard_ supports your claim?

I'm not Barry, but I think I can. Provided I'm allowed to
use the HyperSpec (which I have) rather than the Standard
itself (which I don't).

1. MAKE-STRING is defined to return "a string ... of the most
specialized type that can accommodate elements of the given
type".

2. The default "given type" is CHARACTER.

3. Therefore, MAKE-STRING with the default ELEMENT-TYPE
returns a string "that can accommodate elements of the
type CHARACTER".

Unfortunately, there's no definition of "accommodate" in the
HyperSpec. However, compare the following passages:

From MAKE-STRING:
| The element-type names the type of the elements of the
| string; a string is constructed of the most specialized
| type that can accommodate elements of the given type.

From MAKE-ARRAY:
| Creates and returns an array constructed of the most
| specialized type that can accommodate elements of type
| given by element-type.

It seems to me that the only reasonable definition of "can
accommodate elements of type FOO" in this context is "can
have arbitrary things of type FOO as elements". If so, then

4. MAKE-STRING with the default ELEMENT-TYPE returns a string
capable of having arbitrary things of type CHARACTER as
elements.

Now,

5. A "string" is defined as "a specialized vector ... whose
elements are of type CHARACTER or a subtype of type CHARACTER".

6. A "specialized" array is defined to be one whose actual array
element type is a proper subtype of T.

Hence,

7. MAKE-STRING with the default ELEMENT-TYPE returns a vector
whose actual array element type is a proper subtype of T,
whose elements are of type CHARACTER or a subtype thereof,
and which is capable of holding arbitrary things of type
CHARACTER as elements.

And therefore

8. There is such a thing as a specialized array with elements
of type CHARACTER or some subtype thereof, which is capable
of holding arbitrary things of type CHARACTER as elements.

Which is what you said the standard doesn't say. (From #7
we can also deduce that this thing has actual array element
type a proper subtype of T, so it's not equivalent to
(array t (*)).)

I can see only one hole in this. It's sort of possible that
"can accommodate elements of type FOO" in the definition of
MAKE-STRING doesn't mean what I said it does, even though
the exact same language in the definition of MAKE-ARRAY does
mean that. I don't find this plausible.

I remark also the following, from 16.1.1 ("Implications
of strings being arrays"):

| Since all strings are arrays, all rules which apply
| generally to arrays also apply to strings. See
| Section 15.1 (Array Concepts).
..
| and strings are also subject to the rules of element
| type upgrading that apply to arrays.

I'd have thought that if strings were special in the kind
of way you're saying they are, there would be some admission
of the fact here. There isn't.

*

Elsewhere in the thread, you said

| an implementation that stores characters in strings
| as if they have null implementation-defined attributes
| regardless of their actual attributes is actually
| fully conforming to the standard.

I have been unable to find anything in the HyperSpec that
justifies this. Some places I've looked:

- 15.1.1 "Array elements" (in 15.1 "Array concepts")

I thought perhaps this might say something like
"In some cases, storing an object in an array will
actually store another object that need not be EQ
to the original object". Nope.

- The definitions of CHAR and AREF

Again, looking for any sign that an implementation
is allowed to store something non-EQ to what it's
given with (setf (aref ...) ...) or (setf (char ...) ...).
Again, no. The definition of CHAR just says that it
and SCHAR "access the element of STRING specified by INDEX".

- 13.1.3 "Character attributes"

Perhaps this might say "Storing a character in a string
may lose its implementation-defined attributes". Nope.
It says that the way in which two characters with the
same code differ is "implementation-defined", but I don't
see any licence anywhere for this to mean they get confused
when stored in an array.

- The definition of MAKE-STRING

I've discussed this already.

- The glossary entries for "string", "attribute", "element",
and various others.

Also discussed above.

- The whole of chapter 13 (Characters) and 16 (Strings).

No sign here, unless I've missed something.

- The definitions of types CHARACTER, BASE-CHAR, STANDARD-CHAR,
EXTENDED-CHAR.

Still no sign.

- The CHARACTER-PROPOSAL (which isn't, in any case, part of
the standard).

I thought this might give some sign of the phenomenon
you describe. Not that I can see.

Perhaps I'm missing something. It wouldn't be the first time.
But I just can't find any sign at all that what you claim is
true, and I can see rather a lot of things that suggest it isn't.

The nearest I can find is this, from 16.1.2 ("Subtypes of STRING"):

| However, the consequences are undefined if a character
| is inserted into a string for which the element type of
| the string does not include that character.

But that doesn't give any reason to believe that the result
of (MAKE-STRING n :ELEMENT-TYPE 'CHARACTER) doesn't have an
element type that includes all characters. And, as I've said
above, there's good reason to believe that it does.

> | In fact, you don't even need the :ELEMENT-TYPE option, because CHARACTER is
> | the default.
>
> sure. however, I'm trying to penetrate the armor-plated belief that the
> resulting string is REQUIRED to retain non-null implementation-defined
> attributes if stored into it. no such requirement exists: a conforming
> implementation is completely free to provide a single string type that is
> able to hold only simple characters. you may think this is a mistake in
> the standard, but it's exactly what it says, after the type string-char
> was removed.

Where?

--
Gareth McCaughan Gareth.M...@pobox.com
sig under construction

Jon S Anthony

Mar 17, 2000
Gareth McCaughan wrote:

>
> Erik Naggum wrote:
>
> > sure. however, I'm trying to penetrate the armor-plated belief that the
> > resulting string is REQUIRED to retain non-null implementation-defined
> > attributes if stored into it. no such requirement exists: a conforming
> > implementation is completely free to provide a single string type that is
> > able to hold only simple characters. you may think this is a mistake in
> > the standard, but it's exactly what it says, after the type string-char
> > was removed.
>
> Where?

The part from "a conforming implementation..." on is directly supported
by 13.1.3:

| A character for which each implementation-defined attribute has the
| null value for that attribute is called a simple character. If the
| implementation has no implementation-defined attributes, then all
| characters are simple characters.

/Jon

--
Jon Anthony
Synquiry Technologies, Ltd. Belmont, MA 02478, 617.484.3383
"Nightmares - Ha! The way my life's been going lately,
Who'd notice?" -- Londo Mollari

Erik Naggum

Mar 18, 2000
* Barry Margolin <bar...@bbnplanet.com>

| I'm still not following you. Are you saying that characters with
| implementation-defined attributes (e.g. bits or fonts) might not satisfy
| (typep c 'character)?

no. I'm saying that even as this _is_ the case, the standard does not
require a string to be able to hold and return such a character intact.

#:Erik

Erik Naggum

Mar 18, 2000
* Tim Bradshaw <t...@cley.com>

| Which is precisely what I want to avoid unfortunately, as it means that
| either this code or the code that calls it has to deal with the issue of
| copying strings too and from arrays of (UNSIGNED-BYTE 8)s, which simply
| brings back the same problem somewhere else.

in this case, I'd talk to my vendor or dig deep in the implementation to
find a way to transmogrify an (unsigned-byte 8) vector to a character
vector by smashing the type codes instead of copying the data. (this is
just like change-class for vectors.) barring bivalent streams that can
accept either kind of vector (coming soon to an implementation near you),
having to deal with annoyingly stupid or particular external requirements
means it's OK to be less than nice at the interface level, provided it's
done safely.

#:Erik

Tim Bradshaw

Mar 18, 2000
* Erik Naggum wrote:

> in this case, I'd talk to my vendor or dig deep in the implementation to
> find a way to transmogrify an (unsigned-byte 8) vector to a character
> vector by smashing the type codes instead of copying the data. (this is
> just like change-class for vectors.)

This doesn't work (unless I've misunderstood you) because I can't use
it for the string->unsigned-byte-array case, because the strings might
have big characters in them. Actually, it probably *would* work in
that I could arrange to get a twice-as-big array if the string had
16-bit characters in (or 4x as big if ...), but I think the other end
would expect UTF-8 or something in that case (or, more likely, just
throw up its hands in horror at the thought that characters are not 8
bits wide, it's a pretty braindead design).

It looks to me like the outcome of all this is that there isn't a
portable CL way of ensuring what I need to be true is true, and that I
need to ask vendors for per-implementation answers, and meantime punt
on the issue until my code is more stable. Which are fine answers
from my point of view, in case anyone thinks I'm making the standard
`lisp won't let me do x' complaint.

> barring bivalent streams that can accept either kind of vector
> (coming soon to an implementation near you), having to deal with
> annoyingly stupid or particular external requirements means it's
> OK to be less than nice at the interface level, provided it's done
> safely.

Yes, I agree with this.

--tim

Erik Naggum

Mar 18, 2000
* Tim Bradshaw <t...@cley.com>

| This doesn't work (unless I've misunderstood you) because I can't use
| it for the string->unsigned-byte-array case, because the strings might
| have big characters in them.

sigh. so read (unsigned-byte 8), smash the type code so it's a string of
non-big characters, and do _whatever_ you need to do with the string,
then smash the type code and write (unsigned-byte 8) to whatever.

| It looks to me like the outcome of all this is that there isn't a
| portable CL way of ensuring what I need to be true is true, and that I
| need to ask vendors for per-implementation answers, and meantime punt on
| the issue until my code is more stable. Which are fine answers from my
| point of view, in case anyone thinks I'm making the standard `lisp won't
| let me do x' complaint.

portable languages are for portable problems. conversely, non-portable
problems may require non-portable solutions. I don't have a problem with
that, but many seem to have.

#:Erik

Gareth McCaughan

Mar 19, 2000
Jon S Anthony wrote:

> Gareth McCaughan wrote:
>>
>> Erik Naggum wrote:
>>

>>> sure. however, I'm trying to penetrate the armor-plated belief that the
>>> resulting string is REQUIRED to retain non-null implementation-defined
>>> attributes if stored into it. no such requirement exists: a conforming
>>> implementation is completely free to provide a single string type that is
>>> able to hold only simple characters. you may think this is a mistake in
>>> the standard, but it's exactly what it says, after the type string-char
>>> was removed.
>>
>> Where?
>

> The part from "a conforming implementation..." on is direcly supported
> by
> 13.1.3:
>
> | A character for which each implementation-defined attribute has the
> | null value for that attribute is called a simple character. If the
> | implementation has no implementation-defined attributes, then all
> | characters are simple characters.

Well, yes, but it's not actually relevant to the point
Erik's making.

The paragraph you quote implies that *if* an implementation
has no implementation-defined attributes, *then* that
implementation is free to make all its strings hold
only simple characters. In other words, if all characters
are simple then you can have strings that can only contain
simple characters. Surprise, surprise. :-)

What Erik's saying is that there needn't be any string
type that can hold arbitrary character objects. This claim
isn't supported by the paragraph you quoted, so far as
I can see.

Erik Naggum

Mar 19, 2000
* Gareth McCaughan <Gareth.M...@pobox.com>

| I'm not Barry, but I think I can. Provided I'm allowed to use the
| HyperSpec (which I have) rather than the Standard itself (which I don't).

note that this all hinges on the definition of STRING, not CHARACTER.

we all agree that character objects may have implementation-defined
attributes. the crux of the matter is whether strings are _required_ to
support these implementation-defined attributes for characters stored in
them, or are _permitted_ only to hold simple characters, i.e., characters
that have null or no implementation-defined attributes. sadly, nothing
you bring up affects this crucial argument.

there are two compelling reasons why implementation-defined attributes
are _not_ required to be retained in strings: (1) there is special
mention of which implementation-defined attributes are discarded when
reading a string literal from an input stream (which apparently may
support reading them, but nothing is indicated as to how this happens),
and (2) historically, strings did not retain bits and fonts, so if they
were to be supported by an implementation that conformed to CLtL1, they
would have to be _added_ to strings, while bits and fonts were explicitly
_removed_ from the language.

| 1. MAKE-STRING is defined to return "a string ... of the most
| specialized type that can accommodate elements of the given
| type".
|
| 2. The default "given type" is CHARACTER.
|
| 3. Therefore, MAKE-STRING with the default ELEMENT-TYPE
| returns a string "that can accommodate elements of the
| type CHARACTER".

the question boils down to whether the character concept as defined in
isolation is the same as the character concept as defined as part of a
string. if they are, your logic is impeccable. if they aren't the same,
your argument is entirely moot. I'm arguing that the crucial clue to
understand that there is a difference is indicated by the unique "union
type" of strings and the phrase "or a subtype of character" which is not
used of any other specialized array in the same way it is for strings --
no other types permit _only_ a subtype.

I'm arguing that an implementation is not required not to have a weaker
character concept in strings than in isolation, i.e., that strings may
_only_ hold a subtype of character, that implementation-defined
attributes are defined only to exist (i.e., be non-null) in isolated
character objects, and not in characters as stored in strings.

| Now,
|
| 5. A "string" is defined as "a specialized vector ... whose
| elements are of type CHARACTER or a subtype of type CHARACTER".

_please_ note that no other specialized vector type is permitted the
leeway that "or a subtype of" implies here. for some bizarre reason, the
bad imitation jerk from Harlequin thought that he could delete "of type
CHARAACTER" since every type is a subtype of itself. however, the key is
that this wording effectively allows a proper subtype of character to be
represented in strings. a similar wording does not exist _elsewhere_ in
the standard, signifying increased importance by this differentiation.

| 8. There is such a thing as a specialized array with elements
| of type CHARACTER or some subtype thereof, which is capable
| of holding arbitrary things of type CHARACTER as elements.

this is a contradiction in terms, so I'm glad you conclude this, as it
shows that carrying "or a subtype thereof" with you means precisely that
the standard does not require a _single_ string type to be able to hold
_all_ character values. that is why string is a union type, unlike all
other types in the language.



| I'd have thought that if strings were special in the kind of way you're
| saying they are, there would be some admission of the fact here. There
| isn't.

there is particular mention of "or a subtype of character" all over the
place when strings are mentioned. that's the fact you're looking for.

however, if you are willing to use contradictions in terms as evidence of
something and you're willing to ignore facts on purpose, there is not
much that logic and argumentation alone can do to correct the situation.

| I have been unable to find anything in the HyperSpec that justifies this.

again, look for "or a subtype of character" in the definition of STRING.

#:Erik

Tim Bradshaw

Mar 19, 2000
* Erik Naggum wrote:
* Tim Bradshaw <t...@cley.com>
> | This doesn't work (unless I've misunderstood you) because I can't use
> | it for the string->unsigned-byte-array case, because the strings might
> | have big characters in them.

> sigh. so read (unsigned-byte 8), smash the type code so it's a string of
> non-big characters, and do _whatever_ you need to do with the string,
> then smash the type code and write (unsigned-byte 8) to whatever.

Unless I've misunderstood you this still won't work. The strings I
need to write will come from lisp code, not down the wire -- I have no
control at all over what the user code puts in them. If they have big
characters in then I assume that smashing the type code will, if it
works at all, double / quadruple the length and result in me writing
stuff to the other end that it won't be able to cope with. What I'm
after, if it is possible, is a way of asking lisp if this string
contains (or might contain) big characters, without iterating over the
string checking.

--tim

Gareth McCaughan

Mar 19, 2000
Erik Naggum wrote:

> we all agree that character objects may have implementation-defined
> attributes. the crux of the matter is whether strings are _required_ to
> support these implementation-defined attributes for characters stored in
> them, or are _permitted_ only to hold simple characters, i.e., characters
> that have null or no implementation-defined attributes. sadly, nothing
> you bring up affects this crucial argument.

What about the complete absence of any statement anywhere
in the standard (so far as I can tell) that it's legal for
storing characters in a string to throw away their attributes?

> there are two compelling reasons why implementation-defined attributes
> are _not_ required to be retained in strings: (1) there is special
> mention of which implementation-defined attributes are discarded when
> reading a string literal from an input stream (which apparently may
> support reading them, but nothing is indicated as to how this happens),
> and (2) historically, strings did not retain bits and fonts, so if they
> were to be supported by an implementation that conformed to CLtL1, they
> would have to be _added_ to strings, while bits and fonts were explicitly
> _removed_ from the language.

I don't see why #1 is relevant. #2 is interesting, but the
language is defined by what the standard says, not by what
it used to say.

> | 1. MAKE-STRING is defined to return "a string ... of the most
> | specialized type that can accommodate elements of the given
> | type".
> |
> | 2. The default "given type" is CHARACTER.
> |
> | 3. Therefore, MAKE-STRING with the default ELEMENT-TYPE
> | returns a string "that can accommodate elements of the
> | type CHARACTER".
>

> the question boils down to whether the character concept as defined in
> isolation is the same as the character concept as defined as part of a
> string. if they are, your logic is impeccable. if they aren't the same,
> your argument is entirely moot. I'm arguing that the crucial clue to
> understand that there is a difference is indicated by the unique "union
> type" of strings and the phrase "or a subtype of character" which is not
> used of any other specialized array in the same way it is for strings --
> no other types permit _only_ a subtype.

The point here is simply that there can be several different
kinds of string. The standard says that there may be string
types that only permit a subtype of CHARACTER; it doesn't say
that there need be no string type that permits CHARACTER itself.

> I'm arguing that an implementation is not required not to have a weaker
> character concept in strings than in isolation, i.e., that strings may
> _only_ hold a subtype of character, that implementation-defined
> attributes are defined only to exist (i.e., be non-null) in isolated
> character objects, and not in characters as stored in strings.

I understand that you're arguing this. I still don't see anything
in the HyperSpec that supports it.

> | Now,
> |
> | 5. A "string" is defined as "a specialized vector ... whose
> | elements are of type CHARACTER or a subtype of type CHARACTER".
>

> _please_ note that no other specialized vector type is permitted the
> leeway that "or a subtype of" implies here.

I understand this. But I don't think STRING is a specialized vector
type, even though it's a type all of whose instances are specialized
vectors :-). The type STRING is a union of types; each of those
types is a specialized vector type. *This* is the reason for the
stuff about "or a subtype of": some strings might not be able to
hold arbitrary characters. But I still maintain that the argument
I've given demonstrates that some strings *are* able to hold
arbitrary characters.

> for some bizarre reason, the
> bad imitation jerk from Harlequin thought that he could delete "of type
> CHARAACTER" since every type is a subtype of itself. however, the key is
> that this wording effectively allows a proper subtype of character to be
> represented in strings. a similar wording does not exist _elsewhere_ in
> the standard, signifying increased importance by this differentiation.

I agree that the wording allows a proper subtype of CHARACTER
to be represented in strings. What it doesn't allow is for
all strings to allow only proper subtypes.

Here's a good way to focus what we disagree about. In the
definition of the class STRING, we find

| A string is a specialized vector whose elements are
| of type CHARACTER or a subtype of type CHARACTER.

| When used as a type specifier for object creation,
| STRING means (VECTOR CHARACTER).

You're arguing, roughly, that "a subtype of type CHARACTER"
might mean "the implementors can choose a subtype of CHARACTER
which is all that strings can contain". I think it means
"there may be strings that only permit a subtype of CHARACTER",
and that there's other evidence that shows that there must
also be strings that permit arbitrary CHARACTERs.

I've just noticed a neater proof than the one I gave before,
which isn't open to your objection that "character" might
sometimes not mean the same as CHARACTER.

1. [System class STRING] "When used as a type specifier
for object creation, STRING means (VECTOR CHARACTER)."
and: "This denotes the union of all types (ARRAY c (SIZE))
for all subtypes c of CHARACTER; that is, the set of
strings of size SIZE."

2. [System class VECTOR] "for all types x, (VECTOR x) is
the same as (ARRAY x (*))."

3. [Function MAKE-ARRAY] "The NEW-ARRAY can actually store
any objects of the type which results from upgrading
ELEMENT-TYPE".

4. [Function UPGRADED-ARRAY-ELEMENT-TYPE] "The TYPESPEC is
a subtype of (and possibly type equivalent to) the
UPGRADED-TYPESPEC." and: "If TYPESPEC is CHARACTER,
the result is type equivalent to CHARACTER."

5. [Glossary] "type equivalent adj. (of two types X and Y)
having the same elements; that is, X is a subtype of Y
and Y is a subtype of X."

6. [Function MAKE-ARRAY] "Creates and returns an array
constructed of the most specialized type that can
accommodate elements of type given by ELEMENT-TYPE."

7. [Function MAKE-STRING] "a string is constructed of the
most specialized type that can accommodate elements
of the given type."

So, consider the result S of

(make-array 10 :element-type 'character) .

We know (3) that it can store any objects of the type which
results from upgrading CHARACTER. We know (4) that that type
is type-equivalent to CHARACTER. We therefore know that S
can store any object of type CHARACTER.

We also know that (6) S is an array constructed of the most
specialized type that can accommodate elements of type
CHARACTER. So (1,2,7) is the result S' of

(make-string 10 :element-type 'character) .

Therefore S and S' are arrays of the same type.

We've just shown that S

- can hold arbitrary characters
- is of the same type as S'

and of course S' is a string. Therefore S is also a string.
Therefore there is at least one string (namely S) that
can hold arbitrary characters.
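
To make that chain of citations concrete, here is how one might poke at
it from a REPL.  This is only a sketch: the forms are standard, but the
results in the comments are what steps 1-7 predict, not output quoted
from any particular implementation.

  (upgraded-array-element-type 'character)
  ;; => CHARACTER (step 4: type-equivalent to CHARACTER)

  (defparameter *s*
    (make-array 10 :element-type 'character :initial-element #\Space))
  (defparameter *s-prime*
    (make-string 10 :element-type 'character :initial-element #\Space))

  (equal (array-element-type *s*) (array-element-type *s-prime*))
  ;; => T -- the "same type" step
  (typep *s* 'string)
  ;; => T -- S is itself a string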

> | 8. There is such a thing as a specialized array with elements
> | of type CHARACTER or some subtype thereof, which is capable
> | of holding arbitrary things of type CHARACTER as elements.
>

> this is a contradiction in terms, so I'm glad you conclude this, as it
> shows that carrying "or a subtype thereof" with you means precisely that
> the standard does not require a _single_ string type to be able to hold
> _all_ character values. that is why string is a union type, unlike all
> other types in the language.

It doesn't require *every* string type to be able to hold all
character values. It does, however, require *some* string type
to be able to hold all character values.

The reason why STRING is a union type is that implementors
might want to have (say) an "efficient" string type that uses
only one byte per character, for storing "easy" strings. Having
this as well as a type that can store arbitrary characters,
and having them both be subtypes of STRING, requires that
STRING be a union type.
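
For concreteness, in an implementation that makes that choice one would
expect something like the following; only the first two relations are
guaranteed by the standard, and the third depends on whether a distinct
one-byte string type actually exists:

  (subtypep 'base-string 'string)          ; => T, T
  (subtypep '(vector character) 'string)   ; => T, T
  (subtypep 'string '(vector character))   ; NIL, T when a one-byte
                                           ; BASE-STRING is a separate type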

Oh, and what I wrote isn't a contradiction in terms. It would be
if "or some subtype thereof" meant "or some particular subtype
thereof that might be imposed once-for-all by the implementation",
but I don't think it means that.

> | I'd have thought that if strings were special in the kind of way you're
> | saying they are, there would be some admission of the fact here. There
> | isn't.
>

> there is particular mention of "or a subtype of character" all over the
> place when strings are mentioned. that's the fact you're looking for.

I don't believe the fact means what you think it does.

> however, if you are willing to use contradictions in terms as evidence of
> something and you're willing to ignore facts on purpose, there is not
> much that logic and argumentation alone can do to correct the situation.

I've explained why I don't think what I wrote is a contradiction
in terms.

I assure you that I am not ignoring anything on purpose.

Erik Naggum

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
* Gareth McCaughan <Gareth.M...@pobox.com>

| What about the complete absence of any statement anywhere in the standard
| (so far as I can tell) that it's legal for storing characters in a string
| to throw away their attributes?

what of it? in case you don't realize the full ramification of the
equally complete absence of any mechanism to use, query, or set these
implementation-defined attributes on characters, the express intent of
the removal of bits and fonts was to remove character attributes from
the language. they are no longer there as part of the official standard,
and any implementation has to document what it does to them as part of
the set of implementation-defined features. OBVIOUSLY, the _standard_ is
not the right document to prescribe the consequences of such features!
an implementation, consequently, may or may not want to store attributes
in strings, and it is free to do or not to do so, and the standard cannot
prescribe this behavior.

conversely, if implementation-defined attributes were to be retained,
shouldn't they have an explicit statement that they were to be retained,
which would require an implementation to abide by certain rules in the
implementation-defined areas? that sounds _much_ more plausible to me
than saying "implementation-defined" and then defining it in the standard.

when talking about what an implementation is allowed to do on its own
accord, omitting specifics means it's free to do whatever it pleases. in
any requirement that is covered by conformance clauses, an omission is
treated very differently: it means you can't do it. we are not talking
about _standard_ attributes of characters (that's the code, and that's
the only attribute _required_ to be in _standard_ strings), but about
implementation-defined attributes.

| I don't see why #1 is relevant. #2 is interesting, but the language is
| defined by what the standard says, not by what it used to say.

it says "implementation-defined attributes" and it says "subtype of
character", which is all I need to go by. you seem to want the standard
to prescribe implementation-defined behavior. this is an obvious no-go.

it is quite the disingenuous twist to attempt to rephrase what I said as
"what the standard used to say", but I'm getting used to a lot of weird
stuff from your side already, so I'll just point out to you that I'm
referring to how it came to be what it is, not what it used to say. if
you can't see the difference, I can't help you understand, but if you do
see the difference, you will understand that no standard or other
document written by and intended for human beings can ever be perfect in
the way you seem to expect. expecting standards to be free of errors or
of the need of interpretation by humans is just mind-bogglingly stupid,
so I'm blithely assuming that you don't hold that view, but instead don't
see that you are nonetheless flirting with it.

| The point here is simply that there can be several different kinds of
| string. The standard says that there may be string types that only
| permit a subtype of CHARACTER; it doesn't say that there need be no
| string type that permits CHARACTER itself.

sigh. the point I'm trying to make is that it doesn't _require_ there to
be one particular string type which can hold characters with all the
implementation-defined attributes.

> | (make-array 10 :element-type 'character) [S]
> | (make-string 10 :element-type 'character) [S']
> |
> | Therefore S and S' are arrays of the same type.

sorry, this is a mere tautology that brings nothing to the argument.

| Therefore there is at least one string (namely S) that can hold arbitrary
| characters.

but you are not showing that it can hold arbitrary characters. _nothing_
in what you dig up actually argues that implementation-defined attributes
have standardized semantics. an implementation is, by virtue of its very
own definition of the semantics, able to define a character in isolation
as having some implementation-defined attributes and strings to contain
characters without such implementation-defined attributes. this is the
result of the removal of the type string-char and the subsequent merging
of the semantics of character and string-char.

| It doesn't require *every* string type to be able to hold all character
| values. It does, however, require *some* string type to be able to hold
| all character values.

where do you find support for this? nowhere does the standard say that a
string must retain implementation-defined attributes of characters. it
does say that the code attribute is the only standard attribute, and it
is obvious that that attribute must be retained wherever. it is not at
all obvious that implementation-defined attributes must survive all kinds
of operations.

you've been exceedingly specific in finding ways to defend your position,
but nowhere do you find actual evidence of a requirement that there exist
a string type that is not allowed to reject any character object. I'm
sorry, but the premise that some string type _must_ be able to hold _all_
characters, including all the implementation-defined attributes that
strings never were intended to hold to begin with, is no more than
unsupported wishful thinking, but if you hold this premise as axiomatic,
you won't see that it is unsupported. if you discard it as an axiom and
then try to find support for it, you find that you can't -- the language
definition is sufficiently slippery that these implementation-defined
attributes don't have any standard-prescribed semantics for them at all,
including giving the implementation leeway to define their behavior,
which means: not _requiring_ anything particular about them, which means:
not _requiring_ strings to retain them, since that would be a particular
requirement about an implementation-defined property of the language.



| The reason why STRING is a union type is that implementors might want to
| have (say) an "efficient" string type that uses only one byte per
| character, for storing "easy" strings. Having this as well as a type
| that can store arbitrary characters, and having them both be subtypes of
| STRING, requires that STRING be a union type.

now, this is the interesting part. _which_ string would that be? as far
as I understand your argument, you're allowing an implementation to have
an implementation-defined standard type to hold simple characters (there
is only one _standard_ attribute -- the code), while it is _required_ to
support a wider _non-standard_ implementation-defined type? this is
another contradiction in terms. either the same requirement is standard
or it is implementation-defined -- it can't be both at the same time.

I quote from the character proposal that led to the changes we're
discussing, _not_ to imply that what isn't in the standard is more of a
requirement on the implementation than the standard, but to identify the
intent and spirit of the change. as with any legally binding document,
if you can't figure it out by reading the actual document, you go hunting
for the meaning in the preparatory works. luckily, we have access to the
preparatory works with the HyperSpec. it should shed light on the
wording in the standard, if necessary. in this case, it is necessary.

Remove all discussion of attributes from the language specification. Add
the following discussion:

``Earlier versions of Common LISP incorporated FONT and BITS as
attributes of character objects. These and other supported
attributes are considered implementation-defined attributes and
if supported by an implementation effect the action of selected
functions.''

what we have is a standard that didn't come out and say "you can't retain
bits and fonts from CLtL1 in characters", but _allowed_ an implementation
to retain them, in whatever way they wanted. since the standard removed
these features, it must be interpreted relative to that (bloody) obvious
intent if a wording might be interpreted by some to mean that the change would
require providing _additional_ support for the removed features -- such
an interpretation _must_ be discarded, even if it is possible to argue
for it in an interpretative vacuum, which never exists in any document
written by and for human beings regardless of some people's desires.
(such a vacuum cannot even exist in mathematics -- which reading a
standard is not an exercise in, anyway -- any document must always be
read in a context that supplies and retains its intention, otherwise
_human_ communication breaks down completely.)

#:Erik

Barry Margolin

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
In article <31625068...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> sigh. the point I'm trying to make is that it doesn't _require_ there to
> be one particular string type which can hold characters with all the
> implementation-defined attributes.

Wouldn't those characters be of type CHARACTER? Mustn't a vector
specialized to type CHARACTER be able to hold all objects of type
CHARACTER? Isn't such a vector a subtype of STRING?

> where do you find support for this? nowhere does the standard say that a
> string must retain implementation-defined attributes of characters. it
> does say that the code attribute is the only standard attribute, and it
> is obvious that that attribute must be retained wherever. it is not at
> all obvious that implementation-defined attributes must survive all kinds
> of operations.

Filling in strings is just a particular case of assignment. Where does the
standard ever give license for a value to change during assignment?
I.e. I expect that (setf x <some character with I-D attrs>) will make the
variable X refer to that character object, and thus it will have those I-D
attributes. What then makes (setf (aref str 3) x) any different? It's
just an assignment, as far as the language specification is concerned. As
long as (typep x (array-element-type str)) holds, I would expect (eql (aref
str 3) x) to hold after this.

Note also that the standard does not leave everything about I-D attributes
up to the discretion of the implementor. For instance, it specifies that
characters are not CHAR= if any of their I-D attributes differ. And CHAR=
is the predicate that EQL uses to compare characters.
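
A sketch of that expectation in code -- CH stands in for a character
with implementation-defined attributes, for which there is no portable
constructor, so #\A is only a placeholder:

  (let ((str (make-string 10 :element-type 'character
                             :initial-element #\Space))
        (ch #\A))               ; imagine CH carries non-null I-D attributes
    (setf (aref str 3) ch)
    (when (typep ch (array-element-type str))
      (eql (aref str 3) ch)))   ; the expectation described above: true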

> now, this is the interesting part. _which_ string would that be? as far
> as I understand your argument, you're allowing an implementation to have
> an implementation-defined standard type to hold simple characters (there
> is only one _standard_ attribute -- the code), while it is _required_ to
> support a wider _non-standard_ implementation-defined type? this is
> another contradiction in terms. either the same requirement is standard
> or it is implementation-defined -- it can't be both at the same time.

It's required to have a type that supports the most general CHARACTER type
(and the other poster showed that this must have array-element-type
CHARACTER, not T). It's allowed to have other types that specialize on
subtypes of CHARACTER, for efficiency. The type STRING is the union of all
these types.

Well, I was there and you weren't, so I think I can comment on the intent,
to the best of my recollection.

What we wanted to remove from the standard were the API and UI that dealt
with the nature of specific attributes. We didn't want to distinguish
these specific attributes (bits and fonts), which often didn't make sense
in many implementations or applications. But I don't think we intended to
destroy the notion that attributes are part of the objects, and are thus
included in assignments just like any attributes and slots of other data
types. They could get lost during I/O, due to the fact that the language
can't specify the nature of external file formats, but as long as you stay
within the Lisp environment they should stick.

foot...@thcsv01.trafford.ford.com

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Erik Naggum <er...@naggum.no> writes:

[ lots snipped ]

>
> the question boils down to whether the character concept as defined in
> isolation is the same as the character concept as defined as part of a
> string. if they are, your logic is impeccable. if they aren't the same,
> your argument is entirely moot. I'm arguing that the crucial clue to
> understand that there is a difference is indicated by the unique "union
> type" of strings and the phrase "or a subtype of character" which is not
> used of any other specialized array in the same way it is for strings --
> no other types permit _only_ a subtype.
>

The "union type" of strings seems (to me) only to relate to the use
of STRING as a type discriminator. When used as a type specifier no mention
is made of any unions and the type is explicitly defined as meaning
(VECTOR CHARACTER).

Are there any other type specifiers in CL that cover anything like the same
ground as STRING when used as a type discriminator?
If CL were to try and define a type discriminator that covered all
vectors that contained subtypes of fixnum (say) might it not use the same
language (i.e. UNIONs of specialized array element types) that STRING uses?
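
A sketch of the two roles as I understand them (the comments record my
expectations, not anything the spec literally prints):

  ;; as a type discriminator, STRING accepts any member of the union:
  (typep (make-array 3 :element-type 'base-char :initial-element #\x)
         'string)                                ; => T
  ;; as a type specifier for creation, STRING means (VECTOR CHARACTER):
  (array-element-type (make-sequence 'string 3 :initial-element #\x))
  ;; => CHARACTER in a typical implementation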

I get the impression that BASE-CHAR was introduced to give a more flexible
equivalent to STRING-CHAR, and BASE-STRING is perhaps closer to the CLtL1
idea of a string.

I've been reading the Hyperspec a lot today to try and see where you are coming
from but I still don't get it. It may well be because I find it very strange
(read "emotionally hard to accept") that if I ask for a vector capable of
holding type A I may get back a vector only capable of holding some strict
subtype of A but ONLY if I'm talking about strings (else ALWAYS a supertype
of A). But that is also why I think it is worth my spending time to get my
understanding of this straight. I'm hoping that any comments you may care to
make on the above will help me see how the spec supports your view or give
me enough conviction to decide that it doesn't.

[ lots more snipped ]
>
> #:Erik


Guy

Pekka P. Pirinen

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Tim Bradshaw <t...@cley.com> writes:
> * Pekka P Pirinen wrote:
> > You still need a better FFI: WRITE-SEQUENCE is just as much a foreign
> > interface as any.
>
> Yes, in fact it's worse than most, because I can't rely on the
> vendor/implementor to address the issues for me!

I don't see why not, in theory. Your experience might be different,
of course, it's a lot of work to put in proper support for all those
encodings. If you mean there's no standard interface to external
formats, that's unfortunately true, but that can be alleviated with a
small portability layer, to compute the right external format.

> > If you expect non-ASCII characters on the Lisp side, you need to know
> > what the encoding is on the other side, otherwise it might come out
> > wrong.
>
> Yes. And the problem is that since my stuff is a low-level facility
> which others (I hope) will build on, I don't really know what they
> will throw at me.

That's always much harder than writing an application. OK, so you
don't know what you will get on the Lisp side. It might not be
unreasonable to just document some restriction for the input for your
facility, especially if it arises out of the thing you're interfacing
to and can't change. Then if your checks don't catch all the
problems, it's not all your fault.

> > It might be enough to check the type of your strings (and perhaps the
> > external format of the stream), instead of every character.
>
> My hope is that BASE-STRING is good enough, but I'm not sure (I don't
> see that a BASE-STRING could not have more than 8-bit characters, if
> an implementation chose to have only one string type for instance (can
> it?)).

It can, but I'd be surprised to see an implementation with a 16-bit
string as the only type, for the reasons Erik mentioned. You could
again have some small non-portable function that checks this.
LispWorks and Liquid have a type called LW:8-BIT-STRING, that might
help.
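
Such a check might look roughly like this -- a sketch only, since
LW:8-BIT-STRING is the LispWorks/Liquid type mentioned above and
BASE-CHAR is not guaranteed to be 8 bits wide, so the portable branch
is just a heuristic:

  (defun probably-8-bit-string-p (s)
    #+lispworks (typep s 'lw:8-bit-string)
    #-lispworks (subtypep (array-element-type s) 'base-char))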

> Checking the external format of the stream is also obviously
> needed but if it's :DEFAULT does that tell me anything, and if it's
> not I have to special case anyway.

Yes, it's going to be non-portable, but in a strictly localized way.
If you get :DEFAULT, it is supposed to denote an external format, you
just need to know which one (LW and Liquid will never return
:DEFAULT).
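
STREAM-EXTERNAL-FORMAT itself is standard, so the starting point can be
written portably even though the value it returns is not; a sketch, with
a made-up file name:

  (with-open-file (out "output.dat" :direction :output
                                    :if-exists :supersede)
    (stream-external-format out))
  ;; => :DEFAULT, a keyword such as :LATIN-1, or an implementation-specific
  ;;    external-format object -- interpreting it is the non-portable part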

Then it gets complicated, because there are encodings that turn
BASE-CHARACTERs into multiple bytes, and EXTENDED-CHARACTERs into a
single byte. But aren't we trying to answer a question that is too
specific? Don't you just need to know what the encoding on the other
side is (ASCII, Latin-1, SJIS, UTF-8), and match that?

> Is there a useful, fast, check that that (write-sequence x
> y) will write (length x) bytes on y if all is well for LispWorks /

> Liquid (I don't have a license for these, unfortunately)?

LW Personal Edition is free to play with. If you don't have the time
for that, the manuals are on the web site (not that you'll always find
the answers there). EXTERNAL-FORMAT:EXTERNAL-FORMAT-FOREIGN-TYPE will
tell you what it is outputting (it might return, say, (UNSIGNED-BYTE
8)), but that's not the whole story, because of the existence of
multi-byte encodings that will use a variable number of bytes per
character. I think the problem is to define what "all is well" means.


--
Pekka P. Pirinen, Harlequin Limited

We use the facts we encounter the way a drunkard uses a
lamppost: more for support than illumination.
[adapted from A. E. Housman]

Jon S Anthony

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Gareth McCaughan wrote:
>
> Jon S Anthony wrote:
>
> > Gareth McCaughan wrote:
> >>
> >> Erik Naggum wrote:
> >>
> >>> sure. however, I'm trying to penetrate the armor-plated belief that the
> >>> resulting string is REQUIRED to retain non-null implementation-defined
> >>> attributes if stored into it. no such requirement exists: a conforming
> >>> implementation is completely free to provide a single string type that is
> >>> able to hold only simple characters. you may think this is a mistake in
> >>> the standard, but it's exactly what it says, after the type string-char
> >>> was removed.
> >>
> >> Where?
> >
> > The part from "a conforming implementation..." on is direcly supported
> > by
> > 13.1.3:
> >
> > | A character for which each implementation-defined attribute has the
> > | null value for that attribute is called a simple character. If the
> > | implementation has no implementation-defined attributes, then all
> > | characters are simple characters.
>
> Well, yes, but it's not actually relevant to the point
> Erik's making.

Obviously not the whole point (which is why I specifically said "the
part from ..."), but in the paragraph you quote, this is clearly one of
the things he is saying and, apparently, since you quote the whole thing,
one for which you wondered "where".

Nothing more, nothing less...

Gareth McCaughan

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Erik Naggum wrote:

> * Gareth McCaughan <Gareth.M...@pobox.com>
> | What about the complete absence of any statement anywhere in the standard
> | (so far as I can tell) that it's legal for storing characters in a string
> | to throw away their attributes?
>
> what of it? in case you don't realize the full ramification of the
> equally complete absence of any mechanism to use, query, or set these
> implementation-defined attributes on characters, the express intent of
> the removal of bits and fonts was to remove character attributes from
> the language. they are no longer there as part of the official standard,
> and any implementation has to document what it does to them as part of
> the set of implementation-defined features. OBVIOUSLY, the _standard_ is
> not the right document to prescribe the consequences of such features!
> an implementation, consequently, may or may not want to store attributes
> in strings, and it is free to do or not to do so, and the standard cannot
> prescribe this behavior.

Well, this is an argument. To paraphrase (and I'm sure you'll
tell me if this is misleading), "The behaviour of implementation-
defined things is implementation-defined". The trouble is
that this proves too much -- if the implementation-defined-ness
of character attributes other than the CODE means that
implementations are allowed to throw them away when characters
are stored in strings, I don't see why it shouldn't also mean
that implementations are allowed to throw them away when you
bind them to variables, or put them in lists.

Maybe, of course, you'd say that too; and say, further, that
implementation-defined attributes can be so weird that all
bets are off as soon as a character has any, and the semantics
of any program that uses them are completely undefined. That
might be a *possible* reading of the standard, but it seems
a very unnatural one to me; I cannot for the life of me see
what the point of mentioning implementation-defined attributes
at all was if that was the intention.

It seems to me that what's implementation-defined is just
what attributes there are and how you access them, not their
possible ability to cause bizarre behaviour elsewhere in the
system. The standard says that you can put objects into arrays
and take them out again; nothing anywhere gives any indication
that you can do that and not get out the same object that you
put in in the first place.

> conversely, if implementation-defined attributes were to be retained,
> shouldn't they have an explicit statement that they were to be retained,
> which would require an implementation to abide by certain rules in the
> implementation-defined areas? that sounds _much_ more plausible to me
> than saying "implementation-defined" and then defining it in the standard.

I don't see why an explicit statement that i.d.a.s should be
preserved under storage in arrays is any more necessary than
an explicit statement that they should be preserved under
variable binding, or by CONS. An array is a container for
objects (that much is even said explicitly in the standard).

> when talking about what an implementation is allowed to do on its own
> accord, omitting specifics means it's free to do whatever it pleases. in
> any requirement that is covered by conformance clauses, an omission is
> treated very differently: it means you can't do it. we are not talking
> about _standard_ attributes of characters (that's the code, and that's
> the only attribute _required_ to be in _standard_ strings), but about
> implementation-defined attributes.

Sure. It's up to the implementation what attributes there may
be other than the CODE. I don't think "implementation-defined"
means anything other than that.

> | I don't see why #1 is relevant. #2 is interesting, but the language is
> | defined by what the standard says, not by what it used to say.
>
> it says "implementation-defined attributes" and it says "subtype of
> character", which is all I need to go by. you seem to want the standard
> to prescribe implementation-defined behavior. this is an obvious no-go.

No, I don't want the standard to prescribe implementation-defined
behaviour. I just don't think that "implementation-defined
attributes" means "mysterious things whose semantics may be
entirely inconsistent with everything in this standard".

> it is quite the disingenuous twist to attempt to rephrase what I said as
> "what the standard used to say",

I assure you that I had no intention of misrepresenting you.
I actually originally wrote something like "the relationship
between what it says now and what it used to say" but decided
that that was too verbose and pedantic. Evidently I should
have kept the original version. Too bad.

> but I'm getting used to a lot of weird
> stuff from your side already, so I'll just point out to you that I'm
> referring to how it came to be what it is, not what it used to say. if
> you can't see the difference, I can't help you understand, but if you do
> see the difference, you will understand that no standard or other
> document written by and intended for human beings can ever be perfect in
> the way you seem to expect.

I do see the difference, and I didn't intend to deny or obscure
it. I don't know why you think I expect the standard to be
perfect.

> expecting standards to be free of errors or
> of the need of interpretation by humans is just mind-bogglingly stupid,
> so I'm blithely assuming that you don't hold that view, but instead don't
> see that you are nonetheless flirting with it.

I do not expect it to be free of errors or not to need interpreting.

I do think that my interpretation is more natural than yours,
and that the amount of error the standard would have to contain
if your view were right is considerably more than the amount it
would have to contain if mine were right. Both of these (plus
the fact that it seems on the whole to have very few errors)
lead me to prefer my view to yours.

> | The point here is simply that there can be several different kinds of
> | string. The standard says that there may be string types that only
> | permit a subtype of CHARACTER; it doesn't say that there need be no
> | string type that permits CHARACTER itself.
>
> sigh. the point I'm trying to make is that it doesn't _require_ there to
> be one particular string type which can hold characters with all the
> implementation-defined attributes.

I know. And the point I'm trying to make is that it does. I've
explained how I deduce that from what it says.

> | (make-array 10 :element-type 'character) [S]
> | (make-string 10 :element-type 'character) [S']
> |
> | Therefore S and S' are arrays of the same type.
>
> sorry, this is a mere tautology that brings nothing to the argument.

That observation is there for a picky pedantic reason: that
I want to make it explicit not only that there are things
capable of holding arbitrary characters, but that those things
are actually strings in the sense defined in the standard.

> | Therefore there is at least one string (namely S) that can hold arbitrary
> | characters.
>
> but you are not showing that it can hold arbitrary characters. _nothing_
> in what you dig up actually argues that implementation-defined attributes
> have standardized semantics. an implementation is, by virtue of its very
> own definition of the semantics, able to define a character in isolation
> as having some implementation-defined attributes and strings to contain
> characters without such implementation-defined attributes. this is the
> result of the removal of the type string-char and the subsequent merging
> of the semantics of character and string-char.

I understand that this is your claim, but I still disagree. The
standard says that the result of (make-string 10), for instance,
is an array whose element-type is the result of upgrading the
type CHARACTER. It *doesn't* just say that it's a thing that
holds characters; it says that it holds CHARACTERs.
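
Restated as a single form -- a sketch, with the comment recording what
the argument predicts rather than quoting any implementation:

  (array-element-type (make-string 10 :initial-element #\Space))
  ;; expected: CHARACTER, i.e. the result of upgrading CHARACTER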

> | It doesn't require *every* string type to be able to hold all character
> | values. It does, however, require *some* string type to be able to hold
> | all character values.
>
> where do you find support for this?

In the arguments I've already given.

> you've been exceedingly specific in finding ways to defend your position,
> but nowhere do you find actual evidence of a requirement that there exist
> a string type that would not reject at least some character objects.

I consider that I have. It's true that there isn't a sentence
that says in so many words "There is a string type that allows
you to store arbitrary characters"; but there are sentences
that, by a simple process of deduction, imply that there is.

Clearly you consider my deduction flawed. You haven't, though,
said what's wrong with it. (You've said that some bits of it
are content-free, but I don't mind if they are. What matters
is whether any of it is actually *wrong* in the sense that
I say "X and therefore Y" where Y doesn't really follow from X.)

> I'm
> sorry, but the premise that some string type _must_ be able to hold _all_
> characters, including all the implementation-defined attributes that
> strings never were intended to hold to begin with, is no more than
> unsupported wishful thinking, but if you hold this premise as axiomatic,
> you won't see that it is unsupported.

I don't hold it as axiomatic.

> if you discard it as an axiom and
> then try to find support for it, you find that you can't -- the language
> definition is sufficiently slippery that these implementation-defined
> attributes don't have any standard-prescribed semantics for them at all,
> including giving the implementation leeway to define their behavior,
> which means: not _requiring_ anything particular about them, which means:
> not _requiring_ strings to retain them, since that would be a particular
> requirement about an implementation-defined property of the language.

Well, yes, the standard is slippery. You said yourself,
earlier, that no standard is perfect, and that interpretation
is always necessary. My interpretation of the standard is
that it's not intended to say what you say it says. That's
borne out by Barry Margolin's recollections of the discussions
of the standardising committee.

> | The reason why STRING is a union type is that implementors might want to
> | have (say) an "efficient" string type that uses only one byte per
> | character, for storing "easy" strings. Having this as well as a type
> | that can store arbitrary characters, and having them both be subtypes of
> | STRING, requires that STRING be a union type.
>
> now, this is the interesting part. _which_ string would that be? as far
> as I understand your argument, you're allowing an implementation to have
> an implementation-defined standard type to hold simple characters (there
> is only one _standard_ attribute -- the code), while it is _required_ to
> support a wider _non-standard_ implementation-defined type? this is
> another contradiction in terms. either the same requirement is standard
> or it is implementation-defined -- it can't be both at the same time.

Nope.

The "wider type" is the type CHARACTER, which is defined (by
the standard) to contain all characters, including those with
non-null implementation-defined attributes. But, even though
it may contain things some of whose properties are down to
the implementation, CHARACTER is of course not an implementation-
defined type in any useful sense; it's clearly documented in
the standard.

> I quote from the character proposal that led to the changes we're
> discussing, _not_ to imply that what isn't in the standard is more of a
> requirement on the implementation than the standard, but to identify the
> intent and spirit of the change. as with any legally binding document,
> if you can't figure it out by reading the actual document, you go hunting
> for the meaning in the preparatory works. luckily, we have access to the
> preparatory works with the HyperSpec. it should shed light on the
> wording in the standard, if necessary. in this case, it is necessary.
>
> Remove all discussion of attributes from the language specification. Add
> the following discussion:
>
> ``Earlier versions of Common LISP incorporated FONT and BITS as
> attributes of character objects. These and other supported
> attributes are considered implementation-defined attributes and
> if supported by an implementation effect the action of selected
> functions.''
>
> what we have is a standard that didn't come out and say "you can't retain
> bits and fonts from CLtL1 in characters", but _allowed_ an implementation
> to retain them, in whatever way they wanted. since the standard removed
> these features, it must be interpreted relative to that (bloody) obvious
> intent if a wording might be interpreted by some that the change would
> require providing _additional_ support for the removed features

My interpretation of the standard doesn't make it require anyone
to add support for the removed features.

Tim Bradshaw

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
* Pekka P Pirinen wrote:
>>
>> Yes, in fact it's worse than most, because I can't rely on the
>> vendor/implementor to address the issues for me!

> I don't see why not, in theory.

Well, I think the issues I meant were `braindamaged thing at the other
end'. If it's a conventional FFI to a braindamaged language then the
vendor will typically have dealt with this, but there are so many
things that can be at the other end of a stream...

> That's always much harder than writing an application. OK, so you
> don't know what you will get on the Lisp side. It might not be
> unreasonable to just document some restriction for the input for your
> facility, especially if it arises out of the thing you're interfacing
> to and can't change. Then if your checks don't catch all the
> problems, it's not all your fault.

That is almost certainly what I will do.

> It can, but I'd be surprised to see an implementation with a 16-bit
> string as the only type, for the reasons Erik mentioned. You could
> again have some small non-portable function that checks this.
> LispWorks and Liquid have a type called LW:8-BIT-STRING, that might
> help.

Thanks, well-contained implementation-dependencies are a fine solution
to this, I think.

--tim


Erik Naggum

unread,
Mar 21, 2000, 3:00:00 AM3/21/00
to
* Barry Margolin <bar...@bbnplanet.com>

| Wouldn't those characters be of type CHARACTER? Mustn't a vector
| specialized to type CHARACTER be able to hold all objects of type
| CHARACTER? Isn't such a vector a subtype of STRING?

what was the _intent_ of removing string-char and making fonts and bits
implementation-defined? has that intent been carried forward all the way?

| Where does the standard ever give license for a value to change during
| assignment?

16.1.2 Subtypes of STRING, and I quote:

However, the consequences are undefined if a character is inserted into a
string for which the element type of the string does not include that
character.
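
read operationally, that clause is what makes a guard like the following
prudent -- a sketch only, not something the standard requires; it merely
stays out of the undefined case:

  (defun checked-set-char (string index char)
    (if (typep char (array-element-type string))
        (setf (char string index) char)
        (error "~S cannot hold the character ~S" string char)))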

| Well, I was there and you weren't, so I think I can comment on the intent,
| to the best of my recollection.

that's appreciated, but I must say I find "I was there and you weren't"
to be amazingly childish as "arguments" go.

| What we wanted to remove from the standard were the API and UI that dealt
| with the nature of specific attributes. We didn't want to distinguish
| these specific attributes (bits and fonts), which often didn't make sense
| in many implementations or applications. But I don't think we intended
| to destroy the notion that attributes are part of the objects, and are
| thus included in assignments just like any attributes and slots of other
| data types. They could get lost during I/O, due to the fact that the
| language can't specify the nature of external file formats, but as long
| as you stay within the Lisp environment they should stick.

perhaps you, who were presumably there for the duration, could elaborate
on the intended _consequences_ of the removal of the string-char type and
the change to strings from being made up of a subtype of character that
explicitly excluded fonts and bits to a character type that didn't need
to include fonts and bits?

#:Erik

Erik Naggum

unread,
Mar 21, 2000, 3:00:00 AM3/21/00
to
* Gareth McCaughan <Gareth.M...@pobox.com>

| Well, this is an argument. To paraphrase (and I'm sure you'll tell me if
| this is misleading), "The behaviour of implementation- -defined things is
| implementation-defined". The trouble is that this proves too much -- if
| the implementation-defined-ness of character attributes other than the
| CODE means that implementations are allowed to throw them away when
| characters are stored in strings, I don't see why it shouldn't also mean
| that implementations are allowed to throw them away when you bind them to
| variables, or put them in lists.

you don't see that? well, I can't help you see that, then. let me just
reiterate what I have previously said: a regular array with element-type
t can hold any character object. if you insist on being silly, however,
there's nothing I can do to prevent this from going completely wacky.
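
the fallback is easy to write down -- a sketch; note that such a vector
is, of course, not of type STRING:

  (let ((v (make-array 10 :element-type t :initial-element nil)))
    (setf (aref v 0) #\A)       ; any character object fits, attributes and all
    (values (array-element-type v) (typep v 'string)))
  ;; => T, NIL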

| No, I don't want the standard to prescribe implementation-defined
| behaviour. I just don't think that "implementation-defined
| attributes" means "mysterious things whose semantics may be
| entirely inconsistent with everything in this standard".

that is your interpretation, Gareth, and I claim it's unsupported by
facts, but ever more supported by silliness and "I don't see why"'s.
now, I do see how you are reaching your conclusion, I just don't accept
that you have refuted the one thing I'm still claiming: that the standard
does not _require_ there to be a specialized array (string) that must be able to
hold all character objects. since you're going into silly mode, I can
only guess that you don't understand my argument and have to ridicule it.

and let me just say that an implementation that chooses to allow strings
to hold all attributes is obviously just as conforming as one that only
has strings that hold the code attribute.

| I don't know why you think I expect the standard to be perfect.

because you use it as the basis of proofs that you expect to be
universally valid without recognizing your own interpretative work
(including omitting irrelevant points that seem irrelevant to you) in
constructing them. the confidence in perfection required to do this is
quite staggering.

| I do think that my interpretation is more natural than yours, and that
| the amount of error the standard would have to contain if your view were
| right is considerably more than the amount it would have to contain if
| mine were right. Both of these (plus the fact that it seems on the whole
| to have very few errors) lead me to prefer my view to yours.

this, however, is a valid line of argument. I just happen to disagree.

| That observation is there for a picky pedantic reason: that I want to
| make it explicit not only that there are things capable of holding
| arbitrary characters, but that those things are actually strings in the
| sense defined in the standard.

and I have already pointed out that the union type does not contain the
individual type that is _required_ to hold all characters. to get that,
you have to upgrade the element-type to t.

| Clearly you consider my deduction flawed. You haven't, though, said
| what's wrong with it.

yes, I have. I have pointed out that it ignores several important
factors that affect how you can interpret the standard. in particular,
that the requirement you come up with _adds_ additional burden to an
implementation that decides to continue to support implementation-defined
attributes above and beyond what it needed to do before they were removed
from the standard. this is clearly a serious mismatch between intent and
expression, and we need to understand the intent behind the standard when
it seems to say something that isn't very smart.

| That's borne out by Barry Margolin's recollections of the discussions of
| the standardising committee.

for some reason, I have yet to see those posted to the newsgroup. if you
could mail them to me, I'd be much obliged.

| My interpretation of the standard doesn't make it require anyone to add
| support for the removed features.

now, this is just plain ridiculous or unbelievably ignorant.

strings in CLtL1 were made up of string-char elements, not character.
string-char explicitly excluded fonts and bits attributes. now that
string-char has been removed, and you claim strings have to contain the
whole character type, and not only a subtype, as I claim, the string that
used to be able to contain only the code attribute, now has to be able to
contain characters _with_ implementation-defined attributes, as well.
this is NOT A QUIET CHANGE to the implementation -- it has a really major
impact on system storage requirements. this fact, however, is recognized
in the reader for strings (which may dump attributes at will, however
they wound up in the characters read from an input stream) and intern
(which may also dump them at will, regardless of how they could get into
the string to begin with).

clearly, you don't understand the implications of what you interpret the
standard to say if you don't understand that it forces an implementation
to _add_ support for a feature the standard effectively deprecates.

#:Erik