
what is false


Tom Lord

Dec 8, 2002, 3:38:58 AM

So, let's say I'm going to implement a Schemish language. By which I
mean (roughly):


functions and variables in the same namespace, functions
as first-class values

upward funargs / closures / however you want to say it

call-with-current-continuation

modules not based on symbol identity

a much richer set of primitives and a much richer run-time system


but don't mean:

strict RnRS conformance


Or (again roughly): CL but without compatibility requirements, and not
afraid to pick up a few good things from Scheme.

Now, if I make nil and #f be eq?, then I think my data structures will
be naturally isomorphic with some CL data structures. If I make them
different, then I'm betting CL doesn't matter. If they're the same,
we can pretty much read and write to each other. If they're
different, it's a pain in the ass.

So, with far too little context to make it a meaningful question:
should CL take precedence over the prevailing opinion on c.l.s?
Does CL matter in this regard?

-t

Bruce Hoult

Dec 8, 2002, 3:51:30 AM

In article <uv61925...@corp.supernews.com>,
lo...@emf.emf.net (Tom Lord) wrote:

> So, let's say I'm going to implement a Schemish language.

Stupid question, but why do you need a new language that is almost but
not quite Scheme?

-- Bruce

Tom Lord

Dec 8, 2002, 4:16:11 AM

> Stupid question, but why do you need a new language that is
> almost but not quite Scheme?

Hmm. Stupid answers?:

To write the ultimate emacs.

or

It matches the way I think (and I've been around).


or


The rabbit thesis is cool. RnRS is tiresome.


or


I started (20 years ago) with forth and wound up here.


or


Beauty. Why not?


More serious answer:

Lots of little reasons, plus a trust in my overall aesthetics.

On the narrow #f/nil issue, the c.l.s. crowd, by virtue of their
unanimity, have me convinced to override my intuitions and make
#f and nil distinct. But maybe c.l.l'ers can soften that.

-t


Frank A. Adrian

Dec 8, 2002, 12:52:54 PM

Tom Lord wrote:

> So, with far too little context to make it a meaningful question:
> should CL take precedence over the prevailing opinion on c.l.s?

Why should it? It's a new language. Let a thousand flowers bloom.

> Does CL matter in this regard?

Since you're already signed up to re-invent wheels, why should it?

If you have the cajones to go and throw the world a new language, you should
also have the same to break or maintain whatever current conventions YOU
want. If you don't have the latter, I doubt you have the former. In
short, it's your language - figure out what YOU think is the better or most
aesthetically pleasing and do it. Don't try to start flame wars in order
to have others decide for you and avoid responsibility for what is
ultimately YOUR decision. And if you can't make this very fundamental
decision without a bunch of people chiming in on what THEY think is
correct, I doubt you have the knowledge or background to do what YOU are
setting off to do.

Yes, I know the above message is harsh, but by choosing to offer a gift to
the world, you are choosing a harsh life. Get on with it or give it up.

faa

Tom Lord

Dec 8, 2002, 7:04:28 PM

> Don't try to start flame wars in order to have others decide for you
> and avoid responsibility for what is ultimately YOUR decision.

Some people like to flame me, but that isn't my goal.


[Have the balls to just do what you think is right.]

Right. I think I do.

But the #f/nil question comes down to intuition. I'm all for the
synergies that arise from the consistent application of a single
intuition, but the average of a hundred intuitions is certainly
interesting to look at, especially when the votes come out 99:1.
I asked here wondering whether c.l.s. had just been a selection
error. "Nobody knows anything but collectively we know everything" or
something like that.

> Yes, I know the above message is harsh,

Weird. It didn't seem that way to me. It seemed interesting, but
maybe a touch more condescending than was needed. I think that if
engineers were partitioned by mapping their communication styles to
star trek races, I'd probably be a klingon. "Harsh", by federation
standards, is merely "direct", by mine.

> Let a thousand flowers bloom.

Si.


"Stay. Or go. But do it because ..."
-t

(Star Trek references? I should probably be embarrassed.)

Frank A. Adrian

Dec 8, 2002, 8:29:12 PM

Tom Lord wrote:

> But the #f/nil question comes down to intuition.

Although I believe strongly in the value of intuition, I do not believe this
is the case. I believe that having a definite domain for booleans vs. the
empty list has some definite advantages and disadvantages. What is decided
with respect to this point depends on what the designer (or those he
believes will use the language) values.

If you wish to open the controversy further, you might also want to conflate
0 with all of these values, as was done in QKS Smalltalk with 0, nil, and
false, a la C-family languages (but remember that Smalltalk does not have
dotted lists, so the problem of representing (x . 0) or (x . false) does
not occur; you also might want to remove these so you could do the same).

In any case, I believe that the decision will have more to do with the value
of pragmatism and usability vs. some value of rigorous mathematical
cleanliness and (perhaps) a slight ease of learning.

faa

Kenny Tilton

Dec 8, 2002, 8:49:12 PM

Tom Lord wrote:
> So, let's say I'm going to implement a Schemish language. By which I
> mean (roughly):
>
>
> functions and variables in the same namespace, functions
> as first-class values
>
> upward funargs / closures / however you want to say it
>
> call-with-current-continuation
>
> modules not based on symbol identity
>
> a much richer set of primitives and a much richer run-time system
>
>
> but don't mean:
>
> strict RnRS conformance
>
>
> Or (again roughly): CL but without compatibility requirements, and not
> afraid to pick up a few good things from Scheme.
>
> Now, if I make nil and #f be eq?, then I think my data structures will
> be naturally isomorphic with some CL data structures.

You did not mention that (or the intercommunication you mentioned next)
as design goals, yet you raise it as a consideration, so... maybe decide
first if it is a design goal, then do the Right Thing.

> If I make them
> different, then I'm betting CL doesn't matter.

fyi, I do not understand that last sentence. Doesn't matter for what?

> If they're the same,
> we can pretty much read and write to each other. If they're
> different, it's a pain in the ass.

Really sounding like a design goal, now. :)

>
> So, with far too little context to make it a meaningful question:
> should CL take precedence over the prevailing opinion on c.l.s?
> Does CL matter in this regard?

Well, what do /you/ think of nil qua false qua nil? I see why Schemers
differentiate them, but the practical value of nil qua false is
enormous, and in my experience that tells me that somewhere beyond my
poor powers to conceive, I (and the Schemers) are wrong: nil and false
are one. As in, maybe false and true are two /measures/ of the
quantifiable "truth" which comes in two quantities: none and all (gray
areas are a gray area).

Anyway, it is not about whether you want to bow to Scheme or CL on this
score. Think about the design goals, and if they are neutral or evenly
balanced on this, just do what you think is right. my 2.

--

kenny tilton
clinisys, inc
---------------------------------------------------------------
"Well, I've wrestled with reality for thirty-five years, Doctor,
and I'm happy to state I finally won out over it."
Elwood P. Dowd

Jeremy H. Brown

Dec 8, 2002, 9:13:42 PM

lo...@emf.emf.net (Tom Lord) writes:
> Now, if I make nil and #f be eq?, then I think my data structures will
> be naturally isomorphic with some CL data structures. If I make them
> different, then I'm betting CL doesn't matter. If they're the same,
> we can pretty much read and write to each other. If they're
> different, it's a pain in the ass.
>
> So, with far too little context to make it a meaningful question:
> should CL take precedence over the prevailing opinion on c.l.s?
> Does CL matter in this regard?

Handwaving wildly here: consider making #f, '(), 0, etc. distinct
objects under the hood. But also give yourself hooks for dynamically
changing the behavior of eq? when applied to them (and also the
behavior of if, and, or, etc.), perhaps using a special variable; you
could then just tweak the variable when calling into CL-style code.

You haven't said how much you care about performance issues; if you're
out to be the fastest gun in the west, this may not be an acceptable
answer.

I forget if you've addressed this already, but have you decided how to
answer the convenience-vs-precision question in other places where
Scheme and CL differ? E.g. is (car '()) an error or just '()?
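For concreteness, this is how the two camps answer that particular question (standard behavior: CL defines CAR and CDR of NIL, while in Scheme taking the car of a non-pair is an error):

```lisp
;; Common Lisp: CAR and CDR of NIL are defined to return NIL.
(car nil)   ; => NIL
(cdr nil)   ; => NIL

;; Scheme, by contrast: (car '()) is an error -- '() is not a pair,
;; and a checking implementation will signal an error rather than
;; quietly return '().
```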

Jeremy

Drew McDermott

Dec 8, 2002, 11:23:26 PM

Tom Lord wrote:

>
>
> So, with far too little context to make it a meaningful question:
> should CL take precedence over the prevailing opinion on c.l.s?
> Does CL matter in this regard?

I prefer CL to Scheme overall, but I think Scheme is right and CL is
wrong on this issue. See http://www.cs.yale.edu/homes/dvm/nil.html for
a fuller explanation.

-- Drew McDermott

Adam Warner

Dec 9, 2002, 12:07:09 AM

Hi Drew McDermott,

Discussing your reference: Of course nil is overloaded. That's what makes
it so incredibly convenient.

What's the significance of the first issue you raise? nil and '() cannot
be printed in a consistently different way because they are equivalent. So
you argue for them not being equivalent by saying that if they were not
equivalent you would be able to tell them apart.

So you then go on to say "I will now describe the set of tricks I have
adopted to work around these problems, to the extent they can be worked
around."

Drew, you have created a solution in search of a problem. You tell us
that the underlying problem is "there simply is no unambiguous
representation, readable or otherwise, of false that distinguishes it from
the empty list."

Provide some examples for how your "Scheme" benefits in the production of
clearer, more concise, precise and correct code. That will provide some
evidence why "Scheme is right and CL is wrong on this issue." As it stands
you've just made a claim without providing any supporting evidence.

Regards,
Adam

Drew McDermott

Dec 9, 2002, 9:59:49 AM

Adam Warner wrote:

>
> Drew, you have created a solution in search of a problem. You tell us
> that the underlying problem is "there simply is no unambiguous
> representation, readable or otherwise, of false that distinguishes it from
> the empty list."
>
> Provide some examples for how your "Scheme" benefits in the production of
> clearer, more concise, precise and correct code. That will provide some
> evidence why "Scheme is right and CL is wrong on this issue." As it stands
> you've just made a claim without providing any supporting evidence.

I said in my little screed that "...nil is over-overloaded in Common
Lisp. This probably doesn't matter to most people, but I am interested
in the subject of type-checking CL code, and nil is almost
un-type-checkable."

I grant you that if the original poster (Tom Lord) doesn't care about
type checking, then my article will carry no weight, or might even
persuade him to handle NIL the CL way.

However, even if you think you don't care about type checking, the
question is whether you think it is really cool to write

(defun aify (lst)
  (cond (lst (cons 'a (aify (cdr lst))))
        (t 'a)))

instead of

(defun aify (lst)
  (cond ((null lst) 'a)
        (t (cons 'a (aify (cdr lst))))))

The first version obscures whether 'lst' is of type boolean or list. I
would rather be clear about it. Of course, in Lisp there really is no
type 'boolean', so one may choose to declare that the question whether
'lst' is a boolean or a list is meaningless. I used to think that was
cool, but I now think it's just about as lame as C's conflation of
boolean and integer. It is odd that the poor little boolean type so
often gets sucked into some other convenient type, whereupon the
resulting confusion is proclaimed as totally intuitive and elegant. C
programmers think it's obvious that false is 0, Lisp programmers think
it's obvious that false is the empty list, Perl programmers think it's
obvious that false is the empty string (among other things), and so forth.

I don't expect to persuade very many people, and I don't think the issue
is worth arguing much further.

-- Drew McDermott

Tim Bradshaw

Dec 9, 2002, 10:26:43 AM

* Drew McDermott wrote:

> The first version obscures whether 'lst' is of type boolean or list.
> I would rather be clear about it. Of course, in Lisp there really is
> no type 'boolean', so one may choose to declare that the question
> whether 'lst' is a boolean or a list is meaningless.

Well, actually there is a BOOLEAN type, its members are just members
of other types too.
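A quick illustration of this point (standard CL behavior): BOOLEAN is defined as the type (member t nil), so its two members also belong to other types -- both are symbols, and NIL is additionally the empty list.

```lisp
(typep t   'boolean)  ; => T
(typep nil 'boolean)  ; => T
(typep nil 'list)     ; => T   (NIL is a boolean *and* the empty list)
(typep 0   'boolean)  ; => NIL (0 is a true value, but not of type BOOLEAN)
```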

> I used to think
> that was cool, but I now think it's just about as lame as C's
> conflation of boolean and integer.

My guess is that most CL programmers don't think this is lame. I
don't anyway, I *like* it. I find languages with disjoint boolean
types a pain, and I hate the kind of half-disjoint thing that scheme
does (I think: they may have changed it) almost as much: there's a
unique false object, not used for anything else, but anything not it
is true.

I *do* think that there are all sorts of horrors in perl to do with
false, but they are really to do with the kind of awful "0" = 0 = ""
hideousness that perl revels in so much.

So, I can't be having with half-hearted defences of the NIL/()/false
equality in CL: I think it is a GOOD THING, and I think C's 0/false
equivalence is a GOOD THING too. So there.

--tim

Duane Rettig

Dec 9, 2002, 1:00:01 PM

lo...@emf.emf.net (Tom Lord) writes:

> Stupid question, but why do you need a new language that is
> almost but not quite Scheme?

[ ... ]

> More serious answer:
>
> Lots of little reasons, plus a trust in my overall aesthetics.
>
> On the narrow #f/nil issue, the c.l.s. crowd, by virtue of their
> unanimity, have me convinced to override my intuitions and make
> #f and nil distinct.

This isn't consistent. If the unanimity of the Scheme community has
caused you to override your intuitions (rather than change them) then
there is nothing for you to do _except_ to implement another Scheme.

> But maybe c.l.l'ers can soften that.

I would never touch that. In fact, I would go the opposite direction
and encourage you to just do a Scheme.

The first edition of CLtL has a number of Lisp languages which were the
intended targets of consolidation in the Common Lisp effort to bring
the various dialects together into one Lisp language. Scheme was one
of those languages listed on the cover. There was obviously much
contention in the design of CL from users of all of these lisp dialects,
as to what features from various languages would be incorporated.
Obviously, users of each of the contributing languages would have had
a sense of unfairness, that not enough of their own language was
getting into CL. But eventually, the user communities of most of
these languages switched to CL (in fact, I think the Lisp community
surprised itself at the rapidity of CL acceptance; perhaps they were
just very hungry for a standard).

But the Scheme community did not make that change, and did not accept
CL as the new "common" Lisp. Obviously there are fundamental reasons
why they might not, but many of the other languages also had
fundamental reasons why acceptance might not be forthcoming (perhaps
that is why so many were surprised at the quick acceptance of CL).
One of the fundamental issues that blocked the Scheme community from
CL acceptance was the #f/nil issue. Another was the Lisp1/Lisp2 issue,
and there are a few others, including the issue of whether the language
definition should be large or small and whether macros should be
textually oriented or structurally oriented.

Nowadays, there are essentially two camps, the Scheme camp and the
CL camp. If you want to create a third camp based on fundamental
differences, you are of course welcome to do so, but it is an uphill
climb which I wouldn't recommend. Therefore, since you are leaning
toward the Scheme side, I would recommend that you go all the way and
implement a Scheme.

--
Duane Rettig du...@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182

Adam Warner

Dec 9, 2002, 6:41:23 PM

Hi Drew McDermott,

> I grant you that if the original poster (Tom Lord) doesn't care about
> type checking, then my article will carry no weight, or might even
> persuade him to handle NIL the CL way.
>
> However, even if you think you don't care about type checking, the
> question is whether you think it is really cool to write
>
> (defun aify (lst)
>   (cond (lst (cons 'a (aify (cdr lst))))
>         (t 'a)))
>
> instead of
>
> (defun aify (lst)
>   (cond ((null lst) 'a)
>         (t (cons 'a (aify (cdr lst))))))
>
> The first version obscures whether 'lst' is of type boolean or list. I
> would rather be clear about it. Of course, in Lisp there really is no
> type 'boolean', so one may choose to declare that the question whether
> 'lst' is a boolean or a list is meaningless.

So your concern isn't with nil being false, it is with everything else
being true. Such a comparison is a boolean/predicate test and I notice I
use it often as a helpful simplification. For example I can return a
variable that not only allows me to write (when error ...) but also
contains the data itself.
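A minimal sketch of the idiom described here, with an invented LOOKUP-ERROR function (the names are hypothetical, for illustration only): because NIL is false, the returned value serves as both the predicate and the data, with no separate success flag.

```lisp
;; Hypothetical helper: returns a description of the problem, or NIL if none.
(defun lookup-error (alist)
  (cdr (assoc :error alist)))

(let ((err (lookup-error '((:error . "disk full")))))
  (when err                           ; nil-as-false: the test is free
    (format t "failed: ~a~%" err)))   ; the same value carries the data
```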

> I used to think that was cool, but I now think it's just about as lame
> as C's conflation of boolean and integer. It is odd that the poor little
> boolean type so often gets sucked into some other convenient type,
> whereupon the resulting confusion is proclaimed as totally intuitive and
> elegant.

That's because it is. If you really only want a symbol that is used for
nothing but a boolean/predicate test then suffix it with p or -p. There is
no confusion.

Your type safety issue is a red herring since there can only be two
outcomes to a predicate test, false and not-false.

From the Hyperspec:

true n. any object that is not false and that is used to represent the
success of a predicate test

The only thing that is false is the symbol nil:

false n. the symbol nil, used to represent the failure of a predicate
test.

A list type is a correct boolean type because there are only two possible
outcomes from a predicate test. It is type safe.

> C programmers think it's obvious that false is 0, Lisp programmers think
> it's obvious that false is the empty list, Perl programmers think it's
> obvious that false is the empty string (among other things), and so
> forth.

It seems that there are two useful approaches: Have one symbol as false
and denote everything else as true or have one symbol as true and denote
everything else as false. Either are extremely simple and conceptually
elegant.

I'd have to be convinced that it would be good for multiple different
values to simultaneously denote false. It seems like a
kludge arising out of low-level languages. That is, if you're working at
the machine level and you have an integer variable it can't hold a unique
symbol for either false or true. One of the integers has to be used
instead. In recognising this as a machine level kludge I believe the
approach is suboptimal for higher level languages.

> I don't expect to persuade very many people, and I don't think the issue
> is worth arguing much further.

Since you title your article "NIL Considered Harmful" (there could hardly
be a more provocative swipe at Common Lisp) I expect you were looking to
persuade people and do consider the issue worth arguing.

Regards,
Adam

Coby Beck

Dec 9, 2002, 7:10:09 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message
news:pan.2002.12.09....@consulting.net.nz...
> Hi Drew McDermott,

>
> > I used to think that was cool, but I now think it's just about as lame
> > as C's conflation of boolean and integer. It is odd that the poor little
> > boolean type so often gets sucked into some other convenient type,
> > whereupon the resulting confusion is proclaimed as totally intuitive and
> > elegant.
>
> That's because it is.

I happen to agree with you, Adam, but I think you are not acknowledging that
this is just our (correct ;) opinion. It was very intuitive for me when C
was my main way of thinking that 0 was false. Indeed I remember running in
alarm to my Lisp professor when (or 0 0) returned t (that was ACL 3.0, and I
think it really did return t, not 0). (eq 0 nil) is no longer unintuitive
for me.

> If you really only want a symbol that is used for
> nothing but a boolean/predicate test then suffix it with p or -p. There is
> no confusion.
>
> Your type safety issue is a red herring since there can only be two
> outcomes to a predicate test, false and not-false.

What's wrong with true or not-true? Or true or false, with nothing in
between, as an acceptable answer.

>
> From the Hyperspec:

I don't think the issue is how things work in CL. I can understand other
ways of doing it, say boolean is either t or nil and nothing else is the
correct type. I can see people's point when they say FALSE and () should
not be eq. I have even sometimes growled at my code because "" was not
false.

But in the end, I personally quite like it all the way it is.

One CL thing I'll say was always unintuitive to me was the non-nil return
value of MEMBER. I still never feel warm and cozy when using the nthcdr it
returns; it always seems like relying on a quirky behind-the-scenes trick (a
bit).

> > I don't expect to persuade very many people, and I don't think the issue
> > is worth arguing much further.
>
> Since you title your article "NIL Considered Harmful" (there can be no
> less provocative swipe at Common Lisp) I expect you were looking to
> persuade people and do consider the issue worth arguing.

FWIW, I didn't really feel provoked.

--
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")


Adam Warner

Dec 9, 2002, 8:18:34 PM

Hi Coby Beck,

> "Adam Warner" <use...@consulting.net.nz> wrote in message
> news:pan.2002.12.09....@consulting.net.nz...
>> Hi Drew McDermott,
>>
>> > I used to think that was cool, but I now think it's just about as
>> > lame as C's conflation of boolean and integer. It is odd that the
>> > poor little boolean type so often gets sucked into some other
>> > convenient type, whereupon the resulting confusion is proclaimed as
>> > totally intuitive and elegant.
>>
>> That's because it is.
>
> I happen to agree with you, Adam, but I think you are not acknowledging
> that this is just our (correct ;) opinion. It was very intuitive for me
> when C was my main way of thinking that 0 was false.

And I indeed believe it is intuitive and appropriate--for a low-level
language.

> Indeed I remember running in alarm to my Lisp professor when (or 0 0)
> returned t (that was ACL 3.0, and I think it really did return t, not 0).
> (eq 0 nil) is no longer unintuitive for me.
>
>> If you really only want a symbol that is used for nothing but a
>> boolean/predicate test then suffix it with p or -p. There is no
>> confusion.
>>
>> Your type safety issue is a red herring since there can only be two
>> outcomes to a predicate test, false and not-false.
>
> What's wrong with true or not-true?

Nothing. Notice I wrote: "It seems that there are two useful approaches:
Have one symbol as false and denote everything else as true or have one
symbol as true and denote everything else as false. Either are extremely
simple and conceptually elegant."

> Or true or false and there is no in between as an acceptable answer.

To make predicate tests conform to only two input states is acceptable but
appears to needlessly bloat code. If there really was a type safety issue
and CL people were running around creating broken code because of this I
could understand the concern.

>> From the Hyperspec:
>
> I don't think the issue is how do things work in CL. I can understand
> other ways of doing it, say boolean is either t or nil and nothing else
> is the correct type. I can see people's point when they say FALSE and
> () should not be eq. I have even sometimes growled at my code because
> "" was not false.

Thankfully Lisp is designed so that you can concatenate nil (sequences)
when creating a string. So you can set an empty string to nil and still be
able to concatenate the result without code bloat.
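For example (standard CL behavior): NIL is a sequence of length zero, so CONCATENATE accepts it anywhere a sequence argument is expected, and it simply contributes nothing to the result.

```lisp
(concatenate 'string "foo" nil "bar")  ; => "foobar"
;; A "missing" string represented as NIL needs no special-casing here.
```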

> But in the end, I personally quite like it all the way it is.

One could say that this particular annoyance ("" not false) is not enough
to outweigh the benefit of a single false value throughout the
specification.

> One CL thing I'll say was always unintuitive to me was the non-nil
> return value of MEMBER. I still never feel warm and cozy when using the
> nthcdr it returns, it always seems like relying on a quirky
> behind-the-scenes trick (a bit)

I don't have any experience with using member. But this does look quirky:
(if (member nil nil) "true" "false")
"false"

Or:
(if (member '() '()) "true" "false")
"false"

That is, the empty list is not a member of the empty list.

(if (member '() '(nil)) "true" "false")
"true"

But an empty list is a member of a non-empty list.

Of course if I rephrase it as "is it true that the above list contains as one
of its members the empty list" then the outcome sounds reasonable.

Regards,
Adam

Coby Beck

Dec 9, 2002, 9:13:32 PM

"Adam Warner" <use...@consulting.net.nz> wrote in message
news:pan.2002.12.10....@consulting.net.nz...
> Hi Coby Beck,

>
> I don't have any experience with using member. But this does look quirky:
> (if (member nil nil) "true" "false")
> "false"
>
> Or:
> (if (member '() '()) "true" "false")
> "false"

Not really:

CL-USER 92 > (member 'anything nil)
NIL


>
> That is the empty list is not a member of the empty list.
>
> (if (member '() '(nil)) "true" "false")
> "true"
>
> But an empty list is a member of a non-empty list.

Not every non-empty list:

CL-USER 97 > (member nil '(a b c d))
NIL


>
> Of course if I rephrase it as "it true that the above list contains as one
> of its members the empty list" then the outcome sounds reasonable.

That is indeed the question member answers.

CL-USER 93 > (member nil '(a b nil d))
(NIL D)

CL-USER 94 > (member 'a '(a b nil d))
(A B NIL D)

CL-USER 95 > (member 'b '(a b nil d))
(B NIL D)

CL-USER 96 > (member 'c '(a b nil d))
NIL

Alain Picard

Dec 10, 2002, 6:24:35 AM

"Adam Warner" <use...@consulting.net.nz> writes:

> Or:
> (if (member '() '()) "true" "false")
> "false"
>
> That is the empty list is not a member of the empty list.

I should hope not! I can just see the printer trying to
print NIL:
Ok, I'll print ()...
No wait! I'll print (())...
No wait! I'll print ((()))....

:-)

Tim Bradshaw

Dec 10, 2002, 6:37:26 AM

* Adam Warner wrote:

> That is the empty list is not a member of the empty list.

Yes, of course. The empty list is the set with *no* members, not the
set with the empty set as a member, which is a set with one member.

--tim

Paul Dietz

Dec 10, 2002, 10:31:10 AM

> > Indeed I remember running in alarm to my Lisp professor when (or 0 0)
> > returned t (that was ACL 3.0, and I think it really did return t, not 0).
> > (eq 0 nil) is no longer unintuitive for me.
> >
> >> If you really only want a symbol that is used for nothing but a
> >> boolean/predicate test then suffix it with p or -p. There is no
> >> confusion.
> >>
> >> Your type safety issue is a red herring since there can only be two
> >> outcomes to a predicate test, false and not-false.
> >
> > What's wrong with true or not-true?
>
> Nothing. Notice I wrote: "It seems that there are two useful approaches:
> Have one symbol as false and denote everything else as true or have one
> symbol as true and denote everything else as false. Either are extremely
> simple and conceptually elegant."


Having (OR 0 0) return T would be in violation of the CL specification,
and also in violation of what the OR macro has been doing in most lisps
for decades. I'd be very surprised if ACL did that.
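To spell out what the specification requires here: OR returns the value of its first subform that evaluates to true, and 0 is a true value in CL, so the result is the number itself, never T.

```lisp
(or 0 0)      ; => 0 (not T)
(or nil 0)    ; => 0
(or nil nil)  ; => NIL
```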

BTW, one error I made in understanding CL was assuming that the built-in
CL predicates return T for true. For most of them there is no requirement
that they do so. Indeed, there's no requirement that they return the
same true value! That is, (EQ (EQL X Y) (EQL X Y)) needn't be true, as
far as I can tell. In practice most lisps will return T for true in most
places; it's a shame the standard didn't require this in more places.

There would be a slight efficiency advantage to having false == 0 in
the usual lisp implementations, since obtaining a nonzero NIL for comparisons
requires either extra instructions or consumes a register. I doubt this
is significant, or worth any extra difficulty it might cause the programmer,
however.

Paul

Edi Weitz

Dec 10, 2002, 12:44:28 PM

Paul Dietz <paul.f...@motorola.com> writes:

> BTW, one error I made in understanding CL was assuming that the
> built in CL predicates return T for true. For most of them there is
> no requirement that they do so. Indeed, there's no requirement that
> they return the same true value! That is, (EQ (EQL X Y) (EQL X Y))
> needn't be true, as far as I can tell. In practice most lisps will
> return T for true in most places; it's a shame the standard didn't
> require this in more places.

Is this really an issue? I'd think that something like this (untested)

(defun same-truth-value (v1 v2)
  ;; (NOT X) is /always/ T or NIL according to CLHS
  (eq (not v1) (not v2)))

(same-truth-value (eql x y) (eql x y)) ---> always T

should work. Or am I missing something?

Edi.

Kent M Pitman

Dec 10, 2002, 2:06:26 PM

Edi Weitz <e...@agharta.de> writes:

There was a public review comment during standardization that came from
Boyer (of Boyer/Moore) and John McCarthy, if I recall correctly.
The committee replied that they should do exactly as you suggest.

They were of the misimpression that ANSI CL, then only dpANS CL, had
changed this behavior from CLTL. In fact, I had only changed the
presentation terminology to make it more apparent what Steele had done
without objection (perhaps BECAUSE it was more obscurely presented)
years before.

As an aside, it's probably worth the potential efficiency gain of doing
(declaim (inline same-truth-value))
before the DEFUN for it, so that it can be inlined reasonably. There's
quite a good chance that at least the call to EQ and perhaps even the
two calls to NOT will be open coded.
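Putting that suggestion together with the definition upthread gives a sketch like the following (whether the calls are actually open-coded is, of course, implementation-dependent):

```lisp
(declaim (inline same-truth-value))

(defun same-truth-value (v1 v2)
  ;; NOT canonicalizes any generalized boolean to exactly T or NIL,
  ;; so EQ then compares two well-defined objects.
  (eq (not v1) (not v2)))

(same-truth-value 0 t)    ; => T   (both are true values)
(same-truth-value 0 nil)  ; => NIL
```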

Erik Naggum

Dec 10, 2002, 2:30:15 PM

* Paul Dietz

| Having (OR 0 0) return T would be in violation of the CL
| specification, and also in violation of what the OR macro has been
| doing in most lisps for decades. I'd be very surprised if ACL did
| that.

It was my experience that the version of Allegro CL known as "ACL
3.0 for Windows" was a very strange version of Common Lisp. I had
tried to use it for a project, but when there were so many oddities
that I spent more time trying to figure out which language it tried
to implement than to implement my own application, I dropped it.

| In practice most lisps will return T for true in most places; it's
| a shame the standard didn't require this in more places.

Hm. I did not follow your reasoning here. Why is it a shame? By
the way, which operators return a `boolean´ as opposed to only a
"generalized boolean"? The obvious choice is of course `not´.

| There would be a slight efficiency advantage to having false == 0
| in the usual lisp implementations, since obtaining a nonzero NIL
| for comparisons requires either extra instructions or consumes a
| register.

If you use a register for `nil´, you can exploit that in ways that
are quite a lot more beneficial than the savings of using 0.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Paul Dietz

Dec 10, 2002, 2:52:55 PM
Edi Weitz wrote:

> Or am I missing something?

Just the point I was trying to make.

Paul

Paul Dietz

Dec 10, 2002, 3:15:23 PM
Erik Naggum wrote:

> | In practice most lisps will return T for true in most places; it's
> | a shame the standard didn't require this in more places.
>
> Hm. I did not follow your reasoning here. Why is it a shame?

Because there is little or no cost to adding this to the language
specification, and it makes it easier for users to avoid potential portability
problems. If users depend on a de facto standard (that these functions
do, in fact, return T for true) then I'd like that to be an actual
standard behavior.

It's not a very strong critique; many existing lisps are not ANSI compliant
anyway, so you have much more serious potential portability problems.


> By
> the way, which operators return a `boolean´ as opposed to only a
> "generalized boolean"? The obvious choice is of course `not´.

The ALWAYS and NEVER termination test clauses of LOOP return T on success.
Aside from NOT (and NULL), I don't know of any others.


> If you use a register for `nil´, you can exploit that in ways that
> are quite a lot more beneficial than the savings of using 0.

Quite true.

Paul

Duane Rettig

Dec 10, 2002, 5:00:02 PM
Paul Dietz <paul.f...@motorola.com> writes:

> Erik Naggum wrote:
>
> > By
> > the way, which operators return a `boolean´ as opposed to only a
> > "generalized boolean"? The obvious choice is of course `not´.
>
> The ALWAYS and NEVER termination test clauses of LOOP return T on success.
> Aside from NOT (and NULL), I don't know of any others.

Implementationally, I believe that this is as it should be. It is
implementationally easier to generate a generalized boolean than a
boolean, but the specification of NOT/NULL turning a generalized
boolean into a boolean takes care of the specialized situations where
one needs T or NIL (which really aren't that often, since most true
or false arguments to CL functions are required to be generalized
booleans).

Also, implementing (not (not (<expression>))), or (not (null (<expression>)))
is very inexpensive.

Consider:

(when <pred>
  <conseq>)

The generated code generates both the predicate and the consequent, and
between them places a "jump if result not NIL" instruction whose target
is the code just after the consequent. Now consider:

(when (not <pred>)
  <conseq>)

which is the same as

(unless <pred>
  <conseq>)

In this case, a "jump if result is NIL" instruction is placed after
the predicate. A decent CL compiler will simply reverse the sense
of the conditional jump instruction for each level of negation, for as
many NOTs or NULLs are wrapped around the predicate.

Now, finally, consider a non-predicate situation:

(setq x (member y z))

The assignment of the result to x is direct; so what is returned
from the function (MEMBER, in this case) is significant. Suppose we
want T or NIL from this:

(setq x (null (member y z)))

In this case, the compilation of NOT must create two branches, one for
the T result and one for the NIL result. It is equivalent to

(setq x (if (member y z) nil t))

But note that this result is negated, and we might want the positive
logic as the result:

(setq x (not (null (member y z))))

Which is equivalent to

(setq x (if (member y z) t nil))

This compiles to no more code than the previous example, because the
only real expense is in the final generation of the boolean values to
return.

Duane Rettig

Dec 10, 2002, 5:00:02 PM
Duane Rettig <du...@franz.com> writes:

> The generated code generates both the predicate and the consequent, and
> between them places a "jump if result not NIL" instruction whose target
> is the code just after the consequent. Now consider:

This is poorly worded. I should have said "The compiler generates code
for both the ..."

Erik Naggum

Dec 10, 2002, 5:34:53 PM
* Paul Dietz

| It's not a very strong critique; many existing lisps are not ANSI
| compliant anyway, so you have much more serious potential
| portability problems.

If I read you right, the effect of specifying a boolean instead of
generalized boolean value would only have been to create more
non-conforming implementations...

I used to believe that people (both programmers and vendors) would
value conformance to the standard and based much of my effort on
this premise, but it appears that programmers are either clueless
or know how to circumvent the shortcomings, and few vendors give a
damn; this has, in turn, dampened my enthusiasm for the vendors,
the language, and the community alike.

Erik Naggum

Dec 10, 2002, 6:08:44 PM
* Kent M Pitman

| There as a public review comment when standardizing that came from
| Boyer (of Boyer/Moore) and John McCarthy, if I recall correctly.
| The committee replied that they should do exactly as you suggest.

But `eq´ is not guaranteed to return a boolean. It may seem rather
silly for it to return anything else, but the specification is very
clear that it does not, in fact, return `t´ when it means true. So

(defun same-truth-value (v1 v2)
  (not (not (eq (not v1) (not v2)))))

will return `t´ when the two are the same truth value.

Now, the much more important question is why anyone would care what
the specific "true" value is. Nobody tests for anything but `nil´,
anyway, and writing code that tests for any /particular/ true value
is broken.

Incidentally, since machine operations do not return `nil´ and `t´
but set some CPU flag, the above function will effectively cost at
least one conditional, so a version of `same-truth-value´ that
returned a generalized boolean could also be written as

(defun same-truth-value (v1 v2)
  (if v1 v2 (not v2)))

and if a true boolean is necessary, use (and v2 t) instead of v2.
In natively compiled Common Lisp implementations, this is much more
efficient, too, but when expressed this way, there is seldom a need
for a function for it.

Also note that the standard name for this boolean operator is `eqv´,
cf `logeqv´ and `boole-eqv´.
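A quick sanity check that the strict-boolean and generalized-boolean formulations agree (untested sketch, not from the original posts; the STV- names are made up):

```lisp
;; Both formulations of SAME-TRUTH-VALUE, checked against each other
;; over a few representative true and false values.
(defun stv-strict (v1 v2)
  (not (not (eq (not v1) (not v2)))))

(defun stv-general (v1 v2)
  (if v1 v2 (not v2)))

(loop for v1 in '(nil t 0 "x")
      do (loop for v2 in '(nil t 0 "x")
               ;; Coerce the generalized result with (AND ... T) before
               ;; comparing, as suggested above.
               do (assert (eq (stv-strict v1 v2)
                              (and (stv-general v1 v2) t)))))
```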

Paul F. Dietz

Dec 10, 2002, 9:23:59 PM
Duane Rettig wrote:

> Implementationally, I believe that this is as it should be. It is
> implementationally easier to generate a generalized boolean than a
> boolean,

Ok. For example, it's often slightly easier to generate a small
fixnum than it would be to load the address (+ mark bits) of the T symbol.

But then why does (eq 'x 'x) return T in ACL? :)

Paul

Duane Rettig

Dec 11, 2002, 1:00:01 AM
"Paul F. Dietz" <di...@dls.net> writes:

> Duane Rettig wrote:
>
> > Implementationally, I believe that this is as it should be. It is
> > implementationally easier to generate a generalized boolean than a
> > boolean,
>
>
> Ok. For example, it's often slightly easier to generate a small
> fixnum than it would be to load the address (+ mark bits) of the T symbol.

Not necessarily. T is only one memory reference, and that location is
likely to be in cache. It also tends not to have any pipeline or
functional unit conflicts, because the memory reference is coming just
after a comparison operation and a possible jump. It is usually actually
more consistent to just load T than to try to calculate some new value,
unless that value is in fact the result of some calculation already known
to be true. If that value is numeric, you then have to load NIL on a false
result anyway.

> But then why does (eq 'x 'x) return T in ACL? :)

Um, er, ah, ... Because it can? :-)


I'm actually working on an answer to a question Erik asked earlier.
I'll answer seriously there.

Duane Rettig

Dec 11, 2002, 2:00:01 AM
Erik Naggum <er...@naggum.no> writes:

> Now, the much more important question is why anyone would care what
> the specific "true" value is. Nobody tests for anything but `nil´,
> anyway, and writing code that tests for any /particular/ true value
> is broken.

In a GC which uses a write-barrier for new objects, T and NIL can
usually be guaranteed to be old and thus the write-barrier can be elided.
Thus, if a generalized boolean can be guaranteed by the implementation
to only return T or NIL, even though it might be allowed to return
something else for true, it can then lead to more efficient code.

Unfortunately, I lost track of this optimization, because of a fluke
in the naming (we had implemented a comp::boolean type for the compiler,
which didn't get fully translated when cl:boolean was exported as a
late entry in the CL spec). I've now recovered the optimization,
and once again my devel version implements (setf x (the boolean y))
efficiently.

Bruce Hoult

Dec 11, 2002, 3:09:44 AM
In article <3DF608BE...@motorola.com>,
Paul Dietz <paul.f...@motorola.com> wrote:

> There would be a slight efficiency advantage to having false == 0 in
> the usual lisp implementations, since obtaining a nonzero NIL for comparisons
> requires either extra instructions or consumes a register. I doubt this
> is significant, or worth any extra difficulty it might cause the programmer,
> however.

Depends on how the tags work in each implementation, but having NIL be
at machine address zero is about equally as likely to be optimal.

-- Bruce

Paul F. Dietz

Dec 11, 2002, 6:07:32 AM
Bruce Hoult wrote:

> Depends on how the tags work in each implementation, but having NIL be
> at machine address zero is about equally as likely to be optimal.

But that means symbols need to be the objects with zero tag bits,
which slows down fixnum arithmetic.

Paul

Paul F. Dietz

Dec 11, 2002, 6:31:41 AM
Duane Rettig wrote:

>>Ok. For example, it's often slightly easier to generate a small
>>fixnum than it would be to load the address (+ mark bits) of the T symbol.
>
> Not necessarily. T is only one memory reference, and that location is
> likely to be in cache. It also tends not to have any pipeline or
> functional unit conflicts, because the memory reference is coming just
> after a comparison operation and a possible jump. It is usually actually
> more consistent to just load T than to try to calculate some new value,
> unless that value is in fact the result of some calculation already known
> to be true. If that value is numeric, you then have to load NIL on a false
> result anyway.

Well, here's an example. Generating a 0 appears to be slightly more
compact than loading T. (ACL 6.2 Linux x86 trial; optimization settings:
(speed 3) (safety 0) (space 0) (debug 0))

CL-USER(22): (defun foo (x y) (eq x y))
FOO
CL-USER(23): (defun bar (x y) (if (eq x y) 0 nil))
BAR
CL-USER(24): (compile 'foo)
FOO
NIL
NIL
CL-USER(25): (compile 'bar)
BAR
NIL
NIL
CL-USER(26): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals:

;; code start: #x7149b9a4:
0: 3b c2 cmpl eax,edx
2: 74 07 jz 11
4: 8b c7 movl eax,edi
6: f8 clc
7: 8b 75 fc movl esi,[ebp-4]
10: c3 ret
11: 8b 47 e7 movl eax,[edi-25] ; T
14: f8 clc
15: eb f6 jmp 7
17: 90 nop
CL-USER(27): (disassemble 'bar)
;; disassembly of #<Function BAR>
;; formals:

;; code start: #x7149cb04:
0: 3b c2 cmpl eax,edx
2: 75 07 jnz 11
4: 33 c0 xorl eax,eax
6: f8 clc
7: 8b 75 fc movl esi,[ebp-4]
10: c3 ret
11: 8b c7 movl eax,edi
13: f8 clc
14: eb f7 jmp 7
CL-USER(28):

Paul

Raymond Toy

Dec 11, 2002, 10:14:43 AM
>>>>> "Paul" == Paul F Dietz <di...@dls.net> writes:

Paul> Duane Rettig wrote:

Paul> Well, here's an example. Generating a 0 appears to be slightly more
Paul> compact than loading T. (ACL 6.2 Linux x86 trial; optimization settings:
Paul> (speed 3) (safety 0) (space 0) (debug 0))

But would the answer have been different if you ran this test on a
Sparc? At least with CMUCL foo is

CDC: CMP %A1, %A2 ; No-arg-parsing entry point
CE0: BPEQ %ICC, L1
CE4: NOP
CE8: MOV %NULL, %A0
CEC: L0: MOV %CFP, %CSP
CF0: MOV %OCFP, %CFP
CF4: J %LRA+5
CF8: MOV %LRA, %CODE
CFC: L1: ADD %NULL, 28, %A0 ; T
D00: BP %ICC, L0
D04: NOP

and bar is

9C: CMP %A1, %A2 ; No-arg-parsing entry point
A0: BPEQ %ICC, L1
A4: NOP
A8: MOV %NULL, %A0
AC: L0: MOV %CFP, %CSP
B0: MOV %OCFP, %CFP
B4: J %LRA+5
B8: MOV %LRA, %CODE
BC: L1: MOV %ZERO, %A0
C0: BP %ICC, L0
C4: NOP

The only difference is the instruction at L1. Instead of loading a
zero to %A0, a small constant is added to the %NULL register (which is
NIL) to create T.

Ray

Duane Rettig

Dec 11, 2002, 1:00:01 PM
"Paul F. Dietz" <di...@dls.net> writes:

> Duane Rettig wrote:
>
> >>Ok. For example, it's often slightly easier to generate a small
> >>fixnum than it would be to load the address (+ mark bits) of the T symbol.
> > Not necessarily. T is only one memory reference, and that location
> > is likely to be in cache. It also tends not to have any pipeline or
> > functional unit conflicts, because the memory reference is coming just
> > after a comparison operation and a possible jump. It is usually actually
> > more consistent to just load T than to try to calculate some new value,
> > unless that value is in fact the result of some calculation already known
> > to be true. If that value is numeric, you then have to load NIL on a false
> > result anyway.
>
> Well, here's an example. Generating a 0 appears to be slightly more
> compact than loading T. (ACL 6.2 Linux x86 trial; optimization settings:
> (speed 3) (safety 0) (space 0) (debug 0))

Yes, it does generate a two-byte instruction instead of a three-byte
instruction on the x86 (but, as Raymond pointed out, not on other
architectures). However, the savings are small - less than 3% in this
very trivial case, which means that real applications will see even less
advantage. As I said earlier, and your example code shows, what could have
been a disadvantage via a memory reference is mitigated by the lack of any
neighboring pipeline-killing instructions - the memory reference completes
when it completes, there is no immediate usage of the target eax register,
and thus the instruction locus continues on with little delay.

Also, in a CL that provides a foreign-function interface, 0 is an
especially bad value to use for <true>, because when a foreign argument
is declared to be type boolean, the translation semantics become
confusing for 0, which is true in lisp and false in C.

Kent M Pitman

Dec 11, 2002, 1:11:14 PM
Duane Rettig <du...@franz.com> writes:

> Also, in a CL that provides a foreign-function interface, 0 is an
> especially bad value to use for <true>, because when a foreign argument
> is declared to be type boolean, the translation semantics become
> confusing for 0, which is true in lisp and false in C.

Interesting point.

Out of curiosity, is this an actual semantic ambiguity problem or just a
trap waiting to nab people who don't keep contexts straight? It doesn't
seem like there's ever any actual ambiguity, but I can easily see how
human beings could be made to show their potential fallibilities in this
context.

Could we go back and change things, I've advocated the "many false values"
approach that MOO takes. It has a very good feel. I'd like to see
"", 0, 0.0, NIL, #(), and all error objects be false.

But oh well... we have what we have and it's been "good enough" for a
long time. I'm content. Every strategy has its advantages and
disadvantages. Even if I got what I wanted, we'd be discussing the
"good ole days" when there was only one truth value because of reasons
we presently take for granted... You never get anything for free.

Kenny Tilton

Dec 11, 2002, 1:56:20 PM

Kent M Pitman wrote:
> Could we go back and change things, I've advocated the "many false values"
> approach that MOO takes. It has a very good feel. I'd like to see
> "", 0, 0.0, NIL, #(), and all error objects be false.

There's a blonde joke here somewhere.

:)

--

kenny tilton
clinisys, inc
http://www.tilton-technology.com/
---------------------------------------------------------------
"Cells let us walk, talk, think, make love and realize the bath water is
cold."
-- Lorraine Lee Cudmore

Duane Rettig

Dec 11, 2002, 3:00:01 PM
Kent M Pitman <pit...@world.std.com> writes:

> Duane Rettig <du...@franz.com> writes:
>
> > Also, in a CL that provides a foreign-function interface, 0 is an
> > especially bad value to use for <true>, because when a foreign argument
> > is declared to be type boolean, the translation semantics become
> > confusing for 0, which is true in lisp and false in C.
>
> Interesting point.
>
> Out of curiosity, is this an actual semantic ambiguity problem or just a
> trap waiting to nab people who don't keep contexts straight? It doesn't
> seem like there's ever any actual ambiguity, but I can easily see how
> human beings could be made to show their potential fallibilities in this
> context.

Correct. There is no ambiguity. But pity the poor tired C/Lisp programmer
who is spending a late night with a foreign interface: "I don't understand
this at all; when I call propagate_truth_value() directly from C with a 0
argument, I get the right behavior, and the false value is propagated, but
when I call it from Lisp with the same 0 argument, it is doing something
different...". How many times have you stared at code over and over again,
with the answer staring right back at you, and you just _can't_ see it?

> Could we go back and change things, I've advocated the "many false values"
> approach that MOO takes. It has a very good feel. I'd like to see
> "", 0, 0.0, NIL, #(), and all error objects be false.

Ouch; that would be an implementor's nightmare, unless you also then
provided for only one "true" value. Imagine how a test for truth or
falsehood would have to be implemented!
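To illustrate the point, a truth test under such a scheme would have to type-dispatch instead of comparing against a single `nil´ (pure speculation, nothing like this exists in CL; FALSEP is a made-up name):

```lisp
;; Hypothetical FALSEP under the "many false values" proposal:
;; every implicit truth test would have to pay for this dispatch.
(defun falsep (x)
  (typecase x
    (null t)                        ; NIL
    (number (zerop x))              ; 0, 0.0
    (string (zerop (length x)))     ; ""
    (vector (zerop (length x)))     ; #()
    (condition t)                   ; "error objects"
    (t nil)))
```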

> But oh well... we have what we have and it's been "good enough" for a
> long time. I'm content. Every strategy has its advantages and
> disadvantages. Even if I got what I wanted, we'd be discussing the
> "good ole days" when there was only one truth value because of reasons
> we presently take for granted... You never get anything for free.

The grass _looks_ greener on the other side... but is it, really?

Roger Corman

Dec 11, 2002, 3:22:53 PM
On Wed, 11 Dec 2002 05:07:32 -0600, "Paul F. Dietz" <di...@dls.net>
wrote:

Not necessarily. It could be at address 0 but there could still be tag
bits i.e. NIL could be 0x1, 0x2 or 0x3. Loading small constants is a
small instruction (on intel processors anyway).

Corman Lisp just keeps NIL and T in cells indexed off a committed
register: [esi] and [esi + 4]. The esi register is always pointing
here when lisp code is executing. This strategy also works well for
RISC processors--even better, because they usually have more
registers. I don't think there is much faster than loading these,
because they should almost always be in cache and the instructions are
small and well-optimized. Given this, it doesn't matter too much what
representation is used. There is a cost of course in committing a
register (especially on intel with few registers). However, in this
case, that register is used for many, many more uses, so the specific
cost for these particular cells is negligible. In other words, many
other things are stored at offsets from that register, and it works in
conjunction with the processor stack to maintain thread execution
state. This would be needed in any case. For old-school mac
programmers, it's like the old A5-world (where everything was hanging
off the A5 register).

Roger

Hannah Schroeter

Dec 11, 2002, 2:32:36 PM
Hello!

Kenny Tilton <kti...@nyc.rr.com> wrote:

>Kent M Pitman wrote:
>> Could we go back and change things, I've advocated the "many false values"
>> approach that MOO takes. It has a very good feel. I'd like to see
>> "", 0, 0.0, NIL, #(), and all error objects be false.

>There's a blonde joke here somewhere.

Seems Kent has converted to some perl derivative recently *g*

Kind regards,

Hannah.

Thomas Stegen

Dec 11, 2002, 5:00:54 PM
Erik Naggum wrote:
> I used to believe that people (both programmers and vendors) would
> value conformance to the standard and based much of my effort on
> this premise, but it appears that programmers are either clueless
> or know how to circumvent the shortcomings and few vendors give a
> damn, but this has, in turn, dampened my enthusiasm for both
> vendors, language, and community.

The difference is certainly stark when comparing the C community
and the Common Lisp community (as represented by their respective
newsgroups comp.lang.c and comp.lang.lisp). C people are fanatical
about their standard compared to Common Lisp people. I think the
C people do the right thing.

One of the effects of this is that if I write some ISO C code
I can be pretty sure it will compile and run on any C implementation
on the planet. I don't feel that I have this guarantee when
writing Common Lisp. Which is a shame since IMHO Common Lisp is
the better language of the two.

--
Thomas.

Raymond Toy

Dec 11, 2002, 5:57:16 PM
>>>>> "Thomas" == Thomas Stegen <tst...@cis.strath.ac.uk> writes:

Thomas> One of the effects of this is that if I write some ISO C code
Thomas> I can be pretty sure it will compile and run and any C implementation
Thomas> on the planet. I don't feel that I have this guarantee when

But how many different C implementations have you really tried it on?
And on how many different platforms?

Ray

Thomas Stegen

Dec 11, 2002, 6:18:26 PM
Raymond Toy wrote:

> But how many different C implementations have you really tried it on?
> And on how many different platforms?

Not too many (4, I think), but I can name quite a few standard-conforming
implementations.


--
Thomas.

Greg Menke

Dec 11, 2002, 9:39:37 PM

You're kidding, right? Download cryptlib and look at the comments in
the Makefile if you'd like to read some humorous commentary on the
state of C portability between different architectures & compilers.

Highly portable "ISO" code is easy until you start doing complicated
things or actually try to use operating system features, then you
really quickly end up either drowning in homebrewed ifdefs or you sign
your soul over to something like automake/autoconf/configure. And
that's before you even start trying to support different compilers.
Sure, the simple cases are pretty easy - but that's not the point; the
important question is how difficult do the various implementations on
the various architectures make portability in the face of complex
software.

Just try to find highly or even reasonably portable support in C for
things as seemingly simple as managing pathnames or date/time without
tracking down a library that happens to support your target
architectures- make it easy and only include the Unix family in that
list, worry about the various Windows permutations later. Then you'll
really enjoy how portable CL is.

Gregm

Christopher C. Stacy

Dec 11, 2002, 11:10:02 PM
>>>>> On Wed, 11 Dec 2002 22:00:54 +0000, Thomas Stegen ("Thomas") writes:
Thomas> One of the effects of this is that if I write some ISO C code
Thomas> I can be pretty sure it will compile and run and any C
Thomas> implementation on the planet. I don't feel that I have this
Thomas> guarantee when writing Common Lisp. Which is a shame since
Thomas> IMHO Common Lisp is the better language of the two.

I'm not sure why you have that feeling, because if you
write ANSI Common Lisp code, it will compile and run on
any ANSI Common Lisp implementation on the planet.

And you'll be able to write portable code a zillion
times more easily than in C. And you'll know very well
when you're doing something non-standard in Lisp.

I'm under the impression that the Lisp spec is tighter than
the C spec, and I am definitely positive that the Lisp spec
includes a ton more functionality.

There are some implementations of Lisp that are not fully
ANSI compliant, but they tell you that right up front.
There are also C implementations that are not ANSI compliant,
but I am not sure they are widely used any more for mainstream
programming. An interesting question for you would be: why do
you suppose it is possible for some people to get their work done,
writing portable code, when using a non-compliant Lisp implementation,
but that situation does not happen in C? (Part of the answer is
surely that those Lisps are so close to being ANSI compliant that
their users can't tell the difference.)

If you call third-party libraries, they are not ANSI standard.
But you'll find that it's easy to write compatible interface
libraries, so that even your calls to third-party code will
be totally portable. In fact, some people have gone off and
written such compatability libraries and shared them.

But if you stick to ANSI implementations, your code will be portable.
It does not matter what the word size is, or whether the machine has
registers, or what the byte order is, or anything else. I don't know
how you got an impression that is so backwards from reality.

Jochen Schmidt

Dec 12, 2002, 3:37:06 AM
Christopher C. Stacy wrote:

>>>>>> On Wed, 11 Dec 2002 22:00:54 +0000, Thomas Stegen ("Thomas") writes:
> Thomas> One of the effects of this is that if I write some ISO C code
> Thomas> I can be pretty sure it will compile and run and any C
> Thomas> implementation on the planet. I don't feel that I have this
> Thomas> guarantee when writing Common Lisp. Which is a shame since
> Thomas> IMHO Common Lisp is the better language of the two.
>
> I'm not sure why you have that feeling, because if you
> write ANSI Common Lisp code, it will compile and run on
> any ANSI Common Lisp implementation on the planet.

I agree on the point that ANSI CL code is more portable than C code.
But I think there are things that make writing portable ANSI CL code
unnecessarily difficult.

One thing is the LF (Unix) vs. CR (MacOS) vs. CRLF (Windows) line-ending issue.
These differences make read-line and write-line useless for I/O that has to
speak a protocol like HTTP, IRC or most other common internet
text-protocols. If I call read-line on such a source one of the following
will happen:

[xxxxx]abc ; Windows
[xxxxxCR]abc ; Unix
[xxxxx]LFabc ; MacOS

The part in the brackets is what read-line would return and the part after
it is what is left in the stream.

Some implementations of Common Lisp allow to change the lineending style via
external formats - but that is not portable ANSI CL code either.

So it seems the only thing one can do is write one's own line-reader
function in portable ANSI CL (via read-char/unread-char/peek-char).
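Such a reader might look like this (untested sketch; READ-PROTOCOL-LINE is a made-up name, not a standard function):

```lisp
;; Reads a line terminated by LF, CR, or CRLF, consuming the
;; terminator; returns the line without it, or NIL at end of file.
(defun read-protocol-line (stream)
  (loop with line = (make-array 0 :element-type 'character
                                  :adjustable t :fill-pointer 0)
        for char = (read-char stream nil :eof)
        do (case char
             (:eof (return (if (zerop (length line)) nil line)))
             (#\Linefeed (return line))
             (#\Return
              ;; Swallow the LF of a CRLF pair, if present.
              (when (eql (peek-char nil stream nil nil) #\Linefeed)
                (read-char stream))
              (return line))
             (t (vector-push-extend char line)))))
```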

What do others do when faced with this issue?

--
http://www.dataheaven.de

Christopher C. Stacy

Dec 12, 2002, 4:46:20 AM
>>>>> On Thu, 12 Dec 2002 09:37:06 +0100, Jochen Schmidt ("Jochen") writes:

Jochen> I agree on the point that ANSI CL code is more portable than
Jochen> C code. But I think there are things that make writing
Jochen> portable ANSI CL code unnecessarily difficult.

Jochen> One thing is the LF(Unix) vs. CR(MacOS) vs. CRLF(Windows)
Jochen> lineending issue. This differences make read-line and
Jochen> write-line useless for I/O that has to speak a protocol like
Jochen> HTTP, IRC or most other common internet text-protocols.
Jochen> If I call read-line on such a source one of the following
Jochen> will happen:

Jochen> [xxxxx]abc ; Windows
Jochen> [xxxxxCR]abc ; Unix
Jochen> [xxxxx]LFabc ; MacOS

Jochen> The part in the brackets is what read-line would return and
Jochen> the part after it is what is left in the stream.

Jochen> Some implementations of Common Lisp allow to change the lineending style via
Jochen> external formats - but that is not portable ANSI CL code either.

Jochen> So it seems the only thing one can do is to write your own line reader
Jochen> function in portable ANSI CL (via read-char/unread-char/peek-char).

What you are seeing is a result of the fact that READ-LINE does not
include the line terminator character in the string it returns.

I think this is better than C, which does not have that degree of
portable IO: C programs have to be coded to know what the line
terminator character is. In Common Lisp, you either don't need to
worry about it at all, or you use the portable character #\Newline.
Lisp takes care of the line terminator for you.

The problem you are experiencing is because of the network protocol.

Neither ANSI Common Lisp (nor C) defines what a network stream is,
or provides a way to open one. So you're already in "extension
land" as soon as you're talking about network streams.

The problem you're having above is that each network protocol defines
what the line terminator is, and there is no way for Lisp (or any
other language) to know what choice was made. If the network protocol
happens to use the same line terminator as your host operating system,
then READ-LINE will happen to work.

Usually, for text command based control connections, the choice
is CR LF. That's the same choice as DOS/Windows, which is why
your first example above works with READ-LINE. But that's not
portable, so you may have to program around that.

In C, you have to write your own function to read the lines
and do the translation. The programs have to be ported.

In Lisp, you could write your own, also, using Gray streams if you like.
But each vendor has also provided an extension that allows you to
automatically control the external character set translation.
This is done either by wrapping translation streams around the network
stream, or just specifying some keywords to the stream opener function.
At that point, you can write portable code and you're back to not
caring about line terminators.

If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
then you ought to be okay with some sort of :EXTERNAL-FORMAT
keyword thingie that is endorsed by the standard.

It's much more than C is doing for you, so it's hard to understand why
you would claim that C is more portable than Lisp in this area.
Or any area.

Jochen> What do others when faced with this issue?

The main server code is handed a stream that implements the correct
#\Newline semantics (and probably buffering and maybe other features).
A small portion of this code, and also the code that opens the network
connection, needs to be ported for each operating system. The main
server code is all portable ANSI Common Lisp, while the network code
is as a whole, not.

Duane Rettig

Dec 12, 2002, 6:00:09 AM
Jochen Schmidt <j...@dataheaven.de> writes:

> Christopher C. Stacy wrote:
>
> >>>>>> On Wed, 11 Dec 2002 22:00:54 +0000, Thomas Stegen ("Thomas") writes:
> > Thomas> One of the effects of this is that if I write some ISO C code
> > Thomas> I can be pretty sure it will compile and run and any C
> > Thomas> implementation on the planet. I don't feel that I have this
> > Thomas> guarantee when writing Common Lisp. Which is a shame since
> > Thomas> IMHO Common Lisp is the better language of the two.
> >
> > I'm not sure why you have that feeling, because if you
> > write ANSI Common Lisp code, it will compile and run on
> > any ANSI Common Lisp implementation on the planet.
>
> I agree on the point that ANSI CL code is more portable than C code.
> But I think there are things that make writing portable ANSI CL code
> unnecessarily difficult.
>
> One thing is the LF(Unix) vs. CR(MacOS) vs. CRLF(Windows) lineending issue.

This difference also exists in C, a la fgets().

> These differences make read-line and write-line useless for I/O that has to
> speak a protocol like HTTP, IRC or most other common internet
> text-protocols. If I call read-line on such a source one of the following
> will happen:
>
> [xxxxx]abc ; Windows
> [xxxxxCR]abc ; Unix
> [xxxxx]LFabc ; MacOS
>
> The part in the brackets is what read-line would return and the part after
> it is what is left in the stream.

The read-line function is defined to leave out the #\newline that was
read. In C, the '\n' is included by fgets() (a minor difference between
the CL and C styles of reading a line). I did not try this on MacOS,
but on Linux and Windows, I got similar results (a file with the
characters abc\r\n was read into windows as "abc\n" and into Linux
as "abc\r\n"). Thus the same non-portability exists between linux
and Windows in C, even with the supposedly portable Cygwin C library
for Windows.

> Some implementations of Common Lisp allow to change the lineending style via
> external formats - but that is not portable ANSI CL code either.

Yes, the CL spec allows for the vendor to provide portability between
line-ending styles via external formats. What does the C specification
provide?

> So it seems the only thing one can do is to write your own line reader
> function in portable ANSI CL (via read-char/unread-char/peek-char).

Or ask your vendor to provide a portable solution.

> What do others do when faced with this issue?

In Allegro CL, we provide external formats which handle CR/LF styles.
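The hand-rolled reader mentioned above might look something like this: a
sketch in portable ANSI CL, built only from READ-CHAR and PEEK-CHAR, with
READ-CRLF-LINE as an invented name. It reads strictly CRLF-terminated
lines, as most internet text protocols specify.

```lisp
;;; Sketch of a portable line reader for CRLF-terminated protocols.
;;; #\Return and #\Linefeed are semi-standard character names, but
;;; are supported essentially everywhere.
(defun read-crlf-line (stream &optional (eof-error-p t) eof-value)
  (let ((line (make-array 16 :element-type 'character
                             :adjustable t :fill-pointer 0)))
    (loop
      (let ((char (read-char stream eof-error-p :eof)))
        (cond ((eq char :eof)
               (return (if (zerop (length line)) eof-value line)))
              ((and (char= char #\Return)
                    (eql (peek-char nil stream nil nil) #\Linefeed))
               (read-char stream)            ; consume the #\Linefeed
               (return line))
              (t (vector-push-extend char line)))))))
```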

Casper H.S. Dik

Dec 12, 2002, 6:10:49 AM
Duane Rettig <du...@franz.com> writes:

>The read-line function is defined to leave out the #\newline that was
>read. In C, the '\n' is included by fgets() (a minor difference between
>the CL and C styles of reading a line). I did not try this on MacOS,
>but on Linux and Windows, I got similar results (a file with the
>characters abc\r\n was read into windows as "abc\n" and into Linux
>as "abc\r\n"). Thus the same non-portability exists between linux
>and Windows in C, even with the supposedly portable Cygwin C library
>for Windows.

The ANSI C specification has a single newline character; the source code
is portable and there is no need to handle it specifically.

However, the files need to be converted when sent from one OS to the next;
i.e., using fgets() on a DOS CR/LF delimited file on Unix will not
yield the desired results. This is to be expected.

Binary files (binary and non-binary are the same for Unix but different
for DOS) are a portable format.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Jochen Schmidt

Dec 12, 2002, 9:45:12 AM
Duane Rettig wrote:

> Jochen Schmidt <j...@dataheaven.de> writes:
>
>> Christopher C. Stacy wrote:
>>
>> >>>>>> On Wed, 11 Dec 2002 22:00:54 +0000, Thomas Stegen ("Thomas")
>> >>>>>> writes:
>> > Thomas> One of the effects of this is that if I write some ISO C code
>> > Thomas> I can be pretty sure it will compile and run and any C
>> > Thomas> implementation on the planet. I don't feel that I have this
>> > Thomas> guarantee when writing Common Lisp. Which is a shame since
>> > Thomas> IMHO Common Lisp is the better language of the two.
>> >
>> > I'm not sure why you have that feeling, because if you
>> > write ANSI Common Lisp code, it will compile and run on
>> > any ANSI Common Lisp implementation on the planet.
>>
>> I agree on the point that ANSI CL code is more portable than C code.
>> But I think there are things that make writing portable ANSI CL code
>> unnecessarily difficult.
>>
>> One thing is the LF(Unix) vs. CR(MacOS) vs. CRLF(Windows) lineending
>> issue.
>
> This difference also exists in C, a la fgets().

>...

>> Some implementations of Common Lisp allow to change the lineending style
>> via external formats - but that is not portable ANSI CL code either.
>
> Yes, the CL spec allows for the vendor to provide portability between
> line-ending styles via external formats. What does the C specification
> provide?

It seems I did not articulate this very well. This issue was not meant as
a comparison of the portability of CL with that of C. I wholeheartedly agree
with those who said that CL is by far more portable than C. I only wanted to
outline one issue which has more than once gotten in my way when writing
portable code in Common Lisp.

>> So it seems the only thing one can do is to write your own line reader
>> function in portable ANSI CL (via read-char/unread-char/peek-char).
>
> Or ask your vendor to provide a portable solution.

What would such a portable solution look like?

>> What do others do when faced with this issue?
>
> In Allegro CL, we provide external formats which handle CR/LF styles.

That is probably the best solution in this situation. I wonder how portable
the use of these external formats is between different Lisp systems.

ciao,
Jochen

--
http://www.dataheaven.de

Jochen Schmidt

Dec 12, 2002, 10:09:09 AM
Christopher C. Stacy wrote:

>>>>>> On Thu, 12 Dec 2002 09:37:06 +0100, Jochen Schmidt ("Jochen") writes:
>
> Jochen> I agree on the point that ANSI CL code is more portable than
> Jochen> C code. But I think there are things that make writing
> Jochen> portable ANSI CL code unnecessarily difficult.
>
> Jochen> One thing is the LF(Unix) vs. CR(MacOS) vs. CRLF(Windows)
> Jochen> lineending issue. These differences make read-line and
> Jochen> write-line useless for I/O that has to speak a protocol like
> Jochen> HTTP, IRC or most other common internet text-protocols.
> Jochen> If I call read-line on such a source one of the following
> Jochen> will happen:
>
> Jochen> [xxxxx]abc ; Windows
> Jochen> [xxxxxCR]abc ; Unix
> Jochen> [xxxxx]LFabc ; MacOS
>
> Jochen> The part in the brackets is what read-line would return and
> Jochen> the part after it is what is left in the stream.
>
> Jochen> Some implementations of Common Lisp allow to change the
> Jochen> lineending style via external formats - but that is not
> Jochen> portable ANSI CL code either.
>
> Jochen> So it seems the only thing one can do is to write your own
> Jochen> line reader function in portable ANSI CL (via
> Jochen> read-char/unread-char/peek-char).
>
> What you are seeing is a result of the fact that READ-LINE does not
> include the line terminator character in the string it returns.

Well - even if it were included, the code would be no more portable.
The problem is how the line ending is recognized.

> I think this is better than C, which does not have that degree of
> portable IO: C programs have to be coded to know what the line
> terminator character is. In Common Lisp, you either don't need to
> worry about it at all, or you use the portable character #\Newline.
> Lisp takes care of the line terminator for you.

'\n' is used as a "portable" newline character in C. It is portable in the
same way that #\newline is in CL. The issue is something different - this
facility allows one to write programs which read "locally encoded" files and
handle "newline characters" in a portable way.

The issue I meant is the handling of non-locally encoded sources. One example
might be a network protocol. Another is simply the format of a file
which originates in the filesystem of another OS - for example, Unix
mbox files under Windows. The line endings in those files may be bare
#\linefeed characters, so a Windows-based read-line will not be able to
read such a file successfully.

> The problem you are experiencing is because of the network protocol.
>
> Neither ANSI Common Lisp (nor C) defines what a network stream is,
> or provides a way to open one. So you're already in "extension
> land" as soon as you're talking about network streams.

What about files that originate in non-local filesystems?

> The problem you're having above is that each network protocol defines
> what the line terminator is, and there is no way for Lisp (or any
> other language) to know what choice was made. If the network protocol
> happens to use the same line terminator as your host operating system,
> then READ-LINE will happen to work.
>
> Usually, for text command based control connections, the choice
> is CR LF. That's the same choice as DOS/Windows, which is why
> your first example above works with READ-LINE. But that's not
> portable, so you may have to program around that.

Yes this is exactly what I meant.

> In C, you have to write your own function to read the lines
> and do the translation. The programs have to be ported.

Of course - just for the record - This issue was definitely not meant as an
example of C being better than CL. The situation in C is actually the same
or even worse.

> In Lisp, you could write your own, also, using Gray streams if you like.
> But each vendor has also provided an extension that allows you to
> automatically control the external character set translation.
> This is done either by wrapping translation streams around the network
> stream, or just specifying some keywords to the stream opener function.
> At that point, you can write portable code and you're back to not
> caring about line terminators.
>
> If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
> then you ought to be okay with some sort of :EXTERNAL-FORMAT
> keyword thingie that is endorsed by the standard.

Hm - I remember that at least ACL provides external formats for network
streams. Last time I looked, LispWorks did not seem to support specifying
external formats for network streams. I have not yet investigated the
situation in other systems.

> It's much more than C is doing for you, so it's hard to understand why
> you would claim that C is more portable than Lisp in this area.
> Or any area.

Sorry for the confusion - I did not mean that C is doing better here. I did
agree that CL is by far more portable than C. I only wanted to show one
point where I personally had portability problems with CL. This issue did
not have anything to do with a comparison to C.

> Jochen> What do others do when faced with this issue?
>
> The main server code is handed a stream that implements the correct
> #\Newline semantics (and probably buffering and maybe other features).
> A small portion of this code, and also the code that opens the network
> connection, needs to be ported for each operating system. The main
> server code is all portable ANSI Common Lisp, while the network code
> is as a whole, not.

Sounds reasonable.

Tim Bradshaw

Dec 12, 2002, 10:47:52 AM
* Jochen Schmidt wrote:

> That is probably the best solution in this situation. I wonder how portable
> the use of these external formats is between different Lisp systems.

Both ACL and LispWorks provide line-end-style-conversion external
formats I think. They probably have different names, but this is a
very small place where you need to worry. It would be a mildly good
idea, I think, if some lowest-common-denominator standard
external-format specifications could be defined.

--tim

Arthur Lemmens

Dec 12, 2002, 11:17:14 AM
Jochen Schmidt wrote:

> Some implementations of Common Lisp allow to change the lineending style via
> external formats - but that is not portable ANSI CL code either.
>
> So it seems the only thing one can do is to write your own line reader
> function in portable ANSI CL (via read-char/unread-char/peek-char).
>
> What do others do when faced with this issue?

I use Gray streams to handle encoding issues. I've defined a class
character-encoding-stream (a subclass of fundamental-character-input-stream and
fundamental-character-output-stream) that takes an arbitrary stream and encodes
(or decodes) everything written to it before passing it to the original stream.
For each encoding I create a new subclass of character-encoding-stream, and
define methods for stream-read-char, stream-write-char and friends.

The encoding tables for most well-known character encodings are automatically
generated at compile time (one more hurray for Common Lisp) from the files
distributed by the Unicode consortium. As an example, here's the definition of
iso-8859-5:

(define-simple-character-encoding :iso-8859-5
:nicknames '(:iso_8859-5:1999 :iso-ir-144 :iso_8859-5 :cyrillic)
:vector #.(load-unicode-table "8859-5"))

On top of that I've defined a couple of macros that allow you to do stuff like:

(with-encodings (stream :crlf :base64 :koi8-r)
(write-line string stream))

This will (conceptually) encode STRING using the koi8-r character encoding,
then apply a base64 encoding to the result, and finally ensure that all newlines
will be translated to sequences of CR-LF characters. One of the charms of the
Gray stream mechanism is that it lets you use functions like write-line or format
'for free': you only need to define a method for stream-write-char to make all
of this work.
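A minimal sketch of such a wrapper, assuming the de facto Gray streams
protocol: the class FUNDAMENTAL-CHARACTER-OUTPUT-STREAM and the generic
function STREAM-WRITE-CHAR live in an implementation-dependent package,
and CRLF-OUTPUT-STREAM is an invented name.

```lisp
;;; A CRLF-translating output wrapper in the style described above.
;;; Gray streams are not part of ANSI CL; the fundamental stream
;;; classes come from an implementation-dependent package.
(defclass crlf-output-stream (fundamental-character-output-stream)
  ((target :initarg :target :reader target
           :documentation "The underlying (e.g. socket) stream.")))

(defmethod stream-write-char ((stream crlf-output-stream) char)
  (if (char= char #\Newline)
      (progn (write-char #\Return (target stream))
             (write-char #\Linefeed (target stream)))
      (write-char char (target stream)))
  char)

;;; WRITE-LINE, FORMAT and friends now emit CR/LF pairs "for free":
;;; (write-line "220 ready" (make-instance 'crlf-output-stream
;;;                                        :target socket-stream))
```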

The result is, in my opinion, a very elegant and flexible system. It may not
be the most efficient solution for handling encoding issues, but extreme
efficiency is not very important for the kind of work I do.

--
Arthur Lemmens

Duane Rettig

Dec 12, 2002, 12:00:01 PM
Tim Bradshaw <t...@cley.com> writes:

The lowest common denominator _is_ defined: it is :default
(http://www.franz.com/support/documentation/6.2/ansicl/dictentr/open.htm)

Oh, you meant _useful_ external-format name standardizations? :-)
That would have to come under de facto standardization, like MOP,
or under unification efforts, similar to UFFI. Then again, there
is always #+/#-. These, as well as #if/#ifdef in C, are the great
portablizers in their respective languages.
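A sketch of what such #+/#- portablizing looks like in practice. The
external-format designators below are placeholders, not real vendor
names; the real ones must be looked up in each vendor's documentation.

```lisp
;;; Feature-conditional external formats -- the #+/#- "portablizers".
;;; The keyword format names are purely illustrative.
(defun open-with-crlf-eol (pathname)
  (open pathname
        :external-format
        #+allegro :hypothetical-acl-crlf-format
        #+lispworks :hypothetical-lw-crlf-format
        #-(or allegro lispworks) :default))
```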

Duane Rettig

Dec 12, 2002, 12:00:00 PM
Casper H.S. Dik <Caspe...@Sun.COM> writes:

> Duane Rettig <du...@franz.com> writes:
>
> >The read-line function is defined to leave out the #\newline that was
> >read. In C, the '\n' is included by fgets() (a minor difference between
> >the CL and C styles of reading a line). I did not try this on MacOS,
> >but on Linux and Windows, I got similar results (a file with the
> >characters abc\r\n was read into windows as "abc\n" and into Linux
> >as "abc\r\n"). Thus the same non-portability exists between linux
> >and Windows in C, even with the supposedly portable Cygwin C library
> >for Windows.
>
> The ANSI C specification has a single newline character; the source code
> is portable and there is no need to handle it specifically.

You didn't actually try this, did you? Please note that Jochen is _not_
talking about file formats, which do tend to be matched to the respective
operating systems for which they are intended (although, as you note below,
there are times when these files do get read by non-intended operating
systems). He is talking about common internet protocols which _do_
specify a CRLF combination regardless of the operating system that reads
or generates them. For these protocols, fgets() cannot be used portably.

> However, the files need to be converted when sent from one OS to the next;
> i.e., using fgets() on a DOS CR/LF delimited file on Unix will not
> yield the desired results. This is to be expected.

Of course. But we're not talking about native vs non-native files, we're
talking about internet protocols. These are intended for use by all
systems, and have a specific format.

> Binary files (binary and non-binary are the same for Unix but different
> for DOS)

Ah, yes, yet another incompatibility between C systems...

> are a portable format.

Whether you regard internet protocols as binary or text files, fgets()
simply does not work portably on them.

Christopher C. Stacy

Dec 12, 2002, 12:06:39 PM
>>>>> On Thu, 12 Dec 2002 16:09:09 +0100, Jochen Schmidt ("Jochen") writes:
>> Neither ANSI Common Lisp (nor C) defines what a network stream is,
>> or provides a way to open one. So you're already in "extension
>> land" as soon as you're talking about network streams.

Jochen> What about files who originate in non-local filesystems?

Gee, what about files or network protocols that use the sequence
#\E#\n#\d#\O#\f#\L#\i#\n#\e#\C#\o#\o#\k#\i#\e#\
to mean end-of-line, and swap #\A and #\Z?

I don't think the function (AI) was adopted by X3J13.

A given Lisp implementation is only responsible for running
portably on its host machine, not for being a universal
IO format translator that knows all other operating systems
and file formats and network protocols.


Tim Bradshaw

Dec 12, 2002, 2:33:56 PM
* Duane Rettig wrote:
> Oh, you meant _useful_ external-format name standardizations? :-)
> That would have to come under de facto standardization, like MOP,
> or under unification efforts, similar to UFFI.

Yes, this is what I meant. A de facto standardisation would be good.

> Then again, there
> is always #+/#- These, as well as #if/#ifdef in C, are the great
> portablizers in their respective languages.

And in particular somewhat better than this (:-).

--tim

Erik Naggum

Dec 12, 2002, 9:48:17 PM
* Greg Menke

| Highly portable "ISO" code is easy until you start doing
| complicated things or actually try to use operating system features

Operating system features are not part of the ISO C standard.

| Then you'll really enjoy how portable CL is.

My experience is that C code that conforms to its standard is /way/
more portable than Common Lisp code that conforms to its standard.
Lest there be any confusion about where I stand on this: /it pisses
me off/ that Common Lisp system vendors have not just moved past
the point where /everyone/ faithfully implements the standard. I
have given up reporting conformance bugs, since conforming to the
specification might break some paying customer's code. There are
enough gratuitous incompatibilities between the Common Lisp systems
and enough unimplemented standard behavior that you cannot program
according to the specification and expect it to behave correctly.
You must in effect always program to the particular implementation.
Some vendors actually consider this a feature and argue that you
have to program to the particular implementation, anyway, which
does nothing but piss me off further. It is like arguing that you
might as well break some laws because you cannot have laws for
everything. Playing with a particular Common Lisp system is fun,
but playing with several is not fun, as it forces me to be conscious
of a large number of highly irrelevant minor issues, which is the
exact opposite of the purpose of a standard.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Kent M Pitman

Dec 13, 2002, 1:21:48 AM
cst...@dtpq.com (Christopher C. Stacy) writes:

> Jochen> One thing is the LF(Unix) vs. CR(MacOS) vs. CRLF(Windows)
> Jochen> lineending issue. This differences make read-line and
> Jochen> write-line useless for I/O that has to speak a protocol like
> Jochen> HTTP, IRC or most other common internet text-protocols.
> Jochen> If I call read-line on such a source one of the following
> Jochen> will happen:
>
> Jochen> [xxxxx]abc ; Windows
> Jochen> [xxxxxCR]abc ; Unix
> Jochen> [xxxxx]LFabc ; MacOS
>
> Jochen> The part in the brackets is what read-line would return and
> Jochen> the part after it is what is left in the stream.
>

> Jochen> Some implementations of Common Lisp allow to change the
> Jochen> lineending style via external formats - but that is not
> Jochen> portable ANSI CL code either.
>
> Jochen> So it seems the only thing one can do is to write your own
> Jochen> line reader function in portable ANSI CL (via
> Jochen> read-char/unread-char/peek-char).


>
> What you are seeing is a result of the fact that READ-LINE does not
> include the line terminator character in the string it returns.
>

> I think this is better than C, which does not have that degree of
> portable IO: C programs have to be coded to know what the line
> terminator character is. In Common Lisp, you either don't need to
> worry about it at all, or you use the portable character #\Newline.
> Lisp takes care of the line terminator for you.
>

> The problem you are experiencing is because of the network protocol.

By contrast, I would say it is because you are using READ-LINE on a
stream that does not obey the host conventions for end of line
termination.

In effect, READ-LINE is not defined for this kind of stream.



> Neither ANSI Common Lisp (nor C) defines what a network stream is,
> or provides a way to open one. So you're already in "extension
> land" as soon as you're talking about network streams.

The issue isn't the way to open it. I think it's conforming ANSI CL
to have an open network stream. What is not conforming is to use an
operator (i.e., READ-LINE) whose job it is to apply end of line conventions
to a stream that is violating that. Either READ-LINE should not be used,
or the program that uses it should require arguments to it to be only
appropriately formatted streams.

> The problem you're having above is that each network protocol defines
> what the line terminator is, and there is no way for Lisp (or any
> other language) to know what choice was made. If the network protocol
> happens to use the same line terminator as your host operating system,
> then READ-LINE will happen to work.

I agree with Chris here.



> Usually, for text command based control connections, the choice
> is CR LF. That's the same choice as DOS/Windows, which is why
> your first example above works with READ-LINE. But that's not
> portable, so you may have to program around that.

I don't like the use of "works" here. I prefer "appears to work".
It only works by coincidence.

> In C, you have to write your own function to read the lines
> and do the translation. The programs have to be ported.
>

> In Lisp, you could write your own, also, using Gray streams if you like.
> But each vendor has also provided an extension that allows you to
> automatically control the external character set translation.
> This is done either by wrapping translation streams around the network
> stream, or just specifying some keywords to the stream opener function.
> At that point, you can write portable code and you're back to not
> caring about line terminators.

You can also just parse those lines with READ-CHAR or READ-SEQUENCE.



> If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
> then you ought to be okay with some sort of :EXTERNAL-FORMAT
> keyword thingie that is endorsed by the standard.

I disagree that this is the right way to go.
I personally believe the right thing is to not use READ-LINE.

(This is a repeat of the same argument we get into over whether READ
and modified readtables should ever be used to parse other than Lisp
data. I say no, but others disagree.)

> It's much more than C is doing for you, so it's hard to understand why
> you would claim that C is more portable than Lisp in this area.
> Or any area.
>

> Jochen> What do others do when faced with this issue?
>
> The main server code is handed a stream that implements the correct
> #\Newline semantics (and probably buffering and maybe other features).
> A small portion of this code, and also the code that opens the network
> connection, needs to be ported for each operating system. The main
> server code is all portable ANSI Common Lisp, while the network code
> is as a whole, not.

I write code that uses READ-CHAR and expects CR/LF pairs unconditionally.
I usually also provide a special variable that will allow you to ignore
that strictness and instead to treat CR, CR/LF, and LF as synonyms
heuristically.
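A sketch of that approach: strict CR/LF by default, with a special
variable to accept CR, CR/LF, or LF heuristically. The names
*LENIENT-LINE-ENDINGS* and READ-PROTOCOL-LINE are invented for
illustration.

```lisp
;;; Strict CR/LF reader with an optional lenient mode.  READ-CHAR
;;; will signal END-OF-FILE if the stream ends mid-line.
(defvar *lenient-line-endings* nil
  "When true, accept CR, CR/LF, or LF as a line terminator.")

(defun read-protocol-line (stream)
  (with-output-to-string (line)
    (loop for char = (read-char stream)
          do (cond ((char= char #\Return)
                    (if (eql (peek-char nil stream nil) #\Linefeed)
                        (read-char stream)      ; consume the LF
                        (unless *lenient-line-endings*
                          (error "Bare CR in a CR/LF protocol stream.")))
                    (return))
                   ((char= char #\Linefeed)
                    (unless *lenient-line-endings*
                      (error "Bare LF in a CR/LF protocol stream."))
                    (return))
                   (t (write-char char line))))))
```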

Jochen Schmidt

Dec 13, 2002, 3:28:58 AM
Kent M Pitman wrote:

That is what I described. I did not dispute that read-line is designed to be
useful only under host conventions. The question is whether it is a
good idea to restrict an operator like this to such a limited set of
possibilities.

Many file formats and network protocols are line-based, with varying notions
of line endings (CRLF is the most common in network protocols).

One thing I could have imagined would be a more general operator (or a set
of related operators) which reads until a particular line-ending
token/condition appears.
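Such an operator might be sketched like this; READ-UNTIL is an invented
name, not a standard or vendor operator.

```lisp
;;; Read until a caller-supplied terminator string is seen; the
;;; terminator itself is not included in the result.
(defun read-until (stream terminator)
  (let ((buffer (make-array 16 :element-type 'character
                               :adjustable t :fill-pointer 0))
        (tlen (length terminator)))
    (loop for char = (read-char stream nil)
          while char
          do (vector-push-extend char buffer)
             (when (and (>= (fill-pointer buffer) tlen)
                        (string= terminator buffer
                                 :start2 (- (fill-pointer buffer) tlen)))
               (decf (fill-pointer buffer) tlen)   ; drop the terminator
               (return)))
    (coerce buffer 'string)))

;;; CRLF-terminated lines:
;;; (read-until stream (format nil "~C~C" #\Return #\Linefeed))
```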

>> Neither ANSI Common Lisp (nor C) defines what a network stream is,
>> or provides a way to open one. So you're already in "extension
>> land" as soon as you're talking about network streams.
>
> The issue isn't the way to open it. I think it's conforming ANSI CL
> to have an open network stream. What is not conforming is to use an
> operator (i.e., READ-LINE) whose job it is to apply end of line
> conventions
> to a stream that is violating that. Either READ-LINE should not be used,
> or the program that uses it should require arguments to it to be only
> appropriately formatted streams.

This makes IMHO read-line a rather useless operator. Maybe others' experience
and environment are fundamentally different, but here I either talk to some
network protocol (with non-host conventions or not) or have to work with
files of varying origins (and therefore varying line-ending conventions
too).

>> The problem you're having above is that each network protocol defines
>> what the line terminator is, and there is no way for Lisp (or any
>> other language) to know what choice was made. If the network protocol
>> happens to use the same line terminator as your host operating system,
>> then READ-LINE will happen to work.
>
> I agree with Chris here.

Well - of course it is not possible for Lisp to know what choice was made,
but the programmer knows it (or should know it). So he can parameterize the
system to understand the right line-ending convention.

>> Usually, for text command based control connections, the choice
>> is CR LF. That's the same choice as DOS/Windows, which is why
>> your first example above works with READ-LINE. But that's not
>> portable, so you may have to program around that.
>
> I don't like the use of "works" here. I prefer "appears to work".
> It only works by coincidence.

This depends. If one is able to specify the line-ending style in an
implementation (non-portably), then (if done right) it will either work or
the system will complain that such an implementation-defined facility is
not available.

>> In C, you have to write your own function to read the lines
>> and do the translation. The programs have to be ported.
>>
>> In Lisp, you could write your own, also, using Gray streams if you like.
>> But each vendor has also provided an extension that allows you to
>> automatically control the external character set translation.
>> This is done either by wrapping translation streams around the network
>> stream, or just specifying some keywords to the stream opener function.
>> At that point, you can write portable code and you're back to not
>> caring about line terminators.
>
> You can also just parse those lines with READ-CHAR or READ-SEQUENCE.

Which is what I have done most of the time in such situations. But I think
READ-LINE would be the right operator for line-based input. Denying that
and falling back to READ-CHAR/READ-SEQUENCE seems to me like denying that
the input is line-based.

>> If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
>> then you ought to be okay with some sort of :EXTERNAL-FORMAT
>> keyword thingie that is endorsed by the standard.
>
> I disagree that this is the right way to go.
> I personally believe the right thing is to not use READ-LINE.

Can you explain further why you think it is not the right thing to use
READ-LINE on line-based input? (Assuming that a facility to parameterize
the lineending convention exists)

> (This is a repeat of the same argument we get into over whether READ
> and modified readtables should ever be used to parse other than Lisp
> data. I say no, but others disagree.)

The difference is that line-based input is much more common than input very
similar to Lisp's syntax. So an operator like READ-LINE which works on an
almost complete set of line-based formats seems useful to me.



>> It's much more than C is doing for you, so it's hard to understand why
>> you would claim that C is more portable than Lisp in this area.
>> Or any area.
>>
>> Jochen> What do others do when faced with this issue?
>>
>> The main server code is handed a stream that implements the correct
>> #\Newline semantics (and probably buffering and maybe other features).
>> A small portion of this code, and also the code that opens the network
>> connection, needs to be ported for each operating system. The main
>> server code is all portable ANSI Common Lisp, while the network code
>> is as a whole, not.
>
> I write code that uses READ-CHAR and expects CR/LF pairs unconditionally.
> I usually also provide a special variable that will allow you to ignore
> that strictness and instead to treat CR, CR/LF, and LF as synonyms
> heuristically.

In my recent projects I have experimented with "encapsulation streams" built
on Gray streams for parsing.
So far this seems to be a very powerful facility.
Gray streams are actually quite portable, so I think they are the right tool
for solving such encoding issues.

Joe Marshall

Dec 13, 2002, 11:35:56 AM
Erik Naggum <er...@naggum.no> writes:

> My experience is that C code that conforms to its standard is /way/
> more portable than Common Lisp code that conforms to its standard.

I concur with this.

> Lest there be any confusion about where I stand on this: /it pisses
> me off/ that Common Lisp system vendors have not just moved past
> the point where /everyone/ faithfully implements the standard.

I agree, but C compiler vendors have a far easier task. C was
designed to be easy to implement rather than to be easy to use.

On the other hand, aren't Common Lisp vendors supposed to be smarter
than C vendors?

Duane Rettig

Dec 13, 2002, 3:00:04 PM
Erik Naggum <er...@naggum.no> writes:

> * Greg Menke


>
> | Then you'll really enjoy how portable CL is.
>
> My experience is that C code that conforms to its standard is /way/
> more portable than Common Lisp code that conforms to its standard.
> Lest there be any confusion about where I stand on this: /it pisses
> me off/ that Common Lisp system vendors have not just moved past
> the point where /everyone/ faithfully implements the standard. I
> have given up reporting conformance bugs, since conforming to the
> specification might break some paying customer's code. There are
> enough gratuitous incompatibilities between the Common Lisp systems
> and enough unimplemented standard behavior that you cannot program
> according to the specification and expect it to behave correctly.
> You must in effect always program to the particular implementation.
> Some vendors actually consider this a feature and argue that you
> have to program to the particular implementation, anyway, which
> does nothing but piss me off further. It is like arguing that you
> might as well break some laws because you cannot have laws for
> everything. Playing with a particular Common Lisp system is fun,
> but playing with several is not fun, as it forces me to be conscious
> of a large number of highly irrelevant minor issues, which is the
> exact opposite of the purpose of a standard.

That you have given up reporting conformance bugs to your vendors is
a sad thing; when you were using Allegro CL the reports you made of
nonconformances were gladly accepted as bugs, and although we sometimes
prioritized differently than you would have preferred (and yes,
sometimes the problem with these priorities has to do with how to
make such changes as painless as possible to customers who might
have become accustomed to such nonconformances), we have always
viewed the standard as a requirement, and continually work on such
nonconformances, as resources allow, to eliminate them one by one.
It is my belief that all of your CL vendors should be doing the same,
and should not shirk their duty to be faithful to the standard.

I hope you will reconsider, and rekindle your hope that CL vendors
will accept your nonconformance reports as bugs, and will work on
them to eventually remove them.

Kent M Pitman
Dec 13, 2002, 10:07:42 PM

Jochen Schmidt <j...@dataheaven.de> writes:

...


> One thing I could have imagined would be a more general operator (or a set
> of related operators) which read until a particular lineending
> token/condition appears.

As long as it defaulted to the value of some *native-line-termination*
variable, I guess this would be ok. That is, the default behavior is
intentionally non-uniform and to cast it into something which dead-reckons
the end of line convention requires positing some hidden variable.



> >> Neither ANSI Common Lisp (nor C) defines what a network stream is,
> >> or provides a way to open one. So you're already in "extension
> >> land" as soon as you're talking about network streams.
> >
> > The issue isn't the way to open it. I think it's conforming ANSI CL
> > to have an open network stream. What is not conforming is to use an
> > operator (i.e., READ-LINE) whose job it is to apply end of line
> > conventions
> > to a stream that is violating that. Either READ-LINE should not be used,
> > or the program that uses it should require arguments to it to be only
> > appropriately formatted streams.
>
> This makes IMHO read-line a rather useless operator.

READ-LINE is intended to read information off the console or out of a file.
That's far from useless. But it's not the answer to all the world's problems.

> Maybe others experience
> and environment is fundamentally different but here I either talk to some
> network protocol (with non-host conventions or not) or I have to work with
> files of varying origins (and therefore varying lineending conventions
> too).

To be honest, I think the real bad guy here was the network protocol,
for defying local host conventions for newlines. They SHOULD have defined
that any of the three EOL conventions could be used and should be expected
exactly because people tend to just use their host-specific write-line
and forget whether it uses CRLF ... all web servers have to expect this.

Btw, I also think that it's both conforming and useful for READ-LINE to
accept all three EOL conditions. The spec doesn't require it, but it seems
to me that it does permit it.
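Such a tolerant reader is also easy to sketch in portable CL. The following is an illustrative sketch only; READ-ANY-LINE is a name made up for this example, not anything the spec or any vendor provides:

```lisp
;; Sketch: a line reader that accepts LF, CR, or CRLF as a terminator,
;; built only from standard READ-CHAR/PEEK-CHAR.  READ-ANY-LINE is an
;; invented name; #\Return and #\Linefeed are the semi-standard
;; character names (present in all major implementations).
(defun read-any-line (stream &optional (eof-error-p t) eof-value)
  (let ((line (make-array 0 :element-type 'character
                            :adjustable t :fill-pointer 0)))
    (loop for ch = (read-char stream nil :eof)
          do (case ch
               (:eof
                (cond ((plusp (length line))
                       (return (values line t)))   ; like READ-LINE's 2nd value
                      (eof-error-p
                       (error 'end-of-file :stream stream))
                      (t (return (values eof-value t)))))
               (#\Linefeed (return (values line nil)))
               (#\Return
                ;; Swallow the LF half of a CRLF pair, if present.
                (when (eql (peek-char nil stream nil nil) #\Linefeed)
                  (read-char stream))
                (return (values line nil)))
               (t (vector-push-extend ch line))))))
```

Note that in implementations where #\Newline is the same character as #\Linefeed (most of them), the #\Linefeed clause also catches already-translated newlines.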

> >> Usually, for text command based control connections, the choice
> >> is CR LF. That's the same choice as DOS/Windows, which is why
> >> your first example above works with READ-LINE. But that's not
> >> portable, so you may have to program around that.
> >
> > I don't like the use of "works" here. I prefer "appears to work".
> > It only works by coincidence.
>
> This depends. If one is able to specify the lineending style in an
> implementation (non-portable). Then (if done right) it will either work or
> the system will complain that such an implementation defined facility is
> not available.

I still say "appears to work".

Back in the days of Maclisp, there were two weird programming
conventions on the pdp10 regarding compiled functions.
In convention #1, a variable called *RSET contained information
about whether to do error checking in many functions; in effect,
it was like a special variable controlling what we now do with SAFETY.
The instructions in such functions would start:
SKIPE *RSET
PUSHJ ...errcheck...
...main functionality...
In this way, if you did (SETQ *RSET NIL), you got low error checking
in built-in functions. In convention #2, user functions could be
"number compiled". That is, they could take their arguments in registers
(already machine numbers, no tag bit decoding, etc.) Code compiled for
"number calling" would do a PUSHJ in the first instruction to something
that moved stuff from the stack to those registers for the sake of callers
who didn't know the convention, but if you declared the function to be of
number type, the "number calling" function would enter at the second
instruction and bypass such stuff. Now, it happened one day that
George J Carrette sent a message (I believe the subject line was
"crowbar in head") where he found someone who had code that wanted to
force error checking even when *RSET was NIL, so the person had
declared the function to be a "number function" (even though it wasn't).
[If you're following along, you'll realize that doing this caused the
first instruction to be skipped, and so avoided the SKIPE *RSET.]
Now, you can claim this "worked" because it achieved the programmer's
desired effect. That's a very Machiavellian approach to programming.
I tend to think something "works" when you've obeyed the documentation/intent
of programs, not when you've adhered to incidental truths of the implementation.
(In fact, I think George found this weirdness where someone had changed
some particular system function to not respect *RSET or to do the check in
a different place, so the number declaration was causing whatever the
first, somewhat arbitrary, instruction of the function to be skipped...)

I kind of abbreviated this story. It's always stuck in my mind, though, as
a perfect example of why not to rely on felicity.

> >> In C, you have to write your own function to read the lines
> >> and do the translation. The programs have to be ported.
> >>
> >> In Lisp, you could write your own, also, using Grey streams if you like.
> >> But each vendor has also provided an extension that allows you to
> >> automatically control the external character set translation.
> >> This is done either by wrapping translation streams around the network
> >> stream, or just specifying some keywords to the stream opener function.
> >> At that point, you can write portable code and you're back to not
> >> caring about line terminators.
> >
> > You can also just parse those lines with READ-CHAR or READ-SEQUENCE.
>
> Which is what I have done most of the times in such situations. But I think
> a READ-LINE would be the right operator for line-based input. Denying that
> and falling back to READ-CHAR/READ-SEQUENCE seems to me like denying that
> the input is line-based.

The input is NOT line-based. Put another way, the term "line" has multiple
meanings and the input on a network stream which is called a "line" is not
the same kind of "line" as is the "line" used by the host OS. If you want
to say it's ok for lines to be differently terminated, what makes a
"tab delimited field" not just a "differently terminated line", too? Where
do you draw the line (sorry, bad pun)?



> >> If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
> >> then you ought to be okay with some sort of :EXTERNAL-FORMAT
> >> keyword thingie that is endorsed by the standard.
> >
> > I disagree that this is the right way to go.
> > I personally believe the right thing is to not use READ-LINE.
>
> Can you explain further why you think it is not the right thing to use
> READ-LINE on line-based input? (Assuming that a facility to parameterize
> the lineending convention exists)

Because the facility does not exist.

And because there are multiple kinds of lines being blurred here,
obscuring what is really going on. I'd rather see a NETWORK:READ-LINE
or NETWORK:READ-CRLF-TERMINATED created that is plain about what it is doing
and what state it leaves things in.
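A strict variant along the lines of that hypothetical NETWORK:READ-CRLF-TERMINATED might look like the sketch below; the name, the signature, and the error behavior are all assumptions made for illustration, not an existing API:

```lisp
;; Sketch of a strict CRLF-terminated reader: it refuses to guess, and
;; signals an error on a bare CR or a bare LF.  The name follows the
;; hypothetical NETWORK:READ-CRLF-TERMINATED; nothing here is standard.
(defun read-crlf-terminated (stream)
  (let ((line (make-array 0 :element-type 'character
                            :adjustable t :fill-pointer 0)))
    (loop for ch = (read-char stream)  ; EOF mid-line is an error, deliberately
          do (case ch
               (#\Return
                (let ((next (read-char stream)))
                  (if (eql next #\Linefeed)
                      (return line)
                      (error "Bare CR (followed by ~S) in CRLF-terminated input"
                             next))))
               (#\Linefeed
                (error "Bare LF in CRLF-terminated input"))
               (t (vector-push-extend ch line))))))
```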

I mostly never advocated solutions that involve imagining implementations
to do differently than the spec asks them to.

> > (This is a repeat of the same argument we get into over whether READ
> > and modified readtables should ever be used to parse other than Lisp
> > data. I say no, but others disagree.)
>
> The difference is that line-based input is much more common than input very
> similar to lisps syntax. So an operator like READ-LINE which works in a
> almost complete set of line-based formats seems useful to me.

Sure. No problem about making such. But that isn't what READ-LINE is.
READ-LINE dates to a day that simply predates these other needs.

Here is politically what I would do if I were you and had your goals:

Step 1.

One by one, go to each vendor and ask them why they don't just accept
all three line ending protocols, so that no stray c-M and c-J are left
on the stream, perhaps under control of some special variable.

Step 2.

When you have fought this battle individually with each vendor and all
implementations de facto do what you want, then make the case you are now
making.

In other words, the change you are suggesting is a compatible extension
and so you should focus your energies not on a user community that cannot
affect the spec and cannot change implementations, but rather on vendors
who can respond to legitimate expressed needs of users if they are convinced
of the importance.

Mostly I regard having this discussion in the open in comp.lang.lisp and
not in private with vendors as a waste of breath. People who agree with
you are likely to make you feel good, but not likely to change much. People
who disagree with you will make you feel discouraged, but won't actually
hold you back.

Standards are best created when most vendors ALREADY do something either
all the same way or in trivially different ways that require regularization.
Standards are NOT intended as a way of bludgeoning vendors into adding
extensions that they do not see a need for. People often talk on this group
as if somehow by changing the standard you will force vendors into submission.

On the other hand, this discussion arose as a criticism of the standard nature
of the language. Just because READ-LINE doesn't satisfy your personal needs
does not mean it doesn't have regular behavior. Could it have? Sure. And
maybe it could be fixed to. But it isn't non-portable now--it's just that its
portable uses are not the ones you wish they were. You can criticize
the language for its _choice_ of portable operations, but don't confuse that
with criticizing the language for not having portable operations, which you
have done, and I think unfairly.

> >> It's much more than C is doing for you, so it's hard to understand why
> >> you would claim that C is more portable than Lisp in this area.
> >> Or any area.
> >>
> >> Jochen> What do others when faced with this issue?
> >>
> >> The main server code is handed a stream that implements the correct
> >> #\Newline semantics (and probably buffering and maybe other features).
> >> A small portion of this code, and also the code that opens the network
> >> connection, needs to be ported for each operating system. The main
> >> server code is all portable ANSI Common Lisp, while the network code
> >> is as a whole, not.
> >
> > I write code that uses READ-CHAR and expects CR/LF pairs unconditionally.
> > I usually also provide a special variable that will allow you to ignore
> > that strictness and instead to treat CR, CR/LF, and LF as synonyms
> > heuristically.
>
> In my last projects I experimented with using "encapsulation-streams" built
> through Gray-Streams for parsing.
> So far this seems to be a very mighty facility.
> Gray-Streams are actually quite portable so I think they are the right thing
> to solve such encoding issues.

Check out Franz's simple streams, too. They've dealt with some additional
problems that Gray streams didn't seek to solve. I think Franz is going to
try to do the leg work to get other vendors to adopt this extension protocol.
I don't see any reason the other vendors should shy away from their
proposed stuff.


Jochen Schmidt
Dec 14, 2002, 6:14:21 AM

Kent M Pitman wrote:

> Jochen Schmidt <j...@dataheaven.de> writes:
>
> ...
>> One thing I could have imagined would be a more general operator (or a
>> set of related operators) which read until a particular lineending
>> token/condition appears.
>
> As long as it defaulted to the value of some *native-line-termination*
> variable, I guess this would be ok. That is, the default behavior is
> intentionally non-uniform and to cast it into something which dead-reckons
> the end of line convention requires positing some hidden variable.

Yes that is a good idea.

Ok, that's a good point. Many network protocols are implemented wrongly, so one
often has to test for all three cases anyway, as you say.



>> >> Usually, for text command based control connections, the choice
>> >> is CR LF. That's the same choice as DOS/Windows, which is why
>> >> your first example above works with READ-LINE. But that's not
>> >> portable, so you may have to program around that.
>> >
>> > I don't like the use of "works" here. I prefer "appears to work".
>> > It only works by coincidence.
>>
>> This depends. If one is able to specify the lineending style in an
>> implementation (non-portable). Then (if done right) it will either work
>> or the system will complain that such an implementation defined facility
>> is not available.
>
> I still say "appears to work".

Of course - my example should show how many assumptions one has to make
about implementation-defined behaviour. Well - it will "work" in the sense
that the result is right in the environment the programmer set up. But of
course it will not work in many other cases. I think something like this is
ok to get a problem solved in an ad hoc way, as long as the programmer knows
what he is doing. Even then there is of course the danger that the program
gets unintentionally reused at a later time.



>> Which is what I have done most of the times in such situations. But I
>> think a READ-LINE would be the right operator for line-based input.
>> Denying that and falling back to READ-CHAR/READ-SEQUENCE seems to me like
>> denying that the input is line-based.
>
> The input is NOT line-based. Put another way, the term "line" has
> multiple meanings and the input on a network stream which is called a
> "line" is not
> the same kind of "line" as is the "line" used by the host OS. If you want
> to say it's ok for lines to be differently terminated, what makes a
> "tab delimited field" not just a "differently terminated line", too?
> Where do you draw the line (sorry, bad pun)?

:-)

Actually I even think that it would not be too far-fetched to read a single
tab-delimited field as several "tab-delimited lines". But this makes clear that
this facility is probably more like a READ-TOKEN than a READ-LINE. The
connection to lines is only that lines are tokens delimited through a
lineending token. Talking about lines implies that there is a format
consisting of a sequence of those "line-tokens".

>> >> If you're okay with writing (MAKE-INSTANCE 'NET:SOCKET-STREAM...)
>> >> then you ought to be okay with some sort of :EXTERNAL-FORMAT
>> >> keyword thingie that is endorsed by the standard.
>> >
>> > I disagree that this is the right way to go.
>> > I personally believe the right thing is to not use READ-LINE.
>>
>> Can you explain further why you think it is not the right thing to use
>> READ-LINE on line-based input? (Assuming that a facility to parameterize
>> the lineending convention exists)
>
> Because the facility does not exist.

At least not in the standard.

> And because there are multiple kinds of lines being blurred here,
> obscuring what is really going on. I'd rather see a NETWORK:READ-LINE
> or NETWORK:READ-CRLF-TERMINATED created that is plain about what it is
> doing and what state it leaves things in.
>
> I mostly never advocated solutions that involve imagining implementations
> to do differently than the spec asks them to.

This is an interesting issue. When trying to extend CL there seems to be
always the choice between fitting the extension into the existing framework
(here the external format stuff in streams) or to do it completely
orthogonal (e.g. NETWORK:READ-CRLF-TERMINATED or more general a
READ-DELIMITED-LINE). The question is if one should try to do it the first
way or if it is almost always better to choose the latter.
I'm talking here in particular about vendors who are able to extend the
system in those ways. Open Source implementations make it (in theory)
possible that even "normal users" can work on that level of the language.

>> > (This is a repeat of the same argument we get into over whether READ
>> > and modified readtables should ever be used to parse other than Lisp
>> > data. I say no, but others disagree.)
>>
>> The difference is that line-based input is much more common than input
>> very similar to lisps syntax. So an operator like READ-LINE which works
>> in a almost complete set of line-based formats seems useful to me.
>
> Sure. No problem about making such. But that isn't what READ-LINE is.
> READ-LINE dates to a day that simply predates these other needs.

Ok

> Here is politically what I would do if I were you and had your goals:
>
> Step 1.
>
> One by one, go to each vendor and ask them why they don't just accept
> all three line ending protocols, so that no stray c-M and c-J are left
> on the stream, perhaps under control of some special variable.
>
> Step 2.
>
> When you have fought this battle individually with each vendor and all
> implementations de facto do what you want, then make the case you are now
> making.
>
> In other words, the change you are suggesting is a compatible extension
> and so you should focus your energies not on a user community that cannot
> affect the spec and cannot change implementations, but rather on vendors
> who can respond to legitimate expressed needs of users if they are
> convinced of the importance.

I did not mean to suggest that something has to happen or that CL is in a
way "broken" in this part. This issue is simply a thing that made writing
portable CL apps more difficult than what one is used to from other parts of
the language. I asked here instead of asking my vendor's support because I was
interested in what other users do in those cases.

> Mostly I regard having this discussion on the open in comp.lang.lisp and
> not in private with vendors as a waste of breath. People who agree with
> you are likely to make you feel good, but not likely to change much.
> People who disagree with you will make you feel discouraged, but won't
> actually hold you back.

Hold me back from solving this issue? I don't think that I'm experienced
enough to do something like this.

I was interested in how this problem can be solved and so far it seems to
be:

a) Instead of READ-LINE use READ-CHAR/READ-SEQUENCE yourself
I think in this part naturally fits writing a (portable)
READ-DELIMITED-LINE function.

b) Use implementation defined external formats to make READ-LINE
do the right thing.
Here a de-facto standard naming of a minimum set of external formats
would be a valuable enhancement.

c) Change implementations of READ-LINE to accept any of the three eol
styles.

d) Use Gray Streams (Or Simple Streams) to write a wrapper stream which
handles encoding
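Option a) is indeed only a few lines. Here is one possible sketch of such a READ-DELIMITED-LINE; the name and signature are inventions of this thread, not an existing API:

```lisp
;; Sketch of a portable READ-DELIMITED-LINE: reads until the delimiter
;; string is seen and returns the text before it.  Matching is naive
;; (it re-checks the buffer tail after each character), which is fine
;; for short delimiters such as a two-character CRLF string.
(defun read-delimited-line (stream delimiter &optional (eof-error-p t) eof-value)
  (let ((buffer (make-array 0 :element-type 'character
                              :adjustable t :fill-pointer 0))
        (dlen (length delimiter)))
    (loop for ch = (read-char stream nil :eof)
          do (cond ((eq ch :eof)
                    (when (and eof-error-p (zerop (length buffer)))
                      (error 'end-of-file :stream stream))
                    (return (if (zerop (length buffer)) eof-value buffer)))
                   (t
                    (vector-push-extend ch buffer)
                    ;; Does the buffer now end with the delimiter?
                    (when (and (>= (length buffer) dlen)
                               (string= delimiter buffer
                                        :start2 (- (length buffer) dlen)))
                      (dotimes (i dlen) (vector-pop buffer))
                      (return buffer)))))))
```

Calling it with a two-character CR/LF delimiter string would then read one network-style line, while "," or a tab string would read the "tab delimited fields" mentioned earlier.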

> Standards are best created when most vendors ALREADY do something either
> all the same way or in trivially different ways that require
> regularization. Standards are NOT intended as a way of bludgeoning vendors
> into adding
> extensions that they do not see a need for. People often talk on this
> group as if somehow by changing the standard you will force vendors into
> submission.

As I said above I asked how other users face this problem - if I really
wanted to change something I would have asked my vendor of course.

> On the other hand, this discussion arose as a criticism of the standard
> nature
> of the language. Just because READ-LINE doesn't satisfy your personal
> needs
> does not mean it doesn't have regular behavior. Could it have? Sure. And
> maybe it could be fixed to. But it isn't non-portable now--it's just that
> its
> portable uses are not the ones you wish they were. You can criticize
> the language for its _choice_ of portable operations, but don't confuse
> that with criticizing the language for not having portable operations,
> which you have done, and I think unfairly.

Ok sorry - I did not mean to imply that READ-LINE is broken in a way which
would make ANSI CL not portable.

Of course it is obvious that READ-LINE is not intended as the facility I
wanted, but I think that of the operations available in CL, READ-LINE comes
closest to what is needed for things like handling "line based protocols".

>> In my last projects I experimented with using "encapsulation-streams"
>> built through Gray-Streams for parsing.
>> So far this seems to be a very mighty facility.
>> Gray-Streams are actually quite portable so I think they are the right
>> thing to solve such encoding issues.
>
> Check out Franz's simple streams, too. They've dealt with some additional
> problems that Gray streams didn't seek to solve. I think Franz is going
> to try to do the leg work to get other vendors to adopt this extension
> protocol. I don't see any reason the other vendors should shy away from
> their proposed stuff.

I have heard that there is some work ongoing to write an implementation of
simple streams for CMUCL. I have not yet heard anything from other vendors.

The most important thing simple streams would solve for me is making it
possible to define your own buffered streams in a de facto portable way.

LispWorks provides its own layer of buffering primitives built as an add-on
to Gray Streams. There are generic functions like STREAM-FILL-BUFFER,
STREAM-FLUSH-BUFFER and facilities to access buffer state objects of
streams. For ACL-COMPAT I implemented a simple layer which emulates this
buffering protocol in a mostly portable way because I used it to implement
HTTP/1.1 chunking streams.
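A minimal version of such an encapsulating stream can be sketched with Gray streams. The class and generic-function names below come from the Gray streams proposal, but the package they live in is implementation-specific (e.g. SB-GRAY in SBCL) and is assumed to be imported here:

```lisp
;; Sketch: a Gray-streams wrapper that normalizes CR, LF, and CRLF on
;; input to #\Newline.  FUNDAMENTAL-CHARACTER-INPUT-STREAM and
;; STREAM-READ-CHAR are from the (non-ANSI) Gray streams proposal and
;; live in an implementation-specific package, assumed imported.
(defclass eol-normalizing-stream (fundamental-character-input-stream)
  ((underlying :initarg :underlying :reader underlying-stream)))

(defmethod stream-read-char ((s eol-normalizing-stream))
  (let ((ch (read-char (underlying-stream s) nil :eof)))
    (case ch
      (#\Return
       ;; Swallow the LF half of a CRLF pair, then report #\Newline.
       (when (eql (peek-char nil (underlying-stream s) nil nil) #\Linefeed)
         (read-char (underlying-stream s)))
       #\Newline)
      ;; :EOF falls through unchanged, as the Gray protocol expects.
      (t ch))))
```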

Tim Bradshaw
Dec 14, 2002, 11:01:24 AM

* Kent M Pitman wrote:

> Sure. No problem about making such. But that isn't what READ-LINE is.
> READ-LINE dates to a day that simply predates these other needs.

I'm pretty confused now. I don't think this is really (or should
really be) about READ-LINE. What it should be about is what external
formats should do. The line-end convention problem is a minute subset
of this issue - anyone who can write external format support for
streams that can deal with UTF-8 or any of the many other encodings
that might crop up is surely able to deal with end-of-line
variations. For some reason we're getting all hung up on the line-end
issue when the real problem is that input is stuff down the wire in
shift-JIS or something. You need to be able to specify when opening a
stream what external format it has, ideally in some standard way (at
least for the common external formats).

--tim

Don Geddis
Dec 14, 2002, 4:11:48 PM

Kent M Pitman <pit...@world.std.com> writes:
> Standards are NOT intended as a way of bludgeoning vendors into adding
> extensions that they do not see a need for. People often talk on this group
> as if somehow by changing the standard you will force vendors into
> submission.

While I agree with you, it isn't surprising that people come to that conclusion
and thus see standard-changing as a way to force implementation changes.
The CL spec has achieved wide enough support that pretty much all the major
lisp implementations accept reports of deviance as bugs.

I'm sure that each vendor found something in the spec that annoyed them, and
was more difficult to implement than they wanted, and didn't really matter to
their user community. And yet, they valued full compliance with the spec above
just being "mostly" compliant to the parts their users cared about.

Given that power, can't you see how natural it is for users to believe a
spec change could force vendors to update implementations? Surely there's at
least a grain of truth to that wish...
_______________________________________________________________________________
Don Geddis http://don.geddis.org d...@geddis.org
The face of a child can say it all, especially the mouth part of the face.
-- Deep Thoughts, by Jack Handey

Kent M Pitman
Dec 15, 2002, 6:28:13 PM

Jochen Schmidt <j...@dataheaven.de> writes:

> Kent M Pitman wrote:

[...]



> > I mostly never advocated solutions that involve imagining implementations
> > to do differently than the spec asks them to.
>
> This is an interesting issue. When trying to extend CL there seems to be
> always the choice between fitting the extension into the existing framework
> (here the external format stuff in streams) or to do it completely
> orthogonal (e.g. NETWORK:READ-CRLF-TERMINATED or more general a
> READ-DELIMITED-LINE). The question is if one should try to do it the first
> way or if it is almost always better to choose the latter.
> I'm talking here in particular about vendors who are able to extend the
> system in those ways. Open Source implementations make it (in theory)
> possible that even "normal users" can work on that level of the language.

Different spin:

The open source paradigm can confuse a user into believing he/she is a
language designer by confusing the notion of "access" with "right".

Well, indeed, the open source user CAN design _another_ language. But the
open source user, even with access to source, cannot _redesign_ an existing
language. The language is defined by its _specification_, not by its
_implementation_. Open source implementations, in part because there is
often a sense of a "canonical source", create this confusion in spades. Yet
if the "canonical source" is editable, then there is _not_ a single source,
because the product of an edit yields a non-canonical source; and so, if you
confuse implementations with languages, editing also yields a non-canonical
language.

Add this to my growing list of nits about open source. ;)

> I did not mean to suggest that something has to happen or that CL is in a
> way "broken" in this part. This issue is simply a thing that made writing
> portable CL apps more difficult than what one is used to from other parts of
> the language.

The writing of portable CL apps is trivial by just doing your own
NETWORK:READ-LINE after observing that the existing read-line gives
you too few assurances. Surely this can't take more than a couple minutes.
While this is "more difficult" in some trivial sense, at the big-O level
I don't think you've made programming more difficult.

I think it's a mistake to assume that the "difficulty" should be the same as
for other parts of the language uniformly. It's plain that the entire I/O
system offered by CL is filled with caveats and that if you assume you can do
anything other than the simple teletype/file model that CL makes a vague stab
at, you will have more problems than just this. Things like sequence
functions, math, etc. were pretty well-settled at the time CL was created.
I/O and window systems were new areas of concern that were far from settled
and the CL design specifically opts to "not attempt" rather than "attempt
badly". We could have gone out on a limb and tried to go after some of the
low hanging fruit, but we might have done things wrong, and our general
approach where we did not yet understand something fairly well was simply to
back off and leave the entire area for future standardization rather than to
produce a language that was broken for the entire future. In that way,
the language would stand without the need for change (not to be confused with
the need for extension). I think this design methodology has served
remarkably well.

> As I said above I asked how other users face this problem - if I really
> wanted to change something I would have asked my vendor of course.
>
> > On the other hand, this discussion arose as a criticism of the standard
> > nature
> > of the language. Just because READ-LINE doesn't satisfy your personal
> > needs
> > does not mean it doesn't have regular behavior. Could it have? Sure. And
> > maybe it could be fixed to. But it isn't non-portable now--it's just that
> > its
> > portable uses are not the ones you wish they were. You can criticize
> > the language for its _choice_ of portable operations, but don't confuse
> > that with criticizing the language for not having portable operations,
> > which you have done, and I think unfairly.
>
> Ok sorry - I did not meant to imply that READ-LINE is broken in a way which
> would make ANSI CL not portable.

Ok. Just a confusion then. Sorry to be ranting on.



> Of course it is obvious that READ-LINE is not intended as the facility I
> wanted, but I think that of the operations available in CL, READ-LINE comes
> closest to what is needed for things like handling "line based protocols".

Yes. Its name might confuse one into not reading the description.
CL names are often highly descriptive, and people have lauded the language
for requiring very little commenting; I suppose the flip side is that
sometimes this means people don't get used to reading comments and doc, and
sometimes they should. ;) "You never get anything for free." I guess this
occasional effect is the price of mostly self-documenting code.

Kent M Pitman
Dec 15, 2002, 6:36:06 PM

Tim Bradshaw <t...@cley.com> writes:

It's a little more complicated than that.

You see similar effects in the discussion of CLOSE where we tried to
define what happens when you close encapsulated streams. That
decision happened in haste in an in-person meeting where everyone was
feeling tired and rushed, and was never revisited even though I think
it should have been. Some people wanted CLOSE to close the underlying
stream, but there were patterns of use that were foreign to some
committee members and common to others which made discussion
difficult. On some implementations, the underlying stream might have
been the terminal and closing that was like logging the user out; on
others, it was a window and closing it meant closing the window; on
still others, it was just an object and closing it meant nothing
really at all. Everyone wanted the language to say what would happen
in a situation where no one was even sure what the future events would
be in all cases because they felt it important that it be well-defined
(even though the situation in which it happene was not).

In this case, you might think the world was the simple one you described
but HTTP streams change their nature dynamically between bursts. You read
a command line and then on the very same stream you read some number of
bytes of data. The external format changes dynamically. We might never
have seen such a stream back in those days or might have thought it
irregular and unimportant, and so if we had gone to specify READ-LINE we
might have said "oh, just assume all streams never change format" and then
we might have left out "xml streams", "sgml streams", and "http streams",
which are probably some of the most used streams around!

For example, again, look at timezone. Steele had the forethought to say
that the timezone might change because a PC might be "mobile". I think he
was thinking a tank, but it could just as well be a laptop with GPS builtin!
If we had specified that the timezone be in a variable, cached values would
produce wrong results. Likewise, it's not a stream but a snapshot of a stream
that has an external format.

But we didn't know any of that back then, and we had the good sense not to
pretend we did, so we made it simpler. We didn't attack all the world's
problems. We just made it strong enough to read from the tty console and
from source files and left the rest to other functions, or so I claim.

Kent M Pitman
Dec 15, 2002, 6:41:58 PM

Don Geddis <d...@geddis.org> writes:

> Kent M Pitman <pit...@world.std.com> writes:
> > Standards are NOT intended as a way of bludgeoning vendors into
> > adding extensions that they do not see a need for. People often
> > talk on this group as if somehow by changing the standard you will
> > force vendors into submission.
>
> While I agree with you, it isn't surprising that people come to that
> conclusion and thus see standard-changing as a way to force
> implementation changes. The CL spec has achieved wide enough
> support that pretty much all the major lisp implementations accept
> reports of deviance as bugs.
>
> I'm sure that each vendor found something in the spec that annoyed
> them, and was more difficult to implement than they wanted, and
> didn't really matter to their user community. And yet, they valued
> full compliance with the spec above just being "mostly" compliant to
> the parts their users cared about.
>
> Given that power, can't you see how natural it is for users to
> believe a spec change could force vendors to update implementations?
> Surely there's at least a grain of truth to that wish...

Sure. We're in agreement as to the potential. It's exactly for this
reason that I responded.

This might indeed be a way of forcing change. I'm just trying
to suggest that in many cases it's not the ethically best path.
Well, I don't know if it's a priori unethical, but I'm arguing for
a construction that makes it unethical exactly to avoid a bad
societal situation--exactly because once advertised as a standard way
of achieving change, it leads to people who make a profession of
finding ways to argue all kinds of weird things, and in the end the
treatment of the spec (an intended instrument of stability) as a
springboard for change (through creative interpretation) weakens our
community.

If you think the standard is misworded in a way that violates original
intent, that's one thing. If you instead think that the standard is
worded in a way that everyone has traditionally understood to mean a
certain thing which coincides with original meaning, but you think
there's a cool new interpretation that you'd like and that could be
argued to be inside some bizarre/novel/new reading of the standard
even though no one now does it, you're stirring up a procedural
hornet's nest that goes well beyond your local goal and that you would
live to regret. Or so I say. My opinion is just one of many.

Scott Schwartz

unread,
Dec 15, 2002, 7:38:45 PM12/15/02
to
Joe Marshall <j...@ccs.neu.edu> writes:
> I agree, but C compiler vendors have a far easier task. C was
> designed to be easy to implement rather than to be easy to use.

On the contrary. C was designed to be easy to use (it says so right
in the preface to "The C Programming Language") and it is.

Paul F. Dietz

unread,
Dec 15, 2002, 7:48:05 PM12/15/02
to
Scott Schwartz wrote:

>>I agree, but C compiler vendors have a far easier task. C was
>>designed to be easy to implement rather than to be easy to use.
>
> On the contrary. C was designed to be easy to use (it says so right
> in the preface to "The C Programming Language") and it is.

Ease of use may have been a secondary design criterion, but a
primary one was that it compile on a machine with a 64KB address
space and generate reasonably efficient code.

Paul

Damond Walker

unread,
Dec 15, 2002, 9:44:55 PM12/15/02
to
On 12/15/02 7:48 PM, in article xbScnV1m5uW...@dls.net, "Paul F.
Dietz" <di...@dls.net> wrote:

> Ease of use may have been a secondary design criterion, but a
> primary one was that it compile on a machine with a 64KB address
> space and generate reasonably efficient code.

Say what you want...but those were the days. ;)

Damond

Tim Bradshaw

unread,
Dec 16, 2002, 5:38:13 AM12/16/02
to
* Kent M Pitman wrote:
> In this case, you might think the world was the simple one you described
> but HTTP streams change their nature dynamically between bursts. You read
> a command line and then on the very same stream you read some number of
> bytes of data. The external format changes dynamically. We might never
> have seen such a stream back in those days or might have thought it
> irregular and unimportant, and so if we had gone to specify READ-LINE we
> might have said "oh, just assume all streams never change format" and then
> we might have left out "xml streams", "sgml streams", and "http streams",
> which are probably some of the most used streams around!

I think that CL already fails to deal with these situations, or rather
requires you to do all the work yourself, because you have to make the
decision about element type when you open the stream, and that's too
early. If you want, say, to read encoded (either line-end-converted
or more generally encoded) characters, *and* bytes from a stream, you
are going to have to do a lot of work. I'm trying to suggest that the
answer to these kinds of problem might be for streams to become more
flexible and, ideally, for some of that flexibility to be standard (in
an informal sense). Using external formats rather than requiring
everyone and their dog to write their own, probably buggy,
line-reading code is part of that story; allowing bivalent streams or
changing element type and external format on open streams is another.
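To make the "lot of work" concrete, here is a sketch (not from the thread; the function names and ASCII-only assumption are mine) of what a portable-CL HTTP reader is forced into today: commit to `(unsigned-byte 8)` at OPEN time, then hand-roll the character-level line reading that an external format should have provided.

```lisp
;; Reading both header lines and raw body bytes from one byte stream.
(defun read-crlf-line (stream)
  "Read one CRLF-terminated header line from a (unsigned-byte 8) STREAM.
Assumes the headers are plain ASCII; a real client would honor the
declared charset -- exactly the buggy corner everyone gets wrong."
  (with-output-to-string (line)
    (loop for byte = (read-byte stream nil nil)
          until (or (null byte) (= byte 10))   ; LF ends the line
          unless (= byte 13)                   ; drop the CR
            do (write-char (code-char byte) line))))

(defun read-body-octets (stream content-length)
  "After the headers, the very same stream yields raw bytes."
  (let ((buffer (make-array content-length
                            :element-type '(unsigned-byte 8))))
    (read-sequence buffer stream)
    buffer))
```

A bivalent stream would let READ-LINE and READ-SEQUENCE of octets coexist on the one stream and make both helpers unnecessary.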

--tim

Pascal Costanza

unread,
Dec 16, 2002, 6:01:06 AM12/16/02
to
Kent M Pitman wrote:

> You see similar effects in the discussion of CLOSE where we tried to
> define what happens when you close encapsulated streams.

This isn't directly related - but why did you include a close function
at all? Since my experience with the Oberon System I was always puzzled
why languages that include GC require you to close files. In Oberon,
this is also the job of the GC...


Pascal

--
Pascal Costanza University of Bonn
mailto:cost...@web.de Institute of Computer Science III
http://www.pascalcostanza.de Römerstr. 164, D-53117 Bonn (Germany)

Espen Vestre

unread,
Dec 16, 2002, 6:05:09 AM12/16/02
to
Pascal Costanza <cost...@web.de> writes:

> at all? Since my experience with the Oberon System I was always
> puzzled why languages that include GC require you to close files. In
> Oberon, this is also the job of the GC...

Wouldn't this make file descriptor reclaim too unpredictable (e.g.
for tcp/ip servers which then would be easier targets for DoS-attacks)?
--
(espen)

Pascal Costanza

unread,
Dec 16, 2002, 6:17:33 AM12/16/02
to

Oh, I don't know enough about this kind of low-level stuff - what do the
experts think? I do know that this allows for a very convenient
programming style - you can pass files around in your programs without
worrying at all about who is responsible for closing them.

Tim Bradshaw

unread,
Dec 16, 2002, 6:39:19 AM12/16/02
to
* Pascal Costanza wrote:

> This isn't directly related - but why did you include a close function
> at all? Since my experience with the Oberon System I was always
> puzzled why languages that include GC require you to close files. In
> Oberon, this is also the job of the GC...

I'm glad I don't use Oberon then. Using the GC to do resource/object
management is a recipe for disaster: either you have to overspecify
the GC and therefore essentially kill a lot of experimentation with
GCs, or you are going to get an ugly shock when the GC fails to close
your files, and you run out of file handles. Finally, if you can make
it work at all it's just bad design anyway.

For the first case: for a conventional copying GC, if the file handle
is garbage the GC never even sees it, and so never closes it. In
order to make this work, you have to make it secretly not be garbage,
so the GC can see it, and close it. So you have to implement
finalisation or something like that.

For the second case: how soon is the GC meant to run? Maybe you have
a lot of real memory and aren't consing much other than file handles,
so it runs once a week or something. Should running out of OS
filehandles cause a GC? What about issues with generational GCs -
your file handle might get tenured and the system never even realises
it's garbage.

Finally, this is just terrible design. What happens if closing the
file *fails*? At the point it fails, you're in the GC, and you've
lost all the context which might help you decide what to do. There's
not much you can do at this point but unilaterally close the stream
and hope you didn't care about your data much, anyway. Closing a file
can fail even if you've flushed all the data, especially if you use
something like NFS. And of course, users generally won't flush all
the data, so closing the file has to actually put the last buffer's
worth somewhere, or not. You really want this to happen in the
context of your program, not the GC.

--tim

Pascal Costanza

unread,
Dec 16, 2002, 8:04:48 AM12/16/02
to
Tim Bradshaw wrote:
> * Pascal Costanza wrote:
>
>
>>This isn't directly related - but why did you include a close function
>>at all? Since my experience with the Oberon System I was always
>>puzzled why languages that include GC require you to close files. In
>>Oberon, this is also the job of the GC...
>
>
> I'm glad I don't use Oberon then. Using the GC to do resource/object
> management is a recipe for disaster: either you have to overspecify
> the GC and therefore essentially kill a lot of experimentation with
> GCs, or you are going to get an ugly shock when the GC fails to close
> your files, and you run out of file handles. Finally, if you can make
> it work at all it's just bad design anyway.

Actually, Oberon had a very neat design, based on some simplifying
assumptions, but I have given a very abridged description. For example,
the original Oberon system couldn't run out of file handles. Generally,
I agree with you, the GC shouldn't have added responsibilities.

> For the first case: for a conventional copying GC, if the file handle
> is garbage the GC never even sees it, and so never closes it. In
> order to make this work, you have to make it secretly not be garbage,
> so the GC can see it, and close it. So you have to implement
> finalisation or something like that.

I don't completely understand this paragraph - is finalisation a
problematic issue in this context?

> For the second case: how soon is the GC meant to run? Maybe you have
> a lot of real memory and aren't consing much other than file handles,
> so it runs once a week or something. Should running out of OS
> filehandles cause a GC? What about issues with generational GCs -
> your file handle might get tenured and the system never even realises
> it's garbage.

As I said above, Oberon was based on some simplifying assumptions - the
GC responsible for reclamation of harddisk space was supposed to run
about once a day (whenever you switched on your workstation ;).

> Finally, this is just terrible design. What happens if closing the
> file *fails*?

[...]

I haven't thought of partial failure before in this context. You're
right, this is a very strong objection. Thanks for your comments.

Tim Bradshaw

unread,
Dec 16, 2002, 8:54:18 AM12/16/02
to
* Pascal Costanza wrote:

> Actually, Oberon had a very neat design, based on some simplifying
> assumptions, but I have given a very abridged description. For
> example, the original Oberon system couldn't run out of file
> handles. Generally, I agree with you, the GC shouldn't have added
> responsibilities.

Assuming you can't run out of file handles is not really a tradeoff
that a system that might want to run on a general OS should make,
because it's likely not true for some OSs, both implementationally and
administratively (`we support a nearly unlimited number of open files,
but we restrict the number a process can have to nnn' (such as modern
Unixes with ulimit)).

Even assuming you control the OS, you can't make this be true - if
these file handles refer to open TCP connections, it's not only your
system that needs to not care how many you have, it's all the systems
you might talk to...


> I don't completely understand this paragraph - is finalisation a
> problematic issue in this context?

I don't know how hard it is to implement. I suppose you do it by
maintaining a secret list of objects-with-finalizers, which is not
counted as a root for GC. Then you do a normal GC, and, once it's
done, check to see if any of the objects on the to-be-finalised list
are still in old-space. If they are, copy them (and anything they
refer to) to new space, run the finalizer and then remove them from
the list. This approach has two problems I can see: if the finalizer
`revives' the object, should it now run again next time it
becomes garbage? I guess not. Secondly, in a generational GC you
probably don't want the finalizer to make things more likely to be
tenured, but I guess this approach will do that.
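The user-visible half of this -- finalizers as a safety net rather than as the primary close mechanism -- can be sketched with an implementation-specific hook. Nothing below is standard CL: SB-EXT:FINALIZE and SB-SYS:FD-STREAM-FD are SBCL-specific, and other Lisps spell the same idea differently.

```lisp
;; Assumes SBCL.  The crucial trick: the finalizer closure must capture
;; the raw file descriptor, NOT the stream object itself -- if it held
;; the stream, the stream could never become garbage at all.
(require :sb-posix)

(defun open-fd-with-safety-net (pathname)
  (let* ((stream (open pathname :element-type '(unsigned-byte 8)))
         (fd (sb-sys:fd-stream-fd stream)))   ; the underlying OS handle
    (sb-ext:finalize stream
                     (lambda ()
                       ;; Runs at some unpredictable point after GC
                       ;; decides STREAM is garbage -- which is exactly
                       ;; why this is only a leak backstop, not a
                       ;; substitute for CLOSE.
                       (sb-posix:close fd)))
    stream))
```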

But really, my point was that if you mandate, as part of the language,
that the GC will close file handles (or that open files will somehow
magically get closed), then you've mandated something like finalizers,
and this might have significant implementation cost (or it might not -
maybe finalizers are as simple as they look above).

> I haven't thought of partial failure before in this context. You're
> right, this is a very strong objection. Thanks for your comments.

Another related thing is: what if you get unsolicited input after the
file handle has been dropped? If it's a TCP stream, for instance...

Anyway, I think you can see the issues (:-).

--tim

Pascal Costanza

unread,
Dec 16, 2002, 9:11:54 AM12/16/02
to
Tim Bradshaw wrote:
> * Pascal Costanza wrote:
>
>
>>Actually, Oberon had a very neat design, based on some simplifying
>>assumptions, but I have given a very abridged description. For
>>example, the original Oberon system couldn't run out of file
>>handles. Generally, I agree with you, the GC shouldn't have added
>>responsibilities.
>
>
> Assuming you can't run out of file handles is not really a tradeoff
> that a system that might want to run on a general OS should make,
[...]

OK

>>I haven't thought of partial failure before in this context. You're
>>right, this is a very strong objection. Thanks for your comments.
>
>
> Another related thing is: what if you get unsolicited input after the
> file handle has been dropped? If it's a TCP stream, for instance...
>
> Anyway, I think you can see the issues (:-).

Yes, I can. :)

Thanks for your excellent explanations.

Duane Rettig

unread,
Dec 16, 2002, 12:00:01 PM12/16/02
to
Tim Bradshaw <t...@cley.com> writes:

We've already done the work for you here, in the form of simple-streams.
Simple-streams even gives you CL compliance, in that if you specify an
element-type at all (which you still can't change) to open, you get a
Gray stream instead of a simple-stream, which will give you the conformance
of the unchangeable element-type that CL requires. Simple-streams are by
nature bivalent, and so don't force you to work with specific data widths.

Ray Blaak

unread,
Dec 16, 2002, 12:58:13 PM12/16/02
to
Pascal Costanza <cost...@web.de> writes:
> Espen Vestre wrote:
> > Pascal Costanza <cost...@web.de> writes:
> >>at all? Since my experience with the Oberon System I was always
> >>puzzled why languages that include GC require you to close files. In
> >>Oberon, this is also the job of the GC...
> >
> > Wouldn't this make file descriptor reclaim too unpredictable (e.g.
> > for tcp/ip servers which then would be easier targets for DoS-attacks)?
>
> Oh, I don't know enough about this kind of low-level stuff - what do the
> experts think? I do know that this allows for a very convenient
> programming style - you can pass files around in your programs without
> worrying at all about who is responsible for closing them.

Not that I am calling myself an expert, but I do think that this makes file
resource clean up too unreliable. In general one cannot assume when GC will
release a particular resource.

File and network resources usually need immediate cleanup. Linking these to GC
cleanup is usually done only as a safety measure, such that in worst case
there are no ultimate leaks.

--
Cheers, The Rhythm is around me,
The Rhythm has control.
Ray Blaak The Rhythm is inside me,
bl...@telus.net The Rhythm has my soul.

Daniel Barlow

unread,
Dec 16, 2002, 7:44:25 AM12/16/02
to
Tim Bradshaw <t...@cley.com> writes:

> and hope you didn't care about your data much, anyway. Closing a file
> can fail even if you've flushed all the data, especially if you use
> something like NFS. And of course, users generally won't flush all
> the data, so closing the file has to actually put the last buffer's
> worth somewhere, or not. You really want this to happen in the
> context of your program, not the GC.

Another point is that on file descriptors which represent sockets, the
time that you close the file is observable to the connected peer, and
the socket shutdown may actually be part of the protocol. For
example, in an HTTP server that doesn't chunk or send Content-length,
it's the only way to tell that the response has finished. You don't
want all your users to be sitting there with stalled browsers until
the next GC.


-dan

--

http://www.cliki.net/ - Link farm for free CL-on-Unix resources

Kent M Pitman

unread,
Dec 16, 2002, 7:01:20 PM12/16/02
to
Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes:

Absolutely.

Some GC's don't run for days. Some GC's are generational and run all
the time, but then again, an open file can survive a generational GC
and get aged into an older pool, too. It also depends on the size of
the address space, how fast you're using space, etc. The CL specification
does not mention GC, btw; it is simply assumed that this is a resource
issue to be dealt with by implementations. Some Lisp Machine users would
prefer, rather than gc'ing, to simply run for a few days with no GC and
then die fatally when finally out of memory. Losing a lock on a file for
that period of time is unacceptable.

In fact, in Maclisp days on ITS on the PDP10's at MIT, we had about 30
users logged into one PDP10 (with about 5MB core/ram memory,
incidentally). That operating system had a finite number of "file
channels" and the number wasn't huge. It was very easy to get a
situation where there were no more file channels available. In fact,
I used to open files and not close them and let the GC do it, but
people would come to me and complain I was "using too many resources".
In an attempt to avoid being constantly berated by irate colleagues, I
wrote a macro I called IOTA which was very similar to the present day
WITH-OPEN-FILE (except it handled more than one binding in the binding
list) to keep my channel usage to something respectable. A lot of
Maclisp users used IOTA, and it helped a lot to keep system resources
in check. I believe WITH-OPEN-FILE was independently developed on the
Lisp Machine at roughly the same time, though surely for similar
reasons. The need for these things seems to arise everywhere at
once.
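For readers who weren't there: IOTA itself is lost to me, but the shape Kent describes -- WITH-OPEN-FILE generalized to several bindings, with the closes guaranteed on any exit -- reconstructs to something like this (the details are my assumption, not the historical code):

```lisp
;; Each binding opens a file; UNWIND-PROTECT guarantees every opened
;; channel is closed even on a non-local exit, which is what kept the
;; ITS file-channel usage "respectable".
(defmacro iota (bindings &body body)
  (if (null bindings)
      `(progn ,@body)
      (destructuring-bind ((var &rest open-args) &rest more) bindings
        `(let ((,var (open ,@open-args)))
           (unwind-protect
                (iota ,more ,@body)
             (close ,var))))))

;; Usage: both files are closed on exit, normal or abnormal.
;; (iota ((in "notes.txt")
;;        (out "copy.txt" :direction :output))
;;   (loop for line = (read-line in nil)
;;         while line do (write-line line out)))
```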

Moreover, there is the need to be able to close synchronously.
Holding a file open is often a "hands off" signal and you often want
to release it in a controlled way for interprocess synchronization.
Waiting for the GC, which usually happens at an unsynchronized time,
will yield bad behavior in such cases.

Pascal Costanza

unread,
Dec 16, 2002, 7:14:22 PM12/16/02
to
Kent M Pitman wrote:
> Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes:
>
>
>>Pascal Costanza <cost...@web.de> writes:
>>
>>
>>>at all? Since my experience with the Oberon System I was always
>>>puzzled why languages that include GC require you to close files. In
>>>Oberon, this is also the job of the GC...
>>
>>Wouldn't this make file descriptor reclaim too unpredictable (e.g.
>>for tcp/ip servers which then would be easier targets for DoS-attacks)?
>
>
> Absolutely.
[...]

I have learned a lot from this little subthread - thanks a lot indeed.


Pascal

--
Given any rule, however ‘fundamental’ or ‘necessary’ for science, there
are always circumstances when it is advisable not only to ignore the
rule, but to adopt its opposite. - Paul Feyerabend
