
a crude map of Scheme


Tom Lord

Sep 30, 2001, 10:06:09 PM
> In general, though, the constraints on type representations
> strongly encourage us to have a very small (but non-0)
> number of stateless unique values which are specially
> understood by control structures. The simplest, small,
> non-0 integer is 1.

Care to elaborate why you think there are such constraints?

Sure. I did in a previous post, but I'll try a different mode of
explanation here. Unfortunately, my beliefs are formed with
consideration of a very large and largely unformalized context: unless
"lambda mysticism" floats your boat, it is hard to explain my reasons
with just a pithy turn of phrase or two. ("Words are like fingers,
pointing at the moon....").

Let's distinguish between five kinds of "Scheme":

Scheme in R5RS (S5):
The Scheme of the current version of the Revised
Report.

Scheme in R4RS (S4):
The Scheme of R4RS. The important difference between
S4 and S5, for this post, is that in S4 it is
unspecified whether or not () and #f are the same (eq?);
in S5, they are required to be the same.

Scheme, Generically (SG):
SG is not a single programming language, but
rather a family of languages, each of which might have
S4 or S5 as a subset.

Perfect Scheme (PS):
Let's pretend for the moment that the Revised Report
series is an infinite series, and that, over time, it
produces better and better approximations of an ideal
standard: Perfect Scheme. In reality, it's an open
question whether or not PS can be said to exist: it
might be the case that PS is impossible to define, and
that the Revised Report should just settle on an
arbitrary consensus. On the other hand, it might be
the case that if we agree on a few general, and widely
agreeable principles that constrain the design of
Scheme, that those constraints are enough to
completely determine PS.

Common Scheme (CS):
Let's pretend that we know PS exists, even if we
haven't completely worked out yet what it should be.
Whatever PS is, it is almost certainly a minimal
subset of the facilities that any "complete",
practical Scheme implementation should have.
Common Scheme is a hypothetical standard that is a
superset of PS, and that contains lots of arbitrary
but generally agreed upon elements which several
implementations might support.


Here's where I think we are:

1) Lots of implementors develop SG implementations and often they take
the language design in quite novel directions. SIOD-style
implementations try to build the smallest useful implementations;
oaklisp, dylan, and (after one or two more tweaks) python explore
"object oriented" variations; VSCM and S48 aim for highly disciplined
and precise implementations, with S48 aiming to create degrees of
freedom in implementation technique; SCM strikes a pretty good balance
given equally important goals of a fully dynamic implementation, and
one that is fast; MzScheme supports a number of features nice for
teaching and experimenting with a particular style of GUI framework;
several dialects aim to support highly optimized compilation; the list
goes on. Scheme is a family of languages and people who limit their
thinking to just S5 or even PS are missing the point.

2) S4 and S5 are attempts to write down a spec for PS: a tiny subset
language that SG implementors can surface within a larger context. PS
has a few important uses:

- PS encourages implementors to focus on a small set of
primitives that everyone should have, which has influence
both on Scheme coding style, and on the development of
portable libraries. The goal of supporting portable
libraries is sometimes overstated: although the RnRS
series has been around for a long time, large libraries
of portable Scheme code have not emerged.

- PS gives a starting point for learning the specific
languages of any of the SG implementations. If you
know PS, you can learn all those SGs easily.

- PS creates a precise lingua-franca for technical
communication. At least a few really good papers
describe results in what could be S4 or S5.

- PS gives a small but non-trivial language, generalizing
the algol family of languages, that can be used for
research into language design and implementation, and
high-level programming tools.

3) CS is a very long way off, but there are several packages and SRFIs
which sure look like they'll show up (in some form) in CS. CS
probably won't be a monolith in the style of Common Lisp; instead,
like the SRFIs, it will be a large collection of components which
specific SGs can explicitly select from.

Looking at the SGs, the SRFIs, and some of the nicer non-SRFI packages
floating around, it looks like S4 and S5 aren't quite enough to
provide a common substrate for CS. That is -- CS is going to involve
more than just libraries built on top of any approximation of PS.

That's an important point to keep in mind. As we try to define PS,
since we're not particularly succeeding at providing a core for which
all of a future CS can be libraries, we should try to make sure that
our PS approximations don't make arbitrary choices that interfere with
the development of reasonable SGs that can make contributions to CS.


* Towards PS

I *conjecture* that PS exists: we can state a few generally agreeable
constraints, from which PS (or rather, a class of isomorphic
languages, each having a different choice of "primitive" constructs)
simply falls out.

What constraints do I have in mind? Sorry -- I can't state them
formally yet; that's why it's a conjecture. Here is an *informal*
(imprecise) statement. People who dedicate large parts of their lives
to accurate mathematical speech, please begin gnashing your teeth
*now*:

* A side-effectless subset of the language should be
isomorphic to an applicative order lambda-calculus-plus-
call-with-current-continuation over some given set of
primitive types and procedures. All primitive procedures
must be total over the universe of values, though error
values and signaling mechanisms may be introduced for this
purpose, and effectiveness is limited by the choice of
reduction order and the usual constraints on computability.

* There should be a global namespace of top-level variables.

* Side effects should be permitted which allow the values
of variables in scope to be modified. The timing of
these side effects is determined by the rules of applicative
order evaluation and continuations created with call/cc.

I think that implies most of Scheme, but it doesn't say what the
primitive types and procedures are, and it doesn't say anything about
side effects over the primitive types. Where do those come from?
This is where things get tricky.
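
To make the "implies most of Scheme" claim concrete, here are two
standard derivations (illustrative sketches, not from the original
post):

; LET is just sugar for an immediate LAMBDA application:
;   (let ((x 1) (y 2)) (+ x y))  ==  ((lambda (x y) (+ x y)) 1 2)
;
; and early exit needs nothing beyond call/cc:
(define (first-negative lst)
  (call-with-current-continuation
    (lambda (return)
      (for-each (lambda (x) (if (< x 0) (return x))) lst)
      #f)))
; (first-negative '(3 1 -4 1 -5))  =>  -4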

I think it is reasonable to say that the constraints on PS should be
chosen to permit easy and efficient implementation. We're already
largely down that path once we say "applicative order" and "side
effects on variables" -- we might as well make that commitment
explicit, and see what else it implies.

In order to talk about "easy and efficient implementation" more
precisely, we need some model of the kinds of computer we have in
mind. The model has to be abstract enough to describe all reasonable
systems but detailed enough to capture relevant performance
characteristics. Roughly speaking, I think the model is something
like:

* A general purpose, finite state, CPU, whose state is
small relative to the size of programs, connected to
a larger storage device by a small, fixed-bandwidth
connection. PS programs are reducible to instructions
and data stored in the larger device, fetched in small
pieces by the CPU, which can also write (at least) data and
(maybe) instructions back to the larger store. (I'm leaving
out I/O from this model, since it isn't relevant to the
discussion of ()/#f).

* The larger store consists of small values (whose size is
closely tied to the CPU-to-store bandwidth), and has an
address mechanism and access cost function that are related by
"locality of reference": addresses which are "close" can be
accessed, within a given period of time at significantly
lower cost than addresses that are "far away".

Because we've required the primitive procedures to be total, data
representations in PS must carry type information. Because the CPU is
finite state, and bandwidth between the CPU and larger store quite
limited, we have a space/time trade-off to make in the representation
of types. That's why staggered-tag type representations were invented
in the first place; one of the points of my sketch of a model of
computers for PS is that a staggered-tag type representation, or any
other representation which gives us a few types with "fast"
representations and others with "slow" representations, is pretty much
essential: there's no getting around it.
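
To make the trade-off concrete, here is a toy model of a staggered-tag
representation, sketched in Scheme over plain integers (names
invented; no particular implementation is being described):

; The low 2 bits of a "word" are the fast tag; the rest is payload.
(define tag-fixnum  0)  ; #b00 -- immediate integers
(define tag-pair    1)  ; #b01 -- pointers to pairs
(define tag-special 2)  ; #b10 -- #f, '() and friends
(define tag-other   3)  ; #b11 -- everything "slow": type lives in a header
(define (make-word tag payload) (+ tag (* 4 payload)))
(define (word-tag w) (modulo w 4))
; (word-tag (make-word tag-pair 1234)) => 1 -- one cheap test.
; Values tagged TAG-OTHER pay a memory fetch to learn their type.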

Since type "real estate" is naturally limited, PS should be quite
minimalist in picking the primitive types. It should require as few
as practical, leaving room for SG implementations to fill out the
rest.

I think these goals lead, maybe with one or two extra observations
that I'll omit from this post, to the types "pair" and "vector". It's
interesting to think about why two types are desirable there instead
of only one.

Characters and strings are another matter. Many of the S4 and S5
details for string handling are arbitrary: perhaps they should be
floated out of the standard in favor of lower-level or at least less
culturally-specific facilities.

What about control structures? We don't, semantically speaking, need
to define control structures at all: we've got lambda. But our rough
sketch of a machine model strongly hints that the lambda model of
control structures isn't good enough: we need a reliably fast
implementation -- some primitive control structures. That means that
control structures place another demand on precious "type real
estate": to have fast control structures, we have to use up some of
the small number of bits reliably available for "fast types", and
switch on those bits in primitive conditionals.

If we're going to take up some precious "fast types" for control
structures, we'd better maximize the "bang for our buck". For that
reason, for example, we require that the control structures not signal
an error for non-booleans; we also specify the control structures to
return non-boolean values for certain arguments. We want these
control structures to have a maximal number of direct uses.
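
For example (standard R5RS behaviour, nothing specific to this post):

; ASSQ returns either the matching pair or #f, and OR passes the
; non-boolean result straight through -- a "direct use" of the
; liberal control structures:
(define alist '((a . 1) (b . 2)))
(or (assq 'b alist) '(b . 0))   ; => (b . 2)
(or (assq 'c alist) '(c . 0))   ; => (c . 0), the default entry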

Although non-binary logics might be interesting, binary logics are
just as powerful, permit fast implementation on pretty much all the
hardware we've ever seen, and probably impose the least burden on the
type system. So we need to partition the primitive types into "true"
and "false" values. The way to do that that takes up the fewest
number of bits in type representations is to lump almost all values
into one category (which we may as well call "true"), and put just one
value into the other category ("false"): a fraction of a bit of type
real estate dedicated to control structures.

What about "end of list"? PS doesn't need to define lists as a
primitive construct at all. It's handy to specify "the ordinary way"
to build lists out of pairs, but we're not adding anything fundamental
to the language in doing so -- it's all just library procedures.
Except, that is, for the "end of list" object.

EOL (aka '()) is the name for the leaf case in our recursive
definition of a list. We could either pick some existing primitive
value for EOL (say, #f, or the symbol 'nil, or the integer 0, or a
string "end of list", or ...). Or we could define a distinct value,
separate from all the others. If we're going to re-use a value, since
this is a value used for the leaf-case in a recursive data structure,
#f is the choice that leads to simpler code and faster execution: it
increases the direct usefulness of our primitive control structures.
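
To see the effect, a sketch in a hypothetical dialect where '() and
#f are the same object (this is not portable Scheme):

; The end-of-list test folds into the plain truth test:
(define (len l)
  (if l                        ; any pair counts as true; EOL is false
      (+ 1 (len (cdr l)))
      0))
; Portable code must spend the extra test: (if (null? l) 0 ...)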

We could stop here, observe that people succeed at writing perfectly
good Scheme code without knowing whether or not '() is a distinct value,
and conclude that PS has no business settling the issue. That's where
we were up through R4RS.

Or we could go further, observe that lots of recursively defined data
types need leaf cases, and choose between requiring an arbitrary
number of such values or concluding that they should all share just
one value. An arbitrary number of leaf-case values places a burden on
the type system that we've otherwise managed to avoid. If we're going to
have a single value, making that value the same value as the control
structure value (#f) further reduces the burden on type real estate
and increases the direct applicability of the control structures.

So, either '() and #f should be the same in PS; or whether or not they
are the same should be unspecified. Well, that's the conjecture, and
a sketch of how it might be proven.

-t

Stephan H.M.J. Houben

Oct 1, 2001, 3:39:44 AM
On Mon, 01 Oct 2001 02:06:09 -0000, Tom Lord <lo...@emf.emf.net> wrote:

[large snippage]

>Although non-binary logics might be interesting, binary logics are
>just as powerful, permit fast implementation on pretty much all the
>hardware we've ever seen, and probably impose the least burden on the
>type system. So we need to partition the primitive types into "true"
>and "false" values. The way to do that that takes up the fewest
>number of bits in type representations is to lump almost all values
>into one category (which we may as well call "true"), and put just one
>value into the other category ("false"): a fraction of a bit of type
>real estate dedicated to control structures.

Mmm, I come to the same conclusion, but on somewhat different
grounds. We want only one "false" value, because that means
that we can transform something like

(if cond true-part false-part)

into effectively:

(case cond ((#f) false-part) (else true-part))

, which in turn becomes something like

; load cond into accu register
compare accu, #f
jump-if-true false-part
; execute true-part
jump end
false-part:
; execute false-part
end:
; rest of the code

So on the machine level, case is more fundamental than if.
Now, if we presume that our compiler is
sufficiently intelligent to optimise:

(if (null? foo) null-part nonnull-part)

into:

(case foo
  ((())
   null-part)
  (else
   nonnull-part))

which in turn becomes:

; load foo into accu register
compare accu, ()
jump-if-true null-part
; execute nonnull-part
jump end
null-part:
; execute null-part
end:
; rest of the code

, then it is clear that #f or () as EOL marker makes no difference.

It might make a difference on hypothetical future hardware that
has hard-wired support for a special jump-on-register-false instruction.
But I am not aware of any hardware that has such an instruction today.

Stephan

Tom Lord

Oct 1, 2001, 4:54:55 AM

So on the machine level, case is more fundamental than if.

I don't see that. I don't see how your examples show that.
`case' translates to `if' and `if' translates to `case' -- what
difference does it make?

-t


Sander Vesik

Oct 1, 2001, 9:08:40 AM
Tom Lord <lo...@emf.emf.net> wrote:
> > In general, though, the constraints on type representations
> > strongly encourage us to have a very small (but non-0)
> > number of stateless unique values which are specially
> > understood by control structures. The simplest, small,
> > non-0 integer is 1.

> Care to elaborate why you think there are such constraints?

> Sure. I did in a previous post, but I'll try a different mode of
> explanation here. Unfortunately, my beliefs are formed with
> consideration of a very large and largely unformalized context: unless
> "lambda mysticism" floats your boat, it is hard to explain my reasons
> with just a pithy turn of phrase or two. ("Words are like fingers,
> pointing at the moon....").

> Let's distinguish between five kinds of "Scheme":

> Scheme in R5RS (S5):
> The Scheme of the current version of the Revised
> Report.

> Scheme in R4RS (S4):
> The Scheme of R4RS. The important difference between
> S4 and S5, for this post, is that in S4 it is
> unspecified whether or not () and #f are the same (eq?);
> in S5, they are required to be the same.

-------------------------
6.3.1 Booleans

The standard boolean objects for true and false are written as #t and #f.
What really matters, though, are the objects that the Scheme conditional
expressions (if, cond, and, or, do) treat as true or false. The phrase
``a true value'' (or sometimes just ``true'') means any object treated as
true by the conditional expressions, and the phrase ``a false value'' (or
``false'') means any object treated as false by the conditional expressions.

Of all the standard Scheme values, only #f counts as false in conditional
expressions. Except for #f, all standard Scheme values, including #t,
pairs, the empty list, symbols, numbers, strings, vectors, and procedures,
count as true.
-------------------------

In effect, it sounds to me that R5RS *forbids* '() being treated as an
alias for #f. This is even somewhat backed by the disjointness of
types, namely that every object belongs to no more than one primitive
type: if '() were the same as #f, it would effectively be both a truth
value and the empty list.

Why do you call the additional features over PS standardised by
CS 'arbitrary'?

> Here's where I think we are:

> 1) Lots of implementors develop SG implementations and often they take
> the language design in quite novel directions. SIOD-style
> implementations try to build the smallest useful implementations;
> oaklisp, dylan, and (after one or two more tweaks) python explore
> "object oriented" variations; VSCM and S48 aim for highly disciplined
> and precise implementations, with S48 aiming to create degrees of
> freedom in implementation technique; SCM strikes a pretty good balance
> given equally important goals of a fully dynamic implementation, and
> one that is fast; MzScheme supports a number of features nice for
> teaching and experimenting with a particular style of GUI framework;
> several dialects aim to support highly optimized compilation; the list
> goes on. Scheme is a family of languages and people who limit their
> thinking to just S5 or even PS are missing the point.

Considering it uses dylan and python in the reasoning, I have a hard
time extracting anything from this paragraph 8-(

> 2) S4 and S5 are attempts to write down a spec for PS: a tiny subset
> language that SG implementors can surface within a larger context. PS
> has a few important uses:

> - PS encourages implementors to focus on a small set of
> primitives that everyone should have, which has influence
> both on Scheme coding style, and on the development of
> portable libraries. The goal of supporting portable
> libraries is sometimes overstated: although the RnRS
> series has been around for a long time, large libraries
> of portable Scheme code have not emerged.

Or it could be that the mechanics that cause large portable libraries
to be created are different, and the contents of RnRS, or their
usability for that, have no bearing on it.

> - PS gives a starting point for learning the specific
> languages of any of the SG implementations. If you
> know PS, you can learn all those SGs easily.

> - PS creates a precise lingua-franca for technical
> communication. At least a few really good papers
> describe results in what could be S4 or S5.

> - PS gives a small but non-trivial language, generalizing
> the algol family of languages, that can be used for
> research into language design and implementation, and
> high-level programming tools.

> 3) CS is a very long way off, but there are several packages and SRFIs
> which sure look like they'll show up (in some form) in CS. CS
> probably won't be a monolith in the style of Common Lisp; instead,
> like the SRFIs, it will be a large collection of components which
> specific SGs can explicitly select from.

Well, if there ever is an attempt at CS, I do hope it is done in a series
of RnCS, all of which approximate PCS, Perfect Common Scheme 8-)

> Looking at the SGs, the SRFIs, and some of the nicer non-SRFI packages
> floating around, it looks like S4 and S5 aren't quite enough to
> provide a common substrate for CS. That is -- CS is going to involve
> more than just libraries built on top of any approximation of PS.

> That's an important point to keep in mind. As we try to define PS,
> since we're not particularly succeeding at providing a core for which
> all of a future CS can be libraries, we should try to make sure that
> our PS approximations don't make arbitrary choices that interfere with
> the development of reasonable SGs that can make contributions to CS.

Maybe we are succeeding at defining the core, except for a different
kind of CS than what you have in mind? Alternatively, of course, it
could be that we are further away from it than you think (e.g. it will
take 2x or 3x more RnRS-s to get there).

> * Towards PS

> I *conjecture* that PS exists: we can state a few generally agreeable
> constraints, from which PS (or rather, a class of isomorphic
> languages, each having a different choice of "primitive" constructs)
> simply falls out.

> What constraints do I have in mind? Sorry -- I can't state them
> formally yet; that's why its a conjecture. Here is an *informal*
> (imprecise) statement. People who dedicate large parts of their lives
> to accurate mathematical speech, please begin gnashing your teeth
> *now*:

[snip]

> Since type "real estate" is naturally limited, PS should be quite
> minimalist in picking the primitive types. It should require as few
> as practical, leaving room for SG implementations to fill out the
> rest.

Or one might instead claim that type real-estate is essentially
unlimited, and point people claiming otherwise at the pure-OO languages
that don't have any primitive types, merely some types that have
system-defined operations. Just by representing all objects in the
style of {pointer_to_type_spec, object_data} pairs, one has essentially
acquired it.
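
Roughly, in Scheme itself (a sketch of the claim, names invented):

; Every object is a {type-spec, data} pair:
(define (box type datum) (vector type datum))
(define (type-of obj)  (vector-ref obj 0))
(define (datum-of obj) (vector-ref obj 1))
; Type "real estate" is now unlimited -- any value can serve as a
; type-spec -- at the price of an indirection on every object.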

But this is different from saying that the system-specified types should
not be a small set; it's just that "type real estate limitation" is not
the reason why this is a good idea.

> I think these goals lead, maybe with one or two extra observations
> that I'll omit from this post, to the types "pair" and "vector". It's
> interesting to think about why two types are desirable there instead
> of only one.

You don't need pair as a fundamental type; you could get away with
just having vectors. It is merely more efficient to have known
fixed-length datatypes. An easy case for having a primitive triplet
type could be made.

First off, I disagree with the need for a symbol to designate that a
list has ended. After all, Scheme does not even have a list type;
what it does have is a "pair" type. Now, one not only can but quite
often does build structures out of the pairs - in fact, RnRS describe
one such structure and specify library functions for these - in which
case it is useful to have a way to specify that a slot in a pair does
not contain anything or, alternatively, could be thought of as
containing "nothing".

There are no real constraints as to what "nothing" is as long as it is
a bound constant variable that is in scope for all functions dealing
with the structures. As the standard defines one type of structure built
out of pairs, it is reasonable that it also defines a way of saying that
one of the slots in a pair is empty (or rather, contains "nothing"). The
standard Scheme "nothing" is '(), which gets misrepresented as the empty
list (with the help of a lot (but not quite all) of the library
functions treating "nothing" as a list of zero length).

> We could stop here, observe that people succeed at writing perfectly
> good Scheme code without knowing whether or '() is a distinct value,
> and conclude that PS has no business settling the issue. That's where
> we were up through R4RS.

> Or we could go further, observe that lots of recursively defined data
> types need leaf cases, and choose between requiring an arbitrary
> number of such values; or conclude they should all share just one
> value. An arbitrary number of leaf-case values places a burden on the
> type system that we've otherwised managed to avoid. If we're going to
> have a single value, making that value the same value as the control
> structure value (#f) further reduces the burden on type real estate
> and increases the direct applicability of the control structures.

Do control structures have any reason for knowing about a library-level
feature such as lists or even user-defined structures? I don't think so;
they should merely think in terms of 'true' and 'false'.

Also, should there be more than one type of leaf node in a structure,
the system knowing about one of them is more of a burden than a help.

> So, either '() and #f should be the same in PS; or whether or not they
> are the same should be unspecified. Well, that's the conjecture, and
> a sketch of how it might be proven.

No, they should be different. It has not been shown that there is a need
for the two to be the same, and there is a trivial example that shows a
case where it is useful to have them separate:

function f returns -1 if its argument is not a list, or the length
of the list otherwise.

With #f and '() separate, we have simply:
(define f
  (lambda (x)
    (if (not (list? x))
        -1
        (length x))))

while with them being the same, there is no reasonable way to write
the function.

> -t

--
Sander

+++ Out of cheese error +++

Stephan H.M.J. Houben

Oct 1, 2001, 11:28:27 AM

In the assembly output, the conditional execution is
translated as a "compare" instruction, which sets or clears
a flag, and then a jump that is taken if the flag is set.

So the generated code effectively has to check if the
condition is eq? to #f. But it would be just as expensive
to compare with () rather than #f, or any other value for
that matter.

The point is that for "if", one value is special, namely #f.
But on conventional hardware (i.e. not specially optimised
for Lisp) there is nothing special about #f. The "case"
construct reflects this idea, since "case" has no special
value like "if" has. Thus I claimed that "case" was, in
a sense, more fundamental.

Of course, most processors only provide 2-way jumps, and case
can be n-way, so in that respect it is not so fundamental.
So perhaps this remark didn't make that much sense after all.

Anyway, I just wanted to point out that on conventional machines,
expensive, since the hardware doesn't have any built-in notion
of #f and () and considers both as pretty much arbitrary bit
patterns.

Stephan

Tom Lord

Oct 1, 2001, 12:36:22 PM

Sander Vesik <san...@haldjas.folklore.ee>:
> Tom Lord <lo...@emf.net>

> The Scheme of R4RS. The important difference between
> S4 and S5, for this post, is that in S4 it is
> unspecified whether or not () and #f are the same (eq?);
> in S5, they are required to be the same.

[....]


In effect, it sounds to me that R5RS *forbids* '() being
treated as an alias for #f

Right. Typo. Should say "different", of course.


> Common Scheme is a hypothetical standard that is a
> superset of PS, and that contains lots of arbitrary
> but generally agreed upon elements which several
> implementations might support.

Why do you call the additional features over PS
standardised by CS 'arbitrary'?

A library or feature is "arbitrary" in this sense if there is
more than one perfectly reasonable way to define the library
or feature, and the point of the standard is just to pick one.

Consider a Scheme interface to unix system calls. The way that error
codes (errno values) are represented is arbitrary: you'd want to be
regular about it, but there's more than one way to do that. A
standard would be nice, so that several implementations could make the
same choice -- but that choice would be an arbitrary one.
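
For instance (an invented sketch, not a real interface):

; One arbitrary-but-regular choice: surface errno values as symbols.
(define errno-table '((1 . EPERM) (2 . ENOENT) (13 . EACCES)))
(define (errno->symbol n)
  (let ((hit (assv n errno-table)))
    (if hit (cdr hit) 'EUNKNOWN)))
; (errno->symbol 2) => ENOENT.  Raw integers, or a record type, would
; be just as reasonable; a standard's job is merely to pick one.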

By way of contrast, consider the scoping rules for variables. We
might have arbitrary choices of syntax for variable reference, but the
basic scoping rules are not arbitrary -- they're fixed once we decide
to use a lambda calculus.

The conjecture in "a crude map of Scheme" is that, of the core
features of Scheme, far fewer are arbitrary than might be assumed at
first glance: ``perfect scheme'' -- the Scheme RnRS might be said to
aim for -- is (perhaps) the consequence of a few general decisions
about language design and the targeted class of computers.


You don't need pair as a fundamental type, you could get away with
just having vectors. It is merely more efficent to have known fixed
length datatypes. An easy case for having a primitive triplet type
could be made.

For that matter, you can "get away with" having no primitive types at
all. You can get by with nothing but functions, using functions to
build models of numbers, data structures, and everything else.
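
The classic construction, for concreteness (standard material):

; Pairs built from nothing but procedures:
(define (kons a b) (lambda (pick) (pick a b)))
(define (kar p) (p (lambda (a b) a)))
(define (kdr p) (p (lambda (a b) b)))
; (kar (kons 1 2)) => 1, with no primitive data types in sight.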

But we don't do that because efficiency is a central concern of the
design of Scheme -- you shouldn't really say that some core feature is
there "merely" because it is more efficient.

Also, Revised Report Scheme doesn't have a triplet type which is
disjoint from the pair type or the quad type -- but it arguably has
all three.

-t

Ji-Yong D. Chung

Oct 1, 2001, 1:14:20 PM
Hi

This is not related to the main thrust of the discussion, but I
am curious. You wrote,

> For that matter, you can "get away with" having no primitive types at
> all. You can get by with nothing but functions, using functions to
> build models of numbers, data structures, and everything else.

Wouldn't you still need primitive types for I/O, or can one
"get away without" those also?


Sander Vesik

Oct 1, 2001, 2:46:01 PM
Tom Lord <lo...@emf.emf.net> wrote:

> You don't need pair as a fundamental type, you could get away with
> just having vectors. It is merely more efficent to have known fixed
> length datatypes. An easy case for having a primitive triplet type
> could be made.

> For that matter, you can "get away with" having no primitive types at
> all. You can get by with nothing but functions, using functions to
> build models of numbers, data structures, and everything else.

Right - I shouldn't have written 'could get away with', but instead
'there is no major difference between pairs as pairs vs. pairs as
length-2 vectors'. Especially if you make little use of lists.

> But we don't do that because efficiency is a central concern of the
> design of Scheme -- you shouldn't really say that some core feature is
> there "merely" because it is more efficient.

> Also, Revised Report Scheme doesn't have a triplet type which is
> disjoint from the pair type or the quad type -- but it arguably has
> all three.

But none of the implementations of that triplet type are particularly
efficient (yes, some are particularly inefficient, but that's another
matter entirely).

Lauri Alanko

Oct 2, 2001, 12:32:58 PM
In article <9pa22r$2fq$1...@news.tue.nl>,

Stephan H.M.J. Houben <step...@wsan03.win.tue.nl> wrote:
>The point is that for "if", one value is special, namely #f.
>But on conventional hardware (i.e. not specially optimised
>for Lisp) there is nothing special about #f.

Unless you choose to represent #f with an all-bits-zero machine word, in
which case the flags may have been set already correctly by a previous
instruction (perhaps a load), so you don't need to compare, just branch.


Lauri Alanko
l...@iki.fi

David Rush

Oct 4, 2001, 2:46:45 PM
This reply is to both Tom's and Sander's posts; I just think I can
discuss things more easily in terms of Tom Lord's. I will commit the
usual contextual mayhem...

lo...@emf.emf.net (Tom Lord) writes:
> my beliefs are formed with
> consideration of a very large and largely unformalized context: unless
> "lambda mysticism" floats your boat,

It floats mine, so I'll give it a go.

> Let's distinguish between five kinds of "Scheme":
>
> Scheme in R5RS (S5):

> Scheme in R4RS (S4):

and Sn for any member of the RnRS approximation series.

> Scheme, Generically (SG):
> Perfect Scheme (PS):
> Common Scheme (CS):

> Scheme is a family of languages and people who limit their
> thinking to just S5 or even PS are missing the point.

Well yes and no. I like SG as a research platform into what could
arguably be called the 'core' of programming language semantics. OTOH,
when it comes to writing real code I'd prefer a fairly complete (in
POSIX terms) CS.

> 3) CS is a very long way off ... Looking at the SGs, the SRFIs, and
> some of the nicer non-SRFI packages floating around, it looks like S4
> and S5 aren't quite enough to provide a common substrate for CS.

I'm glad to hear an 'old hand' saying this at last.

> I *conjecture* that PS exists: we can state a few generally agreeable
> constraints, from which PS (or rather, a class of isomorphic
> languages, each having a different choice of "primitive" constructs)
> simply falls out.

I'm well prepared to grant the conjecture, although I suspect that PS
is SML/NJ with an s-expression syntax glommed over it. In fact, I'd
include SML as an SG exploring the benefits of static type inference.

> * A side-effectless subset of the language should be
> isomorphic to an applicative order lambda-calculus-plus-
> call-with-current-continuation over some given set of
> primitive types and procedures.

You had me up until you said 'plus call/cc'. I *like* call/cc. A
*lot*. But since I've been programming more and more in explicit CPS
style (mostly to deal with S4's that don't support multiple values) I
am not convinced that it deserves this position in the lambda
mysticism. After all call/cc is really just a simple source
transformation away from your normal code.

> All primitive procedures
> must be total over the universe of values, though error
> values and signaling mechanisms may be introduced for this
> purpose,

This is a core issue which actually bears upon your #f vs. () debate.

> I think it is reasonable to say that the constraints on PS should be
> chosen to permit easy and efficient implementation.

OK for now, but I don't totally agree here. I'd be more concerned
with efficient than easy.

> In order to talk about "easy and efficient implementation" more
> precisely, we need some model ... Roughly speaking, I think the
> model is something like:
>
> * A general purpose, finite state, CPU, whose state is
> small relative to the size of programs, ... (I'm leaving
> out I/O from this model, since it isn't relevant to the
> discussion of ()/#f).

Unfortunately, I/O does have a bearing on ()/#f. But more on that
later. Suffice it to say that this is one of the greatest failures of
Sn, and is so precisely because of the tendency to handwave I/O
away. It is a core feature of real programs that they must perform
I/O. Sn needs to have this modelled if it will ever begin to really
converge on PS.

> Since type "real estate" is naturally limited,
        ^^^^
        ||||- staggered tag real estate. Type real estate is naturally
              infinite (and is probably dense, too)

> PS should be quite minimalist in picking the primitive types.

A better reason is that we are defining a 'core' language. When we
have commonly-available 64-bit CPUs, the tag real estate will
certainly be rather a lot larger. <handwave>

> I think these goals lead, maybe with one or two extra observations
> that I'll omit from this post, to the types "pair" and "vector". It's
> interesting to think about why two types are desirable there instead
> of only one.

In fact I don't think that those two types are desirable. "pair" is a
relatively trivial generalization of "vector". I think you really mean
that PS needs both aggregate and recursive types - in which case you
should be arguing for "list" and "vector". But since a Scheme location
gets bound to a value from the union of all the primitive Scheme types
the only *visible* part of the "list" recursive type is the "pair".

In short: the list type is *not* part of PS. It is a specific instance
of the more general domain of recursive types which Sn does not
adequately support.

> Characters and strings are another matter.

Indeed. But if we're ignoring I/O these are definitely below the noise
floor. They definitely need to be part of CS, though.

> What about control structures? we need ... some primitive control
> structures.

Errr...So some combinators are primitive. No Big Deal, neh?

> That means that
> control structures place another demand on precious "type real
> estate": to have fast control structures, we have to use up some of
> the small number of bits reliably available for "fast types", and use
> those bits to switch on in primitive conditionals.

This really does not follow (as I think Sander pointed out). With S, K
and Y you've really got everything. Enlarging the primitive set to
include the cheap comparisons found on real processors is
trivial. However, none of these require either type or tag real-estate
*directly*, although various control primitives might need type
information in order to correctly function.
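
For concreteness (a standard construction, names invented):

; Booleans as procedures (Church booleans) need no tag bits at all:
(define true  (lambda (t e) (t)))
(define false (lambda (t e) (e)))
(define (if* c t-thunk e-thunk) (c t-thunk e-thunk))
; (if* true (lambda () 'yes) (lambda () 'no)) => yes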

> we require that the control structures not signal
> an error for non-booleans;

Therefore the tag real-estate argument doesn't apply here. However,
this is a very useful CS feature. I can't find it in my heart to
really consider it a PS feature; in fact, I think it would be a PS
bug.

> we also specify the control structures to
> return non-boolean values for certain arguments.

We're talking about AND and OR here right? I actually think that these
are misnamed, although they are again very convenient. We know from Sn
that AND and OR are macro-expansions for particular IF idioms.
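
The derivation in question, for reference (essentially the R5RS
derived-expression rules, modulo details):

(define-syntax my-or
  (syntax-rules ()
    ((my-or) #f)
    ((my-or e) e)
    ((my-or e1 e2 ...)
     (let ((t e1)) (if t t (my-or e2 ...))))))
; AND expands analogously: (and e1 e2 ...) => (if e1 (and e2 ...) #f)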

> So we need to partition the primitive types into "true"
> and "false" values. The way to do that that takes up the fewest
> number of bits in type representations is to lump almost all values
> into one category (which we may as well call "true"), and put just one
> value into the other category ("false"): a fraction of a bit of type
> real estate dedicated to control structures.

Pace, for the moment. I just want to note that you are making "false"
an (important) degenerate value within the overall value lattice
of the language.

> What about "end of list"? PS doesn't need to define lists as a
> primitive construct at all.

No, but...

> Except, that is, for the "end of list" object.

Yet another important degenerate value.

> EOL (aka '()) is the name for the leaf case in our recursive
> definition of a list.

> We could stop here, observe that people succeed at writing perfectly
> good Scheme code without knowing whether or not '() is a distinct value,
...


> Or we could go further, observe that lots of recursively defined data
> types need leaf cases, and choose between requiring an arbitrary
> number of such values; or conclude they should all share just one
> value. An arbitrary number of leaf-case values places a burden on the
> type system that we've otherwised managed to avoid. If we're going to
> have a single value, making that value the same value as the control
> structure value (#f) further reduces the burden on type real estate
> and increases the direct applicability of the control structures.

Except for the disjointness-of-types requirement. Unfortunately, #f is too
type-specific to be a useful leaf case for recursive types. An empty
list is in no sense 'untrue'. When you generalize to more general
recursive types (which Sn's attention to lists obfuscates) the problem
becomes even more pronounced. In point of fact Sn needs a way to
address user-definable types. As with the definition of the list type
in terms of pair, the actual recursiveness of the definition is
accomplished by the top-level union of types explicit in PS.

> So, either '() and #f should be the same in PS

No, this conflation is a fundamental error, actually. Let's consider a
related question for which '() and/or #f is/are often presented as the
solution. This is because they are both `bottom' values in certain
value lattices present within Scheme. In the lattice of lists, '()
carries no information about the lists built on it. In the lattice of
Scheme booleans we have #f vs. everything else; if you know that a value
is not #f, you know nothing else about it.

Now if you are attempting to define a mechanism for generally
performing lazy initializations (as in a putative Scheme object
system), you'd like a value that is distinct from all other values (and
therefore indicates that you don't know anything about the actual
value needed yet), in much the same way that '() and #f hide
information about the rest of the elements in their respective
lattices.

(I should define some terminology here, but in the interests of
brevity I will assume that I am not being too opaque)

Now if your lazy-value should ultimately be drawn from the lattice of
lists, you *can't* use '() as the uninitialized case if an empty list
is a valid actual-value of the instantiated list. Likewise #f is a
problem if you'd like your actual values to be drawn from the boolean
lattice. What is needed is an utterly disjoint value whose only valid
operation is the test of its presence.
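
A sketch of the usual workaround (names invented; the point is only
that the sentinel is disjoint from every value a user can produce):

(define *uninitialized* (list '*uninitialized*))  ; a fresh, unshared pair
(define (make-lazy thunk)
  (let ((v *uninitialized*))
    (lambda ()
      (if (eq? v *uninitialized*)
          (set! v (thunk)))
      v)))
; ((make-lazy (lambda () '()))) => () -- '() and #f remain usable
; as honest actual values.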

Well, Scheme has one already, and (un)surprisingly enough it is
connected with the only implicitly lazy data structure present in
Sn. The only way to denote this object is:

(with-input-from-file "/dev/null" read)

For convenience we usually call it eof-object; however, it
conventionally means that no more information is available. That
meaning, combined with the clumsiness of its denotation, makes it
entirely unattractive for the role. What I really want is the
reification of #unspecified from R5RS 7.2. This would have a number
of desirable implications for 'error values and signaling mechanisms'
which have not been (and should be) addressed in PS.

The key to making this possible is that we *not* conflate any of the
'bottoms' present within Scheme, because 'error values and signalling
mechanisms' also hide information. In order to do anything useful in
these situations we need to make a clearer separation of these
low-information cases.

> Well, that's the conjecture, and a sketch of how it might be proven.

Since I don't accept your argument about tag real-estate, it doesn't
prove it to me. Additionally I think there are very good reasons *not*
to conflate #f and '().

david rush
--
Einstein said that genius abhors consensus because when consensus is
reached, thinking stops. Stop nodding your head.
-- the Silicon Valley Tarot
