
Miscellaneous queries about the language


Andy
Jan 10, 2004, 3:14:14 PM

Hello

I have some questions - or curiosities - about Scheme and about how (&
why) it works the way it does.

I'll number these if anyone wants to pick and mix in replying:

1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
what are the implications of this (if any) for security, program
overhead, garbage collection?
1.1. In the same section: "... if they [implementations of Scheme?] can
prove that the object cannot possibly matter to any future computation"
- how does an implementation do that (as in a background process)? or is
it left to the programmer to pass arguments to ensure this happens? or ... ?
1.2. I was wondering then about garbage collection, and found this:
http://www.swiss.ai.mit.edu/projects/scheme/documentation/user_4.html#SEC32
I have to confess - this is _way_ above my head!!! :) Can someone
undertake a translation please, and at what point do considerations of
such programming come into play?
However, Wilson writes "The Scheme heap is garbage collected, meaning
that the Scheme system automatically cleans up after you."
(http://www.cs.utexas.edu/users/wilson/schintro/schintro_33.html) - so
does this vary between implementations, and if so, how does one figure
out which implementations do/not garbage collect?

2. (This is probably a no-brainer but ...) What is the difference among
all these groups: ANSI, IEEE, and the RnRS. I know what they mean and
have a general idea of their respective remits, but was wondering about
how they specifically co-ordinate/shape the development of Scheme with
respect to standardisation - like ANSI C, for example.

3. In reading a page of code, a Scheme program is a collection of
sub-programs (functions, forms). Is the arrangement of all of these
sub-forms arbitrary - for e.g. can one pass a function as an argument
near the top of page 1 when that function is being defined and
processing data in the middle of page 2?
This ties in for me with a bigger issue of how to _read_ Scheme. So far
it seems that to read Scheme, one reads from right to left - from the
inner-most set of parentheses to the outer set. This may be a statement
on the condition of my brain or the inexperience of a neophyte, but I am
finding it hard to keep all that in my brain when trying to decipher
seemingly endless blocks of (definition (statements x) ... ). How do
those of you with more experience do it?

There'll be more in the future, undoubtedly - bet you can't contain your
excitement!! :)

Thanks all

- Andy

Jens Axel Søgaard
Jan 10, 2004, 5:25:49 PM

Andy wrote:
> Hello
>
> I have some questions - or curiosities - about Scheme and about how (&
> why) it works the way it does.
>
> I'll number these if anyone wants to pick and mix in replying:
>
> 1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
> what are the implications of this (if any) for security, program
> overhead, garbage collection?

Remember, this is the semantics section. If the implementation can
prove that an object will never be used by the running program, it
is ok to remove the object.

> 1.1. In the same section: "... if they [implementations of Scheme?] can
> prove that the object cannot possibly matter to any future computation"
> - how does an implementation do that (as in a background process)? or is
> it left to the programmer to pass arguments to ensure this happens? or
> ... ?

This happens automatically. How it is done depends on the implementation.
One way is described in SICP section 5.3.2:

<http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-33.html#%_idx_5930>

> 1.2. I was wondering then about garbage collection, and found this:
> http://www.swiss.ai.mit.edu/projects/scheme/documentation/user_4.html#SEC32
> I have to confess - this is _way_ above my head!!! :) Can someone
> undertake a translation please, and at what point do considerations of
> such programming come into play?

The only thing I can think of (besides having fun with internals) is
in benchmarks. When benchmarking, it is wise to force a garbage
collection before the timer starts. In this way the second
algorithm, say, doesn't pay for garbage produced by the first.
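
In PLT Scheme (the implementation underneath DrScheme), that might look like the sketch below; `collect-garbage` and `current-milliseconds` are PLT-specific names, and `algorithm-one`/`algorithm-two` are hypothetical stand-ins for whatever you are measuring, so check your own implementation's manual for its equivalents:

```scheme
;; Sketch for PLT Scheme / MzScheme; other implementations
;; spell the GC and clock procedures differently.
(define (time-thunk thunk)
  (collect-garbage)                       ; flush garbage left by earlier runs
  (let ((start (current-milliseconds)))
    (thunk)
    (- (current-milliseconds) start)))

;; algorithm-one and algorithm-two are placeholders:
(time-thunk (lambda () (algorithm-one)))
(time-thunk (lambda () (algorithm-two)))
```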

> However, Wilson writes "The Scheme heap is garbage collected, meaning
> that the Scheme system automatically cleans up after you."
> (http://www.cs.utexas.edu/users/wilson/schintro/schintro_33.html) - so
> does this vary between implementations, and if so, how does one figure
> out which implementations do/not garbage collect?

They all do (in one form or the other).

> 3. In reading a page of code, a Scheme program is a collection of
> sub-programs (functions, forms). Is the arrangement of all of these
> sub-forms arbitrary - for e.g. can one pass a function as an argument
> near the top of page 1 when that function is being defined and
> processing data in the middle of page 2?

If all your definitions are of functions, then the order is arbitrary.
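
A tiny sketch of that (made-up names, but any R5RS Scheme should accept it):

```scheme
;; double's body mentions twice before twice is defined; that is fine,
;; because the body is not evaluated until double is actually called.
(define (double x) (twice x))
(define (twice x) (* 2 x))

(double 5) ; => 10
```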

> This ties in for me with a bigger issue of how to _read_ Scheme. So far
> it seems that to read Scheme, one reads from right to left - from the
> inner-most set of parentheses to the outer set. This may be a statement
> on the condition of my brain or the inexperience of a neophyte, but I am
> finding it hard to keep all that in my brain when trying to decipher
> seemingly endless blocks of (definition (statements x) ... ). How do
> those of you with more experience do it?

It depends on the situation. To get the big picture, I read
top-to-bottom. To study details, I read a single expression inside-out
as you do.

--
Jens Axel Søgaard

Andy
Jan 10, 2004, 7:45:41 PM

Jens Axel Søgaard wrote:
>> 1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
>> what are the implications of this (if any) for security, program
>> overhead, garbage collection?

> Remember, this is the semantics section. If the implementation can
> prove that an object will never be used by the running program, it
> is ok to remove the object.

hmmm ... I'm not sure that answers my question Jens. Can you elaborate
please.

>> 1.1. In the same section: "... if they [implementations of Scheme?]
>> can prove that the object cannot possibly matter to any future
>> computation" - how does an implementation do that (as in a background
>> process)? or is it left to the programmer to pass arguments to ensure
>> this happens? or ... ?
>
>
> This happens automatically. How it is done depends on the implementation.
> One way is descibed in SICP section 5.3.2:
>
> <http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-33.html#%_idx_5930>

Thanks. I'll check that out - but might have to come back for a further
translation :)

>> 1.2. I was wondering then about garbage collection, and found this:
>> http://www.swiss.ai.mit.edu/projects/scheme/documentation/user_4.html#SEC32
>>
>> I have to confess - this is _way_ above my head!!! :) Can someone
>> undertake a translation please, and at what point do considerations of
>> such programming come into play?
>
> The only thing I can think of (besides having fun with internals) is
> in benchmarks. Before one starts a benchmark, it is wise to force
> a garbage collection before the timer starts. In this way the second
> algorithm, say, doesn't pay for garbage produeced bu the first.

OK - so basically, as a neophyte I don't need to worry about that ...
which is just as well at this point!!

>> However, Wilson writes "The Scheme heap is garbage collected, meaning
>> that the Scheme system automatically cleans up after you."
>> (http://www.cs.utexas.edu/users/wilson/schintro/schintro_33.html) - so
>> does this vary between implementations, and if so, how does one figure
>> out which implementations do/not garbage collect?
>
>
> They all do (in one form or the other).

How does one check that out further? For example I use STk as well as
DrScheme. I am pretty certain that DrScheme has GC (I think that I read
that somewhere, but cannot remember where), but don't really know about
STk. I can but presume, but I am curious as to how one can ascertain
that. Is there a test one can apply to an implementation, the way proper
tail recursion is used, I believe, as a criterion for R5RS compliance?

>> 3. In reading a page of code, a Scheme program is a collection of
>> sub-programs (functions, forms). Is the arrangement of all of these
>> sub-forms arbitrary - for e.g. can one pass a function as an argument
>> near the top of page 1 when that function is being defined and
>> processing data in the middle of page 2?
>
> If all your definitions are of functions, then the order is arbitrary.

This is a deficit in my knowledge-base about Scheme. So far, all the
programs that I have read (which, admittedly, is not more than 4
complete programs) and the emphasis in texts I am reading/working with
seem to suggest that a Scheme program is replete with functions. I
am sure that there must be other 'forms' within a program, such as
prompts, as noted in a Scheme tutorial at
http://cs.wwc.edu/~cs_dept/KU/PR/Scheme.html . The example program,
"checkbook", at the end of the tut, provides user interaction prompts
which was interesting to see, because I was wondering how one takes
arguments passed to it by a user. In C or in Python that is quite
straightforward, so I am pleased to see how it is done in Scheme.
Aside from this however, what other "orders of precedence" - if any -
apply when one reads the code if different forms are used aside from
functions?

>> This ties in for me with a bigger issue of how to _read_ Scheme. So
>> far it seems that to read Scheme, one reads from right to left - from
>> the inner-most set of parentheses to the outer set. This may be a
>> statement on the condition of my brain or the inexperience of a
>> neophyte, but I am finding it hard to keep all that in my brain when
>> trying to decipher seemingly endless blocks of (definition (statements
>> x) ... ). How do those of you with more experience do it?
>
>
> It depends on the situation. To get the big picture, I read
> top-to-bottom. To study details, I read a single expression inside-out
> as you do.
>

OK - I guess this is an experience thing then, because as I said I have
difficulty keeping all that stuff current in my head, mapping it out
conceptually, and seeing how sub-forms interact. But I suppose
that, as I become more familiar with the code and the way that Scheme
works I'll be able to focus more on the bigger picture - the idea - and
"intuitively" understand the detail without getting bogged down in it.
Oh boy - a satori moment awaits me, one of these years!! :)

Thanks Jens.

- Andy

Thien-Thi Nguyen
Jan 10, 2004, 7:40:00 PM

Andy <Andy...@home.com> writes:

> 3. In reading a page of code, [...]

> How do those of you with more experience do
> it?

i go into emacs, turn on hideshow minor mode, and hide everything. then
i skim the top-level to see the overall shape and selectively show those
blocks that hint at some curiosity of design or implementation.

understanding the hints is an acquired taste, probably. certainly, it
changes w/ time and ambient interests. for example, i am currently
(perhaps perpetually :-) at a middle-stage of my development as a scheme
programmer, and find that control expressions and funky syntax munging
are not as interesting to read as other more mundane aspects (like data
organization). perhaps that will change as i pass my eyeballs over more
data (and become more organized :-). we will see.

thi

Andy
Jan 10, 2004, 8:07:32 PM

Thanx Thi. Emacs (I think Brian Harvey describes it as "God's GUI!!") is
something that I have never really sunk my teeth into, I must confess.
That it is one of the more Scheme/Lisp-friendly IDEs makes sense because
I think Stallman created it entirely in Lisp.
Not having much beyond the initial blush experience with God's GUI (:)),
what exactly are "hints" and how do they help you get the overall idea
behind a program?

Unfortunately Thi, I am but a lowly entrant upon a grand stage, with
minimal experience in computing and programming beyond basics. There is
much for me to learn about Scheme - so little time and so little brain
power.

- Andy

Joe Marshall
Jan 10, 2004, 8:37:36 PM

Andy <Andy...@home.com> writes:

> Hello
>
> I have some questions - or curiosities - about Scheme and about how (&
> why) it works the way it does.
>
> I'll number these if anyone wants to pick and mix in replying:
>
> 1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
> what are the implications of this (if any) for security, program
> overhead, garbage collection?

That sounds suspiciously like a homework question.

> 1.1. In the same section: "... if they [implementations of Scheme?]
> can prove that the object cannot possibly matter to any future
> computation" - how does an implementation do that (as in a background
> process)? or is it left to the programmer to pass arguments to ensure
> this happens? or ... ?

This is more interesting. In general, one cannot precisely determine
whether an object matters to any future computation. Garbage
collectors use a conservative approximation like reachability. If an
object cannot be accessed in any way, the object no longer matters.

> 1.2. I was wondering then about garbage collection, and found this:
> http://www.swiss.ai.mit.edu/projects/scheme/documentation/user_4.html#SEC32
> I have to confess - this is _way_ above my head!!! :) Can someone
> undertake a translation please, and at what point do considerations of
> such programming come into play?

> However, Wilson writes "The Scheme heap is garbage collected, meaning
> that the Scheme system automatically cleans up after you."
> (http://www.cs.utexas.edu/users/wilson/schintro/schintro_33.html) - so
> does this vary between implementations, and if so, how does one figure
> out which implementations do/not garbage collect?

Almost all implementations do garbage collect. The particular
algorithm used for GC varies among implementations (and even *within*
implementations!).

> 2. (This is probably a no-brainer but ...) What is the difference
> among all these groups: ANSI, IEEE, and the RnRS. I know what they
> mean and have a general idea of their respective remits, but was
> wondering about how they specifically co-ordinate/shape the
> development of Scheme with respect to standardisation - like ANSI C,
> for example.

This web page has a good overview:
http://www.acm.org/tsc/sstd.html

> 3. In reading a page of code, a Scheme program is a collection of
> sub-programs (functions, forms). Is the arrangement of all of these
> sub-forms arbitrary - for e.g. can one pass a function as an argument
> near the top of page 1 when that function is being defined and
> processing data in the middle of page 2?

In general, yes.

Usually, a Scheme program has a bunch of definitions. The definitions
do not evaluate their bodies, so it is ok for those bodies to have
forward references. So this would work:

(define (funny-function observer)
  (invoke-observer observer another-function))

(define (another-function x y z)
  (+ x (* y z)))

Because the act of defining funny-function doesn't do anything with
regard to another-function. Later on, when you call funny-function,
another-function will have already been defined.

On the other hand, this would not work:

(map foo '(a b c))

(define (foo symbol) (cons symbol 22))

Because you are actually trying to use the function foo before you
have defined it.

> This ties in for me with a bigger issue of how to _read_ Scheme. So
> far it seems that to read Scheme, one reads from right to left - from
> the inner-most set of parentheses to the outer set. This may be a
> statement on the condition of my brain or the inexperience of a
> neophyte, but I am finding it hard to keep all that in my brain when
> trying to decipher seemingly endless blocks of (definition (statements
> x) ... ). How do those of you with more experience do it?

It takes a little bit of practice, but you'll get the hang of it.

However, you want to read the functions from *outside* to *inside*,
not the other way. This is the trickiest thing to learn --- you need
to ignore the details and trust that they work. I'll give an example:

(define (leftmost-digit n base)
  (define (loop i next)
    (cond ((<= next n) (loop next (* next next)))
          ((= i 1) n)
          (else (leftmost-digit (floor (/ n i)) base))))
  (loop 1 base))


First, we look at the very first outside thing. It is a DEFINE. So
now we know we are defining something. So we look at the next thing.
It is `leftmost-digit'. At this point, we can actually stop. We know
now that this is a function that computes the leftmost digit of
something. If we don't need to know the leftmost digit, why bother
continuing to read? If we continue, however, we'll see that there are
two arguments: N and BASE. Ok, I'll guess that N is the number whose
leftmost digit we want, and BASE is the base in which we represent the
number (the leftmost digit of any number in base 2 is going to be a 1,
but if it is in base 16 it could be 1 through F).

If we run it, we'll find out if it works:
=> (leftmost-digit 23094823084 10)
2

Yep. Now if we need to know the leftmost digit of something we can
use this. Do we care *how* it works? Not really.

But suppose we want to see how it works for fun. Continuing in, we
see an internal function named LOOP. Looks like some sort of
iteration. Skip the loop body. The next thing is the call to LOOP
with 1 and BASE. Ok, so this thing works by some sort of iteration or
recursion with 1 and BASE as the starting point.

Now we can look at the definition of LOOP. We can anticipate that
there will be a recursive call to LOOP, and that the recursive call
will be within a COND or IF, so let's find it... Right off the bat,
there is the conditional and a recursive call to LOOP. A quick check
shows that this is the only recursive call, so we can now see that
this function will iterate with I being the last value of NEXT each
time, and NEXT starting at BASE and being SQUARED each time.

So if we use base 10, I and NEXT will do this:

I          NEXT

1          10
10         100
100        10000
10000      100000000
100000000  10000000000000000

When does it stop? The arm of the conditional says we keep going so
long as NEXT is not greater than N (our original number). Well, at
the rate NEXT is growing, that'll be pretty quick.

Since I is the previous value of NEXT, I will be the last value of NEXT
smaller than N. So we know that the function iterates until it finds
a big multiple of the base that is still smaller than our number N.

Now let's look at the other conditional arms. If I is 1, then we just
return N. Ok, this is the base case for when we have a single digit
input. Simple enough.

Finally, we look at the last case. We're going to find the
LEFTMOST-DIGIT of something else! It had better be smaller than what
we currently have if we expect it to finish, and sure enough, we
divide our original number by I (that big multiple of the base) and
throw away the remainder (by taking the floor). So what's going on is
that we take our original big number:

23094823084

Find a big multiple of the base to divide it: 100000000,
do the division and throw out the remainder: 230
and find the leftmost digit of that. Eventually we have a 1 digit number,
and we just return it.
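
Writing that recursion out step by step (a sketch of the trace, added for illustration):

```scheme
(leftmost-digit 23094823084 10)
; NEXT grows 10, 100, 10000, 100000000, then 10^16 exceeds N.
; I is 100000000, so recurse on (floor (/ 23094823084 100000000)) = 230.
(leftmost-digit 230 10)
; NEXT grows 10, 100, then 10000 exceeds 230.
; I is 100, so recurse on (floor (/ 230 100)) = 2.
(leftmost-digit 2 10)
; Now 10 > 2 on the first test, and I = 1, so we return N itself: 2.
```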

So you can see, you really want to start at the outside and work in
rather than on the inside and work out. It is easier to take
something apart into pieces than to put the pieces together without
knowing what it is they are doing.

--
~jrm

Joe Marshall
Jan 10, 2004, 9:08:57 PM

Andy <Andy...@home.com> writes:

> Emacs is (I think Brian Harvey describes it as "God's
> GUI!!") something that I have never really sunk my teeth into I must
> confess. That it is one of the more Scheme/Lisp-friendly IDEs makes
> sense because I think Stalman created it entirely in Lisp.

It is more than that. Emacs has been around since the 70's as a
collection of TECO macros (E-diting MACroS) used by various people at
the MIT AI Lab. Steele and Stallman unified the various macro
packages into one system.

There have been several implementations of Emacs:
Multics Emacs by Greenberg (the first pure lisp emacs)
EINE and ZWEI by McMahon and Weinreb (lisp machine implementation)
Gosling Emacs by James Gosling (in C)
GNU Emacs by Richard Stallman (in C and Lisp)
XEmacs by Jamie Zawinski (in C and Lisp)

and of course numerous other people contributed to each of these.
Stallman's GNU emacs was neither the first Lisp version, nor was it
entirely Lisp.


--
~jrm

Andy
Jan 10, 2004, 9:39:08 PM

Hello Joe

Joe Marshall wrote:


> Andy <Andy...@home.com> writes:
>
>>1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
>>what are the implications of this (if any) for security, program
>>overhead, garbage collection?
>
>
> That sounds suspiciously like a homework question.

I wish ... :) Nope, I am not in school. Just a hobbyist trying to get my
head around concepts. I'll take that as a compliment that the question
had value - thanks.

>
>>1.1. In the same section: "... if they [implementations of Scheme?]
>>can prove that the object cannot possibly matter to any future
>>computation" - how does an implementation do that (as in a background
>>process)? or is it left to the programmer to pass arguments to ensure
>>this happens? or ... ?
>
>
> This is more interesting. In general, one cannot precisely determine
> whether an object matters to any future computation. Garbage
> collectors use a conservative approximation like reachability. If an
> object cannot be accessed in any way, the object no longer matters.

I appreciate this Joe. This seems like a shorthand way of understanding
this process, and a useful summary. I am still wading through Jens's
SICP reference :)

> Almost all implementations do garbage collect. The particular
> algorithm used for GC varies among implementations (and even *within*
> implementations!).

What do you mean Joe? Can you explain how it would vary _within_
implementations?

>
>>2. (This is probably a no-brainer but ...) What is the difference
>>among all these groups: ANSI, IEEE, and the RnRS. I know what they
>>mean and have a general idea of their respective remits, but was
>>wondering about how they specifically co-ordinate/shape the
>>development of Scheme with respect to standardisation - like ANSI C,
>>for example.
>
>
> This web page has a good overview:
> http://www.acm.org/tsc/sstd.html
>

OK - will check it out. Thanks.

>>3. In reading a page of code, a Scheme program is a collection of
>>sub-programs (functions, forms). Is the arrangement of all of these
>>sub-forms arbitrary - for e.g. can one pass a function as an argument
>>near the top of page 1 when that function is being defined and
>>processing data in the middle of page 2?
>
>
> In general, yes.
>
> Usually, a Scheme program has a bunch of definitions. The definitions
> do not evaluate their bodies, so it is ok for those bodies to have
> forward references. So this would work:
>
> (define (funny-function observer)
> (invoke-observer observer another-function))
>
> (define (another-function x y z)
> (+ x (* y z)))
>
> Because the act of defining funny-function doesn't do anything with
> regard to another function. Later on, when you call funny-function,
> another-function will have already been defined.

Makes sense. And I think this is (generally) what I have encountered so
far which prompted my question; the order seemed almost entirely arbitrary.

> On the other hand, this would not work:
>
> (map foo '(a b c))
>
> (define (foo symbol) (cons symbol 22))
>
> Because you are actually trying to use the function foo before you
> have defined it.

Again, makes sense. Functions have to be defined before arguments can be
passed to them or before they, in turn, are passed to other functions as data.


>
>>This ties in for me with a bigger issue of how to _read_ Scheme. So
>>far it seems that to read Scheme, one reads from right to left - from
>>the inner-most set of parentheses to the outer set. This may be a
>>statement on the condition of my brain or the inexperience of a
>>neophyte, but I am finding it hard to keep all that in my brain when
>>trying to decipher seemingly endless blocks of (definition (statements
>>x) ... ). How do those of you with more experience do it?
>
>
> It takes a little bit of practice, but you'll get the hang of it.

I'll take that on faith :)

> However, you want to read the functions from *outside* to *inside*,
> not the other way. This is the trickiest thing to learn --- you need
> to ignore the details and trust that they work.

What does this mean with respect to peer-reviewed code, vulnerabilities,
etc., if we "accept and trust"? Isn't there some more critical review
that ensures that the forms (and sub-forms) actually do as intended and
not lead to some other inadvertent activity? Sorry - that question is a
bit jumbled. Hope it makes sense though.

> I'll give an example:
>
> (define (leftmost-digit n base)
> (define (loop i next)
> (cond ((<= next n) (loop next (* next next)))
> ((= i 1) n)
> (else (leftmost-digit (floor (/ n i)) base))))
> (loop 1 base))
>
>
> First, we look at the very first outside thing. It is a DEFINE. So
> now we know we are defining something. So we look at the next thing.
> It is `leftmost-digit'. At this point, we can actually stop. We know
> now that this is a function that computes the leftmost digit of
> something. If we don't need to know the leftmost digit, why bother
> continuing to read? If we continue, however, we'll see that there are
> two arguments: N and BASE. Ok, I'll guess that N is the number whose
> leftmost digit we want, and BASE is the base in which we represent the
> number (the leftmost digit of any number in base 2 is going to be a 1,
> but if it is in base 16 it could be 1 through F).
>
> If we run it, we'll find out if it works:
> => (leftmost-digit 23094823084 10)
> 2
>
> Yep. Now if we need to know the leftmost digit of something we can
> use this. Do we care *how* it works? Not really.

I understand the overall thrust of your argument, and I suppose there
may be some parallels here with the philosophy of object oriented
programming in terms of data hiding (would this be correct?): we don't
need to know the details of how it works, but really only how it can be
used and applied.
Nevertheless, I am still curious about how one reads such code. If I
have never seen a page of code before, are you saying that by reading
the first line of a (define (x y z)) expression, one need not delve into
the whys and wherefores of its operation, but rather only focus on the
way that it interacts with other (sub) forms?

Joe, I have appreciated you taking the time to explain this in such
detail. Thanks for your work on this. I have printed this out for future
(recurring) reference, because it is probably the single most clear
explanation I have come across thus far. This strikes me as a bit
strange actually. Something as fundamental as this would, I'd have
thought, have found expression more frequently. How to read a function
of Scheme (both with respect to its own reference as well as external
references if it is passed to another function, or is part of a larger
program) seems to be a fundamental skill to acquire. So far I haven't
come across a reference to it though. I am hoping yet to do so, so
thanks Joe - this was a useful description of events.

Kind regards

- Andy

Neptune
Jan 10, 2004, 9:40:54 PM

Thanks for the correction. I wasn't aware of the history/context of
Emacs and was under the (wrong) idea that it was his baby.
Cheers

- Andy

--
"Today a young man on acid realised that all matter was really energy
condensed to a slow vibration, that we are all one consciousness
experiencing itself subjectively, there's no such thing as death,
life is only a dream, and we're the imaginations of ourselves.
Here's Tom with the weather ..." - Bill Hicks.

Barry Margolin
Jan 11, 2004, 1:03:31 AM

In article <4000...@212.67.96.135>, Andy <Andy...@home.com> wrote:

> > Almost all implementations do garbage collect. The particular
> > algorithm used for GC varies among implementations (and even *within*
> > implementations!).
>
> What do you mean Joe? Can you explain how it would vary _within_
> implementations?

An implementation may have multiple GC algorithms that are invoked in
different situations. There's often an incremental GC that runs
quickly, but only scans a subset of the heap to get the most likely
garbage. In addition, there may be a full GC that scans everything,
guaranteed to reap all the garbage. And there can be intermediate
algorithms.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA

Ray Dillinger
Jan 11, 2004, 3:30:20 AM

Andy wrote:
>
> Jens Axel Søgaard wrote:
> >> 1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
> >> what are the implications of this (if any) for security, program
> >> overhead, garbage collection?
>
> > Remember, this is the semantics section. If the implementation can
> > prove that an object will never be used by the running program, it
> > is ok to remove the object.
>
> hmmm ... I'm not sure that answers my question Jens. Can you elaborate
> please.

In practical terms, it's easy for the implementation to prove that
you'll never use an allocated object again when you no longer have
any way of accessing it. For example, let's say you do

(define foo (make-vector 23))

The system responds by allocating a vector 23 elements long, and
you can access it via the variable whose name is foo.

If you then do

(set! foo '())

Foo now refers to the empty list, and you don't have any way anymore
of accessing that array. Since you can't access it anymore, the
implementation can prove that it doesn't matter to the future of
the computation. So it's allowed (not required) to reuse the
physical memory space for something else.

The program semantics are unaffected by its doing so; the program
still behaves as though the object once allocated were never
deallocated. Garbage collection is just a way to simulate the
availability of more memory than is necessarily physically present.

But if before setting foo to the empty list you had done

(define bar foo)

Then bar would be another way of referring to that vector. In that
case, after you set foo to the empty list, the implementation could
not prove that you were done with that vector, because you'd still
have a variable named bar that referred to it. So in that case it
couldn't reuse the memory for something else.

> How does one check that out further. For example I use STk as well as
> DrScheme. I am pretty certain that DrScheme has GC (I think that I read
> that somewhere, but cannot remember where), but don't really know about
> STk. I can but presume, but I am curious as to how one can ascertain
> that. Is there a test one can apply to an implementation, like proper
> tail recursion is used as a criteria for R5RS compliance, I believe?

Here's a simple test:

(define GCvar '())

(define GCtest
  (lambda (n)
    (if (< n (* 1024 1024))
        (begin
          (set! GCvar (make-vector 1024))
          (GCtest (+ n 1)))
        "finished")))

Just define GCvar and GCtest, then attempt to evaluate (GCtest 0).

This will allocate just over a million vectors of 1024 elements each,
where each element is presumably at least a 4-byte pointer.

On a 32-bit system that does not GC, this is guaranteed to cause the
system to run out of memory. If the system does GC, it will merely
take a while to complete. In practice, you will find no functional
scheme system which does not GC.

The important bit is that the variable GCvar refers to a particular new
vector only until it gets set! again. When a garbage collector observes
that the only method of referring to a vector now refers to something
else, it will know that the memory occupied by the vector can now be
used for other things as well without affecting the program semantics.

Bear

tal...@noshpam.lbl.government

unread,
Jan 11, 2004, 4:31:38 AM1/11/04
to
On Jan 10, 2004 at 8:14pm, Andy wrote:

Andy_n >This ties in for me with a bigger issue of how to _read_ Scheme. So far
Andy_n >it seems that to read Scheme, one reads from right to left - from the
Andy_n >inner-most set of parentheses to the outer set. This may be a statement
Andy_n >on the condition of my brain or the inexperience of a neophyte, but I am
Andy_n >finding it hard to keep all that in my brain when trying to decipher
Andy_n >seemingly endless blocks of (definition (statements x) ... ). How do
Andy_n >those of you with more experience do it?

Reading from in to out can be thought of as the substitution model of
evaluation, where you are evaluating the expression step by step,
trying to determine its result. Reading from out to
in is a way for the programmer to understand what the code wants to
do. Paul Graham describes this ("On Lisp") in terms of the difference
between functional and imperative programming style (one finds it
easier to default to functional programming style in Scheme): a
function written in an imperative style is describing the *HOW*,
whereas a function written in a functional style is describing the *WHAT*.
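
A small illustration of the difference (the function names here are my
own invention): summing the squares of a list, first saying *HOW*, then
saying *WHAT*:

    ;; Imperative style: HOW -- step through the list, mutating an accumulator.
    (define (sum-squares-imp lst)
      (let ((total 0))
        (for-each (lambda (x) (set! total (+ total (* x x)))) lst)
        total))

    ;; Functional style: WHAT -- the sum of the squares of the elements.
    (define (sum-squares-fn lst)
      (apply + (map (lambda (x) (* x x)) lst)))

    ;; (sum-squares-imp '(1 2 3)) => 14, same as (sum-squares-fn '(1 2 3))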

One thing that I find when writing Scheme code is that once a function
gets to be a certain size, I begin to get nervous. You need to develop
this healthy distrust of large blocks of code: when you can, make
values used in the function named either via function arguments, or by
"let". Also, when in doubt, create a helper function, utility
function, data abstraction, etc. "Invest in abstraction"; let that be
your motto.

I'm also discovering the correct balance between nesting all functions
in terms of lexical scope, or in putting them in the top-level. Does
anybody have advice about how much you nest? What do you nest, what
won't you nest?

~Tomer

Thien-Thi Nguyen

unread,
Jan 11, 2004, 6:59:48 AM1/11/04
to
Andy <Andy...@home.com> writes:

> I think Stallman created [emacs] entirely in Lisp.

i encourage you to download emacs source and look at it (briefly),
perhaps in the same way that you peruse other programs. if you
have a fast net.cnxn, another pleasant pastime is to do a "cvs
checkout" on the source, pick a random file and peruse its history
and/or annotations (in emacs w/ colors).

> what exactly are "hints" and how do they help you get
> the overall idea behind a program?

i don't use "hints" in a technical sense. rather, these hints are
simply signals from your own brain when it recognizes similarities
between your experience and the new code before you. the overall
idea of a program should have already been conveyed to you in the
README or somesuch.

the process of reading code is ultimately a process of matching
your internal model of how things work against the reality, and
updating the model where you have erred. the process of writing
code is the same, except you update everything at the same time.
(there are more errors, too, typically. :-)

> I am but a lowly entrant upon a grand stage,

no worries.

thi

Andy

unread,
Jan 11, 2004, 8:10:25 AM1/11/04
to

Thanks for the explanation Barry - that makes sense to me.

- Andy

Andy

unread,
Jan 11, 2004, 8:31:10 AM1/11/04
to
Ray Dillinger wrote:

> Andy wrote:
>
>>hmmm ... I'm not sure that answers my question Jens. Can you elaborate
>>please.
>
>
> In practical terms, it's easy for the implementation to prove that
> you'll never use an allocated object again when you no longer have
> any way of accessing it. For example, let's say you do
>
> (define foo (make-vector 23))
>
> The system responds by allocating vector 23 elements long, and
> you can access it via the variable whose name is foo.
>
> If you then do
>
> (set! foo '())
>
> Foo now refers to the empty list, and you don't have any way anymore
> of accessing that array. Since you can't access it anymore, the
> implementation can prove that it doesn't matter to the future of
> the computation. So it's allowed (not required) to reuse the
> physical memory space for something else.

Ah hah! The penny is beginning to drop now. I guess I was stumbling over
the notion of not being able to "access" an object and was wondering if
that meant that the object was "sent" somewhere. Obviously not, so thanks.


>
> The program semantics are unaffected by its doing so; the program
> still behaves as though the object once allocated were never
> deallocated. Garbage collection is just a way to simulate the
> availability of more memory than is necessarily physically present.

I may regret asking this due to the potential level of technicality
your answer might involve but ... <deep breath> ... how does a program
continue to behave _as if_ the object has not been de-allocated? I don't
know if that makes sense, so let me ask it this way:
If the program allocates an object or some value and that value is then
stored in a register, if that object is now de-allocated (or rendered
inaccessible and hence is "put out" for garbage collection) is it the
reference that is changed, or is the value erased from the register? If
it is erased from the register, then why wouldn't that create an error
if the program is still pointing to that value at that register.

I apologise in advance for the very simplistic way of putting this, and
willingly admit that my knowledge about machine level operation is
sketchy, so the above question may be irrelevant.

> But if before setting foo to the empty list you had done
>
> (define bar foo)
>
> Then bar would be another way of referring to that vector. In that
> case, after you set foo to the empty list, the implementation could
> not prove that you were done with that vector, because you'd still
> have a variable named bar that referred to it. So in that case it
> couldn't reuse the memory for something else.

OK - I think that this is addressing my query above, because 'bar' still
points to the value stored in the register ... is that accurate?

>
>>How does one check that out further. <snip>


>
> Here's a simple test:
>
> (define GCvar '())
>
> (define GCtest
>   (lambda (n)
>     (if (< n (* 1024 1024))
>         (begin
>           (set! GCvar (make-vector 1024))
>           (GCtest (+ n 1)))
>         "finished")))
>
> Just define GCvar and GCtest, then attempt to evaluate (GCtest 0).

I'll give this a go. Can you tell me how I would go about checking the
memory to see this? On my Linux box I was thinking of using the CLI top,
or kinfocenter - but there might be a better way to see this in action.

> This will allocate just over a million vectors of 1024 elements each,
> where each element is presumably at least a 4-byte pointer.
>
> On a 32-bit system that does not GC, this is guaranteed to cause the
> system to run out of memory. If the system does GC, it will merely
> take a while to complete. In practice, you will find no functional
> scheme system which does not GC.

So it is not necessary to conduct the test you have outlined, although
it would still be educational to do so.

> The important bit is that the variable GCvar refers to a particular new
> vector, only until it gets set! again. When a garbage collector observes
> that the only method of referring to a vector now refers to something
> else, it will know that the memory occupied by the vector can now be
> used for other things as well without affecting the program semantics.

GC then is an automatic process that happens in the background and hence
does not need to be deliberately invoked. In many books, and I'm
thinking of SICP particularly, the object seems to be to build an
interpreter and a compiler. Would GC functions get built into the
compiler/interpreter or elsewhere in the basic implementation itself?

Thanks Bear. Good illustrations - sorry if my queries are muddying the
waters further. This is just really interesting stuff, albeit a stretch
for me to wrap my non-technical brain around!!

- Andy

Andy

unread,
Jan 11, 2004, 8:52:38 AM1/11/04
to
tal...@noshpam.lbl.government wrote:

>
> Reading from in to out can be thought of as the substitution model of
> evaluation; where you are evaluating the statement step-by-step,
> trying to determine the result of the expression.

That's correct, yes - and seems to be what HTDP and Simply Scheme
recommend (at least up to where I have gotten in those respective texts
to date).

> Reading from out to
> in, is a way for the programmer to understand what the code wants to
> do.

As in reading (define (foo bar bar1) ... ) from out to in shows that it
is a definition of foo taking two arguments, bar and bar1. Joe Marshall
(in an earlier response) pointed out (a) that out-to-in is the best way
of reading Scheme; (b) that one trusts that the function
works - or rather trusts the details; and (c) that by reading code
this way, although a difficult skill to acquire, one obtains a better
sense of what the program does and how it works.

> Paul Graham describes this ("On Lisp") in terms of the difference
> between functional and imperative programming style (one finds it
> easier to default to functional programming style in Scheme): a
> function written in an imperative style is describing the *HOW*,
> whereas a function written in a functional style is describing the *WHAT*.

And Scheme, as a functional programming language, is more concerned with
the "what" of the program - what it does - rather than with "how" it works?

> One thing that I find when writing Scheme code is that once a function
> gets to be a certain size, I begin to get nervous. You need to develop
> this healthy distrust of large blocks of code: when you can, make
> values used in the function named either via function arguments, or by
> "let". Also, when in doubt, create a helper function, utility
> function, data abstraction, etc. "Invest in abstraction"; let that be
> your motto.

I like that: "Invest in abstraction". Certainly using helper functions
is something that is being continuously drummed into my head through my
readings.

A question I asked earlier that hasn't yet been picked up on concerns
this "trusting" that the details work the way they should: doesn't this
lead to a certain degree of slippage or the potential for malicious code
or unstable code to get through scrutiny?

> I'm also discovering the correct balance between nesting all functions
> in terms of lexical scope, or in putting them in the top-level. Does
> anybody have advice about how much you nest? What do you nest, what
> won't you nest?

That's an interesting question. I'm just touching on the ideas of
lexical scopes, so I'd be curious to hear about this one too.

> ~Tomer

Thanks Tomer.

- Andy


Andy

unread,
Jan 11, 2004, 8:55:41 AM1/11/04
to

Thi
I suspect that what you are touching on here moves into the realms of
where programming becomes an art form, like creative writing - such as
the literature approach that HTDP and even SICP advocate. Mind you,
although referring mainly to C, Spinellis recommends reading code as a
narrative too in his "Code Reading". Certainly has a feelgood factor to it.

Cheers Thi

- Andy

Richard C. Cobbe

unread,
Jan 11, 2004, 11:53:14 AM1/11/04
to
Andy <Andy...@home.com> writes:

> Ray Dillinger wrote:
>> Andy wrote:

> I may regret asking this due to the potential level of technicality
> your answer might involve but ... <deep breath> ... how does a program
> continue to behave _as if_ the object has not been de-allocated? I
> don't know if that makes sense, so let me ask it this way:
> If the program allocates an object or some value and that value is
> then stored in a register, if that object is now de-allocated (or
> rendered inaccessible and hence is "put out" for garbage collection)
> is it the reference that is changed, or is the value erased from the
> register? If it is erased from the register, then why wouldn't that
> create an error if the program is still pointing to that value at that
> register.

(First, a small side note: many, though not all, Scheme values are too big
to store in a single register; they're typically allocated elsewhere in
memory and the register contains a `reference' -- think C pointer, but
safer.)

To answer your question, though: if there is still a reference to the
object in a register, then the collector will determine that the object is
still reachable, and then not collect the object. (Unless, for whatever
reason, that particular register is not itself considered reachable,
although this is highly unlikely.)

If a program has no remaining references to a particular object, then it
cannot read the contents of that object or write to them. Therefore, the
program will behave the same regardless of whether the object is still in
memory or not.

<Bear's test for GC snipped>

> I'll give this a go. Can you tell me how I would go about checking the
> memory to see this? On my Linux box I was thinking of using the CLI
> top, or kinfocenter - but there might be a better way to see this in
> action.

Top would work, certainly. I think a much simpler test would be to run the
program. If the scheme process terminates with an out-of-memory error, you
don't have GC. Otherwise, you do.

> GC then is an automatic process that happens in the background and
> hence does not need to be deliberately invoked.

You're correct in that you do not need to invoke it manually. However, GC
need not be implemented as a separate process/thread at the OS level. A
much simpler implementation simply wires it into the allocator (the part of
the Scheme run-time analogous to malloc(3)). If the allocator finds enough
memory for the requested object, well and good; otherwise, it calls the GC
as a subroutine.
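
In sketch form (enough-free-space?, grab-free-space, and
collect-garbage! are made-up names for illustration, not any real
implementation's API):

    (define (allocate n-words)
      (if (enough-free-space? n-words)
          (grab-free-space n-words)          ; fast path: memory available
          (begin
            (collect-garbage!)               ; reclaim unreachable objects
            (if (enough-free-space? n-words)
                (grab-free-space n-words)
                (error "out of memory")))))  ; GC freed nothing useful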

> In many books, and I'm thinking of SICP particularly, the object seems to
> be to build an interpreter and a compiler. Would GC functions get built
> into the compiler/interpreter or elsewhere in the basic implementation
> itself?

Been a while since I've read SICP. IIRC, the interpreter is written in
Scheme, so you can use the implementation language's garbage collector to
do garbage collection for the language that you yourself are implementing.
Don't remember the details of their compiler implementation.

To make this more concrete, let's use the name `SICP-scheme' to refer to
the language implemented by the SICP interpreter. Let's further assume,
for the moment, that you're writing this interpreter in PLT Scheme.
Now, in the meta-circular interpreter at least, cons in SICP-scheme is
simply implemented with cons in PLT Scheme. In that case, SICP-scheme is
automatically garbage-collected, because PLT Scheme is.

In general, though, an interpreter will contain the GC routines. If you
compile to some sort of byte code, like the JVM or Microsoft's CLR, then
the byte-code interpreter will contain the GC routines. If you compile
directly to native code, the language runtime (usually a set of libraries
provided with the compiler and linked with the executable that you create)
contains the GC routines.

Whichever case happens to be true, the programmer doesn't really need to
worry about it --- the language Does the Right Thing automatically.

> Thanks Bear. Good illustrations - sorry if my queries are muddying the
> waters further. This is just really interesting stuff, albeit a
> stretch for me to wrap my non-technical brain around!!

Not a problem. It sounds like you're on the right track; just a little bit
farther and you'll get there.

Richard

Andy

unread,
Jan 11, 2004, 12:46:55 PM1/11/04
to
Richard C. Cobbe wrote:
> (First, a small side note: many, though not all, Scheme values are too big
> to store in a single register; they're typically allocated elsewhere in
> memory and the register contains a `reference' -- think C pointer, but
> safer.)

Thanks - that makes sense. It is also reassuring to note that Scheme's
allocations are safer than those of C. Can anyone elaborate why that is
the case? What is/are the difference(s)?

> To answer your question, though: if there is still a reference to the
> object in a register, then the collector will determine that the object is
> still reachable, and then not collect the object. (Unless, for whatever
> reason, that particular register is not itself considered reachable,
> although this is highly unlikely.)

So, in effect, the reference to an object indicates that it is still
accessible (or "needed") and hence will be ignored by GC? Is it the
reference that determines whether or not something gets GCed - i.e. as
in the absence or the presence of a given reference?

> If a program has no remaining references to a particular object, then it
> cannot read the contents of that object or write to them. Therefore, the
> program will behave the same regardless of whether the object is still in
> memory or not.

OK - I think I get that alright. If the reference ("address") of an
object is unavailable, then it cannot be "communicated with", in effect,
and the program will continue to operate with or without that
unreferenced object?

> <Bear's test for GC snipped>
>

> Top would work, certainly. I think a much simpler test would be to run the
> program. If the scheme process terminates with an out-of-memory error, you
> don't have GC. Otherwise, you do.

So if it crashes I know it doesn't GC. LOL. I guess that makes it rather
unambiguous then :)

>
> If the allocator finds enough
> memory for the requested object, well and good; otherwise, it calls the GC
> as a subroutine.

So as to "clear" enough memory for the object?

>
> Been a while since I've read SICP. IIRC, the interpreter is written in
> Scheme, so you can use the implementation language's garbage collector to
> do garbage collection for the language that you yourself are implementing.
> Don't remember the details of their compiler implementation.

I'm with you: _because_ the interpreter is written in Scheme, Scheme's
built-in GC routine will be called into play as required anyway and
would not need to be deliberately set up.

>
> Whichever case happens to be true, the programmer doesn't really need to
> worry about it --- the language Does the Right Thing automatically.

... and I for one am sighing with relief!!! :)

>
> Not a problem. It sounds like you're on the right track; just a little bit
> farther and you'll get there.

Those are encouraging words Richard. I think sometimes you
technically-oriented people don't realise just how complex your field is
for an 'outsider'. You guys have been a great assist.

Thanks Richard and others

- Andy
>
> Richard

Ray Dillinger

unread,
Jan 11, 2004, 1:00:50 PM1/11/04
to
Andy wrote:
>
> Ray Dillinger wrote:

> > The program semantics are unaffected by its doing so; the program
> > still behaves as though the object once allocated were never
> > deallocated. Garbage collection is just a way to simulate the
> > availability of more memory than is necessarily physically present.
>
> I may regret asking this due to the potential level of technicality
> your answer might involve but ... <deep breath> ... how does a program
> continue to behave _as if_ the object has not been de-allocated? I don't
> know if that makes sense, so let me ask it this way:
> If the program allocates an object or some value and that value is then
> stored in a register, if that object is now de-allocated (or rendered
> inaccessible and hence is "put out" for garbage collection) is it the
> reference that is changed, or is the value erased from the register? If
> it is erased from the register, then why wouldn't that create an error
> if the program is still pointing to that value at that register.

Terminology: You probably meant "memory location" rather than "register".

A program continues to behave as if an object has not been deallocated
because, even if the object weren't deallocated, the program could never
use it again. To put it another way, the semantics are exactly the same
whether it's deallocated or not.

If you're used to programming in C, think about a malloc'd struct that
you've dropped all the pointers to; you've got no way to talk about that
structure, and you can't do anything with it. You can't even deallocate
it. In C this is called a "memory leak" and the program behaves as
though it were never deallocated because in fact it is never deallocated.
But if leaked memory accumulates, and you don't have infinite amounts
of memory on the machine, eventually the program runs out of memory and
crashes.

Scheme programs "leak" all over the place, by intent, and the garbage
collector just notices and lets the program reuse the memory space.
In Scheme a leaked structure is just a collectable location, and the
garbage collector reaps it so we can use it again.

To address your question about an object being deallocated or rendered
inaccessible, you can't get that error because, if the program is still
pointing to that memory location, then the Garbage Collector can't prove
you'll never use that value again and won't reap it.

Bear

Andy

unread,
Jan 11, 2004, 1:45:12 PM1/11/04
to
Ray Dillinger wrote:
> Andy wrote:
>
> Terminology: You probably meant "memory location" rather than "register".
>
> A program continues to behave as if an object has not been deallocated
> because, even if the object weren't deallocated, the program could never
> use it again. To put it another way, the semantics are exactly the same
> whether it's deallocated or not.
>
> If you're used to programming in C, think about a malloc'd struct that
> you've dropped all the pointers to; you've got no way to talk about that
> structure, and you can't do anything with it. You can't even deallocate
> it. In C this is called a "memory leak" and the program behaves as
> though it were never deallocated because in fact it is never deallocated.
> But if leaked memory accumulates, and you don't have infinite amounts
> of memory on the machine, eventually the program runs out of memory and
> crashes.
>
> Scheme programs "leak" all over the place, by intent, and the garbage
> collector just notices and lets the program reuse the memory space.

Whereas in C the memory leaks are not 'recycled'? Can you give me a
reference that explains that process or do you mind giving me a precis
of how this works in Scheme.

> To address your question about an object being deallocated or rendered
> inaccessible, you can't get that error because, if the program is still
> pointing to that memory location, then the Garbage Collector can't prove
> you'll never use that value again and won't reap it.

That's logical, Captain. :)

I put your GCtest to work and monitored the output in top: at its peak,
it was using 99% of my CPU but never more than 2.8% of memory. I'm
running an AMD Athlon 2.4 with 520MB RAM. Anyway, it survived, ergo I
have now satisfied myself that STk's implementation of Scheme also does
GC ... not that I ever doubted it - but I was curious. Thanks Bear :)

Joe Marshall

unread,
Jan 11, 2004, 3:31:57 PM1/11/04
to
Andy <Andy...@home.com> writes:

> Hello Joe
>
> Joe Marshall wrote:
>> Andy <Andy...@home.com> writes:
>>
>>>1. In the R5RS, 1.1. Semantics: "No Scheme object is ever destroyed" -
>>>what are the implications of this (if any) for security, program
>>>overhead, garbage collection?
>> That sounds suspiciously like a homework question.
>
> I wish ... :) Nope, I am not in school. Just a hobbyist trying to get
> my head around concepts. I'll take that as a compliment that the
> question had value - thanks.

Well, then I'll give you a more detailed answer.

1. Security: Security may be improved because it is impossible to
refer to an object that is no longer valid. In a language like C,
one can call free to de-allocate an object, but if a reference to
the object still exists, then the freed object could accidentally
be re-used. This may even `work' until the memory used for the
object is allocated by something else.

2. Program overhead: A certain amount of overhead is caused by
garbage collection. It is necessary to leave enough hints around
for the GC to make sense of what is going on. This is done
automatically, so the programmer doesn't have to be aware of it,
but there are generally one or two bits per word of memory that are
used to tag objects. There is active research into how to reduce
the overhead (with a smart compiler, the GC overhead can approach
zero).

On the other hand manual memory management typically accounts for
about 1/3 of the code in a C program. All this code goes away.

>> Almost all implementations do garbage collect. The particular
>> algorithm used for GC varies among implementations (and even *within*
>> implementations!).
>
> What do you mean Joe? Can you explain how it would vary _within_
> implementations?

Will Clinger and Lars Hansen outfitted their Larceny Scheme
implementation to have a pluggable garbage collector. You could
select which GC you wanted to use.

This paper has info:

@misc{hansen-experimental,
  author = "Lars T Hansen and William D Clinger",
  title  = "An Experimental Study of Renewal-Older-First Garbage Collection",
  url    = "citeseer.nj.nec.com/598694.html"
}

>> However, you want to read the functions from *outside* to *inside*,
>> not the other way. This is the trickiest thing to learn --- you need
>> to ignore the details and trust that they work.
>
> What does this mean with respect to peer-reviewed code,
> vulnerabilities, etc., if we "accept and trust"?

The outside-to-inside technique is for understanding the *intent* as
quickly as possible. You have to know what the main idea is before
you can criticise the minutiae.

But of course you don't want to generalize this to accepting all
programs on faith.

Let me put it this way: suppose you were doing a code review. If we
start at the innermost piece of code, you might see this:
(+ x offset)

Is this correct? Who knows? Could it be wrong? Maybe.

Now if we look at the outermost piece of code:
(define (export-file-changes file change-list) ....)

Even before we look at the body we have some idea at what might be
going on. We expect some file operations, we expect some iteration
over a list, we don't expect Fourier transforms.

Since we are doing a code review, we would now go on to consider the
pieces, but we have the beginnings of a road map for the review. If
the next thing we see is this:
(for-each (lambda (change) ...) change-list)

we'll know that things are heading in the right direction, but if we
see this:
(/ change-list (+ file 7))

we'll know something is terribly wrong.

> Isn't there some more critical review that ensures that the forms
> (and sub-forms) actually do as intended and not lead to some other
> inadvertent activity? Sorry - that question is a bit jumbled. Hope
> it makes sense though.

Of course! But we start at the top and work down.

> I understand the overall thrust of your argument, and I suppose there
> may be some parallels here with the philosophy of object oriented
> programming in terms of data hiding (would this be correct?): we don't
> need to know the details of how it works, but really only how it can
> be used and applied.

It's the philosophy of abstraction (object-oriented is but one means
of abstraction). Computers are just too damn complicated to figure
out, so you *have* to pretend that the details don't matter. The art
of computer science is to figure out the best set of abstractions to
use.

> Nevertheless, I am still curious about how one reads such code. If I
> have never seen a page of code before, are you saying that by reading
> the first line of a (define (x y z)) expression, one need not delve
> into the whys and wherefores of its operation, but rather only focus
> on the way that it interacts with other (sub) forms?

Yes. I have some code here I didn't write and I've never looked at.
I'll peek at it now.

It is the metering code from the clocc (looking at the filename). I
suspect it measures code. For fun, I'll ignore the README and just
look at the lisp.

Emacs shows me the lines beginning with DEF:

392:(defpackage "MONITOR" (:nicknames "MON") (:use "COMMON-LISP")
395:(defpackage "MONITOR" (:nicknames "MON") (:use "COMMON-LISP")
398:(defpackage "MONITOR" (:nicknames "MON") (:use "COMMON-LISP")

Ok, it lives in the MON package, so other code in the clocc that
refers to MON must be using this.

454:(defparameter *metering-version* "v2.1 25-JAN-94"

Hey, a version number. Good thing to check before reporting a bug.

515:(defmacro get-cons ()
524:(defun get-cons ()
534:(defmacro get-cons () `(the consing-type (gc-size)))
539:(defvar *gc-space-array* (make-array 4 :element-type '(unsigned-byte 32)))
541:(defun bytes-consed ()
546:(defun bytes-consed ()
556:(defmacro get-cons () `(the consing-type (bytes-consed)))
574:(defvar *bytes-consed-chkpt* 0)
577:(defun reset-consing () (setq *bytes-consed-chkpt* 0))
598:(defun total-bytes-consed (&aux pages fp)
612:(defun get-cons ()
625:(defmacro get-cons () `(the consing-type (ccl::total-bytes-allocated)))

Looks like we can measure consing.

668:(defmacro with-time/cons ((delta-time delta-cons) form &body post-process)
680:(defmacro delta4 (nv1 nv2 ov1 ov2 by)
695:(defmacro delta4-cons (new-cons1 new-cons2 old-cons1 old-cons2)
701:(defmacro with-time/cons ((delta-time delta-cons) form &body post-process)
791:(defun required-arguments (name)
809:(defun required-arguments (name)
831:(defun required-arguments (name)
850:(defun square (x) (* x x))
851:(defun square2 (x &optional y) (* x x y))
852:(defun test (x y &optional (z 3)) 3)
853:(defun test2 (x y &optional (z 3) &rest fred) 3)

This looks like test code. I can ignore it.

869:(defvar *MONITOR-TIME-OVERHEAD* nil
871:(defvar *MONITOR-CONS-OVERHEAD* nil
874:(defvar *TOTAL-TIME* 0
876:(defvar *TOTAL-CONS* 0
878:(defvar *TOTAL-CALLS* 0
889:(defmacro PLACE-FUNCTION (function-place)
903:(defsetf PLACE-FUNCTION (function-place) (function)
916:(defun PLACE-FUNCTION (function-place)
924:(defsetf PLACE-FUNCTION (function-place) (function)
931:(defun PLACE-FBOUNDP (function-place)
940:(defun PLACE-MACROP (function-place)

Ok, these look like something I'd actually use from other code. I bet
the UPPER-CASE stuff indicates that.

948:(defvar *monitored-functions* nil
955:(defstruct metering-functions
985:(defvar *monitor* (make-hash-table :test #'equal)
987:(defun get-monitor-info (name)
989:(defsetf get-monitor-info (name) (info)
992:(defun MONITORED (function-place)
997:(defun reset-monitoring-info (name)
1002:(defun reset-all-monitoring ()
1011:(defun monitor-info-values (name &optional (nested :exclusive) warn)
1059:(defun make-monitoring-encapsulation (min-args optionals-p)
1171:(defvar *existing-encapsulations* (make-hash-table :test #'equal))
1172:(defun find-encapsulation (min-args optionals-p)
1190:(defun monitoring-encapsulate (name &optional warn)
1209:(defun monitoring-unencapsulate (name &optional warn)
1227:(defmacro MONITOR (&rest names)
1237:(defmacro UNMONITOR (&rest names)
1243:(defun MONITOR-ALL (&optional (package *package*))
1252:(defmacro MONITOR-FORM (form
1267:(defmacro WITH-MONITORING ((&rest functions)
1285:(defconstant overhead-iterations 5000
1289:(defun STUB-FUNCTION (x)
1294:(defun SET-MONITOR-OVERHEAD ()
1319:(defvar *monitor-results* nil
1321:(defvar *no-calls* nil
1323:(defvar *estimated-total-overhead* 0)
1326:(defstruct (monitoring-info
1341:(defun REPORT (&key (names :all)
1353:(defun REPORT-MONITORING (&optional names
1412:(defun display-monitoring-results (&optional (threshold 0.01) (key :percent-time)
1495:(defun sort-results (&optional (key :percent-time))

Ok, so let's assume I want to monitor some code. It sure looks like
the MONITOR macro will do something like that.

(defmacro MONITOR (&rest names)
  "Monitor the named functions. As in TRACE, the names are not evaluated.
If a function is already monitored, then unmonitor and remonitor (useful
to notice function redefinition). If a name is undefined, give a warning
and ignore it. See also unmonitor, report-monitoring,
display-monitoring-results and reset-time."
  `(progn
     ,@(mapcar #'(lambda (name) `(monitoring-encapsulate ',name)) names)
     *monitored-functions*))

Ok, this is just a wrapper for `monitoring-encapsulate'. What does
that do?

(defun monitoring-encapsulate (name &optional warn)
  "Monitor the function Name. If already monitored, unmonitor first."
  ;; Saves the current definition of name and inserts a new function which
  ;; returns the result of evaluating body.
  (cond ((not (place-fboundp name))     ; not a function
         (when warn
           (warn "Ignoring undefined function ~S." name)))
        ((place-macrop name)            ; a macro
         (when warn
           (warn "Ignoring macro ~S." name)))
        (t                              ; tis a function
         (when (get-monitor-info name)  ; monitored
           (when warn
             (warn "~S already monitored, so unmonitoring it first." name))
           (monitoring-unencapsulate name))
         (multiple-value-bind (min-args optionals-p)
             (required-arguments name)
           (funcall (find-encapsulation min-args optionals-p) name)))))

A quick scan of the COND conditions shows that a number of errors are
checked for (boring), but right near the end, we see this:

(multiple-value-bind (min-args optionals-p)
    (required-arguments name)
  (funcall (find-encapsulation min-args optionals-p) name))

From the look of it, REQUIRED-ARGUMENTS must return the minimum number
of arguments of what I'm monitoring, and OPTIONALS-P must have
something to do with whether there are optional arguments or not.
After that is called, FIND-ENCAPSULATION is called. It returns a
function that we invoke on NAME.

Now from here, I can see that I want to ignore the call to
REQUIRED-ARGUMENTS because it isn't likely to have much to do with the
actual monitoring, but more likely just figuring out info that
FIND-ENCAPSULATION needs. So I'd look at that:

(defvar *existing-encapsulations* (make-hash-table :test #'equal))
(defun find-encapsulation (min-args optionals-p)
  (or (gethash (cons min-args optionals-p) *existing-encapsulations*)
      (setf (gethash (cons min-args optionals-p) *existing-encapsulations*)
            (compile nil
                     (make-monitoring-encapsulation min-args optionals-p)))))

I only have to look at the first line to see that FIND-ENCAPSULATION
looks up the encapsulation in a hash table. If it doesn't find it, it
calls MAKE-MONITORING-ENCAPSULATION and compiles it.

So you can see that we can ignore details:
We don't really care about the ways errors might be signalled
(although later on we might want to know what happens if we try to
monitor a non-existent function), we don't care what
REQUIRED-ARGUMENTS does. We don't even need to know the key used to
lookup the encapsulation in the hash table.

> Joe, I have appreciated you taking the time to explain this in such
> detail.

You're welcome.

--
~jrm

Joe Marshall

Jan 11, 2004, 3:39:11 PM
Andy <Andy...@home.com> writes:

> I may regret answering this due to the potential level of technicality
> your answer might involve but ... <deep breath> ... how does a program
> continue to behave _as if_ the object has not been de-allocated?

By not bothering to use the object.

(define *a-random-thing* (cons 23 "Skidoo"))

(define (foo x y)
  (+ x y))

No matter what I do to *a-random-thing*, including de-allocating it,
the function FOO continues to work as before.

> I don't know if that makes sense, so let me ask it this way:
> If the program allocates an object or some value and that value is
> then stored in a register, if that object is now de-allocated (or
> rendered inaccessible and hence is "put out" for garbage collection)
> is it the reference that is changed, or is the value erased from the
> register? If it is erased from the register, then why wouldn't that
> create an error if the program is still pointing to that value at that
> register.

That would be a bad thing to deallocate. Let's give a better example:

(define (sum-and-difference a b)
  (cons (+ a b) (- a b)))

(define (double x)
  (car (sum-and-difference x x)))

SUM-AND-DIFFERENCE returns a CONS cell with the SUM and DIFFERENCE of
the arguments. DOUBLE uses this to construct the sum, but it doesn't
care what the difference is, so it simply selects the first element.

Now what about that CONS cell? No one seems to be using it. Once
DOUBLE returns, there is no way to get at the CONS cell. Might as
well recycle it.

> GC then is an automatic process that happens in the background and
> hence does not need to be deliberately invoked. In many books, and I'm
> thinking of SICP particularly, the object seems to be to build an
> interpreter and a compiler. Would GC functions get built into the
> compiler/interpreter or elsewhere in the basic implementation itself?

In general, yes.

--
~jrm

Joe Marshall

Jan 11, 2004, 3:42:23 PM
Andy <Andy...@home.com> writes:

Richard Cobbe wrote:
>> Been a while since I've read SICP. IIRC, the interpreter is written in
>> Scheme, so you can use the implementation language's garbage collector to
>> do garbage collection for the language that you yourself are implementing.
>> Don't remember the details of their compiler implementation.

The GC they mention in SICP is purely for pedagogic reasons. Some
people are curious as to how it might work.

>> Whichever case happens to be true, the programmer doesn't really
>> need to worry about it --- the language Does the Right Thing automatically.
>
> ... and I for one am sighing with relief!!! :)

Yes. You can pretty much ignore the GC.


--
~jrm

Joe Marshall

Jan 11, 2004, 3:51:53 PM
Andy <Andy...@home.com> writes:

> Whereas in C the memory leaks are not 'recycled'? Can you give me a
> reference that explains that process or do you mind giving me a precis
> of how this works in Scheme.

Ok. Let's suppose we have just calculated something in a function and
we are returning the answer. Assume the answer is placed in a
register and we intend to just pop the stack and return.

The contents of the other registers are (by definition) no longer
useful, so we can zero those out. The only things that can possibly
be used are those objects still on the stack, and anything that they
refer to, and anything that *those* refer to, and anything that
*those* refer to, etc.

A popular GC technique is `mark and sweep'. We start with the stack
and the returned object and `paint' it red. Now we go through and
paint red everything that is referred to by a red object. We keep
painting things red (but only if a red object refers to it), until we
have painted everything we can. Now we look at all the memory. If we
find a patch that isn't red, we can re-use it.
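For concreteness, here is a toy model of that mark phase. It's in Python only for illustration (the idea is language-neutral); a real collector works on raw memory words and uses a mark bit rather than a set, and all the names here are made up:

```python
# Toy heap: each object is a dict whose "refs" list points at other objects.
# The roots stand in for the stack and the value being returned.

def mark(roots):
    """Transitively 'paint red' everything reachable from the roots."""
    painted = set()                  # ids of marked objects -- our red paint
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if id(obj) in painted:
            continue                 # already red: don't paint twice
        painted.add(id(obj))
        worklist.extend(obj["refs"])
    return painted

def sweep(heap, painted):
    """Anything that isn't red is garbage; keep only the red objects."""
    return [obj for obj in heap if id(obj) in painted]

# a is referred to by b; c and d form a cycle nothing else points at.
a = {"name": "a", "refs": []}
b = {"name": "b", "refs": [a]}
c = {"name": "c", "refs": []}
d = {"name": "d", "refs": [c]}
c["refs"].append(d)

heap = [a, b, c, d]
live = sweep(heap, mark([b]))        # b is the only root
print([o["name"] for o in live])     # -> ['a', 'b']; the c/d cycle is reclaimed
```

Note that the unreachable c/d cycle is collected without any special handling, which is one reason tracing collectors are preferred over naive reference counting.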

Another popular technique is `stop and copy' GC
(Minsky-Fenichel-Yokelson). Whenever your apartment becomes too
cluttered, you rent a new apartment, move everything you care about to
the new place, then abandon the old place for the next renter to clean
up. In GC terms, we create a new copy of every useful object and
fixup all the pointers. Then we can discard the entire old heap
wholesale.
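The apartment move can be sketched the same way. This is a simplification (a real Minsky-Fenichel-Yokelson/Cheney collector copies iteratively through a scan pointer rather than recursing, and the names below are invented for the example):

```python
def stop_and_copy(roots):
    """Copy every reachable object into a fresh heap (the 'new apartment'),
    fixing up all the references as we go."""
    new_heap = []
    forwarding = {}                      # old object id -> its copy

    def copy(obj):
        if id(obj) in forwarding:        # already moved: reuse the copy
            return forwarding[id(obj)]
        clone = {"name": obj["name"], "refs": []}
        forwarding[id(obj)] = clone      # record BEFORE recursing: handles cycles
        new_heap.append(clone)
        clone["refs"] = [copy(r) for r in obj["refs"]]
        return clone

    return new_heap, [copy(r) for r in roots]

a = {"name": "a", "refs": []}
b = {"name": "b", "refs": [a]}
junk = {"name": "junk", "refs": [a]}     # nothing reachable refers to junk

new_heap, new_roots = stop_and_copy([b])
print(sorted(o["name"] for o in new_heap))  # -> ['a', 'b']; junk stays behind
```

The forwarding table is the crucial piece: it guarantees each object is copied once, so shared structure and cycles survive the move intact.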

--
~jrm

Barry Margolin

Jan 11, 2004, 10:32:09 PM
In article <4001...@212.67.96.135>, Andy <Andy...@home.com> wrote:

> Richard C. Cobbe wrote:
> > (First, a small side note: many, though not all, Scheme values are too big
> > to store in a single register; they're typically allocated elsewhere in
> > memory and the register contains a `reference' -- think C pointer, but
> > safer.)
>
> Thanks - that makes sense. It is also reassuring to note that Scheme's
> allocations are safer than those of C. Can anyone elaborate why that is
> the case? What is/are the difference(s)?

The reason is that in Scheme, all pointers are managed behind the scenes
by the implementation, while in C pointers are managed directly by the
application. Scheme doesn't have anything analogous to C's pointer
arithmetic, or casting pointers to integers. So the Scheme runtime
knows where all the pointers are, and therefore which objects are still
accessible through all those pointers. By implication, anything else is
garbage, so it can be reclaimed.

Also, since there's no explicit free() function in Scheme, you don't
have the possibility of indirecting through a pointer that has been
freed. The GC ensures that if you can still access the pointer, the
object it points to won't be freed, nor will anything that it
references, and so on recursively. As long as the GC works properly,
you can prove that no pointer problems can occur.

Richard C. Cobbe

Jan 12, 2004, 10:33:28 AM
Andy <Andy...@home.com> writes:

> Richard C. Cobbe wrote:
>> (First, a small side note: many, though not all, Scheme values are too big
>> to store in a single register; they're typically allocated elsewhere in
>> memory and the register contains a `reference' -- think C pointer, but
>> safer.)
>
> Thanks - that makes sense. It is also reassuring to note that Scheme's
> allocations are safer than those of C. Can anyone elaborate why that
> is the case? What is/are the difference(s)?

In addition to Barry Margolin's response, the other major safety difference
is as follows. Let's say I'm programming in C:

struct Foo
{
    int a;
    char *b;
};

/* ... */

void f(struct Foo *arg)
{
    /* ... */
}

Within the body of f, arg may point to *anything*, not necessarily a legal
Foo. For example, I could do something really stupid like the following:

f((struct Foo *) 0xDEADBEEF);

and the compiler will happily assume that I know what I'm doing, even
though the contents of memory location 0xDEADBEEF are almost certainly
garbage. You will get no indication that there is a problem, either at
compile time or at run time. (Actually, that's not quite true -- your
program will likely print incorrect results. If you're very lucky, you may
get a segmentation fault or other memory error, but the error will likely
manifest itself in a part of the program far away from the actual bug.)

None of the above should be news to anyone who programs in C/C++; I'm just
setting up the example.

In Scheme, by contrast, this can't happen. Since Scheme forbids C-style
typecasts and other unsafe operations, we can make stronger statements. If
I have a Scheme reference, then I know for a fact that the reference refers
to a valid Scheme object. In short, no segfaults. Ever.

Of course, if I write a scheme function that expects a reference to a foo,
and you call it with a bar instead, you won't get any compile-time errors.
However, when the function attempts to treat the bar as if it were a foo,
this will trigger a run-time error at the location of the actual bug, or at
least much closer to it. In addition to better error reporting, you also
don't get the worst case in the C world, where the program treats the bar
as if it were a foo and computes an incorrect result *with no indication
that the result is bogus*.

Richard

Peter Keller

Jan 12, 2004, 11:10:59 AM
Richard C. Cobbe <co...@ccs.neu.edu> wrote:
> Of course, if I write a scheme function that expects a reference to a foo,
> and you call it with a bar instead, you won't get any compile-time errors.
> However, when the function attempts to treat the bar as if it were a foo,
> this will trigger a run-time error at the location of the actual bug, or at
> least much closer to it.

This is not entirely correct if you talk about user-defined aggregate
objects.... If you truly want to catch the error at run time, then you
have to implement some sort of type-tagging system for your "objects",
setting up a barrier at the entrance of each function which does
explicit run-time type checking.

An example would be:

(define employee (list "Peter Keller" "Madison" "Wisconsin"))
(define employee-name
  (lambda (x)
    (car x)))

; now suppose we have some different style of structure representing terminated
; people. The important thing here is that it coincidentally has a person's
; name at the car of the list.
(define terminated (list "John Smith" "01/11/04"))

Now, if you accidentally say:
(employee-name terminated)

it will happily return a value, even though a serious type error
has happened.

This is one of the most serious styles of inadvertent bug I've
encountered in Scheme. With a little foresight and a decent knowledge
of functional design/abstraction, you can write a ton of code without
this being a problem. But when it DOES happen, be prepared for some
debugging time if you use a Scheme compiler with poor error messages
and no debugging interpreter.

-pete


P.S. Macros can make the above run-time checks really easy to specify and
write, but that is a topic for another time; you could probably find a good
example of it in the SICP book.
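The barrier idea Pete describes can be sketched as follows. This is in Python purely for illustration, and the helper names are made up; it mirrors what a tagged-record scheme (or the macro-generated version) would expand into:

```python
# Each 'record' carries its type as the first slot, and every accessor
# checks that tag on entry -- the run-time type-checking barrier.

def make_employee(name, city, state):
    return ("employee", name, city, state)   # slot 0 is the type tag

def employee_name(x):
    # The barrier: refuse anything not tagged as an employee.
    if not (isinstance(x, tuple) and len(x) == 4 and x[0] == "employee"):
        raise TypeError("employee-name: not an employee: %r" % (x,))
    return x[1]

terminated = ("terminated", "John Smith", "01/11/04")

print(employee_name(make_employee("Peter Keller", "Madison", "Wisconsin")))
try:
    employee_name(terminated)        # now fails loudly, at the call site
except TypeError as err:
    print("caught:", err)
```

With the tag check in place, passing a terminated record where an employee is expected signals an error at the actual bug, instead of silently returning the wrong field.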

Richard C. Cobbe

Jan 12, 2004, 1:04:03 PM
Peter Keller <psi...@buzzard.cs.wisc.edu> writes:

> Richard C. Cobbe <co...@ccs.neu.edu> wrote:
>> Of course, if I write a scheme function that expects a reference to a foo,
>> and you call it with a bar instead, you won't get any compile-time errors.
>> However, when the function attempts to treat the bar as if it were a foo,
>> this will trigger a run-time error at the location of the actual bug, or at
>> least much closer to it.
>
> This is not entirely correct if you talk about user defined aggregate
> objects.... If you truly want to catch the error at run time, then you
> have to implement some sort of type tagging system for your "objects"
> where every time you set up a barrier at the entrance of each function
> which basically does explicit run time type checking.

This is quite true. I'm so used to the structure system found in
DrScheme/HtDP, where you get run-time type checks in all the selectors for
free, that I didn't even consider the possibility of hand-rolled
structures.

Of course, this is one of the major reasons why hand-rolled data
structures are generally suboptimal.

Richard

Anton van Straaten

Jan 12, 2004, 1:09:30 PM
Peter Keller wrote:
> Richard C. Cobbe <co...@ccs.neu.edu> wrote:
> > Of course, if I write a scheme function that expects a reference to a foo,
> > and you call it with a bar instead, you won't get any compile-time errors.
> > However, when the function attempts to treat the bar as if it were a foo,
> > this will trigger a run-time error at the location of the actual bug, or
> > at least much closer to it.
>
> This is not entirely correct if you talk about user defined aggregate
> objects....

Which is why most Schemes have a record type, and why SRFI-9, "Defining
Record Types", exists. Or, of course, more comprehensive solutions like
object systems.

> (define employee (list "Peter Keller" "Madison" "Wisconsin"))
> (define employee-name
>   (lambda (x)
>     (car x)))

(define-struct employee (name city state))
(define emp1 (make-employee "Peter Keller" "Madison" "Wisconsin"))

...gives you a type-safe employee-name for "free".

> ; now suppose we have some different style of structure representing
> ; terminated people. The important thing here is that it coincidentally
> ; has a person's name at the car of the list.
> (define terminated (list "John Smith" "01/11/04"))
>
> Now, if you accidentally say:
> (employee-name terminated)

...it will give you an error, of the general form "wrong type passed to
employee-name".

> This is mostly the most serious style of inadvertant bugs I've ever
> encountered in scheme.

The programming style which uses undifferentiated lists as a multi-purpose
data structure dates back to the early Lisp days, i.e. 1950's. It still has
its benefits, but you can't really enjoy those benefits without
understanding the tradeoffs, being familiar with the alternatives, and
knowing when it's appropriate to use that style and when it's not. Most
beginners should be using more structured techniques. Books like HTDP teach
the use of record types for these kinds of reasons.

Anton

Marvin Minsky

Jan 12, 2004, 3:19:33 PM
Joe Marshall <prunes...@comcast.net> wrote in message news:<k73yb4...@comcast.net>...

Incidentally, when this is done in LISP-like systems, then at no extra
cost, consecutive lists in the form of CDR chains can be stored in
consecutive memory registers, and can then be directly addressed as
indexed arrays. I don't know if anyone has exploited this feature.

Peter Keller

Jan 12, 2004, 3:38:09 PM
Anton van Straaten <an...@appsolutions.com> wrote:
> The programming style which uses undifferentiated lists as a multi-purpose
> data structure dates back to the early Lisp days, i.e. 1950's. It still has
> its benefits, but you can't really enjoy those benefits without
> understanding the tradeoffs, being familiar with the alternatives, and
> knowing when it's appropriate to use that style and when it's not. Most
> beginners should be using more structured techniques. Books like HTDP teach
> the use of record types for these kinds of reasons.

Yeah, I agree that record types should be used more often than raw list
or vector slot accessors. I guess I just wrote a lot of Scheme before
SRFI-9 became available, so I had to do a lot of it myself. It mostly
had to do with portability, but that reason is becoming less compelling...

-pete

Andy

Jan 12, 2004, 3:42:21 PM
Joe Marshall wrote:
> Andy <Andy...@home.com> writes:
>
>
>>Whereas in C the memory leaks are not 'recycled'? Can you give me a
>>reference that explains that process or do you mind giving me a precis
>>of how this works in Scheme.
>
>
> Ok. Let's suppose we have just calculated something in a function and
> we are returning the answer. Assume the answer is placed in a
> register and we intend to just pop the stack and return.
>
> The contents of the other registers are (by definition) no longer
> useful, so we can zero those out. The only things that can possibly
> be used are those objects still on the stack, and anything that they
> refer to, and anything that *those* refer to, and anything that
> *those* refer to, etc.
>
> A popular GC technique is `mark and sweep'. We start with the stack
> and the returned object and `paint' it red. Now we go through and
> paint red everything that is referred to by a red object. We keep
> painting things red (but only if a red object refers to it), until we
> have painted everything we can. Now we look at all the memory. If we
> find a patch that isn't red, we can re-use it.

I suppose this is, to some extent, an application of the legal phrase
"fruit of the poisoned tree" used when considering the integrity of
evidence: one looks for contamination from the offending source. Here,
though, the emphasis is reversed and validity (as in usability) is preserved.

> Another popular technique is `stop and copy' GC
> (Minsky-Fenichel-Yokelson). Whenever your apartment becomes too
> cluttered, you rent a new apartment, move everything you care about to
> the new place, then abandon the old place for the next renter to clean
> up. In GC terms, we create a new copy of every useful object and
> fixup all the pointers. Then we can discard the entire old heap
> wholesale.
>

It strikes me that the procedures to achieve such results are probably,
like sorting algorithms, part of a computer science student's remit at
school. It sounds quite fascinating working out solutions to these
puzzles and I gotta admire the brains that obtain these kinds of
results. Very neat!!

Anyway, I have a _much_ clearer idea about GC and how it works, and am
appreciative to you for your assistance and examples.

Best wishes

- Andy

Joe Marshall

Jan 12, 2004, 6:12:14 PM
min...@media.mit.edu (Marvin Minsky) writes:

> Joe Marshall <prunes...@comcast.net> wrote in message news:<k73yb4...@comcast.net>...
>>

>> Another popular technique is `stop and copy' GC
>> (Minsky-Fenichel-Yokelson). Whenever your apartment becomes too
>> cluttered, you rent a new apartment, move everything you care about to
>> the new place, then abandon the old place for the next renter to clean
>> up. In GC terms, we create a new copy of every useful object and
>> fixup all the pointers. Then we can discard the entire old heap
>> wholesale.
>
> Incidentally, when this is done in LISP-like systems then, at no extra
> cost, consecutive list in the form of CDR chains can be stored in
> consecutive memory registers, and then can be directly addressed as
> indexed arrays. I don't know if anyone has exploited this feature.

The Lisp Machine did.

--
~jrm

Joe Marshall

Jan 12, 2004, 6:43:30 PM
Andy <andy...@home.com> writes:

> Joe Marshall wrote:

>> A popular GC technique is `mark and sweep'. We start with the stack
>> and the returned object and `paint' it red. Now we go through and
>> paint red everything that is referred to by a red object. We keep
>> painting things red (but only if a red object refers to it), until we
>> have painted everything we can. Now we look at all the memory. If we
>> find a patch that isn't red, we can re-use it.
>
> I suppose this is, to some extent, an application of the legal phrase
> "fruit of the poisoned tree" used when considering the integrity of
> evidence: one looks for contamination from the offending source. Here,
> though, the emphasis is reversed and validity (as in usability) is preserved.

Exactly!

>> Another popular technique is `stop and copy' GC
>> (Minsky-Fenichel-Yokelson). Whenever your apartment becomes too
>> cluttered, you rent a new apartment, move everything you care about to
>> the new place, then abandon the old place for the next renter to clean
>> up. In GC terms, we create a new copy of every useful object and
>> fixup all the pointers. Then we can discard the entire old heap
>> wholesale.
>
> It strikes me that the procedures to achieve such results are
> probably, like sorting algorithms, part of a computer science student's
> remit at school. It sounds quite fascinating working out solutions to
> these puzzles and I gotta admire the brains that obtain these kinds of
> results. Very neat!!

I'm just amazed at the people who came up with the original ideas:
Stefan Arnborg - http://www.nada.kth.se/~stefan/home.html
Henry Baker - http://home.pipeline.com/~hbaker1/home.html
C.J. Cheney
Robert Fenichel - http://www.fenichel.net/
John McCarthy - http://www-formal.stanford.edu/jmc/
Marvin Minsky - http://web.media.mit.edu/~minsky/
Jerome Yochelson

--
~jrm

Andy

Jan 13, 2004, 2:41:43 AM
Joe Marshall wrote:
> Andy <Andy...@home.com> writes:
>
>
>>I may regret answering this due to the potential level of technicality
>>your answer might involve but ... <deep breath> ... how does a program
>>continue to behave _as if_ the object has not been de-allocated?
>
>
> By not bothering to use the object.
>
> (define *a-random-thing* (cons 23 "Skidoo"))
>
> (define (foo x y)
> (+ x y))
>
> No matter what I do to *a-random-thing*, including de-allocating it,
> the function FOO continues to work as before.

Meaning what? That the functions are independent? Sorry Joe, can you
back up a step here. I'm not seeing what you mean.

>
>>I don't know if that makes sense, so let me ask it this way:
>>If the program allocates an object or some value and that value is
>>then stored in a register, if that object is now de-allocated (or
>>rendered inaccessible and hence is "put out" for garbage collection)
>>is it the reference that is changed, or is the value erased from the
>>register? If it is erased from the register, then why wouldn't that
>>create an error if the program is still pointing to that value at that
>>register.
>
>
> That would be a bad thing to deallocate. Let's give a better example:
>
> (define (sum-and-difference a b)
>   (cons (+ a b) (- a b)))
>
> (define (double x)
>   (car (sum-and-difference x x)))
>
> SUM-AND-DIFFERENCE returns a CONS cell with the SUM and DIFFERENCE of
> the arguments. DOUBLE uses this to construct the sum, but it doesn't
> care what the difference is, so it simply selects the first element.
>
> Now what about that CONS cell? No one seems to be using it. Once
> DOUBLE returns, there is no way to get at the CONS cell. Might as
> well recycle it.
>

Right, I got it I think. I have to keep straight in my mind that the
values stored in a given memory location are collected, leaving those
locations available to store new values.

Thanks for the illustration Joe. That was clear and useful.

All the best

- Andy

Joe Marshall

Jan 13, 2004, 3:27:53 AM
Andy <andy...@home.com> writes:

> Joe Marshall wrote:
>> Andy <Andy...@home.com> writes:
>>
>>>I may regret answering this due to the potential level of technicality
>>>your answer might involve but ... <deep breath> ... how does a program
>>> continue to behave _as if_ the object has not been de-allocated?
>> By not bothering to use the object.
>> (define *a-random-thing* (cons 23 "Skidoo"))
>> (define (foo x y)
>>   (+ x y))
>> No matter what I do to *a-random-thing*, including de-allocating it,
>> the function FOO continues to work as before.
>
> Meaning what? That the functions are independent?

*a-random-thing* is a variable that isn't used by the function.
As you say, the function is completely independent of it.

> Sorry Joe, can you back up a step here. I'm not seeing what you mean.

This may seem trivial, but if you can demonstrate that every function
in the system is completely independent of some piece of data, then it
doesn't matter what you do with the data, including deallocate it.

(Of course, my example used a top-level variable and you cannot
deallocate those because the user might type in the name at any time
and he'd be upset.)


When you design a garbage collector, you can model the entire computer
memory as a mathematical object. Every instruction that you execute
may make changes to memory locations and every function you compute
may depend on the values in certain memory locations. If you can
factor the memory into two sets of locations, those that might be used
in the future and those that definitely will not be used in the
future, then you can recycle the ones that definitely will not be
used.

When you model the memory mathematically, you'll have some terms that
simply do not appear in the equations. Since the future of the
computation is completely independent of those terms, they can be
recycled.

--
~jrm

Andy

Jan 13, 2004, 1:40:00 PM
Joe Marshall wrote:
>
> 1. Security: Security may be improved because it is impossible to
> refer to an object that is no longer valid. In a language like C,
> one can call free to de-allocate an object, but if a reference to
> the object still exists, then the freed object could accidentally
> be re-used. This may even `work' until the memory used for the
> object is allocated by something else.

That makes sense. So, if I am getting my head around this correctly
then, Scheme (and do I presume Lisp also?) does not suffer from
vulnerabilities such as buffer over-runs, etc. as C does?

> 2. Program overhead: A certain amount of overhead is caused by
> garbage collection. It is necessary to leave enough hints around
> for the GC to make sense of what is going on. This is done
> automatically, so the programmer doesn't have to be aware of it,
> but there is generally one or two bits per word of memory that is
> used to tag objects. There is active research in how to reduce
> the overhead (with a smart compiler, the GC overhead can approach
> zero).

I bet this is a big research area for optimisation and efficiency of
resource consumption.

> On the other hand manual memory management typically accounts for
> about 1/3 of the code in a C program. All this code goes away.
>

I wasn't aware of this - interesting comparison. I wonder what the C
advocates would say?

> Will Clinger and Lars Hansen outfitted their Larceny Scheme
> implementation to have a pluggable garbage collector. You could
> select which GC you wanted to use.
>
> This paper has info:
>
> @misc{ hansen-experimental,
> author = "Lars T Hansen and William D Clinger",
> title = "An Experimental Study of Renewal-Older-First Garbage Collection",
> url = "citeseer.nj.nec.com/598694.html" }
>

I have downloaded and printed off this paper. It looks reasonably
technical, so it'll take me a while to work my way through, but thanks
for the reference.

> The outside-to-inside technique is for understanding the *intent* as
> quickly as possible. You have to know what the main idea is before
> you can criticise the minutiae.

> But of course you don't want to generalize this to accepting all
> programs on faith.
>
> Let me put it this way: suppose you were doing a code review. If we
> start at the innermost piece of code, you might see this:
> (+ x offset)
>
> Is this correct? Who knows? Could it be wrong? Maybe.
>
> Now if we look at the outermost piece of code:
> (define (export-file-changes file change-list) ....)
>
> Even before we look at the body we have some idea of what might be
> going on. We expect some file operations, we expect some iteration
> over a list, we don't expect Fourier transforms.
>
> Since we are doing a code review, we would now go on to consider the
> pieces, but we have the beginnings of a road map for the review. If
> the next thing we see is this:
> (for-each (lambda (change) ...) change-list)
>
> we'll know that things are heading in the right direction, but if we
> see this:
> (/ change-list (+ file 7))
>
> we'll know something is terribly wrong.

That makes sense to me. It also seems to be a matter of experience of
reading/writing code, so that one knows ("intuitively"??) what should
and (perhaps more importantly) what should _not_ come next in any given
code segment.
As I think I might have mentioned previously, as an absolute beginner I
have only been introduced to the substitution model of reading (in to
out), but since your post and those of some others here, I have started
to practice reading code from out to in, and I am finding it really does
change the overall sense I am able to make of the code - it is quicker
and actually provides more of a holistic/global/overall perspective.
Fortunately, I know I can trust the code (it's simply.scm, which is used
in Harvey and Wright's "Simply Scheme"), but I also know that if I tried
to read code sight-unseen I would really struggle to sort the good from
the bad from the just plain awful.

>
> It's the philosophy of abstraction (object-oriented is but one means
> of abstraction). Computers are just too damn complicated to figure
> out, so you *have* to pretend that the details don't matter. The art
> of computer science is to figure out the best set of abstractions to
> use.

Hmmm - gotta say, I am reassured that I am not the only one who finds
computers too complex/complicated to figure out!! :)

<snip :: Joe's wonderful example of code reading>

Joe, that was a really helpful illustration. Made your point really
clearly. If you teach, and if you haven't used that example, keep it -
it's bound to help some other neophyte get to grips with some of these
concepts. Thanks for taking all that time to walk through that - that
was generous of you.

> (defmacro MONITOR (&rest names)
> "Monitor the named functions. As in TRACE, the names are not evaluated.
> If a function is already monitored, then unmonitor and remonitor (useful
> to notice function redefinition). If a name is undefined, give a warning
> and ignore it. See also unmonitor, report-monitoring,
> display-monitoring-results and reset-time."


Quickie: Is this a 'doc-string'? Because if I recall correctly from my
foray into Python, Python sets up its doc-strings like this.

> You're welcome.

And, once again, am very grateful :)

All the best to you

- Andy


Joe Marshall

Jan 13, 2004, 2:14:15 PM
Andy <andy...@home.com> writes:

> Joe Marshall wrote:
>> 1. Security: Security may be improved because it is impossible to
>> refer to an object that is no longer valid. In a language like C,
>> one can call free to de-allocate an object, but if a reference to
>> the object still exists, then the freed object could accidentally
>> be re-used. This may even `work' until the memory used for the
>> object is allocated by something else.
>
> That makes sense. So, if I am getting my head around this correctly
> then, Scheme (and do I presume Lisp also?) does not suffer from
> vulnerabilities such as buffer over-runs, etc. as C does?

For several years, the White House web site, www.whitehouse.gov, was
running CL-HTTP on a Symbolics Lisp Machine. During the time it was
operational it was never once compromised.

Henry Baker has this to say about buffer overflows:

[Risks Digest 21.84] Pointing the finger at buffer overflows

Date: Wed, 26 Dec 2001 21:19:22 -0800
From: Henry Baker
Subject: "Buffer Overflow" security problems

I'm no fan of lawyers or litigation, but it's high time that someone defined
"buffer overflow" as being equal to "gross criminal negligence"...

I'm sorry to be politically incorrect, but for the ACM to then laud "C" and
its inventors as a major advance in computer science has to rank right up
there with Chamberlain's appeasement of Hitler.

If I remove a stop sign and someone is killed in a car accident at
that intersection, I can be sued and perhaps go to jail for
contributing to that accident... If I remove array bounds checks from
my software, I will get a raise and additional stock options due to
the improved "performance" and decreased number of calls from customer
service...

The most basic safeguards found in "professional engineering" are
cavalierly and routinely ignored in the software field. Software
people would never drive to the office if building engineers and
automotive engineers were as cavalier about buildings and autos as the
software "engineer" is about his software.


>> (defmacro MONITOR (&rest names)
>> "Monitor the named functions. As in TRACE, the names are not evaluated.
>> If a function is already monitored, then unmonitor and remonitor (useful
>> to notice function redefinition). If a name is undefined, give a warning
>> and ignore it. See also unmonitor, report-monitoring,
>> display-monitoring-results and reset-time."
>
>
> Quickie: Is this a 'doc-string'? Because if I recall correctly from my
> foray into Python, Python sets up its doc-strings like this.

Yes.

Andy
Jan 13, 2004, 7:00:46 PM
Joe Marshall wrote:

<snip>

> This may seem trivial, but if you can demonstrate that every function
> in the system is completely independent of some piece of data, then it
> doesn't matter what you do with the data, including deallocate it.

... because, being independent, nothing else relies on it in order to do
its own job, so from the perspective of the remaining allocated
functions/procedures there is nothing in the system beyond the scope
they themselves refer to.

> When you design a garbage collector, you can model the entire computer
> memory as a mathematical object. Every instruction that you execute
> may make changes to memory locations and every function you compute
> may depend on the values in certain memory locations. If you can
> factor the memory into two sets of locations, those that might be used
> in the future and those that definitely will not be used in the
> future, then you can recycle the ones that definitely will not be
> used.
>
> When you model the memory mathematically, you'll have some terms that
> simply do not appear in the equations. Since the future of the
> computation is completely independent of those terms, they can be
> recycled.
>

And once the link (the reference (or binding?)) between variable and
value is broken (de-allocated) the value is as good as vapourised. Hence
the location where that value was stored can be re-used at any time
which is accomplished by a new reference between the memory location of
a name and a value being made through a procedure?

- Andy

Andy
Jan 13, 2004, 7:14:11 PM
Joe Marshall wrote:

> For several years, the White House web site, www.whitehouse.gov, was
> running CL-HTTP on a Symbolics Lisp Machine. During the time it was
> operational it was never once compromised.

This is a little off topic, but:

1. Was it not compromised because it was running on Lisp - is it inherently
more secure than <insert language here>?
2. You address this as if it is no longer the case? If so, any ideas why
the WH changed and have there been any compromises since (that
question is a bit like digging for some juicy gossip, sorry - but am
curious, nonetheless)

<snip citation of Henry Baker [Risks Digest 21.84] on buffer overflows>

Doesn't pull many punches, eh? Aren't there ways and means to try and
protect C code from those kinds of leak holes? Are you saying that
Scheme closes those holes completely right from the get go, so to speak,
whereas C has to be tweaked/cajoled into doing that with the attendant
extra code and overhead that extra measure might require?

Cheers

- Andy

Joe Marshall
Jan 13, 2004, 10:28:02 PM
Andy <andy...@home.com> writes:

> Joe Marshall wrote:
>
>> For several years, the White House web site, www.whitehouse.gov, was
>> running CL-HTTP on a Symbolics Lisp Machine. During the time it was
>> operational it was never once compromised.
>
> This is a little off topic, but:
>
> 1. Was it not compromised because it was running on Lisp - is it
> inherently more secure than <insert language here>?

The bulk of the bugs in standard web servers are buffer overruns.
Lisp simply does not have that problem.

> 2. You address this as if it is no longer the case? If so, any ideas
> why the WH changed and have there been any compromises since (that
> question is a bit like digging for some juicy gossip, sorry - but am
> curious, nonetheless)

They kicked out the former resident of the White House.

It definitely has been taken down by a denial of service attack, and
there seems to be an indication that it was compromised, but very
quickly fixed.

>
> <snip citation of Henry Baker [Risks Digest 21.84] on buffer overflows>
>
> Doesn't pull many punches, eh? Aren't there ways and means to try and
> protect C code from those kinds of leak holes?

Yes, but the semantics of the language means that you cannot close all
the holes.

> Are you saying that Scheme closes those holes completely right from
> the get go, so to speak, whereas C has to be tweaked/cajoled into
> doing that with the attendant extra code and overhead that extra
> measure might require?

Exactly.


--
~jrm
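The bounds-checking point above can be seen directly. A minimal sketch, using Python purely for illustration (its checked indexing behaves like Scheme's vector-ref in this respect): an out-of-bounds write raises a catchable error instead of silently corrupting adjacent memory, as an unchecked C write can.

```python
buf = [0] * 8               # fixed-size "buffer" of 8 slots

overrun_caught = False
try:
    buf[8] = 1              # one past the end: a classic off-by-one
except IndexError:
    overrun_caught = True   # bounds check turns it into an error, not corruption

assert overrun_caught
assert buf == [0] * 8       # buffer contents are untouched
```

In C, the equivalent write would scribble on whatever happens to sit after the buffer; here the language semantics make the mistake impossible to miss.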

Joe Marshall
Jan 13, 2004, 10:29:23 PM
Andy <andy...@home.com> writes:


> And once the link (the reference (or binding?)) between variable and
> value is broken (de-allocated) the value is as good as
> vapourised.

Or if the variable can no longer be named.

> Hence the location where that value was stored can be
> re-used at any time which is accomplished by a new reference between
> the memory location of a name and a value being made through a
> procedure?

Yep.

--
~jrm
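A tiny sketch of the reachability idea being discussed, in Python purely for illustration (the same reasoning underlies any garbage-collected language, Scheme included): once the last strong reference is dropped, the object is unreachable and its storage may be reclaimed and re-used.

```python
import gc
import weakref

class Datum:
    """A stand-in for some heap-allocated value."""

datum = Datum()               # one strong reference: the name `datum`
probe = weakref.ref(datum)    # a weak reference does not keep it alive

assert probe() is datum       # still reachable through `datum`
datum = None                  # break the binding: last strong reference gone
gc.collect()                  # ask the collector to run (CPython's reference
                              # counting reclaims the object immediately anyway)
assert probe() is None        # unreachable object reclaimed; slot re-usable
```

The weak reference acts as an observer: it lets us confirm the object is gone without itself counting as a use of the object.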

Rainer Joswig
Jan 14, 2004, 3:16:18 AM
Joe Marshall <j...@ccs.neu.edu> wrote in message news:<65ffu0...@ccs.neu.edu>...

> Andy <andy...@home.com> writes:
>
> > Joe Marshall wrote:
> >> 1. Security: Security may be improved because it is impossible to
> >> refer to an object that is no longer valid. In a language like C,
> >> one can call free to de-allocate an object, but if a reference to
> >> the object still exists, then the freed object could accidentally
> >> be re-used. This may even `work' until the memory used for the
> >> object is allocated by something else.
> >
> > That makes sense. So, if I am getting my head around this correctly
> > then, Scheme (and do I presume Lisp also?) does not suffer from
> > vulnerabilities such as buffer over-runs, etc. as C does?
>
> For several years, the White House web site, www.whitehouse.gov, was
> running CL-HTTP on a Symbolics Lisp Machine. During the time it was
> operational it was never once compromised.

Almost. It was not www.whitehouse.gov. More like www.pub.whitehouse.gov.

Samuel Walters
Jan 16, 2004, 8:48:30 AM
| Joe Marshall said |

Slightly OT:

Does Lisp or any of its other variants have these?

Thanks in advance.

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Tim Haynes
Jan 16, 2004, 9:23:30 AM
Samuel Walters <swalter...@yahoo.com> writes:

>>> Quickie: Is this a 'doc-string'? Because if I recall correctly from my
>>> foray into Python, Python sets up its doc-strings like this.
>>
>> Yes.
>
> Slightly OT:
>
> Does Lisp or any of its other variants have these?

What, doc-strings? Definitely.

(define (my-function somearg)
  "Return addition of 1 onto SOMEARG"
  (+ 1 somearg))

~Tim
--
14:22:25 up 44 days, 17:35, 2 users, load average: 0.32, 0.18, 0.21
pig...@stirfried.vegetable.org.uk |Remember, fish are FOOD not FRIENDS!
http://spodzone.org.uk/cesspit/ |

Anton van Straaten
Jan 16, 2004, 10:38:38 AM
Samuel Walters wrote:
> >> Quickie: Is this a 'doc-string'? Because if I recall correctly from my
> >> foray into Python, Python sets up its doc-strings like this.
> >
> > Yes.
>
> Slightly OT:
>
> Does Lisp or any of it's other variants have these?

I'm under the impression that Lisp was the first language to use them.
Python borrowed the idea from Lisp.

Anton

tal...@noshpam.lbl.government
Jan 16, 2004, 1:58:06 PM
On Jan 16, 2004 at 1:48pm, Samuel Walters wrote:

swalte >Slightly OT:
swalte >
swalte >Does Lisp or any of its other variants have these?

There's Schmooz for Scheme:

http://tinyurl.com/2w49p

Also, Emacs-Lisp supports doc-strings in the first part of a
definition.

~Tomer

Michele Simionato
Jan 16, 2004, 2:29:59 PM
Samuel Walters <swalter...@yahoo.com> wrote in message news:<pan.2004.01.16...@yahoo.com>...
> Slightly OT:
>
> Does Lisp or any of its other variants have these? [docstrings]

Yes and no. Speaking about Scheme (I don't know about Lisp), you should
realize that

(define (id x)
  "This is the identity function" ; Scheme docstring
  x)

is quite different from Python

def id(x):
    "This is the identity function" # Python docstring
    return x

In the Scheme case, the literal string is just evaluated to itself, and
has no effect at all: it is the same as a comment. On the other hand,
the Python docstring becomes an attribute of the function (id.__doc__)
and can be retrieved at run-time; you can also change it dynamically,
if you want. In order to access Scheme docstrings you need a code
walker instead.
Python functions are objects with attributes, whereas Scheme functions
are just functions. BTW, it is possible to define annotated procedures
in Scheme, at least in some implementations, but it is not standard.
I've got the impression that docstrings are rarely used in Scheme,
whereas they are ubiquitous in emacs-lisp.


Michele Simionato
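For contrast, a small runnable demonstration of the Python side of the point above: the leading string becomes a real, mutable attribute of the function object rather than an inert expression.

```python
def identity(x):
    "This is the identity function"   # stored as identity.__doc__
    return x

assert identity(7) == 7
assert identity.__doc__ == "This is the identity function"

# Unlike the Scheme case, the docstring can be read and even
# replaced at run time, since functions are objects with attributes:
identity.__doc__ = "Return the argument unchanged."
assert identity.__doc__ == "Return the argument unchanged."
```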

Samuel Walters
Jan 16, 2004, 2:38:43 PM
| Michele Simionato said |

> Samuel Walters <swalter...@yahoo.com> wrote in message news:<pan.2004.01.16...@yahoo.com>...
>> Slightly OT:
>>
>> Does Lisp or any of its other variants have these? [docstrings]
>
> Yes and no. Speaking about Scheme (I don't know about Lisp), you should
> realize that
>
> (define (id x)
>   "This is the identity function" ; Scheme docstring
>   x)
>
> is quite different from Python
>
> def id(x):
>     "This is the identity function" # Python docstring
>     return x
>
> In the Scheme case, the literal string is just evaluated to itself, and
> has no effect at all: it is the same as a comment. On the other hand,
> the Python docstring becomes an attribute of the function (id.__doc__)
> and can be retrieved at run-time; you can also change it dynamically,
> if you want. In order to access Scheme docstrings you need a code
> walker instead.

Ah, thank you. Docstrings are one of the things I love about python.
Even though they're not as "useful" as Python docstrings, something is
better than nothing.

> Python functions are objects with attributes, whereas Scheme functions
> are just functions. BTW, it is possible to define annoted procedures
> in Scheme, at least in some implementation, but it is not standard.
> I've got the impression that docstrings are rarely used in Scheme,
> whereas they are ubiquitous in emacs-lisp.

I imagine that emacs has the code-walker built in. :-P
Or, at least, as an easy add-in module.

Thank you all for your responses.

Sam Walters

Barry Margolin
Jan 16, 2004, 10:53:06 PM
In article <95aa1afa.04011...@posting.google.com>,
michele....@poste.it (Michele Simionato) wrote:

> In the Scheme case, the literal string is just evaluated to itself, and
> has no effect at all: it is the same as a comment.

That's what it is in Lisp, too. The convention originally was created
because it could be used compatibly in implementations that don't treat
the doc-string specially as well as newer ones that recognized it and
saved it away as an attribute. There's certainly nothing preventing
Scheme implementations from doing the same thing, and if enough adopt
this it could be made official in a future RnRS.

tal...@noshpam.lbl.government
Jan 18, 2004, 8:04:00 PM
On Jan 17, 2004 at 3:53am, Barry Margolin wrote:

barmar >saved it away as an attribute. There's certainly nothing preventing
barmar >Scheme implementations from doing the same thing, and if enough adopt
barmar >this it could be made official in a future RnRS.

Could some "define" macro be used to implement basic docstring
functionality?

Would something like this be more appropriate for a SRFI?

~Tomer
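One way to picture what such a define macro might do, sketched here in Python rather than Scheme (the DOCSTRINGS table and the define helper are hypothetical names, purely for illustration): intercept the definition, stash the leading string in a table keyed by name, and leave evaluation otherwise unchanged.

```python
# Hypothetical sketch of the bookkeeping a docstring-aware `define`
# macro could perform: record the doc in a table, return the procedure.
DOCSTRINGS = {}

def define(name, doc, proc):
    """Register proc under name, remembering doc for later lookup."""
    DOCSTRINGS[name] = doc
    return proc

inc = define("inc", "Return x + 1.", lambda x: x + 1)

assert inc(41) == 42
assert DOCSTRINGS["inc"] == "Return x + 1."
```

A real Scheme macro would do this at expansion time by pattern-matching a leading string literal in the body; the table lookup is the part an implementation (or a SRFI) would standardize.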
