Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Implementing lisp in Forth. Chapter 4.

227 views
Skip to first unread message

Albert van der Horst

unread,
Sep 5, 2016, 5:57:06 AM9/5/16
to
I've cross posted to comp.lang.lisp
The context is writing a minimal lisp, but with Forth instead of
c as the implementation language.

In article <_p9zz.1142663$OC.4...@fx39.am4>,
Spiros Bousbouras <spi...@gmail.com> wrote:
<SNIP>
>
>I don't think that garbage collection is something to be added as an
>afterthought. The existence of garbage collection affects the allocation (and
>possible recollection) of every object (see examples below) so it's one of
>the issues you need to be thinking about from the start. It's likely more
>fundamental than the syntax of the language. Having said that , it's not that
>big a hurdle. I don't know why you don't like Anton Ertl's package but there
>is also the Boehm-Demers-Weiser garbage collector which is used in Embeddable
>Common Lisp , Guile Scheme and was used in an early implementation of Racket
>Scheme. Even writing your own precise garbage collector is not terribly
>difficult , see
> http://www.cs.cornell.edu/courses/cs312/2003fa/lectures/sec24.htm
>for a simple algorithm.
>
>> About garbage: Is this correct (the infamous Fibonacci series)?
>>
>> After:
>> (define (fib n)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i n ) b)))
>> the fib procedure remains there in all eternity and there is nothing
>> to collect. I can do (fib 10) any time I please.
>> Or will there be anonymous memory places and are the
>> boxes they are in such as (a,b,i,n) collectable?
>
>As long as you don't redefine it then yes it remains there for all eternity.

Maybe there is a confusion between dynamic allocation / knowing
exactly where each piece of memory hangs out, and garbage
collection.
Such a fib consist (in my view) of a Forth word header of 5 CELLS which
is dynamically allocated *using my Forth ALLOCATE* and pointed to by some
pointer sitting in some tree or list or hash as a global heap.
Then the code is high level Forth code, and again it is allocated
dynamically and a pointer to it is present in the header.
(High level Forth code is, of course a list of code tokens.)
Contrary to malloc, my ALLOCATE knows a lot about
the size etc of pieces allocated.
Of course I make sure that analysing the heap is possible to find
out what is no longer needed. But I call actually doing that
the garbage collection. There is no doubt in my mind that
I can implement a garbage collector even if it is
very inefficient. What one needs is all knowledge about pointers to
dynamically allocated area and into dynamically allocated area and
know which is which, and a means to know the sizes.
That may not be properly called an afterthought.
If I wouldn't keep that information from the start, an attempt to
write a Lisp would be ludricous indeed. If I want to investigate how
a Forth can morph into a lisp, I can't have an external tool
dictate me how to do the allocation. Using my own ALLOCATE I can change
it, if I want properties specifically needed for LISP

>But if you call it then the arguments fib and n as well as the local
>(lexical in Lisp parlance) variables i , a , b will have to be allocated
>somewhere. So whether they create garbage , which will need to be reclaimed
>eventually , depends on the allocation strategy of the implementation. A
>perfectly reasonable strategy is that the implementation gets a big chunk of
>memory from the operating system and every time it needs a new object it uses
>the smallest address from that space which it hasn't used yet. If the space
>fills up then the implementation may ask for more memory from the operating
>system or do garbage collection ; obviously , eventually it will have to do
>garbage collection rather than keep asking for more and more memory. With
>such an allocation scheme , every time you call fib and it returns then it
>will create garbage for fib , n , etc. Now a sufficiently clever Lisp
>implementation may be able to prove that none of the variables of fib can
>be referenced after fib terminates and therefore allocate them on the
>hardware stack as it would (likely) happen in a language like C.
>en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
>not that clever then every call to fib creates garbage and if that garbage
>isn't cleared every now and again , it will consume eventually all available
>memory.

My Forth allocation strategy is simpler. At the start a large chunk
is asked from Linux (which overcommits memory). Then my ALLOCATE
gives out memory on a first free basis. Maybe I get stuck and have to
do garbage collection, which amounts to calling FREE from the allocator.
If the large stuck wasn't enough, that is the end.
With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
the GC needs to kick in.

>
>> But what with (directly calculating the 100-th fib number)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i 100 ) b )
>> )
>> 354224848179261915075
>> does that leave garbage in your run of the mill scheme?
>> (Because it might not in mine.)
>
>Same as above ; if the Lisp implementation is clever enough to see that
>i , a , b cannot be accessed after the piece of code terminates then it will
>allocate them in such a way as to not create garbage (like on the hardware
>stack) , otherwise they will be garbage. Note also , which Paul Rubin pointed
>out in a different post , that even 354224848179261915075 may create garbage
>if it's so large that it won't fit in a register. Since
>354224848179261915075 > 2^64 then even in a 64 bit machine it may create
>garbage. Even just writing on the REPL
> 354224848179261915075
>may create garbage.

That is interesting. The Forth REPL can interpret the line
.( aap)
and it will print aap. The string aap is a temporary object and
disappears automatically. It will not be allocated anywhere,
only a string descriptor will be on the Forth stack.
"aap" TYPE
likewise will print aap, otoh it will allocate a string (in ciforth).
Its descriptor is thrown away, so the string itself becomes garbage.
Normally in Forth this garbage is collected using FORGET/MARKER
a very primitive mechanism that basically only restores a previous
memory situation.

>
>> In the procedure fib can I optimise the fib into machine code with
>> (optimise fib)
>> that optimises the underlying Forth code, without getting on
>> someones toes?
>> Obviously that may leave some garbage that may be harder to collect.
>
>Every garbage collection algorithm I've heard of deals with low level
>pointers so turning your high level language into machine code won't make
>garbage collection harder. But in any case , your concern shows again that
>garbage collection is something you have to incorporate into your design from
>the start , not something you add as an afterthought.
>
>[...]
>
>> I just read the description of Guile. That is a Scheme dialect and Scheme
>> is supposed to be simple. I can't detect the simplicity in it (not
>> from the info file). It kind of deters me.
>> I see that guile binary is 3K, that seems okay, but it calls
>> horrific c-libraries like the infamous readline, maybe even malloc.
>
>Every Scheme intended for real work will have plenty of extensions on top of
>the Scheme standard. en.wikipedia.org/wiki/Scheme_(programming_language) :
>
> The R6RS standard has caused controversy because it is seen to
> have departed from the minimalist philosophy. In August 2009, the
> Scheme Steering Committee which oversees the standardization
> process announced its intention to recommend splitting Scheme
> into two languages: a large modern programming language for
> programmers, and a subset of the large version retaining the
> minimalism praised by educators and casual implementors

Thank you. good to know that. I have no objection to extensions of course.
My statically linked lina is 30 kbyte. If you want floating point,
you have to load it from a source library, so then you've to load an
assembler first from a source library.

>
>I don't know why you think readline is infamous but it's for providing a more
>pleasant interface and not necessary. On the other hand , Guile certainly
>uses the GNU multiprecision library and being able to handle arbitrarily
>large integers *is* part of the core functionality. Since it uses the
>Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
>(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.

(In the context of toy lisps that can be implemented in 1000 lines,
I hate readline which is itself couple of hundred lines,
is not written in lisp and is not strictly needed.
I'm glad to see that it is a loadable extension to Guile.)

BDW is a c-package? I'm disappointed, because people told me that
Lisp can be implemented using 5 basic operations. That means
something different than saying that yourforth is implemented
with 38 basic words (in assembler) apparently.
In analysing what is minimally needed in Forth always
KEY and EMIT come up. That is realistic. If you can't tell the
computer what to do, and can't see what it has done, your
language is incomplete. Lispers seem to think that REPL is
a tucked on part.

If you would program a LISP machine like I have a Forth machine
(just a tower that has Forth in the boot sector), would it then contain
a couple of giant binary blobs being the GNU multiprecision lib,
the compiled readline lib and maybe even a compiled BDW library written
in C?
Eech!

> Jorgen Grahn
> James K. Lowden

Two names in a signature, but you're Spiros?
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Paul Rubin

unread,
Sep 5, 2016, 3:04:24 PM9/5/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
> the GC needs to kick in.

I suppose that's a reasonable way to start.

>> 354224848179261915075 may create garbage
> That is interesting. The Forth REPL can interpret the line .( aap)
> and it will print aap. The string aap is a temporary object and
> disappears automatically. It will not be allocated anywhere,

What happens if you enter the line "123" to leave the number 123 on the
stack? It goes into a stack cell (machine word) that's reclaimed
automatically when you pop the stack--that's how Forth works. But what
if the number is 354224848179261915075? It won't fit in a cell. It's a
bignum that must occupy multiple cells, so you'd use something like
ALLOCATE to get those cells, put the number in them, and leave just a
pointer on the stack. Later once the number is no longer in use, you'll
want to reclaim the memory. That's why we say entering
354224848179261915075 in a Lisp REPL might create garbage.

> Normally in Forth this garbage is collected using FORGET/MARKER
> a very primitive mechanism that basically only restores a previous
> memory situation.

Having GC frees you from having to think about stuff like that.

Edsger W. Dijkstra in his 1972 Turing Award lecture said,

"With a few very basic principles at its foundation, it [LISP] has
shown a remarkable stability. Besides that, LISP has been the
carrier for a considerable number of in a sense our most
sophisticated computer applications. LISP has jokingly been
described as “the most intelligent way to misuse a computer”. I
think that description a great compliment because it transmits the
full flavour of liberation: it has assisted a number of our most
gifted fellow humans in thinking previously impossible
thoughts."

Part of the way it did that was by handling stuff like memory management
for you behind the scenes, in a manner mostly unknown in other languages
of that era. These days, of course, everybody does it.

> BDW is a c-package? I'm disappointed, because people told me that
> Lisp can be implemented using 5 basic operations.

That's too vague to be meaningful. Lisp code is written with the
illusion of infinite memory, and the GC reclaims the actual underlying
memory behind the scenes, invisibly to the program. GC has to munge on
actual raw memory which Lisp doesn't have a way to do. BDW is one
approach to GC but there are others, all of which need machine-level
primitives. Some Lisps include memory-access primitives so the GC can
itself be written in Lisp, but that stuff has to be done very carefully
to avoid creating garbage while the GC is running. I believe that
Scheme-48 (s48.org) works that way.

> If you would program a LISP machine like I have a Forth machine (just
> a tower that has Forth in the boot sector), would it then contain a
> couple of giant binary blobs being the GNU multiprecision lib, the
> compiled readline lib and maybe even a compiled BDW library written in
> C? Eech!

BDW is a nice simple way to get GC for a small Lisp, but the higher
performance ones tend to use other methods. And of course Lisp is much
older than the GNU project. Traditionally bignums would have been
implemented in assembly language. The historical Lisp machines had
hardware-assisted GC, I believe.

There's a bare-metal Scheme for the Raspberry Pi but I don't know if
anyone actually uses it. Big Lisp systems usually run under OS's rather
than on bare metal. Although control systems have been done in Lisp,
Lisp usually hasn't been in the same niche as Forth. It's more for
computation than for control, so it usually doesn't care much about low
level hardware access.

Albert van der Horst

unread,
Sep 6, 2016, 5:39:58 AM9/6/16
to
In article <87zinmm...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
<SNIP>
>> If you would program a LISP machine like I have a Forth machine (just
>> a tower that has Forth in the boot sector), would it then contain a
>> couple of giant binary blobs being the GNU multiprecision lib, the
>> compiled readline lib and maybe even a compiled BDW library written in
>> C? Eech!
>
>BDW is a nice simple way to get GC for a small Lisp, but the higher
>performance ones tend to use other methods. And of course Lisp is much
>older than the GNU project. Traditionally bignums would have been
>implemented in assembly language. The historical Lisp machines had
>hardware-assisted GC, I believe.

How can a c-package that has the same API as malloc ever know what
is garbage to LISP? Why would my ALLOCATE fare any less than BDW
if it has access to that same information?

A package with an ALLOCATE and a FREE is not a garbage collector.
Garbage collecting is about knowing what can be freed.
(Merging free areas into one is done automatically and incrementally
in my Forth ALLOCATE. I hope not that this is called garbage collection.
My Forth ALLOCATE knows the size of all allocated area's, so it is
superior to malloc() in all respects except speed.)

I can imagine there is a "fat" FREE to be called from lisp.
It frees the area pointed to and recursively via pointers stored
there all areas indirectly pointed to. I can add a fat FREE to
my ALLOCATE in a heartbeat. But surely only Lisp can decide
when to call it. If there are any aliasing pointers to one
of those areas, it can't fatFREE it.

>
>There's a bare-metal Scheme for the Raspberry Pi but I don't know if
>anyone actually uses it. Big Lisp systems usually run under OS's rather
>than on bare metal. Although control systems have been done in Lisp,
>Lisp usually hasn't been in the same niche as Forth. It's more for
>computation than for control, so it usually doesn't care much about low
>level hardware access.
>

Groetjes Albert

Spiros Bousbouras

unread,
Sep 6, 2016, 9:18:55 AM9/6/16
to
Are those 5 CELLS going to be reused every time the function gets called ?
For this function it doesn't matter if they are but in Lisp you can create
closures which return storage local to the function (i.e. lexical variables)
up the call stack. For example (I'll use Common Lisp syntax because I'm
more familiar with it than Scheme)

(defun adder (n)
(lambda (i) (+ n i)))

If you're going to translate adder in Forth it won't do to have 1 CELL for
n which gets reused every time adder gets called ; you need to create fresh
storage space for adder every time it gets called unless you can prove that
its return value (which is an anonymous function) gets discarded.

> Then the code is high level Forth code, and again it is allocated
> dynamically and a pointer to it is present in the header.
> (High level Forth code is, of course a list of code tokens.)
> Contrary to malloc, my ALLOCATE knows a lot about
> the size etc of pieces allocated.

malloc() also knows about the size of pieces allocated otherwise it wouldn't
work. The interface does not make such information available to the programmer
but the internals of malloc() (and related functions) certainly need to know.

> Of course I make sure that analysing the heap is possible to find
> out what is no longer needed. But I call actually doing that
> the garbage collection.

It's part of garbage collection but you also need to reclaim the space.

> There is no doubt in my mind that
> I can implement a garbage collector even if it is
> very inefficient. What one needs is all knowledge about pointers to
> dynamically allocated area and into dynamically allocated area and
> know which is which, and a means to know the sizes.
> That may not be properly called an afterthought.
> If I wouldn't keep that information from the start, an attempt to
> write a Lisp would be ludricous indeed. If I want to investigate how
> a Forth can morph into a lisp, I can't have an external tool
> dictate me how to do the allocation. Using my own ALLOCATE I can change
> it, if I want properties specifically needed for LISP

Well , if you're going to change ALLOCATE so that it stores the necessary
information to do garbage collection then I would most certainly would not call
it an "afterthought" ; on the contrary you design with garbage collection in
mind starting from the most basic components.

> >But if you call it then the arguments fib and n as well as the local
> >(lexical in Lisp parlance) variables i , a , b will have to be allocated
> >somewhere. So whether they create garbage , which will need to be reclaimed
> >eventually , depends on the allocation strategy of the implementation. A
> >perfectly reasonable strategy is that the implementation gets a big chunk of
> >memory from the operating system and every time it needs a new object it uses
> >the smallest address from that space which it hasn't used yet. If the space
> >fills up then the implementation may ask for more memory from the operating
> >system or do garbage collection ; obviously , eventually it will have to do
> >garbage collection rather than keep asking for more and more memory. With
> >such an allocation scheme , every time you call fib and it returns then it
> >will create garbage for fib , n , etc. Now a sufficiently clever Lisp
> >implementation may be able to prove that none of the variables of fib can
> >be referenced after fib terminates and therefore allocate them on the
> >hardware stack as it would (likely) happen in a language like C.
> >en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
> >not that clever then every call to fib creates garbage and if that garbage
> >isn't cleared every now and again , it will consume eventually all available
> >memory.
>
> My Forth allocation strategy is simpler. At the start a large chunk
> is asked from Linux (which overcommits memory). Then my ALLOCATE
> gives out memory on a first free basis. Maybe I get stuck and have to
> do garbage collection, which amounts to calling FREE from the allocator.

What you're describing here is the same thing I describe in the quoted paragraph
above so I wouldn't call it simpler. I mentioned "escape analysis" as an
optimisation but it's not necessary.

> If the large stuck wasn't enough, that is the end.
> With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
> the GC needs to kick in.

[...]

> >I don't know why you think readline is infamous but it's for providing a more
> >pleasant interface and not necessary. On the other hand , Guile certainly
> >uses the GNU multiprecision library and being able to handle arbitrarily
> >large integers *is* part of the core functionality. Since it uses the
> >Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
> >(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.
>
> (In the context of toy lisps that can be implemented in 1000 lines,
> I hate readline which is itself couple of hundred lines,
> is not written in lisp and is not strictly needed.
> I'm glad to see that it is a loadable extension to Guile.)
>
> BDW is a c-package? I'm disappointed, because people told me that
> Lisp can be implemented using 5 basic operations.

BDW (Boehm-Demers-Weiser) is not necessary. I only brought it up because in an
earlier post you said "The GC seems to be a difficult part. How can there be
one if there is no lisp yet to speak of? A c-library?" .Now you seem confident
that you can write your own garbage collector. In which case you can forget
about BDW.

> That means
> something different than saying that yourforth is implemented
> with 38 basic words (in assembler) apparently.
> In analysing what is minimally needed in Forth always
> KEY and EMIT come up. That is realistic. If you can't tell the
> computer what to do, and can't see what it has done, your
> language is incomplete. Lispers seem to think that REPL is
> a tucked on part.

Not in the slightest ; I can't imagine what gave you that idea. Incremental
development where you test things as you write them is valued in the Lisp
world as much as it is in the Forth world. Note that Common Lisp has the
function READ which reads from a stream and returns Lisp objects. It also has
extensive specifications about how the various standard Common Lisp objects
should be printed as well as the ability to define your own methods to print
user defined objects in any way you like.

> If you would program a LISP machine like I have a Forth machine
> (just a tower that has Forth in the boot sector), would it then contain
> a couple of giant binary blobs being the GNU multiprecision lib,
> the compiled readline lib and maybe even a compiled BDW library written
> in C?
> Eech!

GNU multiprecision lib , etc. are shared libraries so in order to work they
need an operating system or at least parts of it like a dynamic linker. The
readline library is just an interface and not one well suited for Lisp so it's
not relevant to writing an operating system in Lisp. An operating system in
Lisp probably doesn't need support for arbitrarily large integers either ; I'm
not sure that such support would be considered essential for a Lisp under any
circumstances but for desktop applications it's certainly convenient to have.
For garbage collection , a Lisp from scratch would be best served by a precise
garbage collector so not BDW. Actually , an operating system with garbage
collection from the bottom up would be a fascinating thing to have regardless
of the implementation language.

> > Jorgen Grahn
> > James K. Lowden
>
> Two names in a signature, but you're Spiros?

My multiple personality disorder has been acting up :-D

--
it's just that in C++ and the like, you don't trust _anybody_, and in CLOS you
basically trust everybody. the practical result is that thieves and bums use C++
and nice people use CLOS. :)
Erik Naggum

Spiros Bousbouras

unread,
Sep 6, 2016, 11:32:36 AM9/6/16
to
On Tue, 6 Sep 2016 11:48:12 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
> In article <87zinmm...@jester.gateway.pace.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
> >
> >BDW is a nice simple way to get GC for a small Lisp, but the higher
> >performance ones tend to use other methods. And of course Lisp is much
> >older than the GNU project. Traditionally bignums would have been
> >implemented in assembly language. The historical Lisp machines had
> >hardware-assisted GC, I believe.
>
> How can a c-package that has the same API as malloc ever know what
> is garbage to LISP? Why would my ALLOCATE fare any less than BDW
> if it has access to that same information?

The basic idea is as follows : the application which uses BDW
(Boehm-Demers-Weiser) always uses the replacements for malloc() and realloc()
BDW provides to allocate memory. So the BDW library internally knows all the
ranges of addresses which the application has allocated. When BDW needs to do
garbage collection then it scans all the "data" of the application meaning ,
heap , hardware stack and processor registers. Every time it finds something
which could be interpreted as a pointer to a section of allocated memory then
it marks that section as in use. Then it scans each such section for pointers
and continues recursively in the same fashion until it has traversed the whole
references graph. After it has done so , each allocated memory section which
has not being marked during the traversal as in being in use , can be freed.
For more details I will refer you to the following :

http://www.iecc.com/gclist/GC-algorithms.html#Conservative%20collection

Conservative collection

Conservative garbage collection makes use of the observation
that if you are not relocating (copying) objects, then you
need not be quite so certain about exactly what is a pointer.
It suffices to scan the root set (and objects) for any
pointer-like bit patterns, and treat those pointer-like bit
patterns as actual pointers while marking. The result is that
the "true" reachable objects are all found, along with a few
others. This works surprisingly well in practice (if a few
additional tricks are added) and often permits the use of
garbage collection with oblivious source code and compilers.

Conservative collection is very important because it permits
the use of garbage collection with programs that were not
written with garbage collection in mind. That is, it simpifies
the use of garbage collection with existing (non-garbage-collected)
libraries of code.

or

http://hboehm.info/gc/gcdescr.html
Conservative GC Algorithmic Overview
This is a description of the algorithms and data structures used in
our conservative garbage collector.

Note that the techniques work for any application whatsoever not just a Lisp
compiler.

> A package with an ALLOCATE and a FREE is not a garbage collector.
> Garbage collecting is about knowing what can be freed.
> (Merging free areas into one is done automatically and incrementally
> in my Forth ALLOCATE. I hope not that this is called garbage collection.
> My Forth ALLOCATE knows the size of all allocated area's, so it is
> superior to malloc() in all respects except speed.)

As I said in a previous post , the malloc() library also knows internally the
size of all allocated areas otherwise it wouldn't work. The advantage you have
if you have control of both the memory allocation "infrastructure" (which would
be your own Forth in this case) as well as the application (which would be the
Lisp compiler you want to write) is that the application can appropriately mark
all objects which are pointers so that you can have *precise* garbage
collection as opposed to conservative garbage collection ; in the latter , the
algorithm I describe above can mistake as pointers objects which the
application uses as integers simply because accidentally they have a value
which corresponds to one of the allocated memory areas. So the application is
not going to use such an object as a reference to other objects but the garbage
collector doesn't know this so it has to assume the worst (i.e. be
conservative) and treat it as a pointer.

Here's an example of how such marking might work : say you have an architecture
where all objects have to be aligned at even memory addresses. This means that
every bit pattern which corresponds to a pointer will have the 0-th bit be
zero. So a garbage collected programming language may use the 0-th bit as a tag
bit to indicate integers and make this always 1. So for example , if it wants
to store the integer 4 it will not store bit pattern 100 (or appropriate width)
but 1001. This of course means that in order to add 2 integers you cannot just
use the assembly instruction but you have to shift both of them 1 position
right , then add them as normal , then shift the result 1 position left and
finally make the 0-th bit 1 again to tag the result as an integer. Similarly
for other mathematical operations. This is comparatively slow (plus you
sacrifice one value bit as a tag) but in the grand scheme of things , which
includes efficient and precise garbage collection , it's worth it. If I
remember correctly , the OCaml implementation does something like this. But the
main point is that for such a scheme to work , there has to be cooperation
between the garbage collector and the application which runs on top of it.

> I can imagine there is a "fat" FREE to be called from lisp.
> It frees the area pointed to and recursively via pointers stored
> there all areas indirectly pointed to. I can add a fat FREE to
> my ALLOCATE in a heartbeat. But surely only Lisp can decide
> when to call it. If there are any aliasing pointers to one
> of those areas, it can't fatFREE it.

Actually the garbage collector tends to decide when to do garbage collection as
opposed to say requesting more memory from the operating system. Having said
that , if the programming language has a command to do garbage collection then
it's a plus. If for an example an application knows it's going to be idle for a
while , then it may as well ask for a garbage collection to happen.

--
It is better that some innocent men remain in jail than that the
integrity of the English judicial system be impugned.
https://en.wikipedia.org/wiki/Lord_Denning

Albert van der Horst

unread,
Sep 6, 2016, 3:19:02 PM9/6/16
to
In article <jkBzz.1477$xE....@fx35.am4>,
That is a whole of a lot better than the wikipedia entry. I feel like pasting
that whole text into wikipedia.
The crucial points are
the whole memory in use by the application is considered as potential pointers
conservative: garbage may not be freed, but if it is freed it is
certainly garbage

Neither of these points came through from wikipedia.

What about pointers in the middle of an area?
Do they prevent the area from being freed?

<SNIP>

Thanks for the helpful information.

Anton Ertl

unread,
Sep 6, 2016, 4:15:18 PM9/6/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> the whole memory in use by the application is considered as potential pointers

That's wrong. If it was, cyclic structures would not be collected,
and a lot of other stuff would also not be collected (e.g., even if
there was no spurious pointer to a dead linked list, it would only
collect one element (the current first one) per collection).

For an automatic collector (conservative or no), the root set is the
static memory (data and bss segments in Unix), stacks, and registers.
My garbage collector is not as automatic and requires you to register
the roots, because there is no standard way in Forth to scan
everything where there might be roots.

> conservative: garbage may not be freed, but if it is freed it is
> certainly garbage

Actually, no garbage collector guarantees that any memory it keeps is
accessed later. Conservative garbage collectors may keep stuff around
because something that is not a real pointer seems to point to it,
whereas precise collectors do not have this problem. Because precise
collectors know what is a pointer and what is not, they can also copy
the live data around.

>What about pointers in the middle of an area?
>Do they prevent the area from being freed?

That's up to the collector. My collector doesn't. In general, for
conservative collectors, that would significantly increase the number
of spurious pointers. It also depends on the application. In a basic
Lisp system, all pointers are to the start of objects, so there is no
point in handling pointers to the middle of an object.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

WJ

unread,
Sep 6, 2016, 4:52:33 PM9/6/16
to
Why this is marked as abuse? It has been marked as abuse.
Report not abuse
Spiros Bousbouras wrote:

> it's just that in C++ and the like, you don't trust anybody, and in CLOS you
> basically trust everybody. the practical result is that thieves and bums use C++
> and nice people use CLOS. :)
> Erik Naggum


Erik Naggum wrote:

> * Martin Rodgers
> | I guss Erik doesn't wish to heal any wounds.
>
> you _are_ the wound. if you go away, the wound will be healed. I really
> want the wound to heal: JUST GET THE HELL OUT OF MY FACE!
>
> and whoever gave you permission to quote from private communication?
>
> please go die, Martin Rogders. at least the maggots will like you.


Erik Naggum wrote:

> You are a clever little asshole, aren't you?

> you incredibly retarded jerk!

> Man, how can you survive being so goddamn _stupid_?

> insufferably arrogant moron.

> Geez, you are such an idiot.

> your exceptionally pitiful brain

> qualities are important? Or are you so unphilosophical and such a
> leering idiot with a moronic grin permanently attached to his skull that

> had been able to think at all, you would probably experience _several_
> revelations of such magnitude that one "god" would not be enough.



Erik Naggum wrote:

> | > You have to get at the bytes where they are actually found, just
> | > after they have been read, and just before they are written.
> | > Anything else is pretty damn stupid.
>
> | Sure. Who said otherwise?
>
> You have strongly implied "otherwise", you insuffable numbnut.

I have grudgingly to admit that this brilliant post proves
that Naggum was indubitably one of the leading luminaries of
the CL cosmos.

In fact, one could go so far as to say that he was the epitome
of all of the characteristics that make worshippers of CL so
special and precious.

If he were still alive, I would be sorely tempted to beseech
him for admittance into his pack of hyenas.

--
Jewish drug dealers, child porn pushers, and slave traders are free
from prosecution in Israel. Israel does not consider these to be
crimes ... so long as the victims are non-Jews.
www.theoccidentalobserver.net/2009/08/the-culture-of-deceit/

Paul Rubin

unread,
Sep 7, 2016, 1:03:23 AM9/7/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> Neither of these points came through from wikipedia.

You should probably look at the gc.html documentation file from Anton's
package, if you haven't already:

http://www.complang.tuwien.ac.at/forth/garbage-collection.zip

unfortunately it's not on its own web page afaik, so you have to extract
it from the zip file. It explains stuff like that pretty well.

Stefan Monnier

unread,
Sep 8, 2016, 12:12:15 PM9/8/16
to
>> the whole memory in use by the application is considered as potential
>> pointers
> That's wrong. If it was, cyclic structures would not be collected,

That depends on how you interpret "use".


Stefan

Albert van der Horst

unread,
Sep 8, 2016, 2:02:05 PM9/8/16
to
In article <jwvmvji74hy.fsf-mon...@gnu.org>,
In the context of conservative garbage collectors, I expect that sometimes
cyclic structures are not collected.

>
>
> Stefan

Paul Rubin

unread,
Sep 8, 2016, 3:34:32 PM9/8/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> In the context of conservative garbage collectors, I expect that sometimes
> cyclic structures are not collected.

There shouldn't be anything different about them from non-cyclic
structures as regards the gc finding them. Anton's point was that a
conservative gc doesn't scan all of memory; but rather, it recursively
traces outward from a predefined root set, like any other gc. If
there's a cyclic structure out there that's unreachable from the roots,
it will get collected.

George Neuner

unread,
Sep 8, 2016, 4:51:08 PM9/8/16
to

If you are really interested in how GC operates, you should read the
bible:

Garbage Collection: Algorithms for Automatic Dynamic Memory
Management; Richard Jones and Rafael D Lins; 1996.
ISBN-13: 978-0471941484
ISBN-10: 0471941484

Don't be too concerned about the age - this book is mainly theory and
little has changed since it was written.


For a survey of modern implementation techniques, see:

The Garbage Collection Handbook: The Art of Automatic Memory
Management; Richard Jones, Antony Hosking and Eliot Moss; 2011.
ISBN-13: 978-1420082791
ISBN-10: 1420082795

This book is light on theory ... it is not a good intro text, but IMO
it is invaluable to an implementor.


Hope this helps,
George

Paul Rubin

unread,
Sep 8, 2016, 7:34:21 PM9/8/16
to
George Neuner <gneu...@comcast.net> writes:
> The Garbage Collection Handbook: The Art of Automatic Memory
> Management; Richard Jones, Antony Hosking and Eliot Moss; 2011.

Is that something other than an update on the 1996 book?

I do know there have been some real advances since then, especially for
parallel gc, though that wouldn't be relevant to a basic Lisp
interpreter.

George Neuner

unread,
Sep 9, 2016, 12:12:28 AM9/9/16
to
The 2nd book is *not* an update of the 1st.

The 1996 book was mainly about theory [which really has _not_ changed
all that much]. The 2011 book is light on the theory and is mainly
about implementation techniques for modern processors, and it covers
HRT implementation (the older book covered only SRT).

The books complement each other. A practitioner might get by with
only the 2011 book, but someone looking for in-depth understanding
needs the 1996 as well.

George

Paul Rubin

unread,
Sep 9, 2016, 1:22:14 AM9/9/16
to
Thanks. I've only seen the 2011 one, which I liked. I'll look for the
1996 one as well. I didn't realize there was much theory behind the
subject to begin with.

Anton Ertl

unread,
Sep 12, 2016, 6:23:21 AM9/12/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>In article <jwvmvji74hy.fsf-mon...@gnu.org>,
>Stefan Monnier <mon...@iro.umontreal.ca> wrote:
>>>> the whole memory in use by the application is considered as potential
>>>> pointers
>>> That's wrong. If it was, cyclic structures would not be collected,
>>
>>That depends on how you interpret "use".

All memory that the collector has given out to the application and
that has not been reclaimed.

>In the context of conservative garbage collectors, I expect that sometimes
>cyclic structures are not collected.

If you use the "in use" memory as specified above as the root set,
then cyclic structures will never be collected.

Stefan Monnier

unread,
Sep 12, 2016, 11:47:20 AM9/12/16
to
>>>>> the whole memory in use by the application is considered as potential
>>>>> pointers
>>>> That's wrong. If it was, cyclic structures would not be collected,
>>> That depends on how you interpret "use".
> All memory that the collector has given out to the application and
> that has not been reclaimed.

I read it as: the memory that was found to be reachable.


Stefan

Anton Ertl

unread,
Sep 12, 2016, 12:59:18 PM9/12/16
to
But you need the pointers to find reachable memory, so your reading is
not sufficient for determining live memory; the other reading also has
it's problems, as pointed out above, and more importantly, it's not
what conservative collectors do, so I just would not use this wording
at all.

Stefan Monnier

unread,
Sep 12, 2016, 1:38:38 PM9/12/16
to
>>>>>>> the whole memory in use by the application is considered as potential
>>>>>>> pointers
>>>>> That's wrong. If it was, cyclic structures would not be collected,
>>>>> That depends on how you interpret "use".
>>> All memory that the collector has given out to the application and
>>> that has not been reclaimed.
>> I read it as: the memory that was found to be reachable.
> But you need the pointers to find reachable memory,

Right, and the conservative GC considers anything that looks like
a pointer to be pointer w.r.t deciding if something is reachable or not.
So you *do* have "the pointers".

> it's not what conservative collectors do, so I just would not use this
> wording at all.

I wouldn't either, but in the context I find it understandable.


Stefan

Albert van der Horst

unread,
Nov 2, 2016, 2:07:08 PM11/2/16
to
In article <%mzzz.1531332$jB.12...@fx43.am4>,
Spiros Bousbouras <spi...@gmail.com> wrote:
<SNIP>
>> Then the code is high level Forth code, and again it is allocated
>> dynamically and a pointer to it is present in the header.
>> (High level Forth code is, of course a list of code tokens.)
>> Contrary to malloc, my ALLOCATE knows a lot about
>> the size etc of pieces allocated.
>
>malloc() also knows about the size of pieces allocated otherwise it wouldn't
>work. The interface does not make such information available to the programmer
>but the internals of malloc() (and related functions) certainly need to know.

That is true but that is in behalf of freeing and resizing and is in
general not byte count precise (e.g. a buddy implementation).
lisp needs the exact size of symbols, bignums and strings, and they
are available via my ALLOCATE, saving a cell and some hassle.

Groetjes Albert

>--
>it's just that in C++ and the like, you don't trust _anybody_, and in CLOS you
>basically trust everybody. the practical result is that thieves and bums use C++
>and nice people use CLOS. :)
> Erik Naggum
>
0 new messages