I've cross posted to comp.lang.lisp
The context is writing a minimal lisp, but with Forth instead of
c as the implementation language.
In article <_p9zz.1142663$OC.4...@fx39.am4>,
Spiros Bousbouras <
spi...@gmail.com> wrote:
<SNIP>
>
>I don't think that garbage collection is something to be added as an
>afterthought. The existence of garbage collection affects the allocation (and
>possible recollection) of every object (see examples below) so it's one of
>the issues you need to be thinking about from the start. It's likely more
>fundamental than the syntax of the language. Having said that , it's not that
>big a hurdle. I don't know why you don't like Anton Ertl's package but there
>is also the Boehm-Demers-Weiser garbage collector which is used in Embeddable
>Common Lisp , Guile Scheme and was used in an early implementation of Racket
>Scheme. Even writing your own precise garbage collector is not terribly
>difficult , see
>
http://www.cs.cornell.edu/courses/cs312/2003fa/lectures/sec24.htm
>for a simple algorithm.
>
>> About garbage: Is this correct (the infamous Fibonacci series)?
>>
>> After:
>> (define (fib n)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i n ) b)))
>> the fib procedure remains there in all eternity and there is nothing
>> to collect. I can do (fib 10) any time I please.
>> Or will there be anonymous memory places and are the
>> boxes they are in such as (a,b,i,n) collectable?
>
>As long as you don't redefine it then yes it remains there for all eternity.
Maybe there is a confusion between dynamic allocation / knowing
exactly where each piece of memory hangs out, and garbage
collection.
Such a fib consist (in my view) of a Forth word header of 5 CELLS which
is dynamically allocated *using my Forth ALLOCATE* and pointed to by some
pointer sitting in some tree or list or hash as a global heap.
Then the code is high level Forth code, and again it is allocated
dynamically and a pointer to it is present in the header.
(High level Forth code is, of course a list of code tokens.)
Contrary to malloc, my ALLOCATE knows a lot about
the size etc of pieces allocated.
Of course I make sure that analysing the heap is possible to find
out what is no longer needed. But I call actually doing that
the garbage collection. There is no doubt in my mind that
I can implement a garbage collector even if it is
very inefficient. What one needs is all knowledge about pointers to
dynamically allocated area and into dynamically allocated area and
know which is which, and a means to know the sizes.
That may not be properly called an afterthought.
If I wouldn't keep that information from the start, an attempt to
write a Lisp would be ludricous indeed. If I want to investigate how
a Forth can morph into a lisp, I can't have an external tool
dictate me how to do the allocation. Using my own ALLOCATE I can change
it, if I want properties specifically needed for LISP
>But if you call it then the arguments fib and n as well as the local
>(lexical in Lisp parlance) variables i , a , b will have to be allocated
>somewhere. So whether they create garbage , which will need to be reclaimed
>eventually , depends on the allocation strategy of the implementation. A
>perfectly reasonable strategy is that the implementation gets a big chunk of
>memory from the operating system and every time it needs a new object it uses
>the smallest address from that space which it hasn't used yet. If the space
>fills up then the implementation may ask for more memory from the operating
>system or do garbage collection ; obviously , eventually it will have to do
>garbage collection rather than keep asking for more and more memory. With
>such an allocation scheme , every time you call fib and it returns then it
>will create garbage for fib , n , etc. Now a sufficiently clever Lisp
>implementation may be able to prove that none of the variables of fib can
>be referenced after fib terminates and therefore allocate them on the
>hardware stack as it would (likely) happen in a language like C.
>
en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
>not that clever then every call to fib creates garbage and if that garbage
>isn't cleared every now and again , it will consume eventually all available
>memory.
My Forth allocation strategy is simpler. At the start a large chunk
is asked from Linux (which overcommits memory). Then my ALLOCATE
gives out memory on a first free basis. Maybe I get stuck and have to
do garbage collection, which amounts to calling FREE from the allocator.
If the large stuck wasn't enough, that is the end.
With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
the GC needs to kick in.
>
>> But what with (directly calculating the 100-th fib number)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i 100 ) b )
>> )
>> 354224848179261915075
>> does that leave garbage in your run of the mill scheme?
>> (Because it might not in mine.)
>
>Same as above ; if the Lisp implementation is clever enough to see that
>i , a , b cannot be accessed after the piece of code terminates then it will
>allocate them in such a way as to not create garbage (like on the hardware
>stack) , otherwise they will be garbage. Note also , which Paul Rubin pointed
>out in a different post , that even 354224848179261915075 may create garbage
>if it's so large that it won't fit in a register. Since
>354224848179261915075 > 2^64 then even in a 64 bit machine it may create
>garbage. Even just writing on the REPL
> 354224848179261915075
>may create garbage.
That is interesting. The Forth REPL can interpret the line
.( aap)
and it will print aap. The string aap is a temporary object and
disappears automatically. It will not be allocated anywhere,
only a string descriptor will be on the Forth stack.
"aap" TYPE
likewise will print aap, otoh it will allocate a string (in ciforth).
Its descriptor is thrown away, so the string itself becomes garbage.
Normally in Forth this garbage is collected using FORGET/MARKER
a very primitive mechanism that basically only restores a previous
memory situation.
Thank you. good to know that. I have no objection to extensions of course.
My statically linked lina is 30 kbyte. If you want floating point,
you have to load it from a source library, so then you've to load an
assembler first from a source library.
>
>I don't know why you think readline is infamous but it's for providing a more
>pleasant interface and not necessary. On the other hand , Guile certainly
>uses the GNU multiprecision library and being able to handle arbitrarily
>large integers *is* part of the core functionality. Since it uses the
>Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
>(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.
(In the context of toy lisps that can be implemented in 1000 lines,
I hate readline which is itself couple of hundred lines,
is not written in lisp and is not strictly needed.
I'm glad to see that it is a loadable extension to Guile.)
BDW is a c-package? I'm disappointed, because people told me that
Lisp can be implemented using 5 basic operations. That means
something different than saying that yourforth is implemented
with 38 basic words (in assembler) apparently.
In analysing what is minimally needed in Forth always
KEY and EMIT come up. That is realistic. If you can't tell the
computer what to do, and can't see what it has done, your
language is incomplete. Lispers seem to think that REPL is
a tucked on part.
If you would program a LISP machine like I have a Forth machine
(just a tower that has Forth in the boot sector), would it then contain
a couple of giant binary blobs being the GNU multiprecision lib,
the compiled readline lib and maybe even a compiled BDW library written
in C?
Eech!
> Jorgen Grahn
> James K. Lowden
Two names in a signature, but you're Spiros?