
Implementing lisp in Forth. Chapter 4.


Albert van der Horst

Aug 29, 2016, 12:33:06 PM
The following is inspired by the website
http://www.buildyourownlisp.com
This site is beyond brilliant. A comparison with the Lisps in
Forth I've seen is ... well, let's not compare.

The idea behind this website is that the best way to learn
programming in C is to build a lisp.
We all know that the best way to learn Forth is to build a Forth.
That I've done already. Now I'll improve my Forth skills and,
probably, my Forth by trying to build a lisp.
Moreover I'll try to prove that ciforth is a superior implementation
language. That means I'll use its full power, without regard to ISO,
to build a lisp that is on a par with one written in C.
The lisp has the silly Dutch name "effelispe". I leave it to the community
to properly pronounce it in their own language.

I don't need to learn programming in C. So we start with chapter 4.
This is about making a REPL, a read-evaluate-print-loop, what we call
the high level interpreter.

The first version of the REPL is a program that you can call names
and then it gets right back at you.
Here is a session of this program from the site, as presented there.
*****************************************************
* Lispy Version 0.0.0.0.1 *
* Press Ctrl+c to Exit *
* *
* lispy> hello *
* No You're a hello *
* lispy> my name is Dan *
* No You're a my name is Dan *
* lispy> Stop being so rude! *
* No You're a Stop being so rude! *
* lispy> *
*****************************************************

My version handling is done by RCS so my init is

: init "
effelispe snapshot $Revision: 1.2 $ $Date: 2016/08/29 16:32:15 $
Supplied under GPL 2.0, quality but no warranty
Press Ctrl+c to Exit
" TYPE ;

The prompt is handled by revectoring OK:

: prompt "
effelispe>" TYPE ;
'prompt >DFA @ 'OK >DFA !

This is a prompt in the sense of "Hurry up dummy!" as Charles Moore
put it, so the program will need an extra prompt between the init
and the REPL. Our REPL is of course just QUIT.

: repl init prompt QUIT ;

That was tough. Now for the fun part.
We need a word that reflects back its input, like so:

WANT .FORMAT
: YOU'RE NAME "No You're a %s %n" .FORMAT ; PREFIX

Let's try it

*****************************************************
* YOU'RE douche-bak *
* No You're a douche-bak *
* OK *
* YOU'REdouche-bak *
* No You're a douche-bak *
*****************************************************

Because it is a prefix, it can get a TOKEN that is glued directly to it.
Let's get the name of NAME correct while we're at it:
"TOKEN" $, 'NAME >NFA !
The word NAME has its name changed to TOKEN.

The rest is easy:
"" $, 'YOU'RE >NFA !
Now YOU'RE has become an empty prefix that will match everything.

After gathering the code, an executable is made with:
lina64 -c effelispe.frt

Let's see whether it behaves.
* *******************************************************************
* ~/PROJECT/ciforth/lisp$ effelispe *
* *
* effelispe snapshot $Revision: 1.2 $ $Date: 2016/08/29 16:32:15 $ *
* Supplied under GPL 2.0, quality but no warranty *
* Press Ctrl+c to Exit *
* *
* effelispe>douche-bak *
* No You're a douche-bak *
* *
* effelispe>kloho *
* No You're a kloho *
* *
* effelispe>^C *
* ~/PROJECT/ciforth/lisp$ *
* *******************************************************************

For good measure here is version 4 of the program. (I'll follow
the chapter numbering.)
----------------------------------8<--------------------------------
\ $Id: chap4.txt,v 1.2 2016/08/29 16:32:15 albert Exp $
\ Copyright (2016): Albert van der Horst {by GNU Public License}

WANT .FORMAT

: init "
effelispe snapshot $Revision: 1.2 $ $Date: 2016/08/29 16:32:15 $
Supplied under GPL 2.0, quality but no warranty
Press Ctrl+c to Exit
" TYPE ;

\ Replace Forth prompt.
: prompt CR "effelispe>" TYPE ;
'prompt >DFA @ 'OK >DFA !

\ Change the name of NAME to TOKEN.
"TOKEN" $, 'NAME >NFA !

\ Calling names.
: YOU'RE TOKEN "No You're a %s %n" .FORMAT ; PREFIX
: install-repl "" $, 'YOU'RE >NFA ! ;

: doit install-repl init prompt QUIT ;

----------------------------------8<--------------------------------

The installation of the repl must wait till run time, otherwise
the compiler starts calling you names ("No You're a TURKEY").

Groetjes Albert

P.S. I hope douche-bak is not too offensive. I heard it in comedy
series like Southpark and I've no idea what it means.
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Paul Rubin

Aug 29, 2016, 12:39:09 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> P.S. I hope douche-bak is not too offensive. I heard it in comedy
> series like Southpark and I've no idea what it means.

The word is "douche bag" and you could look it up :).

Paul Rubin

Aug 29, 2016, 12:46:14 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> The following is inspired by the website
> http://www.buildyourownlisp.com
> This site is beyond brilliant. A comparison with the lisp's in
> Forth I've seen is ... well let's not compare.

You might also like:

https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

Or just look at source code of many of the Lisps out there that are
written in C or in Lisp itself.

You should probably also read SICP at some point:

https://mitpress.mit.edu/sicp/

It looks like the Lisp from buildyourownlisp.com is very limited.

Albert van der Horst

Aug 30, 2016, 4:41:33 AM
In article <87fupnp...@jester.gateway.pace.com>,
I forgot the smiley. Actually douchebak is the Dutch name for
the part of a shower (a "douche") that catches the water, and kloho is not a word,
but could be associated with an offensive pejorative.

If HA publishes his Straight Forth, the publisher will ask him
to change his "hotties" example. I will have none of that.

Groetjes Albert

Albert van der Horst

Aug 30, 2016, 4:58:49 AM
In article <87bn0bp...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> The following is inspired by the website
>> http://www.buildyourownlisp.com
>> This site is beyond brilliant. A comparison with the lisp's in
>> Forth I've seen is ... well let's not compare.
>
>You might also like:
>
>https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

This looks similar, but I don't know Haskell. On the other hand,
C I know inside out.

>
>Or just look at source code of many of the Lisps out there that are
>written in C or in Lisp itself.

Done that, but a step by step instruction for a familiar
language with lots of explanation is hard to beat. I've a lisp
in C and it works. I've seen the undocumented lisp in Forth
and I can't even run the example.
I estimate my chances that a 10 hour search would turn up something
better as very slim.

>
>You should probably also read SICP at some point:
>
> https://mitpress.mit.edu/sicp/

I'm almost at the end.

>
>It looks like the Lisp from buildyourownlisp.com is very limited.

Please elaborate.

I see chapter 16, "Only the Beginning".

This is what the author proposes to add with some remarks about how
to do it. That suggests to me that the lisp generated is a reasonable
starting point.

Native Types
User Defined Types
List Literal
Operating System Interaction
Macros
Variable Hashtable
Pool Allocation
Garbage Collection
Tail Call Optimisation
Lexical Scoping
Static Typing
Conclusion

Which of those are in the 1000-line toy Lisps that are written
in Forth or C anyway?
Note that my ambition goes no further than writing such a toy
lisp but with mature implementation techniques such that one could
seriously consider Forth to be an implementation language for other
languages.

Thanks for the comment. I hope you will look at the next chapters too.
The next step is parsing. It generates data structures. It will require
major design decisions about how to do that in Forth. My hope is that by
allocating part of those structures (objects) on the data stack, much
of the freeing will be swift and painless.

Groetjes Albert

Rod Pemberton

Aug 30, 2016, 7:59:52 AM
On Tue, 30 Aug 2016 11:07:01 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:

> Paul Rubin <no.e...@nospam.invalid> wrote:

> >Or just look at source code of many of the Lisps out there that are
> >written in C or in Lisp itself.
>
> Done that, but a step by step instruction for a familiar
> language with lots of explanation is hard to beat. I've a lisp
> in C and it works. I've seen the undocumented lisp in Forth
> and I can't even run the example.
> I estimate my changes for a 10 hour search to get something
> better very slim.
>

LISP can be built up from a small set of primitives just like Forth.

Stutter lisp
car, cdr, cons, equal, if, lambda, quote, set
https://groups.google.com/d/msg/comp.lang.lisp/priOn-wu2Ek/cNlWzkYxJ0kJ
Message-ID: <3985f411.278996525@judy>#1/1

McCarthy
atom, car, cdr, cons, eq
https://groups.google.com/d/msg/comp.lang.lisp/k8PZ8a9qJwc/iGWw9r8CO9EJ
Message-ID: <11...@wrs.wrs.com>

common
atom, car, cdr, cond, cons, eq, lambda, quote
https://groups.google.com/d/msg/comp.lang.lisp/NYMDTjWKJPg/j-bQ2NRynpgJ
Message-ID: <87vff8x...@thalassa.informatimago.com>

minimal CL
atom, car, cdr, cond, cons, defun, eq, quote
https://groups.google.com/d/msg/comp.lang.lisp/hYJBg3PzgDw/UAQ4ZcsQ-E4J
Message-ID: <1153129866....@m73g2000cwd.googlegroups.com>
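
To illustrate, sticking roughly to the last set above (car, cdr, cond,
cons, eq plus a definition form), the usual list utilities can be
derived. A sketch in Scheme, with made-up my- names, assuming symbols
as keys:

(define (my-append xs ys)
  (cond ((eq? xs '()) ys)
        (else (cons (car xs) (my-append (cdr xs) ys)))))

(define (my-assoc key alist)          ; alist is a list of (key . value) pairs
  (cond ((eq? alist '()) '())
        ((eq? (car (car alist)) key) (car alist))
        (else (my-assoc key (cdr alist)))))

(define (my-map f xs)
  (cond ((eq? xs '()) '())
        (else (cons (f (car xs)) (my-map f (cdr xs))))))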


Rod Pemberton

Paul Rubin

Aug 31, 2016, 5:23:41 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> Native Types ...
> Which of those are in the 1000 lines toy lisps that is written
> in Forth or c anyway?

That is all pretty standard, though 1kloc may be a bit light. You could
look at the early versions of SIOD (Scheme In One Day) by George
Carrette, or more recently Tinyscheme (used in GIMP etc).

> Thanks for the comment. I hope you will look at the next chapters too.
> The next step is parsing. It generates data structures. It will require
> major design decisions how to do that in Forth. My hope is that by
> allocating part of those structures (objects) on the data stack, much
> of the freeing will be swift and painless.

That's not realistic, a useful Lisp needs garbage collection.

I'm not impressed by that book's approach to parsing either (using fancy
parser combinators written in C). Lisp is syntactically very simple,
almost like Forth. Forth just uses non-whitespace tokens delimited by
whitespace, giving a linear structure. Lisp adds parentheses to denote
lists of conses, so you get tree structure.
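
To make that concrete, here is a minimal sketch of such a reader in
Scheme (my own illustration, not the book's parser-combinator approach;
it handles only integers, symbols and parentheses, assumes well-formed
input, and the names tokenize/parse/read-sexp are made up):

(define (tokenize chars)
  (cond ((null? chars) '())
        ((char-whitespace? (car chars)) (tokenize (cdr chars)))
        ((memv (car chars) '(#\( #\)))        ; parens are one-character tokens,
         (cons (string (car chars)) (tokenize (cdr chars))))
        (else (let loop ((cs chars) (acc '()))
                (if (or (null? cs)
                        (char-whitespace? (car cs))
                        (memv (car cs) '(#\( #\))))
                    (cons (list->string (reverse acc)) (tokenize cs))
                    (loop (cdr cs) (cons (car cs) acc)))))))

; parse one expression; returns a pair: (expression . remaining-tokens)
(define (parse tokens)
  (let ((tok (car tokens)))
    (cond ((string=? tok "(") (parse-list (cdr tokens) '()))
          ((string->number tok) (cons (string->number tok) (cdr tokens)))
          (else (cons (string->symbol tok) (cdr tokens))))))

(define (parse-list tokens acc)
  (if (string=? (car tokens) ")")
      (cons (reverse acc) (cdr tokens))
      (let ((sub (parse tokens)))
        (parse-list (cdr sub) (cons (car sub) acc)))))

(define (read-sexp s) (car (parse (tokenize (string->list s)))))

; (read-sexp "(* 1 20 (+ 1 2 3))")  =>  (* 1 20 (+ 1 2 3))

Because ) is a delimiter in tokenize, a trailing "baz)))" splits into
baz ) ) ) by itself.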

I haven't completely thought this through so maybe there is a fatal
flaw, but if I wanted to write a Lisp in Forth, I'd consider this
approach:

- write the reader and printer in Lisp, using an existing Lisp for
bootstrapping and testing.

- Write the usual Lisp primitives (CONS, CAR, CDR) etc. as Forth words
that call alloc to get storage for new conses.

- Have the reader intern new symbols into the Forth dictionary (might
need carnal knowledge or hacking of your Forth) putting DEFER
around everything, since Lisp uses late binding. Self-evaluating
symbols would call a Forth word that takes an address arg.
(define (f x) ...) would make an IS to actually bind the "f" symbol
(earlier interned as a deferred) to a code address.

- Use Anton's Boehm-style garbage collector instead of attempting
precise GC

- Don't have an evaluator but instead have a compiler written in Lisp,
that tree-walks Lisp functions and emits Forth code, either as
threaded code to be run by your Forth's address interpreter, or as
Forth source code that you'd pass through a text interpreter. (It
wouldn't look like human-written code because of all the deferred
symbols, the use of special Lisp operators like "Lisp+" instead of "+"
that would have to dispatch on types, etc. But still, now your Lisp
code can call Forth words and vice versa, maybe using a small
 FFI to convert between tagged and untagged values.)
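
As a toy illustration of that last bullet (a sketch in Scheme; it
ignores arity, types and the deferred-symbol machinery, and lisp+ /
lisp* are placeholder names):

(define (compile-expr e)               ; expression tree -> flat postfix word list
  (cond ((number? e) (list e))         ; literals push themselves
        ((symbol? e) (list e))         ; a (deferred) word reference
        (else                          ; (op a b ...) -> code for a, b, ... then op
         (append (apply append (map compile-expr (cdr e)))
                 (list (case (car e)
                         ((+) 'lisp+)
                         ((*) 'lisp*)
                         (else (car e))))))))

; (compile-expr '(* 1 20 (+ 1 2 3)))  =>  (1 20 1 2 3 lisp+ lisp*)
; A real compiler would also have to pass an argument count, since
; Lisp's + and * are variadic while Forth's are binary.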

Albert van der Horst

Aug 31, 2016, 8:07:16 PM
In article <8737lkp...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> Native Types ...
>> Which of those are in the 1000 lines toy lisps that is written
>> in Forth or c anyway?
>
>That is all pretty standard, though 1kloc may be a bit light. You could
>look at the early versions of SIOD (Scheme In One Day) by George
>Carrette, or more recently Tinyscheme (used in GIMP etc).
>
>> Thanks for the comment. I hope you will look at the next chapters too.
>> The next step is parsing. It generates data structures. It will require
>> major design decisions how to do that in Forth. My hope is that by
>> allocating part of those structures (objects) on the data stack, much
>> of the freeing will be swift and painless.
>
>That's not realistic, a useful Lisp needs garbage collection.

Don't get me wrong, I understand that. But some allocate/free pairs
may be saved by using our usual stack approach for some of the
temporary data.

>
>I'm not impressed by that book's approach to parsing either (using fancy
>parser combinators written in C). Lisp is syntactically very simple,
>almost like Forth. Forth just uses non-whitespace tokens delimited by
>whitespace, giving a linear structure. Lisp adds parentheses to denote
>lists of conses, so you get tree structure.

I was going to implement the parentheses as a prefix/recognizer, and no way
is an overkill C library going to be used.

>
>I haven't completely thought this through so maybe there is a fatal
>flaw, but if I wanted to write a Lisp in Forth, I'd consider this
>approach:

There may be a fatal flaw. The lists of lvalues are not linked but
are allocated arrays. If an lvalue is appended, the array is
realloced. If the first value is to be removed, the array is moved
down. This looks like O(N^2) but it all depends on what lvalues are
and how long these lists grow. It may be a reasonable design choice
but at first sight it smells.

>
>- write the reader and printer in Lisp, using an existing Lisp for
> bootstrapping and testing.
>
>- Write the usual Lisp primitives (CONS, CAR, CDR) etc. as Forth words
> that call alloc to get storage for new conses.
>
>- Have the reader intern new symbols into the Forth dictionary (might
> need carnal knowledge or hacking of your Forth) putting DEFER
> around everything, since Lisp uses late binding. Self-evaluating
> symbols would call a Forth word that takes an address arg.
> (define (f x) ...) would make an IS to actually bind the "f" symbol
> (earlier interned as a deferred) to a code address.
>
>- Use Anton's Boehm-style garbage collector instead of attempting
> precise GC
>
>- Don't have an evaluator but instead have a compiler written in Lisp,
> that tree-walks Lisp functions and emits Forth code, either as
> threaded code to be run by your Forth's address interpreter, or as
> Forth source code that you'd pass through a text interpreter. (It
> wouldn't look like human-written code because of all the deferred
> symbols, the use of special Lisp operators like "Lisp+" instead of "+"
> that would have to dispatch on types, etc. But still, now your Lisp
> code can call Forth words and vice versa, maybe using a small
> FFI to convert between tagged and untagged values.

My idea was for e.g.
(* 1 20 (+ 1 2 3) )
a LISP vocabulary with compiling words
( prefix
*
+
-
- prefix
0 prefix
...

9 prefix

All words are compiling.
Then after the line is compiled, there is a tree structure
that, if EXECUTEd in the Forth sense, results in an evaluation
of the expression.

So the parsing of 12
results in a loose Forth word that pushes the atom 12 to the stack.
My PREFIXes have no trouble with that.

Something along the lines of
'BL >CFA @ CONSTANT docon \ code routine for constants.

HERE 1)
docon , \ code field
( 12 ) , \ data field
_ , \ flag field
_ , \ name field
_ , \ link field
_ , \ source field

The HERE can now be compiled into a list using , (puristically COMPILE, ).

The above example, decompiled, would look like
: temp1 + 1 2 3 ;
: temp2 * 1 20 temp1 ;
The * in the lisp vocabulary would look like

: *
DEPTH temp !
CO \ execute remainder of caller
BEGIN DEPTH 1+ temp @ <> WHILE * ( Forth's *) REPEAT ;

(I see that you name that Lisp* )

I'll save this article of yours. There is a lot in it that I
do not yet understand.

This Saturday Willem Ouwerker is going to present a two- and a
six-legged robot at a combined meeting of the Robot and AI
user groups. I'll meet an AI Lisp expert there.

Groetjes Albert

1)
(Of course all dummy fields can be left out, as docon doesn't
inspect them, but otherwise it looks like I was implementing LIT).

Paul Rubin

Sep 1, 2016, 10:49:18 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>a useful Lisp needs garbage collection.
> Don't get me wrong, I understand that. But some allocate/free pairs
> may be saved by using our usual stack approach

Yes probably true, that's just an optimization though, and it's easiest
to follow a straightforward approach first. If there's a GC you might
as well use it, especially for stuff like the reader (Lisp terminology
for the thing that parses S-expressions). Your Lisp isn't going to
spend much of its time doing that.

> Was gonna implement the parentheses as prefix/recognizer

Would that also handle trailing parens (foo (bar (.... baz))) instead of
treating "baz)))" as a Forth word?

> There may be a fatal flaw. The lists of lvalues are not with links but
> are allocated arrays.

I'm not sure what you mean by that: traditionally you'd use cons nodes
that each hold two values.

> a LISP vacobulary with compiling words
> ( prefix

I wouldn't try to use the Forth text interpreter directly as a Lisp
reader. You start wanting read macros written in Lisp, etc. Assuming
the target machine is a PC or comparable, I wouldn't worry about the
efficiency of the reader. I doubt you're planning to run megabytes of
Lisp code.

> This Saturday Willem Ouwerker is going to present a two- and a
> six-legged robot at a combined meeting of the Robot and AI
> user groups. I'll meet an AI Lisp expert there.

Nice. It occurs to me given your interest in Euler problems, that your
Lisp should also have bignums.

Albert van der Horst

Sep 2, 2016, 5:41:00 AM
In article <87twdzn...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>>a useful Lisp needs garbage collection.
>> Don't get me wrong, I understand that. But some allocate/free pairs
>> may be saved by using our usual stack approach
>
>Yes probably true, that's just an optimization though, and it's easiest
>to follow a straightforward approach first. If there's a GC you might
>as well use it, especially for stuff like the reader (Lisp terminology
>for the thing that parses S-expressions). Your Lisp isn't going to
>spend much of its time doing that.

That approach will be brutally straightforward, I promise.
The GC seems to be a difficult part. How can there be one if there
is no lisp yet to speak of? A c-library?
The whole point of Forth is that such a reader be modular and
handled by the Forth interpreter. That is the crux of this
exercise.

>
>> Was gonna implement the parentheses as prefix/recognizer
>
>Would that also handle trailing parens (foo (bar (.... baz))) instead of
>treating "baz)))" as a Forth word?

You got me here. I'll have to change PARSE-NAME aka TOKEN.

: TOKEN
_ BEGIN DROP PP@@ ?BLANK OVER SRC CELL+ @ - AND 0= UNTIL
_ BEGIN DROP PP@@ ?BLANK UNTIL
OVER - ;

I'll need a TOKEN2 with the second ?BLANK replaced by ?BLANK-OR-)
: ?BLANK-OR-) DUP &) = SWAP ?BLANK OR ;

Or I may just require
(+ 1 2 3 baz )
instead of
(+ 1 2 3 baz)

>
>> There may be a fatal flaw. The lists of lvalues are not with links but
>> are allocated arrays.
>
>I'm not sure what you mean by that: traditionally you'd use cons nodes
>that each hold two values.
>
>> a LISP vacobulary with compiling words
>> ( prefix
>
>I wouldn't try to use the Forth text interpreter directly as a Lisp
>reader. You start wanting read macros written in Lisp, etc. Assuming
>the target machine is a PC or comparable, I wouldn't worry about the
>efficiency of the reader. I doubt you're planning to run megabytes of
>Lisp code.

>
>> This saturday Willem Ouwerker is going to present a two and a
>> sex legged robot at a combined meeting of the Robot and AI
>> user groups. I'll meet an AI Lisp expert there.
>
>Nice. It occurs to me given your interest in Euler problems, that your
>Lisp should also have bignums.

You got me wrong. I don't want to write a Lisp. I want to demonstrate
that Forth can be used to write a Lisp.

Groetjes Albert

Albert van der Horst

Sep 2, 2016, 6:25:53 AM
In article <20160830080019.141d1792@_>,
Interesting and hope-giving. I feel that this may not be the
same as what Forthers mean by a minimal set.

How could you multiply two numbers in such a lisp, or
write a program to do that, or even represent numbers?
The only way I can think of is the Turing machine way,
using a sequence of loose bits or maybe digits.
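
For instance, in the spirit of "a sequence of loose bits or digits", a
sketch in Scheme using only cons, cdr, cond and eq? (the names are made
up, and it is hopelessly inefficient):

; unary numbers: 0 is (), 3 is (x x x)
(define (nil? n) (eq? n '()))
(define (add a b)
  (cond ((nil? a) b)
        (else (cons 'x (add (cdr a) b)))))
(define (mul a b)
  (cond ((nil? a) '())
        (else (add b (mul (cdr a) b)))))
; (mul '(x x x) '(x x))  =>  (x x x x x x)   i.e. 3 * 2 = 6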

>Rod Pemberton

Paul Rubin

Sep 4, 2016, 2:28:59 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> The GC seems to be a difficult part. How can there be one if there
> is no lisp yet to speak of? A c-library?

Anton Ertl has already written one in Forth:

http://www.complang.tuwien.ac.at/forth/garbage-collection.zip

It supplies an ALLOC word that gives you a chunk of memory of your
desired size. There is no corresponding FREE, since ALLOC notices if
you're running low on memory, and performs a GC if you are. I'm
suggesting that you use Anton's package and write Lisp-callable words
like CONS as Forth words that use ALLOC, so the GC part is already done
for you.

> The whole point of Forth is that such a reader be modular and handled
> by the Forth interpreter. That is the crux of this exercise.

If the exercise is just to write a Lisp reader in Forth, then ok; but in
terms of getting a Lisp working, the reader is just a component.

>> instead of treating "baz)))" as a Forth word?

> I'll need a TOKEN2 with the second ?BLANK replaced by ?BLANK-OR-)
> : ?BLANK-OR-) DUP &) = SWAP ?BLANK OR ;

There are a few other characters like ' (quote) that you'll also want to
treat specially. It's not terrible but I don't see much gain from this
approach.

> Or I may just require (+ 1 2 3 baz )

Noooo......

> You got me wrong. I don't want to write a Lisp. I want to demonstrate
> that Forth can be used to write a Lisp.

I'm rather liking the idea of a Forth with a Lisp extension. You could
then use Forth directly to write the realtime or speed-intensive parts
of a program and parts that need to address raw memory, and Lisp for
higher-level parts, UI, etc.

Albert van der Horst

Sep 4, 2016, 9:23:00 PM
Newsgroups: comp.lang.forth
References: <nq1ojd$rvh$1...@cherry.spenarnc.xs4all.nl> <87twdzn...@jester.gateway.pace.com> <nqbhus$4fp$1...@cherry.spenarnc.xs4all.nl> <87fupfo...@jester.gateway.pace.com>
Subject: Re: Implementing lisp in Forth. Chapter 4.

In article <87fupfo...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> The GC seems to be a difficult part. How can there be one if there
>> is no lisp yet to speak of? A c-library?
>
>Anton Ertl has already written one in Forth:
>
> http://www.complang.tuwien.ac.at/forth/garbage-collection.zip
>
>It supplies an ALLOC word that gives you a chunk of memory of you
>desired size. There is no corresponding FREE, since ALLOC notices if
>you're running low on memory, and performs a GC if you are. I'm
>suggesting that you use Anton's package and write Lisp-callable words
>like CONS as Forth words that use ALLOC, so the GC part is already done
>for you.

The constraints that package imposes interfere with the non-existing
constraints I will impose on a yet non-existent lisp.
I tell you what I'll do. I just ALLOCATE away in my 16 Gbyte space.
If I know I can FREE I'll do it. Then I'll add GC as an afterthought.
That that is possible will be the final proof that my design was okay.

About garbage: Is this correct (the infamous Fibonacci series)?

After:
(define (fib n)
  (do ((i 1 (+ 1 i))
       (a 1 (+ a b))
       (b 1 a))
      ((>= i n) b)))
the fib procedure remains there in all eternity and there is nothing
to collect. I can do (fib 10) any time I please.
Or will there be anonymous memory places and are the
boxes they are in such as (a,b,i,n) collectable?

But what about (directly calculating the 100-th fib number)
(do ((i 1 (+ 1 i))
     (a 1 (+ a b))
     (b 1 a))
    ((>= i 100) b))
354224848179261915075
does that leave garbage in your run of the mill scheme?
(Because it might not in mine.)

In the procedure fib can I optimise the fib into machine code with
(optimise fib)
that optimises the underlying Forth code, without getting on
someone's toes?
Obviously that may leave some garbage that may be harder to collect.

>
>> The whole point of Forth is that such a reader be modular and handled
>> by the Forth interpreter. That is the crux of this exercise.
>
>If the exercise is just to write a Lisp reader in Forth, then ok; but in
>terms of getting a Lisp working, the reader is just a component.

No it isn't, not in the Forth philosophy. The reader is the glue and
Lisp doesn't exist, it is just turtles all the way down.
(If you don't get that, look at the number formatting in Forth, or
better still control flow. There is no number formatting. There are
just simple words, calling even simpler words.)

(It may well be that Lisp doesn't exist in the Lisp philosophy,
just cdr's all the way down to the atoms. Dunno yet.)
>
>>> instead of treating "baz)))" as a Forth word?
>
>> I'll need a TOKEN2 with the second ?BLANK replaced by ?BLANK-OR-)
>> : ?BLANK-OR-) DUP &) = SWAP ?BLANK OR ;
>
>There are a few other characters like ' (quote) that you'll also want to
>treat specially. It's not terrible but I don't see much gain from this
>approach.

It is not the gain, it is the loss. I lose an interpreter for Backus
Naur forms. Less is more.

>
>> Or I may just require (+ 1 2 3 baz )
>
>Noooo......

It is my party ... But as it just requires a couple of bytes,
I will make it even more complicated:

: ?BLANK-OR-SPECIAL >R """'])}" R@ $^ 0<> R> ?BLANK OR ;

>
>> You got me wrong. I don't want to write a Lisp. I want to demonstrate
>> that Forth can be used to write a Lisp.
>
>I'm rather liking the idea of a Forth with a Lisp extension. You could
>then use Forth directly to write the realtime or speed-intensive parts
>of a program and parts that need to address raw memory, and Lisp for
>higher-level parts, UI, etc.

I just read the description of Guile. That is a Scheme dialect and Scheme
is supposed to be simple. I can't detect the simplicity in it (not
from the info file). It kind of deters me.
I see that the guile binary is 3K, that seems okay, but it calls
horrific c-libraries like the infamous readline, maybe even malloc.

Paul Rubin

Sep 5, 2016, 12:03:39 AM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> http://www.complang.tuwien.ac.at/forth/garbage-collection.zip
> The constraints that package imposes interferes with the non-existing
> constraints I will impose on a yet non-existent lisp.

I'd be interested to know what your objections are. The technique that
package uses works very well and its C counterpart is used in many
real-world systems.

> I tell you what I'll do. I just ALLOCATE away in my 16 Gbyte space.
> If I know I can FREE I'll do it. Then I'll add GC as an afterthought.

Well, that's a start, though if you were doing a serious implementation
then a lot of the implementation details would be enmeshed in the gc
strategy so you'd want to think it all through from the beginning.

> the fib procedure remains there in all eternity and there is nothing
> to collect. I can do (fib 10) any time I please. Or will there be
> anonymous memory places and are the boxes they are in such as
> (a,b,i,n) collectable?

It depends on your scheme's implementation details, the number
representation, etc. But if they are fixnums and you have a compiler
that looks for such chances, that function can avoid consing. A simple
implementation would just cons everything. CPython works like that (it
caches 100 or so small integers and conses/gc's the rest) and tons of
people use it. It's not as bad as it sounds.

> 354224848179261915075 does that leave garbage in your run of the mill
> scheme?

Yes, bignums are usually heap-allocated so if you create them as
intermediate results, they will leave garbage.

> In the procedure fib can I optimise the fib into machine code with
> (optimise fib)
> that optimises the underlying Forth code, without getting on
> someones toes?

In principle that could be ok. You could look at Chicken Scheme, which
compiles Scheme to C if I remember properly.

> Obviously that may leave some garbage that may be harder to collect.

I don't think you should worry about optimizing this, especially for
your first implementation. Just spew garbage wherever it's convenient
and let the GC take care of it.

> No it isn't, not in the Forth philosophy. The reader is the glue and
> Lisp doesn't exist, it is just turtles all the way down.

But I thought the idea of implementing Lisp was so you could program
with the Lisp philosophy, which is much different from the Forth
philosophy. Forth is like building a brick outhouse, carefully
cementing each brick in place before moving to the next, getting
something compact and solid. Lisp is more like dancing in the air. You
can create a palace of data with a wave of your hand, use it for
something, then fly away from it, leaving the GC to reclaim the building
materials for you to use in your next adventure.

> (It may well be that Lisp doesn't exist in the Lisp philosophy,
> just cdr's all the way down to the atom's. Dunno yet.)

Lisp very much exists in the traditional Lisp philosophy (you can build
up programs and eval them), even though this isn't supported in Scheme.

> It is not the gain, it is the loss. I lose an interpreter for Backus
> Naur forms. Less is more.

There's no BNF in a usual Lisp reader, but rather just a simple
recursive procedure that recognizes a few special characters and
otherwise acts like a Forth parser. But you then have a tree to
manipulate, maybe for macro expansion or other purposes. It's easier to
handle that in Lisp than in Forth.

> I just read the description of Guile. That is a Scheme dialect and
> Scheme is supposed to be simple. I can't detect the simplicity in it
> (not from the info file). It kind of deters me. I see that guile
> binary is 3K, that seems okay, but it calls horrific c-libraries like
> the infamous readline, maybe even malloc.

Guile is quite large and complex these days, maybe to compete with
higher end Javascript implementations like v8. That 3k binary is
certainly a dynamic loader client that pulls in other libraries. It
looks like libguile-2.0.so.22 on my system is about 1.6MB and it also
pulls in a lot of other code.

You might look at this old (2.0) version of SIOD if you want to see a
minimal Scheme:

http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/impl/siod/siod_20.tgz

This version is nice because it has two different gc's (mark-sweep and
copying) where you choose which one you want with a command line option.
Each gc is just a few dozen LOC. Earlier versions only had the copying
GC. Later versions became bigger and more complicated, maybe more
useful for practical purposes, but less easy to study.

SIOD stands for Scheme In One Day because the original version was
supposedly written in under 24 hours. It's around 1 KLOC of C.

Spiros Bousbouras

Sep 5, 2016, 3:47:08 AM
On Mon, 5 Sep 2016 03:31:23 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
>
> In article <87fupfo...@jester.gateway.pace.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
> >alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> >> The GC seems to be a difficult part. How can there be one if there
> >> is no lisp yet to speak of? A c-library?
> >
> >Anton Ertl has already written one in Forth:
> >
> > http://www.complang.tuwien.ac.at/forth/garbage-collection.zip
> >
> >It supplies an ALLOC word that gives you a chunk of memory of you
> >desired size. There is no corresponding FREE, since ALLOC notices if
> >you're running low on memory, and performs a GC if you are. I'm
> >suggesting that you use Anton's package and write Lisp-callable words
> >like CONS as Forth words that use ALLOC, so the GC part is already done
> >for you.
>
> The constraints that package imposes interferes with the non-existing
> constraints I will impose on a yet non-existent lisp.
> I tell you what I'll do. I just ALLOCATE away in my 16 Gbyte space.
> If I know I can FREE I'll do it. Then I'll add GC as an afterthought.
> That that is possible will be the final proof that my design was okay.

I don't think that garbage collection is something to be added as an
afterthought. The existence of garbage collection affects the allocation (and
possible recollection) of every object (see examples below) so it's one of
the issues you need to be thinking about from the start. It's likely more
fundamental than the syntax of the language. Having said that , it's not that
big a hurdle. I don't know why you don't like Anton Ertl's package but there
is also the Boehm-Demers-Weiser garbage collector which is used in Embeddable
Common Lisp , Guile Scheme and was used in an early implementation of Racket
Scheme. Even writing your own precise garbage collector is not terribly
difficult , see
http://www.cs.cornell.edu/courses/cs312/2003fa/lectures/sec24.htm
for a simple algorithm.
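
To give a feel for the size: a toy mark-and-sweep over a modelled cons
heap fits in a page. A sketch in Scheme (my own illustration, not the
code at that link; cells live in vectors, a "pointer" is just an index,
atoms are symbols, and all the names are made up):

(define heap-size 64)
(define cars  (make-vector heap-size 'free))
(define cdrs  (make-vector heap-size 'free))
(define marks (make-vector heap-size #f))
(define free-list '())   ; indices of unused cells
(define roots '())       ; indices the program has registered as live

(define (init-heap!)
  (do ((i 0 (+ i 1))) ((= i heap-size))
    (set! free-list (cons i free-list))))

(define (mark! x)                       ; heap pointers are integers, atoms are symbols
  (cond ((and (integer? x) (not (vector-ref marks x)))
         (vector-set! marks x #t)
         (mark! (vector-ref cars x))
         (mark! (vector-ref cdrs x)))))

(define (sweep!)                        ; unmarked cells go back on the free list
  (set! free-list '())
  (do ((i 0 (+ i 1))) ((= i heap-size))
    (cond ((vector-ref marks i) (vector-set! marks i #f))   ; clear for next cycle
          (else (set! free-list (cons i free-list))))))

(define (gc!) (for-each mark! roots) (sweep!))

(define (cons-cell a d)                 ; the Lisp's CONS: allocate, collecting if needed
  (cond ((null? free-list) (gc!)))
  (cond ((null? free-list) (error "heap exhausted")))
  (let ((i (car free-list)))
    (set! free-list (cdr free-list))
    (vector-set! cars i a)
    (vector-set! cdrs i d)
    i))

After (init-heap!), anything reachable from roots survives a gc!;
everything else is put back on the free list.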

> About garbage: Is this correct (the infamous Fibonacci series)?
>
> After:
> (define (fib n)
> (do ((i 1 (+ 1 i))
> (a 1 (+ a b))
> (b 1 a ))
> ((>= i n ) b)))
> the fib procedure remains there in all eternity and there is nothing
> to collect. I can do (fib 10) any time I please.
> Or will there be anonymous memory places and are the
> boxes they are in such as (a,b,i,n) collectable?

As long as you don't redefine it then yes it remains there for all eternity.
But if you call it then the arguments fib and n as well as the local
(lexical in Lisp parlance) variables i , a , b will have to be allocated
somewhere. So whether they create garbage , which will need to be reclaimed
eventually , depends on the allocation strategy of the implementation. A
perfectly reasonable strategy is that the implementation gets a big chunk of
memory from the operating system and every time it needs a new object it uses
the smallest address from that space which it hasn't used yet. If the space
fills up then the implementation may ask for more memory from the operating
system or do garbage collection ; obviously , eventually it will have to do
garbage collection rather than keep asking for more and more memory. With
such an allocation scheme , every time you call fib and it returns then it
will create garbage for fib , n , etc. Now a sufficiently clever Lisp
implementation may be able to prove that none of the variables of fib can
be referenced after fib terminates and therefore allocate them on the
hardware stack as it would (likely) happen in a language like C.
en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
not that clever then every call to fib creates garbage and if that garbage
isn't cleared every now and again , it will consume eventually all available
memory.

> But what with (directly calculating the 100-th fib number)
> (do ((i 1 (+ 1 i))
> (a 1 (+ a b))
> (b 1 a ))
> ((>= i 100 ) b )
> )
> 354224848179261915075
> does that leave garbage in your run of the mill scheme?
> (Because it might not in mine.)

Same as above ; if the Lisp implementation is clever enough to see that
i , a , b cannot be accessed after the piece of code terminates then it will
allocate them in such a way as to not create garbage (like on the hardware
stack) , otherwise they will be garbage. Note also , which Paul Rubin pointed
out in a different post , that even 354224848179261915075 may create garbage
if it's so large that it won't fit in a register. Since
354224848179261915075 > 2^64 then even in a 64 bit machine it may create
garbage. Even just writing on the REPL
354224848179261915075
may create garbage.

> In the procedure fib can I optimise the fib into machine code with
> (optimise fib)
> that optimises the underlying Forth code, without getting on
> someones toes?
> Obviously that may leave some garbage that may be harder to collect.

Every garbage collection algorithm I've heard of deals with low level
pointers so turning your high level language into machine code won't make
garbage collection harder. But in any case , your concern shows again that
garbage collection is something you have to incorporate into your design from
the start , not something you add as an afterthought.

[...]

> I just read the description of Guile. That is a Scheme dialect and Scheme
> is supposed to be simple. I can't detect the simplicity in it (not
> from the info file). It kind of deters me.
> I see that guile binary is 3K, that seems okay, but it calls
> horrific c-libraries like the infamous readline, maybe even malloc.

Every Scheme intended for real work will have plenty of extensions on top of
the Scheme standard. en.wikipedia.org/wiki/Scheme_(programming_language) :

The R6RS standard has caused controversy because it is seen to
have departed from the minimalist philosophy. In August 2009, the
Scheme Steering Committee which oversees the standardization
process announced its intention to recommend splitting Scheme
into two languages: a large modern programming language for
programmers, and a subset of the large version retaining the
minimalism praised by educators and casual implementors

I don't know why you think readline is infamous but it's for providing a more
pleasant interface and not necessary. On the other hand , Guile certainly
uses the GNU multiprecision library and being able to handle arbitrarily
large integers *is* part of the core functionality. Since it uses the
Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.

--
For network programming today, my first-hand choice is to use the
Boost.Asio C++ library, with nonblocking sockets and no threads.
Jorgen Grahn
Didn't see that coming. That wouldn't be my last choice; it wouldn't
even make the list of choices. I'd rather open a fruit stand.
James K. Lowden

Albert van der Horst

Sep 5, 2016, 5:57:02 AM
I've cross-posted to comp.lang.lisp.
The context is writing a minimal lisp, but with Forth instead of
c as the implementation language.

In article <_p9zz.1142663$OC.4...@fx39.am4>,
Spiros Bousbouras <spi...@gmail.com> wrote:
<SNIP>
>
>I don't think that garbage collection is something to be added as an
>afterthought. The existence of garbage collection affects the allocation (and
>possible recollection) of every object (see examples below) so it's one of
>the issues you need to be thinking about from the start. It's likely more
>fundamental than the syntax of the language. Having said that , it's not that
>big a hurdle. I don't know why you don't like Anton Ertl's package but there
>is also the Boehm-Demers-Weiser garbage collector which is used in Embeddable
>Common Lisp , Guile Scheme and was used in an early implementation of Racket
>Scheme. Even writing your own precise garbage collector is not terribly
>difficult , see
> http://www.cs.cornell.edu/courses/cs312/2003fa/lectures/sec24.htm
>for a simple algorithm.
>
>> About garbage: Is this correct (the infamous Fibonacci series)?
>>
>> After:
>> (define (fib n)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i n ) b)))
>> the fib procedure remains there in all eternity and there is nothing
>> to collect. I can do (fib 10) any time I please.
>> Or will there be anonymous memory places and are the
>> boxes they are in such as (a,b,i,n) collectable?
>
>As long as you don't redefine it then yes it remains there for all eternity.

Maybe there is a confusion between dynamic allocation / knowing
exactly where each piece of memory hangs out, and garbage
collection.
Such a fib consists (in my view) of a Forth word header of 5 CELLS which
is dynamically allocated *using my Forth ALLOCATE* and pointed to by some
pointer sitting in some tree or list or hash as a global heap.
Then the code is high level Forth code, and again it is allocated
dynamically and a pointer to it is present in the header.
(High level Forth code is, of course, a list of code tokens.)
Contrary to malloc, my ALLOCATE knows a lot about
the size etc of pieces allocated.
Of course I make sure that analysing the heap is possible to find
out what is no longer needed. But I call actually doing that
the garbage collection. There is no doubt in my mind that
I can implement a garbage collector even if it is
very inefficient. What one needs is all knowledge about pointers to
dynamically allocated area and into dynamically allocated area and
know which is which, and a means to know the sizes.
That may not be properly called an afterthought.
If I wouldn't keep that information from the start, an attempt to
write a Lisp would be ludricous indeed. If I want to investigate how
a Forth can morph into a lisp, I can't have an external tool
dictate me how to do the allocation. Using my own ALLOCATE I can change
it, if I want properties specifically needed for LISP

>But if you call it then the arguments fib and n as well as the local
>(lexical in Lisp parlance) variables i , a , b will have to be allocated
>somewhere. So whether they create garbage , which will need to be reclaimed
>eventually , depends on the allocation strategy of the implementation. A
>perfectly reasonable strategy is that the implementation gets a big chunk of
>memory from the operating system and every time it needs a new object it uses
>the smallest address from that space which it hasn't used yet. If the space
>fills up then the implementation may ask for more memory from the operating
>system or do garbage collection ; obviously , eventually it will have to do
>garbage collection rather than keep asking for more and more memory. With
>such an allocation scheme , every time you call fib and it returns then it
>will create garbage for fib , n , etc. Now a sufficiently clever Lisp
>implementation may be able to prove that none of the variables of fib can
>be referenced after fib terminates and therefore allocate them on the
>hardware stack as it would (likely) happen in a language like C.
>en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
>not that clever then every call to fib creates garbage and if that garbage
>isn't cleared every now and again , it will consume eventually all available
>memory.

My Forth allocation strategy is simpler. At the start a large chunk
is asked from Linux (which overcommits memory). Then my ALLOCATE
gives out memory on a first free basis. Maybe I get stuck and have to
do garbage collection, which amounts to calling FREE from the allocator.
If the large chunk wasn't enough, that is the end.
With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
the GC needs to kick in.

>
>> But what with (directly calculating the 100-th fib number)
>> (do ((i 1 (+ 1 i))
>> (a 1 (+ a b))
>> (b 1 a ))
>> ((>= i 100 ) b )
>> )
>> 354224848179261915075
>> does that leave garbage in your run of the mill scheme?
>> (Because it might not in mine.)
>
>Same as above ; if the Lisp implementation is clever enough to see that
>i , a , b cannot be accessed after the piece of code terminates then it will
>allocate them in such a way as to not create garbage (like on the hardware
>stack) , otherwise they will be garbage. Note also , which Paul Rubin pointed
>out in a different post , that even 354224848179261915075 may create garbage
>if it's so large that it won't fit in a register. Since
>354224848179261915075 > 2^64 then even in a 64 bit machine it may create
>garbage. Even just writing on the REPL
> 354224848179261915075
>may create garbage.

That is interesting. The Forth REPL can interpret the line
.( aap)
and it will print aap. The string aap is a temporary object and
disappears automatically. It will not be allocated anywhere,
only a string descriptor will be on the Forth stack.
"aap" TYPE
likewise will print aap, otoh it will allocate a string (in ciforth).
Its descriptor is thrown away, so the string itself becomes garbage.
Normally in Forth this garbage is collected using FORGET/MARKER,
a very primitive mechanism that basically only restores a previous
memory situation.
Thank you, good to know that. I have no objection to extensions of course.
My statically linked lina is 30 kbyte. If you want floating point,
you have to load it from a source library, so then you have to load an
assembler first from a source library.

>
>I don't know why you think readline is infamous but it's for providing a more
>pleasant interface and not necessary. On the other hand , Guile certainly
>uses the GNU multiprecision library and being able to handle arbitrarily
>large integers *is* part of the core functionality. Since it uses the
>Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
>(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.

(In the context of toy lisps that can be implemented in 1000 lines,
I hate readline, which is itself a couple of hundred lines,
is not written in lisp and is not strictly needed.
I'm glad to see that it is a loadable extension to Guile.)

BDW is a c-package? I'm disappointed, because people told me that
Lisp can be implemented using 5 basic operations. That means
something different from saying that your Forth is implemented
with 38 basic words (in assembler) apparently.
In analysing what is minimally needed in Forth,
KEY and EMIT always come up. That is realistic. If you can't tell the
computer what to do, and can't see what it has done, your
language is incomplete. Lispers seem to think that REPL is
a tacked-on part.

If you would program a LISP machine like I have a Forth machine
(just a tower that has Forth in the boot sector), would it then contain
a couple of giant binary blobs being the GNU multiprecision lib,
the compiled readline lib and maybe even a compiled BDW library written
in C?
Eech!

> Jorgen Grahn
> James K. Lowden

Two names in a signature, but you're Spiros?

Paul Rubin

Sep 5, 2016, 3:04:21 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
> the GC needs to kick in.

I suppose that's a reasonable way to start.

>> 354224848179261915075 may create garbage
> That is interesting. The Forth REPL can interpret the line .( aap)
> and it will print aap. The string aap is a temporary object and
> disappears automatically. It will not be allocated anywhere,

What happens if you enter the line "123" to leave the number 123 on the
stack? It goes into a stack cell (machine word) that's reclaimed
automatically when you pop the stack--that's how Forth works. But what
if the number is 354224848179261915075? It won't fit in a cell. It's a
bignum that must occupy multiple cells, so you'd use something like
ALLOCATE to get those cells, put the number in them, and leave just a
pointer on the stack. Later once the number is no longer in use, you'll
want to reclaim the memory. That's why we say entering
354224848179261915075 in a Lisp REPL might create garbage.
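
To picture those multiple cells, a sketch in Scheme (the limb
representation and the names int->limbs, limbs->int, limb-add are mine,
purely for illustration; a real implementation would sit below the
Lisp, in Forth or assembler, since Scheme's own integers are already
bignums):

(define limb-base (expt 2 32))

(define (int->limbs n)          ; split a non-negative integer into limbs,
  (if (zero? n) '()             ; least significant limb first
      (cons (modulo n limb-base) (int->limbs (quotient n limb-base)))))

(define (limbs->int ls)
  (if (null? ls) 0 (+ (car ls) (* limb-base (limbs->int (cdr ls))))))

(define (limb-add xs ys carry)  ; schoolbook addition, limb by limb
  (if (and (null? xs) (null? ys))
      (if (zero? carry) '() (list carry))
      (let* ((x (if (null? xs) 0 (car xs)))
             (y (if (null? ys) 0 (car ys)))
             (s (+ x y carry)))
        (cons (modulo s limb-base)
              (limb-add (if (null? xs) '() (cdr xs))
                        (if (null? ys) '() (cdr ys))
                        (quotient s limb-base))))))

; 354224848179261915075 is larger than 2^64, so it needs two 64-bit
; cells, or three of these 32-bit limbs:
; (int->limbs 354224848179261915075)  =>  a 3-element list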

> Normally in Forth this garbage is collected using FORGET/MARKER
> a very primitive mechanism that basically only restores a previous
> memory situation.

Having GC frees you from having to think about stuff like that.

Edsger W. Dijkstra in his 1972 Turing Award lecture said,

"With a few very basic principles at its foundation, it [LISP] has
shown a remarkable stability. Besides that, LISP has been the
carrier for a considerable number of in a sense our most
sophisticated computer applications. LISP has jokingly been
described as “the most intelligent way to misuse a computer”. I
think that description a great compliment because it transmits the
full flavour of liberation: it has assisted a number of our most
gifted fellow humans in thinking previously impossible
thoughts."

Part of the way it did that was by handling stuff like memory management
for you behind the scenes, in a manner mostly unknown in other languages
of that era. These days, of course, everybody does it.

> BDW is a c-package? I'm disappointed, because people told me that
> Lisp can be implemented using 5 basic operations.

That's too vague to be meaningful. Lisp code is written with the
illusion of infinite memory, and the GC reclaims the actual underlying
memory behind the scenes, invisibly to the program. GC has to munge on
actual raw memory which Lisp doesn't have a way to do. BDW is one
approach to GC but there are others, all of which need machine-level
primitives. Some Lisps include memory-access primitives so the GC can
itself be written in Lisp, but that stuff has to be done very carefully
to avoid creating garbage while the GC is running. I believe that
Scheme-48 (s48.org) works that way.

> If you would program a LISP machine like I have a Forth machine (just
> a tower that has Forth in the boot sector), would it then contain a
> couple of giant binary blobs being the GNU multiprecision lib, the
> compiled readline lib and maybe even a compiled BDW library written in
> C? Eech!

BDW is a nice simple way to get GC for a small Lisp, but the higher
performance ones tend to use other methods. And of course Lisp is much
older than the GNU project. Traditionally bignums would have been
implemented in assembly language. The historical Lisp machines had
hardware-assisted GC, I believe.

There's a bare-metal Scheme for the Raspberry Pi but I don't know if
anyone actually uses it. Big Lisp systems usually run under OS's rather
than on bare metal. Although control systems have been done in Lisp,
Lisp usually hasn't been in the same niche as Forth. It's more for
computation than for control, so it usually doesn't care much about low
level hardware access.

Albert van der Horst

Sep 6, 2016, 5:39:57 AM
In article <87zinmm...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
<SNIP>
>> If you would program a LISP machine like I have a Forth machine (just
>> a tower that has Forth in the boot sector), would it then contain a
>> couple of giant binary blobs being the GNU multiprecision lib, the
>> compiled readline lib and maybe even a compiled BDW library written in
>> C? Eech!
>
>BDW is a nice simple way to get GC for a small Lisp, but the higher
>performance ones tend to use other methods. And of course Lisp is much
>older than the GNU project. Traditionally bignums would have been
>implemented in assembly language. The historical Lisp machines had
>hardware-assisted GC, I believe.

How can a c-package that has the same API as malloc ever know what
is garbage to LISP? Why would my ALLOCATE fare any worse than BDW
if it has access to that same information?

A package with an ALLOCATE and a FREE is not a garbage collector.
Garbage collecting is about knowing what can be freed.
(Merging free areas into one is done automatically and incrementally
in my Forth ALLOCATE. I hope this is not what is called garbage collection.
My Forth ALLOCATE knows the size of all allocated areas, so it is
superior to malloc() in all respects except speed.)

I can imagine there is a "fat" FREE to be called from lisp.
It frees the area pointed to and, recursively via pointers stored
there, all areas indirectly pointed to. I can add a fat FREE to
my ALLOCATE in a heartbeat. But surely only Lisp can decide
when to call it. If there are any aliasing pointers to one
of those areas, it can't fatFREE it.

>
>There's a bare-metal Scheme for the Raspberry Pi but I don't know if
>anyone actually uses it. Big Lisp systems usually run under OS's rather
>than on bare metal. Although control systems have been done in Lisp,
>Lisp usually hasn't been in the same niche as Forth. It's more for
>computation than for control, so it usually doesn't care much about low
>level hardware access.
>

Groetjes Albert

Albert van der Horst

Sep 6, 2016, 6:23:02 AM
Does the following API have sufficient functionality to base the garbage
collection of a LISP on?

I've a dynamic allocation package with the following API
Forth style (exceptions are thrown and not shown)

n ALLOCATE addr : Addr point to an area of size n,
subsequently under control of the caller.
addr FREE : Area at addr is no longer under control.
addr n RESIZE : Make the area previously ALLOCATEd a different size.
Content may have to be copied.
addr SIZE size : Size is the length passed to ALLOCATE or RESIZE
addr mask MASK! : Add 10 bits of information related to addr
addr MASK@ mask : Get back 10 bits of information related to addr

There are two allocation areas, only one is in use at any one time.
SWITCH : Use the other allocation area
mask lambda BOTH : Perform lambda with data mask on all areas
mask lambda BACKGROUND : Perform lambda with data mask on all
areas in the allocation area not in use.

RATIONALE:
An action to be done in BOTH would be: store 1 in mask.

An action to be done in BOTH would be: store 0 in mask.

An action passed to BACKGROUND would be
"if mask contains 1, free the allocated area."

This hints at a garbage collector that can work undisturbed
while the other area is in use.
[I hope that the trivial idea of using two areas is not
patented/patentable. If so I hereby declare prior art.]

(The LISP can be adapted to the API; it doesn't exist yet.)

Spiros Bousbouras

Sep 6, 2016, 9:18:52 AM
On Mon, 5 Sep 2016 12:05:26 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
Are those 5 CELLS going to be reused every time the function gets called ?
For this function it doesn't matter if they are but in Lisp you can create
closures which return storage local to the function (i.e. lexical variables)
up the call stack. For example (I'll use Common Lisp syntax because I'm
more familiar with it than Scheme)

(defun adder (n)
(lambda (i) (+ n i)))

If you're going to translate adder in Forth it won't do to have 1 CELL for
n which gets reused every time adder gets called ; you need to create fresh
storage space for adder every time it gets called unless you can prove that
its return value (which is an anonymous function) gets discarded.
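
Concretely (the same adder in Scheme syntax, with a made-up usage):

(define (adder n) (lambda (i) (+ n i)))

(define add5 (adder 5))
(define add9 (adder 9))
(add5 1)    ; => 6
(add9 1)    ; => 10, and add5 still answers 6 afterwards
; add5 and add9 each hold on to their own n, so n cannot live in one
; reused CELL; every call to adder needs fresh storage for its n.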

> Then the code is high level Forth code, and again it is allocated
> dynamically and a pointer to it is present in the header.
> (High level Forth code is, of course a list of code tokens.)
> Contrary to malloc, my ALLOCATE knows a lot about
> the size etc of pieces allocated.

malloc() also knows about the size of pieces allocated otherwise it wouldn't
work. The interface does not make such information available to the programmer
but the internals of malloc() (and related functions) certainly need to know.

> Of course I make sure that analysing the heap is possible to find
> out what is no longer needed. But I call actually doing that
> the garbage collection.

It's part of garbage collection but you also need to reclaim the space.

> There is no doubt in my mind that
> I can implement a garbage collector even if it is
> very inefficient. What one needs is all knowledge about pointers to
> dynamically allocated area and into dynamically allocated area and
> know which is which, and a means to know the sizes.
> That may not be properly called an afterthought.
> If I wouldn't keep that information from the start, an attempt to
> write a Lisp would be ludricous indeed. If I want to investigate how
> a Forth can morph into a lisp, I can't have an external tool
> dictate me how to do the allocation. Using my own ALLOCATE I can change
> it, if I want properties specifically needed for LISP

Well , if you're going to change ALLOCATE so that it stores the necessary
information to do garbage collection then I most certainly would not call
it an "afterthought" ; on the contrary you design with garbage collection in
mind starting from the most basic components.

> >But if you call it then the arguments fib and n as well as the local
> >(lexical in Lisp parlance) variables i , a , b will have to be allocated
> >somewhere. So whether they create garbage , which will need to be reclaimed
> >eventually , depends on the allocation strategy of the implementation. A
> >perfectly reasonable strategy is that the implementation gets a big chunk of
> >memory from the operating system and every time it needs a new object it uses
> >the smallest address from that space which it hasn't used yet. If the space
> >fills up then the implementation may ask for more memory from the operating
> >system or do garbage collection ; obviously , eventually it will have to do
> >garbage collection rather than keep asking for more and more memory. With
> >such an allocation scheme , every time you call fib and it returns then it
> >will create garbage for fib , n , etc. Now a sufficiently clever Lisp
> >implementation may be able to prove that none of the variables of fib can
> >be referenced after fib terminates and therefore allocate them on the
> >hardware stack as it would (likely) happen in a language like C.
> >en.wikipedia.org/wiki/Escape_analysis has some related content. But if it's
> >not that clever then every call to fib creates garbage and if that garbage
> >isn't cleared every now and again , it will consume eventually all available
> >memory.
>
> My Forth allocation strategy is simpler. At the start a large chunk
> is asked from Linux (which overcommits memory). Then my ALLOCATE
> gives out memory on a first free basis. Maybe I get stuck and have to
> do garbage collection, which amounts to calling FREE from the allocator.

What you're describing here is the same thing I describe in the quoted paragraph
above so I wouldn't call it simpler. I mentioned "escape analysis" as an
optimisation but it's not necessary.

> If the large chunk wasn't enough, that is the end.
> With a toy Lisp and a 16 gByte heap, I can do a lot of testing before
> the GC needs to kick in.

[...]

> >I don't know why you think readline is infamous but it's for providing a more
> >pleasant interface and not necessary. On the other hand , Guile certainly
> >uses the GNU multiprecision library and being able to handle arbitrarily
> >large integers *is* part of the core functionality. Since it uses the
> >Boehm-Demers-Weiser garbage collector , I'm guessing it doesn't call malloc()
> >(or realloc()) but the analogous functions Boehm-Demers-Weiser provides.
>
> (In the context of toy lisps that can be implemented in 1000 lines,
> I hate readline which is itself couple of hundred lines,
> is not written in lisp and is not strictly needed.
> I'm glad to see that it is a loadable extension to Guile.)
>
> BDW is a c-package? I'm disappointed, because people told me that
> Lisp can be implemented using 5 basic operations.

BDW (Boehm-Demers-Weiser) is not necessary. I only brought it up because in an
earlier post you said "The GC seems to be a difficult part. How can there be
one if there is no lisp yet to speak of? A c-library?" . Now you seem confident
that you can write your own garbage collector. In which case you can forget
about BDW.

> That means
> something different than saying that yourforth is implemented
> with 38 basic words (in assembler) apparently.
> In analysing what is minimally needed in Forth always
> KEY and EMIT come up. That is realistic. If you can't tell the
> computer what to do, and can't see what it has done, your
> language is incomplete. Lispers seem to think that REPL is
> a tucked on part.

Not in the slightest ; I can't imagine what gave you that idea. Incremental
development where you test things as you write them is valued in the Lisp
world as much as it is in the Forth world. Note that Common Lisp has the
function READ which reads from a stream and returns Lisp objects. It also has
extensive specifications about how the various standard Common Lisp objects
should be printed as well as the ability to define your own methods to print
user defined objects in any way you like.

> If you would program a LISP machine like I have a Forth machine
> (just a tower that has Forth in the boot sector), would it then contain
> a couple of giant binary blobs being the GNU multiprecision lib,
> the compiled readline lib and maybe even a compiled BDW library written
> in C?
> Eech!

GNU multiprecision lib , etc. are shared libraries so in order to work they
need an operating system or at least parts of it like a dynamic linker. The
readline library is just an interface and not one well suited for Lisp so it's
not relevant to writing an operating system in Lisp. An operating system in
Lisp probably doesn't need support for arbitrarily large integers either ; I'm
not sure that such support would be considered essential for a Lisp under any
circumstances but for desktop applications it's certainly convenient to have.
For garbage collection , a Lisp from scratch would be best served by a precise
garbage collector so not BDW. Actually , an operating system with garbage
collection from the bottom up would be a fascinating thing to have regardless
of the implementation language.

> > Jorgen Grahn
> > James K. Lowden
>
> Two names in a signature, but you're Spiros?

My multiple personality disorder has been acting up :-D

--
it's just that in C++ and the like, you don't trust _anybody_, and in CLOS you
basically trust everybody. the practical result is that thieves and bums use C++
and nice people use CLOS. :)
Erik Naggum

Spiros Bousbouras

unread,
Sep 6, 2016, 11:32:33 AM9/6/16
to
On Tue, 6 Sep 2016 11:48:12 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
> In article <87zinmm...@jester.gateway.pace.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
> >
> >BDW is a nice simple way to get GC for a small Lisp, but the higher
> >performance ones tend to use other methods. And of course Lisp is much
> >older than the GNU project. Traditionally bignums would have been
> >implemented in assembly language. The historical Lisp machines had
> >hardware-assisted GC, I believe.
>
> How can a c-package that has the same API as malloc ever know what
> is garbage to LISP? Why would my ALLOCATE fare any less than BDW
> if it has access to that same information?

The basic idea is as follows : the application which uses BDW
(Boehm-Demers-Weiser) always uses the replacements for malloc() and realloc()
BDW provides to allocate memory. So the BDW library internally knows all the
ranges of addresses which the application has allocated. When BDW needs to do
garbage collection then it scans all the "data" of the application meaning ,
heap , hardware stack and processor registers. Every time it finds something
which could be interpreted as a pointer to a section of allocated memory then
it marks that section as in use. Then it scans each such section for pointers
and continues recursively in the same fashion until it has traversed the whole
references graph. After it has done so , each allocated memory section which
has not been marked during the traversal as being in use , can be freed.
For more details I will refer you to the following :

http://www.iecc.com/gclist/GC-algorithms.html#Conservative%20collection

Conservative collection

Conservative garbage collection makes use of the observation
that if you are not relocating (copying) objects, then you
need not be quite so certain about exactly what is a pointer.
It suffices to scan the root set (and objects) for any
pointer-like bit patterns, and treat those pointer-like bit
patterns as actual pointers while marking. The result is that
the "true" reachable objects are all found, along with a few
others. This works surprisingly well in practice (if a few
additional tricks are added) and often permits the use of
garbage collection with oblivious source code and compilers.

Conservative collection is very important because it permits
the use of garbage collection with programs that were not
written with garbage collection in mind. That is, it simplifies
the use of garbage collection with existing (non-garbage-collected)
libraries of code.

or

http://hboehm.info/gc/gcdescr.html
Conservative GC Algorithmic Overview
This is a description of the algorithms and data structures used in
our conservative garbage collector.

Note that the techniques work for any application whatsoever not just a Lisp
compiler.
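
As a rough illustration of the marking scan just described, here is a minimal
C sketch (not BDW itself; the block table and find_block are made-up stand-ins
for whatever bookkeeping the allocator keeps internally):

#include <stdint.h>
#include <stddef.h>

/* Stand-in for the allocator's internal bookkeeping. */
typedef struct { uintptr_t start; size_t size; int marked; } block;
static block blocks[1024];
static size_t nblocks;

/* NULL unless the value happens to fall inside some allocated block. */
static block *find_block(uintptr_t v) {
    for (size_t i = 0; i < nblocks; i++)
        if (v >= blocks[i].start && v < blocks[i].start + blocks[i].size)
            return &blocks[i];
    return NULL;
}

/* Conservative mark: every word in [lo,hi) that looks like a pointer
   into an allocated block marks that block, and the block's own
   contents are then scanned the same way. */
static void mark_range(const uintptr_t *lo, const uintptr_t *hi) {
    for (const uintptr_t *p = lo; p < hi; p++) {
        block *b = find_block(*p);
        if (b && !b->marked) {
            b->marked = 1;
            mark_range((const uintptr_t *)b->start,
                       (const uintptr_t *)(b->start + b->size));
        }
    }
}

/* Sweep: after mark_range has been run over the roots (static data,
   stacks, registers), every block still unmarked can be freed. */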

> A package with an ALLOCATE and a FREE is not a garbage collector.
> Garbage collecting is about knowing what can be freed.
> (Merging free areas into one is done automatically and incrementally
> in my Forth ALLOCATE. I hope not that this is called garbage collection.
> My Forth ALLOCATE knows the size of all allocated area's, so it is
> superior to malloc() in all respects except speed.)

As I said in a previous post , the malloc() library also knows internally the
size of all allocated areas otherwise it wouldn't work. The advantage you have
if you have control of both the memory allocation "infrastructure" (which would
be your own Forth in this case) as well as the application (which would be the
Lisp compiler you want to write) is that the application can appropriately mark
all objects which are pointers so that you can have *precise* garbage
collection as opposed to conservative garbage collection ; in the latter , the
algorithm I describe above can mistake as pointers objects which the
application uses as integers simply because accidentally they have a value
which corresponds to one of the allocated memory areas. So the application is
not going to use such an object as a reference to other objects but the garbage
collector doesn't know this so it has to assume the worst (i.e. be
conservative) and treat it as a pointer.

Here's an example of how such marking might work : say you have an architecture
where all objects have to be aligned at even memory addresses. This means that
every bit pattern which corresponds to a pointer will have the 0-th bit be
zero. So a garbage collected programming language may use the 0-th bit as a tag
bit to indicate integers and make this always 1. So for example , if it wants
to store the integer 4 it will not store bit pattern 100 (or appropriate width)
but 1001. This of course means that in order to add 2 integers you cannot just
use the assembly instruction but you have to shift both of them 1 position
right , then add them as normal , then shift the result 1 position left and
finally make the 0-th bit 1 again to tag the result as an integer. Similarly
for other mathematical operations. This is comparatively slow (plus you
sacrifice one value bit as a tag) but in the grand scheme of things , which
includes efficient and precise garbage collection , it's worth it. If I
remember correctly , the OCaml implementation does something like this. But the
main point is that for such a scheme to work , there has to be cooperation
between the garbage collector and the application which runs on top of it.
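
A minimal C sketch of the tagging scheme described above (bit 0 set marks an
immediate integer, so 4 is stored as binary 1001; the names, and the assumption
of an arithmetic right shift, are mine and not taken from any particular Lisp):

#include <stdint.h>
#include <assert.h>

typedef intptr_t lispobj;

static lispobj tag_int(intptr_t n)   { return (n << 1) | 1; }  /* 4 -> 1001 */
static intptr_t untag_int(lispobj o) { return o >> 1; }        /* assumes arithmetic shift */
static int      is_int(lispobj o)    { return o & 1; }

/* Adding two tagged integers: shift right, add, shift left, retag. */
static lispobj add_ints(lispobj a, lispobj b) {
    return tag_int(untag_int(a) + untag_int(b));
}

int main(void) {
    lispobj four = tag_int(4);
    assert(is_int(four));
    assert(untag_int(add_ints(four, tag_int(3))) == 7);
    return 0;
}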

> I can imagine there is a "fat" FREE to be called from lisp.
> It frees the area pointed to and recursively via pointers stored
> there all areas indirectly pointed to. I can add a fat FREE to
> my ALLOCATE in a heartbeat. But surely only Lisp can decide
> when to call it. If there are any aliasing pointers to one
> of those areas, it can't fatFREE it.

Actually the garbage collector tends to decide when to do garbage collection as
opposed to say requesting more memory from the operating system. Having said
that , if the programming language has a command to do garbage collection then
it's a plus. If for example an application knows it's going to be idle for a
while , then it may as well ask for a garbage collection to happen.

--
It is better that some innocent men remain in jail than that the
integrity of the English judicial system be impugned.
https://en.wikipedia.org/wiki/Lord_Denning

Spiros Bousbouras

unread,
Sep 6, 2016, 11:34:17 AM9/6/16
to
On Tue, 6 Sep 2016 12:30:46 +0200 (CEST)
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
> Has the following API sufficient functionality to base the garbage
> collection of a LISP on?
>
> I've a dynamix allocation package with the following API
> Forth style (exceptions are thrown and not shown)
>
> n ALLOCATE addr : Addr point to an area of size n,
> subsequently under control of the caller.
> addr FREE : Area at addr is no longer under control.
> addr n RESIZE : Make the area previously ALLOCATEd a different size.
> Content may have to be copied.
> addr SIZE size : Size is the length passed to ALLOCATE or RESIZE
> addr mask MASK! : Add 10 bits of information related to addr
> addr MASK@ mask : Get back 10 bits of information related to addr
>
> There are two allocation areas, only one is in use at any one time.
> SWITCH : Use the other allocation area
> mask lambda BOTH : Perform lambda with data mask on all area's
> mask lambda BACKGROUND : Perform lambda with data mask on all
> area's in the allocation area not in use.

What does "perform lambda" mean ?

Albert van der Horst

unread,
Sep 6, 2016, 3:04:31 PM9/6/16
to
In article <YlBzz.1312339$3%1.93...@fx40.am4>,
I try to speak lisp :-) . A lambda is a pointer to a subroutine aka
Forth execution token.
What I mean is that the allocation system follows a linked list of
pointers and executes a procedure that inspects whether it contains the
mask, and if so frees the allocated area.

Groetjes Albert

Albert van der Horst

unread,
Sep 6, 2016, 3:18:59 PM9/6/16
to
In article <jkBzz.1477$xE....@fx35.am4>,
That is a whole lot better than the Wikipedia entry. I feel like pasting
that whole text into Wikipedia.
The crucial points are:
- the whole memory in use by the application is considered as potential pointers
- conservative: garbage may not be freed, but if it is freed it is
  certainly garbage

Neither of these points came through from Wikipedia.

What about pointers in the middle of an area?
Do they prevent the area from being freed?

<SNIP>

Thanks for the helpful information.

Anton Ertl

unread,
Sep 6, 2016, 4:15:15 PM9/6/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> the whole memory in use by the application is considered as potential pointers

That's wrong. If it was, cyclic structures would not be collected,
and a lot of other stuff would also not be collected (e.g., even if
there was no spurious pointer to a dead linked list, it would only
collect one element (the current first one) per collection).

For an automatic collector (conservative or no), the root set is the
static memory (data and bss segments in Unix), stacks, and registers.
My garbage collector is not as automatic and requires you to register
the roots, because there is no standard way in Forth to scan
everything where there might be roots.

> conservative: garbage may not be freed, but if it is freed it is
> certainly garbage

Actually, no garbage collector guarantees that any memory it keeps is
accessed later. Conservative garbage collectors may keep stuff around
because something that is not a real pointer seems to point to it,
whereas precise collectors do not have this problem. Because precise
collectors know what is a pointer and what is not, they can also copy
the live data around.

>What about pointers in the middle of an area?
>Do they prevent the area from being freed?

That's up to the collector. My collector doesn't. In general, for
conservative collectors, that would significantly increase the number
of spurious pointers. It also depends on the application. In a basic
Lisp system, all pointers are to the start of objects, so there is no
point in handling pointers to the middle of an object.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

WJ

unread,
Sep 6, 2016, 4:52:30 PM9/6/16
to
Spiros Bousbouras wrote:

> it's just that in C++ and the like, you don't trust anybody, and in CLOS you
> basically trust everybody. the practical result is that thieves and bums use C++
> and nice people use CLOS. :)
> Erik Naggum


Erik Naggum wrote:

> * Martin Rodgers
> | I guss Erik doesn't wish to heal any wounds.
>
> you _are_ the wound. if you go away, the wound will be healed. I really
> want the wound to heal: JUST GET THE HELL OUT OF MY FACE!
>
> and whoever gave you permission to quote from private communication?
>
> please go die, Martin Rogders. at least the maggots will like you.


Erik Naggum wrote:

> You are a clever little asshole, aren't you?

> you incredibly retarded jerk!

> Man, how can you survive being so goddamn _stupid_?

> insufferably arrogant moron.

> Geez, you are such an idiot.

> your exceptionally pitiful brain

> qualities are important? Or are you so unphilosophical and such a
> leering idiot with a moronic grin permanently attached to his skull that

> had been able to think at all, you would probably experience _several_
> revelations of such magnitude that one "god" would not be enough.



Erik Naggum wrote:

> | > You have to get at the bytes where they are actually found, just
> | > after they have been read, and just before they are written.
> | > Anything else is pretty damn stupid.
>
> | Sure. Who said otherwise?
>
> You have strongly implied "otherwise", you insuffable numbnut.

I have grudgingly to admit that this brilliant post proves
that Naggum was indubitably one of the leading luminaries of
the CL cosmos.

In fact, one could go so far as to say that he was the epitome
of all of the characteristics that make worshippers of CL so
special and precious.

If he were still alive, I would be sorely tempted to beseech
him for admittance into his pack of hyenas.


Paul Rubin

unread,
Sep 6, 2016, 5:52:56 PM9/6/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> n ALLOCATE addr : Addr point to an area of size n,
> subsequently under control of the caller.

You need that.

> addr FREE : Area at addr is no longer under control.

Get rid of this--the GC handles it for you.

> addr n RESIZE : Make the area previously ALLOCATEd a different size.
> Content may have to be copied.

This isn't feasible in a non-moving system and I don't understand why
you would want it anyway.

> addr SIZE size : Size is the length passed to ALLOCATE or RESIZE

The non-gc parts of the program shouldn't care about this either.

> addr mask MASK! : Add 10 bits of information related to addr
> addr MASK@ mask : Get back 10 bits of information related to addr

Is that for type tags? A number of Lisp systems started that way but
ended up going to smaller tags, to have more address bits left. Typical
is 1 tag bit distinguishing ints from pointers. If the low bit is the
tag and 0 means integer, then you can add two such integers together and
get another integer without any bit fiddling. For dereferencing
pointers you can mask off the bit, or use it relative to a base register
that itself contains an address adjusted for the extra 1 bit.
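
For contrast with the earlier tag-bit-1 scheme, a small C sketch of the variant
described here (low bit 0 means integer, so a tagged integer is just 2*n and
machine addition already yields the tagged sum; the names are made up):

#include <stdint.h>
#include <assert.h>

typedef intptr_t lispobj;

static lispobj tag_int(intptr_t n)   { return n << 1; }   /* low bit 0 = integer */
static intptr_t untag_int(lispobj o) { return o >> 1; }

int main(void) {
    lispobj a = tag_int(5), b = tag_int(7);
    lispobj sum = a + b;              /* no bit fiddling: 2*5 + 2*7 == 2*12 */
    assert(untag_int(sum) == 12);
    return 0;
}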

> There are two allocation areas, only one is in use at any one time.

Wait I'm confused, you're writing a semispace gc? I'd say look at the
one in SIOD and get an idea from that what you need. SICP also
describes something like that, towards the end of the book.

Albert van der Horst

unread,
Sep 6, 2016, 7:42:37 PM9/6/16
to
In article <87r38wn...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> n ALLOCATE addr : Addr point to an area of size n,
>> subsequently under control of the caller.
>
>You need that.
>
>> addr FREE : Area at addr is no longer under control.
>
>Get rid of this--the GC handles it for you.

There is some confusion here. I'm describing what is in the GC.
The GC certainly must be able to free space.

<SNIP>

Paul Rubin

unread,
Sep 6, 2016, 8:49:08 PM9/6/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>> addr FREE : Area at addr is no longer under control.
>>Get rid of this--the GC handles it for you.
> There is some confusion here. I'm describing what is in the GC.
> The GC certainly must be able to free space.

Oh I see what you mean--it wasn't clear what level that question was
directed at. Anyway, GC systems usually don't have a way to resize,
except by reallocation above the GC level.

Conservative and other non-moving GC's have to free space, but you might
as well use Anton's instead of re-inventing it. Or you could examine it
to see how it works.

Precise GC's can move the data around (usually to compact it into a
contiguous chunk) so they don't have to explicitly free. SICP has a
description of how to do that. SIOD incorporates both conservative and
copying collectors, with simple implementations that are easy to read.

It's probably worth examining a few existing implementations before
writing your own.

FWIW, there's a good book by Jones, Hosking, and Moss (2012) about GC,
if you want to go into deeper details. It's a revision of an older book
by Jones and Lins that might be better known.

Paul Rubin

unread,
Sep 7, 2016, 1:03:20 AM9/7/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> Neither of these points came through from wikipedia.

You should probably look at the gc.html documentation file from Anton's
package, if you haven't already:

http://www.complang.tuwien.ac.at/forth/garbage-collection.zip

unfortunately it's not on its own web page afaik, so you have to extract
it from the zip file. It explains stuff like that pretty well.

Stefan Monnier

unread,
Sep 8, 2016, 12:12:12 PM9/8/16
to
>> the whole memory in use by the application is considered as potential
>> pointers
> That's wrong. If it was, cyclic structures would not be collected,

That depends on how you interpret "use".


Stefan

Albert van der Horst

unread,
Sep 8, 2016, 2:02:02 PM9/8/16
to
In article <jwvmvji74hy.fsf-mon...@gnu.org>,
In the context of conservative garbage collectors, I expect that sometimes
cyclic structures are not collected.

>
>
> Stefan

Paul Rubin

unread,
Sep 8, 2016, 3:34:29 PM9/8/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> In the context of conservative garbage collectors, I expect that sometimes
> cyclic structures are not collected.

There shouldn't be anything different about them from non-cyclic
structures as regards the gc finding them. Anton's point was that a
conservative gc doesn't scan all of memory; but rather, it recursively
traces outward from a predefined root set, like any other gc. If
there's a cyclic structure out there that's unreachable from the roots,
it will get collected.

George Neuner

unread,
Sep 8, 2016, 4:51:05 PM9/8/16
to

If you are really interested in how GC operates, you should read the
bible:

Garbage Collection: Algorithms for Automatic Dynamic Memory
Management; Richard Jones and Rafael D Lins; 1996.
ISBN-13: 978-0471941484
ISBN-10: 0471941484

Don't be too concerned about the age - this book is mainly theory and
little has changed since it was written.


For a survey of modern implementation techniques, see:

The Garbage Collection Handbook: The Art of Automatic Memory
Management; Richard Jones, Antony Hosking and Eliot Moss; 2011.
ISBN-13: 978-1420082791
ISBN-10: 1420082795

This book is light on theory ... it is not a good intro text, but IMO
it is invaluable to an implementor.


Hope this helps,
George

Paul Rubin

unread,
Sep 8, 2016, 7:34:18 PM9/8/16
to
George Neuner <gneu...@comcast.net> writes:
> The Garbage Collection Handbook: The Art of Automatic Memory
> Management; Richard Jones, Antony Hosking and Eliot Moss; 2011.

Is that something other than an update on the 1996 book?

I do know there have been some real advances since then, especially for
parallel gc, though that wouldn't be relevant to a basic Lisp
interpreter.

George Neuner

unread,
Sep 9, 2016, 12:12:24 AM9/9/16
to
The 2nd book is *not* an update of the 1st.

The 1996 book was mainly about theory [which really has _not_ changed
all that much]. The 2011 book is light on the theory and is mainly
about implementation techniques for modern processors, and it covers
HRT implementation (the older book covered only SRT).

The books complement each other. A practitioner might get by with
only the 2011 book, but someone looking for in-depth understanding
needs the 1996 as well.

George

Paul Rubin

unread,
Sep 9, 2016, 1:22:11 AM9/9/16
to
Thanks. I've only seen the 2011 one, which I liked. I'll look for the
1996 one as well. I didn't realize there was much theory behind the
subject to begin with.

Lars Brinkhoff

unread,
Sep 9, 2016, 4:05:12 AM9/9/16
to
Can you make it Emacs Lisp, so we can use it to rewrite Emacs in Forth?

Just kidding. Or maybe not?

Albert van der Horst

unread,
Sep 9, 2016, 10:10:56 AM9/9/16
to
In article <87fupfo...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> The GC seems to be a difficult part. How can there be one if there
>> is no lisp yet to speak of? A c-library?
>
>Anton Ertl has already written one in Forth:
>
> http://www.complang.tuwien.ac.at/forth/garbage-collection.zip
>
>It supplies an ALLOC word that gives you a chunk of memory of you
>desired size. There is no corresponding FREE, since ALLOC notices if
>you're running low on memory, and performs a GC if you are. I'm
>suggesting that you use Anton's package and write Lisp-callable words
>like CONS as Forth words that use ALLOC, so the GC part is already done
>for you.

Let's see. I want to show that Forth is an implementation language for
lisp. So I can't use a Forth that is c-based, then ultimately I would
create another c-based lisp. Boring. So I won't use gc.fs on gforth.
Porting gc.fs? It pulls in another 5 facilities, hmm. Including floating
point. Hmm. I just got fp recently on ciforth.

The alternative:
My ALLOCATE. It takes barely 2 screens. No dependencies. It works
with ciforth and I know it inside out. The discussion has shown that
any requirement that lisp might throw at it, I can add easily.

The choice is not hard.

>
>> The whole point of Forth is that such a reader be modular and handled
>> by the Forth interpreter. That is the crux of this exercise.
>
>If the exercise is just to write a Lisp reader in Forth, then ok; but in
>terms of getting a Lisp working, the reader is just a component.

No it isn't. It is to prove that the modular approach in Forth to
writing readers works for Lisp too. That is not a "Lisp reader in Forth".
My modular approach to denotations isn't even traditional Forth, which
is stuck on recognizers that were already there in tforth in 1993.

>
>>> instead oftreating "baz)))" as a Forth word?
>
>> I'll need a TOKEN2 with the second ?BLANK replaced by ?BLANK-OR-)
>> : ?BLANK-OR-) DUP &) = SWAP ?BLANK OR ;
>
>There are a few other characters like ' (quote) that you'll also want to
>treat specially. It's not terrible but I don't see much gain from this
>approach.
>
>> Or I may just require (+ 1 2 3 baz )
>
>Noooo......

Just kidding, I can spare the two lines needed for that.

>
>> You got me wrong. I don't want to write a Lisp. I want to demonstrate
>> that Forth can be used to write a Lisp.
>
>I'm rather liking the idea of a Forth with a Lisp extension. You could
>then use Forth directly to write the realtime or speed-intensive parts
>of a program and parts that need to address raw memory, and Lisp for
>higher-level parts, UI, etc.

This is of course the idea of using Forth as an implementation language.

Groetjes Albert

P.S. I'm getting more and more fond of a c-source I have.
It doesn't even use structs so it is dead-easy to transfer to
Forth. Once it works, I can make it better.
Already forgot from where it came ...

Paul Rubin

unread,
Sep 9, 2016, 1:55:32 PM9/9/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> The alternative: My ALLOCATE. It takes barely 2 screens. No
> dependancies. It works with ciforth and I know it inside out. The
> discussion has shown that any requirement that lisp might throw at it,
> I can add easily. The choice is not hard.

Well, I understood you wanted to use ciforth, no problem. I'm not
trying to pitch gc.fs: I've never used it myself, but I thought it was
intended to be fairly portable and I suggested it because it seemed like
the easiest way to go forward. There's certainly other ways to handle
gc.

I'm not sure what you want to do with ALLOCATE though:

1. Don't collect garbage but just allocate til you run out of memory,
then give up, per your earlier post. This is ok for testing and early
development, but in the end, you don't have Lisp if you can't reclaim
storage.

2. Add a gc.fs-style conservative gc to ALLOCATE. That's fine though it
might be easier to just adapt gc.fs.

3. Implement some other gc scheme, e.g. a precise one with compaction.
This probably means keeping the Lisp heap separate from the Forth heap,
plus it complicates the interface between Lisp primitives and the GC.
But in some ways it's the most powerful approach.

Do you know what you have in mind?

>>in terms of getting a Lisp working, the reader is just a component.
> No it isn't. It is to prove that the modular approach in Forth to
> writing readers works for Lisp too.

The existence of parsing words shows that it doesn't even always work
for Forth. It works often enough to be useful, for sure. Another part
of the issue is that just as Forth users like to extend Forth syntax
with Forth, Lisp users like to extend the Lisp reader in Lisp. You
might look at the spec for Common Lisp readers. It's complicated and
I'm not saying you want to implement all of it, but looking at it might
give you ideas of what users like to do.

> My modular approach to denotations isn't even traditional Forth that
> is stuck on recognizers, that were there in tforth 1993.

I'm not sure what you're referring to but it sounds interesting--is
there more info online?

> P.S. I'm getting more and more fond of a c-source I have. It doesn't
> even use structs so it is dead-easy to transfer to Forth.

Not sure what you mean: an existing Lisp in C?

Albert van der Horst

unread,
Sep 9, 2016, 3:31:23 PM9/9/16
to
In article <87twdpj...@jester.gateway.pace.com>,
Paul Rubin <no.e...@nospam.invalid> wrote:
<SNIP>
>
>I'm not sure what you want to do with ALLOCATE though:
>
>1. Don't collect garbage but just allocate til you run out of memory,
>then give up, per your earlier post. This is ok for testing and early
>development, but in the end, you don't have Lisp if you can't reclaim
>storage.
>
>2. Add a gc.fs-style conservative gc to ALLOCATE. That's fine though it
>might be easier to just adapt gc.fs.

This is a sequel to 1. My marking of the pieces in use will be exact,
so it is not a conservative gc. Then I've not given up on the idea
to just free things here and there which would work well with
my allocator.

>
>3. Implement some other gc scheme, e.g. a precise one with compaction.
>This probably means keeping the Lisp heap separate from the Forth heap,
>plus it complicates the interface between Lisp primitives and the GC.
>But in some ways it's the most powerful approach.

In particular there will be no fragmentation issues. Why is it that
I don't hear about fragmentation in this context (of lisp)?

>
>Do you know what you have in mind?
>
>>>in terms of getting a Lisp working, the reader is just a component.
>> No it isn't. It is to prove that the modular approach in Forth to
>> writing readers works for Lisp too.
>
>The existence of parsing words shows that it doesn't even always work
>for Forth. It works often enough to be useful, for sure. Another part
>of the issue is that just as Forth users like to extend Forth syntax
>with Forth, Lisp users like to extend the Lisp reader in Lisp. You
>might look at the spec for Common Lisp readers. It's complicated and
>I'm not saying you want to implement all of it, but looking at it might
>give you ideas of what users like to do.

If the basic Lisp facilities are there, I suppose the reader can be
extended in a Lisp way.

>
>> My modular approach to denotations isn't even traditional Forth that
>> is stuck on recognizers, that were there in tforth 1993.
>
>I'm not sure what you're referring to but it sounds interesting--is
>there more info online?

I'm referring to a trivial extension to Forth: if the search fails
and it is not a number, you've the option to execute some word.
This word can then interpret something as a special number or
whatever.
From the manual: ('thingies are vectors in tforth)
"
'NUMBER contains the execution token of the text interpreter
NUMBER? that converts a string to a number. It is executed whenever
Forth cannot find a word in the vocabulary. ... understands ...
different notations allowed in tforth.
... replace the content of 'NUMBER
however you should take care to obey its complicated stack behaviour.
... change without notice ..
"

You will find something similar probably in iForth, maybe even
the exact above text, as the iForth manual was copied from tForth.

>
>> P.S. I'm getting more and more fond of a c-source I have. It doesn't
>> even use structs so it is dead-easy to transfer to Forth.
>
>Not sure what you mean: an existing Lisp in C?

Indeed. Also about 1000 lines, but I find it easier to understand than
the Forth Lisp from Anton Ertl's website. It contains comments on
the critical spots.
Somewhat curious. It uses parallel arrays instead of structs.
So all lisp objects are just longs:
cdr[car[x]] instead of x.car->cdr
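
A minimal C sketch of that parallel-array representation (the array names and
sizes here are invented, not taken from the source being discussed): a lisp
object is just a long used as an index, and the fields live in separate arrays.

#include <stdio.h>

enum { HEAPCELLS = 1000 };
static long car[HEAPCELLS], cdr[HEAPCELLS];
static long freeptr = 1;                 /* index 0 reserved for NIL */

static long cons(long a, long d) {
    long x = freeptr++;
    car[x] = a;
    cdr[x] = d;
    return x;
}

int main(void) {
    long inner = cons(1, 2);             /* (1 . 2)        */
    long outer = cons(inner, 3);         /* ((1 . 2) . 3)  */
    printf("%ld\n", cdr[car[outer]]);    /* cdr of the car of outer: prints 2 */
    return 0;
}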

Groetjes Albert

Paul Rubin

unread,
Sep 9, 2016, 4:06:04 PM9/9/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>2. Add a gc.fs-style conservative gc to ALLOCATE.
> My marking of the pieces in use will be exact, so it is not a
> conservative gc.

That complicates things somewhat but you get the benefit of an exact gc,
so fine. I think Guile uses conservative gc partly to make writing C
extensions easier: they don't have to think about making sure
intermediate values are marked or how to mark any Lisp pointer that
might appear in a C structure, etc. But requiring gc-aware extensions
isn't unbearable.

> Then I've not given up on the idea to just free things here and there
> which would work well with my allocator.

Well, it's a cost (complexity) vs benefit (speed-up) trade-off unless
you're worried about the gc missing stuff.

> In particular there will be no fragmentation issues. Why is it that
> I don't here about fragmentation in this context (of lisp)

With a conservative gc you can't prevent fragmentation. The simplest
and most traditional exact gc (straightforward mark and sweep) doesn't
prevent fragmentation either. Emacs has used non-compacting (except for
strings) mark/sweep for the past 30 years unless they changed it, and it
hasn't been a significant problem afaik.

If you want to defragment, the basic approaches are mark-and-compact, or
stop-and-copy. Both of them relocate objects, which means any
pointers to those objects have to be updated. Within those approaches
there are generational schemes that can improve performance a lot.
Hedgehog Lisp (https://github.com/sbp/hedgehog) has a simple
2-generation copying gc that you might examine. There are some other
methods like having separate pages for each type of fixed-size object
("BIBOP", BIg Bag Of Pages) that I think have fallen into disuse.

In my experience, if you have a reasonable amount of memory, gc
performance isn't a big issue unless you've got a backend that generates
native machine code. Otherwise your interpreter will spend most of its
time running user code rather than gc.

Here's a nice page about the Factor gc that I just happened to look at:

http://factor-language.blogspot.com/2009/11/mark-compact-garbage-collection-for.html

Factor uses a mark-compact scheme inspired by Clojure, that's also
described in the 2011 Jones et al book that's been mentioned here a few
times.

> I'm referring to a trivial extension to Forth: if the search fails
> and it is not a number, you've the option to execute some word.

I think you want to be able to call a user word even if the search
doesn't fail, or it's a number. E.g. you may have to treat Lisp numbers
differently than Forth numbers.

>>Not sure what you mean: an existing Lisp in C?
> Indeed. Also about 1000 lines

Which one is that, if I can ask? Is it online?

lawren...@gmail.com

unread,
Sep 10, 2016, 4:17:39 AM9/10/16
to
On Monday, September 5, 2016 at 6:28:59 AM UTC+12, Paul Rubin wrote:
> ... ALLOC notices if you're running low on memory, and performs a GC if you
> are.

How does it decide when you are “running low on memory”? Does it limit your heap size, like Java does?

Why not use reference-counting as a first resort, only falling back to GC when you need to, like Perl and Python do?

franck....@gmail.com

unread,
Sep 10, 2016, 10:23:45 AM9/10/16
to
Le vendredi 9 septembre 2016 22:06:04 UTC+2, Paul Rubin a écrit :
> If you want to defragment, the basic approaches are mark-and-compact, or
> stop-and-copy. Both of them relocating objects, which means any
> pointers to those objects have to be updated. Within those approaches
> there are generational schemes that can improve performance a lot.
> Hedgehog Lisp (https://github.com/sbp/hedgehog) has a simple
> 2-generation copying gc that you might examine. There are some other
> methods like having separate pages for each type of fixed-size object
> ("BIBOP", BIg Bag Of Pages) that I think have fallen into disuse.
>

This is how Oforth allocates objects : using separate pages for
each object size. I didn't know this method has fallen into disuse.
In fact, I thought it was used a lot...
Is there any drawback for this method to not be used anymore ?
Not having fragmentation is better than needing a defragmentation phase, no ?

Anyway, still a BIBOP user here...

btw there is a page about how Oforth's GC works here :
http://www.oforth.com/memory.html

Paul Rubin

unread,
Sep 10, 2016, 3:01:18 PM9/10/16
to
lawren...@gmail.com writes:
>> ... ALLOC notices if you're running low on memory
> How does it decide when you are “running low on memory”?

"running low on memory" was a poor choice of words. In the case of
gc.fs, I believe it gc's after every N bytes of allocations, where N is
a parameter you can set. That is typical of simple gc's. But I don't
have it in front of me. Fancier GC's use more complicated schemes to
decide when and how to gc.

> Does it limit your heap size, like Java does?

You can set the gc.fs heap size to your liking when you initialize the
gc, but once you do, it can't be changed.

Java allows you to limit the heap size with the -Xmx option but doesn't
require you to, I thought.

> Why not use reference-counting as a first resort, only falling back to
> GC when you need to, like Perl and Python do?

I can't speak for Perl but I think refcounting was a poor design
decision in CPython. It had a period of usefulness because before the
addition of the "with" statement, people relied on its undocumented
behaviour of finalizing objects as soon as the last reference went away
as a poor man's RAII. But "with" and context managers are the right way
to do that in Python. Refcounting clutters up the C API, misses
reclaiming cyclic data (so a gc had to be added after the fact), is
slower than decent gc even in a single thread, mandates the notorious
GIL when multiple threads are running, etc.

I don't know of any other Python implementations besides CPython that
use refcounting. PyPy, MicroPython, IronPython, Jython, and another one
whose name I've forgotten all use gc. Afaict, refcounting is a simple
and easily understood method that tempts naive programmers but gc is
better in most situations.

Paul Rubin

unread,
Sep 10, 2016, 3:08:06 PM9/10/16
to
lawren...@gmail.com writes:
> Why not use reference-counting as a first resort, only falling back to
> GC when you need to, like Perl and Python do?

I forgot to add: refcounting doesn't really work in Forth, since you
don't know what cells have pointers in them. With conservative GC you
just assume anything that looks like a pointer is one and occasionally
miss reclaiming something, usually no big deal. With refcounting and
that approach, you could clobber user data thinking you were adjusting a
reference count.

And as Anton mentioned a while back, you'd have to adjust refcounts
every time a program uses DUP (etc.), which would cause a bad slowdown.
Conservative GC on the other hand suits Forth pretty well.

lawren...@gmail.com

unread,
Sep 11, 2016, 12:57:24 AM9/11/16
to
On Sunday, September 11, 2016 at 7:01:18 AM UTC+12, Paul Rubin wrote:

> Lawrence D’Oliveiro writes:
>
>> Does it limit your heap size, like Java does?
>
> Java allows you to limit the heap size with the -Xmx option but doesn't
> require you to, I thought.

There is always a limit. If you don’t specify a limit, the Sun/Oracle Java VM calculates one for you <http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html#sthref22>. On Android, there is a fixed size, dependent on the device/OS build, that you cannot change--typically well under 100MB.

>> Why not use reference-counting as a first resort, only falling back to
>> GC when you need to, like Perl and Python do?
>
> I can't speak for Perl but I think refcounting was a poor design
> decision in CPython. It had a period of usefulness because before the
> addition of the "with" statement, people relied on its undocumented
> behaviour of finalizing objects as soon as the last reference went away
> as a poor man's RAII. But "with" and context managers are the right way
> to do that in Python.

You are confusing two different things: efficient memory utilization, and semantically-guaranteed cleanup in the presence of exceptions.

> Refcounting clutters up the C API, misses reclaiming cyclic data
> (so a gc had to be added after the fact), is slower than decent gc even in
> a single thread, mandates the notorious GIL when multiple threads are
> running, etc.

The cyclic issue was why I said to use it as a “first resort”.

None of the rest of what you said is true. Example: the Linux kernel, which makes heavy use of reference-counting for its objects, yet manages to have high enough performance, and versatility, to be able to run on hardware as modest as Blackfin or ARM, all the way up to the world’s most powerful mainframes and supercomputers <http://lxr.free-electrons.com/source/arch/>. That includes massively-parallel hardware with thousands of cores.

Paul Rubin

unread,
Sep 11, 2016, 1:12:59 AM9/11/16
to
lawren...@gmail.com writes:
> There is always a limit. If you don’t specify a limit, the Sun/Oracle
> Java VM calculates one for you Android, there is a fixed size,

No idea about Android, I've only used java on servers. I thought it
would just eat all the system memory if you didn't use -Xmx. But
whatever. Certainly other systems do that.

> None of the rest of what you said is true. Example: the Linux kernel,
> which makes heavy use of reference-counting for its objects,

That's somewhat different from a language implementation where there are
lots of structures with pointers and arbitrarily deep indirection, new
objects being allocated willy nilly etc.

lawren...@gmail.com

unread,
Sep 11, 2016, 2:30:13 AM9/11/16
to
On Sunday, September 11, 2016 at 5:12:59 PM UTC+12, Paul Rubin wrote:
> Lawrence D’Oliveiro writes:
>> There is always a limit. If you don’t specify a limit, the Sun/Oracle
>> Java VM calculates one for you Android, there is a fixed size,
>
> I thought it would just eat all the system memory if you didn't use -Xmx.

That’s the big drawback with garbage collection.

>> None of the rest of what you said is true. Example: the Linux kernel,
>> which makes heavy use of reference-counting for its objects...
>
> That's somewhat different from a language implementation where there are
> lots of structures with pointers and arbitrarily deep indirection, new
> objects being allocated willy nilly etc.

In other words, rather like a multiuser, multitasking, multiprocessing kernel running lots of concurrent applications on large, complex hardware with hardware virtualization, software-defined-networking, complex memory models, virtual filesystems and next-generation filesystems like ZFS or btrfs?

Paul Rubin

unread,
Sep 11, 2016, 2:48:19 AM9/11/16
to
lawren...@gmail.com writes:
>> I thought it would just eat all the system memory if you didn't use -Xmx.
> That’s the big drawback with garbage collection.

I dunno, if you have the memory then it's reasonable for the server to
expect that you want to use it. If you want to use only part of it,
there's the -Xmx option for that purpose. Weren't you complaining
earlier that it DID set a limit? It's been a long while since I used it
so I don't remember what the default was, but I'd had the idea that it
was basically the amount of physical memory available.

> In other words, rather like a multiuser, multitasking, multiprocessing
> kernel running lots of concurrent applications on large, complex

We were talking about language implementations, which allocate a lot of
small objects (like cons cells in the case of lisp) rapidly, have
complicated and unpredictable reference graphs, etc. In CPython, even
small integers like 135 are heap-allocated (integers less than 100 are
specially cached iirc). That's a problem with different trade-offs than
an OS tracking large objects like page buffers, where the pointers live
in only a few types of other objects. I do remember that Git uses
garbage collection on disk, though I have no idea how it works
underneath.

Andrew Haley

unread,
Sep 11, 2016, 3:59:20 AM9/11/16
to
Paul Rubin <no.e...@nospam.invalid> wrote:
> lawren...@gmail.com writes:
>
>> In other words, rather like a multiuser, multitasking, multiprocessing
>> kernel running lots of concurrent applications on large, complex
>
> We were talking about language implementations, which allocate a lot
> of small objects (like cons cells in the case of lisp) rapidly, have
> complicated and unpredictable reference graphs, etc. In CPython,
> even small integers like 135 are heap-allocated (integers less than
> 100 are specially cached iirc).

You're exactly right, and it's simple to see why. With multi-core
systems and reference counting you need an atomic compare-and-
increment operation every time that you change a pointer. Even worse,
that CAS is in the target of the pointer and causes lots of cache
maintenance traffic.

Andrew.

Paul Rubin

unread,
Sep 11, 2016, 4:16:39 AM9/11/16
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> You're exactly right, and it's simple to see why. With multi-core
> systems and reference counting you need an atomic compare-and-
> increment operation every time that you change a pointer.

Yep. You even need that when you only copy a pointer, like if you say
"a = b" you have to adjust two refcounts. That's painful even on a
single core computer. CPython got away with it because the interpreter
overhead was so bad that refcounting got lost in the noise, but I don't
think higher performance systems generally work that way.
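
A minimal C11 sketch of what a plain pointer copy costs under reference
counting on a multi-core machine (the struct and function names are made up):
every assignment turns into two atomic read-modify-writes on the targets of
the pointers, which is exactly the cache traffic mentioned above.

#include <stdatomic.h>
#include <stdlib.h>

typedef struct obj { atomic_long refcount; /* ... payload ... */ } obj;

/* The refcounted equivalent of "a = b". */
static void assign(obj **a, obj *b) {
    if (b) atomic_fetch_add(&b->refcount, 1);              /* new target gains a reference */
    obj *old = *a;
    *a = b;
    if (old && atomic_fetch_sub(&old->refcount, 1) == 1)   /* old target loses one */
        free(old);                                          /* last reference is gone */
}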

Paul Rubin

unread,
Sep 11, 2016, 4:28:19 AM9/11/16
to
franck....@gmail.com writes:
>> ("BIBOP", BIg Bag Of Pages) that I think have fallen into disuse.
> This is how Oforth allocates objects...
> Is there any drawback for this method to not be used anymore ?
> Not having fragmentation is better than needing a defragmentation
> phase, no ?

Well, I'm realizing my picture comes from talking to old-time Lisp guys
since I've never used a BIBOP system myself. But my impression is that
BIBOP was used mostly in the era of expensive memory and comparatively
small address spaces. For example, the PDP-10 had 18-bit addresses
(256k of 36-bit words = about 1MB of address space) and Maclisp on the
'10 used BIBOP. The upper 9 bits of the 18-bit address indexed into a
512-entry segment table, and the segments were 512 words each. (Thanks
to Alan Bawden for explaining this to me on comp.lang.lisp a while
back). Once you get to 64-bit or even 32-bit addresses, those tables
start getting large and it's easier to put tags in the objects.

Lisp programs also have a fair number of arbitrary-sized objects like
strings and arrays, so there's still fragmentation issues.

Jones et al's book probably says more about this. I might check it when
I get a chance.

lawren...@gmail.com

unread,
Sep 11, 2016, 10:00:36 PM9/11/16
to
On Sunday, September 11, 2016 at 6:48:19 PM UTC+12, Paul Rubin wrote:
>
> Lawrence D’Oliveiro writes:
>
>>> I thought it would just eat all the system memory if you didn't use -Xmx.
>>
>> That’s the big drawback with garbage collection.
>
> I dunno, if you have the memory then it's reasonable for the server to
> expect that you want to use it.

If your server is only doing one thing, then fine. Otherwise it’s not a very system-friendly thing to do.

On Linux, if you eat up all the RAM, you will arouse the wrath of the dreaded “OOM Killer”.

lawren...@gmail.com

unread,
Sep 11, 2016, 10:02:36 PM9/11/16
to
On Sunday, September 11, 2016 at 8:16:39 PM UTC+12, Paul Rubin wrote:
> You even need that when you only copy a pointer, like if you say
> "a = b" you have to adjust two refcounts. That's painful even on a
> single core computer.

Could be worse. With GC, you are whacking memory that has long since fallen out of the cache. So you are going to incur a shitload of cache misses. Not good for performance at all.

I remember in the mid to late 1990s, Corel undertook a project to rewrite its office suite in Java. Then it found out how bad the performance was...

Anton Ertl

unread,
Sep 12, 2016, 6:09:36 AM9/12/16
to
lawren...@gmail.com writes:
>On Monday, September 5, 2016 at 6:28:59 AM UTC+12, Paul Rubin wrote:
>> ... ALLOC notices if you're running low on memory, and performs a GC if you
>> are.
>
>How does it decide when you are “running low on memory”?

This is actually documented pretty exhaustively in the manual:

|Tuning
|
|gc.fs generally uses only a part of the memory it manages. This policy
|allows you to give it a huge amount of memory to manage, as
|preparation for the worst case, without consuming that much real (and,
|for good OSes, even virtual) memory before it is necessary.
|
|The amount of memory actually used is determined in the following way:
|
|memory= (a * livemem / b) + c
|
|Where livemem is the amount of memory in live objects after the last
|garbage collection, and a, b, and c are tunable parameters. By
|default, a/b is 3, and c is 256KB.
|
|You can tune these parameters for your application: You can reduce the
|memory, at the cost of a higher number of garbage collections, or you
|can increase it to reduce the number of garbage collections. Note that
|if you reduce memory too much (to less than about 2*livemem), the
|number of garbage collections rises extremely. And if a
|garbage-collection does not free an appropriate memory chunk, alloc
|will grab more memory regardless of the limit.
|
|The following code sequence demonstrates the adjustment of parameters
|by resetting them to their default values:
|
|also garbage-collector
|3 1 ( a b ) size-factor 2!
|256 1024 * ( c ) grain/ size-offset !
|set-current-limit
|previous

You also have to tell gc.fs how much maximum memory it should use; it
will allocate that much (plus 1/32 of that) from the start, but will
write to only the part of it that the "memory" above specifies; so
only that much physical memory (and, with overcommit, that much
virtual memory) is really consumed; in other words, if you have an OS
with overcommit, you can make the maximum memory big, e.g., the RAM
size, and if the amount of live memory stays small, the amount of used
memory will also stay relatively small.

> Does it limit your heap size, like Java does?

I have no idea how Java does it.

>Why not use reference-counting as a first resort, only falling back to GC
>when you need to, like Perl and Python do?

As others have mentioned, that would be inefficient. Moreover, having
both reference counting and GC is more complex than necessary. Most
importantly, you would have to prefix all operations that deal with
references in order to add the reference counting; e.g., RDUP ROVER R@
R!; if the programmer forgets this once, the invariant of reference
counting is broken, and such bugs are not easy to find. For Python
and Perl, that's no problem, because there the system keeps track of
references (by tagging).

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

Anton Ertl

unread,
Sep 12, 2016, 6:16:47 AM9/12/16
to
Lars Brinkhoff <lars...@nocrew.org> writes:
>Can you make it Emacs Lisp, so we can use it to rewrite Emacs in Forth?
>
>Just kidding. Or maybe not?

There is Forthmacs, but there the macro language is Forth.
Otherwise, why rewrite the C part of GNU Emacs in Forth? It's a big
and complex program, and you would have to maintain compatibility with
its big and complex interface (otherwise, why go for Lisp at all?).

Anton Ertl

unread,
Sep 12, 2016, 6:23:17 AM9/12/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>In article <jwvmvji74hy.fsf-mon...@gnu.org>,
>Stefan Monnier <mon...@iro.umontreal.ca> wrote:
>>>> the whole memory in use by the application is considered as potential
>>>> pointers
>>> That's wrong. If it was, cyclic structures would not be collected,
>>
>>That depends on how you interpret "use".

All memory that the collector has given out to the application and
that has not been reclaimed.

>In the context of conservative garbage collectors, I expect that sometimes
>cyclic structures are not collected.

If you use the "in use" memory as specified above as the root set,
then cyclic structures will never be collected.

Lars Brinkhoff

unread,
Sep 12, 2016, 7:15:07 AM9/12/16
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Lars Brinkhoff <lars...@nocrew.org> writes:
>>Can you make it Emacs Lisp, so we can use it to rewrite Emacs in Forth?
>>Just kidding. Or maybe not?
> There is Forthmacs, but there the macro language is Forth.

I thought Forthmacs was a Forth implementation. Did it have an Emacs
clone too?

Albert van der Horst

unread,
Sep 12, 2016, 7:33:58 AM9/12/16
to
In article <2016Sep1...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>lawren...@gmail.com writes:
>>On Monday, September 5, 2016 at 6:28:59 AM UTC+12, Paul Rubin wrote:
>>> ... ALLOC notices if you're running low on memory, and performs a GC if you
>>> are.
>>
>>How does it decide when you are “running low on memory”?
>
>This is actually documented pretty exhaustively in the manual:
>
<SNIP memory allocation strategy >

I believe that the package behaves reasonably with the defaults.

What I don't understand is where the information enters that
something is no longer needed. (I only do allocs, I'm not supposed
to do frees, but the garbage collection must have *something* to
work with.)

>
>- anton

Stefan Monnier

unread,
Sep 12, 2016, 11:47:16 AM9/12/16
to
>>>>> the whole memory in use by the application is considered as potential
>>>>> pointers
>>>> That's wrong. If it was, cyclic structures would not be collected,
>>> That depends on how you interpret "use".
> All memory that the collector has given out to the application and
> that has not been reclaimed.

I read it as: the memory that was found to be reachable.


Stefan

Anton Ertl

unread,
Sep 12, 2016, 12:59:14 PM9/12/16
to
But you need the pointers to find reachable memory, so your reading is
not sufficient for determining live memory; the other reading also has
its problems, as pointed out above, and more importantly, it's not
what conservative collectors do, so I just would not use this wording
at all.

Anton Ertl

unread,
Sep 12, 2016, 1:26:18 PM9/12/16
to
I thought so, but I could be wrong.

Anton Ertl

unread,
Sep 12, 2016, 1:29:31 PM9/12/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>What I don't understand is where the information enters that
>something is no longer needed. (I only do allocs, I'm not supposed
>to free's, but the garbage collection must have *something* to
>work with.)

What happens is that, starting with the root set, the garbage
collector recursively scans for live memory. Anything that is not
found is no longer needed.
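[For readers who want to see the shape of this, here is a minimal C sketch of
such a mark phase; the object layout, the field names and the elided sweep are
illustrative assumptions, not any particular collector's code.]

#include <stddef.h>
#include <stdbool.h>

/* Illustrative object layout: each object knows which other objects
   it points to.  Real collectors get this from type information or
   by scanning memory words conservatively. */
typedef struct object {
    bool            marked;
    size_t          nchildren;
    struct object **children;
} object;

/* Mark phase: everything reachable from the roots is live. */
static void mark(object *obj)
{
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    for (size_t i = 0; i < obj->nchildren; i++)
        mark(obj->children[i]);
}

void collect(object **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    /* Sweep (elided): walk the allocation list, reclaim unmarked
       objects, clear the mark bit on the survivors. */
}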

Stefan Monnier

unread,
Sep 12, 2016, 1:38:36 PM9/12/16
to
>>>>>>> the whole memory in use by the application is considered as potential
>>>>>>> pointers
>>>>> That's wrong. If it was, cyclic structures would not be collected,
>>>>> That depends on how you interpret "use".
>>> All memory that the collector has given out to the application and
>>> that has not been reclaimed.
>> I read it as: the memory that was found to be reachable.
> But you need the pointers to find reachable memory,

Right, and the conservative GC considers anything that looks like
a pointer to be a pointer w.r.t. deciding whether something is reachable or not.
So you *do* have "the pointers".

> it's not what conservative collectors do, so I just would not use this
> wording at all.

I wouldn't either, but in the context I find it understandable.


Stefan

Albert van der Horst

unread,
Sep 12, 2016, 2:48:02 PM9/12/16
to
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>What I don't understand is where the information enters that
>>something is no longer needed. (I only do allocs, I'm not supposed
>>to free's, but the garbage collection must have *something* to
>>work with.)
>
>What happens is that, starting with the root set, the garbage
>collector recursively scans for live memory. Anything that is not
>found is no longer needed.

In this context I find this nonsensical. The root set is part of
the application. The garbage collector only knows what has been
alloced, so as far as the garbage collector is concerned there
is no such thing as a "root set".

So please explain.

>
>- anton

Paul Rubin

unread,
Sep 12, 2016, 4:08:40 PM9/12/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> so as far as the garbage collector is concerned there
> is no such thing as a "root set". So please explain.

gc.fs has some words where you tell the gc what the root set is.
The data stack contents are also in the root set. The return stack
contents are not, since there's no standard way to get at them,
but it seems worth doing that if your Forth has a way.

If you look at small Lisp implementations, their gc's also scan a
predefined root set. I've been looking at uLisp (ulisp.com) lately
and you might find it educational. It's about 3 kloc, about 1/3 of
which is Arduino specific.
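[A hypothetical C sketch of what the root-set registration described above can
look like; the names below are made up for illustration and are not the actual
gc.fs words.]

#include <stddef.h>

/* Hypothetical root registration, not the gc.fs API: the application
   tells the collector which cells to start marking from. */
#define MAX_ROOT_RANGES 64

typedef struct { void **start; size_t ncells; } root_range;

static root_range roots[MAX_ROOT_RANGES];
static size_t     nroots;

/* Register a range of cells (a global, a handle table, or a copy of
   the data stack) as part of the root set. */
void gc_add_roots(void **start, size_t ncells)
{
    if (nroots < MAX_ROOT_RANGES)
        roots[nroots++] = (root_range){ start, ncells };
}

/* At collection time the collector marks from every registered cell
   (plus, as described above, the data stack contents). */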

Spiros Bousbouras

unread,
Sep 13, 2016, 5:37:35 PM9/13/16
to
On Sat, 10 Sep 2016 07:23:43 -0700 (PDT) franck....@gmail.com wrote:
> Paul Rubin on 2016-09-09T22:06:04+02:00
I don't see what BiBOP has to do with defragmentation. Say you allocate
contiguous regions A, B, C, ... of the same type and, a while afterwards,
after garbage collection, the system sees that region B is no longer
accessed and can therefore be freed. Then either it copies the regions above B
down to cover the range of addresses covered by B, or you have fragmentation. As
far as I know, the usefulness of BiBOP is to save memory: instead of having
a type tag for each allocated object you have one type tag per page, so you can
determine the type of an object from the page it is allocated in.
Obviously it would be impractical to do this for every type, but it is
practical for the most commonly used types, like integers.
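[A small C sketch of the space saving described here: one tag per page instead
of one per object, so the type is computed from the address. The page size,
table size and type names are illustrative assumptions, not any real system's.]

#include <stdint.h>

#define PAGE_BITS 12               /* assume 4 KiB pages      */
#define NPAGES    (1u << 20)       /* enough for a 4 GiB heap */

enum type { T_FREE, T_CONS, T_FIXNUM, T_SYMBOL, T_STRING };

/* One byte of type information per page, not per object. */
static uint8_t page_type[NPAGES];

static enum type type_of(void *p)
{
    return (enum type)page_type[(uintptr_t)p >> PAGE_BITS];
}

/* An allocator that hands out a whole page per type can then drop
   the per-object tag for common types such as fixnums and conses. */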

Paul Rubin

unread,
Sep 13, 2016, 5:41:00 PM9/13/16
to
franck....@gmail.com writes:
> btw there is a page about how Oforth's GC works here :

Franck, was Oforth ever released as source code? Are you still thinking
of doing that? I half-remember an old conversation where you weren't
sure. It's a nice looking system.

Albert van der Horst

unread,
Sep 14, 2016, 3:47:14 AM9/14/16
to
In article <xk_Bz.1470863$3%1.11...@fx40.am4>,
I can see that. Suppose lisp objects are normally 4 cells.
Then there are byte sequences (like character strings) and arrays
(like bignums). If the lisp objects are in a separate space,
you need not compact. In the above example, B can never become
an unusable fragment because one lisp object was there, and one new lisp
object still fits there.

Groetjes Albert

Anton Ertl

unread,
Sep 14, 2016, 3:58:21 AM9/14/16
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>In article <2016Sep1...@mips.complang.tuwien.ac.at>,
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>>What I don't understand is where the information enters that
>>>something is no longer needed. (I only do allocs, I'm not supposed
>>>to free's, but the garbage collection must have *something* to
>>>work with.)
>>
>>What happens is that, starting with the root set, the garbage
>>collector recursively scans for live memory. Anything that is not
>>found is no longer needed.
>
>In the context I find this non-sensical. The root-set is part of
>the application. The garbage collector only knows what has been
>alloced, so as far as the garbage collector is concerned there
>is no such thing as a "root set".

Every garbage collector uses a root set, so there obviously is such a
thing.

One automatic approach is to treat all memory that is not allocated
through the collector as roots; I think Boehm-Weiser does something
like this.

For a Lisp garbage collector, typically the root of the Lisp symbol
table, the stack, and possibly registers are used as roots.

For my Forth garbage collector, the data stack and explicitly
registered memory locations are the root set. There is no standard
way to access the return stack or locals, nor all of the dictionary,
and I wanted the garbage collector to be a standard library, so that's
how I arrived at that decision.
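[For the Boehm-style case, a minimal C sketch of treating a memory region, for
instance a stack, as conservative roots; heap_lo, heap_hi and mark_address are
assumed helper names, not part of any particular collector.]

#include <stdint.h>

extern char *heap_lo, *heap_hi;              /* assumed heap bounds          */
extern void  mark_address(void *candidate);  /* assumed: marks the object
                                                containing this address      */

/* Every cell whose value happens to fall inside the heap is treated
   as a pointer, whether it really is one or not. */
void scan_region_as_roots(void **lo, void **hi)
{
    for (void **cell = lo; cell < hi; cell++) {
        char *p = (char *)*cell;
        if (p >= heap_lo && p < heap_hi)
            mark_address(p);
    }
}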

franck....@gmail.com

unread,
Sep 14, 2016, 4:04:41 AM9/14/16
to
I'm not sure Oforth is a BiBOP system (I didn't know the name before Paul
used it), but Oforth allocates objects according to their size (i.e. number
of attributes).

If you allocate contiguous regions A, B, C in the same page, it
means that objects at A, B and C addresses have the same size
(but not necessarily the same type).

You are right: if the GC releases region B because the object is
no longer used, fragmentation happens, but the next (well, perhaps not
the next) object to be allocated with the same number of attributes, i.e. the
same size (even if not the same type), will be allocated in region B,
because the size of region B is exactly the size needed for the object being
allocated.

So object allocation absorbs the fragmentation: there is no compaction or
defragmentation phase during the GC (no copying, no pointer updates, ...).

Franck
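[A C sketch of the reuse Franck describes: a freed slot is kept on a free list
for its size, and the next request of that size fills the hole exactly. The
size classes and list handling are illustrative, not Oforth's code.]

#include <stddef.h>
#include <stdlib.h>

#define MAX_ATTRS 32

typedef struct slot { struct slot *next; } slot;

static slot *free_list[MAX_ATTRS + 1];     /* one list per object size */

void *alloc_object(size_t nattrs)
{
    if (nattrs <= MAX_ATTRS && free_list[nattrs]) {
        slot *s = free_list[nattrs];       /* reuse a hole of this size */
        free_list[nattrs] = s->next;
        return s;
    }
    return malloc((nattrs + 1) * sizeof(void *));   /* fresh block */
}

void release_object(void *p, size_t nattrs)
{
    if (nattrs > MAX_ATTRS) { free(p); return; }
    slot *s = p;
    s->next = free_list[nattrs];           /* the hole waits for the   */
    free_list[nattrs] = s;                 /* next object of this size */
}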

franck....@gmail.com

unread,
Sep 14, 2016, 6:37:56 AM9/14/16
to
Thanks,

Yes, it's planned for the V1.0 release.
The problem is the V1.0 release date...
I think there will be another minor release, then V1.0.

Lars Brinkhoff

unread,
Sep 27, 2016, 2:51:51 AM9/27/16
to

Paul Rubin

unread,
Sep 27, 2016, 3:47:33 AM9/27/16
to
Lars Brinkhoff <lars...@nocrew.org> writes:
> "A Forth Implemenation of LISP"
> http://soton.mpeforth.com/flag/jfar/vol5/no1/article22.pdf

Interesting but unfortunately gives almost no implementation details.

Gerry Jackson

unread,
Sep 30, 2016, 2:47:56 AM9/30/16
to
On 29/08/2016 17:46, Paul Rubin wrote:
> alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> The following is inspired by the website
>> http://www.buildyourownlisp.com
>> This site is beyond brilliant. A comparison with the lisp's in
>> Forth I've seen is ... well let's not compare.
>
> You might also like:
>
> https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours
>
> Or just look at source code of many of the Lisps out there that are
> written in C or in Lisp itself.
>
> You should probably also read SICP at some point:
>
> https://mitpress.mit.edu/sicp/
>
> It looks like the Lisp from buildyourownlisp.com is very limited.
>

Has anybody mentioned Lispkit, which seems to have been implemented in a
few languages?

https://en.wikipedia.org/wiki/Lispkit_Lisp

It is built on the SECD virtual machine and includes garbage collection.
Not being a Lisper I have no idea how good, bad or indifferent it is.

--
Gerry

Albert van der Horst

unread,
Sep 30, 2016, 5:13:59 AM9/30/16
to
In article <nsl1qf$ip2$1...@dont-email.me>,
The lexicon is very helpful, because it's so concise.
This lisp is for experimenting with advanced functions; it is purely
functional. That is not what I intend to try.

>
>--
>Gerry

Paul Rubin

unread,
Sep 30, 2016, 9:36:11 PM9/30/16
to
Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
> https://en.wikipedia.org/wiki/Lispkit_Lisp
> It is built on the SECD virtual machine and includes garbage
> collection. Not being a Lisper I have no idea how good, bad or
> indifferent it is.

I hadn't seen that before, but it looks like a research platform from
quite a long time ago, maybe of some interest to study but not for
actual use.

Albert van der Horst

unread,
Oct 1, 2016, 5:11:08 AM10/1/16
to
In article <87intc3...@jester.gateway.pace.com>,
The first wikipedia reference
http://www.cs.ox.ac.uk/files/3299/PRG32%20vol%201.pdf
is valuable to me because it is a description that doesn't assume
prior knowledge.

It is an interesting read for all Forthers who want to get an
impression of how lisp is implemented without spending an undue amount
of time.

luser droog

unread,
Oct 1, 2016, 10:34:20 PM10/1/16
to
On Saturday, October 1, 2016 at 4:11:08 AM UTC-5, Albert van der Horst wrote:
> In article <87intc3...@jester.gateway.pace.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
> >Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
> >> https://en.wikipedia.org/wiki/Lispkit_Lisp
> >> It is built on the SECD virtual machine and includes garbage
> >> collection. Not being a Lisper I have no idea how good, bad or
> >> indifferent it is.
> >
> >I hadn't seen that before, but it looks like a research platform from
> >quite a long time ago, maybe of some interest to study but not for
> >actual use.
>
> The first wikipedia reference
> http://www.cs.ox.ac.uk/files/3299/PRG32%20vol%201.pdf
> is valuable to me because it is a description that doesn't assume
> prior knowledge.
>
> It is an interesting read for all Forthers who want to get an
> impression of how lisp is implemented without spending an undue amount
> of time.
>

Excellent link. And excellent series of threads!

In writing my own attempt at a very low-level lisp,
https://github.com/luser-dr00g/sexp.c
I found "tinylisp" to be a very usefully documented project,
almost "literate" in its explanation of the representations.

It (my own code) doesn't address the gc issue at all.
tinylisp, IIRC, does.

luser droog

unread,
Oct 21, 2016, 8:02:39 PM10/21/16
to
On Monday, August 29, 2016 at 11:33:06 AM UTC-5, Albert van der Horst wrote:
> The following is inspired by the website
> http://www.buildyourownlisp.com
> This site is beyond brilliant. A comparison with the lisp's in
> Forth I've seen is ... well let's not compare.

I have a very terse, functional, factored lisp in c.
When researching this stuff, I found the above site
interesting, but the "tinylisp" blog much more practical.
Link in the source.

https://github.com/luser-dr00g/sexp.c/blob/master/sexp.c

It uses a custom 6-bit encoding to pack 5-character
atoms into an int (+ tag). This has obvious implementation
advantages but does constrain the resulting language
considerably.
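[A rough C sketch of that kind of packing: five characters of a 64-symbol
alphabet take 30 bits, leaving room for a small tag in a 32-bit word. The
alphabet and tag layout below are made-up assumptions, not the encoding that
sexp.c actually uses.]

#include <stdint.h>
#include <string.h>

static const char alphabet[] =
    " abcdefghijklmnopqrstuvwxyz0123456789-+*/<>=!?._ABCDEFGHIJKLMN";

#define TAG_ATOM 1u                 /* assumed 2-bit tag in the low bits */

uint32_t pack_atom(const char *name)
{
    uint32_t code = 0;
    for (int i = 0; i < 5 && name[i]; i++) {
        const char *p = strchr(alphabet, name[i]);
        uint32_t sym = p ? (uint32_t)(p - alphabet) : 0;
        code |= sym << (2 + 6 * i); /* 6 bits per character */
    }
    return code | TAG_ATOM;         /* low bits say "this is an atom" */
}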

Albert van der Horst

unread,
Oct 21, 2016, 8:34:00 PM10/21/16
to
In article <0ff88afb-d576-4535...@googlegroups.com>,
luser droog <mij...@yahoo.com> wrote:
>On Monday, August 29, 2016 at 11:33:06 AM UTC-5, Albert van der Horst wrote:
>> The following is inspired by the website
>> http://www.buildyourownlisp.com
>> This site is beyond brilliant. A comparison with the lisp's in
>> Forth I've seen is ... well let's not compare.
>
>I have a very terse, functional, factored lisp in c.
>When researching this stuff, I found the above site
>interesting, but the "tinylisp" blog much more practical.
>Link in the source.
>
>https://github.com/luser-dr00g/sexp.c/blob/master/sexp.c

Added to my list of reference implementations.
Thanks.

>
>It uses a custom 6bit encoding to pack 5-character
>atoms into an int (+ tag). This has obvious implementation
>advantages but does constrain the resulting language
>considerably.

My goal is to gain insight in fundamental lisp matters.
My enthusiasm for http://www.buildyourownlisp.com has waned,
as I find that it sidetracks too much from the essential
issues. I still appreciate its clarity.

My main focus now is on lispkit, which is sufficiently well known
that it has a Wikipedia entry. The kernel is written in Pascal
and I can follow it completely. What's more, I can translate
the limited subset of Pascal that is used into Forth pretty
mechanically.

Albert van der Horst

unread,
Nov 2, 2016, 2:07:03 PM11/2/16
to
In article <%mzzz.1531332$jB.12...@fx43.am4>,
Spiros Bousbouras <spi...@gmail.com> wrote:
<SNIP>
>> Then the code is high level Forth code, and again it is allocated
>> dynamically and a pointer to it is present in the header.
>> (High level Forth code is, of course a list of code tokens.)
>> Contrary to malloc, my ALLOCATE knows a lot about
>> the size etc of pieces allocated.
>
>malloc() also knows about the size of pieces allocated otherwise it wouldn't
>work. The interface does not make such information available to the programmer
>but the internals of malloc() (and related functions) certainly need to know.

That is true, but that is on behalf of freeing and resizing, and is in
general not byte-count precise (e.g. a buddy implementation).
Lisp needs the exact size of symbols, bignums and strings, and those sizes
are available via my ALLOCATE, saving a cell and some hassle.

Groetjes Albert

>--
>it's just that in C++ and the like, you don't trust _anybody_, and in CLOS you
>basically trust everybody. the practical result is that thieves and bums use C++
>and nice people use CLOS. :)
> Erik Naggum
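[For comparison, a generic C sketch of the extra cell one would spend on top of
plain malloc to keep the exact byte count around, which is the cell Albert says
his ALLOCATE saves; this is illustration only, not ciforth's ALLOCATE.]

#include <stddef.h>
#include <stdlib.h>

void *sized_alloc(size_t nbytes)
{
    size_t *p = malloc(sizeof(size_t) + nbytes);
    if (p == NULL) return NULL;
    *p = nbytes;            /* exact byte count, not a rounded-up bucket */
    return p + 1;
}

size_t sized_length(void *obj)  /* e.g. the length of a string or bignum */
{
    return ((size_t *)obj)[-1];
}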

Liang Ng

unread,
Feb 8, 2020, 8:59:56 PM2/8/20
to
Dear All,

I found this 2016 thread while searching for SECD Machine.

Just wondering if anyone has recent updates on this topic?

It looks like a "functional" theoretical foundation for stack machines.

How different is it from Forth?

Has anyone attempted to present a "theoretical foundation" for Forth similar to the SECD machine?

I have not seen much useful information elsewhere.

Is Forth considered an "outlaw" by "functional theorists", such that they don't even care to mention it except briefly here?

https://xavierleroy.org/mpri/2-4/machines.pdf

Thank you very much.

On Sunday, 2 October 2016 10:34:20 UTC+8, luser droog wrote:
> On Saturday, October 1, 2016 at 4:11:08 AM UTC-5, Albert van der Horst wrote:
> > In article <87in...com>,
> > Paul Rubin <no.email....invalid> wrote:

sydney...@gmail.com

unread,
Feb 9, 2020, 11:28:58 AM2/9/20
to
> Dear All,
>
> I found this 2016 thread while searching for SECD Machine.
>
> Just wondering if anyone has recent updates on this topic?
>
> It looks like a "functional" theoretical foundation for stack machines.
>
> How different is it from Forth?
>
> Has anyone attempted to present a "theoretical foundation" for Forth similar to SECD machine?
>
> I have not seen many useful information else where.
>
> Is Forth considered "outlaw" by "functional theorists" that they don't even care to mention it except briefly here?

Here's a video which goes into some detail regarding the theoretical underpinnings of concatenative languages as a whole, of which Forth is the prototypical example: https://www.youtube.com/watch?v=_IgqJr8jG8M It just so happens that - quite coincidentally - Moore invented a language that is a pretty good model of the SKI combinator calculus, in the same way McCarthy invented a language that is a pretty good model of the λ calculus. Similar to how Scheme came along and brought it (Lisp) even closer (you may even say as close as you can get practically) to its theoretical roots, later concatenative languages have come along and brought Forth (directly or indirectly) closer to its theoretical roots in SKI combinators.
You may find this link on the mathematical underpinnings of the language Joy interesting also: https://hypercubed.github.io/joy/html/j02maf.html
What Forth "lacks" (in its purest form I suppose you could say) is a means of easy compound quotation; providing a direct syntax for "delaying" a group of expressions in a similar way to what ['] does for a single execution token really opens up a world of possibilities in terms of combinators or "higher-order" words (words that take other words as arguments, like an fmap word for example). When you combine that with its relative lack of concern regarding mutable/destructive updates, that might make it a bit less interesting than more modern concatenative languages from a theoretical perspective.
hth, Syd.