Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Not understanding how to write low-level code in Lisp

800 views
Skip to first unread message

CuppoJava

unread,
Jun 8, 2011, 3:25:26 PM6/8/11
to
Hello everyone,

I've read a few times now about compilers, interpreters, and systems
for Lisp written in Lisp itself. I would also like to write a Lisp
system in Lisp but have trouble understanding how to write anything
decently performant in it.

So far I've written:
- A bunch of high level Lisp programs.
- A few Lisp interpreters and compilers in C.
- A metacircular Lisp interpreter in Lisp.

The main questions I have about writing a Lisp in Lisp is:
- How do you write a garbage collector in Lisp? In C it was relatively
straightforward since I have access to the bits directly. In Lisp I
don't really understand how it's done without performance losses.
- When I design my interpreter how do I control how the bits are laid
out in memory?

Thanks a lot for your help. I really love Lisp and writing a self-
compiling compiler is a goal of mine.
-Patrick

Pascal J. Bourguignon

unread,
Jun 8, 2011, 4:45:46 PM6/8/11
to
CuppoJava <patrick...@gmail.com> writes:

> Hello everyone,
>
> I've read a few times now about compilers, interpreters, and systems
> for Lisp written in Lisp itself. I would also like to write a Lisp
> system in Lisp but have trouble understanding how to write anything
> decently performant in it.
>
> So far I've written:
> - A bunch of high level Lisp programs.
> - A few Lisp interpreters and compilers in C.
> - A metacircular Lisp interpreter in Lisp.
>
> The main questions I have about writing a Lisp in Lisp is:
> - How do you write a garbage collector in Lisp? In C it was relatively
> straightforward since I have access to the bits directly.

Actually, Common Lisp has better and more primitives to access the bits
directly than C.

The only thing that is out of the scope of CL, is the peeking and poking
of bytes in raw memory. But most CL implementations provide such
primitives (with the associated "address" type), usually under the guise
of a FFI. You can also use CFFI to use those implementations specific
extensions in a portable way.

Of course, since you are considering implementing CL yourself, you won't
rely on the bootstrap CL implementation to peek and poke, you will
implement those low level primitives yourself.

> In Lisp I don't really understand how it's done without performance
> losses.

What performance losses? If you are implementing a compiler, then you
can make it generate code as performant as you want.


> - When I design my interpreter how do I control how the bits are laid
> out in memory?

Why would that have any importance? For an interpreter there's little
point in doing that.

For a compiler you have to decide how the bits are laid out in memory.
If you want your interpreter to run with programs compiled with your
compilers, then just compiling it with your compiler should be enough to
ensure the compatibility, ie. that they use the same data structures.

> Thanks a lot for your help. I really love Lisp and writing a self-
> compiling compiler is a goal of mine.

Have a look at:
https://gitorious.org/com-informatimago/com-informatimago/trees/master/common-lisp/heap
which implements a simple garbage collected heap in which you can store
lisp data.


The first thing you have to ask yourself when writing a Lisp compiler in
Lisp, is what ABI you need to implement. How will your binary be
loaded, by what system, in what kind of memory addressing space.

If you are not implementing the OS yourself (eg. in Lisp), then the
answers to these questions are mostly given by the host OS. You may
somewhat isolate your compiler from the idiosyncrasies of a given host
OS, but this involves implementing a virtual machine (ie. to implement
an OS yourself).


This is however the choice made by most CL implementations, including
those that compile to native code such as SBCL: the binary files
generated by those implementations are not mere .o files, but .fas or
.fasl files, that can only be loaded in their own virtual machines, and
not by normal programs of the host OS.

A few implementations choose to avoid the virtual machine, and use
directly the ABI of the host OS, for example, ECL and ABCL. More or
less. For ABCL, the host OS is the JVM. For ECL, while it generates
.o files, and they can be loaded by normal applications of the host OS,
it also contains a virtual machine with a byte code interpreter and a
lisp interpreter.


So, once you know what object file format you must generate, and what
ABI (ie. function call conventions, register uses, placement of code,
heap and stack in memory, reference to library entry points, etc), it's
trivial to generate such files, since obviously you can generate binary
files in Common Lisp, using the :element-type '(unsigned-byte 8).

Well, if you are running on a 32-bit or 64-bit unix or MS-Windows
system. If you are running on a VMS 36-bit architecture, you will
probably need to use (unsigned-byte 9) instead (or perhaps not, things
can get complicated).


The important point is that your compiler doesn't need to use raw memory
to store the binary code it generates before writing it to the object
file. It can just use a (vector (unsigned-byte 8) *). (Then you can
use write-sequence to write the object file).

Now, if you want to use your compiler to generate code in memory to be
run directly (ie. implementing COMPILE instead of COMPILE-FILE), then
you will need to store the bytes in a raw memory block with execute
access rights. This is something that can only be done knowing the host
OS ABI. But since you will have written already COMPILE-FILE to
generate code that knows this host OS ABI, you can compile COMPILE with
COMPILE-FILE, and ensure that it has all the "primitives" needed to do
just that. (And again, the host OS could be your own virtual machine,
so it may be trivial: you might have implemented in your virtual machine
a primitive to install a byte vector as the binary code of a function:
(system:set-function-code (function f)
#(67 79 68 69 32 79 70 32 70 85 78 67 84 73 79 78 32 70))
).

--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.

CuppoJava

unread,
Jun 9, 2011, 10:27:03 AM6/9/11
to
Thanks for the reply, it was very helpful.

Of course, I realize that the code generation part of the compiler has
almost nothing to do with using C or Common Lisp.

However, I don't understand how to implement all the supporting
systems in Lisp. For example, the garbage collector and virtual
machine for my current Lisp system is implemented in C. The whole
notion of writing a garbage collector in a garbage collected language
eludes me.

For an interpreter, controlling how the bits are laid out are fairly
important for performance and size issues. It's not necessary as the
metacircular interpreter demonstrates. But the way in which bits are
laid out affects performance drastically.

Thanks
-Patrick

On Jun 8, 4:45 pm, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:

> Have a look at:https://gitorious.org/com-informatimago/com-informatimago/trees/maste...

Antti J Ylikoski

unread,
Jun 9, 2011, 12:34:41 PM6/9/11
to

If I recall correctly then the Symbolics 3600 LISP machine had a LISP
compiler written in LISP -- and actually an entire operating system and
a graphical user inferface written in LISP (Zetalisp, which was one of
the precursors of the Common LISP + CLOS). Including a sophisticated
garbage collector. You might be able to get in your hands (some of)
this source code.

Cheers, Andy
Helsinki, Finland, the E.U.

Pascal J. Bourguignon

unread,
Jun 9, 2011, 1:05:14 PM6/9/11
to
CuppoJava <patrick...@gmail.com> writes:

> Thanks for the reply, it was very helpful.
>
> Of course, I realize that the code generation part of the compiler has
> almost nothing to do with using C or Common Lisp.
>
> However, I don't understand how to implement all the supporting
> systems in Lisp. For example, the garbage collector and virtual
> machine for my current Lisp system is implemented in C. The whole
> notion of writing a garbage collector in a garbage collected language
> eludes me.

What garbage collector?
The Common Lisp language doesn't specify any garbage collector.
How do you know there's a garbage collector in your implementation?
Have you ever filled up your memory (RAM + SWAP)?

Read my heap.lisp file.

Now, of course, if you had to allocate heap objects when collecting
garbage, you'd have a problem because since you're collecting garbage,
it means you don't have memory to allocate, and therefore you'd have to
collect garbage to be able to collect garbage. Clearly, an infinite
recursion.

How do you avoid infinite recursions? Trivially, by providing a base
case: write your garbage collector so that it doesn't need to allocate
heap objects!

Check com.informatimago.common-lisp.heap:gc-mark and
com.informatimago.common-lisp.heap:gc-sweep. They do not call
com.informatimago.common-lisp.heap:gc-allocate. Therefore they're a
base case for the garbage collection recursion.


Here is another trivial memory with mark and sweek garbage collector.

The garbage collector is a few lines. Again, the trick is that this
code just doesn't call new-cons or new-fixnum, therefore it's not
"consing" any memory:


(defun gc-mark (address)
"Marks the object at address and all the objects it deeply references."
(when (zerop (mark address))
(setf (mark address) 1)
(ecase (tag address)
((#.+tag/fixnum+)
#|nothing else to do.|#)
((#.+tag/cons+)
(gc-mark (car-value address))
(gc-mark (cdr-value address))))))

(defun gc-sweep ()
"Sweep: start from first cell, and in sequence,
check whether it's marked, and gather free cells."
(loop
with free = 0
with collected = 0 ; some statistics.
for addr from *heap-start* below *memory-size*
do (ecase (mark addr)
((0) ; collect it
(setf (cdr-value addr) free
free addr)
(incf collected))
((1) ; keep it
(setf (mark addr) 0)))
finally (setf free-list free)
(return collected)))

(defun memory-collect-garbage ()
(gc-mark root-set)
(gc-sweep))


;;;; -*- mode:lisp;coding:utf-8 -*-
;;;;**************************************************************************
;;;;FILE: trivial-gc.lisp
;;;;LANGUAGE: Common-Lisp
;;;;SYSTEM: Common-Lisp
;;;;USER-INTERFACE: NONE
;;;;DESCRIPTION
;;;;
;;;; A trivial memory with mark and sweep garbage collector.
;;;;
;;;;AUTHORS
;;;; <PJB> Pascal J. Bourguignon <p...@informatimago.com>
;;;;MODIFICATIONS
;;;; 2011-06-09 <PJB> Created.
;;;;BUGS
;;;;LEGAL
;;;; GPL
;;;;
;;;; Copyright Pascal J. Bourguignon 2011 - 2011
;;;;
;;;; This program is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU General Public License
;;;; as published by the Free Software Foundation; either version
;;;; 2 of the License, or (at your option) any later version.
;;;;
;;;; This program is distributed in the hope that it will be
;;;; useful, but WITHOUT ANY WARRANTY; without even the implied
;;;; warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
;;;; PURPOSE. See the GNU General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU General Public
;;;; License along with this program; if not, write to the Free
;;;; Software Foundation, Inc., 59 Temple Place, Suite 330,
;;;; Boston, MA 02111-1307 USA
;;;;**************************************************************************


;; Let's have a memory.
(defparameter *memory-size* 128)
(defparameter *memory* (make-array *memory-size* :initial-element 0))


;; The memory contains two kinds of data, fixnums or cons cells;
;; Each memory word contains the value of the data, plus two tag bits
;; and one mark bit used by the garbage collector.
;;
;; | fixnum value |00m|
;; | car | cdr |01m|
;; | reserved |10m|
;; | reserved |11m|


(defconstant +tag/fixnum+ 0)
(defconstant +tag/cons+ 1)
(defconstant +tag/reserved0+ 2)
(defconstant +tag/reserved1+ 3)


(defparameter *mark-byte* (byte 1 0))
(defparameter *tag-byte* (byte 2 1))
(defparameter *fixnum-byte* (byte 24 3))
(defparameter *car-byte* (byte 12 15))
(defparameter *cdr-byte* (byte 12 3))


(defun mark (address)
(ldb *mark-byte* (aref *memory* address)))
(defun (setf mark) (new-bit address)
(setf (aref *memory* address) (dpb new-bit *mark-byte* (aref *memory* address))))

(defun tag (address)
(ldb *tag-byte* (aref *memory* address)))
(defun (setf tag) (tag address)
(setf (aref *memory* address) (dpb tag *tag-byte* (aref *memory* address))))

(defun fixnum-value (address)
(ldb *fixnum-byte* (aref *memory* address)))
(defun (setf fixnum-value) (new-value address)
(setf (aref *memory* address) (dpb new-value *fixnum-byte* (aref *memory* address))))

(defun car-value (address)
(ldb *car-byte* (aref *memory* address)))
(defun (setf car-value) (new-value address)
(setf (aref *memory* address) (dpb new-value *car-byte* (aref *memory* address))))

(defun cdr-value (address)
(ldb *cdr-byte* (aref *memory* address)))
(defun (setf cdr-value) (new-value address)
(setf (aref *memory* address) (dpb new-value *cdr-byte* (aref *memory* address))))

;; Let's collect the memory in a free list:

(define-symbol-macro free-list (car-value 0)) ; Address of the free list.

;; and keep what's in the root set:

(define-symbol-macro root-set (cdr-value 0)) ; Address of the root-set object.

;; Since we can store two pointers in a cons cell, we use that to store
;; both the free-list and root-set in the memory 0.

;; The root-set will always contain one free cons cell at the head, to
;; ease allocating structures. Usually the root-set is managed
;; outside of the heap, with specific memory, but with some care we
;; can also use the heap for the root-set list.


;; In a real memory management system there could be several root
;; objects (eg found on the stack). We can just put a list on the
;; +root-set+.

(defparameter *heap-start* 1)

(defun memory-initialize ()
"Initialize the free list and root set."
;; We use a simple list of cons cells.
(loop
for addr from *heap-start* below *memory-size*
do (setf (tag addr) +tag/cons+
(car-value addr) 0
(cdr-value addr) (1- addr)))
(setf free-list (1- *memory-size*)
root-set (new-cons 0 0))
(memory-room))

(defun memory-consp (address) (= +tag/cons+ (tag address)))
(defun memory-fixnump (address) (= +tag/fixnum+ (tag address)))

(defun memory-add-to-root-set (address)
"Add an object to the root set.
This is needed when we want to keep the object when it's not in any
list that's referenced."
(setf (car-value root-set) address)
(setf root-set (new-cons 0 root-set)))

(defun memory-remove-from-root-set (address)
"Remove an object from the root set."
(if (= address (car-value root-set))
(setf (car-value root-set) 0) ; strange case.
(loop
for current = root-set then (cdr-value current)
while (memory-consp current)
until (= address (car-value (cdr-value current)))
finally (when (= address (car-value (cdr-value current)))
(setf (cdr-value current) (cdr-value (cdr-value current)))))))

;;; Now the core of the mark-and-sweep garbage collector.
;;; What could be simplier?

(defun gc-mark (address)
"Marks the object at address and all the objects it deeply references."
(when (zerop (mark address))
(setf (mark address) 1)
(ecase (tag address)
((#.+tag/fixnum+)
#|nothing else to do.|#)
((#.+tag/cons+)
(gc-mark (car-value address))
(gc-mark (cdr-value address))))))


(defun gc-sweep ()
;; - sweep: start from first cell, and in sequence, check whether
;; it's marked, and gather free ranges.
(loop
with free = 0
with collected = 0
for addr from *heap-start* below *memory-size*
do (ecase (mark addr)
((0) ; collect it
(setf (cdr-value addr) free
free addr)
(incf collected))
((1) ; keep it
(setf (mark addr) 0)))
finally (setf free-list free) (return collected)))


(defun memory-collect-garbage ()
(gc-mark root-set)
(gc-sweep))

;;; That's it.


(defun new-fixnum (value)
"Allocates a new fixnum cell, returns its address."
(when (= 0 free-list) (memory-collect-garbage))
(when (= 0 free-list) (error "Out of memory."))
(let ((addr free-list))
(setf free-list (cdr-value free-list)
(tag addr) +tag/fixnum+
(fixnum-value addr) value)
addr))


(defun new-cons (car cdr)
"Allocates a new cons cell, returns its address."
(when (= 0 free-list) (memory-collect-garbage))
(when (= 0 free-list) (error "Out of memory."))
(let ((addr free-list))
(setf free-list (cdr-value free-list)
(tag addr) +tag/cons+
(car-value addr) car
(cdr-value addr) cdr)
addr))


(defun memory-room ()
"Returns the number of free cells on the free list,
and the length of the root set."
(format *trace-output* "Memory Size= ~5D~%" *memory-size*)
(format *trace-output* "Free-list head: ~5D~%" free-list)
(format *trace-output* "Root-set head: ~5D~%" root-set)
(flet ((len (address) ; not a list-length since we don't update
; the tags on the free list.
(loop
for current = address then (cdr-value current)
while (/= 0 current)
sum 1)))
(values (len free-list)
(len root-set))))


(defun memory-type (address)
"Returns the type of the object at the given ADDRESS (as a string,
until we have symbols)."
(case (tag address)
((#.+tag/fixnum+) "FIXNUM")
((#.+tag/cons+) "CONS")
(otherwise "RESERVED")))


;; Debugging functions. We build

(defgeneric lisp-to-memory (object)
(:method ((value integer))
(check-type value (integer #.(- (expt 2 (1- 24))) #.(1- (expt 2 (1- 24)))))
(new-fixnum value))
(:method ((value null))
0) ; we use fixnum 0 as NIL for now.
(:method ((value cons))
;; Special care should be given to the root-set. In an interpreter
;; or compiler the root-set is collected automatically from the
;; registers and the stack. Here we must update it explicitely to
;; avoid freeing a data structure that is being allocated.
(let ((cell (new-cons 0 root-set)))
(setf root-set cell)
;; lisp-to-memory must be called only after we updated the
;; root-set with cell above.
(setf (car-value root-set) (lisp-to-memory (car value)))
;; We must keep the (lisp-to-memory (car value)) in the root set
;; before calling the following (lisp-to-memory (cdr value)).
(let ((cdr (lisp-to-memory (cdr value))))
(setf root-set (cdr-value root-set))
(setf (cdr-value cell) cdr))
cell)))

(defun memory-to-lisp (address)
(ecase (tag address)
((#.+tag/fixnum+) (fixnum-value address))
((#.+tag/cons+)
(cons (memory-to-lisp (car-value address))
(if (zerop (cdr-value address))
'()
(memory-to-lisp (cdr-value address)))))))


;; Test:

(defun test/memory/1 ()
(memory-initialize)
;; Allocate some data in the root set:
(memory-add-to-root-set (lisp-to-memory '(1 3 5 7 9 11 13 15 17 19)))
;; Allocate some garbage.
(loop
for i below (* 96/100 *memory-size*)
for c = 0 then (new-cons (new-fixnum i) c))
(trace memory-collect-garbage new-cons new-fixnum)
;; Allocate a few more cons cells.
;; This should invoke the garbage collector.
(loop
for i below (* 8/100 *memory-size*)
for c = 0 then (new-cons (new-fixnum i) c)
finally (print (memory-to-lisp c)) (terpri))
(untrace memory-collect-garbage new-cons new-fixnum)
(print (multiple-value-list (memory-room))) (terpri)
(memory-collect-garbage) ; collect all we can
(print (multiple-value-list (memory-room))) (terpri)
;; We still have the root set:
(print (memory-to-lisp root-set)) (terpri)
:success)

;; Exercise for the reader:
;; - implement a SYMBOL data type with name and value slots.

;;;; THE END ;;;;

CuppoJava

unread,
Jun 9, 2011, 4:46:09 PM6/9/11
to
I see. That makes sense now. So in a nutshell, because Common Lisp is
smart enough to not require a garbage collector if your code doesn't
dynamically allocate memory, then you can indeed write a garbage
collector in Common Lisp.

I think, then, that my problem is actually a bootstrapping one.

I've written a Lisp interpreter in C. It's designed to be the simplest
interpreter possible to get out of C as quickly as I can. It's not
sophisticated, and requires a garbage collector to run whether or not
your code dynamically allocates memory.

From here, what is the next step to getting a self-hosting Lisp?

Can I write a *better* garbage collector somehow using my simple Lisp
even though my simple Lisp itself needs a garbage collector to run?

Thanks a lot for your in-depth explanations. I really appreciate it.
-Patrick

On Jun 9, 1:05 pm, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:

Pascal J. Bourguignon

unread,
Jun 9, 2011, 5:22:53 PM6/9/11
to
CuppoJava <patrick...@gmail.com> writes:

> I see. That makes sense now. So in a nutshell, because Common Lisp is
> smart enough to not require a garbage collector if your code doesn't
> dynamically allocate memory, then you can indeed write a garbage
> collector in Common Lisp.
>
> I think, then, that my problem is actually a bootstrapping one.
>
> I've written a Lisp interpreter in C. It's designed to be the simplest
> interpreter possible to get out of C as quickly as I can. It's not
> sophisticated, and requires a garbage collector to run whether or not
> your code dynamically allocates memory.
>
> From here, what is the next step to getting a self-hosting Lisp?
>
> Can I write a *better* garbage collector somehow using my simple Lisp
> even though my simple Lisp itself needs a garbage collector to run?


Well, it depends on what's the purpose.

It may be the purpose is to bootstrap from your own lisp interpreter,
or it may be that it is just to bootstrap a lisp implementation from any
lisp, in which case you can bootstrap from any Common Lisp
implementation. Notice how sbcl can be compiled with most CL
implementations (not only with itself) (however, sbcl has some C code
too and needs a C compiler; one would have to write a C compiler in
lisp, or rewrite its C parts in Lisp, to be able to bootstrap it
entirely from lisp). On the other hand, ccl bootstraps only from itself
(AFAIK).

You might be interested by this paper:

Steel Bank Common Lisp - A Sanely-Bootstrappable Common Lisp
http://www.doc.gold.ac.uk/~mas01cr/talks/2008-05-15%20Potsdam/sbcl.pdf


Now, assuming you want to bootstrap with your own lisp interpreter.
You have to make some decisions:

- what is the target language you want to implement eventually?
(it could be just the language your lisp interpreter currently
implements, or it could be a subset of Common Lisp, or even the
whole Common Lisp, or some other kind of lisp).

- whether you want to use the host OS ABI, or whether you want to define
and use a lisp virtual machine?

- whether you want to bootstrap thru an in-memory compiler or a file
compiler?

- whether you want to mix the current interpreter with the compiled code
or you will run the compiled code in a separate environment (process)?

Have a look at those Tombstone diagrams:

http://en.wikipedia.org/wiki/T-diagram

They should help you represent the bootstrapping path you want to
choose.


Today, you have:

+--------+
| Lisp-I |
| |
| C |
+--------+

that is, a lisp interpreter, written in C. Since gcc is:

+------------------------+
| C gcc x86 |
+-------+ +-------+
| x86 |
+--------+
+--------+
\ x86 /
\ / <---- linux-x86 virtual machine.
\ /
\/

you can use it to compile your lisp interpreter into an executable lisp
interpreter:


+--------+ +--------+
| Lisp-I | | Lisp-I |
| | --> | |
| C | | x86 |
+--------+ +--------+
+------------------------+
| C gcc x86 |
+-------+ +-------+
| x86 |
+--------+


So, starting from your running lisp interpreter:

+--------+
| Lisp-I |
| |
| x86 |
+--------+
+--------+
\ x86 /
\ /
\ /
\/


what do you want to bootstrap to?

+------------------------+
| ? ? |
+-------+ +-------+
| ? |
+--------+

Eg. if what you want is a compiler for the Lisp-I language, you can
write it in Lisp-I:

+------------------------+
| Lisp-I x86 |
+-------+ +-------+
| Lisp-I |
+--------+

and execute it with your interpreter, compiling it itself, thus
producing a Lisp-I to x86 compiler working on x86:

+------------------------+ +------------------------+
(2) | Lisp-I x86 | --> | Lisp-I x86 |
+-------+ +-------+ +-------+ +-------+
| Lisp-I | | x86 |
+--------+ +--------+
+---------------------------------------+
| Lisp-I x86 |
+---------------+ +--------------+
| Lisp-I |
+--------+
+--------+
| Lisp-I |
| | (1)
| x86 |
+--------+
+--------+
\ x86 /
\ /
\ /
\/

Notice that the implementation of the interpreter (1) and the compiler
(2) can be entirely different: you don't have to use the same memory
map, the same data representation, etc. You would have to, only if you
wanted to integrate in the same executable the compiler and the
interpreter. For example, clisp contains

- a clisp VM, which is written in C,
- a Common Lisp interpreter, which is written in C too,
- a Common Lisp to clisp VM compiler which is written in Common Lisp.

When compiling (bootstrapping really) clisp, the interpreter is compiled
with gcc, the clisp VM is compiled with gcc, and the compiler is
compiled with the interpreter. In a second phase, the compiler is
compiled with itself (to check it works well). Then all three
components are gathered in a single executable. So you can run in the
same clisp image, lisp sexps that are interpreted, or execute the
compiler (on the clisp VM) to compile those lisp sexps into clisp VM
byte code, and then run those byte code on the clisp VM. For this to
work, the data that is stored in memory must be accessible by both the
clisp VM and the clisp interpreted: they must use a common format.

But since you can implement a REPL without an interpreter, just with a
compiler, you could avoid this link, and design your compiler and target
VM entirely independent from the existing interpreter.


(defun eval (form)
(funcall (compile nil `(lambda () ,form))))
;; ^ just use the compiler, no need for
;; an interpreter here.

CuppoJava

unread,
Jun 9, 2011, 5:45:31 PM6/9/11
to
Thank you so much! That was extremely helpful and exactly what I was
looking for. Those "Tomb-stone" diagrams helped a lot. I've never seen
that notation before.

Thank you again!
-Patrick

On Jun 9, 5:22 pm, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:

Thomas A. Russ

unread,
Jun 9, 2011, 6:48:46 PM6/9/11
to
CuppoJava <patrick...@gmail.com> writes:

> I see. That makes sense now. So in a nutshell, because Common Lisp is
> smart enough to not require a garbage collector if your code doesn't
> dynamically allocate memory, then you can indeed write a garbage
> collector in Common Lisp.

Correct. In fact, most GC-based languages will behave in a similar
manner, since garbage collection is triggered by not having enough
memory for a requested allocation. [I suppose that certain concurrent
collection algorithms may operate autonomously, but I'm not really that
familiar with them.]

> I think, then, that my problem is actually a bootstrapping one.
>
> I've written a Lisp interpreter in C. It's designed to be the simplest
> interpreter possible to get out of C as quickly as I can. It's not
> sophisticated, and requires a garbage collector to run whether or not
> your code dynamically allocates memory.

This is, of course, an alternate and viable implementation strategy. A
number of lisp systems actually do this, where a small core is written
in some other language like C.

> From here, what is the next step to getting a self-hosting Lisp?
>
> Can I write a *better* garbage collector somehow using my simple Lisp
> even though my simple Lisp itself needs a garbage collector to run?

Yes. And if you don't need to run your initial lisp very long, you may
not even need a garbage collector for it. You only NEED a garbage
collector if you use up your available heap. So you might be able to
write a core that doesn't have a GC and then bootstrap your next system
on it and build the GC in the second system.

The old Vax Common Lisp implementation NIL existed for a long time
without a garbage collector. At the time, 32 bit addresses gave enough
address space that it wasn't a critical priority. The programs of the
time could run without exhausting memory and so there was no need to
collect garbage. My memory is a bit hazy on whether it ever received a
GC or not. Certainly for the first few years it did well without one.

And on some of the early lisp machines, the original GC algorithm was
sufficiently slow that many users would just save the state of the lisp
computation and reboot. It was quicker than waiting for the (mark &
sweep?) GC. I suppose that in a way this was a sort of manual copy and
compact GC algrithm....

> Thanks a lot for your in-depth explanations. I really appreciate it.
> -Patrick
>

--
Thomas A. Russ, USC/Information Sciences Institute

Rob Warnock

unread,
Jun 9, 2011, 8:44:16 PM6/9/11
to
CuppoJava <patrick...@gmail.com> wrote:
+---------------

| Thank you so much! That was extremely helpful and exactly what I was
| looking for. Those "Tomb-stone" diagrams helped a lot. I've never seen
| that notation before.
+---------------

I don't know where the term "Tombstone diagram" came from. I've
always heard them called "T-diagrams", which is what they were
called in McKeeman's 1971 book on the XPL language:

http://en.wikipedia.org/wiki/Tombstone_diagram
Tombstone diagrams (or T-diagrams) ...
...
[1] T diagrams were first introduced for describing bootstrapping
and cross-compiling compilers in McKeeman et al. "A Compiler
Generator" (1971). ...

See also:

http://en.wikipedia.org/wiki/XPL
XPL is a dialect of the PL/I programming language, developed in 1967,
used for the development of compilers for computer languages. ...

The 1971 McKeeman book described the XPL language and compiler and
various cross-compilers for it. The inside front & back covers were
filled with T-diagrams of the various ports of XPL and its support
tools to different environments.

But T-diagrams actually go all the way back to Bratman (1961):

http://portal.acm.org/citation.cfm?id=366249
A alternate form of the "UNCOL diagram"
Harvey Bratman, System Development Corp., Santa Monica, CA
Comm. ACM 4 (March 1961) 3, p. 142.

themselves being inspired by Conway's "UNCOL" (1958):

http://en.wikipedia.org/wiki/UNCOL
UNCOL (Universal Computer Oriented Language) was a proposed universal
intermediate language for compilers introduced by Melvin E. Conway in
1958. It was never fully specified or implemented; in many ways it was
more a concept than a language.
...


-Rob

p.s. Conway is also known for this cynical (yet often true!) observation:

http://en.wikipedia.org/wiki/Conway%27s_Law

http://www.melconway.com/research/committees.html
...
Any organization that designs a system (defined more broadly here
than just information systems) will inevitably produce a design whose
structure is a copy of the organization's communication structure.

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <http://rpw3.org/>
San Mateo, CA 94403

Pascal J. Bourguignon

unread,
Jun 9, 2011, 8:52:30 PM6/9/11
to
CuppoJava <patrick...@gmail.com> writes:

> Thank you so much! That was extremely helpful and exactly what I was
> looking for. Those "Tomb-stone" diagrams helped a lot. I've never seen
> that notation before.

I also wanted to mention LiSP, of course.

LiSP = "Lisp in Small Pieces"
http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html
http://pagesperso-systeme.lip6.fr/Christian.Queinnec/Books/LiSP-2ndEdition-2006Dec11.tgz
This book covers Lisp, Scheme and other related dialects,
their interpretation, semantics and compilation. To sum it up
in a few figures: 500 pages, 11 chapters, 11 interpreters and
2 compilers.

(There's an English translation of the first edition, I don't know about
the second edition).

Rob Warnock

unread,
Jun 9, 2011, 9:01:58 PM6/9/11
to
CuppoJava <patrick...@gmail.com> wrote:
+---------------

| However, I don't understand how to implement all the supporting
| systems in Lisp. For example, the garbage collector and virtual
| machine for my current Lisp system is implemented in C. The whole
| notion of writing a garbage collector in a garbage collected language
| eludes me.
+---------------

As other have pointed out, you can use a subset of your language
which doesn't implicitly allocate at all. In this vein, you might
be interested in the language "Pre-Scheme":

http://en.wikipedia.org/wiki/PreScheme

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4031
Pre-Scheme: A Scheme Dialect for Systems Programming (1997)
by Richard A. Kelsey
Abstract:
Pre-Scheme is a statically typed dialect of Scheme that gives the
programmer the efficiency and lowlevel machine access of C while
retaining many of the desirable features of Scheme. The PreScheme
compiler makes use of type inference, partial evaluation and Scheme
and Lisp compiler technology to compile the problematic features of
Scheme, such as closures, into C code without significant run-time
overhead. Use of such features in Pre-Scheme programs is restricted
to those cases that can be compiled into efficient code.
...

which is [or was, at one time] used to implement the low-level bits
of the language "Scheme48":

http://en.wikipedia.org/wiki/Scheme_48

http://s48.org/

And as other have also pointed out, you probably don't even *need* a
garbage collector at all for the early parts of your bootstrapping.
So if you write your garbage collector in an explicit-allocations-only
dialect of Lisp [similar to Pre-Scheme], and then compile that to either
C or directly to machine language, you can then bootstrap your GC without
needing a GC to do it!


-Rob

CuppoJava

unread,
Jun 9, 2011, 9:06:48 PM6/9/11
to
This is great! Thanks a lot everybody. I have enough material to keep
me occupied for a while now. And I think enough to get me all the way
to my own self-hosted Lisp.
-Patrick

On Jun 9, 9:01 pm, r...@rpw3.org (Rob Warnock) wrote:

> Rob Warnock             <r...@rpw3.org>

Frode V. Fjeld

unread,
Jun 10, 2011, 2:35:57 AM6/10/11
to
rp...@rpw3.org (Rob Warnock) writes:

> As other have pointed out, you can use a subset of your language which
> doesn't implicitly allocate at all.

I have written a GC in "the same" Lisp. I discovered you don't really
need to avoid allocation very much at all, because the first thing you
do is to set up a fresh (empty) newspace, which is typically just
changing a pointer, and from then on you can allocate stuff (within
reason) while you fiddle with the old newspace etc.

--
Frode V. Fjeld

0 new messages