Bronze Age Lisp design part 2 - the allocator

68 views
Skip to first unread message

Oto Havle

unread,
Aug 24, 2013, 8:43:46 AM8/24/13
to kl...@googlegroups.com
Hi all,

this is the second post about design choices in
Bronze Age Lisp, interpreter of the Kernel Language
(https://bitbucket.org/havleoto/bronze-age-lisp).

Klisp's allocator and garbage collector are derived from Lua runtime.
The important point is that allocation of all types of values
uses the same mechanism and boils down to malloc() of the underlying
C library. The garbage collector uses the stop-the-world,
mark-and-sweep scheme and does not move objects.

In contrast, Bronze Age lisp has two heaps. One heap is used for structured values
(the lisp heap) and one heap for blobs (the blob heap). Each heap is
organized differently and each has its own allocator and garbage
collector.

The blob heap is expected to be less important for the interpreter
performance, so I will focus on the lisp heap in this post. The lisp
heap is used heavily, because short-lived continuations, environments
and lists are allocated there.

The allocator is based on the classical Cheney's
copying collector algorithm:

http://en.wikipedia.org/wiki/Cheney's_algorithm

Moreover, Bronze Age Lisp implements an extension of the algorithm to
speed up allocation of continuations. The heap (more precisely, the
currently active half-space) is further split into two parts, the
persistent area and the transient area:

+-----------------+ <-- heap end (highest address)
| |
| free space |
| |
|- - - - - - - - -| <-- P = allocation pointer (moves up)
| |
| |
| persistent area |
| |
| |
|-----------------| <-- L = transient limit (moves down)
| |
| transient area |
| |
|- - - - - - - - -| <-- T = transient area minimal address
| free space |
+-----------------+ <-- heap start (lowest addrees)


The size of the heap equal to a power of two and fixed upon compilation
of the interpreter. The allocator maintains the allocation pointer P
and transient limit L in global variables. The transient area minimal
address T is not stored, but it is recomputed as needed.

Most objects are allocated in the persistent area. Allocation in the
persistent area is simple, the allocator just increments P by the object
size and checks it against the heap end.

Continuations and environments may also be allocated in the transient
area. The objects allocated are constrained by several conditions:

(1) pointers from the persistent area to the transient area
are NOT allowed

(2) pointers stored in the transient area must point
to an higher address than is the address where
the pointer is stored

(3) pointers to the transient area may be stored
in the registers EBP (which normally contains the
current continuation) or EDI (which normally
contains the current dynamic environment). Other
registers do not contain pointers to the transient
area

Based on these rules, the transient minimal address is

T = min(L, EBP, EDI), (if EBP or EDI does not point
into transient area, it is not
considered for the min(...) operation).

Allocation in the transient area is also simple. The allocator
computes T, decrements it by the object size, checks against the
heap lowest address, and returns the address of the object.

In order to maintain the condition (1), care must be taken when pointers
to continuations or environments are stored in other objects. The
continuation or environment must be captured. In this case, the
transient limit L is updated in such a way that the captured object
becomes part of the persitent area.

The condition (2) holds almost by design. Consider an allocation of a
continuation related to evaluation or combiner application (i.e.
function call). The continuation usually contains pointer to the parent
continuation, address of the machine code which is executed when the
continuation is invoked, saved dynamic environment, and closure data
like function argument values. The parent continuation is the current
continuation (*) and the environment being saved is the current dynamic
environment. Both parent continuation and the current dynamic
environment can be stored in the newly-created object in transient area
because of the condition (3). The code address is in the read-only code
segment of the interpreter and it is not managed by the heap allocator.
The closure data are already in the persistent area.

(*) the only cases which involve allocation of continuation
with the parent different from the current continuation are:

- the ground combiners extend-continuation, guard-continuation
and guard-dynamic-extent

- building of continuations during interception caused
by the interplay of guard-continuation and apply-continuation

Most of the time, the transient area looks like a stack of
continuations, with T = EBP being the stack pointer. Since the transient
minimal address is not stored in a variable, but recomputed when needed,
tail and nontail calls do not need special care.

When either the persistent or transient area becomes full, the garbage
collector is invoked. The GC algorithm evacuates live objects to the
persitent area of free halfspace. The transient area is empty after each
run of the GC.

To be continued...

Oto Havle.
Reply all
Reply to author
Forward
0 new messages