[Sbcl-devel] Lisp Interface to GC

Craig Lanning

May 8, 2013, 5:16:24 PM

to SBCL Devel List

I'm trying to identify all of the functions and variables that allow
Lisp to interact with whichever GC is being used. Below is the list
that I have so far (I'm using GenCGC). Have I missed any? Do I have
some that I shouldn't?

- Craig

Variables:

sb-ext:*after-gc-hooks*
sb-kernel:*gc-inhibit*
sb-kernel:*gc-pending*
sb-ext:*gc-run-time*
sb-vm:*read-only-space-free-pointer*
sb-vm:*static-space-free-pointer*
sb-vm:dynamic-space-end
sb-vm:dynamic-space-start
sb-vm:gencgc-alloc-granularity
sb-vm:gencgc-card-bytes
sb-vm:gencgc-release-granularity
sb-vm:max-dynamic-space-end
sb-vm:read-only-space-end
sb-vm:read-only-space-start
sb-vm:static-space-end
sb-vm:static-space-start

Functions & Macros:

sb-ext:bytes-consed-between-gcs
sb-kernel:current-dynamic-space-start
sb-ext:dynamic-space-size
sb-kernel:dynamic-space-free-pointer
sb-kernel:dynamic-usage
sb-ext:gc
sb-impl::gc-and-save
sb-ext:gc-logfile
sb-kernel:gc-reinit
sb-kernel::gc-start-the-world
sb-kernel::gc-stop-the-world
sb-ext:generation-bytes-consed-between-gcs
sb-ext:generation-minimum-age-before-gc
sb-ext:generation-number-of-gcs
sb-ext:generation-number-of-gcs-before-promotion
sb-kernel::post-gc
sb-kernel::read-only-space-usage
sb-kernel::static-space-usage
sb-kernel:sub-gc
sb-unix::unblock-gc-signals
sb-sys:without-gcing

>>>CONFIDENTIALITY NOTICE>>> This electronic mail message, including any and/or all attachments, is for the sole use of the intended recipient(s), and may contain confidential and/or privileged information, pertaining to business conducted under the direction and supervision of the sending organization. All electronic mail messages, which may have been established as expressed views and/or opinions (stated either within the electronic mail message or any of its attachments), are left to the sole responsibility of that of the sender, and are not necessarily attributed to the sending organization. Unauthorized interception, review, use, disclosure or distribution of any such information contained within this electronic mail message and/or its attachment(s), is(are) strictly prohibited. If you are not the intended recipient, please contact the sender by replying to this electronic mail message, along with the destruction of all copies of the original electronic mail message (along with any attachments).

_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

Nikodemus Siivola

May 11, 2013, 5:53:26 AM

to Craig Lanning, SBCL Devel List

On 9 May 2013 00:16, Craig Lanning <clan...@isc8.com> wrote:
> I'm trying to identify all of the functions and variables that allow
> Lisp to interact with whichever GC is being used. Below is the list
> that I have so far (I'm using GenCGC). Have I missed any? Do I have
> some that I shouldn't?

It's a fuzzy boundary; what do you want the list for?

You're missing at least WITH-PINNED-OBJECTS and its machinery.

You're also missing the lisp-side parts of the allocation and
pseudo-atomic machinery (see eg. WITH-FIXED-ALLOCATION in
src/compiler/x86-64/macros.lisp)... but I don't know if that really
belongs on your list or not.

Craig Lanning

May 13, 2013, 12:58:53 PM

to Nikodemus Siivola, SBCL Devel List

On Sat, 2013-05-11 at 12:53 +0300, Nikodemus Siivola wrote:
> On 9 May 2013 00:16, Craig Lanning <clan...@isc8.com> wrote:
> > I'm trying to identify all of the functions and variables that allow
> > Lisp to interact with whichever GC is being used. Below is the list
> > that I have so far (I'm using GenCGC). Have I missed any? Do I have
> > some that I shouldn't?
>
> It's a fuzzy boundary; what do you want the list for?

I'm trying to make a list of all the functions and variables that
someone would need to implement if they wanted to create a new garbage
collector. Sort of a GC API.

Craig

> You're missing at least WITH-PINNED-OBJECTS and its machinery.
>
> You're also missing the lisp-side parts of the allocation and
> pseudo-atomic machinery (see eg. WITH-FIXED-ALLOCATION in
> src/compiler/x86-64/macros.lisp)... but I don't know if that really
> belongs on your list or not.
>

Nikodemus Siivola

May 14, 2013, 11:09:37 AM

to Craig Lanning, sbcl-devel

Then the correct answer is ALLOCATION/WITH-FIXED-ALLOCATION; everything else is more or less negotiable. Ie. the compiler needs to know how to emit allocation sequences.

Craig Lanning

May 14, 2013, 6:15:03 PM

to Nikodemus Siivola, sbcl-devel

On Tue, 2013-05-14 at 18:09 +0300, Nikodemus Siivola wrote:
> Then the correct answer is ALLOCATION/WITH-FIXED-ALLOCATION;
> everything else is more or less negotiable. Ie. the compiler needs to
> know how to emit allocation sequences.

Now, that was an eye opener.

Ultimately, what I'd like to do is define a set of functions and global
variables that can be defined either in Lisp for a GC implemented in Lisp
or in C for a GC implemented in C (and brought into Lisp via the alien
facility).

The company I work for has a very special use case that I think can be
made easier by rewriting the GC. I read the opinions about why the GC
is in C instead of Lisp. I understand them, agree with a few, and
disagree with others. I think that the changes I need to do to the GC
will be easier to implement if I can do it in Lisp instead of C.

After looking at ALLOCATION/WITH-FIXED-ALLOCATION, it appears that
switching the GC from a C implementation to a Lisp implementation would
be very non-trivial. Has anyone given any serious thought to what would
need to be done to implement the GC in Lisp?

Craig

Attila Lendvai

May 14, 2013, 10:53:58 PM

to Craig Lanning, sbcl-devel

> be very non-trivial. Has anyone given any serious thought to what
> would need to be done to implement the GC in Lisp?

it's somewhat tangential, but it might be worth it in your endeavor to
consider completely reifying heaps as first class objects into the
system.

i think it's much more work than just pluggable GC's, so probably it
will be out of your scope, but nevertheless try to keep this in mind
when shaping the interfaces...

that way the user could have an API to tell that in a dynamic extent
the runtime should use a specific heap, with a potentially different
GC.

or just open a new heap in a dynamic extent and tell the runtime not
to bother with synchronization and GC, because this heap is big enough
for all the allocation that will happen in this dynamic extent, and
can be thrown away in its entirety afterwards.

does anyone know any system that already explored this idea? i guess
the FONC people (Alan Kay, VPRI) have something like this, but very
little info/code is leaking out of them.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If you want to do good, work on the technology, not on getting power.”
— John McCarthy

Faré

May 15, 2013, 12:26:22 AM

to Attila Lendvai, sbcl-devel

On Tue, May 14, 2013 at 10:53 PM, Attila Lendvai
<attila....@gmail.com> wrote:
>> be very non-trivial. Has anyone given any serious thought to what
>> would need to be done to implement the GC in Lisp?
>
> it's somewhat tangential, but it might be worth it in your endeavor to
> consider completely reifying heaps as first class objects into the
> system.
>
> i think it's much more work than just pluggable GC's, so probably it
> will be out of your scope, but nevertheless try to keep this in mind
> when shaping the interfaces...
>
> that way the user could have an API to tell that in a dynamic extent
> the runtime should use a specific heap, with a potentially different
> GC.
>
> or just open a new heap in a dynamic extent and tell the runtime not
> to bother with synchronization and GC, because this heap is big enough
> for all the allocation that will happen in this dynamic extent, and
> can be thrown away in its entirety afterwards.
>
> does anyone know any system that already explored this idea? i guess
> the FONC people (Alan Kay, VPRI) have something like this, but very
> little info/code is leaking out of them.
>

The ML Kit with Regions bootstrapped its compiler entirely using such
nested "dynamic extents". Matthew Fluet wrote a nice thesis on the
general topic that generalizes the notion, allowing such regions to be
first-class.

I'd like to look into it and write a "linear lisp" in the style of
Henry Baker that has some such notion of first-class region. We'll see
after ELS2013. (Admittedly, I said I'd do it after ILC2012, and
instead I wrote ASDF3.)

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
There cannot be Ethics without Models of possible behaviors, and Imagination
to explore them. [Corollary: there is no Ethics for an all-knowing God,
but there are Ethics for mostly-ignorant but nevertheless thinking humans]

Philipp Marek

May 15, 2013, 1:32:56 AM

to Attila Lendvai, sbcl-devel

> or just open a new heap in a dynamic extent and tell the runtime not
> to bother with synchronization and GC, because this heap is big enough
> for all the allocation that will happen in this dynamic extent, and
> can be thrown away in its entirety afterwards.
>
> does anyone know any system that already explored this idea? i guess
> the FONC people (Alan Kay, VPRI) have something like this, but very
> little info/code is leaking out of them.

APR has something similar: nested memory pools, and destroying one destroys all nested,
too.

http://apr.apache.org/docs/apr/1.4/group__apr__pools.html
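
To make the nesting concrete, here is a minimal sketch of pools where destroying a parent also destroys every nested pool. This is NOT the APR API: the names `pool_create`, `pool_alloc`, `pool_destroy` and `live_pools` are made up for illustration, and each allocation is a plain malloc() block chained to its pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header in front of every block; max_align_t keeps the payload aligned. */
union block_header {
    union block_header *next;
    max_align_t align;
};

/* Hypothetical pool type for illustration only (not apr_pool_t). */
struct pool {
    struct pool *parent;
    struct pool *first_child;
    struct pool *next_sibling;
    union block_header *blocks;   /* memory handed out from this pool */
};

int live_pools = 0;               /* bookkeeping so the effect is observable */

struct pool *pool_create(struct pool *parent) {
    struct pool *p = calloc(1, sizeof *p);
    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {         /* push onto the parent's child list */
        p->next_sibling = parent->first_child;
        parent->first_child = p;
    }
    live_pools++;
    return p;
}

void *pool_alloc(struct pool *p, size_t nbytes) {
    union block_header *b = malloc(sizeof *b + nbytes);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;                 /* payload starts right after the header */
}

void pool_destroy(struct pool *p) {
    assert(p != NULL);
    if (p->parent != NULL) {      /* unlink from the parent's child list */
        struct pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->next_sibling;
        *link = p->next_sibling;
    }
    while (p->first_child != NULL)   /* each child unlinks itself */
        pool_destroy(p->first_child);
    while (p->blocks != NULL) {
        union block_header *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
    free(p);
    live_pools--;
}
```

Nothing here tracks pointers between pools, which is exactly the part a garbage-collected system would have to add.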

Attila Lendvai

May 15, 2013, 2:01:27 AM

to Philipp Marek, sbcl-devel

> APR has something similar: nested memory pools, and destroying one
> destroys all nested, too.
>
> http://apr.apache.org/docs/apr/1.4/group__apr__pools.html

well, it's easy to do it in a system where all that you have is random
access to the underlying huge binary number we call memory, and as such
you must manage the object layout yourself.

it's an entirely different story to introduce first class heaps into a
system that also promises transparent memory management for you (GC).

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--

“Be who you are and say what you feel, because those who mind don't
matter and those who matter don't mind.”
— Dr. Seuss

Alastair Bridgewater

May 15, 2013, 1:15:20 PM

to Craig Lanning, sbcl-devel

On May 14, 2013 6:33 PM, "Craig Lanning" <clan...@isc8.com> wrote:
>
> On Tue, 2013-05-14 at 18:09 +0300, Nikodemus Siivola wrote:
> > Then the correct answer is ALLOCATION/WITH-FIXED-ALLOCATION;
> > everything else is more or less negotiable. Ie. the compiler needs to
> > know how to emit allocation sequences.
>
> Now, that was an eye opener.
>
> Ultimately, what I'd like to do is define a set of functions and global
> variables that can be defined either in Lisp for a GC implemented in Lisp
> or in C for a GC implemented in C (and brought into Lisp via the alien
> facility).
>
> The company I work for has a very special use case that I think can be
> made easier by rewriting the GC. I read the opinions about why the GC
> is in C instead of Lisp. I understand them, agree with a few, and
> disagree with others. I think that the changes I need to do to the GC
> will be easier to implement if I can do it in Lisp instead of C.

Are you able to share any details about the sort of changes that you have in mind? Or about the use case, even in general terms?

> After looking at ALLOCATION/WITH-FIXED-ALLOCATION, it appears that
> switching the GC from a C implementation to a Lisp implementation would
> be very non-trivial. Has anyone given any serious thought to what would
> need to be done to implement the GC in Lisp?

Some years ago (late 2008, maybe), I did some preliminary investigation into writing parts of the GC in Lisp, although I was mostly focusing on how to implement things like the dispatch involved in scavenging an object.

If you're planning to implement a GC in Lisp, one thing to make sure of is that the code and data that implement the GC remain available while the GC is running. That seemed very difficult in my original context, but for what you're doing, simply arranging for any required data tables to be in static space or otherwise pinned, and for code objects not to move, would cover the worst of it. Well, and there are the FDEFINITION objects to consider, but putting them into static space as well should work, if you can arrange that (I can think of a couple of approaches here).

Another aspect to consider is whether the GC code itself should be allowed to allocate memory. If it shouldn't, then you have to be careful about how you write the code in order to avoid allocation, and you may also want to figure out how to tell the compiler that any allocation would be an error (so that you don't backslide during maintenance).

The inline allocation logic itself is actually fairly straightforward, modulo the overflow handling. Each thread has an alloc_region (in a single-threaded system there is a global alloc_region), which contains two fields of interest to the allocation logic: the current allocation pointer and the end of the region. Everything else in the alloc_region is of interest to the GC only, and the general layout might well be completely different for a different GC.
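
As a rough sketch of that fast path, assuming only the two fields described above: the struct and function names below (`region_sketch`, `region_alloc`, `align_up`) are illustrative and not SBCL's actual definitions, and the two-word alignment is an assumption. On overflow a real system would enter the collector; here the function simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an allocation region: just the two fields the
 * inline allocation sequence cares about. Everything else is GC business. */
struct region_sketch {
    char *free_pointer;   /* next unallocated byte */
    char *end_addr;       /* one past the last usable byte */
};

/* Round a request up to a two-word boundary (assumed object alignment). */
size_t align_up(size_t nbytes) {
    const size_t align = 2 * sizeof(uintptr_t);
    return (nbytes + align - 1) & ~(align - 1);
}

/* Bump allocation: one comparison and one addition on the fast path. */
void *region_alloc(struct region_sketch *r, size_t nbytes) {
    assert(r->free_pointer <= r->end_addr);
    size_t size = align_up(nbytes);
    if (size > (size_t)(r->end_addr - r->free_pointer))
        return NULL;      /* overflow: the slow path would start here */
    void *obj = r->free_pointer;
    r->free_pointer += size;
    return obj;
}
```

The compiler-emitted sequence does the same comparison and pointer bump inline, which is why the layout of those two fields is the part of the GC that compiled code depends on.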

You would also have to deal with the "safepoint" or "pseudo-atomic" logic, and I haven't really thought overmuch about what would be involved here, plus there's the whole issue of actually triggering a collection cycle. And there's the matter of scavenging any interrupt contexts and the various thread stacks.

And if you're changing the heap layout too drastically, or need to arrange for certain things to be in certain heap spaces even in the cold-core, you may well end up modifying genesis (SYS:SRC;COMPILER;GENERIC;GENESIS.LISP), the program that actually creates the cold-core from a set of cross-compiled FASLs.

Anyway, I've given the matter a certain amount of thought, and I'm reasonably confident that I could write SOMETHING that would work, but it would take a while and I simply don't have a use-case that would make it worth the effort.

I hope that this brain-dump helps, and would love to be kept "in the loop" if you decide to go forward with writing a new GC for SBCL in Lisp. Or even a new GC for SBCL in C.

> Craig

-- Alastair

Craig Lanning

May 15, 2013, 2:41:54 PM

to Alastair Bridgewater, sbcl-devel

I can give a generic description of what I want to do. I just can't
tell you the specific reason why the application needs to do this.

Basically, the application has a certain object organization that it
needs to be able to keep localized. If we need to drop one of our main
objects so that we can load another one, it would be advantageous to
know that all allocation related to that object is contained within a
known memory block. Then we just declare the block empty and zero it.

I started my Lisp career working on Symbolics Lisp Machines. One thing
that they had was a concept of areas. An area was an allocation block
that could be constrained somehow. For the LispM's some areas held only
cons cells, others held symbol names. Areas could also be excluded from
collection. The aspect I'm interested in is that the user could create
an area, designate whether it should be handled by the GC, what objects
it would hold, and when the contents of the area are no longer needed,
just flush the contents without running any GC.
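
A minimal sketch of the "flush without running any GC" part, under the assumption that an area is one contiguous block with a fill pointer. The `area` struct and the `area_init`, `area_alloc` and `area_flush` names are hypothetical; they are neither the Lisp Machine interface nor anything in SBCL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical area descriptor, for illustration only. */
struct area {
    char *base;         /* start of the memory block */
    char *free;         /* next unallocated byte */
    char *end;          /* one past the last byte */
    int   gc_managed;   /* nonzero if a collector should scan this area */
};

void area_init(struct area *a, void *memory, size_t size, int gc_managed) {
    char *start = memory;
    a->base = start;
    a->free = start;
    a->end = start + size;
    a->gc_managed = gc_managed;
}

void *area_alloc(struct area *a, size_t nbytes) {
    if (nbytes > (size_t)(a->end - a->free))
        return NULL;    /* area is full; caller decides what to do */
    void *obj = a->free;
    a->free += nbytes;
    return obj;
}

/* Drop every object in the area at once: zero the used part and rewind
 * the fill pointer. Cost depends on bytes used, not on object count. */
void area_flush(struct area *a) {
    assert(a->base <= a->free && a->free <= a->end);
    memset(a->base, 0, (size_t)(a->free - a->base));
    a->free = a->base;
}
```

This is only safe if nothing outside the area still points into it when it is flushed, which is the application's responsibility in this sketch.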

The "brain-dump" does help. You mentioned a few things that I hadn't
run across yet. I certainly will keep posting info to the list. I'm
interested in running any GC performance tests on whatever I end up
building. I intend to provide the SBCL team with a GC chapter for the
Internals manual at the very least. As I work on the GC changes, any
code that is generally useful, SBCL is welcome to have. I will try to
make sure that any of our "proprietary" changes are really nothing more
than specific configuration of the general GC that I build.

Based on the info from Nikodemus and Alastair, it looks like this will
be a longer term project than I had originally thought. Fortunately,
the application will still work with the existing GC so we're ok for the
time being. Changing the GC would just make the application run more
quickly and more efficiently.

Craig

Alastair Bridgewater

May 15, 2013, 7:54:59 PM

to Craig Lanning, sbcl-devel

If you can guarantee that your memory areas will always have sufficient space for all of the allocation that you need to do, are willing to run WITHOUT-INTERRUPTS while allocating into that space, and the space will contain no external references other than to static space, then I have a nasty, nasty hack in mind that will run on a stock system (hint: there are only two words used for the basic allocation and overflow check, and the region is in a known global location on single-thread systems and in the thread structure on multi-thread systems, easily poked at via SAP functions either way).

The requirement to always have sufficient space is so that you don't trip an allocation overflow "trap" (they are actually straight calls on x86oids, while on PPC they are a TWI instruction that the trap handler can easily recognize). To alleviate this, we would need to define a way to hook the overflow trap on a per-region basis to point to the "correct" GC. Note that it should be plausible to run the trap handler in Lisp with a bit of care.
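
One way to picture a per-region overflow hook is a function pointer stored in the region itself. The sketch below is an assumption about how such a hook could look, not how SBCL dispatches overflow today; `hooked_region`, `hooked_alloc` and `reset_on_overflow` are invented names, and the example hook just discards the region's contents where a real one would run the region's own collector.

```c
#include <assert.h>
#include <stddef.h>

struct hooked_region;

/* Hook contract (hypothetical): return nonzero after making room. */
typedef int (*overflow_hook)(struct hooked_region *r, size_t needed);

struct hooked_region {
    char *start_addr;
    char *free_pointer;
    char *end_addr;
    overflow_hook on_overflow;   /* which "GC" handles this region */
};

/* Example hook: throw everything away if the request can fit at all. */
int reset_on_overflow(struct hooked_region *r, size_t needed) {
    if (needed > (size_t)(r->end_addr - r->start_addr))
        return 0;                /* can never fit; report failure */
    r->free_pointer = r->start_addr;
    return 1;
}

void *hooked_alloc(struct hooked_region *r, size_t nbytes) {
    assert(r->free_pointer <= r->end_addr);
    if (nbytes > (size_t)(r->end_addr - r->free_pointer)) {
        /* Slow path: let this region's own handler try to make room. */
        if (r->on_overflow == NULL || !r->on_overflow(r, nbytes))
            return NULL;
        if (nbytes > (size_t)(r->end_addr - r->free_pointer))
            return NULL;         /* handler did not free enough */
    }
    void *obj = r->free_pointer;
    r->free_pointer += nbytes;
    return obj;
}
```

The fast path is unchanged from plain bump allocation; only the overflow branch consults the region, so different regions can route to different handlers.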

The requirement to run WITHOUT-INTERRUPTS is because an interrupt (unix signal) handler will almost certainly allocate memory, and you don't want that in your custom memory space. To alleviate this, modify the interrupt handling logic to detect that a thread is running with a custom allocation region, and to somehow bind a normal, empty gencgc region in its place. And modify your macro to arrange to close the old gencgc region before setting up your custom region, otherwise the GC could lose track of the old region. If you also have to enable scavenging of your allocation regions then this becomes more complex, as you would need to update whatever tracks the size of the allocated data in your region when binding the gencgc region into place.

The requirement to not have any pointers to non-static heap data is because gencgc doesn't know to scavenge your allocation regions. This should be straightforward to arrange if you keep track of the regions, by arranging to scavenge all of the regions at some point after all conservative roots have been scanned (somewhere near the scavenging of the binding stacks should suffice; it should be before scavenging newspace).

Depending on which of these restrictions you can live with, this could run anywhere from an afternoon to a month, as a zeroth-order estimate. The easy restriction to lift is the one about scavenging custom allocation regions. I'd actually have to think through the details for the other two. There may be a dependency involved (you might need some of the bits for removing WITHOUT-INTERRUPTS before you can have a lisp-side overflow handler).

[ snip ]

> > I hope that this brain-dump helps, and would love to be kept "in the
> > loop" if you decide to go forward with writing a new GC for SBCL in
> > Lisp. Or even a new GC for SBCL in C.
>
> The "brain-dump" does help. You mentioned a few things that I hadn't
> run across yet. I certainly will keep posting info to the list. I'm
> interested in running any GC performance tests on whatever I end up
> building. I intend to provide the SBCL team with a GC chapter for the
> Internals manual at the very least. As I work on the GC changes, any
> code that is generally useful, SBCL is welcome to have. I will try to
> make sure that any of our "proprietary" changes are really nothing more
> than specific configuration of the general GC that I build.
>
> Based on the info from Nikodemus and Alastair, it looks like this will
> be a longer term project than I had originally thought. Fortunately,
> the application will still work with the existing GC so we're ok for the
> time being. Changing the GC would just make the application run more
> quickly and more efficiently.
>
> Craig

Now you've got me thinking about pluggable allocation regions and scavenging strategies... And I already have enough of a project list.

-- Alastair
