Request for RacketCS GC fun

Paulo Matos

Sep 11, 2019, 11:20:00 AM
to Racket Developers
Hi,

As part of my day job these days I am working on garbage collection. I
would like to better understand the GC that we use for RacketCS - the
one that comes with Chez Scheme.
Although it is coupled relatively tightly with the rest of the source, I
found it not too hard to distill an API for the GC and allocator; I did
so here:
https://github.com/LinkiTools/ChezScheme/commit/095c6a2ccf52ef1ae1c1e6cd1f83b0f043166018
This actually compiles if you run ./configure --omegagc

As a first step I read the BIBOP paper, and I have been using the GC
Handbook for more than a good paperweight.
I was wondering whether anyone has noticed Racket apps or benchmarks
performing poorly for reasons that might have to do with garbage
collection. If so, please let me know so I can have some fun. Also, if
you have a request for something new in the GC that I could have fun
implementing, feel free to send that to me as well. :-)

Thanks,

Paulo Matos

Philip McGrath

Sep 12, 2019, 7:20:21 PM
to Paulo Matos, Racket Developers
Here's an idea that's half-remembered and maybe a quarter understood.

Chez boxes flonums. I thought I'd read something by Matthew talking about it in more detail, but I haven't been able to find it, though I do see that "Flonum unboxing: accept mismatch, for now" still appears in Fig. 3 (p. 78:6) of the recent ICFP experience report (https://www.cs.utah.edu/plt/publications/icfp19-rddkmstz.pdf). My very vague recollection/understanding is that a big part of the issue is the way floats are stored by the GC (or, maybe more specifically, the allocator).

Aside from the fact that it might be hard, I could imagine that this might not be what you're looking for, since it seems to be much more about creating garbage than collecting it per se, and presumably there would be work to do outside the collector to take advantage of an unboxed representation. The potentially appealing part is that I could imagine it having an appreciable, measurable impact on a certain class of programs.

-Philip


Matthew Flatt

Sep 13, 2019, 9:28:48 AM
to Paulo Matos, Philip McGrath, Racket Developers
At Thu, 12 Sep 2019 19:20:08 -0400, Philip McGrath wrote:
> "Flonum unboxing: accept mismatch, for now" [...] I could imagine
> that this might not be what you're looking for, since it seems like
> it would be much more about creating garbage than collecting it per
> se

Yes, flonum unboxing is entirely a compiler project.


Some things I'd like to have from the GC, but these are large projects:

* GC implemented/generated in Scheme.

A common problem with adding new things to the GC (including
scalable object locking, ordered finalization, and vfasl saving) is
repeating/copying the code that traverses objects. Each copy is a
little different, and it's difficult to abstract over the variation
and get good performance, especially in C. Implementing the GC
itself in Scheme may be too tricky to be worthwhile, but generating
a C implementation in Scheme might be the way to go. The pointer and
object layouts in "cmacro.ss" are already a lot of the needed
information. (A rough sketch of such a shared traversal appears after
this list.)

* Support for non-moving objects that are collected when unreferenced
but not moved while referenced.

Racket CS implements non-moving objects by locking objects and using
a finalizer to unlock. Although I made changes to Chez Scheme so
that object locking is more scalable, it would be nicer and likely
perform better to have an allocation arena that is GCed but not
compacted/moved.

* Parallel GC

* Incremental GC
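
As a rough illustration of the traversal-sharing problem in the first
item, here is a hypothetical C sketch (not Chez Scheme's actual code;
all names are made up) of a generated per-type trace function
parameterized by a visitor, so that copying, finalization ordering, and
vfasl writing could in principle share it. The indirect call on every
slot is exactly the kind of overhead that is hard to avoid in C unless
the generator instead emits a specialized loop for each client.

    #include <stddef.h>

    /* Visitor applied to each pointer-holding slot of an object. */
    typedef void (*visit_slot)(void **slot, void *client_data);

    typedef struct {
        size_t size;                          /* object size in bytes */
        void (*trace)(void *obj, visit_slot visit, void *client_data);
    } type_descriptor;

    /* What a generated trace function for a two-slot record
       (e.g. a pair) might look like. */
    typedef struct { void *car, *cdr; } pair_obj;

    static void trace_pair(void *obj, visit_slot visit, void *client_data) {
        pair_obj *p = obj;
        visit(&p->car, client_data);
        visit(&p->cdr, client_data);
    }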

George Neuner

Sep 13, 2019, 10:55:18 PM
to racke...@googlegroups.com

Apologies, Matthew, if you know this all already.  This is really more
for Paulo's benefit, but your wish list provided a convenient framework.


On 9/13/2019 9:28 AM, Matthew Flatt wrote:
> At Thu, 12 Sep 2019 19:20:08 -0400, Philip McGrath wrote:
> > "Flonum unboxing: accept mismatch, for now" [...] I could imagine
> > that this might not be what you're looking for, since it seems like
> > it would be much more about creating garbage than collecting it per
> > se
>
> Yes, flonum unboxing is entirely a compiler project.
>
>
> Some things I'd like to have from the GC, but these are large projects:
>
> * GC implemented/generated in Scheme.
>
> A common problem with adding new things to the GC (including
> scalable object locking, ordered finalization, and vfasl saving) is
> repeating/copying the code that traverses objects. Each copy is a
> little different, and it's difficult to abstract over the variation
> and get good performance, especially in C. Implementing the GC
> itself in Scheme may be too tricky to be worthwhile, but generating
> a C implementation in Scheme might be the way to go. The pointer and
> object layouts in "cmacro.ss" are already a lot of the needed
> information.

Working in C (or similar languages) it is best not to abstract very
much with respect to pointer identification. Each object type should
have a custom pointer-finding function generated by the compiler
directly from the object's layout. You then somehow associate the
function (or functions, see below) with the object - using BiBOP page
metadata, an additional pointer in the object header, etc. Stack frames
(assuming the language has a notion of a contiguous stack) can be
handled in much the same manner as heap objects.
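
A minimal C sketch of that association via BiBOP page metadata
(hypothetical names and layout, not Chez Scheme's actual structures):
each page records the pointer-finding function and size class of the
objects it holds, and the collector dispatches per page.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 13
    #define PAGE_BASE(p) ((uintptr_t)(p) & ~(uintptr_t)((1 << PAGE_SHIFT) - 1))

    typedef void (*trace_fn)(void *obj);  /* generated from the object's layout */

    typedef struct page_info {
        trace_fn trace;        /* pointer-finding function for this page's type */
        size_t   object_size;  /* objects on one BiBOP page share a size class */
        /* ... mark bits, generation, space id, ... */
    } page_info;

    /* Assumed: a side table mapping PAGE_BASE(obj) to its metadata. */
    page_info *lookup_page_info(uintptr_t page_base);

    static void trace_object(void *obj) {
        lookup_page_info(PAGE_BASE(obj))->trace(obj);
    }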

For a non-moving collector, the pointer-finding function can simply
queue all the pointers for later examination and then mark the object as
scanned. In the case of a union, you would queue the "potential"
pointer and figure out later whether it really points to a heap object
(mistakes due to conservative pointer finding are fairly rare if true
heap allocations can be positively identified).

For a moving collector, it's a bit more complex - the pointer-finding
function needs to be able to enumerate pointers one by one, and you also
need a corresponding pointer-setting function.
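
A rough C sketch of that interface (hypothetical; the helper
copy_if_in_from_space is assumed to exist elsewhere in the collector):
generated accessors let the collector enumerate slots one by one and
write forwarded addresses back. A non-moving collector could use the
same enumeration and simply push each slot's value onto a mark queue
instead.

    #include <stddef.h>

    typedef struct {
        size_t (*slot_count)(void *obj);
        void  *(*get_slot)(void *obj, size_t i);
        void   (*set_slot)(void *obj, size_t i, void *ref);
    } layout_ops;

    /* Assumed: copies the referent to TO-space if needed and returns
       its (possibly new) address. */
    void *copy_if_in_from_space(void *ref);

    static void forward_slots(void *obj, const layout_ops *ops) {
        size_t n = ops->slot_count(obj);
        for (size_t i = 0; i < n; i++) {
            void *ref = ops->get_slot(obj, i);
            if (ref != NULL)
                ops->set_slot(obj, i, copy_if_in_from_space(ref));
        }
    }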

But these functions tend to be rather simple to generate - the majority
of objects are not very complex, and in any case it is all done
mechanically from the object's layout.

For generational (or parallel, below) collection, you'll need to
implement moving anyway.


> * Support for non-moving objects that are collected when unreferenced
> but not moved while referenced.

Best handled with one or more separate logical heaps. Relatively easy
in a BiBOP system if you base page residence on object size rather than
on type. Quite a bit more complicated if you demand that heaps be
contiguous.


> Racket CS implements non-moving objects by locking objects and using
> a finalizer to unlock. Although I made changes to Chez Scheme so
> that object locking is more scalable, it would be nicer and likely
> perform better to have an allocation arena that is GCed but not
> compacted/moved.

Ouch! "Pinning" objects in a moving system is a PITA ... it doesn't
matter whether you are compacting or (semi-heap) copying. It would be
better if "pinning" an object relocated it to the non-moving heap.


> * Parallel GC

This generally involves a complete rethink of the heap structure.
Parallel collection is one thing that actually IS made easier by
contiguous (semi) heaps. In a BiBOP system a "heap" is a logical
construct of (maybe) discontiguous pages, and the extra chasing around
makes collecting in parallel more complicated.

Note that the mutator(s) typically are stopped in a parallel collection
(but see below).


> * Incremental GC

Depends on what you mean by "incremental".

One possible "incremental" system would use the same N (semi) heaps as
the parallel collector, but rotates the pair of heaps used for current /
FROM and TO spaces.  The mutator is stopped just long enough to collect
the filled FROM space, emptying it completely so it can become a
TO-space target for the next collection.   [I believe the GC handbook
calls this "train" allocation.]

Another would leverage the VMM system, collecting page by page as
FROM-space pages are touched. This can be done moving or non-moving,
but it is complicated by the need to positively identify object
boundaries ... because the identity of the object being written to has
been lost by the time the VMM-assisted collector gets control ... and by
objects that span multiple VMM pages.
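
A bare-bones POSIX sketch of the VMM-assisted scheme (hypothetical;
scan_and_forward_page and is_protected_heap_page are assumed to exist,
and a real implementation must also handle objects that straddle
pages): FROM-space pages are protected at the start of a collection,
the first access traps, and the handler scans that page before
unprotecting it and resuming the mutator.

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define GC_PAGE_SIZE 4096
    #define GC_PAGE_ALIGN(a) \
        ((void *)((uintptr_t)(a) & ~(uintptr_t)(GC_PAGE_SIZE - 1)))

    void scan_and_forward_page(void *page_start);   /* assumed */
    int  is_protected_heap_page(void *page_start);  /* assumed */

    static void gc_fault_handler(int sig, siginfo_t *info, void *ctx) {
        void *page = GC_PAGE_ALIGN(info->si_addr);
        (void)sig; (void)ctx;
        if (!is_protected_heap_page(page))
            abort();                    /* a genuine crash, not a GC trap */
        scan_and_forward_page(page);    /* forward the objects on this page */
        mprotect(page, GC_PAGE_SIZE, PROT_READ | PROT_WRITE);
    }

    void install_gc_fault_handler(void) {
        struct sigaction sa = {0};
        sa.sa_sigaction = gc_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }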


Collecting object by object while the mutator continues requires much
more complexity and generally more space - regardless of whether GC is
actually interleaved with or done partly in parallel with the mutator.
It requires, for each object, one of:

   A) an extra metadata pointer in the header  [kiss of death for small
objects: boxes, pairs, flonums, etc.], or
   B) a minimum allocation size at least equal to that of a cons pair.

Either case requires access barriers on both read and write that can
recognize that an object has been forwarded and then perform the
operation on the TO-space copy instead of the FROM-space original.
The barrier may also be required to copy other objects, because
FROM-space originals generally can't be modified after having been copied [at least
not without introducing a lot of extra complexity].
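
A minimal C sketch of such a barrier (hypothetical object layout and
tag): once an object has been copied, its first word is overwritten
with a tagged forwarding pointer, and every access checks for it. A
write barrier would do the same check and then perform the store on the
TO-space copy, copying the target first if necessary.

    #include <stdint.h>

    #define FORWARDED_TAG 0x1   /* assumed low-bit tag in the header word */

    typedef struct object {
        uintptr_t header;   /* type/size info, or a tagged forwarding pointer */
        /* ... fields ... */
    } object;

    static inline object *read_barrier(object *ref) {
        if (ref != NULL && (ref->header & FORWARDED_TAG))
            return (object *)(ref->header & ~(uintptr_t)FORWARDED_TAG);
        return ref;
    }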


If you put a reasonable limit on the size of thread (Racket place)
private heaps, and limit the number of generations in the private heaps
to just 1 or 2, then you can get good performance with a simpler
Stop-the-Thread collection - done asynchronously whenever the thread
pleases - and leave more complex techniques like incremental collection
to the shared heap, which is likely to be large (maybe last generation
sized) and could be collected in parallel with the mutator(s), often at
a slower rate.


One thing to remember about GC is that useful real-world systems tend to
be hybrids that make use of multiple allocation/collection techniques for
different situations. Trying to use the same technique everywhere might
make for a good research paper, but it is unlikely to make for a highly
performant runtime system.  No one technique is optimal everywhere (if
it even is optimal anywhere), so some amount of tailoring to the
specifics generally will be better.

George

Ben Greenman

Oct 6, 2019, 2:45:14 PM
to Racket Developers
Here's some data on garbage collection for a few Typed Racket
benchmark programs on Racket v6.12:

https://docs.racket-lang.org/gtp-benchmarks/index.html?q=gtp%20benchmarks#(part._.Time_and_.Garbage_.Collection_.Details)

The easiest way to get code that matches the table is to:
- install the `gtp-checkup` package
- run the "main.rkt" file in the "typed-worst-case" directories

Example code: https://github.com/bennn/gtp-checkup/tree/master/benchmarks/acquire/typed-worst-case

Christopher Lemmer Webber

Jan 20, 2020, 5:06:29 PM
to Matthew Flatt, Racket Developers
Matthew Flatt writes:

> * Incremental GC

Side note, does Racket-on-Chez's GC support incremental garbage
collection still? I'm guessing not, and that's what this bullet point
is about.

It's a nice feature in Racket for gamedev stuff, hence the question.

Matthew Flatt

Jan 20, 2020, 8:19:10 PM
to Christopher Lemmer Webber, Racket Developers
At Mon, 20 Jan 2020 17:06:26 -0500, Christopher Lemmer Webber wrote:
> Matthew Flatt writes:
>
> > * Incremental GC
>
> Side note, does Racket-on-Chez's GC support incremental garbage
> collection still? I'm guessing not, and that's what this bullet point
> is about.

Yes, that's correct: currently no incremental GC for Racket CS.

> It's a nice feature in Racket for gamedev stuff, hence the question.

I expect that we'll get there eventually, but no idea how long it will
take.


Matthew
