[Sbcl-devel] Not-ready-for-prime-time concurrent GC on a branch


Douglas Katzman via Sbcl-devel

Oct 9, 2023, 11:07:06 PM
to SBCL Devel-list
Hi,
Now that we have Hayley's GC which establishes the new standard in SBCL for GC throughput, I'd like to mention that we have another GC which is significantly less stable than hers but offers a lower pause time.
If that appeals to you, and you like to live on the edge (or don't like to but are willing to help debug the nonstop GC), I thought I'd compare/contrast the two:

Similarities
- both use a non-moving allocator and bitmap to find free space
- both currently lack support for various optimizations, such as "compact instance header", and need to play catch-up with gencgc
Pros of concurrent GC relative to Hayley's GC
- the longest a thread should ever be "paused" (not doing its own work) is about 100 microseconds. "paused" actually means doing work for the GC.
- since the pause time is proportional to stack depth - i.e. the only work done is to publish the stack contents - you can, with varying degrees of success, control your pause time
Cons of concurrent GC relative to Hayley's parallel GC:
- the GC throughput is lower (which might not matter, given adequate memory)
- the allocation rate is a little lower which does or doesn't matter, depending on your application
- it is far less mature. The regression suite has tons of failures
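To make the "pause proportional to stack depth" point concrete, here is a toy Python model (not SBCL code; every name here is invented for illustration). The idea is that at a GC handshake, a thread's only "pause" work is publishing its own stack roots into a buffer the collector then scans concurrently, so the pause cost scales with frame count, not heap size:

```python
# Toy model, NOT SBCL internals: the mutator's entire "pause" is copying
# the roots of its live frames; work is O(stack depth).

def publish_stack(frames):
    """Snapshot the roots of every live frame for the collector."""
    return [list(frame_roots) for frame_roots in frames]

# A thread that keeps its stack shallow pays a smaller pause.
shallow = [["a"], ["b"]]                   # 2 frames
deep = [[f"obj{i}"] for i in range(1000)]  # 1000 frames

assert len(publish_stack(shallow)) == 2
assert len(publish_stack(deep)) == 1000
```

This is why the pause is controllable "with varying degrees of success": the application, not the collector, decides how deep its stacks get.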

Doug

Philipp Marek via Sbcl-devel

Oct 10, 2023, 2:30:54 AM
to Douglas Katzman, SBCL Devel-list

> *Pros* of concurrent GC relative to Hayley's GC
> - the longest a thread should ever be "paused" (not doing its own work)
> is about 100 microseconds. "paused" actually means doing work for the GC.
> - since the pause time is proportional to stack depth - i.e. the only
> work done is to publish the stack contents - you can, with varying
> degrees of success, control your pause time

So there is an exported function that I can call in an appropriate stack
frame, like just after I've handled an incoming query?

How do these compare with your arena work, latency- and bandwidth-wise?


Have you given any thought to continuation-passing style?
That should give a very shallow stack but lots of CPS blocks on the
heap, right?
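For concreteness, here is a minimal Python sketch of the trade-off being asked about (hypothetical illustration, nothing to do with SBCL's compiler): in trampolined CPS, pending work lives in heap-allocated continuation closures instead of stack frames, so the native stack stays one frame deep:

```python
# CPS + trampoline: each step returns a heap-allocated thunk instead of
# making a deeper call, so the native stack never grows with n.

def fact_cps(n, k):
    """Factorial in continuation-passing style; returns a thunk."""
    if n == 0:
        return lambda: k(1)
    # The continuation itself also returns a thunk, keeping every
    # step bounded-depth; pending multiplications pile up on the heap.
    return lambda: fact_cps(n - 1, lambda r: lambda: k(n * r))

def trampoline(thunk):
    """Run thunks until one yields a final (non-callable) value."""
    while callable(thunk):
        thunk = thunk()
    return thunk

assert trampoline(fact_cps(10, lambda r: r)) == 3628800
```

The shallow-stack/heap-blocks trade is exactly as the question suggests: the collector sees almost no stack to publish, but the closures become ordinary heap garbage it must trace.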


Thanks!


_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

Hayley Patton

Oct 10, 2023, 5:16:13 AM
to Philipp Marek, sbcl-...@lists.sourceforge.net
On 10/10/23 17:14, Philipp Marek via Sbcl-devel wrote:
> Have you given any thought to continuation-passing style?
> That should give a very shallow stack but lots of CPS blocks on the
> heap, right?
My unsolicited opinion is that using a return barrier
<http://www.yuasa.kuis.kyoto-u.ac.jp/~yuasa/ilc2002/index.html> would
allow for using a stack while bounding the amount of stack scanning done
at once (but might need yet another invasive change to the SBCL
runtime). That'd seem to side-step the "proportional to stack depth"
work (as long as you don't return from lots of functions at once), while
avoiding allocating more in the heap or having to confound the return
address prediction of the CPU.
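A toy Python model of the return-barrier idea (names invented; the real technique patches return addresses in the runtime, which this does not attempt): the collector scans only the top chunk of frames and leaves a barrier; when the mutator returns past the barrier, it scans the next chunk before continuing, so no single pause scans the whole stack:

```python
# Toy return-barrier model: per-pause scanning work is bounded by CHUNK
# rather than by total stack depth.

CHUNK = 4

class Stack:
    def __init__(self, depth):
        self.frames = list(range(depth))  # frame ids, top of stack at the end
        self.barrier = len(self.frames)   # frames below this index are unscanned
        self.scanned = []

    def scan_top_chunk(self):
        """Scan at most CHUNK frames just below the current barrier."""
        lo = max(0, self.barrier - CHUNK)
        self.scanned.extend(self.frames[lo:self.barrier])
        self.barrier = lo

    def pop_frame(self):
        """Mutator returns; crossing the barrier triggers the next chunk."""
        if len(self.frames) <= self.barrier:
            self.scan_top_chunk()
        self.frames.pop()

s = Stack(10)
s.scan_top_chunk()           # initial bounded pause: scans frames 6..9
assert s.barrier == 6
for _ in range(7):           # return down through the barrier
    s.pop_frame()
assert s.barrier <= len(s.frames)
```

As the message notes, the cost stays bounded as long as the mutator doesn't return through many barriers at once.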

Douglas Katzman via Sbcl-devel

Oct 10, 2023, 10:11:52 AM
to Philipp Marek, SBCL Devel-list
On Tue, Oct 10, 2023 at 2:14 AM Philipp Marek <phi...@marek.priv.at> wrote:

> So there is an exported function that I can call in an appropriate stack
> frame, like just after I've handled an incoming query?

No, but a thread that chooses to keep its stack small can bound the time it spends working for the GC.

> How do these compare with your arena work, latency- and bandwidth-wise?
No measurements yet. This GC isn't nearly ready enough to build a real-world application.
We've gotten tremendous gains with arenas, and I'd say it's nearly impossible to outperform a mark+release paradigm, though the MMTk people would disagree. (They'd disagree because a gc-never strategy places more demands on physical memory + caches.) To test, we'd need a server that eschews arenas and relies only on the GC, either of the new choices, which we're in no position to do, because neither can build an ELF core.
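A minimal mark+release arena sketch in Python (illustrative only; SBCL's arenas are a C-level allocator, not this) shows why the paradigm is so hard to outperform: allocation is a cursor bump, and releasing to a mark frees any number of objects in O(1):

```python
# Toy mark+release arena: alloc bumps a cursor; release resets it to a
# saved mark, reclaiming everything allocated since -- no per-object work.

class Arena:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.cursor = 0

    def alloc(self, nbytes):
        start = self.cursor
        if start + nbytes > len(self.buf):
            raise MemoryError("arena exhausted")
        self.cursor = start + nbytes
        return memoryview(self.buf)[start:self.cursor]

    def mark(self):
        return self.cursor

    def release(self, mark):
        self.cursor = mark    # everything past `mark` is reusable

a = Arena(1024)
m = a.mark()
a.alloc(100)
a.alloc(200)
assert a.cursor == 300
a.release(m)                  # O(1), however much was allocated
assert a.cursor == 0
```

The gc-never caveat in the message is visible even here: memory between mark and release is held until the release, so peak footprint (and cache pressure) can exceed what a tracing collector would tolerate.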


> Have you given any thought to continuation-passing style?
> That should give a very shallow stack but lots of CPS blocks on the
> heap, right?
No and yes.

Elijah Stone

Oct 10, 2023, 3:50:54 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net
Perhaps virtual memory could be used to do it without being too invasive.