[Sbcl-devel] Mostly working mostly non-moving parallel garbage collection


Hayley Patton

Jul 2, 2023, 11:04:21 AM
to sbcl-...@lists.sourceforge.net

Hello again,

This patch provides a mostly non-moving parallel garbage collector, enabled with the feature :mark-region-gc. The collector retains bump allocation and the current card-marking write barrier, and doesn't require mutator code to update the allocation bitmap. The collector is slightly slower (5-10% wall time) with a single thread, but tends to outperform the current serial copying GC with multiple threads. My paper for ELS <https://applied-langua.ge/~hayley/swcl-gc.pdf#page=5> has some benchmarks (before compaction was implemented; compaction either doesn't affect performance or produces a slight speedup in less rigorous tests). But I don't think it's quite done; I'd appreciate some review, in particular on these listed points.

Of feature regressions:

- Immobile space still isn't supported. Doing a full GC was tricky when I last tried to support immobile space, as the mark-region collector can keep generation numbers intact, and the immobile space collector cannot (as I don't use fullcgc to do a full GC). But I got inexplicable crashes in TLSF after kludging around that. We should still use TLSF for the immobile space to avoid fragmentation, regardless of what's done in dynamic space.

- traceroot does not work, as it gleans the array of roots that gencgc finds, and I don't maintain that array.

- Some tests still fail. Tests which check rehashing of hash tables expect that the keys will move, and the collector doesn't move them. I think ADDRESS-INSENSITIVE-EQL-HASH is that kind of test, but it might also be useful (see my last point). (HASH-TABLE GC-SMASHED-CELL-LIST) fails, so I guess I don't get how the hash table free-list works. gc.impure.lisp crashes and futex-wait.test.sh returns an invalid exit status.

- SB-EXT:*GC-REAL-TIME* tracks GC real time, and TIME prints the mutator/GC breakdown of both real time and run time now. I'm amused that single-threaded mutator code can use more than 100% CPU now, but it might come as a surprise - though a correct surprise. (Contrariwise, a parallel program bottlenecked by GC reports less CPU used than the programmer hoped for.)

In the internals:

- Somehow FORMAT stopped working in cold-init, which Stas didn't think would work in the first place <https://irclog.tymoon.eu/libera/%23sbcl?around=1667040784#1667040784>.

- The C runtime can no longer assume that it can walk the heap as if it was contiguous; instead there is a little iterator protocol in walk-heap.h which works for both contiguous and non-contiguous heaps. The non-contiguous implementation is very naive, but I haven't found it being a performance problem. Though maybe I don't test the right things - it took me a while to find that S-L-A-D was broken once, because I hardly use it. The C runtime also cannot use (anything to the effect of) page_table[find_page_index(object)].gen to determine the generation of small objects, as generations for small objects are stored in line metadata instead. (But using the page table is fine for pseudo-static objects and large objects.)

- The tracing logic in fullcgc.c has been made generic in trace-object.inc, which is customised using the C preprocessor.

- Genesis emits single-object pages correctly when #+mark-region-gc.

- Core files now contain the allocation bitmap, which is stored after the page table. I don't ever compress the allocation bitmap, as it's 1/128 the size of the heap. Not sure if it's okay to just dump the bitmap after the page table either.

- The allocator prefers to reuse partly-used pages for small objects, in order to keep free pages for large objects, which is implemented using another array allow_free_pages. It works, but it doesn't seem right to separate it from the other allocator state in alloc_start_pages.

- I had to rearrange some headers, and I'm not sure if I did it in a reasonable way. I also started to #ifndef LISP_FEATURE_MARK_REGION_GC out dead code in gencgc, but I didn't finish. Should/could any common code be extracted into other files, rather than having to #ifndef out a large amount of gencgc.c?

- I've implemented being able to have roots use interior pointers (as mr_preserve_ambiguous calls find_object which allows interior pointers), but honestly I can't think of a purpose for it.

- As a side effect of lines (mostly) having generations rather than pages, the generation breakdown of gc_gen_report_to_file produces wrong and useless data. I'm not sure what to do with it.
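
As an illustration of the point above about generations of small objects living in line metadata rather than in the page table, here is a toy sketch. It is not code from the patch: every name (toy_heap, line_gens, toy_object_generation) and the 32 KiB page size are assumptions made for the example; only the 128-byte line size is taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name and constant here is hypothetical. */
#define TOY_LINE_BYTES 128u      /* line size mentioned in this thread */
#define TOY_PAGE_BYTES 32768u    /* assumed page size */
#define TOY_PAGES 4u
#define TOY_LINES_PER_PAGE (TOY_PAGE_BYTES / TOY_LINE_BYTES)

struct toy_page {
    uint8_t single_object;  /* nonzero for a large-object or pseudo-static page */
    uint8_t gen;            /* only meaningful when single_object is nonzero */
};

struct toy_heap {
    uintptr_t base;
    struct toy_page pages[TOY_PAGES];
    uint8_t line_gens[TOY_PAGES * TOY_LINES_PER_PAGE];
};

/* Generation of the object at addr: read from the page table for
 * single-object pages, and from per-line metadata for small objects. */
static int toy_object_generation(const struct toy_heap *heap, uintptr_t addr)
{
    assert(addr >= heap->base);
    size_t offset = addr - heap->base;
    assert(offset < (size_t)TOY_PAGES * TOY_PAGE_BYTES);
    size_t page = offset / TOY_PAGE_BYTES;
    if (heap->pages[page].single_object)
        return heap->pages[page].gen;
    return heap->line_gens[offset / TOY_LINE_BYTES];
}
```

In this model, asking the page table for the generation of a small object would return whatever happens to be in the page entry, which is why runtime code that used the page-table lookup has to change.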

With regards to performance:

- Incremental compaction works, but has barely been tuned. #+mark-region-gc sbcl.core is about 2x the size of a #-mark-region-gc sbcl.core. Tuning the amount to copy with a time limit seems useful, moreso if combined with concurrent tracing to make a pause time target; but even with STW the default configuration is rather arbitrary.

- Page and heap usage is counted in lines, and the GC has no way of measuring fragmentation in lines (e.g. if each line in a page was used by only one cons cell somehow). This is done to allow sweeping to update usage by subtracting lines it freed; perhaps we could maintain a table of words used in each page/generation combination, to be able to keep precise measurements. For a parallel GC we might want to have a table for each thread to avoid atomic updates, then merge the tables after tracing. Most collections just need to track usage of one generation, but full collections need to track usage of all generations. I haven't thought this through entirely.

- Currently the number of GC threads is always 4 by default (three helpers and the thread which entered collect_garbage), unless a different number is provided using the GC_THREADS environment variable. I'm not sure what the default should be.

- I added a "panic" trigger which causes collections of older generations, when more than 90% of the heap is used but the other collection triggers don't fire. In my experience this helps to avoid running out of heap, but will likely cause performance regressions if one manages to keep SBCL running with that much heap used. The limit on auto_gc_trigger is also lowered a bit, to compensate for fragmentation causing heap exhaustion to happen earlier than anticipated.

- Currently any moving of keys in a hash table causes the table to be rehashed, regardless of address sensitivity. Keys are seldom moved, so rehashing should not be frequent, but it's still needless.
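
The idea above of keeping a usage table per GC thread and merging the tables after tracing could look roughly like the following. This is only a sketch of the idea, not something in the patch: the table shape, the constants and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for the sketch. */
#define TOY_PAGES 8
#define TOY_GENS 7
#define TOY_MAX_THREADS 4

/* Words used for each page/generation combination. */
struct usage_table {
    uint32_t words[TOY_PAGES][TOY_GENS];
};

/* Called by a tracing thread on its own table, so no atomics are needed. */
static void record_live_object(struct usage_table *mine, size_t page,
                               size_t gen, uint32_t nwords)
{
    assert(page < TOY_PAGES && gen < TOY_GENS);
    mine->words[page][gen] += nwords;
}

/* Called once by a single thread after tracing has finished. */
static void merge_usage(struct usage_table *total,
                        const struct usage_table *per_thread,
                        size_t nthreads)
{
    assert(nthreads <= TOY_MAX_THREADS);
    for (size_t t = 0; t < nthreads; t++)
        for (size_t p = 0; p < TOY_PAGES; p++)
            for (size_t g = 0; g < TOY_GENS; g++)
                total->words[p][g] += per_thread[t].words[p][g];
}
```

The merge costs one pass over pages times generations times threads, so a collection that only tracks one generation could merge a single column instead.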

Many thanks,
Hayley

0001-Parallel-mark-region-garbage-collector.patch

Philipp Marek via Sbcl-devel

Jul 3, 2023, 7:26:02 AM
to Hayley Patton, sbcl-...@lists.sourceforge.net

> - Core files now contain the allocation bitmap,
> which is stored after the page table.
> I don't ever compress the allocation bitmap,
> as it's 1/128 the size of the heap.

I'm not too happy about that.

If you need to use large heaps (say, 64GB or 128GB),
you can either

- use a small heap dump and make sure that --dynamic-space-size
is passed along when starting the image, or

- pay the price (0.5 to 1GB!) in the heap dump,
which also corresponds to load time.

Can the allocation bitmap be derived, eg by a GC run right after
starting?

Another idea would be to allow changing (ie., at least *increasing*)
the heap size from within a running process.


> - Currently the number of GC threads is always 4 by default
> (three helpers and the thread which entered collect_garbage),
> unless a different number is provided using the GC_THREADS
> environment variable. I'm not sure what the default should be.

I guess that should be configurable from *within* the running image,
so that it can depend on some configuration, like

- number of cores allocated to this process,
- size of request worker pool,

etc.



_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

Hayley Patton

Jul 3, 2023, 8:35:38 AM
to Philipp Marek, sbcl-...@lists.sourceforge.net
On 3/7/23 17:09, Philipp Marek wrote:
> I'm not too happy about that.
>
> If you need to use large heaps (say, 64GB or 128GB),
> you can either
>
> - use a small heap dump and make sure that --dynamic-space-size
>   is passed along when starting the image, or
>
> - pay the price (0.5 to 1GB!) in the heap dump,
>   which also corresponds to load time.
>
Let me be a bit more precise here. I do the same trick that SBCL already
does for dumping the heap - only the prefix with live pages is stored in
the dump. Only the prefix of the allocation bitmap for live pages is
stored too. So the 1/128 ratio still holds.

(Though mmaping in the bitmap might help with startup time too, as is
done for the heap.)
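
To put numbers on the 1/128 ratio discussed above, here is a small arithmetic sketch. It assumes the ratio comes from one bitmap bit per 16-byte granule (16 bytes times 8 bits per byte is 128); that granule size and the function name are assumptions for the example, not something stated in the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one allocation bit per 16-byte granule, so the bitmap is
 * 1/(16 * 8) = 1/128 of the memory it describes. */
#define GRANULE_BYTES 16u

/* Size in bytes of the bitmap covering heap_bytes of heap. */
static uint64_t bitmap_bytes_for(uint64_t heap_bytes)
{
    assert(heap_bytes % (GRANULE_BYTES * 8u) == 0);
    return heap_bytes / GRANULE_BYTES / 8u;
}
```

Under that assumption a fully dumped 64 GiB heap would carry a 512 MiB bitmap and a 128 GiB heap a 1 GiB bitmap, which matches the 0.5 to 1GB figure quoted above. Since only the prefix for live pages is dumped, a core with, say, 40 MiB of live pages would carry about 320 KiB of bitmap.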

> Can the allocation bitmap be derived, eg by a GC run right after
> starting?

It seems possible in theory. The bitmap is only needed to deal with
conservative roots, which come from threads, and there are no Lisp
threads yet. But doing a full GC at startup doesn't seem like a good
idea for startup time.

>
>> - Currently the number of GC threads is always 4 by default
>> (three helpers and the thread which entered collect_garbage),
>> unless a different number is provided using the GC_THREADS
>> environment variable. I'm not sure what the default should be.
>
> I guess that should be configurable from *within* the running image,
> so that it can depend on some configuration, like
>
> - number of cores allocated to this process,
> - size of request worker pool,
>
> etc.
>

I haven't implemented any way to resize the thread pool, but that does
sound like the right thing to do.
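
For reference, reading a thread count from the GC_THREADS environment variable with a default of 4 can be sketched as below. This is not the code in the patch: the function names and the upper bound are hypothetical, and the sketch does not say whether the value counts the thread that entered collect_garbage.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_GC_THREADS 4  /* default stated in this thread */
#define MAX_GC_THREADS 64     /* hypothetical cap for the sketch */

/* Parse a thread-count string; fall back to the default on anything odd. */
static int parse_gc_threads(const char *text)
{
    if (text == NULL || *text == '\0')
        return DEFAULT_GC_THREADS;
    char *end;
    errno = 0;
    long n = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || n < 1 || n > MAX_GC_THREADS)
        return DEFAULT_GC_THREADS;
    return (int)n;
}

/* Read the setting once, for example when the GC thread pool is created. */
static int gc_thread_count(void)
{
    int n = parse_gc_threads(getenv("GC_THREADS"));
    assert(n >= 1);
    return n;
}
```

Making the count adjustable from within the image would additionally need a way to grow or shrink the pool between collections, which this sketch does not attempt.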

Douglas Katzman via Sbcl-devel

Jul 5, 2023, 8:45:41 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net

> - traceroot does not work, as it gleans the array of roots that gencgc finds, and I don't maintain that array.

Is its only failing that it won't find a path if the root of the path would be a thread stack? Or does it simply not work at all? 
In retrospect I'm not sure it was a good idea to build the inverted heap from a linear walk.  Why couldn't I have just done a forward trace but record backward edges?
That would make it more usable with any GC because we have to assume that tracing from static space finds everything.
So I'm wondering if I should try to redo the construction of the inverted heap, which, if you were lacking such a way, would presumably be helpful to both of us.
Doug

Hayley Patton

Jul 6, 2023, 1:28:46 AM
to Douglas Katzman, sbcl-...@lists.sourceforge.net
> Is its only failing that it won't find a path if the root of the path would be a thread stack? Or does it simply not work at all?

I think the former. The following appears to work, involving a symbol-value:

(defvar *x* (list 1 2 3))
(sb-ext:search-roots (sb-ext:make-weak-pointer *x*) :gc t)
-> (SB-IMPL::SYMBOL-HASHSET) #x100031D3D3[2] -> (CONS) #x1006BFDD27[1] -> ((SIMPLE-VECTOR 307)) #x1006C25E5F[78] -> (SYMBOL) #x100B1EC5DF[2] -> #x100931C7C7

Somehow I manage to make a loop in the compacting remset, by the way - or maybe I don't, I'm not sure. I have a program which will either cause it under a few minutes, or not after a day. Nor do I see why it should be possible to make a loop, as the remset uses the same data structure and synchronisation as tracing.

Douglas Katzman via Sbcl-devel

Jul 7, 2023, 6:11:23 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net
Hi Hayley,
I've gotten started looking at this a little bit.
Would it make sense to push it to a branch in sourceforge rather than trying to merge it in one go?
It sort of depends on what you feel the state of it is in terms of cleanliness of the code and what assurances there are about non-degradation of performance of the default build.
I feel like at least if it's upstream I can help rebase it and whatnot.
Currently I'm under a bit of strain trying to get my own stuff done, plus review the powerPC darwin patch series, plus deal with impending breakage of Google-local patches after xof merges his external-format changes. I don't want you to feel I'm putting you off indefinitely, but I can't look at this whole thing and say "sure, merge to master" in any guaranteed timeframe.

Wrt header rearrangement: I'm all for it if it gets us closer to the include-what-you-use paradigm.
It should always be possible to remove an inclusion from a header that is not part of what it is trying to explicitly expose, and it should always be possible to include any given header without knowing what that header might need you to have included first. SBCL fails on both those aspects. I'm pretty pleased that each auto-generated header can compile by itself now, fwiw.

P.S. I notice that you participate in the mmtk zulip chat.  Did you give serious consideration to creating a binding to SBCL or was it a pipe dream?

Hayley Patton

Jul 8, 2023, 1:07:24 AM
to Douglas Katzman, sbcl-...@lists.sourceforge.net
> Hi Hayley,
> I've gotten started looking at this a little bit.
> Would it make sense to push it to a branch in sourceforge rather than
> trying to merge it in one go?
> It sort of depends on what you feel the state of it is in terms of
> cleanliness of the code and what assurances there are about
> non-degradation of performance of the default build.
The only change performance-wise to the default build is the panic
trigger for GC, which would be very hard to keep triggering without
exhausting the heap with gencgc. (Not sure about with the mark-region
GC; non-moving should help, but I haven't tested.) So I think
performance should be the same.
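
The panic trigger described earlier in the thread (collect older generations when more than 90% of the heap is used and no other trigger fired) amounts to a predicate along these lines. This is a minimal sketch with hypothetical names and parameters, not the condition as written in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the panic trigger should force a collection of older
 * generations: no other trigger fired and usage exceeds 90% of the heap. */
static bool should_panic_collect(uint64_t bytes_used, uint64_t heap_bytes,
                                 bool other_trigger_fired)
{
    assert(heap_bytes > 0 && bytes_used <= heap_bytes);
    assert(heap_bytes <= UINT64_MAX / 10);  /* keeps the products below in range */
    if (other_trigger_fired)
        return false;
    /* bytes_used / heap_bytes > 9/10, written without floating point */
    return bytes_used * 10 > heap_bytes * 9;
}
```

Because the predicate only fires above 90% usage, a gencgc build would normally exhaust the heap before triggering it repeatedly, which is the reasoning given above for expecting unchanged performance in the default build.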

> Wrt header rearrangement: I'm all for it if it gets us closer to the
> include-what-you-use paradigm.
> It should always be possible to remove an inclusion from a header that
> is not part of what it is trying to explicitly expose, and it should
> always be possible to include any given header without knowing what
> that header might need you to have included first. SBCL fails on both
> those aspects. I'm pretty pleased that each auto-generated header can
> compile by itself now, fwiw.
I suspect I haven't made the situation better; I moved some helper
functions from gencgc.c into headers, and it seems I moved some things
between headers, but the cold isn't helping me figure out what moved
where from the patch file.

> P.S. I notice that you participate in the mmtk zulip chat.  Did you
> give serious consideration to creating a binding to SBCL or was it a
> pipe dream?

I did consider it. The main issues were that I wasn't sure how to rip
out all the code assuming gencgc stuff, nor how to make genesis dump a
heap that MMTk could manage. MMTk lacks conservative stack scanning and
weak references, which I could have implemented, and the
mutator/allocator state needlessly has state for every collector
plan/allocator, which I could have also modified (to only have the Immix
bump allocator, say). I arguably had to make some tricky changes to make
my collector work, however; but I think I got off easier there.

(Fixed the issue with the loop in the compacting remset. I got
materialising the allocation bitmap to crash once, but my stress testing
program kept running through the night, so we'll see if it happens again.)

Douglas Katzman via Sbcl-devel

Jul 8, 2023, 9:53:41 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net
On Fri, Jul 7, 2023 at 9:06 PM Hayley Patton <hay...@applied-langua.ge> wrote:
> MMTk lacks conservative stack scanning and
It has it now. See mod.rs which exposes a "valid object" bit for conservative stack

> weak references
and similarly add_weak_candidate

It seems like it might be worthwhile revisiting mmtk at some point

Hayley Patton

Jul 9, 2023, 12:39:28 AM
to Douglas Katzman, sbcl-...@lists.sourceforge.net

>> MMTk lacks conservative stack scanning and
> It has it now. See mod.rs which exposes a "valid object" bit for conservative stack
So it has. From trying to check just the one bit, I re-discovered that there can be interior pointers (from the program counter and stack?) to code; I'm not sure about the speed of checking a bit/location at a time to resolve interior pointers. Though I rewrote my bitmap-scanning code without being too clever about scanning the first word (as some bits may correspond to locations after the pointer) - I really have to benchmark again, and/or track how many conservative references have to be scanned.


>> weak references
> and similarly add_weak_candidate
Not sure if that works for weak hash tables (in SBCL; OpenJDK uses weak pointer objects in the WeakHashMap), though there is a more general API for doing custom reference processing too <https://github.com/mmtk/mmtk-core/pull/700>. Shows that I haven't kept up with progress in MMTk much.

> It seems like it might be worthwhile revisiting mmtk at some point

Definitely.

Christophe Rhodes

Jul 10, 2023, 7:34:36 AM
to Douglas Katzman via Sbcl-devel
Douglas Katzman via Sbcl-devel <sbcl-...@lists.sourceforge.net>
writes:

> Currently I'm under a bit of strain trying to get my own stuff done,
> plus review the powerPC darwin patch series, plus deal with impending
> breakage of Google-local patches after xof merges his external-format
> changes.

For what it's worth, I now don't expect to merge external-format work
until early September, so hopefully that can take some of the feeling of
impending doom away at least for now.

Best,

Christophe

Douglas Katzman via Sbcl-devel

Jul 10, 2023, 11:52:01 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net

> - I had to rearrange some headers, and I'm not sure if I did it in a reasonable way. I also started to #ifndef LISP_FEATURE_MARK_REGION_GC out dead code in gencgc, but I didn't finish. Should/could any common code be extracted into other files, rather than having to #ifndef out a large amount of gencgc.c?


In your desired end-state would you think that #+gencgc and #+mark-region-gc would be mutually exclusive?  I would.
But given the amount of common code, I think it should be pulled out of gencgc so that if #+gencgc is the selected algorithm, we compile gencgc.c plus the shared file; or else if #+mark-region-gc is selected then we compile mark-region.c and the shared file.
This will give greater confidence that gencgc when selected as the core algorithm is otherwise unaffected.
Some of this is as simple as more #defines.  e.g. Adding a C macro for set_allocation_bit_mark that expands to nothing or something would remove a few #ifdefs.
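
A stub macro of the kind suggested above might look like the following. Only the macro name set_allocation_bit_mark and the feature symbol LISP_FEATURE_MARK_REGION_GC come from this thread; the toy bitmap, the offset argument, the 16-byte granule and the caller are assumptions made so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real bitmap: one bit per 16-byte granule. */
static uint8_t toy_allocation_bitmap[64];

#ifdef LISP_FEATURE_MARK_REGION_GC
/* Mark-region build: set the bit for the granule at this heap offset. */
#define set_allocation_bit_mark(offset) \
    (toy_allocation_bitmap[((offset) / 16) / 8] |= \
         (uint8_t)(1u << (((offset) / 16) % 8)))
#else
/* gencgc build: the call compiles to nothing. */
#define set_allocation_bit_mark(offset) ((void)0)
#endif

/* Shared code calls the macro unconditionally, with no #ifdef at the
 * call site. */
static void toy_note_allocation(unsigned offset)
{
    assert(offset / 16 / 8 < sizeof toy_allocation_bitmap);
    set_allocation_bit_mark(offset);
}
```

The conditional lives in one place, so shared files stay free of per-GC #ifdefs and the gencgc build is visibly unaffected at each call site.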

Btw, any user of ELF cores (of which I know none other than Google) won't be able to use mark-region until we come up with a way to save ELF cores that don't depend critically on #+immobile-space.  It should be possible to write the text section using code pages from anywhere.  I'm not sure if dedicated pages of code were even a thing when I first did ELF cores.

Hayley Patton

Jul 11, 2023, 2:18:53 AM
to Douglas Katzman, sbcl-...@lists.sourceforge.net
>> - I had to rearrange some headers, and I'm not sure if I did it in a reasonable way. I also started to #ifndef LISP_FEATURE_MARK_REGION_GC out dead code in gencgc, but I didn't finish. Should/could any common code be extracted into other files, rather than having to #ifndef out a large amount of gencgc.c?
> In your desired end-state would you think that #+gencgc and #+mark-region-gc would be mutually exclusive?  I would.
I'm not sure, I still use a lot of the gencgc code. Would there be another feature for the shared code, or would it be always included (assuming that #+cheneygc is now unused)? Do you use much of the gencgc code in your concurrent collector?

> Some of this is as simple as more #defines.  e.g. Adding a C macro for set_allocation_bit_mark that expands to nothing or something would remove a few #ifdefs.
That's a good idea.

> Btw, any user of ELF cores (of which I know none other than Google) won't be able to use mark-region until we come up with a way to save ELF cores that don't depend critically on #+immobile-space.  It should be possible to write the text section using code pages from anywhere.  I'm not sure if dedicated pages of code were even a thing when I first did ELF cores.

I do want to support immobile space, but somehow I couldn't get it to work (even for a rather lax definition of "work"). As in my first email there are many deficiencies that I want to correct in the future; but I'd prefer not to grow the pile of code for review any further, nor do I enjoy keeping up with changes made upstream, moreso when I've had to move code around already (though #-mark-region-gc passes the test suite, I'm still uneasy about it).

Douglas Katzman via Sbcl-devel

Jul 12, 2023, 4:36:35 AM
to Hayley Patton, sbcl-...@lists.sourceforge.net
Hi,
Having tried your GC now, I have to say its performance is impressive. I've started to reshape some parts in a way that gives me greater confidence that 'gencgc' is unaffected.
Basically I want to have either gencgc.c or pmrgc.c (a new file) be compiled, while cutting out as much as I can from the commit which becomes the alternative GC.
I'll either send you back a patch for review, or maybe just commit to a branch.

To ensure I'm not making negative progress on things you want working eventually, can you confirm that it's not (now and/or forever) expected to compile on:
- Darwin that has write^execute restriction in hardware
- Darwin that lacks _Thread annotations
- which means not on any Darwin that I have access to, because each is either the first or second kind.
- pedantic compilers that require _Atomic(int) as the argument to atomic_whatever
- any of our architectures that use hardware page protection still
- #-sb-thread
- any other feature combinations? (is it sensitive to #+/- sb-safepoint)
Does it work with the precisely scavenged stacks? Or should we/can we start assuming that we'll transition to ambiguous stacks everywhere, for consistency?

I couldn't test on #+win32 (in emulation) because I'm hitting the gcc compiler bug for which I had discovered a "workaround" of downgrading msys2. But with your patch applied, I hit that gcc bug even on the older gcc. That's not surprising because the gcc bug itself is quite old.
Just wondering what else I should test on.
Thanks, Doug



Douglas Katzman via Sbcl-devel

Jul 12, 2023, 4:47:32 AM
to Hayley Patton, sbcl-...@lists.sourceforge.net
On Sun, Jul 2, 2023 at 7:04 AM Hayley Patton <hay...@applied-langua.ge> wrote:

> - The tracing logic in fullcgc.c has been made generic in trace-object.inc, which is customised using the C preprocessor.


Why can't you ditch fullcgc.c and instead perform a full GC by considering gen6 as a regular generation, and scavenge only from static roots?
The sole reason that gencgc can't do that is that it has no way of evacuating gen6 (at least if gen6 is fixed in place), and without a way to evacuate it has no way to represent the known live objects. 
So fullcgc was devised as a hack that works around that limitation, which would not seem to pertain to a marking collector. Is it an issue of granularity of lines where you normally won't free a line until all objects on it are dead? 

Hayley Patton

Jul 12, 2023, 6:27:27 AM
to Douglas Katzman, sbcl-...@lists.sourceforge.net
> Having tried your GC now, I have to say its performance is impressive.

Glad to hear it!

> To ensure I'm not making negative progress on things you want working
> eventually, can you confirm that it's not (now and/or forever)
> expected to compile on:
> - Darwin that has write^execute restriction in hardware

It's not expected to work on Darwin with W^X, though support would be nice.

> - Darwin that lacks _Thread annotations

I expect to have _Thread_local at the moment, and definitely need some
kind of thread-local storage; does Darwin have a different way to access
TLS?

With the attached patch I currently wake each thread to make sure they
committed every packet of the compaction remset; a better design would
have tracing threads build up thread-local lists without committing to a
global list, and then compaction snoops each TLS to get each list. Such
a design would need us to reinvent TLS in order to be able to snoop, as
I was told we can't legally access a _Thread_local from another thread,
through e.g. publishing a pointer into each TLS.

> - pedantic compilers that require _Atomic(int) as the argument to
> atomic_whatever
That probably should be fixed.
> - any of our architectures that use hardware page protection still
The current scavenging algorithm relies on lines and cards being the
same size (128 bytes); regardless there's too much card pollution with
1k pages, which makes scavenging very slow, so I can't imagine 4k or 16k
hardware pages being better.
> - #-sb-thread
> - any other feature combinations? (is it sensitive to #+/- sb-safepoint)
I haven't tried to build #-sb-thread or #+sb-safepoint, though I don't
think there should be issues.
> Does it work with the precisely scavenged stacks? Or should we/can we
> start assuming that we'll transition to ambiguous stacks everywhere,
> for consistency?
It won't work at the moment, as I only modified the code for ambiguous
stacks. But precise scavenging could be made to work.
> I couldn't test on #+win32 (in emulation) because I'm hitting the gcc
> compiler bug for which I had discovered a "workaround" of downgrading
> msys2. But with your patch applied, I hit that gcc bug even on the
> older gcc. That's not surprising because the gcc bug itself is quite old.
> Just wondering what else I should test on.

I've only tested on x86-64/Linux; I don't know the Windows build stuff
and I don't have a Darwin system handy.

> Why can't you ditch fullcgc.c and instead perform a full GC by
> considering gen6 as a regular generation, and scavenge only from
> static roots?
> Is it an issue of granularity of lines where you normally won't free a
> line until all objects on it are dead?

That's what I do already, and there's no issues with it at the moment.
But it doesn't quite work for immobile space, as it'd promote all
immobile objects to gen6, as part of how objects are marked (by putting
them in the scratch generation, then putting them back in the generation
being collected). The mark-region GC represents (both object and line)
marking separate of generations, so it doesn't have this issue.

But I got the tracing code from fullcgc, and so decided to make the
tracing code customisable to keep gencgc working without having to
duplicate that code.

Also please find attached another patch which simplifies find_object and
the committing of thread-local remsets for compaction, speeds up
compaction by walking pages more efficiently, and adds some stub macros
to avoid some #ifdefs as you suggested. I had rare crashes due to
searching for ambiguous roots, and the GC would infrequently hang
because the remset somehow made a cycle after a packet was somehow
written twice, so thought it'd be nice (and seemingly
not-performance-regressing) to simplify both.

Thanks,
Hayley
0002-Simplify-find_object-and-remset-committing-add-stubs.patch

Douglas Katzman via Sbcl-devel

Jul 13, 2023, 2:21:03 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net
Hi, one question about the current code, and then some remarks that aren't relevant to the review.

Question: why does scanning the binding stack use 'mr_preserve_ambiguous' instead of 'mr_preserve_range'? (TLS does use preserve_range)
The binding stack on #+sb-thread architectures looks like a sequence of fixnum and exact pointer. (The "fixnum" is actually the TLS index of the symbol as a raw value, but alignment makes it have a 0 low bit).  And for #-sb-thread it looks like a sequence of exact pointer to symbol and exact pointer to value. If you were seeing garbage values on the binding stack, that suggests a bug somewhere.



> That's what I do already, and there's no issues with it at the moment.
> But it doesn't quite work for immobile space ...
Anything about immobile space as it exists may become irrelevant. I'm not sure of the time frame.
Here's the idea:
* immobile fdefns need not exist. I already have the changes ready to put fdefns back in dynamic space. In that patch, lisp-to-lisp calls go through a linkage table (much like the foreign linkage table). On x86-64, the best-case call is 1 instruction and the worst-case is 2 instructions. It's strictly an improvement over using fdefns as they exist without immobile space, or equivalent to immobile fdefns.
* immobile code. If all code is immobile - and even if it isn't - all we really need to create an ELF core is the code placed contiguously. The sole reason I didn't try to pull code out of dynamic space was that I hadn't thought about how not to leave massive gaps in dynamic space, and as you know the on-disk core would either hold dead objects or be sparse or both. So I'm pretty sure we can do something with your GC to defrag the code into a separate space at save-lisp-and-die.
* immobile symbols - these are the only "complicated" piece.  We gain some codegen advantages by placing them sub-2GB. Could you have a part of dynamic space that is mapped discontiguously in your GC? It mostly just means that address-to-page-index and vice-versa become piecewise linear functions.
* immobile layouts - either use the same idea of immobile symbols, or restore to working order the space formerly known as metaspace. (I liked it, though I realize there is a dissenting opinion). The advantage to having 4-byte addressable layouts is very evident: a 3% to 5% reduction in memory use due to all instances being 1 word shorter on average.
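
The remark above that a discontiguously mapped dynamic space makes address-to-page-index and its inverse piecewise linear can be sketched as follows. The two regions, their addresses and sizes, the 32 KiB page size and all names are hypothetical; this is an illustration of the idea, not a proposal for the actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_BYTES 32768u  /* assumed page size */

struct toy_region {
    uint64_t start;      /* first address of the mapping */
    size_t npages;       /* number of pages in the mapping */
    size_t first_index;  /* page index of the first page */
};

/* Hypothetical layout: a small piece below 2 GiB, then the main space. */
static const struct toy_region toy_regions[] = {
    { 0x50000000ULL,   1024, 0 },
    { 0x1000000000ULL, 4096, 1024 },
};
#define TOY_NREGIONS (sizeof toy_regions / sizeof toy_regions[0])

/* Page index of addr, or -1 if addr lies in neither region. */
static long toy_find_page_index(uint64_t addr)
{
    for (size_t i = 0; i < TOY_NREGIONS; i++) {
        const struct toy_region *r = &toy_regions[i];
        uint64_t size = (uint64_t)r->npages * TOY_PAGE_BYTES;
        if (addr >= r->start && addr - r->start < size)
            return (long)(r->first_index + (addr - r->start) / TOY_PAGE_BYTES);
    }
    return -1;
}

/* Inverse mapping: first address of the page with the given index. */
static uint64_t toy_page_address(size_t index)
{
    for (size_t i = 0; i < TOY_NREGIONS; i++) {
        const struct toy_region *r = &toy_regions[i];
        if (index >= r->first_index && index - r->first_index < r->npages)
            return r->start + (uint64_t)(index - r->first_index) * TOY_PAGE_BYTES;
    }
    assert(0 && "page index out of range");
    return 0;
}
```

With only two pieces the loop is a pair of range checks, so the extra cost over a single subtract-and-shift is small, but any code that assumes consecutive page indices are adjacent in memory would need auditing.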



Douglas Katzman via Sbcl-devel

Jul 13, 2023, 10:03:29 PM
to Hayley Patton, sbcl-...@lists.sourceforge.net

Hi Hayley, I'm working on refactoring and rebasing your patch to master.

The overarching goal is that the core logic of 'gencgc' shall be essentially untouched, meaning not a single #ifdef added or removed. I discussed it with xof, we think it's reasonable to move things
from gencgc into a common location if needed by both it and mark-region-gc. And also it's OK to insert a certain amount of conditional code in the new place if innocuous and unlikely to crash in a subtle way. Things like saving the PTEs, I felt were unlikely to crash subtly.
The kinds of code I wished to keep pristine were the critical internal parts like lisp_alloc() which I did not want to see interspersed with more #ifdefs.

Given those ground rules I was able to whittle the new 'pmrgc.c' file down substantially. (You could combine it with mark-region.c)
As a preparatory commit, I will take everything that I pulled out of gencgc into its new location. Reviewers would be easily able to see that all I did is move some code with absolutely no diffs other than possibly changing 'static' to 'extern'. However, I broke something in my attempt to rebase, so I will follow up with the actual patch when that's fixed.

My comments below are in 3 sections:
* test failures I observed that were other than known or don't-care
* things changed or still in need of change
* questions

Part I

In addition to your known test failures, a few failures in 'gc.impure' occur but - I think - only under parallel-exec. If I had to guess, it does not stop and restart the GC threads around fork(). So we can probably consider this an error in the test runner and it might be useful to have it fixed, as it will cut your testing time down by N cores.

::: Running :PIN-ALL-CODE-WITH-GC-ENABLED
::: UNEXPECTED-FAILURE :PIN-ALL-CODE-WITH-GC-ENABLED due to SIMPLE-ERROR:
        "68751066331 is not a valid argument to SB-KERNEL:MAKE-LISP-OBJ"

::: Running :GC-LOGFILE
fatal error encountered in SBCL pid 3648282 tid 3648282:
Invalid page type 0x12 (p2084)


Part II

* The post_process function in coreparse needed to receive 'spaces' as an argument otherwise we can't build with #+immobile-space.

* Conditionals for non-soft-card marks were all removed since the requirement on mutators is to be able to mark at a line granularity.  This takes out large pieces of code about mprotect, such as in 'unprotect_oldspace'.

* page_extensible_p seemed to have no bearing on mark-region but was moved to a common header

* irrelevant variables: alloc_granularity, new_areas

* Obsolete comments.
  "Note that the majority of GC is single-threaded"
  "This is done at the start ... many page faults"

* I'm not thrilled that we have to conditionalize the 'gc-mumble' headers when the reason for naming a file per GC in the first place was to conditionally include this or that. Of course it never worked as intended. But it certainly could stand improvement.

Part III

* The general version of the tracing algorithm is not type-safe.
  Consider that in fullcgc.c with immobile-space enabled, the C preprocessor emits:
    wrap_mark(layout, &((uint32_t*)(where))[1], SOURCE_NORMAL);
but the second parameter to the ACTION is supposed to be lispobj*.  This surely means you can't dereference through it, which I presume will constitute a bug if applied to anything that needs the second arg. It's not a bug in mark-region because there are no half-sized pointers without immobile-space.

* Can pinned_p just return a constant 0 instead of pretending that the pinned_objects hash-table ever has >0 items? I never saw anything inserted in the regression suite.

* page_single_obj_p seems to always return 0 in my testing. Is that right?
  This discovery came about when attempting to check if 'add_new_area' could be removed, as I tried to cause copy_potential_large_object() to fall into the case where it would call add_new_area, but it never did. (And of course add_new_area can be removed, since areas are never read, which I realized after the fact)

* Is the comment above RESET_ALLOC_START_PAGES still right in its entirety, particularly where it says "other than it serves its purpose for picking up where it left off" ?
  Does mark-region.c use the start page for that purpose?
 
* Similarly adjust_obj_ptes is never called. So you won't account for pages freed if a vector shrinks from 1MB to 1KB ? I removed adjust_obj_ptes but maybe you'll put it back if this was the wrong thing to do.
 
* Does 'walk_generation' remain correct as-is if generation is tracked at a line level? Does it only work if the supplied generation is -1 (therefore selecting all generations)?

* We should probably remove the hook for the statistical profiler in :ALLOCATION mode. It expects to sample a stack once every GENCGC_PAGE_BYTES, but it doesn't really do that with mark region since you don't know how many lines you made available in a new region. (I don't use this feature, so I don't really care. SB-APROF is better)

Hayley Patton

Jul 14, 2023, 1:56:36 AM7/14/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
On 14/7/23 08:02, Douglas Katzman wrote:
In addition to your known test failures, a few failures in 'gc.impure' occur but - I think - only under parallel-exec. If I had to guess, it does not stop and restart the GC threads around fork(). So we can probably consider this an error in the test runner and it might be useful to have it fixed, as it will cut your testing time down by N cores.

::: Running :PIN-ALL-CODE-WITH-GC-ENABLED
::: UNEXPECTED-FAILURE :PIN-ALL-CODE-WITH-GC-ENABLED due to SIMPLE-ERROR:
        "68751066331 is not a valid argument to SB-KERNEL:MAKE-LISP-OBJ"

::: Running :GC-LOGFILE
fatal error encountered in SBCL pid 3648282 tid 3648282:
Invalid page type 0x12 (p2084)

I've seen both errors without the parallel test runner. I don't get why either happens; for PIN-A-C-W-G-E it's possible that we haven't yet put down part of the allocation bitmap (as GC hasn't happened between allocation and searching dynamic space), but find_object should spot that the object is in a fresh line, and materialise that part of the bitmap. GC-LOGFILE is crashing due to finding a single-object boxed page - the allocator treats objects larger than 3/4 of a page as "large" when allocation takes the slow path, to avoid skipping over pages which could be useful for small objects, and it seems it made a single-object boxed page. (But the allocator will put such a somewhat-large object on a multi-object page if the object fits.) Does gencgc never make single-object boxed pages?

Part II

* The post_process function in coreparse needed to receive 'spaces' as an argument otherwise we can't build with #+immobile-space.
Well spotted.

* page_extensible_p seemed to have no bearing on mark-region but was moved to a common header
I think I would have used it before, but then stopped using it, because page_cards_all_marked_nonsticky would be too eager to rule out partly used pages, when we have multiple generations in a page and can reclaim without moving. (And p_c_a_m_s reads 32 words of the card table - genesis/cardmarks.h is quite exciting with 128 byte cards).

* Obsolete comments.
  "Note that the majority of GC is single-threaded"
Scavenging and sweeping have taken most of the GC time in some workloads I tested; those phases don't seem to scale so well, so that comment might as well be true sometimes :)

Part III

* The general version of the tracing algorithm is not type-safe.
  Consider that in fullcgc.c with immobile-space enabled, the C preprocessor emits:
    wrap_mark(layout, &((uint32_t*)(where))[1], SOURCE_NORMAL);
but the second parameter to the ACTION is supposed to be lispobj*.  This surely means you can't dereference through it, which I presume will constitute a bug if applied to anything that needs the second arg. It's not a bug in mark-region because there are no half-sized pointers without immobile-space.
The second parameter could be NULL when #+compact-instance-header, as we needn't log pointers into immobile space, as we won't ever target it for incremental compaction. I still want to support immobile space eventually, so it'll be a bug then.

* Can pinned_p just return a constant 0 instead of pretending that the pinned_objects hash-table ever has >0 items? I never saw anything inserted in the regression suite.
I pin at the granularity of entire pages, as pinning only matters for compaction, and it doesn't seem to hurt compaction quality. Then pinned_p only needs to check gc_page_pins.
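
A minimal sketch of page-granularity pinning as described above, assuming a contiguous heap. The page size, heap base, table size, and all `sketch_` names are hypothetical; only the idea that pinned_p consults a per-page table (gc_page_pins in the patch) comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BYTES 32768u /* assumed page size, not SBCL's actual value */
#define SKETCH_NPAGES 16u        /* tiny heap, for the example only */

static const uintptr_t sketch_heap_base = 0x10000000u; /* hypothetical base */
static unsigned char sketch_page_pins[SKETCH_NPAGES];  /* stands in for gc_page_pins */

/* Map an address inside the heap to the index of its page. */
size_t sketch_page_index(uintptr_t addr) {
    size_t index = (addr - sketch_heap_base) / SKETCH_PAGE_BYTES;
    assert(index < SKETCH_NPAGES);
    return index;
}

/* Pinning any address pins the whole page that contains it. */
void sketch_pin(uintptr_t addr) {
    sketch_page_pins[sketch_page_index(addr)] = 1;
}

/* An object counts as pinned if its page is pinned; no per-object table. */
int sketch_pinned_p(uintptr_t addr) {
    return sketch_page_pins[sketch_page_index(addr)] != 0;
}
```

The trade-off is that one pinned object keeps every object on its page from being moved, which the reply above says has not hurt compaction quality in practice.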

* page_single_obj_p seems to always return 0 in my testing. Is that right?
  This discovery came about when attempting to check if 'add_new_area' could be removed, as I tried to cause copy_potential_large_object() to fall into the case where it would call add_new_area, but it never did. (And of course add_new_area can be removed, since areas are never read, which I realized after the fact)
That doesn't seem right; try_allocate_large sets SINGLE_OBJECT_FLAG e.g. But I don't think the incremental compactor ever attempts to copy a large object, which might explain what you observed.

* Is the comment above RESET_ALLOC_START_PAGES still right in its entirety, particularly where it says "other than it serves its purpose for picking up where it left off" ?
  Does mark-region.c use the start page for that purpose?
It does.

* Similarly adjust_obj_ptes is never called. So you won't account for pages freed if a vector shrinks from 1MB to 1KB ? I removed adjust_obj_ptes but maybe you'll put it back if this was the wrong thing to do.
I wasn't aware that SBCL could shrink objects in-place. (I saw logic for it in e.g. copy_potential_large_object, but haven't seen anything which shrinks a vector.) No, that's not accounted for.

* Does 'walk_generation' remain correct as-is if generation is tracked at a line level? Does it only work if the supplied generation is -1 (therefore selecting all generations)?
Right, it only works if the supplied generation is -1. I couldn't find any users of walk_generation that provide a specific generation?

* We should probably remove the hook for the statistical profiler in :ALLOCATION mode. It expects to sample a stack once every GENCGC_PAGE_BYTES, but it doesn't really do that with mark region since you don't know how many lines you made available in a new region. (I don't use this feature, so I don't really care. SB-APROF is better)

I suspect we could rig the allocator to count how many lines get allocated, and ensure that the slow path gets taken after allocating GENCGC_PAGE_BYTES. But I agree that the hook could just be removed, if no one uses sprof with :ALLOCATION mode.
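
One way the line-counting idea could look, as a sketch only. The line size, the page size, and the `sketch_` names are assumptions; nothing here is taken from the patch.

```c
#include <stddef.h>

#define SKETCH_PAGE_BYTES 32768u /* stands in for GENCGC_PAGE_BYTES; value assumed */
#define SKETCH_LINE_BYTES 128u   /* assumed line size */

/* Bytes handed out since the profiler hook last fired. */
static size_t sketch_bytes_since_sample;

/* Called whenever the allocator claims `nlines` fresh lines.
 * Returns 1 if the caller should take the slow path and let the
 * profiler sample the stack, 0 otherwise. */
int sketch_note_lines_allocated(size_t nlines) {
    sketch_bytes_since_sample += nlines * SKETCH_LINE_BYTES;
    if (sketch_bytes_since_sample >= SKETCH_PAGE_BYTES) {
        /* Carry the remainder so samples stay about one page apart. */
        sketch_bytes_since_sample -= SKETCH_PAGE_BYTES;
        return 1;
    }
    return 0;
}
```

A claim of more than one page's worth of lines still produces a single sample in this sketch; whether that matters depends on how the profiler weights samples.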

Douglas Katzman via Sbcl-devel

Jul 14, 2023, 2:46:14 AM7/14/23
to Hayley Patton, sbcl-...@lists.sourceforge.net


I wasn't aware that SBCL could shrink objects in-place. (I saw logic for it in e.g. copy_potential_large_object, but haven't seen anything which shrinks a vector.) No, that's not accounted for.
There are essentially two uses for object size shrinking:
1. %SHRINK-VECTOR which is used by sequence functions on arrays where we'll overestimate the result size and then maybe clip the end
2. Bignum operations. Normalizing a bignum where all the high words became 0 or all became -1 can remove insignificant sign-extending words
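
To illustrate the second case, here is a sketch of bignum normalization on a plain array of 64-bit words, least significant word first, in two's complement. This is not SBCL's implementation (which lives on the Lisp side); the function name and representation are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return how many words are still needed after dropping high words
 * that merely repeat the sign of the word below them. */
size_t sketch_normalized_length(const int64_t *words, size_t n) {
    assert(n >= 1); /* a bignum always has at least one word */
    while (n > 1) {
        int64_t top = words[n - 1];
        int64_t next = words[n - 2];
        /* A top word of 0 is redundant only if the next word is already
         * non-negative; a top word of -1 only if the next is negative. */
        if ((top == 0 && next >= 0) || (top == -1 && next < 0))
            n--;
        else
            break;
    }
    return n;
}
```

When the returned length is smaller than the original, the object shrinks in place, which is the situation the GC has to tolerate.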

Attached are my changes that apply cleanly at the latest commit. It builds with several combinations of options but I didn't try plain gencgc on anything but x86-64 just yet.
I plan to make it where mark-region and gencgc are mutually exclusive choices.
It'll look nicer when the base of this change has all the stuff which I moved out of gencgc already moved out in a precursor change.
I did NOT apply your second patch ("simplify find_object") in this.
Let me know what you think.  The tests passed for me except the 4 that you probably know of. And all tests passed without the mark-region-gc feature, and I think with and without immobile-space.

0001-Hayley-s-GC-refactored.patch

Hayley Patton

Jul 15, 2023, 1:18:44 AM7/15/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
On 14/7/23 12:45, Douglas Katzman wrote:
There are essentially two uses for object size shrinking:
1. %SHRINK-VECTOR which is used by sequence functions on arrays where we'll overestimate the result size and then maybe clip the end
2. Bignum operations. Normalizing a bignum where all the high words became 0 or all became -1 can remove insignificant sign-extending words
Seems mark_lines and sweeping would do the right thing after a vector gets shrunk.

Attached are my changes that apply cleanly at the latest commit. It builds with several combinations of options but I didn't try plain gencgc on anything but x86-64 just yet.
I plan to make it where mark-region and gencgc are mutually exclusive choices.
It'll look nicer when the base of this change has all the stuff which I moved out of gencgc already moved out in a precursor change.
I did NOT apply your second patch ("simplify find_object") in this.
Let me know what you think.  The tests passed for me except the 4 that you probably know of. And all tests passed without the mark-region-gc feature, and I think with and without immobile-space.

Looks good to me. Is duplicating the functions between pmrgc.c and gencgc.c going to pose any problems for anyone wanting to change one of the duplicated functions? pmrgc.c is relatively small, to be fair.

Douglas Katzman via Sbcl-devel

Jul 15, 2023, 1:30:38 AM7/15/23
to Hayley Patton, sbcl-...@lists.sourceforge.net


Looks good to me. Is duplicating the functions between pmrgc.c and gencgc.c going to pose any problems for anyone wanting to change one of the duplicated functions? pmrgc.c is relatively small, to be fair.


I think we prefer some code duplication to running the risk of modifying gencgc when we don't intend to.

I just pushed a branch which contains that patch plus the changes necessary for x86-64 macOS.
Performance seemed OK on the mac even though it has to make a ton of calls to pthread_getspecific.
My guess is that it may be more typical to try to bundle all your thread-local variables into one struct, call pthread_getspecific in some top level location in the GC worker, and then pass the struct around a lot more. But maybe it's fine.

Hayley Patton

Jul 15, 2023, 4:41:09 AM7/15/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
On 14/7/23 00:20, Douglas Katzman wrote:
Question: why does scanning the binding stack use 'mr_preserve_ambiguous' instead of 'mr_preserve_range'? (TLS does use preserve_range)
The binding stack on #+sb-thread architectures looks like a sequence of fixnum and exact pointer. (The "fixnum" is actually the TLS index of the symbol as a raw value, but alignment makes it have a 0 low bit).  And for #-sb-thread it looks like a sequence of exact pointer to symbol and exact pointer to value. If you were seeing garbage values on the binding stack, that suggests a bug somewhere.
No good reason, mr_preserve_range seems to work fine.

* immobile symbols - these are the only "complicated" piece.  We gain some codegen advantages by placing them sub-2GB. Could you have a part of dynamic space that is mapped discontiguously in your GC? It mostly just means that address-to-page-index and vice-versa become piecewise linear functions.
It's possible, but I'm not sure about the inability to defrag immobile space at runtime; non-moving mark-region collection does produce fragmentation by bump-allocating, and I'd assume TLSF is smarter at not fragmenting.
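
The "piecewise linear" mapping mentioned in the question can be sketched as follows. The two segments, their addresses and sizes, and the `sketch_` names are all hypothetical; the point is only that page indices stay dense while addresses do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BYTES 32768u /* assumed page size */

struct sketch_segment { uint64_t base; size_t npages; };

/* Hypothetical layout: a small low segment (below 2GB) and a main segment. */
static const struct sketch_segment sketch_segments[] = {
    { 0x50000000ULL, 4 },
    { 0x1000000000ULL, 8 },
};
#define SKETCH_NSEGMENTS (sizeof sketch_segments / sizeof sketch_segments[0])

/* Address to page index; returns -1 if the address is in no segment. */
long sketch_addr_to_page(uint64_t addr) {
    long first = 0; /* index of the first page of the current segment */
    for (size_t i = 0; i < SKETCH_NSEGMENTS; i++) {
        const struct sketch_segment *s = &sketch_segments[i];
        uint64_t end = s->base + (uint64_t)s->npages * SKETCH_PAGE_BYTES;
        if (addr >= s->base && addr < end)
            return first + (long)((addr - s->base) / SKETCH_PAGE_BYTES);
        first += (long)s->npages;
    }
    return -1;
}

/* Page index to the address of the page start; returns 0 if out of range. */
uint64_t sketch_page_to_addr(long page) {
    assert(page >= 0);
    for (size_t i = 0; i < SKETCH_NSEGMENTS; i++) {
        const struct sketch_segment *s = &sketch_segments[i];
        if (page < (long)s->npages)
            return s->base + (uint64_t)page * SKETCH_PAGE_BYTES;
        page -= (long)s->npages;
    }
    return 0;
}
```

With only two segments the loop is a single extra comparison per lookup, so the cost over a purely linear mapping should be small.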

* immobile layouts - either use the same idea of immobile symbols, or restore to working order the space formerly known as metaspace. (I liked it, though I realize there is a dissenting opinion). The advantage to having 4-byte addressable layouts is very evident: a 3% to 5% reduction in memory use due to all instances being 1 word shorter on average.

Something to do with immobile space makes the mutator about 40% faster in my parallel fuzz tester application FWIW. I'd like to say it's compact headers, but the cache miss rate is only a bit lower (from 17% without immobile space to 15% with), so I don't have any strong evidence.

On 15/7/23 11:29, Douglas Katzman wrote:

I just pushed a branch which contains that patch plus the changes necessary for x86-64 macOS.
Performance seemed OK on the mac even though it has to make a ton of calls to pthread_getspecific.
My guess is that it may be more typical to try to bundle all your thread-local variables into one struct, call pthread_getspecific in some top level location in the GC worker, and then pass the struct around a lot more. But maybe it's fine.

Neat. The thread-locals are nice specifically for not having to pass the state through functions like e.g. scavenge_root_gens_worker -> scavenge_root_object -> trace_object -> mark, but if it doesn't hurt performance then there's no worries. Though the struct-of-state would give us back lvalues I think - on Darwin we might have

#define TLS(name) pthread_getspecific(gc_key)->name

and

#define TLS(name) name

on platforms with _Thread_local, say. Then e.g. TLS(dirty_generation_source) = gen; should work okay. In my experience on Linux (don't have a Mac handy) all of tracing gets inlined together; pity Clang doesn't seem to be able to CSE pthread_getspecific.
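
A compilable sketch of the two TLS definitions, under stated assumptions: the struct, the key, the `SKETCH_NO_THREAD_LOCAL` switch, and the init function are hypothetical names, and only dirty_generation_source comes from the thread. Since pthread_getspecific returns void *, the pthread variant needs a cast before the member access.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bundle of per-worker GC state. */
struct sketch_gc_state { int dirty_generation_source; };

#ifdef SKETCH_NO_THREAD_LOCAL
/* Variant for toolchains without _Thread_local: one key, one struct. */
static pthread_key_t sketch_gc_key;
static pthread_once_t sketch_key_once = PTHREAD_ONCE_INIT;
static void sketch_make_key(void) {
    int rc = pthread_key_create(&sketch_gc_key, NULL);
    assert(rc == 0);
    (void)rc;
}
#define TLS(name) \
    (((struct sketch_gc_state *)pthread_getspecific(sketch_gc_key))->name)
/* Each worker calls this once, passing storage that outlives the worker. */
void sketch_worker_init(struct sketch_gc_state *storage) {
    pthread_once(&sketch_key_once, sketch_make_key);
    int rc = pthread_setspecific(sketch_gc_key, storage);
    assert(rc == 0);
    (void)rc;
}
#else
/* Variant for toolchains with _Thread_local: plain thread-local struct. */
static _Thread_local struct sketch_gc_state sketch_state;
#define TLS(name) (sketch_state.name)
void sketch_worker_init(struct sketch_gc_state *storage) { (void)storage; }
#endif

/* Both variants expand to an lvalue, so assignment works either way. */
int sketch_set_and_get(int gen) {
    TLS(dirty_generation_source) = gen;
    return TLS(dirty_generation_source);
}
```

In the pthread variant every use of TLS() is a function call, which is the overhead discussed above; caching the struct pointer at the top of the worker would avoid it at the cost of passing the pointer down.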

I don't think I understand _Thread_local support in Darwin though; <https://stackoverflow.com/questions/33358417/how-to-get-support-for-thread-local-on-mac-osx-clang> says that Xcode 8 and later support _Thread_local annotations. What versions of macOS and Xcode should SBCL support? But I gather it's still a problem due to you saying "because each is either the first [W^X] or second [_Thread_local-less] kind."

Douglas Katzman via Sbcl-devel

Jul 15, 2023, 3:27:07 PM7/15/23
to Hayley Patton, sbcl-...@lists.sourceforge.net

I don't think I understand _Thread_local support in Darwin 

I don't either, but I just discovered that this pain is self-inflicted by our default Config.x86-64-darwin. We're *forcing* the compiler not to accept _Thread_local.
I can bump the version-min from 10.6 to 10.7 which does support it. Problem solved.

Hayley Patton

Jul 18, 2023, 12:52:09 AM7/18/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
On 18/7/23 05:21, Douglas Katzman wrote:
I'm trying to clean up the headers for committing your stuff to master and I'm wondering about some small changes that nobody else will really care about.
* To make the feature mark-region-gc mutually exclusive with gencgc I'm having to add a lot of "#if defined LISP_FEATURE_MARK_REGION_GC || defined LISP_FEATURE_GENCGC".  Do you think it makes sense to define a single feature that subsumes those both and means "not cheneygc"? I'm not sure what the name of the feature should be.
What's wrong with having #+mark-region-gc imply #+gencgc (as it was in my patch)? I don't have an opinion on the matter, just wondering; nor am I good at naming things.
* Do you think I should rip out all traces of cheney? Every time I threaten to, xof tells me: "I liked cheneygc because it was an algorithm I could fit in my head".  He said it again just last week.  The point is that he wants at least _some_ other algorithm, but I would guess that mark-region does not satisfy his "fit in head" requirement.
From the archives I found "As Christophe says, cheney is nice because it fits in ones head, keeps GC abstractions more honest" -- the mark-region collector is quite different and discourages too tight coupling to gencgc-isms, perhaps in different ways to cheneygc too (like walking the heap by stepping over contiguous objects). I suspect the same of your concurrent non-moving collector.
* I want to rename pseudo-static-generation to static-generation. I see nothing "pseudo" about it. It's immortal and doesn't move.
Some people view it as a bug that everything is immortal. I suppose you could rectify that in your GC
In my concurrent GC I'm actually just treating it as an extension of static space so it has the same limitation, rightly or wrongly.
I collect the pseudo-static generation just fine in a full GC, but wouldn't know when to automatically collect it. The usual allocation trigger wouldn't work as we don't allocate into pseudo-static (but I gather pseudo-static objects can become unreachable over time due to mutations).
* Do you still use FILLER_WIDETAG ? Can you just mark the lines as unallocated or restore them to whatever their initial state is?
I do not use FILLER_WIDETAG. Indeed the sweep only touches the line bytemap and allocation bitmap, and never touches heap data.
* The code in gencgc which pins starting-threads (thread instance, start function, and vector of args to start-function) might be something we can remove in mark-region, unless you think you really will have incremental moving in which case pinning remains relevant.
I have incremental compaction (in the sense that it is in STW but copies only part of the heap at a time), and have pinning; I'm not too sure what's special about STARTING_THREADS, so the thread, function and arguments should still be pinned?
* Do you see a way to completely remove the C code supporting search-roots from the GC?  I suspect the way the Lisp side should work is as follows: try-acquire-gc-lock; if successful, then stop-the-world; synchronize page table usage; call the path finder; restart the world. Also I think the way it should build the inverted heap is not via a linear walk of objects, but actually do a graph trace and record reverse pointers as it does so.
Rereading traceroot.c I don't think I quite understand how it works. examine_threads scans the TLS, binding stacks, control stacks and pin lists, so I'm not sure what else would be in the gencgc pin table?
Do you have a chat application that you use? I kind of dropped off the sbcl IRC.
I check IRC, Matrix and Discord regularly enough; open to suggestions for other applications though.

Christophe Rhodes

Jul 18, 2023, 8:21:46 AM7/18/23
to Hayley Patton, sbcl-...@lists.sourceforge.net
Hayley Patton <hay...@applied-langua.ge> writes:

> * Do you think I should rip out all traces of cheney? Every time I threaten to, xof tells me: "I liked cheneygc because it was an algorithm I
> could fit in my head". He said it again just last week. The point is that he wants at least _some_ other algorithm, but I would guess that
> mark-region does not satisfy his "fit in head" requirement.
>
> From the archives I found "As Christophe says, cheney is nice because it fits in ones head, keeps GC abstractions more honest" -- the
> mark-region collector is quite different and discourages too tight coupling to gencgc-isms, perhaps in different ways to cheneygc too (like
> walking the heap by stepping over contiguous objects). I suspect the same of your concurrent non-moving collector.

I mean, we shouldn't over-optimize for my nostalgia for GCs that fit in
my head. But I think it would/should be an interesting project for
someone™ to construct the simplest possible GC for SBCL; what does it
look like? (Would it fit in my head?) What are the immovable
constraints?

None of this is urgent, and if the remnants of Cheney are getting in the
way, then by all means remove them. I agree that having N=2 garbage
collectors of any kind helps with the "keeps the abstractions honest"
part.

Douglas Katzman via Sbcl-devel

Jul 18, 2023, 2:05:57 PM7/18/23
to Hayley Patton, sbcl-...@lists.sourceforge.net
On Mon, Jul 17, 2023 at 8:51 PM Hayley Patton <hay...@applied-langua.ge> wrote:

What's wrong with having #+mark-region-gc imply #+gencgc (as it was in my patch)? I don't have an opinion on the matter, just wondering; nor am I good at naming things.
From the end-user's perspective the GC choice is an "A, B, C" rotary knob. Whether there is code sharing or inheritance shouldn't matter but I don't want two GC features enabled. But also imo it's particularly confusing that were it to be the case that (:and :gencgc :mark-region) is enabled, we literally do not compile gencgc.c any more.
I think I'm going to make the feature named :gengc-heap to imply that it's heap-layout-compatible with gencgc; that'll be an internal-use-only feature, and a simple search-and-replace will take care of any #+gencgc that needs to migrate to #+gengc-heap. (I'm dropping a 'c' because there has always been one more than needed. We get it - it's either precise or conservative)

I have incremental compaction (in the sense that it is in STW but copies only part of the heap at a time), and have pinning; I'm not too sure what's special about STARTING_THREADS, so the thread, function and arguments should still be pinned?
Based on random factors such as the kernel scheduler, a thread creator may have already dropped all its references to a new thread's start closure by the time pthread_create invokes its arg.   So new_thread_trampoline holds the sole pointer to a thread instance and function+argv.  Under precise gencgc, pointers from C code would never be perceived as references because they are not from the lisp stack. Or for x86-64, depending on whether the C compiler decided to produce untagged pointers, it may not be perceived as a reference.  e.g. the C code uses certain untagging idioms like:
static inline struct cons* CONS(lispobj obj) {
  return (struct cons*)(obj - 7);
}
which either does or does not overwrite the source register (at the whim of the C compiler), thereby losing a tagged pointer.
IIRC your GC pins stack references without regard to lowtag.  So maybe we don't need to pin anything explicitly, because the *starting-threads* value itself is sufficient to ensure liveness of every object that every newly created thread needs to see.  I've also been kicking around the idea of linking a new thread into all_threads before calling the startup trampoline. GC could do something to pin the startup function of threads that have been linked but not yet run. Seems better than a random global.
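
The difference between the two scanning policies can be sketched like this. The 4-bit lowtag mask is an assumption about x86-64, the lowtag value 7 is taken from the CONS() idiom quoted above, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t lispobj;

#define SKETCH_LIST_LOWTAG 7u  /* matches the "obj - 7" in CONS() */
#define SKETCH_LOWTAG_MASK 15u /* assumed: 4 lowtag bits */

/* A scan that only believes properly tagged words: after CONS() has
 * stripped the tag, the surviving register value no longer qualifies. */
int sketch_tagged_ref_p(lispobj word, uintptr_t cons_addr) {
    assert((cons_addr & SKETCH_LOWTAG_MASK) == 0); /* conses are aligned */
    return word == cons_addr + SKETCH_LIST_LOWTAG;
}

/* A scan that ignores the lowtag: tagged and untagged words both keep
 * the object alive. */
int sketch_untagged_ref_p(lispobj word, uintptr_t cons_addr) {
    assert((cons_addr & SKETCH_LOWTAG_MASK) == 0);
    return (word & ~(uintptr_t)SKETCH_LOWTAG_MASK) == cons_addr;
}
```

Under the second policy the untagged pointer left behind by the C compiler is still treated as a reference, which is why explicit pinning of the start function might become unnecessary.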

Hayley Patton

Jul 21, 2023, 3:21:17 AM7/21/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
Have you got a relatively up-to-date branch with the mark-region GC? I
hadn't updated my branch for a while, and now somehow gencgc crashes due
to missing a dirty card after I tried to merge from upstream. I can't
throw as much spare time at debugging GC as I've just started my last
semester of university.

Thanks,
Hayley

Douglas Katzman via Sbcl-devel

Jul 21, 2023, 11:39:45 AM7/21/23
to Hayley Patton, sbcl-...@lists.sourceforge.net
I have a branch with it rebased to latest head but I haven't built it with gencgc. I did put a change into scav_code that likely needs to be conditioned out for mark-region but it only affects save-lisp-and-die (https://sourceforge.net/p/sbcl/sbcl/ci/ec6fbf07daef47d88f2167f1735698b1bbbb76d5/)
I plan to try to commit another piece of mark-region today

Douglas Katzman via Sbcl-devel

Jul 21, 2023, 10:57:55 PM7/21/23
to Hayley Patton, sbcl-...@lists.sourceforge.net
I have what appears to be a stuck regression run. Is this something you've seen?
I can't get much more information from it because it's in parallel-exec, so the logging is being done to a file for this test, and it's not calling fflush when I try backtrace_from_fp.

(gdb) info threads
  Id   Target Id                                         Frame
* 1    Thread 0x7f6beb938740 (LWP 2126589) "sbcl"        __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x560b477ee120 <join_semaphore>) at ./nptl/futex-internal.c:57
  2    Thread 0x7f6be8e146c0 (LWP 2126636) "Parallel GC" 0x00007f6beba0a345 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f6be8e13e20, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  3    Thread 0x7f6be97a46c0 (LWP 2126637) "Parallel GC" __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x560b477ee100 <start_semaphore>) at ./nptl/futex-internal.c:57
  4    Thread 0x7f6bea1346c0 (LWP 2126638) "Parallel GC" __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x560b477ee100 <start_semaphore>) at ./nptl/futex-internal.c:57
  5    Thread 0x7f6be75276c0 (LWP 2126640) "finalizer"   __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f6be7730218) at ./nptl/futex-internal.c:57

Hayley Patton

Jul 22, 2023, 12:02:28 AM7/22/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
New to me, unfortunately. Looks like something's broken in trace_step, as the GC only sleeps when blocks_in_flight != 0 and grey_list is empty. Evidently no threads are working on packets, so blocks_in_flight should be 0.
(My second patch addresses getting stuck in compaction by the way.)
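
The sleeping rule described above can be written as a small decision function. This is a sketch of the stated condition only, not the code of trace_step; the enum and function names are hypothetical.

```c
#include <assert.h>

enum sketch_action { SKETCH_TRACE, SKETCH_SLEEP, SKETCH_DONE };

/* What a worker does next, given whether the shared grey list is empty
 * and how many blocks other workers currently hold. */
enum sketch_action sketch_next_action(int grey_list_empty, long blocks_in_flight) {
    assert(blocks_in_flight >= 0);
    if (!grey_list_empty)
        return SKETCH_TRACE; /* work is available: take a block */
    if (blocks_in_flight != 0)
        return SKETCH_SLEEP; /* others may still publish work: wait */
    return SKETCH_DONE;      /* nothing queued, nothing held: finished */
}
```

In the hung run, one GC thread is in nanosleep while the others wait on start_semaphore, which corresponds to the SKETCH_SLEEP case persisting with nobody left to decrement the counter.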