[Sbcl-devel] corruption & memory fault (from gc during compiler note printing?)

Andreas Franke

Oct 23, 2023, 9:51:17 PM
to sbcl-...@lists.sourceforge.net
On current sbcl (6778aabf4), I'm seeing corruption warnings
and memory faults, possibly related to GCs happening while
compiler notes (with types) are being printed.

I can reproduce the problem as follows (didn't try to minimize):
- place the attached files "memlog.lisp" and "trigger-corruption.sh"
together in one directory
- cd to the root of the sbcl git repo (where run-sbcl.sh is located)
- from there, run trigger-corruption.sh

(For each run, it
- clears the user's entire fasl cache,
- loads some quicklisp systems while retaining some memory info
in a new memlogs-XXXXXX directory, and
- logs the normal output to /tmp/sbcl-run-N.log )

When the problem occurs, the loop stops. Happens quickly for me.
(For an example output, see attached "corruption-0.log".)

Cleanup at the end can be done with two kill -HUP <procid> commands:
1. for the bash process that runs trigger-corruption.sh,
2. for the sbcl process that runs the mapc #'ql:quickload

HTH...
Attachments: corruption-0.log, memlog.lisp, trigger-corruption.sh

Stas Boukarev

Oct 23, 2023, 10:40:48 PM
to Andreas Franke, sbcl-...@lists.sourceforge.net
Is it not rather from doing ROOM randomly? I guess don't do that then.

_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

Stas Boukarev

Oct 23, 2023, 11:04:45 PM
to Andreas Franke, sbcl-...@lists.sourceforge.net
I made ROOM ignore partially allocated instances. But still don't do it.
Thanks for reporting.

Stas Boukarev

Oct 24, 2023, 12:37:57 AM
to Andreas Franke, sbcl-...@lists.sourceforge.net
There's also the backtrace at random times, but I can't really do anything about that.

Douglas Katzman via Sbcl-devel

Oct 26, 2023, 5:06:33 PM
to Stas Boukarev, Andreas Franke, sbcl-...@lists.sourceforge.net
I've looked at this a bit, and it seems that every failure has to do with dynamic-extent objects in arglists. Backtrace calls REPLACE-DYNAMIC-EXTENT-OBJECTS, but there are dynamic-extent objects inside heap objects, so it does not replace the culprits.
As soon as I make one function "immune" to bad DX objects and rerun the sample code, another function turns up that crashes. This boring chore of making them immune is terrible for both semantics and performance. All the objects that happen to cause crashes are INSTANCE types (luckily, I would say), so I've added checks that any pointer into the stack tagged with instance-pointer-lowtag points to a word with instance-header-widetag, and that the layout-of is also an instance. I've done this in places like %FUN-NAME and OUTPUT-UGLY-OBJECT, doing something like returning NIL if the checks fail. This is horrible.

Now it's quite clear that this test case is mostly just exercising the compiler, so I have several questions about that:
- Do we believe that it is possible to write code which respects the concept of dynamic-extent such that, while a function which receives a DX arg is on the stack, its arglist somehow contains a heap object pointing to an object in a frame that has exited, or whose stack words have otherwise been reused? (which is exactly what seems to be happening)
- Do we believe that this interesting pattern is part of SBCL itself and fairly widespread?
- If it's not part of the compiler, do we believe that users can write macroexpanders and the like that make it look as though it's in the compiler?
- What does this have to do with GC? 

TLDR: it seems that under the wrong circumstances, it's impossible to backtrace through the compiler because many dynamic-extent objects are buried in heap objects, and the frames that created the DX objects have exited. Is this legal?

Stas Boukarev

unread,
Oct 26, 2023, 5:09:28 PM10/26/23
to Douglas Katzman, sbcl-...@lists.sourceforge.net
One of the failures in the backtrace (I'm not saying there are more) comes from the &rest list allocation: it overwrites the arg-count register, so when interrupted there, the debugger doesn't know that the register is no longer valid and grabs extra arguments from the stack.

Douglas Katzman via Sbcl-devel

Oct 26, 2023, 5:49:59 PM
to Stas Boukarev, sbcl-...@lists.sourceforge.net
Perhaps the first frame of an asynchronously-requested backtrace should never try to decode args?
But in this example is it async? One thread is compiling, and I presume the GC is invoked by a trap instruction at the end of a pseudo-atomic sequence, so it shouldn't be in the function prologue creating a &rest list. Maybe there's some involvement with the finalizer thread being the one to trigger the GC.

Incidentally, I think that the finalizer thread should be the one to run post-GC actions in response to the pthread condition var being signaled, so this particular piece of adversarial code from Andreas to gather a backtrace after GC won't actually be useful. It would merely print the backtrace of the finalizer thread at the top of its event loop.

Stas Boukarev

Oct 26, 2023, 5:53:57 PM
to Douglas Katzman, sbcl-...@lists.sourceforge.net
The debugger doesn't encode locations for the allocation point. I plan to revisit the &more entry points: they don't run any code, and the only errors they produce are an odd number of &key arguments and unknown keys. Yet they encode a lot of information (or not enough in this case, but that's going to be a problem with any random interrupt). Maybe even standardize the calling convention so that all the registers are known. (But I have no concrete plans, as usual.)

Andreas Franke

Oct 27, 2023, 1:48:56 AM
to sbcl-...@lists.sourceforge.net
Thank you both for having looked at this.
 
I can confirm that the suggestion of turning off arg decoding in backtraces via
(setq sb-debug::*default-argument-limit* 0)
appears to work well to avoid the problems.
 
Cheers,
Andreas
 
Sent: Thursday, 26 October 2023 at 23:53
From: "Stas Boukarev" <stas...@gmail.com>
To: "Douglas Katzman" <do...@google.com>
Cc: sbcl-...@lists.sourceforge.net
Subject: Re: [Sbcl-devel] corruption & memory fault (from gc during compiler note printing?)

Stas Boukarev

Oct 27, 2023, 11:04:26 AM
to Andreas Franke, sbcl-...@lists.sourceforge.net
Maybe specify it in the call
(sb-debug:print-backtrace :argument-limit 0)
for better future compatibility.
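For a test harness that, like Andreas's, gathers a backtrace after each GC, the per-call suggestion above might look like the following. This is a minimal sketch, not the code from the attached memlog.lisp: the hook name LOG-BACKTRACE-AFTER-GC and the 20-frame cap are illustrative assumptions. It uses SB-EXT:*AFTER-GC-HOOKS* (a list of zero-argument functions SBCL calls after a collection) and passes :ARGUMENT-LIMIT 0 so that no frame's arguments are decoded, which avoids touching the stale dynamic-extent objects and the clobbered arg-count register discussed in this thread.

```lisp
;;; Hypothetical post-GC backtrace logger (not the attached memlog.lisp).
;;; :ARGUMENT-LIMIT 0 tells PRINT-BACKTRACE not to decode frame arguments,
;;; so it never dereferences possibly-stale DX objects found in arglists.
(defun log-backtrace-after-gc ()
  (sb-debug:print-backtrace :stream *error-output*
                            :count 20          ; arbitrary frame cap (assumption)
                            :argument-limit 0))

;;; Register the hook by symbol, so a later redefinition takes effect;
;;; PUSHNEW keeps repeated loads from installing it twice.
(pushnew 'log-backtrace-after-gc sb-ext:*after-gc-hooks*)
```

As Douglas points out, post-GC actions are not guaranteed to run in the thread that was compiling (SBCL may run these hooks in another thread, such as the finalizer thread), so the printed backtrace may not include the compiler frames at all.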