Futures + threads SIGSEGV


Dominik Pantůček

May 2, 2020, 7:56:47 AM
to Racket Users
Hello fellow Racketeers,

During my research into how Racket can be used as a generic software
rendering platform, I've hit some limits of Racket's (native) thread
handling. Once I started getting SIGSEGVs, I strongly suspected I was
doing too many unsafe operations, and to be honest, that was true:
there was one off-by-one memory access :).

That was easy to resolve: I switched to the safe/contracted versions of
everything, then found and fixed the bug. But I still got occasional
SIGSEGVs. So I dug even deeper than before (during the last two months
I've read most of the JIT inlining code) and noticed that the crashes
disappear when I refrain from calling bytes-set! in parallel from
futures.

So I started creating a minimal crashing example (MCE). At first, I
failed miserably: just filling a byte array over and over again, I was
unable to reproduce the crash. But then I realized that in my
application, threads come into play, and that might be the cause.
Suddenly, creating the MCE was really easy:

Create a new eventspace using parameterize/make-eventspace, put the
actual code in an application thread (thread ...), and make the main
thread wait for this application thread using thread-wait. Before
starting the application thread, I create a simple window, a bitmap,
and a canvas that I keep redrawing using refresh-now after each
iteration. The funny thing is that it now keeps crashing even without
actually modifying the bitmap in question. All I need to do is mess
with some byte array in 8 threads. Sometimes it takes a minute on my
computer before it crashes, sometimes longer, but it eventually crashes
pretty consistently.

And it is just 60 lines of code:

#lang racket/gui

(require racket/future racket/fixnum racket/cmdline)

(define width 800)
(define height 600)

(define framebuffer (make-fxvector (* width height)))
(define pixels (make-bytes (* width height 4)))

(define max-depth 0)

(command-line
 #:once-each
 (("-d" "--depth") d "Futures binary partitioning depth"
                   (set! max-depth (string->number d))))

(file-stream-buffer-mode (current-output-port) 'none)

(parameterize ((current-eventspace (make-eventspace)))
  (define win (new frame%
                   (label "test")
                   (width width)
                   (height height)))
  (define bmp (make-bitmap width height))
  (define canvas (new canvas%
                      (parent win)
                      (paint-callback
                       (λ (c dc)
                         (send dc draw-bitmap bmp 0 0)))))

  (define (single-run)
    (define (do-bflip start end (depth 0))
      (cond ((fx< depth max-depth)
             (define cnt (fx- end start))
             (define cnt2 (fxrshift cnt 1))
             (define mid (fx+ start cnt2))
             (let ((f (future
                       (λ ()
                         (do-bflip start mid (fx+ depth 1))))))
               (do-bflip mid end (fx+ depth 1))
               (touch f)))
            (else
             (for ((i (in-range start end)))
               (define c (fxvector-ref framebuffer i))
               (bytes-set! pixels (+ (* i 4) 0) #xff)
               (bytes-set! pixels (+ (* i 4) 1) (fxand (fxrshift c 16) #xff))
               (bytes-set! pixels (+ (* i 4) 2) (fxand (fxrshift c 8) #xff))
               (bytes-set! pixels (+ (* i 4) 3) (fxand c #xff))))))
    (do-bflip 0 (* width height))
    (send canvas refresh-now))
  (send win show #t)

  (define appthread
    (thread
     (λ ()
       (let loop ()
         (single-run)
         (loop)))))
  (thread-wait appthread))

Note: the code is deliberately de-optimized to highlight the problem.
Not to mention CPU cache coherence here...
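On the -d option: at each level, do-bflip splits its range in half, hands one half to a new future and recurses on the other half itself, so depth d yields 2^d leaf ranges worked on concurrently (2^d - 1 futures plus the calling thread). The sketch below uses a hypothetical helper, leaf-ranges, that mirrors only the splitting arithmetic (no futures, no pixel writes) to check that -d 3 on the 800x600 framebuffer gives 8 ranges of 60000 pixels each.

```racket
#lang racket/base
;; Hypothetical helper: reproduces do-bflip's binary partitioning of
;; [start, end) and returns the leaf ranges as (start . end) pairs,
;; without creating futures or touching any pixels.
(define (leaf-ranges start end max-depth [depth 0])
  (cond
    [(< depth max-depth)
     ;; Same midpoint as do-bflip: start + floor((end - start) / 2)
     (define mid (+ start (arithmetic-shift (- end start) -1)))
     (append (leaf-ranges start mid max-depth (add1 depth))
             (leaf-ranges mid end max-depth (add1 depth)))]
    [else (list (cons start end))]))

;; 800x600 framebuffer at depth 3, as in `racket crash.rkt -d 3`
(define ranges (leaf-ranges 0 (* 800 600) 3))
;; prints: 8 ranges, sizes: (60000 60000 60000 60000 60000 60000 60000 60000)
(printf "~a ranges, sizes: ~a\n"
        (length ranges)
        (map (λ (r) (- (cdr r) (car r))) ranges))
```

The same helper at depth 2 yields 4 ranges, matching the 4-thread run below.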

Running this from the command line, I can adjust the number of threads
with the -d option. Running with 8 threads:

$ time racket crash.rkt -d 3
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real 1m18,162s
user 7m11,936s
sys 0m3,832s
$ time racket crash.rkt -d 3
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real 3m44,005s
user 20m10,920s
sys 0m11,702s
$ time racket crash.rkt -d 3
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real 2m1,650s
user 10m58,392s
sys 0m6,445s
$ time racket crash.rkt -d 3
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real 8m8,666s
user 45m52,359s
sys 0m25,184s
$

With 4 threads it didn't crash even after quite some time:

$ time racket crash.rkt -d 2
^Cuser break
context...:
"crash.rkt": [running body]
temp35_0
for-loop
run-module-instance!
perform-require!

real 20m18,706s
user 61m38,546s
sys 0m22,719s
$


I'll re-run the 4-thread test overnight.

What would be the best approach to debugging this issue? I assume I'll
load the racket binary in gdb and look at the stack traces at the moment
of the crash, but that won't reveal the source of the problem (judging
by my previous experience debugging heavily multi-threaded
applications). Also, I probably need a build with debugging symbols,
which is my plan for this afternoon.

I am running this on:

model name : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

HT is enabled.

Although this is just a side project, my work (that is, the paid-for
work) relies heavily on futures and the GUI, so I would really like to
nail down and fix this problem.

Any suggestions are welcome.


Cheers,
Dominik

Dexter Lagan

May 2, 2020, 8:10:25 AM
to Dominik Pantůček, Racket Users
Hello,

I’ve been getting inconsistent results as well. A while ago I made a benchmark based on a parallel spectral norm computation. The benchmark works fine on Windows on most systems and uses all cores, but crashes randomly on other systems. I haven’t been able to figure out why. On Linux it doesn’t seem to use more than one core. I’d be interested to know if this is related. Here’s the benchmark code:

https://github.com/DexterLagan/benchmark

Dex

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/9e49fa26-5234-17eb-7dad-09df8a84b147%40trustica.cz.

Sam Tobin-Hochstadt

May 2, 2020, 8:26:26 AM
to Dominik Pantůček, Racket Users
I successfully reproduced this on the first try, which is good. Here's
my debugging advice (I'm also looking at it):

1. To use a binary with debugging symbols, use
`racket/src/build/racket/racket3m` from the checkout of the Racket
repository that you built.
2. When running racket in GDB, there are lots of segfaults because of
the GC; you'll want to use `handle SIGSEGV nostop noprint`
3. It may not work for this situation because of parallelism, but if
you can reproduce the bug using `rr` [1] it will be almost infinitely
easier to find and fix.

I'm also curious about your experience with Racket CS and futures.
It's unlikely to have the _same_ bugs, but it would be good to find
the ones there are. :)

[1] https://rr-project.org

Dominik Pantůček

May 2, 2020, 8:27:37 AM
to racket...@googlegroups.com
Hi Dex,

On 02. 05. 20 14:10, Dexter Lagan wrote:
> Hello,
>
>   I’ve been getting inconsistent results as well. A while ago I made a
> benchmark based on a parallel spectral norm computation. The benchmark
> works fine on Windows on most systems and uses all cores, but crashes
> randomly on other systems. I haven’t been able to figure out why. On
> Linux it doesn’t seem to use more than one core. I’d be interested to
> know if this is related. Here’s the benchmark code :
>
> https://github.com/DexterLagan/benchmark

Beware that (processor-count) returns the number of hyper-threaded
(logical) cores, so your v1.3 is actually requesting twice as many
threads as there are hardware threads. At least on Linux this is the
case (I just checked).
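If the goal is one parallel task per hardware thread, the depth for a do-bflip-style splitter can be derived from (processor-count), which, as noted above, counts hyper-threads. A minimal sketch, assuming that goal; depth-for-workers is a hypothetical helper that picks the largest d with 2^d not exceeding the worker count, so a machine reporting 8 logical CPUs (such as the 4-core/8-thread i7-8550U above) gets depth 3 and is not oversubscribed.

```racket
#lang racket/base
(require racket/future) ; provides processor-count

;; Hypothetical helper: largest d with 2^d <= n, i.e. floor(log2 n) for n >= 1.
(define (depth-for-workers n)
  (sub1 (integer-length n)))

;; Logical CPUs, hyper-threads included
(define workers (processor-count))

(printf "~a logical CPUs -> depth ~a (~a parallel tasks)\n"
        workers
        (depth-for-workers workers)
        (expt 2 (depth-for-workers workers)))
```

Rounding down to a power of two wastes some cores on, say, a 6-thread machine, but it keeps the binary partitioning scheme unchanged.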

Interesting idea... 16 threads:

$ time racket crash.rkt -d 4
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real 6m37,579s
user 32m55,192s
sys 0m35,124s

So that is consistent with what I see.

Have you tried using future-visualizer[1] to check why it uses only a
single CPU thread? Last summer I spent quite some time with it, finding
the futures usage patterns that actually allow the speculative
computation to run in parallel. Usually, if your code is too deep and
keeps allocating "something" in each frame, it goes back to the runtime
thread for each allocation.


Cheers,
Dominik

[1] https://docs.racket-lang.org/future-visualizer/index.html

Matthew Flatt

May 2, 2020, 8:31:07 AM
to Dominik Pantůček, Sam Tobin-Hochstadt, Racket Users
I wasn't able to produce a crash on my first try, but the Nth try
worked, so this is very helpful!

I'm investigating, too...

Dexter Lagan

May 2, 2020, 8:32:48 AM
to Dominik Pantůček, racket...@googlegroups.com
Hi Dominik,

Ah, that explains why I was getting an incorrect number of threads! I didn’t think about using future-visualizer, but I’ll give it a try. Thanks!

Dex


Sam Tobin-Hochstadt

May 2, 2020, 8:37:00 AM
to Matthew Flatt, Dominik Pantůček, Racket Users
I opened https://github.com/racket/racket/issues/3145 to avoid too
much mailing list traffic, and posted a stack trace there.

Sam



Dominik Pantůček

May 2, 2020, 9:38:23 AM
to racket...@googlegroups.com
Hi Sam,

On 02. 05. 20 14:26, Sam Tobin-Hochstadt wrote:
> I successfully reproduced this on the first try, which is good. Here's
> my debugging advice (I'm also looking at it):
>
> 1. To use a binary with debugging symbols, use
> `racket/src/build/racket/racket3m` from the checkout of the Racket
> repository that you built.
> 2. When running racket in GDB, there are lots of segfaults because of
> the GC; you'll want to use `handle SIGSEGV nostop noprint`
> 3. It may not work for this situation because of parallelism, but if
> you can reproduce the bug using `rr` [1] it will be almost infinitely
> easier to find and fix.

Thanks for the hints, and thanks for opening the GitHub issue for this.
I'll try to post my results (if any) there.

>
> I'm also curious about your experience with Racket CS and futures.
> It's unlikely to have the _same_ bugs, but it would be good to find
> the ones there are. :)

This is going to be a really hard one. With all the tricks I learned
during the past weeks, I get almost 400 frames per second in my
experiment using 3m and unsafe operations. Without unsafe operations it
goes down to 300, and without unsafe operations and with the
de-optimized flip function shown in the example plus set-argb-pixels, I
am at about 50 fps (that is presumably a completely "safe" version that
doesn't rely on my own bounds and type checking).

With CS, I can't quickly get anything other than the de-optimized
version with set-argb-pixels working, and that runs at about 5 fps.
Also, the thread scheduling is "interesting" at best. I am postponing
work on that; I assume it may take another few weeks to understand how
to properly use all the fixnum/flonum-related stuff with CS.


Thanks again!
Dominik

George Neuner

May 3, 2020, 10:03:01 PM
to racket...@googlegroups.com
On Sat, 2 May 2020 14:10:19 +0200, Dexter Lagan
<dexte...@gmail.com> wrote:

> I’ve been getting inconsistent results as well. A while ago I made a
>benchmark based on a parallel spectral norm computation. The
>benchmark works fine on Windows on most systems and uses all cores,
>but crashes randomly on other systems. I haven’t been able to figure
>out why. On Linux it doesn’t seem to use more than one core. I’d be
>interested to know if this is related. Here’s the benchmark code :
>
>https://github.com/DexterLagan/benchmark
>
>Dex

I haven't examined the code in detail, but I suspect you're not giving
the futures time to do anything. Your 'for/par' function touches them
almost immediately, and touching forces a future to be evaluated by
the thread that has touched it.

Also be aware that futures essentially are limited to data access and
math ... in particular if you try to do any kind of I/O within a
future it will force (at least) that future into serial execution.
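To illustrate the first point, here is a minimal sketch (not Dex's benchmark, which isn't reproduced in this thread): create all the futures before touching any of them, so they get a chance to run in parallel; touching each one right after creating it leaves little or no room for parallelism. The future bodies do plain arithmetic with no I/O, in line with the second point. The names sum-squares and parallel-sum-squares are hypothetical.

```racket
#lang racket/base
(require racket/future)

;; Sum of i^2 over [start, end): plain arithmetic, no I/O, so it is
;; a reasonable candidate for running in parallel inside a future.
(define (sum-squares start end)
  (for/fold ([acc 0]) ([i (in-range start end)])
    (+ acc (* i i))))

;; Spawn one future per chunk first, and only then touch them all.
(define (parallel-sum-squares n chunks)
  (define size (quotient n chunks))
  (define fs
    (for/list ([k (in-range chunks)])
      (define start (* k size))
      ;; the last chunk also takes the remainder of n / chunks
      (define end (if (= k (sub1 chunks)) n (+ start size)))
      (future (λ () (sum-squares start end)))))
  ;; Block on the results only after every future has been created.
  (for/sum ([f (in-list fs)]) (touch f)))

(printf "~a\n" (parallel-sum-squares 1000000 4))
```

Whether the futures actually run concurrently still depends on the runtime (future-visualizer is the tool to check), but this ordering at least gives them the opportunity.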

George

Dexter Lagan

May 4, 2020, 3:21:11 AM
to George Neuner, racket...@googlegroups.com
Thank you, sir. No doubt futures are being used in an unsafe manner. The code was used in language benchmarks and was 'optimized' to squeeze out every ounce of performance. What I find strange is that it does work well and saturates all cores (that was the intent) on Win10, but crashes randomly on certain systems and certain versions of Racket. I believe it started becoming unstable around Racket 7.0. On Linux it only uses one thread, which could very well be caused by a difference in the way futures are implemented on Linux.

Dex
