[Haskell-cafe] How do I debug this RTS segfault?


Lana Black

Jul 24, 2016, 1:50:38 PM
to haskel...@haskell.org
Hello,

I ran into this RTS bug recently. In short, when executing multiple
consecutive forks, the process is terminated by SIGSEGV after 500-600
or so. I know this kind of thing is totally artificial, but still.

The problem I have is that I can't get any meaningful backtrace in gdb.
For example, with the threaded RTS I get this:

(gdb) bt
#0 0x0000000000560d63 in base_GHCziEventziThread_ensureIOManagerIsRunning1_info ()
Backtrace stopped: Cannot access memory at address 0x7fffff7fcea0

With the non-threaded RTS I get this:

(gdb) bt
#0 0x00000000007138c9 in stg_makeStablePtrzh ()
Backtrace stopped: Cannot access memory at address 0x7fffff7fc720

Build command: ghc --make -O2 -g -fforce-recomp fork.hs
Add -threaded if needed.

I was able to reproduce this bug with both GHC 7.10.3 and today's HEAD
with the code below.

>import System.Exit (exitSuccess)
>import System.Posix.Process (forkProcess)
>
>fork_ n | n > 0     = processPid =<< forkProcess (fork_ $! n - 1)
>        | otherwise = putStrLn "I'm done!"
>
>processPid pid | pid > 0   = exitSuccess
>               | pid < 0   = putStrLn "OOOPS, forkProcess failed!"
>               | otherwise = pure ()
>
>main = fork_ 1000
>
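
In the program above, every fork is issued from inside the previous child, so the forks are nested 1000 deep. To tell whether the crash depends on that nesting or merely on the total number of forks, a flat variant may be a useful comparison. The following is only a sketch under that assumption: `forkFlat` is a hypothetical name, and whether this variant avoids the segfault is not established anywhere in this thread.

```haskell
import Control.Monad (forM_)
import System.Posix.Process (forkProcess, getProcessStatus)

-- Fork n children from a single parent, waiting for each one before
-- forking the next, so that no child ever calls forkProcess itself.
forkFlat :: Int -> IO ()
forkFlat n = forM_ [1 .. n] $ \_ -> do
  pid <- forkProcess (pure ())            -- the child does nothing and exits
  _   <- getProcessStatus True False pid  -- block until that child has exited
  pure ()

main :: IO ()
main = do
  forkFlat 1000
  putStrLn "I'm done!"
```

It can be built with the same command as above. Comparing the two programs under the same RTS flavour should show whether nesting depth is the relevant variable.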

With best regards.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Anatoly Yakovenko

Jul 24, 2016, 5:25:24 PM
to Lana Black, haskel...@haskell.org
It's probably out of file descriptors. It's possible that it tries to open another one during the error handling.

Lana Black

Jul 24, 2016, 8:46:59 PM
to haskel...@haskell.org

That does not seem to be the case. I had overlooked GHC's -debug
option; with it I'm now able to get a stack trace. Furthermore, the
number of used file descriptors is well within the limit, and changing
the limit with `ulimit -n` does not affect the outcome.
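
A check of this kind can also be made from inside a Haskell process. The sketch below is one possible way to do it on Linux and is not necessarily how the numbers above were obtained: it counts the entries in /proc/self/fd and reads the soft RLIMIT_NOFILE value, which is what `ulimit -n` reports. `countOpenFds` and `showLimit` are hypothetical helper names; it assumes the `directory` and `unix` packages.

```haskell
import System.Directory (listDirectory)
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (softLimit)
  , getResourceLimit
  )

-- Count this process's open descriptors by listing /proc/self/fd (Linux only).
-- The listing itself briefly holds one extra descriptor open.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- Render a resource limit for display.
showLimit :: ResourceLimit -> String
showLimit (ResourceLimit n)     = show n
showLimit ResourceLimitInfinity = "unlimited"
showLimit ResourceLimitUnknown  = "unknown"

main :: IO ()
main = do
  used   <- countOpenFds
  limits <- getResourceLimit ResourceOpenFiles
  putStrLn ("open descriptors: " ++ show used)
  putStrLn ("soft limit:       " ++ showLimit (softLimit limits))
```

Calling `countOpenFds` in each child just before the next `forkProcess` would show whether the count grows with the number of forks.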

Curiously, the stacks are rather different for threaded and non-threaded
RTS.

Non-threaded:
(gdb) bt
#0 INFO_PTR_TO_STRUCT (info=<error reading variable: Cannot access memory at address 0x7fffff7feff0>) at includes/rts/storage/ClosureMacros.h:60
#1 0x000000000070e956 in get_itbl (c=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:87
#2 0x000000000070ec3c in closure_sizeW (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:439
#3 0x000000000070ecf7 in overwritingClosure (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:555
#4 0x0000000000725dd7 in stg_upd_frame_info ()
#5 0x0000000000000000 in ?? ()

Threaded:
(gdb) bt
#0 0x00007ffff6ce49ce in _IO_vfprintf_internal (s=s@entry=0x7fffff7ff430, format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm", ap=ap@entry=0x7fffff7ff558)
at vfprintf.c:1266
#1 0x00007ffff6d0954b in __IO_vsprintf (string=0x7fffff7ff630 "`\366\177\377\377\177", format=0x7ffff75c3550 "/proc/self/task/%u/comm", args=args@entry=0x7fffff7ff558)
at iovsprintf.c:42
#2 0x00007ffff6cecd47 in __sprintf (s=s@entry=0x7fffff7ff630 "`\366\177\377\377\177", format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm") at sprintf.c:32
#3 0x00007ffff75c1f2b in pthread_setname_np (th=140737317025536, name=0x78ba04 "ghc_ticker") at ../sysdeps/unix/sysv/linux/pthread_setname.c:49
#4 0x000000000072ce4e in initTicker (interval=10000000, handle_tick=0x71a23d <handle_tick>) at rts/posix/itimer/Pthread.c:173
#5 0x000000000071a32f in initTimer () at rts/Timer.c:111
#6 0x0000000000703c26 in forkProcess (entry=0x207) at rts/Schedule.c:2072
#7 0x0000000000405bf7 in s7dF_info ()
#8 0x0000000000000000 in ?? ()

Carter Schonwald

Jul 25, 2016, 8:40:57 PM
to Lana Black, haskel...@haskell.org
forkProcess is very, very different from forkIO and forkOS. Have you tried fork bombing from the shell with a similar program? I don't think your OS can handle 2^1000 process IDs, right? I seem to recall process IDs being 32 or 64 bit.

Anatoly Zaretsky

Jul 26, 2016, 9:46:25 AM
to Lana Black, Haskell Cafe
Hello,

On Sun, Jul 24, 2016 at 8:50 PM, Lana Black <lana...@amok.cc> wrote:
I have run into this RTS bug recently. In short, when executing multiple
consecutive forks, after 500-600 or so the process is terminated by
SIGSEGV. I know this kind of thing is totally artificial, but still.

Here's a bug report with some analysis: https://ghc.haskell.org/trac/ghc/ticket/12436

Lana Black

Jul 26, 2016, 10:10:22 AM
to Anatoly Zaretsky, Haskell Cafe
Great! Thanks for filing that for me.