Help! Malloc/free deadlock


gharris

Sep 26, 2013, 6:43:40 PM
to golan...@googlegroups.com
Hello!

I've been writing a game with Go, Enet, Allegro 5 and OpenGL but I'm stuck with a nasty crash.
After adding a loading screen the game still works on my desktop (32-bit Linux on an x86 chipset), but not on my Pandora (32-bit Linux on an ARM chipset).
The crash happens reliably at the same point.
It was fine before I added the loading screen. For some reason, the error goes away if I make the level very small.
Despite the level-size behaviour, I don't think memory is the problem, as the loading graphics are very simple.


fatal error: malloc/free - deadlock
[signal 0xb code=0x1 addr=0x10 pc=0x53f6c]

goroutine 1 [syscall]:
runtime.asmcgocall(0xdcdc0, 0x40c2eec8, 0x1052c854, 0x0, 0x50001, ...)
    /home/graham/go/src/pkg/runtime/asm_arm.s:290 +0x24 fp=0x40c2ee8c
runtime.cgocall(0xdcdc0, 0x40c2eec8)
    /home/graham/go/src/pkg/runtime/cgocall.c:166 +0x124 fp=0x40c2eeb8
_/home/graham/programs/ae/ae._Cfunc_deleteBuffers(0x1, 0x0)
    _/home/graham/programs/ae/ae/_obj/_cgo_defun.c:1116 +0x34 fp=0x40c2eec4
_/home/graham/programs/ae/ae.(*ElementArray).Destroy(0x0)
    _/home/graham/programs/ae/ae/_obj/_cgo_gotypes.go:927 +0x34 fp=0x40c2eed0
main.(*Renderer).Destroy(0x105b9000)
    /home/graham/programs/ae/test/world_render.go:523 +0x7c fp=0x40c2eedc
main.(*Test).Destroy(0x10533140)
    /home/graham/programs/ae/test/main.go:184 +0x14c fp=0x40c2eef8
_/home/graham/programs/ae/ae.Loop(0x40953e50, 0x105002f8)
    _/home/graham/programs/ae/ae/_obj/_cgo_gotypes.go:2164 +0x6f8 fp=0x40c2ef3c
main.main()
    /home/graham/programs/ae/test/main.go:442 +0x734 fp=0x40c2efac
runtime.main()
    /home/graham/go/src/pkg/runtime/proc.c:182 +0x7c fp=0x40c2efc4
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1276 fp=0x40c2efc4

goroutine 3 [syscall]:
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1276


I guess that indicates the garbage collector is conflicting with cgo?
This is way out of my comfort zone. How on earth do I debug something like this?!

Dave Cheney

Sep 26, 2013, 6:47:04 PM
to gharris, golan...@googlegroups.com
_/home/graham/programs/ae/ae/_obj/_cgo_defun.c:1116 +0x34 fp=0x40c2eec4
_/home/graham/programs/ae/ae.(*ElementArray).Destroy(0x0)

This looks like you're calling Destroy on a nil ElementArray.
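
To illustrate the point about the nil receiver, here is a minimal sketch of a guard that makes a Destroy method safe to call on a nil pointer. The ElementArray type below is a hypothetical stand-in with made-up fields, not the type from the poster's ae package, and the C call is only indicated by a comment.

```go
package main

import "fmt"

// ElementArray is a hypothetical stand-in for the wrapper type in the trace.
// The fields are assumptions made for this sketch.
type ElementArray struct {
	buffer uint32 // hypothetical buffer ID
	freed  bool   // set once the buffer has been released
}

// Destroy releases the buffer and reports whether anything was released.
// A method with a pointer receiver can be called on a nil pointer in Go,
// so the guard has to run before any field is read.
func (e *ElementArray) Destroy() bool {
	if e == nil || e.freed {
		return false // nothing to release
	}
	// A real wrapper would call into C here to delete the buffer.
	e.freed = true
	return true
}

func main() {
	var missing *ElementArray // nil, like the (0x0) receiver in the trace
	fmt.Println(missing.Destroy())

	arr := &ElementArray{buffer: 1}
	fmt.Println(arr.Destroy())
	fmt.Println(arr.Destroy()) // a second call is a no-op
}
```

Without the guard, the first field access on a nil receiver faults; with cgo in the picture that fault may surface inside C code, where the runtime cannot turn it into an ordinary Go panic.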

Dave Cheney

Sep 26, 2013, 8:07:29 PM
to gharris, golan...@googlegroups.com
Looking at your panic message on a larger screen, I'm pretty sure
you're passing a nil into your cgo code.

fatal error: malloc/free - deadlock
[signal 0xb code=0x1 addr=0x10 pc=0x53f6c]

^ addr= nil + 4 words

The malloc/free deadlock is, I think, unrelated, probably due to the
runtime trying to allocate memory to print a second panic message,
while panicking. This is fixed on tip if you want to give that a go
(you can even try the prebuilt 1.2rc1 tarballs from my website if you
like).

Is the code available anywhere?
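
The "nil + 4 words" reading of addr=0x10 can be checked with a little arithmetic: on a 32-bit target a word is 4 bytes, so the fifth word of a struct sits at offset 0x10. The sketch below uses a hypothetical record type with fixed-size uint32 fields so the layout is the same on any platform; it is not the struct from the crashing code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical struct with four 4-byte words ahead of the
// field being accessed, mimicking a 32-bit layout.
type record struct {
	a, b, c, d uint32 // offsets 0x0, 0x4, 0x8, 0xc
	target     uint32 // fifth word
}

// faultAddr returns the address that a load of target would touch if the
// struct pointer held the given base address.
func faultAddr(base uintptr) uintptr {
	return base + unsafe.Offsetof(record{}.target)
}

func main() {
	// With a nil base pointer the access lands at 0x10.
	fmt.Printf("addr=%#x\n", faultAddr(0)) // prints "addr=0x10"
}
```

A small fault address like this usually means a field was read through a nil pointer, rather than a wild pointer into unrelated memory.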

grahamalex...@gmail.com

Sep 27, 2013, 11:38:03 AM
to golan...@googlegroups.com, gharris
Ok, interesting.
I've been messing about with the code, so I think I broke some more of it in the process, resulting in freeing a nil instance.
I've reverted to a previous version and compiled Go from tip.

I now get this error where it used to crash with a different malloc/free deadlock:


fatal error: runtime: stack split during syscall

runtime stack:
runtime.throw(0x22e5cc)
    /home/graham/go/src/pkg/runtime/panic.c:464 +0x5c
runtime.newstack()
    /home/graham/go/src/pkg/runtime/stack.c:261 +0x5bc
runtime.morestack()
    /home/graham/go/src/pkg/runtime/asm_arm.s:212 +0x44

goroutine 1 [stack split]:
runtime: unexpected return pc for runtime.sigpanic called from 0x4028a320
runtime.sigpanic()
    /home/graham/go/src/pkg/runtime/os_linux.c:219 fp=0xbe916f58

goroutine 3 [syscall]:
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1396


Is there anything I can do about that?

Dave Cheney

Sep 27, 2013, 11:50:50 AM
to grahamalex...@gmail.com, golan...@googlegroups.com, gharris
At a guess, your C code segfaulted. Any chance of seeing the code?



Andrew Reimers

Sep 27, 2013, 2:37:16 PM
to golan...@googlegroups.com, grahamalex...@gmail.com
The error "fatal error: runtime: stack split during syscall" happens when a thread calls runtime·entersyscall and then tries to allocate on the stack.

My guess as to what is happening (though I could be completely wrong):

1. A thread calls runtime·entersyscall.
2. The thread then segfaults (inside the runtime).
3. The segfault handler sets the PC to runtime.sigpanic to handle the fault.
4. runtime.sigpanic sees that you are in a syscall and throws the fatal error: runtime: stack split during syscall.



You could try modifying src/pkg/runtime/signal_arm.c line 58 ish, just after:


56 if(gp == nil || gp == m->g0)
57     goto Throw;

by adding something like:

if (gp->status == Gsyscall) {
    runtime·exitsyscall();
    goto Throw;
}


This should hopefully give you a stack trace of what is segfaulting.

Andrew

Dmitry Vyukov

Sep 27, 2013, 3:41:19 PM
to grahamalex...@gmail.com, golang-nuts
Do we have a test where cgo gets a SIGSEGV and we check the error message? I
cannot find one.

Ian Lance Taylor

Sep 27, 2013, 4:00:13 PM
to Dmitry Vyukov, grahamalex...@gmail.com, golang-nuts
On Fri, Sep 27, 2013 at 12:41 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> Do we have a test where cgo gets a SIGSEGV and we check the error message? I
> cannot find one.

Do you mean something like misc/cgo/cthread.go?

Or do you mean that C code gets a SIGSEGV? I don't think we have any
tests like that.

Ian

brainman

Sep 27, 2013, 6:16:24 PM
to golan...@googlegroups.com
There might be one in runtime. Called ...Crash... or something.

Alex

gharris

Sep 28, 2013, 8:48:13 PM
to golan...@googlegroups.com
I'm still experimenting with the code and I'm seeing a few different problems at the moment. Stack corruption?
I've changed signal_arm.c but I didn't think I needed to recompile the Go tools.

My Allegro and enet wrappers are now available here:
https://bitbucket.org/gah/ae
https://bitbucket.org/gah/enet

Neither is very pleasant and I expect there are bugs I haven't found yet.

I'll get back to you when I've got some more useful output.

Dave Cheney

Sep 28, 2013, 9:08:13 PM
to gharris, golan...@googlegroups.com
That is a lot of code, so I didn't have time to review it in detail. The first thing I noticed was that there is no call to runtime.LockOSThread, which is critical for things like OpenGL which abuse TLS (thread-local storage) heavily. If I missed this, ignore this comment.

Also, I noticed that your test files in the ae package were not labeled correctly. Do you see any tests run when you execute go test -v?
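
As a rough illustration of the runtime.LockOSThread point, the sketch below pins one goroutine to a single OS thread and funnels every call through it, which is the usual shape for libraries that keep state in thread-local storage. The names renderLoop and do are made up for this example, and the printed "GL call" lines stand in for real OpenGL or Allegro calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// renderLoop pins the calling goroutine to its current OS thread and runs
// every queued function there until the channel is closed.
func renderLoop(calls <-chan func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	for f := range calls {
		f()
	}
}

// do hands f to the render loop and waits until it has finished.
func do(calls chan<- func(), f func()) {
	done := make(chan struct{})
	calls <- func() {
		f()
		close(done)
	}
	<-done
}

func main() {
	calls := make(chan func())
	go func() {
		defer close(calls) // lets renderLoop return once the work is done
		for i := 0; i < 3; i++ {
			n := i
			do(calls, func() { fmt.Println("GL call", n) })
		}
	}()
	renderLoop(calls) // the main goroutine becomes the render thread
}
```

Libraries that insist on the process's initial thread generally also need runtime.LockOSThread called from an init function, so that the main goroutine stays on that thread from the start.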

grahamalex...@gmail.com

Sep 29, 2013, 8:36:35 AM
to golan...@googlegroups.com, gharris
runtime.LockOSThread is called in initEngine at line 261 of loop.go. This is called when the game starts.
The test file is very old and probably needs deleting.

At the moment I get these effects when I try to run the game:
  • The game starts and seems to be running fine; I quit after a couple of seconds.
  • The game loads the world into graphics memory then crashes with an error indicating it failed to load a sound file. I have no idea why; the file is valid.
  • The game crashes after loading the world into graphics memory with an error like this:

fatal error: runtime: stack split during syscall

runtime stack:
runtime.throw(0x22e5cc)
    /home/graham/go/src/pkg/runtime/panic.c:464 +0x5c
runtime.newstack()
    /home/graham/go/src/pkg/runtime/stack.c:261 +0x5bc
runtime.morestack()
    /home/graham/go/src/pkg/runtime/asm_arm.s:212 +0x44

goroutine 1 [stack split]:
runtime: unexpected return pc for runtime.sigpanic called from 0x402ff320
runtime.sigpanic()
    /home/graham/go/src/pkg/runtime/os_linux.c:219 fp=0xbeae2f48



goroutine 3 [syscall]:
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1396

  • The game crashes after loading the world into graphics memory with an error like this:

SIGSEGV: segmentation violation
PC=0x40247320

runtime.cgocall(0xf5f8, 0x40a8492c)
    /home/graham/go/src/pkg/runtime/cgocall.c:148 +0x108 fp=0x40a8491c
_/home/graham/programs/ae/ae._Cfunc_al_load_bitmap(0x48f81468, 0x10500730)
    _/home/graham/programs/ae/ae/_obj/_cgo_defun.c:799 +0x34 fp=0x40a84928
_/home/graham/programs/ae/ae.loadImage(0x105a2b60, 0x13, 0x0)
    /home/graham/programs/ae/ae/images.go:43 +0x90 fp=0x40a84950
_/home/graham/programs/ae/ae.Image(0x105a2b60, 0x13, 0x2)
    /home/graham/programs/ae/ae/images.go:71 +0xb4 fp=0x40a84970
main.NewSelect(0x1050a500)
    /home/graham/programs/ae/test/select.go:23 +0xb0 fp=0x40a84d28
main.(*Test).Init(0x1050a500)
    /home/graham/programs/ae/test/main.go:93 +0x350 fp=0x40a84d94
_/home/graham/programs/ae/ae.Change(0x4640fc78, 0x1050a500)
    /home/graham/programs/ae/ae/loop.go:142 +0x254 fp=0x40a84dc8
main.func·005(0xc9da106b, 0xe)
    /home/graham/programs/ae/test/wait.go:131 +0x298 fp=0x40a84e18
main.(*Wait).Tick(0x1059f0f0)
    /home/graham/programs/ae/test/wait.go:183 +0x330 fp=0x40a84ec8
_/home/graham/programs/ae/ae.Loop(0x40942338, 0x105002e0)
    /home/graham/programs/ae/ae/loop.go:250 +0x71c fp=0x40a84f24
main.main()
    /home/graham/programs/ae/test/main.go:446 +0x714 fp=0x40a84f8c
runtime.main()
    /home/graham/go/src/pkg/runtime/proc.c:222 +0x100 fp=0x40a84fc0
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1396 fp=0x40a84fc0



goroutine 3 [syscall]:
runtime.goexit()
    /home/graham/go/src/pkg/runtime/proc.c:1396

trap    0xe
error   0x817
oldmask 0x0
r0      0x0
r1      0x48fbf880
r2      0x160
r3      0xff000000
r4      0xff000000
r5      0xff000000
r6      0xff000000
r7      0xff000000
r8      0xff000000
r9      0x0
r10     0x48fbf860
fp      0x0
ip      0xff000000
sp      0xbec99f1c
lr      0xff000000
pc      0x40247320
cpsr    0x20000010
fault   0x0


I can't really tell what's going on, but the random behaviour reminds me of the last time I encountered stack corruption. Why does this only happen on the ARM-based hardware? The same code seems to be fine on my x86 machine!
