segfault with latest runtime/GC

273 views
Skip to first unread message

Albert Strasheim

unread,
Mar 4, 2013, 7:34:50 AM3/4/13
to golan...@googlegroups.com
Hello all

I'm seeing a segfault that seems to be related to the recent runtime work. Might also be GC related, but we haven't seen anything like this before today.

The fault happens in the program we use to run our internal tests. It runs Go tests compiled with go test -c, buffers the output and turns it into XML that we feed to Jenkins.

Unfortunately it's hard to share the code that causes this, since it seems to depend on running a specific internal test that probably produces a few gigabyte of output.

The system has many hundreds of GB of RAM, so it's not memory running out.

The last bit of the GC trace and stack trace is below. There's a big jump in heap size just before it crashes.

Let me know if I can provide any more info to help debug this.

Cheers

Albert

gc29(8): 3+0+0 ms, 424 -> 279 MB 15814 -> 9560 (68269-58709) objects, 117(5670) handoff, 25(256) steal, 774/347/83 yields
gc30(8): 3+0+1 ms, 279 -> 275 MB 9560 -> 9534 (68270-58736) objects, 96(5285) handoff, 32(556) steal, 834/376/91 yields
scvg22: 0x0 MB released
scvg22: inuse: 500, idle: 87, sys: 588, released: 0, consumed: 587 (MB)
gc31(8): 2+0+0 ms, 755 -> 699 MB 9538 -> 9535 (68275-58740) objects, 108(5353) handoff, 36(643) steal, 811/415/87 yields
gc32(8): 2+0+0 ms, 2235 -> 2171 MB 9537 -> 9536 (68278-58742) objects, 108(5457) handoff, 32(576) steal, 891/382/85 yields
scvg23: 0x0 MB released
scvg23: inuse: 2172, idle: 207, sys: 2380, released: 0, consumed: 2379 (MB)
scvg24: inuse: 4220, idle: 207, sys: 4428, released: 0, consumed: 4427 (MB)
gc33(8): 4+0+0 ms, 4219 -> 3451 MB 9537 -> 9513 (68280-58767) objects, 86(3684) handoff, 33(639) steal, 591/363/96 yields
scvg25: GC forced
scvg25: inuse: 3452, idle: 975, sys: 4428, released: 0, consumed: 4427 (MB)
gc34(8): 5+0+0 ms, 3451 -> 3451 MB 9513 -> 9512 (68281-58769) objects, 156(7905) handoff, 36(657) steal, 1055/490/147 yields
scvg26: 0x0 MB released
scvg26: inuse: 3452, idle: 975, sys: 4428, released: 0, consumed: 4427 (MB)
gc35(8): 3+0+0 ms, 7547 -> 7547 MB 9513 -> 9513 (68283-58770) objects, 108(5264) handoff, 39(695) steal, 878/522/130 yields
scvg27: 0x19 MB released
scvg27: inuse: 7548, idle: 975, sys: 8524, released: 25, consumed: 8498 (MB)
scvg28: 0xb5 MB released
scvg28: inuse: 7548, idle: 975, sys: 8524, released: 207, consumed: 8316 (MB)
gc36(8): 3+0+0 ms, 7547 -> 6523 MB 9513 -> 9490 (68284-58794) objects, 99(4977) handoff, 35(698) steal, 712/289/92 yields
gc37(8): 3+0+0 ms, 6523 -> 6523 MB 9490 -> 9489 (68285-58796) objects, 65(2826) handoff, 38(541) steal, 667/411/122 yields
scvg29: GC forced
scvg29: inuse: 6524, idle: 1999, sys: 8524, released: 207, consumed: 8316 (MB)
scvg30: 0x0 MB released
scvg30: inuse: 6524, idle: 1999, sys: 8524, released: 207, consumed: 8316 (MB)
gc38(8): 2+0+0 ms, 14715 -> 14715 MB 9490 -> 9490 (68287-58797) objects, 100(5268) handoff, 39(743) steal, 809/426/86 yields
unexpected fault address 0xc614c20000
fatal error: fault
[signal 0xb code=0x1 addr=0xc614c20000 pc=0x421f2f]

goroutine 353 [running]:
[fp=0x7fb6a6acdd80] runtime.throw(0xf221b2)
/build/go/go/src/pkg/runtime/panic.c:465 +0x67
[fp=0x7fb6a6acdd98] runtime.sigpanic()
/build/go/go/src/pkg/runtime/thread_linux.c:217 +0xe7
[fp=0x7fb6a6acdda0] runtime.memmove()
/build/go/go/src/pkg/runtime/memmove_amd64.s:46 +0x1f
[fp=0x7fb6a6acddd8] runtime.copy()
/build/go/go/src/pkg/runtime/slice.c:259 +0x13d
[fp=0x7fb6a6acdea8] bytes.(*Buffer).ReadFrom(0xc2004df850, 0xc2001813c0, 0xc200000280, 0xfffffc36, 0x0, ...)
/build/go/go/src/pkg/bytes/buffer.go:162 +0x157
[fp=0x7fb6a6acdf30] io.Copy(0xc200448e10, 0xc2004df850, 0xc2001813c0, 0xc200000280, 0x0, ...)
/build/go/go/src/pkg/io/io.go:328 +0xa3
[fp=0x7fb6a6acdf80] os/exec.func·003(0x8c1a68, 0xc20046a2a0)
/build/go/go/src/pkg/os/exec/exec.go:207 +0x60
[fp=0x7fb6a6acdfb0] os/exec.func·004(0xc200384160)
/build/go/go/src/pkg/os/exec/exec.go:274 +0x2a
[fp=0x7fb6a6acdfb8] runtime.goexit()
/build/go/go/src/pkg/runtime/proc.c:1149
created by os/exec.(*Cmd).Start
/build/go/go/src/pkg/os/exec/exec.go:275 +0x6ac

goroutine 1 [syscall]:
syscall.Syscall6()
/build/go/go/src/pkg/syscall/asm_linux_amd64.s:41 +0x5
syscall.wait4(0x60f2, 0xc2000002e0, 0x0, 0xc20058b6c0, 0x100000001, ...)
/build/go/go/src/pkg/syscall/zerrors_linux_amd64.go:1640 +0x7b
syscall.Wait4(0x60f2, 0x7fb6a6ac99fc, 0x0, 0xc20058b6c0, 0xc2001d2100, ...)
/build/go/go/src/pkg/syscall/syscall_linux.go:222 +0x6c
os.(*Process).wait(0xc200384440, 0x4105dc, 0x0, 0x0)
/build/go/go/src/pkg/os/exec_unix.go:22 +0xea
os.(*Process).Wait(0xc200384440, 0xc200000248, 0xc2005d57c0, 0x2b)
/build/go/go/src/pkg/os/doc.go:43 +0x25
os/exec.(*Cmd).Wait(0xc200721640, 0x0, 0x0)
/build/go/go/src/pkg/os/exec/exec.go:308 +0x197
os/exec.(*Cmd).Run(0xc200721640, 0xc2004df850, 0x4000000002)
/build/go/go/src/pkg/os/exec/exec.go:232 +0x52
os/exec.(*Cmd).CombinedOutput(0xc200721640, 0x46d00c, 0x51349038, 0xc220a236a7, 0x4b16bc, ...)
/build/go/go/src/pkg/os/exec/exec.go:352 +0x20d
<our code here... calling from main.main()>

goroutine 2 [syscall]:

goroutine 86 [finalizer wait]:

go version:

devel +67fc3cb0d3aa Mon Mar 04 14:54:36 2013 +1100

Carl Shapiro

unread,
Mar 4, 2013, 10:58:50 PM3/4/13
to Albert Strasheim, golang-dev
On Mon, Mar 4, 2013 at 4:34 AM, Albert Strasheim <ful...@gmail.com> wrote:
Unfortunately it's hard to share the code that causes this, since it seems to depend on running a specific internal test that probably produces a few gigabyte of output.

Are you able to reliably reproduce the crash?

Albert Strasheim

unread,
Mar 4, 2013, 11:13:32 PM3/4/13
to golan...@googlegroups.com, Albert Strasheim, csha...@golang.org
Hello
I can reproduce it about 60% of the time, or at least reliably enough that I should be able to attach a gdb to it if people can tell me what they want to look at.

I wrote some more here:


The crash seems to trigger whenever the size of the heap suddenly spikes up past 14 GB.

I'll look at building a small program to try to reproduce this today. Maybe I'll get lucky. Or I can bisect the issue a bit if that would help?

Regards

Albert

Russ Cox

unread,
Mar 5, 2013, 3:57:28 PM3/5/13
to Albert Strasheim, golang-dev, Carl Shapiro
I just rolled back Carl's recent garbage collector patch, because it produced malloc corruption on ARM. It would be interesting to know whether this fixes the crash you have been seeing. At least then we'd have a hint as to whether it was caused by the garbage collector changes or the scheduler changes.

Carl Shapiro

unread,
Mar 5, 2013, 4:45:34 PM3/5/13
to Russ Cox, Albert Strasheim, golang-dev
On Tue, Mar 5, 2013 at 12:57 PM, Russ Cox <r...@golang.org> wrote:
I just rolled back Carl's recent garbage collector patch, because it produced malloc corruption on ARM. It would be interesting to know whether this fixes the crash you have been seeing. At least then we'd have a hint as to whether it was caused by the garbage collector changes or the scheduler changes.

(I think these things are unrelated.  Albert's report preceded the submit of my change that was rolled back.)

Albert Strasheim

unread,
Mar 6, 2013, 12:33:34 AM3/6/13
to golan...@googlegroups.com, Russ Cox, Albert Strasheim, csha...@golang.org
Hello
Correct. This issue was triggered by the big scheduler change in 231af8ac63aa.

I wouldn't go so far as to say it caused it though: the change probably causes the crashing program and logs it receives from its slave processes to change.

I'm going to flip between 231af8ac63aa and 1b4b6a694b1c independently for the master and slave processes today to see what there is to see.

It seems related to a byte slice that is close to 4 GB in size though. Here's a gdb backtrace from a crash:

#0  runtime.memmove (to=void, fr=void, n=void) at /build/go/src/pkg/runtime/memmove_amd64.s:46
#1  0x000000000041b5a2 in runtime.copy (to=void, fm=void, width=void, ret=void) at /build/go/src/pkg/runtime/slice.c:230
#2  0x0000000000535e9c in bytes.(*Buffer).ReadFrom (fp=void, b=0xc20051bbd0, r=..., n=4294966338, err=0) at /build/go/src/pkg/bytes/buffer.go:162
#3  0x00000000004778a4 in io.Copy (fp=void, dst=..., src=..., written=0, err=0) at /build/go/src/pkg/io/io.go:328
#4  0x00000000005068a4 in os/exec.func·003 (fp=void, ~anon0="<bad dynamic type>") at /build/go/src/pkg/os/exec/exec.go:207
#5  0x0000000000506928 in os/exec.func·004 (fp=void, fn={void (error *)} 0x7ff2040b2fb0) at /build/go/src/pkg/os/exec/exec.go:274
#6  0x0000000000416760 in ?? () at /build/go/src/pkg/runtime/proc.c:1126
#7  0x000000c2006e8e40 in ?? ()
#8  0x0000000000000000 in ?? ()

Cheers

Albert

unread,
Mar 6, 2013, 1:11:54 AM3/6/13
to golan...@googlegroups.com, Russ Cox, Albert Strasheim, csha...@golang.org
What happens if you run "GOGC=off ./yourprogram" ?

Albert Strasheim

unread,
Mar 6, 2013, 2:46:06 AM3/6/13
to golan...@googlegroups.com, Russ Cox, Albert Strasheim, csha...@golang.org
On Wednesday, March 6, 2013 8:11:54 AM UTC+2, ⚛ wrote:
What happens if you run "GOGC=off ./yourprogram" ?

Tried this. It still crashes. A bit more info here:


Cheers

Albert
Reply all
Reply to author
Forward
0 new messages