arm5 build still broken


Russ Cox

Apr 22, 2011, 4:32:58 PM
to Dave Cheney, golang-dev
It gets all the way to the gofix test, though, which is
one of the last tests. The gofix crash claims that
all goroutines are sleeping. Since we know that isn't
true, it is most likely a locking bug causing the
scheduler to get confused. But we let the kernel
provide the compare-and-swap instruction, and we
test that it works on simple cases (i.e. we're calling
it correctly) at startup. So I'm a bit confused as to
what could be going wrong now.
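
For readers following along, here is a minimal sketch of what a "simple cases" compare-and-swap check can look like. It is an illustration written against the sync/atomic package, not the runtime's own startup check, and the function name casSelfCheck is made up for this example. Note that a single-threaded check like this only shows the call is wired up correctly; it says nothing about behaviour under contention, which is the open question above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casSelfCheck exercises compare-and-swap on simple single-threaded
// cases and reports whether every call behaved as expected.
// (Hypothetical helper, not the runtime's actual startup check.)
func casSelfCheck() bool {
	var v uint32 = 1

	// The swap must succeed when the old value matches.
	if !atomic.CompareAndSwapUint32(&v, 1, 2) || v != 2 {
		return false
	}
	// The swap must fail, and leave v untouched, when it does not match.
	if atomic.CompareAndSwapUint32(&v, 1, 3) || v != 2 {
		return false
	}
	return true
}

func main() {
	fmt.Println("cas self-check ok:", casSelfCheck())
}
```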

Russ

Dave Cheney

Apr 22, 2011, 11:03:53 PM
to r...@golang.org, golang-dev
Looking at the test failures this morning, the one that is coming up
more than the gofix test is runtime/proc_test.go

After a bit of digging it looks like this poor little CPU can't get
that test done in the 120 second window. It's actually only a 65
second window as TestFloat64 takes 33 seconds minimum on this host.
The runtime on this test is highly variable, and bizarrely it runs
_faster_ when gotest is niced. Without the nice, interactive
performance on that host plummets; the runtime scheduler must be
really pounding the kernel.

http://codereview.appspot.com/4435066 is an experimental CL that
reduces the runtime of the tests from

axentraserver(~/go/src/pkg/runtime) % nice gotest -v | grep -C2 "(RUN|PASS)"
=== RUN runtime_test.TestStopTheWorldDeadlock
--- PASS: runtime_test.TestStopTheWorldDeadlock (122.73 seconds)
Iterations 1000
=== RUN runtime_test.TestFloat64
--- PASS: runtime_test.TestFloat64 (33.89 seconds)

to

axentraserver(~/go/src/pkg/runtime) % nice gotest -v | grep -C2 "(RUN|PASS)"
=== RUN runtime_test.TestStopTheWorldDeadlock
--- PASS: runtime_test.TestStopTheWorldDeadlock (18.96 seconds)
Iterations 100
=== RUN runtime_test.TestFloat64
--- PASS: runtime_test.TestFloat64 (34.03 seconds)

Comments and suggestions gladly welcomed.

Cheers

Dave

Russ Cox

Apr 23, 2011, 10:05:12 AM
to Dave Cheney, golang-dev
I just turned that test off during all.bash.
It's too cpu heavy for the "short" tests anyway.
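
As a rough sketch of the other option, a CPU-heavy test can scale itself down for short runs instead of being switched off. The helper name iterations is hypothetical; the two counts are the "Iterations" values from the gotest output quoted earlier in the thread. In a real test file the argument would come from testing.Short(), which reports whether the -short flag was given.

```go
package main

import "fmt"

// iterations returns how many rounds a CPU-heavy test should run.
// The counts 1000 and 100 come from the gotest output quoted in the
// thread; the function itself is a hypothetical illustration.
func iterations(short bool) int {
	if short {
		return 100
	}
	return 1000
}

func main() {
	// In a test file this would be iterations(testing.Short()).
	fmt.Println("short:", iterations(true), "full:", iterations(false))
}
```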

Russ

Dave Cheney

Apr 23, 2011, 10:47:37 AM
to r...@golang.org, golang-dev
Sorry for the breakage, I'll try to install a diff binary tomorrow.


Russ Cox

Apr 25, 2011, 4:57:17 PM
to Dave Cheney, golang-dev

We've seen one "ok" now, so things are at least not
completely broken. Still, the "all goroutines are sleeping"
crashes persist. I'm still as confused as when I wrote
the above. Dave, can you see if some simple test
can reproduce the crash reliably? The failure after
the ok was in chan/goroutines.go. If that fails
semi-reliably that would at least give us a very simple
test case.

Russ

Dave Cheney

Apr 25, 2011, 6:34:30 PM
to r...@golang.org, golang-dev
I'll take a look today.

Dave Cheney

Apr 27, 2011, 9:54:59 PM
to r...@golang.org, golang-dev
Hello,

Just an update that I'm still looking into the arm5 crashing issue and
hope to have at least a more reliable test case than just hammering
gofix, in a few days. The best I have at the moment is some handwaving
about lots of closures and lots of goroutines.

In the interim I wanted to get some advice from the group about the
finaliser goroutine. Comments in the code suggest it's sort of special
and can run without taking locks because of other preconditions. In
this stack trace http://pastie.org/1841753, it appears that the
finaliser goroutine was the 543rd goroutine to be started. I'm
wondering if this is breaking an expectation in the scheduler.

Does this pique anyone's interest?

Cheers

Dave

Russ Cox

Apr 27, 2011, 10:37:55 PM
to Dave Cheney, golang-dev
Instead of gofix, what about chan/goroutines.go?
That has failed before too and is a much simpler program.
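
To give a feel for how small such a reproducer can be, here is a sketch in the spirit of a goroutine-chain test. It is not the contents of chan/goroutines.go; the names relay and runChain and the chain length are assumptions made for this example. Every link blocks on an unbuffered channel, so a broken lock or a lost wakeup would be expected to show up as a hang or a deadlock report rather than a wrong answer.

```go
package main

import "fmt"

// relay receives one value, adds one, and passes it down the chain.
func relay(in <-chan int, out chan<- int) {
	out <- <-in + 1
}

// runChain threads n goroutines together with unbuffered channels,
// feeds 0 into the head, and returns the value from the tail.
// On a healthy runtime the result equals n.
func runChain(n int) int {
	first := make(chan int)
	var in chan int = first
	for i := 0; i < n; i++ {
		out := make(chan int)
		go relay(in, out)
		in = out
	}
	// Send from a separate goroutine so the caller can wait on the tail.
	go func() { first <- 0 }()
	return <-in
}

func main() {
	fmt.Println(runChain(10000))
}
```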

Dave Cheney

Apr 28, 2011, 3:36:50 AM
to r...@golang.org, golang-dev
http://godashboard.appspot.com/log/5fbc4c1403c6b3ff82c89039c2673da6908a871002e106d5b450ee3d2f01f292

It looks like stack.go is also a good test, but the build doesn't get
there if gofix blows up.

Russ Cox

Apr 28, 2011, 11:33:03 AM
to Dave Cheney, golang-dev
> It looks like stack.go is also a good test

That's a little surprising. The gofix deadlocks all show one
goroutine blocked sending to and another blocked receiving from
the same channel, which obviously should not happen.
There are two possible explanations: (1) the channel
memory is being garbage collected and zeroed between
the two ops or (2) the locks are broken.

Can you send the output for when stack.go fails?
What does uname -a print on your system?
Out of 100 runs, how many fail?
Out of 100 runs with export GOGC=off, how many fail?
Out of 100 runs with export GOGC=1, how many fail?

Thanks.
Russ

Dave Cheney

Apr 29, 2011, 1:02:33 AM
to r...@golang.org, golang-dev
Hi Russ,

There is a lot of output so I've created an issue to collect all the data

http://code.google.com/p/go/issues/detail?id=1750

I'm still working on the GOGC=1 tests (each run of stack.go takes about 3 mins)

I'll also do the same for chan/goroutines.go

Cheers

Dave
