Russ
After a bit of digging it looks like this poor little CPU can't get
that test done in the 120 second window. It's actually only a 65
second window, as TestFloat64 takes 33 seconds minimum on this host.
The runtime of this test is highly variable and, bizarrely, it runs
_faster_ when gotest is niced. Without the nice, interactive
performance on that host plummets; the runtime scheduler must be
really pounding the kernel.
http://codereview.appspot.com/4435066 is an experimental CL that
reduces the runtime of the tests from
axentraserver(~/go/src/pkg/runtime) % nice gotest -v | grep -C2 "(RUN|PASS)"
=== RUN runtime_test.TestStopTheWorldDeadlock
--- PASS: runtime_test.TestStopTheWorldDeadlock (122.73 seconds)
Iterations 1000
=== RUN runtime_test.TestFloat64
--- PASS: runtime_test.TestFloat64 (33.89 seconds)
to
axentraserver(~/go/src/pkg/runtime) % nice gotest -v | grep -C2 "(RUN|PASS)"
=== RUN runtime_test.TestStopTheWorldDeadlock
--- PASS: runtime_test.TestStopTheWorldDeadlock (18.96 seconds)
Iterations 100
=== RUN runtime_test.TestFloat64
--- PASS: runtime_test.TestFloat64 (34.03 seconds)
Comments and suggestions welcome.
Cheers
Dave
Russ
Sent from my iPad
We've seen one "ok" now, so things are at least not
completely broken. Still, the "all goroutines are asleep"
crashes persist. I'm still as confused as when I wrote
the above. Dave, can you see if some simple test
can reproduce the crash reliably? The failure after
the ok was in chan/goroutine.go. If that fails
semi-reliably that would at least give us a very simple
test case.
Russ
Just an update that I'm still looking into the arm5 crashing issue;
I hope to have a more reliable test case than just hammering gofix
within a few days. The best I have at the moment is some handwaving
about lots of closures and lots of goroutines.
In the interim I wanted to get some advice from the group about the
finaliser goroutine. Comments in the code suggest it's sort of special
and can run without taking locks because of other preconditions. In
this stack trace http://pastie.org/1841753, it appears that the
finaliser goroutine was the 543rd goroutine to be started. I'm
wondering if this is breaking an expectation in the scheduler.
Does this pique anyone's interest?
Cheers
Dave
On Tue, Apr 26, 2011 at 6:57 AM, Russ Cox <r...@golang.org> wrote:
It looks like stack.go is also a good test, but the build doesn't get
there if gofix blows up.
That's a little surprising. The gofix deadlocks all show one
goroutine blocked sending to and another blocked receiving from
the same channel, which obviously should not happen.
There are two possible explanations: (1) the channel
memory is being garbage collected and zeroed between
the two ops or (2) the locks are broken.
Can you send the output for when stack.go fails?
What does uname -a print on your system?
Out of 100 runs, how many fail?
Out of 100 runs with export GOGC=off, how many fail?
Out of 100 runs with export GOGC=1, how many fail?
Thanks.
Russ
There is a lot of output, so I've created an issue to collect all the data:
http://code.google.com/p/go/issues/detail?id=1750
I'm still working on the GOGC=1 tests (each run of stack.go takes about 3 mins)
I'll also do similar for chan/goroutine.go
Cheers
Dave