scheduling of goroutines


Dean Prichard

Jan 8, 2010, 1:49:29 PM
to golang-nuts
I've stumbled across an interesting problem with some code I wrote,
and I've cut the code down to a small test case that demonstrates it.

I've got 2 goroutines. If I add a small Sleep() in the function, they
seem to run concurrently; without the Sleep() they seem to run
sequentially.

without sleep:
$ time ./6.out -s=f
real 0m20.212s
user 0m20.162s
sys 0m0.015s

with sleep:
$ time ./6.out -s=t
real 0m11.103s
user 0m20.173s
sys 0m0.043s

Andrey Mirtchovski took a quick look at it and pointed out that
turning the GC off also seems to cause the goroutines to run
concurrently:

$ export GOGC=off; time ./6.out -s=f
real 0m11.065s
user 0m20.165s
sys 0m0.079s

any thoughts?

here is the code:

package main

import (
	"flag"
	"syscall"
	"runtime"
)

var s = flag.Bool("s", true, "sleep hack")

func main() {
	flag.Parse()
	runtime.GOMAXPROCS(2)
	ch1 := make(chan int, 0)
	ch2 := make(chan int, 0)

	go bigcalc(ch1, *s)
	go bigcalc(ch2, *s)

	<-ch1
	<-ch2
}

func bigcalc(ch chan int, s bool) {
	for i := 3000; i <= 4000; i++ {
		smallcalc(i)
		if s {
			syscall.Sleep(1)
		}
	}
	ch <- 1
}

func smallcalc(n int) {
	a := make([]int, n)
	b := make([]int, n)
	for i := 0; i < 1000; i++ {
		for j := 0; j < n; j++ {
			b[j] = a[j]
		}
	}
}
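
For anyone trying to reproduce this on a current toolchain: the test
case above uses syscall.Sleep, which is not available in present-day
Go. The following is a rough modern equivalent, not the original
program. It assumes the Sleep(1) argument was in nanoseconds, swaps the
two channels for a sync.WaitGroup, and introduces the names sleepHack
and runBoth purely for illustration. Timings on a modern runtime will
not necessarily show the behaviour described in this thread.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
	"sync"
	"time"
)

var sleepHack = flag.Bool("s", true, "sleep hack")

// smallcalc copies an n-element slice 1000 times; the two allocations
// are what give the garbage collector something to do.
func smallcalc(n int) {
	a := make([]int, n)
	b := make([]int, n)
	for i := 0; i < 1000; i++ {
		for j := 0; j < n; j++ {
			b[j] = a[j]
		}
	}
}

// bigcalc runs smallcalc for every size in [lo, hi], optionally
// sleeping between iterations as in the original -s=t run.
func bigcalc(lo, hi int, sleep bool) {
	for i := lo; i <= hi; i++ {
		smallcalc(i)
		if sleep {
			time.Sleep(time.Nanosecond) // assumption: stands in for syscall.Sleep(1)
		}
	}
}

// runBoth starts two bigcalc goroutines, waits for both, and reports
// how many finished and how long the pair took.
func runBoth(lo, hi int, sleep bool) (int, time.Duration) {
	var wg sync.WaitGroup
	var mu sync.Mutex
	finished := 0
	start := time.Now()
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			bigcalc(lo, hi, sleep)
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished, time.Since(start)
}

func main() {
	flag.Parse()
	runtime.GOMAXPROCS(2)
	n, elapsed := runBoth(3000, 4000, *sleepHack)
	fmt.Printf("%d goroutines finished in %v\n", n, elapsed)
}
```

Comparing the printed wall-clock time for -s=true and -s=false against
the user time from time(1) gives the same kind of evidence as the runs
above: wall time close to half of user time suggests the two goroutines
really ran in parallel.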

John Asmuth

Jan 8, 2010, 3:20:31 PM
to golang-nuts
The short answer: set GOMAXPROCS to 2 (or more).

The long answer is that if you have two goroutines multiplexed onto
the same process (which will always be the case when GOMAXPROCS is 1)
then one goroutine has to yield control to the other for it to run
at all. The ways to yield control include system calls, I/O, and
channel reading/writing.

For instance, this code will either print immediately or never if
GOMAXPROCS is 1, since nothing in that infinite loop yields control.

func main() {
	go func() {
		for {}
	}()
	print("Hi!\n")
}

- John
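
A minimal sketch of the explicit alternative to the yield points listed
above: a loop that calls runtime.Gosched on each pass, so that a second
goroutine sharing the same processor gets a turn. The helper name
waitForFlag is made up for this example. Note that Go releases from
1.14 onward also preempt long-running goroutines asynchronously, so on
a current runtime the explicit yield is a courtesy rather than a
requirement.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitForFlag spins until another goroutine sets a flag, yielding with
// runtime.Gosched on every pass. It returns true once the flag is seen.
func waitForFlag() bool {
	var flag int32
	go func() {
		atomic.StoreInt32(&flag, 1)
	}()
	for atomic.LoadInt32(&flag) == 0 {
		runtime.Gosched() // explicit yield point inside the busy loop
	}
	return true
}

func main() {
	runtime.GOMAXPROCS(1) // force both goroutines onto one processor
	fmt.Println("flag observed:", waitForFlag())
}
```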

andrey mirtchovski

Jan 8, 2010, 3:42:15 PM
to golang-nuts
> The short answer: set GOMAXPROCS to 2 (or more).

all of the tests are done with GOMAXPROCS>1, indeed the problem does
not appear with GOMAXPROCS=1.

something else noticed related to this code -- when GOMAXPROCS>1,
sending SIGQUIT to the process causes a segfault during the dump of
the runtime state. if the goroutines execute the sleep call then the
runtime state is dumped correctly on SIGQUIT. how does one go about
debugging this? gcc is useless on first glance.

andrey

-------------------------------------

SIGQUIT: quit
Faulting address: 0x1f400004b09
PC=0x420f93

futex+0x23 /home/andrey/go/src/pkg/runtime/linux/amd64/sys.s:128
futex()
futexsleep+0x47 /home/andrey/go/src/pkg/runtime/linux/thread.c:47
futexsleep(0x460780, 0x0, 0x3, 0x3)
futexlock+0x7a /home/andrey/go/src/pkg/runtime/linux/thread.c:126
futexlock(0x460780, 0x0)
notesleep+0x1c /home/andrey/go/src/pkg/runtime/linux/thread.c:206
notesleep(0x460780, 0x0)
nextgandunlock+0xfc /home/andrey/go/src/pkg/runtime/proc.c:352
nextgandunlock()
scheduler+0xe0 /home/andrey/go/src/pkg/runtime/proc.c:505
scheduler()
mstart+0x47 /home/andrey/go/src/pkg/runtime/proc.c:396
mstart()
_rt0_amd64+0x74 /home/andrey/go/src/pkg/runtime/amd64/asm.s:46
_rt0_amd64()

goroutine 3 [2]:
gosched+0x34 /home/andrey/go/src/pkg/runtime/proc.c:524
gosched()
./toy.sh: line 4: 19206 Segmentation fault ./6.out -s=f

Devon H. O'Dell

Jan 8, 2010, 3:48:40 PM
to John Asmuth, golang-nuts
2010/1/8 John Asmuth <jas...@gmail.com>:

> The short answer: set GOMAXPROCS to 2 (or more).
>
> The long answer is that if you have two goroutines multiplexed onto
> the same process (which will always be the case when GOMAXPROCS is 1)
> then the one goroutine has to yield control to the other for it to run
> at all. The ways to yield control include system calls, io and channel
> reading/writing.
>
> For instance, this code will either print immediately or never if
> GOMAXPROCS is 1, since nothing in that infinite loop yields control.

His example explicitly sets GOMAXPROCS to 2. I think the question is
more about why a second OS thread is not created even though there are
clearly two goroutines that could benefit from running concurrently
(or, if a second thread is created, why one of the goroutines is not
being scheduled on that thread).

If turning the GC off effectively causes the `desired' behavior, I'd
postulate that it is running in the second OS thread while the other
goroutines are scheduled on the same thread. However, the behavior
doesn't change for me even if I set GOMAXPROCS=4 on a 4-way system --
so I'm a bit curious as to why turning off GC ends up having the
goroutines scheduled concurrently myself (I would like to understand
the scheduler better). Even if I call runtime.LockOSThread in
bigcalc(), it seems to run synchronously. (I will note that if I do
that, it shaves about 4 seconds off the user time in either case, and
in the yielding case it shaves 1 second from the real time to have
them pinned to their own OS threads). A final guess would be that
since the goroutine never yields after it is entered, the scheduler
never has time to create a new OS thread for it to run on. Why turning
GC off changes this behavior is beyond me.

Thus: Is it only that another OS thread is created if one of the
goroutines yields at some point? (Should we consider trying to create
a new thread when spawning a goroutine?) I understand that goroutines
don't imply threads in the 6g toolchain, but if we can, why not?

--dho

Russ Cox

Jan 8, 2010, 3:53:22 PM
to Dean Prichard, golang-nuts
This is just a bug.  The second proc doesn't get
created until the first system call after the go,
rather than at the time of the go.

Looking into it.

Russ

Devon H. O'Dell

Jan 8, 2010, 3:54:00 PM
to andrey mirtchovski, golang-nuts
2010/1/8 andrey mirtchovski <mirtc...@gmail.com>:

>> The short answer: set GOMAXPROCS to 2 (or more).
>
> all of the tests are done with GOMAXPROCS>1, indeed the problem does
> not appear with GOMAXPROCS=1.
>
> something else noticed related to this code -- when GOMAXPROCS>1,
> sending SIGQUIT to the process causes a segfault during the dump of
> the runtime state. if the goroutines execute the sleep call then the
> runtime state is dumped correctly on SIGQUIT. how does one go about
> debugging this? gcc is useless on first glance.

When I was debugging thread issues with the FreeBSD port, it was gcc +
registers + a good bit of hand-holding by Russ for what to look out
for and how to find it in some cases.

> andrey
>
> -------------------------------------
>
> SIGQUIT: quit
> Faulting address: 0x1f400004b09
> PC=0x420f93
>
> futex+0x23 /home/andrey/go/src/pkg/runtime/linux/amd64/sys.s:128
>       futex()

Weird. sys.s:128 for me is after the RET, and objdump is showing that
futex()+0x23 = retq.

--dho

Russ Cox

Jan 8, 2010, 5:25:37 PM
to Dean Prichard, golang-nuts
This is a different bug than I thought.
The use of the single cpu happens because one cpu
has decided it's time to garbage collect.  It has lowered
GOMAXPROCS to 1 and is patiently waiting for the
other cpu to see that GOMAXPROCS has fallen
(temporarily) and reschedule.  Unfortunately, the
other cpu doesn't ever do this: it's computing away,
at least until it decides to do a garbage collection too.
That mode predominates during the run, with the net
effect that only one cpu is being used.

The "should I reschedule?" check only happens
when exiting a system call, so adding the Sleep(1)
essentially inserted more checks and cut the
garbage collector's wait time.  

Eventually the fix is to preempt goroutines when necessary
(and replace the garbage collector, but even the new one
will need to cause brief preemptions), but for now I will
add checks at the beginning of malloc and channel and
map operations, all of which are heavy enough that
they can easily support one more conditional branch.
That should alleviate the problem until there's a real fix.

If anyone knows how to reliably send a signal to a specific
thread on Linux, or on FreeBSD, or on OS X, please let
me know.  We'll need all of those eventually.
Thanks.
Russ
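
The handshake described above can be modelled in user code. This is
only a toy simulation, not the runtime's implementation: the type
world and the functions checkpoint, finish, stopTheWorld,
startTheWorld and runDemo are all invented for the sketch, and a mutex
stands in for what would be a single cheap conditional branch in the
real thing. A "collector" asks the workers to stop and waits; each
worker only notices the request when it reaches a check point.

```go
package main

import (
	"fmt"
	"sync"
)

// world tracks a pending stop request and how many workers have
// acknowledged it by parking (or by finishing for good).
type world struct {
	mu      sync.Mutex
	cond    *sync.Cond
	stop    bool // set by the collector, read at check points
	stopped int  // workers currently parked or finished
}

func newWorld() *world {
	w := &world{}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// checkpoint is the "should I reschedule?" test. A worker that never
// calls it would keep the collector waiting indefinitely.
func (w *world) checkpoint() {
	w.mu.Lock()
	if w.stop {
		w.stopped++
		w.cond.Broadcast() // tell the collector we have parked
		for w.stop {
			w.cond.Wait()
		}
		w.stopped--
	}
	w.mu.Unlock()
}

// finish records that a worker has exited and will never run again.
func (w *world) finish() {
	w.mu.Lock()
	w.stopped++
	w.cond.Broadcast()
	w.mu.Unlock()
}

// stopTheWorld blocks until all n workers are parked or finished.
func (w *world) stopTheWorld(n int) {
	w.mu.Lock()
	w.stop = true
	for w.stopped < n {
		w.cond.Wait()
	}
	w.mu.Unlock()
}

// startTheWorld clears the request and releases parked workers.
func (w *world) startTheWorld() {
	w.mu.Lock()
	w.stop = false
	w.cond.Broadcast()
	w.mu.Unlock()
}

// runDemo runs the workers and returns how many stop/start cycles the
// collector completed.
func runDemo(workers, units, collections int) int {
	w := newWorld()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := 0; u < units; u++ {
				w.checkpoint() // e.g. at the start of an allocation
			}
			w.finish()
		}()
	}
	done := 0
	for c := 0; c < collections; c++ {
		w.stopTheWorld(workers)
		done++ // the collection itself would happen here
		w.startTheWorld()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println("collections completed:", runDemo(2, 1000, 5))
}
```

Removing the checkpoint call from the worker loop reproduces the shape
of the bug: stopTheWorld then has to wait until the worker finishes its
whole computation, which is why adding checks to frequent operations
shortens the collector's wait.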

Devon H. O'Dell

Jan 8, 2010, 5:40:31 PM
to r...@golang.org, Dean Prichard, golang-nuts
Great explanation, thank you!

> That should alleviate the problem until there's a real fix.
> If anyone knows how to reliably send a signal to a specific
> thread on Linux, or on FreeBSD, or on OS X, please let
> me know.  We'll need all of those eventually.

Documented in gchat, but just so it's here for my future reference if
need-be, thr_kill2 is the proper syscall to use for this, and tid can
never == pid.

--dho

> http://codereview.appspot.com/184043
> Thanks.
> Russ

Sergio Luis O. B. Correia

Jan 8, 2010, 5:50:40 PM
to Devon H. O'Dell, r...@golang.org, Dean Prichard, golang-nuts
On Fri, Jan 8, 2010 at 7:40 PM, Devon H. O'Dell <devon...@gmail.com> wrote:
> Great explanation, thank you!
>
>> That should alleviate the problem until there's a real fix.
>> If anyone knows how to reliably send a signal to a specific
>> thread on Linux, or on FreeBSD, or on OS X, please let
>> me know.  We'll need all of those eventually.
>
> Documented in gchat, but just so it's here for my future reference if
> need-be, thr_kill2 is the proper syscall to use for this, and tid can
> never == pid.

the linux equivalent one would be tgkill(2), I believe.

sergio
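
For illustration, tgkill(2) can be reached from Go user code on Linux
through the syscall package. This is a sketch under assumptions, not
what the runtime does internally: it is Linux-only, the helper name
probeOwnThread is made up, and it sends signal 0, which only checks
that the target thread exists instead of delivering anything.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// probeOwnThread addresses the calling goroutine's OS thread with
// tgkill. Signal 0 performs the permission and existence checks only,
// so nothing is actually delivered.
func probeOwnThread() error {
	// Pin the goroutine so the thread id stays valid for the call.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	tgid := syscall.Getpid() // thread group id, i.e. the process id
	tid := syscall.Gettid()  // id of this specific thread
	return syscall.Tgkill(tgid, tid, syscall.Signal(0))
}

func main() {
	if err := probeOwnThread(); err != nil {
		fmt.Println("tgkill failed:", err)
		return
	}
	fmt.Println("tgkill reached the current thread")
}
```

Passing both the thread group id and the thread id is what makes the
call reliable: if the thread has exited and its id has been reused by
another process, the tgid check fails and the signal goes nowhere.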

andrey mirtchovski

Jan 9, 2010, 1:20:28 PM
to golang-nuts
> something else noticed related to this code -- when GOMAXPROCS>1,
> sending SIGQUIT to the process causes a segfault during the dump of
> the runtime state.

digging slightly deeper, it appears that during traceback of the code
in the original email (-s=f) pkg/runtime/amd64/traceback.c somehow
arrives at a bad pc value and then dereferences it at line 42. i'm
seeing two different bad values for (byte*)p for the same binary:

0x7fd4fa181
0x300000003


the comment above line 42 indicates that this is expected behaviour?
anyways, the segfault occurs always in the second goroutine (which
should be busy computing), right after it prints

gosched+0x34 /home/andrey/go/src/pkg/runtime/proc.c:524
gosched()

Dean Prichard

Jan 12, 2010, 12:51:39 PM
to golang-nuts
Thanks Russ.

the latest patch at:

http://codereview.appspot.com/186078

fixes things for me.
