Go's scheduler always gets “stuck” after a certain time when executing the code below


Ronald

Jun 20, 2017, 11:40:12 PM
to golang-nuts
Hi, I found that when running the code below (play):

package main

import "fmt"
import "runtime"
import "time"

type callRet struct {
	ret int
}

type callIn struct {
	ret_chan chan *callRet
	arg1     int
}

func caller(call_in_c chan *callIn, arg1 int) int {
	ret_c := make(chan *callRet, 1)
	ci := callIn{ret_c, arg1}
	t1 := time.Now().UnixNano()
	call_in_c <- &ci
	ret := <-ret_c
	t2 := time.Now().UnixNano()
	_, _ = t1, t2
	//fmt.Println(t2, t1, float64(t2-t1)/1e6)
	return ret.ret
}

func call_srv(call_in_c chan *callIn) {
	//runtime.LockOSThread()
	ct := 0
	for {
		select {
		case in := <-call_in_c:
			ret_c := in.ret_chan
			ret := callRet{3 + in.arg1}
			ret_c <- &ret
			ct++
			if ct%1000 == 0 {
				fmt.Println(ct)
			}
			//default:
			//	time.Sleep(1 * time.Millisecond)
		}
	}
	//runtime.UnlockOSThread()
}

func init() {
	//runtime.LockOSThread()
}

func main() {
	p := fmt.Println
	runtime.GOMAXPROCS(5)
	call_in_c := make(chan *callIn, 2000)
	fp := func(call_in_c chan *callIn) {
		ct := 0
		for ; ct < 2000000; ct = ct + 1 {
			caller(call_in_c, 1)
			time.Sleep(100 * time.Nanosecond)
		}
		p("done:)")
	}
	go fp(call_in_c)
	go fp(call_in_c)
	go func() {
		for {
			fmt.Println("heartbeat...")
			time.Sleep(10 * 1000 * time.Microsecond)
		}
	}()
	// After deleting this little piece of code, it will be OK:
	go func() {
		for {
		}
	}()
	// -^
	call_srv(call_in_c)
	return
}
It always gets “stuck” some time after it starts executing (tested in Go 1.7, 1.8.1 and 1.8.3;
you should run it on your own computer, not on the playground, because it takes several seconds).
I have checked the code above dozens of times at least and am sure it is all right.

I attached gdb to the Go process while it was stuck, and `bt` yielded this:


(gdb) info goroutines
  1 waiting  runtime.gopark
  2 waiting  runtime.gopark
  3 waiting  runtime.gopark
  4 waiting  runtime.gopark
  5 waiting  runtime.gopark
* 6 running  runtime.systemstack_switch
  7 waiting  runtime.gopark
* 8 running  main.main.func3
  17 runnable runtime.exitsyscall
  33 waiting  runtime.gopark
  18 waiting  runtime.gopark
  49 waiting  runtime.gopark
  50 waiting  runtime.gopark
  51 waiting  runtime.gopark
(gdb) goroutine 6 bt
#0  runtime.systemstack_switch () at /usr/local/go/src/runtime/asm_amd64.s:281
#1  0x0000000000415088 in runtime.gcStart (mode=0, forceTrigger=false)
    at /usr/local/go/src/runtime/mgc.go:1010
#2  0x000000000040d841 in runtime.mallocgc (size=96, typ=0x4a7fa0, needzero=true,
    ~r3=0x507998 <runtime.timers+24>) at /usr/local/go/src/runtime/malloc.go:785
#3  0x000000000040df68 in runtime.newobject (typ=0x4a7fa0, ~r1=0x0)
    at /usr/local/go/src/runtime/malloc.go:820
#4  0x00000000004035d2 in runtime.makechan (t=0x492960, size=1, ~r2=0xc4200016c0)
    at /usr/local/go/src/runtime/chan.go:92
#5  0x00000000004840e4 in main.caller (call_in_c=0xc420056060, arg1=1, ~r2=0)
    at /root/go/src/chan/x.go:18
#6  0x00000000004846ac in main.main.func1 (call_in_c=0xc420056060) at /root/go/src/chan/x.go:62
#7  0x000000000044da21 in runtime.goexit () at /usr/local/go/src/runtime/asm_amd64.s:2197
#8  0x000000c420056060 in ?? ()
#9  0x0000000000000000 in ?? ()


Ronald

Jun 20, 2017, 11:45:20 PM
to golang-nuts
PS: 
    go func() {
        for {
        }
    }()

The reason for the code above is that I want to test the channel's latency
when some 'indecent' goroutine is seizing several whole OS threads.

Dave Cheney

Jun 21, 2017, 12:55:23 AM
to golang-nuts
Yes, this is expected. Don't write infinite loops like that.

Ronald

Jun 21, 2017, 1:12:12 AM
to golang-nuts
I found this: https://github.com/golang/go/issues/7190.

Then I replaced:

    go func() {
        for {
        }
    }()

with:
 
    go func() {
        ct := 0
        t1 := time.Now().UnixNano()
        for {
            ct++
            if ct%600000000 == 0 {
                t2 := time.Now().UnixNano()
                fmt.Println("hooray", t2, t1, float64(t2-t1)/1e6)
                t1 = t2
            }
        }
    }()

Thus the program does not get stuck anymore -- but it is thrashing
and churning! (With a delay of nearly 100-200 ms, depending on the
value '600000000' and the speed of your computer.)

So I 'upgraded' the code again:

    go func() {
        ct := 0
        t1 := time.Now().UnixNano()
        for {
            ct++
            if ct%100000 == 0 {
                runtime.Gosched()
            }
            if ct%600000000 == 0 {
                t2 := time.Now().UnixNano()
                fmt.Println("hooray", t2, t1, float64(t2-t1)/1e6)
                t1 = t2
            }
        }
    }()

Then it looks just fine. But a pretty serious question emerged:

    P1:

        No matter how large GOMAXPROCS is, if you run a very
        CPU-intensive goroutine, your system will definitely start
        thrashing and churning, and the delay depends on
        how intensive your Go code is.

Is proposition P1 always true in Go so far?

If P1 is true, since GOMAXPROCS is bigger than 1, can't the scheduler
move the waiting work (including pending goroutines and other core
business) to other idle threads when some 'indecent' goroutine is seizing
the few busiest threads?

Ronald

Jun 21, 2017, 1:15:59 AM
to golang-nuts
Hi Dave, thanks for your reply :)

Could you please have one more look at this?

Looking forward to hearing your answer,
thanks a lot.

Ronald

Jun 21, 2017, 1:30:03 AM
to golang-nuts
If proposition P1 is always true in Go, then we could
deduce that Go is not good at CPU-intensive applications.
Of course we could offload the jobs to cgo, but then there
would be no consistency at all.

Dave Cheney

Jun 21, 2017, 6:51:49 AM
to golang-nuts
Go is not good for functions which never call other functions, take locks, read or write data, or communicate with others. In practice, most code does one or more of these things, so it's rarely an issue.

Ronald

Jun 21, 2017, 7:42:14 AM
to golang-nuts
On Wednesday, 21 June 2017 12:55:23 UTC+8, Dave Cheney wrote:
Yes, this is expected. Don't write infinite loops like that.

Sorry Dave, but I still can't get why it would get 'stuck'
with the very first piece of code.

Of course it would be pretty reasonable for the Go
process to get 'stuck' if GOMAXPROCS is 1.

But if GOMAXPROCS is bigger than 1, it's unreasonable
for the process to get stuck, because we still have plenty of
idle 'GOMAXPROCS' left.

PS:

This is working as intended. 
time.Sleep is defined as it will fire _after_ the duration passed to the function, so
technically this is not a bug.

To be honest, I don't know what you exactly mean. "time.Sleep is defined as it will
fire _after_ the duration passed to the function" -- this is right of course. But how do
you reach the conclusion "so technically this is not a bug"?
 

Ronald

Jun 21, 2017, 7:50:35 AM
to golang-nuts

Replace:

    go func() {
        for {
        }
    }()

with:
 
    go func() {
        ct := 0
        t1 := time.Now().UnixNano()
        for {
            ct++
            if ct%600000000 == 0 {
                t2 := time.Now().UnixNano()
                fmt.Println("hooray", t2, t1, float64(t2-t1)/1e6)
                t1 = t2
            }
        }
    }()

But how would you explain this situation?
This piece of code is not a "dead spin"; it is just CPU-intensive.

Thus the program does not get stuck anymore -- but the whole process
is thrashing and churning! (With a delay of nearly 100-200 ms,
depending on the value '600000000' and the speed of your computer.)

Is this a big flaw in Go when fulfilling some CPU-intensive work?

Jan Mercl

Jun 21, 2017, 8:09:23 AM
to Ronald, golang-nuts
On Wed, Jun 21, 2017 at 1:50 PM Ronald <yunth...@gmail.com> wrote:

> for {
>         ct++
>         if ct%600000000 == 0 {
>                 t2 := time.Now().UnixNano()
>                 fmt.Println("hooray", t2, t1, float64(t2-t1)/1e6)
>                 t1 = t2
>         }
> }
> This piece code is not "dead spin", it is just cpu intensive. 

This piece of code prohibits cooperative scheduling in 599 999 999 out of 600 000 000 iterations. It's a nice example of how not to code in Go. In particular, delays should be implemented using the functions in the time package, not by burning CPU cycles at the expense of other useful work that could have been done instead.

--

-j

Sen Han

Jun 21, 2017, 8:49:59 AM
to Jan Mercl, golan...@googlegroups.com
This piece of code prohibits cooperative scheduling in 599 999 999 out of 
600 000 000 times. It's a nice example of how not to code in Go.

Thanks Jan for your reply :)
 
Particularly, delays should be implemented using the functions in the time
package, not by burning CPU cycles at the expense of other useful work 
that could have been done instead.
 
Sure, this kind of dead loop is just a simulation of some really CPU-intensive jobs.
And I think for those kinds of jobs, a delay in the range [20 ms, inf) is quite ordinary.
Thus I think that kind of writing is acceptable :D

Ronald

Jun 21, 2017, 9:06:12 AM
to golang-nuts, 0xj...@gmail.com
And I have written a test case as proof that:

the current Go scheduler is not
good at CPU-intensive applications

Here it is:

package main

import "fmt"
import "runtime"

//import "math/big"
import "time"

/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

int deadloop() {
	long long ct = 0;
	struct timespec start, stop;
	clock_gettime(CLOCK_MONOTONIC_RAW, &start);
	while (1) {
		ct++;
		if (ct % 800000000 == 0) {
			clock_gettime(CLOCK_MONOTONIC_RAW, &stop);
			printf(
				"cgo-> ct:%lld | c loop delay:%ld ms\n",
				ct,
				((stop.tv_sec - start.tv_sec) * 1000000 + (stop.tv_nsec - start.tv_nsec) / 1000) / 1000);
			clock_gettime(CLOCK_MONOTONIC_RAW, &start);
		}
	}
	return rand();
}
*/
import "C"

type callRet struct {
	ret int
}

type callIn struct {
	ret_chan chan *callRet
	arg1     int
}

var delta_t_ms float64

func caller(call_in_c chan *callIn, arg1 int) int {
	ret_c := make(chan *callRet, 1)
	ci := callIn{ret_c, arg1}
	t1 := time.Now().UnixNano()
	call_in_c <- &ci
	ret := <-ret_c
	t2 := time.Now().UnixNano()
	_, _ = t1, t2
	if float64(t2-t1)/1e6 > delta_t_ms {
		delta_t_ms = float64(t2-t1) / 1e6
	}
	//fmt.Println(t2, t1, float64(t2-t1)/1e6)
	return ret.ret
}

func call_srv(call_in_c chan *callIn) {
	//runtime.LockOSThread()
	ct := 0
	for {
		select {
		case in := <-call_in_c:
			ret_c := in.ret_chan
			ret := callRet{3 + in.arg1}
			ret_c <- &ret
			ct++
			if ct%10000 == 0 {
				fmt.Println("channel_op_ct:", ct, " max_channel_op_delay:", delta_t_ms, "ms")
			}
			//default:
			//	time.Sleep(1 * time.Millisecond)
		}
	}
	//runtime.UnlockOSThread()
}

func init() {
	//runtime.LockOSThread()
}

func main() {
	p := fmt.Println
	runtime.GOMAXPROCS(2)

	call_in_c := make(chan *callIn, 2000)

	fp := func(call_in_c chan *callIn) {
		ct := 0
		for ; ct < 10*60000; ct = ct + 1 {
			caller(call_in_c, 1)
			time.Sleep(time.Microsecond / 1000)
		}
		p("done:)")
	}
	go fp(call_in_c)
	//go fp(call_in_c)
	go func() {
		for {
			fmt.Println("heartbeat... ", "max_channel_op_delay:", delta_t_ms, "ms")
			time.Sleep(1000 * 1000 * time.Microsecond)
		}
	}()

	godeadloop := func() {
		ct := 0
		t1 := time.Now().UnixNano()
		for {
			ct++
			if ct%360000000 == 0 {
				t2 := time.Now().UnixNano()
				fmt.Println("hooray", t2, t1, float64(t2-t1)/1e6)
				t1 = t2
			}
			if ct%360000000 == 0 {
				runtime.Gosched()
			}
		}
	}
	_ = godeadloop
	_ = C.deadloop
	go C.deadloop()
	//go godeadloop()    // <<---------
	call_srv(call_in_c)
	return
}

After `go run`, it yields:

CPU intensive code in cgo:
 
heartbeat...  max_channel_op_delay: 0.81509 ms
channel_op_ct: 590000  max_channel_op_delay: 0.81509 ms
channel_op_ct: 600000  max_channel_op_delay: 0.81509 ms
done:)
cgo-> ct:3200000000 | c loop delay:1039 ms
heartbeat...  max_channel_op_delay: 0.81509 ms

And after changing 'go C.deadloop()' to 'go godeadloop()' (at the very
bottom of the code) it yields:

CPU intensive code in pure go code:
 
hooray 1498049095239314392 1498049094319306332 920.00806 ms (go dead loop)
channel_op_ct: 600000  max_channel_op_delay: 970.921145 ms
done:)
heartbeat...  max_channel_op_delay: 970.921145 ms
hooray 1498049096158299639 1498049095239314392 918.985247

You see! The latency jumps from 0.8 ms to 970 ms when the CPU-intensive
code is in Go, compared to offloading it to cgo directly. And this kind of penalty
could definitely cause instability in a Go application that has some really
massive work to handle in pure Go code.

Chris Hopkins

Jun 21, 2017, 9:12:49 AM
to golang-nuts, 0xj...@gmail.com

Particularly, delays should be implemented using the functions in the time
package, not by burning CPU cycles at the expense of other useful work 
that could have been done instead.
 
Sure this kinda of dead loop is just an simulation about some really CPU intensive jobs.
And I think for those kind of job, the delay about range [20 ms- inf ms) is quite ordinary.
Thus I think that kind of writing is acceptable :D

It seems really pointless, if you've got lots of work to do, to have to sit in a delay.
Would not a call to runtime.Gosched() be more efficient? That way you accept that you yield to other procs, but still allow yourself to be rescheduled immediately if there is nothing else in the queue.

Ronald

Jun 21, 2017, 9:37:59 AM
to golang-nuts, 0xj...@gmail.com


It seems really pointless if you've got lots of work to do to have to sit in a delay.

Some CPU-intensive applications do indeed sit there and do massive mathematical calculation and memory manipulation. There are too many examples; say, some de/en-coding worker.
 
 Would not a call to runtime.Gosched() be more efficient? That way you accept that you yield to other procs, but still allow yourself to be rescheduled immediately if there is nothing else in the queue.

That's the point.

The more runtime.Gosched() calls you put into your
CPU-intensive Go code, the lower the latency can be, but the
CPU cost of the core Go runtime increases unwantedly at the
same time (from the application's point of view, the cost of
the core runtime is the less the better).
So there is an unresolved conflict.

But as there are still many GOMAXPROCS left idle, I
cannot see any reason for the Go runtime to suspend all the
other goroutines just because one 'indecent' CPU-intensive
goroutine is fully occupying a single OS thread. There must be
something that could be done inside the Go scheduler to optimize
such scenarios.

Jesper Louis Andersen

Jun 21, 2017, 9:41:43 AM
to Ronald, golang-nuts
On Wed, Jun 21, 2017 at 5:40 AM Ronald <yunth...@gmail.com> wrote:
Hi, I found when running this code below (play):

You are looking at a property of a specific Go implementation, not a property of the language.

Any system must define a set of "preemption points" at which you can switch out one process for another. Historically Go only preempted on channel communication and (blocking) system calls. Later it also added preemption on function calls (it piggybacks on stack checks). Adding more preemption requires you to add checks inside loops so a loop eventually gets preempted, but the caveat is that this might slow your program down when you have a tight loop for which the compiler can prove no eventual bound.

The current best workaround is to make sure that lengthy work periodically hits a preemption point some way or the other. Later versions of Go might solve this problem by adding those checks itself automatically to your programs to improve their latencies.

Other concurrent languages use different methods:

Erlang preempts like Go does currently, but since loops are implemented with recursion, you are sure to hit a preemption point every time you go round the loop.

Haskell preempts on memory allocation and Haskell programs tend to allocate memory quite frequently. The exception is tight (fusioned) loop code on arrays.

Go is doing a *far* better job here than almost every other programming language out there when it comes to fair scheduling. And there is a somewhat simple workaround: just add a function call occasionally depending on your required latency bound.



Sen Han

Jun 21, 2017, 10:21:05 AM
to golan...@googlegroups.com
> just add a function call occasionally depending on your required latency bound

Thanks Jesper for all your detailed explanation. I quite
agree with your point of view and conclusion.

One thing left to discuss here is the Go
scheduler's unwanted behaviour.

Unwanted behaviour of the scheduler:

After running the code at the top of this thread with `go run`, we get:

$ time go run na.go
heartbeat...
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
heartbeat...
12000
13000
heartbeat...
14000
15000
16000
17000
18000
19000
heartbeat...
^Csignal: interrupt
real    0m0.779s
user    0m0.720s
sys     0m0.152s

(The process is stuck here forever; no new output appears.)

As runtime.GOMAXPROCS is 5 now, why can't the scheduler just
move the several goroutines that are already pending to the 4
remaining idle 'slots' and let them run, instead of suspending
forever? I think there is probably some design flaw here.


Chris Hines

Jun 21, 2017, 3:30:05 PM
to golang-nuts

David Chase

Jun 21, 2017, 3:36:43 PM
to golang-nuts
Try compiling your code with -gcflags -d=ssa/insert_resched_checks/on .
You should see the scheduler thrashing stop, at the expense of some loop overhead.
(The simpler the loop, the worse the overhead, that is why this phase is not normally enabled.
We're working on the overhead issues, they've not been easy so far.)

Keith Randall

Jun 21, 2017, 8:37:41 PM
to golang-nuts
The reason why the other goroutines can't run is garbage collection.
Because there is one goroutine that's not responding to the runtime's stop request, the runtime can't finish the garbage collection. That outstanding goroutine might have pointers to otherwise unreachable objects.  The runtime can't look at the pointers a goroutine has until it can get the goroutine to stop.
The runtime can't let the other goroutines keep running at this point.  It needs to finish the garbage collection so that it can reclaim memory. Otherwise your program's pathological behavior would switch from hanging to OOMing.

Ronald

Jun 25, 2017, 8:34:22 AM
to golang-nuts
Thanks to all of you :)

Recently I implemented a sub-option in `go tool trace` named "diagreedy",
which diagnoses and finds the greediest goroutines. This tool has already
helped us track down several deeply hidden problems in our Go applications
and achieve more stable and shorter GC pause latency by fixing them afterwards.

If you are interested in this little tool, see the repo and the Go proposal issue.

All the best