Goroutine scheduled 10 seconds too late

mic...@steelcode.com

Aug 22, 2019, 8:09:50 PM
to golang-nuts
I have a fairly complex program that has about 150 goroutines servicing requests. Some requests go out to cgo, which does some network IO. There is one goroutine that is responsible for a heartbeat (etcd keepalive) and sometimes (~every two weeks) this goroutine doesn't get scheduled for long periods of time (more than ten seconds), so it misses the heartbeat.

I've been debugging this on and off for several months, and am completely sure it relates to scheduling because there is code that looks similar to this:

fmt.Printf("before %v\n",time.Now())
<-time.After(500*time.Millisecond)
fmt
.Printf("after %v\n",time.Now())

And the print statements show times more than ten seconds apart.

I considered that it might be a very long GC stop-the-world pause (maybe some rogue unpreemptible goroutine), but I don't see any output from gctrace=1 that indicates that (I could be wrong).

The goroutines in cgo can take some time to complete their operations (normally tens of milliseconds, sometimes tens of seconds), but I have never observed more than 200 OS threads being created in response to the blocking M being moved off the P and a new M being created.

The total CPU usage doesn't exceed 60%, so there are spare CPU cycles.

I turned on the scheduler trace, and here is the output from a problematic event:
# this is where we select on <-time.After(500ms)
[etcddebug] start of select 2019-08-01 06:29:55.873680674 +0000 UTC m=+132838.232219823
SCHED 132838304ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=102 runqueue=2 [0 0 3 0 0 0 0 0]
[INFO] 2019-08-01T06:29:56.084Z <some log output from another goroutine>
SCHED 132839312ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=105 runqueue=9 [0 0 0 0 5 0 0 0]
<some output from another goroutine>
SCHED 132840318ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=106 runqueue=12 [0 0 0 0 5 0 0 0]
SCHED 132841325ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=106 runqueue=12 [0 0 0 0 5 0 0 0]
SCHED 132842331ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=13 [0 0 0 0 5 0 0 0]
SCHED 132843338ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=13 [0 0 0 0 5 0 0 0]
SCHED 132844344ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=13 [0 0 0 0 5 0 0 0]
SCHED 132845351ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=14 [0 0 0 0 5 0 0 0]
SCHED 132846357ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=14 [0 0 0 0 5 0 0 0]
SCHED 132847364ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=107 runqueue=15 [0 0 0 0 5 0 0 0]
# This is where the program gets killed because the <-time.After goroutine never ran
[ERROR] 2019-08-01T06:30:05.472Z clustering.go:259 ETCD LEASE CHANNEL CLOSED


The program panics soon after that message (on purpose), and I get this printout from the sched trace. I'm not sure what triggers it.

SCHED 132847848ms: gomaxprocs=8 idleprocs=0 threads=123 spinningthreads=0 idlethreads=97 runqueue=2 gcwaiting=0 nmidlelocked=1 stopwait=0 sysmonwait=0
P0: status=2 schedtick=714575223 syscalltick=286704429 m=-1 runqsize=0 gfreecnt=41
# this is the <-time.After goroutine finally running
[etcddebug] time.After 2019-08-01 06:30:05.473046143 +0000 UTC m=+132847.831585336 (227)
P1: status=1 schedtick=714230235 syscalltick=286072819 m=36 runqsize=0 gfreecnt=11
P2: status=2 schedtick=710948022 syscalltick=284960027 m=-1 runqsize=0 gfreecnt=53
P3: status=1 schedtick=707003392 syscalltick=283311908 m=88 runqsize=1 gfreecnt=53
P4: status=2 schedtick=698758214 syscalltick=280825500 m=-1 runqsize=0 gfreecnt=23
P5: status=2 schedtick=690236090 syscalltick=278745170 m=-1 runqsize=0 gfreecnt=38
P6: status=1 schedtick=685940734 syscalltick=276797704 m=77 runqsize=0 gfreecnt=51
P7: status=2 schedtick=681779831 syscalltick=275239769 m=-1 runqsize=0 gfreecnt=50



I've tried several times to create a simple reproducing program, and have failed. I'm reaching out to ask if anyone could help me interpret the output from the scheduling trace, or perhaps even suggest what might be causing this, so I can create a standalone reproducing program.

Thanks
Michael

Ian Lance Taylor

Aug 23, 2019, 1:23:21 AM
to Michael Andersen, golang-nuts
One possibility is an unpreemptible loop. Can you get a stack trace at
the point where the timeout fails? Perhaps where you print ERROR
above you could call runtime.Stack(buf, true) and print the output.
Look at the code that you see from that output and see if any of it
might be in an unpreemptible loop.
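
A minimal sketch of that idea (the helper name, buffer size, and use of the standard log package are illustrative assumptions, not details of the program in question):

package main

import (
    "log"
    "runtime"
)

// dumpAllStacks logs the stack of every goroutine in the process.
func dumpAllStacks() {
    buf := make([]byte, 1<<20) // enlarge if the dump comes back truncated
    n := runtime.Stack(buf, true)
    log.Printf("all goroutine stacks:\n%s", buf[:n])
}

func main() {
    // In the real program this would be called where the ERROR line is logged.
    dumpAllStacks()
}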

Ian

Michael Andersen

Aug 23, 2019, 11:58:44 AM
to Ian Lance Taylor, golang-nuts
Thanks, that's a good idea. I'll add that in. It can take a while to reproduce, so it might be a week or so before I have the results.

Ronny Bangsund

Aug 23, 2019, 4:23:22 PM
to golang-nuts
On Friday, August 23, 2019 at 5:58:44 PM UTC+2, Michael Andersen wrote:
> It can take a while to reproduce, so it might be a week or so before I have the results.
Can you think of ways to cause the same memory pressure more quickly? Mocking data, running the functions the number of times they would have run in a two-week period, etc.

Michael Andersen

Aug 23, 2019, 5:03:31 PM
to Ronny Bangsund, golang-nuts
It's not memory pressure: there's a ton of free memory and I'm not GC'ing particularly often. I've tried to create a smaller reproducing example but haven't had any luck so far. I have the servers hooked up to a synthetic load, so there are always exactly the same number of requests being made, at the same rate, and it sometimes happens in a day, sometimes in two weeks. Fortunately this is in a development cluster, not production. Incidentally, it's happened in several clusters, some in the cloud and some on bare metal, so it's unlikely to be a hardware problem.

I've added the stack print, as well as a runtime.schedtrace(true) at the point of the error, so hopefully I will have something to work from the next time it happens.


Robert Engels

Aug 23, 2019, 5:12:53 PM
to Michael Andersen, Ronny Bangsund, golang-nuts
If you write a long-running non-preemptible loop, all goroutines are affected.

Michael Andersen

Aug 23, 2019, 5:30:27 PM
to Ronny Bangsund, golang-nuts
On Fri, Aug 23, 2019 at 2:12 PM Robert Engels <ren...@ix.netcom.com> wrote:
> If you write a long-running non-preemptible loop, all goroutines are affected.


Yeah, I've come across the case before where GC would begin a stop-the-world but get stuck waiting for a non-preemptible goroutine. I don't think that's happening here because when the program gets "unstuck" as the error gets printed, there is no gctrace output indicating that a GC happened. I can make sure by adding a dummy goroutine that just logs output every second. That would make it more obvious that more than one goroutine is being affected. Thanks for the suggestion.

Are you suggesting that there might be enough unpreemptible goroutines to fill all the P's? I do have several cgo goroutines sitting in syscalls, but my understanding was that the scheduler would "preempt" that by moving the M off the P and creating a new M.

Thanks

Ian Lance Taylor

Aug 23, 2019, 6:33:03 PM
to Michael Andersen, Ronny Bangsund, golang-nuts
On Fri, Aug 23, 2019 at 2:30 PM Michael Andersen <mic...@steelcode.com> wrote:
>
> Are you suggesting that there might be enough unpreemptible goroutines to fill all the P's? I do have several cgo goroutines sitting in syscalls, but my understanding was that the scheduler would "preempt" that by moving the M off the P and creating a new M.

That is how it is supposed to work, yes. That process can take up to
10ms, but that doesn't explain what you are seeing.

Ian

Michael Andersen

Aug 23, 2019, 10:36:46 PM
to Ian Lance Taylor, Ronny Bangsund, golang-nuts
Ok, so I have more information, and it's not what I would expect.

I added scheddetail=1,schedtrace=2000 so that I had a list of which M's and G's were on the P's during the 5 seconds that scheduling stalled. I added a sentinel goroutine that sleeps 1 second in a loop and panics if the sleep takes more than 4 seconds. As soon as the sentinel wakes up after the stall, it dumps all the stacks for all the goroutines under the assumption that at least some of them might be in the same place as they were during the stall. Thus, I know for sure which goroutines caused the problem and maybe which piece of the code they were in.
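
A sketch of roughly that sentinel (the thresholds, buffer size, and panic message here are illustrative; the real code may differ):

package main

import (
    "fmt"
    "runtime"
    "time"
)

// sentinel sleeps for one second in a loop and panics if the sleep ever takes
// more than four seconds to return, i.e. if this goroutine was not rescheduled
// in time. The panic message carries a dump of every goroutine's stack, taken
// as soon as the sentinel finally woke up again.
func sentinel() {
    for {
        start := time.Now()
        time.Sleep(1 * time.Second)
        if elapsed := time.Since(start); elapsed > 4*time.Second {
            buf := make([]byte, 1<<20)
            n := runtime.Stack(buf, true)
            panic(fmt.Sprintf("sentinel stalled for %v; goroutine stacks:\n%s", elapsed, buf[:n]))
        }
    }
}

func main() {
    go sentinel()
    select {} // stand-in for the real program's work
}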

During the stall, all of the P's had the same M's and the same G's, so nothing was changing except the idlethreads and runqueues (as I would expect because my sentinel goroutine didn't get run). Here is a compilation of the information captured during the stall.

SCHED 3425490ms: gomaxprocs=8 idleprocs=0 threads=52 spinningthreads=0 idlethreads=35 runqueue=8 gcwaiting=0 nmidlelocked=1 stopwait=0 sysmonwait=0
 < same M's on the same P's with the same curg's as below >
SCHED 3427497ms: gomaxprocs=8 idleprocs=0 threads=52 spinningthreads=0 idlethreads=36 runqueue=9 gcwaiting=0 nmidlelocked=1 stopwait=0 sysmonwait=0
  P0: status=1 schedtick=22978494 syscalltick=6811602 m=44 runqsize=0 gfreecnt=2
  M44: p=0 curg=12934173 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934173: status=2(chan receive) m=44 lockedm=-1

  P1: status=1 schedtick=22848756 syscalltick=6851102 m=34 runqsize=0 gfreecnt=32
  M34: p=1 curg=12934084 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934084: status=2(chan receive) m=34 lockedm=-1
   
  P2: status=1 schedtick=22743131 syscalltick=6730185 m=17 runqsize=0 gfreecnt=37
  M17: p=2 curg=12934032 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934032: status=2(chan receive) m=17 lockedm=-1
   
  P3: status=1 schedtick=22674653 syscalltick=6516472 m=11 runqsize=0 gfreecnt=20
  M11: p=3 curg=12934147 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934147: status=2(chan receive) m=11 lockedm=-1
   
  P4: status=1 schedtick=22693327 syscalltick=6322345 m=33 runqsize=0 gfreecnt=12
  M33: p=4 curg=12934059 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934059: status=2(chan receive) m=33 lockedm=-1
   
  P5: status=1 schedtick=22370152 syscalltick=6219903 m=0 runqsize=0 gfreecnt=39
  M0: p=5 curg=12934131 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934131: status=2(chan receive) m=0 lockedm=-1
   
  P6: status=1 schedtick=21910259 syscalltick=6171748 m=51 runqsize=0 gfreecnt=39
  M51: p=6 curg=12934126 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934126: status=2(chan receive) m=51 lockedm=-1
   
  P7: status=1 schedtick=22019793 syscalltick=6091894 m=32 runqsize=1 gfreecnt=30
  M32: p=7 curg=12934191 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G12934191: status=2(chan receive) m=32 lockedm=-1

After the sentinel goroutine was scheduled (~1 second after this schedtrace output) I captured the stacks and cross-referenced the G's. All of them are in time.Now():

  goroutine 12934173 [runnable]:
  time.Now(0x7f453a441dc0, 0xc0169fe000, 0x5a0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--

  goroutine 12934084 [runnable]:
  time.Now(0x7f4614a65400, 0xc0155a2000, 0x1680)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--

  goroutine 12934032 [runnable]:
  time.Now(0x7f457d74ae20, 0xc023179400, 0x3c0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--
   
  goroutine 12934147 [runnable]:
  time.Now(0x7f461fcca4c0, 0xc00cf38400, 0x3c0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--

  goroutine 12934059 [runnable]:
  time.Now(0x7f45ff9dcd00, 0xc0245b1000, 0xf00)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--
   
  goroutine 12934131 [runnable]:
  time.Now(0x7f4513453e80, 0xc02a96c000, 0x5a0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--
   
  goroutine 12934126 [runnable]:
  time.Now(0x7f457fd46e60, 0xc016889200, 0x5a0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--
   
  goroutine 12934191 [runnable]:
  time.Now(0x7f45088d6b80, 0xc01770ec00, 0x5a0)
  /usr/local/go/src/time/time.go:1087 +0xb2
  -- line in request handler in my app--

I went through the path that a goroutine takes to get to that time.Now() line and it flows from gRPC through to some simple sanitization. There are function calls sprinkled everywhere and no loops that would be unpreemptible, so I don't think they all got stuck in some tight loop before the time.Now() and then somehow all proceeded a few lines so that they ALL were on time.Now() when I dumped the stacks. I think they were stuck in time.Now() during the scheduling stall.

This is running in Kubernetes on AWS, although I have seen it on a bare-metal Kubernetes cluster too.

Any ideas on where to go from here?


Michael Andersen

Aug 23, 2019, 10:37:45 PM
to Ian Lance Taylor, Ronny Bangsund, golang-nuts
This is built with Go 1.12.5, incidentally, but I have seen this on several Go versions spanning several months.

Steven Hartland

Aug 24, 2019, 4:28:44 AM
to Michael Andersen, Ian Lance Taylor, Ronny Bangsund, golang-nuts
Out of interest, what OS?
Is the machine virtualised?

You might want to try doing an OS syscall trace to see if it’s stuck in an OS call for some reason.

  Regards
  Steve 


Ian Lance Taylor

Aug 24, 2019, 5:26:01 PM
to Steven Hartland, Michael Andersen, Ronny Bangsund, golang-nuts
On Sat, Aug 24, 2019 at 1:28 AM Steven Hartland <ste...@multiplay.co.uk> wrote:
>
> Out of interest what OS?
> Is the machine virtualised?
>
> You might want to try doing an OS syscall trace to see if it’s stuck in an OS call for some reason.

Also, are you running a kernel with VDSO enabled? With VDSO,
`time.Now` should not be making any system calls. It's hard to
understand why it would block. Although the VDSO code does loop, and
perhaps virtualization affects that in some way that I'm not aware of.

Does your code call `time.Now` a lot?

Ian

Michael Andersen

Aug 25, 2019, 12:24:37 AM
to Ian Lance Taylor, Steven Hartland, Ronny Bangsund, golang-nuts


> Out of interest what OS?
> Is the machine virtualised?

Yes, this is on EC2, on m5.2xlarge (which are Nitro instances with a vDSO-supporting clock). I am running Linux 4.9.0.


> Also, are you running a kernel with VDSO enabled?  With VDSO,
> `time.Now` should not be making any system calls.  It's hard to
> understand why it would block.  Although the VDSO code does loop, and
> perhaps virtualization affects that in some way that I'm not aware of.

I believe vDSO is enabled. I ran this check:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
$ strace -e gettimeofday,clock_gettime -- date >/dev/null
+++ exited with 0 +++

and there are no syscalls observed by strace.


> Does your code call `time.Now` a lot?

Yes, about 100k times a second. I'll look into reducing that, but even if that solves the problem, I would still be curious as to why it's happening.

I'll dig into this a bit more on Monday.

Michael Andersen

Aug 26, 2019, 5:12:18 PM
to Ian Lance Taylor, Steven Hartland, Ronny Bangsund, golang-nuts
Hi

I was concerned that perhaps the stacks that I dumped after the scheduling stall did not reflect the actual PC's of the goroutines during the stall, so I modified schedtrace to dump out g.sched.pc and g.m.vdsoPC for each G currently on the P at the time schedtrace prints.

It just occurred again a few minutes ago. Previously, all the P's were running the problematic code, so other things weren't scheduled. This time it appears that only a few P's (4 of 8) were running the problematic code, but the GC was trying to stop the world, so even though there were P's with nothing on them, nothing else ran. All the PC's on the problematic goroutines (those with status=_Prunning) were static for the 3 seconds they stalled. They all had the same PC's:

g.sched.pc was 0x43389f which maps to: Line 302 of "/usr/local/go/src/runtime/proc.go" starts at address 0x43389f <runtime.gopark+239> and ends at 0x4338b7 <runtime.gopark+263>.
g.m.vdsoPC was 0x450c68 which maps to: Line 17 of "/usr/local/go/src/runtime/timestub.go" starts at address 0x450c63 <time.now+51> and ends at 0x450c8d <time.now+93>.

In case there is more useful information that I missed, I attached the log output.

I'm trying to make a simpler program that reproduces this, but thus far running time.Now in a tight loop across multiple goroutines (roughly the shape sketched below) isn't causing it. Any ideas what I could try next?
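
For reference, the kind of loop I mean (the goroutine count, thresholds, and output here are illustrative, not the exact test program):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    // Spin time.Now in a tight loop on every P.
    for i := 0; i < runtime.GOMAXPROCS(0); i++ {
        go func() {
            for {
                _ = time.Now()
            }
        }()
    }
    // Watchdog: report if a 500ms sleep ever takes far longer than expected.
    for {
        start := time.Now()
        time.Sleep(500 * time.Millisecond)
        if d := time.Since(start); d > 5*time.Second {
            fmt.Printf("stall detected: 500ms sleep took %v\n", d)
        }
    }
}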

Thanks
Michael


stall_log.txt

Michael Andersen

Aug 26, 2019, 5:16:08 PM
to Ian Lance Taylor, Steven Hartland, Ronny Bangsund, golang-nuts
Incidentally, it just happened again without involving a GC stop-the-world: all 8 P's were running and had the same program counters for 4 seconds. They are the same as those in the previous email.

Ian Lance Taylor

Aug 27, 2019, 10:06:01 AM
to Michael Andersen, Steven Hartland, Ronny Bangsund, golang-nuts
On Mon, Aug 26, 2019 at 2:12 PM Michael Andersen <mic...@steelcode.com> wrote:
>
> I was concerned that perhaps the stacks that I dumped after the scheduling stall did not reflect the actual PC's of the goroutines during the stall, so I modified schedtrace to dump out g.sched.pc and g.m.vdsoPC for each G currently on the P at the time schedtrace prints.
>
> It just occurred again a few minutes ago. Previously, all the P's were running the problematic code, so other things weren't scheduled. This time it appears that only a few P's (4 of 8) were running the problematic code but the GC was trying to stop the world so even though there are P's with nothing on them, nothing else ran. All the PC's on the problematic goroutines (those with status=_Prunning) were static for the 3 seconds they stalled. They all had the same PC's:
>
> g.sched.pc was 0x43389f which maps to: Line 302 of "/usr/local/go/src/runtime/proc.go" starts at address 0x43389f <runtime.gopark+239> and ends at 0x4338b7 <runtime.gopark+263>.
> g.m.vdsoPC was 0x450c68 which maps to: Line 17 of "/usr/local/go/src/runtime/timestub.go" starts at address 0x450c63 <time.now+51> and ends at 0x450c8d <time.now+93>.
>
> In case there is more useful information that I missed, I attached the log output.

g.sched.pc doesn't tell us much about the current state. It tells us
the PC value of the goroutine the last time the goroutine blocked.
What you are seeing is that the goroutine last blocked in gopark. That is not
surprising. What we don't know is what the goroutine is currently
doing. Similarly, vdsoPC tells us the PC value the last time the
goroutine went into a VDSO call. Having that be time.now is not
surprising. It doesn't tell us anything about what the goroutine is
doing now. Also note that vdsoPC is only meaningful if vdsoSP != 0.


> I'm trying to make a simpler program that reproduces this, but thus far running time.Now in a tight loop across multiple goroutines isn't causing it. Any ideas what I could try next?

If it's true that some goroutines are simply looping, then the only
way to get their current PC is to crash the program with SIGQUIT and
look at the stack trace. The runtime doesn't have any way to ask a
goroutine what the currently executing PC is.

Ian

Robert Engels

Aug 27, 2019, 10:31:09 AM
to Ian Lance Taylor, Michael Andersen, Steven Hartland, Ronny Bangsund, golang-nuts
I suggest using pstack to capture the stacks of all threads if you can. Because it is external, it is not subject to the stopping bias of the internal Go facilities.

Michael Andersen

Aug 27, 2019, 12:50:01 PM
to Robert Engels, Ian Lance Taylor, Steven Hartland, Ronny Bangsund, golang-nuts
> If it's true that some goroutines are simply looping, then the only
> way to get their current PC is to crash the program with SIGQUIT and
> look at the stack trace.  The runtime doesn't have any way to ask a
> goroutine what the currently executing PC is.

> I suggest using pstack to capture the stacks of all threads if you can. Because it is external, it is not subject to the stopping bias of the internal Go facilities.

Thanks for the suggestions. I'll set up something external that sends SIGQUIT during the stall, and add an external gstack in a loop.
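
A sketch of what such an external watcher could look like (entirely hypothetical: it assumes the monitored process touches /tmp/heartbeat every second from its sentinel goroutine, and it takes the target PID as its only argument):

package main

import (
    "log"
    "os"
    "strconv"
    "syscall"
    "time"
)

func main() {
    if len(os.Args) != 2 {
        log.Fatal("usage: watcher <pid>")
    }
    pid, err := strconv.Atoi(os.Args[1])
    if err != nil {
        log.Fatalf("bad pid: %v", err)
    }
    for {
        time.Sleep(1 * time.Second)
        fi, err := os.Stat("/tmp/heartbeat")
        if err != nil {
            continue
        }
        if time.Since(fi.ModTime()) > 5*time.Second {
            // SIGQUIT makes the Go runtime dump all goroutine stacks and exit.
            log.Printf("heartbeat is stale; sending SIGQUIT to pid %d", pid)
            syscall.Kill(pid, syscall.SIGQUIT)
            return
        }
    }
}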