goroutine preemption


Russ Cox

Jul 18, 2013, 10:59:56 AM
to golang-dev
This thread is about making preemption of goroutines work.

As background, Dmitriy has written and submitted some CLs that can preempt running goroutines, such as for garbage collection. The goroutines are modified to look like they have no stack left, and then on the next stack split check they jump into the "I need more stack" handler, which implements the preemption. So there is no additional cost beyond the current stack splits. Unfortunately, every time we turn it on, something new breaks. This thread is a place to record the status of trying to make it work.
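
A minimal sketch of that trick, written in Go rather than the runtime's C and assembly. The type `g`, its fields, and the sentinel value are hypothetical stand-ins; only the idea (poison the stack guard so the ordinary prologue check diverts into the more-stack handler) is taken from the description above.

```go
package main

import "fmt"

// stackPreempt is a hypothetical sentinel: larger than any real stack
// pointer, so the prologue comparison below always fails once it is set.
const stackPreempt = ^uintptr(0) - 1313

// g is a toy stand-in for the runtime's per-goroutine structure.
type g struct {
	stackLimit uintptr // real lower bound of the stack
	stackGuard uintptr // value the prologue compares against
	preempted  int     // how many times the goroutine yielded
	grown      int     // how many times the stack really had to grow
}

// requestPreempt is what another thread (for example the GC) would do:
// make the goroutine look as if it has no stack left.
func (gp *g) requestPreempt() { gp.stackGuard = stackPreempt }

// prologue models the stack split check at function entry. It is the only
// check on the fast path, so preemption adds no extra cost there.
func (gp *g) prologue(sp uintptr) {
	if sp <= gp.stackGuard {
		gp.moreStack()
	}
}

// moreStack models the "I need more stack" handler, which tells a
// preemption request apart from a genuine need for more stack.
func (gp *g) moreStack() {
	if gp.stackGuard == stackPreempt {
		gp.stackGuard = gp.stackLimit // restore the real guard
		gp.preempted++                // a real runtime would reschedule here
		return
	}
	gp.grown++
}

func main() {
	gp := &g{stackLimit: 0x1000, stackGuard: 0x1000}
	gp.prologue(0x2000) // plenty of stack: nothing happens
	gp.requestPreempt()
	gp.prologue(0x2000) // same sp, but the poisoned guard diverts the call
	gp.prologue(0x2000) // guard restored: fast path again
	fmt.Println("preempted:", gp.preempted, "grown:", gp.grown)
}
```

The point of the design is that a goroutine that never enters a function never runs the check, so it is never diverted.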

When we turned it on yesterday, the builders broke in three ways.

First, deferreturn calling free was being preempted, and the runtime panicked because it did not have frame information for deferreturn. Good thing too, since deferreturn is in the middle of synthesizing a call, and until it can put down the right PC to match the arguments, the garbage collector is not going to understand what it sees. CL 11522043 is pending to fix that.

Second, the ARM builders reported a problem trying to find the frame information for _sfloat2, the software floating point handler. CL 11523043 is pending to fix that.

Third, some builders are reporting a problem trying to find the frame information for morestack, implying that they found a call to morestack on a goroutine stack (presumably not a g0 stack). I don't understand how that's possible. I made it happen just once on my Mac using GOGC=1 to trigger many preemptions, but I haven't yet been able to make it happen again. We need to track that down before trying again.

I plan to update this thread as we try again (and again...).

Russ

Dmitry Vyukov

Jul 18, 2013, 12:19:32 PM
to Russ Cox, golang-dev
There is also:
https://code.google.com/p/go/issues/detail?id=5899
Are we able to unwind through cgo callbacks?

We can use
https://codereview.appspot.com/10796043/
to shake out bugs faster. It can preempt goroutines every 20us.

Russ Cox

Jul 18, 2013, 1:29:26 PM
to Dmitry Vyukov, golang-dev
CL 11533043 should fix the morestack problem.
CL 10796043 preempts running goroutines every 10ms (if preemption is enabled at all).

Once both are submitted, we will re-enable preemption and see what breaks next.

Russ

Russ Cox

Jul 18, 2013, 10:54:39 PM
to Dmitry Vyukov, golang-dev
I wrote CL 11559043 to try to exercise preemption a bit. It makes about 1/32 of the function entry prologues see StackPreempt the first time through, which means there is a lot of yielding going on. You can make it more likely by removing 1 bits from the mask at the bottom of asm_amd64.s. That CL helped me find the settype bug (CL 11560043), and it also produced a test failure about finalizers not being called (below).
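
A rough sketch of how such a mask controls the rate, in Go. It assumes the decision is made by masking a counter of function entries; the helper names are hypothetical and the CL's actual assembly is not reproduced here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// forcePreempt reports whether the n-th function entry should be made to
// see the preemption sentinel. Hypothetical helper: an entry is chosen
// when all masked bits of the counter are zero.
func forcePreempt(n, mask uint32) bool { return n&mask == 0 }

// rate counts how many of the first total entries would be chosen.
func rate(mask, total uint32) int {
	hits := 0
	for n := uint32(0); n < total; n++ {
		if forcePreempt(n, mask) {
			hits++
		}
	}
	return hits
}

func main() {
	// Each 1 bit removed from the mask doubles the fraction of entries hit.
	for _, mask := range []uint32{0x1f, 0x0f, 0x03} {
		fmt.Printf("mask %#x (%d one bits): %d of 1024 entries\n",
			mask, bits.OnesCount32(mask), rate(mask, 1024))
	}
}
```

With five 1 bits in the mask, one entry in 32 is chosen, which matches the "about 1/32" figure.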

Is it possible that if the finalizer goroutine reschedules it doesn't get as much time to finalize as it should? Do we care?

Russ


run                  fixedbugs/issue5493.go: incorrect output
1 out of 10 finalizer are called
panic: not all finalizers are called

goroutine 1 [running]:
runtime.panic(0x25760, 0x2100ec190)
/Users/rsc/g/go/src/pkg/runtime/panic.c:265 +0xbb
main.main()
/Users/rsc/g/go/test/fixedbugs/issue5493.go:55 +0x1a1
runtime.main()
/Users/rsc/g/go/src/pkg/runtime/proc.c:182 +0x99
runtime.goexit()
/Users/rsc/g/go/src/pkg/runtime/proc.c:1296

goroutine 2 [syscall]:
runtime.MHeap_Scavenger()
/Users/rsc/g/go/src/pkg/runtime/mheap.c:464 +0x98
runtime.goexit()
/Users/rsc/g/go/src/pkg/runtime/proc.c:1296
created by runtime.main
/Users/rsc/g/go/src/pkg/runtime/proc.c:165

goroutine 24 [finalizer wait]:
runtime.park(0x7d30, 0x7c438, 0x7b53c)
/Users/rsc/g/go/src/pkg/runtime/proc.c:1244 +0x6b
runfinq()
/Users/rsc/g/go/src/pkg/runtime/mgc0.c:2280 +0x89
runtime.goexit()
/Users/rsc/g/go/src/pkg/runtime/proc.c:1296
created by runtime.gc
/Users/rsc/g/go/src/pkg/runtime/mgc0.c:1968

goroutine 23 [timer goroutine (idle)]:
runtime.park(0x7d30, 0x7c640, 0x78e26)
/Users/rsc/g/go/src/pkg/runtime/proc.c:1244 +0x6b
timerproc()
/Users/rsc/g/go/src/pkg/runtime/time.goc:210 +0x80
runtime.goexit()
/Users/rsc/g/go/src/pkg/runtime/proc.c:1296
created by addtimer
/Users/rsc/g/go/src/pkg/runtime/time.goc:90
exit status 2

Dmitry Vyukov

Jul 19, 2013, 6:38:19 AM
to Russ Cox, golang-dev
There is another problem I've discovered:

If a runtime goroutine does this:
runtime·entersyscallblock();
runtime·notetsleep(&note, tick);
runtime·exitsyscall();

It's not preempted in notetsleep(), because it's not Grunning state,
but it's resumed with:
runtime·gogo(&gp->sched); // never return

which does:
MOVQ $0, gobuf_sp(BX) // clear to help garbage collector
MOVQ $0, gobuf_ret(BX)
MOVQ $0, gobuf_ctxt(BX)

So the goroutine is in Gsyscall but with sched.sp=0.
If another goroutine tries to trace it back, traceback will crash
trying to obtain the return address from sp=0.
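
A toy model of the hazard in Go, for illustration only. The types and function bodies are hypothetical simplifications of the runtime's C and assembly, and the toy traceback returns an error where the real one would crash.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	gRunning = iota
	gSyscall
)

// gobuf is a toy version of the saved scheduling context.
type gobuf struct{ sp, pc uintptr }

type g struct {
	status int
	sched  gobuf
}

// gogo models resuming from a saved context: it hands back the saved
// registers and clears the saved sp, as the assembly quoted above does.
func gogo(buf *gobuf) (sp, pc uintptr) {
	sp, pc = buf.sp, buf.pc
	buf.sp = 0 // "clear to help garbage collector"
	return sp, pc
}

// traceback models another goroutine walking gp's stack. For a goroutine
// in syscall status it has nothing to go on but the saved sp.
func traceback(gp *g) error {
	if gp.status == gSyscall && gp.sched.sp == 0 {
		return errors.New("traceback: Gsyscall goroutine has sched.sp == 0")
	}
	return nil
}

func main() {
	gp := &g{status: gSyscall, sched: gobuf{sp: 0x7000, pc: 0x4000}}
	fmt.Println("before gogo:", traceback(gp)) // saved sp is still usable
	gogo(&gp.sched)                            // resumed while still in syscall status
	fmt.Println("after gogo: ", traceback(gp)) // saved sp is gone
}
```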

Dmitry Vyukov

Jul 19, 2013, 7:55:26 AM
to Russ Cox, golang-dev
On Fri, Jul 19, 2013 at 6:54 AM, Russ Cox <r...@golang.org> wrote:
> I wrote CL 11559043 to try to exercise preemption a bit. It makes about 1/32
> of the function entry prologues see StackPreempt the first time through,
> which means there is a lot of yielding going on. You can make it more likely
> by removing 1 bits from the mask at the bottom of asm_amd64.s. That CL
> helped me find the settype bug (CL 11560043), and it also produced a test
> failure about finalizers not being called (below).
>
> Is it possible that if the finalizer goroutine reschedules it doesn't get as
> much time to finalize as it should? Do we care?


There are 2 precision issues in the GC; I don't know why they are
triggered by preemption.
cl/11454044 and cl/11416046 should fix this.

Dmitry Vyukov

Jul 19, 2013, 3:39:25 PM
to Russ Cox, golang-dev
On Fri, Jul 19, 2013 at 3:55 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Fri, Jul 19, 2013 at 6:54 AM, Russ Cox <r...@golang.org> wrote:
>> I wrote CL 11559043 to try to exercise preemption a bit. It makes about 1/32
>> of the function entry prologues see StackPreempt the first time through,
>> which means there is a lot of yielding going on. You can make it more likely
>> by removing 1 bits from the mask at the bottom of asm_amd64.s. That CL
>> helped me find the settype bug (CL 11560043), and it also produced a test
>> failure about finalizers not being called (below).
>>
>> Is it possible that if the finalizer goroutine reschedules it doesn't get as
>> much time to finalize as it should? Do we care?
>
>
> there are 2 preciseness issues in GC, don't know why they are
> triggered by preemption
> cl/11454044 and cl/11416046 should fix this


Should be fixed now

Dmitry Vyukov

Jul 19, 2013, 3:48:07 PM
to Russ Cox, golang-dev
Actually this is not preemption-specific. It's just a serious bug that
was never triggered before.

If this code grows the stack:
runtime·entersyscallblock();
runtime·notetsleep(&note, tick);
runtime·exitsyscall();

newstack will call gogo and clear sched.sp, and then the next GC will crash.

We can fix newstack. But I think a better idea is to prohibit calling
split functions after entersyscall. syscall package already does this.
And in runtime we only call notesleep/notetsleep.
A goroutine can not call malloc after entersyscall (no P and no
mcache), so it can not allocate large stack frames. So it makes sense
to prohibit stack growing at all.
It will also significantly help with moving stacksegment cache from M
to P (which I want to do eventually, it will save some significant
amount of memory for large programs).

Sounds good?

Russ Cox

Jul 19, 2013, 3:57:46 PM
to Dmitry Vyukov, golang-dev
On Fri, Jul 19, 2013 at 3:48 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> Actually this is not preemption-specific. It's just a serious bug that
> was never triggered before.

There's plenty of stack, so no splits. It's not a bug: the code is written not to use more than an ordinary stack frame.
 
> We can fix newstack. But I think a better idea is to prohibit calling
> split functions after entersyscall. syscall package already does this.
> And in runtime we only call notesleep/notetsleep.

How do you propose to do this? I don't see any way to prevent it. We can definitely catch it, by having newstack check for g->status == Gsyscall and die then instead of in the next GC.

I agree that after entersyscall there should be very limited operations, and stack splits are not okay.

Russ

Dmitry Vyukov

Jul 19, 2013, 4:16:54 PM
to Russ Cox, golang-dev
On Fri, Jul 19, 2013 at 11:57 PM, Russ Cox <r...@golang.org> wrote:
> On Fri, Jul 19, 2013 at 3:48 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>>
>> Actually this is not preemption-specific. It's just a serious bug that
>> was never triggered before.
>
>
> There's plenty of stack, so no splits. It's not a bug: the code is written
> not to use more than an ordinary stack frame.
>
>>
>> We can fix newstack. But I think a better idea is to prohibit calling
>> split functions after entersyscall. syscall package already does this.
>> And in runtime we only call notesleep/notetsleep.
>
>
> How do you propose to do this? I don't see any way to prevent it.

Mark everything as textflag,7?

> We can
> definitely catch it, by having newstack check for g->status == Gsyscall and
> die then instead of in the next GC.

Moreover, we can set g->stackguard0 = StackPreempt in the end of
entersyscall (and restore in exitsyscall), so that any single call
will be instantly caught.
And the body of entersyscall must be surrounded with m->locks++/--,
because it can be preempted directly in entersyscall with
g->status=Gsyscall but invalid g->sched.
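
A minimal sketch of this detection scheme in Go, assuming toy `g` and `m` types. All names and the sentinel value are hypothetical, and the toy newstack returns an error where the real runtime would die.

```go
package main

import (
	"errors"
	"fmt"
)

const stackPreempt = ^uintptr(0) - 1313 // hypothetical sentinel value

const (
	gRunning = iota
	gSyscall
)

// m is a toy machine thread; locks > 0 means "do not preempt me".
type m struct{ locks int }

type g struct {
	status      int
	stackGuard  uintptr // real guard
	stackGuard0 uintptr // value the function prologue checks
	m           *m
}

func (gp *g) entersyscall() {
	gp.m.locks++ // no preemption while status and g.sched are inconsistent
	gp.status = gSyscall
	// ... a real runtime saves g.sched here ...
	gp.stackGuard0 = stackPreempt // any split-stack call from now on traps
	gp.m.locks--
}

func (gp *g) exitsyscall() {
	gp.status = gRunning
	gp.stackGuard0 = gp.stackGuard // restore the real guard
}

// newstack models the more-stack handler refusing to run in syscall status.
func (gp *g) newstack() error {
	if gp.status == gSyscall {
		return errors.New("stack split in syscall status")
	}
	return nil // would grow the stack or preempt here
}

// call models entering a function that has a split-stack prologue.
func (gp *g) call(sp uintptr) error {
	if sp <= gp.stackGuard0 {
		return gp.newstack()
	}
	return nil
}

func main() {
	gp := &g{stackGuard: 0x1000, stackGuard0: 0x1000, m: &m{}}
	fmt.Println("before entersyscall:", gp.call(0x2000))
	gp.entersyscall()
	fmt.Println("inside syscall:     ", gp.call(0x2000))
	gp.exitsyscall()
	fmt.Println("after exitsyscall:  ", gp.call(0x2000))
}
```

Poisoning the guard means the very first offending call is reported, rather than a crash surfacing later in an unrelated GC.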

Russ Cox

Jul 19, 2013, 4:45:06 PM
to Dmitry Vyukov, golang-dev
On Fri, Jul 19, 2013 at 4:16 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> How do you propose to do this? I don't see any way to prevent it.
>
> Mark everything as textflag,7?

Certainly, but I meant how to catch invalid code.

>> We can
>> definitely catch it, by having newstack check for g->status == Gsyscall and
>> die then instead of in the next GC.
>
> Moreover, we can set g->stackguard0 = StackPreempt in the end of
> entersyscall (and restore in exitsyscall), so that any single call
> will be instantly caught.
> And the body of entersyscall must be surrounded with m->locks++/--,
> because it can be preempted directly in entersyscall with
> g->status=Gsyscall but invalid g->sched.

Okay, send a CL.

Russ

Dmitry Vyukov

Jul 20, 2013, 9:14:49 AM
to Russ Cox, golang-dev
The approach of catching stack splits in syscalls works well:
https://codereview.appspot.com/11575044
It instantly catches notesleep/notetsleep. Once I mark them as
textflag,7, it catches the next function.

Now we only need to figure out how to fit all callchains into 128 bytes...

Russ Cox

Jul 23, 2013, 2:38:47 PM
to Dmitry Vyukov, golang-dev
Dmitriy, what's the status of preemption?

Dmitry Vyukov

Jul 24, 2013, 12:12:11 AM
to Russ Cox, golang-dev
On Tue, Jul 23, 2013 at 10:38 PM, Russ Cox <r...@golang.org> wrote:
> Dmitriy, what's the status of preemption?


I did not make much progress on the "runtime: do not allow stack
splits in syscall status" CL (https://codereview.appspot.com/11575044)
yesterday due to family reasons.
I've addressed your comments, but I still need to do other OSes, test, etc.
I am not working a full day today either, but I hope I will be able
to finish it.

I see that you've landed "runtime: reduce frame size for
runtime.cgocallback_gofunc". That's good.

Dmitry Vyukov

Jul 26, 2013, 9:18:13 AM
to Russ Cox, golang-dev
Once https://codereview.appspot.com/11575044/ lands, we are ready
to try re-enabling preemption. I've tested with cl/11575044 and
preemption every 10 us on linux/darwin/windows x 386/amd64 x
GOMAXPROCS=1/2/4. No failures.

Russ Cox

Jul 30, 2013, 3:15:30 PM
to Dmitry Vyukov, golang-dev
Preemption is enabled again, and the builders seem happy. Please let us know if you see any problems.

Thanks.
Russ

Brad Fitzpatrick

Jul 30, 2013, 3:33:00 PM
to Russ Cox, Dmitry Vyukov, golang-dev
Is this related?

http://build.golang.org/log/a70ab5e55c6a4a61d242e6385261a7e1c00dc8aa

# GOMAXPROCS=2 runtime -cpu=1,2,4
panic: test timed out

goroutine 240531 [running]:
testing.alarm()
/usr/local/go/src/pkg/testing/testing.go:577 +0x55
created by time.goFunc
/usr/local/go/src/pkg/time/sleep.go:122 +0x4a

goroutine 1 [chan receive]:
testing.RunTests(0x65fbf8, 0x82b8e0, 0x3b, 0x3b, 0x1)
/usr/local/go/src/pkg/testing/testing.go:441 +0x8b1
testing.Main(0x65fbf8, 0x82b8e0, 0x3b, 0x3b, 0x82cba0, ...)
/usr/local/go/src/pkg/testing/testing.go:372 +0x8c
main.main()
runtime/_test/_testmain.go:407 +0x9c

goroutine 38 [finalizer wait]:
runtime.park(0x40d790, 0x82d7b8, 0x82b7fc)
/tmp/gobuilder/netbsd-amd64-bsiegert-a5b5cbb9bd3d/go/src/pkg/runtime/proc.c:1280 +0x66
runfinq()
/tmp/gobuilder/netbsd-amd64-bsiegert-a5b5cbb9bd3d/go/src/pkg/runtime/mgc0.c:2288 +0x84
runtime.goexit()
/tmp/gobuilder/netbsd-amd64-bsiegert-a5b5cbb9bd3d/go/src/pkg/runtime/proc.c:1332
created by runtime.gc
/tmp/gobuilder/netbsd-amd64-bsiegert-a5b5cbb9bd3d/go/src/pkg/runtime/mgc0.c:1971

goroutine 240527 [running]:
goroutine running on other thread; stack unavailable

goroutine 240528 [running]:
goroutine running on other thread; stack unavailable

goroutine 240529 [running]:
goroutine running on other thread; stack unavailable

goroutine 240530 [running]:
goroutine running on other thread; stack unavailable

goroutine 240525 [runnable]:
runtime_test.TestPreemptionGC(0xc210091480)
/tmp/gobuilder/netbsd-amd64-bsiegert-a5b5cbb9bd3d/go/src/pkg/runtime/proc_test.go:228 +0xcd
testing.tRunner(0xc210091480, 0x82bda8)
/usr/local/go/src/pkg/testing/testing.go:360 +0x8e
created by testing.RunTests
/usr/local/go/src/pkg/testing/testing.go:440 +0x88e

goroutine 240526 [running]:
goroutine running on other thread; stack unavailable
FAIL runtime 303.268s
Build complete, duration 19m50.02555083s. Result: error: exit status 1


Brad Fitzpatrick

Jul 30, 2013, 5:39:58 PM
to Russ Cox, Dmitry Vyukov, golang-dev
This also looks bad, on linux/arm:


ok   regexp/syntax 4.819s
runtime: pc=0x1ff08 0xe59d5028
fatal error: runtime: misuse of rewindmorestack

goroutine 2 [stack split]:
runtime.notetsleepg(0x401eef9c, 0xf8475800, 0xd)
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/lock_futex.c:196 +0x40 fp=0x401eef3c
runtime.MHeap_Scavenger()
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/mheap.c:463 +0xec fp=0x401eefcc
runtime.goexit()
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/proc.c:1332 fp=0x401eefcc
created by runtime.main
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/proc.c:168

goroutine 1 [chan receive]:
testing.RunTests(0x20c448, 0x3acef8, 0x3d, 0x3d, 0x1)
/usr/local/go/src/pkg/testing/testing.go:441 +0x6e4
testing.Main(0x20c448, 0x3acef8, 0x3d, 0x3d, 0x3ae8f8, ...)
/usr/local/go/src/pkg/testing/testing.go:372 +0x6c
main.main()
runtime/_test/_testmain.go:451 +0x98

goroutine 73 [finalizer wait]:
runtime.park(0x1fa70, 0x3af6f0, 0x3ae24c)
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/proc.c:1280 +0x48
runfinq()
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/mgc0.c:2288 +0x80
runtime.goexit()
/home/shane/workspace/linux-arm-armv5t-f35668002535/go/src/pkg/runtime/proc.c:1332

goroutine 80174 [running]:
goroutine running on other thread; stack unavailable
FAIL runtime 60.177s
?   runtime/cgo [no test files]


Dave Cheney

Jul 30, 2013, 5:52:05 PM
to Brad Fitzpatrick, Russ Cox, Dmitry Vyukov, golang-dev
This looks like a side effect of the cas emulation on arm5 systems. The arm6 and 7 builders were happy when I looked. 

Russ Cox

Jul 30, 2013, 11:06:36 PM
to Dave Cheney, Brad Fitzpatrick, Dmitry Vyukov, golang-dev
The arm builders are failing in a few ways.

1. No stack frame size for _sfloat. Fixed.
2. A fault during malloc causes a panic, which wants to allocate, which causes a malloc/free deadlock.
3. Misuse of rewindmorestack. The pc being passed in is the one recorded in entersyscallblock, but that value should not be possible to get into morestack with.

The amd64 and 386 builders are failing in a few ways.

4. The standard (GOMAXPROCS=1) runtime.test runs for 300 seconds and is killed with no information printed. Sent a CL to try to get information.
5. The later (GOMAXPROCS=2) test times out in TestPreemptionGC, I bet on uniprocessors.

The various failures may or may not be related.

The only way I can explain #3 is that the same goroutine is being run on two different threads at the same time. Any ideas?

Russ

Dmitry Vyukov

Jul 31, 2013, 9:50:15 AM
to Russ Cox, Dave Cheney, Brad Fitzpatrick, golang-dev
On Wed, Jul 31, 2013 at 7:06 AM, Russ Cox <r...@golang.org> wrote:
> The arm builders are failing in a few ways.
>
> 1. No stack frame size for _sfloat. Fixed.
> 2. Fault during malloc causes panic, wants to allocated, causes malloc/free
> deadlock.
> 3. Misuse of rewindmorestack. The pc being passed in is the one recorded in
> entersyscallblock, but that value should not be possible to get into
> morestack with.
>
> The amd64 and 386 builders are failing in a few ways.
>
> 4. The standard (GOMAXPROCS=1) runtime.test runs for 300 seconds and is
> killed with no information printed. Sent a CL to try to get information.
> 5. The later (GOMAXPROCS=2) test times out in TestPreemptionGC, I bet on
> uniprocessors.

working on potential fix

>
> The various failures may or may not be related.
>
> The only way I can explain #3 is that the same goroutine is being run on two
> different threads at the same time. Any ideas?

it would have more serious consequences

Russ Cox

Aug 1, 2013, 10:15:14 PM
to Dmitry Vyukov, Dave Cheney, Brad Fitzpatrick, golang-dev
Preemption is still on and appears to be stable after the most recent round of bug fixes.

Russ

Dave Cheney

Aug 1, 2013, 10:19:55 PM
to Russ Cox, Dmitry Vyukov, Brad Fitzpatrick, golang-dev
We've passed the point of no return. Thanks to rsc, Dmitry, Remy and everyone else for your persistence.

Jan Mercl

Aug 2, 2013, 4:22:45 AM
to Russ Cox, Rob 'Commander' Pike, Dmitry Vyukov, Dave Cheney, Brad Fitzpatrick, golang-dev
On Fri, Aug 2, 2013 at 4:15 AM, Russ Cox <r...@golang.org> wrote:
> Preemption is still on and appears to be stable after the most recent round
> of bug fixes.

I'm afraid that goroutine preemption is now unfortunately a "must"
part of the specs. Otherwise some code running fine with preemption
may fail to deliver w/o it.

-j

Dave Cheney

Aug 2, 2013, 6:04:24 AM
to Jan Mercl, Russ Cox, Rob 'Commander' Pike, Dmitry Vyukov, Brad Fitzpatrick, golang-dev
I disagree for two reasons.

1. Preemption only occurs during function entry; it is a facility to reduce the stop-the-world latency of the garbage collector. for {} will continue to encourage people to use GOMAXPROCS to mishandle the problem as before.

2. Just because the gc runtime uses a form of preemption is not a justification for adding it to the spec. I'm sure there are lots of runtime differences between gc and gccgo; this is just one more.

Dave

Russ Cox

Aug 2, 2013, 1:03:42 PM
to Jan Mercl, Rob 'Commander' Pike, Dmitry Vyukov, Dave Cheney, Brad Fitzpatrick, golang-dev
On Fri, Aug 2, 2013 at 4:22 AM, Jan Mercl <0xj...@gmail.com> wrote:
> I'm afraid that goroutine preemption is now unfortunately a "must"
> part of the specs. Otherwise some code running fine with preemption
> may fail to deliver w/o it.

This has always been true. Not preempting is a bug, not a feature.

Russ

Dmitry Vyukov

Aug 3, 2013, 2:31:33 AM
to Jan Mercl, Russ Cox, Rob 'Commander' Pike, Dave Cheney, Brad Fitzpatrick, golang-dev
Aggressiveness of preemption is a QoI (quality of implementation) characteristic.
Note that C and Java are in exactly the same situation:
their specifications do not guarantee any kind of fair preemptive
scheduling, yet 99% of C and Java programs rely on it.
Fair preemptive scheduling is not something that can be defined
at the language level. Failure to preempt in a timely fashion is
indistinguishable from an OS refusing to schedule a thread in a
timely fashion (e.g. due to weird dynamic priority boosting), or some
operation taking way too long (e.g. a swap file on a failed NFS), and
other things.