Preemptive scheduler 2

1,606 views
Skip to first unread message

Dmitry Vyukov

unread,
May 12, 2013, 11:19:13 AM5/12/13
to golang-dev
Good news everyone!

I have a working prototype of a preemptive scheduler:
https://codereview.appspot.com/9136045

It's not based on the design I described previously. Instead it
[ab]uses split stack machinery to preempt long running goroutines.

Sysmon thread watches for P's executing the same goroutine for more
than X ms. If finds a one it sets g->stackguard to a very large value,
so that on next split stack check the goroutine calls morestack() and
gets preempted.
On high level that's basically it. Less than 100 lines of portable
code. And virtually zero overhead without preemption. It plays nicely
with precise GC because goroutine are preempted only at controllable
points. The implementation is relatively simple because goroutines are
preempted voluntarily, i.e. no signals and inter-thread
synchronization.

Currently I use only existing split stack checks. But if we need
finer-grained preemption, we can easily add additional checks on back
edges.

all.bash seems to pass on darwin on linux. I've implemented 2
preemption tests and they work fine. First ensures that the ping-pong
pattern works, and the second ensures that GC can stop goroutines even
if they do not call chan/map/malloc:
https://codereview.appspot.com/9136045/diff/16001/src/pkg/runtime/proc_test.go

I've also tested on the following program, which models one of the GC
stoptheworld issues we have:
http://play.golang.org/p/Yzc4Vx-KaF

w/o preemptive scheduling GC pause looks like:
gc7(8): 0+0+429 ms, 3462 -> 2908 MB
gc8(8): 0+0+296 ms, 5830 -> 3861 MB
gc9(8): 0+0+661 ms, 7758 -> 3825 MB
gc10(8): 0+0+939 ms, 7664 -> 4014 MB
gc11(8): 0+0+907 ms, 8063 -> 4016 MB

and with preemptive scheduler it looks:
gc8(8): 0+0+126 ms, 4989 -> 3020 MB
gc9(8): 0+0+124 ms, 6057 -> 3249 MB
gc10(8): 0+0+72 ms, 6499 -> 3711 MB
gc11(8): 0+0+121 ms, 7434 -> 3250 MB

Note significant decrease in stoptheworld pause.

So what do you think about this design?

minux

unread,
May 12, 2013, 11:41:12 AM5/12/13
to Dmitry Vyukov, golang-dev

On Sun, May 12, 2013 at 11:19 PM, Dmitry Vyukov <dvy...@google.com> wrote:
Good news everyone!
Yeah, really. Thank you for working on that.

I have a working prototype of a preemptive scheduler:
https://codereview.appspot.com/9136045

It's not based on the design I described previously. Instead it
[ab]uses split stack machinery to preempt long running goroutines.

Sysmon thread watches for P's executing the same goroutine for more
than X ms. If finds a one it sets g->stackguard to a very large value,
so that on next split stack check the goroutine calls morestack() and
gets preempted
How do you handle busy loops that doesn't call any split-stack functions (or one
that calls no function at all)?

could we instruct gc to place a dummy call (e.g. runtime.emptyfunc) that only
checks stack limit and return into every while-style loop that doesn't call any split-
stack functions to handle this corner case? (of course, we can further optimize this
solution.)

I think your approach will work fine except the aforementioned corner case.
Can't wait for Go 1.2!

Brad Fitzpatrick

unread,
May 12, 2013, 1:55:19 PM5/12/13
to minux, Dmitry Vyukov, golang-dev
On Sun, May 12, 2013 at 8:41 AM, minux <minu...@gmail.com> wrote:

On Sun, May 12, 2013 at 11:19 PM, Dmitry Vyukov <dvy...@google.com> wrote:
Good news everyone!
Yeah, really. Thank you for working on that.

Yes. This is much less scary than the old proposal.
 
How do you handle busy loops that doesn't call any split-stack functions (or one
that calls no function at all)?
could we instruct gc to place a dummy call (e.g. runtime.emptyfunc) that only
checks stack limit and return into every while-style loop that doesn't call any split-
stack functions to handle this corner case? (of course, we can further optimize this
solution.)

He said: "But if we need

Aram Hăvărneanu

unread,
May 12, 2013, 5:46:35 PM5/12/13
to golan...@googlegroups.com
> could we instruct gc to place a dummy call (e.g. runtime.emptyfunc)
> that only checks stack limit and return into every while-style loop
> that doesn't call any split- stack functions to handle this corner
> case?

You can play whack a mole, but you can't solve the halting problem.

Ian Lance Taylor

unread,
May 12, 2013, 11:42:41 PM5/12/13
to Aram Hăvărneanu, golan...@googlegroups.com
We don't need to solve the halting problem here. It suffices for the
compiler to add a function call (or other check for preemption) to
every backward branch. We can optimize by examining the control flow
graph: we only need to add a check if there is some path from the
branch target to the branch that does not make any function calls. If
we go this route we should be able to arrange for a preemption check
to be two instructions: a load instruction and a conditional branch
instruction, predicted not taken. it would be interesting to find out
how much overhead that would introduce in real programs.

Ian

Dmitry Vyukov

unread,
May 13, 2013, 2:14:17 AM5/13/13
to Ian Lance Taylor, Aram Hăvărneanu, golang-dev
On Mon, May 13, 2013 at 7:42 AM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Sun, May 12, 2013 at 2:46 PM, Aram Hăvărneanu <ar...@mgk.ro> wrote:
>>> could we instruct gc to place a dummy call (e.g. runtime.emptyfunc)
>>> that only checks stack limit and return into every while-style loop
>>> that doesn't call any split- stack functions to handle this corner
>>> case?
>>
>> You can play whack a mole, but you can't solve the halting problem.
>
> We don't need to solve the halting problem here. It suffices for the
> compiler to add a function call (or other check for preemption) to
> every backward branch. We can optimize by examining the control flow
> graph: we only need to add a check if there is some path from the
> branch target to the branch that does not make any function calls. If
> we go this route we should be able to arrange for a preemption check
> to be two instructions: a load instruction and a conditional branch
> instruction, predicted not taken...

... and all register spill/restore. We can teach the compiler that
this call does not spoil registers, but that would require saving and
restoring whole context.

I would wait until we gather evidence that this is important at all. I
think that checking on function entry is good enough for almost all
real programs. I don't care much about for{}. I can be an issue only
for large slice-crunching, but for small slice-crunching it's harmful.

Dmitry Vyukov

unread,
May 13, 2013, 2:22:00 AM5/13/13
to golang-dev, Keith Randall
I think it will be very useful for Go programs (both some preemption
and fairness and faster GC pause). If there are no general objection,
I will start sending patches.

The only remaining problem is that in morestack() we don't always know
framesize (if framesize+argsize < StackSmall, it's not passed to
decrease code size). So I always have to create a new frame, even if
there is plenty of space in the current frame. It can cause allocation
of several unnecessary frames (if can not preempt right now, we retry
on next call and so on).
I think we can generate 8x8 morestack() functions for argsize=8,16..64
x framesize=8,16..64. And for other cases pass both values.
What do you think?

Keith, please take a look at hashmap. Are there any new preemption
points that can break GC invariants? For such points we need to insert
m->locks++/--, that will prevent preemption.

minux

unread,
May 13, 2013, 3:50:25 AM5/13/13
to Dmitry Vyukov, golang-dev
On Sun, May 12, 2013 at 11:19 PM, Dmitry Vyukov <dvy...@google.com> wrote:
I have a working prototype of a preemptive scheduler:
https://codereview.appspot.com/9136045

It's not based on the design I described previously. Instead it
[ab]uses split stack machinery to preempt long running goroutines.

Sysmon thread watches for P's executing the same goroutine for more
than X ms. If finds a one it sets g->stackguard to a very large value,
so that on next split stack check the goroutine calls morestack() and
gets preempted.
the only problem i could think of is:
what about gccgo targets that don't support split stack mechanism at all? 

Dmitry Vyukov

unread,
May 13, 2013, 3:54:17 AM5/13/13
to minux, golang-dev

minux

unread,
May 13, 2013, 3:58:42 AM5/13/13
to Dmitry Vyukov, golang-dev
right. but only gold support splitstack and not all gccgo targets use gold.
and IIRC, ARM target of gold doesn't support splitstack.

Dmitry Vyukov

unread,
May 13, 2013, 4:21:32 AM5/13/13
to minux, golang-dev
On Mon, May 13, 2013 at 11:58 AM, minux <minu...@gmail.com> wrote:
>> >> I have a working prototype of a preemptive scheduler:
>> >> https://codereview.appspot.com/9136045
>> >>
>> >> It's not based on the design I described previously. Instead it
>> >> [ab]uses split stack machinery to preempt long running goroutines.
>> >>
>> >> Sysmon thread watches for P's executing the same goroutine for more
>> >> than X ms. If finds a one it sets g->stackguard to a very large value,
>> >> so that on next split stack check the goroutine calls morestack() and
>> >> gets preempted.
>> >
>> > the only problem i could think of is:
>> > what about gccgo targets that don't support split stack mechanism at
>> > all?
>>
>> http://gcc.gnu.org/wiki/SplitStacks
>
> right. but only gold support splitstack and not all gccgo targets use gold.
> and IIRC, ARM target of gold doesn't support splitstack.

Ian?

I think we can live with that. There are other QoI differences between
gc and gccgo. And absence of split stacks is the major one, so some
programs can behave differently with gccgo w/o split stacks already.

Ian Lance Taylor

unread,
May 13, 2013, 9:56:36 AM5/13/13
to Dmitry Vyukov, Aram Hăvărneanu, golang-dev
On Sun, May 12, 2013 at 11:14 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, May 13, 2013 at 7:42 AM, Ian Lance Taylor <ia...@golang.org> wrote:
>> We don't need to solve the halting problem here. It suffices for the
>> compiler to add a function call (or other check for preemption) to
>> every backward branch. We can optimize by examining the control flow
>> graph: we only need to add a check if there is some path from the
>> branch target to the branch that does not make any function calls. If
>> we go this route we should be able to arrange for a preemption check
>> to be two instructions: a load instruction and a conditional branch
>> instruction, predicted not taken...
>
> ... and all register spill/restore. We can teach the compiler that
> this call does not spoil registers, but that would require saving and
> restoring whole context.

On Intel we can use the test instruction, no registers required. Note
that I am assuming an additional TLS variable which is set to non-zero
when preemption is required. I think that is clearly worth doing if
we do decide to go this route.

> I would wait until we gather evidence that this is important at all. I
> think that checking on function entry is good enough for almost all
> real programs. I don't care much about for{}. I can be an issue only
> for large slice-crunching, but for small slice-crunching it's harmful.

There are a number of bug reports in which people get confused by the
behaviour when running an infinite loop. I think it's worth
considering how to handle those cases better, whether via a preemption
mechanism or some other approach.

I agree that in a range loop over a slice, in which the loop makes no
function calls, it may be appropriate to only insert a test every 256
times around or something. We would have to measure.

Ian

Ian Lance Taylor

unread,
May 13, 2013, 9:58:39 AM5/13/13
to Dmitry Vyukov, golang-dev, Keith Randall
On Sun, May 12, 2013 at 11:22 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> The only remaining problem is that in morestack() we don't always know
> framesize (if framesize+argsize < StackSmall, it's not passed to
> decrease code size). So I always have to create a new frame, even if
> there is plenty of space in the current frame. It can cause allocation
> of several unnecessary frames (if can not preempt right now, we retry
> on next call and so on).

I haven't looked at the patches yet, but don't you know that framesize
+ argsize < StackSmall? Can't you only allocate a new frame when
there is less than StackSmall space available?

Ian

Ian Lance Taylor

unread,
May 13, 2013, 10:04:42 AM5/13/13
to Dmitry Vyukov, minux, golang-dev
The split stack support has to be target specific. Testing for
preemption does not. On targets that do not support split stack it
would be easy to modify the gccgo frontend to add a test at entry of a
function for whether to preempt.

To be precise, gold is not required for split stack support. Gold is
required in order to use small stacks. The difference is that when a
Go function calls a C function, gold inserts code to force the stack
to grow. When not using gold, that code is not present, so the stack
always has to be large enough to call a C function. But even when not
using gold, on targets for which GCC supports split stacks, the stack
will split in a deeply recursive Go function. This distinction is
expressed as the preprocessor macro LINKER_SUPPORTS_SPLIT_STACK when
building libgo/runtime.

Ian

Dmitry Vyukov

unread,
May 13, 2013, 10:18:56 AM5/13/13
to Ian Lance Taylor, golang-dev, Keith Randall
Sorry, I meant StackMin. There is always less space than StackMin.

Dmitry Vyukov

unread,
May 13, 2013, 10:22:55 AM5/13/13
to Ian Lance Taylor, Aram Hăvărneanu, golang-dev
On Mon, May 13, 2013 at 5:56 PM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Sun, May 12, 2013 at 11:14 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> On Mon, May 13, 2013 at 7:42 AM, Ian Lance Taylor <ia...@golang.org> wrote:
>>> We don't need to solve the halting problem here. It suffices for the
>>> compiler to add a function call (or other check for preemption) to
>>> every backward branch. We can optimize by examining the control flow
>>> graph: we only need to add a check if there is some path from the
>>> branch target to the branch that does not make any function calls. If
>>> we go this route we should be able to arrange for a preemption check
>>> to be two instructions: a load instruction and a conditional branch
>>> instruction, predicted not taken...
>>
>> ... and all register spill/restore. We can teach the compiler that
>> this call does not spoil registers, but that would require saving and
>> restoring whole context.
>
> On Intel we can use the test instruction, no registers required.

Ah, I see. The register spills will be around the function call on
cold path, and should not affect hot path, right?

> Note
> that I am assuming an additional TLS variable which is set to non-zero
> when preemption is required. I think that is clearly worth doing if
> we do decide to go this route.

I just use g->stackguard. So not additional TLS variable is required.


>> I would wait until we gather evidence that this is important at all. I
>> think that checking on function entry is good enough for almost all
>> real programs. I don't care much about for{}. I can be an issue only
>> for large slice-crunching, but for small slice-crunching it's harmful.
>
> There are a number of bug reports in which people get confused by the
> behaviour when running an infinite loop. I think it's worth
> considering how to handle those cases better, whether via a preemption
> mechanism or some other approach.

Well, we can replace for{} in the compiler :)
However, I agree that there will be long tail where checking on
function entry is not enough.


> I agree that in a range loop over a slice, in which the loop makes no
> function calls, it may be appropriate to only insert a test every 256
> times around or something. We would have to measure.

Unless we unroll 256 iterations, I think it's faster just to check
sp<g->stackguard, then to count iterations and check on every 256-th
iteration.

minux

unread,
May 13, 2013, 10:44:41 AM5/13/13
to Dmitry Vyukov, Ian Lance Taylor, Aram Hăvărneanu, golang-dev
On Mon, May 13, 2013 at 10:22 PM, Dmitry Vyukov <dvy...@google.com> wrote:
On Mon, May 13, 2013 at 5:56 PM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Sun, May 12, 2013 at 11:14 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> On Mon, May 13, 2013 at 7:42 AM, Ian Lance Taylor <ia...@golang.org> wrote:
>>> We don't need to solve the halting problem here.  It suffices for the
>>> compiler to add a function call (or other check for preemption) to
>>> every backward branch.  We can optimize by examining the control flow
>>> graph: we only need to add a check if there is some path from the
>>> branch target to the branch that does not make any function calls.  If
>>> we go this route we should be able to arrange for a preemption check
>>> to be two instructions: a load instruction and a conditional branch
>>> instruction, predicted not taken...
>>
>> ... and all register spill/restore. We can teach the compiler that
>> this call does not spoil registers, but that would require saving and
>> restoring whole context.
>
> On Intel we can use the test instruction, no registers required.

Ah, I see. The register spills will be around the function call on
cold path, and should not affect hot path, right?

> Note
> that I am assuming an additional TLS variable which is set to non-zero
> when preemption is required.  I think that is clearly worth doing if
> we do decide to go this route.

I just use g->stackguard. So not additional TLS variable is required.
because the pointer to g is stored in TLS, so if we want to use a single
TEST/CMP instruction on x86, we need to use another TLS variable (slot)
to store the preempt flag.

On ARM, the situation is very different. i'm afraid we must spill a register
to do the check if we don't reserve another register for the flag (could we
embed the flag as one of the lower bit of m or g register?)

Dmitry Vyukov

unread,
May 13, 2013, 10:51:58 AM5/13/13
to minux, Ian Lance Taylor, Aram Hăvărneanu, golang-dev
Ah, I see, you want to optimize the additional load. That would
optimize split stacks as well.

> On ARM, the situation is very different. i'm afraid we must spill a register
> to do the check if we don't reserve another register for the flag (could we
> embed the flag as one of the lower bit of m or g register?)

This is problematic. We will need to carefully modify all code that
accesses m or g. And this will increase code size and slowdown split
stack preamble.

Ian Lance Taylor

unread,
May 13, 2013, 12:43:16 PM5/13/13
to Dmitry Vyukov, Aram Hăvărneanu, golang-dev
On Mon, May 13, 2013 at 7:22 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, May 13, 2013 at 5:56 PM, Ian Lance Taylor <ia...@golang.org> wrote:
>>
>> On Intel we can use the test instruction, no registers required.
>
> Ah, I see. The register spills will be around the function call on
> cold path, and should not affect hot path, right?

SGTM.

>> Note
>> that I am assuming an additional TLS variable which is set to non-zero
>> when preemption is required. I think that is clearly worth doing if
>> we do decide to go this route.
>
> I just use g->stackguard. So not additional TLS variable is required.

Understood. What I mean is, if we want to insert preemption tests in
loops, then I think we need another TLS variable, so that we can use a
test instruction. Then preempting a goroutine would mean setting both
g->stackguard and this new variable. Otherwise we need to add another
load instruction.

>> I agree that in a range loop over a slice, in which the loop makes no
>> function calls, it may be appropriate to only insert a test every 256
>> times around or something. We would have to measure.
>
> Unless we unroll 256 iterations, I think it's faster just to check
> sp<g->stackguard, then to count iterations and check on every 256-th
> iteration.

Well, we can measure. But memory loads are slow and register
operations are fast. I picked 256 because on Intel testing i%256 == 0
is fast.

Ian

Aram Hăvărneanu

unread,
May 14, 2013, 10:43:08 AM5/14/13
to Ian Lance Taylor, golan...@googlegroups.com
> We don't need to solve the halting problem here. It suffices for the
> compiler to add a function call (or other check for preemption) to
> every backward branch.

While I agree that in practice this is a non-issue, in theory, what
you said is not enough. For example, a series of a billion
instructions with no backwards jump would never be preempted.

--
Aram Hăvărneanu

Dmitry Vyukov

unread,
May 15, 2013, 9:25:32 AM5/15/13
to Ian Lance Taylor, Aram Hăvărneanu, golang-dev
On Mon, May 13, 2013 at 8:43 PM, Ian Lance Taylor <ia...@golang.org> wrote:
>>> On Intel we can use the test instruction, no registers required.
>>
>> Ah, I see. The register spills will be around the function call on
>> cold path, and should not affect hot path, right?
>
> SGTM.
>
>>> Note
>>> that I am assuming an additional TLS variable which is set to non-zero
>>> when preemption is required. I think that is clearly worth doing if
>>> we do decide to go this route.
>>
>> I just use g->stackguard. So not additional TLS variable is required.
>
> Understood. What I mean is, if we want to insert preemption tests in
> loops, then I think we need another TLS variable, so that we can use a
> test instruction. Then preempting a goroutine would mean setting both
> g->stackguard and this new variable. Otherwise we need to add another
> load instruction.


This looks orthogonal to preemption. The additional load equally
increases code size and slowdowns split stack checks. And this
optimization would benefit both.

What do you think if we commit this version (checks only at function
entry)? And then we can do the following things in any order:
1. Introduce an additional register for g->stackguard.
2. Add additional preemption checks on back edges.


>>> I agree that in a range loop over a slice, in which the loop makes no
>>> function calls, it may be appropriate to only insert a test every 256
>>> times around or something. We would have to measure.
>>
>> Unless we unroll 256 iterations, I think it's faster just to check
>> sp<g->stackguard, then to count iterations and check on every 256-th
>> iteration.
>
> Well, we can measure. But memory loads are slow and register
> operations are fast. I picked 256 because on Intel testing i%256 == 0
> is fast.


Probably you are right.
However I am not sure whether gc will be able to put it into a
register in useful cases.

Ian Lance Taylor

unread,
May 15, 2013, 9:32:21 AM5/15/13
to Dmitry Vyukov, Aram Hăvărneanu, golang-dev
On Wed, May 15, 2013 at 6:25 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> What do you think if we commit this version (checks only at function
> entry)? And then we can do the following things in any order:
> 1. Introduce an additional register for g->stackguard.
> 2. Add additional preemption checks on back edges.

That is fine with me, once we are all agreed that we want to introduce
preemption and that this is the right general approach to take.

Ian

minux

unread,
May 15, 2013, 9:37:57 AM5/15/13
to Aram Hăvărneanu, Ian Lance Taylor, golan...@googlegroups.com
how about using a hybrid approach to solve this problem?

first set stackguard to try to stop the goroutine, if it's not stopped
within reasonable time frame, send a signal to its OS thread to
force it to stop as the last resort.

then we don't need to add any new features to the compiler while
still able to handle corner cases like "for {}" or 1 billion instructions
without branches.

not to mention that the common case is also faster.

Dmitry Vyukov

unread,
May 15, 2013, 9:41:32 AM5/15/13
to minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
There were counter-arguments to using signals. In particular it does
not play well with precise GC or becomes very complex. Search for the
previous thread.

Andrew Gerrand

unread,
May 15, 2013, 11:54:29 AM5/15/13
to minux, golang-dev, ia...@golang.org, Aram Hăvărneanu

Those aren't corner cases. They're broken cases. I don't think we should optimise for them.

--
 
---
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Keith Randall

unread,
May 15, 2013, 12:58:24 PM5/15/13
to Dmitry Vyukov, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
I looked through hashmap and I don't see anything that would break because of this change.  But it is a bit dicey, for instance if the clearbucket() macro was a function call instead that could trigger a gc, it would break.  In fact, the clearing code *will* break if we move to allowing gc at backedges (not a showstopper - it would be easy to fix).

I'm definitely for this change, but it makes me nervous...  Someone out there has code that depends on function calls not triggering gc.


Dmitry Vyukov

unread,
May 15, 2013, 1:18:50 PM5/15/13
to Keith Randall, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
On Wed, May 15, 2013 at 8:58 PM, Keith Randall <k...@google.com> wrote:
> I looked through hashmap and I don't see anything that would break because
> of this change. But it is a bit dicey, for instance if the clearbucket()
> macro was a function call instead that could trigger a gc, it would break.
> In fact, the clearing code *will* break if we move to allowing gc at
> backedges (not a showstopper - it would be easy to fix).

Thanks!
I will add m->locks++/-- around hashmap functions.


> I'm definitely for this change, but it makes me nervous... Someone out
> there has code that depends on function calls not triggering gc.

That code is fragile and will break sooner or later. If Go it's
difficult to control where mallocs are called.
And, well, it's formally incorrect. So I think it's good if we break it sooner.

Keith Randall

unread,
May 15, 2013, 2:03:01 PM5/15/13
to Dmitry Vyukov, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
On Wed, May 15, 2013 at 10:18 AM, Dmitry Vyukov <dvy...@google.com> wrote:
On Wed, May 15, 2013 at 8:58 PM, Keith Randall <k...@google.com> wrote:
> I looked through hashmap and I don't see anything that would break because
> of this change.  But it is a bit dicey, for instance if the clearbucket()
> macro was a function call instead that could trigger a gc, it would break.
> In fact, the clearing code *will* break if we move to allowing gc at
> backedges (not a showstopper - it would be easy to fix).

Thanks!
I will add m->locks++/-- around hashmap functions.

No, I'm saying you don't need those.  At least not until we implement backedge checks. 

Dmitry Vyukov

unread,
May 15, 2013, 2:21:10 PM5/15/13
to Keith Randall, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
On Wed, May 15, 2013 at 10:03 PM, Keith Randall <k...@google.com> wrote:
>> On Wed, May 15, 2013 at 8:58 PM, Keith Randall <k...@google.com> wrote:
>> > I looked through hashmap and I don't see anything that would break
>> > because
>> > of this change. But it is a bit dicey, for instance if the
>> > clearbucket()
>> > macro was a function call instead that could trigger a gc, it would
>> > break.
>> > In fact, the clearing code *will* break if we move to allowing gc at
>> > backedges (not a showstopper - it would be easy to fix).
>>
>> Thanks!
>> I will add m->locks++/-- around hashmap functions.
>>
> No, I'm saying you don't need those. At least not until we implement
> backedge checks.


Yes, I understand, but I think it's better to add it now. Or would you
prefer to not add it now?

Keith Randall

unread,
May 15, 2013, 2:36:52 PM5/15/13
to Dmitry Vyukov, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
I don't see any reason to add it yet.  Backedge implementation will probably take a while and we might not end up doing it at all.

Dmitry Vyukov

unread,
May 15, 2013, 3:25:02 PM5/15/13
to Keith Randall, minux, Aram Hăvărneanu, Ian Lance Taylor, golang-dev
On Wed, May 15, 2013 at 10:36 PM, Keith Randall <k...@google.com> wrote:
> I don't see any reason to add it yet. Backedge implementation will probably
> take a while and we might not end up doing it at all.

OK

Dave Cheney

unread,
May 15, 2013, 9:59:03 PM5/15/13
to Andrew Gerrand, minux, golang-dev, ia...@golang.org, Aram Hăvărneanu
> Those aren't corner cases. They're broken cases. I don't think we should
> optimise for them.

Seconded. I would be sad if loops got slower just for the edge case
where someone wrote an infinite loop. While this does catch people out
occasionally, the most prominent cases are in synthetic example code
-- real programs do IO, and real concurrent programs use channels to
communicate.

Gustavo Niemeyer

unread,
May 15, 2013, 10:01:12 PM5/15/13
to Dave Cheney, Andrew Gerrand, minux, golang-dev, ia...@golang.org, Aram Hăvărneanu
On Wed, May 15, 2013 at 10:59 PM, Dave Cheney <da...@cheney.net> wrote:
> Seconded. I would be sad if loops got slower just for the edge case
> where someone wrote an infinite loop. While this does catch people out
> occasionally, the most prominent cases are in synthetic example code
> -- real programs do IO,

Agreed.

> and real concurrent programs use channels to communicate.

But this isn't strictly true, FWIW.


gustavo @ http://niemeyer.net

Niklas Schnelle

unread,
May 16, 2013, 8:40:12 AM5/16/13
to golan...@googlegroups.com, Dave Cheney, Andrew Gerrand, minux, ia...@golang.org, Aram Hăvărneanu
Yeah maybe it should read "Real concurrent programs use channels or have working sets of sensible sise"
After all it works perfectly to have a tight pure cpu loop without any IO as long as it finishes in a sensible amount of time. For example when doing image manipulation
a non communicationg concurrent program might change one line of the image per goroutine, because there is shared memory and the working sets don't overlap there
is no need for explicit communication. However this case is already working fine because each line is processed in a finite and sensibly small amount of time.

Kevin Gillette

unread,
May 16, 2013, 11:05:15 AM5/16/13
to golan...@googlegroups.com, Dave Cheney, Andrew Gerrand, minux, ia...@golang.org, Aram Hăvărneanu
On Thursday, May 16, 2013 6:40:12 AM UTC-6, Niklas Schnelle wrote:
After all it works perfectly to have a tight pure cpu loop without any IO as long as it finishes in a sensible amount of time.

There are already numerous ways to deal with this. runtime.Gosched is a mechanism that long-running functions can opt-into at a time of their choosing (so a programmer can choose the location that minimizes context switch cost or a pointless GC sweep). If Gosched is ever intended for deprecation, one can always increment GOMAXPROCS for each long running, non-communicating task, allowing the OS to handle preemption, and since it is non-communicating, there's considerably less that the Go scheduler can offer compared to the OS scheduler. Since GOMAXPROCS may eventually go away, there's still runtime.LockOSThread, which will likely stick around as long as cgo does; in the absence of GOMAXPROCS, the scheduler would have to spawn an extra thread(s) any time there is a runnable communicating goroutine and all the workers are bound to non-communicating goroutines -- once again pushing the job of preemption to the OS.  The real need I'm seeing for full preemption would be to arrange GC sweeps; as far as I'm concerned, there doesn't need to be any out-of-the-way effort put into full preemption solely to give another reason for newcomers to avoid learning the basics of runtime internals.

Ingo Oeser

unread,
May 16, 2013, 1:25:10 PM5/16/13
to golan...@googlegroups.com, Dave Cheney, Andrew Gerrand, minux, ia...@golang.org, Aram Hăvărneanu
This second this line of arguments. Having a way to opt-out of the preemption for tight loops where overscheduling doesn't help is very useful for some workloads. 
Otherwise kernel improvements like reducing timer interrupts for HPC loads will be broken by Go.

Dmitry Vyukov

unread,
May 16, 2013, 1:31:11 PM5/16/13
to Ingo Oeser, golang-dev, Dave Cheney, Andrew Gerrand, minux, ia...@golang.org, Aram Hăvărneanu
If there are no other runnable goroutines, the thread won't actually
be preempted, it will quickly get back to executing the same
goroutine. We may not preempt at all in this case (this is not
implemented yet).

Dmitry Vyukov

unread,
May 17, 2013, 4:11:57 AM5/17/13
to golang-dev
So what is the final decision on this?

On Sun, May 12, 2013 at 7:19 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> Good news everyone!
>
> I have a working prototype of a preemptive scheduler:
> https://codereview.appspot.com/9136045
>
> It's not based on the design I described previously. Instead it
> [ab]uses split stack machinery to preempt long running goroutines.
>
> Sysmon thread watches for P's executing the same goroutine for more
> than X ms. If finds a one it sets g->stackguard to a very large value,
> so that on next split stack check the goroutine calls morestack() and
> gets preempted.
> On high level that's basically it. Less than 100 lines of portable
> code. And virtually zero overhead without preemption. It plays nicely
> with precise GC because goroutine are preempted only at controllable
> points. The implementation is relatively simple because goroutines are
> preempted voluntarily, i.e. no signals and inter-thread
> synchronization.
>
> Currently I use only existing split stack checks. But if we need
> finer-grained preemption, we can easily add additional checks on back
> edges.
>
> all.bash seems to pass on darwin on linux. I've implemented 2
> preemption tests and they work fine. First ensures that the ping-pong
> pattern works, and the second ensures that GC can stop goroutines even
> if they do not call chan/map/malloc:
> https://codereview.appspot.com/9136045/diff/16001/src/pkg/runtime/proc_test.go
>
> I've also tested on the following program, which models one of the GC
> stoptheworld issues we have:
> http://play.golang.org/p/Yzc4Vx-KaF
>
> w/o preemptive scheduling GC pause looks like:
> gc7(8): 0+0+429 ms, 3462 -> 2908 MB
> gc8(8): 0+0+296 ms, 5830 -> 3861 MB
> gc9(8): 0+0+661 ms, 7758 -> 3825 MB
> gc10(8): 0+0+939 ms, 7664 -> 4014 MB
> gc11(8): 0+0+907 ms, 8063 -> 4016 MB
>
> and with preemptive scheduler it looks:
> gc8(8): 0+0+126 ms, 4989 -> 3020 MB
> gc9(8): 0+0+124 ms, 6057 -> 3249 MB
> gc10(8): 0+0+72 ms, 6499 -> 3711 MB
> gc11(8): 0+0+121 ms, 7434 -> 3250 MB
>
> Note significant decrease in stoptheworld pause.
>
> So what do you think about this design?

Ian Lance Taylor

unread,
May 17, 2013, 9:10:09 AM5/17/13
to Dmitry Vyukov, golang-dev
On Fri, May 17, 2013 at 1:11 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> So what is the final decision on this?

To be honest I think we need to see a design for where we want to wind
up in the end. What we have right now is a CL, and we are not clear
about what else if anything is needed.

Ian

Dmitry Vyukov

unread,
May 21, 2013, 8:42:52 AM5/21/13
to Ian Lance Taylor, golang-dev
On Fri, May 17, 2013 at 5:10 PM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Fri, May 17, 2013 at 1:11 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>> So what is the final decision on this?
>
> To be honest I think we need to see a design for where we want to wind
> up in the end. What we have right now is a CL, and we are not clear
> about what else if anything is needed.


He we go:
Go Preemptive Scheduler Design Doc
https://docs.google.com/document/d/1ETuA2IOmnaQ4j81AtTGT40Y4_Jr6_IDASEKg0t0dBR8/edit?usp=sharing

wbi...@gmail.com

unread,
Aug 13, 2014, 4:34:11 AM8/13/14
to golan...@googlegroups.com, ia...@golang.org
Any update on this?

Russ Cox

unread,
Aug 13, 2014, 7:24:37 AM8/13/14
to wbi...@gmail.com, golang-dev, Ian Taylor
On Wed, Aug 13, 2014 at 4:34 AM, <wbi...@gmail.com> wrote:
Any update on this?

It happened.

bernhard...@gmail.com

unread,
Aug 14, 2014, 10:31:43 AM8/14/14
to golan...@googlegroups.com, wbi...@gmail.com, ia...@golang.org
It happend like it is in one of the releases up to 1.3.1? 
OR
It happend like the patches were merged into 1.4

Just asking for clarification. No restrictions/commitments asked for.

Dmitry Vyukov

unread,
Aug 14, 2014, 10:39:05 AM8/14/14
to Bernhard Spanyar, golang-dev, Bin Wang, Ian Taylor
http://golang.org/doc/go1.2#preemption
> --
> You received this message because you are subscribed to the Google Groups
> "golang-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-dev+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages