Gerrit Bot has uploaded this change for review.
runtime: predict stack sizing
Goroutines can spend significant time in morestack/newstack if the dynamic
call tree is large, or if a specific call tree makes heavy use of the stack
to allocate significant amount of space for temporary variables.
This CL adds a simple stack size predictor based on the pc of the go statement
that starts each goroutine. This approach is predicated on the assumption that
a specific go statement in the program will mostly result in the resulting
goroutine mostly executing the same dynamic call tree (more precisely, dynamic
call trees with similar stack sizing requirements).
The way it works is by embedding in each P a small prediction table with 32 slots.
When a go statement is executed, the pc of the go statement is hashed with a per-P
seed to pick one of the slots. Each slot is a single byte containing a simple
running estimator. The result of the estimation is _StackMin << n, with n in the
range 0 to 15 (i.e. 2KB to 64MB), and it is used to allocate a stack of the
appropriate size. In newstack, called from morestack when we need to allocate a
new stack, we record the highest stack size (highwater) used by each goroutine
(this highwater is stored in the g struct, but thanks to existing padding this
additional 1-byte field does not cause the g struct to increase in size). When a
goroutine exits, the estimator for the pc of the statement that started that
goroutine is updated using the highwater value recorded by the exiting goroutine.
The current estimation scheme is not precise for multiple reasons.
First, multiple pcs could map to the same slot in the per-P table. This is not a
significant problem if we assume that the conflicting pcs are not executed with
the same frequency as in this case the estimator will still converge, albeit more
slowly, to the correct value. Furthermore, each P uses a different seed when
hashing the pc, and the seeds themselves are periodically reset (currently, as
a result of GC - although this is not the only available option).
Second, the highwater mechanism partially relies on stackguard0; the current
runtime sometimes mangles stackguard0 (e.g. when a goroutine needs to be pre-
empted) and this can lead to the highwater of a goroutine to be lower than it
should have been: this also should not lead to problems, apart from the estimator
taking longer to converge to the true value.
The stack size prediction mechanism is disabled by default, and can be enabled by
setting GOEXPERIMENT=predictstacksize.
This CL is currently in a PoC state. It passes all tests locally, and shows
significant promise in the included benchmark, where enabling stack size
prediction leads to a doubling of the performance for medium-large stack sizes.
It is still missing tests for the new feature pending discussion about the
proposed approach.
DO NOT SUBMIT
Fixes #18138
Change-Id: Id2f617e39bbd7ed969d35e1f231ab61c207fa572
GitHub-Last-Rev: 059efa4408dd9bc48b708b6e973d966ef6407444
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
D src/internal/goexperiment/exp_regabi_off.go
D src/internal/goexperiment/exp_regabi_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
10 files changed, 226 insertions(+), 21 deletions(-)
diff --git a/src/internal/goexperiment/exp_predictstacksize_off.go b/src/internal/goexperiment/exp_predictstacksize_off.go
new file mode 100644
index 0000000..019c252
--- /dev/null
+++ b/src/internal/goexperiment/exp_predictstacksize_off.go
@@ -0,0 +1,9 @@
+// Code generated by mkconsts.go. DO NOT EDIT.
+
+//go:build !goexperiment.predictstacksize
+// +build !goexperiment.predictstacksize
+
+package goexperiment
+
+const PredictStackSize = false
+const PredictStackSizeInt = 0
diff --git a/src/internal/goexperiment/exp_predictstacksize_on.go b/src/internal/goexperiment/exp_predictstacksize_on.go
new file mode 100644
index 0000000..e4437a3
--- /dev/null
+++ b/src/internal/goexperiment/exp_predictstacksize_on.go
@@ -0,0 +1,9 @@
+// Code generated by mkconsts.go. DO NOT EDIT.
+
+//go:build goexperiment.predictstacksize
+// +build goexperiment.predictstacksize
+
+package goexperiment
+
+const PredictStackSize = true
+const PredictStackSizeInt = 1
diff --git a/src/internal/goexperiment/exp_regabi_off.go b/src/internal/goexperiment/exp_regabi_off.go
deleted file mode 100644
index 5d88238..0000000
--- a/src/internal/goexperiment/exp_regabi_off.go
+++ /dev/null
@@ -1,9 +0,0 @@
-// Code generated by mkconsts.go. DO NOT EDIT.
-
-//go:build !goexperiment.regabi
-// +build !goexperiment.regabi
-
-package goexperiment
-
-const Regabi = false
-const RegabiInt = 0
diff --git a/src/internal/goexperiment/exp_regabi_on.go b/src/internal/goexperiment/exp_regabi_on.go
deleted file mode 100644
index c08d58e..0000000
--- a/src/internal/goexperiment/exp_regabi_on.go
+++ /dev/null
@@ -1,9 +0,0 @@
-// Code generated by mkconsts.go. DO NOT EDIT.
-
-//go:build goexperiment.regabi
-// +build goexperiment.regabi
-
-package goexperiment
-
-const Regabi = true
-const RegabiInt = 1
diff --git a/src/internal/goexperiment/flags.go b/src/internal/goexperiment/flags.go
index cd4c178..9e645a4 100644
--- a/src/internal/goexperiment/flags.go
+++ b/src/internal/goexperiment/flags.go
@@ -58,6 +58,7 @@
FieldTrack bool
PreemptibleLoops bool
StaticLockRanking bool
+ PredictStackSize bool
// Regabi is split into several sub-experiments that can be
// enabled individually. Not all combinations work.
diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go
index cc22b0f..1c4f882 100644
--- a/src/runtime/malloc.go
+++ b/src/runtime/malloc.go
@@ -1328,7 +1328,7 @@
rate = 0x3fffffff
}
if rate != 0 {
- return uintptr(fastrand() % uint32(2*rate))
+ return uintptr(fastrandn(uint32(2 * rate)))
}
return 0
}
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 7bc2a92..463087a 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3663,6 +3663,10 @@
dropg()
+ // println("goexit0: gopc:", gp.gopc, "highwater:", gp.highwater)
+ gstacksizeupdate(gp.gopc, gp.highwater)
+ gp.highwater = 0
+
if GOARCH == "wasm" { // no threads yet on wasm
gfput(_g_.m.p.ptr(), gp)
schedule() // never returns
@@ -4300,9 +4304,31 @@
}
_p_ := _g_.m.p.ptr()
- newg := gfget(_p_)
+
+ var newg *g
+ stackSize := gstacksizepredict(callerpc)
+ if stackSize != _StackMin {
+ // TODO: call gfget and immediately replace the default stack with one with the predicted size
+ //println("newproc1: callerpc:", callerpc, "predicted stack size:", stackSize)
+ } else {
+ newg = gfget(_p_)
+ }
if newg == nil {
- newg = malg(_StackMin)
+ newg = malg(stackSize)
+ if sz := newg.stack.hi - newg.stack.lo; sz > _FixedStack {
+ // If stack size prediction is enabled and predicts a larger stack than the minimum one,
+ // we set the stackguard as if the stack was only half the size: we do this to detect
+ // goroutines that never grow their stacks, as this means that the prediction was too big.
+ // In this case the highwater will be 0 because newstack is never called, and gstacksizeupdate
+ // will consider the prediction as "too big".
+ // In newstack we handle this case by simply setting the stackguard to its correct value
+ // and resuming execution.
+ // Note that preemption or other events that mangle stackguard0 will overwrite this special
+ // value: this is acceptable, as in this case the only effect will be that highwater will
+ // be 0 even if newstack was called. As long as this happen rarely it should not affect the
+ // estimation significantly.
+ newg.stackguard0 = newg.stack.lo + sz/2 + _StackGuard
+ }
casgstatus(newg, _Gidle, _Gdead)
allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
}
@@ -4532,6 +4558,125 @@
unlock(&sched.gFree.lock)
}
+//go:nosplit
+func gstacksizepredict(pc uintptr) int32 {
+ if !goexperiment.PredictStackSize {
+ return _StackMin
+ }
+ size, _ := gstacksizeinit().entryforPC(pc).get()
+ stack := int32(_StackMin) << size
+ if stack >= _FixedStack {
+ stack /= _FixedStack / _StackMin
+ }
+ return stack
+}
+
+//go:nosplit
+func gstacksizeupdate(pc uintptr, highwater uint8) {
+ if !goexperiment.PredictStackSize {
+ return
+ }
+ gstacksizeinit().entryforPC(pc).update(highwater)
+}
+
+//go:nosplit
+func gstacksizeinit() *gstacksize {
+ // TODO: can we move invalidation to the GC stop-the-world?
+ // TODO: should we use random invalidation instead?
+ // TODO: can we remove invalidation alltogether?
+ cycle := atomic.Load(&work.cycles)
+ _p_ := getg().m.p.ptr()
+ if cycle != _p_.gstacksize.cycle {
+ _p_.gstacksize.cycle = cycle
+ _p_.gstacksize.initseed()
+ for i := range _p_.gstacksize.entries {
+ _p_.gstacksize.entries[i] = 0
+ }
+ }
+ return &_p_.gstacksize
+}
+
+type (
+ gstacksize struct {
+ // TODO: if we could enumerate all valid callerpcs and allocate a slot for each
+ // one of them, we could avoid all conflicts among entries and remove the per-GC
+ // resetting
+ entries [32]gstacksizeentry
+ seed uintptr
+ cycle uint32
+ }
+ gstacksizeentry uint8
+)
+
+//go:nosplit
+func (g *gstacksize) initseed() {
+ if unsafe.Sizeof(g.seed) == 8 {
+ g.seed = uintptr(fastrand()) | (uintptr(fastrand()) << 32)
+ } else {
+ g.seed = uintptr(fastrand())
+ }
+}
+
+//go:nosplit
+func (s *gstacksize) entryforPC(pc uintptr) *gstacksizeentry {
+ var pch uintptr
+ if unsafe.Sizeof(pc) == 8 {
+ pch = memhash64(noescape(unsafe.Pointer(&pc)), s.seed)
+ } else {
+ pch = memhash32(noescape(unsafe.Pointer(&pc)), s.seed)
+ }
+ return &s.entries[pch%uintptr(len(s.entries))]
+}
+
+const freqBits = 4
+
+//go:nosplit
+func (e gstacksizeentry) get() (stacksize, frequency uint8) {
+ return uint8(e) >> freqBits, uint8(e) & ((1 << freqBits) - 1)
+}
+
+//go:nosplit
+func (e *gstacksizeentry) set(stacksize, frequency uint8) {
+ const (
+ maxFreq = 1<<freqBits - 1
+ maxSize = 1<<(8-freqBits) - 1
+ )
+ if stacksize > maxSize || frequency > maxFreq {
+ print("gstacksizeentry.set: stacksize=", stacksize, " frequency=", frequency)
+ throw("invalid stacksize/frequency")
+ }
+ *e = gstacksizeentry((stacksize << freqBits) | frequency)
+}
+
+//go:nosplit
+func (e *gstacksizeentry) update(highwater uint8) {
+ const (
+ maxFreq = 1<<freqBits - 1
+ maxSize = 1<<(8-freqBits) - 1
+ updateFreq = 10 // update probability = 1 over updateFreq
+ )
+ cursize, curfreq := e.get()
+ if highwater == cursize || fastrandn(updateFreq) != 0 {
+ return
+ }
+ if highwater < cursize {
+ if curfreq > 0 {
+ curfreq--
+ } else if cursize > 0 {
+ cursize--
+ curfreq = maxFreq / 2
+ }
+ } else /* if highwater > cursize */ {
+ if curfreq < maxFreq {
+ curfreq++
+ } else if cursize < maxSize {
+ cursize++
+ curfreq = maxFreq / 2
+ }
+ }
+ e.set(cursize, curfreq)
+}
+
// Breakpoint executes a breakpoint trap.
func Breakpoint() {
breakpoint()
@@ -4886,6 +5031,8 @@
// Similarly, we may not go through pidleget before this P starts
// running if it is P 0 on startup.
idlepMask.clear(id)
+
+ pp.gstacksize.initseed()
}
// destroy releases all of the resources associated with pp and
diff --git a/src/runtime/proc_test.go b/src/runtime/proc_test.go
index 53cafe8..ef24992 100644
--- a/src/runtime/proc_test.go
+++ b/src/runtime/proc_test.go
@@ -18,6 +18,7 @@
"syscall"
"testing"
"time"
+ "unsafe"
)
var stop = make(chan bool, 1)
@@ -692,6 +693,27 @@
}
}
+func BenchmarkCreateGoroutinesLargeStack(b *testing.B) {
+ c := make(chan bool)
+ var f func(n uintptr)
+ f = func(n uintptr) {
+ if n == 0 {
+ c <- true
+ return
+ }
+ f(n - 1)
+ }
+ for _, sz := range []uintptr{0, 1, 2, 4, 16, 64, 256} {
+ b.Run(fmt.Sprintf("%d", sz), func(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ go f(sz * 1024 / (3 * unsafe.Sizeof(uintptr(0))))
+ <-c
+ }
+ })
+ }
+}
+
// warmupScheduler ensures the scheduler has at least targetThreadCount threads
// in its thread pool.
func warmupScheduler(targetThreadCount int) {
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index 5051ec4..b40c058 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -460,6 +460,12 @@
// for stack shrinking. It's a boolean value, but is updated atomically.
parkingOnChan uint8
+ // highwater is the highwater mark for the stack size allocated during the lifetime
+ // of this goroutine. It is updated in morestack, and used by gstacksizeupdate to
+ // decide whether the initial stack size was appropriate or not.
+ // The actual value is 2^highwater.
+ highwater uint8
+
raceignore int8 // ignore race detection events
sysblocktraced bool // StartTrace has emitted EvGoInSyscall about this goroutine
tracking bool // whether we're tracking this G for sched latency statistics
@@ -738,6 +744,9 @@
// scheduler ASAP (regardless of what G is running on it).
preempt bool
+ // dynamic initial stack sizing state
+ gstacksize gstacksize
+
// Padding is no longer needed. False sharing is now not a worry because p is large enough
// that its size class is an integer multiple of the cache line size (for any of our architectures).
}
diff --git a/src/runtime/stack.go b/src/runtime/stack.go
index 6e0d157..2aee46f 100644
--- a/src/runtime/stack.go
+++ b/src/runtime/stack.go
@@ -1076,6 +1076,7 @@
// Don't double the stack (or we may quickly run out
// if this is done repeatedly).
newsize = oldsize
+ gp.stackguard0 = gp.stack.lo + _StackGuard
}
if newsize > maxstacksize || newsize > maxstackceiling {
@@ -1088,6 +1089,21 @@
throw("stack overflow")
}
+ gp.highwater = ceillog2(newsize)
+
+ if gp.stackguard0 != gp.stack.lo+_StackGuard {
+ // This happens when stack size prediction is enabled and the stack grows
+ // for the first time. When the G is created with a custom stack size we set
+ // the stackguard as if the stack was only half the size that was allocated:
+ // we do this to detect goroutines that never grow their stacks, as this means
+ // that the prediction was too big.
+ // In this case here we just adjust the stackguard and resume.
+ // TODO: Check that we have enough space on the stack to avoid a second call
+ // to newstack.
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+ gogo(&gp.sched)
+ }
+
// The goroutine must be executing in order to call newstack,
// so it must be Grunning (or Gscanrunning).
casgstatus(gp, _Grunning, _Gcopystack)
@@ -1393,3 +1409,13 @@
func morestackc() {
throw("attempt to execute system stack code on user stack")
}
+
+// returns ceil(log2(n/_StackMin))
+func ceillog2(n uintptr) (x uint8) {
+ // TODO: avoid the loop
+ for n > _StackMin {
+ x++
+ n >>= 1
+ }
+ return
+}
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Gerrit Bot uploaded patch set #2 to this change.
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
6 comments:
File src/internal/goexperiment/exp_regabi_off.go:
these two files were probably mistakenly left behind in a previous CL, they were deleted when running go generate
File src/runtime/proc.go:
Patch Set #2, Line 3666: // println("goexit0: gopc:", gp.gopc, "highwater:", gp.highwater)
leftover to be removed
Patch Set #2, Line 4312: //println("newproc1: callerpc:", callerpc, "predicted stack size:", stackSize)
leftover to be removed
Patch Set #2, Line 4561: //go:nosplit
note: all the nosplit are leftovers; they are not required and will be removed
Patch Set #2, Line 4586: // TODO: can we remove invalidation alltogether?
suggestions welcome about these TODOs
maybe these should be moved to a different home to avoid polluting proc.go; suggestions are welcome
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Gerrit Bot uploaded patch set #3 to this change.
GitHub-Last-Rev: ef98af83708aea641f43b3d4e93392c76b671bba
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
D src/internal/goexperiment/exp_regabi_off.go
D src/internal/goexperiment/exp_regabi_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
10 files changed, 234 insertions(+), 21 deletions(-)
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
4 comments:
Patchset:
I also fixed the commit message, but for reasons unknown to me gerrit did not update it correctly.
File src/runtime/proc.go:
Patch Set #2, Line 3666: // println("goexit0: gopc:", gp.gopc, "highwater:", gp.highwater)
leftover to be removed
Done
Patch Set #2, Line 4312: //println("newproc1: callerpc:", callerpc, "predicted stack size:", stackSize)
leftover to be removed
Done
Patch Set #2, Line 4561: //go:nosplit
note: all the nosplit are leftovers; they are not required and will be removed
Done
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Gerrit Bot uploaded patch set #4 to this change.
First, multiple PCs could map to the same slot in the per-P table. This
is not a significant problem if we assume that the conflicting pcs are
not executed with the same frequency as in this case the estimator will
still converge, albeit more slowly, to the correct value. Furthermore,
each P uses a different seed when hashing the pc, and the seeds
themselves are periodically reset (currently, as a result of GC -
although this is not the only available option).
Second, the highwater mechanism partially relies on stackguard0; the
current runtime sometimes mangles stackguard0 (e.g. when a goroutine
needs to be preempted) and this can lead to the highwater of a goroutine
to be lower than it should have been: this also should not lead to
problems, apart from the estimator taking longer to converge to the true
value.
The stack size prediction mechanism is disabled by default, and can be
enabled by setting GOEXPERIMENT=predictstacksize. The plan is to
eventually enable it default, and later remove the experiment
alltogether.
This CL is currently in a PoC state. It passes all tests locally, and
shows significant promise in the included benchmark, where enabling stack
size prediction leads to a doubling of the performance for medium-large
stack sizes. It is still missing tests for the new feature pending
discussion about the proposed approach.
Change-Id: Id2f617e39bbd7ed969d35e1f231ab61c207fa572
GitHub-Last-Rev: 1447823711b7e460cb410c5c7012378622aac4f7
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
8 files changed, 234 insertions(+), 4 deletions(-)
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #5 to this change.
GitHub-Last-Rev: 698c4652666e455475361ede262d975ecb135296
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
8 files changed, 242 insertions(+), 3 deletions(-)
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #6 to this change.
runtime: predict stack sizing
Goroutines can spend significant time in morestack/newstack if the dynamic
call tree is large, or if a specific call tree makes heavy use of the stack
to allocate significant amount of space for temporary variables.
This CL adds a simple stack size predictor based on the pc of the go statement
that starts each goroutine. This approach is predicated on the assumption that
a specific go statement in the program will mostly result in the resulting
goroutine mostly executing the same dynamic call tree (more precisely, dynamic
call trees with similar stack sizing requirements).
The way it works is by embedding in each P a small prediction table with 32 slots.
When a go statement is executed, the pc of the go statement is hashed with a per-P
seed to pick one of the slots. Each slot is a single byte containing a simple
running estimator. The result of the estimation is _StackMin << n, with n in the
range 0 to 15 (i.e. 2KB to 64MB), and it is used to allocate a stack of the
appropriate size. In newstack, called from morestack when we need to allocate a
new stack, we record the highest stack size (highwater) used by each goroutine
(this highwater is stored in the g struct, but thanks to existing padding this
additional 1-byte field does not cause the g struct to increase in size). When a
goroutine exits, the estimator for the pc of the statement that started that
goroutine is updated using the highwater value recorded by the exiting goroutine.
The current estimation scheme is not precise for multiple reasons.
First, multiple pcs could map to the same slot in the per-P table. This is not a
significant problem if we assume that the conflicting pcs are not executed with
the same frequency as in this case the estimator will still converge, albeit more
slowly, to the correct value. Furthermore, each P uses a different seed when
hashing the pc, and the seeds themselves are periodically reset (currently, as
a result of GC - although this is not the only available option).
Second, the highwater mechanism partially relies on stackguard0; the current
runtime sometimes mangles stackguard0 (e.g. when a goroutine needs to be pre-
empted) and this can lead to the highwater of a goroutine to be lower than it
should have been: this also should not lead to problems, apart from the estimator
taking longer to converge to the true value.
The stack size prediction mechanism is disabled by default, and can be enabled by
setting GOEXPERIMENT=predictstacksize.
This CL is currently in a PoC state. It passes all tests locally, and shows
significant promise in the included benchmark, where enabling stack size
prediction leads to a doubling of the performance for medium-large stack sizes.
It is still missing tests for the new feature pending discussion about the
proposed approach.
DO NOT SUBMIT
Fixes #18138
Change-Id: Id2f617e39bbd7ed969d35e1f231ab61c207fa572
Change-Id: Id2f617e39bbd7ed969d35e1f231ab61c207fa572
GitHub-Last-Rev: ef98af83708aea641f43b3d4e93392c76b671bba
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
8 files changed, 242 insertions(+), 3 deletions(-)
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #7 to this change.
First, multiple PCs could map to the same slot in the per-P table. This
is not a significant problem if we assume that the conflicting pcs are
not executed with the same frequency as in this case the estimator will
still converge, albeit more slowly, to the correct value. Furthermore,
each P uses a different seed when hashing the pc, and the seeds
themselves are periodically reset (currently, as a result of GC -
although this is not the only available option).
Second, the highwater mechanism partially relies on stackguard0; the
current runtime sometimes mangles stackguard0 (e.g. when a goroutine
needs to be preempted) and this can lead to the highwater of a goroutine
to be lower than it should have been: this also should not lead to
problems, apart from the estimator taking longer to converge to the true
value.
The stack size prediction mechanism is disabled by default, and can be
enabled by setting GOEXPERIMENT=predictstacksize. The plan is to
eventually enable it default, and later remove the experiment
alltogether.
This CL is currently in a PoC state. It passes all tests locally, and
shows significant promise in the included benchmark, where enabling stack
size prediction leads to a doubling of the performance for medium-large
stack sizes. It is still missing tests for the new feature pending
discussion about the proposed approach.
Change-Id: Id2f617e39bbd7ed969d35e1f231ab61c207fa572
GitHub-Last-Rev: 698c4652666e455475361ede262d975ecb135296
GitHub-Pull-Request: golang/go#47679
---
A src/internal/goexperiment/exp_predictstacksize_off.go
A src/internal/goexperiment/exp_predictstacksize_on.go
M src/internal/goexperiment/flags.go
M src/runtime/malloc.go
M src/runtime/proc.go
M src/runtime/proc_test.go
M src/runtime/runtime2.go
M src/runtime/stack.go
8 files changed, 242 insertions(+), 3 deletions(-)
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
1 comment:
File src/runtime/proc.go:
Patch Set #7, Line 4260: newg = malg(stackSize)
Wondering if the new G gets a pretty large stack but doesn't use too much, less than 1/4 of available space, when shrinkstack is called from within GC, will the shrink be prevented as we foresee the G consumes a large stack eventually?
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Xiangdong JI, Keith Randall.
1 comment:
File src/runtime/proc.go:
Patch Set #7, Line 4260: newg = malg(stackSize)
Wondering if the new G gets a pretty large stack but doesn't use too much, less than 1/4 of availabl […]
I did not test this specifically, but nothing in this CL should prevent shrinkstack from shrinking the stacks of all Gs. I briefly considered the option of preventing shrinking, but it would be an even more significant change than the current one, and in the worst possible scenario (wrong estimation, goroutine running forever) could lead to memory leaks. Furthermore, this CL is useful even without (e.g. in the common case of goroutine-per-request).
Nothing in this CL prevents separate improvements to shrinkstack though.
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, x j.
1 comment:
Patchset:
I just submitted CL 345889, which is a somewhat simpler approach to solving the same problem. Let me know if it looks like it might solve your use case.
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.
Gopher Robot abandoned this change.
To view, visit change 341990. To unsubscribe, or for help writing mail filters, visit settings.