Gerrit Bot has uploaded this change for review.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state. This commit does effectively the same, except for the
running state. Users are free to use this metric to power histograms or
to track on-CPU time across a set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 1d36b9333a481fb91bdd52ac130d8785404cbf4d
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 75 insertions(+), 2 deletions(-)
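For context, a minimal sketch of how a caller could read the proposed metric through the standard runtime/metrics API. The metric name comes from this change; everything else is the existing API, and on an unpatched runtime the sample's kind reports KindBad:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Sample the metric added by this change. On a runtime without the
	// patch, metrics.Read leaves the value with kind KindBad.
	const name = "/sched/goroutine/running:nanoseconds"
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)
	if samples[0].Value.Kind() == metrics.KindBad {
		fmt.Println("metric not supported by this runtime")
		return
	}
	fmt.Printf("running time for this goroutine: %d ns\n", samples[0].Value.Uint64())
}
```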
diff --git a/src/runtime/metrics.go b/src/runtime/metrics.go
index ba0a920..c741f9a 100644
--- a/src/runtime/metrics.go
+++ b/src/runtime/metrics.go
@@ -274,6 +274,12 @@
in.sysStats.gcMiscSys + in.sysStats.otherSys
},
},
+ "/sched/goroutine/running:nanoseconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(grunningnanos())
+ },
+ },
"/sched/goroutines:goroutines": {
compute: func(_ *statAggregate, out *metricValue) {
out.kind = metricKindUint64
diff --git a/src/runtime/metrics/description.go b/src/runtime/metrics/description.go
index c147cad..cce0eac 100644
--- a/src/runtime/metrics/description.go
+++ b/src/runtime/metrics/description.go
@@ -215,6 +215,11 @@
Kind: KindUint64,
},
{
+ Name: "/sched/goroutine/running:nanoseconds",
+ Description: "Time spent by the current goroutine in the running state.",
+ Kind: KindUint64,
+ },
+ {
Name: "/sched/goroutines:goroutines",
Description: "Count of live goroutines.",
Kind: KindUint64,
diff --git a/src/runtime/metrics/doc.go b/src/runtime/metrics/doc.go
index 91ef030..241fd705 100644
--- a/src/runtime/metrics/doc.go
+++ b/src/runtime/metrics/doc.go
@@ -164,6 +164,9 @@
by code called via cgo or via the syscall package.
Sum of all metrics in /memory/classes.
+ /sched/goroutine/running:nanoseconds
+ Time spent by the current goroutine in the running state.
+
/sched/goroutines:goroutines
Count of live goroutines.
diff --git a/src/runtime/metrics_test.go b/src/runtime/metrics_test.go
index 5d32ef4..d93eaf8 100644
--- a/src/runtime/metrics_test.go
+++ b/src/runtime/metrics_test.go
@@ -222,6 +222,10 @@
for i := range h.Counts {
gc.pauses += h.Counts[i]
}
+ case "/sched/goroutine/running:nanoseconds":
+ if samples[i].Value.Uint64() < 1 {
+ t.Error("goroutine running nanoseconds is less than one")
+ }
case "/sched/goroutines:goroutines":
if samples[i].Value.Uint64() < 1 {
t.Error("number of goroutines is less than one")
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index b997a46..efa9190 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -990,8 +990,18 @@
}
}
- // Handle tracking for scheduling latencies.
+ // Handle tracking for scheduling and running latencies.
+ now := nanotime()
+ if newval == _Grunning {
+ // We're transitioning into the running state, record the timestamp for
+ // subsequent use.
+ gp.lastsched = now
+ }
if oldval == _Grunning {
+ // We're transitioning out of running, record how long we were in the
+ // state.
+ gp.runningnanos += now - gp.lastsched
+
// Track every 8th time a goroutine transitions out of running.
if gp.trackingSeq%gTrackingPeriod == 0 {
gp.tracking = true
@@ -3240,6 +3250,12 @@
setGNoWB(&_g_.m.curg, nil)
}
+// grunningnanos returns the running time observed by the current g.
+func grunningnanos() int64 {
+ gp := getg()
+ return gp.runningnanos + nanotime() - gp.lastsched
+}
+
// checkTimers runs any timers for the P that are ready.
// If now is not 0 it is the current time.
// It returns the passed time or the current time if now was passed as 0.
@@ -3472,6 +3488,8 @@
gp.param = nil
gp.labels = nil
gp.timer = nil
+ gp.lastsched = 0
+ gp.runningnanos = 0
if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
// Flush assist credit to the global pool. This gives
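To make the bookkeeping above easier to follow, here is a standalone sketch of the same accounting scheme outside the runtime. The gTimes type and its method names are illustrative only; in the patch this state lives on the g struct and is updated inside casgstatus:

```go
package sched

// gTimes mirrors the two fields this change adds to the g struct.
type gTimes struct {
	lastsched    int64 // timestamp when the goroutine last entered _Grunning
	runningnanos int64 // total time accumulated in the running state
}

// startRunning corresponds to the newval == _Grunning branch above: stamp
// the moment the goroutine enters the running state.
func (t *gTimes) startRunning(now int64) { t.lastsched = now }

// stopRunning corresponds to the oldval == _Grunning branch: fold the
// just-finished running interval into the running total.
func (t *gTimes) stopRunning(now int64) { t.runningnanos += now - t.lastsched }

// runningNanos mirrors grunningnanos: while the goroutine is running, its
// total is the accumulated time plus the still-open interval.
func (t *gTimes) runningNanos(now int64) int64 {
	return t.runningnanos + now - t.lastsched
}
```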
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index 3d01ac5..630a3ab 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -486,6 +486,8 @@
labels unsafe.Pointer // profiler labels
timer *timer // cached timer for time.Sleep
selectDone uint32 // are we participating in a select and did someone win the race?
+ lastsched int64 // timestamp when the G last started running
+ runningnanos int64 // total time spent in the running state
// Per-G GC state
diff --git a/src/runtime/sizeof_test.go b/src/runtime/sizeof_test.go
index ebf544a..2ae536d 100644
--- a/src/runtime/sizeof_test.go
+++ b/src/runtime/sizeof_test.go
@@ -21,7 +21,7 @@
_32bit uintptr // size on 32bit platforms
_64bit uintptr // size on 64bit platforms
}{
- {runtime.G{}, 236, 392}, // g, but exported for testing
+ {runtime.G{}, 236, 408}, // g, but exported for testing
{runtime.Sudog{}, 56, 88}, // sudog, but exported for testing
}
Gerrit Bot uploaded patch set #2 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state. This commit does effectively the same, except for the
running state. Users are free to use this metric to power histograms or
to track on-CPU time across a set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 50f1cb6f316a2e105c1177c5cb6b905de057c61c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 77 insertions(+), 2 deletions(-)
Gerrit Bot uploaded patch set #3 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 3b7821c5c782bb0c459288b0e488ef10f4aeb488
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 78 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #4 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 231b71cec190a42e18c13d1101c643855a76f31c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 76 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #5 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 231b71cec190a42e18c13d1101c643855a76f31c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 76 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek.
Patch set 5: Run-TryBot +1
Attention is currently required from: Austin Clements, Michael Knyszek.
1 comment:
Patchset:
Somewhere in here we'll need to note the limitations. Notably that (as your metric description says) this is just "wall time spent in the running state". A running goroutine may be running on an OS thread that is descheduled by the OS scheduler. This time is not "on-cpu" but will count towards this metric.
Patchset:
See further discussion at https://github.com/golang/go/issues/41554#issuecomment-1050043124.
Attention is currently required from: Austin Clements, Michael Knyszek.
Gerrit Bot uploaded patch set #6 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 177a434cab882ad5b855d29061452b502e18cb84
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 78 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek.
Gerrit Bot uploaded patch set #7 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 0d55fd11ce5e216c30f3802099dbf97292fcd8c7
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 74 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
Gerrit Bot uploaded patch set #8 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 75d884d42a227d967b95b6c34342b432424fda59
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 74 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
3 comments:
Patchset:
5 of 28 TryBots failed. […]
Looks like on windows builds it's possible for the nanoseconds captured right at the start of TestReadMetricsConsistency to be zero. That's a bit surprising to me; I'm not sure I understand why, even after reading the nanotime1 assembly impl for windows builds. I also noticed we don't have tests for /sched/latencies:seconds, presumably for similar reasons (?). Open to ideas about testing after we make progress on https://github.com/golang/go/issues/41554.
Updated TestSizeOf for 32-bit machines (forgot that I was running local tests only on a 64-bit machine).
Somewhere in here we'll need to note the limitations. […]
Done
See further discussion at https://github.com/golang/go/issues/41554#issuecomment-1050043124.
Ack, let's continue the conversation there.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
1 comment:
Patchset:
Ping Michael Pratt. Do we want this for 1.19, or wait another cycle? Thanks.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
1 comment:
File src/runtime/proc.go:
Patch Set #8, Line 1016: now := nanotime()
We have a few calls to `nanotime()` here that are redundant with the new unconditional call above.
Attention is currently required from: Austin Clements, Michael Pratt.
4 comments:
Commit Message:
Patch Set #8, Line 7: on-cpu
this isn't totally accurate because nanotime doesn't properly measure CPU time. it doesn't account for a thread that's, say, descheduled by the OS. there need to be a lot of caveats and care given to the documentation about that.
Patchset:
sorry for letting this fall through the cracks. thanks for the change.
I don't fully see how this resolves #41554. in that issue, AFAICT, the desire was to attribute CPU time to *specific* goroutines, whereas this just exports a single total.
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
by convention, we expose time information as float64 seconds. it has better dynamic range and, for most runtime values, retains maximum precision.
File src/runtime/sizeof_test.go:
Patch Set #8, Line 24: 252, 408},
this is a fairly large increase to the G struct. it may be worth considering how we can either reorder the struct (though, not too much) to use up any spare padding, or whether there's a way to encode the same information in fewer bytes (or reuse existing fields).
this is a cost incurred for every goroutine so we want to be careful about it; it can add up. this test exists so that we can capture that.
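To illustrate the padding point, a small self-contained sketch with hypothetical field names; on 64-bit platforms, grouping small fields avoids alignment padding:

```go
package main

import (
	"fmt"
	"unsafe"
)

// loose wastes padding on 64-bit: each bool is padded to keep the
// following int64 8-byte aligned, so the struct occupies 24 bytes.
type loose struct {
	a bool
	b int64
	c bool
}

// packed holds the same fields, but grouping the bools lets them share
// one word, shrinking the struct to 16 bytes.
type packed struct {
	b int64
	a bool
	c bool
}

func main() {
	fmt.Println(unsafe.Sizeof(loose{}), unsafe.Sizeof(packed{})) // 24 16
}
```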
Attention is currently required from: Austin Clements, Michael Pratt.
1 comment:
Patchset:
sorry for letting this fall through the cracks. thanks for the change. […]
ah sorry, I misunderstood. the metric returns a different value depending on the goroutine that read it.
I'm not certain this is a path we want to go down for this package; it feels a little strange that a different goroutine could read a different value *because* it's a different goroutine.
because there's a subtlety here that doesn't exist for other metrics, this package seems like the wrong place for it (and maybe this should just be its own API)? I don't know.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
5 comments:
Commit Message:
Patch Set #8, Line 7: on-cpu
this isn't totally accurate because nanotime doesn't properly measure CPU time. […]
Re-worded to the more precise "track running time per goroutine", suggesting that it only captures the time spent in the running state (which could include time spent descheduled by the OS). I also borrowed wording from Michael Pratt above for the godoc documentation by saying it was wall time. Open to other suggestions about how to make this more precise.
Patchset:
the metric returns a different value depending on the goroutine that read it.
Yup. There's some discussion in https://github.com/golang/go/issues/41554#issuecomment-1050063617; I'm open to exporting this metric through other means (including adding the API in some other package) -- I'll defer to you on what you'd prefer.
Thanks for reviewing!
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
by convention, we expose time information as float64 seconds. […]
Given this specifically exports nanoseconds, the uint64 felt appropriate (it's also the type we're getting out of `nanotime()`).
File src/runtime/sizeof_test.go:
Patch Set #8, Line 24: 252, 408},
Re-ordered a field and used `aligncheck` + `structlayout` to make sure we weren't leaving anything obvious on the table; that brought it down by 8 bytes.
if there's a way to encode the same information in fewer bytes (or, reuse existing fields).
Nothing jumped out at me, do you have suggestions?
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
Gerrit Bot uploaded patch set #9 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 2b9dc3c5463ce2aec73db3879498aca44e666fa6
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 75 insertions(+), 5 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Nathan VanBenschoten.
1 comment:
File src/runtime/proc.go:
Patch Set #8, Line 1016: now := nanotime()
We have a few calls to `nanotime()` here that are redundant with the new unconditional call above.
Done
Attention is currently required from: Austin Clements, Irfan Sharif, Michael Pratt, Nathan VanBenschoten.
1 comment:
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
Given this specifically exports nanoseconds, the uint64 felt appropriate (it's also the type we're g […]
I understand, but as a rule, this package avoids providing units with prefixes so it's easier to parse and manipulate downstream. the fact that we measure time in nanoseconds is an internal implementation detail.
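To illustrate the convention, a sketch of what a seconds-based registration in metrics.go might look like, reusing the compute-callback shape from the diff above and assuming the runtime-internal float64bits helper; this is not what the change currently does:

```go
"/sched/goroutine/running:seconds": {
	compute: func(_ *statAggregate, out *metricValue) {
		out.kind = metricKindFloat64
		// Store the float64 bit pattern in the scalar slot, converting
		// the internal nanosecond count to seconds for export.
		out.scalar = float64bits(float64(grunningnanos()) / 1e9)
	},
},
```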
Attention is currently required from: Austin Clements, Irfan Sharif, Michael Pratt, Nathan VanBenschoten.
2 comments:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
This lets a goroutine have a peek into its history, to see if it's been around for a while or is newly launched. And that view isn't available via any other goroutine. Is there precedent for this type of interface? It's no "goroutine local storage", but it's unlike the others I've seen.
What if this information were (instead) part of the text-format stack traces, similar to the "blocked (for 5 minutes)" output today. There's a way for a goroutine to get that information about itself, and there's a way for a goroutine to get the view for the whole app (runtime.Stack with all=true).
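For concreteness, one hypothetical rendering of that idea in the style of today's goroutine headers; the format below is illustrative only, nothing in this change emits it:

```
goroutine 42 [running, 1.23s on-cpu]:
main.worker()
	/path/to/main.go:17 +0x24
```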
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
What effect does this have on scheduling performance? This looks like it would be two or three vDSO calls each time a goroutine is scheduled (either running->runnable->running or running->waiting->runnable->running), whether or not an app is using this counter.
Attention is currently required from: Austin Clements, Michael Pratt, Nathan VanBenschoten, Rhys Hiltner.
2 comments:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
This lets a goroutine have a peek into its history, to see if it's been around for a while or is new […]
Are you asking whether there's precedent for a goroutine viewing state that isn't viewable by others? We use https://github.com/petermattis/goid, for example, which lets a goroutine peek into its private ID, though admittedly we're linking into runtime internals to make that happen.
I'd be very happy with surfacing this information only as part of the text-format stack traces, which gives goroutines a way to grep for it, both for themselves and for the whole app; done (PTAL). I'm not sure what the best textual format here should be, though.
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
What effect does this have on scheduling performance? This looks like it would be two or three vDSO […]
To evaluate scheduling performance, I tried running the microbenchmarks in this package with and without these changes. I'm not observing material differences. I wanted to ask you + others if you have suggestions on the kind of benchmarks that would help better quantify the hit on scheduling performance. In addition to the microbenchmarks below, as smoke tests I also ran system-wide benchmarks after building CockroachDB using this Go patch, observing nothing outside noise when running stock workloads (TPC-C, KV).
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
name old time/op new time/op delta
PingPongHog-24 517ns ± 8% 513ns ±13% ~ (p=0.690 n=5+5)
CreateGoroutines-24 302ns ± 1% 304ns ± 1% ~ (p=0.310 n=5+5)
CreateGoroutinesParallel-24 33.6ns ± 1% 34.0ns ± 1% +1.30% (p=0.032 n=5+5)
CreateGoroutinesCapture-24 2.86µs ± 2% 2.89µs ± 1% ~ (p=0.310 n=5+5)
CreateGoroutinesSingle-24 406ns ± 0% 407ns ± 1% ~ (p=0.421 n=5+5)
WakeupParallelSpinning/0s-24 14.7µs ± 1% 14.7µs ± 2% ~ (p=0.548 n=5+5)
WakeupParallelSpinning/1µs-24 19.0µs ± 3% 18.9µs ± 3% ~ (p=1.000 n=5+5)
WakeupParallelSpinning/2µs-24 24.4µs ± 4% 24.7µs ± 2% ~ (p=0.421 n=5+5)
WakeupParallelSpinning/5µs-24 36.7µs ± 3% 37.0µs ± 1% ~ (p=0.548 n=5+5)
WakeupParallelSpinning/10µs-24 54.0µs ± 0% 54.0µs ± 1% ~ (p=0.802 n=5+5)
WakeupParallelSpinning/20µs-24 96.3µs ± 0% 96.1µs ± 0% ~ (p=0.222 n=5+5)
WakeupParallelSpinning/50µs-24 222µs ± 0% 222µs ± 0% ~ (p=0.690 n=5+5)
WakeupParallelSpinning/100µs-24 386µs ± 2% 391µs ± 3% ~ (p=0.310 n=5+5)
WakeupParallelSyscall/0s-24 171µs ± 1% 170µs ± 1% ~ (p=0.095 n=5+5)
WakeupParallelSyscall/1µs-24 173µs ± 1% 172µs ± 1% ~ (p=0.222 n=5+5)
WakeupParallelSyscall/2µs-24 176µs ± 2% 174µs ± 1% ~ (p=0.421 n=5+5)
WakeupParallelSyscall/5µs-24 183µs ± 1% 184µs ± 1% ~ (p=0.095 n=5+5)
WakeupParallelSyscall/10µs-24 190µs ± 0% 193µs ± 1% +1.49% (p=0.008 n=5+5)
WakeupParallelSyscall/20µs-24 213µs ± 1% 213µs ± 1% ~ (p=0.548 n=5+5)
WakeupParallelSyscall/50µs-24 274µs ± 1% 275µs ± 3% ~ (p=0.690 n=5+5)
WakeupParallelSyscall/100µs-24 377µs ± 1% 380µs ± 3% ~ (p=0.151 n=5+5)
Matmult-24 0.97ns ± 1% 0.97ns ± 2% ~ (p=0.841 n=5+5)
name old alloc/op new alloc/op delta
CreateGoroutinesCapture-24 144B ± 0% 144B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
CreateGoroutinesCapture-24 5.00 ± 0% 5.00 ± 0% ~ (all equal)
Retrieving the goroutine running time (including the vDSO call to
nanotime()) is on the order of nanoseconds:
// BenchmarkGRunningNanos measures how costly it is to read the current
// goroutine's running time when going through an internal runtime API.
func BenchmarkGRunningNanos(b *testing.B) {
for n := 0; n < b.N; n++ {
_ = grunningnanos()
}
}
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningNanos
BenchmarkGRunningNanos-24 195321096 30.67 ns/op
BenchmarkGRunningNanos-24 195100147 30.77 ns/op
BenchmarkGRunningNanos-24 195415414 30.71 ns/op
BenchmarkGRunningNanos-24 195564742 30.70 ns/op
BenchmarkGRunningNanos-24 195472393 30.70 ns/op
PASS
Attention is currently required from: Austin Clements, Michael Pratt, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
Are you asking whether there's precedence for a goroutine viewing state that isn't viewable by other […]
Oops, didn't actually push out the change to include the info in the stack trace. I wanted to first quantify the performance hit in the thread below and ask what the textual format could be before proceeding. Do we also want to move the proposal through the Proposals board first (https://github.com/golang/go/issues/41554#issuecomment-1145394074)?
Attention is currently required from: Austin Clements, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
Patchset:
Ping @mkny...@google.com. Question below: move to proposal process?
Also this CL has picked up a merge conflict.
Attention is currently required from: Austin Clements, Irfan Sharif, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
To evaluate scheduling performance, I tried running the microbenchmarks in this package with and wit […]
The `PingPongHog` result caught my eye because it seems implausible given the ~30ns result for `BenchmarkGRunningNanos`. When I run this benchmark locally comparing HEAD^ of this patch against HEAD, I get the result below.
```
name old time/op new time/op delta
PingPongHog-10 281ns ±13% 354ns ±12% +25.97% (p=0.000 n=10+9)
```
A 25% regression on this benchmark might not be noticeable for many real-world workloads, but it seems important for evaluating whether this tracking needs an opt-in/opt-out mechanism.
I'm on arm64 (M1), but I'd expect similar results for amd64. Could you please verify the results again on your end?
PS: I like this feature and would love to see it land in Go. We also have "EXPLAIN ANALYZE" and resource isolation use cases for it at Datadog.