Gerrit Bot has uploaded this change for review.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state. This commit does effectively the same, except for the
running state. Users are free to use this metric to power histograms or
to track on-CPU time across a set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 1d36b9333a481fb91bdd52ac130d8785404cbf4d
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 75 insertions(+), 2 deletions(-)
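For context, a minimal sketch of how a caller could read the proposed metric through the standard runtime/metrics API. The metric name comes from this change; everything else is the existing API, and on an unpatched runtime the sample's kind reports KindBad:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Sample the metric added by this change. On a runtime without the
	// patch, metrics.Read leaves the value with kind KindBad.
	const name = "/sched/goroutine/running:nanoseconds"
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)
	if samples[0].Value.Kind() == metrics.KindBad {
		fmt.Println("metric not supported by this runtime")
		return
	}
	fmt.Printf("running time for this goroutine: %d ns\n", samples[0].Value.Uint64())
}
```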
diff --git a/src/runtime/metrics.go b/src/runtime/metrics.go
index ba0a920..c741f9a 100644
--- a/src/runtime/metrics.go
+++ b/src/runtime/metrics.go
@@ -274,6 +274,12 @@
in.sysStats.gcMiscSys + in.sysStats.otherSys
},
},
+ "/sched/goroutine/running:nanoseconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(grunningnanos())
+ },
+ },
"/sched/goroutines:goroutines": {
compute: func(_ *statAggregate, out *metricValue) {
out.kind = metricKindUint64
diff --git a/src/runtime/metrics/description.go b/src/runtime/metrics/description.go
index c147cad..cce0eac 100644
--- a/src/runtime/metrics/description.go
+++ b/src/runtime/metrics/description.go
@@ -215,6 +215,11 @@
Kind: KindUint64,
},
{
+ Name: "/sched/goroutine/running:nanoseconds",
+ Description: "Time spent by the current goroutine in the running state.",
+ Kind: KindUint64,
+ },
+ {
Name: "/sched/goroutines:goroutines",
Description: "Count of live goroutines.",
Kind: KindUint64,
diff --git a/src/runtime/metrics/doc.go b/src/runtime/metrics/doc.go
index 91ef030..241fd705 100644
--- a/src/runtime/metrics/doc.go
+++ b/src/runtime/metrics/doc.go
@@ -164,6 +164,9 @@
by code called via cgo or via the syscall package.
Sum of all metrics in /memory/classes.
+ /sched/goroutine/running:nanoseconds
+ Time spent by the current goroutine in the running state.
+
/sched/goroutines:goroutines
Count of live goroutines.
diff --git a/src/runtime/metrics_test.go b/src/runtime/metrics_test.go
index 5d32ef4..d93eaf8 100644
--- a/src/runtime/metrics_test.go
+++ b/src/runtime/metrics_test.go
@@ -222,6 +222,10 @@
for i := range h.Counts {
gc.pauses += h.Counts[i]
}
+ case "/sched/goroutine/running:nanoseconds":
+ if samples[i].Value.Uint64() < 1 {
+ t.Error("goroutine running nanoseconds is less than one")
+ }
case "/sched/goroutines:goroutines":
if samples[i].Value.Uint64() < 1 {
t.Error("number of goroutines is less than one")
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index b997a46..efa9190 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -990,8 +990,18 @@
}
}
- // Handle tracking for scheduling latencies.
+ // Handle tracking for scheduling and running latencies.
+ now := nanotime()
+ if newval == _Grunning {
+ // We're transitioning into the running state, record the timestamp for
+ // subsequent use.
+ gp.lastsched = now
+ }
if oldval == _Grunning {
+ // We're transitioning out of running, record how long we were in the
+ // state.
+ gp.runningnanos += now - gp.lastsched
+
// Track every 8th time a goroutine transitions out of running.
if gp.trackingSeq%gTrackingPeriod == 0 {
gp.tracking = true
@@ -3240,6 +3250,12 @@
setGNoWB(&_g_.m.curg, nil)
}
+// grunningnanos returns the running time observed by the current g.
+func grunningnanos() int64 {
+ gp := getg()
+ return gp.runningnanos + nanotime() - gp.lastsched
+}
+
// checkTimers runs any timers for the P that are ready.
// If now is not 0 it is the current time.
// It returns the passed time or the current time if now was passed as 0.
@@ -3472,6 +3488,8 @@
gp.param = nil
gp.labels = nil
gp.timer = nil
+ gp.lastsched = 0
+ gp.runningnanos = 0
if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
// Flush assist credit to the global pool. This gives
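To make the bookkeeping above easier to follow, here is a standalone sketch of the same accounting scheme outside the runtime. The gTimes type and its method names are illustrative only; in the patch this state lives on the g struct and is updated inside casgstatus:

```go
package sched

// gTimes mirrors the two fields this change adds to the g struct.
type gTimes struct {
	lastsched    int64 // timestamp when the goroutine last entered _Grunning
	runningnanos int64 // total time accumulated in the running state
}

// startRunning corresponds to the newval == _Grunning branch above: stamp
// the moment the goroutine enters the running state.
func (t *gTimes) startRunning(now int64) { t.lastsched = now }

// stopRunning corresponds to the oldval == _Grunning branch: fold the
// just-finished running interval into the running total.
func (t *gTimes) stopRunning(now int64) { t.runningnanos += now - t.lastsched }

// runningNanos mirrors grunningnanos: while the goroutine is running, its
// total is the accumulated time plus the still-open interval.
func (t *gTimes) runningNanos(now int64) int64 {
	return t.runningnanos + now - t.lastsched
}
```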
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index 3d01ac5..630a3ab 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -486,6 +486,8 @@
labels unsafe.Pointer // profiler labels
timer *timer // cached timer for time.Sleep
selectDone uint32 // are we participating in a select and did someone win the race?
+ lastsched int64 // timestamp when the G last started running
+ runningnanos int64 // total time spent in the running state
// Per-G GC state
diff --git a/src/runtime/sizeof_test.go b/src/runtime/sizeof_test.go
index ebf544a..2ae536d 100644
--- a/src/runtime/sizeof_test.go
+++ b/src/runtime/sizeof_test.go
@@ -21,7 +21,7 @@
_32bit uintptr // size on 32bit platforms
_64bit uintptr // size on 64bit platforms
}{
- {runtime.G{}, 236, 392}, // g, but exported for testing
+ {runtime.G{}, 236, 408}, // g, but exported for testing
{runtime.Sudog{}, 56, 88}, // sudog, but exported for testing
}
Gerrit Bot uploaded patch set #2 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state. This commit does effectively the same, except for the
running state. Users are free to use this metric to power histograms or
to track on-CPU time across a set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 50f1cb6f316a2e105c1177c5cb6b905de057c61c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 77 insertions(+), 2 deletions(-)
Gerrit Bot uploaded patch set #3 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 3b7821c5c782bb0c459288b0e488ef10f4aeb488
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 78 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #4 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 231b71cec190a42e18c13d1101c643855a76f31c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 76 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Keith Randall.
Gerrit Bot uploaded patch set #5 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 231b71cec190a42e18c13d1101c643855a76f31c
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 76 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek.
Patch set 5: Run-TryBot +1
Attention is currently required from: Austin Clements, Michael Knyszek.
1 comment:
Patchset:
Somewhere in here we'll need to note the limitations. Notably that (as your metric description says) this is just "wall time spent in the running state". A running goroutine may be running on an OS thread that is descheduled by the OS scheduler. This time is not "on-cpu" but will count towards this metric.
Patchset:
See further discussion at https://github.com/golang/go/issues/41554#issuecomment-1050043124.
Attention is currently required from: Austin Clements, Michael Knyszek.
Gerrit Bot uploaded patch set #6 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 177a434cab882ad5b855d29061452b502e18cb84
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/metrics_test.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
7 files changed, 78 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek.
Gerrit Bot uploaded patch set #7 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 0d55fd11ce5e216c30f3802099dbf97292fcd8c7
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 74 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
Gerrit Bot uploaded patch set #8 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 75d884d42a227d967b95b6c34342b432424fda59
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 74 insertions(+), 2 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
3 comments:
Patchset:
5 of 28 TryBots failed. […]
Looks like on windows builds it's possible for the nanoseconds captured right at the start of TestReadMetricsConsistency to be zero. That's a bit surprising to me; I'm not sure I understand why, even after reading the nanotime1 assembly impl for windows builds. I also noticed we don't have tests for /sched/latencies:seconds, presumably for similar reasons (?). Open to ideas about testing after we make progress on https://github.com/golang/go/issues/41554.
Updated TestSizeOf for 32-bit machines (forgot that I was running local tests only on a 64-bit machine).
Somewhere in here we'll need to note the limitations. […]
Done
See further discussion at https://github.com/golang/go/issues/41554#issuecomment-1050043124.
Ack, let's continue the conversation there.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
1 comment:
Patchset:
Ping Michael Pratt. Do we want this for 1.19, or wait another cycle? Thanks.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
1 comment:
File src/runtime/proc.go:
Patch Set #8, Line 1016: now := nanotime()
We have a few calls to `nanotime()` here that are redundant with the new unconditional call above.
Attention is currently required from: Austin Clements, Michael Pratt.
4 comments:
Commit Message:
Patch Set #8, Line 7: on-cpu
this isn't totally accurate because nanotime doesn't properly measure CPU time. it doesn't account for a thread that's, say, descheduled by the OS. there need to be a lot of caveats and care given to the documentation about that.
Patchset:
sorry for letting this fall through the cracks. thanks for the change.
I don't fully see how this resolves #41554. in that issue, AFAICT, the desire was to attribute CPU time to *specific* goroutines, whereas this just exports a single total.
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
by convention, we expose time information as float64 seconds. it has better dynamic range and, for most runtime values, retains maximum precision.
File src/runtime/sizeof_test.go:
Patch Set #8, Line 24: 252, 408},
this is a fairly large increase to the G struct. it may be worth considering how we can either reorder the struct (though, not too much) to use up any spare padding, or whether there's a way to encode the same information in fewer bytes (or reuse existing fields).
this is a cost incurred for every goroutine so we want to be careful about it; it can add up. this test exists so that we can capture that.
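To illustrate the padding point, a small self-contained sketch with hypothetical field names; on 64-bit platforms, grouping small fields avoids alignment padding:

```go
package main

import (
	"fmt"
	"unsafe"
)

// loose wastes padding on 64-bit: each bool is padded to keep the
// following int64 8-byte aligned, so the struct occupies 24 bytes.
type loose struct {
	a bool
	b int64
	c bool
}

// packed holds the same fields, but grouping the bools lets them share
// one word, shrinking the struct to 16 bytes.
type packed struct {
	b int64
	a bool
	c bool
}

func main() {
	fmt.Println(unsafe.Sizeof(loose{}), unsafe.Sizeof(packed{})) // 24 16
}
```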
Attention is currently required from: Austin Clements, Michael Pratt.
1 comment:
Patchset:
sorry for letting this fall through the cracks. thanks for the change. […]
ah sorry, I misunderstood. the metric returns a different value depending on the goroutine that read it.
I'm not certain this is a path we want to go down for this package; it feels a little strange that a different goroutine could read a different value *because* it's a different goroutine.
because there's a subtlety here that doesn't exist for other metrics, this package seems like the wrong place for it (and maybe this should just be its own API)? I don't know.
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
5 comments:
Commit Message:
Patch Set #8, Line 7: on-cpu
this isn't totally accurate because nanotime doesn't properly measure CPU time. […]
Re-worded to the more precise "track running time per goroutine", suggesting that it only captures the time spent in the running state (which could include time spent descheduled by the OS). I also borrowed wording from Michael Pratt above for the godoc documentation by saying it was wall time. Open to other suggestions about how to make this more precise.
Patchset:
the metric returns a different value depending on the goroutine that read it.
Yup. There's some discussion in https://github.com/golang/go/issues/41554#issuecomment-1050063617; I'm open to exporting this metric through other means (including adding the API in some other package) -- I'll defer to you on what you'd prefer.
Thanks for reviewing!
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
by convention, we expose time information as float64 seconds. […]
Given this specifically exports nanoseconds, the uint64 felt appropriate (it's also the type we're getting out of `nanotime()`).
File src/runtime/sizeof_test.go:
Patch Set #8, Line 24: 252, 408},
Re-ordered a field and used `aligncheck` + `structlayout` to make sure we weren't leaving anything obvious on the table; that brought it down by 8 bytes.
if there's a way to encode the same information in fewer bytes (or, reuse existing fields).
Nothing jumped out at me, do you have suggestions?
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt.
Gerrit Bot uploaded patch set #9 to this change.
runtime,runtime/metrics: track on-cpu time per goroutine
Fixes #41554.
This commit introduces a /sched/goroutine/running:nanoseconds metric,
defined as the total time spent by a goroutine in the running state.
This measurement is useful for systems that would benefit from
fine-grained CPU attribution.
An alternative to scheduler-backed CPU attribution would be the use of
profiler labels. Given it's common to spawn multiple goroutines for the
same task, goroutine-backed statistics can easily become misleading.
Profiler labels instead let you measure CPU performance across a set of
cooperating goroutines. That said, they come with overhead that makes
them infeasible to always enable. For high-performance systems that
care about fine-grained CPU attribution (e.g. databases that want to
measure total CPU time spent processing each request), profiler labels
are too cost-prohibitive, especially given the Go runtime has a much
cheaper view of the data needed.
It's worth noting that we already export /sched/latencies:seconds to
track scheduling latencies, i.e. time spent by a goroutine in the
runnable state (go-review.googlesource.com/c/go/+/308933). This commit
does effectively the same, except for the running state. Users are free
to use this metric to power histograms or to track on-CPU time across a
set of goroutines.
Change-Id: Id21ae4fcee0cd5f983604d61dad373098a0966bc
GitHub-Last-Rev: 2b9dc3c5463ce2aec73db3879498aca44e666fa6
GitHub-Pull-Request: golang/go#51347
---
M src/runtime/metrics.go
M src/runtime/metrics/description.go
M src/runtime/metrics/doc.go
M src/runtime/proc.go
M src/runtime/runtime2.go
M src/runtime/sizeof_test.go
6 files changed, 75 insertions(+), 5 deletions(-)
Attention is currently required from: Austin Clements, Michael Knyszek, Michael Pratt, Nathan VanBenschoten.
1 comment:
File src/runtime/proc.go:
Patch Set #8, Line 1016: now := nanotime()
We have a few calls to `nanotime()` here that are redundant with the new unconditional call above.
Done
Attention is currently required from: Austin Clements, Irfan Sharif, Michael Pratt, Nathan VanBenschoten.
1 comment:
File src/runtime/metrics.go:
Patch Set #8, Line 277: nanoseconds
Given this specifically exports nanoseconds, the uint64 felt appropriate (it's also the type we're g […]
I understand, but as a rule, this package avoids providing units with prefixes so it's easier to parse and manipulate downstream. the fact that we measure time in nanoseconds is an internal implementation detail.
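To illustrate the convention, a sketch of what a seconds-based registration in metrics.go might look like, reusing the compute-callback shape from the diff above and assuming the runtime-internal float64bits helper; this is not what the change currently does:

```go
"/sched/goroutine/running:seconds": {
	compute: func(_ *statAggregate, out *metricValue) {
		out.kind = metricKindFloat64
		// Store the float64 bit pattern in the scalar slot, converting
		// the internal nanosecond count to seconds for export.
		out.scalar = float64bits(float64(grunningnanos()) / 1e9)
	},
},
```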
Attention is currently required from: Austin Clements, Irfan Sharif, Michael Pratt, Nathan VanBenschoten.
2 comments:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
This lets a goroutine have a peek into its history, to see if it's been around for a while or is newly launched. And that view isn't available via any other goroutine. Is there precedent for this type of interface? It's no "goroutine local storage", but it's unlike the others I've seen.
What if this information were (instead) part of the text-format stack traces, similar to the "blocked (for 5 minutes)" output today. There's a way for a goroutine to get that information about itself, and there's a way for a goroutine to get the view for the whole app (runtime.Stack with all=true).
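For concreteness, one hypothetical rendering of that idea in the style of today's goroutine headers; the format below is illustrative only, nothing in this change emits it:

```
goroutine 42 [running, 1.23s on-cpu]:
main.worker()
	/path/to/main.go:17 +0x24
```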
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
What effect does this have on scheduling performance? This looks like it would be two or three vDSO calls each time a goroutine is scheduled (either running->runnable->running or running->waiting->runnable->running), whether or not an app is using this counter.
Attention is currently required from: Austin Clements, Michael Pratt, Nathan VanBenschoten, Rhys Hiltner.
2 comments:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
This lets a goroutine have a peek into its history, to see if it's been around for a while or is new […]
Are you asking whether there's precedent for a goroutine viewing state that isn't viewable by others? We use https://github.com/petermattis/goid, for example, which lets a goroutine peek into its private ID, though admittedly we're linking into runtime internals to make that happen.
I'd be very happy with surfacing this information only as part of the text-format stack traces, which gives goroutines a way to grep for it, both for themselves and for the whole app; done (PTAL). I'm not sure what the best textual format here should be, though.
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
What effect does this have on scheduling performance? This looks like it would be two or three vDSO […]
To evaluate scheduling performance, I tried running the microbenchmarks in this package with and without these changes. I'm not observing material differences. I wanted to ask you + others if you have suggestions on the kind of benchmarks that would help better quantify the hit on scheduling performance. In addition to the microbenchmarks below, as smoke tests I also ran system-wide benchmarks after building CockroachDB using this Go patch, observing nothing outside noise when running stock workloads (TPC-C, KV).
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
name old time/op new time/op delta
PingPongHog-24 517ns ± 8% 513ns ±13% ~ (p=0.690 n=5+5)
CreateGoroutines-24 302ns ± 1% 304ns ± 1% ~ (p=0.310 n=5+5)
CreateGoroutinesParallel-24 33.6ns ± 1% 34.0ns ± 1% +1.30% (p=0.032 n=5+5)
CreateGoroutinesCapture-24 2.86µs ± 2% 2.89µs ± 1% ~ (p=0.310 n=5+5)
CreateGoroutinesSingle-24 406ns ± 0% 407ns ± 1% ~ (p=0.421 n=5+5)
WakeupParallelSpinning/0s-24 14.7µs ± 1% 14.7µs ± 2% ~ (p=0.548 n=5+5)
WakeupParallelSpinning/1µs-24 19.0µs ± 3% 18.9µs ± 3% ~ (p=1.000 n=5+5)
WakeupParallelSpinning/2µs-24 24.4µs ± 4% 24.7µs ± 2% ~ (p=0.421 n=5+5)
WakeupParallelSpinning/5µs-24 36.7µs ± 3% 37.0µs ± 1% ~ (p=0.548 n=5+5)
WakeupParallelSpinning/10µs-24 54.0µs ± 0% 54.0µs ± 1% ~ (p=0.802 n=5+5)
WakeupParallelSpinning/20µs-24 96.3µs ± 0% 96.1µs ± 0% ~ (p=0.222 n=5+5)
WakeupParallelSpinning/50µs-24 222µs ± 0% 222µs ± 0% ~ (p=0.690 n=5+5)
WakeupParallelSpinning/100µs-24 386µs ± 2% 391µs ± 3% ~ (p=0.310 n=5+5)
WakeupParallelSyscall/0s-24 171µs ± 1% 170µs ± 1% ~ (p=0.095 n=5+5)
WakeupParallelSyscall/1µs-24 173µs ± 1% 172µs ± 1% ~ (p=0.222 n=5+5)
WakeupParallelSyscall/2µs-24 176µs ± 2% 174µs ± 1% ~ (p=0.421 n=5+5)
WakeupParallelSyscall/5µs-24 183µs ± 1% 184µs ± 1% ~ (p=0.095 n=5+5)
WakeupParallelSyscall/10µs-24 190µs ± 0% 193µs ± 1% +1.49% (p=0.008 n=5+5)
WakeupParallelSyscall/20µs-24 213µs ± 1% 213µs ± 1% ~ (p=0.548 n=5+5)
WakeupParallelSyscall/50µs-24 274µs ± 1% 275µs ± 3% ~ (p=0.690 n=5+5)
WakeupParallelSyscall/100µs-24 377µs ± 1% 380µs ± 3% ~ (p=0.151 n=5+5)
Matmult-24 0.97ns ± 1% 0.97ns ± 2% ~ (p=0.841 n=5+5)
name old alloc/op new alloc/op delta
CreateGoroutinesCapture-24 144B ± 0% 144B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
CreateGoroutinesCapture-24 5.00 ± 0% 5.00 ± 0% ~ (all equal)
Retrieving the goroutine running time (including the vDSO call to
nanotime()) is on the order of nanoseconds:
// BenchmarkGRunningNanos measures how costly it is to read the current
// goroutine's running time when going through an internal runtime API.
func BenchmarkGRunningNanos(b *testing.B) {
for n := 0; n < b.N; n++ {
_ = grunningnanos()
}
}
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningNanos
BenchmarkGRunningNanos-24 195321096 30.67 ns/op
BenchmarkGRunningNanos-24 195100147 30.77 ns/op
BenchmarkGRunningNanos-24 195415414 30.71 ns/op
BenchmarkGRunningNanos-24 195564742 30.70 ns/op
BenchmarkGRunningNanos-24 195472393 30.70 ns/op
PASS
Attention is currently required from: Austin Clements, Michael Pratt, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
File src/runtime/metrics/doc.go:
Patch Set #9, Line 176: Wall time spent by the current goroutine in the running state.
Are you asking whether there's precedence for a goroutine viewing state that isn't viewable by other […]
Oops, didn't actually push out the change to include the info in the stack trace. I wanted to first quantify the performance hit in the thread below and ask what the textual format could be before proceeding. Do we also want to move the proposal through the Proposals board first (https://github.com/golang/go/issues/41554#issuecomment-1145394074)?
Attention is currently required from: Austin Clements, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
Patchset:
Ping @mkny...@google.com. Question below: move to proposal process?
Also this CL has picked up a merge conflict.
Attention is currently required from: Austin Clements, Irfan Sharif, Nathan VanBenschoten, Rhys Hiltner.
1 comment:
File src/runtime/proc.go:
Patch Set #9, Line 998: now := nanotime()
To evaluate scheduling performance, I tried running the microbenchmarks in this package with and wit […]
The `PingPongHog` result caught my eye because it seems implausible given the ~30ns result for `BenchmarkGRunningNanos`. When I run this benchmark locally comparing HEAD^ of this patch against HEAD, I get the result below.
```
name old time/op new time/op delta
PingPongHog-10 281ns ±13% 354ns ±12% +25.97% (p=0.000 n=10+9)
```
A 25% regression on this benchmark might not be noticeable for many real-world workloads, but it seems important for evaluating whether this tracking needs an opt-in/opt-out mechanism.
I'm on arm64 (M1), but I'd expect similar results for amd64. Could you please verify the results again on your end?
PS: I like this feature and would love to see it land in Go. We also have "EXPLAIN ANALYZE" and resource isolation use cases for it at Datadog.