Russ Cox would like Austin Clements to review this change.
runtime: fix pprof cpu profile corruption on arm/mips/mipsle
CL 42652 changed the profile handler for mips/mipsle to
avoid recording a profile when in atomic functions, for fear
of interrupting the 32-bit simulation of a 64-bit atomic with
a lock. The profile logger itself uses 64-bit atomics and might
deadlock (#20146).
The change was to accumulate a count of dropped profile events
and then send the count when the next ordinary event was sent:
if prof.hz != 0 {
+ if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
+ cpuprof.addLostAtomic64(lostAtomic64Count)
+ lostAtomic64Count = 0
+ }
cpuprof.add(gp, stk[:n])
}
CL 117057 extended this behavior to include GOARCH == "arm".
Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
the original cpuprof.add in that it neglects to acquire the lock
protecting the profile buffer.
This CL moves the lostAtomic count into the profile buffer and
then lets the existing addExtra calls take care of it, instead of
duplicating the locking logic.
This has caused a steady stream of flakes on the arm builders
for the past 12 months, ever since CL 117057 landed.
Fixes #24991.
Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
---
M src/runtime/cpuprof.go
M src/runtime/proc.go
2 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/src/runtime/cpuprof.go b/src/runtime/cpuprof.go
index 81038f5..9bfdfe7 100644
--- a/src/runtime/cpuprof.go
+++ b/src/runtime/cpuprof.go
@@ -36,9 +36,10 @@
// 300 words per second.
// Hopefully a normal Go thread will get the profiling
// signal at least once every few seconds.
- extra [1000]uintptr
- numExtra int
- lostExtra uint64 // count of frames lost because extra is full
+ extra [1000]uintptr
+ numExtra int
+ lostExtra uint64 // count of frames lost because extra is full
+ lostAtomic uint64 // count of frames lost because of being in atomic64 on mips/arm; updated racily
}
var cpuprof cpuProfile
@@ -94,7 +95,7 @@
}
if prof.hz != 0 { // implies cpuprof.log != nil
- if p.numExtra > 0 || p.lostExtra > 0 {
+ if p.numExtra > 0 || p.lostExtra > 0 || p.lostAtomic > 0 {
p.addExtra()
}
hdr := [1]uint64{1}
@@ -159,18 +160,20 @@
funcPC(_LostExternalCode) + sys.PCQuantum,
funcPC(_ExternalCode) + sys.PCQuantum,
}
- cpuprof.log.write(nil, 0, hdr[:], lostStk[:])
+ p.log.write(nil, 0, hdr[:], lostStk[:])
p.lostExtra = 0
}
-}
-func (p *cpuProfile) addLostAtomic64(count uint64) {
- hdr := [1]uint64{count}
- lostStk := [2]uintptr{
- funcPC(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
- funcPC(_System) + sys.PCQuantum,
+ if p.lostAtomic > 0 {
+ hdr := [1]uint64{p.lostAtomic}
+ lostStk := [2]uintptr{
+ funcPC(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
+ funcPC(_System) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostAtomic = 0
}
- cpuprof.log.write(nil, 0, hdr[:], lostStk[:])
+
}
// CPUProfile panics.
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index b5cf9d4..93d329d 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3634,9 +3634,6 @@
func _LostSIGPROFDuringAtomic64() { _LostSIGPROFDuringAtomic64() }
func _VDSO() { _VDSO() }
-// Counts SIGPROFs received while in atomic64 critical section, on mips{,le}
-var lostAtomic64Count uint64
-
// Called if we receive a SIGPROF signal.
// Called by the signal handler, may run during STW.
//go:nowritebarrierrec
@@ -3654,7 +3651,7 @@
if GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm" {
if f := findfunc(pc); f.valid() {
if hasPrefix(funcname(f), "runtime/internal/atomic") {
- lostAtomic64Count++
+ cpuprof.lostAtomic++
return
}
}
@@ -3794,10 +3791,6 @@
}
if prof.hz != 0 {
- if (GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm") && lostAtomic64Count > 0 {
- cpuprof.addLostAtomic64(lostAtomic64Count)
- lostAtomic64Count = 0
- }
cpuprof.add(gp, stk[:n])
}
getg().m.mallocing--
To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.
TryBots beginning. Status page: https://farmer.golang.org/try?commit=61199f53
Uploaded patch set 2: Commit message was updated.
Russ Cox uploaded patch set #2 to this change.
runtime: fix pprof cpu profile corruption on arm/mips/mipsle
CL 42652 changed the profile handler for mips/mipsle to
avoid recording a profile when in atomic functions, for fear
of interrupting the 32-bit simulation of a 64-bit atomic with
a lock. The profile logger itself uses 64-bit atomics and might
deadlock (#20146).
The change was to accumulate a count of dropped profile events
and then send the count when the next ordinary event was sent:
if prof.hz != 0 {
+ if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
+ cpuprof.addLostAtomic64(lostAtomic64Count)
+ lostAtomic64Count = 0
+ }
cpuprof.add(gp, stk[:n])
}
CL 117057 extended this behavior to include GOARCH == "arm".
Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
the original cpuprof.add in that it neglects to acquire the lock
protecting the profile buffer.
This has caused a steady stream of flakes on the arm builders
for the past 12 months, ever since CL 117057 landed.
This CL moves the lostAtomic count into the profile buffer and
then lets the existing addExtra calls take care of it, instead of
duplicating the locking logic.
Fixes #24991.
Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
---
M src/runtime/cpuprof.go
M src/runtime/proc.go
2 files changed, 16 insertions(+), 20 deletions(-)
To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.
TryBots are happy.
Patch set 2:TryBot-Result +1
Patch set 2:Code-Review +2
Russ Cox merged this change.
runtime: fix pprof cpu profile corruption on arm/mips/mipsle
CL 42652 changed the profile handler for mips/mipsle to
avoid recording a profile when in atomic functions, for fear
of interrupting the 32-bit simulation of a 64-bit atomic with
a lock. The profile logger itself uses 64-bit atomics and might
deadlock (#20146).
The change was to accumulate a count of dropped profile events
and then send the count when the next ordinary event was sent:
if prof.hz != 0 {
+ if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
+ cpuprof.addLostAtomic64(lostAtomic64Count)
+ lostAtomic64Count = 0
+ }
cpuprof.add(gp, stk[:n])
}
CL 117057 extended this behavior to include GOARCH == "arm".
Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
the original cpuprof.add in that it neglects to acquire the lock
protecting the profile buffer.
This has caused a steady stream of flakes on the arm builders
for the past 12 months, ever since CL 117057 landed.
This CL moves the lostAtomic count into the profile buffer and
then lets the existing addExtra calls take care of it, instead of
duplicating the locking logic.
Fixes #24991.
Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
Reviewed-on: https://go-review.googlesource.com/c/go/+/184164
Run-TryBot: Russ Cox <r...@golang.org>
TryBot-Result: Gobot Gobot <go...@golang.org>
Reviewed-by: Cherry Zhang <cher...@google.com>