[go] runtime: fix pprof cpu profile corruption on arm/mips/mipsle

70 views
Skip to first unread message

Russ Cox (Gerrit)

unread,
Jun 28, 2019, 1:54:49 AM6/28/19
to Austin Clements, Ian Lance Taylor, goph...@pubsubhelper.golang.org, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, Cherry Zhang, Russ Cox, golang-co...@googlegroups.com

Russ Cox would like Austin Clements to review this change.

View Change

runtime: fix pprof cpu profile corruption on arm/mips/mipsle

CL 42652 changed the profile handler for mips/mipsle to
avoid recording a profile when in atomic functions, for fear
of interrupting the 32-bit simulation of a 64-bit atomic with
a lock. The profile logger itself uses 64-bit atomics and might
deadlock (#20146).

The change was to accumulate a count of dropped profile events
and then send the count when the next ordinary event was sent:

if prof.hz != 0 {
+ if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
+ cpuprof.addLostAtomic64(lostAtomic64Count)
+ lostAtomic64Count = 0
+ }
cpuprof.add(gp, stk[:n])
}

CL 117057 extended this behavior to include GOARCH == "arm".

Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
the original cpuprof.add in that it neglects to acquire the lock
protecting the profile buffer.

This CL moves the lostAtomic count into the profile buffer and
then lets the existing addExtra calls take care of it, instead of
duplicating the locking logic.

This has caused a steady stream of flakes on the arm builders
for the past 12 months, ever since CL 117057 landed.

Fixes #24991.

Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
---
M src/runtime/cpuprof.go
M src/runtime/proc.go
2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/src/runtime/cpuprof.go b/src/runtime/cpuprof.go
index 81038f5..9bfdfe7 100644
--- a/src/runtime/cpuprof.go
+++ b/src/runtime/cpuprof.go
@@ -36,9 +36,10 @@
// 300 words per second.
// Hopefully a normal Go thread will get the profiling
// signal at least once every few seconds.
- extra [1000]uintptr
- numExtra int
- lostExtra uint64 // count of frames lost because extra is full
+ extra [1000]uintptr
+ numExtra int
+ lostExtra uint64 // count of frames lost because extra is full
+ lostAtomic uint64 // count of frames lost because of being in atomic64 on mips/arm; updated racily
}

var cpuprof cpuProfile
@@ -94,7 +95,7 @@
}

if prof.hz != 0 { // implies cpuprof.log != nil
- if p.numExtra > 0 || p.lostExtra > 0 {
+ if p.numExtra > 0 || p.lostExtra > 0 || p.lostAtomic > 0 {
p.addExtra()
}
hdr := [1]uint64{1}
@@ -159,18 +160,20 @@
funcPC(_LostExternalCode) + sys.PCQuantum,
funcPC(_ExternalCode) + sys.PCQuantum,
}
- cpuprof.log.write(nil, 0, hdr[:], lostStk[:])
+ p.log.write(nil, 0, hdr[:], lostStk[:])
p.lostExtra = 0
}
-}

-func (p *cpuProfile) addLostAtomic64(count uint64) {
- hdr := [1]uint64{count}
- lostStk := [2]uintptr{
- funcPC(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
- funcPC(_System) + sys.PCQuantum,
+ if p.lostAtomic > 0 {
+ hdr := [1]uint64{p.lostAtomic}
+ lostStk := [2]uintptr{
+ funcPC(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
+ funcPC(_System) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostAtomic = 0
}
- cpuprof.log.write(nil, 0, hdr[:], lostStk[:])
+
}

// CPUProfile panics.
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index b5cf9d4..93d329d 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3634,9 +3634,6 @@
func _LostSIGPROFDuringAtomic64() { _LostSIGPROFDuringAtomic64() }
func _VDSO() { _VDSO() }

-// Counts SIGPROFs received while in atomic64 critical section, on mips{,le}
-var lostAtomic64Count uint64
-
// Called if we receive a SIGPROF signal.
// Called by the signal handler, may run during STW.
//go:nowritebarrierrec
@@ -3654,7 +3651,7 @@
if GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm" {
if f := findfunc(pc); f.valid() {
if hasPrefix(funcname(f), "runtime/internal/atomic") {
- lostAtomic64Count++
+ cpuprof.lostAtomic++
return
}
}
@@ -3794,10 +3791,6 @@
}

if prof.hz != 0 {
- if (GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm") && lostAtomic64Count > 0 {
- cpuprof.addLostAtomic64(lostAtomic64Count)
- lostAtomic64Count = 0
- }
cpuprof.add(gp, stk[:n])
}
getg().m.mallocing--

To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
Gerrit-Change-Number: 184164
Gerrit-PatchSet: 1
Gerrit-Owner: Russ Cox <r...@golang.org>
Gerrit-Reviewer: Austin Clements <aus...@google.com>
Gerrit-Reviewer: Russ Cox <r...@golang.org>
Gerrit-CC: Bryan C. Mills <bcm...@google.com>
Gerrit-CC: Cherry Zhang <cher...@google.com>
Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
Gerrit-MessageType: newchange

Gobot Gobot (Gerrit)

unread,
Jun 28, 2019, 1:55:02 AM6/28/19
to Russ Cox, goph...@pubsubhelper.golang.org, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, Cherry Zhang, golang-co...@googlegroups.com

TryBots beginning. Status page: https://farmer.golang.org/try?commit=61199f53

View Change

    To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
    Gerrit-Change-Number: 184164
    Gerrit-PatchSet: 1
    Gerrit-Owner: Russ Cox <r...@golang.org>
    Gerrit-Reviewer: Austin Clements <aus...@google.com>
    Gerrit-Reviewer: Russ Cox <r...@golang.org>
    Gerrit-CC: Bryan C. Mills <bcm...@google.com>
    Gerrit-CC: Cherry Zhang <cher...@google.com>
    Gerrit-CC: Gobot Gobot <go...@golang.org>
    Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
    Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
    Gerrit-Comment-Date: Fri, 28 Jun 2019 05:54:58 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: No
    Gerrit-MessageType: comment

    Russ Cox (Gerrit)

    unread,
    Jun 28, 2019, 1:56:06 AM6/28/19
    to Russ Cox, goph...@pubsubhelper.golang.org, Gobot Gobot, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, Cherry Zhang, golang-co...@googlegroups.com

    Uploaded patch set 2: Commit message was updated.

    View Change

      To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
      Gerrit-Change-Number: 184164
      Gerrit-PatchSet: 2
      Gerrit-Owner: Russ Cox <r...@golang.org>
      Gerrit-Reviewer: Austin Clements <aus...@google.com>
      Gerrit-Reviewer: Russ Cox <r...@golang.org>
      Gerrit-CC: Bryan C. Mills <bcm...@google.com>
      Gerrit-CC: Cherry Zhang <cher...@google.com>
      Gerrit-CC: Gobot Gobot <go...@golang.org>
      Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
      Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
      Gerrit-Comment-Date: Fri, 28 Jun 2019 05:56:01 +0000

      Russ Cox (Gerrit)

      unread,
      Jun 28, 2019, 1:56:07 AM6/28/19
      to Russ Cox, Austin Clements, goph...@pubsubhelper.golang.org, Bryan C. Mills, Vladimir Stefanovic, Gobot Gobot, Hyang-Ah Hana Kim, Cherry Zhang, golang-co...@googlegroups.com

      Russ Cox uploaded patch set #2 to this change.

      View Change

      runtime: fix pprof cpu profile corruption on arm/mips/mipsle

      CL 42652 changed the profile handler for mips/mipsle to
      avoid recording a profile when in atomic functions, for fear
      of interrupting the 32-bit simulation of a 64-bit atomic with
      a lock. The profile logger itself uses 64-bit atomics and might
      deadlock (#20146).

      The change was to accumulate a count of dropped profile events
      and then send the count when the next ordinary event was sent:

      if prof.hz != 0 {
      + if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
      + cpuprof.addLostAtomic64(lostAtomic64Count)
      + lostAtomic64Count = 0
      + }
      cpuprof.add(gp, stk[:n])
      }

      CL 117057 extended this behavior to include GOARCH == "arm".

      Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
      the original cpuprof.add in that it neglects to acquire the lock
      protecting the profile buffer.

      This has caused a steady stream of flakes on the arm builders
      for the past 12 months, ever since CL 117057 landed.

      This CL moves the lostAtomic count into the profile buffer and
      then lets the existing addExtra calls take care of it, instead of
      duplicating the locking logic.

      Fixes #24991.

      Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
      ---
      M src/runtime/cpuprof.go
      M src/runtime/proc.go
      2 files changed, 16 insertions(+), 20 deletions(-)

      To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
      Gerrit-Change-Number: 184164
      Gerrit-PatchSet: 2
      Gerrit-Owner: Russ Cox <r...@golang.org>
      Gerrit-Reviewer: Austin Clements <aus...@google.com>
      Gerrit-Reviewer: Russ Cox <r...@golang.org>
      Gerrit-CC: Bryan C. Mills <bcm...@google.com>
      Gerrit-CC: Cherry Zhang <cher...@google.com>
      Gerrit-CC: Gobot Gobot <go...@golang.org>
      Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
      Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
      Gerrit-MessageType: newpatchset

      Gobot Gobot (Gerrit)

      unread,
      Jun 28, 2019, 1:56:19 AM6/28/19
      to Russ Cox, goph...@pubsubhelper.golang.org, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, Cherry Zhang, golang-co...@googlegroups.com

      TryBots beginning. Status page: https://farmer.golang.org/try?commit=7c5187cc

      View Change

        To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

        Gerrit-Project: go
        Gerrit-Branch: master
        Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
        Gerrit-Change-Number: 184164
        Gerrit-PatchSet: 2
        Gerrit-Owner: Russ Cox <r...@golang.org>
        Gerrit-Reviewer: Austin Clements <aus...@google.com>
        Gerrit-Reviewer: Russ Cox <r...@golang.org>
        Gerrit-CC: Bryan C. Mills <bcm...@google.com>
        Gerrit-CC: Cherry Zhang <cher...@google.com>
        Gerrit-CC: Gobot Gobot <go...@golang.org>
        Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
        Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
        Gerrit-Comment-Date: Fri, 28 Jun 2019 05:56:16 +0000

        Gobot Gobot (Gerrit)

        unread,
        Jun 28, 2019, 2:06:50 AM6/28/19
        to Russ Cox, goph...@pubsubhelper.golang.org, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, Cherry Zhang, golang-co...@googlegroups.com

        TryBots are happy.

        Patch set 2:TryBot-Result +1

        View Change

          To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

          Gerrit-Project: go
          Gerrit-Branch: master
          Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
          Gerrit-Change-Number: 184164
          Gerrit-PatchSet: 2
          Gerrit-Owner: Russ Cox <r...@golang.org>
          Gerrit-Reviewer: Austin Clements <aus...@google.com>
          Gerrit-Reviewer: Gobot Gobot <go...@golang.org>
          Gerrit-Reviewer: Russ Cox <r...@golang.org>
          Gerrit-CC: Bryan C. Mills <bcm...@google.com>
          Gerrit-CC: Cherry Zhang <cher...@google.com>
          Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
          Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
          Gerrit-Comment-Date: Fri, 28 Jun 2019 06:06:44 +0000
          Gerrit-HasComments: No
          Gerrit-Has-Labels: Yes
          Gerrit-MessageType: comment

          Cherry Zhang (Gerrit)

          unread,
          Jun 28, 2019, 6:06:17 AM6/28/19
          to Russ Cox, goph...@pubsubhelper.golang.org, Gobot Gobot, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, golang-co...@googlegroups.com

          Patch set 2:Code-Review +2

          View Change

            To view, visit change 184164. To unsubscribe, or for help writing mail filters, visit settings.

            Gerrit-Project: go
            Gerrit-Branch: master
            Gerrit-Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
            Gerrit-Change-Number: 184164
            Gerrit-PatchSet: 2
            Gerrit-Owner: Russ Cox <r...@golang.org>
            Gerrit-Reviewer: Austin Clements <aus...@google.com>
            Gerrit-Reviewer: Cherry Zhang <cher...@google.com>
            Gerrit-Reviewer: Gobot Gobot <go...@golang.org>
            Gerrit-Reviewer: Russ Cox <r...@golang.org>
            Gerrit-CC: Bryan C. Mills <bcm...@google.com>
            Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
            Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
            Gerrit-Comment-Date: Fri, 28 Jun 2019 10:06:13 +0000

            Russ Cox (Gerrit)

            unread,
            Jun 28, 2019, 4:09:50 PM6/28/19
            to Russ Cox, goph...@pubsubhelper.golang.org, golang-...@googlegroups.com, Cherry Zhang, Gobot Gobot, Austin Clements, Bryan C. Mills, Vladimir Stefanovic, Hyang-Ah Hana Kim, golang-co...@googlegroups.com

            Russ Cox merged this change.

            View Change

            Approvals: Cherry Zhang: Looks good to me, approved Russ Cox: Run TryBots Gobot Gobot: TryBots succeeded
            runtime: fix pprof cpu profile corruption on arm/mips/mipsle

            CL 42652 changed the profile handler for mips/mipsle to
            avoid recording a profile when in atomic functions, for fear
            of interrupting the 32-bit simulation of a 64-bit atomic with
            a lock. The profile logger itself uses 64-bit atomics and might
            deadlock (#20146).

            The change was to accumulate a count of dropped profile events
            and then send the count when the next ordinary event was sent:

            if prof.hz != 0 {
            + if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
            + cpuprof.addLostAtomic64(lostAtomic64Count)
            + lostAtomic64Count = 0
            + }
            cpuprof.add(gp, stk[:n])
            }

            CL 117057 extended this behavior to include GOARCH == "arm".

            Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
            the original cpuprof.add in that it neglects to acquire the lock
            protecting the profile buffer.

            This has caused a steady stream of flakes on the arm builders
            for the past 12 months, ever since CL 117057 landed.

            This CL moves the lostAtomic count into the profile buffer and
            then lets the existing addExtra calls take care of it, instead of
            duplicating the locking logic.

            Fixes #24991.

            Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
            Reviewed-on: https://go-review.googlesource.com/c/go/+/184164
            Run-TryBot: Russ Cox <r...@golang.org>
            TryBot-Result: Gobot Gobot <go...@golang.org>
            Reviewed-by: Cherry Zhang <cher...@google.com>
            Gerrit-PatchSet: 3
            Gerrit-Owner: Russ Cox <r...@golang.org>
            Gerrit-Reviewer: Austin Clements <aus...@google.com>
            Gerrit-Reviewer: Cherry Zhang <cher...@google.com>
            Gerrit-Reviewer: Gobot Gobot <go...@golang.org>
            Gerrit-Reviewer: Russ Cox <r...@golang.org>
            Gerrit-CC: Bryan C. Mills <bcm...@google.com>
            Gerrit-CC: Hyang-Ah Hana Kim <hya...@gmail.com>
            Gerrit-CC: Vladimir Stefanovic <vladimir....@rt-rk.com>
            Gerrit-MessageType: merged
            Reply all
            Reply to author
            Forward
            0 new messages