pprof: what are runtime.scanobject/gentraceback/pcvalue/etc


mhhcbon

Sep 20, 2016, 11:17:37 AM
to golang-nuts
Hi,

while learning to use pprof, I am now facing some intriguing results with lots of runtime.* calls.

Can you help me understand what those calls are and whether I should care about them?

(pprof) top30
8720ms of 14380ms total (60.64%)
Dropped 118 nodes (cum <= 71.90ms)
Showing top 30 nodes out of 143 (cum >= 3800ms)
      flat  flat%   sum%        cum   cum%
     720ms  5.01%  5.01%     1050ms  7.30%  runtime.scanobject
     700ms  4.87%  9.87%     3640ms 25.31%  runtime.gentraceback
     670ms  4.66% 14.53%     1780ms 12.38%  runtime.pcvalue
     490ms  3.41% 17.94%      490ms  3.41%  runtime.readvarint
     420ms  2.92% 20.86%      910ms  6.33%  runtime.step
     420ms  2.92% 23.78%      420ms  2.92%  runtime/internal/atomic.Xchg
     380ms  2.64% 26.43%      380ms  2.64%  runtime.heapBitsForObject
     340ms  2.36% 28.79%     1580ms 10.99%  runtime.mallocgc
     310ms  2.16% 30.95%      360ms  2.50%  runtime.gopark
     270ms  1.88% 32.82%      270ms  1.88%  runtime.heapBitsSetType
     260ms  1.81% 34.63%      510ms  3.55%  runtime.scanblock
     250ms  1.74% 36.37%      250ms  1.74%  runtime/internal/atomic.Cas
     240ms  1.67% 38.04%     4040ms 28.09%  github.com/mh-cbon/chan/stream.(*Transform).Push
     240ms  1.67% 39.71%     1210ms  8.41%  runtime.schedule
     230ms  1.60% 41.31%      230ms  1.60%  runtime.newdefer
     220ms  1.53% 42.84%      320ms  2.23%  runtime.deferreturn
     210ms  1.46% 44.30%      260ms  1.81%  bytes.genSplit
     210ms  1.46% 45.76%      260ms  1.81%  runtime.greyobject
     210ms  1.46% 47.22%     6080ms 42.28%  runtime.systemstack
     200ms  1.39% 48.61%      210ms  1.46%  runtime.acquireSudog
     200ms  1.39% 50.00%     1760ms 12.24%  runtime.scanframeworker
     190ms  1.32% 51.32%      790ms  5.49%  sync.(*Mutex).Lock
     180ms  1.25% 52.57%      180ms  1.25%  runtime.gogo
     180ms  1.25% 53.82%      180ms  1.25%  runtime.memmove
     170ms  1.18% 55.01%      790ms  5.49%  runtime.chanrecv
     170ms  1.18% 56.19%      530ms  3.69%  runtime.lock
     170ms  1.18% 57.37%      170ms  1.18%  runtime.usleep
     160ms  1.11% 58.48%      220ms  1.53%  runtime.siftdownTimer
     160ms  1.11% 59.60%      160ms  1.11%  runtime/internal/atomic.Load
     150ms  1.04% 60.64%     3800ms 26.43%  main.main.func2

Also, I am not totally comfortable with the meanings of "flat" vs "cum",
and those basics are not covered in https://blog.golang.org/profiling-go-programs

Can you take a moment to explain those two notions?

One more question. In this pprof output:

(pprof) list Push
Total: 14.38s
ROUTINE ======================== github.com/mh-cbon/chan/stream.(*Readable).Push in /home/mh-cbon/gow/src/github.com/mh-cbon/chan/stream/reader.go
         0      1.65s (flat, cum) 11.47% of Total
         .          .     79:func (s *Readable) Push(b interface{}) {
         .          .     80:  for s.Paused() {
         .          .     81:    <- time.After(time.Millisecond * 1)
         .          .     82:  }
         .          .     83:  s.writeMutex.Lock()
         .       10ms     84:  defer s.writeMutex.Unlock()
         .          .     85:  for _, o := range s.Pipes {
         .      1.64s     86:    o.Write(b)
         .          .     87:  }
         .          .     88:}
         .          .     89:func (s *Readable) Catch(fn ErrorFunc) Stream {
         .          .     90:  s.E = append(s.E, fn)
         .          .     91:  return s
ROUTINE ======================== github.com/mh-cbon/chan/stream.(*Transform).Push in /home/mh-cbon/gow/src/github.com/mh-cbon/chan/stream/transform.go
     240ms      4.13s (flat, cum) 28.72% of Total
         .          .     72:  } else {
         .          .     73:    s.Push(b)
         .          .     74:  }
         .          .     75:}
         .          .     76:// Push writes data down in the stream.
      40ms       40ms     77:func (s *Transform) Push(b interface{}) {
         .          .     78:  // s.pipeMutex.Lock()
         .          .     79:  // defer s.pipeMutex.Unlock()
     190ms      190ms     80:  for _, o := range s.Pipes {
      10ms      3.90s     81:    o.Write(b)
         .          .     82:  }
         .          .     83:}
         .          .     84:func (s *Transform) Catch(fn ErrorFunc) Stream {
         .          .     85:  s.E = append(s.E, fn)
         .          .     86:  return s



The two methods are very similar, but Transform.Push shows 190ms on its for loop, while Readable.Push shows nothing there.
How should I understand this?


thanks!

Ian Lance Taylor

Sep 20, 2016, 12:37:55 PM
to mhhcbon, golang-nuts
All the top functions in this profile are being run by the garbage collector.
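[Editor's note, not part of the original reply: runtime.scanobject, runtime.mallocgc and runtime.heapBitsForObject are allocation/GC work, and runtime.gentraceback, runtime.pcvalue and runtime.readvarint here are largely the collector walking goroutine stacks. When a profile looks like this, a common remedy is to allocate less per operation, for instance by splitting into a reused scratch slice instead of building a fresh one each call. A minimal sketch with hypothetical names, not code from this thread:]

package main

import "fmt"

// splitLines appends the lines of chunk into a caller-provided scratch
// slice, so the hot path only allocates when the slice must grow.
// (Hypothetical helper for illustration, not from the thread's code.)
func splitLines(dst [][]byte, chunk []byte) [][]byte {
	start := 0
	for i, c := range chunk {
		if c == '\n' {
			dst = append(dst, chunk[start:i])
			start = i + 1
		}
	}
	if start < len(chunk) {
		dst = append(dst, chunk[start:]) // trailing partial line
	}
	return dst
}

func main() {
	scratch := make([][]byte, 0, 16) // allocated once, reused across calls
	scratch = splitLines(scratch[:0], []byte("a\nbb\nccc"))
	fmt.Println(len(scratch)) // 3
}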

The "flat" numbers are the time used by just that function. The "cum"
numbers are the time used by that function and all the functions that
it calls.
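[Editor's note: as a toy sketch of that attribution, not code from this thread:]

package main

import "fmt"

// leaf does all of its work itself: in a CPU profile its time shows up
// in both its flat and its cum columns.
func leaf(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// root does almost no work of its own, but calls leaf: its flat time is
// near zero, while its cum time includes all of leaf's time.
func root(n int) int {
	return leaf(n)
}

func main() {
	fmt.Println(root(10)) // 45
}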
Without more information, I would simply assume that Transform.Push is
being called more often. The times are the total amount of time that
the program is spending in that function.

Ian

mhhcbon

Sep 21, 2016, 1:29:11 PM
to golang-nuts, cpasmabo...@gmail.com
Hi,

thanks again for all the explanations.

I kept digging, and I am now seeing the same behavior as this thread:

https://groups.google.com/forum/#!topic/golang-nuts/P1EFc4kMBfo

I run the tests with memprofile enabled, then use pprof to view the results, but the numbers are all 0.

$ go test -v -bench=. -cpuprofile=cpu.out -memprofile mem.out -benchmem -short stream/*
=== RUN   TestReader
--- PASS: TestReader (0.00s)
=== RUN   TestReader2
--- PASS: TestReader2 (0.00s)
BenchmarkByLine-4         100000         14654 ns/op        2248 B/op          39 allocs/op
PASS
ok      command-line-arguments    1.635s


$ go tool pprof stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top10
0 of 0 total (    0%)

      flat  flat%   sum%        cum   cum%
         0     0%     0%          0     0%  bytes.(*Buffer).WriteString
         0     0%     0%          0     0%  bytes.(*Buffer).grow
         0     0%     0%          0     0%  bytes.Split
         0     0%     0%          0     0%  bytes.genSplit
         0     0%     0%          0     0%  bytes.makeSlice
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Close
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Pipe
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Push
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Read
         0     0%     0%          0     0%  command-line-arguments.(*Readable).ReadClose


I tried to put b.ReportAllocs() in front of the benchmark loop. It did not help.

At some point I added variations to the benchmark:

$ go test -v -bench=. -cpuprofile=cpu.out -memprofile mem.out -benchmem -short stream/*
=== RUN   TestReader
--- PASS: TestReader (0.00s)
=== RUN   TestReader2
--- PASS: TestReader2 (0.00s)
BenchmarkReader1-4         500000          2337 ns/op         976 B/op          12 allocs/op
BenchmarkByLine-4          100000         16560 ns/op        2248 B/op          39 allocs/op
BenchmarkByLine2-4        1000000          2113 ns/op         848 B/op          13 allocs/op
PASS
ok      command-line-arguments    5.153s


which gives

$ go tool pprof stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top50
512.20kB of 512.20kB total (  100%)
Dropped 29 nodes (cum <= 2.56kB)

      flat  flat%   sum%        cum   cum%
  512.20kB   100%   100%   512.20kB   100%  runtime.malg
         0     0%   100%   512.20kB   100%  runtime.allocm
         0     0%   100%   512.20kB   100%  runtime.newm
         0     0%   100%   512.20kB   100%  runtime.startTheWorldWithSema
         0     0%   100%   512.20kB   100%  runtime.systemstack


$ go tool pprof -inuse_objects stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top100
1260 of 1260 total (  100%)
Dropped 29 nodes (cum <= 6)

      flat  flat%   sum%        cum   cum%
      1260   100%   100%       1260   100%  runtime.malg
         0     0%   100%       1260   100%  runtime.allocm
         0     0%   100%       1260   100%  runtime.newm
         0     0%   100%       1260   100%  runtime.startTheWorldWithSema
         0     0%   100%       1260   100%  runtime.systemstack
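[Editor's note for readers who hit the same wall: go tool pprof defaults to the -inuse_space view for memory profiles, i.e. memory still live when the profile was written. A finished benchmark has usually freed everything it allocated, so the -alloc_space / -alloc_objects views (e.g. `go tool pprof -alloc_space stream.test mem.out`) are usually what you want here. The heap profiler is also sampled rather than exhaustive, which is what the sketch below adjusts:]

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// By default the heap profiler records roughly one sample per
	// 512 KiB allocated (note the 512.20kB granularity in the output
	// above), so a short benchmark may record almost nothing.
	// Setting the rate to 1 records every allocation: exhaustive,
	// but slow, so only do this while profiling.
	runtime.MemProfileRate = 1
	fmt.Println(runtime.MemProfileRate)
}

With `go test` the same knob is available directly as the -memprofilerate=1 flag.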


I am confused, because the tested func is definitely allocating:

// BytesByLine emits chunks of []byte split by line (\n).
func BytesByLine() *Transform {
  var buf []byte
  var EOL = []byte("\n")

  byLine := func() [][]byte {
    ret := make([][]byte, 0)
    lines := bytes.Split(buf, EOL)
    isEOL := len(buf) > 0 && buf[len(buf)-1:][0] == EOL[0]
    for i, line := range lines {
      if i == len(lines)-1 {
        if isEOL && len(line) > 0 {
          buf = buf[:0]
          ret = append(ret, line)
        } else {
          buf = line
        }
      } else {
        ret = append(ret, line)
      }
    }
    return ret
  }

  var TT TransformFunc
  TT = func(s Stream, data interface{}) {
    if val, ok := data.([]byte); ok {
      buf = append(buf, val...)
      for _, line := range byLine() {
        s.Push(line)
      }
    } else {
      s.Emit(errors.New("Value must be of type []byte"))
    }
  }

  var FF FlushFunc
  FF = func(s Stream) {
    for _, line := range byLine() {
      s.Push(line)
    }
    if len(buf) > 0 {
      s.Push(buf)
    }
  }

  return NewTransformFlush(TT, FF)
}

mhhcbon

Sep 21, 2016, 1:31:36 PM
to golang-nuts, cpasmabo...@gmail.com
... Sorry for the publishing issues..


// BytesByLine emits chunks of []byte split by line (\n).