pprof: what are runtime.scanobject/gentraceback/pcvalue/etc


mhhcbon

Sep 20, 2016, 11:17:37 AM
to golang-nuts
Hi,

while learning to use pprof, I am now facing some intriguing results with lots of runtime.* calls.

Can you help me understand what those calls are and whether I should care about them?

(pprof) top30
8720ms of 14380ms total (60.64%)
Dropped 118 nodes (cum <= 71.90ms)
Showing top 30 nodes out of 143 (cum >= 3800ms)
      flat  flat%   sum%        cum   cum%
     720ms  5.01%  5.01%     1050ms  7.30%  runtime.scanobject
     700ms  4.87%  9.87%     3640ms 25.31%  runtime.gentraceback
     670ms  4.66% 14.53%     1780ms 12.38%  runtime.pcvalue
     490ms  3.41% 17.94%      490ms  3.41%  runtime.readvarint
     420ms  2.92% 20.86%      910ms  6.33%  runtime.step
     420ms  2.92% 23.78%      420ms  2.92%  runtime/internal/atomic.Xchg
     380ms  2.64% 26.43%      380ms  2.64%  runtime.heapBitsForObject
     340ms  2.36% 28.79%     1580ms 10.99%  runtime.mallocgc
     310ms  2.16% 30.95%      360ms  2.50%  runtime.gopark
     270ms  1.88% 32.82%      270ms  1.88%  runtime.heapBitsSetType
     260ms  1.81% 34.63%      510ms  3.55%  runtime.scanblock
     250ms  1.74% 36.37%      250ms  1.74%  runtime/internal/atomic.Cas
     240ms  1.67% 38.04%     4040ms 28.09%  github.com/mh-cbon/chan/stream.(*Transform).Push
     240ms  1.67% 39.71%     1210ms  8.41%  runtime.schedule
     230ms  1.60% 41.31%      230ms  1.60%  runtime.newdefer
     220ms  1.53% 42.84%      320ms  2.23%  runtime.deferreturn
     210ms  1.46% 44.30%      260ms  1.81%  bytes.genSplit
     210ms  1.46% 45.76%      260ms  1.81%  runtime.greyobject
     210ms  1.46% 47.22%     6080ms 42.28%  runtime.systemstack
     200ms  1.39% 48.61%      210ms  1.46%  runtime.acquireSudog
     200ms  1.39% 50.00%     1760ms 12.24%  runtime.scanframeworker
     190ms  1.32% 51.32%      790ms  5.49%  sync.(*Mutex).Lock
     180ms  1.25% 52.57%      180ms  1.25%  runtime.gogo
     180ms  1.25% 53.82%      180ms  1.25%  runtime.memmove
     170ms  1.18% 55.01%      790ms  5.49%  runtime.chanrecv
     170ms  1.18% 56.19%      530ms  3.69%  runtime.lock
     170ms  1.18% 57.37%      170ms  1.18%  runtime.usleep
     160ms  1.11% 58.48%      220ms  1.53%  runtime.siftdownTimer
     160ms  1.11% 59.60%      160ms  1.11%  runtime/internal/atomic.Load
     150ms  1.04% 60.64%     3800ms 26.43%  main.main.func2

Also, I am not totally comfortable with the meanings of "flat" vs "cum",
and those basics are not covered in https://blog.golang.org/profiling-go-programs

Can you take a moment to explain those two notions?

One more question. In this pprof output:

(pprof) list Push
Total: 14.38s
ROUTINE ======================== github.com/mh-cbon/chan/stream.(*Readable).Push in /home/mh-cbon/gow/src/github.com/mh-cbon/chan/stream/reader.go
         0      1.65s (flat, cum) 11.47% of Total
         .          .     79:func (s *Readable) Push(b interface{}) {
         .          .     80:  for s.Paused() {
         .          .     81:    <- time.After(time.Millisecond * 1)
         .          .     82:  }
         .          .     83:  s.writeMutex.Lock()
         .       10ms     84:  defer s.writeMutex.Unlock()
         .          .     85:  for _, o := range s.Pipes {
         .      1.64s     86:    o.Write(b)
         .          .     87:  }
         .          .     88:}
         .          .     89:func (s *Readable) Catch(fn ErrorFunc) Stream {
         .          .     90:  s.E = append(s.E, fn)
         .          .     91:  return s
ROUTINE ======================== github.com/mh-cbon/chan/stream.(*Transform).Push in /home/mh-cbon/gow/src/github.com/mh-cbon/chan/stream/transform.go
     240ms      4.13s (flat, cum) 28.72% of Total
         .          .     72:  } else {
         .          .     73:    s.Push(b)
         .          .     74:  }
         .          .     75:}
         .          .     76:// Push writes data down in the stream.
      40ms       40ms     77:func (s *Transform) Push(b interface{}) {
         .          .     78:  // s.pipeMutex.Lock()
         .          .     79:  // defer s.pipeMutex.Unlock()
     190ms      190ms     80:  for _, o := range s.Pipes {
      10ms      3.90s     81:    o.Write(b)
         .          .     82:  }
         .          .     83:}
         .          .     84:func (s *Transform) Catch(fn ErrorFunc) Stream {
         .          .     85:  s.E = append(s.E, fn)
         .          .     86:  return s



The two methods are very similar, but Transform.Push shows 190ms on its for loop, while Readable.Push shows nothing there.
How should I understand this?


thanks!

Ian Lance Taylor

Sep 20, 2016, 12:37:55 PM
to mhhcbon, golang-nuts
All the top functions in this profile are being run by the garbage collector.
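[Editor's note, not part of the original reply: runtime.scanobject, runtime.mallocgc and runtime.heapBitsForObject are allocation/GC work, and runtime.gentraceback, runtime.pcvalue and runtime.readvarint here are largely the collector walking goroutine stacks. When a profile looks like this, a common remedy is to allocate less per operation, for instance by splitting into a reused scratch slice instead of building a fresh one each call. A minimal sketch with hypothetical names, not code from this thread:]

package main

import "fmt"

// splitLines appends the lines of chunk into a caller-provided scratch
// slice, so the hot path only allocates when the slice must grow.
// (Hypothetical helper for illustration, not from the thread's code.)
func splitLines(dst [][]byte, chunk []byte) [][]byte {
	start := 0
	for i, c := range chunk {
		if c == '\n' {
			dst = append(dst, chunk[start:i])
			start = i + 1
		}
	}
	if start < len(chunk) {
		dst = append(dst, chunk[start:]) // trailing partial line
	}
	return dst
}

func main() {
	scratch := make([][]byte, 0, 16) // allocated once, reused across calls
	scratch = splitLines(scratch[:0], []byte("a\nbb\nccc"))
	fmt.Println(len(scratch)) // 3
}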

The "flat" numbers are the time used by just that function. The "cum"
numbers are the time used by that function and all the functions that
it calls.
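[Editor's note: as a toy sketch of that attribution, not code from this thread:]

package main

import "fmt"

// leaf does all of its work itself: in a CPU profile its time shows up
// in both its flat and its cum columns.
func leaf(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// root does almost no work of its own, but calls leaf: its flat time is
// near zero, while its cum time includes all of leaf's time.
func root(n int) int {
	return leaf(n)
}

func main() {
	fmt.Println(root(10)) // 45
}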
Without more information, I would simply assume that Transform.Push is
being called more often. The times are the total amount of time that
the program is spending in that function.

Ian

mhhcbon

Sep 21, 2016, 1:29:11 PM
to golang-nuts, cpasmabo...@gmail.com
Hi,

thanks again for all the explanations.

I kept digging, and I am now seeing the same behavior as this thread:

https://groups.google.com/forum/#!topic/golang-nuts/P1EFc4kMBfo

I run the tests with memprofile enabled, then use pprof to view the results, but the numbers are all 0.

$ go test -v -bench=. -cpuprofile=cpu.out -memprofile mem.out -benchmem -short stream/*
=== RUN   TestReader
--- PASS: TestReader (0.00s)
=== RUN   TestReader2
--- PASS: TestReader2 (0.00s)
BenchmarkByLine-4         100000         14654 ns/op        2248 B/op          39 allocs/op
PASS
ok      command-line-arguments    1.635s


$ go tool pprof stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top10
0 of 0 total (    0%)

      flat  flat%   sum%        cum   cum%
         0     0%     0%          0     0%  bytes.(*Buffer).WriteString
         0     0%     0%          0     0%  bytes.(*Buffer).grow
         0     0%     0%          0     0%  bytes.Split
         0     0%     0%          0     0%  bytes.genSplit
         0     0%     0%          0     0%  bytes.makeSlice
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Close
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Pipe
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Push
         0     0%     0%          0     0%  command-line-arguments.(*Readable).Read
         0     0%     0%          0     0%  command-line-arguments.(*Readable).ReadClose


I tried to put b.ReportAllocs() in front of the benchmark loop. It did not help.

At some point I added variations to the benchmark:

$ go test -v -bench=. -cpuprofile=cpu.out -memprofile mem.out -benchmem -short stream/*
=== RUN   TestReader
--- PASS: TestReader (0.00s)
=== RUN   TestReader2
--- PASS: TestReader2 (0.00s)
BenchmarkReader1-4         500000          2337 ns/op         976 B/op          12 allocs/op
BenchmarkByLine-4          100000         16560 ns/op        2248 B/op          39 allocs/op
BenchmarkByLine2-4        1000000          2113 ns/op         848 B/op          13 allocs/op
PASS
ok      command-line-arguments    5.153s


which gives

$ go tool pprof stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top50
512.20kB of 512.20kB total (  100%)
Dropped 29 nodes (cum <= 2.56kB)

      flat  flat%   sum%        cum   cum%
  512.20kB   100%   100%   512.20kB   100%  runtime.malg
         0     0%   100%   512.20kB   100%  runtime.allocm
         0     0%   100%   512.20kB   100%  runtime.newm
         0     0%   100%   512.20kB   100%  runtime.startTheWorldWithSema
         0     0%   100%   512.20kB   100%  runtime.systemstack


$ go tool pprof -inuse_objects stream.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) top100
1260 of 1260 total (  100%)
Dropped 29 nodes (cum <= 6)

      flat  flat%   sum%        cum   cum%
      1260   100%   100%       1260   100%  runtime.malg
         0     0%   100%       1260   100%  runtime.allocm
         0     0%   100%       1260   100%  runtime.newm
         0     0%   100%       1260   100%  runtime.startTheWorldWithSema
         0     0%   100%       1260   100%  runtime.systemstack
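[Editor's note for readers who hit the same wall: go tool pprof defaults to the -inuse_space view for memory profiles, i.e. memory still live when the profile was written. A finished benchmark has usually freed everything it allocated, so the -alloc_space / -alloc_objects views (e.g. `go tool pprof -alloc_space stream.test mem.out`) are usually what you want here. The heap profiler is also sampled rather than exhaustive, which is what the sketch below adjusts:]

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// By default the heap profiler records roughly one sample per
	// 512 KiB allocated (note the 512.20kB granularity in the output
	// above), so a short benchmark may record almost nothing.
	// Setting the rate to 1 records every allocation: exhaustive,
	// but slow, so only do this while profiling.
	runtime.MemProfileRate = 1
	fmt.Println(runtime.MemProfileRate)
}

With `go test` the same knob is available directly as the -memprofilerate=1 flag.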


I am confused, because the tested func is definitely allocating:

// BytesByLine emits chunks of []byte split by line (\n).
func BytesByLine() *Transform {
  var buf []byte
  var EOL = []byte("\n")

  byLine := func() [][]byte {
    ret := make([][]byte, 0)
    lines := bytes.Split(buf, EOL)
    isEOL := len(buf) > 0 && buf[len(buf)-1:][0] == EOL[0]
    for i, line := range lines {
      if i == len(lines)-1 {
        if isEOL && len(line) > 0 {
          buf = buf[:0]
          ret = append(ret, line)
        } else {
          buf = line
        }
      } else {
        ret = append(ret, line)
      }
    }
    return ret
  }

  var TT TransformFunc
  TT = func(s Stream, data interface{}) {
    if val, ok := data.([]byte); ok {
      buf = append(buf, val...)
      for _, line := range byLine() {
        s.Push(line)
      }
    } else {
      s.Emit(errors.New("Value must be of type []byte"))
    }
  }

  var FF FlushFunc
  FF = func(s Stream) {
    for _, line := range byLine() {
      s.Push(line)
    }
    if len(buf) > 0 {
      s.Push(buf)
    }
  }

  return NewTransformFlush(TT, FF)
}

mhhcbon

Sep 21, 2016, 1:31:36 PM
to golang-nuts, cpasmabo...@gmail.com
... Sorry for the publishing issues..


// BytesByLine emits chunks of []byte split by line (\n).