Help interpreting pprof output

368 views
Skip to first unread message

Jens-Uwe Mager

unread,
Oct 24, 2011, 10:47:22 AM10/24/11
to golan...@googlegroups.com
Hi!

I am attempting to make a program faster that reads about ~700MB of text data (6 float64 values per line) and puts that into a special binary file 8of about 84MB) for further processing. I did use pprof on the program and it looks like this:

(pprof) top20
Total: 24567 samples
   24566 100.0% 100.0%    24566 100.0% runtime.mach_semaphore_wait
       1   0.0% 100.0%        1   0.0% syscall.Syscall
       0   0.0% 100.0%        1   0.0% bufio..ReadRune
       0   0.0% 100.0%        1   0.0% bufio..fill
       0   0.0% 100.0%        1   0.0% fmt..ReadRune
       0   0.0% 100.0%        1   0.0% fmt..accept
       0   0.0% 100.0%        1   0.0% fmt..consume
       0   0.0% 100.0%        1   0.0% fmt..doScan
       0   0.0% 100.0%        1   0.0% fmt..floatToken
       0   0.0% 100.0%        1   0.0% fmt..getRune
       0   0.0% 100.0%        1   0.0% fmt..scanOne
       0   0.0% 100.0%        1   0.0% fmt.Fscanln
       0   0.0% 100.0%    24566 100.0% getprofile
       0   0.0% 100.0%        1   0.0% main.main
       0   0.0% 100.0%        1   0.0% main.readWAM85File
       0   0.0% 100.0%        1   0.0% os..Read
       0   0.0% 100.0%        1   0.0% os..read
       0   0.0% 100.0%    24566 100.0% runtime.CPUProfile
       0   0.0% 100.0%    24566 100.0% runtime.mach_semacquire
       0   0.0% 100.0%        1   0.0% runtime.mainstart

I do understand that most of time is taken by reading the lines with the float values, but I get the feeling that something is wrong with the pprof output. I used today's tip to compile the go program on darwin amd64. Any idea why the profile output looks so weird?

Andy Balholm

unread,
Oct 24, 2011, 1:11:06 PM10/24/11
to golan...@googlegroups.com
I don't know about the pprof problem (it looks strange to me, but I'm no expert on pprof), but one potential performance bottleneck I see is bufio..ReadRune. If all you're after is floating-point numbers that are written conventionally (with digits 0-9, decimal point, 'e' and '-'), you might be better off doing your parsing in ASCII rather than Unicode, because all the characters you care about are plain ASCII. Then you wouldn't have the overhead of calling functions that decode UTF-8 bytes to runes.

Jens-Uwe Mager

unread,
Oct 24, 2011, 1:21:57 PM10/24/11
to golan...@googlegroups.com
Ok, I have looked into package fmt and I see no obvious way to do the scanning in plain ASCII, and hints?

Russ Cox

unread,
Oct 24, 2011, 2:00:53 PM10/24/11
to golan...@googlegroups.com
You are using a Mac.

OS X's setitimer implementation is broken, so profiling is too.
The data you get is sometimes quite bogus, as in this case.

http://code.google.com/p/go/source/detail?r=35b716c94225
has some details.

Russ

Andy Balholm

unread,
Oct 24, 2011, 2:40:50 PM10/24/11
to golan...@googlegroups.com
Depending on the format of the input file, I would try replacing the calls to fmt.Scan with a combination of bufio.Reader.ReadString (to read lines from the file), strings.Split (to divide it into fields at spaces, tabs, commas, or whatever separator your file uses), and strconv.Atof64 (to convert the numbers from strings to floats).

However, this will allocate a string for each line. It is possible that the added GC overhead will outweigh the advantage of not doing the Unicode conversions. You'll need to benchmark it to see.

Or you could write the floating-point conversion yourself, optimized for the specific number format that is used in your file. It just depends where the tradeoff point is between programmer time and computer time.

Jens-Uwe Mager

unread,
Oct 24, 2011, 5:00:31 PM10/24/11
to golan...@googlegroups.com, r...@golang.org
I tried running the profiling on a gentoo Linux system and get the following result:

runtime: memory allocated by OS not in usable range
runtime: out of memory: cannot allocate 11141120-byte block (528220160 in use)
throw: out of memory

goroutine 1 [running]:
main.readWAM85File(0x34756e20, 0x19, 0x0, 0x0, 0x0, ...)
        /home/jum/wam85/wam85.go:482 +0x14a
main.main()
        /home/jum/wam85/wam85.go:121 +0x5e3

goroutine 2 [syscall]:
runtime/pprof.profileWriter(0x18618220, 0x18618200)
        /usr/local/go/src/pkg/runtime/pprof/pprof.go:154 +0x22
created by runtime/pprof.StartCPUProfile
        /usr/local/go/src/pkg/runtime/pprof/pprof.go:148 +0x140


Jens-Uwe Mager

unread,
Oct 24, 2011, 7:13:55 PM10/24/11
to golan...@googlegroups.com
Thanks, using bytes instead of standard strings got the execution time from:

real    4m16.595s
user    4m5.970s
sys    0m2.168s


to:

real    2m44.620s
user    2m38.058s
sys    0m1.449s

This is definitely a big improvement.
Reply all
Reply to author
Forward
0 new messages