performance of a Go binary vs C++

Sebastien Binet

Oct 16, 2017, 11:53:56 AM
to golang-nuts
hi there,

I have this Go program that reads a binary file format coming from CERN.
I am trying to get it running almost as fast as its C++ counterpart.

the profile and code are here:

on my machine, I get the following timings for the C++ (g++-7.2.0) and the Go-1.9.1 programs:

=== C++/ROOT ===
real=6.84 user=6.29 sys=0.52 CPU=99% MaxRSS=258384 I/O=0/0
real=6.99 user=6.42 sys=0.55 CPU=99% MaxRSS=258312 I/O=0/0
real=6.68 user=6.12 sys=0.55 CPU=99% MaxRSS=257956 I/O=0/0
real=6.88 user=6.33 sys=0.54 CPU=99% MaxRSS=258964 I/O=0/0
real=6.87 user=6.30 sys=0.56 CPU=99% MaxRSS=258024 I/O=0/0
real=6.89 user=6.32 sys=0.56 CPU=99% MaxRSS=260452 I/O=0/0
real=6.70 user=6.14 sys=0.56 CPU=99% MaxRSS=258368 I/O=0/0
real=7.02 user=6.47 sys=0.54 CPU=99% MaxRSS=258128 I/O=0/0
real=7.46 user=6.44 sys=0.56 CPU=93% MaxRSS=257972 I/O=1840/0
real=7.11 user=6.47 sys=0.55 CPU=98% MaxRSS=258148 I/O=3984/0

=== go-hep/rootio ===
real=13.06 user=12.43 sys=0.61 CPU=99%  MaxRSS=40864 I/O=0/0
real=13.05 user=12.46 sys=0.58 CPU=100% MaxRSS=40892 I/O=0/0
real=13.05 user=12.46 sys=0.59 CPU=100% MaxRSS=40888 I/O=0/0
real=13.07 user=12.48 sys=0.58 CPU=99%  MaxRSS=40940 I/O=0/0
real=13.08 user=12.48 sys=0.60 CPU=100% MaxRSS=40924 I/O=0/0
real=13.09 user=12.46 sys=0.60 CPU=99%  MaxRSS=40976 I/O=0/0
real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40876 I/O=0/0
real=13.09 user=12.52 sys=0.58 CPU=100% MaxRSS=40880 I/O=0/0
real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40880 I/O=0/0
real=12.99 user=12.41 sys=0.58 CPU=100% MaxRSS=40872 I/O=0/0

any idea on how to speed that up?

I also have this rather interesting strace log for the Go program:

i.e. there is a clear pattern of "epoll_wait/pselect6" on one thread followed by a bunch of "pread64" on another thread.
I suspect this is the runtime I/O layer (?) dispatching data to the "worker" thread/goroutine?
(I tried with GOGC=off: the pattern was still there, and the timings for my Go binary were unchanged.)
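
(for reproduction: something like `strace -f -o go.strace ./read-data f64s.root` should show the same pattern, assuming the compiled binary is called read-data; `-f` is needed to follow all of the runtime's threads.)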

I have put a ~764 MB data file there:

if anybody wants to have a look.

I initially thought it was a (lack of) bufio.Reader issue (as the file format forces you to do a bunch of seeks or ReadAt calls, using bufio.Reader wasn't completely straightforward),
but mmap-ing the whole file into memory didn't help.
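
to illustrate the "not completely straightforward" part, here is a minimal sketch of what wrapping bufio.Reader with a Seek method looks like (the type and names are hypothetical, and this still provides no io.ReaderAt):

package main

import (
	"bufio"
	"io"
	"log"
	"os"
)

// seekBufReader is a hypothetical buffered reader that also seeks.
type seekBufReader struct {
	r   io.ReadSeeker
	buf *bufio.Reader
}

func newSeekBufReader(r io.ReadSeeker) *seekBufReader {
	return &seekBufReader{r: r, buf: bufio.NewReaderSize(r, 1<<20)}
}

func (sb *seekBufReader) Read(p []byte) (int, error) {
	return sb.buf.Read(p)
}

// Seek re-seeks the underlying file and drops the whole buffer, so
// every seek throws away up to 1 MiB of read-ahead. io.SeekCurrent
// would also need special care: the underlying offset is ahead of
// the logical one by whatever is still buffered.
func (sb *seekBufReader) Seek(offset int64, whence int) (int64, error) {
	pos, err := sb.r.Seek(offset, whence)
	if err != nil {
		return pos, err
	}
	sb.buf.Reset(sb.r)
	return pos, nil
}

func main() {
	f, err := os.Open("f64s.root")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	r := newSeekBufReader(f)
	_ = r // r.Read and r.Seek work, but there is still no ReadAt
}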

here is the top50 from the CPU profile:

(pprof) top50   
Showing nodes accounting for 12.90s, 99.15% of 13.01s total
Dropped 4 nodes (cum <= 0.07s)
      flat  flat%   sum%        cum   cum%
     2.83s 21.75% 21.75%      3.21s 24.67%  go-hep.org/x/hep/rootio.(*LeafD).readBasket /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/leaf_gen.go
     1.35s 10.38% 32.13%     12.99s 99.85%  go-hep.org/x/hep/rootio.(*Scanner).Scan /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/scanner.go
     1.26s  9.68% 41.81%      1.26s  9.68%  go-hep.org/x/hep/rootio.(*tbranch).loadEntry /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/basket.go
     1.17s  8.99% 50.81%      2.01s 15.45%  go-hep.org/x/hep/rootio.(*tbranch).loadBasket /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
     1.15s  8.84% 59.65%      7.15s 54.96%  go-hep.org/x/hep/rootio.(*tbranch).loadEntry /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
     0.82s  6.30% 65.95%      3.06s 23.52%  go-hep.org/x/hep/rootio.(*tbranch).scan /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
     0.76s  5.84% 71.79%      3.90s 29.98%  go-hep.org/x/hep/rootio.(*Basket).readLeaf /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/basket.go
     0.62s  4.77% 76.56%      1.55s 11.91%  go-hep.org/x/hep/rootio.(*LeafD).scan /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/leaf_gen.go
     0.56s  4.30% 80.86%      0.56s  4.30%  syscall.Syscall6 /usr/lib/go/src/syscall/asm_linux_amd64.s
     0.44s  3.38% 84.24%      0.93s  7.15%  reflect.Indirect /usr/lib/go/src/reflect/value.go
     0.38s  2.92% 87.16%      0.38s  2.92%  reflect.Value.Elem /usr/lib/go/src/reflect/value.go
     0.34s  2.61% 89.78%      0.34s  2.61%  reflect.ValueOf /usr/lib/go/src/reflect/value.go
     0.28s  2.15% 91.93%      0.28s  2.15%  go-hep.org/x/hep/rootio.(*tbranch).findBasketIndex /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
     0.21s  1.61% 93.54%      0.21s  1.61%  go-hep.org/x/hep/rootio.(*LeafD).Len <autogenerated>
     0.18s  1.38% 94.93%      0.18s  1.38%  go-hep.org/x/hep/rootio.(*LeafD).UnmarshalROOT /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/rbuffer.go
     0.17s  1.31% 96.23%      0.17s  1.31%  go-hep.org/x/hep/rootio.(*tbranch).loadEntry /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/rbuffer.go
     0.11s  0.85% 97.08%      0.11s  0.85%  reflect.Value.Elem /usr/lib/go/src/reflect/type.go
     0.09s  0.69% 97.77%      0.09s  0.69%  go-hep.org/x/hep/rootio.(*Basket).readLeaf /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/rbuffer.go
     0.09s  0.69% 98.46%      0.09s  0.69%  go-hep.org/x/hep/rootio.(*LeafD).readBasket /usr/lib/go/src/encoding/binary/binary.go
     0.08s  0.61% 99.08%      0.08s  0.61%  go-hep.org/x/hep/rootio.(*tleaf).Offset /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/leaf.go
     0.01s 0.077% 99.15%     13.01s   100%  main.main /tmp/acat/read-data.go
         0     0% 99.15%      0.56s  4.30%  go-hep.org/x/hep/rootio.(*File).ReadAt /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/file.go
         0     0% 99.15%      0.11s  0.85%  go-hep.org/x/hep/rootio.(*Key).load /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/key.go
         0     0% 99.15%      0.56s  4.30%  internal/poll.(*FD).Pread /usr/lib/go/src/internal/poll/fd_unix.go
         0     0% 99.15%      0.11s  0.85%  io.(*SectionReader).Read /usr/lib/go/src/io/io.go
         0     0% 99.15%      0.11s  0.85%  io.ReadAtLeast /usr/lib/go/src/io/io.go
         0     0% 99.15%      0.11s  0.85%  io.ReadFull /usr/lib/go/src/io/io.go
         0     0% 99.15%      0.56s  4.30%  os.(*File).ReadAt /usr/lib/go/src/os/file.go
         0     0% 99.15%      0.56s  4.30%  os.(*File).pread /usr/lib/go/src/os/file_unix.go
         0     0% 99.15%     13.01s   100%  runtime.main /usr/lib/go/src/runtime/proc.go
         0     0% 99.15%      0.56s  4.30%  syscall.Pread /usr/lib/go/src/syscall/zsyscall_linux_amd64.go
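
(in case anyone wants to reproduce it: a CPU profile like this one can be collected with the standard runtime/pprof package; a minimal sketch, assuming the output file is called cpu.pprof:)

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()
	// ... read the ROOT file as usual ...
}

and then `go tool pprof ./read-data cpu.pprof`.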


anything obvious I missed?

cheers,
-s

mura

Oct 17, 2017, 12:01:42 AM
to golang-nuts
Hi,

A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around 8.5 secs on my machine (10.4 secs using read-data.go); it even takes up to 55 secs with bs=1.
I also observe that the MaxRSS of the C++ program is much bigger than that of the Go version, so I suspect the C++ program does some kind of caching or preloading.

-mura

Sebastien Binet

Oct 17, 2017, 6:48:43 AM
to mura, golang-nuts
hi,

(thanks for having a look at this)

On Tue, Oct 17, 2017 at 6:01 AM, mura <aruma...@gmail.com> wrote:
Hi,

A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around 8.5 secs on my machine (10.4 secs using read-data.go); it even takes up to 55 secs with bs=1.
I also observe that the MaxRSS of the C++ program is much bigger than that of the Go version, so I suspect the C++ program does some kind of caching or preloading.

most of that extra RSS is coming from the C++ libraries the program is linked against.
that said, it's true the C++ version does both caching and prefetching.
I am trying to devise a test case where both are disabled.

but if I were to ioutil.ReadFile("foo.root") and then use that []byte slice as input (implementing all the needed interfaces: io.Reader, io.ReaderAt, ...), I should be able to at least recoup the caching effect, right? (that's in effect pretty much the same thing as mmap-ing the whole file into memory, isn't it?)

-s
 


Michael Jones

Oct 17, 2017, 7:15:50 AM
to Sebastien Binet, mura, golang-nuts
yes
--
Michael T. Jones
michae...@gmail.com

mura

Oct 18, 2017, 6:21:17 AM
to golang-nuts
Could you run `perf record ./test f64s.root` and then `perf report --stdio` for both the Go and C++ programs? (You may need to `echo 0 | sudo tee /proc/sys/kernel/kptr_restrict` before `perf record`, supposing that you can test it on Linux.)

I am just interested in the performance comparison between C++ and Go too. :D

-mura

Sokolov Yura

Oct 19, 2017, 2:32:36 AM
to golang-nuts

Sebastien Binet

Oct 19, 2017, 2:43:07 AM
to Sokolov Yura, golang-nuts
that was also my first knee-jerk reaction.
but I can't do that:
the reader needs to implement io.Reader, io.ReaderAt and io.Seeker.
bufio.Reader doesn't implement the last two.

that's why I've implemented (in another branch) a little reader that just calls ioutil.ReadFile, works off the raw []byte, and then implements the above 3 interfaces.
but I didn't notice any overall improvement.
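
(for the record, a minimal sketch of that in-memory approach; in fact bytes.Reader from the standard library already implements io.Reader, io.ReaderAt and io.Seeker, so it boils down to:)

package main

import (
	"bytes"
	"io/ioutil"
	"log"
)

func main() {
	// slurp the whole file into memory; page cache aside, this is
	// roughly equivalent to mmap-ing it.
	data, err := ioutil.ReadFile("f64s.root")
	if err != nil {
		log.Fatal(err)
	}
	r := bytes.NewReader(data)
	_ = r // satisfies io.Reader, io.ReaderAt and io.Seeker
}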

-s

mura

Oct 19, 2017, 2:54:34 AM
to golang-nuts
Hi,
Can you publish the C++ test program as well? I have ROOT installed and would like to explore the issue further.

-mura

Sebastien Binet

Oct 19, 2017, 3:03:55 AM
to mura, golang-nuts
hi,


On Thu, Oct 19, 2017 at 8:54 AM, mura <aruma...@gmail.com> wrote:
Hi,
Can you publish the C++ test program as well? I have ROOT installed and would like to explore the issue further.

here's everything:

 -> read-data.go
 -> read-data.cxx
 -> perf-cxx.data (perf file)
 -> perf-cxx.data.txt (output of perf report)
 -> perf-go.data
 -> perf-go.data.txt

nothing jumped out at me when looking at the perf reports.
(but I am a bit rusty with perf, thanks to "go tool pprof" :)

-s


Ingo Krabbe

Sep 6, 2019, 8:40:28 AM
to golang-nuts
     1.35s 10.38% 32.13%     12.99s 99.85%  go-hep.org/x/hep/rootio.(*Scanner).Scan /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/scanner.go

Looks like you spent your time in a scanner routine.
I have only ever used scanners to scan text tokens, but I guess there is a big performance penalty depending on the structure of your data.
The most important problem with scanners is when they start copying the data they scan.

If you had, for example, an STX [LENGTH] [DATA BLOCK] layout in your data, the scanner would be a big gun shooting at nothing. You could instead generate []byte blobs pointing into your (hopefully mmapped) data file, pass them through channels, and put concurrent parallel workers on classifying your data blocks (see the sketch below). I guess that's exactly how you should sequentially streamline your data for HEP analysis. That approach is something Go is very good at.
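
a minimal sketch of that pattern (the 4-byte big-endian length prefix is a hypothetical framing, and buf would come from mmap or ioutil.ReadFile):

package main

import (
	"encoding/binary"
	"runtime"
	"sync"
)

// dispatch slices length-prefixed blocks out of buf and sends them
// to the workers; the slices point into buf, so nothing is copied.
func dispatch(buf []byte, blocks chan<- []byte) {
	for len(buf) >= 4 {
		n := int(binary.BigEndian.Uint32(buf))
		if n > len(buf)-4 {
			break // truncated trailing block
		}
		blocks <- buf[4 : 4+n]
		buf = buf[4+n:]
	}
	close(blocks)
}

func main() {
	var buf []byte // would come from mmap or ioutil.ReadFile
	blocks := make(chan []byte, 64)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for blk := range blocks {
				_ = blk // classify/decode the block here
			}
		}()
	}
	dispatch(buf, blocks)
	wg.Wait()
}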