hi there,
I have this Go program that reads a binary file format coming from CERN.
I am trying to get it running almost as fast as its C++ counter-part.
the profile and code are here:
on my machine, I get the following timings for the C++ (g++-7.2.0) and the Go-1.9.1 programs:
=== C++/ROOT ===
real=6.84 user=6.29 sys=0.52 CPU=99% MaxRSS=258384 I/O=0/0
real=6.99 user=6.42 sys=0.55 CPU=99% MaxRSS=258312 I/O=0/0
real=6.68 user=6.12 sys=0.55 CPU=99% MaxRSS=257956 I/O=0/0
real=6.88 user=6.33 sys=0.54 CPU=99% MaxRSS=258964 I/O=0/0
real=6.87 user=6.30 sys=0.56 CPU=99% MaxRSS=258024 I/O=0/0
real=6.89 user=6.32 sys=0.56 CPU=99% MaxRSS=260452 I/O=0/0
real=6.70 user=6.14 sys=0.56 CPU=99% MaxRSS=258368 I/O=0/0
real=7.02 user=6.47 sys=0.54 CPU=99% MaxRSS=258128 I/O=0/0
real=7.46 user=6.44 sys=0.56 CPU=93% MaxRSS=257972 I/O=1840/0
real=7.11 user=6.47 sys=0.55 CPU=98% MaxRSS=258148 I/O=3984/0
=== go-hep/rootio ===
real=13.06 user=12.43 sys=0.61 CPU=99% MaxRSS=40864 I/O=0/0
real=13.05 user=12.46 sys=0.58 CPU=100% MaxRSS=40892 I/O=0/0
real=13.05 user=12.46 sys=0.59 CPU=100% MaxRSS=40888 I/O=0/0
real=13.07 user=12.48 sys=0.58 CPU=99% MaxRSS=40940 I/O=0/0
real=13.08 user=12.48 sys=0.60 CPU=100% MaxRSS=40924 I/O=0/0
real=13.09 user=12.46 sys=0.60 CPU=99% MaxRSS=40976 I/O=0/0
real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40876 I/O=0/0
real=13.09 user=12.52 sys=0.58 CPU=100% MaxRSS=40880 I/O=0/0
real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40880 I/O=0/0
real=12.99 user=12.41 sys=0.58 CPU=100% MaxRSS=40872 I/O=0/0
any idea on how to speed that up?
I have also this rather interesting strace-log for the Go program:
ie: there is a clear pattern of "epoll_wait/pselect6" on one thread followed by a bunch of "pread64" on another thread.
I suspect this is the runtime I/O layer (?) dispatching data to the "worker" thread/goroutine ?
(I have tried with GOGC=off and it was still there, as well as the same timings for my Go binary.)
I have put a ~764Mb data file there:
if anybody wants to have a look.
I initially thought it was a (lack of) bufio.Reader issue (as the file format forces you to do a bunch of seeks or readat, using bufio.Reader wasn't completely straightforward.)
but, mmap-ing the whole file in memory didn't help.
here is the top50 from the CPU profile:
(pprof) top50
Showing nodes accounting for 12.90s, 99.15% of 13.01s total
Dropped 4 nodes (cum <= 0.07s)
flat flat% sum% cum cum%
0.56s 4.30% 80.86% 0.56s 4.30% syscall.Syscall6 /usr/lib/go/src/syscall/asm_linux_amd64.s
0.44s 3.38% 84.24% 0.93s 7.15% reflect.Indirect /usr/lib/go/src/reflect/value.go
0.38s 2.92% 87.16% 0.38s 2.92% reflect.Value.Elem /usr/lib/go/src/reflect/value.go
0.34s 2.61% 89.78% 0.34s 2.61% reflect.ValueOf /usr/lib/go/src/reflect/value.go
0.11s 0.85% 97.08% 0.11s 0.85% reflect.Value.Elem /usr/lib/go/src/reflect/type.go
0.01s 0.077% 99.15% 13.01s 100% main.main /tmp/acat/read-data.go
0 0% 99.15% 0.56s 4.30% internal/poll.(*FD).Pread /usr/lib/go/src/internal/poll/fd_unix.go
0 0% 99.15% 0.11s 0.85% io.(*SectionReader).Read /usr/lib/go/src/io/io.go
0 0% 99.15% 0.11s 0.85% io.ReadAtLeast /usr/lib/go/src/io/io.go
0 0% 99.15% 0.11s 0.85% io.ReadFull /usr/lib/go/src/io/io.go
0 0% 99.15% 0.56s 4.30% os.(*File).ReadAt /usr/lib/go/src/os/file.go
0 0% 99.15% 0.56s 4.30% os.(*File).pread /usr/lib/go/src/os/file_unix.go
0 0% 99.15% 13.01s 100% runtime.main /usr/lib/go/src/runtime/proc.go
0 0% 99.15% 0.56s 4.30% syscall.Pread /usr/lib/go/src/syscall/zsyscall_linux_amd64.go
anything obvious I missed ?
cheers,
-s