Generics faster than native float64?


Feng Tian

Apr 18, 2022, 4:15:05 PM
to golang-nuts
Hi, I have the following simple benchmark code, 


I ran this on my laptop because the Go playground does not run benchmark code. The strange thing is that Copy on float64 is slower than CopyG, the version using generics. I can imagine generics adding no overhead, but how can they be faster?

ftian@DESKTOP-16FCU43:~/tmp$ go test -bench=.
goos: linux
goarch: amd64
pkg: a
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
BenchmarkCopy-8          5693944               221.7 ns/op
BenchmarkCopyG-8         8885454               137.1 ns/op
PASS
ok      a       2.838s

Ian Lance Taylor

Apr 19, 2022, 12:37:48 AM
to Feng Tian, golang-nuts
The numbers for this kind of micro-benchmark can be deceptive. For
example, they can be highly affected by alignment of the instruction
loop. I don't know exactly what is happening for you. I compiled the
code with "go test -c" and disassembled it: both benchmark functions
contained exactly the same instructions.

Ian

Feng Tian

Apr 19, 2022, 2:58:39 AM
to Ian Lance Taylor, golang-nuts
Thank you. Yes, I took a look at the assembly and both generate MMX code, though I did not check the preamble. That said, the result is very consistent on my machine, which is surprising.

I am happy with no overhead, so everything is good.

peterGo

Apr 19, 2022, 11:06:39 AM
to golang-nuts
As Ian has pointed out, software and hardware optimizations may distort the results of microbenchmarks.

If I run the original benchmarks using go1.18 and go1.19, the Copy and CopyG ns/op results are reversed between the two versions.

# Original: https://go.dev/play/p/m1ClnbdbdWi

~/x$ go1.18 version && go1.18 test x_0_test.go -bench=.
go version go1.18.1 linux/amd64
BenchmarkCopy-8     3820009    310.7 ns/op
BenchmarkCopyG-8    7552230    158.3 ns/op

~/x$ go version && go test x_0_test.go -bench=.
go version devel go1.19-a11a885cb5 Mon Apr 18 23:57:00 2022 +0000 linux/amd64
BenchmarkCopy-8     7577499    158.2 ns/op
BenchmarkCopyG-8    3870822    309.7 ns/op

If I run the benchmarks with a call to b.ResetTimer() using go1.18 and go1.19, the Copy and CopyG ns/op results are effectively the same.

# ResetTimer: https://go.dev/play/p/hansq5ARrSh

~/x$ go1.18 version && go1.18 test x_1_test.go -bench=.
go version go1.18.1 linux/amd64
BenchmarkCopy-8     7581037    158.0 ns/op
BenchmarkCopyG-8    7590849    157.9 ns/op

~/x$ go version && go test x_1_test.go -bench=.
go version devel go1.19-a11a885cb5 Mon Apr 18 23:57:00 2022 +0000 linux/amd64
BenchmarkCopy-8     7525525    158.5 ns/op
BenchmarkCopyG-8    7521787    158.5 ns/op
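
The benchmark source is only linked in this thread, not quoted, so the following is a minimal sketch of what a ResetTimer variant might look like, not the actual playground code. The function bodies, the slice length n, and the package name are assumptions made for illustration; only the names Copy, CopyG, BenchmarkCopy and BenchmarkCopyG come from the output above. Generics require go1.18 or later.

```go
package main

import "testing"

// n is an assumed slice length; the thread does not state the real one.
const n = 1024

// Copy copies src into dst element by element.
// Hypothetical reconstruction of the non-generic float64 version.
func Copy(dst, src []float64) {
	for i := range src {
		dst[i] = src[i]
	}
}

// CopyG is the generic counterpart of Copy (also a reconstruction).
func CopyG[T any](dst, src []T) {
	for i := range src {
		dst[i] = src[i]
	}
}

func BenchmarkCopy(b *testing.B) {
	src := make([]float64, n)
	dst := make([]float64, n)
	// ResetTimer zeroes the elapsed time and allocation counters,
	// so the setup above is not counted in ns/op.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Copy(dst, src)
	}
}

func BenchmarkCopyG(b *testing.B) {
	src := make([]float64, n)
	dst := make([]float64, n)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		CopyG(dst, src)
	}
}
```

Saved as a _test.go file, this can be run the same way as above, with go test -bench=. and the file name. Running it with -count set to several repetitions is one way to see how stable the numbers are from run to run.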

If I run the original benchmarks on a Celeron N3450 using go1.18 and go1.19, the Copy and CopyG ns/op results are effectively the same. I use the Celeron N3450 for this because Intel disables many hardware optimizations on cheap hardware.
 
# Original: https://go.dev/play/p/m1ClnbdbdWi

$ go1.18 version && go1.18 test x_0_test.go -bench=.
go version go1.18.1 linux/amd64
BenchmarkCopy-4         2773934           428.0 ns/op
BenchmarkCopyG-4        2783043           429.7 ns/op

$ go version && go test x_0_test.go -bench=.
go version devel go1.19-a11a885cb5 Mon Apr 18 23:57:00 2022 +0000 linux/amd64
BenchmarkCopy-4         2781676           428.6 ns/op
BenchmarkCopyG-4        2765179           429.0 ns/op


Peter

Michael Ellis

Apr 19, 2022, 5:00:29 PM
to golang-nuts
FWIW, no difference on my MacBook.

(base) michaels-mbp copybench % go test -bench=.
goos: darwin
goarch: amd64
pkg: localgo/copybench
cpu: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
BenchmarkCopy-8         6999800           169.4 ns/op
BenchmarkCopyG-8        6967590           170.6 ns/op
PASS
ok      localgo/copybench    2.810s