This is not a serious problem, but it surprised me. (By the way, how can I post a message with code formatting?)
I'd like to create table-driven benchmarks:
To that end, I started with code from The Go Programming Language:
// PopCount is based on an example from chapter 2 of THE GO PROGRAMMING LANGUAGE.
// Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
package popcount
// pc[i] is the population count of i.
var pc [256]byte
func init() {
for i := range pc {
pc[i] = pc[i/2] + byte(i&1)
}
}
// PopCount returns the population count (number of set bits) of x.
func PopCount(x uint64) int {
return int(pc[byte(x>>(0*8))] +
pc[byte(x>>(1*8))] +
pc[byte(x>>(2*8))] +
pc[byte(x>>(3*8))] +
pc[byte(x>>(4*8))] +
pc[byte(x>>(5*8))] +
pc[byte(x>>(6*8))] +
pc[byte(x>>(7*8))])
}
I then wrote a simple test:
// popcount_simple_test.go
package popcount
import "testing"
func BenchmarkPopCountSimple(b *testing.B) {
sum := 0 // Avoid dead code elimination.
for i := 0; i < b.N; i++ {
sum += PopCount(0x1234567890abcdef)
}
}
Because I was paranoid about dead code elimination, I wrote an identical test that calls the function many times (inside the loop to call it b.N times, which should make no difference if the hypothetically-still-dead code is being eliminated):
// popcount_slow_simple_test.go
package popcount
import "testing"
func BenchmarkPopCountSlowSimple(b *testing.B) {
sum := 0 // Avoid dead code elimination.
for i := 0; i < b.N; i++ {
// Exactly as before, but call the function many times.
for j:= 0; j < 1_000_000; j++ {
sum += PopCount(0x1234567890abcdef)
}
}
}
Then I wrote an "almost simple" test that uses testing.B.Run() but is hardwired to call the function being benchmarked:
// popcount_almost_simple_test.go
package popcount
import "testing"
func BenchmarkPopCountAlmostSimple(b *testing.B) {
b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
sum := 0 // Avoid dead code elimination.
for i := 0; i < b.N; i++ {
sum += PopCount(0x1234567890abcdef)
}
})
}
Finally, I wrote a "nearly-simple" test that passes arguments to Run() that could have come from a table:
// popcount_nearly_simple.go
package popcount
import "testing"
func BenchmarkPopCountNearlySimple(b *testing.B) {
f := PopCount
name := "PopCountNearlySimple"
b.Run(name, func(b *testing.B) {
sum := 0 // Avoid dead code elimination.
for i := 0; i < b.N; i++ {
sum += f(0x1234567890abcdef)
}
})
}
The simple and almost-simple results are nearly identical (happily, the slow results are not), but the nearly-simple results are an order of magnitude slower:
$ go version
go version go1.16.4 windows/amd64
$ go test -cpu=1 -bench=.
goos: windows
goarch: amd64
cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz
BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple 1000000000 0.6102 ns/op
BenchmarkPopCountNearlySimple/PopCountNearlySimple 194925662 6.197 ns/op
BenchmarkPopCountSimple 1000000000 0.5994 ns/op
BenchmarkPopCountSlowSimple 1953 606194 ns/op
PASS
After reading this article:
I ran:
$ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to PopCount'
.\popcount_almost_simple_test.go:9:19: inlining call to PopCount
.\popcount_simple_test.go:8:18: inlining call to PopCount
.\popcount_slow_simple_test.go:10:19: inlining call to PopCount
That makes sense. 0.6 ns is less than a function call. The simple and almost-simple benchmarks can inline the call to the hardwired function, but the nearly-simple benchmark can't.
In practice, this isn't much of a problem. Any function small enough to be inlined is unlikely to be a performance bottleneck. If it ever is, a non-table-driven benchmark can still measure it.
Hope this helps. --PSRC
P.S.: Out of curiosity, how can I post a message with fancy code examples like this one?