Strange benchmark results

tapi...@gmail.com

May 13, 2021, 4:52:32 AM
to golang-nuts

package main

import "testing"

const N = 1615119
// It is strange that if N is large enough,
// the one-line implementations are as fast as the others.
// And if N is an odd number, the InsertOneline_Disassemble
// implementation is about 10% faster than the others.

func init() {
    println("==================== N =", N)
}

// InsertOneline inserts vs into s at index k with a single nested append expression.
func InsertOneline(s []int, k int, vs ...int) []int {
    return append(s[:k], append(vs, s[k:]...)...)
}

// InsertOneline_Disassemble splits the nested append into two separate append calls.
func InsertOneline_Disassemble(s []int, k int, vs ...int) []int {
    z := append(vs, s[k:]...)
    return append(s[:k], z...)
}

// InsertVerbose shifts elements in place when s has enough capacity;
// otherwise it allocates an exactly sized slice and fills it with copy.
func InsertVerbose(s []int, k int, vs ...int) []int {
    if n := len(s) + len(vs); n <= cap(s) {
        s2 := s[:n]
        copy(s2[k+len(vs):], s[k:])
        copy(s2[k:], vs)
        return s2
    }
    s2 := make([]int, len(s) + len(vs))
    copy(s2, s[:k])
    copy(s2[k:], vs)
    copy(s2[k+len(vs):], s[k:])
    return s2
}


// InsertVerbose_b is like InsertVerbose, but fills the newly allocated slice
// with append instead of copy.
func InsertVerbose_b(s []int, k int, vs ...int) []int {
    if n := len(s) + len(vs); n <= cap(s) {
        s2 := s[:n]
        copy(s2[k+len(vs):], s[k:])
        copy(s2[k:], vs)
        return s2
    }
    s2 := make([]int, 0, len(s) + len(vs))
    s2 = append(s2, s[:k]...)
    s2 = append(s2, vs...)
    s2 = append(s2, s[k:]...)
    return s2
}

// InsertVerbose_c is like InsertVerbose_b, but allocates the new slice via
// append (which may round the capacity up) instead of a plain make.
func InsertVerbose_c(s []int, k int, vs ...int) []int {
    if n := len(s) + len(vs); n <= cap(s) {
        s2 := s[:n]
        copy(s2[k+len(vs):], s[k:])
        copy(s2[k:], vs)
        return s2
    }
    s2 := append([]int(nil), make([]int, len(s) + len(vs))...)[:0]
    s2 = append(s2, s[:k]...)
    s2 = append(s2, vs...)
    s2 = append(s2, s[k:]...)
    return s2
}

var s1 []int
func Benchmark_InsertOneline(b *testing.B) {
    var x = make([]int, N)
    var y = make([]int, N/2)
    var k = N/5
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s1 = InsertOneline(x, k, y...)
    }
}

var s1b []int
func Benchmark_InsertOneline_Disassemble(b *testing.B) {
    var x = make([]int, N)
    var y = make([]int, N/2)
    var k = N/2
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s1b = InsertOneline_Disassemble(x, k, y...)
    }
}

var s2 []int
func Benchmark_InsertVerbose(b *testing.B) {
    var x = make([]int, N)
    var y = make([]int, N/2)
    var k = N/2
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s2 = InsertVerbose(x, k, y...)
    }
}

var s3 []int
func Benchmark_InsertVerbose_b(b *testing.B) {
    var x = make([]int, N)
    var y = make([]int, N/2)
    var k = N/2
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s3 = InsertVerbose_b(x, k, y...)
    }
}

var s4 []int
func Benchmark_InsertVerbose_c(b *testing.B) {
    var x = make([]int, N)
    var y = make([]int, N/2)
    var k = N/2
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s4 = InsertVerbose_c(x, k, y...)
    }
}


The result:

$ go test -bench=. -benchtime=3s
==================== N = 1615119
goos: linux
goarch: amd64
pkg: a.y/bench/sliceinsert
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_InsertOneline-4                        693       4741509 ns/op
Benchmark_InsertOneline_Disassemble-4            871       4194142 ns/op
Benchmark_InsertVerbose-4                        764       4627334 ns/op
Benchmark_InsertVerbose_b-4                      769       4958537 ns/op
Benchmark_InsertVerbose_c-4                      661       4855514 ns/op

tapi...@gmail.com

May 13, 2021, 5:07:49 AM
to golang-nuts
Sorry, a leftover temporary tweak slipped in: the N/5 should be N/2.
The second conclusion should be:

// And if N is an odd number, the InsertOneline
// implementations are about 10% faster than the others.

Brian Candler

May 15, 2021, 9:15:21 AM
to golang-nuts
With go version go1.16.3 darwin/amd64 (macOS 10.14.6), and after changing N/5 to N/2, I can't reproduce either.

$ go test . -bench=. -benchtime=3s
==================== N = 1615119
goos: darwin
goarch: amd64
pkg: bm
cpu: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Benchmark_InsertOneline-4                     867    4130473 ns/op
Benchmark_InsertOneline_Disassemble-4         848    4161459 ns/op
Benchmark_InsertVerbose-4                     696    4988955 ns/op
Benchmark_InsertVerbose_b-4                   712    4882270 ns/op
Benchmark_InsertVerbose_c-4                   702    4876656 ns/op
PASS
ok  bm 23.058s

InsertOneline_Disassemble-4 is about 0.75% slower than InsertOneline-4, and both are significantly faster than the others.

On Friday, 14 May 2021 at 23:49:16 UTC+1 peterGo wrote:
My results:


I can't reproduce your results.

Peter


peterGo

May 15, 2021, 11:49:38 AM
to golang-nuts
For your sliceinsert microbenchmarks, you don't provide the Go version, you don't provide memory allocation statistics, and you only provide results for a single data point.

My results for several values of N:

https://play.golang.org/p/WuKmIy_jY20

There are significant differences in CPU performance for different values of N, ranging from 1:1 to 2:1 for the append-based versus the exact-size implementations.

I am unable to reproduce your result.

Peter


tapi...@gmail.com

May 16, 2021, 2:59:06 AM
to golang-nuts
Your benchmark exactly reproduced both of my observations.

tapi...@gmail.com

May 16, 2021, 3:07:17 AM
to golang-nuts
> you don't provide the Go version,

My Go version is Go 1.16.3. (BTW, "go test" really should print the Go version on its first line.)

> you don't provide memory allocation statistics,

There are no surprises in memory allocation statistics so I didn't mention them.

> you only provide results for a single data point.

There are no surprises for small Ns, so I didn't provide results for them.

> I am unable to reproduce your result.

At least, you exactly reproduced my first observation: if N is large enough, the one-line implementations are as fast as the others.

And you partially reproduced my second observation: for large Ns, the InsertOneline implementations run faster when N is odd than when it is even.

Brian Candler

May 16, 2021, 4:46:44 AM
to golang-nuts
On Sunday, 16 May 2021 at 08:07:17 UTC+1 tapi...@gmail.com wrote:
> you don't provide memory allocation statistics,

There are no surprises in memory allocation statistics so I didn't mention them.


I think it is relevant, because your different functions return slices of different capacities (i.e. different amounts of memory allocated):

The only functions which allocate a slice of exactly the right size are InsertVerbose and InsertVerbose_b.  The others rely on append(), and when append() exceeds the capacity of the current slice and has to allocate a new one, it allocates a bit of extra space for growing room.

Therefore, you could be measuring boundary conditions around the size that append() decides to round your slice up to, combined with the garbage-collection overhead for large values of N.

Aside: larger values like N = 1615119 end up with a SIGKILL on the playground (presumably from using too much memory), but run fine locally:

======== N = 1615119
InsertOneline: cap=2524160
InsertOneline_Disassemble: cap=2524160
InsertVerbose: cap=2422678
InsertVerbose_b: cap=2422678
InsertVerbose_c: cap=2422784
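
For reference, here is a sketch of how to check this yourself (not necessarily the code behind the numbers above; the test name is arbitrary). Add it to the _test.go file from the first post and run "go test -run TestCapacities -v":

func TestCapacities(t *testing.T) {
    x := make([]int, N)
    y := make([]int, N/2)
    k := N / 2

    // With these sizes every implementation allocates a fresh result,
    // so reusing x and y across the calls is fine.
    t.Log("InsertOneline: cap =", cap(InsertOneline(x, k, y...)))
    t.Log("InsertOneline_Disassemble: cap =", cap(InsertOneline_Disassemble(x, k, y...)))
    t.Log("InsertVerbose: cap =", cap(InsertVerbose(x, k, y...)))
    t.Log("InsertVerbose_b: cap =", cap(InsertVerbose_b(x, k, y...)))
    t.Log("InsertVerbose_c: cap =", cap(InsertVerbose_c(x, k, y...)))
}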

tapi...@gmail.com

May 16, 2021, 5:04:13 AM
to golang-nuts
I'm aware of this, so I expect the append-based implementations (especially the one-line ones) to be slower than InsertVerbose and InsertVerbose_b.
That is only a minor reason why the one-line implementations should be slower, though. The main reason is that they both allocate twice.

The expectation holds for small Ns, but the benchmarks show it breaks for large Ns.
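
If anyone wants to verify the two-allocation claim, here is a sketch that could be added to the test file above (the names TestAllocCounts and allocSink are only for illustration). With these small sizes the capacity check in the verbose versions always fails, so InsertOneline should report 2 allocations per call and InsertVerbose 1:

var allocSink []int

func TestAllocCounts(t *testing.T) {
    x := make([]int, 1024)
    y := make([]int, 512)
    k := 512

    // testing.AllocsPerRun reports the average number of heap allocations
    // per call to the function it is given.
    t.Log("InsertOneline allocs/op:", testing.AllocsPerRun(100, func() {
        allocSink = InsertOneline(x, k, y...)
    }))
    t.Log("InsertVerbose allocs/op:", testing.AllocsPerRun(100, func() {
        allocSink = InsertVerbose(x, k, y...)
    }))
}

Run it with "go test -run TestAllocCounts -v".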
 

peterGo

May 23, 2021, 10:43:32 AM
to golang-nuts
The Go optimizing gc compiler and runtime are working as intended. The results are not strange. Nothing is broken.

Peter

tapi...@gmail.com

May 24, 2021, 3:33:23 AM
to golang-nuts
After some profiling investigation, it looks like the following code has not been optimized yet:

    s2 := make([]int, len(s) + len(vs))
    copy(s2, s[:k])
    copy(s2[k:], vs)
    copy(s2[k+len(vs):], s[k:])

A lot of unnecessary time is spent in memclrNoHeapPointers.

The one-line implementation does move more memory, but that disadvantage is offset by spending less time in memclrNoHeapPointers when N is large.
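
For anyone who wants to look at the same kind of profile, one way to collect it (not necessarily exactly what I ran; cpu.out is just an arbitrary file name) is:

$ go test -bench=. -benchtime=3s -cpuprofile=cpu.out
$ go tool pprof -top cpu.out

The -top listing shows the functions with the most CPU time; runtime routines such as memclrNoHeapPointers and memmove appear there when they dominate.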


tapi...@gmail.com

May 24, 2021, 4:11:16 AM
to golang-nuts
The profiling results consistently show that more time is spent in memclrNoHeapPointers when N is a large even integer (1615118) than when it is a large odd integer (1615119).