BenchmarkMemclr_100-4 100000000 22.8 ns/op
BenchmarkLoop_100-4 30000000 47.1 ns/op
BenchmarkMemclr_1000-4 10000000 181 ns/op
BenchmarkLoop_1000-4 5000000 365 ns/op
BenchmarkMemclr_10000-4 500000 2777 ns/op
BenchmarkLoop_10000-4 300000 4003 ns/op
BenchmarkMemclr_100000-4 50000 38993 ns/op
BenchmarkLoop_100000-4 30000 43893 ns/op
BenchmarkMemclr_200000-4 20000 79159 ns/op
BenchmarkLoop_200000-4 20000 87533 ns/op
BenchmarkMemclr_300000-4 10000 127745 ns/op
BenchmarkLoop_300000-4 10000 140770 ns/op
BenchmarkMemclr_400000-4 10000 217689 ns/op
BenchmarkLoop_400000-4 10000 234632 ns/op
BenchmarkMemclr_500000-4 5000 344265 ns/op
BenchmarkLoop_500000-4 2000 535585 ns/op
BenchmarkMemclr_1000000-4 1000 1130508 ns/op
BenchmarkLoop_1000000-4 2000 889592 ns/op
BenchmarkMemclr_2000000-4 1000 2071970 ns/op
BenchmarkLoop_2000000-4 1000 1758001 ns/op
PASS
ok _/Users/bao/program/go/learn/goTour/memclr 37.313s
But if I changed the line `type MyInt int32` to `type MyInt int`, then again the memclr version becomes slower, or shows no advantage, for slice lengths larger than 2000000.
Be wary of slice size, as caching is going to have an extremely strong effect on the results. I submitted a CL that made append only clear memory that was not going to be overwritten (https://github.com/golang/go/commit/c1e267cc734135a66af8a1a5015e572cbb598d44). I thought this would have a much larger impact, but it only had a small one: memclr would zero the memory, but it also brought it into the cache, where it was hot for being overwritten.

Have you tried running with perf to see dcache misses for each benchmark?
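One way to do that on Linux is to compile the test binary once and run a single benchmark under perf stat (a sketch; the cache event names below are common on x86 but vary by CPU and perf version):

```shell
# Build the test binary instead of running benchmarks through `go test`,
# so perf measures only the benchmark process.
go test -c -o memclr.test

# Count L1 data-cache loads and misses for one benchmark at a time.
perf stat -e L1-dcache-loads,L1-dcache-load-misses \
    ./memclr.test -test.bench='BenchmarkMemclr_1000000$' -test.benchtime=1s
```

Comparing the miss counts between the Memclr and Loop variants at the same slice length should show whether cache behavior explains the crossover.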
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
I've heard that adaptive prefetching is triggered when there are three consecutive accesses to the same cache line in increasing address order. So perhaps optimized SSE/AVX zeroing doesn't trigger the adaptive prefetcher, because it uses fewer memory accesses. And this may vary a lot by CPU model: newer models may improve the adaptive prefetcher, so that memclr is great again.
package P

import (
	"strconv"
	"testing"
)

// memclr zeroes the slice with the range-clear idiom, which the
// compiler recognizes and replaces with a call to runtime memclr.
func memclr(a []int) {
	for i := range a {
		a[i] = 0
	}
}

func BenchmarkMemclr(b *testing.B) {
	for i := 100000; i < 409600000; i *= 2 {
		b.Run("bench"+strconv.Itoa(i), func(b *testing.B) {
			var a = make([]int, i)
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				memclr(a)
			}
		})
	}
}