performance problem with sort.Search?

Dan Adkins

unread,

Apr 10, 2012, 2:45:33 PM4/10/12

to golang-nuts

Hi go-nuts,

Over the weekend, I was debugging an unexpectedly slow Go program and
the profiler fingered a function that was doing nothing but calling
sort.Search as in the documentation.

func find(data []int, x int) int {
return sort.Search(len(data), func(i int) bool { return data[i] >= x })
}

The profile showed a lot of time spent in runtime.closure, and
runtime.mallocgc. As far as I can tell, that dominated the actual
time spent searching the array. I replaced the binary search with a
simple linear search and knocked a factor of 5 off of the runtime.

Could I have written the call to sort.Search in a way that didn't pay
a large penalty for creating a closure and garbage-collecting it
later? Is this something the compiler might be able to optimize?

I wrote a short benchmark to demonstrate the problem:

package search_test

import (
"sort"
"testing"
)

func benchmarkBinarySearch(b *testing.B, size int) {
b.StopTimer()
data := make([]int, size)
for i := 0; i < len(data); i++ {
data[i] = i
}
b.StartTimer()
for i := 0; i < b.N; i++ {
for targ := 0; targ <= size; targ += size/2 {
j := sort.Search(size, func(j int) bool { return data[j] >= targ })
if j != targ {
b.Errorf("sort.Search(%d, %d) = %d", size, targ, j)
}
}
}
}

func BenchmarkBinarySearch128(b *testing.B) { benchmarkBinarySearch(b, 1<<7) }
func BenchmarkBinarySearch256(b *testing.B) { benchmarkBinarySearch(b, 1<<8) }
func BenchmarkBinarySearch512(b *testing.B) { benchmarkBinarySearch(b, 1<<9) }
func BenchmarkBinarySearch1K(b *testing.B) { benchmarkBinarySearch(b, 1<<10) }

func linearSearch(data []int, targ int) int {
for i := range data {
if data[i] >= targ {
return i
}
}
return len(data)
}

func benchmarkLinearSearch(b *testing.B, size int) {
b.StopTimer()
data := make([]int, size)
for i := 0; i < len(data); i++ {
data[i] = i
}
b.StartTimer()
for i := 0; i < b.N; i++ {
for targ := 0; targ <= size; targ += size/2 {
j := linearSearch(data, targ)
if j != targ {
b.Errorf("linearSearch(%d, %d) = %d", size, targ, j)
}
}
}
}

func BenchmarkLinearSearch128(b *testing.B) { benchmarkLinearSearch(b, 1<<7) }
func BenchmarkLinearSearch256(b *testing.B) { benchmarkLinearSearch(b, 1<<8) }
func BenchmarkLinearSearch512(b *testing.B) { benchmarkLinearSearch(b, 1<<9) }
func BenchmarkLinearSearch1K(b *testing.B) { benchmarkLinearSearch(b, 1<<10) }

Results:

BenchmarkBinarySearch128 2000000 942 ns/op
BenchmarkBinarySearch256 2000000 975 ns/op
BenchmarkBinarySearch512 2000000 979 ns/op
BenchmarkBinarySearch1K 1000000 1001 ns/op
BenchmarkLinearSearch128 10000000 192 ns/op
BenchmarkLinearSearch256 5000000 347 ns/op
BenchmarkLinearSearch512 5000000 656 ns/op
BenchmarkLinearSearch1K 1000000 1276 ns/op

The break-even point is around 1,000.

A profile from running just the binary search benchmarks:

(pprof) top10
Total: 899 samples
225 25.0% 25.0% 225 25.0%
_/home/dadkins/sandbox/search_test._func_001
147 16.4% 41.4% 372 41.4% sort.Search
81 9.0% 50.4% 487 54.2% runtime.closure
64 7.1% 57.5% 141 15.7% runtime.MCache_Alloc
64 7.1% 64.6% 64 7.1% scanblock
36 4.0% 68.6% 370 41.2% runtime.mallocgc
35 3.9% 72.5% 88 9.8% sweepspan
30 3.3% 75.9% 30 3.3% runtime.memmove
28 3.1% 79.0% 31 3.4% MCentral_Alloc
27 3.0% 82.0% 880 97.9%
_/home/dadkins/sandbox/search_test.benchmarkBinarySearch

-Dan

Jonathan Wills

unread,

Apr 10, 2012, 6:15:11 PM4/10/12

to Dan Adkins, golang-nuts

On every call to sort.Search you allocate another closure which eventually needs to be garbage collected, that's where your time is going. Here is your benchmark function rewritten to avoid this:

func benchmarkBinarySearch(b *testing.B, size int) {

b.StopTimer()

data := make([]int, size)

for i := 0; i < len(data); i++ {

data[i] = i

}

b.StartTimer()

var targ int

cmp := func(j int) bool { return data[j] >= targ }

for i := 0; i < b.N; i++ {

for targ = 0; targ <= size; targ += size/2 {

j := sort.Search(size, cmp)

if j != targ {

b.Errorf("sort.Search(%d, %d) = %d", size, targ, j)

}

This runs about 6x faster on my machine.

Glenn Brown

unread,

Apr 10, 2012, 8:53:58 PM4/10/12

to Jonathan Wills, Dan Adkins, golang-nuts

With Jonathan's change on a Macbook Pro:

BenchmarkBinarySearch128 10000000 186 ns/op
BenchmarkBinarySearch256 10000000 211 ns/op
BenchmarkBinarySearch512 10000000 233 ns/op
BenchmarkBinarySearch1K 10000000 265 ns/op
BenchmarkLinearSearch128 10000000 228 ns/op
BenchmarkLinearSearch256 5000000 397 ns/op
BenchmarkLinearSearch512 5000000 741 ns/op
BenchmarkLinearSearch1K 1000000 1426 ns/op

Dan Adkins

unread,

Apr 10, 2012, 9:06:19 PM4/10/12

to Jonathan Wills, golang-nuts

On Tue, Apr 10, 2012 at 3:15 PM, Jonathan Wills <runni...@gmail.com> wrote:
> On every call to sort.Search you allocate another closure which eventually
> needs to be garbage collected, that's where your time is going. Here is
> your benchmark function rewritten to avoid this:

I understand that the closure is the costly part; that's what the
distilled benchmark was intended to show. The question is, how can I
write my original function to avoid that (without resorting to a
global)? Or, I guess, why isn't the straightforward code with the
O(log n) algorithm fast enough?

func find(data []int, x int) int {
return sort.Search(len(data), func(i int) bool { return data[i] >= x })
}

-Dan

Jonathan Wills

unread,

Apr 11, 2012, 12:14:39 AM4/11/12

to Dan Adkins, golang-nuts

Well you'll have to keep track of something, maybe something like this would work?

type finder struct {

data []int

targ int

f func(n int) bool

}

func MakeFinder() *finder {

var f finder

f.f = func(i int) bool {

return f.data[i] >= f.targ

}

return &f

}

func (f *finder) Find(data []int, x int) int {

f.data = data

f.targ = x

return sort.Search(len(f.data), f.f)

}

Then you can make a finder and call finder.Find(data, x) anywhere you would have called find(data, x). Obviously now you have to keep around a finder, and you have to keep around one for each goroutine that wants to call Find. I can't think of a way to avoid the cost of those closures without keeping track of something like this, though.

(I'm assuming you aren't just search through a slice of ints or you'd be calling sort.SearchInts())

⚛

unread,

Apr 11, 2012, 2:12:12 AM4/11/12

to golan...@googlegroups.com, Jonathan Wills

On Wednesday, April 11, 2012 3:06:19 AM UTC+2, Dan Adkins wrote:

The question is, how can I
write my original function to avoid that (without resorting to a
global)?

When calling sort.Search the current Go compiler cannot avoid the costs.

To make the program run faster you need to introduce a package variable to avoid closures or reuse the same closure multiple times when calling sort.Search.

Alternatively, you can copy the source code of sort.Search (run "godoc -src sort Search") and modify it so that the function 'f' accepts 2 arguments instead of just 1:

func Search(n int, a interface{}, f func(int,interface{}) bool) int { ... }

or:

type SearchFunction interface { F(int) bool }

func Search(n int, f SearchFunction) int { ... }

Russ Cox

unread,

Apr 13, 2012, 11:49:18 AM4/13/12

to Dan Adkins, golang-nuts

The behavior of sort.Search is unfortunate.
You can't use it in a tight loop because of the closure
allocation. I had hoped that we would have better
analysis in the compiler for Go 1 so that a case like

sort.Search(len(data), func(i int) bool { return data[i] >= x })

would recognize that sort.Search did not retain the
closure, so the closure did not need to be allocated
(and thus not garbage collected), but there was not
enough time to do everything I had hoped for.
This is still something we can do for Go 1.1.

Russ

Reply all

Reply to author

Forward