Set implementation benchmarking

karls...@gmail.com

unread,
Jul 15, 2011, 12:04:58 PM7/15/11
to golan...@googlegroups.com
Hi,
Posted my first Q here yesterday and I got some really nice, fast and helpful answers. Motivated me, so I wrote and tested various possible Set implementations in Go. Thought some people might be interested:

http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Implementation

I'm currently 12% tempted to write The Little Go Book (similar to The Little MongoDB Book I wrote)

Karl

unread,
Jul 15, 2011, 1:18:11 PM7/15/11
to golang-nuts
I am wondering what the effect of using an O(N*log N) algorithm would
be instead of the O(N*N) one you are using there. Could you please try
"sort.SortInts" in the implementation of your
"OnDemandArraySet.RemoveDuplicates" (and name it "OnDemandArraySet_2"
or something)?
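
For illustration, a minimal sketch of what a sort-based RemoveDuplicates
might look like. The field name "data" and the receiver are only guesses
at the original implementation, and the file would need to import "sort":

// RemoveDuplicates (sketch): sort first, then compact in place, so the
// whole pass is O(N*log N) instead of comparing every pair of elements.
func (s *OnDemandArraySet_2) RemoveDuplicates() {
  if len(s.data) < 2 {
    return
  }
  sort.SortInts(s.data) // sort.SortInts was the 2011 name; current Go uses sort.Ints
  unique := 1
  for i := 1; i < len(s.data); i++ {
    if s.data[i] != s.data[unique-1] {
      s.data[unique] = s.data[i]
      unique++
    }
  }
  s.data = s.data[:unique] // drop the duplicate tail
}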

On Jul 15, 6:04 pm, karlseg...@gmail.com wrote:
> Hi,
> Posted my first Q here yesterday and I got some really nice, fast and
> helpful answers. Motivated me, so I wrote and tested various possible Set
> implementations in Go. Thought some people might be interested:
>
> http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Imple...

unread,
Jul 15, 2011, 1:43:09 PM7/15/11
to golang-nuts
Just another small note:

In your examples, the max [ratio of values in the set] to [the range
of possible values] is just 1:7. In that case, using an array of
booleans, or ints, or a bit-set should be much faster than any of the
implementations you currently have there.

The way I see it, hash-maps and hash-sets start to make sense when the
ratio of Actual:Possible is something like 1:100. For example, if the
keys of the map are 32-bit pointers then the ratio is N:(2**32), where
N is usually a much smaller number than 2**32.

If the max ratio is just 1:7, it does not make much sense to use a
hash-map or a growing array - if you are concentrating on
performance.
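
For example, the "array of booleans" idea can look roughly like this (a
sketch only; the type and constructor names are made up, and maxValue
stands for whatever upper bound the data has):

type BoolSet struct {
  present []bool // one flag per possible value, 0..maxValue
  count   int
}

func NewBoolSet(maxValue int) *BoolSet {
  return &BoolSet{present: make([]bool, maxValue+1)}
}

func (s *BoolSet) Add(value int) {
  if !s.present[value] {
    s.present[value] = true
    s.count++
  }
}

func (s *BoolSet) Contains(value int) bool { return s.present[value] }

func (s *BoolSet) Length() int { return s.count }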

-----

Maybe another small note: In your text you are using the word
"collision". The meaning of this word in the context of hash-maps and
hash-sets is different (http://en.wikipedia.org/wiki/Hash_table). I
would suggest using the word "duplicate" in your text instead.

On Jul 15, 6:04 pm, karlseg...@gmail.com wrote:
> Hi,
> Posted my first Q here yesterday and I got some really nice, fast and
> helpful answers. Motivated me, so I wrote and tested various possible Set
> implementations in Go. Thought some people might be interested:
>
> http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Imple...

karls...@gmail.com

unread,
Jul 15, 2011, 8:28:59 PM7/15/11
to golan...@googlegroups.com
I couldn't think of a good way to remove duplicates - since you can just stick the last item in its place. However, after sleeping on it, I realize I can simply move the duplicate to the front and slice off the front duplicate part. I'll try it tonight when I get back!

Michael Jones

unread,
Jul 17, 2011, 5:04:46 AM7/17/11
to golan...@googlegroups.com
Karl,

Following up on "*"'s suggestion of a bit set in your situation, consider this:

package set

const minValue = 0
const maxValue = 700000

type BitSet struct {
  data []uint8
  count int
}

func (this *BitSet) Add(value int) {
  if value < minValue || value > maxValue { panic("value out of range") }

  // value>>3 selects the byte, value&0x7 the bit within it
  if (this.data[value>>3] & (1<<uint(value & 0x7))) == 0 {
    this.data[value>>3] |= 1<<uint(value & 0x7)
    this.count++
  }
}

func (this *BitSet) Contains(value int) (exists bool) {
  if value < minValue || value > maxValue { panic("value out of range") }

  return (this.data[value>>3] & (1<<uint(value & 0x7))) != 0
}

func (this *BitSet) Length() (int) {
  return this.count
}

func (this *BitSet) RemoveDuplicates() {}

func NewSet() (Set) {
  return &BitSet{make([]uint8, (maxValue+1+7)/8), 0}
}

which trades fixed storage of ~100kb for fixed O(1) runtime:

set.BenchmarkSmallSetWithFewCollisions-4   50000     43991 ns/op  => 440 ns/Add()
set.BenchmarkSmallSetWithMoreCollisions-4   50000     44500 ns/op
set.BenchmarkSmallSetWithManyCollisions-4   50000     44221 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    449921 ns/op => 90 ns/Add()
set.BenchmarkMediumSetWithMoreCollisions-4    5000    458409 ns/op
set.BenchmarkMediumSetWithManyCollisions-4    5000    441238 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8452580 ns/op => 84 ns/Add()
set.BenchmarkLargeSetWithMoreCollisions-4     200   8445275 ns/op
set.BenchmarkLargeSetWithManyCollisions-4     200   8085210 ns/op

The variation in time per Add() across the three benchmark sizes comes from the fixed cost of allocation vs. the loop-count-varying cost of Add(). If we change the benchmark function to pause the timer during array allocation, then we can see the constant time more clearly:

func benchmark(b *testing.B, size int, fill int) {
  for i := 0; i < b.N; i++ {
    b.StopTimer()
    set := NewSet()
    b.StartTimer()
    for j := 0; j < size; j++ {
      set.Add(rand.Int() % fill)
    }
  }
}

set.BenchmarkSmallSetWithFewCollisions-4  200000      9722 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4  200000     10076 ns/op  =>  100 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4  200000      9785 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    415762 ns/op 
set.BenchmarkMediumSetWithMoreCollisions-4    5000    430995 ns/op  =>  86 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    413545 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8460650 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   8586540 ns/op  =>  86 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8186250 ns/op

The Add code is a little faster without the range check. It is also a little faster with the index and mask computed as variables and reused between the test and the assignment. (This kind of simple common subexpression elimination is something we can hope for in future compilers.) It runs fastest on my machine with bytes as the bit-set datatype. Remove(), were it one of your functions, would be like Add() but would test whether the bit is set, use &^ to clear it, and decrement the count. This simple approach is O(1) for add, remove, and count -- great when you can afford the storage.
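
As a rough illustration of that description (not part of the code above;
the method name and shape are assumptions), with the index and mask held
in variables as suggested:

func (this *BitSet) Remove(value int) {
  if value < minValue || value > maxValue { panic("value out of range") }

  index, mask := value>>3, uint8(1)<<uint(value&0x7)
  if this.data[index] & mask != 0 {
    this.data[index] &^= mask // clear the bit
    this.count--
  }
}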

Michael 

On Fri, Jul 15, 2011 at 8:28 PM, <karls...@gmail.com> wrote:
I couldn't think of a good way to remove duplicates - since you can just stick the last item in its place. However, after sleeping on it, I realize I can simply move the duplicate to the front and slice off the front duplicate part. I'll try it tonight when I get back!



--

Michael T. Jones

   Chief Technology Advocate, Google Inc.

   1600 Amphitheatre Parkway, Mountain View, California 94043

   Email: m...@google.com  Mobile: 650-335-5765  Fax: 650-649-1938

   Organizing the world's information to make it universally accessible and useful


John Arbash Meinel

unread,
Jul 17, 2011, 5:39:14 AM7/17/11
to Michael Jones, golan...@googlegroups.com

...


> The Add code is a little faster without the range check. It is also a little
> faster with the index and mask computed as variables and reused between the
> test and the assignment. (This kind of simple common subexpression
> elimination is something we can hope for in future compilers.) It runs
> fastest on my machine with bytes as the bit-set datatype. Remove(), were it
> one of your functions, would be like Add() but would test whether the bit is
> set, use &^ to clear it, and decrement the count. This simple approach is
> O(1) for add, remove, and count -- great when you can afford the storage.
>
> Michael

And if storage is a specific issue (because you might have a larger
range), you can put a tree structure in place. So at the top level, you
mask out the first N bits, pointing into an array of arrays, etc.
You trade off a little bit of performance (because now you have to jump
through the arrays), but you can potentially save a lot of memory
(depending on how well your data is grouped into sub-ranges).
John
=:->

Michael Jones

unread,
Jul 17, 2011, 1:10:44 PM7/17/11
to John Arbash Meinel, golan...@googlegroups.com
Indeed. I was going to do that but wanted to keep the code as simple as possible. Here is the dynamic version using one level of segmenting (a top-level array of segments whose byte slices are allocated on demand), but the idea expands simply to N-ary trees, which could be beneficial in cases of very large ranges and wildly spread values. Such values would be seen by taking the random numbers in the test program, after they've been modded, and shifting them up toward the highest bits. Anyway, here is a simple dynamic bitset:

// dynamic bit set
//
// bitset is a top-level slice of segments; each segment's byte slice holds a bit for each number
//   segments are 1<<(segmentBits-3) bytes long
//   the top-level slice references 1<<(31-segmentBits) segments
// segment byte slices are allocated as needed

package set

const segmentBits = 18

type BitSetSegment struct {
  data []uint8
}

type BitSet struct {
  segment []BitSetSegment
  count   int
}

func (this *BitSet) Add(value int) {
  if value < 0 { panic("value out of range") }

  e := value >> segmentBits // which segment
  if this.segment[e].data == nil {
    this.segment[e].data = make([]uint8, 1<<(segmentBits-3)) // allocate the segment lazily
  }

  value &= 1<<segmentBits - 1 // offset within the segment
  if (this.segment[e].data[value>>3] & (1<<uint(value & 0x7))) == 0 {
    this.segment[e].data[value>>3] |= 1<<uint(value & 0x7)
    this.count++
  }
}

func (this *BitSet) Contains(value int) (exists bool) {
  if value < 0 { panic("value out of range") }

  e := value>>segmentBits
  if this.segment[e].data == nil {
    return false
  }
  
  value &= 1<<segmentBits - 1
  return (this.segment[e].data[value>>3] & (1<<uint(value & 0x7))) != 0
}

func (this *BitSet) Length() (int) {
  return this.count
}

func (this *BitSet) RemoveDuplicates() {}

func NewSet() (Set) { // 1<<(31-segmentBits) segments cover every non-negative 31-bit int
  return &BitSet{make([]BitSetSegment, 1<<(31-segmentBits)), 0}
}

Slower, but not much. Uses much less memory. Normal benchmark:

set.BenchmarkSmallSetWithFewCollisions-4   50000     39625 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4   50000     40368 ns/op  =>  403 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4   50000     40318 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    473111 ns/op
set.BenchmarkMediumSetWithMoreCollisions-4    5000    486249 ns/op  =>  97 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    462251 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   9004800 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   9259785 ns/op  => 92 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8598025 ns/op

Here are the results with the version of set_test.go that stops the clock while building the top-level array (as would happen once in an actual program, not once per 100 insertions; if that assumption doesn't hold, use the numbers above):

set.BenchmarkSmallSetWithFewCollisions-4  100000     26281 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4  100000     26759 ns/op  => 267 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4  100000     26277 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    449336 ns/op
set.BenchmarkMediumSetWithMoreCollisions-4    5000    466167 ns/op  => 93 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    448365 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8967720 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   8984375 ns/op  =>  90 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8592885 ns/op

This is not bad: 90-93 ns/Add() here vs. 86 ns/Add() in the static version. No claims of best coding. In particular, I would not have BitSetSegment as a struct but rather as a simple slice of bytes, and would skip the ".data" tags for clarity, but I wanted to make as few changes as possible between versions. Plus, one might want to keep per-segment counts so that code to extract the values could stop when all had been found in a particular segment (average 2x speedup, maybe). That's where this radix-sort wannabe approach could shine, at least when the counts are large -- especially in a tree implementation.
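
For illustration, a minimal sketch of that per-segment-count idea. None of
this is in the code above: it assumes BitSetSegment has gained a count
field that Add keeps up to date, and Values is a hypothetical extraction
method:

// Values collects every value in the set, skipping unallocated segments
// and leaving each segment as soon as all of its values have been found.
func (this *BitSet) Values() []int {
  values := make([]int, 0, this.count)
  for e := range this.segment {
    seg := &this.segment[e]
    if seg.data == nil || seg.count == 0 {
      continue
    }
    base, found := e<<segmentBits, 0
    for i, b := range seg.data {
      for bit := 0; b != 0 && bit < 8; bit++ {
        if b&(1<<uint(bit)) != 0 {
          values = append(values, base+i*8+bit)
          found++
          b &^= 1 << uint(bit)
        }
      }
      if found == seg.count {
        break // all values in this segment have been found
      }
    }
  }
  return values
}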