Set implementation benchmarking

karls...@gmail.com

unread,
Jul 15, 2011, 12:04:58 PM7/15/11
to golan...@googlegroups.com
Hi,
Posted my first Q here yesterday and I got some really nice, fast and helpful answers. Motivated me, so I wrote and tested various possible Set implementations in Go. Thought some people might be interested:

http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Implementation

I'm currently 12% tempted to write The Little Go Book (similar to The Little MongoDB Book I wrote)

Karl

unread,
Jul 15, 2011, 1:18:11 PM7/15/11
to golang-nuts
I am wondering what the effect of using an O(N*log N) algorithm would
be instead of the O(N*N) one you are using there. Could you please try
"sort.SortInts" in the implementation of your
"OnDemandArraySet.RemoveDuplicates" (and name it "OnDemandArraySet_2"
or something)?
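
For illustration, a minimal sketch of what a sort-based RemoveDuplicates
might look like. The field name "data" and the receiver are only guesses
at the original implementation, and the file would need to import "sort":

// RemoveDuplicates (sketch): sort first, then compact in place, so the
// whole pass is O(N*log N) instead of comparing every pair of elements.
func (s *OnDemandArraySet_2) RemoveDuplicates() {
  if len(s.data) < 2 {
    return
  }
  sort.SortInts(s.data) // sort.SortInts was the 2011 name; current Go uses sort.Ints
  unique := 1
  for i := 1; i < len(s.data); i++ {
    if s.data[i] != s.data[unique-1] {
      s.data[unique] = s.data[i]
      unique++
    }
  }
  s.data = s.data[:unique] // drop the duplicate tail
}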

On Jul 15, 6:04 pm, karlseg...@gmail.com wrote:
> Hi,
> Posted my first Q here yesterday and I got some really nice, fast and
> helpful answers. Motivated me, so I wrote and tested various possible Set
> implementations in Go. Thought some people might be interested:
>
> http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Imple...

unread,
Jul 15, 2011, 1:43:09 PM7/15/11
to golang-nuts
Just another small note:

In your examples, the max [ratio of values in the set] to [the range
of possible values] is just 1:7. In that case, using an array of
booleans, or ints, or a bit-set should be much faster than any of the
implementations you currently have there.

The way I see it, hash-maps and hash-sets start to make sense when the
ratio of Actual:Possible is something like 1:100. For example, if the
keys of the map are 32-bit pointers then the ratio is N:(2**32), where
N is usually a much smaller number than 2**32.

If the max ratio is just 1:7, it does not make much sense to use a
hash-map or a growing array - if you are concentrating on
performance.
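
For example, the "array of booleans" idea can look roughly like this (a
sketch only; the type and constructor names are made up, and maxValue
stands for whatever upper bound the data has):

type BoolSet struct {
  present []bool // one flag per possible value, 0..maxValue
  count   int
}

func NewBoolSet(maxValue int) *BoolSet {
  return &BoolSet{present: make([]bool, maxValue+1)}
}

func (s *BoolSet) Add(value int) {
  if !s.present[value] {
    s.present[value] = true
    s.count++
  }
}

func (s *BoolSet) Contains(value int) bool { return s.present[value] }

func (s *BoolSet) Length() int { return s.count }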

-----

Maybe another small note: In your text you are using the word
"collision". The meaning of this word in the context of hash-maps and
hash-sets is different (http://en.wikipedia.org/wiki/Hash_table). I
would suggest using the word "duplicate" in your text instead.

On Jul 15, 6:04 pm, karlseg...@gmail.com wrote:
> Hi,
> Posted my first Q here yesterday and I got some really nice, fast and
> helpful answers. Motivated me, so I wrote and tested various possible Set
> implementations in Go. Thought some people might be interested:
>
> http://openmymind.net/2011/7/15/Learning-Go-By-Benchmarking-Set-Imple...

karls...@gmail.com

unread,
Jul 15, 2011, 8:28:59 PM7/15/11
to golan...@googlegroups.com
I couldn't think of a good way to remove duplicates - since you can just stick the last item in its place. However, after sleeping on it, I realize I can simply move the duplicate to the front and slice off the front duplicate part. I'll try it tonight when I get back!

Michael Jones

unread,
Jul 17, 2011, 5:04:46 AM7/17/11
to golan...@googlegroups.com
Karl,

Following up on "*"'s suggestion of a bit set in your situation, consider this:

package set

const minValue = 0
const maxValue = 700000

type BitSet struct {
  data []uint8
  count int
}

func (this *BitSet) Add(value int) {
  if value < minValue || value > maxValue { panic("value out of range") }

  // value>>3 selects the byte, value&0x7 the bit within it
  if (this.data[value>>3] & (1<<uint(value & 0x7))) == 0 {
    this.data[value>>3] |= 1<<uint(value & 0x7)
    this.count++
  }
}

func (this *BitSet) Contains(value int) (exists bool) {
  if value < minValue || value > maxValue { panic("value out of range") }

  return (this.data[value>>3] & (1<<uint(value & 0x7))) != 0
}

func (this *BitSet) Length() (int) {
  return this.count
}

func (this *BitSet) RemoveDuplicates() {}

func NewSet() (Set) {
  return &BitSet{make([]uint8, (maxValue+1+7)/8), 0}
}

which trades fixed storage of ~100kb for fixed O(1) runtime:

set.BenchmarkSmallSetWithFewCollisions-4   50000     43991 ns/op  => 440 ns/Add()
set.BenchmarkSmallSetWithMoreCollisions-4   50000     44500 ns/op
set.BenchmarkSmallSetWithManyCollisions-4   50000     44221 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    449921 ns/op => 90 ns/Add()
set.BenchmarkMediumSetWithMoreCollisions-4    5000    458409 ns/op
set.BenchmarkMediumSetWithManyCollisions-4    5000    441238 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8452580 ns/op => 84 ns/Add()
set.BenchmarkLargeSetWithMoreCollisions-4     200   8445275 ns/op
set.BenchmarkLargeSetWithManyCollisions-4     200   8085210 ns/op

The variation in time per Add() across the three benchmark sizes comes from the fixed cost of allocation vs. the loop-count-varying cost of Add(). If we change the benchmark function to pause the timer during array allocation, then we can see the constant time more clearly:

func benchmark(b *testing.B, size int, fill int) {
  for i := 0; i < b.N; i++ {
    b.StopTimer()
    set := NewSet()
    b.StartTimer()
    for j := 0; j < size; j++ {
      set.Add(rand.Int() % fill)
    }
  }
}

set.BenchmarkSmallSetWithFewCollisions-4  200000      9722 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4  200000     10076 ns/op  =>  100 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4  200000      9785 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    415762 ns/op 
set.BenchmarkMediumSetWithMoreCollisions-4    5000    430995 ns/op  =>  86 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    413545 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8460650 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   8586540 ns/op  =>  86 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8186250 ns/op

The Add code is a little faster without the range check. It is also a little faster with the index and mask computed as variables and reused between the test and the assignment. (This kind of simple common subexpression elimination is something we can hope for in future compilers.) It runs fastest on my machine with bytes as the bit-set datatype. Remove(), were it one of your functions, would be like Add() but would test whether the bit is set, use &^ to clear it, and decrement the count. This simple approach is O(1) for add, remove, and count -- great when you can afford the storage.
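
As a rough illustration of that description (not part of the code above;
the method name and shape are assumptions), with the index and mask held
in variables as suggested:

func (this *BitSet) Remove(value int) {
  if value < minValue || value > maxValue { panic("value out of range") }

  index, mask := value>>3, uint8(1)<<uint(value&0x7)
  if this.data[index] & mask != 0 {
    this.data[index] &^= mask // clear the bit
    this.count--
  }
}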

Michael 

On Fri, Jul 15, 2011 at 8:28 PM, <karls...@gmail.com> wrote:
I couldn't think of a good way to remove duplicates - since you can just stick the last item in its place. However, after sleeping on it, I realize I can simply move the duplicate to the front and slice off the front duplicate part. I'll try it tonight when I get back!



--

Michael T. Jones

   Chief Technology Advocate, Google Inc.

   1600 Amphitheatre Parkway, Mountain View, California 94043

   Email: m...@google.com  Mobile: 650-335-5765  Fax: 650-649-1938

   Organizing the world's information to make it universally accessible and useful


John Arbash Meinel

unread,
Jul 17, 2011, 5:39:14 AM7/17/11
to Michael Jones, golan...@googlegroups.com

...


> The Add code is a little faster without the range check. It is also a little
> faster with the index and mask computed as variables and reused between the
> test and the assignment. (This kind of simple common subexpression
> elimination is something we can hope for in future compilers.) It runs
> fastest on my machine with bytes as the bit-set datatype. Remove(), were it
> one of your functions, would be like Add() but would test whether the bit is
> set, use &^ to clear it, and decrement the count. This simple approach is
> O(1) for add, remove, and count -- great when you can afford the storage.
>
> Michael

And if storage is a specific issue (because you might have a larger
range), you can put a tree structure in place. So at the top level, you
mask out the first N bits, pointing into an array of arrays, etc.
You trade off a little bit of performance (because now you have to jump
through the arrays), but you can potentially save a lot of memory
(depending on how well your data is grouped into sub-ranges).
John
=:->

Michael Jones

unread,
Jul 17, 2011, 1:10:44 PM7/17/11
to John Arbash Meinel, golan...@googlegroups.com
Indeed. I was going to do that but wanted to keep the code as simple as possible. Here is the dynamic version using one level of segmenting (a top-level array of segments whose byte slices are allocated on demand), but the idea expands simply to N-ary trees, which could be beneficial in cases of very large ranges and wildly spread values. Such values would be seen by taking the random numbers in the test program, after they've been modded, and shifting them up toward the highest bits. Anyway, here is a simple dynamic bitset:

// dynamic bit set
//
// bitset is a top-level slice of segments; each segment's byte slice holds a bit for each number
//   segments are 1<<(segmentBits-3) bytes long
//   the top-level slice references 1<<(31-segmentBits) segments
// segment byte slices are allocated as needed

package set

const segmentBits = 18

type BitSetSegment struct {
  data []uint8
}

type BitSet struct {
  segment []BitSetSegment
  count   int
}

func (this *BitSet) Add(value int) {
  if value < 0 { panic("value out of range") }

  e := value >> segmentBits // which segment
  if this.segment[e].data == nil {
    this.segment[e].data = make([]uint8, 1<<(segmentBits-3)) // allocate the segment lazily
  }

  value &= 1<<segmentBits - 1 // offset within the segment
  if (this.segment[e].data[value>>3] & (1<<uint(value & 0x7))) == 0 {
    this.segment[e].data[value>>3] |= 1<<uint(value & 0x7)
    this.count++
  }
}

func (this *BitSet) Contains(value int) (exists bool) {
  if value < 0 { panic("value out of range") }

  e := value>>segmentBits
  if this.segment[e].data == nil {
    return false
  }
  
  value &= 1<<segmentBits - 1
  return (this.segment[e].data[value>>3] & (1<<uint(value & 0x7))) != 0
}

func (this *BitSet) Length() (int) {
  return this.count
}

func (this *BitSet) RemoveDuplicates() {}

func NewSet() (Set) { // 1<<(31-segmentBits) segments cover every non-negative 31-bit int
  return &BitSet{make([]BitSetSegment, 1<<(31-segmentBits)), 0}
}

Slower, but not much. Uses much less memory. Normal benchmark:

set.BenchmarkSmallSetWithFewCollisions-4   50000     39625 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4   50000     40368 ns/op  =>  403 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4   50000     40318 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    473111 ns/op
set.BenchmarkMediumSetWithMoreCollisions-4    5000    486249 ns/op  =>  97 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    462251 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   9004800 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   9259785 ns/op  => 92 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8598025 ns/op

Here are the results with the version of set_test.go that stops the clock while building the top-level array (as would happen once in an actual program, not once per 100 insertions; if that assumption doesn't hold, use the numbers above):

set.BenchmarkSmallSetWithFewCollisions-4  100000     26281 ns/op
set.BenchmarkSmallSetWithMoreCollisions-4  100000     26759 ns/op  => 267 ns/Add()
set.BenchmarkSmallSetWithManyCollisions-4  100000     26277 ns/op
set.BenchmarkMediumSetWithFewCollisions-4    5000    449336 ns/op
set.BenchmarkMediumSetWithMoreCollisions-4    5000    466167 ns/op  => 93 ns/Add()
set.BenchmarkMediumSetWithManyCollisions-4    5000    448365 ns/op
set.BenchmarkLargeSetWithFewCollisions-4     200   8967720 ns/op
set.BenchmarkLargeSetWithMoreCollisions-4     200   8984375 ns/op  =>  90 ns/Add()
set.BenchmarkLargeSetWithManyCollisions-4     200   8592885 ns/op

This is not bad: 90-93 ns/Add() here vs. 86 ns/Add() in the static version. No claims of best coding. In particular, I would not have BitSetSegment as a struct but rather as a simple slice of bytes, and would skip the ".data" tags for clarity, but I wanted to make as few changes as possible between versions. Plus, one might want to keep per-segment counts so that code to extract the values could stop when all had been found in a particular segment (average 2x speedup, maybe). That's where this radix-sort wannabe approach could shine, at least when the counts are large -- especially in a tree implementation.
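
For illustration, a minimal sketch of that per-segment-count idea. None of
this is in the code above: it assumes BitSetSegment has gained a count
field that Add keeps up to date, and Values is a hypothetical extraction
method:

// Values collects every value in the set, skipping unallocated segments
// and leaving each segment as soon as all of its values have been found.
func (this *BitSet) Values() []int {
  values := make([]int, 0, this.count)
  for e := range this.segment {
    seg := &this.segment[e]
    if seg.data == nil || seg.count == 0 {
      continue
    }
    base, found := e<<segmentBits, 0
    for i, b := range seg.data {
      for bit := 0; b != 0 && bit < 8; bit++ {
        if b&(1<<uint(bit)) != 0 {
          values = append(values, base+i*8+bit)
          found++
          b &^= 1 << uint(bit)
        }
      }
      if found == seg.count {
        break // all values in this segment have been found
      }
    }
  }
  return values
}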