intsets performance

268 views
Skip to first unread message

Leo Neumeyer

unread,
Apr 17, 2017, 10:27:00 PM4/17/17
to gonum-dev
Hi all. I've been using github.com/gonum/graph to build a large graph (about 10M nodes). I lik the design it is very intuitive and easy to use. 

I noticed performance issues, building the graph was getting slower as the graph grew. I finally found the issue, something is wrong with the intsets package (golang.org/x/tools/container/intsets).

I replaced intsets with map[int]struct{} and the performance gain was dramatic:

* build the graph using simple package went from 10 minutes to 30 seconds
* connected components from 4 hours to 25 seconds

In some cases the change is trivial:

// DepthFirst implements stateful depth-first graph traversal.
type DepthFirst struct {
    EdgeFilter func(graph.Edge) bool
    Visit      func(u, v graph.Node)
    stack      linear.NodeStack
    //  visited    *intsets.Sparse
    visited map[int]struct{}
}

I will do more testing to confirm this. 

-leo
   

Dan Kortschak

unread,
Apr 17, 2017, 10:34:17 PM4/17/17
to Leo Neumeyer, gonum-dev, Alan Donovan
That's unfortunate. The code was initially written to use
map[int]struct{} and was changed to use intsets.Sparse because I noted
performance gains with that (I did not test with tens of millions of
nodes though).

It may be worth getting in contact with Alan Donovan about this (cc'd),
since he is the main author of intsets, to see if there is anything
that can be done to improve the performance of insets.Sparse.

Dan

Dan Kortschak

unread,
Apr 18, 2017, 6:37:34 PM4/18/17
to Leo Neumeyer, gonum-dev
For the record, the relevant commits were:

commit 0f9dfa1251e569ecb9f2f4287ead78cbdd0ed298
Author: kortschak <dan.ko...@adelaide.edu.au>
Date:   Wed Sep 9 20:51:59 2015 +0930

    topo: use use golang.org/x/.../intsets for TarjanSCC
    
    name                     old mean              new mean              delta
    TarjanSCCGnp_10_tenth    20.3µs × (0.97,1.03)  20.0µs × (0.99,1.05)     ~    (p=0.400 n=10+9)
    TarjanSCCGnp_100_tenth    489µs × (0.98,1.01)   448µs × (0.99,1.01)    -8.38% (p=0.000 n=8+9)
    TarjanSCCGnp_1000_tenth  39.2ms × (0.99,1.02)  37.2ms × (1.00,1.00)    -5.12% (p=0.000 n=8+9)
    TarjanSCCGnp_10_half     28.9µs × (1.00,1.00)  26.7µs × (1.00,1.01)   -7.67% (p=0.000 n=9+10)
    TarjanSCCGnp_100_half    1.90ms × (0.99,1.01)  1.72ms × (0.97,1.02)  -9.49% (p=0.000 n=10+10)
    TarjanSCCGnp_1000_half    177ms × (1.00,1.01)   169ms × (0.96,1.06)  -4.68% (p=0.001 n=10+10)


commit cf7ce3ae79fc6ddb4241d7a3a8e325f5b28d6748
Author: kortschak <dan.ko...@adelaide.edu.au>
Date:   Wed Sep 9 11:35:46 2015 +0930

    traverse: use golang.org/x/.../intsets
    
    This gives a significant performance improvements:
    
    WalkAllBreadthFirstGnp_10_tenth    6.79µs ± 3%  4.76µs ± 2%  -29.91%   (p=0.000 n=10+9)
    WalkAllBreadthFirstGnp_100_tenth    233µs ± 1%   172µs ± 1%  -26.26%    (p=0.000 n=9+8)
    WalkAllBreadthFirstGnp_1000_tenth  19.5ms ± 1%  15.9ms ± 0%  -18.69%   (p=0.000 n=10+9)
    WalkAllBreadthFirstGnp_10_half     14.8µs ± 2%  11.1µs ± 1%  -24.90%  (p=0.000 n=10+10)
    WalkAllBreadthFirstGnp_100_half     866µs ± 2%   758µs ± 2%  -12.44%   (p=0.000 n=8+10)
    WalkAllBreadthFirstGnp_1000_half   82.2ms ± 2%  73.0ms ± 1%  -11.16%  (p=0.000 n=10+10)
    WalkAllDepthFirstGnp_10_tenth      6.84µs ± 1%  4.71µs ± 0%  -31.12%    (p=0.000 n=9+9)
    WalkAllDepthFirstGnp_100_tenth      234µs ± 1%   177µs ± 5%  -24.26%   (p=0.000 n=9+10)
    WalkAllDepthFirstGnp_1000_tenth    19.7ms ± 1%  16.0ms ± 1%  -18.46%  (p=0.000 n=10+10)
    WalkAllDepthFirstGnp_10_half       14.7µs ± 1%  11.1µs ± 0%  -24.28%  (p=0.000 n=10+10)
    WalkAllDepthFirstGnp_100_half       894µs ± 6%   764µs ± 3%  -14.53%  (p=0.000 n=10+10)
    WalkAllDepthFirstGnp_1000_half     82.6ms ± 0%  73.5ms ± 1%  -11.06%  (p=0.000 n=10+10)
    
    Using the non-pointer type for visited is clearer, but resulted in
    an ~4-8% performance drop for some cases. So leaving as the pointer.


commit 0cb3ef48ea002db3ba72d0f7c9b237b991f95c7c
Author: kortschak <dan.ko...@adelaide.edu.au>
Date:   Sun Oct 4 15:57:46 2015 +1030

    simple: clean up code and docs for simple graphs


commit cee7cd2ec964b689269d4a6ab33f8170cb0cb486
Author: kortschak <dan.ko...@adelaide.edu.au>
Date:   Fri Oct 30 18:30:48 2015 +1030

    graph: return all neighbours reachable from u
    
    The transformation of a digraph to an undirected graph means nodes
    returned by To should be returned by From.


I think the remaining instances of use were included in new code after
the decision to use intsets.Sparse.

Dan Kortschak

unread,
May 27, 2017, 12:14:59 AM5/27/17
to Leo Neumeyer, gonum-dev, Alan Donovan
Any update on this? I'm looking at changes to the graph API that impact
on this (changing ID values to int64 and other things - see [1] and
[2]). Some of these things may be helped if we leave intsets.Sparse, so
it would be nice to know the details of this.

Are you able to share a minimal reproducer?

[1]https://github.com/gonum/graph/issues/87
[2]https://github.com/gonum/graph/issues/109

Leo Neumeyer

unread,
May 28, 2017, 1:56:37 PM5/28/17
to Dan Kortschak, gonum-dev, Alan Donovan
Here is a simple test creating 1M ids with intset vs using map[int]struct{}. The speed ratio is 34x and increases with the number of IDs. (for 2M the ratio is 66x so clearly not linear!).

I suggest we replace intset throughout. I use len(structset) to get the next ID and don't reuse IDs after a node is removed using delete(structset, someID). To support pretty much any practical use case we could use uint64 instead of int to have more IDs available.

We should also increase the graph size to millions in the benchmarks to catch performance issues. 

what do you think?

-leo

-----

package simple

import (
"testing"
"time"

)

var freeIDs intsets.Sparse
var usedIDs intsets.Sparse
var structset = make(map[int]struct{})

const nIter = 1000000

func TestIntset(t *testing.T) {

t0 := time.Now()
for i := 0; i < nIter; i++ {
id := newID()
freeIDs.Remove(id)
usedIDs.Insert(id)
}
t.Logf("numIDs=%d, dur=%v", nIter, time.Since(t0))
}

func TestIntset2(t *testing.T) {

t0 := time.Now()
for i := 0; i < nIter; i++ {
_ = newID2()
}
t.Logf("numIDs=%d, dur=%v", nIter, time.Since(t0))
}

func newID() int {

var id int
if freeIDs.Len() != 0 && freeIDs.TakeMin(&id) {
return id
}
if id = usedIDs.Max(); id < maxInt {
return id + 1
}
for id = 0; id < maxInt; id++ {
if !usedIDs.Has(id) {
return id
}
}
panic("unreachable")
}

func newID2() int {

id := len(structset)
structset[id] = struct{}{}
return id
}


$ go test -v -run Intset   github.com/gonum/graph/simple
=== RUN   TestIntset
--- PASS: TestIntset (10.81s)
intset_test.go:24: numIDs=1000000, dur=10.812868181s
=== RUN   TestIntset2
--- PASS: TestIntset2 (0.31s)
intset_test.go:33: numIDs=1000000, dur=313.803051ms

Dan Kortschak

unread,
May 28, 2017, 8:08:11 PM5/28/17
to Leo Neumeyer, gonum-dev, Alan Donovan
I'm quite surprised at that (I have attached benchmark that make use of
testing and code in gonum/gonum#30 - this gives better information for
this kind of thing).

I would like to see benchmark results on real use though - not just
obtaining IDs. I suspect that intsets.Sparse is better for read mainly
work (I have benchmark support for its use in the algorithms where is
has been used in other packages).

Note that your approach to getting IDs as you describe it is incorrect;
if you are deleting from the map, then len(s) will return you values
that are not guaranteed to be free to use. You can either not delete
(wasting memory) or you can keep an additional variable holding the
next ID and bump that when an ID is added. In practice, the NewID
method of the simple graphs will stay the same (you don't have to use
them) - though maybe we can add an IDSource interface value to the
graph types that allows users to customise this behaviour (nil would be
the current behaviour).

I don't think that IDs should be unsigned; there are very good uses for
signed node IDs, but I agree using a 64 bit integer is worth while - as
an example, there is no reason a disk-backed store could not be used on
32 bit architectures. This complexifies the issue of using
intsets.Sparse as we would need to fork it to keep int64 support on 32
bit platforms.
ids_test.go

Daniel Skinner

unread,
May 29, 2017, 8:16:46 AM5/29/17
to Dan Kortschak, Leo Neumeyer, gonum-dev, Alan Donovan
It's often more flexible and faster to use sort.Search, see attached for example which performs 10 million inserts in less time than map[int]struct. 

$ go test -run none -bench . -benchmem
BenchmarkStructSet-4   10000000       211 ns/op      40 B/op       0 allocs/op
BenchmarkIntSlice-4     10000000       122 ns/op      42 B/op       0 allocs/op

For other set types, such as strings, it often has much fewer allocations as well. Even at 100 million inserts, it is significantly faster.

$ go test -run none -bench . -benchmem -benchtime 10s
BenchmarkStructSet-4   100000000       210 ns/op      34 B/op       0 allocs/op
BenchmarkIntSlice-4     100000000       132 ns/op      49 B/op       0 allocs/op



--
You received this message because you are subscribed to the Google Groups "gonum-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gonum-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
ids_test.go

Daniel Skinner

unread,
May 29, 2017, 8:51:49 AM5/29/17
to Dan Kortschak, Leo Neumeyer, gonum-dev, Alan Donovan
I've actually been testing/working with a lot of random strings and looking at this benchmark, realized the predictability of N for performing an insert, my bad. Using a `var samples = rand.Perm(N)` is a better comparison for ints in what I was doing and lowers the ceiling dramatically.

For 10,000 inserts per iteration:

$ go test -run none -bench . -benchmem
BenchmarkStructSet-4      1000   1421114 ns/op  383133 B/op     625 allocs/op
BenchmarkIntSlice-4         200   9900846 ns/op  386296 B/op      20 allocs/op

So, nevermind :D
ids_test.go

Leo Neumeyer

unread,
May 29, 2017, 12:12:51 PM5/29/17
to Dan Kortschak, gonum-dev, Alan Donovan
You are right, in my actual application I keep the maxID in a separate variable. I found this problem in a real application with a very large graph. The test code is just to reproduce the problem. It shows that intset does not scale for this use case. map is constant time for reads, writes, and deletes and O(n) for memory.

-leo

Dan Kortschak

unread,
May 29, 2017, 6:25:37 PM5/29/17
to Leo Neumeyer, gonum-dev, Alan Donovan
No worries. I would like to be able to look at the kinds of things that
we provide and that you are using though. The problem I face is that
the change from intsets.Sparse to internal.IntSet potentially
propagates throughout graph/... and I have benchmarking that shows
improvements from the change to intsets.Sparse. So I would like to be
well informed before I go through and make the reversions to
internal.IntSet.

Dan Kortschak

unread,
May 30, 2017, 8:24:18 PM5/30/17
to Alan Donovan, Leo Neumeyer, gonum-dev
Thanks Alan,

I think the conclusion we can draw from this in conjunction with other
conversations we've had is that we'll probably move back to the map-
based sets thoughout the suite.

Part of this decision is due to the desire to move to int64 sets for
all platforms (for the case of on-disk graphs on 32 bit platforms) and
the effort needed to make intsets.Sparse work this way.

cheers
Dan

On Tue, 2017-05-30 at 14:41 -0400, Alan Donovan wrote:
> Late to the party---sorry.
>
> A few observations about intsets.Sparse:
>
> 1. intsets.Sparse was intended for pointer analysis, whose sets often
> have
> a particular locality structure for which sparse bit vectors are a
> good
> fit.  They may not be the right data type for all arbitrary graphs.
>
> 2. since Sparse was created, the cost of storing pointers to the Go
> heap
> has increased (due to GC write barriers); meanwhile, Go's built-in
> map has
> gotten faster, so the performance advantage of Sparse even for
> pointer
> analysis may not be as great as it once was.
>
> 3. Unlike Go maps, Sparse sets have deterministic iteration order,
> which
> may be worth paying some price for since it makes debugging
> applications
> easier.  (Debugging a nondeterministic pointer analysis would have
> been
> next to impossible.)
>
> 4. The bitsPerBlock parameter (a constant) may make a big difference
> to
> performance for a given problem.  Try various multiples of 64 to see
> how
> performance varies.  Results depend on many factors.
>
> 5. In your benchmark, don't use Len() != 0; it's expensive for Sparse
> sets.
> Use !Empty(), or just call TakeMin directly.
>
> Also:
>
> re: sort.Search: I recently found that its dynamic call accounts for
> two
> third of execution time.  Try copy/pasting the code and using a
> static
> function call.
>
>
>
> On 29 May 2017 at 18:25, Dan Kortschak <dan.ko...@adelaide.edu.au
--
Omnes mundum facimus.

Dan Kortschak <dan.ko...@adelaide.edu.au>
F9B3 3810 C4DD E214 347C B8DA D879 B7A7 EECC 5A40
10C7 EEF4 A467 89C9 CA00 70DF C18F 3421 A744 607C


Leo Neumeyer

unread,
Sep 18, 2017, 8:10:29 PM9/18/17
to gonum-dev
Hi, sorry it took so long. I finally had a chance to try the new implementation of the graph package. I can confirm that adding nodes is now constant time (I compared with my modified version and compute time is comparable.) Thanks!
Reply all
Reply to author
Forward
0 new messages