Thanks for the feedback. Frankly, it doesn’t surprise me that there are performance issues in the current tree relation implementation. It’s very much a first cut to get some experience with the approach.
Yep, doing the collect each time for a big tree would be a problem. We already do quite bit of caching either with lazy vals or Kiama attributes (which are backed by an identity cache), so this looks like a place where we should do some more. I will look at it. Thanks for the tip.
The NodeNotInTree one is trickier, I think, since we wouldn’t want to disable it completely. The issue is what should happen if you accidentally do something like ask for the parent of a node that’s not in the tree structure you’re using. In an earlier version we didn’t have the NodeNotInTree check and it lead to a bunch of hard-to-find bugs since each such node just silently appeared to have no parent at all. I will take a look to make this check faster though. A linear search is not really scalable, of course.
> On 27 Sep. 2016, at 9:50 pm, Miguel Branco <
m...@spacebase.org> wrote:
>
> ps: in case it wasn't obvious, the snippets above are not correct for the general case (e.g. no reliance on same(..)) but it should still show the sort of improvement possible to get with a better structure.
>
> On 27 September 2016 at 13:26, Miguel Branco <
m...@spacebase.org> wrote:
> Just an update on this one:
>
> So we indeed found a few bottlenecks which might be of interest:
>
> One such is in tree.parent.pair(...). The unapply in pair calls image(...) which goes to graph and does a collect. The graph is implemented with a Vector, which involves a sequential scan for reference equality. That collect was killing us in big trees, in the midst of a chain doing tree.parent.pair. We switched the implementation (actually, we created a separate graph object just to try out) using Java's IdentityHashMap and we saw a very significant improvement in performance (>10x). The implementation we did is incredibly naive but it shows the concept:
> lazy val fastGraph: util.IdentityHashMap[T, Vector[U]] = {
> val hashMap = new util.IdentityHashMap[T, Vector[U]]()
> for ((t, u) <- graph) {
> val v = hashMap.get(t)
> if (v == null) {
> hashMap.put(t, Vector(u))
> } else {
> hashMap.put(t, v :+ u)
> }
> }
> hashMap
> }
>
> The image method then is:
> def image(t : T) : Vector[U] = {
> fastGraph.getOrDefault(t, Vector())
> //graph.collect { case (t1, u) if same(t, t1) => u }
> }
>
> Separately, in Tree.scala, we disabled the checks for NodeNotInTree, which itself triggered another 10x gain. So our code was taking 100s to analyze, now less than 1s.
>
> We may contribute these as proper patches in the near future, but I'd just like to let you know in advance in case it helps!
>
> cheers,
> Miguel
>
>
>
> On 21 September 2016 at 10:52, Miguel Branco <
m...@spacebase.org> wrote: