Chains: env vs in vs env.in

Miguel Branco

unread,

Sep 7, 2016, 5:46:07 AM9/7/16

to ki...@googlegroups.com

Dear all,

Looking at the Oberon0 example, and having faced am issue in our source code when trying to define chains across multiple traits (e.g. envinDef(in) orElse super.envin(in) ), I found the following:

    def envout(out : SourceNode => Environment) : SourceNode ==> Environment = {
        case b : Block       => leave(out(b))
        case ConstDecl(d, _) => env(d)
        case TypeDecl(d, _)  => env(d)
        case n @ IdnDef(i)   => define(out(n), i, entity(n))
        case a : Assignment  => env.in(a)
        case e : Expression  => env.in(e)
    }

My question is then: env(...) is an alias for env.out(). But what would be the different between doing in() vs env.in() ? When to use one or the other? (Same for out vs env.out.) The intuition is that e.g. env.in is the real environment while in is a default one (use to do e.g. enter(in(..)) since enter(env.in(..)) would cause a loop?). It'd be helpful to have this more formally understood! :D

thanks so much! Really appreciating your work here!!

Many thanks,
Miguel

Tony Sloane

unread,

Sep 7, 2016, 11:16:41 PM9/7/16

to ki...@googlegroups.com

Hi Miguel,

Your intuition is correct. The in (resp. out) argument gives access to the default. In other words, that’s the value that the chain will have either in or out of a node, if you don’t specify any additional computation for that situation.

So, in define(out(n), i, entity(n)) we are taking the default value which is computed by the sub-tree and adding a new definition to it. In something like enter(in(n)) we are taking the value coming in from the parent node and entering a new scope in that environment.

When we say env(d) (which is an alias for env.out(d)) we are saying “give us the value of the chain as it heads out of d, including any extra processing that applies at d”. If we said out(d) we would just get the default value heading out of d without the extra processing.

Yes, something like using “enter(env.in(d))” in the definition of env.in at d would result in a cycle. since you would be defining env.in(d) in terms of itself.

I’m not sure if this explanation makes things as clear as they could be. We are working on more comprehensive docs for Kiama so this will definitely get good treatment there.

> thanks so much! Really appreciating your work here!!

No worries. Thanks for the feedback and we’re glad Kiama is of use to you.

cheers,
Tony

Miguel Branco

unread,

Sep 21, 2016, 4:52:42 AM9/21/16

to ki...@googlegroups.com

Makes sense, thank you!

A follow-up question if I may:

I find that in fairly large trees, a lot of time seems to be spent in pattern matching tree.parent.pair(...).

In some situations we use it fairly regularly as part of the definition of the chains.

Does this intuitively make sense to you - a fairly obviously slowdown in trees with 1K to 10K nodes - or are we aiming at the wrong duck here?

cheers!

--
You received this message because you are subscribed to the Google Groups "kiama" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kiama+unsubscribe@googlegroups.com.
To post to this group, send email to ki...@googlegroups.com.
Visit this group at https://groups.google.com/group/kiama.
For more options, visit https://groups.google.com/d/optout.

Miguel Branco

unread,

Sep 27, 2016, 7:26:25 AM9/27/16

to ki...@googlegroups.com

Just an update on this one:

So we indeed found a few bottlenecks which might be of interest:

One such is in tree.parent.pair(...). The unapply in pair calls image(...) which goes to graph and does a collect. The graph is implemented with a Vector, which involves a sequential scan for reference equality. That collect was killing us in big trees, in the midst of a chain doing tree.parent.pair. We switched the implementation (actually, we created a separate graph object just to try out) using Java's IdentityHashMap and we saw a very significant improvement in performance (>10x). The implementation we did is incredibly naive but it shows the concept:

lazy val fastGraph: util.IdentityHashMap[T, Vector[U]] = {
  val hashMap = new util.IdentityHashMap[T, Vector[U]]()
  for ((t, u) <- graph) {
    val v = hashMap.get(t)
    if (v == null) {
      hashMap.put(t, Vector(u))
    } else {
      hashMap.put(t, v :+ u)
    }
  }
  hashMap
}

The image method then is:

def image(t : T) : Vector[U] = {
  fastGraph.getOrDefault(t, Vector())
  //graph.collect { case (t1, u) if same(t, t1) => u }
}

Separately, in Tree.scala, we disabled the checks for NodeNotInTree, which itself triggered another 10x gain. So our code was taking 100s to analyze, now less than 1s.

We may contribute these as proper patches in the near future, but I'd just like to let you know in advance in case it helps!

cheers,

Miguel

Miguel Branco

unread,

Sep 27, 2016, 7:51:16 AM9/27/16

to ki...@googlegroups.com

ps: in case it wasn't obvious, the snippets above are not correct for the general case (e.g. no reliance on same(..)) but it should still show the sort of improvement possible to get with a better structure.

Tony Sloane

unread,

Sep 28, 2016, 11:47:48 PM9/28/16

to Kiama

Hi Miguel,

Thanks for the feedback. Frankly, it doesn’t surprise me that there are performance issues in the current tree relation implementation. It’s very much a first cut to get some experience with the approach.

Yep, doing the collect each time for a big tree would be a problem. We already do quite bit of caching either with lazy vals or Kiama attributes (which are backed by an identity cache), so this looks like a place where we should do some more. I will look at it. Thanks for the tip.

The NodeNotInTree one is trickier, I think, since we wouldn’t want to disable it completely. The issue is what should happen if you accidentally do something like ask for the parent of a node that’s not in the tree structure you’re using. In an earlier version we didn’t have the NodeNotInTree check and it lead to a bunch of hard-to-find bugs since each such node just silently appeared to have no parent at all. I will take a look to make this check faster though. A linear search is not really scalable, of course.

cheers,
Tony

> On 27 Sep. 2016, at 9:50 pm, Miguel Branco <m...@spacebase.org> wrote:
>
> ps: in case it wasn't obvious, the snippets above are not correct for the general case (e.g. no reliance on same(..)) but it should still show the sort of improvement possible to get with a better structure.
>
> On 27 September 2016 at 13:26, Miguel Branco <m...@spacebase.org> wrote:
> Just an update on this one:
>
> So we indeed found a few bottlenecks which might be of interest:
>
> One such is in tree.parent.pair(...). The unapply in pair calls image(...) which goes to graph and does a collect. The graph is implemented with a Vector, which involves a sequential scan for reference equality. That collect was killing us in big trees, in the midst of a chain doing tree.parent.pair. We switched the implementation (actually, we created a separate graph object just to try out) using Java's IdentityHashMap and we saw a very significant improvement in performance (>10x). The implementation we did is incredibly naive but it shows the concept:
> lazy val fastGraph: util.IdentityHashMap[T, Vector[U]] = {
> val hashMap = new util.IdentityHashMap[T, Vector[U]]()
> for ((t, u) <- graph) {
> val v = hashMap.get(t)
> if (v == null) {
> hashMap.put(t, Vector(u))
> } else {
> hashMap.put(t, v :+ u)
> }
> }
> hashMap
> }
>
> The image method then is:
> def image(t : T) : Vector[U] = {
> fastGraph.getOrDefault(t, Vector())
> //graph.collect { case (t1, u) if same(t, t1) => u }
> }
>
> Separately, in Tree.scala, we disabled the checks for NodeNotInTree, which itself triggered another 10x gain. So our code was taking 100s to analyze, now less than 1s.
>
> We may contribute these as proper patches in the near future, but I'd just like to let you know in advance in case it helps!
>
> cheers,
> Miguel
>
>
>
> On 21 September 2016 at 10:52, Miguel Branco <m...@spacebase.org> wrote:

Miguel Branco

unread,

Sep 29, 2016, 7:53:49 AM9/29/16

to ki...@googlegroups.com

Thanks for the feedback. We're slowly getting more familiar with Kiama internals and we're happy to contribute on this as we manage. If there are any particular hints/places to look/changes you have in mind, let us know.

BTW, we also see some issues with lazy cloning the tree; likely due to similar performance issues.

And you're completely right on NodeNotInTree. It was really just a compromise for now; we run our "compiler" in "training wheels" mode where it checks itself the internal consistency between "phases" (i.e. between strategies being applied) so its removal is a lesser issue. But it's by no means a proper solution.