a poor man's MethodHandles

73 views
Skip to first unread message

Miguel Garcia

unread,
Nov 8, 2012, 3:01:35 PM11/8/12
to scala-i...@googlegroups.com

The way UnCurry lowers closures, lexical capture results in bytecode performing chains of "heap-hops", e.g. to reach out to C's field from a closure class nested in another:

class C {

  var cfield = 0

  def pqr(args: List[Int]) {
    for (i <- 1 to 10; j <- 11 to 20) {
      cfield = i + j + args.size
    }
  }

}

The innermost appply() receives just one stack-argument, the rest is reached via dot navigation:

    def apply$mcVI$sp(v1: Int): Unit =
       C$$anonfun$pqr$1$$anonfun$apply$mcVI$sp$1.this.$outer.C$$anonfun$$$outer().cfield_=(
           C$$anonfun$pqr$1$$anonfun$apply$mcVI$sp$1.this.v1$1 +
           v1 +
           C$$anonfun$pqr$1$$anonfun$apply$mcVI$sp$1.this.$outer.args$2.size()
       );

An alternative approach

UnCurry could instead lower as shown below, forcing the invoker to perform navigation (once), such that the callee need access only local vars in "closure methods", much like a MethodHandle would allow:

class C {

  var cfield = 0

  def abc(args: List[Int]) {

      def closuA(i: Int) {

          def closuB(j: Int) { cfield = i + j + args.size }

        (11 to 20) foreach { j => closuB(j) }
      }

    (1  to 10) foreach { i => closuA(i) }

  }

}

In this case, the innermost apply() receives "flattend" (scalarified) all the arguments it needs:

  final def C$$closuB$1(j: Int, args$1: List, i$1: Int): Unit = C.this.cfield_=(i$1.+(j).+(args$1.size()));

Should run faster.

It's true that, until Closure Elimination kicks in, the anon-closure-classes still remain as middlemen. The invoker of the method above is hosted in one such middleman, and thus needs to perform "heap-access" as shown below. However, it's on THIS only, and thus should be on the cache already:

    <specialized> def apply$mcVI$sp(v1: Int): Unit =
        C$$anonfun$C$$closuA$1$1.this.$outer.C$$closuB$1(v1, C$$anonfun$C$$closuA$1$1.this.args$1, C$$anonfun$C$$closuA$1$1.this.i$1);

I don't see any disadvantages, and moreover this lowering would give the JIT compiler an easier time (a) kicking in earlier (if only because of smaller code size); and (b) optimization opportunities (local-vars accesses are easier to optimize than heap-navigation).


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/



Grzegorz Kossakowski

unread,
Nov 8, 2012, 3:46:16 PM11/8/12
to scala-i...@googlegroups.com
Good thoughts Miguel. I agree that this seems like an interesting thing to explore. How hard would it be to implemented what you described above and gather some data on bytecode sizes of some lambda-heavy codebases (compiler is a good candidate for that)?

--
Grzegorz Kossakowski

Miguel Garcia

unread,
Nov 8, 2012, 4:05:12 PM11/8/12
to scala-i...@googlegroups.com

That's right, compiling the compiler under this approach would give a concrete verdict. Before that, everyone is welcome to point out any corner cases / counter-examples that come to mind!

Having said this, the transformation is only additive, in fact UnCurry can be left as-is. Right before UnCurry, we need to transform this AST shape:

  rcv.hiOMethod(args_i , Function(vparams, body) , args_j ).etc

into this one:

  rcv.hiOMethod(args_i ,
                {
                     def closuApply( <vparams> ) = <body>
             
                  Function( vparams, <call-to-closuApply-with-vparams-as-args> )
                }
                args_j ).etc


Good old UnCurry, ExplicitOuter, and LambdaLift will do the rest.


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

Rex Kerr

unread,
Nov 8, 2012, 5:00:34 PM11/8/12
to scala-i...@googlegroups.com
Looks very promising!

What would happen in cases of multiple nesting, e.g.


class C {
  var cfield = 0
  def something = {
    var sfield = 0

    for (i <- 1 to 10; j <- 11 to 20) {
      var k = cfield + sfield
      cfield = i + j + k + args.size
      k += i*j
      sfield = i*j*k/args.size
    }
  }
}

Now we have variables at various different levels all of which need to be set, none of which can really see each other without hopping.  One can envision various schemes (including giving up and not optimizing), but did you have something in mind?

  --Rex

Miguel Garcia

unread,
Nov 8, 2012, 5:31:21 PM11/8/12
to scala-i...@googlegroups.com

As expected, by the time the innermost apply() is invoked all environment usages have been collected as formal params.

To see how that works in your example, run the compiler with -Xprint:refchecks to get code "just before UnCurry" , and manually perform the transformation described above. The following is obtained:

class C extends scala.AnyRef {
  var cfield: Int = 0;
  def something(args: List[Int]): Unit = {
    var sfield: Int = 0;
    Predef.intWrapper(1).to(10).foreach[Unit](

      {

        def closuA(i: Int) = {
          Predef.intWrapper(11).to(20).foreach[Unit](

            {
              def closuB(j: Int) = {
                var k: Int = cfield.+(sfield);
                cfield_=(i.+(j).+(k).+(args.size));
                k = k.+(i.*(j));
                sfield = i.*(j).*(k)./(args.size)
              }

              (j: Int) => closuB(j)
            }

          )
        }

        (i: Int) => closuA(i)

      }

    )
  }
}



Compile, -Xprint:cleanup already shows that heap navigation is down to accesses on the current instance only:

    final def C$$closuB$1(j: Int, args$1: List, sfield$1: runtime.IntRef, i$1: Int): Unit = {
      var k: Int = C.this.cfield().+(sfield$1.elem);
      C.this.cfield_=(i$1.+(j).+(k).+(args$1.size()));
      k = k.+(i$1.*(j));
      sfield$1.elem = i$1.*(j).*(k)./(args$1.size())
    };


The proposed transformation plays best the more captures there are. A closure like: "(1 to 10) foreach println" won't benefit from scalar replacement, thus it doesn't make sense applying the xform. Still, if applied, no big harm: the resulting extra methods are always final, and either the JVM inliner or scalac's can easily deal with them.


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
Reply all
Reply to author
Forward
0 new messages