of interest to Android developers: scalac emitting 30% less code


Miguel Garcia

Jun 18, 2013, 7:21:10 AM
to scala-i...@googlegroups.com

Remember all those little classfiles for anonymous closures?
How about using the following as a template instead? (in the example, for lambdas of arity 2).

package scala.runtime

final class ReflBasedFunR2[-T1, -T2, +R](
  delegate: java.lang.reflect.Method,
  receiver: Object,
  args: Array[Object]
) extends AbstractFunction2[T1, T2, R] {

  /** Apply the body of this function to the argument{s}.
   *  @return   the result of function application.
   */
  def apply(v1: T1, v2: T2): R = {
    args(0) = v1.asInstanceOf[AnyRef]
    args(1) = v2.asInstanceOf[AnyRef]
    try {
      delegate.invoke(receiver, args: _*).asInstanceOf[R]
    } catch {
      case ite: java.lang.reflect.InvocationTargetException => throw ite.getCause()
    }
  }

}


In exchange for the 30% code reduction, each apply() now involves a reflective invocation. Isn't that a lot slower? I haven't tried on Android, but when compiling the compiler the slowdown lies between 5% and 10%. Two measures were taken to lessen the performance impact:
  • the "target method" is looked up at most once and cached afterwards. The lookup via getDeclaredMethod() is the expensive part of "reflection"; subsequent invocations of the cached Method are comparatively inexpensive.
  • in that particular benchmark, the compiler was compiled with the new optimizer at optimization level -neo:o3 (i.e., cross-library method and closure inlining). Doing so results in a program that's both faster and smaller.

Given that ReflBasedFun... subclasses AbstractFunctionX, the receiver (i.e., whoever gets handed the lambda) won't notice the difference. The "target method" is public, so there's no access restriction when it is called from the lambda. On the other hand, the "producer" of a lambda has to know how to obtain one, and the anon-closure-class isn't emitted anymore. In that sense, binary compatibility is broken.
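To make the producer side concrete, here's roughly what the lowering might look like if written by hand. This is a sketch only: it assumes the scala.runtime.ReflBasedFunR2 shown above is on the classpath, and the names (Enclosing, lambdaBody1, cachedMethod1) are my illustration, not the symbols the backend actually emits.

// source:  val f = (x: Int, y: Int) => x + y
object Enclosing {
  // the lambda body, lifted to a plain public method (invented name)
  def lambdaBody1(x: Int, y: Int): Int = x + y

  // looked up at most once via getDeclaredMethod(), then cached and reused
  private lazy val cachedMethod1: java.lang.reflect.Method =
    getClass.getDeclaredMethod("lambdaBody1", classOf[Int], classOf[Int])

  // what used to be an anonymous-class allocation becomes an allocation of the template class
  def f: (Int, Int) => Int =
    new scala.runtime.ReflBasedFunR2[Int, Int, Int](cachedMethod1, this, new Array[Object](2))
}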

To benchmark, compile with -closurify:reflect using the new compiler backend: http://magarciaepfl.github.io/scala/ (Getting Started, instructions for sbt, etc., can be found there).
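For sbt users, the flags presumably end up in scalacOptions along these lines (my assumption; the page above has the authoritative instructions, including how to point sbt at the fork):

// build.sbt sketch: assumes the fork is the active Scala instance and that
// its flags are accepted as ordinary scalac options
scalacOptions ++= Seq("-closurify:reflect", "-neo:o3")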

How do reflect-based lambdas compare with MethodHandles? To recap, support for MethodHandles was removed (for now) from the new backend because performance was disappointing. As of JDK 7, both reflection and method handles result in similarly small JARs, but reflection-based lambdas (currently) outperform their MethodHandle-based counterparts. And there's also Android, right?
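For comparison, a MethodHandle-based counterpart would look roughly as follows. This is only a sketch to make the comparison concrete (MHBasedFunR2 is an invented name, not the removed prototype):

import java.lang.invoke.{MethodHandle, MethodHandles, MethodType}

final class MHBasedFunR2[-T1, -T2, +R](delegate: MethodHandle)
    extends scala.runtime.AbstractFunction2[T1, T2, R] {

  def apply(v1: T1, v2: T2): R =
    // invokeWithArguments boxes the arguments, much as Method.invoke does;
    // the receiver is assumed to have been bound into the handle up front
    delegate.invokeWithArguments(v1.asInstanceOf[AnyRef], v2.asInstanceOf[AnyRef])
            .asInstanceOf[R]
}

// obtaining a bound handle (sketch):
//   val mh = MethodHandles.lookup().bind(receiver, "methodName",
//              MethodType.methodType(classOf[AnyRef], classOf[AnyRef], classOf[AnyRef]))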

Under the hood, reflection-based lambdas build upon the Late-Closure-Classes approach, in just two commits:

 (1) adding scala.runtime.ReflBasedFun...
     https://github.com/magarciaEPFL/scala/commit/67b1f0936c19cfd42d2192a87fccdf285fe6daab

 (2) rewriting for reflect-based lambdas (including ... documentation!)
     https://github.com/magarciaEPFL/scala/commit/8ed60361845f376c99328c3f81cd47678a57e8e4#L5R34


How about testing? The compiler bootstraps using reflection-based lambdas, so -closurify:reflect must be doing something right.

Ah, but there are areas for improvement. Some tests have simply been tuned to the "one little classfile per lambda" scheme; for example, the following fails under -closurify:reflect:

% diff test/files/specialized/fft-specialized.log test/files/specialized/fft.check
@@ -1,4 +1,4 @@
 Processing 65536 items
 Boxed doubles: 0
-Boxed ints: 3
+Boxed ints: 2
 Boxed longs: 1179811



A few more tests fail because serialization complains about j.l.r.Method standing in the way (the `delegate` field in ReflBasedFun... should have been transient, after all; see the sketch after the list below). Here's the list of failing tests, easy to reproduce. Want to help with those?

SCALAC_OPTS=-closurify:reflect ./test/partest path/to/test

  [partest] !! 1 - continuations-run/ifelse2.scala           [non-zero exit code]
  [partest] !!  227 - run/t6467.scala                           [non-zero exit code]
  [partest] !!  552 - run/t4895.scala                           [non-zero exit code]
  [partest] !!  662 - run/t6052.scala                           [non-zero exit code]
  [partest] !!  902 - run/t6410.scala                           [output differs]
  [partest] !!  931 - run/t5018.scala                           [non-zero exit code]
  [partest] !!  78 - jvm/t1143.scala                           [non-zero exit code]
  [partest] !!  86 - jvm/serialization-new.scala               [non-zero exit code]
  [partest] !! 100 - jvm/serialization.scala                   [non-zero exit code]
  [partest] !! 101 - jvm/t1116.scala                           [non-zero exit code]
  [partest] !! 113 - jvm/t1143-2                               [non-zero exit code]
  [partest] !! 18 - scalacheck/range.scala                    [ScalaCheck test failed]
  [partest] !! 21 - scalacheck/parallel-collections           [ScalaCheck test failed]
  [partest] !! 22 - scalacheck/Ctrie.scala                    [ScalaCheck test failed]
  [partest] !! 15 - specialized/fft.scala                     [output differs]
  [partest] !! 20 - specialized/spec-absfun.scala             [output differs]
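As for the serialization failures, the fix could look something like the following. It's a sketch of the transient-delegate idea, i.e. my assumption of how it might be done rather than an actual commit; note that the receiver has to be serializable as well.

import java.lang.reflect.{InvocationTargetException, Method}

final class SerializableReflFun2[-T1, -T2, +R](
  @transient private var delegate: Method,   // j.l.r.Method is not Serializable
  receiver: Object                           // must itself be serializable
) extends scala.runtime.AbstractFunction2[T1, T2, R] with Serializable {

  // serializable coordinates from which the Method can be re-resolved
  private val owner: Class[_]             = delegate.getDeclaringClass
  private val name: String                = delegate.getName
  private val paramTypes: Array[Class[_]] = delegate.getParameterTypes

  private def readObject(in: java.io.ObjectInputStream): Unit = {
    in.defaultReadObject()
    delegate = owner.getDeclaredMethod(name, paramTypes: _*)
  }

  def apply(v1: T1, v2: T2): R =
    try delegate.invoke(receiver, v1.asInstanceOf[AnyRef], v2.asInstanceOf[AnyRef]).asInstanceOf[R]
    catch { case ite: InvocationTargetException => throw ite.getCause }
}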




Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

√iktor Ҡlang

Jun 18, 2013, 7:27:19 AM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 1:21 PM, Miguel Garcia <miguel...@tuhh.de> wrote:

Remember all those little classfiles for anonymous closures?
How about using the following as a template instead? (in the example, for lambdas of arity 2).

package scala.runtime

final class ReflBasedFunR2[-T1, -T2, +R](
  delegate: java.lang.reflect.Method,
  receiver: Object,
  args: Array[Object]
) extends AbstractFunction2[T1, T2, R] {

  /** Apply the body of this function to the argument{s}.
   *  @return   the result of function application.
   */
  def apply(v1: T1, v2: T2): R = {
    args(0) = v1.asInstanceOf[AnyRef]
    args(1) = v2.asInstanceOf[AnyRef]
    try {
      delegate.invoke(receiver, args: _*).asInstanceOf[R]
    } catch {
      case ite: java.lang.reflect.InvocationTargetException => throw ite.getCause()
    }
  }

}

Just noting that this isn't thread-safe (parallel collections, Akka, Android users, etc.)

Cheers,
 





--
Viktor Klang
Director of Engineering

Twitter: @viktorklang

Miguel Garcia

Jun 18, 2013, 7:31:53 AM
to scala-i...@googlegroups.com

Any of the proposed techniques for lambdas that capture only read-only, stationary values can also be applied to reflection-based lambdas.

Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

√iktor Ҡlang

Jun 18, 2013, 7:41:09 AM
to scala-i...@googlegroups.com
I'm not sure I understand what you mean. I meant:

scala> object Foo { def plus(a: Long, b: Long): Long = a + b }
defined module Foo

scala> new ReflBasedFunR2(Foo.getClass.getDeclaredMethod("plus", classOf[Long], classOf[Long]), Foo, Array.ofDim[Object](2))
res9: ReflBasedFunR2[Any,Any,Nothing] = <function2>

scala> (1l to 100000l).par reduce res9
res10: Long = 9404356473

scala> (1l to 100000l).par reduce res9
res11: Long = 7024524869

scala> (1l to 100000l).par reduce res9
res12: Long = 12580732276

scala> (1l to 100000l).par reduce res9
res13: Long = 4552730506

scala> (1l to 100000l).par reduce res9
res14: Long = 3862915331

scala> (1l to 100000l) reduce res9
res15: Long = 5000050000

scala> (1l to 100000l) reduce res9
res16: Long = 5000050000

scala> (1l to 100000l) reduce res9
res17: Long = 5000050000

Johannes Rudolph

Jun 18, 2013, 7:43:13 AM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 1:41 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:
I'm not sure I understand what you mean. I meant:

I.e. you can't reuse that args array. 

--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Johannes Rudolph

Jun 18, 2013, 7:53:22 AM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 1:21 PM, Miguel Garcia <miguel...@tuhh.de> wrote:
In exchange for the 30% code reduction, each apply() now involves a reflective invocation. Isn't that a lot slower? I haven't tried on Android, but when compiling the compiler the slowdown lies between 5% and 10%.

AFAIR the OpenJDK standard library contains enough of a bytecode generator to compile reflective method calls into specialized call stubs behind a generic interface (`GeneratedMethodAccessor`; you often see them turning up in stack traces). This means the cost should mainly consist of adapting arguments to that generic interface plus the added cost of a virtual call (to the stub, not to the actual function). Taking that into account, your new strategy only wins for rarely called functions. Functions (objects) that are called often enough get compiled anyway, into something even more heavyweight than what we could compile to before; it's just that the decision about which ones to compile is deferred and driven by actual runtime metrics. So I don't say it's necessarily a bad strategy, but maybe it would make sense to view these applications of staged compilation in a bigger context.

Speaking of Android, I think the JVM infrastructure for this runtime generation is quite advanced (at least by Java standards), so I wouldn't be surprised if a similar mechanism were missing on Android, especially since it would involve a runtime Dalvik bytecode generator.
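For the curious, here's a quick way to watch that inflation happen on HotSpot (a sketch; the threshold is governed by the sun.reflect.inflationThreshold system property, 15 invocations by default, so 20 calls is enough):

object InflationDemo {
  // the reflectively-invoked callee reports who called it
  def whoCallsMe(): String =
    Thread.currentThread.getStackTrace.take(4).mkString("\n  ")

  def main(args: Array[String]): Unit = {
    val m = getClass.getDeclaredMethod("whoCallsMe")
    for (i <- 1 to 20) {
      val trace = m.invoke(this).asInstanceOf[String]
      // call 1 goes through NativeMethodAccessorImpl; by call 20 a
      // GeneratedMethodAccessorN stub shows up in its place
      if (i == 1 || i == 20) println(s"call $i:\n  $trace\n")
    }
  }
}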

Miguel Garcia

Jun 18, 2013, 8:05:14 AM
to scala-i...@googlegroups.com

To fix that bug, I'll make ReflBasedFunR2.apply() clone the array before it gets mutated and handed over to the reflective invocation.
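Something along these lines (a sketch against the class from my first message, not the actual commit):

package scala.runtime

final class ReflBasedFunR2[-T1, -T2, +R](
  delegate: java.lang.reflect.Method,
  receiver: Object,
  args: Array[Object]
) extends AbstractFunction2[T1, T2, R] {

  def apply(v1: T1, v2: T2): R = {
    val callArgs = args.clone()   // per-call copy, so concurrent apply() calls no longer share state
    callArgs(0) = v1.asInstanceOf[AnyRef]
    callArgs(1) = v2.asInstanceOf[AnyRef]
    try {
      delegate.invoke(receiver, callArgs: _*).asInstanceOf[R]
    } catch {
      case ite: java.lang.reflect.InvocationTargetException => throw ite.getCause()
    }
  }

}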

Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

√iktor Ҡlang

Jun 18, 2013, 8:11:59 AM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 2:05 PM, Miguel Garcia <miguel...@tuhh.de> wrote:

To fix that bug, I'll make ReflBasedFunR2.apply() clone the array before it gets mutated and handed over to the reflective invocation.

I'd just allocate a new Array for each call and skip the ctor arg for it. It'll generate quite some garbage tho.

Miguel Garcia

Jun 18, 2013, 8:27:54 AM
to scala-i...@googlegroups.com

Space optimizations are certainly possible, subject to the reflective invocation receiving (in general) arguments provided by
  (a) whoever instantiates the closure (i.e., captured values),
while others are provided by
  (b) whoever applies the closure (apply arguments).

The lifetime of (a) encloses that of (b); for fat closures (a) is bigger than (b), while for slim closures (a) may even be empty (as rules of thumb, plus or minus a lot).

With the current approach garbage will be generated, yes. On the plus side, what's available already allows Android developers to determine whether it's "good enough" in general or whether an improved version is needed.
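For illustration (invented names, not code from the branch), a shape that keeps the two kinds of arguments apart could look like this:

import java.lang.reflect.{InvocationTargetException, Method}

final class SplitReflFun2[-T1, -T2, +R](
  delegate: Method,
  receiver: Object,
  captured: Array[Object]                  // (a): provided at instantiation, lives as long as the closure
) extends scala.runtime.AbstractFunction2[T1, T2, R] {

  def apply(v1: T1, v2: T2): R = {         // (b): provided per application
    val all = new Array[Object](captured.length + 2)
    System.arraycopy(captured, 0, all, 0, captured.length)
    all(captured.length)     = v1.asInstanceOf[AnyRef]
    all(captured.length + 1) = v2.asInstanceOf[AnyRef]
    try delegate.invoke(receiver, all: _*).asInstanceOf[R]
    catch { case ite: InvocationTargetException => throw ite.getCause }
  }
}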


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

√iktor Ҡlang

Jun 18, 2013, 8:31:41 AM
to scala-i...@googlegroups.com
Absolutely, I didn't want to make it seem like it's not useful, I just wanted to fix a bug :)

Rex Kerr

Jun 18, 2013, 11:57:13 AM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 7:21 AM, Miguel Garcia <miguel...@tuhh.de> wrote:

Remember all those little classfiles for anonymous closures?
How about using the following as a template instead? (in the example, for lambdas of arity 2).

package scala.runtime

final class ReflBasedFunR2[-T1, -T2, +R](
  delegate: java.lang.reflect.Method,
  receiver: Object,
  args: Array[Object]
) extends AbstractFunction2[T1, T2, R] {

  /** Apply the body of this function to the argument{s}.
   *  @return   the result of function application.
   */
  def apply(v1: T1, v2: T2): R = {
    args(0) = v1.asInstanceOf[AnyRef]
    args(1) = v2.asInstanceOf[AnyRef]
    try {
      delegate.invoke(receiver, args: _*).asInstanceOf[R]
    } catch {
      case ite: java.lang.reflect.InvocationTargetException => throw ite.getCause()
    }
  }

}


In exchange for the 30% code reduction, each apply() now involves a reflective invocation. Isn't that a lot slower?

Yes, it's a lot slower--about 10x slower than a non-inlined function call, and it can't be inlined by any JVM I know of.  Even in my relatively heavyweight test below it's 3x slower.

You also have to create the Method in advance (in a companion object, say) because that lookup takes forever (~70x slower than creating a function).

If the compiler's inliner can erase these at all hot sites, it might be a worthwhile tradeoff. Otherwise, 10x-70x speed for 30% space is the kind of tradeoff you're usually happy to make in favor of speed.

If reflection outperforms method handles, that's a really horrible condemnation of method handles.  Are you certain this was properly measured head-to-head?  I thought the whole point of method handles was to have something faster than reflection (otherwise why bother?).

  --Rex
 
P.S. Benchmarking code below.  You will need Thyme to run it.  (Paste into REPL, for example.)

import java.lang.reflect.{Array => yarrA, _}


final class ReflBasedFunR2[-T1, -T2, +R](
  delegate: Method, receiver: Object
) {

  def apply(v1: T1, v2: T2): R = {
    try {
      delegate.invoke(
        receiver, v1.asInstanceOf[AnyRef], v2.asInstanceOf[AnyRef]
      ).asInstanceOf[R]
    } catch {
      case ite: InvocationTargetException => throw ite.getCause()
    }
  }
}

abstract class DirectFunR2[-T1, -T2, +R] {

  def apply(v1: T1, v2: T2): R
}

class Tester {
  val Pass = Some(())
  val s = Array.fill(1025)(util.Random.nextInt.toString)
  val o = Array.fill(1024)(Option(()))
  def f0(v1: String, v2: String) = {
    if (v1.charAt(0)==v2.charAt(0)) Pass else None
  }
  def f1Create() = f0 _
  def f2Create() = new DirectFunR2[String, String, Option[Unit]] {
    def apply(v1: String, v2: String) = f0(v1,v2)
  }
  def f3Create() = new ReflBasedFunR2[String,String,Option[Unit]](
    getClass.getMethod("f0", classOf[String], classOf[String]), this
  )
  val f1 = f1Create
  val f2 = f2Create
  val f3 = f3Create
  def runf0 = { var i = 0; while(i < o.length) { o(i) = f0(s(i), s(i+1)); i += 1 }; o }
  def runf1 = { var i = 0; while(i < o.length) { o(i) = f1(s(i), s(i+1)); i += 1 }; o }
  def runf2 = { var i = 0; while(i < o.length) { o(i) = f2(s(i), s(i+1)); i += 1 }; o }
  def runf3 = { var i = 0; while(i < o.length) { o(i) = f3(s(i), s(i+1)); i += 1 }; o }
  def mkf0 = { var i = 0; while(i < o.length) { o(i) = (f0 _)(s(i), s(i+1)); i += 1 }; o }
  def mkf1 = { var i = 0; while(i < o.length) { o(i) = f1Create()(s(i), s(i+1)); i += 1 }; o }
  def mkf2 = { var i = 0; while(i < o.length) { o(i) = f2Create()(s(i), s(i+1)); i += 1 }; o }
  def mkf3 = { var i = 0; while(i < o.length) { o(i) = f3Create()(s(i), s(i+1)); i += 1 }; o }
}

val th = ichi.bench.Thyme.warmed(verbose = print)
val test = new Tester
val r0 = th.Warm(test.runf0)
val r1 = th.Warm(test.runf1)
val r2 = th.Warm(test.runf2)
val r3 = th.Warm(test.runf3)
val m0 = th.Warm(test.mkf0)
val m1 = th.Warm(test.mkf1)
val m2 = th.Warm(test.mkf2)
val m3 = th.Warm(test.mkf3)
th.pbenchOffWarm(title = "Inline call vs. val")(r0)(r1) apply 0
th.pbenchOffWarm(title = "Inline call vs. custom")(r0)(r2) apply 0
th.pbenchOffWarm(title = "Inline call vs. reflect")(r0)(r3) apply 0
th.pbenchOffWarm(title = "Direct _ vs. one indirection")(m0)(m1) apply 0
th.pbenchOffWarm(title = "Direct _ vs. custom creation")(m0)(m2) apply 0
th.pbenchOffWarm(title = "Direct _ vs. reflect setup")(m0)(m3) apply 0


Alec Zorab

Jun 18, 2013, 12:07:12 PM
to scala-i...@googlegroups.com
Content filters at work won't actually let me get to the blog linked ([1]), but I seem to recall that HotSpot inflates (and can then JIT) reflective calls after some number of invocations.



Rex Kerr

Jun 18, 2013, 12:27:04 PM
to scala-i...@googlegroups.com
I didn't mean that reflection isn't optimized--the values I measure are after optimization.  I just don't think it can take the optimization all the way to inlining.  (It can with regular functions under limited circumstances.)

  --Rex

Miguel Garcia

Jun 18, 2013, 2:36:52 PM
to scala-i...@googlegroups.com
Rex,

The motivation for "reflection-based lambdas" isn't replacing anon-closure-classes (ever). Rather, at this stage, the main goal is finding out what *might* work reasonably well on Android, saving 30% JAR size. For all other cases, the new backend + the new optimizer deliver the best results via -closurify:delegating and -neo:o3.

Regarding slowdown, yes, benchmarks that are heavy on closure application will experience unacceptable degradation with -closurify:reflect. But as the example of compiling the compiler shows, that's not the case for every Scala application (I'm not suggesting users compile the compiler on Android; it's just the testbed most familiar to me).

Regarding MethodHandles, measurements were conducted on JDK 7. The prototype [1] wasn't tuned further, for a number of reasons. For one, even fully tuned, the best MethodHandle performance only comes with JDK 8. If you ask me, the new backend could support them any day now; it's just that there's no roaring demand to have them in place.

Finally, another goal of "reflection-based lambdas" has been to put in place "hooks" for customizing closure-conversion, via -closurify alternatives. In that sense a future MethodHandles implementation will benefit from the "spring cleaning" already performed in the codebase.


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/


References
[1] prototype support for MethodHandles https://groups.google.com/forum/#!searchin/scala-internals/methodhandle/scala-internals/uBxprJixpwk/jG78X1k_92IJ

Ismael Juma

Jun 18, 2013, 3:25:51 PM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 7:36 PM, Miguel Garcia <miguel...@tuhh.de> wrote:
The motivation for "reflection-based lambdas" isn't replacing anon-closure-classes (ever). Rather, at this stage, the main goal is finding out what *might* work reasonably well on Android, saving 30% JAR size. For all other cases, the new backend + the new optimizer deliver the best results via -closurify:delegating and -neo:o3.

It's an interesting experiment, but I expect that the performance hit of relying on reflection will be even higher on Android. The negative effect on battery life caused by that is likely a bigger deal than the additional jar size these days.

Best,
Ismael

Rex Kerr

Jun 18, 2013, 3:54:13 PM
to scala-i...@googlegroups.com
On Tue, Jun 18, 2013 at 2:36 PM, Miguel Garcia <miguel...@tuhh.de> wrote:
Rex,

The motivation for "reflection-based lambdas" isn't replacing anon-closure-classes (ever). Rather, at this stage, the main goal is finding out what *might* work reasonably well on Android, saving 30% JAR size.

Fair enough, but I wouldn't consider this a successful experiment.  I'd assume that it wouldn't work in the absence of some decent on-Android performance measurements (benchmarks or realistic applications).
 
Regarding slowdown, yes, benchmarks that are heavy on closure application will experience unacceptable degradation with -closurify:reflect. But as the example of compiling the compiler shows, that's not the case for every Scala application (I'm not suggesting users compile the compiler on Android; it's just the testbed most familiar to me).

This suggests to me that the compiler doesn't make heavy use of closures and/or that, because it's a self-contained system, you can optimize away/inline a lot of the heavily-called cases. If either of these is true, it would make it rather unrepresentative of your average Scala application.

Not that it's not worth testing--I'm just deeply suspicious that this was not a representative test.

  --Rex
 

Simon Ochsenreither

Jun 18, 2013, 5:11:30 PM
to scala-i...@googlegroups.com

Regarding slowdown, yes, benchmarks that are heavy on closure application will experience unacceptable degradation with -closurify:reflect. But as the example of compiling the compiler shows, that's not the case for every Scala application (I'm not suggesting users compile the compiler on Android; it's just the testbed most familiar to me).

I think it is a reasonable approach.
For heavily used closures, one could have an annotation telling the compiler to use a different strategy (e.g., generate classes as before).
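Purely as a sketch (the name and the mechanism are invented here; nothing like this exists in scalac or the fork):

// hypothetical marker annotation a user could put on a hot lambda to request
// the classic per-lambda anonymous class instead of the reflective template
class emitClosureClass extends scala.annotation.StaticAnnotation

// hypothetical usage:
//   xs.par.reduce(((a: Long, b: Long) => a + b): @emitClosureClass)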