new backend, part 1 of 2: code refactored into feature-per-commit, future work

150 views
Skip to first unread message

Miguel Garcia

unread,
Mar 16, 2013, 6:30:46 AM3/16/13
to scala-i...@googlegroups.com


As before the new backend delivers between 10% and 20% compiler speedup, please try it yourself: Getting Started (Section 4) at http://magarciaepfl.github.com/scala/

In the style of "feature-per-commit" , the new backend and optimizer look as follows:
  https://github.com/magarciaEPFL/scala/compare/master...GenRefactored5


Theme: Lower GC
---------------

Two commits exemplify this. First: in Constructors, (code is emitted to) test outer for null, followed by throwing an NPE. As per the JVM spec, a smaller tree is enough:

-          IF (from OBJ_EQ NULL) THEN Throw(NewFromConstructor(NPEConstructor)) ELSE result
+          IF (from OBJ_EQ NULL) THEN Throw(gen.mkZero(ThrowableClass.tpe)) ELSE result


More aggressively, "sharing immutable Trees for lower GC overhead" has the compiler reuse zero-literals (no matter how many times they are typed, they'll always type the same)
  https://github.com/magarciaEPFL/scala/commit/41c70f87a73534f6e15de5cd08608cd678350693


Theme: Improvements unrelated to the new backend/optimizer
----------------------------------------------------------

Actually all commits up to "new bytecode emitter, GenBCode, activated by default" aren't tied to the new backend/optimizer in any way. The last one ("sharing immutable Trees") also falls in this category.


Theme: New backend emitter
--------------------------

GenBCode works as a drop-in replacement of GenASM (after all it emits code with the same API). From the let go, however, GenBCode avoids bugs that require *further* *additional* *SLOW* processing on the part of GenICode + GenASM (to avoid those bugs) bugs like SI-6720 (backedges in constructor arguments), or an over-populated InnerClasses JVM attribute (SI-6759, SI-6546)

As bytecode emitters go, GenBCode is easy to review. Have you found anywhere else a fairly complete description of how try-catch-finally is lowered, other than here?
  https://github.com/magarciaEPFL/scala/blob/c136f88943a834c565f0926b82402b549ba33287/src/compiler/scala/tools/nsc/backend/jvm/GenBCode.scala#L1420

It's true that, as commits progress, one can see that GenBCode is more than a faster GenASM. To name just two synergies:

  (a) GenBCode plays well with Late-Closure-Classes (aka leaner closure ASTs), doing part of the job (materializing them late in the pipeline).

  (b) also related to Late-Closure-Classes, a more thorough outer-pointer elimination is done in the whereabouts of GenBCode,
      https://github.com/magarciaEPFL/scala/commit/0a426b640411ee85983a6deb8a5612ebaa6d5ff3#diff-5

Features like the above are not needed initially to understand the architecture of GenBCode, an architecture that becomes clear from the self-contained commit at:
  https://github.com/magarciaEPFL/scala/commit/c136f88943a834c565f0926b82402b549ba33287


Theme: the new optimizer
------------------------

These commits constitute the second half of https://github.com/magarciaEPFL/scala/compare/master...GenRefactored5

There's the very rutinary (dead-code elimination and so on) as part of "Level 1 of the new optimizer: intra-method optimizations"

Inlining is also well-known territory (only that this time, documented, in the sense that pre-conditions and post-conditions can be gleaned from the documentation)

There are new optimizations (without ICode counterpart) around closures,
that's what commit "Level 3, new optimizer: GC-savvy closures (singleton closures, minimal closure state)"
is all about. Innovation is a good thing! Otherwise the compiler will continue to emit dumbed down closures!

In all cases I've found the new optimizer emits code that actually run faster. As you know, the same cannot be said about the current optimizer.


Future work
-----------

Along the lines of "lower GC overhead during compilation" , the following ideas seem like safe bets:

  (c.1) guaranteed empty-stack on try-entry, without packing the try expression into its own method.
        That should cut down on code size, boxing, and indirect dispatch.
        Moreover improving run time in addition to compile time.
        Early discussion: https://groups.google.com/d/msg/scala-internals/VkEL7wOVQpE/aSiNnF3ym-cJ

  (c.2) lower GC overhead for Set[Symbol]
        https://groups.google.com/d/msg/scala-internals/leI41TCuTdc/MjDQqdJbW7AJ
        https://groups.google.com/d/msg/scala-internals/leI41TCuTdc/AcYEz4AwXSUJ


Summing up
----------

That's a good deal of food for thought for the weekend, isn't it? ;-)



Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/



Reply all
Reply to author
Forward
0 new messages