Cliff Click's blog: "Fixing The Inlining Problem"


Alex Cruise

Apr 6, 2011, 3:05:46 AM
to scala-i...@googlegroups.com, iulian dragos
Cliff Click (IIRC, original architect of Hotspot's "server" compiler, now ubernerd at Azul Systems) has blogged something that deserves at least to be read closely: writing hot loops that accept functions* in a way that makes them more likely to be inlined: http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem

I don't know enough about Scala optimizations and specialization to know whether we'll see the 10x performance impact he mentions--if it's the case that more-or-less idiomatic Scala implementations of the examples he posts already do perform as well as the manually specialized, monomorphic HOFs, "we" should comment to that effect on his blog. :)

Otherwise, it would be good to discuss whether the style he suggests can (or should) be implemented in the Scala compiler, and/or whether some specialization trick can be brought to bear.

-0xe1a

* Is plain old "function" the accepted term for instances of Function<n>?  They're not necessarily a "closure" or a "lambda" under the usual definitions of those terms.  They might be eta-expanded methods etc.   

Ismael Juma

Apr 6, 2011, 3:52:23 AM
to scala-i...@googlegroups.com, Alex Cruise, iulian dragos
On Wed, Apr 6, 2011 at 8:05 AM, Alex Cruise <al...@cluonflux.com> wrote:
> Cliff Click (IIRC, original architect of Hotspot's "server" compiler, now
> ubernerd at Azul Systems) has blogged something that deserves at least to be
> read closely: writing hot loops that accept functions* in a way that makes
> them more likely to be
> inlined: http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem

This is a known problem and has been discussed previously here:

http://thread.gmane.org/gmane.comp.lang.scala.internals/865

Separating the hot part of the code into its own method so that it
can be inlined is also a known technique, but it's good to emphasize it for
this scenario with examples. Note that even if you use that style,
you're not guaranteed to get the inlining you want today. Supposedly JVMs will
get better at doing the right thing in the future.
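
For concreteness, a minimal sketch of that style (the names are made up); the idea is that the loop itself lives in a tiny method, small enough that the JIT may inline the whole loop into each caller, at which point the closure at that particular call site is the only remaining target and can be inlined too:

final class IntArrayOps(xs: Array[Int]) {
  // The hot loop isolated in a small method, so the JIT can duplicate it per
  // caller by inlining it (no guarantees, as noted above).
  def foreachInt(f: Int => Unit): Unit = {
    var i = 0
    while (i < xs.length) { f(xs(i)); i += 1 }
  }
}

// A caller that gets its own inlined copy of foreachInt also gets a
// monomorphic call site for its particular closure.
def sumPositives(ops: IntArrayOps): Int = {
  var s = 0
  ops.foreachInt(x => if (x > 0) s += x)
  s
}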

> Otherwise, it would be good to discuss whether the style he suggests can (or
> should) be implemented in the Scala compiler, and/or whether some
> specialization trick can be brought to bear.

I guess something could be done along the same lines as @specialized,
but it seems like a significant amount of work, and something best handled
at the JVM level, really (I guess one could say the same thing about
@specialized).

Best,
Ismael

Tiark Rompf

Apr 6, 2011, 5:40:47 AM
to scala-i...@googlegroups.com, Alex Cruise, iulian dragos
We've known this for a long time but the blog post is still an interesting read. We're
suffering from this problem more than many people might be aware of and I think it
is one key reason why collections performance is not quite what it could be.

The problem is that current JVM heuristics are mostly based on single call sites,
and call-site-based optimizations are fundamentally at odds with higher-order
functions: if you have

def foreach(f: A => Unit): Unit = {
  val iter = this.iterator
  while (iter.hasNext)
    f(iter.next) // A
}

foreach { x => // B
  ...
}

foreach { x => // C
  ...
}


Then inlining (or building a polymorphic inline cache) at call site A will lead nowhere,
but it is quite likely that the JVM will do just that. Even if you inline foreach at B or C,
you are in trouble if you add another layer:

def exists(f: A => Boolean): Boolean = {
  foreach { x =>
    if (f(x)) return true
  }
  false
}

Let's say you have custom logic that inlines all foreach calls; then the problem is just
shifted from foreach to exists. Inlining foreach into exists gets the caller of exists
nowhere.

Scalac's -optimize flag and @inline annotations can help in some cases but unfortunately
only for final methods. The problem here is that scalac has to be pessimistic and
it cannot know whether methods will be overridden by classes loaded at runtime
(open world assumption). Of course we want collections to be extensible.
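
Where it does apply, the pattern looks roughly like this (a minimal sketch with made-up names): the method is final and annotated @inline, so with -optimize the call site below can be expanded by scalac itself.

final class IntBag(elems: Array[Int]) {
  @inline final def foreach(f: Int => Unit): Unit = {
    var i = 0
    while (i < elems.length) { f(elems(i)); i += 1 }
  }
}

// With -optimize, the foreach body (and usually the closure) gets expanded
// here, leaving essentially a bare while loop in the caller.
def count(bag: IntBag): Int = {
  var n = 0
  bag.foreach(_ => n += 1)
  n
}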

As a final note, this problem has also come up in our work on compiled DSLs. There,
life is much easier for two reasons: first, we can do whole-program optimization and
second, we can use staging to effectively implement foreach, exists, etc as macros
that will always be expanded completely (i.e. the generated code will contain just
the while loops at appropriate places).
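
(Not LMS code, just a hand-written sketch of the kind of output one hopes for from that expansion: exists over an Array[Int] with the predicate spliced directly into the loop, so nothing higher-order survives.)

def existsPositive(xs: Array[Int]): Boolean = {
  var i = 0
  while (i < xs.length) {
    if (xs(i) > 0) return true // the predicate, expanded in place
    i += 1
  }
  false
}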

- Tiark

Alex Cruise

Apr 6, 2011, 1:34:08 PM
to scala-i...@googlegroups.com
Pretty much a transcript of some quarter-assed (at best) speculation I indulged in on IRC just now:

Is it sensible to consider using something like Manifests to drive specialization on particular Function<n> implementations?

 e.g. you have a method that takes a Function1[Int,Int], and since in theory there are an infinite variety of such functions, the inner calls to it never get inlined and you suffer from the 10x performance fail.  

But maybe given something-like-a-manifest for the particular concrete class of the function type being passed in, the compiler could generate internally specialized versions of the HOF which use static references to particular, final function types--be they anonymous or not.

It's hard to avoid having to build your own PIC (Polymorphic Inline Cache) at runtime, unless you can use something like a 'sealed' trick to bound the concrete types of functions that are allowed.  The @specialized technique, in which the target types are named, wouldn't work too well for anonymous functions, but maybe an annotation with a tag value? e.g.:

@specializedFunDecl("myTag1") val f1: Int => Int = _ * 2

@specializedFunDecl("myTag1") def f2(in: Int): Int = _ / 2

// It should be a compile-time error to pass any function not tagged "myTag1" 
// Maybe it should also be a compile-time error to write 
// @specializedFunDecl("myTag1") in another compilation unit?
def consume(@specializedFunCall("myTag1") in: Int => Int) = ...

def $$generated$consume$specialized$myTag1$f1(in: <actual class of f1>)
def $$generated$consume$specialized$myTag1$f2(in: <actual class of f2, eta-expanded>)
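
For comparison, the existing @specialized mechanism names its target types up front; a minimal sketch with made-up names:

def applyTwice[@specialized(Int, Double) T](x: T, f: T => T): T = f(f(x))

scalac emits extra Int- and Double-specialized variants of applyTwice, so a call like applyTwice(3, (i: Int) => i + 1) can avoid boxing the Ints.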

Note that a commenter on Cliff's blog post suggests--and Cliff agrees--that trace-based compilation could be a big win for this kind of problem.  In the absence of JVM support though, we'd pretty much have to have a Scala interpreter to make it doable.  With the trend toward virtualization and staging though... :)

-0xe1a

Ruediger Keller

Apr 7, 2011, 4:11:51 AM
to scala-i...@googlegroups.com
As far as I have understood from various blogs etc., Java 7's
invokedynamic / method handles can be used to alleviate some of these
problems. Or am I mistaken? But Scala would need to change its
representation of functions / closures to be able to profit from this.

Btw. I just found this thread, which might be of interest:
http://groups.google.com/group/jvm-languages/browse_thread/thread/72a5f4b16eba3505

Regards,
Rüdiger

2011/4/6 Alex Cruise <al...@cluonflux.com>:

Ruediger Keller

Apr 7, 2011, 6:20:32 AM
to scala-i...@googlegroups.com
Ok great! I just posted the same link mentioned in the post by Cliff
Click. Sorry about that. I should have read the post before writing my
mail.

Regards,
Rüdiger


2011/4/7 Ruediger Keller <ruedige...@rk42.de>:

Ismael Juma

Apr 10, 2011, 7:53:41 AM
to scala-i...@googlegroups.com, Tiark Rompf, Alex Cruise, iulian dragos
On Wed, Apr 6, 2011 at 10:40 AM, Tiark Rompf <tiark...@epfl.ch> wrote:
> As a final note, this problem has also come up in our work on compiled DSLs. There,
> life is much easier for two reasons: first, we can do whole-program optimization

It would be nice to have a whole-program optimization switch for plain
Scala code too. Libraries can't benefit from this, but many of us have
self-contained applications that would.

Since the initial blog post, Remi Forax wrote a prototype that relies
on a Java agent:

Fixing The Inlining “Problem” - A prototype
http://weblogs.java.net/blog/forax/archive/2011/04/08/fixing-inlining-%E2%80%9Cproblem%E2%80%9D-prototype

Best,
Ismael

Lex Spoon

Apr 10, 2011, 11:10:18 AM
to scala-i...@googlegroups.com, Ismael Juma, Tiark Rompf, Alex Cruise, iulian dragos
On Sun, Apr 10, 2011 at 7:53 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> On Wed, Apr 6, 2011 at 10:40 AM, Tiark Rompf <tiark...@epfl.ch> wrote:
>> As a final note, this problem has also come up in our work on compiled DSLs. There,
>> life is much easier for two reasons: first, we can do whole-program optimization
>
> It would be nice to have whole-program optimization switch for plain
> Scala code too. Libraries can't benefit from this, but many of us have
> self-contained applications that would.

Yeah.

I know that the early efforts at whole-program optimization have not
developed into a tool that's in practical use. Has anyone explored
fixing just *some* of the program at compile time? For
optimizing the iteration methods on collections, much should be
possible based only on the assumption that the Scala standard library
is known.

Lex

Miguel Garcia

Apr 27, 2011, 5:10:47 AM
to scala-i...@googlegroups.com, Ismael Juma, Tiark Rompf, Alex Cruise, iulian dragos

I have a suggestion for compiler hackers wanting a "whole-program compilation mode". 

There's no need to modify the compiler. Instead, write a source-to-source (but AST-aware) converter, as shown by jdk2ikvm [1] (which performs an API migration, but the technique itself is applicable to your source-level transformation of choice, all thanks to RangePositions).

I haven't thought much about it, but a "whole-program mode" could involve compiling the standard library along with the main program (thanks to -sourcepath), letting the plugin in question add as many "sealed" and "final" modifiers as it can, and then re-compiling the converted source files (see? with the same compiler).
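
A hypothetical before/after of the rewrite such a converter might do (the names are made up). The library as shipped might contain:

trait Coll[A] { def iterator: Iterator[A] }
class ListColl[A](xs: List[A]) extends Coll[A] { def iterator = xs.iterator }

After the converter has seen the whole program and found no further subtypes, it can close the hierarchy down before the second compilation pass:

sealed trait Coll[A] { def iterator: Iterator[A] }
final class ListColl[A](xs: List[A]) extends Coll[A] { def iterator = xs.iterator }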

After the above transformation has shown its value, a full-fledged "whole-program mode" can be built into the compiler. Or perhaps never, if the approach above is deemed "good enough".


Miguel 


