Clojure, Java JIT, and inlining

Andy Fingerhut

Aug 12, 2009, 3:03:54 PM

to Clojure

My apologies for the noise if this is well known in the Clojure
community, but I'll ask anyway.

One of the tweaks to my Clojure benchmarks that people have suggested
for improving performance, and that does help, is changing some
function definitions to macros. This is in effect inlining those
functions at the source level, so the Clojure compiler has a shot at
it.

Is there some reason that the Java JIT is not doing this, with the
original code using defn, as fast as it works when using defmacro?

Perhaps some JITs do inlining, but cannot do it as well as a defn ->
defmacro change permits?

Is it because of some kind of function call/return overhead that the
JIT cannot eliminate?

Thanks,
Andy

Richard Newman

Aug 12, 2009, 3:59:45 PM

to clo...@googlegroups.com

> Is there some reason that the Java JIT is not doing this, with the
> original code using defn, as fast as it works when using defmacro?

The macro expands into bytecode within the same Java method, rather
than a method invocation. Some method blocks are too big to inline,
and perhaps the JIT doesn't have enough information (or motivation) to
do so. The JIT compiler will only inline certain hotspots where it
predicts a benefit. Switching to a macro forces the issue by avoiding
that runtime analysis altogether.

It's also possible that more type propagation, or elimination of
boxing (all function parameters are boxed when they cross the function
boundary) is involved.

I'm sure others will have more to add...
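
As a rough illustration of the difference described above, here is a minimal sketch. The names square-fn, square-macro and sum-of-squares are made up for this example and do not come from the benchmarks under discussion.

```clojure
;; Function version: a call site compiles to a Var lookup plus IFn.invoke,
;; and whether that call gets inlined is left to the JIT.
(defn square-fn [x]
  (* x x))

;; Macro version: the body is spliced into the caller at compile time,
;; so it ends up as bytecode inside the calling method.
;; The auto-gensym v# makes sure the argument expression is evaluated once.
(defmacro square-macro [x]
  `(let [v# ~x]
     (* v# v#)))

;; After macroexpansion this function contains two multiplications
;; and no calls to a separate squaring function.
(defn sum-of-squares [a b]
  (+ (square-macro a) (square-macro b)))

;; Show what the compiler sees at a call site of the macro.
(println (macroexpand-1 '(square-macro (+ 1 2))))
```

The usual trade-off applies: the macro version cannot be passed to map or reduce as a value.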

Aaron Cohen

Aug 12, 2009, 4:09:20 PM

to clo...@googlegroups.com

I may be wrong, but doesn't a typical function invocation involve
dereferencing the Var holding the object that implements "IFn" and
calling invoke? It seems pretty intuitive to me that this would be
difficult for the JIT to inline, since there is a little bit of
synchronization going on every time a Var is dereferenced.

I think this is why a "let local" variable is faster than def'ing a
*constant* and referencing it. Methods like AtomicInteger.get start
showing up in the profiler when I use *constants* in tight loops at
least.

Hotspot is notoriously difficult for us to intuit about, so take this
all with a grain of salt.
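
A small sketch of the "let local" pattern mentioned above. The names scale, sum-scaled-var and sum-scaled-local are hypothetical, and how much this helps (if at all) depends on the Clojure version and the JVM, so measure before relying on it.

```clojure
;; A top-level Var, standing in for the def'ed constants mentioned above.
(def scale 3)

(defn sum-scaled-var [n]
  ;; Every iteration reads the current value of the Var `scale`.
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (* scale i)))
      acc)))

(defn sum-scaled-local [n]
  ;; Read the Var once, coerce to a primitive long, and use the
  ;; local inside the loop instead.
  (let [s (long scale)]
    (loop [i 0 acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (* s i)))
        acc))))
```

Both functions compute the same sum; only the way the constant is read inside the loop differs.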

Chas Emerick

Aug 12, 2009, 4:14:35 PM

to clo...@googlegroups.com

On Aug 12, 2009, at 3:59 PM, Richard Newman wrote:

>> Is there some reason that the Java JIT is not doing this, with the
>> original code using defn, as fast as it works when using defmacro?
>
> The macro expands into bytecode within the same Java method, rather
> than a method invocation. Some method blocks are too big to inline,
> and perhaps the JIT doesn't have enough information (or motivation) to
> do so.

To emphasize: since many common clojure forms are macros themselves,
the fns you're writing are likely much larger than you think they
are. Simple things like doseq (never mind more complicated stuff like
for) expand into sizable chunks of clojure, which are themselves doing
way more work per LOC than typical Java methods. Thus, I'll bet
typical clojure fns exceed whatever code-size windows the JIT has in
mind for inlining far more often than Java methods.

...even 'and' results in more code than you'd likely expect intuitively:

user=> (use 'clojure.contrib.walk)
nil
user=> (macroexpand-all '(and a b c))
(let*
 [and__3314__auto__ a]
 (if
  and__3314__auto__
  (let*
   [and__3314__auto__ b]
   (if and__3314__auto__ c and__3314__auto__))
  and__3314__auto__))

- Chas
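
One hedged way to put a number on this growth is to count the nodes in a form before and after full macroexpansion. The sketch below assumes a recent Clojure where macroexpand-all lives in clojure.walk (the session above uses clojure.contrib.walk); form-size is a made-up helper, and node count is only a loose proxy for the bytecode size the JIT actually looks at.

```clojure
(require '[clojure.walk :as walk])

(defn form-size
  "Counts every node (collections, symbols, literals) in a form."
  [form]
  (count (tree-seq coll? seq form)))

(defn expansion-report
  "Returns node counts for a quoted form before and after macroexpand-all."
  [form]
  {:source   (form-size form)
   :expanded (form-size (walk/macroexpand-all form))})

;; The symbols a, b, c and xs are never evaluated here, only expanded around.
(println "and:  " (expansion-report '(and a b c)))
(println "doseq:" (expansion-report '(doseq [x xs] (println x))))
```

The exact counts vary between Clojure versions, since the core macros themselves have changed over time.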

Richard Newman

Aug 12, 2009, 4:24:38 PM

to clo...@googlegroups.com

> I may be wrong, but doesn't a typical function invocation involve
> dereferencing the Var holding the object that implements "IFn" and
> calling invoke? It seems pretty intuitive to me that this would be
> difficult for the JIT to inline, since there is a little bit of
> synchronization going on every time a Var is dereferenced.

In principle, the JIT can inline the Var lookup, and do the
appropriate analysis to eliminate much of the work -- Vars have
thread-local bindings, so the JVM should be pretty well aware of access
and scope. Of course, this will only happen if everything is small
enough, frequently used, etc. etc.

I saw a presentation at JavaOne which illustrated to just what extent
the dynamic compiler can eliminate locks, allocations, aliases,
synchronization boundaries, do closed-world analysis of class
hierarchies, and so on. It's pretty impressive. ("Inside Out: A Modern
Virtual Machine Revealed", if you're interested.)

-R

Aaron Cohen

Aug 12, 2009, 4:31:32 PM

to clo...@googlegroups.com

I don't think Vars are thread-local. They're one of the shared
mutable state primitives. They can be de facto thread-local if only
used by a single thread, but you need a "sufficiently smart compiler"
to notice that.

Hotspot definitely is smart enough in some cases, but I think for
Escape Analysis you currently need a black magic command line
parameter. I'm playing around with: "-XX:+DoEscapeAnalysis
-XX:+UseBiasedLocking" with inconsistent results.

Chouser

Aug 12, 2009, 4:41:59 PM

to clo...@googlegroups.com

On Wed, Aug 12, 2009 at 3:03 PM, Andy Fingerhut <andy_fi...@alum.wustl.edu> wrote:
>
> My apologies for the noise if this is well known in the Clojure
> community, but I'll ask anyway.
>
> One of the tweaks to my Clojure benchmarks that people have suggested
> for improving performance, and that does help, is changing some
> function definitions to macros. This is in effect inlining those
> functions at the source level, so the Clojure compiler has a shot at
> it.
>
> Is there some reason that the Java JIT is not doing this, with the
> original code using defn, as fast as it works when using defmacro?

I think inlining via Clojure macro or :inline has the most
benefit when it allows you to avoid boxing arguments and
return values. That is, if you have primitive locals in the
calling function and primitive locals in the called
function, the Java method signatures created by the Clojure
compiler will still be Objects and require boxing and
unboxing for each invocation.

I don't know for sure, but it appears HotSpot doesn't
(usually? ever?) remove that un/boxing when inlining.

Using a Clojure macro or :inline metadata allows the Clojure
compiler to use the same local primitives with no boxing or
unboxing.

--Chouser
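
For readers who have not seen :inline metadata, here is a minimal sketch of the shape it takes. The function add2 is hypothetical; the pattern follows how clojure.core attaches :inline to some of its own arithmetic functions. Whether it removes boxing in a given call site depends on the argument types the compiler can see, so treat this as an illustration rather than a guaranteed speedup.

```clojure
(defn add2
  "Adds two numbers. Direct calls are expanded in place via :inline."
  ;; The :inline fn receives the argument forms (not values) and
  ;; returns the code to splice into the call site.
  {:inline (fn [a b] `(+ ~a ~b))}
  [a b]
  (+ a b))

;; Direct call: the compiler replaces (add2 x y) with (+ x y), so
;; primitive locals in the caller can stay primitive.
(defn sum-to [n]
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (add2 acc i))
      acc)))

;; Higher-order use: add2 is still an ordinary function value here,
;; which is the main advantage over rewriting it as a macro.
(def total (reduce add2 [1 2 3 4]))
```

Unlike a macro, the function body and the :inline expansion are written separately, so it is up to the author to keep them in agreement.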

CuppoJava

Aug 12, 2009, 8:36:20 PM

to Clojure

It is my experience also, that inlining gives the greatest performance
gain for functions that expect primitive arguments.
As Chouser said, doing this eliminates the boxing/unboxing overhead.

Here's my take on this:
The Java method signatures created by Clojure will always be Objects
in order to maintain a consistent interface. Therefore it doesn't make
sense for HotSpot to remove that boxing overhead. HotSpot sees a
"function that takes an Integer". Why should it eliminate the boxing
and treat it like a "function that takes an int"? What if the function
really did want an Integer instead of an int? This isn't a decision the
compiler can make.

Hope that helps (and is correct).
-Patrick

Richard Newman

Aug 12, 2009, 9:16:21 PM

to clo...@googlegroups.com

> I don't think Vars are thread-local. They're one of the shared
> mutable state primitives. They can be de facto thread-local if only
> used by a single thread, but you need a "sufficiently smart compiler"
> to notice that.

"Vars provide a mechanism to refer to a mutable storage location that
can be dynamically rebound (to a new storage location) on a per-thread
basis."

That is, bindings are thread-local. set! modifies only the current
thread's binding.

There is a root binding which is intended to be immutable:

"Currently, it is an error to attempt to set the root binding of a var
using set!, i.e. var assignments are thread-local."

From

<http://clojure.org/vars>

All of this broadly means that the scope of a particular var can be
determined for a given thread's execution. That's not true of a ref,
for example.
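
The per-thread behaviour quoted above can be seen in a short sketch. It assumes a current Clojure, where a Var must be marked ^:dynamic before it can be rebound with binding (that marker was not required in 2009); *level* and read-in-new-thread are names made up for this example.

```clojure
;; Root binding, visible to every thread that has no binding of its own.
(def ^:dynamic *level* :root)

(defn read-in-new-thread
  "Reads *level* from a freshly started thread and returns what it saw."
  []
  (let [result (promise)]
    ;; A plain Thread does not inherit the caller's thread-local bindings.
    (doto (Thread. ^Runnable (fn [] (deliver result *level*)))
      (.start)
      (.join))
    @result))

(def observed
  (binding [*level* :bound]
    ;; set! changes only this thread's binding, never the root.
    (set! *level* :changed)
    {:this-thread  *level*
     :other-thread (read-in-new-thread)}))

(println observed)
(println "after binding:" *level*)
```

Calling set! on *level* outside of a binding form throws, which matches the documentation quoted above.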
