The macro expands into bytecode within the same Java method, rather
than a method invocation. Some method blocks are too big to inline,
and perhaps the JIT doesn't have enough information (or motivation) to
do so. The JIT compiler will only inline certain hotspots where it
predicts a benefit. Switching to a macro forces the issue by avoiding
that runtime analysis altogether.
It's also possible that more type propagation, or elimination of
boxing (all function parameters are boxed when they cross the function
boundary) is involved.
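To make the difference concrete, here is a minimal sketch. The names square-fn, square-macro and the two sum-squares functions are made up for illustration and are not from the original code. The macro version splices the multiplication directly into the body of the calling loop, while the fn version compiles to an invoke call that the JIT may or may not inline later.

```clojure
;; Hypothetical names, for illustration only.

;; Ordinary function: each call site compiles to a method invocation.
(defn square-fn [x]
  (* x x))

;; Macro: the multiplication is expanded into the caller's own bytecode.
;; The gensym v# ensures the argument expression is evaluated only once.
(defmacro square-macro [x]
  `(let [v# ~x] (* v# v#)))

(defn sum-squares-fn [n]
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (square-fn i)))     ; argument and result cross a fn boundary
      acc)))

(defn sum-squares-macro [n]
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (square-macro i)))  ; expands in place, no invocation
      acc)))

;; Inspect what the macro call turns into (gensym names will vary):
(macroexpand-1 '(square-macro i))
```

Both versions compute the same result; whether the macro is measurably faster depends on the Clojure version, the JVM and how hot the loop is, so it is worth timing both.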
I'm sure others will have more to add...
I may be wrong, but doesn't a typical function invocation involve
dereferencing the Var holding the object that implements "IFn" and
calling invoke? It seems pretty intuitive to me that this would be
difficult for the JIT to inline, since there is a little bit of
synchronization going on every time a Var is dereferenced.
I think this is why a "let local" variable is faster than def'ing a
*constant* and referencing it. At least, methods like
AtomicInteger.get start showing up in the profiler when I use
*constants* in tight loops.
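A minimal sketch of the two styles being compared, assuming a made-up constant named scale (not from the original code). The first function dereferences the Var on every iteration; the second copies the value into a let local once, before the loop starts.

```clojure
;; Hypothetical constant, for illustration only.
(def scale 3)

;; References the Var inside the loop: each iteration goes through the Var.
(defn sum-scaled-var [n]
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (* scale i)))
      acc)))

;; Reads the Var once and keeps the value in a primitive let local.
(defn sum-scaled-local [n]
  (let [k (long scale)]
    (loop [i 0 acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (* k i)))
        acc))))
```

How much this matters is likely to vary with the Clojure version and the JIT, so treat it as something to profile rather than a guaranteed win.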
HotSpot is notoriously difficult to reason about intuitively, so take
all of this with a grain of salt.
>> Is there some reason that the Java JIT is not doing this, with the
>> original code using defn, as fast as it works when using defmacro?
>
> The macro expands into bytecode within the same Java method, rather
> than a method invocation. Some method blocks are too big to inline,
> and perhaps the JIT doesn't have enough information (or motivation) to
> do so.
To emphasize: since many common clojure forms are macros themselves,
the fns you're writing are likely much larger than you think they
are. Simple things like doseq (never mind more complicated stuff like
for) expand into sizable chunks of clojure, which are themselves doing
way more work per line of code than typical Java methods. Thus, I'll bet
typical clojure fns exceed whatever code-size windows the JIT has in
mind for inlining far more often than Java methods.
...even 'and' results in more code than you'd likely expect intuitively:
user=> (use 'clojure.contrib.walk)
nil
user=> (macroexpand-all '(and a b c))
(let*
[and__3314__auto__ a]
(if
and__3314__auto__
(let*
[and__3314__auto__ b]
(if and__3314__auto__ c and__3314__auto__))
and__3314__auto__))
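One rough way to see the same effect for other forms is to count the nodes in a form before and after full expansion. This is only a sketch: form-size is a made-up helper, and it assumes a Clojure version where macroexpand-all lives in clojure.walk rather than clojure.contrib.walk. Node count is a crude proxy for bytecode size, not a measurement of it.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: counts every node (collections and leaves) in a form.
(defn form-size [form]
  (count (tree-seq coll? seq form)))

(def doseq-form '(doseq [x xs] (println x)))

;; Size as written versus size after all macros are expanded.
(form-size doseq-form)
(form-size (walk/macroexpand-all doseq-form))
```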
- Chas
In principle, the JIT can inline the Var lookup, and do the
appropriate analysis to eliminate much of the work -- Vars have
thread-local bindings, so the JVM should be pretty well aware of access and
scope. Of course, this will only happen if everything is small enough,
frequently used, etc. etc.
I saw a presentation at JavaOne which illustrated to just what extent
the dynamic compiler can eliminate locks, allocations, aliases,
synchronization boundaries, do closed-world analysis of class
hierarchies, and so on. It's pretty impressive. ("Inside Out: A Modern
Virtual Machine Revealed", if you're interested.)
-R
I don't think Vars are thread-local. They're one of the shared
mutable state primitives. They can be de facto thread-local if only
used by a single thread, but you need a "sufficiently smart compiler"
to notice that.
HotSpot definitely is smart enough in some cases, but I think for
Escape Analysis you currently need a black magic command line
parameter. I'm playing around with "-XX:+DoEscapeAnalysis
-XX:+UseBiasedLocking", with inconsistent results.
I think inlining via Clojure macro or :inline has the most
benefit when it allows you to avoid boxing arguments and
return values. That is, even if you have primitive locals in
the calling function and primitive locals in the called
function, the Java method signatures created by the Clojure
compiler will still take and return Objects, so every
invocation requires boxing and unboxing.
I don't know for sure, but it appears HotSpot doesn't
(usually? ever?) remove that un/boxing when inlining.
Using a Clojure macro or :inline metadata allows the Clojure
compiler to use the same local primitives with no boxing or
unboxing.
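A minimal sketch of the :inline approach, using a made-up function named square. The :inline metadata is how clojure.core defines some of its own arithmetic functions, but as far as I know it is not a documented public feature, so treat this as an illustration rather than a recommendation.

```clojure
;; Hypothetical function, for illustration only.
;; The :inline fn receives the argument forms and returns the code to splice
;; in at direct call sites; the normal body is still used for higher-order use.
(defn square
  {:inline (fn [x] `(let [v# ~x] (* v# v#)))}
  [x]
  (* x x))

(defn sum-squares [n]
  (loop [i 0 acc 0]
    (if (< i n)
      ;; Direct call: the compiler substitutes the :inline expansion here,
      ;; so i can stay a primitive local.
      (recur (inc i) (+ acc (square i)))
      acc)))

;; Passed as a value, square behaves like any ordinary fn.
(map square [1 2 3])
```

Unlike a macro, the function can still be passed to map and friends, at the cost of maintaining the expansion and the body side by side.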
--Chouser
"Vars provide a mechanism to refer to a mutable storage location that
can be dynamically rebound (to a new storage location) on a per-thread
basis."
That is, bindings are thread-local. set! modifies only the current
thread's binding.
There is also a root binding, which set! cannot change:
"Currently, it is an error to attempt to set the root binding of a var
using set!, i.e. var assignments are thread-local."
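A small sketch of that behaviour, with made-up names *level* and observe. It assumes a Clojure version (1.3 or later) where a var must be marked ^:dynamic before it can be rebound with binding. A plain Thread is used on purpose, since it does not inherit the caller's bindings and therefore sees the root value.

```clojure
;; Hypothetical var, for illustration only.
(def ^:dynamic *level* 1)

(defn observe []
  (binding [*level* 2]
    (set! *level* 3)                       ; changes this thread's binding only
    (let [seen (promise)
          t    (Thread. (fn [] (deliver seen *level*)))]
      (.start t)
      (.join t)
      {:this-thread  *level*               ; the value assigned with set!
       :other-thread @seen})))             ; the root value, untouched

(observe)
;; => {:this-thread 3, :other-thread 1}
```

Once the binding form exits, *level* is back to its root value of 1 on the calling thread as well.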
From
All of this broadly means that the scope of a particular var can be
determined for a given thread's execution. That's not true of a ref,
for example.