Primitive Support

Rich Hickey

Jun 2, 2008, 3:53:47 PM
to Clojure
Dismayed by a recent request to embed Java code in Clojure code
(yuck), I've tried over the past week to address the only area in
which such an endeavor might be reasonable: to attain the arithmetic
performance of the Java primitives, and I'm happy to report much
success in making that possible directly in Clojure.

More details here:

http://clojure.org/news/primitive_support.html

Rich

Raoul Duke

Jun 2, 2008, 4:01:40 PM
to clo...@googlegroups.com
> http://clojure.org/news/primitive_support.html

neat.

Sounds like one must remember to use "-server" mode (to make sure one
is getting the HotSpot VM and its benefits over time) when starting up
the JVM for Clojure. Presumably the Clojure scripts do that now?

sincerely.

Randall R Schulz

Jun 2, 2008, 4:32:06 PM
to clo...@googlegroups.com

The client-mode JVM most assuredly is HotSpot and does basically the
same optimizations, it just has different parameters for when to
JIT-compile and in-line, etc.

Client mode typically gives better start-up time (launch to first Java
code execution) and is more reluctant about deciding to JIT any given
block of bytecodes and / or to in-line any given set of instructions.

I don't think that server mode is generally a good idea. It's really
meant for and only really good for long-running programs (like
servers...).


> sincerely.


Randall Schulz

Raoul Duke

Jun 2, 2008, 4:36:14 PM
to clo...@googlegroups.com
I'm not sure I have ever really understood what exactly -server does
:-) and/or what Clojure needs to get the performance talked about wrt
"numerics" :-).

"JIT Compiler / What's the difference between the -client and -server
systems? / These two systems are different binaries. They are
essentially two different compilers (JITs) interfacing to the same
runtime system. The client system is optimal for applications which
need fast startup times or small footprints, the server system is
optimal for applications where the overall performance is most
important. In general the client system is better suited for
interactive applications such as GUIs. Some of the other differences
include the compilation policy, heap defaults, and inlining policy."

Rich Hickey

Jun 2, 2008, 4:48:15 PM
to Clojure
I certainly didn't intend to dictate the use of -server one way or the
other. There is no single right answer here. I just want people to use
-server when attempting to replicate my results, as that is what I use
and where I most reliably see the effects of HotSpot.

Rich

On Jun 2, 4:36 pm, "Raoul Duke" <rao...@gmail.com> wrote:
> I'm not sure I have ever really understood what exactly -server does
> :-) and/or what Clojure needs to get the performance talked about wrt
> "numerics" :-).
>
> "JIT Compiler / What's the difference between the -client and -server
> systems? / These two systems are different binaries. They are
> essentially two different compilers (JITs) interfacing to the same
> runtime system. The client system is optimal for applications which
> need fast startup times or small footprints, the server system is
> optimal for applications where the overall performance is most
> important. In general the client system is better suited for
> interactive applications such as GUIs. Some of the other differences
> include the compilation policy, heap defaults, and inlining policy."
>
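
For anyone trying to replicate the numbers, one way to confirm which VM a REPL is actually running on is to read the standard `java.vm.name` system property. This is only a minimal sketch: the helper name `vm-name` is made up for illustration, and the exact string returned depends on the JVM vendor and version.

```clojure
;; Hypothetical helper (not part of Clojure): reports the name of the
;; JVM this process is running on. "java.vm.name" is a standard Java
;; system property; on HotSpot the value typically mentions either
;; "Client VM" or "Server VM".
(defn vm-name []
  (System/getProperty "java.vm.name"))

(println (vm-name))
```

If the printed name does not mention the server VM, timings may differ from those reported in the announcement.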

Randall R Schulz

Jun 2, 2008, 4:55:32 PM
to clo...@googlegroups.com
On Monday 02 June 2008 13:36, Raoul Duke wrote:
> I'm not sure I have ever really understood what exactly -server does
> :-) and/or what Clojure needs to get the performance talked about wrt
> "numerics" :-).
>
> "JIT Compiler / What's the difference between the -client and -server
> systems? / These two systems are different binaries. They are
> essentially two different compilers (JITs) interfacing to the same
> runtime system. The client system is optimal for applications which
> need fast startup times or small footprints, the server system is
> optimal for applications where the overall performance is most
> important. In general the client system is better suited for
> interactive applications such as GUIs. Some of the other differences
> include the compilation policy, heap defaults, and inlining policy."

If you have something like GKrellM on your system and configure it to
display CPU utilization graphs (and if you have multiple CPUs or
cores), it can be interesting to run a given single-threaded, CPU-bound
program that runs for at least the better part of a minute using first
the client and then the server VMs. The client will have a much flatter
utilization curve very close to 100% (of a single CPU / core) while the
server mode will show a significantly higher and "spikier" utilization
curve, at least early on.


There's a little more information here:

<http://java.sun.com/products/hotspot/docs/whitepaper/Java_HotSpot_WP_Final_4_30_01.html>

An excerpt:

-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Java HotSpot Client Compiler

The client compiler is tuned for the performance profile of typical
client applications. The Java HotSpot Client Compiler is a simple and
fast two-phased compiler. In the first phase, a platform-independent
front end constructs an intermediate representation (IR) from the
bytecodes. In the second phase, the platform-specific back end
generates machine code from the IR. Emphasis is placed on extracting
and preserving as much information as possible from the bytecode level
(for example, locality information, initial control flow graph), which
directly translates into reduced compilation time. Note that the client
VM does only minimal inlining and no deoptimization.


Java HotSpot Server Compiler

The server compiler is tuned for the performance profile of typical
server applications. The Java HotSpot Server Compiler is a high-end
fully-optimizing compiler. It uses an advanced static single assignment
(SSA)-based IR for optimizations. The optimizer performs all the
classic optimizations, including dead code elimination, loop invariant
hoisting, common subexpression elimination, and constant propagation.
It also features optimizations more specific to Java technology, such
as null-check and range-check elimination. The register allocator is a
global graph coloring allocator and makes full use of large register
sets commonly found in RISC microprocessors. The compiler is highly
portable, relying on a machine description file to describe all aspects
of the target hardware. While the compiler is slow by JIT standards, it
is still much faster than conventional optimizing compilers. And the
improved code quality "pays back" the compile time by reducing
execution times of compiled code. The server compiler performs full
inlining and full deoptimization.
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-


Randall Schulz

squeegee

Jun 2, 2008, 6:17:10 PM
to Clojure
> http://clojure.org/news/primitive_support.html
>
> Rich

Primitive support looks very cool and I look forward to using it.

I see 3 uses of unchecked operations in the clojure/src hierarchy
currently. They're calls to unchecked-inc used in loops. In 2 cases
(amap, areduce), the thing being incremented is an array index and
therefore is known to be between 0 and the max index of a Java array
(2^31-2). The loops are safe and correct despite the inc being
unchecked.

In the other case (dotimes), if the caller requests a count outside
the range 0 to (2^31-1) the repetition will not be executed correctly
due to overflow on increment. Please consider making dotimes more
capable so it works correctly for any count. Preserving its fast
operation for the range of positive integers would be cool.

It might be a useful addition to have something like a "checked-cast"
facility that will, for example, cast a Number to int and throw an
exception if the result is not equal to the Number. One could use
such a checked-cast outside of a loop to be certain (modulo some
programmer analysis) that unchecked operations inside the loop are
guaranteed safe.

--Steve
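
The overflow concern and the proposed "checked-cast" can be illustrated with a short sketch. It assumes a current Clojure release, where `long` is the default integer type and `unchecked-int` is available; the 2008 code under discussion worked with `int`. The name `checked-int` is hypothetical and is not something the thread or Clojure defines.

```clojure
;; An unchecked increment wraps around silently at the top of the range.
(def wrapped (unchecked-inc Long/MAX_VALUE))

;; The checked inc throws ArithmeticException instead of wrapping.
(def checked-inc-result
  (try
    (inc Long/MAX_VALUE)
    (catch ArithmeticException _ :overflow)))

;; Hypothetical "checked-cast": truncate to int, then verify that the
;; truncated value still equals the original number.
(defn checked-int [n]
  (let [i (unchecked-int n)]
    (if (== i n)
      i
      (throw (IllegalArgumentException.
              (str "Value out of range for int: " n))))))
```

Calling such a function once before a loop is what would let the unchecked operations inside the loop be trusted, as the message suggests.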

Albert Cardona

Jun 2, 2008, 6:56:35 PM
to clo...@googlegroups.com

> Dismayed by a recent request to embed Java code in Clojure code
> (yuck), I've tried over the past week to address the only area in
> which such an endeavor might be reasonable: to attain the arithmetic
> performance of the Java primitives, and I'm happy to report much
> success in making that possible directly in Clojure.


I am very impressed Rich. Thank you very much.

Indeed, my only need for speed (hence the request for embedding Java code) is for number crunching.

One question though: you mention #^float but not #^double. Is this intentional? Is either of the two types considered preferable?

Albert

-- 
Albert Cardona
http://www.mcdb.ucla.edu/Research/Hartenstein/acardona

Albert Cardona

Jun 2, 2008, 6:59:31 PM
to clo...@googlegroups.com
> One question though: you mention #^float but not #^double. Is this intentional? Is either of the two types considered preferable?

Never mind, I'm blind today. I see now that double is fully supported.

Rich Hickey

Jun 2, 2008, 7:01:13 PM
to Clojure


On Jun 2, 6:56 pm, "Albert Cardona" <sapri...@gmail.com> wrote:
> > Dismayed by a recent request to embed Java code in Clojure code
> > (yuck), I've tried over the past week to address the only area in
> > which such an endeavor might be reasonable: to attain the arithmetic
> > performance of the Java primitives, and I'm happy to report much
> > success in making that possible directly in Clojure.
>
> I am very impressed Rich. Thank you very much.
>
> Indeed, my only need for speed (hence the request for embedding Java code)
> is for number crunching.
>
> One question though: you mention #^float but not #^double. Is this
> intentional? Is either of the two types considered preferable?
>

Everything should work for int/long/float/double and arrays of same.

float vs. double is an application-domain question; precision and
memory use are both factors. But there is neither preference nor bias
as far as Clojure is concerned.

Rich

Rich Hickey

Jun 2, 2008, 7:09:42 PM
to Clojure


On Jun 2, 6:17 pm, squeegee <scgila...@gmail.com> wrote:
> >http://clojure.org/news/primitive_support.html
>
> > Rich
>
> Primitive support looks very cool and I look forward to using it.
>
> I see 3 uses of unchecked operations in the clojure/src hierarchy
> currently. They're calls to unchecked-inc used in loops. In 2 cases
> (amap, areduce), the thing being incremented is an array index and
> therefore is known to be between 0 and the max index of a Java array
> (2^31-2). The loops are safe and correct despite the inc being
> unchecked.
>
> In the other case (dotimes), if the caller requests a count outside
> the range 0 to (2^31-1) the repetition will not be executed correctly
> due to overflow on increment. Please consider making dotimes more
> capable so it works correctly for any count. Preserving its fast
> operation for the range of positive integers would be cool.
>

I still haven't decided if there will be multiple variants, e.g.
dotimes-long or if I'll add complexity to dotimes, but in the end
there will be support for speed (the version now is substantially
faster than before) and range. I wonder how often dotimes is actually
called with a range > Integer.MAX_VALUE, or Long.MAX_VALUE.

> It might be a useful addition to have something like a "checked-cast"
> facility that will, for example, cast a Number to int and throw an
> exception if the result is not equal to the Number. One could use
> such a checked-cast outside of a loop to be certain (modulo some
> programmer analysis) that unchecked operations inside the loop are
> guaranteed safe.
>

I'm thinking of adding this right into the int coercion function,
since when used for speed that should never be inside the loop, i.e.
the preferred idiom is:

(let [x (int blah)]
  ... fast math with x)
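
A concrete instance of that idiom might look like the following sketch. It uses `long` where the post uses `int`, on the assumption that it is run on a current Clojure release in which `long` is the default integer primitive; the function name `sum-below` is invented for the example.

```clojure
;; Hypothetical example: sum the integers 0 .. n-1 with primitive math.
(defn sum-below [n]
  (let [n (long n)]              ; coerce once, outside the loop
    (loop [i 0 acc 0]            ; integer literals start the locals as longs
      (if (< i n)
        (recur (inc i) (+ acc i))
        acc))))

(println (sum-below 10))         ; prints 45
```

The point of the shape is that the coercion cost (and any range check it performs) is paid a single time, while the loop body works only with primitives.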

Stay tuned - these tweaks will get in there, still catching my breath
from this one - it's a big change.

You might also have noted that primmath is gone and with it the
predefined array functions. I'm still not sure what I want to include,
now that everyone has the tools to build their own. Also, I think
there might be some powerful formula-building macros possible on top
of this stuff.

Rich

Albert Cardona

Jun 4, 2008, 11:54:27 PM
to clo...@googlegroups.com
Rich,

Primitive support is awesome:

>>> (load-file "/home/albert/Programming/fiji/plugins/Examples/embeded_java_compiler.clj")

In Clojure lists:

"Elapsed time: 60.273874 msecs"
"Elapsed time: 18.692337 msecs"
"Elapsed time: 12.910268 msecs"
"Elapsed time: 12.157825 msecs"
"Elapsed time: 11.790928 msecs"
nil

In janino java-compiled code:

"Elapsed time: 5.542431 msecs"
"Elapsed time: 8.884038 msecs"
"Elapsed time: 1.356255 msecs"
"Elapsed time: 1.276561 msecs"
"Elapsed time: 1.247018 msecs"
nil

In Clojure with primitives:

"Elapsed time: 21.794794 msecs"
"Elapsed time: 4.948466 msecs"
"Elapsed time: 1.01541 msecs"
"Elapsed time: 0.820752 msecs"
"Elapsed time: 0.792814 msecs"
nil

Code attached, depends on janino.jar

Albert


; Janino wrapper code from Fred Nicolier, as shared in the Clojure
; googlegroups mailing list
; 2008-05-27
;
; Depends on janino.jar being present in the classpath

(import '(org.codehaus.janino ExpressionEvaluator
                              ClassBodyEvaluator
                              Scanner))

; Create a java class and single static method on the fly,
; with the given arguments and body
(defn jfun
  [args body]
  (let [cl (new ClassBodyEvaluator)]
    (.cook cl (str "static public Object f(" args ") {"
                   body
                   "}"))
    (let [meth (first (.. cl
                          getClazz
                          getDeclaredMethods))]
      (fn [& args]
        (.invoke meth nil (into-array args))))))

; A benchmark: test the sum of a list of numbers
; in both pure Clojure and janino-compiled java bytecode
(defn bench []
  ; 1 - Pure Clojure
  (defn sum1 [a] (reduce + a))
  ; 2 - Janino-compiled java code
  (def sum2 (jfun "double[] a"
                  (str "double s = 0.0;"
                       "int N = a.length;"
                       "for (int k = 0; k<N; k++) {"
                       " s = s + a[k];"
                       "}"
                       "return s;")))
  ; 3 - Clojure with primitives declared
  (defn sum3 [#^doubles a]
    (areduce a i ret (double 0)
      (+ ret (aget a i))))

  ; Time each variant 5 times on a 500000-element double array
  (let [s (double-array 500000)
        end 5]
    (prn (dotimes x end (time (sum1 s))))
    (prn)
    (prn (dotimes x end (time (sum2 s))))
    (prn)
    (prn (dotimes x end (time (sum3 s))))))
(bench)
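
The attached code uses the syntax of the 2008 snapshot. For readers trying the primitive variant on a current Clojure release, the sketch below shows the same reduction with the newer `^doubles` hint and the binding-vector form of `dotimes`. The name `sum-doubles` and the array contents are illustrative only, and no timings are implied.

```clojure
;; Same reduction as sum3 in the attachment, written with the newer
;; ^ type-hint syntax. The hint lets aget compile to a primitive array
;; access instead of a reflective call.
(defn sum-doubles [^doubles a]
  (areduce a i ret 0.0
    (+ ret (aget a i))))

;; Illustrative data: 500000 elements, each 1.0.
(let [s (double-array 500000 1.0)]
  ;; dotimes takes a binding vector in current Clojure
  (dotimes [_ 5]
    (time (sum-doubles s))))
```

As in the original timings, the first iterations would be expected to run slower than later ones while the JIT warms up.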
