when performance matters


Mark P

Jan 12, 2009, 12:41:54 AM
to Clojure
I have recently found out about Clojure and am
rather impressed. I am seriously considering
whether Clojure is a viable language for use at
work. The main stumbling block would be if
performance (both speed and memory) turns out
to be insufficient. I currently use C++, but I'd love
to be able to leave C++ behind and use Clojure
(or similar) instead.

The programs I write perform applied mathematical
optimization (using mainly integer arithmetic)
and often take hours (occasionally even days)
to run. So even small percentage improvements
in execution speed can make a significant
practical difference. And large problems can use
a large amount of memory - so memory efficiency
is also a concern.

Given these performance considerations, at first
glance Clojure does not seem like a good choice.
But I don't want to give up on the idea just yet.
The allure of modernized lisp-style programming
is really tempting.

There are three key factors that still give me
hope:

1. Some of the algorithms I use have the potential
to be parallelized. I am hoping that as the number
of cores in PCs increases, at some point Clojure's
performance will beat C++'s due to Clojure's
superior utilization of multiple cores. (Any ideas
on how many cores are needed for this to become
true?)

2. The JVM is continually being improved. Hopefully
in a year or two, the performance of HotSpot will be
closer to that of C++. (But maybe this is just
wishful thinking.)

3. Maybe I can implement certain performance critical
components in C++ via the JNI. (But I get the impression
that JNI itself isn't particularly efficient. Also, the more
I pull over into the C++ side, the fewer advantages to
using Clojure.)

If all else fails, maybe I could use Clojure as a prototyping
language. Then when I get it right, I code up the actual
programs in C++. But probably a lot would get lost in
the translation from Clojure -> C++ so would it be worth
it?

I'd love to be convinced that Clojure is a viable choice,
but I need to be a realist too. So what do people think?
How realistic are my three "hopes"? And are there
any other performance enhancing possibilities that I
have not taken into account?

Thanks,

Mark P.

e

Jan 12, 2009, 1:22:10 AM
to clo...@googlegroups.com
I can't speak for clojure, so I'm interested in seeing how people who can will answer.

There's so much to consider.  I've heard Haskell is getting faster and has (or will have) parallel programming behind the scenes (automatically doing independent parts of operations).  There are other fast functional languages, like OCaml and Clean, but I don't know about their concurrency.

As for C++, can you afford a Cray?  They have pragmas for making parallelism easy, and a special CPU that can do 128 operations (one for each of 128 simultaneous streams of input) in a single cycle . . . which I think is really cool!  If THAT becomes mainstream, I'm not sure one could argue that C++ will be left in the dust.  Now, if Cray stays super duper expensive, then it's not really part of the discussion.

Konrad Hinsen

Jan 12, 2009, 3:41:20 AM
to clo...@googlegroups.com
On 12.01.2009, at 06:41, Mark P wrote:

> 1. Some of the algorithms I use have the potential
> to be parallelized. I am hoping that as the number
> of cores in PCs increase, at some point Clojure's
> performance will beat C++'s due to Clojure's
> superior utilization of multiple cores. (Any ideas
> on how many cores are needed for this to become
> true?)

It all depends on your algorithms and your code. Clojure has lots of
support (data structures and algorithms) for concurrent programming,
but you have to choose and combine them yourself to get efficient
parallelization.
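To make that concrete, here is a minimal sketch (the function name is illustrative) of the kind of explicit choice being described: parallelizing a pure function over a collection with pmap, which only pays off when each call is expensive relative to the threading overhead.

```clojure
;; A deliberately expensive pure function.
(defn expensive-step [n]
  (reduce + (range n)))

(def inputs [100000 200000 300000])

;; Same result either way, because expensive-step is pure;
;; pmap farms the calls out to a thread pool.
(def sequential (doall (map expensive-step inputs)))
(def parallel   (doall (pmap expensive-step inputs)))
```

Nothing here is automatic: the programmer decides what is pure, what to split, and at what granularity.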

> 2. The JVM is continually being improved. Hopefully
> in a year or two, the performance of HotSpot will be
> closer to that of C++. (But maybe this is just
> wishful thinking.)

Again this depends a lot on what you are doing. Hotspot is already
doing an impressive job on some applications, but remains bad at
others. Optimization of a JIT compiler is quite different from
optimization by hand: the compiler knows nothing about your algorithm
at large that could help it to speed up execution, but on the other
hand it knows where optimization is really useful and it can create
specialized code versions that no sane person would ever attempt to
write by hand.

> 3. Maybe I can implement certain performance critical
> components in C++ via the JNI. (But I get the impression
> that JNI itself isn't particularly efficient. Also, the more
> I pull over into the C++ side, the fewer advantages to
> using Clojure.)

I also wish there were a better interface between the Java world and
the C world than JNI. Perhaps there is? I'd happily trade some safety
for performance.

One source of hope for better interfaces is new virtual machines. One
candidate is VMKit (http://vmkit.llvm.org/), an implementation of the
JVM (and .Net as well) on top of LLVM. Combine this with the gcc
version that produces LLVM code, and it should be possible to get
Java-C interoperability with much less friction than through the JNI.
On the other hand, it will be a long time before VMKit matches the
performance of HotSpot.

> If all else fails, maybe I could use Clojure as a prototyping
> language. Then when I get it right, I code up the actual
> programs in C++. But probably a lot would get lost in
> the translation from Clojure -> C++ so would it be worth
> it?

You could also consider writing Clojure code that generates C++ code.
Or C, or why not directly LLVM.

> I'd love to be convinced that Clojure is a viable choice,
> but I need to be a realist too. So what do people think?

I am in a very similar situation if I ever want to use Clojure in my
professional environment. For the moment I have decided that Clojure
is fun enough to explore even if I never will use it professionally,
so it's not an immediate concern. Time will tell what can be done
realistically.

Konrad.

Christian Vest Hansen

Jan 12, 2009, 4:42:35 AM
to clo...@googlegroups.com
On Mon, Jan 12, 2009 at 6:41 AM, Mark P <pier...@gmail.com> wrote:
> 1. Some of the algorithms I use have the potential
> to be parallelized. I am hoping that as the number
> of cores in PCs increase, at some point Clojure's
> performance will beat C++'s due to Clojure's
> superior utilization of multiple cores. (Any ideas
> on how many cores are needed for this to become
> true?)

Assuming the right algorithm, single-threaded C++ and no reflection or
boxing overhead in Clojure, then I think 2 will do.
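For reference, the reflection overhead mentioned here is exactly what type hints remove; a minimal sketch, using the `#^` hint syntax current at the time:

```clojure
;; Without a hint, .length is resolved by reflection on every call:
(defn len-slow [s] (.length s))

;; With a hint, the compiler emits a direct virtual call instead
;; (#^ is the 2009-era hint syntax; later Clojure also allows ^):
(defn len-fast [#^String s] (.length s))
```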

>
> 2. The JVM is continually being improved. Hopefully
> in a year or two, the performance of HotSpot will be
> closer to that of C++. (But maybe this is just
> wishful thinking.)

We're already there. As Konrad said, they optimize differently so
objective testing is hard. I think that Java is behind on optimizing
cache locality because the structure of the typical Java program makes
this a harder problem, that's what I've heard anyway.

Sun has some wiki pages on HotSpot-specific optimization techniques,
which might come in handy when you want to squeeze out those last 5% of
performance.

>
> 3. Maybe I can implement certain performance critical
> components in C++ via the JNI. (But I get the impression
> that JNI itself isn't particularly efficient. Also, the more
> I pull over into the C++ side, the fewer advantages to
> using Clojure.)

JNA might be an alternative to JNI: https://jna.dev.java.net/
But I wouldn't know; haven't used either.

But instead of interfacing with C++, you might find that simply
interfacing with Java will provide enough of a boost in those
super-critical sections. Unless, that is, your C++ is using dirty
tricks like CUDA and what-ever-the-AMD-version-is-called. :)
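As an illustration of how little ceremony "interfacing with Java" takes from Clojure, here is a sketch (the function name is made up) of a hot loop over a primitive Java int array:

```clojure
;; aget/alength on a hinted int array avoid boxing; the loop is a
;; plain indexed traversal over a Java int[].
(defn sum-ints [#^ints xs]
  (let [n (alength xs)]
    (loop [i 0, acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (aget xs i)))
        acc))))
```

`(sum-ints (int-array [1 2 3 4]))` returns 10; a genuinely critical kernel could equally be a static method in a small Java class, called with the same dot syntax.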

>
> If all else fails, maybe I could use Clojure as a prototyping
> language. Then when I get it right, I code up the actual
> programs in C++. But probably a lot would get lost in
> the translation from Clojure -> C++ so would it be worth
> it?
>
> I'd love to be convinced that Clojure is a viable choice,
> but I need to be a realist too. So what do people think?
> How realistic are my three "hopes"? And are there
> any other performance enhancing possibilities that I
> have not taken into account?
>
> Thanks,
>
> Mark P.
>



--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

bOR_

Jan 12, 2009, 7:23:52 AM
to Clojure
This might help: java libraries for fast computing. I guess they will
be usable from within clojure as well.

http://acs.lbl.gov/~hoschek/colt/

Welcome to the Colt Project. Colt provides a set of Open Source
Libraries for High Performance Scientific and Technical Computing in
Java.

Scientific and technical computing, as, for example, carried out at
CERN, is characterized by demanding problem sizes and a need for high
performance at reasonably small memory footprint. There is a
perception by many that the Java language is unsuited for such work.
However, recent trends in its evolution suggest that it may soon be a
major player in performance sensitive scientific and technical
computing. For example, IBM Watson's Ninja project showed that Java
can indeed perform BLAS matrix computations up to 90% as fast as
optimized Fortran. The Java Grande Forum Numerics Working Group
provides a focal point for information on numerical computing in Java.
With the performance gap steadily closing, Java has recently found
increased adoption in the field. The reasons include ease of use,
cross-platform nature, built-in support for multi-threading, network
friendly APIs and a healthy pool of available developers. Still, these
efforts are to a significant degree hindered by the lack of foundation
toolkits broadly available and conveniently accessible in C and
Fortran.

I ran into these when fiddling with JUNG, as I'll be drawing networks
soon.

Another speedup I recently integrated into a clojure
simulation is the brics automaton regular expression engine, which is
faster than the java regex engine.

Good luck!

Stuart Sierra

Jan 12, 2009, 10:21:52 AM
to Clojure
Okay, I'm one of Clojure's biggest fans, and probably one of a very
few using it regularly at work. But let me attempt to inject some
reality into the discussion. The big advantages of Clojure are
succinctness and expressiveness within a Java environment. If you have
highly-optimized, custom-designed numerical algorithms written in a
low-level language like C++, you will never be able to write a version
that is equally fast in a dynamic, virtual-machine language.
Compilers are smart, but they are not that smart. Relying on
constantly-improving compilers is a dangerous wager. Parallelism may
help, but adapting non-parallel algorithms to parallel computation is
fiendishly difficult, and the performance aspects of parallel
computing are not yet well understood.

That being said, experience is the only reliable guide. If you want to
know if Clojure will work for your job, the only thing to do is try
it.

-Stuart Sierra

Mark P

Jan 12, 2009, 11:35:53 PM
to Clojure
> It all depends on your algorithms and your code. Clojure has lots of  
> support (data structures and algorithms) for concurrent programming,  
> but you have to choose and combine them yourself to get efficient  
> parallelization.

I know that there are other functional approaches where
the compiler automatically finds ways to parallelize. Is there
much scope for this in Clojure? Or is it all pretty much
manually added concurrency?

And this leads to another question I have about Lisp-style
languages in general. I get the impression that in Lisp
much of the "compiler" is Lisp code itself. Ie, layer and
layer of macros build up towards the language that then
actually gets "used". Does this layer upon layer of macros
lead to inefficiencies in itself? Or is the opposite true?

> Again this depends a lot on what you are doing. Hotspot is already  
> doing an impressive job on some applications, but remains bad at  
> others.

That seems to match what I've read. On average I get the
impression that it is about 2 times as slow as efficient C++ code
and uses maybe 10 times as much memory.

What I'm wondering is whether there is still quite a bit of
"fat to be trimmed" with HotSpot, or whether it's approaching
the limits of what is achievable.

> One source of hope for better interfaces is new virtual machines. One  
> candidate is VMKit (http://vmkit.llvm.org/), an implementation of the  
> JVM (and .Net as well) on top of LLVM. Combine this with the gcc  
> version that produces LLVM code, and it should be possible to get  
> Java-C interoperability with much less friction than through the JNI.  
> On the other hand, it will be a long time before VMKit matches the  
> performance of HotSpot.

Thanks for the info! This does sound like a promising
approach. For the future though.

> You could also consider writing Clojure code that generates C++ code.  
> Or C, or why not directly LLVM.

Thanks. Worth considering. Though I don't really know
whether this is a practical option or not. Maybe it is.


Thanks Konrad for your thoughts. Most useful.

Cheers,

Mark P.

Mark P

Jan 13, 2009, 12:40:55 AM
to Clojure
> JNA might be an alternative to JNI: https://jna.dev.java.net/
> But I wouldn't know; haven't used either.

Thanks for this info. It sounds like JNA may be worth
looking into.

Cheers,

Mark.

Mark P

Jan 13, 2009, 1:28:29 AM
to Clojure
On Jan 13, 1:21 am, Stuart Sierra <the.stuart.sie...@gmail.com> wrote:
> If you have
> highly-optimized, custom-designed numerical algorithms written in a
> low-level language like C++, you will never be able to write a version
> that is equally fast in a dynamic, virtual-machine language.

I think you are essentially right (although some optimizations
can only be performed with runtime information). But highly-optimized
low-level code is very rigid. It is difficult to maintain it and
adapt it
to changing program needs and new technology (with different
optimization characteristics).

So I am willing to wear a certain loss of performance in exchange
for a more agile program, but not too much. I think I'd be willing
to wear a 20% loss in performance (perhaps even 50%), but not
orders of magnitude more than this. And my gut feeling is that
20% or so should be achievable for a lisp-style language. But the
Clojure+JVM solution sounds like it is still a way off from this.

> Compilers are smart, but they are not that smart.

I am a relative newcomer to lisp, but my impression is that
lisp should be thought of as both low level and high level
all in one. Ie, that the leverage from lower levels to higher
ones happens primarily via macros... which the programmer
has direct access to. So in theory a developer could use
macros to perform compiler-style optimizations within
lisp itself, except these optimizations could be tailored
to the particular application. So as long as the base level
lisp was implemented efficiently, the whole top-level
lisp program could be efficient. But from what I can tell,
even the best lisp compilers produce slow code. Even
SBCL lisp seems to be 3 to 4 times slower than C++.
So I must be misunderstanding something. I can't see
why lisps should be inherently so slow.

> That being said, experience is the only reliable guide. If you want to
> know if Clojure will work for your job, the only thing to do is try
> it.

Yes, I think some experimental coding might be in order. But
I'd like to get as much a grip on the performance landscape as
I can beforehand.

Thanks for tempering any over-enthusiasm and giving me your
best honest assessment.

Cheers,

Mark P.

e

Jan 13, 2009, 1:46:49 AM
to clo...@googlegroups.com
There's a study someone did that compared C++, Java, and Lisp implementations of various problems.  It could be urban legend, but how I remember it:

The best C++ implementation was the fastest for each problem.
But the average lisp implementation was faster than the average Java or C++ implementation.
The variance in performance was smallest among the lisp implementations -- it being a higher-level language.
The time to deliver was least on average with lisp and with least variance.
The code size and variance thereof was least in lisp.
The correctness was greatest in lisp.

Moral of the story?  Either I remember this wrong or you have to be an extremely proficient programmer to get the bang for the buck out of C++.  You gotta know about functors with inline constructors . . . buy lots of books, go to lots of conferences . . . .etc.

Of course, another conclusion that makes sense to me is that the average lisp programmer is someone who's been coding for a while.  Statistics can mislead.  What were the sample sets like?  Did the same people code for each language, etc., etc.  Also, were it not for the terrible syntax for STL-type stuff (and the terrible, mile-long errors), and the terrible compile times, can anyone really say that C++ isn't a powerful language?  Have you seen all the crazy stuff in the boost libraries?  People extend them occasionally with parallel implementations, too.

Zak Wilson

Jan 13, 2009, 2:44:22 AM
to Clojure
Lisps are not inherently slow. SBCL, Clojure and several Schemes are
all much faster than most popular high-level languages (e.g. Python,
Perl, Ruby...) while being at least as high-level.

Optimized code in SBCL may only be marginally slower than C when type
declarations are used properly. The code is usually littered with type
declarations and just as unsafe as C (things like integer overflows
become possible, while Lisp would ordinarily promote an int to a
bigint automatically). The nice thing is that you usually only have to
do this to the inner loops. All the glue code can be written in
potentially slow, but highly abstract idiomatic Lisp. This is similar
to a common idiom in the C world, where a scripting language like Lua
is embedded to make writing the parts that aren't speed-critical
easier.

Clojure, at best should be about as fast as Java, if you use type
hints and unchecked math. A good Common Lisp will probably be slightly
faster. Stalin, a highly-optimized subset of Scheme may be even
faster.
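A sketch of the "type hints and unchecked math" combination mentioned above, on a toy integer loop (the function name is illustrative):

```clojure
;; (int ...) coerces the loop locals to primitives, and the
;; unchecked-* operations compile to raw JVM integer arithmetic,
;; skipping the overflow checks Clojure performs by default.
(defn sum-below [n]
  (let [n (int n)]
    (loop [i (int 0), acc (int 0)]
      (if (< i n)
        (recur (unchecked-inc i) (unchecked-add acc i))
        acc))))
```

`(sum-below 10)` returns 45. The trade-off is the same as in the SBCL case: such code silently wraps on overflow, just like C.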

There's no magic number of cores where Clojure becomes faster than
C++. If your C++ code is parallel and well-written, it will probably
always be faster than Clojure. If your current code is single-
threaded, but parallelizable and you have an 8-core machine, you might
get faster results merely by using one of the operations from
clojure.parallel.

Zak Wilson

Jan 13, 2009, 2:49:09 AM
to Clojure
You're probably thinking of this: http://www.flownet.com/gat/papers/lisp-java.pdf

There's also the (in)famous language benchmark site: http://shootout.alioth.debian.org/

Konrad Hinsen

Jan 13, 2009, 3:33:56 AM
to clo...@googlegroups.com
On 13.01.2009, at 05:35, Mark P wrote:

> I know that there are other functional approaches where
> the compiler automatically finds ways to parallelize. Is there
> much scope for this in Clojure? Or is it all pretty much
> manually added concurrency?

I am not aware of any auto-parallelizing compiler that is actually
useful in practice. All usable parallelization schemes rely at least
on hints from the programmer, and most leave the whole specification
of parallelism to the programmer.

One might say that data-parallel systems are auto-parallelizing, but
they aren't really: the programmer has to define the data that is
being distributed, which counts for me as a parallelization hint.

Clojure does nothing automatically; it just provides an infrastructure for
pure functional programming that makes concurrency and parallel
computing easier to implement than in most other languages.

> And this leads to another question I have about Lisp-style
> languages in general. I get the impression that in Lisp
> much of the "compiler" is Lisp code itself. Ie, layer and
> layer of macros build up towards the language that then
> actually gets "used". Does this layer upon layer of macros
> lead to inefficiencies in itself? Or is the opposite true?

Macros are handled at compile time, so they add no run-time overhead.
There remains the question of how well the compiler can optimize the
expressions resulting from macro expansion. But the same question
applies to any other compiler, as most of them first transform the
input language into some simple intermediate language and then
optimize at that level. The difference with Lisp is just that the
simple intermediate language is a subset of the full language.
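This is easy to see directly: macroexpand shows the code the compiler actually receives, and the macro itself leaves no trace at run time. A minimal sketch (the `unless` macro here is illustrative, not a Clojure built-in):

```clojure
;; A small macro: rewrites (unless test body...) into an if form.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

;; The expansion is ordinary code -- by run time the macro is gone.
(macroexpand '(unless false 42))
;; => (if false nil (do 42))
```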

> That seems to match what I've read. On average I get the
> impression that it is about 2 times as slow as efficient C++ code
> and uses maybe 10 times as much memory.

I wouldn't trust any "average"!

> What I'm wondering is whether there is still quite a bit of
> "fat to be trimmed" with HotSpot, or whether it's approaching
> the limits of what is achievable.

JIT technology being young and in rapid development, it would be
risky to do any prediction there.

>> You could also consider writing Clojure code that generates C++ code.
>> Or C, or why not directly LLVM.
>
> Thanks. Worth considering. Though I don't really know
> whether this is a practical option or not. Maybe it is.

There are efficient real-life libraries that use this approach. The
best known is probably FFTW, which consists of optimized C code
generated by functional OCaml code. I have no doubt that Clojure
could take the role of OCaml there. Of course, writing such an
optimizing library requires a significant effort.

Konrad.

Mark P

Jan 13, 2009, 6:30:33 AM
to Clojure
> I am not aware of any auto-parallelizing compiler that is actually  
> useful in practice.

Good to know - thanks.

> > And this leads to another question I have about Lisp-style
> > languages in general.  I get the impression that in Lisp
> > much of the "compiler" is Lisp code itself.  Ie, layer and
> > layer of macros build up towards the language that then
> > actually gets "used".  Does this layer upon layer of macros
> > lead to inefficiencies in itself?  Or is the opposite true?
>
> Macros are handled at compile time, so they add no run-time overhead.

But isn't "compile time" sometimes _at_ runtime? Ie, if
a macro is dependent on runtime information, then isn't
it passed to "the compiler" which is included as part of
the runtime executable, turned into bytecode, and then
executed itself. That's how I thought it worked anyway.
In which case, compiler efficiency is of importance.

> There are efficient real-life libraries that use this approach. The  
> best known is probably FFTW, which consists of optimized C code  
> generated by functional OCaml code. I have no doubt that Clojure  
> could take the role of OCaml there. Of course, writing such an  
> optimizing library requires a significant effort.

Thanks for the example. It's good to know what is
possible.

Cheers,

Mark P.

Konrad Hinsen

Jan 13, 2009, 7:07:47 AM
to clo...@googlegroups.com
On Jan 13, 2009, at 12:30, Mark P wrote:

>> Macros are handled at compile time, so they add no run-time overhead.
>
> But isn't "compile time" sometimes _at_ runtime? Ie, if
> a macro is dependent on runtime information, then isn't
> it passed to "the compiler" which is included as part of
> the runtime executable, turned into bytecode, and then
> executed itself. That's how I thought it worked anyway.

A macro cannot depend on runtime information. A macro is a function
that is called at compile time, its argument is an expression (as
written by the programmer, or as returned by another macro), and its
result is a modified expression. There is no way a macro could access
runtime information. It is a program that works on program code, not
on runtime data.
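A two-line sketch makes the point: the macro's argument is the unevaluated expression itself, never its value (the macro name here is made up):

```clojure
;; show-arg returns its argument as a quoted form -- the macro sees
;; the expression (+ 1 2) as data, not the runtime value 3.
(defmacro show-arg [x]
  (list 'quote x))

(show-arg (+ 1 2))
;; => (+ 1 2)
```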

Konrad.


Mark P

Jan 13, 2009, 7:30:13 AM
to Clojure
On Jan 13, 5:49 pm, Zak Wilson <zak.wil...@gmail.com> wrote:
You're probably thinking of this: http://www.flownet.com/gat/papers/lisp-java.pdf

Thanks for the link.

> There's also the (in)famous language benchmark
> site: http://shootout.alioth.debian.org/

This is primarily what I was going on. I realize no
benchmarking approach is going to be perfect, but
this attempt doesn't seem too bad. Is there any
reason to be particularly sceptical about results
found here?

Oh, and I realize my comments about SBCL lisp being
3 to 4 times slower than C++ only apply for the quad
core machines. The single core machines have SBCL
as 2.2 or 2.3 times slower than C++. Perhaps this
is a truer representation of SBCL's "raw capabilities".

But a 120% performance hit is still quite a whack.

Cheers,

Mark P.

Eric Lavigne

Jan 13, 2009, 7:41:58 AM
to clo...@googlegroups.com

> There's also the (in)famous language benchmark
> site: http://shootout.alioth.debian.org/

> This is primarily what I was going on.  I realize no
> benchmarking approach is going to be perfect, but
> this attempt doesn't seem too bad.  Is there any
> reason to be particularly sceptical about results
> found here?

The programs are written by volunteers, so the languages which have people that care about the results (and spend more time writing optimized code for their language of choice) get a big boost in score. Results are also affected by whether relevant libraries (often highly optimized for speed and memory) are included in the language's standard library, as third party libraries can't be used in shootout submissions. Also, for many shootout problems, the answer can be determined at compile-time, so you are potentially testing an aspect of compilation optimization that is not so relevant for practical programming problems.

I don't know of a better set of benchmark results to look at - I use the shootout results myself - but I would take them with a grain of salt.

Mark P

Jan 13, 2009, 8:04:36 AM
to Clojure
> A macro cannot depend on runtime information. A macro is a function  
> that is called at compile time, its argument is an expression (as  
> written by the programmer, or as returned by another macro), and its  
> result is a modified expression. There is no way a macro could access  
> runtime information. It is a program that works on program code, not  
> on runtime data.

Then what do people mean when they say that lisp blurs
the distinction between compile-time and run-time? I
thought that "macros executing at runtime" was part
of this. But if not, I don't know what they do mean.

I thought macros could get executed at runtime as
follows. Suppose I have a domain-specific-language
implemented in a lisp, using macros. Suppose at
runtime some of this domain-specific-code gets
generated somehow (code is data at this point).
An "eval" is executed on this domain-specific-code,
which causes the macros (used to implement the
domain-specific-language) to transform the input
code at runtime.

Is this wrong?

I also thought that when a "clojure application" is
bundled up as java bytecode, this "executable"
actually includes the clojure compiler. Why would
this be included if compilation (including macros)
is never performed at runtime?

Is there something fundamental I am not
understanding?

Thanks,

Mark P.

Konrad Hinsen

Jan 13, 2009, 8:23:41 AM
to clo...@googlegroups.com
On Jan 13, 2009, at 14:04, Mark P wrote:

> Then what do people mean when they say that lisp blurs
> the distinction between compile-time and run-time?

I don't know. I can only guess that they may be referring to the fact
that many Lisp dialects have both an interpreter and a compiler. Or
to the fact that macros (executed at compile time) can call functions
that are also available at runtime. Or to the fact that in
interactive sessions in a purely compiled Lisp (such as Clojure),
every expression gets compiled on the fly just before being
executed. Or again to the fact that you have the function eval, which
calls the compiler at runtime.

Anyway, in Clojure there is a clear distinction between compiling an
expression, i.e. producing equivalent JVM bytecodes, and executing
it, i.e. running the JVM bytecode.

> I thought macros could get executed at runtime as
> follows. Suppose I have a domain-specific-language
> implemented in a lisp, using macros. Suppose at
> runtime some of this domain-specific-code gets
> generated somehow (code is data at this point).
> An "eval" is executed on this domain-specific-code,
> which causes the macros (used to implement the
> domain-specific-language) to transform the input
> code at runtime.
>
> Is this wrong?

You can do that but it is rarely done. DSLs are typically implemented
as ordinary functions and macros without calling eval. Eval is there
for very specific needs, such as implementing interactive
interpreters. And even in those specific applications, I wouldn't
expect the CPU time requirements of eval to be important.
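For completeness, this is what the eval path looks like: a form assembled at run time is handed to the compiler on the spot.

```clojure
;; Code is data: build a list that happens to be a valid expression,
;; then compile and run it with eval.
(def form (list '+ 1 2 3))

(eval form)
;; => 6
```

Typical DSLs never need this; plain functions and macros compiled ahead of time cover almost every case.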

> I also thought that when a "clojure application" is
> bundled up as java bytecode, this "executable"
> actually includes the clojure compiler. Why would
> this be included if compilation (including macros)
> is never performed at runtime?

As long as you want to be able to call eval, you need the compiler to
be around. I guess most Clojure applications could be bundled up
without including the compiler, but perhaps this is too much effort
to be worth pursuing.

Konrad.


Mark H.

Jan 13, 2009, 11:33:51 AM
to Clojure
On Jan 11, 9:41 pm, Mark P <pierh...@gmail.com> wrote:
> The programs I write perform applied mathematical
> optimization (using mainly integer arithmetic)
> and often take hours (occasionally even days)
> to run.  So even small percentage improvements
> in execution speed can make a significant
> practical difference.  And large problems can use
> a large amount of memory - so memory efficiency
> is also a concern.

Would you happen to know exactly what the bottlenecks in your
applications are? Is performance bound by integer arithmetic, by
memory bandwidth (due to lack of temporal locality), or by memory
latency (due to lack of spatial locality)? It's important to know
because garbage collection influences performance differently
depending on the performance bottlenecks in your application. For
example, if you're creating a lot of small objects in your C++ code
anyway, you're still calling destructors a lot and so replacing C++
with a garbage-collected language may even _improve_ performance. (I
recall this being the case with a particular C++ vs. OCaml benchmark,
as the OCaml GC implementation used had soft real-time guarantees
whereas the C++ destructor chain called at the ends of key functions
would cause significant latencies.)

> There are three key factors that still give me
> hope:
>
> 1. Some of the algorithms I use have the potential
> to be parallelized.  I am hoping that as the number
> of cores in PCs increase, at some point Clojure's
> performance will beat C++'s due to Clojure's
> superior utilization of multiple cores.  (Any ideas
> on how many cores are needed for this to become
> true?)

You could use OpenMP to parallelize your C++ code without much effort,
and it will be backwards portable to single-core machines. Getting
performance to scale nicely takes a little more effort with OpenMP.
Clojure needs to build up more infrastructure before parallel
programming for performance becomes practical (how's that for a tongue
twister! ;-P ), but building that infrastructure yourself can be fun
and educational. I would do it myself if I had the time. (I have a
fair amount of experience in parallel programming and consider myself
qualified at least to evaluate a parallel programming framework
design.)

> 2. The JVM is continually being improved.  Hopefully
> in a year or two, the performance of HotSpot will be
> closer to that of C++.  (But maybe this is just
> wishful thinking.)

Again, this depends on the bottlenecks in your app. Remember also
that C++ might be / is (I'm not sure -- you'll have to check this)
_managed_ code (i.e., running under the .NET runtime) in Windows.
Many developers seem to be willing to endure a small-percentage
performance hit in order to make debugging and deployment easier.
I've heard senior Windows engineers say that _every_ Windows app
should run with a tracer so that on a crash they can get a trace of
the last 30 seconds of the program.

> If all else fails, maybe I could use Clojure as a prototyping
> language.  Then when I get it right, I code up the actual
> programs in C++.  But probably a lot would get lost in
> the translation from Clojure -> C++ so would it be worth
> it?

The alternative mentioned in the discussion above was to use Clojure
as a coordination (a.k.a. "scripting") language for Java or C(++).
Combining high-level and low-level languages works great as long as
you set up the interface right (e.g., making it clear where
immutability is violated and avoiding errors that result from certain
Clojure operations assuming immutability). We use Lua as a
coordination language for our C library OSKI (bebop.cs.berkeley.edu/
oski) and it works just fine; it expresses an algebra of sparse matrix
transformations. I prefer Lispy languages but that's just my
taste ;-)

> I'd love to be convinced that Clojure is a viable choice,
> but I need to be a realist too.  So what do people think?
> How realistic are my three "hopes"?  And are there
> any other performance enhancing possibilities that I
> have not taken into account?

I've found that concerns about performance aren't productive unless
they are backed up both by performance models (i.e., expectations of
performance characteristics) and measurements. Also, you'll want to
consider your own productivity as a programmer: on the one hand,
rewriting everything in Clojure or any other language will take a lot
of time, and on the other, maintaining an application written in a low-
level language like C++ also will take a lot of time. The combination
of high-level coordination language and low-level performance language
is perhaps the best both for productivity and for performance:

Productivity:
* You can add the high-level coordination functions separately and
incrementally, instead of rewriting the whole application from
scratch.
* It's much easier and faster to add new higher-level functionality.
* Using two languages forces you to make a clean divide between high-
level interface and low-level implementation, which is good for
software maintenance.

Performance:
* Splitting up the app into "productivity" and "performance" layers
forces you to identify performance bottlenecks, which helps you
understand and improve performance better than if you just had a blob
of C++ to manage.
* Performance-oriented code is often simple and coordination code is
often complex. Complex code written in a low-level language is often
buggy, which can hurt performance (via memory leaks) or even make your
app crash ("infinitely slow" means "infinitely bad
performance" ;-) ).

I can find data on performance of parallel apps under low-level and
higher-level languages if you are interested.

mfh

cliffc

unread,
Jan 13, 2009, 1:07:25 PM1/13/09
to Clojure
Some comments:

1- If you think that HotSpot/Java is slower than C++ by any
interesting amount, I'd love to see the test case. Being the
architect of HotSpot "-server" I've a keen interest in where
performance isn't on par with C. Except for a handful of specialized
uses (e.g. high-level interpreters using gnu label vars), I've only
ever seen equivalent code between C/C++ & Java (not so w/asm+C where
the asm calls out specialized ops or makes specialized optimizations).

2- As already mentioned, there's no auto-parallelization tools Out
There that are ready for prime-time. (there ARE tools out there that
can *help* parallelize an algorithm but you need annotations, etc to
make them work)

3- Making your algorithm parallel is worth an N-times speedup, where N
is limited by the algorithm & available CPUs. Since you can get huge
CPU counts these days, if you can parallelize your algorithm you'll
surely win over almost any other hacking. If you take a 50% slowdown
in the hacking but get to run well on a 4-way box, then you're 2x
ahead.  I'd love to say that the JVM "will just do it", but hand-
hacking for parallelism is the current state-of-the-art.

4- Java/Clojure makes some of this much easier than in C/C++. Having
a memory model is a HUGE help in writing parallel code, as is the Java
concurrency libs, or the above-mentioned Colt libraries.

5- The debian shootout results generally badly mis-represent Java.
Most of them have runtimes that are too small (<10sec) to show off the
JIT, and generally don't use any of the features which commonly appear
in large Java programs (heavy use of virtuals, deep class hierarchies,
etc) for which the JIT does a lot of optimization. I give a public
talk on the dangers of microbenchmarks and all the harnesses I've
looked at in the shootout fail basic sanity checks. Example: the
fannkuch benchmark runs 5 secs in Java, somewhat faster in C++. Why
does anybody care about a program which runs 5sec? (there's other
worse problems: e.g. the C++ code gets explicit constant array sizes
hand-optimized via templates; the equivalent Java optimization isn't
done but is trivial (declare 'n' as a *final* static var) and doing so
greatly speeds up Java, etc).

6- If you need a zillion integer (not FP) parallel Java cycles, look
at an Azul box. Azul's got more integer cycles in a flat shared-
memory config than anybody else by a long shot.

Cliff

Peter Wolf

unread,
Jan 13, 2009, 2:53:36 PM1/13/09
to clo...@googlegroups.com
Why is Clojure slower than Java? And, can it be fixed? Is it just the
dynamic lookups?

I also want to use Clojure in my work to implement the inner loops, but
I was disappointed by a previous discussion looking at the speed of
Clojure. As I recall Clojure seems to be about 1/4 the speed of Java at
the moment.

Until we regularly have 10's of processors, it seems hard to justify
that kind of hit for code that has to be fast. So, I use Clojure for
scripting and high level code currently.

Peter

P.S. I also find that C++ and Java are now approximately the same
speed. And if exceptions are enabled, Java blows C++ out of the water.

Isaac Gouy

unread,
Jan 13, 2009, 2:51:01 PM1/13/09
to Clojure


On Jan 13, 10:07 am, cliffc <cli...@acm.org> wrote:
> Some comments:
>
> 1- If you think that HotSpot/Java is slower than C++ by any
> interesting amount, I'd love to see the test case.  Being the
> architect of HotSpot "-server" I've a keen interest in where
> performance isn't on par with C.  

-snip-

> 5- The debian shootout results generally badly mis-represent Java.
> Most of them have runtimes that are too small (<10sec) to show off the
> JIT,

When you say most of the runtimes are <10sec I wonder what exactly you
are looking at - on u32 & u64, 6 of 13 are <10sec (on quadcore u32q &
u64q, 8 of 13 are <10sec).

I'm always puzzled when I read something like " have runtimes that are
too small (<10sec)" because in my innocence it seems as though JIT
gets to do a lot more in 10 sec on Q6600 than it did on Pentium 4 - am
I just wrong about that?

The benchmarks game has never aimed "to show off the JIT" anymore than
it's aimed "to show off C++" but -

http://shootout.alioth.debian.org/gp4/miscfile.php?file=dynamic&title=Java%20Dynamic%20Compilation


> and generally don't use any of the features which commonly appear
> in large Java programs (heavy use of virtuals, deep class hierarchies,
> etc) for which the JIT does a lot of optimization.

I think that has to be right although it seems like similar issues
would apply to all the other programming languages, not just Java.

"your application is the ultimate benchmark"

http://shootout.alioth.debian.org/u32/miscfile.php?file=benchmarking&title=Flawed%20Benchmarks#app


> I give a public talk on the dangers of microbenchmarks and all the
> harnesses I've looked at in the shootout fail basic sanity checks.
> Example: the fannkuch benchmark runs 5 secs in Java, somewhat faster
> in C++.  Why does anybody care about a program which runs 5sec?

Why does anybody care that a program runs 4sec rather than 6sec? (otoh
those programs which run 10 minutes...)

Of course you could choose other examples from the benchmarks game
which show a Java program only a few tenths of a second from the
fastest C++ program. (The reaction I frequently see is surprise that
Java can be that close to C++).


> (there's other worse problems: e.g. the C++ code gets explicit constant
> array sizes hand-optimized via templates; the equivalent Java
> optimization isn't done but is trivial (declare 'n' as a *final* static var)
> and doing so greatly speeds up Java, etc).

Which is to say that some of the programs are better than some of the
other programs - isn't that always going to be the case?

It's not obvious which Java fannkuch program you're talking about - my
guess would be "Java 6 -server" rather than "Java 6 -server #2" or
"Java 6 -server #4" (both of which seem to declare 'n' final but don't
use all the cores) ?




Isaac Gouy

unread,
Jan 13, 2009, 4:58:39 PM1/13/09
to Clojure


On Jan 13, 4:30 am, Mark P <pierh...@gmail.com> wrote:
> On Jan 13, 5:49 pm, Zak Wilson <zak.wil...@gmail.com> wrote:
>
> > You're probably thinking of this: http://www.flownet.com/gat/papers/lisp-java.pdf
>
> Thanks for the link.
>
> > There's also the (in)famous language benchmark
> > site: http://shootout.alioth.debian.org/
>
> This is primarily what I was going on.  I realize no
> benchmarking approach is going to be perfect, but
> this attempt doesn't seem too bad.  Is there any
> reason to be particularly sceptical about results
> found here?
>
> Oh, and I realize my comments about SBCL lisp being
> 3 to 4 times slower than C++ only apply for the quad
> core machines.  

Notice which programs are using more than one core and which aren't.

Isaac Gouy

unread,
Jan 13, 2009, 5:17:36 PM1/13/09
to Clojure


On Jan 13, 4:41 am, "Eric Lavigne" <lavigne.e...@gmail.com> wrote:
> > > There's also the (in)famous language benchmark
> > > site: http://shootout.alioth.debian.org/
>
> > This is primarily what I was going on.  I realize no
> > benchmarking approach is going to be perfect, but
> > this attempt doesn't seem too bad.  Is there any
> > reason to be particularly sceptical about results
> > found here?
>
> The programs are written by volunteers, so the languages which have people
> that care about the results (and spend more time writing optimized code for
> their language of choice) get a big boost in score.


It's easy to see which languages have suffered from relative neglect -
they're the ones where most of the programs were written by me and
still haven't been re-written by anyone else :-)

C# Mono, Mozart/Oz, PHP, Scala

otoh Scheme programs by Matthew Flatt, Haskell programs by Don
Stewart, Clean programs by John van Groningen, etc - wow!


> Results are also
> affected by whether relevant libraries (often highly optimized for speed and
> memory) are included in the language's standard library, as third party
> libraries can't be used in shootout submissions.


Except when they can - pcre, libgmp


> Also, for many shootout
> problems, the answer can be determined at compile-time, so you are
> potentially testing an aspect of compilation optimization that is not so
> relevant for practical programming problems.
>
> I don't know of a better set of benchmark results to look at - I use the
> shootout results myself - but I would take them with a grain of salt.


In the words of Simon Peyton-Jones - "flawed, but available"

http://www.haskell.org/pipermail/haskell-cafe/2008-December/052589.html

Colin Walters

unread,
Jan 13, 2009, 8:17:50 PM1/13/09
to Clojure
On Jan 12, 10:21 am, Stuart Sierra <the.stuart.sie...@gmail.com>
wrote:
> If you have
> highly-optimized, custom-designed numerical algorithms written in a
> low-level language like C++, you will never be able to write a version
> that is equally fast in a dynamic, virtual-machine language.

I wouldn't say "never"; clearly Clojure is a managed language, but one
important selling point it has I think versus "language+toy VM"
dynamic languages like Ruby and Python is that you can easily drop to
a lower level with type hints, and in Java dropping to primitives is
going down even farther.

It doesn't seem hard to imagine that Hotspot could see through and
unroll some "inner loop" functional Clojure code with type hints into
something much like what a C/C++ compiler would produce.
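A minimal sketch of that "drop to a lower level" idea, using hypothetical function names (the thread's era-style `#^` hint syntax): without a hint the method call goes through reflection, while a hint lets the compiler emit a direct virtual call.

```clojure
;; Without a hint, .length is resolved by reflection at runtime.
(defn len-reflective [s]
  (.length s))

;; With a #^String hint, the compiler emits a direct call to
;; String.length(), with no reflective lookup in the loop.
(defn len-hinted [#^String s]
  (.length s))
```

Calling `(len-hinted "hello")` returns 5 either way; the hint changes only the generated bytecode, not the semantics.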

Mark H.

unread,
Jan 13, 2009, 11:20:39 PM1/13/09
to Clojure
I humbly propose that folks shouldn't complain about Clojure being
slow for their apps until they have at least one of the following:

1. A targeted benchmark for an important bottleneck in their
application, implemented in both Clojure and the current
implementation language, with performance results; or

2. A performance model based on an understanding of the Clojure and
HotSpot compilation processes, that highlights which particular
features of the latter would make their application slower than in its
current implementation language.

This is not out of elitism but out of an earnest desire to be helpful;
it's hard to give good advice on language choice and code optimization
when we don't know what the bottlenecks are. Also it's really hard
for anyone to optimize their own code unless they themselves know
where the bottlenecks are. The latter means that #1 above is
surprisingly easy; it often can be done with a simple function trace
of a typical run.

mfh

chris

unread,
Jan 14, 2009, 12:45:06 AM1/14/09
to Clojure
Perhaps this thread is dead, but have you looked at CUDA?

A modern GPU has around 100 times the FPU firepower compared to a
modern core duo. Whether you can structure your algorithm to suite
the hardware is another question and I could help you with that. CUDA
isn't as strong on integer but the multiplier is still damn large.
NVIDIA also sells specialized boxes (the Tesla line) made specifically
for high-speed computing or supercomputer-level processing power.

For an easier route that still produces really great results have you
tried SSE optimized assembly? If you don't like SSE, have you looked
at MMX?

Finally, there is a paper out recently on writing CUDA code with
Haskell. You might check that out if you are interested.

Any of these options should dramatically increase your available
computing resources.

Forget language speed; that is a stupid metric that lots of people
like to waste lots of time on. Using specialized tools in the right
situation will increase your speed by orders of magnitude, not
factors.

I would highly recommend, if something taking a couple days is costing
you any money, to take a serious look at one of the four options
listed above.

Chris

Rich Hickey

unread,
Jan 14, 2009, 9:42:36 AM1/14/09
to Clojure
For those interested in numeric performance, Clojure lets you use
arrays of primitives, has primitive math support, primitive local
support, and has higher-level macros for dealing with them (amap,
areduce) which can also serve as models for your own. You can also
use :inline to wrap arithmetic primitives implemented as Java static
methods, as Clojure itself does for +/* etc, which HotSpot inlines
quite nicely.
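A small sketch of the constructs mentioned above, on a primitive float array with a type hint so the inner expressions stay unboxed (the names here are illustrative, not from any library):

```clojure
;; amap walks the array, binding i to the index and ret to a clone of
;; the input; the body expression produces each new element unboxed.
(defn scale-all [#^floats a k]
  (let [k (float k)]
    (amap a i ret (* k (aget a i)))))

;; areduce folds over the array with a primitive accumulator.
(defn sum-all [#^floats a]
  (areduce a i acc (float 0) (+ acc (aget a i))))

(sum-all (float-array [1.0 2.0 3.0]))  ; => 6.0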

If people are not exploring and using these things, and are concerned
about performance not being the same as Java's, I can't help them.
That's how you get numeric performance the same as Java's in Clojure.
Cliff Click has already spoken about the performance of Java vs/ C++,
and I concur.

Where Clojure currently has an unavoidable overhead vs Java is Clojure-
>Clojure calls involving numbers. Since the fn call/return signature
is Object based, those numbers need to be boxed, at least until we get
something like tagged numbers on the JVM. That means naive fib
microbenchmarks suffer (boxing in a tight loop), but most real
performance-critical numeric code is not like naive fib. Most of it is
inner loops on vectors/matrices, or local iterative calculations
involving primitives, which, if implemented using the above
constructs, is as fast as Java.

So, getting the fastest numerics means working with primitive types,
and lower-level constructs. It's not something you'll want to do
anywhere other than where it really matters. When you do, you'll find
Clojure's macros let you write higher-level code than the for loops of
Java. amap and areduce just hint at the possibilities of a high-
performance, macro-based math library, and I expect similar macros to
come from people actually doing number crunching with Clojure.

Rich

Peter Wolf

unread,
Jan 14, 2009, 10:11:25 AM1/14/09
to clo...@googlegroups.com
Rich, I must apologize-- I worded my question *far* too harshly. I knew
it as I pushed the send button. I am a huge fan of Clojure, and plan to
use it as often as possible. My question was really looking for hints,
so that I can use it in more places. You gave me a great one, thanks!

Is there any way to avoid the overhead of dynamic lookups in Clojure?
Can I tell the compiler somehow that the type of each argument is
constant, and/or that a symbol will always resolve to the same place?

I am assuming, and perhaps I am wrong, that dynamic behavior is a
significant overhead.

P

Asbjørn Bjørnstad

unread,
Jan 14, 2009, 11:27:39 AM1/14/09
to Clojure
I'm not complaining, but I find the topic interesting as I just tried
translating a fluid dynamics algorithm to clojure.
(http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf)
I got no experience with this kind of stuff, it's just something I
tried out for fun. So the performance I get may be as expected. Or
there may be obvious ways of speeding it up that I don't see.

Anyway, here is a core part of the algorithm. It's a diffusion
step, this gets called 3 times and in total this takes up more than
one second of cpu time on my machine which makes framerates very
slow. If anyone got any improvements I'd be happy to hear about it:

(def grid-size 300)

(defn make-grid []
  (make-array Float/TYPE (* (+ grid-size 2)
                            (+ grid-size 2))))

(defn diffuse [grid diff-ratio dt]
  (let [a (float (* dt diff-ratio grid-size grid-size))
        a4-1 (float (+ 1 (* 4 a)))
        grid #^floats (deref grid)
        diffused-grid #^floats (make-grid)
        line-length (int (+ grid-size 2))]
    (dotimes [n 20]
      (dotimes [y grid-size]
        (let [line-offset (* (inc y) line-length)]
          (loop [x (int 1)
                 c (+ x line-offset)]
            (aset diffused-grid c
                  (float (/ (+ (aget grid c)
                               (* a
                                  (+ (+ (aget diffused-grid (dec c))
                                        (aget diffused-grid (inc c)))
                                     (+ (aget diffused-grid (- c line-length))
                                        (aget diffused-grid (+ c line-length))))))
                            a4-1)))
            (when (< x grid-size)
              (recur (inc x)
                     (inc c)))))))
    diffused-grid))

(def foo (ref (make-grid)))

(time (diffuse foo 0.00025 0.002))

--
-asbjxrn

Greg Harman

unread,
Jan 14, 2009, 11:49:13 AM1/14/09
to Clojure
Asbjxrn,

One thing that leaps out to me performance-wise is the 3 nested loops
(dotimes, dotimes, loop/recur). Whatever's inside the inner loop is
getting run a lot of times! General advice about reducing loop depth
and computation required inside the innermost loop aside... have you
looked at clojure.parallel? Seems like it might be a good fit since
your algorithm is doing a lot of array processing...

-Greg

Mark H.

unread,
Jan 14, 2009, 2:38:20 PM1/14/09
to Clojure
On Jan 14, 8:27 am, Asbjørn Bjørnstad <asbj...@gmail.com> wrote:
> Anyway, here is a core part of the algorithm. It's a diffusion
> step, this gets called 3 times and in total this takes up more than
> one second of cpu time on my machine which makes framerates very
> slow. If anyone got any improvements I'd be happy to hear about it:

Here are a few possibilities:

1. Java 2-D, 3-D, etc. arrays are implemented as arrays of (arrays
of ... pointers), which hurts spatial locality. It was enough of a
problem that I've seen people reimplement 2-D arrays using 1-D
arrays. You might consider using a 1-D array and taking the trouble
to do the indexing by hand. Of course only a benchmark can tell
whether it really helps.

2. It looks like your inner loop is iterating over the first index of
the array (is that right?), which is good for Fortran but bad for
Java. Java 2-D arrays are row-oriented. In your inner loop, you
should iterate such that successive array values are stored adjacent
to each other in memory.

3. I noticed you are doing 20 iterations of Gauss-Seidel. There are
some smart ways to speed up multiple stencil applications. I can
point you to references if you are interested.
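To illustrate points 1 and 2, here is a hypothetical sketch of the flat-array, row-major layout: one 1-D float array stands in for a 2-D grid, and the indexing arithmetic is done by hand so that stepping x in the inner loop touches adjacent memory.

```clojure
;; Illustrative names; width/height are assumptions, not from the thread.
(def width 4)

;; Row-major: consecutive x values map to consecutive array slots,
;; so iterate x in the innermost loop for spatial locality.
(defn idx [x y] (+ x (* y width)))

(let [grid (float-array (* width 3))]   ; a width x 3 grid, flattened
  (aset grid (idx 2 1) (float 5.0))
  (aget grid (idx 2 1)))                ; => 5.0
```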

Fun to see numeric codes in Clojure! ;-)

mfh

Chouser

unread,
Jan 14, 2009, 3:04:37 PM1/14/09
to clo...@googlegroups.com
On Wed, Jan 14, 2009 at 11:27 AM, Asbjørn Bjørnstad <asb...@gmail.com> wrote:
>
> Anyway, here is a core part of the algorithm. It's a diffusion
> step, this gets called 3 times and in total this takes up more than
> one second of cpu time on my machine which makes framerates very
> slow. If anyone got any improvements I'd be happy to hear about it:

Not sure if it makes for this algorithm, but there are faster (and
less safe) versions of several functions you're using. Check the
output of (find-doc "unchecked")

While I think your loop-local 'c' will be a primitive int, I'm not
entirely sure -- might be worth trying (int (unchecked-add x
line-offset)). Same thing on line-offset, not sure it's a primitive.

--Chouser

chris

unread,
Jan 14, 2009, 3:29:02 PM1/14/09
to Clojure
For a completely different way of doing this, you could certainly use
GPGPU programming to speed this up.

This section:
(aset diffused-grid c
      (float (/ (+ (aget grid c)
                   (* a
                      (+ (+ (aget diffused-grid (dec c))
                            (aget diffused-grid (inc c)))
                         (+ (aget diffused-grid (- c line-length))
                            (aget diffused-grid (+ c line-length)))))

looks a lot to me like what any blending or bloom pixel shader does.
If you are doing large grids (like 4000x1000) then you can't beat CUDA
for something like this. Those four lookups would most likely all be
in texture cache as texture cache is spatially managed (i.e. function
of both x and y). You mentioned frame rate anyway, just read the
framebuffer back after the render call and you can work with the next
step of your iteration.

You want this for a game engine anyway; do it in opengl or directx
using shaders.

Chris

On Jan 14, 1:04 pm, Chouser <chou...@gmail.com> wrote:

Rich Hickey

unread,
Jan 14, 2009, 3:33:19 PM1/14/09
to Clojure
Try this (and make sure you are using -server):

(defn diffuse [grid diff-ratio dt]
  (let [a (float (* dt diff-ratio grid-size grid-size))
        a4-1 (float (+ 1 (* 4 a)))
        grid #^floats (deref grid)
        diffused-grid #^floats (make-grid)
        grid-size (int grid-size)
        line-length (int (+ grid-size 2))]
    (dotimes [n 20]
      (dotimes [y grid-size]
        (let [line-offset (* (inc y) line-length)]
          (loop [x (int 1)
                 c (int (+ x line-offset))]
            (aset diffused-grid c
                  (/ (+ (aget grid c)
                        (* a
                           (+ (+ (aget diffused-grid (unchecked-dec c))
                                 (aget diffused-grid (unchecked-inc c)))
                              (+ (aget diffused-grid (unchecked-subtract c line-length))
                                 (aget diffused-grid (unchecked-add c line-length))))))
                     a4-1))
            (when (< x grid-size)
              (recur (unchecked-inc x)
                     (unchecked-inc c)))))))
    diffused-grid))

(time (diffuse foo 0.00025 0.002))
"Elapsed time: 25.588 msecs"

Rich

Mark H.

unread,
Jan 14, 2009, 7:42:28 PM1/14/09
to Clojure
On Jan 14, 12:29 pm, chris <cnuern...@gmail.com> wrote:
> For a completely different way of doing this, you could certainly use
> GPGPU programming to speed this up.
> ...
> You want this for a game engine anyway; do it in opengl or directx
> using shaders.

I recommend OpenCL or CUDA instead for less pain ;-) (Well, maybe not
for you -- I tried doing stuff with OpenGL and some language called CG
in 2006 and it was a world of pain.)

What's the target frame rate?

mfh

Asbjørn Bjørnstad

unread,
Jan 14, 2009, 9:37:43 PM1/14/09
to Clojure


On Jan 15, 3:38 am, "Mark H." <mark.hoem...@gmail.com> wrote:
> On Jan 14, 8:27 am, Asbjørn  Bjørnstad <asbj...@gmail.com> wrote:
>
> > Anyway, here is a core part of the algorithm. It's a diffusion
> > step, this gets called 3 times and in total this takes up more than
> > one second of cpu time on my machine which makes framerates very
> > slow. If anyone got any improvements I'd be happy to hear about it:
>
> Here are a few possibilities:
>
> 1. Java 2-D, 3-D, etc. arrays are implemented as arrays of (arrays
> of ... pointers), which hurts spatial locality.  It was enough of a
> problem that I've seen people reimplement 2-D arrays using 1-D
> arrays.  You might consider using a 1-D array and taking the trouble
> to do the indexing by hand.  Of course only a benchmark can tell
> whether it really helps.

Look closer, I'm using a 1-D array and indexing by hand :-)

> 3. I noticed you are doing 20 iterations of Gauss-Seidel.  There are
> some smart ways to speed up multiple stencil applications.  I can
> point you to references if you are interested.

That would be cool. Although, I just copied someone elses algorithm,
and I think he knew what he were doing so it may be hard to improve
on it. (Apart from the unchecked- functions pointed out by Chouser
and Rich.)

--
-asbjxrn

Asbjørn Bjørnstad

unread,
Jan 14, 2009, 9:49:17 PM1/14/09
to Clojure
For me, fast enough that the smoke looks smokey :) I'm just playing
around, saw this article about "fluid dynamics for games" and thought
it looked fun to try to implement in clojure. There is no planned
project that this needs to fit into.

OpenCL/CUDA might be a next step to see what the performance
improvement would be, but right now I just want to see if it could
be done in pure clojure.
--
-asbjxrn

RZez...@gmail.com

unread,
Jan 14, 2009, 9:54:54 PM1/14/09
to Clojure
I realize this thread has long since changed gears, but you may find
your answer in Chapter 8 of Practical Common Lisp (http://
www.gigamonkeys.com/book/). Look for the section titled 'Macro
Expansion vs. Runtime". Basically, it's like Konrad said, macros are
executed at 'macro expansion time' which is during the compilation
phase. However, Peter Seibel does mention that when interpreted (vs
compiled) the line between macro expansion time and runtime becomes
kind of fuzzy because they become intertwined. He mentions that it is
not explicitly defined when an interpreter must expand macros.  It
could expand all the macros up front, or do it lazily as it comes
across them, but that doesn't change the fact that they must be
expanded first so that the actual runtime code can be produced and
executed.

Hope this helps.

-Ryan

Asbjørn Bjørnstad

unread,
Jan 14, 2009, 10:08:32 PM1/14/09
to Clojure
Your computer is faster than mine. :)
I got to check when I get back home tonight, but I think this might
be fast enough once I use this technique in all the array manipulation
functions. I got about a 4x speedup in my quick tests. Thanks.

--
-asbjxrn

Mark H.

unread,
Jan 15, 2009, 1:03:12 AM1/15/09
to Clojure
On Jan 14, 6:37 pm, Asbjørn Bjørnstad <asbj...@gmail.com> wrote:
> Look closer, I'm using a 1-D array and indexing by hand :-)

oops, sorry about that -- i see you are doing the right thing here ;-)

> > 3. I noticed you are doing 20 iterations of Gauss-Seidel.  There are
> > some smart ways to speed up multiple stencil applications.  I can
> > point you to references if you are interested.
>
> That would be cool. Although, I just copied someone elses algorithm,
> and I think he knew what he were doing so it may be hard to improve
> on it. (Apart from the unchecked- functions pointed out by Chouser
> and Rich.)

Just for optimizing the stencil computations, check out
http://bebop.cs.berkeley.edu/#pubs -- in particular:

--> "Stencil Computation Optimization and Auto-Tuning on State-of-the-
Art Multicore Architectures"

--> "Avoiding Communication in Sparse Matrix Computations" (check out
the bibliography for references on "time skewing," as our paper is
more general than what you need)

As for the algorithm itself, the paper is making an accuracy /
performance tradeoff in the diffusion solve by using very few
iterations of a slowly converging iteration (Gauss-Seidel). I'd have
to investigate the problem further to see whether a more accurate
solver can reduce time-to-solution. For now, maybe you could replace
the 20 steps of Gauss-Seidel with some kind of multigrid cycle -- it
could save some time since you won't have to sweep over the whole grid
so many times.

mfh

bOR_

unread,
Jan 15, 2009, 3:09:41 AM1/15/09
to Clojure
I remember from 5 years ago that a colleague of mine improved a
diffusion algorithm for a successor of mine with some heavy algorithms.  My
own algorithm was a simple loop-over-the-array once, dump-a-fraction-
of-each-cell-into-spatially-neighbouring-cells-in-the-new-array, and
sum what is in every cell of the new array (you are doing the same
thing, right?). However, there are far faster algorithms.

If you are interested, I'll inquire.


On Jan 15, 7:03 am, "Mark H." <mark.hoem...@gmail.com> wrote:
> On Jan 14, 6:37 pm, Asbjørn  Bjørnstad <asbj...@gmail.com> wrote:
>
> > Look closer, I'm using a 1-D array and indexing by hand :-)
>
> oops, sorry about that -- i see you are doing the right thing here ;-)
>
> > > 3. I noticed you are doing 20 iterations of Gauss-Seidel.  There are
> > > some smart ways to speed up multiple stencil applications.  I can
> > > point you to references if you are interested.
>
> > That would be cool. Although, I just copied someone elses algorithm,
> > and I think he knew what he were doing so it may be hard to improve
> > on it. (Apart from the unchecked- functions pointed out by Chouser
> > and Rich.)
>
> Just for optimizing the stencil computations, check out http://bebop.cs.berkeley.edu/#pubs -- in particular:

Mark H.

unread,
Jan 15, 2009, 2:32:42 PM1/15/09
to Clojure
On Jan 15, 12:09 am, bOR_ <boris.sch...@gmail.com> wrote:
> I remember from 5 years ago that a collegue of mine improved a
> diffusion algorithm for a successor of me by some heavy algorithms. My
> own algorithm was a simple loop-over-the- array once, dump-a-fraction-
> of-each-cell-into-spatially-neighbouring-cells-in-the-new-array, and
> sum what is in every cell of the new array (you are doing the same
> thing, right?). However, there are far faster algorithms.
>
> If you are interested, I'll inquire.

It's just idle curiosity on my part but Asbjørn may very well be
interested ;-)

"Dump a fraction of each cell into spatially neighbouring cells in the
new array" is a stationary iterative method, probably Jacobi iteration
(since there is a "new array" -- Asbjørn is recycling the "old" array
so he is doing Gauss-Seidel iteration). Gauss-Seidel often converges
a bit faster but the "heavy algorithms" (FFT, multigrid,
preconditioned iterative solvers) converge much faster. However there
is a higher setup cost with some of these and for the application it's
not clear whether an accurate solve is needed.

mfh

Konrad Hinsen

unread,
Jan 16, 2009, 3:04:29 AM1/16/09
to clo...@googlegroups.com
On 13.01.2009, at 19:07, cliffc wrote:

> 1- If you think that HotSpot/Java is slower than C++ by any
> interesting amount, I'd love to see the test case. Being the

I have to admit that I haven't done many tests, but for those I did I
found little to no difference, provided the tests have sufficiently
long run times.

However, what remains a problem for me with the Java/HotSpot approach to
performance is the absence of a clear (even though necessarily
approximate) performance model. When writing C or C++ code, I have
some idea of the approximate cost of everything. Of course there are
differences between machines and compilers, but those rarely cause
drastic differences in performance, rarely more than a factor of 2.
On the JVM, I don't know what HotSpot can optimize and what it can't,
and I can't predict the impact of object allocation and garbage
collection. I suppose JVM experts can do better than me, but where
can I learn about all that?

Konrad.


Isaac Gouy

unread,
Feb 1, 2009, 12:06:54 AM2/1/09
to Clojure


On Jan 13, 10:07 am, cliffc <cli...@acm.org> wrote:
-snip-
> 5- The debian shootout results generally badly mis-represent Java.
> Most of them have runtimes that are too small (<10sec) to show off the
> JIT, and generally don't use any of the features which commonly appear
> in large Java programs (heavy use of virtuals, deep class hierarchies,
> etc) for which the JIT does a lot of optimization.  I give a public
> talk on the dangers of microbenchmarks and all the harnesses I've
> looked at in the shootout fail basic sanity checks.  Example: the
> fannkuch benchmark runs 5 secs in Java, somewhat faster in C++.  Why
> does anybody care about a program which runs 5sec?


I guess they wonder if "somewhat faster" might become 25 minutes as
the workload increases ;-)

usr+sys :: elapsed
N=10
Java 0.756s :: 0.643s
C++ 0.332s :: 0.099s

N=11
Java 5.944s :: 1.859s
C++ 3.540s :: 1.063s

N=12
Java 74.689s :: 20.460s
C++ 51.895s :: 14.167s

N=13
Java 1083.592s :: 271.876s
C++ 796.506s :: 199.856s

N=14
Java 18015.666s :: 5296.275s
C++ 12392.987s :: 3791.224s