Elegant but very slow

PeterB

unread,

Dec 4, 2008, 2:55:35 PM12/4/08

to Clojure

Hi,

I downloaded clojure_20080916.zip and had a go with a simple tree
benchmark (cut-down version of the one in the computer language
shootout http://shootout.alioth.debian.org/).

The Java version is pretty simple and runs in about 2s on my laptop:

public class Main
{
public static void main(String[] args)
{
System.out.println("result: " + sumTrees(10000, 10));
}

public static int sumTrees(int iterations, int depth)
{
int i;
int sum = 0;

for (i = 1; i <= iterations; i++)
sum += TreeNode.bottomUpTree(i, depth).itemCheck() +
TreeNode.bottomUpTree(-i, depth).itemCheck();
return sum;
}

public static class TreeNode
{
private TreeNode left, right;
private int item;

TreeNode(int item)
{
this.item = item;
}

TreeNode(int item, TreeNode left, TreeNode right)
{
this.left = left;
this.right = right;
this.item = item;
}

private static TreeNode bottomUpTree(int item, int depth)
{
if (depth > 0)
return new TreeNode(item,
bottomUpTree(2*item-1,
depth-1),
bottomUpTree(2*item,
depth-1));
else
return new TreeNode(item);
}

int itemCheck()
{
if (left == null)
return item;
else
return item + left.itemCheck() - right.itemCheck();
}
}
}

The Clojure version is considerably more compact and elegant:

(defn make-tree [item depth]
(if (zero? depth)
(list item nil nil)
(let [item2 (* 2 item) depth-1 (dec depth)]
(list item (make-tree (dec item2) depth-1) (make-tree item2
depth-1)))))

(defn check-tree [tree]
(if (nil? tree)
0
(let [[i l r] tree] (- (+ i (check-tree l)) (check-tree r)))))

(defn sum-trees [iterations depth]
(let [sum #(+ (check-tree (make-tree % depth))
(check-tree (make-tree (- %) depth)))]
(reduce #(+ %1 %2) 0 (map sum (range 1 (inc iterations))))))

(time (println "result:" (sum-trees 10000 10)))

However, there is a heavy price to be paid for this elegance:
'Elapsed time: 100730.161515 msecs'

Ouch! That's rather disappointing :-(
Any suggestions for improving the performance?

Daniel Eklund

unread,

Dec 4, 2008, 5:37:27 PM12/4/08

to Clojure

I

Daniel Eklund

unread,

Dec 4, 2008, 5:39:06 PM12/4/08

to Clojure

oops...

I am just learning the language right now.... and just quickly looked
at what you did...
Would the use of "recur" instead of self-calls potentially help
consuming stack space?

http://clojure.org/special_forms#toc10

Daniel Eklund

unread,

Dec 4, 2008, 5:44:37 PM12/4/08

to Clojure

ahhh... to answer my own question (and if I had looked at the code
and the API a bit closer),
it turns out that "recur" can only be used in tail-position... and
your code (as a tree-recursor) would
not benefit from this.

Meikel Brandmeyer

unread,

Dec 4, 2008, 5:53:11 PM12/4/08

to clo...@googlegroups.com

Hi,

Am 04.12.2008 um 20:55 schrieb PeterB:
> However, there is a heavy price to be paid for this elegance:
> 'Elapsed time: 100730.161515 msecs'
>
> Ouch! That's rather disappointing :-(
> Any suggestions for improving the performance?

http://groups.google.com/group/clojure/browse_thread/thread/f0303a9e00b38529/99f02fef21721a2f?lnk=gst&q=alioth+tree#99f02fef21721a2f

In the giving thread this was already discussed. The whole
benchmark ran for me on my crappy office machine in
24 seconds and someone else tested on a more powerful
machine is 15 seconds. I'm not sure what makes the
difference. The only obvious thing is, that I used vectors
to represent the node. But I don't believe that that makes
such a difference. For the deeper details I'm too sleepy now
to look into this...

Sincerely
Meikel

PS: The code in the above mentioned thread is in no
way optimisied or something. So you can maybe do
better....

Mark Engelberg

unread,

Dec 4, 2008, 6:20:22 PM12/4/08

to clo...@googlegroups.com

A couple quick suggestions:

On my machine, Peter's code runs in 120 seconds.
Changing make-tree to return vectors rather than lists reduced time to
47 seconds.
Taking out the nil? test (just (if tree branch1 branch2) shaved off a
few more seconds.
Removing the pattern-matching let in check-tree (use (tree 0), (tree
1), and (tree 2)) reduced time to 33 seconds.

So these minor changes made a big difference. Maybe there's more to
do? Also, I get the impression Clojure needs to be started with the
-server flag to use all its optimizations. I haven't tried that.

bOR_

unread,

Dec 4, 2008, 6:22:19 PM12/4/08

to Clojure

> (reduce #(+ %1 %2) 0 (map sum (range 1 (inc iterations))))

can be replaced by (reduce + (map sum (range 1 (inc iterations))))

There is at least some functions in clojure's api for doing unchecked
calculations. That should speed up things. I'm not yet familiar
enough
with clojure or building trees if there is some large algorithmic
difference
between the java and the clojure version.

http://clojure.org/api
(unchecked-add x y)
(unchecked-dec x)
(unchecked-divide x y)
(unchecked-inc x)
(unchecked-multiply x y)
(unchecked-negate x)
(unchecked-subtract x y)

bOR_

unread,

Dec 4, 2008, 6:25:15 PM12/4/08

to Clojure

using (int ..) should also help (type hinting)

Mark Engelberg

unread,

Dec 4, 2008, 6:31:25 PM12/4/08

to clo...@googlegroups.com

I already tried bOR's two suggestions (replace anonymous function with
+ and type hinting), but they made no difference on my machine.

Christian Vest Hansen

unread,

Dec 4, 2008, 7:04:11 PM12/4/08

to clo...@googlegroups.com

Is it important that we build and deconstruct a complete tree in the
process, or is merely getting the correct end result enough?

(defn checkTree [item depth]
(if (zero? depth)
item
(- (+ item (checkTree (- (* 2 item) 1) (dec depth)))
(checkTree (* 2 item) (dec depth)))))

(defn sumTrees [iterations depth]
(loop [sum (int 0) i (int 1)]
(if (<= i iterations)
(recur (int (+ sum (checkTree i depth) (checkTree (- i) depth)))
(inc i))
sum)))

;;; followed by all of your code, and then:

(time (println "My result:" (sumTrees 10000 10)))
(time (println "Your result:" (sum-trees 10000 10)))

Produces this output:

rowe:~$ clj trees.clj
My result: -20000
"Elapsed time: 11179.616 msecs"
Your result: -20000
"Elapsed time: 112541.067 msecs"
rowe:~$

--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Christian Vest Hansen

unread,

Dec 4, 2008, 7:09:43 PM12/4/08

to clo...@googlegroups.com

Ah, disregard that. I found the rules:
http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all#about

Stuart Sierra

unread,

Dec 5, 2008, 10:38:56 AM12/5/08

to Clojure

On Dec 4, 7:09 pm, "Christian Vest Hansen" <karmazi...@gmail.com>
wrote:

> Ah, disregard that. I found the rules:
> http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all#about

Yeah -- " this is an adaptation of a benchmark for testing GC so we
are interested in the whole tree being allocated before any nodes are
GC'd - which probably excludes lazy evaluation."

So this is testing the compiler and garbage collector more than the
language. Given that Clojure's immutable data structures are much
more complex than the simple classes in the Java example, I don't
think it will ever be possible to write a competitive implementation
in pure Clojure. However, the advantage of having Java as a host
language is that you can use it for low-level operations like this.

-Stuart Sierra

PeterB

unread,

Dec 6, 2008, 9:14:51 AM12/6/08

to Clojure

Thanks for all the suggestions!

This modified version is very close to that in the thread:
http://groups.google.com/group/clojure/browse_thread/thread/f0303a9e00b38529/99f02fef21721a2f?lnk=gst&q=alioth+tree#99f02fef21721a2f
(Thanks for pointing that out Meikel. I should have searched old
threads before posting.)

(defn make-tree [item depth]
(if (zero? depth)

[item nil nil]

(let [item2 (* 2 item) depth-1 (dec depth)]

[item (make-tree (dec item2) depth-1) (make-tree item2
depth-1)])))

; Note: (+ (tree 0) (check-tree (tree 1)) (- (check-tree (tree 2))))
seems to require
; the creation of an intermediate list and runs twice as slow
(defn check-tree [tree]
(if tree
(- (+ (tree 0) (check-tree (tree 1))) (check-tree (tree 2)))
0))

(defn sum-trees [iterations depth]
(let [sum #(+ (check-tree (make-tree % depth))
(check-tree (make-tree (- %) depth)))]

(reduce + (map sum (range 1 (inc iterations))))))

(time (println "result:" (sum-trees 10000 10)))

Running in Clojure REPL for java 1.6.0_11 with -server option:

result: -20000
Elapsed time: 6080.294283 msecs

Wow! Elegant and fast!

Mark Engelberg

unread,

Dec 6, 2008, 12:16:46 PM12/6/08

to clo...@googlegroups.com

Has anyone been able to use type hints to successfully close the last
bit of difference between the Clojure and Java version on this
benchmark?

Peter Wolf

unread,

Dec 7, 2008, 10:11:44 AM12/7/08

to clo...@googlegroups.com

I'm a n00b, but isn't the point of this language to be *faster* than
Java?... at least on a multiprocessor machine.

Shouldn't the number of processors on the test machine make a big
difference to how fast it runs? Whereas, the Java version is only
dependent on the clock rate of the individual processors.

What happens if we run this benchmark on a nice 4 core core machine?

Dave Newton

unread,

Dec 7, 2008, 10:18:56 AM12/7/08

to clo...@googlegroups.com

--- On Sun, 12/7/08, Peter Wolf wrote:
> I'm a n00b, but isn't the point of this language to
> be *faster* than Java?...

I don't believe so. My impression was that the point was to be *better* than Java by: being a Lisp, being functional, and allowing safe and easy multithreading/concurrency.

That's just my impression; you'll get far more definitive answers than this.

Dave

Randall R Schulz

unread,

Dec 7, 2008, 10:24:23 AM12/7/08

to clo...@googlegroups.com

On Sunday 07 December 2008 07:11, Peter Wolf wrote:
> I'm a n00b, but isn't the point of this language to be *faster* than
> Java?... at least on a multiprocessor machine.

I don't think performance is a particular criterion for the design of
this language. It's not unimportant, but the quality of the code it
engenders, especially w.r.t. to concurrency and the difficulty of
writing correct code using conventional thread-aware mechanisms,
_is_ a key design goal.

> Shouldn't the number of processors on the test machine make a big
> difference to how fast it runs? Whereas, the Java version is only
> dependent on the clock rate of the individual processors.

Only for algorithms that are both parallelizable and which have actually
been written in parallel form. There is not yet (and may never be) any
ability to automatically parallelize arbitrary algorithms or code.

> What happens if we run this benchmark on a nice 4 core core machine?

And nothing else is running? One core will be used for the thread
running this code. Another will do any I/O, though in this case there
is virtually none, and another will do GC.

This test is a sequential algorithm. I'm not familiar with the Alioth
benchmarks / shootout, but maybe there are some parallel tests in
there.

Randall Schulz

Peter Wolf

unread,

Dec 7, 2008, 10:40:02 AM12/7/08

to clo...@googlegroups.com

Hmmm...

Looking at the code I see

(defn sum-trees [iterations depth]
(let [sum #(+ (check-tree (make-tree % depth))
(check-tree (make-tree (- %) depth)))]

(reduce + (map sum (range 1 (inc iterations))))))

Shouldn't expressing the algorithm as a REDUCE and MAP instead of a LOOP
do the trick? I would expect that to compile into parallel code.
Otherwise, why go through all the pain of learning functional
programming (and convincing management)?

Stephen C. Gilardi

unread,

Dec 7, 2008, 11:17:08 AM12/7/08

to clo...@googlegroups.com

On Dec 7, 2008, at 10:11 AM, Peter Wolf wrote:

> Shouldn't the number of processors on the test machine make a big
> difference to how fast it runs? Whereas, the Java version is only
> dependent on the clock rate of the individual processors.

Replacing the "map" call with "pmap" on a 2 core machine improved my
time by about 6% (the code was already keeping both cores pretty busy).

On a 4 core machine it brought the time from 6.8 seconds down to 4.7.

On the same 4 core machine, the Java version posted by the original
poster runs in 1.3 seconds including JVM launch time.

I tried use clojure.parallel's preduce as well, but on my 4 core
machine it didn't affect the time.

--Steve

Peter Wolf

unread,

Dec 7, 2008, 12:02:21 PM12/7/08

to clo...@googlegroups.com

That's pretty encouraging! :-D

The language does work as advertised, but is still under development.
One shouldn't expect it to crush Java on speed, nor take full advantage
of multiple processors... yet.

Clojure is a language for the future, after all. It can only get
better. Whereas Java will always be a slave to clock-rate.

Reply all

Reply to author

Forward