Designing a Performance Vector Math library in Clojure.

112 views
Skip to first unread message

CuppoJava

unread,
Jan 13, 2009, 12:49:39 AM1/13/09
to Clojure
Hi,
I'm still getting used to the primitive support in Clojure, and I want
to confirm my understanding.

As far as I know, primitive support is only supported in local let
bindings, right?

So in designing a vector math library,

To translate from Java, one would write a class with primitive fields.

class Vector{
float x,y,z;
}

What would be the best choice in Clojure?

{:x 0 :y 0 :z 0}
I think a map would have too much overhead to retrieve and set keys.
Is that correct? Although this is the most straight-forward
conversion.

A 3-element vector
[0 0 0]
Still has the boxing/unboxing overhead. (Unless Clojure supports
primitives in collections, which I'm not sure about)

A Java array
(make-array Float/TYPE 0)
Is this the most efficient representation?
I don't really like this representation, cause I lose the named
fields, as well as the ability to put other types of data into the
same vector struct.

e

unread,
Jan 13, 2009, 1:25:13 AM1/13/09
to clo...@googlegroups.com
I hear about boxing a lot.  What is it?  Thanks.

CuppoJava

unread,
Jan 13, 2009, 1:38:13 AM1/13/09
to Clojure
Do you know Java? It'll be easier to explain if you do.

In Java, everything is an Object. Object's are treated as references
to a specific memory location which holds the actual data. So when you
pass an Object to a method, you're only passing the method the memory
location of that Object.

For performance reasons Java has primitives. It makes little sense to
pass a memory-location of an integer to a method, when the memory-
location itself takes up the same number of bits as an integer. So
primitives are passed to methods directly by value.

Unfortunately, most of the standard API is written around Objects.
Vectors, Lists, Maps, all are collections of Objects. If you wish to
put a primitive in a Vector, you must first create a Object that wraps
around the primitive.

ie. class Integer {
int i;
}

This lets you use primitives with the standard API.

The process of wrapping a primitive up into a object is called boxing.
The reverse is called unboxing. And when Java does it automatically
for you, it's called autoboxing.

Timothy Pratley

unread,
Jan 13, 2009, 1:55:45 AM1/13/09
to Clojure
> {:x 0 :y 0 :z 0}
> I think a map would have too much overhead to retrieve and set keys.
> Is that correct? Although this is the most straight-forward
> conversion.

How about a structmap?

http://clojure.org/data_structures
"StructMaps
Often many map instances have the same base set of keys, for instance
when maps are used as structs or objects would be in other languages.
StructMaps support this use case by efficiently sharing the key
information, while also providing optional enhanced-performance
accessors to those keys."

user=> (defstruct employee :name :id)
#'user/employee

user=> (struct employee "Mr. X" 10)
{:name "Mr. X", :id 10}


e

unread,
Jan 13, 2009, 2:00:49 AM1/13/09
to clo...@googlegroups.com
ahhhhh great explanation.  That will help a lot.  I found that annoying in Java.

Timothy Pratley

unread,
Jan 13, 2009, 2:18:15 AM1/13/09
to Clojure
BTW Rich,

The documentation http://clojure.org/data_structures hints that
accessors are faster than regular map lookups and provides the
following example:
(reduce (fn [n y] (+ n (:fred y))) 0 x)
-> 4999950000
(reduce (fn [n y] (+ n (fred y))) 0 x)
-> 4999950000

However I think it would be clearer if you explicitly showed the speed
improvement:
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 100.838989 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))
"Elapsed time: 48.82307 msecs"
4999950000


Regards,
Tim.

Chouser

unread,
Jan 13, 2009, 2:14:22 PM1/13/09
to clo...@googlegroups.com

With repeated runs, and my cpu frequency set to not change, I get very
little speed improvement. I increased the size of the example range
times 10 for these runs:

user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))

"Elapsed time: 112.961087 msecs"
499999500000


user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))

"Elapsed time: 102.127459 msecs"
499999500000

--Chouser

CuppoJava

unread,
Jan 13, 2009, 2:23:41 PM1/13/09
to Clojure
Do struct-maps also have boxing and unboxing overhead? Or does the JVM
optimize that away?

Chouser

unread,
Jan 13, 2009, 2:36:40 PM1/13/09
to clo...@googlegroups.com
On Tue, Jan 13, 2009 at 2:23 PM, CuppoJava <patrick...@hotmail.com> wrote:
>
> Do struct-maps also have boxing and unboxing overhead? Or does the JVM
> optimize that away?

Clojure collections currently store Objects, so they'll be boxed.

To store primitive values I think you'll have to use a Java array (but
only if all the primitives are the same type, of course) or create a
new Java class (gen-interface+proxy, gen-class, or .java).

If the Java array is an option for you, you can build up an API of
functions to keep things named and immutable.

--Chouser

Timothy Pratley

unread,
Jan 13, 2009, 5:54:19 PM1/13/09
to Clojure
> With repeated runs, and my cpu frequency set to not change, I get very
> little speed improvement. I increased the size of the example range
> times 10 for these runs:

For me there is a very clear speed improvement of at least 40%, doing
quite a lot of repeated runs. I can't run with range times 10 because
it crashes my REPL, so I guess it is quite machine/JVM specific as to
the relative performances. Here is a sample extract of how I tested
(more runs done in practice). I'm not sure how to "cpu frequency set
to not change" or how significant that is on my results.

;; first setting up the test and using normal get
user=> (defstruct desilu :fred :ricky)
#'user/desilu
user=> (def x (map (fn [n]
(struct-map desilu
:fred n
:ricky 2
:lucy 3
:ethel 4))
(range 100000)))
#'user/x
user=> (def fred (accessor desilu :fred))
#'user/fred
;; ignore the first one
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 1365.325055 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 91.447766 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 72.951146 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 78.106709 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 76.300491 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
"Elapsed time: 81.184388 msecs"
4999950000

;; switching to accessor, getting a consistent speed increase of about
40%
user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))
"Elapsed time: 53.602536 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))
"Elapsed time: 51.098607 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))
"Elapsed time: 51.512913 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (fred y))) 0 x))
"Elapsed time: 53.062693 msecs"
4999950000

;; I defined my own access method so that an accessor is not required,
;; however then you need to type hint which makes it too clumsy
;; performs very similar to an accessor, in theory slightly faster
;;public Object access(int i) throws Exception{
;; return vals[i];
;;}
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 53.376319 msecs"
4999950000
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 49.545614 msecs"
4999950000
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 49.348831 msecs"
4999950000
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 49.313406 msecs"
4999950000
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 51.485216 msecs"
4999950000
user=> (time (reduce (fn [n #^clojure.lang.PersistentStructMap y] (+ n
(.access y 0))) 0 x))
"Elapsed time: 50.024856 msecs"
4999950000

;; then I try running with 10x as you suggested
user=> (defstruct desilu :fred :ricky)
#'user/desilu
user=> (def x (map (fn [n]
(struct-map desilu
:fred n
:ricky 2
:lucy 3
:ethel 4))
(range 1000000)))
#'user/x
user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at jline.ConsoleRunner.main(ConsoleRunner.java:69)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.lang.reflect.Method.copy(Unknown Source)
at java.lang.reflect.ReflectAccess.copyMethod(Unknown Source)
at sun.reflect.ReflectionFactory.copyMethod(Unknown Source)
at java.lang.Class.copyMethods(Unknown Source)
at java.lang.Class.getMethods(Unknown Source)
at clojure.lang.Reflector.getMethods(Reflector.java:300)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:
27)
at clojure.main$repl__5506$fn__5509.invoke(main.clj:136)
at clojure.main$repl__5506$fn__5524.invoke(main.clj:154)
at clojure.main$repl__5506.doInvoke(main.clj:145)
at clojure.lang.RestFn.invoke(RestFn.java:426)
at clojure.main$repl_opt__5548.invoke(main.clj:208)
at clojure.main$legacy_repl__5573.invoke(main.clj:249)
at clojure.lang.Var.invoke(Var.java:327)
at clojure.main.legacy_repl(main.java:29)
at clojure.lang.Repl.main(Repl.java:20)
... 5 more
;; REPL crashed

Aaron Cohen

unread,
Jan 13, 2009, 5:59:36 PM1/13/09
to clo...@googlegroups.com
> user=> (defstruct desilu :fred :ricky)
> #'user/desilu
> user=> (def x (map (fn [n]
> (struct-map desilu
> :fred n
> :ricky 2
> :lucy 3
> :ethel 4))
> (range 1000000)))
> #'user/x
> user=> (time (reduce (fn [n y] (+ n (:fred y))) 0 x))
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
> Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at jline.ConsoleRunner.main(ConsoleRunner.java:69)
> Caused by: java.lang.OutOfMemoryError: Java heap space

I'm not sure how to do it in your configuration, but you need to as
-Xmx<SomeLargeNumber>M to your java invocation.

Timothy Pratley

unread,
Jan 13, 2009, 7:13:18 PM1/13/09
to Clojure
> ;; I defined my own access method so that an accessor is not required,
> ;; however then you need to type hint which makes it too clumsy
> ;; performs very similar to an accessor, in theory slightly faster

Actually there is a very simple way to make "by index" quite usable
without user type hints:

core.clj:
(defn get-field
"Accesses a struct-map field by index instead of name or accessor"
[#^clojure.lang.PersistentStructMap s #^Integer index]
(.access s index))

PersistentStructMap.java:
public Object access(int i) throws Exception{
return vals[i];
}


user=> (time (reduce (fn [n y] (+ n (get-field y 0))) 0 x))
"Elapsed time: 45.038806 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (get-field y 0))) 0 x))
"Elapsed time: 51.727498 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (get-field y 0))) 0 x))
"Elapsed time: 50.091357 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (get-field y 0))) 0 x))
"Elapsed time: 51.299742 msecs"
4999950000
user=> (time (reduce (fn [n y] (+ n (get-field y 0))) 0 x))
"Elapsed time: 51.106905 msecs"
4999950000

Just thought I'd mention it as it might be a preferable form over
accessors to some people (perhaps only myself *chuckle*).


Regards,
Tim.

Chouser

unread,
Jan 14, 2009, 11:01:46 AM1/14/09
to clo...@googlegroups.com
On Tue, Jan 13, 2009 at 5:54 PM, Timothy Pratley
<timothy...@gmail.com> wrote:
>
> I'm not sure how to "cpu frequency set
> to not change" or how significant that is on my results.

I was afraid that was too vague a reference, sorry.

My laptop usually slows down the CPU when its idle or nearly so.
Specifically, the linux kernel is using an "on demand" CPU frequency
scaler. This is nice for nearly every context except trying to run
benchmarks, in which case some runs of a test may be partially handled
while the CPU is in a slow and cool mode (800 MHz), while the rest is
in the fast and hot mode (2500 MHz), resulting in almost unusably
divergent timing results.

So when I remember to do so, I switch the kernel to using the
"performance" CPU frequency scaler before running my tests, and back
to "on demand" afterwards.

--Chouser

Reply all
Reply to author
Forward
0 new messages