Help me understand what part of this code is slow, and how to make it faster?

Didier

unread,

Nov 15, 2016, 10:39:43 PM11/15/16

to Clojure

Hey all,

I came upon a benchmark of F#, Rust and OCaml, where F# performs much faster then the other two. I decided for fun to try and port it to Clojure to see how Clojure does. Benchmark link: https://github.com/c-cube/hashset_benchs

This is my code for it: https://gist.github.com/didibus/1fd4c00b69d927745fbce3dcd7ca461a

(ns hash-set-bench
  "A Benchmark I modified to Clojure from:
   https://github.com/c-cube/hashset_benchs")

(defn iterNeighbors [f [i j]]
  (f [(dec i) j])
  (f [(inc i) j])
  (f [i (dec j)])
  (f [i (inc j)]))

(defn nth* [n p]
  (loop [n n s1 #{p} s2 #{}]
    (if (= n 0)
      s1
      (let [s0 (atom #{})]
        (letfn [(add [p]
                     (when (not (or (contains? s1 p) (contains? s2 p)))
                       (reset! s0 (conj @s0 p))))]
               (doseq [p s1] (iterNeighbors add p))
               (recur (dec n) @s0 s1))))))

#_(printf "result is %d" (count (time (nth* 2000 [0 0]))))

And here's the F# code: https://github.com/c-cube/hashset_benchs/blob/master/neighbors2.fsx

Currently, this takes about 30s in Clojure, while it only takes around 3s for OCaml, Rust and F#.

From what I see, the differences between my code and theirs are:

Lack of a Point struct, I'm just using a vector.
They use a mutable set, I don't.
They overrode Hashing for their point struct, as well as equality. I rely on Clojure's default hashing, and vector equality.

I'm not sure if any of these things should really impact performance that much though. And what I could do in Clojure if I wanted to improve it.

Any Help?

Thanks.

James Reeves

unread,

Nov 16, 2016, 9:09:27 AM11/16/16

to clo...@googlegroups.com

On 16 November 2016 at 03:39, Didier <did...@gmail.com> wrote:

Currently, this takes about 30s in Clojure, while it only takes around 3s for OCaml, Rust and F#.

From what I see, the differences between my code and theirs are:
Lack of a Point struct, I'm just using a vector.
They use a mutable set, I don't.
They overrode Hashing for their point struct, as well as equality. I rely on Clojure's default hashing, and vector equality.
I'm not sure if any of these things should really impact performance that much though. And what I could do in Clojure if I wanted to improve it.

It's the mutability of the set that's causing the greatest performance impact. You can't substitute in an immutable data structure into an algorithm that makes heavy use of a mutable structure, and expect it to have the same performance.

Something like this is more faithful to the original F# benchmark you posted:

(defn nth* [n p]

(loop [n n, s1 (java.util.HashSet. [p]), s2 (java.util.HashSet.)]

(if (= n 0)

s1

(let [s0 (java.util.HashSet.)]

(letfn [(add [p]

(when-not (or (.contains s1 p) (.contains s2 p))

(.add s0 p)))]

(doseq [p s1] (iter-neighbors add p))

(recur (dec n) s0 s1))))))

- James

Any Help?

Thanks.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jason Felice

unread,

Nov 16, 2016, 9:10:27 AM11/16/16

to clo...@googlegroups.com

I'll bet the atom accesses dominate the computation. They are a part of Clojures software transactional memory (STM) and have a cost.

Something like this (untested) should be faster:

(->> (iterate neighbors #{p}) (drop 1999) first)

neighbors could be:

(into #{} (for [[id jd] [[-1 0] [+1 0] [0 +1] [0 -1]]

[(+ i id) (+ j jd)]))

This creates some negligible lazy lists.

If you want to test actual time for hash inserts, though, you'll probably need to use transients or some such, and the code would become less idiomatic.

--

Alex Miller

unread,

Nov 16, 2016, 9:27:06 AM11/16/16

to Clojure

On Wednesday, November 16, 2016 at 8:10:27 AM UTC-6, Jason Felice wrote:

I'll bet the atom accesses dominate the computation. They are a part of Clojures software transactional memory (STM) and have a cost.

Atoms don't use the STM and if used in a single-threaded context like this, they will be uncontended, so the fastest case. That said, if you want something stateful, a volatile is faster in a single-threaded context.

Alex Miller

unread,

Nov 16, 2016, 9:31:46 AM11/16/16

to Clojure

In general, any benchmark code using math should be aware of boxing (old post here: http://insideclojure.org/2014/12/15/warn-on-boxed/).

I would recommend doing the work to leverage primitive longs and unchecked math. Generally this makes numeric code like this about 2 orders of magnitude faster.

Removing the stateful atom and converting to a purely functional form would probably be a win too, but it's hard to say.

Make sure to run it multiple times before looking at timings so that the JVM compiles and optimizes the code.

Michael Gardner

unread,

Nov 17, 2016, 12:42:06 AM11/17/16

to clo...@googlegroups.com

Below is the fastest version I tested, using ideas from the various responses in this thread. It runs in ~4s on my machine, compared with ~27s for the original version.

The biggest win by far was from James Reeves' suggestion of switching to Java's mutable HashSet. I'm not sure why; I'd thought that using transients and transducers with Clojure's native sets would provide comparable performance, but this was not true in my testing.

I also couldn't figure out how to avoid the reflection warning when invoking HashSet's constructor that takes a generic Collection, which is why that ugly doto is there in nth-neighbors. How would I avoid that?

(ns hash-set-bench
(import
[java.util HashSet]
[java.awt Point]))

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

(defn neighbors [^Point p]
(let [x (.-x p), y (.-y p)]
[(Point. x (inc y))
(Point. x (dec y))
(Point. (inc x) y)
(Point. (dec x) y)]))

(defn nth-neighbors [^long n ^Point p]
(loop [n n, s1 (doto (HashSet.) (.add p)), s2 (HashSet.)]
(if (zero? n) s1
(let [s0 (HashSet.)]
(doseq [_ s1, p (neighbors _)]

(when-not (or (.contains s1 p) (.contains s2 p))
(.add s0 p)))

(recur (dec n) s0 s1)))))

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to

> clojure+u...@googlegroups.com

> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "Clojure" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

Didier

unread,

Nov 21, 2016, 3:05:29 PM11/21/16

to Clojure

I experimented with this a lot, and took everyone's advice, and this is the fastest I got, even faster then Michael Gardner's answer.

(deftype point5 [^long i ^long j]
  Object
  (equals [this that] (and (= (.i this) (.i ^point5 that))
                           (= (.j this) (.j ^point5 that))))
  (hashCode [this] (+ (.i this) (* 4000 (.j this)))))

(defn iter-neighbors5 [f ^point5 p]
  (let [i ^long (.i p)
        j ^long (.j p)]
    (f (->point5 (dec i) j))
    (f (->point5 (inc i) j))
    (f (->point5 i (dec j)))
    (f (->point5 i (inc j)))))

(defn nth5 [n p]
  (loop [n ^long n
         s1 (HashSet. [p])


         s2 (HashSet.)]
    (if (zero? n)
      s1
      (let [s0 (HashSet.)]


        (letfn [(add [p]
                     (when (not (or (.contains s1 p) (.contains s2 p)))
                       (.add s0 p)))]
               (doseq [p s1] (iter-neighbors5 add p))
               (recur (dec n) s0 s1))))))

This is basically the closest I can get to the F#, Rust and OCaml solution. I create a minimal class where I overload equals and hashCode using the same logic the F# solution uses. It gives me equal performance to F# on my machine: 3s.

Here's the run down from slowest to fastest with a size of 100 using criterium. All of them I did my best to avoid reflection and boxed numeric and had enabled uncheck math. Granted I did not find that boxing and unchecked math really provided a drastic improvement in my case.:

Using a record to hold the point with a java HashSet - 826ms
Using a struct to hold the point with a java HashSet - 246ms
Using a volatile with a Clojure HashSet and vectors for point - 50ms
Using an atom with a Clojure HashSet and vectors for point - 51ms
Using java.awt Point with a java HashSet and add as a closure - 41ms
Using vectors with a java HashSet, but with add done inside doseq, without a closure - 11ms
Using vectors with a java HashSet and add as a closure - 9ms
Using java.awt Point with a java HashSet, but with add done inside doseq, without a closure (Michael Gardner's solution) - 7.5ms
Using deftype with overloaded equals and hashCode as points with a java HashSet, but with add done inside doseq, without a closure - 6ms
Using deftype with overloaded equals and hashCode as points with a java HashSet and add as a closure - 4ms

This is what I gathered from my experiments:

Using java's mutable HashSet instead of Clojure's persistent hashset gave a significant speed improvement. From 30s to 5s. This was the most significant speed boost.
Records are really slow, probably because of the time to instantiate them. So slow, that even though it was using a mutable HashSet, it was still slower then Clojure sets with vectors.
Similarly, structs are also pretty slow, which I'd guess is also because of instantiating them. Though they were faster then Records.
Volatile did not noticeably improve performance compared to Atom, but I learned about them today, did not know Clojure had that.
Unsafe Math and type hinting numeric to get rid of boxing did not noticeably improve performance, but I learned about it today too. Speaking of which, is ^long faster then ^int?
Vectors were as fast as java.awt.Point. Though for some reason, putting the java.awt.Point inside a closure slowed the whole thing down quite a bit. Where as without it, using a closure actually performed faster. Maybe I did something wrong here, since I find that a little surprising and confusing.
Defining my own type optimized for my specific use-case was the fastest. deftypes are lightweight, and instantiate quickly, and with the custom equals and hashCode tailored to the problem, they performed best, matching F# and beating out OCaml and Rust.

Andy Fingerhut

unread,

Nov 21, 2016, 3:44:21 PM11/21/16

to clo...@googlegroups.com

Not sure which version of Clojure you are using, but in all versions up to the latest 'official' release, Clojure 1.8.0, records have their hash value calculated every time they are needed, with no caching of the calculated value. The hash value is needed every time an operation is done in a Clojure set, and probably a java HashSet, too.

The latest alpha release, Clojure 1.9.0-alpha14, contains a performance improvement where records cache their hash value after it is first calculated. http://dev.clojure.org/jira/browse/CLJ-1224

I don't know if that change would make your version of the code using records and Java HashSet's competitive with your fastest code or not, but you may wish to try it.

Andy

--

You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

clojure+unsubscribe@googlegroups.com

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.

To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.

Steve Miner

unread,

Nov 21, 2016, 7:09:43 PM11/21/16

to clo...@googlegroups.com

On Nov 21, 2016, at 3:05 PM, Didier <did...@gmail.com> wrote:

I experimented with this a lot, and took everyone's advice, and this is the fastest I got, even faster then Michael Gardner's answer.

(deftype point5 [^long i ^long j] Object (equals [this that] (and (= (.i this) (.i ^point5 that)) (= (.j this) (.j ^point5 that)))) (hashCode [this] (+ (.i this) (* 4000 (.j this)))))

Java equals and hashCode are notoriously tricky to get right. Equals should handle any kind of “that” so you typically need to test instance? (basically, instanceOf in Java). Here are a couple of reference links:

http://stackoverflow.com/questions/27581/what-issues-should-be-considered-when-overriding-equals-and-hashcode-in-java

http://www.angelikalanger.com/Articles/JavaSolutions/SecretsOfEquals/Equals.html

In this case, I suggest you try something like this:

(equals [this that] (or (identical? this that)

(and (instance? point5 that)

(= (.i this) (.i ^point5 that))

(= (.j this) (.j ^point5 that)))))

It might be a little slower but it should be safer.

If you’re out for speed, you could also try unchecked-add, unchecked-multiply, unchecked-inc, and unchecked-dec where it's safe to ignore the chance of overflows. I mention these only because you’re trying to compete on benchmarks. Most of the time, you shouldn’t use them.

Didier

unread,

Nov 21, 2016, 8:03:13 PM11/21/16

to Clojure

I tried it with the safe equals, and it is slightly slower, but still faster then all others at 4.5ms. The non safe equals gives me 4s. Though this is now within my error margin. If ire-run quick-bench, I sometime get a mean equal for each, so I don't think the instance check adds that much overhead if any at all.

@miner: Doesn't using the flag (set! *unchecked-math* :warn-on-boxed) gives me unchecked math automatically? I was under the impression that +, -, /, * etc. would all now perform in an equal way to unchecked-add, etc. If not, what is the difference?

@Andy: I tried with the "1.9.0-alpha14" version, and Records were still just as slow as with "1.8.0". Maybe I'm using them wrong.

On Tuesday, 15 November 2016 19:39:43 UTC-8, Didier wrote:

Michał Marczyk

unread,

Nov 21, 2016, 11:21:26 PM11/21/16

to clojure

Some further optimizations for a factor of ~2.3 speed-up over nth5 as copy & pasted from upthread (6.713742 ms → 2.897630 ms) in

(c/quick-bench (nth-shell 100 (point. 0 0)))

(1) java.util.HashSet has a ctor that takes initial capacity of the set as an int. Passing in (* 4 (.size s1)) when constructing s0 – (HashSet. (* 4 (.size s1)) – gives me a fair speed-up on its own.

(2) Also, (java.util.HashSet. [p]) is a reflective call – replacing that with (doto (java.util.HashSet.) (.add p)) seems to result in a tiny, but measurable speed-up as well (to 5.245931 ms).

(3) Using an iterator results in a good speed-up, particularly in a size-based loop (counting down from (.size current-set) rather than calling (.hasNext current-iterator)).

(4) Finally, inlining all the visiting and switching to direct ctor calls also helped.

Final code:

(deftype point [^long i ^long j]

Object

(equals [this that]

(and (= (.i this) (.i ^point that))

(= (.j this) (.j ^point that))))

(hashCode [this]

(+ (.i this) (* 4000 (.j this)))))

(defn nth-shell [^long n p]

(loop [n n

s1 (doto (HashSet.) (.add p))

s2 (HashSet.)]

(if (zero? n)

s1

(let [s0 (HashSet. (* 4 (.size s1)))

i1 (.iterator s1)]

(dotimes [_ (.size s1)]

(let [^point p (.next i1)

i ^long (.i p)

j ^long (.j p)

p1 (point. (dec i) j)

p2 (point. (inc i) j)

p3 (point. i (dec j))

p4 (point. i (inc j))]

(if-not (or (.contains s1 p1) (.contains s2 p1))

(.add s0 p1))

(if-not (or (.contains s1 p2) (.contains s2 p2))

(.add s0 p2))

(if-not (or (.contains s1 p3) (.contains s2 p3))

(.add s0 p3))

(if-not (or (.contains s1 p4) (.contains s2 p4))

(.add s0 p4))))

(recur (dec n) s0 s1)))))

Also, to check that this still outputs the same neighbours the original impl (nth*) does:

(defn persistent-shell [shell]

(persistent!

(reduce (fn [out ^point p]

(conj! out [(.-i p) (.-j p)]))

(transient #{})

shell)))

(= (sort (persistent-shell (nth-shell 500 (point. 0 0))))

(sort (nth* 500 [0 0])))

Cheers,

Michał

--

Michał Marczyk

unread,

Nov 21, 2016, 11:27:28 PM11/21/16

to clojure

PS. Results for the original input on my box. Going by the the timings posted above, yours is rather beefier, so this is probably faster than the current F# version.

(c/quick-bench (nth-shell 2000 (point. 0 0)))

Evaluation count : 6 in 6 samples of 1 calls.

Execution time mean : 2.956519 sec

Execution time std-deviation : 22.423970 ms

Execution time lower quantile : 2.930938 sec ( 2.5%)

Execution time upper quantile : 2.978283 sec (97.5%)

Overhead used : 21.423788 ns

Steve Miner

unread,

Nov 22, 2016, 8:44:08 AM11/22/16

to clo...@googlegroups.com

On Nov 21, 2016, at 8:03 PM, Didier <did...@gmail.com> wrote:

@miner: Doesn't using the flag (set! *unchecked-math* :warn-on-boxed) gives me unchecked math automatically? I was under the impression that +, -, /, * etc. would all now perform in an equal way to unchecked-add, etc. If not, what is the difference?

Yes, you’re right about the *unchecked-math* flag. There shouldn’t be any difference.

Andy Fingerhut

unread,

Nov 22, 2016, 1:58:44 PM11/22/16

to clo...@googlegroups.com

More likely, records being just as slow with Clojure 1.9.0-alpha14 probably mean that recalculating of record hashes was not a significant amount of the time your program was taking. Thanks for trying it out.

Andy

--

Didier

unread,

Nov 22, 2016, 7:01:15 PM11/22/16

to Clojure

@Marczyk, I did try your improvements, and it shaved off 2 seconds, from 4s for the nth5 to 2s for your implementation.

I'm curious to try it one change at a time to see if any one of the changes was responsible for a bigger part, or if its an its of equal improvements that total up to a big speed boost.

On Tuesday, 15 November 2016 19:39:43 UTC-8, Didier wrote:

Didier

unread,

Nov 24, 2016, 4:49:32 AM11/24/16

to Clojure

Here's my findings:

Speed increase from most increase to least:
* Pre-sizing the HashSet - from 4.7ms to 3.7ms
* Inlining - from 4.7ms to 3.9ms
* Using point. constructor instead of ->point - from 2.4ms to 2ms
* Using non relfective HashSet init - from 2.36ms to 2.17ms
* Using iterator instead of doseq - from 2.17ms to 2.07ms

All in all, after this whole experiment, those would be my go to recommendations if wanting to optimize something in Clojure:

- If you are creating a lot of temporary structures use deftype for optimum performance, avoid records, and go for vectors and maps as a good middle ground.
- If you are adding a lot to a data-structure, resort to a mutable version for best performance.
- If using a mutable data-structure, try to pre-size them, if you plan on them growing a lot.
- If performance is critical, try inlining the functions you call most, things called from a loop are great candidates.
- Avoid using ->wtv style constructors, prefer wtv. instead for best performance.
- Avoid reflection and checked boxed math, especially if inside a loop.
- Try an iterator as the fastest way to iterate over a structure, though at this point, you're probably spending too much time optimizing.

On Tuesday, 15 November 2016 19:39:43 UTC-8, Didier wrote:

Alex Miller

unread,

Nov 24, 2016, 8:24:08 AM11/24/16

to Clojure

That seems like a decent list. Also sounds like most of the Clojure Alioth contributions. :)

Reply all

Reply to author

Forward