Request for help optimising a Clojure program


Paul Butcher

Oct 22, 2013, 12:28:33 PM
to clo...@googlegroups.com
I've been playing around with a generalised version of the N-Queens problem that handles other chess pieces. I have a Scala implementation that uses an eager depth-first search. Given enough RAM (around 12G - it's very memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5 minutes on my MacBook Pro.

I then created a Clojure version of the same algorithm with the intention of (eventually) using laziness to avoid the memory issue. However, the performance of the Clojure version is several orders of magnitude worse than the Scala version (I've left it running overnight on the problem that the Scala version solves in 2.5 minutes and it still hasn't completed).

I'm struggling to see why the performance of the Clojure version is so much worse than the Scala. Profiling in YourKit suggests that it's spending 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain why. I would be grateful for any insight.

The code for the two different versions is on GitHub:


Notes:

- I know that an exhaustive depth-first search isn't a great way to tackle this problem. But I'd like to understand why I see such a dramatic difference between the performance of the Scala and Clojure versions.

- I believe that I've implemented the same algorithm in both Scala and Clojure - certainly they both generate the same results for small problems (there are 3x3 and 4x4 problems commented out in the source). But clearly I can't rule out the possibility that I've made a mistake in the Clojure implementation. If anyone can spot my stupid mistake, I'd greatly appreciate it.

Thanks in advance for any insight that anyone can offer.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: pa...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

Mark Engelberg

Oct 22, 2013, 1:45:01 PM
to clojure
I looked briefly at the code and can confirm that to my eye, the two implementations appear to be implementing the same algorithm.

My first guess would be that the performance difference comes from Clojure's use of boxed numbers for all the positions.  Possibly you could get better performance by using something like (defrecord Posn [^int x ^int y]) rather than vectors for the positions.  It's tricky though, in Clojure, to make sure that the numbers remain primitives everywhere they are used.

This hypothesis doesn't fit, however, with your observation that most of the time is spent in conjing things into sets.  I can't explain why that would be a bottleneck.

David Nolen

Oct 22, 2013, 2:55:34 PM
to clojure
I note that the Clojure version isn't being given 12 gigs of RAM. Is this something you're giving to the JVM when you run an AOT-compiled version of the Clojure code?

You also haven't said whether the timings for the smaller problems are significantly different between Scala and Clojure.

David


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Paul Butcher

Oct 22, 2013, 3:09:35 PM
to clo...@googlegroups.com
On 22 Oct 2013, at 18:45, Mark Engelberg <mark.en...@gmail.com> wrote:

I looked briefly at the code and can confirm that to my eye, the two implementations appear to be implementing the same algorithm.

Thanks - always good to have a second pair of eyes :-)

My first guess would be that the performance difference comes from Clojure's use of boxed numbers

That was my first thought too - but I thought that I'd look at a profile first :-)

Paul Butcher

Oct 22, 2013, 3:11:46 PM
to clo...@googlegroups.com
On 22 Oct 2013, at 19:55, David Nolen <dnolen...@gmail.com> wrote:

I note that the Clojure version isn't being given 12 gigs of RAM. Is this something you're giving to the JVM when you run an AOT-compiled version of the Clojure code?

Yeah - I have tried giving it more RAM without any effect on the timing whatsoever. And I couldn't see the point of stopping people with less RAM than that from being able to run it :-)

You also haven't said whether the timings for the smaller problems are significantly different between Scala and Clojure.

They both run in less than a second in Clojure and in Scala, so no significant difference.

David Nolen

Oct 22, 2013, 3:20:24 PM
to clojure
On Tue, Oct 22, 2013 at 3:11 PM, Paul Butcher <pa...@paulbutcher.com> wrote:
Yeah - I have tried giving it more RAM without any effect on the timing whatsoever. And I couldn't see the point of stopping people with less RAM than that from being able to run it :-)

But without enough RAM most JVMs will thrash in GC when running that code.

Saying both run in less than a second is not particularly informative for the small problems :) Does Scala take 4ms and Clojure 400ms on the small problems? That's a big difference.

David

Paul Butcher

Oct 23, 2013, 5:09:12 AM
to clo...@googlegroups.com
On 22 Oct 2013, at 20:20, David Nolen <dnolen...@gmail.com> wrote:

On Tue, Oct 22, 2013 at 3:11 PM, Paul Butcher <pa...@paulbutcher.com> wrote:
Yeah - I have tried giving it more RAM without any effect on the timing whatsoever. And I couldn't see the point of stopping people with less RAM than that from being able to run it :-)

But without enough RAM most JVMs will thrash in GC when running that code.

Yes, of course. But as I said, I tried giving it more RAM (the same 12G as the Scala solution) and it made no difference whatsoever. In fact, YourKit shows me that, even after running for an hour (i.e. more than 10x longer than the Scala version takes to return the full solution) the Clojure version is still using less than 2G. Whatever is going on, it's not a result of a lack of RAM.

Saying both run in less than a second is not particularly informative for the small problems :) Does Scala take 4ms and Clojure 400ms on the small problems? That's a big difference.

Yes, of course. But my experience with small problems has been that differences in execution time, especially between runtimes as dissimilar as Clojure's and Scala's, are meaningless. Those differences tend to have more to do with startup costs and the vagaries of when HotSpot happens to optimise things than anything fundamental.

But you asked, so here it is:

For Clojure, the 3x3 problem takes 7ms and the 4x4 problem takes 78ms. For Scala, it's 15ms and 252ms respectively.


David Nolen

Oct 23, 2013, 7:21:01 AM
to clo...@googlegroups.com
Those numbers make the larger problem runtime suspect. How are you running the Clojure version? With Lein?

Paul Butcher

Oct 23, 2013, 7:22:35 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 12:21, David Nolen <dnolen...@gmail.com> wrote:

Those numbers make the larger problem runtime suspect. How are you running the Clojure version? With Lein?

Yes - lein run.

Timothy Baldridge

Oct 23, 2013, 7:44:25 AM
to clo...@googlegroups.com
Lein does some odd stuff to the JVM when it runs. Try adding ^:replace to jvm-opts:

:jvm-opts ^:replace [....]

That being said, the #1 rule of benchmarking with lein is don't benchmark with lein. The JVM lein spins up has a different set of goals (namely startup time). The best way to benchmark is to run the test several thousand times, from a jar created with lein uberjar. 

Timothy





--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

David Nolen

Oct 23, 2013, 7:45:13 AM
to clo...@googlegroups.com
Lein doesn't use your specified JVM settings unless you override them with ^:replace. It's also a good idea to set "-server", as Lein does not default to this.



Paul Butcher

Oct 23, 2013, 8:09:05 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 12:44, Timothy Baldridge <tbald...@gmail.com> wrote:

That being said, the #1 rule of benchmarking with lein is don't benchmark with lein. The JVM lein spins up has a different set of goals (namely startup time). The best way to benchmark is to run the test several thousand times, from a jar created with lein uberjar. 

I'm well aware of the issues associated with micro-benchmarking on the JVM. Bear in mind that what I'm doing isn't micro-benchmarking. I have a Scala program that runs in 2.5 minutes, which when re-implemented in Clojure doesn't finish after running overnight. I.e. the Clojure version is (at least) 2 orders of magnitude slower than the Scala one.

I'd love to be able to run it just once - running it thousands of times with its current performance would require *years*.

I've now got it recompiled as an uberjar and I'm running it with 12G RAM (-Xms12G -Xmx12G). So far, it's been running for 12 minutes (i.e. 6x longer than the Scala version takes to complete) and YourKit shows me that it's used less than 300MB RAM (i.e. the vast majority of the 12G is unused).

YourKit also shows me that it's still the case that it's spending 99% of its time in PersistentHashSet.cons.

I completely understand that playing around with pre-compilation and JVM settings will make a difference to the observed performance. But not two orders of magnitude difference, surely?

It strikes me that there's clearly something that I'm missing in the Clojure implementation that's making a dramatic difference to performance. I'd love some help determining what that something is.

Timothy Baldridge

Oct 23, 2013, 8:18:19 AM
to clo...@googlegroups.com
Great! You have a profiler - use it. Find the hotspots, use YourKit to find where the .cons is being called from, find things to optimize, and go from there. This is exactly the same process I would use for any optimization I attempted.

Timothy



Timothy Baldridge

Oct 23, 2013, 8:21:13 AM
to clo...@googlegroups.com
This code will have a ton of allocations: 

(defn solve [rows cols pieces]
  (if (empty? pieces)
    #{#{}}
    (set
      (let [candidate (first pieces)]
        (for [solution (solve rows cols (rest pieces))
              x (range 0 cols)
              y (range 0 rows)
              :when (allowed? candidate [x y] solution)]
          (conj solution [candidate [x y]]))))))
Every item in the cons will require an allocation. After that, the entire thing has to be turned into a set. This entire thing is also recursive, so that compounds the problem. Perhaps re-write this as a reducer?

Timothy

Timothy Baldridge

Oct 23, 2013, 8:22:24 AM
to clo...@googlegroups.com
Eh, I said cons, but it's 6:30 in the morning. I meant the for's lazy seq requires at least one allocation per item. Your time spent in PersistentHashSet.cons is probably due to the use of set. 

Timothy

Paul Butcher

Oct 23, 2013, 8:36:37 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 13:18, Timothy Baldridge <tbald...@gmail.com> wrote:

Great! You have a profiler - use it. Find the hotspots, use YourKit to find where the .cons is being called from, find things to optimize, and go from there. This is exactly the same process I would use for any optimization I attempted.

I fear I may have failed to convey the question I'm trying to answer.

I'm sure that I could create a faster solution in Clojure - that's not the question I'm trying to answer though.

What I'm trying to answer is why the *exact same* algorithm implemented in Scala is 1000x faster.

As far as I know, the Scala version of the algorithm creates exactly as many sets and performs exactly as many set operations as the Clojure version does. But the Clojure version is 1000x slower. That strikes me as very strange and worth getting to the bottom of.

I have, of course, looked at the result of the profiler. And what it seems to be saying is that set operations in Clojure are ruinously slow. I'm not sure that I believe that though - I can't think of any reason why Clojure's sets should be 1000x slower than Scala's? So I'm asking for the help of this list.

Of course, I can't rule out the possibility that I've failed to convert the Scala version to Clojure and they're actually implementing very different algorithms - but I've had several people look at the implementations and confirm that they appear to be the same. Nevertheless, if there is a problem there, I'd also be interested to find it as I'm sure that it will teach me something.

Timothy Baldridge

Oct 23, 2013, 9:10:33 AM
to clo...@googlegroups.com
From what I can tell, Clojure's set function builds a persistent hash set one element at a time. This can be fixed by replacing set with (into #{} data). into uses transients internally and should be much faster.

I don't know Scala, but from what I can tell, Scala is using mutable collections. I think the combination of these two things (immutable collections plus lack of transients) could be causing some performance impact. I wouldn't expect it to be 1000x, but it's a starting place.

Timothy



David Nolen

Oct 23, 2013, 9:18:11 AM
to clo...@googlegroups.com
If set construction were 1000x worse, why don't the smaller problem sizes exhibit exactly the same issue? If the Scala version requires 12G, why is the Clojure version steady at 300M?

Aren't Scala for comprehensions optimized now into lower level loops? Clojure for comprehensions generate lazy sequences and all the allocation and forcing of thunks that implies.

David


David Nolen

Oct 23, 2013, 9:19:05 AM
to clo...@googlegroups.com
The Scala code is using immutable sets.

Paul Butcher

Oct 23, 2013, 10:00:05 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 14:18, David Nolen <dnolen...@gmail.com> wrote:

If set construction were 1000x worse, why don't the smaller problem sizes exhibit exactly the same issue? If the Scala version requires 12G, why is the Clojure version steady at 300M?

Both excellent questions.

Aren't Scala for comprehensions optimized now into lower level loops? Clojure for comprehensions generate lazy sequences and all the allocation and forcing of thunks that implies.

I don't think so - AFAIK Scala's "for" statement is rewritten into flatMap and map under the hood. Happy to be corrected though.

Paul Butcher

Oct 23, 2013, 10:00:28 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 14:19, David Nolen <dnolen...@gmail.com> wrote:

The Scala code is using immutable sets.

Correct - the Scala version is immutable throughout.

David Powell

Oct 23, 2013, 10:32:48 AM
to clo...@googlegroups.com

When you say it is spending 99% of its time in PersistentHashSet.cons, is that the time spent in just that method, or the time spent in that method and the methods that it calls?

Given that (set ...) is one of the first things called by (solve ...), and that its contents are produced lazily by a for comprehension, the total time attributed to PersistentHashSet.cons will include all of that lazily-forced work, so it may well look particularly slow. Not that that solves the problem, but I was wondering if PersistentHashSet.cons might be a red herring.

Also, out of interest, are you using a sampling profiler, or an instrumenting one?


Sometimes the built-in hprof profiler is good for this sort of thing:

java -Xrunhprof:cpu=times,depth=4 -Xmx2g -jar target\chess-clojure-0.1.0-SNAPSHOT-standalone.jar

  - you can use either cpu=times for instrumenting, or cpu=samples for sampling.

It is nice because the depth parameter lets you find hotspots including a few steps up the stacktrace (4 in this case). And you can use Ctrl-\ / Ctrl-Break / SIGQUIT to get it to dump the current stats while the program is still running, to see if some methods are getting worse over time.




Paul Butcher

Oct 23, 2013, 10:47:51 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 15:32, David Powell <djpo...@djpowell.net> wrote:

When you say it is spending 99% of its time in PersistentHashSet.cons, is that the time spent in just that method, or the time spent in that method and the methods that it calls?

The latter.

Given that (set ...) is one of the first things called by (solve ...), and that its contents are produced lazily by a for comprehension, the total time attributed to PersistentHashSet.cons will include all of that lazily-forced work, so it may well look particularly slow. Not that that solves the problem, but I was wondering if PersistentHashSet.cons might be a red herring.

Ah - that's entirely plausible. I have basically no experience of profiling Clojure, or understanding the way in which laziness affects doing so (I suspect that it complicates things somewhat?).

Also, out of interest, are you using a sampling profiler, or an instrumenting one?

Sampling.

Sometimes the built-in hprof profiler is good for this sort of thing:

java -Xrunhprof:cpu=times,depth=4 -Xmx2g -jar target\chess-clojure-0.1.0-SNAPSHOT-standalone.jar

  - you can use either cpu=times for instrumenting, or cpu=samples for sampling.

It is nice because the depth parameter lets you find hotspots including a few steps up the stacktrace (4 in this case). And you can use Ctrl-\ / Ctrl-Break / SIGQUIT to get it to dump the current stats while the program is still running, to see if some methods are getting worse over time.

I've not used hprof before - I'll see if I can divine anything useful from it.

Thanks.

Mark Engelberg

Oct 23, 2013, 11:46:29 AM
to clojure
For those of you playing along at home, I also ran Paul's chess program in YourKit with the sampling profiler.  Here are all the calls that appear in the "HotSpots" view, along with time spent in those methods.

java.io.PushbackInputStream.read() 422977
clojure.lang.PersistentHashMap$HashCollisionNode.findIndex(Object) 206687
clojure.lang.Util.equiv(Object, Object) 204984
clojure.lang.PersistentHashMap$BitmapIndexedNode.find(int, int, Object, Object) 145968
clojure.lang.PersistentHashMap$NodeSeq.create(Object[]) 81031
clojure.lang.APersistentVector.doEquiv(IPersistentVector, Object) 41656
clojure.lang.Util.dohasheq(IHashEq) 31453
clojure.lang.RT.next(Object) 28671
clojure.lang.PersistentHashMap$NodeSeq.next() 24484
clojure.lang.PersistentVector$1.<init>(PersistentVector, int, int) 24015

I concur with Paul that it looks like the program is spending nearly all its time mired in hashing.  Even the calls on here that aren't obviously related to hashing turn out on further inspection to be related to hashing.

For example, doing a backtrace on Util.equiv, one sees that the = operation in the allowed? function takes negligible time, and the equiv method's time is almost entirely spent in hash set lookup resolution. Similarly, RT.next spends negligible time stepping through his for sequence, and appears to be almost entirely about stepping through hash collision nodes.

My best guess at this point is that there is some sort of bug with the set implementation causing too many items to hash to the same bucket.  I have seen this behavior before from time to time when profiling some of my own code (a suspiciously inordinate amount of time spent resolving hash collisions) but have never been able to pin it down with such a pure example.

I'll admit that my skills in reading and making sense of profiler output are pretty weak, but it looks to me that that's what is going on here.

Paul Butcher

Oct 23, 2013, 11:49:51 AM
to clo...@googlegroups.com
On 23 Oct 2013, at 15:47, Paul Butcher <pa...@paulbutcher.com> wrote:

I've not used hprof before - I'll see if I can divine anything useful from it.

So I've run hprof and got some results. I'm not sure exactly how to interpret them though. Attached is the output from hprof containing two dumps - one after the program's been running for around 3 minutes and another after it's been running 40 minutes. 

If I'm reading it right, it's saying that the two most time-consuming methods are Object.wait and ReferenceQueue.remove? After that, though, there's a long list of APersistentSet methods, which would suggest that set operations are where most of the time is being spent?

But I'd greatly appreciate insight from someone used to reading hprof dumps...

java.hprof.txt.zip

Andy Fingerhut

Oct 23, 2013, 12:06:52 PM
to clo...@googlegroups.com
Mark, I think you have hit the nail on the head.  I have instrumented a copy of Paul's Clojure program to print the hash code of all of the solutions in the set returned by solve, and there are *many* pairs of solutions that have identical hash values, and thus collide in a PersistentHashSet.  Below is an excerpt from that output:

    97392280 = (hash #{[:N [2 0]] [:R [4 2]] [:K [0 0]] [:Q [1 3]] [:B [3 0]]})
    97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [1 2]] [:Q [4 3]] [:B [2 0]]})
    97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [2 1]] [:Q [1 4]] [:B [4 0]]})
    97392280 = (hash #{[:N [4 0]] [:K [0 0]] [:R [3 3]] [:Q [1 2]] [:B [2 0]]})
    97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [4 0]]})
    97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [4 0]]})
    97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [3 0]]})
    97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [3 0]]})
    97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:Q [4 2]] [:R [2 3]] [:B [1 0]]})
    97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:R [4 2]] [:Q [2 3]] [:B [1 0]]})
    97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:Q [4 2]] [:R [2 3]] [:B [0 0]]})
    97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:R [4 2]] [:Q [2 3]] [:B [0 0]]})
    97392280 = (hash #{[:K [4 0]] [:N [0 0]] [:Q [3 2]] [:R [1 3]] [:B [2 0]]})
    97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [3 2]] [:Q [0 3]] [:B [2 0]]})
    97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [2 1]] [:Q [3 4]] [:B [0 0]]})
    97392280 = (hash #{[:N [2 0]] [:K [4 0]] [:R [0 2]] [:Q [3 3]] [:B [1 0]]})

The hash value of a set is the sum of the hashes of its elements, because sets are not ordered.  The hash of a vector is dependent on the order of the elements.  In fact, for 2-element vectors it is 31*(hash of 1st elem) + (hash of 2nd elem).  The hash value of a small integer is just the integer value itself.

Thus with many small integers in 2-element vectors like in this problem, all it takes is having a set of vectors where the first elements sum to the same value, and the second elements sum to the same value, and the hash of the whole set of vectors will be equal.
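Andy's arithmetic can be sketched numerically. This is a rough model, not Clojure's actual implementation: the pair hash is approximated as 31*a + b (ignoring the additive constant and the piece keywords), and the set hash as the plain sum of element hashes, as described above. The coordinates are taken from the first two colliding solutions in the output.

```python
# Rough numeric model of the hashing described above (an illustration,
# not Clojure's actual implementation): small ints hash to themselves,
# a 2-element vector hashes to roughly 31*h(first) + h(second), and a
# set hashes to the sum of its elements' hashes.

def pair_hash(a, b):
    return 31 * a + b

def set_hash(pairs):
    return sum(pair_hash(a, b) for a, b in pairs)

# Two distinct piece placements (coordinates only, piece names dropped).
# The x coordinates sum to 10 in both and the y coordinates sum to 5 in
# both, so the set hashes are forced to collide: 31*10 + 5 = 315.
s1 = {(2, 0), (4, 2), (0, 0), (1, 3), (3, 0)}
s2 = {(3, 0), (0, 0), (1, 2), (4, 3), (2, 0)}
print(s1 == s2, set_hash(s1) == set_hash(s2))  # False True: different sets, same hash
```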

Exactly what would be the best way to improve the hash code for situations like this, I am not sure yet.

Andy



Paul Butcher

Oct 23, 2013, 12:18:57 PM
to clo...@googlegroups.com
On 23 Oct 2013, at 17:06, Andy Fingerhut <andy.fi...@gmail.com> wrote:

I have instrumented a copy of Paul's Clojure program to print the hash code of all of the solutions in the set returned by solve, and there are *many* pairs of solutions that have identical hash values

Aha! The smoking gun :-)

Very many thanks, Andy.

Andy Fingerhut

Oct 23, 2013, 12:43:27 PM
to clo...@googlegroups.com
Paul, your function solve returns a set of solutions, but nothing in the program seems to rely on being able to quickly test whether a particular solution is in such a set. Returning a sequence from solve is much faster, since it avoids the PersistentHashSet hash collision issue entirely.

Just replace #{#{}} with [#{}], and remove the call to 'set' in solve.

Andy


Paul Butcher

Oct 23, 2013, 1:13:49 PM
to clo...@googlegroups.com
On 23 Oct 2013, at 17:43, Andy Fingerhut <andy.fi...@gmail.com> wrote:

Paul, your function solve returns a set of solutions, but nothing in the program seems to rely on being able to quickly test whether a particular solution is in such a set. Returning a sequence from solve is much faster, since it avoids the PersistentHashSet hash collision issue entirely.

Just replace #{#{}} with [#{}], and remove the call to 'set' in solve.

The problem with that, Andy, is that it will result in a lot of duplicates. For the 4x4 problem, for example, that approach raises the size of the returned collection from 8 to 384.

My guess is that for the 6x9 problem it will just remove the set handling problem and replace it with a memory/time issue as the search space grows (and I only have 16G in my MacBook Pro :-)

I don't need to get this working - I only knocked it together as an interesting exercise. And if it's helped to track down an issue in the Clojure runtime, it's already achieved far more than I expected it to :-)

Thanks again,

Andy Fingerhut

Oct 23, 2013, 1:15:46 PM
to clo...@googlegroups.com
One possibility of a change to PersistentHashSet and PersistentHashMap that would help avoid these cases with pathologically bad performance:

If we had a 'universal comparator', i.e. a comparison function that provided a total order on any pair of values that anyone would ever want to put into a set or use as a map key, then instead of having linked lists for values that collide, we could have trees like those in the implementations of sorted-maps and sorted-sets today.  Such a comparison function might look something like 'cc-cmp' a couple of pages down from this link:

    https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-that-work-between-different-types

However, I believe there will always be some types where such a comparator will fail, and thus not truly be 'universal'.  Still, if it hit the sweet spot of 99.99% of the use cases people cared about, it might be good to have.

Andy

Andy Fingerhut

Oct 23, 2013, 1:18:11 PM
to clo...@googlegroups.com
Ah, right you are.  I missed that.

I think the collisions possible with PersistentHashSet and PersistentHashMap were 'well known', at least to a subset of people.  What is probably less well known is a nice small program that caused many collisions with today's hash functions, and an explanation of why.  Thanks for that.

Andy


Paul Butcher

Oct 23, 2013, 1:28:41 PM
to clo...@googlegroups.com
On 23 Oct 2013, at 18:15, Andy Fingerhut <andy.fi...@gmail.com> wrote:

If we had a 'universal comparator', i.e. a comparison function that provided a total order on any pair of values that anyone would ever want to put into a set or use as a map key, then instead of having linked lists for values that collide, we could have trees like those in the implementations of sorted-maps and sorted-sets today.

Wouldn't it be better to improve the way that hashes are calculated for vectors? A good hash function should make it unlikely that similar values have the same hash. The current algorithm seems to make that more likely than it should?

Andy Fingerhut

Oct 23, 2013, 1:30:32 PM
to clo...@googlegroups.com
If you can think of a different hash function for vectors that doesn't lead to these types of collisions, I'm all ears.  The issue is that the hash function for sets adds up the hash values of its elements.  Those sums of vector hash values are what are colliding, not the individual vector hash values themselves.

Andy


David Nolen

Oct 23, 2013, 1:37:54 PM
to clojure
The problem here is that you're not following your Scala solution closely enough. I suspect that if you used defrecords to represent the pieces, the way you used a class in Scala, you could avoid many of the collisions for larger problems.

Not tested much but I tried something like the following and I no longer see hash collisions dominating in YourKit:

(defrecord Piece [name pos])

(defn piece [name  pos]
  (Piece. name pos))

(defn takes? [name [dx dy]]
  (case name
    :K (and (<= dx 1) (<= dy 1))
    :Q (or (zero? dx) (zero? dy) (== dx dy))
    :R (or (zero? dx) (zero? dy))
    :B (== dx dy)
    :N (or (and (== dx 1) (== dy 2)) (and (== dx 2) (== dy 1)))))

(defn allows? [{name :name [px py] :pos} [x y]]
  (let [delta [(Math/abs (int (- px x))) (Math/abs (int (- py y)))]]
    (and (not (and (== px x) (== py y))) (not (takes? name delta)))))

(defn allowed? [{cname :name cpos :pos :as cand} solution]
  (every?
    (fn [{pos :pos :as piece}] 
      (and (allows? piece cpos) (allows? cand pos)))
    solution))

(defn solve [rows cols pieces]
  (if (empty? pieces)
    #{#{}}
    (set
      (let [name (first pieces)]
        (for [solution (solve rows cols (rest pieces))
              x (range 0 cols)
              y (range 0 rows)
              :let [cand (piece name [x y])]
              :when (allowed? cand solution)]
          (conj solution cand))))))

David

Pablo Nussembaum

Oct 23, 2013, 1:39:48 PM
to clo...@googlegroups.com
What do you think of adding odd elements and subtracting even ones?

[1 2] = 1 - 2 = -1
[2 1] = 2 - 1 = 1

Mark Engelberg

unread,
Oct 23, 2013, 2:02:31 PM10/23/13
to clojure
On Wed, Oct 23, 2013 at 10:39 AM, Pablo Nussembaum <bau...@gmail.com> wrote:
What do you think of adding odd elements and subtracting even ones?

[1 2] = 1 - 2 = -1
[2 1] = 2 - 1 = 1

Wouldn't multiplying the hash values of the items in a set (rather than adding) result in far fewer accidental collisions from similar items?

Andy Fingerhut

unread,
Oct 23, 2013, 2:06:28 PM10/23/13
to clo...@googlegroups.com
That makes them each different, but if you put them in a set together, like #{[1 2] [2 1]}, the hash of the set is 1-1=0 no matter what order those vectors are put into the set.

That is the difficulty.



Michael Gardner

unread,
Oct 23, 2013, 2:44:55 PM10/23/13
to clo...@googlegroups.com
On Oct 23, 2013, at 12:30 , Andy Fingerhut <andy.fi...@gmail.com> wrote:

> If you can think of a different hash function for vectors that doesn't lead to these types of collisions, I'm all ears. The issue is that the hash function for sets adds up the hash values of its elements. Those sums of vector hash values are what are colliding, not the individual vector hash values themselves.

What about the formula used by Boost's hash_combine?

http://stackoverflow.com/questions/4948780/magic-number-in-boosthash-combine
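For anyone who doesn't want to follow the link, Boost's hash_combine folds each element's hash into a running seed roughly like this (a direct Java transliteration of the formula, for illustration; note that the result depends on the order in which elements are combined):

```java
public class HashCombine {
    // Boost-style hash_combine: mixes h into seed using the golden-ratio
    // constant 0x9e3779b9 plus two shifts of the current seed.
    static int hashCombine(int seed, int h) {
        return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    public static void main(String[] args) {
        // Combining [1, 2] vs [2, 1] gives different results:
        int ab = hashCombine(hashCombine(0, 1), 2);
        int ba = hashCombine(hashCombine(0, 2), 1);
        System.out.println(ab + " " + ba);
    }
}
```

The order-sensitivity is fine for vectors but means the formula can't be dropped directly into an unordered collection's hash.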

Mark Engelberg

unread,
Oct 23, 2013, 3:10:21 PM10/23/13
to clojure
I see how changing the hash function for vectors/records/maps would be useful, because if their hash values were more widely distributed, then adding up those hash values would be less likely to yield the same sum.  But wouldn't the simplest solution be to change the hashing strategy for sets to be something more complicated than a sum?


Mark Engelberg

unread,
Oct 23, 2013, 3:34:32 PM10/23/13
to clojure
Another example of why this has more to do with the hashing of sets, than underlying elements:
=> (hash #{#{1 2} 3})
6
=> (hash #{#{1 3} 2})
6

Michael Gardner

unread,
Oct 23, 2013, 4:28:37 PM10/23/13
to clo...@googlegroups.com
The hash-combining function for sets must be commutative. But in order for the hashes of #{#{1 2} 3} and #{#{1 3} 2} to be unequal, it must also be non-associative. That's possible, but I'm not sure what would be a good candidate function. Something involving averaging, perhaps?

Mark Engelberg

unread,
Oct 23, 2013, 6:03:33 PM10/23/13
to clojure
It is true that it must be commutative, but not true that it must be non-associative.  Here is a sample implementation of a hash function for sets that is commutative and associative but doesn't collide for these sorts of common rearrangements.  The idea of this proof of concept is that the hash value of a set is the sum of (expt 3 (abs (hash item))) for each item in the set.

Implementation:

;; First a wrap-around implementation of expt
(defn- expt [base pow]
  (if (zero? pow)
    1
    (loop [n (int pow), y (int 1), z (int base)]
      (let [t (even? n), n (quot n 2)]
        (cond
          t (recur n y (unchecked-multiply-int z z))
          (zero? n) (unchecked-multiply-int z y)
          :else (recur n (unchecked-multiply-int z y) (unchecked-multiply-int z z)))))))

(declare my-hash)

(defn set-hash [coll]
  (reduce + (for [item (seq coll)] (expt 3 (my-hash item)))))

(defn my-hash [i]
  (if (set? i) (set-hash i) (hash i)))

=> (my-hash #{#{1 2} 3})
531468
=> (my-hash #{#{1 3} 2})
-1010140990

=> (my-hash #{#{1 3} #{2 4}})
-2065039966
=> (my-hash #{#{1 2} #{3 4}})
-868484254

=> (my-hash #{[1 2] [2 1]})
-3497906806
=> (my-hash #{[1 1] [2 2]})
-443882362


So this is certainly a solvable problem; the real question is how to go about designing such a hash function around a smaller number of simple arithmetic operations.  Also, the above implementation is kind of lame because that expt function treats the hash value of the item as a positive number, effectively cutting the possible hash values in half.  So this implementation isn't practical, but it should serve as an illustration of what could be done.

Michael Gardner

unread,
Oct 23, 2013, 6:39:23 PM10/23/13
to clo...@googlegroups.com
On Oct 23, 2013, at 17:03 , Mark Engelberg <mark.en...@gmail.com> wrote:

> It is true that it must be commutative, but not true that it must be non-associative. Here is a sample implementation of a hash function for sets that is commutative and associative but doesn't collide for these sorts of common rearrangements. The idea of this proof of concept is that the hash value of a set is the sum of (expt 3 (abs (hash item))) for each item in the set.

Associativity is only meaningful for binary operators-- I was assuming a binary hash-combining function that would be applied via reduce or such. The function you've described can't be implemented as a binary operator without changing its behavior, so it can't be described as associative. But you're right that Set's hash function doesn't need to be associative, since it doesn't actually need to be a binary operator.

BTW, while looking into this I discovered that Clojure already defines a function clojure.lang.Util/hashCombine that uses Boost's hash-combining algorithm, but it doesn't seem to be used for anything except Symbols at the moment.

Mark Engelberg

unread,
Oct 23, 2013, 6:51:18 PM10/23/13
to clojure
OK, here is a more serious proposal, comprised of simple, fast operations:

To hash a set, you take each of the items in the set, compute its hash value, XOR it with the golden-ratio constant (0x9e3779b9), and square the result.
Add these results together.

Essentially, it's the sum of the squares of the hashes; the XOR is thrown in primarily so that common numbers like 0 have a meaningful effect on the hash (without the XOR there's no difference between #{0 1} and #{1}, for example).

Here's Clojure code:

(declare my-hash)

(defn set-hash [coll]
  (reduce + (for [item (seq coll)]
              (let [hash-item (.intValue (bit-xor 0x9e3779b9 (my-hash item)))]
                (unchecked-multiply-int hash-item hash-item)))))               

(defn my-hash [i]
  (if (set? i) (set-hash i) (hash i)))

Note that the .intValue call can go away if you convert this to Java and use int xor rather than Clojure's xor.

This gives really nice spread-out results for similar sets:

=> (my-hash #{{1 2} 3})
1067103816
=> (my-hash #{{1 3} 2})
3094912306
=> (my-hash #{{1 4} {2 3}})
-3227863472
=> (my-hash #{{1 3} {2 4}})
2855562010

=> (my-hash #{[1 1] [2 2]})

=> (my-hash #{[1 2] [2 1]})
-1060041718

Mark Engelberg

unread,
Oct 23, 2013, 7:46:09 PM10/23/13
to clojure
I'd like to point out that my proposed hash function lends itself nicely to incremental hashing for sets.

Furthermore, maps also have very weak hashing, and would benefit from a similar change.  Right now:
=> (hash {1 1})
0
=> (hash {1 1 2 2})
0
=> (hash {1 1 2 2 3 3})
0
=> (hash {1 2 2 3 3 1})
6
=> (hash {1 3 2 1 3 2})
6

Cedric Greevey

unread,
Oct 23, 2013, 8:31:56 PM10/23/13
to clo...@googlegroups.com
There's still a problem with collisions among *vectors* though.

I'd say the real underlying problem is that small integers hash to themselves. If you let N be a fairly large odd number (so that GCD(N, 2^32) = 1), then multiplying 32-bit integers by N would send nearby integers to widely separated ones, while inducing a permutation on the 32-bit integers -- the hash space wouldn't be any smaller. That alone won't help much with the vector problem, since 3N + 2N = 5N just as 3 + 2 = 5, though it may help avoid collisions with small integers in small hash maps when the hash is being used modulo a small hash-table length.

What might help with vectors, sets, and other small data structures containing integers (but would shrink the hash space by about half) is to additionally square the result (all operations being 2s-complement 32-bit integer operations with wrapping -- Java "int" arithmetic). Now 1 and 4 won't give the same summed hashes as 3 and 2, and 0 and 5 will be different again. You might also want to add some constant to avoid 0 hashing to 0.
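A sketch of that multiply-then-square idea in Java (the multiplier is an arbitrary large odd constant, chosen here purely for illustration):

```java
public class SpreadSmallInts {
    // Spread a small int: multiply by a large odd constant (a bijection on
    // 32-bit ints), then square, in wrapping 32-bit arithmetic; the +1 keeps
    // 0 from hashing to 0.
    static int spread(int h) {
        int y = h * 0x9E3779B1; // odd multiplier, hypothetical choice
        return y * y + 1;
    }

    public static void main(String[] args) {
        // Plain sums collide: 1 + 4 == 2 + 3. After spreading, they differ.
        int s14 = spread(1) + spread(4);
        int s23 = spread(2) + spread(3);
        System.out.println(s14 + " " + s23);
    }
}
```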


Mark Engelberg

unread,
Oct 23, 2013, 8:41:15 PM10/23/13
to clojure
Perhaps I don't understand your point, but vector hashing is not currently the sum of hashes.  It's a more complex computation.

=> (hash [0 2])
963
=> (hash [2 0])
1023
=> (hash [2])
33

The fact that numbers hash to themselves means that the resulting hashes are not hugely spread apart, but collisions don't seem to be as common as with sets and maps.  I'm sure it could be improved, but vector/sequence hashing doesn't strike me as a huge issue.

Stuart Halloway

unread,
Oct 23, 2013, 8:46:03 PM10/23/13
to clo...@googlegroups.com
Is the Scala for lazy or eager?  If the latter, you are not comparing apples to apples (separate from the other differences David already pointed out.)

Stu


Mark Engelberg

unread,
Oct 23, 2013, 9:12:21 PM10/23/13
to clojure
Even though vectors aren't as prone to collision between similar-looking vectors, I think you are right that the smallness of the numbers is a problem:

Consider the following:

=> (count (set (for [x (range 200) y (range 200)] (hash [x y]))))
6369

This means that out of 40,000 common ordered pairs, there are only 6369 distinct hash values.  That's not good.
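The 6369 figure can be reproduced outside Clojure: for int pairs, the vector hash reduces to the java.util.List formula 31*(31*1 + x) + y = 961 + 31x + y, so a quick Java check under that closed form gives the same count:

```java
import java.util.HashSet;
import java.util.Set;

public class PairHashSpread {
    // Number of distinct vector hashes over all [x y] with x, y in [0, n),
    // using the closed form 961 + 31*x + y of java.util.List.hashCode.
    static int distinctPairHashes(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int x = 0; x < n; x++) {
            for (int y = 0; y < n; y++) {
                hashes.add(961 + 31 * x + y);
            }
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPairHashes(200)); // prints 6369
    }
}
```

The hashes form one contiguous range (961 through 7329), which is why so few values cover 40,000 pairs.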

So yes, upon further thought, I think sequence hashing should be improved as well.

I want to emphasize that I don't view this as merely a "theoretical" problem.  I have noticed for over a year now, when profiling my Clojure code, that a surprising amount of time was spent on dealing with hash collisions.  I chalked it up as the natural behavior of hash tables which will occasionally have collisions, and just learned to live with it, but the more I play around with the existing hash function on data similar to what I use (with small numbers in slightly varying structures), the more I'm convinced that Clojure's weak hashing strategy explains the performance issues I've had.  Would love to see the Clojure community tackle this head-on; I think there's a lot of performance to be gained for many real apps by doing this.

Mikera

unread,
Oct 24, 2013, 12:05:25 AM10/24/13
to clo...@googlegroups.com
You can't change the hashCode of lists / vectors without breaking the contract for java.util.List (which PersistentList and PersistentVector both implement) - http://docs.oracle.com/javase/7/docs/api/java/util/List.html#hashCode()

I don't think breaking this would be a good idea.

The java.util.List hashCode isn't actually too bad.  It's a decent compromise between an efficient implementation and reasonably effective hashing for most real-world use cases.

Sure you can construct artificial examples where you get some collisions, but that's true of any hash code function. But note that the maximum number of collisions is still pretty low:

(apply max (vals (frequencies (for [x (range 200) y (range 200)] (hash [x y])))))
=> 7

That's actually the number you really care about, since it is the number of objects in the same hash bucket that drives the pathological O(n^2) behaviours.
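The same bucket-size figure falls out of the closed form 961 + 31x + y for the pair hashes; a Java equivalent of that frequencies expression:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MaxBucket {
    // Largest number of [x y] pairs, x and y in [0, n), sharing one vector
    // hash, with the hash taken as the closed form 961 + 31*x + y.
    static int maxBucket(int n) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x = 0; x < n; x++) {
            for (int y = 0; y < n; y++) {
                freq.merge(961 + 31 * x + y, 1, Integer::sum);
            }
        }
        return Collections.max(freq.values());
    }

    public static void main(String[] args) {
        System.out.println(maxBucket(200)); // prints 7
    }
}
```

Since y spans only 0..199 and each step in x shifts the hash by 31, at most ceil(200/31) = 7 pairs can land on one value.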
 

Mark Engelberg

unread,
Oct 24, 2013, 12:40:13 AM10/24/13
to clojure
OK, I get that if lists have to match the Java hash code, then it's a nonstarter to try to improve it.  However, Clojure makes a distinction between hasheq and hashCode, one of which is used within Clojure and one of which is used for Java interop, and I don't think they necessarily need to match.

I don't really consider this to be a contrived case.  It's pretty typical, in my experience, for Clojure programmers to choose to represent a 2D array as a map from [x y] to values.  That's often more practical to initialize and work with than a vector-of-vectors layout.  So a map with keys ranging from [0 0] to [199 199] would be what you'd get for a 200x200 array.

Having 6-7 objects per bucket isn't the end of the world, but it means that all your lookups are several times slower than they could be.

All that said, I think that improving set hashing is a more pressing issue.




Mikera

unread,
Oct 24, 2013, 1:44:23 AM10/24/13
to clo...@googlegroups.com
Agreed, set hashing is definitely a bigger issue.

Also agree array indexing is a pretty common use case, though if people care about performance for manipulating arrays of data with integer indexes then they should probably be looking at core.matrix rather than trying to simulate array programming with maps and vectors. Right tool for the right job, and all that.

Mark Engelberg

unread,
Oct 24, 2013, 2:48:38 AM10/24/13
to clojure
BTW, in the sample set hashing function I provided,
the `reduce +` should have been `reduce unchecked-add-int`

I've done some timing tests (on a loop/recur version that leverages primitive math better) and it looks like the strategy I proposed has a negligible impact on the current hashing strategy, with the benefits of better hash distribution.

Mark Engelberg

unread,
Oct 24, 2013, 2:50:38 AM10/24/13
to clojure
Ugh, I meant "negligible impact on the running time versus the current hashing strategy"

Paul Butcher

unread,
Oct 24, 2013, 5:27:27 AM10/24/13
to clo...@googlegroups.com
On 24 Oct 2013, at 01:46, Stuart Halloway <stuart....@gmail.com> wrote:

Is the Scala for lazy or eager?  If the latter, you are not comparing apples to apples (separate from the other differences David already pointed out.)

Oh, it's eager.

Bear in mind that my purpose wasn't really to directly compare Scala and Clojure - in fact, I was originally planning to knock up a fully lazy solution in Clojure to demonstrate that the memory-hungry eager Scala solution could be trivially made to run in much less memory in Clojure. Unfortunately I got mired in this performance issue before I could get to that point.

Phillip Lord

unread,
Oct 24, 2013, 6:34:14 AM10/24/13
to clo...@googlegroups.com

What does Scala do? I mean, given that it doesn't have the same problem,
perhaps it has a solution?

--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: philli...@newcastle.ac.uk
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower, skype: russet_apples
Newcastle University, twitter: phillord
NE1 7RU

Paul Butcher

unread,
Oct 24, 2013, 7:46:00 AM10/24/13
to clo...@googlegroups.com
On 24 Oct 2013, at 11:34, Phillip Lord <philli...@newcastle.ac.uk> wrote:

What does Scala do? I mean, given that it doesn't have the same problem,
perhaps it has a solution?

By default, Scala uses a tree-based set, not a hash-based set. So the fact that it doesn't run into hashing issues isn't surprising (and is yet another example of how this isn't a perfect apples-to-apples comparison). 

However, I've just tried rerunning the Scala using a hash-based set implementation, and that takes the runtime down from around 2.5 minutes to just over 2.

Paul Butcher

unread,
Oct 24, 2013, 8:08:40 AM10/24/13
to clo...@googlegroups.com
On 23 Oct 2013, at 18:37, David Nolen <dnolen...@gmail.com> wrote:

The problem here is that you're not following your Scala solution closely enough. I suspect that if you use defrecords to represent the pieces, the way you used a class in Scala, you can avoid the large number of collisions for larger problems.

Not tested much but I tried something like the following and I no longer see hash collisions dominating in YourKit:

It would be lovely if it was that easy, David. However, I still see lots of hash collisions when using records. Here's a portion of the output from an implementation that's instrumented to print hash codes, similar to what Andy did with the vector-based implementation:

"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 2]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :N, :pos [0 1]} #chess_clojure.core.Piece{:name :R, :pos [8 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :N, :pos [0 4]} #chess_clojure.core.Piece{:name :Q, :pos [3 3]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :N, :pos [0 4]} #chess_clojure.core.Piece{:name :Q, :pos [3 3]} #chess_clojure.core.Piece{:name :B, :pos [1 4]} #chess_clojure.core.Piece{:name :R, :pos [8 1]}}"
"842565000: #{#chess_clojure.core.Piece{:name :N, :pos [0 3]} #chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :N, :pos [0 3]} #chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [8 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :R, :pos [0 3]} #chess_clojure.core.Piece{:name :Q, :pos [4 2]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :N, :pos [8 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :R, :pos [0 3]} #chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :N, :pos [8 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 2]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :R, :pos [0 0]} #chess_clojure.core.Piece{:name :N, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 2]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :B, :pos [0 5]} #chess_clojure.core.Piece{:name :N, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 2]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :B, :pos [0 4]} #chess_clojure.core.Piece{:name :N, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :R, :pos [0 3]} #chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :N, :pos [8 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :B, :pos [0 5]} #chess_clojure.core.Piece{:name :N, :pos [8 2]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :B, :pos [0 4]} #chess_clojure.core.Piece{:name :N, :pos [8 2]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :R, :pos [0 2]} #chess_clojure.core.Piece{:name :N, :pos [8 1]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :B, :pos [0 4]} #chess_clojure.core.Piece{:name :R, :pos [1 2]} #chess_clojure.core.Piece{:name :N, :pos [8 1]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :R, :pos [7 2]} #chess_clojure.core.Piece{:name :N, :pos [0 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 2]} #chess_clojure.core.Piece{:name :B, :pos [0 4]} #chess_clojure.core.Piece{:name :N, :pos [6 5]} #chess_clojure.core.Piece{:name :R, :pos [1 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 0]} #chess_clojure.core.Piece{:name :R, :pos [0 2]} #chess_clojure.core.Piece{:name :N, :pos [6 5]} #chess_clojure.core.Piece{:name :B, :pos [1 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 2]} #chess_clojure.core.Piece{:name :R, :pos [0 0]} #chess_clojure.core.Piece{:name :N, :pos [6 5]} #chess_clojure.core.Piece{:name :B, :pos [1 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 2]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :N, :pos [6 5]} #chess_clojure.core.Piece{:name :B, :pos [0 3]}}"
"842565000: #{#chess_clojure.core.Piece{:name :R, :pos [1 4]} #chess_clojure.core.Piece{:name :Q, :pos [3 3]} #chess_clojure.core.Piece{:name :N, :pos [6 5]} #chess_clojure.core.Piece{:name :B, :pos [2 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [7 2]} #chess_clojure.core.Piece{:name :N, :pos [0 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 1]} #chess_clojure.core.Piece{:name :N, :pos [7 3]} #chess_clojure.core.Piece{:name :R, :pos [1 5]} #chess_clojure.core.Piece{:name :B, :pos [0 3]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 1]} #chess_clojure.core.Piece{:name :R, :pos [1 2]} #chess_clojure.core.Piece{:name :N, :pos [7 3]} #chess_clojure.core.Piece{:name :B, :pos [2 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :N, :pos [7 3]} #chess_clojure.core.Piece{:name :B, :pos [0 3]}}"
"842565000: #{#chess_clojure.core.Piece{:name :R, :pos [1 4]} #chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :N, :pos [8 3]} #chess_clojure.core.Piece{:name :B, :pos [0 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [0 5]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :R, :pos [1 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :B, :pos [0 4]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :R, :pos [1 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [0 5]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :B, :pos [1 0]} #chess_clojure.core.Piece{:name :R, :pos [0 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [4 1]} #chess_clojure.core.Piece{:name :R, :pos [1 3]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :B, :pos [2 0]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :R, :pos [0 0]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :B, :pos [1 4]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 5]} #chess_clojure.core.Piece{:name :R, :pos [1 1]} #chess_clojure.core.Piece{:name :N, :pos [7 2]} #chess_clojure.core.Piece{:name :B, :pos [0 3]}}"
"842565000: #{#chess_clojure.core.Piece{:name :Q, :pos [3 4]} #chess_clojure.core.Piece{:name :B, :pos [1 5]} #chess_clojure.core.Piece{:name :N, :pos [0 2]} #chess_clojure.core.Piece{:name :R, :pos [8 1]}}"

David Nolen

unread,
Oct 24, 2013, 12:12:43 PM10/24/13
to clojure
OK, it appears defrecord suffers from a problem that used to plague Scala case classes as well - http://issues.scala-lang.org/browse/SI-2537.

David

Evan Gamble

unread,
Oct 24, 2013, 8:37:53 PM10/24/13
to clo...@googlegroups.com
As Andy Fingerhut pointed out, since (hash [a b]) is 31*(a + 30) + (b + 31) when a and b are ints, summing the hashes of 2-tuples when the values are small ints, as happens when hashing sets of such 2-tuples, is quite likely to lead to collisions. This particular problem could be avoided by spreading the hashes of ints out to large numbers with a non-linear function, not just multiplying them all by some N.
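That closed form can be checked directly against java.util.List.hashCode (which Clojure's vector hash matched at the time):

```java
public class PairHashFormula {
    // Closed form of java.util.List.hashCode for a two-element list of ints:
    // 31*(31*1 + a) + b, rewritten as 31*(a + 30) + (b + 31).
    static int closedForm(int a, int b) {
        return 31 * (a + 30) + (b + 31); // = 961 + 31*a + b
    }

    public static void main(String[] args) {
        for (int a = -50; a < 50; a++) {
            for (int b = -50; b < 50; b++) {
                if (java.util.Arrays.asList(a, b).hashCode() != closedForm(a, b)) {
                    throw new AssertionError(a + "," + b);
                }
            }
        }
        System.out.println("closed form matches List.hashCode");
    }
}
```

It also reproduces the REPL values quoted earlier in the thread: (hash [0 2]) is 963 and (hash [2 0]) is 1023.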

Stuart Halloway

unread,
Oct 25, 2013, 12:11:34 PM10/25/13
to clo...@googlegroups.com
To save others from having to read this whole thread, is the following summary correct?

1. We believe that the performance issue Paul first encountered is substantially / entirely caused by hash collisions.

2. The hash collisions could be improved by spreading the hashes of numbers (and not needing to mess with hashes of collections).

If the above summary is correct, then I have the following questions:

a. Is the approach Scala took substantially #2 above, or something different / more?

b. Is changing the hash codes of numbers going to break expectations on the Java side of things?


Ben Mabey

unread,
Oct 25, 2013, 12:23:46 PM10/25/13
to clo...@googlegroups.com
On 10/24/13 5:46 AM, Paul Butcher wrote:
On 24 Oct 2013, at 11:34, Phillip Lord <philli...@newcastle.ac.uk> wrote:

What does Scala do? I mean, given that it doesn't have the same problem,
perhaps it has a solution?

By default, Scala uses a tree-based set, not a hash-based set. So the fact that it doesn't run into hashing issues isn't surprising (and is yet another example of how this isn't a perfect apples-to-apples comparison). 

However, I've just tried rerunning the Scala using a hash-based set implementation, and that takes the runtime down from around 2.5 minutes to just over 2.

Which hash-based set implementation did you use for that comparison?  It would be nice to look at the hashing fn it is using.

-Ben

Andy Fingerhut

unread,
Oct 25, 2013, 12:26:41 PM10/25/13
to clo...@googlegroups.com
Two partial answers in-line below, but as it says there, Mark Engelberg has more experimental data than I do right now on proposed changes.


On Fri, Oct 25, 2013 at 9:11 AM, Stuart Halloway <stuart....@gmail.com> wrote:
To save others from having to read this whole thread, is the following summary correct?

1. We believe that the performance issue Paul first encountered is substantially / entirely caused by hash collisions.

There are definitely many hash collisions with Paul's program.  I believe Mark Engelberg has already run experiments to show that most/all of these collisions go away with one proposed change to the hash functions.


2. The hash collisions could be improved by spreading the hashes of numbers (and not needing to mess with hashes of collections).

If the only change made were to spread out the hashes of numbers using a linear function, e.g. something of the form const1 * number + const2, then the same hash collisions would remain, due to the current hash functions for vectors/sequences and sets and the nature of Paul's data (which were definitely not created with an eye towards breaking the hash functions -- they are quite reasonable things a programmer would hope to work efficiently without needing to know any special tricks).

If the only change made were to spread out the hashes of numbers using a non-linear function, e.g. something like the number*number suggested by Mark Engelberg, then it needs more verification than I personally have done so far, but I believe that would help avoid many of the collisions.  I think Mark Engelberg could make comments based on more collected data than I have.

I do not know the answers to questions a or b below.

Andy

David Nolen

unread,
Oct 25, 2013, 12:56:43 PM10/25/13
to clojure
What I suggested is far, far more conservative than anything else proposed on this or on the dev thread.

Changing the hash implementation of sets and vectors etc. is likely to have unintended consequences for performance - http://www.scala-lang.org/old/node/8510.html

The original Scala code used case classes - like defrecord they provide a lot of functionality out of the box and don't require defining hash / equals methods. In the past Scala case classes had bad hash code distribution; this is no longer the case.

What I'm suggesting is to leave everything else alone, but only make the sensible change so that the Scala code can be reasonably ported to Clojure. This simply requires giving defrecord a better hashing strategy. I suspect we can duplicate the approach already taken by Scala for case classes.
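For the curious: Scala's improved case-class hashing is built on MurmurHash3, whose 32-bit finalizer ("fmix32") is tiny. The sketch below only illustrates the kind of bit mixing involved; it is not a statement of what Clojure's implementation should be:

```java
public class Fmix32 {
    // MurmurHash3's 32-bit finalization mix: xor-shifts and odd-constant
    // multiplies, each a bijection on 32-bit ints, so distinct inputs stay
    // distinct while nearby inputs are spread far apart.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(i + " -> " + fmix32(i));
        }
    }
}
```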

David

Mark Engelberg

unread,
Oct 25, 2013, 2:11:03 PM10/25/13
to clojure
On Fri, Oct 25, 2013 at 9:56 AM, David Nolen <dnolen...@gmail.com> wrote:
What I suggested is far, far more conservative than anything else proposed on this or on the dev thread.

I understand your point that records are a separate "island" unique to Clojure-land, whereas the other Clojure types need to align with Java, and therefore, are riskier to mess around with.  But I find it hard to get excited about a records-only solution, mainly because I hardly ever use records.  Using straight maps, sets, and vectors to represent data is perfectly ordinary and idiomatic in Clojure, e.g., Paul's initial code didn't use records to represent his objects.  So solving the problem just for records doesn't feel to me like much of a solution.

 
 I suspect we can duplicate the approach already taken by Scala for case classes.



There are two things that jump out to me as things that might make it hard to mimic Scala's approach for case classes in Clojure's records:
1. Clojure records function like maps, and you can assoc more things into them after construction.  Scala is working with a fixed number of fields at compile time.
2. Scala knows the type of each of the fields at compile time, and so (if I understand their approach correctly), they can compile a custom hashing function for the case class that leverages the types of all its components for maximum speed.

Cedric Greevey

unread,
Oct 25, 2013, 2:14:38 PM10/25/13
to clo...@googlegroups.com
I don't see a general solution for idiomatic Clojure data (so, often not records) other than making the hashes of small integers a spread-out, nonlinear function of them.



Mark Engelberg

unread,
Oct 25, 2013, 2:32:28 PM10/25/13
to clojure
On Fri, Oct 25, 2013 at 9:11 AM, Stuart Halloway <stuart....@gmail.com> wrote:
To save others from having to read this whole thread, is the following summary correct?

1. We believe that the performance issue Paul first encountered is substantially / entirely caused by hash collisions.

Yes.
 

2. The hash collisions could be improved by spreading the hashes of numbers (and not needing to mess with hashes of collections).

Yes.  Increasing the spread of the number hashes wouldn't solve all the types of hash collisions we've discussed, but it would definitely help.  However, as David pointed out, there's a tradeoff between increasing the complexity of the hashing function and the time saved by improving hash collisions.  My intuition from playing around with this is that increasing the complexity of hashing numbers from the current, dead-simple scheme would be an overhead that would really be felt.  Two advantages of messing with hashes at the collection level is that:
a. Collections are already doing reasonably complex things during hashing, like traversing their elements.
b. The hash, once computed, can be cached.
Those two factors mean that adding a tiny amount of complexity to the hash function of a collection is more likely to help than to hurt performance.

 

If the above summary is correct, then I have the following questions:

a. Is the approach Scala took substantially #2 above, or something different / more?

I don't believe that Scala altered the hashing of numbers, but I am not certain.  All I know about Scala's hashing is what I've seen from following some of the links that David Nolen has posted here.  It appears to me that they made a variety of small improvements for different kinds of hashing.  For example, choosing different magic primes for strings versus collections (Clojure mimics Java and uses 31 everywhere) helped with certain kinds of collisions.  The link David posted is specifically about how they improved case classes, which are similar to Clojure's records.  It looks like they generate a custom hash function at compile time using knowledge of the types of the element and an algorithm that munges together the individual bytes of the items.  So I'd say that it looks like they did something in the different / more category.


b. Is changing the hash codes of numbers going to break expectations on the Java side of things?

I don't know.
 

Andy Fingerhut

unread,
Oct 31, 2013, 10:19:23 AM10/31/13
to clo...@googlegroups.com
Just a follow up after a fair amount of thinking and experimentation has been done on this problem.

Mark Engelberg has a very nice document describing the existing hash functions used by Clojure, examples with why they can cause collisions so often, and suggestions for changes to them to have fewer collisions in these cases.  https://docs.google.com/document/d/10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit

I have made modifications to a fork of the Clojure code base with his suggested hash modifications here, for experimentation purposes: https://github.com/jafingerhut/clojure   I also have some modifications to Paul's N-queens Clojure program to print out additional statistics and try out different hash functions here: https://github.com/jafingerhut/chess-clojure

Running Paul's N-queens problem on a 6x6 board without any changes to Clojure takes about 7 minutes on one of my computers.  With those changes to the hash functions it finishes in about 12 seconds.

I haven't let a 6x9 board run to completion without those hash changes, but it takes a long time :-)  With those changes to the hash functions it finishes in about 11 minutes.

The reason for the currently slow behavior is a combination of the current hash functions and the implementation of the PersistentHashSet data structure used to implement sets.  A PersistentHashSet is, very roughly, a 32-way branching tree, where successive chunks of an element's hash value select which branch to take at each level.  When multiple elements have exactly the same hash value, they end up together in an array at a leaf of the tree, and that array is scanned linearly whenever you want to check whether a value is already in the set.  So the more hash collisions, the more set lookups degrade toward O(n) linear scans of a plain array.
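
To picture why collisions hurt so badly, consider how such a trie chooses branches. This is a hedged Java sketch of just the indexing arithmetic (the real PersistentHashSet code is considerably more involved): each level consumes 5 bits of the hash, so elements with identical hash values follow the same path at every level and pile up in one linearly scanned leaf array.

```java
public class TrieBranching {
    // At each level of a 32-way hash trie, 5 bits of the element's hash
    // select one of the 32 child branches.
    static int branchIndex(int hash, int level) {
        return (hash >>> (5 * level)) & 0x1f;
    }

    public static void main(String[] args) {
        int h1 = 3;               // two hashes that differ only in bit 30
        int h2 = 3 | (1 << 30);
        for (int level = 0; level <= 6; level++) {
            System.out.println(level + ": " + branchIndex(h1, level)
                               + " vs " + branchIndex(h2, level));
        }
        // Levels 0-5 pick identical branches; the paths only diverge at
        // level 6, where bit 30 finally falls inside the selected 5-bit
        // chunk.  Fully identical hashes never diverge at all.
    }
}
```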

With the 6x9 board, there are 20,136,752 solutions.  The way those solutions are represented in the program, as sets of vectors, those 20 million values have only 17,936 distinct hash values.  Thus the average length of those arrays of values in the tree leaves is 1,122 values.  The longest of those arrays holds 81,610 values, all with the same hash value.
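
Collision counts on that scale follow directly from the hash formulas Clojure inherited from Java: a list's hash is h = 31*h + element, a set's hash is the plain sum of its elements' hashes, and small integers hash to themselves.  Two tiny examples, shown in Java since pre-change Clojure used essentially the same formulas for vectors and sets:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollisionDemo {
    public static void main(String[] args) {
        // List hash: h = 31*h + e, and Integer.hashCode() is the value itself,
        // so nearby "board" vectors collide easily:
        List<Integer> a = Arrays.asList(0, 31);
        List<Integer> b = Arrays.asList(1, 0);
        System.out.println(a.hashCode() == b.hashCode());   // true: both are 992

        // Set hash: the SUM of the element hashes, so any two sets of ints
        // with equal sums collide:
        Set<Integer> s1 = new HashSet<>(Arrays.asList(1, 4));
        Set<Integer> s2 = new HashSet<>(Arrays.asList(2, 3));
        System.out.println(s1.hashCode() == s2.hashCode()); // true: both sum to 5
    }
}
```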

With Mark's proposed hash function changes, those 20,136,752 values hash to 20,089,488 different ints, for an average array length of 1 value, and a longest array length of 3 values.  Big speedup.
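
The flavor of fix can be sketched as "scramble each element's bits before combining".  The sketch below uses MurmurHash3's 32-bit finalizer as the scrambling step; this is an illustration of the idea, not the exact change Clojure adopted, though the work being considered drew on Murmur3-style hashing.  Once elements are mixed, the combining step no longer sees raw small integers, so the structured collisions above fall apart.

```java
import java.util.Arrays;
import java.util.List;

public class MixDemo {
    // MurmurHash3's 32-bit finalizer: "avalanches" an int so that structured
    // inputs (small ints, values differing by 31, equal sums) spread out.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // The same 31-based combining as java.util.List, but over MIXED elements.
    static int listHash(List<Integer> xs) {
        int h = 1;
        for (int x : xs) h = 31 * h + mix(x);
        return h;
    }

    public static void main(String[] args) {
        // Under the plain list formula, [0, 31] and [1, 0] both hash to 992.
        // With per-element mixing, the two hashes come out different:
        System.out.println(listHash(Arrays.asList(0, 31)));
        System.out.println(listHash(Arrays.asList(1, 0)));
    }
}
```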

There is nothing unique about Mark's particular hash functions that achieves this.  Other hash modifications can give similar improvements.  The Clojure dev team is considering which changes would be good ones to make, in hopes of changing them once and not having to do it again.

Andy



Jim - FooBar();

unread,
Oct 31, 2013, 12:25:03 PM10/31/13
to clo...@googlegroups.com
awesome news :)

Jim

Alex Miller

unread,
Nov 1, 2013, 2:28:34 PM11/1/13
to clo...@googlegroups.com
Has anyone created a design page on dev.clojure for this yet? 

Andy Fingerhut

unread,
Nov 2, 2013, 12:44:18 PM11/2/13
to clo...@googlegroups.com
A few minutes ago I finished copying, pasting, and doing a little reformatting on Mark Engelberg's document on the subject.

    http://dev.clojure.org/display/design/Better+hashing

Andy

Andy Fingerhut

unread,
Jan 30, 2014, 11:13:56 AM1/30/14
to clo...@googlegroups.com, Paul Butcher
Thanks to the work and thought of Mark Engelberg, Alex Miller, Rich Hickey, myself, and likely several others whom I should be naming here but am forgetting, the latest (not yet released) Clojure master version has an improved hash function that gives a much better spread of hash values, and thus much improved performance for hash-based collections like hash maps and hash sets, than previous versions of Clojure.

For example, the N-queens program that Paul Butcher wrote, and whose performance he was asking about at the beginning of this thread, improves from 6.7 minutes before these hash changes to 12.8 seconds after them, when run on the 6x6 board.  For the 6x9 board, the improvement is from "something over 8 hours that I didn't measure precisely because I didn't want to wait" down to 12.8 minutes.  A few more experimental results are given on the design wiki page [1].

[1] http://dev.clojure.org/display/design/Better+hashing

Andy

Alex Miller

unread,
Jan 30, 2014, 2:11:08 PM1/30/14
to clo...@googlegroups.com, Paul Butcher
Thanks for posting that here Andy - I've been mostly off the grid this week and haven't had time to do some of this followup. I would also like to say many thanks to Mark and Andy who have spent a lot of time elucidating the original issue and working on how to address it.

Alex

Cedric Greevey

unread,
Jan 30, 2014, 10:30:38 PM1/30/14
to clo...@googlegroups.com
+1