I'd first like to state that I went into this exercise expecting Bagwell's Hash Array Mapped Tries to blow the competition out of the water; I'd get a fast functional map and give it to Haskell and people would rejoice.
Instead, I found a functional hash-map that was approximately four times slower than Haskell's IntMap. I find this puzzling, and would love to puzzle it out with you folks.
First, the test case:
(ns maptest (:gen-class))
(defn mk-random-stream [] (let [r (new ec.util.MersenneTwisterFast)] (repeatedly (fn [] (. r (nextInt))))))
(defn -main [sn] (let [n (Integer/parseInt sn) rs (take (* 2 n) (mk-random-stream))] (time (let [m (apply hash-map (take (* n 2) (interleave rs rs)))] (reduce (fn [s k] (let [v (get m k)] (+ s (if (nil? v) 0 v)))) 0 (take n (drop (/ n 2) rs)))))))
We generate an n-sized hash tree, and then do n accesses, half of which match and half of which miss. We compute the sum of all the values that catch. We see the following characteristics:
I was careful to compile the Clojure code before attempting to run it. I also pre-allocate the entire random sequence to prevent it from muddying the execution time (although I am using Sean Luke's excellent fast Mersenne Twister [1]). The full string I used to invoke the program was:
The fact that it's similar in performance to the Haskell reimplementation (I'd say that the variance is not particularly significant and both are about on par performance wise) seems to validate the testing methodology, I hope, though the real proof of the pudding would lie in implementing Haskell-style IntMap in Java.
You can see the test code for the Haskell versions here:
So... what's up with these numbers? Is my methodology wrong? Is the garbage collector sucking? Is the algorithm just not as good as it makes out to be?
> I'd first like to state that I went into this exercise expecting > Bagwell's Hash Array Mapped Tries to blow the competition out of the > water; I'd get a fast functional map and give it to Haskell and people > would rejoice.
> Instead, I found a functional hash-map that was approximately four times > slower than Haskell's IntMap. I find this puzzling, and would love to > puzzle it out with you folks.
> First, the test case:
> (ns maptest (:gen-class))
> (defn mk-random-stream [] > (let [r (new ec.util.MersenneTwisterFast)] > (repeatedly (fn [] (. r (nextInt))))))
> We generate an n-sized hash tree, and then do n accesses, half of which > match and half of which miss. We compute the sum of all the values that > catch. We see the following characteristics:
> I was careful to compile the Clojure code before attempting to run it. > I also pre-allocate the entire random sequence to prevent it from > muddying the execution time (although I am using Sean Luke's excellent > fast Mersenne Twister [1]). The full string I used to invoke the > program was:
> The fact that it's similar in performance to the Haskell > reimplementation (I'd say that the variance is not particularly > significant and both are about on par performance wise) seems to > validate the testing methodology, I hope, though the real proof of the > pudding would lie in implementing Haskell-style IntMap in Java.
> You can see the test code for the Haskell versions here:
> So... what's up with these numbers? Is my methodology wrong? Is the > garbage collector sucking? Is the algorithm just not as good as it > makes out to be?
Yes, the methodology is wrong, at least on the Clojure/JVM side. First, benchmarking anything without -server is a waste of time. Second, the JVM does dynamic profile-driven compilation - performance on single cold runs is not indicative of optimal performance. Third, putting the random stream gen call in the let doesn't preallocate the sequence, nor move its execution there (and thus out of the timing), as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to structure the code more like your Haskell IntMap code [1] (i.e. no interleave). Fifth, AOT compilation makes no difference in performance - Clojure code is always compiled.
Try this:
;run with java -server -Xmx1024m
(defn mk-random-stream [] (let [r (new ec.util.MersenneTwisterFast)] (repeatedly (fn [] (. r (nextInt))))))
(defn main [i] (let [vals (vec (take (* i 2) (mk-random-stream)))] (dotimes [_ 10] (time (let [m (apply hash-map (take i vals))] (reduce (fn [s k] (+ s (m k 0))) 0 (take i (drop (/ i 2) vals))))))))
I'd suggest checking your JVM memory settings. The Haskell implementation may allow the heap to grow arbitrarily, but the default JVM heap size is very limited, which may result in poor memory performance.
On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:
> I'd first like to state that I went into this exercise expecting > Bagwell's Hash Array Mapped Tries to blow the competition out of the > water; I'd get a fast functional map and give it to Haskell and people > would rejoice.
> Instead, I found a functional hash-map that was approximately four times > slower than Haskell's IntMap. I find this puzzling, and would love to > puzzle it out with you folks.
> First, the test case:
> (ns maptest (:gen-class))
> (defn mk-random-stream [] > (let [r (new ec.util.MersenneTwisterFast)] > (repeatedly (fn [] (. r (nextInt))))))
> We generate an n-sized hash tree, and then do n accesses, half of which > match and half of which miss. We compute the sum of all the values that > catch. We see the following characteristics:
> I was careful to compile the Clojure code before attempting to run it. > I also pre-allocate the entire random sequence to prevent it from > muddying the execution time (although I am using Sean Luke's excellent > fast Mersenne Twister [1]). The full string I used to invoke the > program was:
> The fact that it's similar in performance to the Haskell > reimplementation (I'd say that the variance is not particularly > significant and both are about on par performance wise) seems to > validate the testing methodology, I hope, though the real proof of the > pudding would lie in implementing Haskell-style IntMap in Java.
> You can see the test code for the Haskell versions here:
> So... what's up with these numbers? Is my methodology wrong? Is the > garbage collector sucking? Is the algorithm just not as good as it > makes out to be?
> I was careful to compile the Clojure code before attempting to run it.
This will not have any effect on runtime execution, because the Clojure code is compiled anyway upon load. So this only effect startup- time, which you don't measure.
> I also pre-allocate the entire random sequence to prevent it from > muddying the execution time.
In fact you are not. You create a lazy sequence (repeatedly + take) which is completely realised in the time call. However I don't know about Haskell being lazy. So the same might happen there...
Without knowing much about benchmarking and the like, here my try on the code:
This should get rid of all laziness in the time'd execution and get as close to the map operations as possible.
Usual disclaimers about "broken microbenchmarks", "not averaging", "using a 'cold' JIT" etc. apply. I'm no expert here so I won't comment on that. However I recommend investigating the presentations by Cliff Click on these topics to avoid common pitfalls.
Here's another version that's almost as fast as Rich's and it even includes the time for computing values.
(defn mk-random-map-fast [n] (let [m (transient {}) r (new ec.util.MersenneTwisterFast) ni (. r (nextInt))] (loop [i 0 result m ni ni] (if (= i n) (persistent! m) (recur (inc i) (assoc! m ni ni) (. r nextInt))))))
(defn fast-run [n] (time (let [m (mk-random-map-fast n) rs (vals m)] (reduce (fn [s k] (let [v (get m k)] (+ s (if (nil? v) 0 v)))) 0 (take n (drop (/ n 2) rs))))))
I'd suggest allocating more heap than the default JVM settings, or running the JVM in server mode. I'm not sure what Haskell's underlying implementation is, but I'd assume that it can grow its heap (somewhat) arbitrarily, which the JVM cannot, beyond a fixed bound.
On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:
> I'd first like to state that I went into this exercise expecting > Bagwell's Hash Array Mapped Tries to blow the competition out of the > water; I'd get a fast functional map and give it to Haskell and people > would rejoice.
> Instead, I found a functional hash-map that was approximately four times > slower than Haskell's IntMap. I find this puzzling, and would love to > puzzle it out with you folks.
> First, the test case:
> (ns maptest (:gen-class))
> (defn mk-random-stream [] > (let [r (new ec.util.MersenneTwisterFast)] > (repeatedly (fn [] (. r (nextInt))))))
> We generate an n-sized hash tree, and then do n accesses, half of which > match and half of which miss. We compute the sum of all the values that > catch. We see the following characteristics:
> I was careful to compile the Clojure code before attempting to run it. > I also pre-allocate the entire random sequence to prevent it from > muddying the execution time (although I am using Sean Luke's excellent > fast Mersenne Twister [1]). The full string I used to invoke the > program was:
> The fact that it's similar in performance to the Haskell > reimplementation (I'd say that the variance is not particularly > significant and both are about on par performance wise) seems to > validate the testing methodology, I hope, though the real proof of the > pudding would lie in implementing Haskell-style IntMap in Java.
> You can see the test code for the Haskell versions here:
> So... what's up with these numbers? Is my methodology wrong? Is the > garbage collector sucking? Is the algorithm just not as good as it > makes out to be?
Haskell's IntMap performs no hashing, but also doesn't allow non-Int keys. It stands to reason it would be very fast at the cost of a decrease in utility.
By the way, I don't view this as a Haskell vs Clojure contest (GHC, native code vs HotSpot, JVM), but rather one of data structures. Perhaps the fact that they simply aren't interchangeable (due to IntMap's type constraint on keys -- as mentioned above, it's always Ints, not even (Integral a)) makes this a bit pointless (take a HAMT, add the constraint that keys are all machine word-sized integers and the problem of hash collisions just doesn't exist and you might be able to make it faster).
Despite all that, it might be interesting (if only academically) to compare worst-case asymptotic performance of IntMaps and HAMTs with very large datasets, but writing appropriate test code could be problematic, since it becomes rather important not to generate heaps of garbage along the way (the collection of which could contribute significantly to the observed performance characteristics of the code).
Excerpts from Rich Hickey's message of Tue Feb 23 09:41:28 -0500 2010:
> Yes, the methodology is wrong, at least on the Clojure/JVM side. > First, benchmarking anything without -server is a waste of time. > Second, the JVM does dynamic profile-driven compilation - performance > on single cold runs is not indicative of optimal performance. Third, > putting the random stream gen call in the let doesn't preallocate the > sequence, nor move its execution there (and thus out of the timing), > as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to > structure the code more like your Haskell IntMap code [1] (i.e. no > interleave). Fifth, AOT compilation makes no difference in performance > - Clojure code is always compiled.
Great, I was hoping that was the case. The HotSpot JVM is definitely a big part of the mix, and I hadn't realized that it does dynamic profile driven compilation. I went off and read Cliff Click's excellent presentation on the subject.
Re laziness, on closer inspection it looks like the equivalent Haskell is also generating the lazy values inside the loop, so actually they're roughly equivalent. I've rewritten the Haskell code (not uploaded yet) to force the random generation before timing.
> Try this:
> ;run with java -server -Xmx1024m
Good catch. I will bump up the default heap size in the Haskell RTS to 1G as well.
> (defn main [i] > (let [vals (vec (take (* i 2) (mk-random-stream)))] > (dotimes [_ 10] > (time > (let [m (apply hash-map (take i vals))]
If I understand this code correctly, hash-map will get called with i arguments, which means it will be populated with i/2 values... probably not the intended result? (I will admit, I've done a bit of hacking on mit-scheme but my clojure is not quite up to speed, so I might be missing a subtlety here).
From David Nolen:
> Here's another version that's almost as fast as Rich's and it even includes > the time for computing values.
> (defn mk-random-map-fast [n] > (let [m (transient {}) > r (new ec.util.MersenneTwisterFast) > ni (. r (nextInt))] > (loop [i 0 result m ni ni] > (if (= i n) > (persistent! m) > (recur (inc i) (assoc! m ni ni) (. r nextInt))))))
Aww, that's cheating! :-) More seriously, the operation I'd like to test here is the functional (non-destructive) update, so marking the structure as transient sort of defeats the purpose of the benchmark. It's true that in this particular case, the transient variant would be much faster. You also appear to do the real sample size, which might be why you observed this version to be slightly slower than Rich's.
I'll resetup my tests and try this out. Thanks for the comments!
> I'd suggest allocating more heap than the default JVM settings, or > running the JVM in server mode. I'm not sure what Haskell's > underlying implementation is, but I'd assume that it can grow its heap > (somewhat) arbitrarily, which the JVM cannot, beyond a fixed bound.
> On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:
> > I'd first like to state that I went into this exercise expecting > > Bagwell's Hash Array Mapped Tries to blow the competition out of the > > water; I'd get a fast functional map and give it to Haskell and people > > would rejoice.
> > Instead, I found a functional hash-map that was approximately four times > > slower than Haskell's IntMap. I find this puzzling, and would love to > > puzzle it out with you folks.
> > First, the test case:
> > (ns maptest (:gen-class))
> > (defn mk-random-stream [] > > (let [r (new ec.util.MersenneTwisterFast)] > > (repeatedly (fn [] (. r (nextInt))))))
> > We generate an n-sized hash tree, and then do n accesses, half of which > > match and half of which miss. We compute the sum of all the values that > > catch. We see the following characteristics:
> > I was careful to compile the Clojure code before attempting to run it. > > I also pre-allocate the entire random sequence to prevent it from > > muddying the execution time (although I am using Sean Luke's excellent > > fast Mersenne Twister [1]). The full string I used to invoke the > > program was:
> > The fact that it's similar in performance to the Haskell > > reimplementation (I'd say that the variance is not particularly > > significant and both are about on par performance wise) seems to > > validate the testing methodology, I hope, though the real proof of the > > pudding would lie in implementing Haskell-style IntMap in Java.
> > You can see the test code for the Haskell versions here:
> > So... what's up with these numbers? Is my methodology wrong? Is the > > garbage collector sucking? Is the algorithm just not as good as it > > makes out to be?
Excerpts from Michał Marczyk's message of Tue Feb 23 12:09:15 -0500 2010:
> Haskell's IntMap performs no hashing, but also doesn't allow non-Int > keys. It stands to reason it would be very fast at the cost of a > decrease in utility.
This is an oversight in the test harness. I went and made the test code hash the integers before inserting them. Since the hashes are, internally, 32-bit or 64-bit integers, I don't really see any reason why IntMap couldn't be "genericized" for general use; you'd just have to make sure you has a hash function for your key data type!
> By the way, I don't view this as a Haskell vs Clojure contest (GHC, > native code vs HotSpot, JVM), but rather one of data structures.
That is my hope! But unless I have totally misunderstood the algorithm in my Haskell reimplementation (which is quite possible, though I have reviewed it several times), the Haskell version is drastically slower than the Java implementation. It could be the case that GC/Runtime/etc is critical to the performance of the algorithm.
> Perhaps the fact that they simply aren't interchangeable (due to > IntMap's type constraint on keys -- as mentioned above, it's always > Ints, not even (Integral a)) makes this a bit pointless (take a HAMT, > add the constraint that keys are all machine word-sized integers and > the problem of hash collisions just doesn't exist and you might be > able to make it faster).
I was under the impression that Java hashCode() was also a machine sized integer. Is this not the case?
On 23 February 2010 20:36, Edward Z. Yang <ezy...@mit.edu> wrote:
> This is an oversight in the test harness. I went and made the test code > hash the integers before inserting them. Since the hashes are, > internally, 32-bit or 64-bit integers, I don't really see any reason why > IntMap couldn't be "genericized" for general use; you'd just have to > make sure you has a hash function for your key data type!
That's a start. Then you'd also have to take care of key collisions.
Incidentally, there's no point in hashing word-sized integers. (They're normally used as unchanged as their own hash values for purposes of insertion into hash maps.)
>> By the way, I don't view this as a Haskell vs Clojure contest (GHC, >> native code vs HotSpot, JVM), but rather one of data structures.
> That is my hope! But unless I have totally misunderstood the algorithm > in my Haskell reimplementation (which is quite possible, though I have > reviewed it several times), the Haskell version is drastically slower > than the Java implementation. It could be the case that GC/Runtime/etc > is critical to the performance of the algorithm.
I'm a bit confused now... Your initial numbers suggest there's not much difference. Anyway, if your Haskell version doesn't perform very well, perhaps it would be useful to ask some Haskell ninjas (from Haskell Café maybe) have a look at it. I'm not one myself, regrettably. :-)
> I was under the impression that Java hashCode() was also a machine > sized integer. Is this not the case?
What I had in mind was that hashCode() may be rather slow for more complex objects... For your particular test it doesn't matter. There's still the issue of collision resolution.
Excerpts from Michał Marczyk's message of Tue Feb 23 14:54:10 -0500 2010:
> That's a start. Then you'd also have to take care of key collisions.
Yup. As far I can tell, this would be easy, and hopefully not too much performance problem in the no-collision case with unboxing.
> Incidentally, there's no point in hashing word-sized integers. > (They're normally used as unchanged as their own hash values for > purposes of insertion into hash maps.)
If I recall correctly, hashing integers tends to be useless except for certain pathological distributions which cause all of the items to be put in the same bucket. I wasn't sure if Java had any protections against this; inspecting the source code, it looks like they just take the integer modulo the number of buckets.
> I'm a bit confused now... Your initial numbers suggest there's not > much difference. Anyway, if your Haskell version doesn't perform very > well, perhaps it would be useful to ask some Haskell ninjas (from > Haskell Café maybe) have a look at it. I'm not one myself, > regrettably. :-)
My initial numbers were shown to be wrong. :-) I haven't setup new numbers yet. I haven't gotten any bites from haskell-cafe yet.
Aight. Here are the updated benchmarks. Heap size set to 1G by default, Java run with -server; I also got myself some shiny statistics machinery in Haskell.
The really short summary: they're... about the same?
benchmarking intmap/32000 collecting 20 samples, 1 iterations each, in estimated 820.9419 ms bootstrapping with 100000 resamples mean: 35.01358 ms, lb 34.11121 ms, ub 36.47188 ms, ci 0.950 std dev: 2.629518 ms, lb 1.464889 ms, ub 3.820059 ms, ci 0.950 found 2 outliers among 20 samples (10.0%) 2 (10.0%) high severe variance introduced by outliers: 4.965% variance is slightly inflated by outliers
benchmarking intmap/64000 collecting 20 samples, 1 iterations each, in estimated 1.794639 s bootstrapping with 100000 resamples mean: 85.36590 ms, lb 83.84863 ms, ub 87.34270 ms, ci 0.950 std dev: 4.042450 ms, lb 3.071180 ms, ub 5.210041 ms, ci 0.950 variance introduced by outliers: 4.913% variance is slightly inflated by outliers
benchmarking intmap/128000 collecting 20 samples, 1 iterations each, in estimated 4.032903 s bootstrapping with 100000 resamples mean: 190.1984 ms, lb 189.6087 ms, ub 191.1099 ms, ci 0.950 std dev: 1.699876 ms, lb 1.121827 ms, ub 2.637079 ms, ci 0.950 found 1 outliers among 20 samples (5.0%) 1 (5.0%) high mild variance introduced by outliers: 4.750% variance is slightly inflated by outliers
benchmarking intmap/256000 collecting 20 samples, 1 iterations each, in estimated 9.389544 s bootstrapping with 100000 resamples mean: 439.0274 ms, lb 435.6792 ms, ub 444.7927 ms, ci 0.950 std dev: 10.04909 ms, lb 5.964673 ms, ub 14.68590 ms, ci 0.950 found 3 outliers among 20 samples (15.0%) 2 (10.0%) high mild 1 (5.0%) high severe variance introduced by outliers: 4.750% variance is slightly inflated by outliers
benchmarking intmap/512000 collecting 20 samples, 1 iterations each, in estimated 20.40116 s bootstrapping with 100000 resamples mean: 1.046665 s, lb 1.024223 s, ub 1.077329 s, ci 0.950 std dev: 60.99068 ms, lb 43.85342 ms, ub 78.53380 ms, ci 0.950 variance introduced by outliers: 4.942% variance is slightly inflated by outliers