The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Dubious performance characteristics of hash-maps
 There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic. There was an error processing your request. Please try again. Standard view   View as tree
 13 messages

From:
To:
Cc:
Followup To:
Subject:
 Validation: For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon.

More options Feb 23 2010, 12:51 am
From: "Edward Z. Yang" <ezy...@MIT.EDU>
Date: Tue, 23 Feb 2010 00:51:21 -0500
Local: Tues, Feb 23 2010 12:51 am
Subject: Dubious performance characteristics of hash-maps
I'd first like to state that I went into this exercise expecting
Bagwell's Hash Array Mapped Tries to blow the competition out of the
water; I'd get a fast functional map and give it to Haskell and people
would rejoice.

Instead, I found a functional hash-map that was approximately four times
slower than Haskell's IntMap.  I find this puzzling, and would love to
puzzle it out with you folks.

First, the test case:

(ns maptest (:gen-class))

(defn mk-random-stream []
(let [r (new ec.util.MersenneTwisterFast)]
(repeatedly (fn [] (. r (nextInt))))))

(defn -main [sn]
(let [n  (Integer/parseInt sn)
rs (take (* 2 n) (mk-random-stream))]
(time (let [m  (apply hash-map (take (* n 2) (interleave rs rs)))]
(reduce
(fn [s k] (let [v (get m k)]
(+ s (if (nil? v) 0 v))))
0
(take n (drop (/ n 2) rs)))))))

We generate an n-sized hash tree, and then do n accesses, half of which
match and half of which miss.  We compute the sum of all the values that
catch.  We see the following characteristics:

32K     .56s            .30s           .09s
64K     .84s            .69s           .22s
128K    1.65s           1.50s           .46s
256K    2.94s           3.23s          1.17s
512K    7.32s           6.96s          2.70s

I was careful to compile the Clojure code before attempting to run it.
I also pre-allocate the entire random sequence to prevent it from
muddying the execution time (although I am using Sean Luke's excellent
fast Mersenne Twister [1]).  The full string I used to invoke the
program was:

/usr/lib/jvm/java-6-sun-1.6.0.16/bin/java -Dfile.encoding=UTF-8 \
-classpath \$CLASSES maptest \$@

The fact that it's similar in performance to the Haskell
reimplementation (I'd say that the variance is not particularly
significant and both are about on par performance wise) seems to
validate the testing methodology, I hope, though the real proof of the
pudding would lie in implementing Haskell-style IntMap in Java.

You can see the test code for the Haskell versions here:

So... what's up with these numbers?  Is my methodology wrong?  Is the
garbage collector sucking?  Is the algorithm just not as good as it
makes out to be?

Cheers,
Edward

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 9:41 am
From: Rich Hickey <richhic...@gmail.com>
Date: Tue, 23 Feb 2010 06:41:28 -0800 (PST)
Local: Tues, Feb 23 2010 9:41 am
Subject: Re: Dubious performance characteristics of hash-maps

On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:

Yes, the methodology is wrong, at least on the Clojure/JVM side.
First, benchmarking anything without -server is a waste of time.
Second, the JVM does dynamic profile-driven compilation - performance
on single cold runs is not indicative of optimal performance. Third,
putting the random stream gen call in the let doesn't preallocate the
sequence, nor move its execution there (and thus out of the timing),
as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to
structure the code more like your Haskell IntMap code [1] (i.e. no
interleave). Fifth, AOT compilation makes no difference in performance
- Clojure code is always compiled.

Try this:

;run with java -server -Xmx1024m

(defn mk-random-stream []
(let [r (new ec.util.MersenneTwisterFast)]
(repeatedly (fn [] (. r (nextInt))))))

(defn main [i]
(let [vals (vec (take (* i 2) (mk-random-stream)))]
(dotimes [_ 10]
(time
(let [m (apply hash-map (take i vals))]
(reduce (fn [s k] (+ s (m k 0)))
0
(take i (drop (/ i 2) vals))))))))

(doseq [n (range 15 20)]
(let [i (int (Math/pow 2 n))]
(println " I = " i)
(main i)))

I =  32768
"Elapsed time: 199.499 msecs"
"Elapsed time: 43.018 msecs"
"Elapsed time: 24.715 msecs"
"Elapsed time: 24.081 msecs"
"Elapsed time: 39.707 msecs"
"Elapsed time: 23.909 msecs"
"Elapsed time: 25.918 msecs"
"Elapsed time: 27.328 msecs"
"Elapsed time: 23.797 msecs"
"Elapsed time: 23.874 msecs"
I =  65536
"Elapsed time: 66.411 msecs"
"Elapsed time: 298.676 msecs"
"Elapsed time: 58.821 msecs"
"Elapsed time: 66.621 msecs"
"Elapsed time: 60.428 msecs"
"Elapsed time: 56.664 msecs"
"Elapsed time: 67.778 msecs"
"Elapsed time: 56.833 msecs"
"Elapsed time: 58.613 msecs"
"Elapsed time: 60.37 msecs"
I =  131072
"Elapsed time: 118.257 msecs"
"Elapsed time: 173.605 msecs"
"Elapsed time: 121.663 msecs"
"Elapsed time: 166.162 msecs"
"Elapsed time: 118.889 msecs"
"Elapsed time: 158.164 msecs"
"Elapsed time: 120.545 msecs"
"Elapsed time: 157.452 msecs"
"Elapsed time: 120.446 msecs"
"Elapsed time: 154.614 msecs"
I =  262144
"Elapsed time: 390.623 msecs"
"Elapsed time: 247.928 msecs"
"Elapsed time: 336.522 msecs"
"Elapsed time: 361.865 msecs"
"Elapsed time: 250.092 msecs"
"Elapsed time: 324.004 msecs"
"Elapsed time: 619.761 msecs"
"Elapsed time: 250.708 msecs"
"Elapsed time: 253.414 msecs"
"Elapsed time: 247.428 msecs"
I =  524288
"Elapsed time: 752.528 msecs"
"Elapsed time: 1254.839 msecs"
"Elapsed time: 1126.026 msecs"
"Elapsed time: 589.351 msecs"
"Elapsed time: 532.753 msecs"
"Elapsed time: 532.615 msecs"
"Elapsed time: 554.859 msecs"
"Elapsed time: 529.93 msecs"
"Elapsed time: 525.966 msecs"
"Elapsed time: 551.065 msecs"

Note as HotSpot kicks in to optimize the code.

Note also the near-linearity of the times with increase in N,
indicating the promised near-constant-time performance (unlike the
IntMap times).

Regards,

Rich

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 9:51 am
From: Mark Tomko <mjt0...@gmail.com>
Date: Tue, 23 Feb 2010 06:51:46 -0800 (PST)
Local: Tues, Feb 23 2010 9:51 am
Subject: Re: Dubious performance characteristics of hash-maps
implementation may allow the heap to grow arbitrarily, but the default
JVM heap size is very limited, which may result in poor memory
performance.

On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 7:59 am
From: Meikel Brandmeyer <m...@kotka.de>
Date: Tue, 23 Feb 2010 04:59:42 -0800 (PST)
Local: Tues, Feb 23 2010 7:59 am
Subject: Re: Dubious performance characteristics of hash-maps
Hi,

On Feb 23, 6:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:

This will not have any effect on runtime execution, because the
Clojure code is compiled anyway upon load. So this only effect startup-
time, which you don't measure.

> I also pre-allocate the entire random sequence to prevent it from
> muddying the execution time.

In fact you are not. You create a lazy sequence (repeatedly + take)
which is completely realised in the time call. However I don't know

Without knowing much about benchmarking and the like, here my try on
the code:

(defn -main
[n]
(let [n       (Integer/parseInt n)
n2      (* 2 n)
r       (ec.util.MersenneTwisterFast.)
rs      (take n2 (repeatedly #(.nextInt r)))
dropped (doall (drop n rs))]
(time
(let [m (zipmap rs rs)]
(reduce #(+ %1 (get m %2 0)) 0 dropped)))))

This should get rid of all laziness in the time'd execution and get as
close to the map operations as possible.

Usual disclaimers about "broken microbenchmarks", "not averaging",
"using a 'cold' JIT" etc. apply. I'm no expert here so I won't comment
on that. However I recommend investigating the presentations by Cliff
Click on these topics to avoid common pitfalls.

Sincerely
Meikel

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 10:55 am
From: David Nolen <dnolen.li...@gmail.com>
Date: Tue, 23 Feb 2010 10:55:17 -0500
Local: Tues, Feb 23 2010 10:55 am
Subject: Re: Dubious performance characteristics of hash-maps

Here's another version that's almost as fast as Rich's and it even includes
the time for computing values.

(defn mk-random-map-fast [n]
(let [m (transient {})
r (new ec.util.MersenneTwisterFast)
ni (. r (nextInt))]
(loop [i 0 result m ni ni]
(if (= i n)
(persistent! m)
(recur (inc i) (assoc! m ni ni) (. r nextInt))))))

(defn fast-run [n]
(time
(let [m (mk-random-map-fast n)
rs (vals m)]
(reduce
(fn [s k] (let [v (get m k)]
(+ s (if (nil? v) 0 v))))
0
(take n (drop (/ n 2) rs))))))

; MacBook Pro Core 2 Duo 2.53ghz JDK 1.6 64bit

; ~40ms
(dotimes [_ 10]
(fast-run 32000))

; ~90ms
(dotimes [_ 10]
(fast-run 64000))

; ~180ms
(dotimes [_ 10]
(fast-run 128000))

; ~330ms
(dotimes [_ 10]
(fast-run 256000))

; ~700ms
(dotimes [_ 10]
(fast-run 512000))

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 9:29 am
From: Mark Tomko <mjt0...@gmail.com>
Date: Tue, 23 Feb 2010 06:29:46 -0800 (PST)
Local: Tues, Feb 23 2010 9:29 am
Subject: Re: Dubious performance characteristics of hash-maps
I'd suggest allocating more heap than the default JVM settings, or
running the JVM in server mode.  I'm not sure what Haskell's
underlying implementation is, but I'd assume that it can grow its heap
(somewhat) arbitrarily, which the JVM cannot, beyond a fixed bound.

On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 12:09 pm
From: Michał Marczyk <michal.marc...@gmail.com>
Date: Tue, 23 Feb 2010 18:09:15 +0100
Local: Tues, Feb 23 2010 12:09 pm
Subject: Re: Dubious performance characteristics of hash-maps
Haskell's IntMap performs no hashing, but also doesn't allow non-Int
keys. It stands to reason it would be very fast at the cost of a
decrease in utility.

By the way, I don't view this as a Haskell vs Clojure contest (GHC,
native code vs HotSpot, JVM), but rather one of data structures.
Perhaps the fact that they simply aren't interchangeable (due to
IntMap's type constraint on keys -- as mentioned above, it's always
Ints, not even (Integral a)) makes this a bit pointless (take a HAMT,
add the constraint that keys are all machine word-sized integers and
the problem of hash collisions just doesn't exist and you might be
able to make it faster).

Despite all that, it might be interesting (if only academically) to
compare worst-case asymptotic performance of IntMaps and HAMTs with
very large datasets, but writing appropriate test code could be
problematic, since it becomes rather important not to generate heaps
of garbage along the way (the collection of which could contribute
significantly to the observed performance characteristics of the
code).

Sincerely,
Michał

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 12:51 pm
From: "Edward Z. Yang" <ezy...@MIT.EDU>
Date: Tue, 23 Feb 2010 12:51:10 -0500
Local: Tues, Feb 23 2010 12:51 pm
Subject: Re: Dubious performance characteristics of hash-maps
Hello!

Excerpts from Rich Hickey's message of Tue Feb 23 09:41:28 -0500 2010:

> Yes, the methodology is wrong, at least on the Clojure/JVM side.
> First, benchmarking anything without -server is a waste of time.
> Second, the JVM does dynamic profile-driven compilation - performance
> on single cold runs is not indicative of optimal performance. Third,
> putting the random stream gen call in the let doesn't preallocate the
> sequence, nor move its execution there (and thus out of the timing),
> as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to
> structure the code more like your Haskell IntMap code [1] (i.e. no
> interleave). Fifth, AOT compilation makes no difference in performance
> - Clojure code is always compiled.

Great, I was hoping that was the case.  The HotSpot JVM is definitely
a big part of the mix, and I hadn't realized that it does dynamic profile
driven compilation.  I went off and read Cliff Click's excellent presentation
on the subject.

Re laziness, on closer inspection it looks like the equivalent Haskell
is also generating the lazy values inside the loop, so actually they're
to force the random generation before timing.

> Try this:

> ;run with java -server -Xmx1024m

Good catch.  I will bump up the default heap size in the Haskell RTS
to 1G as well.

> (defn main [i]
>   (let [vals (vec (take (* i 2) (mk-random-stream)))]
>     (dotimes [_ 10]
>       (time
>        (let [m (apply hash-map (take i vals))]

If I understand this code correctly, hash-map will get called with
i arguments, which means it will be populated with i/2 values... probably
not the intended result? (I will admit, I've done a bit of hacking on
mit-scheme but my clojure is not quite up to speed, so I might be missing a
subtlety here).

From David Nolen:

> Here's another version that's almost as fast as Rich's and it even includes
> the time for computing values.

> (defn mk-random-map-fast [n]
>   (let [m (transient {})
>         r (new ec.util.MersenneTwisterFast)
>         ni (. r (nextInt))]
>     (loop [i 0 result m ni ni]
>       (if (= i n)
>         (persistent! m)
>         (recur (inc i) (assoc! m ni ni) (. r nextInt))))))

Aww, that's cheating! :-)  More seriously, the operation I'd like to test here
is the functional (non-destructive) update, so marking the structure as transient
sort of defeats the purpose of the benchmark.  It's true that in this particular
case, the transient variant would be much faster.  You also appear to do the real
sample size, which might be why you observed this version to be slightly slower
than Rich's.

I'll resetup my tests and try this out.  Thanks for the comments!

Cheers,
Edward

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 1:22 pm
From: Daniel Ribeiro <dan...@gmail.com>
Date: Tue, 23 Feb 2010 10:22:00 -0800 (PST)
Subject: Re: Dubious performance characteristics of hash-maps
IBM has this very comprehensive guide on java benchmarking:
http://www.ibm.com/developerworks/java/library/j-benchmark1.html

It is quite long, but there are so many parameters and optimizations
on the JVM that overlooking them is not a real option for a real
benchmark.

Regards,
Daniel

On Feb 23, 11:29 am, Mark Tomko <mjt0...@gmail.com> wrote:

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 2:36 pm
From: "Edward Z. Yang" <ezy...@MIT.EDU>
Date: Tue, 23 Feb 2010 14:36:41 -0500
Local: Tues, Feb 23 2010 2:36 pm
Subject: Re: Dubious performance characteristics of hash-maps
Excerpts from Michał Marczyk's message of Tue Feb 23 12:09:15 -0500 2010:

> Haskell's IntMap performs no hashing, but also doesn't allow non-Int
> keys. It stands to reason it would be very fast at the cost of a
> decrease in utility.

This is an oversight in the test harness.  I went and made the test code
hash the integers before inserting them.  Since the hashes are,
internally, 32-bit or 64-bit integers, I don't really see any reason why
IntMap couldn't be "genericized" for general use; you'd just have to
make sure you has a hash function for your key data type!

> By the way, I don't view this as a Haskell vs Clojure contest (GHC,
> native code vs HotSpot, JVM), but rather one of data structures.

That is my hope!  But unless I have totally misunderstood the algorithm
in my Haskell reimplementation (which is quite possible, though I have
reviewed it several times), the Haskell version is drastically slower
than the Java implementation.  It could be the case that GC/Runtime/etc
is critical to the performance of the algorithm.

> Perhaps the fact that they simply aren't interchangeable (due to
> IntMap's type constraint on keys -- as mentioned above, it's always
> Ints, not even (Integral a)) makes this a bit pointless (take a HAMT,
> add the constraint that keys are all machine word-sized integers and
> the problem of hash collisions just doesn't exist and you might be
> able to make it faster).

I was under the impression that Java hashCode() was also a machine
sized integer.  Is this not the case?

Cheers,
Edward

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 2:54 pm
From: Michał Marczyk <michal.marc...@gmail.com>
Date: Tue, 23 Feb 2010 20:54:10 +0100
Local: Tues, Feb 23 2010 2:54 pm
Subject: Re: Dubious performance characteristics of hash-maps
On 23 February 2010 20:36, Edward Z. Yang <ezy...@mit.edu> wrote:

> This is an oversight in the test harness.  I went and made the test code
> hash the integers before inserting them.  Since the hashes are,
> internally, 32-bit or 64-bit integers, I don't really see any reason why
> IntMap couldn't be "genericized" for general use; you'd just have to
> make sure you has a hash function for your key data type!

That's a start. Then you'd also have to take care of key collisions.

Incidentally, there's no point in hashing word-sized integers.
(They're normally used as unchanged as their own hash values for
purposes of insertion into hash maps.)

>> By the way, I don't view this as a Haskell vs Clojure contest (GHC,
>> native code vs HotSpot, JVM), but rather one of data structures.

> That is my hope!  But unless I have totally misunderstood the algorithm
> in my Haskell reimplementation (which is quite possible, though I have
> reviewed it several times), the Haskell version is drastically slower
> than the Java implementation.  It could be the case that GC/Runtime/etc
> is critical to the performance of the algorithm.

I'm a bit confused now... Your initial numbers suggest there's not
well, perhaps it would be useful to ask some Haskell ninjas (from
Haskell Café maybe) have a look at it. I'm not one myself,
regrettably. :-)

> I was under the impression that Java hashCode() was also a machine
> sized integer.  Is this not the case?

What I had in mind was that hashCode() may be rather slow for more
complex objects... For your particular test it doesn't matter. There's
still the issue of collision resolution.

All best,
Michał

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 3:23 pm
From: "Edward Z. Yang" <ezy...@MIT.EDU>
Date: Tue, 23 Feb 2010 15:23:14 -0500
Local: Tues, Feb 23 2010 3:23 pm
Subject: Re: Dubious performance characteristics of hash-maps
Excerpts from Michał Marczyk's message of Tue Feb 23 14:54:10 -0500 2010:

> That's a start. Then you'd also have to take care of key collisions.

Yup.  As far I can tell, this would be easy, and hopefully not too
much performance problem in the no-collision case with unboxing.

> Incidentally, there's no point in hashing word-sized integers.
> (They're normally used as unchanged as their own hash values for
> purposes of insertion into hash maps.)

If I recall correctly, hashing integers tends to be useless except
for certain pathological distributions which cause all of the items
to be put in the same bucket.  I wasn't sure if Java had any protections
against this; inspecting the source code, it looks like they just take
the integer modulo the number of buckets.

> I'm a bit confused now... Your initial numbers suggest there's not
> well, perhaps it would be useful to ask some Haskell ninjas (from
> Haskell Café maybe) have a look at it. I'm not one myself,
> regrettably. :-)

My initial numbers were shown to be wrong. :-) I haven't setup new
numbers yet.  I haven't gotten any bites from haskell-cafe yet.

Cheers,
Edward

To post a message you must first join this group.
You do not have the permission required to post.
More options Feb 23 2010, 6:49 pm
From: "Edward Z. Yang" <ezy...@MIT.EDU>
Date: Tue, 23 Feb 2010 18:49:39 -0500
Local: Tues, Feb 23 2010 6:49 pm
Subject: Re: Dubious performance characteristics of hash-maps
Aight. Here are the updated benchmarks.  Heap size set to 1G by
default, Java run with -server; I also got myself some shiny

The really short summary:  they're... about the same?

Summary:

IntMap       Java HAMT (32K-512K)  Java HAMT (512K-32K)
32K     .035s       .100s                  .042s
64K     .085s       .077s                  .088s
128K     .190s       .173s                  .166s
256K     .439s       .376s                  .483s
512K    1.047s      1.107s                 1.113s

In detail, IntMap:

benchmarking intmap/32000
collecting 20 samples, 1 iterations each, in estimated 820.9419 ms
bootstrapping with 100000 resamples
mean: 35.01358 ms, lb 34.11121 ms, ub 36.47188 ms, ci 0.950
std dev: 2.629518 ms, lb 1.464889 ms, ub 3.820059 ms, ci 0.950
found 2 outliers among 20 samples (10.0%)
2 (10.0%) high severe
variance introduced by outliers: 4.965%
variance is slightly inflated by outliers

benchmarking intmap/64000
collecting 20 samples, 1 iterations each, in estimated 1.794639 s
bootstrapping with 100000 resamples
mean: 85.36590 ms, lb 83.84863 ms, ub 87.34270 ms, ci 0.950
std dev: 4.042450 ms, lb 3.071180 ms, ub 5.210041 ms, ci 0.950
variance introduced by outliers: 4.913%
variance is slightly inflated by outliers

benchmarking intmap/128000
collecting 20 samples, 1 iterations each, in estimated 4.032903 s
bootstrapping with 100000 resamples
mean: 190.1984 ms, lb 189.6087 ms, ub 191.1099 ms, ci 0.950
std dev: 1.699876 ms, lb 1.121827 ms, ub 2.637079 ms, ci 0.950
found 1 outliers among 20 samples (5.0%)
1 (5.0%) high mild
variance introduced by outliers: 4.750%
variance is slightly inflated by outliers

benchmarking intmap/256000
collecting 20 samples, 1 iterations each, in estimated 9.389544 s
bootstrapping with 100000 resamples
mean: 439.0274 ms, lb 435.6792 ms, ub 444.7927 ms, ci 0.950
std dev: 10.04909 ms, lb 5.964673 ms, ub 14.68590 ms, ci 0.950
found 3 outliers among 20 samples (15.0%)
2 (10.0%) high mild
1 (5.0%) high severe
variance introduced by outliers: 4.750%
variance is slightly inflated by outliers

benchmarking intmap/512000
collecting 20 samples, 1 iterations each, in estimated 20.40116 s
bootstrapping with 100000 resamples
mean: 1.046665 s, lb 1.024223 s, ub 1.077329 s, ci 0.950
std dev: 60.99068 ms, lb 43.85342 ms, ub 78.53380 ms, ci 0.950
variance introduced by outliers: 4.942%
variance is slightly inflated by outliers

And now Java-implemented HAMT:

I =  32000
"Elapsed time: 476.83896 msecs"
"Elapsed time: 129.631101 msecs"
"Elapsed time: 74.291231 msecs"
"Elapsed time: 68.319839 msecs"
"Elapsed time: 41.243881 msecs"
"Elapsed time: 45.591625 msecs"
"Elapsed time: 58.740188 msecs"
"Elapsed time: 38.914531 msecs"
"Elapsed time: 35.862171 msecs"
"Elapsed time: 41.271539 msecs"
I =  64000
"Elapsed time: 77.344356 msecs"
"Elapsed time: 76.803971 msecs"
"Elapsed time: 93.124551 msecs"
"Elapsed time: 71.330153 msecs"
"Elapsed time: 81.106194 msecs"
"Elapsed time: 90.538605 msecs"
"Elapsed time: 70.184368 msecs"
"Elapsed time: 70.196166 msecs"
"Elapsed time: 78.350237 msecs"
"Elapsed time: 68.696711 msecs"
I =  128000
"Elapsed time: 145.116415 msecs"
"Elapsed time: 357.026015 msecs"
"Elapsed time: 145.325526 msecs"
"Elapsed time: 194.460519 msecs"
"Elapsed time: 144.369214 msecs"
"Elapsed time: 143.871316 msecs"
"Elapsed time: 175.60394 msecs"
"Elapsed time: 144.483901 msecs"
"Elapsed time: 147.329535 msecs"
"Elapsed time: 144.782009 msecs"
I =  256000
"Elapsed time: 653.918336 msecs"
"Elapsed time: 388.307165 msecs"
"Elapsed time: 361.146159 msecs"
"Elapsed time: 316.747299 msecs"
"Elapsed time: 313.326512 msecs"
"Elapsed time: 329.156172 msecs"
"Elapsed time: 323.883539 msecs"
"Elapsed time: 359.509225 msecs"
"Elapsed time: 360.504021 msecs"
"Elapsed time: 359.890434 msecs"
I =  512000
"Elapsed time: 1368.259749 msecs"
"Elapsed time: 1357.138096 msecs"
"Elapsed time: 1164.463425 msecs"
"Elapsed time: 1087.096846 msecs"
"Elapsed time: 1421.036514 msecs"
"Elapsed time: 965.061633 msecs"
"Elapsed time: 1076.974038 msecs"
"Elapsed time: 828.068692 msecs"
"Elapsed time: 987.893417 msecs"
"Elapsed time: 810.575108 msecs

Aaand in the other direction

I =  512000
"Elapsed time: 3466.817659 msecs"
"Elapsed time: 1047.325342 msecs"
"Elapsed time: 870.286512 msecs"
"Elapsed time: 985.984497 msecs"
"Elapsed time: 921.926887 msecs"
"Elapsed time: 772.401926 msecs"
"Elapsed time: 790.609548 msecs"
"Elapsed time: 747.57967 msecs"
"Elapsed time: 777.731826 msecs"
"Elapsed time: 754.598516 msecs"
I =  256000
"Elapsed time: 476.565748 msecs"
"Elapsed time: 439.623009 msecs"
"Elapsed time: 548.94325 msecs"
"Elapsed time: 472.441847 msecs"
"Elapsed time: 577.204951 msecs"
"Elapsed time: 548.692722 msecs"
"Elapsed time: 416.976504 msecs"
"Elapsed time: 523.522705 msecs"
"Elapsed time: 420.375852 msecs"
"Elapsed time: 404.929297 msecs"
I =  128000
"Elapsed time: 157.033585 msecs"
"Elapsed time: 191.957237 msecs"
"Elapsed time: 179.999572 msecs"
"Elapsed time: 156.903536 msecs"
"Elapsed time: 160.216012 msecs"
"Elapsed time: 180.149246 msecs"
"Elapsed time: 153.392229 msecs"
"Elapsed time: 166.831646 msecs"
"Elapsed time: 153.221276 msecs"
"Elapsed time: 157.149316 msecs"
I =  64000
"Elapsed time: 73.265278 msecs"
"Elapsed time: 72.142408 msecs"
"Elapsed time: 126.916014 msecs"
"Elapsed time: 75.661313 msecs"
"Elapsed time: 75.418126 msecs"
"Elapsed time: 76.240255 msecs"
"Elapsed time: 112.986045 msecs"
"Elapsed time: 72.012846 msecs"
"Elapsed time: 72.265045 msecs"
"Elapsed time: 121.865843 msecs"
I =  32000
"Elapsed time: 41.439609 msecs"
"Elapsed time: 40.39077 msecs"
"Elapsed time: 37.374235 msecs"
"Elapsed time: 74.785005 msecs"
"Elapsed time: 36.891967 msecs"
"Elapsed time: 35.746256 msecs"
"Elapsed time: 37.571191 msecs"
"Elapsed time: 37.394487 msecs"
"Elapsed time: 36.929189 msecs"
"Elapsed time: 37.027318 msecs"

still abysmal.

Cheers,
Edward