One (bad?) possibility is creating a new vector out of sequence:
(vec (sort [1 2 3 4 5 6]))
I am asking because I need random access (nth ..) to huge sorted
vectors - which are now actually huge sequences after the sort, with
horrible O(n) random access time
Sean
Clojure's 'sort' actually copies your input collection into an
array, and then uses java.util.Arrays/sort to sort it in place.
It then wraps a seq around that array, which is good in that it
makes that array essentially immutable, but bad for indexed
lookup. You might consider using java.util.Array/sort yourself,
then you could get very fast indexed lookups into the sorted
array using 'aget'.
A bit uglier, but ought to be quite fast.
It does raise the question of whether it would be good for
ArraySeqs to implement Indexed. Then 'nth' would be fast on the
object that 'sort' returns.
--Chouser
It doesn't need to be that ugly:
(defn sorted-vec [coll]
(doto (into-array coll)
(java.util.Arrays/sort)
(vec)))
looks OK to me at least.
Doesn't need to be that ugly, this looks OK to me at least:
(defn sorted-vec [coll]
(doto (into-array coll)
(java.util.Arrays/sort)
(vec)))
It'd also be possible to generalise the return type by passing in a fn
to apply to the sorted array.
Unfortunately the above code does not work, because it returns the array and not the vector.
(defn sorted-vec
[coll]
(let [arr (into-array coll)]
(java.util.Array/sort arr)
(vec arr)))
How fast is vec on an array? Isn't there a LazilyPersistentVector which just wraps the array?
Sincerely
Meikel
user=> (def v (vec (take 10000 (repeatedly #(rand-int 100000)))))
#'user/v
user=> (time(dotimes [_ 1000] (sort v)))
"Elapsed time: 23945.682336 msecs"
nil
user=> (time(dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 6030.585433 msecs"
BTW we have state here (the native Java array). Is it thread safe ?
I almost doubled the speed of sorted-vec by using clojure.lang.RT/
toArray:
(defn sorted-vec2
[coll]
(let [arr (clojure.lang.RT/toArray coll)]
(java.util.Arrays/sort arr)
(vec arr)))
user=> (time(dotimes [_ 1000] (sorted-vec2 v)))
"Elapsed time: 3502.369933 msecs"
user=> (time(dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 5874.088425 msecs"
;Modified clojure.core.clj sort
(defn sort2
"Returns a sorted sequence of the items in coll. If no comparator is
supplied, uses compare. comparator must
implement java.util.Comparator."
([coll]
(if (seq coll)
(let [a (to-array coll)]
(java.util.Arrays/sort a)
(seq a))
()))
([#^java.util.Comparator comp coll]
(if (seq coll)
(let [a (to-array coll)]
(. java.util.Arrays (sort a comp))
(seq a))
())))
;Modified sort
user=> (time (dotimes[_ 1000](sort2 v)))
"Elapsed time: 3510.047755 msecs"
;Original sort
user=> (time (dotimes[_ 1000](sort v)))
"Elapsed time: 23872.07338 msecs"
;Sorted vec. Returns vector, not sequence
user=> (time (dotimes[_ 1000](sorted-vec2 v)))
"Elapsed time: 3534.578648 msecs"
Am 03.01.2010 um 09:29 schrieb Gabi:
> BTW we have state here (the native Java array). Is it thread safe ?
>
>> (defn sorted-vec
>> [coll]
>> (let [arr (into-array coll)]
>> (java.util.Array/sort arr)
>> (vec arr)))
Since the array is not accessible from outside the function I would suspect its use to be similar safe as transients. (Although not enforced...)
Sincerely
Meikel
Well, the state doesn’t escape the function so it’s similar to
transients in that regard (although without the guarantees that
transients provide, but I think that for a fn this short it’s OK).
I don’t see that on my machine. I’m running 1.1.0-master-SNAPSHOT with
Apple’s Java 6 VM in case that has anything to do with it, but here's
what I get (after running the tests several times to warm up hotspot):
user=> (def v (vec (take 10000 (repeatedly #(rand-int 100000)))))
user=> (time (dotimes [_ 1000] (sort v)))
"Elapsed time: 4376.471 msecs"
user=> (defn sorted-vec [coll]
(let [a (into-array coll)]
(java.util.Arrays/sort a)
(vec a)))
user=> (time (dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 3254.371 msecs"
user=> (defn sorted-vec-2 [coll]
(let [a (to-array coll)]
(java.util.Arrays/sort a)
(vec a)))
user=> (time (dotimes [_ 1000] (sorted-vec-2 v)))
"Elapsed time: 2599.63 msecs"
So sorted-vec is faster, but not an order of magnitude, and sorted-
vec-2 is faster again.
Another alternative that may be worth considering is leaving the data
in the array and using aget to access elements (this should give you O
(1) access times vs. O(log32N) AFAIK). This may be a solution if
you're not mutating the data in the array, but I'd be careful about
this optimisation unless it really gets a large speed boost for your
code.
Does it make any difference if you specify -XX:+DoEscapeAnalysis at
the command line? Various JVM 6 sub-versions enable and disable it by
default and it can make a pretty hefty difference if it isn't enabled.
-- Aaron
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
On Jan 3, 11:04 pm, Aaron Cohen <remled...@gmail.com> wrote:
> What JVM 6 sub-version are you using?
>
> Does it make any difference if you specify -XX:+DoEscapeAnalysis at
> the command line? Various JVM 6 sub-versions enable and disable it by
> default and it can make a pretty hefty difference if it isn't enabled.
>
> -- Aaron
>
$ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)
> Does it make any difference if you specify -XX:+DoEscapeAnalysis at
My clojure start script currently has:
java -server -classpath $CLASSPATH \
-XX:+EliminateLocks -XX:+UseBiasedLocking \
-XX:+UseCompressedOops -XX:+DoEscapeAnalysis \
jline.ConsoleRunner clojure.main $@
> the command line? Various JVM 6 sub-versions enable and disable it by
> default and it can make a pretty hefty difference if it isn't enabled.
And the following times are pretty representative (after warming up
the VM, the first couple of runs are slower):
user=> (time (dotimes [_ 1000] (sort v)))
"Elapsed time: 4145.987 msecs"
user=> (time (dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 3295.779 msecs"
user=> (time (dotimes [_ 1000] (sorted-vec-2 v)))
"Elapsed time: 2537.989 msecs"
I don't have easy access to any machines with Windows on to compare
though - sorry!