Sequence to Vector

58 views
Skip to first unread message

Gabi

unread,
Jan 1, 2010, 7:19:47 PM1/1/10
to Clojure
What is the preferred way getting a vector back from sequence, after a
sequence producing operation (like sort)? Does using (vec..) on a
sequence that was a vector is costly?

One (bad?) possibility is creating a new vector out of sequence:

(vec (sort [1 2 3 4 5 6]))

I am asking because I need random access (nth ..) to huge sorted
vectors - which are now actually huge sequences after the sort, with
horrible O(n) random access time

Sean Devlin

unread,
Jan 1, 2010, 10:45:01 PM1/1/10
to Clojure
You vec approach is the current solution. I'm 95% sure you'll have to
pay the O(n) once, but you'll be good to go after that.

Sean

Chouser

unread,
Jan 2, 2010, 12:33:40 AM1/2/10
to clo...@googlegroups.com
On Fri, Jan 1, 2010 at 7:19 PM, Gabi <bugs...@gmail.com> wrote:
>
> I am asking because I need random access (nth ..) to huge sorted
> vectors - which are now actually huge sequences after the sort, with
> horrible O(n) random access time

Clojure's 'sort' actually copies your input collection into an
array, and then uses java.util.Arrays/sort to sort it in place.
It then wraps a seq around that array, which is good in that it
makes that array essentially immutable, but bad for indexed
lookup. You might consider using java.util.Array/sort yourself,
then you could get very fast indexed lookups into the sorted
array using 'aget'.

A bit uglier, but ought to be quite fast.

It does raise the question of whether it would be good for
ArraySeqs to implement Indexed. Then 'nth' would be fast on the
object that 'sort' returns.

--Chouser

ianp

unread,
Jan 2, 2010, 2:15:11 PM1/2/10
to Clojure
> A bit uglier, but ought to be quite fast.

It doesn't need to be that ugly:

(defn sorted-vec [coll]
(doto (into-array coll)
(java.util.Arrays/sort)
(vec)))

looks OK to me at least.

ianp

unread,
Jan 2, 2010, 2:17:18 PM1/2/10
to Clojure
> A bit uglier, but ought to be quite fast.

Doesn't need to be that ugly, this looks OK to me at least:

(defn sorted-vec [coll]
(doto (into-array coll)
(java.util.Arrays/sort)
(vec)))

It'd also be possible to generalise the return type by passing in a fn
to apply to the sorted array.

Meikel Brandmeyer

unread,
Jan 2, 2010, 4:04:29 PM1/2/10
to clo...@googlegroups.com
Hi,

Unfortunately the above code does not work, because it returns the array and not the vector.

(defn sorted-vec
[coll]
(let [arr (into-array coll)]
(java.util.Array/sort arr)
(vec arr)))

How fast is vec on an array? Isn't there a LazilyPersistentVector which just wraps the array?

Sincerely
Meikel

Gabi

unread,
Jan 3, 2010, 3:29:16 AM1/3/10
to Clojure
The sorted-vec is ~4 times faster than Clojure's sort (on my humble
old machine):

user=> (def v (vec (take 10000 (repeatedly #(rand-int 100000)))))
#'user/v
user=> (time(dotimes [_ 1000] (sort v)))
"Elapsed time: 23945.682336 msecs"
nil
user=> (time(dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 6030.585433 msecs"

BTW we have state here (the native Java array). Is it thread safe ?

Gabi

unread,
Jan 3, 2010, 6:54:03 AM1/3/10
to Clojure
I investigated a little bit more. Seems that (into-array) is slows
things down because it seqs (un-necessarily?) that vector before
passing to clojure.lang.RT/seqToTypedArray.

I almost doubled the speed of sorted-vec by using clojure.lang.RT/
toArray:

(defn sorted-vec2
[coll]
(let [arr (clojure.lang.RT/toArray coll)]
(java.util.Arrays/sort arr)
(vec arr)))

user=> (time(dotimes [_ 1000] (sorted-vec2 v)))
"Elapsed time: 3502.369933 msecs"

user=> (time(dotimes [_ 1000] (sorted-vec v)))

"Elapsed time: 5874.088425 msecs"

Gabi

unread,
Jan 3, 2010, 7:52:00 AM1/3/10
to Clojure
More findings: The reason that the Clojure's original sort is 8 times
slower than sorted-vec2 is only because Clojure uses its own
comparator by default, instead of using Java's default comparator.
Otherwise it's same performance.

;Modified clojure.core.clj sort
(defn sort2
"Returns a sorted sequence of the items in coll. If no comparator is
supplied, uses compare. comparator must
implement java.util.Comparator."
([coll]
(if (seq coll)
(let [a (to-array coll)]
(java.util.Arrays/sort a)
(seq a))
()))
([#^java.util.Comparator comp coll]
(if (seq coll)
(let [a (to-array coll)]
(. java.util.Arrays (sort a comp))
(seq a))
())))

;Modified sort
user=> (time (dotimes[_ 1000](sort2 v)))
"Elapsed time: 3510.047755 msecs"

;Original sort
user=> (time (dotimes[_ 1000](sort v)))
"Elapsed time: 23872.07338 msecs"

;Sorted vec. Returns vector, not sequence
user=> (time (dotimes[_ 1000](sorted-vec2 v)))
"Elapsed time: 3534.578648 msecs"

Meikel Brandmeyer

unread,
Jan 3, 2010, 4:11:12 AM1/3/10
to clo...@googlegroups.com
Hi,

Am 03.01.2010 um 09:29 schrieb Gabi:

> BTW we have state here (the native Java array). Is it thread safe ?
>

>> (defn sorted-vec
>> [coll]
>> (let [arr (into-array coll)]
>> (java.util.Array/sort arr)
>> (vec arr)))

Since the array is not accessible from outside the function I would suspect its use to be similar safe as transients. (Although not enforced...)

Sincerely
Meikel

ianp

unread,
Jan 3, 2010, 10:40:43 AM1/3/10
to Clojure
> BTW we have state here (the native Java array). Is it thread safe ?

Well, the state doesn’t escape the function so it’s similar to
transients in that regard (although without the guarantees that
transients provide, but I think that for a fn this short it’s OK).

ianp

unread,
Jan 3, 2010, 10:51:20 AM1/3/10
to Clojure
> More findings: The reason that the Clojure's original sort is  8 times slower

I don’t see that on my machine. I’m running 1.1.0-master-SNAPSHOT with
Apple’s Java 6 VM in case that has anything to do with it, but here's
what I get (after running the tests several times to warm up hotspot):

user=> (def v (vec (take 10000 (repeatedly #(rand-int 100000)))))

user=> (time (dotimes [_ 1000] (sort v)))
"Elapsed time: 4376.471 msecs"

user=> (defn sorted-vec [coll]
(let [a (into-array coll)]
(java.util.Arrays/sort a)
(vec a)))
user=> (time (dotimes [_ 1000] (sorted-vec v)))
"Elapsed time: 3254.371 msecs"

user=> (defn sorted-vec-2 [coll]


(let [a (to-array coll)]
(java.util.Arrays/sort a)

(vec a)))
user=> (time (dotimes [_ 1000] (sorted-vec-2 v)))
"Elapsed time: 2599.63 msecs"

So sorted-vec is faster, but not an order of magnitude, and sorted-
vec-2 is faster again.

Another alternative that may be worth considering is leaving the data
in the array and using aget to access elements (this should give you O
(1) access times vs. O(log32N) AFAIK). This may be a solution if
you're not mutating the data in the array, but I'd be careful about
this optimisation unless it really gets a large speed boost for your
code.

Gabi

unread,
Jan 3, 2010, 4:00:35 PM1/3/10
to Clojure
I've double checked on my machine (Vista. JVM 6. Clojure 1.1.0).
Clojure's sort is is 4 to 5 times slower than sorted-vec2
Maybe somebody with a Vista machine double check this?

Aaron Cohen

unread,
Jan 3, 2010, 4:04:17 PM1/3/10
to clo...@googlegroups.com
What JVM 6 sub-version are you using?

Does it make any difference if you specify -XX:+DoEscapeAnalysis at
the command line? Various JVM 6 sub-versions enable and disable it by
default and it can make a pretty hefty difference if it isn't enabled.

-- Aaron

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

Gabi

unread,
Jan 3, 2010, 4:20:05 PM1/3/10
to Clojure
"1.6.0_17" .It doesn't support this flag:
Unrecognized VM option '+DoEscapeAnalysis'

On Jan 3, 11:04 pm, Aaron Cohen <remled...@gmail.com> wrote:
> What JVM 6 sub-version are you using?
>
> Does it make any difference if you specify -XX:+DoEscapeAnalysis at
> the command line? Various JVM 6 sub-versions enable and disable it by
> default and it can make a pretty hefty difference if it isn't enabled.
>
> -- Aaron
>

Gabi

unread,
Jan 3, 2010, 4:26:39 PM1/3/10
to Clojure
It turns out I run the client version.
When running the server version (-server) the performance of sort is 4
times better.

Gabi

unread,
Jan 3, 2010, 4:32:27 PM1/3/10
to Clojure
But that does not exclude the fact that sorted-vec-2 is about %75
times faster than sort

ianp

unread,
Jan 3, 2010, 6:44:20 PM1/3/10
to Clojure
> What JVM 6 sub-version are you using?

$ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

> Does it make any difference if you specify -XX:+DoEscapeAnalysis at

My clojure start script currently has:

java -server -classpath $CLASSPATH \
-XX:+EliminateLocks -XX:+UseBiasedLocking \
-XX:+UseCompressedOops -XX:+DoEscapeAnalysis \
jline.ConsoleRunner clojure.main $@

> the command line? Various JVM 6 sub-versions enable and disable it by
> default and it can make a pretty hefty difference if it isn't enabled.

And the following times are pretty representative (after warming up
the VM, the first couple of runs are slower):

user=> (time (dotimes [_ 1000] (sort v)))

"Elapsed time: 4145.987 msecs"

user=> (time (dotimes [_ 1000] (sorted-vec v)))

"Elapsed time: 3295.779 msecs"

user=> (time (dotimes [_ 1000] (sorted-vec-2 v)))

"Elapsed time: 2537.989 msecs"

I don't have easy access to any machines with Windows on to compare
though - sorry!

Reply all
Reply to author
Forward
0 new messages