Randomly select an element from a sorted-set (rand-nth (sorted-set ..))

1,268 views
Skip to first unread message

Paul Richards

unread,
Sep 26, 2011, 9:12:34 AM9/26/11
to Clojure
Hi,
How can I efficiently pick a random element from a sorted-set?

If I try rand-nth I get this:
user=> (rand-nth (sorted-set 1 2 3))
java.lang.UnsupportedOperationException: nth not supported on this
type: PersistentTreeSet (NO_SOURCE_FILE:0)

I can get this expression to work if I naively apply seq:
user=> (rand-nth (seq (sorted-set 1 2 3)))
1

However this performs very badly on large sets. Is there a more
efficient way to do this?

(I need to keep my elements in a sorted-set for other parts of the
application, where I rely on subseq.)

Michael Gardner

unread,
Sep 26, 2011, 12:18:27 PM9/26/11
to clo...@googlegroups.com
On Sep 26, 2011, at 8:12 AM, Paul Richards wrote:

> How can I efficiently pick a random element from a sorted-set?

If your sorted set is densely packed (if most possible values do appear in the set), then you could just pick a random value, see if it's in the set, and repeat until you find a match.

Otherwise, you could use a sorted-vector. I don't know if there's a good implementation out there, though, so you might have to roll your own.

Or you could use a sorted-set that can do a "random binary search", where it chooses between the first and second halves at random rather than by comparison with a particular value. Maybe you could subclass the Clojure sorted-set to do this, though I'm guessing it's not made for subclassing.

Benny Tsai

unread,
Sep 26, 2011, 1:04:32 PM9/26/11
to clo...@googlegroups.com
The reason that (rand-nth (seq (sorted-set 1 2 3))) performs badly on large sets is probably because nth is O(n) on sequences.  nth is much much faster on vectors, so I would suggest trying out (rand-nth (vec (sorted-set 1 2 3))) and see if that works for your application.

Jeremy Heiler

unread,
Sep 26, 2011, 1:13:07 PM9/26/11
to clo...@googlegroups.com

Try just getting the value with rand-int directly. The sorted-set uses
a tree map underneath, so look up time is consistent with a map. Also,
count is O(1).

(get foo (rand-int (count foo)))

Paul Richards

unread,
Sep 26, 2011, 5:12:00 PM9/26/11
to Clojure
On Sep 26, 2:12 pm, Paul Richards <paul.richa...@gmail.com> wrote:
> Hi,
> How can I efficiently pick a random element from a sorted-set?
>

I've come up with a bit of a hack, which relies on me not caring about
non-uniform distributions. If I create a custom comparator with a
"random" backdoor, I can select random elements from the sorted set:

(defn my-comp [a b]
(if (or (= :random a) (= :random b))
(- (* 2 (rand-int 2)) 1)
(compare a b)))

; Create test collection (of strings) using the special comparator
(def coll (apply sorted-set-by my-comp (map str (range 1 1000))))

(map #(first (subseq coll > %)) ["500" "200" "700" :random :random])
; Result: ("501" "201" "701" "626" "286")



Bonus question.. What is the distribution of the random selection?

(I suspect elements will be chosen with some probability like (1/2)^H
(where H is the height of the element within the red-black tree). I
should probably generate a histogram to find out..)

Paul Richards

unread,
Sep 26, 2011, 5:01:20 PM9/26/11
to Clojure
On Sep 26, 6:13 pm, Jeremy Heiler <jeremyhei...@gmail.com> wrote:
I feel I picked a bad example. My sorted-set does not contain
integers, the elements are (collections of) strings. Trying this
approach leads to a different failure:

user=> (get (sorted-set "a" "b" "c") 1)
java.lang.ClassCastException: java.lang.String cannot be cast to
java.lang.Number (NO_SOURCE_FILE:0)

Paul Richards

unread,
Sep 26, 2011, 4:58:59 PM9/26/11
to Clojure
This will replace an O(n) call to "nth" with an O(n) call to copy into
a vector, so still leaving me with O(n).

Benny Tsai

unread,
Sep 27, 2011, 12:04:36 AM9/27/11
to clo...@googlegroups.com
On Monday, September 26, 2011 2:58:59 PM UTC-6, Paul Richards wrote:
This will replace an O(n) call to "nth" with an O(n) call to copy into
a vector, so still leaving me with O(n).

Oops, right :)  If you're getting random elements out of the same sorted set multiple times, then it might be still be worth it to pay the cost of vectorizing the set once in exchange for faster subsequent random selections.  But if not, then I guess not so much.

Alan Malloy

unread,
Sep 27, 2011, 12:07:25 AM9/27/11
to Clojure
I think it's complicated by what you expect "chosen" to mean. If you
were randomly selecting a single element, you would only ever choose
one at the bottom of the tree; the internal nodes would never be
selected, right? Of course, you can't do that, since `get` does an
equality check before returning the item. And if you're selecting a
subseq, it depends on what test(s) you supply, and how many items you
take, and...

Anyway, this is sorta fun to generalize to wrap any other comparator:

(defn hackable-comparator [f]
(fn [& xs]
(if (some #{:random} xs)
(- (* 2 (rand-int 2)) 1)
(apply f xs))))

(def my-comp (hackable-comparator compare))
Reply all
Reply to author
Forward
0 new messages