a lazy version of shuffle.

Sunil S Nandihalli

unread,

Jul 11, 2011, 6:44:41 AM7/11/11

to clo...@googlegroups.com

Hello everybody,

I think a lazy version of shuffle would be a good addtion to the core. I tried googling for one and found

http://ytakenaka.blogspot.com/2011/05/lazy-shuffle-clojure.html which was infact slower than original shuffle and was unable to reproduce the performance claims made there in. In fact the experiment showed that the lazy-shuffle was about 10 times slower (of course depends on the size of the collection to be shuffled).. Can somebody help me ?

Thanks,

Sunil.

Tassilo Horn

unread,

Jul 11, 2011, 6:57:44 AM7/11/11

to clo...@googlegroups.com

Sunil S Nandihalli <sunil.na...@gmail.com> writes:

Hi Sunil,

> I think a lazy version of shuffle would be a good addtion to the
> core. I tried googling for one and found
> http://ytakenaka.blogspot.com/2011/05/lazy-shuffle-clojure.html which
> was infact slower than original shuffle and was unable to reproduce
> the performance claims made there in.

I think there is no way to implement a "good" lazy version of shuffle.
Correct shuffling, i.e., the result should be a real pseudo-random
permutation of all elements, can only be defined on finite seqs anyway.
And there, one has to realize the given seq. In the code you cite,
there's

(defn lazy-shuffle [lst fetch-len]
(let [len (count lst)
^^^^^^^^^^^

which does exactly that. So here only the cheap creation of the
shuffled seq is lazy, but the possibly expensive calculation of the
given seq is (and has to be done) directly.

Bye,
Tassilo

Sunil S Nandihalli

unread,

Jul 11, 2011, 7:05:19 AM7/11/11

to clo...@googlegroups.com

yea true, the full sequence has to be realized before it is shuffled .. but however the thing is that I am only interested in the first couple of elements of the shuffled sequence. That is the lazy part I am talking about. But I guess I won't be winning much there.. And again, I think I would have got better performance if I had passed in a "Counted" collection which was not true in my case..
Sunil.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Tassilo Horn

unread,

Jul 11, 2011, 8:23:35 AM7/11/11

to clo...@googlegroups.com

Sunil S Nandihalli <sunil.na...@gmail.com> writes:

Hi Sunil,

> yea true, the full sequence has to be realized before it is shuffled

> .. but however the thing is that I am only interested in the first
> couple of elements of the shuffled sequence.

Well, in that case, you can use the standard shuffle, right? For
example, to get a random permutation of the first 10 natural numbers,
you can do:

(shuffle (take 10 (iterate inc 0)))
==> [1 6 9 8 4 5 3 7 0 2]

However, you cannot get a seq of 10 randomly chosen natural numbers,
because that would mean to realize the seq of natural numbers.

(take 10 (shuffle (iterate inc 0)))
==> Runs till hell freezes over

But if you specify an upper limit, then it'll work again.

(take 10 (shuffle (take 1000 (iterate inc 0))))
==> (748 460 353 349 586 743 994 404 468 306)

Of course, that will realize the first 1000 natural numbers, although
you only take 10 out of it.

Well, in that use-case, you'd probably want to go with rand-int anyway.

Bye,
Tassilo

Daniel Janus

unread,

Jul 11, 2011, 8:54:31 AM7/11/11

to clo...@googlegroups.com

One thing to keep in mind when shuffling is that, by default,

if the input sequence is large enough then most permutations

can never be generated by shuffle. This follows naturally from

the Dirichlet's pigeonhole principle: there are only 2^64 possible

seeds (states of the RNG), but (factorial (count s)) distinct

permutations.

Daniel

Alan Malloy

unread,

Jul 11, 2011, 1:13:42 PM7/11/11

to Clojure

If the sequence is already realized, or is cheap, and you want only a
very small random subset of it, you can do better than shuffling the
whole thing. Fliebel and I played around with several solutions to
this, some time ago. I can't find the whole thing, but some
interesting examples and benchmarking data are at https://gist.github.com/805747
if you want to try it out.

On Jul 11, 3:57 am, Tassilo Horn <tass...@member.fsf.org> wrote:

Ken Wesson

unread,

Jul 11, 2011, 4:56:16 PM7/11/11

to clo...@googlegroups.com

On Mon, Jul 11, 2011 at 1:13 PM, Alan Malloy <al...@malloys.org> wrote:
> If the sequence is already realized, or is cheap, and you want only a
> very small random subset of it, you can do better than shuffling the
> whole thing. Fliebel and I played around with several solutions to
> this, some time ago. I can't find the whole thing, but some
> interesting examples and benchmarking data are at https://gist.github.com/805747
> if you want to try it out.

Also if the sequence is not "realized" but there's a cheap way to
calculate (nth the-hypothetical-sequence n) just knowing n and not
previous members, there are ways to lazily generate part of a random
permutation:

(take k (distinct (f (rand-int limit))))

where n ranges from 0 to limit-1, f changes n to (nth
the-hypothetical-sequence n) without realizing (or even creating as a
seq object) the-hypothetical-sequence, and k < limit is the number of
items desired from the head of the shuffled sequence. (take limit
(distinct (f (rand-int limit)))) represents a full random permutation
whose elements can be taken lazily.

The downside is that the (distinct ...) implements a rejection
algorithm, so that we get a uniform random element, then a uniform
random from what's left, then ... and so a uniformly-chosen random
permutation, but at the cost that if you realize most or all of the
permutation and not just the first few elements of a big one it will
slow down tremendously. For instance if limit is 10,000 and you
realize the whole thing it will need 10,000 tries on average before
generating the final element. The (distinct ...) also amounts to
holding onto the head.

So this works well if you want the first few elements of the
permutation, f exists, and you don't want to realize all of
the-hypothetical-sequence. It works terribly if you want the whole
permutation or much of it or f does not exist.

There are obvious ways to speed up distinct in this case and make it
more compact, though, for the special case of sets of nonnegative
integers from a range starting at 0: create a BitSet representing
"have we already generated this?" and keep a running count of
generated elements. This allows creating permutations of (range limit)
more efficiently and without realizing all of the Integer objects at
once, at the cost of having mutable state under the hood. Each
iteration we generate k = (rand-int (- limit cnt)) and then walk the
bitset, incrementing a counter n and decrementing k every time we see
a zero in the bitset at position n. When k hits zero, we flip that bit
of the bitset to 1 and output k; that is the next item in the lazy
sequence. Memory complexity is limit/8 bytes instead of limit*12 bytes
(Integer objects consume 12 bytes of memory each) for a space savings
factor of 72, nearly two whole orders of magnitude, over using
(distinct ...).

However, this seems to work only for shuffling (range limit) ... or,
we can use (map f (bitset-perm-of-range limit)) to get an arbitrary
lazy permutation of the-hypothetical-sequence the same way!

Time complexity is quadratic: we have to walk further and further
along the bitset to generate each element. It still beats the
(distinct ...) rejection algorithm when you (eventually) want most or
all of the permutation, and especially when you want that
two-order-of-magnitude memory savings.

Both algorithms avoid the core shuffle's gamut problem when (factorial
limit) exceeds Integer/MAX_VALUE; as long as limit itself doesn't
exceed that there is no problem.

First stab at an implementation:

(defn bitset-perm-of-range [limit]
(let [bs (java.util.BitSet. limit)
rem (atom limit)
step (fn step []
(lazy-seq
(locking bs
(if-not (zero? @rem)
(let [k (rand-int @rem)]
(loop [i 0 k k]
(if (.get bs i)
(recur (inc i) k)
(if (> k 0)
(recur (inc i) (dec k))
(do
(.set bs i)
(swap! rem dec)
(cons i (step)))))))))))]
(step)))

It *seems* to work after very minimal testing on short ranges and one
small chunk of a big range:

=> (bitset-perm-of-range 10)
(0 7 4 1 8 6 5 3 2 9)
=> (bitset-perm-of-range 10)
(2 8 1 5 0 3 9 4 7 6)
=> (bitset-perm-of-range 10)
(5 9 2 1 4 3 7 6 8 0)
=> (take 10 (bitset-perm-of-range 1000000))
(113738 532538 423561 100835 157218 710244 339727 507963 199228 842555)

--
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

Brian Goslinga

unread,

Jul 12, 2011, 8:26:14 PM7/12/11

to Clojure

On Jul 11, 5:44 am, Sunil S Nandihalli <sunil.nandiha...@gmail.com>
wrote:

> Hello everybody,
> I think a lazy version of shuffle would be a good addtion to the core. I

> tried googling for one and foundhttp://ytakenaka.blogspot.com/2011/05/lazy-shuffle-clojure.htmlwhich was

> infact slower than original shuffle and was unable to reproduce the
> performance claims made there in. In fact the experiment showed that the
> lazy-shuffle was about 10 times slower (of course depends on the size of the
> collection to be shuffled).. Can somebody help me ?
>
> Thanks,
> Sunil.

I once created a lazy version of shuffle that worked on some random
access collection (such as vector) using the Fisher-Yates algorithm
that virtually swapped elements by using a map to keep track of index-
>element remappings. It seemed to work rather well if you wanted a
minority of the items in the original collection. I should try to dig
up that code...

NeedMoreDesu

unread,

Aug 23, 2013, 4:57:50 AM8/23/13

to clo...@googlegroups.com

(letfn [(vec-throw-out [v i]
         (pop (assoc v i (get v (dec (count v))))))]
(defn lazy-shuffle [v]
(if (seq v)
   (let [idx (rand-int (count v))]
    (cons
     (get v idx)
     (lazy-seq
      (lazy-shuffle
       (vec-throw-out v idx))))))))

(let [s (range 1000000)
      v (vec s)]
(time (take 10 (lazy-shuffle v)))
nil)
;; "Elapsed time: 0.092389 msecs"

Christophe Grand

unread,

Aug 23, 2013, 12:16:29 PM8/23/13

to clojure

=> (defn lazy-shuffle [v]
(lazy-seq

(if (seq v)
(let [idx (rand-int (count v))]

(cons (nth v idx)
(lazy-shuffle (pop (assoc v idx (peek v)))))))))

=> (take 10 (lazy-shuffle (vec (range 100))))
(77 63 37 99 81 73 25 85 35 10)

--

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

---

You received this message because you are subscribed to the Google Groups "Clojure" group.

To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
On Clojure http://clj-me.cgrand.net/
Clojure Programming http://clojurebook.com
Training, Consulting & Contracting http://lambdanext.eu/

Reply all

Reply to author

Forward