Hi Sunil,
> I think a lazy version of shuffle would be a good addtion to the
> core. I tried googling for one and found
> http://ytakenaka.blogspot.com/2011/05/lazy-shuffle-clojure.html which
> was infact slower than original shuffle and was unable to reproduce
> the performance claims made there in.
I think there is no way to implement a "good" lazy version of shuffle.
Correct shuffling, i.e., the result should be a real pseudo-random
permutation of all elements, can only be defined on finite seqs anyway.
And there, one has to realize the given seq. In the code you cite,
there's
(defn lazy-shuffle [lst fetch-len]
(let [len (count lst)
^^^^^^^^^^^
which does exactly that. So here only the cheap creation of the
shuffled seq is lazy, but the possibly expensive calculation of the
given seq is (and has to be done) directly.
Bye,
Tassilo
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
Hi Sunil,
> yea true, the full sequence has to be realized before it is shuffled
> .. but however the thing is that I am only interested in the first
> couple of elements of the shuffled sequence.
Well, in that case, you can use the standard shuffle, right? For
example, to get a random permutation of the first 10 natural numbers,
you can do:
(shuffle (take 10 (iterate inc 0)))
==> [1 6 9 8 4 5 3 7 0 2]
However, you cannot get a seq of 10 randomly chosen natural numbers,
because that would mean to realize the seq of natural numbers.
(take 10 (shuffle (iterate inc 0)))
==> Runs till hell freezes over
But if you specify an upper limit, then it'll work again.
(take 10 (shuffle (take 1000 (iterate inc 0))))
==> (748 460 353 349 586 743 994 404 468 306)
Of course, that will realize the first 1000 natural numbers, although
you only take 10 out of it.
Well, in that use-case, you'd probably want to go with rand-int anyway.
Bye,
Tassilo
Also if the sequence is not "realized" but there's a cheap way to
calculate (nth the-hypothetical-sequence n) just knowing n and not
previous members, there are ways to lazily generate part of a random
permutation:
(take k (distinct (f (rand-int limit))))
where n ranges from 0 to limit-1, f changes n to (nth
the-hypothetical-sequence n) without realizing (or even creating as a
seq object) the-hypothetical-sequence, and k < limit is the number of
items desired from the head of the shuffled sequence. (take limit
(distinct (f (rand-int limit)))) represents a full random permutation
whose elements can be taken lazily.
The downside is that the (distinct ...) implements a rejection
algorithm, so that we get a uniform random element, then a uniform
random from what's left, then ... and so a uniformly-chosen random
permutation, but at the cost that if you realize most or all of the
permutation and not just the first few elements of a big one it will
slow down tremendously. For instance if limit is 10,000 and you
realize the whole thing it will need 10,000 tries on average before
generating the final element. The (distinct ...) also amounts to
holding onto the head.
So this works well if you want the first few elements of the
permutation, f exists, and you don't want to realize all of
the-hypothetical-sequence. It works terribly if you want the whole
permutation or much of it or f does not exist.
There are obvious ways to speed up distinct in this case and make it
more compact, though, for the special case of sets of nonnegative
integers from a range starting at 0: create a BitSet representing
"have we already generated this?" and keep a running count of
generated elements. This allows creating permutations of (range limit)
more efficiently and without realizing all of the Integer objects at
once, at the cost of having mutable state under the hood. Each
iteration we generate k = (rand-int (- limit cnt)) and then walk the
bitset, incrementing a counter n and decrementing k every time we see
a zero in the bitset at position n. When k hits zero, we flip that bit
of the bitset to 1 and output k; that is the next item in the lazy
sequence. Memory complexity is limit/8 bytes instead of limit*12 bytes
(Integer objects consume 12 bytes of memory each) for a space savings
factor of 72, nearly two whole orders of magnitude, over using
(distinct ...).
However, this seems to work only for shuffling (range limit) ... or,
we can use (map f (bitset-perm-of-range limit)) to get an arbitrary
lazy permutation of the-hypothetical-sequence the same way!
Time complexity is quadratic: we have to walk further and further
along the bitset to generate each element. It still beats the
(distinct ...) rejection algorithm when you (eventually) want most or
all of the permutation, and especially when you want that
two-order-of-magnitude memory savings.
Both algorithms avoid the core shuffle's gamut problem when (factorial
limit) exceeds Integer/MAX_VALUE; as long as limit itself doesn't
exceed that there is no problem.
First stab at an implementation:
(defn bitset-perm-of-range [limit]
(let [bs (java.util.BitSet. limit)
rem (atom limit)
step (fn step []
(lazy-seq
(locking bs
(if-not (zero? @rem)
(let [k (rand-int @rem)]
(loop [i 0 k k]
(if (.get bs i)
(recur (inc i) k)
(if (> k 0)
(recur (inc i) (dec k))
(do
(.set bs i)
(swap! rem dec)
(cons i (step)))))))))))]
(step)))
It *seems* to work after very minimal testing on short ranges and one
small chunk of a big range:
=> (bitset-perm-of-range 10)
(0 7 4 1 8 6 5 3 2 9)
=> (bitset-perm-of-range 10)
(2 8 1 5 0 3 9 4 7 6)
=> (bitset-perm-of-range 10)
(5 9 2 1 4 3 7 6 8 0)
=> (take 10 (bitset-perm-of-range 1000000))
(113738 532538 423561 100835 157218 710244 339727 507963 199228 842555)
--
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.