Primes are hard, because you essentially need to track all previous primes.
Here's one way to think about what you're doing here. First, you
create a lazy sequence of infinite numbers from 3 and up. This is done
by building a cons cell with 3 as the head and (iterate inc 4) as the
tail.
Then you filter that. From now on, every time you want the next
element, you'll have to run two functons: inc and odd?, plus some
extra processing for iterate and filter which both build up an
intermediate lazy seq.
Then you call sieve, which yet again builds up a new lazy seq that
accumulates more functions on its tail: assoc-nth, drop-while and a
recursive call to sieve. And assoc-nth itself still wraps a bunch of
other functions on your tail.
Now, the crucial part here is that most of these functions never go
away: for each new element that you compute, you add additional
function calls to each subsequent element. So that getting the 5th
element requires more additional elements than getting the 4th.
In some way, you have chosen to encode your list of previous primes as
accumulated functions. By the 500th prime, you have some much
accumulated functions that your call stack itself blows the stack.
With your implementation, on my computer, I only get about 300:
(defn num-primes
[prime-seq]
(try (doall prime-seq)
(catch StackOverflowError e
(count prime-seq))))
=> (time (num-primes primes))
"Elapsed time: 62385.706564 msecs"
325
A slightly more direct approach to the sieve (though arguably not a
true sieve), which has the same problem, in a more obvious way, would
be:
(defn filter-primes
[]
(let [helper (fn helper [n]
(lazy-seq (cons n (remove #(zero? (mod % n))
(helper (inc n))))))]
(helper 2)))
=> (time (max-prime (filter-primes)))
"Elapsed time: 472.078328 msecs"
251
Two things to note:
* I've turned the sequence into a defn instead of a def. This makes it
possible to not hold onto the head. Since we get a stack overflow
around 300 integers here, this is no biggie, but it's a good practice
when defining potentially very long lazy seqs.
* It should be much more clear here that each prime is adding a call
to remove: whereas the 2 gets returned directly, the 3 has to go
through a mod 2 operation, then a remove function has to check that it
returned false. Similarly, the 5 will, in this very naive
implementation, need to check that it is not divisible by either 2, 3,
or 4, so we now have a "stack" of three remove functions to go
through.
A closer implementation to yours would be:
(defn filter-primes2
[]
(let [sieve (fn [ls]
(let [p (first ls)]
(lazy-seq
(cons p (remove #(zero? (mod % p)) ls)))))]
(sieve (iterate inc 2))))
=> (time (num-primes (filter-primes2)))
"Elapsed time: 59021.93554 msecs"
325
However, storing "primes found so far" into a stack of functions is
not very efficient (in either memory or computation), and the sieve
only really shines when it's used on vectors of known length (because
then there's no division), not on lazy seqs of potentially infinite
length. Here's an attempt to combine the advantages of the sieve with
lazy seqs:
(defn array-sieve
[]
(let [sieve (fn [n]
(let [a (long-array n)]
(loop [idx 0]
(when (< idx n)
(aset-long a idx (+ 2 idx))
(recur (inc idx))))
(loop [idx 0]
(when (and (< idx n)
(not= 0 (aget a idx)))
(let [v (aget a idx)]
(loop [idx2 (+ idx v)]
(when (< idx2 n)
(aset-long a idx2 0)
(recur (+ v idx2))))))
(when (< idx n)
(recur (inc idx))))
(vec (remove zero? a))))
generator (fn [[prev-idx known-primes prev-prime]]
(let [next-idx (inc prev-idx)
known-primes (if (< next-idx (count known-primes))
known-primes
(loop [n (* 2 next-idx)]
(let [k (sieve n)]
(if (> (count k) next-idx)
k
(recur (* 2 n))))))
next-prime (get known-primes next-idx)]
[next-idx known-primes next-prime]))]
(->> [0 [2] 2]
(iterate generator)
(map last))))
=> (time (nth (array-sieve) (dec 100000)))
"Elapsed time: 13454.668724 msecs"
1299709
(You can check e.g. here that this is correct:
https://primes.utm.edu/nthprime/index.php#nth; the dec is there
because this website counts from 1, and nth from 0)
This is of course much more complex, but it's also much faster and
much more memory-efficient. There's probably a way to exploit the
current known primes in the computation of the next ones, too, but I
find it fast enough (and complex enough!) as it is.
Note that I do not construct a lazy seq myself anymore, and instead
rely on iterate. If you can keep a relatively small amount of state to
compute the next number in the sequence, I would generally recommend
using a similar approach: iterate a function of [[value state]] ->
[value state] and then map first over it. Here's another example of
this technique, for computing Fibonacci numbers:
(defn fib
[]
(->> [0M 1M]
(iterate (fn [[a b]] [b (+ a b)]))
(map first)))
=> (time (nth (fib) 50000))
"Elapsed time: 283.99525 msecs"
10777734893072974780279...8305364252373553125M
(For Fibonacci numbers, I start out with BigNums because, well, it
overflows pretty quickly otherwise.)