What you're missing that's arguably obvious is that there's a lot of
storage space overhead for Clojure's sorted-maps and other data
structures. Here's a Clojure sieve that works fine up to at least 50
million and is reasonably fast (though the same algorithm implemented
in Java is 10x faster even with the type hints and primitive
arithmetic in the Clojure; not sure why).
(defn primes-to [n]
(let [#^booleans sieve (boolean-array (inc n) true)
n (int n)]
(loop [p (int 2)]
(let [pp (unchecked-multiply p p)]
(if (> pp n)
(filter identity (map #(if %1 %2) (drop 2 sieve) (iterate inc 2)))
(do
(loop [i pp]
(aset sieve i false)
(let [j (unchecked-add i p)]
(if (<= j n)
(recur j))))
(let [q (int
(loop [p (unchecked-inc p)]
(if (aget sieve p)
p
(recur (unchecked-inc p)))))]
(recur q))))))))
Here's a much more Clojury version. It's much slower and runs out of
memory past about 20 million, on a VM with 1GB -Xmx. That still sounds
like it beats your less than 1 million:
(defn primes-to [n]
(loop [sieve (vec (take (quot (dec n) 2) (iterate #(+ % 2) 3))) i 0]
(let [p (first (drop i (filter identity sieve)))
indices (take-while
#(<= % (dec (quot n 2)))
(iterate #(+ % p) (dec (quot (* p p) 2))))]
(if (empty? indices)
(cons 2 (filter identity sieve))
(recur
(persistent!
(reduce
(fn [s index]
(assoc! s index nil))
(transient sieve)
indices))
(quot p 2))))))
The logic's fairly convoluted; I squeezed some performance and memory
out of it by storing only odd numbers in the vector (hence all the
(quot foo 2)s and the (cons 2 foo) on the line that generates the
return value); some more by using a transient vector for the inner
loop of the sieve. However it is functional instead of stateful,
unlike the version with the explicit mutable boolean array above.
There's no sorted map involved in either -- except to the extent that
a vector can be regarded as a sorted map with non-negative-integer
keys. Neither implementation explicitly uses the lazy-seq macro,
either.
That doesn't make much sense. At some point the sieve has to be
groveled over and the appropriate indices emitted. Whether that occurs
lazily or promptly shouldn't make a difference to speed, let alone one
of that magnitude. Skipping over the even entries after 2 could speed
things up somewhat but not nearly that much ...
Either there's something massively inefficient about map (or iterate)
or there's something else hinky going on here.
Here's the fastest one I've come up with:
(defn bit-sieve [n]
(let [n (int n)]
"Returns a vector of all primes from 2 to n (not including n)"
(let [root (int (Math/round (Math/floor (Math/sqrt n))))]
(loop [i (int 3)
a (java.util.BitSet. n)
result (transient [2])]
(if (>= i n)
(persistent! result)
(recur (+ i (int 2))
(if (and (not (.get a i)) (<= i root))
(loop [arr a
inc (+ i i)
j (* i i)]
(if (>= j n)
arr
(recur (do (.set arr j) arr)
inc
(+ j inc))))
a)
(if-not (.get a i)
(conj! result i)
result)))))))
Also, here's an implementation of the lazy prime stream algorithm at:
http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
(defn- wheel2357 [] (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8
6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10]))
(defn- spin [l n] (lazy-seq (cons n (spin (rest l) (+ n (first l))))))
(defn- insert-prime [p xs table]
(update-in table [(* p p)] #(conj % (map (fn [n] (* n p)) xs))))
(defn- reinsert [table x table-x]
(loop [m (dissoc table x), elems table-x]
(if-let [elems (seq elems)]
(let [elem (first elems)]
(recur (update-in m [(first elem)] #(conj % (rest elem))) (rest elems)))
m)))
(defn- adjust [x table]
(let [nextTable (first table),
n (nextTable 0)]
(if (<= n x) (recur x (reinsert table n (nextTable 1)))
table)))
(defn- sieve-helper [s table]
(when-let [ss (seq s)]
(let [x (first ss), xs (rest ss),
nextTable (first table),
nextComposite (nextTable 0)]
(if (> nextComposite x)
(lazy-seq (cons x (sieve-helper xs (insert-prime x xs table))))
(recur xs (adjust x table))))))
(defn- sieve [s]
(when-let [ss (seq s)]
(let [x (first ss), xs (rest ss)]
(cons x (sieve-helper xs (insert-prime x xs
(sorted-map)))))))
(defn prime-seq
"(prime-seq) creates a lazy stream of all positive prime numbers"
[]
(concat [2 3 5 7] (sieve (spin (wheel2357) 11))))
(def primes (prime-seq))