The Genuine Sieve of Eratosthenes

171 views
Skip to first unread message

thattommyhall

unread,
Nov 6, 2010, 11:28:31 AM11/6/10
to Clojure
Hey,

I read http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf a while ago
and noticed you had referenced the paper and implemented the naive
algorithm in contrib and thought I could chip in and do the sieve for
you but get an error in my current implementation with only a million
primes.
"Exception in thread "main" java.lang.OutOfMemoryError: Java heap
space (core.clj:1)"

The code is at https://github.com/thattommyhall/primes/blob/master/src/primes/core.clj
, its only my second week in clojure so I'm not sure im using lazy-seq
correctly, if sorted-map is the correct datastructure to use or if I'm
missing something else obvious.


Tom

Ken Wesson

unread,
Nov 6, 2010, 1:57:53 PM11/6/10
to clo...@googlegroups.com

What you're missing that's arguably obvious is that there's a lot of
storage space overhead for Clojure's sorted-maps and other data
structures. Here's a Clojure sieve that works fine up to at least 50
million and is reasonably fast (though the same algorithm implemented
in Java is 10x faster even with the type hints and primitive
arithmetic in the Clojure; not sure why).

(defn primes-to [n]
(let [#^booleans sieve (boolean-array (inc n) true)
n (int n)]
(loop [p (int 2)]
(let [pp (unchecked-multiply p p)]
(if (> pp n)
(filter identity (map #(if %1 %2) (drop 2 sieve) (iterate inc 2)))
(do
(loop [i pp]
(aset sieve i false)
(let [j (unchecked-add i p)]
(if (<= j n)
(recur j))))
(let [q (int
(loop [p (unchecked-inc p)]
(if (aget sieve p)
p
(recur (unchecked-inc p)))))]
(recur q))))))))

Here's a much more Clojury version. It's much slower and runs out of
memory past about 20 million, on a VM with 1GB -Xmx. That still sounds
like it beats your less than 1 million:

(defn primes-to [n]
(loop [sieve (vec (take (quot (dec n) 2) (iterate #(+ % 2) 3))) i 0]
(let [p (first (drop i (filter identity sieve)))
indices (take-while
#(<= % (dec (quot n 2)))
(iterate #(+ % p) (dec (quot (* p p) 2))))]
(if (empty? indices)
(cons 2 (filter identity sieve))
(recur
(persistent!
(reduce
(fn [s index]
(assoc! s index nil))
(transient sieve)
indices))
(quot p 2))))))

The logic's fairly convoluted; I squeezed some performance and memory
out of it by storing only odd numbers in the vector (hence all the
(quot foo 2)s and the (cons 2 foo) on the line that generates the
return value); some more by using a transient vector for the inner
loop of the sieve. However it is functional instead of stateful,
unlike the version with the explicit mutable boolean array above.

There's no sorted map involved in either -- except to the extent that
a vector can be regarded as a sorted map with non-negative-integer
keys. Neither implementation explicitly uses the lazy-seq macro,
either.

Juha Arpiainen

unread,
Nov 6, 2010, 5:50:23 PM11/6/10
to Clojure
On Nov 6, 7:57 pm, Ken Wesson <kwess...@gmail.com> wrote:
> Here's a Clojure sieve that works fine up to at least 50
> million and is reasonably fast (though the same algorithm implemented
> in Java is 10x faster even with the type hints and primitive
> arithmetic in the Clojure; not sure why).
>
> (defn primes-to [n]
>   (let [#^booleans sieve (boolean-array (inc n) true)
>         n (int n)]
>     (loop [p (int 2)]
>       (let [pp (unchecked-multiply p p)]
>         (if (> pp n)
>           (filter identity (map #(if %1 %2) (drop 2 sieve) (iterate inc 2)))
>           (do
>             (loop [i pp]
>               (aset sieve i false)
>               (let [j (unchecked-add i p)]
>                 (if (<= j n)
>                   (recur j))))
>             (let [q (int
>                       (loop [p (unchecked-inc p)]
>                         (if (aget sieve p)
>                           p
>                           (recur (unchecked-inc p)))))]
>               (recur q))))))))

Returning a lazy seq doesn't seem to make much sense here,
especially since (map ... (drop 2 sieve)) holds onto the
sieve array. Returning a vector instead drops the runtime
of (time (count (primes-to 10000000))) from 5.3 s to 0.5 s
on my machine:

(loop [r (transient [2])
i 3]
(if (>= i n)
(persistent! r)
(recur
(if (aget sieve i)
(conj! r i)
r)
(+ 2 i))))

Ken Wesson

unread,
Nov 6, 2010, 7:02:48 PM11/6/10
to clo...@googlegroups.com
On Sat, Nov 6, 2010 at 5:50 PM, Juha Arpiainen <jarp...@gmail.com> wrote:
> On Nov 6, 7:57 pm, Ken Wesson <kwess...@gmail.com> wrote:
>>           (filter identity (map #(if %1 %2) (drop 2 sieve) (iterate inc 2)))
>
> Returning a lazy seq doesn't seem to make much sense here,
> especially since (map ... (drop 2 sieve)) holds onto the
> sieve array. Returning a vector instead drops the runtime
> of (time (count (primes-to 10000000))) from 5.3 s to 0.5 s
> on my machine

That doesn't make much sense. At some point the sieve has to be
groveled over and the appropriate indices emitted. Whether that occurs
lazily or promptly shouldn't make a difference to speed, let alone one
of that magnitude. Skipping over the even entries after 2 could speed
things up somewhat but not nearly that much ...

Either there's something massively inefficient about map (or iterate)
or there's something else hinky going on here.

Mark Engelberg

unread,
Nov 7, 2010, 1:31:41 AM11/7/10
to clo...@googlegroups.com
I like tinkering with prime sieve algorithms.

Here's the fastest one I've come up with:

(defn bit-sieve [n]
(let [n (int n)]
"Returns a vector of all primes from 2 to n (not including n)"
(let [root (int (Math/round (Math/floor (Math/sqrt n))))]
(loop [i (int 3)
a (java.util.BitSet. n)
result (transient [2])]


(if (>= i n)

(persistent! result)
(recur (+ i (int 2))
(if (and (not (.get a i)) (<= i root))
(loop [arr a
inc (+ i i)
j (* i i)]


(if (>= j n)

arr
(recur (do (.set arr j) arr)
inc
(+ j inc))))
a)
(if-not (.get a i)
(conj! result i)
result)))))))


Also, here's an implementation of the lazy prime stream algorithm at:
http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

(defn- wheel2357 [] (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8
6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10]))

(defn- spin [l n] (lazy-seq (cons n (spin (rest l) (+ n (first l))))))

(defn- insert-prime [p xs table]
(update-in table [(* p p)] #(conj % (map (fn [n] (* n p)) xs))))

(defn- reinsert [table x table-x]
(loop [m (dissoc table x), elems table-x]
(if-let [elems (seq elems)]
(let [elem (first elems)]
(recur (update-in m [(first elem)] #(conj % (rest elem))) (rest elems)))
m)))

(defn- adjust [x table]
(let [nextTable (first table),
n (nextTable 0)]
(if (<= n x) (recur x (reinsert table n (nextTable 1)))
table)))

(defn- sieve-helper [s table]
(when-let [ss (seq s)]
(let [x (first ss), xs (rest ss),
nextTable (first table),
nextComposite (nextTable 0)]
(if (> nextComposite x)
(lazy-seq (cons x (sieve-helper xs (insert-prime x xs table))))
(recur xs (adjust x table))))))

(defn- sieve [s]
(when-let [ss (seq s)]
(let [x (first ss), xs (rest ss)]
(cons x (sieve-helper xs (insert-prime x xs
(sorted-map)))))))
(defn prime-seq
"(prime-seq) creates a lazy stream of all positive prime numbers"
[]
(concat [2 3 5 7] (sieve (spin (wheel2357) 11))))

(def primes (prime-seq))

Reply all
Reply to author
Forward
0 new messages