[racket] Why is this slow?

48 views
Skip to first unread message

Jordan Schatz

unread,
Apr 24, 2013, 1:37:49 AM4/24/13
to Racket Users

I'm working through some programing puzzles, and one of them was to find the sum
of all primes less the 2 million, the new math/number-theory collection (thanks
Neil) makes it easy:

#lang racket

(require math/number-theory)

;;takes about 4 seconds
(apply + (filter prime? (range 1 2000000)))

But I thought that spoiled the fun, and I was surely making racket work more
then necessary testing each number in the range, so I thought the Sieve of
Eratosthenes would be fast and fun:

#lang racket
;; Input: an integer n > 1
(define (sieve-of-eratosthenes n)
;; Let A be an array of Boolean values, indexed by integers 2 to n,
;; initially all set to true.

;; for i = 2, 3, 4, ..., √n :
;; when A[i] is true:
;; for j = i², i²+i, i²+2i, ..., n:
;; A[j] := false
(define A (make-vector n #t))
(for ([i (range 2 (sqrt n))])
(when (vector-ref A i)
(for ([p (range 0 n)])
#:break (> (+ (* i i) (* p i) 1) n)
(let ([j (+ (* i i) (* p i))])
(vector-set! A j #f)))))

;; Now all i such that A[i] is true are prime.
(filter number?
(for/list ([i (range 2 n)])
(when (vector-ref A i) i))))

;;takes about 17 seconds
(time (apply + (sieve-of-eratosthenes 2000000))

But it is dead slow....

Thanks,
Jordan

____________________
Racket Users list:
http://lists.racket-lang.org/users

Jens Axel Søgaard

unread,
Apr 24, 2013, 4:49:54 AM4/24/13
to Jordan Schatz, Racket Users
Hi Jordan,

2013/4/24 Jordan Schatz <jor...@noionlabs.com>


I'm working through some programing puzzles, and one of them was to find the sum of all primes less the 2 million, the new math/number-theory collection (thanks Neil) makes it easy:

I better take the blame for the number-theory collection :-)
 
#lang racket

(require math/number-theory)

;;takes about 4 seconds
(apply + (filter prime? (range 1 2000000)))

But I thought that spoiled the fun, and I was surely making racket work more
then necessary testing each number in the range, so I thought the Sieve of
Eratosthenes would be fast and fun:

(define (sieve-of-eratosthenes n)
     ... 
  (define A (make-vector n #t))
      ... 
But it is dead slow....

The easiest way to make your sieve faster is to allocate a smaller vector.
Since 2 is the only even prime, you know in advance that every other
element of the vector contains false. Using this fact you kan reduce
the size of the vector to n/2.

This change is relatively easy to make. You can however take the
idea further and use the same trick with 3. Since 2*3=6 we look at
the remainders from division with 6. The remainders 0, 2, and 4
indicate an even number, and the remainders 0 and 3 are numbers
that 3 divide. Primes must therefore have remainder 1 or 5 when
divided by 6. Omitting numbers with remainders 0,2,3,4 from the
vector reduces the size of the vector to n/3.
Extending to the first three primes is also possible.

In the number-theory module I used remainders modulo 60=2*2*3*5.
The trick is that there are exactly 16 numbers modulo 60, that
stems from non-primes. One can therefore represent the block of 60
remainders in only 2 bytes. See the gory details in:


/Jens Axel


 

Gary Baumgartner

unread,
Apr 24, 2013, 5:17:28 AM4/24/13
to Jordan Schatz, Racket Users
The `range` function produces a list, which is a big overhead for each time
the inner loop executes. There's a big improvement changing the inner loop
sequence clause to just [p n]. There's a bit more improvement making that
[p (in-range n)], which is recognized statically at expansion time by the
`for` loop to produce better code. See:

http://docs.racket-lang.org/guide/for.html#(part._for-performance)

Filtering the final list during building is also an improvement:

(for/list ([i (range 2 n)]

#:when (vector-ref A i))
i)

And let's make it more consistent with the comment style, and consistent
with the above changes. This appears to speed it up even more, although
I didn't investigate where exactly the big win(s) were:


#lang racket

;; Input: an integer n > 1
(define (sieve-of-eratosthenes n)
;; Let A be an array of Boolean values, indexed by integers 2 to n,
;; initially all set to true.

;; for i = 2, 3, 4, ..., √n :
;; when A[i] is true:
;; for j = i², i²+i, i²+2i, ..., n:
;; A[j] := false
(define A (make-vector n #t))

(for ([i (in-range 2 (sqrt n))]
#:when (vector-ref A i))
(for ([j (in-range (sqr i) n i)])
(vector-set! A j #f)))

;; Now all i such that A[i] is true are prime.

(for/list ([i (in-range 2 n)]
#:when (vector-ref A i))
i))

(time (apply + (sieve-of-eratosthenes 2000000)))

Shannon Severance

unread,
Apr 24, 2013, 5:38:34 AM4/24/13
to Jordan Schatz, Racket Users
Jordan,

I'm new to Racket, so I do not fully understand your implementation of
the sieve-of-eratosthenes. So I can't comment one why it is slow. But
I have a faster version, and maybe some differences will stand out to
investigate.

I did notice that your when your implementation of the sieve was
running, the little GC icon was always atleast flashing and sometimes
solid. Lots of intermediate results being created that need GCed? If
you compare my faster implementation of the sieve below, it has far
fewer arithmetic operations and fewer intermediate results.

My version is pretty close to what I would have done in r5rs
scheme. But I don't have the scheme library functions memorized and
used the Racket documentation to find appropriate Racket functions,
which may or may not have been in r5rs.

#lang racket

(define (soe n)
(define sqrt-n (inexact->exact (ceiling(sqrt n))))

(define primes (build-vector (add1 n) (λ (_) _)))
(vector-copy! primes 0 (vector #f #f))

(define (cross-off-multiples prime)
(define (cross-off-multiples-iter not-prime)
(if (<= not-prime n)
[begin
(vector-set! primes not-prime #f)
(cross-off-multiples-iter (+ not-prime prime))]
(visit-candidates (add1 prime))))
(cross-off-multiples-iter (expt prime 2)))

(define (visit-candidates prime-candidate)
(cond [(> prime-candidate sqrt-n) primes]
[(not (vector-ref primes prime-candidate)) (visit-candidates
(add1 prime-candidate))]
[(vector-ref primes prime-candidate) (cross-off-multiples
prime-candidate)]))
(filter (λ (_) _) (vector->list (visit-candidates 2))))

I placed the two versions you gave and my version in the same
definitions file. Within the Dr Racket IDE, ran the definitions and
then:

> (define n (inexact->exact 2e6))
> (time (apply + (filter prime? (range 2 n))))
cpu time: 9788 real time: 9974 gc time: 1851
142913828922
> (time (apply + (soe n)))
cpu time: 1049 real time: 1065 gc time: 311
142913828922
> (time (apply + (sieve-of-eratosthenes n)))
cpu time: 266955 real time: 274532 gc time: 218362
142913828922

Machine Info:
Model Name: Mac Pro
Model Identifier: MacPro4,1
Processor Name: Quad-Core Intel Xeon
Processor Speed: 2.66 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
Memory: 6 GB

System Version: OS X 10.8.3 (12D78)
Kernel Version: Darwin 12.3.0
Boot Volume: Macintosh SSD
Boot Mode: Normal
Secure Virtual Memory: Enabled

DrRacket version 5.3.3
Memory Limit 4096MB

Hoping there is a clue above.

-- Shannon

Shannon Severance

unread,
Apr 24, 2013, 5:45:47 AM4/24/13
to Gary Baumgartner, Jordan Schatz, Racket Users
This one I will need to study, since it uses new to me Racket stuff. And
it is quick. On my machine:
> (time (apply + (gb-sieve-of-eratosthenes n)))
cpu time: 485 real time: 497 gc time: 118
142913828922


Compare:


> (time (apply + (filter prime? (range 2 n))))
cpu time: 9788 real time: 9974 gc time: 1851
142913828922
> (time (apply + (soe n)))
cpu time: 1049 real time: 1065 gc time: 311
142913828922
> (time (apply + (sieve-of-eratosthenes n)))
cpu time: 266955 real time: 274532 gc time: 218362
142913828922

Machine info in separate email to the list.

Robby Findler

unread,
Apr 24, 2013, 8:15:43 AM4/24/13
to Shannon Severance, Racket Users
Hi: beware that timing inside DrRacket can be very misleading (DrRacket is doing lots of stuff). See:


for more details.

Robby

Jordan Schatz

unread,
Apr 25, 2013, 2:02:58 AM4/25/13
to Jens Axel Søgaard, Racket Users
Jens Axel Søgaard <jens...@soegaard.net> writes:
> I better take the blame for the number-theory collection :-)
Ah, sorry for the miss attribution.

> In the number-theory module I used remainders modulo 60=2*2*3*5.
> The trick is that there are exactly 16 numbers modulo 60, that
> stems from non-primes. One can therefore represent the block of 60
> remainders in only 2 bytes. See the gory details in:
>
> https://github.com/plt/racket/blob/master/collects/math/private/number-theory/small-primes.rkt

I looked at that for quite a while (it is wonderful that so much of racket is
written in racket) but never quite "got it" I think that method is called the
Atkin Sieve? As I said it is still over my head, but it seems like their would
be a straight forward way to provide your sieve from that module?

Jordan Schatz

unread,
Apr 25, 2013, 1:45:40 AM4/25/13
to Gary Baumgartner, Racket Users

Gary Baumgartner <g...@cs.toronto.edu> writes:
> The `range` function produces a list, which is a big overhead for each time
> the inner loop executes. There's a big improvement changing the inner loop
> sequence clause to just [p n]. There's a bit more improvement making that
> [p (in-range n)], which is recognized statically at expansion time by the
> `for` loop to produce better code. See:
>
> http://docs.racket-lang.org/guide/for.html#(part._for-performance)
Thank you! That worked wonders :)

- Jordan

Tim Brown

unread,
Apr 25, 2013, 5:53:04 AM4/25/13
to Shannon Severance, us...@racket-lang.org

On 24 Apr 2013 10:42, "Shannon Severance" <s...@s53.me> wrote:
>       (define sqrt-n (inexact->exact (ceiling(sqrt n))))

Racket has integer-sqrt, which saves you defining sqrt-n every time.
Also worth noting is sqr, which covers the occasions you want (* x x) or
(expt x 2)

Customer testimonial:
> I have used integer-sqrt in lots of Project Euler problems. It has never let me down!
>  - Tim, Epsom

--
sent from my phone
consider yourself lucky if you get
all the right letters in the right order

JP Verkamp

unread,
Apr 25, 2013, 2:57:11 PM4/25/13
to Jordan Schatz, Racket Users
If you'd like to see some code about the same sort of problem scaled up, I have a pair of blog posts about summing the first billion prime numbers using Racket (it gets even more interesting, since the sieves start getting too large to hold in a single vector):

Calculates a direct sum using naive prime testing, prime testing using previous primes, and the Sieves of Eratosthenes, Atkin, and Sundaram.

Uses a segmented Sieve of Eratosthenes which runs about three times faster than the previous best.

Another trick (if you can call it that) that I picked up on this mailing list a bit ago was that converting code to Typed Racket for numeric problems can do wonders. Without it, Racket has to do a lot more checking to make sure you aren't adding apples and oranges. So even just by adding a few type annotations (and not changing anything else) you can get some pretty impressive boosts. 

Here's some more on that:

(Granted, if you make the change below using integer-sqrt and in-range, the runtime is already only 0.85 seconds on my machine, so that may not strictly be necessary. :)

JP
Reply all
Reply to author
Forward
0 new messages