[racket] Math - factorial.rkt, binomial.rkt and memoization


Laurent

Apr 8, 2013, 12:33:11 PM4/8/13
to Racket Mailing List
Would it make sense to use memoization more broadly for the definition of the factorial in the math lib, maybe with a `provide'd parameter to control how many values can be memoized?
https://github.com/plt/racket/blob/master/collects/math/private/number-theory/factorial.rkt

While building big numbers (around 800!) I had to redefine my own factorial and binomial functions with memoization, otherwise it would take too long.

Also, probably `binomial' could take advantage of this memoization, just like `permutations' does.
https://github.com/plt/racket/blob/master/collects/math/private/number-theory/binomial.rkt
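The kind of memoization meant here can be sketched with a hash table that fills on demand. This is only an illustration — `fact-cache` and `memo-factorial` are made-up names, not math-lib bindings:

```racket
#lang racket
;; Hypothetical sketch of the memoization idea: cache factorial values in a
;; hash table on demand, so repeated calls (e.g. from binomial) reuse
;; earlier work instead of multiplying from scratch each time.
(define fact-cache (make-hash '((0 . 1))))

(define (memo-factorial n)
  (hash-ref! fact-cache n
             (λ () (* n (memo-factorial (- n 1))))))
```

`hash-ref!` computes and stores the value only on a cache miss, so the second call with the same argument is a single lookup.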

Currently (in DrRacket, so not entirely accurate, but should be close enough):
> (time
   (for ([n 100])
     (binomial 10000 4000)))
cpu time: 4072 real time: 4075 gc time: 440
> (time
   (for ([n 10000])
     (binomial 800 400)))
cpu time: 9424 real time: 9430 gc time: 836
> (time (for ([i 1000])
          (factorial 10000)))
cpu time: 11441 real time: 11630 gc time: 276


which makes it impractical in some cases.

Laurent

Jens Axel Søgaard

Apr 8, 2013, 3:11:00 PM4/8/13
to Laurent, Racket Mailing List
Hi Laurent,

I am curious to see how fast the libgmp versions of factorial
and binomial are on your computer. On my computer libgmp
is much faster than the Racket one.

Since the libgmp versions convert a Racket bignum into
an mpz, there is some overhead, so a proper solution
would be to use a Racket version for small numbers,
and then call libgmp for large numbers.

An even better solution would be to have a builtin bignum->mpz.

/Jens Axel

#lang racket
(require math/private/bigfloat/gmp)
(require (for-syntax racket/base)
         ffi/unsafe
         ffi/unsafe/define
         racket/runtime-path)

(define-runtime-path libgmp-so 
  (case (system-type)
    [(macosx) '(so "libgmp.10.dylib")]
    [(windows) '(so "libgmp-10.dll")]
    [else '(so "libgmp")]))

(define (get-gmp-fun name type)
  (get-ffi-obj name gmp-lib type (make-not-available name)))

(define mpz-fac-ui
  (get-gmp-fun '__gmpz_fac_ui  (_fun _mpz-pointer _int -> _mpz-pointer)))

(define (fact n)
  (define n! (new-mpz))
  (mpz-fac-ui n! n)
  (mpz->integer n!))

(define mpz-bin-ui
  (get-gmp-fun '__gmpz_bin_ui  
               (_fun _mpz-pointer _mpz-pointer _int -> _mpz-pointer)))

(define (bin n k)
  (define b (new-mpz))
  (mpz-bin-ui b (integer->mpz n) k)
  (mpz->integer b))

(require (only-in math factorial))
(time (for ([i (in-range 2000 2100)])
        (factorial i)))

(time (for ([i (in-range 2000 2100)])
        (fact i)))

(require (only-in math binomial))
(time (for ([n 100])
        (bin 10000 4000)))
(time (for ([n 100])
        (binomial 10000 4000)))



2013/4/8 Laurent <laurent...@gmail.com>
____________________
  Racket Users list:
  http://lists.racket-lang.org/users




--
Jens Axel Søgaard

Bradley Lucier

Apr 8, 2013, 3:34:04 PM4/8/13
to us...@racket-lang.org, Bradley Lucier

On Apr 8, 2013, at 3:11 PM, users-...@racket-lang.org wrote:

I would never recommend against using GMP for bignum code, but I use code like this for factorial, based on the idea of binary splitting:

(define (iota m n)
  (let loop ((result '())
             (n (- n 1)))
    (if (<= m n)
        (loop (cons n result)
              (- n 1))
        result)))

(define (partial-factorial m n)
  ;; computes the product (m+1) * ... * (n-1) * n
  (if (< (- n m) 10)
      (apply * (iota (+ m 1) (+ n 1)))
      (* (partial-factorial m (quotient (+ m n) 2))
         (partial-factorial (quotient (+ m n) 2) n))))

(define (factorial n)
  (partial-factorial 0 n))

In Gambit, even in the interpreter, this is pretty fast:

(define a (time (factorial 1000000)))
(time (factorial 1000000))
6466 ms real time
6462 ms cpu time (6011 user, 452 system)
645 collections accounting for 370 ms real time (212 user, 160 system)
1712593016 bytes allocated
84704 minor faults
32 major faults

Brad

Jens Axel Søgaard

Apr 8, 2013, 3:34:13 PM4/8/13
to Laurent, Racket Mailing List
> Would it make sense to use memoization more broadly for the definition of the factorial 
> in the math lib, maybe with a `provide'd parameter to control how many values can be 
> memoized?

An interesting question. First of all, it seems that binomial
especially can be improved. Second, I think there is a difference between
non-permanent caching and permanent caching. I see no problem
with non-permanent caching. For caching between calls I am not
sure what the best solution is for a general-purpose library.
I lean towards non-caching functions as standard, but it would
be nice to offer both versions in the library. The same problem
arises in `prime?'. Having a large table of small primes
improves the running time, but is a reasonable table
size 10^3, 10^4, or 10^5?
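One way such a provide'd parameter could look (a sketch only — `factorial-cache-bound` and `cached-factorial` are made-up names, not actual math-lib bindings):

```racket
#lang racket
;; Sketch: cache factorial values on demand, but only for arguments below
;; a user-settable bound, so memory use stays under the caller's control.
(define factorial-cache-bound (make-parameter 10000))
(define cache (make-hash '((0 . 1))))

(define (cached-factorial n)
  (cond
    [(hash-ref cache n #f) => values]
    [else
     (define v (* n (cached-factorial (- n 1))))
     (when (< n (factorial-cache-bound))
       (hash-set! cache n v))
     v]))
```

A caller who wants no permanent caching at all can run under `(parameterize ([factorial-cache-bound 0]) ...)`.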

/Jens Axel




Laurent

Apr 9, 2013, 5:54:44 AM4/9/13
to Jens Axel Søgaard, Racket Mailing List
On Mon, Apr 8, 2013 at 9:34 PM, Jens Axel Søgaard <jens...@soegaard.net> wrote:
> Would it make sense to use memoization more broadly for the definition of the factorial 
> in the math lib, maybe with a `provide'd parameter to control how many values can be 
> memoized?

> An interesting question. First of all, it seems that binomial
> especially can be improved. Second, I think there is a difference between
> non-permanent caching and permanent caching. I see no problem
> with non-permanent caching. For caching between calls I am not
> sure what the best solution is for a general-purpose library.
> I lean towards non-caching functions as standard, but it would
> be nice to offer both versions in the library. The same problem
> arises in `prime?'. Having a large table of small primes
> improves the running time, but is a reasonable table
> size 10^3, 10^4, or 10^5?

Hence the idea of the parameter (or several?), with a sensible general-purpose default value that the user can change.

Btw, the factorial table reserves some unnecessary memory (though not that much anyway); I think it would be better to request memory on demand.
Memoization with a hash table does exactly that, although a growable vector would fit too.
I guess the same goes for `prime?' and maybe other places?

Laurent

Laurent

Apr 9, 2013, 6:05:09 AM4/9/13
to Bradley Lucier, users

On Mon, Apr 8, 2013 at 9:34 PM, Bradley Lucier <luc...@math.purdue.edu> wrote:
> (define (iota m n)
>   (let loop ((result '())
>              (n (- n 1)))
>     (if (<= m n)
>         (loop (cons n result)
>               (- n 1))
>         result)))
>
> (define (partial-factorial m n)
>   ;; computes the product (m+1) * ... * (n-1) * n
>   (if (< (- n m) 10)
>       (apply * (iota (+ m 1) (+ n 1)))
>       (* (partial-factorial m (quotient (+ m n) 2))
>          (partial-factorial (quotient (+ m n) 2) n))))
>
> (define (factorial n)
>   (partial-factorial 0 n))
>
> In Gambit, even in the interpreter, this is pretty fast:
>
> (define a (time (factorial 1000000)))
> (time (factorial 1000000))


Indeed, it's impressively fast!
% racket fast-factorial.rkt
cpu time: 5760 real time: 5771 gc time: 28

Probably this should replace the default factorial version then.

Laurent

Pierpaolo Bernardi

Apr 9, 2013, 7:43:42 AM4/9/13
to Laurent, users, Bradley Lucier
But why generate lists which are immediately discarded?

;; from excluded; to included
(define (mult from to)
  (case (- to from)
    ((0) 1)
    ((1) to)
    ((2) (* (- to 1) to))
    ((3) (* (- to 2) (- to 1) to))
    ;; ...
    (else
     (let ((middle (quotient (+ from to) 2)))
       (* (mult from middle) (mult middle to))))))

(define (fact n)
  (if (zero? n)
    1
    (mult 1 n)))



Pierpaolo Bernardi

Apr 9, 2013, 8:00:19 AM4/9/13
to Laurent, users, Bradley Lucier
Or, more lispy:

;; from included; to excluded.
(define (mult from to)
  (case (- to from)
    ((0) 1)
    ((1) from)
    ((2) (* from (+ from 1)))
    ((3) (* from (+ from 1) (+ from 2)))
    ;; ...
    (else
     (let ((middle (quotient (+ from to) 2)))
       (* (mult from middle) (mult middle to))))))

(define (fact n)
  (if (zero? n)
    1
    (mult 2 (add1 n))))

Laurent

Apr 9, 2013, 8:02:27 AM4/9/13
to Pierpaolo Bernardi, users, Bradley Lucier
But if you write all 10 cases, aren't the lists constructed anyway before calling `*'?

Pierpaolo Bernardi

Apr 9, 2013, 8:13:24 AM4/9/13
to Laurent, users, Bradley Lucier
On Tue, Apr 9, 2013 at 2:02 PM, Laurent <laurent...@gmail.com> wrote:
> But if you write all 10 cases, aren't the lists constructed anyway before calling `*'?

I'll leave the answer to someone who knows exactly how the compiler works.

Intuitively, I thought my case should be better, but measuring them gives no appreciable difference,  so I don't know.  :)

(I didn't want to suggest that 10 cases are necessary. The right number should be determined by experiment.)

Matthew Flatt

Apr 9, 2013, 8:42:05 AM4/9/13
to Laurent, users
At Tue, 9 Apr 2013 14:02:27 +0200, Laurent wrote:
> But if you write all 10 cases, aren't the lists constructed anyway before
> calling `*'?

Applying `*' to N arguments doesn't construct a list of N arguments.
There's a continuation frame that accumulates the N arguments, but in a
way that tends to be much more efficient for the purposes of
constructing a function call.


For `(partial-factorial 0 1000000)', the difference doesn't matter,
since the 1,000,000 or so bignum multiplications dominate. But I'd write

(for/fold ([v 1]) ([p (in-range (+ m 1) (+ n 1))])
  (* v p))

or even

(for/product ([p (in-range (+ m 1) (+ n 1))]) p)
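Spelled out as a complete definition (a sketch — the name `fact/for` matches the one Laurent times below, though his exact definition isn't shown in the thread):

```racket
#lang racket
;; for/product accumulates the running product directly; no intermediate
;; list of factors is built.
(define (fact/for n)
  (for/product ([p (in-range 1 (add1 n))]) p))
```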

Laurent

Apr 9, 2013, 8:51:21 AM4/9/13
to Matthew Flatt, users

On Tue, Apr 9, 2013 at 2:42 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote:
>  (for/product ([p (in-range (+ m 1) (+ n 1))]) p)


This fact/for variant is a clear winner:
> (time (void (fact/for 1000000)))
cpu time: 6948 real time: 6956 gc time: 964
> (time (void (factorial 1000000)))
cpu time: 9936 real time: 9951 gc time: 3700
> (time (void (fact 1000000)))
cpu time: 8445 real time: 8460 gc time: 2273

But I don't fully understand why a simple (for/product ([p (in-range 1 (add1 n))]) p) is as fast as the iota variants.
I see that the latter makes many more small products, which are certainly faster, but it also does more products of 2 big numbers, whereas for/product makes only big*small products. Is that a sufficient reason?

Laurent

Pierpaolo Bernardi

Apr 9, 2013, 9:30:00 AM4/9/13
to Laurent, Matthew Flatt, users
On Tue, Apr 9, 2013 at 2:51 PM, Laurent <laurent...@gmail.com> wrote:
> On Tue, Apr 9, 2013 at 2:42 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote:
> >  (for/product ([p (in-range (+ m 1) (+ n 1))]) p)
>
> But I don't fully understand why a simple (for/product ([p (in-range 1 (add1 n))]) p) is as fast as the iota variants.

I don't understand either.

FWIW here's the kind of multiplications performed by the doctored version attached below:

> (fact 1000000)
#hash(((fíx fíx fíx) . 48577) ((big big) . 209717) ((fíx big) . 46338) ((fíx fíx) . 646789))

while the for version should be almost all big x big.

====

(define x*-counter 'not-initialized)

(define (x* . args)
  (hash-update! x*-counter (map (λ (x)
                                 (if (fixnum? x)
                                   'fíx
                                   'big))
                               args)
               add1
               0)
  (apply * args))

;; from included; to excluded.
(define (mult from to)
  (case (- to from)
    ((0) 1)
    ((1) from)
    ((2) (x* from (+ from 1)))
    ((3) (x* from (+ from 1) (+ from 2)))
    ;; ...
    (else
     (let ((middle (quotient (+ from to) 2)))
       (x* (mult from middle) (mult middle to))))))

(define (fact n)
  (set! x*-counter (make-hash))
  (define f (if (zero? n)
              1
              (mult 2 (add1 n))))
  (values ;f
   x*-counter))




Pierpaolo Bernardi

Apr 9, 2013, 9:33:43 AM4/9/13
to Laurent, Matthew Flatt, users
On Tue, Apr 9, 2013 at 3:30 PM, Pierpaolo Bernardi <olop...@gmail.com> wrote:
> while the for version should be almost all big x big.

oops, no.  The for version should be almost all fix x big.

Here's a count considering also the type of the result of the multiplication:

> (fact 1000000)
#hash(((fix fix -> fix) . 437071)
      ((big big -> big) . 209717)
      ((fix big -> big) . 46338)
      ((fix fix fix -> fix) . 48577)
      ((fix fix -> big) . 209718))

====


(define x*-counter 'not-initialized)

(define (x* . args)
  (define (classify x)
    (if (fixnum? x)
      'fix
      'big))
  (define res (apply * args))
  (hash-update! x*-counter
                (append (map classify args)
                        '(->)
                        (list (classify res)))
                add1
                0)
  res)

Jos Koot

Apr 9, 2013, 2:26:31 PM4/9/13
to Laurent, Matthew Flatt, users
Would it be possible to use Stirling's approximation as a fast inexact first approximation for factorials of very big numbers, and from there quickly get to an exact factorial (if exactness is required)? I don't know; I haven't thought this through thoroughly.
Jos.



Hendrik Boom

Apr 9, 2013, 2:41:31 PM4/9/13
to users
On Tue, Apr 09, 2013 at 08:26:31PM +0200, Jos Koot wrote:
> Would it be possible to use
> <http://en.wikipedia.org/wiki/Stirling%27s_approximation> Stirling's
> approximation for a fast inexact first approximation for factorials of very
> big numbers and from there quickly get to an exact factorial? (if exactness
> is required) I don't know for I haven't thought thoroughly about this.
> Jos.

I don't know of a way to take an approximation and, by applying some
operation to it, get a better one.

There are ways to do this for other functions, like square root, but I
don't know one for factorial.

But there might be a bound on the error in Stirling's approximation that
you can use to determine whether that's good enough for your
application (if you have one).

-- hendrik

Neil Toronto

Apr 9, 2013, 4:00:20 PM4/9/13
to us...@racket-lang.org
On 04/09/2013 11:41 AM, Hendrik Boom wrote:
> On Tue, Apr 09, 2013 at 08:26:31PM +0200, Jos Koot wrote:
>> Would it be possible to use
>> <http://en.wikipedia.org/wiki/Stirling%27s_approximation> Stirling's
>> approximation for a fast inexact first approximation for factorials of very
>> big numbers and from there quickly get to an exact factorial? (if exactness
>> is required) I don't know for I haven't thought thoroughly about this.
>> Jos.
>
> I don't know of a way to take an approximation and, by applying some
> operation to it, get a better one.
>
> There are ways to do this for other functions, like square root, but I
> don't know one for factorial.

For Newton's method, you'd need a differentiable inverse, but computing
the gamma function's inverse is hard in the first place.

> But there might be a bound on the error in Stirling's approximation that
> you can use to determine whether that's good enough for your
> application (if you have one).

Use the first omitted series term as the signed relative error bound. If
this is e, then exp(|e|)-1 (which is nearly |e| for small e) is the
relative error bound after exponentiating.

Stirling's series isn't convergent for any particular n; i.e. for any n
there's a point at which adding more terms makes the error worse. On the
other hand, for any fixed number of terms, error decreases as *n*
increases. So the thing I'd try first is finding the n for which the
error of the non-polynomial part n * log(n) - n + 1/2 * log(2*pi*n) is
acceptable, and use its exponential

n^((2*n+1)/2) * exp(-n) * sqrt(2*pi)

for all greater n. The tricky part would be computing exp(-n) with
acceptable precision. (In fact, this is where I'd go looking for other
solutions.)
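The non-polynomial part is straightforward to evaluate if one stays in log space, which also sidesteps computing exp(-n) at all (a sketch; `log-factorial-approx` is an illustrative name):

```racket
#lang racket
(require racket/math) ; for pi

;; Stirling's approximation to log(n!): n*log(n) - n + 1/2*log(2*pi*n).
;; Working with logarithms avoids overflowing floats for large n.
(define (log-factorial-approx n)
  (+ (* n (log n)) (- n) (* 1/2 (log (* 2 pi n)))))
```

Per the first omitted series term, the error in log space is roughly 1/(12n); for n = 100 the approximation already agrees with log(100!) to about three decimal places.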

Neil ⊥

Jos Koot

Apr 9, 2013, 11:51:15 PM4/9/13
to Neil Toronto, us...@racket-lang.org, Hendrik Boom

 
Thanks for your interesting replies. It was just my first thought.
Jos
