# Computing large Bernoulli numbers


### Fredrik Johansson

May 2, 2008, 2:34:35 PM5/2/08
to sage-devel
Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
10 millionth Bernoulli number using Mathematica:
http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/

How does sage's Bernoulli number implementation compare? I'd like to
see bernoulli(10^7) in sage beating Mathematica's time. And then
computing the 20 millionth Bernoulli number...

Fredrik

### mhampton

May 2, 2008, 2:56:30 PM5/2/08
to sage-devel
It takes about 30 seconds on my machine to get the 10^5 Bernoulli
number. The mathematica blog says it took a "development" version of
mathematica 6 days to do the 10^7 calc. So it would probably take
some work, but we are not that badly off as is.

-M. Hampton

On May 2, 12:34 pm, Fredrik Johansson <fredrik.johans...@gmail.com>
wrote:
> Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> 10 millionth Bernoulli number using Mathematica:http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-recor...

### David Harvey

May 2, 2008, 3:00:26 PM5/2/08

On May 2, 2008, at 2:56 PM, mhampton wrote:

> It takes about 30 seconds on my machine to get the 10^5 Bernoulli
> number. The mathematica blog says it took a "development" version of
> mathematica 6 days to do the 10^7 calc. So it would probably take
> some work, but we are not that badly off as is.

But what about the asymptotics? I tried 10^5 and 2*10^5 and 4*10^5
and it wasn't pretty.

david

### John Cremona

May 2, 2008, 3:01:38 PM5/2/08
I might take a look at this, as there are some ways of computing Bernoulli
numbers which are very much faster than others, and not everyone knows them.
Pari has something respectable, certainly.

John


### John Cremona

May 2, 2008, 3:10:10 PM5/2/08
ok, so the docstring reveals (1) that the pari version is "by far the
fastest", as I suspected, but also (2) that for n>50000 we use a gp
interface rather than the pari library "since the C-library interface
to PARI is limited in memory for individual operations" -- whatever
that means!

That might explain David's timing observations.

I think the pari implementation is actually quite simple (and there is
a huge amount about Bernoulli numbers in Henri Cohen's latest book too)
which suggests that doing a cython implementation would not be hard.

Maybe this is time for a repeat performance of the partitions
competition with M**ca!

John


### William Stein

May 2, 2008, 3:11:51 PM5/2/08

Sage's Bernoulli number is *just* PARI/GP's bernoulli number implementation.

Last time I tried timing Sage versus Mathematica's Bernoulli number command
(which was 2 years ago), Sage was twice as fast.

William

### William Stein

May 2, 2008, 3:40:00 PM5/2/08
On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
<fredrik....@gmail.com> wrote:

I couldn't find any information about the hardware that guy used.
64-bit? 32-bit?
1.8Ghz or 3Ghz? Could somebody write and ask?

Also, when I tried

bernoulli(10^7+2)

directly in Sage a couple of issues arose, since that command
is designed much more for smaller input. I fixed those small issues.
I guess we'll see in a week ..

William

### David Harvey

May 2, 2008, 3:41:48 PM5/2/08

On May 2, 2008, at 3:40 PM, William Stein wrote:

> Also, when I tried
>
> bernoulli(10^7+2)
>
> directly in Sage there were a couple of issues that arose, since
> that command
> is much more designed for smaller input. I fixed those small issues.
> I guess we'll see in a week ..

I hope you did:

sage: x = bernoulli(10^7 + 2)

and not

sage: bernoulli(10^7 + 2)

david

### didier deshommes

May 2, 2008, 3:43:27 PM5/2/08
On Fri, May 2, 2008 at 3:40 PM, William Stein <wst...@gmail.com> wrote:
>
> On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
> <fredrik....@gmail.com> wrote:
> >
>
> > Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> > 10 millionth Bernoulli number using Mathematica:
> > http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/
> >
> > How does sage's Bernoulli number implementation compare? I'd like to
> > see bernoulli(10^7) in sage beating Mathematica's time. And then
> > computing the 20 millionth Bernoulli number...
>
> I couldn't find any information about the hardware that guy used.
> 64-bit? 32-bit?
> 1.8Ghz or 3Ghz? Could somebody write and ask?

I did earlier, and I hope he will answer.

didier

### Bill Hart

May 2, 2008, 3:43:49 PM5/2/08
to sage-devel
I think the asymptotics aren't going to go our way if we use pari. It
takes 11s for 10^5 and I've been sitting here for quite a few minutes
and didn't get 10^6 yet.

I think pari uses the zeta function to compute bernoulli numbers.

If I'm reading the code right it first computes 1/zeta(n) using the
Euler product, then computes the numerator of the bernoulli number to
the required precision using this value, then divides by the required
denominator, which is just a product of primes.
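Bill's reading of the Pari code can be turned into a small Python sketch. This is my own toy version, not Pari's code: plain floats, a trial-division prime test, and an arbitrary fixed Euler-product bound, so it is only trustworthy for small even n:

```python
from fractions import Fraction
from math import factorial, pi

def is_prime(m):
    return m > 1 and all(m % q for q in range(2, int(m**0.5) + 1))

def bernoulli_via_zeta(n, prime_bound=100):
    """Toy zeta-based computation of B_n for small even n."""
    assert n >= 2 and n % 2 == 0
    # denominator from Clausen-von Staudt: product of primes p with (p-1) | n
    d = 1
    for p in range(2, n + 2):
        if is_prime(p) and n % (p - 1) == 0:
            d *= p
    # 1/zeta(n) from the Euler product over small primes
    inv_zeta = 1.0
    for p in range(2, prime_bound):
        if is_prime(p):
            inv_zeta *= 1.0 - float(p) ** (-n)
    # |B_n| = 2 * n! * zeta(n) / (2*pi)^n; recover the exact numerator by rounding
    abs_b = 2.0 * factorial(n) / inv_zeta / (2.0 * pi) ** n
    sign = 1 if n % 4 == 2 else -1
    return Fraction(sign * round(abs_b * d), d)
```

With doubles this is exact only up to n around 20; the real implementation has to carry the whole computation at a working precision and prime bound chosen from rigorous estimates.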

Bill.

On 2 May, 20:11, "William Stein" <wst...@gmail.com> wrote:
> On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
>
> <fredrik.johans...@gmail.com> wrote:
>
> >  Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> >  10 millionth Bernoulli number using Mathematica:
> >  http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-recor...

### William Stein

May 2, 2008, 3:45:38 PM5/2/08
On Fri, May 2, 2008 at 12:10 PM, John Cremona <john.c...@gmail.com> wrote:
>
> ok, so the docstring reveals (1) that the pari version is "by far the
> fastest", as I suspected, but also (2) that for n>50000 we use a gp
> interface rather than the pari library "since the C-library interface
> to PARI
> is limited in memory for individual operations" -- whatever that means!

That's out of date now that Gonzalo T. and I fixed the pari interface
so that it automatically doubles the stack if needed. The
code needs to be fixed to never use gp by default. If you explicitly
use algorithm='pari' it will still switch to gp; changing this was
one of the fixes I made so that I could try this.

Tom is trying the whole calculation directly in GP.
He also did a log fit to timings and estimates it should take
about a week in Sage on his machine. We'll see.

>
> That might explain David's timing observations.
>
> I think the pari implementation is actually quite simple (and there is
> a huge amount about Bernoulli numbers in Henri Cohen's latest book too)
> which suggests that doing a cython implementation would not be hard.
>
> Maybe this is time for a repeat performance of the partitions
> competition with M**ca!

The complexity mostly depends on the precision one uses in
computing a certain Euler product approximation to zeta
and also the number of factors in the product. If you look
at the PARI source code the comments do *not* inspire confidence in
its correctness. I had a student give a provable bound on precision
and number of factors needed and wasn't able to get anything
as good as what PARI uses.

Here's the funny part of the PARI code (in trans3.c):

/* 1.712086 = ??? */
t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) + 1.712086;

-- William

### David Harvey

May 2, 2008, 3:46:07 PM5/2/08

On May 2, 2008, at 3:43 PM, Bill Hart wrote:

> I think the asymptotics aren't going to go our way if we use pari. It
> takes 11s for 10^5 and I've been sitting here for quite a few minutes
> and didn't get 10^6 yet.

So far I have on a 2.6GHz opteron:

sage: time x = bernoulli(60000)
Wall time: 3.79

sage: time x = bernoulli(120000)
Wall time: 16.97

sage: time x = bernoulli(240000)
Wall time: 118.24

sage: time x = bernoulli(480000)
Wall time: 540.25

and I'll report back with 960000 hopefully within an hour.

david

### William Stein

May 2, 2008, 3:46:36 PM5/2/08

I did:

t = cputime()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
b = bernoulli(10^7+2, algorithm='pari')
b.save('bern.sobj'); save(t, 'time.sobj')

after patching Sage to always use the PARI C library if algorithm='pari'.

-- William

### David Harvey

May 2, 2008, 3:55:54 PM5/2/08

On May 2, 2008, at 3:45 PM, William Stein wrote:

> The complexity mostly depends on the precision one uses in
> computing a certain Euler product approximation to zeta
> and also the number of factors in the product. If you look
> at the PARI source code the comments do *not* inspire confidence in
> its correctness. I had a student give a provable bound on precision
> and number of factors needed and wasn't able to get anything
> as good as what PARI uses.
>
> Here's the funny part of the PARI code (in trans3.c):
>
> /* 1.712086 = ??? */
> t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) +
> 1.712086;

One way to check it is to use the bernoulli_mod_p_single() function,
which computes B_k mod p for a single p and k, and uses a completely
independent algorithm.

sage: x = bernoulli(240000)

sage: p = next_prime(500000)
sage: bernoulli_mod_p_single(p, 240000)
498812
sage: x % p
498812

sage: p = next_prime(10^6)
sage: bernoulli_mod_p_single(p, 240000)
841174
sage: x % p
841174

So I would say the answer is correct.
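For reference, reducing an exact rational B_k modulo p (which is what `x % p` does on a Sage rational) is just multiplication by a modular inverse. A minimal Python sketch (the helper name is mine):

```python
from fractions import Fraction

def rational_mod_p(x, p):
    # a/b mod p = a * b^(-1) mod p, valid when p does not divide b
    return x.numerator * pow(x.denominator, -1, p) % p

# e.g. B_12 = -691/2730; its residue mod 11 can be cross-checked against
# the Kummer congruence B_12/12 = B_2/2 (mod 11), since 12 = 2 (mod 10)
```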

david

### William Stein

May 2, 2008, 4:04:10 PM5/2/08

I've done numerous similar tests, and
I definitely don't think PARI is giving wrong answers.
The issue is just that I've written a paper to generalize
the algorithm to generalized Bernoulli numbers, and was
very annoyed that I couldn't prove that even the algorithm
used by PARI worked.

-- William

### boo...@u.washington.edu

May 2, 2008, 4:08:52 PM5/2/08
Funny this should come up. William just gave a take-home midterm in which we had to predict the runtime for various computations, so I wrote some generic code to help. According to my code, and some liberal assumptions, it should take 5.1 days. I've attached the plots that show the curves I fit to some runtime data (x-axis is log(n,1.5), y-axis is seconds).

However, this same code predicted that computing the determinant of a 10000x10000 matrix with single-digit entries would take 20 hours, but it really took 30 hours. So my estimates are not to be trusted too much as the numbers grow...

### David Harvey

May 2, 2008, 4:13:15 PM5/2/08

On May 2, 2008, at 4:08 PM, boo...@u.washington.edu wrote:

> Funny this should come up. William just gave a take-home midterm
> in which we had to predict the runtime for various computations, so
> I wrote some generic code to help. According to my code, and some
> liberal assumptions, it should take 5.1 days. I've attached the
> plots that show the curves I fit to some runtime data (x-axis is log
> (n,1.5) y-axis is seconds).

Sorry, could you please say more precisely what the two axes are? I'm
seeing negative time the way I interpret your statement.

david

### boo...@u.washington.edu

May 2, 2008, 4:15:44 PM5/2/08
Sorry, the y-axis in the lower plot is log(time in seconds).

On Fri, 2 May 2008, David Harvey wrote:

### David Harvey

May 2, 2008, 4:24:05 PM5/2/08
One more data point (2.6GHz opteron):

sage: time x = bernoulli(60000)
Wall time: 3.79

sage: time x = bernoulli(120000)
Wall time: 16.97

sage: time x = bernoulli(240000)
Wall time: 118.24

sage: time x = bernoulli(480000)
Wall time: 540.25

sage: time x = bernoulli(960000)
Wall time: 2436.06

The ratios between successive times are:

4.47757255936675
6.96758986446671
4.56909675236806
4.50913465987969

If we guess that it's "really" 4.5, then the complexity is
N^(log(4.5)/log(2)) = N^2.17. This is puzzling; I thought the algorithm
was supposed to be better than quadratic. Does anyone know what the
theoretical complexity is supposed to be?
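The exponent guess can be reproduced from the timings above with a few lines of plain Python (each step doubles n, so the implied local exponent is log2 of the time ratio):

```python
from math import log

# David's wall times (seconds) on the 2.6GHz opteron
times = [(60000, 3.79), (120000, 16.97), (240000, 118.24),
         (480000, 540.25), (960000, 2436.06)]

# n doubles at each step, so the local exponent is log2(t2/t1)
exponents = [log(t2 / t1, 2) for (_, t1), (_, t2) in zip(times, times[1:])]

# crude extrapolation from the last point to n = 10^7, assuming exponent 2.17
days = 2436.06 * (10**7 / 960000) ** 2.17 / 86400
```

The extrapolation lands between 4 and 5 days, consistent with the estimates in the thread.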

Anyway, extrapolating gives about 4.5 days, pretty much the same as
what Tom estimates. I'm going to start it running now.

david

### didier deshommes

May 2, 2008, 4:34:28 PM5/2/08

---------- Forwarded message ----------
From: Oleksandr Pavlyk <pav...@gmail.com>
Date: Fri, May 2, 2008 at 4:29 PM
Subject: Re: Today We Broke the Bernoulli Record: From the Analytical
Engine to Mathematica
To: didier deshommes <dfde...@gmail.com>

Hi Didier,

I used Linux, with 64 bit AMD processor:

AMD Opteron(tm) Processor 250
cpu MHz : 1000.000
cache size : 1024 KB

and 8GB of memory, but as I say in the blog, I did
not use that much.

The calculations were done using development build of
Mathematica, but calculations will go through in any
flavor of Mathematica version 6 as well, to the best of
my knowledge. Just run

Timing[ result = BernoulliB[10^7]; ]

It will take about twice as long on 32-bit processors.

Sincerely,
Oleksandr Pavlyk

On Fri, May 2, 2008 at 2:12 PM, didier deshommes <dfde...@gmail.com> wrote:
> Hi Dr. Pavlyk,
> My question is in referrence to your blog post:
> http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/
>
> - Do you have the specs of the machine you ran this off? CPU, memory, etc.
> - I assume this function is in the development version of mathematica?
>
> Thanks for your informative post!
>
> didier

### mabshoff

May 2, 2008, 4:38:30 PM5/2/08
to sage-devel

On May 2, 10:34 pm, "didier deshommes" <dfdes...@gmail.com> wrote:

Hi,

> Hi Didier,
>
>  I used Linux, with 64 bit AMD processor:
>
>  AMD Opteron(tm) Processor 250
>  cpu MHz         : 1000.000
>  cache size      : 1024 KB

FYI: That CPU runs at 2.4GHz when not throttled, as it is in this
listing. I assume that it would run at full speed during the computation :)

>  and 8GB of memory, but as I say in the blog, I did
>  not use that much.

Yep.

Cheers,

Michael

### Bill Hart

May 2, 2008, 5:22:24 PM5/2/08
to sage-devel
I did some computations using von Staudt's theorem and up to 400000 no
errors. Of course that doesn't prove anything for much larger n.
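The von Staudt test Bill describes can be sketched in a few lines of Python (function names are mine): the Clausen-von Staudt theorem says B_n plus the sum of 1/p over primes p with (p-1) | n is an integer, which pins down both the denominator and the fractional part of B_n.

```python
from fractions import Fraction

def is_prime(m):
    return m > 1 and all(m % q for q in range(2, int(m**0.5) + 1))

def von_staudt_check(n, b):
    # Clausen-von Staudt: b + sum of 1/p over primes p with (p-1) | n
    # must be an integer if b really is B_n
    s = b + sum(Fraction(1, p) for p in range(2, n + 2)
                if is_prime(p) and n % (p - 1) == 0)
    return s.denominator == 1
```

As Bill says, passing this test constrains the denominator and the low-order part of a candidate value, but proves nothing about the rest of the numerator.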

Bill.


### Bill Hart

May 2, 2008, 7:30:44 PM5/2/08
to sage-devel
The theoretical complexity of all the algorithms that rely on
recurrences is supposed to be n^2. But this doesn't take into account
the size of the numbers themselves. When you do this they are all
about n^3 as far as I can see. You can use Ramanujan identities, the
Akiyama-Tanigawa algorithm, the identity used by Lovelace, but all are
n^3 or so.

The theoretical complexity of the version using the zeta function
looks something like n log n steps at precision n log n, i.e. time n^2
(log n)^2.

Bill.

### Bill Hart

May 2, 2008, 8:21:06 PM5/2/08
to sage-devel
Actually, it might be n/log(n) steps, so the time might be something
like n^2 though there are other terms involved.

Bill.

### Bill Hart

May 3, 2008, 8:12:36 AM5/3/08
to sage-devel
I think I nearly understand what Pari does.

The value of |B_n| is given by zeta(n)*(2*n!)/(2*pi)^n. However,
zeta(n) is *very* close to 1 for large n. So one starts by computing
zeta to a precision given by the size of (2*n!)/(2*pi)^n (which is
basically the size of B_n), with 3 added to the precision to take care
of small values of n.

To compute how large (2*n!)/(2*pi)^n can be, one uses Stirling's
formula for n!, i.e. n! ~ sqrt(2*n*pi)*n^n*e^(-n). The log of this
expression is never more than 0.1 of the log of n!. Taking the log of
the expression one gets when plugging in Stirling's approximation, one
gets that the log of B_n does not exceed (n + 0.5)*log(n) -
n*(1+log2PI) + log(2*sqrt(2*Pi)) + 0.1, except for very small n (for
which it is sufficient to add 3 to the precision).
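A quick numerical check of this reading (in Python; the bound function is my paraphrase of the Pari line, with log(d) = 0 so it bounds log |B_n| itself): the constant 1.712086 agrees with log(2*sqrt(2*Pi)) + 0.1 to six decimals, and for n = 50 the bound overshoots the true log |B_50| by just under the 0.1 of slack.

```python
from math import log, factorial, pi, sqrt

def log_numerator_bound(n, log_d):
    # my paraphrase of Pari's t (trans3.c); with log_d = 0 it bounds log |B_n|
    return log_d + (n + 0.5) * log(n) - n * (1 + log(2 * pi)) + 1.712086

# the mystery constant decomposed as Bill suggests
mystery = log(2 * sqrt(2 * pi)) + 0.1          # = 1.712086 to 6 decimals

# true value for n = 50 (zeta(50) - 1 is ~1e-15, safely negligible)
exact = log(2 * factorial(50)) - 50 * log(2 * pi)
slack = log_numerator_bound(50, 0.0) - exact   # a bit under 0.1
```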

Now we need to know how big the numerator of B_n can be. But we can
compute the denominator precisely using the Clausen-von Staudt
formula. Pari does this and calls the denominator d.

Now the log of the numerator is no bigger than t = log(d) + (n + 0.5)*log(n)
- n*(1+log2PI) + log(2*sqrt(2*Pi)) + 0.1 = log(d) + (n + 0.5)*log(n)
- n*(1+log2PI) + 1.712086.

If this is how big the numerator of B_n is, we only need to compute it
to half a part in e^t. Thus we only need to compute 1/zeta(n) to half
a part in e^t. This is done by computing the inverse of the Euler
product for sufficiently many primes. Suppose we compute this for
primes up to p. Then the error in the inverse of the zeta function is
less than 1/p^n + 1/(p+1)^n + .... Consider this sum p terms at a
time. The first p terms are <= 1/p^n, the second p terms are <= 1/(2p)^n,
etc. Thus the error is quite a bit less than zeta(n) * p/p^n ~ 1/p^(n-1).

Thus if we want this error to be less than half a part in e^t we need
to choose p to be about exp(t)^(1/(n-1)). Pari uses exp((t -
log(n-1)) / (n-1)) and I'm not sure where the extra log(n-1) comes
from, but either they use a slightly better approximation than me, or
they perhaps note that one doesn't need the first 1/p^n in the
approximation.

Pari could be improved by noting that one does not need the last few
terms of the numerator, as they can be recovered from Clausen-von
Staudt, which actually gives the fractional part of B_n. But for large
n, this is not a significant saving.

Bill.

### Bill Hart

May 3, 2008, 8:14:19 AM5/3/08
to sage-devel
Sorry, this:

The log of this expression is never more than 0.1 of the log of n!.

should read:

The log of this expression is always within 0.1 of the log of n!.

Bill.

### Bill Hart

May 3, 2008, 9:57:38 AM5/3/08
to sage-devel
I probably also mean:

Then the error in the zeta function

not:

Then the error in the inverse of the zeta function

Bill.


### boo...@u.washington.edu

May 5, 2008, 4:02:54 PM5/5/08
My computation of bernoulli(10^7+4) using GP version 2.3.3 has completed in 217417011 milliseconds. That's about 2 days, 12 hours. Anybody know how I can print the thing to a file?

Machine:
Quad-core 2.0Ghz Xeon, 1333MHz FSB, 32GB RAM.

Currently, my gp session is using 4GB of RAM.

### William Stein

May 5, 2008, 4:09:12 PM5/5/08
On Mon, May 5, 2008 at 1:02 PM, <boo...@u.washington.edu> wrote:
>
> My computation of bernoulli(10^7+4) using GP version 2.3.3 has completed in 217417011 miliseconds. That's about 2 days, 12 hours. Anybody know how I can print the thing to file?
>

So PARI is already over twice as fast as Mathematica? Cool!
BTW, I started almost the same computation at the same time on a 32-bit
build of PARI under OS X, and it will probably take days more.
We need 64-bit Sage on OS X!

> Machine:
> Quad-core 2.0Ghz Xeon, 1333MHz FSB, 32GB RAM.
>
> Currently, my gp session is using 4GB of RAM.

Use the write command:

? write("foo", 92834028340)
?


--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

### mhampton

May 6, 2008, 12:53:53 PM5/6/08
to sage-devel
That certainly merits a blog post somewhere - ?

### David Harvey

May 6, 2008, 1:09:18 PM5/6/08

You might also want to check the result using the
bernoulli_mod_p_single function that I mentioned a few days ago on
this list.

david

### boo...@u.washington.edu

May 6, 2008, 1:15:00 PM5/6/08
to sage-devel
William has mentioned some congruence tests that we can perform -- I'd like to make sure that I got the right answer before we pat ourselves on the back too much.

### William Stein

May 6, 2008, 1:18:53 PM5/6/08
On Tue, May 6, 2008 at 10:15 AM, <boo...@u.washington.edu> wrote:
>
> William has mentioned some congruence tests that we can perform -- I'd like to make sure that I got the right answer before we pat ourselves on the back too much.
>
>

David Harvey's congruence tests would be pretty good. Just choose
*any* prime p > 10^7 + 10
say and compute B_{10^7+4} modulo it using David Harvey's function;

sage: p = next_prime(10^7+10)
sage: time bernoulli_mod_p_single(p, 10^7+4)
CPU times: user 0.49 s, sys: 0.00 s, total: 0.49 s
Wall time: 0.51
8087583

Pretty cool, eh?

William

### David Harvey

May 6, 2008, 1:57:00 PM5/6/08

..... and the natural question is, could you then reconstruct the
actual bernoulli number, by computing it modulo sufficiently many
primes? Well, I did the estimates a few days ago, and it turns out
that the asymptotic behaviour of this proposed algorithm is pretty
much *the same* as the one using the zeta function (i.e. the one Pari
uses); they are both around n^2 log^2(n), if I'm not mistaken.
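The reconstruction step alluded to here is a standard CRT lift: compute B_k mod p_i for enough primes, combine the residues, and divide by the known denominator. A generic integer CRT can be sketched as (hypothetical helper names, not David's actual code):

```python
def crt_pair(r1, m1, r2, m2):
    # combine x = r1 (mod m1) and x = r2 (mod m2), with m1 and m2 coprime
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t, m1 * m2

def crt(residues, moduli):
    # fold crt_pair over all the (residue, modulus) pairs
    r, m = residues[0], moduli[0]
    for r2, m2 in zip(residues[1:], moduli[1:]):
        r, m = crt_pair(r, m, r2, m2)
    return r % m
```

For a signed numerator one would take the symmetric representative once the product of the moduli exceeds twice its absolute value.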
Unfortunately, I did some further tests, and even if I sped up
bernoulli_mod_p_single() by the largest factor I could conceive of,
overall it would still be maybe 50x slower than Pari. If anyone can
find an extra constant factor of 50x in this algorithm, I'd love to
hear it.

david

### William Stein

May 6, 2008, 2:08:11 PM5/6/08

I think maybe yours parallelizes much more easily than
Pari's, so just use 1000 computers :-).

-- William

### Mike Hansen

May 6, 2008, 2:12:30 PM5/6/08
I think a blog post with PARI timings and then timings for a modular
dsage approach would be cool.

--Mike

### David Harvey

May 6, 2008, 2:16:09 PM5/6/08

On May 6, 2008, at 2:12 PM, Mike Hansen wrote:

> I think a blog post with PARI timings and then timings for a modular
> dsage approach would be cool.

Probably not so cool, since it would be like 50 machines vs one machine.

david

### Mike Hansen

May 6, 2008, 2:19:33 PM5/6/08
> Probably not so cool, since it would be like 50 machines vs one machine.

Sure, but the Mathematica blog post is about scalability: "In Mathematica, a
core principle is that everything should be scalable. So in my job of
creating algorithms for Mathematica I have to make sure that
everything I produce is scalable."

--Mike

### mhampton

May 6, 2008, 2:25:02 PM5/6/08
to sage-devel
I agree, I think demonstrating a distributed algorithm would be very
cool. From what I can tell of processor trends, we won't see enormous
gains in speed but we might see an awful lot of processors (like
Intel's prototype 80-core chip).

### David Harvey

May 6, 2008, 2:55:15 PM5/6/08

But Pari's algorithm is already parallelisable (just farm off a bunch
of Euler factors to each thread).

The only advantage my algorithm has in "scalability" is that each
thread needs only O(1) memory, instead of O(n log n) which would be
required for Pari's algorithm. So if memory were tight, but you had
lots of processors, and your n was really big, then perhaps this
algorithm would win. But when I think about the actual numbers
involved, the economics just don't work out. Even 80 processors is a
pathetically small number, given a 50x serial slowdown in the
algorithm. You would need thousands of cpus to even consider this
approach. And even then, it would only be worthwhile if each cpu had
very limited memory.

david