Computing large Bernoulli numbers


Fredrik Johansson

May 2, 2008, 2:34:35 PM
to sage-devel
Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
10 millionth Bernoulli number using Mathematica:
http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/

How does Sage's Bernoulli number implementation compare? I'd like to
see bernoulli(10^7) in Sage beating Mathematica's time. And then
computing the 20 millionth Bernoulli number...

Fredrik

mhampton

May 2, 2008, 2:56:30 PM
to sage-devel
It takes about 30 seconds on my machine to get the 10^5th Bernoulli
number. The Mathematica blog says it took a "development" version of
Mathematica 6 days to do the 10^7 calc. So it would probably take
some work, but we are not that badly off as is.

-M. Hampton

On May 2, 12:34 pm, Fredrik Johansson <fredrik.johans...@gmail.com>
wrote:
> Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> 10 millionth Bernoulli number using Mathematica: http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-recor...

David Harvey

May 2, 2008, 3:00:26 PM
to sage-...@googlegroups.com

On May 2, 2008, at 2:56 PM, mhampton wrote:

> It takes about 30 seconds on my machine to get the 10^5th Bernoulli
> number. The Mathematica blog says it took a "development" version of
> Mathematica 6 days to do the 10^7 calc. So it would probably take
> some work, but we are not that badly off as is.

But what about the asymptotics? I tried 10^5 and 2*10^5 and 4*10^5
and it wasn't pretty.

david

John Cremona

May 2, 2008, 3:01:38 PM
to sage-...@googlegroups.com
I might take a look at this, as there are some ways of computing
Bernoulli numbers which are very much faster than others, and not
everyone knows them. Pari has something respectable, certainly.

John

2008/5/2 mhampton <hamp...@gmail.com>:

John Cremona

May 2, 2008, 3:10:10 PM
to sage-...@googlegroups.com
OK, so the docstring reveals (1) that the pari version is "by far the
fastest", as I suspected, but also (2) that for n>50000 we use a gp
interface rather than the pari library "since the C-library interface
to PARI is limited in memory for individual operations" -- whatever
that means!

That might explain David's timing observations.

I think the pari implementation is actually quite simple (and there is
a huge amount about Bernoulli numbers in Henri Cohen's latest book
too), which suggests that doing a cython implementation would not be
hard.

Maybe this is time for a repeat performance of the partitions
competition with M**ca!

John

2008/5/2 John Cremona <john.c...@gmail.com>:

William Stein

May 2, 2008, 3:11:51 PM
to sage-...@googlegroups.com

Sage's Bernoulli number function is *just* PARI/GP's Bernoulli number implementation.

Last time I tried timing Sage versus Mathematica's Bernoulli number command
(which was 2 years ago), Sage was twice as fast.

William

William Stein

May 2, 2008, 3:40:00 PM
to sage-...@googlegroups.com
On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
<fredrik....@gmail.com> wrote:
>

I couldn't find any information about the hardware that guy used.
64-bit? 32-bit?
1.8GHz or 3GHz? Could somebody write and ask?

Also, when I tried

bernoulli(10^7+2)

directly in Sage there were a couple of issues that arose, since that command
is much more designed for smaller input. I fixed those small issues.
I guess we'll see in a week ..

William

David Harvey

May 2, 2008, 3:41:48 PM
to sage-...@googlegroups.com

On May 2, 2008, at 3:40 PM, William Stein wrote:

> Also, when I tried
>
> bernoulli(10^7+2)
>
> directly in Sage there were a couple of issues that arose, since
> that command
> is much more designed for smaller input. I fixed those small issues.
> I guess we'll see in a week ..

I hope you did:

sage: x = bernoulli(10^7 + 2)

and not

sage: bernoulli(10^7 + 2)

david

didier deshommes

May 2, 2008, 3:43:27 PM
to sage-...@googlegroups.com
On Fri, May 2, 2008 at 3:40 PM, William Stein <wst...@gmail.com> wrote:
>
> On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
> <fredrik....@gmail.com> wrote:
> >
>
> > Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> > 10 millionth Bernoulli number using Mathematica:
> > http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/
> >
> > How does sage's Bernoulli number implementation compare? I'd like to
> > see bernoulli(10^7) in sage beating Mathematica's time. And then
> > computing the 20 millionth Bernoulli number...
>
> I couldn't find any information about the hardware that guy used.
> 64-bit? 32-bit?
> 1.8GHz or 3GHz? Could somebody write and ask?

I did earlier, and I hope he will answer.

didier

Bill Hart

May 2, 2008, 3:43:49 PM
to sage-devel
I think the asymptotics aren't going to go our way if we use pari. It
takes 11s for 10^5 and I've been sitting here for quite a few minutes
and didn't get 10^6 yet.

I think pari uses the zeta function to compute Bernoulli numbers.

If I'm reading the code right it first computes 1/zeta(n) using the
Euler product, then computes the numerator of the Bernoulli number to
the required precision using this value, then divides by the required
denominator, which is just a product of primes.
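
If that reading is right, the method is easy to prototype. Here is a
minimal Python sketch of the approach just described (assuming the
mpmath library; an illustration of the idea, not PARI's actual code):

# For even n >= 2: |B_n| = 2*n!/(2*pi)^n * zeta(n); the denominator is
# the product of the primes p with (p-1) | n (von Staudt-Clausen), and
# the numerator is the nearest integer to denominator * |B_n|.
from fractions import Fraction
import mpmath

def is_prime(p):
    return p > 1 and all(p % q for q in range(2, int(p**0.5) + 1))

def bernoulli_zeta(n):
    d = 1
    for p in range(2, n + 2):
        if is_prime(p) and n % (p - 1) == 0:
            d *= p
    # Crude working precision: a bit more than the digits of d * |B_n|.
    mpmath.mp.dps = int(n * mpmath.log(n, 10)) + 50
    absB = 2 * mpmath.factorial(n) / (2 * mpmath.pi)**n * mpmath.zeta(n)
    numerator = int(mpmath.nint(d * absB))
    sign = 1 if n % 4 == 2 else -1    # B_2 > 0, B_4 < 0, B_6 > 0, ...
    return Fraction(sign * numerator, d)

print(bernoulli_zeta(12))   # -691/2730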

Bill.

On 2 May, 20:11, "William Stein" <wst...@gmail.com> wrote:
> On Fri, May 2, 2008 at 11:34 AM, Fredrik Johansson
>
> <fredrik.johans...@gmail.com> wrote:
>
> >  Oleksandr Pavlyk reports on the Wolfram Blog that he has computed the
> >  10 millionth Bernoulli number using Mathematica:
> >  http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-recor...

William Stein

May 2, 2008, 3:45:38 PM
to sage-...@googlegroups.com, Kevin McGown
On Fri, May 2, 2008 at 12:10 PM, John Cremona <john.c...@gmail.com> wrote:
>
> OK, so the docstring reveals (1) that the pari version is "by far the
> fastest", as I suspected, but also (2) that for n>50000 we use a gp
> interface rather than the pari library "since the C-library interface
> to PARI is limited in memory for individual operations" -- whatever
> that means!

That's out of date now that Gonzalo T. and I fixed the pari interface
so that it automatically doubles the stack if needed. The
code needs to be fixed to never use gp by default. If you explicitly
use algorithm='pari' it will still switch to gp; changing this was
one of the fixes I made to my copy so I could try this.

Tom is trying the whole calculation directly in GP.
He also did a log fit to timings and estimates it should take
about a week in Sage on his machine. We'll see.

>
> That might explain David's timing observations.
>
> I think the pari implementation is actually quite simple (and there is
> a huge amount about Bernoulli numbers in Henri Cohen's latest book
> too), which suggests that doing a cython implementation would not be
> hard.
>
> Maybe this is time for a repeat performance of the partitions
> competition with M**ca!

The complexity mostly depends on the precision one uses in
computing a certain Euler product approximation to zeta
and also the number of factors in the product. If you look
at the PARI source code the comments do *not* inspire confidence in
its correctness. I had a student give a provable bound on precision
and number of factors needed and wasn't able to get anything
as good as what PARI uses.

Here's the funny part of the PARI code (in trans3.c):

/* 1.712086 = ??? */
t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) + 1.712086;

-- William

David Harvey

May 2, 2008, 3:46:07 PM
to sage-...@googlegroups.com

On May 2, 2008, at 3:43 PM, Bill Hart wrote:

> I think the asymptotics aren't going to go our way if we use pari. It
> takes 11s for 10^5 and I've been sitting here for quite a few minutes
> and didn't get 10^6 yet.

So far I have on a 2.6GHz opteron:

sage: time x = bernoulli(60000)
Wall time: 3.79

sage: time x = bernoulli(120000)
Wall time: 16.97

sage: time x = bernoulli(240000)
Wall time: 118.24

sage: time x = bernoulli(480000)
Wall time: 540.25

and I'll report back with 960000 hopefully within an hour.

david

William Stein

May 2, 2008, 3:46:36 PM
to sage-...@googlegroups.com

I did:

t = cputime()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
b = bernoulli(10^7+2, algorithm='pari')
b.save('bern.sobj')
save(t, 'time.sobj')

after patching Sage to always use the PARI C library if algorithm='pari'.

-- William

David Harvey

May 2, 2008, 3:55:54 PM
to sage-...@googlegroups.com

On May 2, 2008, at 3:45 PM, William Stein wrote:

> The complexity mostly depends on the precision one uses in
> computing a certain Euler product approximation to zeta
> and also the number of factors in the product. If you look
> at the PARI source code the comments do *not* inspire confidence in
> its correctness. I had a student give a provable bound on precision
> and number of factors needed and wasn't able to get anything
> as good as what PARI uses.
>
> Here's the funny part of the PARI code (in trans3.c):
>
> /* 1.712086 = ??? */
> t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) +
> 1.712086;

One way to check it is to use the bernoulli_mod_p_single() function,
which computes B_k mod p for a single p and k, and uses a completely
independent algorithm.

sage: x = bernoulli(240000)

sage: p = next_prime(500000)
sage: bernoulli_mod_p_single(p, 240000)
498812
sage: x % p
498812

sage: p = next_prime(10^6)
sage: bernoulli_mod_p_single(p, 240000)
841174
sage: x % p
841174

So I would say the answer is correct.

david

William Stein

May 2, 2008, 4:04:10 PM
to sage-...@googlegroups.com

I've done numerous similar tests, and
I definitely don't think PARI is giving wrong answers.
The issue is just that I've written a paper to generalize
the algorithm to generalized Bernoulli numbers, and was
very annoyed that I couldn't prove that even the algorithm
used by PARI worked.

-- William

boo...@u.washington.edu

May 2, 2008, 4:08:52 PM
to sage-...@googlegroups.com
Funny this should come up. William just gave a take-home midterm in which we had to predict the runtime for various computations, so I wrote some generic code to help. According to my code, and some liberal assumptions, it should take 5.1 days. I've attached the plots that show the curves I fit to some runtime data (x-axis is log(n, 1.5); y-axis is seconds).

However, this same code predicted that computing the determinant of a 10000x10000 matrix with single-digit entries would take 20 hours, but it really took 30 hours. So my estimates are not to be trusted too much as the numbers grow...

David Harvey

May 2, 2008, 4:13:15 PM
to sage-...@googlegroups.com

On May 2, 2008, at 4:08 PM, boo...@u.washington.edu wrote:

> Funny this should come up. William just gave a take-home midterm
> in which we had to predict the runtime for various computations, so
> I wrote some generic code to help. According to my code, and some
> liberal assumptions, it should take 5.1 days. I've attached the
> plots that show the curves I fit to some runtime data (x-axis is log
> (n,1.5) y-axis is seconds).

Sorry, could you please say more precisely what the two axes are? I'm
seeing negative time the way I interpret your statement.

david

boo...@u.washington.edu

May 2, 2008, 4:15:44 PM
to sage-...@googlegroups.com
Sorry, the y-axis in the lower plot is log(time in seconds).


On Fri, 2 May 2008, David Harvey wrote:

David Harvey

May 2, 2008, 4:24:05 PM
to sage-...@googlegroups.com
One more data point (2.6GHz opteron):

sage: time x = bernoulli(60000)
Wall time: 3.79

sage: time x = bernoulli(120000)
Wall time: 16.97

sage: time x = bernoulli(240000)
Wall time: 118.24

sage: time x = bernoulli(480000)
Wall time: 540.25

sage: time x = bernoulli(960000)
Wall time: 2436.06

The ratios between successive times are:

4.47757255936675
6.96758986446671
4.56909675236806
4.50913465987969

If we guess that it's "really" 4.5, then the complexity is
N^(log(4.5) / log(2)) = N^2.17. This is puzzling; I thought the algorithm
was supposed to be better than quadratic. Does anyone know what the
theoretical complexity is supposed to be?

Anyway, extrapolating gives about 4.5 days, pretty much the same as
what Tom estimates. I'm going to start it running now.
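
For the record, here is that extrapolation as a small Python sketch
(hypothetical code; the timings are the ones quoted above, and 4.5 is
the guessed doubling ratio):

import math

times = {60000: 3.79, 120000: 16.97, 240000: 118.24,
         480000: 540.25, 960000: 2436.06}
ns = sorted(times)
for a, b in zip(ns, ns[1:]):
    r = times[b] / times[a]
    print(b, r, math.log(r, 2))    # doubling ratio and implied exponent

# A doubling ratio of 4.5 means exponent log2(4.5) = 2.17; scaling the
# 960000 timing up to 10^7 then gives roughly 4.5 days:
c = math.log(4.5, 2)
print(times[960000] * (10**7 / 960000)**c / 86400, "days")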

david

didier deshommes

May 2, 2008, 4:34:28 PM
to sage-...@googlegroups.com
Here is some more information about the machine used to compute this:

---------- Forwarded message ----------
From: Oleksandr Pavlyk <pav...@gmail.com>
Date: Fri, May 2, 2008 at 4:29 PM
Subject: Re: Today We Broke the Bernoulli Record: From the Analytical
Engine to Mathematica
To: didier deshommes <dfde...@gmail.com>


Hi Didier,

I used Linux, with a 64-bit AMD processor:

AMD Opteron(tm) Processor 250
cpu MHz : 1000.000
cache size : 1024 KB

and 8GB of memory, but as I say in the blog, I did
not use that much.

The calculations were done using a development build of
Mathematica, but they will go through in any
flavor of Mathematica version 6 as well, to the best of
my knowledge. Just run

Timing[ result = BernoulliB[10^7]; ]

It will take about twice as long on 32-bit processors,
thus about 2 weeks.

Please do not hesitate to ask further questions.

Sincerely,
Oleksandr Pavlyk

On Fri, May 2, 2008 at 2:12 PM, didier deshommes <dfde...@gmail.com> wrote:
> Hi Dr. Pavlyk,
> My question is in reference to your blog post:
> http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/
>
> - Do you have the specs of the machine you ran this on? CPU, memory, etc.
> - I assume this function is in the development version of mathematica?
>
> Thanks for your informative post!
>
> didier

mabshoff

May 2, 2008, 4:38:30 PM
to sage-devel


On May 2, 10:34 pm, "didier deshommes" <dfdes...@gmail.com> wrote:
> Here is some more information about the machine used to compute this:

Hi,

> Hi Didier,
>
>  I used Linux, with 64 bit AMD processor:
>
>  AMD Opteron(tm) Processor 250
>  cpu MHz         : 1000.000
>  cache size      : 1024 KB

FYI: That CPU runs at 2.4GHz when not throttled, as it is in this
listing. I assume it would run at full speed during the computation :)

>  and 8GB of memory, but as I say in the blog, I did
>  not use that much.

Yep.

Cheers,

Michael

Bill Hart

May 2, 2008, 5:22:24 PM
to sage-devel
I did some computations using von Staudt's theorem, and up to 400000
there were no errors. Of course that doesn't prove anything for much
larger n.
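
For illustration, here is a minimal Python sketch of that kind of
check (assuming sympy): von Staudt-Clausen says that B_n plus the sum
of 1/p over the primes p with (p-1) | n is an integer, so the
fractional part of a computed B_n is completely determined.

# Check that B_n + sum(1/p : p prime, (p-1) | n) is an integer.
from fractions import Fraction
from sympy import bernoulli, isprime

def von_staudt_ok(n):
    b = bernoulli(n)    # sympy Rational
    b = Fraction(int(b.p), int(b.q))
    correction = sum((Fraction(1, p) for p in range(2, n + 2)
                      if isprime(p) and n % (p - 1) == 0), Fraction(0))
    return (b + correction).denominator == 1

# Small range here just for illustration:
assert all(von_staudt_ok(n) for n in range(2, 402, 2))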

Bill.

On 2 May, 21:04, "William Stein" <wst...@gmail.com> wrote:

Bill Hart

May 2, 2008, 7:30:44 PM
to sage-devel
The theoretical complexity of all the algorithms that rely on
recurrences is supposed to be n^2. But this doesn't take into account
the size of the numbers themselves. Once you do, they are all about
n^3 as far as I can see. You can use Ramanujan identities, the
Akiyama-Tanigawa algorithm, or the identity used by Lovelace, but all
are n^3 or so.
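
For reference, here is a short Python sketch of the Akiyama-Tanigawa
recurrence (exact rational arithmetic; the n^2 operation count costs
roughly n^3 bit operations because the entries grow):

from fractions import Fraction

def bernoulli_at(n):
    """Return [B_0, ..., B_n] via Akiyama-Tanigawa (B_1 = +1/2 convention)."""
    a = [Fraction(0)] * (n + 1)
    result = []
    for m in range(n + 1):
        a[m] = Fraction(1, m + 1)
        for j in range(m, 0, -1):
            a[j - 1] = j * (a[j - 1] - a[j])
        result.append(a[0])    # a[0] now holds B_m
    return result

print(bernoulli_at(12)[12])    # -691/2730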

The theoretical complexity of the version using the zeta function
looks something like n log n steps at precision n log n, i.e. time n^2
(log n)^2.

Bill.

Bill Hart

May 2, 2008, 8:21:06 PM
to sage-devel
Actually, it might be n/log(n) steps, so the time might be something
like n^2 though there are other terms involved.

Bill.

Bill Hart

May 3, 2008, 8:12:36 AM
to sage-devel
I think I nearly understand what Pari does.

The value of |B_n| is given by zeta(n)*(2*n!)/(2^n pi^n). However,
zeta(n) is *very* close to 1 for large n. So one starts by computing
zeta to a precision given by the size of (2*n!)/(2^n pi^n) (which is
basically the size of B_n), with 3 added to the precision to take care
of small values of n.

To compute how large (2*n!)/(2^n pi^n) can be, one uses Stirling's
formula for n!, i.e. n! ~ sqrt(2*pi*n)*n^n*e^(-n). The log of this
expression is never more than 0.1 of the log of n!. Taking the log of
the expression one gets when plugging in Stirling's approximation, one
gets that the log of |B_n| does not exceed (n + 0.5) * log(n) -
n*(1+log2PI) + log(2*sqrt(2*Pi)) + 0.1 except for very small n (for
which it is sufficient to add 3 to the precision).

Now we need to know how big the numerator of B_n can be. But we can
compute the denominator precisely using the Clausen-von Staudt
formula. Pari does this and calls the denominator d.

Now the log of the numerator is no bigger than t = log(d) + (n + 0.5)
* log(n) - n*(1+log2PI) + log(2*sqrt(2*Pi)) + 0.1 = log(d) + (n +
0.5) * log(n) - n*(1+log2PI) + 1.712086.
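
On this reading, the mystery constant is just log(2*sqrt(2*Pi)) + 0.1,
and a one-line Python check confirms the match:

import math
print(math.log(2 * math.sqrt(2 * math.pi)) + 0.1)   # 1.7120857...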

If this is how big the numerator of B_n is, we only need to compute it
to half a part in e^t. Thus we only need to compute 1/zeta(n) to half
a part in e^t. This is done by computing the inverse of the Euler
product for sufficiently many primes. Suppose we compute this for
primes up to p. Then the error in the inverse of the zeta function is
less than the sum 1/p^n + 1/(p+1)^n + ... Consider this sum p terms at
a time. The first p terms are <= 1/p^n, the second p terms are
<= 1/(2p)^n, etc. Thus the error is quite a bit less than
zeta(n) * p/p^n ~ 1/p^(n-1).
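
A quick numeric spot check of this bound, with small n and p purely
for illustration (assuming the mpmath library):

import mpmath
mpmath.mp.dps = 50

n, p = 20, 11
inv_zeta_trunc = mpmath.mpf(1)
for q in [2, 3, 5, 7, 11]:                 # the primes up to p
    inv_zeta_trunc *= 1 - mpmath.mpf(q)**(-n)
err = abs(inv_zeta_trunc - 1 / mpmath.zeta(n))
print(err < mpmath.mpf(p)**(1 - n))        # True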

Thus if we want this error to be less than half a part in e^t we need
to choose p to be about exp(t)^(1/(n-1)). Pari uses exp((t -
log(n-1)) / (n-1)) and I'm not sure where the extra log(n-1) comes
from, but either they use a slightly better approximation than me, or
they perhaps note that one doesn't need the first 1/p^n in the
approximation.

Pari could be improved by noting that one does not need the last few
digits of the numerator, as they can be recovered from Clausen-von
Staudt, which actually gives the fractional part of B_n. But for large
n, this is not a significant saving.

Bill.

Bill Hart

May 3, 2008, 8:14:19 AM
to sage-devel
Sorry, this:

The log of this expression is never more than 0.1 of the log of n!.

should read:

The log of this expression is always within 0.1 of the log of n!.

Bill.

Bill Hart

May 3, 2008, 9:57:38 AM
to sage-devel
I probably also mean:

Then the error in the zeta function

not:

Then the error in the inverse of the zeta function

Bill.

On 3 May, 13:12, Bill Hart <goodwillh...@googlemail.com> wrote:

boo...@u.washington.edu

May 5, 2008, 4:02:54 PM
to sage-...@googlegroups.com
My computation of bernoulli(10^7+4) using GP version 2.3.3 has completed in 217417011 milliseconds. That's about 2 days, 12 hours. Anybody know how I can print the thing to a file?

Machine:
Quad-core 2.0Ghz Xeon, 1333MHz FSB, 32GB RAM.

Currently, my gp session is using 4GB of RAM.


William Stein

May 5, 2008, 4:09:12 PM
to sage-...@googlegroups.com
On Mon, May 5, 2008 at 1:02 PM, <boo...@u.washington.edu> wrote:
>
> My computation of bernoulli(10^7+4) using GP version 2.3.3 has completed in 217417011 milliseconds. That's about 2 days, 12 hours. Anybody know how I can print the thing to a file?
>

So PARI is already over twice as fast as Mathematica? Cool!
BTW, I started almost the same computation at the same time on a 32-bit
build of PARI under OS X, and it will probably take days more.
We need 64-bit Sage on OS X!

> Machine:
> Quad-core 2.0Ghz Xeon, 1333MHz FSB, 32GB RAM.
>
> Currently, my gp session is using 4GB of RAM.

Use the write command:

? write("foo", 92834028340)
?



--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

mhampton

May 6, 2008, 12:53:53 PM
to sage-devel
That certainly merits a blog post somewhere - ?

David Harvey

May 6, 2008, 1:09:18 PM
to sage-...@googlegroups.com

You might also want to check the result using the
bernoulli_mod_p_single function that I mentioned a few days ago on
this thread.

david

boo...@u.washington.edu

May 6, 2008, 1:15:00 PM
to sage-devel
William has mentioned some congruence tests that we can perform -- I'd like to make sure that I got the right answer before we pat ourselves on the back too much.

William Stein

May 6, 2008, 1:18:53 PM
to sage-...@googlegroups.com
On Tue, May 6, 2008 at 10:15 AM, <boo...@u.washington.edu> wrote:
>
> William has mentioned some congruence tests that we can perform -- I'd like to make sure that I got the right answer before we pat ourselves on the back too much.
>
>

David Harvey's congruence tests would be pretty good. Just choose
*any* prime p > 10^7 + 10, say, and compute B_{10^7+4} modulo it
using David Harvey's function:

sage: p = next_prime(10^7+10)
sage: time bernoulli_mod_p_single(p, 10^7+4)
CPU times: user 0.49 s, sys: 0.00 s, total: 0.49 s
Wall time: 0.51
8087583

Pretty cool, eh?

William

David Harvey

May 6, 2008, 1:57:00 PM
to sage-...@googlegroups.com

..... and the natural question is, could you then reconstruct the
actual Bernoulli number by computing it modulo sufficiently many
primes? Well, I did the estimates a few days ago, and it turns out
that the asymptotic behaviour of this proposed algorithm is pretty
much *the same* as the one using the zeta function (i.e. the one Pari
uses); they are both around n^2 log^2(n), if I'm not mistaken.
Unfortunately, I did some further tests, and even if I sped up
bernoulli_mod_p_single() by the largest factor I could conceive of,
overall it would still be maybe 50x slower than Pari. If anyone can
find an extra constant factor of 50x in this algorithm, I'd love to
hear about it.
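
To make the reconstruction concrete, here is a toy Python sketch
(assuming sympy, and Python 3.8+ for pow(x, -1, p)). In the real
algorithm the residues would come from an independent routine like
bernoulli_mod_p_single; here bern_mod_p is a stand-in computed from
the exact value, purely to illustrate the CRT combination and the
signed lift.

from fractions import Fraction
from sympy import bernoulli, nextprime

def bern_mod_p(n, p):
    b = bernoulli(n)                        # stand-in for the real routine
    return int(b.p) * pow(int(b.q), -1, p) % p

def reconstruct(n, denom, bound):
    # denom = denominator of B_n (von Staudt-Clausen); bound >= |numerator|.
    m, x = 1, 0                             # running CRT modulus and residue
    p = n + 2                               # primes > n + 1 avoid the denominator
    while m <= 2 * bound:
        p = int(nextprime(p))
        r = denom * bern_mod_p(n, p) % p    # numerator of B_n mod p
        x = (x + m * ((r - x) * pow(m, -1, p) % p)) % (m * p)
        m *= p
    if x > m // 2:                          # signed lift to (-m/2, m/2]
        x -= m
    return Fraction(x, denom)

print(reconstruct(12, 2730, 10**6))         # -691/2730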

david

William Stein

May 6, 2008, 2:08:11 PM
to sage-...@googlegroups.com

I think maybe yours parallelizes much more easily than
Pari's, so just use 1000 computers :-).

-- William

Mike Hansen

May 6, 2008, 2:12:30 PM
to sage-...@googlegroups.com
I think a blog post with PARI timings and then timings for a modular
dsage approach would be cool.

--Mike

David Harvey

May 6, 2008, 2:16:09 PM
to sage-...@googlegroups.com

On May 6, 2008, at 2:12 PM, Mike Hansen wrote:

> I think a blog post with PARI timings and then timings for a modular
> dsage approach would be cool.

Probably not so cool, since it would be like 50 machines vs one machine.

david

Mike Hansen

May 6, 2008, 2:19:33 PM
to sage-...@googlegroups.com
> Probably not so cool, since it would be like 50 machines vs one machine.

Sure, but the Mathematica blog post is about scalability: "In Mathematica, a
core principle is that everything should be scalable. So in my job of
creating algorithms for Mathematica I have to make sure that
everything I produce is scalable."

--Mike

mhampton

May 6, 2008, 2:25:02 PM
to sage-devel
I agree, I think demonstrating a distributed algorithm would be very
cool. From what I can tell of processor trends, we won't see enormous
gains in speed but we might see an awful lot of processors (like
Intel's prototype 80-core chip).

David Harvey

May 6, 2008, 2:55:15 PM
to sage-...@googlegroups.com

But Pari's algorithm is already parallelisable (just farm off a bunch
of Euler factors to each thread).

The only advantage my algorithm has in "scalability" is that each
thread needs only O(1) memory, instead of O(n log n) which would be
required for Pari's algorithm. So if memory were tight, but you had
lots of processors, and your n was really big, then perhaps this
algorithm would win. But when I think about the actual numbers
involved, the economics just don't work out. Even 80 processors is a
pathetically small number, given a 50x serial slowdown in the
algorithm. You would need thousands of cpus to even consider this
approach. And even then, it would only be worthwhile if each cpu had
very limited memory.

david

William Stein

May 6, 2008, 5:40:30 PM
to sage-...@googlegroups.com
On Tue, May 6, 2008 at 11:55 AM, David Harvey <dmha...@math.harvard.edu> wrote:
>
>
> On May 6, 2008, at 2:19 PM, Mike Hansen wrote:
>
> >> Probably not so cool, since it would be like 50 machines vs one
> >> machine.
> >
> > Sure, but the Mathematica blog post is scalablity: "In Mathematica, a
> > core principle is that everything should be scalable. So in my job of
> > creating algorithms for Mathematica I have to make sure that
> > everything I produce is scalable."
>
> But Pari's algorithm is already parallelisable (just farm off a bunch
> of euler factors to each thread).

Yep. I was about to point out that I was joking in my remark about
parallelizing your code. I certainly agree with the above suggested
way to parallelize computing zeta.

>
> The only advantage my algorithm has in "scalability" is that each
> thread needs only O(1) memory, instead of O(n log n) which would be
> required for Pari's algorithm. So if memory were tight, but you had
> lots of processors, and your n was really big, then perhaps this
> algorithm would win. But when I think about the actual numbers
> involved, the economics just don't work out. Even 80 processors is a
> pathetically small number, given a 50x serial slowdown in the
> algorithm. You would need thousands of cpus to even consider this
> approach. And even then, it would only be worthwhile if each cpu had
> very limited memory.

Hmm... That's exactly the configuration of a lot of interesting supercomputers
these days. See
http://domino.research.ibm.com/comm/research_projects.nsf/pages/bluegene.index.html
They have these single-cabinet, tiny-memory, 1024-processor machines...

David Harvey

Jul 2, 2008, 3:28:13 PM
to sage-...@googlegroups.com
Returning to a slightly old thread.....

I have implemented a new algorithm for computing large Bernoulli numbers.

Running on 10 cores for 5.5 days, I computed B_k for k = 10^8, which I believe is a new record. (Recall the Mathematica blog post  from April was for k = 10^7.)

Essentially it's the multimodular algorithm I suggested earlier on this thread, but I figured out some tricks to optimise the crap out of the computation of B_k mod p.

Patch is up for review here:

http://sagetrac.org/sage_trac/ticket/3542

One data point, on a 2.6GHz opteron:

sage: time x = bernoulli(10^5, algorithm="pari")
CPU times: user 0.16 s, sys: 0.00 s, total: 0.16 s
Wall time: 20.57 s
sage: time y = bernoulli(10^5, algorithm="bernmm")
CPU times: user 6.54 s, sys: 0.00 s, total: 6.54 s
Wall time: 6.54 s
sage: time z = bernoulli(10^5, algorithm="bernmm", num_threads=3)
CPU times: user 6.54 s, sys: 0.03 s, total: 6.57 s
Wall time: 2.71 s
sage: x == y
True
sage: x == z
True

Timings for some bigger k:

k = 10^7:
PARI/GP = 75 h
Mathematica = 142 h
bernmm (1 core) = 11.1 h
bernmm (10 cores) = 1.3 h

k = 10^8:
bernmm (10 cores) = 131h = 5.5 days

david

mabshoff

Jul 6, 2008, 4:32:06 AM
to sage-devel


On Jul 2, 12:28 pm, David Harvey <dmhar...@math.harvard.edu> wrote:

Hi David,

> Returning to a slightly old thread.....

I meant to answer earlier ...


> I have implemented a new algorithm for computing large Bernoulli  
> numbers.
>
> Running on 10 cores for 5.5 days, I computed B_k for k = 10^8, which  
> I believe is a new record. (Recall the Mathematica blog post  from  
> April was for k = 10^7.)

w00t :)

> Essentially it's the multimodular algorithm I suggested earlier on  
> this thread, but I figured out some tricks to optimise the crap out  
> of the computation of B_k mod p.
>
> Patch is up for review here:
>
> http://sagetrac.org/sage_trac/ticket/3542

We should assign an editor for that. The use of pthreads is OK; we
might have to disable or port some code for the Windows port. There
seem to be multiple pthread libraries for Win32, but I cannot say for
sure yet whether they work.

Very, very impressive, to say the least.

Cheers,

Michael