> It takes about 30 seconds on my machine to get the 10^5 Bernoulli
> number. The mathematica blog says it took a "development" version of
> mathematica 6 days to do the 10^7 calc. So it would probably take
> some work, but we are not that badly off as is.
But what about the asymptotics? I tried 10^5 and 2*10^5 and 4*10^5
and it wasn't pretty.
david
John
2008/5/2 mhampton <hamp...@gmail.com>:
That might explain David's timing observations.
I think the PARI implementation is actually quite simple (and there is
a huge amount about Bernoulli numbers in Henri Cohen's latest book too)
which suggests that doing a cython implementation would not be hard.
Maybe this is time for a repeat performance of the partitions
competition with M**ca!
John
2008/5/2 John Cremona <john.c...@gmail.com>:
Sage's Bernoulli number is *just* PARI/GP's bernoulli number implementation.
Last time I tried timing Sage versus Mathematica's Bernoulli number command
(which was 2 years ago), Sage was twice as fast.
William
I couldn't find any information about the hardware that guy used.
64-bit? 32-bit?
1.8GHz or 3GHz? Could somebody write and ask?
Also, when I tried
bernoulli(10^7+2)
directly in Sage there were a couple of issues that arose, since that command
is much more designed for smaller input. I fixed those small issues.
I guess we'll see in a week ..
William
> Also, when I tried
>
> bernoulli(10^7+2)
>
> directly in Sage there were a couple of issues that arose, since
> that command
> is much more designed for smaller input. I fixed those small issues.
> I guess we'll see in a week ..
I hope you did:
sage: x = bernoulli(10^7 + 2)
and not
sage: bernoulli(10^7 + 2)
david
I did earlier, and I hope he will answer.
didier
That's out of date now that Gonzalo T. and I fixed the PARI interface
so that it automatically doubles the stack if needed. The bernoulli
code still needs to be fixed to never use gp by default. If you
explicitly use algorithm='pari' it will still switch to gp; changing
that was one of the fixes I made so I could try this computation.
Tom is trying the whole calculation directly in GP.
He also did a log fit to timings and estimates it should take
about a week in Sage on his machine. We'll see.
>
> That might explain David's timing observations.
>
> I think the PARI implementation is actually quite simple (and there is
> a huge amount about Bernoulli numbers in Henri Cohen's latest book too)
> which suggests that doing a cython implementation would not be hard.
>
> Maybe this is time for a repeat performance of the partitions
> competition with M**ca!
The complexity mostly depends on the precision one uses in
computing a certain Euler product approximation to zeta
and also the number of factors in the product. If you look
at the PARI source code the comments do *not* inspire confidence in
its correctness. I had a student give a provable bound on precision
and number of factors needed and wasn't able to get anything
as good as what PARI uses.
Here's the funny part of the PARI code (in trans3.c):
/* 1.712086 = ??? */
t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) + 1.712086;
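[For context: the zeta connection behind that code is the classical formula
|B_k| = 2*k!/(2*pi)^k * zeta(k) for even k >= 2, with zeta approximated by a
truncated Euler product. Here is a toy float sketch of that idea -- not
PARI's actual algorithm, and the prime bound and num_factors below are
arbitrary choices:]

```python
import math

def bernoulli_abs_approx(k, num_factors=50):
    """|B_k| for even k >= 2 via |B_k| = 2*k! / (2*pi)^k * zeta(k),
    with zeta(k) approximated by a truncated Euler product over primes.
    (num_factors is an arbitrary choice for this toy version.)"""
    # Small primes by trial division; enough for a toy demonstration.
    primes = [p for p in range(2, 10**4)
              if all(p % q for q in range(2, int(p ** 0.5) + 1))][:num_factors]
    zeta = 1.0
    for p in primes:
        zeta *= 1.0 / (1.0 - p ** (-float(k)))
    return 2 * math.factorial(k) / (2 * math.pi) ** k * zeta

print(bernoulli_abs_approx(10))  # ~ 5/66 = 0.0757575...
```

The hard part PARI faces is exactly what the mysterious constant controls:
how much working precision and how many Euler factors suffice for a provably
correct rounding; floats dodge that question entirely.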
-- William
> I think the asymptotics aren't going to go our way if we use pari. It
> takes 11s for 10^5 and I've been sitting here for quite a few minutes
> and didn't get 10^6 yet.
So far I have on a 2.6GHz Opteron:
sage: time x = bernoulli(60000)
Wall time: 3.79
sage: time x = bernoulli(120000)
Wall time: 16.97
sage: time x = bernoulli(240000)
Wall time: 118.24
sage: time x = bernoulli(480000)
Wall time: 540.25
and I'll report back with 960000 hopefully within an hour.
david
I did:
t = cputime()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
pari.allocatemem(); pari.allocatemem(); pari.allocatemem(); pari.allocatemem()
b = bernoulli(10^7 + 2, algorithm='pari')
b.save('bern.sobj')
save(t, 'time.sobj')
after patching Sage to always use the PARI C library if algorithm='pari'.
-- William
> The complexity mostly depends on the precision one uses in
> computing a certain Euler product approximation to zeta
> and also the number of factors in the product. If you look
> at the PARI source code the comments do *not* inspire confidence in
> its correctness. I had a student give a provable bound on precision
> and number of factors needed and wasn't able to get anything
> as good as what PARI uses.
>
> Here's the funny part of the PARI code (in trans3.c):
>
> /* 1.712086 = ??? */
> t = log( gtodouble(d) ) + (n + 0.5) * log(n) - n*(1+log2PI) +
> 1.712086;
One way to check it is to use the bernoulli_mod_p_single() function,
which computes B_k mod p for a single p and k, and uses a completely
independent algorithm.
sage: x = bernoulli(240000)
sage: p = next_prime(500000)
sage: bernoulli_mod_p_single(p, 240000)
498812
sage: x % p
498812
sage: p = next_prime(10^6)
sage: bernoulli_mod_p_single(p, 240000)
841174
sage: x % p
841174
So I would say the answer is correct.
david
I've done numerous similar tests, and
I definitely don't think PARI is giving wrong answers.
The issue is just that I've written a paper to generalize
the algorithm to generalized Bernoulli numbers, and was
very annoyed that I couldn't prove that even the algorithm
used by PARI worked.
-- William
However, this same code predicted that computing the determinant of a 10000x10000 matrix with single-digit entries would take 20 hours, but it really took 30 hours. So my estimates are not to be trusted too much as the numbers grow...
> Funny this should come up. William just gave a take-home midterm
> in which we had to predict the runtime for various computations, so
> I wrote some generic code to help. According to my code, and some
> liberal assumptions, it should take 5.1 days. I've attached the
plots that show the curves I fit to some runtime data (x-axis is
log(n, 1.5), y-axis is seconds).
Sorry, could you please say more precisely what the two axes are? I'm
seeing negative time the way I interpret your statement.
david
On Fri, 2 May 2008, David Harvey wrote:
sage: time x = bernoulli(60000)
Wall time: 3.79
sage: time x = bernoulli(120000)
Wall time: 16.97
sage: time x = bernoulli(240000)
Wall time: 118.24
sage: time x = bernoulli(480000)
Wall time: 540.25
sage: time x = bernoulli(960000)
Wall time: 2436.06
The ratios between successive times are:
4.47757255936675
6.96758986446671
4.56909675236806
4.50913465987969
If we guess that it's "really" 4.5, then the complexity is N^(log
(4.5) / log(2)) = N^2.17. This is puzzling; I thought the algorithm
was supposed to be better than quadratic. Does anyone know what the
theoretical complexity is supposed to be?
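[For what it's worth, the exponent guess above can be reproduced from the
quoted wall times with a few lines of plain Python, nothing Sage-specific:]

```python
import math

# Wall times (seconds) for bernoulli(N) quoted above.
times = {60000: 3.79, 120000: 16.97, 240000: 118.24,
         480000: 540.25, 960000: 2436.06}

ns = sorted(times)
# Ratio of successive times as N doubles.
ratios = [times[b] / times[a] for a, b in zip(ns, ns[1:])]
# If t(N) ~ C * N^k, doubling N multiplies the time by 2^k, so k = log2(ratio).
exponents = [math.log(r, 2) for r in ratios]
print([round(r, 2) for r in ratios])     # [4.48, 6.97, 4.57, 4.51]
print([round(k, 2) for k in exponents])  # [2.16, 2.8, 2.19, 2.17]
```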
Anyway, extrapolating gives about 4.5 days, pretty much the same as
what Tom estimates. I'm going to start it running now.
david
---------- Forwarded message ----------
From: Oleksandr Pavlyk <pav...@gmail.com>
Date: Fri, May 2, 2008 at 4:29 PM
Subject: Re: Today We Broke the Bernoulli Record: From the Analytical
Engine to Mathematica
To: didier deshommes <dfde...@gmail.com>
Hi Didier,
I used Linux, with 64 bit AMD processor:
AMD Opteron(tm) Processor 250
cpu MHz : 1000.000
cache size : 1024 KB
and 8GB of memory, but as I say in the blog, I did
not use that much.
The calculations were done using development build of
Mathematica, but calculations will go through in any
flavor of Mathematica version 6 as well, to the best of
my knowledge. Just run
Timing[ result = BernoulliB[10^7]; ]
It will take about twice as long on 32-bit processors,
thus about 2 weeks.
Please do not hesitate to ask further questions.
Sincerely,
Oleksandr Pavlyk
On Fri, May 2, 2008 at 2:12 PM, didier deshommes <dfde...@gmail.com> wrote:
> Hi Dr. Pavlyk,
> My question is in reference to your blog post:
> http://blog.wolfram.com/2008/04/29/today-we-broke-the-bernoulli-record-from-the-analytical-engine-to-mathematica/
>
> - Do you have the specs of the machine you ran this off? CPU, memory, etc.
> - I assume this function is in the development version of mathematica?
>
> Thanks for your informative post!
>
> didier
Machine:
Quad-core 2.0GHz Xeon, 1333MHz FSB, 32GB RAM.
Currently, my gp session is using 4GB of RAM.
So PARI is already over twice as fast as Mathematica? Cool!
BTW, I started almost the same computation at the same time on a 32-bit
build of PARI under OS X, and it will probably take days more.
We need 64-bit Sage on OS X!
> Machine:
> Quad-core 2.0GHz Xeon, 1333MHz FSB, 32GB RAM.
>
> Currently, my gp session is using 4GB of RAM.
Use the write command:
? write("foo", 92834028340)
?
--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
You might also want to check the result using the
bernoulli_mod_p_single function that I mentioned a few days ago on
this thread.
david
David Harvey's congruence tests would be pretty good. Just choose
*any* prime p > 10^7 + 10
say and compute B_{10^7+4} modulo it using David Harvey's function;
sage: p = next_prime(10^7+10)
sage: time bernoulli_mod_p_single(p, 10^7+4)
CPU times: user 0.49 s, sys: 0.00 s, total: 0.49 s
Wall time: 0.51
8087583
Pretty cool, eh?
William
..... and the natural question is: could you then reconstruct the
actual Bernoulli number by computing it modulo sufficiently many
primes? Well, I did the estimates a few days ago, and it turns out
that the asymptotic behaviour of this proposed algorithm is pretty
much *the same* as the one using the zeta function (i.e. the one Pari
uses); they are both around n^2 log^2(n), if I'm not mistaken.
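[The reconstruction step itself is just the Chinese Remainder Theorem. A
minimal sketch in plain Python, using a toy integer rather than an actual
Bernoulli numerator:]

```python
from functools import reduce

def crt(residues, moduli):
    """Combine residues r_i mod m_i (pairwise coprime moduli) into the
    unique x mod prod(m_i) with x = r_i (mod m_i)."""
    M = reduce(lambda a, b: a * b, moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # three-argument pow: modular inverse (Python 3.8+)
    return x % M

# Toy example: recover 123456 from its residues mod three primes whose
# product exceeds it.
primes = [1009, 1013, 1019]
residues = [123456 % p for p in primes]
print(crt(residues, primes))  # -> 123456
```

For a real B_n one would need enough primes that their product exceeds the
numerator, whose size is what drives the n^2 log^2(n) estimate above.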
Unfortunately, I did some further tests, and even if I sped up
bernoulli_mod_p_single() by the largest factor I could conceive of,
overall it would still be maybe 50x slower than Pari. If anyone can
find an extra constant factor of 50x in this algorithm, I'd love to
hear about it.
david
I think maybe yours parallelizes much more easily than
Pari's, so just use 1000 computers :-).
-- William
--Mike
> I think a blog post with PARI timings and then timings for a modular
> dsage approach would be cool.
Probably not so cool, since it would be like 50 machines vs one machine.
david
Sure, but the point of the Mathematica blog post is scalability: "In Mathematica, a
core principle is that everything should be scalable. So in my job of
creating algorithms for Mathematica I have to make sure that
everything I produce is scalable."
--Mike
But Pari's algorithm is already parallelisable (just farm off a bunch
of Euler factors to each thread).
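[That work split can be illustrated in a few lines. This is a float-only
toy; threads in CPython won't actually speed the arithmetic up because of
the GIL, so a real run would use processes and high-precision arithmetic.]

```python
# Partition the primes of the Euler product for zeta(k) among workers,
# then multiply the partial products back together.
from concurrent.futures import ThreadPoolExecutor
from math import pi, prod

def partial_product(primes, k):
    # One worker's share of the Euler product prod_p 1/(1 - p^-k).
    return prod(1.0 / (1.0 - p ** (-float(k))) for p in primes)

def zeta_euler_split(k, primes, workers=4):
    chunks = [primes[i::workers] for i in range(workers)]  # disjoint shares
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(partial_product, chunks, [k] * workers)
    return prod(parts)

primes = [p for p in range(2, 1000)
          if all(p % q for q in range(2, int(p ** 0.5) + 1))]
print(zeta_euler_split(2, primes))  # close to pi^2/6, truncated at 1000
```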
The only advantage my algorithm has in "scalability" is that each
thread needs only O(1) memory, instead of O(n log n) which would be
required for Pari's algorithm. So if memory were tight, but you had
lots of processors, and your n was really big, then perhaps this
algorithm would win. But when I think about the actual numbers
involved, the economics just don't work out. Even 80 processors is a
pathetically small number, given a 50x serial slowdown in the
algorithm. You would need thousands of cpus to even consider this
approach. And even then, it would only be worthwhile if each cpu had
very limited memory.
david
Yep. I was about to point out that I was joking in my remark about
parallelizing your code. I certainly agree with the above suggested
way to parallelize computing zeta.
>
> The only advantage my algorithm has in "scalability" is that each
> thread needs only O(1) memory, instead of O(n log n) which would be
> required for Pari's algorithm. So if memory were tight, but you had
> lots of processors, and your n was really big, then perhaps this
> algorithm would win. But when I think about the actual numbers
> involved, the economics just don't work out. Even 80 processors is a
> pathetically small number, given a 50x serial slowdown in the
> algorithm. You would need thousands of cpus to even consider this
> approach. And even then, it would only be worthwhile if each cpu had
> very limited memory.
Hmm... That's exactly the configuration of a lot of interesting supercomputers
these days. See
http://domino.research.ibm.com/comm/research_projects.nsf/pages/bluegene.index.html
They have these single-cabinet tiny memory 1024 processor machines...