A challenge


Bill Hart

Aug 19, 2014, 11:55:30 PM
to sage-flame
Here is a serious challenge for the Python/Sage community. Here is a microbenchmark:

a = 0
for i in xrange(1, 100000000):
    if (i % 1000) == 0:
        a += 1
    a += 1

It takes around 150s on my machine in Sage.

In an efficient language with all the features I mentioned (plus a few I didn't), this takes 5.5s. (There's a decimal point in there!)

1) I'm using actual flint bignums for a, not machine words
2) No trickery. It doesn't optimise the computation away. It really does all the steps.
3) It's garbage collected. I type just about exactly what you type.
4) I allow a machine word for i. Technically it doesn't require an annotation, but I allow one on the constant literal for simplicity, for now. (I haven't got infinite time to write compiler optimisation passes!)
5) My language is currently a bit slow at this task. I could do much better if I tried harder.
6) I typed this in live at the console, as sketched below. I didn't precompile my program using a static compiler (though I did use a jit behind the scenes obviously)
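
For concreteness, here is a minimal sketch of what such a console session can look like in Julia. It is an illustration, not Bill's actual code: GMP-backed BigInt stands in for the flint bignums he describes, and the loop is wrapped in a hypothetical function called challenge1 so the counter stays a machine Int:

function challenge1()
    a = BigInt(0)          # bignum accumulator, per rule 1
    for i = 1:99999999     # machine-word i, per rule 4; same range as xrange(1, 100000000)
        if i % 1000 == 0
            a += 1
        end
        a += 1
    end
    return a
end

challenge1()   # returns 100099998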

Maybe if you used a really, really efficient dictionary struct, with some metaclasses, some monkeypatching and a few decorators.... Perhaps there's "one weird trick"?

I'll check this example in Sage every year. Any predictions on when it will beat 5.5s at 2.2 GHz? I have a prediction.

Bill.

Bill Hart

Aug 20, 2014, 12:26:08 AM
to sage-flame
Here is a second challenge. First, in Sage: 

R.<x> = PolynomialRing(ZZ)
S.<y> = PolynomialRing(R)
T.<z> = PolynomialRing(S)
U.<t> = PolynomialRing(T)
f = x + y + z + t + 1
p = f^30;
q = p*(p+1);

About 166s.

Here it is in Nemo (Julia + flint):

julia> using Rings

julia> R, x = PolynomialRing(ZZ, "x")
(Univariate polynomial ring in x over ZZ,x)

julia> S, y = PolynomialRing(R, "y")
(Univariate polynomial ring in y over Univariate polynomial ring in x over ZZ,y)

julia> T, z = PolynomialRing(S, "z")
(Univariate polynomial ring in z over Univariate polynomial ring in y over Univariate polynomial ring in x over ZZ,z)

julia> U, t = PolynomialRing(T, "t")
(Univariate polynomial ring in t over Univariate polynomial ring in z over Univariate polynomial ring in y over Univariate polynomial ring in x over ZZ,t)

julia> f = x + y + z + t + 1
t+(z+(y+(x+1)))

julia> p = f^30;

julia> @time q = p*(p+1);
elapsed time: 42.386738165 seconds (577871736 bytes allocated)

* Dense representation
* Nemo could be a lot faster
* Currently Nemo uses the classical algorithm for the three generic rings over flint's fmpz_poly (which is used for R), and binary exponentiation (sketched below). But this challenge is more fun if other algorithms are allowed, so long as the representation remains dense.
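
A quick aside on the term: binary exponentiation computes f^n by repeated squaring, using O(log n) multiplications rather than n - 1. A generic sketch in Julia, assuming only that *, one() and isodd() make sense for the element type; pow_binary is an illustrative name, not Nemo's API:

function pow_binary(f, n::Int)
    # f^n by repeated squaring: O(log n) multiplications
    n == 0 && return one(f)
    half = pow_binary(f, n >> 1)   # f^(n ÷ 2)
    sq = half * half
    return isodd(n) ? sq * f : sq
end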

Any predictions on the rate of improvement in this computation over the next few years in Sage? In Nemo? I have a prediction.

I will give this to Sage. It's twice as fast as Magma on the same 2.2GHz machine. Only a factor of 4 to go! For now.

Bill.

Ondřej Čertík

Aug 20, 2014, 12:33:26 AM
to sage-...@googlegroups.com
On Tue, Aug 19, 2014 at 9:55 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> Here is a serious challenge for the Python/Sage community. Here is a
> microbenchmark:
>
> a = 0
> for i in xrange(1, 100000000):
>     if (i % 1000) == 0:
>         a += 1
>     a += 1

You should post something that has some application. If this is the
problem at hand, I am going to use Fortran, which takes 0.344s on my
slow laptop...

implicit none
integer :: i, a
a = 0
do i = 1, 100000000-1
    if (modulo(i, 1000) == 0) a = a + 1
    a = a + 1
end do
print *, a
end


Ondrej

>
> It takes around 150s on my machine in Sage.
>
> In an efficient language with all the features I mentioned (plus a few I
> didn't), this takes 5.5s. (There's a decimal point in there!)
>
> 1) I'm using actual flint bignums for a, not machine words
> 2) No trickery. It doesn't optimise the computation away. It really does all
> the steps.
> 3) It's garbage collected. I type just about exactly what you type.
> 4) I allow a machine word for i. Technically it doesn't require an
> annotation, but I allow one on the constant literal for simplicity, for now.
> (I haven't got infinite time to write compiler optimisation passes!)
> 5) My language is currently a bit slow at this task. I could do much better
> if I tried harder.
> 6) I typed this in live at the console. I didn't precompile my program using
> a static compiler (though I did use a jit behind the scenes obviously)
>
> Maybe if you used a really, really efficient dictionary struct, with some
> metaclasses, some monkeypatching and a few decorators.... Perhaps there's
> "one weird trick"?
>
> I'll check this example in Sage every year. Any predictions on when it will
> beat 5.5s at 2.2 GHz? I have a prediction.
>
> Bill.

Bill Hart

Aug 20, 2014, 12:54:52 AM
to sage-flame
Did you use bignums for a? Or did you not read the rules?

Did you type it in real time in the console or precompile it?

And the whole point of Julia is that you don't need to use another language. It can match Python for high-level stuff and C (or Fortran, if you insist) for performance. It takes 0.09s on my machine in Julia without bignums. I guess your laptop is slow.

But this wasn't the point of the challenge, which was very specific.

Bill.

Ondřej Čertík

Aug 20, 2014, 1:06:52 AM
to sage-...@googlegroups.com
On Tue, Aug 19, 2014 at 10:54 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> Did you use bignums for a? Or did you not read the rules?
>
> Did you type it in real time in the console or precompile it?
>
> And the whole point of Julia is that you don't need to use another language.
> It can match Python for high-level stuff and C (or Fortran, if you insist)
> for performance. It takes 0.09s on my machine in Julia without bignums. I
> guess your laptop is slow.

My solution does not follow your rules. But it solves the problem, and
that's what I am interested in. In fact, here is a solution to the
problem in 0.0s:

100099998

(That is 99999999 iterations each adding 1, plus 99999 more for the multiples of 1000 below 10^8.)

Ondrej

Bill Hart

Aug 20, 2014, 1:42:25 AM
to sage-flame
OK, well perhaps you can try the following challenge 3 in your Fortran then:

julia> R, x = FiniteField(7, 11, "x")
(Finite field of degree 11 over F_7,x)

julia> S, y = PolynomialRing(R, "y")
(Univariate polynomial ring in y over Finite field of degree 11 over F_7,y)

julia> T = ResidueRing(S, y^3 + 3x*y + 1)
Residue ring of Univariate polynomial ring in y over Finite field of degree 11 over F_7 modulo y^3+(3*x)*y+1 (constructor with 5 methods)

julia> U, z = PolynomialRing(T, "z")
(Univariate polynomial ring in z over Residue ring of Univariate polynomial ring in y over Finite field of degree 11 over F_7 modulo y^3+(3*x)*y+1,z)

julia> f = (3y^2 + y + x)*z^2 + ((x + 2)*y^2 + x + 1)*z + 4x*y + 3
(3*y^2+y+(x))*z^2+((x+2)*y^2+(x+1))*z+((4*x)*y+3)

julia> g = (7y^2 - y + 2x + 7)*z^2 + (3y^2 + 4x + 1)*z + (2x + 1)*y + 1
(6*y+(2*x))*z^2+(3*y^2+(4*x+1))*z+((2*x+1)*y+1)

julia> s = f^12;

julia> t = (s + g)^12;

julia> @time r = resultant(s, t);
elapsed time: 4.956234525 seconds (207789656 bytes allocated)

For this week only, this is *horrendously* slow in Nemo, so you have a chance. I gave up waiting for Sage to finish after about half an hour, so I don't have a timing for you to aim for as a first attempt. Of course, this was my second attempt in Nemo. The first attempt didn't work because I used Julia instead of Nemo by mistake.

Still awaiting your Fortran program for challenge 2.

Actually, I await a correct answer to challenge 3, let alone a timing! Magma will probably trick you on this one, but I've confirmed Nemo gives the correct answer.

I don't know how to compute this in Pari. Maybe Nemo and Magma are both wrong! How can I check? Who can help me?!

Bill.

Bill Hart

Aug 20, 2014, 2:44:40 AM
to sage-flame
Oh, I bet I know why Sage takes forever on that challenge 3. It does the same thing as Magma, which I happen to think is wrong. It fails to coerce the coefficients of the polynomials into the coefficient ring of the polynomial. The coercion mechanism instead creates a ring the user didn't create, coerces everything into that, and then does the computation.

Whereas, if I had meant that, I would have typed that.

Sage probably totally thrashes Nemo on this example, when computed correctly. But I'll have Nemo up to speed by next week, probably.

Bill.


Bill Hart

Aug 20, 2014, 2:54:36 AM
to sage-flame
Nope. I appear to be wrong about that. Explicitly coercing the coefficients into the right ring doesn't fix the problem in Sage.

It's either just *really* slow, or I don't understand what I am actually doing (probably the latter).

Bill.

Robert Bradshaw

Aug 20, 2014, 3:28:26 PM
to sage-...@googlegroups.com
Don't hold your breath; the Python/Sage community simply doesn't care
much about this kind of microbenchmark. It's a non-goal. Would it be
nice? Yes, but it's just not relevant (the few times it is needed, a
little extra effort gets you there, which is "good enough" for now and
beats writing the whole system in assembly/C/Fortran/...).

Languages that span the performance/high-level gap are certainly
interesting (to me at least), but that's only two dimensions.

- Robert

Bill Hart

Aug 20, 2014, 3:41:32 PM
to sage-flame
For a project like Sage to meet its goals it needs to be taken seriously by *lots* of people. That means it needs to be either better than whatever else people could use, or as good or nearly as good but with some very important must-have feature(s). That could be that it is free, for example.

So Sage needs to get lots of things right. Performance is only one of those things. Coverage is another. Widespread publicity is another. Support is another. Reliability is another. Continuous improvement is another. Prevailing winds/politics is another. Etcetera.

The amount of manpower required to do those things is therefore extremely relevant, as is the result that you will get for the effort you put in.

I think your argument is that Python/Sage couldn't care less about these sorts of things, because either

(i) the amount of effort they put in would have to be too high to achieve those sorts of goals
(ii) by wasting time working around these issues in Python/Sage it is nearly always possible to get almost all the way there without actually solving any of these much more difficult problems

I think you just make my point for me.


Ondřej Čertík

Aug 20, 2014, 4:01:59 PM
to sage-...@googlegroups.com, Francesco Biscani
Hi Bill,
For these polynomial problems, the best tool that I was able to find is Piranha:

https://github.com/bluescarni/piranha

This Fateman benchmark is implemented in:

https://github.com/bluescarni/piranha/blob/5fe6e71e17192295b6c9e1d7fc23f29ada1d868c/tests/fateman2.cpp

and it takes just 8.4s on my 4 core machine:

$ time ./fateman2_perf
Running 1 test case...
8.404437s wall, 32.150000s user + 0.050000s system = 32.200000s CPU (383.1%)

*** No errors detected
Freeing MPFR caches.
Setting shutdown flag.

real 0m8.662s
user 0m32.383s
sys 0m0.068s


True, it's implemented in parallel, but it also prints the total CPU
time summed over cores, which is 32.2s. You can of course also just
force it on one core:

$ time ./fateman2_perf 1
Running 1 test case...
29.264847s wall, 29.240000s user + 0.050000s system = 29.290000s CPU (100.1%)

*** No errors detected
Freeing MPFR caches.
Setting shutdown flag.

real 0m29.519s
user 0m29.465s
sys 0m0.062s

Then it takes 29.2s.

I CCed Francesco, I hope I ran it correctly. I am using
https://github.com/bluescarni/piranha/commit/5fe6e71e17192295b6c9e1d7fc23f29ada1d868c.


> * Dense representation
> * Nemo could be a lot faster
> * Currently Nemo uses the classical algorithm for three generic rings over
> flint's fmpz_poly (which is used for R) and binary exponentiation. But this
> challenge is more fun if other algorithms are allowed. So long as the
> representation remains dense.

I am not quite sure which representation Piranha uses internally.

> Any predictions on the rate of improvement in this computation over the next
> few years in Sage? In Nemo? I have a prediction.

We are in the process of integrating it into CSymPy
(https://github.com/sympy/csympy) to do fast polynomial manipulation.
Sage could easily use it too.

Ondrej

Bill Hart

Aug 20, 2014, 4:50:04 PM
to sage-flame, Francesco Biscani
That's a very nice library indeed! The code is very well written.

They seem to represent polynomials using kronecker monomials, so I would say it is not a dense representation. 



> This Fateman benchmark is implemented in:
>
> https://github.com/bluescarni/piranha/blob/5fe6e71e17192295b6c9e1d7fc23f29ada1d868c/tests/fateman2.cpp
>
> and it takes just 8.4s on my 4 core machine:
>
> $ time ./fateman2_perf
> Running 1 test case...
>  8.404437s wall, 32.150000s user + 0.050000s system = 32.200000s CPU (383.1%)
>
> *** No errors detected
> Freeing MPFR caches.
> Setting shutdown flag.
>
> real    0m8.662s
> user    0m32.383s
> sys     0m0.068s
>
> True, it's implemented in parallel, but it also prints the total CPU
> time summed over cores, which is 32.2s. You can of course also just
> force it on one core:
>
> $ time ./fateman2_perf 1
> Running 1 test case...
>  29.264847s wall, 29.240000s user + 0.050000s system = 29.290000s CPU (100.1%)

Nice. 

What is the clock speed of your CPU?

The best time I am aware of presently on a 2.2GHz machine for that benchmark in open source software is 18.7s. (Bernard Parisse)

The best time I am aware of for closed source software is with exponent 40 instead of 30, and the time is 33.9s on 12 cores. Perhaps you can try Piranha with exponent 40 and see how it compares. It doesn't scale very well with the number of cores, but unfortunately I don't have timings for 4 cores.

The best times I managed with my own code were apparently about 3.4s with exponent 20 on a single core at 2.2GHz. Again perhaps you can time Piranha and see how long it takes.

But none of these times are with dense representation of course.


> *** No errors detected
> Freeing MPFR caches.
> Setting shutdown flag.
>
> real    0m29.519s
> user    0m29.465s
> sys     0m0.062s
>
> Then it takes 29.2s.
>
> I CCed Francesco, I hope I ran it correctly. I am using
> https://github.com/bluescarni/piranha/commit/5fe6e71e17192295b6c9e1d7fc23f29ada1d868c.


>> * Dense representation
>> * Nemo could be a lot faster
>> * Currently Nemo uses the classical algorithm for three generic rings over
>> flint's fmpz_poly (which is used for R) and binary exponentiation. But this
>> challenge is more fun if other algorithms are allowed. So long as the
>> representation remains dense.

> I am not quite sure which representation Piranha uses internally.

>> Any predictions on the rate of improvement in this computation over the next
>> few years in Sage? In Nemo? I have a prediction.

> We are in the process of integrating it into CSymPy
> (https://github.com/sympy/csympy) to do fast polynomial manipulation.

Excellent!

Robert Bradshaw

Aug 20, 2014, 4:55:17 PM
to sage-...@googlegroups.com
On Wed, Aug 20, 2014 at 12:41 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> For a project like Sage to meet its goals it needs to be taken seriously by
> *lots* of people. That means it needs to be either better than whatever else
> people could use, or as good or nearly as good but with some very important
> must-have feature(s). That could be that it is free, for example.
>
> So Sage needs to get lots of things right. Performance is only one of those
> things. Coverage is another. Widespread publicity is another. Support is
> another. Reliability is another. Continuous improvement is another.
> Prevailing winds/politics is another. Etcetera.
>
> The amount of manpower required to do those things is therefore extremely
> relevant, as is the result that you will get for the effort you put in.
>
> I think your argument is that Python/Sage couldn't care less about these
> sorts of things, because either
>
> (i) the amount of effort they put in would have to be too high to achieve
> those sorts of goals
> (ii) by wasting time working around these issues in Python/Sage it is nearly
> always possible to get almost all the way there without actually solving any
> of these much more difficult problems

I was referring to this particular benchmark, of making nearly empty
loops typed in interactively very, very fast without an extra
compilation step. Otherwise

%cython
def bill_benchmark():
    cdef long i, a = 0
    for i in range(1, 100000000):
        if (i % 1000) == 0:
            a += 1
        a += 1
    return a

%timeit bill_benchmark()
5 loops, best of 3: 200 ms per loop

This isn't because performance isn't important, but because it often
isn't, and where it is, doing the extra work isn't too bad (in other
words, solving that isn't as important as attacking the many other
areas that need attention).

For Sage specifically, one (mostly) non-goal is language design. I
know that you're of the opinion that you have to design the perfect
language before you can build math software on top of it. And I am
excited to see progress here. But Sage takes a more pragmatic stance.

> I think you just make my point for me.

This is sage-flame; feel free to misinterpret my statements to make your point.

Bill Hart

Aug 20, 2014, 4:56:16 PM
to sage-flame, Francesco Biscani
Another benchmark you could try with Piranha is

f = (1 + x + y + 2z^2 + 3t^3 + 5u^5)^12

and 

g = (1 + u + t + 2z^2 + 3y^3 + 5x^5)^12

Computing f*g should take something like 1s on a single core.
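
In Nemo's recursive dense representation this can be set up with the same nested PolynomialRing constructors as in challenge 2. A sketch only, not a verified session; the fifth ring V and its generator u are assumptions that simply continue the pattern:

julia> using Rings

julia> R, x = PolynomialRing(ZZ, "x")

julia> S, y = PolynomialRing(R, "y")

julia> T, z = PolynomialRing(S, "z")

julia> U, t = PolynomialRing(T, "t")

julia> V, u = PolynomialRing(U, "u")

julia> f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^12;

julia> g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^12;

julia> @time f*g;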

Bill.

Bill Hart

Aug 20, 2014, 5:05:13 PM
to sage-flame
You did the same thing as Ondrej. You used machine words instead of bignums. And I already said Julia does that *at the console* just as fast *with machine words*. A point people seem to miss consistently when it suits them!

The importance of microbenchmarks like this is not to compute the answer as fast as you can, but to use them as a means of identifying and improving performance of systems as a whole. 

My point is that this can be done extremely fast *with* bignums, typed at the console. And that does matter, regardless of what you think. It exercises:

* bignums
* loops
* the language itself
* garbage collection

And it was the first of three offerings. I notice no one even bothered to touch the other two cases. Why? Because they are not microbenchmarks!

Cython could be formidable if it had a jit and parametric types. But feel free to ignore progress.

Sage could be formidable if it merely used the LLVM Jit from Python itself. Something I recommended half a decade ago!

Naturally. :-)

Bill Hart

Aug 20, 2014, 5:27:54 PM
to sage-flame
Fredrik also wrote code in an experimental flint module (not merged) which does the Fateman benchmark for exponent 30 in less than 21s (I don't have precise times, as the Fateman benchmark was only a portion of the total run, though most of it).
Bill.



Robert Bradshaw

Aug 20, 2014, 5:34:06 PM
to sage-...@googlegroups.com
On Wed, Aug 20, 2014 at 2:05 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> You did the same thing as Ondrej. You used machine words instead of bignums.
> And I already said Julia does that *at the console* just as fast *with
> machine words*. A point people seem to miss consistently when it suits them!
>
> The importance of microbenchmarks like this is not to compute the answer as
> fast as you can, but to use them as a means of identifying and improving
> performance of systems as a whole.
>
> My point is that this can be done extremely fast *with* bignums, typed at
> the console. And that does matter, regardless of what you think. It
> exercises:
>
> * bignums
> * loops
> * the language itself
> * garbage collection
>
> And it was the first of three offerings. I notice no one even bothered to
> touch the other two cases. Why? Because they are not microbenchmarks!

Because I didn't have time to dive into them.

> Cython could be formidable if it had a jit and parametric types. But feel
> free to ignore progress.

There is @cython.compile and cython.inline. It's a bit "heavy" for a
JIT (yeah, it shells out to a standard C compiler the first time) but
probably technically meets the definition (and takes advantage of
knowing, for example, runtime argument types).

Parametric types are a different beast...

Cython is not the perfect language, far from it; rather, it's a very
useful tool in an existing ecosystem.

Bill Hart

Aug 20, 2014, 5:40:27 PM
to sage-flame
On 20 August 2014 23:33, Robert Bradshaw <robe...@gmail.com> wrote:
> On Wed, Aug 20, 2014 at 2:05 PM, Bill Hart <goodwi...@googlemail.com> wrote:
>> You did the same thing as Ondrej. You used machine words instead of bignums.
>> And I already said Julia does that *at the console* just as fast *with
>> machine words*. A point people seem to miss consistently when it suits them!
>>
>> The importance of microbenchmarks like this is not to compute the answer as
>> fast as you can, but to use them as a means of identifying and improving
>> performance of systems as a whole.
>>
>> My point is that this can be done extremely fast *with* bignums, typed at
>> the console. And that does matter, regardless of what you think. It
>> exercises:
>>
>> * bignums
>> * loops
>> * the language itself
>> * garbage collection
>>
>> And it was the first of three offerings. I notice no one even bothered to
>> touch the other two cases. Why? Because they are not microbenchmarks!
>
> Because I didn't have time to dive into them.
>
>> Cython could be formidable if it had a jit and parametric types. But feel
>> free to ignore progress.
>
> There is @cython.compile and cython.inline. It's a bit "heavy" for a
> JIT (yeah, it shells out to a standard C compiler the first time) but
> probably technically meets the definition (and takes advantage of
> knowing, for example, runtime argument types).

Yeah, it qualifies, so long as it preserves state between calls. I once saw someone claim they'd written a REPL for C. It was about 10 lines of Python and it called gcc. The second line had no idea what the state of the variables was after the first line.

Bill Hart

Aug 20, 2014, 5:43:30 PM
to sage-flame
For comparison, Nemo takes about 12s with a dense representation (it doesn't have a sparse format, which would be much faster, as this is a sparse benchmark).

Sage takes 219s.

Again I feel I should emphasise that I didn't do anything to "optimise" Nemo. This could be done much, much faster. It does just about the most stupid thing possible at present, and it already beats Sage, Pari, Singular and Maple (version 13 and below).

Ondřej Čertík

Aug 20, 2014, 6:12:20 PM
to sage-...@googlegroups.com, Francesco Biscani
model name : Intel(R) Xeon(R) CPU E5-1603 0 @ 2.80GHz


>
> The best time I am aware of presently on a 2.2GHz machine for that benchmark
> in open source software is 18.7s. (Bernard Parisse)

Which code does this?

>
> The best time I am aware of for closed source software is with exponent 40
> instead of 30, and the time is 33.9s on 12 cores. Perhaps you can try

Which code?

> Piranha with exponent 40 and see how it compares. It doesn't scale very well
> with the number of cores, but unfortunately I don't have timings for 4 cores.

Exponent 40 took almost half an hour on 4 cores:

1684.955474s wall, 6629.100000s user + 2.840000s system =
6631.940000s CPU (393.6%)

It's weird it would scale so badly with the increasing exponent.

>
> The best times I managed with my own code were apparently about 3.4s with
> exponent 20 on a single core at 2.2GHz. Again perhaps you can time Piranha
> and see how long it takes.

Exponent 20:

$ ./fateman1_perf
Running 1 test case...
0.550569s wall, 1.830000s user + 0.010000s system = 1.840000s CPU (334.2%)

$ ./fateman1_perf 1
Running 1 test case...
1.680002s wall, 1.670000s user + 0.010000s system = 1.680000s CPU (100.0%)


So 0.55s on 4 cores and 1.68s on 1 core.

Ondrej

Ondřej Čertík

Aug 20, 2014, 6:21:33 PM
to Francesco Biscani, sage-...@googlegroups.com, Bill Hart
Ah, I see. With that change, the result is:

146.669439s wall, 557.060000s user + 0.670000s system = 557.730000s
CPU (380.3%)

So it is 2min 26.7s on 4 cores.

On Wed, Aug 20, 2014 at 4:14 PM, Francesco Biscani <blues...@gmail.com> wrote:
> Ondrej,
>
> the exponent 40 will trigger a bad behaviour in the integer class which I
> need to fix. Try replacing
>
> mp_integer<>
>
> with
>
> integer
>
> in the test code.
>
> Cheers,
>
> Francesco.

rjf

Aug 20, 2014, 6:38:40 PM
to sage-...@googlegroups.com, blues...@gmail.com, goodwi...@googlemail.com
I am confused by the notion that something like a polynomial multiplication takes time T
in Sage.  Surely there are many different programs in Sage that multiply polynomials, and
so the time specified is not the Sage time, but the time taken by some library routine or
other.
RJF


Bill Hart

Aug 20, 2014, 7:20:28 PM
to sage-flame
On 21 August 2014 00:12, Ondřej Čertík <ondrej...@gmail.com> wrote:
>> The best time I am aware of presently on a 2.2GHz machine for that benchmark
>> in open source software is 18.7s. (Bernard Parisse)
>
> Which code does this?

Giac, I think.

>> The best time I am aware of for closed source software is with exponent 40
>> instead of 30, and the time is 33.9s on 12 cores. Perhaps you can try
>
> Which code?

That's Trip (if you believe their figures).

>> Piranha with exponent 40 and see how it compares. It doesn't scale very well
>> with the number of cores, but unfortunately I don't have timings for 4 cores.
>
> Exponent 40 took almost half an hour on 4 cores:
>
>  1684.955474s wall, 6629.100000s user + 2.840000s system =
> 6631.940000s CPU (393.6%)
>
> It's weird it would scale so badly with the increasing exponent.

Not at all. It's because of cache misses, which is the biggest problem with multivariate arithmetic. Clever code tends to use clever tricks to avoid cache misses.
 
And you can see Francesco's response, which explains what he does. It's clear this will be faster than just using, say, GMP bignums.


>> The best times I managed with my own code were apparently about 3.4s with
>> exponent 20 on a single core at 2.2GHz. Again perhaps you can time Piranha
>> and see how long it takes.
>
> Exponent 20:
>
> $ ./fateman1_perf
> Running 1 test case...
>  0.550569s wall, 1.830000s user + 0.010000s system = 1.840000s CPU (334.2%)
>
> $ ./fateman1_perf 1
> Running 1 test case...
>  1.680002s wall, 1.670000s user + 0.010000s system = 1.680000s CPU (100.0%)
>
> So 0.55s on 4 cores and 1.68s on 1 core.

That's pretty good. I can see from Francesco's response why this is so good.
 


Bill Hart

Aug 20, 2014, 7:38:56 PM
to Ondřej Čertík, Francesco Biscani, sage-flame
What is the <integer> type precisely?

Bill Hart

Aug 20, 2014, 7:54:37 PM
to rjf, sage-flame, Francesco Biscani
I do apologise. It takes time T using flint for Z[x]. The authors of flint are William Hart, Fredrik Johansson, Sebastian Pancratz, David Harvey, Andy Novocin and dozens of other authors. 

The generic arithmetic is part of the Sage library, written in Cython and Python, and contains code written by authors too numerous to list, but including William Stein, David Harvey, Robert Bradshaw....

I didn't actually check if it automatically switches to using a different library in the multivariate case. I should have checked. But I'm lazy.

It might use Pari, the primary authors of which are Henri Cohen, Karim Belabas and Bill Allombert, or Singular, the primary directors of which are Wolfram Decker, Gert-Martin Greuel, Gerhard Pfister and Hans Schoenemann.

I don't know who actually wrote the code.

It is sage-flame though. I didn't realise we were being so formal and scientific.

Bill.

Bill Hart

Aug 20, 2014, 8:55:26 PM
to rjf, sage-flame, Francesco Biscani
On 21 August 2014 01:54, Bill Hart <goodwi...@googlemail.com> wrote:
> I do apologise. It takes time T using flint for Z[x]. The authors of flint are William Hart, Fredrik Johansson, Sebastian Pancratz, David Harvey, Andy Novocin and dozens of other authors.
>
> The generic arithmetic is part of the Sage library, written in Cython and Python, and contains code written by authors too numerous to list, but including William Stein, David Harvey, Robert Bradshaw....

I checked and I was right. It uses flint and the Sage library. But I missed two authors of the code in the Sage library. Maarten Derickx and Gonzalo Tornaria, though not all authors contributed code relevant to this benchmark.

So it's predominantly written in Cython and C.

Bill.

Bill Hart

Aug 20, 2014, 10:45:57 PM
to sage-flame
Piranha's beefed-up sparse Pearce benchmark is much more exciting.

f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^16;
g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^16;
time f*g;

Piranha takes 11s (sparse distributed)
Nemo takes 90s (recursive dense)
Sage takes 3000s (recursive dense)

Sage uses Karatsuba multiplication.

I'd dearly love to know how long Trip and sdmp take on these nowadays. I should also try Giac.

Bill.



Ondřej Čertík

Aug 20, 2014, 11:00:03 PM
to sage-...@googlegroups.com
Hi Bill,

Perfect. Did you compile Piranha in release mode
(CMAKE_BUILD_TYPE=Release)? That's very important.

Ondrej

Bill Hart

Aug 20, 2014, 11:12:47 PM
to sage-flame
Sorry, I should have said the Piranha timings were not done by me, nor on the same machine. I'm not particularly bothered by a factor of 2 here or there when there is such a huge difference.

Bill.

Bill Hart

Aug 20, 2014, 11:22:50 PM
to sage-flame
Unfortunately I can't build Piranha on my machine, because neither my gcc nor my cmake is recent enough. Which is fair enough. The timings wouldn't be terribly meaningful with gcc 4.7.3 anyway.

Bill.

Ondřej Čertík

Aug 21, 2014, 12:29:11 AM
to sage-...@googlegroups.com
On Wed, Aug 20, 2014 at 9:22 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> Unfortunately I can't build Piranha on my machine because neither my gcc nor
> cmake are recent enough. Which is fair enough. The timings wouldn't be
> terribly meaningful with gcc 4.7.3 anyway.

Yeah, I had to compile the gcc 4.9 compilers myself to use Piranha. I used
Hashdist for that:

http://hashdist.github.io/

Ondrej

Bill Hart

Aug 21, 2014, 10:36:25 AM
to sage-flame
Does that allow you to install them in your home directory, or does it try to install them globally on the machine? (I can't do the latter, since it is no longer a machine belonging to me.)

Bill.




Ondřej Čertík

Aug 21, 2014, 10:39:07 AM
to sage-...@googlegroups.com

You specify the path where things get installed; no root needed. I don't have root access to most of my computers.

Sent from my mobile phone.

Bill Hart

Aug 22, 2014, 2:12:34 AM
to sage-flame, Ondřej Čertík, Francesco Biscani
Julia version 0.3 was released today.

The Fateman 30 benchmark is down from 42s to 36s, beating our generic C implementation!

The Pearce 16 benchmark is down from 90s to 72s.

I had to add one single word to the entire Nemo codebase to port from Julia 0.2 to 0.3. Not a bad improvement for 1 second of work!

Bill.