MockMMA 5.8 seconds (including 2.1 in GC)
Macsyma (in ACL 6.1) 6.9 seconds (including 2.0 in GC)
Hashing 8.6 seconds (including 2.2 in GC)
Fermat 9.2 seconds
Macsyma (maxima5.5) 32.0 seconds (? GC not reported, second trial)
Macsyma (commercial) 37.5 seconds (including 29.78 in GC)
Macsyma (maxima5.5) 53.5 seconds (? GC not reported, first trial)
Maple7 50.0 seconds (? GC, it was slower on 2nd trial)
Mathematica 4.1 82.3 seconds (? it was slower on 2nd trial)
MuPad 1.4.2 Pro 118.6 seconds
for multiplying the expanded version of (1+x+y+z)^20 by itself.
details on http://www.cs.berkeley.edu/~fateman/papers/fastmult.pdf
comments invited.
RJF
Interesting challenge, just tested with Giac (static readline version
compiled with gcc 2.96 CXXFLAGS=-O2)
35s on a Celeron 766Mhz, 100M RAM, 128K cache (kernel=Linux 2.4.3)
Memory used was around 10M.
Giac uses an ordered distributed representation for polynomials. The
monomial exponents are stored in a vector of int. Bignum arithmetic uses
GMP. No garbage collector.
I believe that it would be interesting to know the exact representation
used for polynomials in each system you tested, as well as the limits of
the representations.
For example, recursive representation is obviously faster on this
problem than distributed representation (where sorting is needed)...
As far as limits: the way I implemented hashing, all the
exponents are stored in one small integer. Generalizing this
to a vector of integers is simple but would increase storage
and slow down computation. The hashing representation is
much smaller than macsyma or mockmma, and so could handle
larger problems. There are no limits on the largest exponent
in the Macsyma representation because x^(1000)+1 is simply
a list (x 1000 1 0 1). In the Mockmma representation,
there is a limit in that x^(1000)+1 would be stored in an
array of 1001 coefficients, and so you would run out of
memory sooner.
Maple, at least on some machines, had a limit on the
number of terms in an expression. This was occasionally
encountered. I don't know if this limit has been removed
for Maple 7.
I think that each system has some maximum bignum size; while many
systems seem to have a design that would allow one to fill all of
memory with one long chain of bignum digits, in practice there is
usually something else that stops it.
RJF
Please be careful with such comparisons. I have seen huge (factor 2)
differences on one machine when the malloc algorithm was changed. Also,
memory access latency is usually more important than CPU cycles. To be
sure, use one machine only and run all the tests under the same OS.
[...]
: Maple, at least on some machines, had a limit on the
: number of terms in an expression. This was occasionally
: encountered. I don't know if this limit has been removed
: for Maple 7.
Yes, it has for Maple 6.
: I think that each system has some maximum bignum
: size; While many systems seem to have a design that
: would allow one to fill all of memory with one long
: chain of bignum digits, in practice there is usually
: something else that stops it.
Really??? What stops it on Maple or Mathematica? In CLN's exact
types we certainly have no such limits.
Regards
-richy.
--
Richard B. Kreckel
<Richard...@Uni-Mainz.DE>
<http://wwwthep.physik.uni-mainz.de/~kreckel/>
By short integer, you mean something that fits in a register of the CPU?
For 3 vars that would mean a max exponent of about 1000 on a 32 bit CPU?
> As far as limits: the way I implemented hashing, all the
> exponents are stored in one small integer. Generalizing this
> to a vector of integers is simple but would increase storage
> and slow down computation. The hashing representation is
> much smaller than macsyma or mockmma, and so could handle
> larger problems.
Correct me if I'm wrong, but this representation would not
work if you had too many variables, because the short int
representation would not allow some powers?
Parisse Bernard wrote:
>>Hashing uses a distributed form, unordered hashtable with
>>key = exponents compressed into a short integer
>>value = bignum.
>>
>>
>
> By short integer, you mean something that fits in a register of the CPU?
> For 3 vars that would mean a max exponent of about 1000 on a 32 bit CPU?
That's right. It would require no change in the source code in lisp,
but would make it run slower if you made the exponents larger than
a word. Or we could hash on lists, which would be larger and
slower both.
>
>
>>As far as limits: the way I implemented hashing, all the
>>exponents are stored in one small integer. Generalizing this
>>to a vector of integers is simple but would increase storage
>>and slow down computation. The hashing representation is
>>much smaller than macsyma or mockmma, and so could handle
>>larger problems.
>>
>
> Correct me if I'm wrong, but this representation would not
> work if you had too many variables, because the short int
> representation would not allow some powers?
It would still work, but be slower since the exponents
would be encoded as bignums. I guess I could time some
version of that too.
>
I have made a corresponding C++ program (see below) and I get around 18s
on my Celeron 766Mhz, memory used around 2M.
You can replace giac and gen by any library/class representing bignums
(gen is a generic type that can represent data types other than just
large integers).
#include <iostream>
#include <map>
#include <ctime>
#include <giac/giac.h>
using namespace std;
using namespace giac;

typedef map<int,gen> pol;

void fastmult(const pol & p1,const pol & p2,pol & product){
  product.clear();
  pol::const_iterator it1=p1.begin(),it1end=p1.end();
  pol::const_iterator it2,it2end=p2.end();
  pol::iterator prod_it;
  int sum1,sum;
  gen coeff1;
  for (;it1!=it1end;++it1){
    sum1=it1->first;
    coeff1=it1->second;
    for (it2=p2.begin();it2!=it2end;++it2){
      sum=sum1+it2->first;
      prod_it=product.find(sum);
      if (prod_it==product.end())
        product[sum]=coeff1*it2->second;
      else
        prod_it->second = prod_it->second + coeff1*it2->second;
    }
  }
}

void fastpower(const pol & p,int k,pol & res){
  if (k==0){
    res.clear();
    res[0]=1;
    return;
  }
  res=p;
  pol tmp;
  for (int i=1;i<k;++i){
    tmp=res;
    fastmult(tmp,p,res);
  }
}

ostream & operator << (ostream & of,const pol & p){
  pol::const_iterator it=p.begin(),itend=p.end();
  for (;it!=itend;++it){
    of << it->second << "[" << it->first << "] ";
  }
  return of;
}

int main(){
  int k;
  cout << "Enter power";
  cin >> k;
  pol xyz;
  xyz[0]=1;       // constant coefficient
  xyz[1]=1;       // x coefficient (bits 0-7)
  xyz[1<<8]=1;    // y coefficient (bits 8-15)
  xyz[1<<16]=1;   // z coefficient (bits 16-23)
  cout << xyz << endl;
  // make (1+x+y+z)^k
  pol xyz20,xyz40;
  fastpower(xyz,k,xyz20);
  cout << xyz20.size() << endl;
  clock_t start=clock();
  fastmult(xyz20,xyz20,xyz40);
  // cout << xyz40 << endl;
  clock_t end=clock();
  cout << end-start << endl;
  cout << "Number of monomials " << xyz40.size() << endl;
}
I think that multiplication using hashtables
in Lisp looks somewhat neater, than writing it in C++
at least if you are familiar with lisp. I have removed
some error checking from the code below, but the n^2
operation is just a doubly-nested "maphash".
(defun hashtimes (r1 r2)   ; r1 and r2 are hash tables
  (let ((res (make-hash-table :test #'eq)))
    (maphash #'(lambda (key val)
                 (maphash #'(lambda (key2 val2)
                              (incf (gethash (+ key key2) res 0)
                                    (* val val2)))  ; * may be bignum mult
                          r2))
             r1)
    ;; optional, remove 0 terms from hashtable result
    (maphash #'(lambda (key val) (if (eql 0 val) (remhash key res))) res)
    ;; make the form of the result acceptable to macsyma
    `(($hashtable simp ,res ,vars) |$encoded| ,(hash-table-count res))))
I think that to make these programs much faster we would
have to
(a) make bignum arithmetic faster
or
(b) make bignum arithmetic cause less storage allocation.
One way to do this is to change bignum mult or add so that
we compute x:=a*b+x
by computing a*b in a temporary location that is immediately
reclaimed and doesn't need to be garbage collected (in lisp
this would be called stack-allocated variables),
and by re-using the memory cells used for x in the new
value for x.
A really clever lisp compiler could know all this because
the inner statement is
(incf x (* v v2)) where x= hashtable access.
It is obvious that the value (* v v2) is not used anywhere
else because it is inaccessible. It is obvious that x
can be overwritten because the statement (incf x y) means
x:=x+y.
the compiler I'm using is not nearly so clever, and in
fact it even recomputes the hash location twice. If it
were changed to compute once, it would improve the time
by some 10%.
RJF
I tried your example on meditor: it took 204.5 seconds on a 700 MHz
processor. As pointed out by R. Kreckel, you should test it on your own
computer (with an up-to-date JVM) for an accurate comparison. The
polynomials are represented in distributed form, ordered using a B-tree
with monomials as keys and coefficients as values. The monomials are
arrays of integer exponents. The coefficients are Java BigIntegers. Here
is the program:
import spoke.math.*;
import spoke.math.analyse.*;

public class Essai {
    public static void main(String args[]) {
        Arithmetique a = Polynome.valueOf(
            Expression.valueOf("(1+x+y+z)^20"),
            new Variable[] {new Constante("x"), new Constante("y"),
                            new Constante("z")},
            Lexicographique.comparer, 0);
        System.out.println(a);
        long t = System.currentTimeMillis();
        Arithmetique b = a.multiply(a);
        System.out.println(System.currentTimeMillis() - t);
    }
}
Right. I have tested on a PIII 800Mhz 256K cache and I get 19.5s for
Giac
and 13.5s for the C++ program.
Maxima 5.9.0 (old, using GCL) : 112.21 seconds
Maxima 5.9.0 (new*, using GCL): 18.80 seconds
Maxima 5.9.0 (using clisp) : 31.30 seconds
Maxima 5.9.0 (using CMUCL) : 7.91 seconds
*The new version includes some default memory preallocation with GCL. The
preallocation hasn't made it into cvs yet.
Given that the numbers below come from a 933 MHz machine, I think the above
numbers compare pretty well.
--Jim
I tried this on my 533 MHz G4 PowerMac and got the following:
Maple7 57.0 seconds
MapleV 21.1 seconds
Does anyone know if this is true in general, that Maple7 is 2 to 3 times
slower than MapleV?
John
> [...]
> Does anyone know if this is true in general, that Maple7 is 2 to 3 times
> slower than MapleV?
Certainly not. See ?updates,Maple7,efficiency for counterexamples.
--
Thomas Richard
Maple Support
Scientific Computers GmbH
http://www.scientific.de
Thomas Richard wrote:
> John Greene <jgr...@d.umn.edu> wrote:
>
>
>>[...]
>>Does anyone know if this is true in general, that Maple7 is 2 to 3 times
>>slower than MapleV?
>>
>
> Certainly not. See ?updates,Maple7,efficiency for counterexamples.
>
>
Form 3.0*               0.9 sec  (or 94 sec, depending...)
Reduce 3.7 (in CSL)     5.0 sec  (including 1.3 in GC)
MockMMA (in ACL 6.1)    5.8 sec  (including 2.1 in GC)
Macsyma (in ACL 6.1)    6.9 sec  (including 2.0 in GC)
Hashing (in ACL 6.1)    8.6 sec  (including 2.2 in GC)
Fermat                  8.6 sec  (fastest of 2 runs)
Maxima5.5 (in GCL)     32.0 sec  (? GC not reported, second trial)
Macsyma (commercial)   37.5 sec  (including 29.78 in GC)
Maxima5.5 (in GCL)     53.5 sec  (? GC not reported, first trial)
Maple7                 50.0 sec  (? GC, it was slower on 2nd trial)
Mathematica 4.1        82.3 sec  (? it was slower on 2nd trial)
MuPad 1.4.2 Pro       118.6 sec
Times on other machines:
Mupad 1.4.2*          119.2 sec
Mupad 2.0*            117.0 sec
Maxima5.6*             51.0 sec
Giac2.96*              35.0 sec
Giac2.96* Hashing      18.0 sec
Ginac 1.0.8*           57.3 sec
MapleVR3*              16.7 sec
MapleVR4*              17.9 sec
MapleVR5*              18.4 sec
Meditor*              204.5 sec
Maxima 5.9.0 (GCL-1)* 112.21 sec
Maxima 5.9.0 (GCL-2)*  18.80 sec
Maxima 5.9.0 (clisp)*  31.30 sec
Maxima 5.9.0 (CMUCL)*   7.91 sec
*(Giac 2.96 compiled with gcc 2.96 CXXFLAGS=-O2 on a Celeron 766Mhz,
100M RAM, 128K cache, kernel=Linux 2.4.3; data points provided by
Bernard Parisse. Numerous tests were run by Richard Kreckel on a P-III
1GHz, 512kB cache Linux machine. Times for his machine seem to be about
the same as mine for MuPad 1.4.2, Maxima 5.6, and Mathematica, so the
numbers for Form, Ginac 1.0.8, Mupad 2.0, and MapleVR3,4,5 are probably
about comparable. Meditor took 204.5 sec on a 700Mhz processor; Meditor
was run by Raphael Jolly. The collection of Maxima 5.9.0 timings is from
James Amundson, running on a 700Mhz P-III Linux machine. The GCL-1 time
is, I believe, for the system that took 53 seconds on our machine. The
GCL-2 time is given more memory allocation. Clisp is a byte-coded Common
Lisp that is expected to be slower. CMUCL is a highly optimizing
implementation of Common Lisp based on work at Carnegie Mellon Univ. If
it scales up comparably to the other measurements, it would be the
fastest of the lisp systems. Currently CMUCL runs only on Linux.)
Please note that it's not Giac 2.96, it's Giac 0.2.2 compiled with gcc
2.96.
And I get 19.5s (giac) or 13.5 (C++ prog) on a PIII 800Mhz with 256K
cache.
For those who want to test, the giac Linux executable can be downloaded
at
ftp://fourier.ujf-grenoble.fr/pub/hp48/giac.gz
(the source is available in the same directory).
Here is one more data point:
Axiom 2.3 13.56 (evaluation) + 6.56 (GC)
This is Duron-600 running Linux.
Andrey
REDUCE 3.7 6.02 (evaluation) + 0.22 (GC)
PSL-based REDUCE on the same Duron-600 running Linux
Andrey
Richard> lisp systems. Currently CMUCL runs only on Linux.
Just a quick correction. CMUCL runs on Linux and Solaris systems. I
think it also works on OpenBSD and NetBSD systems. Older versions ran
on Alphas, MIPS, and HPs. There has been some talk about reviving
some of these other platforms. The limiting factor has been access
and desire for these other platforms.
SBCL, a derivative of CMUCL, runs on several platforms as well,
including Linux/PPC. I expect about the same performance.
Ray
> MuPad 1.4.2 Pro 118.6 seconds
>
> for multiplying the expanded version of (1+x+y+z)^20 by itself.
>
> details on http://www.cs.berkeley.edu/~fateman/papers/fastmult.pdf
While this is certainly not satisfactory, I'd like to point out that
your comparison seems to be a bit unfair: If I read your paper
correctly, your own code uses a data structure holding only
polynomials, while your MuPAD code uses general expressions. Try
this version:
>> a:=expand((1+x+y+z)^20):
>> p:=poly(a):
>> q:=poly(a+1):
>> time(expand(p*q))
30000
I.e., 30 seconds. Not really good, but better.
--
Attachment? Nein: http://piology.org/ILOVEYOU-Signature-FAQ.html
begin LOVE-LETTER-FOR-YOU.txt.vbs
Christopher Creutzig, MuPAD group, Tel.: +49-5251-60-5525
end
RJF
I was disappointed to see that
r:=x+y;
poly(r)+1 ; is not a poly
poly(poly(r)+1); returns FAIL.
And the "poly" form is not contagious as rat form is in macsyma,
except that poly(r)*poly(r) = poly(r)*r = poly(r^2) = expand(poly(r)*r).
So sometimes it is.
There is, in mupad, a substantial substructure of domains.
This poly form is part of that structure, but does not
seem to do what I would intuitively expect, at least in
returning FAIL.
RJF
Just out of curiosity, since it seems you are using a dense
representation, how well does it fare on expressions like
p = (x^(2^100-1)+x^123+x^57+x^13)
?
It seems unfair that you use only one test, one sample point, and one
for which your representation is efficient due to the sparse nature of
(1+x+y+z)^20 (due to the '1').
Of course computers are good at dealing with arrays. But with sparse
objects? Wouldn't a general test suite be more fair? Wouldn't a sparse
representation be better overall?
Also, out of curiosity, are all the tests run on the same machine? Are you using
the CPU speed as the sole measure of the speed of the machine?
Ayal Pinkus
apinkus wrote:
> Just out of curiosity, since it seems you are using a dense representation,
> how well does it fare on expressions like
>
> p = (x^(2^100-1)+x^123+x^57+x^13)
>
> ?
OK, there are 3 different representations of interest here:
the vector representation used by MockMMA,
the list representation used by Macsyma (also Reduce...), and
the hashtable representation used by the un-named 1-page lisp program.
For vector, it would be a big loss since I would have to allocate a vector
[1 0 0 0..... 1 0 0 0 0..... 1 ....1 0 0 0 0...] and that itself would
be impossible. However if one did this: let y=x^(2^100-1), then this
would possibly win.
For list, there would be no problem at all.
(x bignumber 1 123 1 57 1 13 1) would be about as fast.
For hash, I would have to have a much longer exponent key to encode
this stuff, so the exponent would be a few hundred bits. It would slow
things down a little, but not as much as making the coefficients really
huge.
>
> It seems unfair you use only one test, one sample point, one for which your representation
> is efficient due to the sparse nature of (1+x+y+z)^20 (due to the '1').
I don't think it would make much difference to leave out the 1.
>
> Of course computers are good at dealing with arrays. But with sparse objects?
The hash table is a sparse object. So is the list.
>
> Wouldn't a general test suite be more fair? Wouldn't a sparse representation
> be better overall?
I think that judiciously switching between sparse and dense is a way
to win. If you only had one representation, sparse would be more
plausible.
More test cases are possible, but there has to be some selection
to make a point..
>
> Also, out of curiosity, are all the tests run on the same machine?
Yes, a 933 MHz Pentium III that is in my office, running Windows 2000
5.00.2195.
It says x86 family 6 model 8 stepping 6, 523,568KB RAM.
> Are you using
> the CPU speed as the sole measure of the speed of the machine?
Yes
However, you compare the times the systems take, making it a
competition between systems based on this test.
>
> OK, there are 3 different representations of interest here.
> the vector representation used by MockMMa
> the list representation used by Macsyma (also Reduce... )
> the hashtable representation used by (un-named 1-page lisp program).
>
> For vector, it would be a big loss since I would have to allocate a vector
> [1 0 0 0..... 1 0 0 0 0..... 1 ....1 0 0 0 0...] and that itself would
> be impossible. However if one did this: let y=x^(2^100-1), then this
> would possibly win.
>
True, but then you would be helping the CAS; the whole point is to have
it done automatically. You can help the CAS even more by writing this in
C++: I'm sure you could hand-optimize it to outperform any lisp compiler.
Write a C++ program specifically for this test. If I wanted to cheat I
could have the compiler unroll some template classes; then the runtime
would be low.
If you are indeed interested in the coefficients of the expression, if they
really were of interest for some research, then this would be a valid and
fruitful approach.
Or, a closed form expression for the coefficients of the terms might be found.
> For list, there would be no problem at all.
> (x bignumber 1 123 1 57 1 13 1) would be about as fast.
>
Slower than your vector/array representation, though, since arrays
can be dealt with more efficiently by computers. The point: other
systems might be slower because they use sparse representations,
but at the same time they can quickly solve problems that would be
troublesome in your system. So that is perhaps why it is not a fair
test.
>
> I think that judiciously switching between sparse and dense is a way
> to win. If you only had one representation, sparse would be more
> plausible.
>
> More test cases are possible, but there has to be some selection
> to make a point..
>
Forgive me if this sounds rude, but in making the point, wouldn't it be
fair to discuss disadvantages also? What is the point of the article, by
the way?
1) That dense representations are good for some specific calculations?
Then it would have been best to compare various representations
implemented under the same lisp system.
2) That lisp is the way to go? Then the same algorithm should have been
implemented under different systems, in different languages, for
comparison.
3) The quality of MockMMA? Then an extensive test suite would have been
appropriate.
4) That it is an interesting test for CAS systems? That is true. But it
is unfair to make a time comparison between your 'special purpose' code
and general all-purpose systems, which have to perform reasonably for
many more different inputs.
>
> > Are you using
> > the CPU speed as the sole measure of the speed of the machine?
>
> Yes
>
Ok, that is fair, I suppose.
But an interesting test nonetheless!
Ayal Pinkus