MockMMA 5.8 seconds (including 2.1 in GC)
Macsyma (in ACL 6.1) 6.9 seconds (including 2.0 in GC)
Hashing 8.6 seconds (including 2.2 in GC)
Fermat 9.2 seconds
Macsyma (maxima5.5) 32.0 seconds (? GC not reported, second trial)
Macsyma (commercial) 37.5 seconds (including 29.78 in GC)
Macsyma (maxima5.5) 53.5 seconds (? GC not reported, first trial)
Maple7 50.0 seconds (? GC, it was slower on 2nd trial)
Mathematica 4.1 82.3 seconds (? it was slower on 2nd trial)
MuPad 1.4.2 Pro 118.6 seconds
for multiplying the expanded version of (1+x+y+z)^20 by itself.
details on http://www.cs.berkeley.edu/~fateman/papers/fastmult.pdf
comments invited.
RJF
Interesting challenge, just tested with Giac (static readline version
compiled with gcc 2.96 CXXFLAGS=-O2)
35s on a Celeron 766Mhz, 100M RAM, 128K cache (kernel=Linux 2.4.3)
Memory used was around 10M.
Giac uses an ordered distributed representation for polynomials. The
monomial exponents are stored in a vector of int. Bignum arithmetic uses
GMP. No garbage collector.
I believe that it would be interesting to know the exact representation
used for polynomials in each system you tested, as well as the limits of
the representations.
For example, recursive representation is obviously faster on this
problem than distributed representation (where sorting is needed)...
As far as limits: the way I implemented hashing, all the
exponents are stored in one small integer. Generalizing this
to a vector of integers is simple but would increase storage
and slow down computation. The hashing representation is
much smaller than macsyma or mockmma, and so could handle
larger problems. There are no limits on the largest exponent
in the Macsyma representation because x^(1000)+1 is simply
a list (x 1000 1 0 1). In the Mockmma representation,
there is a limit in that x^(1000)+1 would be stored in an
array of 1001 coefficients, and so you would run out of
memory sooner.
Maple, at least on some machines, had a limit on the
number of terms in an expression. This was occasionally
encountered. I don't know if this limit has been removed
for Maple 7.
I think that each system has some maximum bignum size; while many
systems seem to have a design that would allow one to fill all of
memory with one long chain of bignum digits, in practice there is
usually something else that stops it.
RJF
Please be careful with such comparisons. I have seen huge (factor 2)
differences on one machine when the malloc algorithm was changed. Also,
memory access latency is usually more important than CPU cycles. To be
sure, use one machine only and run all the tests under the same OS.
[...]
: Maple, at least on some machines, had a limit on the
: number of terms in an expression. This was occasionally
: encountered. I don't know if this limit has been removed
: for Maple 7.
Yes, it has for Maple 6.
: I think that each system has some maximum bignum
: size; While many systems seem to have a design that
: would allow one to fill all of memory with one long
: chain of bignum digits, in practice there is usually
: something else that stops it.
Really??? What stops it on Maple or Mathematica? In CLN's exact
types we certainly have no such limits.
Regards
-richy.
--
Richard B. Kreckel
<Richard...@Uni-Mainz.DE>
<http://wwwthep.physik.uni-mainz.de/~kreckel/>
By short integer, you mean something that fits in a register of the CPU?
For 3 vars that would mean a max exponent of about 1000 on a 32 bit CPU?
> As far as limits: the way I implemented hashing, all the
> exponents are stored in one small integer. Generalizing this
> to a vector of integers is simple but would increase storage
> and slow down computation. The hashing representation is
> much smaller than macsyma or mockmma, and so could handle
> larger problems.
Correct me if I'm wrong, but this representation would not
work if you had too many variables, because the short int
representation would not allow some powers?
Parisse Bernard wrote:
>>Hashing uses a distributed form, unordered hashtable with
>>key = exponents compressed into a short integer
>>value = bignum.
>>
>>
>
> By short integer, you mean something that fits in a register of the CPU?
> For 3 vars that would mean a max exponent of about 1000 on a 32 bit CPU?
That's right. It would require no change in the source code in lisp,
but would make it run slower if you made the exponents larger than
a word. Or we could hash on lists, which would be larger and
slower both.
>
>
>>As far as limits: the way I implemented hashing, all the
>>exponents are stored in one small integer. Generalizing this
>>to a vector of integers is simple but would increase storage
>>and slow down computation. The hashing representation is
>>much smaller than macsyma or mockmma, and so could handle
>>larger problems.
>>
>
> Correct me if I'm wrong, but this representation would not
> work if you had too many variables, because the short int
> representation would not allow some powers?
It would still work, but be slower since the exponents
would be encoded as bignums. I guess I could time some
version of that too.
>
I have made a corresponding C++ program (see below) and I get around 18s
on my Celeron 766Mhz, memory used around 2M.
You can replace giac and gen by any library/class representing bignums
(gen is a generic type that can represent data types other than just
large integers).
#include <iostream>
#include <map>
#include <ctime>
#include <giac/giac.h>
using namespace std;
using namespace giac;

typedef map<int,gen> pol;

void fastmult(const pol & p1,const pol & p2,pol & product){
  product.clear();
  pol::const_iterator it1=p1.begin(),it1end=p1.end();
  pol::const_iterator it2,it2end=p2.end();
  pol::iterator prod_it;
  int sum1,sum;
  gen coeff1;
  for (;it1!=it1end;++it1){
    sum1=it1->first;
    coeff1=it1->second;
    for (it2=p2.begin();it2!=it2end;++it2){
      sum=sum1+it2->first;
      prod_it=product.find(sum);
      if (prod_it==product.end())
        product[sum]=coeff1*it2->second;
      else
        prod_it->second = prod_it->second + coeff1*it2->second;
    }
  }
}

void fastpower(const pol & p,int k,pol & res){
  if (k==0){
    res.clear();
    res[0]=1;
    return;
  }
  res=p;
  pol tmp;
  for (int i=1;i<k;++i){
    tmp=res;
    fastmult(tmp,p,res);
  }
}

ostream & operator << (ostream & of,const pol & p){
  pol::const_iterator it=p.begin(),itend=p.end();
  for (;it!=itend;++it){
    of << it->second << "[" << it->first << "] ";
  }
  return of;
}

int main(){
  int k;
  cout << "Enter power";
  cin >> k;
  pol xyz;
  xyz[0]=1;       // constant coefficient
  xyz[1]=1;       // x coefficient (bits 0-7)
  xyz[1<<8]=1;    // y coefficient (bits 8-15)
  xyz[1<<16]=1;   // z coefficient (bits 16-23)
  cout << xyz << endl;
  // make (1+x+y+z)^k
  pol xyz20,xyz40;
  fastpower(xyz,k,xyz20);
  cout << xyz20.size() << endl;
  clock_t start=clock();
  fastmult(xyz20,xyz20,xyz40);
  // cout << xyz40 << endl;
  clock_t end=clock();
  cout << end-start << endl;
  cout << "Number of monomials " << xyz40.size() << endl;
}
I think that multiplication using hashtables
in Lisp looks somewhat neater, than writing it in C++
at least if you are familiar with lisp. I have removed
some error checking from the code below, but the n^2
operation is just a doubly-nested "maphash".
(defun hashtimes (r1 r2)   ; r1 and r2 are hash tables
  (let ((res (make-hash-table :test #'eq)))
    (maphash #'(lambda (key val)
                 (maphash #'(lambda (key2 val2)
                              (incf (gethash (+ key key2) res 0)
                                    (* val val2)))  ; * may be bignum mult
                          r2))
             r1)
    ;; optional, remove 0 terms from hashtable result
    (maphash #'(lambda (key val) (if (eql 0 val) (remhash key res))) res)
    ;; make the form of the result acceptable to macsyma
    `(($hashtable simp ,res ,vars) |$encoded| ,(hash-table-count res))))
I think that to make these programs much faster we would
have to
(a) make bignum arithmetic faster
or
(b) make bignum arithmetic cause less storage allocation.
One way to do this is to change bignum mult or add so that
we compute x:=a*b+x
by computing a*b in a temporary location that is immediately
reclaimed and doesn't need to be garbage collected (in lisp
this would be called stack-allocated variables),
and by re-using the memory cells used for x in the new
value for x.
A really clever lisp compiler could know all this because
the inner statement is
(incf x (* v v2)) where x= hashtable access.
It is obvious that the value (* v v2) is not used anywhere
else because it is inaccessible. It is obvious that x
can be overwritten because the statement (incf x y) means
x:=x+y.
the compiler I'm using is not nearly so clever, and in
fact it even recomputes the hash location twice. If it
were changed to compute once, it would improve the time
by some 10%.
RJF
I tried your example on meditor: it took 204.5 seconds on a 700 MHz
processor. As pointed out by R. Kreckel, you should test it on your own
computer (with an up-to-date JVM) for an accurate comparison. The
polynomials are represented in distributed form, ordered using a B-tree
with monomials as keys and coefficients as values. The monomials are
arrays of integer exponents. The coefficients are Java BigIntegers. Here
is the program:
import spoke.math.*;
import spoke.math.analyse.*;

public class Essai {
    public static void main(String args[]) {
        Arithmetique a = Polynome.valueOf(
            Expression.valueOf("(1+x+y+z)^20"),
            new Variable[] {new Constante("x"), new Constante("y"),
                            new Constante("z")},
            Lexicographique.comparer, 0);
        System.out.println(a);
        long t = System.currentTimeMillis();
        Arithmetique b = a.multiply(a);
        System.out.println(System.currentTimeMillis() - t);
    }
}
Right. I have tested on a PIII 800Mhz 256K cache and I get 19.5s for
Giac
and 13.5s for the C++ program.
Maxima 5.9.0 (old, using GCL) : 112.21 seconds
Maxima 5.9.0 (new*, using GCL): 18.80 seconds
Maxima 5.9.0 (using clisp) : 31.30 seconds
Maxima 5.9.0 (using CMUCL) : 7.91 seconds
*The new version includes some default memory preallocation with GCL. The
preallocation hasn't made it into cvs yet.
Given that the numbers below come from a 933 MHz machine, I think the above
numbers compare pretty well.
--Jim
I tried this on my 533 MHz G4 PowerMac and got the following:
Maple7 57.0 seconds
MapleV 21.1 seconds
Does anyone know if this is true in general, that Maple7 is 2 to 3 times
slower than MapleV?
John
> [...]
> Does anyone know if this is true in general, that Maple7 is 2 to 3 times
> slower than MapleV?
Certainly not. See ?updates,Maple7,efficiency for counterexamples.
--
Thomas Richard
Maple Support
Scientific Computers GmbH
http://www.scientific.de
Thomas Richard wrote:
> John Greene <jgr...@d.umn.edu> wrote:
>
>
>>[...]
>>Does anyone know if this is true in general, that Maple7 is 2 to 3 times
>>slower than MapleV?
>>
>
> Certainly not. See ?updates,Maple7,efficiency for counterexamples.
>
>
Form 3.0*               0.9 sec  (or 94 sec, depending...)
Reduce 3.7 (in CSL)     5.0 sec  (including 1.3 in GC)
MockMMA (in ACL 6.1)    5.8 sec  (including 2.1 in GC)
Macsyma (in ACL 6.1)    6.9 sec  (including 2.0 in GC)
Hashing (in ACL 6.1)    8.6 sec  (including 2.2 in GC)
Fermat                  8.6 sec  (fastest of 2 runs)
Maxima5.5 (in GCL)     32.0 sec  (? GC not reported, second trial)
Macsyma (commercial)   37.5 sec  (including 29.78 in GC)
Maxima5.5 (in GCL)     53.5 sec  (? GC not reported, first trial)
Maple7                 50.0 sec  (? GC, it was slower on 2nd trial)
Mathematica 4.1        82.3 sec  (? it was slower on 2nd trial)
MuPad 1.4.2 Pro       118.6 sec
Times on other machines:
Mupad 1.4.2*          119.2 sec
Mupad 2.0*            117.0 sec
Maxima5.6*             51.0 sec
Giac2.96*              35.0 sec
Giac2.96* Hashing      18.0 sec
Ginac 1.0.8*           57.3 sec
MapleVR3*              16.7 sec
MapleVR4*              17.9 sec
MapleVR5*              18.4 sec
Meditor*              204.5 sec
Maxima 5.9.0 (GCL-1)* 112.21 sec
Maxima 5.9.0 (GCL-2)*  18.80 sec
Maxima 5.9.0 (clisp)*  31.30 sec
Maxima 5.9.0 (CMUCL)*   7.91 sec
*(Giac 2.96 compiled with gcc 2.96 CXXFLAGS=-O2 on a Celeron 766Mhz,
100M RAM, 128K cache, kernel=Linux 2.4.3; data points provided by
Bernard Parisse. Numerous tests were run by Richard Kreckel on a P-III
1GHz, 512kB cache Linux machine. Times for his machine seem to be about
the same as mine for MuPad 1.4.2, Maxima 5.6, and Mathematica, so the
numbers for Form, Ginac 1.0.8, Mupad 2.0, and MapleVR3,4,5 are probably
about comparable. Meditor took 204.5 sec on a 700Mhz processor; Meditor
was run by Raphael Jolly. The collection of Maxima 5.9.0 timings is from
James Amundson, running on a 700Mhz P-III Linux machine. The GCL-1 time
is, I believe, for the system that took 53 seconds on our machine. The
GCL-2 time is given more memory allocation. Clisp is a byte-coded Common
Lisp that is expected to be slower. CMUCL is a highly optimizing
implementation of Common Lisp based on work at Carnegie Mellon Univ. If
it scales up comparably to the other measurements, it would be the
fastest of the lisp systems. Currently CMUCL runs only on Linux.)
Please note that it's not Giac 2.96, it's Giac 0.2.2 compiled with gcc
2.96.
And I get 19.5s (giac) or 13.5 (C++ prog) on a PIII 800Mhz with 256K
cache.
For those who want to test, the giac Linux executable can be downloaded
at
ftp://fourier.ujf-grenoble.fr/pub/hp48/giac.gz
(the source is available in the same directory).
Here is one more data point:
Axiom 2.3 13.56 (evaluation) + 6.56 (GC)
This is Duron-600 running Linux.
Andrey
REDUCE 3.7 6.02 (evaluation) + 0.22 (GC)
PSL-based REDUCE on the same Duron-600 running Linux
Andrey
Richard> lisp systems. Currently CMUCL runs only on Linux.
Just a quick correction. CMUCL runs on Linux and Solaris systems. I
think it also works on OpenBSD and NetBSD systems. Older versions ran
on Alphas, MIPS, and HPs. There has been some talk about reviving
some of these other platforms. The limiting factor has been access
and desire for these other platforms.
SBCL, a derivative of CMUCL, runs on several platforms as well,
including Linux/PPC. I expect about the same performance.
Ray
> MuPad 1.4.2 Pro 118.6 seconds
>
> for multiplying the expanded version of (1+x+y+z)^20 by itself.
>
> details on http://www.cs.berkeley.edu/~fateman/papers/fastmult.pdf
While this is certainly not satisfactory, I'd like to point out that
your comparison seems to be a bit unfair: If I read your paper
correctly, your own code uses a data structure holding only
polynomials, while your MuPAD code uses general expressions. Try
this version:
>> a:=expand((1+x+y+z)^20):
>> p:=poly(a):
>> q:=poly(a+1):
>> time(expand(p*q))
30000
I.e., 30 seconds. Not really good, but better.
--
Attachment? Nein: http://piology.org/ILOVEYOU-Signature-FAQ.html
begin LOVE-LETTER-FOR-YOU.txt.vbs
Christopher Creutzig, MuPAD group, Tel.: +49-5251-60-5525
end
RJF
I was disappointed to see that
r:=x+y;
poly(r)+1 ; is not a poly
poly(poly(r)+1); returns FAIL.
And the "poly" form is not contagious as rat form is in macsyma,
except that poly(r)*poly(r) = poly(r)*r = poly(r^2) = expand(poly(r)*r).
So sometimes it is.
There is, in mupad, a substantial substructure of domains.
This poly form is part of that structure, but does not
seem to do what I would intuitively expect, at least in
returning FAIL.
RJF
Just out of curiosity, since it seems you are using a dense
representation, how well does it fare on expressions like
p = (x^(2^100-1)+x^123+x^57+x^13)
?
It seems unfair that you use only one test, one sample point, and one
for which your representation is efficient due to the sparse nature of
(1+x+y+z)^20 (due to the '1').
Of course computers are good at dealing with arrays. But with sparse
objects? Wouldn't a general test suite be more fair? Wouldn't a sparse
representation be better overall?
Also, out of curiosity, are all the tests run on the same machine? Are you using
the CPU speed as the sole measure of the speed of the machine?
Ayal Pinkus
apinkus wrote:
> Just out of curiosity, since it seems you are using a dense representation,
> how well does it fare on expressions like
>
> p = (x^(2^100-1)+x^123+x^57+x^13)
>
> ?
OK, there are 3 different representations of interest here:
the vector representation used by MockMMA,
the list representation used by Macsyma (also Reduce...), and
the hashtable representation used by the un-named 1-page lisp program.
For vector, it would be a big loss since I would have to allocate a vector
[1 0 0 0..... 1 0 0 0 0..... 1 ....1 0 0 0 0...] and that itself would
be impossible. However if one did this: let y=x^(2^100-1), then this
would possibly win.
For list, there would be no problem at all.
(x bignumber 1 123 1 57 1 13 1) would be about as fast.
For hash, I would have to have a much longer exponent key to encode
this stuff, so the exponent would be a few hundred bits. It would slow
things down a little, but not as much as making the coefficients really
huge.
>
> It seems unfair you use only one test, one sample point, one for which your representation
> is efficient due to the sparse nature of (1+x+y+z)^20 (due to the '1').
I don't think it would make much difference to leave out the 1.
>
> Of course computers are good at dealing with arrays. But with sparse objects?
The hash table is a sparse object. So is the list.
>
> Wouldn't a general test suite be more fair? Wouldn't a sparse representation
> be better overall?
I think that judiciously switching between sparse and dense is a way
to win. If you only had one representation, sparse would be more
plausible.
More test cases are possible, but there has to be some selection
to make a point..
>
> Also, out of curiosity, are all the tests run on the same machine?
Yes, a 933 MHz Pentium III that is in my office, running Windows 2000
5.00.2195.
It says x86 family 6 model 8 stepping 6, 523,568KB RAM.
> Are you using
> the CPU speed as the sole measure of the speed of the machine?
Yes
However, you compare the times the systems take, making it a
competition between systems based on this test.
>
> OK, there are 3 different representations of interest here.
> the vector representation used by MockMMa
> the list representation used by Macsyma (also Reduce... )
> the hashtable representation used by (un-named 1-page lisp program).
>
> For vector, it would be a big loss since I would have to allocate a vector
> [1 0 0 0..... 1 0 0 0 0..... 1 ....1 0 0 0 0...] and that itself would
> be impossible. However if one did this: let y=x^(2^100-1), then this
> would possibly win.
>
True, but then you would be helping the CAS; the whole point is to have
it done automatically. You can help the CAS even more by writing this in
C++: I'm sure you could hand-optimize it to outperform any lisp compiler.
Write a C++ program specifically for this test. If I wanted to cheat I
could have the compiler unroll some template classes; then the runtime
would be low.
If you are indeed interested in the coefficients of the expression, if they
really were of interest for some research, then this would be a valid and
fruitful approach.
Or, a closed form expression for the coefficients of the terms might be found.
> For list, there would be no problem at all.
> (x bignumber 1 123 1 57 1 13 1) would be about as fast.
>
Slower than your vector/array representation, though, since arrays
can be dealt with more efficiently by computers. The point: other
systems might be slower because they use sparse representations,
but at the same time they can quickly solve problems that would be
troublesome in your system. So that is perhaps why it is not a fair
test.
>
> I think that judiciously switching between sparse and dense is a way
> to win. If you only had one representation, sparse would be more
> plausible.
>
> More test cases are possible, but there has to be some selection
> to make a point..
>
Forgive me if this sounds rude, but in making the point, wouldn't it be
fair to discuss disadvantages also? What is the point of the article, by
the way?
1) That dense representations are good for some specific calculations?
Then it would have been best to compare various representations
implemented under the same lisp system.
2) That lisp is the way to go? Then the same algorithm should have been
implemented under different systems, in different languages, for
comparison.
3) The quality of MockMMA? Then an extensive test suite would have been
appropriate.
4) That it is an interesting test for CAS systems? That is true. But it
is unfair to make a time comparison between your 'special purpose' code
and general all-purpose systems, which have to perform reasonably for
many more different inputs.
>
> > Are you using
> > the CPU speed as the sole measure of the speed of the machine?
>
> Yes
>
Ok, that is fair, I suppose.
But an interesting test nonetheless!
Ayal Pinkus