In spite of Erik's nice signature, which I have chosen for this message
too, I'm still interested in the low-level performance of my programs.
In my case (I'm doing numerical analysis for partial differential
equations), it is especially floating-point performance that matters.
I'm using CMUCL, and it doesn't perform badly in comparison with C, at
least on my computer (some of you will remember that this group helped
me with my first steps in CL on exactly this problem).
Now, what I would like to have is some more data on how Lisp
implementations run this program. In particular, I would be interested
in results for CMUCL on Sun workstations, and for ACL, LispWorks, ... on
x86 and other architectures. If you would like to test it, please go
ahead; I'm very interested in the results. Please always report the
results for the C program as well.
Nicolas.
P.S.: The demo versions of the commercial Lisps will probably not
allocate the memory needed by the program. Also: don't be too
disappointed if your Lisp does not perform very well. Floating-point
performance is not of the highest importance for most applications.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; mflop.lisp
;;;; (C) Nicolas Neuss (Nicola...@iwr.uni-heidelberg.de)
;;;; mflop.lisp is in the public domain.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defconstant +N-long+ #x100000) ; does not fit in secondary cache
(defconstant +N-short+ #x100) ; fits in primary cache
(defparameter *mflop-delta* 5.0
"Time interval in seconds over which we measure performance.")
(defun make-double-float-array (size &optional (initial 0.0d0))
(make-array size :element-type 'double-float :initial-element initial))
(defun ddot (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
(loop for i fixnum from 0 below n
summing (* (aref x i) (aref y i)) double-float))
(defun daxpy (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
(loop with s double-float = 0.3d0
for i from 0 below n do
(setf (aref x i) (+ (* s (aref y i))))))
(defun test (fn size)
(let ((x (make-double-float-array +N-long+))
(y (make-double-float-array +N-long+)))
(format
t "~A-~A: ~$ MFLOPS~%"
fn
(if (= size +N-long+) "long" "short")
(loop with after = 0
for before = (get-internal-run-time) then after
and count = 1 then (* count 2)
do
(loop repeat count do (funcall fn x y size))
(setq after (get-internal-run-time))
(when (> (/ (- after before) internal-time-units-per-second)
*mflop-delta*)
(return (/ (* 2 size count internal-time-units-per-second)
(* 1e6 (- after before)))))))))
(defun mflop-test ()
"Returns several numbers characteristic for floating point efficiency of
your CL implementation. Please compare these numbers to those obtained by
the C version in mflop.c."
(test 'ddot +N-long+)
(test 'ddot +N-short+)
(test 'daxpy +N-long+)
(test 'daxpy +N-short+))
#+ignore (mflop-test)
/**********************************************************************
mflop.c -- performance testing
(C) Nicolas Neuss (Nicola...@iwr.uni-heidelberg.de)
mflop.c is public domain.
**********************************************************************/
/* Reasonable compilation lines are
Linux: gcc -O3 mflop.c
IRIS Octane: cc -Ofast mflop.c
Sparc Ultra II: cc -fast mflop.c
IBM RS6000/590: xlc -O3 -qarch=pwrx -qtune=pwrx mflop.c */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#define MFLOP_DELTA 5.0 /* time interval over which we measure performance */
#define Nlong 1000000 /* does not fit in secondary cache */
#define Nshort 256 /* fits in primary cache */
#define CURRENT_TIME (((double)clock()) / ((double)CLOCKS_PER_SEC))
double ddot (double *x, double *y, int n) {
int j;
double s = 0.0;
for (j=0; j<n; j++)
s += x[j]*y[j];
return s;
}
double daxpy (double *x, double *y, int n) {
int j;
double s = 0.1;
for (j=0; j<n; j++)
y[j] += s*x[j];
return 0.0;
}
typedef double testfun (double *, double *, int n);
void test (testfun f, char *name, int n) {
int i, nr;
double start_time, end_time;
double s = 0.0;
double *x = (double *) malloc(sizeof(double)*Nlong);
double *y = (double *) malloc(sizeof(double)*Nlong);
for (i=0; i<Nlong; i++)
x[i] = 0.0; y[i] = 0.9;
nr = 1;
do {
nr = 2*nr;
start_time = CURRENT_TIME;
for (i=0; i<nr; i++)
s += f(x, y, n);
end_time = CURRENT_TIME;
} while (end_time-start_time<MFLOP_DELTA);
printf ("%s%s %4.2f MFLOPS\n", name, ((n==Nlong) ? "-long":"-short"),
1.0e-6*2*n*(s+nr/(end_time-start_time)));
}
int main (void) {
test(ddot, "ddot", Nlong);
test(ddot, "ddot", Nshort);
test(daxpy, "daxpy", Nlong);
test(daxpy, "daxpy", Nshort);
return 0;
}
Sample results for my Toshiba TECRA 8000 Laptop:
CMUCL:
* ;;; Evaluate mflop-test
DDOT-long: 42.01 MFLOPS
DDOT-short: 108.90 MFLOPS
DAXPY-long: 23.46 MFLOPS
DAXPY-short: 136.26 MFLOPS
NIL
gcc -O3 mflop-neu.c; a.out
ddot-long 62.75 MFLOPS
ddot-short 178.36 MFLOPS
daxpy-long 22.82 MFLOPS
daxpy-short 119.70 MFLOPS
Speed disadvantage of CMUCL:
ddot-long: 1.7
ddot-short: 0.61
daxpy-long: 1.0
daxpy-short: 0.9
--
Performance is the last refuge of the miserable programmer.
-- Erik Naggum
1. Of course, you should drop the #+ignore in front of the call
(mflop-test) in mflop.lisp.
2. Speed disadvantage of CMUCL:
ddot-long: 0.67
ddot-short: 0.61
daxpy-long: 1.0
daxpy-short: 0.9
I.e. not too much difference (about 0 - 40% loss of efficiency).
By the way: this program was optimized for CMUCL. Some changes may be
necessary for good performance on ACL or LispWorks. I would be
interested in those changes as well.
Yours, Nicolas
Too large for the trial version of Allegro/Franz.
DDOT-long: 17.73 MFLOPS
Error: An allocation request for 8388624 bytes caused a need for
22282240 more bytes of heap. This request cannot be satisfied
because you have hit the Allegro CL Trial heap limit.
Igor.
x86 - celeron 900 MHz - WinXP Home :
Lispworks:
Error: Unknown LOOP keyword in (... DOUBLE-FLOAT). Maybe missing
OF-TYPE loop keyword.
CLISP:
DDOT-long: 0.20 MFLOPS
DDOT-short: 0.20 MFLOPS
DAXPY-long: 0.16 MFLOPS
DAXPY-short: 0.16 MFLOPS
ACL 6.1 (trial version):
DDOT-long: 16.73 MFLOPS
and then error because of trial version.
The CLISP results are depressing.
Igor.
Compiled my mflop.c with mingw32 2.95.2
gcc -O3 -c mflop.c
dllwrap --output-lib=libmflop.a --dllname=mflop.dll --driver-name=gcc mflop.o
Results on 1.4GHz Athlon
CL-USER 36 > (mflop-test)
DDOT-long: 49.18 MFLOPS
DDOT-short: 339.85 MFLOPS
DAXPY-long: 41.30 MFLOPS
DAXPY-short: 372.31 MFLOPS
NIL
CL-USER 37 >
The best I could get for LWW (with modifications) with straight Lisp was
CL-USER 27 > (mflop-test)
DDOT-long: 5.32 MFLOPS
DDOT-short: 5.65 MFLOPS
DAXPY-long: 3.19 MFLOPS
DAXPY-short: 3.64 MFLOPS
NIL
mflop.c -----------------------------------
#ifdef BUILD_DLL
// the dll exports
#define EXPORT __declspec(dllexport)
#else
// the exe imports
#define EXPORT __declspec(dllimport)
#endif
// function to be imported/exported
EXPORT double ddot (double*, double*, int);
EXPORT double daxpy (double*, double*, int);
EXPORT double ddot (double *x, double *y, int n) {
int j;
double s = 0.0;
for (j=0; j<n; j++)
s += x[j]*y[j];
return s;
}
EXPORT double daxpy (double *x, double *y, int n) {
int j;
double s = 0.1;
for (j=0; j<n; j++)
y[j] += s*x[j];
return 0.0;
}
mflop.lisp ------------------------------
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; mflop.lisp
;;;; (C) Nicolas Neuss (Nicola...@iwr.uni-heidelberg.de)
;;;; mflop.lisp is in the public domain.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defconstant +N-long+ #x100000) ; does not fit in secondary cache
(defconstant +N-short+ #x100) ; fits in primary cache
(defparameter *mflop-delta* 5.0
"Time interval in seconds over which we measure performance.")
(fli:register-module #p"d:/user/wade/lww/mflop.dll")
(fli:define-foreign-function (ddot "ddot")
((x (:pointer :double))
(y (:pointer :double))
(n :int))
:result-type :double
:calling-convention :cdecl)
(fli:define-foreign-function (daxpy "daxpy")
((x (:pointer :double))
(y (:pointer :double))
(n :int))
:result-type :double
:calling-convention :cdecl)
(defun allocate-double-float-array (size &optional (initial 0.0d0))
(fli:allocate-foreign-object :type :double :nelems size :initial-element initial))
(defun free-double-float-array (array)
(fli:free-foreign-object array))
(defun test (fn size)
(let ((x (allocate-double-float-array +N-long+))
(y (allocate-double-float-array +N-long+)))
(unwind-protect
(format
t "~A-~A: ~$ MFLOPS~%"
fn
(if (= size +N-long+) "long" "short")
(loop with after = 0
for before = (get-internal-run-time) then after
and count = 1 then (* count 2)
do
(loop repeat count do (funcall fn x y size))
(setq after (get-internal-run-time))
(when (> (/ (- after before) internal-time-units-per-second)
*mflop-delta*)
(return (/ (* 2 size count internal-time-units-per-second)
(* 1e6 (- after before)))))))
(free-double-float-array x)
(free-double-float-array y))))
> Just for the fun of it, I changed around your code for LWW 4.1.20. Made a
> DLL and did a foreign function approach.
>
> Compiled my mflop.c with mingw32 2.95.2
>
> gcc -O3 -c mflop.c
> dllwrap --output-lib=libmflop.a --dllname=mflop.dll --driver-name=gcc
> mflop.o
>
> Results on 1.4GHz Athlon
>
> CL-USER 36 > (mflop-test)
> DDOT-long: 49.18 MFLOPS
> DDOT-short: 339.85 MFLOPS
> DAXPY-long: 41.30 MFLOPS
> DAXPY-short: 372.31 MFLOPS
> NIL
>
> CL-USER 37 >
>
> The best I could get for LWW (with modifications) with straight Lisp was
>
> CL-USER 27 > (mflop-test)
> DDOT-long: 5.32 MFLOPS
> DDOT-short: 5.65 MFLOPS
> DAXPY-long: 3.19 MFLOPS
> DAXPY-short: 3.64 MFLOPS
> NIL
Mine was as follows (with LW4.2.6):
Results on Duron 800 :
jsc@shsp0629:~ > gcc -O3 mflop.c
jsc@shsp0629:~ > ./a.out
ddot-long 78.29 MFLOPS
ddot-short 382.80 MFLOPS
daxpy-long 34.59 MFLOPS
daxpy-short 594.87 MFLOPS
CL-USER 8 > (mflop-test)
DDOT-long: 33.55 MFLOPS
DDOT-short: 65.39 MFLOPS
DAXPY-long: 28.99 MFLOPS
DAXPY-short: 56.69 MFLOPS
NIL
For the CL version I had to reduce the size of the longer double array
because it exceeded the maximum array size in LW4.2.6.
The bad numbers for the short tests may be due to the smaller caches on
the Duron. Setting the "float" optimization flag to 0 doesn't seem to
bring much gain.
This is the actual code I used
(note that I changed the setf in daxpy to incf - which looks better if you
compare it with the C source):
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; mflop.lisp
;;;; (C) Nicolas Neuss (Nicola...@iwr.uni-heidelberg.de)
;;;; mflop.lisp is in the public domain.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defconstant +N-long+ (floor #x100000 2)) ; does not fit in secondary cache
(defconstant +N-short+ #x100) ; fits in primary cache
(defparameter mflop-delta 5.0
"Time interval in seconds over which we measure performance.")
(defun make-double-float-array (size &optional (initial 0.0d0))
(make-array size :element-type 'double-float :initial-element initial))
(defun ddot (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed
3)#+lispworks(float 0)))
(loop for i of-type fixnum from 0 below n
summing (* (aref x i) (aref y i)) of-type double-float))
(defun daxpy (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed 3)#-lispworks
(float 0)))
(loop with s of-type double-float = 0.3d0
for i of-type fixnum from 0 below n do
(incf (aref x i) (* s (aref y i)))))
(defun test (fn size)
(let ((x (make-double-float-array +N-long+))
(y (make-double-float-array +N-long+)))
(format
t "~A-~A: ~$ MFLOPS~%"
fn
(if (= size +N-long+) "long" "short")
(loop with after = 0
for before = (get-internal-run-time) then after
and count = 1 then (* count 2)
do
(loop repeat count do (funcall fn x y size))
(setq after (get-internal-run-time))
(when (> (/ (- after before) internal-time-units-per-second)
mflop-delta)
(return (/ (* 2 size count internal-time-units-per-second)
(* 1e6 (- after before)))))))))
(defun mflop-test ()
"Returns several numbers characteristic for floating point efficiency of
your CL implementation. Please compare these numbers to those obtained by
the C version in mflop.c."
(test 'ddot +N-long+)
(test 'ddot +N-short+)
(test 'daxpy +N-long+)
(test 'daxpy +N-short+))
;(mflop-test)
> (defun daxpy (x y n)
> (declare (type fixnum n)
> (type (simple-array double-float (*)) x y))
> (declare (optimize (safety 0) (space 0) (debug 0) (speed 3)#-lispworks
> (float 0)))
> (loop with s of-type double-float = 0.3d0
> for i of-type fixnum from 0 below n do
> (incf (aref x i) (* s (aref y i)))))
For the given test values I had #+lispworks in the above code and not
#-lispworks.
ciao,
Jochen
Used your code. The result on a 1.4GHz Athlon, LWW 4.1.20 is
CL-USER 1 > (mflop-test)
DDOT-long: 51.55 MFLOPS
DDOT-short: 116.80 MFLOPS
DAXPY-long: 42.62 MFLOPS
DAXPY-short: 91.48 MFLOPS
NIL
CL-USER 2 >
So the key is (float 0).
Wade
> Mine was as follows (with LW4.2.6):
>
> Results on Duron 800 :
>
> jsc@shsp0629:~ > gcc -O3 mflop.c
> jsc@shsp0629:~ > ./a.out
> ddot-long 78.29 MFLOPS
> ddot-short 382.80 MFLOPS
> daxpy-long 34.59 MFLOPS
> daxpy-short 594.87 MFLOPS
>
> CL-USER 8 > (mflop-test)
> DDOT-long: 33.55 MFLOPS
> DDOT-short: 65.39 MFLOPS
> DAXPY-long: 28.99 MFLOPS
> DAXPY-short: 56.69 MFLOPS
> NIL
LWL4.2.6 on a Pentium IV 1.7 GHz:
cartan@darkstar:[lisp]$ gcc -O3 -o mflop mflop.c
cartan@darkstar:[lisp]$ ./mflop
ddot-long 381.38 MFLOPS
ddot-short 610.08 MFLOPS
daxpy-long 150.59 MFLOPS
daxpy-short 525.06 MFLOPS
CL-USER 1 > (mflop-test)
DDOT-long: 118.25 MFLOPS
DDOT-short: 117.86 MFLOPS
DAXPY-long: 132.40 MFLOPS
DAXPY-short: 251.46 MFLOPS
NIL
Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.
PGP key ID #xC66D6E6F
* (mflop-test)
DDOT-long: 71.39 MFLOPS
DDOT-short: 446.00 MFLOPS
DAXPY-long: 72.65 MFLOPS
DAXPY-short: 465.83 MFLOPS
> LWL4.2.6 on a Pentium IV 1.7 GHz:
>
> cartan@darkstar:[lisp]$ gcc -O3 -o mflop mflop.c
> cartan@darkstar:[lisp]$ ./mflop
> ddot-long 381.38 MFLOPS
> ddot-short 610.08 MFLOPS
> daxpy-long 150.59 MFLOPS
> daxpy-short 525.06 MFLOPS
>
>
> CL-USER 1 > (mflop-test)
> DDOT-long: 118.25 MFLOPS
> DDOT-short: 117.86 MFLOPS
> DAXPY-long: 132.40 MFLOPS
> DAXPY-short: 251.46 MFLOPS
> NIL
Oh, and CMUCL 18d:
* (mflop-test)
ddot-long: 280.72 MFLOPS
ddot-short: 454.01 MFLOPS
daxpy-long: 161.22 MFLOPS
daxpy-short: 318.15 MFLOPS
[Note: These are informal benchmarks. I tried to ensure that the machines were
relatively idle (load avg close to 0) before running them. That's about it.]
Linux/x86: Pentium IV 2.4GHz 900 MB RAM
ACL 6.0 Enterprise (Linux/x86) ((speed 1) (safety 1) (debug 2) (space 1)):
DDOT-long: 123.70 MFLOPS
DDOT-short: 129.37 MFLOPS
DAXPY-long: 69.36 MFLOPS
DAXPY-short: 122.99 MFLOPS
ACL 6.0 Enterprise (Linux/x86) ((speed 3) (safety 1) (debug 0) (space 0)):
DDOT-long: 118.91 MFLOPS
DDOT-short: 133.05 MFLOPS
DAXPY-long: 71.97 MFLOPS
DAXPY-short: 125.73 MFLOPS
ACL 6.0 Enterprise (Linux/x86) ((speed 3) (safety 0) (debug 0) (space 0)):
DDOT-long: 119.44 MFLOPS
DDOT-short: 163.18 MFLOPS
DAXPY-long: 68.48 MFLOPS
DAXPY-short: 124.13 MFLOPS
GCC (Linux/x86) (no opt):
ddot-long 162.28 MFLOPS
ddot-short 186.41 MFLOPS
daxpy-long 68.91 MFLOPS
daxpy-short 324.88 MFLOPS
GCC (Linux/x86) (-O):
ddot-long 238.97 MFLOPS
ddot-short 920.68 MFLOPS
daxpy-long 69.85 MFLOPS
daxpy-short 1077.78 MFLOPS
[gcc -O2,3 gave worse performance!]
Looks like ACL 6.0 either doesn't have Pentium IV optimizations or I don't know
how to turn them on.
Solaris/SPARC:
SunOS <host omitted> 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-60
sparcv9 at 360MHz, has sparcv9 FP processor
512MB RAM
ACL 6.0 Enterprise (Solaris/SPARC) ((speed 1) (safety 1) (debug 2) (space 1))
DDOT-long: 23.38 MFLOPS
DDOT-short: 43.23 MFLOPS
DAXPY-long: 27.42 MFLOPS
DAXPY-short: 56.51 MFLOPS
ACL 6.0 Enterprise (Solaris/SPARC) ((speed 3) (safety 0) (debug 0) (space 0))
DDOT-long: 23.59 MFLOPS
DDOT-short: 43.16 MFLOPS
DAXPY-long: 27.31 MFLOPS
DAXPY-short: 56.57 MFLOPS
ucbcc (Solaris/SPARC) (-fast):
Segmentation Fault (core dumped)
GCC (Solaris/SPARC) (-O3):
ddot-long 22.42 MFLOPS
ddot-short 63.76 MFLOPS
Segmentation Fault (core dumped)
[ I had to add free(x); free(y); at end of test() to even get it this far! ]
--
; Matthew Danish <mda...@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
GCC (Solaris/SPARC) (-O3):
daxpy-long 21.26 MFLOPS
daxpy-short 58.80 MFLOPS
> for (i=0; i<Nlong; i++)
> x[i] = 0.0; y[i] = 0.9;
I receive a segmentation fault on Solaris using gcc and the Sun
compiler. One problem I noticed is the for loop above in test(). There
are two statements there:
x[i] = 0.0;
y[i] = 0.9;
You must use { ... } when there is more than one statement. Once I do
this, the code runs fine on Solaris. I have not had a chance to test it
further. Is this what you intended:
for (i=0; i<Nlong; i++) {
x[i] = 0.0;
y[i] = 0.9;
}
?
--
Shaun Rowland row...@cis.ohio-state.edu
http://www.cis.ohio-state.edu/~rowland/
> Now, what I would like to have is some more data, about how Lisp
> implementations run this program. Especially, I would be interested
> with CMUCL on SUN workstations, ACL, Lispworks, ... on X86 and other
> architectures. If someone would like to test it, please go ahead.
> I'm very interested in the results. Please always report the results
> for the C program
With the for loop fix I mentioned before, these are the results I have
been able to produce using the provided code:
AMD Athlon(TM) XP1800+ (1.55GHz) Red Hat 7.3
============================================
CMUCL 18d
---------
DDOT-long: 103.24 MFLOPS
DDOT-short: 530.90 MFLOPS
DAXPY-long: 89.18 MFLOPS
DAXPY-short: 530.90 MFLOPS
gcc 2.96 (no optimization)
--------------------------
ddot-long 88.58 MFLOPS
ddot-short 170.98 MFLOPS
daxpy-long 78.89 MFLOPS
daxpy-short 166.47 MFLOPS
gcc 2.96 (-O3 optimization)
---------------------------
ddot-long 110.34 MFLOPS
ddot-short 736.70 MFLOPS
daxpy-long 97.15 MFLOPS
daxpy-short 891.07 MFLOPS
Solaris 8 4x480MHz US-II (Ultra Enterprise 450)
===============================================
CMUCL 18c
---------
DDOT-long: 22.41 MFLOPS
DDOT-short: 59.06 MFLOPS
DAXPY-long: 29.34 MFLOPS
DAXPY-short: 85.22 MFLOPS
CMUCL 18d
---------
DDOT-long: 21.93 MFLOPS
DDOT-short: 65.79 MFLOPS
DAXPY-long: 33.68 MFLOPS
DAXPY-short: 91.00 MFLOPS
gcc 3.0.4 (no optimization)
---------------------------
ddot-long 15.08 MFLOPS
ddot-short 28.96 MFLOPS
daxpy-long 13.62 MFLOPS
daxpy-short 26.06 MFLOPS
gcc 3.0.4 (-O3 optimization)
----------------------------
ddot-long 22.07 MFLOPS
ddot-short 84.41 MFLOPS
daxpy-long 22.03 MFLOPS
daxpy-short 77.25 MFLOPS
Sun cc v. 6 update 1 (no optimization)
--------------------------------------
ddot-long 14.87 MFLOPS
ddot-short 28.29 MFLOPS
daxpy-long 12.24 MFLOPS
daxpy-short 21.30 MFLOPS
Sun cc v. 6 update 1 (-fast optimization with -xarch=native [v8plusa])
----------------------------------------------------------------------
ddot-long 38.04 MFLOPS
ddot-short 416.99 MFLOPS
daxpy-long 24.29 MFLOPS
daxpy-short 152.52 MFLOPS
GCC (Linux/x86) (no opt):
ddot-long 117.57 MFLOPS
ddot-short 187.06 MFLOPS
daxpy-long 72.01 MFLOPS
daxpy-short 327.86 MFLOPS
GCC (Linux/x86) (-O):
ddot-long 119.35 MFLOPS
ddot-short 921.67 MFLOPS
daxpy-long 72.01 MFLOPS
daxpy-short 1083.22 MFLOPS
ucbcc (Solaris/SPARC) (-fast):
ddot-long 42.52 MFLOPS
ddot-short 317.68 MFLOPS
daxpy-long 24.38 MFLOPS
daxpy-short 115.46 MFLOPS
GCC (Solaris/SPARC) (-O3):
ddot-long 22.42 MFLOPS
ddot-short 64.53 MFLOPS
daxpy-long 21.26 MFLOPS
daxpy-short 58.74 MFLOPS
1. When using short (IEEE single precision) floating point, code
written in C will likely be noticeably faster than the equivalent
code written in Lisp.
2. When using long (IEEE double precision) floating point, code
written in C will have approximately the same performance as the
equivalent code written in Lisp.
The "short" measurements actually used a smaller array of doubles than
the "long" test. The reason (following the comment in the code) is that
the long test should exceed the caches. On my Duron 800 both parts seem
to have exceeded the caches.
<homer>D'oh!</homer>
Ok, so when using a short array (one that fits in the cache), the C code
should run noticeably faster than the Lisp code. When using a long array
(which does not fit in the cache), the Lisp and the C code seem to take
about the same amount of time (probably because cache refill dominates).
any fix for the lispworks problem ?
Igor.
Of course - you can use the code from my posting in this thread, which
fixes this. The original code used the old LOOP syntax for declaring the
types of LOOP variables.
;; Old Syntax Example
(loop with n fixnum = 5)
;; New Syntax Example
(loop with n of-type fixnum = 5)
For backwards compatibility one can use the old syntax for the types
FIXNUM, FLOAT, T, and NIL. It is unclear to me whether this is meant to
include subtypes of those (like the DOUBLE-FLOAT in the code of the
original posting).
http://www.xanalys.com/software_tools/reference/HyperSpec/Body/06_aag.htm
describes the case in detail.
I would recommend always using OF-TYPE when declaring types for LOOP
variables.
[...]
>
> (defun daxpy (x y n)
> (declare (type fixnum n)
> (type (simple-array double-float (*)) x y))
> (declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
> (loop with s double-float = 0.3d0
> for i from 0 below n do
> (setf (aref x i) (+ (* s (aref y i))))))
[...]
Probably, the last line in DAXPY should be
(setf (aref x i) (+ (aref x i) (* s (aref y i))))))
Honestly, I don't understand what you are trying to measure in this
test. Is it the speed of floating-point operations, or the speed of
accessing elements of Lisp and C arrays?
On my Linux box with 1.2 GHz Athlon and CMU Common Lisp 18d+:
;;original version of mflop.lisp
DDOT-long: 72.94 MFLOPS
DDOT-short: 451.39 MFLOPS
DAXPY-long: 65.87 MFLOPS
DAXPY-short: 413.77 MFLOPS
;; Version of mflop.lisp with the last line in DAXPY fixed (see above)
DDOT-long: 72.94 MFLOPS
DDOT-short: 449.97 MFLOPS
DAXPY-long: 51.97 MFLOPS
DAXPY-short: 356.72 MFLOPS
Results for C version
GCC-2.95.3
(gcc -o mflop -O3 -malign-functions=4 -malign-loops=4 \
-funroll-loops -fexpensive-optimizations mflop.c)
ddot-long 124.57 MFLOPS
ddot-short 574.19 MFLOPS
daxpy-long 71.41 MFLOPS
daxpy-short 845.47 MFLOPS
(gcc -o mflop -O3 mflop.c)
ddot-long 100.99 MFLOPS
ddot-short 574.96 MFLOPS
daxpy-long 58.92 MFLOPS
daxpy-short 892.92 MFLOPS
(gcc -o mflop -O mflop.c)
ddot-long 78.41 MFLOPS
ddot-short 572.66 MFLOPS
daxpy-long 65.73 MFLOPS
daxpy-short 750.87 MFLOPS
GCC-3.1
(gcc -o mflop -O3 -falign-functions=4 -falign-loops=4 \
-funroll-loops -fexpensive-optimizations mflop.c)
ddot-long 123.08 MFLOPS
ddot-short 574.96 MFLOPS
daxpy-long 72.73 MFLOPS
daxpy-short 884.65 MFLOPS
(gcc-3.1 -o mflop -O3 mflop.c)
ddot-long 78.29 MFLOPS
ddot-short 575.73 MFLOPS
daxpy-long 58.25 MFLOPS
daxpy-short 896.65 MFLOPS
(gcc-3.1 -o mflop -O mflop.c)
ddot-long 123.67 MFLOPS
ddot-short 576.51 MFLOPS
daxpy-long 68.36 MFLOPS
daxpy-short 692.74 MFLOPS
> Nicolas Neuss <Nicola...@iwr.uni-heidelberg.de> writes:
>
> > for (i=0; i<Nlong; i++)
> > x[i] = 0.0; y[i] = 0.9;
>
> I receive a segmentation fault on Solaris using gcc and the Sun
> compiler. One problem I noticed is the for loop above in test(). There
> are two statements there:
>
> x[i] = 0.0;
> y[i] = 0.9;
>
> You must use { ... } when there is more than one statement. Once I do
> this, the code runs fine on Solaris. I have not had a chance to test it
> further. Is this what you intended:
>
> for (i=0; i<Nlong; i++) {
> x[i] = 0.0;
> y[i] = 0.9;
> }
Of course. Here we see drastically the advantage of using Lisp
instead of C for getting correct code :-)
Nicolas.
>
> The CLISP results are depressing.
>
> Igor.
Do not be too depressed. For many problems, performance depends on
other factors, e.g. how fast the network connection is, or the
efficiency of built-in functions like SORT. Also note that you can
get very efficient matrix routines simply by using the Matlisp
interface to the Fortran BLAS/LAPACK routines (see
http://sourceforge.net/projects/matlisp). This would probably even
surpass the C speed, at least for long array computations, where the
generic function overhead can be neglected.
Even if I'm not using it myself, I guess that CLISP is a nice
development environment. Furthermore, it runs on many platforms. And
should you need native code compilation, you would be able to switch
to another implementation quite easily (i.e., with only small
changes, like using OF-TYPE in loop declarations :-)
Yours, Nicolas.
> Nicolas Neuss <Nicola...@iwr.uni-heidelberg.de> writes:
>
>
> [...]
>
> >
> > (defun daxpy (x y n)
> > (declare (type fixnum n)
> > (type (simple-array double-float (*)) x y))
> > (declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
> > (loop with s double-float = 0.3d0
> > for i from 0 below n do
> > (setf (aref x i) (+ (* s (aref y i))))))
>
> [...]
>
> Probably, the last line in DAXPY should be
> (setf (aref x i) (+ (aref x i) (* s (aref y i))))))
Of course. Even better is using incf as Jochen noted. And to be
completely correct with the naming one should also switch x and y.
(incf (aref y i) (* s (aref x i)))
> Honestly, I don't understand what you are trying to measure in this
> test. Is it the speed of floating-point operations, or the speed of
> accessing elements of Lisp and C arrays?
You're right. I have changed the comment for mflop-test now to:
"Returns several numbers characteristic of the efficiency with which
your CL implementation will process numeric code written in a C/Fortran
style. These results should also be significant for other code that
uses arrays to achieve good data locality. Please compare these numbers
to those obtained by the C version in mflop.c."
I have put updated code at
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop.c
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop.lisp
Nicolas
First I want to thank you very much for all the responses I got.
Second, I guess that my code needs some updates which are provided
here:
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop.c
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop.lisp
I changed the following:
1. Fixed the bug discovered by Jochen Schmidt and Alexander Skobelev
in the Lisp daxpy routine.
2. Fixed the bug discovered by Shaun Rowland and Roland Kaufmann
(private mail) in the C program.
3. Inserted OF-TYPE in my loop type declarations. Does it work on
Lispworks now out of the box?
4. Changed the interface of the test routine to take both function and
name analogous to the C code. For me this makes no difference, but
for faster machines with slow symbol-lookup this might be
significant.
5. Changed the comment of mflop-test to reflect Alexander Skobelev's
point. These values will be significant for all code using uniform
arrays. Vector operations on long vectors are just an important
example of such code.
Thanks again,
Nicolas.
> 4. Changed the interface of the test routine to take both function and
> name analogous to the C code. For me this makes no difference, but
> for faster machines with slow symbol-lookup this might be
> significant.
I forgot: this was pointed out to me by Paul Foley in private mail.
Nicolas.
Here are some more numbers...
Machine: Intel(R) Pentium(R) 4 CPU 1700MHz
** gcc -O3:
ddot-long 269.12 MFLOPS
ddot-short 624.27 MFLOPS
daxpy-long 174.45 MFLOPS
daxpy-short 523.78 MFLOPS
** cmucl 18d:
DDOT-long: 271.83 MFLOPS
DDOT-short: 483.67 MFLOPS
DAXPY-long: 168.83 MFLOPS
DAXPY-short: 334.50 MFLOPS
So, looks pretty good, right? C and lisp are roughly the same in
performance. Well, unfortunately for cmucl, we get much better
results using Intel's compiler (version 6.0):
** icc -xW -ip -tpp7 -O3
ddot-long 295.53 MFLOPS
ddot-short 1410.50 MFLOPS
daxpy-long 179.02 MFLOPS
daxpy-short 1651.91 MFLOPS
Now lisp doesn't look so good, and neither does gcc. But there's some
good news... using the BLAS routines from Intel's MKL, we get:
** icc -xW -ip -tpp7 -O3 -lmkl_p4
ddot-long 291.74 MFLOPS
ddot-short 1533.92 MFLOPS
daxpy-long 182.53 MFLOPS
daxpy-short 1214.98 MFLOPS
Presumably, you could get these same results from lisp or gcc by using
the appropriate BLAS calls.
Which, I think, leads to a question about what it is you are trying to
do. Rather than worrying about how you can most efficiently implement
numeric kernels for your programs, wouldn't it make more sense to
figure out how to best use the highly optimized numeric codes that are
already out there? For just about anything you want to do, someone's
written a killer fortran subroutine to do just that. Why try to
reinvent it in lisp?
Rob Malouf
malouf at let dot rug dot nl
> Now lisp doesn't look so good, and neither does gcc. But there's some
> good news... using the BLAS routines from Intel's MKL, we get:
>
> ** icc -xW -ip -tpp7 -O3 -lmkl_p4
>
> ddot-long 291.74 MFLOPS
> ddot-short 1533.92 MFLOPS
> daxpy-long 182.53 MFLOPS
> daxpy-short 1214.98 MFLOPS
>
> Presumably, you could get these same results from lisp or gcc by using
> the appropriate BLAS calls.
Yes, I mentioned in my reply to Igor Carron that there even exists a
CL interface for BLAS/LAPACK called Matlisp.
> Which, I think, leads to a question about what it is you are trying to
> do. Rather than worrying about how you can most efficiently implement
> numeric kernels for your programs, wouldn't it make more sense to
> figure out how to best use the highly optimized numeric codes that are
> already out there? For just about anything you want to do, someone's
> written a killer fortran subroutine to do just that. Why try to
> reinvent it in lisp?
This is a model situation. My actual application is a pde solver on
unstructured grids where similar operations occur on small vectors.
To be reasonably fast I will probably have to inline such operations
into more complex functions. This needs more flexibility than a FFI
gets you. For handling larger linear algebra problems involving block
matrices and vectors I use Matlisp, of course.
Nicolas
Thanks Jochen,
The Lispworks personal edition now tells me :
Error: can't make array of 8388624 bytes.
So both commercial versions in their personal/trial versions can't go
through the benchmark.
Igor.
I agree; however, the Matlisp build doesn't seem to be available
for the Windows OS. It could actually be rebuilt, given a compiler,
using either Open Watcom or gcc/Cygwin, but that looks like a full-time
job :-)
> Even if I'm not using it, I guess that CLISP is a nice development
> environment. Furthermore, it is running on many platforms. And when
> you should need native code compilation you should be able to switch
> to another implementation quite easily. (i.e., with only small
> changes like using OF-TYPE in loop declarations :-)
Sure, but like you, I am really trying to figure out if Lisp can do
rapidly the things I already know how to do in Fortran or C. It looks
like it can on the right platform (CMUCL is not available on the Windows
OS). So unless I spend some time building Matlisp, the rapid
prototyping capability in Lisp will not be available to me... then
again, I could also set up a dual boot on my current laptop.
Thanks,
Igor.
Here is another data point for Corman Lisp:
win XP, celeron, 900 MHz
DDOT-long: 0.581552 MFLOPS
DDOT-short: 0.533124 MFLOPS
DAXPY-long: 0.200635 MFLOPS
DAXPY-short: 0.919238 MFLOPS
Igor.
> Thanks Jochen,
>
> The Lispworks personal edition now tells me :
> Error: can't make array of 8388624 bytes.
>
> So both commercial versions in their personal/trial versions can't go
> through the benchmark.
I fixed that too in the version posted elsewhere in this thread. The
longer array exceeds the maximum array size of LispWorks. If you make it
half as big, it fits within the array dimension limit.
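For reference, the change amounts to halving the long-vector size in
mflop.lisp, roughly like this (a sketch; the exact limit is given by the
implementation's ARRAY-TOTAL-SIZE-LIMIT and may differ between versions):

```lisp
;; Original: (defconstant +N-long+ #x100000)  ; 2^20 doubles = 8 MB, too big
;; Halved so the array fits under LispWorks' array size limit while
;; still being much larger than the secondary cache:
(defconstant +N-long+ #x80000)   ; 2^19 doubles = 4 MB
```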
Actually, I had hoped that Corman Lisp would be an alternative to
ACL/Lispworks on Windows:-(
Thanks, Nicolas.
P.S.: Somehow I had in mind that Corman Lisp would compile to native
code, but now I assume that it does not. Can someone say more?
Igor> I agree; however, the build for matlisp doesn't seem to be available
Igor> for Windows. It could probably be rebuilt given a compiler,
Igor> using either Open Watcom or gcc/cygwin, but this looks like a full-time
Igor> job :-)
I believe there is (was?) a version of matlisp using ACL on Windows.
I do not know if this still works.
>> Even if I'm not using it, I guess that CLISP is a nice development
>> environment. Furthermore, it is running on many platforms. And when
>> you should need native code compilation you should be able to switch
>> to another implementation quite easily. (i.e., with only small
>> changes like using OF-TYPE in loop declarations :-)
Igor> Sure, but like you, I am really trying to figure out if Lisp can do
Igor> rapidly the things I already know how to do in Fortran or C. It looks
Igor> like it can on the right platform (CMUCL is not available on
Igor> Windows). So unless I spend some time building matlisp, the rapid
Igor> prototyping capability in Lisp will not be available to me... then
Igor> again, I could also dual boot my current laptop.
You may want to try CLISP or GCL on Windows. While matlisp doesn't
support either at this time, it shouldn't be difficult to get it
working on either of them, if you know how their FFIs work.
The only real requirement is that you be able to access the actual
memory used by Lisp arrays. Even if you can't, you can always do a
copy-in/out, but that can cause severe degradation in performance,
obviously.
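A copy-in/out wrapper might look roughly like this (a sketch only; the
%ALLOC-DOUBLES, %SET-DOUBLE, %FOREIGN-DDOT, and %FREE names are
hypothetical stand-ins for whatever primitives the implementation's FFI
actually provides):

```lisp
;; Hedged sketch of the copy-in/out strategy: copy the Lisp data into a
;; foreign buffer, call the foreign routine, then release the buffer.
;; The copying adds O(n) work per call, which is why direct access to
;; the Lisp array's memory is preferable when the FFI allows it.
(defun ddot-via-copy (x y n)
  (let ((fx (%alloc-doubles n))          ; hypothetical foreign allocation
        (fy (%alloc-doubles n)))
    (unwind-protect
        (progn
          (dotimes (i n)                 ; copy-in
            (%set-double fx i (aref x i))
            (%set-double fy i (aref y i)))
          (%foreign-ddot n fx fy))       ; the actual foreign BLAS call
      (%free fx)                         ; always release foreign memory
      (%free fy))))
```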
Ray
It does compile to native code. I expect that there is something
wrong that is causing those awful numbers.
Can somebody else check these numbers?
Igor.
> >
> > P.S.: Somehow I had in mind that Corman Lisp would compile to native
> > code, but now I assume that it does not. Can someone say more?
>
> It does compile to native code. I expect that there is something
> wrong that is causing those awful numbers.
Another possibility would be that it does not have uniform
double-float arrays or that it does not eliminate the generic
arithmetic.
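That hypothesis can be checked with standard Common Lisp: if the
implementation specializes double-float arrays, the elements are stored
unboxed; if not, every AREF yields a boxed float and the arithmetic
stays generic.

```lisp
;; Returns DOUBLE-FLOAT if the implementation has specialized
;; double-float arrays; returns T if it falls back to general vectors
;; of boxed floats.
(upgraded-array-element-type 'double-float)

;; The same information for a concrete array:
(array-element-type
 (make-array 16 :element-type 'double-float :initial-element 0.0d0))
```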
To Igor:
1. Did you use the program from
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop.lisp
unchanged?
2. Does it help to use this version (avoids loop):
http://cox.iwr.uni-heidelberg.de/~neuss/misc/mflop2.lisp
Nicolas.
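The loop-avoiding variant presumably rewrites the LOOP-based inner
product with a plain DO loop, along these lines (a sketch of what such
a loop-free DDOT might look like, not the actual text of mflop2.lisp):

```lisp
;; Same semantics as the LOOP version of DDOT from mflop.lisp, but
;; using DO and an explicit accumulator, which weaker LOOP
;; implementations sometimes compile better.
(defun ddot (x y n)
  (declare (type fixnum n)
           (type (simple-array double-float (*)) x y)
           (optimize (safety 0) (space 0) (debug 0) (speed 3)))
  (let ((sum 0.0d0))
    (declare (type double-float sum))
    (do ((i 0 (1+ i)))
        ((>= i n) sum)
      (declare (type fixnum i))
      (incf sum (* (aref x i) (aref y i))))))
```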
Since Corman Lisp is fully compiled, all the time, there is a
lot it can do to easily get far better performance. Right now
all floating point operations are out-of-line function calls,
and all floating point values (at least IEEE single and double)
are boxed and heap-allocated. Much of the time in this
benchmark is measuring garbage collection times.
In the 2.0 release, simple arrays of unboxed single and double
floats have been added, and this will lead to support for
inline floating point operations. Presumably in the
next (post-2.0) release you will see significantly better numbers.
At the moment, using 2.0, I am getting these results on a
1.0-ghz Athlon:
DDOT-long: 1.258809 MFLOPS
DDOT-short: 1.295908 MFLOPS
DAXPY-long: 1.83115 MFLOPS
DAXPY-short: 1.887128 MFLOPS
On my system using Corman Lisp 1.5, I get this:
DDOT-long: 1.156547 MFLOPS
DDOT-short: 1.167793 MFLOPS
DAXPY-long: 0.336824 MFLOPS
DAXPY-short: 1.694285 MFLOPS
Probably there is some speedup in 2.0 due to the reduced
storage requirements by supporting arrays of unboxed floats.
We realize these are low relative to some other compilers.
We expect to be able to produce floating point performance
comparable to the fastest compilers in the near future.
Corman Lisp is still actively evolving, if slowly, because of
limited development resources. It is to a large degree
driven by market demand: the things that most people
are asking for are the things we prioritize.
Roger
-----------------------------------------
Nicolas Neuss <Nicola...@iwr.uni-heidelberg.de> wrote in message news:<87vg7mu...@ortler.iwr.uni-heidelberg.de>...
straight out of your examples:
Corman Lisp 1.5 Copyright © 2001 Roger Corman. All rights reserved.
+N-LONG+
+N-SHORT+
*MFLOP-DELTA*
MAKE-DOUBLE-FLOAT-ARRAY
;;; Autoloading C:\Program Files\Corman Tools\Corman Lisp
1.5\sys\loop.lisp ...
DDOT
DAXPY
TEST
MFLOP-TEST
DDOT-long: 0.638327 MFLOPS
DDOT-short: 0.634276 MFLOPS
DAXPY-long: 0.147311 MFLOPS
DAXPY-short: 0.573144 MFLOPS
NIL
;;;
;;;and then mflop2
;;;
Corman Lisp 1.5 Copyright © 2001 Roger Corman. All rights reserved.
+N-LONG+
+N-SHORT+
*MFLOP-DELTA*
MAKE-DOUBLE-FLOAT-ARRAY
DDOT
DAXPY
TEST
MFLOP-TEST
ddot-long: 0.770556 MFLOPS
ddot-short: 0.786894 MFLOPS
;;; An error occurred in function +:
;;; Error: Cannot call function '+' with these operands: #< UNKNOWN
OBJECT TYPE: #xBFFC15 > and 0.0d0
;;; Entering Corman Lisp debug loop.
;;; Use :C followed by an option to exit. Type :HELP for help.
;;; Restart options:
;;; 1 Abort to top level.
Win XP, Celeron 900 MHz. I tried that on other machines (Win 2000, NT)
and got similar results.
Igor.