
How much faster can this be done?


Emre Sevinc

May 19, 2007, 9:38:28 AM
Hello Lisp meisters,

I was reading a blog about the performance of implementations of a
very simple Pythagorean triplets algorithm in different languages:

http://tempe.st/2007/05/erlang-ruby-and-php-battle-it-out/

And I wondered what I could do with Common Lisp (SBCL 0.9.17). I have
tried two versions:

(defconstant +max+ 5000
  "The constant value up to which the main loop will run")

(defun pythagorean-triplet ()
  (let ((a 0) (i 0))
    (loop for c from 2 to +max+
          do (loop for b from 1 below c
                   do (progn
                        (setf a (sqrt (- (* c c) (* b b))))
                        (when (= (floor a) a) (incf i)))))
    (print i)))

(defun pythagorean-triplet-optimized-for-speed ()
  (declare (optimize (speed 3) (debug 0) (safety 0))
           (type single-float a)
           (type fixnum i))
  (let ((a 0) (i 0))
    (loop for c from 2 to +max+
          do (loop for b from 1 below c
                   do (progn
                        (setf a (sqrt (- (* c c) (* b b))))
                        (when (= (ffloor a) a) (incf i)))))
    (print i)))

Then I timed them:

CL-USER> (time (pythagorean-triplet))
13413
Evaluation took:
5.015 seconds of real time
3.87941 seconds of user run time
0.035995 seconds of system run time
[Run times include 0.25 seconds GC run time.]
0 calls to %EVAL
0 page faults and
299,940,200 bytes consed.


CL-USER> (time (pythagorean-triplet-optimized-for-speed))
13413
Evaluation took:
1.571 seconds of real time
1.433782 seconds of user run time
0.002999 seconds of system run time
[Run times include 0.151 seconds GC run time.]
0 calls to %EVAL
0 page faults and
199,960,776 bytes consed.


I wonder if the pythagorean-triplet-optimized-for-speed function can be
made faster by tweaking the compilation parameters (without touching
the structure of the loop).


Maybe some of you will try and show me (or maybe SBCL versions > 1 got
faster, I don't know).


The machine I ran this on was a Debian system with the following
specifications:


debian:/home/fz# uname -a
Linux debian 2.6.11-1-686 #1 Mon Jun 20 22:00:38 MDT 2005 i686 GNU/Linux

fz@debian:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Celeron(R) CPU 2.40GHz
stepping : 1
cpu MHz : 2392.778
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips : 4718.59

Cheers
--
Emre Sevinc

Chris Russell

May 19, 2007, 10:25:46 AM
> I wonder if pythagorean-triplet-optimized-for-speed function can be
> made faster by tweaking the compilation parameters (without touching
> the structure of the loop).
>
> Maybe some of you will try and show me (or maybe SBCL > 1 versions got
> faster, I don't know).
>

You need to put more (the fixnum _) forms in there to guarantee that the
numbers won't overflow. By default, Common Lisp has to assume that the
sum/product of two fixnums is an integer, not another fixnum.

(defun pythagorean-triplet-re-optimized-for-speed ()
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (let ((a 0.0) (i 0))
    (declare (type single-float a) (type fixnum i)) ; moved to avoid warnings
    (loop for c from 2 to (the fixnum +max+)
          do (loop for b from 1 below c
                   do (progn
                        (setf a (sqrt (the fixnum (- (the fixnum (* c c))
                                                     (the fixnum (* b b))))))
                        (when (= (the single-float (ffloor a)) a) ; strangely this seemed to help
                          (incf i)))))
    (print i)))

CL-USER> (time (pythagorean-triplet-re-optimized-for-speed))

13413
Evaluation took:
1.366 seconds of real time
1.336084 seconds of user run time
0.008001 seconds of system run time
[Run times include 0.116 seconds GC run time.]


0 calls to %EVAL
0 page faults and

199,961,352 bytes consed.
13413
CL-USER> (time (pythagorean-triplet-optimized-for-speed))

13413
Evaluation took:
1.66 seconds of real time
1.624102 seconds of user run time
0.040002 seconds of system run time
[Run times include 0.124 seconds GC run time.]


0 calls to %EVAL
0 page faults and

199,957,592 bytes consed.
13413

Bob Felts

May 19, 2007, 11:51:00 AM
Chris Russell <christophe...@gmail.com> wrote:

[...]

Why is this consing so much?

Chris Russell

May 19, 2007, 1:06:01 PM
> Why is this consing so much?

I don't know. I assume these single-floats must be passed by
reference and that this is where it's coming from.
More seriously, these algorithms are flawed.

> (- (+ (expt 4576 2) (expt 2015 2))
     (expt 5000 2))

gives 1, but (4576, 2015, 5000) is considered an acceptable solution by
the first algorithm due to the rounding error.

I only noticed this when I tried:

(defun pythagorean-triplet ()
  (let ((c2-b2 0) (a 0) (i 0))
    (loop for c from 2 to 5000 do
      (loop for b from 1 below c do
        (progn
          (setf c2-b2 (- (* c c) (* b b))
                a (isqrt c2-b2))
          (when (= c2-b2 (* a a))
            (incf i)))))
    (print i)))

Which has an output of
11362

Fortunately, passing double-floats instead of fixnums to sqrt gives
you the right answer, but triples the amount of consing again.
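
For illustration, a minimal sketch of the rounding error in question (my own
example; the exact printed values assume a 24-bit single-float mantissa, as in
SBCL):

(let ((c2-b2 (- (* 5000 5000) (* 2015 2015)))) ; = 20939775, which is 4576^2 - 1
  (values (sqrt c2-b2)             ; single-float: 20939775 rounds to 20939776.0, so => 4576.0
          (sqrt (float c2-b2 0d0)) ; double-float: => 4575.99989...
          (isqrt c2-b2)))          ; exact integer square root: => 4575

The single-float result looks like a perfect square, so the first algorithm
counts (2015, 4576, 5000) even though it is not a triple.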

Jon Harrop

May 19, 2007, 5:31:54 PM
Chris Russell wrote:
> (defun pythagorean-triplet()
> (let ((c2-b2 0) (a 0) (i 0))
> (loop for c from 2 to 5000 do
> (loop for b from 1 below c do
> (progn
> (setf c2-b2 (- (* c c) (* b b))
> a (isqrt c2-b2))
> (when (= c2-b2 (* a a))
> (incf i)))))
> (print i)))

I think this is the only Lisp posted so far that gives the same answer as
the original program. As far as timings, I get:

OCaml: 0.341s
C: 0.384s
Lisp: 9.233s

--
Dr Jon D Harrop, Flying Frog Consultancy
The F#.NET Journal
http://www.ffconsultancy.com/products/fsharp_journal/?usenet

Rainer Joswig

May 19, 2007, 6:19:21 PM
In article <464f6e03$0$8714$ed26...@ptn-nntp-reader02.plus.net>,
Jon Harrop <j...@ffconsultancy.com> wrote:

> Chris Russell wrote:
> > (defun pythagorean-triplet()
> > (let ((c2-b2 0) (a 0) (i 0))
> > (loop for c from 2 to 5000 do
> > (loop for b from 1 below c do
> > (progn
> > (setf c2-b2 (- (* c c) (* b b))
> > a (isqrt c2-b2))
> > (when (= c2-b2 (* a a))
> > (incf i)))))
> > (print i)))
>
> I think this is the only Lisp posted so far that gives the same answer as
> the original program. As far as timings, I get:
>
> OCaml: 0.341s
> C: 0.384s
> Lisp: 9.233s

___________________________
/| /| | |
||__|| | Please don't |
/ O O\__ feed |
/ \ the trolls |
/ \ \ |
/ _ \ \ ----------------------
/ |\____\ \ ||
/ | | | |\____/ ||
/ \|_|_|/ | __||
/ / \ |____| ||
/ | | /| | --|
| | |// |____ --|
* _ | |_|_|_| | \-/
*-- _--\ _ \ // |
/ _ \\ _ // | /
* / \_ /- | - | |
* ___ c_c_c_C/ \C_c_c_c____________

--
http://lispm.dyndns.org

Message has been deleted

Wolfram Fenske

May 19, 2007, 7:33:35 PM
Jon Harrop <j...@ffconsultancy.com> writes:

> Chris Russell wrote:
>> (defun pythagorean-triplet()
>> (let ((c2-b2 0) (a 0) (i 0))
>> (loop for c from 2 to 5000 do
>> (loop for b from 1 below c do
>> (progn
>> (setf c2-b2 (- (* c c) (* b b))
>> a (isqrt c2-b2))
>> (when (= c2-b2 (* a a))
>> (incf i)))))
>> (print i)))
>
> I think this is the only Lisp posted so far that gives the same answer as
> the original program. As far as timings, I get:
>
> OCaml: 0.341s
> C: 0.384s
> Lisp: 9.233s

For some reason ISQRT seems to be very slow. Replacing

(isqrt c2-b2)

with

(truncate (sqrt c2-b2))

brings down the runtime on my computer from 9.000s to 0.734s, while
still yielding the same result (11362, right?). Here it is in full:

--8<---------------cut here---------------start------------->8---
(defun pythagorean-triplet ()
  (let ((c2-b2 0) (a 0) (i 0))
    (loop for c from 2 to 5000 do
      (loop for b from 1 below c do
        (progn
          (setf c2-b2 (- (* c c) (* b b))
                ;; a (isqrt c2-b2)
                a (truncate (sqrt c2-b2)))
          (when (= c2-b2 (* a a))
            (incf i)))))
    (print i)))

--8<---------------cut here---------------end--------------->8---

As has been mentioned, this solution allocates a lot of memory,
ca. 100 MB using SBCL 1.0.5. If that could be eliminated, I assume
it'd be as fast as the C or OCaml versions. Unfortunately I don't
know enough about CL optimization to do this. None of the type
declarations I tried had any effect.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

Chris Russell

May 19, 2007, 7:34:18 PM

> I think this is the only Lisp posted so far that gives the same answer as
> the original program. As far as timings, I get:

Well, it's the only program guaranteed to keep giving the right answer
as the numbers get bigger and bigger.
This program works over the same range as the C or OCaml versions:

(defun pythagorean-triple2 ()
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (let ((i 0))
    (declare (type fixnum i))
    (loop for c from 2d0 to 5000d0 do
      (loop for b from 1d0 below c do
        (progn
          (let* ((c2-b2 (- (* c c) (* b b)))
                 (a (ffloor (the double-float (sqrt c2-b2)))))
            (when (= c2-b2 (* a a))
              (incf i))))))
    (print i)))

But time-wise it's off by a factor of 5 or so, probably due to the
boxing of the floats.

Evaluation took:
1.792 seconds of real time

Does anyone know how to turn this off?

Chris Russell

May 19, 2007, 7:49:56 PM
On 20 May, 00:33, Wolfram Fenske <i...@gmx.net> wrote:
> Jon Harrop <j...@ffconsultancy.com> writes:
> > Chris Russell wrote:
> >> (defun pythagorean-triplet()
> >> (let ((c2-b2 0) (a 0) (i 0))
> >> (loop for c from 2 to 5000 do
> >> (loop for b from 1 below c do
> >> (progn
> >> (setf c2-b2 (- (* c c) (* b b))
> >> a (isqrt c2-b2))
> >> (when (= c2-b2 (* a a))
> >> (incf i)))))
> >> (print i)))
>
> > I think this is the only Lisp posted so far that gives the same answer as
> > the original program. As far as timings, I get:
>
> > OCaml: 0.341s
> > C: 0.384s
> > Lisp: 9.233s
>
> For some reason ISQRT seems to be very slow. Replacing
>
> (isqrt c2-b2)
>
> with
>
> (truncate (sqrt c2-b2))
>
> brings down the runtime on my computer from 9.000s to 0.734s, while
> still yielding the same result (11362, right?). Here it is in full:

Cool. :)

Jon Harrop

May 20, 2007, 9:24:43 AM
j.oke wrote:
> Now we get:
>
> OCaml: 0.341s [not tested]
> C: 0.384s [not tested]
> Lisp: 0.0s [tested!]

Can you post your code so that we can benchmark compile times?

Jon Harrop

May 20, 2007, 9:33:00 AM
Wolfram Fenske wrote:
> For some reason ISQRT seems to be very slow. Replacing
>
> (isqrt c2-b2)
>
> with
>
> (truncate (sqrt c2-b2))
>
> brings down the runtime on my computer from 9.000s to 0.734s, while
> still yielding the same result (11362, right?).

Yes.

> As has been mentioned, this solution allocates a lot of memory,
> ca. 100 MB using SBCL 1.0.5. If that could be eliminated, I assume
> it'd be as fast as the C or OCaml versions. Unfortunately I don't
> know enough about CL optimization to do this. None of the type
> declarations I tried had any effect.

This is much faster. Now I get:

C: 0.299s
OCaml: 0.341s
Lisp: 0.454s

with better compiler options for the C code (-O3 -ffast-math).

Jon Harrop

May 20, 2007, 9:43:06 AM
Chris Russell wrote:
>> I think this is the only Lisp posted so far that gives the same answer as
>> the original program. As far as timings, I get:
>
> Well it's the only program guaranteed to keep giving the right answer
> as the numbers get bigger and bigger.

When that happens, this program will be taking almost 500 years to run. :-)

Kjetil S. Matheussen

May 20, 2007, 10:19:36 AM

On Sat, 19 May 2007, Jon Harrop wrote:

> Chris Russell wrote:
>> (defun pythagorean-triplet()
>> (let ((c2-b2 0) (a 0) (i 0))
>> (loop for c from 2 to 5000 do
>> (loop for b from 1 below c do
>> (progn
>> (setf c2-b2 (- (* c c) (* b b))
>> a (isqrt c2-b2))
>> (when (= c2-b2 (* a a))
>> (incf i)))))
>> (print i)))
>
> I think this is the only Lisp posted so far that gives the same answer as
> the original program. As far as timings, I get:
>
> OCaml: 0.341s
> C: 0.384s
> Lisp: 9.233s

C: 0.604s
Scheme: 0.478s

scheme
------
"
(define (pythag max)
  (let ((i 0))
    (let loop ((c 2))
      (if (<= c max)
          (begin
            (let loop ((b 1))
              (if (< b c)
                  (begin
                    (let ((a (sqrt (- (* c c) (* b b)))))
                      (if (= (floor a) a)
                          (set! i (+ 1 i))))
                    (loop (+ 1 b)))))
            (loop (+ 1 c)))))
    (display i)
    (newline)))
(pythag 5000)
"
stalin -On -copt -O3 -copt -ffast-math pythag.scm
time ./pythag

C
-
"
#include <stdio.h>
#include <math.h>
#define MAX 5000

int main() {
  double c, b;
  int i = 0;
  double a;

  for (c = 2; c <= MAX; c++) {
    for (b = 1; b < c; b++) {
      a = sqrt(c*c - b*b);
      if (trunc(a) == a) {
        i++;
      }
    }
  }
  printf("%d\n", i);
}
"
gcc -O3 -ffast-math pythag_c.c -lm
time ./a.out

Kjetil S. Matheussen

May 20, 2007, 10:45:56 AM

On Sun, 20 May 2007, Kjetil S. Matheussen wrote:

>
>
> On Sat, 19 May 2007, Jon Harrop wrote:
>
>> Chris Russell wrote:
>> > (defun pythagorean-triplet()
>> > (let ((c2-b2 0) (a 0) (i 0))
>> > (loop for c from 2 to 5000 do
>> > (loop for b from 1 below c do
>> > (progn
>> > (setf c2-b2 (- (* c c) (* b b))
>> > a (isqrt c2-b2))
>> > (when (= c2-b2 (* a a))
>> > (incf i)))))
>> > (print i)))
>>
>> I think this is the only Lisp posted so far that gives the same answer as
>> the original program. As far as timings, I get:
>>
>> OCaml: 0.341s
>> C: 0.384s
>> Lisp: 9.233s
>
> C: 0.604s
> Scheme: 0.478s
>

Further experimentation shows:

C: 0.620s (median of 5)
Scheme/Lisp: 0.408s (median of 5)

gcc -O3 -ffast-math -fomit-frame-pointer -freg-struct-return
-malign-double pythag_c.c -lm

../stalin-0.11/stalin -Ob -Om -On -Or -Ot -dP -dC -dh -q -d -db
-no-clone-size-limit -split-even-if-no-widening -copt -O3
-copt -fomit-frame-pointer -copt -freg-struct-return -copt -ffast-math
pythag.scm

Wade Humeniuk

May 20, 2007, 11:35:26 AM

None of the proposed solutions is guaranteed to give the correct
solution since they all resort to using imprecise floats. I am
beginning to hate the warm fuzzies. Try this instead.

(defparameter +max+ 5000)

(defparameter *perfect-square-lookup*
  (let ((lookup (make-hash-table)))
    (loop for i from 0 to (1+ +max+)
          for i2 = (* i i) do
            (setf (gethash i2 lookup) i2))
    lookup))

(defun perfect-square (n) (gethash n *perfect-square-lookup*))

(defun pythagorean-triplet (&aux (n 0))
  (loop for c from 2 to +max+ do
    (loop for b from 1 below c
          when (perfect-square (- (* c c) (* b b)))
            do (incf n)))
  n)

CL-USER> (time (pythagorean-triplet))
Evaluation took:
1.471 seconds of real time
1.469061 seconds of user run time
9.93e-4 seconds of system run time


0 calls to %EVAL
0 page faults and

0 bytes consed.
11362
CL-USER>


Wade

vy

May 20, 2007, 12:02:21 PM
On May 19, 4:38 pm, Emre Sevinc <emre.sev...@gmail.com> wrote:
> (defconstant +max+ 5000
> "The constant value up to which the main loop will run")
>
> (defun pythagorean-triplet()
> (let ((a 0) (i 0))
> (loop for c from 2 to +max+
> do
> (loop for b from 1 below c
> do (progn
> (setf a (sqrt (- (* c c) (* b b))))
> (when (= (floor a) a) (incf i)))))
> (print i)))

I'm not sure but maybe it would also be possible to optimize the
algorithm to skip triples with bogus parities.

(defun pythagorean-triplet (lim)
  (loop for c from 2 to (the fixnum lim)
        for cc = (* c c)
        for cp = (mod c 2)
        sum (loop for b from 1 below c
                  for bp = (mod b 2)
                  for aa = (- cc (* b b))
                  for a = (sqrt aa)
                  for ap = (mod aa 2)
                  count (and (or (= 0 ap bp cp)              ; E + E = E
                                 (and (= 1 ap bp) (= 0 cp))  ; O + O = E
                                 (and (= 1 ap cp) (= 0 bp))  ; O + E = O
                                 (and (= 1 bp cp) (= 0 ap))) ; E + O = O
                             (= (floor a) a)))))

It seems like the parity tests cost much more than they help.
Maybe others know how to optimize them anyway. (In principle, the parity
tests cut the search space in half.)


Regards.

Message has been deleted

Chris Russell

May 20, 2007, 1:12:00 PM
On May 20, 4:35 pm, Wade Humeniuk <whume...@telus.net.no.spam> wrote:
> None of the proposed solutions is guaranteed to give the correct
> solution since they all resort to using imprecise floats. I am
> beginning to hate the warm fuzzies. Try this instead.

The HyperSpec is a bit ambiguous about whether ISQRT is allowed to be
just a rounded float SQRT or not.
It says:

isqrt returns the greatest integer less than or equal to the *exact*
positive square root of natural.

But there is also a note at the bottom that reads:

(isqrt x) == (values (floor (sqrt x)))

but it is potentially more efficient.

So I looked at the SBCL source code to see what was actually going on
under the hood, and this is the relevant code.

;;; From discussion on comp.lang.lisp and Akira Kurihara.


(defun isqrt (n)
  #!+sb-doc
  "Return the root of the nearest integer less than n which is a perfect
square."
  (declare (type unsigned-byte n) (values unsigned-byte))
  ;; Theoretically (> n 7), i.e., n-len-quarter > 0.
  (if (and (fixnump n) (<= n 24))
      (cond ((> n 15) 4)
            ((> n 8) 3)
            ((> n 3) 2)
            ((> n 0) 1)
            (t 0))
      (let* ((n-len-quarter (ash (integer-length n) -2))
             (n-half (ash n (- (ash n-len-quarter 1))))
             (n-half-isqrt (isqrt n-half))
             (init-value (ash (1+ n-half-isqrt) n-len-quarter)))
        (loop
         (let ((iterated-value
                (ash (+ init-value (truncate n init-value)) -1)))
           (unless (< iterated-value init-value)
             (return init-value))
           (setq init-value iterated-value))))))

I'd say it is also appropriate for calculating the exact solution.
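
A small check along those lines (my own example, relying only on standard CL
behaviour): ISQRT stays exact where a float SQRT has already lost the low bits.

(let ((n (expt (1- (expt 2 26)) 2))) ; a perfect square whose root needs more than 24 bits
  (values (isqrt n)                  ; => 67108863, exact
          (sqrt n)))                 ; => 6.7108864e7, because n rounds up when coerced to a single-float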

Jon Harrop

May 20, 2007, 1:14:10 PM
Wade Humeniuk wrote:
> None of the proposed solutions is guaranteed to give the correct
> solution since they all resort to using imprecise floats.

You mean if the program is run for years it will eventually suffer from
imprecision? I think we have more important things to worry about...

Jon Harrop

May 20, 2007, 1:19:34 PM
j.oke wrote:
> If you can live with compromises (in order to give you some
> superficial(!) advantages), you might consider to switch to some other
> language...

What if the superficial disadvantage is "Lisp lost me a $1M trade because it
hung whilst garbage collecting a big int I never needed" or "nobody is
buying my computer game because Lisp has generalised the player coordinates
to arbitrary-precision numbers and crippled the frame rate"?

The fact is, Lisp is slow and it puts people off.

Juho Snellman

May 20, 2007, 1:45:02 PM
Emre Sevinc <emre....@gmail.com> wrote:
> (defun pythagorean-triplet()
> (let ((a 0) (i 0))
> (loop for c from 2 to +max+
> do
> (loop for b from 1 below c
> do (progn
> (setf a (sqrt (- (* c c) (* b b))))
> (when (= (floor a) a) (incf i)))))
> (print i)))

(deftype small-positive-double-float ()
  `(double-float 0d0 ,(float most-positive-fixnum 0d0)))

(defun pythagorean-triplet ()
  (declare (optimize speed (safety 0)))
  (loop for c from 2 to 5000
        sum (loop for b from 1 below c
                  count (let ((a (the small-positive-double-float
                                      (sqrt (float (- (* c c) (* b b)) 0d0)))))
                          (= (float (truncate a) 0d0) a)))))

There are a few things to note here:

* SBCL can open-code SQRT on values known to be positive floats (for
example on x86-64 into a single sqrtsd instruction).
* SBCL will only open-code TRUNCATE (or FLOOR) on floats when the
result is known to fit into a machine word.
* SBCL won't open-code = on arguments that aren't known to be of the
same types.
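
One way to check this on a given build (standard introspection, nothing
specific to this thread) is to look at the compiled code:

(disassemble 'pythagorean-triplet) ; on x86-64 you should see an inline SQRTSD rather than a call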

CL-USER> (time (print (pythagorean-triplet)))
11362
Evaluation took:
0.16 seconds of real time
0.159975 seconds of user run time
0.0 seconds of system run time


0 calls to %EVAL
0 page faults and

8,192 bytes consed.

CL-USER> (values (lisp-implementation-type)
(lisp-implementation-version)
(machine-type))
"SBCL"
"1.0.5.54"
"X86-64"

--
Juho Snellman

Rainer Joswig

May 20, 2007, 1:46:21 PM
In article <4650847e$0$8738$ed26...@ptn-nntp-reader02.plus.net>,
Jon Harrop <j...@ffconsultancy.com> wrote:

> j.oke wrote:
> > If you can live with compromises (in order to give you some
> > superficial(!) advantages), you might consider to switch to some other
> > language...
>
> What if the superficial disadvantage is "Lisp lost me a $1M trade because it
> hung whilst garbage collecting a big int I never needed" or "nobody is
> buying my computer game because Lisp has generalised the player coordinates
> to arbitrary-precision numbers and crippled the frame rate"?
>
> The fact is, Lisp is slow and it puts people off.

___________________________

Wolfram Fenske

May 20, 2007, 1:54:32 PM
Wade Humeniuk <whum...@telus.net.no.spam> writes:

> None of the proposed solutions is guaranteed to give the correct
> solution since they all resort to using imprecise floats. I am
> beginning to hate the warm fuzzies. Try this instead.

From the "memory is cheap" front comes this modification that replaces
the hash table with a simple vector:

--8<---------------cut here---------------start------------->8---
(defconstant +max+ 5000)

(defun make-lookup (size)
  (let* ((lookup (make-array (* size size) :initial-element nil)))
    (dotimes (i size lookup)
      (setf (svref lookup (* i i)) t))))

(defun pythagorean-triplet ()
  (let* ((n 0)
         (lookup (make-lookup (1+ +max+))))
    (loop for c from 2 to +max+ do
      (loop for b from 1 below c
            when (svref lookup (- (* c c) (* b b)))
              do (incf n)))
    (print n)))
--8<---------------cut here---------------end--------------->8---

This sped things up by a factor of 2 using SBCL 1.0.5. The hash table
version takes 2.173s on my machine, the vector version 0.974s.

Kent M Pitman

May 20, 2007, 2:01:58 PM
Jon Harrop <j...@ffconsultancy.com> writes:

> j.oke wrote:
> > If you can live with compromises (in order to give you some
> > superficial(!) advantages), you might consider to switch to some other
> > language...
>
> What if the superficial disadvantage is "Lisp lost me a $1M trade because it
> hung whilst garbage collecting a big int I never needed"

Garbage collection is something done by the Sun JVM and the Microsoft
CLR. Why don't you worry about the billions they are losing, instead
of the paltry millions we're losing. Sounds like a real business
opportunity for you.

Or is it possible that this is just a made up problem and that
ordinary quality assurance is responsible for catching it? Do things
get through QA? Sure. But that doesn't make it the responsibility of
the tool.

Incidentally, there's also an unspoken presupposition in this post that
somehow if it had been done in some other language (I can't imagine
which you were thinking of), this wouldn't happen.

What would happen if I lost a million dollars because my statically
type-checked application nevertheless yielded a bad result?
(Oops. It turns out that some design errors still yield effects at
runtime.) I hear you saying "That would just be bad design." or
"This would be caught by QA, and too bad if someone doesn't QA their
application." As if somehow those reasons aren't equally true of
dynamic languages.

The difference is that dynamic language users tend to know they have
to test their programs and make them robust against bad data. Too
often in static languages, I've seen people assume that after all the
hassle of getting something to compile, they have an entitlement for
it to just work... and then it fails mysteriously at runtime because
they forget that it IS possible for statically checked programs to
have runtime errors.

I've seen plenty of statically typed applications crash mysteriously
on null errors or other bad-data errors (e.g., a list of homogeneously
typed foo elements was expected to be of known length, so was declared
[foo], when really it had a type limitation no one encoded, so it
slipped past the type checker but produced bad data something else
never expected because it thought the upstream facility was strongly
checking everything).

> or "nobody is buying my computer game because Lisp has generalised
> the player coordinates to arbitrary-precision numbers and crippled
> the frame rate"?

Wow, it got all the way through to release without needing to be tested
and ran on the first try, just a little slowly? Cool.

Does that happen a lot in ML? Someone writes an entire game application
and never tests it, then compiles it and runs it and it plays fine
AND fast on the first try?

Or is it possible that this is just a completely contrived example
not typical of anything?

> The fact is, Lisp is slow and it puts people off.

If you're so put off, why are you still posting here?

Jon Harrop

May 20, 2007, 4:07:50 PM
Kent M Pitman wrote:
> Incidentally, there's also an unspoken presupposition in this post that
> somehow if it had been done in some other language...

You've really given up on doing this in Lisp? I had expected to get a
competitive solution with only a few more days work... ;-)

Jon Harrop

May 20, 2007, 4:28:51 PM
Juho Snellman wrote:
> (deftype small-positive-double-float ()
> `(double-float 0d0 ,(float most-positive-fixnum 0d0)))
>
> (defun pythagorean-triplet ()
> (declare (optimize speed (safety 0)))
> (loop for c from 2 to 5000
> sum (loop for b from 1 below c
> count (let ((a (the small-positive-double-float
> (sqrt (float (- (* c c) (* b b))
> 0d0)))))
> (= (float (truncate a) 0d0) a)))))

This is the fastest Lisp so far and its run time is faster than everything
except the macro solution that j.oke hinted at.

So this is clearly doing significant work at compile time.

For a fair comparison, you either need to include compilation time or not
hardcode the 5000.

Including compile times I get:

C: 0.399s
OCaml: 0.478s
SBCL: 0.992s

Tweaking the programs to accept N as an argument and then measuring only run
times, I get:

C: 0.291s
OCaml: 0.350s
SBCL: 0.885s

So Lisp is around 3x slower than C from this benchmark. OCaml is only 20%
slower.

Frank Buss

May 20, 2007, 4:56:25 PM
Jon Harrop wrote:

> Tweaking the programs to accept N as an argument and then measuring only run
> times, I get:
>
> C: 0.291s
> OCaml: 0.350s
> SBCL: 0.885s
>
> So Lisp is around 3x slower than C from this benchmark. OCaml is only 20%
> slower.

3x to 5x slower than C is the same result I had with other micro-benchmarks
like this, but who cares? I don't think this can be generalized to
larger applications, because typically there are only a few hotspots, which
could be implemented in C and called from Lisp. For the rest of the
program it is more important to have the flexibility Lisp provides, to make
it easier for developers to write and maintain programs. And with
incremental GCs, like modern Lisp implementations provide, there is no
significant pause, e.g. when using Lisp for games.

Some more reasons why a good GC is faster than manual memory management:
http://www.jwz.org/doc/gc.html
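
For what it's worth, a minimal sketch of that "hotspot in C" approach, assuming
the CFFI library is loaded (the library name and the trivial example function
are placeholders, not something from this thread):

(cffi:load-foreign-library "libm.so.6")            ; hypothetical: the C math library on Linux
(cffi:defcfun ("sqrt" c-sqrt) :double (x :double)) ; expose the C function to Lisp
(c-sqrt 2.0d0)                                     ; => 1.4142135623730951d0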

--
Frank Buss, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Jon Harrop

May 20, 2007, 5:01:34 PM
Frank Buss wrote:
> Some more reasons why a good GC is faster than manual memory managment:
> http://www.jwz.org/doc/gc.html

So you'd recommend the fastest language with GC?

Juho Snellman

May 20, 2007, 5:29:33 PM
Jon Harrop <j...@ffconsultancy.com> wrote:
> This is the fastest Lisp so far and its run time is faster than everything
> except the macro solution that j.oke hinted at.
>
> So this is clearly doing significant work at compile time.

It's not doing any of the computation at compile time. "Clearly" your
prejudices are showing.

> For a fair comparison, you either need to include compilation time or not
> hardcode the 5000.
>
> Including compile times I get:
>
> C: 0.399s
> OCaml: 0.478s
> SBCL: 0.992s

Compiling that function takes less than 0.03 seconds on my machine. I
suggest that your measurements are complete bullshit.

> Tweaking the programs to accept N as an argument and then measuring only run
> times, I get:
>
> C: 0.291s
> OCaml: 0.350s
> SBCL: 0.885s
>
> So Lisp is around 3x slower than C from this benchmark. OCaml is only 20%
> slower.

Declaring the type of N (for example as an (unsigned-byte 14)) would
fix that.
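
That is, something along these lines (a sketch of the declaration Juho
describes, reusing his SMALL-POSITIVE-DOUBLE-FLOAT type; the body is otherwise
unchanged):

(defun pythagorean-triplet (n)
  (declare (optimize speed (safety 0))
           (type (unsigned-byte 14) n))
  (loop for c from 2 to n
        sum (loop for b from 1 below c
                  count (let ((a (the small-positive-double-float
                                      (sqrt (float (- (* c c) (* b b)) 0d0)))))
                          (= (float (truncate a) 0d0) a)))))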

--
Juho Snellman

Wolfram Fenske

May 20, 2007, 5:36:53 PM
Jon Harrop <j...@ffconsultancy.com> writes:

> Juho Snellman wrote:
>> (deftype small-positive-double-float ()
>> `(double-float 0d0 ,(float most-positive-fixnum 0d0)))
>>
>> (defun pythagorean-triplet ()
>> (declare (optimize speed (safety 0)))
>> (loop for c from 2 to 5000
>> sum (loop for b from 1 below c
>> count (let ((a (the small-positive-double-float
>> (sqrt (float (- (* c c) (* b b))
>> 0d0)))))
>> (= (float (truncate a) 0d0) a)))))
>
> This is the fastest Lisp so far and its run time is faster than everything
> except the macro solution that j.oke hinted at.
>
> So this is clearly doing significant work at compile time.

No, it isn't. All it does is use some type declarations that help
SBCL avoid boxing and use inline arithmetic where possible.

> For a fair comparison, you either need to include compilation time

No, since nothing is pre-computed during compilation time.

> or not hardcode the 5000.

[...]

> Tweaking the programs to accept N as an argument and then measuring
> only run times, I get:

[...]

> So Lisp is around 3x slower than C from this benchmark. OCaml is
> only 20% slower.

I'm assuming the first few lines of your tweaked version look like
this?

--8<---------------cut here---------------start------------->8---
(defun pythagorean-triplet (n)
  (declare (optimize speed (safety 0)))
  (loop for c from 2 to n
        ...))
--8<---------------cut here---------------end--------------->8---

If that is so, SBCL's type inferencer cannot assume that n is a
machine integer, which means it also cannot assume that c is a machine
integer, etc.. This leads to inefficient code.

Here is a version that takes n as a parameter, annotates even more
types than Juho's version, and runs even faster. I'm sorry I had to
switch from LOOP to the DO macro, but using LOOP, SBCL kept giving me
performance warnings.

--8<---------------cut here---------------start------------->8---


(deftype small-positive-double-float ()
  `(double-float 0d0 ,(float most-positive-fixnum 0d0)))

(defun pythagorean-triplet (n)
  (declare (optimize speed (safety 0))
           (type fixnum n))
  (do ((c 2 (1+ c))
       (count 0))
      ((> c n)
       count)
    (declare (type fixnum c count))
    (do ((b 1 (1+ b)))
        ((>= b c))
      (declare (type fixnum b))
      (let ((a (the small-positive-double-float
                    (sqrt (float (- (the fixnum (* c c))
                                    (the fixnum (* b b)))
                                 0d0)))))
        (when (= (float (truncate a) 0d0) a)
          (incf count))))))
--8<---------------cut here---------------end--------------->8---

I get 0.446183s for this Lisp version and 0.355s for the C version.
Measurements do not include compilation times.

Jon Harrop

May 20, 2007, 7:15:42 PM
Juho Snellman wrote:
> It's not doing any of the computation at compile time.

Wolfram's results prove otherwise.

Wolfram Fenske

May 20, 2007, 7:49:35 PM
Jon Harrop <j...@ffconsultancy.com> writes:

> Juho Snellman wrote:
>> It's not doing any of the computation at compile time.
>
> Wolfram's results prove otherwise.

Huh? How so?

Kjetil S. Matheussen

May 20, 2007, 8:34:18 PM

On Sun, 20 May 2007, Jon Harrop wrote:

> Juho Snellman wrote:
>> (deftype small-positive-double-float ()
>> `(double-float 0d0 ,(float most-positive-fixnum 0d0)))
>>
>> (defun pythagorean-triplet ()
>> (declare (optimize speed (safety 0)))
>> (loop for c from 2 to 5000
>> sum (loop for b from 1 below c
>> count (let ((a (the small-positive-double-float
>> (sqrt (float (- (* c c) (* b b))
>> 0d0)))))
>> (= (float (truncate a) 0d0) a)))))
>
> This is the fastest Lisp so far and its run time is faster than everything
> except the macro solution that j.oke hinted at.
>

No. Not correct. As I posted, the Stalin version was about 1.5 times
faster than the C version, and therefore probably also faster than the
OCaml version. Even better, the Stalin version wasn't even optimized; I
just wrote down the most straightforward version, compiled it, and bang,
1.5 times faster than C.

Wade Humeniuk

May 20, 2007, 9:11:05 PM
Wolfram Fenske <in...@gmx.net> writes:

> Wade Humeniuk <whum...@telus.net.no.spam> writes:
>
>> None of the proposed solutions is guaranteed to give the correct
>> solution since they all resort to using imprecise floats. I am
>> beginning to hate the warm fuzzies. Try this instead.
>
>>From the "memory is cheap" front comes this modification that replaces
> the hash table with a simple vector:
>
> --8<---------------cut here---------------start------------->8---
> (defconstant +max+ 5000)
>
> (defun make-lookup (size)
> (let* ((lookup (make-array (* size size) :initial-element nil)))
> (dotimes (i size lookup)
> (setf (svref lookup (* i i)) t))))
>
> (defun pythagorean-triplet ()
> (let* ((n 0)
> (lookup (make-lookup (1+ +max+))))
> (loop for c from 2 to +max+ do
> (loop for b from 1 below c
> when (svref lookup (- (* c c) (* b b)))
> do (incf n)))
> (print n)))
> --8<---------------cut here---------------end--------------->8---
>
> This sped things up by a factor of 2 using SBCL 1.0.5. The hash table
> version takes 2.173s on my machine, the vector version 0.974s.
>

In the same vein,...

(defconstant +max+ 5000)

(defun make-lookup (size)
  (let* ((lookup (make-array (* size size) :element-type 'bit :initial-element 0)))
    (dotimes (i size lookup)
      (setf (sbit lookup (* i i)) 1))))

(defun pythagorean-triplet ()
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let* ((n 0)
         (lookup (make-lookup (1+ +max+))))
    (declare (fixnum n))
    (loop for c from 2 to +max+ do
      (loop for b from 1 below c
            unless (zerop (sbit lookup (- (* c c) (* b b))))
              do (incf n)))
    (print n)))


CL-USER> (time (pythagorean-triplet))

11362
Evaluation took:
0.07 seconds of real time
0.069843 seconds of user run time
2.73e-4 seconds of system run time


0 calls to %EVAL
0 page faults and

3,126,264 bytes consed.
11362
CL-USER>

Why this is so fast I am not sure, but disassembling...
you can see why.

CL-USER> (disassemble 'pythagorean-triplet)
; 11E9FC6A: C745F000000000 MOV DWORD PTR [EBP-16], 0 ; no-arg-parsing entry point
; C71: 8BDC MOV EBX, ESP
; C73: 83EC0C SUB ESP, 12
; C76: BA244E0000 MOV EDX, 20004
; C7B: 8B0540FCE911 MOV EAX, [#x11E9FC40] ; #<FDEFINITION object for MAKE-LOOKUP>
; C81: B904000000 MOV ECX, 4
; C86: 896BFC MOV [EBX-4], EBP
; C89: 8BEB MOV EBP, EBX
; C8B: FF5005 CALL DWORD PTR [EAX+5]
; C8E: 7302 JNB L0
; C90: 8BE3 MOV ESP, EBX
; C92: L0: 8BDA MOV EBX, EDX
; C94: C745EC08000000 MOV DWORD PTR [EBP-20], 8
; C9B: EB50 JMP L5
; C9D: L1: 8B75EC MOV ESI, [EBP-20]
; CA0: BA04000000 MOV EDX, 4
; CA5: EB39 JMP L4
; CA7: L2: 8B4DEC MOV ECX, [EBP-20]
; CAA: C1F902 SAR ECX, 2
; CAD: 0FAF4DEC IMUL ECX, [EBP-20]
; CB1: 8BFA MOV EDI, EDX
; CB3: C1FF02 SAR EDI, 2
; CB6: 0FAFFA IMUL EDI, EDX
; CB9: 29F9 SUB ECX, EDI
; CBB: 8BF9 MOV EDI, ECX
; CBD: C1FF02 SAR EDI, 2
; CC0: 8BCF MOV ECX, EDI
; CC2: C1E905 SHR ECX, 5
; CC5: 8B448B01 MOV EAX, [EBX+ECX*4+1]
; CC9: 8BCF MOV ECX, EDI
; CCB: D3E8 SHR EAX, CL
; CCD: 83E001 AND EAX, 1
; CD0: 8945F4 MOV [EBP-12], EAX
; CD3: 8B4DF4 MOV ECX, [EBP-12]
; CD6: C1E102 SHL ECX, 2
; CD9: 85C9 TEST ECX, ECX
; CDB: 752D JNE L6
; CDD: L3: 83C204 ADD EDX, 4
; CE0: L4: 39F2 CMP EDX, ESI
; CE2: 7CC3 JL L2
; CE4: 8B45EC MOV EAX, [EBP-20]
; CE7: 83C004 ADD EAX, 4
; CEA: 8945EC MOV [EBP-20], EAX
; CED: L5: 817DEC204E0000 CMP DWORD PTR [EBP-20], 20000
; CF4: 7EA7 JLE L1
; CF6: 8B55F0 MOV EDX, [EBP-16]
; CF9: 8B0544FCE911 MOV EAX, [#x11E9FC44] ; #<FDEFINITION object for PRINT>
; CFF: B904000000 MOV ECX, 4
; D04: FF75F8 PUSH DWORD PTR [EBP-8]
; D07: FF6005 JMP DWORD PTR [EAX+5]
; D0A: L6: 8B4DF0 MOV ECX, [EBP-16]
; D0D: 83C104 ADD ECX, 4
; D10: 894DF0 MOV [EBP-16], ECX
; D13: EBC8 JMP L3
; D15: 90 NOP
; D16: 90 NOP
; D17: 90 NOP
;
NIL
CL-USER>

Wade

Larry Clapp

May 20, 2007, 10:05:53 PM
/me feeds the troll.

On 2007-05-20, Jon Harrop <j...@ffconsultancy.com> wrote:
> Juho Snellman wrote:
>> (deftype small-positive-double-float ()
>> `(double-float 0d0 ,(float most-positive-fixnum 0d0)))
>>
>> (defun pythagorean-triplet ()
>> (declare (optimize speed (safety 0)))
>> (loop for c from 2 to 5000
>> sum (loop for b from 1 below c
>> count (let ((a (the small-positive-double-float
>> (sqrt (float (- (* c c) (* b b))
>> 0d0)))))
>> (= (float (truncate a) 0d0) a)))))
>
> This is the fastest Lisp so far and its run time is faster than
> everything except the macro solution that j.oke hinted at.
>
> So this is clearly doing significant work at compile time.

Others have disagreed with this statement, but ...

> For a fair comparison, you either need to include compilation time

... I'll disagree with this one.

Suddenly the Lisp program wins, through (you believe) compile-time
computation, and suddenly you change what you measure so that the Lisp
program doesn't win any more.

Why?

You might as well include the typing speed of the programmers too,
"for a fair comparison". Or, hmm, the square-footage of their houses,
"for a fair comparison".

You should define what you want up front, and whoever wins, wins.

Just prepare yourself for *us* to want different things (or in
different proportions) than *you*.

Kjetil S. Matheussen

May 20, 2007, 10:46:27 PM

On Mon, 21 May 2007, Kjetil S. Matheussen wrote:

>
>
> On Sun, 20 May 2007, Jon Harrop wrote:
>
>> Juho Snellman wrote:
>> > (deftype small-positive-double-float ()
>> > `(double-float 0d0 ,(float most-positive-fixnum 0d0)))
>> >
>> > (defun pythagorean-triplet ()
>> > (declare (optimize speed (safety 0)))
>> > (loop for c from 2 to 5000
>> > sum (loop for b from 1 below c
>> > count (let ((a (the small-positive-double-float
>> > (sqrt (float (- (* c c) (* b b))
>> > 0d0)))))
>> > (= (float (truncate a) 0d0) a)))))
>>
>> This is the fastest Lisp so far and its run time is faster than everything
>> except the macro solution that j.oke hinted at.
>>
>
> No. Not correct.

%!#$%

I thought the lisp benchmark result in your post was created running the
version above.

Wolfram Fenske

May 20, 2007, 11:41:16 PM
Wade Humeniuk <whum...@telus.net.no.spam> writes:

> Wolfram Fenske <in...@gmx.net> writes:
>
>> Wade Humeniuk <whum...@telus.net.no.spam> writes:
>>
>>> None of the proposed solutions is guaranteed to give the correct
>>> solution since they all resort to using imprecise floats. I am
>>> beginning to hate the warm fuzzies. Try this instead.
>>
>> From the "memory is cheap" front comes this modification that replaces
>> the hash table with a simple vector:

[...]

> In the same vein,...
>
> (defconstant +max+ 5000)
>
> (defun make-lookup (size)
> (let* ((lookup (make-array (* size size) :element-type 'bit :initial-element 0)))
> (dotimes (i size lookup)
> (setf (sbit lookup (* i i)) 1))))

[...]

> CL-USER> (time (pythagorean-triplet))
>
> 11362
> Evaluation took:
> 0.07 seconds of real time
> 0.069843 seconds of user run time
> 2.73e-4 seconds of system run time
> 0 calls to %EVAL
> 0 page faults and
> 3,126,264 bytes consed.
> 11362

What kind of machine have you got? I'm using SBCL 1.0.5 on an AMD64
2800+ or something, the disassemblies look identical, but for me

(time (pythagorean-triplet))

says:

--8<---------------cut here---------------start------------->8---
Evaluation took:
0.267 seconds of real time
0.25461 seconds of user run time
7.5e-5 seconds of system run time


0 calls to %EVAL
0 page faults and
3,126,264 bytes consed.

--8<---------------cut here---------------end--------------->8---

... which, by the way, is still *awesome* -- the best result so far,
even faster than the C version [1] I have.

Here's my slightly slower version with `+max+' as a parameter:

--8<---------------cut here---------------start------------->8---
(defun pythagorean-triplet (n)
  (declare (optimize speed (safety 0))
           (type fixnum n))
  (do ((c 2 (1+ c))
       (count 0)
       (lookup (make-lookup (1+ n))))
      ((> c n)
       count)
    (declare (type fixnum c count))
    (do ((b 1 (1+ b)))
        ((>= b c))
      (declare (type fixnum b))
      (unless (zerop (sbit lookup (the fixnum
                                       (- (the fixnum (* c c))
                                          (the fixnum (* b b))))))
        (incf count)))))
--8<---------------cut here---------------end--------------->8---

Timings:

--8<---------------cut here---------------start------------->8---
* (time (pythagorean-triplet 5000))

Evaluation took:
0.282 seconds of real time
0.270825 seconds of user run time
7.4e-5 seconds of system run time


0 calls to %EVAL
0 page faults and
3,126,264 bytes consed.
11362

--8<---------------cut here---------------end--------------->8---

Also faster than the C version (0.356 s) on my machine.


Footnotes:
[1] My C version doesn't use bit fields, though. It's a direct
translation of the original algorithm, using `sqrt' etc.

Alex Mizrahi

May 21, 2007, 2:55:40 AM
(message (Hello 'Jon)
(you :wrote :on '(Sun, 20 May 2007 21:07:50 +0100))
(

??>> Incidentally, there's also an unspoken presupposition in this post
??>> that somehow if it had been done in some other language...

JH> You've really given up on doing this in Lisp? I had expected to get a
JH> competitive solution with only a few more days work... ;-)

why do you discard competitive solution written in Scheme by Kjetil?

and why do you need these benchmarks? there are already lots of them. e.g.

http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=sbcl&lang2=ocaml

you can find SBCL there beats ocaml in 5 benchmarks, go better optimize
ocaml code..
(OTOH ocaml beats SBCL in 10 benchmarks, but very slightly, maybe it's just
because of 13x larger startup time of SBCL)

btw, you've mentioned GC time. binary-trees is a benchmark for GC made by
Hans Boehm. SBCL is 1.4 times faster than ocaml in it, and on par with C.

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"I am everything you want and I am everything you need")


Jon Harrop

May 21, 2007, 7:15:35 AM
Alex Mizrahi wrote:
> why do you discard competitive solution written in Scheme by Kjetil?

I haven't discarded it: I just haven't gotten around to looking at it yet
(Stalin doesn't run in 64-bit AFAIK).

> and why do you need these benchmarks?

We're just playing. :-)

> there are already lots of them. e.g.
>
>
http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=sbcl&lang2=ocaml
>
> you can find SBCL there beats ocaml in 5 benchmarks, go better optimize
> ocaml code..

My machine gives very different results: OCaml is only slower on
binary-trees.

> (OTOH ocaml beats SBCL in 10 benchmarks, but very slightly, maybe it's
> just because of 13x larger startup time of SBCL)
>
> btw, you've mentioned GC time. binary-trees is a benchmark for GC made by
> Hans Boehm. SBCL is 1.4 times faster than ocaml in it, and on par with C.

The benchmark definition is poor: I submitted several progressively faster
OCaml implementation that took time going down to zero seconds about two
years ago and they all got rejected.

The same optimizations can be applied to the Lisp, of course. So you aren't
really measuring anything.

Jon Harrop

May 21, 2007, 7:18:09 AM
Larry Clapp wrote:
>> For a fair comparison, you either need to include compilation time
>
> ... I'll disagree with this one.
>
> Suddenly the Lisp program wins, through (you believe) compile-time
> computation, and suddenly you change what you measure so that the Lisp
> program doesn't win any more.
>
> Why?

Other solutions also take zero run time but significant compile time (to do
the calculation). How do you propose we compare them?

Jon Harrop

May 21, 2007, 7:36:15 AM
Kjetil S. Matheussen wrote:
> No. Not correct. As I posted, the stalin version was about 1.5 times
> faster than the C version,

Stalin only runs on my 32-bit system so I haven't gotten around to it yet.

> Even better, the stalin version wasn't even optimized,

Unless you count the page of compiler flags. ;-)

> I
> just wrote down the most straigh-forward version, compiled it, and bang,
> 1.5 times faster than C.

This also indicates that the C could be improved.

Using ints instead of doubles and (int)(..) instead of trunc(..) makes this
C99 more than twice as fast (0.137s):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char *argv[]) {
  const int n = (argc == 2 ? atoi(argv[1]) : 5000);
  int i = 0;

  for (int c=2; c<=n; c++)
    for (int b=1; b<c; b++) {
      double a = sqrt(c*c - b*b);
      if ((int)a == a) i++;
    }
  printf("%d\n", i);
}
fireblade

May 21, 2007, 7:54:16 AM
On May 20, 7:19 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> j.oke wrote:
> > If you can live with compromises (in order to give you some
> > superficial(!) advantages), you might consider to switch to some other
> > language...
>
> What if the superficial disadvantage is "Lisp lost me a $1M trade because it
> hung whilst garbage collecting a big int I never needed" or "nobody is
> buying my computer game because Lisp has generalised the player coordinates
> to arbitrary-precision numbers and crippled the frame rate"?
>
> The fact is, Lisp is slow and it puts people off.
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy

Jon, you never made any game, so you don't know whether it is slow or not.
You never even made a game with OCaml to speak about it. IMHO, the
largest thing I ever made (a racing demo, ~20,000 lines, with
Lightfeather and Newton handling the physics) showed that unoptimised Lisp
is about 30% slower, something that people could live with
considering the ease of development with Lisp. And BTW, making a good
3D game is the hardest thing I ever tackled. People develop games with
C/C++/C# because that's what the GDKs and game engines support. Some use
higher-level languages for scripting or develop something in-house, but
that's another topic.

fireblade

May 21, 2007, 7:56:03 AM
On May 20, 7:46 pm, Rainer Joswig <jos...@lisp.de> wrote:
> In article <4650847e$0$8738$ed261...@ptn-nntp-reader02.plus.net>,

Sorry Rainer, but I really hate it when people talk trash about game
development; besides, I find this troll quite interesting.

fireblade

May 21, 2007, 8:08:15 AM
On May 20, 8:01 pm, Kent M Pitman <pit...@nhplace.com> wrote:

> Jon Harrop <j...@ffconsultancy.com> writes:
>
> > or "nobody is buying my computer game because Lisp has generalised
> > the player coordinates to arbitrary-precision numbers and crippled
> > the frame rate"?
>
> Wow, it got all the way through to release without needing to be tested
> and ran on the first try, just a little slowly? Cool.
>
> Does that happen a lot in ML? Someone writes an entire game application
> and never tests it, then compiles it and runs it and it plays fine
> AND fast on the first try?
>
> Or is it possible that this is just a completely contrived example
> not typical of anything?
>
> > The fact is, Lisp is slow and it puts people off.
>
> If you're so put off, why are you still posting here?

If he planned to make a game he would use a premade engine, either
commercial or open source. Making your own engine is better
left to people who have enough time, knowledge, and a good team to
help them. People with those talents don't spend time trolling in
forums; they build their engine. When are we going to see an engine built in
OCaml or F#?

And Jon, you had better take my advice and write some demos with
Irrlicht .NET to show us how good F# is. It'll give you some
necessary credentials to speak on this subject.

j.oke

May 21, 2007, 8:27:02 AM
Jon Harrop wrote:
> Other solutions also take zero run time but significant compile time (to do
> the calculation). How do you propose we compare them?

[off topic, whispering]

(You might consider changing your way of learning Lisp:
on Usenet it will take you more than a dozen years... ;-)

Dan Bensen

May 21, 2007, 8:37:14 AM
fireblade wrote:
> On May 20, 7:46 pm, Rainer Joswig <jos...@lisp.de> wrote:
>> Jon Harrop <j...@ffconsultancy.com> wrote:
>>> The fact is, Lisp is slow and it puts people off.
>> ___________________________
>> /| /| | |
>> ||__|| | Please don't |
>> / O O\__ feed |
>> / \ the trolls |
>> / \ \ |
>> / _ \ \ ----------------------
>> / |\____\ \ ||
>> / | | | |\____/ ||
>> / \|_|_|/ | __||
>> / / \ |____| ||
>> / | | /| | --|
>> | | |// |____ --|
>> * _ | |_|_|_| | \-/
>> *-- _--\ _ \ // |
>> / _ \\ _ // | /
>> * / \_ /- | - | |
>> * ___ c_c_c_C/ \C_c_c_c____________
>>
>> --http://lispm.dyndns.org
>
> Sorry Rainer but I really hate when people speak thrash game
> development, beside I find this troll quite interesthing.

I don't. I find him very boring and upsetting, and I appreciate
when people call out trolls for what they are.

The fact is that Lisp is a dynamically typed language, and dynamic
type checking takes time. That's no more an argument against Lisp
than against dozens of other popular languages. Even less so,
in fact, because Lisp has optional type declarations.
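
(As a concrete illustration of such an optional declaration, my example
rather than Dan's:

(defun hypotenuse (b c)
  (declare (type double-float b c)) ; optional; a compiler like SBCL may use it to avoid boxing
  (sqrt (+ (* b b) (* c c))))

The function works without the declaration; with it, the compiler is merely
allowed to specialise.)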

According to Peter Norvig and Greg Sullivan, the fact is that
dynamic typing is more expressive than static typing. Statically
typed languages don't have that expressiveness, and some people
are "put off" by that.

The fact is that dynamic typing is better for RAD. Statically
typed languages tend to quibble over minor type issues, and
some people are "put off" by that.

The fact is that trolls are disruptive and unhelpful, and *many*
people are put off by that.

--
Dan
www.prairienet.org/~dsb/

Rainer Joswig

May 21, 2007, 8:39:16 AM
In article <1179748563.2...@36g2000prm.googlegroups.com>,
fireblade <slobodan...@gmail.com> wrote:

Then discuss game development or other stuff via email with him.

--
http://lispm.dyndns.org

Larry Clapp

May 21, 2007, 8:43:34 AM
On 2007-05-21, Jon Harrop <j...@ffconsultancy.com> wrote:
> Larry Clapp wrote:
>>> For a fair comparison, you either need to include compilation time
>>
>> ... I'll disagree with this one.
>>
>> Suddenly the Lisp program wins, through (you believe) compile-time
>> computation, and suddenly you change what you measure so that the
>> Lisp program doesn't win any more.
>>
>> Why?
>
> Other solutions also take zero run time but significant compile time
> (to do the calculation). How do you propose we compare them?

Which part of

"You should define what you want up front, and whoever wins, wins."

did you not understand?

fireblade

May 21, 2007, 8:47:29 AM

Yes, but it makes a cool-looking page. Just imagine: Jon Harrop's
Road to Lisp.


fireblade

May 21, 2007, 8:51:21 AM
On May 21, 2:39 pm, Rainer Joswig <jos...@lisp.de> wrote:
> In article <1179748563.230818.255...@36g2000prm.googlegroups.com>,
> --http://lispm.dyndns.org
>
Ok

fireblade

May 21, 2007, 8:56:55 AM

Most people still believe in the waterfall method, planning everything
from the start. Layered architecture, the field where embedded languages
fit in perfectly, is of little use to someone who prefers a big-ball-of-mud
architecture.

Kjetil Svalastog Matheussen

May 21, 2007, 9:10:17 AM

On Mon, 21 May 2007, Jon Harrop wrote:

> Kjetil S. Matheussen wrote:
> > No. Not correct. As I posted, the stalin version was about 1.5 times
> > faster than the C version,
>
> Stalin only runs on my 32-bit system so I haven't gotten around to it yet.
>
> > Even better, the stalin version wasn't even optimized,
>
> Unless you count the page of compiler flags. ;-)
>
> > I
> > just wrote down the most straigh-forward version, compiled it, and bang,
> > 1.5 times faster than C.
>
> This also indicates that the C could be improved.
>
> Using ints instead of doubles and (int)(..) instead of trunc(..) makes this
> C99 more than twice as fast (0.137s):
>
> int main(int argc, char *argv[]) {
> const int n = (argc == 2 ? atoi(argv[1]) : 5000);
> int i = 0;
>
> for (int c=2; c<=n; c++)
> for (int b=1; b<c; b++) {
> double a = sqrt(c*c - b*b);
> if ((int)a == a) i++;
> }
> printf("%d\n", i);
> }
>
>

Well, for fun, here is the benchmark for Stalin (the one I posted
earlier), C (your version above), and OCaml (the one found at
http://tempe.st/2007/05/erlang-ruby-and-php-battle-it-out/).
Unfortunately, I don't have SBCL (my Linux lacks NPTL).


Compile options
---------------
gcc -O3 --fast-math
ocamlopt -rectypes -inline 100 -ffast-math -ccopt -O3
stalin -Ob -Om -On -Or -Ot -dP -dC -dh -q -d -db -no-clone-size-limit -split-even-if-no-widening -copt -O3 -copt -ffast-math

Compiler versions
-----------------
gcc 3.4.6
ocamlopt 3.09.3
stalin 0.11

Results
-------
gcc: 0.307s (best of 5)
OCaml: 0.538s (best of 5)
Stalin: 0.356s (best of 5)

jt

May 21, 2007, 9:15:01 AM

Something like "I learned Lisp by trolling on Usenet", or "my first
exposure to Lisp was when I decided to convert Lispers to my ugly
OCaml-like microshit language and, by the way, sell my stupid journal".

No way.

fireblade

May 21, 2007, 10:07:09 AM

How do you know that his journal is stupid? Wait a minute, are you
the Lisper that ordered it? :)

Sacha

May 21, 2007, 10:26:58 AM
Jon Harrop wrote:
[lots of stuff]

Great, you learned that some things can and should be computed at
compile time for maximum efficiency.

You also learned that a good algorithm performs way better, and that
the language used has little impact on that simple fact.

That's two nice stories for your journal, Jon.

Sacha

Wade Humeniuk

May 21, 2007, 10:52:28 AM
Wolfram Fenske <in...@gmx.net> writes:

>
> What kind of machine have you got? I'm using SBCL 1.0.5 on an AMD64
> 2800+ or something, the disassemblies look identical, but for me
>
> (time (pythagorean-triplet))
>
> says:
>
> --8<---------------cut here---------------start------------->8---
> Evaluation took:
> 0.267 seconds of real time
> 0.25461 seconds of user run time
> 7.5e-5 seconds of system run time
> 0 calls to %EVAL
> 0 page faults and
> 3,126,264 bytes consed.
> --8<---------------cut here---------------end--------------->8---
>
> ... which, by the way, is still *awesome* -- the best result so far,
> even faster than the C version [1] I have.
>

I have a new MacBook Pro. It's a Centrino Dual Core 2.16GHz
with a 4MB shared L2 cache. The cache is probably the thing
that makes the difference: the bit-vector can fit in it.
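
Back-of-the-envelope (my arithmetic, not Wade's): the lookup bit-vector holds
(* 5001 5001) = 25,010,001 bits,

(/ (* 5001 5001) 8.0) ; => ~3126250 bytes, roughly 3 MB

which matches the ~3,126,264 bytes consed reported above and fits comfortably
in a 4 MB L2 cache.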

SBCL 1.04.

Wade

Jon Harrop

May 21, 2007, 1:55:52 PM
Dan Bensen wrote:
> I find him very boring and upsetting

Please don't be upset by what I say. I shall try to be more diplomatic.

> The fact is that dynamic typing is better for RAD.

Dynamic typing can certainly be better for RAD under certain circumstances
(e.g. interop, GUI dev) but not all.

--
Dr Jon D Harrop, Flying Frog Consultancy

Joe Marshall

May 21, 2007, 2:33:13 PM
Dr Jon D Harrop of Flying F*** Consultancy finally hauls out this well-
worn saw:

> The fact is, Lisp is slow and it puts people off.

If I had a nickel for every time I heard this...
I think this counts as a lisp-specific instance of Godwin's law.

Anyone remember `Ahmed the Peaceman'?

Apparently you don't read what I post. In a conversation you
participated in on comp.lang.lisp just a bit over a year ago I wrote:

Allow me to point you at the Supercomputer Toolkit:
http://repository.readscheme.org/ftp/papers/berlinsurati-pepm94.pdf

``The Supercomputer Toolkit compiler performs partial evaluation
of data-independent programs expressed in the Scheme
dialect of Lisp by using the symbolic execution technique described
in previously published work by Berlin.''


Jacob Katzemelson at Technion has apparently used the Supercomputer
Toolkit to run weather prediction models:


``Conservative estimates performed for a typical mesoscale forecast
over the eastern Mediterranean (for 36 hour) suggest that were the
Toolkit constructed from ALPHA processors, 10 processors would do the
prediction in only about 13 minutes. A 36 hours simulation with full
physics for the whole earth will require 2 hours for 80 ALPHA
processors.''

If Lisp is slow, why then are MIT researchers using it for computation-
intensive programs?

What sort of crappy education are they giving at Robinson College
these days? It only takes a few seconds on Google to find enough
papers on high-performance computing in Lisp to make a decent text
book! If I were you I'd ask Robinson for my money back. It seems
they left you as ignorant as when you entered and certainly didn't
train you to change that sad situation.

Next time you feel like uttering a sentence with the word `fact' in it
(or `most people' or `vast majority', etc.) just type it into that
little rectangular box at the top of your browser and *find out* if
there is any truth to it *before* you go spouting off.

Frank Buss

unread,
May 21, 2007, 3:04:20ā€ÆPM5/21/07
to
Wade Humeniuk wrote:

That's nice. I've tried it with Visual C++ .NET 2003. The source should be
standard C++, so the timing is a bit coarse-grained, which is why I run
the loop 10 times:

#include <ctime>
#include <cstdlib>
#include <cstdio>
#include <cmath>
#include <vector>

using namespace std;

int main(int argc, char *argv[]) {

    const int n = argc == 2 ? atoi(argv[1]) : 5000;

    clock_t start = clock();
    int i = 0;
    for (int loop = 0; loop < 10; loop++) {
        vector<bool> lookup(n*n);
        for (int j = 0; j < n; j++) lookup[j*j] = true;
        for (int c = 2; c <= n; c++) {
            for (int b = 1; b < c; b++) {
                if (lookup[c*c - b*b]) i++;
            }
        }
    }
    printf("%i\n", i / 10);
    clock_t end = clock();

    float time = float(end-start) / float(CLK_TCK) / 10.0f;
    printf("time: %0.04f", time);

    return 0;
}

Output:

11362
time: 0.1921

On the same computer with SBCL 1.0.2 on Windows with your program:

* (time (pythagorean-triplet))

11362
Evaluation took:
0.234 seconds of real time
0.21875 seconds of user run time
0.015625 seconds of system run time


0 calls to %EVAL
0 page faults and
3,126,264 bytes consed.
11362

BTW: I've tried to loop the Lisp program 10 times, too, but it crashes all
the time on Windows. This is the modified program:

(defconstant +max+ 5000)

(defun make-lookup (size)
  (let* ((lookup (make-array (* size size) :element-type 'bit
                             :initial-element 0)))
    (dotimes (i size lookup)
      (setf (sbit lookup (* i i)) 1))))

(defun pythagorean-triplet ()
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let ((n 0))
    (loop repeat 10 do
      (let ((lookup (make-lookup (1+ +max+))))
        (declare (fixnum n))
        (loop for c from 2 to +max+ do
          (loop for b from 1 below c
                unless (zerop (sbit lookup (- (* c c) (* b b))))
                  do (incf n)))))
    (print (/ n 10))))

Calling with "(time (pythagorean-triplet))" gives:

fatal error encountered in SBCL pid 3708:
GC invariant lost, file "gencgc.c", line 833

LDB monitor

Looks like it still needs some work before there is a stable SBCL version on Windows.
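
For what it's worth, here is an untested sketch (a guess, not something
that has been run on that Windows build) that allocates the lookup
bit-vector once, outside the REPEAT loop, so each iteration stops
consing a fresh ~3 MB vector. Whether that sidesteps the GC invariant
failure is an open question.

(defun pythagorean-triplet-10x ()
  ;; Hypothetical variant: reuse one lookup table for all ten runs
  ;; instead of allocating a new bit-vector per iteration.
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let ((lookup (make-lookup (1+ +max+)))
        (n 0))
    (declare (type simple-bit-vector lookup) (fixnum n))
    (loop repeat 10 do
      (loop for c of-type fixnum from 2 to +max+ do
        (loop for b of-type fixnum from 1 below c
              unless (zerop (sbit lookup (- (* c c) (* b b))))
                do (incf n))))
    (print (/ n 10))))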

Finally some parts of the disassembly of the C++ program:

mov esi, edi
imul esi, edi
mov DWORD PTR _start$[esp+76], eax
xor ebp, ebp
lea eax, DWORD PTR [esi+31]
shr eax, 5
mov DWORD PTR _i$[esp+72], ebp
mov DWORD PTR $T10326[esp+76], esi
; Line 15
mov DWORD PTR $T10313[esp+76], ebp
mov DWORD PTR tv328[esp+76], eax
mov DWORD PTR tv242[esp+76], 10 ; 0000000aH
jmp SHORT $L5974
$L10623:
; Line 12
mov eax, DWORD PTR tv328[esp+76]
mov esi, DWORD PTR $T10326[esp+76]
npad 6
$L5974:
; Line 15
lea edx, DWORD PTR $T10313[esp+76]
push edx
push eax
lea ecx, DWORD PTR _lookup$6436[esp+88]
mov DWORD PTR _lookup$6436[esp+84], ebp
call ?_Construct_n@?$vector@IV?$allocator@I@std@@@std@@QAEXIABI@Z ;
std::vector<unsigned int,std::allocator<unsigned int> >::_Construct_n
push esi
lea ecx, DWORD PTR _lookup$6436[esp+80]
mov DWORD PTR __$EHRec$[esp+88], ebp
call ?_Trim@?$vector@_NV?$allocator@_N@std@@@std@@IAEXI@Z ;
std::vector<bool,std::allocator<bool> >::_Trim
; Line 16
mov ebx, DWORD PTR _lookup$6436[esp+84]
xor edx, edx
cmp edi, ebp
mov DWORD PTR __$EHRec$[esp+84], -1
jle SHORT $L6440
$L6438:
mov ecx, edx
imul ecx, edx
mov eax, ebp
add eax, ecx
mov ecx, eax
shr ecx, 5
lea esi, DWORD PTR [ebx+ecx*4]
and eax, 31 ; 0000001fH
mov ecx, 1
mov DWORD PTR tv406[esp+76], ecx
mov ecx, eax
mov eax, DWORD PTR tv406[esp+76]
shl eax, cl
mov ecx, DWORD PTR [esi]
or ecx, eax
inc edx
cmp edx, edi
mov DWORD PTR [esi], ecx
jl SHORT $L6438
xor ebp, ebp
$L6440:
; Line 17
mov eax, 2
cmp edi, eax
mov DWORD PTR _c$6496[esp+76], eax
jl SHORT $L6499
$L6497:
; Line 18
mov edx, 1
cmp eax, edx
jle SHORT $L6498
; Line 19
mov edi, eax
imul edi, eax
$L6501:
mov esi, edx
imul esi, edx
xor eax, eax
mov ebp, edi
sub ebp, esi
mov eax, ebp
mov esi, eax
mov ecx, ebx
shr esi, 5
and eax, 31 ; 0000001fH
lea esi, DWORD PTR [ecx+esi*4]
mov ecx, eax
mov eax, DWORD PTR [esi]
mov ebp, 1
shl ebp, cl
test ebp, eax
je SHORT $L6502
inc DWORD PTR _i$[esp+72]
$L6502:
mov eax, DWORD PTR _c$6496[esp+76]
inc edx
cmp edx, eax
jl SHORT $L6501
mov edi, DWORD PTR _n$[esp+76]
xor ebp, ebp
$L6498:
inc eax
cmp eax, edi
mov DWORD PTR _c$6496[esp+76], eax
jle SHORT $L6497
$L6499:
; Line 22
cmp ebx, ebp
mov DWORD PTR _lookup$6436[esp+76], ebp
je SHORT $L10605
push ebx
call ??3@YAXPAX@Z ; operator delete
add esp, 4
$L10605:
mov eax, DWORD PTR tv242[esp+76]
dec eax
mov DWORD PTR _lookup$6436[esp+84], ebp
mov DWORD PTR _lookup$6436[esp+88], ebp
mov DWORD PTR _lookup$6436[esp+92], ebp
mov DWORD PTR tv242[esp+76], eax
jne $L10623
; Line 23
mov eax, 1717986919 ; 66666667H
imul DWORD PTR _i$[esp+72]
sar edx, 2
mov eax, edx
shr eax, 31 ; 0000001fH
add eax, edx
push eax
push OFFSET FLAT:??_C@_03PELOGHMK@?$CFi?6?$AA@
call _printf

--
Frank Buss, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Jon Harrop

unread,
May 21, 2007, 3:01:11ā€ÆPM5/21/07
to
Kjetil Svalastog Matheussen wrote:
> gcc: 0.307s (best of 5)
> OCaml: 0.538s (best of 5)
> Stalin: 0.356s (best of 5)

I get:

gcc: 0.137s
SBCL: 0.214s
Stalin: 0.330s (hardcoded "n")
OCaml: 0.353s

Probably time to increase n. With n=20000:

gcc: 2.251s
SBCL: 2.462s
Stalin: 5.164s
OCaml: 5.433s

Well, this microbenchmark certainly isn't showing anything meaningful. ;-)

Frank Buss

unread,
May 21, 2007, 3:10:55ā€ÆPM5/21/07
to
Jon Harrop wrote:

> Frank Buss wrote:
>> Some more reasons why a good GC is faster than manual memory management:
>> http://www.jwz.org/doc/gc.html
>
> So you'd recommend the fastest language with GC?

This is language-independent, because you can use a GC in C++, too. But
Lisp is much easier to use.

Raffael Cavallaro

unread,
May 21, 2007, 4:02:17ā€ÆPM5/21/07
to
On 2007-05-21 15:01:11 -0400, Jon Harrop <j...@ffconsultancy.com> said:

> I get:
>
> gcc: 0.137s
> SBCL: 0.214s
> Stalin: 0.330s (hardcoded "n")
> OCaml: 0.353s
>
> Probably time to increase n. With n=20000:
>
> gcc: 2.251s
> SBCL: 2.462s
> Stalin: 5.164s
> OCaml: 5.433s
>
> Well, this microbenchmark certainly isn't showing anything meaningful. ;-)

Apparently benchmark "meaning" is inversely proportional to how much
faster a lisp runs it than ocaml.

Tamas Papp

unread,
May 21, 2007, 4:10:28ā€ÆPM5/21/07
to
Joe Marshall <eval....@gmail.com> writes:

> Allow me to point you at the Supercomputer Toolkit:
> http://repository.readscheme.org/ftp/papers/berlinsurati-pepm94.pdf
>
> ``The Supercomputer Toolkit compiler performs partial evaluation
> of data-independent programs expressed in the Scheme
> dialect of Lisp by using the symbolic execution technique described
> in previously published work by Berlin.''

Thanks for posting this, it was extremely interesting. I do a lot of
numerical work (slowly migrating to Lisp), and I would like to learn
more about these techniques. For example, is it Scheme specific or
could this be done in CL? What would be a good place to start?

Thanks,

Tamas

Jon Harrop

unread,
May 21, 2007, 4:24:24ā€ÆPM5/21/07
to
Joe Marshall wrote:
> Dr Jon D Harrop of Flying F*** Consultancy finally hauls out this well-
> worn saw:
>> The fact is, Lisp is slow and it puts people off.
>
> Allow me to point you at the Supercomputer Toolkit:
> http://repository.readscheme.org/ftp/papers/berlinsurati-pepm94.pdf

Allow me to point you at some benchmarks made this century that run on
platforms that aren't dead:

http://www.ffconsultancy.com/languages/ray_tracer/results.html

> It only takes a few seconds on Google to find enough
> papers on high-performance computing in Lisp to make a decent text
> book!

Yeah, your new book could include this paper from 1983 as well:

http://portal.acm.org/citation.cfm?id=801672&coll=portal&dl=ACM

> It seems
> they left you as ignorant as when you entered and certainly didn't
> train you to change that sad situation.

I can't believe you haven't reached for the stereotypical Lisper's
performance comparison with Java. You must be getting slow in your old age.

> Next time you feel like uttering a sentence with the word `fact' in it
> (or `most people' or `vast majority', etc.) just type it into that
> little rectangular box at the top of your browser and *find out* if
> there is any truth to it *before* you go spouting off.

<sarcasm>
I hear Lisp is the new black at high-performance supercomputing conferences.
Not to mention the fact that performance only matters when you're coding
for a supercomputer...
</sarcasm>

ROTFLMAO.

Rainer Joswig

unread,
May 21, 2007, 4:42:41ā€ÆPM5/21/07
to
In article <46520151$0$8748$ed26...@ptn-nntp-reader02.plus.net>,

Jon Harrop

unread,
May 21, 2007, 4:39:51ā€ÆPM5/21/07
to

You missed off the factorial.

Jon Harrop

unread,
May 21, 2007, 4:40:47ā€ÆPM5/21/07
to
Frank Buss wrote:
> This is language-independent...

Language design can affect the potential for optimization in the GC, e.g.
see BIBOP.

Dan Bensen

unread,
May 21, 2007, 5:03:32ā€ÆPM5/21/07
to
Jon Harrop wrote:
> Dan Bensen wrote:
>> I find him very boring and upsetting
> I shall try to be more diplomatic.
Could you try to be rational?
That's all I really care about.

--
Dan
www.prairienet.org/~dsb/

Pascal Costanza

unread,
May 21, 2007, 5:54:58ā€ÆPM5/21/07
to
Jon Harrop wrote:
> Kjetil Svalastog Matheussen wrote:
>> gcc: 0.307s (best of 5)
>> OCaml: 0.538s (best of 5)
>> Stalin: 0.356s (best of 5)
>
> I get:
>
> gcc: 0.137s
> SBCL: 0.214s
> Stalin: 0.330s (hardcoded "n")
> OCaml: 0.353s
>
> Probably time to increase n. With n=20000:
>
> gcc: 2.251s
> SBCL: 2.462s
> Stalin: 5.164s
> OCaml: 5.433s
>
> Well, this microbenchmark certainly isn't showing anything meaningful. ;-)

No, of course not. Microbenchmarks never show anything meaningful, no
matter what language they favor.

In this thread we have just seen what every experienced Lisper already
knows: With enough tweaking we can get in the same ballpark as any other
compiled language. This is only useful for the hot spots, though, and
you can never know the hot spots in advance. For everything else,
performance doesn't matter, as long as it's reasonable.

Other aspects are at least similarly important.

For example, from
http://groups.google.com/group/comp.lang.lisp/msg/c8aac11d40efa4b0

"It is interesting to notice that one year later, [the Common Lisp
directory] is still the same Lisp process that is running. The project
has evolved from Linkit to the CLD without ever quitting the Lisp
process. It has never crashed, even though the application and the CLOS
classes have changed a lot of times. In the same time, I had to restart
several times the Apache process and reboot a lot of time the 3COM
firewall router in front of the server."


Another example, from
https://lists.csail.mit.edu/pipermail/ll-discuss/2007-February/001186.html

"The second moment came when I had re-factored a class into a
superclass/subclass relationship. I then reloaded the file and tested
the software to see that it worked ok. It did.

Then I thought about that. I just re-organized the class topology on
the fly in a running system. How on earth could that just work?
Obviously existing instances had to be converted on the fly to new
objects and any dependencies had to be recomputed. But a huge chunk of
CLOS that I had thought `overkill' was what had stepped in and made sure
that I could just reload a file and keep going. Once again I was deeply
impressed.

I think if Graham spent a good chunk of time working on a CLOS project
that he, too, would come around to the point of view that CLOS is a
significant achievement."


Or from http://lib.store.yahoo.net/lib/paulgraham/bbnexcerpts.txt

"Lisp's interactive toplevel is a great help in developing software
rapidly. But the biggest advantage for us was probably in finding bugs.
As I mentioned before, with Web-based applications you have the users'
data on your servers and can usually reproduce bugs.

When one of the customer support people came to me with a report of a
bug in the editor, I would load the code into the Lisp interpreter and
log into the user's account. If I was able to reproduce the bug I'd get
an actual break loop, telling me exactly what was going wrong. Often I
could fix the code and release a fix right away. And when I say right
away, I mean while the user was still on the phone."


...and so on, and so forth.


Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/

Joe Marshall

unread,
May 21, 2007, 6:20:36ā€ÆPM5/21/07
to
On May 21, 1:10 pm, Tamas Papp <tkp...@gmail.com> wrote:

> Joe Marshall <eval.ap...@gmail.com> writes:
> > Allow me to point you at the Supercomputer Toolkit:
> >http://repository.readscheme.org/ftp/papers/berlinsurati-pepm94.pdf
>
> > ``The Supercomputer Toolkit compiler performs partial evaluation
> > of data-independent programs expressed in the Scheme
> > dialect of Lisp by using the symbolic execution technique described
> > in previously published work by Berlin.''
>
> Thanks for posting this, it was extremely interesting. I do a lot of
> numerical work (slowly migrating to Lisp), and I would like to learn
> more about these techniques. For example, is it Scheme specific or
> could this be done in CL? What would be a good place to start?

It can be done in Common Lisp. Jeffrey Mark Siskind has a partial
evaluator for Common Lisp at
http://cobweb.ecn.purdue.edu/~qobi/software.html

You should also check out this paper:
@article{fateman95fast,
  author  = "Richard J. Fateman and Kevin A. Broughan and Diane K. Willcock and Duane Rettig",
  title   = "Fast Floating Point Processing in {Common Lisp}",
  journal = "ACM Transactions on Mathematical Software",
  volume  = "21",
  number  = "1",
  pages   = "26--62",
  year    = "1995",
  url     = "citeseer.ist.psu.edu/fateman95fast.html" }

Two of the authors have been known to post to this group.

A lot of this work is still very experimental, so be prepared to poke
around.
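
To give a taste of the general idea in plain CL (a toy sketch, nothing
to do with the Toolkit compiler, and STATIC-EXPT is a made-up name): a
macro can specialize code whenever part of its input is already known
at macroexpansion time.

;; Toy partial evaluation: when N is a literal constant the
;; multiplications are unrolled at macroexpansion time, so the compiler
;; sees straight-line code; otherwise it falls back to a run-time EXPT.
(defmacro static-expt (x n)
  (if (and (integerp n) (<= 0 n 16))
      (if (zerop n)
          1
          (let ((g (gensym "X")))
            `(let ((,g ,x))
               (* ,@(make-list n :initial-element g)))))
      `(expt ,x ,n)))

;; (static-expt y 3) expands to (let ((#:X1 y)) (* #:X1 #:X1 #:X1)).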



Wolfram Fenske

unread,
May 21, 2007, 9:42:18ā€ÆPM5/21/07
to
Wade Humeniuk <whum...@telus.net.no.spam> writes:

That has to be it. I knew that caches are a lot faster than main
memory, but still, this is the most impressive demonstration I've seen
so far. Whew! :-)

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

Wade Humeniuk

unread,
May 21, 2007, 9:58:34ā€ÆPM5/21/07
to
I tried a modified version of your C++ program on OS X.

The C++ version is slower than the SBCL version (probably because of
caching). The Lisp version is about 40% faster.

wade-humeniuks-computer:~/lisp wade$ gcc -O3 -c pyth.cc
wade-humeniuks-computer:~/lisp wade$ g++ -lstdc++ -o pyth pyth.o
wade-humeniuks-computer:~/lisp wade$ time ./pyth
11362
11362
11362
11362
11362
11362
11362
11362
11362
11362
time: 1.0300
real 0m1.052s
user 0m0.998s
sys 0m0.042s
wade-humeniuks-computer:~/lisp wade$

CL-USER> (time (loop repeat 10 do (pythagorean-triplet)))

11362
11362
11362
11362
11362
11362
11362
11362
11362
11362
Evaluation took:
0.699 seconds of real time
0.691419 seconds of user run time
0.00562 seconds of system run time
[Run times include 0.015 seconds GC run time.]


0 calls to %EVAL
0 page faults and

31,295,360 bytes consed.
NIL
CL-USER>

Wade Humeniuk

unread,
May 21, 2007, 10:00:33ā€ÆPM5/21/07
to
Wade Humeniuk <whum...@telus.net.no.spam> writes:

> I tried a modified version of your C++ program on OS X.
>
> The C++ version is slower than the SBCL version (probably because of
> caching). The Lisp version is about 40% faster.
>

Forgot the code.

#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <vector>

using namespace std;

int main(int argc, char *argv[]) {
    const int n = argc == 2 ? atoi(argv[1]) : 5000;

    clock_t start = clock();
    int i = 0;
    for (int loop = 0; loop < 10; loop++) {
        i = 0;
        vector<bool> lookup(n*n);
        for (int j = 0; j < n; j++) lookup[j*j] = true;
        for (int c = 2; c <= n; c++) {
            for (int b = 1; b < c; b++) {
                if (lookup[c*c - b*b]) i++;
            }
        }
        printf("%i\n", i);
    }
    clock_t end = clock();

    float time = float(end-start) / float(CLOCKS_PER_SEC);
    printf("time: %0.04f", time);

    return 0;
}

Wade Humeniuk

unread,
May 21, 2007, 10:39:13ā€ÆPM5/21/07
to
Wolfram Fenske <in...@gmx.net> writes:

>
> That has to be it. I knew that caches are a lot faster than main
> memory, but still, this is the most impressive demonstration I've seen
> so far. Whew! :-)

Here is a link to the processor blurb.

Intel Coreā„¢2 Duo Processor T7400 2.16GHz w/ 4MB Cache

http://www.memoryexpress.com/index.php?PageTag=&page=file&memx_menu=EmbedProductDetail.php&DisplayProductID=9503&SID=

The L2 Cache on the Core2 is the same speed as the processor clock.
The RAM is 667MHz, the processor is 2.16GHz.

This caching thing can be very big. It explains why the UltraSPARCs in
the Sun we have at work are so fast (8MB caches).
Wade


Kjetil S. Matheussen

unread,
May 21, 2007, 10:44:53ā€ÆPM5/21/07
to

On Mon, 21 May 2007, Jon Harrop wrote:

> Kjetil Svalastog Matheussen wrote:
>> gcc: 0.307s (best of 5)
>> OCaml: 0.538s (best of 5)
>> Stalin: 0.356s (best of 5)
>
> I get:
>
> gcc: 0.137s
> SBCL: 0.214s
> Stalin: 0.330s (hardcoded "n")
> OCaml: 0.353s
>
> Probably time to increase n. With n=20000:
>
> gcc: 2.251s
> SBCL: 2.462s
> Stalin: 5.164s
> OCaml: 5.433s
>
> Well, this microbenchmark certainly isn't showing anything meaningful. ;-)
>

Right. It's weird that you get such slow code from Stalin, though. I remember
the same thing happened with your ray-tracer benchmark: both Siskind and
I got much faster code out of Stalin than you did. I like Stalin as
proof that untweaked Lisp (including other languages with dynamic typing
and such) can be really fast, and in theory beat everything else when it
comes to performance, so it's disappointing that you can't get as good
performance out of it, given how much buzz you make about performance in
your endless newsgroup posts and in the promotion of your ray-tracer
benchmark.

Raffael Cavallaro

unread,
May 22, 2007, 12:16:31ā€ÆAM5/22/07
to
On 2007-05-21 16:39:51 -0400, Jon Harrop <j...@ffconsultancy.com> said:

> Raffael Cavallaro wrote:
>> On 2007-05-21 15:01:11 -0400, Jon Harrop <j...@ffconsultancy.com> said:
>>> Well, this microbenchmark certainly isn't showing anything meaningful.
>>> ;-)
>>
>> Apparently benchmark "meaning" is inversely proportional to how much
>> faster a lisp runs it than ocaml.
>
> You missed off the factorial.

Credit where credit is due:

"OCaml - the language of choice for toy benchmarks from factorials to
ray-tracers!"

Jon Harrop

unread,
May 22, 2007, 9:21:40ā€ÆAM5/22/07
to
Kjetil S. Matheussen wrote:
> Right. It's weird that you get such slow code from Stalin, though. I remember
> the same thing happened with your ray-tracer benchmark: both Siskind and
> I got much faster code out of Stalin than you did. I like Stalin as
> proof that untweaked Lisp (including other languages with dynamic typing
> and such) can be really fast, and in theory beat everything else when it
> comes to performance, so it's disappointing that you can't get as good
> performance out of it, given how much buzz you make about performance in
> your endless newsgroup posts and in the promotion of your
> ray-tracer benchmark.

I don't think this result is so surprising. Firstly, the Stalin entry is the
only one running in 32-bit. GCC, OCaml and SBCL all support 64-bit and I'm
running an AMD64 that is much faster in 64-bit.

Secondly, previous benchmarks indicate that SBCL and Stalin both do
relatively better than OCaml on Intel hardware. We only use AMD hardware so
I can't benchmark the same programs on Intels. But my guess is that the
results would be significantly different.

At the end of the day, all this really means is that the results are not
significantly different.

There are still several interesting facts to be drawn from these results:

1. Functional programming languages can be very competitive with C on
imperative code. I was very surprised to see the obvious C implementation
beaten by Lisp.

2. This is another benchmark where OCaml's lack of optimized div and mod
hurts.

3. Compiler switches and platforms have at least as big an effect as
language choice on this benchmark.

4. The underlying representation and choice of GC can have very significant
performance implications for a language. In this case, OCaml has trouble
using a bit-vector because it offers only tagged 31/63-bit ints for unboxed
arrays, which require much more work and slightly more memory.

Frank Buss

unread,
May 22, 2007, 12:41:40ā€ÆPM5/22/07
to
Wade Humeniuk wrote:

> I tried (a modified version of your C++) version on OS X
>
> The C++ version is slower that the SBCL Version (probably because of
> caching). The Lisp version is 40% faster.
>
> wade-humeniuks-computer:~/lisp wade$ gcc -O3 -c pyth.cc
> wade-humeniuks-computer:~/lisp wade$ g++ -lstdc++ -o pyth pyth.o
> wade-humeniuks-computer:~/lisp wade$ time ./pyth

Yes, usually gcc is slower than Microsoft Visual C++ :-)

asci...@gmail.com

unread,
May 22, 2007, 1:13:41ā€ÆPM5/22/07
to
On May 20, 9:11 pm, Wade Humeniuk <whume...@telus.net.no.spam> wrote:
> (defconstant +max+ 5000)
>
> (defun make-lookup (size)
>   (let* ((lookup (make-array (* size size) :element-type 'bit :initial-element 0)))
>     (dotimes (i size lookup)
>       (setf (sbit lookup (* i i)) 1))))
>
> (defun pythagorean-triplet ()
>   (declare (optimize (speed 3) (safety 0) (debug 0)))
>   (let* ((n 0)
>          (lookup (make-lookup (1+ +max+))))
>     (declare (fixnum n))
>     (loop for c from 2 to +max+ do
>       (loop for b from 1 below c
>             unless (zerop (sbit lookup (- (* c c) (* b b))))
>               do (incf n)))
>     (print n)))

I like it. I made a small tweak to take a parameter (I don't like
constants for things like this):

(defun count-triples-slow (hypotenuse-max)
  (declare (type (integer 5 #.(isqrt array-dimension-limit))
                 hypotenuse-max)
           (optimize (speed 3) (safety 0) (debug 0)))
  (let ((lookup (make-lookup (1+ hypotenuse-max))))
    (loop for c from 2 to hypotenuse-max
          sum (loop for b from 1 below c
                    sum (sbit lookup (- (* c c) (* b b)))))))

CL-USER> (time (count-triples-slow 5000))
Evaluation took:
0.474 seconds of real time
0.476029 seconds of user run time
0.0 seconds of system run time


0 page faults and
3,126,264 bytes consed.

11362

Note, however, that this algorithm finds all ordered triples for each
hypotenuse. For example, it counts both (3, 4, 5) and (4, 3, 5). So
we only have to search to the point where b^2 is half of c^2 and then
double the count.

(defun count-triples-slow (hypotenuse-max)
  (declare (type (integer 5 #.(isqrt array-dimension-limit))
                 hypotenuse-max)
           (optimize (speed 3) (safety 0) (debug 0)))
  (let ((lookup (make-lookup (1+ hypotenuse-max))))
    (* 2 (loop for c from 2 to hypotenuse-max
               sum (loop for b from 1 to (isqrt (truncate (* c c) 2))
                         sum (sbit lookup (- (* c c) (* b b))))))))

CL-USER> (time (count-triples-slow 5000))
Evaluation took:
0.226 seconds of real time
0.220013 seconds of user run time
0.008001 seconds of system run time


0 page faults and
3,126,264 bytes consed.

11362

I then took a look at the macroexpansion of the loop code and noted
that there were some inefficiencies in there. I pulled the two loops
apart into separate functions and made them into tail-recursive
algorithms. I also made some pretty restrictive type definitions.
(The lookup table must fit into an array, so the maximum hypotenuse
must be the square root of array-dimension-limit. Also, the number of
candidate triples below a given hypotenuse length is a triangular
number, so the type for the count of pythagorean triples reflects the
closed formula for triangle number n, (n*(n+1)/2).)

(defun make-lookup (size)
  (let ((lookup (make-array (* size size) :element-type 'bit
                            :initial-element 0)))
    (dotimes (i size lookup)
      (setf (sbit lookup (* i i)) 1))))

(deftype triple-count ()
  `(integer 0 ,(truncate (- array-dimension-limit
                            (isqrt array-dimension-limit))
                         2)))

(defun triples-with-hypotenuse (hypotenuse lookup)
  (declare (type (integer 5 #.(isqrt array-dimension-limit)) hypotenuse)
           (type simple-bit-vector lookup)
           (optimize (speed 3)))
  (let ((c2 (* hypotenuse hypotenuse)))
    (labels ((doit (b sum)
               (declare (type (integer 0 #.(isqrt (truncate array-dimension-limit 2))) b)
                        (type triple-count sum))
               (if (zerop b)
                   sum
                   (doit (1- b) (+ sum (sbit lookup (- c2 (* b b))))))))
      (doit (isqrt (truncate c2 2)) 0))))

(defun count-triples-slow (hypotenuse-max)
  (declare (type (integer 5 #.(isqrt array-dimension-limit)) hypotenuse-max)
           (optimize (speed 3)))
  (let ((lookup (make-lookup (1+ hypotenuse-max))))
    (labels ((doit (c sum)
               (declare (type (integer 0 #.(isqrt array-dimension-limit)) c)
                        (type triple-count sum))
               (if (zerop c)
                   sum
                   (doit (1- c)
                         (+ sum (the triple-count
                                     (triples-with-hypotenuse c lookup)))))))
      (* 2 (doit hypotenuse-max 0)))))

CL-USER> (time (count-triples-slow 5000))
Evaluation took:
0.184 seconds of real time
0.184011 seconds of user run time
0.0 seconds of system run time


0 page faults and
3,126,264 bytes consed.

11362
