
Why is this loop heavy code so slow in Python? Possible Project Euler spoilers


jwrwea...@gmail.com

Sep 2, 2007, 7:51:42 AM
I'm pretty new to python, but am very happy with it. As well as using
it at work I've been using it to solve various puzzles on the Project
Euler site - http://projecteuler.net. So far it has not let me down,
but it has proved surprisingly slow on one puzzle.

The puzzle is: p is the perimeter of a right-angle triangle with
integral length sides, {a,b,c}. For which value of p < 1000 is the
number of solutions {a,b,c} maximised?

Here's my python code:

#!/usr/local/bin/python

solutions = [0] * 1001
p = 0

for a in xrange(1, 1000):
    for b in xrange(1, 1000 - a):
        for c in xrange(1, 1000 - a - b):
            p = a + b + c
            if p < 1000:
                if a ** 2 + b ** 2 == c ** 2:
                    solutions[p] += 1

max = 0
maxIndex = 0
index = 0
for solution in solutions:
    if solution > max:
        max = solution
        maxIndex = index
    index += 1

print maxIndex


It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo MacBook
Pro. Surprised at how slow it was I implemented the same algorithm in
C:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int* solutions = calloc(1000, sizeof(int));

    int p;
    for(int a = 1; a < 1000; ++a) {
        for(int b = 1; b < 1000 - a; ++b) {
            for(int c = 1; c < 1000 - a - b; ++c) {
                p = a + b + c;
                if(p < 1000) {
                    if(a * a + b * b == c * c) {
                        solutions[p] += 1;
                    }
                }
            }
        }
    }

    int max = 0;
    int maxIndex = 0;

    for(int i = 0; i < 1000; ++i) {
        if(solutions[i] > max) {
            max = solutions[i];
            maxIndex = i;
        }
    }
    printf("%d\n", maxIndex);
    return 0;
}


gcc -o 39 -std=c99 -O3 39.c

The resulting executable takes 0.24 seconds to run. I'm not expecting
a scripting language to run faster than native code, but I was
surprised at how much slower it was in this case. Any ideas as to what
is causing python so much trouble in the above code?

Arnaud Delobelle

Sep 2, 2007, 8:20:42 AM
On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote:
> I'm pretty new to python, but am very happy with it. As well as using
> it at work I've been using it to solve various puzzles on the Project
> Euler site - http://projecteuler.net. So far it has not let me down,

from math import sqrt

solutions = [0] * 1001
p = 0

for a in xrange(1, 1000):
    a2 = a*a
    for b in xrange(1, 1000 - a):
        c = sqrt(a2 + b*b)
        if c == int(c) and a+b+c < 1000:
            solutions[a+b+int(c)] += 1

max = 0
maxIndex = 0
index = 0
for solution in solutions:
    if solution > max:
        max = solution
        maxIndex = index
    index += 1

print maxIndex

--
Arnaud
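For the record, Arnaud's float test (c == int(c)) is safe at this scale, since a2 + b*b stays below 2,000,000 and doubles represent such integers exactly, but an all-integer check removes the worry entirely. A sketch in modern Python (math.isqrt requires 3.8+; the thread's xrange is spelled range today):

```python
from math import isqrt  # Python 3.8+: exact integer square root

M = 1000
solutions = [0] * M

for a in range(1, M):
    a2 = a * a
    for b in range(1, M - a):
        s = a2 + b * b
        c = isqrt(s)                      # floor of the exact square root
        if c * c == s and a + b + c < M:  # perfect square, perimeter in range
            solutions[a + b + c] += 1

print(solutions.index(max(solutions)))  # -> 840
```

As with Arnaud's version, each triangle is counted once per (a, b) ordering, which doubles every tally but leaves the winning perimeter unchanged.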


Greg Copeland

Sep 2, 2007, 9:31:42 AM

For the curious:

   O      O + P      A      A + P
=======  =======  =======  =======
2:22.56  0:25.65  0:00.75  0:00.20

O = Original Implementation
P = Psyco (psyco.full())
A = Arnaud's Revised Implementation

rzed

Sep 2, 2007, 9:36:45 AM
jwrwea...@gmail.com wrote in
news:1188733902....@r34g2000hsd.googlegroups.com:

> The puzzle is: p is the perimeter of a right-angle triangle with
> integral length sides, {a,b,c}. For which value of p < 1000 is the
> number of solutions {a,b,c} maximised?
>
> Here's my python code:
>
> #!/usr/local/bin/python
>
> solutions = [0] * 1001
> p = 0
>
> for a in xrange(1, 1000):
>     for b in xrange(1, 1000 - a):
>         for c in xrange(1, 1000 - a - b):
>             p = a + b + c
>             if p < 1000:
>                 if a ** 2 + b ** 2 == c ** 2:
>                     solutions[p] += 1
>

Once p >= 1000, it ain't goin' back. If you break out of the
innermost loop here after that happens, you'll save a bunch of
time.



> max = 0
> maxIndex = 0
> index = 0
> for solution in solutions:
>     if solution > max:
>         max = solution
>         maxIndex = index
>     index += 1
>
> print maxIndex
>
>
> It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo
> MacBook Pro.
>

[...]

jwrwea...@gmail.com

Sep 2, 2007, 9:45:16 AM
[snip code]

Thanks for that. I realise that improving the algorithm will speed
things up. I wanted to know why my less than perfect algorithm was so
much slower in python than exactly the same algorithm in C. Even when
turning off gcc's optimiser with the -O0 flag, the C version is still
> 100 times quicker.

Mark Dickinson

Sep 2, 2007, 10:10:31 AM

Well, for one thing, you're creating half a million xrange objects in
the course of the search. All the C code has
to do is increment a few integers.

Mark
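Mark's half-a-million figure is easy to verify: one object is created per execution of a `for` statement, not per iteration. Counting them (with range standing in for xrange):

```python
count = 1                        # the single outer xrange(1, 1000)
for a in range(1, 1000):
    count += 1                   # xrange(1, 1000 - a) for the b loop
    for b in range(1, 1000 - a):
        count += 1               # xrange(1, 1000 - a - b) for the c loop

print(count)  # -> 499501
```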

Ivan Wang

Sep 2, 2007, 10:28:19 AM
> > 100 times quicker.
Maybe Python is the slowest programming language in the world.
So there is a joke: some python hater said that "python" can only
crawl rather than run. :)

Python is slow because:
(1) dynamic binding
(2) it is an interpreted language
For example, in C code, an integer expression "a+b" will directly
generate an "add" assembly instruction for x86 processors.
A Python interpreter, on the other hand, needs to detect the types of a
and b first, then choose the right "+" operation for them and use an
evaluation stack to get the result.

Psyco is a JIT compiler with just-in-time specialization, which can
mitigate both of these problems.
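The per-operation dispatch described here can be seen directly in CPython's byte-code with the dis module (modern Python 3 shown; older versions emit BINARY_ADD, 3.11+ a generic BINARY_OP, but either way "+" is a dispatched byte-code instruction rather than a single machine instruction):

```python
import dis

def add(a, b):
    return a + b

# Each line of output is one byte-code instruction the interpreter
# must fetch, decode, and dispatch at run time.
dis.dis(add)
```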


Arnaud Delobelle

Sep 2, 2007, 10:53:29 AM
On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote:
[...]

> The resulting executable takes 0.24 seconds to run. I'm not expecting
> a scripting language to run faster than native code, but I was
> surprised at how much slower it was in this case. Any ideas as to what
> is causing python so much trouble in the above code?

Sorry I didn't answer your question at all in my previous post
(although my modified method gets the answer in about 6s on a measly
PB G4 1GHz :).
Your little code snippet is just about the worst bit of code for
python to be compared to C. Three loops, only integer arithmetic:
that's not going to make Python shine!

Nevertheless as you optimise your C snippet (-O3), there are probably
a few things that the compiler does for you:

* as in the inner loop, a*a + b*b always remain the same, it is
probably stored in a register once and for all
* in the second loop, a*a remains the same so it might be calculated
once and for all as well.

It gives this:

M = 1000
solutions = [0] * M

def f1():
    "Your original implementation"
    for a in xrange(1, M):
        for b in xrange(1, M - a):
            for c in xrange(1, M - a - b):
                if a**2 + b**2 == c**2:
                    solutions[a+b+c] += 1

def f2():
    "a*a + b*b precalculated"
    for a in xrange(1, M):
        a2 = a*a
        for b in xrange(1, M - a):
            s = a2 + b*b
            for c in xrange(1, M - a - b):
                if s == c*c:
                    solutions[a+b+c] += 1

It doesn't make it as quick as C of course, but more than halves the
execution time.

--
Arnaud


Cameron Laird

Sep 2, 2007, 11:39:34 AM
In article <1188742231....@g4g2000hsf.googlegroups.com>,

Right: Mr. Dickinson's original question is entirely
legitimate, and it's not adequate to respond, as some
follow-ups did, with ways to improve the Python-coded
algorithm.

The correct answer, which I want to reinforce, is that
the exhibited Python and C versions are NOT "exactly
the same algorithm", at least not without more
qualification. Part of Python expertise is to recognize
that creation of xrange objects, mentioned above, is
far from free. Also, -O3 gives C the opportunity,
also remarked in a follow-up, to factor calculations
outside their loops.

Bruno Desthuilliers

Sep 1, 2007, 2:12:23 PM
Ivan Wang wrote:

> On Sep 2, 9:45 pm, jwrweather...@gmail.com wrote:
>
>>[snip code]
>>
>>Thanks for that. I realise that improving the algorithm will speed
>>things up. I wanted to know why my less than perfect algorithm was so
>>much slower in python than exactly the same algorithm in C. Even when
>>turning off gcc's optimiser with the -O0 flag, the C version is still
>>> 100 times quicker.
>
> Maybe Python is the slowest programming language in the world.
> So there is a joke: some python hater said that "python" can only
> crawl rather than run. :)
>
> Python is slow because:
> (1) dynamic binding
Yes.

> (2) it is an interpreted language
Not quite. It's compiled to byte-code - just like Java (would you call
Java an 'interpreted language' ?)

Alex Martelli

Sep 2, 2007, 12:55:29 PM
Mark Dickinson <dick...@gmail.com> wrote:

I don't think the creation of xrange objects is a meaningful part of
Python's execution time here. Consider:

M = 1000
solutions = [0] * M

def f2():
    "a*a + b*b precalculated"
    for a in xrange(1, M):
        a2 = a*a
        for b in xrange(1, M - a):
            s = a2 + b*b
            for c in xrange(1, M - a - b):
                if s == c*c:
                    solutions[a+b+c] += 1

def f3(M=M, solutions=solutions):
    "pull out all the stops"
    xrs = [xrange(1, k) for k in xrange(0, M+1)]
    for a in xrs[M]:
        a2 = a*a
        for b in xrs[M-a]:
            s = a2 + b*b
            for c in xrs[M-a-b]:
                if s == c*c:
                    solutions[a+b+c] += 1

import time

t = time.time()
f2()
e = time.time()
print e-t, max(xrange(M), key=solutions.__getitem__)

solutions = [0]*M
t = time.time()
f3(M, solutions)
e = time.time()
print e-t, max(xrange(M), key=solutions.__getitem__)


f2 is Arnaud's optimization of the OP's algorithm by simple hoisting; f3
further hoists the xrange creation -- it creates only 1000 such objects
rather than half a million. And yet...:

brain:~/py25 alex$ python puz.py
34.6613101959 840
36.2000119686 840
brain:~/py25 alex$

...which suggests that creating an xrange object is _cheaper_ than
indexing a list...


Alex
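Alex's measurement can be reproduced with a crude timeit sketch (range stands in for 2.x's xrange; the absolute numbers, and possibly even the winner, will differ on a modern CPython, so treat this as a way to check the claim rather than as a fixed result):

```python
import timeit

# Cost of creating a fresh range object each time...
create = timeit.timeit("range(1, 500)", number=100000)

# ...versus fetching a prebuilt one out of a list, as f3 does.
lookup = timeit.timeit("xrs[499]",
                       setup="xrs = [range(1, k) for k in range(501)]",
                       number=100000)

print(create, lookup)
```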

Paul Rubin

Sep 2, 2007, 1:05:27 PM
al...@mac.com (Alex Martelli) writes:
> ...which suggests that creating an xrange object is _cheaper_ than
> indexing a list...

Why not re-use the xrange instead of keeping a list around?

Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
>>> a = xrange(3)
>>> print list(a)
[0, 1, 2]
>>> print list(a)
[0, 1, 2]

Alex Martelli

Sep 2, 2007, 1:24:55 PM

Reusing xranges is exactly what my code was doing -- at each for loop
you need an xrange(1, k) for a different value of k, which is why you
need some container to keep them around (and a list of xrange objects is
the simplest applicable container).

Your suggestion doesn't appear to make any sense in the context of the
optimization problem at hand -- what list(...) calls are you thinking
of?! Please indicate how your suggestion would apply in the context of:

def f3(M=M, solutions=solutions):
    "pull out all the stops"
    xrs = [xrange(1, k) for k in xrange(0, M+1)]
    for a in xrs[M]:
        a2 = a*a
        for b in xrs[M-a]:
            s = a2 + b*b
            for c in xrs[M-a-b]:
                if s == c*c:
                    solutions[a+b+c] += 1


Alex

Paul Rubin

Sep 2, 2007, 1:37:01 PM
al...@mac.com (Alex Martelli) writes:
> Reusing xranges is exactly what my code was doing

Oh cool, I missed that, I was just going by the text description.
Looking at the code, yes, it's re-using the xranges. Thanks.

Mark Dickinson

Sep 2, 2007, 1:42:23 PM
On Sep 2, 12:55 pm, al...@mac.com (Alex Martelli) wrote:

> Mark Dickinson <dicki...@gmail.com> wrote:
> > Well, for one thing, you're creating half a million xrange objects in
> > the course of the search. All the C code has
> > to do is increment a few integers.
>
> I don't think the creation of xrange objects is a meaningful part of
> Python's execution time here. Consider:
> [...]

Agreed---I just came to the same conclusion after doing some tests.
So maybe it's the billion or so integer objects being created that
dominate the running time? (Not sure how many integer objects
actually are created here: doesn't Python cache *some* small
integers?)

Mark
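CPython's small-int cache (roughly -5 through 256 in recent versions) is observable from Python code: results inside the range come back as the same shared object, while anything larger is freshly allocated on every operation. A CPython-specific sketch:

```python
x = 250
a = x + 6   # 256: drawn from CPython's small-int cache
b = x + 6
c = x + 7   # 257: a brand-new int object each time (CPython detail)
d = x + 7

print(a is b, c is d)  # on CPython: True False
```

So the inner-loop arithmetic in the original program really does allocate a fresh object for most results, which supports Mark's suspicion.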


"Martin v. Löwis"

Sep 2, 2007, 2:45:49 PM
to Bruno Desthuilliers
>> (2) it is an interpreted language
> Not quite. It's compiled to byte-code - just like Java (would you call
> Java an 'interpreted language' ?)

Python is not implemented like Java. In Java (at least in HotSpot),
the byte code is further compiled to machine code before execution;
in Python, the byte code is interpreted.

Whether this makes Python an interpreter or a compiler,
I don't know.

Regards,
Martin

Alex Martelli

Sep 2, 2007, 2:46:50 PM
Mark Dickinson <dick...@gmail.com> wrote:

Yep, some, say -5 to 100 or thereabouts; it also caches on a free-list
all the "empty" integer-objects it ever has (rather than returning the
memory for the system), so I don't think there's much optimization to be
had on that score either.


Alex

Wildemar Wildenburger

Sep 2, 2007, 3:00:45 PM
Martin v. Löwis wrote:
>>> (2) it is an interpreted language
>> Not quite. It's compiled to byte-code - just like Java (would you call
>> Java an 'interpreted language' ?)
>
> Python is not implemented like Java. In Java (at least in HotSpot),
> the byte code is further compiled to machine code before execution;
> in Python, the byte code is interpreted.
>
OK, good. Naive question now comming to mind: Why doesn't Python do the
latter as well?

/W

Mark Dickinson

Sep 2, 2007, 3:23:21 PM
On Sep 2, 7:51 am, jwrweather...@gmail.com wrote:
> The resulting executable takes 0.24 seconds to run. I'm not expecting
> a scripting language to run faster than native code, but I was
> surprised at how much slower it was in this case. Any ideas as to what
> is causing python so much trouble in the above code?

Below is some code and some timings that show that Python is spending
at least 1/3 of its time computing the squares: a*a, b*b and c*c. So
to explain why Python is so much slower than C, it should be enough to
explain what's going on with these multiplications. Here's my attempt
at an explanation:

To compute a*a, Python has to do the following, all at run-time

(1) Find the type of a.
(2) Look up the corresponding multiplication method for this type.
(3) (In the multiplication method): find the type of the right-hand
side, in this case a again.
(4) Having established that both arguments are ints, worry about
whether the result is going to overflow (in which case a long needs to
be returned).
(5) Do the multiplication.
(6) Allocate a new Python integer to hold the result, and return it.

The C code only has to do step (5), and this boils down to a single
machine instruction.

Mark

Code and timings: (Python 2.5.1/G4.)

def test():
    solutions = [0] * 1000
    for a in xrange(1, 1000):
        for b in xrange(1, 1000 - a):
            for c in xrange(1, 1000 - a - b):
                if a*a + b*b == c*c:
                    solutions[a+b+c] += 1
    return solutions

def test2():
    solutions = [0] * 1000
    squares = [x*x for x in xrange(1000)]
    for a in xrange(1, 1000):
        for b in xrange(1, 1000 - a):
            for c in xrange(1, 1000 - a - b):
                if squares[a] + squares[b] == squares[c]:
                    solutions[a+b+c] += 1
    return solutions


>>> from profile import run
>>> run("l = test(); l2 = test2()")
         5 function calls in 299.984 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        1    0.010    0.010    0.010    0.010  :0(setprofile)
        1    0.000    0.000  299.973  299.973  <string>:1(<module>)
        1    0.000    0.000  299.984  299.984  profile:0(l = test(); l2 = test2())
        0    0.000             0.000           profile:0(profiler)
        1  182.262  182.262  182.262  182.262  test.py:1(test)
        1  117.711  117.711  117.711  117.711  test.py:10(test2)


Steven D'Aprano

Sep 2, 2007, 6:05:44 PM

There is no single version of Java, and the reference interpretation runs
on a virtual machine just like Python. Today there are virtual machine
implementations of Java, native compilers, and Just In Time compilers for
Java, including HotSpot mentioned by Martin, but Java the language was
originally defined to run on a VM.

See, for example, here: http://schmidt.devlib.org/java/compilers.html

There are costs to native compilation, the biggest one of course being
the initial investment in time and effort in creating the native
compiler. Sun and other commercial companies have invested a lot of money
in Java, and I don't think the money invested in Python has come even
close. Consequently, any work on JIT compilation for Python has been done
by volunteers.

Nevertheless, we have Psyco, which is a JIT compiler of sorts; work also
continues on PyPy (Python implemented in Python) which, it is hoped, will
lead to faster Python implementations.

Part of the reason that native compilation for Python is hard is that
Python's dynamic object model makes it very difficult to apply the same
sorts of compiler optimizations that C and Java allow. Comparatively
little of the work done by Python can be safely pushed from run time to
compile time, so it is unlikely that the average Python application will
ever run as fast as the equivalent C code -- although that ignores the
question of what "the equivalent C code" could possibly mean. (If the C
code includes all the dynamic high-level features of Python, it too would
surely run as slowly as Python, and if it didn't, it can hardly be said
to be equivalent.) Nevertheless, by using something like Psyco parts of
your Python code can run at virtually the same speed as C.

A big question mark in my mind is Lisp, which according to aficionados is
just as dynamic as Python, but has native compilers that generate code
running as fast as highly optimized C. I'm not qualified to judge whether
the lessons learnt from Lisp can be applied to Python, but in any case
Lisp is an old, old language -- only Fortran is older. The amount of
development effort and money put into Lisp dwarfs that put into Python
by a factor of possibly a hundred or more.

So... if you'd like to see Python run as fast as C or Lisp, and you have
a few tens of millions of dollars spare to invest in development, I think
the Python Software Foundation would love to hear from you.

--
Steven.

Diez B. Roggisch

Sep 2, 2007, 6:29:01 PM
Wildemar Wildenburger wrote:

Because of Python's dynamic nature. Java is statically typed, so the
JIT can heavily optimize.

OTOH, psyco IS a JIT compiler used to optimize certain calculations,
mostly of a numerical nature. But this can't be done to the extent it
is possible in Java.

Diez

Chris Mellon

Sep 2, 2007, 8:32:06 PM
to pytho...@python.org

Original code: 3 min, 9 seconds
Original code with psyco: 30.28 seconds
Original code, compiled with Pyrex: 1min 39 seconds (moves the for loops into C)
Same, with a,b,c declared with cdef int: 20 seconds (uses c pow()
instead of going through Python).
Same, with the for..xrange loops rewritten to use C integer loops: 13
seconds (saves xrange use, but since the python loop was already gone,
not too much savings).

With a small amount of work, you should be able to implement the C
algorithm in Pyrex (or even just use the C algorithm, in a wrapper
that converts the return value to an int) and get the same speed as
the C version + a constant marshalling factor.

Adding psyco took all of 20 seconds (half that because I needed to
move the module-scope code into a function), and re-writing with Pyrex
took a minute or two. So, as a demonstration of numeric optimization,
both of them can give you quite good rewards with minimal effort.

Neil Cerutti

Sep 3, 2007, 12:58:57 AM

I'd call it an integrated compiler and virtual machine--a classic
combination.

--
Neil Cerutti

Neil Cerutti

Sep 3, 2007, 1:05:11 AM
On 2007-09-02, Steven D'Aprano

<st...@REMOVE-THIS-cybersource.com.au> wrote:
> A big question mark in my mind is Lisp, which according to
> aficionados is just as dynamic as Python, but has native
> compilers that generate code running as fast as highly
> optimized C. I'm not qualified to judge whether the lessons
> learnt from Lisp can be applied to Python, but in any case Lisp
> is an old, old language -- only Fortran is older. The amount of
> development effort and money put into Lisp dwarfs that put into
> Python by possibly a hundred or more.

Lisp, as far as I know, requires type declarations, discipline,
deep knowledge of Lisp, and more than passing knowledge of your
Lisp implementation in order to generate code that's competitive
with C. On the other hand, Lisp afficionados don't have a problem
meeting those requirements. ;)

> So... if you'd like to see Python run as fast as C or Lisp, and
> you have a few tens of millions of dollars spare to invest in
> development, I think the Python Software Foundation would love
> to hear from you.

Hmmm. I have four dollars... almost five...

--
Neil Cerutti

Paul Rubin

Sep 3, 2007, 1:56:48 AM
Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:

> A big question mark in my mind is Lisp, which according to aficionados is
> just as dynamic as Python, but has native compilers that generate
> code running as fast as highly optimized C. I'm not qualified to
> judge whether the lessons learnt from Lisp can be applied to Python,
> but in any case Lisp is an old, old language -- only Fortran is
> older. The amount of development effort and money put into Lisp
> dwarfs that put into Python by possibly a hundred or more.

Writing a simple Lisp compiler is not all that difficult. There's a
chapter in SICP about how to do it, so it's sometimes part of
introductory courses. To get C-like performance you end up having to
rely on user-supplied type declarations and/or type inference, but
even without that you can still do ok. I'd say Lisp is a less dynamic
language than Python. Lisp doesn't have duck typing and doesn't
represent class instance variables as dictionary elements that have to
be looked up at runtime all the time. This maybe even applies to
locals, e.g. in Python if you say

x = 3
print "hello"
y = x + 4

you're possibly not guaranteed that y=7, because you might have bound
sys.stdout to something with a .write method that reaches up the call
stack and messes with the caller's local variables, and if the langref
manual says this is supposed to work, then the compiler has to do it
like CPython. Maybe Python 4.0 will fix some of this stuff.

jwrwea...@gmail.com

Sep 3, 2007, 9:39:46 AM
Thanks for all the answers to my question. I think what I need to take
away from this is that xrange is an object - I thought it was just
some loop construct, and that maths is slow in python - so avoid
pathological looping.I remember the first time I tried Objective-C on
OS X I used the NSNumber class for arithmetic - the results that time
were pretty awful too!


Paul Boddie

Sep 3, 2007, 10:23:02 AM
On 3 Sep, 15:39, jwrweather...@gmail.com wrote:
> Thanks for all the answers to my question. I think what I need to take
> away from this is that xrange is an object

Indeed, but using xrange can be faster than converting your "for"
loops to "while" loops plus a counter; I converted your code to use
the latter arrangement and it was noticeably slower. Perhaps the
repeated invocation of each xrange object's next method is less
expensive than repeatedly obtaining new incremented int objects and
testing them against other int objects.
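A rough harness for the comparison Paul describes (range standing in for xrange; the exact ratio is machine- and version-dependent, but on CPython the while version is typically the slower one):

```python
import timeit

def with_range(n):
    total = 0
    for i in range(n):   # the counter advances inside the iterator, in C
        total += i
    return total

def with_while(n):
    total = 0
    i = 0
    while i < n:         # the counter is a Python-level int, rebound each pass
        total += i
        i += 1
    return total

print(timeit.timeit(lambda: with_range(10000), number=100))
print(timeit.timeit(lambda: with_while(10000), number=100))
```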

> - I thought it was just some loop construct, and that maths is slow in
> python - so avoid pathological looping.

You'd be wise to avoid doing unnecessary work deep within nested loops
in any programming language. Sadly, it's not possible for the compiler
to work out that some calculations can be lifted out of the innermost
loops in Python, since the various guarantees that make such
operations possible in other languages are not offered by Python.

> I remember the first time I tried Objective-C on OS X I used the
> NSNumber class for arithmetic - the results that time were pretty
> awful too!

Obviously, it'd be a fairer test if you used comparable numeric types
in both implementations, but a more capable numeric type would be
overkill for the C implementation of this program, and a less capable
numeric type doesn't exist for Python unless you're willing to use a
dialect such as Pyrex (as others have shown).

Paul

Bruno Desthuilliers

Sep 3, 2007, 10:26:30 AM
Martin v. Löwis wrote:

>>> (2) it is an interpreted language
>> Not quite. It's compiled to byte-code - just like Java (would you call
>> Java an 'interpreted language' ?)
>
> Python is not implemented like Java. In Java (at least in HotSpot),
> the byte code is further compiled to machine code before execution;

This is a VM-specific feature.

> in Python, the byte code is interpreted.

Idem.

> Whether this makes Python an interpreter or a compiler,
> I don't know.

This is an old troll^Mdiscussion !-)

Now IANAL, but AFAIK, the byte-code compilation stage can make a great
difference performance-wise, and for the same language, a byte-compiled
implementation is likely to be faster than a purely interpreted one, at
least because of the saving on parsing time (please someone correct me
if I'm wrong) ...
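The parsing saving is easy to demonstrate: evaluating a source string re-parses and re-compiles it on every call, while a pre-compiled code object goes straight to the byte-code. A minimal sketch:

```python
import timeit

src = "sum(i * i for i in range(100))"

# Re-parse + re-compile + run, on every call:
reparsed = timeit.timeit(lambda: eval(src), number=2000)

# Compile once, then only run the byte-code:
code = compile(src, "<expr>", "eval")
precompiled = timeit.timeit(lambda: eval(code), number=2000)

print(reparsed, precompiled)
```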

Robert Brown

Sep 11, 2007, 4:29:11 AM
Neil Cerutti <hor...@yahoo.com> writes:

> On 2007-09-02, Steven D'Aprano
> <st...@REMOVE-THIS-cybersource.com.au> wrote:
>> A big question mark in my mind is Lisp, which according to
>> aficionados is just as dynamic as Python, but has native
>> compilers that generate code running as fast as highly
>> optimized C.

> Lisp, as far as I know, requires type declarations, discipline,
> deep knowledge of Lisp, and more than passing knowledge of your
> Lisp implementation in order to generate code that's competitive
> with C.

On my Mac Mini, the original Python code runs in 6 minutes 37 seconds using
Python 2.3.5. The Common Lisp code below, a straightforward translation,
containing *no* type declarations, runs in 27 seconds on the same Mini using
SBCL.

When the commented out optimization declaration is included in the code, the
run time drops to 3.3 seconds. For comparison, run times with GCC on the C
version posted earlier are 3.5 seconds unoptimized and 0.58 seconds with
optimization flag "-O3".

So for this example, deep knowledge of the Lisp implementation and type
declarations are not required to get speed equivalent to unoptimized C code.
Approaching the speed of optimized C code does require more work.

====================

(defun doit ()
  ;; (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let ((solutions (make-array 1001 :initial-element 0)))
    (loop for a upfrom 1 below 1000
          do (loop for b upfrom 1 below (- 1000 a)
                   do (loop for c upfrom 1 below (- 1000 a b)
                            do (let ((p (+ a b c)))
                                 (when (and (< p 1000)
                                            (= (+ (* a a) (* b b)) (* c c)))
                                   (incf (aref solutions p)))))))
    (loop with max-index = 0
          with max = 0
          for solution across solutions
          for index upfrom 0
          do (when (> solution max)
               (setf max solution)
               (setf max-index index))
          finally (print max-index))))

Neil Cerutti

Sep 11, 2007, 9:57:55 AM

That's pretty cool. Thanks for the example.

--
Neil Cerutti

paulhankin

Sep 19, 2007, 12:59:44 PM
On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote:
> ... right-angled triangle puzzle solver
>
> max = 0
> maxIndex = 0
> index = 0
> for solution in solutions:
>     if solution > max:
>         max = solution
>         maxIndex = index
>     index += 1
>
> print maxIndex

Not that it solves your slowness problem, or has anything to
do with the question you asked :), but you can replace this
last segment of your code with:

print solutions.index(max(solutions))

--
Paul Hankin
