Python is darn fast (was: How fast is Python)

Michele Simionato

unread,

Aug 23, 2003, 11:37:44 AM8/23/03

to

I posted this few weeks ago (remember the C Sharp thread?) but it went
unnoticed on the large mass of posts, so let me retry. Here I get Python+
Psyco twice as fast as optimized C, so I would like to now if something
is wrong on my old laptop and if anybody can reproduce my results.
Here are I my numbers for calling the error function a million times
(Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):

$ time p23 erf.py
real 0m0.614s
user 0m0.551s
sys 0m0.029s

This is twice as fast as optimized C:

$ gcc erf.c -lm -o3
$ time ./a.out
real 0m1.125s
user 0m1.086s
sys 0m0.006s

Here is the situation for pure Python

$time p23 erf.jy
real 0m25.761s
user 0m25.012s
sys 0m0.049s

and, just for fun, here is Jython performance:

$ time jython erf.jy
real 0m42.979s
user 0m41.430s
sys 0m0.361s

The source code follows (copied from Alex Martelli's post):

----------------------------------------------------------------------

$ cat erf.py
import math
import psyco
psyco.full()

def erfc(x):
exp = math.exp

p = 0.3275911
a1 = 0.254829592
a2 = -0.284496736
a3 = 1.421413741
a4 = -1.453152027
a5 = 1.061405429

t = 1.0 / (1.0 + p*x)
erfcx = ( (a1 + (a2 + (a3 +
(a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
return erfcx

def main():
erg = 0.0

for i in xrange(1000000):
erg += erfc(0.456)

if __name__ == '__main__':
main()

--------------------------------------------------------------------------

# python/jython version = same without "import psyco; psyco.full()"

--------------------------------------------------------------------------

$cat erf.c
#include <stdio.h>
#include <math.h>

double erfc( double x )
{
double p, a1, a2, a3, a4, a5;
double t, erfcx;

p = 0.3275911;
a1 = 0.254829592;
a2 = -0.284496736;
a3 = 1.421413741;
a4 = -1.453152027;
a5 = 1.061405429;

t = 1.0 / (1.0 + p*x);
erfcx = ( (a1 + (a2 + (a3 +
(a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x);

return erfcx;
}

int main()
{
double erg=0.0;
int i;

for(i=0; i<1000000; i++)
{
erg = erg + erfc(0.456);
}

return 0;
}

Michele Simionato, Ph. D.
MicheleS...@libero.it
http://www.phyast.pitt.edu/~micheles
--- Currently looking for a job ---

Irmen de Jong

unread,

Aug 23, 2003, 12:07:04 PM8/23/03

to

Michele Simionato wrote:
> I posted this few weeks ago (remember the C Sharp thread?) but it went
> unnoticed on the large mass of posts, so let me retry. Here I get Python+
> Psyco twice as fast as optimized C, so I would like to now if something
> is wrong on my old laptop and if anybody can reproduce my results.

I can. :-)

I had to increase the loop counter by a factor of 10 because it
ran too fast on my machine (celeron 533 mhz), and added a print statement
of the accumulated sum (erg). These are my results:

[irmen@atlantis]$ gcc -O3 -march=pentium2 -mcpu=pentium2 -lm erf.c

[irmen@atlantis]$ time ./a.out
5190039.338694
4.11user 0.00system 0:04.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (103major+13minor)pagefaults 0swaps

[irmen@atlantis]$ time python2.3 erf.py
5190039.33869
2.91user 0.01system 0:02.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (544major+380minor)pagefaults 0swaps

This is with gcc 3.2.2 on Mandrake 9.1.

While Python + Psyco is not twice as fast as compiled & optimized C,
it's still faster by almost 30% on my system, which is still great!!

--Irmen

Lawrence Oluyede

unread,

Aug 23, 2003, 12:30:05 PM8/23/03

to

Michele Simionato wrote:

> $ time p23 erf.py
> real 0m0.614s
> user 0m0.551s
> sys 0m0.029s
>
> This is twice as fast as optimized C:
>
> $ gcc erf.c -lm -o3
> $ time ./a.out
> real 0m1.125s
> user 0m1.086s
> sys 0m0.006s
>
> Here is the situation for pure Python
>
> $time p23 erf.jy
> real 0m25.761s
> user 0m25.012s
> sys 0m0.049s
>
> and, just for fun, here is Jython performance:
>
> $ time jython erf.jy
> real 0m42.979s
> user 0m41.430s
> sys 0m0.361s

Mmm...on my machine C is faster. What version of GCC do you have? I think
2.9x, right?

These are my timings (Debian GNU Linux Unstable, Duron 1300, Python2.3,
Psyco CVS, GCC 3.3.2, Java 1.4.1):

$ time python erf.py

real 0m0.251s
user 0m0.207s
sys 0m0.012s

$ gcc erf.c -lm -O3

$ time ./a.out

real 0m0.162s

user 0m0.157s
sys 0m0.001s

Notice that C is faster than Psyco + Python2.3 on my machine (about 65% of
speedup)

Without Psyco Python2.3 tooks about 6 seconds

$ time python erf.jy

real 0m6.177s
user 0m6.040s
sys 0m0.010s

And Jython is definitely slower :)

$ time jython erf.jy

real 0m10.423s
user 0m9.506s
sys 0m0.197s

--
Lawrence "Rhymes" Oluyede
http://loluyede.blogspot.com
rhy...@NOSPAMmyself.com

P...@draigbrady.com

unread,

Aug 23, 2003, 1:54:59 PM8/23/03

to

Michele Simionato wrote:
> I posted this few weeks ago (remember the C Sharp thread?) but it went
> unnoticed on the large mass of posts, so let me retry. Here I get Python+

> Psyco twice as fast as optimized C
>
> $ gcc erf.c -lm -O3

try a 3.x series gcc with the appropriate -march=pentium3
You'll be pleasently surprised. I can't understand how
the sudden improvment of gcc code generation lately hasn't
been hyped more? If you want to try different machines
then http://www.pixelbeat.org/scripts/gcccpuopt will give
you the appropriate machine specific gcc options to use.
Note also -ffast-math might help a lot in this application?

cheers,
Pádraig.

Lawrence Oluyede

unread,

Aug 23, 2003, 2:27:32 PM8/23/03

to

P...@draigBrady.com wrote:

> If you want to try different machines
> then http://www.pixelbeat.org/scripts/gcccpuopt will give
> you the appropriate machine specific gcc options to use.

Very cool script, thanks :) Anyway it didn't change so much with erf.c

$ time ./erf

real 0m0.190s
user 0m0.157s
sys 0m0.001s

$ time ./erfCPU

real 0m0.180s
user 0m0.146s
sys 0m0.002s

erfCPU is compiled with the flags suggested by gcccpuopt script:

$ gcccpuopt
-march=athlon-xp -mfpmath=sse -msse -mmmx -m3dnow

Irmen de Jong

unread,

Aug 23, 2003, 3:08:55 PM8/23/03

to

P...@draigBrady.com wrote:

> try a 3.x series gcc with the appropriate -march=pentium3
> You'll be pleasently surprised.

In my other reply I mentioned that I still get a Python+Psyco
advantage of 30% over a gcc 3.2.2 compiled version.
My gcc is doing a lot better than Michele's reported 50% difference,
but Python+Psyco still wins :-)

--Irmen

John J. Lee

unread,

Aug 23, 2003, 7:31:15 PM8/23/03

to

So, the interesting part is: why?

John

Simon Burton

unread,

Aug 23, 2003, 8:14:07 PM8/23/03

to

On Sun, 24 Aug 2003 00:31:15 +0100, John J. Lee wrote:

> Irmen de Jong <irmen@-NOSPAM-REMOVETHIS-xs4all.nl> writes:
>
>> P...@draigBrady.com wrote:
>>

...

>> but Python+Psyco still wins :-)
>
> So, the interesting part is: why?
>
>
> John

My suspicion is that when psyco looks at erfc, it
finds that nothing changes and so replaces the
function call with the resulting number (am i right? it's the
same each time?). This is what a "specializing compiler"
would do, me thinks. So, try using a different number
with each call.

Simon.

Geoff Howland

unread,

Aug 23, 2003, 9:55:51 PM8/23/03

to

I showed this to a more low-level friend and he said it's all down to
exp(). He says it's an awfully slow C call, and if you were to take
it out you would see a real change. Since youre measuring ALU type
things anyway, it's better to keep them to real ALUs. Otherwise
picking a different problem to solve, or not using exp() would make
the test more valid.

He also stated that just linking the library might take some
non-trivial amount of time out of the C implementation.

I'm fluent in C, but not into doing these kinds of things so I didn't
investigate myself, just mouthpiecing it on in case it's useful for
you in your question.

Though since you're just running time on it, I would assume the same
is true for launching the Python interpretter. Seems to me that maybe
time from the command prompt isnt the best measuring tool because of
those loading issues . Checking start/stop time once the program is
initiated would rule those out.

-Geoff Howland
http://ludumdare.com/

Van Gale

unread,

Aug 23, 2003, 11:59:51 PM8/23/03

to

Michele Simionato wrote:
> I posted this few weeks ago (remember the C Sharp thread?) but it went
> unnoticed on the large mass of posts, so let me retry. Here I get Python+
> Psyco twice as fast as optimized C, so I would like to now if something
> is wrong on my old laptop and if anybody can reproduce my results.
> Here are I my numbers for calling the error function a million times
> (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):
>

> $ gcc erf.c -lm -o3

Did you really use "-o3" instead of "-O3"? The lowercase -o3 will
produce object code file named "3" instead of doing optimization.

Also, code optimization with gcc really seems to be a black art. It's
just very application and machine specific as many Gentoo Linux users
are discovering. For instance "-Os" can be much faster than "-O[23]"
for some applications run on machines that have fast CPU's with low
cache. It also seems that -O2 is faster in many cases than -O3 :/

Regardless, if psyco can speed things up to even unoptimized C range
that's pretty impressive.

Van

Van Gale

unread,

Aug 24, 2003, 12:07:38 AM8/24/03

to

Lawrence Oluyede wrote:
> P...@draigBrady.com wrote:
>
>>If you want to try different machines
>>then http://www.pixelbeat.org/scripts/gcccpuopt will give
>>you the appropriate machine specific gcc options to use.
>
> Very cool script, thanks :) Anyway it didn't change so much with erf.c

> erfCPU is compiled with the flags suggested by gcccpuopt script:
>
> $ gcccpuopt
> -march=athlon-xp -mfpmath=sse -msse -mmmx -m3dnow

You still need some -O optimization flags. The -m options just let gcc
generate some nice instructions specific to your Athlon CPU.

Also, I don't think that script is all that useful because at least some
(if not all) of those -m options are already implied by -march=athlon-xp
(I don't recall which ones off the top of my head but I'll find a
reference for anyone interested... you can also find out by looking at
the gcc command line option parsing code).

Anyone who wants some other good ideas for the best flags on their
machine check out ccbench:

http://www.rocklinux.net/packages/ccbench.html

The problem here of course is that not all applications behave like the
benchmarks :(

Van Gale

Lawrence Oluyede

unread,

Aug 24, 2003, 2:57:26 AM8/24/03

to

Van Gale wrote:

> You still need some -O optimization flags. The -m options just let gcc
> generate some nice instructions specific to your Athlon CPU.

I didn't mention but I also used -O3 flag. I don't know why but on my
machine C code is faster than psyco code in this test

Lawrence Oluyede

unread,

Aug 24, 2003, 3:09:03 AM8/24/03

to

Van Gale wrote:

> It also seems that -O2 is faster in many cases than -O3 :/

Yeah, you're right. I ran ccbench on my machine and it told me that the best
is gcc -O2 -march=athlon-mp (I have a Duron not an MP :) I tried but it's
slower than the one compiled with the gcccpuopt suggested flags.

It's funny :)

Michele Simionato

unread,

Aug 24, 2003, 4:42:04 AM8/24/03

to

Van Gale <ne...@exultants.org> wrote in message news:<XKW1b.4512$TE6....@newssvr27.news.prodigy.com>...

> Michele Simionato wrote:
> > I posted this few weeks ago (remember the C Sharp thread?) but it went
> > unnoticed on the large mass of posts, so let me retry. Here I get Python+
> > Psyco twice as fast as optimized C, so I would like to now if something
> > is wrong on my old laptop and if anybody can reproduce my results.
> > Here are I my numbers for calling the error function a million times
> > (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):
> >
> > $ gcc erf.c -lm -o3
>
> Did you really use "-o3" instead of "-O3"? The lowercase -o3 will
> produce object code file named "3" instead of doing optimization.

Yes, I used -O3, this was a misprint in the e-email. The compiler was
gcc 2.96.

Michele Simionato

unread,

Aug 24, 2003, 7:29:15 AM8/24/03

to

Your remark made me check which a.out I was timing (I had two
versions,
an optimized one and a standard one). Since I had compiled them one
month
ago, I made some mistake and the numbers I posted refer to the
non-optimized
version, actually :-( Sorry about that, it seemed too good to be true
!
(a factor of 2 in favor of Python!). With the respect to the -O3
optimized version the gain of Python over C is only of 15-20% (Irmen
de Iong
reported 30% on his machine). I am sure that with a newer compiler and
smart switches one can revert the ratio. But the message is still the
same: on this numerical computation, taken from a real life case, C
does not have any significant advantage over Python. Maybe the reason
is
the slow C implementation of exp or some other accident, but still
this
is a quite impressive result.

Michele

Michele Simionato

unread,

Aug 24, 2003, 10:38:49 AM8/24/03

to

I finally came to the conclusion that the exceeding good performance
of Psyco was due to the fact that the function was called a million
times with the *same* argument. Evidently Psyco is smart enough to
notice that. Changing the argument at each call
(erfc(0.456) -> i/1000000.0) slows down Python+Psyco at 1/4 of C speed.
Psyco improves Python performance by an order of magnitude, but still it
is not enough :-(

I was too optimistic!

Here I my numbers for Python 2.3, Psyco 1.0, Red Hat Linux 7.3,
Pentium II 366 MHz:

$ time p23 erf.py
real 0m3.245s
user 0m3.164s
sys 0m0.037s

This is more than four times slower than optimized C:

$ gcc erf.c -lm -O3
$ time ./a.out

real 0m0.742s
user 0m0.725s
sys 0m0.002s

Here is the situation for pure Python

$time p23 erf.jy
real 0m27.470s
user 0m27.162s
sys 0m0.023s

and, just for fun, here is Jython performance:

$ time jython erf.jy
real 0m44.395s
user 0m42.602s
sys 0m0.389s

----------------------------------------------------------------------

$ cat erf.py
import math
import psyco
psyco.full()

def erfc(x):
exp = math.exp

p = 0.3275911
a1 = 0.254829592
a2 = -0.284496736
a3 = 1.421413741
a4 = -1.453152027
a5 = 1.061405429

t = 1.0 / (1.0 + p*x)
erfcx = ( (a1 + (a2 + (a3 +
(a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
return erfcx

def main():
erg = 0.0

for i in xrange(1000000):
erg += erfc(i/1000000.0)

--------------------------------------------------------------------------

return erfcx;
}

for(i=0; i<1000000; i++)
{
erg = erg + erfc(i/1000000.0);
}

return 0;
}

Michele Simionato, Ph. D.
MicheleS...@libero.it

http://www.phyast.pitt.edu/~micheles/
---- Currently looking for a job ----

Alan Gauld

unread,

Aug 24, 2003, 12:33:24 PM8/24/03

to

On 24 Aug 2003 07:38:49 -0700, mi...@pitt.edu (Michele Simionato)
wrote:

> I finally came to the conclusion that the exceeding good performance
> of Psyco was due to the fact that the function was called a million
> times with the *same* argument.

Still an interesting thread.

> Here is the situation for pure Python

> user 0m27.162s
> sys 0m0.023s
>
> and, just for fun, here is Jython performance:

> user 0m42.602s
> sys 0m0.389s

Just as a matter of interest what happens if Jython code is
compiled to Java and run under the JVM rather than the Jython
interpreter? Does that increase or decrease the speed.

I've been toying with the idea of moving to Jython but the speed
hit is one factor (of several) that stops me.

Alan G.

Author of the Learn to Program website
http://www.freenetpages.co.uk/hp/alan.gauld

Tim Hochberg

unread,

Aug 24, 2003, 4:40:24 PM8/24/03

to

Michele Simionato wrote:
> I finally came to the conclusion that the exceeding good performance
> of Psyco was due to the fact that the function was called a million
> times with the *same* argument. Evidently Psyco is smart enough to
> notice that. Changing the argument at each call
> (erfc(0.456) -> i/1000000.0) slows down Python+Psyco at 1/4 of C speed.
> Psyco improves Python performance by an order of magnitude, but still it
> is not enough :-(

This is not suprising. Last I checked, Psyco does not fully compile
floating point expressions. If, I rememeber correctly (though every time
try to delve too deeply into Psyco my brains start oozing out my ears),
there are three ways a in which a given chunk of code evaluated. At one
level, which I'll call #1, Psyco generates the machine code(*) for the
expression. At a second level, Psyco calls out to C helper functions,
but still works with unboxed values. At the third level, Psyco punts and
creates a Python object and hands things off to the interpreter.

Most integer functions operate at level #1, so they tend to be quite
fast. Most floating point operations operate at level #2, so they have a
certain amount of overhead, but are still much faster than unpsyco
(sane?) Python. I believe the reason for this is that x86 floating point
operations are very messy, so Armin punted...

(*) Armin is working on virtual machine implementation of Psyco, so it
should be available on non x86 machines soon.

FWIW,

-tim

dan

unread,

Aug 24, 2003, 10:52:18 PM8/24/03

to

mi...@pitt.edu (Michele Simionato) wrote in message
news:<2259b0e2.03082...@posting.google.com>...

> I finally came to the conclusion that the exceeding good performance
> of Psyco was due to the fact that the function was called a million
> times with the *same* argument. Evidently Psyco is smart enough to
> notice that. Changing the argument at each call
> (erfc(0.456) -> i/1000000.0) slows down Python+Psyco at 1/4 of C speed.
> Psyco improves Python performance by an order of magnitude, but still it
> is not enough :-(
>

It's plenty! A factor of 4 from optimized C, considering the newness
and limited resources behind psyco, is very encouraging, and good
enough for most tasks. Java JIT compilers are still around a factor
of 2 slower than C, and they've had at least 2 orders of magnitude
more whumpage.

This is a far cry from the factor of 10-30 I've been seeing with pure
python. For performance-critical code, this could be the difference
between hand-coding 5% versus 20% of your code.

Excellent news!!

John J. Lee

unread,

Aug 26, 2003, 4:59:23 PM8/26/03

to

danb...@yahoo.com (dan) writes:

> mi...@pitt.edu (Michele Simionato) wrote in message
> news:<2259b0e2.03082...@posting.google.com>...

[...]

> This is a far cry from the factor of 10-30 I've been seeing with pure
> python. For performance-critical code, this could be the difference
> between hand-coding 5% versus 20% of your code.
>
> Excellent news!!

If you care about this a lot, don't forget Pyrex.

John

dan

unread,

Aug 26, 2003, 11:58:35 PM8/26/03

to

right, pyrex -- looked at that a while ago. Compiled Python with
C-style type declarations, right? Kinda like common lisp??? (I'm
stretching my memory cells now)

will review

j...@pobox.com (John J. Lee) wrote in message news:<87wud05...@pobox.com>...