# Python speed vs csharp


### Mike

31 Jul 2003, 02:09:22
Bear with me: this post is moderately long, but I hope it is relatively
succinct.

I've been using Python for several years as a behavioral modeling tool for
the circuits I design. So far, it's been a good trade-off: compiled C++
would run faster, but the development time of Python is so much faster, and
the resulting code is so much more reliable after the first pass, that I've
never been tempted to return to C++. Every time I think stupid thoughts
like, "I'll bet I could do this in C++," I get out my copy of Scott Meyers'
"Effective C++," and I'm quickly reminded why it's better to stick with
Python (Meyers is a very good author, but points out lots of quirks and
pitfalls with C++ that I keep thinking that I shouldn't have to worry
about, much less try to remember). Even though Python is wonderful in that
regard, there are problems.

Here's the chunk of code that I'm spending most of my time executing:

    import math

    # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
    # Fifth order approximation. |error| <= 1.5e-7 for all x
    #
    def erfc( x ):
        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        t = 1.0 / (1.0 + p*float(x))
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
        return erfcx

This is an error function approximation, which gets called around 1.5
billion times during the simulation, and takes around 3500 seconds (just
under an hour) to complete. While trying to speed things up, I created a
simple test case with the code above and a main function to call it 10
million times. The code takes roughly 210 seconds to run.
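A driver of the kind Mike describes could be sketched as follows. His actual main function was not posted, so the loop, the fixed argument 0.5, and the names `time_erfc` and `n_calls` are assumptions; the sketch is also written for a current Python 3 rather than the 2003-era interpreter used in the thread.

```python
import math
import time


def erfc(x):
    """Abramowitz & Stegun 7.1.26 approximation, as in the post above."""
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429

    t = 1.0 / (1.0 + p * float(x))
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * math.exp(-(x ** 2))


def time_erfc(n_calls):
    """Call erfc n_calls times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n_calls):
        erfc(0.5)  # fixed argument: an assumption, not Mike's real input
    return time.perf_counter() - start
```

Calling `time_erfc(10000000)` would correspond to the 10-million-call run; the absolute numbers depend entirely on the machine and interpreter.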

The current execution time is acceptable, but I need to increase the
complexity of the simulation, and will need to increase the number of data
points by around 20X, to roughly 30 billion. This will increase the
simulation time to over a day. Since the test case code was fairly small, I
translated it to C and ran it. The C code runs in approximately 7.5
seconds. That's compelling, but C isn't: part of my simulation includes a
parser to read an input file. I put that together in a few minutes in
Python, but there are no corresponding string or regex libraries with my C
compiler, so converting my Python code would take far more time than I'd
save during the resulting simulations.

On a lark, I grabbed the Mono C# compiler, and converted my test case to
C#. Here's the corresponding erfc code:

    public static double erfc( double x )
    {
        double p, a1, a2, a3, a4, a5;
        double t, erfcx;

        p = 0.3275911;
        a1 = 0.254829592;
        a2 = -0.284496736;
        a3 = 1.421413741;
        a4 = -1.453152027;
        a5 = 1.061405429;

        t = 1.0 / (1.0 + p*x);
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * Math.Exp(-Math.Pow(x,2.0));
        return erfcx;
    }

Surprisingly (to me, at least), this code executed 10 million iterations in
8.5 seconds - only slightly slower than the compiled C code.

My first question is, why is the Python code, at 210 seconds, so much
slower?

My second question is, is there anything that can be done to get Python's
speed close to the speed of C#?

-- Mike --

### Martin v. Löwis

31 Jul 2003, 02:29:26
Mike <mi...@nospam.com> writes:

> My first question is, why is the Python code, at 210 seconds, so much
> slower?

One would have to perform profiling, but in this case, it is likely
because numbers are objects in Python, so each multiplication results
in a memory allocation. In addition, as soon as the parameters of the
multiplication are not needed anymore, you get a deallocation of the
temporaries.

There is also an overhead for the byte code interpretation, but this
is likely less significant.

> My second question is, is there anything that can be done to get Python's
> speed close to the speed of C#?

C# implementations typically do just-in-time compilation to machine
code. They represent numbers as primitive (machine) values, directly
using machine operations for the multiplication. Doing the same for
Python is tricky.

I recommend that you try out Python 2.3. It has significantly improved
memory allocation mechanisms, so you should see some speed-up.

You could also try Psyco, which is a just-in-time compiler for
Python. It should give very good results in this case, also.

Regards,
Martin

### Sami Hangaslammi

31 Jul 2003, 05:18:34
Mike <mi...@nospam.com> wrote in message news:<rkdc7poiachh.1x...@40tude.net>...

> My second question is, is there anything that can be done to get Python's
> speed close to the speed of C#?

If you have a single function that is easy to convert to C, you
should. Wrapping it with, for example, Pyrex
(http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/) is easy enough,

### Alexandre Fayolle

31 Jul 2003, 05:22:41
In article <rkdc7poiachh.1x...@40tude.net>, Mike wrote:

> This is an error function approximation, which gets called around 1.5
> billion times during the simulation, and takes around 3500 seconds (just
> under an hour) to complete. While trying to speed things up, I created a
> simple test case with the code above and a main function to call it 10
> million times. The code takes roughly 210 seconds to run.

Hi there.

This is pure numerical computation, it is therefore a job for psyco
(http://psyco.sf.net/)

I've made a quick test, cut'n'pasted your code in a small python module,
--------------------------8<----------------------
    def run():
        results = []
        for i in xrange(10000000):
            x = i/100. # apply erfc to a float
            y = erfc(x)

    if __name__ == '__main__':
        run()
-------------------------8<-----------------------
at the end. So I'm basically calling erfc 1 million times, and not doing
much with the result. On my older machine (1.1GHz Celeron), this takes
13 seconds.

By adding the following 2 lines at the top of the script:
---8<-------
import psyco
psyco.full()
----8<------
I get down to under 4 seconds.

This is obviously not as good as you'd get with c#, but it's still a
great gain with not too much work, I think.

--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Advanced software development - Artificial Intelligence - Training

### Achim Domma

31 Jul 2003, 05:24:39
"Martin v. Löwis" <mar...@v.loewis.de> wrote in message
news:m3smonm...@mira.informatik.hu-berlin.de...

> Mike <mi...@nospam.com> writes:
> I recommend that you try out Python 2.3. It has significantly improved
> memory allocation mechanisms, so you should see some speed-up.
>
> You could also try Psyco, which is a just-in-time compiler for
> Python. It should give very good results in this case, also.

Another option would be to implement the critical function in C++ and make
it available to python. This could be done for example via boost.python. See
http://www.boost.org/libs/python/doc/tutorial/index.html It's much easier
than you would expect.

Achim

### Riccardo

31 Jul 2003, 05:44:08
Another option would be SWIG (www.swig.org).
Write a simple function, pass the header file (sometimes you do not need to
modify it at all) to SWIG, and you have your source ready to be compiled with
distutils (for example).
I used that approach several times.

I had a look at pyrex and BOOST.
I think that pyrex sounds very interesting if you use C (though I have NEVER
used it myself except for some stupid demo/test), while I found the BOOST
learning curve (and installation, etc.) a bit steep. Anyway, this is just
what I felt using the two of them.

Riccardo

"Mike" <mi...@nospam.com> wrote in message
news:rkdc7poiachh.1x...@40tude.net...

### Guenther Starnberger

31 Jul 2003, 07:34:51
Mike <mi...@nospam.com> wrote in message news:<rkdc7poiachh.1x...@40tude.net>...

> The current execution time is acceptable, but I need to increase the

> complexity of the simulation, and will need to increase the number of data
> points by around 20X, to roughly 30 billion. This will increase the
> simulation time to over a day. Since the test case code was fairly small, I
> translated it to C and ran it. The C code runs in approximately 7.5
> seconds. That's compelling, but C isn't: part of my simulation includes a
> parser to read an input file. I put that together in a few minutes in
> Python, but there are no corresponding string or regex libraries with my C
> compiler, so converting my Python code would take far more time than I'd
> save during the resulting simulations.

just write the time critical code in C and use SWIG
(http://www.swig.org/) to generate a wrapper which allows you to call

/gst

### John J. Lee

31 Jul 2003, 07:43:36
Mike <mi...@nospam.com> writes:
[...]

> Here's the chunk of code that I'm spending most of my time executing:
>
> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> # Fifth order approximation. |error| <= 1.5e-7 for all x
> #
> def erfc( x ):
> p = 0.3275911
> a1 = 0.254829592
> a2 = -0.284496736
> a3 = 1.421413741
> a4 = -1.453152027
> a5 = 1.061405429
>
> t = 1.0 / (1.0 + p*float(x))
> erfcx = ( (a1 + (a2 + (a3 +
> (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> return erfcx
[...]

> million times. The code takes roughly 210 seconds to run.
[...]

> On a lark, I grabbed the Mono C# compiler, and converted my test case to
> C#. Here's the corresponding erfc code:
>
> public static double erfc( double x )
[...]

> Surprisingly (to me, at least), this code executed 10 million iterations in
> 8.5 seconds - only slightly slower than the compiled C code.
[...other people explained why Python is slow at this...]

> My second question is, is there anything that can be done to get Python's
> speed close to the speed of C#?

Pyrex site seems to be down ATM so you'll have to wait to try it, but
on my machine 1E7 calls takes 10 seconds (about 60 seconds for the
original). It's starting to annoy me that Pyrex doesn't have an
initialisation syntax, though...

BTW, if you want to know why Python is so slow at this, it's highly
instructive to, for example, comment out the 'cdef double t', and look
at the generated erfc.c code with and without that line. Normal
Python code does essentially the same thing as the version without the
Pyrex cdef, except of course all those Py_ functions aren't written
down in a static C program like this.

numarray (successor to Numeric) is also worth a look -- but only buys
you anything when processing big arrays, of course, not when writing
functions that take scalar arguments.

-----start of erfc.pyx-----
    cdef extern from "math.h":
        double exp(double x)

    # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
    # Fifth order approximation. |error| <= 1.5e-7 for all x
    #

    cdef double p
    cdef double a1
    cdef double a2
    cdef double a3
    cdef double a4
    cdef double a5

    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429

    def erfc(double x):
        cdef double t

        t = 1.0 / (1.0 + p*x)

        return ( (a1 + (a2 + (a3 + (a4 + a5*t)*t)*t)*t)*t ) * exp(-(x**2))
-----end of erfc.pyx-----
-----start of Makefile-----
    all:
    	python setup.py build_ext --inplace

    clean:
    	rm -f erfc.c *.o *.so *~ core
    	rm -rf build
-----end of Makefile-----
-----start of setup.py-----
    from distutils.core import setup, Extension
    from Pyrex.Distutils import build_ext

    setup(name = "erfc",
          ext_modules = [Extension("erfc", sources=["erfc.pyx"])],
          cmdclass = {'build_ext': build_ext}
          )
-----end of setup.py-----
-----start of erfc.py-----
    #!/usr/bin/env python

    from erfc import erfc
    from time import time

    t = time()

    val = 0.5

    # 1E7 calls
    for i in xrange(1000000):
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)
        erfc(val)

    t = time() - t
    print "total time", t
-----end of erfc.py-----

John

### Juha Autero

31 Jul 2003, 07:00:54
Mike <mi...@nospam.com> writes:

> Since the test case code was fairly small, I
> translated it to C and ran it. The C code runs in approximately 7.5
> seconds. That's compelling, but C isn't: part of my simulation includes a
> parser to read an input file. I put that together in a few minutes in
> Python, but there are no corresponding string or regex libraries with my C
> compiler, so converting my Python code would take far more time than I'd
> save during the resulting simulations.

So, write only the most-executed parts in C, as a C extension to
Python. This is the reason I like Python. If your program is not fast
enough, you can implement critical functions in C. For example, the
error function would be very simple to turn into a C extension module
using the "Extending and Embedding the Python Interpreter" document from
<URL: http://www.python.org/doc/current/ext/ext.html>.

There are several tools that make writing C extensions easier. I've
used SWIG and Pyrex.

--
Juha Autero
http://www.iki.fi/jautero/
Eschew obscurity!

### Steven Taschuk

31 Jul 2003, 07:18:03
Quoth Mike:
[a number-crunching hotspot benchmarks at 210s in Python]

> translated it to C and ran it. The C code runs in approximately 7.5
> seconds. That's compelling, but C isn't: part of my simulation includes a
> parser to read an input file. I put that together in a few minutes in
> Python, but there are no corresponding string or regex libraries with my C
> compiler, so converting my Python code would take far more time than I'd
> save during the resulting simulations.

It seems you're speaking here as if the options are a pure-Python
program and a pure-C program. Is there a reason you can't move
just this one hotspot function out to C, leaving the parser etc.
in Python?

--
Steven Taschuk stas...@telusplanet.net
"Telekinesis would be worth patenting." -- James Gleick

### Richie Hindle

31 Jul 2003, 08:39:47

[Mike]

> is there anything that can be done to get Python's speed close
> to the speed of C#?

Using Pyrex takes a million loops from 7.1 seconds to 1.3 seconds for me:

------------------------------- mike.pyx -------------------------------

    cdef extern from "math.h":
        double exp(double x)

    def erfc(x):
        cdef double f_x, p, a1, a2, a3, a4, a5, t

        f_x = x

        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        t = 1.0 / (1.0 + p*f_x)
        return ( (a1 + (a2 + (a3 + (a4 + a5*t)*t)*t)*t)*t ) * exp(-(f_x**2))

------------------------------- setup.py -------------------------------

    # Run with "python setup.py build"
    from distutils.core import setup
    from distutils.extension import Extension
    from Pyrex.Distutils import build_ext

    setup(
        name = "Mike",
        ext_modules=[
            Extension("mike", ["mike.pyx"])
        ],
        cmdclass = {'build_ext': build_ext}
    )

------------------------------- test.py -------------------------------

    # This takes about 1.3 seconds on my machine.
    import time
    from mike import erfc

    if __name__ == '__main__':
        start = time.time()
        for i in xrange(1000000):
            erfc(123.456)
        print time.time() - start

-----------------------------------------------------------------------

Given that a million calls to "def x(): pass" take 0.9 seconds, the actual
calculation time is about 0.4 seconds, down from 6.2. Not as fast as your
C# results, but a lot better than you had before.

Using Psyco dropped from 7.1 seconds to 5.4 - not great, but not bad for
two lines of additional code:

    import psyco
    psyco.bind(erfc)

Psyco currently only works on Intel-x86 compatible processors.

Hope that helps,

--
Richie Hindle
ric...@entrian.com

### Steve Horsley

31 Jul 2003, 13:43:09
On Wed, 30 Jul 2003 23:09:22 -0700, Mike wrote:

> Bear with me: this post is moderately long, but I hope it is relatively
> succinct.
>
> I've been using Python for several years as a behavioral modeling tool for
> the circuits I design. So far, it's been a good trade-off: compiled C++
> would run faster, but the development time of Python is so much faster, and
> the resulting code is so much more reliable after the first pass, that I've
> never been tempted to return to C++. Every time I think stupid thoughts
> like, "I'll bet I could do this in C++," I get out my copy of Scott Meyers'
> "Effecive C++," and I'm quickly reminded why it's better to stick with
> Python (Meyers is a very good author, but points out lots of quirks and
> pitfalls with C++ that I keep thinking that I shouldn't have to worry
> about, much less try to remember). Even though Python is wonderful in that
> regard, there are problems.
>

It might be worth having a look at Jython. This allows python code to run
on a java VM by emitting JVM bytecode rather than standard python
interpreter bytecode. This gives you all the hotspot optimisations that
have been invested in java runtime performance.

Steve

### David M. Cooke

31 Jul 2003, 17:08:55
At some point, Mike <mi...@nospam.com> wrote:
> Here's the chunk of code that I'm spending most of my time executing:
>
> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> # Fifth order approximation. |error| <= 1.5e-7 for all x
> #
> def erfc( x ):
> p = 0.3275911
> a1 = 0.254829592
> a2 = -0.284496736
> a3 = 1.421413741
> a4 = -1.453152027
> a5 = 1.061405429
>
> t = 1.0 / (1.0 + p*float(x))
> erfcx = ( (a1 + (a2 + (a3 +
> (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> return erfcx

With python2.2, a million iterations takes 9.93 s on my machine.
With python2.3, 8.13 s.

Replacing float(x) with just x (multiplying by p (a float) is enough to
coerce x to a float), with python 2.3 I'm down to 6.98 s. Assigning
math.exp to a global variable exp cuts this down to 6.44 s:

    exp = math.exp
    def erfc(x):
        [...]

        t = 1.0 / (1.0 + p*x)

        erfcx = ... * exp(-x**2)
        return erfcx

You can go a little further and look up ways of optimising the
polynomial evaluation -- Horner's method is a good, general-purpose
technique, but you can get faster if you're willing to put some effort
into it. I believe Ralston and Rabinowitz's numerical analysis book has
a discussion on this (sorry, I don't have the complete reference; the
book's at home).
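For reference, the nested expression in the original erfc is already Horner's method applied to a polynomial in t with a zero constant term. A generic version might look like the sketch below; `horner`, `nested`, and `A` are names chosen here for illustration, and the code assumes a current Python 3.

```python
def horner(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + ... + coeffs[n]*t**n.

    Horner's method works from the highest-order coefficient down, so each
    coefficient costs one multiply and one add.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


# Coefficients from the post; the constant term is 0 because the
# original expression has the form (a1 + (a2 + ...)*t)*t.
A = (0.0, 0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def nested(t):
    """The hand-written nested form used in the original erfc."""
    a1, a2, a3, a4, a5 = A[1:]
    return (a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t
```

In pure Python the generic loop is likely to be slower than the hand-unrolled expression, so this is meant to show the structure rather than to be a speed-up.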

That's about the best you can do in pure python.

Now look at how you're calling this routine. Can you call it on a
bunch of numbers at once? Using Numeric arrays (http://numpy.org/),
calling the above routine on an array of a million numbers takes
1.6 s. Calling it 1000 times on an array of 1000 numbers takes 0.61 s.

Using erfc from scipy.special, this takes 0.65 s on an array of a
million numbers. Scipy (from http://scipy.org/) is a library of
scientific tools, supplementing Numeric. I believe the routine it uses
for erfc comes from the Cephes library (http://netlib.org/), which has
an error < 3.4e-14 for x in (-13,0). Calling it 1000 times on an array
of 1000 numbers takes 0.43 s.
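On a current Python 3 the standard library itself provides `math.erfc`, which postdates this thread. Assuming that is available, a quick check of the rational approximation against it could look like the sketch below; `erfc_approx` and `max_abs_error` are illustrative names, and the sampled range 0 to 5 is an arbitrary choice.

```python
import math


def erfc_approx(x):
    """Abramowitz & Stegun 7.1.26 approximation from the original post."""
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = (0.254829592 + (-0.284496736 + (1.421413741 +
            (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t
    return poly * math.exp(-(x * x))


def max_abs_error(lo=0.0, hi=5.0, steps=500):
    """Largest |erfc_approx(x) - math.erfc(x)| over evenly spaced samples."""
    worst = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        worst = max(worst, abs(erfc_approx(x) - math.erfc(x)))
    return worst
```

The result should come out close to the 1.5e-7 figure quoted in Mike's comment. Abramowitz & Stegun state that bound for x >= 0, so negative arguments are left out here.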

(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
fast as once on an array of 1000*1000 numbers.)

scipy.special.erfc on moderate-sized arrays is 18.9x faster. That's
pretty good :)

For numerical codes in Python, I *strongly* suggest that you use
Numeric (or numarray, its successor). It's _much_ better to pass 1000
numbers around as an aggregate, instead of 1 number 1000 times.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

### Bengt Richter

31 Jul 2003, 18:37:25

I just made a C extension out of erfc and only got about a 5-6 times improvement over the
python version, which makes me think that most of the time is in call setup and passing
parameters, which as you say involves memory allocation/deallocation.

I don't see that csharp could improve on that. The benefit would really only show up
full-fledged if the calling loop is using C calling methods, as in numeric array processing
that only makes one call from python and many in C, depending on array dimensions.

I wonder what it would take to have the Python VM do primary allocation of ints and doubles
on the stack C-style and only migrate them to heap if/when exported. Then you could
bypass exporting when calling a trusted C extension, and in a case like erfc
you could just make the raw C call and replace the value on top of the stack, and go on
from there until you hit an export trigger.

Regards,
Bengt Richter

### John Hunter

31 Jul 2003, 18:00:08
>>>>> "Mike" == Mike <mi...@nospam.com> writes:

Mike> My second question is, is there anything that can be done to
Mike> get Python's speed close to the speed of C#?

Can you use scipy? erfc is implemented in the scipy.stats module,
which runs about 3 times faster than the pure python version in my
simple tests.

JDH

### Bengt Richter

31 Jul 2003, 20:08:46
On Wed, 30 Jul 2003 23:09:22 -0700, Mike <mi...@nospam.com> wrote:

>Bear with me: this post is moderately long, but I hope it is relatively
>succinct.
>
>I've been using Python for several years as a behavioral modeling tool for
>the circuits I design. So far, it's been a good trade-off: compiled C++
>would run faster, but the development time of Python is so much faster, and
>the resulting code is so much more reliable after the first pass, that I've
>never been tempted to return to C++. Every time I think stupid thoughts
>like, "I'll bet I could do this in C++," I get out my copy of Scott Meyers'
>"Effecive C++," and I'm quickly reminded why it's better to stick with
>Python (Meyers is a very good author, but points out lots of quirks and
>pitfalls with C++ that I keep thinking that I shouldn't have to worry
>about, much less try to remember). Even though Python is wonderful in that
>regard, there are problems.
>
>Here's the chunk of code that I'm spending most of my time executing:
>

<snip>

Some thoughts...

Is there any order to the values of x you call with? Or are they totally
unrelated? (I.e., can the last value give you a good approximation for the
next, knowing the step in the argument? What are the limits to possible x values?
Have you tested how many terms in the approximation you really need to get usable results
(vs final tests)? |error| <= 1.5e-7 for all x sounds unnecessarily tight
for many engineering problems. What are the statistics of the errors on the
x values you are passing? Could you conceivably pre-compute a lookup table to
use instead of a function? E.g., this can be useful in turning a uniform distribution
into something else. Sometimes you can get away with a surprisingly small table for that.
(Plus you get to pre-eliminate weird outliers ;-)
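The lookup-table idea might be sketched like this. The spacing `STEP`, the cutoff `X_MAX`, and the names `TABLE` and `erfc_table` are all assumptions made for illustration, since the thread does not say what range of x the simulation uses; the code targets a current Python 3.

```python
import math


def erfc_approx(x):
    """Abramowitz & Stegun 7.1.26 approximation from the original post."""
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = (0.254829592 + (-0.284496736 + (1.421413741 +
            (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t
    return poly * math.exp(-(x * x))


STEP = 0.001    # assumed table spacing
X_MAX = 6.0     # assumed cutoff; erfc is treated as 0 beyond it
N_STEPS = int(round(X_MAX / STEP))

# Two extra entries so that TABLE[k + 1] exists at the upper edge.
TABLE = [erfc_approx(k * STEP) for k in range(N_STEPS + 2)]


def erfc_table(x):
    """Linearly interpolate erfc from TABLE; valid for x >= 0 only."""
    if x < 0.0:
        raise ValueError("table covers x >= 0 only")
    if x >= X_MAX:
        return 0.0
    pos = x / STEP
    k = int(pos)
    return TABLE[k] + (TABLE[k + 1] - TABLE[k]) * (pos - k)
```

Whether this actually beats the formula in pure Python would have to be measured; a table tends to pay off more once the loop itself runs in C or on arrays.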

Do you have many places in your code calling this function? 1.5 billion sounds
like mostly in a few places in some loops. If so, have you tried just putting
the approximation code in-line without a function call? That should save a
fair amount of time. Function calls are pretty expensive time-wise in python.
You can verify the relative cost by just returning a constant immediately from
the function.
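The check suggested above, timing a function that returns a constant immediately, could be sketched as follows for a current Python 3. `erfc_stub` and `seconds_for` are hypothetical names, and the fixed argument 0.5 is an assumption.

```python
import math
import time


def erfc(x):
    """Abramowitz & Stegun 7.1.26 approximation from the original post."""
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = (0.254829592 + (-0.284496736 + (1.421413741 +
            (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t
    return poly * math.exp(-(x * x))


def erfc_stub(x):
    """Returns a constant immediately, so only the call itself is timed."""
    return 1.0


def seconds_for(func, n_calls=100000):
    """Time n_calls calls of func(0.5) and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_calls):
        func(0.5)
    return time.perf_counter() - start
```

Comparing `seconds_for(erfc_stub)` with `seconds_for(erfc)` gives a rough idea of how much of the total is call overhead rather than arithmetic.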

Regards,
Bengt Richter

### Terry Reedy

31 Jul 2003, 20:51:16

"David M. Cooke" <cooked...@physics.mcmaster.ca> wrote in message
news:qnkn0eu...@arbutus.physics.mcmaster.ca...

> At some point, Mike <mi...@nospam.com> wrote:
> > Here's the chunk of code that I'm spending most of my time executing:
> >
> > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> > # Fifth order approximation. |error| <= 1.5e-7 for all x
> > #
> > def erfc( x ):
> > p = 0.3275911
> > a1 = 0.254829592
> > a2 = -0.284496736
> > a3 = 1.421413741
> > a4 = -1.453152027
> > a5 = 1.061405429
> >
> > t = 1.0 / (1.0 + p*float(x))
> > erfcx = ( (a1 + (a2 + (a3 +
> > (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> > return erfcx

Does inlining the constants give any speedup? or is local lookup so
fast that it does not matter?

TJR

### David M. Cooke

31 Jul 2003, 21:50:18

Should've thought of that too :-). It runs in 6.07 s now (my previous
version was 6.45 s, and the unmodified version above was 8.23 s).

Here's my best version:

    exp = math.exp
    def erfc( x ):
        t = 1.0 + 0.3275911*x
        return ( (0.254829592 + (
            (1.421413741 + (-1.453152027 + 1.061405429/t)/t)/t - 0.284496736)
            /t)/t ) * exp(-(x**2))

The reversal of the order of the a2 evaluation is because looking at
the bytecodes (using the dis module) it turns out that -0.284 was
being stored as 0.284, then negated. However, -1.45 is stored as that.
This runs in 5.77 s. You could speed that up a little if you're
willing to manipulate the bytecode directly :-)
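Inspecting the bytecode in the way described here might look like this on a current Python 3. The compiler has changed a great deal since 2.3, so the specific observation about -0.284 being stored as 0.284 and then negated may well not reproduce; `poly` and `opnames` are illustrative names.

```python
import dis


def poly(t):
    # Inner part of the polynomial, written with the constants inline.
    return (1.421413741 + (-1.453152027 + 1.061405429 / t) / t) / t - 0.284496736


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    dis.dis(poly)          # human-readable listing, as used in the thread
    print(opnames(poly))   # just the opcode names
```

Comparing the listings for two variants of an expression shows whether a rewrite actually changes what the interpreter executes.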

### Brendan Hahn

31 Jul 2003, 22:16:19
cooked...@physics.mcmaster.ca (David M. Cooke) wrote:
>(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
>fast as once on an array of 1000*1000 numbers.)

Probably cache effects.

--
brendan DOT hahn AT hp DOT com

### Bengt Richter

31 Jul 2003, 23:11:35
On Wed, 30 Jul 2003 23:09:22 -0700, Mike <mi...@nospam.com> wrote:

I recoded the above and also in-lined the code, and also made a C DLL module version
and timed them roughly:

====< erfcopt.py >=====================================================
    import math
    def erfc_old( x ):

        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        t = 1.0 / (1.0 + p*float(x))
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
        return erfcx

    # You know that all those "constant" assignments are
    # actually executed every time erfc is called,
    # right? To move that to def-time instead of
    # call-time, you can write

    def erfc( x, # constants bound at def time, rparen moved down, commas added ):
        p = 0.3275911,
        a1 = 0.254829592,
        a2 = -0.284496736,
        a3 = 1.421413741,
        a4 = -1.453152027,
        a5 = 1.061405429,
        exp = math.exp
    ): # just to make it stand out
        t = 1.0 / (1.0 + p*x) #XXX# don't need float(), since p is float
        return ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * exp(-x**2) #XXX# elim attr lookup on global math.exp
        # elim intermediate store of erfcx #XXX# return erfcx

    def test():
        from time import clock
        from erfc import erfc as erfc_in_c

        # sanity check
        for x in range(20): assert erfc_old(x) == erfc(x) and abs(erfc_in_c(x)-erfc_old(x))<1e-18

        t1=clock()
        for i in xrange(100000): erfc_old(1.2345)
        t2=clock()
        for i in xrange(100000): erfc(1.2345)
        t3=clock()
        # inlining code from old version:
        # constant assignments hoisted out of loop, of course

        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        x = 1.2345
        for i in xrange(100000):

            t = 1.0 / (1.0 + p*float(x))
            erfcx = ( (a1 + (a2 + (a3 +
                      (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))

        t4 = clock()
        for i in xrange(100000): erfc_in_c(1.2345)
        t5=clock()
        print 'old: %f, new: %f, inline: %f, in_c %f' %(t2-t1, t3-t2, t4-t3, t5-t4)

        import dis
        print '---- erfc_old code -----------------------------------'
        dis.dis(erfc_old)
        print '---- erfc --------------------------------------------'
        dis.dis(erfc)

    if __name__== '__main__':
        test()
=======================================================================

Result is:

[19:48] C:\pywk\cstuff>erfcopt.py
old: 4.490975, new: 3.123005, inline: 3.051674, in_c 0.806907

I was surprised that the inline gain was not more, but I guess there was enough computation
to obscure that cost. Anyway, moving some of the work to def-time paid off about 30%. But
going to C cut 82%.

I added the C version to the sanity check, and had to allow an epsilon. I suspect that this
is because the C version probably keeps the whole problem in the FPU with 64 fractional bits
of precision until returning the final double (which has 53 bits), whereas the Python
algorithm presumably stores all intermediate values in memory with the 53-bit precision,
and so loses in the overall. That's my theory, anyway.

To see the extra work the old code is doing vs the new, you can see below:

---- erfc_old code -----------------------------------
0 SET_LINENO 2

3 SET_LINENO 3
9 STORE_FAST 3 (p)

12 SET_LINENO 4
18 STORE_FAST 2 (a1)

21 SET_LINENO 5
27 STORE_FAST 5 (a2)

30 SET_LINENO 6
36 STORE_FAST 4 (a3)

39 SET_LINENO 7
45 STORE_FAST 7 (a4)

48 SET_LINENO 8
54 STORE_FAST 6 (a5)

Everything above here was unnecessary to do at run-time

57 SET_LINENO 10
75 CALL_FUNCTION 1
78 BINARY_MULTIPLY
80 BINARY_DIVIDE
81 STORE_FAST 8 (t)

84 SET_LINENO 11
105 BINARY_MULTIPLY
110 BINARY_MULTIPLY
115 BINARY_MULTIPLY
120 BINARY_MULTIPLY
125 BINARY_MULTIPLY
138 BINARY_POWER
139 UNARY_NEGATIVE
140 CALL_FUNCTION 1
143 BINARY_MULTIPLY
144 STORE_FAST 1 (erfcx) <<--+
|
147 SET_LINENO 13 <<--+
150 LOAD_FAST 1 (erfcx) <<--+-- all eliminated
153 RETURN_VALUE
157 RETURN_VALUE
---- erfc --------------------------------------------
0 SET_LINENO 20

3 SET_LINENO 29
18 BINARY_MULTIPLY
20 BINARY_DIVIDE
21 STORE_FAST 8 (t)

24 SET_LINENO 30
45 BINARY_MULTIPLY
50 BINARY_MULTIPLY
55 BINARY_MULTIPLY
60 BINARY_MULTIPLY
65 BINARY_MULTIPLY
75 BINARY_POWER
76 UNARY_NEGATIVE
77 CALL_FUNCTION 1
80 BINARY_MULTIPLY
81 RETURN_VALUE <<--
85 RETURN_VALUE

[19:49] C:\pywk\cstuff>

Regards,
Bengt Richter

### Greg Brunet

31 Jul 2003, 23:25:55
"Mike" <mi...@nospam.com> wrote in message
news:rkdc7poiachh.1x...@40tude.net...
> Bear with me: this post is moderately long, but I hope it is relatively
> succinct.
> ...

> This is an error function approximation, which gets called around 1.5
> billion times during the simulation, and takes around 3500 seconds (just
> under an hour) to complete. While trying to speed things up, I created a
> simple test case with the code above and a main function to call it 10
> million times. The code takes roughly 210 seconds to run.
>
> The current execution time is acceptable, but I need to increase the
> complexity of the simulation, and will need to increase the number of data
> points by around 20X, to roughly 30 billion. This will increase the
> simulation time to over a day. Since the test case code was fairly small, I
> translated it to C and ran it. The C code runs in approximately 7.5
> seconds. That's compelling, but C isn't: part of my simulation includes a
> parser to read an input file. I put that together in a few minutes in
> Python, but there are no corresponding string or regex libraries with my C
> compiler, so converting my Python code would take far more time than I'd
> save during the resulting simulations.
> ...

> Surprisingly (to me, at least), this code executed 10 million iterations in
> 8.5 seconds - only slightly slower than the compiled C code.
>
> My first question is, why is the Python code, at 210 seconds, so much
> slower?
>

I'm just a little curious how test code runs 10 million cycles in 210
seconds, while production code runs 1.5 billion cycles in 3500
seconds. That makes your production code more efficient by roughly a
factor of 10. If so, even though folks are only quoting an improvement
factor of about 3x for Psyco and 5-6x with Pyrex, that may be sufficient
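The arithmetic behind that factor can be spelled out from the figures in Mike's post. The variable names below are made up for this sketch, which assumes a current Python 3.

```python
# Figures quoted in the original post.
test_seconds, test_calls = 210.0, 10000000       # stand-alone test case
prod_seconds, prod_calls = 3500.0, 1500000000    # full simulation

per_call_test = test_seconds / test_calls    # roughly 21 microseconds per call
per_call_prod = prod_seconds / prod_calls    # roughly 2.3 microseconds per call

# How many times cheaper one call is in the production run.
ratio = per_call_test / per_call_prod        # about 9

if __name__ == "__main__":
    print(per_call_test, per_call_prod, ratio)
```

So the per-call cost in the simulation is roughly nine times lower than in the test case, which is where "roughly a factor of 10" comes from.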

--
Greg

### Lulu of the Lotus-Eaters

31 Jul 2003, 21:42:35
Richie Hindle <ric...@entrian.com> wrote previously:

|Using Psyco dropped from 7.1 seconds to 5.4 - not great, but not bad for
| import psyco
| psyco.bind(erfc)

Try these two lines instead:

    import psyco
    psyco.full()

I tried what Hindle did, initially (actually 'erfcp=psyco.proxy(erfc)'),
and was disappointed by a similarly lackluster improvement (about the
same order--he probably has a faster machine).

Using the call to psyco.full() gives me a much nicer improvement from
18 seconds -> 3.25 seconds.

--
---[ to our friends at TLAs (spread the word) ]--------------------------
Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad
---[ Postmodern Enterprises <me...@gnosis.cx> ]--------------------------

### Andrew Dalke

1 Aug 2003, 00:51:35
David M. Cooke:

> Here's my best version:
>
> exp = math.exp
> def erfc( x ):

Change that to

def erfc(x, exp = math.exp):

so there's a local namespace lookup instead of a global one. This
is considered a hack.
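
A minimal sketch of the default-argument trick described above, written for current Python 3 rather than the Python 2.2 used in this thread. The names erfc_global and erfc_local are made up for the comparison, and how much the local binding saves depends on the interpreter version, so the timings are printed rather than promised.

```python
import math
import timeit

P = 0.3275911
A1, A2, A3, A4, A5 = (0.254829592, -0.284496736, 1.421413741,
                      -1.453152027, 1.061405429)

def erfc_global(x):
    # "math" is found in the global namespace and ".exp" is then
    # fetched as an attribute, on every single call.
    t = 1.0 / (1.0 + P * x)
    return ((A1 + (A2 + (A3 + (A4 + A5*t)*t)*t)*t)*t) * math.exp(-x*x)

def erfc_local(x, exp=math.exp):
    # "exp" is bound once, when the def statement runs, and is an
    # ordinary local variable inside the body.
    t = 1.0 / (1.0 + P * x)
    return ((A1 + (A2 + (A3 + (A4 + A5*t)*t)*t)*t)*t) * exp(-x*x)

if __name__ == "__main__":
    for f in (erfc_global, erfc_local):
        secs = timeit.timeit(lambda: f(1.2345), number=200_000)
        print(f.__name__, round(secs, 3))
```

The cost of the hack is that a caller can accidentally pass a second positional argument and replace exp, which is one reason it is considered a hack.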

### Richie Hindle

1 Aug 2003, 05:13:45

[Richie]

> Using Psyco dropped from 7.1 seconds to 5.4 - not great, but not bad for
> two lines of additional code:
> import psyco
> psyco.bind(erfc)

[Lulu]

> Try these two lines instead:
>
> import psyco
> psyco.full()
>

> 18 seconds -> 3.25 seconds.

I tried that the first time but saw no improvement over bind... having
checked the Psyco docs I see that it won't optimise code at module scope,
and I had my driver loop at module scope. I now get 1.25 seconds down
from 7.1 - slightly faster than Pyrex. Impressive.
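
A sketch of the restructuring this implies: the driver loop moves out of module scope into a function. It is written for current Python 3, where Psyco is not available, so the two Psyco lines from this thread appear only as a comment; the main() name and its n parameter are assumptions made for the illustration.

```python
import math

def erfc(x, exp=math.exp):
    p = 0.3275911
    a1, a2, a3, a4, a5 = (0.254829592, -0.284496736, 1.421413741,
                          -1.453152027, 1.061405429)
    t = 1.0 / (1.0 + p * x)
    return ((a1 + (a2 + (a3 + (a4 + a5*t)*t)*t)*t)*t) * exp(-x*x)

def main(n=1_000_000):
    # The loop lives inside a function, so an optimizer that works on
    # functions (Psyco, in this thread) gets a chance to compile it.
    # Loop variables here are also fast locals rather than module globals.
    total = 0.0
    for _ in range(n):
        total += erfc(0.456)
    return total

if __name__ == "__main__":
    # On the Python 2 / Psyco setup discussed here, one would first run:
    #   import psyco
    #   psyco.full()
    print("%f" % main())
```

Only the call to main() remains at module scope, which is the part the Psyco docs say is left unoptimised.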

[OT: I hope Pyrex isn't suffering as a result of people writing little
test scripts full of module-level code and being disappointed at the
results.]

--
Richie Hindle
ric...@entrian.com

### Richie Hindle

1 Aug 2003, 08:20:07

> [OT: I hope Pyrex isn't suffering as a result of people writing little
> test scripts full of module-level code and being disappointed at the
> results.]

s/Pyrex/Psyco/g

--
Richie Hindle
ric...@entrian.com

### Bengt Richter

1 Aug 2003, 10:05:36
On 1 Aug 2003 03:11:35 GMT, bo...@oz.net (Bengt Richter) wrote:
[...]

> # inlining code from old version:
> # constant assignments hoisted out of loop, of course
> p = 0.3275911
> a1 = 0.254829592
> a2 = -0.284496736
> a3 = 1.421413741
> a4 = -1.453152027
> a5 = 1.061405429
> x = 1.2345
> for i in xrange(100000):
>     t = 1.0 / (1.0 + p*float(x))
>     erfcx = ( (a1 + (a2 + (a3 +
>         (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> t4 = clock()
> for i in xrange(100000): erfc_in_c(1.2345)
> t5=clock()
> print 'old: %f, new: %f, inline: %f, in_c %f' %(t2-t1, t3-t2, t4-t3, t5-t4)

D'oh ...

# inlining the new version code

p = 0.3275911
a1 = 0.254829592
a2 = -0.284496736
a3 = 1.421413741
a4 = -1.453152027
a5 = 1.061405429

exp = math.exp
for i in xrange(100000):
    t = 1.0 / (1.0 + p*x) #XXX# don't need float(), since p is float
    erfcx = ( (a1 + (a2 + (a3 +
        (a4 + a5*t)*t)*t)*t)*t ) * exp(-x**2) #XXX# elim attr lookup on global math.exp
t6=clock()
print 'old: %f, new: %f, inline: %f, in_c %f, inline new %f' %(
    t2-t1, t3-t2, t4-t3, t5-t4, t6-t5)

>=======================================================================
>
>Result is:
>
>
>[19:48] C:\pywk\cstuff>erfcopt.py
>old: 4.490975, new: 3.123005, inline: 3.051674, in_c 0.806907
>
>I was surprised that the inline gain was not more, but I guess there was enough computation
>to obscure that cost. Anyway, moving some of the work to def-time paid off about 30%. But
>going to C cut 82%.
>

[ 6:48] C:\pywk\cstuff>erfcopt.py
old: 4.448979, new: 3.112778, inline: 3.073672, in_c 0.806066, inline new 2.193491

Ok, that last number is a bit more like it. Of course the C version is still the fastest.
Forgot to include the source ;-/ If you have MSVC++6 and python sources you can compile

====< mkpydll.cmd >===========================================
@cl -LD -nologo -Id:\python22\include %1.c -link -LIBPATH:D:\python22\libs -export:init%1
==============================================================
Invoked thus:

[ 7:03] C:\pywk\cstuff>mkpydll erfc
erfc.c
Creating library erfc.lib and object erfc.exp

(Tested only as far as you've seen in recent posts!)
====< erfc.c >================================================
/*
** erfc.c taken with slight mod from Mike @ nospam.com's c# posted version of
** Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
** Fifth order approximation. |error| <= 1.5e-7 for all x
** Version 0.01 20030731 Bengt Richter bo...@oz.net
*/

#include <math.h>

static double
erfc( double x )
{
    double p, a1, a2, a3, a4, a5;
    double t, erfcx;

    p = 0.3275911;
    a1 = 0.254829592;
    a2 = -0.284496736;
    a3 = 1.421413741;
    a4 = -1.453152027;
    a5 = 1.061405429;

    t = 1.0 / (1.0 + p*x);
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-pow(x,2.0));
    return erfcx;
}

#include "Python.h"
#include <windows.h>

static char doc_erfc[] =
"erfc(x) -> approximation\n";

static PyObject *
erfc_erfc(PyObject *self, PyObject *args)
{
    PyObject *rv;
    double farg;

    if (!PyArg_ParseTuple(args, "d:erfc", &farg)) /* Single floatingpoint arg */
        return NULL;
    rv = Py_BuildValue("d", erfc(farg));
    return rv;
}

/* List of functions defined in the module */
static struct PyMethodDef erfc_module_methods[] = {
    {"erfc", erfc_erfc, METH_VARARGS, doc_erfc},
    {NULL, NULL} /* sentinel */
};

/* Initialization function for the module (*must* be called initerfc) */
static char doc_erfc_module[] =
"# Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)\n"
"# Fifth order approximation. |error| <= 1.5e-7 for all x\n";

DL_EXPORT(void)
initerfc(void)
{
    PyObject *m, *d, *x;

    /* Create the module and add the functions */
    m = Py_InitModule("erfc", erfc_module_methods);
    d = PyModule_GetDict(m);
    x = PyString_FromString(doc_erfc_module);
    PyDict_SetItemString(d, "__doc__", x);
    Py_XDECREF(x);
}
==============================================================
>>> help('erfc')
Help on module erfc:

NAME
erfc

FILE
c:\pywk\cstuff\erfc.dll

DESCRIPTION

# Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
# Fifth order approximation. |error| <= 1.5e-7 for all x

FUNCTIONS
erfc(...)
erfc(x) -> approximation

DATA
__file__ = 'erfc.dll'
__name__ = 'erfc'

>>> from erfc import erfc
>>> for x in range(10): print '%s => %g'%(x,erfc(x))
...
0 => 1
1 => 0.157299
2 => 0.00467786
3 => 2.21051e-005
4 => 1.54603e-008
5 => 1.54782e-012
6 => 2.17852e-017
7 => 4.26424e-023
8 => 1.15275e-029
9 => 4.28336e-037

Regards,
Bengt Richter

### David M. Cooke

1 Aug 2003, 16:14:47
At some point, Mike <mi...@nospam.com> wrote:
[...]

> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> # Fifth order approximation. |error| <= 1.5e-7 for all x
> #
> def erfc( x ):
> p = 0.3275911
> a1 = 0.254829592
> a2 = -0.284496736
> a3 = 1.421413741
> a4 = -1.453152027
> a5 = 1.061405429
>
> t = 1.0 / (1.0 + p*float(x))
> erfcx = ( (a1 + (a2 + (a3 +
> (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> return erfcx

Since no one else has mentioned this, and it's not in the comments: the
above code is no good for x < 0 (erfc(-3)=84337, for instance). You'll
need a check for x < 0, and if so, use the identity
erfc(x) = 2 - erfc(-x). That'll slow you down a tad again :)
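
A sketch of that check in current Python 3. The name erfc_approx is chosen here to avoid shadowing math.erfc, which modern Python provides and which serves as the reference in the comparison loop.

```python
import math

def erfc_approx(x, exp=math.exp):
    """A&S 7.1.26 approximation, extended to x < 0 via erfc(x) = 2 - erfc(-x)."""
    if x < 0.0:
        # The rational approximation is only valid for x >= 0.
        return 2.0 - erfc_approx(-x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = (0.254829592 + (-0.284496736 + (1.421413741 +
            (-1.453152027 + 1.061405429*t)*t)*t)*t)*t
    return poly * exp(-x*x)

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(x, erfc_approx(x), math.erfc(x))
```

The extra branch costs one comparison per call for non-negative arguments, which is the "tad" of slowdown mentioned above.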

### John Machin

1 Aug 2003, 22:08:54
bo...@oz.net (Bengt Richter) wrote in message news:<bgds3g$f95$0...@216.39.172.122>...

> erfcx = ( (a1 + (a2 + (a3 +
> (a4 + a5*t)*t)*t)*t)*t ) * exp(-pow(x,2.0));

Wouldn't (x*x) be better than pow(x,2.0) ?

### Bengt Richter

2 Aug 2003, 00:31:15

Yup, I would think so too, but I wasn't thinking about optimizing the C ;-) I just
copied the C# from Mike's post and made it legal C ;-)

Regards,
Bengt Richter

### Mike

2 Aug 2003, 14:43:15
On Thu, 31 Jul 2003 22:25:55 -0500, Greg Brunet wrote:

> I'm just a little curious how test code runs 10 million cycles in 210
> seconds, while production code runs 1.5 billion cycles in 3500
> seconds. That makes your production code more efficient by roughly a
> factor of 10. If so, even though folks are only quoting an improvement
> factor of about 3x for Psyco and 5-6x with Pyrex, that may be sufficient

Ugh. Because I can't be trusted with a calculator, that's why. 1.5 billion
should be 150 million. The elapsed times are correct: 210 seconds and 3500
seconds.
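
The arithmetic behind the question and the correction can be checked in a few lines. The variable names are made up for this sketch; the counts and times are the ones quoted in the thread.

```python
test_calls, test_secs = 10_000_000, 210.0
prod_calls, prod_secs = 150_000_000, 3500.0   # corrected call count
misquoted_calls = 1_500_000_000               # the figure first posted

test_rate = test_calls / test_secs            # calls per second, test case
prod_rate = prod_calls / prod_secs            # calls per second, full simulation
misquoted_rate = misquoted_calls / prod_secs  # what the first post implied

print(round(test_rate), round(prod_rate), round(misquoted_rate))
# -> 47619 42857 428571
print(round(misquoted_rate / test_rate, 1))
# -> 9.0
```

With the corrected count the full simulation runs at about 43,000 calls per second, slightly slower than the test case rather than nine times faster.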

Using psyco, the test case runs in 43 seconds. I was able to reduce that to
25 seconds using some of the improvements that Bengt Richter provided. I
was able to reduce the full simulation time down to 647 seconds; I'm sure
more improvements are possible, since I haven't implemented all of Bengt's
improvements. Part of the problem with the full simulation is that psyco
gobbles up all my available memory if I use psyco.full(), so I've been
experimenting with psyco.bind(). 647 seconds is my best result so far.

-- Mike --

### Mike

2 Aug 2003, 14:48:29
On Fri, 01 Aug 2003 16:14:47 -0400, David M. Cooke wrote:

> At some point, Mike <mi...@nospam.com> wrote:
> [...]
>> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
>> # Fifth order approximation. |error| <= 1.5e-7 for all x
>> #
>> def erfc( x ):
>> p = 0.3275911
>> a1 = 0.254829592
>> a2 = -0.284496736
>> a3 = 1.421413741
>> a4 = -1.453152027
>> a5 = 1.061405429
>>
>> t = 1.0 / (1.0 + p*float(x))
>> erfcx = ( (a1 + (a2 + (a3 +
>> (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
>> return erfcx
>
> Since no one else has mentioned this, and it's not in the comments: the
> above code is no good for x < 0 (erfc(-3)=84337, for instance). You'll
> need a check for x < 0, and if so, use the identity
> erfc(x) = 2 - erfc(-x). That'll slow you down a tad again :)

That's correct (and noted in A&S). Since the argument I use is time
relative to the start of the simulation, it's always positive. Since I'm
not using it as a general purpose erfc calculator, I didn't bother with any
checks (although, now that you've mentioned it, it would be rather stupid

-- Mike --

### Mike

2 Aug 2003, 15:33:07
On Wed, 30 Jul 2003 23:09:22 -0700, Mike wrote:

Wow. Thanks, everyone, for the suggestions.

As noted in another reply, the full simulation calls the erfc function 150
million times, not 1.5 billion; the elapsed times are correct.

There doesn't appear to be any way to use previous results in the next
calculation. I'm calculating the output of a transmission line to a series
of input pulses. The input pulse sequence is not repetitive, so the output
doesn't asymptotically approach a final state.

I had tried the erfc function in the transcendental package (I can't
remember where I grabbed it from). I didn't try the scipy package, but
since I already had that installed, I should have tried that first.

Using psyco, the test case runs in 43 seconds. I was able to reduce that to
25 seconds using some of the improvements that Bengt Richter provided. I
was able to reduce the full simulation time down to 647 seconds; I'm sure
more improvements are possible, since I haven't implemented all of Bengt's
improvements. Part of the problem with the full simulation is that psyco
gobbles up all my available memory if I use psyco.full(), so I've been
experimenting with psyco.bind(). 647 seconds is my best result so far.

Psyco seems to do a good enough job of optimization that replacing lines
like

a1 = 0.254829592
return ( (a1 + ...

with

return ( (0.254829592 + ...

doesn't result in any significant change in speed. Besides the math.exp
call, the test case calls the function the way the full simulation does,
with a math.sqrt(c) in the argument; I replaced that lookup with sqrt =
math.sqrt, then replaced the x**2 with x*x.

At first glance, I didn't think there was a way to "vectorize" the
simulation, but after looking at it a little more carefully, it's pretty
clear that there is.

At this point, I'm within a factor of 2 of the c# results, and I'm pretty
much prepared to call that good enough to justify sticking with Python
instead of learning and converting to c#.

Thanks again,

-- Mike --

### Jan Decaluwe

2 Aug 2003, 16:59:03
Mike wrote:
>

> improvements. Part of the problem with the full simulation is that psyco
> gobbles up all my available memory if I use psyco.full(), so I've been
> experimenting with psyco.bind().

What about psyco.profile() ? I understood from Armin Rigo that's
the preferred interface.

--
Jan Decaluwe - Resources bvba
Losbergenlaan 16, B-3010 Leuven, Belgium
mailto:j...@jandecaluwe.com
http://jandecaluwe.com

### Mike

3 Aug 2003, 11:09:23
On Sat, 02 Aug 2003 22:59:03 +0200, Jan Decaluwe wrote:

> Mike wrote:
>>
>
>> improvements. Part of the problem with the full simulation is that psyco
>> gobbles up all my available memory if I use psyco.full(), so I've been
>> experimenting with psyco.bind().
>
> What about psyco.profile() ? I understood from Armin Rigo that's
> the preferred interface.

Since the slow portion of code is well known, I didn't bother to try
profile(). However...

Since you mentioned it, I gave it a try, and there's no difference in the
results between using psyco.bind() on the slow routines and psyco.profile()
on the entire simulation. However, the difference in effort, and
corresponding reduction in likelihood of programmer error, is enough to
justify using the single psyco.profile() line vs multiple psyco.bind()
lines.

Thanks,

-- Mike --

### Siegfried Gonzi

3 Aug 2003, 18:40:11
Mike wrote:

> Ugh. Because I can't be trusted with a calculator, that's why. 1.5 billion
> should be 150 million. The elapsed times are correct: 210 seconds and 3500
> seconds.

Hi:

I am sure you know what you are after, but Python for numerical
computations is more or less crap. Not only has Python a crippled
syntax it is most of the time dog slow and unpredictable. Python is
more or less a fraud. But I must admit it is sometimes immensly useful
due to its big pool of add-on libraries. But I would rather like to
see that all the manpower gets concentrated in languages which
deserve to be called a programming language (for example Scheme,

I am not sure what exactly are your simulation problems and
requirements but the Bigloo (Scheme) compiler is very robust and I use
it every day for numerical computations. I once used Python but
quickly stopped using it and I have never looked back.

For example Bigloo lets you specify types. It is easy to specify types
in Bigloo and the code is transportable even, provided you write a
small macro where the compiler decides wheter it is Bigloo or
not. Normally, giving types in Bigloo puts you in the range of "only 2
times slower than C"; also for heavy numerical computations. The
following small Scheme program sums up your error function 10^6
times. Also included a C program. The timings on a simple Celeron
laptop are as follows:

g++ -O3 erf.c : 0.5 seconds
bigloo -Obench erf.scm: 1.1 seconds

The Bigloo code could be a bit improved because I do not know wheter
using (expn x 2.0) introduces a bottleneck in Bigloo. If there are only
floating point calculations involved Bigloo is as fast as C (see
slashdot). If you have big arrays Bigloo's gap to C is shifted towards
a factor of 2 or in hard cases 3. This is still fairly good.

Not to cheerlead Bigloo or Scheme here. If you are pragmatic stay with
Python it has really tons of add-on libraries and huge community.

S. Gonzi
;;;;;;;;;;;;;;;;; BIGLOO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(module erfc2
  (option (set! *genericity* #f)))

(define (erfc x )
  (let* ((x (exact->inexact x))
         (p 0.3275911)
         (a1 0.254829592)
         (a2 -0.284496736)
         (a3 1.421413741)
         (a4 -1.453152027)
         (a5 1.061405429)
         (t (/fl 1.0
                 (+fl 1.0
                      (*fl p
                           x))))
         (erfcx (*fl
                 (*fl t
                      (+fl a1
                           (*fl t (+fl a2
                                       (*fl t
                                            (+fl a3
                                                 (*fl t
                                                      (+fl a4
                                                           (*fl t a5)))))))))
                 (exp (- (expt x 2.0))))))
    erfcx))

(define (do-it)
  (let ((erg 0.0))
    (do ((i 0 (+fx i 1)))
        ((=fx i 1000000))
      (set! erg (+fl erg (erfc 0.456))))
    erg))

(print (do-it))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; C ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#include <stdio.h>
#include <math.h>

double erfc( double x )
{
    double p, a1, a2, a3, a4, a5;
    double t, erfcx;

    p = 0.3275911;
    a1 = 0.254829592;
    a2 = -0.284496736;
    a3 = 1.421413741;
    a4 = -1.453152027;
    a5 = 1.061405429;

    t = 1.0 / (1.0 + p*x);
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-pow(x,2.0));

    return erfcx;
}

int main()
{
    double erg=0.0;
    int i;

    for(i=0; i<1000000; i++)
    {
        erg += erfc(0.456);
    }
    printf("%f",erg);

    return 1;
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

### Siegfried Gonzi

4 Aug 2003, 03:47:12
Siegfried Gonzi wrote:

Only for the sake of completeness. My last version had a minor
bottleneck. The following Bigloo version is as fast as C .

Suming up your erfc function 10^6 times:

new version: 0.5 seconds
C version: 0.5 seconds

As I wrote giving types in Bigloo renders code effectively. It has also
the advantage that the compiler becomes very picky and spots a lot of
type errors; Bigloo is as OCaml for example then. If you do not give
types Bigloo is as forgiving as what you would assume from Scheme.

S. Gonzi
===== NEW VERSION =====

(module erfc2
  (option (set! *genericity* #f)))

(define (erfc::double x::double )
  (let* (
         (x::double (exact->inexact x))
         (p::double 0.3275911)
         (a1::double 0.254829592)
         (a2::double -0.284496736)
         (a3::double 1.421413741)
         (a4::double -1.453152027)
         (a5::double 1.061405429)
         (t::double (/fl 1.0
                         (+fl 1.0
                              (*fl p
                                   x))))
         (erfcx::double (*fl
                         (*fl t
                              (+fl a1
                                   (*fl t (+fl a2
                                               (*fl t
                                                    (+fl a3
                                                         (*fl t
                                                              (+fl a4
                                                                   (*fl t a5)))))))))
                         (exp (negfl (*fl x x))))))
    erfcx))

(define (do-it)
  (let ((erg::double 0.0))
    (do ((i 0 (+fx i 1)))
        ((=fx i 1000000))
      (set! erg (+fl erg (erfc 0.456))))
    erg))

(print (do-it))
======================

### Alex Martelli

4 Aug 2003, 04:25:32
Siegfried Gonzi wrote:
...

> I am sure you know what you are after, but Python for numerical
> computations is more or less crap. Not only has Python a crippled

...Note for the unwary...:
It's interesting that "gonzi", in Italian, means "gullible" (masculine
plural adjective, also masculine plural noun "gullible individuals") --
this may or may not give a hint to the purpose of this troll/flamewar:-).

Anyway, a challenge is a challenge, so:

> times. Also included a C program. The timings on a simple Celeron
> laptop are as follows:
>
> g++ -O3 erf.c : 0.5 seconds
> bigloo -Obench erf.scm: 1.1 seconds

Never having been as silly as to purchase such a deliberately crippled
chip as a Celeron, I cannot, unfortunately, reproduce these timings --
I will, however, offer my own timings on a somewhat decent CPU, an old
but reasonably well-configured early Athlon. No bigloo here, so I
started with the C program:

> ;;; C ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> #include <stdio.h>
> #include <math.h>
>
>
> double erfc( double x )
> {
>     double p, a1, a2, a3, a4, a5;
>     double t, erfcx;
>
>     p = 0.3275911;
>     a1 = 0.254829592;
>     a2 = -0.284496736;
>     a3 = 1.421413741;
>     a4 = -1.453152027;
>     a5 = 1.061405429;
>
>     t = 1.0 / (1.0 + p*x);
>     erfcx = ( (a1 + (a2 + (a3 +
>               (a4 + a5*t)*t)*t)*t)*t ) * exp(-pow(x,2.0));
>
>     return erfcx;
> }
>
> int main()
> {
>     double erg=0.0;
>     int i;
>
>     for(i=0; i<1000000; i++)
>     {
>         erg += erfc(0.456);
>     }
>     printf("%f",erg);
>
>     return 1;
> }
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

saved this to gonzi.c, compiled it and ran/timed it:

[alex@lancelot swig_wrappers]$ gcc -O -o gonzi gonzi.c -lm
[alex@lancelot swig_wrappers]$ time ./gonzi
519003.933831Command exited with non-zero status 1
0.37user 0.01system 0:00.39elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (107major+13minor)pagefaults 0swaps

(that silly "return 1" must be connected to some in-joke or other...
why's the program unconditionally returning an error-indication!?-)

Then I "converted" it to Python, a trivial exercise to be sure:

import math

def erfc(x):
    exp = math.exp

    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429

    t = 1.0 / (1.0 + p*x)
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
    return erfcx

def main():
    erg = 0.0

    for i in xrange(1000000):
        erg += erfc(0.456)

    print "%f" % erg

if __name__ == '__main__':
    main()

and ran that:

[alex@lancelot swig_wrappers]$ time python gonzi.py
519003.933831
5.32user 0.05system 0:05.44elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (473major+270minor)pagefaults 0swaps

then I put a huge effort into optimization -- specifically, I inserted
after the "import math" the two full lines:

import psyco
psyco.full()

[must have taken me AT LEAST 3 seconds of work, maybe as much as 4] and
ran it again:

[alex@lancelot swig_wrappers]$ time python gonzi.py
519003.933831
0.15user 0.02system 0:00.24elapsed 69%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (547major+319minor)pagefaults 0swaps

There -- optimized Python over twice as fast as optimized C in terms of
user-mode CPU consumption, almost twice as fast in terms of elapsed time
(of course, using programs as tiny and fast as this one for such purposes
is not clever, since they run so fast their speed is hard to measure,
but "oh well"). No need to declare types, of course, since psyco can
easily infer them.

Of course, there IS a "subtle" trick here -- I write x*x rather than
pow(x,2.0). Anybody whose instincts fail to rebel against "pow(x,2.0)"
(or x**2, for that matter) doesn't _deserve_ to do numeric computation...
it's the first trick in the book, routinely taught to every freshman
programmer before they're entrusted to punch in the first card of their
first Fortran program (or at least, so it was in my time in university,
both as a student and as a professor).
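
A small timing sketch of the squaring alternatives, in current Python 3. The labels and the iteration count are arbitrary choices for the illustration. The gap measured in C in this thread may not reproduce on a modern toolchain, since compilers can rewrite a constant power of two as a multiplication, and in CPython the function-call overhead of the lambdas may dominate, so treat the numbers as indicative only.

```python
import math
import timeit

x = 0.456
candidates = {
    "x*x": lambda: x * x,
    "x**2": lambda: x ** 2,
    "pow(x, 2.0)": lambda: pow(x, 2.0),
    "math.pow(x, 2.0)": lambda: math.pow(x, 2.0),
}

# Time each way of squaring x; all four compute the same value.
for label, fn in candidates.items():
    secs = timeit.timeit(fn, number=500_000)
    print(f"{label:>18}: {secs:.3f} s")
```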

Correcting this idiocy in the C code (and removing the return 1 since
we're at it -- return 0 is the only sensible choice, of course:-) we
turn C's performance from the previous:

[alex@lancelot swig_wrappers]$ time ./gonzi
519003.933831Command exited with non-zero status 1
0.37user 0.01system 0:00.39elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (107major+13minor)pagefaults 0swaps

to a more respectable:
[alex@lancelot swig_wrappers]$ time ./gonzi
519003.9338310.20user 0.00system 0:00.19elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+13minor)pagefaults 0swaps

(the "103%" of CPU being a funny artefact of my Linux kernel, no doubt).

So, about HALF the elapsed time was being squandered in that absurd call
to pow -- so much for Mr Gonzi's abilities as a numerical programmer:-).

With the program having been made decent, C's faster than optimized Python
again, not in CPU time (0.20 all usermode for C, 0.15 usermode + 0.02
systemmode for Python) but in elapsed time (0.19 for C vs 0.24 for
Python) -- the inversion being accounted for in psyco's lavish use of
memory and resulting pagination.
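
The same distinction between CPU time and elapsed time can be observed from inside a program. This is a sketch in current Python 3 using the standard time module; the work() function is a stand-in invented for the example, and process_time() reports user plus system CPU time together rather than separately as the shell's time command does.

```python
import math
import time

def work(n=200_000):
    # Stand-in for the numeric loop being measured.
    total = 0.0
    for _ in range(n):
        total += math.exp(-0.456 * 0.456)
    return total

wall_start = time.perf_counter()   # elapsed (wall-clock) time
cpu_start = time.process_time()    # CPU time used by this process
work()
cpu = time.process_time() - cpu_start
wall = time.perf_counter() - wall_start
print(f"cpu {cpu:.3f} s, elapsed {wall:.3f} s")
```

When elapsed time is noticeably larger than CPU time, the process spent time waiting, for example on paging, which is the explanation offered above for the Psyco run.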

Still, the performance ratio of psyco-optimized Python vs C on my
machine is much better than that of bigloo vs C on Mr Gonzi's Celeron,
and with no need for type declarations either. So much, then, for
Mr Gonzi's claims about Python being "more or less crap"!-)

Alex

### Siegfried Gonzi

4 Aug 2003, 05:14:26
Alex Martelli wrote:

> Still, the performance ratio of psyco-optimized Python vs C on my
> machine is much better than that of bigloo vs C on Mr Gonzi's Celeron,
> and with no need for type declarations either. So much, then, for
> Mr Gonzi's claims about Python being "more or less crap"!-)

Oh man. I forget to tell if you comment out

(x (exact->inexact x))

the Bigloo version calculates things in 0.25 seconds (as compared to 0.5
seconds of the C version).

Okay, a lot of people have learned a lot from your post. Maybe they can
use some techniques for their own Python programming.

But still Python is crap. Sorry to tell you the truth. It is not only
speed. Look if I do not use types in Bigloo it is 10 times slower than
C. So performance is not everything; at least for me.

But Mr. Martelli your post was very funny. It is always a pleasure for
me to make loosers upset (yes I know your contribution to Python is huge
especially because you are book author). Oh, man I always thought such
individuals are just in the CommonLisp community.

If you are pragamatic Python is okay and has a lot to offer.

S. Gonzi

### Alex Martelli

4 Aug 2003, 07:19:07
Siegfried Gonzi wrote:

> Alex Martelli wrote:
>
>> Still, the performance ratio of psyco-optimized Python vs C on my
>> machine is much better than that of bigloo vs C on Mr Gonzi's Celeron,
>> and with no need for type declarations either. So much, then, for
>> Mr Gonzi's claims about Python being "more or less crap"!-)
>
> Oh man. I forget to tell if you comment out
>
> (x (exact->inexact x))

...presumably avoiding floating-point in favour of rational...

> the Bigloo version calculates things in 0.25 seconds (as compared to 0.5
> seconds of the C version).

floating-point hardware -- there's no other explanation as to why
(software-implemented) rational arithmetic could outperform (hardware-
implemented) float arithmetic.

> Okay, a lot of people have learned a lot from your post. Maybe they can
> use some techniques for their own Python programming.

Particularly given that the Python-specific "techniques" amount (in all)
to using a good optimizer (aka just-in-time compiler), psyco, I'm quite
confident that many people can indeed use that. The idea of writing
x*x instead of pow(x, 2.0) is just as good in C (as I've shown -- just
that one change speeds up the C program by about two times) or Fortran
(as I mentioned, it was for Fortran, over a quarter of a century ago,
that I first learned of it) as just about any other language, so it's
no doubt going to be even more widely applicable.

> But still Python is crap. Sorry to tell you the truth. It is not only
> speed. Look if I do not use types in Bigloo it is 10 times slower than
> C. So performance is not everything; at least for me.

Python is demonstrably more productive than lower-level languages (such
as C) for application development, and demonstrably more acceptable to
most programmers than languages of similar semantic level but with prefix
rather than infix syntax. Therefore, your repeated flame-bait assertion
that "still Python is crap", now totally unsupported, has sunk to abysmal
levels (I don't see what more you could do to further lower your
credibility).

You've shown how good you are at numerical programming by coding
"pow(x,2.0)" where any decently experienced coder would have coded
"x*x" and thereby slowing down a whole C program by a factor of two. Now
you show that your ability in evaluating programming languages is
on a par with your ability in numerical programming -- unsurprising.

> But Mr. Martelli your post was very funny. It is always a pleasure for
> me to make loosers upset (yes I know your contribution to Python is huge
> especially because you are book author). Oh, man I always thought such
> individuals are just in the CommonLisp community.

Is a "looser" somebody who is more loose (less uptight) than you, or
somebody who looses (as in, eases, facilitates) things? Or is your
ability at spelling on a par with your ability at numerical programming
and language evaluation? In the latter case, you're surely reaching
for an interesting "across the board" overall score.

My contributions to Python have been pretty minor -- writing books is
nice, sure, but of course it can't compare with the real contributions
of the "core" Pythonlabs developers and those of many others. Nor did
you make me _upset_ -- disgusted, yes, because I'm an optimist and like
human beings in general, and it's always sad to see such a sad specimen
as you attempt to pass itself off as a human being. As for "such
individuals" (such as?) being "just in the Common Lisp community", what
would that be describing? Was your working hypothesis that, when you
post to a newsgroup about language X a flamebait about X being crap,
accompanied by technically faulty examples in non-X languages, somebody
will point out your idiocy only if X == "Common Lisp"? If so, then your
understanding of Usenet is on a par with your ability at numerical
programming, and so on -- truly, truly a sad ersatz-human being there.

> If you are pragamatic Python is okay and has a lot to offer.

More fodder for the "he spells as well as he programs" theory, I see.
Not to mention that "is crap" and "is okay" are compatible only in
the field of natural fertilizers (a field I heartily recommend you
turn to, given your proven abilities in spewing male bovine excrement).

Alex

### Siegfried Gonzi

4 Aug 2003, 07:57:10
Alex Martelli wrote:

> floating-point hardware -- there's no other explanation as to why
> (software-implemented) rational arithmetic could outperform (hardware-
> implemented) float arithmetic.

Sorry I know that the Celeron is very poor at floating point
calculations. My stationary Pentium II 450 Mhz at my university is as
fast as my 1000 MHz Celeron.

So what and who cares?

> Particularly given that the Python-specific "techniques" amount (in all)
> to using a good optimizer (aka just-in-time compiler), psyco, I'm quite
> confident that many people can indeed use that. The idea of writing
> x*x instead of pow(x, 2.0) is just as good in C (as I've shown -- just
> that one change speeds up the C program by about two times) or Fortran
> (as I mentioned, it was for Fortran, over a quarter of a century ago,
> that I first learned of it) as just about any other language, so it's
> no doubt going to be even more widely applicable.

Maybe you cannot read. But I wrote in my first post that I blindly "copy
and pasted" the original code and that I didn't check wehter (* x x) is
faster. This was not the point.

Again, my intention simply was: there is more than just Python. People,
if their time budget allows it, should investigate those alternatives.
If they think Scheme is shit oh Lord nobody cares and nobody will impede
them in going/comming back to Python.

Sorry, Alex you are one of the idiots and bigots who believe that Python
is the best since sliced bred and butter.

When I was younger it happend that I was a similar idiot and believed
that "functional programming" in all its glory will salvage this world.

For me Python is crap and not any longer worth to consider. But look, I
even introduced Python at our institution some month ago. A colleague
asked me and I showed him Scheme. He didn't like the parantheses and I
adviced him to buy some Python books. Indeed he did it. What is
important here: he is happy that I am not any of this idiots who falls
for and believes that his own programming language is the best thing.

I will stop this discussion because I am used to discuss with people who
can demonstrate to have at least a minimal amount of brain.

S. Gonzi

### Alex Martelli

4 Aug 2003, 09:09:21
Siegfried Gonzi wrote:
...

> Maybe you cannot read. But I wrote in my first post that I blindly "copy
> and pasted" the original code and that I didn't check wehter (* x x) is
> faster. This was not the point.

I can read better than you can write (a criterion that is not hard to
meet). You used neither the phrase "copy and paste" NOR the adverb
"blindly" in your "first post" (to this thread) -- liar. What you wrote
was quite different and "bigloo-specific":

"""
The Bigloo code could be a bit improved because I do not know wheter
using (expn x 2.0) introduces a bottleneck in Bigloo. If there are only
"""

Apart from your creative innovation in mis-spellings of the poor innocent
word "whether", it's clear that you do not know whether raising-to-the-
power-of-two is importantly slower than multiplying-by-itself in just
about ANY programming language -- what I keep pointing out is that the
_C_ code would surely be improved by this tiny change (it speeds up by
JUST two times on my machine... trifles, surely...?). This shows you're
not an experienced coder of numerically intensive codes, period.

> Again, my intention simply was: there is more than just Python. People,

Nobody ever denied that. Python's own interpreter is coded in C (though
in the pypy project we're trying to do something about that;-), so it's
obvious to everybody that "there is more than just Python".

Python just happens to be the best overall choice for application-level
programming (as opposed to, say, operating system kernels, device drivers,
and the like) in most situations. This fact stands in sharp contrast to
your previous idiotic assertion that "it's crap" (currently softened to
"there is more than just it", I see) -- interestingly accompanied in your
own previous posts by the contradictory assertion that "it's okay" (if,
that is, one is pragmatic).

> if their time budget allows it, should investigate those alternatives.
> If they think Scheme is shit oh Lord nobody cares and nobody will impede
> them in going/comming back to Python.

If they have nothing better to do with their time, investigating a huge
variety of programming languages surely does appear a neat idea to me
(but then, like many other computer enthusiasts, I do like programming
languages for their own sake -- most users consider learning such tools
a price to be paid in order to be able to solve their problems, so their
evaluation of your suggestion may differ from mine). In the galaxy of
programming languages, Scheme is definitely an interesting one (the
"stalin" compiler, back when I last tried several extensively, seemed
even faster than the "bigloo" one, but the underlying language is still
Scheme in both cases), particularly thanks to the excellent book "SICP"
(now freely available on the net, and very, very instructive). But if
you do want to suggest to people that they explore other alternatives,
you'll find out that starting out by describing their current choice
as "crap" is definitely not the best way to "make friends and influence
people". Your skill at that appears to match that at numerically
intensive programming, spelling, and other fields already mentioned.

> Sorry, Alex you are one of the idiots and bigots who believe that Python
> is the best since sliced bred and butter.

Assuming "bred" is meant to stand for "bread" (rather than being the
past form of the verb "to breed", which is what one might normally
assume;-) -- I don't particularly care for pre-sliced bread, nor for
putting butter on my bread (I'm Italian, remember?). I do believe
that Python is currently the best alternative for application-level
programming, but you have not shown any support for your insults
against me on this score (nor for just about any of your assertions).

I know Scheme (and many dozens of other languages), and use the most
appropriate one for each task -- these days, my opinion (supported by
very substantial evidence) is that Python happens to be the most
appropriate language for most tasks I tackle (except that C keeps
being preferable in several cases, when I work on lower-level code).
Since my choices are (obviously -- I'm an engineer!) "pragmatic"
by your own previous admission my choice is OK. So how can that
choice, by itself, identify me as an idiot and a bigot? Thus, your
insults are not just unjustified and unsupported -- they're actually
self-contradictory with respect to other recent assertions of yours.
In other words, you keep justifying and reinforcing my opinion that
you're a sad travesty of a human being.

> When I was younger it happend that I was a similar idiot and believed
> that "functional programming" in all its glory will salvage this world.

Oh, and were you convinced you could spell, too?

Functional Programming is a very neat mental toy, and maybe one day
I'll come upon a real-world reason to USE it for production code. I'm
not holding my breath, though. My opinion that Python is the best
currently available language for most tasks isn't based on some kind
of preconceived evaluation: it's based on real-world experience over
a huge variety of programming tasks, which I have, over the decades,
tackled using a similarly huge variety of languages, in some cases by
myself, in most cases as a part of a team of programmers. I've seen
excellent programmers just fail to GET such (intrinsically excellent)
languages as Lisp and Scheme variants of all sorts, Prolog and its ilk,
functional programming languages -- try and fail to get any real
productivity with them. I've seen excellent programmers dive head-first
into complex higher-level languages such as Perl, and come up with huge
masses of unmaintainable code. SOME people can no doubt make very
effective use of each and every one of these tools, but experience has
shown me that they just don't generalize well. For other higher level
languages, such as Rexx, Icon, Python, and Ruby, my experience suggests
instead that programmers (including both experienced ones and ones with
less experience) tend to become very productive with them, very fast,
and write good code, suitable for team-work and maintenance. Out of
these, Python is currently my choice (and Ruby a close second, but there
are aspects of Ruby -- signally its "dynamic-all-the-way" nature -- that
suggest to me that, while it may be even better than Python for such
purposes as "tinkering", it's intrinsically inferior for the purpose of
building very large and complex systems which will need to be maintained
for many years by a large variety of people) -- in part for pragmatic
reasons, and I see nothing shameful in that.

> For me Python is crap and not any longer worth to consider. But look, I

If so, then hanging out on this newsgroup shows your ability to manage
your own time in doing things of interest to you is on a par with the
several other intellectual skills we've already mentioned.

> even introduced Python at our institution some month ago. A colleague

If you DO think it's crap, as you repeatedly said, then such
behavior is inexcusable -- you deliberately _damaged_ your "institution"
(once a common euphemism for "mental hospital", you know...) by
introducing there what you consider "crap". Again, the amount of
very sad, depending on one's viewpoint.

> asked me and I showed him Scheme. He didn't like the parantheses and I
> adviced him to buy some Python books. Indeed he did it. What is
> important here: he is happy that I am not any of this idiots who falls
> for and believes that his own programming language is the best thing.

Python is not "my own" programming language (if anybody's, it's Guido
van Rossum's) -- it's simply the one I currently consider as the best
available for most production tasks in application development. I, in
turn, have repeatedly advised (note the correct spelling) people who
came to this newsgroup in order to ask for changes to Python, to take
up other languages instead (e.g., Scheme, Dylan, or Common Lisp, if
they're *really* keen to have powerful macros in their language). The
difference, of course, is that I do not consider such languages to be
"crap", as you have stated about Python -- I would not damage another
human being by suggesting he use something I consider to be "crap".

> I will stop this discussion because I am used to discuss with people who
> can demonstrate to have at least a minimal amount of brain.

We'll see if you're a liar once again (wouldn't be surprising, considering
the amount of lies we've already seen in a few posts of yours) or if for
once you keep your word. Your stated reason for "stopping this discussion",
of course, doesn't hold water -- you've already repeatedly (if grudgingly)
conceded some of my points (indeed, you've explicitly said you thought one
of my posts would be useful to others) and therefore you've already,
unwittingly to be sure!, demonstrated that you admit I do have "at least
a minimal amount of brain". But so what -- lies, obfuscation, muddled
thinking, insults, confusion, self-contradiction -- I've seen nothing but
that in your posts; you appear to think as well as you spell, which IS
saying something. So long (hopefully)...

Alex

### Alan Kennedy

4 Aug 2003, 09:44:04
[Siegfried Gonzi wrote]

> For me Python is crap and not any longer worth to consider.

Then move on, and give us all a rest.

--
alan kennedy
-----------------------------------------------------
email alan: http://xhaus.com/mailto/alan

### Gerhard Häring

4 Aug 2003, 10:27:16
Alex Martelli wrote:
> [nice flame, fun to read]

I interpreted Mr. Gonzi's statement "Python is crap" to be meant for
numerical programming. Which I can support, unless you use third-party
libraries like Numeric, psyco or others.

Perhaps it was a misunderstanding and you two can calm down now.

-- Gerhard

### Alex Martelli

4 Aug 2003, 11:13:43
On Monday 04 August 2003 04:27 pm, Gerhard Häring wrote:
> Alex Martelli wrote:
> > [nice flame, fun to read]

> I interpreted Mr. Gonzi's statement "Python is crap" to be meant for
> numerical programming. Which I can support, unless you use third-party
> libraries like Numeric, psyco or others.

<shrug> that's like saying, say, that "C is crap" for multimedia
programming... unless you use third-party libraries like SDL. Of _course_
you'll use the appropriate tools for the kind of applications you're writing
(what languages, in turn, such tools/libraries are implemented in, is quite
secondary -- if and when the pypy project is done, it will not change the
nature of Python programming, even though extensions may well at that
time have to be similarly reimplemented in Python).

Using psyco is as hard as inserting _two_ statements into your code.
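
For concreteness, the two statements are `import psyco` and `psyco.full()`.
The sketch below adds them to the erfc function from this thread. The
try/except wrapper is an addition for illustration, not part of the posted
scripts: it assumes Psyco may be missing (it was written for 32-bit CPython
2.x), in which case the code simply runs unaccelerated.

```python
import math

try:
    # The two statements in question: import the module, then ask it to
    # compile every function it can.
    import psyco
    psyco.full()
except ImportError:
    # Assumption for this sketch: fall back to the plain interpreter when
    # Psyco is not installed or not supported.
    psyco = None


def erfc(x):
    # Rational approximation from Abramowitz & Stegun 7.1.26, as posted in
    # the thread, with x*x in place of x**2.
    exp = math.exp

    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429

    t = 1.0 / (1.0 + p * x)
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * exp(-x * x)


def main(n=1000000):
    # Same accumulation loop as the benchmark script, with n adjustable.
    erg = 0.0
    for _ in range(n):
        erg += erfc(0.456)
    return erg


if __name__ == "__main__":
    print("%f" % main())
```

Nothing else in the program changes, which is the point being made: the body
of `erfc` and the loop are identical with or without the two statements.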

You may call it "using a library", if that floats your boat, but to me it
feels much closer to using 3rd-party optimizer tools for Fortran matrix
computations (I still remember the early-90's release of such a tool as
totally busting the "specmarks"...), using a jit-enabled JVM rather than
a non-jitting one for Java, etc. It doesn't change the way you write
your code, as "using a library" normally implies -- it just makes your
code faster (if you're lucky;-). How does that differ from, say, using a
newer and better optimizer as part of your favourite compiler for any
given language? Why is it crucial to you whether such an optimizer is
"third-party" or rather sold by the same supplier as your base compiler?

If a language "is crap" as long as a needed optimizer is supplied by
a third party, then what would the magical process be that would
suddenly make it "non-crap" if the base compiler's seller bought out
the optimizer-selling company and released the optimizer itself? How
would such a purely commercial operation change the language from
"crap" into "non-crap"?-)

I'm afraid these observations suggest you may not have thought the
issues through.

> Perhaps it was a misunderstanding and you two can calm down now.

I'm quite calm, thanks (the flames being as fun to write as they are to
read:-). And Mr Gonzi has asserted he won't take further part in the
discussion, so, unless he wants to show himself up as an outright liar (not
for the first time), I guess your recommendations can't affect him now.

Glad to hear this! I'll no doubt flame again when the occasion should
arise (I _am_ quite prone to doing that, as is well known).

Alex

### Gerhard Häring

4 Aug 2003, 12:27:21
Alex Martelli wrote:
> On Monday 04 August 2003 04:27 pm, Gerhard Häring wrote:
>>I interpreted Mr. Gonzi's statement "Python is crap" to be meant for
>>numerical programming. Which I can support, unless you use third-party
>>libraries like Numeric, psyco or others.
>
> [...] Using psyco is as hard as inserting _two_ statements into your code.
>
> You may call it "using a library", [...]

> I'm afraid these observations suggest you may not have thought the
> issues through.

I'm well aware of what Psyco is and was even aware that 'library' was
not really describing it very well while typing my post (though you
'use' it as a library from within your Python programs!). I should have
been less lazy and replaced 'library' with 'addon' ;-)

-- Gerhard

### Michele Simionato

4 Aug 2003, 15:06:39
Alex Martelli <al...@aleax.it> wrote in message news:<0MoXa.21930$cl3.8...@news2.tin.it>...

> import math
>
> def erfc(x):
>     exp = math.exp
>
>     p = 0.3275911
>     a1 = 0.254829592
>     a2 = -0.284496736
>     a3 = 1.421413741
>     a4 = -1.453152027
>     a5 = 1.061405429
>
>     t = 1.0 / (1.0 + p*x)
>     erfcx = ( (a1 + (a2 + (a3 +
>               (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
>     return erfcx
>
> def main():
>     erg = 0.0
>
>     for i in xrange(1000000):
>         erg += erfc(0.456)
>
>     print "%f" % erg
>
> if __name__ == '__main__':
>     main()
>

Just for fun, I tested your scripts on my machine, and my results
actually support the (heretical?) argument that Python (with psyco)
is *faster* than C for numeric computations:

                 Time     Relative to (optimized) C

python+psyco     0.41s    0.59
c (-O option)    0.69s    1
c (normal)       0.80s    1.15
pure python      17.8s    26

This means that (in this computation) normal Python is 26 times
slower than optimized C; however, Psyco gives a 43x speed-up, and the
result ends up roughly 70% faster than optimized C (0.69s vs 0.41s)!

It took my breath away.

Kudos to Armin Rigo and Psyco!

Michele

P.S. Red Hat Linux 7.2, 500 MHz PIII, time measured with 'time',
Python 2.3 and Psyco 1.0, Alex Martelli scripts (with pow(x,2)-> x*x).
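
The ratios quoted in this post can be rechecked from the four timings alone.
The sketch below only redoes that arithmetic; the dictionary keys mirror the
table rows and the variable names are illustrative. No new measurements are
involved.

```python
# Wall-clock times in seconds, copied from the table in this post.
timings = {
    "python+psyco": 0.41,
    "c (-O option)": 0.69,
    "c (normal)": 0.80,
    "pure python": 17.8,
}

baseline = timings["c (-O option)"]

# Time of each run relative to optimized C (the table's last column).
relative = {name: seconds / baseline for name, seconds in timings.items()}

# How much Psyco speeds up plain Python: about 43x.
psyco_speedup = timings["pure python"] / timings["python+psyco"]

# How much faster Python+Psyco is than optimized C: about 1.68x,
# i.e. close to the "70% faster" quoted in the post.
psyco_vs_c = baseline / timings["python+psyco"]

if __name__ == "__main__":
    for name, ratio in relative.items():
        print("%-14s %6.2f" % (name, ratio))
    print("psyco speed-up over pure python: %.1fx" % psyco_speedup)
    print("psyco vs optimized C: %.0f%% faster" % ((psyco_vs_c - 1) * 100))
```

With these inputs the unoptimized C ratio comes out nearer 1.16 than the 1.15
shown in the table, presumably because the timings were rounded to two
figures before being posted.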

### Steve Hutton

5 Aug 2003, 23:58:32
In article <0MoXa.21930$cl3.8...@news2.tin.it>, Alex Martelli wrote:
> Siegfried Gonzi wrote:
> ...

>
>> g++ -O3 erf.c : 0.5 seconds
>> bigloo -Obench erf.scm: 1.1 seconds
>
> [alex@lancelot swig_wrappers]$ gcc -O -o gonzi gonzi.c -lm
> [alex@lancelot swig_wrappers]$ time ./gonzi

To be fair, using -O3 here instead of -O buys around a 15%
performance gain for me, presumably due to inlining.

Steve
