Back in the day, when looking at an interpreted language (or even
compiled ones) the first thing I would ask is, "how fast is it?"
These days, with 1 GHz machines selling for under $500, it seldom
comes up as an issue. And of course in Py's case you can always
'extend and embed' your core routines for fun & profit.
However, there are definitely cases where a lot of code would need to
be optimized, and so I ask the question: How fast is Python, compared
to say a typical optimizing C/C++ compiler?
I realize this is a more complex question than one might think. There
are various types of code constructs that might end up with different
efficiency issues. I guess what I'm asking is, in a general sense,
how fast is it now for typical code sequences, and -- importantly --
what could be done to optimize the interpreter? Are any parts written
in assembly? Could things like hash tables be optimized with parallel
units such as MMX? Etc.
Profile & code slow parts as C extensions.
Include your own assembly there if so desired.
Investigate Psyco. There was one example on this
newsgroup that showed that Python+psyco actually
outperformed the same program in compiled C.
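As a rough illustration of the "profile first" advice above, here is a minimal sketch using the standard library's cProfile and pstats modules. It is written for Python 3 rather than the Python 2.2 of this thread, and slow_sum_of_squares and main are hypothetical stand-ins for your own code, not anything from the posts.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot spot: a pure-Python loop (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    """Hypothetical program with one obvious hot spot."""
    return slow_sum_of_squares(200_000)


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Whatever shows up at the top of such a report is the candidate for a better algorithm or a C extension; the rest is usually not worth touching.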
Highly dependent on context. I use a factor of 10-20 as a ballpark,
with a factor of 100 for some things like low-level string processing.
E.g., I've got a pure Python regexp engine which clocks in at about 80x
slower than sre.
> what could be done to optimize the interpreter? Are any parts written
> in assembly? Could things like hash tables be optimized with parallel
> units such as MMX? Etc.
Spend a few tens of millions on developing just-in-time compilers
and program analysis. That worked for Java.
Nothing is written in assembly, except that C can be considered
a portable assembly language. Otherwise ports to different platforms
would be a lot more difficult.
I would hope that the C compiler could optimize the C code
sufficiently well for the hardware, rather than tweaking the
code by hand. (Though I know of at least one person who sent
in a patch to gcc to optimize poorly written in-house code.
Rather circuitous way to fix things, but it worked.)
This is my advice as well. Especially use the profiler and change your
high-level algorithms. You will find a lot of code with hidden quadratic
behaviour, which slows your program down when it comes to high volume.
Psyco will generally give a speed-up of about 2. This is fine (I use it!) but
not a breakthrough. There may be cases where it performs better.
A bottleneck can be Tkinter. Use something different then (Qt, wx).
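To make "hidden quadratic behaviour" concrete, here is a small sketch (Python 3; the function names and test data are made up for illustration). Both functions remove duplicates while keeping order, but the first one tests membership against a list, which turns an innocent-looking loop into O(n^2) work.

```python
import time


def dedupe_quadratic(items):
    """Keep first occurrences; 'x not in seen' scans a list, so O(n^2) overall."""
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen


def dedupe_linear(items):
    """Same result, but membership tests hit a set, so roughly O(n) overall."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


# 6000 items, 3000 distinct values: small enough to run quickly,
# large enough for the difference to be visible.
data = [i % 3000 for i in range(6000)]
for func in (dedupe_quadratic, dedupe_linear):
    start = time.perf_counter()
    func(data)
    print(func.__name__, time.perf_counter() - start)
```

The gap widens as the input grows, which is why this kind of problem tends to show up only at high volume.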
The extension modules run at optimized C speed because they *are*
For pure python applications, Psyco can provide just-in-time native
Tim once said that anything written using Python's dictionaries is
zillions of times faster than anything else. There is a grain of truth
in that because Python makes it so easy to create efficient data
structures that their performance can surpass less efficient data
structures written in assembly or C.
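A small sketch of what "easy to create efficient data structures" can look like in practice (Python 3; group_anagrams and the word list are hypothetical, chosen only for illustration). A dictionary keyed on the sorted letters acts as a hash index, so each word is placed in its group with one lookup instead of being compared against every other word.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other, using a dict as an index."""
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))  # letters in sorted order identify the group
        groups[key].append(word)
    return dict(groups)


words = ["listen", "silent", "enlist", "google", "gogole", "python"]
groups = group_anagrams(words)
print(groups["eilnst"])
```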
All that being said, Python is designed for those who value
programmer time more than they value clock cycles.
I think (but will gladly stand corrected if I'm wrong!) that
this is a misinterpretation of some code I posted -- the
C code (crazily) used pow(x,2.0), the Python one (sanely)
x*x -- within a complicated calculation of erf, and that
one malapropism in the C code was what let psyco make
faster code than C did. With C fixed to use x*x -- as any
performance-aware programmer will always code -- the
two ran neck to neck, no advantage to either side.
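For anyone who wants to check the pow-versus-multiply point on their own machine, here is a minimal timing sketch in Python 3 using timeit. It measures the two spellings in Python, not in C, so it only illustrates the measuring technique; the value of x and the repetition count are arbitrary assumptions.

```python
import math
import timeit

x = 1.2345

# Both expressions compute the square of x.
via_pow = math.pow(x, 2.0)
via_mul = x * x

# Time each spelling in isolation; 'number' is an arbitrary repetition count.
t_pow = timeit.timeit("math.pow(x, 2.0)", globals={"math": math, "x": x}, number=200_000)
t_mul = timeit.timeit("x * x", globals={"x": x}, number=200_000)

print("math.pow(x, 2.0): %.4f s" % t_pow)
print("x * x:            %.4f s" % t_mul)
```

The point of the exercise is that a benchmark compares the code as written, so one careless expression on either side can decide the result.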
>>Investigate Psyco. There was one example on this
>>newsgroup that showed that Python+psyco actually
>>outperformed the same program in compiled C.
> I think (but will gladly stand corrected if I'm wrong!) that
> this is a misinterpretation of some code I posted -- the
> C code (crazily) used pow(x,2.0), the Python one (sanely)
> x*x -- within a complicated calculation of erf, and that
> one malapropism in the C code was what let psyco make
> faster code than C did. With C fixed to use x*x -- as any
> performance-aware programmer will always code -- the
> two ran neck to neck, no advantage to either side.
Whoops, I missed that :) Thanks for the clarification.
Nevertheless, a Psyco-optimized piece of Python code
that runs as fast as compiled C is still very impressive
to me. I know that JIT compiler technology theoretically
could produce better optimized code than a static optimizing
compiler, but am happy already if it reaches equal level :-)
--Irmen de Jong
If anybody does have an actual example (ideally toy-sized:-)
where psyco's JIT does make repeatably faster code than a
C compiler (well-used, e.g. -O3 for gcc, NOT just -O...!-)
I'd be overjoyed to see it, by the way.
> A bottleneck can be Tkinter. Use something different then (Qt, wx)..
I've found wx to be way slower than Tkinter.
On a P133 running Win98, a McMillan-compiled prog using wx took twice as
long to start up as a similar prog implemented in Tkinter.
I did a benchmark some time ago (nothing optimised):
The purpose of this technical report is to gauge the relative speed of
the languages: VB, VBA, Python 2.2, C++, and Fortran.
It was discovered that uncompiled VB code in VB 6.0 ran at the same
speed as VBA code in Excel. It was half the speed of compiled VB code,
5 times the speed of Python, and 1/20th the speed of C++/Fortran.
The following algorithm was implemented in each of the target
X = 0.5
For I = 1 to 108
    X = 1 - X * X
Timings were made for the execution. The following results were
Language                 Timing (seconds)
VB - uncompiled          74
VB - compiled            37
VBA - Excel              75
C++ - debug version      4
C++ - release version    3
The timings for Fortran are approximate. The execution time had to be
timed with a stopwatch because timing functions could not be
On the strength of this thread, I investigated Psyco. Results of a
very quick investigation with the following program:
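import sys
import time
import psyco

def calcPi(iterations):
    pi4 = 1.0
    for i in xrange(1, iterations):
        denominator = (4*i)-1
        pi4 = pi4 - 1.0/denominator + 1.0/(denominator+2)
    return pi4 * 4.0

def timethis(func, funcName):
    # Iteration count from the command line, defaulting to one million.
    try:
        i = int(sys.argv[1])
    except (IndexError, ValueError):
        i = 1000000
    start = time.time()
    pi = func(i)
    end = time.time()
    print "%s calculated pi as %s in %s seconds" % (funcName, pi, end - start)

speedyPi = psyco.proxy(calcPi)

if __name__ == '__main__':
    timethis(calcPi, 'calcPi')
    timethis(speedyPi, 'speedyPi')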
produced the following results on a 1.7GHz P4 running FreeBSD 4.8:
C is roughly 10 to 100 times faster than Python, though of course it's
easy to find cases outside of this range, on either side.
I use 30 as a general overall rule of thumb, in the exceptionally
few cases where it seems relevant how much faster C would be.
And in those very few cases, so far, I have consistently concluded
I'm happy enough with the speed of Python given that the speed of
*development* in Python is easily 5 to 10 times faster than the
speed of development in C. (And again, it's easy to find cases
outside of this range, on either side...)
> It was discovered that uncompiled VB code in VB 6.0 ran at the same
> speed as VBA code in Excel. It was half the speed of compiled VB code,
> 5 times the speed of Python, and 1/20th the speed of C++/Fortran.
Although, as the saying goes, there's no such thing as a slow language
- only slow implementations.
Interesting. One wonders what and where you measured, e.g:
[alex@lancelot gmpy]$ cat a.cpp
int main() {
    double X = 0.5;
    for(int i = 0; i < 108; i++)
        X = 1 + X * X;
}
[alex@lancelot gmpy]$ g++ -O3 a.cpp
[alex@lancelot gmpy]$ time ./a.out
0.01user 0.00system 0:00.00elapsed 333%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (186major+21minor)pagefaults 0swaps
i.e., it's just too fast to measure. Not much better w/Python...:
[alex@lancelot gmpy]$ cat a.py
X = 0.5
for i in xrange(108):
    X = 1 + X*X
[alex@lancelot gmpy]$ time python -O a.py
0.03user 0.01system 0:00.15elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (452major+260minor)pagefaults 0swaps
i.e., for all we can tell, the ratio COULD be 100:1 -- or just about
anything else! Perhaps more details are warranted...
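One way around "too fast to measure" is to time the loop inside the process and repeat it many times, so that interpreter start-up is excluded and the total is well above the timer resolution. The sketch below (Python 3) does this with timeit for the 1 - X*X recurrence from the report. The name iterate and the repetition count are assumptions for illustration, and the 108 is taken literally from the post, although it may have been meant as a much larger number.

```python
import timeit


def iterate(n, x=0.5):
    """Apply X = 1 - X*X n times (the recurrence from the report)."""
    for _ in range(n):
        x = 1 - x * x
    return x


# Repeat the 108-step loop many times so the total is measurable.
runs = 2_000
total = timeit.timeit(lambda: iterate(108), number=runs)
per_run = total / runs

print("total for %d runs: %.4f s" % (runs, total))
print("per 108-step run:  %.2e s" % per_run)
```

Dividing by the number of runs gives a per-run figure that can be compared across languages, provided each one is measured the same way.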
Consider the percentage of software projects for which the total
number of hours of developer time over the life of the project
exceeds the total number of hours of CPU run time during productive
use of the software produced. This percentage is abysmally high.
Python works on improving it on both ends, by both reducing the
developer time and increasing the number of hours of productive
use. What more could you want?
Though there's also pleac.sf.net which isn't for timings
but does show how the different languages would be
used to do the same thing.
And I see my Python contribution still leads the
pack in % done.
Of course! The wx DLL is more than 6 MB whilst Tcl/Tk still stays at around 1.
I am not talking about start-up. If you have ever used a Canvas with a
600x800 image or with a thousand items, or a Tix HList with a dozen
differently styled columns, you might know WHAT I am talking about.
Even with less filled widgets, most of what you perceive as "lazy" with e.g.
games is generally not Python but the Tcl interpreter. Pygame shows that
you can do fast visualisation with Python.
> produced the following results on a 1.7GHz P4 running FreeBSD 4.8:
> > python2.2 pi.py
> calcPi calculated pi as 3.14159315359 in 3.87623202801 seconds
> speedyPi calculated pi as 3.14159315359 in 0.790405035019 seconds
> -- Neil
This is certainly correct. My experience with more general programs running
for a few minutes shows that you can expect a speed-up of two. This is still
impressive when you have your results in 5 instead of 10 minutes.
In a way this comes back to practicality vs. purity. In a synthetic
benchmark where one function is called repeatedly with homogeneous data,
it's hard to imagine that a JIT compiler could ever outperform a good
optimizing C compiler. But that's the pure side of a performance
analysis. The practical side is how that function performs in a real
application where a JIT for a dynamically typed language has much more
information to work with than a C compiler does.
For example, a C compiler might only know that you declared a parameter
as a double and so it can only optimize for that. If you happen to call
the function more often than not with an int (that gets promoted to a
double on the way in) then the compiler generated code may waste a good
deal of time doing floating point arithmetic rather than integer
arithmetic. Now the programmer might take the time to profile their C
program and hopefully notice that time is being wasted doing floating
point arithmetic and then create an int version of their function, but
often practical constraints will get in the way of this happening.
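A toy illustration of the idea in Python 3: dispatch on the type actually seen at run time, so that int arguments stay in integer arithmetic while everything else takes the general float path. This is only a sketch of the concept and not how Psyco or any real JIT is implemented; make_specializing, square_generic, and square_int are hypothetical names.

```python
def make_specializing(generic, specialized):
    """Return a wrapper that picks an implementation from the argument's runtime type."""
    def wrapper(x):
        impl = specialized.get(type(x), generic)
        return impl(x)
    return wrapper


def square_generic(x):
    """General path: always works in floating point, like a C 'double' parameter."""
    x = float(x)
    return x * x


def square_int(x):
    """Specialised path: stays in integer arithmetic."""
    return x * x


square = make_specializing(square_generic, {int: square_int})

print(square(3))    # int argument takes the integer path
print(square(1.5))  # float argument takes the general path
```

A statically compiled function with a double parameter has only the general path available; the information about which types are really being passed exists only at run time.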
This brings back memories of the old arguments about optimizing C
compilers being able to generate faster code than hand-written assembly.
Of course an expert at assembly could write a faster program given
enough time, but most people didn't have the time or expertise to write
assembly code that could perform as well as optimized C once the
compilers attained a certain level of sophistication. In many practical
situations C is faster than assembly. PyPy is exciting because it
presents hope of providing the JIT with enough extra information and
flexibility that it may be able to make practical Python code outperform
practical C code in many cases.
The example above involving double/int was not just an example. It
happened in an application of mine a while back. I replaced a function
in a C extension with a Psyco-compiled Python version of the same
function and the performance of the part of my application that used the
function doubled. I posted a couple of notes about it. The first is
about a test in isolation (the pure test) and C was faster:
http://tinyurl.com/kpgd (includes Python code and links to C code)
The second note came later after I decided the slightly slower
Python/Psyco version was fast enough to eliminate the headaches of
maintaining the C extension. After replacing the C code I was startled
by a performance improvement in my application. This is the about the
And finally a bit of a caveat:
I've looked at Psyco and Pyrex, I think both are interesting projects
but I doubt anything in the Py world has had nearly the kind of
man-hours devoted to optimization that Java, C++, and probably C# have
... and anyway, this modified code (which does actually compute X when
compiled on my system) aborts with a floating point overflow error.
As far as I can tell, your program would be computing a value on the
order of 10^(3x10^31)...
Ah, the joy of writing the proverbial good benchmark.
fpu_control_t __fpu_control = _FPU_IEEE &~ _FPU_MASK_OM;
double X = 0.5;
for(int i = 0; i < 108; i++)
    X = 1 + X * X;
Y = X;
> for(int i = 0; i < 108; i++)
> X = 1 + X * X;
He had 1-X*X. Since X starts at 0.5, this will never leave
the range 0 to 1.
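The difference between the two recurrences is easy to check. The sketch below (Python 3; the helper names are made up) runs both for 108 steps. It also estimates the size of the 1 + X*X result without overflowing, on the assumption that once X is huge the +1 is negligible and each step simply doubles log10(X).

```python
import math


def iterate(step, n, x=0.5):
    """Apply 'step' to x n times and return the final value."""
    for _ in range(n):
        x = step(x)
    return x


def log10_estimate(n, x=0.5):
    """Estimate log10 of the n-th iterate of X = 1 + X*X without overflowing."""
    steps = 0
    while steps < n and x < 1e100:
        x = 1 + x * x
        steps += 1
    # From here on the +1 is negligible, so each step doubles log10(X).
    return math.log10(x) * 2.0 ** (n - steps)


minus = iterate(lambda x: 1 - x * x, 108)  # stays within [0, 1]
plus = iterate(lambda x: 1 + x * x, 108)   # Python float multiply overflows to inf

print("1 - X*X after 108 steps:", minus)
print("1 + X*X after 108 steps:", plus)
print("log10 of the true 1 + X*X value is roughly %.1e" % log10_estimate(108))
```

The estimate comes out on the order of 10^31 for the exponent, in line with the figure quoted earlier in the thread.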
Oh, I completely misinterpreted the question then. I thought you wanted
_In principle_, (which I'll interpret as "in theory"), Python can be made
to run even faster than C or C++.
In practice, nobody has been able to prove or disprove that theory yet...
>How fast is Python, compared
>to say a typical optimizing C/C++ compiler?
The most important time for me is the time *I* invest in a program,
since when it's run-time, I can always do other stuff while some slave
computer follows my orders. So, I'll reply only about development time
and I'll quote the Smiths: "How Soon Is Now?" :)
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.
Actually, as I posted in the C sharp thread a few weeks ago, on my
machine Python+psyco was FASTER than C. The numbers quoted are for the C
option -O, but even with -O3 psyco was still faster and, notice, with
pow(x,2) replaced by x*x in C too. I would be happy if somebody can
reproduce that. Here is the link:
Andrew gave the same quantities, incidentally. Myself,
I use ten "as a general over-all rule of thumb", and
expect generally to be in the three-to-thirty range. I
know other programmers whose Python work consistently
runs about one-one-hundredth as fast as the C equivalent.
As near as I can tell, that reflects on the kinds of
programming we do (how numeric, and so on), rather than
the quality of our coding.
A bit off-topic perhaps, but I'd be interested in the details of
Steven Taschuk o- @
stas...@telusplanet.net 7O )
Okay. I know someone who really likes optimized programming.
The kind of person who will develop an in-memory compiler
to generate specialized assembly for the exact parameters used,
thus squeezing out a few extra cycles. He works in a C++ company.
They used an idiom, the details of which I don't know. Most
people wouldn't use that idiom because it didn't translate well
to assembly, but the compiler in theory could figure it out. He
submitted a patch to do that optimization. It was originally
rejected because they couldn't see that anyone would write
code that way. He dug around in gcc itself to find some place
which used that code, to show that it is used. It was accepted.
Moral: it's easier to change the technical details (gcc) than
the social ones (getting people to use a better idiom).
That's about all I know of the story.
My comment is completely off-topic, but I enjoyed a lyrical moment
when I mis-read Cameron's statement, and found myself imagining what
"Peter's wise counsel bears" looked like. I am envious of Peter,
having never made any magical forest-friends myself.
If we each had at least /one/ wise counsel bear, then c.l.py would
certainly reap the benefits of our enhanced posts!
> Spend a few tens of millions on developing just-in-time compilers
> and program analysis. That worked for Java.
Have you heard of Jython - python language running on a java VM? It's kind
of double interpreted - the python source is converted to JVM bytecode,
and then the JVM runs it however that JVM runs bytecode. I guess it should
be many times faster than python because of the JVM performance, and
would be interested to hear any comparisons.
> Have you heard of Jython - python language running on a java VM? It's kind
> of double interpreted - the python source is converted to JVM bytecode,
> and then the JVM runs it however that JVM runs bytecode. I guess it should
> be many times faster than python because of the JVM performance, and
> would be interested to hear any comparisons.
Jython faster than Python? We did a little test and it doesn't seem so, look:
Lawrence "Rhymes" Oluyede
> experience with ReportLab suggests jython can be fairly slow compared to
> CPython although it does have advantages.
The advantages being?
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: and...@bullseye.apana.org.au (pref) | Snail: PO Box 370
and...@pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
> Jython faster than Python? We did a little test and it doesn't seem so, look:
Please bear in mind that the test code included the start-up time for
the interpreter. For Jython, this is a high cost, because starting a JVM
can take 10 seconds or more.
It would probably be fairer to run timings after the VM has already
been through the startup phase. I think that is a more valid
reflection of real-world scenarios where a VM gets started once and
left running for a long time.
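The same separation can be sketched for any interpreter. The Python 3 example below times a fresh interpreter that does nothing (almost pure start-up cost) separately from a piece of work timed inside an already running, warmed-up process. It uses CPython via sys.executable rather than Jython, and work is a hypothetical stand-in for the real test code.

```python
import subprocess
import sys
import time


def startup_seconds():
    """Time a fresh interpreter that does nothing: almost pure start-up cost."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def work(n=200_000):
    """Hypothetical workload standing in for the real benchmark body."""
    total = 0
    for i in range(n):
        total += i
    return total


def steady_state_seconds():
    """Time the workload in an already running process, after a warm-up call."""
    work()  # warm-up, not timed
    start = time.perf_counter()
    work()
    return time.perf_counter() - start


print("start-up:     %.4f s" % startup_seconds())
print("steady state: %.4f s" % steady_state_seconds())
```

Reporting the two numbers separately makes it clear how much of a measured difference is start-up and how much is the code under test.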
> The advantages being?
I think gaining access to Java stuff is an advantage in some situations,