Re: About alternatives to Matlab

76 views
Skip to first unread message

sturlamolden

unread,
Nov 16, 2006, 4:09:03 PM11/16/06
to
Boris wrote:
> Hi, is there any alternative software for Matlab? Although Matlab is
> powerful & popular among mathematical & engineering guys, it still
> costs too much & not publicly open. So I wonder if there's similar
> software/lang that is open & with comparable functionality, at least
> for numerical aspect. Thanks!

I have used Matlab for years, and has recently changed to Python. In
addition one needs NumPy and Matplotlib, and perhaps SciPy. Other
useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for
3D graphics, Mpi4Py for parallel computation, etc. You will find python
packages for nearly any conceivable task. Unlike Matlab, Python is a
general purpose programming language, and a distant cousin of Lisp. The
Python language is fare more expressive and productive than Matlab, yet
even more easy to use.

The NumPy package is the core requirement for numerical work in Python.
It is quite different form Matlab, but I think it is more powerful.
Particularly, arrays are passed by reference (not by value), and
indexing creates view matrices.

To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
transform. I have here coded it in Matlab and Python/NumPy using Tim
Swelden's lifting scheme.

First the Matlab version (D4_Transform.m):

function x = D4_Transform(x)
% D4 Wavelet transform in Matlab
% (C) Sturla Molden
C1 = 1.7320508075688772;
C2 = 0.4330127018922193;
C3 = -0.066987298107780702;
C4 = 0.51763809020504137;
C5 = 1.9318516525781364;
s1 = zeros(ceil(size(x)/2));
d1 = zeros(ceil(size(x)/2));
d2 = zeros(ceil(size(x)/2));
odd = x(2:2:end);
even = x(1:2:end-1);
d1(:) = odd - C2*even;
s1(1) = even(1) + C2*d1(1) + C3*d1(end);
s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);
d2(1) = d1(1) + s1(end);
d2(2:end) = d1(2:end) + s1(1:end-1);
x(1:2:end-1) = C4*s1;
x(2:2:end) = C5*d2;
if (length(x) >2)
x(1:2:end-1) = D4_Transform(x(1:2:end-1));
end


Then the Python version (D4.py):

import numpy
import time

def D4_Transform(x, s1=None, d1=None, d2=None):
"""
D4 Wavelet transform in NumPy
(C) Sturla Molden
"""
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]
s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
d2[0] = d1[0] + s1[-1]
d2[1:] = d1[1:] + s1[:-1]
even[:] = C4 * s1[:]
odd[:] = C5 * d2[:]
if x.size > 2:

D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

if __name__ == "__main__":
x = numpy.random.rand(2**23)
t0 = time.clock()
D4_Transform(x)
t = time.clock()
print "Elapsed time is %.6f seconds" % (t-t0)


Now let's do benchmark on my laptop (1.73 GHz Pentium M, 0.99 GB RAM).
I have stopped paying for Matlab maintenance (for reasons that will be
obvious), so I only have R14 Service Pack 2 for comparison.

First we try Matlab (R14 Service Pack 2):

>> x = rand(2^23,1);
>> tic; D4_Transform(x); toc
Elapsed time is 27.145438 seconds.


Then we Python 2.4.4 with NumPy 1.0:

C:\develop\python\D4>python D4.py
Elapsed time is 3.364887 seconds

That is quite astonishing.

If anyone wonders why I think Travis Oliphant and the NumPy team should
be knighted, then this is the answer. The Mathworks' product only
achieved 100% * 3/27 = 11% the speed of Python/NumPy, and is infinitely
more expensive.

Does anyone wonder why I am not paying for Matlab maintenance anymore?

Sorry Mathworks, I have used your product for years, but you cannot
compete with NumPy.


Cheers,
Sturla Molden

John Henry

unread,
Nov 16, 2006, 4:46:13 PM11/16/06
to
Bill Gates will have you jailed! :-)

On a more serious note, is there any alternative to Simulink though?

Ramon Diaz-Uriarte

unread,
Nov 16, 2006, 5:23:07 PM11/16/06
to sturlamolden, pytho...@python.org
R (http://cran.r-project.org) might be an alternative, specially if
you do a lot of statistics and graphics. (R is probably the most
widely used language/system in statistical research).


R.

> --
> http://mail.python.org/mailman/listinfo/python-list
>


--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Cameron Laird

unread,
Nov 16, 2006, 5:55:38 PM11/16/06
to
In article <1163711343.8...@h48g2000cwc.googlegroups.com>,

sturlamolden <sturla...@yahoo.no> wrote:
>Boris wrote:
>> Hi, is there any alternative software for Matlab? Although Matlab is
>> powerful & popular among mathematical & engineering guys, it still
>> costs too much & not publicly open. So I wonder if there's similar
>> software/lang that is open & with comparable functionality, at least
>> for numerical aspect. Thanks!
>
>I have used Matlab for years, and has recently changed to Python. In
.
.
.
While I frequently push Matlab users toward Python, they also deserve
to know about Octave <URL:
http://www-128.ibm.com/developerworks/library/l-oslab/?n-l-10242 >.

Matimus

unread,
Nov 16, 2006, 6:37:48 PM11/16/06
to
Boris wrote:
> Hi, is there any alternative software for Matlab? Although Matlab is
> powerful & popular among mathematical & engineering guys, it still
> costs too much & not publicly open. So I wonder if there's similar
> software/lang that is open & with comparable functionality, at least
> for numerical aspect. Thanks!

There is also Scilab. I've only used it a tiny bit but for the task it
worked well. I do know that it has a somewhat restrictive license. It
is open source, but you aren't allowed to modify and redistribute the
source. I get the feeling that some people avoid Scilab but I'm not
sure why. If anybody knows the reason I'd be happy to hear it.

Scilab website: http://www.scilab.org

-Matt

Robert Kern

unread,
Nov 16, 2006, 6:47:09 PM11/16/06
to pytho...@python.org
Matimus wrote:
> There is also Scilab. I've only used it a tiny bit but for the task it
> worked well. I do know that it has a somewhat restrictive license. It
> is open source, but you aren't allowed to modify and redistribute the
> source. I get the feeling that some people avoid Scilab but I'm not
> sure why. If anybody knows the reason I'd be happy to hear it.

I think you just stated it.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Brian Blais

unread,
Nov 17, 2006, 1:51:02 PM11/17/06
to Matimus, scipy...@scipy.org, pytho...@python.org
Matimus wrote:
> Boris wrote:
> > Hi, is there any alternative software for Matlab? Although Matlab is
> > powerful & popular among mathematical & engineering guys, it still
> > costs too much & not publicly open. So I wonder if there's similar
> > software/lang that is open & with comparable functionality, at least
> > for numerical aspect. Thanks!
>
> There is also Scilab.
> If anybody knows the reason I'd be happy to hear it.

I feel like I need to respond on this thread, as a recent Matlab/Octave/Scilab to
Python convert. I'm cross-posting this to the scipy list, because there was a recent
thread about advantages to Python over Matlab.

Bottom line: Python is more than a replacement for Matlab, in almost every capacity.
There are a couple of places here and there where I feel Matlab still has an
advantage, but since I converted about 12 months ago, I have run Matlab only a couple
of times, and only to send something to someone who didn't have Python installed.

I've been using Matlab for more than 10 years, in every part of my research and
teaching. When I started to teach courses with a larger Matlab component, I looked
for free/cheap alternatives that the students could use on their own machines. There
was the student version of Matlab, which was crippled in ways unacceptable to me.

Octave was good, and I used it a couple of years, but the MS Windows port is pretty
horrible. I still use it some on Linux, but on Windows it requires Cygwin to be
installed, and performance takes a *huge* hit with the compiled extensions in
Windows. The community is pretty amazing, and the main author John Eaton is nothing
short of incredible (I've personally received bug fix responses within minutes of
submitting the bug report!). Writing C++ extensions is also pretty easy in Octave,
and I made a lot of use of that. The graphics uses a connection to Gnuplot, which is
ok.

I used Scilab for a short while. It did install more cleanly in Windows, and the
performance was generally better than Octave. I didn't like how some things were not
Matlab compatible for no good reason. I got the feeling from the community that it
wasn't as open, and that decisions were somewhat arbitrary. GUI syntax was weird,
and compiling extensions was a real pain, requiring more than one file per function.

I was introduced to python about 12 months ago, and haven't looked back since. :)
I want to state some of the hurdles that I felt in the transition, in the hope that
it will help others to overcome them. The advantages to Python far outweigh the
problems that I had.

1) in Matlab, you can call a function without any execfile() or import statements.
This impacts Python negatively in the ease of using it interactively. On the other
hand, the one-function per file organization of Matlab can get a little hard to follow.

2) following up on point 1), this has an effect on matplotlib more drastically.
Turning it to interactive mode (ion()) helps, but in Linux I get window refresh
issues with this mode. Switching to ipython for the shell, making sure to call it
with ipython -pylab, has been a big help in this regard.

3) 3D plotting requires yet-another library. luckily I haven't had to use this much,
but I hope that someday that it will be part of matplotlib.

4) GUI development is a bit easier in Matlab for small projects, and harder for
larger ones. I like using wax in python, which wraps wxPython. Others swear by
Dabo, but I always get two windows popping up when I run even the hello world daboui
program. At any rate, wxPython looks much better on Windows than the same interface
in Matlab, and it is more robust across operating systems.

5) although stated in the python docs that python is easy to interface with C, and in
the scipy docs that it is easier to interface to C than Matlab, I cannot agree.
perhaps it is my limitation, but I found that the API was more complicated (because
of the multiple data structures), and having to keep track of references was
something I never had to do in writing Matlab extensions. HOWEVER, once I found
Pyrex, all of that changed. Pyrex is simply the best way to write an extension that
I have ever seen, and I have never seen its equal for any other language.


So my recommendation for a (nearly) complete Matlab replacement would be:
python
numpy
scipy
matplotlib
pyrex

There is a bit of a learning curve, but it is well worth it. On an amusing note, if
you work in Python for a while, and go back to Matlab for a bit, you will find that
you will forget your semi-colons at the ends of the Matlab lines, and all of your
arrays will display and scroll like crazy on your screen. :)

thanks all for making such a great set of tools!


Brian Blais

--
-----------------

bbl...@bryant.edu
http://web.bryant.edu/~bblais


Filip Wasilewski

unread,
Nov 17, 2006, 2:17:49 PM11/17/06
to
sturlamolden wrote:
> Boris wrote:
> > Hi, is there any alternative software for Matlab? Although Matlab is
> > powerful & popular among mathematical & engineering guys, it still
> > costs too much & not publicly open. So I wonder if there's similar
> > software/lang that is open & with comparable functionality, at least
> > for numerical aspect. Thanks!
>
> I have used Matlab for years, and has recently changed to Python. In
> addition one needs NumPy and Matplotlib, and perhaps SciPy. Other
> useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for
> 3D graphics, Mpi4Py for parallel computation, etc. You will find python
> packages for nearly any conceivable task. Unlike Matlab, Python is a
> general purpose programming language, and a distant cousin of Lisp. The
> Python language is fare more expressive and productive than Matlab, yet
> even more easy to use.

Huh, so you don't think it's just damn sexy to do OOP, networking and
databases by multiplying matrices?;-)


> The NumPy package is the core requirement for numerical work in Python.
> It is quite different form Matlab, but I think it is more powerful.
> Particularly, arrays are passed by reference (not by value), and
> indexing creates view matrices.

For those interested in drilling down on this topic there is a Numpy
for Matlab Users guide at http://scipy.com/NumPy_for_Matlab_Users


> To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
> transform. I have here coded it in Matlab and Python/NumPy using Tim
> Swelden's lifting scheme.

[...]

Actually you have not. The algorithm you presented gives completely
wrong results. Have a look at quick&dirty(TM) implementation bellow.

import math
import numpy

def d2_lwt(x, axis=-1, level=None, copy=False):
"""
Daubechies2 (4 point support) Lifting Wavelet Transform in NumPy
"""
C1 = 1.7320508075688772 # sqrt(3)
C2 = 0.4330127018922193 # sqrt(3)/4
C3 = -0.066987298107780702 # (sqrt(3)-2)/4)
C4 = 0.51763809020504137 # (sqrt(3)-1)/sqrt(2)
C5 = 1.9318516525781364 # (sqrt(3)+1)/sqrt(2)

if not isinstance(x, numpy.ndarray) or x.dtype.kind != 'f':
x = numpy.array(x, dtype=numpy.float64)
elif copy:
x = x.copy()

max_level = int(math.floor(math.log(x.shape[axis],2)))
if level is None:
level = max_level
else:
assert level <= max_level, "Level param too high"

coeffs = []
cA = x.swapaxes(0, axis)
while level > 0:
level -= 1

# lazy
even = cA[::2]
odd = cA[1::2]

# dual
# using `even = even + C1*odd` may speed up things on
# some machines (only for not in-place transform).
even += C1*odd

# primal
odd[0] -= C2*even[0] + C3*even[-1]
odd[1:] -= C2*even[1:] + C3*even[:-1]

# dual
even[:-1] -= odd[1:]
even[-1] -= odd[0]

# scale
even *= C4
odd *= C5

cA, cD = even, odd
coeffs.append(cD.swapaxes(axis, 0))

coeffs.append(cA.swapaxes(axis, 0))
coeffs.reverse()
return coeffs

if __name__ == "__main__":
d = [1,5,3,2,4,8,5,2]
data = [
numpy.array(d, dtype=numpy.float32),
numpy.array(d, dtype=numpy.float64),
numpy.array([d]*2, dtype=numpy.float64),
numpy.array([d]*2, dtype=numpy.float32),
numpy.array(d, dtype=numpy.int32),
[d]*2,
[[d,d]]*3,
]

for i,x in enumerate(data):
print "Case %d:" % (i+1)
print "x in:\n", x
coeffs = d2_lwt(x, axis=-1, level=None, copy=False)
print "coeffs:"
for c in coeffs:
print c, c.dtype
print "x out:\n", x
print


Please excuse me not including Matlab version here, I would like to
have a free weekend.

As far as the speed comparison is concerned I totally agree that NumPy
can easily outperform Matlab in most cases. Of course one can use
compiled low-level extensions to speed up specific computations in
Matlab, but it's a lot easier and/or cheaper to find very good tools
for Python.


> If anyone wonders why I think Travis Oliphant and the NumPy team should
> be knighted, then this is the answer.

Would love to see the ceremony:)

cheers,
fw

Sébastien Boisgérault

unread,
Nov 17, 2006, 2:38:02 PM11/17/06
to

On Nov 16, 10:46 pm, "John Henry" <john106he...@hotmail.com> wrote:
> Bill Gates will have you jailed! :-)
>
> On a more serious note, is there any alternative to Simulink though?

Ptolemy II. Java stuff in the core but components may be written in
Python

http://ptolemy.eecs.berkeley.edu/ptolemyII/
http://ptolemy.eecs.berkeley.edu/ptolemyii/ptII5.0/ptII/doc/codeDoc/ptolemy/actor/lib/python/PythonScript.htm

John Henry

unread,
Nov 17, 2006, 4:42:47 PM11/17/06
to
Thanks for pointing that out. I wasn't aware of this. Will take a
look.

Maarten van Reeuwijk

unread,
Nov 17, 2006, 7:15:48 PM11/17/06
to
sturlamolden wrote:

> Sorry Mathworks, I have used your product for years, but you cannot
> compete with NumPy.

Funny. I went exactly the other way. Had a full OO postprocessing library
for Python/Scipy/HDF etc which worked brilliantly. Then changed to a 64 bit
machine and spent three days trying to install all the additional libraries
that I had become dependent on over the year. In the end, I gave up
completely frustrated (something to do with a Fortran compiler in
combination with SciPy or something). Then I tried MATLAB and was
completely delighted that there was a program with all MY batteries
included :-).

BTW, I have the impression that MATLAB and SciPy have about the same
performance.

Cheers, Maarten
--
===================================================================
Maarten van Reeuwijk dept. of Multiscale Physics
Phd student Faculty of Applied Sciences
maarten.vanreeuwijk.net Delft University of Technology

Prabhu Ramachandran

unread,
Nov 18, 2006, 5:39:31 AM11/18/06
to SciPy Users List, pytho...@python.org, Matimus
>>>>> "Brian" == Brian Blais <bbl...@bryant.edu> writes:

Brian> 3) 3D plotting requires yet-another library. luckily I
Brian> haven't had to use this much, but I hope that someday that
Brian> it will be part of matplotlib.

I'd rather not say anything about this since I have strong opinions
about the subject.

Brian> 4) GUI development is a bit easier in Matlab for small
Brian> projects, and harder for larger ones. I like using wax in
Brian> python, which wraps wxPython. Others swear by Dabo, but I
Brian> always get two windows popping up when I run even the hello
Brian> world daboui program. At any rate, wxPython looks much
Brian> better on Windows than the same interface in Matlab, and it
Brian> is more robust across operating systems.

You should try and take a look at enthought.traits. It makes creating
UIs a lot easier than anything else I've seen.

cheers,
prabhu


sturlamolden

unread,
Nov 18, 2006, 1:57:00 PM11/18/06
to

Filip Wasilewski wrote:

> Actually you have not. The algorithm you presented gives completely
> wrong results. Have a look at quick&dirty(TM) implementation bellow.

God grief. I followed the implementation in Ingrid Daubechies' and Wim
Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:
247-269, 1998). If you look at the factorized polyphase matrix for D4
(which gives the inverse transform), their implementation of the
lifting steps the original paper is incorrect. That is scary.

A free PDF or PS of the paper is available here:

http://cm.bell-labs.com/who/wim/papers/factor/index.html

Thanks a lot!


Sturla Molden

sturlamolden

unread,
Nov 18, 2006, 2:45:27 PM11/18/06
to

sturlamolden wrote:

> God grief. I followed the implementation in Ingrid Daubechies' and Wim
> Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:
> 247-269, 1998). If you look at the factorized polyphase matrix for D4
> (which gives the inverse transform), their implementation of the
> lifting steps the original paper is incorrect. That is scary.
>
> A free PDF or PS of the paper is available here:
>
> http://cm.bell-labs.com/who/wim/papers/factor/index.html


Actually its more complicated than that. The PS and the PDF files are
different, and only the PS-file is confused. Here is the correct
explanation:

The factorization of the polyphase matrix is not unique. There are
several valid factorizations. Our implementations corresponds to
different factorizations of the analysis and synthesis poyphase
matrices, and both are in a sence correct, although numerically
different.

sturlamolden

unread,
Nov 19, 2006, 12:18:09 PM11/19/06
to

sturlamolden wrote:

> def D4_Transform(x, s1=None, d1=None, d2=None):
> """
> D4 Wavelet transform in NumPy
> (C) Sturla Molden
> """
> C1 = 1.7320508075688772
> C2 = 0.4330127018922193
> C3 = -0.066987298107780702
> C4 = 0.51763809020504137
> C5 = 1.9318516525781364
> if d1 == None:
> d1 = numpy.zeros(x.size/2)
> s1 = numpy.zeros(x.size/2)
> d2 = numpy.zeros(x.size/2)
> odd = x[1::2]
> even = x[:-1:2]
> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1] #typ0
> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1] #typo


> d2[0] = d1[0] + s1[-1]
> d2[1:] = d1[1:] + s1[:-1]
> even[:] = C4 * s1[:]
> odd[:] = C5 * d2[:]
> if x.size > 2:
> D4_Transform(even,s1[0:even.size/2],
> d1[0:even.size/2],d2[0:even.size/2])


Actually, there was a typo in the original code. I used d1[l-1] where I
should have used d1[l+1]. Arrgh. Here is the corrected version, the
Matlab code must be changed similarly. It has no relevance for the
performance timings though.


def D4_Transform(x, s1=None, d1=None, d2=None):
"""
D4 Wavelet transform in NumPy
(C) Sturla Molden
"""
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]

s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]

Filip Wasilewski

unread,
Nov 19, 2006, 3:28:56 PM11/19/06
to
sturlamolden wrote:
[...]

> Here is the correct explanation:
>
> The factorization of the polyphase matrix is not unique. There are
> several valid factorizations. Our implementations corresponds to
> different factorizations of the analysis and synthesis poyphase
> matrices, and both are in a sence correct, although numerically
> different.

Please correct me if I'm missing something but I'm pretty sure that
choice of polyphase matrix factorization for given wavelet filter bank
has influence only on speed and numerical stability of computation and
not on the result itself. Particularly it should be possible to
reconstruct analysis and synthesis filters from polyphase matrices
regardless of the chosen factorization and both the discrete wavelet
transform and the wavelet lifting scheme should give corresponding
results for chosen wavelet (one can be rewritten in the form of other).

cheers,
fw

Filip Wasilewski

unread,
Nov 19, 2006, 3:33:27 PM11/19/06
to

If you swap C4 with C5 then our solutions give identical results and
both match the classic dwt approach:

even[:] = C5 * s1[:]
odd[:] = C4 * d2[:]

> if x.size > 2:
> D4_Transform(even,s1[0:even.size/2],
> d1[0:even.size/2],d2[0:even.size/2])

cheers,
fw

Phil Schmidt

unread,
Nov 23, 2006, 8:18:02 PM11/23/06
to
Brian Blais wrote:

> So my recommendation for a (nearly) complete Matlab replacement would be:
> python
> numpy
> scipy
> matplotlib
> pyrex
>


Brian,

Thanks for that list. I'm currently in the process of getting quotes
for a bunch of Matlab tools for hardware-in-the-loop simulation. Big
bucks.

I'd love to use Python, but I'm not comfortable with the hardware side
of that. I'm certain that most, if not all data acquisition hardware
comes with DLL drivers, which I could interface with using ctypes. I'm
concerned though about spending more time messing around with the
hardware interface than it's worth.

Do you have any experience with this side of the Matlab replacement
question? How about anyone else? Any recommendations?

Thanks,
Phil

Brian Blais

unread,
Nov 25, 2006, 11:57:30 AM11/25/06
to Phil Schmidt, pytho...@python.org
Phil Schmidt wrote:
>
> I'd love to use Python, but I'm not comfortable with the hardware side
> of that. I'm certain that most, if not all data acquisition hardware
> comes with DLL drivers, which I could interface with using ctypes. I'm
> concerned though about spending more time messing around with the
> hardware interface than it's worth.
>
> Do you have any experience with this side of the Matlab replacement
> question? How about anyone else? Any recommendations?
>

Unfortunately, no I don't have experience with the hardware end of things. One can,
both with c-types and with pyrex, very easily use existing C libraries so if there is
a C-API, that shouldn't be a problem.

Sorry for not being much help!


bb

sturlamolden

unread,
Nov 25, 2006, 8:29:22 PM11/25/06
to

Phil Schmidt wrote:

> Thanks for that list. I'm currently in the process of getting quotes
> for a bunch of Matlab tools for hardware-in-the-loop simulation. Big
> bucks.

Yup, and better spent elsewhere...


> I'd love to use Python, but I'm not comfortable with the hardware side
> of that. I'm certain that most, if not all data acquisition hardware
> comes with DLL drivers, which I could interface with using ctypes. I'm
> concerned though about spending more time messing around with the
> hardware interface than it's worth.

Usually you have to mess equally much with Matlab, writing CMEX
interfaces between the DLLs and Matlab. And afterwards you get the
headache of Matlab's single-threaded environment and horrible
pass-by-value sematics.


> Do you have any experience with this side of the Matlab replacement
> question? How about anyone else? Any recommendations?

If you are afraid of doing some C coding or using ctypes, I'd say go
for LabView. When it comes to data acquisition, LabView is far superior
to Matlab. And data acquisition hardware usually comes with LabView
drivers ready to run. LabView can also compile programs that you can
run on real-time OS'es and common DSP chips, so you will not need to
port your code to these targets if you need hard real-time constraints
in your system.

First find out what drivers or APIs are supplied with the hardware.
Then do the decision of which language to use - including Python,
Matlab, LabView, Java, C# .NET, C or C++. I would in any case get a
quote for LabView as well, i does not cost you anything just to obtain
the quote. Generally, development time is far more expensive than
licenses, even with the ultra-expensive Matlab and LabView software.
Using Python just for the sake of using Python is silly.

Phil Schmidt

unread,
Nov 26, 2006, 12:23:30 AM11/26/06
to
sturlamolden wrote:

> Using Python just for the sake of using Python is silly.

Well, that kind of gets right to my point. Does the "added" effort with
Python to interface with data acquisition hardware really result in
less productivity? I am very familiar with Matlab, Labview, and Python,
and frankly, Python is the most productive and powerful programming
language of the three. But it's the hardware compatibility thing that
concerns me with Python.

It seems the answers are hard to come by on this issue. I sure would be
willing to give it a try, except that I'm getting paid to get a job
done, not to tinker around with Python and DAQ hardware. But if
tinkering around could save my project money on commercial software,
and still get it done on schedule, it would be a big win. I just don't
have confidence that it would.

Phil

Robert Kern

unread,
Nov 26, 2006, 12:52:19 AM11/26/06
to pytho...@python.org
Phil Schmidt wrote:
> sturlamolden wrote:
>
>> Using Python just for the sake of using Python is silly.
>
> Well, that kind of gets right to my point. Does the "added" effort with
> Python to interface with data acquisition hardware really result in
> less productivity? I am very familiar with Matlab, Labview, and Python,
> and frankly, Python is the most productive and powerful programming
> language of the three. But it's the hardware compatibility thing that
> concerns me with Python.

Without knowing the hardware that you want to use, it's quite difficult for us
to tell you anything useful. Most likely, you're simply going to have to sit
down with a list of the hardware that you want to use, the existing solutions
for interfacing with that hardware (e.g. already written MATLAB, LabView, or
Python extension modules for that hardware), and their costs. You're going to
have to weigh those costs against the cost of using ctypes or something similar
to wrap the raw DLLs if a Python extension module isn't available.

We might be able to help you locate the already-existing Python wrappers for
your hardware, but the rest are judgement calls that only you can make. To make
an accurate judgement, you will probably need to have a little bit of experience
trying to wrap one such raw DLL. If you can't afford the time to do that one
experiment, then you probably have your answer.

sturlamolden

unread,
Nov 26, 2006, 12:43:48 PM11/26/06
to
Phil Schmidt wrote:

> Well, that kind of gets right to my point. Does the "added" effort with
> Python to interface with data acquisition hardware really result in
> less productivity? I am very familiar with Matlab, Labview, and Python,
> and frankly, Python is the most productive and powerful programming
> language of the three. But it's the hardware compatibility thing that
> concerns me with Python.

This is really a question that has no general answer. It depends on the
specific hardware and the solutions available.

Generally, Matlab and Python sucks equally much when it comes to
interfacing with hardware. Both tycan use C extensions (which you may
have to write) or ctypes (Matlab's equivalent is the loadlibrary/callib
functions). If only a C SDK is available, SWIG can autogenerate Python
wrappers (require some tweking). Matlab's loadlibrary can sometimes
parse C header files, but it is not always successful.

Both Matlab and Python can use COM/ActiveX objects on Windows. In
addition, Matlab can use Java classes, with which Python does not
interface easily. On the other hand, Python can use .NET objects, if
you use Microsoft's IronPython or "Python for .NET" with classic
Python. Last time I checked, Matlab could not use .NET objects, but
that was a year ago.

Most data acquisition hardware are supplied with SDKs for C, and
perhaps COM/ActiveX objects for RAD tools like Visual Basic and Delphi.
Modern hardware is often supplied with .NET SDKs. LabView drivers are
also commonly supplied. If none are supplied by the vendor, there are
often prewritten solutions for Matlab, Python or Java available on the
Internet, but you will have to search for them, or write your own.

Thus, it is not generally true that it is easier to interface hardware
with Matlab than Python. It depens on what SDKs you get.

On the programming side, Python is a general purpose language, Matlab
is designed for working with matrices easily. Array and matrix
computation in Python require NumPy, which has recently been declared
stable. Matlab has been stable for quite a while. Python is
multi-threaded, Matlab is single-threaded. Matlab uses pass-by-value
even for the largest of arrays, Python always pass references.

With Python, you can program with multiple threads; with Matlab, you
have everything in one thread. As Matlab only supports synchronous i/o
out of the box, you cannot do anything else while Matlab's fwrite or
fread functions are 'blocking' (i.e. waiting for data from hardware or
writing to disk). This is Matlab's major disadvantage for data
acquisition applications. Thus to make your data acquisition run
smoothly and efficiently with Matlab, you must resort to writing CMEX
extensions for performing asynchronous I/O or spawn off threads. Have
you ever looked at overlapped i/o in the Windows API? I can guarantee
that it is hundred times more complex than anything a hardware vendor
can throw at you. Alternatively, you can code the i/o in Java, and make
have Matlab call Java for i/o tasks. With Python, this is a non-issue.
You can put blocking i/o in separate threads or use prewritten APIs for
non-blocking i/o. It is obvious which is the better option here.
Generally, non-blocking i/o and multithreading is essential for any
serious data acqusition application, and Matlab does not support any of
these.

Both Matlab and Python is interpreted languages. Matlab has a
rudimentary JIT compiler, whilst classic Python is interpreted only.
Microsoft's IronPython uses the .NET JIT compiler. Jython uses Java's
JIT compiler (run the Server JVM or IBM's JDK for best performance,
don't use the client JVM with hotspot). However, if you use IronPython
or Jython, NumPy is out of the question. Thus, Matlab has perhaps a
slight advantage here.

With Python you can inline C in your Python code (e.g. using SciPy and
Weave), and even inline assembly in the inlined C. Thus you can easily
deal with hotspots in your Python code. You cannot do this easily with
Matlab.

For these reasons, my general view is that Python is better for data
acquisition than Matlab. But using Python for the sake of using Python
is nevertheless silly. The same holds true for Matlab.

A data acquisition program ofte require fast graphics as well. For slow
2D plots, Python and Matlab perform equally well (Python requires
Matplotlib). Matlab can draw 3D graphs, but not quickly. Python require
VTK for this, which may be harder to use. For fast 2D or 3D graphics,
i.e. DirectDraw, Direct3D or OpenGL, Python is the only game in town.
DirectDraw is used by SDL, which in turn is used by PyGame. Direct3D is
available though DirectPython (or COM objects or .NET objects). OpenGL
is available through PyOpenGL, which has recently been redesigned to
use ctypes; previously the build process for PyOpenGL used to be a
nightmare. For GUI, Matlab only has rudementary capabilities. Python
has several high quality GUI libraries (not counting tkinter). You can
create multimedia applications with PyGame, you cannot do this easily
with Matlab. E.g. do you want to play sound while you record?


> It seems the answers are hard to come by on this issue. I sure would be
> willing to give it a try, except that I'm getting paid to get a job
> done, not to tinker around with Python and DAQ hardware. But if
> tinkering around could save my project money on commercial software,
> and still get it done on schedule, it would be a big win. I just don't
> have confidence that it would.

That you will have to figure out your self. The answer is not obvious.
But for GUI, forget about tkinter.

Rob Purser

unread,
Nov 27, 2006, 10:16:54 AM11/27/06
to
Hi all,

I'm not going to touch the big picture issues here -- you need to pick the
right tool for the job you're doing, and only you know what works best for
your task. However, since it didn't come up, I feel I need to add a piece
of info to the mix, since I spend my days getting MATLAB working with data
acquisition hardware. To be clear, I'm the development lead for Data
Acquisition Toolbox, which is MathWorks' add on product to MATLAB and
Simulink to handle all the issues associated with interfacing data
acquisition hardware without all the CMEX work, threading, DLLs, etc. As
Sturla points out, that's not trivial work in any language.

Anyway, I just wanted to call your attention to Data Acquisition Toolbox:
http://www.mathworks.com/products/daq/

I'm not sure what hardware you're using, Phil, so I can't say whether we
support it. We do add support for new hardware all the time, so I'd be
interested in hearing about your application.

All the best,
-Rob

--
Rob Purser
Senior Team Lead, Connectivity Products
The MathWorks
rob.p...@mathworks.com

"sturlamolden" <sturla...@yahoo.no> wrote in message
news:1164504562.8...@l39g2000cwd.googlegroups.com...

sturlamolden

unread,
Nov 27, 2006, 7:12:17 PM11/27/06
to

Rob Purser wrote:

> Anyway, I just wanted to call your attention to Data Acquisition Toolbox:
> http://www.mathworks.com/products/daq/

Absolutely. If the hardware is supported by this toolbox, there is no
need to reinvent the wheel. The license is expensive, but development
time can be far more expensive. Matlab is an excellent tool for
numerical work.

However, Matlab is only single-threaded and does not optimize its
pass-by-value semantics with proper 'escape analysis'. (Yes, Matlab
does lazy copy-on-write to optimize function calls, but functions have
return values as well.) I have (temporarily?) stopped using Matlab
because I constantly got annoyed by the limitations of the language. I
still use Matlab as an advanced desktop calculator though. Somehow I
find it easier to type in an expression in Matlab than NumPy, probably
because I am more accustomed to the syntax.

Message has been deleted

aap

unread,
Nov 29, 2006, 6:52:05 PM11/29/06
to

John Henry wrote:
> Bill Gates will have you jailed! :-)
>
> On a more serious note, is there any alternative to Simulink though?
>
It's called SciCos, and as far as I've seen it not only covers Simulink
but also PowerSim.

I found only 1 major disadvantage inSciLab, ...
... it has no ActiveX

cheers,
Stef Mientki

Bill Maxwell

unread,
Nov 29, 2006, 11:44:45 PM11/29/06
to
On 16 Nov 2006 13:09:03 -0800, "sturlamolden" <sturla...@yahoo.no>
wrote:

...SNIP...

>To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
>transform. I have here coded it in Matlab and Python/NumPy using Tim
>Swelden's lifting scheme.
>

>First the Matlab version (D4_Transform.m):
>
>function x = D4_Transform(x)
> % D4 Wavelet transform in Matlab

> % (C) Sturla Molden


> C1 = 1.7320508075688772;
> C2 = 0.4330127018922193;
> C3 = -0.066987298107780702;
> C4 = 0.51763809020504137;

> C5 = 1.9318516525781364;
> s1 = zeros(ceil(size(x)/2));
> d1 = zeros(ceil(size(x)/2));
> d2 = zeros(ceil(size(x)/2));
> odd = x(2:2:end);
> even = x(1:2:end-1);
> d1(:) = odd - C2*even;
> s1(1) = even(1) + C2*d1(1) + C3*d1(end);
> s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);
> d2(1) = d1(1) + s1(end);
> d2(2:end) = d1(2:end) + s1(1:end-1);
> x(1:2:end-1) = C4*s1;
> x(2:2:end) = C5*d2;
> if (length(x) >2)
> x(1:2:end-1) = D4_Transform(x(1:2:end-1));
> end
>
>
>Then the Python version (D4.py):
>
>import numpy
>import time
>

>def D4_Transform(x, s1=None, d1=None, d2=None):
> """
> D4 Wavelet transform in NumPy
> (C) Sturla Molden
> """
> C1 = 1.7320508075688772
> C2 = 0.4330127018922193
> C3 = -0.066987298107780702
> C4 = 0.51763809020504137
> C5 = 1.9318516525781364
> if d1 == None:
> d1 = numpy.zeros(x.size/2)
> s1 = numpy.zeros(x.size/2)
> d2 = numpy.zeros(x.size/2)
> odd = x[1::2]
> even = x[:-1:2]
> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

> d2[0] = d1[0] + s1[-1]
> d2[1:] = d1[1:] + s1[:-1]
> even[:] = C4 * s1[:]
> odd[:] = C5 * d2[:]

> if x.size > 2:
>


>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])
>
>if __name__ == "__main__":
> x = numpy.random.rand(2**23)
> t0 = time.clock()
> D4_Transform(x)
> t = time.clock()
> print "Elapsed time is %.6f seconds" % (t-t0)
>

...SNIP...


I couldn't help but notice that the following two lines from the above
code are doing different things, but I think they are supposed to do the
same things (the first is the Matlab version, the 2nd is the Python
version):


> d1(:) = odd - C2*even;

> d1[:] = odd[:] - C1*even[:]


I don't understand what this code is doing, so I don't know which is
wrong. I don't even know if this is the actual code you are using. Just
thought I'd mention it.


Bill


Jon Harrop

unread,
Dec 2, 2006, 2:44:07 AM12/2/06
to

I don't know Python but this benchmark caught my eye.

>>def D4_Transform(x, s1=None, d1=None, d2=None):
>> """
>> D4 Wavelet transform in NumPy
>> (C) Sturla Molden
>> """
>> C1 = 1.7320508075688772
>> C2 = 0.4330127018922193
>> C3 = -0.066987298107780702
>> C4 = 0.51763809020504137
>> C5 = 1.9318516525781364
>> if d1 == None:
>> d1 = numpy.zeros(x.size/2)
>> s1 = numpy.zeros(x.size/2)
>> d2 = numpy.zeros(x.size/2)

Are these definitions ever used? It looks like s1, d1 and d2 are all
redefined below without reference to these previous values.

>> odd = x[1::2]
>> even = x[:-1:2]
>> d1[:] = odd[:] - C1*even[:]
>> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
>> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
>> d2[0] = d1[0] + s1[-1]
>> d2[1:] = d1[1:] + s1[:-1]
>> even[:] = C4 * s1[:]
>> odd[:] = C5 * d2[:]

Does that line create an array that is never used? If so, is C5 also never
used?

>> if x.size > 2:
>>
>>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

What is the result of this function?

I'm interested in translating this function into other languages, like
OCaml, to see how good Python's performance is (I am amazed it can beat
Matlab, not that I've used Matlab). In particular, I think you are eagerly
allocating arrays when, in a functional language, you could just as easily
compose closures.

For example, this program is 3x faster than the Python on my machine:

let rec d4 s1 d1 d2 x =
let c1 = 1.7320508075688772 in
let c2 = 0.4330127018922193 in
let c3 = -0.066987298107780702 in
let c4 = 0.51763809020504137 in
let c5 = 1.9318516525781364 in
let n = Array.length x in
let odd i = x.(2*i) and even i = x.(2*i + 1) in
let d1 i = odd i -. c1 *. even i in
let f = function -1 -> n/2 - 1 | i -> i in
let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in
let d2 i = d1 i +. s1 (f(i-1)) in
let even = Array.init (n/2) (fun i -> c4 *. s1 i) in
if n > 2 then d4 s1 d1 d2 even else s1, d1, d2

but I'm not sure it is correct!

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Carl Banks

unread,
Dec 2, 2006, 12:20:56 PM12/2/06
to
Jon Harrop wrote:
> I don't know Python but this benchmark caught my eye.
>
> >>def D4_Transform(x, s1=None, d1=None, d2=None):
> >> """
> >> D4 Wavelet transform in NumPy
> >> (C) Sturla Molden
> >> """
> >> C1 = 1.7320508075688772
> >> C2 = 0.4330127018922193
> >> C3 = -0.066987298107780702
> >> C4 = 0.51763809020504137
> >> C5 = 1.9318516525781364
> >> if d1 == None:
> >> d1 = numpy.zeros(x.size/2)
> >> s1 = numpy.zeros(x.size/2)
> >> d2 = numpy.zeros(x.size/2)
>
> Are these definitions ever used? It looks like s1, d1 and d2 are all
> redefined below without reference to these previous values.

No, they're never redefined (even in the recursive version). Slices of
them are reassigned. That's a big difference.

> >> odd = x[1::2]
> >> even = x[:-1:2]
> >> d1[:] = odd[:] - C1*even[:]
> >> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
> >> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
> >> d2[0] = d1[0] + s1[-1]
> >> d2[1:] = d1[1:] + s1[:-1]
> >> even[:] = C4 * s1[:]
> >> odd[:] = C5 * d2[:]
>
> Does that line create an array that is never used? If so, is C5 also never
> used?

No it doesn't create a new array. But it does have an effect.
Explained below.

(Actually, it does create a temporary array to store the intermediate
result, but that array does not get bound to odd.)


> >> if x.size > 2:
> >>
> >>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])
>
> What is the result of this function?

It modifies the array even in place. This also has an effect.


> I'm interested in translating this function into other languages, like
> OCaml, to see how good Python's performance is (I am amazed it can beat
> Matlab, not that I've used Matlab).

Matlab has a few *cough* limitations when it comes to hand-optimizing.
When writing naive code, Matlab often is faster than Python with numpy
because it has many commerical man-year of optimizing behind it.
However, Matlab helps v


> In particular, I think you are eagerly
> allocating arrays when, in a functional language, you could just as easily
> compose closures.

You are completely wrong.

What you seem to be missing is that slice assignment shares data. For
example, the statement "odd = x[1::2]" does NOT create a new array. It
takes a slice of x's data (in this case, the odd elements of x: slices
need not be contiguous), and binds it to the symbol odd. The array
elements are shared; changing elements in odd also changes elements of
x.

So later on, the statement "odd[:] = C5*d2[:]" reassigns the elements
of odd, which, because the array elements are shared, also reassigns
the odd elements in x.

(Notice the difference between "odd[:]=" and "odd=". "odd[:]=" is
slice assignment; it changes the elements in the given slice of odd, in
this case the whole array. "odd=" rebinds the symbol odd to a
different object, which could be a new array, a slice of an existing
array, or any other Python object.)


This data sharing more or less accomplishes the same thing that the
"closures" you speak of accomplish (in this case), only without the
functional.


> For example, this program is 3x faster than the Python on my machine:
>
> let rec d4 s1 d1 d2 x =
> let c1 = 1.7320508075688772 in
> let c2 = 0.4330127018922193 in
> let c3 = -0.066987298107780702 in
> let c4 = 0.51763809020504137 in
> let c5 = 1.9318516525781364 in
> let n = Array.length x in
> let odd i = x.(2*i) and even i = x.(2*i + 1) in
> let d1 i = odd i -. c1 *. even i in
> let f = function -1 -> n/2 - 1 | i -> i in
> let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in
> let d2 i = d1 i +. s1 (f(i-1)) in
> let even = Array.init (n/2) (fun i -> c4 *. s1 i) in
> if n > 2 then d4 s1 d1 d2 even else s1, d1, d2
>
> but I'm not sure it is correct!

It's not correct, but what you left out is probably low cost.

OCaml is compiled to machine code, right? And types can be inferred at
compile time, correct? Well then of course it's faster. It seems to
me a big help is the ability to fold multiple array operations into a
single loop, which is optimization a dynamically-typed language like
Python can't easily make. (It'd require are really smart JIT compiler
or some concessions in dynamicity.)


Carl Banks

Carl Banks

unread,
Dec 2, 2006, 12:28:52 PM12/2/06
to

Carl Banks wrote:
> Matlab has a few *cough* limitations when it comes to hand-optimizing.
> When writing naive code, Matlab often is faster than Python with numpy
> because it has many commerical man-year of optimizing behind it.
> However, Matlab helps v

That should say:

However, Matlab helps very little when you need to speed things up, or
(especially) use less memory than the naive way.


Carl Banks

Jon Harrop

unread,
Dec 3, 2006, 12:00:12 AM12/3/06
to
Carl Banks wrote:
> No, they're never redefined (even in the recursive version). Slices of
> them are reassigned. That's a big difference.

I see.

> (Actually, it does create a temporary array to store the intermediate
> result, but that array does not get bound to odd.)

Sure.

>> In particular, I think you are eagerly
>> allocating arrays when, in a functional language, you could just as
>> easily compose closures.
>
> You are completely wrong.

I'll give an example. If you write the Python:

a[:] = b[:] + c[:] + d[:]

I think that is equivalent to the ML:

fill a (map2 ( + ) (map2 ( + ) b c) d)

which can be deforested in ML to avoid the creation of the intermediate
result b[:] + c[:] by using a closure to add three values at once:

fill a (map3 (fun b c d -> b + c + d) b c d)

which will be much faster because it doesn't generate an intermediate array.

> This data sharing more or less accomplishes the same thing that the
> "closures" you speak of accomplish (in this case), only without the
> functional.

The closure avoided the intermediate result. You said that the Python isn't
doing that?

> It's not correct, but what you left out is probably low cost.
>
> OCaml is compiled to machine code, right?

Yes.

> And types can be inferred at compile time, correct?

Types are completely inferred at compile time.

> Well then of course it's faster.

Yes. So this doesn't seem to be a killer example of numpy, although I am
still amazed that it can outperform Matlab.

> It seems to
> me a big help is the ability to fold multiple array operations into a
> single loop, which is optimization a dynamically-typed language like
> Python can't easily make. (It'd require are really smart JIT compiler
> or some concessions in dynamicity.)

Writing a JIT to compile this kind of stuff is easy. My point is that this
is fundamentally bad code, so why bother trying to write a Python JIT? Why
not just write in a better language for this task? Optimising within a
fundamentally slow language seems silly to me.

Carl Banks

unread,
Dec 3, 2006, 10:58:52 AM12/3/06
to
Jon Harrop wrote:
> >> In particular, I think you are eagerly
> >> allocating arrays when, in a functional language, you could just as
> >> easily compose closures.
> >
> > You are completely wrong.
>
> I'll give an example. If you write the Python:
>
> a[:] = b[:] + c[:] + d[:]
>
> I think that is equivalent to the ML:
>
> fill a (map2 ( + ) (map2 ( + ) b c) d)
>
> which can be deforested in ML to avoid the creation of the intermediate
> result b[:] + c[:] by using a closure to add three values at once:
>
> fill a (map3 (fun b c d -> b + c + d) b c d)
>
> which will be much faster because it doesn't generate an intermediate array.

Ah, but, this wasn't about temporaries when you spoke of "eagerly
allocating arrays", was it? But yes, you're right, in this example
temporary arrays are created; arrays that Ocaml would not need. (I
don't exactly understand why you'd need a functional language to get
this optimization, though.)

You might have guessed that you can get rid of most temporary arrays,
at the expense of readability, with numpy. For example, replacing

d1[:] = odd[:] - C1*even[:]

with

numpy.multiply(-C1,even,d1)
numpy.add(d1,odd,d1)

eliminates the intermediate. No unnecessary array allocations; no
closures necessary.


I'm curious whether this shared-slicing isn't, in some ways,
advantageous over the Ocaml way. I presume that in Ocaml, the way
you'd "share" array data is to create a closure can apply some
operation to selected elements of the array, returning the result. But
could this as easily modify the array in-place? I presume a good
compiler could be able to inline the function and optimize the call to
an in-place operation, but I'm not sure how well this works in
practice.

Here's a version of D4_Transform that uses no temporary arrays (aside
from the work arrays s1, d1, and d2, which are allocated only once).
It's about 40% faster for me than the one with infix operations. I'd
be curious how it compares to a correct Ocaml version. (I'd still
expect Ocaml to be at least twice as fast.)

def alt_D4_Transform(x, s1=None, d1=None, d2=None):
add = numpy.add
multiply = numpy.multiply


C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:

d1 = numpy.zeros(x.size/2,numpy.Float)
s1 = numpy.zeros(x.size/2,numpy.Float)
d2 = numpy.zeros(x.size/2,numpy.Float)


odd = x[1::2]
even = x[:-1:2]

multiply(-C1,even,d1)
add(d1,odd,d1)
multiply(C2,d1,s1)
add(s1,even,s1)
d2[0] = C3 * d1[-1]
multiply(C3,d1[:-1],d2[1:])
add(s1,d2,s1)


d2[0] = d1[0] + s1[-1]

add(d1[1:],s1[:-1],d2[1:])
multiply(C4,s1,even)
multiply(C5,d2,odd)
if x.size > 2:

alt_D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])


> > It seems to
> > me a big help is the ability to fold multiple array operations into a
> > single loop, which is optimization a dynamically-typed language like
> > Python can't easily make. (It'd require are really smart JIT compiler
> > or some concessions in dynamicity.)
>
> Writing a JIT to compile this kind of stuff is easy.

Eh, getting a JIT to do the optimization I spoke of in Python is not
easy. It would be relatively easy in a statically typed language. In
Python, it'd be tantamount to rewriting a dynamically reconfigurable
version of numpy--you'd have to duplicate numpy's complex
type-dispatching rules.

> My point is that this
> is fundamentally bad code,

Whoa, there. I realize that for people who prefer functional
programming, with their recursively reductionist way of thinking, it
might seem as if leaving any sort of reducibility in final result is
"fundamentally bad", but that's a pretty strong thing to say.

> so why bother trying to write a Python JIT? Why
> not just write in a better language for this task? Optimising within a
> fundamentally slow language seems silly to me.

Because, for most people, language choice is not a greedy maximizing of
a single issue. Nobody who uses numpy is under the impression that it
can match a statically-typed language that is compiled to machine code
in peak performance. (Matlab is fair game, of course.) But speed is
not the only important thing.

I use Python because it's a excellently designed language that fits my
overall needs, and numpy because sometimes I need more speed than
vanilla Python. Only when speed is critical (for example, 3D collision
detection) do I write an extension in a "better language for the task"
(i.e. C).

This is something that's quite popular in the numerically-intensive
computing community, BTW. Many people use Python to handle boring
stuff like file I/O, memory managment, and non-critical numerical
calculations, and write C or Fortran extensions to do the
numerically-intensive stuff.

Carl Banks

Jon Harrop

unread,
Dec 3, 2006, 12:01:27 PM12/3/06
to
Carl Banks wrote:
>> fill a (map3 (fun b c d -> b + c + d) b c d)
>>
>> which will be much faster because it doesn't generate an intermediate
>> array.
>
> Ah, but, this wasn't about temporaries when you spoke of "eagerly
> allocating arrays", was it?

I had thought that all of the array operations were allocating new arrays at
first but it seems that at least assignment to a slice does not.

Does:

a[:] = b[:] + c[:]

allocate a temporary for b[:] + c[:]?

> But yes, you're right, in this example
> temporary arrays are created; arrays that Ocaml would not need. (I
> don't exactly understand why you'd need a functional language to get
> this optimization, though.)

I thought that functional programming could get you both the brevity of
Python's slicing and the performance of C's compilation but I was wrong.
You can get the brevity of the Python approach in C.

> You might have guessed that you can get rid of most temporary arrays,
> at the expense of readability, with numpy. For example, replacing
>
> d1[:] = odd[:] - C1*even[:]
>
> with
>
> numpy.multiply(-C1,even,d1)
> numpy.add(d1,odd,d1)
>
> eliminates the intermediate. No unnecessary array allocations; no
> closures necessary.

Right. Performance will probably go from 5x slower to 2x slower because
you're traversing the arrays twice instead of once.

> I'm curious whether this shared-slicing isn't, in some ways,
> advantageous over the Ocaml way.

That's exactly what I thought to start with. The author of F# is working on
getting slicing into his language but the use of slicing in this Python
benchmark corrupted my fragile little mind. Slicing is a terrible way to
approach this problem if you're using a compiled language like F#.

I first wrote an OCaml translation of the Python and wrote my own
little "slice" implementation. I have since looked up a C++ solution and
translated that into OCaml instead:

let rec d4_aux a n =
let n2 = n lsr 1 in
let tmp = Array.make n 0. in
for i=0 to n2-2 do
tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
done;
tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
Array.blit tmp 0 a 0 n;
if n > 4 then d4_aux a (n lsr 1)

let d4 a = d4_aux a (Array.length a)

Not only is that shorter than the Python, it is much faster:

0.56s C++ (direct arrays)
0.61s F# (direct arrays)
0.62s OCaml (direct arrays)
1.38s OCaml (slices)
2.38s Python (slices)
10s Mathematica 5.1

Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
a[i]).

> I presume that in Ocaml, the way
> you'd "share" array data is to create a closure can apply some
> operation to selected elements of the array, returning the result.

Yes. That is certainly one way of doing it.

> But could this as easily modify the array in-place?

Absolutely. A closure would capture a reference to the array. Arrays are
mutable so you could alter the array from within the closure. In F# you
could even execute the closures concurrently to alter different parts of
the same array at the same time. I just tried that and it is actually
slower to multithread this.

> I presume a good
> compiler could be able to inline the function and optimize the call to
> an in-place operation, but I'm not sure how well this works in
> practice.

Surprisingly well it seems: F# and OCaml are almost as fast as C/C++!

> Here's a version of D4_Transform that uses no temporary arrays (aside
> from the work arrays s1, d1, and d2, which are allocated only once).
> It's about 40% faster for me than the one with infix operations. I'd
> be curious how it compares to a correct Ocaml version. (I'd still
> expect Ocaml to be at least twice as fast.)

I get:

1.57s Python (in-place)

>> > It seems to
>> > me a big help is the ability to fold multiple array operations into a
>> > single loop, which is optimization a dynamically-typed language like
>> > Python can't easily make. (It'd require are really smart JIT compiler
>> > or some concessions in dynamicity.)
>>
>> Writing a JIT to compile this kind of stuff is easy.
>
> Eh, getting a JIT to do the optimization I spoke of in Python is not
> easy. It would be relatively easy in a statically typed language. In
> Python, it'd be tantamount to rewriting a dynamically reconfigurable
> version of numpy--you'd have to duplicate numpy's complex
> type-dispatching rules.

There aren't any complicated types in the above code. In fact, there are
only two types: float and float array. Type checking is easy in this case,
compilation to C is also easy, then you just dispatch to the compiled C
code when the types are ok. You would want to write your Python code like C
code but that is shorter in this case. You may also want to flag code for
compilation manually in order to avoid the overhead of compiling code
unnecessarily.

>> My point is that this is fundamentally bad code,
>
> Whoa, there. I realize that for people who prefer functional
> programming, with their recursively reductionist way of thinking, it
> might seem as if leaving any sort of reducibility in final result is
> "fundamentally bad", but that's a pretty strong thing to say.

I was referring to the slicing specifically, nothing to do with functional
programming. My F# and OCaml code are now basically identical to the C++
code.

>> so why bother trying to write a Python JIT? Why
>> not just write in a better language for this task? Optimising within a
>> fundamentally slow language seems silly to me.
>
> Because, for most people, language choice is not a greedy maximizing of
> a single issue. Nobody who uses numpy is under the impression that it
> can match a statically-typed language that is compiled to machine code
> in peak performance. (Matlab is fair game, of course.) But speed is
> not the only important thing.

In this specific context (discrete wavelet transform benchmark), I'd have
said that speed was the most important thing after correctness.

> I use Python because it's a excellently designed language that fits my
> overall needs, and numpy because sometimes I need more speed than
> vanilla Python. Only when speed is critical (for example, 3D collision
> detection) do I write an extension in a "better language for the task"
> (i.e. C).

Have you looked at languages like F#, OCaml, Haskell, SML and so on? They
seem to offer the benefits of both worlds.

> This is something that's quite popular in the numerically-intensive
> computing community, BTW. Many people use Python to handle boring
> stuff like file I/O, memory managment, and non-critical numerical
> calculations, and write C or Fortran extensions to do the
> numerically-intensive stuff.

Yes, I appreciate that Python is hugely popular, I just don't understand why
it is quite so popular.

I found a series of lectures on VPython, showing some amazing stuff:

http://showmedo.com/videos/series?name=pythonThompsonVPythonSeries

I just found some more interesting stuff here:

http://www.phy.syr.edu/~salgado/software/vpython/

This is really great work but I can't help but wonder why the authors chose
to use Python when other languages seem better suited. I'd like to work on
raising people's awareness of these alternatives, and probably create some
useful tools in the process. So I'm keen to learn what Python programmers
would want/expect from F# and OCaml.

What would it take to make you convert?

Carl Banks

unread,
Dec 3, 2006, 1:05:17 PM12/3/06
to
Jon Harrop wrote:
> I had thought that all of the array operations were allocating new arrays at
> first but it seems that at least assignment to a slice does not.
>
> Does:
>
> a[:] = b[:] + c[:]
>
> allocate a temporary for b[:] + c[:]?

Yep.

[snip]


> Not only is that shorter than the Python, it is much faster:
>
> 0.56s C++ (direct arrays)
> 0.61s F# (direct arrays)
> 0.62s OCaml (direct arrays)
> 1.38s OCaml (slices)
> 2.38s Python (slices)
> 10s Mathematica 5.1

[snip]
> 1.57s Python (in-place)

So,
optimized Python is roughly the same speed as naive Ocaml
optimized Ocaml is roughly the same speed as C++


> >> > It seems to
> >> > me a big help is the ability to fold multiple array operations into a
> >> > single loop, which is optimization a dynamically-typed language like
> >> > Python can't easily make. (It'd require are really smart JIT compiler
> >> > or some concessions in dynamicity.)
> >>
> >> Writing a JIT to compile this kind of stuff is easy.
> >
> > Eh, getting a JIT to do the optimization I spoke of in Python is not
> > easy. It would be relatively easy in a statically typed language. In
> > Python, it'd be tantamount to rewriting a dynamically reconfigurable
> > version of numpy--you'd have to duplicate numpy's complex
> > type-dispatching rules.
>
> There aren't any complicated types in the above code. In fact, there are
> only two types: float and float array.

You're vastly underestimating the complexity of numpy objects. They
have an awful lot going on under the covers to make it look simple on
the surface. There's all kinds of type-checking and type-conversions.
A JIT that folds loops together would have to have knowledge of that
process, and it's a lot of knowledge to have. That is not easy.

> Type checking is easy in this case,
> compilation to C is also easy, then you just dispatch to the compiled C
> code when the types are ok. You would want to write your Python code like C
> code but that is shorter in this case. You may also want to flag code for
> compilation manually in order to avoid the overhead of compiling code
> unnecessarily.
>
> >> My point is that this is fundamentally bad code,
> >
> > Whoa, there. I realize that for people who prefer functional
> > programming, with their recursively reductionist way of thinking, it
> > might seem as if leaving any sort of reducibility in final result is
> > "fundamentally bad", but that's a pretty strong thing to say.
>
> I was referring to the slicing specifically, nothing to do with functional
> programming. My F# and OCaml code are now basically identical to the C++
> code.

It is pretty strong thing to say that anything is fundamentally bad
just because it's not fast as something else. Fundamental badness
ought to run deeper than some superficial, linear measure.

> >> so why bother trying to write a Python JIT? Why
> >> not just write in a better language for this task? Optimising within a
> >> fundamentally slow language seems silly to me.
> >
> > Because, for most people, language choice is not a greedy maximizing of
> > a single issue. Nobody who uses numpy is under the impression that it
> > can match a statically-typed language that is compiled to machine code
> > in peak performance. (Matlab is fair game, of course.) But speed is
> > not the only important thing.
>
> In this specific context (discrete wavelet transform benchmark), I'd have
> said that speed was the most important thing after correctness.

So let's say I'm planning to write this complicated graphical program,
that does all sorts of stuff. I/O, networking, database, etc. It's a
medium throughput program, and a slow language like Python is
sufficient to run it, except for this one tiny part where I have to
transform of wave data.

Now, I was going to write the program in Python, and use the very
convenient numpy to do the transformation part. No problem, numpy
will be fast enough for my needs.

But wait! It's silly try to optimize code in a fundamentally slow
language! It's vanity to even try! I guess for the sake of getting
this little D4 transform to double in speed I'll scrap Python and write
the whole thing in OCaml.

(Or I can begrudingly admit that it's not really silly to try to
optimize a slow language.)


[snip]


> This is really great work but I can't help but wonder why the authors chose
> to use Python when other languages seem better suited. I'd like to work on
> raising people's awareness of these alternatives, and probably create some
> useful tools in the process. So I'm keen to learn what Python programmers
> would want/expect from F# and OCaml.
>
> What would it take to make you convert?

Them not being functional would be a good start....


Carl Banks

Jon Harrop

unread,
Dec 3, 2006, 2:43:05 PM12/3/06
to
Carl Banks wrote:
>> 0.56s C++ (direct arrays)
>> 0.61s F# (direct arrays)
>> 0.62s OCaml (direct arrays)
>> 1.38s OCaml (slices)
>> 2.38s Python (slices)
>> 10s Mathematica 5.1
> [snip]
>> 1.57s Python (in-place)
>
> So,
> optimized Python is roughly the same speed as naive Ocaml
> optimized Ocaml is roughly the same speed as C++

Absolutely not:

Optimized Python is 14% slower than badly written OCaml. Given the problem,
rather than the Python solution, nobody would write OCaml code like that.

Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest
Python so far whilst also requiring about half as much code.

Optimising the C++ by hoisting the O(log n) temporary array allocations into
one allocation makes it another 20% faster. I'm sure there are plenty more
optimisations...

>> There aren't any complicated types in the above code. In fact, there are
>> only two types: float and float array.
>
> You're vastly underestimating the complexity of numpy objects. They
> have an awful lot going on under the covers to make it look simple on
> the surface. There's all kinds of type-checking and type-conversions.
> A JIT that folds loops together would have to have knowledge of that
> process, and it's a lot of knowledge to have. That is not easy.

My suggestion doesn't really have anything to do with numpy. If you had such
a JIT you wouldn't use numpy in this case.

That's my point, using numpy encouraged the programmer to optimise in the
wrong direction in this case (to use slices instead of element-wise
operations).

>> I was referring to the slicing specifically, nothing to do with
>> functional programming. My F# and OCaml code are now basically identical
>> to the C++ code.
>
> It is pretty strong thing to say that anything is fundamentally bad
> just because it's not fast as something else. Fundamental badness
> ought to run deeper than some superficial, linear measure.

The slice based approach is not only slower, it is longer, more obfuscated
and more difficult to optimise. That doesn't just apply to Python and
numpy, it also applies to Matlab, Mathematica etc.

Which begs the question, if you were writing in a compiled language like F#
would you ever use slices?

> Now, I was going to write the program in Python, and use the very
> convenient numpy to do the transformation part.

What makes numpy convenient?

> No problem, numpy will be fast enough for my needs.

Ok. But you'd rather have a compiler?

> But wait! It's silly try to optimize code in a fundamentally slow
> language! It's vanity to even try! I guess for the sake of getting
> this little D4 transform to double in speed

C++ is currently 3.4x faster than Python.

> I'll scrap Python and write the whole thing in OCaml.

Why not drop to C for this one function?

> [snip]
>> This is really great work but I can't help but wonder why the authors
>> chose to use Python when other languages seem better suited. I'd like to
>> work on raising people's awareness of these alternatives, and probably
>> create some useful tools in the process. So I'm keen to learn what Python
>> programmers would want/expect from F# and OCaml.
>>
>> What would it take to make you convert?
>
> Them not being functional would be a good start....

They don't impose functional programming on you.

Carl Banks

unread,
Dec 3, 2006, 5:12:50 PM12/3/06
to
Jon Harrop wrote:
> Carl Banks wrote:
> >> 0.56s C++ (direct arrays)
> >> 0.61s F# (direct arrays)
> >> 0.62s OCaml (direct arrays)
> >> 1.38s OCaml (slices)
> >> 2.38s Python (slices)
> >> 10s Mathematica 5.1
> > [snip]
> >> 1.57s Python (in-place)
> >
> > So,
> > optimized Python is roughly the same speed as naive Ocaml
> > optimized Ocaml is roughly the same speed as C++
>
> Absolutely not:
>
> Optimized Python is 14% slower than badly written OCaml.

I'd call that roughly the same speed. Did you use any sort of
benchmark suite that miminized testing error, or did you just run it
surrounded by calls to the system timer like I did? If the latter,
then it's poor benchmark, and 10% accuracy is better than you can
expect. (Naive tests like that tend to favor machine code.)

> Given the problem,
> rather than the Python solution, nobody would write OCaml code like that.
>
> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest
> Python so far whilst also requiring about half as much code.

Frankly, I have a hard time believing anyone would naively write the
Ocaml code like the optimized version you posted. You'd know better
than I, though.

> Optimising the C++ by hoisting the O(log n) temporary array allocations into
> one allocation makes it another 20% faster. I'm sure there are plenty more
> optimisations...
>
> >> There aren't any complicated types in the above code. In fact, there are
> >> only two types: float and float array.
> >
> > You're vastly underestimating the complexity of numpy objects. They
> > have an awful lot going on under the covers to make it look simple on
> > the surface. There's all kinds of type-checking and type-conversions.
> > A JIT that folds loops together would have to have knowledge of that
> > process, and it's a lot of knowledge to have. That is not easy.
>
> My suggestion doesn't really have anything to do with numpy. If you had such
> a JIT you wouldn't use numpy in this case.
>
> That's my point, using numpy encouraged the programmer to optimise in the
> wrong direction in this case (to use slices instead of element-wise
> operations).

Ok, I can see that. We have a sort of JIT compiler, psyco, that works
pretty well but I don't think it's as fast for large arrays as a good
numpy solution. But it has more to deal with than a JIT for a
statically typed language.

> >> I was referring to the slicing specifically, nothing to do with
> >> functional programming. My F# and OCaml code are now basically identical
> >> to the C++ code.
> >
> > It is pretty strong thing to say that anything is fundamentally bad
> > just because it's not fast as something else. Fundamental badness
> > ought to run deeper than some superficial, linear measure.
>
> The slice based approach is not only slower, it is longer, more obfuscated
> and more difficult to optimise. That doesn't just apply to Python and
> numpy, it also applies to Matlab, Mathematica etc.
>
> Which begs the question, if you were writing in a compiled language like F#
> would you ever use slices?

If I was writing in F#? I absolutely would, because some problems are
most cleanly, readably, and maintainably done with slices. I don't
have any speed-lust, and I don't waste time making things fast if they
don't need to be. I would rather spend my time designing and
implementing the system, and optimizing things that actually need it.
I don't want to spend my time tracking down some stupid indexing error.

If it was critical for it to be fast, then I'd pull out the stops and
write it as efficiently as possible. In my experience, a typical
middle size program will have half a dozen or so subroutines where it's
critical to be as fast as possible.

Will any other F# people do it? I don't know if they all have
speed-lust, or if they'd all deliberately avoid slicing to prove some
kind of point about functional programming and/or non-machine-compiled
languages, but I assume there'd be some who would do it for the same
reasons I'd do it.

> > Now, I was going to write the program in Python, and use the very
> > convenient numpy to do the transformation part.
>
> What makes numpy convenient?

It's in Python, it's almost always fast enough, it's clean, it scales
up well, it expresses mathematical concepts straightforwardly, it's
well suppported, it has useful advanced functionality built in (linear
algebra, etc.).

And a lot of the time, you don't have to worry about indexing.

> > No problem, numpy will be fast enough for my needs.
>
> Ok. But you'd rather have a compiler?

Python *is* compiled, buddy. It has a VM that runs bytecode.

But, would I rather have a seperate compile to *machine code*, when an
automatic compile to VM code is fast enough?

Not remotely. Compilers are a PITA and I avoid them when it's
possible.

> > But wait! It's silly try to optimize code in a fundamentally slow
> > language! It's vanity to even try! I guess for the sake of getting
> > this little D4 transform to double in speed
>
> C++ is currently 3.4x faster than Python.

Oh, well *that* made all the difference.


> > I'll scrap Python and write the whole thing in OCaml.
>
> Why not drop to C for this one function?

Why would I? The assumptions clearly stated that numpy is fast enough.
Why would I take several hours to write a C extension when I could do
it with numpy in about 20 minutes, *and it's fast enough*?


> > [snip]
> >> This is really great work but I can't help but wonder why the authors
> >> chose to use Python when other languages seem better suited. I'd like to
> >> work on raising people's awareness of these alternatives, and probably
> >> create some useful tools in the process. So I'm keen to learn what Python
> >> programmers would want/expect from F# and OCaml.
> >>
> >> What would it take to make you convert?
> >
> > Them not being functional would be a good start....
>
> They don't impose functional programming on you.

In the same way that a screwdriver can't prevent you from driving a
nail. Give me a break, we all know these guys (Haskell especially) are
designed to support functional first and other paradigms second. Way
second. That's quite enough imposing for me.

Look, I don't want to accuse you of anything, but when you come in here
and say "speed is crucial" and then "you should be using Haskell,
Ocaml, F#, etc., ...even though C++ is faster...", it gives off a
really strong signal that you're not interested in speed so much so as
in pimping functional languages; speed is just the mechanism. Which is
fine; those languages are undeniably faster, and there's nothing wrong
with using an advantage of a language to push it, if you're respectful.

But let me offer you some advice: if you wan't to lure people from
Python, speed isn't going to cut it. Most people who chose Python
(and, let's face it, most people who use Python chose it themselves)
already knew it was slow, and chose it anyways. Pythonistas prefer
Python because it's clean, readable, well-supported, and practical.
Practical might be the sticking point here. If you were to get a
Python prompt, and type "import this", it would print the Zen of
Python, a set of principles by which the language is designed. One of
them is "Practicality beats purity." And let's face it, functional
languages are (or seem to be) all about purity. Notice that there's no
Zen that says "Fast is better than slow". If you want us to swtich to
Ocaml, you better figure out how it supports the Zen of Python better
than Python does.


Carl Banks

Jon Harrop

unread,
Dec 3, 2006, 6:03:40 PM12/3/06
to
Carl Banks wrote:
>> Optimized Python is 14% slower than badly written OCaml.
>
> I'd call that roughly the same speed. Did you use any sort of
> benchmark suite that miminized testing error, or did you just run it
> surrounded by calls to the system timer like I did?

System timer, best of three.

> If the latter,
> then it's poor benchmark, and 10% accuracy is better than you can
> expect. (Naive tests like that tend to favor machine code.)

Yes, they are roughly the same speed. My objection was that this OCaml
program was not the unoptimised one, it was the badly written one. The
unoptimised OCaml/F#/C++ are all several times faster than this OCaml, and
much shorter and simpler too.

>> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the
>> fastest Python so far whilst also requiring about half as much code.
>
> Frankly, I have a hard time believing anyone would naively write the
> Ocaml code like the optimized version you posted.

That wasn't optimised. I haven't optimised any of my solutions. The OCaml I
posted was a naive convolution with some unnecessary allocation.

> You'd know better than I, though.

When asked to convolve an array with a 4-element array (h0..h3), I think
most programmers would write (pseudocode):

for i in [0 .. n-4]
x[i] = h0 y[i] + h1 y[i+1] + h2 y[i+2] + h3 y[i+3]

in most languages. That is the essence of this benchmark.

You can express it even more succinctly with a dot product:

for i in [0 .. n-4]
x[i] = h . y[i:]

In this case, you'd want a cyclic array type too.

>> That's my point, using numpy encouraged the programmer to optimise in the
>> wrong direction in this case (to use slices instead of element-wise
>> operations).
>
> Ok, I can see that. We have a sort of JIT compiler, psyco, that works
> pretty well but I don't think it's as fast for large arrays as a good
> numpy solution. But it has more to deal with than a JIT for a
> statically typed language.

Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
would be a good student project?

>> Which begs the question, if you were writing in a compiled language like
>> F# would you ever use slices?
>
> If I was writing in F#? I absolutely would, because some problems are
> most cleanly, readably, and maintainably done with slices. I don't
> have any speed-lust, and I don't waste time making things fast if they
> don't need to be. I would rather spend my time designing and
> implementing the system, and optimizing things that actually need it.
> I don't want to spend my time tracking down some stupid indexing error.

Ok. Can you name some problems that are solved more easily with slices than
with indexing? I may have just answered my own question by posting some
really concise pseudocode that uses slices.

> Will any other F# people do it? I don't know if they all have
> speed-lust, or if they'd all deliberately avoid slicing to prove some
> kind of point about functional programming and/or non-machine-compiled
> languages, but I assume there'd be some who would do it for the same
> reasons I'd do it.

My concern is that slices are being pulled into F# because they are popular
in languages like Python and Matlab but I've yet to see an example where
they are actually a good idea (in the context of F#, i.e. a compiled
language where indexing is as fast or faster).

>> What makes numpy convenient?
>
> ...it scales up well...

What do you mean?

> ...it expresses mathematical concepts straightforwardly...

It didn't express this mathematical concept very straightforwardly.

> And a lot of the time, you don't have to worry about indexing.

Right. Ideally you want to factor that out without losing anything. Indeed,
if you do factor the C++ code you should get back to the underlying
mathematical definitions. Ideally, you'd want to write those directly and
have them executed efficiently.

>> > No problem, numpy will be fast enough for my needs.
>>
>> Ok. But you'd rather have a compiler?
>
> Python *is* compiled, buddy. It has a VM that runs bytecode.

I meant a native-code compiler, of course.

> But, would I rather have a seperate compile to *machine code*, when an
> automatic compile to VM code is fast enough?
>
> Not remotely. Compilers are a PITA and I avoid them when it's
> possible.

I can envisage a JIT Python compiler that would spot statically typed code,
compile it with no user intervention and then run it. I implemented such a
thing for Mathematica a few years ago.

>> Why not drop to C for this one function?
>
> Why would I? The assumptions clearly stated that numpy is fast enough.
> Why would I take several hours to write a C extension when I could do
> it with numpy in about 20 minutes, *and it's fast enough*?

Is it very difficult to interface Python to other languages efficiently?

> But let me offer you some advice: if you wan't to lure people from
> Python, speed isn't going to cut it. Most people who chose Python
> (and, let's face it, most people who use Python chose it themselves)
> already knew it was slow, and chose it anyways. Pythonistas prefer
> Python because it's clean, readable, well-supported, and practical.
> Practical might be the sticking point here. If you were to get a
> Python prompt, and type "import this", it would print the Zen of
> Python, a set of principles by which the language is designed. One of
> them is "Practicality beats purity." And let's face it, functional
> languages are (or seem to be) all about purity. Notice that there's no
> Zen that says "Fast is better than slow". If you want us to swtich to
> Ocaml, you better figure out how it supports the Zen of Python better
> than Python does.

I'm keen on the practical aspect. I intend to take some of the excellent
Python demos out there and port them to languages like F# in order to
illustrate how they too can be practical languages.

George Sakkis

unread,
Dec 3, 2006, 10:05:41 PM12/3/06
to
Jon Harrop wrote:

> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> would be a good student project?

I guess for a student project it's not that important, but if you have
higher ambitions, make sure you read
http://dirtsimple.org/2005/10/children-of-lesser-python.html before you
commit to writing what will turn out to be yet another "80% Python" VM
/ JIT compiler.

George

Niels L Ellegaard

unread,
Dec 4, 2006, 2:15:38 AM12/4/06
to
Jon Harrop wrote:
> So I'm keen to learn what Python programmers would want/expect from F# and
> OCaml.

I think this discussion becoming is a little misguided.

The real strength of scipy is the elegant notation rather than speed.
Being raised with Matlab I find scipy nicely familiar, and its fast
enough for many tasks. Some would argue that the strength of scipy is
the weakness of ocaml. Others would disagree. That is a question of
taste.

My only grudge about strongly recommending scipy to friends is the way
that two arrays can share the same data. This can lead to subtle errors
that I will eventually be blamed for. I don't know if arrays in Matlab
(or Octave) can share data, but if they do, then everything happens
behind the scenes and the user does not have to worry. I would love to
see a future version of numpy that was 50% slower and had a more
foolproof approach to array copying.

http://www.scipy.org/Tentative_NumPy_Tutorial#head-1529ae93dd5d431ffe3a1001a4ab1a394e70a5f2


Niels

Filip Wasilewski

unread,
Dec 4, 2006, 9:18:15 AM12/4/06
to
Jon Harrop wrote:

[...]


> I first wrote an OCaml translation of the Python and wrote my own
> little "slice" implementation. I have since looked up a C++ solution and
> translated that into OCaml instead:
>
> let rec d4_aux a n =
> let n2 = n lsr 1 in
> let tmp = Array.make n 0. in
> for i=0 to n2-2 do
> tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
> tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
> done;
> tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
> tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
> Array.blit tmp 0 a 0 n;
> if n > 4 then d4_aux a (n lsr 1)
>
> let d4 a = d4_aux a (Array.length a)
>
> Not only is that shorter than the Python

and makes my eyes bleeding, but it's only my personal feeling. Try
reading the code above aloud and you will see why `shorter` is not
necessarily considered a merit, especially on this group.

Besides of that this code is irrelevant to the original one and your
further conclusions may not be perfectly correct. Please learn first
about the topic of your benchmark and different variants of wavelet
transform, namely difference between lifting scheme and dwt, and try
posting some relevant examples or use other topic for your benchmarks.

Neglecting this issues, I wonder if it is equally easy to juggle
arbitrary multidimensional arrays in a statically typed language and do
operations on selected axis directly from the interactive interpreter
like in the NumPy example from my other post -
http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
I don't know much OCaml so it would be great if you could elaborate on
this.

> , it is much faster:
>
> 0.56s C++ (direct arrays)
> 0.61s F# (direct arrays)
> 0.62s OCaml (direct arrays)
> 1.38s OCaml (slices)
> 2.38s Python (slices)
> 10s Mathematica 5.1
>
> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
> a[i]).

[...]

So why not use C++ instead of all others if the speed really matters?
What is your point here?

Could you please share full benchmark code, so we could make
conclusions too? I would like to get to know about your benchmark
methodology. I wonder if you are allocating the input data really
dynamically or whether it's size is a priori knowledge for the
compiler.


> In this specific context (discrete wavelet transform benchmark), I'd have
> said that speed was the most important thing after correctness.

In this very specific context:
Yes, if you are targeting specific one-shot issue like JPEG2000
encoder, where you are using only 2 types of wavelets. In that case you
would probably draw back to low-level C or C++ and use a good
optimizing compiler for a specific hardware.
Not necessarily, if you are doing research with different kinds of
wavelets and need a general, flexible and easily adaptable tool or just
the computation speed is not considered a bottleneck.

Language speed is a great advantage, but if it always was the most
important, people would talk directly to the CPUs or knit DSP chips in
theirs backyards whenever they need to solve a calculation problem.

Please don't make this discussion heading towards `what is the fastest
way of doing d4 transform and why OCaml rules` instead of `what is the
optimal way of doing things under given circumstances and still have a
free afternoon ;-)`. Different tasks need different, well-balanced
measures. Neither Python nor OCalm nor any other language is a panacea
for every single problem.

cheers,
fw

sturlamolden

unread,
Dec 4, 2006, 10:39:13 AM12/4/06
to

Filip Wasilewski wrote:

> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

If speed matters, one should consider using hand-coded assembly.
Nothing really beats that. But it's painful and not portable. So lets
forget about that for a moment.

Contrary to what most computer scientists think, C++ is not the tool of
choice if speed really matters. Numeric computing is typically done in
Fortran 90/95.

That is not without reason. There are aspects of the C++ language that
makes certain numerical optimizations impossible, e.g. the potential
for pointer aliasing within tight loops, which hampers register
allocation. For example:

for(i=0; i<n; i++) {
*a += *b;
*b++ = 0;
}

Here the C++ compiler cannot put *a in a register, as a may be aliased
by b in one of the iterations. It must therefore reload the value *a
from memory in every iteration of the loop. In Fortran this can never
happen (aliasing is disallowed in the standard), and it is therefore
much easier for the compiler to do numerical optimizations.

One should also consider using ATLAS or vendor-optimized BLAS (e.g.
Intel math kernel) for some of the axpy-operations in the wavelet
transform, unless one has an expensive Fortran compiler that takes care
of this automatically (in which case one should just use slicing and
let the compiler do the rest).

I don't agree that slicing is not the best way to approach this
problem. In both NumPy and Matlab, slicing (often) results in calls to
BLAS/ATLAS, which are heavily optimized numerical routines. One can
supply BLAS libraries that automatically exploits multiple CPUs, use
FMA (fused multiply and add) whenever appropriate, and make sure cache
is used in the most optimal way.

Vendors of Fortran compilers thrive on scientists need for speed.
Fortran compilers are therefore extremely good at taking slicing and
"vectorizing" the corresponding machine code. For those who don't know
Fortran, it has slicing operations similar to those of NumPy and
Matlab. If slicing was not the right tool for numerical work, Fortran
would not support it. Good Fortran compilers can automatically detect
when a call to BLAS is appropriate, use FMA opcodes whenever possible,
use SIMD extensions (e.g. SSE and MMX), or use MPI automatically to
spread the work onto multiple CPUs in a cluster, etc. This is possible
because of the implicit SIMD parallelism in Fortran's array slicing. In
Fortran, array slicing can be evaluated in any order, it is a parallel
SIMD statement. SIMD is a very powerful and perhaps the preferred
paradigm for numerical parallel computing. In Fortran 90/95 a
programmer will use array slicing, and the compiler is free to
parallelize however it wants. In contrast, the MIMD paradigm often used
for parallel computing in C and C++ (e.g. multi-threading) is a lot
harder to code for the programmer and optimize for the compiler.

Not to speak of the fact that Fortran 95 is a lot easier to write than
C++. Also, those that use Fortran tend to be scientists, not
professional programmers. C++ introduces a higher level of complexity
and entire new classes of bugs; this is highly undesirable. Fortran is
a small, nice language for numerical computing. Anything else should
rather be done in Python than C++.

I would urge you all to read this:

http://latticeqcd.blogspot.com/2006/11/why-we-use-fortran-and-python.html

Carl Banks

unread,
Dec 4, 2006, 12:22:00 PM12/4/06
to
Jon Harrop wrote:
[snip]

> >> That's my point, using numpy encouraged the programmer to optimise in the
> >> wrong direction in this case (to use slices instead of element-wise
> >> operations).
> >
> > Ok, I can see that. We have a sort of JIT compiler, psyco, that works
> > pretty well but I don't think it's as fast for large arrays as a good
> > numpy solution. But it has more to deal with than a JIT for a
> > statically typed language.
>
> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> would be a good student project?

...and finishing would be a good project for a well-funded team of
experienced engineers.


> >> Which begs the question, if you were writing in a compiled language like
> >> F# would you ever use slices?
> >
> > If I was writing in F#? I absolutely would, because some problems are
> > most cleanly, readably, and maintainably done with slices. I don't
> > have any speed-lust, and I don't waste time making things fast if they
> > don't need to be. I would rather spend my time designing and
> > implementing the system, and optimizing things that actually need it.
> > I don't want to spend my time tracking down some stupid indexing error.
>
> Ok. Can you name some problems that are solved more easily with slices than
> with indexing? I may have just answered my own question by posting some
> really concise pseudocode that uses slices.

D4 transform?

How about addition?

How about the vast majority of cases where slicing is possible?

Occasionally you have cases where slicing would work but be
complicated, but in most cases, wherever slicing is feasible it's
easier, less error-prone, less typing, more readable, more concise,
more maintainable.


> > Will any other F# people do it? I don't know if they all have
> > speed-lust, or if they'd all deliberately avoid slicing to prove some
> > kind of point about functional programming and/or non-machine-compiled
> > languages, but I assume there'd be some who would do it for the same
> > reasons I'd do it.
>
> My concern is that slices are being pulled into F# because they are popular
> in languages like Python and Matlab but I've yet to see an example where
> they are actually a good idea (in the context of F#, i.e. a compiled
> language where indexing is as fast or faster).

It's often a good idea where speed isn't crucial because it's easier,
more readable, and more maintainable. I highly suspect F# and other
languages would still using slicing for many problems for this reason.

OTOH, it's rarely a good idea to always base coding decisions on speed
to the exclusion of all other factors.


> >> What makes numpy convenient?
> >
> > ...it scales up well...
>
> What do you mean?
>
> > ...it expresses mathematical concepts straightforwardly...
>
> It didn't express this mathematical concept very straightforwardly.

I thought it did. One must know about the shared data nuances and the
slicing syntax. Given that, the numpy way closely mimicked the way
mathematical formulas look. Put it this way: which one of these two
(numpy with slicing, Ocaml with indexing) would it be easer to reverse
engineer a mathematical formula out of? I'd say the numpy version with
slicing, by a long shot.


> > And a lot of the time, you don't have to worry about indexing.
>
> Right. Ideally you want to factor that out without losing anything. Indeed,
> if you do factor the C++ code you should get back to the underlying
> mathematical definitions. Ideally, you'd want to write those directly and
> have them executed efficiently.

I am almost always happy to factor indexing out *with* losing
something.


> >> > No problem, numpy will be fast enough for my needs.
> >>
> >> Ok. But you'd rather have a compiler?
> >
> > Python *is* compiled, buddy. It has a VM that runs bytecode.
>
> I meant a native-code compiler, of course.
>
> > But, would I rather have a seperate compile to *machine code*, when an
> > automatic compile to VM code is fast enough?
> >
> > Not remotely. Compilers are a PITA and I avoid them when it's
> > possible.
>
> I can envisage a JIT Python compiler that would spot statically typed code,
> compile it with no user intervention and then run it. I implemented such a
> thing for Mathematica a few years ago.

You can almost never spot "statically typed" code in Python reliably,
even with a separate compile stage that does a global analysis. Even
Lisp compilers, which are generally known to be the best compilers out
there for a dynamically-typed language, don't do too well at this
unless you help them out, and Lisp isn't nearly as dynamic as Python.

As for JIT's, the best you could do is hook into a function, checking
the types of its arguments, and cacheing optimized code for the given
type signature. It's basically what psyco does. Anything more would
require either help from the programmer or concessions in dynamicity.


> >> Why not drop to C for this one function?
> >
> > Why would I? The assumptions clearly stated that numpy is fast enough.
> > Why would I take several hours to write a C extension when I could do
> > it with numpy in about 20 minutes, *and it's fast enough*?
>
> Is it very difficult to interface Python to other languages efficiently?

No. It does involve lots of boilerplate. But even if it didn't, I
wouldn't write an extension, because the numpy solution is cleaner,
more readable, and more maintainable than anything I could write in C
(which is a poor language that's good at being fast and talking to
hardware and nothing else), and it's fast enough.


> > But let me offer you some advice: if you wan't to lure people from
> > Python, speed isn't going to cut it. Most people who chose Python
> > (and, let's face it, most people who use Python chose it themselves)
> > already knew it was slow, and chose it anyways. Pythonistas prefer
> > Python because it's clean, readable, well-supported, and practical.
> > Practical might be the sticking point here. If you were to get a
> > Python prompt, and type "import this", it would print the Zen of
> > Python, a set of principles by which the language is designed. One of
> > them is "Practicality beats purity." And let's face it, functional
> > languages are (or seem to be) all about purity. Notice that there's no
> > Zen that says "Fast is better than slow". If you want us to swtich to
> > Ocaml, you better figure out how it supports the Zen of Python better
> > than Python does.
>
> I'm keen on the practical aspect. I intend to take some of the excellent
> Python demos out there and port them to languages like F# in order to
> illustrate how they too can be practical languages.

That's what you say. But nothing you've done in this whole thread
remotely suggests that "practical" means anything to you other than,
"possible to write faster code in."


Carl Banks

sturlamolden

unread,
Dec 4, 2006, 1:09:19 PM12/4/06
to

Carl Banks wrote:

> > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> > would be a good student project?
>
> ...and finishing would be a good project for a well-funded team of
> experienced engineers.

I think this is a good idea. We could use the AST from the CPython
compiler, translate on the fly to ANSI Common Lisp, and pass it off to
SBCL (which is very fast and supports native threads). It would even
solve the issue of Python's crappy GIL and reference counting (as
opposed to a modern garbage collector).

Another option would be to use something like pyrex to compile Python
on-the-fly to C, and then compile using GCC or MSVC.

A third option would be to translate on the fly to Scheme, and compile
using Bigloo.

It would not be a JIT-compiler, but rather something like the Lisp
solution where individual functions can be compiled when speed is
important.

It sounds like a fun project to me.

Jon Harrop

unread,
Dec 4, 2006, 4:11:10 PM12/4/06
to
sturlamolden wrote:
> I don't agree that slicing is not the best way to approach this
> problem.

Indeed, the C++ approach can be written very succinctly using slicing:

for i=0 to n/2-1 do
tmp[i] = dot a[2*i:] h;
tmp[i + n/2] = dot a[2*i + 1:] g;

where a[i:] denotes the array starting at index "i".

> In both NumPy and Matlab, slicing (often) results in calls to
> BLAS/ATLAS, which are heavily optimized numerical routines. One can
> supply BLAS libraries that automatically exploits multiple CPUs, use
> FMA (fused multiply and add) whenever appropriate, and make sure cache
> is used in the most optimal way.

If the Python were making only a few calls to BLAS then I would agree.
However, in order to move as much computation as possible from the
interpreted language onto BLAS you have had to call BLAS many times.

So the super-fast BLAS routines are now iterating over the arrays many times
instead of once and the whole program is slower than a simple loop written
in C.

Indeed, this algorithm could be written concisely using pattern matching
over linked lists. I wonder if that would be as fast as using BLAS from
Python.

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Carl Banks

unread,
Dec 4, 2006, 4:28:31 PM12/4/06
to
sturlamolden wrote:
> Carl Banks wrote:
>
> > > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> > > would be a good student project?
> >
> > ...and finishing would be a good project for a well-funded team of
> > experienced engineers.
>
> I think this is a good idea. We could use the AST from the CPython
> compiler, translate on the fly to ANSI Common Lisp, and pass it off to
> SBCL (which is very fast and supports native threads). It would even
> solve the issue of Python's crappy GIL and reference counting (as
> opposed to a modern garbage collector).

I doubt it would work too well. Lisp's semantics differ from Python's
in subtle ways, and covering the edge cases usually means you have to
do all the stuff you thought you would avoid by using another
dynamically-typed language. Translating Lisp integers to Python ints
would work probably 99 percent of the time, but in the end you'd
essentially have to reimplement the Python int type.

If you're going to go through all that work, you might as well
translate it to C or directly to machine code.


Carl Banks

sturlamolden

unread,
Dec 4, 2006, 9:08:48 PM12/4/06
to
Jon Harrop wrote:

> So the super-fast BLAS routines are now iterating over the arrays many times
> instead of once and the whole program is slower than a simple loop written
> in C.

Yes. And the biggest overhead is probably Python's function calls.

Little is as efficient as well-written ISO C99 (not to be confused with
C++ or ANSI C). The only thing I can think of is hand-written assembly.
But whether a single loop is faster than several BLAS calls does not
only depend on how many times the memory is traversed. It also depends
a lot on cache prefetching. So I assume you make sure that the cache is
prefetched and exploited optimally for your CPU in your C code? And you
are presumably also inlining assembly to use SSE and other SIMD
opcodes? After all, SSE can speed up floating point performance by a
factor of four if the computations can be carried out in parallel. At
one point, one has to stop and ask a very important question:

How fast is fast enough? And how much suffering am I willing to endure
to achieve that speed?

Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how
much time was saved writing and debugging the code? I value my own time
more than a few extra CPU cycles. With Fortran 95, it is easy to write
numerical code that runs reasonably fast. It is only a little bit
harder to write numerical code that runs very fast. There is nothing in
Fortran that prevents you from writing loops that are equally fast or
faster than ANSI C loops. If you have several subsequent slicing
operations, a good Fortran compiler can detect that and fuse them into
a single loop. Never underestimate the ability of a modern Fortran
compiler to generate efficient machine code, even if array slicing is
used heavily.

Also never underestimate the truth in Knuth's claim that premature
optimization is the root of all evil in computer programming. It is a
lot smarter to write a pure Python/NumPy solution, and if its too slow,
run the profiler, identify the worst bottlenecks, and translate them to
Fortran. The result will probably not be as fast as a pure C solution.
But if it is still too slow, money is better spent buying better
hardware than attempting a major rewrite in C/C99/C++.

NumPy can be slower than equivalent C by an order of magnitude. But
still, that may be more than fast enough. The beauty of NumPy is really
the way it allows arrays of numerical data to be created and
manipulated within Python, not the speed of python code using NumPy
objects. We have already seen that NumPy is faster than the most recent
version of Matlab, the most expensive commercial tool on the marked;
which means, it is as fast as you can expect a numerical scripting tool
to be, but sure, there is still room for improvement.

In a numerical scripting language it is certainly not true that several
slicing operations is slower than a single big for-loop. That is due to
the overhead involved when the interpreter is invoked. The performance
is to a large extent determined by the amount of interpreter
invocations. That is why it is preferred to "vectorize" NumPy code
using slicing, repmat, reshapes, etc., rather than looping with a
pythonic for-loop. Even if you have to iterate over the memory several
times, array slicing will still be a lot faster than a pythonic
for-loop.


> Indeed, this algorithm could be written concisely using pattern matching
> over linked lists. I wonder if that would be as fast as using BLAS from
> Python.

Pattern matching on linked lists? For wavelet transforms? Seriously?

Jon Harrop

unread,
Dec 5, 2006, 5:11:58 AM12/5/06
to
sturlamolden wrote:
> Little is as efficient as well-written ISO C99 (not to be confused with
> C++ or ANSI C).

OCaml and F# are almost as fast as C++ in this case. I suspect most other
modern languages are.

> So I assume you make sure that the cache is
> prefetched and exploited optimally for your CPU in your C code?

No. My implementations are all unoptimised.

> How fast is fast enough? And how much suffering am I willing to endure
> to achieve that speed?

No suffering: The other implementations are all shorter, simpler and faster
than the Python.

> Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how
> much time was saved writing and debugging the code?

A lot of effort was saved by not having to shoehorn the solution into numpy.

> I value my own time more than a few extra CPU cycles.

Then you won't write code like this in Python.

> Also never underestimate the truth in Knuth's claim that premature
> optimization is the root of all evil in computer programming.

Indeed, only the Python was optimised.

> NumPy can be slower than equivalent C by an order of magnitude. But
> still, that may be more than fast enough. The beauty of NumPy is really
> the way it allows arrays of numerical data to be created and
> manipulated within Python, not the speed of python code using NumPy
> objects. We have already seen that NumPy is faster than the most recent
> version of Matlab, the most expensive commercial tool on the marked;
> which means, it is as fast as you can expect a numerical scripting tool
> to be, but sure, there is still room for improvement.

Price aside, Matlab is clearly the wrong tool for the job here.

> In a numerical scripting language it is certainly not true that several
> slicing operations is slower than a single big for-loop. That is due to
> the overhead involved when the interpreter is invoked. The performance
> is to a large extent determined by the amount of interpreter
> invocations. That is why it is preferred to "vectorize" NumPy code
> using slicing, repmat, reshapes, etc., rather than looping with a
> pythonic for-loop. Even if you have to iterate over the memory several
> times, array slicing will still be a lot faster than a pythonic
> for-loop.

Yes. You have to suffer in Python if you want to approach performance that
can be reached with ease in most other languages in the case of this
benchmark (and probably most benchmarks).

>> Indeed, this algorithm could be written concisely using pattern matching
>> over linked lists. I wonder if that would be as fast as using BLAS from
>> Python.
>
> Pattern matching on linked lists? For wavelet transforms? Seriously?

Sure. Here's a list-based implementation in OCaml:

let h x0 x1 x2 x3 = h0 *. x0 +. h1 *. x1 +. h2 *. x2 +. h3 *. x3
let g x0 x1 x2 x3 = g0 *. x0 +. g1 *. x1 +. g2 *. x2 +. g3 *. x3

let rec d4_even s d t a = match t, a with
| [x0; x1; x2; x3], x4::x5::x6::_ ->
h x0 x1 x2 x3::h x2 x3 x4 x5::s, g x1 x2 x3 x4::g x3 x4 x5 x6::d
| x0::x1::(x2::x3::x4::_ as t), a ->
d4_even (h x0 x1 x2 x3 :: s) (g x1 x2 x3 x4 :: d) t a

let rec d4 = function
| [_; _; _; _] as a -> [a, []]
| a -> (fun (s, _ as h) -> h :: d4 s) (d4_even [] [] a a)

That's about 3x slower than the Python.

Jon Harrop

unread,
Dec 5, 2006, 7:36:38 AM12/5/06
to
Filip Wasilewski wrote:
> Besides of that this code is irrelevant to the original one and your
> further conclusions may not be perfectly correct. Please learn first
> about the topic of your benchmark and different variants of wavelet
> transform, namely difference between lifting scheme and dwt, and try
> posting some relevant examples or use other topic for your benchmarks.

Your lifting implementation is slightly longer and slower than the
non-lifting one but its characteristics are identical. For n=2^25 I get:

1.88s C++ (816 non-whitespace bytes)
2.00s OCaml (741 b)
2.33s F# (741 b)
9.83s Your Python (1,002 b)

The original python was 789 bytes.

> Neglecting this issues, I wonder if it is equally easy to juggle
> arbitrary multidimensional arrays in a statically typed language and do
> operations on selected axis directly from the interactive interpreter
> like in the NumPy example from my other post -
> http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
> I don't know much OCaml so it would be great if you could elaborate on
> this.

It is probably just as easy. Instead of dynamic typing you have parametric
polymorphism. If you want to make your function generic over arithmetic
type then you can pass in the arithmetic operators.

>> 0.56s C++ (direct arrays)
>> 0.61s F# (direct arrays)
>> 0.62s OCaml (direct arrays)
>> 1.38s OCaml (slices)
>> 2.38s Python (slices)
>> 10s Mathematica 5.1
>>
>> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
>> a[i]).
>

> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

1. Benchmarks should not just include two inappropriate languages.
2. Speed aside, the other languages are more concise.

> Could you please share full benchmark code, so we could make
> conclusions too?

I'll paste the whole programs at the end...

> I would like to get to know about your benchmark
> methodology. I wonder if you are allocating the input data really
> dynamically or whether it's size is a priori knowledge for the
> compiler.

The knowledge is there for the compiler to use but I don't believe any of
them exploit it.

>> In this specific context (discrete wavelet transform benchmark), I'd have
>> said that speed was the most important thing after correctness.
>

> Not necessarily, if you are doing research with different kinds of
> wavelets and need a general, flexible and easily adaptable tool or just
> the computation speed is not considered a bottleneck.

Brevity is probably next.

> Language speed is a great advantage, but if it always was the most
> important, people would talk directly to the CPUs or knit DSP chips in
> theirs backyards whenever they need to solve a calculation problem.

Sure.

> Please don't make this discussion heading towards `what is the fastest
> way of doing d4 transform and why OCaml rules` instead of `what is the
> optimal way of doing things under given circumstances and still have a
> free afternoon ;-)`.

Comments like that seem to be based on the fundamental misconception that
writing C++ like this is hard.

> Different tasks need different, well-balanced
> measures. Neither Python nor OCalm nor any other language is a panacea
> for every single problem.

Absolutely. OCaml is as good as the next (compiled) language in this case.
Python and Matlab seem to be particularly bad at this task.

Here's my complete OCaml:

let a = sqrt 3. and b = 4. *. sqrt 2.
let h0, h1, h2, h3 =
(1. +. a) /. b, (3. +. a) /. b, (3. -. a) /. b, (1. -. a) /. b
let g0, g1, g2, g3 = h3, -.h2, h1, -.h0

let rec d4_aux tmp a n =


let n2 = n lsr 1 in

for i=0 to n2-2 do
tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
done;
tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
Array.blit tmp 0 a 0 n;

if n > 4 then d4_aux tmp a (n lsr 1)

let d4 a =
let tmp = Array.make (Array.length a) 0. in
d4_aux tmp a (Array.length a)

let () =
print_endline "Generating test data...";
let x = Array.init (1 lsl 25) (fun _ -> Random.float 1.) in
print_endline "Benchmarking...";
let t1 = Sys.time() in
ignore(d4 x);
Printf.printf "Elapsed time is %.6f seconds\n" (Sys.time() -. t1)

and C++:

#include <iostream>
#include <vector>
#include <ctime>
#include <cmath>

using namespace std;

double a = sqrt(3), b = 4 * sqrt(2);
double h0 = (1 + a) / b, h1 = (3 + a) / b, h2 = (3 - a) / b, h3 = (1 - a) /
b;
double g0 = h3, g1 = -h2, g2 = h1, g3 = -h0;

void d4(vector<double> &a) {
vector<double> tmp(a.size());
for (int n = a.size(); n >= 4; n >>= 1) {
int half = n >> 1;

for (int i=0; i<n-2; i+=2) {
tmp.at(i/2) = a.at(i)*h0 + a.at(i+1)*h1 + a.at(i+2)*h2 + a.at(i+3)*h3;
tmp.at(i/2+half) = a.at(i)*h3 - a.at(i+1)*h2 + a.at(i+2)*h1 -
a.at(i+3)*h0
;
}

tmp.at(half-1) = a.at(n-2)*h0 + a.at(n-1)*h1 + a.at(0)*h2 + a.at(1)*h3;
tmp.at(2*half-1) = a.at(n-2)*h3 - a.at(n-1)*h2 + a.at(0)*h1 -
a.at(1)*h0;

copy(tmp.begin(), tmp.begin() + n, a.begin());
}
}

int main() {
cout << "Generating data...\n";
vector<double> a(1 << 25);
for (int i=0; i<a.size(); ++i) a.at(i) = rand();
cout << "Benchmarking...\n";
double t1 = clock();
d4(a);
cout << "Elapsed time " << (clock() - t1) / CLOCKS_PER_SEC << "s\n";
}

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Mark Morss

unread,
Dec 5, 2006, 10:35:37 AM12/5/06
to
I doubt that anyone would dispute that even as boosted by Numpy/Scipy,
Python will almost certainly be notably slower than moderately
well-written code in a compiled language. The reason Numpy exists,
however, is not to deliver the best possible speed, but to deliver
enough speed to make it possible to solve large numerical problems with
the powerful and flexible Python language. As observed by Hans
Latangen in _Python Scripting for Computational Science_, 2nd ed.,
Springer 2005, scientific computing is more than number crunching:

"Very often programming is about shuffling data in and out of different
tools, converting one data format to another, extracting numerical data
from a text, and administering numerical experiments involving a large
number of data files and directories. Such tasks are much faster to
accomplish in a language like Python than in Fortran, C, C++ or Java."

He might well have mentioned the importance of developing nice-looking
reports once the analysis is complete, and that development is simpler
and more flexible in Python than a compiled language.

In principle, I agree that heavy-duty number-crunching, at least if it
has to be repeated again and again, should be accomplished by means of
a compiled language. However, if someone has to solve many different
problems just one time, or just a few times (for example if you are an
engineering consultant), there is an excellent argument for using
Python + Numpy. Unless it affects feasibility, I opine, computational
speed is important primarily in context of regular production, e.g.,
computing the daily "value at risk" in a commodity trading portfolio,
or making daily weather predictions.

I am aware of the power and flexibility of the OCaml language, and
particularly that an OCaml user can easily switch back and forth
between interpreted and compiled implementation. I'm attacted to OCaml
and, indeed, I'm in the process of reading Smith's (unfortunately not
very well-written) _Practical OCaml_. However, I also understand that
OCaml supports only double-precision implementation of real numbers;
that its implementation of arrays is a little clunky compared to
Fortran 95 or Numpy (and I suspect not as fast as Fortran's); and that
the available libraries, while powerful, are by no means as
comprehensive as those available for Python. For example, I am unaware
of the existance of an HDF5 interface for OCaml.

In summary, I think that OCaml is an excellent language, but I think
that the question of whether to use it in preference to Python+Numpy
for general-purpose numerical analysis must rest on much more than its
computational speed.

Mark Morss

unread,
Dec 5, 2006, 10:37:06 AM12/5/06
to
Hans >Langtangen<, rather.

Mark Morss

unread,
Dec 5, 2006, 12:16:07 PM12/5/06
to
Carl,

I agree with practically everything you say about the choice between
Python and functional languages, but apropos of Ocaml, not these
remarks:

>
> In the same way that a screwdriver can't prevent you from driving a
> nail. Give me a break, we all know these guys (Haskell especially) are
> designed to support functional first and other paradigms second. Way
> second. That's quite enough imposing for me.

> ...


> And let's face it, functional
> languages are (or seem to be) all about purity.

As you will see if you look at some OCaml references (there are a
number that are available on the OCaml website) that OCaml goes a very
long way toward facilitating imperative, procedural methods. It's
possible to write perfectly good code in OCaml and not use functional
methods. The OCaml people are fond of saying "OCaml isn't pure."
Indeed, the need for speed is one very good reason to choose and
imperative style over a functional one.

Konrad Hinsen

unread,
Dec 6, 2006, 5:52:12 AM12/6/06
to pytho...@python.org
On Dec 5, 2006, at 16:35, Mark Morss wrote:

> very well-written) _Practical OCaml_. However, I also understand that
> OCaml supports only double-precision implementation of real numbers;
> that its implementation of arrays is a little clunky compared to
> Fortran 95 or Numpy (and I suspect not as fast as Fortran's); and that
> the available libraries, while powerful, are by no means as
> comprehensive as those available for Python. For example, I am
> unaware
> of the existance of an HDF5 interface for OCaml.

OCaml is a popular language with computer scientists in France (not
surprisingly, given its origins), and I have regular contacts with
the computer science research community, so I have been exposed quite
a bit to OCaml, to the point of once considering it for numerical
work. Not as a replacement for Python, but as a replacement for C or
Fortran, i.e. for the heavy-duty parts of my work.

What made me abandon this idea are mainly two points:
1) the lack of polymorphism
2) the difficulty of interfacing to C and Fortran code
3) the limited processor support of the native code compiler

The lack of polymorphism, particularly in operators, makes OCaml code
a pain to write and read in my opinion. Float arithmetic is already
bad enough, with "+." etc. everywhere. If I add complex and vector
types (which could easily be defined in OCaml), I'd have to come up
with two more sets of arithmetic operators that make my code less
readable. In addition, I'd either have to keep multiple
implementations for many algorithms, or pass in all operators and
mathematical functions as arguments. Neither solution is acceptable
for me.

Interfacing to C and Fortran code is important because that's what
most libraries are written in. While it is possible in principle with
OCaml, it is (or at least was at the time I looked) a pain to
interface typical array-handling code, for lack of a compatible data
structure on the OCaml side.

Native code compilation is obviously important for speed. While many
popular processors are supported by ocamlopt, scientific users are
notorious for grabbing whatever fast hardware they can lay their
hands on. It seems safe to count on the GNU suite being ported
rapidly to any new platform. I don't have as much faith in the OCaml
developers, though perhaps just out of ignorance.

Konrad.

Jon Harrop

unread,
Dec 10, 2006, 5:23:17 AM12/10/06
to
Konrad Hinsen wrote:
> The lack of polymorphism, particularly in operators, makes OCaml code
> a pain to write and read in my opinion.

F# addresses this by adding operator overloading. However, you have to add
more type annotations...

> Interfacing to C and Fortran code is important because that's what
> most libraries are written in. While it is possible in principle with
> OCaml, it is (or at least was at the time I looked) a pain to
> interface typical array-handling code, for lack of a compatible data
> structure on the OCaml side.

You want bigarrays, they are just C/Fortran arrays. Look at the bindings to
FFTW/LAPACK, for example.

> Native code compilation is obviously important for speed. While many
> popular processors are supported by ocamlopt, scientific users are
> notorious for grabbing whatever fast hardware they can lay their
> hands on. It seems safe to count on the GNU suite being ported
> rapidly to any new platform.

OCaml already supports 9 architectures and optimised to AMD64 earlier than
gcc. How many do you want?

Jon Harrop

unread,
Dec 10, 2006, 5:25:48 AM12/10/06
to
Carl Banks wrote:
> I doubt it would work too well.

What about translating the current Python interpreter into a language with a
GC, like MLton-compiled SML? That would probably make it much faster, more
reliable and easier to develop.

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

konrad...@laposte.net

unread,
Dec 10, 2006, 7:29:09 AM12/10/06