Re: About alternatives to Matlab

sturlamolden

unread,

Nov 16, 2006, 4:09:03 PM11/16/06

to

Boris wrote:
> Hi, is there any alternative software for Matlab? Although Matlab is
> powerful & popular among mathematical & engineering guys, it still
> costs too much & not publicly open. So I wonder if there's similar
> software/lang that is open & with comparable functionality, at least
> for numerical aspect. Thanks!

I have used Matlab for years, and has recently changed to Python. In
addition one needs NumPy and Matplotlib, and perhaps SciPy. Other
useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for
3D graphics, Mpi4Py for parallel computation, etc. You will find python
packages for nearly any conceivable task. Unlike Matlab, Python is a
general purpose programming language, and a distant cousin of Lisp. The
Python language is fare more expressive and productive than Matlab, yet
even more easy to use.

The NumPy package is the core requirement for numerical work in Python.
It is quite different form Matlab, but I think it is more powerful.
Particularly, arrays are passed by reference (not by value), and
indexing creates view matrices.

To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
transform. I have here coded it in Matlab and Python/NumPy using Tim
Swelden's lifting scheme.

First the Matlab version (D4_Transform.m):

function x = D4_Transform(x)
% D4 Wavelet transform in Matlab
% (C) Sturla Molden
C1 = 1.7320508075688772;
C2 = 0.4330127018922193;
C3 = -0.066987298107780702;
C4 = 0.51763809020504137;
C5 = 1.9318516525781364;
s1 = zeros(ceil(size(x)/2));
d1 = zeros(ceil(size(x)/2));
d2 = zeros(ceil(size(x)/2));
odd = x(2:2:end);
even = x(1:2:end-1);
d1(:) = odd - C2*even;
s1(1) = even(1) + C2*d1(1) + C3*d1(end);
s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);
d2(1) = d1(1) + s1(end);
d2(2:end) = d1(2:end) + s1(1:end-1);
x(1:2:end-1) = C4*s1;
x(2:2:end) = C5*d2;
if (length(x) >2)
x(1:2:end-1) = D4_Transform(x(1:2:end-1));
end

Then the Python version (D4.py):

import numpy
import time

def D4_Transform(x, s1=None, d1=None, d2=None):
"""
D4 Wavelet transform in NumPy
(C) Sturla Molden
"""
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]
s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
d2[0] = d1[0] + s1[-1]
d2[1:] = d1[1:] + s1[:-1]
even[:] = C4 * s1[:]
odd[:] = C5 * d2[:]
if x.size > 2:

D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

if __name__ == "__main__":
x = numpy.random.rand(2**23)
t0 = time.clock()
D4_Transform(x)
t = time.clock()
print "Elapsed time is %.6f seconds" % (t-t0)

Now let's do benchmark on my laptop (1.73 GHz Pentium M, 0.99 GB RAM).
I have stopped paying for Matlab maintenance (for reasons that will be
obvious), so I only have R14 Service Pack 2 for comparison.

First we try Matlab (R14 Service Pack 2):

>> x = rand(2^23,1);
>> tic; D4_Transform(x); toc
Elapsed time is 27.145438 seconds.

Then we Python 2.4.4 with NumPy 1.0:

C:\develop\python\D4>python D4.py
Elapsed time is 3.364887 seconds

That is quite astonishing.

If anyone wonders why I think Travis Oliphant and the NumPy team should
be knighted, then this is the answer. The Mathworks' product only
achieved 100% * 3/27 = 11% the speed of Python/NumPy, and is infinitely
more expensive.

Does anyone wonder why I am not paying for Matlab maintenance anymore?

Sorry Mathworks, I have used your product for years, but you cannot
compete with NumPy.

Cheers,
Sturla Molden

John Henry

unread,

Nov 16, 2006, 4:46:13 PM11/16/06

to

Bill Gates will have you jailed! :-)

On a more serious note, is there any alternative to Simulink though?

Ramon Diaz-Uriarte

unread,

Nov 16, 2006, 5:23:07 PM11/16/06

to sturlamolden, pytho...@python.org

R (http://cran.r-project.org) might be an alternative, specially if
you do a lot of statistics and graphics. (R is probably the most
widely used language/system in statistical research).

R.

> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Cameron Laird

unread,

Nov 16, 2006, 5:55:38 PM11/16/06

to

In article <1163711343.8...@h48g2000cwc.googlegroups.com>,

sturlamolden <sturla...@yahoo.no> wrote:
>Boris wrote:
>> Hi, is there any alternative software for Matlab? Although Matlab is
>> powerful & popular among mathematical & engineering guys, it still
>> costs too much & not publicly open. So I wonder if there's similar
>> software/lang that is open & with comparable functionality, at least
>> for numerical aspect. Thanks!
>
>I have used Matlab for years, and has recently changed to Python. In

.
.
.
While I frequently push Matlab users toward Python, they also deserve
to know about Octave <URL:
http://www-128.ibm.com/developerworks/library/l-oslab/?n-l-10242 >.

Matimus

unread,

Nov 16, 2006, 6:37:48 PM11/16/06

to

Boris wrote:
> Hi, is there any alternative software for Matlab? Although Matlab is
> powerful & popular among mathematical & engineering guys, it still
> costs too much & not publicly open. So I wonder if there's similar
> software/lang that is open & with comparable functionality, at least
> for numerical aspect. Thanks!

There is also Scilab. I've only used it a tiny bit but for the task it
worked well. I do know that it has a somewhat restrictive license. It
is open source, but you aren't allowed to modify and redistribute the
source. I get the feeling that some people avoid Scilab but I'm not
sure why. If anybody knows the reason I'd be happy to hear it.

Scilab website: http://www.scilab.org

-Matt

Robert Kern

unread,

Nov 16, 2006, 6:47:09 PM11/16/06

to pytho...@python.org

Matimus wrote:
> There is also Scilab. I've only used it a tiny bit but for the task it
> worked well. I do know that it has a somewhat restrictive license. It
> is open source, but you aren't allowed to modify and redistribute the
> source. I get the feeling that some people avoid Scilab but I'm not
> sure why. If anybody knows the reason I'd be happy to hear it.

I think you just stated it.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Brian Blais

unread,

Nov 17, 2006, 1:51:02 PM11/17/06

to Matimus, scipy...@scipy.org, pytho...@python.org

Matimus wrote:
> Boris wrote:
> > Hi, is there any alternative software for Matlab? Although Matlab is
> > powerful & popular among mathematical & engineering guys, it still
> > costs too much & not publicly open. So I wonder if there's similar
> > software/lang that is open & with comparable functionality, at least
> > for numerical aspect. Thanks!
>
> There is also Scilab.

> If anybody knows the reason I'd be happy to hear it.

I feel like I need to respond on this thread, as a recent Matlab/Octave/Scilab to
Python convert. I'm cross-posting this to the scipy list, because there was a recent
thread about advantages to Python over Matlab.

Bottom line: Python is more than a replacement for Matlab, in almost every capacity.
There are a couple of places here and there where I feel Matlab still has an
advantage, but since I converted about 12 months ago, I have run Matlab only a couple
of times, and only to send something to someone who didn't have Python installed.

I've been using Matlab for more than 10 years, in every part of my research and
teaching. When I started to teach courses with a larger Matlab component, I looked
for free/cheap alternatives that the students could use on their own machines. There
was the student version of Matlab, which was crippled in ways unacceptable to me.

Octave was good, and I used it a couple of years, but the MS Windows port is pretty
horrible. I still use it some on Linux, but on Windows it requires Cygwin to be
installed, and performance takes a *huge* hit with the compiled extensions in
Windows. The community is pretty amazing, and the main author John Eaton is nothing
short of incredible (I've personally received bug fix responses within minutes of
submitting the bug report!). Writing C++ extensions is also pretty easy in Octave,
and I made a lot of use of that. The graphics uses a connection to Gnuplot, which is
ok.

I used Scilab for a short while. It did install more cleanly in Windows, and the
performance was generally better than Octave. I didn't like how some things were not
Matlab compatible for no good reason. I got the feeling from the community that it
wasn't as open, and that decisions were somewhat arbitrary. GUI syntax was weird,
and compiling extensions was a real pain, requiring more than one file per function.

I was introduced to python about 12 months ago, and haven't looked back since. :)
I want to state some of the hurdles that I felt in the transition, in the hope that
it will help others to overcome them. The advantages to Python far outweigh the
problems that I had.

1) in Matlab, you can call a function without any execfile() or import statements.
This impacts Python negatively in the ease of using it interactively. On the other
hand, the one-function per file organization of Matlab can get a little hard to follow.

2) following up on point 1), this has an effect on matplotlib more drastically.
Turning it to interactive mode (ion()) helps, but in Linux I get window refresh
issues with this mode. Switching to ipython for the shell, making sure to call it
with ipython -pylab, has been a big help in this regard.

3) 3D plotting requires yet-another library. luckily I haven't had to use this much,
but I hope that someday that it will be part of matplotlib.

4) GUI development is a bit easier in Matlab for small projects, and harder for
larger ones. I like using wax in python, which wraps wxPython. Others swear by
Dabo, but I always get two windows popping up when I run even the hello world daboui
program. At any rate, wxPython looks much better on Windows than the same interface
in Matlab, and it is more robust across operating systems.

5) although stated in the python docs that python is easy to interface with C, and in
the scipy docs that it is easier to interface to C than Matlab, I cannot agree.
perhaps it is my limitation, but I found that the API was more complicated (because
of the multiple data structures), and having to keep track of references was
something I never had to do in writing Matlab extensions. HOWEVER, once I found
Pyrex, all of that changed. Pyrex is simply the best way to write an extension that
I have ever seen, and I have never seen its equal for any other language.

So my recommendation for a (nearly) complete Matlab replacement would be:
python
numpy
scipy
matplotlib
pyrex

There is a bit of a learning curve, but it is well worth it. On an amusing note, if
you work in Python for a while, and go back to Matlab for a bit, you will find that
you will forget your semi-colons at the ends of the Matlab lines, and all of your
arrays will display and scroll like crazy on your screen. :)

thanks all for making such a great set of tools!

Brian Blais

--
-----------------

bbl...@bryant.edu
http://web.bryant.edu/~bblais

Filip Wasilewski

unread,

Nov 17, 2006, 2:17:49 PM11/17/06

to

sturlamolden wrote:
> Boris wrote:
> > Hi, is there any alternative software for Matlab? Although Matlab is
> > powerful & popular among mathematical & engineering guys, it still
> > costs too much & not publicly open. So I wonder if there's similar
> > software/lang that is open & with comparable functionality, at least
> > for numerical aspect. Thanks!
>
> I have used Matlab for years, and has recently changed to Python. In
> addition one needs NumPy and Matplotlib, and perhaps SciPy. Other
> useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for
> 3D graphics, Mpi4Py for parallel computation, etc. You will find python
> packages for nearly any conceivable task. Unlike Matlab, Python is a
> general purpose programming language, and a distant cousin of Lisp. The
> Python language is fare more expressive and productive than Matlab, yet
> even more easy to use.

Huh, so you don't think it's just damn sexy to do OOP, networking and
databases by multiplying matrices?;-)

> The NumPy package is the core requirement for numerical work in Python.
> It is quite different form Matlab, but I think it is more powerful.
> Particularly, arrays are passed by reference (not by value), and
> indexing creates view matrices.

For those interested in drilling down on this topic there is a Numpy
for Matlab Users guide at http://scipy.com/NumPy_for_Matlab_Users

> To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
> transform. I have here coded it in Matlab and Python/NumPy using Tim
> Swelden's lifting scheme.

[...]

Actually you have not. The algorithm you presented gives completely
wrong results. Have a look at quick&dirty(TM) implementation bellow.

import math
import numpy

def d2_lwt(x, axis=-1, level=None, copy=False):
"""
Daubechies2 (4 point support) Lifting Wavelet Transform in NumPy
"""
C1 = 1.7320508075688772 # sqrt(3)
C2 = 0.4330127018922193 # sqrt(3)/4
C3 = -0.066987298107780702 # (sqrt(3)-2)/4)
C4 = 0.51763809020504137 # (sqrt(3)-1)/sqrt(2)
C5 = 1.9318516525781364 # (sqrt(3)+1)/sqrt(2)

if not isinstance(x, numpy.ndarray) or x.dtype.kind != 'f':
x = numpy.array(x, dtype=numpy.float64)
elif copy:
x = x.copy()

max_level = int(math.floor(math.log(x.shape[axis],2)))
if level is None:
level = max_level
else:
assert level <= max_level, "Level param too high"

coeffs = []
cA = x.swapaxes(0, axis)
while level > 0:
level -= 1

# lazy
even = cA[::2]
odd = cA[1::2]

# dual
# using `even = even + C1*odd` may speed up things on
# some machines (only for not in-place transform).
even += C1*odd

# primal
odd[0] -= C2*even[0] + C3*even[-1]
odd[1:] -= C2*even[1:] + C3*even[:-1]

# dual
even[:-1] -= odd[1:]
even[-1] -= odd[0]

# scale
even *= C4
odd *= C5

cA, cD = even, odd
coeffs.append(cD.swapaxes(axis, 0))

coeffs.append(cA.swapaxes(axis, 0))
coeffs.reverse()
return coeffs

if __name__ == "__main__":
d = [1,5,3,2,4,8,5,2]
data = [
numpy.array(d, dtype=numpy.float32),
numpy.array(d, dtype=numpy.float64),
numpy.array([d]*2, dtype=numpy.float64),
numpy.array([d]*2, dtype=numpy.float32),
numpy.array(d, dtype=numpy.int32),
[d]*2,
[[d,d]]*3,
]

for i,x in enumerate(data):
print "Case %d:" % (i+1)
print "x in:\n", x
coeffs = d2_lwt(x, axis=-1, level=None, copy=False)
print "coeffs:"
for c in coeffs:
print c, c.dtype
print "x out:\n", x
print

Please excuse me not including Matlab version here, I would like to
have a free weekend.

As far as the speed comparison is concerned I totally agree that NumPy
can easily outperform Matlab in most cases. Of course one can use
compiled low-level extensions to speed up specific computations in
Matlab, but it's a lot easier and/or cheaper to find very good tools
for Python.

> If anyone wonders why I think Travis Oliphant and the NumPy team should
> be knighted, then this is the answer.

Would love to see the ceremony:)

cheers,
fw

Sébastien Boisgérault

unread,

Nov 17, 2006, 2:38:02 PM11/17/06

to

On Nov 16, 10:46 pm, "John Henry" <john106he...@hotmail.com> wrote:
> Bill Gates will have you jailed! :-)
>
> On a more serious note, is there any alternative to Simulink though?

Ptolemy II. Java stuff in the core but components may be written in
Python

http://ptolemy.eecs.berkeley.edu/ptolemyII/
http://ptolemy.eecs.berkeley.edu/ptolemyii/ptII5.0/ptII/doc/codeDoc/ptolemy/actor/lib/python/PythonScript.htm

John Henry

unread,

Nov 17, 2006, 4:42:47 PM11/17/06

to

Thanks for pointing that out. I wasn't aware of this. Will take a
look.

Maarten van Reeuwijk

unread,

Nov 17, 2006, 7:15:48 PM11/17/06

to

sturlamolden wrote:

> Sorry Mathworks, I have used your product for years, but you cannot
> compete with NumPy.

Funny. I went exactly the other way. Had a full OO postprocessing library
for Python/Scipy/HDF etc which worked brilliantly. Then changed to a 64 bit
machine and spent three days trying to install all the additional libraries
that I had become dependent on over the year. In the end, I gave up
completely frustrated (something to do with a Fortran compiler in
combination with SciPy or something). Then I tried MATLAB and was
completely delighted that there was a program with all MY batteries
included :-).

BTW, I have the impression that MATLAB and SciPy have about the same
performance.

Cheers, Maarten
--
===================================================================
Maarten van Reeuwijk dept. of Multiscale Physics
Phd student Faculty of Applied Sciences
maarten.vanreeuwijk.net Delft University of Technology

Prabhu Ramachandran

unread,

Nov 18, 2006, 5:39:31 AM11/18/06

to SciPy Users List, pytho...@python.org, Matimus

>>>>> "Brian" == Brian Blais <bbl...@bryant.edu> writes:

Brian> 3) 3D plotting requires yet-another library. luckily I
Brian> haven't had to use this much, but I hope that someday that
Brian> it will be part of matplotlib.

I'd rather not say anything about this since I have strong opinions
about the subject.

Brian> 4) GUI development is a bit easier in Matlab for small
Brian> projects, and harder for larger ones. I like using wax in
Brian> python, which wraps wxPython. Others swear by Dabo, but I
Brian> always get two windows popping up when I run even the hello
Brian> world daboui program. At any rate, wxPython looks much
Brian> better on Windows than the same interface in Matlab, and it
Brian> is more robust across operating systems.

You should try and take a look at enthought.traits. It makes creating
UIs a lot easier than anything else I've seen.

cheers,
prabhu

sturlamolden

unread,

Nov 18, 2006, 1:57:00 PM11/18/06

to

Filip Wasilewski wrote:

> Actually you have not. The algorithm you presented gives completely
> wrong results. Have a look at quick&dirty(TM) implementation bellow.

God grief. I followed the implementation in Ingrid Daubechies' and Wim
Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:
247-269, 1998). If you look at the factorized polyphase matrix for D4
(which gives the inverse transform), their implementation of the
lifting steps the original paper is incorrect. That is scary.

A free PDF or PS of the paper is available here:

http://cm.bell-labs.com/who/wim/papers/factor/index.html

Thanks a lot!

Sturla Molden

sturlamolden

unread,

Nov 18, 2006, 2:45:27 PM11/18/06

to

sturlamolden wrote:

> God grief. I followed the implementation in Ingrid Daubechies' and Wim
> Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:
> 247-269, 1998). If you look at the factorized polyphase matrix for D4
> (which gives the inverse transform), their implementation of the
> lifting steps the original paper is incorrect. That is scary.
>
> A free PDF or PS of the paper is available here:
>
> http://cm.bell-labs.com/who/wim/papers/factor/index.html

Actually its more complicated than that. The PS and the PDF files are
different, and only the PS-file is confused. Here is the correct
explanation:

The factorization of the polyphase matrix is not unique. There are
several valid factorizations. Our implementations corresponds to
different factorizations of the analysis and synthesis poyphase
matrices, and both are in a sence correct, although numerically
different.

sturlamolden

unread,

Nov 19, 2006, 12:18:09 PM11/19/06

to

sturlamolden wrote:

> def D4_Transform(x, s1=None, d1=None, d2=None):
> """
> D4 Wavelet transform in NumPy
> (C) Sturla Molden
> """
> C1 = 1.7320508075688772
> C2 = 0.4330127018922193
> C3 = -0.066987298107780702
> C4 = 0.51763809020504137
> C5 = 1.9318516525781364
> if d1 == None:
> d1 = numpy.zeros(x.size/2)
> s1 = numpy.zeros(x.size/2)
> d2 = numpy.zeros(x.size/2)
> odd = x[1::2]
> even = x[:-1:2]
> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1] #typ0
> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1] #typo

> d2[0] = d1[0] + s1[-1]
> d2[1:] = d1[1:] + s1[:-1]
> even[:] = C4 * s1[:]
> odd[:] = C5 * d2[:]
> if x.size > 2:
> D4_Transform(even,s1[0:even.size/2],
> d1[0:even.size/2],d2[0:even.size/2])

Actually, there was a typo in the original code. I used d1[l-1] where I
should have used d1[l+1]. Arrgh. Here is the corrected version, the
Matlab code must be changed similarly. It has no relevance for the
performance timings though.

def D4_Transform(x, s1=None, d1=None, d2=None):
"""
D4 Wavelet transform in NumPy
(C) Sturla Molden
"""
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]

s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]

Filip Wasilewski

unread,

Nov 19, 2006, 3:28:56 PM11/19/06

to

sturlamolden wrote:
[...]

> Here is the correct explanation:
>
> The factorization of the polyphase matrix is not unique. There are
> several valid factorizations. Our implementations corresponds to
> different factorizations of the analysis and synthesis poyphase
> matrices, and both are in a sence correct, although numerically
> different.

Please correct me if I'm missing something but I'm pretty sure that
choice of polyphase matrix factorization for given wavelet filter bank
has influence only on speed and numerical stability of computation and
not on the result itself. Particularly it should be possible to
reconstruct analysis and synthesis filters from polyphase matrices
regardless of the chosen factorization and both the discrete wavelet
transform and the wavelet lifting scheme should give corresponding
results for chosen wavelet (one can be rewritten in the form of other).

cheers,
fw

Filip Wasilewski

unread,

Nov 19, 2006, 3:33:27 PM11/19/06

to

If you swap C4 with C5 then our solutions give identical results and
both match the classic dwt approach:

even[:] = C5 * s1[:]
odd[:] = C4 * d2[:]

> if x.size > 2:
> D4_Transform(even,s1[0:even.size/2],
> d1[0:even.size/2],d2[0:even.size/2])

cheers,
fw

Phil Schmidt

unread,

Nov 23, 2006, 8:18:02 PM11/23/06

to

Brian Blais wrote:

> So my recommendation for a (nearly) complete Matlab replacement would be:
> python
> numpy
> scipy
> matplotlib
> pyrex
>

Brian,

Thanks for that list. I'm currently in the process of getting quotes
for a bunch of Matlab tools for hardware-in-the-loop simulation. Big
bucks.

I'd love to use Python, but I'm not comfortable with the hardware side
of that. I'm certain that most, if not all data acquisition hardware
comes with DLL drivers, which I could interface with using ctypes. I'm
concerned though about spending more time messing around with the
hardware interface than it's worth.

Do you have any experience with this side of the Matlab replacement
question? How about anyone else? Any recommendations?

Thanks,
Phil

Brian Blais

unread,

Nov 25, 2006, 11:57:30 AM11/25/06

to Phil Schmidt, pytho...@python.org

Phil Schmidt wrote:
>
> I'd love to use Python, but I'm not comfortable with the hardware side
> of that. I'm certain that most, if not all data acquisition hardware
> comes with DLL drivers, which I could interface with using ctypes. I'm
> concerned though about spending more time messing around with the
> hardware interface than it's worth.
>
> Do you have any experience with this side of the Matlab replacement
> question? How about anyone else? Any recommendations?
>

Unfortunately, no I don't have experience with the hardware end of things. One can,
both with c-types and with pyrex, very easily use existing C libraries so if there is
a C-API, that shouldn't be a problem.

Sorry for not being much help!

bb

sturlamolden

unread,

Nov 25, 2006, 8:29:22 PM11/25/06

to

Phil Schmidt wrote:

> Thanks for that list. I'm currently in the process of getting quotes
> for a bunch of Matlab tools for hardware-in-the-loop simulation. Big
> bucks.

Yup, and better spent elsewhere...

> I'd love to use Python, but I'm not comfortable with the hardware side
> of that. I'm certain that most, if not all data acquisition hardware
> comes with DLL drivers, which I could interface with using ctypes. I'm
> concerned though about spending more time messing around with the
> hardware interface than it's worth.

Usually you have to mess equally much with Matlab, writing CMEX
interfaces between the DLLs and Matlab. And afterwards you get the
headache of Matlab's single-threaded environment and horrible
pass-by-value sematics.

> Do you have any experience with this side of the Matlab replacement
> question? How about anyone else? Any recommendations?

If you are afraid of doing some C coding or using ctypes, I'd say go
for LabView. When it comes to data acquisition, LabView is far superior
to Matlab. And data acquisition hardware usually comes with LabView
drivers ready to run. LabView can also compile programs that you can
run on real-time OS'es and common DSP chips, so you will not need to
port your code to these targets if you need hard real-time constraints
in your system.

First find out what drivers or APIs are supplied with the hardware.
Then do the decision of which language to use - including Python,
Matlab, LabView, Java, C# .NET, C or C++. I would in any case get a
quote for LabView as well, i does not cost you anything just to obtain
the quote. Generally, development time is far more expensive than
licenses, even with the ultra-expensive Matlab and LabView software.
Using Python just for the sake of using Python is silly.

Phil Schmidt

unread,

Nov 26, 2006, 12:23:30 AM11/26/06

to

sturlamolden wrote:

> Using Python just for the sake of using Python is silly.

Well, that kind of gets right to my point. Does the "added" effort with
Python to interface with data acquisition hardware really result in
less productivity? I am very familiar with Matlab, Labview, and Python,
and frankly, Python is the most productive and powerful programming
language of the three. But it's the hardware compatibility thing that
concerns me with Python.

It seems the answers are hard to come by on this issue. I sure would be
willing to give it a try, except that I'm getting paid to get a job
done, not to tinker around with Python and DAQ hardware. But if
tinkering around could save my project money on commercial software,
and still get it done on schedule, it would be a big win. I just don't
have confidence that it would.

Phil

Robert Kern

unread,

Nov 26, 2006, 12:52:19 AM11/26/06

to pytho...@python.org

Phil Schmidt wrote:
> sturlamolden wrote:
>
>> Using Python just for the sake of using Python is silly.
>
> Well, that kind of gets right to my point. Does the "added" effort with
> Python to interface with data acquisition hardware really result in
> less productivity? I am very familiar with Matlab, Labview, and Python,
> and frankly, Python is the most productive and powerful programming
> language of the three. But it's the hardware compatibility thing that
> concerns me with Python.

Without knowing the hardware that you want to use, it's quite difficult for us
to tell you anything useful. Most likely, you're simply going to have to sit
down with a list of the hardware that you want to use, the existing solutions
for interfacing with that hardware (e.g. already written MATLAB, LabView, or
Python extension modules for that hardware), and their costs. You're going to
have to weigh those costs against the cost of using ctypes or something similar
to wrap the raw DLLs if a Python extension module isn't available.

We might be able to help you locate the already-existing Python wrappers for
your hardware, but the rest are judgement calls that only you can make. To make
an accurate judgement, you will probably need to have a little bit of experience
trying to wrap one such raw DLL. If you can't afford the time to do that one
experiment, then you probably have your answer.

sturlamolden

unread,

Nov 26, 2006, 12:43:48 PM11/26/06

to

Phil Schmidt wrote:

> Well, that kind of gets right to my point. Does the "added" effort with
> Python to interface with data acquisition hardware really result in
> less productivity? I am very familiar with Matlab, Labview, and Python,
> and frankly, Python is the most productive and powerful programming
> language of the three. But it's the hardware compatibility thing that
> concerns me with Python.

This is really a question that has no general answer. It depends on the
specific hardware and the solutions available.

Generally, Matlab and Python sucks equally much when it comes to
interfacing with hardware. Both tycan use C extensions (which you may
have to write) or ctypes (Matlab's equivalent is the loadlibrary/callib
functions). If only a C SDK is available, SWIG can autogenerate Python
wrappers (require some tweking). Matlab's loadlibrary can sometimes
parse C header files, but it is not always successful.

Both Matlab and Python can use COM/ActiveX objects on Windows. In
addition, Matlab can use Java classes, with which Python does not
interface easily. On the other hand, Python can use .NET objects, if
you use Microsoft's IronPython or "Python for .NET" with classic
Python. Last time I checked, Matlab could not use .NET objects, but
that was a year ago.

Most data acquisition hardware are supplied with SDKs for C, and
perhaps COM/ActiveX objects for RAD tools like Visual Basic and Delphi.
Modern hardware is often supplied with .NET SDKs. LabView drivers are
also commonly supplied. If none are supplied by the vendor, there are
often prewritten solutions for Matlab, Python or Java available on the
Internet, but you will have to search for them, or write your own.

Thus, it is not generally true that it is easier to interface hardware
with Matlab than Python. It depens on what SDKs you get.

On the programming side, Python is a general purpose language, Matlab
is designed for working with matrices easily. Array and matrix
computation in Python require NumPy, which has recently been declared
stable. Matlab has been stable for quite a while. Python is
multi-threaded, Matlab is single-threaded. Matlab uses pass-by-value
even for the largest of arrays, Python always pass references.

With Python, you can program with multiple threads; with Matlab, you
have everything in one thread. As Matlab only supports synchronous i/o
out of the box, you cannot do anything else while Matlab's fwrite or
fread functions are 'blocking' (i.e. waiting for data from hardware or
writing to disk). This is Matlab's major disadvantage for data
acquisition applications. Thus to make your data acquisition run
smoothly and efficiently with Matlab, you must resort to writing CMEX
extensions for performing asynchronous I/O or spawn off threads. Have
you ever looked at overlapped i/o in the Windows API? I can guarantee
that it is hundred times more complex than anything a hardware vendor
can throw at you. Alternatively, you can code the i/o in Java, and make
have Matlab call Java for i/o tasks. With Python, this is a non-issue.
You can put blocking i/o in separate threads or use prewritten APIs for
non-blocking i/o. It is obvious which is the better option here.
Generally, non-blocking i/o and multithreading is essential for any
serious data acqusition application, and Matlab does not support any of
these.

Both Matlab and Python is interpreted languages. Matlab has a
rudimentary JIT compiler, whilst classic Python is interpreted only.
Microsoft's IronPython uses the .NET JIT compiler. Jython uses Java's
JIT compiler (run the Server JVM or IBM's JDK for best performance,
don't use the client JVM with hotspot). However, if you use IronPython
or Jython, NumPy is out of the question. Thus, Matlab has perhaps a
slight advantage here.

With Python you can inline C in your Python code (e.g. using SciPy and
Weave), and even inline assembly in the inlined C. Thus you can easily
deal with hotspots in your Python code. You cannot do this easily with
Matlab.

For these reasons, my general view is that Python is better for data
acquisition than Matlab. But using Python for the sake of using Python
is nevertheless silly. The same holds true for Matlab.

A data acquisition program ofte require fast graphics as well. For slow
2D plots, Python and Matlab perform equally well (Python requires
Matplotlib). Matlab can draw 3D graphs, but not quickly. Python require
VTK for this, which may be harder to use. For fast 2D or 3D graphics,
i.e. DirectDraw, Direct3D or OpenGL, Python is the only game in town.
DirectDraw is used by SDL, which in turn is used by PyGame. Direct3D is
available though DirectPython (or COM objects or .NET objects). OpenGL
is available through PyOpenGL, which has recently been redesigned to
use ctypes; previously the build process for PyOpenGL used to be a
nightmare. For GUI, Matlab only has rudementary capabilities. Python
has several high quality GUI libraries (not counting tkinter). You can
create multimedia applications with PyGame, you cannot do this easily
with Matlab. E.g. do you want to play sound while you record?

> It seems the answers are hard to come by on this issue. I sure would be
> willing to give it a try, except that I'm getting paid to get a job
> done, not to tinker around with Python and DAQ hardware. But if
> tinkering around could save my project money on commercial software,
> and still get it done on schedule, it would be a big win. I just don't
> have confidence that it would.

That you will have to figure out your self. The answer is not obvious.
But for GUI, forget about tkinter.

Rob Purser

unread,

Nov 27, 2006, 10:16:54 AM11/27/06

to

Hi all,

I'm not going to touch the big picture issues here -- you need to pick the
right tool for the job you're doing, and only you know what works best for
your task. However, since it didn't come up, I feel I need to add a piece
of info to the mix, since I spend my days getting MATLAB working with data
acquisition hardware. To be clear, I'm the development lead for Data
Acquisition Toolbox, which is MathWorks' add on product to MATLAB and
Simulink to handle all the issues associated with interfacing data
acquisition hardware without all the CMEX work, threading, DLLs, etc. As
Sturla points out, that's not trivial work in any language.

Anyway, I just wanted to call your attention to Data Acquisition Toolbox:
http://www.mathworks.com/products/daq/

I'm not sure what hardware you're using, Phil, so I can't say whether we
support it. We do add support for new hardware all the time, so I'd be
interested in hearing about your application.

All the best,
-Rob

--
Rob Purser
Senior Team Lead, Connectivity Products
The MathWorks
rob.p...@mathworks.com

"sturlamolden" <sturla...@yahoo.no> wrote in message
news:1164504562.8...@l39g2000cwd.googlegroups.com...

sturlamolden

unread,

Nov 27, 2006, 7:12:17 PM11/27/06

to

Rob Purser wrote:

> Anyway, I just wanted to call your attention to Data Acquisition Toolbox:
> http://www.mathworks.com/products/daq/

Absolutely. If the hardware is supported by this toolbox, there is no
need to reinvent the wheel. The license is expensive, but development
time can be far more expensive. Matlab is an excellent tool for
numerical work.

However, Matlab is only single-threaded and does not optimize its
pass-by-value semantics with proper 'escape analysis'. (Yes, Matlab
does lazy copy-on-write to optimize function calls, but functions have
return values as well.) I have (temporarily?) stopped using Matlab
because I constantly got annoyed by the limitations of the language. I
still use Matlab as an advanced desktop calculator though. Somehow I
find it easier to type in an expression in Matlab than NumPy, probably
because I am more accustomed to the syntax.

Message has been deleted

aap

unread,

Nov 29, 2006, 6:52:05 PM11/29/06

to

John Henry wrote:
> Bill Gates will have you jailed! :-)
>
> On a more serious note, is there any alternative to Simulink though?
>

It's called SciCos, and as far as I've seen it not only covers Simulink
but also PowerSim.

I found only 1 major disadvantage inSciLab, ...
... it has no ActiveX

cheers,
Stef Mientki

Bill Maxwell

unread,

Nov 29, 2006, 11:44:45 PM11/29/06

to

On 16 Nov 2006 13:09:03 -0800, "sturlamolden" <sturla...@yahoo.no>
wrote:

...SNIP...

>To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet
>transform. I have here coded it in Matlab and Python/NumPy using Tim
>Swelden's lifting scheme.
>

>First the Matlab version (D4_Transform.m):
>
>function x = D4_Transform(x)
> % D4 Wavelet transform in Matlab

> % (C) Sturla Molden

> C1 = 1.7320508075688772;
> C2 = 0.4330127018922193;
> C3 = -0.066987298107780702;
> C4 = 0.51763809020504137;

> C5 = 1.9318516525781364;
> s1 = zeros(ceil(size(x)/2));
> d1 = zeros(ceil(size(x)/2));
> d2 = zeros(ceil(size(x)/2));
> odd = x(2:2:end);
> even = x(1:2:end-1);
> d1(:) = odd - C2*even;
> s1(1) = even(1) + C2*d1(1) + C3*d1(end);
> s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);
> d2(1) = d1(1) + s1(end);
> d2(2:end) = d1(2:end) + s1(1:end-1);
> x(1:2:end-1) = C4*s1;
> x(2:2:end) = C5*d2;
> if (length(x) >2)
> x(1:2:end-1) = D4_Transform(x(1:2:end-1));
> end
>
>
>Then the Python version (D4.py):
>
>import numpy
>import time
>

>def D4_Transform(x, s1=None, d1=None, d2=None):
> """
> D4 Wavelet transform in NumPy
> (C) Sturla Molden
> """
> C1 = 1.7320508075688772
> C2 = 0.4330127018922193
> C3 = -0.066987298107780702
> C4 = 0.51763809020504137
> C5 = 1.9318516525781364
> if d1 == None:
> d1 = numpy.zeros(x.size/2)
> s1 = numpy.zeros(x.size/2)
> d2 = numpy.zeros(x.size/2)
> odd = x[1::2]
> even = x[:-1:2]
> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

> d2[0] = d1[0] + s1[-1]
> d2[1:] = d1[1:] + s1[:-1]
> even[:] = C4 * s1[:]
> odd[:] = C5 * d2[:]

> if x.size > 2:
>

>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])
>
>if __name__ == "__main__":
> x = numpy.random.rand(2**23)
> t0 = time.clock()
> D4_Transform(x)
> t = time.clock()
> print "Elapsed time is %.6f seconds" % (t-t0)
>

...SNIP...

I couldn't help but notice that the following two lines from the above
code are doing different things, but I think they are supposed to do the
same things (the first is the Matlab version, the 2nd is the Python
version):

> d1(:) = odd - C2*even;

> d1[:] = odd[:] - C1*even[:]

I don't understand what this code is doing, so I don't know which is
wrong. I don't even know if this is the actual code you are using. Just
thought I'd mention it.

Bill

Jon Harrop

unread,

Dec 2, 2006, 2:44:07 AM12/2/06

to

I don't know Python but this benchmark caught my eye.

>>def D4_Transform(x, s1=None, d1=None, d2=None):
>> """
>> D4 Wavelet transform in NumPy
>> (C) Sturla Molden
>> """
>> C1 = 1.7320508075688772
>> C2 = 0.4330127018922193
>> C3 = -0.066987298107780702
>> C4 = 0.51763809020504137
>> C5 = 1.9318516525781364
>> if d1 == None:
>> d1 = numpy.zeros(x.size/2)
>> s1 = numpy.zeros(x.size/2)
>> d2 = numpy.zeros(x.size/2)

Are these definitions ever used? It looks like s1, d1 and d2 are all
redefined below without reference to these previous values.

>> odd = x[1::2]
>> even = x[:-1:2]
>> d1[:] = odd[:] - C1*even[:]
>> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
>> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
>> d2[0] = d1[0] + s1[-1]
>> d2[1:] = d1[1:] + s1[:-1]
>> even[:] = C4 * s1[:]
>> odd[:] = C5 * d2[:]

Does that line create an array that is never used? If so, is C5 also never
used?

>> if x.size > 2:
>>
>>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

What is the result of this function?

I'm interested in translating this function into other languages, like
OCaml, to see how good Python's performance is (I am amazed it can beat
Matlab, not that I've used Matlab). In particular, I think you are eagerly
allocating arrays when, in a functional language, you could just as easily
compose closures.

For example, this program is 3x faster than the Python on my machine:

let rec d4 s1 d1 d2 x =
let c1 = 1.7320508075688772 in
let c2 = 0.4330127018922193 in
let c3 = -0.066987298107780702 in
let c4 = 0.51763809020504137 in
let c5 = 1.9318516525781364 in
let n = Array.length x in
let odd i = x.(2*i) and even i = x.(2*i + 1) in
let d1 i = odd i -. c1 *. even i in
let f = function -1 -> n/2 - 1 | i -> i in
let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in
let d2 i = d1 i +. s1 (f(i-1)) in
let even = Array.init (n/2) (fun i -> c4 *. s1 i) in
if n > 2 then d4 s1 d1 d2 even else s1, d1, d2

but I'm not sure it is correct!

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Carl Banks

unread,

Dec 2, 2006, 12:20:56 PM12/2/06

to

Jon Harrop wrote:
> I don't know Python but this benchmark caught my eye.
>
> >>def D4_Transform(x, s1=None, d1=None, d2=None):
> >> """
> >> D4 Wavelet transform in NumPy
> >> (C) Sturla Molden
> >> """
> >> C1 = 1.7320508075688772
> >> C2 = 0.4330127018922193
> >> C3 = -0.066987298107780702
> >> C4 = 0.51763809020504137
> >> C5 = 1.9318516525781364
> >> if d1 == None:
> >> d1 = numpy.zeros(x.size/2)
> >> s1 = numpy.zeros(x.size/2)
> >> d2 = numpy.zeros(x.size/2)
>
> Are these definitions ever used? It looks like s1, d1 and d2 are all
> redefined below without reference to these previous values.

No, they're never redefined (even in the recursive version). Slices of
them are reassigned. That's a big difference.

> >> odd = x[1::2]
> >> even = x[:-1:2]
> >> d1[:] = odd[:] - C1*even[:]
> >> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]
> >> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]
> >> d2[0] = d1[0] + s1[-1]
> >> d2[1:] = d1[1:] + s1[:-1]
> >> even[:] = C4 * s1[:]
> >> odd[:] = C5 * d2[:]
>
> Does that line create an array that is never used? If so, is C5 also never
> used?

No it doesn't create a new array. But it does have an effect.
Explained below.

(Actually, it does create a temporary array to store the intermediate
result, but that array does not get bound to odd.)

> >> if x.size > 2:
> >>
> >>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])
>
> What is the result of this function?

It modifies the array even in place. This also has an effect.

> I'm interested in translating this function into other languages, like
> OCaml, to see how good Python's performance is (I am amazed it can beat
> Matlab, not that I've used Matlab).

Matlab has a few *cough* limitations when it comes to hand-optimizing.
When writing naive code, Matlab often is faster than Python with numpy
because it has many commerical man-year of optimizing behind it.
However, Matlab helps v

> In particular, I think you are eagerly
> allocating arrays when, in a functional language, you could just as easily
> compose closures.

You are completely wrong.

What you seem to be missing is that slice assignment shares data. For
example, the statement "odd = x[1::2]" does NOT create a new array. It
takes a slice of x's data (in this case, the odd elements of x: slices
need not be contiguous), and binds it to the symbol odd. The array
elements are shared; changing elements in odd also changes elements of
x.

So later on, the statement "odd[:] = C5*d2[:]" reassigns the elements
of odd, which, because the array elements are shared, also reassigns
the odd elements in x.

(Notice the difference between "odd[:]=" and "odd=". "odd[:]=" is
slice assignment; it changes the elements in the given slice of odd, in
this case the whole array. "odd=" rebinds the symbol odd to a
different object, which could be a new array, a slice of an existing
array, or any other Python object.)

This data sharing more or less accomplishes the same thing that the
"closures" you speak of accomplish (in this case), only without the
functional.

> For example, this program is 3x faster than the Python on my machine:
>
> let rec d4 s1 d1 d2 x =
> let c1 = 1.7320508075688772 in
> let c2 = 0.4330127018922193 in
> let c3 = -0.066987298107780702 in
> let c4 = 0.51763809020504137 in
> let c5 = 1.9318516525781364 in
> let n = Array.length x in
> let odd i = x.(2*i) and even i = x.(2*i + 1) in
> let d1 i = odd i -. c1 *. even i in
> let f = function -1 -> n/2 - 1 | i -> i in
> let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in
> let d2 i = d1 i +. s1 (f(i-1)) in
> let even = Array.init (n/2) (fun i -> c4 *. s1 i) in
> if n > 2 then d4 s1 d1 d2 even else s1, d1, d2
>
> but I'm not sure it is correct!

It's not correct, but what you left out is probably low cost.

OCaml is compiled to machine code, right? And types can be inferred at
compile time, correct? Well then of course it's faster. It seems to
me a big help is the ability to fold multiple array operations into a
single loop, which is optimization a dynamically-typed language like
Python can't easily make. (It'd require are really smart JIT compiler
or some concessions in dynamicity.)

Carl Banks

unread,

Dec 2, 2006, 12:28:52 PM12/2/06

to

Carl Banks wrote:
> Matlab has a few *cough* limitations when it comes to hand-optimizing.
> When writing naive code, Matlab often is faster than Python with numpy
> because it has many commerical man-year of optimizing behind it.
> However, Matlab helps v

That should say:

However, Matlab helps very little when you need to speed things up, or
(especially) use less memory than the naive way.

Carl Banks

Jon Harrop

unread,

Dec 3, 2006, 12:00:12 AM12/3/06

to

Carl Banks wrote:
> No, they're never redefined (even in the recursive version). Slices of
> them are reassigned. That's a big difference.

I see.

> (Actually, it does create a temporary array to store the intermediate
> result, but that array does not get bound to odd.)

Sure.

>> In particular, I think you are eagerly
>> allocating arrays when, in a functional language, you could just as
>> easily compose closures.
>
> You are completely wrong.

I'll give an example. If you write the Python:

a[:] = b[:] + c[:] + d[:]

I think that is equivalent to the ML:

fill a (map2 ( + ) (map2 ( + ) b c) d)

which can be deforested in ML to avoid the creation of the intermediate
result b[:] + c[:] by using a closure to add three values at once:

fill a (map3 (fun b c d -> b + c + d) b c d)

which will be much faster because it doesn't generate an intermediate array.

> This data sharing more or less accomplishes the same thing that the
> "closures" you speak of accomplish (in this case), only without the
> functional.

The closure avoided the intermediate result. You said that the Python isn't
doing that?

> It's not correct, but what you left out is probably low cost.
>
> OCaml is compiled to machine code, right?

Yes.

> And types can be inferred at compile time, correct?

Types are completely inferred at compile time.

> Well then of course it's faster.

Yes. So this doesn't seem to be a killer example of numpy, although I am
still amazed that it can outperform Matlab.

> It seems to
> me a big help is the ability to fold multiple array operations into a
> single loop, which is optimization a dynamically-typed language like
> Python can't easily make. (It'd require are really smart JIT compiler
> or some concessions in dynamicity.)

Writing a JIT to compile this kind of stuff is easy. My point is that this
is fundamentally bad code, so why bother trying to write a Python JIT? Why
not just write in a better language for this task? Optimising within a
fundamentally slow language seems silly to me.

Carl Banks

unread,

Dec 3, 2006, 10:58:52 AM12/3/06

to

Jon Harrop wrote:
> >> In particular, I think you are eagerly
> >> allocating arrays when, in a functional language, you could just as
> >> easily compose closures.
> >
> > You are completely wrong.
>
> I'll give an example. If you write the Python:
>
> a[:] = b[:] + c[:] + d[:]
>
> I think that is equivalent to the ML:
>
> fill a (map2 ( + ) (map2 ( + ) b c) d)
>
> which can be deforested in ML to avoid the creation of the intermediate
> result b[:] + c[:] by using a closure to add three values at once:
>
> fill a (map3 (fun b c d -> b + c + d) b c d)
>
> which will be much faster because it doesn't generate an intermediate array.

Ah, but, this wasn't about temporaries when you spoke of "eagerly
allocating arrays", was it? But yes, you're right, in this example
temporary arrays are created; arrays that Ocaml would not need. (I
don't exactly understand why you'd need a functional language to get
this optimization, though.)

You might have guessed that you can get rid of most temporary arrays,
at the expense of readability, with numpy. For example, replacing

d1[:] = odd[:] - C1*even[:]

with

numpy.multiply(-C1,even,d1)
numpy.add(d1,odd,d1)

eliminates the intermediate. No unnecessary array allocations; no
closures necessary.

I'm curious whether this shared-slicing isn't, in some ways,
advantageous over the Ocaml way. I presume that in Ocaml, the way
you'd "share" array data is to create a closure can apply some
operation to selected elements of the array, returning the result. But
could this as easily modify the array in-place? I presume a good
compiler could be able to inline the function and optimize the call to
an in-place operation, but I'm not sure how well this works in
practice.

Here's a version of D4_Transform that uses no temporary arrays (aside
from the work arrays s1, d1, and d2, which are allocated only once).
It's about 40% faster for me than the one with infix operations. I'd
be curious how it compares to a correct Ocaml version. (I'd still
expect Ocaml to be at least twice as fast.)

def alt_D4_Transform(x, s1=None, d1=None, d2=None):
add = numpy.add
multiply = numpy.multiply

C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:

d1 = numpy.zeros(x.size/2,numpy.Float)
s1 = numpy.zeros(x.size/2,numpy.Float)
d2 = numpy.zeros(x.size/2,numpy.Float)

odd = x[1::2]
even = x[:-1:2]

multiply(-C1,even,d1)
add(d1,odd,d1)
multiply(C2,d1,s1)
add(s1,even,s1)
d2[0] = C3 * d1[-1]
multiply(C3,d1[:-1],d2[1:])
add(s1,d2,s1)

d2[0] = d1[0] + s1[-1]

add(d1[1:],s1[:-1],d2[1:])
multiply(C4,s1,even)
multiply(C5,d2,odd)
if x.size > 2:

alt_D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

> > It seems to
> > me a big help is the ability to fold multiple array operations into a
> > single loop, which is optimization a dynamically-typed language like
> > Python can't easily make. (It'd require are really smart JIT compiler
> > or some concessions in dynamicity.)
>
> Writing a JIT to compile this kind of stuff is easy.

Eh, getting a JIT to do the optimization I spoke of in Python is not
easy. It would be relatively easy in a statically typed language. In
Python, it'd be tantamount to rewriting a dynamically reconfigurable
version of numpy--you'd have to duplicate numpy's complex
type-dispatching rules.

> My point is that this
> is fundamentally bad code,

Whoa, there. I realize that for people who prefer functional
programming, with their recursively reductionist way of thinking, it
might seem as if leaving any sort of reducibility in final result is
"fundamentally bad", but that's a pretty strong thing to say.

> so why bother trying to write a Python JIT? Why
> not just write in a better language for this task? Optimising within a
> fundamentally slow language seems silly to me.

Because, for most people, language choice is not a greedy maximizing of
a single issue. Nobody who uses numpy is under the impression that it
can match a statically-typed language that is compiled to machine code
in peak performance. (Matlab is fair game, of course.) But speed is
not the only important thing.

I use Python because it's a excellently designed language that fits my
overall needs, and numpy because sometimes I need more speed than
vanilla Python. Only when speed is critical (for example, 3D collision
detection) do I write an extension in a "better language for the task"
(i.e. C).

This is something that's quite popular in the numerically-intensive
computing community, BTW. Many people use Python to handle boring
stuff like file I/O, memory managment, and non-critical numerical
calculations, and write C or Fortran extensions to do the
numerically-intensive stuff.

Carl Banks

Jon Harrop

unread,

Dec 3, 2006, 12:01:27 PM12/3/06

to

Carl Banks wrote:
>> fill a (map3 (fun b c d -> b + c + d) b c d)
>>
>> which will be much faster because it doesn't generate an intermediate
>> array.
>
> Ah, but, this wasn't about temporaries when you spoke of "eagerly
> allocating arrays", was it?

I had thought that all of the array operations were allocating new arrays at
first but it seems that at least assignment to a slice does not.

Does:

a[:] = b[:] + c[:]

allocate a temporary for b[:] + c[:]?

> But yes, you're right, in this example
> temporary arrays are created; arrays that Ocaml would not need. (I
> don't exactly understand why you'd need a functional language to get
> this optimization, though.)

I thought that functional programming could get you both the brevity of
Python's slicing and the performance of C's compilation but I was wrong.
You can get the brevity of the Python approach in C.

> You might have guessed that you can get rid of most temporary arrays,
> at the expense of readability, with numpy. For example, replacing
>
> d1[:] = odd[:] - C1*even[:]
>
> with
>
> numpy.multiply(-C1,even,d1)
> numpy.add(d1,odd,d1)
>
> eliminates the intermediate. No unnecessary array allocations; no
> closures necessary.

Right. Performance will probably go from 5x slower to 2x slower because
you're traversing the arrays twice instead of once.

> I'm curious whether this shared-slicing isn't, in some ways,
> advantageous over the Ocaml way.

That's exactly what I thought to start with. The author of F# is working on
getting slicing into his language but the use of slicing in this Python
benchmark corrupted my fragile little mind. Slicing is a terrible way to
approach this problem if you're using a compiled language like F#.

I first wrote an OCaml translation of the Python and wrote my own
little "slice" implementation. I have since looked up a C++ solution and
translated that into OCaml instead:

let rec d4_aux a n =
let n2 = n lsr 1 in
let tmp = Array.make n 0. in
for i=0 to n2-2 do
tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
done;
tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
Array.blit tmp 0 a 0 n;
if n > 4 then d4_aux a (n lsr 1)

let d4 a = d4_aux a (Array.length a)

Not only is that shorter than the Python, it is much faster:

0.56s C++ (direct arrays)
0.61s F# (direct arrays)
0.62s OCaml (direct arrays)
1.38s OCaml (slices)
2.38s Python (slices)
10s Mathematica 5.1

Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
a[i]).

> I presume that in Ocaml, the way
> you'd "share" array data is to create a closure can apply some
> operation to selected elements of the array, returning the result.

Yes. That is certainly one way of doing it.

> But could this as easily modify the array in-place?

Absolutely. A closure would capture a reference to the array. Arrays are
mutable so you could alter the array from within the closure. In F# you
could even execute the closures concurrently to alter different parts of
the same array at the same time. I just tried that and it is actually
slower to multithread this.

> I presume a good
> compiler could be able to inline the function and optimize the call to
> an in-place operation, but I'm not sure how well this works in
> practice.

Surprisingly well it seems: F# and OCaml are almost as fast as C/C++!

> Here's a version of D4_Transform that uses no temporary arrays (aside
> from the work arrays s1, d1, and d2, which are allocated only once).
> It's about 40% faster for me than the one with infix operations. I'd
> be curious how it compares to a correct Ocaml version. (I'd still
> expect Ocaml to be at least twice as fast.)

I get:

1.57s Python (in-place)

>> > It seems to
>> > me a big help is the ability to fold multiple array operations into a
>> > single loop, which is optimization a dynamically-typed language like
>> > Python can't easily make. (It'd require are really smart JIT compiler
>> > or some concessions in dynamicity.)
>>
>> Writing a JIT to compile this kind of stuff is easy.
>
> Eh, getting a JIT to do the optimization I spoke of in Python is not
> easy. It would be relatively easy in a statically typed language. In
> Python, it'd be tantamount to rewriting a dynamically reconfigurable
> version of numpy--you'd have to duplicate numpy's complex
> type-dispatching rules.

There aren't any complicated types in the above code. In fact, there are
only two types: float and float array. Type checking is easy in this case,
compilation to C is also easy, then you just dispatch to the compiled C
code when the types are ok. You would want to write your Python code like C
code but that is shorter in this case. You may also want to flag code for
compilation manually in order to avoid the overhead of compiling code
unnecessarily.

>> My point is that this is fundamentally bad code,
>
> Whoa, there. I realize that for people who prefer functional
> programming, with their recursively reductionist way of thinking, it
> might seem as if leaving any sort of reducibility in final result is
> "fundamentally bad", but that's a pretty strong thing to say.

I was referring to the slicing specifically, nothing to do with functional
programming. My F# and OCaml code are now basically identical to the C++
code.

>> so why bother trying to write a Python JIT? Why
>> not just write in a better language for this task? Optimising within a
>> fundamentally slow language seems silly to me.
>
> Because, for most people, language choice is not a greedy maximizing of
> a single issue. Nobody who uses numpy is under the impression that it
> can match a statically-typed language that is compiled to machine code
> in peak performance. (Matlab is fair game, of course.) But speed is
> not the only important thing.

In this specific context (discrete wavelet transform benchmark), I'd have
said that speed was the most important thing after correctness.

> I use Python because it's a excellently designed language that fits my
> overall needs, and numpy because sometimes I need more speed than
> vanilla Python. Only when speed is critical (for example, 3D collision
> detection) do I write an extension in a "better language for the task"
> (i.e. C).

Have you looked at languages like F#, OCaml, Haskell, SML and so on? They
seem to offer the benefits of both worlds.

> This is something that's quite popular in the numerically-intensive
> computing community, BTW. Many people use Python to handle boring
> stuff like file I/O, memory managment, and non-critical numerical
> calculations, and write C or Fortran extensions to do the
> numerically-intensive stuff.

Yes, I appreciate that Python is hugely popular, I just don't understand why
it is quite so popular.

I found a series of lectures on VPython, showing some amazing stuff:

http://showmedo.com/videos/series?name=pythonThompsonVPythonSeries

I just found some more interesting stuff here:

http://www.phy.syr.edu/~salgado/software/vpython/

This is really great work but I can't help but wonder why the authors chose
to use Python when other languages seem better suited. I'd like to work on
raising people's awareness of these alternatives, and probably create some
useful tools in the process. So I'm keen to learn what Python programmers
would want/expect from F# and OCaml.

What would it take to make you convert?

Carl Banks

unread,

Dec 3, 2006, 1:05:17 PM12/3/06

to

Jon Harrop wrote:
> I had thought that all of the array operations were allocating new arrays at
> first but it seems that at least assignment to a slice does not.
>
> Does:
>
> a[:] = b[:] + c[:]
>
> allocate a temporary for b[:] + c[:]?

Yep.

[snip]

> Not only is that shorter than the Python, it is much faster:
>
> 0.56s C++ (direct arrays)
> 0.61s F# (direct arrays)
> 0.62s OCaml (direct arrays)
> 1.38s OCaml (slices)
> 2.38s Python (slices)
> 10s Mathematica 5.1

[snip]
> 1.57s Python (in-place)

So,
optimized Python is roughly the same speed as naive Ocaml
optimized Ocaml is roughly the same speed as C++

> >> > It seems to
> >> > me a big help is the ability to fold multiple array operations into a
> >> > single loop, which is optimization a dynamically-typed language like
> >> > Python can't easily make. (It'd require are really smart JIT compiler
> >> > or some concessions in dynamicity.)
> >>
> >> Writing a JIT to compile this kind of stuff is easy.
> >
> > Eh, getting a JIT to do the optimization I spoke of in Python is not
> > easy. It would be relatively easy in a statically typed language. In
> > Python, it'd be tantamount to rewriting a dynamically reconfigurable
> > version of numpy--you'd have to duplicate numpy's complex
> > type-dispatching rules.
>
> There aren't any complicated types in the above code. In fact, there are
> only two types: float and float array.

You're vastly underestimating the complexity of numpy objects. They
have an awful lot going on under the covers to make it look simple on
the surface. There's all kinds of type-checking and type-conversions.
A JIT that folds loops together would have to have knowledge of that
process, and it's a lot of knowledge to have. That is not easy.

> Type checking is easy in this case,
> compilation to C is also easy, then you just dispatch to the compiled C
> code when the types are ok. You would want to write your Python code like C
> code but that is shorter in this case. You may also want to flag code for
> compilation manually in order to avoid the overhead of compiling code
> unnecessarily.
>
> >> My point is that this is fundamentally bad code,
> >
> > Whoa, there. I realize that for people who prefer functional
> > programming, with their recursively reductionist way of thinking, it
> > might seem as if leaving any sort of reducibility in final result is
> > "fundamentally bad", but that's a pretty strong thing to say.
>
> I was referring to the slicing specifically, nothing to do with functional
> programming. My F# and OCaml code are now basically identical to the C++
> code.

It is pretty strong thing to say that anything is fundamentally bad
just because it's not fast as something else. Fundamental badness
ought to run deeper than some superficial, linear measure.

> >> so why bother trying to write a Python JIT? Why
> >> not just write in a better language for this task? Optimising within a
> >> fundamentally slow language seems silly to me.
> >
> > Because, for most people, language choice is not a greedy maximizing of
> > a single issue. Nobody who uses numpy is under the impression that it
> > can match a statically-typed language that is compiled to machine code
> > in peak performance. (Matlab is fair game, of course.) But speed is
> > not the only important thing.
>
> In this specific context (discrete wavelet transform benchmark), I'd have
> said that speed was the most important thing after correctness.

So let's say I'm planning to write this complicated graphical program,
that does all sorts of stuff. I/O, networking, database, etc. It's a
medium throughput program, and a slow language like Python is
sufficient to run it, except for this one tiny part where I have to
transform of wave data.

Now, I was going to write the program in Python, and use the very
convenient numpy to do the transformation part. No problem, numpy
will be fast enough for my needs.

But wait! It's silly try to optimize code in a fundamentally slow
language! It's vanity to even try! I guess for the sake of getting
this little D4 transform to double in speed I'll scrap Python and write
the whole thing in OCaml.

(Or I can begrudingly admit that it's not really silly to try to
optimize a slow language.)

[snip]

> This is really great work but I can't help but wonder why the authors chose
> to use Python when other languages seem better suited. I'd like to work on
> raising people's awareness of these alternatives, and probably create some
> useful tools in the process. So I'm keen to learn what Python programmers
> would want/expect from F# and OCaml.
>
> What would it take to make you convert?

Them not being functional would be a good start....

Carl Banks

Jon Harrop

unread,

Dec 3, 2006, 2:43:05 PM12/3/06

to

Carl Banks wrote:
>> 0.56s C++ (direct arrays)
>> 0.61s F# (direct arrays)
>> 0.62s OCaml (direct arrays)
>> 1.38s OCaml (slices)
>> 2.38s Python (slices)
>> 10s Mathematica 5.1
> [snip]
>> 1.57s Python (in-place)
>
> So,
> optimized Python is roughly the same speed as naive Ocaml
> optimized Ocaml is roughly the same speed as C++

Absolutely not:

Optimized Python is 14% slower than badly written OCaml. Given the problem,
rather than the Python solution, nobody would write OCaml code like that.

Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest
Python so far whilst also requiring about half as much code.

Optimising the C++ by hoisting the O(log n) temporary array allocations into
one allocation makes it another 20% faster. I'm sure there are plenty more
optimisations...

>> There aren't any complicated types in the above code. In fact, there are
>> only two types: float and float array.
>
> You're vastly underestimating the complexity of numpy objects. They
> have an awful lot going on under the covers to make it look simple on
> the surface. There's all kinds of type-checking and type-conversions.
> A JIT that folds loops together would have to have knowledge of that
> process, and it's a lot of knowledge to have. That is not easy.

My suggestion doesn't really have anything to do with numpy. If you had such
a JIT you wouldn't use numpy in this case.

That's my point, using numpy encouraged the programmer to optimise in the
wrong direction in this case (to use slices instead of element-wise
operations).

>> I was referring to the slicing specifically, nothing to do with
>> functional programming. My F# and OCaml code are now basically identical
>> to the C++ code.
>
> It is pretty strong thing to say that anything is fundamentally bad
> just because it's not fast as something else. Fundamental badness
> ought to run deeper than some superficial, linear measure.

The slice based approach is not only slower, it is longer, more obfuscated
and more difficult to optimise. That doesn't just apply to Python and
numpy, it also applies to Matlab, Mathematica etc.

Which begs the question, if you were writing in a compiled language like F#
would you ever use slices?

> Now, I was going to write the program in Python, and use the very
> convenient numpy to do the transformation part.

What makes numpy convenient?

> No problem, numpy will be fast enough for my needs.

Ok. But you'd rather have a compiler?

> But wait! It's silly try to optimize code in a fundamentally slow
> language! It's vanity to even try! I guess for the sake of getting
> this little D4 transform to double in speed

C++ is currently 3.4x faster than Python.

> I'll scrap Python and write the whole thing in OCaml.

Why not drop to C for this one function?

> [snip]
>> This is really great work but I can't help but wonder why the authors
>> chose to use Python when other languages seem better suited. I'd like to
>> work on raising people's awareness of these alternatives, and probably
>> create some useful tools in the process. So I'm keen to learn what Python
>> programmers would want/expect from F# and OCaml.
>>
>> What would it take to make you convert?
>
> Them not being functional would be a good start....

They don't impose functional programming on you.

Carl Banks

unread,

Dec 3, 2006, 5:12:50 PM12/3/06

to

Jon Harrop wrote:
> Carl Banks wrote:
> >> 0.56s C++ (direct arrays)
> >> 0.61s F# (direct arrays)
> >> 0.62s OCaml (direct arrays)
> >> 1.38s OCaml (slices)
> >> 2.38s Python (slices)
> >> 10s Mathematica 5.1
> > [snip]
> >> 1.57s Python (in-place)
> >
> > So,
> > optimized Python is roughly the same speed as naive Ocaml
> > optimized Ocaml is roughly the same speed as C++
>
> Absolutely not:
>
> Optimized Python is 14% slower than badly written OCaml.

I'd call that roughly the same speed. Did you use any sort of
benchmark suite that miminized testing error, or did you just run it
surrounded by calls to the system timer like I did? If the latter,
then it's poor benchmark, and 10% accuracy is better than you can
expect. (Naive tests like that tend to favor machine code.)

> Given the problem,
> rather than the Python solution, nobody would write OCaml code like that.
>
> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest
> Python so far whilst also requiring about half as much code.

Frankly, I have a hard time believing anyone would naively write the
Ocaml code like the optimized version you posted. You'd know better
than I, though.

> Optimising the C++ by hoisting the O(log n) temporary array allocations into
> one allocation makes it another 20% faster. I'm sure there are plenty more
> optimisations...
>
> >> There aren't any complicated types in the above code. In fact, there are
> >> only two types: float and float array.
> >
> > You're vastly underestimating the complexity of numpy objects. They
> > have an awful lot going on under the covers to make it look simple on
> > the surface. There's all kinds of type-checking and type-conversions.
> > A JIT that folds loops together would have to have knowledge of that
> > process, and it's a lot of knowledge to have. That is not easy.
>
> My suggestion doesn't really have anything to do with numpy. If you had such
> a JIT you wouldn't use numpy in this case.
>
> That's my point, using numpy encouraged the programmer to optimise in the
> wrong direction in this case (to use slices instead of element-wise
> operations).

Ok, I can see that. We have a sort of JIT compiler, psyco, that works
pretty well but I don't think it's as fast for large arrays as a good
numpy solution. But it has more to deal with than a JIT for a
statically typed language.

> >> I was referring to the slicing specifically, nothing to do with
> >> functional programming. My F# and OCaml code are now basically identical
> >> to the C++ code.
> >
> > It is pretty strong thing to say that anything is fundamentally bad
> > just because it's not fast as something else. Fundamental badness
> > ought to run deeper than some superficial, linear measure.
>
> The slice based approach is not only slower, it is longer, more obfuscated
> and more difficult to optimise. That doesn't just apply to Python and
> numpy, it also applies to Matlab, Mathematica etc.
>
> Which begs the question, if you were writing in a compiled language like F#
> would you ever use slices?

If I was writing in F#? I absolutely would, because some problems are
most cleanly, readably, and maintainably done with slices. I don't
have any speed-lust, and I don't waste time making things fast if they
don't need to be. I would rather spend my time designing and
implementing the system, and optimizing things that actually need it.
I don't want to spend my time tracking down some stupid indexing error.

If it was critical for it to be fast, then I'd pull out the stops and
write it as efficiently as possible. In my experience, a typical
middle size program will have half a dozen or so subroutines where it's
critical to be as fast as possible.

Will any other F# people do it? I don't know if they all have
speed-lust, or if they'd all deliberately avoid slicing to prove some
kind of point about functional programming and/or non-machine-compiled
languages, but I assume there'd be some who would do it for the same
reasons I'd do it.

> > Now, I was going to write the program in Python, and use the very
> > convenient numpy to do the transformation part.
>
> What makes numpy convenient?

It's in Python, it's almost always fast enough, it's clean, it scales
up well, it expresses mathematical concepts straightforwardly, it's
well suppported, it has useful advanced functionality built in (linear
algebra, etc.).

And a lot of the time, you don't have to worry about indexing.

> > No problem, numpy will be fast enough for my needs.
>
> Ok. But you'd rather have a compiler?

Python *is* compiled, buddy. It has a VM that runs bytecode.

But, would I rather have a seperate compile to *machine code*, when an
automatic compile to VM code is fast enough?

Not remotely. Compilers are a PITA and I avoid them when it's
possible.

> > But wait! It's silly try to optimize code in a fundamentally slow
> > language! It's vanity to even try! I guess for the sake of getting
> > this little D4 transform to double in speed
>
> C++ is currently 3.4x faster than Python.

Oh, well *that* made all the difference.

> > I'll scrap Python and write the whole thing in OCaml.
>
> Why not drop to C for this one function?

Why would I? The assumptions clearly stated that numpy is fast enough.
Why would I take several hours to write a C extension when I could do
it with numpy in about 20 minutes, *and it's fast enough*?

> > [snip]
> >> This is really great work but I can't help but wonder why the authors
> >> chose to use Python when other languages seem better suited. I'd like to
> >> work on raising people's awareness of these alternatives, and probably
> >> create some useful tools in the process. So I'm keen to learn what Python
> >> programmers would want/expect from F# and OCaml.
> >>
> >> What would it take to make you convert?
> >
> > Them not being functional would be a good start....
>
> They don't impose functional programming on you.

In the same way that a screwdriver can't prevent you from driving a
nail. Give me a break, we all know these guys (Haskell especially) are
designed to support functional first and other paradigms second. Way
second. That's quite enough imposing for me.

Look, I don't want to accuse you of anything, but when you come in here
and say "speed is crucial" and then "you should be using Haskell,
Ocaml, F#, etc., ...even though C++ is faster...", it gives off a
really strong signal that you're not interested in speed so much so as
in pimping functional languages; speed is just the mechanism. Which is
fine; those languages are undeniably faster, and there's nothing wrong
with using an advantage of a language to push it, if you're respectful.

But let me offer you some advice: if you wan't to lure people from
Python, speed isn't going to cut it. Most people who chose Python
(and, let's face it, most people who use Python chose it themselves)
already knew it was slow, and chose it anyways. Pythonistas prefer
Python because it's clean, readable, well-supported, and practical.
Practical might be the sticking point here. If you were to get a
Python prompt, and type "import this", it would print the Zen of
Python, a set of principles by which the language is designed. One of
them is "Practicality beats purity." And let's face it, functional
languages are (or seem to be) all about purity. Notice that there's no
Zen that says "Fast is better than slow". If you want us to swtich to
Ocaml, you better figure out how it supports the Zen of Python better
than Python does.

Carl Banks

Jon Harrop

unread,

Dec 3, 2006, 6:03:40 PM12/3/06

to

Carl Banks wrote:
>> Optimized Python is 14% slower than badly written OCaml.
>
> I'd call that roughly the same speed. Did you use any sort of
> benchmark suite that miminized testing error, or did you just run it
> surrounded by calls to the system timer like I did?

System timer, best of three.

> If the latter,
> then it's poor benchmark, and 10% accuracy is better than you can
> expect. (Naive tests like that tend to favor machine code.)

Yes, they are roughly the same speed. My objection was that this OCaml
program was not the unoptimised one, it was the badly written one. The
unoptimised OCaml/F#/C++ are all several times faster than this OCaml, and
much shorter and simpler too.

>> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the
>> fastest Python so far whilst also requiring about half as much code.
>
> Frankly, I have a hard time believing anyone would naively write the
> Ocaml code like the optimized version you posted.

That wasn't optimised. I haven't optimised any of my solutions. The OCaml I
posted was a naive convolution with some unnecessary allocation.

> You'd know better than I, though.

When asked to convolve an array with a 4-element array (h0..h3), I think
most programmers would write (pseudocode):

for i in [0 .. n-4]
x[i] = h0 y[i] + h1 y[i+1] + h2 y[i+2] + h3 y[i+3]

in most languages. That is the essence of this benchmark.

You can express it even more succinctly with a dot product:

for i in [0 .. n-4]
x[i] = h . y[i:]

In this case, you'd want a cyclic array type too.

>> That's my point, using numpy encouraged the programmer to optimise in the
>> wrong direction in this case (to use slices instead of element-wise
>> operations).
>
> Ok, I can see that. We have a sort of JIT compiler, psyco, that works
> pretty well but I don't think it's as fast for large arrays as a good
> numpy solution. But it has more to deal with than a JIT for a
> statically typed language.

Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
would be a good student project?

>> Which begs the question, if you were writing in a compiled language like
>> F# would you ever use slices?
>
> If I was writing in F#? I absolutely would, because some problems are
> most cleanly, readably, and maintainably done with slices. I don't
> have any speed-lust, and I don't waste time making things fast if they
> don't need to be. I would rather spend my time designing and
> implementing the system, and optimizing things that actually need it.
> I don't want to spend my time tracking down some stupid indexing error.

Ok. Can you name some problems that are solved more easily with slices than
with indexing? I may have just answered my own question by posting some
really concise pseudocode that uses slices.

> Will any other F# people do it? I don't know if they all have
> speed-lust, or if they'd all deliberately avoid slicing to prove some
> kind of point about functional programming and/or non-machine-compiled
> languages, but I assume there'd be some who would do it for the same
> reasons I'd do it.

My concern is that slices are being pulled into F# because they are popular
in languages like Python and Matlab but I've yet to see an example where
they are actually a good idea (in the context of F#, i.e. a compiled
language where indexing is as fast or faster).

>> What makes numpy convenient?
>
> ...it scales up well...

What do you mean?

> ...it expresses mathematical concepts straightforwardly...

It didn't express this mathematical concept very straightforwardly.

> And a lot of the time, you don't have to worry about indexing.

Right. Ideally you want to factor that out without losing anything. Indeed,
if you do factor the C++ code you should get back to the underlying
mathematical definitions. Ideally, you'd want to write those directly and
have them executed efficiently.

>> > No problem, numpy will be fast enough for my needs.
>>
>> Ok. But you'd rather have a compiler?
>
> Python *is* compiled, buddy. It has a VM that runs bytecode.

I meant a native-code compiler, of course.

> But, would I rather have a seperate compile to *machine code*, when an
> automatic compile to VM code is fast enough?
>
> Not remotely. Compilers are a PITA and I avoid them when it's
> possible.

I can envisage a JIT Python compiler that would spot statically typed code,
compile it with no user intervention and then run it. I implemented such a
thing for Mathematica a few years ago.

>> Why not drop to C for this one function?
>
> Why would I? The assumptions clearly stated that numpy is fast enough.
> Why would I take several hours to write a C extension when I could do
> it with numpy in about 20 minutes, *and it's fast enough*?

Is it very difficult to interface Python to other languages efficiently?

> But let me offer you some advice: if you wan't to lure people from
> Python, speed isn't going to cut it. Most people who chose Python
> (and, let's face it, most people who use Python chose it themselves)
> already knew it was slow, and chose it anyways. Pythonistas prefer
> Python because it's clean, readable, well-supported, and practical.
> Practical might be the sticking point here. If you were to get a
> Python prompt, and type "import this", it would print the Zen of
> Python, a set of principles by which the language is designed. One of
> them is "Practicality beats purity." And let's face it, functional
> languages are (or seem to be) all about purity. Notice that there's no
> Zen that says "Fast is better than slow". If you want us to swtich to
> Ocaml, you better figure out how it supports the Zen of Python better
> than Python does.

I'm keen on the practical aspect. I intend to take some of the excellent
Python demos out there and port them to languages like F# in order to
illustrate how they too can be practical languages.

George Sakkis

unread,

Dec 3, 2006, 10:05:41 PM12/3/06

to

Jon Harrop wrote:

> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> would be a good student project?

I guess for a student project it's not that important, but if you have
higher ambitions, make sure you read
http://dirtsimple.org/2005/10/children-of-lesser-python.html before you
commit to writing what will turn out to be yet another "80% Python" VM
/ JIT compiler.

George

Niels L Ellegaard

unread,

Dec 4, 2006, 2:15:38 AM12/4/06

to

Jon Harrop wrote:
> So I'm keen to learn what Python programmers would want/expect from F# and
> OCaml.

I think this discussion becoming is a little misguided.

The real strength of scipy is the elegant notation rather than speed.
Being raised with Matlab I find scipy nicely familiar, and its fast
enough for many tasks. Some would argue that the strength of scipy is
the weakness of ocaml. Others would disagree. That is a question of
taste.

My only grudge about strongly recommending scipy to friends is the way
that two arrays can share the same data. This can lead to subtle errors
that I will eventually be blamed for. I don't know if arrays in Matlab
(or Octave) can share data, but if they do, then everything happens
behind the scenes and the user does not have to worry. I would love to
see a future version of numpy that was 50% slower and had a more
foolproof approach to array copying.

http://www.scipy.org/Tentative_NumPy_Tutorial#head-1529ae93dd5d431ffe3a1001a4ab1a394e70a5f2

Niels

Filip Wasilewski

unread,

Dec 4, 2006, 9:18:15 AM12/4/06

to

Jon Harrop wrote:

[...]

> I first wrote an OCaml translation of the Python and wrote my own
> little "slice" implementation. I have since looked up a C++ solution and
> translated that into OCaml instead:
>
> let rec d4_aux a n =
> let n2 = n lsr 1 in
> let tmp = Array.make n 0. in
> for i=0 to n2-2 do
> tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
> tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
> done;
> tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
> tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
> Array.blit tmp 0 a 0 n;
> if n > 4 then d4_aux a (n lsr 1)
>
> let d4 a = d4_aux a (Array.length a)
>
> Not only is that shorter than the Python

and makes my eyes bleeding, but it's only my personal feeling. Try
reading the code above aloud and you will see why `shorter` is not
necessarily considered a merit, especially on this group.

Besides of that this code is irrelevant to the original one and your
further conclusions may not be perfectly correct. Please learn first
about the topic of your benchmark and different variants of wavelet
transform, namely difference between lifting scheme and dwt, and try
posting some relevant examples or use other topic for your benchmarks.

Neglecting this issues, I wonder if it is equally easy to juggle
arbitrary multidimensional arrays in a statically typed language and do
operations on selected axis directly from the interactive interpreter
like in the NumPy example from my other post -
http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
I don't know much OCaml so it would be great if you could elaborate on
this.

> , it is much faster:
>
> 0.56s C++ (direct arrays)
> 0.61s F# (direct arrays)
> 0.62s OCaml (direct arrays)
> 1.38s OCaml (slices)
> 2.38s Python (slices)
> 10s Mathematica 5.1
>
> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
> a[i]).

[...]

So why not use C++ instead of all others if the speed really matters?
What is your point here?

Could you please share full benchmark code, so we could make
conclusions too? I would like to get to know about your benchmark
methodology. I wonder if you are allocating the input data really
dynamically or whether it's size is a priori knowledge for the
compiler.

> In this specific context (discrete wavelet transform benchmark), I'd have
> said that speed was the most important thing after correctness.

In this very specific context:
Yes, if you are targeting specific one-shot issue like JPEG2000
encoder, where you are using only 2 types of wavelets. In that case you
would probably draw back to low-level C or C++ and use a good
optimizing compiler for a specific hardware.
Not necessarily, if you are doing research with different kinds of
wavelets and need a general, flexible and easily adaptable tool or just
the computation speed is not considered a bottleneck.

Language speed is a great advantage, but if it always was the most
important, people would talk directly to the CPUs or knit DSP chips in
theirs backyards whenever they need to solve a calculation problem.

Please don't make this discussion heading towards `what is the fastest
way of doing d4 transform and why OCaml rules` instead of `what is the
optimal way of doing things under given circumstances and still have a
free afternoon ;-)`. Different tasks need different, well-balanced
measures. Neither Python nor OCalm nor any other language is a panacea
for every single problem.

cheers,
fw

sturlamolden

unread,

Dec 4, 2006, 10:39:13 AM12/4/06

to

Filip Wasilewski wrote:

> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

If speed matters, one should consider using hand-coded assembly.
Nothing really beats that. But it's painful and not portable. So lets
forget about that for a moment.

Contrary to what most computer scientists think, C++ is not the tool of
choice if speed really matters. Numeric computing is typically done in
Fortran 90/95.

That is not without reason. There are aspects of the C++ language that
makes certain numerical optimizations impossible, e.g. the potential
for pointer aliasing within tight loops, which hampers register
allocation. For example:

for(i=0; i<n; i++) {
*a += *b;
*b++ = 0;
}

Here the C++ compiler cannot put *a in a register, as a may be aliased
by b in one of the iterations. It must therefore reload the value *a
from memory in every iteration of the loop. In Fortran this can never
happen (aliasing is disallowed in the standard), and it is therefore
much easier for the compiler to do numerical optimizations.

One should also consider using ATLAS or vendor-optimized BLAS (e.g.
Intel math kernel) for some of the axpy-operations in the wavelet
transform, unless one has an expensive Fortran compiler that takes care
of this automatically (in which case one should just use slicing and
let the compiler do the rest).

I don't agree that slicing is not the best way to approach this
problem. In both NumPy and Matlab, slicing (often) results in calls to
BLAS/ATLAS, which are heavily optimized numerical routines. One can
supply BLAS libraries that automatically exploits multiple CPUs, use
FMA (fused multiply and add) whenever appropriate, and make sure cache
is used in the most optimal way.

Vendors of Fortran compilers thrive on scientists need for speed.
Fortran compilers are therefore extremely good at taking slicing and
"vectorizing" the corresponding machine code. For those who don't know
Fortran, it has slicing operations similar to those of NumPy and
Matlab. If slicing was not the right tool for numerical work, Fortran
would not support it. Good Fortran compilers can automatically detect
when a call to BLAS is appropriate, use FMA opcodes whenever possible,
use SIMD extensions (e.g. SSE and MMX), or use MPI automatically to
spread the work onto multiple CPUs in a cluster, etc. This is possible
because of the implicit SIMD parallelism in Fortran's array slicing. In
Fortran, array slicing can be evaluated in any order, it is a parallel
SIMD statement. SIMD is a very powerful and perhaps the preferred
paradigm for numerical parallel computing. In Fortran 90/95 a
programmer will use array slicing, and the compiler is free to
parallelize however it wants. In contrast, the MIMD paradigm often used
for parallel computing in C and C++ (e.g. multi-threading) is a lot
harder to code for the programmer and optimize for the compiler.

Not to speak of the fact that Fortran 95 is a lot easier to write than
C++. Also, those that use Fortran tend to be scientists, not
professional programmers. C++ introduces a higher level of complexity
and entire new classes of bugs; this is highly undesirable. Fortran is
a small, nice language for numerical computing. Anything else should
rather be done in Python than C++.

I would urge you all to read this:

http://latticeqcd.blogspot.com/2006/11/why-we-use-fortran-and-python.html

Carl Banks

unread,

Dec 4, 2006, 12:22:00 PM12/4/06

to

Jon Harrop wrote:
[snip]

> >> That's my point, using numpy encouraged the programmer to optimise in the
> >> wrong direction in this case (to use slices instead of element-wise
> >> operations).
> >
> > Ok, I can see that. We have a sort of JIT compiler, psyco, that works
> > pretty well but I don't think it's as fast for large arrays as a good
> > numpy solution. But it has more to deal with than a JIT for a
> > statically typed language.
>
> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> would be a good student project?

...and finishing would be a good project for a well-funded team of
experienced engineers.

> >> Which begs the question, if you were writing in a compiled language like
> >> F# would you ever use slices?
> >
> > If I was writing in F#? I absolutely would, because some problems are
> > most cleanly, readably, and maintainably done with slices. I don't
> > have any speed-lust, and I don't waste time making things fast if they
> > don't need to be. I would rather spend my time designing and
> > implementing the system, and optimizing things that actually need it.
> > I don't want to spend my time tracking down some stupid indexing error.
>
> Ok. Can you name some problems that are solved more easily with slices than
> with indexing? I may have just answered my own question by posting some
> really concise pseudocode that uses slices.

D4 transform?

How about addition?

How about the vast majority of cases where slicing is possible?

Occasionally you have cases where slicing would work but be
complicated, but in most cases, wherever slicing is feasible it's
easier, less error-prone, less typing, more readable, more concise,
more maintainable.

> > Will any other F# people do it? I don't know if they all have
> > speed-lust, or if they'd all deliberately avoid slicing to prove some
> > kind of point about functional programming and/or non-machine-compiled
> > languages, but I assume there'd be some who would do it for the same
> > reasons I'd do it.
>
> My concern is that slices are being pulled into F# because they are popular
> in languages like Python and Matlab but I've yet to see an example where
> they are actually a good idea (in the context of F#, i.e. a compiled
> language where indexing is as fast or faster).

It's often a good idea where speed isn't crucial because it's easier,
more readable, and more maintainable. I highly suspect F# and other
languages would still using slicing for many problems for this reason.

OTOH, it's rarely a good idea to always base coding decisions on speed
to the exclusion of all other factors.

> >> What makes numpy convenient?
> >
> > ...it scales up well...
>
> What do you mean?
>
> > ...it expresses mathematical concepts straightforwardly...
>
> It didn't express this mathematical concept very straightforwardly.

I thought it did. One must know about the shared data nuances and the
slicing syntax. Given that, the numpy way closely mimicked the way
mathematical formulas look. Put it this way: which one of these two
(numpy with slicing, Ocaml with indexing) would it be easer to reverse
engineer a mathematical formula out of? I'd say the numpy version with
slicing, by a long shot.

> > And a lot of the time, you don't have to worry about indexing.
>
> Right. Ideally you want to factor that out without losing anything. Indeed,
> if you do factor the C++ code you should get back to the underlying
> mathematical definitions. Ideally, you'd want to write those directly and
> have them executed efficiently.

I am almost always happy to factor indexing out *with* losing
something.

> >> > No problem, numpy will be fast enough for my needs.
> >>
> >> Ok. But you'd rather have a compiler?
> >
> > Python *is* compiled, buddy. It has a VM that runs bytecode.
>
> I meant a native-code compiler, of course.
>
> > But, would I rather have a seperate compile to *machine code*, when an
> > automatic compile to VM code is fast enough?
> >
> > Not remotely. Compilers are a PITA and I avoid them when it's
> > possible.
>
> I can envisage a JIT Python compiler that would spot statically typed code,
> compile it with no user intervention and then run it. I implemented such a
> thing for Mathematica a few years ago.

You can almost never spot "statically typed" code in Python reliably,
even with a separate compile stage that does a global analysis. Even
Lisp compilers, which are generally known to be the best compilers out
there for a dynamically-typed language, don't do too well at this
unless you help them out, and Lisp isn't nearly as dynamic as Python.

As for JIT's, the best you could do is hook into a function, checking
the types of its arguments, and cacheing optimized code for the given
type signature. It's basically what psyco does. Anything more would
require either help from the programmer or concessions in dynamicity.

> >> Why not drop to C for this one function?
> >
> > Why would I? The assumptions clearly stated that numpy is fast enough.
> > Why would I take several hours to write a C extension when I could do
> > it with numpy in about 20 minutes, *and it's fast enough*?
>
> Is it very difficult to interface Python to other languages efficiently?

No. It does involve lots of boilerplate. But even if it didn't, I
wouldn't write an extension, because the numpy solution is cleaner,
more readable, and more maintainable than anything I could write in C
(which is a poor language that's good at being fast and talking to
hardware and nothing else), and it's fast enough.

> > But let me offer you some advice: if you wan't to lure people from
> > Python, speed isn't going to cut it. Most people who chose Python
> > (and, let's face it, most people who use Python chose it themselves)
> > already knew it was slow, and chose it anyways. Pythonistas prefer
> > Python because it's clean, readable, well-supported, and practical.
> > Practical might be the sticking point here. If you were to get a
> > Python prompt, and type "import this", it would print the Zen of
> > Python, a set of principles by which the language is designed. One of
> > them is "Practicality beats purity." And let's face it, functional
> > languages are (or seem to be) all about purity. Notice that there's no
> > Zen that says "Fast is better than slow". If you want us to swtich to
> > Ocaml, you better figure out how it supports the Zen of Python better
> > than Python does.
>
> I'm keen on the practical aspect. I intend to take some of the excellent
> Python demos out there and port them to languages like F# in order to
> illustrate how they too can be practical languages.

That's what you say. But nothing you've done in this whole thread
remotely suggests that "practical" means anything to you other than,
"possible to write faster code in."

Carl Banks

sturlamolden

unread,

Dec 4, 2006, 1:09:19 PM12/4/06

to

Carl Banks wrote:

> > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> > would be a good student project?
>
> ...and finishing would be a good project for a well-funded team of
> experienced engineers.

I think this is a good idea. We could use the AST from the CPython
compiler, translate on the fly to ANSI Common Lisp, and pass it off to
SBCL (which is very fast and supports native threads). It would even
solve the issue of Python's crappy GIL and reference counting (as
opposed to a modern garbage collector).

Another option would be to use something like pyrex to compile Python
on-the-fly to C, and then compile using GCC or MSVC.

A third option would be to translate on the fly to Scheme, and compile
using Bigloo.

It would not be a JIT-compiler, but rather something like the Lisp
solution where individual functions can be compiled when speed is
important.

It sounds like a fun project to me.

Jon Harrop

unread,

Dec 4, 2006, 4:11:10 PM12/4/06

to

sturlamolden wrote:
> I don't agree that slicing is not the best way to approach this
> problem.

Indeed, the C++ approach can be written very succinctly using slicing:

for i=0 to n/2-1 do
tmp[i] = dot a[2*i:] h;
tmp[i + n/2] = dot a[2*i + 1:] g;

where a[i:] denotes the array starting at index "i".

> In both NumPy and Matlab, slicing (often) results in calls to
> BLAS/ATLAS, which are heavily optimized numerical routines. One can
> supply BLAS libraries that automatically exploits multiple CPUs, use
> FMA (fused multiply and add) whenever appropriate, and make sure cache
> is used in the most optimal way.

If the Python were making only a few calls to BLAS then I would agree.
However, in order to move as much computation as possible from the
interpreted language onto BLAS you have had to call BLAS many times.

So the super-fast BLAS routines are now iterating over the arrays many times
instead of once and the whole program is slower than a simple loop written
in C.

Indeed, this algorithm could be written concisely using pattern matching
over linked lists. I wonder if that would be as fast as using BLAS from
Python.

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Carl Banks

unread,

Dec 4, 2006, 4:28:31 PM12/4/06

to

sturlamolden wrote:
> Carl Banks wrote:
>
> > > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme
> > > would be a good student project?
> >
> > ...and finishing would be a good project for a well-funded team of
> > experienced engineers.
>
> I think this is a good idea. We could use the AST from the CPython
> compiler, translate on the fly to ANSI Common Lisp, and pass it off to
> SBCL (which is very fast and supports native threads). It would even
> solve the issue of Python's crappy GIL and reference counting (as
> opposed to a modern garbage collector).

I doubt it would work too well. Lisp's semantics differ from Python's
in subtle ways, and covering the edge cases usually means you have to
do all the stuff you thought you would avoid by using another
dynamically-typed language. Translating Lisp integers to Python ints
would work probably 99 percent of the time, but in the end you'd
essentially have to reimplement the Python int type.

If you're going to go through all that work, you might as well
translate it to C or directly to machine code.

Carl Banks

sturlamolden

unread,

Dec 4, 2006, 9:08:48 PM12/4/06

to

Jon Harrop wrote:

> So the super-fast BLAS routines are now iterating over the arrays many times
> instead of once and the whole program is slower than a simple loop written
> in C.

Yes. And the biggest overhead is probably Python's function calls.

Little is as efficient as well-written ISO C99 (not to be confused with
C++ or ANSI C). The only thing I can think of is hand-written assembly.
But whether a single loop is faster than several BLAS calls does not
only depend on how many times the memory is traversed. It also depends
a lot on cache prefetching. So I assume you make sure that the cache is
prefetched and exploited optimally for your CPU in your C code? And you
are presumably also inlining assembly to use SSE and other SIMD
opcodes? After all, SSE can speed up floating point performance by a
factor of four if the computations can be carried out in parallel. At
one point, one has to stop and ask a very important question:

How fast is fast enough? And how much suffering am I willing to endure
to achieve that speed?

Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how
much time was saved writing and debugging the code? I value my own time
more than a few extra CPU cycles. With Fortran 95, it is easy to write
numerical code that runs reasonably fast. It is only a little bit
harder to write numerical code that runs very fast. There is nothing in
Fortran that prevents you from writing loops that are equally fast or
faster than ANSI C loops. If you have several subsequent slicing
operations, a good Fortran compiler can detect that and fuse them into
a single loop. Never underestimate the ability of a modern Fortran
compiler to generate efficient machine code, even if array slicing is
used heavily.

Also never underestimate the truth in Knuth's claim that premature
optimization is the root of all evil in computer programming. It is a
lot smarter to write a pure Python/NumPy solution, and if its too slow,
run the profiler, identify the worst bottlenecks, and translate them to
Fortran. The result will probably not be as fast as a pure C solution.
But if it is still too slow, money is better spent buying better
hardware than attempting a major rewrite in C/C99/C++.

NumPy can be slower than equivalent C by an order of magnitude. But
still, that may be more than fast enough. The beauty of NumPy is really
the way it allows arrays of numerical data to be created and
manipulated within Python, not the speed of python code using NumPy
objects. We have already seen that NumPy is faster than the most recent
version of Matlab, the most expensive commercial tool on the marked;
which means, it is as fast as you can expect a numerical scripting tool
to be, but sure, there is still room for improvement.

In a numerical scripting language it is certainly not true that several
slicing operations is slower than a single big for-loop. That is due to
the overhead involved when the interpreter is invoked. The performance
is to a large extent determined by the amount of interpreter
invocations. That is why it is preferred to "vectorize" NumPy code
using slicing, repmat, reshapes, etc., rather than looping with a
pythonic for-loop. Even if you have to iterate over the memory several
times, array slicing will still be a lot faster than a pythonic
for-loop.

> Indeed, this algorithm could be written concisely using pattern matching
> over linked lists. I wonder if that would be as fast as using BLAS from
> Python.

Pattern matching on linked lists? For wavelet transforms? Seriously?

Jon Harrop

unread,

Dec 5, 2006, 5:11:58 AM12/5/06

to

sturlamolden wrote:
> Little is as efficient as well-written ISO C99 (not to be confused with
> C++ or ANSI C).

OCaml and F# are almost as fast as C++ in this case. I suspect most other
modern languages are.

> So I assume you make sure that the cache is
> prefetched and exploited optimally for your CPU in your C code?

No. My implementations are all unoptimised.

> How fast is fast enough? And how much suffering am I willing to endure
> to achieve that speed?

No suffering: The other implementations are all shorter, simpler and faster
than the Python.

> Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how
> much time was saved writing and debugging the code?

A lot of effort was saved by not having to shoehorn the solution into numpy.

> I value my own time more than a few extra CPU cycles.

Then you won't write code like this in Python.

> Also never underestimate the truth in Knuth's claim that premature
> optimization is the root of all evil in computer programming.

Indeed, only the Python was optimised.

> NumPy can be slower than equivalent C by an order of magnitude. But
> still, that may be more than fast enough. The beauty of NumPy is really
> the way it allows arrays of numerical data to be created and
> manipulated within Python, not the speed of python code using NumPy
> objects. We have already seen that NumPy is faster than the most recent
> version of Matlab, the most expensive commercial tool on the marked;
> which means, it is as fast as you can expect a numerical scripting tool
> to be, but sure, there is still room for improvement.

Price aside, Matlab is clearly the wrong tool for the job here.

> In a numerical scripting language it is certainly not true that several
> slicing operations is slower than a single big for-loop. That is due to
> the overhead involved when the interpreter is invoked. The performance
> is to a large extent determined by the amount of interpreter
> invocations. That is why it is preferred to "vectorize" NumPy code
> using slicing, repmat, reshapes, etc., rather than looping with a
> pythonic for-loop. Even if you have to iterate over the memory several
> times, array slicing will still be a lot faster than a pythonic
> for-loop.

Yes. You have to suffer in Python if you want to approach performance that
can be reached with ease in most other languages in the case of this
benchmark (and probably most benchmarks).

>> Indeed, this algorithm could be written concisely using pattern matching
>> over linked lists. I wonder if that would be as fast as using BLAS from
>> Python.
>
> Pattern matching on linked lists? For wavelet transforms? Seriously?

Sure. Here's a list-based implementation in OCaml:

let h x0 x1 x2 x3 = h0 *. x0 +. h1 *. x1 +. h2 *. x2 +. h3 *. x3
let g x0 x1 x2 x3 = g0 *. x0 +. g1 *. x1 +. g2 *. x2 +. g3 *. x3

let rec d4_even s d t a = match t, a with
| [x0; x1; x2; x3], x4::x5::x6::_ ->
h x0 x1 x2 x3::h x2 x3 x4 x5::s, g x1 x2 x3 x4::g x3 x4 x5 x6::d
| x0::x1::(x2::x3::x4::_ as t), a ->
d4_even (h x0 x1 x2 x3 :: s) (g x1 x2 x3 x4 :: d) t a

let rec d4 = function
| [_; _; _; _] as a -> [a, []]
| a -> (fun (s, _ as h) -> h :: d4 s) (d4_even [] [] a a)

That's about 3x slower than the Python.

Jon Harrop

unread,

Dec 5, 2006, 7:36:38 AM12/5/06

to

Filip Wasilewski wrote:
> Besides of that this code is irrelevant to the original one and your
> further conclusions may not be perfectly correct. Please learn first
> about the topic of your benchmark and different variants of wavelet
> transform, namely difference between lifting scheme and dwt, and try
> posting some relevant examples or use other topic for your benchmarks.

Your lifting implementation is slightly longer and slower than the
non-lifting one but its characteristics are identical. For n=2^25 I get:

1.88s C++ (816 non-whitespace bytes)
2.00s OCaml (741 b)
2.33s F# (741 b)
9.83s Your Python (1,002 b)

The original python was 789 bytes.

> Neglecting this issues, I wonder if it is equally easy to juggle
> arbitrary multidimensional arrays in a statically typed language and do
> operations on selected axis directly from the interactive interpreter
> like in the NumPy example from my other post -
> http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
> I don't know much OCaml so it would be great if you could elaborate on
> this.

It is probably just as easy. Instead of dynamic typing you have parametric
polymorphism. If you want to make your function generic over arithmetic
type then you can pass in the arithmetic operators.

>> 0.56s C++ (direct arrays)
>> 0.61s F# (direct arrays)
>> 0.62s OCaml (direct arrays)
>> 1.38s OCaml (slices)
>> 2.38s Python (slices)
>> 10s Mathematica 5.1
>>
>> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
>> a[i]).
>

> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

1. Benchmarks should not just include two inappropriate languages.
2. Speed aside, the other languages are more concise.

> Could you please share full benchmark code, so we could make
> conclusions too?

I'll paste the whole programs at the end...

> I would like to get to know about your benchmark
> methodology. I wonder if you are allocating the input data really
> dynamically or whether it's size is a priori knowledge for the
> compiler.

The knowledge is there for the compiler to use but I don't believe any of
them exploit it.

>> In this specific context (discrete wavelet transform benchmark), I'd have
>> said that speed was the most important thing after correctness.
>

> Not necessarily, if you are doing research with different kinds of
> wavelets and need a general, flexible and easily adaptable tool or just
> the computation speed is not considered a bottleneck.

Brevity is probably next.

> Language speed is a great advantage, but if it always was the most
> important, people would talk directly to the CPUs or knit DSP chips in
> theirs backyards whenever they need to solve a calculation problem.

Sure.

> Please don't make this discussion heading towards `what is the fastest
> way of doing d4 transform and why OCaml rules` instead of `what is the
> optimal way of doing things under given circumstances and still have a
> free afternoon ;-)`.

Comments like that seem to be based on the fundamental misconception that
writing C++ like this is hard.

> Different tasks need different, well-balanced
> measures. Neither Python nor OCalm nor any other language is a panacea
> for every single problem.

Absolutely. OCaml is as good as the next (compiled) language in this case.
Python and Matlab seem to be particularly bad at this task.

Here's my complete OCaml:

let a = sqrt 3. and b = 4. *. sqrt 2.
let h0, h1, h2, h3 =
(1. +. a) /. b, (3. +. a) /. b, (3. -. a) /. b, (1. -. a) /. b
let g0, g1, g2, g3 = h3, -.h2, h1, -.h0

let rec d4_aux tmp a n =

let n2 = n lsr 1 in

for i=0 to n2-2 do
tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
done;
tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
Array.blit tmp 0 a 0 n;

if n > 4 then d4_aux tmp a (n lsr 1)

let d4 a =
let tmp = Array.make (Array.length a) 0. in
d4_aux tmp a (Array.length a)

let () =
print_endline "Generating test data...";
let x = Array.init (1 lsl 25) (fun _ -> Random.float 1.) in
print_endline "Benchmarking...";
let t1 = Sys.time() in
ignore(d4 x);
Printf.printf "Elapsed time is %.6f seconds\n" (Sys.time() -. t1)

and C++:

#include <iostream>
#include <vector>
#include <ctime>
#include <cmath>

using namespace std;

double a = sqrt(3), b = 4 * sqrt(2);
double h0 = (1 + a) / b, h1 = (3 + a) / b, h2 = (3 - a) / b, h3 = (1 - a) /
b;
double g0 = h3, g1 = -h2, g2 = h1, g3 = -h0;

void d4(vector<double> &a) {
vector<double> tmp(a.size());
for (int n = a.size(); n >= 4; n >>= 1) {
int half = n >> 1;

for (int i=0; i<n-2; i+=2) {
tmp.at(i/2) = a.at(i)*h0 + a.at(i+1)*h1 + a.at(i+2)*h2 + a.at(i+3)*h3;
tmp.at(i/2+half) = a.at(i)*h3 - a.at(i+1)*h2 + a.at(i+2)*h1 -
a.at(i+3)*h0
;
}

tmp.at(half-1) = a.at(n-2)*h0 + a.at(n-1)*h1 + a.at(0)*h2 + a.at(1)*h3;
tmp.at(2*half-1) = a.at(n-2)*h3 - a.at(n-1)*h2 + a.at(0)*h1 -
a.at(1)*h0;

copy(tmp.begin(), tmp.begin() + n, a.begin());
}
}

int main() {
cout << "Generating data...\n";
vector<double> a(1 << 25);
for (int i=0; i<a.size(); ++i) a.at(i) = rand();
cout << "Benchmarking...\n";
double t1 = clock();
d4(a);
cout << "Elapsed time " << (clock() - t1) / CLOCKS_PER_SEC << "s\n";
}

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Mark Morss

unread,

Dec 5, 2006, 10:35:37 AM12/5/06

to

I doubt that anyone would dispute that even as boosted by Numpy/Scipy,
Python will almost certainly be notably slower than moderately
well-written code in a compiled language. The reason Numpy exists,
however, is not to deliver the best possible speed, but to deliver
enough speed to make it possible to solve large numerical problems with
the powerful and flexible Python language. As observed by Hans
Latangen in _Python Scripting for Computational Science_, 2nd ed.,
Springer 2005, scientific computing is more than number crunching:

"Very often programming is about shuffling data in and out of different
tools, converting one data format to another, extracting numerical data
from a text, and administering numerical experiments involving a large
number of data files and directories. Such tasks are much faster to
accomplish in a language like Python than in Fortran, C, C++ or Java."

He might well have mentioned the importance of developing nice-looking
reports once the analysis is complete, and that development is simpler
and more flexible in Python than a compiled language.

In principle, I agree that heavy-duty number-crunching, at least if it
has to be repeated again and again, should be accomplished by means of
a compiled language. However, if someone has to solve many different
problems just one time, or just a few times (for example if you are an
engineering consultant), there is an excellent argument for using
Python + Numpy. Unless it affects feasibility, I opine, computational
speed is important primarily in context of regular production, e.g.,
computing the daily "value at risk" in a commodity trading portfolio,
or making daily weather predictions.

I am aware of the power and flexibility of the OCaml language, and
particularly that an OCaml user can easily switch back and forth
between interpreted and compiled implementation. I'm attacted to OCaml
and, indeed, I'm in the process of reading Smith's (unfortunately not
very well-written) _Practical OCaml_. However, I also understand that
OCaml supports only double-precision implementation of real numbers;
that its implementation of arrays is a little clunky compared to
Fortran 95 or Numpy (and I suspect not as fast as Fortran's); and that
the available libraries, while powerful, are by no means as
comprehensive as those available for Python. For example, I am unaware
of the existance of an HDF5 interface for OCaml.

In summary, I think that OCaml is an excellent language, but I think
that the question of whether to use it in preference to Python+Numpy
for general-purpose numerical analysis must rest on much more than its
computational speed.

Mark Morss

unread,

Dec 5, 2006, 10:37:06 AM12/5/06

to

Hans >Langtangen<, rather.

Mark Morss

unread,

Dec 5, 2006, 12:16:07 PM12/5/06

to

Carl,

I agree with practically everything you say about the choice between
Python and functional languages, but apropos of Ocaml, not these
remarks:

>
> In the same way that a screwdriver can't prevent you from driving a
> nail. Give me a break, we all know these guys (Haskell especially) are
> designed to support functional first and other paradigms second. Way
> second. That's quite enough imposing for me.

> ...

> And let's face it, functional
> languages are (or seem to be) all about purity.

As you will see if you look at some OCaml references (there are a
number that are available on the OCaml website) that OCaml goes a very
long way toward facilitating imperative, procedural methods. It's
possible to write perfectly good code in OCaml and not use functional
methods. The OCaml people are fond of saying "OCaml isn't pure."
Indeed, the need for speed is one very good reason to choose and
imperative style over a functional one.

Konrad Hinsen

unread,

Dec 6, 2006, 5:52:12 AM12/6/06

to pytho...@python.org

On Dec 5, 2006, at 16:35, Mark Morss wrote:

> very well-written) _Practical OCaml_. However, I also understand that
> OCaml supports only double-precision implementation of real numbers;
> that its implementation of arrays is a little clunky compared to
> Fortran 95 or Numpy (and I suspect not as fast as Fortran's); and that
> the available libraries, while powerful, are by no means as
> comprehensive as those available for Python. For example, I am
> unaware
> of the existance of an HDF5 interface for OCaml.

OCaml is a popular language with computer scientists in France (not
surprisingly, given its origins), and I have regular contacts with
the computer science research community, so I have been exposed quite
a bit to OCaml, to the point of once considering it for numerical
work. Not as a replacement for Python, but as a replacement for C or
Fortran, i.e. for the heavy-duty parts of my work.

What made me abandon this idea are mainly two points:
1) the lack of polymorphism
2) the difficulty of interfacing to C and Fortran code
3) the limited processor support of the native code compiler

The lack of polymorphism, particularly in operators, makes OCaml code
a pain to write and read in my opinion. Float arithmetic is already
bad enough, with "+." etc. everywhere. If I add complex and vector
types (which could easily be defined in OCaml), I'd have to come up
with two more sets of arithmetic operators that make my code less
readable. In addition, I'd either have to keep multiple
implementations for many algorithms, or pass in all operators and
mathematical functions as arguments. Neither solution is acceptable
for me.

Interfacing to C and Fortran code is important because that's what
most libraries are written in. While it is possible in principle with
OCaml, it is (or at least was at the time I looked) a pain to
interface typical array-handling code, for lack of a compatible data
structure on the OCaml side.

Native code compilation is obviously important for speed. While many
popular processors are supported by ocamlopt, scientific users are
notorious for grabbing whatever fast hardware they can lay their
hands on. It seems safe to count on the GNU suite being ported
rapidly to any new platform. I don't have as much faith in the OCaml
developers, though perhaps just out of ignorance.

Konrad.

Jon Harrop

unread,

Dec 10, 2006, 5:23:17 AM12/10/06

to

Konrad Hinsen wrote:
> The lack of polymorphism, particularly in operators, makes OCaml code
> a pain to write and read in my opinion.

F# addresses this by adding operator overloading. However, you have to add
more type annotations...

> Interfacing to C and Fortran code is important because that's what
> most libraries are written in. While it is possible in principle with
> OCaml, it is (or at least was at the time I looked) a pain to
> interface typical array-handling code, for lack of a compatible data
> structure on the OCaml side.

You want bigarrays, they are just C/Fortran arrays. Look at the bindings to
FFTW/LAPACK, for example.

> Native code compilation is obviously important for speed. While many
> popular processors are supported by ocamlopt, scientific users are
> notorious for grabbing whatever fast hardware they can lay their
> hands on. It seems safe to count on the GNU suite being ported
> rapidly to any new platform.

OCaml already supports 9 architectures and optimised to AMD64 earlier than
gcc. How many do you want?

Jon Harrop

unread,

Dec 10, 2006, 5:25:48 AM12/10/06

to

Carl Banks wrote:
> I doubt it would work too well.

What about translating the current Python interpreter into a language with a
GC, like MLton-compiled SML? That would probably make it much faster, more
reliable and easier to develop.

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

konrad...@laposte.net

unread,

Dec 10, 2006, 7:29:09 AM12/10/06

to pytho...@python.org

On 10.12.2006, at 11:23, Jon Harrop wrote:

> Konrad Hinsen wrote:
>> The lack of polymorphism, particularly in operators, makes OCaml code
>> a pain to write and read in my opinion.
>
> F# addresses this by adding operator overloading. However, you have
> to add more type annotations...

That sounds interesting, but I'd have to see this in practice to form
an opinion. As long as F# is a Windows-only language, I am not
interested in it anyway.

> You want bigarrays, they are just C/Fortran arrays. Look at the
> bindings to
> FFTW/LAPACK, for example.

That's good news, thanks.

>> Native code compilation is obviously important for speed. While many
>> popular processors are supported by ocamlopt, scientific users are
>> notorious for grabbing whatever fast hardware they can lay their
>> hands on. It seems safe to count on the GNU suite being ported
>> rapidly to any new platform.
>
> OCaml already supports 9 architectures and optimised to AMD64
> earlier than gcc. How many do you want?

It's not a matter of number, it's a matter of availability when new
processors appear on the market. How much time passes on average
between the availability of a new processor type and the availability
of a native code compiler for OCaml?

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hin...@cnrs-orleans.fr
---------------------------------------------------------------------

Paul Rubin

unread,

Dec 10, 2006, 7:39:46 AM12/10/06

to

Jon Harrop <j...@ffconsultancy.com> writes:
> What about translating the current Python interpreter into a language with a
> GC, like MLton-compiled SML? That would probably make it much faster, more
> reliable and easier to develop.

Same issue as translating to CL, Python semantics are kind of odd and
don't map that well to Lisp and probably not to ML either (hint:
dynamic types). Scheme is probably the best bet but even still there
are serious difference, like Python strings being immutable.

Right now Python is being reimplemented in itself, as any
self-respecting language should do.

Carl Banks

unread,

Dec 10, 2006, 10:54:29 AM12/10/06

to

Jon Harrop wrote:
> Carl Banks wrote:
> > I doubt it would work too well.
>
> What about translating the current Python interpreter into a language with a
> GC, like MLton-compiled SML? That would probably make it much faster, more
> reliable and easier to develop.

I doubt it would work too well. MLton-compiled SML's semantics differ

from Python's in subtle ways, and covering the edge cases usually means
you have to do all the stuff you thought you would avoid by using

another dynamically-typed language. Translating MTton-compiled SML

Jon Harrop

unread,

Dec 11, 2006, 8:21:31 AM12/11/06

to

konrad...@laposte.net wrote:
> On 10.12.2006, at 11:23, Jon Harrop wrote:
>> F# addresses this by adding operator overloading. However, you have
>> to add more type annotations...
>
> That sounds interesting, but I'd have to see this in practice to form
> an opinion. As long as F# is a Windows-only language, I am not
> interested in it anyway.

F# runs under Linux with Mono.

>> OCaml already supports 9 architectures and optimised to AMD64
>> earlier than gcc. How many do you want?
>
> It's not a matter of number, it's a matter of availability when new
> processors appear on the market. How much time passes on average
> between the availability of a new processor type and the availability
> of a native code compiler for OCaml?

OCaml had AMD64 support before many other language. It had a better
optimiser before gcc...

Jon Harrop

unread,

Dec 11, 2006, 8:24:14 AM12/11/06

to

Carl Banks wrote:
> Jon Harrop wrote:

>> What about translating the current Python interpreter into a language
>> with a GC, like MLton-compiled SML? That would probably make it much
>> faster, more reliable and easier to develop.
>
> I doubt it would work too well. MLton-compiled SML's semantics differ
> from Python's in subtle ways, and covering the edge cases usually means
> you have to do all the stuff you thought you would avoid by using
> another dynamically-typed language. Translating MTton-compiled SML
> integers to Python ints would work probably 99 percent of the time, but
> in the end you'd essentially have to reimplement the Python int type.
>
> If you're going to go through all that work, you might as well
> translate it to C or directly to machine code.

That's not what I meant. I was referring to translating the Python
_interpreter_ into another language, not translating Python programs into
other languages. MLton-compiled SML is especially fast at symbolic
manipulation, e.g. interpreters, so it will be probably be as fast or
faster than the current Python interpreter. Then you can start boiling the
interpreter down, removing the GC for a start because MLton already has a
much better GC...

Paul Rubin

unread,

Dec 11, 2006, 8:30:51 AM12/11/06

to

Jon Harrop <j...@ffconsultancy.com> writes:
> F# runs under Linux with Mono.

Interesting, where do I get it, and is there source? I've never been
interested in Mono but maybe this is a reason. How does the compiled
code compare to OCaml or MLton code?

Paul Rubin

unread,

Dec 11, 2006, 8:39:58 AM12/11/06

to

Jon Harrop <j...@ffconsultancy.com> writes:
> That's not what I meant. I was referring to translating the Python
> _interpreter_ into another language, not translating Python programs into
> other languages. MLton-compiled SML is especially fast at symbolic
> manipulation, e.g. interpreters, so it will be probably be as fast or
> faster than the current Python interpreter. Then you can start boiling the
> interpreter down, removing the GC for a start because MLton already has a
> much better GC...

Well, work is already under way (already mentioned) to implement
Python in Python, including a reasonable compiler (Psyco).

The big deficiency of MLton from a concurrency perspective is
inability to use multiprocessors. Of course CPython has the same
deficiency. Same with OCaml. Is the ML community trying to do
anything about this?

Beliavsky

unread,

Dec 11, 2006, 12:57:26 PM12/11/06

to

I came across SAGE "Software for Algebra and Geometry Experimentation"
http://sage.math.washington.edu/sage/ , which includes Python and
Numeric and consists of

Group theory and combinatorics -- GAP
Symbolic computation and Calculus -- Maxima
Commutative algebra -- Singular
Number theory -- PARI, MWRANK, NTL
Graphics -- Matplotlib
Numerical methods -- GSL and Numeric
Mainstream programming language -- Python
Interactive shell -- IPython
Graphical User Interface -- The SAGE Notebook
Versioned Source Tracking -- Mercurial HG

Maybe people who has tried this could give their impressions.

ols...@verizon.net

unread,

Dec 11, 2006, 1:31:22 PM12/11/06

to

The source is avaliable, but it's under Microsoft's Shared Source
license, which isn't quite an open source license. There are some
restrictions on commercial usage.

http://research.microsoft.com/fsharp/fsharp-license.txt

Mark Morss

unread,

Dec 11, 2006, 2:29:07 PM12/11/06

to

> The [F#] source is avaliable, but it's under Microsoft's Shared Source

> license, which isn't quite an open source license. There are some
> restrictions on commercial usage.
>

You can call me a bigot, but it will be engraved upon my tombstone that
I never used a proprietary Microsoft language.

Jon Harrop

unread,

Dec 12, 2006, 6:42:35 AM12/12/06

to

Paul Rubin wrote:
> Well, work is already under way (already mentioned) to implement
> Python in Python, including a reasonable compiler (Psyco).
>
> The big deficiency of MLton from a concurrency perspective is
> inability to use multiprocessors. Of course CPython has the same
> deficiency. Same with OCaml. Is the ML community trying to do
> anything about this?

Yes. Several MLs support concurrency. Perhaps F# is the most notable in this
case because it would provide an easier path to JIT (to .NET IL).

Concurrent GC seems to be a lot slower though (e.g. Java or .NET vs OCaml or
MLton).

Ramon Diaz-Uriarte

unread,

Dec 12, 2006, 7:44:26 AM12/12/06

to Jon Harrop, pytho...@python.org

On 12/12/06, Jon Harrop <j...@ffconsultancy.com> wrote:
> Paul Rubin wrote:
> > Well, work is already under way (already mentioned) to implement
> > Python in Python, including a reasonable compiler (Psyco).
> >
> > The big deficiency of MLton from a concurrency perspective is
> > inability to use multiprocessors. Of course CPython has the same
> > deficiency. Same with OCaml. Is the ML community trying to do
> > anything about this?
>
> Yes. Several MLs support concurrency. Perhaps F# is the most notable in this
> case because it would provide an easier path to JIT (to .NET IL).
>
> Concurrent GC seems to be a lot slower though (e.g. Java or .NET vs OCaml or
> MLton).
>

Would this be of any interest? (specially since the Lisp vs. Python
thread seems to be dying out :-). Its Scheme (Gambit, so one of the
speedy Schemes) with Erlang-like concurrency

http://toute.ca/
http://lambda-the-ultimate.org/node/841

Best,

R.

> --
> Dr Jon D Harrop, Flying Frog Consultancy
> Objective CAML for Scientists
> http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Konrad Hinsen

unread,

Dec 12, 2006, 8:37:19 AM12/12/06

to Jon Harrop, pytho...@python.org

On Dec 11, 2006, at 14:21, Jon Harrop wrote:

> F# runs under Linux with Mono.

Well, then it should also run on my Mac... Do you have any experience
with performance of numerical code under Mono, or, for that matter,
under .NET? I suspect that the JIT compilers were not written with
number crunching in mind, but perhaps I am wrong.

Jon Harrop

unread,

Dec 12, 2006, 10:38:06 AM12/12/06

to

Konrad Hinsen wrote:
> Well, then it should also run on my Mac... Do you have any experience
> with performance of numerical code under Mono, or, for that matter,
> under .NET? I suspect that the JIT compilers were not written with
> number crunching in mind, but perhaps I am wrong.

Actually, F# seems to be very fast for array-based floating-point
calculations. From the timings I gave earlier in this thread, for example,
it is between C++ and OCaml.

F# (actually .NET) seems to be slower for GC intensive programs, like my ray
tracer. I suspect this is because a concurrent GC is inherently slower. I
haven't benchmarked this in detail but F# seems to be 2-4x slower than
OCaml here.

Filip Wasilewski

unread,

Dec 12, 2006, 11:38:41 AM12/12/06

to

Jon Harrop wrote:
> Filip Wasilewski wrote:
> > Besides of that this code is irrelevant to the original one and your
> > further conclusions may not be perfectly correct. Please learn first
> > about the topic of your benchmark and different variants of wavelet
> > transform, namely difference between lifting scheme and dwt, and try
> > posting some relevant examples or use other topic for your benchmarks.
>
> Your lifting implementation is slightly longer and slower than the
> non-lifting one but its characteristics are identical. For n=2^25 I get:

Jon, both Python and Matlab implementations discussed here use the
lifting scheme, while yours is a classic convolution based approach.
These are two *different* algorithms for computing wavelet transforms.
Although they have similar names and give similar results, it does not
mean you can use them interchangeably in one benchmark! It just does
not make sense.

What's more, taking only very big input 'n' gives only very limited
information about true performance. How do you think who is faster, a
sprinter or a long distance runner? If you want to draw objective
conclusions about algorithm implementation you should measure timing
and memory usage characteristic for different input lengths. This will
also give you some idea about call overhead and possible memory
bandwidth influence. For example, the 4-tap db2 lifting transform
should be about 50% faster than the one using subband filtering. That's
theory. In practice, due to specific memory usage, the speedup gained
with reduction of number of arithmetic operations is easily wasted by
huge cache miss overhead (C implementation, measured on x86
architecture) for arrays which don't fit entirely in the CPU cache. See
my point?

> 1.88s C++ (816 non-whitespace bytes)
> 2.00s OCaml (741 b)
> 2.33s F# (741 b)
> 9.83s Your Python (1,002 b)
>
> The original python was 789 bytes.

Is the byte count a standard measure you apply to describe code
quality? I don't think you would be much happier to see totally
obfuscated golf one-liners.

> > Neglecting this issues, I wonder if it is equally easy to juggle
> > arbitrary multidimensional arrays in a statically typed language and do
> > operations on selected axis directly from the interactive interpreter
> > like in the NumPy example from my other post -
> > http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
> > I don't know much OCaml so it would be great if you could elaborate on
> > this.
>
> It is probably just as easy. Instead of dynamic typing you have parametric
> polymorphism. If you want to make your function generic over arithmetic
> type then you can pass in the arithmetic operators.

Probably? Using the interactive interpreter, how can I easily create
two n-dimensional arrays of arbitrary data type, add them and multiply
by 17?

> >> In this specific context (discrete wavelet transform benchmark), I'd have
> >> said that speed was the most important thing after correctness.
> >
> > Not necessarily, if you are doing research with different kinds of
> > wavelets and need a general, flexible and easily adaptable tool or just
> > the computation speed is not considered a bottleneck.
>
> Brevity is probably next.

Memory usage, development time, cost, portability, scalability,
readability, level of abstraction, usage experience and many many
others, not necessarily in this order. I do not intend getting into
advocacy of any specific language or feature here. I just prefer fair
comparisons and clean use cases instead of marketing like that and I
don't believe anyone will buy your story about OCaml superior brevity
after seeing one fairly irrelevant loop with few arithmetic operations
as a killer argument.

> > Please don't make this discussion heading towards `what is the fastest
> > way of doing d4 transform and why OCaml rules` instead of `what is the
> > optimal way of doing things under given circumstances and still have a
> > free afternoon ;-)`.
>
> Comments like that seem to be based on the fundamental misconception that
> writing C++ like this is hard.

Not harder than writing a loop in bazillion of other languages. When
developing software do you only count time spent on typing because you
are idle during build process?

> Here's my complete OCaml:

...
> and C++:
> #include <vector>
...

You didn't mention your platform and compiler settings. Using vector
at() method instead of plain C/C++ array indexing increases the timings
by factor of ~1.5 to 10 or more (GCC, MSVC), depending on the
optimizations set, so it doesn't seem to be the best possible solution.

I was also unable to run your OCaml example for n >= 2^21 (win32, out
of the box OCaml installation). Is there some 16MB memory limit imposed
on Array?
# let q = Array.make (1 lsl 21) 0.0;;
Exception: Invalid_argument "Array.make".

cheers,
fw

konrad...@laposte.net

unread,

Dec 12, 2006, 4:53:18 PM12/12/06

to Jon Harrop, pytho...@python.org

On 11.12.2006, at 14:21, Jon Harrop wrote:

>> It's not a matter of number, it's a matter of availability when new
>> processors appear on the market. How much time passes on average
>> between the availability of a new processor type and the availability
>> of a native code compiler for OCaml?
>
> OCaml had AMD64 support before many other language. It had a better
> optimiser before gcc...

A concrete example of interest to me: can I get an OCaml-to-native
compiler for an IBM BlueGene? The processor is in the PowerPC family,
but it has some modifications, and the binary format is different
from standard Linux as well.

Jon Harrop

unread,

Dec 13, 2006, 4:05:25 AM12/13/06

to

Filip Wasilewski wrote:
> Jon, both Python and Matlab implementations discussed here use the
> lifting scheme, while yours is a classic convolution based approach.

I've done both in OCaml. The results are basically the same.

> These are two *different* algorithms for computing wavelet transforms.
> Although they have similar names and give similar results, it does not
> mean you can use them interchangeably in one benchmark! It just does
> not make sense.

It makes sense because they solve the same problem. When you're comparing
non-trivial problems between disparate languages you are not likely to use
the same algorithms. Using built-in hash functions is an obvious example.

> What's more, taking only very big input 'n' gives only very limited
> information about true performance.

I've presented times for 2 different "n". I agree that it would be better to
present infinite different "n" but I only had finite time.

> If you want to draw objective
> conclusions about algorithm implementation you should measure timing
> and memory usage characteristic for different input lengths.

I did. The results are unsurprising: multi-pass (Matlab/Python) gets
comparatively slower for bigger n. Memory usage is roughly the same for
different languages.

> This will
> also give you some idea about call overhead and possible memory
> bandwidth influence. For example, the 4-tap db2 lifting transform
> should be about 50% faster than the one using subband filtering. That's
> theory. In practice, due to specific memory usage, the speedup gained
> with reduction of number of arithmetic operations is easily wasted by
> huge cache miss overhead (C implementation, measured on x86
> architecture) for arrays which don't fit entirely in the CPU cache. See
> my point?

No. The effects you are talking about are swamped by the interpreted vs
compiled effect.

>> 1.88s C++ (816 non-whitespace bytes)
>> 2.00s OCaml (741 b)
>> 2.33s F# (741 b)
>> 9.83s Your Python (1,002 b)
>>
>> The original python was 789 bytes.
>
> Is the byte count a standard measure you apply to describe code
> quality?

You can use bytes, lines, words, tokens or just look at the code. Whatever
you do, Python loses on this benchmark in terms of brevity, clarity and
performance.

> I don't think you would be much happier to see totally
> obfuscated golf one-liners.

That doesn't even make sense. Squeezing code onto one line doesn't improve
byte count.

>> It is probably just as easy. Instead of dynamic typing you have
>> parametric polymorphism. If you want to make your function generic over
>> arithmetic type then you can pass in the arithmetic operators.
>
> Probably? Using the interactive interpreter, how can I easily create
> two n-dimensional arrays of arbitrary data type, add them and multiply
> by 17?

In F#:

open Math.Matrix.Generic
let a = init n m (fun i j -> ...)
let b = init n m (fun i j -> ...)
17. $* (a + b)

In OCaml you'd either use an array of arrays and define your own functions
over them, or you'd use the built-in Bigarray.Array2 and define some
functions over that. Either way, you have to use different operator names
for different types. You'd end up with something like:

open Array2
let a = init n m (fun i j -> ...)
let b = init n m (fun i j -> ...)
17. $* (a +: b)

> I just prefer fair
> comparisons and clean use cases instead of marketing like that and I
> don't believe anyone will buy your story about OCaml superior brevity
> after seeing one fairly irrelevant loop with few arithmetic operations
> as a killer argument.

Sure. There are lots of other examples of OCaml's brevity out there:

http://www.ffconsultancy.com/free/ocaml
http://www.ffconsultancy.com/free/sudoku
http://www.ffconsultancy.com/free/maze
http://www.ffconsultancy.com/free/ray_tracer

and now F#:

http://www.ffconsultancy.com/dotnet/fsharp/

>> Here's my complete OCaml:
> ...
>> and C++:
>> #include <vector>
> ...
>
> You didn't mention your platform and compiler settings.

2.2GHz Athlon64 x2 with 2Gb RAM running Debian Linux.

g++ -O2
ocamlopt

> Using vector
> at() method instead of plain C/C++ array indexing increases the timings
> by factor of ~1.5 to 10 or more (GCC, MSVC), depending on the
> optimizations set, so it doesn't seem to be the best possible solution.

Bounds checking is <1% slower with GCC and only 14% slower with MSVC.

> I was also unable to run your OCaml example for n >= 2^21 (win32, out
> of the box OCaml installation). Is there some 16MB memory limit imposed
> on Array?
> # let q = Array.make (1 lsl 21) 0.0;;
> Exception: Invalid_argument "Array.make".

On 32-bit, yes. You'd either have to use a different data structure
(Bigarrays), upgrade or switch to F#.

Jon Harrop

unread,

Dec 13, 2006, 4:46:09 AM12/13/06

to

konrad...@laposte.net wrote:
> A concrete example of interest to me: can I get an OCaml-to-native
> compiler for an IBM BlueGene? The processor is in the PowerPC family,
> but it has some modifications, and the binary format is different
> from standard Linux as well.

No idea. OCaml has quite a good PPC backend for the Mac already so you might
be able to persuade a student to do the conversion as a project.

If Mono runs on that platform then you might also consider F#...

Marc 'BlackJack' Rintsch

unread,

Dec 13, 2006, 8:24:26 AM12/13/06

to

In <457fc2f5$0$8732$ed26...@ptn-nntp-reader02.plus.net>, Jon Harrop
wrote:

>> I don't think you would be much happier to see totally obfuscated golf
>> one-liners.
>
> That doesn't even make sense. Squeezing code onto one line doesn't
> improve byte count.

So you don't count line endings when counting bytes. ;-)

Ciao,
Marc 'BlackJack' Rintsch

Jon Harrop

unread,

Dec 13, 2006, 9:05:11 AM12/13/06

to

Marc 'BlackJack' Rintsch wrote:
> So you don't count line endings when counting bytes. ;-)

You'd probably replace "\n" -> " " so it wouldn't affect the byte count.
Anyway, I think I was using non-whitespace bytes, so neither "\n" nor " "
is counted.

Filip Wasilewski

unread,

Dec 13, 2006, 1:41:51 PM12/13/06

to

Jon Harrop wrote:
> Filip Wasilewski wrote:
> > Jon, both Python and Matlab implementations discussed here use the
> > lifting scheme, while yours is a classic convolution based approach.
>
> I've done both in OCaml. The results are basically the same.

Have you tried taking advantage of the 50% reduction of arithmetic
operations for the lifting scheme?

> > These are two *different* algorithms for computing wavelet transforms.
> > Although they have similar names and give similar results, it does not
> > mean you can use them interchangeably in one benchmark! It just does
> > not make sense.
>
> It makes sense because they solve the same problem. When you're comparing
> non-trivial problems between disparate languages you are not likely to use
> the same algorithms. Using built-in hash functions is an obvious example.

I don't even ask if you have checked the output result coefficients
because I'm pretty sure you have not. They solve similar problem but in
different ways and give a bit different results, for example the
layout of coefficients in the memory is different as well as border
distortion handling. Please don't tell me it's better to do something
else instead of presenting a relevant solution.

By the way, how do you think why there are both implementations
included in Matlab Wavelet Toolbox and there are people paying lots of
money for it?

> > What's more, taking only very big input 'n' gives only very limited
> > information about true performance.
>
> I've presented times for 2 different "n". I agree that it would be better to
> present infinite different "n" but I only had finite time.

A simple loglog plot with 25 or so dyadic lengths would be just fine. I
don't expect you to measure times for non-dyadic lengths because
neither of the solutions presented here so far support this.

[...]

> > This will
> > also give you some idea about call overhead and possible memory
> > bandwidth influence. For example, the 4-tap db2 lifting transform
> > should be about 50% faster than the one using subband filtering. That's
> > theory. In practice, due to specific memory usage, the speedup gained
> > with reduction of number of arithmetic operations is easily wasted by
> > huge cache miss overhead (C implementation, measured on x86
> > architecture) for arrays which don't fit entirely in the CPU cache. See
> > my point?
>
> No. The effects you are talking about are swamped by the interpreted vs
> compiled effect.

Have I already mentioned it was observed with low-level C
implementation on x86? I would really appreciate you took some time on
exploring the unfortunate topic of this benchmark before posting
another comment like that.

> >> 1.88s C++ (816 non-whitespace bytes)
> >> 2.00s OCaml (741 b)
> >> 2.33s F# (741 b)
> >> 9.83s Your Python (1,002 b)
> >>
> >> The original python was 789 bytes.
> >
> > Is the byte count a standard measure you apply to describe code
> > quality?
>
> You can use bytes, lines, words, tokens or just look at the code. Whatever
> you do, Python loses on this benchmark in terms of brevity, clarity and
> performance.

Jon, as I have already said, you did not even show a fully comparable
example yet. What I would really like to see is a lifting scheme
implementation that can take 1- to 10-dimensional arrays (single and
double precision floats, convert on the fly from integer types when
necessary and handle strided, non-contiguous arrays), axis and maximum
decomposition level as input and return list of views (or arrays if
necessary) with transform coefficients for every level of
decomposition. Can do?

> > I don't think you would be much happier to see totally
> > obfuscated golf one-liners.
>
> That doesn't even make sense. Squeezing code onto one line doesn't improve
> byte count.

Yeah, right ;-)

> >> It is probably just as easy. Instead of dynamic typing you have
> >> parametric polymorphism. If you want to make your function generic over
> >> arithmetic type then you can pass in the arithmetic operators.
> >
> > Probably? Using the interactive interpreter, how can I easily create
> > two n-dimensional arrays of arbitrary data type, add them and multiply
> > by 17?
>
> In F#:
>
> open Math.Matrix.Generic
> let a = init n m (fun i j -> ...)
> let b = init n m (fun i j -> ...)
> 17. $* (a + b)
>
> In OCaml you'd either use an array of arrays and define your own functions
> over them, or you'd use the built-in Bigarray.Array2 and define some
> functions over that. Either way, you have to use different operator names
> for different types. You'd end up with something like:
>
> open Array2
> let a = init n m (fun i j -> ...)
> let b = init n m (fun i j -> ...)
> 17. $* (a +: b)

Are you telling I have to define a different operator for every
arithmetic operation for every two types? That definitely does not look
like a quick, flexible and clear solution to start with. Any working
example I could type into OCamlWinPlus of the following would be very
appreciated. I'm fairly new to this and would really like to see how to
handle differently shaped arrays and data types in the interactive mode
without much hassle:

from numpy import ones, arange, reshape, int32
a = ones((2,3,4,5))
b = ones(a.shape, dtype=int32)
c = ones((3,4,5))
d = 2*a + 2.5*(3*b + 3.3)
d[0] = d[0] + 5
d[1] *= c * (2+3*1.2)
d[:,2,...] = reshape(arange(d[:,2,...].size), d[:,2,...].shape)
print d[...,1]

> > I just prefer fair
> > comparisons and clean use cases instead of marketing like that and I
> > don't believe anyone will buy your story about OCaml superior brevity
> > after seeing one fairly irrelevant loop with few arithmetic operations
> > as a killer argument.
>
> Sure. There are lots of other examples of OCaml's brevity out there:
>
> http://www.ffconsultancy.com/free/ocaml
> http://www.ffconsultancy.com/free/sudoku
> http://www.ffconsultancy.com/free/maze
> http://www.ffconsultancy.com/free/ray_tracer

The ray tracer example is really nice, however the sudoku solver is not
so impressive when compared to the K thingy ;-) -
http://markbyers.com/moinmoin/moin.cgi/ShortestSudokuSolver.

[...]

> > I was also unable to run your OCaml example for n >= 2^21 (win32, out
> > of the box OCaml installation). Is there some 16MB memory limit imposed
> > on Array?
> > # let q = Array.make (1 lsl 21) 0.0;;
> > Exception: Invalid_argument "Array.make".
>
> On 32-bit, yes. You'd either have to use a different data structure
> (Bigarrays), upgrade or switch to F#.

That renders the Array type almost useless for me. Where can I find
documentation or tutorial on using Bigarrays?

best,
fw

Jon Harrop

unread,

Dec 14, 2006, 4:59:00 AM12/14/06

to

Paul Rubin wrote:
> Interesting, where do I get it, and is there source? I've never been
> interested in Mono but maybe this is a reason. How does the compiled
> code compare to OCaml or MLton code?

GC intensive code is 2-4x slower than OCaml or 4-8x slower than MLton.
Floating point intensive code is as fast as OCaml and MLton.

Jon Harrop

unread,

Dec 14, 2006, 5:36:37 AM12/14/06

to

Filip Wasilewski wrote:
> Jon Harrop wrote:
>> Filip Wasilewski wrote:
>> > Jon, both Python and Matlab implementations discussed here use the
>> > lifting scheme, while yours is a classic convolution based approach.
>>
>> I've done both in OCaml. The results are basically the same.
>
> Have you tried taking advantage of the 50% reduction of arithmetic
> operations for the lifting scheme?

Yes. The time taken is dominated by memory accesses. The amount of
arithmetic is almost irrelevant.

>> It makes sense because they solve the same problem. When you're comparing
>> non-trivial problems between disparate languages you are not likely to
>> use the same algorithms. Using built-in hash functions is an obvious
>> example.
>
> I don't even ask if you have checked the output result coefficients
> because I'm pretty sure you have not.

I checked the coefficients against other DWTs from the web.

> They solve similar problem but in
> different ways and give a bit different results, for example the
> layout of coefficients in the memory is different as well as border
> distortion handling. Please don't tell me it's better to do something
> else instead of presenting a relevant solution.

I used a more efficient approach because I could. Using a compiled language
gave me that choice. We should compare both approaches in both languages.

>> No. The effects you are talking about are swamped by the interpreted vs
>> compiled effect.
>
> Have I already mentioned it was observed with low-level C
> implementation on x86? I would really appreciate you took some time on
> exploring the unfortunate topic of this benchmark before posting
> another comment like that.

We are talking about this Python implementation, not some "low-level C
implementation on x86". With Python, it is the overhead of the interpreter
that makes it slow and encourages you to obfuscate your code in order to
make it run in a reasonable amount of time.

> What I would really like to see is a lifting scheme
> implementation that can take 1- to 10-dimensional arrays

I believe that is just a case of abstracting get/set.

> (single and double precision floats,

That is also a case of abstracting get/set.

> convert on the fly from integer types when necessary

That is dynamic typing.

> and handle strided, non-contiguous arrays),

That is also an abstraction of get/set.

> axis and maximum decomposition level as input

I don't know what that is.

> and return list of views (or arrays if necessary)

Probably just a closure.

> with transform coefficients for every level of
> decomposition. Can do?

Sure. Sounds like a lot of work though. This would be best done in F#
because you can make your own IList derivatives with get_Item and set_Item
that provide views but look like arrays.

> Are you telling I have to define a different operator for every
> arithmetic operation for every two types?

In OCaml, yes. Not in F#.

> That definitely does not look
> like a quick, flexible and clear solution to start with. Any working
> example I could type into OCamlWinPlus of the following would be very
> appreciated. I'm fairly new to this and would really like to see how to
> handle differently shaped arrays and data types in the interactive mode
> without much hassle:
>
> from numpy import ones, arange, reshape, int32
> a = ones((2,3,4,5))
> b = ones(a.shape, dtype=int32)
> c = ones((3,4,5))
> d = 2*a + 2.5*(3*b + 3.3)
> d[0] = d[0] + 5
> d[1] *= c * (2+3*1.2)
> d[:,2,...] = reshape(arange(d[:,2,...].size), d[:,2,...].shape)
> print d[...,1]

I can't get that to work in Python. What do I need to do?

>>> import numpy
>>> a = ones((2,3,4,5))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'ones' is not defined

> The ray tracer example is really nice, however the sudoku solver is not
> so impressive when compared to the K thingy ;-) -
> http://markbyers.com/moinmoin/moin.cgi/ShortestSudokuSolver.

Well, OCaml does ok on the sudoku but K does awfully on the ray tracer.

>> > I was also unable to run your OCaml example for n >= 2^21 (win32, out
>> > of the box OCaml installation). Is there some 16MB memory limit imposed
>> > on Array?
>> > # let q = Array.make (1 lsl 21) 0.0;;
>> > Exception: Invalid_argument "Array.make".
>>
>> On 32-bit, yes. You'd either have to use a different data structure
>> (Bigarrays), upgrade or switch to F#.
>
> That renders the Array type almost useless for me. Where can I find
> documentation or tutorial on using Bigarrays?

http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bigarray.html

You probably want to use F# though.

sturlamolden

unread,

Dec 14, 2006, 9:37:06 AM12/14/06

to

Jon Harrop wrote:

> Yes. The time taken is dominated by memory accesses. The amount of
> arithmetic is almost irrelevant.

That is extremely interesting. It would explain why I see almost the
same performance in NumPy and Fortran 95 on this kind of task, using
array slicing in both languages.

Robert Kern

unread,

Dec 14, 2006, 9:56:49 AM12/14/06

to pytho...@python.org

Jon Harrop wrote:
> Filip Wasilewski wrote:

>> from numpy import ones, arange, reshape, int32
>> a = ones((2,3,4,5))
>> b = ones(a.shape, dtype=int32)
>> c = ones((3,4,5))
>> d = 2*a + 2.5*(3*b + 3.3)
>> d[0] = d[0] + 5
>> d[1] *= c * (2+3*1.2)
>> d[:,2,...] = reshape(arange(d[:,2,...].size), d[:,2,...].shape)
>> print d[...,1]
>
> I can't get that to work in Python. What do I need to do?
>
>>>> import numpy
>>>> a = ones((2,3,4,5))
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> NameError: name 'ones' is not defined

Use the code that Filip wrote:

from numpy import ones, arange, reshape, int32

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Filip Wasilewski

unread,

Dec 14, 2006, 6:10:39 PM12/14/06

to

Jon Harrop wrote:
> Filip Wasilewski wrote:
> > Jon Harrop wrote:
> >> Filip Wasilewski wrote:
> >> > Jon, both Python and Matlab implementations discussed here use the
> >> > lifting scheme, while yours is a classic convolution based approach.
> >>
> >> I've done both in OCaml. The results are basically the same.
> >
> > Have you tried taking advantage of the 50% reduction of arithmetic
> > operations for the lifting scheme?
>
> Yes. The time taken is dominated by memory accesses. The amount of
> arithmetic is almost irrelevant.

I can say from my experience, that this depends much on the input size
and the CPU cache size. For arrays of about n^18 or less and 2MB cache
the speedup for the straightforward C lifting implementation is around
50%, while all operations are done in-place. For bigger arrays
traversing memory several times becomes too expensive, at least until
the input size is so big that the memory gets swapped and the inplace
method becomes faster again. I suppose that such behavior should be
quite similar for other low-level implementations as well.

Actually I started to wonder if I could automate conversion of
multi-loop lifting schemes [1] into one-pass loop. This could be done
by delaying computation of the "coming next" step just by few
units, until values from the preceding step are ready to use. This
should reduce the random memory access effect for several cases and
make the timings quite stable.

[1] http://en.wikipedia.org/wiki/Lifting_scheme

> >> It makes sense because they solve the same problem. When you're comparing
> >> non-trivial problems between disparate languages you are not likely to
> >> use the same algorithms. Using built-in hash functions is an obvious
> >> example.
> >
> > I don't even ask if you have checked the output result coefficients
> > because I'm pretty sure you have not.
>
> I checked the coefficients against other DWTs from the web.
>
> > They solve similar problem but in
> > different ways and give a bit different results, for example the
> > layout of coefficients in the memory is different as well as border
> > distortion handling. Please don't tell me it's better to do something
> > else instead of presenting a relevant solution.
>
> I used a more efficient approach because I could. Using a compiled language
> gave me that choice. We should compare both approaches in both languages.

I am getting a bit tired of this discussion. I am not arguing which
approach is faster, because it very much depends on the use case. I
just ask you to do fair comparison using the original algorithm
presented here instead of forcing your arguments with the
not-so-relevant variant convenient for you.

Every language (or at least vast majority of them) that has loops and
basic arithmetic gives you such choice, no matter compiled or not.
These are pretty simple algorithms when limited to single D4 case, but
the D4 is not the only wavelet transform known to mankind and there are
situations when you do not have such choice, because some problems just
can't be expressed using the classic approach. If the OCaml is so
superior as you claim, please show me how to do the lifting scheme
(preferably in the
http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57
form with features described in my previous post) so I could actually
learn something new and maybe even start to consider it as a possible
alternative to C-ish language family.

As far as the DWT only is concerned, well, If I were to choose the most
efficient approach (in both the CPU time and the development time
terms) for Python I would use a ready extension instead of developing
the wheel once again. Let me think, that should go something like this:

import pywt
coeffs = pywt.wavedec(x, 'db2', 'per')

> >> No. The effects you are talking about are swamped by the interpreted vs
> >> compiled effect.
> >
> > Have I already mentioned it was observed with low-level C
> > implementation on x86? I would really appreciate you took some time on
> > exploring the unfortunate topic of this benchmark before posting
> > another comment like that.
>
> We are talking about this Python implementation, not some "low-level C
> implementation on x86". With Python, it is the overhead of the interpreter
> that makes it slow and encourages you to obfuscate your code in order to
> make it run in a reasonable amount of time.

I was talking about why you should not limit your benchmarks to the
very single case and gave example how the choice of parameters can
influence the results. It has nothing to do with the interpreted vs.
compiled harangue. Would you argue that some O(n^2) algorithm is always
better than it's O(n) alternative just because it has lower call
overhead and performs better under some limited circumstances?

> > What I would really like to see is a lifting scheme
> > implementation that can take 1- to 10-dimensional arrays
>
> I believe that is just a case of abstracting get/set.
>
> > (single and double precision floats,
>
> That is also a case of abstracting get/set.
>
> > convert on the fly from integer types when necessary
>
> That is dynamic typing.
>
> > and handle strided, non-contiguous arrays),
>
> That is also an abstraction of get/set.

Oh my, are all these abstractions to be put into reality every time
from a scratch?

> > axis and maximum decomposition level as input
>
> I don't know what that is.

When doing 1D operations on a multidimensional array it is often
convenient to be able to specify along which axis these operations
should be performed, for example:

>>> x = numpy.ones((2,4))
>>> x.sum(axis=0)
array([ 2., 2., 2., 2.])
>>> x.sum(axis=1)
array([ 4., 4.])

The maximum level denotes how many decomposition steps should be
performed for multilevel transform. Just like described on
http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavedec.html

> > and return list of views (or arrays if necessary)
>
> Probably just a closure.
>
> > with transform coefficients for every level of
> > decomposition. Can do?
>
> Sure. Sounds like a lot of work though.

See. It's for sure pretty much work to create a general solution in C.
It is a lot easier to focus on the real problem when using Python (and
NumPy or other package from a wide range of available extensions) and
save yourself getting much into implementation details while still
maintaining reasonable performance. That is one of the most important
reasons I prefer to use high-level language for most of my tasks. I
also try to avoid doing premature optimizations that could spare some
CPU cycles, just to see in a few moments that the performance gain is
completely swallowed by I/O overhead on some remote data access point.
For the rest of intensive number crunching, if there is no specialized
extension available yet, I still prefer to use the Python/Pyrex/C or
similar combo to get solution that can be easily integrated with other
parts of a bigger system (like networking, databases, reports, web,
etc. - wow, how many things the transition from C++ and Java few years
ago made just so accessible) or just used to play in the interactive
mode.

> This would be best done in F#
> because you can make your own IList derivatives with get_Item and set_Item
> that provide views but look like arrays.
>
> > Are you telling I have to define a different operator for every
> > arithmetic operation for every two types?
>
> In OCaml, yes. Not in F#.
>
> > That definitely does not look
> > like a quick, flexible and clear solution to start with. Any working
> > example I could type into OCamlWinPlus of the following would be very
> > appreciated. I'm fairly new to this and would really like to see how to
> > handle differently shaped arrays and data types in the interactive mode
> > without much hassle:
> >
> > from numpy import ones, arange, reshape, int32
> > a = ones((2,3,4,5))
> > b = ones(a.shape, dtype=int32)
> > c = ones((3,4,5))
> > d = 2*a + 2.5*(3*b + 3.3)
> > d[0] = d[0] + 5
> > d[1] *= c * (2+3*1.2)
> > d[:,2,...] = reshape(arange(d[:,2,...].size), d[:,2,...].shape)
> > print d[...,1]
>
> I can't get that to work in Python. What do I need to do?

Type exactly as presented above, starting from the very first line. The
`from module import ...` imports chosen identifiers into current
namespace, so you don't have to prefix them with module name.

> >>> import numpy
> >>> a = ones((2,3,4,5))
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> NameError: name 'ones' is not defined
>
> > The ray tracer example is really nice, however the sudoku solver is not
> > so impressive when compared to the K thingy ;-) -
> > http://markbyers.com/moinmoin/moin.cgi/ShortestSudokuSolver.
>
> Well, OCaml does ok on the sudoku but K does awfully on the ray tracer.

That's pretty natural. Different languages are designed to solve
different tasks. I hope your point is not to convince people to use
OCaml to solve (or complicate) every possible problem.

> > Where can I find
> > documentation or tutorial on using Bigarrays?
>
> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bigarray.html

Yes, I found this before but it is not very helpful to start with. Any
doc with basic use cases and examples would be much better. Think of it
as a possible restraint for people trying to learn the language.

> You probably want to use F# though.

On the other hand, I wonder how this integrates with other .net
languages like C# or IronPython.

cheers,
fw

Jon Harrop

unread,

Dec 15, 2006, 11:54:47 AM12/15/06

to

Filip Wasilewski wrote:
> Jon Harrop wrote:
>> Yes. The time taken is dominated by memory accesses. The amount of
>> arithmetic is almost irrelevant.
>
> I can say from my experience, that this depends much on the input size
> and the CPU cache size. For arrays of about n^18 or less and 2MB cache
> the speedup for the straightforward C lifting implementation is around
> 50%, while all operations are done in-place. For bigger arrays
> traversing memory several times becomes too expensive, at least until
> the input size is so big that the memory gets swapped and the inplace
> method becomes faster again. I suppose that such behavior should be
> quite similar for other low-level implementations as well.

With my two OCaml implementations, the lifting scheme is always slower. Here
is n vs t for convolution (i.e. a simple loop):

{{4, 0.0000001},
{8, 0.0000003},
{16, 0.0000006},
{32, 0.0000012},
{64, 0.0000023},
{128, 0.0000046},
{256, 0.0000089},
{512, 0.0000178},
{1024, 0.0000355},
{2048, 0.0000720},
{4096, 0.0001445},
{8192, 0.0002968},
{16384, 0.0005956},
{32768, 0.0012225},
{65536, 0.0029214},
{131072, 0.0062959},
{262144, 0.0128418},
{524288, 0.0259961},
{1048576, 0.0519921},
{2097152, 0.1037343},
{4194304, 0.2064685},
{8388608, 0.4059380}}

and the same for the lifting scheme with deforesting optimisations:

{{4, 0.0000006},
{8, 0.0000013},
{16, 0.0000026},
{32, 0.0000048},
{64, 0.0000091},
{128, 0.0000174},
{256, 0.0000341},
{512, 0.0000681},
{1024, 0.0001373},
{2048, 0.0002736},
{4096, 0.0005483},
{8192, 0.0010936},
{16384, 0.0022086},
{32768, 0.0044017},
{65536, 0.0087643},
{131072, 0.0176380},
{262144, 0.0357446},
{524288, 0.0720515},
{1048576, 0.1449780},
{2097152, 0.2912058},
{4194304, 0.5824115},
{8388608, 1.1698220}}

The results will vary between languages, of course, but from this it looks
like convolution is ~3x faster for 4 <= n <= 2^23.

Can convolution be implemented efficiently in Python?

> Actually I started to wonder if I could automate conversion of
> multi-loop lifting schemes [1] into one-pass loop.

See below.

> This could be done
> by delaying computation of the "coming next" step just by few
> units, until values from the preceding step are ready to use.

Exactly.

Functional programming makes this easy. You just compose closures from
closures instead of arrays from arrays.

I discussed exactly this technique in the context of the travelling salesman
problem in my book "OCaml for Scientists".

>> I used a more efficient approach because I could. Using a compiled
>> language gave me that choice. We should compare both approaches in both
>> languages.
>
> I am getting a bit tired of this discussion. I am not arguing which
> approach is faster, because it very much depends on the use case.

Compiled code is virtually always faster.

> I
> just ask you to do fair comparison using the original algorithm
> presented here instead of forcing your arguments with the
> not-so-relevant variant convenient for you.

This is the third time that I've presented both variants. Now I've presented
detailed timings as well. Where's the other Python implementation?

> Every language (or at least vast majority of them) that has loops and
> basic arithmetic gives you such choice, no matter compiled or not.

I am under the impression that it is prohibitively slow in Python.

> If the OCaml is so superior as you claim,

I'm not claiming that OCaml is superior. I took the original Matlab vs
Python language performance comparison and showed that C++, OCaml and F#
all outperform both Matlab and Python. I'm keen to see what advantages
Python has.

> please show me how to do the lifting scheme
> (preferably in the
> http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57
> form with features described in my previous post) so I could actually
> learn something new and maybe even start to consider it as a possible
> alternative to C-ish language family.

Here's a starter. Your Python translated directly to functional OCaml/F#
that simply constructs new arrays at every step:

let rec d4_aux cA = function
| 0 -> [cA]
| n ->
let n = n/2 in
let f = init n in
let even = f (fun i -> cA.(2*i)) in
let odd = f (fun i -> cA.(2*i + 1)) in
let even = f (fun i -> even.(i) +. c1 *. odd.(i)) in
let odd = f (function
| 0 -> odd.(0) -. c2 *. even.(0) -. c3 *. even.(n-1)
| i -> odd.(i) -. c2 *. even.(i) -. c3 *. even.(i-1)) in
let even = f (function
| i when i=n-1 -> even.(i) -. odd.(0)
| i -> even.(i) -. odd.(i+1)) in
let even = f (fun i -> c4 *. even.(i)) in
let odd = f (fun i -> c5 *. odd.(i)) in
odd :: d4_aux even n;;

Note the use of the "init" function, allocating and filling new arrays at
every step of the computation.

The advantages of the above code are not immediately apparent. However this
is trivially deforested (i.e. multi-pass -> single pass) by accumulating
new closures "even" and "odd" rather than new arrays:

let rec d4_aux cA = function
| 1 -> [cA]
| n ->
let n = n/2 in
let even i = cA.(2*i) in
let odd i = cA.(2*i + 1) in
let even i = even i +. c1 *. odd i in
let odd = function
| 0 -> odd 0 -. c2 *. even 0 -. c3 *. even (n-1)
| i -> odd i -. c2 *. even i -. c3 *. even (i-1) in
let even = function
| i when i = n-1 -> even (i) -. odd (0)
| i -> even (i) -. odd (i+1) in
let even = init n (fun i -> c4 *. even i) in
let odd = init n (fun i -> c5 *. odd i) in
odd :: d4_aux even n;;

The code is almost identical but five of the seven O(n) array allocations
have been replaced with O(1) closure allocations. The change simply
replaces array lookup "a.(i)" with function invocation "a i".

On my machine, your original Python took 2.49s, the forested OCaml above
takes 1.34s and the deforested OCaml takes 1.16s. The deforesting
optimisation is only possible because OCaml is compiled. If you do the same
transformation in Python the program will become vastly slower because the
closures must be interpreted for each array element.

This is what I meant by optimising Python takes you in the wrong direction
(foresting). I think you're only using slice-based rewriting because it is
fast in Python, not because it is a good idea. So, is it worth adding
slicing to F#?

> As far as the DWT only is concerned, well, If I were to choose the most
> efficient approach (in both the CPU time and the development time
> terms) for Python I would use a ready extension instead of developing
> the wheel once again. Let me think, that should go something like this:
>
> import pywt
> coeffs = pywt.wavedec(x, 'db2', 'per')

Is there a difference between this in Python and in any other language?

>>> ...

>> That is also an abstraction of get/set.
>
> Oh my, are all these abstractions to be put into reality every time
> from a scratch?

This entails converting "a.(i)" to "a i", so it isn't a big deal.

>> > axis and maximum decomposition level as input
>>
>> I don't know what that is.
>
> When doing 1D operations on a multidimensional array it is often
> convenient to be able to specify along which axis these operations
> should be performed, for example:
>
>>>> x = numpy.ones((2,4))
>>>> x.sum(axis=0)
> array([ 2., 2., 2., 2.])
>>>> x.sum(axis=1)
> array([ 4., 4.])

I see. This can also be accomplished using functional programming. You have
an array lookup that accepts an array of indices and build a closure that
fills in all of the dimensions except the one you're looping over.

The main difference is probably that I'd statically type the dimensionality
of the arrays. I'll have a play with this.

> The maximum level denotes how many decomposition steps should be
> performed for multilevel transform. Just like described on
> http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavedec.html

What about a lazy list?

>> > with transform coefficients for every level of
>> > decomposition. Can do?
>>
>> Sure. Sounds like a lot of work though.
>
> See. It's for sure pretty much work to create a general solution in C.

I looked at some of the sophisticated wavelet libraries when I did my PhD on
wavelets and they were seriously complicated. Lisp might also be worth
considering because you could easily compile efficient implementations of
transforms that you were interested in.

> It is a lot easier to focus on the real problem when using Python (and
> NumPy or other package from a wide range of available extensions) and
> save yourself getting much into implementation details while still
> maintaining reasonable performance.

I'm not sure, but you might find it better to factor what you're doing using
functional programming constructs. The results will probably be almost as
easy to use whilst being easier to optimise, e.g. by deforesting. If you're
a researcher then I'm sure there's some new work to be done in this area...

> That is one of the most important
> reasons I prefer to use high-level language for most of my tasks. I
> also try to avoid doing premature optimizations that could spare some
> CPU cycles, just to see in a few moments that the performance gain is
> completely swallowed by I/O overhead on some remote data access point.

Absolutely. Good factorisation of your code will let you do that, as well as
leveraging existing libraries as you already are. You will probably miss
numpy if you try OCaml though. I'm keen to hear what you'd miss about numpy
though, as we can add it to F#. :-)

> For the rest of intensive number crunching, if there is no specialized
> extension available yet, I still prefer to use the Python/Pyrex/C or
> similar combo to get solution that can be easily integrated with other
> parts of a bigger system (like networking, databases, reports, web,
> etc. - wow, how many things the transition from C++ and Java few years
> ago made just so accessible) or just used to play in the interactive
> mode.

That makes perfect sense. I think F# could become a better Python but I need
to understand the relative benefits offered by Python first.

>> I can't get that to work in Python. What do I need to do?
>
> Type exactly as presented above, starting from the very first line. The
> `from module import ...` imports chosen identifiers into current
> namespace, so you don't have to prefix them with module name.

Sorry, I mistook the first line for English. :-)

>> Well, OCaml does ok on the sudoku but K does awfully on the ray tracer.
>
> That's pretty natural. Different languages are designed to solve
> different tasks. I hope your point is not to convince people to use
> OCaml to solve (or complicate) every possible problem.

Not at all. I haven't used OCaml for a while now because I'm working with F#
instead. Both languages have a lot of merits but the time is ripe to add
the features that you're touting from Python to F# because it is still
evolving. We've yet to consider Scheme/Lisp for this task...

>> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bigarray.html
>
> Yes, I found this before but it is not very helpful to start with. Any
> doc with basic use cases and examples would be much better. Think of it
> as a possible restraint for people trying to learn the language.

I'll try to find one.

>> You probably want to use F# though.
>
> On the other hand, I wonder how this integrates with other .net
> languages like C# or IronPython.

Seamlessly. C# is very easy (look at my examples, which use C# interfaces
from .NET). FFI to C is also very easy. There is a nice example of
interfacing to LAPACK in the F# distro.

Jon Harrop

unread,

Dec 15, 2006, 11:56:31 AM12/15/06

to

Yes. Non-sequential access is hugely expensive these days, and bounds
checking is virtually free. So that's one less reason to use C/C++... ;-)

sturlamolden

unread,

Dec 15, 2006, 2:53:04 PM12/15/06

to

Jon Harrop wrote:

> Yes. Non-sequential access is hugely expensive these days, and bounds
> checking is virtually free. So that's one less reason to use C/C++... ;-)

The lifiting scheme is sequential.

sturlamolden

unread,

Dec 15, 2006, 3:07:55 PM12/15/06

to

Jon Harrop wrote:

> Can convolution be implemented efficiently in Python?

numpy.convolve

> Functional programming makes this easy. You just compose closures from
> closures instead of arrays from arrays.

Indeed. But learning yet another language is too much work.

> This is what I meant by optimising Python takes you in the wrong direction
> (foresting). I think you're only using slice-based rewriting because it is
> fast in Python, not because it is a good idea. So, is it worth adding
> slicing to F#?

If you only think in times of run-time speed, probably not. But it has
other advantages, particularly ease of programming.

> That makes perfect sense. I think F# could become a better Python but I need
> to understand the relative benefits offered by Python first.

When I look at F# or Lisp I see line noise, even though I know the
syntax in theory. I don't get a mental image of what the code does. I
spend 2/3 of the time reading code when I write a program.

Second, Python is very easy to write, and complicated problems can be
handled with few lines of code (compared to e.g. C).