78 views

Skip to first unread message

Nov 16, 2006, 4:09:03 PM11/16/06

to

Boris wrote:

> Hi, is there any alternative software for Matlab? Although Matlab is

> powerful & popular among mathematical & engineering guys, it still

> costs too much & not publicly open. So I wonder if there's similar

> software/lang that is open & with comparable functionality, at least

> for numerical aspect. Thanks!

> Hi, is there any alternative software for Matlab? Although Matlab is

> powerful & popular among mathematical & engineering guys, it still

> costs too much & not publicly open. So I wonder if there's similar

> software/lang that is open & with comparable functionality, at least

> for numerical aspect. Thanks!

I have used Matlab for years, and has recently changed to Python. In

addition one needs NumPy and Matplotlib, and perhaps SciPy. Other

useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for

3D graphics, Mpi4Py for parallel computation, etc. You will find python

packages for nearly any conceivable task. Unlike Matlab, Python is a

general purpose programming language, and a distant cousin of Lisp. The

Python language is fare more expressive and productive than Matlab, yet

even more easy to use.

The NumPy package is the core requirement for numerical work in Python.

It is quite different form Matlab, but I think it is more powerful.

Particularly, arrays are passed by reference (not by value), and

indexing creates view matrices.

To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet

transform. I have here coded it in Matlab and Python/NumPy using Tim

Swelden's lifting scheme.

First the Matlab version (D4_Transform.m):

function x = D4_Transform(x)

% D4 Wavelet transform in Matlab

% (C) Sturla Molden

C1 = 1.7320508075688772;

C2 = 0.4330127018922193;

C3 = -0.066987298107780702;

C4 = 0.51763809020504137;

C5 = 1.9318516525781364;

s1 = zeros(ceil(size(x)/2));

d1 = zeros(ceil(size(x)/2));

d2 = zeros(ceil(size(x)/2));

odd = x(2:2:end);

even = x(1:2:end-1);

d1(:) = odd - C2*even;

s1(1) = even(1) + C2*d1(1) + C3*d1(end);

s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);

d2(1) = d1(1) + s1(end);

d2(2:end) = d1(2:end) + s1(1:end-1);

x(1:2:end-1) = C4*s1;

x(2:2:end) = C5*d2;

if (length(x) >2)

x(1:2:end-1) = D4_Transform(x(1:2:end-1));

end

Then the Python version (D4.py):

import numpy

import time

def D4_Transform(x, s1=None, d1=None, d2=None):

"""

D4 Wavelet transform in NumPy

(C) Sturla Molden

"""

C1 = 1.7320508075688772

C2 = 0.4330127018922193

C3 = -0.066987298107780702

C4 = 0.51763809020504137

C5 = 1.9318516525781364

if d1 == None:

d1 = numpy.zeros(x.size/2)

s1 = numpy.zeros(x.size/2)

d2 = numpy.zeros(x.size/2)

odd = x[1::2]

even = x[:-1:2]

d1[:] = odd[:] - C1*even[:]

s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

d2[0] = d1[0] + s1[-1]

d2[1:] = d1[1:] + s1[:-1]

even[:] = C4 * s1[:]

odd[:] = C5 * d2[:]

if x.size > 2:

D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

if __name__ == "__main__":

x = numpy.random.rand(2**23)

t0 = time.clock()

D4_Transform(x)

t = time.clock()

print "Elapsed time is %.6f seconds" % (t-t0)

Now let's do benchmark on my laptop (1.73 GHz Pentium M, 0.99 GB RAM).

I have stopped paying for Matlab maintenance (for reasons that will be

obvious), so I only have R14 Service Pack 2 for comparison.

First we try Matlab (R14 Service Pack 2):

>> x = rand(2^23,1);

>> tic; D4_Transform(x); toc

Elapsed time is 27.145438 seconds.

Then we Python 2.4.4 with NumPy 1.0:

C:\develop\python\D4>python D4.py

Elapsed time is 3.364887 seconds

That is quite astonishing.

If anyone wonders why I think Travis Oliphant and the NumPy team should

be knighted, then this is the answer. The Mathworks' product only

achieved 100% * 3/27 = 11% the speed of Python/NumPy, and is infinitely

more expensive.

Does anyone wonder why I am not paying for Matlab maintenance anymore?

Sorry Mathworks, I have used your product for years, but you cannot

compete with NumPy.

Cheers,

Sturla Molden

Nov 16, 2006, 4:46:13 PM11/16/06

to

Bill Gates will have you jailed! :-)

On a more serious note, is there any alternative to Simulink though?

Nov 16, 2006, 5:23:07 PM11/16/06

to sturlamolden, pytho...@python.org

R (http://cran.r-project.org) might be an alternative, specially if

you do a lot of statistics and graphics. (R is probably the most

widely used language/system in statistical research).

you do a lot of statistics and graphics. (R is probably the most

widely used language/system in statistical research).

R.

> --

> http://mail.python.org/mailman/listinfo/python-list

>

--

Ramon Diaz-Uriarte

Statistical Computing Team

Structural Biology and Biocomputing Programme

Spanish National Cancer Centre (CNIO)

http://ligarto.org/rdiaz

Nov 16, 2006, 5:55:38 PM11/16/06

to

In article <1163711343.8...@h48g2000cwc.googlegroups.com>,

sturlamolden <sturla...@yahoo.no> wrote:

>Boris wrote:

>> Hi, is there any alternative software for Matlab? Although Matlab is

>> powerful & popular among mathematical & engineering guys, it still

>> costs too much & not publicly open. So I wonder if there's similar

>> software/lang that is open & with comparable functionality, at least

>> for numerical aspect. Thanks!

>

>I have used Matlab for years, and has recently changed to Python. In

.

.

.

While I frequently push Matlab users toward Python, they also deserve

to know about Octave <URL:

http://www-128.ibm.com/developerworks/library/l-oslab/?n-l-10242 >.

sturlamolden <sturla...@yahoo.no> wrote:

>Boris wrote:

>> Hi, is there any alternative software for Matlab? Although Matlab is

>> powerful & popular among mathematical & engineering guys, it still

>> costs too much & not publicly open. So I wonder if there's similar

>> software/lang that is open & with comparable functionality, at least

>> for numerical aspect. Thanks!

>

>I have used Matlab for years, and has recently changed to Python. In

.

.

While I frequently push Matlab users toward Python, they also deserve

to know about Octave <URL:

http://www-128.ibm.com/developerworks/library/l-oslab/?n-l-10242 >.

Nov 16, 2006, 6:37:48 PM11/16/06

to

Boris wrote:

> Hi, is there any alternative software for Matlab? Although Matlab is

> powerful & popular among mathematical & engineering guys, it still

> costs too much & not publicly open. So I wonder if there's similar

> software/lang that is open & with comparable functionality, at least

> for numerical aspect. Thanks!

> Hi, is there any alternative software for Matlab? Although Matlab is

> powerful & popular among mathematical & engineering guys, it still

> costs too much & not publicly open. So I wonder if there's similar

> software/lang that is open & with comparable functionality, at least

> for numerical aspect. Thanks!

There is also Scilab. I've only used it a tiny bit but for the task it

worked well. I do know that it has a somewhat restrictive license. It

is open source, but you aren't allowed to modify and redistribute the

source. I get the feeling that some people avoid Scilab but I'm not

sure why. If anybody knows the reason I'd be happy to hear it.

Scilab website: http://www.scilab.org

-Matt

Nov 16, 2006, 6:47:09 PM11/16/06

to pytho...@python.org

Matimus wrote:

> There is also Scilab. I've only used it a tiny bit but for the task it

> worked well. I do know that it has a somewhat restrictive license. It

> is open source, but you aren't allowed to modify and redistribute the

> source. I get the feeling that some people avoid Scilab but I'm not

> sure why. If anybody knows the reason I'd be happy to hear it.

> There is also Scilab. I've only used it a tiny bit but for the task it

> worked well. I do know that it has a somewhat restrictive license. It

> is open source, but you aren't allowed to modify and redistribute the

> source. I get the feeling that some people avoid Scilab but I'm not

> sure why. If anybody knows the reason I'd be happy to hear it.

I think you just stated it.

--

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma

that is made terrible by our own mad attempt to interpret it as though it had

an underlying truth."

-- Umberto Eco

Nov 17, 2006, 1:51:02 PM11/17/06

to Matimus, scipy...@scipy.org, pytho...@python.org

Matimus wrote:

> Boris wrote:

> > Hi, is there any alternative software for Matlab? Although Matlab is

> > powerful & popular among mathematical & engineering guys, it still

> > costs too much & not publicly open. So I wonder if there's similar

> > software/lang that is open & with comparable functionality, at least

> > for numerical aspect. Thanks!

>

> There is also Scilab.

> Boris wrote:

> > Hi, is there any alternative software for Matlab? Although Matlab is

> > powerful & popular among mathematical & engineering guys, it still

> > costs too much & not publicly open. So I wonder if there's similar

> > software/lang that is open & with comparable functionality, at least

> > for numerical aspect. Thanks!

>

> There is also Scilab.

> If anybody knows the reason I'd be happy to hear it.

I feel like I need to respond on this thread, as a recent Matlab/Octave/Scilab to

Python convert. I'm cross-posting this to the scipy list, because there was a recent

thread about advantages to Python over Matlab.

Bottom line: Python is more than a replacement for Matlab, in almost every capacity.

There are a couple of places here and there where I feel Matlab still has an

advantage, but since I converted about 12 months ago, I have run Matlab only a couple

of times, and only to send something to someone who didn't have Python installed.

I've been using Matlab for more than 10 years, in every part of my research and

teaching. When I started to teach courses with a larger Matlab component, I looked

for free/cheap alternatives that the students could use on their own machines. There

was the student version of Matlab, which was crippled in ways unacceptable to me.

Octave was good, and I used it a couple of years, but the MS Windows port is pretty

horrible. I still use it some on Linux, but on Windows it requires Cygwin to be

installed, and performance takes a *huge* hit with the compiled extensions in

Windows. The community is pretty amazing, and the main author John Eaton is nothing

short of incredible (I've personally received bug fix responses within minutes of

submitting the bug report!). Writing C++ extensions is also pretty easy in Octave,

and I made a lot of use of that. The graphics uses a connection to Gnuplot, which is

ok.

I used Scilab for a short while. It did install more cleanly in Windows, and the

performance was generally better than Octave. I didn't like how some things were not

Matlab compatible for no good reason. I got the feeling from the community that it

wasn't as open, and that decisions were somewhat arbitrary. GUI syntax was weird,

and compiling extensions was a real pain, requiring more than one file per function.

I was introduced to python about 12 months ago, and haven't looked back since. :)

I want to state some of the hurdles that I felt in the transition, in the hope that

it will help others to overcome them. The advantages to Python far outweigh the

problems that I had.

1) in Matlab, you can call a function without any execfile() or import statements.

This impacts Python negatively in the ease of using it interactively. On the other

hand, the one-function per file organization of Matlab can get a little hard to follow.

2) following up on point 1), this has an effect on matplotlib more drastically.

Turning it to interactive mode (ion()) helps, but in Linux I get window refresh

issues with this mode. Switching to ipython for the shell, making sure to call it

with ipython -pylab, has been a big help in this regard.

3) 3D plotting requires yet-another library. luckily I haven't had to use this much,

but I hope that someday that it will be part of matplotlib.

4) GUI development is a bit easier in Matlab for small projects, and harder for

larger ones. I like using wax in python, which wraps wxPython. Others swear by

Dabo, but I always get two windows popping up when I run even the hello world daboui

program. At any rate, wxPython looks much better on Windows than the same interface

in Matlab, and it is more robust across operating systems.

5) although stated in the python docs that python is easy to interface with C, and in

the scipy docs that it is easier to interface to C than Matlab, I cannot agree.

perhaps it is my limitation, but I found that the API was more complicated (because

of the multiple data structures), and having to keep track of references was

something I never had to do in writing Matlab extensions. HOWEVER, once I found

Pyrex, all of that changed. Pyrex is simply the best way to write an extension that

I have ever seen, and I have never seen its equal for any other language.

So my recommendation for a (nearly) complete Matlab replacement would be:

python

numpy

scipy

matplotlib

pyrex

There is a bit of a learning curve, but it is well worth it. On an amusing note, if

you work in Python for a while, and go back to Matlab for a bit, you will find that

you will forget your semi-colons at the ends of the Matlab lines, and all of your

arrays will display and scroll like crazy on your screen. :)

thanks all for making such a great set of tools!

Brian Blais

--

-----------------

bbl...@bryant.edu

http://web.bryant.edu/~bblais

Nov 17, 2006, 2:17:49 PM11/17/06

to

sturlamolden wrote:

> Boris wrote:

> > Hi, is there any alternative software for Matlab? Although Matlab is

> > powerful & popular among mathematical & engineering guys, it still

> > costs too much & not publicly open. So I wonder if there's similar

> > software/lang that is open & with comparable functionality, at least

> > for numerical aspect. Thanks!

>

> I have used Matlab for years, and has recently changed to Python. In

> addition one needs NumPy and Matplotlib, and perhaps SciPy. Other

> useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for

> 3D graphics, Mpi4Py for parallel computation, etc. You will find python

> packages for nearly any conceivable task. Unlike Matlab, Python is a

> general purpose programming language, and a distant cousin of Lisp. The

> Python language is fare more expressive and productive than Matlab, yet

> even more easy to use.

> Boris wrote:

> > Hi, is there any alternative software for Matlab? Although Matlab is

> > powerful & popular among mathematical & engineering guys, it still

> > costs too much & not publicly open. So I wonder if there's similar

> > software/lang that is open & with comparable functionality, at least

> > for numerical aspect. Thanks!

>

> I have used Matlab for years, and has recently changed to Python. In

> addition one needs NumPy and Matplotlib, and perhaps SciPy. Other

> useful packages are PyGTK for GUI, PyGame for multimedia, PyOpenGL for

> 3D graphics, Mpi4Py for parallel computation, etc. You will find python

> packages for nearly any conceivable task. Unlike Matlab, Python is a

> general purpose programming language, and a distant cousin of Lisp. The

> Python language is fare more expressive and productive than Matlab, yet

> even more easy to use.

Huh, so you don't think it's just damn sexy to do OOP, networking and

databases by multiplying matrices?;-)

> The NumPy package is the core requirement for numerical work in Python.

> It is quite different form Matlab, but I think it is more powerful.

> Particularly, arrays are passed by reference (not by value), and

> indexing creates view matrices.

For those interested in drilling down on this topic there is a Numpy

for Matlab Users guide at http://scipy.com/NumPy_for_Matlab_Users

> To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet

> transform. I have here coded it in Matlab and Python/NumPy using Tim

> Swelden's lifting scheme.

[...]

Actually you have not. The algorithm you presented gives completely

wrong results. Have a look at quick&dirty(TM) implementation bellow.

import math

import numpy

def d2_lwt(x, axis=-1, level=None, copy=False):

"""

Daubechies2 (4 point support) Lifting Wavelet Transform in NumPy

"""

C1 = 1.7320508075688772 # sqrt(3)

C2 = 0.4330127018922193 # sqrt(3)/4

C3 = -0.066987298107780702 # (sqrt(3)-2)/4)

C4 = 0.51763809020504137 # (sqrt(3)-1)/sqrt(2)

C5 = 1.9318516525781364 # (sqrt(3)+1)/sqrt(2)

if not isinstance(x, numpy.ndarray) or x.dtype.kind != 'f':

x = numpy.array(x, dtype=numpy.float64)

elif copy:

x = x.copy()

max_level = int(math.floor(math.log(x.shape[axis],2)))

if level is None:

level = max_level

else:

assert level <= max_level, "Level param too high"

coeffs = []

cA = x.swapaxes(0, axis)

while level > 0:

level -= 1

# lazy

even = cA[::2]

odd = cA[1::2]

# dual

# using `even = even + C1*odd` may speed up things on

# some machines (only for not in-place transform).

even += C1*odd

# primal

odd[0] -= C2*even[0] + C3*even[-1]

odd[1:] -= C2*even[1:] + C3*even[:-1]

# dual

even[:-1] -= odd[1:]

even[-1] -= odd[0]

# scale

even *= C4

odd *= C5

cA, cD = even, odd

coeffs.append(cD.swapaxes(axis, 0))

coeffs.append(cA.swapaxes(axis, 0))

coeffs.reverse()

return coeffs

if __name__ == "__main__":

d = [1,5,3,2,4,8,5,2]

data = [

numpy.array(d, dtype=numpy.float32),

numpy.array(d, dtype=numpy.float64),

numpy.array([d]*2, dtype=numpy.float64),

numpy.array([d]*2, dtype=numpy.float32),

numpy.array(d, dtype=numpy.int32),

[d]*2,

[[d,d]]*3,

]

for i,x in enumerate(data):

print "Case %d:" % (i+1)

print "x in:\n", x

coeffs = d2_lwt(x, axis=-1, level=None, copy=False)

print "coeffs:"

for c in coeffs:

print c, c.dtype

print "x out:\n", x

print

Please excuse me not including Matlab version here, I would like to

have a free weekend.

As far as the speed comparison is concerned I totally agree that NumPy

can easily outperform Matlab in most cases. Of course one can use

compiled low-level extensions to speed up specific computations in

Matlab, but it's a lot easier and/or cheaper to find very good tools

for Python.

> If anyone wonders why I think Travis Oliphant and the NumPy team should

> be knighted, then this is the answer.

Would love to see the ceremony:)

cheers,

fw

Nov 17, 2006, 2:38:02 PM11/17/06

to

On Nov 16, 10:46 pm, "John Henry" <john106he...@hotmail.com> wrote:

> Bill Gates will have you jailed! :-)

>

> On a more serious note, is there any alternative to Simulink though?

Ptolemy II. Java stuff in the core but components may be written in

Python

http://ptolemy.eecs.berkeley.edu/ptolemyII/

http://ptolemy.eecs.berkeley.edu/ptolemyii/ptII5.0/ptII/doc/codeDoc/ptolemy/actor/lib/python/PythonScript.htm

Nov 17, 2006, 4:42:47 PM11/17/06

to

Thanks for pointing that out. I wasn't aware of this. Will take a

look.

look.

Nov 17, 2006, 7:15:48 PM11/17/06

to

sturlamolden wrote:

> Sorry Mathworks, I have used your product for years, but you cannot

> compete with NumPy.

Funny. I went exactly the other way. Had a full OO postprocessing library

for Python/Scipy/HDF etc which worked brilliantly. Then changed to a 64 bit

machine and spent three days trying to install all the additional libraries

that I had become dependent on over the year. In the end, I gave up

completely frustrated (something to do with a Fortran compiler in

combination with SciPy or something). Then I tried MATLAB and was

completely delighted that there was a program with all MY batteries

included :-).

BTW, I have the impression that MATLAB and SciPy have about the same

performance.

Cheers, Maarten

--

===================================================================

Maarten van Reeuwijk dept. of Multiscale Physics

Phd student Faculty of Applied Sciences

maarten.vanreeuwijk.net Delft University of Technology

Nov 18, 2006, 5:39:31 AM11/18/06

to SciPy Users List, pytho...@python.org, Matimus

>>>>> "Brian" == Brian Blais <bbl...@bryant.edu> writes:

Brian> 3) 3D plotting requires yet-another library. luckily I

Brian> haven't had to use this much, but I hope that someday that

Brian> it will be part of matplotlib.

I'd rather not say anything about this since I have strong opinions

about the subject.

Brian> 4) GUI development is a bit easier in Matlab for small

Brian> projects, and harder for larger ones. I like using wax in

Brian> python, which wraps wxPython. Others swear by Dabo, but I

Brian> always get two windows popping up when I run even the hello

Brian> world daboui program. At any rate, wxPython looks much

Brian> better on Windows than the same interface in Matlab, and it

Brian> is more robust across operating systems.

You should try and take a look at enthought.traits. It makes creating

UIs a lot easier than anything else I've seen.

cheers,

prabhu

Nov 18, 2006, 1:57:00 PM11/18/06

to

Filip Wasilewski wrote:

> Actually you have not. The algorithm you presented gives completely

> wrong results. Have a look at quick&dirty(TM) implementation bellow.

God grief. I followed the implementation in Ingrid Daubechies' and Wim

Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:

247-269, 1998). If you look at the factorized polyphase matrix for D4

(which gives the inverse transform), their implementation of the

lifting steps the original paper is incorrect. That is scary.

A free PDF or PS of the paper is available here:

http://cm.bell-labs.com/who/wim/papers/factor/index.html

Thanks a lot!

Sturla Molden

Nov 18, 2006, 2:45:27 PM11/18/06

to

sturlamolden wrote:

> God grief. I followed the implementation in Ingrid Daubechies' and Wim

> Sweldens' original wavelet lifting paper (J. Fourier Anal. Appl., 4:

> 247-269, 1998). If you look at the factorized polyphase matrix for D4

> (which gives the inverse transform), their implementation of the

> lifting steps the original paper is incorrect. That is scary.

>

> A free PDF or PS of the paper is available here:

>

> http://cm.bell-labs.com/who/wim/papers/factor/index.html

Actually its more complicated than that. The PS and the PDF files are

different, and only the PS-file is confused. Here is the correct

explanation:

The factorization of the polyphase matrix is not unique. There are

several valid factorizations. Our implementations corresponds to

different factorizations of the analysis and synthesis poyphase

matrices, and both are in a sence correct, although numerically

different.

Nov 19, 2006, 12:18:09 PM11/19/06

to

sturlamolden wrote:

> def D4_Transform(x, s1=None, d1=None, d2=None):

> """

> D4 Wavelet transform in NumPy

> (C) Sturla Molden

> """

> C1 = 1.7320508075688772

> C2 = 0.4330127018922193

> C3 = -0.066987298107780702

> C4 = 0.51763809020504137

> C5 = 1.9318516525781364

> if d1 == None:

> d1 = numpy.zeros(x.size/2)

> s1 = numpy.zeros(x.size/2)

> d2 = numpy.zeros(x.size/2)

> odd = x[1::2]

> even = x[:-1:2]

> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1] #typ0

> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1] #typo

> d2[0] = d1[0] + s1[-1]

> d2[1:] = d1[1:] + s1[:-1]

> even[:] = C4 * s1[:]

> odd[:] = C5 * d2[:]

> if x.size > 2:

> D4_Transform(even,s1[0:even.size/2],

> d1[0:even.size/2],d2[0:even.size/2])

Actually, there was a typo in the original code. I used d1[l-1] where I

should have used d1[l+1]. Arrgh. Here is the corrected version, the

Matlab code must be changed similarly. It has no relevance for the

performance timings though.

def D4_Transform(x, s1=None, d1=None, d2=None):

"""

D4 Wavelet transform in NumPy

(C) Sturla Molden

"""

C1 = 1.7320508075688772

C2 = 0.4330127018922193

C3 = -0.066987298107780702

C4 = 0.51763809020504137

C5 = 1.9318516525781364

if d1 == None:

d1 = numpy.zeros(x.size/2)

s1 = numpy.zeros(x.size/2)

d2 = numpy.zeros(x.size/2)

odd = x[1::2]

even = x[:-1:2]

d1[:] = odd[:] - C1*even[:]

s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]

s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]

Nov 19, 2006, 3:28:56 PM11/19/06

to

sturlamolden wrote:

[...]

> Here is the correct explanation:

>

> The factorization of the polyphase matrix is not unique. There are

> several valid factorizations. Our implementations corresponds to

> different factorizations of the analysis and synthesis poyphase

> matrices, and both are in a sence correct, although numerically

> different.

[...]

> Here is the correct explanation:

>

> The factorization of the polyphase matrix is not unique. There are

> several valid factorizations. Our implementations corresponds to

> different factorizations of the analysis and synthesis poyphase

> matrices, and both are in a sence correct, although numerically

> different.

Please correct me if I'm missing something but I'm pretty sure that

choice of polyphase matrix factorization for given wavelet filter bank

has influence only on speed and numerical stability of computation and

not on the result itself. Particularly it should be possible to

reconstruct analysis and synthesis filters from polyphase matrices

regardless of the chosen factorization and both the discrete wavelet

transform and the wavelet lifting scheme should give corresponding

results for chosen wavelet (one can be rewritten in the form of other).

cheers,

fw

Nov 19, 2006, 3:33:27 PM11/19/06

to

If you swap C4 with C5 then our solutions give identical results and

both match the classic dwt approach:

even[:] = C5 * s1[:]

odd[:] = C4 * d2[:]

> if x.size > 2:

> D4_Transform(even,s1[0:even.size/2],

> d1[0:even.size/2],d2[0:even.size/2])

cheers,

fw

Nov 23, 2006, 8:18:02 PM11/23/06

to

Brian Blais wrote:

> So my recommendation for a (nearly) complete Matlab replacement would be:

> python

> numpy

> scipy

> matplotlib

> pyrex

>

Brian,

Thanks for that list. I'm currently in the process of getting quotes

for a bunch of Matlab tools for hardware-in-the-loop simulation. Big

bucks.

I'd love to use Python, but I'm not comfortable with the hardware side

of that. I'm certain that most, if not all data acquisition hardware

comes with DLL drivers, which I could interface with using ctypes. I'm

concerned though about spending more time messing around with the

hardware interface than it's worth.

Do you have any experience with this side of the Matlab replacement

question? How about anyone else? Any recommendations?

Thanks,

Phil

Nov 25, 2006, 11:57:30 AM11/25/06

to Phil Schmidt, pytho...@python.org

Phil Schmidt wrote:

>

> I'd love to use Python, but I'm not comfortable with the hardware side

> of that. I'm certain that most, if not all data acquisition hardware

> comes with DLL drivers, which I could interface with using ctypes. I'm

> concerned though about spending more time messing around with the

> hardware interface than it's worth.

>

> Do you have any experience with this side of the Matlab replacement

> question? How about anyone else? Any recommendations?

>

>

> I'd love to use Python, but I'm not comfortable with the hardware side

> of that. I'm certain that most, if not all data acquisition hardware

> comes with DLL drivers, which I could interface with using ctypes. I'm

> concerned though about spending more time messing around with the

> hardware interface than it's worth.

>

> Do you have any experience with this side of the Matlab replacement

> question? How about anyone else? Any recommendations?

>

Unfortunately, no I don't have experience with the hardware end of things. One can,

both with c-types and with pyrex, very easily use existing C libraries so if there is

a C-API, that shouldn't be a problem.

Sorry for not being much help!

bb

Nov 25, 2006, 8:29:22 PM11/25/06

to

Phil Schmidt wrote:

> Thanks for that list. I'm currently in the process of getting quotes

> for a bunch of Matlab tools for hardware-in-the-loop simulation. Big

> bucks.

Yup, and better spent elsewhere...

> I'd love to use Python, but I'm not comfortable with the hardware side

> of that. I'm certain that most, if not all data acquisition hardware

> comes with DLL drivers, which I could interface with using ctypes. I'm

> concerned though about spending more time messing around with the

> hardware interface than it's worth.

Usually you have to mess equally much with Matlab, writing CMEX

interfaces between the DLLs and Matlab. And afterwards you get the

headache of Matlab's single-threaded environment and horrible

pass-by-value sematics.

> Do you have any experience with this side of the Matlab replacement

> question? How about anyone else? Any recommendations?

If you are afraid of doing some C coding or using ctypes, I'd say go

for LabView. When it comes to data acquisition, LabView is far superior

to Matlab. And data acquisition hardware usually comes with LabView

drivers ready to run. LabView can also compile programs that you can

run on real-time OS'es and common DSP chips, so you will not need to

port your code to these targets if you need hard real-time constraints

in your system.

First find out what drivers or APIs are supplied with the hardware.

Then do the decision of which language to use - including Python,

Matlab, LabView, Java, C# .NET, C or C++. I would in any case get a

quote for LabView as well, i does not cost you anything just to obtain

the quote. Generally, development time is far more expensive than

licenses, even with the ultra-expensive Matlab and LabView software.

Using Python just for the sake of using Python is silly.

Nov 26, 2006, 12:23:30 AM11/26/06

to

sturlamolden wrote:

> Using Python just for the sake of using Python is silly.

Well, that kind of gets right to my point. Does the "added" effort with

Python to interface with data acquisition hardware really result in

less productivity? I am very familiar with Matlab, Labview, and Python,

and frankly, Python is the most productive and powerful programming

language of the three. But it's the hardware compatibility thing that

concerns me with Python.

It seems the answers are hard to come by on this issue. I sure would be

willing to give it a try, except that I'm getting paid to get a job

done, not to tinker around with Python and DAQ hardware. But if

tinkering around could save my project money on commercial software,

and still get it done on schedule, it would be a big win. I just don't

have confidence that it would.

Phil

Nov 26, 2006, 12:52:19 AM11/26/06

to pytho...@python.org

Phil Schmidt wrote:

> sturlamolden wrote:

>

>> Using Python just for the sake of using Python is silly.

>

> Well, that kind of gets right to my point. Does the "added" effort with

> Python to interface with data acquisition hardware really result in

> less productivity? I am very familiar with Matlab, Labview, and Python,

> and frankly, Python is the most productive and powerful programming

> language of the three. But it's the hardware compatibility thing that

> concerns me with Python.

> sturlamolden wrote:

>

>> Using Python just for the sake of using Python is silly.

>

> Well, that kind of gets right to my point. Does the "added" effort with

> Python to interface with data acquisition hardware really result in

> less productivity? I am very familiar with Matlab, Labview, and Python,

> and frankly, Python is the most productive and powerful programming

> language of the three. But it's the hardware compatibility thing that

> concerns me with Python.

Without knowing the hardware that you want to use, it's quite difficult for us

to tell you anything useful. Most likely, you're simply going to have to sit

down with a list of the hardware that you want to use, the existing solutions

for interfacing with that hardware (e.g. already written MATLAB, LabView, or

Python extension modules for that hardware), and their costs. You're going to

have to weigh those costs against the cost of using ctypes or something similar

to wrap the raw DLLs if a Python extension module isn't available.

We might be able to help you locate the already-existing Python wrappers for

your hardware, but the rest are judgement calls that only you can make. To make

an accurate judgement, you will probably need to have a little bit of experience

trying to wrap one such raw DLL. If you can't afford the time to do that one

experiment, then you probably have your answer.

Nov 26, 2006, 12:43:48 PM11/26/06

to

Phil Schmidt wrote:

> Well, that kind of gets right to my point. Does the "added" effort with

> Python to interface with data acquisition hardware really result in

> less productivity? I am very familiar with Matlab, Labview, and Python,

> and frankly, Python is the most productive and powerful programming

> language of the three. But it's the hardware compatibility thing that

> concerns me with Python.

This is really a question that has no general answer. It depends on the

specific hardware and the solutions available.

Generally, Matlab and Python sucks equally much when it comes to

interfacing with hardware. Both tycan use C extensions (which you may

have to write) or ctypes (Matlab's equivalent is the loadlibrary/callib

functions). If only a C SDK is available, SWIG can autogenerate Python

wrappers (require some tweking). Matlab's loadlibrary can sometimes

parse C header files, but it is not always successful.

Both Matlab and Python can use COM/ActiveX objects on Windows. In

addition, Matlab can use Java classes, with which Python does not

interface easily. On the other hand, Python can use .NET objects, if

you use Microsoft's IronPython or "Python for .NET" with classic

Python. Last time I checked, Matlab could not use .NET objects, but

that was a year ago.

Most data acquisition hardware are supplied with SDKs for C, and

perhaps COM/ActiveX objects for RAD tools like Visual Basic and Delphi.

Modern hardware is often supplied with .NET SDKs. LabView drivers are

also commonly supplied. If none are supplied by the vendor, there are

often prewritten solutions for Matlab, Python or Java available on the

Internet, but you will have to search for them, or write your own.

Thus, it is not generally true that it is easier to interface hardware

with Matlab than Python. It depens on what SDKs you get.

On the programming side, Python is a general purpose language, Matlab

is designed for working with matrices easily. Array and matrix

computation in Python require NumPy, which has recently been declared

stable. Matlab has been stable for quite a while. Python is

multi-threaded, Matlab is single-threaded. Matlab uses pass-by-value

even for the largest of arrays, Python always pass references.

With Python, you can program with multiple threads; with Matlab, you

have everything in one thread. As Matlab only supports synchronous i/o

out of the box, you cannot do anything else while Matlab's fwrite or

fread functions are 'blocking' (i.e. waiting for data from hardware or

writing to disk). This is Matlab's major disadvantage for data

acquisition applications. Thus to make your data acquisition run

smoothly and efficiently with Matlab, you must resort to writing CMEX

extensions for performing asynchronous I/O or spawn off threads. Have

you ever looked at overlapped i/o in the Windows API? I can guarantee

that it is hundred times more complex than anything a hardware vendor

can throw at you. Alternatively, you can code the i/o in Java, and make

have Matlab call Java for i/o tasks. With Python, this is a non-issue.

You can put blocking i/o in separate threads or use prewritten APIs for

non-blocking i/o. It is obvious which is the better option here.

Generally, non-blocking i/o and multithreading is essential for any

serious data acqusition application, and Matlab does not support any of

these.

Both Matlab and Python is interpreted languages. Matlab has a

rudimentary JIT compiler, whilst classic Python is interpreted only.

Microsoft's IronPython uses the .NET JIT compiler. Jython uses Java's

JIT compiler (run the Server JVM or IBM's JDK for best performance,

don't use the client JVM with hotspot). However, if you use IronPython

or Jython, NumPy is out of the question. Thus, Matlab has perhaps a

slight advantage here.

With Python you can inline C in your Python code (e.g. using SciPy and

Weave), and even inline assembly in the inlined C. Thus you can easily

deal with hotspots in your Python code. You cannot do this easily with

Matlab.

For these reasons, my general view is that Python is better for data

acquisition than Matlab. But using Python for the sake of using Python

is nevertheless silly. The same holds true for Matlab.

A data acquisition program ofte require fast graphics as well. For slow

2D plots, Python and Matlab perform equally well (Python requires

Matplotlib). Matlab can draw 3D graphs, but not quickly. Python require

VTK for this, which may be harder to use. For fast 2D or 3D graphics,

i.e. DirectDraw, Direct3D or OpenGL, Python is the only game in town.

DirectDraw is used by SDL, which in turn is used by PyGame. Direct3D is

available though DirectPython (or COM objects or .NET objects). OpenGL

is available through PyOpenGL, which has recently been redesigned to

use ctypes; previously the build process for PyOpenGL used to be a

nightmare. For GUI, Matlab only has rudementary capabilities. Python

has several high quality GUI libraries (not counting tkinter). You can

create multimedia applications with PyGame, you cannot do this easily

with Matlab. E.g. do you want to play sound while you record?

> It seems the answers are hard to come by on this issue. I sure would be

> willing to give it a try, except that I'm getting paid to get a job

> done, not to tinker around with Python and DAQ hardware. But if

> tinkering around could save my project money on commercial software,

> and still get it done on schedule, it would be a big win. I just don't

> have confidence that it would.

That you will have to figure out your self. The answer is not obvious.

But for GUI, forget about tkinter.

Nov 27, 2006, 10:16:54 AM11/27/06

to

Hi all,

I'm not going to touch the big picture issues here -- you need to pick the

right tool for the job you're doing, and only you know what works best for

your task. However, since it didn't come up, I feel I need to add a piece

of info to the mix, since I spend my days getting MATLAB working with data

acquisition hardware. To be clear, I'm the development lead for Data

Acquisition Toolbox, which is MathWorks' add on product to MATLAB and

Simulink to handle all the issues associated with interfacing data

acquisition hardware without all the CMEX work, threading, DLLs, etc. As

Sturla points out, that's not trivial work in any language.

Anyway, I just wanted to call your attention to Data Acquisition Toolbox:

http://www.mathworks.com/products/daq/

I'm not sure what hardware you're using, Phil, so I can't say whether we

support it. We do add support for new hardware all the time, so I'd be

interested in hearing about your application.

All the best,

-Rob

--

Rob Purser

Senior Team Lead, Connectivity Products

The MathWorks

rob.p...@mathworks.com

"sturlamolden" <sturla...@yahoo.no> wrote in message

news:1164504562.8...@l39g2000cwd.googlegroups.com...

Nov 27, 2006, 7:12:17 PM11/27/06

to

Rob Purser wrote:

> Anyway, I just wanted to call your attention to Data Acquisition Toolbox:

> http://www.mathworks.com/products/daq/

Absolutely. If the hardware is supported by this toolbox, there is no

need to reinvent the wheel. The license is expensive, but development

time can be far more expensive. Matlab is an excellent tool for

numerical work.

However, Matlab is only single-threaded and does not optimize its

pass-by-value semantics with proper 'escape analysis'. (Yes, Matlab

does lazy copy-on-write to optimize function calls, but functions have

return values as well.) I have (temporarily?) stopped using Matlab

because I constantly got annoyed by the limitations of the language. I

still use Matlab as an advanced desktop calculator though. Somehow I

find it easier to type in an expression in Matlab than NumPy, probably

because I am more accustomed to the syntax.

Message has been deleted

Nov 29, 2006, 6:52:05 PM11/29/06

to

John Henry wrote:

> Bill Gates will have you jailed! :-)

>

> On a more serious note, is there any alternative to Simulink though?

>

but also PowerSim.

I found only 1 major disadvantage inSciLab, ...

... it has no ActiveX

cheers,

Stef Mientki

Nov 29, 2006, 11:44:45 PM11/29/06

to

On 16 Nov 2006 13:09:03 -0800, "sturlamolden" <sturla...@yahoo.no>

wrote:

wrote:

...SNIP...

>To compare Matlab with NumPy we can e.g. use the D4 discrete wavelet

>transform. I have here coded it in Matlab and Python/NumPy using Tim

>Swelden's lifting scheme.

>

>First the Matlab version (D4_Transform.m):

>

>function x = D4_Transform(x)

> % D4 Wavelet transform in Matlab

> % (C) Sturla Molden

> C1 = 1.7320508075688772;

> C2 = 0.4330127018922193;

> C3 = -0.066987298107780702;

> C4 = 0.51763809020504137;

> C5 = 1.9318516525781364;

> s1 = zeros(ceil(size(x)/2));

> d1 = zeros(ceil(size(x)/2));

> d2 = zeros(ceil(size(x)/2));

> odd = x(2:2:end);

> even = x(1:2:end-1);

> d1(:) = odd - C2*even;

> s1(1) = even(1) + C2*d1(1) + C3*d1(end);

> s1(2:end) = even(2:end) + C2*d1(2:end) + C3*d1(1:end-1);

> d2(1) = d1(1) + s1(end);

> d2(2:end) = d1(2:end) + s1(1:end-1);

> x(1:2:end-1) = C4*s1;

> x(2:2:end) = C5*d2;

> if (length(x) >2)

> x(1:2:end-1) = D4_Transform(x(1:2:end-1));

> end

>

>

>Then the Python version (D4.py):

>

>import numpy

>import time

>

>def D4_Transform(x, s1=None, d1=None, d2=None):

> """

> D4 Wavelet transform in NumPy

> (C) Sturla Molden

> """

> C1 = 1.7320508075688772

> C2 = 0.4330127018922193

> C3 = -0.066987298107780702

> C4 = 0.51763809020504137

> C5 = 1.9318516525781364

> if d1 == None:

> d1 = numpy.zeros(x.size/2)

> s1 = numpy.zeros(x.size/2)

> d2 = numpy.zeros(x.size/2)

> odd = x[1::2]

> even = x[:-1:2]

> d1[:] = odd[:] - C1*even[:]

> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

> d2[0] = d1[0] + s1[-1]

> d2[1:] = d1[1:] + s1[:-1]

> even[:] = C4 * s1[:]

> odd[:] = C5 * d2[:]

> if x.size > 2:

>

>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

>

>if __name__ == "__main__":

> x = numpy.random.rand(2**23)

> t0 = time.clock()

> D4_Transform(x)

> t = time.clock()

> print "Elapsed time is %.6f seconds" % (t-t0)

>

...SNIP...

I couldn't help but notice that the following two lines from the above

code are doing different things, but I think they are supposed to do the

same things (the first is the Matlab version, the 2nd is the Python

version):

> d1(:) = odd - C2*even;

> d1[:] = odd[:] - C1*even[:]

I don't understand what this code is doing, so I don't know which is

wrong. I don't even know if this is the actual code you are using. Just

thought I'd mention it.

Bill

Dec 2, 2006, 2:44:07 AM12/2/06

to

I don't know Python but this benchmark caught my eye.

>>def D4_Transform(x, s1=None, d1=None, d2=None):

>> """

>> D4 Wavelet transform in NumPy

>> (C) Sturla Molden

>> """

>> C1 = 1.7320508075688772

>> C2 = 0.4330127018922193

>> C3 = -0.066987298107780702

>> C4 = 0.51763809020504137

>> C5 = 1.9318516525781364

>> if d1 == None:

>> d1 = numpy.zeros(x.size/2)

>> s1 = numpy.zeros(x.size/2)

>> d2 = numpy.zeros(x.size/2)

Are these definitions ever used? It looks like s1, d1 and d2 are all

redefined below without reference to these previous values.

>> odd = x[1::2]

>> even = x[:-1:2]

>> d1[:] = odd[:] - C1*even[:]

>> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

>> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

>> d2[0] = d1[0] + s1[-1]

>> d2[1:] = d1[1:] + s1[:-1]

>> even[:] = C4 * s1[:]

>> odd[:] = C5 * d2[:]

Does that line create an array that is never used? If so, is C5 also never

used?

>> if x.size > 2:

>>

>>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

What is the result of this function?

I'm interested in translating this function into other languages, like

OCaml, to see how good Python's performance is (I am amazed it can beat

Matlab, not that I've used Matlab). In particular, I think you are eagerly

allocating arrays when, in a functional language, you could just as easily

compose closures.

For example, this program is 3x faster than the Python on my machine:

let rec d4 s1 d1 d2 x =

let c1 = 1.7320508075688772 in

let c2 = 0.4330127018922193 in

let c3 = -0.066987298107780702 in

let c4 = 0.51763809020504137 in

let c5 = 1.9318516525781364 in

let n = Array.length x in

let odd i = x.(2*i) and even i = x.(2*i + 1) in

let d1 i = odd i -. c1 *. even i in

let f = function -1 -> n/2 - 1 | i -> i in

let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in

let d2 i = d1 i +. s1 (f(i-1)) in

let even = Array.init (n/2) (fun i -> c4 *. s1 i) in

if n > 2 then d4 s1 d1 d2 even else s1, d1, d2

but I'm not sure it is correct!

--

Dr Jon D Harrop, Flying Frog Consultancy

Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists

Dec 2, 2006, 12:20:56 PM12/2/06

to

Jon Harrop wrote:

> I don't know Python but this benchmark caught my eye.

>

> >>def D4_Transform(x, s1=None, d1=None, d2=None):

> >> """

> >> D4 Wavelet transform in NumPy

> >> (C) Sturla Molden

> >> """

> >> C1 = 1.7320508075688772

> >> C2 = 0.4330127018922193

> >> C3 = -0.066987298107780702

> >> C4 = 0.51763809020504137

> >> C5 = 1.9318516525781364

> >> if d1 == None:

> >> d1 = numpy.zeros(x.size/2)

> >> s1 = numpy.zeros(x.size/2)

> >> d2 = numpy.zeros(x.size/2)

>

> Are these definitions ever used? It looks like s1, d1 and d2 are all

> redefined below without reference to these previous values.

> I don't know Python but this benchmark caught my eye.

>

> >>def D4_Transform(x, s1=None, d1=None, d2=None):

> >> """

> >> D4 Wavelet transform in NumPy

> >> (C) Sturla Molden

> >> """

> >> C1 = 1.7320508075688772

> >> C2 = 0.4330127018922193

> >> C3 = -0.066987298107780702

> >> C4 = 0.51763809020504137

> >> C5 = 1.9318516525781364

> >> if d1 == None:

> >> d1 = numpy.zeros(x.size/2)

> >> s1 = numpy.zeros(x.size/2)

> >> d2 = numpy.zeros(x.size/2)

>

> Are these definitions ever used? It looks like s1, d1 and d2 are all

> redefined below without reference to these previous values.

No, they're never redefined (even in the recursive version). Slices of

them are reassigned. That's a big difference.

> >> odd = x[1::2]

> >> even = x[:-1:2]

> >> d1[:] = odd[:] - C1*even[:]

> >> s1[0] = even[0] + C2*d1[0] + C3*d1[-1]

> >> s1[1:] = even[1:] + C2*d1[1:] + C3*d1[:-1]

> >> d2[0] = d1[0] + s1[-1]

> >> d2[1:] = d1[1:] + s1[:-1]

> >> even[:] = C4 * s1[:]

> >> odd[:] = C5 * d2[:]

>

> Does that line create an array that is never used? If so, is C5 also never

> used?

No it doesn't create a new array. But it does have an effect.

Explained below.

(Actually, it does create a temporary array to store the intermediate

result, but that array does not get bound to odd.)

> >> if x.size > 2:

> >>

> >>D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

>

> What is the result of this function?

It modifies the array even in place. This also has an effect.

> I'm interested in translating this function into other languages, like

> OCaml, to see how good Python's performance is (I am amazed it can beat

> Matlab, not that I've used Matlab).

Matlab has a few *cough* limitations when it comes to hand-optimizing.

When writing naive code, Matlab often is faster than Python with numpy

because it has many commerical man-year of optimizing behind it.

However, Matlab helps v

> In particular, I think you are eagerly

> allocating arrays when, in a functional language, you could just as easily

> compose closures.

You are completely wrong.

What you seem to be missing is that slice assignment shares data. For

example, the statement "odd = x[1::2]" does NOT create a new array. It

takes a slice of x's data (in this case, the odd elements of x: slices

need not be contiguous), and binds it to the symbol odd. The array

elements are shared; changing elements in odd also changes elements of

x.

So later on, the statement "odd[:] = C5*d2[:]" reassigns the elements

of odd, which, because the array elements are shared, also reassigns

the odd elements in x.

(Notice the difference between "odd[:]=" and "odd=". "odd[:]=" is

slice assignment; it changes the elements in the given slice of odd, in

this case the whole array. "odd=" rebinds the symbol odd to a

different object, which could be a new array, a slice of an existing

array, or any other Python object.)

This data sharing more or less accomplishes the same thing that the

"closures" you speak of accomplish (in this case), only without the

functional.

> For example, this program is 3x faster than the Python on my machine:

>

> let rec d4 s1 d1 d2 x =

> let c1 = 1.7320508075688772 in

> let c2 = 0.4330127018922193 in

> let c3 = -0.066987298107780702 in

> let c4 = 0.51763809020504137 in

> let c5 = 1.9318516525781364 in

> let n = Array.length x in

> let odd i = x.(2*i) and even i = x.(2*i + 1) in

> let d1 i = odd i -. c1 *. even i in

> let f = function -1 -> n/2 - 1 | i -> i in

> let s1 i = even i +. c2 *. d1 i +. c3 *. d1 (f(i-1)) in

> let d2 i = d1 i +. s1 (f(i-1)) in

> let even = Array.init (n/2) (fun i -> c4 *. s1 i) in

> if n > 2 then d4 s1 d1 d2 even else s1, d1, d2

>

> but I'm not sure it is correct!

It's not correct, but what you left out is probably low cost.

OCaml is compiled to machine code, right? And types can be inferred at

compile time, correct? Well then of course it's faster. It seems to

me a big help is the ability to fold multiple array operations into a

single loop, which is optimization a dynamically-typed language like

Python can't easily make. (It'd require are really smart JIT compiler

or some concessions in dynamicity.)

Carl Banks

Dec 2, 2006, 12:28:52 PM12/2/06

to

Carl Banks wrote:

> Matlab has a few *cough* limitations when it comes to hand-optimizing.

> When writing naive code, Matlab often is faster than Python with numpy

> because it has many commerical man-year of optimizing behind it.

> However, Matlab helps v

That should say:

However, Matlab helps very little when you need to speed things up, or

(especially) use less memory than the naive way.

Carl Banks

Dec 3, 2006, 12:00:12 AM12/3/06

to

Carl Banks wrote:

> No, they're never redefined (even in the recursive version). Slices of

> them are reassigned. That's a big difference.

> No, they're never redefined (even in the recursive version). Slices of

> them are reassigned. That's a big difference.

I see.

> (Actually, it does create a temporary array to store the intermediate

> result, but that array does not get bound to odd.)

Sure.

>> In particular, I think you are eagerly

>> allocating arrays when, in a functional language, you could just as

>> easily compose closures.

>

> You are completely wrong.

I'll give an example. If you write the Python:

a[:] = b[:] + c[:] + d[:]

I think that is equivalent to the ML:

fill a (map2 ( + ) (map2 ( + ) b c) d)

which can be deforested in ML to avoid the creation of the intermediate

result b[:] + c[:] by using a closure to add three values at once:

fill a (map3 (fun b c d -> b + c + d) b c d)

which will be much faster because it doesn't generate an intermediate array.

> This data sharing more or less accomplishes the same thing that the

> "closures" you speak of accomplish (in this case), only without the

> functional.

The closure avoided the intermediate result. You said that the Python isn't

doing that?

> It's not correct, but what you left out is probably low cost.

>

> OCaml is compiled to machine code, right?

Yes.

> And types can be inferred at compile time, correct?

Types are completely inferred at compile time.

> Well then of course it's faster.

Yes. So this doesn't seem to be a killer example of numpy, although I am

still amazed that it can outperform Matlab.

> It seems to

> me a big help is the ability to fold multiple array operations into a

> single loop, which is optimization a dynamically-typed language like

> Python can't easily make. (It'd require are really smart JIT compiler

> or some concessions in dynamicity.)

Writing a JIT to compile this kind of stuff is easy. My point is that this

is fundamentally bad code, so why bother trying to write a Python JIT? Why

not just write in a better language for this task? Optimising within a

fundamentally slow language seems silly to me.

Dec 3, 2006, 10:58:52 AM12/3/06

to

Jon Harrop wrote:

> >> In particular, I think you are eagerly

> >> allocating arrays when, in a functional language, you could just as

> >> easily compose closures.

> >

> > You are completely wrong.

>

> I'll give an example. If you write the Python:

>

> a[:] = b[:] + c[:] + d[:]

>

> I think that is equivalent to the ML:

>

> fill a (map2 ( + ) (map2 ( + ) b c) d)

>

> which can be deforested in ML to avoid the creation of the intermediate

> result b[:] + c[:] by using a closure to add three values at once:

>

> fill a (map3 (fun b c d -> b + c + d) b c d)

>

> which will be much faster because it doesn't generate an intermediate array.

> >> In particular, I think you are eagerly

> >> allocating arrays when, in a functional language, you could just as

> >> easily compose closures.

> >

> > You are completely wrong.

>

> I'll give an example. If you write the Python:

>

> a[:] = b[:] + c[:] + d[:]

>

> I think that is equivalent to the ML:

>

> fill a (map2 ( + ) (map2 ( + ) b c) d)

>

> which can be deforested in ML to avoid the creation of the intermediate

> result b[:] + c[:] by using a closure to add three values at once:

>

> fill a (map3 (fun b c d -> b + c + d) b c d)

>

> which will be much faster because it doesn't generate an intermediate array.

Ah, but, this wasn't about temporaries when you spoke of "eagerly

allocating arrays", was it? But yes, you're right, in this example

temporary arrays are created; arrays that Ocaml would not need. (I

don't exactly understand why you'd need a functional language to get

this optimization, though.)

You might have guessed that you can get rid of most temporary arrays,

at the expense of readability, with numpy. For example, replacing

d1[:] = odd[:] - C1*even[:]

with

numpy.multiply(-C1,even,d1)

numpy.add(d1,odd,d1)

eliminates the intermediate. No unnecessary array allocations; no

closures necessary.

I'm curious whether this shared-slicing isn't, in some ways,

advantageous over the Ocaml way. I presume that in Ocaml, the way

you'd "share" array data is to create a closure can apply some

operation to selected elements of the array, returning the result. But

could this as easily modify the array in-place? I presume a good

compiler could be able to inline the function and optimize the call to

an in-place operation, but I'm not sure how well this works in

practice.

Here's a version of D4_Transform that uses no temporary arrays (aside

from the work arrays s1, d1, and d2, which are allocated only once).

It's about 40% faster for me than the one with infix operations. I'd

be curious how it compares to a correct Ocaml version. (I'd still

expect Ocaml to be at least twice as fast.)

def alt_D4_Transform(x, s1=None, d1=None, d2=None):

add = numpy.add

multiply = numpy.multiply

C1 = 1.7320508075688772

C2 = 0.4330127018922193

C3 = -0.066987298107780702

C4 = 0.51763809020504137

C5 = 1.9318516525781364

if d1 == None:

d1 = numpy.zeros(x.size/2,numpy.Float)

s1 = numpy.zeros(x.size/2,numpy.Float)

d2 = numpy.zeros(x.size/2,numpy.Float)

odd = x[1::2]

even = x[:-1:2]

multiply(-C1,even,d1)

add(d1,odd,d1)

multiply(C2,d1,s1)

add(s1,even,s1)

d2[0] = C3 * d1[-1]

multiply(C3,d1[:-1],d2[1:])

add(s1,d2,s1)

d2[0] = d1[0] + s1[-1]

add(d1[1:],s1[:-1],d2[1:])

multiply(C4,s1,even)

multiply(C5,d2,odd)

if x.size > 2:

alt_D4_Transform(even,s1[0:even.size/2],d1[0:even.size/2],d2[0:even.size/2])

> > It seems to

> > me a big help is the ability to fold multiple array operations into a

> > single loop, which is optimization a dynamically-typed language like

> > Python can't easily make. (It'd require are really smart JIT compiler

> > or some concessions in dynamicity.)

>

> Writing a JIT to compile this kind of stuff is easy.

Eh, getting a JIT to do the optimization I spoke of in Python is not

easy. It would be relatively easy in a statically typed language. In

Python, it'd be tantamount to rewriting a dynamically reconfigurable

version of numpy--you'd have to duplicate numpy's complex

type-dispatching rules.

> My point is that this

> is fundamentally bad code,

Whoa, there. I realize that for people who prefer functional

programming, with their recursively reductionist way of thinking, it

might seem as if leaving any sort of reducibility in final result is

"fundamentally bad", but that's a pretty strong thing to say.

> so why bother trying to write a Python JIT? Why

> not just write in a better language for this task? Optimising within a

> fundamentally slow language seems silly to me.

Because, for most people, language choice is not a greedy maximizing of

a single issue. Nobody who uses numpy is under the impression that it

can match a statically-typed language that is compiled to machine code

in peak performance. (Matlab is fair game, of course.) But speed is

not the only important thing.

I use Python because it's a excellently designed language that fits my

overall needs, and numpy because sometimes I need more speed than

vanilla Python. Only when speed is critical (for example, 3D collision

detection) do I write an extension in a "better language for the task"

(i.e. C).

This is something that's quite popular in the numerically-intensive

computing community, BTW. Many people use Python to handle boring

stuff like file I/O, memory managment, and non-critical numerical

calculations, and write C or Fortran extensions to do the

numerically-intensive stuff.

Carl Banks

Dec 3, 2006, 12:01:27 PM12/3/06

to

Carl Banks wrote:

>> fill a (map3 (fun b c d -> b + c + d) b c d)

>>

>> which will be much faster because it doesn't generate an intermediate

>> array.

>

> Ah, but, this wasn't about temporaries when you spoke of "eagerly

> allocating arrays", was it?

>> fill a (map3 (fun b c d -> b + c + d) b c d)

>>

>> which will be much faster because it doesn't generate an intermediate

>> array.

>

> Ah, but, this wasn't about temporaries when you spoke of "eagerly

> allocating arrays", was it?

I had thought that all of the array operations were allocating new arrays at

first but it seems that at least assignment to a slice does not.

Does:

a[:] = b[:] + c[:]

allocate a temporary for b[:] + c[:]?

> But yes, you're right, in this example

> temporary arrays are created; arrays that Ocaml would not need. (I

> don't exactly understand why you'd need a functional language to get

> this optimization, though.)

I thought that functional programming could get you both the brevity of

Python's slicing and the performance of C's compilation but I was wrong.

You can get the brevity of the Python approach in C.

> You might have guessed that you can get rid of most temporary arrays,

> at the expense of readability, with numpy. For example, replacing

>

> d1[:] = odd[:] - C1*even[:]

>

> with

>

> numpy.multiply(-C1,even,d1)

> numpy.add(d1,odd,d1)

>

> eliminates the intermediate. No unnecessary array allocations; no

> closures necessary.

Right. Performance will probably go from 5x slower to 2x slower because

you're traversing the arrays twice instead of once.

> I'm curious whether this shared-slicing isn't, in some ways,

> advantageous over the Ocaml way.

That's exactly what I thought to start with. The author of F# is working on

getting slicing into his language but the use of slicing in this Python

benchmark corrupted my fragile little mind. Slicing is a terrible way to

approach this problem if you're using a compiled language like F#.

I first wrote an OCaml translation of the Python and wrote my own

little "slice" implementation. I have since looked up a C++ solution and

translated that into OCaml instead:

let rec d4_aux a n =

let n2 = n lsr 1 in

let tmp = Array.make n 0. in

for i=0 to n2-2 do

tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;

tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;

done;

tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;

tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;

Array.blit tmp 0 a 0 n;

if n > 4 then d4_aux a (n lsr 1)

let d4 a = d4_aux a (Array.length a)

Not only is that shorter than the Python, it is much faster:

0.56s C++ (direct arrays)

0.61s F# (direct arrays)

0.62s OCaml (direct arrays)

1.38s OCaml (slices)

2.38s Python (slices)

10s Mathematica 5.1

Note that all implementations are safe (e.g. C++ uses a.at(i) instead of

a[i]).

> I presume that in Ocaml, the way

> you'd "share" array data is to create a closure can apply some

> operation to selected elements of the array, returning the result.

Yes. That is certainly one way of doing it.

> But could this as easily modify the array in-place?

Absolutely. A closure would capture a reference to the array. Arrays are

mutable so you could alter the array from within the closure. In F# you

could even execute the closures concurrently to alter different parts of

the same array at the same time. I just tried that and it is actually

slower to multithread this.

> I presume a good

> compiler could be able to inline the function and optimize the call to

> an in-place operation, but I'm not sure how well this works in

> practice.

Surprisingly well it seems: F# and OCaml are almost as fast as C/C++!

> Here's a version of D4_Transform that uses no temporary arrays (aside

> from the work arrays s1, d1, and d2, which are allocated only once).

> It's about 40% faster for me than the one with infix operations. I'd

> be curious how it compares to a correct Ocaml version. (I'd still

> expect Ocaml to be at least twice as fast.)

I get:

1.57s Python (in-place)

>> > It seems to

>> > me a big help is the ability to fold multiple array operations into a

>> > single loop, which is optimization a dynamically-typed language like

>> > Python can't easily make. (It'd require are really smart JIT compiler

>> > or some concessions in dynamicity.)

>>

>> Writing a JIT to compile this kind of stuff is easy.

>

> Eh, getting a JIT to do the optimization I spoke of in Python is not

> easy. It would be relatively easy in a statically typed language. In

> Python, it'd be tantamount to rewriting a dynamically reconfigurable

> version of numpy--you'd have to duplicate numpy's complex

> type-dispatching rules.

There aren't any complicated types in the above code. In fact, there are

only two types: float and float array. Type checking is easy in this case,

compilation to C is also easy, then you just dispatch to the compiled C

code when the types are ok. You would want to write your Python code like C

code but that is shorter in this case. You may also want to flag code for

compilation manually in order to avoid the overhead of compiling code

unnecessarily.

>> My point is that this is fundamentally bad code,

>

> Whoa, there. I realize that for people who prefer functional

> programming, with their recursively reductionist way of thinking, it

> might seem as if leaving any sort of reducibility in final result is

> "fundamentally bad", but that's a pretty strong thing to say.

I was referring to the slicing specifically, nothing to do with functional

programming. My F# and OCaml code are now basically identical to the C++

code.

>> so why bother trying to write a Python JIT? Why

>> not just write in a better language for this task? Optimising within a

>> fundamentally slow language seems silly to me.

>

> Because, for most people, language choice is not a greedy maximizing of

> a single issue. Nobody who uses numpy is under the impression that it

> can match a statically-typed language that is compiled to machine code

> in peak performance. (Matlab is fair game, of course.) But speed is

> not the only important thing.

In this specific context (discrete wavelet transform benchmark), I'd have

said that speed was the most important thing after correctness.

> I use Python because it's a excellently designed language that fits my

> overall needs, and numpy because sometimes I need more speed than

> vanilla Python. Only when speed is critical (for example, 3D collision

> detection) do I write an extension in a "better language for the task"

> (i.e. C).

Have you looked at languages like F#, OCaml, Haskell, SML and so on? They

seem to offer the benefits of both worlds.

> This is something that's quite popular in the numerically-intensive

> computing community, BTW. Many people use Python to handle boring

> stuff like file I/O, memory managment, and non-critical numerical

> calculations, and write C or Fortran extensions to do the

> numerically-intensive stuff.

Yes, I appreciate that Python is hugely popular, I just don't understand why

it is quite so popular.

I found a series of lectures on VPython, showing some amazing stuff:

http://showmedo.com/videos/series?name=pythonThompsonVPythonSeries

I just found some more interesting stuff here:

http://www.phy.syr.edu/~salgado/software/vpython/

This is really great work but I can't help but wonder why the authors chose

to use Python when other languages seem better suited. I'd like to work on

raising people's awareness of these alternatives, and probably create some

useful tools in the process. So I'm keen to learn what Python programmers

would want/expect from F# and OCaml.

What would it take to make you convert?

Dec 3, 2006, 1:05:17 PM12/3/06

to

Jon Harrop wrote:

> I had thought that all of the array operations were allocating new arrays at

> first but it seems that at least assignment to a slice does not.

>

> Does:

>

> a[:] = b[:] + c[:]

>

> allocate a temporary for b[:] + c[:]?

> I had thought that all of the array operations were allocating new arrays at

> first but it seems that at least assignment to a slice does not.

>

> Does:

>

> a[:] = b[:] + c[:]

>

> allocate a temporary for b[:] + c[:]?

Yep.

[snip]

> Not only is that shorter than the Python, it is much faster:

>

> 0.56s C++ (direct arrays)

> 0.61s F# (direct arrays)

> 0.62s OCaml (direct arrays)

> 1.38s OCaml (slices)

> 2.38s Python (slices)

> 10s Mathematica 5.1

[snip]

> 1.57s Python (in-place)

So,

optimized Python is roughly the same speed as naive Ocaml

optimized Ocaml is roughly the same speed as C++

> >> > It seems to

> >> > me a big help is the ability to fold multiple array operations into a

> >> > single loop, which is optimization a dynamically-typed language like

> >> > Python can't easily make. (It'd require are really smart JIT compiler

> >> > or some concessions in dynamicity.)

> >>

> >> Writing a JIT to compile this kind of stuff is easy.

> >

> > Eh, getting a JIT to do the optimization I spoke of in Python is not

> > easy. It would be relatively easy in a statically typed language. In

> > Python, it'd be tantamount to rewriting a dynamically reconfigurable

> > version of numpy--you'd have to duplicate numpy's complex

> > type-dispatching rules.

>

> There aren't any complicated types in the above code. In fact, there are

> only two types: float and float array.

You're vastly underestimating the complexity of numpy objects. They

have an awful lot going on under the covers to make it look simple on

the surface. There's all kinds of type-checking and type-conversions.

A JIT that folds loops together would have to have knowledge of that

process, and it's a lot of knowledge to have. That is not easy.

> Type checking is easy in this case,

> compilation to C is also easy, then you just dispatch to the compiled C

> code when the types are ok. You would want to write your Python code like C

> code but that is shorter in this case. You may also want to flag code for

> compilation manually in order to avoid the overhead of compiling code

> unnecessarily.

>

> >> My point is that this is fundamentally bad code,

> >

> > Whoa, there. I realize that for people who prefer functional

> > programming, with their recursively reductionist way of thinking, it

> > might seem as if leaving any sort of reducibility in final result is

> > "fundamentally bad", but that's a pretty strong thing to say.

>

> I was referring to the slicing specifically, nothing to do with functional

> programming. My F# and OCaml code are now basically identical to the C++

> code.

It is pretty strong thing to say that anything is fundamentally bad

just because it's not fast as something else. Fundamental badness

ought to run deeper than some superficial, linear measure.

> >> so why bother trying to write a Python JIT? Why

> >> not just write in a better language for this task? Optimising within a

> >> fundamentally slow language seems silly to me.

> >

> > Because, for most people, language choice is not a greedy maximizing of

> > a single issue. Nobody who uses numpy is under the impression that it

> > can match a statically-typed language that is compiled to machine code

> > in peak performance. (Matlab is fair game, of course.) But speed is

> > not the only important thing.

>

> In this specific context (discrete wavelet transform benchmark), I'd have

> said that speed was the most important thing after correctness.

So let's say I'm planning to write this complicated graphical program,

that does all sorts of stuff. I/O, networking, database, etc. It's a

medium throughput program, and a slow language like Python is

sufficient to run it, except for this one tiny part where I have to

transform of wave data.

Now, I was going to write the program in Python, and use the very

convenient numpy to do the transformation part. No problem, numpy

will be fast enough for my needs.

But wait! It's silly try to optimize code in a fundamentally slow

language! It's vanity to even try! I guess for the sake of getting

this little D4 transform to double in speed I'll scrap Python and write

the whole thing in OCaml.

(Or I can begrudingly admit that it's not really silly to try to

optimize a slow language.)

[snip]

> This is really great work but I can't help but wonder why the authors chose

> to use Python when other languages seem better suited. I'd like to work on

> raising people's awareness of these alternatives, and probably create some

> useful tools in the process. So I'm keen to learn what Python programmers

> would want/expect from F# and OCaml.

>

> What would it take to make you convert?

Them not being functional would be a good start....

Carl Banks

Dec 3, 2006, 2:43:05 PM12/3/06

to

Carl Banks wrote:

>> 0.56s C++ (direct arrays)

>> 0.61s F# (direct arrays)

>> 0.62s OCaml (direct arrays)

>> 1.38s OCaml (slices)

>> 2.38s Python (slices)

>> 10s Mathematica 5.1

> [snip]

>> 1.57s Python (in-place)

>

> So,

> optimized Python is roughly the same speed as naive Ocaml

> optimized Ocaml is roughly the same speed as C++

>> 0.56s C++ (direct arrays)

>> 0.61s F# (direct arrays)

>> 0.62s OCaml (direct arrays)

>> 1.38s OCaml (slices)

>> 2.38s Python (slices)

>> 10s Mathematica 5.1

> [snip]

>> 1.57s Python (in-place)

>

> So,

> optimized Python is roughly the same speed as naive Ocaml

> optimized Ocaml is roughly the same speed as C++

Absolutely not:

Optimized Python is 14% slower than badly written OCaml. Given the problem,

rather than the Python solution, nobody would write OCaml code like that.

Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest

Python so far whilst also requiring about half as much code.

Optimising the C++ by hoisting the O(log n) temporary array allocations into

one allocation makes it another 20% faster. I'm sure there are plenty more

optimisations...

>> There aren't any complicated types in the above code. In fact, there are

>> only two types: float and float array.

>

> You're vastly underestimating the complexity of numpy objects. They

> have an awful lot going on under the covers to make it look simple on

> the surface. There's all kinds of type-checking and type-conversions.

> A JIT that folds loops together would have to have knowledge of that

> process, and it's a lot of knowledge to have. That is not easy.

My suggestion doesn't really have anything to do with numpy. If you had such

a JIT you wouldn't use numpy in this case.

That's my point, using numpy encouraged the programmer to optimise in the

wrong direction in this case (to use slices instead of element-wise

operations).

>> I was referring to the slicing specifically, nothing to do with

>> functional programming. My F# and OCaml code are now basically identical

>> to the C++ code.

>

> It is pretty strong thing to say that anything is fundamentally bad

> just because it's not fast as something else. Fundamental badness

> ought to run deeper than some superficial, linear measure.

The slice based approach is not only slower, it is longer, more obfuscated

and more difficult to optimise. That doesn't just apply to Python and

numpy, it also applies to Matlab, Mathematica etc.

Which begs the question, if you were writing in a compiled language like F#

would you ever use slices?

> Now, I was going to write the program in Python, and use the very

> convenient numpy to do the transformation part.

What makes numpy convenient?

> No problem, numpy will be fast enough for my needs.

Ok. But you'd rather have a compiler?

> But wait! It's silly try to optimize code in a fundamentally slow

> language! It's vanity to even try! I guess for the sake of getting

> this little D4 transform to double in speed

C++ is currently 3.4x faster than Python.

> I'll scrap Python and write the whole thing in OCaml.

Why not drop to C for this one function?

> [snip]

>> This is really great work but I can't help but wonder why the authors

>> chose to use Python when other languages seem better suited. I'd like to

>> work on raising people's awareness of these alternatives, and probably

>> create some useful tools in the process. So I'm keen to learn what Python

>> programmers would want/expect from F# and OCaml.

>>

>> What would it take to make you convert?

>

> Them not being functional would be a good start....

They don't impose functional programming on you.

Dec 3, 2006, 5:12:50 PM12/3/06

to

Jon Harrop wrote:

> Carl Banks wrote:

> >> 0.56s C++ (direct arrays)

> >> 0.61s F# (direct arrays)

> >> 0.62s OCaml (direct arrays)

> >> 1.38s OCaml (slices)

> >> 2.38s Python (slices)

> >> 10s Mathematica 5.1

> > [snip]

> >> 1.57s Python (in-place)

> >

> > So,

> > optimized Python is roughly the same speed as naive Ocaml

> > optimized Ocaml is roughly the same speed as C++

>

> Absolutely not:

>

> Optimized Python is 14% slower than badly written OCaml.

> Carl Banks wrote:

> >> 0.56s C++ (direct arrays)

> >> 0.61s F# (direct arrays)

> >> 0.62s OCaml (direct arrays)

> >> 1.38s OCaml (slices)

> >> 2.38s Python (slices)

> >> 10s Mathematica 5.1

> > [snip]

> >> 1.57s Python (in-place)

> >

> > So,

> > optimized Python is roughly the same speed as naive Ocaml

> > optimized Ocaml is roughly the same speed as C++

>

> Absolutely not:

>

> Optimized Python is 14% slower than badly written OCaml.

I'd call that roughly the same speed. Did you use any sort of

benchmark suite that miminized testing error, or did you just run it

surrounded by calls to the system timer like I did? If the latter,

then it's poor benchmark, and 10% accuracy is better than you can

expect. (Naive tests like that tend to favor machine code.)

> Given the problem,

> rather than the Python solution, nobody would write OCaml code like that.

>

> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the fastest

> Python so far whilst also requiring about half as much code.

Frankly, I have a hard time believing anyone would naively write the

Ocaml code like the optimized version you posted. You'd know better

than I, though.

> Optimising the C++ by hoisting the O(log n) temporary array allocations into

> one allocation makes it another 20% faster. I'm sure there are plenty more

> optimisations...

>

> >> There aren't any complicated types in the above code. In fact, there are

> >> only two types: float and float array.

> >

> > You're vastly underestimating the complexity of numpy objects. They

> > have an awful lot going on under the covers to make it look simple on

> > the surface. There's all kinds of type-checking and type-conversions.

> > A JIT that folds loops together would have to have knowledge of that

> > process, and it's a lot of knowledge to have. That is not easy.

>

> My suggestion doesn't really have anything to do with numpy. If you had such

> a JIT you wouldn't use numpy in this case.

>

> That's my point, using numpy encouraged the programmer to optimise in the

> wrong direction in this case (to use slices instead of element-wise

> operations).

Ok, I can see that. We have a sort of JIT compiler, psyco, that works

pretty well but I don't think it's as fast for large arrays as a good

numpy solution. But it has more to deal with than a JIT for a

statically typed language.

> >> I was referring to the slicing specifically, nothing to do with

> >> functional programming. My F# and OCaml code are now basically identical

> >> to the C++ code.

> >

> > It is pretty strong thing to say that anything is fundamentally bad

> > just because it's not fast as something else. Fundamental badness

> > ought to run deeper than some superficial, linear measure.

>

> The slice based approach is not only slower, it is longer, more obfuscated

> and more difficult to optimise. That doesn't just apply to Python and

> numpy, it also applies to Matlab, Mathematica etc.

>

> Which begs the question, if you were writing in a compiled language like F#

> would you ever use slices?

If I was writing in F#? I absolutely would, because some problems are

most cleanly, readably, and maintainably done with slices. I don't

have any speed-lust, and I don't waste time making things fast if they

don't need to be. I would rather spend my time designing and

implementing the system, and optimizing things that actually need it.

I don't want to spend my time tracking down some stupid indexing error.

If it was critical for it to be fast, then I'd pull out the stops and

write it as efficiently as possible. In my experience, a typical

middle size program will have half a dozen or so subroutines where it's

critical to be as fast as possible.

Will any other F# people do it? I don't know if they all have

speed-lust, or if they'd all deliberately avoid slicing to prove some

kind of point about functional programming and/or non-machine-compiled

languages, but I assume there'd be some who would do it for the same

reasons I'd do it.

> > Now, I was going to write the program in Python, and use the very

> > convenient numpy to do the transformation part.

>

> What makes numpy convenient?

It's in Python, it's almost always fast enough, it's clean, it scales

up well, it expresses mathematical concepts straightforwardly, it's

well suppported, it has useful advanced functionality built in (linear

algebra, etc.).

And a lot of the time, you don't have to worry about indexing.

> > No problem, numpy will be fast enough for my needs.

>

> Ok. But you'd rather have a compiler?

Python *is* compiled, buddy. It has a VM that runs bytecode.

But, would I rather have a seperate compile to *machine code*, when an

automatic compile to VM code is fast enough?

Not remotely. Compilers are a PITA and I avoid them when it's

possible.

> > But wait! It's silly try to optimize code in a fundamentally slow

> > language! It's vanity to even try! I guess for the sake of getting

> > this little D4 transform to double in speed

>

> C++ is currently 3.4x faster than Python.

Oh, well *that* made all the difference.

> > I'll scrap Python and write the whole thing in OCaml.

>

> Why not drop to C for this one function?

Why would I? The assumptions clearly stated that numpy is fast enough.

Why would I take several hours to write a C extension when I could do

it with numpy in about 20 minutes, *and it's fast enough*?

> > [snip]

> >> This is really great work but I can't help but wonder why the authors

> >> chose to use Python when other languages seem better suited. I'd like to

> >> work on raising people's awareness of these alternatives, and probably

> >> create some useful tools in the process. So I'm keen to learn what Python

> >> programmers would want/expect from F# and OCaml.

> >>

> >> What would it take to make you convert?

> >

> > Them not being functional would be a good start....

>

> They don't impose functional programming on you.

In the same way that a screwdriver can't prevent you from driving a

nail. Give me a break, we all know these guys (Haskell especially) are

designed to support functional first and other paradigms second. Way

second. That's quite enough imposing for me.

Look, I don't want to accuse you of anything, but when you come in here

and say "speed is crucial" and then "you should be using Haskell,

Ocaml, F#, etc., ...even though C++ is faster...", it gives off a

really strong signal that you're not interested in speed so much so as

in pimping functional languages; speed is just the mechanism. Which is

fine; those languages are undeniably faster, and there's nothing wrong

with using an advantage of a language to push it, if you're respectful.

But let me offer you some advice: if you wan't to lure people from

Python, speed isn't going to cut it. Most people who chose Python

(and, let's face it, most people who use Python chose it themselves)

already knew it was slow, and chose it anyways. Pythonistas prefer

Python because it's clean, readable, well-supported, and practical.

Practical might be the sticking point here. If you were to get a

Python prompt, and type "import this", it would print the Zen of

Python, a set of principles by which the language is designed. One of

them is "Practicality beats purity." And let's face it, functional

languages are (or seem to be) all about purity. Notice that there's no

Zen that says "Fast is better than slow". If you want us to swtich to

Ocaml, you better figure out how it supports the Zen of Python better

than Python does.

Carl Banks

Dec 3, 2006, 6:03:40 PM12/3/06

to

Carl Banks wrote:

>> Optimized Python is 14% slower than badly written OCaml.

>

> I'd call that roughly the same speed. Did you use any sort of

> benchmark suite that miminized testing error, or did you just run it

> surrounded by calls to the system timer like I did?

>> Optimized Python is 14% slower than badly written OCaml.

>

> I'd call that roughly the same speed. Did you use any sort of

> benchmark suite that miminized testing error, or did you just run it

> surrounded by calls to the system timer like I did?

System timer, best of three.

> If the latter,

> then it's poor benchmark, and 10% accuracy is better than you can

> expect. (Naive tests like that tend to favor machine code.)

Yes, they are roughly the same speed. My objection was that this OCaml

program was not the unoptimised one, it was the badly written one. The

unoptimised OCaml/F#/C++ are all several times faster than this OCaml, and

much shorter and simpler too.

>> Unoptimised but well-written OCaml/C/C++ is 2.5-2.8x faster than the

>> fastest Python so far whilst also requiring about half as much code.

>

> Frankly, I have a hard time believing anyone would naively write the

> Ocaml code like the optimized version you posted.

That wasn't optimised. I haven't optimised any of my solutions. The OCaml I

posted was a naive convolution with some unnecessary allocation.

> You'd know better than I, though.

When asked to convolve an array with a 4-element array (h0..h3), I think

most programmers would write (pseudocode):

for i in [0 .. n-4]

x[i] = h0 y[i] + h1 y[i+1] + h2 y[i+2] + h3 y[i+3]

in most languages. That is the essence of this benchmark.

You can express it even more succinctly with a dot product:

for i in [0 .. n-4]

x[i] = h . y[i:]

In this case, you'd want a cyclic array type too.

>> That's my point, using numpy encouraged the programmer to optimise in the

>> wrong direction in this case (to use slices instead of element-wise

>> operations).

>

> Ok, I can see that. We have a sort of JIT compiler, psyco, that works

> pretty well but I don't think it's as fast for large arrays as a good

> numpy solution. But it has more to deal with than a JIT for a

> statically typed language.

Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

would be a good student project?

>> Which begs the question, if you were writing in a compiled language like

>> F# would you ever use slices?

>

> If I was writing in F#? I absolutely would, because some problems are

> most cleanly, readably, and maintainably done with slices. I don't

> have any speed-lust, and I don't waste time making things fast if they

> don't need to be. I would rather spend my time designing and

> implementing the system, and optimizing things that actually need it.

> I don't want to spend my time tracking down some stupid indexing error.

Ok. Can you name some problems that are solved more easily with slices than

with indexing? I may have just answered my own question by posting some

really concise pseudocode that uses slices.

> Will any other F# people do it? I don't know if they all have

> speed-lust, or if they'd all deliberately avoid slicing to prove some

> kind of point about functional programming and/or non-machine-compiled

> languages, but I assume there'd be some who would do it for the same

> reasons I'd do it.

My concern is that slices are being pulled into F# because they are popular

in languages like Python and Matlab but I've yet to see an example where

they are actually a good idea (in the context of F#, i.e. a compiled

language where indexing is as fast or faster).

>> What makes numpy convenient?

>

> ...it scales up well...

What do you mean?

> ...it expresses mathematical concepts straightforwardly...

It didn't express this mathematical concept very straightforwardly.

> And a lot of the time, you don't have to worry about indexing.

Right. Ideally you want to factor that out without losing anything. Indeed,

if you do factor the C++ code you should get back to the underlying

mathematical definitions. Ideally, you'd want to write those directly and

have them executed efficiently.

>> > No problem, numpy will be fast enough for my needs.

>>

>> Ok. But you'd rather have a compiler?

>

> Python *is* compiled, buddy. It has a VM that runs bytecode.

I meant a native-code compiler, of course.

> But, would I rather have a seperate compile to *machine code*, when an

> automatic compile to VM code is fast enough?

>

> Not remotely. Compilers are a PITA and I avoid them when it's

> possible.

I can envisage a JIT Python compiler that would spot statically typed code,

compile it with no user intervention and then run it. I implemented such a

thing for Mathematica a few years ago.

>> Why not drop to C for this one function?

>

> Why would I? The assumptions clearly stated that numpy is fast enough.

> Why would I take several hours to write a C extension when I could do

> it with numpy in about 20 minutes, *and it's fast enough*?

Is it very difficult to interface Python to other languages efficiently?

> But let me offer you some advice: if you wan't to lure people from

> Python, speed isn't going to cut it. Most people who chose Python

> (and, let's face it, most people who use Python chose it themselves)

> already knew it was slow, and chose it anyways. Pythonistas prefer

> Python because it's clean, readable, well-supported, and practical.

> Practical might be the sticking point here. If you were to get a

> Python prompt, and type "import this", it would print the Zen of

> Python, a set of principles by which the language is designed. One of

> them is "Practicality beats purity." And let's face it, functional

> languages are (or seem to be) all about purity. Notice that there's no

> Zen that says "Fast is better than slow". If you want us to swtich to

> Ocaml, you better figure out how it supports the Zen of Python better

> than Python does.

I'm keen on the practical aspect. I intend to take some of the excellent

Python demos out there and port them to languages like F# in order to

illustrate how they too can be practical languages.

Dec 3, 2006, 10:05:41 PM12/3/06

to

Jon Harrop wrote:

> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> would be a good student project?

I guess for a student project it's not that important, but if you have

higher ambitions, make sure you read

http://dirtsimple.org/2005/10/children-of-lesser-python.html before you

commit to writing what will turn out to be yet another "80% Python" VM

/ JIT compiler.

George

Dec 4, 2006, 2:15:38 AM12/4/06

to

Jon Harrop wrote:

> So I'm keen to learn what Python programmers would want/expect from F# and

> OCaml.

> So I'm keen to learn what Python programmers would want/expect from F# and

> OCaml.

I think this discussion becoming is a little misguided.

The real strength of scipy is the elegant notation rather than speed.

Being raised with Matlab I find scipy nicely familiar, and its fast

enough for many tasks. Some would argue that the strength of scipy is

the weakness of ocaml. Others would disagree. That is a question of

taste.

My only grudge about strongly recommending scipy to friends is the way

that two arrays can share the same data. This can lead to subtle errors

that I will eventually be blamed for. I don't know if arrays in Matlab

(or Octave) can share data, but if they do, then everything happens

behind the scenes and the user does not have to worry. I would love to

see a future version of numpy that was 50% slower and had a more

foolproof approach to array copying.

http://www.scipy.org/Tentative_NumPy_Tutorial#head-1529ae93dd5d431ffe3a1001a4ab1a394e70a5f2

Niels

Dec 4, 2006, 9:18:15 AM12/4/06

to

Jon Harrop wrote:

[...]

> I first wrote an OCaml translation of the Python and wrote my own

> little "slice" implementation. I have since looked up a C++ solution and

> translated that into OCaml instead:

>

> let rec d4_aux a n =

> let n2 = n lsr 1 in

> let tmp = Array.make n 0. in

> for i=0 to n2-2 do

> tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;

> tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;

> done;

> tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;

> tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;

> Array.blit tmp 0 a 0 n;

> if n > 4 then d4_aux a (n lsr 1)

>

> let d4 a = d4_aux a (Array.length a)

>

> Not only is that shorter than the Python

and makes my eyes bleeding, but it's only my personal feeling. Try

reading the code above aloud and you will see why `shorter` is not

necessarily considered a merit, especially on this group.

Besides of that this code is irrelevant to the original one and your

further conclusions may not be perfectly correct. Please learn first

about the topic of your benchmark and different variants of wavelet

transform, namely difference between lifting scheme and dwt, and try

posting some relevant examples or use other topic for your benchmarks.

Neglecting this issues, I wonder if it is equally easy to juggle

arbitrary multidimensional arrays in a statically typed language and do

operations on selected axis directly from the interactive interpreter

like in the NumPy example from my other post -

http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?

I don't know much OCaml so it would be great if you could elaborate on

this.

> , it is much faster:

>

> 0.56s C++ (direct arrays)

> 0.61s F# (direct arrays)

> 0.62s OCaml (direct arrays)

> 1.38s OCaml (slices)

> 2.38s Python (slices)

> 10s Mathematica 5.1

>

> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of

> a[i]).

[...]

So why not use C++ instead of all others if the speed really matters?

What is your point here?

Could you please share full benchmark code, so we could make

conclusions too? I would like to get to know about your benchmark

methodology. I wonder if you are allocating the input data really

dynamically or whether it's size is a priori knowledge for the

compiler.

> In this specific context (discrete wavelet transform benchmark), I'd have

> said that speed was the most important thing after correctness.

In this very specific context:

Yes, if you are targeting specific one-shot issue like JPEG2000

encoder, where you are using only 2 types of wavelets. In that case you

would probably draw back to low-level C or C++ and use a good

optimizing compiler for a specific hardware.

Not necessarily, if you are doing research with different kinds of

wavelets and need a general, flexible and easily adaptable tool or just

the computation speed is not considered a bottleneck.

Language speed is a great advantage, but if it always was the most

important, people would talk directly to the CPUs or knit DSP chips in

theirs backyards whenever they need to solve a calculation problem.

Please don't make this discussion heading towards `what is the fastest

way of doing d4 transform and why OCaml rules` instead of `what is the

optimal way of doing things under given circumstances and still have a

free afternoon ;-)`. Different tasks need different, well-balanced

measures. Neither Python nor OCalm nor any other language is a panacea

for every single problem.

cheers,

fw

Dec 4, 2006, 10:39:13 AM12/4/06

to

Filip Wasilewski wrote:

> So why not use C++ instead of all others if the speed really matters?

> What is your point here?

If speed matters, one should consider using hand-coded assembly.

Nothing really beats that. But it's painful and not portable. So lets

forget about that for a moment.

Contrary to what most computer scientists think, C++ is not the tool of

choice if speed really matters. Numeric computing is typically done in

Fortran 90/95.

That is not without reason. There are aspects of the C++ language that

makes certain numerical optimizations impossible, e.g. the potential

for pointer aliasing within tight loops, which hampers register

allocation. For example:

for(i=0; i<n; i++) {

*a += *b;

*b++ = 0;

}

Here the C++ compiler cannot put *a in a register, as a may be aliased

by b in one of the iterations. It must therefore reload the value *a

from memory in every iteration of the loop. In Fortran this can never

happen (aliasing is disallowed in the standard), and it is therefore

much easier for the compiler to do numerical optimizations.

One should also consider using ATLAS or vendor-optimized BLAS (e.g.

Intel math kernel) for some of the axpy-operations in the wavelet

transform, unless one has an expensive Fortran compiler that takes care

of this automatically (in which case one should just use slicing and

let the compiler do the rest).

I don't agree that slicing is not the best way to approach this

problem. In both NumPy and Matlab, slicing (often) results in calls to

BLAS/ATLAS, which are heavily optimized numerical routines. One can

supply BLAS libraries that automatically exploits multiple CPUs, use

FMA (fused multiply and add) whenever appropriate, and make sure cache

is used in the most optimal way.

Vendors of Fortran compilers thrive on scientists need for speed.

Fortran compilers are therefore extremely good at taking slicing and

"vectorizing" the corresponding machine code. For those who don't know

Fortran, it has slicing operations similar to those of NumPy and

Matlab. If slicing was not the right tool for numerical work, Fortran

would not support it. Good Fortran compilers can automatically detect

when a call to BLAS is appropriate, use FMA opcodes whenever possible,

use SIMD extensions (e.g. SSE and MMX), or use MPI automatically to

spread the work onto multiple CPUs in a cluster, etc. This is possible

because of the implicit SIMD parallelism in Fortran's array slicing. In

Fortran, array slicing can be evaluated in any order, it is a parallel

SIMD statement. SIMD is a very powerful and perhaps the preferred

paradigm for numerical parallel computing. In Fortran 90/95 a

programmer will use array slicing, and the compiler is free to

parallelize however it wants. In contrast, the MIMD paradigm often used

for parallel computing in C and C++ (e.g. multi-threading) is a lot

harder to code for the programmer and optimize for the compiler.

Not to speak of the fact that Fortran 95 is a lot easier to write than

C++. Also, those that use Fortran tend to be scientists, not

professional programmers. C++ introduces a higher level of complexity

and entire new classes of bugs; this is highly undesirable. Fortran is

a small, nice language for numerical computing. Anything else should

rather be done in Python than C++.

I would urge you all to read this:

http://latticeqcd.blogspot.com/2006/11/why-we-use-fortran-and-python.html

Dec 4, 2006, 12:22:00 PM12/4/06

to

Jon Harrop wrote:

[snip]

> >> That's my point, using numpy encouraged the programmer to optimise in the

> >> wrong direction in this case (to use slices instead of element-wise

> >> operations).

> >

> > Ok, I can see that. We have a sort of JIT compiler, psyco, that works

> > pretty well but I don't think it's as fast for large arrays as a good

> > numpy solution. But it has more to deal with than a JIT for a

> > statically typed language.

>

> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> would be a good student project?

[snip]

> >> That's my point, using numpy encouraged the programmer to optimise in the

> >> wrong direction in this case (to use slices instead of element-wise

> >> operations).

> >

> > Ok, I can see that. We have a sort of JIT compiler, psyco, that works

> > pretty well but I don't think it's as fast for large arrays as a good

> > numpy solution. But it has more to deal with than a JIT for a

> > statically typed language.

>

> Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> would be a good student project?

...and finishing would be a good project for a well-funded team of

experienced engineers.

> >> Which begs the question, if you were writing in a compiled language like

> >> F# would you ever use slices?

> >

> > If I was writing in F#? I absolutely would, because some problems are

> > most cleanly, readably, and maintainably done with slices. I don't

> > have any speed-lust, and I don't waste time making things fast if they

> > don't need to be. I would rather spend my time designing and

> > implementing the system, and optimizing things that actually need it.

> > I don't want to spend my time tracking down some stupid indexing error.

>

> Ok. Can you name some problems that are solved more easily with slices than

> with indexing? I may have just answered my own question by posting some

> really concise pseudocode that uses slices.

D4 transform?

How about addition?

How about the vast majority of cases where slicing is possible?

Occasionally you have cases where slicing would work but be

complicated, but in most cases, wherever slicing is feasible it's

easier, less error-prone, less typing, more readable, more concise,

more maintainable.

> > Will any other F# people do it? I don't know if they all have

> > speed-lust, or if they'd all deliberately avoid slicing to prove some

> > kind of point about functional programming and/or non-machine-compiled

> > languages, but I assume there'd be some who would do it for the same

> > reasons I'd do it.

>

> My concern is that slices are being pulled into F# because they are popular

> in languages like Python and Matlab but I've yet to see an example where

> they are actually a good idea (in the context of F#, i.e. a compiled

> language where indexing is as fast or faster).

It's often a good idea where speed isn't crucial because it's easier,

more readable, and more maintainable. I highly suspect F# and other

languages would still using slicing for many problems for this reason.

OTOH, it's rarely a good idea to always base coding decisions on speed

to the exclusion of all other factors.

> >> What makes numpy convenient?

> >

> > ...it scales up well...

>

> What do you mean?

>

> > ...it expresses mathematical concepts straightforwardly...

>

> It didn't express this mathematical concept very straightforwardly.

I thought it did. One must know about the shared data nuances and the

slicing syntax. Given that, the numpy way closely mimicked the way

mathematical formulas look. Put it this way: which one of these two

(numpy with slicing, Ocaml with indexing) would it be easer to reverse

engineer a mathematical formula out of? I'd say the numpy version with

slicing, by a long shot.

> > And a lot of the time, you don't have to worry about indexing.

>

> Right. Ideally you want to factor that out without losing anything. Indeed,

> if you do factor the C++ code you should get back to the underlying

> mathematical definitions. Ideally, you'd want to write those directly and

> have them executed efficiently.

I am almost always happy to factor indexing out *with* losing

something.

> >> > No problem, numpy will be fast enough for my needs.

> >>

> >> Ok. But you'd rather have a compiler?

> >

> > Python *is* compiled, buddy. It has a VM that runs bytecode.

>

> I meant a native-code compiler, of course.

>

> > But, would I rather have a seperate compile to *machine code*, when an

> > automatic compile to VM code is fast enough?

> >

> > Not remotely. Compilers are a PITA and I avoid them when it's

> > possible.

>

> I can envisage a JIT Python compiler that would spot statically typed code,

> compile it with no user intervention and then run it. I implemented such a

> thing for Mathematica a few years ago.

You can almost never spot "statically typed" code in Python reliably,

even with a separate compile stage that does a global analysis. Even

Lisp compilers, which are generally known to be the best compilers out

there for a dynamically-typed language, don't do too well at this

unless you help them out, and Lisp isn't nearly as dynamic as Python.

As for JIT's, the best you could do is hook into a function, checking

the types of its arguments, and cacheing optimized code for the given

type signature. It's basically what psyco does. Anything more would

require either help from the programmer or concessions in dynamicity.

> >> Why not drop to C for this one function?

> >

> > Why would I? The assumptions clearly stated that numpy is fast enough.

> > Why would I take several hours to write a C extension when I could do

> > it with numpy in about 20 minutes, *and it's fast enough*?

>

> Is it very difficult to interface Python to other languages efficiently?

No. It does involve lots of boilerplate. But even if it didn't, I

wouldn't write an extension, because the numpy solution is cleaner,

more readable, and more maintainable than anything I could write in C

(which is a poor language that's good at being fast and talking to

hardware and nothing else), and it's fast enough.

> > But let me offer you some advice: if you wan't to lure people from

> > Python, speed isn't going to cut it. Most people who chose Python

> > (and, let's face it, most people who use Python chose it themselves)

> > already knew it was slow, and chose it anyways. Pythonistas prefer

> > Python because it's clean, readable, well-supported, and practical.

> > Practical might be the sticking point here. If you were to get a

> > Python prompt, and type "import this", it would print the Zen of

> > Python, a set of principles by which the language is designed. One of

> > them is "Practicality beats purity." And let's face it, functional

> > languages are (or seem to be) all about purity. Notice that there's no

> > Zen that says "Fast is better than slow". If you want us to swtich to

> > Ocaml, you better figure out how it supports the Zen of Python better

> > than Python does.

>

> I'm keen on the practical aspect. I intend to take some of the excellent

> Python demos out there and port them to languages like F# in order to

> illustrate how they too can be practical languages.

That's what you say. But nothing you've done in this whole thread

remotely suggests that "practical" means anything to you other than,

"possible to write faster code in."

Carl Banks

Dec 4, 2006, 1:09:19 PM12/4/06

to

Carl Banks wrote:

> > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> > would be a good student project?

>

> ...and finishing would be a good project for a well-funded team of

> experienced engineers.

I think this is a good idea. We could use the AST from the CPython

compiler, translate on the fly to ANSI Common Lisp, and pass it off to

SBCL (which is very fast and supports native threads). It would even

solve the issue of Python's crappy GIL and reference counting (as

opposed to a modern garbage collector).

Another option would be to use something like pyrex to compile Python

on-the-fly to C, and then compile using GCC or MSVC.

A third option would be to translate on the fly to Scheme, and compile

using Bigloo.

It would not be a JIT-compiler, but rather something like the Lisp

solution where individual functions can be compiled when speed is

important.

It sounds like a fun project to me.

Dec 4, 2006, 4:11:10 PM12/4/06

to

sturlamolden wrote:

> I don't agree that slicing is not the best way to approach this

> problem.

> I don't agree that slicing is not the best way to approach this

> problem.

Indeed, the C++ approach can be written very succinctly using slicing:

for i=0 to n/2-1 do

tmp[i] = dot a[2*i:] h;

tmp[i + n/2] = dot a[2*i + 1:] g;

where a[i:] denotes the array starting at index "i".

> In both NumPy and Matlab, slicing (often) results in calls to

> BLAS/ATLAS, which are heavily optimized numerical routines. One can

> supply BLAS libraries that automatically exploits multiple CPUs, use

> FMA (fused multiply and add) whenever appropriate, and make sure cache

> is used in the most optimal way.

If the Python were making only a few calls to BLAS then I would agree.

However, in order to move as much computation as possible from the

interpreted language onto BLAS you have had to call BLAS many times.

So the super-fast BLAS routines are now iterating over the arrays many times

instead of once and the whole program is slower than a simple loop written

in C.

Indeed, this algorithm could be written concisely using pattern matching

over linked lists. I wonder if that would be as fast as using BLAS from

Python.

--

Dr Jon D Harrop, Flying Frog Consultancy

Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Dec 4, 2006, 4:28:31 PM12/4/06

to

sturlamolden wrote:

> Carl Banks wrote:

>

> > > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> > > would be a good student project?

> >

> > ...and finishing would be a good project for a well-funded team of

> > experienced engineers.

>

> I think this is a good idea. We could use the AST from the CPython

> compiler, translate on the fly to ANSI Common Lisp, and pass it off to

> SBCL (which is very fast and supports native threads). It would even

> solve the issue of Python's crappy GIL and reference counting (as

> opposed to a modern garbage collector).

> Carl Banks wrote:

>

> > > Ok. Perhaps starting a Python JIT in something like MetaOCaml or Lisp/Scheme

> > > would be a good student project?

> >

> > ...and finishing would be a good project for a well-funded team of

> > experienced engineers.

>

> I think this is a good idea. We could use the AST from the CPython

> compiler, translate on the fly to ANSI Common Lisp, and pass it off to

> SBCL (which is very fast and supports native threads). It would even

> solve the issue of Python's crappy GIL and reference counting (as

> opposed to a modern garbage collector).

I doubt it would work too well. Lisp's semantics differ from Python's

in subtle ways, and covering the edge cases usually means you have to

do all the stuff you thought you would avoid by using another

dynamically-typed language. Translating Lisp integers to Python ints

would work probably 99 percent of the time, but in the end you'd

essentially have to reimplement the Python int type.

If you're going to go through all that work, you might as well

translate it to C or directly to machine code.

Carl Banks

Dec 4, 2006, 9:08:48 PM12/4/06

to

Jon Harrop wrote:

> So the super-fast BLAS routines are now iterating over the arrays many times

> instead of once and the whole program is slower than a simple loop written

> in C.

Yes. And the biggest overhead is probably Python's function calls.

Little is as efficient as well-written ISO C99 (not to be confused with

C++ or ANSI C). The only thing I can think of is hand-written assembly.

But whether a single loop is faster than several BLAS calls does not

only depend on how many times the memory is traversed. It also depends

a lot on cache prefetching. So I assume you make sure that the cache is

prefetched and exploited optimally for your CPU in your C code? And you

are presumably also inlining assembly to use SSE and other SIMD

opcodes? After all, SSE can speed up floating point performance by a

factor of four if the computations can be carried out in parallel. At

one point, one has to stop and ask a very important question:

How fast is fast enough? And how much suffering am I willing to endure

to achieve that speed?

Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how

much time was saved writing and debugging the code? I value my own time

more than a few extra CPU cycles. With Fortran 95, it is easy to write

numerical code that runs reasonably fast. It is only a little bit

harder to write numerical code that runs very fast. There is nothing in

Fortran that prevents you from writing loops that are equally fast or

faster than ANSI C loops. If you have several subsequent slicing

operations, a good Fortran compiler can detect that and fuse them into

a single loop. Never underestimate the ability of a modern Fortran

compiler to generate efficient machine code, even if array slicing is

used heavily.

Also never underestimate the truth in Knuth's claim that premature

optimization is the root of all evil in computer programming. It is a

lot smarter to write a pure Python/NumPy solution, and if its too slow,

run the profiler, identify the worst bottlenecks, and translate them to

Fortran. The result will probably not be as fast as a pure C solution.

But if it is still too slow, money is better spent buying better

hardware than attempting a major rewrite in C/C99/C++.

NumPy can be slower than equivalent C by an order of magnitude. But

still, that may be more than fast enough. The beauty of NumPy is really

the way it allows arrays of numerical data to be created and

manipulated within Python, not the speed of python code using NumPy

objects. We have already seen that NumPy is faster than the most recent

version of Matlab, the most expensive commercial tool on the marked;

which means, it is as fast as you can expect a numerical scripting tool

to be, but sure, there is still room for improvement.

In a numerical scripting language it is certainly not true that several

slicing operations is slower than a single big for-loop. That is due to

the overhead involved when the interpreter is invoked. The performance

is to a large extent determined by the amount of interpreter

invocations. That is why it is preferred to "vectorize" NumPy code

using slicing, repmat, reshapes, etc., rather than looping with a

pythonic for-loop. Even if you have to iterate over the memory several

times, array slicing will still be a lot faster than a pythonic

for-loop.

> Indeed, this algorithm could be written concisely using pattern matching

> over linked lists. I wonder if that would be as fast as using BLAS from

> Python.

Pattern matching on linked lists? For wavelet transforms? Seriously?

Dec 5, 2006, 5:11:58 AM12/5/06

to

sturlamolden wrote:

> Little is as efficient as well-written ISO C99 (not to be confused with

> C++ or ANSI C).

> Little is as efficient as well-written ISO C99 (not to be confused with

> C++ or ANSI C).

OCaml and F# are almost as fast as C++ in this case. I suspect most other

modern languages are.

> So I assume you make sure that the cache is

> prefetched and exploited optimally for your CPU in your C code?

No. My implementations are all unoptimised.

> How fast is fast enough? And how much suffering am I willing to endure

> to achieve that speed?

No suffering: The other implementations are all shorter, simpler and faster

than the Python.

> Bad Fortran 95 can easily be 25% lower than well-tuned C99. But how

> much time was saved writing and debugging the code?

A lot of effort was saved by not having to shoehorn the solution into numpy.

> I value my own time more than a few extra CPU cycles.

Then you won't write code like this in Python.

> Also never underestimate the truth in Knuth's claim that premature

> optimization is the root of all evil in computer programming.

Indeed, only the Python was optimised.

> NumPy can be slower than equivalent C by an order of magnitude. But

> still, that may be more than fast enough. The beauty of NumPy is really

> the way it allows arrays of numerical data to be created and

> manipulated within Python, not the speed of python code using NumPy

> objects. We have already seen that NumPy is faster than the most recent

> version of Matlab, the most expensive commercial tool on the marked;

> which means, it is as fast as you can expect a numerical scripting tool

> to be, but sure, there is still room for improvement.

Price aside, Matlab is clearly the wrong tool for the job here.

> In a numerical scripting language it is certainly not true that several

> slicing operations is slower than a single big for-loop. That is due to

> the overhead involved when the interpreter is invoked. The performance

> is to a large extent determined by the amount of interpreter

> invocations. That is why it is preferred to "vectorize" NumPy code

> using slicing, repmat, reshapes, etc., rather than looping with a

> pythonic for-loop. Even if you have to iterate over the memory several

> times, array slicing will still be a lot faster than a pythonic

> for-loop.

Yes. You have to suffer in Python if you want to approach performance that

can be reached with ease in most other languages in the case of this

benchmark (and probably most benchmarks).

>> Indeed, this algorithm could be written concisely using pattern matching

>> over linked lists. I wonder if that would be as fast as using BLAS from

>> Python.

>

> Pattern matching on linked lists? For wavelet transforms? Seriously?

Sure. Here's a list-based implementation in OCaml:

let h x0 x1 x2 x3 = h0 *. x0 +. h1 *. x1 +. h2 *. x2 +. h3 *. x3

let g x0 x1 x2 x3 = g0 *. x0 +. g1 *. x1 +. g2 *. x2 +. g3 *. x3

let rec d4_even s d t a = match t, a with

| [x0; x1; x2; x3], x4::x5::x6::_ ->

h x0 x1 x2 x3::h x2 x3 x4 x5::s, g x1 x2 x3 x4::g x3 x4 x5 x6::d

| x0::x1::(x2::x3::x4::_ as t), a ->

d4_even (h x0 x1 x2 x3 :: s) (g x1 x2 x3 x4 :: d) t a

let rec d4 = function

| [_; _; _; _] as a -> [a, []]

| a -> (fun (s, _ as h) -> h :: d4 s) (d4_even [] [] a a)

That's about 3x slower than the Python.

Dec 5, 2006, 7:36:38 AM12/5/06

to

Filip Wasilewski wrote:

> Besides of that this code is irrelevant to the original one and your

> further conclusions may not be perfectly correct. Please learn first

> about the topic of your benchmark and different variants of wavelet

> transform, namely difference between lifting scheme and dwt, and try

> posting some relevant examples or use other topic for your benchmarks.

> Besides of that this code is irrelevant to the original one and your

> further conclusions may not be perfectly correct. Please learn first

> about the topic of your benchmark and different variants of wavelet

> transform, namely difference between lifting scheme and dwt, and try

> posting some relevant examples or use other topic for your benchmarks.

Your lifting implementation is slightly longer and slower than the

non-lifting one but its characteristics are identical. For n=2^25 I get:

1.88s C++ (816 non-whitespace bytes)

2.00s OCaml (741 b)

2.33s F# (741 b)

9.83s Your Python (1,002 b)

The original python was 789 bytes.

> Neglecting this issues, I wonder if it is equally easy to juggle

> arbitrary multidimensional arrays in a statically typed language and do

> operations on selected axis directly from the interactive interpreter

> like in the NumPy example from my other post -

> http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?

> I don't know much OCaml so it would be great if you could elaborate on

> this.

It is probably just as easy. Instead of dynamic typing you have parametric

polymorphism. If you want to make your function generic over arithmetic

type then you can pass in the arithmetic operators.

>> 0.56s C++ (direct arrays)

>> 0.61s F# (direct arrays)

>> 0.62s OCaml (direct arrays)

>> 1.38s OCaml (slices)

>> 2.38s Python (slices)

>> 10s Mathematica 5.1

>>

>> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of

>> a[i]).

>

> So why not use C++ instead of all others if the speed really matters?

> What is your point here?

1. Benchmarks should not just include two inappropriate languages.

2. Speed aside, the other languages are more concise.

> Could you please share full benchmark code, so we could make

> conclusions too?

I'll paste the whole programs at the end...

> I would like to get to know about your benchmark

> methodology. I wonder if you are allocating the input data really

> dynamically or whether it's size is a priori knowledge for the

> compiler.

The knowledge is there for the compiler to use but I don't believe any of

them exploit it.

>> In this specific context (discrete wavelet transform benchmark), I'd have

>> said that speed was the most important thing after correctness.

>

> Not necessarily, if you are doing research with different kinds of

> wavelets and need a general, flexible and easily adaptable tool or just

> the computation speed is not considered a bottleneck.

Brevity is probably next.

> Language speed is a great advantage, but if it always was the most

> important, people would talk directly to the CPUs or knit DSP chips in

> theirs backyards whenever they need to solve a calculation problem.

Sure.

> Please don't make this discussion heading towards `what is the fastest

> way of doing d4 transform and why OCaml rules` instead of `what is the

> optimal way of doing things under given circumstances and still have a

> free afternoon ;-)`.

Comments like that seem to be based on the fundamental misconception that

writing C++ like this is hard.

> Different tasks need different, well-balanced

> measures. Neither Python nor OCalm nor any other language is a panacea

> for every single problem.

Absolutely. OCaml is as good as the next (compiled) language in this case.

Python and Matlab seem to be particularly bad at this task.

Here's my complete OCaml:

let a = sqrt 3. and b = 4. *. sqrt 2.

let h0, h1, h2, h3 =

(1. +. a) /. b, (3. +. a) /. b, (3. -. a) /. b, (1. -. a) /. b

let g0, g1, g2, g3 = h3, -.h2, h1, -.h0

let rec d4_aux tmp a n =

let n2 = n lsr 1 in

for i=0 to n2-2 do

tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;

tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;

done;

tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;

tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;

Array.blit tmp 0 a 0 n;

if n > 4 then d4_aux tmp a (n lsr 1)

let d4 a =

let tmp = Array.make (Array.length a) 0. in

d4_aux tmp a (Array.length a)

let () =

print_endline "Generating test data...";

let x = Array.init (1 lsl 25) (fun _ -> Random.float 1.) in

print_endline "Benchmarking...";

let t1 = Sys.time() in

ignore(d4 x);

Printf.printf "Elapsed time is %.6f seconds\n" (Sys.time() -. t1)

and C++:

#include <iostream>

#include <vector>

#include <ctime>

#include <cmath>

using namespace std;

double a = sqrt(3), b = 4 * sqrt(2);

double h0 = (1 + a) / b, h1 = (3 + a) / b, h2 = (3 - a) / b, h3 = (1 - a) /

b;

double g0 = h3, g1 = -h2, g2 = h1, g3 = -h0;

void d4(vector<double> &a) {

vector<double> tmp(a.size());

for (int n = a.size(); n >= 4; n >>= 1) {

int half = n >> 1;

for (int i=0; i<n-2; i+=2) {

tmp.at(i/2) = a.at(i)*h0 + a.at(i+1)*h1 + a.at(i+2)*h2 + a.at(i+3)*h3;

tmp.at(i/2+half) = a.at(i)*h3 - a.at(i+1)*h2 + a.at(i+2)*h1 -

a.at(i+3)*h0

;

}

tmp.at(half-1) = a.at(n-2)*h0 + a.at(n-1)*h1 + a.at(0)*h2 + a.at(1)*h3;

tmp.at(2*half-1) = a.at(n-2)*h3 - a.at(n-1)*h2 + a.at(0)*h1 -

a.at(1)*h0;

copy(tmp.begin(), tmp.begin() + n, a.begin());

}

}

int main() {

cout << "Generating data...\n";

vector<double> a(1 << 25);

for (int i=0; i<a.size(); ++i) a.at(i) = rand();

cout << "Benchmarking...\n";

double t1 = clock();

d4(a);

cout << "Elapsed time " << (clock() - t1) / CLOCKS_PER_SEC << "s\n";

}

--

Dr Jon D Harrop, Flying Frog Consultancy

Objective CAML for Scientists

http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet

Dec 5, 2006, 10:35:37 AM12/5/06

to

I doubt that anyone would dispute that even as boosted by Numpy/Scipy,

Python will almost certainly be notably slower than moderately

well-written code in a compiled language. The reason Numpy exists,

however, is not to deliver the best possible speed, but to deliver

enough speed to make it possible to solve large numerical problems with

the powerful and flexible Python language. As observed by Hans

Latangen in _Python Scripting for Computational Science_, 2nd ed.,

Springer 2005, scientific computing is more than number crunching:

Python will almost certainly be notably slower than moderately

well-written code in a compiled language. The reason Numpy exists,

however, is not to deliver the best possible speed, but to deliver

enough speed to make it possible to solve large numerical problems with

the powerful and flexible Python language. As observed by Hans

Latangen in _Python Scripting for Computational Science_, 2nd ed.,

Springer 2005, scientific computing is more than number crunching:

"Very often programming is about shuffling data in and out of different

tools, converting one data format to another, extracting numerical data

from a text, and administering numerical experiments involving a large

number of data files and directories. Such tasks are much faster to

accomplish in a language like Python than in Fortran, C, C++ or Java."

He might well have mentioned the importance of developing nice-looking

reports once the analysis is complete, and that development is simpler

and more flexible in Python than a compiled language.

In principle, I agree that heavy-duty number-crunching, at least if it

has to be repeated again and again, should be accomplished by means of

a compiled language. However, if someone has to solve many different

problems just one time, or just a few times (for example if you are an

engineering consultant), there is an excellent argument for using

Python + Numpy. Unless it affects feasibility, I opine, computational

speed is important primarily in context of regular production, e.g.,

computing the daily "value at risk" in a commodity trading portfolio,

or making daily weather predictions.

I am aware of the power and flexibility of the OCaml language, and

particularly that an OCaml user can easily switch back and forth

between interpreted and compiled implementation. I'm attacted to OCaml

and, indeed, I'm in the process of reading Smith's (unfortunately not

very well-written) _Practical OCaml_. However, I also understand that

OCaml supports only double-precision implementation of real numbers;

that its implementation of arrays is a little clunky compared to

Fortran 95 or Numpy (and I suspect not as fast as Fortran's); and that

the available libraries, while powerful, are by no means as

comprehensive as those available for Python. For example, I am unaware

of the existance of an HDF5 interface for OCaml.

In summary, I think that OCaml is an excellent language, but I think

that the question of whether to use it in preference to Python+Numpy

for general-purpose numerical analysis must rest on much more than its

computational speed.

Dec 5, 2006, 10:37:06 AM12/5/06

to

Hans >Langtangen<, rather.

Dec 5, 2006, 12:16:07 PM12/5/06

to

Carl,

I agree with practically everything you say about the choice between

Python and functional languages, but apropos of Ocaml, not these

remarks:

>

> In the same way that a screwdriver can't prevent you from driving a

> nail. Give me a break, we all know these guys (Haskell especially) are

> designed to support functional first and other paradigms second. Way

> second. That's quite enough imposing for me.

> ...

> And let's face it, functional

> languages are (or seem to be) all about purity.

As you will see if you look at some OCaml references (there are a

number that are available on the OCaml website) that OCaml goes a very

long way toward facilitating imperative, procedural methods. It's

possible to write perfectly good code in OCaml and not use functional

methods. The OCaml people are fond of saying "OCaml isn't pure."

Indeed, the need for speed is one very good reason to choose and

imperative style over a functional one.