I would welcome comments and corrections, and would be happy to
contribute some version of this to the Python website if it is of
interest.
===
The established use of Fortran in continuum models such as climate
models has some benefits, including very high performance and
flexibility in dealing with regular arrays, backward compatibility with
the existing code base, and familiarity with the language among the
modeling community. Fortran 90 and later versions have taken many of
the lessons of object oriented programming and adapted them so that
logical separation of modules is supported, allowing for more effective
development of large systems. However, the modeling environment
increasingly involves many tasks for which Fortran is ill-suited.
These include: source and version control and audit trails for runs,
build system management, test specification, deployment testing (across
multiple platforms), post-processing analysis, run-time and
asynchronous visualization, distributed control and ensemble
management. To achieve these goals, a combination of shell scripts,
specialized build tools, specialized applications written in several
object-oriented languages, and various web and network deployment
strategies has been assembled in an ad hoc manner. Not only has much
duplication of effort occurred, but a great deal of struggling up the
learning curves of various technologies has been required as each need
has been addressed in isolation.
A new need arises as the ambitions of physical modeling increase; this
is the rapid prototyping and testing of new model components. As the
number of possible configurations of a model increases, the expense and
difficulty of both unit testing and integration testing becomes more
demanding.
Fortunately, there is Python. Python is a very flexible language that
has captured the enthusiasm of commercial and scientific programmers
alike. The common perception of programmers coming to Python from
almost any other language is that they are suddenly several times
more productive, in terms of functionality delivered per unit of
programmer time.
One slogan of the Python community is that the language "fits your
brain". Why this might be the case is an interesting question. There
are no startling computer science breakthroughs original to the
language. Rather, Python aficionados will claim that the language
combines the best features of languages as varied as Lisp, Perl,
Java, and Matlab. Eschewing allegiance to a specific theory of how to
program, Python's design instead offers the best practices from many
other software cultures.
The synergies among these programming modes are in some ways harder to
explain than to experience. The Python novice may nevertheless observe
that a single language can take the place of shell scripts, makefiles,
desktop computation environments, compiled languages to build GUIs, and
scripting languages to build web interfaces. In addition, Python is
useful as a wrapper for Fortran modules, facilitating the
implementation of true test-driven design processes in Fortran models.
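To make the "true test-driven design" point concrete, here is a sketch
of what a unit test around a wrapped Fortran routine could look like.
The module name `flux`, the routine `column_flux`, and its pure-Python
body are all hypothetical stand-ins; in a real model the import would
pull in an f2py-built extension module instead.

```python
import unittest

# Hypothetical stand-in: in practice this would be
#   from flux import column_flux
# where `flux` is an extension module built by f2py from Fortran source.
def column_flux(t_surface, emissivity):
    """Grey-body longwave flux, W m^-2 (stand-in for the Fortran routine)."""
    SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
    return emissivity * SIGMA * t_surface ** 4

class TestColumnFlux(unittest.TestCase):
    def test_blackbody_300K(self):
        # A unit-emissivity surface at 300 K radiates about 459.3 W m^-2.
        self.assertAlmostEqual(column_flux(300.0, 1.0), 459.30, places=1)

    def test_zero_emissivity(self):
        self.assertEqual(column_flux(300.0, 0.0), 0.0)

# Run with:  python -m unittest <modulename>
```

The physics behind the Fortran stays in Fortran; Python supplies the
test harness, which is exactly the division of labor being argued for.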
Another Python advocacy slogan is "batteries included". The point here
is that (in part because Python is dramatically easier to write than
other languages) there is a very broad range of very powerful standard
libraries that make many tasks which are difficult in other languages
astonishingly easy in Python. For instance, drawing upon the standard
libraries (no additional download required) a portable webserver
(runnable on both Microsoft and Unix-based platforms) can be
implemented in seven lines of code. (See
http://effbot.org/librarybook/simplehttpserver.htm ) Installation of
pure Python packages is also very easy, and installing a mixed-language
product with a Python component is generally not significantly harder
than installing a comparable product with no Python component.
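A minimal version of that standard-library web server, in modern
Python 3 spelling (the link above describes the Python 2 module
SimpleHTTPServer), might look like this; the background thread and the
port-0 trick are only conveniences so the sketch runs and exits cleanly:

```python
import http.server
import socketserver
import threading
import urllib.request

# Bind to port 0 so the OS picks a free port for this demonstration.
server = socketserver.TCPServer(("", 0),
                                http.server.SimpleHTTPRequestHandler)
port = server.server_address[1]

# Serve the current directory in a background thread.
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the directory listing we are now serving.
with urllib.request.urlopen("http://localhost:%d/" % port) as resp:
    print(resp.status)  # 200
server.shutdown()
```

Everything here ships with the interpreter; no additional download is
required, which is the "batteries included" point exactly.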
Among the Python components and Python bindings of special interest to
scientists are the elegant and powerful matplotlib plotting package,
which began by emulating and now surpasses the plotting features of
Matlab; SWIG, which allows for runtime interoperability with various
languages; f2py, which specifically interoperates with Fortran; NetCDF
libraries (which cope with NetCDF files with dramatically less fuss
than the standard C or Fortran bindings); statistics packages,
including bindings to the R language; linear algebra packages; various
platform-specific and portable GUI libraries; genetic algorithms;
optimization libraries; and bindings for high-performance differential
equation solvers (notably via the Argonne National Laboratory package
PETSc). An especially interesting Python trick for runtime
visualization in models that were not designed to support it, pioneered
by David Beazley's SWILL, embeds a web server in your model code.
See especially http://starship.python.net/~hinsen/ScientificPython/ and
http://scipy.org as good starting points to learn about scientific uses
of Python.
mt
> other language is that they are suddenly dramatically several times
> more productive
'suddenly dramatically several times' seems a bit redundantly
repetitively excessive, don't you think?
> Among the Python components and Python bindings of special interest to
> scientists are the elegant and powerful matplotlib plotting package,
> which began by emulating and now surpasses the plotting features of
> Matlab, SWIG, which allows for runtime interoperability with various
> languages, f2py which specifically interoperates with Fortran, NetCDF
> libraries (which cope with NetCDF files with dramatically less fuss
> than the standard C or Fortran bindings), statistics packages including
> bindings to the R language, linear algebra packages, various
> platform-specific and portable GUI libraries, genetic algorithms,
> optimization libraries, and bindings for high performance differential
> equation solvers (notably, using the Argonne National Laboratory
> package PetSC).
As the length of the sentence built up, and the innumerable commas
passed by, my brain exploded. I'd suggest turning this into a bullet
list.
Lovely; putting a copy here is a great service to others.
I want a few subtle changes. I applaud the slogan about how Python
encompasses the best of Matlab and Java (among others); I like to
think that'll get through. In that vicinity would be a good place,
if practical, to work in mention that:
A. Python is *excellent* for long-lasting and/or
group work;
B. Python's licensing is friendly;
C. It's a real language, and therefore generalizes
far better than Matlab; and
D. Has an unrivaled span of practicality, so that
learning it enables a researcher to tackle a
wide variety of software tasks.
You touch on these matters, but I think that section might be
propitious for promoting them, perhaps along with
E. Python's ease-of-learning and successful record
in the hands of children, scientists, and other
casual practitioners.
Also, my instinct is to underline that this stuff is REAL. David
Beazley was winning awards with his scientific Python-Fortran
marriage back in the '90s. Perhaps your audience doesn't need so
much convincing on that point ...
Thank you - this was very good reading.
> I would welcome comments and corrections, and would be happy to
> contribute some version of this to the Python website if it is of
> interest.
>
A slight broadening of the perspective could show another advantage:
Python is also used for data processing, at least in astronomy. Modeling
and processing the data in the same environment is very practical. You
can spend more time on modeling and on processing the critical sections
of the data; which sections are critical may depend on model parameters
and on sampling (which is often incomplete and uneven). You also avoid
wasting CPU cycles modeling things not in the data.
A theorist may be perfectly happy with Fortran, and an observer could do
his stuff with simple scripts. But if they need to work together, Python
is a very good option.
Great text. Do you want to put it onto a Wiki page at wiki.python.org?
Georg
Like it - an area that doesn't come out strongly enough for me is
Python's ability to drop down to and integrate with low level
algorithms. This allows me to optimise the key bits of design in
Python very quickly, and then, if I still need more poke, I can drop
down to low-level programming languages. Optimise design, not code,
unless I really need to.
To be fair, the same is at least partly true for Java (though
supporting JNI code scares me), but my prototyping productivity isn't
as high.
The distributed / HPC packages may also be worth noting - PyMPI and
PyGlobus.
p
Being a scientist, I can tell you that you're not getting it right. If
you speak computer science or business talk, no scientist is going to
listen. Let's just see how you argue:
> These include: source and version control and audit trails for runs,
> build system management, test specification, deployment testing (across
> multiple platforms), post-processing analysis, run-time and
> asynchronous visualization, distributed control and ensemble
> management.
At this point, no scientist will understand what the heck you are
talking about. All have stopped reading and are busy doing
experiments in the laboratory instead. Perhaps it sounds good to a CS
geek, but not to a busy researcher.
Typically a scientist needs to:
1. do a lot of experiments
2. analyse the data from experiments
3. run a simulation now and then
Thus, we need something that is "easy to program" and "runs fast
enough" (and by fast enough we usually mean extremely fast). The tools
of choice seem to be Fortran for the older professors (you can't teach
old dogs new tricks) and MATLAB (perhaps combined with plain C) for the
younger ones (that would e.g. be yours truly). Hiring professional
programmers is usually futile, as they don't understand the problems
we are working with. They can't solve problems they don't understand.
What you really need to address is something very simple:
Why is Python a better Matlab than Matlab?
The programs we need to write typically fall into one of three
categories:
1. simulations
2. data analysis
3. experiment control and data acquisition
(those are words that scientists do know)
In addition, there are 10 things you should know about scientific
programming:
1. Time is money. Time is the only thing that a scientist cannot afford
to lose. Licensing fees for Matlab are not an issue. If we can spend
$1,000,000 on specialised equipment we can pay whatever Mathworks or
Lahey charges as well. However, time spent programming is an issue.
(As is time spent learning a new language.)
2. We don't need fancy GUIs. GUI coding is a waste of time we don't
have. We don't care if Python has fancy GUI frameworks or not.
3. We do need fancy data plotting and graphing - plotting and graphing
that are easy to use, as in Matlab or S-PLUS.
4. Anything that has to do with website development or enterprise-class
production quality control is crap that we don't care about.
5. Versioning control? For each program there is only one developer and
a single user or a handful of users.
6. The prototype is the final version. We are not making software for a
living, we are doing research.
7. "My simulation is running too slowly" is the number ONE complaint.
Speed of execution is an issue, regardless of what computer science
folks try to tell you. That is why we spend a disproportionate amount
of time learning to vectorize Matlab code.
8. "My simulation is running out of memory" is the number TWO
complaint. Matlab is notorious for leaking memory and fragmenting the
heap.
9. What are algorithms and data structures? Very few of us know how to
use a data structure more complicated than an array. That is why we
like Matlab and Fortran so much.
10. We are novice programmers. We are not passionate programmers. We
take no pride in our work. The easier the hack, the better. We don't
care if we are doing OOP or not. However, we do hate complicated APIs
or APIs that look funny. We are used to seeing sin(x) in our calculus
textbooks and because of that we don't find Math.Sin(x) particularly
elegant -- even though Math.Sin(x) is more OOP and sin(x) clutters the
global namespace.
Now please go ahead and tell me how Python can help me become a better
scientist. And try to steer clear of the computer science buzzwords
that don't mean anything to me.
Thanks!
Sturla Molden
(neuroscience PhD)
> Michael Tobis skrev:
>
> Being a scientist, I can tell you that your not getting it right. If
> you speak computer science or business talk no scientist are going to
> listen. Lets just see how you argue:
>
>
>> These include: source and version control and audit trails for runs,
>> build system management, test specification, deployment testing
>> (across
>> multiple platforms), post-processing analysis, run-time and
>> asynchronous visualization, distributed control and ensemble
>> management.
>>
>
> At this point, no scientist will no longer understand what the heck
> you
> are talking about. All have stopped reading and are busy doing
> experiments in the laboratory instead. Perhaps it sound good to a CS
> geek, but not to a busy researcher.
>
Agreed. I'm slowly learning the CS lingo, but then, I've been trying
to learn the lingo since 1977.
>
> Typically a scientist need to:
>
> 1. do a lot of experiments
>
> 2. analyse the data from experiments
>
> 3. run a simulation now and then
>
Generally correct, but the way of the physical scientist is becoming
more #3 and less #1.
>
> Thus, we need something that is "easy to program" and "runs fast
> enough" (and by fast enough we usually mean extremely fast). The tools
> of choice seems to be Fortran for the older professors (you can't
> teach
> old dogs new tricks) and MATLAB (perhaps combined with plain C) for
> the
> younger ones (that would e.g. be yours truly).
>
I was very unlucky, as I was in college just as the old computer
landscape was passing away and the new one was being born.
My first programs were punched on cards in WatFiv Fortran and run on
an IBM 360. Next, I got my Apple ][ and learned BASIC. Then off to
college for more Fortran 77. The most advanced CS course I've ever
taken was called "Introduction to Interactive Computing", where I was
taught that there's more to life than punch cards and a line printer.
> Hiring professional
> programmers are usually futile, as they don't understand the problems
> we are working with. They can't solve problems they don't understand.
>
I wouldn't touch this comment with a 01010 foot pole.
>
> What you really ned to address is something very simple:
>
>
> Why is Python better a better Matlab than Matlab?
>
1. Matlab costs $$$. In grad school, I had to buy my own student copy
of Mathematica for my Mac Plus because there wasn't any research
money or any access to anything else. IIRC, most math I had to do was
done with the backs of stacks of computer printouts, a mechanical
pencil and four pots of black coffee.
2. The Python community makes very sophisticated code available in a
wide array of areas, from pure number crunching, to symbolic algebra,
graphics, image processing, databases, communications, and on-and-on.
And I can customize every bit of it to meet my needs.
3. The people who maintain and write SciPy and NumPy are
knowledgeable, and helpful, despite having more pressing issues than
helping me! (Thanks, guys!)
>
>
> The programs we need to write typically falls into one of three
> categories:
>
> 1. simulations
> 2. data analysis
> 3. experiment control and data aquisition
>
> (that are words that scientists do know)
>
Yes, but I write other code, too. I've often said that my favorite
toy is a great programming language, and Python fits that concept
perfectly.
> In addition, there are 10 things you should know about scientific
> programming:
>
> 1. Time is money. Time is the only thing that a scientist cannot
> afford
> to lose. Licensing fees for Matlab is not an issue. If we can spend
> $1,000,000 on specialised equipment we can pay whatever Mathworks or
> Lahey charges as well. However, time spent programming are an issue.
> (As are time time spend learning a new language.)
>
True, if you work at a well-funded institution.
I work for myself. Very little money to go around. I don't have
million-dollar instruments. What I have, I build as inexpensively as
I can. "Hack" is a word with meanings beyond CS.
> 2. We don't need fancy GUIs. GUI coding is a waste of time we don't
> have. We don't care if Python have fancy GUI frameworks or not.
>
Fancy? No. Usable, most definitely! Without a decent UI, I have a
hard time using my own code. Plus, if I want to share my ideas with
anyone, an understandable GUI helps tremendously.
> 3. We do need fancy data plotting and graphing. We do need fancy
> plotting and graphing that are easy to use - as in Matlab or S-PLUS.
>
Yes, I need fancy visualizing tools, too. I have to work a bit to get
what I want from Python, but I have total control when I get it.
> 4. Anything that has to do with website development or enterprise
> class
> production quality control are crap that we don't care about.
>
There's a devil hiding in this statement. The last company I worked
for was founded by engineers. The company went bankrupt after they
decided that they knew more about quality programming than the CSs.
> 5. Versioning control? For each program there is only one developer
> and
> a single or a handful users.
>
Matlab is version controlled by people well-paid to do so. Whatever
code you write is built upon a Matlab foundation. If it's of poor
quality, then the results of your program will be, too. Also, I can't
tell you how many times I've mixed up versions of 30-page programs
I've written.
> 6. The prototype is the final version. We are not making software
> for a
> living, we are doing research.
>
I personally find it difficult to stop working on code. New features,
better algorithms. Better interface to other programs I write later.
[snip]
>
> 9. What are algorithms and data structures? Very few of us knows
> how to
> use a datastructure more complicated than an array. That is why we
> like
> Matlab and Fortran so much.
>
My ability to think of data structures was stunted BECAUSE of
Fortran and BASIC. It's very difficult for me to give up my bottom-up
programming style, even though I write better, clearer and more
useful code when I write top-down.
>
> 10. We are novice programmers. We are not passionate programmers. We
> take no pride in our work. The easier hack the better. We don't
> care if
> we are doing OOP or not. However, we do hate complicated APIs or APIs
> that look funny. We are used to seeing sin(x) in our calculus
> textbooks
> and because of that we don't find Math.Sin(x) particularly elegant --
> even though Math.Sin(x) is more OOP and sin(x) clutters the global
> namespace.
>
I agree with your thought. Certainly, the internals of the language
are beyond me, and structures shouldn't be arcane. But I can have
sin(x) in Python if I want it. Python has taught me a great deal about
OOP. Pascal, C, C++, etc., still mystify me. I can't figure them out
to save my life. But everything I've tried in Python (so far) has
made sense to me, even if it took a few days of thought to figure it
out.
>
> Now please go ahead and tell me how Python can help me become a better
> scientist. And try to steer clear of the computer science buzzwords
> that don't mean anyting to me.
>
My pleasure. Here's my experience:
1. I don't have the money for Matlab.
2. I don't have the skills or time to write every module I might need.
3. I demand a general-purpose toolkit, not a gold-plated screwdriver.
4. I've learned new ways to organize computations because of Python.
5. User groups have given me access to thousands of professional
scientists and engineers (computer and otherwise) around the world.
6. I love the MPFC jokes.
>
> Thanks!
>
> Sturla Molden
> (neuroscience PhD)
>
Kindest regards,
--David Treadwell
Chemistry PhD
Quintillion Materials Research LLC
"that man speaks for himself!" ;-)
Seriously, this depends on the lab. If you're working for
a monster pharmaceutical corp or on a military contract on
"applied" science (meaning there is a definitely payback
expected), then you likely have money to burn. People
working in an academic or non-profit lab on "unsexy"/"pure"
science likely don't.
Remember that site-licensing usually works on some kind of
"per seat" basis (even if you are lucky enough *not* to have
a "license server" that constantly tracks usage in order to
deny service if and when N+1 users try to use the system,
the site fee is still based on the number
of expected users). The last science facility I worked at
was in considerable debt to a proprietary scientific
software producer and struggling to pay the bills. The
result was that they had fewer licenses than they wanted
and many people simply couldn't use the software when they
wanted.
I'm not sure what happened in the end, because I left for
unrelated reasons before all of that got sorted out, but
Python (with a suitable array of add-ons) was definitely on
the short-list of replacement software (and partly because I
was trying to sell people on it, of course).
In fact, if I had one complaint about Python, it was the
"with a suitable array of add-ons" caveat. The proprietary
alternative had all of that rolled into one package (albeit
glopped into one massive and arcane namespace), whereas
there was no "Python Data Language" or whatever that
would include all that in one named package that everyone
could recognize (I suppose SciPy is trying to achieve that).
For similar reasons, Space Telescope Science Institute
decided to go full tilt into python development -- they
created "numarray" and "pyraf", and they are the ones paying
for the "chaco" development contract.
Which brings up another point: whereas proprietary
software (and stuff written using it, like the IDL astronomy
library) can leave you with an enormous investment in stuff
you can't use, free software development can often be just
as cheap, and you get to keep what you make.
At one point, I was seriously thinking about trying to write
some kind of translator to convert those IDL libs into
python libs (quixotic of me?).
So why rent when you can own?
Scientists certainly do understand all that bit about
"seeing further" because you're "standing on the shoulders
of giants". With proprietary software, the giants keep
getting shot out from under you, which tends to make things
a bit harder to keep up with.
Cheers,
Terry
--
Terry Hancock (han...@AnansiSpaceworks.com)
Anansi Spaceworks http://www.AnansiSpaceworks.com
> In fact, if I had one complaint about Python, it was the
> "with a suitable array of add-ons" caveat. The proprietary
> alternative had all of that rolled into one package (abeit
> it glopped into one massive and arcane namespace), whereas
> there was no "Python Data Language" or whatever that
> would include all that in one named package that everyone
> could recognize (I suppose SciPy is trying to achieve that).
I believe the Enthought distribution of Python (for Windows, with a Mac
version planned) is trying to move exactly in that direction, by
packaging up everything and a half (while of course leaving a reasonable
assignment of namespaces from the pieces it's packaging!-). However,
maintaining such a distro, and making it available for a wider variety
of platforms, are heavy, continuing tasks -- unless either firms, such
as Enthought, or volunteers, commit to such tasks, they won't "just
happen".
Alex
One of the things python addresses best is the division of labor, where
the subtle concepts are available to those who need them and hidden
from those who don't need them. From what I understand of your work
(and what I have seen of the work of two other neuroscientists,
actually) Python would be a good choice for you.
That said, the level of computational skill in many scientists is
alarming. Why do we spend six semesters learning mathematics but
expect to pick up computing "on the side"? It baffles me. Frankly,
saying "I don't need version control" sounds to me no less foolish than
saying "I don't need logarithms". (Perhaps you don't, but someday soon
you will.)
"Speed of execution is an issue, regardless of what computer science
folks try to tell you." strikes me as nothing short of hallucinatory.
No informed person says that speed is never an issue, and a great deal
of effort is spent on speed. Where do you suppose your Fortran
compiler came from in the first place?
For someone without legacy code to worry about, fussing with Fortran
for single-user one-off codes strikes me as a weak choice. If you are
hitting Matlab's performance or memory limits, you should take the time
to learn something about computation, not because you are idle, but
because you are busy. Or if you prefer, because your competitors will
be learning how to be more productive while you put all your efforts
into coping with crude tools.
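That point about learning computation cuts both ways: the vectorization
discipline Matlab rewards transfers directly to Python. A sketch,
assuming NumPy is installed (the particular function is arbitrary):

```python
import math
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Interpreted loop: one Python-level round trip per element.
loop_result = [math.sin(2.0 * math.pi * xi) ** 2 for xi in x]

# Vectorized: a single C-level loop over the whole array.
vec_result = np.sin(2.0 * np.pi * x) ** 2

# Same numbers, very different speed on large arrays.
print(np.allclose(loop_result, vec_result))  # True
```

The skill transfers, but the language underneath generalizes far
beyond what the loop-versus-array tradeoff covers.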
The peculiar lack of communication between computer scientists and
application scientists is real; but I believe the fault is not all on
one side. The fact that you have a PhD does not prove that you know
everything you need to know, and I strongly recommend you reconsider
this attitude. For one thing, you misjudged which side of the divide I
started on.
Michael Tobis
(While I dislike credentialism on usenet, I will reply in kind. I hold
a Ph.D. in geophysical fluid dynamics.)
Unless you are a theorist! In that case, I would order this list backwards.
>
> 1. Time is money. Time is the only thing that a scientist cannot afford
> to lose. Licensing fees for Matlab is not an issue. If we can spend
> $1,000,000 on specialised equipment we can pay whatever Mathworks or
> Lahey charges as well. However, time spent programming are an issue.
> (As are time time spend learning a new language.)
>
Another person stated that they don't have infinite funds, as is implied here. I
would add that, in addition to their own research, professors must also teach and
advise. I find it very helpful to be able to say to a student, "go download this, and
here is the code I wrote for the work I do". The price is often an impediment to
getting students into research. Often there are site licenses, but they don't work
off campus.
> 2. We don't need fancy GUIs. GUI coding is a waste of time we don't
> have. We don't care if Python have fancy GUI frameworks or not.
>
Again, for sharing ideas, GUIs are *necessary*. If you work with people who do less
programming than you, then you need to make an interface to your code that they can
understand. It doesn't have to be fancy, just functional.
> 3. We do need fancy data plotting and graphing. We do need fancy
> plotting and graphing that are easy to use - as in Matlab or S-PLUS.
>
Here, I've found Python to be good, but not great. matplotlib (pylab) is a really
great thing, but it is not as straightforward as plotting in Matlab. Either you have
a window which locks the process until you close it, or you use interactive mode, but
then the window contents disappear if any other window is put on top (like your
shell) and have to be manually redrawn. This makes it far less convenient to deal
with in interactive mode.
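For what it's worth, more recent matplotlib releases address the
lock-up with interactive mode. A minimal sketch, assuming matplotlib is
installed (the Agg backend here is only so the sketch is self-contained
and headless; with a GUI backend the figure window stays responsive):

```python
import matplotlib
matplotlib.use("Agg")   # headless backend so this sketch runs anywhere

import matplotlib.pyplot as plt

plt.ion()               # interactive mode: plotting calls return at once
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
fig.canvas.draw_idle()  # request a redraw without blocking the shell
print(len(ax.lines))    # 1
```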
> 4. Anything that has to do with website development or enterprise class
> production quality control are crap that we don't care about.
>
I think it can be pitched as an alternative to shell-scripts, which is a nice economy
of concepts: the language you use for your scientific work, you can also use for your
OS work, and your tinkering.
> 7. "My simulation is running to slowly" is the number ONE complaint.
> Speed of excecution is an issue, regardless of what computer science
> folks try to tell you. That is why we spend disproportionate amount of
> time learning to vectorize Matlab code.
>
Here, I would plug Pyrex like crazy. The Python/Pyrex combination is the biggest
selling point for me to convert my scientific Matlab code to Python. Learning a new
API is a drag, and I've found that SWIG is not particularly intuitive (although
convenient, if you have a lot of libraries already written). Pyrex seems to get the
best of all possible worlds: seamless use of Python objects, and the ability to do
C loops for speed, with no API. Making extensions this way is a real joy.
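For readers who haven't seen it, a Pyrex module is ordinary Python
syntax plus C type declarations. A hypothetical sketch (the file name
`integrate.pyx` and the function are illustrative, not taken from
Plasticity):

```
# integrate.pyx -- hypothetical Pyrex sketch. The cdef declarations
# and the "for i from ..." form let the inner loop compile to plain C.
cdef double f(double x):
    return x * x

def integrate(double a, double b, int n):
    # Midpoint-rule integral of f over [a, b] with n panels.
    cdef int i
    cdef double dx, s
    dx = (b - a) / n
    s = 0.0
    for i from 0 <= i < n:
        s = s + f(a + (i + 0.5) * dx)
    return s * dx
```

After building with Pyrex's distutils support, `from integrate import
integrate` works like any Python import, but the loop runs at C speed;
only the call to `integrate` itself pays Python overhead.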
> Now please go ahead and tell me how Python can help me become a better
> scientist. And try to steer clear of the computer science buzzwords
> that don't mean anyting to me.
>
I have been using Matlab for nearly 10 years. My claim to no-fame is the neural
network simulator Plasticity (http://web.bryant.edu/~bblais/plasticity) which has
taken me years to write. I have some complaints about Matlab, but it has been a
useful tool. Some of my complaints are as follows:
1) Cost. I find that the marketing model for Matlab is annoying. They
nickel-and-dime you, with the base package (educational pricing) at $500 per
machine/operating system/user and then between $200-$500 *per* "toolbox", which adds
up really quickly. I can't even buy one license for a dual-boot machine, to have
Matlab run on a Linux partition and a Windows partition.
The cost impacts my use of Matlab in the classroom, and with research students.
2) License Manager. The license manager for Matlab is the most inconvenient program
I have ever dealt with. It is incredibly sensitive to the license file, and it is
nearly impossible to debug. This has made Matlab one of the hardest programs for me
to install. The issue that impacts my productivity is the following: the license
key is tied to the network card, on eth0. Thus, if I upgrade my laptop, I need to
contact Mathworks for an updated license key. Also, occasionally my operating
system decides to name my wireless card eth0 and my wired card eth1. Nothing else
is affected by this, but then I can't run Matlab!
3) Upgrade Version Hell. *Every* time Matlab has upgraded, my program has broken.
Usually something small, but still it is a real pain in the butt. Also, I have to
pay full price for the upgrade, or pay some fee continuously whether there is an
upgrade or not.
I have only been using Python for about 2 months, so I can't speak to some issues,
but what does Python offer me?
1) Free (as in free beer). I've elaborated on this above.
2) Free (as in free speech). I like the fact that I am not burdened by having my
projects tied to something proprietary.
3) Distribution ease. With py2exe, I can distribute on Windows systems which have no
python installed. That's a real plus!
4) Clean programming environment. For teaching, it is nice to use a language which
is so readable.
5) A huge number of built-in, or available, packages for nearly everything.
6) The ability to write portions of code in an optimized, as-fast-as-C, manner.
7) Relatively easy GUI frameworks
I'm sure there are other things, but that's the way I am thinking right now.
--
-----------------
> On Fri, 3 Mar 2006 22:05:19 -0500, David Treadwell
> <i.failed.t...@gmail.com> declaimed the following in
> comp.lang.python:
>
>
>> My ability to think of data structures was stunted BECAUSE of
>> Fortran and BASIC. It's very difficult for me to give up my bottom-up
>> programming style, even though I write better, clearer and more
>> useful code when I write top-down.
>>
>>
IIRC, during 1984, my senior year, BYTE magazine had a cover story on
OOP. (OH, how I loved the cover art back in the day!) My impression
after reading the article: WTF is that for? Every class I had which
needed programming, be it CS or chemical engineering, taught it in a
very linear, bottom-up fashion. First you read inputs, then do Foo,
then do Bar, then do (fill in your favorite third-level word).
It was even worse in the chemistry department. No chem major (other
than myself, as I also was a chem eng major) had _any_ computer
courses beyond the required Fortran 101. It was a struggle to get any
chemistry student to do anything that required using LINPACK,
Numerical Recipes code or a plotting program.
This level of absurdity reached its apex during grad skewl when I
wrote a monolithic 30-page program in MS QuickBasic for Mac to
simulate NMR spectra. No one else was ever able to understand how to
use the program.
Even worse, that 30-page program disappeared the day Mac System 7.0
was installed. The byte-compiled basic code became unreadable because
System 7 permanently broke QuickBasic.
The last program I wrote for anyone else's use was written in VBA/
Excel. I hated every minute of it, but the request was made because
everyone has Excel, and nobody wanted to install Python. VBA has at
least 3 API's running simultaneously (Excel, VBA-classic and VBA-
pseudoOOP). Now that I know Py2App, that dragon has been slain.
Which brings me to my last point: Why be beholden to the graces of a
single-source supplier of a language or OS? Proprietary systems will
always become either (a) extinct or (b) so crammed with legacy code
they become unreliable.
I still don't get OOP completely, but Python has helped a great deal.
> FORTRAN was quite amenable to "top-down" design (which was barely
> taught during my senior year: 1980; and I'm a CS major -- you don't
> want
> to see the text used for the "systems analysis" class; I don't recall
> ever seeing a "requirement document" created in the entire text...)
>
> I've done lots of FORTRAN stuff where the main program was
> interchangeable...
>
> program main
> ...
> call Initialize()
> call Process()
> call CleanUp()
> stop
> end
>
Sure. It seems logical now. But remember, I learned WatFiv Fortran,
which came before even Fortran 77. I saw my first Fortran 95 code
about two years ago. It took me a while to realize that what I was
looking at _was_ Fortran!
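For what it's worth, that interchangeable-main skeleton maps almost line for line onto Python. A rough sketch (the function names and the toy "state" dict are mine, purely illustrative):

```python
# The Initialize/Process/CleanUp pattern from the quoted Fortran,
# rendered as plain Python functions passing a shared state around.

def initialize(state):
    """Read inputs and set up working data."""
    state["data"] = [1, 2, 3]

def process(state):
    """Do the actual computation."""
    state["result"] = sum(state["data"])

def clean_up(state):
    """Release working data, keeping only the result."""
    del state["data"]

def main():
    state = {}
    initialize(state)
    process(state)
    clean_up(state)
    return state["result"]

if __name__ == "__main__":
    print(main())
```

The main stays interchangeable: swap in a different initialize/process pair and the driver never changes.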
>> 3. I demand a general-purpose toolkit, not a gold-plated screwdriver.
>>
>
> Would a gold-plated sonic screwdriver work? <G> {Let's see how many
> catch that reference}
>
Google makes this game too easy, but it's to Who you refer. How about
_this_ reference: "What we all need is a left-handed monkey wrench."
<grin>
:--David
You know, it just occurred to me that you sent me on a quest
to get a "left-handed monkey wrench", and like a fool, I
went. Of course, I found not one, but many...
In fact, this expression has so many echoes on the
web, it would be difficult to ascertain where it actually
came from (the derivation is obvious, of course, but who
first coined the expression?).
> F90/F95 is scary, isn't it...
> F77 wasn't that big a change from FORTRAN-IV (F66) (hmmm, we're due
> for another standard, aren't we? 1966, 1977, 1990 [95 was a tweak]...)
Fortran 2003 is the latest standard -- see
http://www.fortran.com/fortran/fcd_announce.html
"The major additions are Object-Oriented Programming and
Interoperability with C [...]. Minor additions include procedure
pointers, finalization of derived-type objects, parameterized derived
types, pointer rank remapping (allows viewing one-dimensional arrays as
higher-dimensional arrays), enumerations, the ASSOCIATE construct
(similar to Pascal's WITH), transferring an allocation (generalization
of the frequently-requested reallocate capability), VOLATILE attribute,
access to the command line and environment variables, standard named
constants for "*" input and output units, access to message text for
input/output and other errors, access to features of the IEEE
floating-point arithmetic standard, longer names and statements,
generalization of expressions for array dimensions and initial values,
user-defined derived-type input/output, asynchronous input/output, and
stream input/output--and this list is not exhaustive."
> On Sat, 4 Mar 2006 14:23:10 -0500, David Treadwell
> <i.failed.t...@gmail.com> declaimed the following in
> comp.lang.python:
>
>> needed programming, be it CS or chemical engineering, taught it in a
>> very linear, bottom-up fashion. First you read inputs, then do Foo,
>> then do Bar, then do (fill in your favorite third-level word).
>>
> Doesn't quite sound like what was "bottom up" in my day... Sounds
> like an common "input, process, output" design.
Well, you're right about my description--that's what the program
ended up looking like. It wasn't the way it was written.
Usually, I'd do it more like this:
Find or derive the formula of interest.
Translate the formula into ForTran (hmmm ... catchy name for a
language!)
Put the minimum number of statements around the formula to get it to
run.
Write the UI. Back in the day, a user interface consisted of
$ENTRY
...
or
9999 DATA ...
or, if we were feeling lucky,
READ (6,*) ... (or whatever)
By the time I finished writing the code, if I was lucky, it looked
like what I described.
Very few subroutines (Maybe some LINPACK or some other code package),
lots of the dreaded GOTO statements.
Comment statements? A waste of punch cards and RAM. *LOL*
Then spend weeks squashing bugs.
Shells, wrappers, IF...THEN...ELSE blocks, code with only one input
and one output were concepts unknown to us.
To put it mildly, I feel very under-served by my formal computer
education. Perhaps it was just a matter of bad timing. Still, as one
post said, they teach us 8 semesters of mathematics and one of
Fortran, expecting us to learn how to program by ourselves. Pick up
any older book on "Numerical Methods in the Foo Sciences". They got
the numbers crunched, but the programming style was wretched. Donald
Knuth, where were you?
> "Bottom up", to me, was more: write the code to read/write the data
> records; then use those routines to write code to modify the
> contents of
> the records; repeat ad infinitum (it seemed) until you reached the
> main
> program -- which sort of came down to a "fiat lux".
>
> "Top down/Functional decomposition" ran the other way... Start with
> the main program. Determine what major activities it has to
> perform, and
> add invocations to stub routines for each of those. Repeat with each
> stub, add invocations to the stubs for the functions it had to
> perform... until you reach the layer that does the "physical work",
> say... Problem with this approach, especially in a poorly thought out
> team, is that you may have widely spaced sections that have to
> manipulate the same file, say -- and end up with multiple routines all
> incompatible to do that.
Been there, done that. Aren't XML and the other structured data
protocols nice?
>
> Often I'd (on solo efforts) get to some point, and then reverse, to
> build the low-level routines as a library... I think I've heard that
> defined as "outside in" design.
Sometimes, I'd add so many layers of crud onto the original meat of
the program, it was just easier to toss the whole thing out. What a
royal waste of time. Far better to plan ahead.
>
> In one aspect, OOP is closer to the "bottom up" approach, since one
> is creating "objects" to represent/manipulate the same things one
> started with in "bottom up"; just better encapsulated -- you don't
> suddenly find some routine 10 layers up that is directly manipulating
> the data (object state) with no direct understanding of where this
> data
> came from, where it is going, etc.
Yes, but the computer doesn't, and never did "understand" the data.
All of this OOP stuff was invented to help the Programmer and Legacy
Code Maintainers understand the processing.
> In my college, it was the business majors that got hit... The
> Statistics II course was 90% number crunching using a statistics
> package
> (I've forgotten the name of the package -- its been 27 years; and as a
> CS major I only had to take Stat I... SPSS comes to mind: Statistical
> Package for the Social Sciences).
And COBOL. Even in 1977, I used to feel sorry for the people who had
to program in COBOL.
>
> Database management was worse: at the time, my textbook went
> "Hierarchical, Network, (theoretical) Relational" (five years later, a
> new edition went "Relational, (historical) Network, (historical)
> Hierarchical". Campus mainframe ran a DBTG Network model DBMS.
As a general rule, the scientists and engineers I know can't
understand the structure of a database that doesn't fit into MS Excel.
>
>>
>> I still don't get OOP completely, but Python has helped a great deal.
>>
> I'm still terrible with the more intangible "objects", and various
> inheritance concerns... I do much better when the "objects"
> directly map
> to tangible things (it is easy to model a radio: components include
> things like "tuner", "audio amplification", "demodulator/mixer") --
> but
> trying to visualize a broadcast received on that radio as an
> "object" is
> not so easy. And this is after having taken a few OOA/OOD courses.
I can't wait to need that kind of stuff. I think I'm baffled now...
>> Sure. It seems logical now. But remember, I learned WatFiv Fortran,
>> which came before even Fortran 77. I saw my first Fortran 95 code
>> about two years ago. It took me a while to realize that what I was
>> looking at _was_ Fortran!
>>
> F90/F95 is scary, isn't it...
>
> F77 wasn't that big a change from FORTRAN-IV (F66) (hmmm, we're due
> for another standard, aren't we? 1966, 1977, 1990 [95 was a tweak]...)
Shouldn't Fortran 2003 be out by now? Maybe it will be so great that
Python 3.x will be written completely in it? FPython, anyone?
>
> The major things F77 added were: a string type, consolidation of I/O
> such that encode()/decode() weren't needed, and blocked IF/ELSE
> statements.
I made the mistake of using the word "Hollerith" in front of a 21
y.o. CS grad. I had to explain the history of the punch card.
>> Google makes this game too easy, but it's to Who you refer. How about
>> _this_ reference: "What we all need is a left-handed monkey wrench."
>
> Since I'm not going to Google it, I'll have to pass.
> {I'll also have to set up a recorder for March 17, SciFi Channel}
The Grateful Dead paying homage to a time-honored Navy tradition. Is
the SciFi Channel having a Who-fest?
:--David
I see we've forgone the standard conventions of politeness and gone straight for
the unfounded assumptions. Fantastic.
>>These include: source and version control and audit trails for runs,
>>build system management, test specification, deployment testing (across
>>multiple platforms), post-processing analysis, run-time and
>>asynchronous visualization, distributed control and ensemble
>>management.
>
> At this point, no scientist will understand what the heck you
> are talking about. All have stopped reading and are busy doing
> experiments in the laboratory instead. Perhaps it sounds good to a CS
> geek, but not to a busy researcher.
>
> Typically a scientist needs to:
>
> 1. do a lot of experiments
>
> 2. analyse the data from experiments
>
> 3. run a simulation now and then
Being a one-time scientist, I can tell you that you're not getting it right. You
have an extremely myopic view of what a scientist does. You seem to be under the
impression that all scientists do exactly what you do. When I was in the
geophysics program at Scripps Institution of Oceanography, almost no one was
doing experiments. The closest were the people who were building and deploying
instruments. In our department, typically a scientist would
1. Write grant proposals.
2. Advise and teach students.
3. Analyze the data from the last research cruise/satellite passover/earthquake.
4. Do some simulations.
5. Write a lot of code to do #3 and #4.
There are whole branches of science where the typical scientist usually spends a
lot of his time in #5. Michael Tobis is in one of those branches, and his
article was directed to his peers. As he clearly stated.
You are not from one of those branches, and you have different needs. That's
fine, but please don't be the pot calling the kettle black.
> Thus, we need something that is "easy to program" and "runs fast
> enough" (and by fast enough we usually mean extremely fast). The tools
> of choice seem to be Fortran for the older professors (you can't teach
> old dogs new tricks) and MATLAB (perhaps combined with plain C) for the
> younger ones (that would e.g. be yours truly). Hiring professional
> programmers is usually futile, as they don't understand the problems
> we are working with. They can't solve problems they don't understand.
I call shenanigans. Believe me, I would love it if it were true. I make my
living writing scientific software. For a company where half of us have science
degrees (myself included) and the other half have CS degrees, it would be great
advertising to say that none of those other companies could ever understand the
problems scientists face. But it's just not true.
Scientists are an important part of the process, certainly. They're called
"customers." Their needs drive the whole process. The depth and breadth of their
knowledge of the field and the particular problem are necessary to write good
scientific software. But it usually doesn't take particularly deep or broad
knowledge to write a specific piece of software. Once the scientist can reduce
the problem to a set of requirements, the CS guys are a perfect fit. That's what
a good professional programmer does: take requirements and produce software that
fulfills those requirements. They do the same thing regardless of the field. In
my company, everyone pulls their weight, even the person with the philosophy degree.
At that point, the CS skillset is perfectly suited to writing good scientific
software. Or at least, any given CS-degree person is no less likely to have the
appropriate skillset than a science-degree person. Frequently, they have a much
broader and deeper skillset that is actually useful to writing scientific
software. Most of the scientists I know couldn't write robust floating
point code to save their lives. Or their careers.
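To make that last point concrete (my sketch, not from the thread): a naive equality test on floats misfires where a tolerance-based comparison behaves.

```python
import math

# Naive equality fails: 0.1 + 0.2 is not exactly 0.3 in binary
# floating point, so this comparison is False.
naive_equal = (0.1 + 0.2) == 0.3

# Robust code compares within a relative tolerance instead.
robust_equal = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)

print(naive_equal, robust_equal)  # False True
```

Code that guards loops or convergence tests with the naive form works on one machine and mysteriously fails on another; the tolerance form doesn't.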
> What you really need to address is something very simple:
>
> Why is Python a better Matlab than Matlab?
>
> The programs we need to write typically fall into one of three
> categories:
>
> 1. simulations
> 2. data analysis
> 3. experiment control and data acquisition
>
> (those are words that scientists do know)
>
> In addition, there are 10 things you should know about scientific
> programming:
>
> 1. Time is money. Time is the only thing that a scientist cannot afford
> to lose. Licensing fees for Matlab are not an issue. If we can spend
> $1,000,000 on specialised equipment we can pay whatever Mathworks or
> Lahey charges as well. However, time spent programming is an issue.
> (As is time spent learning a new language.)
>
> 2. We don't need fancy GUIs. GUI coding is a waste of time we don't
> have. We don't care if Python has fancy GUI frameworks or not.
Uh, time is money? Fighting unusable interfaces, GUI or otherwise, is a waste of
resources. My brother works in biostatistics at the NIH. Every once in a while,
the doctors he works for will ask him to do a particular analysis which requires
him to use a particularly unusable piece of software. Every time, he has to
spend half a day setting up the problem. This is why he's the one who gets to do
it instead of the doctors.
Now, he's considering rewriting the program in Python with a GUI that will
essentially provide a Big Red Go Button (TM) so the doctors can do the analysis
in a fraction of the time it takes now.
> 3. We do need fancy data plotting and graphing. We do need fancy
> plotting and graphing that are easy to use - as in Matlab or S-PLUS.
>
> 4. Anything that has to do with website development or enterprise class
> production quality control are crap that we don't care about.
There are quite a few scientists who are managing gigantic amounts of data, and
run experiments/observations/what-have-you so large that they need
multi-institutional participation. Sharing that data in an efficient manner
*does* require good dynamic websites and enterprise class software backing it up.
There are more kinds of scientist in Heaven and Earth than are dreamt of in your
philosophy.
> 5. Versioning control? For each program there is only one developer and
> a single user or a handful of users.
I used to think like that up until two seconds before I entered this gem:
$ rm `find . -name "*.pyc"`
Okay, I didn't type it exactly like that; I was missing one character. I'll let
you guess which.
This is one thing that a lot of people seem to get wrong: version control is not
a burden on software development. It is a great enabler of software development.
It helps you get your work done faster and easier even if you are a development
team of one. You can delete code (intentionally!) because it's no longer used
in your code, but you won't lose it. You can always look at your history and get
it again. You can make sweeping changes to your code, and if that experiment
fails, you can go back to what was working before. Now you can do this by making
copies of your code, but that's annoying, clumsy, and more effort than it's
worth. Version control makes the process easier and lets you do more interesting
things.
I would go so far as to say that version control enables the application of the
scientific method to software development. When you are in lab, do you say to
yourself, "Nah, I won't write anything in my lab notebook. If the experiment
works at the end of the day, only that result matters"?
> 6. The prototype is the final version. We are not making software for a
> living, we are doing research.
I have lots of research code on my harddrive with decade-long changelogs that
give the lie to that statement. If the code is useful now, it will probably
still be useful in a few years. People will add to it, make suggestions, build
on your work.
This is how science is supposed to work. Practices which encourage this behavior
are good things for science.
> 7. "My simulation is running to slowly" is the number ONE complaint.
> Speed of excecution is an issue, regardless of what computer science
> folks try to tell you. That is why we spend disproportionate amount of
> time learning to vectorize Matlab code.
>
> 8. "My simulation is running of of memory" is the number TWO complaint.
> Matlab is notoriously known for leaking memory and fragmenting the
> heap.
>
> 9. What are algorithms and data structures? Very few of us know how to
> use a data structure more complicated than an array. That is why we like
> Matlab and Fortran so much.
Yes, and this is why you will keep saying, "My simulation is running too
slowly," and "My simulation is running out of memory." All the vectorization you
do won't make a quadratic algorithm run in O(n log(n)) time. Knowing the right
algorithm and the right data structures to use will save you programming time
and execution time. Time is money, remember, and every hour you spend tweaking
Matlab code to get an extra 5% of speed is just so much grant money down the drain.
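A minimal illustration of the algorithmic point (my example, with made-up sizes): swapping the data structure changes the asymptotic cost in a way no amount of tweaking the original loop can.

```python
# Membership tests: scanning a list is O(n) per query; a set lookup
# is O(1) on average. Same answers, very different scaling.
items = list(range(100_000))
queries = [0, 50_000, 99_999, 123_456]

# The "tweak the loop" approach: repeated linear scans, O(len(items)) each.
hits_list = [q for q in queries if q in items]

# The "right data structure" approach: build a set once, then cheap lookups.
item_set = set(items)
hits_set = [q for q in queries if q in item_set]

assert hits_list == hits_set == [0, 50_000, 99_999]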
That said, we have an excellent array object far superior to Matlab's.
> 10. We are novice programmers. We are not passionate programmers. We
> take no pride in our work. The easier the hack, the better. We don't care if
> we are doing OOP or not. However, we do hate complicated APIs or APIs
> that look funny. We are used to seeing sin(x) in our calculus textbooks
> and because of that we don't find Math.Sin(x) particularly elegant --
> even though Math.Sin(x) is more OOP and sin(x) clutters the global
> namespace.
>
> Now please go ahead and tell me how Python can help me become a better
> scientist. And try to steer clear of the computer science buzzwords
> that don't mean anything to me.
1. You will probably spend less time writing and running software.
2. If you play your cards right, more people will be able to use and improve
your software.
--
Robert Kern
rober...@gmail.com
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
All I can say is that I've never seen anything like that behavior. Have you
tried using IPython in pylab mode?
I'd like to ask, being new to python, in which ways is this array object far superior
to Matlab's? (I'm not being sarcastic, I really would like to know!)
I've heard similar things about matplotlib, about how it surpasses Matlab's graphics.
I haven't personally experienced this, but I'd like to know in which ways it is.
bb
> 1. Write grant proposals.
>
> 2. Advise and teach students.
Sorry I forgot the part about writing grant applications. As for
teaching students, I have thankfully not been bothered with that too
much.
> Yes, and this is why you will keep saying, "My simulation is running too
> slowly," and "My simulation is running out of memory." All the vectorization you
> do won't make a quadratic algorithm run in O(n log(n)) time. Knowing the right
> algorithm and the right data structures to use will save you programming time
> and execution time. Time is money, remember, and every hour you spend tweaking
> Matlab code to get an extra 5% of speed is just so much grant money down the drain.
Yes, and that is why I use C (that is ISO C99, not ANSI C89) instead of
Matlab for everything except trivial tasks. The design of Matlab's
language is fundamentally flawed. I once wrote a tutorial on how to
implement things like lists and trees in Matlab (using functional
programming, e.g. using functions to represent list nodes), but it's
just a toy. And as Matlab's run-time does reference counting instead of
proper garbage collection, any data structure more complex than arrays
is sure to leak memory (I believe Python also suffered from this at
some point). Matlab is not useful for anything except plotting data
quickly. And as for the expensive license, I am not sure its worth it.
I have been considering a move to Scilab for some time, but it too
carries the burden of working with a flawed language.
> My ability to think of data structures was stunted BECAUSE of
> Fortran and BASIC. It's very difficult for me to give up my bottom-up
> programming style, even though I write better, clearer and more
> useful code when I write top-down.
That is also the case with Matlab; anything more complex than an array
is beyond reach.
I think it was Paul Graham (the LISP guru) who once claimed he did not miss
recursive functions while working with Fortran 77. The limitations of a
language do stunt your mind. To that extent I am happy I learned to
program with Pascal and not Fortran or Matlab.
S.M.
Ouch. Is that a true story?
While we're reminiscing about bad typos and DEC, I should tell the
story about the guy who clobbered his work because his English wasn't
very strong.
Under RT-11, all file management was handled by a program called PIP.
For example to get a list of files in the current working directory you
would enter PIP *.* /LI . Well this fellow, from one of those countries
where our long "e" sound is their "i" sound, mournfully announced
"I vanted a deerectory so I typed 'PIP *.* /DE' "
That one is true.
mt
Yup. Fortunately, it was a small, purely personal project, so it was no huge
loss. It was enough for me to start using CVS on my small, purely personal
projects, though!
--
Robert Kern
rober...@gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
> just a toy. And as Matlab's run-time does reference counting instead of
> proper garbage collection, any data structure more complex than arrays
> is sure to leak memory (I believe Python also suffered from this at
> some point).
Yes, that was fixed in the move from 1.5.2 to 2.0 (I don't recall if the
intermediate short-lived 1.6 also fixed it, but nobody used that
anyway;-).
> Matlab is not useful for anything except plotting data
> quickly. And as for the expensive license, I am not sure its worth it.
> I have been considering a move to Scilab for some time, but it too
> carries the burden of working with a flawed language.
There was a pyscilab once, still around at
<http://pdilib.sourceforge.net/>, but I don't think it ever matured
beyond a "proof of concept" release 0.1 or something.
Alex
>>Yes, and this is why you will keep saying, "My simulation is running too
>>slowly," and "My simulation is running out of memory." All the vectorization you
>>do won't make a quadratic algorithm run in O(n log(n)) time. Knowing the right
>>algorithm and the right data structures to use will save you programming time
>>and execution time. Time is money, remember, and every hour you spend tweaking
>>Matlab code to get an extra 5% of speed is just so much grant money down the drain.
>
> Yes, and that is why I use C (that is ISO C99, not ANSI C98) instead of
> Matlab for everything except trivial tasks. The design of Matlab's
> language is fundamentally flawed. I once wrote a tutorial on how to
> implement things like lists and trees in Matlab (using functional
> programming, e.g. using functions to represent list nodes), but it's
> just a toy. And as Matlab's run-time does reference counting instead of
> proper garbage collection, any data structure more complex than arrays
> is sure to leak memory (I believe Python also suffered from this at
> some point).
Python still uses reference counting and has several very good data structures
more complex than arrays. And yet, most programs don't leak memory.
> Matlab is not useful for anything except plotting data
> quickly. And as for the expensive license, I am not sure its worth it.
> I have been considering a move to Scilab for some time, but it too
> carries the burden of working with a flawed language.
And you need to ask why Python is a better Matlab than Matlab?
Matlab takes the view that "everything is a rank-2 matrix of floating point values."
Our arrays have been N-dimensional since day one. They really are arrays, not
matrices. You have complete control over the types.
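For instance (assuming a current NumPy install; the package naming was still settling down when this was written):

```python
import numpy as np

# N-dimensional from day one: a 3-D array, not a rank-2 matrix.
a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
assert a.ndim == 3 and a.shape == (2, 3, 4)

# '*' is elementwise (array semantics), unlike Matlab's matrix '*'.
b = np.array([1.0, 2.0, 3.0])
assert ((b * b) == np.array([1.0, 4.0, 9.0])).all()

# Complete control over the element type.
c = np.zeros(5, dtype=np.int16)
assert c.dtype == np.int16
```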
> And you need to ask why Python is a better Matlab than Matlab?
First there are a few things I don't like:
1. Indentation as a part of the syntax, really annoying.
2. The "self.something" syntax is really tedious (look to ruby)!
4. Multithreading and parallel execution are impossible AFAIK because of
the so-called GIL (global interpreter lock). Matlab is perhaps even
worse in this respect.
5. I don't like numpy's array slicing. Array operations should be a
part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.
And there are a couple of questions I need answered:
1. Can python do "pass by reference"? Are datastructures represented by
references as in Java (I don't know yet).
2. How good is matplotlib/pylab? I tried to install it but only get
error messages so I haven't tested it. But plotting capabilities are a
really major issue.
3. Speed. I haven't seen any performance benchmarks that actually deal
with things that are important for scientific programs.
4. Are there "easy to use" libraries containing other stuff important
for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
Fourier transforms, etc. E.g. in Matlab I can just type,
[u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or
vendor-optimized BLAS
Even though the language itself is very limited this type of library
functionality more than makes up for it.
I have looked for alternatives to Matlab for quite a while, mainly due
to the cost, the poor speed and poor memory management. I am not sure
it is Python but so far I have not found anything more promising either.
> First there are a few things I don't like:
>
> 1. Indentation as a part of the syntax, really annoying.
Each to his own. I find having the compiler enforce indentation rules is a
real benefit. There's nothing quite so annoying as having 'coding
standards' you are supposed to be following which aren't enforced (so
nobody else working on the code followed them either). C code which never
executes a for loop because of a spurious ';' can be pretty annoying too.
>
> 2. The "self.something" syntax is really tedious (look to ruby)!
I find it really useful to use a similar style when programming in other
languages such as Java or C#. In particular it means that you can
instantly see when you are referring to a member variable (without
having to prefix them all with m_ or some other abomination) and when you
have method arguments which get assigned to member variables you don't have
to think of different names for each.
>
> 4. Multithreading and parallel execution are impossible AFAIK because of
> the so-called GIL (global interpreter lock). Matlab is perhaps even
> worse in this respect.
Multithreading and parallel execution work fine. The only problem is that
you don't get the full benefit when you have multiple processors. This one
will become more of an annoyance in the future though as more systems have
hyperthreading and multi-core processors.
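A small sketch of that claim (my example): threads run and interleave perfectly well under the GIL; what you lose is only simultaneous execution of pure-Python bytecode on multiple cores.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    """Increment the shared counter n times under a lock."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

# Four threads, all mutating the same shared state concurrently.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 4000  # all four threads ran to completion
```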
>
> And there are a couple of questions I need answered:
>
> 1. Can python do "pass by reference"? Are datastructures represented by
> references as in Java (I don't know yet).
>
Python only does "pass by reference", although it is more normally referred
to as "pass by object reference" to distinguish it from language where the
references refer to variables rather than objects.
What it doesn't do is let you rebind a variable in the caller's scope which
is what many people expect as a consequence of pass by reference. If you
pass an object to a function (and in Python *every* value is an object)
then when you mutate the object the changes are visible to everything else
using the same object. Of course, some objects aren't mutable so it isn't
that easy to tell that they are always passed by reference.
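The distinction fits in a few lines (my example):

```python
def mutate(seq):
    seq.append(4)   # mutates the caller's object through the shared reference

def rebind(seq):
    seq = [0]       # rebinds only the local name; the caller never sees this

data = [1, 2, 3]
mutate(data)
assert data == [1, 2, 3, 4]   # mutation is visible to the caller

rebind(data)
assert data == [1, 2, 3, 4]   # rebinding in the callee is not
```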
> > 1. Can python do "pass by reference"? Are datastructures represented by
> > references as in Java (I don't know yet).
> >
> Everything in Python is a reference to an object. I think the
> question you want is more on the lines of: Can I change an object that
> has been passed? Short answer: depends on what type of object is at the
> end of the reference. A mutable container object (list, dictionary,
> maybe a class instance) can have its contents changed.
Thank you.
Hi, I will respond to things that others haven't responded to yet
> 2. How good is matplotlib/pylab? I tried to install it but only get
> error messages so I haven't tested it. But plotting capabilities are a
> really major issue.
I don't know because I haven't managed to get it working either. But
other people have and I guess it should not be so difficult.
I personally use gnuplot-py (gnuplot-py.sourceforge.net), which I adapted
to the new NumPy/SciPy (see below) by searching-and-replacing "Numeric"
with "numpy". It lets you use raw gnuplot commands.
> 4. Are there "easy to use" libraries containing other stuff important
> for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
> Fourier transforms, etc. E.g. in Matlab I can just type,
>
> [u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or
> vendor-optimized BLAS
Yes !
There is the excellent SciPy package, which you can get at www.scipy.org
Personally I use it a lot for linear algebra (linked to
LAPACK/ATLAS/BLAS), but there are also libraries for statistics,
optimization, signal processing, etc.
There have been many changes recently, including package names, so don't
get confused, and be sure to get recent versions of NumPy and SciPy ;).
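For comparison with the Matlab line quoted above, the NumPy equivalent looks like this (note that `numpy.linalg.svd` returns the singular values as a 1-D array and V already transposed, unlike Matlab):

```python
import numpy as np

x = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Like Matlab's svd, this calls into LAPACK; s is a 1-D array of
# singular values (largest first) and vt is V transposed.
u, s, vt = np.linalg.svd(x)
assert np.allclose(s, [2.0, 1.0])

# Check the factorization: u @ diag(s) @ vt reconstructs x.
assert np.allclose(u @ np.diag(s) @ vt, x)
```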
Evan
Python uses reference counting *AND* cyclic-garbage collection for the
kind of garbage that wouldn't go away by RC due to reference-cycles
(plus, also, weak references for yet another helper). To leak memory
despite all of that, you really need to do it on purpose (e.g. via a
C-coded container extension-type that does NOT play nice with gc;-).
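A quick sketch of the cyclic collector doing exactly that (my example):

```python
import gc

class Node:
    """A trivial object that can hold a reference to another Node."""

a = Node()
b = Node()
a.other = b   # a -> b
b.other = a   # b -> a: a cycle that refcounting alone can't reclaim

del a, b                  # refcounts stay nonzero because of the cycle...
collected = gc.collect()  # ...but the cycle detector finds and frees it
assert collected >= 2     # at least the two Node instances were reclaimed
```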
Alex
> 2. The "self.something" syntax is really tedious (look to ruby)!
>
This is done because of a preference for explicit references over
implied ones. It does avoid a lot of namespace confusion.
By the way, anyone who can't count shouldn't be criticising programming
languages. What happened to "3"?
> 4. Multithreading and parallel execution is impossible AFAIK because of
> the so-called GIL (global interpreter lock). Matlab is perhaps even
> worse in this respect.
>
Right. So kindly tell us how to write thread-safe code without using a
GIL. This is not an easy problem, and you shouldn't assume that all you
have to do to get rid of the GIL is to wave your magic wand. There are
deep reasons why the GIL is there.
> 5. I don't like numpy's array slicing. Array operations should be a
> part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.
>
Slicing *is* a part of the language, inserted into the grammar (as far
as I know) precisely to support the numeric/scientific community.
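For instance, NumPy extends that slice syntax to multiple dimensions, and slices are views into the original data rather than copies. A quick sketch (assuming NumPy is installed; the array values are arbitrary):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 array holding 0..11

row = a[1, :]        # second row: [4, 5, 6, 7]
col = a[:, 2]        # third column: [2, 6, 10]
sub = a[::2, 1:3]    # every other row, columns 1 and 2

# Slices are views: modifying the array shows through them.
a[0, :] = 0
```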
>
> And there is a couple of questions I need answered:
>
> 1. Can python do "pass by reference"? Are datastructures represented by
> references as in Java (I don't know yet).
>
All assignments store references.
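A minimal sketch of what that implies for function arguments (the function names here are invented for illustration):

```python
def extend(lst):
    # The parameter is another reference to the caller's list,
    # so in-place changes are visible outside the function.
    lst.append(4)

def rebind(lst):
    # Rebinding the local name has no effect on the caller.
    lst = [99]

data = [1, 2, 3]
extend(data)   # data is now [1, 2, 3, 4]
rebind(data)   # data is unchanged
```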
> 2. How good is matplotlib/pylab? I tried to install it but only get
> error messages so I haven't tested it. But plotting capabilities are
> a really major issue.
>
Good enough to keep you away, apparently ;-) (Sorry, I don't use these
features).
> 3. Speed. I haven't seen any performance benchmarks that actually deal
> with things that are important for scientific programs.
>
The major fact here is that no matter how fast a language is there is
always a need for more speed in certain areas.
Suffice it to say that Python is being used for a wide range of
scientific and engineering problems to the evident satisfaction of its
users.
> 4. Are there "easy to use" libraries containing other stuff important
> for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
> Fourier transforms, etc. E.g. in Matlab I can just type,
>
> [u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or
> vendor-optimized BLAS
>
> Even though the language itself is very limited this type of library
> functionality more than makes up for it.
>
The more people who join in and write libraries to add to the growing
corpus of scientific and engineering libraries the sooner the answer to
this question will be "we have everything you want".
For the moment, however, since apparently Google isn't available where
you are, a quick search for "Python LAPACK" gave
http://mdp-toolkit.sourceforge.net/faq.html
as its first hit. This appears to include information about how to have
LAPACK make use of ATLAS' faster LAPACK routines. Satisfied?
>
> I have looked for alternatives to Matlab for quite a while, mainly due
> to the cost, the poor speed and poor memory management. I am not sure
> it is Python but so far I have not found anything more promising either.
>
You know, recently the Python community has acquired a reputation in
certain quarters for defensive support of the status quo. With
ill-informed criticism like this from self-confessed beginners it's not
hard to see how this myth has arisen.
I'd be very surprised if Python doesn't already give you 95% of what you
appear to want. If your prejudices about indented code and self-relative
references blind you to the clear advantages of the Python environment
then frankly you are a lost cause, and good riddance.
If, on the other hand, you are prepared to engage the community and do a
little bit of learning rather than just trolling, you may find that one
of the most valuable features of Python is its supportive user base,
whom at the moment you seem to be doing your best to offend.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd www.holdenweb.com
Love me, love my blog holdenweb.blogspot.com
1) indentation
I claim that this is a trivial matter. Indentation is enforced, but if
you want to end all your blocks with #end, feel free.
Or write a preprocessor to go from your preferred block style to
python's
2) self.something tedious to look at.
Again, you can somewhat work around this if you like. Just use "s" or
"_" instead of "self".
3) missing
4) multithreading and parallel execution impossible
This is simply false, though admittedly the MPI libraries need a little
work. Mike Steder of our group is likely to release an alternative. A
good way to think of Python is as a powerful extension to C. So using
MPI from Python just amounts to setting up a communicator in C and
wrapping the MPI calls.
As for less tightly coupled threads on a single processor, Python is
adept at it. I think the issues with multiple processors are much more
those of a server farm than those of a computational cluster.
We have been encountering no fundamental difficulty in cluster programs
using Python.
5) "I don't like numpy's array slicing" ?
This is unclear. It is somewhat different from Matlab's, but much more
powerful.
1) pass by reference
Python names are all references. The model may seem a little peculiar to
Fortran and C people, but it is rather similar to the Java model.
2) matplotlib
A correct install can be difficult, but once it works it rocks. ipython
(a.k.a. pylab) is also a very nice work environment.
3D plots remain unavailable at present.
3) speed
Speed matters less in Python than in other languages because Python
plays so well with others. For many applications, NumPy is fine.
Otherwise write your own C or C++ or F77; building the Python bindings
is trivial. (F9* is problematic, though in fact we do some calling of
F90 from Python using the Babel toolkit)
4) useful libraries
Yes. For your svd example, see Hinsen's Scientific package. In general,
Python's claim of "batteries included" applies to scientific code.
mt
Don't forget that there are portions of Smalltalk syntax
(blocks) added in as well. I guess it could be seen as Perl-NG.
Both the name 'Ruby' and the Ruby syntax seem to suggest that
Matz had the idea to "flirt" a bit with the Perl programmers,
and considering how Perl seems to be in decline today, that
might have been clever from a user-base point of view. Whether
it was really good for the language is another issue. I still
think it's a bit prettier than Perl though.
> For CPU-bound number-crunching, perhaps... For I/O-bound jobs, the
> GIL is(should be) released when ever a thread is blocked waiting for I/O
> to complete.
I think CPU-bound number-crunching was the big deal in this case.
Somehow, I doubt that the OP uses Matlab for I/O-bound jobs. At
least if writing threaded applications becomes less error prone
in competing languages, this might well be the weak point of Python
in the future. I hope to see some clever solution to this from the
Python developers.
It seems the Python attitude to performance has largely been:
let Python take care of development speed, and let Moore's law
and the hardware manufacturers take care of execution speed. As
it seems now, increases in processing speed in the coming years
will largely come through parallel threads. If Python can't utilize
that well, we have a real problem.
>>5. I don't like numpy's array slicing. Array operations should be a
>>part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.
Python is not primarily a mathematics language. It's not a text
processing language either, so no regexp support directly in the
syntax. That might make it less ideal as a Matlab substitute, or
as a sed or awk substitute, but on the other hand, it's useful for
so many other things...
> Everything in Python is a reference to an object. I think the
> question you want is more on the lines of: Can I change an object that
> has been passed?
The key lies in understanding that "a = b" means "bind the name a (local,
unless declared global) to the object that the name b refers to".
It never means "copy the contents of b into the location of a".
Actually, the Perl part was one of the last steps
in the Ruby recipe according to Matz:
Ruby is a language designed in the following steps:
* take a simple lisp language (like one prior to CL).
* remove macros, s-expression.
* add simple object system (much simpler than CLOS).
* add blocks, inspired by higher order functions.
* add methods found in Smalltalk.
* add functionality found in Perl (in OO way).
So, Ruby was a Lisp originally, in theory.
Let's call it MatzLisp from now on. ;-)
--from http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/179642
Regards,
--
Bil
http://fun3d.larc.nasa.gov
I don't disagree with this, but it's largely irrelevant to CPU-bound
number-crunching using numpy and its brethren. In that case the bulk of
the work is going on in the array extensions, in C, and thus the GIL
should be released. Whether it actually is released, I can't say --
never having been blessed/cursed with a multiprocessing box, I haven't
looked into it.
[SNIP]
-tim
> 3) speed
>
> Speed matters less in Python than in other languages because Python
> plays so well with others. For many applications, NumPy is fine.
> Otherwise write your own C or C++ or F77; building the Python bindings
> is trivial. (F9* is problematic, though in fact we do some calling of
> F90 from Python using the Babel toolkit)
I have met no problems using F90 together with f2py -- in fact usually it
can bind f90 code to python completely automatically with no need to write
extra glue code.
The only problem was setting up Intel Fortran compiler and making it play
along with f2py, but I suppose the compiler business will become easier
when gfortran matures.
--
Pauli Virtanen
Thank you for the correction. I should have qualified my statement.
Our group must cope with F90 derived types to wrap a library that we
need. f2py fails to handle this case. While the f2py site alleges that
someone is working on this, I contacted the fellow and he said it was
on hold.
To our knowledge only the (officially unreleased) Babel toolkit can
handle F9* derived types. I would be pleased to know of alternatives.
To be fair, Babel remains in development, and so perhaps it will become
less unwieldy, and the developers have been very helpful to us. Still,
it certainly is not as simple a matter as f2py or swig.
mt
I think this is pretty fair, and yet .... the core Python interpreter has
perhaps doubled in speed (hardware held constant) since some years ago.
And new builtins like enumerate speed up code that needs to enumerate
sequence items (which is not uncommon). And the class sets.Set has been
rewritten in C as the builtin type set, primarily for speed.
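For instance, both of those builtins tighten common loops (nothing here beyond the stock library):

```python
# enumerate pairs each item with its index, replacing the older
# "for i in range(len(words))" idiom.
words = ["spam", "eggs", "ham"]
indexed = [(i, w) for i, w in enumerate(words)]

# The builtin set type (C-implemented) makes membership tests
# and de-duplication both fast and readable.
seen = set(words)
```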
It is certainly true that Guido regards continued correct performance of
the interpreter to be more important than greater speed. Ditto for Python
programs.
> it seems now, increases in processing speed in the coming years
> will largely come through parallel threads. If Python can't utilize
> that well, we have a real problem.
I believe it is Guido's current view, perhaps Google's collective view, and
a general *nix view that such increases can just as well come through parallel
processes. I believe one can run separate Python processes on separate
cores just as well as one can run separate processes on separate chips or
separate machines. Your view has also been presented and discussed on the
pydev list. (But I am not one for thread versus process debate.)
> At
> least if writing threaded applications becomes less error prone
> in competing languages, this might well be the weak point of Python
> in the future.
Queue.Queue was added to help people write correct threaded programs.
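A minimal producer/consumer sketch of the idiom (the module is spelled Queue in the Python of this thread and queue in later versions; the squaring worker is just a stand-in for real work):

```python
import threading
import queue  # spelled Queue in the Python 2 line

q = queue.Queue()
results = []

def worker():
    # The queue handles all locking internally, so the worker
    # simply blocks until an item is available.
    while True:
        item = q.get()
        if item is None:      # sentinel value: stop the worker
            break
        results.append(item * item)
        q.task_done()

t = threading.Thread(target=worker)
t.start()
for n in range(5):
    q.put(n)
q.put(None)
t.join()
```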
> I hope to see some clever solution to this from the Python developers.
A Python developer is one who helps develop Python. Threading improvements
will have to come from those who want it enough to contribute to the
effort. (There have been some already.)
Terry Jan Reedy
This is hard to understand for an outsider. If you pass an int, a float,
a string or any other "atomic" object to a function, you have "pass by
value" semantics. If you pass a compound object like a list or a dictionary,
or any other object that acts as an editable data container, you can return
modified *contents* (list elements etc.) to the caller, exactly as in
Java and differently from C/C++.
Peter Maas, Aachen
You are over-reacting. Keep in mind that sturlamolden has criticized
Python and not you :) I think there is a more convincing reply to
indentation phobia:
It is natural that compiler and programmer agree on how to identify
block structures. Anybody who disagrees should bang his code against
the left side or put everything in one line to get rid of annoying
line breaks. :)
Peter Maas, Aachen
A quick addition to Robert's very reasonable response to you. My point
is that to *trust* simulation *results* (no matter how fast/slow/etc.
you obtained them) you have to explore and manage the "physics" or
"biology" of your code. That's where Python's readability, flexibility,
and dynamism (including on-the-fly model building/testing/correction) as
well as its model introspection and exploration capabilities are of
critical importance, and sometimes point to a missing link. It does not
hurt to remember that the original idea (by S. Ulam) of a computer was
the idea of an *experimentation environment* (including sampling). It
does not look like Matlab's strongest point is feedback-driven
experimentation. Or am I missing something about ISO C99?
Val Bykoski
> sturlamolden wrote:
>
>> 5. Versioning control? For each program there is only one developer and
>> a single or a handful users.
>
> I used to think like that up until two seconds before I entered this gem:
>
> $ rm `find . -name "*.pyc"`
>
> Okay, I didn't type it exactly like that; I was missing one character. I'll let
> you guess which.
I did that once. I ended up having to update decompyle to run with
Python 2.4 :-) Lost comments and stuff, but the code came out great.
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
On Windows I have found that creating such bindings is very, very
difficult... I have succeeded only partially, with C++, and I've had to
compile Python with MinGW (the compilation succeeded at about 95%). It's
not easy and surely not trivial (for me).
Bye,
bearophile
My own opinion on this: I think indentation is probably the one
biggest drawback that prevents wider Python acceptance. Indentation
makes all kinds of inlined code extremely clumsy or practically impossible
in Python. OK, I'll stop here, time to be called a troll myself :(
Andy.
That's OK for me. I usually have lots of different things
happening on the same computer, but for someone who writes
an application and wants to make his particular program faster,
there is not a lot of support for building simple multi-process
systems in Python. While multi-threading is nasty, it makes
it possible to perform tasks in the same program in parallel.
I could well imagine something similar to Twisted, where the
different tasks handled by the event loop were dispatched to
parallel execution on different cores/CPUs by some clever
mechanism under the hood.
If the typical CPU in X years from now is a 5GHz processor with
16 cores, we probably want a single Python "program" to be able
to use more than one core for CPU intensive tasks.
>>At
>>least if writing threaded applications becomes less error prone
>>in competing languages, this might well be the weak point of Python
>>in the future.
>
> Queue.Queue was added to help people write correct threaded programs.
What I'm trying to say is that multi-threaded programming
(in all languages) is difficult. If *other* languages manage
to make it less difficult than today, they will achieve a
convenient performance boost that Python can't compete with
when the GIL prevents parallel execution of Python code.
- For many years, CPU performance has increased year after
year through higher and higher processor clock speeds. The
number of instructions through a single processing pipeline
per second has increased. This simple kind of speed increase
is flattening out. The first 1GHz CPUs from Intel appeared
about five years ago. At that time, CPU speed still doubled
every 2 years. With that pace, we would have had 6GHz CPUs
now. We don't! Perhaps someone will make some new invention
and the race will be on again...but it's not the case right
now.
- The hardware trend right now, is to make CPUs allow more
parallel execution. Today, double core CPU's are becoming
common, and in a few years there will be many more cores
on each CPU.
- While most computers have a number of processes running in
parallel, there is often one process / application that is
performance critical on typical computers.
- To utilize the performance in these multi-core processors,
we need to use multi-threading, multiple processes or some
other technology that executes code in parallel.
- I think languages and application design patterns will evolve
to better support this parallel execution. I guess it's only
a few languages such as Erlang that support it well today.
- If Python isn't to get far behind the competition, it has
to manage this shift in how to utilize processor power. I
don't know if this will happen through core language changes,
or via some convenient modules that make fork/spawn/whatever
and IPC much easier, or if there is something
entirely different waiting around the corner. Something is
needed I think.
This is the only sensible argument against the indentation thing I've
heard. Python squirms about being inlined in a presentation template.
Making a direct competitor to PHP in pure Python is problematic.
While there are workarounds which are not hard to envision, it seems
like the right answer is not to inline small fragments of Python code
in HTML, which is probably the wrong approach for any serious work
anyway. This problem does, however, seem to interfere with adoption by
beginning web programmers, who may conceivably end up in PHP or perhaps
Perl Mason out of an ill-considered expedience.
Why this should matter in this discussion, about scientific
programming, escapes me though.
When you say "all kinds" of inlined code, do you have any other
examples besides HTML?
mt
At one time I also tried to make a simple "configuration file"
engine based on Python for a big Framework used in a physics lab.
The idea was to have a Python extension for that C++ framework and
to configure the Framework from Python code, like:
# Module means C++ Framework module, not Python
Module1.param1 = "a string"
Module2.paramX = [ 1, 2, 3 ]
# etc., with all Python niceties.
People who were using this Framework were all hard-core physicists,
some of them knew Fortran, many were exposed to C++. There were
few other "languages", some of them home-grown, used for different
tasks, but none of those languages ever placed so much
significance on whitespace. There were some big surprises for
people when they discovered they couldn't arbitrarily indent pieces of
the above configuration files because it is all Python code. Add
the spaces/tabs controversy if that is not enough to confuse
poor physicist fellows :) I think that config file project was killed
later in favor of a less restrictive format (I left the lab before that,
so I can't say for sure.)
Andy.
You mention makefiles and shell scripts as contexts unsympathetic to
Python's indentation requirements, but frankly you don't see much code
in any language except shell inlined in these contexts.
Given the makefile's requirement that significant leading whitespace be
tabs and not spaces, you have a recipe for disaster inlining any
language.
I suspect you should either have sufficiently trained your users in
Python, or have limited them to one-line statements which you could
then strip of leading whitespace before passing them to Python, or even
offered the alternative of one or the other. This would not have been
much extra work.
As for shell scripts generating Python code, I am not sure what you
were trying to do, but if you're going that far why not just replace
the shell script with a python script altogether?
os.system() is your friend.
I also agree with Steve that I can't see what this has to do with
makefiles. (But then I think "make" is a thoroughly bad idea in the
first place, and think os.system() is my friend.)
mt
> You mention makefiles and shell scripts as contexts unsympathetic to
> Python's indentation requirements, but frankly you don't see much code in
> any language except shell inlined in these contexts.
>
Shell's strength is in the process spawning/management and input/output
redirection, Python is rather weak in that area but OTOH Python is
strong in processing highly structured and numeric data, where shells
are really weak. I saw lots of awk or sed "code" embedded in scripts,
so your claim that nothing except shell is being inlined does not look
right to me.
> Given the makefile's requirement that significant leading whitespace be
> tabs and not spaces and you have a recipe for disaster inlining any
> language.
>
I saw makefiles with thousands of lines of Perl code in them. I agree this
(Perl) is a disaster, but it would probably be better if it were Python code
instead.
Andy.
Actually os.system() is a rather poor replacement for the shell's
capabilities, and it's _very_ low level; it's really C-level code
wrapped in Python syntax. Anyway, to do something useful you need
to use all the popen() stuff, and this is indeed infinitely complex
compared to the easy shell syntax.
Andy.
In my experience, embedding any of make/sh/awk/sed in
any of the others is a nightmare of singlequote/
doublequote/backslash juggling that makes a few
tab/space problems in Python pale by comparison.
--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
> Actually os.system() is a rather poor replacement for the shell's
> capabilities, and it's _very_ low level; it's really C-level code
> wrapped in Python syntax.
Since os.system() spawns a shell to execute the command,
it's theoretically capable of anything that the shell
can do. It's somewhat inelegant having to concatenate
all the arguments into a string, though.
I gather there's a new subprocess management module
coming that's designed to clean up the mess surrounding
all the popen() variants. Hopefully it will make this
sort of thing a lot easier.
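The module being referred to is presumably subprocess (PEP 324). A minimal sketch of how it looks in its eventual form, using sys.executable so the example does not depend on any particular shell command being present:

```python
import subprocess
import sys

# Arguments go in a list, so no shell quoting gymnastics are
# needed and no intermediate shell is spawned.
proc = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True, text=True,
)
out = proc.stdout.strip()
```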
> This is hard to understand for an outsider. If you pass an int, a float,
> a string or any other "atomic" object to a function, you have "pass by
> value" semantics. If you pass a compound object like a list or a dictionary,
> or any other object that acts as an editable data container, you can return
> modified *contents* (list elements etc.) to the caller, exactly as in
> Java and differently from C/C++.
There's really no difference here -- when you pass an
int, you're passing a pointer to an int object, just
the same as when you pass a list, you're passing a
pointer to a list object. It's just that Python
doesn't provide any operations for changing the
contents of an int object, so it's hard to see
the difference.
The similarity is brought out by the following
example:
>>> def a(x):
... x = 42
...
>>> def b(x):
... x = [42]
...
>>> y = 3
>>> a(y)
>>> print y
3
>>> y = [3]
>>> b(y)
>>> print y
[3]
What this shows is that assignment to the parameter
*name* never affects anything outside the function,
regardless of whether the object passed in is mutable
or immutable.
It's best to avoid using terms like "by reference" when
talking about Python parameter passing, because it's
hard to tell whether the person you're talking to
understands the same thing by them. But if you
insist, the correct description in Algol terms is
that Python passes pointers to objects by value.
I find that version control (VC) has many advantages for
scientific research (I am a physicist).
1) For software, as Robert mentions, I find it indispensable.
2) Keeping track of changes to papers (as long as they are plain text
like LaTeX). This is especially useful for collaborations: using
the diff tools one can immediately see any changes a coauthor may
have made.
(I even use branching: maintaining one branch for the journal
submission which typically has space restrictions, and another for
preprint archives which may contain more information.)
3) Using VC allows you to easily bring another computer up to date
with your current work. If I go to a long workshop and use local
computing resources, I simply checkout my current projects and I
can work locally. When I am done, I check everything back in and
when I get home, I can sync my local files.
-------
Another aspect of python I really appreciate are the unit testing
facilities. The doctest, unittest, and test modules make it easy to
include thorough tests: crucial for making sure that you can trust the
results of your programs. Organizing these tests in MATLAB and with
other languages was such a pain that I would often be tempted to omit
the unit tests and just run a few simulations, finding errors on the
fly.
Now I almost always write unit tests along with---or sometimes
before---I write the code.
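As a tiny example of the doctest style (the midpoint function is invented for illustration):

```python
import doctest

def midpoint(a, b):
    """Return the midpoint of a and b.

    >>> midpoint(0.0, 10.0)
    5.0
    >>> midpoint(-1.0, 1.0)
    0.0
    """
    return (a + b) / 2.0

# Run just this function's docstring examples; the number of
# examples tried and any failures are recorded on the runner.
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(midpoint):
    runner.run(test)
```

The docstring doubles as documentation and as an executable test, which is much of the appeal.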
Michael.
Greg Wilson also makes that point in this note:
http://www.nature.com/naturejobs/2005/050728/full/nj7050-600b.html
where he describes his excellent (Python Software Foundation sponsored)
course on software carpentry for scientists:
http://www.third-bit.com/swc2/index.html
Regards, Phil