[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster

Rohit Garg

unread,

Nov 9, 2009, 1:11:29 PM11/9/09

to SciPy Users List, numpy-di...@scipy.org

Hi all,

I have an embarrassingly parallel problem, very nicely suited to
parallelization. I am looking for community feedback on how to best
approach this matter? Basically, I just setup a bunch of tasks, and
the various cpu's will pull data, process it, and send it back. Out of
order arrival of results is no problem. The processing times involved
are so large that the communication is effectively free, and hence I
don't care how fast/slow the communication is. I thought I'll ask in
case somebody has done this stuff before to avoid reinventing the
wheel. Any other suggestions are welcome too.

My only constraint is that it should be able to run a python extension
(c++) with minimum of fuss. I want to minimize the headaches involved
with setting up/writing the boilerplate code. Which
framework/approach/library would you recommend?

There is one method mentioned at [1], and of course, one could resort
to something like mpi4py.

[1] http://docs.python.org/library/multiprocessing.html {see the last example}

--
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Gael Varoquaux

unread,

Nov 9, 2009, 1:17:13 PM11/9/09

to SciPy Users List, numpy-di...@scipy.org

On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote:
> Hi all,

> I have an embarrassingly parallel problem, very nicely suited to
> parallelization.

A non-optimal solution that I like:
http://gael-varoquaux.info/blog/?p=119

Gaël

Anne Archibald

unread,

Nov 9, 2009, 1:18:46 PM11/9/09

to SciPy Users List

2009/11/9 Rohit Garg <rpg...@gmail.com>:

> Hi all,
>
> I have an embarrassingly parallel problem, very nicely suited to
> parallelization. I am looking for community feedback on how to best
> approach this matter? Basically, I just setup a bunch of tasks, and
> the various cpu's will pull data, process it, and send it back. Out of
> order arrival of results is no problem. The processing times involved
> are so large that the communication is effectively free, and hence I
> don't care how fast/slow the communication is. I thought I'll ask in
> case somebody has done this stuff before to avoid reinventing the
> wheel. Any other suggestions are welcome too.
>
> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?

For our pulsar searches, we pick about the simplest possible method.
Each job is set up so that you run it from a UNIX shell in a directory
containing all the needed files, and it saves any output to a common
directory. We then submit jobs to the PBS batch system. We have some
minor complications to this setup because copying the input data is
quite network-intensive, so we make sure only one job starts at a
time, but other than that the jobs have no interaction at all.

Anne

Pauli Virtanen

unread,

Nov 9, 2009, 1:28:39 PM11/9/09

to scipy...@scipy.org

Mon, 09 Nov 2009 23:41:29 +0530, Rohit Garg wrote:
[clip: embarassingly parallel problems]

With multiprocessing, using Pool.imap_unordered to apply a computation
function to a list of parameter sets is one good alternative. (IIRC, it
balances load between subprocesses &c automatically.) Multiprocessing can
however work on only one node at a time.

With mpi4py, it's probably best to write a simple master-slave
architecture.

--
Pauli Virtanen

Rohit Garg

unread,

Nov 9, 2009, 1:28:55 PM11/9/09

to SciPy Users List

On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux
<gael.va...@normalesup.org> wrote:
> On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote:
>> Hi all,
>
>> I have an embarrassingly parallel problem, very nicely suited to
>> parallelization.
>
> A non-optimal solution that I like:
> http://gael-varoquaux.info/blog/?p=119

Thanks, for the pointer, but after a quick read, it doesn't look like
it supports distributed memory parallelism. Or does it?

>
> Gaël
> _______________________________________________
> SciPy-User mailing list
> SciPy...@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

--
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay

Gael Varoquaux

unread,

Nov 9, 2009, 1:35:15 PM11/9/09

to SciPy Users List

On Mon, Nov 09, 2009 at 11:58:55PM +0530, Rohit Garg wrote:
> On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux
> <gael.va...@normalesup.org> wrote:
> > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote:
> >> Hi all,

> >> I have an embarrassingly parallel problem, very nicely suited to
> >> parallelization.

> > A non-optimal solution that I like:
> > http://gael-varoquaux.info/blog/?p=119

> Thanks, for the pointer, but after a quick read, it doesn't look like
> it supports distributed memory parallelism. Or does it?

If by distributed memory you mean shared memory, you won't get this, but
the copy on write of Unix gives you part of it, but not all of it. One
hack is to use memmapping to a file to share memory between processes (it
won't cost you IO, because your OS will be smart-enough to cache
everything). The right way to do it is to use a shared memory array,
which Sturla and I started working on ages ago, but never found time to
integrate to numpy.

If you mean parallelism on architectures where 'fork' won't distributes
the processes (like a cluster), than multiprocessing won't do the trick,
and you will need to look at IPython or parallel Python.

James Coughlan

unread,

Nov 9, 2009, 1:18:03 PM11/9/09

to SciPy Users List

Rohit Garg wrote:
> Hi all,
>
> I have an embarrassingly parallel problem, very nicely suited to
> parallelization. I am looking for community feedback on how to best
> approach this matter? Basically, I just setup a bunch of tasks, and
> the various cpu's will pull data, process it, and send it back. Out of
> order arrival of results is no problem. The processing times involved
> are so large that the communication is effectively free, and hence I
> don't care how fast/slow the communication is. I thought I'll ask in
> case somebody has done this stuff before to avoid reinventing the
> wheel. Any other suggestions are welcome too.
>
> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?
>
> There is one method mentioned at [1], and of course, one could resort
> to something like mpi4py.
>
> [1] http://docs.python.org/library/multiprocessing.html {see the last example}
>
>

Hi,

I've never done any parallel processing, but you might consider
Shedskin, a Python to C++ compiler, which makes it easy to convert
Python functions into fast C++ modules, and offers support for parallel
processing:

http://code.google.com/p/shedskin/

Best,

James

--
-------------------------------------------------------
James Coughlan, Ph.D., Scientist

The Smith-Kettlewell Eye Research Institute

Email: coug...@ski.org
URL: http://www.ski.org/Rehab/Coughlan_lab/
Phone: 415-345-2146
Fax: 415-345-8455
-------------------------------------------------------

David Baddeley

unread,

Nov 9, 2009, 3:56:31 PM11/9/09

to SciPy Users List

Hi Rohit,

I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). My tasks use a mixture of python and c, although no communication takes place in the c code.

I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into.

I can post the code for a minimal task server/client if you like.

best wishes,
David

--- On Tue, 10/11/09, Rohit Garg <rpg...@gmail.com> wrote:

Wes McKinney

unread,

Nov 9, 2009, 4:03:18 PM11/9/09

to SciPy Users List

Here's a little parallel processing library using Pyro which might be
of interest to some:

http://code.google.com/p/papyros/

Karl Young

unread,

Nov 9, 2009, 4:13:05 PM11/9/09

to SciPy Users List

If I were starting from scratch I'd learn how to do this using Ipython:

http://ipython.scipy.org/moin/

but in the past I had great luck using pypar, a very simple interface to
the MPI library; for embarrassingly parallel problems it's very easy to
get up and running quickly by just copying examples (I knew a wee bit of
MPI before using this but not much).

http://datamining.anu.edu.au/~ole/pypar/

-- KY

> .

Luis Pedro Coelho

unread,

Nov 12, 2009, 9:37:36 AM11/12/09

to scipy...@scipy.org

Rohit Garg wrote:
> I have an embarrassingly parallel problem, very nicely suited to
> parallelization.

I have lots of those :)

> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?

My own: It's called jug. See

http://luispedro.org/software/jug

(
Or download the code from github:
http://github.com/luispedro/jug
)

*

It works with any set of processors that can either share a filesystem (plays
well with NFS, but can be slow) or a connection to a redis database (which is
very easy to set up and is probably as fast as any other approach if everyone
is on the same processor).

A major advantage is that you write mostly Python (and not something funny
looking). For example, here's what a programme with that framework would look
like:

@TaskGenerator
def preprocess(input):
...

@TaskGenerator
def compute(input, param):
...

@TaskGenerator
def collect(inputs):
...

results = []
for input in glob('*.in'):
intermediate = preprocess(input)
results.append(compute(intermediate, param))
final = collect(results)

The only step that's different w.r.t. to the linear version is adding the
TaskGenerator decorator, which changes a call of preprocess(input) into
Task(preprocess, input).

Jug handles everything else.

I have been using this now for almost year for all my research work and it
works very well for me.