I am trying to write a Python script to launch multiple instances of a
second slave MPI program (written in C++) in parallel.
The slave process is started with the following command: mpirun -np x
executable < input.in
I understand from the tutorial how to use Spawn() to start a number of
slave processes, but I don't understand how to launch a slave process
that is itself initiated by mpirun / mpiexec. Can anyone help me with
the proper use of Spawn() for this purpose?
So far I have achieved the most success by using Spawn() to launch another
.py script that uses os.system('mpirun -np ....') to run the slave
processes. Then they do run in parallel, but only if I do not import
mpi4py in the slave .py script. The problem then is that the master
process hangs because it doesn't get a signal from the slave that it has
finished. If I do import mpi4py, the slave and master processes all die
with the error "OOB: Connection to HNP lost".
Any guidance is greatly appreciated. Thank you,
Adam
--
Adam F. Wallace
Geological Postdoctoral Fellow
Center for Nanoscale Control of Geologic CO2
Lawrence Berkeley National Laboratory
http://foundry.lbl.gov/deyoreogroup/page3/page17/page17.html
Perhaps you are a bit confused about how to use Spawn()? You should
NEVER spawn using mpiexec. In some sense, Spawn() is a programmatic
replacement for mpiexec. In order to spawn a Python script in
parallel, you just call comm_child = comm_parent.Spawn(sys.executable,
["yourscript.py", "arg1", "arg2"], maxprocs=5). This would launch
yourscript.py in 5 processes, more or less like "mpiexec -n 5 python
yourscript.py arg1 arg2" would do, but when using Spawn() you can
perform child<->parent MPI communication. IMHO, this is the preferred
way to start a slave group of processes: it is easy to use, and there
is a good chance that it works better across MPI implementations and
platforms.
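A minimal sketch of the Spawn() pattern described above, in case it helps.
It assumes a single script that plays both roles, selected by a
hypothetical "parent"/"child" command-line argument; the function names
(child_command, parent, child) and the message contents are made up for
illustration and are not part of mpi4py. The gather() gives the parent an
explicit "finished" signal from every child.

```python
import sys


def child_command(script, *script_args):
    """Return (command, args) for Spawn(): this interpreter running `script`."""
    return sys.executable, [script, *map(str, script_args)]


def parent(nprocs=5):
    # Imported here because importing mpi4py.MPI initializes MPI.
    from mpi4py import MPI

    command, args = child_command(__file__, "child")
    # Spawn() takes the place of "mpiexec -n nprocs python <script> child"
    # and returns an intercommunicator linking this process to the children.
    children = MPI.COMM_SELF.Spawn(command, args=args, maxprocs=nprocs)
    # On the spawning side of an intercommunicator the root passes MPI.ROOT.
    children.bcast("start", root=MPI.ROOT)
    ranks = children.gather(None, root=MPI.ROOT)  # one reply per child
    children.Disconnect()  # collective: the children call it as well
    print("children finished:", ranks)


def child():
    from mpi4py import MPI

    parent_comm = MPI.Comm.Get_parent()
    # On the child side, root=0 names rank 0 of the parent group.
    parent_comm.bcast(None, root=0)
    parent_comm.gather(MPI.COMM_WORLD.Get_rank(), root=0)
    parent_comm.Disconnect()


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "parent":
        parent()
    elif role == "child":
        child()
    else:
        print("usage: python <this script> parent")
```

Saved under some name of your choosing, it would be started as a single
process ("python <script> parent", or "mpiexec -n 1 python <script>
parent" if your MPI needs a launcher for the initial process).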
However, if you REALLY want to start the slave group in advance with
mpiexec and then communicate with it, you need other functionality:
name publishing and accept/connect. Look at this demo:
http://code.google.com/p/mpi4py/source/browse/trunk/demo/compute-pi/cpi-dpm.py#110
I admit the whole demo could be a bit confusing, as it mixes rather
different concepts. Think of main_server() as the entry point of your
slave app that you launch in advance with mpiexec -n ..., and
main_client() as the entry point of your script in charge of
coordinating the computations performed by the slaves.
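A stripped-down sketch of that server/client split, separate from the
demo itself. It assumes a recent mpi4py in which
Publish_name(service, port) and Lookup_name(service) take these
arguments, a client running as a single process, and an MPI setup where
name publishing works (some implementations may need a separate name
server process). The service name "demo-service", the partial_sum()
workload and the "server"/"client" arguments are hypothetical.

```python
import sys


def partial_sum(n, rank, size):
    """Sum of the integers in range(n) assigned round-robin to `rank`."""
    return sum(range(rank, n, size))


def server(service="demo-service"):
    # Imported here because importing mpi4py.MPI initializes MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    port = ""  # only meaningful on the root rank
    if rank == 0:
        port = MPI.Open_port()
        MPI.Publish_name(service, port)
    client_comm = comm.Accept(port, root=0)  # waits for a client to connect
    n = client_comm.bcast(None, root=0)      # task sent by the client
    client_comm.gather(partial_sum(n, rank, size), root=0)
    client_comm.Disconnect()
    if rank == 0:
        MPI.Unpublish_name(service, port)
        MPI.Close_port(port)


def client(n=100, service="demo-service"):
    from mpi4py import MPI

    port = MPI.Lookup_name(service)
    # Assumes the client is one process, so it is the root of its group.
    server_comm = MPI.COMM_WORLD.Connect(port, root=0)
    server_comm.bcast(n, root=MPI.ROOT)
    parts = server_comm.gather(None, root=MPI.ROOT)
    server_comm.Disconnect()
    print("total:", sum(parts))


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "server":
        server()
    elif role == "client":
        client()
    else:
        print("usage: python <this script> server|client")
```

The server side would be launched first with mpiexec -n ... and the
client afterwards; the gather() is what tells the client that every
server process has finished its share.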
Am I being clear enough about this? As always, I apologize for the
lack of better docs and examples in mpi4py. Perhaps I should add
separate examples about how to use spawning, name publishing, and
accept/connect?
--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169