Spawn child MPI executable with a different machinefile than the parent using mpi4py


Ciaran Murray

Dec 7, 2011, 10:50:31 AM
to mpi4py
Hi all,

I am trying to spawn a separate (non-Python) MPI process from mpi4py
using Spawn. My problem is that I would like to tell the spawned child
process to use a different machinefile than the parent.
So if I launch my master Python script like this:

mpirun -machinefile parent_nodefile -np 6 master-01.py

I would like the call to MPI.Comm.Spawn(...) to do the equivalent of this:

mpirun -machinefile child_nodefile -np 8 child-01.exe

I assume one can do this using the info=XXX option of Spawn(), but I
don't know how to create the new Info object or what format it should take.

Thanks in advance (I'm an eternal optimist!),

Ciarán

Ciaran Murray

Dec 12, 2011, 10:17:49 AM
to mpi4py
Hi again,

Just replying to my own question, in case there is someone as thick as
me out there :)

As I thought initially, but couldn't immediately find the correct syntax
and options for, the solution is to create a new Info object and Set its
"hostfile" key to the name of the hostfile I want the child process to use.
As an example, to run an mpi4py script ("child-01.py") from a parent
using a different hostfile ("child_nodefile"), I used the following:

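import sys
from mpi4py import MPI

procs = 8  # number of child processes to spawn (example value)
myinfo = MPI.Info.Create()
myinfo.Set("hostfile", "child_nodefile")  # hostfile for the children
child_args = ["child-01.py"]  # arguments passed to the Python interpreter
child_spawned = MPI.COMM_SELF.Spawn(sys.executable, args=child_args,
                                    maxprocs=procs, info=myinfo)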
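
The same pattern should also cover the non-Python case from the original question, where the spawned command is the executable itself rather than the Python interpreter. The following is only a sketch under some assumptions: the helper names child_command and spawn_with_hostfile are made up for illustration, the "./child-01.exe" path is a placeholder, and the "hostfile" info key is the one Open MPI documents for MPI_Comm_spawn, so other MPI implementations may ignore it or expect a different key.

```python
import sys


def child_command(target, extra_args=()):
    """Return (command, args) for Spawn (hypothetical helper).

    A Python script is run through the current interpreter; anything
    else is treated as an executable and spawned directly.
    """
    if target.endswith(".py"):
        return sys.executable, [target, *extra_args]
    return target, list(extra_args)


def spawn_with_hostfile(target, hostfile, nprocs, extra_args=()):
    """Spawn `nprocs` children of `target` on the hosts listed in `hostfile`."""
    # Imported here so the helper above can be used without MPI installed.
    from mpi4py import MPI

    command, args = child_command(target, extra_args)
    info = MPI.Info.Create()
    # "hostfile" is an Open MPI info key; assumed, not portable.
    info.Set("hostfile", hostfile)
    try:
        child = MPI.COMM_SELF.Spawn(command, args=args,
                                    maxprocs=nprocs, info=info)
    finally:
        info.Free()  # the Info object is no longer needed after Spawn
    return child  # intercommunicator to the children
```

With this, something like spawn_with_hostfile("./child-01.exe", "child_nodefile", 8) would correspond to the mpirun line in the original question, and calling Disconnect() on the returned intercommunicator releases it once both sides are finished.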

Hope that will be of use to somebody else.

Also, while I'm here, does anyone know of any problems associated with
simultaneously spawning multiple instances of the same programme/binary
(with different arguments, of course)?
I found I can only get this to work by putting a ~20 ms delay between
spawns. The delay itself is okay, because my spawned child processes
take a few minutes to run. What worries me more is that, after a
synchronised start, the master ranks run independently with little
synchronisation, so two of them may try to spawn a child process at
the same time.

Ciarán

Aron Ahmadia

Dec 12, 2011, 10:23:00 AM
to mpi...@googlegroups.com
I've noticed that OS X falls over if you try to spawn too many processes for a single job, but had not considered whether it was connected to a resource issue.  I am guessing this is implementation-dependent, so OpenMPI and MPICH may have different properties here.

A


Ciaran Murray

Dec 12, 2011, 11:21:18 AM
to mpi4py
Thanks, Aron, for your comments. The problem does appear to be with
OpenMPI and probably not the OS (I'm using Linux on a PBS/Torque
cluster). The error given (ORTE_ERROR_LOG: Not found in file
base/plm_base_launch_support.c at line 758) has been reported for other
OpenMPI programs when spawning is not serialised properly, which creates
a race condition. The workaround of including a delay has also been
used by other programmers. My own code uses a master-worker pattern, so
it should be easy for the workers to call the master before they spawn
and to add the delay there. As I said before, the spawned child processes
take a long time (more than a few minutes), so adding a few
milliseconds shouldn't be a major issue.
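
A rough sketch of that idea, with workers asking the master for a turn before they spawn, could look like the code below. Everything here is an assumption for illustration rather than code from my application: the SpawnGate class, the two helper functions, the tag value 77, and the 20 ms default gap are all made up, and the gate only spaces the permissions out in time; it does not wait for the previous Spawn call to finish.

```python
import time


class SpawnGate:
    """Grants spawn permissions at least `min_gap` seconds apart."""

    def __init__(self, min_gap=0.02, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous grant, if any

    def wait_turn(self):
        """Block until `min_gap` has passed since the last grant."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_gap - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        return now


def serve_spawn_requests(comm, n_requests, gate, tag=77):
    """Master side: answer each request once the gate allows it."""
    from mpi4py import MPI  # imported here so SpawnGate works without MPI

    status = MPI.Status()
    for _ in range(n_requests):
        comm.recv(source=MPI.ANY_SOURCE, tag=tag, status=status)
        gate.wait_turn()
        comm.send(True, dest=status.Get_source(), tag=tag)


def request_spawn_turn(comm, master=0, tag=77):
    """Worker side: ask the master for permission, then go on to spawn."""
    comm.send(None, dest=master, tag=tag)
    comm.recv(source=master, tag=tag)
```

A worker would call request_spawn_turn(MPI.COMM_WORLD) immediately before its own Spawn call. A stricter variant would have the worker send a second message after Spawn returns, so the master only grants the next turn once the previous spawn has completed.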

Ciarán

