specifying an alternate machine file with openmpi


Peterson, Kirk

Jun 7, 2013, 1:06:34 PM6/7/13
to <dirac-users@googlegroups.com>
Dear Dirac experts,

Due to memory constraints on an HPC cluster I'm starting to run Dirac on, I need to allocate entire nodes to my Dirac jobs but use only a single processor on each node. Thus I have to use an alternative to the usual $PBS_NODEFILE machinefile in my OpenMPI runs. In my Torque qsub script, I might set the following for a 2-node, 2-process run (each node has 12 cores):

#PBS -l nodes=2:ppn=12    # allocate 2 entire nodes

I generate a machinefile via:

cat $PBS_NODEFILE | uniq > $base/.nodefile.$$

and call pam with something like:

pam --mw 7000 --aw 20000 --noarch --mpi=2 --machfile=$base/.nodefile.$$ --inp=test.inp --mol=test.mol


The resulting machinefile is generated correctly:

contents of $base/.nodefile.$$ :

node41
node39
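
For what it's worth, the `uniq` step relies on Torque writing each node's slots contiguously in $PBS_NODEFILE; a minimal illustration with hypothetical node names:

```shell
# $PBS_NODEFILE lists one line per allocated core, with a node's entries
# adjacent (e.g. node41 twelve times, then node39 twelve times), so `uniq`
# (which only collapses adjacent duplicates) is enough to get one line per node:
printf 'node41\nnode41\nnode39\nnode39\n' | uniq
# node41
# node39
```

`sort -u` would deduplicate regardless of ordering, at the cost of reordering the node list.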

The standard out from Dirac seems to confirm this:

Creating the scratch directory.
Copying file " dirac.x " to scratch dir.
pam: Copying u.mol to /scratch/kipeters/DIRAC_u1_u_31621/MOLECULE.MOL
pam: Copying u1.inp to /scratch/kipeters/DIRAC_u1_u_31621/DIRAC.INP
Machinefile read, list of unique nodes obtained: ['node39']
Copying selected content of master scratch directory to nodes : ['node39']
scp /scratch/kipeters/DIRAC_u1_u_31621/dirac.x node39:/scratch/kipeters/DIRAC_u1_u_31621/dirac.x
scp /scratch/kipeters/DIRAC_u1_u_31621/MOLECULE.MOL node39:/scratch/kipeters/DIRAC_u1_u_31621/MOLECULE.MOL
scp /scratch/kipeters/DIRAC_u1_u_31621/DIRAC.INP node39:/scratch/kipeters/DIRAC_u1_u_31621/DIRAC.INP

But I just now noticed that the final mpirun command doesn't contain a --machinefile option:

DIRAC command : /home/clarklab/kipeters/lib/openmpi-1.6.3/bin/mpirun -np 2 /scratch/kipeters/DIRAC_u1_u_31621/dirac.x (PID=31631)

In the output file from Dirac, I get the following:

** interface to 64-bit integer MPI enabled **

DIRAC master (node41) starts by allocating 7000000000 words ( 53405 MB) of memory
DIRAC node 1 (node41) starts by allocating 7000000000 words ( 53405 MB) of memory
DIRAC master (node41) to allocate at most 14000000000 words ( 106811 MB) of memory

Note: maximum allocatable memory for master+nodes can be set by -aw flag (MW) in pam

DIRAC node 1 (node41) to allocate at most 14000000000 words ( 106811 MB) of memory


This shows that Dirac is using the first two entries in $PBS_NODEFILE and not my new machinefile. While the job was running, I also ssh'd to node39 and confirmed there was no dirac.x process there, while two were running on node41.

Is there an easy hack to pam to add my new machinefile or is it more complicated than this?

thanks in advance and apologies for what turned into a long post,

-Kirk



Radovan Bast

Jun 7, 2013, 1:11:12 PM6/7/13
to dirac...@googlegroups.com
dear Kirk,
please try the pam --mpiarg flag; with it you can pass additional mpirun
flags explicitly.
without --mpiarg the machfile is not passed on to mpirun, exactly as you observe:
by default pam uses the machfile only to distribute files.
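
a sketch of what the combined call might look like, reusing the paths from your example (the exact quoting of the --mpiarg value is shell-dependent, so treat this as a starting point):

```shell
# pass the machinefile through to mpirun via --mpiarg;
# --machfile still tells pam where to copy the scratch files.
pam --mw 7000 --aw 20000 --noarch --mpi=2 \
    --machfile=$base/.nodefile.$$ \
    --mpiarg="--machinefile $base/.nodefile.$$" \
    --inp=test.inp --mol=test.mol
```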
good luck!
  radovan








Peterson, Kirk

Jun 7, 2013, 1:32:39 PM6/7/13
to <dirac-users@googlegroups.com>
Dear Radovan,

Shortly after I sent my email I discovered the --mpiarg flag, and feeding it a --machinefile option did the trick, as you also note.

thanks for the quick reply (on probably a Friday evening for you no less),

-Kirk

Radovan Bast

Jun 7, 2013, 1:46:00 PM6/7/13
to dirac...@googlegroups.com
hi Kirk,
very good - glad it works. have a good weekend!
  radovan