> When I ran test15.17 of the regression test suite, the slurm controller
> died (this can be reproduced). Could someone please look into it to find
> the cause?
> I have a cluster of 12 JS22 blade nodes (ppc64).
>
> Here is the output of test15.17:
>
> ============================================
> TEST: 15.17
> spawn /usr/bin/salloc -N1-4 -t1 /bin/bash
> salloc: Granted job allocation 22
> /usr/bin/sbatch --jobid=22 -o none -e none test15.17.input
> cluster-ib-1:~/SLURM/testsuite/expect # /usr/bin/sbatch --jobid=22 -o none
> -e none test15.17.input
> sbatch: error: slurm_receive_msg: Zero Bytes were transmitted or received
> sbatch: error: Batch job submission failed: Zero Bytes were transmitted or
> received
> cluster-ib-1:~/SLURM/testsuite/expect #
> FAILURE: salloc not responding
> cancelling 22
> test15.17 FAILURE
> ============================================
>
> And here is the slurmctld.log output from the time it died:
> ==============================
> cluster-ib-1:~/SLURM/testsuite/expect # tail /tmp/slurm/slurmctld.log
> [Jul 23 09:12:53] completing job 21
> [Jul 23 09:12:53] job_complete for JobId=21 successful
> [Jul 23 09:12:54] _slurm_rpc_allocate_resources JobId=22
> NodeList=cluster-ib-[1-4] usec=137
> [Jul 23 09:12:54] user 0 attempting to run batch script within an existing
> job
> ==============================
>
>
> Regards,
>
> Hien Nguyen
> Linux Technology Center (Austin)
> Phone: (512) 838-4140 Tie Line: 678-4140
> e-mail: hi...@us.ibm.com
>
Unfortunately, Open MPI does not yet support using srun to launch MPI
processes without our mpirun. We OMPI developers have talked about it
and wavered back and forth on whether we're going to do it or not. So
far, we haven't. I can't predict what will happen in future versions of
OMPI, but I can tell you that support for "srun ...
ompi_mpi_executable" will *not* be included in the upcoming OMPI v1.3.
Sorry. :-(
However, there are two simple workarounds:

1. s/srun/salloc/ and use our mpirun, meaning:

$ salloc -N 2 -n 2 mpirun ./helloMPIworld

2. Or put the mpirun command in a script and launch it via sbatch:

$ cat > myscript <<EOF
#!/bin/sh
mpirun ./helloMPIworld
EOF
$ chmod +x myscript
$ sbatch -N 2 -n 2 myscript
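If your sbatch is recent enough to have the --wrap option (an
assumption; check sbatch --help on your installation), workaround 2
collapses into a one-liner, since --wrap generates the wrapper script
for you:

$ sbatch -N 2 -n 2 --wrap="mpirun ./helloMPIworld"

By default the job's stdout then lands in a slurm-<jobid>.out file in
the submission directory.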
To be absolutely clear: Open MPI *does* use SLURM support under the
covers to effect mpirun's launching of remote processes, etc. We just
don't support "srun ... helloMPIworld".
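(A quick way to check whether a given Open MPI build has its SLURM
support compiled in is ompi_info; the component names differ between
OMPI versions, so treat this only as a sketch:

$ ompi_info | grep -i slurm

If SLURM support is present, you should see SLURM entries among the
process-launch and resource-allocation components.)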
Hope that helps.
On Jul 23, 2008, at 9:16 AM, Dirk Eddelbuettel wrote:
>
> I am preparing some example scripts for a presentation that covers,
> among other things, OpenMPI and then slurm. I hit a small conceptual
> snag with srun.
>
> Using a 'hello world' MPI example, I get the correct rank/size output
> using OpenMPI's orterun:
>
> $ orterun -n 2 -H ron,mccoy ./helloMPIworld
> Hello, rank 0, size 2 on processor ron
> Hello, rank 1, size 2 on processor mccoy
>
> Two hosts, total size 2, ranks 0 and 1 -- and I was thinking I could get
> that too using srun. But I always end up with rank 0 and size 1.
>
> $ srun -m arbitrary -w ron,mccoy -n2 helloMPIworld
> Hello, rank 0, size 1 on processor ron
> Hello, rank 0, size 1 on processor mccoy
> $ srun -N 2 -n 2 ./helloMPIworld
> Hello, rank 0, size 1 on processor ron
> Hello, rank 0, size 1 on processor mccoy
>
> Am I simply misunderstanding how this is supposed to work?
>
> I was under the impression that srun does what orterun does (plus, of
> course, a slew of other things). Are the different MPI instances that
> are launched aware of each other or not?
>
> I am using Debian with version 1.2.7 of OpenMPI and 1.3.4 of slurm.
>
> Dirk
>
> --
> Three out of two people have difficulties with fractions.
--
Jeff Squyres
Cisco Systems
On Wednesday, 2008-07-23, at 09:48 -0400, Jeff Squyres wrote:
> 2. Or put the mpirun command in a script and launch it via sbatch:
>
> $ cat > myscript <<EOF
> #!/bin/sh
> mpirun ./helloMPIworld
> EOF
> $ chmod +x myscript
> $ sbatch -N 2 -n 2 myscript
This is how I use OpenMPI via SLURM and I can confirm that it works as
expected. Setting the executable bit is optional; sbatch will happily
execute the script without it being executable. TTBOMK, a shebang is
mandatory, though.
Another nice thing is that you can pass the sbatch parameters by using
lines starting with "#SBATCH" (no space after the "#"), e.g.

$ cat > myscript <<EOF
#!/bin/sh
#SBATCH -n 2
#SBATCH -N 2
#SBATCH --mail-type=ALL
mpirun ./helloMPIworld
EOF
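With the directives in the script, a plain submission is enough; note
that options given on the sbatch command line override the
corresponding #SBATCH lines. A minimal usage sketch (the mail
notifications assume a working mail setup on the controller):

$ sbatch myscript
$ squeue -u $USER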
Best regards
Manuel