[slurm-users] siesta jobs with slurm, an issue

Mahmood Naderan

Jul 22, 2018, 12:08:47 PM
to Slurm User Community List
Hi,
I don't know why SIESTA jobs are aborted by Slurm.

[mahmood@rocks7 sie]$ cat slurm_script.sh
#!/bin/bash
#SBATCH --output=siesta.out
#SBATCH --job-name=siesta
#SBATCH --ntasks=8
#SBATCH --mem=4G
#SBATCH --account=z3
#SBATCH --partition=EMERALD
mpirun /share/apps/chem/siesta-4.0.2/spar/siesta prime.fdf prime.out
[mahmood@rocks7 sie]$ sbatch slurm_script.sh
Submitted batch job 783
[mahmood@rocks7 sie]$ squeue --job 783
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[mahmood@rocks7 sie]$ cat siesta.out
Siesta Version  : v4.0.2
Architecture    : x86_64-unknown-linux-gnu--unknown
Compiler version: GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Compiler flags  : mpifort -g -O2
PP flags        : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
PARALLEL version

* Running on    8 nodes in parallel
>> Start of run:  22-JUL-2018  20:33:36

                           ***********************
                           *  WELCOME TO SIESTA  *
                           ***********************

reinit: Reading from standard input
************************** Dump of input data file ****************************
************************** End of input data file *****************************

reinit: -----------------------------------------------------------------------
reinit: System Name:
reinit: -----------------------------------------------------------------------
reinit: System Label: siesta
reinit: -----------------------------------------------------------------------
No species found!!!
Stopping Program from Node:    0

initatom: Reading input for the pseudopotentials and atomic orbitals ----------
No species found!!!
Stopping Program from Node:    0
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[mahmood@rocks7 sie]$



However, I am able to run that command with "-np 4" on the head node, so I don't know whether the problem is with the compute node or something else.

Any idea?

Regards,
Mahmood



Bill Barth

Jul 22, 2018, 12:15:31 PM
to Slurm User Community List
That doesn't necessarily look like a Slurm problem to me. It looks like SIESTA quit of its own volition (hence the call to MPI_ABORT()). I suggest you ask your local site support to take a look, or go to the SIESTA developers. I doubt you'll find any SIESTA experts here to help you.

All I can suggest is to check that all the paths you have provided to SIESTA are correct (the path to the executable is clearly fine b/c SIESTA starts, but can it find prime.fdf?). Otherwise start with your local support team.
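For example, a couple of extra lines in the batch script just before the mpirun call (using the same filename from your script) would show where the job actually runs and whether it can see the input file:

echo "Running on $(hostname) in $(pwd)"
ls -l prime.fdf || echo "prime.fdf not visible from $(pwd)"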

Best,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

Mahmood Naderan

Jul 22, 2018, 1:12:48 PM
to Slurm User Community List
I am able to run the command directly on the node. Please note that in the following output I pressed ^C after some minutes, so the errors shown are caused by that ^C.


[mahmood@compute-0-3 ~]$ mpirun -np 4 /share/apps/chem/siesta-4.0.2/spar/siesta dimer1prime.fdf dimer1prime.out

Siesta Version  : v4.0.2
Architecture    : x86_64-unknown-linux-gnu--unknown
Compiler version: GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Compiler flags  : mpifort -g -O2
PP flags        : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
PARALLEL version

* Running on    4 nodes in parallel
>> Start of run:  22-JUL-2018  21:39:41


                           ***********************
                           *  WELCOME TO SIESTA  *
                           ***********************

reinit: Reading from standard input
************************** Dump of input data file ****************************
^C--------------------------------------------------------------------------

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
No species found!!!
Stopping Program from Node:    0
************************** End of input data file *****************************

reinit: -----------------------------------------------------------------------
reinit: System Name:
reinit: -----------------------------------------------------------------------
reinit: System Label: siesta
reinit: -----------------------------------------------------------------------

initatom: Reading input for the pseudopotentials and atomic orbitals ----------
No species found!!!
Stopping Program from Node:    0


Regards,
Mahmood



John Hearns

Jul 22, 2018, 1:25:36 PM
to Slurm User Community List
Are you very sure that the filesystem with the input file is mounted on the compute nodes? 
Try to cat the file.
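For example, something like this from the head node (partition and account taken from your batch script; adjust the path to wherever the .fdf file actually lives) should show whether a compute node can read it:

srun --partition=EMERALD --account=z3 -N1 cat ~/sie/dimer1prime.fdf | head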

Mahmood Naderan

Jul 22, 2018, 1:37:04 PM
to Slurm User Community List
Yes. Since I cannot log in to the nodes with my own account, I first ssh to the node as root and then su to my user there.

[root@rocks7 ~]# ssh compute-0-3
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Last login: Sun Jul 22 21:40:09 2018 from rocks7.local
Rocks Compute Node
Rocks 7.0 (Manzanita)
Profile built 19:21 11-Apr-2018

Kickstarted 19:37 11-Apr-2018
[root@compute-0-3 ~]# su - mahmood
Last login: Sun Jul 22 21:39:25 +0430 2018 on pts/2
[mahmood@compute-0-3 ~]$ cd sie
[mahmood@compute-0-3 sie]$ cat dimer1prime.fdf
# $Id: sih.fdf,v 1.1 1999/04/20 14:43:44 emilio Exp $
# -----------------------------------------------------------------------------
# FDF for interstitial H in a cubic c-Si supercell with 64 atoms
#
# E. Artacho, April 1999
# -----------------------------------------------------------------------------

SystemName          prime
SystemLabel         prime

NumberOfAtoms       168
NumberOfSpecies     5

....
....
....

Regards,
Mahmood



Renfro, Michael

Jul 22, 2018, 2:43:50 PM
to Slurm User Community List
You’re getting the same fundamental error in both the interactive and batch versions, though.

The ‘reinit: Reading from standard input’ line seemed off, since you were providing the input file as an argument. But all the references to running SIESTA that I can find in its manual (sections 3 and 16) show something more like:

mpirun -np 4 /share/apps/chem/siesta-4.0.2/spar/siesta < dimer1prime.fdf > dimer1prime.out

and those examples line up with the idea that Siesta reads its commands from standard input, not by literally opening an input file specified as a command-line argument.

If the version of the command using < and > works correctly, then it’s definitely not a Slurm issue; it’s an issue with how you invoked SIESTA.
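Applied to your original batch script, that would look something like this (same paths and options, only the siesta line changed to use redirection; Open MPI's mpirun forwards stdin to rank 0 by default, which is where SIESTA reads its input):

#!/bin/bash
#SBATCH --output=siesta.out
#SBATCH --job-name=siesta
#SBATCH --ntasks=8
#SBATCH --mem=4G
#SBATCH --account=z3
#SBATCH --partition=EMERALD
# read the FDF input from stdin and write SIESTA's output to prime.out
mpirun /share/apps/chem/siesta-4.0.2/spar/siesta < prime.fdf > prime.out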

Mahmood Naderan

Jul 22, 2018, 3:54:49 PM
to Slurm User Community List
Thanks for the hint. In fact the siesta user wasted my time too!!
:/

Regards,
Mahmood


