Can anyone please share a working slurm.conf for this setup? Slurm seems to allocate the resources I ask for, but only one node shows any CPU load; the other nodes are doing nothing.
Any help is really appreciated!
For reference, here are the config files:
- I submit "run.sh" with sbatch (the script itself also uses Slurm). Here is the header of "run.sh", followed by a sketch of how I expect the allocation to be used:
#SBATCH --job-name=LDA_Kaldi
#SBATCH --account=ID
#SBATCH --output=lda.out
#SBATCH --partition=normal
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60000
#SBATCH --time=1-00:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=EMAIL
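For comparison, here is a minimal sketch of how I understand a multi-node allocation is supposed to be consumed (the body is only an illustration with a placeholder program, not the actual contents of my run.sh):

#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
# srun launches one task on each of the 8 allocated nodes;
# running ./my_program directly here would use the first node only.
srun ./my_program

In my case run.sh instead calls the Kaldi recipe scripts, which delegate the parallelization to slurm.pl through the config below.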
- Here is my conf/slurm.conf (the config file passed to slurm.pl); after it I have added an example of how I expect its options to expand:
command sbatch --export=PATH --ntasks-per-node=1 --partition=normal --nodes=8 --mem=64000 --time=24:00:00 --job-name=KALDI --account=ID
option mem=* --mem-per-cpu=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* --cpus-per-task=$0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task=1 --ntasks-per-node=1 # Do not add anything to qsub_opts
option max_jobs_run=* # Do nothing
option gpu=* -N1 -n1 -p gpu --mem=4GB --gres=gpu:$0 --cpus-per-task=6 --time=72:0:0 # in reality, we probably should have --cpus-per-task=$((6*$0))
option gpu=0
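To check that I am reading this config correctly: my understanding is that a call like the following (the log path and the echo command are just placeholders) matches the num_threads=* line and appends --cpus-per-task=4 --ntasks-per-node=1 to the sbatch command defined above:

slurm.pl --config conf/slurm.conf --num-threads 4 exp/test/log/test.log echo hello

Please correct me if that expansion is not what actually happens.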
- And here is my cmd.sh, followed by an example of how the Kaldi scripts invoke $train_cmd:
export train_cmd="slurm.pl --config conf/slurm.conf"
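The Kaldi training scripts then call $train_cmd in the usual way, for example (a generic test, not a line from my recipe):

$train_cmd JOB=1:8 exp/test/log/test.JOB.log echo "running job JOB"

i.e. the wrapped command is run 8 times with JOB substituted, and slurm.pl is responsible for getting those 8 runs onto the cluster.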
Where am I going wrong?
Thank you!