[slurm-users] how to check what slurm is doing when job pending with reason=none?

taleint...@sjtu.edu.cn

Jun 16, 2021, 6:39:54 AM
to slurm...@lists.schedmd.com

Hello,

Recently we noticed a strange delay between job submission and job start, even though the partition definitely has enough idle nodes to meet the job's demand. To rule out interference, we tested on the 4-node debug partition, which has no other jobs running. The test job script is also as simple as possible:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --output=%j.out
#SBATCH --error=%j.err

hostname
sleep 1000
echo end

However, after submission the job still stays in the PENDING state for about 30-60 s, and during that time sacct shows the REASON as "None". We also checked slurmctld.log on the server and slurmd.log on the compute node at debug log level. Neither contains anything that explains why the job is pending.

So is there any way to make Slurm explain in detail why the job didn't start immediately, or what it was doing while the job was pending?

Thanks.

Gerhard Strangar

Jun 16, 2021, 12:27:30 PM
to Slurm User Community List
taleint...@sjtu.edu.cn wrote:

> But after submit, this job still stay at PENDING state for about 30-60s and
> during the pending time sacct shows the REASON is "None".

It's the default sched_interval=60 in your slurm.conf.

Gerhard
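
To see which interval a cluster is actually using, a minimal sketch along these lines may help. The helper name get_sched_interval is hypothetical (not part of Slurm); it only parses a SchedulerParameters string and falls back to the default of 60 mentioned above when sched_interval is not set.

```shell
# Hypothetical helper (not a Slurm command): print the sched_interval value
# found in a SchedulerParameters string, or 60 when the option is absent.
get_sched_interval() {
    # Split on commas and spaces so each option sits on its own line,
    # then keep only an exact "sched_interval=<number>" entry.
    _gsi_value=$(printf '%s\n' "$1" \
        | tr ', ' '\n\n' \
        | sed -n 's/^sched_interval=\([0-9][0-9]*\)$/\1/p' \
        | head -n 1)
    printf '%s\n' "${_gsi_value:-60}"
}
```

On a node with the Slurm client tools installed it could be called as get_sched_interval "$(scontrol show config | grep -i '^SchedulerParameters')". The exact layout of that scontrol line is an assumption here, which is why the helper matches the option by name rather than by position.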

taleint...@sjtu.edu.cn

Jun 17, 2021, 10:28:53 PM
to Slurm User Community List
Thanks for the help. We reduced sched_interval and the pending time
decreased as expected.

However, sched_interval applies globally, and setting it too low may put
pressure on the slurmctld server. Since we only want a quick response on the
debug partition (which is meant to let users submit debug jobs frequently
without waiting), is it possible to make Slurm schedule immediately on that
specific partition, no matter how long the job queue is?

-----Original Message-----
From: Gerhard Strangar <g...@arcor.de>
Sent: June 17, 2021 0:27
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] how to check what slurm is doing when job pending
with reason=none?

Fulcomer, Samuel

Jun 17, 2021, 11:13:21 PM
to Slurm User Community List
You can specify a partition priority in the partition line in slurm.conf, e.g. Priority=65000 (I forget what the max is...)
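
A rough sketch of how that suggestion might be applied and checked. It assumes a Slurm release where the partition options are named PriorityTier and PriorityJobFactor (Priority being, as far as I know, the older combined setting). The node list and the value 10 are placeholders, not taken from this thread, and partition_priority_tier is a hypothetical helper name.

```shell
# Assumed slurm.conf partition line (node names and value are placeholders):
#   PartitionName=debug Nodes=node[01-04] PriorityTier=10
#
# Hypothetical helper (not a Slurm command): read partition details on stdin
# and print the first PriorityTier value found.
partition_priority_tier() {
    sed -n 's/.*PriorityTier=\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

After changing slurm.conf and reloading the configuration, something like scontrol show partition debug | partition_priority_tier should print the configured tier. Note that a higher tier only changes the order in which jobs are considered. The scheduling interval discussed earlier in the thread still applies.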