[slurm-users] Requested partition configuration not available now


Mahmood Naderan

May 16, 2018, 3:13:24 AM
to Slurm User Community List
Hi,
After creating an account and a partition, I get the error
"Requested partition configuration not available now". I have
restarted the services on all nodes, so I wonder why this happens.

[root@rocks7 ~]# rocks run host compute-0-0 "systemctl restart slurmd"
[root@rocks7 ~]# rocks run host compute-0-1 "systemctl restart slurmd"
[root@rocks7 ~]# systemctl restart slurmd
[root@rocks7 ~]# systemctl restart slurmctld
[mahmood@rocks7 ~]$ sacctmgr list association format=partition,account,user | grep mahmood
emerald em1 mahmood
diamond monthly mahmood
ruby y8 mahmood
[mahmood@rocks7 ~]$ srun -I -A monthly -p DIAMOND --mem=4GB --pty bash
[mahmood@compute-0-1 ~]$ exit
exit
[mahmood@rocks7 ~]$ srun -I -A em1 -p EMERALD --mem=4GB --pty bash
[mahmood@rocks7 ~]$ exit
exit
[mahmood@rocks7 ~]$ srun -I -A Y8 -p RUBY --mem=4GB --pty bash
srun: error: Unable to allocate resources: Requested partition configuration not available now
[mahmood@rocks7 ~]$ cat /etc/slurm/parts
PartitionName=WHEEL RootOnly=yes Priority=1000 Nodes=ALL
PartitionName=DIAMOND AllowAccounts=monthly Nodes=compute-0-[0-1]
PartitionName=EMERALD AllowAccounts=em1,em4 Nodes=compute-0-[0-1],rocks7
PartitionName=RUBY AllowAccounts=Y8 Nodes=compute-0-[0-1]



Regards,
Mahmood

Werner Saar

May 16, 2018, 3:59:37 AM
to slurm...@lists.schedmd.com
Hi Mahmood,

this question is related to the slurm-roll.
The command "rocks sync slurm" performs several tasks:

1. a rebuild of 411 is forced
2. on the compute nodes, the command /etc/slurm/slurm-prep.sh start is executed
3. on the compute nodes, slurmd is restarted
4. slurmctld is restarted

Steps 1 and 2 are required to sync the config files.
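As a rough illustration only, steps 2-4 correspond roughly to the
commands below (the exact command for the forced 411 rebuild in step 1
depends on the Rocks/slurm-roll installation, so it is omitted here):

# hedged sketch of what "rocks sync slurm" automates for steps 2-4
rocks run host compute "/etc/slurm/slurm-prep.sh start"
rocks run host compute "systemctl restart slurmd"
systemctl restart slurmctld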

Best regards

Werner

Mahmood Naderan

May 16, 2018, 5:09:02 AM
to Slurm User Community List
Yes, I did that prior to my first email; however, I thought this was
similar to the service-restart bug in the roll.

As you can see below, the configuration is still reported as not available.


[mahmood@rocks7 ~]$ su
Password:
[root@rocks7 mahmood]# rocks sync slurm
[root@rocks7 mahmood]# exit
exit
[mahmood@rocks7 ~]$ srun -I -A Y8 -p RUBY --mem=4GB --pty bash
srun: error: Unable to allocate resources: Requested partition configuration not available now
[mahmood@rocks7 ~]$ srun -I -A monthly -p DIAMOND --mem=4GB --pty bash
[mahmood@compute-0-1 ~]$ exit
exit
[mahmood@rocks7 ~]$


Regards,
Mahmood

John Hearns

May 16, 2018, 5:19:41 AM
to Slurm User Community List
Mahmood,
you should check that the slurm.conf files are identical on the head node and the compute nodes after you run the rocks sync.




Mahmood Naderan

May 16, 2018, 7:04:10 AM
to Slurm User Community List
Yes, they are the same.

[root@rocks7 ~]# cp /etc/slurm/slurm.conf rocks7
[root@rocks7 ~]# scp compute-0-0:/etc/slurm/slurm.conf compute-0-0
slurm.conf                                    100% 2465   3.6MB/s   00:00
[root@rocks7 ~]# scp compute-0-1:/etc/slurm/slurm.conf compute-0-1
slurm.conf                                    100% 2465   4.7MB/s   00:00
[root@rocks7 ~]# md5sum rocks7 compute-0-*
41df7afb1ed37cc24d8151dc8d7e6c1e rocks7
41df7afb1ed37cc24d8151dc8d7e6c1e compute-0-0
41df7afb1ed37cc24d8151dc8d7e6c1e compute-0-1



The CPU limit on the RUBY partition is 20 cores. The nodes in that
partition are Intel Xeons with the following specs:

[root@rocks7 ~]# rocks run host compute-0-5 "lscpu"
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
...

There are 14 physical cores per CPU, so 28 physical cores per node,
which means 56 threads. The number of requested cores is less than 28,
so it should be fine. I don't know why Slurm gives that error.
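
For reference, the limits Slurm itself has applied can be checked with
scontrol (the node name below is just an example):

# hedged check, assuming standard scontrol usage
scontrol show partition RUBY
scontrol show node compute-0-5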


Regards,
Mahmood

Mahmood Naderan

May 16, 2018, 11:58:16 AM
to Slurm User Community List
Interesting thing I found!

When I checked the log, I saw
part_policy_valid_acct: job's account not permitted to use this partition (RUBY allows Y8 not y8)

However, in the command I use "-A Y8", and I am sure about that. The
parts file contains
PartitionName=RUBY AllowAccounts=Y8 Nodes=compute-0-[2-4]

So I decided to define y8 instead of Y8. The parts file then looks like
PartitionName=RUBY AllowAccounts=y8 Nodes=compute-0-[2-4]

and when I run with "-A y8", I don't get that error.
This seems to be a bug; if there is a reason for it, please let me know.
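
As a quick check (assuming standard sacctmgr usage), the case under
which the account is actually stored can be compared with what
AllowAccounts uses in the partition definition:

# hedged check: compare the stored account name with AllowAccounts
sacctmgr list account format=Account | grep -i y8
sacctmgr list association format=Partition,Account,User | grep -i y8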
Regards,
Mahmood