[slurm-users] slurm_update error: Invalid node state specified


Sushil Mishra

Oct 11, 2022, 10:10:34 AM
to Slurm User Community List
Dear all,

I am stuck with scontrol not recognizing the state keywords. I wonder if someone can point me to the possible cause of the error.  I restarted slurmd a few times, and it didn't help. 

[sushil@fucose ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
LocalQ*      up   infinite      1  inval fucose

[sushil@fucose ~]$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
cg                   sushil    2022-10-10T18:11:27 fucose

[sushil@fucose ~]$ sudo scontrol update NodeName=fucose state=RESUME
[sudo] password for sushil:
slurm_update error: Invalid node state specified

[sushil@fucose ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

Best,
Sushil
 

Groner, Rob

Oct 11, 2022, 11:04:41 AM
to Slurm User Community List
Have you checked the logs for slurmd and slurmctld?  I seem to recall that the "invalid" state for a node meant that there was some discrepancy between what the node says or thinks it has (slurmd -C) and what the slurm.conf says it has.  While there is that discrepancy and the node is invalid, you can't just tell it to resume.
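
One way to look for such a discrepancy is to put the first line printed by `slurmd -C` next to the node's `NodeName=` line in slurm.conf and list the fields that differ. The following is only a rough sketch: the helper name `conf_mismatch` and the sample values are made up for illustration and are not taken from this thread, and the config path in the closing comment is an assumption that varies by install.

```shell
#!/usr/bin/env bash
# Sketch: list hardware fields that slurmd detects but slurm.conf does not
# state identically. conf_mismatch is a hypothetical helper, not a Slurm tool.

# Print each key=value token of $1 that does not appear verbatim in $2.
conf_mismatch() {
    local detected="$1" configured="$2" token
    for token in $detected; do
        case " $configured " in
            *" $token "*) ;;                 # same key=value on both sides
            *) printf '%s\n' "$token" ;;     # missing or different in slurm.conf
        esac
    done
}

# Made-up sample input (illustrative values only):
detected='NodeName=fucose CPUs=32 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=2'
configured='NodeName=fucose CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN'
conf_mismatch "$detected" "$configured"

# On a real node the two inputs could come from (config path is an assumption):
#   detected=$(slurmd -C | head -n 1)
#   configured=$(grep -i '^NodeName=fucose' /etc/slurm/slurm.conf)
```

With the sample values above, the only token reported is `SocketsPerBoard=1`. The comparison is purely textual, so a field that slurm.conf omits altogether is also reported and has to be judged by hand.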



Paul H. Hargrove

Oct 11, 2022, 12:23:48 PM
to Slurm User Community List
I think Rob is "on the right track" here.  Specifically, I don't think the error message means that "RESUME" is unrecognized as the name of a state.  Rather, the message means that a state transition from "INVAL" to "RESUME" is invalid.  I can reproduce that message by trying to "RESUME" an "IDLE" node, but "RESUME" works fine for a node that has recently been rebooted.

-Paul

--
Paul H. Hargrove <PHHar...@lbl.gov>
Pronouns: he, him, his
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

Sushil Mishra

Oct 11, 2022, 2:27:09 PM
to Slurm User Community List
Thanks so much! Indeed, it was a mismatch between the actual SocketsPerBoard value and the one in slurm.conf.
Sushil