Slurm Cluster Not Running Jobs


Brandon K

Mar 1, 2021, 10:18:31 AM
to gce-discussion
Hello,

I am following the tutorial for setting up a Slurm cluster, which is located here:

I am using the us-east1-c zone with the standard login, image, and controller VM configurations, but for the compute nodes I am using 10 n2-standard-16 VMs.

When I run the sbatch -N2 --wrap="srun hostname" step to test a Slurm job, I get no output from the job. It sits in the CONFIGURING state for 4-5 minutes, then drops back to PENDING, and keeps alternating between the two.
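
To confirm the CONFIGURING/PENDING cycle rather than eyeballing it, one option is to poll the job's state (for example with squeue -h -j <jobid> -o %T, where <jobid> is the ID that sbatch printed) and count how often it falls back to PENDING. A minimal Python sketch, assuming the states have already been collected as strings; the function name and the sample data are hypothetical, not part of Slurm:

```python
from typing import Iterable


def count_requeue_cycles(states: Iterable[str]) -> int:
    """Hypothetical helper: count CONFIGURING -> PENDING transitions.

    `states` is a sequence of job states sampled over time, e.g. one line
    of `squeue -h -j <jobid> -o %T` output per poll.
    """
    # Normalise case and whitespace, and drop empty samples.
    cleaned = [s.strip().upper() for s in states if s.strip()]
    # Each time the job falls back from CONFIGURING to PENDING counts as one cycle.
    return sum(
        1
        for prev, cur in zip(cleaned, cleaned[1:])
        if prev == "CONFIGURING" and cur == "PENDING"
    )


# Made-up samples mimicking the behaviour described above.
samples = ["PENDING", "CONFIGURING", "CONFIGURING", "PENDING",
           "CONFIGURING", "PENDING"]
print(count_requeue_cycles(samples))  # 2
```

A count above zero suggests the job is being handed back to the scheduler after the nodes fail to become ready, so the controller's logs around those times may be the next place to look.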

sinfo reveals the following:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      3   mix# runs-compute-0-[0-2]
debug*       up   infinite      7  idle~ runs-compute-0-[3-9]
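
The trailing characters in the STATE column are state flags. Per the sinfo man page, # marks nodes that are being powered up or configured and ~ marks nodes that are powered off, so this output reads as 3 nodes still coming up and 7 cloud nodes not started. (The * on debug* only marks the default partition.) A small Python sketch that splits the default sinfo output into fields and decodes those flags; the flag table covers only a subset of the documented codes, and the helper and class names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

# Subset of the node-state flag suffixes documented in the sinfo man page.
STATE_FLAGS = {
    "*": "not responding",
    "~": "powered off",
    "#": "powering up or being configured",
    "%": "powering down",
    "$": "in a maintenance reservation",
    "@": "pending reboot",
}


@dataclass
class SinfoRow:
    partition: str        # a trailing "*" here marks the default partition
    avail: str
    timelimit: str
    nodes: int
    state: str            # base state, e.g. "mix" or "idle"
    flag: Optional[str]   # suffix character such as "#" or "~", if any
    nodelist: str


def parse_sinfo(text: str) -> List[SinfoRow]:
    """Parse default `sinfo` output (a header line plus one row per state)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore lines that do not match the default layout
        partition, avail, timelimit, nodes, state, nodelist = fields
        flag = state[-1] if state[-1] in STATE_FLAGS else None
        base = state[:-1] if flag else state
        rows.append(SinfoRow(partition, avail, timelimit, int(nodes),
                             base, flag, nodelist))
    return rows


sample = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      3   mix# runs-compute-0-[0-2]
debug*       up   infinite      7  idle~ runs-compute-0-[3-9]
"""

for row in parse_sinfo(sample):
    print(row.nodelist, row.state, STATE_FLAGS.get(row.flag, "no flag"))
```

Splitting on whitespace is enough for the default column layout shown here; a custom sinfo format with spaces inside fields would need a different approach.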

Any ideas as to what went wrong or how to further diagnose the issue? Thanks in advance.

-Brandon

Ahmad P - Cloud Platform Support

Mar 1, 2021, 4:44:15 PM
to gce-discussion

Hello Brandon,


Thank you for your questions.


It seems that this is a product issue.

You can report it on the Google public issue tracker [1]; the GKE product team will then check, reproduce, and fix it. However, there is no ETA for a fix.


[1] https://cloud.google.com/support/docs/issue-trackers
