Hi,
I'm just starting out with Compute Engine and was very interested to find this tutorial about setting up a Slurm cluster:
https://cloud.google.com/architecture/deploying-slurm-cluster-compute-engine
But it doesn't seem to work. After I deploy the cluster and submit a test job, nothing happens (no output file appears in my home directory), and the job sits in state CF for a few seconds before going back to PD with reason "BeginTime":
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ sbatch -N2 --wrap="srun hostname"
Submitted batch job 1
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ ls
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
debug*     up     infinite   2      mix#   full-debug-test-[0-1]
debug*     up     infinite   18     idle~  full-debug-test-[2-19]
debug2     up     infinite   10     idle~  full-debug2-test-[0-9]
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ squeue
JOBID  PARTITION  NAME  USER      ST  TIME  NODES  NODELIST(REASON)
1      debug      wrap  dhdaines  CF  0:06  2      full-debug-test-[0-1]
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ ls
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ squeue
JOBID  PARTITION  NAME  USER      ST  TIME  NODES  NODELIST(REASON)
1      debug      wrap  dhdaines  CF  0:14  2      full-debug-test-[0-1]
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ squeue
JOBID  PARTITION  NAME  USER      ST  TIME  NODES  NODELIST(REASON)
1      debug      wrap  dhdaines  PD  0:00  2      (None)
[dhdaines_gmail_com@full-login-8kjqahky-001 ~]$ squeue
JOBID  PARTITION  NAME  USER      ST  TIME  NODES  NODELIST(REASON)
1      debug      wrap  dhdaines  PD  0:00  2      (BeginTime)
I'm not familiar with Slurm, having only used SGE and UGE in the past, so I'm sorry if I'm missing something obvious here. I tried holding and releasing the job, and also submitting another job, but the result was the same.
Is the tutorial out of date? Or is my account perhaps not able to deploy a cluster? (I'm still on the free trial.)
Any hints would be greatly appreciated.