Hello,
I am following the tutorial for setting up a Slurm cluster which is located here:
I am deploying in zone us-east1-c with the standard configuration for the login, image, and controller VMs, but for compute I am using 10 n2-standard-16 VMs.
When I run the test step, sbatch -N2 --wrap="srun hostname", I get no output from the job: it sits in the configuring state for 4-5 minutes, drops back to the pending state, and then repeats.
sinfo reveals the following:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      3   mix# runs-compute-0-[0-2]
debug*       up   infinite      7  idle~ runs-compute-0-[3-9]
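For reference, here is how I am reading the trailing characters in the STATE column, based on the NODE STATE CODES section of the sinfo man page. This is only a small sketch: decode_state is a hypothetical helper name of my own, not part of Slurm, and the list of flags may not be complete for every Slurm version.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): map the trailing flag of a
# sinfo node state such as "mix#" or "idle~" to its documented meaning.
decode_state() {
  case "$1" in
    *'*') echo "not responding" ;;
    *'~') echo "powered off (power saving)" ;;
    *'#') echo "powering up or being configured" ;;
    *'%') echo "powering down" ;;
    *'$') echo "in a maintenance reservation" ;;
    *'@') echo "pending reboot" ;;
    *)    echo "no flag" ;;
  esac
}

decode_state 'mix#'    # prints: powering up or being configured
decode_state 'idle~'   # prints: powered off (power saving)
```

If that reading is right, three of the compute nodes appear to be stuck powering up, and the other seven are still powered down.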
Any ideas as to what went wrong or how to further diagnose the issue? Thanks in advance.
-Brandon