Hi everyone,
I deployed a Slurm cluster via Terraform into a shared VPC.
The cluster and the shared VPC are hosted in separate projects.
The deployment itself looks OK. A login node and a controller node are up and running.
But when I submit a test job, the compute nodes do NOT come up.
According to the API log, the compute nodes fail to launch because of an unknown subnet:
status: {
code: 7
message: "Required 'compute.subnetworks.use' permission for 'projects/<projectname>/regions/us-west2/subnetworks/kobayashi-dev-us-west2'"
The subnet name "kobayashi-dev-us-west2" is not correct. My shared VPC subnet name is set in tfvars, and it is "us-west2-prod-hpc".
"kobayashi-dev-us-west2" is the cluster name and the region name combined.
My guess is that when shared VPC is enabled, the wrong subnet name is passed to the Compute API.
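To make the mismatch concrete, here is a minimal sketch of what I think is happening. The helper names are hypothetical and this is not the actual Slurm-GCP code. I am assuming the cluster name is "kobayashi-dev", and "<host-project>" is a placeholder for the shared VPC host project.

```python
# Illustration only: hypothetical helpers, not the real deployment scripts.

def default_subnet_name(cluster_name: str, region: str) -> str:
    """Fallback name that appears to be used: '<cluster name>-<region>'."""
    return f"{cluster_name}-{region}"


def subnet_path(project: str, region: str, subnet: str) -> str:
    """Build a subnetwork path in the form shown in the API error message."""
    return f"projects/{project}/regions/{region}/subnetworks/{subnet}"


cluster_name = "kobayashi-dev"            # assumed from the name in the error
region = "us-west2"
configured_subnet = "us-west2-prod-hpc"   # the value I set in tfvars

# What the API log shows being requested
requested = subnet_path("<projectname>", region,
                        default_subnet_name(cluster_name, region))

# What I would expect with shared VPC (placeholder host project)
expected = subnet_path("<host-project>", region, configured_subnet)

print("requested:", requested)
print("expected: ", expected)
```

The first path matches the one in the error message, which is why I suspect the subnet name from tfvars is not being used for the compute nodes.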
Any ideas on how to fix this wrong subnet name?
Thank you,
Hiroshi Kobayashi