Failing to add compute nodes in a shared VPC


Hiroshi Kobayashi

Jun 24, 2021, 1:46:54 AM
to google-cloud-slurm-discuss

Hi everyone,


I deployed a Slurm cluster via Terraform into a shared VPC.

The cluster and the shared VPC are hosted in separate projects.


The deployment itself looks OK. A login node and a controller node are up and running.

But when I submit a test job, the compute nodes do NOT come up.

According to the API log, launching the compute nodes fails because of an unknown subnet:


status: {
  code: 7
  message: "Required 'compute.subnetworks.use' permission for 'projects/<projectname>/regions/us-west2/subnetworks/kobayashi-dev-us-west2'"


The subnet name "kobayashi-dev-us-west2" is not correct. My shared VPC subnet name is set in the tfvars file, and it is "us-west2-prod-hpc".

"kobayashi-dev-us-west2" is the cluster name combined with the region name.

I guess that when the shared VPC is enabled, the wrong subnet name is passed to the Compute API.


Any ideas on how to fix this wrong subnet name?


Thank you,

Hiroshi Kobayashi

Hiroshi Kobayashi

Jun 24, 2021, 2:00:48 AM
to google-cloud-slurm-discuss
Here is the tfvars file.

On Thursday, June 24, 2021 at 14:46:54 UTC+9, Hiroshi Kobayashi wrote:
kobayashi-dev.tfvars.txt

Wyatt Gorman

Jun 24, 2021, 10:04:40 AM
to Hiroshi Kobayashi, google-cloud-slurm-discuss
Hi Hiroshi,

Please make sure you fill in the "vpc_subnet" field in the partition definition with your desired subnet. Otherwise, with it set to null as it is currently, Slurm will try to create a new subnet in the network specified for that partition, and you could run into the issues you've seen here.
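
To illustrate the point above, here is a minimal Python sketch. It is not slurm-gcp code: the function names and the "debug"/"batch" partition names are hypothetical, and the fallback rule (cluster name joined with region when "vpc_subnet" is null) is only an assumption inferred from the subnet name reported in this thread.

```python
from typing import Dict, List, Optional


def effective_subnet(cluster_name: str, region: str, vpc_subnet: Optional[str]) -> str:
    """Return the subnet name a partition would end up referencing.

    Assumption taken from this thread: with vpc_subnet unset, the name
    falls back to "<cluster_name>-<region>".
    """
    if vpc_subnet:
        return vpc_subnet
    return f"{cluster_name}-{region}"


def partitions_missing_subnet(partitions: List[Dict]) -> List[str]:
    """List the names of partitions whose vpc_subnet is None or empty."""
    return [p["name"] for p in partitions if not p.get("vpc_subnet")]


if __name__ == "__main__":
    # Hypothetical partition definitions, mirroring only the fields discussed here.
    partitions = [
        {"name": "debug", "vpc_subnet": None},
        {"name": "batch", "vpc_subnet": "us-west2-prod-hpc"},
    ]
    for name in partitions_missing_subnet(partitions):
        fallback = effective_subnet("kobayashi-dev", "us-west2", None)
        print(f"partition {name}: vpc_subnet is unset, would fall back to {fallback}")
```

Running a check like this against the partition list before applying the Terraform plan would flag any partition that still relies on the fallback name.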

Let us know if that fixes the issue. We're also working on highlighting this better in the documentation.

Thanks,


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc





Hiroshi Kobayashi

Jun 24, 2021, 7:02:58 PM
to google-cloud-slurm-discuss
Hi Wyatt,

I was able to resolve this issue by putting the subnet name into the "vpc_subnet" field in the partition definition.
Thank you so much!

Hiroshi Kobayashi

On Thursday, June 24, 2021 at 23:04:40 UTC+9, Wyatt Gorman wrote: