Nodes not spinning up: "Zone does not currently have sufficient capacity"


Tomás Di Domenico

Jul 28, 2022, 9:26:50 AM
to google-cloud-slurm-discuss
Our slurm-gcp instance has suddenly stopped spinning up nodes. resume.log contains the following message: "Zone does not currently have sufficient capacity for the requested resources."

I can create an equivalent machine manually in the same project and zone through the console with no problems.

The message seems to be similar to what used to happen to us when spinning up ultramem nodes (see https://groups.google.com/g/google-cloud-slurm-discuss/c/0yfM_9tKxes/m/ABAMPR89BAAJ).

Any pointers would be appreciated.

Cheers!

Tomás Di Domenico

Jul 29, 2022, 5:19:03 AM
to google-cloud-slurm-discuss
It's come back to life on its own after a while. Any insights would still be appreciated, in case anyone's aware of why that happens.

Thanks!

Joseph Schoonover

Jul 29, 2022, 8:27:25 AM
to Tomás Di Domenico, google-cloud-slurm-discuss
Hey Tomás,
During times of peak usage in the zone you've chosen, Google may not have enough capacity to fulfill what the resume script is requesting.

You may consider setting regional_capacity = true in that partition. This doesn't always resolve the problem, but it lets the bulk API request resources from multiple data centers in the same region, which may make the error occur less frequently.
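
To illustrate why a regional request can help, here is a minimal Python sketch. It is not slurm-gcp's code and does not call any Google API: the zone names, the capacity numbers, and the functions create_in_zone and create_in_region are all hypothetical, and the zone-by-zone loop is a simplification of letting the service choose a zone within the region.

```python
class ZoneExhausted(Exception):
    """Stand-in for the 'sufficient capacity' error seen in resume.log."""


def create_in_zone(zone, count, capacity):
    """Pretend to create `count` nodes in one zone (hypothetical helper).

    `capacity` maps zone name -> free slots. Real capacity is not visible
    to users; these numbers exist only for the illustration.
    """
    if capacity.get(zone, 0) < count:
        raise ZoneExhausted(
            f"{zone}: Zone does not currently have sufficient capacity "
            "for the requested resources."
        )
    capacity[zone] -= count
    return zone


def create_in_region(zones, count, capacity):
    """Try each zone of a region in turn and return the first that fits."""
    for zone in zones:
        try:
            return create_in_zone(zone, count, capacity)
        except ZoneExhausted:
            continue  # this zone is full, try the next one
    # Every zone was full, so the regional request fails as well.
    raise ZoneExhausted("no zone in the region has sufficient capacity")


if __name__ == "__main__":
    free = {"zone-a": 0, "zone-b": 4, "zone-c": 1}  # made-up numbers
    print(create_in_region(["zone-a", "zone-b", "zone-c"], 2, free))
```

A request pinned to "zone-a" fails here, while the regional request lands in another zone. If every zone is full the regional request still fails, which matches the point that this setting does not always resolve the problem.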


Tomás Di Domenico

Jul 29, 2022, 9:32:18 AM
to google-cloud-slurm-discuss
Thanks Joseph. I was just curious, since I could create a machine with the same characteristics manually even when the cluster couldn't.

I'll check out your suggestion and see how it goes.