Changing Number and Type of Compute Nodes


William Polik

Sep 26, 2020, 1:46:57 PM
to google-cloud-slurm-discuss
After one has deployed a cluster with a defined number (max_node_count: 10) and type (compute_image_machine_type: n1-standard-2, compute_image_disk_size_gb: 20) of compute_nodes, can one change those without redeploying the cluster?

Presumably one would let the cluster finish all existing jobs so that no compute jobs are running, edit some configuration files, and then restart Slurm or the cluster. Which configuration files need to be edited (slurm.conf)? And would one need to change anything on the compute node image?

I appreciate how much easier this is than physically adding additional nodes to one's hardware cluster!

Thanks,

Will


Joseph Schoonover

Sep 26, 2020, 1:59:58 PM
to William Polik, google-cloud-slurm-discuss
Hey William,
Are you using schedmd/slurm-gcp or one of the marketplace solutions (e.g., https://console.cloud.google.com/marketplace/details/fluid-cluster-ops/fluid-slurm-gcp )?

On the marketplace, you can certainly change your compute instance types and the Slurm partition layout on the fly. Yesterday we had a livestream on setting up multi-region and multi-zone compute partitions that covers how to modify partitions after deployment: https://youtu.be/6cvO-d5ebig


William Polik

Sep 27, 2020, 4:57:36 PM
to google-cloud-slurm-discuss
We are using "Deploy an Auto-Scaling HPC Cluster with Slurm": https://codelabs.developers.google.com/codelabs/hpc-slurm-on-gcp

Wyatt Gorman

Sep 28, 2020, 11:53:57 AM
to William Polik, google-cloud-slurm-discuss
Hi William,

To change partition configurations in the open-source Slurm on GCP deployment, spin down the instances in the partition, then modify the partition specifications in /apps/slurm/scripts/config.yaml and /apps/slurm/current/etc/slurm.conf to your liking. Next, restart slurmctld on the Slurm controller with "sudo systemctl restart slurmctld" and restart any running slurmd daemons on nodes in the cluster. After that, you can launch jobs in that partition to spin up instances with the new configuration.
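For anyone following along, the sequence above could be sketched roughly as the dry-run plan below. It only prints commands and runs nothing. The two file paths and the slurmctld restart come from the reply above; the use of squeue to check for running jobs, "scontrol update ... state=power_down" as the way to spin nodes down, sudoedit as the editor, and the partition and node names are all assumptions for illustration, so adjust them to your own cluster.

```python
# Dry-run sketch: builds the ordered list of commands, does not execute them.
# Paths below are the two files named in the reply above.
CONFIG_FILES = [
    "/apps/slurm/scripts/config.yaml",
    "/apps/slurm/current/etc/slurm.conf",
]


def reconfigure_plan(partition, idle_nodes=(), other_running_nodes=()):
    """Return (where, command) pairs for reconfiguring one partition.

    partition, idle_nodes and other_running_nodes are hypothetical names
    supplied by the caller; nothing here is discovered from the cluster.
    """
    plan = []
    # Assumption: check that the partition has no queued or running jobs.
    plan.append(("controller", f"squeue --partition={partition}"))
    # Assumption: power_down is one way to spin down idle cloud nodes.
    for node in idle_nodes:
        plan.append(
            ("controller", f"sudo scontrol update nodename={node} state=power_down")
        )
    # Edit the partition specifications by hand in both files.
    for path in CONFIG_FILES:
        plan.append(("controller", f"sudoedit {path}"))
    # Restart the controller daemon (command quoted in the reply above).
    plan.append(("controller", "sudo systemctl restart slurmctld"))
    # Restart slurmd on any compute nodes that are still up.
    for node in other_running_nodes:
        plan.append((node, "sudo systemctl restart slurmd"))
    return plan


if __name__ == "__main__":
    for where, cmd in reconfigure_plan("debug", ["debug-compute-0"], ["other-compute-0"]):
        print(f"[{where}] {cmd}")
```

Printing the plan first makes it easy to review the order of operations before running anything by hand on the controller.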

Generally people who use a large number of different configurations in a single cluster will create partitions for each of these configurations and keep them with 0 running nodes until they need to use them, so that they don't need to reconfigure their cluster on the fly often.
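As an illustration of that one-partition-per-configuration layout, here is a minimal sketch that prints generic slurm.conf NodeName and PartitionName lines for two partitions. It uses standard Slurm syntax with State=CLOUD nodes, but it is not necessarily what the Slurm on GCP scripts generate. The partition names, node counts, CPU counts, and RealMemory values are placeholders I made up, and the machine type itself would still be set in config.yaml.

```python
def partition_lines(name, node_count, cpus, real_memory_mb, default=False):
    """Return generic slurm.conf lines for one partition of cloud nodes.

    All arguments are hypothetical values chosen by the caller.
    """
    # Slurm hostlist range, e.g. small-[0-9] for ten nodes.
    nodes = f"{name}-[0-{node_count - 1}]"
    node_line = (
        f"NodeName={nodes} CPUs={cpus} RealMemory={real_memory_mb} State=CLOUD"
    )
    part_line = (
        f"PartitionName={name} Nodes={nodes} "
        f"Default={'YES' if default else 'NO'} MaxTime=INFINITE State=UP"
    )
    return [node_line, part_line]


if __name__ == "__main__":
    # Placeholder layout: a small default partition and a larger-node one.
    for line in partition_lines("small", 10, 2, 7000, default=True):
        print(line)
    for line in partition_lines("big", 4, 16, 60000):
        print(line)
```

Because the nodes are defined as cloud nodes, each partition can sit with no running instances until a job is submitted to it.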

Let us know how else we can help. Thanks,


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc



