Mount NFS in basic.tfvars

Jerry Huang

Mar 24, 2022, 10:57:43 PM
to google-cloud-slurm-discuss
Hi folks,

I followed https://github.com/SchedMD/slurm-gcp to set up Slurm on GCP, and created a Filestore instance with the mount point 10.x.x.x:/slurm_nfs. I also modified the basic.tfvars file as follows:

network_storage = [{
  server_ip     = "10.x.x.x"
  remote_mount  = "/slurm_nfs"
  local_mount   = "/scratch"
  fs_type       = "nfs"
  mount_options = null
}]

The cluster was built successfully. However, running "df -h" on the login0 node showed no NFS storage mounted. Is there anything else I need to change, say network_name or subnetwork_name?

Many thanks.

Alex Chekholko

Mar 25, 2022, 11:30:27 AM
to Jerry Huang, google-cloud-slurm-discuss
Hi Jerry,

I have not tried Filestore myself, but as a simpler alternative it may be easier to just upsize the CPU and disk on the controller node (and then also log in there and add more nfsd threads). The capacity limits of regular persistent disks and Filestore are similar enough. And if your workload is really that I/O-limited, you may be better off with a single giant instance anyway; no need for a cluster.
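On the tfvars side, the upsizing would look something like the sketch below. The variable names are my assumption of what the slurm-gcp basic.tfvars example exposes for controller sizing, and the sizes are placeholders, so verify both against your release. The nfsd thread count still has to be raised by hand on the controller after it boots.

# Assumed controller-sizing variables from slurm-gcp's basic.tfvars;
# names and values are illustrative only, verify against your release.
controller_machine_type = "n2-highmem-16"
controller_disk_type    = "pd-ssd"
controller_disk_size_gb = 2048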

my 2c.

Regards,
Alex


Jerry Huang

Mar 25, 2022, 12:10:54 PM
to google-cloud-slurm-discuss
Thanks, Alex. Will give it a try.

Wyatt Gorman

Mar 25, 2022, 2:35:34 PM
to Jerry Huang, google-cloud-slurm-discuss
Hi Jerry,

Thanks for reaching out. Your fix is a simple one, and you can read more details in the Slurm User Guide (https://goo.gle/slurm-gcp-user-guide). The "network_storage" field applies to the Slurm worker instances, while the "login_network_storage" field applies to the controller and login instances; this makes it possible to isolate the storage available to the workers from what the controller and login instances see. If you copy the entry you have into the "login_network_storage" field, the NFS mount should appear as expected.
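Concretely, reusing the values from your first message (10.x.x.x being your Filestore IP), the added entry would look like this:

login_network_storage = [{
  server_ip     = "10.x.x.x"
  remote_mount  = "/slurm_nfs"
  local_mount   = "/scratch"
  fs_type       = "nfs"
  mount_options = null
}]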

You can certainly increase the size and VM type of the controller instance to improve the performance of the controller's NFS server, but it will only scale so far in both performance and capacity, does not persist through cluster redeployments, and has some other limitations compared to solutions like Filestore that you'll want to consider.

Let us know how that works and if you have any questions!


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc




Jerry Huang

Mar 26, 2022, 9:52:03 PM
to google-cloud-slurm-discuss
Hi Wyatt,

It works. I should have read the documentation more carefully.
Thanks a lot.

Best,
Jerry

Wyatt Gorman

Mar 28, 2022, 10:49:10 AM
to Jerry Huang, google-cloud-slurm-discuss
Glad to hear it, Jerry. Let us know if you have any more questions!


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc


