Mounting /home from a Secondary Disk


William Polik

Sep 26, 2020, 1:26:03 PM
to google-cloud-slurm-discuss
I have built several clusters from scratch. As we explore moving more resources from hardware to the cloud, I recently have been creating gcp-slurm elasticlusters, using mostly the default yaml settings. The fact that it worked on my first try is a testament to its robust design. Thank you!

I see that both the shared /home and /apps are part of the 50GB (default) disk associated with the controller node. We have preferred to have /home as its own separate disk, which is mounted by the controller node (actually a storage node) and then NFS-shared, rather than having /home as part of the controller's / directory. This is so that we can more easily do backups, resize the partition if necessary, and/or separate it from the OS should we decide to rebuild the cluster, which is now so easy to do in the cloud.

Is there a way for the controller node to mount a secondary disk as /home, and then NFS-share that among all the other nodes? Would one do that initially in the yaml file, or after the cluster is deployed? And would there be a steep performance penalty or other problems from using a secondary disk for the shared /home directories?

I see that this is mentioned in the following conversation:
but I don't see the specifics of how to do it.

Thanks!

Will

Wyatt Gorman

Sep 28, 2020, 1:29:59 PM
to William Polik, google-cloud-slurm-discuss
Hi William,

It sounds like hosting your /home directory on something like a Filestore NFS share would be much easier, more robust, and more persistent than a PD on the controller node. You can use a Filestore as your /home directory by using the network_storage field either per cluster or per partition to mount a Filestore instance at /home. Then you could use that same YAML configuration to deploy a new cluster with the same /home directory.

If you would truly prefer to have the controller instance itself host /home on a secondary PD, that is also possible with a little customization. You can use the YAML configuration to specify a secondary controller disk, which will be mounted on the instance. You would then modify /etc/exports to export your secondary disk as /home, unmount anything that has /home mounted, restart the NFS server, and remount /home on any clients.
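For anyone following along, the steps above can be sketched roughly as the script below. This is only a dry-run sketch under several assumptions that are not from this thread: the secondary disk appears as /dev/sdb and is already formatted, the NFS service is named nfs-server (as on CentOS 7), the export options shown are just an example, and clients have an /etc/fstab entry for /home. The helper names (run, controller_steps, client_steps) are made up for illustration. Mounting a disk over /home hides whatever is already there, so existing home directories would need to be copied onto the disk first (not shown).

```shell
#!/usr/bin/env bash
# Dry-run sketch: host /home on a secondary disk and NFS-export it.
# By default every command is only printed; set APPLY=1 (as root) to run them.
set -eu

DISK="${DISK:-/dev/sdb}"  # assumption: device name of the secondary disk (check with lsblk)
EXPORT_LINE="/home *(rw,no_subtree_check,no_root_squash)"  # assumption: example export options

# Print the command, or execute it when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Run on the controller: mount the disk as /home and re-export it.
controller_steps() {
  run mount "$DISK" /home
  echo "# make sure /etc/exports contains: $EXPORT_LINE"
  run exportfs -ra
  run systemctl restart nfs-server
}

# Run on each login/compute node: drop the old mount and remount /home.
client_steps() {
  run umount /home
  run mount /home   # assumption: /home is listed in the client's /etc/fstab
}

controller_steps
client_steps
```

Run as-is, it only prints the commands so you can review them; nothing is changed until APPLY=1 is set.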


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc




--
You received this message because you are subscribed to the Google Groups "google-cloud-slurm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-slurm-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-slurm-discuss/f6914963-411a-4c43-90e7-42fa3be54ec7n%40googlegroups.com.

Bo Langgaard Lind

Nov 13, 2020, 9:23:06 AM
to google-cloud-slurm-discuss
Is there a guide on how to set up a GCP Filestore NFS share to be mounted on the cluster machines?

Kozo Nishida

Jan 30, 2021, 7:00:18 AM
to google-cloud-slurm-discuss
I also need that guide.
The documentation below doesn't give an example of using NFS in a slurm-gcp environment.

Bo Langgaard Lind

Jan 31, 2021, 3:44:25 PM
to google-cloud-slurm-discuss
It's not too complicated, and actually already included, commented out, in the example yaml file:

    network_storage         :
       - server_ip: 10.1.2.3
         remote_mount: /export
         local_mount: /home
         fs_type: nfs

As discussed previously, you can test it with a GCP Filestore instance, but do note that Filestore instances don't respond to ping, which can make debugging difficult.
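Since ping is no help here, one alternative is to check whether the standard NFS TCP port (2049) is reachable from a cluster node before blaming the yaml. The snippet below is a minimal sketch, not an official procedure: nfs_port_open is a made-up helper name, the IP and export path are just the placeholder values from the yaml example above, and /mnt/fstest is a hypothetical scratch mount point.

```shell
# Sketch: test TCP reachability of an NFS server that does not answer ping.
# Uses bash's /dev/tcp feature via an explicit bash call, with a 3 second timeout.
nfs_port_open() {
  # $1 = server IP, $2 = TCP port (defaults to 2049, the standard NFS port)
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/${2:-2049}" 2>/dev/null
}

SERVER_IP="${SERVER_IP:-10.1.2.3}"        # placeholder from the yaml example
REMOTE_MOUNT="${REMOTE_MOUNT:-/export}"   # placeholder from the yaml example

if nfs_port_open "$SERVER_IP" 2049; then
  echo "NFS port reachable; try: sudo mount -t nfs $SERVER_IP:$REMOTE_MOUNT /mnt/fstest"
else
  echo "cannot reach $SERVER_IP:2049; check the VPC network and firewall rules"
fi
```

If the port is reachable and a manual mount works from a node, the problem is more likely in the deployment configuration than in the network path.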


Kozo Nishida

Jan 31, 2021, 6:24:17 PM
to google-cloud-slurm-discuss
I also tried this setting (I had already tried it by the time I posted),
but when I used `network_storage`, the Slurm deployment never finished and I gave up on it.
(As you said, there is no way to debug...)

That's why I wanted a "full" guide.

By the way, I confirmed that the GCP Filestore server can be mounted from the command line on the login node, without using this yaml setting.

