/home and /apps on separate permanent disks


Christoph

Nov 4, 2021, 8:25:26 AM
to google-cloud-slurm-discuss
What would be the most straightforward way of separating user data, i.e. /home and /apps, from the controller boot disk, so that the data persists when I destroy the cluster?

I know it's possible to have a separate Filestore NAS mounted, but that seems like overkill since the controller is already essentially acting as a NAS. Ideally I would just like to have a permanent disk for the user data that gets mounted on the controller, but I'm not sure how to tell terraform to attach an existing disk, or how to get the controller to use it for the /home and /apps directories.

Any help is appreciated!

Alex Chekholko

Nov 4, 2021, 1:45:42 PM
to Christoph, google-cloud-slurm-discuss
Hi,

The way I've been handling that without having to modify any of the terraform code is to have a "transfer bucket" or two: you copy (gsutil rsync) everything from those directories into the bucket, and then copy it back onto the new cluster.

In our case the main goal is to avoid GCP egress charges, and if you can keep everything in one region, the transfer costs to/from the bucket should be pretty low. I think it works fine for up to ~10TB; beyond that you may want to parallelize your copies, as the transfers take hours/days.

Regards,
Alex
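
[Editor's note] The transfer-bucket approach described above could be scripted roughly as follows. This is only a sketch, not what Alex actually runs: the bucket name `my-transfer-bucket` and the helper `rsync_commands` are hypothetical, and the script only builds and prints the `gsutil -m rsync -r` commands rather than executing them.

```python
import shlex


def rsync_commands(bucket, dirs=("/home", "/apps"), restore=False):
    """Build one `gsutil -m rsync -r` command per directory.

    backup  (restore=False): local directory -> gs://<bucket>/<name>
    restore (restore=True):  gs://<bucket>/<name> -> local directory
    The bucket name is an assumption supplied by the caller.
    """
    commands = []
    for d in dirs:
        remote = f"gs://{bucket}/{d.strip('/')}"
        src, dst = (remote, d) if restore else (d, remote)
        # -m runs the copy in parallel, -r recurses into subdirectories.
        commands.append(["gsutil", "-m", "rsync", "-r", src, dst])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in rsync_commands("my-transfer-bucket"):
        print(shlex.join(cmd))
```

Run with `restore=True` on the new cluster to build the commands that copy the data back. Note that plain `gsutil rsync` does not preserve file ownership or permissions by default; the `-P` option is worth considering for /home.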

Tom Downes

Nov 4, 2021, 2:06:58 PM
to Alex Chekholko, Christoph, google-cloud-slurm-discuss
I'll pass this particular issue on to the Cloud HPC team since it's one I've had concerns about when working with other schedulers.

My off-the-cuff advice would be to consider whether a scheduled snapshot and retention policy would address your concerns:

Christoph

Nov 5, 2021, 4:46:55 AM
to google-cloud-slurm-discuss
Thank you, though I am not sure snapshots of the controller disk would work in this case. I essentially want to separate user data from the rest of the configuration, so that I can modify the cluster (and hence the controller) in whatever way I like while still being certain to retain the users' data. I should mention that one reason I am trying to achieve this (besides it being, in my opinion, a reasonable approach in general) is that I've found terraform unable to update a cluster's configuration successfully. For example, if I modify a partition or add a new one, terraform apply does not apply those changes while retaining the login and controller nodes.

Christoph

Nov 5, 2021, 4:55:05 AM
to google-cloud-slurm-discuss
This might be a possible workaround, though it would still require a bit of work with automated backup schedules and manually loading the data back onto a new cluster. Perhaps I'm missing something, but wouldn't just having a separate permanent disk solely for user data with its own snapshot / retention policy be the most sensible solution here? On a NAS I would also expect OS data to normally be stored separately from the data it's serving.
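
[Editor's note] On the controller side, the separate-disk idea above might come down to a few /etc/fstab entries: one mounting the data disk, plus bind mounts onto /home and /apps. The sketch below only generates those lines. It assumes the disk is created outside the cluster's terraform state, is already formatted as ext4, and is attached under a device name such as `userdata` (so that it shows up as `/dev/disk/by-id/google-userdata`). The helper `fstab_entries`, the device name, and the mount point `/mnt/userdata` are all hypothetical, and the cluster's own setup scripts may manage /home and /apps differently.

```python
def fstab_entries(device_name, mount_point="/mnt/userdata",
                  dirs=("/home", "/apps"), fstype="ext4"):
    """Return /etc/fstab lines for a data disk plus bind mounts.

    device_name is the name the disk was attached under; GCP exposes
    attached disks as /dev/disk/by-id/google-<device_name>.
    """
    device = f"/dev/disk/by-id/google-{device_name}"
    # nofail lets the controller boot even if the data disk is missing.
    lines = [f"{device} {mount_point} {fstype} defaults,nofail 0 2"]
    for d in dirs:
        # Bind-mount <mount_point>/home onto /home, and so on.
        lines.append(f"{mount_point}{d} {d} none bind 0 0")
    return lines


if __name__ == "__main__":
    # Print the lines for review; nothing is written to /etc/fstab.
    print("\n".join(fstab_entries("userdata")))
```

Because the disk lives outside the cluster's lifecycle, destroying and recreating the cluster would leave it intact, and it could carry its own snapshot schedule.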

Alex Chekholko

Nov 5, 2021, 11:20:57 AM
to Christoph, google-cloud-slurm-discuss
Hey Christoph,

I see what you're saying about terraform updates; in my case I only use terraform for the initial deployment, and from there I make any modifications to the cluster the "old way", just by SSHing in and modifying the configuration. In my case, these are short-term clusters, so I don't usually need to save any of the config for the future.

I don't think terraform is suitable for changing the configuration files inside a running instance.

Regards,
Alex

Christoph

Nov 5, 2021, 11:57:26 AM
to google-cloud-slurm-discuss
I see, so perhaps it's more reasonable then to just use the terraform script for the initial build and then work on it from there.