Best way to make bucket available on all compute nodes


Julian Büchel

unread,
May 7, 2020, 4:55:01 AM5/7/20
to google-cloud-slurm-discuss
Hi,

I have successfully mounted a bucket in my controller node using gcsfuse, but my compute nodes only see the folder without the content.
I was wondering, what is the best way to give all the compute nodes access to a bucket containing data that only needs to be read?

I was thinking maybe the best way would be to simply mount the bucket in the controller node and then it would be available in the compute nodes as well.
Mounting it on each compute node seems unnecessary, but I am fairly new to this so maybe that's how it is done.

Thanks!

-Julian

Julian Büchel

unread,
May 7, 2020, 5:02:12 AM5/7/20
to google-cloud-slurm-discuss
Or can the bucket even be permanently mounted in the image? That way the image could be reused by other users in the company. By the way, how do you specify which image a Slurm cluster should use? Right now I believe it creates a new one, and you can update it by creating an instance from it, but how can I just use a previously created one?

Joseph Schoonover

unread,
May 7, 2020, 8:42:11 AM5/7/20
to google-cloud-slurm-discuss
Hey Julian,
You may want to check out our marketplace solution, Fluid-Slurm-GCP (https://console.cloud.google.com/marketplace/details/fluid-cluster-ops/fluid-slurm-gcp).

On this system, we have a tool called cluster-services that you can use to manage mounts among other things.

Essentially, you could do the following

Become root:
sudo su

Create a cluster-config file:
cluster-services list all > config.yaml

Edit the config.yaml to include a mounts block with the following schema:

mounts:
  - owner: root
    group: root
    mount_directory: /path/to/mount
    server_directory: gs://your_bucket
    permission: '755'
    protocol: gcs

You may need to remove the existing empty mounts array in the config.yaml that was generated previously.

Once this looks good to you, preview your update with
cluster-services update mounts --config=config.yaml --preview

Then apply the changes with
cluster-services update mounts --config=config.yaml

This will mount the bucket on the current node you're working on and update a global config that will cause compute nodes to automatically mount the bucket.

If you have any issues with this feature, you can report them to the marketplace issue tracker for this solution at

https://help.fluidnumerics.com/slurm-gcp/issue-tracker

You can find more general documentation, including codelabs on our site at

https://help.fluidnumerics.com/slurm-gcp

Schema documentation for cluster-services can be found at
https://fluid-slurm-gcp-schemas.firebaseapp.com/

Julian Büchel

unread,
May 7, 2020, 9:05:35 AM5/7/20
to google-cloud-slurm-discuss
This seems like something that should be solvable without an extension. I only need this one small feature, and paying the cost of the extension for it does not make sense to me. But thank you for your help!

Wyatt Gorman

unread,
May 7, 2020, 11:55:07 AM5/7/20
to Julian Büchel, google-cloud-slurm-discuss
Hi Julian, there's no need to use the Fluid Numerics offering to make GCSFuse work. Mounting things on the controller node does not automatically make them accessible to the rest of the cluster, only to the controller node. You need to specify the network storage information in the partition where you want that storage mounted, and also grant the service account your cluster is using (the default, unless you specified otherwise in the YAML) access to the bucket and its contents. You can read more in the Cloud Storage IAM documentation. At a minimum, you'll want to give your service account read and/or write access to the bucket.
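
For reference, one way to grant that access from the command line is with gsutil. This is just a sketch with placeholder values; substitute your cluster's service account email and your own bucket name:

# Grant read-only access to objects in the bucket (use roles/storage.objectAdmin if write access is needed).
gsutil iam ch serviceAccount:my-cluster-sa@my-project.iam.gserviceaccount.com:roles/storage.objectViewer gs://my-bucket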

Regarding using an existing image: simply specify the image family you want in the "compute_image_family" field for each partition in the YAML, and Slurm will automatically use the latest image in that family. If you have an image you customized from a previous Slurm deployment, you can specify that deployment's image family name in the "compute_image_family" field; however, this assumes Slurm is installed and configured the way the Slurm GCP scripts would do it. Alternatively, you can configure the cluster's image manually on the image node, programmatically using the custom-*-install scripts, or via the NFS server and/or modules by referring to the "Installing Apps in a Slurm Cluster on Compute Engine" guide.
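
As a rough sketch, a partition entry in the YAML might look like the following. Field names other than "compute_image_family" are illustrative and may differ between slurm-gcp versions, and all values are placeholders:

partitions:
  - name: batch
    machine_type: n1-standard-4
    compute_image_family: my-custom-slurm-image
    max_node_count: 10
    zone: us-central1-a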


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc





Julian Büchel

unread,
May 8, 2020, 1:33:38 AM5/8/20
to google-cloud-slurm-discuss
Thanks for the reply. Regarding "You need to specify the network storage information in the partition where you want that storage mounted": how do I do that using the Slurm config YAML file?

Wyatt Gorman

unread,
May 11, 2020, 10:11:50 AM5/11/20
to Julian Büchel, google-cloud-slurm-discuss
Hi Julian,

The "network_storage" field in the partition section of the YAML is where you can specify network storage. You can read more about mounting with gcsfuse and the mount options. The network_storage field will automatically install the gcsfuse client.

For example, to mount a gcs bucket named "my-bucket" you could use a network_storage field like this:

network_storage:
  - server_ip: gcs
    remote_mount: my-bucket
    local_mount: /gcs
    fs_type: gcsfuse
    mount_options: rw,_netdev,uid=55555555,gid=55555555

If you don't specify your uid and gid, the bucket will be mounted as root and your access will be limited. You can find your OS Login UID and GID by running "gcloud compute os-login describe-profile".
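
For example, the relevant fields in the (abridged, illustrative) output look something like this; your values will differ:

gcloud compute os-login describe-profile
  posixAccounts:
  - gid: '1008689061'
    uid: '1008689061'
    username: jdoe_example_com

Those uid and gid values are what go into the mount_options line above.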

You also have to ensure that the service account being used by the instances (either your default or the one you specified in your YAML) has permissions to access the bucket you're mounting.


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc




Julian Büchel

unread,
May 12, 2020, 10:36:20 AM5/12/20
to google-cloud-slurm-discuss
Thanks for the reply. I tried your suggestion, but somehow the contents are still missing. Gcsfuse is installed on the compute node, but /data is empty (I replaced /gcs with /data). I changed the mount options as you said.



Wyatt Gorman

unread,
May 12, 2020, 11:08:22 AM5/12/20
to Julian Büchel, google-cloud-slurm-discuss
Can you verify the contents of /etc/fstab to make sure the GCS entry aligns with the suggested fstab entry on the GCSFuse mounting page? Then please run "sudo mount -a" and check whether any message is printed about mounting GCSFuse.

Thanks,


Wyatt Gorman

HPC Solutions Manager

https://cloud.google.com/hpc




Julian Büchel

unread,
May 12, 2020, 11:43:29 AM5/12/20
to google-cloud-slurm-discuss
It shows:

bucket-name   /data     gcsfuse     rw,_netdev,uid=1008689061,gid=1008689061,nonempty     0 0

when I execute:

cat /etc/fstab

and nothing happens when I execute:

sudo mount -a

Keith Binder

unread,
May 12, 2020, 12:08:45 PM5/12/20
to Julian Büchel, google-cloud-slurm-discuss
Do you know what the access scopes are for the host you are on? Can you run

curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/?recursive=true" -H "Metadata-Flavor: Google"

and retrieve the scopes?
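
For reference, the response is JSON along these lines (abridged and illustrative), with the "scopes" array listing the access scopes granted to the instance:

{"default": {"aliases": ["default"], "email": "123456789-compute@developer.gserviceaccount.com", "scopes": ["https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write"]}}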


Keith Binder

kbi...@google.com

Customer Engineer

Mobile: 201-887-6974




Julian Büchel

unread,
May 12, 2020, 12:10:59 PM5/12/20
to google-cloud-slurm-discuss

Joseph Schoonover

unread,
Jun 19, 2020, 1:44:30 PM6/19/20
to google-cloud-slurm-discuss
Hey Julian,
I wanted to check in to see if you were able to resolve this issue.

Julian Büchel

unread,
Jun 19, 2020, 1:51:22 PM6/19/20
to google-cloud-slurm-discuss
Hi,
I just unmounted and then mounted the storage in a startup script. It's not the most elegant solution, but it works well for me, so I didn't want to spend any more time on this.
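
The script itself isn't shown here, but a minimal sketch of that approach, assuming the bucket already has an /etc/fstab entry at /data, could look like:

#!/bin/bash
# Hypothetical startup-script sketch; the mount point /data is a placeholder.
fusermount -u /data 2>/dev/null || true   # drop any stale or empty FUSE mount
mount /data                               # remount using the existing /etc/fstab entry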

Joseph Schoonover

unread,
Jun 19, 2020, 2:12:03 PM6/19/20
to google-cloud-slurm-discuss
Glad to hear you have a solution.
For the community, the compute instances need to have the storage-full auth scope (https://www.googleapis.com/auth/devstorage.full_control) and the compute engine service account needs the storage admin IAM role. In terms of the mount options, we recommend using either

rw,_netdev,implicit_dirs,user

or

rw,_netdev,implicit_dirs,uid=UID,gid=GID

in your /etc/fstab file.
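
For example, an /etc/fstab entry using the first option set might look like this (bucket name and mount point are placeholders):

your-bucket   /path/to/mount   gcsfuse   rw,_netdev,implicit_dirs,user   0 0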

Both option sets allow users to mount the bucket within their job script (mount /path/to/mount). The former can be useful if you have multiple trusted colleagues working on your cluster who all need access to the bucket. The latter limits mount capability to the uid and gid provided in the options.

In both cases, I've found that users must run

mount /path/to/mount 

at the beginning of their job, where /path/to/mount is the local directory the bucket is mounted to. I've found that, even when root mounts the bucket during startup, it's not necessarily available to users. However, the user option or the uid=UID,gid=GID options give users permission to mount the bucket themselves.
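
As a minimal sketch, a job script using this pattern might look like the following (the mount point and workload are placeholders):

#!/bin/bash
#SBATCH --job-name=read-bucket
#SBATCH --ntasks=1

# Mount the bucket first; this works because of the 'user' (or uid=/gid=) mount option in /etc/fstab.
mount /path/to/mount

# ... run the workload against the mounted data ...
ls /path/to/mount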

TJ

unread,
Nov 3, 2020, 8:02:11 PM11/3/20
to google-cloud-slurm-discuss
I'd like to do this as well. Does anyone have a complete set of instructions on how to do it?

TJ

unread,
Nov 3, 2020, 8:06:19 PM11/3/20
to google-cloud-slurm-discuss
So, would this be in addition to the GCSFuse setup?