mounting a bucket on all nodes

Arthur Gilly

unread,

May 26, 2021, 5:10:07 AM5/26/21

to google-cloud-slurm-discuss

Hi!

The question has been asked before, but the answers are not consistent. I was wondering if there is a definitive list of flags to be used in tfvar files to mount a bucket on all nodes to a fixed location on the VM's filesystems. I have found that no matter what flags I used for mount_options, I end up with a folder that has permissions:

d?????????? ? ? ? ? ?

I have a bucket named my-bucket and the command cd && mkdir test && gcsfuse my-bucket test works post-terraform on the slurm-controller. I would like to have such a mount available on all my nodes, configured at the terraform stage.

According to help provided here and elsewhere on the internet, I added the following to the tfvars file:

network_storage = [{

server_ip = "gcs"

remote_mount = "my-bucket"

local_mount = "/gcs"

fs_type = "gcsfuse"

mount_options = "rw,_netdev,user"

}]

I also tried adding the uid and gid corresponding to my user instead of user. Post-terraform I get a folder with question mark permissions as above, unreadable. When I do, as user (without sudo) mount /gcs, I get a permission denied. If I create another mountpoint, modify fstab to point to that with the same mount options, sudo mount -a works, but the permissions are the same.

So my questions are:

- Is there any way to make this work, or should I prefix all my jobs/scripts with a gcsfuse command to a local user-owned directory? that seems extreme.

- If yes, what is the best place and ownership to create a bucket mount? in / or in /home, owned by root or by the user?

- If the ownership is user, how do I create a mountpoint automatically with the right permissions at terraform stage?

- if there is no way to do this, could this be done with a root-owned directory and o+rwx permissions? (I think I remember chmod can be specified in fstab). What are the risks of this?

Many thanks,

Arthur

Arthur Gilly

unread,

May 28, 2021, 3:42:36 AM5/28/21

to google-cloud-slurm-discuss

Prefixing a script by a mount command does not work either. I tried adding:

if [[ ! -f /tmp/data/testfile ]]; then

mkdir -p /tmp/data

gcsfuse --key-file /home/myuser/mykeyfile my-bucket /tmp/data

fi

to the beginning of my scripts which I then sbatch. Unfortunately, although this completes the first time round, any subsequent -f or ls or any attempt at accessing the mountpoint results in the following error:

cannot access /tmp/data: Transport endpoint is not connected

So I am still looking for a way to mount a bucket on all nodes. I am curious, since this problem is not widely reported, do people not output job results to buckets, do they instead pay for a large drive that they then mount as NFS?

Cheers,

Arthur

Joseph Schoonover

unread,

May 28, 2021, 9:20:25 AM5/28/21

to Arthur Gilly, google-cloud-slurm-discuss

Hey Arthur,

I can only speak for our use cases and those of our customers. We primarily use either the controller, a standalone instance, or Filestore as an NFS filesystem. When jobs are complete data is migrated to buckets for long term storage.

IMO, a good solution for a demo is to use the controller as your filesystem and add gsutil cp calls to your job script at the end. Make sure your compute node service account has the Storage Object Admin IAM role so that you can copy objects from the compute nodes to GCS buckets.

File io performance for GCSFuse is not great in comparison to these other solutions.

--
You received this message because you are subscribed to the Google Groups "google-cloud-slurm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-slurm-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-slurm-discuss/8f3c1ca6-3e22-4121-8677-1f06795c3ca5n%40googlegroups.com.

Alex Chekholko

unread,

May 28, 2021, 12:34:27 PM5/28/21

to Arthur Gilly, google-cloud-slurm-discuss

Hi,

In our case, I usually just use a large disk on the controller; GCP supports single disks up to 64TB. Of course, if your jobs are I/O heavy, there will be a bottleneck as they are all doing I/O to that one disk over that one NIC on the controller node. You can also adjust the instance size up on the node and maybe fiddle with the NFS threads, etc. But that's been simpler for me than figuring out how to use the bucket mounts.

Regards,

Alex

--

Joseph Schoonover

unread,

May 28, 2021, 1:55:23 PM5/28/21

to Alex Chekholko, Arthur Gilly, google-cloud-slurm-discuss

+1 on Alex's note on performance. You can also use a Lustre File system to scale performance of file IO quite nicely.

The content of this email is confidential and intended for the recipient specified in message only. It is strictly forbidden to share any part of this message with any third party, without a written consent of the sender. If you received this message by mistake, please reply to this message and follow with its deletion, so that we can ensure such a mistake does not occur in the future.

Dr. Joseph Schoonover

Chief Executive Officer

Senior Research Software Engineer

j...@fluidnumerics.com

To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-slurm-discuss/CANcy_PbcUSO793ijrQhKK6mFR-jMntQziQwpDB8bp9pMLLYgaQ%40mail.gmail.com.

Arthur Gilly

unread,

Aug 19, 2021, 4:14:51 AM8/19/21

to google-cloud-slurm-discuss

Hi ! just updating this since I finally managed to do it. To mount a bucket on all nodes, do the following:

add the Storage Admin IAM role to the default service account at https://console.cloud.google.com/iam-admin/iam (I also added Cloud Storage Admin, not sure if redundant)
make the following changes to basic.tfvars:

network_storage = [{

server_ip = "gcs"

remote_mount = "your-bucket-name"

local_mount = "/gcs"

fs_type = "gcsfuse"

mount_options = "rw,_netdev,user,file_mode=664,dir_mode=775,allow_other"

}]

login_node_service_account = "default"

login_node_scopes = [

"https://www.googleapis.com/auth/monitoring.write",

"https://www.googleapis.com/auth/logging.write",

"https://www.googleapis.com/auth/devstorage.full_control"

]

compute_node_service_account = "default"

compute_node_scopes = [

"https://www.googleapis.com/auth/monitoring.write",

"https://www.googleapis.com/auth/logging.write",

"https://www.googleapis.com/auth/devstorage.full_control"

]

Kevin Deitz

unread,

Jun 2, 2023, 12:20:28 PM6/2/23

to google-cloud-slurm-discuss

Thanks a lot for sharing this. If anyone has encountered a similar problem using hpc-tookit, the following module worked for me:

  - id: my_bucket_module_id
    source: modules/file-system/pre-existing-network-storage
    settings:
      remote_mount: my_bucket_name
      local_mount: /mnt/my_bucket_mnt
      fs_type: gcsfuse
      mount_options: rw,_netdev,user,file_mode=664,dir_mode=775,allow_other

Reply all

Reply to author

Forward