File bundles (with symlinks)?

Zachary Wright

Aug 11, 2025, 10:02:30 AM
to Discuss
Hello, 

We manage a large data lake and periodically grant requests for users to download a set of files, which are usually spread across multiple subdirectories within the data lake. For example, the user might want /subdir_1/file_a and /subdir_2/file_b, but we don't want them to have access to /subdir_1/file_c, /subdir_2/file_d, etc. (The users have signed a data use agreement to only see/get the requested files.) 

I've tried creating a guest collection on top of a folder with symlinks to the requested files, but that only works if the user has access to the entire data lake (or to each of the subdirs with the files). 

Short of actually copying the requested files into a directory for each request, are there any ways to achieve this? Thanks. 

-- Zach 

Robert Freeman-Day

Aug 11, 2025, 2:08:18 PM
to Discuss, zwr...@umich.edu
Hello, Zach.

Globus Connect Server has certain behaviors with symlinks and we have some documentation on this to get you a decent overview:


I don't know if you manage the GCS installation the data is housed on, but it may be better to reach out to support (via sup...@globus.org) and we may be able to look a little further into options.

Thanks!
Robert

Karl Kornel

Aug 11, 2025, 8:26:57 PM
to Zachary Wright, Discuss
Hi Zachary,

If the Globus folks aren’t able to find another solution, I can imagine one method, but it’s not perfect, and it would take a fair amount of work.

You could make a Globus service account (a Globus Auth Confidential OAuth client), and give that service account full read access to the entire data lake.  Once users are authorized to access the data, they can be instructed to make a Guest Collection of their own, giving write access to your service account.  Your service account would then initiate the transfer from your data lake to the user's Guest Collection.

The upside is that the users do not need to have access to the data lake.  All the transfers would be initiated by your service account, so you can be certain that users won’t see the other files in the data lake.

The downside is that it limits where your users can transfer data to.  Here’s what I mean:

• Guest Collections is a feature that requires a Globus subscription, so free users won’t be able to use Globus to download data.
• Folks using Globus Connect Personal will need to be enabled to use premium features, in order to create a guest collection.
• For folks using Globus Connect Server, the collection admin will need to be allowed to create guest collections.  At Stanford, for example, our largest compute environment does not allow folks to create guest collections; if they rent storage through our storage service, they can create guest collections there.
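
The service-account push described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested implementation: `build_items` and `submit_transfer` are hypothetical helper names, the destination layout under `dest_dir` is an arbitrary choice, and the submit step assumes a recent version of the Globus Python SDK (`globus-sdk`) plus a service account that already has read access on the source collection and write access on the user's guest collection. Some collections may require additional consents or scopes, which this sketch does not handle.

```python
from posixpath import join, normpath


def build_items(approved, requested, dest_dir):
    """Return (source_path, destination_path) pairs for approved files only.

    Raises PermissionError if any requested path is outside the approved set,
    so nothing is transferred unless the whole request is allowed.
    """
    approved_set = {normpath(p) for p in approved}
    items = []
    for path in requested:
        src = normpath(path)  # collapses "..", "//", etc. before the check
        if src not in approved_set:
            raise PermissionError(f"{path} is not in the approved file list")
        # Keep the sub-directory layout so equal file names do not collide.
        items.append((src, join(dest_dir, src.lstrip("/"))))
    return items


def submit_transfer(client_id, client_secret, source_collection,
                    dest_collection, items):
    """Submit one transfer task as the service account (hypothetical helper)."""
    # Imported here so the path-checking logic above works without the SDK.
    import globus_sdk
    from globus_sdk.scopes import TransferScopes

    auth_client = globus_sdk.ConfidentialAppAuthClient(client_id, client_secret)
    authorizer = globus_sdk.ClientCredentialsAuthorizer(
        auth_client, TransferScopes.all
    )
    tc = globus_sdk.TransferClient(authorizer=authorizer)
    tdata = globus_sdk.TransferData(
        source_endpoint=source_collection,
        destination_endpoint=dest_collection,
        label="approved file bundle",
    )
    for src, dst in items:
        tdata.add_item(src, dst)
    return tc.submit_transfer(tdata)["task_id"]
```

Because the file list is checked on the data provider's side before anything is submitted, the user never needs browse or read permission on the data lake itself.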

I recently learned that the ABCD Study is also using Globus as their exclusive method for bulk data transfer to users.  An instructional video is available, though I’m not sure if it uses the method I described above, or a different method.

Please let us know what you decide to do!  I expect it will be interesting.

~ Karl

Tom Cram

Aug 11, 2025, 11:30:29 PM
to Zachary Wright, Discuss
Hi Zachary,

On the NSF NCAR Research Data Archive portal, we do something similar to the ABCD Study that Karl mentioned.  Our users can select and curate custom file lists from our archive, then we redirect the user to the Globus Browse Endpoint helper page to select a destination endpoint for the transfer (this can be a Globus Connect Personal endpoint; no subscription-based guest collection is required).  After the user chooses a destination endpoint, they are redirected back to our portal, at which point our Globus service account submits the transfer on behalf of the user.

These are OAuth2 delegated transfers and require the user to provide a one-time consent to allow our service account to make the transfer on the user's behalf.
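
For readers who want to see the shape of the redirect step, here is a minimal sketch of building the helper-page URL and reading the user's choice from the callback. It is not the RDA portal's actual code. The helper URL and the parameter names (`method`, `action`, `cancelurl`, `folderlimit`, `filelimit`, and the returned `endpoint_id`, `path`, `folder[0]`) are assumptions based on the Globus helper-page documentation and should be checked against it; the function names and portal URLs are hypothetical.

```python
from posixpath import join
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed location of the Browse Endpoint helper page.
HELPER_URL = "https://app.globus.org/file-manager"


def browse_endpoint_url(callback_url, cancel_url):
    """Build the URL the portal redirects the user to (hypothetical helper)."""
    params = {
        "method": "GET",         # how the helper page returns the selection
        "action": callback_url,  # portal route that receives the selection
        "cancelurl": cancel_url,
        "folderlimit": 1,        # let the user pick at most one folder
        "filelimit": 0,          # and no individual files
    }
    return f"{HELPER_URL}?{urlencode(params)}"


def parse_destination(callback_request_url):
    """Extract (endpoint_id, destination_path) from the callback URL."""
    query = parse_qs(urlsplit(callback_request_url).query)
    endpoint_id = query["endpoint_id"][0]
    path = query.get("path", ["/"])[0]
    folder = query.get("folder[0]", [""])[0]
    return endpoint_id, join(path, folder)
```

The portal would then use the returned endpoint ID and path as the destination when it submits the transfer, after the user has given the one-time consent mentioned above.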

-- Tom

Thomas Cram | Software Engineer

NSF National Center for Atmospheric Research (NSF NCAR)

Computational and Information Systems Lab

Web: rda.ucar.edu

ORCID: 0000-0002-9503-6510



Zachary Wright

Aug 12, 2025, 10:22:01 AM
to Discuss, robert.f...@globus.org, Zachary Wright
Thanks Robert. This documentation is useful. I see this sentence in particular RE: permissions might be relevant to our situation:

If you trust users with access to a Globus collection not to create this kind of exploitative symlink, you can override this behavior on Globus Connect Server collections with the rp-follow-symlinks option to the GridFTP server.

Unfortunately, I am in about the least advantageous position with regard to the management of this GCS installation: I don't manage it, and it's on a large file system that's not at my home institution. It doesn't hurt to ask, though.  

Zachary Wright

Aug 12, 2025, 10:48:50 AM
to Discuss, tc...@ucar.edu, Discuss, Zachary Wright
Tom and Karl, 

These seem like really promising approaches. Thank you! A lot of our users won't have subscriptions, so I think anything that requires one would be off the table, but essentially PUSHING the data to a user-provided endpoint could work and would avoid unwanted file access. I think the main drawback would be the amount of work on our end to get the automation and authorization workflow in place. On the other hand, it would also satisfy a longstanding desire to add "shopping cart" functionality for the public data that we provide via a portal (https://atlas.kpmp.org/repository/). 

-- Zach

Ian Foster

Aug 12, 2025, 11:16:35 AM
to Zachary Wright, Discuss, tc...@ucar.edu, Discuss, Zachary Wright

This is quite a common use case. There is some information here: https://docs.globus.org/guides/recipes/modern-research-data-portal/

Zachary Wright

Aug 12, 2025, 1:54:28 PM
to Discuss, ian...@gmail.com, tc...@ucar.edu, Discuss, Zachary Wright
Very useful. Thanks for the link. 