Hi Deven,
I wanted to share some thoughts on this topic since I’ve helped several groups onboard to Globus for both institutional and research project use. I also have an ulterior motive, because there are some parts of getting started that I think are better served by the community rather than the Globus team. I’ll lay them out in my reply below and encourage anyone who feels the same to talk to me at GlobusWorld next month (Karl, I’m hoping you’ll be there).
And in full disclosure, I used to work for Globus, so I'm at least as biased as Lev. I also recognize that other factors may affect whether or not Globus is the right choice.
the maintenance overhead of running a Globus Server instance.

As others have written, the maintenance and operations burden of a GCS instance is low, provided you dedicate the servers, containers, etc., to just being DTNs. If you start adding other applications or capabilities to those systems, you're asking for headaches. This is a chance to have at least one piece of functionality that is not tightly coupled to other components; take advantage of it. You can even pause activity on the endpoint to handle file system outages or maintenance.
However, setting up a GCS instance can take some effort and planning. This is where I think the community's shared experience could make a big difference. I can usually bring up a basic Globus collection (POSIX or S3) in less than an hour. That's because I understand how things like the storage gateways and identity mapping work. But I have also spent hours and opened tickets dealing with particular issues, because every system is different.
I would like to see the community share at least a couple of things to help reduce the up-front costs:
1. Sample configurations for GCS components, particularly identity mapping. The core Globus documentation can only cover some of these.
2. A single document with a table of all the GCS components and the parameters for each. Those parameters map to policy decisions; for example, knowing where you allow or limit access to parts of the file system is important.
Rather than expect the Globus documentation to cover all of the potential use cases (and test them before updates) we could contribute this as a community.
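To give a sense of what one of those sample configurations might look like, here is a sketch of an expression-based identity mapping document of the kind a GCS storage gateway accepts. The domain and the regular expression are placeholders for illustration; check the current Globus documentation for the exact schema before using anything like this.

```json
{
  "DATA_TYPE": "expression_identity_mapping#1.0.0",
  "mappings": [
    {
      "source": "{username}",
      "match": "(.*)@example\\.edu",
      "output": "{0}"
    }
  ]
}
```

A mapping like this would turn a federated username such as alice@example.edu into the local account alice. It's exactly the kind of fragment where a library of community-contributed examples, each annotated with the policy it implements, would save people hours.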
the UX from the perspective of customers who could use Globus to transfer data to us (and vice-versa).
This hasn't been a problem, provided the projects and users understand that Globus does not provide a mount for the data. I generally ask several questions up front to make sure they're not looking for Google Drive-style access to their data, and that they understand what can be provided. In other words, make sure you're offering the right solution.
How it integrates with S3
Very well. You’ll need to understand how to configure the identity mapping portion, of course. I’ve helped projects migrate data to and from S3 routinely.
How access control is managed, e.g. how it integrates with OIDC providers and how permissions are managed.
Overall, any issues here aren't with Globus; they're with the idiosyncrasies of the storage systems and the identity providers. For example, UCSD presents to the world a username that is a long opaque string, impossible (on purpose) to map to something useful like "rpwagner", my typical username. So we've developed trusted ways to map the UUIDs of our identities to the accounts on the storage systems. Experiences like these are why I think we could help each other with some shared knowledge.
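To make that UUID-to-account mapping concrete, here is a minimal sketch of the kind of logic an external identity-mapping callout can apply: a lookup table keyed on the Globus identity UUID rather than on the opaque username. The table contents, the UUID, and the exact input/output field names are assumptions for illustration; GCS defines the real callout protocol (JSON in, JSON out), so consult the documentation for the exact document schemas.

```python
import json

# Hypothetical table mapping Globus identity UUIDs to local accounts.
# In practice this might be backed by a database or a site directory.
UUID_TO_ACCOUNT = {
    "9c1371a8-0000-0000-0000-000000000000": "rpwagner",
}

def map_identities(mapping_input):
    """Map each identity in the callout input to a local account, if known.

    `mapping_input` is assumed to be a dict with an "identities" list,
    each entry carrying the identity's "id" UUID. Identities with no
    local account are simply omitted from the result.
    """
    results = []
    for identity in mapping_input.get("identities", []):
        account = UUID_TO_ACCOUNT.get(identity["id"])
        if account is not None:
            results.append({"id": identity["id"], "output": account})
    return {"result": results}

# Example callout input in the assumed shape: the federated username is
# opaque, but the identity UUID is stable and mappable.
example_input = {
    "identities": [
        {
            "id": "9c1371a8-0000-0000-0000-000000000000",
            "username": "a7f3c9e1b2d4@ucsd.edu",
        }
    ]
}
print(json.dumps(map_identities(example_input)))
```

The point of keying on the UUID is that it survives username changes and opaque identity-provider policies; the trust decision lives in how the table itself is populated and maintained.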
Outside of those difficulties, I have not found a single access control policy that could not be implemented at some level.
—Rick