Hi Simon,
I tend to give long-winded answers to questions like this, so here’s a long-winded way of saying “You already have a DTN!”.
I have to admit that, coming from GCSv4, GCSv5’s terminology confused me for a while. Eventually I developed the following understanding of the different parts of a GCSv5 system:
• DTN: A machine (VM, physical, cloud) which runs the Globus Connect Server software (ideally 24x7), and connects the users of your Endpoint to the storage (local storage, network storage, cloud storage, etc.).
At least one of these is required for your Endpoint to be usable.
All of the requirements (open ports, time synchronization, etc.) in the GCSv5 Installation documentation apply to the DTN.
If you use multiple DTNs in an Endpoint, all the DTNs will need to have the same storage access and the same network access.
• Connector: The part of the GCSv5 software responsible for interacting with a particular type of storage (POSIX storage, S3-compatible storage, Google Drive, etc.).
Installing GCSv5 installs every connector (even the ones you don’t have licensed), so you don’t have to think about whether a particular connector’s software is installed.
• Storage Gateway: A unique combination of (connector, base path, policies).
At least one of these is required for your Endpoint to be usable.
A Storage Gateway may only be associated with one Endpoint.
Each connector you use requires its own storage gateway.
It’s not unusual for an Endpoint to have only one Storage Gateway per licensed connector.
The “unique combination” rule above means you can end up having multiple storage gateways for one connector. For example, let’s say you have a server used for PHI de-identification. The path /data/phi holds PHI; the path /data/deidentified holds the de-identified data. In this example, you would have two storage gateways: One would be (POSIX, /data/phi, High-Assurance), and one would be (POSIX, /data/deidentified, Not High-Assurance).
• Collection: A unique combination of (storage gateway, base path, metadata/policies). This is what the user interacts with when they use the Globus web site & CLI!
A Collection may only be associated with one Storage Gateway.
• Mapped Collection: A Collection which stores connector-specific credentials, mapping your Globus identity to your identity on the storage.
At least one of these is required for your Endpoint to be usable.
Each Storage Gateway requires at least one Mapped Collection for the Storage Gateway to be usable.
It’s not unusual for a Storage Gateway to only have one Mapped Collection.
“Identity on the storage” could mean your UNIX UID (for POSIX storage), an Access-Key/Secret-Key pair (for S3), a Google OAuth 2.0 credential (for Google Cloud Storage and Drive), etc.
• Guest Collection: A Collection which allows other Globus users to access a mapped collection as you.
In order to use this, you must have an underlying mapped collection, as the mapped collection stores your credentials.
Administrators may not create Guest Collections. This is because the user must give consent for the guests to access the storage as the user. However, administrators may inspect and delete Guest Collections.
• Endpoint: An abstract set of Storage Gateways (and their Collections).
An Endpoint is ‘conceived’ by creating an appropriate Globus Auth client, getting the Client ID and Client Secret, and using the `globus-connect-server endpoint setup` command to get the Deployment Key.
An Endpoint is ‘born’ by using the Client ID, Client Secret, and Deployment Key to set up your first DTN.
Once your Endpoint is born, you make your nascent Endpoint grow by creating a Storage Gateway, and then a Mapped Collection.
Note that even once your Endpoint is born, it remains an abstract thing that users don’t really care about. What users care about most directly are the Collections (that’s how they access their data) and the DTNs (the server or servers which send and receive their data). Because the Endpoint is abstract, it’s never the actual target of any communication or connection: you never actually talk to an Endpoint.
When you first set up an Endpoint, you are describing the new Endpoint to Globus (“It has this name, this owner, …”). When you set up a DTN, the `globus-connect-server` software tells Globus HQ “I am representing this Endpoint in the Real World. When you need to communicate with something on this Endpoint (for example, to access data), talk to me.”
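If it helps to see the relationships written down, here is a rough Python sketch of the model described above. To be clear, this is only an illustration of the terminology, not anything taken from the GCSv5 software: every class, field, and method name is made up for the example, the `high_assurance` flag is a stand-in for a gateway's full set of policies, and the hostname is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageGateway:
    """A unique combination of (connector, base path, policies)."""
    connector: str        # e.g. "POSIX" or "S3"
    base_path: str
    high_assurance: bool  # assumption: one flag standing in for "policies"


@dataclass(frozen=True)
class MappedCollection:
    """A Collection; it belongs to exactly one Storage Gateway."""
    gateway: StorageGateway
    base_path: str
    display_name: str


@dataclass
class Endpoint:
    """An abstract set of Storage Gateways (and their Collections)."""
    name: str
    dtns: list[str] = field(default_factory=list)
    gateways: set[StorageGateway] = field(default_factory=set)
    mapped_collections: list[MappedCollection] = field(default_factory=list)

    def add_gateway(self, gateway: StorageGateway) -> None:
        # Enforce the "unique combination" rule from the list above.
        if gateway in self.gateways:
            raise ValueError(f"duplicate storage gateway: {gateway}")
        self.gateways.add(gateway)

    def is_usable(self) -> bool:
        # At least one DTN, one Storage Gateway, and one Mapped Collection.
        return bool(self.dtns and self.gateways and self.mapped_collections)

    def gateways_without_mapped_collection(self) -> set[StorageGateway]:
        # A Storage Gateway is not usable until it has a Mapped Collection.
        in_use = {collection.gateway for collection in self.mapped_collections}
        return self.gateways - in_use


# The PHI de-identification example: one connector, two storage gateways.
phi = StorageGateway("POSIX", "/data/phi", high_assurance=True)
deid = StorageGateway("POSIX", "/data/deidentified", high_assurance=False)

endpoint = Endpoint(name="example-endpoint")
print(endpoint.is_usable())  # False: no DTN, gateway, or collection yet

endpoint.dtns.append("dtn1.example.org")  # hypothetical first DTN
endpoint.add_gateway(phi)
endpoint.add_gateway(deid)
endpoint.mapped_collections.append(MappedCollection(phi, "/", "PHI data"))

print(endpoint.is_usable())  # True
print(endpoint.gateways_without_mapped_collection())  # the de-identified gateway
```

The Endpoint object here does nothing except hold the other pieces together, which mirrors the point above: users interact with the Collections and the DTNs, never with the Endpoint itself.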
I hope that helps explain things! It should be clear now that, when you’re just getting started, you only need one machine, the machine that will ultimately become your first DTN.
~ Karl
--
On Jul 29, 2022, at 2:59 PM, Karl Kornel <akko...@stanford.edu> wrote: