hardware for endpoint vs data transfer node

Simon Leary

Jul 29, 2022, 4:03:58 PM
to Discuss
Hi,
I'm setting up a new deployment of Globus Connect Server, and until a moment ago I thought it would all run on one machine.
I see that I need at least one data transfer node to accompany my endpoint, but I don't know exactly how they work, so I'm not sure how much hardware to give each.

First of all, is it possible to run a data transfer node on the same machine as the endpoint? Is it a bad idea?

I know that hardware requirements are decided case by case, but is there a simple rule of thumb, such as a 60/40 or 50/50 split of my available hardware? Which node does more work?

Simon

Wagner, Richard

Jul 29, 2022, 4:23:35 PM
to Simon Leary, Discuss
Hi Simon,

It’s very common to run everything on one machine, especially at the start of your deployment. The next common use case is to add data transfer nodes (DTNs) for performance. The DTNs do the bulk of the work, especially on the file system and network side.

Eventually, you might consider splitting off the endpoint onto a separate node, like a VM, and having the DTNs be on their own. This could be a model if you wanted to run diskless DTNs.

https://docs.globus.org/globus-connect-server/v5.4/#diskless-dtn

But right now, you’re OK to start with a single node. Make it work, then make it work fast.

—Rick
> --
> You received this message because you are subscribed to the Google Groups "Discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@globus.org.

Karl Kornel

Jul 29, 2022, 5:59:44 PM
to Simon Leary, Discuss

Hi Simon,

 

I tend to give long-winded answers to questions like this, so here’s a long-winded way of saying “You already have a DTN!”.

 

I have to admit that, coming from GCSv4, GCSv5’s terminology confused me for a while.  Eventually I developed the following understanding of the different parts of a GCSv5 system:

 

• DTN: A machine (VM, physical, cloud) which runs the Globus Connect Server software (ideally 24x7), and connects the users of your Endpoint to the storage (local storage, network storage, cloud storage, etc.).

                At least one of these is required for your Endpoint to be usable.

                All of the requirements (open ports, time synchronization, etc.) in the GCSv5 Installation documentation apply to the DTN.

                If you use multiple DTNs in an Endpoint, all the DTNs will need to have the same storage access and the same network access.

 

• Connector: The part of the GCSv5 software responsible for interacting with a particular type of storage (POSIX storage, S3-compatible storage, Google Drive, etc.).

                Installing GCSv5 installs every connector, even the ones you don’t have licensed, so you don’t have to think about whether a particular connector is installed or not.

 

• Storage Gateway: A unique combination of (connector, base path, policies).

                At least one of these is required for your Endpoint to be usable.

                A Storage Gateway may only be associated with one Endpoint.

                Each connector you use requires its own storage gateway.

                It’s not unusual for an Endpoint to have only one Storage Gateway per licensed connector.

                The “unique combination” rule above means you can end up having multiple storage gateways for one connector.  For example, let’s say you have a server used for PHI de-identification.  The path /data/phi holds PHI; the path /data/deidentified holds the de-identified data.  In this example, you would have two storage gateways: One would be (POSIX, /data/phi, High-Assurance), and one would be (POSIX, /data/deidentified, Not High-Assurance).

 

• Collection: A unique combination of (storage gateway, base path, metadata/policies).  This is what the user interacts with when they use the Globus web site & CLI!

                A Collection may only be associated with one Storage Gateway.

 

• Mapped Collection: A Collection which stores connector-specific credentials, mapping your Globus identity to your identity on the storage.

                At least one of these is required for your Endpoint to be usable.

                Each Storage Gateway requires at least one Mapped Collection for the Storage Gateway to be usable.

                It’s not unusual for a Storage Gateway to only have one Mapped Collection.

                “Identity on the storage” could mean your UNIX UID (for POSIX storage), an Access-Key/Secret-Key pair (for S3), a Google OAuth 2.0 credential (for Google Cloud Storage and Drive), etc.

 

• Guest Collection: A Collection which allows other Globus users to access a mapped collection as you.

                In order to use this, you must have an underlying mapped collection, as the mapped collection stores your credentials.

                Administrators may not create Guest Collections.  This is because the user must give consent for the guests to access the storage as the user.  However, administrators may inspect and delete Guest Collections.

               

• Endpoint: An abstract set of Storage Gateways (and their Collections).

                An Endpoint is ‘conceived’ by creating an appropriate Globus Auth client, getting the Client ID and Client Secret, and using the `globus-connect-server endpoint setup` command to get the Deployment Key.

                An Endpoint is ‘born’ by using the Client ID, Client Secret, and Deployment Key to set up your first DTN.

Once your Endpoint is born, you make your nascent Endpoint grow by creating a Storage Gateway, and then a Mapped Collection.
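
The pieces above can be sketched as a small data model. This is only an illustrative Python sketch, not GCS code or its API: the class names, the `high_assurance` flag standing in for a gateway's policies, and the `example.edu` host name are all assumptions made for the example. It encodes three of the rules listed: a Storage Gateway is a unique (connector, base path, policies) combination, a Guest Collection needs an underlying Mapped Collection, and an Endpoint is usable only with at least one DTN, one Storage Gateway, and one Mapped Collection.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageGateway:
    connector: str        # e.g. "POSIX" or "S3"
    base_path: str
    high_assurance: bool  # hypothetical stand-in for the gateway's policies


@dataclass(frozen=True)
class Collection:
    gateway: StorageGateway  # a Collection belongs to exactly one gateway
    base_path: str
    kind: str                # "mapped" or "guest"


@dataclass
class Endpoint:
    name: str
    dtns: set = field(default_factory=set)
    gateways: set = field(default_factory=set)
    collections: set = field(default_factory=set)

    def add_gateway(self, gateway):
        # The "unique combination" rule: no two identical tuples.
        if gateway in self.gateways:
            raise ValueError("duplicate (connector, base path, policies)")
        self.gateways.add(gateway)

    def add_collection(self, collection):
        if collection.gateway not in self.gateways:
            raise ValueError("storage gateway is not part of this endpoint")
        # Simplification: a guest collection only needs *some* mapped
        # collection on the same gateway to already exist.
        if collection.kind == "guest" and not any(
            c.kind == "mapped" and c.gateway == collection.gateway
            for c in self.collections
        ):
            raise ValueError("guest collection needs a mapped collection")
        self.collections.add(collection)

    def is_usable(self):
        has_mapped = any(c.kind == "mapped" for c in self.collections)
        return bool(self.dtns) and bool(self.gateways) and has_mapped


# The PHI de-identification example: one connector, two gateways.
endpoint = Endpoint("phi-server")
phi = StorageGateway("POSIX", "/data/phi", high_assurance=True)
deid = StorageGateway("POSIX", "/data/deidentified", high_assurance=False)
endpoint.add_gateway(phi)
endpoint.add_gateway(deid)
endpoint.dtns.add("dtn1.example.edu")  # the single machine you start with
endpoint.add_collection(Collection(phi, "/", "mapped"))
print(endpoint.is_usable())  # True
```

Note that the single DTN in the usage lines is the same machine that holds everything else, which is the one-machine starting point described in this thread.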

 

Note how even once your endpoint is born, it remains an abstract thing, which users don’t really care about.  Instead, the users most directly care about the Collections (that’s how they access their data) and the DTN (that’s the server (or servers) which send/receive their data).  And since the Endpoint is an abstract thing, it’s never the actual target of any communications or connections.  You never actually talk to an Endpoint.  When you first set up an Endpoint, you are describing the new Endpoint to Globus (“It has this name, this owner, …”).  When you set up a DTN, the `globus-connect-server` software tells Globus HQ “I am representing this endpoint in the Real World.  When you need to communicate with something on an Endpoint (for example, to access data), talk to me.”
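
To make the "you never actually talk to an Endpoint" point concrete, here is a toy Python sketch. It is not how the Globus service is implemented: `GlobusRegistry`, its method names, and the `example.edu` addresses are hypothetical and exist only to show that describing an endpoint records metadata, while connections always resolve to the DTNs that registered for it.

```python
class GlobusRegistry:
    """Toy stand-in for the Globus service's record of endpoints."""

    def __init__(self):
        self._endpoints = {}  # endpoint name -> {"owner": ..., "dtns": [...]}

    def describe_endpoint(self, name, owner):
        # Endpoint setup: only a description is stored; nothing serves data yet.
        self._endpoints[name] = {"owner": owner, "dtns": []}

    def register_dtn(self, name, address):
        # DTN setup: "I am representing this endpoint in the Real World."
        self._endpoints[name]["dtns"].append(address)

    def resolve(self, name):
        # Connections go to DTNs, never to the abstract endpoint itself.
        dtns = self._endpoints[name]["dtns"]
        if not dtns:
            raise LookupError(f"endpoint {name!r} has no DTN to talk to")
        return list(dtns)


registry = GlobusRegistry()
registry.describe_endpoint("campus-research", owner="admin@example.edu")
registry.register_dtn("campus-research", "dtn1.example.edu")
registry.register_dtn("campus-research", "dtn2.example.edu")
print(registry.resolve("campus-research"))
# ['dtn1.example.edu', 'dtn2.example.edu']
```

In this sketch, an endpoint that has been described but has no registered DTN raises `LookupError`, mirroring the rule that at least one DTN is required before the Endpoint is usable.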

 

I hope that helps explain things!  It should be clear now that, when you’re just getting started, you only need one machine, the machine that will ultimately become your first DTN.

 

~ Karl


Wagner, Richard

Jul 29, 2022, 6:35:18 PM
to Karl Kornel, Simon Leary, Discuss
Hi Karl,

I can see I need to update my concepts. I was thinking of the endpoint as a persistent server running the manager API.

Since endpoint and node creation can be done on any system with GCS installed, my model isn’t accurate.

—Rick

On Jul 29, 2022, at 2:59 PM, Karl Kornel <akko...@stanford.edu> wrote:



Vas Vasiliadis

Jul 29, 2022, 10:22:36 PM
to Wagner, Richard, Karl Kornel, Simon Leary, Discuss
Hi Simon.

To your question:
> First of all, is it possible to run a data transfer node on the same machine as the endpoint? Is it a bad idea?

Yes, it’s possible and is, in fact, the most common configuration for the majority of Globus deployments.

To Karl’s description, here’s my summary mental model (at the risk of creating further confusion :-):

- Endpoint: A logical construct that identifies an instance of Globus Connect Server to the Globus service.

- DTN: A physical manifestation of the endpoint (more DTNs => larger physical footprint => better resilience/performance).

(- Connector: a package that implements an interface allowing the Globus service to access a specific storage system (a DSI in GridFTP terminology); can safely be ignored for all intents and purposes, since one never interacts directly with a “connector” ...hence the parentheses).

- Storage Gateway: An instance of a Globus connector configured to access a storage system using specified policies (policies include valid identity providers, path restrictions, et al).

- Collection: A logical construct that allows a user access to a storage system via the Globus service (constrained by the underlying Storage Gateway).

-- Vas