Multiple DTNs on same storage gateway & collection


Matt Pritchard

Jan 10, 2024, 1:00:49 PM
to Discuss

Hi

In my GCSv5.4 setup, let’s say I’ve added a new host as a node to an existing endpoint where the existing storage gateway was already on a separate DTN node. To add the new one, I used 

globus-connect-server node setup --ip-address <address of new host>

but using the original deployment-key.json to add it to the existing setup.


I now have multiple DTN nodes in my setup, so I disabled the new one temporarily, with

globus-connect-server node update --disable <node id>

as there are some other admin tasks to complete on it first.


But once it’s fully ready, and I’ve re-enabled it, will there be 2 DTNs serving the same collection, on the same storage gateway? (it would seem so, from what the various show and list commands tell me)


If so, how does Globus decide which one to send a particular transfer task to? Or is it intended that only one node should be active at a time?


Many thanks,

Matt

Jason Alt

Jan 10, 2024, 1:21:13 PM
to Matt Pritchard, Discuss
Hi Matt-

You can have many nodes defined in the endpoint to allow for load distribution and failover. All nodes in a v5.4 endpoint serve all collections defined by the endpoint. Transfer resolves the node IP address(es) from the endpoint's FQDN and processes new tasks with the next active node in the endpoint.
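To see which node addresses Transfer will pick from, you can resolve the endpoint's FQDN yourself. A small standard-library sketch (pass in whatever DNS name your collection advertises):

```python
import socket

def endpoint_node_ips(fqdn, port=443):
    """Return the distinct addresses the endpoint FQDN resolves to --
    the same pool of DTN nodes the Transfer service chooses from."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# e.g. endpoint_node_ips("m-450d72.3125b0.0ec8.data.globus.org")
```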

Jason

Chandin Wilson

Jan 10, 2024, 6:34:22 PM
to jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
Jason,

Dovetailing on this thread --

Is there a way to both determine the nodes behind an endpoint and run explicit transfer actions (such as a directory listing) against each node in an endpoint?

The scenario being an endpoint with a dozen nodes behind it, and wanting to ensure that all nodes have a properly functioning setup. More than once we've returned from a maintenance window with a random DTN missing a filesystem or two, resulting in mysterious sporadic transfer failures.

thanks,

--Chan

Jason Alt

Jan 11, 2024, 11:57:06 AM
to Chandin Wilson, matt.pr...@ncas.ac.uk, dis...@globus.org
You can't target a specific endpoint node within a Transfer task. However, you can target a specific node when accessing a collection's HTTPS interface (which functionally is quite similar to how a Transfer task makes use of the endpoint). In that respect, you could configure client credential access (as described in https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/) and fetch a known file/object from each collection at a specific interval to verify correct operation.
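As a sketch of that periodic known-file check (standard library only; the file path and bearer token are placeholders you would supply from your own client-credential setup):

```python
import urllib.request

def known_file_request(https_url, path, token):
    """Build an authenticated GET against the collection's HTTPS
    interface for a known file. `path` and `token` are site-specific
    placeholders, not real values."""
    url = https_url.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Bearer " + token)
    return req

def fetch_ok(req, timeout=10):
    # True when the known file comes back with HTTP 200.
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```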

Jason 

Lev Gorenstein

Jan 11, 2024, 12:47:57 PM
to Jason Alt, Chandin Wilson, matt.pr...@ncas.ac.uk, dis...@globus.org
Chan,

In principle, you could also do a 'globus-connect-server node update --disable' for all nodes but DTN X, then do a quick 'globus ls' (or a test transfer) on your collection(s).  Lather-rinse-repeat for all other DTNs.  GCSv5 makes such shenanigans easier compared to v4.

Probably not something to do continuously in production due to performance degradation (unlike Jason's HTTPS-based test approach), but certainly a possibility for pre-RTS testing after maintenance windows.
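A dry-run planner for that lather-rinse-repeat loop might look like this. It only emits the commands rather than running them, and it assumes `--enable` is the counterpart of `--disable` on `node update`:

```python
def round_robin_plan(node_ids, collection_id):
    """For each DTN in turn: disable every other node, run a listing
    against the collection, then re-enable them. Emits commands only."""
    plan = []
    for target in node_ids:
        others = [n for n in node_ids if n != target]
        plan += [f"globus-connect-server node update --disable {n}" for n in others]
        plan.append(f"globus ls {collection_id}:/")
        plan += [f"globus-connect-server node update --enable {n}" for n in others]
    return plan
```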

Lev

Karl Kornel

Jan 11, 2024, 3:41:13 PM
to Lev Gorenstein, Jason Alt, Chandin Wilson, matt.pr...@ncas.ac.uk, dis...@globus.org

Since a lot of sites use NHC (https://github.com/mej/nhc), maybe that could be used: Have NHC check if filesystems are mounted, or if an `ls` check passes.  Then make a node-mark-offline script that stops the Globus services (or maybe just Apache, or Apache + GridFTP), and a node-mark-online script to bring things online.


All of the Globus services are started using systemd, so if systemd already mounts your filesystems (or if it could be made to do so), you could drop in an override file to make Globus’ services dependent on those filesystems being mounted.  Or, you could make a one-off check to run the NHC script, and make Globus’ services depend on that.
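For instance, a drop-in override along these lines (the unit name and mount paths here are hypothetical; GCS service names vary by install, so check `systemctl list-units` first, and run `systemctl daemon-reload` after adding the file):

```ini
# Hypothetical: /etc/systemd/system/globus-gridftp-server.service.d/mounts.conf
[Unit]
# Keep this service from starting unless these mounts are active.
RequiresMountsFor=/scratch /data
```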


It wouldn’t be perfect (since the node isn’t disabled, Globus HQ might still try to send traffic there), but serving a timeout may be better than a “file not found”.


~ Karl

Chandin Wilson

Jan 11, 2024, 5:47:12 PM
to jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
From: Jason Alt <jaso...@globus.org>
Subject: Re: [Globus Discuss] Multiple DTNs on same storage gateway & collection
Date: Thu, 11 Jan 2024 10:56:52 -0600

> You can't target a specific endpoint node within a Transfer task. However, you can target a specific node when accessing a collection's HTTPS
> interface (which functionally is quite similar to how a Transfer task makes use of the endpoint). In that respect, you could configure client
> credential access (as described in https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/) and fetch a known
> file/object from each collection at a specific interval to verify correct operation.

Yeeeeeah, ok, I can see a path forward there; it wasn't too hard to adjust https://github.com/rpwagner/serverless-data/blob/main/bin/globuscollectionput.py for GET functionality.

The challenge now becomes getting to the right HTTPS name; looks like we have some reconfiguration to do on our endpoints. All of our `globus collection show xxxx --jq 'https_server'` output is giving 'Null'...

thanks.

--Chan

Karl Kornel

Jan 11, 2024, 7:55:14 PM
to Chandin Wilson, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org

If you don’t have an HTTPS server URL set, then that means that HTTPS uploads/downloads aren’t enabled on the collection.  See https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/ and https://docs.globus.org/globus-connect-server/v5.4/data-access-guide/#enable_or_disable_https_access.


~ Karl

Lev Gorenstein

Jan 11, 2024, 9:14:53 PM
to Chandin Wilson, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
It's 'https_url', not 'https_server' in JSON:

$ globus collection show 894d0ff8-0fb6-11eb-81b1-0e2f230cc907 --jq 'https_server'
null

$ globus collection show 894d0ff8-0fb6-11eb-81b1-0e2f230cc907 --jq 'https_url'
"https://m-450d72.3125b0.0ec8.data.globus.org"


Lev

Chandin Wilson

Jan 12, 2024, 1:56:38 PM
to l...@globus.org, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
From: Lev Gorenstein <l...@globus.org>
Subject: Re: [Globus Discuss] Multiple DTNs on same storage gateway & collection
Date: Thu, 11 Jan 2024 21:14:15 -0500

> It's 'https_url', not 'https_server' in JSON:
>
> $ globus collection show 894d0ff8-0fb6-11eb-81b1-0e2f230cc907 --jq 'https_server'
> null
>
> $ globus collection show 894d0ff8-0fb6-11eb-81b1-0e2f230cc907 --jq 'https_url'
> "https://m-450d72.3125b0.0ec8.data.globus.org"

Ah. Well, that explains that, but it also leads to the next issue: all the DTNs behind an endpoint map to the same name:

host m-450d72.3125b0.0ec8.data.globus.org
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.48
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.42
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.46
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.47
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.43
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.44
m-450d72.3125b0.0ec8.data.globus.org has address 128.211.133.45


curl can target a specific IP with --resolve, e.g.,

curl --resolve m-450d72.3125b0.0ec8.data.globus.org:443:128.211.133.48 https://m-450d72.3125b0.0ec8.data.globus.org/api/info

which immediately runs into needing to pass the right Authorization header along, then parsing the responses and all that flavor of thing, which really wants to be an API / library-level call somehow...
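That per-node curl invocation can at least be generated mechanically. A sketch (the healthcheck path and token are placeholders):

```python
import shlex

def curl_per_node(https_host, node_ips, path, token):
    """One `curl --resolve` invocation per DTN address, pinning each
    request to a specific node behind the shared DNS name."""
    return [
        " ".join([
            "curl", "--fail", "--silent",
            "--resolve", f"{https_host}:443:{ip}",
            "-H", shlex.quote(f"Authorization: Bearer {token}"),
            f"https://{https_host}{path}",
        ])
        for ip in node_ips
    ]
```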

Anything I'm missing? Any prior work done in this vein?

thanks!

--Chan

Chandin Wilson

Jan 12, 2024, 3:16:04 PM
to akko...@stanford.edu, l...@globus.org, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
Karl,

Thanks for the NHC pointer; I know our HPCS are in the "one-off custom basket" in this regard, so I've passed that along.

We're also in the "well-established multi-generational cluster" bucket, so it's a dream to think systemd controls filesystem mounts. We're also isolating the GCS5 services into containers, which are currently started... probably "mostly manually" still, once the filesystems have been given the "all clear" to mount.

All solvable problems, except for the Globus "connection refused" portion. The current failure scenario there, of not retrying connections until the list of available DTNs is exhausted, is part of what started this series of questions from my end.

--Chan

Karl Kornel

Jan 12, 2024, 3:23:01 PM
to Chandin Wilson, l...@globus.org, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org

Hi Chan,


Just to confirm: You’re looking for a way to fetch a file from each DTN that’s associated with an endpoint, in order to confirm that each DTN is operating properly.  Do I understand that correctly?


~ Karl

Chandin Wilson

Jan 12, 2024, 3:52:07 PM
to akko...@stanford.edu, l...@globus.org, jaso...@globus.org, matt.pr...@ncas.ac.uk, dis...@globus.org
From: Karl Kornel <akko...@stanford.edu>
Subject: Re: [Globus Discuss] Multiple DTNs on same storage gateway & collection
Date: Fri, 12 Jan 2024 20:22:43 +0000

> Hi Chan,
>
>
>
> Just to confirm: You’re looking for a way to fetch a file from each DTN that’s associated with an endpoint, in order to confirm that each DTN is
> operating properly. Do I understand that correctly?

Exactly. Endpoints are mostly HA; some will have guest collections as well. We've got service accounts to use for portions that need authentication.

--Chan

Michael Link

Jan 12, 2024, 6:07:19 PM
to dis...@globus.org
To simplify the HTTP checks, you could host the status file on a guest collection with anonymous permissions on a path holding only that file. You would likely set up a guest collection solely for this purpose (and may have to, as anonymous access is not allowed from an HA storage gateway). Then that would work with curl or any other URL monitoring tool that lets you control the resolution.

Mike

Josh Eisinger - NOAA Affiliate

Feb 13, 2024, 11:31:08 AM
to Discuss, ml...@globus.org
I'm working with @chan.wilson on the task referenced earlier. Most of the collections we're looking to test at the underlying DTN level are HA and do not allow guest collections to be created. Would you be able to recommend an alternative path to achieve this?

Thanks!
Josh
