Globus as part of CI infrastructure


Joshua Brown

Mar 30, 2022, 3:25:18 PM
to Discuss
Hi all,

We have built our application on top of Globus, and I am interested in the best way to incorporate it into our CI infrastructure. I wanted to ask the community and the Globus team what approach to take.

For our scenario we would ideally be able to stand up and register a managed Globus Connect Server (version 5.4) and two endpoints. Ideally we could set them up and tear them down as needed without clicking through browser links as part of the server and endpoint configuration, so the steps could be fully automated - which I'm not sure is yet possible.

There are a few issues we are running into that make this painful, or where I could use clarification:

1. The browser login requests, i.e. is there a way to do this through the API?

globus-connect-server login localhost

Please authenticate with Globus here:

------------------------------------

https://auth.globus.org/v2/oauth2/authorize?client_id=blahblahblah&prompt=login

------------------------------------
Enter the resulting Authorization Code here:

2. Can I reuse the client IDs and secrets? It would be nice if we could just delete a VM with the Globus endpoints and Globus Connect Servers on it and recreate them using the client IDs and secrets we had before. Deleting the VM would also delete deployment-key.json, but I'm not sure the key is needed if everything is being recreated anyway - or is this a problem?

Best

Jason Alt

Apr 1, 2022, 10:40:50 AM
to Joshua Brown, Discuss
> standup and register a managed globus connect server version 5.4 and two endpoints

Do you need an endpoint with 2 collections perhaps? Or does this mean you need an endpoint with 2 nodes?

Before I get into your questions, I'd like to understand your use case better. Are you trying to launch new GCS endpoints on demand, owned by you (or some service account), preconfigured with collections, for use by some other targeted individuals or groups? Or is this the same endpoint(s) redeployed for recovery (or stateless nodes, or hardware migrations)?

> 1. The browser login requests, i.e. is there a way to do this through the API?

No, at least not the way you may be thinking. The /authorize call performs the OAuth2 step in which you consent to allow the globus-connect-server CLI on that node to act on your behalf while interacting with the new endpoint. The consent step requires human interaction; hence the need to open a browser. If the consent has already been performed, then perhaps it could be automated (chicken and egg, I know). That's why I wonder what your specific use case is.
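For background, the URL the CLI prints is a standard OAuth2 authorization-code request. A minimal sketch of how such a URL is built (parameter names come from the OAuth2 spec; the exact parameter set and redirect URI GCS uses may differ, so treat this as illustrative only):

```python
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "https://auth.globus.org/v2/oauth2/authorize"

def build_authorize_url(client_id, scope,
                        redirect_uri="https://auth.globus.org/v2/web/auth-code"):
    """Construct an OAuth2 authorization-code request URL (illustrative)."""
    params = {
        "client_id": client_id,
        "response_type": "code",   # ask for an authorization code
        "scope": scope,
        "redirect_uri": redirect_uri,
        "prompt": "login",         # force a fresh login, as the CLI does
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)

print(build_authorize_url("blahblahblah", "openid profile"))
```

The authorization code the human pastes back is then exchanged for tokens; it is that consent step, not the URL construction, that cannot be scripted.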

I'm curious as to what triggers the deployment of these new endpoints (job submission? new hardware?), and if that interaction can be used to manually generate the token.

> 2. Can I reuse the client ids and secrets?

No. Once the client ID is used, it becomes the owner of certain resources throughout the Globus platform. Even once the endpoint and client ID are deleted, its ID is still maintained for internal purposes. Solvable? Maybe, but there is a more critical concern. The client ID is unique and trusted to identify a specific client (or in this case, a specific endpoint). Users can't rely on the ID as a reliable identifier if the endpoint can change from time to time (similarly to how I'm trusting that joshbr...@gmail.com has not changed and remains a valid identifier). Again, solvable? Maybe, but I don't think that's the real issue for this use case. I'd prefer that client IDs are easily and automatically generated.

> It would be nice if we could just delete a vm with the Globus endpoints and globus connect servers on them and recreate them just using the client id and secrets we had before. This would also delete the deployment-key.json but I'm not sure it is needed if everything is being recreated anyway or is this a problem?

Just to be clear (for future readers too), deleting a VM (or node) containing GCS does nothing to release the resources associated with GCS within the Globus platform (the Globus AWS services). You'll want to run 'endpoint cleanup' so at the very least the Transfer collections aren't found by your users when they search for endpoints.
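A sketch of that cleanup step (flag names per the GCS v5.4 CLI; confirm with `globus-connect-server endpoint cleanup --help` for your version):

```shell
# Run before (or instead of) deleting the VM, so the endpoint's records are
# removed from the Globus services and its collections stop appearing in
# searches. Off-node, the deployment key identifies the endpoint to delete.
globus-connect-server endpoint cleanup \
    --deployment-key deployment-key.json \
    --agree-to-delete-endpoint
```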

Jason

Joshua Brown

Apr 22, 2022, 1:50:16 PM
to Discuss, jaso...@globus.org, Discuss, Joshua Brown

Sorry for the delayed response.


>  Do you need an endpoint with 2 collections perhaps? Or does this mean you need an endpoint with 2 nodes?

I need two separate endpoints, each on a separate Globus server (sorry I wasn't clear), so I can test a transfer between them. Our application wraps the Globus transfer functionality, and we need to make sure that before and after the transfer a payload appears as expected, from the point of view of our application.


>  Before I get into your questions, I'd like to understand your use case better. Are you trying to launch new GCS endpoints on demand, owned by you (or some service account), preconfigured with collections, for use by some other targeted individuals or groups?

That was a loaded question. For the purpose of testing our application, we would be launching GCS endpoints with a service account. My questions are all in the context of standing up two separate endpoints automatically within the CI. What I would like is to start with a fresh install and tear everything down afterwards, rather than having machines sitting idle with Globus on them when the CI is not in use. Given the information I have, that may not be possible with Globus; I'm guessing another option is to manually set up two Globus endpoints and reuse them, without the creation and teardown.


>  Or is this the same endpoint(s) redeployed for recovery (or stateless nodes, or hardware migrations)?

I think I have already mostly answered this above. The answer is no. Our continuous integration pipelines use virtual machines, so hardware migrations are not a problem; the CI is typically not in use non-stop and is ideally ephemeral. In testing, I don't really want to preserve state after the testing is completed.


> No, at least not the way you may be thinking. The /authorize call performs the OAuth2 step in which you consent to allow the globus-connect-server CLI on that node to act on your behalf while interacting with the new endpoint. The consent step requires human interaction; hence the need to open a browser. If the consent has already been performed, then perhaps it could be automated (chicken and egg, I know). That's why I wonder what your specific use case is.

Well, it would be fine if I only had to manually authenticate once and then use the same information to stand up the Globus Servers. Is this possible? 

> I'm curious as to what triggers the deployment of these new endpoints (job submission? new hardware?), and if that interaction can be used to manually generate the token.

Exactly, a push to a GitHub job triggers the CI pipeline and that's when we want to stand up the servers.


> No. Once the client ID is used, it becomes the owner of certain resources throughout the Globus platform. Even once the endpoint and client ID are deleted, its ID is still maintained for internal purposes. Solvable? Maybe, but there is a more critical concern. The client ID is unique and trusted to identify a specific client (or in this case, a specific endpoint). Users can't rely on the ID as a reliable identifier if the endpoint can change from time to time (similarly to how I'm trusting that joshbr...@gmail.com has not changed and remains a valid identifier). Again, solvable? Maybe, but I don't think that's the real issue for this use case. I'd prefer that client IDs are easily and automatically generated.

I'm not picky about how it should be done, I just want a solution. 


> Just to be clear (for future readers too), deleting a VM (or node) containing GCS does nothing to release the resources associated with GCS within the Globus platform (the Globus AWS services). You'll want to run 'endpoint cleanup' so at the very least the Transfer collections aren't found by your users when they search for endpoints.

Noted.

Jason Alt

Apr 27, 2022, 12:32:06 PM
to Joshua Brown, Discuss
From the sounds of it, you'll want to manually set up two endpoints and keep the client ID/secret/deployment key to redeploy on demand. Getting the client ID/secret and performing the OAuth consent flows prevents you from deploying unique endpoints automagically on demand (all things we want to resolve). If you add an Auth client as an admin role on the endpoint, you can also modify the endpoint configuration on the fly from within CI without human interaction. That should be useful if you want to customize collections for specific CI events, but it'll require a bit of Python coding on your part.

Jason

Joshua Brown

May 19, 2022, 10:06:03 PM
to Discuss, jaso...@globus.org, Discuss, Joshua Brown
Hi Jason, 

 > If you add an Auth client as an admin role on the endpoint, you can also modify the endpoint configuration on-the-fly from within CI without human interaction. That should be useful if you want to customize collections for specific CI events, but it'll require a bit of python coding on your part.

Thanks for all of your useful comments. Can you walk me through the steps required to do this? Are you suggesting that I use the Globus SDK and register a Globus application that has admin rights?

Joshua Brown

May 19, 2022, 10:20:44 PM
to Discuss, Joshua Brown, jaso...@globus.org, Discuss

And to clarify further: I know how to register an application, but I'm not quite sure how to grant the admin scopes so it has the authorization to automate the setup steps for a Globus Connect Server.

Jason Alt

May 21, 2022, 12:55:25 PM
to Joshua Brown, Discuss
You could likely manually create a storage gateway and mapped collection, grant the CI client data access to the mapped collection (via allowed domains and identity mapping), then at runtime launch an instance of the endpoint and have CI jobs access predefined paths within the collection (based on job id perhaps). That's probably the simplest solution if all you need is a client-accessible mapped collection. If you have multiple, known client IDs, you just need to adjust the identity mapping. There's an explanation of how to do that in section 3 of https://docs.globus.org/globus-connect-server/v5.4/use-client-credentials/.

If you want the client to be able to configure the endpoint/gateway/mapped collections, you'll need to set the client as an administrator on the endpoint. In this example, the CLIENT_ID_USERNAME is the app client ID; this is not the endpoint ID.

$ globus-connect-server endpoint role create administrator $CLIENT_ID_USERNAME
Role ID: ef8a7108-d917-11ec-b37e-fdd01edbf245
$ globus-connect-server endpoint role list
Role ID                              | Role          | Principal                                                  
------------------------------------ | ------------- | ------------------------------------------------------------
62dd115a-10c9-11ec-a018-811dd7c5dbfa | administrator | jaso...@globus.org                                        
ef8a7108-d917-11ec-b37e-fdd01edbf245 | administrator | 4d6e9126-f428-4dd9...@clients.auth.globus.org
fc9ab067-5ce3-4815-bfed-59c6770b3ad3 | owner         | jaso...@globus.org          


In this example script, the client gets an access token with the 'manage_collections' scope, which allows it to interact with the GCS Manager API, and then creates a POSIX storage gateway and mapped collection.

#!/usr/bin/env python3

import globus_sdk
 
# Substitute your values here:
ENDPOINT_ID = "ENDPOINT_ID"
GCS_MANAGER_FQDN = "GCS_MANAGER_FQDN"
CLIENT_ID = "YOUR_APP_CLIENT_ID"
CLIENT_ID_USERNAME = CLIENT_ID + "@clients.auth.globus.org"
CLIENT_SECRET = "YOUR_APP_CLIENT_SECRET"
 
#
# We need an access token with the 'manage_collections' scope in order
# to interact with the GCS Manager API.
#

# The authorizer manages our access token for the scopes we request
authorizer = globus_sdk.ClientCredentialsAuthorizer(
    # The ConfidentialAppAuthClient authenticates us to Globus Auth
    globus_sdk.ConfidentialAppAuthClient(
        CLIENT_ID,
        CLIENT_SECRET
    ),
    f"urn:globus:auth:scope:{ENDPOINT_ID}:manage_collections"
)

# The access token is stored in authorizer.access_token
access_token = authorizer.access_token

#
# We'll need a GCS Client
# https://globus-sdk-python.readthedocs.io/en/stable/services/gcs.html
# (environment='sandbox' targets Globus's sandbox deployment; omit the
# argument to use the production environment)
#
gcs_client = globus_sdk.GCSClient(GCS_MANAGER_FQDN, environment='sandbox', authorizer=authorizer)

#
# Create a storage gateway. The SDK GCSClient doesn't currently have a member
# function for creating a storage gateway, so we'll make the POST call
# according to the GCS API docs.
# https://docs.globus.org/globus-connect-server/v5.4/api/openapi_Storage_Gateways/#postStorageGateway
#
gateway_doc = {
    'DATA_TYPE': 'storage_gateway#1.1.0',
    'display_name': 'My Unique Storage Gateway Display Name',
    # POSIX Connector ID
    'connector_id': '145812c8-decc-41f1-83cf-bb2a85a2a70b',
    # Set whichever domain you want to allow data access on the mapped collection. In this case,
    # the client will be able to access the mapped collection.
    'allowed_domains': ['clients.auth.globus.org'],
    # We only have a single domain so we aren't required to supply an identity_mapping, however,
    # I want to make sure this is the only client that maps _and_ I want to be able to map to a
    # more useful local username than the CLIENT_ID.
    'identity_mappings': [{
        'DATA_TYPE': 'expression_identity_mapping#1.0.0',
        'mappings': [{
            'source': '{username}',
            'match': CLIENT_ID_USERNAME,
            'output': 'ci_client',
        }]
    }],
    'policies': {'DATA_TYPE': 'posix_storage_policies#1.0.0'}
}

# Returns globus_sdk.response.GlobusHTTPResponse
resp = gcs_client.post('/storage_gateways', data=gateway_doc)
gateway_id = resp.data['data'][0]['id']

#
# Create a mapped collection on the storage gateway. This is supported by the SDK.
# https://globus-sdk-python.readthedocs.io/en/stable/services/gcs.html#globus_sdk.GCSClient.create_collection
# Returns UnpackingGCSResponse
# Collections doc reference: https://docs.globus.org/globus-connect-server/v5.4/api/schemas/Mapped_Collection_schema/
collection_doc = {
    'DATA_TYPE': 'collection#1.5.0',
    'collection_type': 'mapped',
    'display_name': 'My Client-Created Mapped Collection Display Name',
    'storage_gateway_id': gateway_id,
    'public': True,
    'collection_base_path': '/',
}

resp = gcs_client.create_collection(collection_doc)
collection_id = resp.data['id']

That created the gateway and mapped collection, and set my client username as the owner of the collection, along with an administrator role on the collection:

$ globus-connect-server storage-gateway list
Display Name                           | ID                                   | Connector | High Assurance | MFA  
-------------------------------------- | ------------------------------------ | --------- | -------------- | -----
My Unique Storage Gateway Display Name | 8d038f24-2e10-4f52-9308-58a9d068e944 | POSIX     | False          | False
$ globus-connect-server storage-gateway show 8d038f24-2e10-4f52-9308-58a9d068e944
Display Name:                My Unique Storage Gateway Display Name
ID:                          8d038f24-2e10-4f52-9308-58a9d068e944
Connector:                   POSIX
High Assurance:              False
Authentication Timeout:      15840
Multi-factor Authentication: False
Allowed Domains:             ['clients.auth.globus.org']
$ globus-connect-server collection list
ID                                   | Display Name                                     | Owner                                                        | Collection Type | Storage Gateway ID                  
------------------------------------ | ------------------------------------------------ | ------------------------------------------------------------ | --------------- | ------------------------------------
c458e931-3b73-4798-9729-43f1a4de3870 | My Client-Created Mapped Collection Display Name | 4d6e9126-f428-4dd9...@clients.auth.globus.org | mapped          | 8d038f24-2e10-4f52-9308-58a9d068e944
$ globus-connect-server collection show c458e931-3b73-4798-9729-43f1a4de3870
Display Name:                My Client-Created Mapped Collection Display Name
Owner:                       4d6e9126-f428-4dd9...@clients.auth.globus.org
ID:                          c458e931-3b73-4798-9729-43f1a4de3870
Collection Type:             mapped
Storage Gateway ID:          8d038f24-2e10-4f52-9308-58a9d068e944
Connector:                   POSIX
Allow Guest Collections:     False
Disable Anonymous Writes:    False
High Assurance:              False
Authentication Timeout:      15840
Multi-factor Authentication: False
Manager URL:                 https://1008a.8540.sandbox2.zones.dnsteam.globuscs.info
TLSFTP URL:                  tlsftp://m-fe434a.1008a.8540.sandbox2.zones.dnsteam.globuscs.info:443
Force Encryption:            False
Public:                      True
Contact E-mail:              jaso...@globus.org
$ globus-connect-server collection role list c458e931-3b73-4798-9729-43f1a4de3870
Role ID                              | Collection ID                        | Role          | Principal                                                  
------------------------------------ | ------------------------------------ | ------------- | ------------------------------------------------------------
9653a0c0-d924-11ec-b37e-fdd01edbf245 | c458e931-3b73-4798-9729-43f1a4de3870 | administrator | 4d6e9126-f428-4dd9...@clients.auth.globus.org

From that example, hopefully you can construct other GCS API calls to configure the endpoint as needed using the SDK (https://globus-sdk-python.readthedocs.io/en/stable/) and API (https://docs.globus.org/globus-connect-server/v5.4/api/#api_reference) references.
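As one more instance of the same pattern, listing what the client can now see might look like this. It is a sketch reusing the `gcs_client` configured in the script above; `get_collection_list` is in the SDK's GCSClient reference, and the `summarize_collections` helper plus the `{'data': [...]}` response shape are assumptions drawn from the gateway-creation response earlier:

```python
def summarize_collections(doc):
    """Reduce a GCS list response body (assumed {'data': [...]} shape) to
    (id, display_name) pairs."""
    return [(c["id"], c.get("display_name", "")) for c in doc.get("data", [])]

# With a live client (as configured in the script above):
# resp = gcs_client.get_collection_list()
# for cid, name in summarize_collections(resp.data):
#     print(cid, name)

# Offline demonstration with a response-shaped dict:
sample = {
    "DATA_TYPE": "result#1.0.0",
    "data": [{"id": "c458e931-3b73-4798-9729-43f1a4de3870",
              "display_name": "My Client-Created Mapped Collection Display Name"}],
}
print(summarize_collections(sample))
```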

Jason

Joshua Brown

May 23, 2022, 10:57:34 AM
to Discuss, jaso...@globus.org, Joshua Brown
Thank you for the very helpful response!

I am most interested in the second scenario in which you provided very instructive feedback. I wanted to clarify a few points:

If the node goes down and I lose everything except the deployment key (from the endpoint create command) and the node configuration file (created with the --export-node flag), what do I need to get everything back up and running, and which steps still require manual interaction? I do not understand what configuration is stored in the Globus cloud and what needs to be rerun. The only thing I know for certain is that the Globus cloud does not contain the information in the deployment key. For instance, if I lose the node, do I need to recreate the collections and get new UUIDs, or are the old ones still valid? The same question applies to the storage gateways. And if I need to recreate the storage gateways and collections, do I need to delete the old ones that might still be registered in the Globus cloud?

Possible steps from my understanding:
  a. Reinstall the Globus Connect Server (can be automated)
  b. Recreate the endpoint with the deployment key (does this have to be manual?)
  c. Recreate the node using the --import-node flag with the node configuration file (automated with the CLI - I do not see how to do this with the API or Globus SDK)
  d. Rerun the Python script to create the collections and storage gateway (can be automated; delete the old collections and gateway?)


Jason Alt

May 23, 2022, 5:26:28 PM
to Joshua Brown, Discuss
Everything except the client id, client secret, deployment key and node configuration file is stored in the Globus AWS services (encrypted). The only thing you need to do to get back up and running is:

# globus-connect-server node setup --import-node <node_config> --deployment-key <deployment-key> --client-id <client_id>

`node setup` pulls down the latest configuration for the endpoint including gateways, collections, roles, sharing policies, etc. At that point, the node should be fully operational with every defined collection; no need to recreate anything. 
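Put together with the reinstall step, a recovery run could be sketched as follows (the package name and flags are per the GCS v5.4 docs; adjust for your distribution and GCS version):

```shell
# a. Reinstall Globus Connect Server (package name varies by distro repo)
sudo dnf install -y globus-connect-server54

# b./c. Recreate the node from the saved state; this re-registers the node
#       and pulls down the endpoint's gateways, collections, roles, etc.
globus-connect-server node setup \
    --import-node node-config.json \
    --deployment-key deployment-key.json \
    --client-id "$CLIENT_ID"
```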