Question about flow setup


Anthony Weaver

Jan 31, 2025, 9:06:26 AM
to Discuss
I'm starting to work on setting up my first flow and I have some questions. Here is basically what I'm looking to do: I'd like to set up a two-part flow that does a data transfer and then a compute step. Here's where I'm not sure what to do. I've already got a collection set up on one endpoint that we've been transferring to.

The compute, though, will happen on a different machine. I understand that I will have to set up GCS on the machine that does the compute (it's a Linux server), but on the compute device we already NFS-mount the underlying directory for the Globus collection that we are transferring the data to. My question is: on the compute device, do I also have to set up a Globus collection, or how exactly do I handle the compute piece?

Thank you in advance for any help and tips.

Lev Gorenstein

Jan 31, 2025, 12:45:13 PM
to Anthony Weaver, Discuss
Tony,

I'll let colleagues speak to more details, but generally speaking:
  • on the compute device you would install a Globus Compute endpoint (single-user or multi-user) and configure it to use specific executors (be that localhost, a cluster scheduler, etc.)
  • you would also need to install a Globus Compute client (it can be on the same compute device or anywhere else, just as you could use the Globus CLI from anywhere)
  • with the compute client, you write and register a function to be executed

At this point, armed with the compute endpoint UUID and function UUID, you will be able to execute this function on that endpoint - via the web app, via CLI commands, or by calling it from inside the flow using the Compute Action Provider.
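
To make this concrete, here is a minimal sketch of that register-then-run pattern using the globus-compute-sdk (the double function is just a stand-in; the endpoint UUID shown is the public tutorial endpoint, so swap in your own):

    from globus_compute_sdk import Client, Executor

    def double(x):
        return 2 * x

    # Register the function; keep this UUID for use in a flow later.
    gcc = Client()
    func_uuid = gcc.register_function(double)
    print(f"function_id: {func_uuid}")

    # Submit a test task and wait on the returned future. The UUID below is
    # the public tutorial endpoint; replace it with your own endpoint's UUID.
    with Executor(endpoint_id="4b116d3c-1703-4f8f-9f6f-39921e5864df") as gce:
        future = gce.submit(double, 21)
        print(future.result())  # -> 42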

Here's a quickstart guide, a tutorial, and full-blown detailed documentation.
You could also play with things in our hosted Jupyter notebook.

TL;DR: compute endpoints are completely separate and independent from transfer endpoints (GCP or GCS). It sounds like you do not need GCS on your compute device (because it mounts the filesystem where the data is being transferred to). But you do need to install the Compute endpoint there (because that's what accepts and processes requested compute tasks).


Lev
--
Lev Gorenstein
Solutions Architect
Globus // University of Chicago
e: l...@globus.org

Ada Nikolaidis

Jan 31, 2025, 1:13:51 PM
to Anthony Weaver, Discuss, Lev Gorenstein
Anthony,

It sounds like your intuition here is good! As Lev mentioned, if you already have the storage from the destination collection mounted such that it's accessible to the Compute endpoint (with proper permissions in place, of course), then I believe this configuration could work as you've described.

One thing worth mentioning from the Flows/Compute side of the equation: If the paths to these files are different from the POV of the Compute and GCS endpoints—for instance, as a consequence of how this storage is being mounted—you'll want to be sure to account for that somewhere. That could either be applied by the Compute function as a transformation to any paths that it receives as input (or emits as output); or it could be performed in the flow, by rewriting any path values sent to/received from the Compute function.
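
For instance (with purely hypothetical mount prefixes, not taken from your actual setup), that translation could live inside the registered Compute function itself:

    # Suppose the GCS collection serves paths under /projects/lab and the
    # compute node NFS-mounts the same storage at /mnt/nfs/lab.
    def process_file(collection_path):
        # Compute functions are serialized and shipped to the endpoint,
        # so keep imports and any helpers inside the function body.
        import os
        GCS_PREFIX = "/projects/lab"
        LOCAL_PREFIX = "/mnt/nfs/lab"
        local_path = LOCAL_PREFIX + collection_path[len(GCS_PREFIX):]
        return {"path": local_path, "bytes": os.path.getsize(local_path)}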

Good luck and let us know if you run into any issues!

Ada Nikolaidis
Globus Software Engineering Manager
Globus.org | University of Chicago

Lei Wang

Jan 31, 2025, 1:22:34 PM
to Lev Gorenstein, Anthony Weaver, Discuss
Anthony,

Lev has already linked the documentation for most things you need; I'll just add some details on the Compute side of the setup.

The condensed list of TODOs:
  • On the Compute server/machine:
    • Install the Compute Endpoint (not a GCS/GCP endpoint, since you already have the files to be read/modified mounted locally)
    • Configure and start it, e.g. globus-compute-endpoint configure first_ep; globus-compute-endpoint start first_ep, which will print an EP_UUID
    • Register a function with Compute and get a FUNC_UUID (this can be done from this machine or any other where you've installed the globus-compute-sdk)
    • Try running the function in a sample .py script, e.g. Client.run('my_argument', endpoint_id=EP_UUID, function_id=FUNC_UUID), to ensure it gives the results you expect
  • Create a 2+ step flow where the first step is transfer-related and the second sends a Compute task to the Compute server
    • The key parts of the Compute Action step in the Flow Definition:

      "Type": "Action",
      "ActionUrl": "https://compute.actions.globus.org/v3",
      "Parameters": {
        "tasks": [
          {
            "args.=": "getattr('args')",        <-- Passes 'args' from your Flow input schema to Compute
            "kwargs.=": "getattr('kwargs')",    <-- Same for 'kwargs'; if you pass collection paths, note the absolute paths can differ between GCS and Compute
            "function_id.$": "$.function_id"    <-- This comes from user input, but you can hardcode it to the FUNC_UUID from above
          }
        ],
        "endpoint_id": "4b116d3c-1703-4f8f-9f6f-39921e5864df"   <-- Put your EP_UUID here. This one (4b11...) is the Tutorial Endpoint, which anyone can test with
      },
      "ResultPath": "$.RunResult"    <-- The output of the Python function, which may contain stack traces
  • Run the Flow (a sketch for starting it follows below)
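
For that last step, here is a sketch of starting a run from Python with globus-sdk (assuming a recent version with the GlobusApp login helpers; FLOW_UUID, the client ID, and the input values are placeholders, and a real two-step flow would also take its transfer parameters in the same input document). The input keys just need to line up with the Parameters mappings above:

    import globus_sdk

    # Placeholders: your deployed flow's UUID and a native app client ID.
    FLOW_UUID = "YOUR-FLOW-UUID"
    app = globus_sdk.UserApp("flow-runner", client_id="YOUR-CLIENT-ID")
    flow = globus_sdk.SpecificFlowClient(FLOW_UUID, app=app)

    run = flow.run_flow(
        body={
            # Forwarded to the Compute function via the args/kwargs mappings.
            "args": ["/projects/lab/sample.dat"],
            "kwargs": {},
            # Fills the "$.function_id" reference in the flow definition.
            "function_id": "YOUR-FUNC-UUID",
        },
        label="transfer-then-compute test",
    )
    print("run_id:", run["run_id"])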

It's great to hear you are combining Transfer and Compute. Please do let us know if any of the steps are confusing and we will help, and improve the documentation as well.

Lei

Anthony Weaver

Jan 31, 2025, 1:56:15 PM
to Discuss, l...@globus.org, Anthony Weaver
Thank you for all the replies. I did look at many of the documents you referenced, but there were one or two in there I hadn't looked at yet, which I'm sure will come in handy.
Ada, your response was very helpful and was what I was trying to get at with my question. Basically, as long as the software running on the compute node knows how to get to the data, I don't need to create a collection on the compute node itself.