
Difference between web transfer and Globus flow transfer?


Anthony Weaver

Feb 27, 2025, 2:20:06 PM
to Discuss
I am noticing a difference between what happens when I do a transfer through Globus on the web versus when I do it as part of a Globus flow and I am looking for guidance on how to get the flow to match what happens in the web app.  

First, in the web app just doing a transfer,  I select my endpoint and then drill down to the folder I want to transfer.  In my case the base path is
/C/Users/hogdo/OneDrive/Desktop/
And the folder I want to transfer is STA_25_2_CSTA007

On the endpoint being transferred to, it creates the STA_25_2_CSTA007 folder (and all its subdirectories), which is what I want.

In my flow, I have defined
"properties": {
    "source": {
      "type": "object",
      "title": "Source for transfer",
      "format": "globus-collection",
      "required": [
        "id",
        "path"
      ],
      "properties": {
        "id": {
          "type": "string",
          "title": "Source collection",
          "format": "uuid"
        },
        "path": {
          "type": "string",
          "title": "Source path"
        }
      },
      "propertyOrder": [
        "id",
        "path"
      ],

When I run the flow, the full path of the source is 
/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007
and what gets transferred to the destination endpoint is the DCIM folder inside the STA_25_2_CSTA007 folder.  I couldn't seem to find any examples where I could select the base path and then a folder such that STA_25_2_CSTA007 gets transferred.  I tried doing recursive True inside the flow with no discernible difference in results.

Thank you for your patience and help.

Ada Nikolaidis

Feb 27, 2025, 4:38:29 PM
to Anthony Weaver, Discuss
Hi Tony,

In your flow's transfer action, how are you setting the destination_path? I believe that the behavior you're seeing in the Web App is a result of the target folder being specified in the destination_path.

For example, here's a sample transfer request from the Web App:

{
  "DATA_TYPE": "transfer",
  "DATA": [
    {
      "DATA_TYPE": "transfer_item",
      "source_path": "/home/ada/STA_25_2_CSTA007/",
      "destination_path": "/home/ada/STA_25_2_CSTA007/",
      "recursive": true
    }
  ],
  "submission_id": "af8330bf-2f25-4758-8ea1-419235500263",
  "source_endpoint": "15bcadf8-fc7d-4a56-9b51-4f572b3eb441",
  "destination_endpoint": "62d5371b-7ed6-4da9-9512-4c58c9fea31a",
  "deadline": null,
  "delete_destination_extra": false,
  "encrypt_data": false,
  "fail_on_quota_errors": false,
  "filter_rules": null,
  "label": null,
  "preserve_timestamp": false,
  "skip_source_errors": false,
  "sync_level": null,
  "verify_checksum": true,
  "notify_on_succeeded": true,
  "notify_on_failed": true,
  "notify_on_inactive": true,
  "store_base_path_info": true
}


If you look above at the DATA array, you'll see that the source_path and destination_path both include the target folder name as the last component of the path. I suspect that the destination_path in your flow may be omitting that target folder name at the end of the path. Could you take a look to see whether it's included in the destination path? If not, it might help to see your flow definition to get a better idea of what's going on.
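
To illustrate why the last path component matters, here is a rough Python sketch of how files under a source directory would land under a destination directory. It is only a model of the behavior described in this thread, not Globus code: map_to_destination is a made-up helper, and the file name IMG_0001.JPG is a hypothetical example.

```python
import posixpath


def map_to_destination(source_dir: str, destination_dir: str, source_file: str) -> str:
    """Map a file under source_dir to the matching location under destination_dir."""
    # Path of the file relative to the directory being transferred.
    relative = posixpath.relpath(source_file, source_dir)
    return posixpath.join(destination_dir, relative)


source_dir = "/home/ada/STA_25_2_CSTA007/"
source_file = "/home/ada/STA_25_2_CSTA007/DCIM/IMG_0001.JPG"  # hypothetical file

# Destination path repeats the folder name: the folder appears on the destination.
print(map_to_destination(source_dir, "/home/ada/STA_25_2_CSTA007/", source_file))

# Destination path omits the folder name: DCIM lands directly under /home/ada/.
print(map_to_destination(source_dir, "/home/ada/", source_file))
```

In this model the source folder's own name never appears in the result unless the destination path carries it, which matches the DCIM-only result reported at the start of the thread.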

Best,

Ada Nikolaidis
Globus Software Engineering Manager
Globus.org | University of Chicago

Anthony Weaver

Feb 27, 2025, 5:25:27 PM
to Ada Nikolaidis, Discuss
Ada,

I believe I may not have been clear.  If I use https://apps.globus.org to manually execute a transfer (outside of any flow), that is where I get the behavior I want.
The log for that transfer shows this information:
[attached image: image.png]
In the source path I don't see STA_25_2_CSTA007, but it does get created on the destination.

In my flow, my log shows:
{
  "input": {
    "threshold": "0.7",
    "batch_size": "96",
    "destination": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/CameraTrapArchive/Conservancy/Stager/"
    },
    "output_location": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/Output/"
    },
    "source": {
      "id": "96a9a3e9-f484-11ef-a975-0207be7ee3a1",
      "path": "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007/"
    },
    "json_file": "tony_test.json"
  }
}

The entire flow definition is:
{
  "States": {
    "ProcessImages": {
      "End": true,
      "Type": "Action",
      "Comment": "Process cameratrap images using pytorch wildlife",
      "WaitTime": 3600,
      "ActionUrl": "https://compute.actions.globus.org/v3",
      "Parameters": {
        "tasks": [
          {
            "kwargs": {
              "image_dir.$": "$.destination.path",
              "json_file.$": "$.json_file",
              "threshold.$": "$.threshold",
              "batch_size.$": "$.batch_size",
              "output_dir.$": "$.output_location.path"
            },
            "function_id": "22dce45c-081a-4535-bf9a-1ebf75bc89c6"
          }
        ],
        "endpoint_id": "c82c8b96-7548-432b-82e8-4d8d0e58b415"
      },
      "ResultPath": "$.ProcessImages"
    },
    "TransferFiles": {
      "Next": "ProcessImages",
      "Type": "Action",
      "Comment": "Transfer files",
      "WaitTime": 3600,
      "ActionUrl": "https://transfer.actions.globus.org/transfer",
      "Parameters": {
        "DATA": [
          {
            "source_path.$": "$.source.path",
            "destination_path.$": "$.destination.path"
          }
        ],
        "source_endpoint.$": "$.source.id",
        "destination_endpoint.$": "$.destination.id"
      },
      "ResultPath": "$.TransferFiles"
    }
  },
  "Comment": "Transfer and process files by invoking a Globus Compute function",
  "StartAt": "TransferFiles"
}

Ada Nikolaidis

Feb 27, 2025, 6:17:51 PM
to Anthony Weaver, Discuss
Hi Tony,

I actually think we're on the same page. The sample I sent is a captured request from the Globus Web App at https://app.globus.org to the Globus Transfer service, which illustrates the request payload. Try revising your flow input to:

{
  "input": {
    "threshold": "0.7",
    "batch_size": "96",
    "destination": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/CameraTrapArchive/Conservancy/Stager/STA_25_2_CSTA007/"
    },
    "output_location": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/Output/"
    },
    "source": {
      "id": "96a9a3e9-f484-11ef-a975-0207be7ee3a1",
      "path": "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007/"
    },
    "json_file": "tony_test.json"
  }
}

Note the destination "path" line above, which has changed to end in STA_25_2_CSTA007/. According to your flow definition, the transfer action's destination_path comes from your input ($.destination.path), so changing that value to include the target folder should result in the creation of that folder on the destination.
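
If the flow input is assembled by a script rather than typed by hand, the same change could be made before the run is started. The sketch below is plain Python and an assumption about how one might do that; with_source_folder_in_destination is a hypothetical helper, not part of any Globus library.

```python
import copy
import posixpath


def with_source_folder_in_destination(flow_input: dict) -> dict:
    """Return a copy of flow_input whose destination path ends with the source folder name."""
    updated = copy.deepcopy(flow_input)  # leave the caller's dict untouched
    # Strip the trailing slash first so basename returns the folder name, not "".
    folder = posixpath.basename(updated["source"]["path"].rstrip("/"))
    base = updated["destination"]["path"].rstrip("/")
    updated["destination"]["path"] = f"{base}/{folder}/"
    return updated


flow_input = {
    "destination": {
        "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
        "path": "/CameraTrapArchive/Conservancy/Stager/",
    },
    "source": {
        "id": "96a9a3e9-f484-11ef-a975-0207be7ee3a1",
        "path": "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007/",
    },
}

print(with_source_folder_in_destination(flow_input)["destination"]["path"])
# /CameraTrapArchive/Conservancy/Stager/STA_25_2_CSTA007/
```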

Let me know if/how the resulting behavior differs from your expectation after making this change!

Best,

Ada Nikolaidis
Globus Software Engineering Manager
Globus.org | University of Chicago

Anthony Weaver

Feb 27, 2025, 6:24:36 PM
to Ada Nikolaidis, Discuss
Ada,

So is there a way to do that programmatically in the JSON files?  On my destination endpoint, the STA_25_2_CSTA007 directory
may not exist, and if it doesn't, I can't select it in the flow form.  If there is an example that does this already, you can just point me to it;
you don't have to give me the code.

Thank you

Ada Nikolaidis

Feb 27, 2025, 7:15:16 PM
to Anthony Weaver, Discuss
Hi Tony,

Sure thing! Here's a simplified (single action) version of your flow that shows one approach:

{
  "States": {
    "TransferFiles": {
      "End": true,
      "Type": "Action",
      "Comment": "Transfer files",
      "WaitTime": 3600,
      "ActionUrl": "https://transfer.actions.globus.org/transfer",
      "Parameters": {
        "DATA": [
          {
            "source_path.$": "$.source.path",
            "destination_path.=": "destination['path'].rstrip('/') + '/' + pathsplit(source['path'])[-1]"
          }
        ],
        "source_endpoint.$": "$.source.id",
        "destination_endpoint.$": "$.destination.id"
      },
      "ResultPath": "$.TransferFiles"
    }
  },
  "Comment": "Transfer and process files by invoking a Globus Compute function",
  "StartAt": "TransferFiles"
}


Note specifically the field destination_path.=, which uses the Flows expression syntax to add the final component of the source path to the end of the destination path.

You can find more information on Flows expressions on the docs site: https://docs.globus.org/api/flows/authoring-flows/expressions/. Some of our sample flows on the docs site also perform path manipulations, e.g., https://docs.globus.org/api/flows/examples/two_hop/.

Hope this helps!

Best,

Ada Nikolaidis
Globus Software Engineering Manager
Globus.org | University of Chicago

Anthony Weaver

Feb 28, 2025, 10:17:26 AM
to Ada Nikolaidis, Discuss
Ada,

I added your code and there was still an issue with pathsplit.  In an effort to try to debug I
added a step to my flow to compute some values to use before calling transfer and compute:

"ComputedValues":{
        "Comment": "Determine the paths needed for image directory and output",
        "Type": "ExpressionEval",
        "Parameters": {
          "image_dir.=": "destination['path'].rstrip('/') + '/' + pathsplit(source['path'])[-1]",
          "output_dir.=": "output_location['path'].rstrip('/') + '/' + pathsplit(source['path'])[-1]",
          "json_file.=": "pathsplit(source['path'])[-1] + '.json'"
        },
        "ResultPath": "$.ComputedValues",
        "Next": "TransferFiles"
      },

This is the output log from this step:

{
  "state_name": "ComputedValues",
  "state_type": "ExpressionEval",
  "output": {

    "threshold": "0.7",
    "batch_size": "96",
    "destination": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/CameraTrapArchive/Conservancy/Stager/"
    },
    "output_location": {
      "id": "c36fadc9-ac6e-46cb-82f0-8307b2185cfc",
      "path": "/Output/"
    },
    "source": {
      "id": "96a9a3e9-f484-11ef-a975-0207be7ee3a1",
      "path": "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007/"
    },
    "ComputedValues": {
      "image_dir": "/CameraTrapArchive/Conservancy/Stager/",
      "output_dir": "/Output/",
      "json_file": ".json"
    }
  }
}

As you can see from the output, pathsplit seems to be doing nothing or at least not adding its output to the string.
I changed the index to 0, then the output is:

"ComputedValues": {
      "image_dir": "/CameraTrapArchive/Conservancy/Stager//C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007",
      "output_dir": "/Output//C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007",
      "json_file": "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007.json"
    }

Based on this I changed my ComputedValues step to be:
"Parameters": {
          "image_dir.=": "destination['path'].rstrip('/') + '/' + pathsplit(source['path'].rstrip('/'))[-1]",
          "output_dir.=": "output_location['path'].rstrip('/') + '/' + pathsplit(source['path'].rstrip('/'))[-1]",
          "json_file.=": "pathsplit(source['path'].rstrip('/'))[-1] + '.json'"
        },

And that then showed the proper paths.  That's the long way of saying: make sure to remove the trailing / from the path before calling pathsplit.
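
For anyone who wants to try this outside of a flow, the outputs above are consistent with how Python's posixpath.split treats a trailing slash. Whether pathsplit is implemented the same way is an assumption on my part; the sketch below only uses posixpath.split as a stand-in, and append_source_folder is a made-up helper name.

```python
import posixpath


def append_source_folder(base_path: str, source_path: str, strip_source: bool) -> str:
    """Mimic the flow expression, using posixpath.split as a stand-in for pathsplit."""
    if strip_source:
        # The fix: drop the trailing slash before splitting.
        source_path = source_path.rstrip("/")
    return base_path.rstrip("/") + "/" + posixpath.split(source_path)[-1]


source = "/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007/"
stager = "/CameraTrapArchive/Conservancy/Stager/"

# With a trailing slash the last element of the split is an empty string.
print(posixpath.split(source))
# ('/C/Users/hogdo/OneDrive/Desktop/STA_25_2_CSTA007', '')

print(append_source_folder(stager, source, strip_source=False))
# /CameraTrapArchive/Conservancy/Stager/

print(append_source_folder(stager, source, strip_source=True))
# /CameraTrapArchive/Conservancy/Stager/STA_25_2_CSTA007
```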