Globus transfer of a specific file type


sara azidane

Oct 27, 2021, 9:26:41 AM
to Discuss
Hi! 

I am fairly new to Globus. I have a folder in the cloud containing a lot of data of mixed types, and I would like to transfer some of it, but not all, from that endpoint to my local one. Specifically, I only want to transfer the files that have a specific word in their filename.

From what I have seen, to do a transfer from one endpoint to another, I would have to run the following:

         globus transfer [options] <from-endpoint>:<from-path> <to-endpoint>:<to-path>

But I can't find any documentation on how to limit the transfer to a specific type of data, or files with a specific name. 

Thanks in advance for your time! :)


Stephen Rosen

Oct 27, 2021, 11:27:25 AM
to sara.a...@gmail.com, Discuss
Hi,

Unfortunately, there isn't a native filtering option in Transfer submission.
This is on our radar as a desired feature; you aren't the first to encounter this limitation.

Your best options, as far as I know, are to do one of the following:

1. Apply filtering in a step when preparing data for transfer, e.g. prepare data with `tar --exclude=...`

This is also a good time to apply data compression.
However, for very large datasets, or those for which you have no non-Globus access to the source, it's generally not viable.

2. Do a filtered file listing and submit a non-recursive "batch" transfer

If you're using the CLI, this is something like running `globus ls --filter=...` and then using that output as input to `globus transfer --batch batch_file.txt ...`
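As a rough sketch of the "listing to batch file" step, the helper below turns a plain list of file names into batch lines. The function name `make_batch` is hypothetical, and it assumes one name per line on stdin, no whitespace in names, and that each batch line has the form `SOURCE_PATH DEST_PATH` relative to the paths given on the command line.

```shell
# make_batch WORD  (hypothetical helper, not part of the Globus CLI)
# Reads one file name per line on stdin and prints a "SOURCE DEST" batch
# line for every name that contains WORD.
# Assumption: names contain no whitespace, so no quoting is applied.
make_batch() {
  word="$1"
  while IFS= read -r name; do
    case "$name" in
      */) ;;                                          # directory entry: skip (non-recursive batch)
      *"$word"*) printf '%s %s\n' "$name" "$name" ;;  # match: same relative path on both sides
    esac
  done
}
```

It could then be used along the lines of `globus ls "$SOURCE_EP:/data/" | make_batch sample > batch_file.txt`, where `SOURCE_EP` and `/data/` are placeholders, followed by the `globus transfer --batch batch_file.txt ...` call mentioned above. Doing the match locally sidesteps the exact `--filter` syntax, at the cost of listing the whole directory.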

3. Transfer all the data and then do a "glob" delete

If the overhead is not too high for you, and the filtering works for your case, you could transfer all of the data and then delete unwanted files in the destination.
For example:

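    # Submit the transfer and capture its task ID.
    task_id="$(globus transfer --batch transfer_data.txt "$SOURCE" "$DEST" --jmespath 'task_id' --format=UNIX)"
    # Wait for the transfer to finish, then delete unwanted files by glob.
    globus task wait "$task_id"
    globus delete --enable-globs "$DEST" --batch delete_data.txt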

The contents of `delete_data.txt` matter a lot here. With globs enabled, the delete could handle inputs like `*.txt` or `*.jpe?g`.
I'm happy to explain more if this solution interests you.


Thanks,
-Stephen

Aaron Schaer

Oct 27, 2021, 12:06:29 PM
to Stephen Rosen, sara.a...@gmail.com, Discuss
Hi!

The `--exclude` option on the `globus transfer` command might also be useful. It filters out files and directories that have names matching a given pattern during recursive transfers, and can be passed multiple times to filter out multiple patterns.

It sounds like your use case really calls for include filtering, which is on our radar. Depending on which files you want to limit your transfer to, though, it might be possible to use exclude filtering to omit all other files.
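To illustrate passing `--exclude` several times, here is a minimal sketch of a wrapper that turns a list of patterns into repeated `--exclude` flags on a recursive transfer. The function name `transfer_excluding` and the example patterns are hypothetical; only `globus transfer`, `--recursive`, and `--exclude` come from the CLI itself.

```shell
# transfer_excluding SRC DST PATTERN...  (hypothetical wrapper)
# Runs a recursive transfer from SRC to DST, passing each PATTERN
# as its own --exclude flag.
transfer_excluding() {
  src="$1"
  dst="$2"
  shift 2
  n=$#
  # Rotate through the original patterns: append "--exclude PATTERN"
  # to the argument list, then drop the bare pattern from the front.
  while [ "$n" -gt 0 ]; do
    set -- "$@" --exclude "$1"
    shift
    n=$((n - 1))
  done
  globus transfer --recursive "$@" "$src" "$dst"
}
```

A call might look like `transfer_excluding "$SOURCE_EP:/data/" "$DEST_EP:/data/" '*.log' '*.tmp'`, with `SOURCE_EP` and `DEST_EP` standing in for your endpoint IDs. Quote the patterns so the local shell does not expand them before they reach Globus.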

Best,
-Aaron

Stephen Rosen

Oct 27, 2021, 12:15:20 PM
to Aaron Schaer, sara.a...@gmail.com, Discuss
Ah, yes, I apologize for my inaccurate response! Thank you, Aaron, for correcting this.

`--exclude` on Transfers is a relatively recent addition to address this exact use-case.

I should note that you would probably never want to use the transfer-and-delete solution I offered above (3), as it should always be possible to express the same meaning with an `--exclude`.