Skipping missing files in a Globus transfer task


Co, Michele (mc2zk)

Jul 31, 2020, 12:33:31 PM
to user-d...@globus.org

Hi Globus User Discussion Group,

 

We have a user who submitted a Globus transfer task (multiple subdirectories, recursive, checksum copy) via the Globus WebUI.  It appears that when he submitted the transfer, certain files existed at the source endpoint and were added to the transfer task, but by the time Globus attempts to transfer one of these files, it no longer exists.  Globus performs a file existence check and reports ‘file not found’, as it should, but the transfer task then no longer makes progress because it does not seem to be able to skip the file that is now missing at the source endpoint.

 

Is there a way in Globus to indicate a timeout after which Globus will move on to attempting to transfer the next file in the transfer task?  Alternately, is there a way to request that Globus ignore files (or, produce an error/warning and continue) in its transfer task list that are no longer found?  Is the user’s only recourse to cancel and resubmit the entire transfer request?

 

I’ve looked through the Globus Transfer API and haven’t found a transfer specification field that might be relevant.

 

Thanks in advance!

Michele

 

Jonathan Silverstein

Jul 31, 2020, 12:46:15 PM
to user-d...@globus.org
Consider Sync?

FWIW: On the rare occasions when this happens, we follow up with a sync-type transfer in the UI. This achieves what we want:
1. we get the notification you refer to, that what was originally planned has changed, along with what we think is an appropriate failure of the transfer
2. the submitter can then decide what to do: resubmit the transfer request, do a sync transfer request, other…

Cheers,

Jonathan



Jonathan C. Silverstein, MD, MS, FACS, FACMI
Chief Research Informatics Officer https://rio.pitt.edu
and Institute for Precision Medicine https://ipm.pitt.edu

Professor
Department of Biomedical Informatics (DBMI) https://dbmi.pitt.edu
School of Medicine https://www.medschool.pitt.edu
and Clinical and Translational Science Institute https://ctsi.pitt.edu

Affiliate Scholar



Stephen Rosen

Jul 31, 2020, 1:30:11 PM
to User Discuss
There isn't an option at present to do exactly what you're asking for -- it's hard for Globus to know whether the file is really gone, if its removal was intentional, if maybe it's a multi-node setup with a single bad node (e.g. autofs mount failed, now all files are "missing"), etc etc.

The norm in this case is for the user to cancel and resubmit.
From the service's perspective, that keeps things unambiguous and simple -- but we do understand it might be annoying and unwanted for a user!

You can specify a deadline for a transfer task, at which point the task is automatically cancelled, although app.globus.org does not support this feature today.
So it is possible to submit a task with a short lifetime, and then handle the task failure with a complete retry.

For anyone reading this and thinking of writing automation for this purpose, I'll share how to do it in the globus-cli (it is, of course, possible with our SDK or direct use of the API as well):
    # warning: using GNU-date to get a well-formatted date six hours in the future
    # BSD-date (e.g. on macOS) does not support this syntax
    globus transfer --deadline "$(date -d '+6 hours' +"%Y-%m-%dT%H:%M:%S")" ...
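Where BSD date is the only option, a rough equivalent can be built on its `-v` adjustment flag. The helper below is a minimal sketch: the name `deadline_in_hours` is hypothetical and not part of globus-cli, and it simply prints local time in the same format as the command above. The thread does not say which time zone the CLI assumes for `--deadline`, so check that before relying on it.

```shell
# Hypothetical helper (not part of globus-cli): print a timestamp N hours from
# now, in the same format as the GNU-date command above.
deadline_in_hours() {
    hours="$1"
    fmt="+%Y-%m-%dT%H:%M:%S"
    # BSD date (e.g. on macOS) adjusts the current time with -v;
    # GNU date rejects -v and prints nothing on stdout
    if date -v "+${hours}H" "$fmt" 2>/dev/null; then
        return 0
    fi
    # GNU date accepts a relative offset through -d
    date -d "+${hours} hours" "$fmt"
}
```

It would then be used as `globus transfer --deadline "$(deadline_in_hours 6)" ...`.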

Deadlines don't *exactly* line up with what you want here, since they apply to the whole task.
However, for relatively small tasks, setting a deadline and retrying on failures might get you better behavior.
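The deadline-and-retry approach could be automated roughly as follows. This is a sketch under several assumptions: the function name `transfer_with_deadline`, the six-hour deadline, and the default of three attempts are all illustrative; it expects a logged-in globus-cli and GNU date; and it assumes a task that hits its deadline ends with status FAILED. Each resubmission uses `--sync-level checksum`, in the spirit of the sync suggestion earlier in the thread, so that files which already arrived intact are not copied again.

```shell
# Sketch only: transfer_with_deadline is a hypothetical helper, and SRC/DST are
# placeholders of the form ENDPOINT_ID:/path/ supplied by the caller.
transfer_with_deadline() {
    src="$1"
    dst="$2"
    max_attempts="${3:-3}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # GNU date syntax, as in the command above
        deadline="$(date -d '+6 hours' +"%Y-%m-%dT%H:%M:%S")"
        # Submit the task and capture only its ID
        task_id="$(globus transfer --recursive --sync-level checksum \
            --deadline "$deadline" --jmespath 'task_id' --format unix \
            "$src" "$dst")" || return 1
        # Wait for a terminal state or the timeout (6 hours in seconds); the
        # exit code is ignored because the status is queried explicitly below
        globus task wait "$task_id" --timeout 21600 --polling-interval 60 || true
        status="$(globus task show "$task_id" --jmespath 'status' --format unix)"
        if [ "$status" = "SUCCEEDED" ]; then
            echo "$task_id"
            return 0
        fi
        # Make sure the old task is not still running before resubmitting
        globus task cancel "$task_id" >/dev/null 2>&1 || true
        echo "attempt $attempt ended with status: $status" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

Because the deadline applies to the whole task, this only makes sense for tasks that normally finish well inside the chosen window; a large transfer would be cancelled and restarted for no benefit.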


I'm asking our team now about relevant improvements we might make in this space, but -- at least to the best of my knowledge -- this is the closest feature to your use case we have today.
I hope that helps!

Best regards,
-Stephen

Stephen Rosen

Jul 31, 2020, 3:16:25 PM
to User Discuss
Our team just informed me that we are working on enhancements to transfer tasks which include the ability to skip rather than retry FileNotFound errors, and to list files which were skipped after the task completes.

There's no definite timeline for release yet, but we will announce it on our usual channels when it is done.
I would recommend subscribing to the developer-discuss listhost if you want to make sure you're notified about this (and other) features when they are released.

Best,
-Stephen

Scott Ruffner

Jul 31, 2020, 3:35:31 PM
to user-d...@globus.org
On Fri, Jul 31, 2020 at 3:16 PM Stephen Rosen <sir...@globus.org> wrote:
Our team just informed me that we are working on enhancements to transfer tasks which include the ability to skip rather than retry FileNotFound errors, and to list files which were skipped after the task completes.

This would be EXTREMELY valuable, as there is a use case in which large collections (directories) are being moved and are full of files, such as .DS_Store files, that the UID used on the Globus instance lacks permission to read, which breaks whole transfers.

The sooner we can get this option, the better!  If a user misses files they really did need/want, they can re-submit with the sync option later after getting file permissions fixed.

Best,

Scott 

Lev Gorenstein

Jul 31, 2020, 5:33:59 PM
to user-d...@globus.org
I very much second Michele and Scott's sentiment - this would be a great
feature to have!


Lev

--
AK-47: the best stack-to-queue converter.

Co, Michele (mc2zk)

Aug 3, 2020, 9:18:30 AM
to user-d...@globus.org

Hi Stephen and Jonathan,

 

Thank you both for your suggestions (sync types and setting deadlines for transfer tasks).  I will subscribe to developer-discuss to watch for any Globus feature releases.

 

Thanks again!

Michele
