Integrating rclone into Fireworks for better file access

shy...@lbl.gov

Mar 19, 2018, 2:17:09 PM
to fireworkflows
It would be really nice to integrate a tool like rclone into Fireworks so that file location can be more agnostic. This post is meant to start a discussion on implementation.

I'm not sure where this should be implemented. There is the files_in / files_out feature as well as FilePad. Should we complement this file storage with some sort of MongoDB-based index?

Anubhav Jain

Apr 1, 2018, 6:19:11 PM
to fireworkflows
files_in and files_out are meant to be pretty simple, without a lot of bells and whistles. If they can be made more powerful without interfering at all with the current operation or making it super complex (e.g., no adding to the database here), then sure.

FilePad is where the heavyweight, database-backed file operations are supposed to go.

What exactly did you have in mind?

shy...@lbl.gov

Apr 3, 2018, 10:14:37 AM
to fireworkflows
Sounds good.

Rclone behaves much like scp or rsync, so there is no need for a database. We would need some additional parameters in the fw_env: the remote to use and the root path within that remote.
This could be a single parameter such as:
     rclone_remote: "gdrive:garden/2017/"

which the code would split at the colon to use the "gdrive" remote, copying the launch to garden/2017/launch_

Moving files to the remote with rclone is: rclone copy <file> <remote>:<path>
Getting them back is: rclone copy <remote>:<path>/<file> <local_dir> (rclone copy treats the destination as a directory)

The only potential problem is that this requires subprocess calls to the rclone binary rather than native Python calls.
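To make the idea concrete, here is a minimal sketch of the splitting and the subprocess calls. It is not existing Fireworks code: the helper names (split_remote, push_cmd, pull_cmd, run) and the launch_dir argument are made up for illustration, and it assumes the rclone binary is on the PATH with the remote already configured. Since rclone copy treats its destination as a directory, the pull helper copies into a local directory rather than onto a file name.

```python
import subprocess


def split_remote(rclone_remote):
    """Split e.g. 'gdrive:garden/2017/' into ('gdrive', 'garden/2017')."""
    remote, sep, root = rclone_remote.partition(":")
    if not sep or not remote:
        raise ValueError("expected '<remote>:<path>', got %r" % rclone_remote)
    return remote, root.strip("/")


def push_cmd(local_file, rclone_remote, launch_dir):
    """Build the command that copies a local file into the launch folder on the remote."""
    remote, root = split_remote(rclone_remote)
    dest = "/".join(part for part in (root, launch_dir) if part)
    return ["rclone", "copy", local_file, "%s:%s" % (remote, dest)]


def pull_cmd(filename, rclone_remote, launch_dir, local_dir="."):
    """Build the command that copies one remote file into a local directory."""
    remote, root = split_remote(rclone_remote)
    src = "/".join(part for part in (root, launch_dir, filename) if part)
    return ["rclone", "copy", "%s:%s" % (remote, src), local_dir]


def run(cmd):
    """Run an rclone command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the path logic testable without rclone installed, and passing a list (not a shell string) to subprocess.run avoids quoting problems with file names.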

Anubhav Jain

Apr 3, 2018, 11:59:32 AM
to Shyam Dwaraknath, fireworkflows
Why not put the rclone path directly in the _files_in / _files_out dict instead of fw_env?

The reason to put it in fw_env would be if you expect different *workers* to use different rclone setups. However, if the rclone destination depends on the workflow and not on the worker, then putting the rclone path in the _files_in / _files_out dictionary is the better way to do it.
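As a rough sketch of how the two options could coexist (purely hypothetical, nothing like this exists in Fireworks today): a workflow-level rclone path in _files_out wins, and a worker-level default from fw_env is used only when the entry is a plain file name. The function name resolve_target and the fw_env key rclone_remote are assumptions for illustration.

```python
def resolve_target(name, spec, fw_env):
    """Return where the _files_out entry `name` should be written.

    Hypothetical convention: a value of the form '<remote>:<path>' is an
    rclone target chosen by the workflow; anything else is a plain file name.
    Note that a Windows path such as 'C:\\out.json' would be misread as a remote.
    """
    value = spec.get("_files_out", {})[name]
    remote, sep, _ = value.partition(":")
    if sep and remote and "/" not in remote:
        return value  # workflow-level rclone path takes precedence
    default = fw_env.get("rclone_remote")
    if default:
        # Worker-level default: join the file name onto the configured root.
        base = default if default.endswith((":", "/")) else default + "/"
        return base + value
    return value  # no rclone configured: keep the local file name
```

For example, with spec = {"_files_out": {"result": "result.json"}} and fw_env = {"rclone_remote": "gdrive:garden/2017/"}, the call resolve_target("result", spec, fw_env) gives "gdrive:garden/2017/result.json".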

Best
Anubhav
