Send folders from one process to the rest of the processes in a pipeline

Elisabeth Ortega

Jun 8, 2022, 5:10:53 AM

to next...@googlegroups.com

Hi all,

I'm struggling a little bit with an issue involving folders.

I have a workflow with 3 processes:

- process 1: decompresses a folder using "tar -xf myFolder.tar.gz". The decompressed folder ends up in something like work/4a/482942304239402349023/myFolder.
- process 2: runs some code inside "myFolder".
- process 3: runs another piece of code inside "myFolder" that uses the result of process 2.

My issue is that I don't know how to send "myFolder" to processes 2 and 3, and I haven't been able to find the solution in the documentation. Sending the contents file by file isn't an option because the folder contains subfolders...

Does anyone have a clue about how to do it?

Thanks in advance for your help.

Elisabeth

----------------------------------------------

Elisabeth Ortega, PhD.
Computational Scientist

elisabet...@hpcnow.com

www.hpcnow.com

-----------------------------------------------

Combiz

Jun 8, 2022, 6:17:11 AM

to next...@googlegroups.com

Hi Elisabeth,

I'd create a channel to read in the myFolder.tar.gz file (or *.tar.gz files). After that, Nextflow (NF) handles the I/O: each task's inputs are staged into its own work directory and referenced through variables in the process, so there's no need to know where the decompressed folder ends up in the work directory.

So your workflow would be:

Create a channel for your input .tar.gz file(s) (e.g. https://www.nextflow.io/docs/latest/channel.html#frompath)
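For reference, a minimal DSL2 sketch of that first step, assuming the archives sit under a hypothetical `data/` folder and are selected through a hypothetical `params.archives` parameter:

```nextflow
// Hypothetical parameter: glob matching the input archive(s)
params.archives = 'data/*.tar.gz'

workflow {
    // One channel item per matching archive; checkIfExists makes the run
    // fail early if the glob matches nothing
    ch_inputs = Channel.fromPath(params.archives, checkIfExists: true)
    ch_inputs.view()   // print each path, just to check what was picked up
}
```

The glob can then be overridden from the command line, e.g. `nextflow run main.nf --archives 'myFolder.tar.gz'`.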

Process 1 should receive a path from the channel as an input (e.g. `input: path myfolder`), use it in the script (e.g. `mkdir -p unzipped_dir && tar -xf ${myfolder} -C unzipped_dir`, since `tar -C` needs the target directory to exist), and then output that folder for the next process (e.g. `output: path 'unzipped_dir', type: 'dir', emit: unzipdir`).
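Process 1 written out as a DSL2 process might look roughly like this (a sketch; the names `process1`, `myfolder` and `unzipped_dir` follow the description above). The key point for the original question is that the whole directory is declared as a single `path` output, so Nextflow passes it downstream as one item, subfolders included:

```nextflow
process process1 {
    input:
    path myfolder                 // the .tar.gz archive, staged into the task directory

    output:
    path 'unzipped_dir', type: 'dir', emit: unzipdir   // whole extracted tree as one item

    script:
    """
    mkdir -p unzipped_dir         # tar -C requires the target directory to exist
    tar -xf ${myfolder} -C unzipped_dir
    """
}
```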

Process 2 receives the input from process 1 (e.g. `input: path unzipdir`), runs some code on that location (e.g. `myscript ${unzipdir}`) and outputs a result (e.g. `output: path 'result.csv', emit: p2result`).
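A matching sketch for process 2. `myscript` is the placeholder command from the text; the assumption that it prints CSV to stdout is mine, for illustration only:

```nextflow
process process2 {
    input:
    path unzipdir                 // folder emitted by process1, staged with its subfolders

    output:
    path 'result.csv', emit: p2result

    script:
    """
    # Hypothetical command: assumed to read the folder and print CSV to stdout
    myscript ${unzipdir} > result.csv
    """
}
```

Inputs are staged into the task directory as symbolic links by default, so it is safer to write results into the task's own directory, as here, than inside `${unzipdir}` itself; writing into the input folder would modify process 1's output in place.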

Process 3 receives the inputs from both process 1 and process 2 (e.g. `input:` followed by `path unzipdir` and `path p2result`, each on its own line).
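And a sketch of process 3 with both inputs declared on separate lines; `myotherscript` and `final_result.txt` are hypothetical names used only for illustration:

```nextflow
process process3 {
    input:
    path unzipdir                 // folder from process1
    path p2result                 // result.csv from process2

    output:
    path 'final_result.txt'

    script:
    """
    # Hypothetical command combining the folder with process 2's result
    myotherscript ${unzipdir} ${p2result} > final_result.txt
    """
}
```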

etc.

The workflow could then be defined as something like this, one call per line (under `main:` if you use a named workflow): `process1(ch_inputs)`, `process2(process1.out.unzipdir)`, `process3(process1.out.unzipdir, process2.out.p2result)`.
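Put together as an entry workflow, this might look like the following sketch, assuming the three processes are defined in the same `main.nf` with the inputs and outputs described above, and using a hypothetical `params.archives` glob as input:

```nextflow
params.archives = 'data/*.tar.gz'   // hypothetical input glob

workflow {
    ch_inputs = Channel.fromPath(params.archives, checkIfExists: true)

    process1(ch_inputs)
    // In DSL2 the same output channel can feed several processes
    process2(process1.out.unzipdir)
    process3(process1.out.unzipdir, process2.out.p2result)
}
```

With a single archive this is all that's needed. With several archives, process 3 pairs items from its two input channels in arrival order, so one folder could be matched with another folder's result; a common remedy is to emit `tuple val(id), path(...)` from each process and `join` the channels on that key before calling process 3.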

Hope this helps!

Combiz
