Handling of output Files


clw.ge...@gmail.com

Nov 26, 2014, 2:54:05 PM
to next...@googlegroups.com
Hi Everyone,

I'm just getting into Nextflow (started today), so excuse me if my question is a little naive, but I have a question about the handling of output files.

When running a pipeline like this:

myDir = "./"
params.folder = file(myDir)

process read_data {
  output:
  file "${dir}/bla*" into read

  script:
  dir = params.folder

  """
  echo bla > ${dir}/bla.txt
  """
}

process print {
  input:
  file(x) from read

  output:
  stdout recieve

  script:
  """
  cat $x
  """
}

recieve.subscribe { println it }

I get the error

missing output file(s): '/path/to/dir/bla*' expected by process: read_data (1)

even though the file /path/to/dir/bla.txt was created.

If I remove the ${dir} from the code, the file gets put into the standard "work/XX/XXXXXXXXXXXXX" folder and everything works.

I suspect Nextflow needs the files in its native structure to be able to handle channels appropriately, and that's why one can't just put the files anywhere?

Is there then a way to automatically link the files in the native file structure to somewhere else, so that intermediate files can be handled more easily, or does one just have to work with the file structure as it is built up natively?

Paolo Di Tommaso

Nov 26, 2014, 4:33:16 PM
to nextflow
Nextflow is designed in such a way that you don't need to organise your intermediate files in a directory structure yourself.

For three reasons:
1) Simplify your work: think in terms of files, not paths or directories;
2) Avoid race conditions when your jobs are executed in parallel;
3) Allow you to resume the pipeline execution from the last successfully executed step if it stops for any reason.

So, you are right. You should not use the ${dir} variable to force a process to write its files into that folder.

If you need to copy some results to a specific place, you can do that outside the process scope, for example:


params.folder = "./"

myDir = file(params.folder)
myDir.mkdirs()

process read_data {
  output:
  file "bla*" into read

  """
  echo bla > bla.txt
  """

}

read.subscribe { it.copyTo(myDir) }



Read more about copying files here: 


Also note the usage of the params.folder variable. I've changed it so that the "./" string is assigned to it and then converted to a file object, so that you can create the directory if it is missing. In any case, variables in the params "scope" can be overridden by specifying them on the pipeline command line. For example:

  nextflow run <script> --folder /some/other/path
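
As a possible alternative to the subscribe/copyTo approach, later Nextflow releases (newer than this thread) provide a publishDir process directive that exposes selected outputs in a directory of your choice while the process still writes into its own work directory. The following is only a minimal sketch under that assumption, written in the same channel syntax as the examples above; it is not the method described in this message, and the println in the subscriber is purely illustrative.

```groovy
// Assumption: a Nextflow release that supports the publishDir directive
// together with the "into" channel syntax used in this thread.
params.folder = "./"

process read_data {
  // Copy every declared output file into params.folder once the task succeeds.
  // With no mode given, the default is to create symbolic links instead.
  publishDir params.folder, mode: 'copy'

  output:
  file "bla*" into read

  """
  echo bla > bla.txt
  """
}

// The channel still carries the file located in the work directory.
read.subscribe { println "produced: $it" }
```

The process itself keeps writing to its task work directory, so resuming and parallel execution behave as described above; only the published copy (or link) lands in the target folder, which can still be overridden with --folder on the command line.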



Hope this helps. 

Cheers,
Paolo
  
 
