publish dir : files instead of links

Arnaud Ceol

Apr 24, 2018, 5:33:03 AM
to Nextflow
Hi,

I'm using Nextflow to demultiplex Illumina data and check its quality.

Basically, I have two processes:
1 - demultiplex the data and send all the fastq files into a new channel
2 - run a quality check on each fastq file in parallel.


process demultiplex {
    cpus 16
    publishDir "${params.tmpDir}/${params.runID}", mode: 'move'
   
   
    output:
    file "**R[1-2]_[0-9][0-9][0-9]*.fastq.gz" into fastq_channel mode flatten
    file "Stats"
    file "Reports"

...
}

process checkQuality {
   
    publishDir "${params.tmpDir}/${params.runID}", mode: 'move'
   
    input:
    file fastq from fastq_channel
   
    output:
    file "*fastqc*" into fastqc_channel
    file fastq
 ....  
}


At the end, I would like to have all the files (fastq + fastqc) in the publish directory.

Problem:
- if I set publishDir mode: 'move' in the first process, the second process won't find the files
- if I set publishDir mode: 'copy', I need twice the disk space, and I lose a lot of time copying the files
- if I don't publish the fastq files in the first process but only in the second one (using a pattern to exclude them in the first process and adding them as an output of the second one), I only get links in my published directory.

Any idea on how I can have the files in the output without duplicating the files?


thanks,

Arnaud





Paolo Di Tommaso

Apr 24, 2018, 5:47:29 AM
to nextflow
The best would be to use a hard link (mode: 'link'), but it only works when the work directory and the publish directory are on the same file system.
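To illustrate why a hard link avoids both the copy and the broken-link problem, here is a minimal Python sketch. It is not Nextflow code, and the file names, the `publish_as_hard_link` helper and the `demo` function are made up for illustration. It only shows the file-system behaviour that the 'link' mode relies on: the published file is a second name for the same data, so nothing is duplicated and the data survives when the original name in the work directory is deleted.

```python
import os
import tempfile


def publish_as_hard_link(source, target):
    """Create `target` as a second name (hard link) for the data in `source`."""
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Raises OSError if source and target are on different file systems.
    os.link(source, target)


def demo(root):
    """Hard-link a placeholder file from a fake work dir into a fake publish dir."""
    work_file = os.path.join(root, "work", "sample_R1_001.fastq.gz")
    published = os.path.join(root, "results", "sample_R1_001.fastq.gz")
    os.makedirs(os.path.dirname(work_file))
    with open(work_file, "wb") as handle:
        handle.write(b"placeholder reads\n")

    publish_as_hard_link(work_file, published)

    # Both names point at the same inode, so no data was copied.
    same_inode = os.stat(work_file).st_ino == os.stat(published).st_ino
    # The published entry is a regular file, not a symbolic link.
    is_symlink = os.path.islink(published)

    # Removing the work-dir name leaves the published data readable.
    os.remove(work_file)
    with open(published, "rb") as handle:
        survived = handle.read() == b"placeholder reads\n"

    return same_inode, is_symlink, survived


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(demo(root))
```

If the two directories were on different file systems, `os.link` would fail with an `OSError`, which is the same-file-system restriction mentioned above.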



Would that work for you? 

p


--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

Arnaud Ceol

Apr 24, 2018, 7:26:10 AM
to Nextflow
Yes, this should work.

thanks.

Steve

Apr 25, 2018, 9:03:42 PM
to Nextflow
I have a bcl2fastq demultiplexing pipeline here, where I think I handled some of this:
https://github.com/NYU-Molecular-Pathology/demux-nf

In particular, I copy the entire demultiplexing output directory as a subdir in an 'output_dir' location in my publishDir, then I pass the desired fastq files for QC separately on another Channel.

When it's finished, you can just delete the 'work' directory to save space, though time spent copying data has not been a concern for me.