Hi,
I'm using Nextflow to demultiplex Illumina data and check its quality.
Basically, I have two processes:
1 - demultiplex the data and send all the fastq files into a new channel
2 - quality-check each fastq in parallel.
process demultiplex {
    cpus 16
    publishDir "${params.tmpDir}/${params.runID}", mode: 'move'

    output:
    file "**R[1-2]_[0-9][0-9][0-9]*.fastq.gz" into fastq_channel mode flatten
    file "Stats"
    file "Reports"
    ...
}
process checkQuality {
    publishDir "${params.tmpDir}/${params.runID}", mode: 'move'

    input:
    file fastq from fastq_channel

    output:
    file "*fastqc*" into fastqc_channel
    file fastq
    ....
}
At the end, I would like to have all the files (fastq + fastqc) in the publish directory.
Problem:
- if I set the publishDir mode to 'move' in the first process, the second process won't find the files, since they have already been moved out of the work directory
- if I set the publishDir mode to 'copy', I need twice the disk space, and I'm losing a lot of time copying the files
- if I don't publish the fastq files in the first process, but only in the second one (using a pattern to exclude them in the first process, and adding them as an output of the second one), I only get links in my publish directory.
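For reference, the third attempt looks roughly like the sketch below. Only the publishDir line of the first process changes; checkQuality stays as shown above. The glob '{Stats,Reports}' is just an illustration of "publish everything except the fastq files", not necessarily the exact pattern I used, and the script part is left out as before.

```groovy
process demultiplex {
    cpus 16
    // Assumption for illustration: the 'pattern' option restricts publishing
    // to Stats and Reports, so the fastq files stay in the work directory
    // and remain available to checkQuality.
    publishDir "${params.tmpDir}/${params.runID}", mode: 'move', pattern: '{Stats,Reports}'

    output:
    file "**R[1-2]_[0-9][0-9][0-9]*.fastq.gz" into fastq_channel mode flatten
    file "Stats"
    file "Reports"

    // script block omitted, same as in the original process
}
```

With this setup the fastq files reach checkQuality as staged inputs, which is presumably why declaring them as outputs there publishes links rather than the real files.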
Any idea how I can get the actual files into the publish directory without duplicating them?
Thanks,
Arnaud